Eval3D: Interpretable and Fine-grained Evaluation for 3D Generation

1 Massachusetts Institute of Technology    2 University of Washington    3 New York University   
4 Wayve    5 Allen Institute for AI    6 Cornell University   

Abstract

Despite the unprecedented progress in the field of 3D generation, current systems still often fail to produce high-quality 3D assets that are visually appealing and geometrically and semantically consistent across multiple viewpoints. To effectively assess the quality of the generated 3D data, there is a need for a reliable 3D evaluation tool. Unfortunately, existing 3D evaluation metrics often overlook the geometric quality of generated assets or merely rely on black-box multimodal large language models for coarse assessment. In this paper, we introduce Eval3D, a fine-grained, interpretable evaluation tool that can faithfully evaluate the quality of generated 3D assets based on various distinct yet complementary criteria. Our key observation is that many desired properties of 3D generation, such as semantic and geometric consistency, can be effectively captured by measuring the consistency among various foundation models and tools. We thus leverage a diverse set of models and tools as probes to evaluate the inconsistency of generated 3D assets across different aspects. Compared to prior work, Eval3D provides pixel-wise measurement, enables accurate 3D spatial feedback, and aligns more closely with human judgments. We comprehensively evaluate existing 3D generation models using Eval3D and highlight the limitations and challenges of current models.

Challenges of 3D Generation

  1. Structural inconsistency: lack of globally-coherent 3D shape;
  2. Text-3D misalignment: failure to meet the requirements of the input text-prompt;
  3. Semantic inconsistency: content change and incoherent semantics;
  4. Geometric inconsistency: misaligned geometry and texture.

Overview of Eval3D

Eval3D GIF
Eval3D offers interpretable, fine-grained and human-aligned metrics to assess the quality of 3D generations from various aspects. We utilize a diverse array of foundation models, measuring consistency among different representations of 3D assets.

Geometric Inconsistency

Geometric inconsistency evaluates texture-geometry misalignment by comparing normals rendered from the 3D geometry with normals predicted from the rendered image. Bright yellow indicates large discrepancies.
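A minimal sketch of this comparison, assuming both normal maps are already unit-length arrays of shape (H, W, 3); the function name and the optional foreground mask are our own illustration, not the paper's exact implementation:

```python
import numpy as np

def geometric_inconsistency(rendered_normals, predicted_normals, mask=None):
    """Per-pixel angular discrepancy between normals rendered from the 3D
    asset and normals predicted from its RGB rendering by a monocular
    normal estimator.

    Inputs: (H, W, 3) unit normal maps; optional (H, W) bool foreground mask.
    Returns an (H, W) map in [0, 1]; higher = larger texture-geometry mismatch.
    """
    # Cosine similarity between the two normal fields, clipped for numerical safety.
    cos = np.clip((rendered_normals * predicted_normals).sum(-1), -1.0, 1.0)
    # Map the angle between normals to [0, 1]: 0 = aligned, 1 = opposite.
    disc = np.arccos(cos) / np.pi
    if mask is not None:
        disc = np.where(mask, disc, 0.0)
    return disc
```

Visualizing `disc` as a heatmap reproduces the bright-yellow discrepancy maps shown above.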

Structural Consistency

Structural consistency evaluates the geometric coherence of the generated 3D assets by comparing rendered views with the predictions from a novel view synthesis model (Zero-123) across various rotations. We utilize DreamSim to assess image similarity.
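The evaluation loop can be sketched as follows. The three callables stand in for components named above: `render_fn` renders the asset at a pose, `nvs_fn` is a novel-view-synthesis model (Zero-123 in the paper), and `dist_fn` an image distance (DreamSim in the paper); their exact signatures here are illustrative placeholders:

```python
import numpy as np

def structural_consistency(render_fn, nvs_fn, dist_fn, ref_pose, rotations):
    """Average perceptual distance between actual renders of the asset and
    novel views predicted from a single reference view.

    render_fn(pose)      -> rendered image at that pose
    nvs_fn(image, rot)   -> predicted view after rotating by `rot`
    dist_fn(a, b)        -> image distance (lower = more similar)
    """
    ref = render_fn(ref_pose)
    # For each rotation, compare the NVS prediction against the true render.
    dists = [dist_fn(nvs_fn(ref, r), render_fn(ref_pose + r)) for r in rotations]
    return float(np.mean(dists))
```

A geometrically coherent asset yields predictions close to its true renders across all rotations, so the score stays low.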

Semantic Inconsistency

3D Inconsistency Map
3D inconsistency maps: The proposed 3D metrics, semantic and geometric consistencies, allow fine-grained localization of the artifacts (e.g., Janus issue: multiple nose/face, inconsistent hand geometry, arbitrary surface patterns on the back) by fusing/computing the metrics in 3D space.

Semantic Inconsistency 1

As we toggle the texture on and off, pay close attention to how the hand appears in the geometry. When the texture is off, it looks as if the dog is holding the phone with two hands. However, when the texture is turned on, it seems that only one hand is holding the phone while the other rests on the ground. See how Eval3D Semantic Consistency beautifully highlights this observation above.

Semantic Inconsistency 2

As we move the camera to focus on different viewpoints of the geometry, the Janus issue on the dog's nose becomes evident. Eval3D detects it again: see the dog's nose highlighted in cyan in the figure above!

Semantic Inconsistency 3

Notice the abrupt geometric patterns on the dog’s back, which even create the illusion of a strange arm. Eval3D Semantic Consistency, being 3D-aware, detects this effectively! The inconsistent or abrupt geometry causes the projections of 3D points onto the rendered images' DINO feature maps to become noisy, resulting in a high standard deviation in the aggregated DINO features for these 3D points.
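The per-point aggregation described here can be sketched with numpy, assuming the DINO features have already been sampled at each 3D point's projection in every view (shapes and function name are our own illustration):

```python
import numpy as np

def semantic_inconsistency(feats, vis):
    """Per-point semantic inconsistency from multi-view features.

    feats: (V, P, D) features sampled for P surface points across V views
           (e.g., bilinearly sampled DINO feature maps at each projection).
    vis:   (V, P) bool visibility of each point in each view.
    Returns (P,) per-point std over visible views, averaged over channels;
    noisy or abrupt regions score high.
    """
    w = vis[..., None].astype(feats.dtype)          # (V, P, 1) visibility weights
    n = np.clip(w.sum(0), 1.0, None)                # number of views seeing each point
    mean = (feats * w).sum(0) / n                   # (P, D) per-point mean feature
    var = (((feats - mean) ** 2) * w).sum(0) / n    # (P, D) per-point variance
    return np.sqrt(var).mean(-1)                    # (P,) avg std over channels
```

Points whose features agree across all views (as for the GT objects below) score near zero, while the artifacts on the dog's back receive high values.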

Will Eval3D Semantic Consistency be high for GT Objects?

DINO Visualization 1
DINO Visualization 2
DINO Visualization 3
DINO Visualization 4

Check the high consistency of DINO features for Ground-Truth (GT) Objaverse objects.

Eval3D All in One Benchmark
Structural, Semantic, Geometric, Text-3D Alignment, Aesthetics

BibTeX

@article{cvpr2025eval3d,
  title   = {Eval3D: Interpretable and Fine-grained Evaluation for 3D Generation},
  author  = {Shivam Duggal and Yushi Hu and Oscar Michel and Aniruddha Kembhavi and William T. Freeman and Noah A. Smith and Ranjay Krishna and Antonio Torralba and Ali Farhadi and Wei-Chiu Ma},
  journal = {CVPR},
  year    = {2025},
}