Enhancing High-Resolution 3D Generation through Pixel-wise Gradient Clipping

ICLR 2024

Zijie Pan¹, Jiachen Lu¹, Xiatian Zhu², Li Zhang¹

¹Fudan University, ²University of Surrey


Abstract

High-resolution 3D object generation remains a challenging task, primarily due to the limited availability of comprehensive annotated training data. Recent advancements have aimed to overcome this constraint by harnessing image generative models, pretrained on extensive curated web datasets, using knowledge transfer techniques like Score Distillation Sampling (SDS). Efficiently addressing the requirements of high-resolution rendering often necessitates the adoption of latent representation-based models, such as the Latent Diffusion Model (LDM). In this framework, a significant challenge arises: to compute gradients for individual image pixels, it is necessary to backpropagate gradients from the designated latent space through the frozen components of the image model, such as the VAE encoder used within LDM. However, this gradient propagation pathway has never been optimized and remains uncontrolled during training. We find that the unregulated gradients adversely affect the 3D model's capacity to acquire texture-related information from the image generative model, leading to poor-quality appearance synthesis. To address this challenge, we propose an operation termed Pixel-wise Gradient Clipping (PGC), designed for seamless integration into existing 3D generative models, thereby enhancing their synthesis quality. Specifically, we control the magnitude of stochastic gradients by clipping the pixel-wise gradients efficiently, while preserving crucial texture-related gradient directions. Despite its simplicity and minimal extra cost, extensive experiments demonstrate the efficacy of PGC in enhancing the performance of existing 3D generative models for high-resolution object rendering.
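The clipping step described in the abstract can be illustrated with a minimal NumPy sketch. This is one plausible reading of "clipping the pixel-wise gradients while preserving directions", not the authors' released implementation: the function name `pixelwise_gradient_clip`, the `(H, W, C)` layout, the `threshold` value, and the `eps` guard are all assumptions made for illustration. Each pixel's gradient vector (taken across channels) is rescaled so that its norm does not exceed the threshold, which leaves its direction unchanged.

```python
import numpy as np

def pixelwise_gradient_clip(grad, threshold, eps=1e-12):
    """Rescale each pixel's gradient so its norm is at most `threshold`.

    grad: array of shape (H, W, C), assumed to hold the gradient of the
          loss with respect to the rendered image (hypothetical layout).
    threshold: maximum allowed per-pixel gradient norm (assumed value).
    """
    grad = np.asarray(grad, dtype=np.float64)
    # Norm of each pixel's gradient vector over the channel axis: (H, W, 1).
    norms = np.linalg.norm(grad, axis=-1, keepdims=True)
    # Scale factor is 1 for small gradients and threshold/norm for large ones,
    # so the direction of every per-pixel gradient is preserved.
    scale = np.minimum(1.0, threshold / (norms + eps))
    return grad * scale

# Toy 2x2 RGB gradient: one large pixel gradient, one small, the rest zero.
g = np.zeros((2, 2, 3))
g[0, 0] = [3.0, 4.0, 0.0]  # norm 5, scaled down to roughly norm 1
g[1, 1] = [0.1, 0.0, 0.0]  # norm 0.1, below the threshold, left unchanged
clipped = pixelwise_gradient_clip(g, threshold=1.0)
```

In a training loop, a function of this kind would be applied to the image-space gradient after it has been backpropagated through the frozen encoder and before it reaches the 3D representation. Because the scaling is applied per pixel rather than to the whole tensor, a few extreme pixels do not shrink the gradients of all the others.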


With PGC, high-quality and high-resolution 3D models/textures can be generated from text prompts.

a wooden car

an astronaut riding a horse

A panda is dressed in armor, holding a spear in one hand and a shield in the other


Comparisons and ablation studies

Optimizing the SDF and texture fields as in Fantasia3D, with different LDMs as the image prior, we show a marked improvement in high-resolution 3D texture synthesis when our proposed PGC is applied.

Columns, left to right: SD v2.1-base, SD v2.1-base + PGC, SDXL v0.9, SDXL v0.9 + PGC.

Prompts, one per row:

an angry cat
a black people taking pictures with a camera
a dragon holding a sword
a castle on a car
an old man
a werewolf archer
Luffy wearing a motorcycle helmet

BibTeX

@inproceedings{pan2024enhancing,
  title={Enhancing High-Resolution 3D Generation through Pixel-wise Gradient Clipping},
  author={Pan, Zijie and Lu, Jiachen and Zhu, Xiatian and Zhang, Li},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2024}
}
