Real-time Photorealistic Dynamic Scene Representation
and Rendering with 4D Gaussian Splatting
ICLR 2024
Zeyu Yang, Hongye Yang, Zijie Pan, Li Zhang
Fudan University
4D Gaussian Splatting: Modeling Dynamic Scenes
with Native 4D Primitives
Extended Version
Zeyu Yang1, Zijie Pan1, Xiatian Zhu2, Li Zhang1,
Jianfeng Feng1, Yu-Gang Jiang1, Philip H.S. Torr3
1Fudan University, 2University of Surrey, 3University of Oxford


4D Gaussian Splatting achieves real-time high-fidelity video synthesis by optimizing a collection of 4D primitives to fit the underlying spatio-temporal 4D volume of dynamic scenes.


(a) Indoor scene with intricate dynamics and illumination

(b) Novel view synthesis from monocular dynamic videos

(c) Reconstruction of an urban scene



Abstract


Reconstructing dynamic 3D scenes from 2D images and generating diverse views over time is challenging due to scene complexity and temporal dynamics. Despite advances in neural implicit models, limitations persist: (i) Inadequate scene structure: existing methods struggle to reveal the spatial and temporal structure of dynamic scenes by directly learning the complex 6D plenoptic function. (ii) Scaling deformation modeling: explicitly modeling the deformation of scene elements becomes impractical for complex dynamics. To address these issues, we consider spacetime as an entirety and propose to approximate the underlying spatio-temporal 4D volume of a dynamic scene by optimizing a collection of 4D primitives, with explicit geometry and appearance modeling. Learning to optimize the 4D primitives enables us to synthesize novel views at any desired time with our tailored rendering routine. Our model is conceptually simple: it consists of 4D Gaussians parameterized by anisotropic ellipses that can rotate arbitrarily in space and time, together with view-dependent and time-evolving appearance represented by coefficients of 4D spherindrical harmonics. This approach offers simplicity, flexibility for variable-length videos and end-to-end training, and efficient real-time rendering, making it suitable for capturing complex dynamic scene motions. Experiments across various benchmarks, including monocular and multi-view scenarios, demonstrate our 4DGS model's superior visual quality and efficiency.
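To make the geometry parameterization concrete, below is a minimal NumPy sketch of how a 4D covariance can be assembled from per-axis scales and an arbitrary 4D rotation; the paper factors the 4D rotation into a pair of unit quaternions (left- and right-isoclinic rotations), but the exact matrix layout and all function names here are our assumptions, not the authors' code.

```python
import numpy as np

def isoclinic_rotation4d(q_l, q_r):
    """4D rotation built from two unit quaternions (left/right isoclinic factors)."""
    a, b, c, d = np.asarray(q_l, float) / np.linalg.norm(q_l)
    p, q, r, s = np.asarray(q_r, float) / np.linalg.norm(q_r)
    L = np.array([[a, -b, -c, -d],
                  [b,  a, -d,  c],
                  [c,  d,  a, -b],
                  [d, -c,  b,  a]])
    R = np.array([[p, -q, -r, -s],
                  [q,  p,  s, -r],
                  [r, -s,  p,  q],
                  [s,  r, -q,  p]])
    return L @ R

def covariance4d(q_l, q_r, scales):
    """Sigma = R S S^T R^T with per-axis scales (sx, sy, sz, st)."""
    R = isoclinic_rotation4d(q_l, q_r)
    S = np.diag(scales)
    return R @ S @ S.T @ R.T
```

With identity quaternions the resulting Gaussian is axis-aligned; any other pair tilts the 4D ellipsoid so that its spatial center drifts as time advances, which is how a single rigid primitive can encode motion.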



Method Overview


Schematic illustration: the rendering process of 4D Gaussian Splatting has conceptual parallels with the imaging process of a dynamic scene.



Rendering pipeline: given a time t and a view I, each 4D Gaussian is first decomposed into a conditional 3D Gaussian and a marginal 1D Gaussian. The conditional 3D Gaussian is then projected to a 2D splat. Finally, we integrate the planar conditional Gaussian, the 1D marginal Gaussian, and the time-evolving, view-dependent color to render the view I.
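The decomposition described above is ordinary Gaussian conditioning on the time coordinate. A minimal sketch under that reading (our notation and naming, not the authors' code): partition the 4D mean and covariance into spatial and temporal blocks; the conditional 3D Gaussian supplies the splat geometry at time t, while the peak-normalized marginal density in t attenuates the primitive's opacity.

```python
import numpy as np

def condition_on_time(mu4, Sigma4, t):
    """Split a 4D Gaussian N(mu4, Sigma4) into its conditional 3D Gaussian
    at time t and the peak-normalized marginal 1D density evaluated at t."""
    mu_xyz, mu_t = mu4[:3], mu4[3]
    S_xx = Sigma4[:3, :3]   # spatial covariance block
    S_xt = Sigma4[:3, 3]    # space-time cross-covariance
    S_tt = Sigma4[3, 3]     # temporal variance

    # Conditional 3D Gaussian p(xyz | t): its mean shifts linearly with t,
    # which is how a rotated 4D ellipsoid encodes motion.
    mu_cond = mu_xyz + S_xt * (t - mu_t) / S_tt
    Sigma_cond = S_xx - np.outer(S_xt, S_xt) / S_tt

    # Marginal density in t, normalized to peak at 1; it scales the
    # Gaussian's opacity away from its temporal center.
    w_t = np.exp(-0.5 * (t - mu_t) ** 2 / S_tt)
    return mu_cond, Sigma_cond, w_t
```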



Qualitative novel-view synthesis results


4D Gaussian Splatting outperforms prior methods in both visual quality and rendering efficiency.



Temporal feature


(a) 4D Gaussian Splatting can capture the coarse dynamics of a scene by optimizing only the rendering loss.

(b) The statistics along the time dimension naturally form a mask that delineates dynamic and static regions (see the sketch after this list).

(c) 4D Gaussian Splatting allows seamless transitions between frames, and can thus naturally render bullet-time effects.
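As an illustration of item (b): the marginal temporal variance of each Gaussian already indicates whether it persists across the whole clip (static) or appears only briefly (dynamic). A hypothetical thresholding sketch; the ratio below is ours, not a value from the paper.

```python
import numpy as np

def static_dynamic_mask(Sigmas4, clip_length, ratio=0.5):
    """Classify each 4D Gaussian as dynamic or static from its temporal
    statistics alone. Sigmas4 has shape (N, 4, 4); a Gaussian whose marginal
    temporal std-dev spans most of the clip contributes at every frame and is
    effectively static, while a short temporal footprint marks it as dynamic.
    The threshold `ratio` is a hypothetical choice for illustration."""
    temporal_std = np.sqrt(Sigmas4[:, 3, 3])    # sqrt of Sigma_tt per Gaussian
    return temporal_std < ratio * clip_length   # True => dynamic
```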



Reconstruction for urban scenes


4D Gaussian Splatting can also be applied to the reconstruction of large-scale urban scenes.



Application for 4D segmentation


4D Gaussian Splatting can also be applied to multi-granularity segmentation (a sketch follows the panel labels below).

RGB
Scale-gated feature
2D segmentation
Segmented Gaussians
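The page does not spell out the gating mechanism, so the following is only a plausible sketch of scale-gated feature matching: each Gaussian carries a learned feature vector, a scale-dependent gate suppresses channels tied to other granularities, and a query feature selects Gaussians by cosine similarity. All names and the threshold are hypothetical.

```python
import numpy as np

def segment_gaussians(features, gate, query, tau=0.8):
    """Select Gaussians whose scale-gated feature matches a query feature.
    features: (N, D) per-Gaussian features; gate: (D,) scale-dependent gate;
    query: (D,) query feature. `tau` is an illustrative similarity threshold."""
    gated = features * gate                                          # (N, D)
    gated = gated / (np.linalg.norm(gated, axis=1, keepdims=True) + 1e-8)
    query = query / np.linalg.norm(query)
    return gated @ query > tau        # boolean mask over the N Gaussians
```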




Acknowledgements: The website template was borrowed from Lior Yariv. The video comparison script was adapted from NeO 360.