3D Gaussian Splatting (3DGS) has demonstrated superior quality in modeling 3D objects and scenes. However, generating 3DGS remains challenging because the representation is discrete, unstructured, and permutation-invariant. In this work, we present a simple yet effective method to overcome these challenges. We use spherical mapping to transform 3DGS into a structured 2D representation, termed UVGS. UVGS can be viewed as a multi-channel image whose feature dimension is the concatenation of Gaussian attributes such as position, scale, color, opacity, and rotation. We further find that these heterogeneous features can be compressed into a lower-dimensional (e.g., 3-channel) shared feature space using a carefully designed multi-branch network. The compressed UVGS can then be treated as an ordinary RGB image. Remarkably, we find that typical VAEs trained with latent diffusion models generalize directly to this new representation without additional training. Our representation makes it straightforward to leverage foundational 2D models, such as diffusion models, to model 3DGS directly. Additionally, one can simply increase the 2D UV resolution to accommodate more Gaussians, which makes UVGS more scalable than typical 3D backbones. This approach immediately enables a range of new 3DGS generation applications by reusing existing, well-developed 2D generation capabilities. In our experiments, we demonstrate unconditional generation, conditional generation, and inpainting of 3DGS based on diffusion models, all of which were previously non-trivial.
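To make the spherical mapping step concrete, the following is a minimal NumPy sketch of how a set of Gaussians might be scattered onto a multi-channel UV image. It is an illustration under stated assumptions, not the method's actual implementation: the function name `gaussians_to_uv`, the 14-channel layout (3 position, 3 scale, 3 color, 1 opacity, 4 rotation), the projection about the origin, and the rule of keeping the most opaque Gaussian when several fall into the same pixel are all hypothetical choices.

```python
import numpy as np


def gaussians_to_uv(position, scale, color, opacity, rotation, height=64, width=128):
    """Scatter N Gaussians onto a (height, width, 14) UV map via spherical mapping.

    Hypothetical helper: channel layout and collision rule are assumptions.
    """
    # Concatenate per-Gaussian attributes: 3 + 3 + 3 + 1 + 4 = 14 channels.
    features = np.concatenate(
        [position, scale, color, opacity[:, None], rotation], axis=1
    )

    # Spherical angles of each Gaussian center, measured about the origin.
    x, y, z = position[:, 0], position[:, 1], position[:, 2]
    radius = np.maximum(np.linalg.norm(position, axis=1), 1e-12)
    theta = np.arctan2(y, x)                       # azimuth in [-pi, pi]
    phi = np.arccos(np.clip(z / radius, -1.0, 1.0))  # polar angle in [0, pi]

    # Quantize the angles to integer pixel coordinates.
    u = np.clip(((theta + np.pi) / (2.0 * np.pi) * width).astype(int), 0, width - 1)
    v = np.clip((phi / np.pi * height).astype(int), 0, height - 1)

    # Assumption: when several Gaussians share a pixel, keep the most opaque one.
    order = np.argsort(-opacity, kind="stable")    # most opaque first
    linear = (v * width + u)[order]
    _, first = np.unique(linear, return_index=True)  # first hit per pixel
    keep = order[first]

    uv_map = np.zeros((height, width, features.shape[1]), dtype=np.float32)
    occupied = np.zeros((height, width), dtype=bool)
    uv_map[v[keep], u[keep]] = features[keep]
    occupied[v[keep], u[keep]] = True
    return uv_map, occupied
```

In this sketch the UV map has `height * width` slots, so raising the resolution directly raises the number of Gaussians that can be stored, which mirrors the scalability argument above. The unordered set of Gaussians also becomes a fixed grid, which is what allows image models to consume it.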