This paper proposes ProEdit, a simple yet effective framework for high-quality 3D scene editing guided by diffusion distillation in a novel progressive manner. Inspired by the crucial observation that multi-view inconsistency in scene editing is rooted in the diffusion model's large feasible output space (FOS), our framework controls the size of the FOS and reduces inconsistency by decomposing the overall editing task into several subtasks, which are then executed progressively on the scene. Within this framework, we design a difficulty-aware subtask decomposition scheduler and an adaptive 3D Gaussian splatting (3DGS) training strategy, ensuring high quality and efficiency in performing each subtask. Extensive evaluation shows that ProEdit achieves state-of-the-art results in various scenes and challenging editing tasks, all through a simple framework without any expensive or sophisticated add-ons such as distillation losses, components, or training procedures. Notably, ProEdit also provides a new way to control, preview, and select the "aggressivity" of an editing operation during the editing process.