Achieving high-fidelity 3D reconstruction from monocular video remains challenging due to the inherent limitations of traditional methods such as Structure-from-Motion (SfM) and monocular SLAM in accurately capturing scene details. While differentiable rendering techniques such as Neural Radiance Fields (NeRF) address some of these challenges, their high computational cost makes them unsuitable for real-time applications. Moreover, existing 3D Gaussian Splatting (3DGS) methods often focus on photometric consistency, neglecting geometric accuracy and failing to exploit SLAM's dynamic depth and pose updates for scene refinement. We propose a framework that integrates dense SLAM with 3DGS for real-time, high-fidelity dense reconstruction. Our approach introduces SLAM-Informed Adaptive Densification, which dynamically updates and densifies the Gaussian model by leveraging dense point clouds from SLAM. Furthermore, we incorporate Geometry-Guided Optimization, which combines edge-aware geometric constraints with photometric consistency to jointly optimize the appearance and geometry of the 3DGS scene representation, enabling detailed and geometrically accurate reconstruction of the SLAM map. Experiments on the Replica and TUM-RGBD datasets demonstrate the effectiveness of our approach, which achieves state-of-the-art results among monocular systems. Specifically, our method attains a PSNR of 36.864 dB, an SSIM of 0.985, and an LPIPS of 0.040 on Replica, representing improvements of 10.7%, 6.4%, and 49.4%, respectively, over the previous state-of-the-art. On TUM-RGBD, our method outperforms the closest baseline by 10.2%, 6.6%, and 34.7% on the same metrics. These results highlight the potential of our framework to bridge the gap between photometric and geometric dense 3D scene representations, paving the way for practical and efficient monocular dense reconstruction.
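To make the two components concrete, the following minimal PyTorch-style sketch illustrates one plausible reading of them; it is not the authors' implementation. All names and parameters here (`slam_informed_densify`, `edge_aware_geo_loss`, `total_loss`, `voxel`, `lambda_geo`) are hypothetical and assumed for illustration only; the actual system runs these steps inside a full SLAM/3DGS pipeline.

```python
import torch
import torch.nn.functional as F

def slam_informed_densify(gaussian_xyz, slam_xyz, voxel=0.02):
    # Hypothetical sketch of SLAM-informed adaptive densification: append
    # dense SLAM points whose voxel cell is not yet covered by any Gaussian.
    occ = torch.unique(torch.floor(gaussian_xyz / voxel), dim=0)   # (M, 3) occupied cells
    cand = torch.floor(slam_xyz / voxel)                           # (N, 3) candidate cells
    # Brute-force cell matching for clarity; a hash grid would scale better.
    covered = (cand[:, None, :] == occ[None, :, :]).all(-1).any(-1)
    return torch.cat([gaussian_xyz, slam_xyz[~covered]], dim=0)

def edge_aware_geo_loss(rendered_depth, slam_depth, rgb):
    # Hypothetical edge-aware geometric term: match rendered depth gradients
    # to SLAM depth gradients, down-weighting pixels on image edges so sharp
    # appearance boundaries are not over-smoothed.
    def gx(t): return t[..., :, 1:] - t[..., :, :-1]
    def gy(t): return t[..., 1:, :] - t[..., :-1, :]
    wx = torch.exp(-gx(rgb).abs().mean(dim=-3, keepdim=True))
    wy = torch.exp(-gy(rgb).abs().mean(dim=-3, keepdim=True))
    dx = (gx(rendered_depth) - gx(slam_depth)).abs()
    dy = (gy(rendered_depth) - gy(slam_depth)).abs()
    return (wx * dx).mean() + (wy * dy).mean()

def total_loss(rendered_rgb, gt_rgb, rendered_depth, slam_depth, lambda_geo=0.1):
    # Photometric consistency (L1) jointly optimized with the geometric term.
    return F.l1_loss(rendered_rgb, gt_rgb) + \
           lambda_geo * edge_aware_geo_loss(rendered_depth, slam_depth, gt_rgb)
```

Under these assumptions, the two pieces would interleave with tracking: densify whenever the SLAM front end produces new dense points and updated poses, then take gradient steps on `total_loss` to refine the Gaussians' appearance and geometry together.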