MVSDet: Multi-View Indoor 3D Object Detection via Efficient Plane Sweeps

The key challenge of multi-view indoor 3D object detection is inferring accurate geometry from images for precise 3D detection. Previous work relies on NeRF for geometry reasoning. However, the geometry extracted from NeRF is generally inaccurate, which leads to sub-optimal detection performance. In this paper, we propose MVSDet, which uses plane sweep for geometry-aware 3D object detection. To avoid needing a large number of depth planes for accurate depth prediction, we design a probabilistic sampling and soft weighting mechanism that decides where pixel features are placed in the 3D volume. For each pixel, we select the multiple locations that score highest in the probability volume and use their probability scores to indicate confidence. We further apply recent pixel-aligned Gaussian Splatting to regularize depth prediction and improve detection performance with little computational overhead. Extensive experiments on the ScanNet and ARKitScenes datasets show the superiority of our model. Our code is available at this https URL.
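
The per-pixel selection described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`topk_depth_placement`, `weight_pixel_features`), the array layouts `(D, H, W)` and `(C, H, W)`, and the default `k=3` are all assumptions made for illustration. It shows only the idea of keeping the top-scoring depth locations for each pixel and using their probabilities as soft weights; back-projecting the weighted features into the 3D volume with camera parameters is omitted.

```python
import numpy as np


def topk_depth_placement(prob_volume, depth_values, k=3):
    """Pick the k most probable depth planes for every pixel.

    prob_volume:  (D, H, W) per-pixel probability over D depth planes (assumed layout).
    depth_values: (D,) depth of each plane.
    Returns (depths, weights), both shaped (k, H, W). The k entries are unordered.
    """
    num_planes = prob_volume.shape[0]
    if not 1 <= k <= num_planes:
        raise ValueError("k must be between 1 and the number of depth planes")
    # Indices of the k largest probabilities along the depth axis.
    idx = np.argpartition(prob_volume, -k, axis=0)[-k:]
    # The probability of each selected plane serves as its confidence weight.
    weights = np.take_along_axis(prob_volume, idx, axis=0)
    depths = depth_values[idx]
    return depths, weights


def weight_pixel_features(features, weights):
    """Scale pixel features by the confidence of each selected depth location.

    features: (C, H, W) image features; weights: (k, H, W).
    Returns (k, C, H, W): one weighted copy of the features per selected location.
    """
    return weights[:, None, :, :] * features[None, :, :, :]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    logits = rng.normal(size=(8, 4, 5))  # 8 depth planes, 4x5 image
    prob = np.exp(logits) / np.exp(logits).sum(axis=0, keepdims=True)
    planes = np.linspace(0.5, 5.0, 8)  # hypothetical depth range in meters
    depths, weights = topk_depth_placement(prob, planes, k=3)
    feats = rng.normal(size=(16, 4, 5))
    print(depths.shape, weights.shape, weight_pixel_features(feats, weights).shape)
```

In this sketch the raw probabilities are used as weights without renormalization, so a pixel whose distribution is spread over many planes contributes less at each selected location than a pixel with a sharp peak.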

