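- Abstract: Recent advances in query-based multi-camera 3D object detection are characterized by initializing object queries in 3D space and then sampling features from perspective-view images for multi-round query refinement. In such a framework, query points near the same camera ray are likely to sample similar features from very close pixels, resulting in ambiguous query features and degraded detection accuracy. To this end, we introduce RayFormer, a camera-ray-inspired query-based 3D object detector that aligns the initialization and feature extraction of object queries with the optical characteristics of cameras. Specifically, RayFormer transforms perspective-view image features into bird's eye view (BEV) via the lift-splat-shoot method and segments the BEV map into sectors based on the camera rays. Object queries are uniformly and sparsely initialized along each camera ray, which helps project different queries onto different areas of the image so that they extract distinct features. In addition, we leverage the instance information of images to supplement the uniformly initialized object queries with additional queries placed along the rays from 2D object detection boxes. To extract unique object-level features that cater to distinct queries, we design a ray sampling method that suitably organizes the distribution of feature sampling points on both images and the bird's eye view. Extensive experiments on the nuScenes dataset validate our proposed ray-inspired model design. The proposed RayFormer achieves 55.5% mAP and 63.3% NDS. Our code will be made available.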
- Problem addressed: RayFormer: Camera-Ray-Inspired 3D Object Detection via Sparse and Dense Feature Fusion
- Key idea: The paper aligns the initialization and feature extraction of object queries with the optical characteristics of cameras. Perspective-view image features are transformed into bird's eye view (BEV), the BEV map is segmented into sectors based on the camera rays, and object queries are initialized uniformly and sparsely along each camera ray. The paper also uses instance information from the images to supplement the uniformly initialized queries with additional queries along the rays from 2D object detection boxes. Finally, a ray sampling method organizes the distribution of feature sampling points on both the images and the BEV so that each query extracts distinct object-level features.
- Other highlights: RayFormer achieves 55.5% mAP and 63.3% NDS on the nuScenes dataset, and the paper reports extensive experiments that validate the ray-inspired model design. The authors state that the code will be made available. The approach offers a new perspective on query-based multi-camera 3D object detection.
- Related work: point-cloud-based 3D detectors in this field include 'PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection' and 'Structure Aware Single-stage 3D Object Detection from Point Cloud' (SA-SSD).
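
The ray-aligned query initialization described under the key idea can be illustrated with a small sketch. It is not the authors' implementation: it assumes a single pinhole camera at the BEV origin looking along +x, ignores height, and uses made-up values for the focal length, principal point, field of view, and range. The helper names `init_ray_queries` and `image_column` are hypothetical.

```python
import math


def init_ray_queries(num_rays, num_per_ray, r_min, r_max, half_fov):
    """Place BEV query points uniformly along evenly spaced camera rays.

    Returns a list of rays; each ray is a list of (x, y) points in a BEV
    frame with x pointing forward from the camera and y pointing left.
    """
    queries = []
    for i in range(num_rays):
        # Angle at the centre of the i-th angular sector of the field of view.
        theta = -half_fov + (i + 0.5) * (2 * half_fov) / num_rays
        ray = []
        for j in range(num_per_ray):
            # Radius at the centre of the j-th radial bin along this ray.
            r = r_min + (j + 0.5) * (r_max - r_min) / num_per_ray
            ray.append((r * math.cos(theta), r * math.sin(theta)))
        queries.append(ray)
    return queries


def image_column(x, y, focal=1000.0, cx=800.0):
    """Horizontal pixel coordinate of a BEV point under a pinhole model.

    focal and cx are illustrative values, not taken from any dataset.
    """
    return cx - focal * y / x


# Illustrative settings: 6 rays, 3 queries per ray, 60 degree field of view.
queries = init_ray_queries(
    num_rays=6, num_per_ray=3, r_min=2.0, r_max=50.0,
    half_fov=math.radians(30),
)
columns = [[image_column(x, y) for x, y in ray] for ray in queries]

for i, cols in enumerate(columns):
    print(f"ray {i}: image columns {[round(c, 1) for c in cols]}")
```

Under these assumptions, every query on a given ray projects to the same image column, while queries on different rays land in different columns. This is the geometric reason the summary gives for keeping queries sparse along each ray and spreading them across rays.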