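It opens up a third approach to temporal modeling, the object-centric strategy. The advantages of this strategy: only a small number of object queries need to be processed instead of a dense feature map, so the computational cost is low; and it accounts not only for positional priors but also for semantic similarity through global attention. Main families of temporal modeling in camera-only 3D detection: Method family | Description | Issues | Representative methods. BEV temporal methods warp BEV fe...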
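To make the object-centric idea concrete, here is a minimal sketch of global attention between a handful of current-frame object queries and queries kept from earlier frames. It is an illustration under stated assumptions, not the StreamPETR implementation: the function names `softmax` and `attend` and the toy vectors are hypothetical, and learned projections, positional encodings, and motion compensation are all left out.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(current_queries, memory_queries):
    """Scaled dot-product attention: every current query looks at every
    memory query, so matching is driven by feature (semantic) similarity.
    Hypothetical helper; keys and values are both the raw memory queries."""
    dim = len(current_queries[0])
    scale = 1.0 / math.sqrt(dim)
    fused = []
    for q in current_queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in memory_queries]
        weights = softmax(scores)
        fused.append([
            sum(w * v[d] for w, v in zip(weights, memory_queries))
            for d in range(dim)
        ])
    return fused


# Toy data (assumed): 2 current queries, 3 memory queries, 4 channels each.
current = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
memory = [[4.0, 0.0, 0.0, 0.0], [0.0, 4.0, 0.0, 0.0], [0.0, 0.0, 4.0, 0.0]]
fused = attend(current, memory)
```

The cost of this step scales with the number of current queries times the number of memory queries, not with the size of a dense feature map, which is the efficiency argument made above.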
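DETR3D: DETR3D follows the DETR paradigm and detects 3D objects with attention. Its accuracy is similar to FCOS3D at half the computational cost; nevertheless, its complex computation pipeline leaves its inference speed the same as FCOS3D. Graph-DETR3D extends DETR3D by sampling multiple points in 3D space to generate the features of object queries and by dynamically adjusting depth targets according to the scale factor, which makes multi-scale training possible. PETR: by introducing 3D coordinate generation and position encoding, PETR...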
This post introduces a camera-only object detection method: StreamPETR. S. Wang, Y. Liu, T. Wang, Y. Li, X. Zhang. Exploring Object-Centric Temporal Modeling for Efficient Multi-View 3D Object Detection. ICCV, 2023. Paper link …
In this paper, we propose a point-trajectory transformer with long short-term memory for efficient temporal 3D object detection. To this end, we only utilize point clouds of current-frame objects and their historical trajectories as input to minimize the memory bank storage requirement. Furthermore...
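Because Temporal Action Detection and Object Detection are similar, many Temporal Action Detection methods adopt frameworks close to those used for Object Detection (for example, R-C3D uses a structure similar to Faster R-CNN). Temporal action detection poses several difficulties, and existing methods mainly target them: 1) bounding boxes in object detection are well defined, whereas temporal action boundaries are fuzzy; 2) temporal action detection must combine static images (frames) with temporal...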
Accurate and reliable 3D detection is vital for many applications including autonomous driving vehicles and service robots. In this paper, we present a flexible and high-performance 3D detection framework, named MPPNet, for 3D temporal object detection with point cloud sequences. We propose a novel...
While recent camera-only 3D detection methods leverage multiple timesteps, the limited history they use significantly hampers the extent to which temporal fusion can improve object perception. Observing that existing works' fusion of multi-frame images are instances of temporal stereo matching, we find...
[NeurIPS 2023] Query-based Temporal Fusion with Explicit Motion for 3D Object Detection. Introduction: This repository is an official implementation of QTNet. In this paper, we propose a simple and effective Query-based Temporal Fusion Network (QTNet). The main idea is to exploit the object queries ...
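The authors first compare three temporal approaches in BEV: fusion at the level of dense BEV features, fusion at the level of BEV proposals, and fusion at the level of query features. Computational cost and feature dimensionality clearly decrease in that order, so their conclusion is that query-based methods have the highest ceiling and the lowest computational cost. The three models compared are MGTANet (BEV dense), MPPNet (proposal), and their own method. The overall temporal design looks very similar to Megvii's StreamPETR.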
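The ordering claimed above can be illustrated with a back-of-the-envelope count of how many feature values each fusion level carries per frame. Every size below is a hypothetical placeholder chosen only for illustration; none is taken from MGTANet, MPPNet, or QTNet, and the helper `feature_elements` is likewise an assumption.

```python
def feature_elements(num_items, channels):
    """Number of scalar feature values for num_items vectors of width channels."""
    return num_items * channels


# Hypothetical sizes, NOT measured from any of the models named above.
channels = 256
bev_height, bev_width = 128, 128            # dense BEV grid cells
num_proposals, points_per_proposal = 100, 32
num_queries = 300

dense = feature_elements(bev_height * bev_width, channels)
proposal = feature_elements(num_proposals * points_per_proposal, channels)
query = feature_elements(num_queries, channels)

for name, count in [("BEV dense", dense), ("proposal", proposal), ("query", query)]:
    print(f"{name:>10}: {count:,} feature values per frame")
```

Under these assumed sizes the count shrinks at each level, which matches the qualitative trend the authors report; the actual ratios depend on each model's configuration.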
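Extensive experiments show that it generalizes well to other sparse-query-based methods such as DETR3D. 2 Related Work. Multi-view 3D object detection: In autonomous driving, multi-view 3D detection is an important task that requires continuously processing multi-camera images and predicting 3D bounding boxes over time. Pioneering work focused on efficiently transforming multiple perspective views into a unified 3D space for a single frame. This transformation can be divided into BEV-based methods and sparse-query-based methods.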