From today's 爱可可AI frontier paper recommendations
[CV] FlatFormer: Flattened Window Attention for Efficient Point Cloud Transformer
Z Liu, X Yang, H Tang, S Yang, S Han
[MIT]
Highlights:
- Proposes FlatFormer, a method that closes the efficiency gap between point cloud transformers and sparse convolutional models;
- Flattens the point cloud with window-based sorting and partitions points into equal-size groups, avoiding expensive structuring and padding overheads;
- Improves performance on large-scale benchmarks: FlatFormer delivers state-of-the-art accuracy on the Waymo Open Dataset with a 4.6x speedup over previous point cloud transformers.
One-sentence summary:
FlatFormer partitions point clouds into equal-size groups rather than equal-shape windows to improve computational regularity. It achieves state-of-the-art accuracy on large-scale benchmarks such as the Waymo Open Dataset, runs 4.6x faster than previous point cloud transformers, and is faster than sparse convolutional methods while matching or exceeding their accuracy.
Abstract:
Transformer, as an alternative to CNN, has been proven effective in many modalities (e.g., texts and images). For 3D point cloud transformers, existing efforts focus primarily on pushing their accuracy to the state-of-the-art level. However, their latency lags behind sparse convolution-based models (3x slower), hindering their usage in resource-constrained, latency-sensitive applications (such as autonomous driving). This inefficiency comes from point clouds' sparse and irregular nature, whereas transformers are designed for dense, regular workloads. This paper presents FlatFormer to close this latency gap by trading spatial proximity for better computational regularity. We first flatten the point cloud with window-based sorting and partition points into groups of equal sizes rather than windows of equal shapes. This effectively avoids expensive structuring and padding overheads. We then apply self-attention within groups to extract local features, alternate sorting axis to gather features from different directions, and shift windows to exchange features across groups. FlatFormer delivers state-of-the-art accuracy on Waymo Open Dataset with 4.6x speedup over (transformer-based) SST and 1.4x speedup over (sparse convolutional) CenterPoint. This is the first point cloud transformer that achieves real-time performance on edge GPUs and is faster than sparse convolutional methods while achieving on-par or even superior accuracy on large-scale benchmarks. Code to reproduce our results will be made publicly available.
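To make the core idea concrete, here is a minimal PyTorch sketch of the flattening step the abstract describes: points are sorted in window-major order, the sorted sequence is cut into equal-size groups so that only the tail needs padding, and self-attention runs within each group. The function name `flatten_and_group`, the simplified sort key, and the toy shapes are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn

def flatten_and_group(coords, feats, window=(8, 8), group_size=64):
    """Hypothetical sketch of FlatFormer-style flattening.

    coords: (N, 2) integer BEV grid coordinates of non-empty voxels.
    feats:  (N, C) per-point features.
    Returns (num_groups, group_size, C) groups plus the sort order.
    """
    # Window-based sorting: order points by window id first, then by the
    # local coordinate inside the window. This sort key is a simplified
    # stand-in for the paper's; it assumes window ids fit below 4096.
    win = coords // torch.tensor(window, dtype=coords.dtype)
    key = ((win[:, 0] * 4096 + win[:, 1]) * window[0]
           + coords[:, 0] % window[0]) * window[1] + coords[:, 1] % window[1]
    order = torch.argsort(key)
    feats = feats[order]

    # Equal-size grouping: pad only the tail of the flattened sequence,
    # instead of padding every window to a fixed shape.
    n, c = feats.shape
    pad = (-n) % group_size
    feats = torch.cat([feats, feats.new_zeros(pad, c)])
    return feats.view(-1, group_size, c), order

# Local self-attention within each group (groups act as the batch dim).
coords = torch.randint(0, 256, (1000, 2))
feats = torch.randn(1000, 32)
groups, order = flatten_and_group(coords, feats)      # (16, 64, 32)
attn = nn.MultiheadAttention(embed_dim=32, num_heads=4, batch_first=True)
out, _ = attn(groups, groups, groups)                 # (16, 64, 32)
```

In the full model, successive blocks alternate the sorting axis and shift the window origin so features propagate across groups; in this sketch, those variants would simply re-run the sort with a different key.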
Paper: https://arxiv.org/abs/2301.08739