The Chinese University of Hong Kong | Joint-MAE: 2D-3D Joint Masked Autoencoders for 3D Point Cloud Pre-training

Joint-MAE: 2D-3D Joint Masked Autoencoders for 3D Point Cloud Pre-training

Ziyu Guo, Xianzhi Li, Pheng Ann Heng

The Chinese University of Hong Kong & Huazhong University of Science and Technology

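Key points:
1. Masked Autoencoders (MAE) have shown promising performance in self-supervised learning for both 2D and 3D computer vision. However, existing MAE-style methods learn from only a single modality, either images or point clouds, and so ignore the implicit semantic and geometric correlations between 2D and 3D.

2. The paper explores how the 2D modality can benefit 3D masked autoencoding and proposes Joint-MAE, a 2D-3D joint MAE framework for self-supervised 3D point cloud pre-training. Joint-MAE randomly masks an input 3D point cloud and its projected 2D images, then reconstructs the masked information of both modalities. For better cross-modal interaction, Joint-MAE is built from two hierarchical 2D-3D embedding modules, a joint encoder, and a joint decoder with modal-shared and modal-specific decoders. On top of this, two cross-modal strategies further boost 3D representation learning: local-aligned attention mechanisms for 2D-3D semantic cues, and a cross-reconstruction loss for 2D-3D geometric constraints.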
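The random masking step described above can be illustrated with a small NumPy sketch. This is not the authors' code: the token counts, the feature width, the 0.75 mask ratio, and the helper name `random_mask` are all hypothetical choices made for illustration. It shows only the general MAE-style idea of hiding a random subset of tokens in each modality and passing the visible ones on together.

```python
import numpy as np

def random_mask(num_tokens, mask_ratio, rng):
    """Return a boolean array where True marks a masked token."""
    num_masked = int(round(num_tokens * mask_ratio))
    mask = np.zeros(num_tokens, dtype=bool)
    masked_idx = rng.choice(num_tokens, size=num_masked, replace=False)
    mask[masked_idx] = True
    return mask

rng = np.random.default_rng(0)

# Hypothetical sizes: 64 point-patch tokens and 49 image-patch tokens,
# each with a 32-dimensional feature (assumed, not from the paper).
point_tokens = rng.normal(size=(64, 32))
image_tokens = rng.normal(size=(49, 32))

# Mask each modality independently with an assumed ratio of 0.75.
point_mask = random_mask(len(point_tokens), 0.75, rng)
image_mask = random_mask(len(image_tokens), 0.75, rng)

# In MAE-style pipelines, typically only the visible tokens are encoded;
# here the visible tokens of both modalities are concatenated.
visible = np.concatenate([point_tokens[~point_mask], image_tokens[~image_mask]])
print(visible.shape)
```

The masked positions (`point_mask`, `image_mask`) are what a decoder would later be asked to reconstruct.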

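One-sentence summary:

With this pre-training paradigm, Joint-MAE achieves superior performance on multiple downstream tasks, e.g., 92.4% accuracy with a linear SVM on ModelNet40 and 86.07% accuracy on the hardest split of ScanObjectNN.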

Masked Autoencoders (MAE) have shown promising performance in self-supervised learning for both 2D and 3D computer vision. However, existing MAE-style methods can only learn from the data of a single modality, i.e., either images or point clouds, which neglects the implicit semantic and geometric correlation between 2D and 3D. In this paper, we explore how the 2D modality can benefit 3D masked autoencoding, and propose Joint-MAE, a 2D-3D joint MAE framework for self-supervised 3D point cloud pre-training. Joint-MAE randomly masks an input 3D point cloud and its projected 2D images, and then reconstructs the masked information of the two modalities. For better cross-modal interaction, we build Joint-MAE from two hierarchical 2D-3D embedding modules, a joint encoder, and a joint decoder with modal-shared and modal-specific decoders. On top of this, we further introduce two cross-modal strategies to boost 3D representation learning: local-aligned attention mechanisms for 2D-3D semantic cues, and a cross-reconstruction loss for 2D-3D geometric constraints. With our pre-training paradigm, Joint-MAE achieves superior performance on multiple downstream tasks, e.g., 92.4% accuracy for linear SVM on ModelNet40 and 86.07% accuracy on the hardest split of ScanObjectNN.
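The abstract does not spell out how the cross-reconstruction loss is computed. One plausible reading, sketched below purely as an assumption, is that a reconstructed point cloud is projected into a 2D depth map and compared with a 2D target. The orthographic projection, the 8x8 grid, the mean-squared-error comparison, and the function names `project_to_depth` and `cross_reconstruction_loss` are all hypothetical and are not taken from the paper.

```python
import numpy as np

def project_to_depth(points, size=8):
    """Orthographically project points in [-1, 1]^3 onto the x-y plane.

    Each pixel keeps the largest shifted z value that falls into it
    (0 if no point lands there). This is an assumed, simplified projection.
    """
    depth = np.zeros((size, size))
    # Map x, y from [-1, 1] to integer pixel indices in [0, size - 1].
    cols = np.clip(((points[:, 0] + 1) / 2 * size).astype(int), 0, size - 1)
    rows = np.clip(((points[:, 1] + 1) / 2 * size).astype(int), 0, size - 1)
    # Shift z from [-1, 1] to [0, 2] so that depth values are non-negative.
    np.maximum.at(depth, (rows, cols), points[:, 2] + 1)
    return depth

def cross_reconstruction_loss(reconstructed_points, target_depth):
    """MSE between the projected reconstruction and a 2D target depth map."""
    pred_depth = project_to_depth(reconstructed_points, size=target_depth.shape[0])
    return float(np.mean((pred_depth - target_depth) ** 2))

rng = np.random.default_rng(0)
cloud = rng.uniform(-1, 1, size=(256, 3))   # stand-in for a ground-truth cloud
target = project_to_depth(cloud)            # stand-in for the 2D target

# A perfect reconstruction gives zero loss; a perturbed one does not.
noisy = np.clip(cloud + rng.normal(scale=0.1, size=cloud.shape), -1, 1)
loss_same = cross_reconstruction_loss(cloud, target)
loss_noisy = cross_reconstruction_loss(noisy, target)
print(loss_same, loss_noisy)
```

This NumPy version is not differentiable with respect to the point positions, so it only illustrates the geometric constraint; an actual training loss would need a differentiable projection in a deep learning framework.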

https://arxiv.org/pdf/2302.14007.pdf
