- Summary: Vision-based robot policy learning requires a comprehensive understanding of diverse visual tasks, beyond the needs of any single task such as classification or segmentation. Motivated by this, we introduce Theia, a vision foundation model for robot learning that distills multiple off-the-shelf vision foundation models trained on varied vision tasks. Theia's rich visual representations encode diverse visual knowledge and enhance downstream robot learning. Extensive experiments show that Theia outperforms its teacher models and prior robot-learning models while using less training data and smaller model sizes. In addition, we quantify the quality of pre-trained visual representations and hypothesize that higher entropy in the feature-norm distribution leads to better robot-learning performance. Code and models are available at https://github.com/bdaiinstitute/theia.
- Problem addressed: The paper aims to improve vision-based robot policy learning by developing a vision foundation model that can handle diverse visual tasks, beyond single-task needs.
- Key idea: The key idea is to distill multiple off-the-shelf vision foundation models trained on varied vision tasks into one model, and use its rich visual representations to enhance downstream robot learning.
- Other highlights: Theia outperforms its teacher models and prior robot-learning models while using less training data and smaller model sizes. The paper also quantifies the quality of pre-trained visual representations and hypothesizes that higher entropy in the feature-norm distribution leads to better robot-learning performance. Code and models are available at https://github.com/bdaiinstitute/theia.
- Related works: Some related works in this field include "Learning to Grasp with Heterogeneous Multimodal Sensing: A Data-Driven Approach" and "Learning to Learn from Simulation: Faster and Better Policy via Sim2Real and Meta Learning".
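The multi-teacher distillation objective described in the key idea can be sketched as follows. This is a minimal, hypothetical illustration (not Theia's actual code): a shared student representation is mapped by per-teacher "translator" heads into each teacher's feature space, and the training loss sums the per-teacher regression errors against the frozen teachers' features. All shapes, names, and the linear translators are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def translate(h, W):
    # Hypothetical per-teacher "translator": a linear map from the student's
    # shared representation into that teacher's feature space.
    return h @ W

def distill_loss(h, translators, teacher_feats):
    # Sum of per-teacher mean-squared errors between translated student
    # features and the frozen teachers' features.
    return sum(np.mean((translate(h, W) - f) ** 2)
               for W, f in zip(translators, teacher_feats))

h = rng.normal(size=(4, 8))                 # student representation (batch, dim)
teacher_feats = [rng.normal(size=(4, 16)),  # e.g. features from one teacher (CLIP-like)
                 rng.normal(size=(4, 24))]  # e.g. features from another (DINO-like)
translators = [rng.normal(size=(8, 16)) * 0.1,
               rng.normal(size=(8, 24)) * 0.1]
loss = distill_loss(h, translators, teacher_feats)
```

In a real training loop the backbone and translators would be learned by gradient descent; here the point is only that one shared representation is supervised by several teachers at once.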
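The highlighted hypothesis about "entropy in the feature-norm distribution" can be made concrete with a sketch. The function below is one plausible way to compute such a quantity (the binning and exact recipe are assumptions, not necessarily the paper's): take the L2 norm of each token feature from a pre-trained encoder, histogram the norms, and compute the Shannon entropy of that histogram. A representation whose token norms are all identical has zero entropy; more varied norms give higher entropy.

```python
import numpy as np

def feature_norm_entropy(features, bins=64):
    # features: array of token features, shape (num_tokens, feature_dim).
    norms = np.linalg.norm(features, axis=-1).ravel()  # per-token L2 norms
    hist, _ = np.histogram(norms, bins=bins)
    p = hist / hist.sum()                              # empirical norm distribution
    p = p[p > 0]                                       # drop empty bins
    return float(-(p * np.log(p)).sum())               # Shannon entropy (nats)

rng = np.random.default_rng(0)
spread = rng.normal(size=(256, 768))   # tokens with varied norms -> higher entropy
peaked = np.ones((256, 768))           # identical norms -> zero entropy
```

Under the paper's hypothesis, a higher value of such a measure on a pre-trained representation would predict better downstream robot-learning performance.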