Internet Explorer: Targeted Representation Learning on the Open Web

Alexander C. Li, Ellis Brown, Alexei A. Efros, Deepak Pathak

Carnegie Mellon University & University of California

Internet Explorer:开放网络上的目标表征学习
要点:
1.现代视觉模型通常依赖于在大型静态数据集上预训练的微调通用模型。 这些通用模型仅捕获其预训练数据集中的知识,这些数据是微小的、过时的互联网快照——每天上传数十亿张图片。

2.文章提出了一种替代方法:与其希望我们的静态数据集在大规模预训练后转移到我们想要的任务,不如动态地利用互联网快速训练一个在手头任务上表现出色的小规模模型。 方法称为 Internet Explorer,它以自我监督的方式探索网络,以逐步找到可提高所需目标数据集性能的相关示例。 它在使用文本查询在 Internet 上搜索图像、对下载图像进行自我监督训练、确定哪些图像有用以及确定下一步搜索的优先级之间循环。

一句话总结:

我们跨多个数据集评估 Internet Explorer 并表明它优于或匹配 CLIP oracle 性能,仅使用单个 GPU 桌面主动查询 Internet 30--40 小时。 https://internet-explorer-ssl.github.io/ 上的结果、可视化和视频。[机器翻译+人工校对]

Modern vision models typically rely on fine-tuning general-purpose models pre-trained on large, static datasets. These general-purpose models only capture the knowledge within their pre-training datasets, which are tiny, out-of-date snapshots of the Internet -- where billions of images are uploaded each day. We suggest an alternate approach: rather than hoping our static datasets transfer to our desired tasks after large-scale pre-training, we propose dynamically utilizing the Internet to quickly train a small-scale model that does extremely well on the task at hand. Our approach, called Internet Explorer, explores the web in a self-supervised manner to progressively find relevant examples that improve performance on a desired target dataset. It cycles between searching for images on the Internet with text queries, self-supervised training on downloaded images, determining which images were useful, and prioritizing what to search for next. We evaluate Internet Explorer across several datasets and show that it outperforms or matches CLIP oracle performance by using just a single GPU desktop to actively query the Internet for 30--40 hours. Results, visualizations, and videos at https://internet-explorer-ssl.github.io/

https://arxiv.org/pdf/2302.14051.pdf

内容中包含的图片若涉及版权问题,请及时与我们联系删除