Learning Visual Representations via Language-Guided Sampling
M E Banani, K Desai, J Johnson
[University of Michigan]
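Key points:
- Language-guided sampling can improve visual learning;
- Pre-trained language models can be used to sample similar captions for contrastive learning;
- Language-guided learning can learn better features than image-based contrastive learning;
- For unlabeled datasets, nearest-neighbor instances and language sampling outperform other approaches.
One-sentence summary:
Using language similarity to sample semantically similar image pairs for contrastive learning yields better features than both image-image and image-text representation learning methods.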
Although an object may appear in numerous contexts, we often describe it in a limited number of ways. This happens because language abstracts away visual variation to represent and communicate concepts. Building on this intuition, we propose an alternative approach to visual learning: using language similarity to sample semantically similar image pairs for contrastive learning. Our approach deviates from image-based contrastive learning by using language to sample pairs instead of hand-crafted augmentations or learned clusters. Our approach also deviates from image-text contrastive learning by relying on pre-trained language models to guide the learning rather than minimize a cross-modal similarity. Through a series of experiments, we show that language-guided learning can learn better features than both image-image and image-text representation learning approaches.
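The sampling idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes caption embeddings have already been produced by some pre-trained language model, pairs each image with the image whose caption is closest in cosine similarity, and scores the resulting pairs with a standard InfoNCE-style contrastive loss. The function names `sample_language_guided_pairs` and `info_nce_loss`, the use of cosine similarity, and the temperature value are all illustrative assumptions.

```python
import numpy as np


def _l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length (guarding against zero rows)."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)


def sample_language_guided_pairs(caption_embeddings: np.ndarray) -> np.ndarray:
    """For each caption i, return the index of its most similar other caption.

    Assumption: row i of `caption_embeddings` is the embedding of the caption
    attached to image i, so (i, result[i]) is a language-sampled image pair.
    """
    unit = _l2_normalize(caption_embeddings)
    similarity = unit @ unit.T            # cosine similarity between captions
    np.fill_diagonal(similarity, -np.inf)  # never pair a caption with itself
    return similarity.argmax(axis=1)


def info_nce_loss(z_a: np.ndarray, z_b: np.ndarray, temperature: float = 0.1) -> float:
    """InfoNCE-style loss where row i of z_a and row i of z_b form a positive pair.

    Assumption: z_a and z_b are image features from a visual encoder; every
    other row in the batch is treated as a negative.
    """
    logits = _l2_normalize(z_a) @ _l2_normalize(z_b).T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


if __name__ == "__main__":
    # Toy caption embeddings: captions 0/1 are similar, captions 2/3 are similar.
    captions = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
    partners = sample_language_guided_pairs(captions)

    # Stand-in image features (random here; a real setup would use an encoder).
    rng = np.random.default_rng(0)
    image_features = rng.normal(size=(4, 8))
    loss = info_nce_loss(image_features, image_features[partners])
    print(partners, loss)
```

In this sketch the language model only decides which images are treated as positives. The loss itself is computed purely between image features, which is the distinction the abstract draws from image-text contrastive learning.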
Paper link: https://arxiv.org/abs/2302.12248


