From today's 爱可可 AI Frontier Picks
[CV] Peekaboo: Text to Image Diffusion Models are Zero-Shot Segmentors
R Burgert, K Ranasinghe, X Li, M S. Ryoo
[Stony Brook University]
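Key points:
- Proposes a new unsupervised segmentation mechanism that applies to both semantic segmentation and referring segmentation settings;
- Identifies the pixel-level localization information present in pretrained text-to-image diffusion models;
- Provides a mechanism for using the Stable Diffusion model as an off-the-shelf foundation model for downstream segmentation tasks.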
Abstract:
Recent diffusion-based generative models combined with vision-language models are capable of creating realistic images from natural language prompts. While these models are trained on large internet-scale datasets, such pre-trained models are not directly introduced to any semantic localization or grounding. Most current approaches for localization or grounding rely on human-annotated localization information in the form of bounding boxes or segmentation masks. The exceptions are a few unsupervised methods that utilize architectures or loss functions geared towards localization, but they need to be trained separately. In this work, we explore how off-the-shelf diffusion models, trained with no exposure to such localization information, are capable of grounding various semantic phrases with no segmentation-specific re-training. An inference time optimization process is introduced, that is capable of generating segmentation masks conditioned on natural language. We evaluate our proposal Peekaboo for unsupervised semantic segmentation on the Pascal VOC dataset. In addition, we evaluate for referring segmentation on the RefCOCO dataset. In summary, we present a first zero-shot, open-vocabulary, unsupervised (no localization information), semantic grounding technique leveraging diffusion-based generative models with no re-training. Our code will be released publicly.
Paper link: https://arxiv.org/abs/2211.13224
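The abstract describes an inference-time optimization that produces a segmentation mask conditioned on natural language, without re-training the model. The abstract does not give the objective, so the following is only a toy sketch of the general idea: the model stays frozen and the mask is the only thing being optimized. Everything here is hypothetical and not taken from the paper: `frozen_relevance` is a stand-in for a pretrained model, the prompts "left half" and "right half" are made up, and the objective is a simple placeholder rather than the authors' loss.

```python
import math


def frozen_relevance(prompt, height, width):
    """Hypothetical stand-in for a frozen pretrained model.

    Returns a per-pixel score in [0, 1] for how well each pixel matches
    the prompt. A real system would query a diffusion model instead.
    """
    regions = {
        "left half": lambda i, j: j < width // 2,
        "right half": lambda i, j: j >= width // 2,
    }
    inside = regions[prompt]
    return [[0.9 if inside(i, j) else 0.1 for j in range(width)]
            for i in range(height)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def optimize_mask(relevance, steps=200, lr=0.5):
    """Gradient ascent on mask logits only; the scorer is never updated.

    Placeholder objective (an assumption, not the paper's loss):
        sum over pixels of sigmoid(logit) * (relevance - 0.5)
    """
    height, width = len(relevance), len(relevance[0])
    logits = [[0.0] * width for _ in range(height)]
    for _ in range(steps):
        for i in range(height):
            for j in range(width):
                m = sigmoid(logits[i][j])
                # d/dlogit of m * (r - 0.5) is (r - 0.5) * m * (1 - m)
                grad = (relevance[i][j] - 0.5) * m * (1.0 - m)
                logits[i][j] += lr * grad
    # Threshold the soft mask into a binary segmentation mask.
    return [[sigmoid(v) > 0.5 for v in row] for row in logits]


if __name__ == "__main__":
    mask = optimize_mask(frozen_relevance("left half", 4, 6))
    for row in mask:
        print("".join("#" if v else "." for v in row))
```

Changing the prompt changes the mask while no model parameters are updated, which is the sense in which the optimization happens at inference time.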