From today's 爱可可 AI frontier picks

[CV] Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models

H Chefer, Y Alaluf, Y Vinker, L Wolf, D Cohen-Or
[Tel-Aviv University]

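Key points:

  1. Introduces the concept of Generative Semantic Nursing (GSN) to reduce catastrophic neglect of subjects and incorrect attribute binding in text-to-image diffusion models;
  2. Proposes Attend-and-Excite, an attention-based formulation of GSN that guides the model to attend to all subject tokens in the text prompt and improves the faithfulness of the generated images;
  3. Discusses potential applications of GSN to image editing and generation tasks beyond text-conditioned generation.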

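One-sentence summary:
Attend-and-Excite is an attention-based semantic guidance method that improves the faithfulness of text-to-image diffusion models by strengthening the activations of all subject tokens in the input text prompt.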

Abstract:
Recent text-to-image generative models have demonstrated an unparalleled ability to generate diverse and creative imagery guided by a target text prompt. While revolutionary, current state-of-the-art diffusion models may still fail in generating images that fully convey the semantics in the given text prompt. We analyze the publicly available Stable Diffusion model and assess the existence of catastrophic neglect, where the model fails to generate one or more of the subjects from the input prompt. Moreover, we find that in some cases the model also fails to correctly bind attributes (e.g., colors) to their corresponding subjects. To help mitigate these failure cases, we introduce the concept of Generative Semantic Nursing (GSN), where we seek to intervene in the generative process on the fly during inference time to improve the faithfulness of the generated images. Using an attention-based formulation of GSN, dubbed Attend-and-Excite, we guide the model to refine the cross-attention units to attend to all subject tokens in the text prompt and strengthen - or excite - their activations, encouraging the model to generate all subjects described in the text prompt. We compare our approach to alternative approaches and demonstrate that it conveys the desired concepts more faithfully across a range of text prompts.
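
The abstract describes the mechanism only in words: intervene on the fly at inference time so that every subject token receives strong cross-attention. The toy sketch below illustrates that idea under loud assumptions. The objective (1 minus the peak attention of the most neglected subject token), the finite-difference gradient step on the latent, and all names and values (`TOKEN_KEYS`, `SUBJECT_IDS`, `INITIAL_LATENT`, `neglect_loss`, `nurse_step`, `run_demo`) are hypothetical choices made for illustration; they are not taken from the abstract and this is not the authors' implementation. A real diffusion model would backpropagate through its denoising network instead of using finite differences.

```python
import math

# Toy setup (all values hypothetical): 3 prompt tokens, 2 spatial positions.
TOKEN_KEYS = [[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]]  # "cat", "and", "dog"
SUBJECT_IDS = [0, 2]                                # indices of subject tokens
INITIAL_LATENT = [[2.0, 0.0], [1.5, 0.0]]           # both positions favor "cat"


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(latent, token_keys):
    """attn[p][t]: how strongly spatial position p attends to token t."""
    return [
        softmax([sum(z * k for z, k in zip(position, key)) for key in token_keys])
        for position in latent
    ]


def neglect_loss(latent, token_keys, subject_ids):
    """Assumed objective: 1 minus the peak attention of the weakest subject."""
    attn = cross_attention(latent, token_keys)
    return max(1.0 - max(row[s] for row in attn) for s in subject_ids)


def nurse_step(latent, token_keys, subject_ids, step_size=1.0, eps=1e-5):
    """One gradient-descent step on the latent (central finite differences)."""
    updated = [row[:] for row in latent]
    for p, row in enumerate(latent):
        for d in range(len(row)):
            plus = [r[:] for r in latent]
            minus = [r[:] for r in latent]
            plus[p][d] += eps
            minus[p][d] -= eps
            grad = (
                neglect_loss(plus, token_keys, subject_ids)
                - neglect_loss(minus, token_keys, subject_ids)
            ) / (2 * eps)
            updated[p][d] = row[d] - step_size * grad
    return updated


def run_demo(steps=20):
    """Return the loss before and after `steps` latent updates."""
    latent = [row[:] for row in INITIAL_LATENT]
    initial = neglect_loss(latent, TOKEN_KEYS, SUBJECT_IDS)
    for _ in range(steps):
        latent = nurse_step(latent, TOKEN_KEYS, SUBJECT_IDS)
    return initial, neglect_loss(latent, TOKEN_KEYS, SUBJECT_IDS)


if __name__ == "__main__":
    before, after = run_demo()
    print(f"neglect loss before: {before:.3f}, after: {after:.3f}")
```

In this toy, the "dog" token starts out with weak attention at every position, which mirrors the catastrophic neglect case. Repeated updates shift the latent so that at least one position attends strongly to each subject token, and the loss falls accordingly.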

Paper link: https://arxiv.org/abs/2301.13826