MS-Diffusion: Multi-subject Zero-shot Image Personalization with Layout Guidance

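June 11, 2024
  • Abstract
    Recent advances in text-to-image generation models have greatly improved the ability to generate realistic images from text prompts. This has spurred interest in personalized text-to-image applications, especially in multi-subject scenarios. Progress, however, is held back by two main challenges. First, the details of each referenced subject must be preserved accurately, in accordance with the textual description. Second, it is difficult to represent multiple subjects coherently in a single image without introducing inconsistencies. To address these problems, we introduce MS-Diffusion, a framework for layout-guided zero-shot image personalization with multiple subjects. The approach integrates grounding tokens with a feature resampler to preserve detail fidelity across subjects. With layout guidance, MS-Diffusion further adapts cross-attention to multi-subject inputs, ensuring that each subject condition acts on a specific region. The proposed multi-subject cross-attention orchestrates harmonious inter-subject compositions while preserving control by the text. Comprehensive quantitative and qualitative experiments confirm that the method outperforms existing models in both image and text fidelity, advancing personalized text-to-image generation.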
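    The abstract pairs grounding tokens with a feature resampler. The toy sketch below illustrates one plausible reading of that idea. It assumes, in the style of GLIGEN, that a subject's layout box is encoded with sin/cos Fourier features. That encoding is then mapped through a hypothetical projection matrix `proj` and added to the resampler's learnable queries before they cross-attend to a reference image's patch features. The exact content of the paper's grounding tokens is not described here, and the names `fourier_box_embedding`, `grounded_resampler`, and `proj` are hypothetical. The sketch uses a single unprojected attention step, whereas a real resampler stacks several layers with learned key/value projections and feed-forward blocks.

```python
import math
import random


def fourier_box_embedding(box, num_freqs=4):
    """Encode a normalized box (x0, y0, x1, y1) with sin/cos features.
    GLIGEN-style encoding; an assumption, not the paper's documented choice."""
    feats = []
    for coord in box:
        for k in range(num_freqs):
            freq = (2.0 ** k) * math.pi
            feats.append(math.sin(freq * coord))
            feats.append(math.cos(freq * coord))
    return feats  # length = 4 coords * 2 * num_freqs


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [v / total for v in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([scale * sum(q * k for q, k in zip(query, key)) for key in keys])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]


def grounded_resampler(image_feats, learned_queries, box, proj):
    """One toy resampler step (hypothetical).

    image_feats: d-dim patch features of one reference image
    learned_queries: d-dim learnable query vectors
    box: normalized layout box of this subject
    proj: d x e matrix mapping the box embedding to d dims (hypothetical)
    """
    g = fourier_box_embedding(box)
    grounding = [sum(p * x for p, x in zip(row, g)) for row in proj]  # d-dim grounding token
    # Assumption: the grounding token is added to every learnable query.
    queries = [[q + t for q, t in zip(query, grounding)] for query in learned_queries]
    # Each query summarizes the reference image into one subject token.
    return [attend(q, image_feats, image_feats) for q in queries]


random.seed(0)
d, n_patches, n_queries = 8, 16, 4
e = len(fourier_box_embedding((0.0, 0.0, 1.0, 1.0)))
image_feats = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n_patches)]
learned_queries = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n_queries)]
proj = [[random.gauss(0, 0.1) for _ in range(e)] for _ in range(d)]
tokens = grounded_resampler(image_feats, learned_queries, (0.1, 0.2, 0.5, 0.9), proj)
print(len(tokens), len(tokens[0]))  # 4 8
```

    Because each output token is an attention-weighted average of the patch features, the same learnable queries produce different subject tokens when the layout box changes. That is the sense in which the layout "grounds" the resampled features in this sketch.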
  • Problem addressed
    MS-Diffusion, a framework for layout-guided zero-shot image personalization with multiple subjects, addresses two challenges. The first is accurately preserving the details of each referenced subject in accordance with the textual description. The second is representing multiple subjects cohesively in a single image without introducing inconsistencies.
  • Key idea
    MS-Diffusion integrates grounding tokens with a feature resampler to maintain detail fidelity across subjects. It also uses layout guidance to adapt cross-attention to multi-subject inputs, so that each subject condition acts only on its designated region. This multi-subject cross-attention orchestrates harmonious inter-subject compositions while preserving control by the text.
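    The layout-guided multi-subject cross-attention can be sketched as masked attention over a latent grid. This minimal pure-Python version assumes an IP-Adapter-style decoupled design, in which unmasked text attention and masked subject attention are summed with a weight `scale`. Each latent cell may attend only to the tokens of subjects whose box contains the cell's center, and cells outside every box receive no subject conditioning. These choices, along with the names `multi_subject_cross_attention` and `box_contains`, are illustrative assumptions rather than the paper's code. Real layers also apply learned query/key/value projections.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [v / total for v in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([scale * sum(q * k for q, k in zip(query, key)) for key in keys])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]


def box_contains(box, x, y):
    """True if normalized point (x, y) lies inside box (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    return x0 <= x < x1 and y0 <= y < y1


def multi_subject_cross_attention(queries, grid_w, grid_h, text_tokens,
                                  subject_tokens, boxes, scale=1.0):
    """Hypothetical layout-masked cross-attention.

    queries: grid_h * grid_w latent features in row-major order
    text_tokens: d-dim text embeddings, visible to every cell
    subject_tokens: one list of d-dim tokens per subject
    boxes: one normalized (x0, y0, x1, y1) box per subject
    """
    out = []
    for idx, q in enumerate(queries):
        row, col = divmod(idx, grid_w)
        # Center of this latent cell in normalized image coordinates.
        cx, cy = (col + 0.5) / grid_w, (row + 0.5) / grid_h
        text_part = attend(q, text_tokens, text_tokens)
        # Layout mask: only subjects whose box covers this cell are visible.
        visible = [tok for toks, box in zip(subject_tokens, boxes)
                   if box_contains(box, cx, cy) for tok in toks]
        if visible:
            subj_part = attend(q, visible, visible)
        else:
            subj_part = [0.0] * len(q)  # assumption: no subject signal outside all boxes
        out.append([t + scale * s for t, s in zip(text_part, subj_part)])
    return out


queries = [[0.3, 0.1] for _ in range(4)]  # 2x2 latent grid, d = 2
text_tokens = [[0.0, 0.0]]
subject_tokens = [[[1.0, 0.0]], [[0.0, 1.0]]]  # subject A, subject B
boxes = [(0.0, 0.0, 0.5, 1.0), (0.5, 0.0, 1.0, 1.0)]  # left half, right half
out = multi_subject_cross_attention(queries, 2, 2, text_tokens, subject_tokens, boxes)
print(out)  # [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
```

    In the toy call, every cell in the left column receives subject A's token and every cell in the right column receives subject B's token. This is the region-restricted conditioning that the key idea describes. The text tokens remain visible everywhere, which is how the sketch keeps text control across the whole image.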
  • Other highlights
    Experiments show that MS-Diffusion outperforms existing models in both image and text fidelity. The paper evaluates on several datasets and releases open-source code for reproducibility. The work advances personalized text-to-image generation.
  • Related work
    Recent related studies include 'Generative Adversarial Text-to-Image Synthesis: A Review' and 'Text-to-Image Generation: A Survey'.