Composer: Creative and Controllable Image Synthesis with Composable Conditions

Lianghua Huang, Di Chen, Yu Liu, Yujun Shen, Deli Zhao, Jingren Zhou

Alibaba Group & Ant Group

Key points:

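1. Recent large-scale generative models learned on big data can synthesize incredible images but offer limited controllability. This work presents a new generation paradigm that allows flexible control of the output image, such as its spatial layout and palette, while maintaining synthesis quality and model creativity.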

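2. With compositionality as the core idea, an image is first decomposed into representative factors, and a diffusion model is then trained with all of these factors as conditions to recompose the input.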

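3. At the inference stage, the rich intermediate representations act as composable elements, opening up a huge design space for customizable content creation (i.e., one that grows exponentially with the number of decomposed factors). Notably, the approach, called Composer, supports conditions at various levels, such as text descriptions as global information, depth maps and sketches as local guidance, and color histograms for low-level details.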

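4. Although simple in concept and easy to implement, Composer is surprisingly powerful, achieving encouraging performance on both traditional and previously unexplored image generation and manipulation tasks, including but not limited to: text-to-image generation, multi-modal conditional image generation, style transfer, pose transfer, image translation, virtual try-on, interpolation and image variation from various directions, image reconfiguration by modifying sketches, depth maps, or segmentation maps, and colorization based on optional palettes. Moreover, by introducing an orthogonal representation of masking, Composer can restrict the editable region to a user-specified area for all of the above operations, which is more flexible than traditional inpainting while also preventing modification of pixels outside that region. Despite being trained in a multi-task manner, Composer achieves a zero-shot FID of 9.2 in text-to-image synthesis on the COCO dataset when only the caption is used as the condition, indicating that it can produce high-quality results.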
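
The guarantee in point 4 that pixels outside the editable region stay unchanged can be illustrated with a simple mask blend. This is only a minimal sketch of the idea, not Composer's masking representation: in the summary above, masking is a condition given to the model, whereas the hypothetical helper `restrict_edit_to_mask` below just composites two already-generated images in pixel space. Images are assumed to be small grayscale grids stored as nested lists, and the mask is assumed to be binary (1 = editable, 0 = protected).

```python
def restrict_edit_to_mask(original, edited, mask):
    """Take `edited` values where mask is 1 and `original` values where it is 0.

    All three arguments are 2D lists of the same shape (an assumption made
    for this illustration only).
    """
    return [
        [m * e + (1 - m) * o for o, e, m in zip(o_row, e_row, m_row)]
        for o_row, e_row, m_row in zip(original, edited, mask)
    ]


# A 3x3 toy image: only the center pixel is marked as editable.
original = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
edited = [[9, 9, 9], [9, 9, 9], [9, 9, 9]]
mask = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]

result = restrict_edit_to_mask(original, edited, mask)
for row in result:
    print(row)
```

Every pixel with a mask value of 0 keeps its original value exactly, which is the property the summary contrasts with conventional inpainting.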

One-sentence summary:

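We confirm that Composer serves as a general framework and facilitates a wide range of classical generative tasks without retraining. Code and models will be made available.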

Recent large-scale generative models learned on big data are capable of synthesizing incredible images yet suffer from limited controllability. This work offers a new generation paradigm that allows flexible control of the output image, such as spatial layout and palette, while maintaining the synthesis quality and model creativity. With compositionality as the core idea, we first decompose an image into representative factors, and then train a diffusion model with all these factors as the conditions to recompose the input. At the inference stage, the rich intermediate representations work as composable elements, leading to a huge design space (i.e., exponentially proportional to the number of decomposed factors) for customizable content creation. It is noteworthy that our approach, which we call Composer, supports various levels of conditions, such as text description as the global information, depth map and sketch as the local guidance, color histogram for low-level details, etc. Besides improving controllability, we confirm that Composer serves as a general framework and facilitates a wide range of classical generative tasks without retraining. Code and models will be made available.
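
To make the "exponentially proportional" claim concrete, the sketch below counts the ways of switching conditions on or off at inference time. It is a rough illustration under stated assumptions: the list `CONDITIONS` contains only the four conditions named in the abstract (the full model may use others), and the helper `condition_subsets` is hypothetical, not part of any released Composer code.

```python
from itertools import combinations

# Conditions named in the abstract; this list is illustrative, not exhaustive.
CONDITIONS = ["text", "depth_map", "sketch", "color_histogram"]


def condition_subsets(conditions):
    """Return every subset of the given conditions, including the empty one."""
    subsets = []
    for k in range(len(conditions) + 1):
        subsets.extend(combinations(conditions, k))
    return subsets


subsets = condition_subsets(CONDITIONS)
print(len(subsets))  # 16, i.e. 2 ** 4
```

With n decomposed factors there are 2^n on/off combinations, so each additional factor doubles the count. This covers only which conditions are active; each active condition can also be taken from a different source image or edited by hand, so the practical design space is larger still.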

https://arxiv.org/pdf/2302.09778.pdf
