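This paper presents SnapFusion, a high-performance Stable Diffusion model for mobile devices. SnapFusion makes two core contributions: (1) a layer-by-layer analysis of the existing UNet locates its speed bottlenecks and leads to a new, efficient UNet architecture (Efficient UNet) that can directly replace the UNet in the original Stable Diffusion, achieving a 7.4x speedup; (2) the number of denoising iterations at inference is reduced with a new step distillation scheme (CFG...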
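The snippet breaks off at "(CFG...". In this setting CFG presumably refers to classifier-free guidance, which combines a conditional and an unconditional noise prediction at every step. The sketch below assumes that reading. It shows the standard CFG combination and why guided sampling multiplies UNet cost. The function names `cfg_noise` and `unet_evaluations` and the guidance scale of 7.5 are illustrative assumptions, not SnapFusion's code.

```python
import numpy as np

def cfg_noise(eps_uncond: np.ndarray, eps_cond: np.ndarray, guidance_scale: float) -> np.ndarray:
    # Classifier-free guidance: start from the unconditional noise prediction
    # and extrapolate toward the text-conditioned one by `guidance_scale`.
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

def unet_evaluations(num_steps: int, use_cfg: bool = True) -> int:
    # With CFG, every denoising step needs a conditional and an unconditional
    # UNet prediction, so the number of UNet evaluations is 2 * num_steps.
    return num_steps * (2 if use_cfg else 1)

# Toy predictions; in a real sampler both come from the UNet.
eps_u = np.zeros(4)
eps_c = np.ones(4)
guided = cfg_noise(eps_u, eps_c, guidance_scale=7.5)
```

Under this accounting, 50 guided steps cost 100 UNet evaluations. This is why reducing the step count matters as much as making each UNet call cheaper.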
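Google Research, Brain Team. We present Imagen, a text-to-image diffusion model with an unprecedented degree of photorealism and a deep level of language understanding. Imagen builds on the power of large transformer language models in understanding text and relies on the strength of diffusion models in high-fidelity image generation. Our key discovery is that generic large language models (e.g. T5), pretrained on text-only corpora, are surprisingly effective at encoding...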
Stable Diffusion; interior architecture. In this article, AI "text-to-image" systems are examined for their potential contribution to architectural design processes. The research covers the four most common systems, Craiyon, Dall-E, Midjour...
Stable Diffusion is a latent text-to-image diffusion model. Thanks to a generous compute donation from Stability AI and support from LAION, we were able to train a Latent Diffusion Model on 512x512 images from a subset of the LAION-5B database. Similar to Google's Imagen, this model us...
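A latent diffusion model runs the diffusion process on a compressed autoencoder latent rather than on pixels. The sketch below shows the forward (noising) step q(z_t | z_0) on such a latent. It assumes the Stable Diffusion v1 layout: spatial downsampling by 8 with 4 latent channels, so a 512x512 image maps to a 4x64x64 latent. The linear beta schedule and all function names are illustrative assumptions, not values or code from the snippet.

```python
import numpy as np

def latent_shape(height: int, width: int, factor: int = 8, channels: int = 4) -> tuple:
    # Assumed SD v1-style autoencoder: downsample each spatial side by `factor`.
    assert height % factor == 0 and width % factor == 0
    return (channels, height // factor, width // factor)

def make_alpha_bar(num_steps: int = 1000, beta_start: float = 1e-4, beta_end: float = 0.02) -> np.ndarray:
    # Illustrative linear beta schedule; alpha_bar_t is the cumulative product of (1 - beta_t).
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def q_sample(z0: np.ndarray, t: int, alpha_bar: np.ndarray, rng: np.random.Generator):
    # z_t = sqrt(alpha_bar_t) * z_0 + sqrt(1 - alpha_bar_t) * eps, with eps ~ N(0, I).
    eps = rng.standard_normal(z0.shape)
    zt = np.sqrt(alpha_bar[t]) * z0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return zt, eps

rng = np.random.default_rng(0)
shape = latent_shape(512, 512)
z0 = rng.standard_normal(shape)  # stand-in for an encoded image latent
alpha_bar = make_alpha_bar()
zt, eps = q_sample(z0, 500, alpha_bar, rng)
```

The denoising UNet is trained to predict `eps` from `zt`. Working on a 4x64x64 latent instead of a 3x512x512 image is what keeps this affordable.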
Text-to-image generation is a comprehensive task that combines Computer Vision (CV) and Natural Language Processing (NLP). Research on text-to-image methods based on Generative Adversarial Networks (GANs) continues to grow in popularity.
Stable Diffusion {v1-4, v1-5, v2-base, v2-1}. Stable Diffusion (v1-4, v1-5, v2-base, v2-1) is a family of 1B-parameter text-to-image models based on latent diffusion [4] trained on LAION [40], a large-scale paired text-image dataset. ...
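Title: SwiftBrush: One-Step Text-to-Image Diffusion Model with Variational Score Distillation. From VinAI Research, CVPR 2024. Figure 1. SwiftBrush overview. Highlight: the authors propose SwiftBrush, an image-free distillation method. The existing method, Score Distillation Sampling (SDS), suffers from over-saturation, over-smoothing, and poor diversity; building on SDS, this paper proposes Variatio...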
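SDS and variational score distillation (VSD) differ in what the frozen teacher's noise prediction is compared against. For a generated latent x noised to x_t, SDS uses the gradient w(t)(ε_teacher(x_t, t) − ε), where ε is the injected noise. VSD, as introduced in ProlificDreamer, replaces ε with the prediction of a second score model that is trained on the generator's own outputs, often a LoRA copy of the teacher. The sketch below assumes w(t) = 1, passes the noise predictors in as plain callables with text conditioning folded in, and uses hypothetical names. It is not SwiftBrush's implementation.

```python
import numpy as np
from typing import Callable

# A noise predictor maps (noisy latent, timestep) -> predicted noise.
# Text conditioning y is assumed to be baked into the callable (e.g. a closure).
NoisePredictor = Callable[[np.ndarray, int], np.ndarray]

def _noise(x: np.ndarray, t: int, alpha_bar: np.ndarray, rng: np.random.Generator):
    # Forward-noise the generator output: x_t = sqrt(a_t) x + sqrt(1 - a_t) eps.
    eps = rng.standard_normal(x.shape)
    x_t = np.sqrt(alpha_bar[t]) * x + np.sqrt(1.0 - alpha_bar[t]) * eps
    return x_t, eps

def sds_grad(x, t, alpha_bar, teacher: NoisePredictor, rng, weight=1.0):
    # SDS: frozen teacher's prediction minus the injected noise.
    x_t, eps = _noise(x, t, alpha_bar, rng)
    return weight * (teacher(x_t, t) - eps)

def vsd_grad(x, t, alpha_bar, teacher: NoisePredictor, student_score: NoisePredictor, rng, weight=1.0):
    # VSD: frozen teacher's prediction minus a learned score of the generator's outputs.
    x_t, _ = _noise(x, t, alpha_bar, rng)
    return weight * (teacher(x_t, t) - student_score(x_t, t))

alpha_bar = np.linspace(0.99, 0.01, 1000)
x = np.ones((4, 8, 8))  # toy "generated latent"
zero_teacher = lambda x_t, t: np.zeros_like(x_t)
g_sds = sds_grad(x, 10, alpha_bar, zero_teacher, np.random.default_rng(0))
g_vsd = vsd_grad(x, 10, alpha_bar, zero_teacher, zero_teacher, np.random.default_rng(0))
```

In training, this gradient is pushed back into the generator's parameters through x = G(z, y) by the chain rule. Because the VSD term vanishes once the student score matches the teacher, it avoids the random-noise residual that SDS keeps injecting.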
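As shown in Figure 6, MIGC is inspired by divide and conquer. In the Cross-Attention (CA) layers of Stable Diffusion, it decomposes the complex multi-instance generation (MIG) task into several simple single-instance generation tasks and obtains the MIG solution by integrating the subtask solutions. When handling the subtask "a blue cat", we only need to input the text "a blue cat", without the attribute information of "a green dog", which avoids text leakage. At the same time...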
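The divide-and-conquer idea can be illustrated with ordinary single-head cross-attention. In this sketch, each instance's subtask attends only to that instance's own text tokens, and the per-instance results are merged with region masks, so the "a blue cat" region never sees the "a green dog" tokens. This is a simplified, hypothetical sketch: the function names and the mask-weighted sum are assumptions, and it does not reproduce MIGC's own mechanism for integrating subtask solutions.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    # q: (N, d) image queries; k, v: (T, d) text keys/values for ONE instance.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def divide_and_conquer_attention(q: np.ndarray, instances) -> np.ndarray:
    """instances: list of (mask, k, v); mask has shape (N,) with values in [0, 1].
    Each single-instance subtask uses only its own text; outputs are combined by mask.
    Queries covered by no mask stay zero here (background is not modeled)."""
    out = np.zeros((q.shape[0], instances[0][2].shape[-1]))
    for mask, k, v in instances:
        out += mask[:, None] * cross_attention(q, k, v)
    return out

rng = np.random.default_rng(0)
q = rng.standard_normal((4, 8))  # 4 image queries, dim 8
k_cat, v_cat = rng.standard_normal((3, 8)), rng.standard_normal((3, 8))  # "a blue cat"
k_dog, v_dog = rng.standard_normal((3, 8)), rng.standard_normal((3, 8))  # "a green dog"
cat_mask = np.array([1.0, 1.0, 0.0, 0.0])
dog_mask = 1.0 - cat_mask
out = divide_and_conquer_attention(q, [(cat_mask, k_cat, v_cat), (dog_mask, k_dog, v_dog)])
```

Changing the dog's tokens leaves the cat region's output unchanged, which is the "no text leakage" property the paragraph describes.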
Figure 1. Gender representation for DALL-E v2, Stable Diffusion, Google Image Search 2020, and BLS data.
Figure 2. A sample of the first four images generated for the professions of “computer programmer” and “housekeeper” using the DALL-E v2 and Stable Diffusion models. Notably, o...
Each path intuitively functions as a "painter" for depicting a particular textual concept onto a specified image region at a diffusion timestep. Comprehensive experiments reveal that RAPHAEL outperforms recent cutting-edge models, such as Stable Diffusion, ERNIE-ViLG 2.0, DeepFloyd, and DALL-E 2...
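The snippet describes each path as a "painter" that depicts one textual concept in one image region at a given timestep. The toy sketch below illustrates the spatial part of that routing idea under stated assumptions. Each concept is routed to one expert by a hypothetical argmax gate, and that expert's output is added only to the image tokens inside the concept's region. Timestep routing is omitted. The linear experts, the gate, and all names are illustrative assumptions; RAPHAEL's actual layers differ.

```python
import numpy as np

def route_and_paint(features, concept_embs, region_masks, gate_w, experts):
    """Toy space-MoE-style layer.
    features:     (N, d) image tokens
    concept_embs: (C, d) one embedding per textual concept
    region_masks: (C, N) 0/1 mask of the image region for each concept
    gate_w:       (d, E) gating weights over E experts
    experts:      list of E (d, d) matrices, one linear "painter" each
    """
    chosen = np.argmax(concept_embs @ gate_w, axis=-1)  # expert index per concept
    out = features.copy()
    for c, e in enumerate(chosen):
        # The selected expert renders concept c, conditioned by adding its embedding...
        painted = (features + concept_embs[c]) @ experts[e]
        # ...and only tokens inside concept c's region receive the (residual) update.
        out += region_masks[c][:, None] * painted
    return out, chosen

features = np.arange(6, dtype=float).reshape(3, 2)  # 3 image tokens, dim 2
concepts = np.array([[1.0, 0.0], [0.0, 1.0]])       # 2 textual concepts
masks = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])  # concept 0 -> token 0, concept 1 -> token 1
gate_w = np.eye(2)
experts = [np.eye(2), 2.0 * np.eye(2)]
out, chosen = route_and_paint(features, concepts, masks, gate_w, experts)
```

Token 2 lies outside every concept's region, so it passes through unchanged. Each concept only touches its own region, and the route each concept takes depends on its embedding.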