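Our latent diffusion models (LDMs) achieve new state-of-the-art scores for image inpainting and class-conditional image synthesis, and highly competitive performance on various tasks, including text-to-image synthesis, unconditional image generation and super-resolution, while significantly reducing computational requirements compared to pixel-based DMs. 1. Introduction Image synthesis has recently been one of the most striking...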
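Latent Diffusion Model, High-Resolution Image Synthesis with Latent Diffusion Models Date: 21.12 Affiliation: runway TL;DR This article introduces a new method for high-resolution image synthesis called Latent Diffusion Models (LDMs). By applying diffusion models in the latent space of a pretrained autoencoder, LDMs make it possible to train high-quality image synthesis models with limited computational resources...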
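To make "applying a diffusion model in latent space" concrete, here is a minimal sketch of a single forward noising step applied to a latent vector rather than to pixels. It assumes the standard DDPM-style closed form z_t = sqrt(ᾱ_t)·z_0 + sqrt(1 − ᾱ_t)·ε, which the snippet above does not spell out; the function name `noise_latent` and its parameters are hypothetical and are not taken from the paper's code.

```python
import math
import random


def noise_latent(z0, alpha_bar_t, eps=None, rng=random):
    """Noise a latent vector z0 to timestep t (illustrative sketch only).

    z0          -- latent produced by a (pretrained) encoder, as a list of floats
    alpha_bar_t -- cumulative signal fraction at step t, in [0, 1] (assumed schedule value)
    eps         -- optional fixed noise vector; sampled from N(0, 1) when omitted
    """
    if not 0.0 <= alpha_bar_t <= 1.0:
        raise ValueError("alpha_bar_t must lie in [0, 1]")
    if eps is None:
        eps = [rng.gauss(0.0, 1.0) for _ in z0]
    signal_scale = math.sqrt(alpha_bar_t)        # how much of z0 survives
    noise_scale = math.sqrt(1.0 - alpha_bar_t)   # how much noise is mixed in
    return [signal_scale * z + noise_scale * e for z, e in zip(z0, eps)]
```

Because the vector being noised is the compressed latent and not the full-resolution image, every diffusion step touches far fewer values, which is where the compute savings described above would come from.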
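4.4. Super-Resolution with Latent Diffusion By taking a low-resolution image as the conditioning input, LDMs can be used for tasks that increase image resolution, which is essentially an image-to-image task. Evaluation results: Human ratings: 4.5. Inpainting with Latent Diffusion For image inpainting, first, LDMs are much faster than pixel-level methods: Second, in terms of inpainting quality, they can also...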
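An autoencoder is an unsupervised neural network model used to learn efficient data representations. Its goal is to encode the input data into a compact latent representation and then reconstruct the original input from that representation. An autoencoder consists of two parts: an encoder and a decoder. Encoder: maps the input data to the latent representation space. This mapping is usually performed through the neural network's forward...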
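The encoder/decoder split described above can be sketched in a few lines. This is a toy illustration under strong assumptions: the weights are hand-set linear maps rather than trained neural networks, and the names `ENCODER`, `DECODER`, `encode`, `decode` and `reconstruction_error` are hypothetical.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def transpose(matrix):
    """Swap the rows and columns of a matrix."""
    return [list(column) for column in zip(*matrix)]


# Hand-set encoder (assumption, not learned): keep the first two of four coordinates.
ENCODER = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
]
# Decoder maps the 2-D latent back to 4-D; here it is simply the transpose.
DECODER = transpose(ENCODER)


def encode(x):
    """Map a 4-D input to its 2-D latent representation."""
    return matvec(ENCODER, x)


def decode(z):
    """Reconstruct a 4-D vector from a 2-D latent."""
    return matvec(DECODER, z)


def reconstruction_error(x):
    """Squared error between x and its reconstruction decode(encode(x))."""
    x_hat = decode(encode(x))
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))
```

For example, `reconstruction_error([3.0, -1.0, 0.0, 0.0])` is zero because that input lies entirely in the subspace the latent keeps, while an input with a non-zero third coordinate loses that information. A trained autoencoder would learn its weights by minimizing this kind of reconstruction error over a dataset.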
By introducing cross-attention layers into the model architecture, we turn diffusion models into powerful and flexible generators for general conditioning inputs such as text or bounding boxes and high-resolution synthesis becomes possible in a convolutional manner. Our latent diffusion models (LDMs) ...
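As a rough illustration of the cross-attention mechanism mentioned above, the following sketch computes single-head scaled dot-product attention, softmax(QKᵀ/√d)·V, where the queries would come from image (latent) features and the keys and values from the conditioning input such as text tokens. It is a simplification under stated assumptions: the learned projection matrices are omitted, there is one head only, and the function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Single-head cross-attention without learned projections (sketch).

    queries -- one vector per image/latent position
    keys    -- one vector per conditioning token (same width as the queries)
    values  -- one vector per conditioning token
    """
    d = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this position to every conditioning token, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted average of the conditioning values.
        mixed = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(mixed)
    return outputs
```

Each output row is a mixture of the conditioning values, so every spatial position can draw on whichever conditioning tokens it matches best.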
High-Resolution Image Synthesis with Latent Diffusion Models Robin Rombach¹* Andreas Blattmann¹* Dominik Lorenz¹ Patrick Esser Björn Ommer¹ ¹Ludwig Maximilian University of Munich & IWR, Heidelberg University, Germany Runway ML https://github.com/CompVis/latent-...
OpenImages | Super-resolution | LDM-VQ-4 | N/A | N/A | N/A | N/A | https://ommer-lab.com/files/latent-diffusion/sr_bsr.zip | BSR image degradation
OpenImages | Layout-to-Image Synthesis | LDM-VQ-4 (200 DDIM steps, eta=0) | 32.02 | 15.92 | N/A | N/A | https://ommer-lab.com/files/latent-diffusion/layout2img...
Latent Diffusion Models (LDMs) enable high-quality image synthesis while avoiding excessive compute demands by training a diffusion model in a compressed lower-dimensional latent space. Here, we apply the LDM paradigm to high-resolution video generation, a particularly resource-intensive task. We ...