Alayrac et al. Flamingo: a Visual Language Model for Few-Shot Learning. 2022. Li et al. Otter: A Multi-Modal Model with In-Context Instruction Tuning. 2023. Kirillov et al. Segment Anything. 2023. Bar et al. Visual Prompting via Image Inpainting. 2022. Overall, compared with traditional approaches, Pro...
[CVPR 2023] SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation - hixuco/SadTalker
• Egocentric Scene ...Balanced Multimodal Learning via On the Fly Gradient Modulation | CVPR 2022 • Balanced Multimod...Transformers for Multimodal Self Supervised Learning from Raw Video, Audio and Text | NeurIPS 2021 • Transformers for ...Multimodal Few-Shot Learning with Frozen Language ...
AudioLDM: Text-to-Audio Generation with Latent Diffusion Models (2023), Haohe Liu et al. [pdf] Mus...
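Proposes semantic region-adaptive normalization (SEAN), a simple but effective building block for conditional generative adversarial networks, where the condition is a segmentation mask describing the semantic regions of the output image. With SEAN, one can build a network architecture that controls the style of each semantic region individually, for example by specifying one style reference image per region. Code: https://github.com/ZPdesu/SEAN ...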
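Today's post covers a paper published at CVPR 2020: LT-Net: Label Transfer by Learning Reversible Voxel-wise Correspondence for One-shot Medical Image Segmentation (original link: [1]). 1 Research background: In recent years, with the rapid development of deep learning, deep convolutional neural networks (DCNNs) have achieved strong performance on many segmentation tasks. For 3D medical image segmentation, however, obtaining 3D...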
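This paper was posted on arXiv last August and proposes a model called DenseNet, in which every layer of the CNN is connected to all other layers in a feed-forward fashion. A traditional convolutional network with L layers has L connections, whereas DenseNet has L(L+1)/2 connections. For each layer, the feature maps of all preceding layers are its inputs, and its own feature maps are inputs to all subsequent layers.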
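The DenseNet connection count can be checked with a short sketch. It is an illustration only, not code from the paper: the function names are hypothetical, and it assumes that layer i receives the block input plus the outputs of the i-1 layers before it, so it has i incoming connections.

```python
def traditional_connections(num_layers: int) -> int:
    """A plain feed-forward stack: each layer has one incoming connection."""
    return num_layers


def dense_inputs(layer_index: int) -> list[int]:
    """Sources feeding layer `layer_index` (1-based) under dense connectivity.

    Index 0 stands for the block input; indices 1..layer_index-1 are the
    preceding layers (an assumption made for this illustration).
    """
    return list(range(layer_index))


def dense_connections(num_layers: int) -> int:
    """Total connections when every layer reads all earlier feature maps."""
    return sum(len(dense_inputs(i)) for i in range(1, num_layers + 1))


if __name__ == "__main__":
    for num_layers in (1, 2, 5, 10):
        closed_form = num_layers * (num_layers + 1) // 2
        print(num_layers, traditional_connections(num_layers),
              dense_connections(num_layers), closed_form)
```

Summing 1 + 2 + ... + L gives the L(L+1)/2 figure, which is why the number of connections grows quadratically in depth while a plain stack grows linearly.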