In this paper, we tackle the new and challenging problem of text-driven generation of 3D garments with high-quality textures. We propose WordRobe, a novel framework for generating unposed and textured 3D garment meshes from user-friendly text prompts. We achieve this by first learning ...
Universal Video-Based Planner. Most prior text-to-video work generates unconstrained video from a text prompt alone; here, video generation must serve a planning task, so an additional condition is introduced: the initial-frame state. An important prerequisite for such a planner is that the surrounding environment must not change over the course of a generated trajectory. The authors' solution is, throughout the denoising process, to always ...
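The snippet breaks off before the fix is spelled out, but the stated constraint (a fixed environment, conditioned on the initial-frame state) suggests a sampler that re-injects the observed first frame at every denoising step. A minimal sketch of that idea in PyTorch; the noise-prediction model, its call signature, and the noise schedule are hypothetical placeholders rather than the paper's implementation:

```python
import torch

@torch.no_grad()
def sample_video_plan(eps_model, first_frame, text_emb, num_frames=16, steps=50):
    """Reverse-diffusion sampling of a video plan whose first frame is pinned
    to the observed initial state, so the scene cannot drift mid-trajectory.

    eps_model(x, t, text_emb) -> predicted noise, same shape as x  (assumed API)
    first_frame: (C, H, W) observed initial state
    """
    betas = torch.linspace(1e-4, 2e-2, steps)
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)

    c, h, w = first_frame.shape
    x = torch.randn(num_frames, c, h, w)              # noisy trajectory

    for t in reversed(range(steps)):
        x[0] = first_frame                            # re-inject the clean observed frame
        eps = eps_model(x, t, text_emb)               # predict noise for every frame
        coef = betas[t] / (1.0 - alpha_bar[t]).sqrt()
        mean = (x - coef * eps) / alphas[t].sqrt()
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + betas[t].sqrt() * noise            # standard DDPM update

    x[0] = first_frame                                # frame 0 stays the true start state
    return x                                          # (num_frames, C, H, W) plan
```

Pinning `x[0]` before each model call is the simplest way to enforce the "environment stays fixed" constraint; other choices, such as concatenating the first frame as an extra conditioning channel, would serve the same purpose.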
The micro-expression driving signal is synthesized from audio/text instructions. Prior work: Gaia: Zero-shot talking avatar generation. Compared with that prior work, the main change is in the control pathway. Contribution: using AU data and GPT-4V, the authors curate an image dataset annotated with expression labels. AU (Action Unit) is a classic micro-expression representation, roughly corresponding to the motion patterns of facial muscle groups. A particularly nice property is that AU detection models ...
[LG] Text-Guided Molecule Generation with Diffusion Language Model. Proposes a new method, TGM-DLM, for text-guided molecule generation. Unlike existing autoregressive methods based on SMILES strings, TGM-DLM uses a diffusion model that iteratively updates SMILES token embeddings in two phases: first, the embeddings are optimized from random noise according to the text description; then, invalid SMILES strings are corrected. Studies show that TGM-...
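The two-phase sampling described above maps naturally onto a short control-flow sketch. Everything below, the denoiser and decoder call signatures, the RDKit validity check, and the length of the correction pass, is an illustrative assumption, not TGM-DLM's exact algorithm:

```python
import torch
from rdkit import Chem  # used only to check SMILES validity

@torch.no_grad()
def two_phase_sample(denoiser, decode, text_emb, seq_len=128, dim=256,
                     steps=1000, correction_steps=100):
    """Two-phase diffusion sampling over SMILES token embeddings.

    denoiser(x, t, cond) -> embeddings after one reverse step  (assumed API)
    decode(x)            -> SMILES string from embeddings      (assumed API)
    """
    x = torch.randn(seq_len, dim)

    # Phase 1: text-guided denoising from pure noise.
    for t in reversed(range(steps)):
        x = denoiser(x, t, cond=text_emb)

    smiles = decode(x)
    if Chem.MolFromSmiles(smiles) is not None:
        return smiles  # already a valid molecule

    # Phase 2: correction pass. Re-noise slightly and denoise again
    # (here without text guidance) to repair the invalid SMILES string.
    x = x + 0.1 * torch.randn_like(x)
    for t in reversed(range(correction_steps)):
        x = denoiser(x, t, cond=None)

    return decode(x)
```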
Text-guided generation offers an intuitive solution to convert voices to desired "DreamVoices" according to the users' needs. Our paper presents two major contributions to VC technology: (1) DreamVoiceDB, a robust dataset of voice timbre annotations for 900 speakers from VCTK and LibriTTS. (2)...
TIFF: Text-Guided Generation of Full-Body Image with Preserved Reference Face for Customized Animation. Python implementation by Jaechan Jo. Performance by method: comparison of generated images between existing methods and TIFF.

Method   L1↓     Cos Dist.↓   LPIPS↓   SSIM↑   PSNR↑
LoRA     1.169   0.845        0.750    0.426   11.106
...
In this work, we introduce Text-guided 3D Human Generation (T3H), where a model generates a 3D human guided by a fashion description. There are two goals: 1) the 3D human should render articulately, and 2) its outfit should be controlled by the given text. To address this T3H task ...
TECA: Text-Guided Generation and Editing of Compositional 3D Avatars. Hao Zhang, Yao Feng, Peter Kulits, Yandong Wen, Justus Thies, Michael J. Black. 2023.
Towards High-Fidelity Text-Guided 3D Face Generation and Manipulation Using only Images. Cuican Yu, Gu...
(e.g., contacts and semantics) from text prompts. To address this challenge, we propose to decompose the interaction generation task into two subtasks: hand-object contact generation and hand-object motion generation. For contact generation, a VAE-based network takes as input a text and an ...
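The decomposition reads as a two-stage pipeline whose contact stage is a text-conditioned VAE. A minimal sketch of such a conditional VAE; the network sizes, the per-point contact-map representation, and all names here are assumptions for illustration rather than the paper's architecture:

```python
import torch
import torch.nn as nn

class ContactCVAE(nn.Module):
    """Text-conditioned VAE that predicts a hand-object contact map
    (per-point contact probabilities on the object surface)."""

    def __init__(self, text_dim=512, n_points=1024, latent_dim=64):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Linear(n_points + text_dim, 512), nn.ReLU(),
            nn.Linear(512, 2 * latent_dim),            # mu and log-variance
        )
        self.dec = nn.Sequential(
            nn.Linear(latent_dim + text_dim, 512), nn.ReLU(),
            nn.Linear(512, n_points), nn.Sigmoid(),    # contact probabilities
        )
        self.latent_dim = latent_dim

    def forward(self, contact, text_emb):
        mu, logvar = self.enc(torch.cat([contact, text_emb], -1)).chunk(2, -1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterize
        recon = self.dec(torch.cat([z, text_emb], -1))
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1).mean()
        return recon, kl                                # train with BCE(recon) + kl

    @torch.no_grad()
    def sample(self, text_emb):
        z = torch.randn(text_emb.shape[0], self.latent_dim)
        return self.dec(torch.cat([z, text_emb], -1))
```

At inference time, sampling different latent codes for the same text yields diverse contact maps, which the second (motion-generation) stage can then consume.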
In recent years, denoising diffusion models have achieved remarkable success in generating semantically meaningful pixel-level representations for image generation. In this study, we propose a novel end-to-end framework, called TGEDiff, focusing on medical image segmentation. TGEDiff fuses a te...
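The snippet cuts off mid-description, but the framing (a diffusion model for segmentation guided by text) suggests a denoiser conditioned on both the input image and a text embedding. A minimal sketch under that assumption; everything here, including the concatenation-based fusion and the omitted timestep embedding, is an illustrative guess rather than TGEDiff's actual design:

```python
import torch
import torch.nn as nn

class TextGuidedSegDenoiser(nn.Module):
    """Toy denoiser for diffusion-based segmentation: predicts the noise added
    to a segmentation mask, conditioned on the input image and a text embedding.
    (A real diffusion denoiser would also take a timestep embedding.)"""

    def __init__(self, img_ch=1, mask_ch=1, text_dim=512, hidden=64):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, hidden)
        self.net = nn.Sequential(
            nn.Conv2d(img_ch + mask_ch + hidden, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, mask_ch, 3, padding=1),
        )

    def forward(self, noisy_mask, image, text_emb):
        b, _, h, w = image.shape
        # Broadcast the projected text embedding to a spatial map and fuse it
        # with the image and the noisy mask by channel-wise concatenation.
        txt = self.text_proj(text_emb).view(b, -1, 1, 1).expand(b, -1, h, w)
        x = torch.cat([noisy_mask, image, txt], dim=1)
        return self.net(x)   # predicted noise over the mask
```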