Text-conditioned diffusion models have emerged as a promising tool for neural video generation. However, current models still struggle with intricate spatiotemporal prompts and often generate restricted or incorrect motion. To address these limitations, we introduce LLM-grounded Video Diffusion (LVD). In...
LLM-GROUNDED VIDEO DIFFUSION MODELS
Instead of directly generating videos from the text inputs, LVD first leverages a large language model (LLM) to generate dynamic scene layouts based on the text in…
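To make the idea of a "dynamic scene layout" concrete, here is a minimal sketch that represents a layout as one bounding box per frame. It assumes boxes are `(x, y, width, height)` tuples and that in-between frames are filled by linear interpolation between two keyframe boxes. The function name `interpolate_boxes` and the interpolation step are illustrative assumptions, not the method described in the LVD paper.

```python
from typing import List, Tuple

# Hypothetical box format: (x, y, width, height) in pixels.
Box = Tuple[float, float, float, float]


def interpolate_boxes(start: Box, end: Box, num_frames: int) -> List[Box]:
    """Linearly interpolate one object's box from `start` to `end`.

    Returns one box per frame; the first equals `start`, the last equals `end`.
    """
    if num_frames < 2:
        raise ValueError("num_frames must be at least 2")
    frames: List[Box] = []
    for i in range(num_frames):
        t = i / (num_frames - 1)  # 0.0 at the first frame, 1.0 at the last
        frames.append(tuple(s + t * (e - s) for s, e in zip(start, end)))
    return frames


# An object moving left to right across five frames, keeping its size.
frames = interpolate_boxes((0.0, 100.0, 50.0, 50.0), (200.0, 100.0, 50.0, 50.0), 5)
for index, box in enumerate(frames):
    print(index, box)
```

A per-frame list like this is one simple way to hand object positions to a downstream generator; the real system may use a different representation.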
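LLM-grounded Video Diffusion Models (LVD) is an LLM-based video diffusion model; its official implementation accompanies the LVD paper. The model uses a large language model (LLM) to guide diffusion-based generation of video content. By combining natural-language descriptions with visual information, LVD can understand and create video content, with improved ability to generate visual dynamics. The work appeared at ICLR 2024, and provides...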
LLM-grounded Video Diffusion Models. TL;DR: Text Prompt -> LLM as a Request Parser -> Intermediate Representation (such as an image layout) -> Stable Diffusion -> Image. Updates: [2024.1] Added a result with self-hosted Mixtral-8x7B-Instruct-v0.1 (see our reference benchmark...
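The "LLM as a request parser" stage in the TL;DR above can be sketched as follows. This is only an illustration under stated assumptions: the response format (a Python-style list of `(caption, [x, y, w, h])` pairs), the `Box` class, `parse_layout`, and the stub `fake_llm` are all hypothetical and are not taken from the official repository. No real LLM or Stable Diffusion call is made.

```python
import ast
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    """One object in the intermediate layout (hypothetical structure)."""
    label: str
    x: int
    y: int
    w: int
    h: int


def parse_layout(llm_response: str) -> List[Box]:
    """Parse a response such as "[('a cat', [10, 20, 100, 80])]" into boxes."""
    # literal_eval only accepts Python literals, so arbitrary code is rejected.
    raw = ast.literal_eval(llm_response.strip())
    return [Box(label, *coords) for label, coords in raw]


def fake_llm(prompt: str) -> str:
    """Stand-in for a real LLM call; returns a fixed, made-up layout."""
    return "[('a gray cat', [60, 200, 180, 150]), ('a red ball', [300, 280, 80, 80])]"


layout = parse_layout(fake_llm("a gray cat next to a red ball"))
for box in layout:
    print(box)
```

The parsed boxes would then be passed to the layout-conditioned generation stage, which is outside the scope of this sketch.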
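Typical diffusion-based controllable image generation models: zhuanlan.zhihu.com/p/61 (mainly concerned with how control conditions are injected via cross-attention). 2. Preface. Using an LLM to improve Stable Diffusion (SD) output is already common practice, mainly by having the LLM convert an image description into an SD prompt (SD tends to produce better results from short keyword-style prompts than from plain natural-language text). The approach in this article is more meaningful: it directly uses the LLM's in-context...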
LLM-grounded Video Diffusion Models. In Proceedings of the Twelfth International Conference on Learning Representations, ICLR 2024, Vienna, Austria, 7–11 May 2024.
Zhang, H.; Li, X.; Bing, L. Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video ...
[LVD] LLM-grounded Video Diffusion Models (29 Sep 2023). Long Lian, Baifeng Shi, Adam Yala, Trevor Darrell, Boyi Li
VideoDirectorGPT: Consistent Multi-scene Video Generation via LLM-Guided Planning (26 Sep 2023) [arXiv 2023]. Han Lin, Abhay Zala,...
Yifan Li, Yifan Du, Kun Zhou, Jinpeng Wang, Wayne Xin Zhao, and Ji-Rong Wen. Evaluating object hallucination in large vision-language models. CoRR, abs/2305.10355, 2023d.
Long Lian, Boyi Li, Adam Yala, and Trevor Darrell. LLM-grounded diffusion: Enhancing prompt understanding of ...
propose GEODIFFUSION, an embarrassingly simple framework to integrate geometric controls into pre-trained diffusion models for detection data generation via text prompts.
SPOT: Scalable 3D Pre-training via Occupancy Prediction for Autonomous Driving. Keywords: 3D pre-training, object detection, autonomous driv...
Efficient and Effective Vocabulary Expansion Towards Multilingual Large Language Models. 2024. [arxiv] Yu et al. KIEval: A Knowledge-grounded Interactive Evaluation Framework for Large Language Models. ACL 2024. [arxiv] Huang et al. Key-Point-Driven Data Synthesis with its Enhancement on ...