If you find it useful for your research and applications, please cite related papers/blogs using this BibTeX:

@article{li2024llava,
  title={LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models},
  author={Li, Feng and Zhang, Renrui and Zhang, Hao and Zhang...