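LIMoE-L/16, trained comparably to CLIP-L/14, achieves 78.6% zero-shot ImageNet accuracy (versus 76.2%). When scaled further to H/14 (with additional data), it reaches 84.1%, comparable to state-of-the-art methods that use larger custom per-modality backbones and pre-training schemes. We analyze the quantitative and qualitative behavior of LIMoE and demonstrate phenomena such as differing treatment of the modalities and the organic emergence of modality-specific experts.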
Hallucination Augmented Contrastive Learning for Multimodal Large Language Model arXiv 2023-12-12 Github - MOCHa: Multi-Objective Reinforcement Mitigating Caption Hallucinations arXiv 2023-12-06 Github - Mitigating Fine-Grained Hallucination by Fine-Tuning Large Vision-Language Models with Caption Rewrite...
We used UMAP to visualize the inferred biological states and technical noise, used scMIB and scIB for integration benchmarking, and compared the results of the different tasks with those generated by de novo trained models. Transfer learning greatly improved performance on the dogma-diagonal,...
"Vision-Language Tracking With CLIP and Interactive Prompt Learning." TITS (2024). [paper] DMITrack: Zhiyi Mo, Guangtong Zhang, Jian Nong, Bineng Zhong, Zhi Li. "Dual-stream Multi-modal Interactive Vision-language Tracking." MMAsia (2024). [paper] CTVLT: X. Feng, D. Zhang, S. Hu,...