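Caption Anything supports both visual control and language control. User interface: mouse clicks (continuous or single), control over the language style of the output description (sentiment, language, imagination), ChatGPT-generated wiki knowledge about the selected object, and conversation with ChatGPT. The code supports both Linux and Windows. (Figure: user interface.) Edited 2023-04-16 13:33 ...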
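Southern University of Science and Technology (SUSTech) and Tencent ARC Lab recently open-sourced an interactive image captioning tool built on Segment Anything, BLIP-2 captioning, and ChatGPT. Visual control (mouse clicks) selects the object in a specific region, and the tool then describes it in a variety of language styles. Traditional image captioning and dense captioning usually aim to parse the whole image; for an image with rich scenes and highly complex object interactions, such as Along the River During the Qingming Festival, a simple sentence...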
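This open-source project is called Caption-Anything. Its features include: 1. Segment Anything: segments any object in an image. 2. Visual captioning: automatically generates a visual description of the image. 3. ChatGPT: clicking an object in the image automatically generates a text description related to that object. 4. Uses machine learning techniques: the project ...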
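The click-to-caption flow described above (segment the clicked region, caption it, then restyle the caption) can be sketched as follows. This is only an illustrative outline, not the project's code: `caption_region`, `CaptionControls`, and the `toy_*` stand-ins are hypothetical names, and the toy functions replace Segment Anything, BLIP-2, and ChatGPT with trivial placeholders so the sketch runs without any model.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Point = Tuple[int, int]   # (row, col) of a mouse click
Image = List[List[str]]   # toy "image": a grid of object labels
Mask = List[List[bool]]   # True where the selected object is


@dataclass
class CaptionControls:
    """Hypothetical language controls, named after the ones in the text."""
    sentiment: str = "neutral"
    language: str = "English"
    imagination: bool = False


def caption_region(
    image: Image,
    clicks: Sequence[Point],
    segment: Callable[[Image, Sequence[Point]], Mask],
    describe: Callable[[Image, Mask], str],
    restyle: Callable[[str, CaptionControls], str],
    controls: CaptionControls,
) -> str:
    """Chain the three stages: segmenter -> captioner -> text refiner."""
    mask = segment(image, clicks)        # stage 1: clicks -> region mask
    raw_caption = describe(image, mask)  # stage 2: region -> plain caption
    return restyle(raw_caption, controls)  # stage 3: apply language controls


def toy_segment(image: Image, clicks: Sequence[Point]) -> Mask:
    """Placeholder for a segmenter: select every cell sharing a clicked label."""
    wanted = {image[r][c] for r, c in clicks}
    return [[cell in wanted for cell in row] for row in image]


def toy_describe(image: Image, mask: Mask) -> str:
    """Placeholder for a captioner: list the labels inside the mask."""
    labels = sorted(
        {image[r][c] for r, row in enumerate(mask) for c, on in enumerate(row) if on}
    )
    return "a region containing " + " and ".join(labels)


def toy_restyle(caption: str, controls: CaptionControls) -> str:
    """Placeholder for a chat model: tag the caption with the requested style."""
    if controls.imagination:
        caption += " (with imagined details)"
    return f"[{controls.language}, {controls.sentiment}] {caption}"


if __name__ == "__main__":
    image = [
        ["sky", "sky", "sky"],
        ["boat", "boat", "river"],
        ["river", "river", "river"],
    ]
    # A single click on the boat at row 1, column 0.
    print(caption_region(image, [(1, 0)], toy_segment, toy_describe,
                         toy_restyle, CaptionControls(sentiment="positive")))
```

Passing the three stages in as callables keeps the sketch independent of any particular model API; a continuous (multi-click) selection is just a longer `clicks` list.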
We propose a method to efficiently equip the Segment Anything Model (SAM) with the ability to generate regional captions. SAM generalizes strongly to segmenting anything but falls short in semantic understanding. By introducing a lightweight query-based feature mixer, we align the region-...
Caption-Anything is a versatile tool combining image segmentation, visual captioning, and ChatGPT, generating tailored captions with diverse controls for user preferences. https://huggingface.co/spaces/TencentARC/Caption-Anything https://huggingface.co/s
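Introduces a method based on the Segment Anything Model (SAM) that efficiently generates regional captions. A lightweight query-based feature mixer aligns SAM's region-specific features with the embedding space of a language model for subsequent caption generation. Weak-supervision pretraining is proposed to address the scarcity of regional caption data, and extensive experiments demonstrate the superiority of the method. This work, in scaling up regional captioning data and exploring effective ways to augment...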
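To make the idea of a query-based feature mixer concrete, here is a minimal sketch under strong assumptions: a small set of query vectors attends over region features with single-head scaled dot-product attention, and the result is linearly projected to a language model's embedding width. The function names, shapes, and the use of fixed rather than learned weights are all assumptions for illustration; this is not the paper's architecture.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (n x k) and b (k x m)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax(row: Vector) -> Vector:
    """Numerically stable softmax over one row of attention scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def mix_region_features(queries: Matrix, region_feats: Matrix, proj: Matrix) -> Matrix:
    """Cross-attend queries over region features, then project.

    queries:      (num_queries, d)  would be learned in a real model
    region_feats: (num_tokens, d)   features of the selected region
    proj:         (d, d_lm)         maps into the language model's embedding width
    returns:      (num_queries, d_lm)
    """
    d = len(queries[0])
    scores = matmul(queries, transpose(region_feats))      # (num_queries, num_tokens)
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    mixed = matmul(weights, region_feats)                   # (num_queries, d)
    return matmul(mixed, proj)                              # (num_queries, d_lm)


if __name__ == "__main__":
    queries = [[0.5, -0.5], [1.0, 1.0]]                 # 2 hypothetical queries, d = 2
    region_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 region tokens
    proj = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]            # d = 2 -> d_lm = 3
    tokens = mix_region_features(queries, region_feats, proj)
    print(len(tokens), len(tokens[0]))  # number of output tokens and their width
```

The output rows play the role of soft prompt tokens that a language model could consume; in a trained system the queries and projection would be optimized so that these tokens land in a useful part of the embedding space.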
Segment and Caption Anything. Xiaoke Huang, Jianfeng Wang, Yansong Tang, Zheng Zhang, Han Hu, Jiwen Lu, Lijuan Wang, Zicheng Liu. CVPR 2024 | November 2023. We propose a method to efficiently equip the Segment Anything Model (SAM) with the ability...