We introduce a novel adapter, VLSM-Adapter, that can fine-tune pretrained vision-language segmentation models using transformer encoders. Our experiments in widely used CLIP-based segmentation models show that, with only 3 million trainable parameters, the VLSM-Adapter outperforms state-of-the-art and is comparable to the upper bound of end-to-end fine-tuning. Source code: https://github.com/naamiinepal/vlsm-adapter.
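To make the adapter idea concrete, here is a minimal sketch of a generic bottleneck adapter inserted on top of frozen transformer-encoder features. This illustrates the general technique only, not the exact VLSM-Adapter architecture; the embedding width, bottleneck size, and placement are assumptions for the example.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Generic bottleneck adapter: down-project, non-linearity, up-project, residual."""

    def __init__(self, dim: int, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck_dim)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck_dim, dim)
        # Near-zero init of the up-projection keeps the adapted output close
        # to the frozen backbone's output at the start of fine-tuning.
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))

# The pretrained backbone stays frozen; only the small adapters are trained,
# which is how the trainable-parameter count stays in the millions.
dim = 512  # assumed embedding width of a CLIP-style encoder
adapter = BottleneckAdapter(dim)
tokens = torch.randn(2, 77, dim)  # e.g. a batch of text-token embeddings
print(adapter(tokens).shape)      # torch.Size([2, 77, 512])
```

The residual connection plus near-zero initialization means each adapter starts as an identity map, so training can only improve on the frozen model's behavior rather than disrupt it.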
## Setup

For running inference, please update the default configs (such as `ckpt_path`, ...) for the corresponding datamodule (e.g., `...s/datamodule/img_txt_mask/bkai`); a minimal loading sketch is given at the end of this README.

## Results

### Acknowledgement
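As referenced in the inference note above, the sketch below shows what loading an updated `ckpt_path` for prediction looks like in a standard PyTorch Lightning workflow. It is hypothetical: `TinySegModule`, the dummy data, and the checkpoint path are stand-ins, not this repo's actual classes or configs.

```python
import torch
from lightning import LightningModule, Trainer

class TinySegModule(LightningModule):
    """Stand-in segmentation module; not this repo's actual model."""

    def __init__(self):
        super().__init__()
        self.head = torch.nn.Conv2d(3, 1, kernel_size=1)

    def predict_step(self, batch, batch_idx):
        return torch.sigmoid(self.head(batch))

# `ckpt_path` is whatever value the updated config points at;
# this assumes the checkpoint file actually exists on disk.
ckpt_path = "path/to/checkpoint.ckpt"
model = TinySegModule.load_from_checkpoint(ckpt_path)

# Dummy image batch standing in for a real datamodule.
loader = torch.utils.data.DataLoader(torch.randn(4, 3, 64, 64), batch_size=2)
trainer = Trainer(accelerator="cpu", logger=False)
preds = trainer.predict(model, dataloaders=loader)
```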