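1) The main difference is that the pre-training tasks in prior work include only understanding tasks, whereas this paper's pre-training tasks include both understanding tasks (retrieval/classification) and a generation task (captioning). 2) Another point concerns the downstream tasks: earlier work such as VideoBERT/CBT targets action classification/video captioning, so it aims only to obtain a better video representati...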
To this end, we present a Multi-Task Collaborative Network (MTCN) that leverages the synergy between multiple tasks to enhance the performance of image-text retrieval. Specifically, we introduce three unimodal tasks, including text-text matching, image multi-label classification, and text multi-...
Text-to-image person retrieval aims to retrieve relevant target individuals based on given textual descriptions. The main challenge faced by this task is h...
Z Li, Y Xie - Multimedia Systems. Cited by: 0. Published: 2024.
Asymmetric bi-encoder for image-text retrieval. Image-text retrieval aims ...
In the past few years, cross-modal image-text retrieval (ITR) has experienced increased interest in the research community due to its excellent research value and broad real-world application. It is designed for the scenarios where the queries are from one modality and the retrieval galleries fro...
Source code of our AAAI 2024 paper "Cross-Modal and Uni-Modal Soft-Label Alignment for Image-Text Retrieval" - lerogo/aaai24_itr_cusa
During various types of data queries in digital libraries, image retrieval plays a very important role. To better accomplish the image retrieval task, this paper proposes a new content-based image retrieval method. In this method, the image is divided into blocks. Then, the DCT coefficient was com...
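The block-then-DCT idea in the abstract above can be sketched as follows. The abstract is cut off before it says which coefficients are kept or how images are compared, so everything past "split into blocks and apply a DCT" is an assumption made for illustration: this sketch keeps only the DC coefficient of each block and compares feature vectors by Euclidean distance. The names `dct2`, `block_features`, and `distance` are hypothetical, and the image is assumed to be a grayscale grid whose sides are divisible by the block size.

```python
import math


def dct2(block):
    """Orthonormal 2-D DCT-II of a square block given as a list of rows."""
    n = len(block)
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0.0
            for x in range(n):
                for y in range(n):
                    total += (
                        block[x][y]
                        * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                        * math.cos((2 * y + 1) * v * math.pi / (2 * n))
                    )
            cu = math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)
            cv = math.sqrt(1.0 / n) if v == 0 else math.sqrt(2.0 / n)
            out[u][v] = cu * cv * total
    return out


def block_features(image, block_size):
    """Split the image into blocks (row-major order) and keep each block's DC coefficient.

    Assumption: keeping only the DC term is an illustrative choice, not
    necessarily the feature used in the paper.
    """
    features = []
    for top in range(0, len(image), block_size):
        for left in range(0, len(image[0]), block_size):
            block = [row[left:left + block_size] for row in image[top:top + block_size]]
            features.append(dct2(block)[0][0])
    return features


def distance(f1, f2):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f1, f2)))


# Toy 4x4 grayscale image made of four constant 2x2 quadrants (hypothetical data).
image = [
    [10, 10, 20, 20],
    [10, 10, 20, 20],
    [30, 30, 40, 40],
    [30, 30, 40, 40],
]
features = block_features(image, block_size=2)
print(features)
```

With the orthonormal scaling used here, the DC coefficient of an n-by-n block equals n times the block mean, so each feature summarizes the average intensity of its block; a retrieval system would rank database images by `distance` to the query's feature vector.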
Task | BLIP w/ ViT-B | BLIP w/ ViT-B and CapFilt-L | BLIP w/ ViT-L
Image-Text Retrieval (COCO) | Download | - | Download
Image-Text Retrieval (Flickr30k) | Download | - | Download
Image Captioning (COCO) | - | Download | Download
VQA | Download | Download | -
NLVR2 | Download | - | -
Image-Text Retrieval: ...
This task has found extensive application across diverse domains and has received significant scholarly attention in recent years [2]. Specifically, it has been employed for various purposes, including but not limited to restoring the content and enhancing the quality of historical photographs, ...
An active development of semantic-based visual information retrieval methods was made in an attempt to reduce the semantic gap. The semantic-based image retrieval task aims to discover high-level semantic meaning within an image. The mai...
L Stanescu, DD Burdescu, M Brezovan,... - Springer Ne...
For the sample image below, the output text is "a stream in the middle of a forest".
Version: 6
Preview
license : bsd-3-clause
task : image-to-text
SharedComputeCapacityEnabled
huggingface_model_id : Salesforce/blip-image-captioning-base
author : Salesforce
hiddenlayerscanned
inference_compute_allow_list :...