The core contribution of the paper is a method called WiSE-FT (weight-space ensembling for fine-tuning), which fine-tunes a zero-shot model to improve accuracy on a specific target distribution while preserving its robustness. Zero-shot models such as CLIP or ALIGN maintain consistent accuracy across a range of data distributions without being fine-tuned on any particular dataset. However...
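In essence, WiSE-FT linearly interpolates the weights of the zero-shot and fine-tuned models. The sketch below is a minimal PyTorch-style illustration, assuming both models share the same architecture and state-dict keys; it is not the authors' reference implementation.

```python
import copy

def wise_ft(zero_shot_model, fine_tuned_model, alpha=0.5):
    """Weight-space ensemble: linearly interpolate matching parameters.

    theta = (1 - alpha) * theta_zero_shot + alpha * theta_fine_tuned
    alpha = 0 recovers the zero-shot model, alpha = 1 the fine-tuned one.
    """
    zs_state = zero_shot_model.state_dict()
    ft_state = fine_tuned_model.state_dict()
    merged = {
        key: (1.0 - alpha) * zs_state[key] + alpha * ft_state[key]
        for key in zs_state
    }
    ensembled = copy.deepcopy(zero_shot_model)  # keep the originals untouched
    ensembled.load_state_dict(merged)
    return ensembled
```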
robust fine-tuning of zero-shot models "Robust fine-tuning of zero-shot models" refers to fine-tuning a zero-shot model in a way that preserves its robustness. In machine learning, zero-shot learning means that a model can make inferences or predictions for a task without having seen training data for that specific task. In practice, a pre-trained model is typically used and then fine-tuned on a new task to adapt it to that task. However, because the new...
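To make the zero-shot setting concrete, the sketch below classifies an image against a set of text prompts with a CLIP-style dual encoder. The model name, image path, and label set are illustrative assumptions; the `encode_image`/`encode_text` calls follow the open-source OpenAI CLIP package.

```python
import torch
import clip  # OpenAI CLIP (https://github.com/openai/CLIP); assumed installed
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Class names become text prompts; no task-specific training data is used.
class_names = ["cat", "dog", "car"]  # illustrative label set
text = clip.tokenize([f"a photo of a {c}" for c in class_names]).to(device)
image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # Cosine similarity between the image and every class prompt.
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
    logits = 100.0 * image_features @ text_features.T
    prediction = class_names[logits.argmax(dim=-1).item()]

print(prediction)
```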
Robust fine-tuning of zero-shot models. Mitchell Wortsman∗ (University of Washington, mitchnw@cs.washington.edu), Gabriel Ilharco∗ (University of Washington, gamaga@cs.washington.edu), Jong Wook Kim (OpenAI, jongwook@openai.com), Mike Li (Columbia University, mli24@gsb.columbia.edu), Simon Kornblith (Google) ...
Nonetheless, adapting these models to downstream tasks typically requires fine-tuning, which reduces generalization to out-of-distribution (OOD) data and demands extensive computational resources. We introduce Robust Adapter (R-Adapter), a novel method for fine-tuning zero-shot models to downstream ...
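The specifics of R-Adapter are not reproduced here; for orientation, the bottleneck adapter that such methods build on can be sketched as a small residual module whose parameters are the only ones trained while the pre-trained weights stay frozen. The dimensions below are assumptions for illustration.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Generic residual adapter: down-project, non-linearity, up-project."""

    def __init__(self, dim: int, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck_dim)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck_dim, dim)
        # Initialise the up-projection at zero so the adapter starts as an
        # identity map and does not disturb the pre-trained representation.
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection: only the small bottleneck is trainable.
        return x + self.up(self.act(self.down(x)))
```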
Fig. 1: Nucleotide Transformer: effective methodology to pre-train, fine-tune, analyze and compare foundational models for genomics. a,b, Overview of NT training (a) and application for downstream genomic prediction tasks through fine-tuning (b). Downstream task prediction through probing is simila...
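Probing, mentioned in the caption as an alternative to fine-tuning, trains only a lightweight head on frozen embeddings. A generic PyTorch sketch follows; `encoder` stands in for any pre-trained model that returns pooled sequence embeddings and is an assumption, not the NT API.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_probe(encoder: nn.Module, embed_dim: int, num_classes: int):
    """Freeze the pre-trained encoder and attach a trainable linear probe."""
    for p in encoder.parameters():
        p.requires_grad = False
    probe = nn.Linear(embed_dim, num_classes)
    optimizer = torch.optim.AdamW(probe.parameters(), lr=1e-3)
    return probe, optimizer

def probe_step(encoder, probe, optimizer, x, y):
    """One training step: gradients flow only into the probe."""
    with torch.no_grad():
        features = encoder(x)  # [batch, embed_dim] pooled embeddings
    loss = F.cross_entropy(probe(features), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```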
With the surge of large-scale pre-trained models (PTMs), fine-tuning these models for numerous downstream tasks has become a crucial problem. Consequently, parameter-efficient transfer learning (PETL) of large models has attracted considerable attention. While recent PETL methods showcase impressive performance,...
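One widely used PETL pattern (not necessarily the method discussed in this excerpt) adds a trainable low-rank update to a frozen weight matrix, in the style of LoRA. A minimal sketch, with rank and scale chosen purely for illustration:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update B @ A."""

    def __init__(self, base: nn.Linear, rank: int = 8, scale: float = 1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # pre-trained weights stay fixed
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = scale

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Only A and B (a small fraction of the parameters) receive gradients.
        return self.base(x) + self.scale * (x @ self.A.T) @ self.B.T
```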
Large pre-trained models such as CLIP or ALIGN offer consistent accuracy across a range of data distributions when performing zero-shot inference (i.e., without fine-tuning on a specific dataset). Although existing fine-tuning methods substantially improve accuracy on a given target distribution, ...
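The tension between target accuracy and robustness can be traced empirically by sweeping the mixing coefficient of the weight-space ensemble sketched earlier and evaluating each interpolated model on in-distribution and distribution-shift test sets. The models and dataloaders below are placeholders, not artifacts from the paper.

```python
import torch

@torch.no_grad()
def top1_accuracy(model, loader, device="cpu"):
    """Top-1 accuracy over a dataloader of (image, label) batches."""
    model.eval()
    correct, total = 0, 0
    for images, labels in loader:
        preds = model(images.to(device)).argmax(dim=-1).cpu()
        correct += (preds == labels).sum().item()
        total += labels.numel()
    return correct / total

# alpha = 0 is the zero-shot model, alpha = 1 the fully fine-tuned one;
# intermediate values trade off accuracy on the target distribution
# against accuracy under distribution shift.
for alpha in (0.0, 0.25, 0.5, 0.75, 1.0):
    model = wise_ft(zero_shot_model, fine_tuned_model, alpha=alpha)  # sketch above
    print(alpha,
          top1_accuracy(model, in_dist_loader),   # placeholder: target test set
          top1_accuracy(model, shifted_loader))   # placeholder: shifted test set
```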
Table 10: Cross-modal retrieval top-1/5 recalls on MSCOCO (fine-tuned) and Flickr30k (zero-shot transfer). m2-Mix generally enhances image-text retrieval of a SOTA vision-language model, CoCa. Method | Flickr30k (zero-shot): i→t (R1), i→t (R5), t→i (R1), t→i (R5) ...
To this end, we devise a new fine-tuning method for robust representations, equipping them with better alignment and uniformity. First, we propose a Geodesic Multi-Modal Mixup that mixes image and text embeddings to generate hard negative samples on the hypersphere. Then, we fine-tune the model ...
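One natural way to realise geodesic mixing on the unit hypersphere is spherical interpolation (slerp) between L2-normalised image and text embeddings; the sketch below follows that reading and is not necessarily the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def geodesic_mix(img_emb: torch.Tensor, txt_emb: torch.Tensor, lam: float) -> torch.Tensor:
    """Mix two embeddings along the geodesic of the unit hypersphere (slerp).

    Both inputs are L2-normalised so the mixture also lies on the sphere,
    yielding intermediate ("hard negative") points between image and text.
    """
    u = F.normalize(img_emb, dim=-1)
    v = F.normalize(txt_emb, dim=-1)
    # Angle between the two embeddings, clamped for numerical stability.
    cos_theta = (u * v).sum(dim=-1, keepdim=True).clamp(-1 + 1e-7, 1 - 1e-7)
    theta = torch.acos(cos_theta)
    mixed = (torch.sin((1 - lam) * theta) * u + torch.sin(lam * theta) * v) / torch.sin(theta)
    return mixed
```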