In this study, we aim to leverage the general capabilities of pre-trained models for knowledge sharing across different tasks while endowing them with the capability for continual learning. To this end, we propose
Smaller Model / Network Compression: Network Pruning, Knowledge Distillation, Parameter Quantization, Architecture Design. Network Architecture: Transformer-XL (segment-level recurrence with state reuse), Reformer, Longformer. 1.2 Types of pre-trained models: autoregressive pre-training: the GPT series (GPT, GPT-2, GPT-3); MLM-based pre-...
Specifically: (1) during pre-training, language models focus on unsupervised learning tasks (e.g. predicting masked words based on the surrounding context), and (2) during fine-tuning, the pre-trained model is further trained on supervised learning tasks (e.g. sequence labeling). Pre-trained...
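A minimal sketch of these two stages, using Hugging Face Transformers; the checkpoint name, the masked sentence, and the label count for the sequence-labeling head are illustrative assumptions, not details from the work described above.

```python
import torch
from transformers import (AutoTokenizer, AutoModelForMaskedLM,
                          AutoModelForTokenClassification)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# (1) Pre-training objective: predict a masked word from its surrounding context (MLM).
mlm_model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
enc = tokenizer("The cat sat on the mat.", return_tensors="pt")
labels = enc["input_ids"].clone()
masked = enc["input_ids"].clone()
pos = enc["input_ids"][0].tolist().index(tokenizer.convert_tokens_to_ids("mat"))
masked[0, pos] = tokenizer.mask_token_id          # hide the word "mat"
labels[masked != tokenizer.mask_token_id] = -100  # score only the masked position
mlm_loss = mlm_model(input_ids=masked,
                     attention_mask=enc["attention_mask"],
                     labels=labels).loss

# (2) Fine-tuning: reuse the pre-trained encoder for a supervised task such as
#     sequence labeling (token classification; 9 tags is an assumption).
tagger = AutoModelForTokenClassification.from_pretrained("bert-base-uncased",
                                                         num_labels=9)
enc = tokenizer("Alice lives in Paris", return_tensors="pt")
tag_labels = torch.zeros_like(enc["input_ids"])   # placeholder gold tag ids
ft_loss = tagger(input_ids=enc["input_ids"],
                 attention_mask=enc["attention_mask"],
                 labels=tag_labels).loss
```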
SLCA: Slow Learner with Classifier Alignment for Continual Learning on a Pre-trained Model @ ICCV 2023 **AND** SLCA++: Unleash the Power of Sequential Fine-tuning for Continual Learning with Pre-training - GengDavid/SLCA
As reference methods, we tested Fine-tuning (FT) and R-SSL, which use a source pre-trained model and a real source dataset for SSL. For FT and R-SSL, we used ImageNet and a subset of ImageNet (Filtered ImageNet), which was collected by confidence-based filtering similar to Xie et al. (...
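A rough sketch of what confidence-based filtering can look like: keep only source images that a pre-trained classifier labels with high confidence. The backbone, dataset path, threshold, and batch size are assumptions for illustration, not the filtering procedure used in the cited work.

```python
import torch
import torch.nn.functional as F
from torchvision import models, transforms
from torchvision.datasets import ImageFolder
from torch.utils.data import DataLoader, Subset

tfm = transforms.Compose([
    transforms.Resize(256), transforms.CenterCrop(224), transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
source = ImageFolder("imagenet/train", transform=tfm)   # hypothetical path
loader = DataLoader(source, batch_size=64, shuffle=False)

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1).eval()

keep_indices, threshold = [], 0.5        # confidence threshold is an assumption
with torch.no_grad():
    for batch_idx, (images, _) in enumerate(loader):
        probs = F.softmax(model(images), dim=1)
        conf, _ = probs.max(dim=1)                        # max softmax probability
        for i in torch.nonzero(conf > threshold).flatten():
            keep_indices.append(batch_idx * 64 + int(i))  # index into the dataset

filtered = Subset(source, keep_indices)  # the retained "Filtered ImageNet" subset
```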
Our CPRM framework includes three modules: 1) employing both queries and multi-field items to jointly pre-train for enhancing domain knowledge, 2) applying in-context pre-training, a novel approach where LLMs are pre-trained on a sequence of related queries or items (a rough sketch follows below), and 3) conducting reading...
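A rough sketch of the idea behind module 2: rather than packing unrelated documents into one context window, related queries/items are concatenated into a single pre-training sequence so the LM sees them in-context. The grouping strategy, separator, sequence length, and tokenizer interface are assumptions, not the CPRM implementation.

```python
from typing import List

def build_incontext_sequences(related_groups: List[List[str]],
                              tokenizer,
                              max_len: int = 2048,
                              sep: str = "\n") -> List[List[int]]:
    """Turn groups of related queries/items into packed LM training sequences."""
    sequences = []
    for group in related_groups:
        ids: List[int] = []
        for text in group:                       # related texts stay adjacent
            ids.extend(tokenizer.encode(text + sep))
            if len(ids) >= max_len:
                break
        sequences.append(ids[:max_len])          # one pre-training sequence
    return sequences

# Usage sketch: group queries about the same product so the LM sees them together.
# groups = [["red running shoes", "running shoes for men", ...], ...]
# seqs = build_incontext_sequences(groups, tokenizer)
```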
Bridging pre-trained models to continual learning: A hypernetwork-based framework with parameter-efficient fine-tuning techniques. Modern techniques of pre-training and fine-tuning have significantly improved the performance of models on downstream tasks. However, this improvement face... F Ding, C Xu, ...
We demonstrate that, based on these representations, individuals can be distinguished automatically using classifiers trained on features extracted from annotated reference images. Hence, in our approach, we decouple feature extraction with deep neural networks from the decision model used to...
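A minimal sketch of this decoupling: a frozen deep network extracts features, and a separate, lightweight decision model is trained only on features from annotated reference images. The backbone and classifier choice (ResNet-50 + linear SVM) are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torchvision import models
from sklearn.svm import LinearSVC

# 1) Feature extractor: a pre-trained CNN with its classification head removed.
backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
backbone.fc = nn.Identity()      # output 2048-d embeddings instead of logits
backbone.eval()

@torch.no_grad()
def extract(images: torch.Tensor) -> torch.Tensor:
    """images: (N, 3, 224, 224) normalized batch -> (N, 2048) features."""
    return backbone(images)

# 2) Decision model: trained only on extracted features of reference images.
#    ref_images: annotated reference batch; ref_ids: individual identities.
# ref_feats = extract(ref_images).numpy()
# clf = LinearSVC().fit(ref_feats, ref_ids)
# predictions = clf.predict(extract(query_images).numpy())
```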
Inspired by BERT, multilingual BERT (mBERT) was developed and released; this model is trained via multilingual masked language modeling (MMLM) on multilingual corpora [41]. From an intuitive perspective, the use of parallel corpora is conducive to learning cross-lingual representations in ...
In the case of regularization-based continual learning, weight-level importance is used as the strength of an L2 regularization term between the current model’s weights and the weights trained up to the previous task, in order to overcome catastrophic forgetting of previous tasks (Kirkpatrick et...
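A minimal sketch of such an importance-weighted L2 penalty (EWC-style): each weight is pulled toward its value after the previous task, with a per-weight strength given by an importance estimate such as the diagonal Fisher information. The function name, dictionary layout, and lambda value are illustrative assumptions.

```python
import torch
import torch.nn as nn

def ewc_penalty(model: nn.Module,
                old_params: dict,      # {name: tensor} weights after the previous task
                importance: dict,      # {name: tensor} per-weight importance F_i
                lam: float = 100.0) -> torch.Tensor:
    """lambda/2 * sum_i F_i * (theta_i - theta*_i)^2"""
    loss = torch.zeros(())
    for name, param in model.named_parameters():
        if name in old_params:
            loss = loss + (importance[name] * (param - old_params[name]) ** 2).sum()
    return 0.5 * lam * loss

# Training on the current task then minimizes: task_loss + ewc_penalty(model, ...)
```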