Continual Learning (CL) aims to achieve this goal while overcoming catastrophic forgetting of former knowledge when learning new knowledge. Typical CL methods build the model from scratch and grow it with incoming data. However, the advent of the pre-trained model (PTM) era has sparked immense...
MLM-based pre-trained models: the BERT family (BERT, RoBERTa, ALBERT).
1.2.1 Autoregressive Language Models (ALMs): complete the sentence given its prefix. Self-supervised learning: predict any part of the input from any other part. Transformer-based ALMs are built by stacking multiple transformer layers.
1.2.2 Masked Language Models (MLMs): Use the unmasked words to...
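As a concrete illustration of the difference between the two objectives sketched above, here is a minimal PyTorch snippet showing how training labels are built for masked vs. autoregressive language modeling; the tensors and vocabulary are toy stand-ins, not taken from any specific BERT/GPT checkpoint.

```python
import torch
import torch.nn.functional as F

# Toy batch of token ids; in practice these come from a tokenizer.
input_ids = torch.tensor([[5, 17, 42, 8, 23]])
vocab_size = 100

# --- Masked Language Modeling (BERT-style) ---
# Mask ~15% of positions and predict only those from the remaining (unmasked) tokens.
mask_token_id = 0
mask = torch.rand(input_ids.shape) < 0.15
mask[0, 2] = True                        # ensure at least one masked position in this toy example
labels_mlm = input_ids.clone()
labels_mlm[~mask] = -100                 # unmasked positions are ignored by the loss
masked_inputs = input_ids.masked_fill(mask, mask_token_id)

# --- Autoregressive Language Modeling (GPT-style) ---
# Predict token t+1 from tokens <= t: inputs and labels are shifted by one position.
inputs_alm = input_ids[:, :-1]
labels_alm = input_ids[:, 1:]

# With model logits of shape (batch, seq, vocab), both objectives reduce to cross-entropy.
logits_mlm = torch.randn(1, masked_inputs.size(1), vocab_size)   # stand-in for model(masked_inputs)
logits_alm = torch.randn(1, inputs_alm.size(1), vocab_size)      # stand-in for model(inputs_alm)
loss_mlm = F.cross_entropy(logits_mlm.reshape(-1, vocab_size), labels_mlm.reshape(-1), ignore_index=-100)
loss_alm = F.cross_entropy(logits_alm.reshape(-1, vocab_size), labels_alm.reshape(-1))
```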
Inspired by the process of learning new knowledge in human brains, we propose a Bayesian generative model for continual learning built on a fixed pre-trained feature extractor. In this model, knowledge of each old class can be compactly represented by a collection of statistical distributions, e....
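The following is a minimal sketch of this idea, assuming each class is summarized by a mean feature vector plus a shared covariance estimated from frozen-backbone features, and classification is done by Gaussian likelihood (Mahalanobis distance); the paper's actual generative model may be richer than this.

```python
import numpy as np

class GaussianClassMemory:
    """Represent each old class by feature statistics computed from a frozen
    feature extractor, and classify new samples by Gaussian likelihood."""

    def __init__(self, feat_dim, eps=1e-3):
        self.means = {}                      # class id -> mean feature vector
        self.cov = np.eye(feat_dim) * eps    # shared covariance, updated as classes arrive
        self.n_classes = 0

    def add_class(self, class_id, feats):
        # feats: (n_samples, feat_dim) frozen-backbone features of the new class
        mu = feats.mean(axis=0)
        centered = feats - mu
        cov_new = centered.T @ centered / max(len(feats) - 1, 1)
        # Running average of the shared covariance over all classes seen so far.
        self.cov = (self.cov * self.n_classes + cov_new) / (self.n_classes + 1)
        self.means[class_id] = mu
        self.n_classes += 1

    def predict(self, feats):
        # Score each class by negative squared Mahalanobis distance
        # (the Gaussian log-likelihood up to a shared constant).
        inv_cov = np.linalg.inv(self.cov + 1e-6 * np.eye(self.cov.shape[0]))
        classes = list(self.means.keys())
        scores = []
        for c in classes:
            diff = feats - self.means[c]
            scores.append(-np.einsum('nd,dk,nk->n', diff, inv_cov, diff))
        return np.array(classes)[np.stack(scores, axis=1).argmax(axis=1)]
```

Because only per-class statistics are stored, adding a new class never touches the statistics of old classes, which is what keeps the representation free of forgetting.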
This paper describes a continual pre-training method for the masked language model (MLM) objective to enhance the DeBERTa pre-trained language model. Several training strategies are designed to further improve final downstream performance, including data augmentation with supervised transfer, child-...
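As a rough sketch of what continuing MLM pre-training from an existing checkpoint looks like in practice (HuggingFace Transformers), the snippet below is not the paper's recipe: the checkpoint name, the corpus file `domain_corpus.txt`, and the hyperparameters are placeholders.

```python
from datasets import load_dataset
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

checkpoint = "microsoft/deberta-v3-base"        # any MLM-capable checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForMaskedLM.from_pretrained(checkpoint)   # the MLM head may be freshly initialized

# Placeholder in-domain raw text used for continued pre-training.
raw = load_dataset("text", data_files={"train": "domain_corpus.txt"})
tokenized = raw.map(lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
                    batched=True, remove_columns=["text"])

# Dynamic 15% masking applied on the fly, as in standard MLM pre-training.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.15)

args = TrainingArguments(output_dir="deberta-domain", per_device_train_batch_size=8,
                         num_train_epochs=1, learning_rate=5e-5)
Trainer(model=model, args=args, train_dataset=tokenized["train"],
        data_collator=collator).train()
```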
In regularization-based continual learning, the weight-level importance is used as the strength of an L2 regularization between the current model's weights and the weights trained up to the previous task, so as to overcome catastrophic forgetting of previous tasks (Kirkpatrick et al...
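A minimal sketch of this kind of quadratic penalty, assuming a per-parameter importance dictionary (e.g. the diagonal Fisher in EWC) and a snapshot of the weights saved after the previous task; the names `ewc_penalty`, `old_params`, and `importance` are illustrative.

```python
import torch

def ewc_penalty(model, old_params, importance, lam=1.0):
    """Quadratic (L2) penalty on moving each weight away from its value after the
    previous task, scaled by its estimated importance (EWC-style regularization)."""
    loss = 0.0
    for name, p in model.named_parameters():
        if name in old_params:
            loss = loss + (importance[name] * (p - old_params[name]) ** 2).sum()
    return 0.5 * lam * loss

# Inside the training loop on the current task:
#   total_loss = task_loss + ewc_penalty(model, old_params, fisher_diag, lam=10.0)
# where old_params and fisher_diag are dicts of tensors saved after the previous task.
```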
Flexible Weight Tuning and Weight Fusion Strategies for Continual Named Entity Recognition. Yahan Yu, Duzhen Zhang, Xiuyi Chen, Chenhui Chu, 2024.
Expandable Subspace Ensemble for Pre-Trained Model-Based Class-Incremental Learning. Da-Wei Zhou, Hai-Long ...
Implementation code for several papers:
《HyPe: Better Pre-trained Language Model Fine-tuning with Hidden Representation Perturbation》 (ACL 2023) GitHub: github.com/Yuanhy1997/HyPe
《Fully Attentional Networks w...
SLCA: Slow Learner with Classifier Alignment for Continual Learning on a Pre-trained Model @ ICCV 2023 **AND** SLCA++: Unleash the Power of Sequential Fine-tuning for Continual Learning with Pre-training - GengDavid/SLCA
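A rough sketch of the "slow learner" ingredient as it is commonly implemented, i.e. a much smaller learning rate for the pre-trained backbone than for the newly added classifier; the module names and learning rates below are illustrative and not taken from the SLCA code.

```python
import torch
import torch.nn as nn

class PTMClassifier(nn.Module):
    """A pre-trained backbone plus a linear classification head."""
    def __init__(self, backbone, feat_dim, num_classes):
        super().__init__()
        self.backbone = backbone
        self.head = nn.Linear(feat_dim, num_classes)

    def forward(self, x):
        return self.head(self.backbone(x))

# Stand-in backbone (in practice a pre-trained ViT or similar encoder).
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 768))
model = PTMClassifier(backbone, feat_dim=768, num_classes=100)

# "Slow learner": a much smaller learning rate for the backbone than for the classifier.
optimizer = torch.optim.SGD([
    {"params": model.backbone.parameters(), "lr": 1e-4},
    {"params": model.head.parameters(),     "lr": 1e-2},
], momentum=0.9)
```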
Excerpted from the paper 《Continual Pre-Training of Large Language Models: How to (re)warm your model?》. In my view, the reason for a second round of pre-training is that the corpus used in the first pre-training is usually quite general, so a model applied to a domain that demands specialized knowledge will underperform because that knowledge is missing. Continual pre-training lets the model train further on a more specialized corpus (a vertical-domain corpus, ...
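A rough sketch of "re-warming" the learning rate when a second pre-training phase starts: warm up again to a peak value and then decay. The schedule shape and numbers here are illustrative, not the paper's exact settings.

```python
import math

def rewarmed_lr(step, warmup_steps=1000, total_steps=100_000,
                peak_lr=3e-5, min_lr=3e-6):
    """Linear re-warm-up to peak_lr, then cosine decay down to min_lr."""
    if step < warmup_steps:
        return peak_lr * step / max(warmup_steps, 1)
    progress = (step - warmup_steps) / max(total_steps - warmup_steps, 1)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

# Can be plugged into torch.optim.lr_scheduler.LambdaLR as a multiplier of the base LR,
# e.g. lr_lambda=lambda s: rewarmed_lr(s) / 3e-5 when the optimizer's base LR is the peak value.
```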
To alleviate this problem, we propose a new framework of General Memory Augmented Pre-trained Language Model (G-MAP), which augments the domain-specific PLM by a memory representation built from the frozen general PLM without losing any general knowledge. Specifically, we propose a new memory-augmented...
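The exact memory-augmented strategies in G-MAP are not spelled out in the excerpt above, so the following is only a generic sketch, assuming the domain-specific PLM's hidden states attend over a memory of hidden states produced by the frozen general PLM and are fused through a residual connection; the class and variable names are hypothetical.

```python
import torch
import torch.nn as nn

class MemoryAugmentedFusion(nn.Module):
    """Let the domain-specific PLM's hidden states attend over a memory built from
    a frozen general PLM, then fuse the result back with a residual connection."""

    def __init__(self, hidden_dim, num_heads=8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(hidden_dim)

    def forward(self, domain_hidden, general_memory):
        # domain_hidden:  (batch, seq, hidden) from the trainable domain-specific PLM
        # general_memory: (batch, mem, hidden) from the frozen general PLM (computed under no_grad)
        attended, _ = self.cross_attn(query=domain_hidden,
                                      key=general_memory, value=general_memory)
        return self.norm(domain_hidden + attended)

# Usage sketch:
#   with torch.no_grad():
#       general_memory = frozen_general_plm(input_ids).last_hidden_state
#   domain_hidden = domain_plm(input_ids).last_hidden_state
#   fused = MemoryAugmentedFusion(hidden_dim=768)(domain_hidden, general_memory)
```

Keeping the general PLM frozen is what preserves its general knowledge; only the domain-specific PLM and the fusion module receive gradients.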