DoRA builds on LoRA by decomposing the pre-trained weight into a magnitude and a direction. Inspired by weight normalization (Salimans & Kingma), the authors reparameterize the weight matrix into magnitude and direction components to accelerate optimization. The decomposition is:

$$W = m \frac{V}{\|V\|_c} = \|W\|_c \frac{W}{\|W\|_c}$$

where $m$ is the magnitude vector, $V$ is the direction matrix, and $\|\cdot\|_c$ denotes the column-wise vector norm. The authors experiment with VL-BART, following Sun...
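To make the decomposition concrete, here is a minimal sketch (not the official code) that splits a weight matrix into magnitude and unit-norm direction, taking $\|\cdot\|_c$ as the per-column norm, and verifies the reconstruction:

```python
import torch

# Minimal sketch: decompose W into magnitude m and direction V / ||V||_c,
# then verify that m * (V / ||V||_c) reconstructs W exactly.
torch.manual_seed(0)
W = torch.randn(64, 32)                            # pre-trained weight, d x k

m = W.norm(p=2, dim=0, keepdim=True)               # magnitude vector, 1 x k
V = W                                              # direction initialized as W itself
direction = V / V.norm(p=2, dim=0, keepdim=True)   # unit-norm columns

W_rec = m * direction                              # W = m * V / ||V||_c
assert torch.allclose(W, W_rec, atol=1e-6)
print("max reconstruction error:", (W - W_rec).abs().max().item())
```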
The DoRA update can therefore be written as

$$W' = m \frac{W_0 + \Delta V}{\|W_0 + \Delta V\|_c} = m \frac{W_0 + BA}{\|W_0 + BA\|_c}$$

with trainable parameters $B$, $A$, and $m$: the direction update $\Delta V = BA$ is the usual LoRA low-rank product, while the magnitude $m$ is trained directly. Gradient analysis: differentiating with respect to $V' = V + \Delta V$ gives

$$\nabla_{V'}\mathcal{L} = \frac{m}{\|V'\|_c}\left(I - \frac{V'V'^{\top}}{\|V'\|_c^2}\right)\nabla_{W'}\mathcal{L}$$

The weight gradient is scaled by the factor $m / \|V'\|_c$ and projected onto the orthogonal complement of the current direction $V'$. These two effects push the covariance matrix of the gradient closer to the identity matrix, which benefits optimization. Since $V' = V + \Delta V$, the gradient of $V'$ passes directly to $\Delta V$, so these optimization benefits are fully inherited by the LoRA update, improving its optimization stability.
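The formula above maps directly onto a layer with a frozen base weight plus trainable $B$, $A$, $m$. Below is a minimal illustrative sketch (my own simplification, not the official implementation; bias is omitted, and the norm is taken per output unit of a PyTorch weight, i.e. `dim=1` of the `(out, in)` layout):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DoRALinear(nn.Module):
    """Illustrative DoRA layer: W' = m * (W0 + B @ A) / ||W0 + B @ A||_c."""
    def __init__(self, base: nn.Linear, r: int = 8):
        super().__init__()
        out_f, in_f = base.weight.shape
        # Frozen pre-trained weight W0
        self.weight = nn.Parameter(base.weight.detach(), requires_grad=False)
        # LoRA factors: B starts at zero, so W' == W0 at initialization
        self.A = nn.Parameter(torch.randn(r, in_f) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_f, r))
        # Magnitude m initialized to ||W0||_c, so the layer starts unchanged
        self.m = nn.Parameter(self.weight.norm(p=2, dim=1, keepdim=True))

    def forward(self, x):
        directional = self.weight + self.B @ self.A        # W0 + delta_V
        norm = directional.norm(p=2, dim=1, keepdim=True)  # ||W0 + BA||_c
        return F.linear(x, self.m * directional / norm)

layer = DoRALinear(nn.Linear(32, 64), r=4)
print(layer(torch.randn(2, 32)).shape)  # torch.Size([2, 64])
```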
DoRA: Weight-Decomposed Low-Rank Adaptation [ICML2024 (Oral)] The Official PyTorch implementation of DoRA: Weight-Decomposed Low-Rank Adaptation [ICML2024 (Oral, acceptance rate: 1.5%)]. Shih-Yang Liu*, Chien-Yi Wang, Hongxu Yin, Pavlo Molchanov, Yu-Chiang Frank Wang, Kwang-Ting Cheng, Min-Hung Chen ...
Paper: DoRA: Weight-Decomposed Low-Rank Adaptation (arxiv.org). To gain a deeper understanding of the differences between FT and LoRA, the paper first introduces a novel weight-decomposition analysis. This analysis is based on the concept of Weight Normalization, …
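In that analysis, the paper measures how much fine-tuning changes the per-column magnitudes versus the per-column directions of a weight relative to the pre-trained weight. A hedged sketch of those two quantities (the function name and the random stand-in weights are mine):

```python
import torch
import torch.nn.functional as F

def decomposition_deltas(W0: torch.Tensor, Wt: torch.Tensor):
    """Average magnitude change (delta_M) and direction change (delta_D)
    between pre-trained W0 and tuned Wt, computed column-wise."""
    m0 = W0.norm(p=2, dim=0)             # per-column magnitudes of W0
    mt = Wt.norm(p=2, dim=0)             # per-column magnitudes of Wt
    delta_M = (mt - m0).abs().mean()     # mean magnitude change

    cos = F.cosine_similarity(W0, Wt, dim=0)
    delta_D = (1 - cos).mean()           # mean direction change
    return delta_M.item(), delta_D.item()

W0 = torch.randn(64, 32)
Wt = W0 + 0.05 * torch.randn(64, 32)     # stand-in for a fine-tuned weight
print(decomposition_deltas(W0, Wt))
```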
DoRA: Weight-Decomposed Low-Rank Adaptation Shih-Yang Liu, Chien-Yi Wang, Hongxu Yin, Pavlo Molchanov, Yu-Chiang Frank Wang, Kwang-Ting Cheng, Min-Hung Chen Paper: https://arxiv.org/abs/2402.09353 Project page: https://nbasyl.github.io/DoRA-project-page/ DoRA decomposes the pre-trained ...
This repo is now deprecated, please visit NVlabs/DoRA instead!!
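Besides the official repo, DoRA is also exposed through Hugging Face PEFT via the `use_dora` flag of `LoraConfig`. A minimal sketch (the model and `target_modules` below are placeholder choices for GPT-2; adjust per model):

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Enable DoRA through PEFT's LoraConfig; "gpt2" is just a small example model.
model = AutoModelForCausalLM.from_pretrained("gpt2")

config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["c_attn"],  # fused attention projection in GPT-2
    use_dora=True,              # magnitude + low-rank direction update
)
model = get_peft_model(model, config)
model.print_trainable_parameters()
```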