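AffineQuant
Paper: arXiv
Code: https://github.com/bytedance/AffineQuant
AffineQuant is the first to propose using an affine transformation as the equivalent transformation, which further enlarges the optimization space. An affine transformation can be understood as a linear transformation plus a translation, and its constraints are looser than those of an orthogonal transformation. As shown in the figure below, the affine transformation on the far right can flexibly align the weights to matching fixed-point values, to the greatest extent...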
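The notion of an "equivalent transformation" built from a linear map plus a translation can be illustrated with a small numeric check. This is a minimal sketch under assumptions, not AffineQuant's implementation: the identity used here, `(X - delta) A^{-1} (A W) + delta W = X W`, the matrix `A`, the shift `delta`, and the helpers `matmul` and `inv2` are all chosen for illustration, and the exact formulation in the paper may differ.

```python
# Illustrative only: checks that an invertible matrix A plus a shift delta
# can be folded into activations and weights without changing X @ W.

def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def inv2(m):
    """Inverse of a 2x2 matrix (hypothetical helper for this sketch)."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

X = [[1.0, 2.0], [3.0, 4.0]]      # activations: tokens x in_features
W = [[0.5, -1.0], [2.0, 0.25]]    # weights: in_features x out_features
A = [[2.0, 1.0], [0.0, 1.0]]      # assumed affine matrix: invertible, not orthogonal
delta = [0.5, -0.5]               # assumed per-channel translation

# Transformed activations: (X - delta) A^{-1}
X_shifted = [[x - d for x, d in zip(row, delta)] for row in X]
X_t = matmul(X_shifted, inv2(A))

# Transformed weights: A W (this is the tensor a quantizer would see)
W_t = matmul(A, W)

# The translation is compensated by an extra bias term: delta W
bias = matmul([delta], W)[0]

Y_ref = matmul(X, W)
Y_new = [[y + b for y, b in zip(row, bias)] for row in matmul(X_t, W_t)]

max_err = max(abs(p - q) for r1, r2 in zip(Y_ref, Y_new) for p, q in zip(r1, r2))
print("max abs difference:", max_err)
```

In full precision the two outputs agree; any benefit would come from quantizing `W_t` and `X_t` instead of `W` and `X`, with `A` and `delta` chosen to make those tensors easier to quantize. Unlike an orthogonal matrix, `A` only needs to be invertible, which is the looser constraint the text refers to.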
V. Reference links
1. https://arxiv.org/abs/2406.01721
2. https://github.com/Hsu1023/DuQuant
3. https://arxiv.org/abs/2402.17762
4. https://arxiv.org/abs/2211.10438
5. https://arxiv.org/abs/2308.13137
6. https://arxiv.org/abs/2306.11987
7. https://arxiv.org/abs/2405.16406
8. http...
[2024/09/26] 🌟 Our DuQuant paper has been accepted for an Oral presentation at NeurIPS 2024 (only top 1% out of 15,671 submissions)! 🎉 Cheers!
[2024/09/06] 🔥 We release the code!
[2024/06/03] 🚀 Our paper is available on arXiv!
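A team from the Institute of Automation (Chinese Academy of Sciences), Tsinghua University, and City University of Hong Kong recently had a paper accepted at NeurIPS 2024 (Oral Presentation). For LLM weight-activation quantization, they propose two orthogonal transformations that effectively reduce outliers and set a new 4-bit SOTA. Put simply, in a large language model (LLM), some of the values output by intermediate layers (activations) become very large. These values are called "outliers", and these outliers give the model...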
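Why an orthogonal transformation can reduce outliers without changing the layer output can be shown with a toy example. This is a minimal sketch under assumptions: it uses a fixed normalized 4x4 Hadamard matrix `H` as a stand-in orthogonal transformation, not the specific transformations constructed in DuQuant, and the activation row `X`, the weights `W`, and the helper names are made up for illustration.

```python
# Illustrative only: an orthogonal matrix H spreads one large activation
# channel across all channels, while (X H)(H^T W) still equals X W.

def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

# Normalized 4x4 Hadamard matrix: entries are +/-0.5, and H @ H^T = I.
H = [[0.5 * v for v in row] for row in [
    [1,  1,  1,  1],
    [1, -1,  1, -1],
    [1,  1, -1, -1],
    [1, -1, -1,  1],
]]

X = [[0.1, -0.2, 40.0, 0.3]]                              # one token, channel 2 is an outlier
W = [[0.2, -0.1], [0.4, 0.3], [-0.5, 0.1], [0.3, -0.2]]   # in_features x out_features

X_rot = matmul(X, H)               # rotated activations
W_rot = matmul(transpose(H), W)    # inverse rotation folded into the weights

peak_before = max(abs(v) for v in X[0])
peak_after = max(abs(v) for v in X_rot[0])

Y_ref = matmul(X, W)
Y_rot = matmul(X_rot, W_rot)
out_err = max(abs(p - q) for p, q in zip(Y_ref[0], Y_rot[0]))

print("peak |activation| before:", peak_before)
print("peak |activation| after: ", peak_after)
print("max output difference:   ", out_err)
```

The peak magnitude drops because the outlier's energy is shared across channels, so a quantizer needs a smaller range to cover the rotated activations. This toy example does not by itself show lower quantization error; that depends on how the transformation is chosen for the actual activation distribution.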