🔍 For more details, please refer to the project page: https://duquant.github.io/. 📰 News [2024/09/26] 🌟 Our DuQuant paper has been accepted for an Oral presentation at NeurIPS 2024 (only top 1% out of 15,671 submissions)! 🎉 Cheers!
Code: GitHub - Hsu1023/DuQuant: [NeurIPS 2024 Oral] DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs. The DuQuant authors observed that, beyond the normal outliers handled by prior work (values that are large in certain channels for essentially every token), there is another type of outlier that had not been specifically addressed yet has a large impact on quantization performance; the authors define it as Massive Outliers. A toy sketch of the two patterns follows.
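To make the distinction concrete, here is a minimal sketch (not taken from the DuQuant codebase) that flags the two outlier patterns on a toy activation tensor; the thresholds and the helper name describe_outliers are illustrative assumptions, not anything from the paper or repo.

```python
# Illustrative sketch only: separates "normal" (channel-wise) outliers from
# "massive" (single-token) outliers in an activation tensor X of shape
# [tokens, channels]. Thresholds are arbitrary toy values.
import torch

def describe_outliers(X: torch.Tensor, ratio: float = 5.0):
    """Flag channels / tokens whose magnitude far exceeds the typical scale."""
    typical = X.abs().median()

    # Normal outliers: a few channels are large for (almost) every token,
    # so the per-channel maximum over tokens stands out.
    per_channel_max = X.abs().amax(dim=0)            # [channels]
    normal_channels = (per_channel_max > ratio * typical).nonzero().flatten()

    # Massive outliers: a handful of individual tokens carry extremely large
    # values at just a few positions, so the per-token maximum stands out.
    per_token_max = X.abs().amax(dim=1)              # [tokens]
    massive_tokens = (per_token_max > 10 * ratio * typical).nonzero().flatten()

    return normal_channels, massive_tokens

# Toy tensor: channel 7 is a "normal" outlier channel, token 3 holds a massive outlier.
X = torch.randn(16, 32)
X[:, 7] *= 10.0        # large across all tokens -> normal outlier channel
X[3, 12] = 400.0       # single extreme entry    -> massive outlier token
print(describe_outliers(X))
```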
The corresponding paper is [2406.01721] DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs [1], and the corresponding code repository is https://github.com/Hsu1023/DuQuant [2].

2. Introduction

2.1 Quantization Method

In every Transformer block, both the multi-head self-attention (MSA) and the feed-forward network (FFN) are essentially built from linear layers, which we write as $\mathbf{Y} = \mathbf{X}\mathbf{W}$, where $\mathbf{X} \in \mathbb{R}^{T \times C_{in}}$ is the input activation over $T$ tokens and $\mathbf{W} \in \mathbb{R}^{C_{in} \times C_{out}}$ is the weight matrix.
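As a reference point for what quantizing such a linear layer means here, the following is a minimal sketch of weight-activation fake quantization of $\mathbf{Y} = \mathbf{X}\mathbf{W}$, assuming per-tensor symmetric quantization for simplicity (real methods, including DuQuant, use finer granularities and extra transformations); fake_quant and quant_linear are illustrative names, not the repo's API.

```python
# Minimal W4A4 fake-quantization sketch for a linear layer Y = X @ W.
# Assumption: per-tensor symmetric uniform quantization.
import torch

def fake_quant(t: torch.Tensor, n_bits: int = 4) -> torch.Tensor:
    qmax = 2 ** (n_bits - 1) - 1                  # e.g. 7 for symmetric INT4
    scale = t.abs().max().clamp(min=1e-8) / qmax  # per-tensor scale
    return (t / scale).round().clamp(-qmax - 1, qmax) * scale

def quant_linear(X: torch.Tensor, W: torch.Tensor, n_bits: int = 4) -> torch.Tensor:
    """Y = X @ W with both operands fake-quantized to n_bits."""
    return fake_quant(X, n_bits) @ fake_quant(W, n_bits)

X = torch.randn(8, 64)      # [T, C_in]
W = torch.randn(64, 128)    # [C_in, C_out]
Y_fp = X @ W
Y_q = quant_linear(X, W)
print((Y_fp - Y_q).abs().mean())   # average quantization error of the layer
```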
Paper: https://eric-mingjie.github.io/massive-activations/index.html. These Massive Outliers are what make algorithms such as SmoothQuant and OmniQuant perform poorly under 4-bit weight-activation (WA) quantization.

Figure 1: Massive outliers significantly increase the difficulty of low-bit weight-activation quantization. Panels (a) and (b) contrast the commonly seen Normal Outliers with the Massive Outliers that appear in the FFN.
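A rough numerical illustration of the claim in Figure 1, under the assumption of per-token symmetric INT4 activation quantization on toy data (this is not the paper's experiment, just a back-of-the-envelope check): a single massive outlier inflates the quantization step of its entire token, so every other channel of that token drowns in rounding error.

```python
# Toy check: how one massive outlier blows up per-token INT4 quantization error.
import torch

def quant_error_per_token(X: torch.Tensor, n_bits: int = 4) -> torch.Tensor:
    qmax = 2 ** (n_bits - 1) - 1
    scale = X.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / qmax  # [tokens, 1]
    Xq = (X / scale).round().clamp(-qmax - 1, qmax) * scale
    return (X - Xq).abs().mean(dim=1)   # mean absolute error per token

X = torch.randn(8, 64)
err_before = quant_error_per_token(X)

X_outlier = X.clone()
X_outlier[0, 0] = 500.0                 # inject one massive outlier into token 0
err_after = quant_error_per_token(X_outlier)

print(err_before[0], err_after[0])      # token 0's error explodes after injection
```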