In this paper, we advocate direct optimization using equivalent affine transformations in post-training quantization (PTQ), a method we call AffineQuant. This approach extends the optimization scope and thus significantly reduces quantization errors. Additionally, by employing the corresponding inverse matrix, we can ensure equivalence ...
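To make the equivalence idea concrete, here is a minimal NumPy sketch. It is not the repository's implementation. It assumes a linear layer `y = x @ w`, an invertible matrix `a` applied to the weight, and its inverse applied to the activation. The names `fake_quantize` and `output_error`, the 4-bit per-tensor rounding scheme, and the randomly perturbed identity matrix are all illustrative assumptions. In the actual method the matrix is optimized, which this sketch does not do.

```python
import numpy as np


def fake_quantize(w, n_bits=4):
    """Symmetric per-tensor round-to-nearest quantization (illustrative only)."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.abs(w).max() / qmax
    return np.clip(np.round(w / scale), -qmax - 1, qmax) * scale


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16))    # activations: 8 tokens, 16 input features
w = rng.standard_normal((16, 32))   # weight: 16 inputs, 32 outputs

# Hypothetical affine matrix: a small perturbation of the identity, which keeps
# it diagonally dominant and therefore invertible.
a = np.eye(16) + 0.01 * rng.standard_normal((16, 16))
a_inv = np.linalg.inv(a)

# Without quantization the transformed layer matches the original one:
# (x @ a_inv) @ (a @ w) == x @ w up to floating-point error.
y_ref = x @ w
y_equiv = (x @ a_inv) @ (a @ w)
equivalent = np.allclose(y_ref, y_equiv)


def output_error(a, a_inv):
    """Output error after quantizing the transformed weight a @ w."""
    return np.linalg.norm(y_ref - (x @ a_inv) @ fake_quantize(a @ w))


# The identity matrix recovers plain round-to-nearest quantization; an
# optimizer would search over `a` to push this error down.
err_identity = output_error(np.eye(16), np.eye(16))
err_affine = output_error(a, a_inv)
print(equivalent, err_identity, err_affine)
```

The random matrix here is not expected to beat the identity baseline. The point is only that any invertible `a` leaves the full-precision output unchanged, so the choice of `a` can be driven entirely by the quantized output error.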
    conda create -n affinequant python=3.10 -y
    conda activate affinequant
    git clone https://github.com/bytedance/AffineQuant.git
    cd AffineQuant
    pip install --upgrade pip
    pip install -e .

We also leverage the kernel from AutoGPTQ to achieve real quantization. So you should also install the bug-fix...
Official implementation of the ICLR 2024 paper AffineQuant - bytedance/AffineQuant