# AutoAWQ Kernels

AutoAWQ Kernels is a new package that is split out of the main repository in order to avoid compilation times.

## Requirements

- Windows: Must use WSL2.
- NVIDIA:
  - GPU: Must be compute capability 7.5 or higher.
  - CUDA Toolkit: Must be 11.8 or higher.

...
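
The NVIDIA requirements above can be checked before installing. The following is a minimal sketch, not part of the package: `parse_version` and `meets_requirements` are hypothetical helper names, and only the two thresholds (compute capability 7.5, CUDA Toolkit 11.8) come from the text.

```python
def parse_version(text):
    """Turn a dotted version string such as '11.8' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def meets_requirements(compute_capability, cuda_version):
    """Check the NVIDIA requirements listed above (hypothetical helper).

    compute_capability: (major, minor) tuple, e.g. (8, 0)
    cuda_version: dotted string, e.g. "12.1"
    """
    # Tuples compare element by element, so (8, 0) >= (7, 5) is True.
    gpu_ok = tuple(compute_capability) >= (7, 5)
    cuda_ok = parse_version(cuda_version) >= (11, 8)
    return gpu_ok and cuda_ok
```

With PyTorch installed, the two inputs could come from `torch.cuda.get_device_capability()` and `torch.version.cuda`.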
    pip install git+https://github.com/casper-hansen/AutoAWQ_kernels.git

Notes on environment variables:

- `TORCH_VERSION`: By default, the build uses the installed version of torch, as reported by `torch.__version__`. You can override it with `TORCH_VERSION`.
- `CUDA_VERSION` or `ROCM_VERSION` can also be used to ...
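
The override behavior described for the environment variables can be pictured as a lookup with a fallback. This is a hedged sketch under stated assumptions, not the package's build script: `resolve_build_versions` is a hypothetical name, and the exact effect of `CUDA_VERSION` and `ROCM_VERSION` is not spelled out here, so they are only read and passed through.

```python
import os


def resolve_build_versions(detected_torch_version, env=None):
    """Return the versions a build might use, preferring environment overrides.

    Hypothetical helper: only the three variable names come from the text.
    """
    env = os.environ if env is None else env
    return {
        # TORCH_VERSION, when set, replaces the value from torch.__version__.
        "torch": env.get("TORCH_VERSION", detected_torch_version),
        # Optional overrides; None means the variable is not set.
        "cuda": env.get("CUDA_VERSION"),
        "rocm": env.get("ROCM_VERSION"),
    }
```

Passing `env` explicitly keeps the function easy to exercise without touching the real process environment.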
    pip install https://github.com/casper-hansen/AutoAWQ_kernels/releases/download/v0.0.2/autoawq_kernels-0.0.2+rocm561-cp310-cp310-linux_x86_64.whl

### Build from source

You can also build from source:

    git clone https://github.com/casper-hansen/AutoAWQ_kernels
    cd AutoAWQ_kernels
    pip install...
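
Picking the right release wheel means reading its filename, which follows the standard wheel naming scheme (name, version, Python tag, ABI tag, platform tag). The sketch below splits the filename from the release URL above; `parse_wheel_filename` is a hypothetical helper and it assumes the common five-field form without the optional build tag.

```python
def parse_wheel_filename(filename):
    """Split a wheel filename into its components (hypothetical helper)."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel filename: " + filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) != 5:
        raise ValueError("expected name-version-python-abi-platform")
    name, version, python_tag, abi_tag, platform_tag = parts
    # A local version label follows '+', for example 'rocm561'.
    public_version, _, local_label = version.partition("+")
    return {
        "name": name,
        "version": public_version,
        "local": local_label,
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }
```

For the wheel above, the `cp310` tags indicate CPython 3.10 and the `rocm561` local label marks the ROCm build.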
AWQ_KERNELS_VERSION changes from "0.0.5" to "0.0.6":

    AWQ_KERNELS_VERSION="0.0.6"
    RELEASE_URL="https://api.github.com/repos/casper-hansen/AutoAWQ_kernels/releases/tags/v${AWQ_KERNELS_VERSION}"
    # Create a directory to download the wheels

setup.py: 23 changes (20 additions, 3 deletions)
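
The `RELEASE_URL` above points at the GitHub REST API endpoint for a release tag, whose JSON response lists the release assets. As a rough sketch of how the wheels might be selected from that response, assuming the payload has already been fetched and decoded: `release_url` and `wheel_urls` are hypothetical helper names, while `assets`, `name`, and `browser_download_url` are fields of the GitHub release payload.

```python
def release_url(version, repo="casper-hansen/AutoAWQ_kernels"):
    """Build the API URL for a tagged release, mirroring RELEASE_URL above."""
    return f"https://api.github.com/repos/{repo}/releases/tags/v{version}"


def wheel_urls(release_payload):
    """Collect download URLs of .whl assets from a decoded release payload."""
    return [
        asset["browser_download_url"]
        for asset in release_payload.get("assets", [])
        if asset["name"].endswith(".whl")
    ]
```

Keeping the network request out of these helpers makes the filtering step testable with a hand-written payload.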
casper-hansen/AutoAWQ_kernels: casper-hansen merged 4 commits into main from mixtral_fused on Feb 14, 2024 (+1,013 −26, 10 files changed).
This has the following benefits: when installing AutoAWQ from source, you do not need to compile the kernels every time
```
pip install autoawq-kernels
```

### Install release wheels

For ROCm and other CUDA versions, you can use the wheels published at each [release](https://github.com/casper-hansen/AutoAWQ_kernels/releases/):

```
pip install https://github.com/casper-hansen/AutoAWQ_kernels/releases/download...
```
I implemented Triton kernels for AWQ inference. They are much faster than the existing CUDA kernels, especially at larger batch sizes. They are also simpler (the core kernel is ~50-100 lines of Triton). The weight format is a bit different (it uses the most recent AWQ weight format, ...
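
For readers unfamiliar with what such a kernel computes: AWQ stores weights as 4-bit integers packed into 32-bit words, and inference maps them back to floats with a scale and zero point. The plain-Python sketch below illustrates only that arithmetic. It is not the Triton kernel, the helper names are hypothetical, and the lowest-nibble-first order is an assumption; real AWQ checkpoints may order the nibbles differently.

```python
def unpack_int4(packed):
    """Unpack eight 4-bit values from one 32-bit integer, lowest nibble first.

    The nibble order is an assumption for illustration only.
    """
    return [(packed >> (4 * i)) & 0xF for i in range(8)]


def dequantize(packed, scale, zero_point):
    """Map 4-bit codes back to floats: w = (q - zero_point) * scale."""
    return [(q - zero_point) * scale for q in unpack_int4(packed)]
```

A GPU kernel would typically apply the same two steps to many packed words in parallel before the matrix multiply.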
- Support Python 3.12
- Bump to 0.0.8
- Deprecate ROCm GitHub Actions build (it's too old; ROCm to be supported with Triton kernel)