feat: update exllamav2 kernels (huggingface#1370)

Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>

habana-main (huggingface/tgi-gaudi#134)
1 parent 3e22ad9, commit 7eeabb9
17 files changed, +525 -255 lines changed

server exllama...
- Faster, better kernels
- Cleaner and more versatile codebase
- Support for a new quant format (see below)

Performance

Some quick tests to compare performance with V1. There may be more performance optimizations in the future, and speeds will vary across GPUs, with slow CPUs still being a potential...