CUDA kernels for auto_gptq are not installed; this will result in very slow inference speed. This may be because: you disabled CUDA extension compilation by setting BUILD_CUDA_EXT=0 when installing auto_gptq from source, you are using PyTorch without CUDA support, or CUDA and nvcc are not installed...
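A quick way to narrow down which of those causes applies is a short check in the same Python environment. This is only a minimal diagnostic sketch and assumes nothing beyond PyTorch and the standard library:

import shutil
import torch

# Is this a CUDA-enabled PyTorch build?
print("torch.cuda.is_available():", torch.cuda.is_available())
print("torch.version.cuda:", torch.version.cuda)  # None on CPU-only builds

# Is nvcc on PATH? It is needed to compile the extension from source.
print("nvcc:", shutil.which("nvcc"))

# BUILD_CUDA_EXT only matters at install time; if it was set to 0, reinstall
# auto_gptq from source without it (or with BUILD_CUDA_EXT=1).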
File "C:\Users\wuyux\anaconda3\envs\localgpt\lib\site-packages\auto_gptq\nn_modules\qlinear\qlinear_cuda_old.py", line 83, in init self.autogptq_cuda = autogptq_cuda_256 NameError: name 'autogptq_cuda_256' is not defined 2023-07-23 17:08:08,075 - INFO - duckdb.py:414 - ...
It won’t work like this for OPT; you should use the from_pretrained method: the checkpoint only contains the base model, while the model obtained with AutoModelForCausalLM will have more keys (like the decoder) which are tied, and also parameter names that ...
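For reference, a minimal sketch of the suggested from_pretrained route; the facebook/opt-125m checkpoint name is only illustrative:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "facebook/opt-125m"  # illustrative OPT checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
# from_pretrained builds the full causal-LM wrapper, so the extra/tied keys and
# renamed parameters mentioned above are handled for you.
model = AutoModelForCausalLM.from_pretrained(model_name)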
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm. - Remove use_cuda_fp16 arg. GPTQ kernels are fp16 by default. by Qubitium · Pull Request #37 · Qubitium/AutoGPTQ
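After that change, loading a quantized checkpoint no longer takes the flag; a hedged sketch, assuming the usual AutoGPTQForCausalLM.from_quantized entry point and a placeholder model path:

from auto_gptq import AutoGPTQForCausalLM

# No use_cuda_fp16 argument any more: the GPTQ CUDA kernels run in fp16 by default.
model = AutoGPTQForCausalLM.from_quantized("path/to/quantized-model", device="cuda:0")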
Hi, I'm having a lot of problems getting AutoGPTQ compiled when using a Docker image. I've tried:
RUN pip install auto-gptq==0.2.0
and
RUN /bin/bash -o pipefail -c 'cd /root && \
    git clone https://github.com/PanQiWei/AutoGPTQ && \
    cd AutoGPTQ &&...
set(LLAMA_CUDA_DMMV_X "32" CACHE STRING "llama: x stride for dmmv CUDA kernels")
set(LLAMA_CUDA_DMMV_Y "1" CACHE STRING "llama: y block size for dmmv CUDA kernels")
if (GGML_CUBLAS_USE)
    target_compile_definitions(ggml${SUFFIX} PRIVATE GGML_USE_CUBLAS GGML_CUDA_DMMV_X=${...
Subsequent to this, we have fixed an issue with the registration of Pad kernels for the CUDA EP and improved the kernel's performance. Based on the logs you shared above, I think the 6 Pad nodes should be placed on CUDA now, and their performance should be better than before. So the improvement sho...
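One way to double-check the placement is to enable verbose session logging in onnxruntime, which reports the execution provider each node (including the Pad nodes) is assigned to at session creation. A sketch assuming onnxruntime-gpu is installed; the model path is a placeholder:

import onnxruntime as ort

so = ort.SessionOptions()
so.log_severity_level = 0  # verbose: logs node-to-EP assignments
sess = ort.InferenceSession(
    "model.onnx",  # placeholder path
    sess_options=so,
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
print(sess.get_providers())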
(out var ctx, CUctx_flags.CU_CTX_SCHED_AUTO, dev));
checkCudaErrors(cuCtxSetCurrent(ctx));
cuPrintCurrentContextInfo();
#endif
#if USE_CUDA
gpt2_load_kernels(model);
#endif
// read in model from a checkpoint file
using (SafeFileHandle model_file = new SafeFileHandle(fopen(checkpoint_...