Regarding the error you reported, "cannot load cpu or cuda kernel, quantization failed:", based on the reference information and hints provided, here are some possible solutions:

Confirm the environment configuration:
- Make sure CUDA and cuDNN are correctly installed and configured. You can verify the CUDA installation by running the CUDA sample programs.
- Check that the system environment variables include the correct CUDA paths.
- Check the Py...
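The path checks above can be sketched in a few lines of stdlib-only Python (a minimal sketch; `CUDA_HOME` and `CUDA_PATH` are the conventional variable names, and `nvcc`/`nvidia-smi` are the usual tools for the toolkit and driver — adjust to your install):

```python
import os
import shutil

# Report the environment variables the CUDA path is usually set through.
for var in ("CUDA_HOME", "CUDA_PATH", "LD_LIBRARY_PATH"):
    print(f"{var} = {os.environ.get(var, '<unset>')}")

# Are the toolkit compiler and the driver tool on PATH?
for tool in ("nvcc", "nvidia-smi"):
    print(f"{tool} found at: {shutil.which(tool)}")
```

If `nvcc` resolves but `CUDA_HOME` is unset (or points at a different toolkit version), that mismatch is a common cause of kernels failing to load.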
max_logprobs=20, disable_log_stats=False, quantization=None, rope_scaling=None, rope_theta=None, enforce_eager=False, max_context_len_to_capture=None, max_seq_len_to_capture=8192, disable_custom_all_reduce=False, tokenizer
Your current environment

The output of `python collect_env.py`:

Collecting environment information...
PyTorch version: 2.6.0.dev20241122+rocm6.2
Is debug build: False
CUDA used to build PyTorch: N/A
ROCM used to build PyTorch: 6.2.41133-d...
Absolutely. I'm not sure if this is the right way to test it, but this is what popped up instantly:

```python
import torch

def check_pytorch_gpu():
    try:
        if torch.cuda.is_available():
            print(f"PyTorch can access {torch.cuda.device_count()} GPU(s).")
            for i in range(torch.cuda.device_count()):
                print(f"  GPU {i}: {torch.cuda.get_device_name(i)}")
        else:
            print("PyTorch cannot access any GPU.")
    except Exception as e:
        print(f"Error while checking GPUs: {e}")

check_pytorch_gpu()
```
_pickle.UnpicklingError: Weights only load failed. In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code...
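The message above points at the fix itself: either opt out of the new default for checkpoints you trust, or keep the safe default. A minimal sketch (the checkpoint here is created locally just for demonstration):

```python
import os
import tempfile

import torch

# Build a small trusted checkpoint just for demonstration.
path = os.path.join(tempfile.gettempdir(), "demo_ckpt.pt")
torch.save({"w": torch.zeros(2, 3)}, path)

# PyTorch >= 2.6 default: weights_only=True (safe; tensors and plain
# containers only, no arbitrary pickled objects).
safe = torch.load(path, weights_only=True)

# Opting out re-enables full pickle, and with it arbitrary code
# execution -- only do this for checkpoints from a trusted source:
trusted = torch.load(path, weights_only=False)

print(safe["w"].shape)  # torch.Size([2, 3])
```

If the checkpoint contains custom classes, PyTorch 2.4+ also offers `torch.serialization.add_safe_globals([...])` to allowlist them while keeping `weights_only=True`.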
  in <module>
    from vllm.model_executor.layers.quantization import (QUANTIZATION_METHODS,
  File "/mnt/MSAI/home/cephdon/sources/vllm/vllm/model_executor/__init__.py", line 1, in <module>
    from vllm.model_executor.parameter import (BasevLLMParameter,
  File "/mnt/MSAI/home/cephdon/sources/v...
CUDA SETUP: PyTorch settings found: CUDA_VERSION=118, Highest Compute Capability: 8.6.
CUDA SETUP: To manually override the PyTorch CUDA version please see: https://github.com/TimDettmers/bitsandbytes/blob/main/how_to_use_nonpytorch_cuda.md ...
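The override the log refers to is done through an environment variable. A sketch assuming the `BNB_CUDA_VERSION` mechanism described in the linked how_to_use_nonpytorch_cuda.md guide (the toolkit path below is hypothetical — point it at your actual install):

```shell
# bitsandbytes reads BNB_CUDA_VERSION to load a CUDA runtime other than
# the one PyTorch was built with; "118" selects CUDA 11.8.
export BNB_CUDA_VERSION=118
# Make that toolkit's libraries visible (hypothetical path; adjust).
export LD_LIBRARY_PATH=/usr/local/cuda-11.8/lib64:$LD_LIBRARY_PATH
```

After setting these, re-import bitsandbytes and check its CUDA SETUP banner to confirm the override was picked up.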
PyTorch version: 2.5.1
Is debug build: False
OS: EulerOS 2.0 (SP10) (aarch64)
GCC version: (GCC) 7.3.0
Clang version: Could not collect
CMake version: version 3.16.5
Libc version: glibc-2.28
Python version: 3.10.4 (main, Mar 24 2025, 09:38:57) [GCC 7.3.0] (64-bit runtime...
On head of main

Your current environment

Collecting environment information...
PyTorch version: N/A
Is debug build: N/A
CUDA used to build PyTorch: N/A
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.4 LTS (x86_64)
GCC version: (Ubuntu ...
I saw your issue on the rknn-toolkit repo - you said that you had fixed this problem. As far as I understand, your repo performs two-step hybrid quantization, but I have no idea what you did afterwards to make it work. Here is my generated config from step 1. I also tried to change float16 to...