Regarding your question "cannot load cpu or cuda kernel, quantization failed:", based on the reference information and hints provided, here are some possible solutions. Confirm the environment configuration: make sure CUDA and cuDNN are correctly installed and configured (you can verify the CUDA installation by running the CUDA sample programs), and check that the system environment variables point to the correct CUDA path. Check Py
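The environment checks above (CUDA path, environment variables) can be sketched with a small stdlib-only helper. This is a hypothetical convenience function, not part of vLLM or CUDA; it only inspects `PATH` and common environment variables and runs no CUDA code:

```python
import os
import shutil

def cuda_env_report():
    """Collect basic CUDA environment facts (stdlib only; no torch required).

    Hypothetical helper for the sanity checks above: it only inspects PATH
    and common environment variables, it does not execute any CUDA code.
    """
    return {
        "nvcc_on_path": shutil.which("nvcc"),       # None if nvcc is not on PATH
        "CUDA_HOME": os.environ.get("CUDA_HOME"),   # commonly /usr/local/cuda
        "LD_LIBRARY_PATH": os.environ.get("LD_LIBRARY_PATH"),
    }

if __name__ == "__main__":
    for key, value in cuda_env_report().items():
        print(f"{key}: {value}")
```

If `nvcc_on_path` is `None` or `CUDA_HOME` is unset, fixing that is usually the first step before debugging kernel-load failures.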
max_logprobs=20, disable_log_stats=False, quantization=None, rope_scaling=None, rope_theta=None, enforce_eager=False, max_context_len_to_capture=None, max_seq_len_to_capture=8192, disable_custom_all_reduce=False, tokenizer
in <module>
    from vllm.model_executor.layers.quantization import (QUANTIZATION_METHODS,
  File "/mnt/MSAI/home/cephdon/sources/vllm/vllm/model_executor/__init__.py", line 1, in <module>
    from vllm.model_executor.parameter import (BasevLLMParameter,
  File "/mnt/MSAI/home/cephdon/sources/v...
CUDA SETUP: PyTorch settings found: CUDA_VERSION=118, Highest Compute Capability: 8.6.
CUDA SETUP: To manually override the PyTorch CUDA version please see: https://github.com/TimDettmers/bitsandbytes/blob/main/how_to_use_nonpytorch_cuda.md
CUDA SETUP: Loading binary C:\SQl coder\sqlenv\Lib...
On head of main

Your current environment
Collecting environment information...
PyTorch version: N/A
Is debug build: N/A
CUDA used to build PyTorch: N/A
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.4 LTS (x86_64)
GCC version: (Ubuntu ...
Your current environment
The output of `python collect_env.py`:
Collecting environment information...
PyTorch version: 2.6.0.dev20241122+rocm6.2
Is debug build: False
CUDA used to build PyTorch: N/A
ROCM used to build PyTorch: 6.2.41133-d...
_pickle.UnpicklingError: Weights only load failed. In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code...
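As the error message says, PyTorch 2.6 flipped the default of `weights_only` in `torch.load` from `False` to `True`. A minimal sketch of opting back into the old behavior, assuming the checkpoint path is yours (the path and function name here are hypothetical; `weights_only=False` can execute arbitrary code during unpickling, so only use it on checkpoints from a source you trust):

```python
def load_trusted_checkpoint(path):
    """Load a checkpoint with the pre-2.6 torch.load default.

    WARNING (hypothetical helper): weights_only=False runs the full pickle
    machinery and can execute arbitrary code. Only use it on checkpoints you
    trust. For specific safe classes, prefer allow-listing them via
    torch.serialization.add_safe_globals instead of disabling the check.
    """
    import torch  # imported lazily so the sketch parses without torch installed

    return torch.load(path, weights_only=False)
```

Usage would be `state = load_trusted_checkpoint("model.ckpt")`; the safer long-term fix is to allow-list the offending classes rather than disable the check globally.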
pip24.0protobuf4.25.0psutil5.9.8py-cpuinfo9.0.0pyarrow17.0.0pycparser2.21pydantic2.6.4pydantic_core2.16.3PyGObject3.36.0PyMySQL1.1.0pynvml11.5.0pyparsing3.1.2pypdfium24.28.0python-apt2.0.1+ubuntu0.20.4.1python-dateutil2.8.2python-iso6392024.2.7python-magic0.4.27python-multipart0.0.9pytorch-lightning...
As a sanity check, can you import PyTorch and verify that it can access your GPUs?

ergleb78 commented Sep 27, 2024 (edited): @DarkLight1337 Absolutely. I'm not sure if this is the right way to test it, but this is what popped up instantly...
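The sanity check asked for above can be sketched as a small script (the function name is my own; the `torch.cuda` calls are the standard ones, and the try/except lets it run even where PyTorch is missing):

```python
def gpu_sanity_check():
    """Check whether PyTorch imports and can see any GPUs.

    Returns a short human-readable summary instead of raising, so it can be
    pasted into an issue report as-is.
    """
    try:
        import torch
    except ImportError:
        return "PyTorch is not importable in this environment"

    if not torch.cuda.is_available():
        return "PyTorch imported, but torch.cuda.is_available() is False"

    names = [torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())]
    return f"{len(names)} GPU(s) visible: {', '.join(names)}"

if __name__ == "__main__":
    print(gpu_sanity_check())
```

If this reports no GPUs while `nvidia-smi` sees them, the PyTorch build (CPU-only, or wrong CUDA/ROCm version) is the usual suspect.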
quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=npu, decoding_config=DecodingConfig(guided_decoding_backend='xgrammar'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=0, served_...