max_logprobs=20, disable_log_stats=False, quantization=None, rope_scaling=None, rope_theta=None, enforce_eager=False, max_context_len_to_capture=None, max_seq_len_to_capture=8192, disable_custom_all_reduce=False, tokenizer
On head of main

Your current environment

Collecting environment information...
PyTorch version: N/A
Is debug build: N/A
CUDA used to build PyTorch: N/A
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.4 LTS (x86_64)
GCC version: (Ubuntu ...
in <module>
    from vllm.model_executor.layers.quantization import (QUANTIZATION_METHODS,
  File "/mnt/MSAI/home/cephdon/sources/vllm/vllm/model_executor/__init__.py", line 1, in <module>
    from vllm.model_executor.parameter import (BasevLLMParameter,
  File "/mnt/MSAI/home/cephdon/sources/v...
CUDA SETUP: PyTorch settings found: CUDA_VERSION=118, Highest Compute Capability: 8.6.
CUDA SETUP: To manually override the PyTorch CUDA version please see: https://github.com/TimDettmers/bitsandbytes/blob/main/how_to_use_nonpytorch_cuda.md
CUDA SETUP: Loading binary C:\SQl coder\sqlenv\Lib...
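The override the log points to works by setting an environment variable before bitsandbytes is imported. A minimal sketch, assuming `BNB_CUDA_VERSION` is the variable described in the linked guide (the value matches the `CUDA_VERSION=118` reported above):

```python
import os

# Must be set BEFORE `import bitsandbytes`; the value is the CUDA
# version with the dot removed (11.8 -> "118"), matching the
# CUDA_VERSION=118 line in the log above.
# Assumption: BNB_CUDA_VERSION is the override variable from the guide.
os.environ["BNB_CUDA_VERSION"] = "118"

# import bitsandbytes as bnb  # would now select the CUDA 11.8 binary
```

The import stays commented out here because the point is only the ordering: the variable has to be in the environment before the library loads its native binary.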
pip 24.0
protobuf 4.25.0
psutil 5.9.8
py-cpuinfo 9.0.0
pyarrow 17.0.0
pycparser 2.21
pydantic 2.6.4
pydantic_core 2.16.3
PyGObject 3.36.0
PyMySQL 1.1.0
pynvml 11.5.0
pyparsing 3.1.2
pypdfium2 4.28.0
python-apt 2.0.1+ubuntu0.20.4.1
python-dateutil 2.8.2
python-iso639 2024.2.7
python-magic 0.4.27
python-multipart 0.0.9
pytorch-lightning...
import torch
max_seq_length = 2048  # Choose any! We auto support RoPE Scaling internally!
dtype = None  # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True  # Use 4bit quantization to reduce memory usage. Can be False.
...
_pickle.UnpicklingError: Weights only load failed. In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code...
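The behavior described in that error can be exercised with an in-memory checkpoint: plain tensors and containers are on the `weights_only` allowlist and still load under the new default, while checkpoints carrying arbitrary pickled objects are what trigger the `UnpicklingError`. A minimal sketch:

```python
import io
import torch

# Save a checkpoint containing only tensors and plain containers.
buf = io.BytesIO()
torch.save({"w": torch.zeros(2)}, buf)

# Under PyTorch >= 2.6 the default is weights_only=True; tensors are
# allowlisted, so this succeeds with no override.
buf.seek(0)
state = torch.load(buf, weights_only=True)
print(state["w"].shape)  # torch.Size([2])

# A checkpoint containing arbitrary pickled Python objects would raise
# UnpicklingError here; passing weights_only=False makes it load again,
# but should be reserved for files you fully trust, since unpickling
# can execute arbitrary code.
```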
As a sanity check, can you import PyTorch and verify that it can access your GPUs? @DarkLight1337 Absolutely. I'm not sure if this is the right way to test it, but this is what popped up instantly:

import torch

def check_pytorch_gpu():
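The snippet is cut off before the function body; a minimal sketch of what such a check might look like, assuming plain `torch.cuda` queries:

```python
import torch

def check_pytorch_gpu():
    # Returns True if this PyTorch build can see at least one CUDA device,
    # printing the name of each visible device along the way.
    if torch.cuda.is_available():
        for i in range(torch.cuda.device_count()):
            print(f"GPU {i}: {torch.cuda.get_device_name(i)}")
        return True
    print("PyTorch cannot see any CUDA devices")
    return False

check_pytorch_gpu()
```

On a machine where the ROCm/CUDA runtime is broken, `torch.cuda.is_available()` returning False (or the import itself failing) narrows the problem to the PyTorch install rather than vLLM.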
quantization=None, rope_scaling=None, rope_theta=None, hf_overrides=None, enforce_eager=False, max_seq_len_to_capture=8192, disable_custom_all_reduce=False, tokenizer_pool_size=0, tokenizer_pool_type='ray', tokenizer_pool_extra_config=None, limit_mm_per_prompt=None, mm_processor_kwargs=No...
Your current environment

The output of `python collect_env.py`

Collecting environment information...
PyTorch version: 2.6.0.dev20241122+rocm6.2
Is debug build: False
CUDA used to build PyTorch: N/A
ROCM used to build PyTorch: 6.2.41133-d...