max_logprobs=20, disable_log_stats=False, quantization=None, rope_scaling=None, rope_theta=None, enforce_eager=False, max_context_len_to_capture=None, max_seq_len_to_capture=8192, disable_custom_all_reduce=False, tokenizer...
(xinf) root@DESKTOP-ESRGKIB:~/vllm# python3 collect_env.py
Collecting environment information...
PyTorch version: 2.5.1+rocm6.2
Is debug build: False
CUDA used to build PyTorch: N/A
ROCM used to build PyTorch: 6.2.41133-dd7f95766
OS: Ubuntu 22.04.5 LTS (x86_64)
GCC version: (cond...
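A quick way to confirm that this ROCm build of PyTorch is the one actually being imported (a minimal sketch; `torch.version.hip` is only populated on ROCm builds):

```python
import torch

print(torch.__version__)          # e.g. 2.5.1+rocm6.2
print(torch.version.hip)          # HIP/ROCm version string on ROCm builds, None otherwise
print(torch.version.cuda)         # None on ROCm builds, matching "CUDA used to build PyTorch: N/A"
print(torch.cuda.is_available())  # True when the HIP device is visible to PyTorch
```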
_pickle.UnpicklingError: Weights only load failed. In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code ...
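If the checkpoint comes from a trusted source, the quickest workaround is to pass `weights_only=False` back to `torch.load`; the safer route is to keep `weights_only=True` and allowlist the specific class named in the full error. A minimal sketch, where `checkpoint.pt` and `MyConfig` are hypothetical stand-ins, not names from the truncated log:

```python
import torch

# Option 1: trusted checkpoints only -- this executes arbitrary pickled code,
# which is exactly what the PyTorch 2.6 default is meant to prevent.
state = torch.load("checkpoint.pt", weights_only=False)

# Option 2: stay on the safe path and allowlist the offending global instead.
from mymodule import MyConfig  # hypothetical class reported in the error message
torch.serialization.add_safe_globals([MyConfig])
state = torch.load("checkpoint.pt", weights_only=True)
```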
On head of main

Your current environment
Collecting environment information...
PyTorch version: N/A
Is debug build: N/A
CUDA used to build PyTorch: N/A
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.4 LTS (x86_64)
GCC version: (Ubuntu ...
import torch
max_seq_length = 2048  # Choose any! We auto support RoPE Scaling internally!
dtype = None  # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True  # Use 4bit quantization to reduce memory usage. Can be False.
...
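These variables follow the standard Unsloth quickstart, where they are passed straight into `FastLanguageModel.from_pretrained`; a minimal sketch of that call (the model name is an illustrative assumption, not taken from the truncated snippet):

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",  # assumed example model
    max_seq_length=max_seq_length,
    dtype=dtype,                # None -> auto-detect float16/bfloat16 for the GPU
    load_in_4bit=load_in_4bit,  # 4-bit quantization to reduce memory usage
)
```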
CUDA SETUP: PyTorch settings found: CUDA_VERSION=118, Highest Compute Capability: 8.6.
CUDA SETUP: To manually override the PyTorch CUDA version please see: https://github.com/TimDettmers/bitsandbytes/blob/main/how_to_use_nonpytorch_cuda.md
...
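The linked guide describes overriding the detected CUDA version through the `BNB_CUDA_VERSION` environment variable; a minimal sketch of setting it from Python, on the assumption that it must be in place before bitsandbytes is first imported:

```python
import os

# Must be set before the first `import bitsandbytes`, since the CUDA setup
# shown above runs at import time. "118" matches CUDA_VERSION=118 in the log.
os.environ["BNB_CUDA_VERSION"] = "118"

import bitsandbytes as bnb
```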
quantization=None, rope_scaling=None, rope_theta=None, hf_overrides=None, enforce_eager=False, max_seq_len_to_capture=8192, disable_custom_all_reduce=False, tokenizer_pool_size=0, tokenizer_pool_type='ray', tokenizer_pool_extra_config=None, limit_mm_per_prompt=None, mm_processor_kwargs=No...
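Most of these engine arguments map one-to-one onto constructor kwargs of `vllm.LLM`; a minimal sketch of pinning a few of them explicitly (the model name is an assumption):

```python
from vllm import LLM

llm = LLM(
    model="facebook/opt-125m",        # assumed example model
    enforce_eager=False,              # enforce_eager=False above: CUDA graph capture stays on
    max_seq_len_to_capture=8192,      # matches max_seq_len_to_capture=8192 above
    disable_custom_all_reduce=False,  # matches disable_custom_all_reduce=False above
)
```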
Your current environment

The output of `python collect_env.py`
Collecting environment information...
PyTorch version: 2.6.0.dev20241122+rocm6.2
Is debug build: False
CUDA used to build PyTorch: N/A
ROCM used to build PyTorch: 6.2.41133-d...
"enable_quantization": false, "enable_emphasis": true, "enable_batch_seeds": true, "comma_padding_backtrack": 20, "CLIP_stop_at_last_layers": 1, "upcast_attn": false, "randn_source": "GPU", "tiling": false, "hires_fix_refiner_pass": "second pass", ...
in <module>
    from vllm.model_executor.layers.quantization import (QUANTIZATION_METHODS,
  File "/mnt/MSAI/home/cephdon/sources/vllm/vllm/model_executor/__init__.py", line 1, in <module>
    from vllm.model_executor.parameter import (BasevLLMParameter,
  File "/mnt/MSAI/home/cephdon/sources/v...