[Bugfix] Remove noisy error logging during local model loading #13458 (Merged)

panf2333 pushed a commit to yottalabsai/vllm referencing this pull request (Feb 18, 2025): [Bugfix] Fix VLLM_USE_MODELSCOPE issue
Running the official image with ModelScope enabled:

```bash
docker run --runtime nvidia --gpus all \
  -v ~/.cache/modelscope:/root/.cache/modelscope \
  --env "VLLM_USE_MODELSCOPE=True" \
  -p 8000:8000 --ipc host -d --name vllm \
  vllm/vllm-openai:v0.5.5 \
  --model LLM-Research/Meta-Llama-3.1-8B-Instruct \
  --trust-remote-code -tp 4
```

the container exits...
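When the container does come up healthy, the OpenAI-compatible endpoint exposed by `-p 8000:8000` can be exercised as below (a minimal sketch, assuming the `openai` Python package is installed and model loading succeeded; the `api_key` value is a placeholder since vLLM requires no key unless one is configured):

```python
# A minimal sketch, assuming the container came up healthy and the model
# finished loading. The vllm/vllm-openai server is OpenAI-compatible on
# the port mapped above.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="EMPTY",  # placeholder; vLLM requires no key unless configured
)
resp = client.chat.completions.create(
    model="LLM-Research/Meta-Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```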
🐛 Describe the bug: when using VLLM_USE_MODELSCOPE with tensor-parallel-size > 1, I found that vLLM will download the model many times...
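A common way to avoid the repeated downloads (a workaround sketch, not the fix from this PR) is to fetch the model once with ModelScope and point vLLM at the resulting local directory, so every tensor-parallel worker reads from disk; the `cache_dir` value is an assumption matching the container mount above:

```python
# A workaround sketch, assuming the `modelscope` package is installed.
# Download the weights once up front; vLLM can then be started with
# --model pointing at the returned local directory, so tensor-parallel
# workers never race to re-download the same files.
from modelscope import snapshot_download

model_dir = snapshot_download(
    "LLM-Research/Meta-Llama-3.1-8B-Instruct",
    cache_dir="/root/.cache/modelscope",  # assumption: matches the mount above
)
print(model_dir)  # pass this path to vLLM via --model
```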
Starting vLLM with `export VLLM_USE_MODELSCOPE=True` fails with:

```
INFO 07-24 08:44:25 model_runner.py:680] Starting to load model LLM-Research/Meta-Llama-3.1-8B-Instruct...
[rank0]: Traceback (most recent call last):
[rank0]:   File "/usr/lib/python3.10/runpy.py", line 196, ...
```
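For context, VLLM_USE_MODELSCOPE acts as a switch between download backends. The sketch below shows the general pattern only; `resolve_model_path` is a hypothetical helper, not vLLM's actual loader code:

```python
# Illustrative pattern only (not vLLM's actual source). When the env var
# VLLM_USE_MODELSCOPE is truthy, weights are resolved via ModelScope's hub
# instead of the Hugging Face Hub; both calls return a local directory.
import os


def resolve_model_path(model: str) -> str:  # hypothetical helper name
    if os.environ.get("VLLM_USE_MODELSCOPE", "").lower() in ("1", "true"):
        from modelscope import snapshot_download  # ModelScope backend
    else:
        from huggingface_hub import snapshot_download  # default backend
    return snapshot_download(model)
```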
- Model Evaluation: Uses EvalScope as the evaluation backend, supporting evaluation of both plain-text and multi-modal models on 100+ datasets.
- Model Quantization: Supports AWQ, GPTQ, and BNB quantized exports; the quantized models can use vLLM/LmDeploy to accelerate inference and can continue training...
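As an illustration of the vLLM acceleration path mentioned above, a quantized export can be served roughly like this (a minimal sketch; the local path `./llama3.1-8b-awq` and the prompt are hypothetical):

```python
# A minimal sketch of the inference-acceleration path: serve an existing
# AWQ-quantized export with vLLM. The local model path is hypothetical.
from vllm import LLM, SamplingParams

llm = LLM(model="./llama3.1-8b-awq", quantization="awq")
params = SamplingParams(temperature=0.7, max_tokens=64)
outputs = llm.generate(["Briefly explain AWQ quantization."], params)
print(outputs[0].outputs[0].text)
```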