🐛 Describe the bug: When using `VLLM_USE_MODELSCOPE` with tensor-parallel-size > 1, I found that vLLM will download the model many...
Problem: For the model service, tensor-parallel-size should be set to the number of GPUs when more than one GPU/vGPU is requested. Solution: Set the tensor-parallel-size using the vgpu nu...
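A minimal sketch of the rule described above, assuming the GPU/vGPU count is already available as an integer. The helper name `tensor_parallel_args` is hypothetical and not taken from the PR; only the `--tensor-parallel-size` flag is the one named in the description.

```python
from typing import List


def tensor_parallel_args(num_gpus: int) -> List[str]:
    """Hypothetical helper: build tensor-parallel CLI arguments from a GPU count."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    if num_gpus == 1:
        # Single GPU: no tensor parallelism is needed, so the flag is omitted.
        return []
    # More than one GPU/vGPU: match tensor-parallel-size to the device count.
    return ["--tensor-parallel-size", str(num_gpus)]
```

The returned list can be appended to whatever command the model service already uses to launch the server; how that command is assembled is outside what the description states.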
[WIP] [Speculative Decoding] Support draft model on different tensor-parallel size than target model (Extended) #5856 (Draft): wooyeonlee0 wants to merge 3 commits into vllm-project:main from wooyeonlee0:spec-draft-tp-gt-1