When using the vllm+cpu backend (no GPU hardware), tensor_parallel_size should default to 1 rather than cuda_count (which equals 0) #3207
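A minimal sketch of the defaulting logic being asked for here, assuming only that torch is installed; the `pick_tensor_parallel_size` helper is hypothetical and just illustrates falling back to 1 when no CUDA devices are visible:

```python
import torch

def pick_tensor_parallel_size(requested=None):
    """Hypothetical helper illustrating the requested default behaviour.

    On a CPU-only host torch.cuda.device_count() returns 0, so taking the
    GPU count as the default would produce an invalid tensor_parallel_size
    of 0; fall back to 1 instead.
    """
    if requested:
        return requested
    gpu_count = torch.cuda.device_count()  # 0 when no GPU hardware is present
    return gpu_count if gpu_count > 0 else 1

print(pick_tensor_parallel_size())  # -> 1 on a CPU-only machine
```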
Your current environment vllm version: '0.5.0.post1' 🐛 Describe the bug When I set tensor_parallel_size=1, it works well. But if I set tensor_parallel_size>1, the following error occurs: RuntimeError: Cannot re-initialize CUDA in forked subproc...
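A hedged workaround sketch: this error typically appears when CUDA has already been initialized in the parent process and vLLM then forks its tensor-parallel workers. Two common mitigations are forcing the spawn start method and not touching CUDA before the engine is constructed. The VLLM_WORKER_MULTIPROC_METHOD variable and the exact behaviour vary across vLLM releases, so treat this as an assumption to verify against your version:

```python
import os

# Ask vLLM to spawn (rather than fork) its tensor-parallel workers.
# Must be set before vllm is imported; availability depends on the vLLM version.
os.environ.setdefault("VLLM_WORKER_MULTIPROC_METHOD", "spawn")

from vllm import LLM, SamplingParams

# Avoid calling torch.cuda.* in the parent process before this point;
# initializing CUDA pre-fork is what triggers the re-initialize error.
llm = LLM(model="meta-llama/Llama-2-7b-hf", tensor_parallel_size=2)
outputs = llm.generate(["Hello"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```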
When tensor_parallel_size=2 is used, the output is:
Try adding --privileged to the docker run command.
Hi, I am trying to set up vLLM Mixtral 8x7b on GCP. I have a VM with two A100 80GBs and am using the following setup: docker image: vllm/vllm-openai:v0.3.0 Model: mistralai/Mixtral-8x7B-Instruct-v0.1 Command I use inside the VM: python3...
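For reference, a sketch of the offline-inference equivalent of this setup; the model name comes from the report above, and tensor_parallel_size=2 matches the two A100s (verify memory headroom for your own deployment):

```python
from vllm import LLM, SamplingParams

# Sketch: shard Mixtral across both A100s via tensor parallelism.
llm = LLM(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",
    tensor_parallel_size=2,  # one shard per A100 80GB
)
params = SamplingParams(temperature=0.7, max_tokens=64)
print(llm.generate(["Explain tensor parallelism briefly."], params)[0].outputs[0].text)
```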
vLLM Build Flags: CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
      GPU0  GPU1  GPU2  GPU3  CPU Affinity  NUMA Affinity  GPU NUMA ID
GPU0  X     PHB   PHB   PHB   0-47          0              N/A
GPU1  PHB   X     PHB   PHB   0-47          0              N/A
GPU2  PHB   PHB   X     PHB   0-47          0              N/A
...
Your current environment My model is Llama3-8B, which takes about 14GB of GPU memory, and the machine has 2 * 40GB GPUs (NVIDIA L40S). How would you like to use vllm Hey, recently I tried to use AsyncLLMEngine to speed up my LLM inference s...
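A hedged sketch of driving AsyncLLMEngine with tensor parallelism across the two cards; the constructor and generate() signatures have shifted between vLLM releases, so check them against the version actually installed:

```python
import asyncio
from vllm import AsyncEngineArgs, AsyncLLMEngine, SamplingParams

# Sketch: spread Llama3-8B over both GPUs and stream a single request.
engine = AsyncLLMEngine.from_engine_args(
    AsyncEngineArgs(model="meta-llama/Meta-Llama-3-8B", tensor_parallel_size=2)
)

async def run():
    params = SamplingParams(max_tokens=32)
    final = None
    # generate() yields partial RequestOutputs as tokens arrive.
    async for output in engine.generate("Hello, world", params, request_id="req-0"):
        final = output
    print(final.outputs[0].text)

asyncio.run(run())
```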
In my setup, vLLM works fine when running llama2-7b with 1 GPU, but when running it with multiple GPUs it hits a fatal error every time. Sharing the traces below. This is persistent: there has not been a single instance where I was able to run vLLM with multiple GPUs. Can ...
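A minimal diagnostic sketch for multi-GPU failures like this one: enabling NCCL's own debug output before constructing the engine often surfaces the underlying interconnect or peer-access problem. The env vars here belong to NCCL itself; the model name is just the one reported above.

```python
import os

# Standard NCCL diagnostics; printed during tensor-parallel initialization.
os.environ.setdefault("NCCL_DEBUG", "INFO")

from vllm import LLM, SamplingParams

# Smallest multi-GPU smoke test: same model, tensor_parallel_size=2.
llm = LLM(model="meta-llama/Llama-2-7b-hf", tensor_parallel_size=2)
print(llm.generate(["ping"], SamplingParams(max_tokens=8))[0].outputs[0].text)
```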