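1. In openai_api_server under basic_demo, change engine_args = AsyncEngineArgs( model=MODEL_PATH, tokenizer=MODEL_PATH, # If you have multiple GPUs, set this to the number of GPUs you have tensor_parallel_size=1, dtype="bfloat16", trust_remote_code=True, # Fraction of GPU memory to use; choose a value that suits your GPU's memory size. For example, if...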
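As a rough illustration of the settings in the snippet above, the keyword arguments can be collected in a plain dictionary, with `tensor_parallel_size` following the number of GPUs. The helper name `build_engine_kwargs` and the path `/path/to/model` are hypothetical, and the memory-fraction setting is left out because the excerpt is cut off before it is named.

```python
from typing import Any, Dict


def build_engine_kwargs(model_path: str, num_gpus: int = 1) -> Dict[str, Any]:
    """Collect the keyword arguments from the snippet into a plain dict.

    Hypothetical helper for illustration; it does not import vLLM.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return {
        "model": model_path,
        "tokenizer": model_path,
        # One tensor-parallel shard per GPU, as the original comment suggests.
        "tensor_parallel_size": num_gpus,
        "dtype": "bfloat16",
        "trust_remote_code": True,
    }


# "/path/to/model" is a placeholder, not a path from the original text.
kwargs = build_engine_kwargs("/path/to/model", num_gpus=2)
print(kwargs["tensor_parallel_size"])
```

With vLLM installed, a dictionary like this could be unpacked as `AsyncEngineArgs(**kwargs)`.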
Your current environment vllm version: '0.5.0.post1' 🐛 Describe the bug When I set tensor_parallel_size=1, it works well. But if I set tensor_parallel_size>1, the following error occurs: RuntimeError: Cannot re-initialize CUDA in forked subproc...
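v0.7.3 officially supports the DeepSeek-AI multi-token prediction module, with measured inference speedups of up to 69%. To enable it, add --num-speculative-tokens=1 to the launch arguments; you can optionally add --draft-tensor-parallel-size=1 for further optimization. More strikingly, in tests on the ShareGPT dataset, the feature achieved a prediction acceptance rate of 81%-82.3%. This means inference time is cut substantially while accuracy is preserved. Generative AI...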
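To put the acceptance rate in context, here is a back-of-the-envelope sketch of how many tokens one verification step yields on average. It assumes each draft token is accepted independently with the same probability and ignores the cost of producing the draft, so it gives an idealized upper bound, not a model of the measured 69% figure. The function name `expected_tokens_per_step` is hypothetical.

```python
def expected_tokens_per_step(acceptance_rate: float,
                             num_speculative_tokens: int = 1) -> float:
    """Average tokens emitted per target-model step.

    Simplifying assumption: every draft token is accepted independently
    with probability `acceptance_rate`. The step always emits one token
    from the target model, plus the accepted prefix of the draft, so the
    expectation is sum_{i=0}^{k} a^i.
    """
    if not 0.0 <= acceptance_rate <= 1.0:
        raise ValueError("acceptance_rate must be in [0, 1]")
    if num_speculative_tokens < 0:
        raise ValueError("num_speculative_tokens must be non-negative")
    return sum(acceptance_rate ** i for i in range(num_speculative_tokens + 1))


# Acceptance rates quoted in the text, with one speculative token.
low = expected_tokens_per_step(0.81, num_speculative_tokens=1)
high = expected_tokens_per_step(0.823, num_speculative_tokens=1)
print(f"{low:.3f} to {high:.3f} tokens per step")
```

Under these assumptions, one speculative token at an 81% acceptance rate yields about 1.81 tokens per step, which is consistent with a real-world speedup below that bound once draft overhead is included.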
Try adding --privileged to docker
With the vllm + CPU backend (no GPU hardware), tensor_parallel_size should default to 1 rather than cuda_count (which equals 0) #3207. Triggered via issue November 14, 2024 08:07; qinxuye commented on #2552 (042eb5b).
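The default requested in this issue can be sketched as a clamp on the detected GPU count. The function name `default_tensor_parallel_size` is hypothetical and is not taken from the project's code; `cuda_count` stands for whatever device count the caller has already detected.

```python
def default_tensor_parallel_size(cuda_count: int) -> int:
    """Pick a default tensor_parallel_size from the detected GPU count.

    On a CPU-only host cuda_count is 0, which is not a valid parallel
    size, so the default falls back to 1.
    """
    return max(1, cuda_count)


print(default_tensor_parallel_size(0))  # CPU-only host
print(default_tensor_parallel_size(4))  # four GPUs detected
```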
(s) per core: 1 Core(s) per socket: 24 Socket(s): 2 NUMA node(s): 2 Vendor ID: GenuineIntel CPU family: 6 Model: 106 Model name: Intel(R) Xeon(R) Gold 6342 CPU @ 2.80GHz Stepping: 6 CPU MHz: 3500.000 CPU max MHz: 3500.0000 CPU min MHz: 800.0000 BogoMIPS: 5600.00 ...
Your current environment PyTorch version: 2.3.0+cu121 Is debug build: False CUDA used to build PyTorch: 12.1 ROCM used to build PyTorch: N/A OS: Ubuntu 22.04.4 LTS (x86_64) GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 Clang version...
Let's default the value of speculative_draft_tensor_parallel_size to 1 when we detect MLPSpeculator, since this is the only case that works right now. bot commented Aug 3, 2024: 👋 Hi! Thank you for contributing to the vLLM project.
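One way to read this proposal is as a small resolution rule, sketched below. The function `resolve_draft_tp`, the `"mlp_speculator"` type string, and the fallback to the target model's parallel size for other draft models are all assumptions made for illustration; the comment itself only states the MLPSpeculator case.

```python
from typing import Optional


def resolve_draft_tp(draft_tp: Optional[int],
                     target_tp: int,
                     draft_model_type: str) -> int:
    """Resolve the draft model's tensor-parallel size.

    Hypothetical sketch: an explicit user value always wins; otherwise
    an MLPSpeculator draft defaults to 1, and any other draft model
    (an assumption here) inherits the target model's size.
    """
    if draft_tp is not None:
        return draft_tp
    if draft_model_type == "mlp_speculator":
        return 1
    return target_tp


print(resolve_draft_tp(None, target_tp=4, draft_model_type="mlp_speculator"))
```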
🐛 Describe the bug Using the tensor parallel API, I expect to be able to disable TP by setting the mesh size to 1, but this does not work. Here is a reproduction case: import torch from torch.distributed.tensor.parallel import ColwiseParallel, p...
GPU 1: NVIDIA GeForce RTX 4090 Nvidia driver version: 552.22 cuDNN version: Could not collect HIP runtime version: N/A MIOpen runtime version: N/A Is XNNPACK available: True CPU: Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit ...