Another issue: in our tests, speculative decoding was slower than the plain original model.
How would you like to use vllm

How does max-num-seqs control GPU memory usage? Why does increasing max-num-seqs use less memory? In my test:

max-num-seqs    GPU mem (GiB)
256             20.6
2048            19
4096            13

TaChao added the "usage" (How to use vllm) label on Mar 19, 2024 ...