The gist is that -max_target_seqs and -num_alignments do the same thing, just for different output formats: when -outfmt is 4 or less, -num_alignments applies; when -outfmt is greater than 4, -max_target_seqs applies. That explanation doesn't feel right, though. If it were true, NCBI would have had no reason to provide two separate parameters for the same purpose.

2.3 Testing

Create two folders in the current directory, -max_target_seqs and nu...
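The claimed outfmt-dependent switch can be sketched as a small helper for building BLAST+ command lines. The cutoff value 4 and the flag mapping are taken from the explanation above, not verified against the BLAST+ manual; the function name is made up for illustration.

```python
def hit_limit_flag(outfmt: int, limit: int) -> list[str]:
    """Return the BLAST+ flag that caps reported hits for a given -outfmt.

    Mirrors the claim above: -num_alignments for outfmt <= 4
    (pairwise / query-anchored formats), -max_target_seqs for
    outfmt > 4 (tabular, CSV, XML, ...). The cutoff of 4 is the
    forum's claim, not verified documentation.
    """
    flag = "-num_alignments" if outfmt <= 4 else "-max_target_seqs"
    return [flag, str(limit)]


# Example: assemble a tabular-output blastn invocation capped at 5 hits.
cmd = ["blastn", "-query", "q.fa", "-db", "nt", "-outfmt", "6"]
cmd += hit_limit_flag(6, 5)
```

This only selects which flag to emit; whether the two flags truly behave identically is exactly what the testing section below tries to establish.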
Another question: in our tests, speculative decoding was slower than the plain base model.
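That observation is plausible: speculative decoding only pays off when the draft model is both cheap and frequently accepted. A toy cost model, following the standard analysis from the speculative-decoding literature (geometric expected-acceptance formula; the parameter values below are hypothetical, not from the report above), shows how a low acceptance rate makes it slower than the base model:

```python
def expected_speedup(p: float, k: int, c: float) -> float:
    """Toy speedup estimate for speculative decoding.

    p: per-token probability that a drafted token is accepted (0 <= p < 1)
    k: number of tokens drafted per verification step
    c: cost of one draft-model forward pass relative to the target model

    Expected tokens produced per verification step is the geometric sum
    1 + p + ... + p^k = (1 - p^(k+1)) / (1 - p); the step costs k draft
    passes plus one target verification pass.
    """
    exp_tokens = (1 - p ** (k + 1)) / (1 - p)
    cost = k * c + 1
    return exp_tokens / cost


# Low acceptance rate: speculative decoding loses to the base model (<1x).
slow = expected_speedup(p=0.2, k=4, c=0.1)
# High acceptance rate: it wins (>1x).
fast = expected_speedup(p=0.9, k=4, c=0.1)
```

So a slowdown usually points at a poorly matched draft model (low p) or a draft model that is not much cheaper than the target (high c), rather than a bug.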
max-num-seqs   GPU mem (GiB)
256            20.6
2048           19
4096           13

TaChao added the "usage: How to use vllm" label on Mar 19, 2024.

Collaborator hmellor commented on Apr 20, 2024:
If you set max_num_batched_tokens or max_num_seqs to a low value, then the prefill batch size will be small (e.g., 1), which might not hurt performance. There is no one-size-fits-all suggestion, I guess; I think you can tweak the prefill batch size through these two knobs and u...
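The interaction of the two knobs can be sketched with a simplified greedy scheduler. max_num_seqs and max_num_batched_tokens are real vLLM engine arguments, but the function below is a hypothetical illustration of the capping logic, not vLLM's actual scheduler:

```python
def prefill_batch(waiting: list[int], max_num_seqs: int,
                  max_num_batched_tokens: int) -> list[int]:
    """Greedily pick prompts (given as token counts) for one prefill step.

    The batch is capped both by the number of sequences (max_num_seqs)
    and by the total token budget (max_num_batched_tokens) -- whichever
    limit is hit first ends the batch.
    """
    batch, tokens = [], 0
    for n in waiting:
        if len(batch) >= max_num_seqs or tokens + n > max_num_batched_tokens:
            break
        batch.append(n)
        tokens += n
    return batch


# Three 512-token prompts with a 1024-token budget: only two fit,
# even though max_num_seqs would allow eight.
batch = prefill_batch([512, 512, 512], max_num_seqs=8,
                      max_num_batched_tokens=1024)
```

With either knob set very low, each prefill step processes few prompts, which is the small-prefill-batch regime described above.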