It does start, but HF inference is inefficient. Launching with `bash ./run.sh -c local -i 0 -b vllm -m Qwen-7B-QAnything -t qwen-7b-qanything` runs out of VRAM on a single 24 GB card, because vLLM does not support 8-bit quantization. I would therefore like to convert the model to 4-bit manually, but the currently released version cannot be converted.

AprildreamMI commented Feb 20, 2024
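To see why 24 GB is tight, here is a back-of-envelope calculation for the weight storage alone at different precisions. This is illustrative arithmetic only: the KV cache, activations, and CUDA overhead come on top of these numbers, and vLLM additionally pre-allocates a `gpu_memory_utilization` fraction of total VRAM up front.

```python
# Approximate VRAM needed just for the weights of a 7B-parameter model.
# Real usage is higher: KV cache, activations, and CUDA context are not counted.

def weight_vram_gb(n_params: float, bits: int) -> float:
    """Weight storage in GiB for n_params parameters stored at `bits` precision."""
    return n_params * bits / 8 / 1024**3

N = 7e9  # Qwen-7B
for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: ~{weight_vram_gb(N, bits):.1f} GiB")
# → 16-bit weights: ~13.0 GiB
# →  8-bit weights: ~6.5 GiB
# →  4-bit weights: ~3.3 GiB
```

So fp16 weights alone already consume more than half of a 24 GB card before any KV cache is allocated, while a 4-bit checkpoint would leave ample headroom — which is why the conversion matters here.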
```
qanything-container-local | model_name is set to [Qwen-7B-QAnything]
qanything-container-local | conv_template is set to [qwen-7b-qanything]
qanything-container-local | tensor_parallel is set to [1]
qanything-container-local | gpu_memory_utilization is set to [0.81]
qanything-container-...
```
In this project, NetEase open-sourced their own fine-tuned Qwen model, and the service is launched through FastChat. For details, see the FastChat documentation on its OpenAI-compatible API. In `run_for_local_option.sh` (line 186) there is the following startup command:

```
mkdir -p /workspace/qanything_local/logs/debug_logs/fastchat_logs && cd /workspace/qanything_local/logs/debug_logs/fastchat_logs
noh...
```
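The truncated command presumably continues with the usual FastChat launch sequence. As a rough sketch only — the module entry points below are FastChat's real ones, but the ports, paths, log redirection, and exact worker flags are assumptions, not the actual contents of `run_for_local_option.sh`:

```shell
# Sketch of a typical three-process FastChat deployment (values are assumptions).

# 1. Controller: tracks registered model workers.
nohup python3 -m fastchat.serve.controller \
    --host 0.0.0.0 --port 21001 > controller.log 2>&1 &

# 2. vLLM-backed worker serving the NetEase checkpoint; --gpu-memory-utilization
#    is the vLLM engine argument matching the [0.81] seen in the container log.
nohup python3 -m fastchat.serve.vllm_worker \
    --model-path /model_repos/Qwen-7B-QAnything \
    --model-names Qwen-7B-QAnything \
    --controller-address http://0.0.0.0:21001 \
    --gpu-memory-utilization 0.81 > worker.log 2>&1 &

# 3. OpenAI-compatible REST API on top of the controller.
nohup python3 -m fastchat.serve.openai_api_server \
    --host 0.0.0.0 --port 8000 > api.log 2>&1 &
```

Once these are up, any OpenAI-style client can be pointed at port 8000 with the model name `Qwen-7B-QAnything`.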