Your current environment

The output of `python collect_env.py`:

```text
(64-bit runtime)
Python platform: Linux-4.15.0-39-generic-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: NVIDIA TITAN V
GPU 1: NVIDIA TITAN V
GPU 2: NVIDIA TITAN V
GPU 3: ...
```

🐛 Describe the bug

Starting vLLM with `export VLLM_USE_MODELSCOPE=True` set in the environment produces errors; the log begins with:

```text
INFO 07-24 08:44:25 model_runner.py:680] Starting to load model LLM-Research/Meta-Llama-3...
```
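For reference, a minimal sketch of the kind of launch described above, using the OpenAI-compatible API server entrypoint. The model id and the tensor-parallel setting are placeholders (the report truncates the real model name), not values taken from the report:

```shell
# Minimal sketch: load a model from ModelScope instead of the Hugging Face Hub.
# The model id and --tensor-parallel-size value below are assumed placeholders.
export VLLM_USE_MODELSCOPE=True
python -m vllm.entrypoints.openai.api_server \
    --model LLM-Research/Meta-Llama-3-8B-Instruct \
    --tensor-parallel-size 4
```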
```shell
# Using an interactive command line for inference.
CUDA_VISIBLE_DEVICES=0 \
swift infer \
    --adapters output/vx-xxx/checkpoint-xxx \
    --stream true \
    --temperature 0 \
    --max_new_tokens 2048

# merge-lora and use vLLM for inference acceleration
CUDA_VISIBLE_DEVICES=0 \
swift infer \
    --adapters output/vx-xxx/checkpoint-xxx \
    --stream true \
    ...
```
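The second command above is cut off in the original. As a rough sketch only, a merge-LoRA + vLLM variant might look like the following; the `--merge_lora` and `--infer_backend` flags are assumptions about typical ms-swift usage, not a quote of the missing text:

```shell
# Hedged sketch of a merge-lora + vLLM invocation; flags beyond those visible
# in the truncated original are assumptions, not the original command.
CUDA_VISIBLE_DEVICES=0 \
swift infer \
    --adapters output/vx-xxx/checkpoint-xxx \
    --merge_lora true \
    --infer_backend vllm \
    --stream true \
    --temperature 0 \
    --max_new_tokens 2048
```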
- 🎁 2025.02.16: Support for LMDeploy in GRPO; use `--use_lmdeploy true`. Please check this script (see the sketch after this list).
- 🔥 2025.02.12: Support for the GRPO (Group Relative Policy Optimization) algorithm for LLMs and MLLMs; the documentation can be found here.
- 🎁 2025.02.10: SWIFT supports the fine-tuning of embedding models...
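As a rough illustration of the `--use_lmdeploy true` switch mentioned above, not a reproduction of the referenced script: the entry point shape, model, dataset, and reward settings below are assumed placeholders.

```shell
# Hedged sketch: GRPO training with LMDeploy used for rollout generation.
# Everything except --use_lmdeploy true is an assumed placeholder.
CUDA_VISIBLE_DEVICES=0 \
swift rlhf \
    --rlhf_type grpo \
    --model Qwen/Qwen2.5-7B-Instruct \
    --dataset AI-MO/NuminaMath-TIR \
    --reward_funcs accuracy \
    --use_lmdeploy true \
    --num_generations 8 \
    --max_completion_length 1024
```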