(chatrtx) C:\chatrtx>trtllm-build ^
    --checkpoint_dir .\model\mistral_model\model_checkpoints ^
    --output_dir .\model\mistral_model\engine ^
    --gpt_attention_plugin float16 ^
    --gemm_plugin float16 ^
    --max_batch_size 1 ^
    --max_input_len 7168 ^
    --max_output_len 1024 ^
    --context_fmha=enable ^
    --paged_kv...
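As a minimal sketch, the same invocation can be assembled programmatically, which is handy when scripting engine builds for several models. This only reproduces the flags visible in the command above (the truncated trailing flags are omitted), and the dictionary-based layout is an assumption for illustration, not part of the trtllm-build tool itself.

```python
# Sketch: build the trtllm-build command line from a flag table.
# Paths and values are copied from the command shown above; the
# truncated flags at the end of the original command are left out.
build_args = {
    "--checkpoint_dir": r".\model\mistral_model\model_checkpoints",
    "--output_dir": r".\model\mistral_model\engine",
    "--gpt_attention_plugin": "float16",
    "--gemm_plugin": "float16",
    "--max_batch_size": "1",
    "--max_input_len": "7168",
    "--max_output_len": "1024",
    "--context_fmha": "enable",
}

# Flatten the table into an argv-style list, e.g. for subprocess.run(cmd).
cmd = ["trtllm-build"]
for flag, value in build_args.items():
    cmd += [flag, value]

print(" ".join(cmd))
```

Keeping the flags in one table makes it easy to vary a single setting (say, `--max_input_len`) per model without rewriting the whole command string.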
There will be another merge request on GitLab to bring all of the TRT-LLM backend changes into main. Both PRs will need to be merged before the code freeze.

mc-nv reviewed Oct 6, 2023 (build.py): Add TRT-LLM backend build to Triton (#6365) ...