UPDATE (08/09/2023): We have done a major performance overhaul over the past few months, and I'm happy to share the latest results:

- SOTA performance on CUDA: https://github.com/mlc-ai/llm-perf-bench
- SOTA performance on ROCm: https://blog...
As the first example, we try out the chat CLI in MLC LLM with the 4-bit quantized Llama-2 7B model. The simplest way to run MLC chat is a one-liner command:

.. code:: bash

   mlc_llm chat HF://mlc-ai/Llama-2-7b-chat-hf-q4f16_1-MLC

It may take 1-2 minutes the first time...