# LLM Inference Speeds

This repository contains benchmark data for various Large Language Models (LLMs), measuring their inference speed in tokens per second. The benchmarks are run across different hardware configurations using the prompt "Give me 1 line phrase".

## About the Data

The dat...
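A tokens-per-second measurement can be reproduced with a small timing harness. The sketch below is a minimal, hypothetical example (the `generate` callable stands in for whatever model API produced the benchmark data; it is not part of this repository) showing how throughput is typically computed: count the generated tokens and divide by wall-clock time.

```python
import time
from typing import Callable, List


def tokens_per_second(num_tokens: int, elapsed_seconds: float) -> float:
    """Throughput in tokens per second."""
    return num_tokens / elapsed_seconds


def benchmark(generate: Callable[[str], List[str]], prompt: str) -> float:
    """Time a single generation call and return its tokens/sec.

    `generate` is a hypothetical stand-in for a real LLM inference call
    that returns the list of generated tokens.
    """
    start = time.perf_counter()
    tokens = generate(prompt)
    elapsed = time.perf_counter() - start
    return tokens_per_second(len(tokens), elapsed)
```

For example, `benchmark(my_model_generate, "Give me 1 line phrase")` would report the rate for a single run; real benchmarks usually average several runs and discard a warm-up iteration.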
A tutorial for LLM developers covering engine design, service deployment, and evaluation/benchmarking. It also provides a client/server-style optimized LLM inference engine.

License: Apache-2.0