Llama-X - Open Academic Research on Improving LLaMA to SOTA LLM. Chinese-Vicuna - A Chinese Instruction-following LLaMA-based Model. GPTQ-for-LLaMA - 4-bit quantization of LLaMA using GPTQ. GPT4All - Demo, data, and code to train an open-source assistant-style large language model based on...
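As a companion to the GPTQ-for-LLaMA entry, here is a minimal sketch of 4-bit GPTQ quantization. It uses the Hugging Face transformers integration (which requires the optimum/auto-gptq backends) rather than the GPTQ-for-LLaMA scripts themselves; the model name and calibration dataset are illustrative assumptions.

```python
# Sketch: quantize a LLaMA-style checkpoint to 4-bit with GPTQ via transformers.
# Assumes optimum + auto-gptq are installed; model_id is a placeholder.
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_id = "meta-llama/Llama-2-7b-hf"  # assumption: any LLaMA-style checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)

# GPTQ needs a small calibration set; "c4" is one of the built-in options.
quant_config = GPTQConfig(bits=4, dataset="c4", tokenizer=tokenizer)

# Quantization happens during loading; the result can be saved and reloaded in 4-bit.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",
)
model.save_pretrained("llama-7b-gptq-4bit")
tokenizer.save_pretrained("llama-7b-gptq-4bit")
```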
🔥 Large Language Models (LLMs) have taken the NLP community, the AI community, and the whole world by storm. Here is a curated list of papers about large language models, especially those relating to ChatGPT. It also contains frameworks for LLM training, tools to deploy LLMs, courses and tutorials about LLMs, and ...
Contents: Quantization; Pruning; Knowledge Distillation; Low-Rank Factorization; Hardware Acceleration and Deployment Strategies; Popular On-Device LLM Frameworks; Hardware Acceleration; Applications; Model References; Tutorials and Learning Resources; 🤝 Join the On-Device LLM Revolution ...
KD Algorithms: For KD algorithms, we categorize them into two principal steps: "Knowledge Elicitation", focusing on eliciting knowledge from teacher LLMs, and "Distillation Algorithms", centered on injecting this knowledge into student models. Skill Distillation: We delve into the enhancement of specific ...
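To make the "Distillation Algorithms" step concrete, below is a minimal, generic sketch of logit-based knowledge distillation in PyTorch: a frozen teacher supplies soft targets and the student is trained against a temperature-scaled KL-divergence loss. This is a textbook illustration, not code from the survey; the temperature value and model names are assumptions.

```python
# Soft-label KD loss: KL(teacher || student) on temperature-scaled logits.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    s = F.log_softmax(student_logits / temperature, dim=-1)
    t = F.softmax(teacher_logits / temperature, dim=-1)
    # Scale by T^2 so the gradient magnitude stays comparable to a hard-label loss.
    return F.kl_div(s, t, reduction="batchmean") * temperature ** 2

# Usage inside a training step (teacher frozen, student being trained):
# with torch.no_grad():
#     teacher_logits = teacher(input_ids).logits
# student_logits = student(input_ids).logits
# loss = distillation_loss(student_logits, teacher_logits)
# loss.backward()
```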
SparseML is an open-source model optimization toolkit that enables you to create inference-optimized sparse models using pruning, quantization, and distillation algorithms. Models optimized with SparseML can then be exported to ONNX and deployed with DeepSparse for GPU-class performance on CPU ...
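The hand-off point in that workflow is an ONNX file. The sketch below shows only that generic export step, using plain torch.onnx.export on a placeholder model rather than SparseML's own export utilities; a CPU runtime such as DeepSparse would then consume the resulting file.

```python
# Sketch: export an (already pruned/quantized) PyTorch model to ONNX for a CPU runtime.
# The model architecture and shapes here are placeholders.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10)).eval()
dummy_input = torch.randn(1, 128)  # assumption: 128 input features

torch.onnx.export(
    model,
    dummy_input,
    "sparse_model.onnx",
    input_names=["input"],
    output_names=["logits"],
    dynamic_axes={"input": {0: "batch"}, "logits": {0: "batch"}},
)
```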
ggml - Tensor library for machine learning with 16-bit and 4-bit quantization support. [MIT] libsvm - A simple, easy-to-use, efficient library for Support Vector Machines. [BSD-3-Clause] website m2cgen - A CLI tool to transpile trained classic ML models into native C code with ...
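For the m2cgen entry, here is a short sketch of the transpilation workflow: train a classic scikit-learn model in Python, then emit dependency-free C source. m2cgen also ships a CLI; this uses its Python API, and the exact exporter name (export_to_c) should be treated as an assumption.

```python
# Sketch: transpile a trained scikit-learn model to portable C with m2cgen.
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
import m2cgen as m2c

X, y = load_diabetes(return_X_y=True)
model = LinearRegression().fit(X, y)

c_source = m2c.export_to_c(model)   # returns C source code as a string (assumed API)
with open("model.c", "w") as f:
    f.write(c_source)
```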
AirLLM: AirLLM optimizes inference memory usage, allowing 70B large language models to run inference on a single 4GB GPU card without quantization, distillation, or pruning. You can even run Llama 3.1 405B on 8GB of VRAM now. LLMHub: LLMHub is a lightweight management platform designed to streamli...
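A toy illustration of the layered-offloading idea behind tools like AirLLM follows: keep the model in CPU memory (or on disk) and stream one block at a time through the GPU, so peak GPU memory is roughly one layer plus activations. This is not AirLLM's API; the layers and shapes are simplified assumptions.

```python
# Sketch: run a stack of layers while only keeping one on the GPU at a time.
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"

# Stand-ins for transformer blocks held in CPU memory.
cpu_layers = [nn.Linear(1024, 1024) for _ in range(8)]

def layered_forward(hidden, layers):
    hidden = hidden.to(device)
    for layer in layers:
        layer.to(device)                  # move only the current layer onto the GPU
        with torch.no_grad():
            hidden = torch.relu(layer(hidden))
        layer.to("cpu")                   # move it back out to free VRAM
        if device == "cuda":
            torch.cuda.empty_cache()
    return hidden

out = layered_forward(torch.randn(1, 1024), cpu_layers)
print(out.shape)  # torch.Size([1, 1024])
```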
Survey: A collection of AWESOME papers and resources on large language model (LLM)-related recommender system topics. - lihuibng/Awesome-LLM-for-RecSys
Supports flash attention, Int8 and GPTQ 4-bit quantization, LoRA and LLaMA-Adapter fine-tuning, and pre-training. Apache 2.0-licensed. GPT-4-LLM: "Instruction Tuning with GPT-4" (arXiv 2023). instruction-tuning-with-gpt-4.github.io/ StarCoder: 💫 StarCoder is a language model (LM) ...
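To illustrate the LoRA fine-tuning capability mentioned above, here is a minimal sketch using the Hugging Face PEFT library rather than the repository's own scripts; the model name, target modules, and hyperparameters are illustrative assumptions.

```python
# Sketch: attach LoRA adapters to a causal LM so only small low-rank matrices train.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

lora_config = LoraConfig(
    r=8,                                   # rank of the low-rank update matrices
    lora_alpha=16,                         # scaling factor for the update
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt (assumed)
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the LoRA parameters are trainable
```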