Mainstream LLM quantization methods generally either introduce extra parameters during quantization to shrink the impact of outliers (e.g. SmoothQuant, AWQ, OmniQuant, AffineQuant), or take a divide-and-conquer approach with finer-grained quantization to isolate the outliers (e.g. LLM.int8(), ZeroQuant). The author goes a different way from these mainstream methods: by modifying the Attention mechanism, the model is trained so that it never develops outliers in the first place, which means one only needs to use A...
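As a rough, hedged illustration of the mainstream "add parameters to tame outliers" idea, here is a minimal SmoothQuant-style sketch that migrates activation outliers into the weights via a per-channel scale (the function and variable names are my own, not from any particular library):

```python
import torch

def smoothquant_scale(activations: torch.Tensor, weight: torch.Tensor, alpha: float = 0.5):
    """Migrate activation outliers into the weights via a per-channel scale s.

    activations: (tokens, in_features) calibration activations
    weight:      (out_features, in_features) linear-layer weight
    Returns scaled activations and weights such that (X / s) @ (W * s).T
    equals X @ W.T mathematically, but both factors are easier to quantize.
    """
    act_max = activations.abs().amax(dim=0)              # per-input-channel max of |X|
    w_max = weight.abs().amax(dim=0)                      # per-input-channel max of |W|
    # Scale s_j = max|X_j|^alpha / max|W_j|^(1 - alpha), alpha balances the two sides
    s = act_max.pow(alpha) / w_max.pow(1 - alpha).clamp(min=1e-5)
    s = s.clamp(min=1e-5)
    return activations / s, weight * s
```

The point of the sketch is only to show where the extra parameters live: the scale `s` is computed offline from calibration data and folded into the weights, so the matmul result is unchanged while the outliers are flattened.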
Quantization is the process of reducing the numerical precision of a signal or of a model's weights and activations, typically converting values from a higher-precision format (such as FP32 or FP16) to a lower-precision format (such as INT8 or INT4).
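As a concrete, hedged illustration of what this conversion looks like, here is a minimal symmetric absmax INT8 quantizer; it is a sketch of the general idea, not the implementation of any specific library:

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric absmax quantization: map float values to int8 in [-127, 127]."""
    scale = np.abs(x).max() / 127.0                      # one scale for the whole tensor
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original float values."""
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize_int8(q, scale)
print("max abs error:", np.abs(w - w_hat).max())         # the quantization error
```

Each float is stored as a 1-byte integer plus a shared scale, which is where the memory savings come from; the rounding step is exactly where outliers hurt, since one huge value inflates the scale for everything else.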
It supports fine-tuning techniques such as full fine-tuning, LoRA (Low-Rank Adaptation), QLoRA (Quantized LoRA), and ReLoRA (Residual LoRA), as well as GPTQ post-training quantization. Run LLM fine-tuning on Modal: for step-by-step instructions on fine-tuning LLMs on Modal, you can follow the tutorial her...
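Independent of Modal's tooling, a minimal sketch of the LoRA idea itself (a frozen pretrained weight plus a trainable low-rank update) might look like the following; the class and parameter names are illustrative assumptions:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base Linear plus trainable low-rank update: y = base(x) + (x A^T B^T) * (alpha / r)."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():                  # freeze the pretrained weight
            p.requires_grad = False
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: update starts at 0
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scaling

layer = LoRALinear(nn.Linear(768, 768))
out = layer(torch.randn(2, 768))                          # only A and B receive gradients
```

QLoRA follows the same pattern, except the frozen base weight is stored in a quantized format to cut memory further.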
- Quantization - for reducing the memory space required to run models.
- Tensor parallelism - for breaking up the work of processing among multiple GPUs.
- Speculative decoding - for speeding up text generation by using a smaller model to predict tokens and a larger model to validate that prediction (a sketch follows below). ...
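To make the speculative-decoding item more concrete, here is a hedged sketch of one greedy draft-and-verify step. It is not the full rejection-sampling algorithm, and it assumes Hugging Face-style models whose forward pass returns `.logits`:

```python
import torch

@torch.no_grad()
def speculative_step(draft_model, target_model, input_ids, k: int = 4):
    """One greedy speculative-decoding step: the small draft model proposes k tokens,
    the large target model verifies them in a single forward pass, and we keep the
    longest prefix the target model agrees with."""
    draft_ids = input_ids
    proposed = []
    for _ in range(k):                                    # draft k tokens autoregressively
        logits = draft_model(draft_ids).logits[:, -1, :]
        nxt = logits.argmax(dim=-1, keepdim=True)
        proposed.append(nxt)
        draft_ids = torch.cat([draft_ids, nxt], dim=-1)

    # One target-model pass over prompt + proposals scores all k positions at once
    target_logits = target_model(draft_ids).logits
    accepted = input_ids
    for i, tok in enumerate(proposed):
        pos = input_ids.shape[1] + i - 1                  # logits that predict this position
        target_tok = target_logits[:, pos, :].argmax(dim=-1, keepdim=True)
        if torch.equal(target_tok, tok):
            accepted = torch.cat([accepted, tok], dim=-1)
        else:
            accepted = torch.cat([accepted, target_tok], dim=-1)  # take the target's token, stop
            break
    return accepted
```

The speedup comes from replacing k sequential large-model forward passes with k cheap draft passes plus a single large-model verification pass.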
Monitoring key performance metrics such as latency and error rate helps identify performance-hampering factors like changes in input, shifts in model behavior, and/or compliance issues. These observations then serve as a basis for model improvement through pruning, quantization, knowledge distillation, etc. Regular optim...
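As a hedged sketch of one such improvement step, unstructured magnitude pruning (zeroing out the smallest-magnitude weights) can be expressed as follows; the threshold rule and names are my own simplification:

```python
import torch

def magnitude_prune(weight: torch.Tensor, sparsity: float = 0.5) -> torch.Tensor:
    """Zero out the smallest-magnitude fraction of weights (unstructured pruning sketch)."""
    k = int(weight.numel() * sparsity)
    if k == 0:
        return weight.clone()
    threshold = weight.abs().flatten().kthvalue(k).values  # k-th smallest |w|
    mask = weight.abs() > threshold                         # keep only weights above it
    return weight * mask

w = torch.randn(256, 256)
pruned = magnitude_prune(w, sparsity=0.5)
print("kept fraction:", (pruned != 0).float().mean().item())
```

In practice the pruned model is usually fine-tuned afterwards to recover accuracy, and the sparse weights only pay off when paired with storage or kernels that exploit the zeros.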
LLMs are built on machine learning: specifically, a type of neural network called a transformer model. In simpler terms, an LLM is a computer program that has been fed enough examples to be able to recognize and interpret human language or other types of complex data. Many LLMs are ...
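For readers who want to see what "transformer" means at the level of code, here is a minimal, hedged sketch of single-head scaled dot-product attention, the core operation of the architecture (shapes simplified, no masking or multiple heads):

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    """Single-head attention: softmax(Q K^T / sqrt(d)) V."""
    d = q.shape[-1]
    scores = q @ k.transpose(-2, -1) / d ** 0.5   # (seq, seq) token-to-token affinities
    weights = F.softmax(scores, dim=-1)           # each row sums to 1
    return weights @ v                            # weighted mix of value vectors

seq_len, d_model = 5, 16
x = torch.randn(seq_len, d_model)
out = scaled_dot_product_attention(x, x, x)       # self-attention over one sequence
print(out.shape)                                  # torch.Size([5, 16])
```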
This is where LLMs come in. This article aims to introduce you to LLMs. After reading the following sections, you will know what LLMs are, how they work, the different types of LLMs with examples, as well as their advantages and limitations. For newcomers to the subject, our Large ...
In particular, the large language model (LLM) ChatGPT and image generators DALL-E and Midjourney have captured the public's imagination and the business world's attention. Other popular generative AI tools include Bard, Bing Chat, and Llama. How is AI used? The use cases for AI are still...