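Paper: MLSys Proceedings Code: https://github.com/punica-ai/punica Abstract: Low-rank adaptation (LoRA) has become an important and popular method for adapting pre-trained models to specific domains. This paper presents Punica, a system for serving multiple LoRA models in a shared GPU cluster. Punica includes a new CUDA kernel design that allows ...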
Url:https://proceedings.mlsys.org/paper_files/paper/2023/hash/d3313de3f431fd64513431c4326d237c-Abstract-mlsys2023.html Abstract: Cross-device federated learning (FL) has been well-studied from algorithmic, system scalability, and training speed perspectives. Nonetheless, moving from centralized traini...
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration - mit-han-lab/llm-awq
Pekhimenko and C. De Sa}, pages = {196--209}, title = {Atom: Low-Bit Quantization for Efficient and Accurate LLM Serving}, url = {https://proceedings.mlsys.org/paper_files/paper/2024/file/5edb57c05c81d04beb716ef1d542fe9e-Paper-Conference.pdf}, volume = {6}, year = {2024} ...
Notes: code. On Noisy Evaluation in Federated Hyperparameter Tuning. Authors: Kevin Kuo; Pratiksha Thaker; Mikhail Khodak; John Nguyen; Daniel Jiang; Ameet Talwalkar; Virginia Smith. Journal: Proceedings of Machine Learning and Systems. Url: https://proceedings.mlsys.org/paper_files/paper/2023/hash/294f82c4...
The LLM systems space is flourishing and has already been discussed at length. On the serving side, papers I like include SGLang, DistServe, ParrotServe...
[2024/05] 🏆 AWQ receives the Best Paper Award at MLSys 2024. 🎉 [2024/05] 🔥 The VILA-1.5 model family, which features video understanding, is now supported in AWQ and TinyChat. Check out our online demo powered by TinyChat here. Example is here. [2024/05] 🔥 AMD adopts AWQ...
NAACL 2024, Demo / MLSys Workshop @ NeurIPS 2023 [Paper][Twitter][Slides][Demo Video][Documentation] Best Demo Paper Runner Up @ NAACL 2024. Installation. Install RedCoast: pip install redco. Adjust Jax to GPU/TPU version: The command above would automatically install the CPU version of jax, so the ...
SpecInfer (paper, code), work by Zhihao Jia and Xupeng Miao of the Catalyst group at CMU, trains an ensemble of draft models in a boosting manner...
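paper-reading GitHub homepage. Large model training & paper: There is not yet a reasonably systematic course on this area. Large-scale distributed training has only come into practical use in the last few years, and it is also the hottest topic in the MLSys field. Here is a brief summary of the concepts to master and the reference papers. Data Parallel. Distributed Data Parallel. PyTorch Distributed: Experiences on Accelera...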