In this paper, we describe the resulting high-level design, sketch some of the challenges and their solutions, and touch upon the open problems and future directions. Abstract (translated): Federated learning is a distributed machine learning approach that enables model training on large amounts of decentralized data. We have built a production system for federated learning in the domain of mobile devices, based on TensorFlow...
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration - LLMNexus/llm-awq
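The core idea named in the title, activation-aware weight quantization, can be illustrated with a small sketch: instead of quantizing weights in isolation, salient input channels (identified by activation magnitude) are rescaled before round-to-nearest quantization so their weights are preserved more accurately. This is a toy approximation under stated assumptions, not the repository's implementation: the fixed exponent `alpha`, the mean normalization, and the synthetic data are all illustrative choices (AWQ itself searches for the best per-layer scaling).

```python
import numpy as np

def quantize_rtn(w, n_bits=4):
    """Per-output-channel symmetric round-to-nearest quantization."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    return np.round(w / scale) * scale

def awq_style_quantize(w, act_mag, alpha=0.5, n_bits=4):
    """Activation-aware variant (sketch): amplify salient input channels,
    quantize, then fold the scale back out. In practice the inverse scale
    is absorbed into the preceding activations rather than the weights."""
    s = act_mag ** alpha      # per-input-channel scaling (alpha is illustrative)
    s = s / s.mean()          # keep scales centered around 1
    return quantize_rtn(w * s, n_bits) / s

# Toy demo: a weight matrix and skewed per-channel activation statistics.
rng = np.random.default_rng(0)
w = rng.normal(size=(64, 128))
act = rng.uniform(0.1, 1.0, size=128)
act[:8] *= 10.0  # a few "salient" channels dominate the activations

x = rng.normal(size=(128, 256)) * act[:, None]  # inputs matching those stats
err_plain = np.mean((w @ x - quantize_rtn(w) @ x) ** 2)
err_awq = np.mean((w @ x - awq_style_quantize(w, act) @ x) ** 2)
print(f"output MSE  plain RTN: {err_plain:.4f}  activation-aware: {err_awq:.4f}")
```

The comparison measures error on the layer *output* rather than on the weights themselves, which is the quantity activation-aware scaling is meant to protect.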
[2024/05] 🏆 AWQ receives the Best Paper Award at MLSys 2024. 🎉 [2024/05] 🔥 The VILA-1.5 model family, which features video understanding, is now supported in AWQ and TinyChat. Check out the online demo powered by TinyChat here. Example is here. [2024/05] 🔥 AMD adopts AWQ...
[2024/12] 🔥 QServe has been integrated into NVIDIA TensorRT-LLM! [2024/05] 🔥 QServe is publicly released! Check our paper here. Key Features OmniServe is a unified, flexible, and efficient LLM serving system designed to support modern large language models and multi-modal language models...