Our DeepSpeed-FP6 currently supports only linear GEMM; we look forward to supporting MoE GEMM in the future. We will keep improving DeepSpeed-FP6 based on your feedback and support. DeepSpeed-FP6 is part of the larger DeepSpeed ecosystem, which comprises a range of deep learning systems and modeling technologies. To learn more, please visit our website for detailed blog posts, tutorials, and documentation. You can also follow us on our English X (Twitter), Japanese X (Twitter)...
Flexible-bit quantizer-dequantizer library with fp6/fp12/fp8 support. Requires an Ampere+ architecture, because this op initially targets only bfloat16 input types.

Co-authored-by: Reza Yazdani <reza.yazdani@snowflake.com>
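The "flexible-bit" part of such a library comes down to packing sub-byte quantized codes into a dense byte stream. As a rough illustration only (the function names below are hypothetical, not this library's actual API), four 6-bit codes fit exactly into three bytes; a minimal host-side sketch of that packing and its inverse:

#include <cstdint>
#include <cstdio>

// Pack four 6-bit codes (low 6 bits of each input byte) into 3 bytes.
// Layout is illustrative; the real kernels may use a different bit order.
void pack_fp6_x4(const uint8_t q[4], uint8_t out[3]) {
    out[0] = (uint8_t)((q[0] & 0x3F) | ((q[1] & 0x03) << 6));
    out[1] = (uint8_t)(((q[1] >> 2) & 0x0F) | ((q[2] & 0x0F) << 4));
    out[2] = (uint8_t)(((q[2] >> 4) & 0x03) | ((q[3] & 0x3F) << 2));
}

// Unpack three bytes back into four 6-bit codes.
void unpack_fp6_x4(const uint8_t in[3], uint8_t q[4]) {
    q[0] = in[0] & 0x3F;
    q[1] = (uint8_t)(((in[0] >> 6) & 0x03) | ((in[1] & 0x0F) << 2));
    q[2] = (uint8_t)(((in[1] >> 4) & 0x0F) | ((in[2] & 0x03) << 4));
    q[3] = (in[2] >> 2) & 0x3F;
}

int main() {
    uint8_t q[4] = {0x2A, 0x15, 0x3F, 0x01}, packed[3], r[4];
    pack_fp6_x4(q, packed);
    unpack_fp6_x4(packed, r);
    for (int i = 0; i < 4; ++i) printf("%02x -> %02x\n", q[i], r[i]);  // round-trips exactly
    return 0;
}

The same scheme generalizes to fp12 (two codes per 3 bytes) and fp8 (one code per byte), which is why a single packer/unpacker layer can serve all three formats.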
// This is a copy of FP6-LLM kernel code: https://arxiv.org/abs/2401.14112

#ifndef DEEPSPEED_CUDA_LINEAR_KERNEL_MATMUL_CUH
#define DEEPSPEED_CUDA_LINEAR_KERNEL_MATMUL_CUH

#include "configs.h"
#include "utils_core.cuh"
#include "utils_gmem.cuh"

@@ -26,6 +29,8 @@ __global__ voi...
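Since the commit message above ties the Ampere+ requirement to bfloat16 inputs, kernels in this family are typically compiled behind a compute-capability guard. The sketch below shows only that standard CUDA guard pattern; the kernel name, signature, and body are illustrative placeholders, not the actual matmul kernel from this file.

#include <cuda_bf16.h>

// Illustrative stub: a real FP6 GEMM kernel dequantizes packed FP6 weights in
// registers and feeds bf16 tensor-core MMAs; here only the arch guard is shown.
__global__ void fp6_linear_stub(const __nv_bfloat16* in, __nv_bfloat16* out, int n)
{
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 800
    // sm_80+ (Ampere and newer): bf16 arithmetic and tensor cores are available.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];
#else
    // Pre-Ampere architectures lack bf16 tensor-core support; compile to a no-op.
    (void)in; (void)out; (void)n;
#endif
}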