Our DeepSpeed-FP6 currently supports only linear GEMM; we look forward to supporting MoE GEMM in the future. We will keep improving DeepSpeed-FP6 based on your feedback and support. DeepSpeed-FP6 is part of the larger DeepSpeed ecosystem, which comprises a range of deep learning systems and modeling technologies. To learn more, please visit our website for detailed blog posts, tutorials, and documentation. You can also follow us on our English X (Twitter), Japanese X (Twitter)...
Flexible-bit quantizer-dequantizer library with fp6/fp12/fp8 support. Requires an Ampere+ architecture, because this op initially targets only bfloat16 input types.

Co-authored-by: Reza Yazdani <reza.yazdani@snowflake.com>
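The "flexible-bit" part of such a library comes down to packing sub-byte quantized codes into a dense byte stream. As a rough illustration only (the function names below are hypothetical, not this library's actual API), four 6-bit codes fit exactly into three bytes; a minimal host-side sketch of that packing and its inverse:

#include <cstdint>
#include <cstdio>

// Pack four 6-bit codes (low 6 bits of each input byte) into 3 bytes.
// Layout is illustrative; the real kernels may use a different bit order.
void pack_fp6_x4(const uint8_t q[4], uint8_t out[3]) {
    out[0] = (uint8_t)((q[0] & 0x3F) | ((q[1] & 0x03) << 6));
    out[1] = (uint8_t)(((q[1] >> 2) & 0x0F) | ((q[2] & 0x0F) << 4));
    out[2] = (uint8_t)(((q[2] >> 4) & 0x03) | ((q[3] & 0x3F) << 2));
}

// Unpack three bytes back into four 6-bit codes.
void unpack_fp6_x4(const uint8_t in[3], uint8_t q[4]) {
    q[0] = in[0] & 0x3F;
    q[1] = (uint8_t)(((in[0] >> 6) & 0x03) | ((in[1] & 0x0F) << 2));
    q[2] = (uint8_t)(((in[1] >> 4) & 0x0F) | ((in[2] & 0x03) << 4));
    q[3] = (in[2] >> 2) & 0x3F;
}

int main() {
    uint8_t q[4] = {0x2A, 0x15, 0x3F, 0x01}, packed[3], r[4];
    pack_fp6_x4(q, packed);
    unpack_fp6_x4(packed, r);
    for (int i = 0; i < 4; ++i) printf("%02x -> %02x\n", q[i], r[i]);  // round-trips exactly
    return 0;
}

The same scheme generalizes to fp12 (two codes per 3 bytes) and fp8 (one code per byte), which is why a single packer/unpacker layer can serve all three formats.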
// This is a copy of FP6-LLM kernel code: https://arxiv.org/abs/2401.14112

#ifndef DEEPSPEED_CUDA_LINEAR_KERNEL_MATMUL_CUH
#define DEEPSPEED_CUDA_LINEAR_KERNEL_MATMUL_CUH

#include "configs.h"
#include "utils_core.cuh"
#include "utils_gmem.cuh"

@@ -26,6 +29,8 @@ __global__ voi...
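Since the commit message above ties the Ampere+ requirement to bfloat16 inputs, kernels in this family are typically compiled behind a compute-capability guard. The sketch below shows only that standard CUDA guard pattern; the kernel name, signature, and body are illustrative placeholders, not the actual matmul kernel from this file.

#include <cuda_bf16.h>

// Illustrative stub: a real FP6 GEMM kernel dequantizes packed FP6 weights in
// registers and feeds bf16 tensor-core MMAs; here only the arch guard is shown.
__global__ void fp6_linear_stub(const __nv_bfloat16* in, __nv_bfloat16* out, int n)
{
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 800
    // sm_80+ (Ampere and newer): bf16 arithmetic and tensor cores are available.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];
#else
    // Pre-Ampere architectures lack bf16 tensor-core support; compile to a no-op.
    (void)in; (void)out; (void)n;
#endif
}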