Coupled with either 4x NVIDIA L40 or L40S 48 GB PCIe GPUs and enabled by Intel Xeon Scalable processors, this server provides the processing muscle for reliable, precise, and fast 3D graphics and compute-centric workloads. The PowerEdge R760xa server is position...
FP8 is now supported for Qwen, but MoE FP8 requires compute_capability == 9.0 (i.e., Hopper GPUs). Our MoE kernels are currently implemented in Triton, which requires triton==3.0 for FP8 on Ada Lovelace. We are limited by PyTorch's version of triton ...
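The gating rule described above can be illustrated with a small check. This is a minimal sketch, not vLLM's actual code: `parse_version` and `moe_fp8_supported` are hypothetical helpers. It assumes that any Triton release at or above 3.0 (not only exactly 3.0) enables FP8 MoE on Ada Lovelace. For reference, the L40 and L40S are Ada Lovelace GPUs with compute capability 8.9.

```python
# Hypothetical helpers illustrating the MoE FP8 gating rule; not vLLM's API.

def parse_version(v: str) -> tuple[int, ...]:
    """Turn a version string like '3.0.0+cu121' into (3, 0, 0)."""
    core = v.split("+")[0]  # drop local build suffixes such as '+cu121'
    parts = []
    for piece in core.split("."):
        digits = ""
        for ch in piece:  # keep leading digits only, e.g. '0rc1' -> '0'
            if ch.isdigit():
                digits += ch
            else:
                break
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def moe_fp8_supported(capability: tuple[int, int], triton_version: str) -> bool:
    """Return True if Triton-based MoE FP8 kernels are assumed usable."""
    if capability == (9, 0):  # Hopper: supported
        return True
    if capability == (8, 9):  # Ada Lovelace (e.g. L40 / L40S)
        # Assumption: Triton 3.0 *or newer* provides FP8 support on Ada.
        return parse_version(triton_version) >= (3, 0)
    return False  # other architectures: not supported by these kernels
```

In a real environment the inputs could come from `torch.cuda.get_device_capability()`, which returns a `(major, minor)` tuple, and from `triton.__version__`. Passing them in as plain arguments keeps the rule itself easy to test without a GPU.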