Today I want to introduce a kernel optimization in SGLang for the biased_grouped_topk function used by the DeepSeek V3 model (https://github.com/sgl-project/sglang/blob/main/python/sglang/srt/layers/moe/topk.py#L99-L149), which improves end-to-end throughput by more than 5% in DeepSeek V3 benchmarks. This function lives in the MoE layer of the DeepSeek V3/R1 models and computes each token's expert selection probabilities.
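For context, the routing this kernel implements is the DeepSeek-V3 style grouped top-k with a per-expert selection bias: the bias is added to the sigmoid scores only for choosing experts, while the returned weights come from the unbiased scores. Below is a minimal PyTorch sketch of that scheme, assuming it follows the DeepSeek-V3 paper's description; the function name, signature, and defaults are illustrative and are not copied from the SGLang source.

```python
import torch

def biased_grouped_topk_ref(gating_output: torch.Tensor,
                            correction_bias: torch.Tensor,
                            topk: int,
                            num_expert_group: int,
                            topk_group: int,
                            renormalize: bool = True):
    """Naive reference of DeepSeek-V3 style biased grouped top-k routing (illustrative).

    gating_output:   [num_tokens, num_experts] router logits
    correction_bias: [num_experts] per-expert bias used only for expert selection
    """
    scores = gating_output.sigmoid()                    # [T, E]
    scores_for_choice = scores + correction_bias        # bias affects selection only

    num_tokens, num_experts = scores.shape
    # Score each group by the sum of its top-2 biased expert scores, then keep topk_group groups.
    group_scores = (
        scores_for_choice.view(num_tokens, num_expert_group, -1)
        .topk(2, dim=-1).values.sum(dim=-1)             # [T, num_groups]
    )
    group_idx = group_scores.topk(topk_group, dim=-1).indices

    # Mask out experts belonging to non-selected groups before the final top-k.
    group_mask = torch.zeros_like(group_scores)
    group_mask.scatter_(1, group_idx, 1.0)
    score_mask = group_mask.unsqueeze(-1).expand(
        num_tokens, num_expert_group, num_experts // num_expert_group
    ).reshape(num_tokens, num_experts)
    masked_scores = scores_for_choice.masked_fill(score_mask == 0, float("-inf"))

    topk_ids = masked_scores.topk(topk, dim=-1).indices
    topk_weights = scores.gather(1, topk_ids)           # weights use the unbiased scores
    if renormalize:
        topk_weights = topk_weights / topk_weights.sum(dim=-1, keepdim=True)
    return topk_weights, topk_ids
```

The returned topk_weights and topk_ids are what the downstream fused MoE kernels consume; the optimization discussed here targets how this selection itself is computed.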