1. Describe the current behavior (Mandatory): On the master branch, enabling graph_kernel_flags: "--enable_cluster_ops=MatMul --online_tuning=1" fails with a Kernel Error. Example: the qwenvl network on 16 devices, run with the parameters dp16gbs64; training reports: Launch kernel failed: Default/GraphKernel Mul split-op193 2. Environment...
🐛 Describe the bug

    import torch
    @torch.compile(fullgraph=True)
    def f(q, k, v):
        q = torch.cos(q)
        with torch.backends.cuda.sdp_kernel(enable_flash=True):
            return F.scaled_dot_product_attention(q, k, v)
    f(*[torch.randn(1,8,1024,64, dtype=tor...
`torch.backends.cuda.sdp_kernel(enable_flash=True)` causes graph breaks · pytorch/pytorch@ec660c3
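
A graph-break report like the one above can be checked with a small helper. The following is only a sketch, not the reproduction from the issue: the helper name `compiles_as_single_graph` and the two toy functions are hypothetical. It assumes a PyTorch 2.x install where `torch.compile` is supported, and it uses the `"eager"` backend so that only Dynamo tracing is exercised and no GPU or code generation is needed.

```python
import torch
import torch._dynamo


def compiles_as_single_graph(fn, *args):
    """Return True if Dynamo traces fn into one graph, False otherwise.

    Hypothetical helper: with fullgraph=True, torch.compile raises on a
    graph break instead of silently splitting the function. Any other
    tracing failure is also reported as False here, so a False result
    means "did not compile as a single graph", not specifically
    "graph break".
    """
    torch._dynamo.reset()  # drop cached compilations from earlier calls
    compiled = torch.compile(fn, fullgraph=True, backend="eager")
    try:
        compiled(*args)
        return True
    except Exception:
        return False


def no_break(x):
    # Plain tensor ops that Dynamo can capture in a single graph.
    return torch.cos(x) + 1


def forced_break(x):
    x = torch.cos(x)
    torch._dynamo.graph_break()  # explicitly request a graph break
    return x + 1


if __name__ == "__main__":
    x = torch.randn(4)
    print("no_break:", compiles_as_single_graph(no_break, x))
    print("forced_break:", compiles_as_single_graph(forced_break, x))
```

On a machine matching the issue's setup, the same helper could be pointed at a function that wraps `F.scaled_dot_product_attention` in `torch.backends.cuda.sdp_kernel(enable_flash=True)`. The result depends on the PyTorch version, so no outcome is claimed here.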