Make @support_torch_compile work for the XLA backend. With the custom dispatcher, the overhead of dynamo guard evaluation is eliminated. For the TPU backend, each model has 2 FX graphs/dynamo bytecodes: during profiling ...
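For context, a minimal sketch of what compiling a model for the XLA/TPU backend can look like, assuming torch_xla is installed and registers the "openxla" dynamo backend; the vLLM-specific @support_torch_compile decorator is not reproduced here.

```python
# Minimal sketch: torch.compile targeting the XLA backend (assumes torch_xla is installed).
import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()
model = torch.nn.Linear(16, 16).to(device)

# "openxla" is the dynamo backend registered by torch_xla.
compiled = torch.compile(model, backend="openxla")

x = torch.randn(4, 16, device=device)
out = compiled(x)  # first call traces and compiles; later calls reuse the cached graph
```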
🐛 Describe the bug torch.compile is not supported on Windows. torch.compile has a dependency on triton: triton-lang/triton#1640 Expected outcome: document the requirements needed to fix torch.compile on Windows.
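Until that is resolved, a hedged sketch of guarding torch.compile so code still runs on platforms where it is unsupported; the helper name `compile_if_supported` is made up for illustration.

```python
# Hedged sketch: fall back to eager where torch.compile is unsupported
# (e.g. Windows, where the triton dependency is unavailable).
import platform
import torch

def compile_if_supported(model: torch.nn.Module) -> torch.nn.Module:
    if platform.system() == "Windows":
        # Skip compilation instead of failing at compile time.
        return model
    return torch.compile(model)

model = compile_if_supported(torch.nn.Linear(8, 8))
print(model(torch.randn(2, 8)))
```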
When I start the Mixtral model with --enable-torch-compile, I get the following error:

File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/custom_op.py", line 16, in forward
    return self._forward_method(*args, **kwargs)
TypeError: fused_moe_forward_native() takes from 6 ...
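For context, a rough sketch of the dispatcher pattern the traceback points at, assuming a simplified CustomOp that selects a backend-specific implementation up front; the class layout and signatures here are illustrative, not vLLM's actual code.

```python
# Simplified sketch of a CustomOp-style dispatcher (illustrative, not vLLM's implementation).
import torch

class CustomOp(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Pick a backend-specific implementation once, e.g. the "native" path
        # used under torch.compile.
        self._forward_method = self.forward_native

    def forward(self, *args, **kwargs):
        # If the selected implementation expects a different argument list than
        # the caller passes, the TypeError above is raised here.
        return self._forward_method(*args, **kwargs)

    def forward_native(self, x: torch.Tensor) -> torch.Tensor:
        raise NotImplementedError
```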
🚀 The feature, motivation and pitch It would be pretty awesome if torch.compile worked with the zig C++ compiler, which is invoked as `zig c++` and uses LLVM under the hood, so it should be (mostly?) fine compatibility-wise. I want to dep...
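A rough sketch of how one might point inductor's C++ codegen at zig, assuming inductor honors the CXX environment variable (the `CXX="zig c++"` spelling below suggests this is the intended mechanism); treat this as an assumption, not documented behavior.

```python
# Hedged sketch: point inductor's CPU codegen at zig's C++ driver.
# Assumes inductor consults the CXX environment variable when picking a compiler,
# so it must be set before torch is imported.
import os
os.environ["CXX"] = "zig c++"

import torch

@torch.compile  # the CPU path goes through inductor's C++ codegen
def f(x):
    return torch.sin(x) + torch.cos(x)

print(f(torch.randn(8)))
```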
🐛 Describe the bug my repro:

import torch
import torch._dynamo as dynamo

@dynamo.optimize()
def func():
    a = torch.rand(1, 1, 1)
    b = a.repeat(10, 1, 1)
    c = a.repeat_interleave(repeats=10, dim=0)
    return b, c

b, c = func()
torch.equal(b, c)
t...
fMHA: support for paged attention in flash fMHA: Added backwards pass for merge_attentions fMHA: Added torch.compile support for 3 biases (LowerTriangularMask, LowerTriangularMaskWithTensorBias and BlockDiagonalMask) - some might require PyTorch 2.4 ...
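A small sketch of what the compiled attention path can look like, assuming xformers exports memory_efficient_attention and LowerTriangularMask from xformers.ops, a CUDA device, and (per the note above) possibly PyTorch >= 2.4.

```python
# Hedged sketch: torch.compile around memory_efficient_attention with a causal bias.
import torch
from xformers.ops import memory_efficient_attention, LowerTriangularMask

@torch.compile
def attn(q, k, v):
    return memory_efficient_attention(q, k, v, attn_bias=LowerTriangularMask())

B, M, H, K = 2, 128, 8, 64  # batch, sequence length, heads, head dim
q = torch.randn(B, M, H, K, device="cuda", dtype=torch.float16)
k = torch.randn(B, M, H, K, device="cuda", dtype=torch.float16)
v = torch.randn(B, M, H, K, device="cuda", dtype=torch.float16)
out = attn(q, k, v)
```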
Tensors and Dynamic neural networks in Python with strong GPU acceleration - Support basic TorchBind in aot_compile and aoti_compile_and_package · pytorch/pytorch@0728785
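For orientation, a rough sketch of the AOTInductor packaging flow that commit touches, assuming a PyTorch version that ships torch._inductor.aoti_compile_and_package; the TorchBind-specific parts are omitted.

```python
# Hedged sketch of export + AOTInductor packaging (TorchBind details omitted).
# Assumes a PyTorch build that provides torch._inductor.aoti_compile_and_package.
import torch

class M(torch.nn.Module):
    def forward(self, x):
        return torch.relu(x) + 1

ep = torch.export.export(M(), (torch.randn(4),))
# Produces a self-contained package that can be loaded without the Python model code.
pkg_path = torch._inductor.aoti_compile_and_package(ep)
print(pkg_path)
```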
Tensors and Dynamic neural networks in Python with strong GPU acceleration - `torch.compile` support for `CXX="zig c++"` · pytorch/pytorch@72f2b29
Follow-up of #552. This PR adds torch library annotations to all FlashInfer kernels so that torch.compile can recognize the kernels. Most changes are tedious. I manually ran subsets of pytest test c...
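For illustration, a minimal sketch of registering a kernel with the torch library so torch.compile can trace through it without graph breaks; the namespace "mylib" and op "scale_add" are hypothetical, not FlashInfer's actual kernels.

```python
# Hedged sketch: expose a custom kernel to torch.compile via torch.library.
import torch

@torch.library.custom_op("mylib::scale_add", mutates_args=())
def scale_add(x: torch.Tensor, y: torch.Tensor, alpha: float) -> torch.Tensor:
    return x + alpha * y  # stand-in for a real kernel launch

@scale_add.register_fake
def _(x, y, alpha):
    # Shape/dtype propagation so dynamo/inductor can trace without running the kernel.
    return torch.empty_like(x)

@torch.compile(fullgraph=True)
def f(x, y):
    return scale_add(x, y, 0.5)

print(f(torch.randn(4), torch.randn(4)))
```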