This introduces a CUDAGraphRunner class to bundle building and caching CUDA graphs and attach them to the CUDAGraphTransform introduced in #977. (Design issue #981) From my POV this is ready for r...
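For readers who have not seen the pattern, here is a minimal sketch of "build once, cache, replay" for CUDA graphs. It is not the code in this PR: `GraphCache`, `input_signature`, and `capture_cuda_graph` are hypothetical names chosen for illustration, and the real CUDAGraphRunner may key its cache and manage static buffers differently. The capture helper follows the usual `torch.cuda.CUDAGraph` warm-up and capture recipe and needs a CUDA device; the cache itself is plain Python.

```python
from typing import Any, Callable, Dict, Hashable, Tuple


def input_signature(args: Tuple[Any, ...]) -> Hashable:
    """Hypothetical cache key: metadata only (shape, dtype, device), never values."""
    return tuple(
        (
            tuple(getattr(a, "shape", ())),
            str(getattr(a, "dtype", type(a).__name__)),
            str(getattr(a, "device", "cpu")),
        )
        for a in args
    )


class GraphCache:
    """Hypothetical helper: builds one replayable callable per input signature."""

    def __init__(self, fn: Callable[..., Any], build: Callable[..., Callable[..., Any]]):
        self.fn = fn
        self.build = build  # called as build(fn, *example_inputs) on a cache miss
        self._cache: Dict[Hashable, Callable[..., Any]] = {}
        self.builds = 0  # number of cache misses, handy for debugging

    def __call__(self, *args: Any) -> Any:
        key = input_signature(args)
        runner = self._cache.get(key)
        if runner is None:
            runner = self.build(self.fn, *args)
            self._cache[key] = runner
            self.builds += 1
        return runner(*args)


def capture_cuda_graph(fn: Callable[..., Any], *example_inputs: Any) -> Callable[..., Any]:
    """Capture fn into a CUDA graph and return a replay function (requires CUDA)."""
    import torch  # imported lazily so GraphCache itself has no hard dependency

    # Graph replays read from fixed memory, so keep private static input buffers.
    static_inputs = [x.clone() for x in example_inputs]

    # Warm up on a side stream before capture.
    side = torch.cuda.Stream()
    side.wait_stream(torch.cuda.current_stream())
    with torch.cuda.stream(side):
        for _ in range(3):
            fn(*static_inputs)
    torch.cuda.current_stream().wait_stream(side)

    graph = torch.cuda.CUDAGraph()
    with torch.cuda.graph(graph):
        static_output = fn(*static_inputs)

    def replay(*inputs: Any) -> Any:
        # Copy fresh values into the captured buffers, then replay the graph.
        for dst, src in zip(static_inputs, inputs):
            dst.copy_(src)
        graph.replay()
        # Note: this tensor is overwritten by the next replay.
        return static_output

    return replay
```

On a CUDA machine this would be used as `cached = GraphCache(fn, capture_cuda_graph)` followed by `cached(x)`; a second call with inputs of the same shape, dtype, and device replays the captured graph instead of rebuilding it.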
>>> import torch
>>> import torchsort
>>> x = torch.tensor([[8., 0., 5., 3., 2., 1., 6., 7., 9.]], requires_grad=True).cuda()
>>> y = torchsort.soft_sort(x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/shuiy/...