dtype=torch.float16)
median absdiff softmax tensor(0., device='cuda:0', dtype=torch.float16)

---

SDPA max absdiff (no cast): tensor(0., device='cuda:0')
SDPA max absdiff (with cast): tensor(0.000122, device='cuda:0')
argwhere absdiff no cast > 0.0001 tensor([], device='cuda:...