24,128,128), I found that only output[:,0,0,:] and output[:,23,127,:] are the same, while the others differ significantly. I also tried adding contiguous() before permuting and checked tensor.stride(), but neither helped with debugging....
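The post is truncated, so the cause is not stated. One pattern that would produce exactly this symptom is one code path reshaping (view/reshape) the two middle axes while the other permutes them. The sketch below is pure index arithmetic in plain Python, not PyTorch code. The names A, B, reshape_source and permute_source are hypothetical, and the sizes 24 and 128 are assumed from the indices quoted above.

```python
# Assumption: the two middle axes have sizes 128 and 24 in the input
# and 24 and 128 in the output, as suggested by output[:, 23, 127, :].
A, B = 24, 128

def reshape_source(i, j):
    """Flat offset read by output[i, j] when a (B, A) block is reshaped to (A, B)."""
    # A reshape keeps memory order, so the offset is the row-major index in (A, B).
    return i * B + j

def permute_source(i, j):
    """Flat offset read by output[i, j] when a (B, A) block is permuted to (A, B)."""
    # A permute swaps the axes, so output[i, j] reads input[j, i] in the (B, A) layout.
    return j * A + i

# Collect every middle-axis position where both operations read the same element.
matching = [
    (i, j)
    for i in range(A)
    for j in range(B)
    if reshape_source(i, j) == permute_source(i, j)
]
print(matching)  # [(0, 0), (23, 127)]
```

Under these assumptions the two operations agree only at the first and last positions of the middle axes, which matches the two slices reported as equal. If that is the cause, calling contiguous() would not change the result, because it only changes memory layout and not which element each index refers to.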
# Kernel Performance Events And Counters      # Kernel Performance Events And Counters
#                                             #
CONFIG_PERF_EVENTS=y                          CONFIG_PERF_EVENTS=y
# CONFIG_DEBUG_PERF_USE_VMALLOC is not set    # CONFIG_DEBUG_PERF_USE_VMALLOC is not set
CONFIG_VM_EVENT_COUNTERS=y                    CONFIG_VM_EVENT_COUNTERS=y
CONFIG_SLUB_DEBUG=y...