peak fp64 tensor core

2025-03-10 15:01:40

GitHub - peakcrosser7/cutlass: CUDA Templates for Linear...

To compile a subset of Tensor Core GEMM kernels with FP32 accumulation and FP16 input, targeting the NVIDIA Ampere and Turing architectures, use the following cmake command line:

$ cmake .. -DCUTLASS_NVCC_ARCHS='75;80' -DCUTLASS_LIBRARY_KERNELS=cutlass_tensorop_s*gemm_f16_*_nt_align8
...
$ ...
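
To get a feel for what the kernel filter in that command selects, here is a minimal sketch that tests a few names against the same pattern. It assumes the `*` in `CUTLASS_LIBRARY_KERNELS` behaves like a shell-style wildcard; the kernel names are hypothetical examples modeled on CUTLASS naming, and `matches_filter` is an illustrative helper, not part of CUTLASS. The script does not invoke cmake.

```shell
#!/bin/sh
# Illustration only: treats the kernel filter as a shell-style glob (an assumption).
pattern='cutlass_tensorop_s*gemm_f16_*_nt_align8'

# Hypothetical helper: succeeds when the given name matches the pattern.
matches_filter() {
  case "$1" in
    $pattern) return 0 ;;   # unquoted expansion is used as a glob pattern
    *)        return 1 ;;
  esac
}

# Hypothetical kernel names, chosen to show one match or mismatch each.
for name in \
  cutlass_tensorop_s1688gemm_f16_256x128_32x2_nt_align8 \
  cutlass_tensorop_s16816gemm_f16_128x128_32x3_nt_align8 \
  cutlass_tensorop_h1688gemm_f16_256x128_32x2_nt_align8 \
  cutlass_tensorop_s1688gemm_f16_256x128_32x2_tn_align8 \
  cutlass_simt_sgemm_128x128_8x2_nt_align1
do
  if matches_filter "$name"; then
    echo "selected: $name"
  else
    echo "skipped:  $name"
  fi
done
```

Under this assumption, only the first two names are selected: the third differs in the character after `cutlass_tensorop_`, the fourth in the layout suffix (`tn` rather than `nt`), and the last is not a `tensorop` kernel at all.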