CUDA: use tensor cores for MMQ by JohannesGaessler · Pull...
This PR aims to add int8 tensor core support for mul_mat_q kernels (legacy quants only for now). The supported hardware will be Turing or newer. So far there is only a prototype for q8_0 which on i...
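
For context, the arithmetic that int8 tensor cores speed up can be sketched on the CPU. This is a minimal reference sketch, not the PR's kernel. It assumes a q8_0-style block of 32 int8 values sharing one scale. The scale is stored as a plain `float` here for simplicity, and the names `BlockQ8`, `QK` and `dot_q8` are hypothetical. The int8 products are accumulated in int32, as integer tensor core instructions do, and the floating-point scales are applied once per block.

```cpp
#include <cstdint>

// Assumption: 32 quantized values per block, mirroring the q8_0 layout.
constexpr int QK = 32;

// Hypothetical simplified block: one scale plus QK int8 values.
// (The scale is a float here purely to keep the sketch self-contained.)
struct BlockQ8 {
    float  d;       // per-block scale
    int8_t qs[QK];  // quantized values
};

// Dot product of two rows, each made of `nblocks` quantized blocks.
// The inner loop is the int8 x int8 -> int32 accumulation that a tensor
// core performs for a whole tile at once; the scales are applied afterwards.
float dot_q8(const BlockQ8 *x, const BlockQ8 *y, int nblocks) {
    float sum = 0.0f;
    for (int b = 0; b < nblocks; ++b) {
        int32_t acc = 0;  // integer accumulator, no rounding error
        for (int i = 0; i < QK; ++i) {
            acc += int32_t(x[b].qs[i]) * int32_t(y[b].qs[i]);
        }
        // One floating-point multiply-add per block pair.
        sum += x[b].d * y[b].d * float(acc);
    }
    return sum;
}
```

A quantized matrix multiplication evaluates this dot product for every row/column pair of the two inputs, which is why the integer inner loop dominates the cost and is the natural part to offload to tensor cores.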