It also supports RingAttention, BPT, memeff/flashattention, and vanilla transformers. Usage: Use scan_query_chunk_size and scan_key_chunk_size to control the block sizes used in the blockwise computation of self-attention. Use scan_mlp_chunk_size to control the block size in blockwise compute of the ...
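To illustrate what a query block size and a key block size control, here is a minimal plain-Python sketch of blockwise self-attention with a running softmax. It is not the repository's implementation. The function names and the `query_chunk_size` / `key_chunk_size` arguments are hypothetical, chosen only to mirror `scan_query_chunk_size` and `scan_key_chunk_size`; masking, batching, and multiple heads are left out.

```python
import math


def _scores(query, keys):
    """Scaled dot products of one query vector against a list of key vectors."""
    scale = 1.0 / math.sqrt(len(query))
    return [scale * sum(a * b for a, b in zip(query, key)) for key in keys]


def vanilla_attention(q, k, v):
    """Reference: softmax(q k^T / sqrt(d)) v, computed over all keys at once."""
    out = []
    for query in q:
        s = _scores(query, k)
        m = max(s)
        w = [math.exp(x - m) for x in s]
        z = sum(w)
        out.append([sum(wj * vj[c] for wj, vj in zip(w, v)) / z
                    for c in range(len(v[0]))])
    return out


def blockwise_attention(q, k, v, query_chunk_size, key_chunk_size):
    """Same result as vanilla_attention, visiting keys/values one block at a time.

    query_chunk_size and key_chunk_size are hypothetical names for this sketch.
    """
    dv = len(v[0])
    out = []
    for q_start in range(0, len(q), query_chunk_size):
        q_block = q[q_start:q_start + query_chunk_size]
        # Running statistics per query in the block: max score, softmax
        # denominator, and unnormalized weighted sum of values.
        run_max = [-math.inf] * len(q_block)
        run_sum = [0.0] * len(q_block)
        run_acc = [[0.0] * dv for _ in q_block]
        for k_start in range(0, len(k), key_chunk_size):
            k_block = k[k_start:k_start + key_chunk_size]
            v_block = v[k_start:k_start + key_chunk_size]
            for i, query in enumerate(q_block):
                s = _scores(query, k_block)
                new_max = max(run_max[i], max(s))
                # Rescale earlier partial results to the new running max.
                rescale = math.exp(run_max[i] - new_max)
                w = [math.exp(x - new_max) for x in s]
                run_sum[i] = run_sum[i] * rescale + sum(w)
                run_acc[i] = [
                    a * rescale + sum(wj * vj[c] for wj, vj in zip(w, v_block))
                    for c, a in enumerate(run_acc[i])
                ]
                run_max[i] = new_max
        # Normalize once all key blocks have been seen.
        out.extend([[a / z for a in acc] for acc, z in zip(run_acc, run_sum)])
    return out
```

In this sketch the block sizes change only how much of the score matrix exists at one time, not the result: any pair of positive chunk sizes should reproduce `vanilla_attention` up to floating-point error.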
Well, the alien sounds like it should be a mod rather than vanilla, but I'm all for parasites. So... I'm sorry for flooding you with this looong text, but I couldn't help myself. Some of these might be pretty bad ideas and most are lethal without a cure, but here you have some pa...
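Very smooth to use (except for installation? x). The syntactic sugar is super sweet, ten thousand times better than triton's indexing. I casually wrote a vanilla linear attention kernel (link), and on an H100 its forward pass is nearly twice as fast as the previous fla kernel. pip uninstall triton, pip install tilelang! Published 2025-03-07 18:15 · IP location: United States ...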