I tried to set `torch.autograd.set_detect_anomaly(True)` and `CUDA_LAUNCH_BLOCKING=1` to find out what happened, and the result showed that `FlashAttnQKVPackedFuncBackward` returned NaN values for its output. It is very strange, because the model params are all finite, and the NaN happens after several ...
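What anomaly detection does here can be illustrated with a pure-Python stand-in: check each op's outputs for non-finite values and name the first offender. This is only an illustrative sketch (the helper `check_finite` and the sample values are mine, not from the original report), not the actual torch hook machinery:

```python
import math

def check_finite(op_name, values):
    """Raise as soon as an op emits NaN/Inf, naming the op -- the same
    kind of signal torch.autograd.set_detect_anomaly(True) gives when a
    backward function such as FlashAttnQKVPackedFuncBackward goes bad."""
    bad = [v for v in values if not math.isfinite(v)]
    if bad:
        raise ValueError(f"{op_name} produced non-finite value {bad[0]!r}")
    return values

# Finite outputs pass through untouched.
check_finite("SoftmaxBackward", [0.25, 0.75])

# A NaN in an op's output is caught and attributed to that op.
try:
    check_finite("FlashAttnQKVPackedFuncBackward", [0.1, float("nan")])
except ValueError as e:
    print(e)
```

In the real setup, `set_detect_anomaly(True)` performs this check after every backward node and re-raises with the forward-pass stack trace attached, which is how the NaN was pinned on the flash-attn backward rather than the surrounding model code.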
Running `>>> from flash_attn import flash_attn_qkvpacked_func, flash_attn_func` resulted in:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ImportError: cannot import name 'flash_attn_qkvpacked_func' from 'flash_attn' (/usr/local/lib/python3.10/dist-packages/flash_at...
Changed files: zigzag_ring_flash_attn_varlen_qkvpacked.py, test_zigzag_varlen_qkvpacked_func.py, README.md (4 additions & 4 deletions, @@ -18,7 +18,9 @@): The current performance on 8xH800 is ([benchmark/benchmark_qkv...
    feat = flash_attn.flash_attn_varlen_qkvpacked_func(
    AttributeError: module 'flash_attn' has no attribute 'flash_attn_varlen_qkvpacked_func'
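Both of these errors are the usual symptom of a flash-attn version mismatch: the `flash_attn_qkvpacked_func` / `flash_attn_varlen_qkvpacked_func` names exist in flash-attn 2.x but not 1.x. A hedged sketch of probing the installed version before picking an API (the distribution name "flash-attn" and the 1.x/2.x naming split are my assumptions about the library; verify them against your installed release):

```python
from importlib.metadata import version, PackageNotFoundError

def flash_attn_major(dist="flash-attn"):
    """Return the installed flash-attn major version, or None if absent."""
    try:
        return int(version(dist).split(".")[0])
    except PackageNotFoundError:
        return None

major = flash_attn_major()
if major is None:
    print("flash-attn is not installed")
elif major >= 2:
    # 2.x is where flash_attn_qkvpacked_func and
    # flash_attn_varlen_qkvpacked_func live at the top level.
    print("2.x API available: flash_attn.flash_attn_varlen_qkvpacked_func")
else:
    # 1.x exposed differently named entry points, so the imports
    # in the tracebacks above fail there.
    print("1.x installed: upgrade flash-attn to use the *_varlen_* names")
```

Checking the metadata version this way avoids importing `flash_attn` at all, which matters because on some builds the import itself can fail before you ever see which version is present.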