# export_llama_to_onnx

Export LLaMA to ONNX.
Please uninstall or disable FlashAttention (and possibly xformers) before model conversion. For the kv_cache, some models use the layout [batch, head, seq, hidden], while others use [batch, seq, head, hidden]. However,
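The two kv_cache layouts differ only in the order of the head and sequence axes, so one can be converted to the other with a single axis swap. A minimal sketch (the shapes here are illustrative, not taken from any particular model):

```python
import numpy as np

# Illustrative kv_cache dimensions (not from a specific model):
# batch=1, num_heads=8, seq_len=16, head_dim=64
kv_bhsd = np.zeros((1, 8, 16, 64), dtype=np.float32)  # [batch, head, seq, hidden]

# Swap axes 1 and 2 to get the [batch, seq, head, hidden] layout
kv_bshd = kv_bhsd.transpose(0, 2, 1, 3)

print(kv_bshd.shape)  # (1, 16, 8, 64)
```

When feeding a cached key/value tensor exported in one layout into a runtime that expects the other, such a transpose (or the equivalent `Transpose` node in the ONNX graph) is all that is needed.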