Some models (e.g. InternVideo2 multi-modality) depend on flash-attention extensions. We would like to add additional outputs for:
- fused_dense_lib: csrc/fused_dense_lib
- layer_norm: csrc/layer_norm
When installing and compiling flash-attention from source in my Dockerfile, I ran the following command: `cd flash-attention/csrc/fused_dense_lib && /opt/conda/bin/pip install .` and hit this problem: WARNING: Running pip a...
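For reference, below is a minimal sketch of how both extensions listed above can be built from a source checkout. The repository URL is the upstream flash-attention repo; the clone location and the `MAX_JOBS` value are illustrative assumptions (`MAX_JOBS` caps parallel compilation jobs, which helps on memory-constrained builders), not a definitive recipe.

```sh
# Minimal sketch: build flash-attention's optional CUDA extensions from source.
# Assumes CUDA, PyTorch, and the conda Python at /opt/conda are already present
# (as in the command above); MAX_JOBS=4 is an illustrative choice.
git clone https://github.com/Dao-AILab/flash-attention.git
cd flash-attention

export MAX_JOBS=4  # cap parallel build jobs to limit peak memory use

# fused_dense_lib extension (csrc/fused_dense_lib)
(cd csrc/fused_dense_lib && /opt/conda/bin/pip install .)

# layer_norm extension (csrc/layer_norm)
(cd csrc/layer_norm && /opt/conda/bin/pip install .)
```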
```julia
using CUDA, LuxLib, Enzyme, NNlib, Zygote

function fused_dense!(y, act, weight, x, b)
    # Pick the LuxLib operation mode for these arrays (CPU vs. GPU path),
    # then run the in-place fused dense kernel: y .= act.(weight * x .+ b).
    op = LuxLib.internal_operation_mode((y, weight, x, b))
    LuxLib.Impl.fused_dense!(y, op, act, weight, x, b)
    return
end

# CPU case (shapes chosen to match the 2x2 output y)
y = zeros(Float32, 2, 2)
weight = rand(Float32, 2, 2)
x = rand(Float32, 2, 2)
b = rand(Float32, 2)
fused_dense!(y, NNlib.relu, weight, x, b)
```