In Monarch Mixer, we use layers built from Monarch matrices to mix both across the sequence (replacing the attention operation) and across the model dimension (replacing the dense MLP). CMU, etc.
Mamba Mamba is a new state space model architecture showing promising performance ...
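The Monarch Mixer description above can be made concrete with a small sketch. It assumes one common formulation of a Monarch matrix for n = b*b: a block-diagonal matrix, a reshape-and-transpose permutation, a second block-diagonal matrix, and the same permutation again. All function names here (`blockdiag_matvec`, `monarch_matvec`, `mixer_sketch`) are hypothetical, and the sketch leaves out everything else a real layer would need (nonlinearities, gating, convolutions, normalization), so treat it as an illustration of the two mixing directions rather than the project's implementation.

```python
import numpy as np

def blockdiag_matvec(blocks, x):
    """Multiply a block-diagonal matrix by x.

    blocks has shape (b, b, b): blocks[i] is the i-th b-by-b diagonal block.
    x has shape (b*b,); chunk i of x is multiplied by blocks[i].
    """
    b = blocks.shape[0]
    chunks = x.reshape(b, b)
    return np.einsum("ijk,ik->ij", blocks, chunks).reshape(-1)

def permute(x, b):
    """Reshape x to (b, b), transpose, and flatten (its own inverse)."""
    return x.reshape(b, b).T.reshape(-1)

def monarch_matvec(L, R, x):
    """Apply the assumed Monarch structure P L P R to x."""
    b = L.shape[0]
    y = blockdiag_matvec(R, x)
    y = permute(y, b)
    y = blockdiag_matvec(L, y)
    return permute(y, b)

def monarch_dense(L, R):
    """Materialize the equivalent dense matrix (for checking only)."""
    n = L.shape[0] ** 2
    cols = [monarch_matvec(L, R, e) for e in np.eye(n)]
    return np.stack(cols, axis=1)

def mixer_sketch(X, seq_L, seq_R, dim_L, dim_R):
    """Mix X of shape (seq_len, d_model) along both axes.

    Columns are mixed across the sequence, then rows are mixed across
    the model dimension. Purely linear; a simplification.
    """
    X = np.apply_along_axis(lambda col: monarch_matvec(seq_L, seq_R, col), 0, X)
    X = np.apply_along_axis(lambda row: monarch_matvec(dim_L, dim_R, row), 1, X)
    return X

rng = np.random.default_rng(0)
seq_L, seq_R = rng.standard_normal((2, 3, 3, 3))  # seq_len = 9
dim_L, dim_R = rng.standard_normal((2, 2, 2, 2))  # d_model = 4
X = rng.standard_normal((9, 4))
Y = mixer_sketch(X, seq_L, seq_R, dim_L, dim_R)
print(Y.shape)
```

Under this assumed structure, each Monarch matrix stores 2 * b^3 = 2 * n^1.5 parameters instead of the n^2 of a dense matrix, and each matrix-vector product costs on the order of n^1.5 multiply-adds instead of n^2.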
Deep Learning from Scratch Overview ToyDL is an educational project that aims to make the concepts behind deep learning as clear as possible. I believe the key to reaching that goal is simplicity: keep everything as simple as possible. I'm trying to build this library with as little code as I can, although it...