The Mamba architecture consists of multiple stacked layers of ‘Mamba blocks’. Which, judging from the above illustration, consists of quite a few components. Another important thing to note is that the authors add the output from Selective SSM to the original input and then apply anormalization...