...Survey: Survey of Small Language Models from Penn State, ...
T5: 60M; 220M; 770M; 3B; 11B. 2019.9. Pre-train. Generic. [Github] [HF] [Paper]
Transformer: Attention Is All You Need. Ashish Vaswani et al. NeurIPS 2017. [Paper]
Mamba 1: Mamba: Linear-Time Sequence Modeling with Selective State Spaces. Albert Gu and Tri Dao. COLM 2024. [Paper]. ...