Mamba SSM architecture. Contribute to nateanl/mamba development by creating an account on GitHub.
# Reference (Megatron-LM): https://github.com/NVIDIA/Megatron-LM/blob/main/megatron/model/gpt_model.py
for name, p in module.named_parameters():
    if name in ["out_proj.weight", "fc2.weight"]:
        # Special Scaled Initialization --> There are 2 Layer Norms per Transformer Block ...
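The excerpt above is cut off before the scaling itself, so the following is only a sketch of the general idea behind GPT-2-style scaled initialization: weights that project back into the residual stream are shrunk by 1/sqrt(N), where N is the number of residual additions in the network. It uses plain Python lists in place of tensors. The function name `rescale_residual_projections`, the `n_layer` and `n_residuals_per_layer` parameters, and the toy parameter dictionary are all hypothetical and are not taken from the file shown here.

```python
import math
import random


def rescale_residual_projections(params, n_layer, n_residuals_per_layer=1,
                                 targets=("out_proj.weight", "fc2.weight")):
    """Scale the selected weights in place by 1/sqrt(n_residuals_per_layer * n_layer).

    `params` is a hypothetical stand-in for named parameters: a dict mapping
    a parameter name to a flat list of floats.
    """
    scale = 1.0 / math.sqrt(n_residuals_per_layer * n_layer)
    for name, values in params.items():
        # Only the projections that write into the residual stream are rescaled.
        if name in targets:
            params[name] = [v * scale for v in values]
    return scale


random.seed(0)
params = {
    "in_proj.weight": [random.gauss(0.0, 0.02) for _ in range(4)],
    "out_proj.weight": [random.gauss(0.0, 0.02) for _ in range(4)],
}
before = {name: list(values) for name, values in params.items()}

scale = rescale_residual_projections(params, n_layer=16)
print(scale)  # 0.25, since 1 / sqrt(1 * 16) = 1 / 4
```

With 16 layers and one residual addition per layer, `out_proj.weight` ends up at a quarter of its initial magnitude while `in_proj.weight` is left untouched.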
Mamba SSM architecture. Contribute to huxili/mamba development by creating an account on GitHub.
lm_eval --model mamba_ssm --model_args pretrained=state-spaces/mamba-2.8b-slimpj --tasks boolq,piqa,hellaswag,winogrande,arc_easy,arc_challenge,openbookqa,race,truthfulqa_mc2 --device cuda --batch_size 256
lm_eval --model mamba_ssm --model_args pretrained=state-spaces/mamba-2.8b-slimpj...
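When the same evaluation is repeated with different task lists or checkpoints, it can help to assemble the command programmatically. The sketch below rebuilds the first complete command shown above from its parts, using only the flags that appear in it. The helper name `build_lm_eval_command` is hypothetical, and nothing here is part of `lm_eval` itself.

```python
import shlex


def build_lm_eval_command(pretrained, tasks, device="cuda", batch_size=256):
    """Return the argv list for an lm_eval run (hypothetical helper)."""
    return [
        "lm_eval",
        "--model", "mamba_ssm",
        "--model_args", f"pretrained={pretrained}",
        "--tasks", ",".join(tasks),  # tasks are passed as one comma-separated value
        "--device", device,
        "--batch_size", str(batch_size),
    ]


tasks = [
    "boolq", "piqa", "hellaswag", "winogrande", "arc_easy",
    "arc_challenge", "openbookqa", "race", "truthfulqa_mc2",
]
argv = build_lm_eval_command("state-spaces/mamba-2.8b-slimpj", tasks)
command = shlex.join(argv)  # shell-safe string form of argv
print(command)
```

The resulting list can be handed to `subprocess.run(argv, check=True)` in an environment where `lm_eval` is installed. Passing a list rather than a string avoids shell quoting problems.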