⚡ main ~/litgpt litgpt chat checkpoints/google/gemma-2b
{'access_token': None, 'checkpoint_dir': PosixPath('checkpoints/google/gemma-2b'), 'compile': False, 'max_new_tokens': 50, 'multiline': False, 'precision': None, 'quantize': None, 'temperature': 0.8, 'top_k': 50, 'to...
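For reference, litgpt also exposes a Python API; a minimal sketch of the same chat settings, assuming the checkpoint has already been downloaded to checkpoints/google/gemma-2b and that this litgpt version provides the LLM class (the prompt string is illustrative):

```python
# Minimal sketch: using litgpt's Python API instead of the `litgpt chat` CLI.
# Generation arguments mirror the CLI settings printed above; the prompt is
# an illustrative assumption, not part of the original transcript.
from litgpt import LLM

llm = LLM.load("google/gemma-2b")  # reuses the already-downloaded checkpoint
reply = llm.generate(
    "What is the capital of France?",
    max_new_tokens=50,
    temperature=0.8,
    top_k=50,
)
print(reply)
```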
Model: https://huggingface.co/mustafaaljadery/gemma-2B-10M Code: https://github.com/mustafaaljadery/gemma-2B-10M?tab=readme-ov-file
Reminder
I have read the README and searched the existing issues.
System Info
Ubuntu 18, single machine with eight RTX 4090 GPUs.
Reproduction
deepspeed --include="localhost:4,5,6,7" src/train.py --model_name_or_path "google/gemma-2-2b-it" --stage sft --do_train --finetuning_type full --dataset xxx --templa...
Key links
SAELens: https://github.com/jbloomAus/SAELens
Google Colab notebook tutorial: https://colab.research.google.com/drive/17dQFYUYnuKnP6OwQPH9v_GSYUW5aj-Rp
Google DeepMind blog post: https://deepmind.google/discover/blog/gemma-scope-helping-safety-researchers-shed-light-on-the-inner-workings-of-language...
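As a quick illustration of how SAELens is typically used with the Gemma Scope release, here is a minimal sketch; the release and sae_id strings follow the public Gemma Scope naming scheme but are assumptions here, so consult the SAELens pretrained-SAE directory for the exact identifiers:

```python
# Minimal sketch: loading a Gemma Scope sparse autoencoder via SAELens.
# The release/sae_id strings are assumed examples, not verified identifiers.
from sae_lens import SAE

sae, cfg_dict, sparsity = SAE.from_pretrained(
    release="gemma-scope-2b-pt-res",            # Gemma Scope residual-stream SAEs (assumed id)
    sae_id="layer_20/width_16k/average_l0_71",  # one layer/width/sparsity choice (assumed id)
)
print(sae.cfg.hook_name, sae.cfg.d_sae)  # which activation the SAE was trained on, and its width
```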
GitHub: https://github.com/google/gemma_pytorch
Paper: https://storage.googleapis.com/deepmind-media/gemma/gemma-report.pdf
Official blog: Gemma: Google introduces new state-of-the-art open models
Other
BPE, also called digram coding, is a data compression algorithm used within a fixed-size vocabulary to achieve variable...
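To make the BPE idea concrete, here is a toy sketch of the classic merge loop; this is an illustration only, not Gemma's actual tokenizer (which is a trained SentencePiece model):

```python
# Toy byte-pair-encoding sketch: repeatedly merge the most frequent adjacent
# symbol pair until a target number of merges is reached.
from collections import Counter

def train_bpe(words, num_merges):
    corpus = Counter(tuple(w) for w in words)  # each word as a tuple of symbols
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in corpus.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)       # most frequent adjacent pair
        merges.append(best)
        new_corpus = Counter()
        for symbols, freq in corpus.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])  # merge the pair into one symbol
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_corpus[tuple(out)] += freq
        corpus = new_corpus
    return merges

print(train_bpe(["low", "lower", "lowest", "low"], num_merges=3))
```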
This project is a collection of notebooks and a simple Flask web server to se...
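As an illustration of what such a server might look like, a minimal sketch follows; the /generate route, payload fields, and use of Hugging Face transformers are assumptions for illustration, not the repository's actual code:

```python
# Minimal sketch of a Flask endpoint serving Gemma text generation.
# Route name, payload fields, and model id are illustrative assumptions.
from flask import Flask, request, jsonify
from transformers import AutoModelForCausalLM, AutoTokenizer

app = Flask(__name__)
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b")
model = AutoModelForCausalLM.from_pretrained("google/gemma-2b", device_map="auto")

@app.route("/generate", methods=["POST"])
def generate():
    prompt = request.json["prompt"]
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=50)
    return jsonify({"completion": tokenizer.decode(output[0], skip_special_tokens=True)})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)
```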
Describe the bug
When attempting to shard a gemma_2b_en model across two (consumer-grade) GPUs, I get:
ValueError: One of device_put args was given the sharding of NamedSharding(mesh=Mesh('data': 1, 'model': 2), spec=PartitionSpec('model...
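For context, the sharding named in the error corresponds to a JAX device mesh like the one below; this is a minimal sketch of how such a mesh and NamedSharding are constructed (it assumes two visible accelerators and is not the model-loading code from the report):

```python
# Minimal sketch: a 1x2 JAX device mesh ('data' x 'model') and a NamedSharding
# that splits an array across the 'model' axis, matching the sharding named in
# the ValueError above. Assumes at least two visible devices.
import jax
import numpy as np
from jax.sharding import Mesh, NamedSharding, PartitionSpec

devices = np.array(jax.devices()[:2]).reshape(1, 2)
mesh = Mesh(devices, axis_names=("data", "model"))
sharding = NamedSharding(mesh, PartitionSpec("model"))

x = np.zeros((8, 256), dtype=np.float32)
x_sharded = jax.device_put(x, sharding)  # raises if device/shape constraints are not met
print(x_sharded.sharding)
```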
deepspeed --num_gpus 4 --master_port=9901 src/train_bash.py --deepspeed ds_config.json --stage sft --do_train True --model_name_or_path ../gemma-2b --finetuning_type lora --template default --flash_attn True --dataset_dir data ...
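The command above references a ds_config.json whose contents are not shown; a minimal sketch of a ZeRO stage-2 configuration with bf16 follows (these values are illustrative assumptions, with "auto" fields left for the training framework to fill in), written as Python for consistency with the other snippets:

```python
# Minimal sketch: write a DeepSpeed ZeRO stage-2 config with bf16 enabled.
# Values are illustrative; the issue's actual ds_config.json is not shown.
import json

ds_config = {
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "bf16": {"enabled": True},  # bf16 rather than fp16 (fp16 Gemma fine-tuning is reported to hit loss=nan)
    "zero_optimization": {
        "stage": 2,
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
}

with open("ds_config.json", "w") as f:
    json.dump(ds_config, f, indent=2)
```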
@@ -120,6 +120,61 @@ LlmParameters GetGemma7BParams() {
   return llm_params;
 }

+LlmParameters GetGemma2_2BParams() {
+  LlmParameters llm_params;
+  llm_params.set_start_token_id(2);
+  llm_params.add_stop_tokens("<eos>");
+  llm_params.add_stop_tokens("<end_of_turn>");
+  llm_params.set_voc...
https://github.com/yongzhuo/gemma-sft
All weights must use bf16/fp32/tf32; with fp16, fine-tuning is very likely to hit loss=nan after a dozen or a few dozen steps (even keeping layer-norm in fp32 does not help; LLaMA does not have this problem, and the cause is currently unknown). It is strongly recommended that SFT fine-tuning, like pre-training (PT), compute the loss over both the input and the output; on the ADVGEN dataset, computing the loss only on the output will...
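To make the input-vs-output loss distinction concrete, here is a minimal sketch of the two common labeling schemes for causal-LM SFT; this is illustrative only, not the gemma-sft repository's code, and the -100 ignore index is the usual PyTorch/Hugging Face convention:

```python
# Minimal sketch: building labels for SFT. "Full" loss (input + output, as in
# pre-training) keeps every token as a target; "output-only" loss masks the
# prompt tokens with -100 so cross-entropy ignores them.
import torch

def build_labels(prompt_ids, response_ids, loss_on_input=True):
    input_ids = torch.tensor(prompt_ids + response_ids)
    if loss_on_input:
        labels = input_ids.clone()                # loss over every token, like pre-training
    else:
        labels = torch.tensor([-100] * len(prompt_ids) + response_ids)  # prompt ignored
    return input_ids, labels

ids, labels = build_labels([2, 10, 11, 12], [20, 21, 1], loss_on_input=False)
print(labels)  # tensor([-100, -100, -100, -100,   20,   21,    1])
```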