We are training the RWKV-5 World v2 1.6B/3B/7B multilingual models (covering 100+ of the world's languages, with strong code ability as well); test results are below. The earlier RWKV-4 World v1 was on par with Pythia; now that everyone has upgraded, so have we. Extrapolating from the trend, RWKV-5 World v2 1.6B at 100% of training should reach SOTA-level English performance (avg%) of around 62%. Meanwhile, its multilingual performance (xavg...
Rename the base checkpoint in your model folder to rwkv-init.pth, and change the training commands to use --n_layer 32 --n_embd 4096 --vocab_size 65536 --lr_init 1e-5 --lr_final 1e-5 for 7B. 0.1B = --n_layer 12 --n_embd 768 // 0.4B = --n_layer 24 --n_embd 1024...
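For context, a full launch line combining these flags might look like the sketch below. The data path, project directory, and ctx length are placeholders, not values from this document; the flag names match RWKV-LM's train.py as used elsewhere on this page:

```
python3 train.py --load_model rwkv-init.pth --proj_dir out-7b \
  --data_file /path/to/data --data_type binidx --vocab_size 65536 \
  --n_layer 32 --n_embd 4096 --ctx_len 4096 \
  --lr_init 1e-5 --lr_final 1e-5
```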
Measurements were made on an AMD Ryzen 9 5900X CPU and an AMD Radeon RX 7900 XTX GPU. The model is RWKV-novel-4-World-7B-20230810-ctx128k, with 32 layers offloaded to the GPU. Latency per token is shown in ms.

| Format | 1 thread | 2 threads | 4 threads | 8 threads | 24 threads |
|--------|----------|-----------|-----------|-----------|------------|
| ...    | ...      | ...       | ...       | ...       | ...        |
Model download link: https://modelscope.cn/models/Blink_DL/rwkv-6-world/file/view/master?fileName=RWKV-x060-World-7B-v2.1-20240507-ctx4096.pth&status=2 After downloading, running it directly with the cuda fp16i8 -> cuda fp16 *1 strategy works without problems; but after converting it with the same strategy and then switching to the converted quantized model, ...
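A minimal sketch of that load/convert/reload flow using the `rwkv` pip package (ChatRWKV); the file paths here are placeholders, and details may vary across package versions:

```python
import os
os.environ['RWKV_JIT_ON'] = '1'

from rwkv.model import RWKV

# One-time conversion: load the checkpoint with the target strategy and
# write a pre-converted copy, then exit (this mirrors what ChatRWKV's
# convert_model.py does via the same constructor argument).
RWKV(model='RWKV-x060-World-7B-v2.1-20240507-ctx4096',
     strategy='cuda fp16i8 -> cuda fp16 *1',
     convert_and_save_and_exit='RWKV-7B-converted.pth')

# Later runs: load the converted checkpoint with the SAME strategy string;
# a converted model must be run with the strategy it was converted for,
# and a mismatch is a common cause of failures after switching files.
model = RWKV(model='RWKV-7B-converted.pth',
             strategy='cuda fp16i8 -> cuda fp16 *1')
```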
f"{args.proj_dir}/rwkv-final.pth", ) def on_train_epoch_start(self, trainer, pl_module): args = self.args if pl.__version__[0]=='2': dataset = trainer.train_dataloader.dataset else: dataset = trainer.train_dataloader.dataset.datasets assert "MyDataset" in str(dataset) dataset.glob...
The default setting will train a 3B RWKV model on the LibriSpeech 960h dataset, with 4 devices and a batch size of 4 per device (effective batch size = 16). The script will overwrite the .pth files in output/, so make sure to copy any .pth model files you still need from this path to another directory first ...
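If you want to keep earlier checkpoints before relaunching, a copy step like this is enough (the backup directory name is a placeholder):

```
mkdir -p saved_ckpts
cp output/*.pth saved_ckpts/
```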
Hey, I am getting this error:

root@DESKTOP-TTBPHVB:~/rwkv/RWKV-v5-lora# python3 train.py --load_model RWKV-5-World-0.4B-v2-20231113-ctx4096.pth --proj_dir . --data_file output --data_type binidx --vocab_size 65536 --ctx_len 1024 --epoch_...