Use rwkv 0.8.26+ to auto-load the trained "time_state".

Initializing RWKV 5/6 Models

When you train RWKV from scratch, try my initialization for best performance. Check generate_init_weight() of src/model.py:

emb.weight => nn.init.uniform_(a=-1e-4, b=1e-4) (Note ln0 of bloc...
"url": "https://huggingface.co/xiaol/RWKV-v5-world-v2-1.5B-one-state-slim-16k/blob/main/RWKV-5-1B5-one-state-slim.pth", "downloadUrl": "https://huggingface.co/xiaol/RWKV-v5-world-v2-1.5B-one-state-slim-16k/resolve/main/RWKV-5-1B5-one-state-slim.pth", "tags": [ "Fine...
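RWKV-5 is multi-head; only one head is shown here. There is also a LayerNorm for each head (hence it is actually GroupNorm).

RWKV-4 with real-valued k & v & u & w:
  $y_0 = r_0 \frac{u k_0 v_0}{u k_0}$
  $y_1 = r_1 \frac{u k_1 v_1 + k_0 v_0}{u k_1 + k_0}$
  $y_2 = r_2$ (u k_2 v_2 + k_1 v_1 + w k_0 ...

RWKV-5 with matrix-valued k†v & u & w:
  $y_0 = r_0 (u k_0^\dagger v_0)$
  $y_1 = r_1 (u k_1^\dagger v_1 + k_0^\dagger v_0)$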
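The per-token formulas for y0, y1, y2 can be read as a running state that is decayed by w and then updated with the newest token. Below is a minimal pure-Python sketch of that reading, not the project's actual kernel. The function names are hypothetical, u and w are treated as plain scalars for brevity, k is assumed to be already positive in the RWKV-4 case, and the decay pattern past the terms visible in the source is an assumption.

```python
def rwkv4_one_channel(r, k, v, u, w):
    """RWKV-4 style recurrence for one channel with real-valued k, v, u, w.

    Assumes k values are positive weights, so the denominator is never zero.
    """
    num = 0.0  # decayed running sum of k_i * v_i over past tokens
    den = 0.0  # decayed running sum of k_i over past tokens
    ys = []
    for rt, kt, vt in zip(r, k, v):
        # The current token gets the bonus u; past tokens come from the state.
        ys.append(rt * (u * kt * vt + num) / (u * kt + den))
        # Decay the old state by w, then add the current token.
        num = w * num + kt * vt
        den = w * den + kt
    return ys


def rwkv5_one_head(r, k, v, u, w):
    """RWKV-5 style recurrence for one head with a matrix-valued state.

    r, k, v are lists of per-token vectors; the state holds decayed
    outer products k_i^T v_i. u and w are scalars here (an assumption).
    """
    n, m = len(k[0]), len(v[0])
    state = [[0.0] * m for _ in range(n)]
    ys = []
    for rt, kt, vt in zip(r, k, v):
        # Outer product k_t^T v_t, an n x m matrix.
        kv = [[ki * vj for vj in vt] for ki in kt]
        # y_t = r_t (u * k_t^T v_t + state): row vector times matrix.
        y = [
            sum(rt[i] * (u * kv[i][j] + state[i][j]) for i in range(n))
            for j in range(m)
        ]
        ys.append(y)
        # Decay the old state by w, then add the current outer product.
        state = [
            [w * state[i][j] + kv[i][j] for j in range(m)] for i in range(n)
        ]
    return ys


if __name__ == "__main__":
    print(rwkv4_one_channel([1, 1, 1], [1, 1, 1], [1, 2, 3], u=2.0, w=0.5))
    print(rwkv5_one_head([[1, 0], [0, 1]], [[1, 2], [3, 4]],
                         [[1, 1], [2, 0]], u=2.0, w=0.5))
```

Both functions keep a constant-size state between tokens. In the RWKV-5 sketch there is no denominator, which matches the matrix-valued formulas shown for y0 and y1.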
"name": "RWKV-5-1B5-one-state-slim.pth", "desc": { "en": "RWKV-5 Global Languages 1.5B v2 Ctx16k Role Play", "zh": "RWKV-5 全球语言 1.5B v2 16k上下文 角色扮演", "ja": "RWKV-5 グローバル言語 1.5B v2 16kコンテキスト ロールプレイ" }, "size": 3155589871...