num_parameters: 1557686400 => bytes: 3115372800
allocated 2971 MiB for model parameters
batch_size B=16 * seq_len T=1024 * num_processes=8 and total_batch_size=1048576 => setting grad_accum_steps=8
created directory: log_gpt2_1558M
allocating 40409 MiB for activations
val loss 11.129390
allocating 2971 MiB for parameter gradients
allocat...
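The grad_accum_steps value and the 2971 MiB figure in this log follow directly from the quoted sizes. A minimal arithmetic sketch (the 2-bytes-per-parameter assumption, i.e. bf16 storage, is inferred from the byte count rather than stated in the log):

# Reproduce the numbers in the log above (illustrative arithmetic only).
num_parameters = 1_557_686_400          # GPT-2 1558M
bytes_per_param = 2                     # consistent with bf16 parameter storage
param_bytes = num_parameters * bytes_per_param
print(param_bytes)                      # 3115372800
print(param_bytes / (1024 ** 2))        # ~2971 MiB (same again for the gradients)

B, T, num_processes = 16, 1024, 8       # micro-batch size, sequence length, GPUs
total_batch_size = 1_048_576            # desired tokens per optimizer step (2**20)
tokens_per_micro_step = B * T * num_processes
grad_accum_steps = total_batch_size // tokens_per_micro_step
print(grad_accum_steps)                 # 8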
import torch
import torch.nn as nn

class Conv1D(nn.Module):
    # GPT-2 style "Conv1D": a linear layer whose weight matrix is stored transposed.
    def __init__(self, nf, nx):
        super().__init__()
        self.nf = nf
        w = torch.empty(nx, nf)
        nn.init.normal_(w, std=0.02)
        self.weight = nn.Parameter(w)
        self.bias = nn.Parameter(torch.zeros(nf))

    def forward(self, x):
        size_out = x.size()[:-1] + (self.nf,)
        x = torch.addmm(self.bias, x.view(-1, x.size(-1)), self.weight)
        return x.view(size_out)
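For a quick sanity check of the shapes this layer produces, a small usage sketch (the 3 * 768 fused-QKV sizing is an illustrative example, not part of the snippet above):

conv = Conv1D(nf=3 * 768, nx=768)       # e.g. a fused QKV projection for a 768-wide model
x = torch.randn(2, 1024, 768)           # (batch, sequence, embedding)
y = conv(x)
print(y.shape)                          # torch.Size([2, 1024, 2304])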
class GPT(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.config = config
        self.transformer = nn.ModuleDict(dict(
            wte = nn.Embedding(config.vocab_size, config.n_embd),   # token embeddings
            wpe = nn.Embedding(config.block_size, config.n_embd),   # position embeddings
            h = nn.ModuleList([Block(config) for _ in range(config.n_layer)]),
            ln_f = nn.LayerNorm(config.n_embd),
        ))
        self.lm_head = nn.Linear(config.n_embd, config.vocab_size, bias=False)
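The config object this module reads from only needs a handful of fields. A minimal sketch with GPT-2 124M-style defaults (the GPTConfig dataclass itself is an illustration, not part of the snippet above):

from dataclasses import dataclass

@dataclass
class GPTConfig:
    block_size: int = 1024    # maximum sequence length (positions for wpe)
    vocab_size: int = 50257   # GPT-2 BPE vocabulary size (tokens for wte)
    n_layer: int = 12         # number of transformer Blocks in h
    n_head: int = 12          # attention heads per Block
    n_embd: int = 768         # embedding / residual stream width

model = GPT(GPTConfig())      # assumes Block is defined elsewhere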
With this much larger dataset they trained a much larger model, a 1.5B parameter Transformer (BERT-large has only about 340M parameters). Unfortunately, despite the much bigger model, the authors found the results were not much better than BERT's, so they turned instead to a setting called zero-shot. That setting is not widely used in NLP; the most common recipe is still the BERT-style one, where you take a fairly large dataset and do a pre-...
#! /bin/bash
# Runs the "345M" parameter model

GPUS_PER_NODE=8
# Change for multinode config
MASTER_ADDR=localhost
MASTER_PORT=6000
NNODES=1
NODE_RANK=0
WORLD_SIZE=$(($GPUS_PER_NODE*$NNODES))

DATA_PATH=data/meg-gpt2_text_document
CHECKPOINT_PATH=checkpoints/gpt2

DISTRIBUTED_ARGS="...
Megatron-LM GPT2 345M
Description: 345M parameter generative Megatron model
Publisher: -
Latest Version: v0.0
Modified: April 5, 2023
Size: 676.92 MB
Tags: Conversational AI, NLP, NLU, Natural Language Understanding

Megatron is a large, powerful transformer. For this ...
GPT_CONFIG_124M = {
    "vocab_size": 50257,      # vocabulary size
    "context_length": 256,    # context length
    "emb_dim": 768,           # embedding dimension
    "n_heads": 12,            # number of attention heads
    "n_layers": 12,           # number of transformer layers (N=12)
    "drop_rate": 0.1,         # dropout rate
    "qkv_bias": False         # no bias on the QKV projections
}

0. Download the training data

The training data comes from "Roman Fever", a short story by Edith Wharton.
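As a sketch of this step, assuming the story text has already been saved locally as roman_fever.txt (the filename is hypothetical, and using tiktoken's GPT-2 tokenizer is an assumption chosen to match "vocab_size": 50257 above):

import tiktoken

# Assumed local copy of the short story; obtain it however is convenient.
with open("roman_fever.txt", "r", encoding="utf-8") as f:
    raw_text = f.read()

# GPT-2 BPE tokenizer, whose vocabulary size matches the config's 50257.
tokenizer = tiktoken.get_encoding("gpt2")
token_ids = tokenizer.encode(raw_text)
print(f"{len(raw_text)} characters, {len(token_ids)} tokens")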
#! /bin/bash
# Runs the "345M" parameter model

GPUS_PER_NODE=1
# Change for multinode config
MASTER_ADDR=localhost
MASTER_PORT=6000
NNODES=1
NODE_RANK=0
WORLD_SIZE=$(($GPUS_PER_NODE*$NNODES))

DATA_PATH=data/meg-gpt2_text_document
CHECKPOINT_PATH=checkpoints/gpt2

DISTRIBUTED_ARGS="...