Transformers such as BERT default to N=4 for the feed-forward expansion factor, so the 3072 you mention is 4 times the default hidden size of 768.
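A minimal sketch of that relationship, using the BERT-base defaults (the variable names here are illustrative, not from any specific config file):

```python
hidden_size = 768       # BERT-base default hidden size
ffn_multiplier = 4      # N=4, the usual Transformer feed-forward expansion
intermediate_size = hidden_size * ffn_multiplier

print(intermediate_size)  # 3072
```

In Hugging Face BERT configs this quantity appears as `intermediate_size`, which is 3072 for `bert-base` and 4096 for `bert-large` (4 × 1024).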
NOTE: When adjusting the batch size in any of the configs, make sure to also adjust the number of gradient accumulation steps, since the product of the two constitutes the effective batch size.
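A quick sketch of how the two settings combine (the values below are hypothetical, not from any of the project's configs):

```python
per_step_batch_size = 8   # hypothetical per-step batch size from a config
accumulation_steps = 4    # hypothetical number of accumulation steps

# Gradients are accumulated over several steps before the optimizer
# updates, so the effective batch size is the product of the two.
effective_batch_size = per_step_batch_size * accumulation_steps
print(effective_batch_size)  # 32
```

So halving the per-step batch size (e.g. to fit GPU memory) while doubling the accumulation steps keeps the effective batch size, and thus the training dynamics, roughly unchanged.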