3.2 Method Validation: The Chinchilla Model

1. Article Overview

This paper comes from the DeepMind team and was an Outstanding Paper at NeurIPS 2022, which alone makes it worth studying as a selected reading. In hindsight, its results also had a considerable influence on how later large models were trained: many models at roughly the 70B scale, both Chinese and international, are trained on datasets on the order of 2T to 3T tokens, a choice that likely owes something to this paper's conclusions.
Chinchilla significantly outperforms Gopher (280B), GPT-3 (175B), Jurassic-1 (178B), and Megatron-Turing NLG (530B) on many downstream tasks. [NeurIPS 2022] Training Compute-Optimal Large Language Models. Chinchilla also served as the base model for DeepMind's later dialogue system, Sparrow. Some of the challenges that large models face, ...
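As a quick back-of-the-envelope check on those token counts (my own arithmetic, not a figure quoted from the paper): the paper's widely cited rule of thumb of roughly 20 training tokens per parameter gives, for a 70B-parameter model,

$$
D_{\mathrm{opt}} \approx 20N = 20 \times 7\times 10^{10} = 1.4\times 10^{12}\ \text{tokens},
$$

which matches Chinchilla's own 1.4T-token training run. The 2T-3T datasets of later 70B models therefore go beyond this compute-optimal point, spending extra training compute to get a stronger model at the same inference cost.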
This research from DeepMind investigates, for a given compute budget, the optimal trade-off between the model size and the number of training tokens of a transformer language model.
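To make the question concrete, below is a minimal sketch of the allocation rule the paper arrives at. This is not DeepMind's code: it assumes the paper's FLOPs approximation C ≈ 6ND, the roughly equal exponents a ≈ b ≈ 0.5 found by the paper's first two estimation approaches (so N_opt ∝ C^0.5 and D_opt ∝ C^0.5), and calibrates the constants to the one published compute-optimal point, Chinchilla itself (C ≈ 5.76e23 FLOPs, N = 70B parameters, D = 1.4T tokens).

```python
# Sketch of the compute-optimal allocation rule from
# "Training Compute-Optimal Large Language Models" (Hoffmann et al., 2022).
# Assumptions (labeled, since the paper's fitting code is not reproduced here):
#   - training FLOPs approximated as C ~= 6 * N * D, as in the paper;
#   - exponents a ~= b ~= 0.5 (the paper's first two estimation approaches);
#   - constants calibrated to the published Chinchilla point.

CHINCHILLA_C = 5.76e23  # training budget of Gopher/Chinchilla, in FLOPs
CHINCHILLA_N = 70e9     # Chinchilla's parameter count
CHINCHILLA_D = 1.4e12   # Chinchilla's training tokens

def compute_optimal(c_flops: float, a: float = 0.5, b: float = 0.5):
    """Return the (approximately) compute-optimal (params, tokens) for a budget."""
    n_opt = CHINCHILLA_N * (c_flops / CHINCHILLA_C) ** a
    d_opt = CHINCHILLA_D * (c_flops / CHINCHILLA_C) ** b
    return n_opt, d_opt

if __name__ == "__main__":
    for c in (1e21, 1e22, 5.76e23, 1e25):
        n, d = compute_optimal(c)
        print(f"C={c:.2e} FLOPs -> N ~ {n / 1e9:.1f}B params, "
              f"D ~ {d / 1e12:.2f}T tokens, ~{d / n:.0f} tokens/param")
```

With equal exponents the ratio D/N stays constant at every budget, which recovers the ~20 tokens-per-parameter figure used in the arithmetic above; the paper's third estimation approach fits slightly different exponents, so this sketch should be read as an approximation rather than the paper's exact fit.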