As a result, GPT-2 demonstrated the ability to solve many new tasks without supervised training on large labeled datasets. Two factors mainly distinguished its successor, GPT-3: the number of model parameters increased to 175B, and 45 TB of text data was used for pre-training. This model ...