The model simply predicts the next word token from the preceding tokens, with the relationships between tokens modeled through "96 hidden state transformers". At each of these steps (0…95), the model processes how each current symbol relates to all the others at that ...
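A minimal sketch of the idea, assuming a GPT-style decoder (the text itself does not name the architecture): each token gets a hidden-state vector, that vector passes through a stack of 96 layers where every position attends to itself and the positions before it, and the last position's final hidden state is scored against the vocabulary to get next-token probabilities. The names `ToyLayer` and `ToyLM` are hypothetical. The weights are random and untrained, and the layer is heavily simplified (one attention head, no feed-forward block, a parameter-free layer norm, output scores tied to the input embeddings), so only the data flow is meaningful, not the predictions.

```python
import math
import random

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def layer_norm(v, eps=1e-5):
    # Simplified layer norm: zero mean, unit variance, no learned gain/bias.
    mean = sum(v) / len(v)
    var = sum((x - mean) ** 2 for x in v) / len(v)
    return [(x - mean) / math.sqrt(var + eps) for x in v]

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

class ToyLayer:
    """Hypothetical, simplified transformer layer: causal self-attention + residual."""

    def __init__(self, d, rng):
        self.d = d
        self.Wq = rand_matrix(d, d, rng)  # query projection
        self.Wk = rand_matrix(d, d, rng)  # key projection
        self.Wv = rand_matrix(d, d, rng)  # value projection

    def __call__(self, hidden):
        q = [matvec(self.Wq, h) for h in hidden]
        k = [matvec(self.Wk, h) for h in hidden]
        v = [matvec(self.Wv, h) for h in hidden]
        out = []
        for i in range(len(hidden)):
            # Causal mask: position i relates only to positions 0..i.
            scores = [
                sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(self.d)
                for j in range(i + 1)
            ]
            weights = softmax(scores)
            mixed = [sum(w * v[j][c] for j, w in enumerate(weights)) for c in range(self.d)]
            # Residual connection, then normalization to keep values bounded.
            out.append(layer_norm([h + m for h, m in zip(hidden[i], mixed)]))
        return out

class ToyLM:
    """Hypothetical toy next-token model with a stack of n_layers ToyLayers."""

    def __init__(self, vocab_size, d=8, n_layers=96, seed=0):
        rng = random.Random(seed)
        self.embed = rand_matrix(vocab_size, d, rng, scale=1.0)  # token -> vector
        self.layers = [ToyLayer(d, rng) for _ in range(n_layers)]

    def next_token_probs(self, tokens):
        hidden = [list(self.embed[t]) for t in tokens]
        for layer in self.layers:  # steps 0 .. n_layers - 1
            hidden = layer(hidden)
        last = hidden[-1]  # hidden state at the final position
        # Tied output projection: score each vocabulary entry against it.
        logits = [sum(e * h for e, h in zip(row, last)) for row in self.embed]
        return softmax(logits)

    def predict_next(self, tokens):
        probs = self.next_token_probs(tokens)
        return max(range(len(probs)), key=probs.__getitem__)

model = ToyLM(vocab_size=10, n_layers=96)
probs = model.next_token_probs([1, 4, 2])
print(len(probs), round(sum(probs), 6))
print(model.predict_next([1, 4, 2]))
```

Because the attention at position `i` only looks at positions `0..i`, the same stack can be run once per generated token, appending each predicted token to the input before predicting the next one.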