To predict the next value, we use the embedding of the last token.

    # Calculate logits by matrix multiplication between the final embedding and the transpose of the output weight tensor
    logits = torch.matmul(final_embedding[-1], model["output.weight"].T)
    # Find the index of the maximum...
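
The logits step can be tried on toy tensors to see the shapes involved. The following is a minimal sketch, not the original code: the sizes, the values, and the name `output_weight` (standing in for `model["output.weight"]`) are assumptions made for illustration, and the final `torch.argmax` call is one plausible reading of the "index of the maximum" step.

```python
import torch

# Toy sizes chosen only for illustration (not the real model's dimensions).
seq_len, dim, vocab_size = 3, 4, 5

# Hypothetical stand-in for the final embeddings: one row per token.
final_embedding = torch.zeros(seq_len, dim)
final_embedding[-1] = torch.tensor([1.0, 0.0, 0.0, 0.0])  # last token's embedding

# Hypothetical stand-in for model["output.weight"]: one row per vocabulary entry.
output_weight = torch.zeros(vocab_size, dim)
output_weight[2, 0] = 3.0  # make vocabulary entry 2 align with the last embedding

# One score (logit) per vocabulary entry: shape (vocab_size,)
logits = torch.matmul(final_embedding[-1], output_weight.T)

# Greedy choice (assumed): the index with the largest logit.
next_token = torch.argmax(logits, dim=-1)
```

Only the last row of `final_embedding` is used, so `logits` has one entry per vocabulary item rather than one per position. Turning `next_token` back into text would require the tokenizer, which is not shown here.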