```python
from tiktoken.load import load_tiktoken_bpe
import torch
import json
import matplotlib.pyplot as plt

tokenizer_path = "Meta-Llama-3-8B/tokenizer.model"
special_tokens = [
    "<|begin_of_text|>",
    "<|end_of_text|>",
    "<|reserved_special_token_0|>",
    "<|reserved_special_token_1|>",
    "<|reserved_special_token_2|>",
    "<|reserved_special_token_3|>",
    "<|start_header_id|>",
    "<|end_header_id|>",
    "<|reserved_special_token_4|>",
    "<|eot_id|>",  # end of turn
] + [f"<|reserved_special_token_{i}|>" for i in range(5, 256 - 5)]
```
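the merge ranks in tokenizer.model plus the special tokens above are enough to rebuild the tokenizer with tiktoken. a minimal sketch of that construction, continuing from the block above (the pat_str is assumed to match the pre-tokenization regex llama3 ships with, so verify it against the official tokenizer):

```python
from pathlib import Path
import tiktoken

mergeable_ranks = load_tiktoken_bpe(tokenizer_path)

tokenizer = tiktoken.Encoding(
    name=Path(tokenizer_path).name,
    # assumed pre-tokenization regex (same family as cl100k_base); check against llama3's tokenizer
    pat_str=r"(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\r\n\p{L}\p{N}]?\p{L}+|\p{N}{1,3}| ?[^\s\p{L}\p{N}]+[\r\n]*|\s*[\r\n]+|\s+(?!\S)|\s+",
    mergeable_ranks=mergeable_ranks,
    # special tokens get ids appended right after the regular BPE ranks
    special_tokens={token: len(mergeable_ranks) + i for i, token in enumerate(special_tokens)},
)

tokenizer.decode(tokenizer.encode("hello world!"))  # 'hello world!'
```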
in the next section we will unwrap the queries from multiple attention heads; the resulting shape is [32x128x4096]. here, 32 is the number of attention heads in llama3, 128 is the size of the query vector and 4096 is the size of the token embedding.

```python
q_layer0 = model["layers.0.attention.wq.weight"]
```
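a minimal sketch of that unwrapping; n_heads = 32 and dim = 4096 are hard-coded here from the description above (in the full walkthrough they would come from the model's config), and a random tensor stands in for the real wq weight:

```python
import torch

n_heads, dim = 32, 4096           # per the text above; normally read from the config
q_layer0 = torch.randn(dim, dim)  # stand-in for model["layers.0.attention.wq.weight"]

head_dim = q_layer0.shape[0] // n_heads           # 4096 // 32 = 128
q_layer0 = q_layer0.view(n_heads, head_dim, dim)  # unwrap the per-head query projections
print(q_layer0.shape)                             # torch.Size([32, 128, 4096])
```

this gives one [128x4096] query projection per head, which is the shape the text above refers to.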
here we plot the query-key scores between the prompt tokens as a heatmap, with both axes labelled by the prompt tokens:

```python
_, ax = plt.subplots()
im = ax.imshow(qk_per_token.to(float).detach(), cmap='viridis')
ax.set_xticks(range(len(prompt_split_as_tokens)))
ax.set_yticks(range(len(prompt_split_as_tokens)))
ax.set_xticklabels(prompt_split_as_tokens)
ax.set_yticklabels(prompt_split_as_tokens)
ax.figure.colorbar(im, ax=ax)
```
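for reference, the qk_per_token being plotted is just the scaled dot product between the per-token queries and keys. a self-contained sketch with illustrative shapes (head_dim of 128 as described earlier; the 17 tokens are an assumption, not taken from this section):

```python
import torch

seq_len, head_dim = 17, 128  # illustrative sequence length, head_dim from the text above
q_per_token = torch.randn(seq_len, head_dim)
k_per_token = torch.randn(seq_len, head_dim)

# one score per (query token, key token) pair, scaled by sqrt(head_dim)
qk_per_token = torch.matmul(q_per_token, k_per_token.T) / head_dim ** 0.5
print(qk_per_token.shape)  # torch.Size([17, 17])
```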