we use tiktoken as the tokenizer: first the imports, the path to the downloaded tokenizer model, and llama3's special tokens

```python
from tiktoken.load import load_tiktoken_bpe
import torch
import json
import matplotlib.pyplot as plt

tokenizer_path = "Meta-Llama-3-8B/tokenizer.model"
special_tokens = [
    "<|begin_of_text|>",
    "<|end_of_text|>",
    "<|reserved_special_token_0|>",
    "<|reserved_special_token_1|>",
    "<|reserved_special_token_2|>",
    "<|reserved_special_token_3|>",
    "<|start_header_id|>",
    "<|end_header_id|>",
    "<|reserved_special_token_4|>",
    "<|eot_id|>",  # end of turn
] + [f"<|reserved_special_token_{i}|>" for i in range(5, 256 - 5)]
```
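from these pieces we can assemble the tokenizer object itself. a minimal sketch, assuming tiktoken's `Encoding` API and llama3's regex split pattern; the special tokens get ids right after the bpe vocabulary

```python
from pathlib import Path
import tiktoken

# load the bpe merge ranks from the downloaded tokenizer.model file
mergeable_ranks = load_tiktoken_bpe(tokenizer_path)

# build a tiktoken Encoding: special tokens are appended after the bpe vocab
tokenizer = tiktoken.Encoding(
    name=Path(tokenizer_path).name,
    pat_str=r"(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\r\n\p{L}\p{N}]?\p{L}+|\p{N}{1,3}| ?[^\s\p{L}\p{N}]+[\r\n]*|\s*[\r\n]+|\s+(?!\S)|\s+",
    mergeable_ranks=mergeable_ranks,
    special_tokens={token: len(mergeable_ranks) + i for i, token in enumerate(special_tokens)},
)

# quick round trip to confirm the tokenizer works
print(tokenizer.decode(tokenizer.encode("hello world!")))  # hello world!
```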
in the next section we will unwrap the queries from multiple attention heads, the resulting shape is [32x128x4096]

here, 32 is the number of attention heads in llama3, 128 is the size of the query vector and 4096 is the size of the token embedding

```python
q_layer0 = model["layers.0.attention.wq.weight"]
head_dim = q_layer0.shape[0] // n_heads
q_layer0 = q_layer0.view(n_heads, head_dim, dim)
q_layer0.shape  # torch.Size([32, 128, 4096])
```
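to make this concrete, here's a sketch of pulling out the first head and projecting the token embeddings into its query space; `token_embeddings` is assumed to be the [num_tokens x 4096] matrix from the earlier embedding steps

```python
# first head of the first layer: [128x4096]
q_layer0_head0 = q_layer0[0]

# project every token embedding into this head's query space:
# [num_tokens x 4096] @ [4096 x 128] -> [num_tokens x 128],
# i.e. one 128-dim query vector per token in the prompt
q_per_token = torch.matmul(token_embeddings, q_layer0_head0.T)
q_per_token.shape
```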
we plot the query-key scores as a heatmap, so we can see how strongly each token's query matches each token's key

```python
def display_qk_heatmap(qk_per_token):
    _, ax = plt.subplots()
    im = ax.imshow(qk_per_token.to(float).detach(), cmap='viridis')
    ax.set_xticks(range(len(prompt_split_as_tokens)))
    ax.set_yticks(range(len(prompt_split_as_tokens)))
    ax.set_xticklabels(prompt_split_as_tokens)
    ax.set_yticklabels(prompt_split_as_tokens)
    ax.figure.colorbar(im, ax=ax)

display_qk_heatmap(qk_per_token)
```
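for reference, the matrix being plotted is the scaled dot product between queries and keys for one head. a sketch assuming `q_per_token` was computed as above and `k_per_token` ([num_tokens x 128]) was built the same way from the key weights

```python
# score[i][j]: how strongly token i's query matches token j's key,
# divided by sqrt(head_dim) = sqrt(128) to keep the later softmax well-behaved
qk_per_token = torch.matmul(q_per_token, k_per_token.T) / (128) ** 0.5
qk_per_token.shape  # [num_tokens, num_tokens]
```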