## Linformer Language Model

```python
from linformer_pytorch import LinformerLM
import torch

model = LinformerLM(
        num_tokens=10000,  # Number of tokens in the LM
        input_size=512,    # Dimension 1 of the input
        channels=64,       # Dimension 2 of the input
        dim_d=None,        # Overwrites the inner dim of the attention heads. If None, sticks...
        ...
        )
```
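For reference, a minimal sketch of a forward pass with the configuration above; the shapes follow from the comments (token ids in, per-token logits out), but treat it as an illustration rather than a confirmed snippet:

```python
x = torch.randint(0, 10000, (1, 512))  # a batch of token ids: (batch, input_size)
y = model(x)                           # logits over the vocabulary for each position
print(y.shape)                         # expected: torch.Size([1, 512, 10000])
```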
From `linformer_pytorch/visualizer.py`:

```diff
@@ -39,15 +39,15 @@ def get_head_visualization(self, depth_no, max_depth, head_no, axs):
 def plot_all_heads(self, title="Visualization of Attention Heads", show=True, save_file=None):
     """
     Showcases all of the heads on a grid. It shows the...
```
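For context, a sketch of how `plot_all_heads` might be called from user code. The `Visualizer` wrapper, the `visualize=True` forward flag, and the constructor arguments shown here are assumptions about the surrounding API, not confirmed signatures:

```python
import torch
from linformer_pytorch import Linformer, Visualizer  # Visualizer assumed to be exported here

model = Linformer(
        input_size=512,  # Dimension 1 of the input
        channels=16,     # Dimension 2 of the input
        dim_k=128,
        nhead=4,
        depth=3,
        )
x = torch.randn(1, 512, 16)
y = model(x, visualize=True)  # assumed: retains the attention matrices for plotting

vis = Visualizer(model)
vis.plot_all_heads(title="Visualization of Attention Heads",
                   show=True,        # display the grid of heads
                   save_file=None)   # or a path such as "./heads.png" to save it
```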
The upper right triangle is masked (here, this is exactly what you are referring to). This is done with the function `gen_causal_mask`:

```python
def gen_causal_mask(input_size, dim_k, full_attention=False):
    """
    Generates a causal mask of size (input_size, dim_k) for linformer
    Else, it generates (input_size, input_size) for full attention
    """
    if full_attention:
        return (torch.triu(torch.ones(input_size, input_size)) == 1).transpose(0, 1)
    return (torch.triu(torch.ones(dim_k, input_size)) == 1).transpose(0, 1)
```
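To make the shape concrete, a small sketch (toy sizes, chosen only for illustration) that builds the Linformer-style mask and prints it; the convention assumed here is that `True` marks positions that are kept and `False` marks the "upper right" entries that get masked out before the softmax:

```python
import torch

input_size, dim_k = 5, 3  # toy sizes for illustration

mask = (torch.triu(torch.ones(dim_k, input_size)) == 1).transpose(0, 1)
print(mask.shape)  # torch.Size([5, 3])
print(mask)
# tensor([[ True, False, False],
#         [ True,  True, False],
#         [ True,  True,  True],
#         [ True,  True,  True],
#         [ True,  True,  True]])
```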