Kernel "[GroupQueryAttention] /model/layers.0/attn/GroupQueryAttention" failed. Error: Input "key" is expected to have 3, 4, or 5 dimensions.

Describe the issue
The following error occurs when trying to run https://huggingface.co/HuggingFaceTB/SmolVLM-Instruct on WebGPU. Note that the CPU ...
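The failing check is a rank (number-of-dimensions) precondition on the "key" input. A minimal sketch of that precondition, mirroring the wording of the error message (the function name `check_gqa_key_rank` is hypothetical, not part of ONNX Runtime's API):

```python
def check_gqa_key_rank(key_shape):
    # Mirrors the kernel's precondition from the error message:
    # Input "key" is expected to have 3, 4, or 5 dimensions.
    rank = len(key_shape)
    if rank not in (3, 4, 5):
        raise ValueError(
            f'Input "key" is expected to have 3, 4, or 5 dimensions, got {rank}'
        )
    return rank

# A packed (batch, seq, kv_hidden) key is 3-D and passes;
# a 2-D key would trigger the reported error.
check_gqa_key_rank((2, 128, 4096))
```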
query-attention \
    --num-query-groups 8"
elif [ $MODEL_SIZE = 70B ]; then
  NUM_LAYERS=80
  HIDDEN_SIZE=8192
  NUM_ATTN_HEADS=64
  INTERMEDIATE_SIZE=28672
  gqa_options=" \
    --group-query-attention \
    --num-query-groups 8"
elif [ $MODEL_SIZE = 175B ]; then
  NUM_LAYERS=96
  HIDDEN_SIZE=12288...
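In the 70B branch above, `--num-query-groups 8` means the 64 attention heads share only 8 key/value heads: each KV head serves a group of 64 / 8 = 8 query heads. A small sketch of that mapping (the helper `gqa_head_mapping` is hypothetical, for illustration only):

```python
def gqa_head_mapping(num_attn_heads: int, num_query_groups: int):
    """Return, for each query head, the index of the KV head it uses
    under grouped-query attention."""
    assert num_attn_heads % num_query_groups == 0
    heads_per_group = num_attn_heads // num_query_groups
    # Query head i reads from KV head i // heads_per_group.
    return [i // heads_per_group for i in range(num_attn_heads)]

# 70B config above: 64 query heads, 8 query groups -> 8 heads per KV head.
mapping = gqa_head_mapping(64, 8)
```

With 8 groups instead of 64 full KV heads, the KV cache shrinks by a factor of 8 while the query side is unchanged, which is the usual motivation for GQA at these model sizes.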
mlp = nn.Linear(64, 10)

    def forward(self, x, key_padding_mask):
        x, _ = self.self_attention(query=x, key=x, value=x,
                                   need_weights=False,
                                   key_padding_mask=key_padding_mask)
        x = self.ln_1(x)
        x = self.mlp(x)
        return x

block = EncoderBlock()
params = list(block.parameters()...
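The fragment above is truncated at the top, so the `__init__` is missing. A self-contained sketch of what the full block plausibly looks like, assuming `self_attention` is `nn.MultiheadAttention` with `embed_dim=64`, `num_heads=4`, and `batch_first=True` (all assumptions not shown in the snippet):

```python
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    # embed_dim=64 matches nn.Linear(64, 10) above;
    # num_heads=4 and batch_first=True are assumptions.
    def __init__(self, embed_dim=64, num_heads=4):
        super().__init__()
        self.self_attention = nn.MultiheadAttention(
            embed_dim, num_heads, batch_first=True)
        self.ln_1 = nn.LayerNorm(embed_dim)
        self.mlp = nn.Linear(embed_dim, 10)

    def forward(self, x, key_padding_mask=None):
        # need_weights=False makes the second return value None.
        x, _ = self.self_attention(query=x, key=x, value=x,
                                   need_weights=False,
                                   key_padding_mask=key_padding_mask)
        x = self.ln_1(x)
        x = self.mlp(x)
        return x

block = EncoderBlock()
x = torch.randn(2, 5, 64)                       # (batch, seq, embed_dim)
mask = torch.zeros(2, 5, dtype=torch.bool)      # True = position is padding
out = block(x, key_padding_mask=mask)
```

Note the fragment applies `ln_1` after attention with no residual connection, which differs from the usual pre-norm/post-norm Transformer layout; the sketch preserves that behavior rather than "fixing" it.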