Kernel "[GroupQueryAttention] /model/layers.0/attn/GroupQueryAttention" failed. Error: Input "key" is expected to have 3, 4, or 5 dimensions.

Describe the issue
The following error occurs when trying to run https://huggingface.co/HuggingFaceTB/SmolVLM-Instruct on WebGPU. Note that the CPU ...
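The failing check is a rank (number-of-dimensions) precondition on the "key" input. A minimal sketch of that precondition, mirroring the wording of the error message (the function name `check_gqa_key_rank` is hypothetical, not part of ONNX Runtime's API):

```python
def check_gqa_key_rank(key_shape):
    # Mirrors the kernel's precondition from the error message:
    # Input "key" is expected to have 3, 4, or 5 dimensions.
    rank = len(key_shape)
    if rank not in (3, 4, 5):
        raise ValueError(
            f'Input "key" is expected to have 3, 4, or 5 dimensions, got {rank}'
        )
    return rank

# A packed (batch, seq, kv_hidden) key is 3-D and passes;
# a 2-D key would trigger the reported error.
check_gqa_key_rank((2, 128, 4096))
```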
query-attention \
    --num-query-groups 8"
elif [ $MODEL_SIZE = 70B ]; then
  NUM_LAYERS=80
  HIDDEN_SIZE=8192
  NUM_ATTN_HEADS=64
  INTERMEDIATE_SIZE=28672
  gqa_options=" \
    --group-query-attention \
    --num-query-groups 8"
elif [ $MODEL_SIZE = 175B ]; then
  NUM_LAYERS=96
  HIDDEN_SIZE=12288...
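In the 70B branch above, `--num-query-groups 8` means the 64 attention heads share only 8 key/value heads: each KV head serves a group of 64 / 8 = 8 query heads. A small sketch of that mapping (the helper `gqa_head_mapping` is hypothetical, for illustration only):

```python
def gqa_head_mapping(num_attn_heads: int, num_query_groups: int):
    """Return, for each query head, the index of the KV head it uses
    under grouped-query attention."""
    assert num_attn_heads % num_query_groups == 0
    heads_per_group = num_attn_heads // num_query_groups
    # Query head i reads from KV head i // heads_per_group.
    return [i // heads_per_group for i in range(num_attn_heads)]

# 70B config above: 64 query heads, 8 query groups -> 8 heads per KV head.
mapping = gqa_head_mapping(64, 8)
```

With 8 groups instead of 64 full KV heads, the KV cache shrinks by a factor of 8 while the query side is unchanged, which is the usual motivation for GQA at these model sizes.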
mlp = nn.Linear(64, 10)

    def forward(self, x, key_padding_mask):
        x, _ = self.self_attention(query=x, key=x, value=x,
                                   need_weights=False,
                                   key_padding_mask=key_padding_mask)
        x = self.ln_1(x)
        x = self.mlp(x)
        return x

block = EncoderBlock()
params = list(block.parameters()...
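The fragment above is truncated at the top, so the `__init__` is missing. A self-contained sketch of what the full block plausibly looks like, assuming `self_attention` is `nn.MultiheadAttention` with `embed_dim=64`, `num_heads=4`, and `batch_first=True` (all assumptions not shown in the snippet):

```python
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    # embed_dim=64 matches nn.Linear(64, 10) above;
    # num_heads=4 and batch_first=True are assumptions.
    def __init__(self, embed_dim=64, num_heads=4):
        super().__init__()
        self.self_attention = nn.MultiheadAttention(
            embed_dim, num_heads, batch_first=True)
        self.ln_1 = nn.LayerNorm(embed_dim)
        self.mlp = nn.Linear(embed_dim, 10)

    def forward(self, x, key_padding_mask=None):
        # need_weights=False makes the second return value None.
        x, _ = self.self_attention(query=x, key=x, value=x,
                                   need_weights=False,
                                   key_padding_mask=key_padding_mask)
        x = self.ln_1(x)
        x = self.mlp(x)
        return x

block = EncoderBlock()
x = torch.randn(2, 5, 64)                       # (batch, seq, embed_dim)
mask = torch.zeros(2, 5, dtype=torch.bool)      # True = position is padding
out = block(x, key_padding_mask=mask)
```

Note the fragment applies `ln_1` after attention with no residual connection, which differs from the usual pre-norm/post-norm Transformer layout; the sketch preserves that behavior rather than "fixing" it.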