        num_attention_heads=2,
        hidden_size=24,
        attention_probs_dropout_prob=0.1,
        hidden_dropout_prob=0.1,
        intermediate_size=24,
        num_level=10,
        embedding_dim=64,
        EPS=1e-7,
        centroid={}):
    self.output_attentions = output_attentions
    self.num_attention_heads = num_attention_heads
    self.hidden_size = hidden_size
    self.attention_pr...
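For orientation, here is a minimal sketch of a config-style `__init__` carrying these hyperparameters. The parameter names are taken from the snippet above; the class name `AttentionConfig`, the `output_attentions` default, and the attribute assignments past the truncation point are assumptions:

```python
class AttentionConfig:
    """Holds the hyperparameters that the attention layer reads later (sketch)."""
    def __init__(self,
                 output_attentions=False,   # assumed default; not shown in the snippet
                 num_attention_heads=2,
                 hidden_size=24,
                 attention_probs_dropout_prob=0.1,
                 hidden_dropout_prob=0.1,
                 intermediate_size=24,
                 num_level=10,
                 embedding_dim=64,
                 EPS=1e-7,
                 centroid=None):
        self.output_attentions = output_attentions
        self.num_attention_heads = num_attention_heads
        self.hidden_size = hidden_size
        self.attention_probs_dropout_prob = attention_probs_dropout_prob
        self.hidden_dropout_prob = hidden_dropout_prob
        self.intermediate_size = intermediate_size
        self.num_level = num_level
        self.embedding_dim = embedding_dim
        self.EPS = EPS
        self.centroid = centroid if centroid is not None else {}
```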
self.proj = nn.Linear(dim, dim * 3)
# Usually a dropout of 0.1, typically written as config.attention_probs_dropout_prob;
# hidden_dropout_prob is usually 0.1 as well.
self.att_drop = nn.Dropout(0.1)
# Leaving this out would hardly be held against you; it really belongs to MultiHeadAttention,
# so deferring it to MultiHeadAttention is also fine.
self.outpu...
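To see where the fused QKV projection and the attention dropout sit, here is a minimal self-attention sketch; `dim`, `proj`, and `att_drop` come from the snippet above, while the class name, the head handling, and the output-projection name `out_proj` are assumptions:

```python
import torch
from torch import nn

class SelfAttention(nn.Module):
    """Minimal sketch built around the fused QKV projection above."""
    def __init__(self, dim, num_heads=8, attn_drop=0.1):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.proj = nn.Linear(dim, dim * 3)      # one matmul produces Q, K and V
        self.att_drop = nn.Dropout(attn_drop)    # dropout on the attention probabilities
        self.out_proj = nn.Linear(dim, dim)      # output projection (the truncated self.outpu... above)

    def forward(self, x):                        # x: (batch, seq, dim)
        b, s, _ = x.shape
        qkv = self.proj(x).reshape(b, s, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)     # each: (batch, heads, seq, head_dim)
        scores = (q @ k.transpose(-2, -1)) / self.head_dim ** 0.5
        probs = self.att_drop(scores.softmax(dim=-1))
        out = (probs @ v).transpose(1, 2).reshape(b, s, -1)
        return self.out_proj(out)
```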
"Can't set attention_probs_dropout_prob with value 0.1 for LEDConfig" but this parameter is required in line 149 of modeling_led.py classLEDEncoderSelfAttention(nn.Module): ... self.dropout=config.attention_probs_dropout_prob It works fine if I remove it from the config. To reproduce...
# "attention_probs_dropout_prob": 0.1, # "classifier_dropout": null, # "gradient_checkpointing": false, # "hidden_act": "gelu", # "hidden_dropout_prob": 0.1, # "hidden_size": 768, # "initializer_range": 0.02, # "intermediate_size": 3072, ...
"attention_probs_dropout_prob": 0.1, "hidden_act": "gelu", "hidden_dropout_prob": 0.1, "hidden_size": 768, "initializer_range": 0.02, "intermediate_size": 3072, "max_position_embeddings": 512, "num_attention_heads": 12, "num_hidden_layers": 12, ...
self.dropout = nn.Dropout(config.attention_probs_dropout_prob)

transposes and reshapes: this function mainly reshapes the q, k, v tensors from [batch_size, seq_length, hidden_size] to [batch_size, num_attention_heads, seq_length, attention_head_size], so that Multi-Head Attention can be computed afterwards.
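A sketch of that reshape, modeled on the transpose_for_scores helper in the BERT implementation (writing it as a standalone function with explicit head arguments is an assumption here):

```python
import torch

def transpose_for_scores(x, num_attention_heads, attention_head_size):
    """(batch_size, seq_length, hidden_size) -> (batch_size, num_attention_heads, seq_length, attention_head_size)."""
    new_shape = x.size()[:-1] + (num_attention_heads, attention_head_size)
    x = x.view(*new_shape)          # split hidden_size into (heads, head_size)
    return x.permute(0, 2, 1, 3)    # move the head dimension ahead of seq_length

# usage: q = transpose_for_scores(query_layer, 12, 64)   # 12 * 64 == hidden_size 768
```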
import torch, math
from torch import nn

dropout_prob = 0.1

def forward(hidden_size,      # d
            input,            # (b, s, d)
            attention_mask):  # (b, s, s)
    query = nn.Linear(hidden_size, hidden_size)  # (d, d)
    key = nn.Linear(hidden_size, hidden_size)
    value = nn.Linear(hidden_size, hidden_size)
    ...
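The snippet is cut off after the Q/K/V projections; here is a sketch of how the forward pass typically continues, where the scaling, masking, and dropout steps are assumptions based on standard BERT-style single-head attention:

```python
import torch, math
from torch import nn

dropout_prob = 0.1

def forward(hidden_size, input, attention_mask):
    # projections, as in the snippet above
    query = nn.Linear(hidden_size, hidden_size)
    key = nn.Linear(hidden_size, hidden_size)
    value = nn.Linear(hidden_size, hidden_size)
    dropout = nn.Dropout(dropout_prob)

    q = query(input)                                 # (b, s, d)
    k = key(input)
    v = value(input)

    scores = torch.matmul(q, k.transpose(-1, -2))    # (b, s, s)
    scores = scores / math.sqrt(hidden_size)         # scale by sqrt(d)
    scores = scores + (1.0 - attention_mask) * -1e9  # mask out padded positions
    probs = dropout(torch.softmax(scores, dim=-1))   # attention_probs_dropout_prob
    return torch.matmul(probs, v)                    # (b, s, d)

# usage sketch
out = forward(8, torch.randn(2, 5, 8), torch.ones(2, 5, 5))
```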
Python version: 3.8.10
Huggingface_hub version: 0.16.2
Safetensors version: 0.3.1
PyTorch version (GPU?): 2.0.1+cu117 (True)
Tensorflow version (GPU?): not installed (NA)
Flax version (CPU?/GPU?/TPU?): not installed (NA)
Jax version: not installed
...