The thing is, this particular FFN in the transformer encoder has two linear layers, according to the implementation of TransformerEncoderLayer:

    # Implementation of Feedforward model
    self.linear1 = Linear(d_model, dim_feedforward, **factory_kwargs)
    self.dropout = Dropout(dropout)
    self.linear2 = Li...
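One way to check the two-layer claim is to build a layer and inspect its attributes. The sketch below assumes a standard PyTorch install; the sizes (d_model=16, dim_feedforward=64, nhead=4) are illustrative values chosen here, not taken from the post. It also assumes the default ReLU activation when re-applying the feed-forward part by hand.

```python
import torch
from torch import nn

# Illustrative sizes (assumptions, not from the original post)
d_model, dim_feedforward = 16, 64

layer = nn.TransformerEncoderLayer(
    d_model=d_model, nhead=4, dim_feedforward=dim_feedforward
)

# The feed-forward block exposes its two linear layers as attributes
shape1 = (layer.linear1.in_features, layer.linear1.out_features)
shape2 = (layer.linear2.in_features, layer.linear2.out_features)
print("linear1:", shape1)
print("linear2:", shape2)

# Re-apply only the feed-forward part by hand.
# Assumes the default activation (ReLU); eval() makes dropout a no-op.
layer.eval()
x = torch.randn(2, 5, d_model)
with torch.no_grad():
    ffn_out = layer.linear2(layer.dropout(torch.relu(layer.linear1(x))))

print("input shape:", tuple(x.shape), "FFN output shape:", tuple(ffn_out.shape))
```

The first layer expands each position from d_model to dim_feedforward features, and the second projects back, so the FFN output has the same shape as its input.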