Note that this cell is not optimized for performance. Please use `tf.contrib.cudnn_rnn.CudnnLSTM` for better performance on GPU, or `tf.contrib.rnn.LSTMBlockCell` and `tf.contrib.rnn.LSTMBlockFusedCell` for better pe
The cell also uses the ELU activation, and batch normalization is applied in many places (not drawn). The Multi-Head Attention Mechanism applies an ELU activation to the keys, the values, and the query, rather than using unactivated Linears. Here there is only one query rather than many ...
Therefore, multi-head attention with positional encoding is used on the most recent past values of the inner cell state to enable a better mid-term memory: at each new time step, the cell looks back at its own previous cell state values with an attention query. The LA...
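
As a rough illustration of the idea of a single attention query looking back over a window of past cell states, here is a minimal NumPy sketch. It is not the implementation of this cell. The function names, the `params` dictionary with `Wq`, `Wk`, `Wv`, the sinusoidal form of the positional encoding, and adding (rather than concatenating) that encoding are all assumptions made for the example. Batch normalization is left out for brevity.

```python
import numpy as np


def elu(x, alpha=1.0):
    # ELU: identity for positive inputs, alpha * (exp(x) - 1) otherwise.
    return np.where(x > 0, x, alpha * (np.exp(np.minimum(x, 0.0)) - 1.0))


def positional_encoding(window, dim):
    # Sinusoidal encoding (an assumption; the text only says
    # "positional encoding"). Shape: (window, dim).
    pos = np.arange(window)[:, None]
    i = np.arange(dim)[None, :]
    angles = pos / np.power(10000.0, (2 * (i // 2)) / dim)
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))


def single_query_attention(past_cells, query_input, params, n_heads):
    """Attend over past cell states with ONE query vector.

    past_cells:  (window, dim) most recent cell state values
    query_input: (dim,) vector the query is computed from
    params:      dict with hypothetical (dim, dim) matrices Wq, Wk, Wv
    Returns the attended context (dim,) and the weights (n_heads, window).
    """
    window, dim = past_cells.shape
    head_dim = dim // n_heads

    # Tag each past state with its position in the window.
    x = past_cells + positional_encoding(window, dim)

    # ELU-activated projections instead of plain linear layers.
    k = elu(x @ params["Wk"]).reshape(window, n_heads, head_dim)
    v = elu(x @ params["Wv"]).reshape(window, n_heads, head_dim)
    q = elu(query_input @ params["Wq"]).reshape(n_heads, head_dim)

    # Scaled dot-product scores of the single query against every key.
    scores = np.einsum("whd,hd->hw", k, q) / np.sqrt(head_dim)

    # Numerically stable softmax over the window, per head.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)

    # Weighted sum of values per head, then concatenate the heads.
    context = np.einsum("hw,whd->hd", weights, v).reshape(dim)
    return context, weights
```

Calling `single_query_attention` with a `(window, dim)` array of past cell states and a `(dim,)` query input returns one context vector per time step. Because there is only one query, the cost per step grows linearly with the window length.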