No loss is calculated during this stage. The 2nd RNN (the decoder) takes the final hidden state of the 1st RNN as input and generates the output sequence, usually in an auto-regressive manner. When the <STOP> token is predicted, the 2nd RNN stops predicting the output...
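A minimal sketch of this encode-then-decode loop may make the flow concrete. Everything in it is an assumption made for illustration: the vocabulary and hidden sizes, the token ids for START and STOP, the random untrained weights, the helper names (`rnn_step`, `encode`, `greedy_decode`), the greedy choice of the next token, and the `max_len` safety cap. It uses plain tanh RNN cells in pure Python rather than any particular library.

```python
import math
import random

random.seed(0)
VOCAB, HIDDEN = 6, 8   # hypothetical vocabulary and hidden sizes
START, STOP = 0, 1     # hypothetical ids for the start and <STOP> tokens

def rand_matrix(rows, cols):
    # Random, untrained weights: this only illustrates control flow.
    return [[random.gauss(0, 0.5) for _ in range(cols)] for _ in range(rows)]

E = rand_matrix(VOCAB, HIDDEN)       # token embeddings, shared by both RNNs here
W_enc = rand_matrix(HIDDEN, HIDDEN)  # recurrent weights of the 1st RNN (encoder)
W_dec = rand_matrix(HIDDEN, HIDDEN)  # recurrent weights of the 2nd RNN (decoder)
W_out = rand_matrix(HIDDEN, VOCAB)   # projection from hidden state to token scores

def vec_mat(v, M):
    # Row vector times matrix.
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]

def rnn_step(W, h, token):
    # One tanh RNN step: new_h = tanh(h @ W + embedding[token]).
    pre = vec_mat(h, W)
    return [math.tanh(p + e) for p, e in zip(pre, E[token])]

def encode(src_tokens):
    # Run the 1st RNN over the input; no loss is computed in this stage.
    h = [0.0] * HIDDEN
    for t in src_tokens:
        h = rnn_step(W_enc, h, t)
    return h  # final hidden state, handed to the decoder

def greedy_decode(h, max_len=20):
    # Auto-regressive generation: each predicted token is fed back as the next input.
    out, token = [], START
    for _ in range(max_len):
        h = rnn_step(W_dec, h, token)
        scores = vec_mat(h, W_out)
        token = max(range(VOCAB), key=scores.__getitem__)  # greedy pick
        if token == STOP:
            break  # the decoder stops once <STOP> is predicted
        out.append(token)
    return out

decoded = greedy_decode(encode([2, 3, 4]))
```

The `max_len` cap is a common safeguard, since an untrained or poorly trained decoder may never predict the STOP token. Greedy selection is only one option here; sampling or beam search would slot into the same loop.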