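In RWKV-2, the contribution of F[i] to F[t+1] is weighted by $\sigma(\mathbf{R}x[t]) \cdot \exp(\mathbf{W} \cdot (t-i)) \cdot \exp(\mathbf{K}F[i])$. Here $\sigma$ is a non-linearity, and we can use sigmoid. Note that $\sigma(\mathbf{R}x[t])$ is not in the denominator, and I call R the "receptance". The term $\exp(\mathbf{W} \cdot (t-i))$ is the time-decay factor. I proposed the same idea (scaling the attention by distance) in Aug ...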
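
As a rough illustration of how the three factors combine, here is a minimal single-channel sketch. It assumes the weight has the form sigmoid(R·x[t]) · exp(W·(t-i)) · exp(K·F[i]), with every projection reduced to a plain scalar. The function names and arguments (`rwkv2_weights`, `rwkv2_output`, `r_x`, `k_f`, `v_f`, `w`) are hypothetical and are not taken from the RWKV code base.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def rwkv2_weights(r_x, k_f, w):
    """Unnormalized weight of each F[i] (i = 0..t) when predicting F[t+1].

    r_x : scalar standing in for R·x[t] (assumption: one channel only)
    k_f : list of scalars standing in for K·F[i], i = 0..t
    w   : scalar decay rate; a negative value makes older positions count less
    """
    t = len(k_f) - 1
    gate = sigmoid(r_x)  # the "receptance" gate
    return [gate * math.exp(w * (t - i)) * math.exp(k) for i, k in enumerate(k_f)]


def rwkv2_output(r_x, k_f, v_f, w):
    """Gated, decay-weighted average of the values v_f (stand-ins for V·F[i]).

    The gate multiplies the normalized average from outside, so it never
    appears in the denominator.
    """
    t = len(k_f) - 1
    decay_key = [math.exp(w * (t - i) + k) for i, k in enumerate(k_f)]
    numerator = sum(d * v for d, v in zip(decay_key, v_f))
    denominator = sum(decay_key)
    return sigmoid(r_x) * numerator / denominator


if __name__ == "__main__":
    # With equal keys, only the distance (t - i) separates the weights.
    print(rwkv2_weights(r_x=0.0, k_f=[0.0, 0.0, 0.0], w=-1.0))
    print(rwkv2_output(r_x=0.0, k_f=[0.0, 0.0], v_f=[2.0, 2.0], w=-1.0))
```

Because the gate sits outside the ratio, it scales the output without changing how the positions are weighted relative to one another. That is the practical meaning of the receptance not being in the denominator.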