This function includes a discount factor, typically denoted by $\gamma$, which in this case is set to one because all rewards are considered equally important at every time instant:

$$Q(s(t), a(t)) = \sum_{i=0}^{T} r_{t+i}\,\gamma^{i} \tag{9}$$

The immediate reward function for this application ...