Sensors 2017, 17, 844 9 of 15

strategy is adopted for the action selection, the weight values will gradually decrease as learning proceeds, and the selected actions will rarely have a greater chance of being chosen. For each state-action pair, the eligibility trace e(s, u) and the...