max_index]

    # Q learning formula
    Q[current_state, action] = R[current_state, action] + gamma * max_value

# Update Q matrix
update(initial_state, action, gamma)

#---
# Training

# Train over 10 000 iterations. (Re-iterate the process above).
for i in range(10000):
    ...
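To see the update rule and the training loop working end to end, here is a minimal, self-contained sketch. It is not the listing above: the four-state reward matrix `R`, the helpers `available_actions`, `train` and `greedy_path`, and the value `gamma = 0.8` are all assumptions made up for illustration, and plain nested lists stand in for the matrices so that it runs with the standard library only. It also takes the row maximum directly instead of first picking a `max_index` among ties, which yields the same `max_value`.

```python
import random

# Hypothetical 4-state chain 0 - 1 - 2 - 3, where state 3 is the goal.
# -1 marks "no edge", 0 an ordinary move, 100 a move that lands on the goal.
R = [
    [-1, 0, -1, -1],
    [0, -1, 0, -1],
    [-1, 0, -1, 100],
    [-1, -1, 0, 100],
]
GAMMA = 0.8  # assumed discount factor


def available_actions(state):
    """Return the actions (next states) reachable from `state`."""
    return [a for a, reward in enumerate(R[state]) if reward >= 0]


def update(Q, current_state, action, gamma):
    """Apply Q[s, a] = R[s, a] + gamma * max(Q[next_state, :])."""
    # In this setup the action index is also the next state.
    max_value = max(Q[action])
    Q[current_state][action] = R[current_state][action] + gamma * max_value


def train(iterations=10000, gamma=GAMMA, seed=0):
    """Repeat the update from random states, choosing random valid actions."""
    rng = random.Random(seed)
    n = len(R)
    Q = [[0.0] * n for _ in range(n)]
    for _ in range(iterations):
        current_state = rng.randrange(n)
        action = rng.choice(available_actions(current_state))
        update(Q, current_state, action, gamma)
    return Q


def greedy_path(Q, start, goal):
    """Follow the highest Q value from `start` until `goal` is reached."""
    path = [start]
    state = start
    while state != goal and len(path) <= len(Q):
        state = max(range(len(Q)), key=lambda a: Q[state][a])
        path.append(state)
    return path


Q = train()
print(greedy_path(Q, start=0, goal=3))
```

On this toy graph the learned values should steer the agent from state 0 through states 1 and 2 to the goal. The fixed seed only makes runs repeatable; it is not part of the algorithm.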