Principle of reinforcement learning For the simplest case that preserves the essence of solving the MAB, we consider a player who selects one of two slot machines, called slot machines 1 and 2 hereafter, with the goal of maximizing reward (known as the two-armed bandit problem). Denoting ...