Value Iteration (VI) is, together with Policy Iteration, a popular approximate dynamic programming [1–7] and reinforcement learning [8–13] algorithm. The VI Reinforcement Learning (VIRL) algorithm comes in many implementation flavors: online or offline, off-policy or on-policy, batch-wise or ...