The associative search task involves both trial and error (searching for the best actions) and association (tying those actions to the situations in which they are best); it is also known as contextual bandits. Such problems are like the full RL problem in that they involve learning a policy, and like the bandit problem in that each action affects only the immediate reward.

2.10 Summary
This chapter covered several simple methods for balancing exploration and exploitation: epsilon-greedy, UCB, and gradient bandit algorithms.
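As a concrete illustration of the simplest of these methods, here is a minimal sketch of epsilon-greedy action selection with sample-average value estimates on a k-armed bandit. The arm means, noise level, step count, and function name are assumptions chosen for illustration, not anything fixed by the text:

```python
import random

def epsilon_greedy_bandit(true_means, epsilon=0.1, steps=1000, seed=0):
    """Run epsilon-greedy on a k-armed Gaussian bandit (illustrative sketch).

    true_means: assumed true expected reward of each arm.
    Returns the estimated action values and the total reward collected.
    """
    rng = random.Random(seed)
    k = len(true_means)
    q = [0.0] * k          # sample-average action-value estimates
    n = [0] * k            # number of times each arm was pulled
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(k)                   # explore: random arm
        else:
            a = max(range(k), key=lambda i: q[i])  # exploit: greedy arm
        reward = rng.gauss(true_means[a], 1.0)     # noisy immediate reward
        n[a] += 1
        q[a] += (reward - q[a]) / n[a]             # incremental mean update
        total_reward += reward
    return q, total_reward

q, total = epsilon_greedy_bandit([0.2, 0.5, 0.8])
```

With a small epsilon the agent mostly exploits its current estimates but keeps sampling every arm occasionally, so the estimates in `q` continue to improve; UCB and gradient bandit methods replace the random exploration step with more directed exploration.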