Restless multi-armed bandits (RMABs) are an effective model for this problem: they allocate limited resources among many agents, here patients who behave differently depending on whether or not they receive an intervention. However, RMABs assume the reward ...
We pose the problem as a restless multi-armed bandit (RMAB) problem and propose a Whittle-index-based policy, which is known to be asymptotically optimal. We explicitly characterize the Whittle indices. We numerically evaluate the proposed policy and compare it to a greedy policy. We show ...
In the multi-armed bandit (MAB) problem there are k distributions associated with the rewards of playing each of k strategies (slot machine arms). The reward distributions are initially unknown to the player. The player iteratively plays one strategy per round, observes the associated...
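The setup in the excerpt above (k unknown reward distributions, one arm played per round) can be illustrated with a small simulation. This is only a sketch under stated assumptions: the rewards are taken to be Bernoulli and the player uses an epsilon-greedy rule, neither of which is specified in the excerpt, and the name `run_bandit` is hypothetical.

```python
import random


def run_bandit(means, rounds, epsilon=0.1, seed=0):
    """Play a k-armed Bernoulli bandit with an epsilon-greedy player.

    `means` holds each arm's success probability (assumed Bernoulli rewards);
    the player never reads it directly and only sees sampled rewards.
    Returns (total_reward, pull_counts, estimated_means).
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k        # number of times each arm was played
    estimates = [0.0] * k   # running average reward of each arm
    total = 0
    for _ in range(rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(k)                            # explore
        else:
            arm = max(range(k), key=lambda a: estimates[a])   # exploit
        reward = 1 if rng.random() < means[arm] else 0
        counts[arm] += 1
        # Incremental update of the sample mean for the played arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward
    return total, counts, estimates


if __name__ == "__main__":
    total, counts, estimates = run_bandit([0.2, 0.5, 0.8], rounds=1000)
    print(total, counts, estimates)
```

Epsilon-greedy is used here only because it is short; it stands in for whatever strategy the truncated excerpt goes on to discuss.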
We obtain the conditions for the emergence of the swarm intelligence effect in an interactive game of restless multi-armed bandit (rMAB). A player competes with multiple agents. Each bandit has a payoff that changes with a probability $p_{c}$ per round. The agents and player choose one ...
We describe and analyze a restless multi-armed bandit (RMAB) in which, in each time-slot, the instantaneous reward from playing an arm depends on the time since the arm was last played. This model is motivated by recommendation systems where the payoff from a recommendation on ...
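A toy version of the model in the excerpt above, where an arm's reward depends on the time since it was last played, might look as follows. The reward shape `1 - exp(-rate * tau)` and the myopic (greedy) selection rule are assumptions made purely for illustration; the excerpt specifies neither, and `simulate_myopic` is a hypothetical name.

```python
import math


def simulate_myopic(rates, rounds):
    """Myopic play when each arm's reward depends on its idle time.

    Assumed reward for arm i after tau slots since its last play:
    1 - exp(-rates[i] * tau). This shape is illustrative only.
    Returns (total_reward, list_of_arms_played).
    """
    k = len(rates)
    elapsed = [1] * k   # slots since each arm was last played
    total = 0.0
    history = []
    for _ in range(rounds):
        rewards = [1.0 - math.exp(-rates[i] * elapsed[i]) for i in range(k)]
        # Pick the arm with the largest immediate reward (ties: lowest index).
        arm = max(range(k), key=lambda i: rewards[i])
        total += rewards[arm]
        history.append(arm)
        # The played arm resets; every other arm keeps aging, which is
        # what makes the bandit "restless".
        elapsed = [1 if i == arm else elapsed[i] + 1 for i in range(k)]
    return total, history


if __name__ == "__main__":
    print(simulate_myopic([0.5, 0.5], rounds=4))
```

With two identical arms the myopic rule simply alternates between them, since the arm that has rested longer always offers the larger immediate reward.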
Multi-armed bandit tasks have proven a useful paradigm to study the exploration-exploitation trade-off, theoretically (e.g., Gittins, 1979; Whittle, 1988) as well as empirically (e.g., Acuna & Schrater, 2008; Daw et al., 2006; Knox et al., 2012; Steyvers et al., 2009). For ...
In contrast to most of the existing literature, we consider a finite-horizon problem with multiple actions and a time-dependent (i.e., nonstationary) upper bound on the number of bandits that can be activated at each time period; indeed, our analysis can also be applied in the sett...
We address the intractable multi-armed bandit problem with switching costs, for which an index that partially characterizes optimal policies was introduced... (Jose Nino-Mora, INFORMS Journal on Computing, published 2008, cited by 35)
Behaviors Coordination Using Restless Bandits Allocation Indexes Restle...
Zayas-Caban, G., Jasin, S., Wang, G. 2017. An asymptotically optimal heuristic for general non-stationary finite-horizon restless multi-armed multi-action bandits. Working paper, Ross School of Business, University of Michigan, Ann Arbor, MI. ...
We refer to such models as Reward-Observing Restless Multi-Armed Bandit (RORMAB) problems. These types of optimal control problems were previously considered in the literature in the context of: (i) Gilbert-Elliott (GE) channels (where each channel is modelled as a two-state Markov chain), ...
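For the two-state Markov chain mentioned in the excerpt above, the probability that a channel is in its good state can be tracked with a one-line recursion. The sketch below assumes the usual belief-state view of a Gilbert-Elliott channel; the function names and the parameters `p01` (bad to good) and `p11` (good to good) are hypothetical labels, not notation from the source.

```python
def propagate_belief(belief, p01, p11):
    """One-step prediction for an unobserved two-state Markov chain.

    `belief` is the current probability of the good state, `p11` is
    P(good -> good) and `p01` is P(bad -> good).
    """
    return belief * p11 + (1.0 - belief) * p01


def belief_after_observation(observed_good, p01, p11):
    """Belief for the next slot right after the state was observed exactly."""
    return p11 if observed_good else p01


def stationary_belief(p01, p11):
    """Fixed point of propagate_belief.

    Requires 1 - p11 + p01 != 0, i.e. not (p11 == 1 and p01 == 0).
    """
    return p01 / (1.0 - p11 + p01)


if __name__ == "__main__":
    b = belief_after_observation(True, p01=0.2, p11=0.8)
    for _ in range(5):
        b = propagate_belief(b, p01=0.2, p11=0.8)
        print(round(b, 4))
```

Left unobserved, the belief drifts geometrically toward the stationary value, which is why the time since an arm was last observed carries all the relevant information in models of this kind.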