Our strategy uses a combinatorial multi-armed bandit (CMAB) framework with an upper confidence bound (UCB) approach to guide decision-making. We evaluate its efficacy through regret analysis and through simulations grounded in realistic scenarios....
In this paper, we study the stochastic combinatorial multi-armed bandit (CMAB) framework that allows a general nonlinear reward function whose expected value may depend not only on the means of the input random variables but on their entire distributions. Our framew...
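To see why means alone can be insufficient, the sketch below uses a hypothetical nonlinear reward: the probability that the summed outcomes of the selected arms reach a threshold, for independent discrete arms. Both arms have mean 1.0, yet the reward of playing two copies differs, so an estimator that tracks only means cannot rank them. The distributions, the threshold reward, and all function names are illustrative assumptions, not taken from the paper.

```python
from itertools import product


def sum_distribution(dists):
    """Distribution of the sum of independent discrete variables.

    Each distribution is a dict {value: probability}."""
    total = {0: 1.0}
    for d in dists:
        nxt = {}
        # Convolve the running sum with the next variable.
        for (s, ps), (v, pv) in product(total.items(), d.items()):
            nxt[s + v] = nxt.get(s + v, 0.0) + ps * pv
        total = nxt
    return total


def mean(d):
    return sum(v * p for v, p in d.items())


def prob_at_least(dists, threshold):
    """Hypothetical nonlinear reward: P(X_1 + ... + X_k >= threshold)."""
    return sum((p for s, p in sum_distribution(dists).items() if s >= threshold), 0.0)


# Two hypothetical arms with the same mean but different spread.
safe = {1: 1.0}
risky = {0: 0.5, 2: 0.5}

print(mean(safe), mean(risky))          # 1.0 1.0
print(prob_at_least([safe, safe], 3))   # 0.0
print(prob_at_least([risky, risky], 3)) # 0.25
```

Here the higher-variance arm wins under the threshold reward even though both arms look identical to a mean-based learner.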
We define a general framework for a large class of combinatorial multi-armed bandit (CMAB) problems, where simple arms with unknown distributions form super arms. In each round, a super arm is played and the outcomes of its related simple arms are observed, which helps th...
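The round structure described above (play a super arm, observe each of its simple arms) can be sketched with a CUCB-style rule. This is an assumption, since the snippet is truncated before naming an algorithm. The sketch further assumes Bernoulli simple arms, super arms that are any k simple arms, and a linear reward (sum of outcomes), so the exact offline oracle is simply "take the k largest indices". The function name `cucb`, the arm means, and the confidence radius sqrt(3 ln t / (2 T_i)) are illustrative choices.

```python
import math
import random


def cucb(true_means, k, horizon, seed=0):
    """CUCB-style sketch: Bernoulli simple arms, super arm = k simple arms,
    linear reward, semi-bandit feedback (every played simple arm is observed)."""
    rng = random.Random(seed)
    m = len(true_means)
    counts = [0] * m  # T_i: number of observations of simple arm i
    est = [0.0] * m   # empirical mean of simple arm i
    for t in range(1, horizon + 1):
        # Optimistic index; unobserved arms get +inf so each is tried first.
        ucb = [
            est[i] + math.sqrt(3 * math.log(t) / (2 * counts[i])) if counts[i] else math.inf
            for i in range(m)
        ]
        # Oracle for a linear reward: the k simple arms with the largest indices.
        super_arm = sorted(range(m), key=lambda i: ucb[i], reverse=True)[:k]
        # Semi-bandit feedback: update every simple arm in the played super arm.
        for i in super_arm:
            x = 1.0 if rng.random() < true_means[i] else 0.0
            counts[i] += 1
            est[i] += (x - est[i]) / counts[i]
    return counts, est


counts, est = cucb([0.9, 0.8, 0.3, 0.2, 0.1], k=2, horizon=5000)
print(counts)  # most observations go to arms 0 and 1
```

For nonlinear rewards the `sorted(...)[:k]` line would be replaced by a problem-specific (possibly approximate) oracle; the rest of the loop is unchanged.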
Combinatorial Stochastic-Greedy Bandit. We propose a novel combinatorial stochastic-greedy bandit (SGB) algorithm for combinatorial multi-armed bandit problems in which nothing beyond the joint reward of the selected set of n arms is observed at each time step t∈[T]. SGB adopts an ...
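The snippet is cut off before describing SGB's procedure, so the sketch below shows only the offline stochastic-greedy selection rule that the name presumably refers to: at each of k steps, draw a random sample of ceil((n/k) ln(1/eps)) remaining elements and add the one with the largest marginal gain of a monotone submodular function. In the bandit setting the function values would have to be estimated from observed joint rewards; that estimation step is omitted. The coverage function and all names are hypothetical.

```python
import math
import random


def stochastic_greedy(ground, f, k, eps=0.1, seed=0):
    """Stochastic-greedy maximization of a monotone submodular set function f."""
    rng = random.Random(seed)
    n = len(ground)
    sample_size = min(n, math.ceil((n / k) * math.log(1 / eps)))
    chosen, remaining = [], list(ground)
    for _ in range(k):
        # Evaluate marginal gains only on a random subset of the remaining elements.
        candidates = rng.sample(remaining, min(sample_size, len(remaining)))
        best = max(candidates, key=lambda e: f(chosen + [e]) - f(chosen))
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Hypothetical coverage instance: each element covers a set of items.
covers = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}, "d": {1, 7}, "e": {8}}


def coverage(selection):
    return len(set().union(*(covers[e] for e in selection)))


picked = stochastic_greedy(list(covers), coverage, k=2, eps=0.1)
print(picked, coverage(picked))  # ['c', 'a'] 7
```

On this tiny instance the sample size (6, capped at 5) covers every element, so the rule reduces to ordinary greedy; the savings appear when n is large relative to k.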
Best Arm Identification in Multi-armed Bandits with Delayed Feedback. PMLR, 2018. Grover, Aditya and Markov, Todor and Attia, Peter and Jin, Norman and Perkins, Nicolas and Cheong, Bryan and Chen, Michael and Yang, Zi and Harris, Stephen and Chueh, William and others.
Ranked Reward: ...
Keywords: Combinatorial Optimization; Multi-Armed Bandit; Mixed-Integer Programming. We study dynamic decision-making under uncertainty when, at each period, the decision maker faces a different instance of a combinatorial optimization problem. doi:10.2139/ssrn.3041893. Sajad Modaresi, Denis Saure...
Karnin, Z., Koren, T., Somekh, O.: Almost optimal exploration in multi-armed bandits. In: Proceedings of the 30th International Conference on Machine Learning, pp. 1238–1246 (2013)
Mannor, S., Tsitsiklis, J.N.: The sample complexity of exploration in the multi-armed bandit problem. J...
Best arm identification in multi-armed bandits with delayed feedback (Huiling); Solving a New 3D Bin Packing Problem with Deep Reinforcement Learning Method (Xijun); A Multi-task Selected Learning Approach for Solving 3D Flexible Bin Packing Problem (Xijun); Pointer Networks (Huiling, Xijun) ...
We investigate the combinatorial multi-armed bandit problem where an action is to select k arms from a set of base arms, and its reward is the maximum of the sample values of these k arms, under a weak feedback structure that only returns the value and...
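With a max-value reward, the best action need not be the k arms with the highest means. The sketch below assumes independent arms with known discrete distributions (in the bandit problem they are unknown and must be learned under the weak feedback described above), computes E[max] from the product of the arms' CDFs, and finds the best k-subset by brute force. The arm distributions and function names are hypothetical.

```python
from itertools import combinations


def expected_max(dists):
    """E[max] of independent discrete variables, each a dict {value: prob}.

    Uses P(max <= v) = prod_i F_i(v) and E[max] = sum_v v * P(max = v)."""
    support = sorted({v for d in dists for v in d})

    def cdf(d, v):
        return sum(p for x, p in d.items() if x <= v)

    result, prev = 0.0, 0.0
    for v in support:
        cur = 1.0
        for d in dists:
            cur *= cdf(d, v)
        result += v * (cur - prev)  # cur - prev = P(max == v)
        prev = cur
    return result


def best_k_subset(arms, k):
    """Brute-force offline oracle: the k-subset with the largest E[max]."""
    return max(combinations(range(len(arms)), k),
               key=lambda s: expected_max([arms[i] for i in s]))


# Arms 0 and 1 have the highest mean (0.75); arm 2 has mean 0.5 but high variance.
arms = [{0.75: 1.0}, {0.75: 1.0}, {0.0: 0.5, 1.0: 0.5}]
print(expected_max([arms[0], arms[1]]))  # 0.75
print(expected_max([arms[0], arms[2]]))  # 0.875
print(best_k_subset(arms, 2))            # (0, 2)
```

Brute force is exponential in k and only meant to illustrate the objective; it is not how a bandit algorithm for this setting would select arms.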
Vermorel, Joannès, et al., “Multi-Armed Bandit Algorithms and Empirical Evaluation”, Proceedings of the 16th European Conference on Machine Learning, Oct. 2005, vol. 3720, pp. 437-448.
Aono, et al., “Amoeba-inspired Tug-of-War algorithms for exploration–exploitation dilemma in extended Bandi...