We study the adversarial multi-armed bandit problem, in which a player must iteratively make online decisions with linear loss vectors and hopes to achieve a small total loss. We consider a natural measure on the loss vectors, called deviation, which is the sum of the distances between every ...
In the multi-armed bandit problem, a gambler must decide which arm of K non-identical slot machines to play in a sequence of trials so as to maximize his reward. This classical problem has received much attention because of the simple model it provides of the trade-off between exploration (trying...
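The adversarial-bandit setting described in the snippets above is typically attacked with the EXP3 algorithm (exponential weights with importance-weighted loss estimates). The sketch below is a minimal illustration, not any of the cited papers' implementations; the `loss_fn` callback and the fixed exploration rate `gamma` are assumptions made here for demonstration.

```python
import math
import random

def exp3(n_arms, T, loss_fn, gamma=0.1):
    """EXP3 for adversarial bandits: maintain exponential weights,
    mix with uniform exploration, and update the pulled arm with an
    unbiased importance-weighted loss estimate."""
    weights = [1.0] * n_arms
    total_loss = 0.0
    for t in range(T):
        total_w = sum(weights)
        # Sampling distribution: exponential weights mixed with uniform exploration.
        probs = [(1 - gamma) * w / total_w + gamma / n_arms for w in weights]
        arm = random.choices(range(n_arms), weights=probs)[0]
        loss = loss_fn(t, arm)            # observed loss in [0, 1] for the pulled arm only
        total_loss += loss
        est = loss / probs[arm]           # importance-weighted (unbiased) loss estimate
        weights[arm] *= math.exp(-gamma * est / n_arms)
    return total_loss
```

Against a fixed loss sequence where one arm always incurs loss 0, the cumulative loss stays far below T because the weights concentrate on the good arm while the `gamma` mixing keeps every arm explored.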
We study "adversarial scaling", a multi-armed bandit model where rewards have a stochastic and an adversarial component. Our model captures display advertising, where the "click-through rate" can be decomposed into a (fixed across time) arm-quality component and a non-stochastic user-relevance ...
and Section 3 introduces the network model. Section 4 describes the underlying system model for task scheduling based on online learning. In Section 6, we present the proposed method for resource-aware task scheduling. Section 7 presents the experimental setup and discusses the simulation results for a target...
The second is to use a reinforcement learning (RL) approach as a basis for DDA, built on a multi-armed bandit (MAB) algorithm operating over a latent-space representation. A data dimensionality reduction model, the adversarial autoencoder (AAE), is used to represent each matrix as a ...
Bandit Based Monte-Carlo Planning. In European Conference on Machine Learning; Springer: Berlin/Heidelberg, Germany, 2006; pp. 282–293. ISBN 978-3-540-45375-8. Auer, P.; Cesa-Bianchi, N.; Fischer, P. Finite-time analysis of the multiarmed bandit problem. Mach. Learn....
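The Auer, Cesa-Bianchi and Fischer reference above gives the finite-time analysis of the UCB1 index policy for stochastic bandits. A minimal sketch of that rule follows; the `reward_fn` callback is a placeholder introduced here, not part of the original paper.

```python
import math

def ucb1(n_arms, T, reward_fn):
    """UCB1: play each arm once, then repeatedly pull the arm with the
    highest empirical mean plus confidence bonus sqrt(2 ln t / n_i)."""
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for t in range(T):
        if t < n_arms:
            arm = t                        # initialise: play each arm once
        else:
            arm = max(range(n_arms),
                      key=lambda i: means[i] + math.sqrt(2 * math.log(t) / counts[i]))
        r = reward_fn(arm)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]   # incremental mean update
    return counts
```

On Bernoulli arms with a large gap, the pull counts concentrate on the best arm while suboptimal arms receive only O(log T) pulls, matching the paper's finite-time bound.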
Irched Chafaa; E. Veronica Belmega; Mérouane Debbah. Adversarial Multi-armed Bandit for mmWave Beam Alignment with One-Bit Feedback. ACM Performance Evaluation Methodologies and Tools. doi:10.1145/3306309.3306315
To answer those questions, we model the spectrum usage monitoring problem as an adversarial multi-armed bandit problem with switching costs and design two effective online algorithms, SpecWatch and SpecWatch+. In SpecWatch, we select strategies based on the monit...
To answer those questions, we model the spectrum monitoring problem as an adversarial multi-armed bandit problem with switching costs (MAB-SC), propose an effective framework, and design two online algorithms, SpecWatch-II and SpecWatch-III, based on the same framework. To evaluate the ...
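The SpecWatch snippets above address adversarial bandits with switching costs, where changing the monitored strategy is itself expensive. A standard way to control switches is to run an EXP3-style update over blocks of rounds, holding the arm fixed within each block so the number of switches is O(T/block). The sketch below illustrates that blocking idea only; it is not the SpecWatch algorithm, and the `loss_fn`, `block`, and `gamma` parameters are assumptions made here.

```python
import math
import random

def exp3_blocked(n_arms, T, loss_fn, block=20, gamma=0.1):
    """Adversarial bandit with switching costs: draw one arm per block
    of `block` rounds, so at most T/block - 1 switches can occur."""
    weights = [1.0] * n_arms
    switches, prev_arm, total_loss = 0, None, 0.0
    for start in range(0, T, block):
        total_w = sum(weights)
        probs = [(1 - gamma) * w / total_w + gamma / n_arms for w in weights]
        arm = random.choices(range(n_arms), weights=probs)[0]
        if prev_arm is not None and arm != prev_arm:
            switches += 1                  # a switch is charged once per block boundary
        prev_arm = arm
        block_loss = sum(loss_fn(t, arm) for t in range(start, min(start + block, T)))
        total_loss += block_loss
        est = (block_loss / block) / probs[arm]   # importance-weighted mean block loss
        weights[arm] *= math.exp(-gamma * est / n_arms)
    return total_loss, switches
```

Larger blocks trade a few extra rounds of regret for proportionally fewer switches, which is the core tension the MAB-SC formulation captures.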