We present RobinHood, an offline contextual bandit algorithm designed to satisfy a broad family of fairness constraints. Our algorithm accepts multiple fairness definitions and allows users to construct their own unique fairness definitions for the problem at hand. We provide a theoretical analysis of ...
Methods, systems, and non-transitory computer readable storage media are disclosed for utilizing offline models to warm start online bandit learner models. For example, the disclose
Offline Reinforcement Learning as One Big Sequence Modeling Problem. Michael Janner, Qiyang Li, and Sergey Levine. NeurIPS, 2021.
Bridging Offline Reinforcement Learning and Imitation Learning: A Tale of Pessimism. Paria Rashidinejad, Banghua Zhu, Cong Ma, Jiantao Jiao, and Stuart Russell...
Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms - Li, Chu, et al. - 2011. Citation Context: ...i et al. (2010) introduce the contextual bandit problem, which is strictly more complex and more realistic than multi-armed bandits but less complex ...