Published 2016
| Submitted + Supplemental Material + Published
Journal Article
Open
Batched bandit problems
Chicago
Abstract
Motivated by practical applications, chiefly clinical trials, we study the regret achievable for stochastic bandits under the constraint that the employed policy must split trials into a small number of batches. We propose a simple policy, and show that a very small number of batches gives close to minimax optimal regret bounds. As a byproduct, we derive optimal policies with low switching cost for stochastic bandits.
Additional Information
© 2016 Institute of Mathematical Statistics. Received May 2015; revised August 2015. Supported by ANR Grant ANR-13-JS01-0004. Supported by NSF Grants DMS-13-17308 and CAREER. Supported by NSF Grant SES-1156154.Attached Files
Published - euclid.aos.1458245731.pdf
Submitted - 1505.00369v3.pdf
Supplemental Material - euclid.aos.1458245731_si.pdf
Files
euclid.aos.1458245731.pdf
Additional details
- Eprint ID
- 66261
- Resolver ID
- CaltechAUTHORS:20160418-155357882
- Agence Nationale pour la Recherche (ANR)
- ANR-13-JS01-0004
- NSF
- DMS-13-17308
- NSF
- SES-1156154
- Created
-
2016-04-19Created from EPrint's datestamp field
- Updated
-
2021-11-10Created from EPrint's last_modified field