Batched bandit problems

Creators: Perchet, Vianney; Rigollet, Philippe; Chassang, Sylvain; Snowberg, Erik

Abstract

Motivated by practical applications, chiefly clinical trials, we study the regret achievable for stochastic bandits under the constraint that the employed policy must split trials into a small number of batches. We propose a simple policy, and show that a very small number of batches gives close to minimax optimal regret bounds. As a byproduct, we derive optimal policies with low switching cost for stochastic bandits.

Additional Information

© 2016 Institute of Mathematical Statistics. Received May 2015; revised August 2015. Supported by ANR Grant ANR-13-JS01-0004. Supported by NSF Grants DMS-13-17308 and CAREER. Supported by NSF Grant SES-1156154.

Attached Files

Published - euclid.aos.1458245731.pdf

Submitted - 1505.00369v3.pdf

Supplemental Material - euclid.aos.1458245731_si.pdf

Files (745.9 kB)

Name	MD5	Size
euclid.aos.1458245731.pdf	506aeea864febe923b897c42440de75c	248.0 kB
1505.00369v3.pdf	5881681e260edfe8b381c7913203517b	371.9 kB
euclid.aos.1458245731_si.pdf	8d9a985fe2033a879acce427202018f6	126.0 kB
