Efficient Exploration Through Bayesian Deep Q-Networks

Creators: Azizzadenesheli, Kamyar; Brunskill, Emma; Anandkumar, Animashree

Abstract

We propose Bayesian Deep Q-Network (BDQN), a practical Thompson-sampling-based reinforcement learning (RL) algorithm. Thompson sampling allows for targeted exploration in high dimensions through posterior sampling, but it is usually computationally expensive. We address this limitation by introducing uncertainty only at the output layer of the network through a Bayesian linear regression (BLR) model, which can be trained with fast closed-form updates and whose samples can be drawn efficiently from a Gaussian distribution. We apply our method to a wide range of Atari games in the Arcade Learning Environment. Since BDQN carries out more efficient exploration, it reaches higher rewards substantially faster than a key baseline, the double deep Q-network (DDQN).
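
The idea in the abstract, a closed-form Gaussian posterior over output-layer weights followed by Thompson sampling, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `blr_posterior` and `thompson_action`, the values of `prior_var` and `noise_var`, and the synthetic features standing in for a network's last hidden layer are all assumptions made for illustration.

```python
import numpy as np

def blr_posterior(features, targets, prior_var=1.0, noise_var=1.0):
    """Closed-form Gaussian posterior over linear weights for one action.

    features: (n, d) array, standing in for last-layer network features.
    targets:  (n,) array of regression targets (e.g. bootstrapped Q targets).
    Assumes a zero-mean isotropic Gaussian prior and Gaussian noise.
    """
    d = features.shape[1]
    precision = features.T @ features / noise_var + np.eye(d) / prior_var
    cov = np.linalg.inv(precision)
    mean = cov @ features.T @ targets / noise_var
    return mean, cov

def thompson_action(phi, posteriors, rng):
    """Sample one weight vector per action, then act greedily on the samples."""
    q_samples = [rng.multivariate_normal(mean, cov) @ phi
                 for mean, cov in posteriors]
    return int(np.argmax(q_samples))

# Synthetic data (hypothetical): each action has its own true weight vector.
rng = np.random.default_rng(0)
d, n, num_actions = 4, 200, 3
true_w = rng.normal(size=(num_actions, d))

posteriors = []
for a in range(num_actions):
    feats = rng.normal(size=(n, d))
    targs = feats @ true_w[a] + 0.1 * rng.normal(size=n)
    posteriors.append(blr_posterior(feats, targs, prior_var=1.0, noise_var=0.01))

phi = rng.normal(size=d)          # features of the current state
action = thompson_action(phi, posteriors, rng)
```

Because the posterior is Gaussian, drawing a sample costs one call to a multivariate normal sampler per action. Actions whose weights are still uncertain occasionally receive high sampled values, which is what drives exploration in this scheme.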

Additional Information

© 2018 Association for the Advancement of Artificial Intelligence. The authors would like to thank Zachary C. Lipton, Marlos C. Machado, Ian Osband, Gergely Neu, and the anonymous reviewers for their feedback and suggestions.

Attached Files

Submitted - 1802.04412.pdf

Files

1802.04412.pdf (3.0 MB, md5:9ffa583addd8c646c583c64ea2c9de94)
