Reinforcement Learning in Rich-Observation MDPs using Spectral Methods
Abstract
Reinforcement learning (RL) in Markov decision processes (MDPs) with large state spaces is a challenging problem. The performance of standard RL algorithms degrades drastically with the dimensionality of the state space. However, in practice, these large MDPs typically incorporate a latent or hidden low-dimensional structure. In this paper, we study the setting of rich-observation Markov decision processes (ROMDPs), where there is a small number of hidden states which possess an injective mapping to the observation states. In other words, every observation state is generated by a single hidden state, and this mapping is unknown a priori. We introduce a spectral decomposition method that consistently learns this mapping and, more importantly, achieves it with low regret. The estimated mapping is integrated into an optimistic RL algorithm (UCRL), which operates on the estimated hidden space. We derive finite-time regret bounds for our algorithm with a weak dependence on the dimensionality of the observed space. In fact, our algorithm asymptotically achieves the same average regret as the oracle UCRL algorithm, which knows the mapping from hidden to observed spaces. Thus, we derive an efficient spectral RL algorithm for ROMDPs.
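The low-dimensional structure described in the abstract can be illustrated with a small toy model. The sketch below is not the authors' spectral algorithm; it only assumes a hypothetical ROMDP with 3 hidden states, 12 observations, and a single fixed action, and all names (`obs_to_hidden`, `O`, `T`, `P_obs`) are made up for illustration. It shows that when every observation is emitted by exactly one hidden state, the observation-level transition matrix has rank at most the number of hidden states, and observations that share a hidden state have identical transition rows.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: few hidden states, many observations.
n_hidden, n_obs = 3, 12

# Each observation is emitted by exactly one hidden state.
obs_to_hidden = np.repeat(np.arange(n_hidden), n_obs // n_hidden)

# Emission matrix O[y, x] = P(y | x), nonzero only where obs_to_hidden[y] == x.
O = np.zeros((n_obs, n_hidden))
O[np.arange(n_obs), obs_to_hidden] = rng.random(n_obs) + 0.1
O /= O.sum(axis=0, keepdims=True)

# Hidden-state transitions T[x, x'] = P(x' | x) for one fixed action.
T = rng.dirichlet(np.ones(n_hidden), size=n_hidden)

# Observation-level transitions: P_obs[y, y'] = sum_x' T[x(y), x'] * O[y', x'].
P_obs = T[obs_to_hidden] @ O.T

# The large matrix inherits the low rank of the hidden dynamics.
rank = np.linalg.matrix_rank(P_obs)

# Group observations whose transition rows coincide (exact model, no noise).
labels = -np.ones(n_obs, dtype=int)
representatives = []
for y in range(n_obs):
    for k, r in enumerate(representatives):
        if np.allclose(P_obs[y], P_obs[r]):
            labels[y] = k
            break
    else:
        labels[y] = len(representatives)
        representatives.append(y)

print("rank of observation-level transition matrix:", rank)
print("recovered grouping:", labels)
```

In this noiseless toy the grouping can be read off by comparing rows directly. With transition estimates built from samples, the rows would only agree approximately, which is the regime where a consistent estimator such as the spectral method mentioned in the abstract becomes necessary.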
Additional Information
© 2018 Kamyar Azizzadenesheli, Alessandro Lazaric, and Animashree Anandkumar. K. Azizzadenesheli is supported in part by NSF Career Award CCF-1254106 and AFOSR YIP FA9550-15-1-0221. A. Lazaric is supported in part by a grant from CPER Nord-Pas de Calais/FEDER DATA Advanced data science and technologies 2015-2020, CRIStAL (Centre de Recherche en Informatique et Automatique de Lille), and the French National Research Agency (ANR) under project ExTra-Learn n.ANR-14-CE24-0010-01. A. Anandkumar is supported in part by a Microsoft Faculty Fellowship, a Google faculty award, an Adobe grant, NSF Career Award CCF-1254106, AFOSR YIP FA9550-15-1-0221, and Army Award No. W911NF-16-1-0134. The work was partially developed while the first author, K. Azizzadenesheli, was visiting INRIA, Lille and the Simons Institute for the Theory of Computing, UC Berkeley.
Attached Files
Submitted - 1611.03907.pdf
Files
Name | Size |
---|---|
md5:beca78286e84c28c9a0aafe3152160dd | 433.8 kB |
Additional details
- Eprint ID: 94165
- Resolver ID: CaltechAUTHORS:20190327-085718507
- NSF: CCF-1254106
- Air Force Office of Scientific Research (AFOSR): FA9550-15-1-0221
- Contrat de plan Etat-région Nord - Pas-de-Calais
- Centre de Recherche en Informatique et Automatique de Lille
- Agence Nationale pour la Recherche (ANR): ANR-14-CE24-0010-01
- Microsoft Faculty Fellowship
- Google Faculty Research Award
- Adobe
- Army Research Office (ARO): W911NF-16-1-0134
- Created: 2019-03-28 (created from EPrint's datestamp field)
- Updated: 2023-06-02 (created from EPrint's last_modified field)