Reinforcement Learning of POMDPs using Spectral Methods
Abstract
We propose a new reinforcement learning algorithm for partially observable Markov decision processes (POMDPs) based on spectral decomposition methods. While spectral methods have previously been employed for consistent learning of (passive) latent variable models such as hidden Markov models, POMDPs are more challenging because the learner interacts with the environment and may change future observations in the process. We devise a learning algorithm that runs in episodes. In each episode, we employ spectral techniques to learn the POMDP parameters from a trajectory generated by a fixed policy. At the end of the episode, an optimization oracle returns the optimal memoryless planning policy, i.e., the policy that maximizes the expected reward under the estimated POMDP model. We prove an order-optimal regret bound with respect to the optimal memoryless policy, together with efficient scaling in the dimensionality of the observation and action spaces.
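To make the planning step concrete, the following is a minimal sketch of what an "optimization oracle" over memoryless policies could compute on a tiny, fully specified POMDP. It is not the authors' method: the spectral estimation step is not shown, the model is assumed known, rewards are assumed to depend on the hidden state and action, the induced Markov chain is assumed ergodic, and the search is a brute-force enumeration of deterministic memoryless policies (the oracle in the paper may range over a richer, e.g. stochastic, policy class). The names `stationary`, `average_reward`, and `oracle`, and the toy numbers, are hypothetical.

```python
import itertools
import numpy as np


def stationary(P):
    """Stationary distribution of an ergodic Markov chain with transition matrix P."""
    vals, vecs = np.linalg.eig(P.T)
    k = np.argmin(np.abs(vals - 1.0))  # eigenvalue closest to 1
    w = np.real(vecs[:, k])
    return w / w.sum()  # normalize (also fixes an arbitrary sign)


def average_reward(T, O, R, pi):
    """Long-run average reward of a memoryless policy.

    Assumed shapes (hypothetical convention):
      T[x, a, x'] transition probabilities, O[x, y] observation probabilities,
      R[x, a] expected reward, pi[y, a] memoryless policy (action given observation).
    """
    A = O @ pi                              # A[x, a] = sum_y O[x, y] * pi[y, a]
    P = np.einsum("xa,xaz->xz", A, T)       # state chain induced by the policy
    w = stationary(P)
    return float(w @ (A * R).sum(axis=1))   # sum_x w[x] * sum_a A[x, a] * R[x, a]


def oracle(T, O, R):
    """Brute-force stand-in oracle: best deterministic memoryless policy."""
    n_obs = O.shape[1]
    n_actions = R.shape[1]
    best_eta, best_pi = -np.inf, None
    for choice in itertools.product(range(n_actions), repeat=n_obs):
        pi = np.zeros((n_obs, n_actions))
        pi[np.arange(n_obs), choice] = 1.0  # one action per observation
        eta = average_reward(T, O, R, pi)
        if eta > best_eta:
            best_eta, best_pi = eta, pi
    return best_pi, best_eta


if __name__ == "__main__":
    # Toy model: states are resampled uniformly at every step, observations are
    # 90% accurate, and the rewarding action matches the hidden state.
    T = np.full((2, 2, 2), 0.5)
    O = np.array([[0.9, 0.1], [0.1, 0.9]])
    R = np.array([[1.0, 0.0], [0.0, 1.0]])
    pi, eta = oracle(T, O, R)
    print(pi, eta)
```

On this toy model the best deterministic policy plays action 0 after observation 0 and action 1 after observation 1, for an average reward of about 0.9. Enumeration grows as (number of actions)^(number of observations), so it only serves to illustrate what the oracle returns, not how it would be implemented at scale.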
Additional Information
© 2016 K. Azizzadenesheli, A. Lazaric & A. Anandkumar. K. Azizzadenesheli is supported in part by NSF Career award CCF-1254106 and ONR Award N00014-14-1-0665. A. Lazaric is supported in part by a grant from CPER Nord-Pas de Calais/FEDER DATA Advanced data science and technologies 2015-2020, CRIStAL (Centre de Recherche en Informatique et Automatique de Lille), and the French National Research Agency (ANR) under project ExTra-Learn n.ANR-14-CE24-0010-01. A. Anandkumar is supported in part by a Microsoft Faculty Fellowship, NSF Career award CCF-1254106, ONR Award N00014-14-1-0665, ARO YIP Award W911NF-13-1-0084 and AFOSR YIP FA9550-15-1-0221.
Attached Files
Published - azizzadenesheli16a.pdf
Additional details
- Eprint ID: 94324
- Resolver ID: CaltechAUTHORS:20190401-123310700
- NSF: CCF-1254106
- Office of Naval Research (ONR): N00014-14-1-0665
- Contrat de plan Etat-région Nord - Pas-de-Calais
- Fondo Europeo de Desarrollo Regional (FEDER)
- Centre de Recherche en Informatique et Automatique de Lille
- Agence Nationale de la Recherche (ANR): ANR-14-CE24-0010-01
- Microsoft Faculty Fellowship
- Army Research Office (ARO): W911NF-13-1-0084
- Air Force Office of Scientific Research (AFOSR): FA9550-15-1-0221
- Created: 2019-04-01 (from EPrint's datestamp field)
- Updated: 2023-06-02 (from EPrint's last_modified field)