Competitive Policy Optimization
Abstract
A core challenge in policy optimization in competitive Markov decision processes is the design of efficient optimization methods with desirable convergence and stability properties. To tackle this, we propose competitive policy optimization (CoPO), a novel policy gradient approach that exploits the game-theoretic nature of competitive games to derive policy updates. Motivated by the competitive gradient optimization method, we derive a bilinear approximation of the game objective. In contrast, off-the-shelf policy gradient methods utilize only linear approximations, and hence do not capture interactions among the players. We instantiate CoPO in two ways: (i) competitive policy gradient, and (ii) trust-region competitive policy optimization. We theoretically study these methods, and empirically investigate their behavior on a set of comprehensive, yet challenging, competitive games. We observe that they provide stable optimization, convergence to sophisticated strategies, and higher scores when played against baseline policy gradient methods.
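To illustrate the contrast the abstract draws between linear and bilinear approximations, the following is a minimal sketch on a toy zero-sum game, not the CoPO algorithm itself. It assumes the scalar objective f(x, y) = x * y, where x minimizes and y maximizes, and compares simultaneous gradient descent-ascent (linear approximation only) with a competitive gradient descent step that also uses the mixed second derivative. The function names, step size, and starting point are illustrative assumptions and do not come from the paper.

```python
# Toy zero-sum game: f(x, y) = x * y, player x minimizes f, player y maximizes f.
# The unique equilibrium is (0, 0). All names and constants here are illustrative.

def gda_step(x, y, lr):
    """Simultaneous gradient descent-ascent: each player uses only its own gradient."""
    grad_x, grad_y = y, x  # df/dx = y, df/dy = x
    return x - lr * grad_x, y + lr * grad_y


def cgd_step(x, y, lr):
    """Competitive gradient descent step for a scalar zero-sum game.

    Each player solves a local game with a bilinear approximation of f,
    so the mixed derivative d2f/dxdy (the interaction term) enters the update.
    """
    grad_x, grad_y = y, x
    d_xy = 1.0  # d2f/dxdy for f = x * y
    scale = 1.0 / (1.0 + lr ** 2 * d_xy ** 2)
    dx = -lr * scale * (grad_x + lr * d_xy * grad_y)
    dy = lr * scale * (grad_y - lr * d_xy * grad_x)
    return x + dx, y + dy


def distance_after(step, steps=100, lr=0.2, start=(1.0, 1.0)):
    """Run `steps` updates and return the distance to the equilibrium (0, 0)."""
    x, y = start
    for _ in range(steps):
        x, y = step(x, y, lr)
    return (x * x + y * y) ** 0.5


if __name__ == "__main__":
    print("GDA distance to equilibrium:", distance_after(gda_step))
    print("CGD distance to equilibrium:", distance_after(cgd_step))
```

On this toy game, each gradient descent-ascent step multiplies the squared distance to the equilibrium by (1 + lr^2), so the iterates spiral outward, while each competitive step divides it by the same factor, so the iterates spiral inward. The example only shows why the interaction term matters; policy gradients, sampling, and trust regions are outside its scope.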
Additional Information
The main body of this work took place when M. Prajapat was a visiting scholar at Caltech. The authors would like to thank Florian Schäfer for his support. M. Prajapat is thankful to the Zeno Karl Schindler Foundation for providing him with a Master's thesis grant. K. Azizzadenesheli is supported in part by Raytheon and Amazon Web Services. A. Anandkumar is supported in part by the Bren endowed chair, DARPA PAI HR00111890035 and LwLL grants, Raytheon, Microsoft, Google, and Adobe faculty fellowships.
Attached Files
Submitted - 2006.10611.pdf
Files
Name | Size |
---|---|
md5:95bbaf69d1e216b5f7672eb30e9f61d5 | 3.4 MB |
Additional details
- Eprint ID: 106490
- Resolver ID: CaltechAUTHORS:20201106-120215567
- Zeno Karl Schindler Foundation
- Raytheon Company
- Amazon Web Services
- Bren Professor of Computing and Mathematical Sciences
- Defense Advanced Research Projects Agency (DARPA): HR00111890035
- Learning with Less Labels (LwLL)
- Microsoft Faculty Fellowship
- Google Faculty Research Award
- Adobe
- Created: 2020-11-06 (created from EPrint's datestamp field)
- Updated: 2023-06-02 (created from EPrint's last_modified field)