Competitive Policy Optimization

Creators: Prajapat, Manish; Azizzadenesheli, Kamyar; Liniger, Alexander; Yue, Yisong; Anandkumar, Anima

Abstract

A core challenge in policy optimization in competitive Markov decision processes is the design of efficient optimization methods with desirable convergence and stability properties. To tackle this, we propose competitive policy optimization (CoPO), a novel policy gradient approach that exploits the game-theoretic nature of competitive games to derive policy updates. Motivated by the competitive gradient optimization method, we derive a bilinear approximation of the game objective. In contrast, off-the-shelf policy gradient methods utilize only linear approximations, and hence do not capture interactions among the players. We instantiate CoPO in two ways:(i) competitive policy gradient, and (ii) trust-region competitive policy optimization. We theoretically study these methods, and empirically investigate their behavior on a set of comprehensive, yet challenging, competitive games. We observe that they provide stable optimization, convergence to sophisticated strategies, and higher scores when played against baseline policy gradient methods.

Additional Information

The main body of this work took place when M. Prajapat was a visiting scholar at Caltech. The authors would like to thank Florian Schäfer for his support. M. Prajapat is thankful to Zeno Karl Schindler foundation for providing him with a Master thesis grant. K. Azizzadenesheli is supported in part by Raytheon and Amazon Web Service. A. Anandkumar is supported in part by Bren endowed chair, DARPA PAIHR00111890035 and LwLL grants, Raytheon, Microsoft, Google, and Adobe faculty fellowships.

Attached Files

Submitted - 2006.10611.pdf

Files

2006.10611.pdf

Files (3.4 MB)

Name	Size	Download all
2006.10611.pdf md5:95bbaf69d1e216b5f7672eb30e9f61d5	3.4 MB	Preview Download

Additional details

	All versions	This version
Views	0	0
Downloads	0	0
Data volume	0 Bytes	0 Bytes