Welcome to the new version of CaltechAUTHORS. Login is currently restricted to library staff. If you notice any issues, please email coda@library.caltech.edu
Published March 16, 2023 | Submitted
Report Open

A Finite-Sample Analysis of Payoff-Based Independent Learning in Zero-Sum Stochastic Games

Abstract

We study two-player zero-sum stochastic games, and propose a form of independent learning dynamics called Doubly Smoothed Best-Response dynamics, which integrates a discrete and doubly smoothed variant of the best-response dynamics into temporal-difference (TD)-learning and minimax value iteration. The resulting dynamics are payoff-based, convergent, rational, and symmetric among players. Our main results provide finite-sample guarantees. In particular, we prove the first-known O̅(1/ϵ²) sample complexity bound for payoff-based independent learning dynamics, up to a smoothing bias. In the special case where the stochastic game has only one state (i.e., matrix games), we provide a sharper O̅(1/ϵ) sample complexity. Our analysis uses a novel coupled Lyapunov drift approach to capture the evolution of multiple sets of coupled and stochastic iterates, which might be of independent interest.

Additional Information

Attribution 4.0 International (CC BY 4.0)

Attached Files

Submitted - 2303.03100.pdf

Files

2303.03100.pdf
Files (734.5 kB)
Name Size Download all
md5:666bc399932a2c693b35afc41b1f0408
734.5 kB Preview Download

Additional details

Created:
August 20, 2023
Modified:
October 25, 2023