Quick Start

If you have not installed L-MALib yet, please refer to installation before running. We give a case of running Policy Space Response Oracle (PSRO) to solve Leduc Holdem,

PSRO Learning

Policy Space Response Oracle (PSRO) is a population-based MARL algorithm which cooperates game-theory and MARL algorithm to solve multi-agent tasks in the scope of meta-game. At each iteration, the algorithm will generate some policy combinations and executes independent learning for each agent. Such a nested learning process comprises rollout, training, evaluation in sequence, and works circularly until the algorithm finds the estimated Nash Equilibrium.

Note

If you want to use alpha-rank to estimate the equilibrium, you need to install open-spiel before that. Follow the installation to get more details.

Run the pre-defined config

bash run.sh