This repo computes the feedback Stackelberg equilibrium of a dynamic Stackelberg game (SG) using mixed-integer linear programming (MILP) and dynamic programming.
The dynamic SG has the following settings:
- player 1 is the leader and player 2 is the follower;
- finite state and finite actions;
- discrete time;
- Markov transition kernel.
The purposes are the following:
- verify whether the feedback SG equilibrium becomes stationary as $T\to\infty$;
- design learning algorithms based on Bellman equations for SG.
The game is defined as follows.
$$
\begin{align*}
\min_{\pi^1 \in \Delta(|\mathcal{A}^1|),\, \pi^{2*}} \quad & \mathbb{E}_{\pi^1, \pi^{2*}}\left[ \sum_{t=0}^T \gamma^t r^1(S_t, A^1_t, A^2_t) \,\middle|\, S_0 = s \right] \\
\text{s.t.} \quad & \pi^{2*} \in \arg\min_{\pi^2 \in \Delta(|\mathcal{A}^2|)} \mathbb{E}_{\pi^1, \pi^{2}} \left[ \sum_{t=0}^T \gamma^t r^2(S_t, A^1_t, A^2_t) \,\middle|\, S_0 = s \right].
\end{align*}
$$
Here, $S_t$ denotes the state, $A^i_t$ the action of player $i$ at time $t$, $r^i$ player $i$'s stage cost, $\gamma$ the discount factor, and $\Delta(|\mathcal{A}^i|)$ the probability simplex over player $i$'s action set $\mathcal{A}^i$.
For a finite-horizon SG, we can use dynamic programming (backward induction) to compute the feedback SG equilibrium. At stage $t$, every state $s$ defines a static Stackelberg game with cost matrices

$$ Q^i_t(s, a^1, a^2) = r^i(s, a^1, a^2) + \gamma \sum_{s'} P(s' \mid s, a^1, a^2)\, V^i_{t+1}(s'), \quad i = 1, 2, $$

where $P$ is the Markov transition kernel and $V^i_{t+1}$ is player $i$'s equilibrium value at stage $t+1$ (with $V^i_{T+1} \equiv 0$). Solving the static Stackelberg game at every state (this is where the MILP enters) yields the stage-$t$ equilibrium policies and the values $V^i_t$.
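Below is a minimal backward-induction sketch of this recursion, assuming NumPy arrays for the transition kernel and stage costs. The names `backward_induction` and `solve_stage_stackelberg` are illustrative placeholders, not this repo's actual API; a possible MILP implementation of the stage solver is sketched further down.

```python
import numpy as np

def backward_induction(P, r1, r2, gamma, T, solve_stage_stackelberg):
    """Finite-horizon feedback SG equilibrium by backward induction.

    P  : transition kernel,  shape (S, A1, A2, S)
    r1 : leader stage cost,   shape (S, A1, A2)
    r2 : follower stage cost, shape (S, A1, A2)
    solve_stage_stackelberg(Q1, Q2) -> (pi1, a2, v1, v2) solves one static
    Stackelberg game with cost matrices Q1, Q2 of shape (A1, A2).
    """
    S, A1, A2, _ = P.shape
    V1, V2 = np.zeros(S), np.zeros(S)       # terminal values V^i_{T+1} = 0
    leader_policies, follower_policies = [], []
    for t in reversed(range(T + 1)):
        Q1 = r1 + gamma * (P @ V1)          # Q^1_t(s, a1, a2)
        Q2 = r2 + gamma * (P @ V2)          # Q^2_t(s, a1, a2)
        pi1_t = np.zeros((S, A1))
        a2_t = np.zeros(S, dtype=int)
        V1_new, V2_new = np.zeros(S), np.zeros(S)
        for s in range(S):                  # solve the static stage game per state
            pi1_t[s], a2_t[s], V1_new[s], V2_new[s] = solve_stage_stackelberg(Q1[s], Q2[s])
        V1, V2 = V1_new, V2_new
        leader_policies.append(pi1_t)
        follower_policies.append(a2_t)
    return leader_policies[::-1], follower_policies[::-1], V1, V2
```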
In MARL settings, people are interested in stationary policies in the infinite-horizon limit:
- For discounted cases, the value and the policy become stationary as $T\to\infty$ (a rough numerical check is sketched after this list);
  - the follower's strategy is always deterministic;
  - the leader's strategy may not be deterministic due to the best-response (BR) partition.
- For non-discounted cases, the cost does not converge unless there is an absorbing state.
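As a rough way to probe the stationarity question, one can compare the stage-0 leader policy across growing horizons and see whether it stops changing. This sketch reuses the hypothetical `backward_induction` helper from above and takes any stage solver as an argument:

```python
import numpy as np

def check_stationarity(P, r1, r2, gamma, horizons, solve_stage):
    """Print how much the stage-0 leader policy changes as the horizon T grows."""
    prev = None
    for T in horizons:
        leader_policies, _, _, _ = backward_induction(P, r1, r2, gamma, T, solve_stage)
        pi1_0 = leader_policies[0]                  # leader policy at t = 0
        if prev is not None:
            print(f"T={T}: max change in pi^1_0 = {np.abs(pi1_0 - prev).max():.4f}")
        prev = pi1_0
```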
The Python scripts require Gurobi to solve the mixed-integer programs arising in the SG. Please visit the Gurobi website for installation instructions and license requests.
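For illustration, here is a hedged sketch of one standard MILP formulation of a single stage game in `gurobipy`: the leader's mixed strategy $x$ is continuous, a binary vector $q$ selects the follower's pure best response, and $z_{ij}$ linearizes the product $x_i q_j$. The function name and interface are assumptions chosen to match the backward-induction sketch above, not this repo's code.

```python
import numpy as np
import gurobipy as gp
from gurobipy import GRB

def solve_stage_stackelberg(Q1, Q2):
    """Solve one static Stackelberg game with cost matrices Q1, Q2 of shape (A1, A2).

    Both players minimize. Returns the leader's mixed strategy, the follower's
    pure best response, and both equilibrium costs.
    """
    A1, A2 = Q1.shape
    M = float(Q2.max() - Q2.min()) + 1.0       # big-M bounding follower cost gaps

    m = gp.Model("stage_stackelberg")
    m.Params.OutputFlag = 0
    x = m.addVars(A1, lb=0.0, ub=1.0, name="x")        # leader mixed strategy
    q = m.addVars(A2, vtype=GRB.BINARY, name="q")      # follower pure best response
    z = m.addVars(A1, A2, lb=0.0, ub=1.0, name="z")    # z[i, j] = x[i] * q[j]
    v2 = m.addVar(lb=-GRB.INFINITY, name="v2")         # follower equilibrium cost

    m.addConstr(x.sum() == 1)
    m.addConstr(q.sum() == 1)
    m.addConstrs(z.sum(i, "*") == x[i] for i in range(A1))   # marginals recover x
    m.addConstrs(z[i, j] <= q[j] for i in range(A1) for j in range(A2))
    for j in range(A2):
        cost_j = gp.quicksum(Q2[i, j] * x[i] for i in range(A1))
        m.addConstr(cost_j - v2 >= 0)                  # v2 is the smallest column cost
        m.addConstr(cost_j - v2 <= M * (1 - q[j]))     # tight for the chosen column

    m.setObjective(gp.quicksum(Q1[i, j] * z[i, j]
                               for i in range(A1) for j in range(A2)), GRB.MINIMIZE)
    m.optimize()

    pi1 = np.array([x[i].X for i in range(A1)])
    a2 = int(max(range(A2), key=lambda j: q[j].X))
    return pi1, a2, m.ObjVal, v2.X
```

Plugged into the backward-induction sketch above, this stage solver computes the finite-horizon feedback equilibrium end to end; the big-M constant only needs to bound the range of the follower's expected costs.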