Introduction
Let A \in \mathbb{R}^{m\times n}, c \in \mathbb{R}^n and b \in \mathbb{R}^m, and consider a matrix game with payoff function (x, u) \mapsto \langle Ax, u\rangle + \langle c, x\rangle + \langle b, u\rangle. The Nash equilibrium problem is \begin{eqnarray} \underset{x \in \Delta_n}{\text{minimize }}\underset{u \in \Delta_m}{\text{maximize }}\langle Ax, u\rangle + \langle c, x\rangle + \langle b, u\rangle. \end{eqnarray} From the "min" player's point of view, this problem can be re-written as \begin{eqnarray} \underset{x \in \Delta_n}{\text{minimize }}f(x), \end{eqnarray} where the proper convex function f: \mathbb{R}^n \rightarrow \mathbb{R} is defined by \begin{eqnarray} f(x) := \hat{f}(x) + \underset{u \in \Delta_m}{\text{max }}\langle Ax, u\rangle - \hat{\phi}(u),\hspace{.5em}\hat{f}(x) := \langle c, x\rangle,\hspace{.5em}\hat{\phi}(u) := \langle b, u\rangle. \end{eqnarray} Of course, the dual problem is \begin{eqnarray} \underset{u \in \Delta_m}{\text{maximize }}\phi(u), \end{eqnarray} where the proper concave function \phi: \mathbb{R}^m \rightarrow \mathbb{R} is defined by \begin{eqnarray} \phi(u) := -\hat{\phi}(u) + \underset{x \in \Delta_n}{\text{min }}\langle Ax, u\rangle + \hat{f}(x). \end{eqnarray}

Now, for a positive scalar \mu, consider a smoothed version of f \begin{eqnarray} \bar{f}_\mu(x) := \hat{f}(x) + \underbrace{\underset{u \in \Delta_m}{\text{max }}\langle Ax, u\rangle - \hat{\phi}(u) - \mu d_2(u)}_{f_\mu(x)}, \end{eqnarray} where d_2 is a prox-function for the simplex \Delta_m (see Nesterov's paper for the definition), with prox-center \bar{u} \in \Delta_m. For simplicity, take d_2(u) := \frac{1}{2}\|u - \bar{u}\|^2 \ge 0 = d_2(\bar{u}), \forall u \in \Delta_m and \bar{u} := (1/m,1/m,\ldots,1/m) \in \Delta_m. Define a prox-function d_1 for \Delta_n analogously, with prox-center \bar{x} := (1/n,1/n,\ldots,1/n) \in \Delta_n. Note that \sigma_1 = \sigma_2 = 1 since the d_j's are 1-strongly convex. Also, noting that the d_j's attain their maximum at the kinks (vertices) of the respective simplexes, one computes \begin{eqnarray} D_1 := \underset{x \in \Delta_n}{\text{max }}d_1(x) = \frac{1}{2}\left((1-1/n)^2 + (n-1)(1/n)^2\right) = (1 - 1/n)/2, \end{eqnarray} i.e. half the squared distance between any vertex and \bar{x}. Similarly, D_2 := \underset{u \in \Delta_m}{\text{max }}d_2(u) = (1 - 1/m)/2. Note that since d_2 \ge 0, the smoothed functions (\bar{f}_\mu)_{\mu > 0} increase point-wise to f as \mu \rightarrow 0^+.
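To fix ideas, here is a minimal numpy sketch of this setup; the problem data A, c, b below (and all the variable names) are illustrative placeholders, not part of the original post:

```python
import numpy as np

# Illustrative problem data: any payoff matrix A and vectors c, b will do.
rng = np.random.default_rng(0)
m, n = 50, 80
A = rng.standard_normal((m, n))
c = rng.standard_normal(n)
b = rng.standard_normal(m)

# Prox-centers: the barycenters of the two simplexes.
x_bar = np.full(n, 1.0 / n)
u_bar = np.full(m, 1.0 / m)

# D_1 = max of d_1 over Delta_n, attained at any vertex; similarly D_2.
D1 = 0.5 * (1.0 - 1.0 / n)
D2 = 0.5 * (1.0 - 1.0 / m)
```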
Ingredients for implementing the algorithm
Let us now prepare the ingredients necessary for implementing the algorithm.

Definition. Given a closed convex subset C \subseteq \mathbb{R}^m, and a point x \in \mathbb{R}^m, recall the definition of the orthogonal projection of x onto C \begin{eqnarray} proj_C(x) := \underset{z \in C}{\text{argmin }}\frac{1}{2}\|z-x\|^2. \end{eqnarray} Geometrically, proj_C(x) is the unique point of C which is closest to x, and so proj_C is a well-defined function from \mathbb{R}^m to C. For example, if C is the nonnegative mth-orthant \mathbb{R}_+^m, then proj_C(x) \equiv (x)_+, the component-wise maximum of x and 0.

Step 1: Call to the oracle, i.e. computation of \bar{f}_\mu(x_k) and \nabla \bar{f}_\mu(x_k). For any x \in \mathbb{R}^n, define v_\mu(x) := (Ax - b)/\mu and u_\mu(x) := proj_{\Delta_m}(\bar{u} + v_\mu(x)). One computes \begin{eqnarray*} \begin{split} f_\mu(x) := \underset{u \in \Delta_m}{\text{max }}\langle Ax, u\rangle - \hat{\phi}(u) - \mu d_2(u) &= -\underset{u \in \Delta_m}{\text{min }}\frac{1}{2}\mu\|u-\bar{u}\|^2 + \langle b - Ax, u\rangle\\ &= \frac{1}{2}\mu\|v_\mu(x)\|^2 + \mu\langle v_\mu(x), \bar{u}\rangle - \underset{u \in \Delta_m}{\text{min }}\frac{1}{2}\mu\|u - (\bar{u} + v_\mu(x))\|^2, \end{split} \end{eqnarray*} by completing the square in u. One recognizes the minimization problem on the right-hand side as the orthogonal projection of the point \bar{u} + v_\mu(x) onto the simplex \Delta_m. It is not difficult to show that f_\mu is smooth with gradient \nabla f_\mu: x \mapsto A^Tu_\mu(x) (one can invoke Danskin's theorem, for example) and that L_\mu := \frac{1}{\mu\sigma_2}\|A\|^2, where \|A\| denotes the operator norm, is a Lipschitz constant for the latter. Thus, \bar{f}_\mu is smooth and one computes \begin{eqnarray} \nabla \bar{f}_\mu(x_k) = \nabla \hat{f}(x_k) + \nabla f_\mu(x_k) = c + A^Tu_\mu(x_k). \end{eqnarray} Moreover, L_\mu defined above is also a Lipschitz constant for \nabla \bar{f}_\mu.
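The projection onto the simplex can be computed exactly by the classical sort-and-threshold method (see, e.g., Duchi et al., 2008). Below is a sketch of this projection and of the resulting oracle, continuing the setup above; the helper names proj_simplex and oracle are mine, not the post's:

```python
def proj_simplex(v):
    """Exact Euclidean projection of v onto the probability simplex,
    via the classical sort-and-threshold method (O(m log m))."""
    u = np.sort(v)[::-1]                       # sort in decreasing order
    css = np.cumsum(u) - 1.0
    ind = np.arange(1, v.size + 1)
    rho = np.nonzero(u - css / ind > 0)[0][-1]
    tau = css[rho] / (rho + 1.0)               # optimal threshold
    return np.maximum(v - tau, 0.0)

def oracle(x, mu):
    """Step 1: u_mu(x) and the gradient of f_bar_mu at x."""
    v = (A @ x - b) / mu                       # v_mu(x)
    u = proj_simplex(u_bar + v)                # u_mu(x)
    grad = c + A.T @ u                         # gradient of f_bar_mu
    return u, grad
```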
Step 3: Computation of z_k. For any s \in \mathbb{R}^n, one has \begin{eqnarray*} \underset{x \in \Delta_n}{\text{argmin }}L_\mu d_1(x) + \langle s, x\rangle = \underset{x \in \Delta_n}{\text{argmin }}\frac{1}{2}L_\mu\|x-\bar{x}\|^2 + \langle s, x\rangle = proj_{\Delta_n}\left(\bar{x} - \frac{1}{L_\mu}s\right), \end{eqnarray*} by completing the square in x and ignoring constant terms (harmless here, since we only seek the argmin). Thus, with s = \sum_{i=0}^k\frac{i+1}{2}\nabla \bar{f}_\mu(x_i), we obtain the update rule \begin{eqnarray} z_k = proj_{\Delta_n}\left(\bar{x} - \frac{1}{L_\mu}\sum_{i=0}^k\frac{i+1}{2}\nabla \bar{f}_\mu(x_i)\right). \end{eqnarray}
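In code, it suffices to maintain the running weighted sum of gradients across iterations; a sketch (z_step is again a hypothetical helper name):

```python
def z_step(grad_sum, L_mu):
    """Step 3: z_k, where grad_sum = sum_{i<=k} (i+1)/2 * grad_i
    is accumulated by the caller across iterations."""
    return proj_simplex(x_bar - grad_sum / L_mu)
```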
Step 2: Computation of y_k. Similarly to the previous computation, one obtains the update rule \begin{eqnarray} y_k = T_{\Delta_n}(x_k) := \underset{y \in \Delta_n}{\text{argmin }}\langle \nabla \bar{f}_\mu(x_k), y\rangle + \frac{1}{2}L_\mu\|y-x_k\|^2 = proj_{\Delta_n}\left(x_k - \frac{1}{L_\mu}\nabla \bar{f}_\mu (x_k)\right). \end{eqnarray}

Stopping criterion. The algorithm is stopped once the primal-dual gap \begin{eqnarray} \begin{split} &f(x_k) - \phi(u_\mu(x_k))=\\ &\left(\langle c, x_k \rangle + \underset{u \in \Delta_m}{\text{max }}\langle Ax_k, u\rangle - \langle b, u\rangle\right) - \left(-\langle b, u_\mu(x_k)\rangle + \underset{z \in \Delta_n}{\text{min }}\langle Az, u_\mu(x_k)\rangle + \langle c, z\rangle\right)\\ &= \langle c, x_k \rangle + \underset{1 \le j \le m}{\text{max }}(Ax_k-b)_j + \langle b, u_\mu(x_k)\rangle - \underset{1 \le i \le n}{\text{min }}(A^Tu_\mu(x_k) + c)_i \end{split} \end{eqnarray} is below a tolerance \epsilon > 0 (say \epsilon \sim 10^{-4}). Note that in the above computation, we have used the fact that for any r \in \mathbb{R}^m, the linear function \langle r, \cdot\rangle attains its maximum over the compact polytope \Delta_m at an extreme point, i.e. at one of the kinks e_1, \ldots, e_m, so that \begin{eqnarray} \underset{u \in \Delta_m}{\text{max }}\langle r, u\rangle = \underset{1 \le i \le m}{\text{max }}\langle r, e_i\rangle = \underset{1 \le i \le m}{\text{max }}r_i, \end{eqnarray} and similarly for the minimum over \Delta_n.
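A sketch of the gradient step and of the closed-form gap above (the helper names are, once more, mine):

```python
def y_step(x, grad, L_mu):
    """Step 2: projected gradient step from x_k."""
    return proj_simplex(x - grad / L_mu)

def primal_dual_gap(x, u):
    """f(x) - phi(u), evaluated in closed form as above."""
    return (c @ x + np.max(A @ x - b)) - (-(b @ u) + np.min(A.T @ u + c))
```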
Algorithm parameters. In accord with Nesterov's recommendation, we may take \begin{eqnarray} \mu = \frac{\epsilon}{2D_2} = \frac{\epsilon}{1 - 1/m}, \text{ and }L_\mu=\frac{\|A\|^2}{\sigma_2 \mu} = \frac{\|A\|^2}{\mu}. \end{eqnarray}
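Putting the pieces together, a sketch of the full loop follows. The combination step x_{k+1} = \frac{2}{k+3}z_k + \frac{k+1}{k+3}y_k is the one from Nesterov's paper (it is not derived in this post), and solve_matrix_game is a hypothetical name:

```python
def solve_matrix_game(eps=1e-4, max_iter=100_000):
    mu = eps / (2.0 * D2)
    L_mu = np.linalg.norm(A, 2) ** 2 / mu      # ||A|| = spectral norm
    x = x_bar.copy()
    grad_sum = np.zeros(n)
    for k in range(max_iter):
        u, grad = oracle(x, mu)                # Step 1
        if primal_dual_gap(x, u) <= eps:       # stopping criterion
            break
        y = y_step(x, grad, L_mu)              # Step 2
        grad_sum += 0.5 * (k + 1) * grad
        z = z_step(grad_sum, L_mu)             # Step 3
        x = (2.0 * z + (k + 1) * y) / (k + 3)  # Nesterov's combination
    return x, u
```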