Minimax optimization has found multiple applications in machine learning; the training of generative adversarial networks and robust reinforcement learning are only two of them. However, these applications often lead to nonconvex-nonconcave minimax objectives, which cause notorious problems for first-order methods: there is a rich literature documenting failures such as divergence or cyclic behavior of the optimization procedure.
Traditionally, minimax optimization problems have been studied under the umbrella of variational inequalities (VIs). In short, VIs are a unified formulation that covers a wide range of optimization problems; a survey can be found here. The hardness of the nonconvex-nonconcave case has been established both through exponential lower bounds in the classical optimization sense and through computational complexity results.
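To fix notation for what follows (this is the standard formulation; the precise setup in the survey may differ slightly): for a smooth minimax problem $\min_x \max_y f(x,y)$, one stacks the variables into $u = (x, y)$ and sets $F(u) = (\nabla_x f(x,y), -\nabla_y f(x,y))$. The variational inequality then asks for a point $u^\star$ in the constraint set $\mathcal{U}$ such that

$$\langle F(u^\star),\, u - u^\star \rangle \ge 0 \quad \text{for all } u \in \mathcal{U}.$$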
A classical approach to guarantee convergence outside the convex-concave setting is to assume some additional structure in the problem formulation. One important case is the Minty variational inequality (MVI), which includes all quasiconvex-concave and starconvex-concave problems.
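For comparison with the definition above, the Minty (dual) variant evaluates $F$ at the moving point instead of at the solution: it asks for a $u^\star \in \mathcal{U}$ with

$$\langle F(u),\, u - u^\star \rangle \ge 0 \quad \text{for all } u \in \mathcal{U}.$$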
Last year, Diakonikolas et al. proposed a relaxation of the MVI, called the weak Minty variational inequality, by adding a new parameter to the formulation. More precisely, they allow the MVI to take negative values down to a threshold governed by this parameter. They also developed an extension of the classical extragradient algorithm, called EG+, and showed that for a certain range of the new parameter EG+ converges to a stationary point. However, the range of the parameter for which convergence could be shown was still too narrow to capture even the simplest known counterexample for general Robbins-Monro first-order schemes (the forsaken example, see here).
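Schematically (the exact placement of constants varies between papers), the weak MVI condition asks for a $u^\star$ and a parameter $\rho \ge 0$ such that

$$\langle F(u),\, u - u^\star \rangle \ge -\frac{\rho}{2}\,\|F(u)\|^2 \quad \text{for all } u \in \mathcal{U},$$

so that $\rho = 0$ recovers the classical MVI. The EG+ iteration itself is simple enough to sketch in a few lines; the step sizes below are illustrative defaults, not the constants from the paper:

```python
import numpy as np

def eg_plus(F, u0, gamma=0.1, alpha=0.5, num_iters=1000):
    """Sketch of the EG+ iteration: a full extrapolation step followed
    by a deliberately shortened update step (alpha < 1)."""
    u = np.asarray(u0, dtype=float)
    for _ in range(num_iters):
        u_bar = u - gamma * F(u)          # extrapolation step
        u = u - alpha * gamma * F(u_bar)  # shortened update step
    return u
```

For instance, on the toy bilinear problem $f(x, y) = xy$, where $F(u) = (y, -x)$, one would call `eg_plus(lambda u: np.array([u[1], -u[0]]), [1.0, 1.0])`.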
The paper [Pet22E] proposes curvatureEG+, an extension of the EG+ algorithm. The main difference from EG+ is that the step size is computed adaptively. The authors show that curvatureEG+ converges for a broader range of the parameter in the weak MVI definition than EG+. Interestingly, this is enough to capture the aforementioned forsaken example (see below).
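To give a feel for the adaptivity, here is an illustrative stand-in, not the exact rule from [Pet22E]: their condition is derived from local curvature information, and the parameters `gamma0`, `nu` and `shrink` below are hypothetical. The sketch combines EG+ with a simple backtracking test on how fast $F$ changes along the tentative step:

```python
import numpy as np

def adaptive_eg_plus_sketch(F, u0, gamma0=1.0, alpha=0.5, nu=0.9,
                            shrink=0.5, num_iters=1000):
    """Hedged sketch: EG+ where the extrapolation step size is shrunk
    until it passes a local Lipschitz-type test, mimicking (but not
    reproducing) the curvature-based rule of curvatureEG+."""
    u = np.asarray(u0, dtype=float)
    for _ in range(num_iters):
        gamma = gamma0
        while True:
            u_bar = u - gamma * F(u)  # tentative extrapolation step
            # accept gamma if F does not change too fast along this step
            if gamma * np.linalg.norm(F(u_bar) - F(u)) <= nu * np.linalg.norm(u_bar - u):
                break
            gamma *= shrink           # otherwise backtrack
        u = u - alpha * gamma * F(u_bar)  # shortened update step
    return u
```

The design idea, as I understand it, is that the step size reacts to the local behavior of $F$ around the current iterate instead of being fixed from a worst-case global Lipschitz constant.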
Minimax optimization has only recently entered the stage in machine learning, and I am delighted to see how this new domain of application is already providing fresh impulses for this complex but important topic.