Section: New Results
Optimal control and zero-sum games
Fixed points of order preserving homogeneous maps and zero-sum games
Participants : Marianne Akian, Stéphane Gaubert.
In a series of joint works with Antoine Hochart, applied methods of non-linear fixed point theory to zero-sum games.
A key issue is the solvability of the ergodic equation associated to a zero-sum game with finite state space, i.e., given a dynamic programming operator associated to an undiscounted problem, one looks for a vector , called the bias, and for a scalar , the ergodic constant, such that . The bias vector is of interest as it allows to determine optimal stationnary strategies.
In [41], we studied zero-sum games with perfect information and finite action spaces, and showed that the set of payments for which the bias vector is not unique (up to an additive constant) coincides with the union of lower dimensional cells of a polyhedral complex, in particular, the bias vector is unique, generically. We provided an application to perturbation schemes in policy iteration.
In [14], we apply game theory methods to the study of the nonlinear eigenproblem for homogeneous order preserving self maps of the interior of the cone. We show that the existence and uniqueness of an eigenvector is governed by combinatorial conditions, involving dominions (sets of states “controlled” by one of the two players). In this way, we characterize the situation in which the existence of an eigenvector holds independently of perturbations, and we solve an open problem raised in [77].
Nonlinear fixed point methods to compute joint spectral raddi of nonnegative matrices
Participant : Stéphane Gaubert.
In [21], we introduce a non-linear fixed point method to approximate the joint spectral radius of a finite set of nonnegative matrices. We show in particular that the joint spectral radius is the limit of the eigenvalues of a family of non-linear risk-sensitive type dynamic programming operators. We develop a projective version of Krasnoselskii-Mann iteration to solve these eigenproblems, and report experimental results on large scale instances (several matrices in dimensions of order 1000 within a minute).
Probabilistic and max-plus approximation of Hamilton-Jacobi-Bellman equations
Participant : Marianne Akian.
We consider fully nonlinear Hamilton-Jacobi-Bellman equations associated to diffusion control problems with finite horizon involving a finite set-valued (or switching) control and possibly a continuum-valued control. In [36], we constructed a lower complexity probabilistic numerical algorithm by combining the idempotent expansion properties obtained by McEneaney, Kaise and Han [92], [98] for solving such problems with a numerical probabilistic method such as the one proposed by Fahim, Touzi and Warin [70] for solving some fully nonlinear parabolic partial differential equations, when the volatility does not oscillate too much. In [37] and [27], we improved the method of Fahim, Touzi and Warin by introducing probabilistic schemes which are monotone without any restrictive condition, allowing one to solve fully nonlinear parabolic partial differential equations with general volatilities. We studied the convergence and obtain error estimates when the parameters and the value function are bounded.
Tropical-SDDP algorithms for stochastic control problems involving a switching control
Participants : Marianne Akian, Duy Nghi Benoît Tran.
The PhD thesis of Benoît Tran, supervised by Jean-Philippe Chancelier (ENPC) and Marianne Akian concerns the numerical solution of the dynamic programming equation of discrete time stochastic control problems.
Several methods have been proposed in the litterature to bypass the curse of dimensionality difficulty of such an equation, by assuming a certain structure of the problem. Examples are the max-plus based method of McEneaney [99], [100], the stochastic max-plus scheme proposed by Zheng Qu [108], the stochastic dual dynamic programming (SDDP) algorithm of Pereira and Pinto [105], the mixed integer dynamic approximation scheme of Philpott, Faisal and Bonnans [55], the probabilistic numerical method of Fahim, Touzi and Warin [70]. We propose to associate and compare these methods in order to solve more general structures.
In a first work [35], see also [24], we build a common framework for both the SDDP and a discrete time and finite horizon version of Zheng Qu's algorithm for deterministic problems involving a finite set-valued (or switching) control and a continuum-valued control. We propose an algorithm that generates monotone approximations of the value function as a pointwise supremum, or infimum, of basic (affine or quadratic for example) functions which are randomly selected. We give sufficient conditions that ensure almost sure convergence of the approximations to the value function. More recently, we study generalizations of these algorithms to the case of stochastic optimal control problems.
In a recent work, we introduce and study an entropic relaxation of the Nested Distance introduced by Pflug [106].
A variance reduction deflated value iteration algorithm to solve ergodic games
Participants : Marianne Akian, Stéphane Gaubert, Omar Saadi.
Recently, Sidford et al. introduced in [116] a variance reduced value iteration algorithm to solve discounted Markov decision processes. In [25], in a joint work work with Zheng Qu (Hong Kong University), we extended this algorithm to the ergodic (mean payoff) case, and also to the two-player case, exploiting techniques from non-linear spectral theory [39] and variational analysis. The deterministic version of this algorithm also yields a new method (alternative to relative value iteration) to solve ergodic problems.