Markov decision processes and mean-field games with costly and delayed information
Christoph Reisinger, University of Oxford
We first consider Markov decision processes where the state of the chain is revealed only at chosen observation times and at a cost. Optimal strategies involve the optimisation of observation times as well as the subsequent action values. We derive a quasi-variational inequality for the value function, show its well-posedness, and propose an efficient penalty scheme for its numerical solution. The model and numerical scheme are illustrated by an example from medical testing and treatment.

We then consider an extension in which agents can exercise control actions that affect their speed of access to information. We study single- as well as multi-agent settings and their mean-field limit. The agents can dynamically decide to receive observations with less delay by paying higher observation costs, and they seek to exploit this active information gathering by making further decisions that influence their state dynamics so as to maximise rewards. In the mean-field equilibrium, each generic agent individually solves a partially observed Markov decision problem in which the way partial observations are obtained is itself subject to dynamic control actions by the agent. We prove that, with sufficient entropy regularisation, a fixed-point iteration converges to the unique MFG equilibrium and yields an approximate Nash equilibrium for a large but finite population. We illustrate the MFG by an example from epidemiology, where agents can choose medical tests with different speeds and costs. Joint work with Dirk Becherer and Jonathan Tam.
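For orientation, a schematic form of such a quasi-variational inequality and its penalty approximation is sketched below; the notation (discount rate \rho, generator \mathcal{L} of the uncontrolled dynamics, running reward f, and an observation operator \mathcal{M} encoding the value of paying a cost c for a fresh observation before choosing the next action a) is generic and not taken from the talk.

\[
\min\big( \rho V - \mathcal{L} V - f,\; V - \mathcal{M} V \big) = 0,
\qquad
\mathcal{M} V(x) = \sup_{a \in A} \big\{ \mathbb{E}\big[ V(X^{x,a}) \big] - c \big\},
\]

and a penalty scheme of the kind used for such obstacle problems replaces the constraint by a penalised equation,

\[
\rho V_\varepsilon - \mathcal{L} V_\varepsilon - f - \tfrac{1}{\varepsilon} \big( \mathcal{M} V_\varepsilon - V_\varepsilon \big)^{+} = 0,
\]

whose solution V_\varepsilon approximates V as \varepsilon \to 0; the penalty term activates where the observation constraint V \ge \mathcal{M} V would bind, yielding a semismooth equation amenable to Newton-type iteration.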
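The entropy-regularised fixed-point iteration can likewise be sketched on a toy finite-state, finite-action model; everything below (states, actions, rewards, transitions, the congestion-type interaction, and the time-averaged mean field) is an illustrative placeholder rather than the model from the talk, with the regularisation entering through softmax policies.

import numpy as np

# Toy entropy-regularised MFG fixed-point iteration (illustrative only).
n_states, n_actions, horizon = 3, 2, 20
rng = np.random.default_rng(0)

# P[a, s, s']: controlled transition kernel (rows are distributions).
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
base_reward = rng.standard_normal((n_states, n_actions))

def reward(mu):
    # Congestion-type interaction: crowded states are penalised.
    return base_reward - mu[:, None]

def best_response(mu, temp=1.0):
    """Softmax (entropy-regularised) optimal policy via backward induction."""
    V = np.zeros(n_states)
    policies = []
    r = reward(mu)
    for _ in range(horizon):
        Q = r + np.einsum('asj,j->sa', P, V)        # Q(s,a) = r + E[V(X')]
        Qmax = Q.max(axis=1, keepdims=True)
        V = (temp * np.log(np.exp((Q - Qmax) / temp).sum(axis=1))
             + Qmax[:, 0])                          # stable soft value
        policies.append(np.exp((Q - V[:, None]) / temp))  # exact softmax
    return policies[::-1]                           # forward time order

def induced_flow(policies, mu0):
    """Time-averaged population distribution induced by a policy."""
    mu, avg = mu0.copy(), mu0.copy()
    for pi in policies:
        mu = np.einsum('s,sa,asj->j', mu, pi, P)
        avg += mu
    return avg / (horizon + 1)

# Damped fixed-point iteration: mean field -> best response -> mean field.
mu = np.ones(n_states) / n_states
for it in range(200):
    pi = best_response(mu, temp=1.0)
    mu_new = induced_flow(pi, np.ones(n_states) / n_states)
    if np.max(np.abs(mu_new - mu)) < 1e-8:
        break
    mu = 0.5 * mu + 0.5 * mu_new                    # damping aids convergence

print("approximate equilibrium mean field:", np.round(mu, 4))

For brevity the interaction here is through a time-averaged distribution rather than the full flow of measures, and the damping constant is an arbitrary choice; the role of the temperature parameter mirrors the entropy regularisation that drives the contraction argument described in the abstract.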