Parsimonious structure-exploiting deep neural network surrogates for Bayesian inverse problems and optimal experimental design
Omar Ghattas, University of Texas at Austin
In an inverse problem, one seeks to infer unknown parameters or parameter fields from measurements or observations of the state of a natural or engineered system. Such problems are fundamental to many fields of science and engineering: the available models often possess unknown or uncertain input parameters that must be inferred from experimental or observational data. The Bayesian framework for inverse problems accounts for uncertainty in the inferred parameters stemming from uncertainties in the observational data, the model, and any prior knowledge. This leads to the meta-problem of optimal experimental design (OED): how do we optimize the data acquisition so that the uncertainty in the recovered parameters is minimized? In both Bayesian inversion (BI) and OED, the forward model must be solved numerous times (as many as millions) to characterize the uncertainty in the parameters. BI and OED problems governed by large-scale complex models in high parameter dimensions (such as nonlinear PDEs with uncertain infinite-dimensional parameter fields) quickly become computationally prohibitive. Efficient evaluation of the parameter-to-observable (p2o) map, defined by solution of the forward model, is the key to making BI and OED tractable. Surrogate approximations of p2o maps have the potential to greatly accelerate BI and OED, provided that the p2o map can be accurately approximated using far fewer forward model solves than would be required for solving the BI or OED problem using the full p2o map. Unfortunately, constructing such surrogates presents significant challenges when the parameter dimension is high and the forward model is expensive. Deep neural networks (DNNs) have emerged as leading contenders for overcoming these challenges. We demonstrate that black-box application of DNNs to problems with infinite-dimensional parameter fields leads to poor results, particularly in the common situation where training data are limited due to the expense of the model.
However, by constructing a network architecture that is adapted to the geometry and intrinsic low-dimensionality of the p2o map as revealed through adjoint PDEs, one can obtain a “parsimonious” DNN surrogate with superior approximation properties using only limited training data. For training the DNN, we introduce the low-rank saddle-free Newton method for stochastic optimization, and show that it outperforms first-order methods such as Adam and stochastic gradient descent. Examples from climate modeling are presented. This work is joint with Tom O’Leary-Roseberry, Peng Chen, Umberto Villa, and Nick Alger.
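
As a rough illustration of the idea behind a low-rank saddle-free Newton step, the following minimal sketch may help. It is not the authors' implementation. It assumes the Hessian is replaced by its rank-r truncated eigendecomposition, that eigenvalues are replaced by their absolute values so that negative-curvature directions are descended rather than attracted to, and that a damping parameter (here called `damping`) regularizes the discarded directions. The function name `low_rank_sfn_step`, the dense `eigh` call, and the toy quadratic are all hypothetical choices made for this example; for a real DNN one would presumably work matrix-free with Hessian-vector products instead of forming the Hessian.

```python
import numpy as np

def low_rank_sfn_step(grad, hess, rank, damping):
    """Return p = -(|H_r| + damping * I)^{-1} grad.

    H_r is the rank-`rank` truncation of `hess` (largest-magnitude
    eigenpairs), and |H_r| uses absolute values of its eigenvalues.
    Illustrative only: a dense eigendecomposition is used for clarity.
    """
    eigvals, eigvecs = np.linalg.eigh(hess)
    # Keep the `rank` eigenpairs of largest magnitude.
    idx = np.argsort(np.abs(eigvals))[::-1][:rank]
    lam = np.abs(eigvals[idx])
    V = eigvecs[:, idx]
    # Woodbury identity for orthonormal V:
    # (V diag(lam) V^T + damping I)^{-1} g
    #   = (g - V diag(lam / (lam + damping)) V^T g) / damping
    coeff = lam / (lam + damping)
    return -(grad - V @ (coeff * (V.T @ grad))) / damping

# Toy indefinite quadratic f(x) = 0.5 * x^T A x with a saddle at the origin.
A = np.diag([4.0, -2.0, 0.01])
x = np.array([1.0, 1.0, 1.0])
g = A @ x  # gradient of f at x
p = low_rank_sfn_step(g, A, rank=2, damping=0.1)
```

In this toy case the step moves toward the origin along the positive-curvature coordinate and away from it along the negative-curvature coordinate, whereas a plain Newton step would head straight to the saddle point.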