High-resolution climate modeling using coarse-scale models and reanalysis data

This was part of Machine Learning for Climate and Weather Applications

Themistoklis Sapsis, MIT

Thursday, November 3, 2022

Abstract: Geophysical datasets are characterized by unique challenges: multiple scales in space and time, strong nonlinear coupling of dynamical components, and a large number of positive Lyapunov exponents, i.e. instabilities. These properties make prediction and uncertainty quantification in geophysical settings a problem of unique complexity. Contemporary ocean, atmospheric and climate models aim to overcome these challenges through accurate numerical discretization of the governing equations, careful parameterizations, and/or data-assimilation schemes. However, the resulting models are typically very complex and expensive, making the quantification of rare events over long time horizons a formidable task. This is especially important when one is interested in evaluating the effectiveness of a particular policy or measure. This work aims to overcome some of these limitations and consists of two parts. In the first part, our goal is to build a non-intrusive correction operator that takes as input a coarse-scale climate simulation (100 km) and provides as output a corrected one whose large-scale statistics are consistent with the reanalysis data. This is not a straightforward task since, due to chaotic divergence, there is no direct (i.e. time-wise) correspondence between a coarse-scale climate simulation and a reanalysis dataset. To overcome this problem, we formulate a special protocol for the development of suitable training datasets. The idea is to nudge the coarse-scale climate model towards the reanalysis data with a very weak penalization term. The result is a coarse-scale simulation that remains close to the reanalysis dataset over long time scales and is therefore appropriate for training. Using the nudged coarse-scale climate dataset as input and the reanalysis dataset as output, we machine-learn a correction operator in the form of an LSTM RNN.
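The nudging idea can be illustrated on a toy problem. The sketch below is not the climate setup from the talk: it assumes the Lorenz-63 system as a stand-in for both the "reanalysis" (rho = 28) and an imperfect "coarse model" (rho = 27), and it adds a relaxation term -(x - x_ref)/tau to the model equations. The function names, the parameter values and the relaxation time tau are all hypothetical choices made for this toy; the abstract does not give the actual nudging strength.

```python
import math

def lorenz(state, rho, sigma=10.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz-63 system (toy chaotic model)."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)

def rk4_step(state, dt, rho, target=None, tau=None):
    """One RK4 step; if target is given, add the nudging term -(s - target)/tau.

    The target is held fixed over the step (an assumption of this sketch).
    """
    def rhs(s):
        f = lorenz(s, rho)
        if target is None:
            return f
        return tuple(fi - (si - ti) / tau for fi, si, ti in zip(f, s, target))

    def shifted(s, k, h):
        return tuple(si + h * ki for si, ki in zip(s, k))

    k1 = rhs(state)
    k2 = rhs(shifted(state, k1, dt / 2))
    k3 = rhs(shifted(state, k2, dt / 2))
    k4 = rhs(shifted(state, k3, dt))
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )

def run(n_steps=20000, dt=0.005, tau=0.1):
    """Return time-averaged distance to the reference for free and nudged runs."""
    ref = (1.0, 1.0, 20.0)              # stand-in for the reanalysis trajectory
    free = nudged = (-3.0, 2.0, 25.0)   # imperfect model, different initial state
    err_free = err_nudged = 0.0
    for _ in range(n_steps):
        nudged = rk4_step(nudged, dt, rho=27.0, target=ref, tau=tau)
        free = rk4_step(free, dt, rho=27.0)
        ref = rk4_step(ref, dt, rho=28.0)
        err_free += math.dist(free, ref)
        err_nudged += math.dist(nudged, ref)
    return err_free / n_steps, err_nudged / n_steps

err_free, err_nudged = run()
print(f"mean distance to reference: free={err_free:.2f}, nudged={err_nudged:.2f}")
```

In this toy, the free-running model decorrelates from the reference, while the nudged run stays time-aligned with it. That alignment is what makes (nudged input, reference output) pairs usable as training data for a correction operator.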
We subsequently assess the performance of the correction operator on free-running coarse-scale climate simulations and show that it is indeed able to correct statistical errors, which primarily appear in the extreme-event regions of the temperature and humidity pdfs. The second part aims to produce small-scale features (i.e. high-resolution outputs) from coarse-scale inputs. Using reanalysis data for a spatial region of interest, we train a machine-learning scheme that naturally ‘splits’ the small scales into a predictable part, which can be effectively parametrized in terms of the large-scale features, and a stochastic residual, which cannot be uniquely determined using the large-scale information. The latter is represented using a conditionally Gaussian process, a choice that allows us to overcome the need for a vast amount of training data, which, for climate applications, is naturally limited to a single realization for each spatial location. Using a second round of machine learning we parametrize, for each location, the covariance of the stochastic component in terms of the large scales. We employ the machine-learned statistics to parsimoniously reconstruct random realizations of the small scales. We demonstrate the approach on reanalysis data involving vorticity over Western Europe and show that the reconstructed random samples of the small scales agree well with the spatial spectrum, single-point probability density functions, and temporal spectral content. The first part of this work is in collaboration with Alexis-Charalampopoulos (MIT), Dr. Ruby Leung (PNNL) and Dr. Shixuan Zhang (PNNL), and is supported by the DARPA-ACTM program. The second part is in collaboration with Dr. Zong Yi Wan (MIT), Dr. Boyko Dodov (Verisk Analytics) and Dr. Antoine Blanchard (Verisk Analytics), and is supported by Verisk Analytics.
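The split of the small scales into a predictable part and a conditionally Gaussian residual can be sketched in one dimension. The example below is only an illustration under strong assumptions. The data are synthetic, and ordinary least squares stands in for the machine-learning scheme. The residual variance is assumed to be linear in the squared large-scale value, and a scalar variance replaces the per-location covariance. All names (fit_line, sample_small, c0, c1) are hypothetical.

```python
import math
import random

def fit_line(xs, ys):
    """Ordinary least-squares fit ys ~ slope * xs + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

rng = random.Random(0)

# Synthetic "training data": the small scale has a part predictable from the
# large scale (0.8 * L) and a residual whose variance depends on L.
large = [rng.gauss(0.0, 1.0) for _ in range(20000)]
small = [0.8 * L + math.sqrt(0.1 + 0.3 * L * L) * rng.gauss(0.0, 1.0) for L in large]

# Round 1: learn the predictable part (conditional mean) from the large scale.
slope, intercept = fit_line(large, small)

# Round 2: learn the residual variance as a function of the large scale,
# here by regressing squared residuals on L^2.
resid_sq = [(s - (slope * L + intercept)) ** 2 for L, s in zip(large, small)]
c1, c0 = fit_line([L * L for L in large], resid_sq)

def sample_small(L, rng):
    """Draw one random small-scale realization conditioned on the large scale L."""
    var = max(c0 + c1 * L * L, 1e-12)  # guard against a slightly negative fit
    return slope * L + intercept + math.sqrt(var) * rng.gauss(0.0, 1.0)

# Usage: several random realizations for the same large-scale input.
draws = [sample_small(1.5, rng) for _ in range(5)]
print(slope, intercept, c0, c1)
print(draws)
```

The Gaussian assumption is what keeps the data requirement modest: only a conditional mean and a conditional (co)variance have to be learned, rather than the full conditional distribution of the small scales.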