This workshop will focus on mathematical foundations and methodological developments in kernel methods for efficiently learning and predicting complex systems. Topics of interest will encompass probabilistic approaches to prediction, integration, optimization, approximate inference, and how these can be leveraged in the design of real and computer experiments.
Funding
NOTE: All funding for this workshop has been allocated.
Poster Session
This workshop will include a poster session. In order to propose a poster, you must first register for the workshop, and then submit a proposal using the form that will become available on this page after you register. The registration form should not be used to propose a poster.
Due to high demand, the poster proposal deadline has been extended to January 31, 2025; posters submitted by January 31, 2025 will be guaranteed consideration. If your proposal is accepted, you should plan to attend the event in-person.
The general topic of this talk is Bayesian adaptive learning of excursion sets defined from a costly black-box model. This research field has received much attention in recent decades. During this talk, we will first review Gaussian process regression for feasible set estimation in the setting where the set to recover is defined by a numerical model with scalar outputs. We will show that the usual adaptive sampling criteria may lack robustness, e.g., when the set to recover has several connected components. We will then address more complex settings, such as the presence of uncertainties or the case of numerical models with vector outputs.
9:50-10:05 CDT
Q&A
10:05-10:35 CDT
Coffee Break
10:35-11:10 CDT
Data-Efficient Kernel Methods for Discovering Differential Equations and Their Solution Operators
Speaker: Houman Owhadi (Caltech)
We introduce a kernel-based framework for inferring ordinary and partial differential equations from sparse, partial observations of solution-source pairs. The proposed approach comes with simple and transparent convergence guarantees and a priori error estimates. This presentation is based on joint work with Bamdad Hosseini, Alexander Hsu, Yasamin Jalalian, and Juan Osorio.
11:10-11:25 CDT
Q&A
11:25-12:00 CDT
Learning linear operators: Infinite-dimensional regression as a well-behaved non-compact inverse problem
Speaker: Tim Sullivan (University of Warwick)
We consider the problem of learning a linear operator θ between two Hilbert spaces from empirical observations, which we interpret as least squares regression in infinite dimensions. We show that this goal can be reformulated as an inverse problem for θ with the feature that its forward operator is generally non-compact (even if θ is assumed to be compact or of p-Schatten class). However, we prove that, in terms of spectral properties and regularisation theory, this inverse problem is equivalent to the known compact inverse problem associated with scalar response regression. Our framework allows for the elegant derivation of dimension-free rates for generic learning algorithms under Hölder-type source conditions. The proofs rely on the combination of techniques from kernel regression with recent results on concentration of measure for sub-exponential Hilbertian random variables. The obtained rates hold for a variety of practically-relevant scenarios in functional regression as well as nonlinear regression with operator-valued kernels and match those of classical kernel regression with scalar response.
12:00-12:15 CDT
Q&A
12:15-13:45 CDT
Lunch Break
13:45-14:20 CDT
TBA
Speaker: Nathan Kirk (Illinois Institute of Technology)
14:20-14:35 CDT
Q&A
14:35-15:10 CDT
Distributional encoding for Gaussian process regression with qualitative inputs
Speaker: Sébastien Da Veiga (ENSAI)
Gaussian process (GP) regression is a popular and sample-efficient approach for many engineering applications, where observations are expensive to acquire, and is also a central ingredient of Bayesian optimization (BO), a widely used method for the optimization of black-box functions. However, when all or some input variables are categorical, building a predictive and computationally efficient GP remains challenging. Starting from the naive target encoding idea, where the original categorical values are replaced with the mean of the target variable for that category, we propose a generalization based on distributional encoding (DE), which makes use of all the samples of the target variable for a category. To handle this type of encoding inside the GP, we build upon recent results on characteristic kernels for probability distributions, based on the maximum mean discrepancy and the Wasserstein distance. We validate our approach empirically and demonstrate state-of-the-art predictive performance on a variety of synthetic and real-world datasets. DE is naturally complementary to recent advances in BO over discrete and mixed spaces and easily generalizes to multi-task settings and classification problems.
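As a rough illustration of the encoding idea described in this abstract: naive target encoding replaces each category with the mean of its target values, while distributional encoding keeps all target samples for the category and compares categories through a distance between their empirical distributions. The sketch below is a minimal version of that idea, assuming an RBF base kernel; the names `target_encode`, `mmd_sq`, and `category_kernel` are illustrative and not taken from the paper.

```python
import numpy as np

def target_encode(categories, y):
    """Naive target encoding: map each category to the mean of its targets."""
    return {c: y[categories == c].mean() for c in np.unique(categories)}

def mmd_sq(a, b, gamma=1.0):
    """Squared maximum mean discrepancy between two 1-d samples, RBF kernel."""
    def k(x, z):
        return np.exp(-gamma * (x[:, None] - z[None, :]) ** 2)
    return k(a, a).mean() + k(b, b).mean() - 2.0 * k(a, b).mean()

def category_kernel(samples_u, samples_v, lengthscale=1.0):
    """Kernel between two categories via the MMD between their target samples."""
    return np.exp(-mmd_sq(samples_u, samples_v) / lengthscale**2)
```

A full GP would plug a kernel like `category_kernel` into the covariance over the qualitative inputs; the Wasserstein variant mentioned in the abstract would swap `mmd_sq` for a Wasserstein distance.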
15:10-15:25 CDT
Q&A
15:30-16:30 CDT
Social Hour
Tuesday, April 1, 2025
8:30-9:00 CDT
Sign-in/Breakfast
9:00-9:35 CDT
Triangulation candidates for Bayesian optimization
Speaker: Robert Gramacy (Virginia Polytechnic Institute & State University (Virginia Tech))
Bayesian optimization involves "inner optimization" over a new-data acquisition criterion which is non-convex/highly multi-modal, may be non-differentiable, or may otherwise thwart local numerical optimizers. In such cases it is common to replace continuous search with a discrete one over random candidates. Here we propose using candidates based on a Delaunay triangulation of the existing input design. We detail the construction of these "tricands" and demonstrate empirically how they outperform both numerically optimized acquisitions and random candidate-based alternatives, and are well-suited for hybrid schemes, on benchmark synthetic and real simulation experiments.
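The core construction can be approximated in a few lines: triangulate the existing input design with Delaunay and place candidates at the simplex centroids. This is only a minimal sketch of the idea (the "tricands" of the talk also involve boundary candidates and other refinements); `triangulation_candidates` is an illustrative name.

```python
import numpy as np
from scipy.spatial import Delaunay

def triangulation_candidates(X):
    """Candidate points at the centroids of the Delaunay simplices of design X."""
    tri = Delaunay(X)
    return X[tri.simplices].mean(axis=1)  # one centroid per simplex

rng = np.random.default_rng(0)
X = rng.uniform(size=(12, 2))            # existing 2-d input design
cands = triangulation_candidates(X)      # discrete candidate set for acquisition search
```

An acquisition function would then be evaluated only on `cands` instead of being numerically optimized over the continuous input space.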
9:35-9:50 CDT
Q&A
9:50-10:20 CDT
Coffee Break
10:20-10:55 CDT
Latent Variable Gaussian Process (LVGP) for Adaptive, Interpretable, and Multi-Fidelity Design of Emerging Materials and Structures
Speaker: Wei Chen (Northwestern University)
Engineering design often involves qualitative and quantitative design variables, which requires systematic methods for the exploration of these mixed-variable design spaces. Existing machine learning (ML) models that can handle mixed variables as inputs require a large amount of data but do not provide the uncertainty quantification that is crucial for sequential (adaptive) design of experiments. We have developed a novel Latent Variable Gaussian Process (LVGP) based ML approach that involves a latent variable (LV) representation of qualitative inputs, and automatically discovers a categorical-to-numerical nonlinear map that transforms the underlying high-dimensional physical attributes into the LV space. The nonlinear mapping also provides an inherent ordering and structure for the levels of the qualitative factor(s), which leads to substantial insight and interpretable ML. In addition, LVGP provides uncertainty quantification of predictions, which is critical for adaptive sampling that sequentially chooses samples based on current observations; the method also integrates easily with Bayesian optimization (BO) and other reinforcement learning strategies for design optimization. We will demonstrate the benefits of the LVGP approach using designs of emerging microstructural and metamaterials systems as examples. Furthermore, the LVGP approach has been extended to non-hierarchical multi-fidelity modeling and adaptive sampling to benefit a wide range of engineering problems that involve multi-fidelity or multi-modal data fusion.
10:55-11:10 CDT
Q&A
11:10-11:40 CDT
Spotlight poster presentations 1
11:40-13:20 CDT
Lunch and poster session 1
13:20-13:55 CDT
Robust optimal sensor placement for Bayesian inverse problems governed by PDEs
Speaker: Alen Alexanderian (North Carolina State University (NCSU))
We consider optimal design of sensor networks for nonlinear Bayesian inverse problems governed by partial differential equations (PDEs). An optimal design is one that optimizes the statistical quality of the solution of the inverse problem. The computed optimal design, however, depends on the modeling assumptions encoded in the governing PDEs or the parameterization of the observation error model. If some of these elements are subject to considerable uncertainties, it is prudent to follow a robust optimal experimental design (ROED) approach. We follow a worst-case-scenario ROED approach and develop a scalable computational framework that is suitable for the class of inverse problems under study. Our approach incorporates a probabilistic optimization paradigm for the resulting combinatorial max-min optimization problem. We focus on Bayesian ROED, where the goal is to maximize information gain in the presence of uncertainties in the measurement error model. The proposed approach is illustrated in the context of optimal sensor placement for a coefficient inverse problem governed by an elliptic PDE.
13:55-14:10 CDT
Q&A
14:10-14:40 CDT
Coffee Break
14:40-15:15 CDT
Fast data inversion for high-dimensional dynamical systems from noisy measurements
Speaker: Mengyang Gu (University of California, Santa Barbara)
In this work, we develop a scalable approach for a flexible latent factor model for high-dimensional dynamical systems. Each latent factor process has its own correlation and variance parameters, and the orthogonal factor loading matrix can be either fixed or estimated. We utilize an orthogonal factor loading matrix that avoids computing the inversion of the posterior covariance matrix at each time step of the Kalman filter, and derive closed-form expressions in an expectation-maximization algorithm for parameter estimation, which substantially reduces the computational complexity without approximation. Our study is motivated by inversely estimating slow slip events from geodetic data, such as continuous GPS measurements. Extensive simulation studies illustrate the higher accuracy and scalability of our approach compared to alternatives. By applying our method to geodetic measurements in the Cascadia region, our estimated slip better agrees with independently measured seismic data of tremor events. The substantial acceleration from our method enables the use of massive noisy data for geological hazard quantification and other applications.
15:15-15:30 CDT
Q&A
15:30-16:05 CDT
Subspace accelerated measure transport method for sequential experimental design
Speaker: Karina Koval (University of Heidelberg)
We focus on sequential optimal experimental design (sOED) for Bayesian inverse problems, where the objective is to select experimental conditions that maximize the incremental expected information gain (iEIG) between successive posterior distributions. When the posterior is non-Gaussian, this task becomes analytically intractable and computationally expensive, especially in high-dimensional settings with costly forward models. To address these challenges, we propose a scalable framework for approximately solving the sOED problem using a sharp upper bound on the iEIG. This bound serves as a guide for sOED and is efficiently evaluated through conditional measure transport combined with likelihood-informed subspace methods. We demonstrate the effectiveness of our approach through numerical examples.
16:05-16:20 CDT
Q&A
Wednesday, April 2, 2025
8:30-9:00 CDT
Sign-in/Breakfast
9:00-9:35 CDT
TBA
Speaker: Youssef Marzouk (MIT Center for Computational Science and Engineering)
9:35-9:50 CDT
Q&A
9:50-10:20 CDT
Coffee Break
10:20-10:55 CDT
TBA
Speaker: Victor Picheny (Secondmind)
10:55-11:10 CDT
Q&A
11:10-11:40 CDT
Spotlight poster presentations 2
11:40-13:20 CDT
Lunch and Poster session 2
13:20-13:55 CDT
Respecting the boundaries: Space-filling designs for surrogate modeling with boundary information
Speaker: Simon Mak (Duke University)
Gaussian process (GP) surrogates are widely used for emulating expensive computer simulators, and have led to important advances in science and engineering. One challenge with fitting such surrogates is the costly generation of training data, which can require thousands of CPU hours per run. Recent promising work has investigated the integration of known boundary information within GP surrogates, which can greatly reduce its training sample size and thus its computational cost. There is, however, little work exploring the critical question of how such simulation experiments should be designed given boundary information. We propose here a new class of space-filling designs, called boundary maximin designs, for effective GP surrogate modeling with boundary information. Our designs rely on a new space-filling criterion that is derived from the asymptotic D-optimal designs of the boundary GPs from Vernon et al. (2019) and Ding et al. (2019), which can incorporate a broad class of known boundary information. To account for effect sparsity, we further propose a new boundary maximum projection design that jointly integrates boundary information and ensures good projective properties. Numerical experiments show the improved surrogate performance of boundary-integrated GPs using the proposed boundary maximin designs compared to the state-of-the-art.
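For orientation, a plain (non-boundary) maximin design can be built greedily by farthest-point selection: repeatedly add the candidate point farthest from the current design, which pushes up the minimum pairwise distance. The boundary maximin criterion in this talk augments such a criterion with the D-optimality structure of the boundary GPs; the sketch below is only the generic baseline, and `greedy_maximin` is an illustrative name.

```python
import numpy as np

def greedy_maximin(cands, n):
    """Greedy maximin design: start from the first candidate, then repeatedly
    add the candidate whose minimum distance to the current design is largest."""
    design = [cands[0]]
    for _ in range(n - 1):
        D = np.array(design)
        # minimum distance from every candidate to the current design
        dists = np.linalg.norm(cands[:, None] - D[None], axis=-1).min(axis=1)
        design.append(cands[int(np.argmax(dists))])
    return np.array(design)

rng = np.random.default_rng(1)
cands = rng.uniform(size=(200, 2))   # candidate pool in the unit square
design = greedy_maximin(cands, 8)    # 8-point space-filling design
```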
13:55-14:10 CDT
Q&A
14:10-14:40 CDT
Coffee Break
14:40-15:15 CDT
TBA
Speaker: Natalie Maus (University of Pennsylvania)
15:15-15:30 CDT
Q&A
Thursday, April 3, 2025
8:30-9:00 CDT
Sign-in/Breakfast
9:00-9:45 CDT
Probabilistic Learning on Manifolds (PLoM) with Transient Diffusion Kernels
Speaker: Roger Ghanem (University of Southern California)
9:45-10:00 CDT
Q&A
10:00-10:30 CDT
Coffee Break
10:30-11:05 CDT
TBA
Speaker: Matthieu Darcy (Caltech)
11:05-11:20 CDT
Q&A
11:20-11:55 CDT
TBA
Speaker: Mirjeta Pasha (Virginia Tech)
11:55-12:10 CDT
Q&A
12:10-14:00 CDT
Lunch and Mentoring Lunch (invitation only)
14:00-14:35 CDT
TBA
Speaker: Andrew Duncan (Imperial College)
14:35-14:50 CDT
Q&A
14:50-15:05 CDT
Time for Workshop survey to be completed on-site
15:05-15:35 CDT
Coffee Break
15:35-16:10 CDT
Column and Row Subset Selection using Nuclear Scores
Speaker: Michael Lindsey (University of California, Berkeley (UC Berkeley))
Column selection is an essential tool for low-rank approximation with wide-ranging applications including kernel approximation and experimental design. We present a framework for fast, efficient, and theoretically guaranteed column selection. In particular, we present a sparsity-exploiting deterministic algorithm for Nyström approximation and a randomized matrix-free formalism that is well-adapted to sparse interpolative decompositions and graph kernel approximation. We bound the performance of our algorithms favorably relative to the expected performance of determinantal point process (DPP) sampling, which represents a theoretical gold standard that is difficult to realize practically. We illustrate strong real-world performance of our algorithms on a diverse set of example approximation tasks. Time permitting, we also present extensions to tensor interpolative decompositions, as well as a framework for graph Laplacian reduction, or reduced order modeling of Markov chains, based on column selection.
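As a concrete baseline for the column-selection problem discussed here, a greedy diagonally pivoted partial Cholesky repeatedly picks the column with the largest residual diagonal entry, yielding a Nyström-type low-rank approximation K ≈ F Fᵀ. This is a standard method, not the nuclear-score algorithm of the talk, and `pivoted_cholesky_select` is an illustrative name.

```python
import numpy as np

def pivoted_cholesky_select(K, k, tol=1e-12):
    """Greedy pivoted partial Cholesky of a PSD kernel matrix K.

    Selects up to k column indices by largest residual diagonal and returns
    (idx, F) with K approximately equal to F @ F.T."""
    n = K.shape[0]
    d = np.diag(K).astype(float)       # residual diagonal
    F = np.zeros((n, k))
    idx = []
    for j in range(k):
        i = int(np.argmax(d))
        if d[i] < tol:                 # numerically rank-deficient: stop early
            break
        idx.append(i)
        F[:, j] = (K[:, i] - F[:, :j] @ F[i, :j]) / np.sqrt(d[i])
        d = np.maximum(d - F[:, j] ** 2, 0.0)
    return idx, F[:, :len(idx)]

# Usage on a well-conditioned Gaussian kernel matrix (illustrative data)
x = np.arange(8.0)
K = np.exp(-(x[:, None] - x[None, :]) ** 2)
idx, F = pivoted_cholesky_select(K, 8)
```

With k equal to the numerical rank the factorization is exact up to roundoff; smaller k gives a cheap column-based low-rank approximation.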
16:10-16:25 CDT
Q&A
Friday, April 4, 2025
8:30-9:00 CDT
Sign-in/Breakfast
9:00-9:35 CDT
Rational Kriging
Speaker: Roshan Joseph (Georgia Institute of Technology)
I will talk about a new kriging method that has a rational form. It is shown that the generalized least squares estimator of the mean from rational kriging is much better behaved than that of ordinary kriging. Parameter estimation and uncertainty quantification for rational kriging are proposed using a Gaussian process framework. I will also discuss a generalized version of rational kriging, which includes ordinary and rational kriging as special cases. Extensive simulations carried out over a wide class of functions show that generalized rational kriging performs on par with or better than both ordinary and rational kriging in terms of prediction and uncertainty quantification. The potential applications of the new kriging methods in the emulation and calibration of computationally expensive models will be illustrated with real and simulated examples. The paper can be downloaded from https://doi.org/10.1080/01621459.2024.2356296 .
9:35-9:50 CDT
Q&A
9:50-10:20 CDT
Coffee Break
10:20-10:55 CDT
TBA
Speaker: Chih-Li Sung (Michigan State University)
10:55-11:10 CDT
Q&A
11:10-11:45 CDT
Return of the Latent Space Cowboys: Rethinking the use of VAEs in Bayesian Optimisation over Structured Spaces
Speaker: Henry Moss (University of Cambridge (UK) and Lancaster University (UK))
IMSI is committed to making all of our programs and events inclusive and accessible. Contact [email protected] to request disability-related accommodations.
In order to register for this workshop, you must have an IMSI account and be logged in. Please use one of the buttons below to login or create an account.