Learning reaction coordinates and finding optimal model parameters from sampled trajectory ensembles

This was part of Learning Collective Variables and Coarse Grained Models

Peter Bolhuis, University of Amsterdam

Wednesday, April 24, 2024

Abstract: The reaction coordinate (RC) is the principal collective variable or feature that determines the progress along an activated or reactive process. A good RC is crucial for generating sufficient statistics with enhanced sampling. Moreover, the RC provides invaluable atomistic insight into the process under study. The optimal RC is the committor, which can be computed with brute-force MD, or more efficiently by, e.g., Transition Path Sampling (TPS). Novel schemes for transition path sampling using reinforcement learning can now effectively map the committor function. However, because the committor is a high-dimensional function, its interpretability remains very low. Applying dimensionality reduction can reveal the RC in terms of low-dimensional, human-understandable molecular collective variables (CVs) or order parameters.

In the first part, I discuss several methods to perform this dimensionality reduction, such as likelihood maximization or symbolic regression, but they usually require a preselection of these low-dimensional CVs. In addition, we apply an extended autoencoder that maps the input (many CVs) onto a lower-dimensional latent space, which is used for the reconstruction of the input as well as the prediction of the committor [1]. I illustrate the method on simple but nontrivial toy systems, as well as on extensive molecular simulation data of methane hydrate nucleation. The extended autoencoder model can effectively extract the underlying mechanism of a reaction, make reliable predictions about the committor of a given configuration, and potentially even generate new paths representative of a reaction.

In the second part, I focus on a general framework for imposing known rate constants as constraints in molecular dynamics simulations, based on a combination of the maximum-entropy (MaxEnt) and maximum-caliber (MaxCal) principles. Starting from an existing ensemble of (rare event) dynamical trajectories or paths, e.g. obtained from TPS, each path is reweighted in order to match the calculated and experimental interconversion rates of a molecular transition of interest, while minimally perturbing the prior path distribution [2]. This kinetically corrected ensemble of trajectories leads to improved structure, kinetics and thermodynamics. One also gains mechanistic insight that may not be readily evident directly from the experiments. This method does not alter the Hamiltonian directly, and therefore we recently proposed a novel MaxCal-based path-reweighting technique to optimize parameters in the molecular model itself, while constraining kinetic observables [3]. This opens up the possibility to design molecular models that lead to desired kinetic behaviour.

[1] M. Frassek, A. Arjun, and P. G. Bolhuis, J. Chem. Phys. 155, 064103 (2021).
[2] Z. F. Brotzakis, M. Vendruscolo, and P. G. Bolhuis, Proc. Natl. Acad. Sci. 118 (2021).
[3] P. G. Bolhuis, Z. F. Brotzakis, and B. G. Keller, J. Chem. Phys. 159, 074102 (2023).
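
As an illustration of the path reweighting described in the second part of the abstract, the following is a minimal sketch of the generic maximum-entropy form: paths drawn from a prior ensemble receive weights proportional to exp(-lambda * s_i), where s_i is a per-path observable and the Lagrange multiplier lambda is tuned until the reweighted average matches a target value. This exponential tilt is the distribution closest to the prior (in the Kullback-Leibler sense) that satisfies the constraint. The function name `maxent_reweight`, the toy observable (an indicator that is 1 for reactive paths and 0 otherwise) and the target value are assumptions made for illustration; this is not the rate-constraint formulation of [2] or [3].

```python
import numpy as np

def maxent_reweight(observable, target, prior=None,
                    lam_lo=-50.0, lam_hi=50.0, tol=1e-10, max_iter=200):
    """Hypothetical helper: return (weights, lam) with
    weights_i proportional to prior_i * exp(-lam * observable_i)
    and sum_i weights_i * observable_i == target (up to tolerance)."""
    s = np.asarray(observable, dtype=float)
    if prior is None:
        p0 = np.full(s.shape, 1.0 / s.size)   # uniform prior over sampled paths
    else:
        p0 = np.asarray(prior, dtype=float)   # assumed strictly positive
        p0 = p0 / p0.sum()

    def weights(lam):
        logw = np.log(p0) - lam * s
        logw -= logw.max()                    # avoid overflow in exp
        w = np.exp(logw)
        return w / w.sum()

    def mean(lam):
        return float(np.dot(weights(lam), s))

    # The reweighted mean decreases monotonically with lam
    # (its derivative is minus the variance), so bisection applies.
    if not (mean(lam_lo) > target > mean(lam_hi)):
        raise ValueError("target is not bracketed by [lam_lo, lam_hi]")
    lo, hi = lam_lo, lam_hi
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if mean(mid) > target:
            lo = mid                          # need a larger lam to lower the mean
        else:
            hi = mid
        if hi - lo < tol:
            break
    lam = 0.5 * (lo + hi)
    return weights(lam), lam

# Toy prior ensemble: 1000 paths, 100 of them reactive (prior fraction 0.1).
s = np.concatenate([np.ones(100), np.zeros(900)])
# Reweight so that the reactive fraction becomes 0.2 (assumed target).
w, lam = maxent_reweight(s, target=0.2)
```

In this toy case all reactive paths end up with one common weight and all non-reactive paths with another, so the reweighting changes only the relative population of the two classes. With a continuous observable, the same code shifts weight smoothly toward paths with smaller or larger values of s_i, depending on the sign of lambda.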