Published September 14, 2022 | public
Journal Article

Molecular dipole moment learning via rotationally equivariant derivative kernels in molecular-orbital-based machine learning

Abstract

This study extends the accurate and transferable molecular-orbital-based machine learning (MOB-ML) approach to modeling the contribution of electron correlation to dipole moments at the cost of Hartree–Fock computations. A MOB pairwise decomposition of the correlation part of the dipole moment is applied, and the resulting pair dipole moments are then regressed as a universal function of molecular orbitals (MOs). The dipole MOB features consist of the energy MOB features and their responses to electric fields. An interpretable and rotationally equivariant derivative kernel for Gaussian process regression (GPR) is introduced to learn the dipole moment more efficiently. The proposed problem setup, feature design, and ML algorithm are shown to provide highly accurate models for both dipole moments and energies on water and 14 small molecules. To demonstrate the ability of MOB-ML to function as generalized density-matrix functionals for molecular dipole moments and energies of organic molecules, we further apply the proposed MOB-ML approach to train and test on molecules from the QM9 dataset. Applying local scalable GPR with Gaussian mixture model unsupervised clustering scales MOB-ML up to a large-data regime while retaining prediction accuracy. In addition, compared with literature results, MOB-ML provides the best test mean absolute errors of 4.21 mD and 0.045 kcal/mol for the dipole moment and energy models, respectively, when training on 110 000 QM9 molecules. The excellent transferability of the resulting QM9 models is also illustrated by accurate predictions for four different series of peptides.
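
The idea of a rotationally equivariant derivative kernel can be illustrated with a small generic sketch. This is not the authors' implementation: it assumes a plain RBF kernel over rotation-invariant features, a hypothetical Jacobian `J` holding the response of each feature to the three electric-field components, and toy random data. All function names (`rbf_mixed_hessian`, `derivative_kernel`, `fit`, `predict`) are illustrative only.

```python
import numpy as np

def rbf_mixed_hessian(f1, f2, length=1.0):
    """Mixed second derivative d^2 k / (d f1 d f2) of the RBF kernel
    k(f1, f2) = exp(-|f1 - f2|^2 / (2 * length^2))."""
    d = f1 - f2
    k = np.exp(-(d @ d) / (2.0 * length**2))
    return k * (np.eye(d.size) / length**2 - np.outer(d, d) / length**4)

def derivative_kernel(f1, J1, f2, J2, length=1.0):
    """3x3 covariance block between two vector-valued targets.
    J has shape (n_features, 3): assumed feature response to the field."""
    return J1.T @ rbf_mixed_hessian(f1, f2, length) @ J2

def fit(F, J, mu, length=1.0, noise=1e-4):
    """Solve (K + noise * I) alpha = mu for the stacked 3N-vector of targets."""
    n = len(F)
    K = np.zeros((3 * n, 3 * n))
    for a in range(n):
        for b in range(n):
            K[3*a:3*a+3, 3*b:3*b+3] = derivative_kernel(F[a], J[a], F[b], J[b], length)
    return np.linalg.solve(K + noise * np.eye(3 * n), mu.reshape(-1))

def predict(f_new, J_new, F, J, alpha, length=1.0):
    """Predict a 3-vector for a new sample from the fitted weights."""
    blocks = [derivative_kernel(f_new, J_new, F[b], J[b], length) for b in range(len(F))]
    return np.hstack(blocks) @ alpha

# Toy data (random numbers, purely illustrative; not chemical data).
rng = np.random.default_rng(0)
F = rng.normal(size=(5, 4))       # 5 samples, 4 invariant features
J = rng.normal(size=(5, 4, 3))    # assumed field responses of the features
mu = rng.normal(size=(5, 3))      # vector-valued training targets
alpha = fit(F, J, mu)

# Rotating a test sample rotates each row of its Jacobian (J -> J @ R.T),
# and the prediction rotates with it.
t = 0.7
R = np.array([[np.cos(t), -np.sin(t), 0.0],
              [np.sin(t),  np.cos(t), 0.0],
              [0.0,        0.0,       1.0]])
f_test, J_test = rng.normal(size=4), rng.normal(size=(4, 3))
p = predict(f_test, J_test, F, J, alpha)
p_rot = predict(f_test, J_test @ R.T, F, J, alpha)
```

In this sketch the equivariance follows directly from the kernel's structure: the rotation acts only on the Jacobian factor, so each 3x3 block is multiplied by `R` and the predicted vector rotates accordingly, without any data augmentation.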

Additional Information

We thank Vignesh Bhethanabotla for his help in improving the quality of the manuscript. T.F.M. acknowledges support from the U.S. Army Research Laboratory (Grant No. W911NF-12-2-0023), the U.S. Department of Energy (Grant No. DE-SC0019390), the Caltech DeLogi Fund, and the Camille and Henry Dreyfus Foundation (Award No. ML-20-196). Computational resources were provided by the National Energy Research Scientific Computing Center (NERSC), a DOE Office of Science User Facility supported by the DOE Office of Science under Contract No. DE-AC02-05CH11231.

Additional details

Created: August 22, 2023
Modified: October 24, 2023