Published August 1, 2022 | Supplemental Material
Journal Article Open

Exploring Deep Learning of Quantum Chemical Properties for Absorption, Distribution, Metabolism, and Excretion Predictions

Abstract

Quantum mechanical (QM) descriptors of small molecules have wide applicability in understanding organic reactivity and molecular properties, but the substantial compute cost required for ab initio QM calculations limits their broad usage. Here, we investigate the use of deep learning for predicting QM descriptors, with the goal of enabling usage of near-QM accuracy electronic properties on large molecular data sets such as those seen in drug discovery. Several deep learning approaches have previously been benchmarked on a published data set called QM9, where 12 ground-state properties have been calculated for molecules with up to nine heavy atoms, limited to C, H, N, O, and F elements. To advance the work beyond the QM9 chemical space and enable application to molecules encountered in drug discovery, we extend the QM9 data set by creating a QM9-extended data set covering an additional ∼20,000 molecules containing S and Cl atoms. Using this extended set, we generate new deep learning models and leverage ANI-2x models to provide predictions on larger, more diverse molecules common in drug discovery, and we find the models estimate 11 of 12 ground-state properties reasonably. We use the predicted QM descriptors to augment graph convolutional neural network (GCNN) models for selected ADME end points (rat microsomal clearance, hepatic clearance, total clearance, and P-glycoprotein efflux) and find varying degrees of performance improvement compared to nonaugmented GCNN models, including pronounced improvement in P-glycoprotein efflux prediction.

Additional Information

© 2022 American Chemical Society. Received 1 March 2022. Published online 27 June 2022. Data and Software Availability: ChEMBL data sets and computed descriptors are available in the Supporting Information. This work also leverages proprietary data sets from Merck & Co. (Kenilworth, NJ) to provide higher confidence conclusions. Software used to train models is freely available from Yang et al. (14) at https://github.com/chemprop. Software used for identifying low energy 3D conformations is available from the Chemical Computing Group (Montreal, Canada). We thank our computational and structural chemistry colleagues for feedback on the work. This work was supported in full by Merck Sharp & Dohme Corp., a subsidiary of Merck & Co., Inc., Kenilworth, NJ, USA. Author Contributions. M. A. Lim and S. Yang have contributed equally. All authors contributed to the research and the writing of the manuscript and have given approval to the final version of the manuscript. The authors declare no competing financial interest.

Attached Files

Supplemental Material - ci2c00245_si_001.pdf

Supplemental Material - ci2c00245_si_002.xlsx

Supplemental Material - ci2c00245_si_003.xlsx

Files

Files (20.4 MB)
md5:4374fc805dba8266c9806dd666c60573, 498.9 kB
md5:91bd98dfe84778a62226cca8af94e9dd, 74.1 kB
md5:eec096cc3f2266cc85d8763075f0c4e0, 19.9 MB

Additional details

Created: August 22, 2023
Modified: October 24, 2023