Exploring Deep Learning of Quantum Chemical Properties for Absorption, Distribution, Metabolism, and Excretion Predictions
Abstract
Quantum mechanical (QM) descriptors of small molecules have wide applicability in understanding organic reactivity and molecular properties, but the substantial compute cost of ab initio QM calculations limits their broad use. Here, we investigate the use of deep learning for predicting QM descriptors, with the goal of making near-QM-accuracy electronic properties available for large molecular data sets such as those seen in drug discovery. Several deep learning approaches have previously been benchmarked on a published data set called QM9, in which 12 ground-state properties have been calculated for molecules with up to nine heavy atoms, limited to the elements C, H, N, O, and F. To advance the work beyond the QM9 chemical space and enable application to molecules encountered in drug discovery, we extend the QM9 data set by creating a QM9-extended data set covering an additional ∼20,000 molecules containing S and Cl atoms. Using this extended set, we generate new deep learning models and also leverage ANI-2x models to provide predictions on larger, more diverse molecules common in drug discovery, and we find that the models reasonably estimate 11 of the 12 ground-state properties. We use the predicted QM descriptors to augment graph convolutional neural network (GCNN) models for selected ADME end points (rat microsomal clearance, hepatic clearance, total clearance, and P-glycoprotein efflux) and find varying degrees of performance improvement over nonaugmented GCNN models, including a pronounced improvement in P-glycoprotein efflux prediction.
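The abstract describes augmenting GCNN models with predicted QM descriptors but does not spell out the architecture. The sketch below shows one common way such augmentation can work, under stated assumptions: a learned graph embedding is concatenated with standardized molecule-level descriptor values before the output layer. It is a toy illustration in NumPy with random weights, not the authors' model and not Chemprop's directed message-passing network. The function names (`message_passing_embedding`, `augmented_prediction`), the descriptor choice (HOMO, LUMO, dipole moment), and every numeric value are hypothetical.

```python
import numpy as np


def message_passing_embedding(atom_feats, adjacency, w_in, w_hidden, steps=3):
    """Toy graph convolution (hypothetical; not the authors' or Chemprop's network).

    atom_feats: (n_atoms, d_in) atom feature matrix
    adjacency:  (n_atoms, n_atoms) 0/1 bond matrix
    Returns a (d_hidden,) molecule vector obtained by sum pooling over atoms.
    """
    h = np.maximum(atom_feats @ w_in, 0.0)            # initial atom states (ReLU)
    for _ in range(steps):
        messages = adjacency @ h                       # sum of bonded neighbors' states
        h = np.maximum((h + messages) @ w_hidden, 0.0) # shared update for every atom
    return h.sum(axis=0)                               # permutation-invariant readout


def augmented_prediction(atom_feats, adjacency, qm_desc, qm_mean, qm_std, params):
    """Concatenate z-scored molecule-level QM descriptors with the learned
    graph embedding, then apply a linear output head (assumed design)."""
    graph_vec = message_passing_embedding(atom_feats, adjacency,
                                          params["w_in"], params["w_hidden"])
    qm_scaled = (qm_desc - qm_mean) / qm_std           # standardize descriptors
    combined = np.concatenate([graph_vec, qm_scaled])
    return float(combined @ params["w_out"] + params["b_out"])


# Usage with random weights and made-up inputs (illustration only).
rng = np.random.default_rng(0)
n_atoms, d_in, d_hidden, n_qm = 4, 5, 8, 3
params = {
    "w_in": rng.normal(size=(d_in, d_hidden)) * 0.1,
    "w_hidden": rng.normal(size=(d_hidden, d_hidden)) * 0.1,
    "w_out": rng.normal(size=d_hidden + n_qm) * 0.1,
    "b_out": 0.0,
}
atom_feats = rng.normal(size=(n_atoms, d_in))
adjacency = np.array([[0, 1, 0, 0],
                      [1, 0, 1, 0],
                      [0, 1, 0, 1],
                      [0, 0, 1, 0]], dtype=float)      # a 4-atom chain
qm_desc = np.array([-0.25, 0.02, 2.1])                 # hypothetical HOMO, LUMO (Ha), dipole (D)
qm_mean = np.array([-0.24, 0.01, 2.7])                 # hypothetical training-set means
qm_std = np.array([0.02, 0.05, 1.5])                   # hypothetical training-set std devs

pred = augmented_prediction(atom_feats, adjacency, qm_desc, qm_mean, qm_std, params)
print(f"predicted end point (arbitrary units): {pred:.4f}")
```

Standardizing the descriptors keeps quantities on very different scales (orbital energies in hartree, dipoles in debye) from dominating the concatenated vector. Sum pooling makes the molecule embedding independent of atom ordering, so the descriptor augmentation changes only the molecule-level input, not how the graph is read.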
Additional Information
© 2022 American Chemical Society. Received 1 March 2022. Published online 27 June 2022.
Data and Software Availability: ChEMBL data sets and computed descriptors are available in the Supporting Information. This work also leverages proprietary data sets from Merck & Co. (Kenilworth, NJ) to provide higher-confidence conclusions. Software used to train models is freely available from Yang et al. (14) at https://github.com/chemprop. Software used to identify low-energy 3D conformations is available from the Chemical Computing Group (Montreal, Canada).
We thank our computational and structural chemistry colleagues for feedback on the work. This work was supported in full by Merck Sharp & Dohme Corp., a subsidiary of Merck & Co., Inc., Kenilworth, NJ, USA.
Author Contributions. M. A. Lim and S. Yang contributed equally. All authors contributed to the research and the writing of the manuscript and have approved the final version of the manuscript. The authors declare no competing financial interest.
Attached Files
Supplemental Material - ci2c00245_si_001.pdf
Supplemental Material - ci2c00245_si_002.xlsx
Supplemental Material - ci2c00245_si_003.xlsx
Files
Size | MD5
---|---|
498.9 kB | 4374fc805dba8266c9806dd666c60573
74.1 kB | 91bd98dfe84778a62226cca8af94e9dd
19.9 MB | eec096cc3f2266cc85d8763075f0c4e0
Additional details
- Eprint ID: 115977
- Resolver ID: CaltechAUTHORS:20220729-894394000
- Funding: Merck Sharp and Dohme
- Created: 2022-08-01
- Updated: 2022-08-01