Welcome to the new version of CaltechAUTHORS. Login is currently restricted to library staff. If you notice any issues, please email coda@library.caltech.edu
Published July 19, 2019 | Published
Journal Article Open

Multi-component background learning automates signal detection for spectroscopic data

Abstract

Automated experimentation has yielded data acquisition rates that supersede human processing capabilities. Artificial Intelligence offers new possibilities for automating data interpretation to generate large, high-quality datasets. Background subtraction is a long-standing challenge, particularly in settings where multiple sources of the background signal coexist, and automatic extraction of signals of interest from measured signals accelerates data interpretation. Herein, we present an unsupervised probabilistic learning approach that analyzes large data collections to identify multiple background sources and establish the probability that any given data point contains a signal of interest. The approach is demonstrated on X-ray diffraction and Raman spectroscopy data and is suitable to any type of data where the signal of interest is a positive addition to the background signals. While the model can incorporate prior knowledge, it does not require knowledge of the signals since the shapes of the background signals, the noise levels, and the signal of interest are simultaneously learned via a probabilistic matrix factorization framework. Automated identification of interpretable signals by unsupervised probabilistic learning avoids the injection of human bias and expedites signal extraction in large datasets, a transformative capability with many applications in the physical sciences and beyond.

Additional Information

© 2019 The Author(s). This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. Received 18 February 2019; Accepted 26 June 2019; Published 19 July 2019. The development of the MCBL algorithm, inkjet printing synthesis, and Raman measurements were supported by a an Accelerated Materials Design and Discovery grant from the Toyota Research Institute. Initial design of the algorithm and data procurement were supported by the NSF Expedition award for Computational Sustainability CCF-1522054 and by Army Research Office (ARO) award W911-NF-14-1-0498. The implementation of the algorithm for automated, unsupervised operation was supported by MURI/AFOSR grant FA9550. Compute infrastructure was provided by NSF award CNS-0832782 and by ARO DURIP award W911NF-17-1-0187. The sputter deposition and XRD measurements were supported through the Office of Science of the U.S. Department of Energy under Award No. DE-SC0004993. The authors thank Edwin Soedarmadji for assistance with data management. Data availability: The datasets analyzed during the current study are available in the Caltech Data repository: XRD at https://doi.org/10.22002/D1.1178, https://data.caltech.edu/records/1178 and Raman at https://doi.org/10.22002/D1.1179, https://data.caltech.edu/records/1179. Code availability: The codes pertaining to the current study will be available at http://www.cs.cornell.edu/gomes/udiscoverit/. Author Contributions: C.G. and J.G. identified the problem to be solved. S.A. and C.G. conceptualized the model. S.A. developed the mathematical framework, designed the algorithm, and implemented it. J.G., H.S. and D.G. inspected results. S.A., D.G. and J.G. created visualizations of the results. L.Z. performed materials synthesis and data acquisition for XRD data. J.H. synthesized materials for Raman measurements. D.B. and M.U. acquired and provided the Raman data. S.A., J.G., C.G., H.S. and D.G. wrote the paper. The authors declare no competing interests.

Attached Files

Published - s41524-019-0213-0.pdf

Files

s41524-019-0213-0.pdf
Files (2.1 MB)
Name Size Download all
md5:dc28c660edfba6f581781cfbe046a713
2.1 MB Preview Download

Additional details

Created:
August 19, 2023
Modified:
October 20, 2023