Published December 2022 | public
Journal Article

Standardized multi-omics of Earth's microbiomes reveals microbial and metabolite diversity

Shaffer, Justin P.
Nothias, Louis-Félix
Thompson, Luke R.
Sanders, Jon G.
Salido, Rodolfo A.
Couvillion, Sneha P.
Brejnrod, Asker D.
Lejzerowicz, Franck
Haiminen, Niina
Huang, Shi
Lutz, Holly L.
Zhu, Qiyun
Martino, Cameron
Morton, James T.
Karthikeyan, Smruthi
Nothias-Esposito, Mélissa
Dührkop, Kai
Böcker, Sebastian
Kim, Hyun Woo
Aksenov, Alexander A.
Bittremieux, Wout
Minich, Jeremiah J.
Marotz, Clarisse
Bryant, MacKenzie M.
Sanders, Karenina
Schwartz, Tara
Humphrey, Greg
Vásquez-Baeza, Yoshiki
Tripathi, Anupriya
Parida, Laxmi
Carrieri, Anna Paola
Beck, Kristen L.
Das, Promi
González, Antonio
McDonald, Daniel
Ladau, Joshua
Karst, Søren M.
Albertsen, Mads
Ackermann, Gail
DeReus, Jeff
Thomas, Torsten
Petras, Daniel
Shade, Ashley
Stegen, James
Song, Se Jin
Metz, Thomas O.
Swafford, Austin D.
Dorrestein, Pieter C.
Jansson, Janet K.
Gilbert, Jack A.
Knight, Rob
Angenant, Lars T.
Berry, Alison M.
Bittleston, Leonora S.
Bowen, Jennifer L.
Chavarría, Max
Cowan, Don A.
Distel, Dan
Girguis, Peter R.
Huerta-Cepas, Jaime
Jensen, Paul R.
Jiang, Lingjing
King, Gary M.
Lavrinienko, Anton
MacRae-Crerar, Aurora
Makhalanyane, Thulani P.
Mappes, Tapio
Marzinelli, Ezequiel M.
Mayer, Gregory
McMahon, Katherine D.
Metcalf, Jessica L.
Miyake, Sou
Mousseau, Timothy A.
Murillo-Cruz, Catalina
Myrold, David
Palenik, Brian
Pinto-Tomás, Adrián A.
Porazinska, Dorota L.
Ramond, Jean-Baptiste
Rowher, Forest
RoyChowdhury, Taniya
Sandin, Stuart A.
Schmidt, Steven K.
Seedorf, Henning
Shade, Ashley
Shipway, J. Reuben
Smith, Jennifer E.
Stegen, James
Stewart, Frank J.
Tait, Karen
Thomas, Torsten
Tucker, Yael
U'Ren, Jana M.
Watts, Phillip C.
Webster, Nicole S.
Zaneveld, Jesse R.
Zhang, Shan
Earth Microbiome Project 500 (EMP500) Consortium

Abstract

Despite advances in sequencing, lack of standardization makes comparisons across studies challenging and hampers insights into the structure and function of microbial communities across multiple habitats on a planetary scale. Here we present a multi-omics analysis of a diverse set of 880 microbial community samples collected for the Earth Microbiome Project. We include amplicon (16S, 18S, ITS) and shotgun metagenomic sequence data, and untargeted metabolomics data (liquid chromatography-tandem mass spectrometry and gas chromatography-mass spectrometry). We used standardized protocols and analytical methods to characterize microbial communities, focusing on relationships and co-occurrences of microbially related metabolites and microbial taxa across environments, thus allowing us to explore diversity at extraordinary scale. In addition to a reference database for metagenomic and metabolomic data, we provide a framework for incorporating additional studies, enabling the expansion of existing knowledge in the form of an evolving community resource. We demonstrate the utility of this database by testing the hypothesis that every microbe and metabolite is everywhere but the environment selects. Our results show that metabolite diversity exhibits turnover and nestedness related to both microbial communities and the environment, whereas the relative abundances of microbially related metabolites vary and co-occur with specific microbial consortia in a habitat-specific manner. We additionally show the power of certain chemistry, in particular terpenoids, in distinguishing Earth's environments (for example, terrestrial plant surfaces and soils, freshwater and marine animal stool), as well as that of certain microbes including Conexibacter woesei (terrestrial soils), Haloquadratum walsbyi (marine deposits) and Pantoea dispersa (terrestrial plant detritus). This Resource provides insight into the taxa and metabolites within microbial communities from diverse habitats across Earth, informing both microbial and chemical ecology, and provides a foundation and methods for multi-omics microbiome studies of hosts and the environment.
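
As an illustration of the terms "turnover" and "nestedness" used in the abstract, the sketch below splits the pairwise Sørensen dissimilarity between two samples into a turnover part (Simpson dissimilarity) and a nestedness-resultant part. This is one common textbook partition and is an assumption made here for illustration; the record does not state which nestedness statistic the authors used. The function name and the metabolite labels are hypothetical.

```python
def partition_sorensen(sample_a, sample_b):
    """Split pairwise Sorensen dissimilarity into turnover and nestedness.

    Inputs are collections of feature names (for example metabolites)
    detected in each sample; only presence/absence is used.
    """
    a_set, b_set = set(sample_a), set(sample_b)
    if not a_set or not b_set:
        raise ValueError("both samples must contain at least one feature")

    shared = len(a_set & b_set)   # features found in both samples
    only_a = len(a_set - b_set)   # features unique to the first sample
    only_b = len(b_set - a_set)   # features unique to the second sample

    # Total dissimilarity (Sorensen).
    sorensen = (only_a + only_b) / (2 * shared + only_a + only_b)
    # Turnover (Simpson): replacement of features, insensitive to richness.
    smaller = min(only_a, only_b)
    turnover = smaller / (shared + smaller)
    # Nestedness-resultant component: what remains after removing turnover.
    nestedness = sorensen - turnover
    return {"sorensen": sorensen, "turnover": turnover, "nestedness": nestedness}


# Hypothetical example: the second sample is a strict subset of the first,
# so all dissimilarity is attributed to nestedness.
nested = partition_sorensen({"m1", "m2", "m3", "m4"}, {"m1", "m2"})

# Hypothetical example: equal richness but different members,
# so all dissimilarity is attributed to turnover.
replaced = partition_sorensen({"m1", "m2", "m3"}, {"m1", "m4", "m5"})
```

In the first example the turnover component is zero because one sample loses features without gaining new ones; in the second the nestedness component is zero because both samples hold the same number of features.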

Additional Information

We thank G. Milivenvsky, A. Møller, I. Chizhevsky, S. Kirieiev, A. Nosovsky and M. Ivanenko for logistic support with fieldwork in Ukraine; L. Goldasich and J. Toronczak for assistance with sample processing for sequencing; M. Fedarko, R. Diner, E. Wood-Charlson, S. Nayfach, D. Udwary and E. Eloe-Fadrosh for reviewing the manuscript. This work was supported in part by the Samuel Freeman Charitable Trust, US National Institutes of Health (NIH) (awards 1RF1-AG058942-01, 1DP1AT010885, R01HL140976, R01DK102932, R01HL134887, U19AG063744 and U01AI124316 to R.K.), US Department of Agriculture – National Institute of Food and Agriculture (USDA-NIFA) (award 2019-67013-29137 to R.K.), the US National Science Foundation (NSF) - Center for Aerosol Impacts on Chemistry of the Environment, Crohn's & Colitis Foundation Award (CCFA) (award 675191 to R.K.), US Department of Energy - Office of Science - Office of Biological and Environmental Research - Environmental System Science Program, Semiconductor Research Corporation and Defense Advanced Research Projects Agency (SRC/DARPA) (award GI18518 to R.K.), Department of Defense (award W81XWH-17-1-0589 to R.K.), the Office of Naval Research (ONR) (award N00014-15-1-2809 to R.K.), the Emerald Foundation (award 3022 to R.K.), IBM Research AI through the AI Horizons Network, and the Center for Microbiome Innovation. J.P.S. was supported by NIH/NIGMS IRACDA K12 GM068524. L.-F.N. was supported by the NIH (award R01-GM107550). A.D.B. was supported by the Danish Council for Independent Research (DFF) (award 9058-00025B). W.B. was supported by the Research Foundation – Flanders (12W0418N). K.D. and S.B. were supported by Deutsche Forschungsgemeinschaft (BO 1910/20 and 1910/23). P.C.D. was supported by the Gordon and Betty Moore Foundation (award GBMF7622) and the NIH (award R01-GM107550).
Metabolomics analyses at Pacific Northwest National Laboratory (PNNL) were supported by the Laboratory Directed Research and Development program via the Microbiomes in Transition Initiative and performed in the Environmental Molecular Sciences Laboratory, a national scientific user facility sponsored by the US Office of Biological and Environmental Research and located at PNNL. This contribution originates in part from the River Corridor Scientific Focus Area project at PNNL. PNNL is a multiprogram national laboratory operated by Battelle for the Department of Energy (DOE) under contract DE-AC05-76RLO 1830, as well as work supported by COMPASS-FME, a multi-institutional project supported by the US DOE, Office of Science, Biological and Environmental Research as part of the Environmental System Science Program. We thank Eppendorf, Illumina and Integrated DNA Technologies for in-kind support at various phases of the project.

Contributions. The EMP500 Consortium collected and provided samples. J.A.G., J.K.J. and R.K. conceived the idea for the project. P.C.D. and R.K. designed the multi-omics component of the project and provided project oversight. J.P.S. managed the project, performed preliminary data exploration, coordinated data analysis, analysed data and provided data interpretation. L.-F.N. coordinated and performed LC–MS/MS analysis, and the processing, annotation and interpretation of LC–MS/MS data. M.N.-E. performed sample preparation and extraction before LC–MS/MS analysis. L.R.T. designed the multi-omics component of the project, solicited sample collection, curated sample metadata, processed samples, performed preliminary data exploration and provided project oversight. J.G.S. designed the multi-omics component, managed the project, developed protocols and tools, coordinated and performed sequencing, and performed preliminary exploration of sequence data. R.A.S. developed protocols, and coordinated and performed sequencing. S.P.C. and T.O.M. coordinated and performed GC–MS sample processing and provided interpretation of GC–MS data. A.D.B. conceived the idea for the paper, performed preliminary data exploration, analysed data and provided data interpretation. S.H. performed machine-learning analyses. F.L. performed co-occurrence analysis, multinomial regression analyses and correlations with co-occurrence data. H.L.L. performed multinomial regression analyses. Q.Z. developed tools and provided interpretation of shotgun metagenomics data. C. Martino and J.T.M. provided oversight and interpretation of RPCA, multinomial regression and co-occurrence analyses. S.K. performed preliminary exploration of shotgun metagenomics data. K.D., S.B. and H.W.K. contributed to the annotation of LC–MS/MS data. A.A.A. processed GC–MS data. W.B. provided oversight for machine-learning analyses. C. Marotz processed samples for sequencing. Y.V.B. performed preliminary data exploration and provided oversight for machine-learning analysis. A.T. and D.P. performed preliminary data exploration. J.L. provided oversight and interpretation of nestedness analyses. L.P., A.P.C., N.H. and K.L.B. performed preliminary exploration of shotgun metagenomic data and performed machine-learning analyses. P.D. performed preliminary exploration of shotgun metagenomics data. A.G. developed tools, provided interpretation of shotgun metagenomics data and analysed shotgun metagenomics data. G.H. coordinated short-read amplicon and shotgun metagenomics sequencing. M.M.B. and K.S. performed short-read amplicon and shotgun metagenomics sequencing. T.S. assisted with DNA extraction. D.M. coordinated long-read amplicon sequencing, analysed shotgun metagenomics data and provided interpretation of the data. S.M.K. and M.A. coordinated and performed long-read amplicon sequencing and long-read sequence data analysis. J.J.M. collected samples, coordinated field logistics, developed protocols, and performed short-read amplicon and shotgun metagenomics sequencing. S.J.S. collected samples, coordinated field logistics and provided interpretation of the data. G.A. curated sample metadata and organized sequence data. J.D. processed sequence data. A.D.S. provided project oversight and data interpretation. T.T., A.S. and J.S. collected samples, coordinated field logistics and provided interpretation of the data. J.P.S. wrote the manuscript, with contributions from all authors.

Data availability. The mass spectrometry method and data (.RAW and .mzML) were deposited on the MassIVE public repository and are available under the dataset accession number MSV000083475. The processing files were also added to the deposition (updates/2019-08-21_lfnothias_7cc0af40/other/1908_EMPv2_INN/). GNPS molecular networking job is available at https://gnps.ucsd.edu/ProteoSAFe/status.jsp?task=929ce9411f684cf8abd009670b293a33 and was also performed in analogue mode https://gnps.ucsd.edu/ProteoSAFe/status.jsp?task=fafdbfc058184c2b8c87968a7c56d7aa. The DEREPLICATOR jobs can be accessed at https://gnps.ucsd.edu/ProteoSAFe/status.jsp?task=ee40831bcc314bda928886964d853a52 and https://gnps.ucsd.edu/ProteoSAFe/status.jsp?task=1fafd4d4fe7e47dd9dd0b3d8bb0e6606. The SIRIUS results are available on the GitHub repository (emp/data/metabolomics/FBMN/SIRIUS). The notebooks for metabolomics data preparation and microbially related molecules establishment are available at https://github.com/lfnothias/emp_metabolomics. Amplicon and shotgun metagenomic sequence data were submitted to the European Nucleotide Archive under Project PRJEB42019 (https://www.ebi.ac.uk/ena/browser/view/PRJEB42019).
Raw and demultiplexed amplicon and shotgun sequence data, the feature-table for full-length rRNA operon analysis, feature-tables for LC–MS/MS classical molecular networking and feature-based molecular networking, and the feature-table for GC–MS molecular networking data are available for download and analysis through Qiita at https://www.qiita.ucsd.edu (study: 13114). The GreenGenes database for 16S rRNA can be accessed at https://greengenes.secondgenome.com. The SILVA 138 database for 16S and 18S rRNA can be accessed at https://www.arb-silva.de. The UNITE 9 database for fungal ITS sequences can be accessed at https://unite.ut.ee. The Web of Life database can be accessed at https://biocore.github.io/wol/. The Rep200 database can be accessed at https://www.ncbi.nlm.nih.gov/refseq/. The Natural Products Atlas database can be accessed at https://www.npatlas.org. The MIBiG database can be accessed at https://mibig.secondarymetabolites.org.

Code availability. Complete protocols for laboratory and computational workflows for both metagenomics and metabolomics data for use by the broader community are available in GitHub (https://github.com/biocore/emp/blob/master/methods/methods_release2.md).

Competing interests. S.B. and K.D. are co-founders of Bright Giant GmbH, which implements some of the tools used for metabolite annotation here (that is, SIRIUS, CSI-FingerID+CANOPUS). The remaining authors declare no competing interests.

Peer review. Nature Microbiology thanks the anonymous reviewers for their contribution to the peer review of this work.
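
The identifiers in the data availability statement follow two simple URL patterns, which the sketch below reproduces for anyone scripting links to the deposits. The base URLs, project accession and task IDs are copied from the statement; the function names and the dictionary keys are hypothetical labels chosen for illustration. The code only builds strings and makes no network requests.

```python
from urllib.parse import urlencode

# Base URLs as they appear in the data availability statement.
ENA_BROWSER = "https://www.ebi.ac.uk/ena/browser/view/"
GNPS_STATUS = "https://gnps.ucsd.edu/ProteoSAFe/status.jsp"

# Accession and task IDs copied from the statement.
# The dictionary keys are hypothetical labels, not official job names.
ENA_PROJECT = "PRJEB42019"
GNPS_TASKS = {
    "molecular_networking": "929ce9411f684cf8abd009670b293a33",
    "molecular_networking_analogue": "fafdbfc058184c2b8c87968a7c56d7aa",
    "dereplicator_1": "ee40831bcc314bda928886964d853a52",
    "dereplicator_2": "1fafd4d4fe7e47dd9dd0b3d8bb0e6606",
}


def ena_project_url(accession):
    """Return the ENA browser link for a project accession."""
    return ENA_BROWSER + accession


def gnps_task_url(task_id):
    """Return the GNPS job status link for a task ID."""
    return f"{GNPS_STATUS}?{urlencode({'task': task_id})}"


# Build every link named in the statement.
links = {"ena_project": ena_project_url(ENA_PROJECT)}
links.update({name: gnps_task_url(task) for name, task in GNPS_TASKS.items()})
```

Keeping the identifiers in one place makes it easier to check them against the statement if the record is updated.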

Additional details

Created: August 22, 2023
Modified: October 24, 2023