Pseudoalignment for metagenomic read assignment
Abstract
Motivation: Read assignment is an important first step in many metagenomic analysis workflows, providing the basis for identification and quantification of species. However, ambiguity among the sequences of many strains makes it difficult to assign reads at the lowest level of taxonomy, and reads are typically assigned to taxonomic levels where they are unambiguous. We explore connections between metagenomic read assignment and the quantification of transcripts from RNA-Seq data in order to develop novel methods for rapid and accurate quantification of metagenomic strains.

Results: We find that the recent idea of pseudoalignment introduced in the RNA-Seq context is highly applicable in the metagenomics setting. When coupled with the Expectation-Maximization (EM) algorithm, reads can be assigned far more accurately and quickly than is currently possible with state-of-the-art software, making it possible and practical for the first time to analyze abundances of individual genomes in metagenomics projects.
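To illustrate how an EM step can resolve reads that are compatible with several genomes, the following is a minimal sketch and not the authors' implementation. It assumes pseudoalignment has already grouped reads into equivalence classes (sets of genomes each read is compatible with), and it ignores genome length and other corrections a real tool would apply. The function name `em_abundances`, its parameters, and the toy counts are hypothetical.

```python
def em_abundances(eq_classes, n_genomes, n_iter=200, tol=1e-10):
    """Estimate relative genome abundances from equivalence-class counts.

    eq_classes: dict mapping a tuple of genome indices (the genomes a read
                is compatible with) to the number of reads in that class.
    Assumption: genome lengths are ignored, so the result is the fraction
    of reads attributed to each genome.
    """
    theta = [1.0 / n_genomes] * n_genomes  # uniform starting point
    for _ in range(n_iter):
        counts = [0.0] * n_genomes
        # E-step: split each class's reads among its genomes in proportion
        # to the current abundance estimates.
        for genomes, n_reads in eq_classes.items():
            denom = sum(theta[g] for g in genomes)
            if denom == 0.0:
                continue
            for g in genomes:
                counts[g] += n_reads * theta[g] / denom
        # M-step: renormalize the expected counts into abundances.
        total = sum(counts)
        new_theta = [c / total for c in counts]
        delta = max(abs(a - b) for a, b in zip(theta, new_theta))
        theta = new_theta
        if delta < tol:
            break
    return theta


# Toy input: 30 reads unique to genome 0, 10 unique to genome 1,
# and 60 reads compatible with both.
toy_classes = {(0,): 30, (1,): 10, (0, 1): 60}
theta = em_abundances(toy_classes, n_genomes=2)
print(theta)
```

In this toy case the ambiguous reads are shared out according to the evidence from the unique reads, so the estimates settle near 0.75 and 0.25 rather than staying at an even split.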
Additional Information
© The Author 2017. Published by Oxford University Press. This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/about_us/legal/notices).
Received on October 18, 2016; revised on January 23, 2017; editorial decision on February 15, 2017; accepted on February 17, 2017. Published: 21 February 2017.
We thank readers of preprints of this manuscript for helpful suggestions that have improved our method and its description in the paper. H.P. was supported by an NSF graduate research fellowship. P.M. was partially supported by a Fulbright fellowship. L.S. and L.P. were partially supported by NIH R01 HG006129 and NIH R01 DK094699.
Conflict of Interest: none declared.
Attached Files
Submitted - 1510.07371.pdf
Files
Name | Size |
---|---|
1510.07371.pdf (md5:afa537b3beb55a7ae0ec1b01424475ae) | 1.6 MB |
Additional details
- PMCID: PMC5870846
- Eprint ID: 74793
- DOI: 10.1093/bioinformatics/btx106
- Resolver ID: CaltechAUTHORS:20170306-131027010
- NSF Graduate Research Fellowship
- Fulbright Foundation
- NIH: R01 HG006129
- NIH: R01 DK094699
- Created: 2017-03-06 (created from EPrint's datestamp field)
- Updated: 2021-11-11 (created from EPrint's last_modified field)