RefShannon: A genome-guided transcriptome assembler using sparse flow decomposition
Abstract
High-throughput sequencing of RNA (RNA-Seq) has become a staple of modern molecular biology, with applications not only in quantifying gene expression but also in isoform-level analysis of RNA transcripts. Such isoform-level analysis relies on a transcriptome assembly algorithm that stitches the observed short reads together into the corresponding transcripts. The task is made difficult by alternative splicing, a mechanism by which the same gene may generate multiple distinct RNA transcripts. We develop a novel genome-guided transcriptome assembler, RefShannon, that exploits the varying abundances of the different transcripts to enable accurate reconstruction of the transcripts. Our evaluation shows that RefShannon improves sensitivity by up to 22% at a given specificity compared with other state-of-the-art assemblers. RefShannon is written in Python and is available from GitHub (https://github.com/shunfumao/RefShannon).
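The core idea in the abstract, that transcripts with different abundances can be separated by decomposing read-coverage "flow" on a splice graph, can be illustrated with a small sketch. This is an illustration under stated assumptions, not RefShannon's algorithm: it swaps in a plain greedy widest-path heuristic (repeatedly peel off the source-to-sink path with the largest bottleneck coverage) instead of the sparse flow decomposition named in the title. The function names (`widest_path`, `greedy_flow_decomposition`), the splice graph, its node names (`S`, `e1` to `e4`, `T`) and its coverage values are all hypothetical.

```python
from collections import defaultdict


def widest_path(edges, source, sink, order):
    """Return (bottleneck, path) for the maximum-bottleneck source->sink path.

    edges: dict mapping (u, v) -> remaining flow on that edge (hypothetical coverage).
    order: the graph's nodes in topological order (the splice graph is assumed to be a DAG).
    Returns (0, []) when no path with positive flow remains.
    """
    out = defaultdict(list)
    for (u, v), w in edges.items():
        out[u].append((v, w))

    best = {source: float("inf")}  # widest bottleneck found so far per node
    parent = {}
    for u in order:  # topological order: best[u] is final when u is visited
        if u not in best:
            continue
        for v, w in out[u]:
            width = min(best[u], w)
            if w > 0 and width > best.get(v, 0):
                best[v] = width
                parent[v] = u

    if sink not in best:
        return 0, []
    path = [sink]
    while path[-1] != source:
        path.append(parent[path[-1]])
    return best[sink], path[::-1]


def greedy_flow_decomposition(edges, source, sink, order, min_flow=1e-9):
    """Greedy heuristic (not RefShannon's sparse method): peel widest paths off the flow."""
    remaining = dict(edges)  # copy so the caller's coverage dict is not modified
    transcripts = []
    while True:
        width, path = widest_path(remaining, source, sink, order)
        if width <= min_flow:
            break
        for u, v in zip(path, path[1:]):
            remaining[(u, v)] -= width  # remove this candidate transcript's abundance
        transcripts.append((path, width))
    return transcripts


# Hypothetical splice graph: two isoforms share exons e1 and e4 and differ in
# the middle exon (e2 at abundance 10, e3 at abundance 4).
EXAMPLE_EDGES = {
    ("S", "e1"): 14, ("e1", "e2"): 10, ("e1", "e3"): 4,
    ("e2", "e4"): 10, ("e3", "e4"): 4, ("e4", "T"): 14,
}
EXAMPLE_ORDER = ["S", "e1", "e2", "e3", "e4", "T"]

print(greedy_flow_decomposition(EXAMPLE_EDGES, "S", "T", EXAMPLE_ORDER))
# [(['S', 'e1', 'e2', 'e4', 'T'], 10), (['S', 'e1', 'e3', 'e4', 'T'], 4)]
```

On this toy graph the two isoforms are recovered with abundances 10 and 4 precisely because their coverages differ. Greedy peeling is only a heuristic, however, and is not guaranteed to return the smallest set of paths that explains the flow.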
Additional Information
© 2020 Mao et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Received: October 18, 2019; Accepted: April 24, 2020; Published: June 2, 2020.

The authors would like to thank Joseph Hui and Kayvon Mazooji for their support at the initial stage of the project.

Data Availability Statement: All relevant data are within the manuscript and its Supporting Information files.

This project is funded by NIH award 1R01HG008164, NSF CCF-1651236, and NSF CIF-1703403.

The authors have declared that no competing interests exist.

Author Contributions: Conceptualization: Lior Pachter, David Tse, Sreeram Kannan. Data curation: Sreeram Kannan. Formal analysis: Shunfu Mao. Funding acquisition: Lior Pachter, David Tse, Sreeram Kannan. Investigation: Shunfu Mao, Lior Pachter, David Tse, Sreeram Kannan. Methodology: Shunfu Mao, Lior Pachter, David Tse, Sreeram Kannan. Project administration: Lior Pachter, David Tse, Sreeram Kannan. Software: Shunfu Mao, Sreeram Kannan. Supervision: Sreeram Kannan. Validation: Shunfu Mao. Visualization: Shunfu Mao. Writing – original draft: Shunfu Mao. Writing – review & editing: Shunfu Mao, Sreeram Kannan.

Attached Files
Published - journal.pone.0232946.pdf
Supplemental Material - journal.pone.0232946.s001.pdf
Supplemental Material - journal.pone.0232946.s002.pdf
Supplemental Material - journal.pone.0232946.s003.pdf
Supplemental Material - journal.pone.0232946.s004.pdf
Supplemental Material - journal.pone.0232946.s005.pdf
Supplemental Material - journal.pone.0232946.s006.pdf
Supplemental Material - journal.pone.0232946.s007.pdf
Files
MD5 checksum | Size |
---|---|
e08e271c995bbde6f3dc4552344b1ba4 | 115.0 kB |
da3ba87e92afa78b306c5df0ba98321a | 1.6 MB |
62a7e19c3276a5cdbb587f8475324d10 | 115.4 kB |
18c694787480aff8b7ca2562752ad217 | 117.2 kB |
f1b7ceaa2f82ff0562f8b6b3f5e430cd | 273.1 kB |
fea04ebe3b142651a3f43b19863cac51 | 45.4 kB |
2b559e1c35948f3fa9cabf36a572f4ae | 116.3 kB |
ada5865bc40c8215d7b0829ad5ab1852 | 229.3 kB |
Additional details
- PMCID: PMC7266320
- Eprint ID: 103638
- Resolver ID: CaltechAUTHORS:20200602-124021279
- NIH: 1R01HG008164
- NSF: CCF-1651236
- NSF: CIF-1703403
- Created: 2020-06-02 (from EPrint's datestamp field)
- Updated: 2023-06-01 (from EPrint's last_modified field)
- Caltech groups: Division of Biology and Biological Engineering (BBE)