Low Dimensionality in Gene Expression Data Enables the Accurate Extraction of Transcriptional Programs from Shallow Sequencing
Abstract
A tradeoff between precision and throughput constrains all biological measurements, including sequencing-based technologies. Here, we develop a mathematical framework that defines this tradeoff between mRNA-sequencing depth and error in the extraction of biological information. We find that transcriptional programs can be reproducibly identified at 1% of conventional read depths. We demonstrate that this resilience to noise of "shallow" sequencing derives from a natural property, low dimensionality, which is a fundamental feature of gene expression data. Accordingly, our conclusions hold for ∼350 single-cell and bulk gene expression datasets across yeast, mouse, and human. In total, our approach provides quantitative guidelines for the choice of sequencing depth necessary to achieve a desired level of analytical resolution. We codify these guidelines in an open-source read depth calculator. This work demonstrates that the structure inherent in biological networks can be productively exploited to increase measurement throughput, an idea that is now common in many branches of science, such as image processing.
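The central claim, that a dominant transcriptional program survives a 100-fold reduction in read depth when the data are low dimensional, can be illustrated with a small simulation. The sketch below is not the authors' framework or their read depth calculator. It assumes a toy model with one hypothetical program (the first 50 genes upregulated 4-fold in roughly half of the cells), Poisson read counts, and binomial thinning to mimic shallow sequencing. The function names and every parameter value are illustrative assumptions.

```python
import numpy as np

def leading_program(counts):
    """Return unit-norm gene loadings of the first principal component."""
    frac = counts / counts.sum(axis=0, keepdims=True)    # per-cell read fractions
    centered = frac - frac.mean(axis=1, keepdims=True)   # center each gene across cells
    u, _, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, 0]

def shallow_vs_deep(n_genes=500, n_cells=200, depth=100_000, keep=0.01, seed=0):
    """Overlap (absolute cosine) between deep and shallow leading programs."""
    rng = np.random.default_rng(seed)
    base = rng.gamma(shape=2.0, scale=1.0, size=n_genes)  # baseline expression (assumed)
    fold = np.ones(n_genes)
    fold[:50] = 4.0                                       # hypothetical program genes
    labels = rng.integers(0, 2, size=n_cells)             # program on/off per cell
    rates = base[:, None] * np.where(labels[None, :] == 1, fold[:, None], 1.0)
    rates /= rates.sum(axis=0, keepdims=True)             # expression as read fractions
    deep = rng.poisson(depth * rates)                     # "conventional" depth
    shallow = rng.binomial(deep, keep)                    # keep about 1% of the reads
    return abs(float(leading_program(deep) @ leading_program(shallow)))

if __name__ == "__main__":
    print(f"deep vs. shallow program overlap: {shallow_vs_deep():.3f}")
```

An overlap close to 1 means the shallow data recover nearly the same gene loadings as the deep data. In this toy setting the program's signal is spread over many genes and cells, so it stays well above the sampling noise even at 1% depth. Real datasets have more programs and more complex noise, so the numbers here should not be read as guidance on actual sequencing depth.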
Additional Information
© 2016 The Authors. Under a Creative Commons license. Received: November 30, 2015. Revised: March 8, 2016. Accepted: April 4, 2016. Published: April 27, 2016. The authors would like to thank Jason Kreisberg, Alex Fields, David Sivak, Patrick Cahan, Jonathan Weissman, Chun Ye, Michael Chevalier, Satwik Rajaram, and Steve Altschuler for careful reading of the manuscript; Eric Chow, John Haliburton, Sisi Chen, and Emeric Charles for their experimental insights; and Paul Rivaud for website design assistance. This work was supported by the UCSF Center for Systems and Synthetic Biology (NIGMS P50 GM081879). H.E.S. acknowledges support from the Paul G. Allen Family Foundation. M.T. acknowledges support from the NIH Office of the Director, the National Cancer Institute, and the National Institute of Dental and Craniofacial Research (NIH DP5 OD012194).
Attached Files
Published - PIIS2405471216301090.pdf
Supplemental Material - mmc1.pdf
Files
MD5 checksum | Size |
---|---|
md5:cb88d5fb543a722d0c00d480489ff75a | 4.1 MB |
md5:f920f213cf3fde80b8c9380790265a58 | 13.9 MB |
Additional details
- PMCID
- PMC4856162
- Eprint ID
- 74044
- Resolver ID
- CaltechAUTHORS:20170203-145417665
- NIH
- P50 GM081879
- Paul G. Allen Family Foundation
- NIH
- DP5 OD012194
- Created
- 2017-02-03 (created from EPrint's datestamp field)
- Updated
- 2021-11-11 (created from EPrint's last_modified field)