Gene-level differential analysis at transcript-level resolution
Abstract
Compared to RNA-sequencing transcript differential analysis, gene-level differential expression analysis is more robust and experimentally actionable. However, the use of gene counts for statistical analysis can mask transcript-level dynamics. We demonstrate that 'analysis first, aggregation second,' where the p values derived from transcript analysis are aggregated to obtain gene-level results, increases sensitivity and accuracy. The method we propose can also be applied to transcript compatibility counts obtained from pseudoalignment of reads, which circumvents the need for quantification and is fast, accurate, and model-free. The method generalizes to various levels of biology and we showcase an application to gene ontologies.
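The 'analysis first, aggregation second' idea can be illustrated with a minimal sketch. The abstract does not name the aggregation rule, so this example assumes Fisher's method, one standard way to combine p values, purely as a stand-in. It also assumes the transcript p values are independent. The function names, transcript IDs, gene IDs, and p values are all hypothetical.

```python
import math
from collections import defaultdict


def fisher_combine(p_values):
    """Combine independent p values with Fisher's method (an assumed stand-in).

    Under the null, -2 * sum(ln p) follows a chi-square distribution with
    2k degrees of freedom, where k is the number of p values.
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("need at least one p value")
    # Half of the Fisher statistic; clamp tiny p values to avoid log(0).
    half_stat = -sum(math.log(max(p, 1e-300)) for p in p_values)
    # Closed-form chi-square survival function for even degrees of freedom:
    # sf(x; 2k) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return min(1.0, math.exp(-half_stat) * total)


def aggregate_to_genes(transcript_pvalues, transcript_to_gene):
    """Group transcript-level p values by gene and combine each group."""
    grouped = defaultdict(list)
    for transcript, p in transcript_pvalues.items():
        grouped[transcript_to_gene[transcript]].append(p)
    return {gene: fisher_combine(ps) for gene, ps in grouped.items()}


# Hypothetical transcript-level results and transcript-to-gene map.
transcript_pvalues = {"tx1": 0.01, "tx2": 0.04, "tx3": 0.5}
transcript_to_gene = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}
gene_pvalues = aggregate_to_genes(transcript_pvalues, transcript_to_gene)
```

A gene with a single transcript keeps that transcript's p value, while a gene with several transcripts receives one combined p value. The same grouping step would apply to other units, such as gene ontology terms, by swapping the transcript-to-gene map for a different mapping.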
Additional Information
© 2018 The Author(s). This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
We thank Jase Gehring, Páll Melsted, and Vasilis Ntranos for discussion and feedback during development of the methods. Conversations with Cole Trapnell regarding the challenges of functional characterization of individual isoforms were instrumental in launching the project.
LY was partially funded by the UCLA-Caltech Medical Science Training Program, NIH T32 GM07616, and the Lee Ramo Fund. Harold Pimentel was partially funded by NIH R01 HG008140.
Availability of data and materials: Scripts to reproduce the figures and results of the paper are available at http://github.com/pachterlab/aggregationDE/, which is under GNU General Public License v3.0 [33]. The RNA-seq datasets used in the analysis can be found at GEO GSE89024 [21] and GEO GSE95363 [25].
Authors' contributions: LY, NLB, and LP devised the methods. LY analyzed the biological data. LY and LP performed computational experiments. HP developed and implemented the simulation framework. LY and LP wrote the paper. NLB and LP supervised the research. All authors read and approved the final manuscript.
Ethics approval and consent to participate: No data from humans were used in this manuscript.
The authors declare that they have no competing interests.
Attached Files
Published - s13059-018-1419-z.pdf
Submitted - 190199.full.pdf
Supplemental Material - 13059_2018_1419_MOESM1_ESM.pdf
Additional details
- PMCID: PMC5896116
- Eprint ID: 85872
- Resolver ID: CaltechAUTHORS:20180416-090553011
- Caltech- Medical Science Training Program
- T32 GM07616
- NIH Predoctoral Fellowship
- Lee Ramo Fund
- R01 HG008140
- NIH
- Created: 2018-04-16 (created from EPrint's datestamp field)
- Updated: 2023-06-01 (created from EPrint's last_modified field)