Gene-level differential analysis at transcript-level resolution

Creators: Yi, Lynn; Pimentel, Harold; Bray, Nicolas L.; Pachter, Lior

Abstract

Compared to RNA-sequencing transcript differential analysis, gene-level differential expression analysis is more robust and experimentally actionable. However, the use of gene counts for statistical analysis can mask transcript-level dynamics. We demonstrate that 'analysis first, aggregation second,' where the p values derived from transcript analysis are aggregated to obtain gene-level results, increases sensitivity and accuracy. The method we propose can also be applied to transcript compatibility counts obtained from pseudoalignment of reads, which circumvents the need for quantification and is fast, accurate, and model-free. The method generalizes to various levels of biology, and we showcase an application to gene ontologies.
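The 'analysis first, aggregation second' idea can be illustrated with a minimal sketch. It assumes that transcript-level p values have already been computed and that a transcript-to-gene map is available. It combines the p values within each gene using Fisher's method. Fisher's method is a standard p-value combination rule chosen here for simplicity. It is not necessarily the exact aggregation used in the paper, so consult the paper for the authors' procedure. The function names and identifiers below are hypothetical. Fisher's method also assumes independent p values, which transcripts of the same gene may not strictly satisfy.

```python
import math
from collections import defaultdict


def fisher_combine(pvalues):
    """Combine p-values with Fisher's method (illustrative choice).

    The statistic X = -2 * sum(ln p_i) is chi-square with 2k degrees of
    freedom under the global null. For even degrees of freedom the upper
    tail has the closed form exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!,
    so only the standard library is needed.
    """
    k = len(pvalues)
    if k == 0:
        raise ValueError("need at least one p-value")
    # A p-value of exactly 0 makes the combined p-value 0 (avoids log(0)).
    if any(p == 0.0 for p in pvalues):
        return 0.0
    half_x = -sum(math.log(p) for p in pvalues)  # this is X / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i  # (x/2)^i / i!, built incrementally
        total += term
    return min(1.0, math.exp(-half_x) * total)


def aggregate_by_group(pvalues, group_of):
    """Combine per-feature p-values into one p-value per group.

    pvalues:  dict feature_id -> p-value (e.g. transcript -> p)
    group_of: dict feature_id -> group_id (e.g. transcript -> gene)
    """
    grouped = defaultdict(list)
    for feature, p in pvalues.items():
        grouped[group_of[feature]].append(p)
    return {group: fisher_combine(ps) for group, ps in grouped.items()}


# Hypothetical transcript-level p-values and transcript-to-gene map.
tx_p = {"tx1": 0.01, "tx2": 0.6, "tx3": 0.5}
tx2gene = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}
gene_p = aggregate_by_group(tx_p, tx2gene)
print(gene_p)
```

The grouping step is deliberately generic. Passing a gene-to-ontology-term map instead of a transcript-to-gene map applies the same combination one level higher. In practice, the aggregated gene-level p values would still need a multiple-testing correction across genes.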

Additional Information

© 2018 The Author(s). This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

We thank Jase Gehring, Páll Melsted, and Vasilis Ntranos for discussion and feedback during development of the methods. Conversations with Cole Trapnell regarding the challenges of functional characterization of individual isoforms were instrumental in launching the project. LY was partially funded by the UCLA-Caltech Medical Science Training Program, NIH T32 GM07616, and the Lee Ramo Fund. Harold Pimentel was partially funded by NIH R01 HG008140.

Availability of data and materials: Scripts to reproduce the figures and results of the paper are available at http://github.com/pachterlab/aggregationDE/, which is under GNU General Public License v3.0 [33]. The RNA-seq datasets used in the analysis can be found at GEO GSE89024 [21] and GEO GSE95363 [25].

Authors' contributions: LY, NLB, and LP devised the methods. LY analyzed the biological data. LY and LP performed computational experiments. HP developed and implemented the simulation framework. LY and LP wrote the paper. NLB and LP supervised the research. All authors read and approved the final manuscript.

Ethics approval and consent to participate: No data from humans were used in this manuscript.

The authors declare that they have no competing interests.

Attached Files

Published - s13059-018-1419-z.pdf

Submitted - 190199.full.pdf

Supplemental Material - 13059_2018_1419_MOESM1_ESM.pdf

Files (17.4 MB)

Name	Size	MD5
190199.full.pdf	1.3 MB	3f39b00863193e9c702dacdda7da9466
13059_2018_1419_MOESM1_ESM.pdf	14.1 MB	f1ce19baa69de46dc275bad7d62f5579
s13059-018-1419-z.pdf	2.0 MB	287e854b937c8beb42c6f872f30604c6
