Length Biases in Single-Cell RNA Sequencing of pre-mRNA
- Creators
- Gorin, Gennady
- Pachter, Lior
Abstract
Single-molecule pre-mRNA and mRNA sequencing data can be modeled and analyzed using the Markov chain formalism to yield genome-wide insights into transcription. However, quantitative inference with such data requires careful assessment and understanding of noise sources. We find that long pre-mRNA transcripts are over-represented in sequencing data, and explore the mechanistic implications. A biological explanation for this phenomenon within our modeling framework requires unrealistic transcriptional parameters, leading us to posit a length-based model of capture bias. We provide solutions for this model, and use them to find concordant and mechanistically plausible parameter trends across data from multiple single-cell RNA-seq experiments in several species.
Additional Information
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license. Posted July 31, 2021. G.G. and L.P. are partially funded by NIH U19MH114830. The DNA and RNA illustrations used in Figures 1 and 2 are derived from the DNA Twemoji by Twitter, Inc., used under CC-BY 4.0. Data and code availability: https://github.com/pachterlab/GP_2021_3 contains a Python notebook that can be used to reproduce the figures, as well as a sample notebook that applies the computational pipeline to a 10X PBMC dataset. The same repository contains all scripts used to make references, download datasets, quantify transcripts, and process the resulting loom files through the inference pipeline. The raw loom files and all search results are deposited in the CaltechDATA repository [62, 63].
Attached Files
Submitted - 2021.07.30.454514v1.full.pdf
Supplemental Material - media-1.zip
Files
File (md5 checksum) | Size |
---|---|
md5:47d7b8fd297e05ad95b97c9bb3a640a5 | 893.2 kB |
md5:10a3f8ed5b49bfa30c3827e15e9bb882 | 25.0 MB |
Additional details
- Eprint ID
- 110118
- Resolver ID
- CaltechAUTHORS:20210802-221611892
- NIH
- U19MH114830
- Created
- 2021-08-02 (created from EPrint's datestamp field)
- Updated
- 2021-11-16 (created from EPrint's last_modified field)
- Caltech groups
- Division of Biology and Biological Engineering (BBE)