Analysis of Length Biases in Single-Cell RNA Sequencing of Unspliced mRNA by Markov Modeling
- Creators
- Gorin, Gennady
- Pachter, Lior
Abstract
Recent experimental advances in single-cell RNA sequencing (scRNA-seq) have enabled the quantification of transcriptomes with single-molecule resolution. However, thus far, the stochastic modeling of transcription has been separate from the discussion of the statistics of the sequencing process, leading to simplifications that may obfuscate transcriptional dynamics, and technical artifacts in the assays. For example, imputation, normalization, and smoothing, used to correct for stochastic sequencing phenomena, make experimental molecule count data incompatible with a discrete representation, thus rendering the data uninterpretable in the context of conventional Chemical Master Equation (CME) models. Models of gene expression, such as the negative binomial count model, are used with limited physical justification, whereas models for multimodal data are under-explored. Conversely, more detailed CME descriptions of gene expression do not directly address the complexities of the sequencing process. We demonstrate that modeling both phenomena reveals a pervasive gene length-based effect in the detection of unspliced mRNA: long genes are substantially more likely to have higher average unspliced mRNA expression. To explain this effect, we build a stochastic model that accounts for physiological and experimental events, and jointly infer hundreds of gene-specific as well as transcriptome-wide parameters. Specifically, we extend a joint model of mRNA processing described by Singh and Bokes (Biophys. J., 2012) to incorporate downstream Poisson sampling, representing cDNA library construction and sequencing. The explicit inclusion of sampling yields mechanistically interpretable results for the gene expression parameters, and suggests extensions to more complex models.
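The idea of placing Poisson sampling downstream of a stochastic expression model can be illustrated with a small simulation. The following is a minimal sketch and not the authors' implementation. It assumes a negative binomial distribution for the true unspliced counts (drawn as a gamma-Poisson mixture) and a hypothetical capture rate that grows linearly with gene length; the abstract does not state either choice. All function names, parameter names, and numeric values are illustrative assumptions.

```python
import math
import random


def sample_poisson(rng, mean):
    """Draw one Poisson variate with Knuth's multiplication method (small means only)."""
    if mean <= 0:
        return 0
    threshold = math.exp(-mean)
    count, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= threshold:
            return count
        count += 1


def sample_true_count(rng, burst_freq, burst_size):
    """True molecule count in one cell: negative binomial via a gamma-Poisson mixture.

    ASSUMPTION: burst_freq (gamma shape) and burst_size (gamma scale) are
    illustrative parameters, not values from the paper.
    """
    rate = rng.gammavariate(burst_freq, burst_size)
    return sample_poisson(rng, rate)


def sample_observed_count(rng, true_count, capture_rate):
    """Sequencing step: observed count is Poisson with mean capture_rate * true_count."""
    return sample_poisson(rng, capture_rate * true_count)


def mean_observed(gene_length, n_cells=20000, burst_freq=0.5, burst_size=4.0,
                  rate_per_base=1e-5, seed=0):
    """Average observed unspliced count over simulated cells.

    ASSUMPTION: capture_rate = rate_per_base * gene_length is a hypothetical
    length-proportional sampling rate chosen for illustration.
    """
    rng = random.Random(seed)
    capture_rate = rate_per_base * gene_length
    total = 0
    for _ in range(n_cells):
        true_count = sample_true_count(rng, burst_freq, burst_size)
        total += sample_observed_count(rng, true_count, capture_rate)
    return total / n_cells


# Two genes with identical transcriptional parameters but different lengths.
for length in (10_000, 100_000):
    print(length, mean_observed(length))
```

Under these assumptions the expected observed mean is rate_per_base * gene_length * burst_freq * burst_size, so two genes with the same transcriptional parameters show different average unspliced counts purely because of the sampling step.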
Additional Information
© 2021 Biophysical Society. Available online 12 February 2021.
Additional details
- Eprint ID
- 108918
- DOI
- 10.1016/j.bpj.2020.11.706
- Resolver ID
- CaltechAUTHORS:20210503-100056268
- Created
- 2021-05-03 (created from EPrint's datestamp field)
- Updated
- 2021-05-03 (created from EPrint's last_modified field)
- Caltech groups
- Division of Biology and Biological Engineering (BBE)