Published February 12, 2021 | public
Journal Article

Analysis of Length Biases in Single-Cell RNA Sequencing of Unspliced mRNA by Markov Modeling

Abstract

Recent experimental advances in single-cell RNA sequencing (scRNA-seq) have enabled the quantification of transcriptomes with single-molecule resolution. However, thus far, the stochastic modeling of transcription has been separate from the discussion of the statistics of the sequencing process, leading to simplifications that may obscure both transcriptional dynamics and technical artifacts in the assays. For example, imputation, normalization, and smoothing, used to correct for stochastic sequencing phenomena, make experimental molecule count data incompatible with a discrete representation, thus rendering the data uninterpretable in the context of conventional Chemical Master Equation (CME) models. Models of gene expression, such as the negative binomial count model, are used with limited physical justification, whereas models for multimodal data are under-explored. Conversely, more detailed CME descriptions of gene expression do not directly address the complexities of the sequencing process. We demonstrate that modeling both phenomena reveals a pervasive gene length-based effect in the detection of unspliced mRNA: long genes are substantially more likely to have higher average unspliced mRNA expression. To explain this effect, we build a stochastic model that accounts for physiological and experimental events, and jointly infer hundreds of gene-specific as well as transcriptome-wide parameters. Specifically, we extend a joint model of mRNA processing described by Singh and Bokes (Biophys. J., 2012) to incorporate downstream Poisson sampling, representing cDNA library construction and sequencing. The explicit inclusion of sampling yields mechanistically interpretable results for the gene expression parameters, and suggests extensions to more complex models.
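The idea of placing a Poisson sampling step downstream of a stochastic expression model can be illustrated with a minimal simulation. This sketch is not the model fitted in the article: as a stand-in for the Singh and Bokes model it draws per-cell molecule counts from a negative binomial (gamma-Poisson) distribution, and it assumes, purely for illustration, that the per-molecule capture rate grows linearly with gene length. All names and parameter values (`capture_per_kb`, `burst_size`, `burst_freq`) are hypothetical.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw one Poisson variate with Knuth's multiplication method.

    Adequate for the small rates used here; not meant for very large rates.
    """
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_observed_mean(length_kb, n_cells, rng,
                           burst_size=4.0, burst_freq=1.5,
                           capture_per_kb=0.01):
    """Mean observed count per cell for a gene of the given length.

    Assumptions (illustrative only):
      * molecules per cell follow a gamma-Poisson (negative binomial) law
        with mean burst_freq * burst_size;
      * each molecule yields a Poisson number of sequenced copies with
        rate capture_per_kb * length_kb.
    """
    total = 0
    for _ in range(n_cells):
        # Biological layer: gamma-distributed intensity, then Poisson counts.
        intensity = rng.gammavariate(burst_freq, burst_size)
        molecules = sample_poisson(intensity, rng)
        # Technical layer: Poisson sampling of the molecules present.
        observed = sample_poisson(capture_per_kb * length_kb * molecules, rng)
        total += observed
    return total / n_cells


if __name__ == "__main__":
    rng = random.Random(0)
    for length_kb in (10.0, 50.0, 100.0):
        mean = simulate_observed_mean(length_kb, 5000, rng)
        print(f"length {length_kb:6.1f} kb -> mean observed count {mean:.2f}")
```

Under these assumptions the expected observed count is `capture_per_kb * length_kb * burst_freq * burst_size`, so two genes with identical underlying expression differ in their observed averages only through length. That is the kind of length-based effect the abstract describes, shown here in a deliberately simplified form.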

Additional Information

© 2021 Biophysical Society. Available online 12 February 2021.

Additional details

Created: August 20, 2023
Modified: December 22, 2023