Published September 10, 2019 | Submitted
Report Open

Quantifying the tradeoff between sequencing depth and cell number in single-cell RNA-seq

Abstract

The allocation of a sequencing budget when designing single-cell RNA-seq experiments requires consideration of the tradeoff between the number of cells sequenced and the read depth per cell. One approach to the problem is to perform a power analysis for a univariate objective such as differential expression. However, many of the goals of single-cell analysis, such as clustering, require consideration of the multivariate structure of gene expression. We introduce an approach to quantifying the impact of sequencing depth and cell number on the estimation of a multivariate generative model for gene expression that is based on error analysis in the framework of a variational autoencoder. We find that at shallow depths, the marginal benefit of deeper sequencing per cell significantly outweighs the benefit of increased cell numbers. Above about 15,000 reads per cell, the benefit of increased sequencing depth is minor. Code for the workflow reproducing the results of the paper is available at https://github.com/pachterlab/SBP_2019/.
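The budget constraint behind this tradeoff can be illustrated with simple arithmetic. The sketch below is not the authors' method (which relies on error analysis of a variational autoencoder); it only shows how a fixed read budget divides into cells at different depths. The function names, the budget of 300 million reads, and the candidate depths are hypothetical values chosen for illustration.

```python
def cells_for_budget(total_reads: int, reads_per_cell: int) -> int:
    """Return how many cells fit in a fixed read budget at a given depth."""
    if reads_per_cell <= 0:
        raise ValueError("reads_per_cell must be positive")
    return total_reads // reads_per_cell


def allocation_table(total_reads: int, depths: list[int]) -> list[tuple[int, int]]:
    """Pair each candidate depth with the number of cells the budget allows."""
    return [(depth, cells_for_budget(total_reads, depth)) for depth in depths]


if __name__ == "__main__":
    budget = 300_000_000  # hypothetical total read budget
    candidate_depths = [5_000, 15_000, 50_000]  # hypothetical reads per cell
    for depth, cells in allocation_table(budget, candidate_depths):
        print(f"{depth:>6} reads/cell -> {cells:>6} cells")
```

Under these assumed numbers, tripling the depth from 5,000 to 15,000 reads per cell cuts the number of cells to a third. The abstract's finding suggests that, at shallow depths, this exchange is worthwhile, while going beyond roughly 15,000 reads per cell gives up cells for little gain.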

Additional Information

The copyright holder has placed this preprint in the Public Domain. It is no longer restricted by copyright. Anyone can legally share, reuse, remix, or adapt this material for any purpose without crediting the original authors. bioRxiv preprint first posted online Sep. 9, 2019.

Code availability: A Snakemake [20] file used to subsample and process the data and the Python notebooks used for downstream analyses are available on GitHub at https://github.com/pachterlab/SBP_2019/. Scripts and notebooks used to create the figures and results, the gene count matrices output by kallisto bus, and H5AD files with the UMI counts for all the subsampled read depths are available on CaltechDATA (https://doi.org/10.22002/d1.1276).

Author contributions: V.S. designed the evaluation metric and performed statistical analysis. E.V.B. performed data processing and subsampling. V.S., E.V.B., and L.P. interpreted results and wrote the manuscript.

The authors thank Romain Lopez for helpful feedback on the manuscript. V.S. and L.P. were funded in part by NIH U19MH114830.

Attached Files

Submitted - 762773.full.pdf

Files

762773.full.pdf (5.2 MB)
md5:365abb26013cd1700a410e001b902f86

Additional details

Created: August 19, 2023
Modified: October 20, 2023