Quantifying the tradeoff between sequencing depth and cell number in single-cell RNA-seq
Abstract
Allocating a sequencing budget when designing single-cell RNA-seq experiments requires consideration of the tradeoff between the number of cells sequenced and the read depth per cell. One approach to the problem is to perform a power analysis for a univariate objective such as differential expression. However, many of the goals of single-cell analysis, such as clustering, require consideration of the multivariate structure of gene expression. We introduce an approach to quantifying the impact of sequencing depth and cell number on the estimation of a multivariate generative model for gene expression that is based on error analysis in the framework of a variational autoencoder. We find that at shallow depths, the marginal benefit of deeper sequencing per cell significantly outweighs the benefit of increased cell numbers. Above about 15,000 reads per cell, the benefit of increased sequencing depth is minor. Code for the workflow reproducing the results of the paper is available at https://github.com/pachterlab/SBP_2019/.
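The budget tradeoff described in the abstract can be illustrated with simple arithmetic: for a fixed total number of reads, each choice of read depth per cell determines how many cells can be sequenced. The following is a minimal sketch and is not taken from the paper's workflow. The function name `cells_for_budget`, the total budget of 300 million reads, and the list of candidate depths are all hypothetical values chosen for illustration; only the 15,000 reads per cell figure comes from the abstract.

```python
def cells_for_budget(total_reads: int, reads_per_cell: int) -> int:
    """Number of whole cells that fit in a read budget at a given depth."""
    if reads_per_cell <= 0:
        raise ValueError("reads_per_cell must be positive")
    return total_reads // reads_per_cell


# Hypothetical total sequencing budget (reads); not a value from the paper.
budget = 300_000_000

# Hypothetical candidate depths; 15,000 is the depth above which the
# abstract reports only minor benefit from deeper sequencing.
depths = [1_000, 5_000, 15_000, 50_000]

# Map each candidate depth to the number of cells the budget allows.
allocations = {depth: cells_for_budget(budget, depth) for depth in depths}

for depth, n_cells in allocations.items():
    print(f"{depth:>6} reads/cell -> {n_cells:>7} cells")
```

Under these assumed numbers, moving from 15,000 to 50,000 reads per cell cuts the number of cells from 20,000 to 6,000, which is the kind of allocation the abstract suggests yields little additional benefit.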
Additional Information
The copyright holder has placed this preprint in the Public Domain. It is no longer restricted by copyright. Anyone can legally share, reuse, remix, or adapt this material for any purpose without crediting the original authors. bioRxiv preprint first posted online Sep. 9, 2019.
Code availability: A Snakemake [20] file used to subsample and process the data, together with Python notebooks used for downstream analyses, is available on GitHub at https://github.com/pachterlab/SBP_2019/. Scripts and notebooks used to create the figures and results, together with gene count matrices output by kallisto bus and H5AD files with the UMI counts for all the subsampled read depths, are available on CaltechDATA (https://doi.org/10.22002/d1.1276).
Author contributions: V.S. designed the evaluation metric and performed statistical analysis. E.V.B. performed data processing and subsampling. V.S., E.V.B., and L.P. interpreted results and wrote the manuscript.
The authors thank Romain Lopez for helpful feedback on the manuscript. V.S. and L.P. were funded in part by NIH U19MH114830.
Attached Files
Submitted - 762773.full.pdf
Files
Name | Size |
---|---|
md5:365abb26013cd1700a410e001b902f86 | 5.2 MB |
Additional details
- Eprint ID: 98536
- Resolver ID: CaltechAUTHORS:20190910-074005263
- Funding: NIH U19MH114830
- Created: 2019-09-10 (from EPrint's datestamp field)
- Updated: 2021-11-16 (from EPrint's last_modified field)