Published February 7, 2020 | Supplemental Material + Published
Journal Article Open

Determining sequencing depth in a single-cell RNA-seq experiment

Abstract

An underlying question for virtually all single-cell RNA sequencing experiments is how to allocate the limited sequencing budget: deep sequencing of a few cells or shallow sequencing of many cells? Here we present a mathematical framework which reveals that, for estimating many important gene properties, the optimal allocation is to sequence at a depth of around one read per cell per gene. Interestingly, the corresponding optimal estimator is not the widely-used plug-in estimator, but one developed via empirical Bayes.
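As a rough illustration of the trade-off described above, the sketch below splits a fixed read budget between the number of cells and the per-cell depth so that a gene of interest receives about one read per cell on average. The function name, its parameters (`total_reads`, `gene_fraction`, `target_reads_per_cell_per_gene`), and the simplification that a gene's expected reads equal its expression fraction times the per-cell depth are all assumptions made for illustration. This is not the paper's framework and not the API of the sceb package.

```python
def allocate_budget(total_reads, gene_fraction, target_reads_per_cell_per_gene=1.0):
    """Split a read budget so a gene gets ~target reads per cell on average.

    Hypothetical helper for illustration only.
    total_reads: total sequencing budget (reads), assumed fixed.
    gene_fraction: assumed fraction of a cell's reads that map to the gene.
    Returns (reads_per_cell, n_cells).
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 < gene_fraction <= 1:
        raise ValueError("gene_fraction must be in (0, 1]")

    # Depth at which the gene's expected read count per cell hits the target.
    reads_per_cell = target_reads_per_cell_per_gene / gene_fraction

    # Spend the rest of the budget on as many cells as it can cover.
    n_cells = int(total_reads // reads_per_cell)
    return reads_per_cell, n_cells


if __name__ == "__main__":
    # Made-up numbers: a gene taking 1/4 of each cell's reads, 1000 reads total.
    depth, cells = allocate_budget(total_reads=1000, gene_fraction=0.25)
    print(f"reads per cell: {depth}, cells: {cells}")
```

Under these assumptions, a lower-expressed gene pushes the per-cell depth up and the affordable number of cells down, which is the tension the abstract refers to.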

Additional Information

© The Author(s) 2020. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

Received: 6 September 2018; Accepted: 13 December 2019. Published online: 07 February 2020.

This research was in part motivated by discussions on the experimental design question in the Human Cell Atlas First Annual Jamboree meeting. We thank Lior Pachter for his valuable input and constructive suggestions throughout the course of this study; Jase Gehring, Wenying Pan, and Taibo Li for their helpful feedback; and Dominic Grün for providing the smFISH data corresponding to the CEL-seq data. Thanks also to Patrick Marks for very useful feedback on an earlier version of the paper.

D.T. and M.J.Z. are supported in part by the Center for Science of Information, an NSF Science and Technology Center, under grant agreement CCF-0939370 and in part by the National Human Genome Research Institute of the National Institutes of Health under award number R01HG008164. M.J.Z. is also supported by a Stanford Graduate Fellowship (Inventec Fellow). V.N. is supported in part by the Center for Science of Information and in part by a gift from Qualcomm Inc.

These authors contributed equally: Martin Jinye Zhang, Vasilis Ntranos.

Author Contributions: M.J.Z. and V.N. conceived the idea and performed the empirical experiments. M.J.Z. performed the theoretical analysis. M.J.Z., V.N. and D.T. wrote the manuscript. D.T. supervised the research. All authors reviewed the manuscript.

Data availability: The 10x datasets were generated by 10x Genomics' v2 chemistry [22]. They are publicly available and can be downloaded via the following links:

pbmc_4k: https://support.10xgenomics.com/single-cell-gene-expression/datasets/2.1.0/pbmc4k
pbmc_8k: https://support.10xgenomics.com/single-cell-gene-expression/datasets/2.1.0/pbmc8k
brain_1k: https://support.10xgenomics.com/single-cell-gene-expression/datasets/2.1.0/neurons_900
brain_2k: https://support.10xgenomics.com/single-cell-gene-expression/datasets/2.1.0/neurons_2000
brain_9k: https://support.10xgenomics.com/single-cell-gene-expression/datasets/2.1.0/neuron_9k
brain_1.3m: https://support.10xgenomics.com/single-cell-gene-expression/datasets/1.3.0/1M_neurons
293T_1k, 3T3_1k: https://support.10xgenomics.com/single-cell-gene-expression/datasets/2.1.0/hgmm_1k
293T_6k, 3T3_6k: https://support.10xgenomics.com/single-cell-gene-expression/datasets/2.1.0/hgmm_6k
293T_12k, 3T3_12k: https://support.10xgenomics.com/single-cell-gene-expression/datasets/2.1.0/hgmm_12k

We note that pbmc_4k and pbmc_8k are from the same donor; brain_1k and brain_9k are also from the same donor. Also, the following pairs of datasets are sequenced together: 293T_1k and 3T3_1k, 293T_6k and 3T3_6k, 293T_12k and 3T3_12k. These six datasets are from the same biological sample.

The Drop-seq dataset and the corresponding smFISH data can be found in the original paper [15] or in a recent paper that analyzed the dataset [16]. The CEL-seq data can be found in the original paper [27]. The smFISH data accompanying the CEL-seq data can be obtained by contacting the author.

The three ERCC datasets (Zheng, Klein, Svensson) can be found in a recent paper that analyzed the dataset [16]; we used the 2 × (control RNA + ERCC) data from the Svensson et al. paper [52]. The Klein dataset with the pure RNA controls (the Klein ERCC dataset being part of it) can be found in the original paper [24]. The data for sensitivity analysis (Supplementary Figs. 18–19) can be found in the original paper [53].

Code availability: We developed the Python package sceb (single-cell empirical Bayes) for the EB estimators used in this paper (available on PyPI). The code to reproduce all experiments and generate the figures presented in this paper can be found at https://github.com/martinjzhang/single_cell_eb.

The authors declare no competing interests.

Peer review information: Nature Communications thanks Jay West and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Attached Files

Published - s41467-020-14482-y.pdf

Supplemental Material - 41467_2020_14482_MOESM1_ESM.pdf

Supplemental Material - 41467_2020_14482_MOESM2_ESM.pdf

Files

Files (16.4 MB)
md5:7f6b7dd248d693bc7d871be00c3f5a64 (13.6 MB)
md5:1b778122ad3354e709992d724f6b8f2a (2.8 MB)
md5:c33dae534211cca2ec30d86c1f60a459 (67.3 kB)

Additional details

Created: August 19, 2023
Modified: October 19, 2023