Fast and accurate single-cell RNA-seq analysis by clustering of transcript-compatibility counts

Creators: Ntranos, Vasilis; Kamath, Govinda M.; Zhang, Jesse M.; Pachter, Lior; Tse, David N.

Abstract

Current approaches to single-cell transcriptomic analysis are computationally intensive and require assay-specific modeling, which limits their scope and generality. We propose a novel method that compares and clusters cells based on their transcript-compatibility read counts rather than on the transcript or gene quantifications used in standard analysis pipelines. In the reanalysis of two landmark yet disparate single-cell RNA-seq datasets, we show that our method is up to two orders of magnitude faster than previous approaches, provides accurate and in some cases improved results, and is directly applicable to data from a wide variety of assays.
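
As a rough illustration of clustering on transcript-compatibility counts (TCCs) rather than on quantified abundances, the sketch below normalizes each cell's TCC vector into a distribution, compares cells pairwise, and groups cells whose distributions are close. The abstract does not name a distance or a clustering algorithm, so the Jensen-Shannon divergence, the threshold-based grouping, the toy counts, and all function names here are assumptions made for illustration, not the authors' implementation.

```python
import math


def normalize(counts):
    """Turn one cell's raw TCC vector into a probability distribution."""
    total = sum(counts)
    if total == 0:
        raise ValueError("cell has no reads")
    return [c / total for c in counts]


def kl(p, q):
    """Kullback-Leibler divergence in bits; zero-probability terms of p are skipped."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence (assumed distance; ranges from 0 to 1 in bits)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def cluster(tcc, threshold):
    """Group cells whose pairwise divergence is below `threshold`.

    Uses connected components via union-find as a simple stand-in
    for a real clustering algorithm (hypothetical choice).
    """
    dists = [normalize(c) for c in tcc]
    n = len(dists)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if js_divergence(dists[i], dists[j]) < threshold:
                parent[find(j)] = find(i)

    # Relabel component roots as consecutive integers in order of appearance.
    remap = {}
    return [remap.setdefault(find(i), len(remap)) for i in range(n)]


# Toy data: rows are cells, columns are equivalence classes (made-up counts).
toy_tcc = [
    [10, 0, 5, 0],
    [20, 0, 10, 0],
    [0, 8, 0, 4],
    [0, 16, 1, 8],
]
labels = cluster(toy_tcc, threshold=0.1)
print(labels)
```

In this toy input the first two cells share one read distribution over equivalence classes and the last two share another, so they fall into two groups without any transcript or gene quantification step.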

Additional Information

© 2016 Ntranos et al. Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Received: 24 February 2016; Accepted: 29 April 2016; Published: 26 May 2016.

Availability of data and materials: The code used to generate the results presented in this paper is available online on GitHub [49]. All sequencing reads for the Zeisel et al. dataset [7] are available through Gene Expression Omnibus [GEO:GSE60361] and for the Trapnell et al. dataset [12] through [GEO:GSE52529]. The method is publicly available on GitHub (https://github.com/govinda-kamath/clustering_on_transcript_compatibility_counts) under the MIT license.

Ethics: No ethics approval was required for this study.

Acknowledgments: We thank Páll Melsted for implementing the pseudo command in kallisto, which allows direct output of transcript-compatibility counts via pseudoalignment. We would also like to thank Bo Li, Allon Wagner, and Nir Yosef for useful discussions about single-cell RNA-seq assays and their biases.

Competing interests: The authors declare that they have no competing interests.

Authors' contributions: VN, GMK, and JZ conceived the idea of clustering without quantification, performed analyses of data, analyzed and interpreted results, and wrote the manuscript. DNT and LP interpreted results, supervised the project, and wrote the manuscript. All authors read and approved the final manuscript.

Funding: GMK and JZ are supported by the Center for Science of Information, an NSF Science and Technology Center, under grant agreement CCF-0939370. VN is supported in part by the Center for Science of Information and in part by a gift from Qualcomm Inc. LP is supported in part by the National Human Genome Research Institute of the National Institutes of Health under award number R01HG006129. DNT is supported in part by the Center for Science of Information and in part by the National Human Genome Research Institute of the National Institutes of Health under award number R01HG008164.

Attached Files

Published - 13059_2016_Article_970.pdf

Submitted - 036863.full.pdf

Supplemental Material - 13059_2016_970_MOESM1_ESM.pdf

Files

Files (65.2 MB)

Name	MD5	Size
036863.full.pdf	122ecc707c9611fb32240e6f2d529835	37.9 MB
13059_2016_Article_970.pdf	bff2992f6baf46fa6a22f8cf2d2832e0	3.8 MB
13059_2016_970_MOESM1_ESM.pdf	4ecdc3075cfba445a7a816dd141a58a5	23.5 MB
