Published June 17, 2019 | Submitted + Supplemental Material
Report | Open access

Modular and efficient pre-processing of single-cell RNA-seq

Abstract

Analysis of single-cell RNA-seq data begins with the pre-processing of reads to generate count matrices. We investigate algorithm choices for the challenges of pre-processing, and describe a workflow that balances efficiency and accuracy. Our workflow is based on the kallisto and bustools programs, and is near-optimal in speed and memory. The workflow is modular, and we demonstrate its flexibility by showing how it can be used for RNA velocity analyses.

Additional Information

The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

We thank Vasilis Ntranos and Valentine Svensson for helpful suggestions and comments. We thank Jeff Farrell for the Danio rerio gene annotation used to process SRR6956073, John Schiefelbein for the Arabidopsis thaliana gene annotation used to process SRR8257100, Justin Fear for the Drosophila melanogaster gene annotation used to process SRR8513910, and Junhyong Kim and Qin Zhu for the Caenorhabditis elegans gene annotation used to process SRR8611943. The benchmarking work was made possible, in part, by support from the Caltech Bioinformatics Resource Center.

Author Contributions: PM developed the algorithms for bustools and wrote the software. ASB conceived of and performed the UMI and barcode calculations motivating the algorithms. FG implemented and performed the benchmarking procedure, and curated indices for the datasets. EB designed and produced the comparisons between Cell Ranger and kallisto. LL investigated in detail the performance of different workflows on the 10k mouse neuron data and produced the analysis of that dataset. ASB designed the RNA velocity workflow and performed the RNA velocity analyses. KH developed and investigated the effect of, and optimal choice for, reference transcriptome sequences for pseudoalignment. JG interpreted results and helped to supervise the research. ASB planned, organized and made the figures. ASB, EB, PM and LP planned the manuscript. ASB and LP wrote the manuscript.

Attached Files

Submitted - 673285.full.pdf

Supplemental Material - media-1.pdf

Supplemental Material - media-2.pdf

Supplemental Material - media-3.xlsx

Supplemental Material - media-4.xlsx

Files (10.1 MB total)

md5:d718999f32b81a3911580681dd4a8bc9 | 8.2 MB
md5:4122945fe6e02a6fca7597bb3595f681 | 1.2 MB
md5:140ae422ecca4180d18c4dda1e97ad33 | 27.1 kB
md5:1cc3634d9762d497d9c0b405695fe7ee | 46.1 kB
md5:d496c2534b0d82bce39cceacb5b83c31 | 626.6 kB

Additional details

Created: August 19, 2023
Modified: October 20, 2023