Published June 17, 2019 | Submitted + Supplemental Material
Report
Open
Modular and efficient pre-processing of single-cell RNA-seq
Abstract
Analysis of single-cell RNA-seq data begins with the pre-processing of reads to generate count matrices. We investigate algorithm choices for the challenges of pre-processing, and describe a workflow that balances efficiency and accuracy. Our workflow is based on the kallisto and bustools programs, and is near-optimal in speed and memory. The workflow is modular, and we demonstrate its flexibility by showing how it can be used for RNA velocity analyses.
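As a rough illustration of what generating a count matrix involves, the sketch below collapses (barcode, UMI, gene) records into per-cell gene counts by counting distinct UMIs. It is a toy example under stated assumptions, not the algorithm implemented in kallisto or bustools; the record layout and the `count_matrix` function name are hypothetical.

```python
from collections import defaultdict


def count_matrix(records):
    """Collapse (barcode, umi, gene) records into UMI counts per cell and gene.

    Hypothetical record layout: each record is a (barcode, umi, gene) tuple.
    Duplicate records with the same barcode, UMI and gene are counted once.
    """
    # Collect the distinct UMIs seen for every (barcode, gene) pair.
    umis = defaultdict(set)
    for barcode, umi, gene in records:
        umis[(barcode, gene)].add(umi)

    # Fix a row order (barcodes) and a column order (genes).
    barcodes = sorted({b for b, _ in umis})
    genes = sorted({g for _, g in umis})

    # Each matrix entry is the number of distinct UMIs for that cell and gene.
    matrix = [[len(umis.get((b, g), ())) for g in genes] for b in barcodes]
    return barcodes, genes, matrix


if __name__ == "__main__":
    example = [
        ("AAAC", "TTT", "g1"),
        ("AAAC", "TTT", "g1"),  # duplicate read of the same molecule
        ("AAAC", "GGG", "g1"),
        ("CCCA", "TTT", "g2"),
    ]
    print(count_matrix(example))
```

Real workflows also have to handle barcode error correction and reads compatible with more than one gene, which this sketch ignores.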
Additional Information
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

We thank Vasilis Ntranos and Valentine Svensson for helpful suggestions and comments. We thank Jeff Farrell for the Danio rerio gene annotation used to process SRR6956073, John Schiefelbein for the Arabidopsis thaliana gene annotation used to process SRR8257100, Justin Fear for the Drosophila melanogaster gene annotation used to process SRR8513910, and Junhyong Kim and Qin Zhu for the Caenorhabditis elegans gene annotation used to process SRR8611943. The benchmarking work was made possible, in part, thanks to support from the Caltech Bioinformatics Resource Center.

Author Contributions: PM developed the algorithms for bustools and wrote the software. ASB conceived of and performed the UMI and barcode calculations motivating the algorithms. FG implemented and performed the benchmarking procedure, and curated indices for the datasets. EB designed and produced the comparisons between Cell Ranger and kallisto. LL investigated in detail the performance of different workflows on the 10k mouse neuron data and produced the analysis of that dataset. ASB designed the RNA velocity workflow and performed the RNA velocity analyses. KH developed and investigated the effect of, and optimal choice for, reference transcriptome sequences for pseudoalignment. JG interpreted results and helped to supervise the research. ASB planned, organized and made figures. ASB, EB, PM and LP planned the manuscript. ASB and LP wrote the manuscript.
Attached Files
Submitted - 673285.full.pdf
Supplemental Material - media-1.pdf
Supplemental Material - media-2.pdf
Supplemental Material - media-3.xlsx
Supplemental Material - media-4.xlsx
Files (10.1 MB)
Checksum | Size |
---|---|
md5:d718999f32b81a3911580681dd4a8bc9 | 8.2 MB |
md5:4122945fe6e02a6fca7597bb3595f681 | 1.2 MB |
md5:140ae422ecca4180d18c4dda1e97ad33 | 27.1 kB |
md5:1cc3634d9762d497d9c0b405695fe7ee | 46.1 kB |
md5:d496c2534b0d82bce39cceacb5b83c31 | 626.6 kB |
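The md5 values listed above can be used to check that a downloaded file arrived intact. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `md5_of_file` and `matches_listed_checksum` are hypothetical, and which checksum belongs to which file has to be confirmed on the record page itself.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed):
    """Compare a file against a listed value written as 'md5:<hex digest>'."""
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

A mismatch usually means the download was truncated or the wrong file was compared. MD5 is adequate for detecting accidental corruption but is not a defense against deliberate tampering.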
Additional details
- Eprint ID: 96485
- Resolver ID: CaltechAUTHORS:20190617-153352518
- Caltech Bioinformatics Resource Center
- Created: 2019-06-17 (from EPrint's datestamp field)
- Updated: 2021-11-16 (from EPrint's last_modified field)