Published March 22, 2023 | Submitted
Report Open

A machine-readable specification for genomics assays

Abstract

Understanding the structure of sequenced fragments from genomics libraries is essential for accurate read preprocessing. Currently, different assays and sequencing technologies require custom scripts and programs that do not leverage the common structure of sequence elements present in genomics libraries. We present seqspec, a machine-readable specification for libraries produced by genomics assays that facilitates standardization of preprocessing and enables tracking and comparison of genomics assays. The specification and the associated seqspec command line tool are available at https://github.com/IGVF/seqspec.
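To illustrate what a machine-readable description of read structure can look like, here is a minimal Python sketch. It is not the seqspec format and does not use the seqspec command line tool; the `Region` class, the `extract` helper, and the 16-base barcode followed by a 12-base UMI are hypothetical choices made only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """One sequence element in a read (hypothetical, not the seqspec schema)."""
    name: str
    length: int  # fixed length in bases


# Hypothetical library layout: a barcode followed by a UMI at the start of the read.
LAYOUT = [Region("barcode", 16), Region("umi", 12)]


def extract(read, layout):
    """Slice a read into named elements according to the declared layout."""
    total = sum(region.length for region in layout)
    if len(read) < total:
        raise ValueError(f"read has {len(read)} bases, layout needs {total}")
    elements, start = {}, 0
    for region in layout:
        elements[region.name] = read[start:start + region.length]
        start += region.length
    return elements


# Made-up read used only to exercise the sketch.
read = "ACGTACGTACGTACGT" + "TTTTGGGGCCCC"
print(extract(read, LAYOUT))
```

Because the layout is data rather than code, the same `extract` function could in principle serve a different assay by swapping in another list of regions, which is the kind of reuse the abstract attributes to a shared specification.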

Additional Information

The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

We thank Delaney Sullivan for helpful discussions and Rahma Elsiesy for helpful feedback on Figure 1. Discussions with the Impact of Genomic Variation on Function (IGVF) Single-Cell Focus Group helped to shape some features of seqspec. Thanks to Idan Gabdank for useful feedback on seqspec and for suggesting the md5 checksum. Meichen Fang contributed the sci-RNA-seq3 seqspec.

A.S.B. and L.P. were supported in part by NIH 5UM1HG012077-02.

The authors have declared no competing interest.

Attached Files

Submitted - 2023.03.17.533215v1.full.pdf

Files

2023.03.17.533215v1.full.pdf (860.0 kB)
md5:376ea5cf1be7543d3d44f5b435664d1d
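The md5 value listed for the PDF can be used to confirm that a downloaded copy is intact. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `md5_of_file` and `matches_record` and the local file path are assumptions for illustration, not part of this record or of seqspec.

```python
import hashlib
from pathlib import Path

# Checksum as listed on this record, with the "md5:" prefix stripped.
EXPECTED_MD5 = "376ea5cf1be7543d3d44f5b435664d1d"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path, expected=EXPECTED_MD5):
    """True if the file's MD5 equals the expected hex digest (case-insensitive)."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Hypothetical local path: assumes the PDF was saved in the current directory.
    pdf = Path("2023.03.17.533215v1.full.pdf")
    if pdf.exists():
        print("checksum OK" if matches_record(pdf) else "checksum MISMATCH")
    else:
        print(f"{pdf} not found; download it first")
```

Reading in chunks keeps memory use flat regardless of file size. MD5 is suitable here for detecting accidental corruption during download, not for guarding against deliberate tampering.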

Additional details

Created: August 20, 2023
Modified: December 13, 2023