Metadata retrieval from sequence databases with ffq
Abstract
Motivation: Several genomic databases host data and metadata for an ever-growing collection of sequence datasets. While these databases have a shared hierarchical structure, there are no tools specifically designed to leverage it for metadata extraction.

Results: We present a command-line tool, called ffq, for querying user-generated data and metadata from sequence databases. Given an accession or a paper's DOI, ffq efficiently fetches metadata and links to raw data in JSON format. ffq's modularity and simplicity make it extensible to any genomic database exposing its data for programmatic access.

Availability and implementation: ffq is free and open source, and the code is available at https://github.com/pachterlab/ffq.
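Because the abstract states that ffq returns metadata and links to raw data as JSON, a downstream script can collect the links without knowing the exact layout of a record. The following is a minimal sketch in Python, not part of ffq itself. The `iter_urls` helper, the `SAMPLE` record, its `RUN0001` accession, and the assumption that links are stored under a key named `url` are all hypothetical and chosen for illustration; real ffq output should be inspected before relying on any key name.

```python
import json
from typing import Any, Iterator


def iter_urls(node: Any) -> Iterator[str]:
    """Yield every string stored under a "url" key, at any nesting depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "url" and isinstance(value, str):
                yield value
            else:
                # Keep descending: links may sit several levels down.
                yield from iter_urls(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_urls(item)


# Hypothetical record for illustration only; this is NOT actual ffq output.
SAMPLE = json.loads("""
{
  "RUN0001": {
    "accession": "RUN0001",
    "files": {
      "ftp": [
        {"filename": "RUN0001_1.fastq.gz",
         "url": "ftp://example.org/RUN0001_1.fastq.gz"},
        {"filename": "RUN0001_2.fastq.gz",
         "url": "ftp://example.org/RUN0001_2.fastq.gz"}
      ]
    }
  }
}
""")

urls = list(iter_urls(SAMPLE))
for url in urls:
    print(url)
```

Walking the structure recursively, rather than indexing fixed paths, keeps the sketch usable if records nest their file entries differently across databases.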
Additional Information
© The Author(s) 2023. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

This work was motivated by the need to obtain metadata for Booeshaghi and Pachter (2020). We thank Ali Mortazavi for his suggestion to include ffq querying of the ENCODE database and Anders Goncalves da Silva, Andrea Telatin, Laura Luebbert and Phil Ewels for their contributions to the code base. This work was supported in part by the National Institutes of Health (NIH) [U19MH114830].

Data availability: All data and code associated with this manuscript are available at https://github.com/pachterlab/ffq.

Conflict of Interest: none declared.

Attached Files
Published - btac667.pdf
Files
Name | Size |
---|---|
md5:28d847f26f23e2cf0729d82f72d203cd | 1.6 MB |
Additional details
- PMCID: PMC9883619
- Eprint ID: 120722
- Resolver ID: CaltechAUTHORS:20230411-694477200.2
- NIH: U19MH114830
- Created: 2023-06-14 (created from EPrint's datestamp field)
- Updated: 2023-06-14 (created from EPrint's last_modified field)
- Caltech groups: Division of Biology and Biological Engineering (BBE)