Regular Architecture (RegArch): A standard expression language for describing protein architectures
- Creators
- Ortega, Davi R.
- Jensen, Grant J.
Abstract
Domain architecture, the arrangement of features in a protein, exhibits syntactic patterns similar to the grammar of a language. This property enables pattern mining for protein function prediction, comparative genomics, and studies of molecular evolution and complexity. To facilitate such work, here we propose Regular Architecture (RegArch), an expression language to describe syntactic patterns in protein architectures. Like the well-known Regular Expressions for text, RegArchs codify positional and non-positional patterns of elements into nested JSON objects. We describe the standard and provide a reference implementation in JavaScript to parse RegArchs and match annotated proteins.
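To make the positional/non-positional distinction concrete, here is a minimal Python sketch. It is not the JavaScript reference implementation described above, and it assumes a protein's architecture has already been reduced to an ordered list of feature names. The dictionary schema (`pos` and `npos` keys, `feature` and `count` fields), the `matches` helper, and the feature labels are hypothetical illustrations, not the published RegArch standard. Positional elements are compiled into an ordinary regular expression over an encoded token string, mirroring the Regular Expression analogy; non-positional elements only check how many times a feature occurs anywhere in the protein.

```python
import re
from collections import Counter

# Hypothetical pattern schema (NOT the published RegArch format):
#   "pos":  ordered elements; "feature" is a name, "^"/"$" anchors, or "." (any feature);
#           optional "count" is a regex quantifier string such as "{2}", "+", "*", "?".
#   "npos": order-free elements; optional "count" is a (min, max) tuple, max=None for unbounded.


def _encode(architecture):
    # Bracket every feature name so a token can never match part of another token.
    # Assumes feature names contain no "<" or ">".
    return "".join(f"<{name}>" for name in architecture)


def _positional_regex(elements):
    parts = []
    for el in elements:
        name = el["feature"]
        if name in ("^", "$"):
            parts.append(name)  # start/end-of-protein anchors map directly to regex anchors
            continue
        token = "<[^<>]+>" if name == "." else "<" + re.escape(name) + ">"
        parts.append("(?:" + token + ")" + el.get("count", ""))
    return re.compile("".join(parts))


def matches(architecture, pattern):
    """Return True if an ordered list of feature names satisfies both parts of the pattern."""
    if not _positional_regex(pattern.get("pos", [])).search(_encode(architecture)):
        return False
    counts = Counter(architecture)
    for el in pattern.get("npos", []):
        low, high = el.get("count", (1, None))
        n = counts[el["feature"]]
        if n < low or (high is not None and n > high):
            return False
    return True


# Illustrative labels: exactly two "TM" regions anywhere, and the protein ends with HAMP then MCPsignal.
pattern = {
    "pos": [{"feature": "HAMP"}, {"feature": "MCPsignal"}, {"feature": "$"}],
    "npos": [{"feature": "TM", "count": (2, 2)}],
}
print(matches(["TM", "TM", "HAMP", "MCPsignal"], pattern))  # True
print(matches(["HAMP", "MCPsignal"], pattern))              # False (no TM regions)
```

Compiling the positional part to `re` gives quantifiers and anchors essentially for free; the cost of this particular encoding is that feature names must avoid the `<` and `>` delimiters.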
Additional Information
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license. bioRxiv preprint first posted online Jun. 22, 2019.

The authors would like to thank the developers of MiST3, Luke Ulrich, Vadim Gumerov, and Ogun Adebali, for helpful suggestions and comments on the manuscript. We would also like to thank Igor Zhulin for discussions on protein domain architectures that led to the idea of Regular Architecture, and Dr. Catherine M. Oikonomou for helpful discussion and suggestions on the manuscript. This work was made possible through the support of the National Institutes of Health (grant R35 GM122588 to G.J.J.) and the John Templeton Foundation as part of the Boundaries of Life Initiative (grants 51250 and 60973 to G.J.J.).

Attached Files
Submitted - 679910.full.pdf
Supplemental Material - media-1.pdf
Files
Checksum | Size |
---|---|
md5:317a9f022daf189eb48a2b257570681f | 166.5 kB |
md5:eb3fa4fb158a26dd7f7977dc6c4dfa89 | 227.7 kB |
Additional details
- Eprint ID: 96667
- Resolver ID: CaltechAUTHORS:20190624-114502121
- Funding: NIH, grant R35 GM122588
- Funding: John Templeton Foundation, grant 51250
- Funding: John Templeton Foundation, grant 60973
- Created: 2019-06-24 (from EPrint's datestamp field)
- Updated: 2021-11-16 (from EPrint's last_modified field)