Published June 24, 2019 | Submitted + Supplemental Material
Report Open

Regular Architecture (RegArch): A standard expression language for describing protein architectures

Abstract

Domain architecture, the arrangement of features in a protein, exhibits syntactic patterns similar to the grammar of a language. This property enables pattern mining for protein function prediction, comparative genomics, and studies of molecular evolution and complexity. To facilitate such work, we propose Regular Architecture (RegArch), an expression language for describing syntactic patterns in protein architectures. Like the well-known Regular Expressions for text, RegArchs codify positional and non-positional patterns of elements, here as nested JSON objects. We describe the standard and provide a reference implementation in JavaScript that parses RegArchs and matches them against annotated proteins.

Additional Information

The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license. bioRxiv preprint first posted online Jun. 22, 2019. The authors thank the developers of MiST3, Luke Ulrich, Vadim Gumerov, and Ogun Adebali, for helpful suggestions and comments on the manuscript; Igor Zhulin for discussions on protein domain architectures that led to the idea of Regular Architecture; and Dr. Catherine M. Oikonomou for helpful discussion and suggestions on the manuscript. This work was made possible through the support of the National Institutes of Health (grant R35 GM122588 to G.J.J.) and the John Templeton Foundation as part of the Boundaries of Life Initiative (grants 51250 & 60973 to G.J.J.).

Attached Files

Submitted - 679910.full.pdf

Supplemental Material - media-1.pdf

Files

Total size: 394.3 kB
md5:317a9f022daf189eb48a2b257570681f (166.5 kB)
md5:eb3fa4fb158a26dd7f7977dc6c4dfa89 (227.7 kB)

Additional details

Created: August 19, 2023
Modified: October 20, 2023