Published March 22, 2023 | Supplemental Material + Submitted
Report Open

Pervasive, conserved secondary structure in highly charged protein regions

Abstract

Understanding how protein sequences confer function remains a defining challenge in molecular biology. Two approaches have yielded enormous insight yet are often pursued separately: structure-based, where sequence-encoded structures mediate function, and disorder-based, where sequences dictate physicochemical and dynamical properties which determine function in the absence of stable structure. Here we study highly charged protein regions (>40% charged residues), which are routinely presumed to be disordered. Using recent advances in structure prediction and experimental structures, we show that roughly 40% of these regions form well-structured helices. Features often used to predict disorder—high charge density, low hydrophobicity, low sequence complexity, and evolutionarily varying length—are also compatible with solvated, variable-length helices. We show that a simple composition classifier predicts the existence of structure far better than well-established heuristics based on charge and hydropathy. We show that helical structure is more prevalent than previously appreciated in highly charged regions of diverse proteomes and characterize the conservation of highly charged regions. Our results underscore the importance of integrating, rather than choosing between, structure- and disorder-based approaches.
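The ">40% charged residues" criterion in the abstract can be made concrete with a short sketch. This is not the authors' pipeline (their code is in the linked GitHub repository). It assumes that "charged" means Asp, Glu, Lys, and Arg (histidine excluded), and the function names, the sliding-window scan, and the 12-residue window length are hypothetical choices made purely for illustration.

```python
# Illustrative sketch only; not the method used in the preprint.
# Assumption: charged residues are D, E, K, R (histidine is not counted).
CHARGED = frozenset("DEKR")


def charged_fraction(seq: str) -> float:
    """Fraction of residues in seq that are in the assumed charged set."""
    if not seq:
        return 0.0
    return sum(res in CHARGED for res in seq.upper()) / len(seq)


def highly_charged_windows(seq: str, window: int = 12, threshold: float = 0.40):
    """Return merged half-open (start, end) spans covered by windows whose
    charged fraction is strictly greater than threshold.

    window is a hypothetical parameter; the abstract does not state one.
    """
    seq = seq.upper()
    spans = []
    for start in range(0, len(seq) - window + 1):
        if charged_fraction(seq[start:start + window]) > threshold:
            end = start + window
            if spans and start <= spans[-1][1]:
                # Overlapping or adjacent window: extend the previous span.
                spans[-1] = (spans[-1][0], end)
            else:
                spans.append((start, end))
    return spans


if __name__ == "__main__":
    example = "EKEKEKEKEK" + "A" * 10  # toy sequence, not real data
    print(charged_fraction(example))
    print(highly_charged_windows(example, window=10))
```

A strict inequality is used so that a window at exactly 40% does not qualify, matching the ">40%" wording. Whether the original analysis treats the boundary this way is not stated here.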

Additional Information

The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

C.G.T. is a Damon Runyon Postdoctoral Fellow supported by the Damon Runyon Cancer Research Foundation (DRG-2465-22). R.W.P. acknowledges support from the UChicago Biological Sciences Collegiate Division Summer Fellowship, Liew Family College Research Fellows Fund, and the UChicago Quantitative Biology Summer Fellowship. D.A.D. acknowledges support from the NIH (award numbers GM144278 and GM127406) and the US Army Research Office (W911NF-14-1-0411). A.R.D. acknowledges support from the NIH (award number R35 GM136381). The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.

The authors thank Alex Holehouse for helpful discussions, and Alexander Cope for providing the structure-annotated PDB data.

Author contributions. D.A.D., A.R.D. and C.G.T. developed ideas and direction, R.W.P. and C.G.T. performed analyses, R.W.P., C.G.T. and D.A.D. made figures, and all authors contributed to the text.

Data availability. Data used in this study are from publicly available datasets: AlphaFold protein structure prediction available at https://alphafold.ebi.ac.uk/download#proteomes-section, yeast proteome available from Saccharomyces Genome Database http://sgd-archive.yeastgenome.org/sequence/S288C_reference/orf_protein/, AYbRAH fungal ortholog database available at https://github.com/LMSE/aybrah, and DisProt yeast disordered regions https://www.disprot.org/browse?sort_field=disprot_id&sort_value=asc&page_size=20&page=0&release=current&show_ambiguous=true&show_obsolete=false&ncbi_taxon_id=559292. All additional data generated in this study are available at https://github.com/drummondlab/highly-charged-regions-2022.

Code availability. All analyses and code used to generate the figures in this work can be found at https://github.com/drummondlab/highly-charged-regions-2022.

The authors have declared no competing interest.

Attached Files

Submitted - 2023.02.15.528637v1.full.pdf

Supplemental Material - media-1.pdf

Additional details

Created: August 20, 2023
Modified: November 15, 2023