A statistical model for improved membrane protein expression using sequence-derived features

Creators: Saladi, Shyam M.; Javed, Nauman; Müller, Axel; Clemons, William M.

Abstract

The heterologous expression of integral membrane proteins (IMPs) remains a major bottleneck in the characterization of this important protein class. IMP expression levels are currently unpredictable, which renders the pursuit of IMPs for structural and biophysical characterization challenging and inefficient. Experimental evidence demonstrates that changes within the nucleotide or amino-acid sequence for a given IMP can dramatically affect expression levels; yet these observations have not resulted in generalizable approaches to improve expression levels. Here, we develop a data-driven statistical predictor named IMProve that, using only sequence information, increases the likelihood of selecting an IMP that expresses in E. coli. The IMProve model, trained on experimental data, combines a set of sequence-derived features resulting in an IMProve score, where higher values indicate a higher probability of success. The model is rigorously validated against a variety of independent datasets that contain a wide range of experimental outcomes from various IMP expression trials. The results demonstrate that use of the model can more than double the number of successfully expressed targets at any experimental scale. IMProve can immediately be used to identify favorable targets for characterization. Most notably, IMProve demonstrates for the first time that IMP expression levels can be predicted directly from sequence.

Additional Information

© 2018 American Society for Biochemistry and Molecular Biology, Inc. Published under license by The American Society for Biochemistry and Molecular Biology, Inc. Received November 22, 2017. Accepted January 29, 2018.

We thank Daniel Daley and Thomas Miller's group for discussion, Yaser Abu-Mostafa and Yisong Yue for guidance regarding machine learning, Niles Pierce for providing NUPACK source code (33), Welison Floriano and Naveed Near-Ansari for maintaining local computing resources, and Samuel Schulte for suggesting the model's name. We thank Michiel Niesen, Stephen Marshall, Thomas Miller, Reid van Lehn, James Bowie, and Tom Rapoport for comments on the manuscript. Models and analyses are possible thanks to raw experimental data provided by Daniel Daley and Mikaela Rapp (20); Nir Fluman (29); Edda Kloppmann, Brian Kloss, and Marco Punta from NYCOMPS (2, 3); Pikyee Ma (46); Renaud Wagner (49); Florent Bernaudat (53); and Constance Jeffrey (47).

We acknowledge funding from an NIH Pioneer Award to WMC (5DP1GM105385); a Benjamin M. Rosen graduate fellowship, an NIH/NRSA training grant (5T32GM07616), and an NSF Graduate Research fellowship to SMS; and an Arthur A. Noyes Summer Undergraduate Research Fellowship to NJ. Computational time was provided by Stephen Mayo and Douglas Rees. This material is based upon work supported by the National Science Foundation Graduate Research Fellowship Program under Grant No. 1144469. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation. This work used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by National Science Foundation grant number ACI-1053575 (108).

The authors declare that they have no conflicts of interest with the contents of this article.

Author Contributions: S.M.S., A.M., and W.M.C. conceived the project. S.M.S. developed the approach. S.M.S., A.M., and N.J. compiled sequence and experimental data. N.J. created code to demonstrate feasibility. S.M.S. performed all published calculations. S.M.S. and W.M.C. wrote the manuscript.

Attached Files

Published - J._Biol._Chem.-2018-Saladi-4913-27.pdf

Supplemental Material - 134046_1_supp_57854_p2qtl9__1_.xlsx

Supplemental Material - 134046_1_supp_57863_p2q6g6.docx

Supplemental Material - 134046_1_supp_57869_p2qggg.pdf

Supplemental Material - 134046_1_supp_57870_p2qcgb.pdf

Supplemental Material - 134046_1_supp_57871_p2qmgb.pdf

Supplemental Material - 134046_1_supp_57872_p2qmgb.pdf

Supplemental Material - 134046_1_supp_57873_p2qmgb.pdf

Files

Name	Size
134046_1_supp_57863_p2q6g6.docx md5:e5a936546b6dea10ef541a7bbc83ca33	233.7 kB
134046_1_supp_57854_p2qtl9__1_.xlsx md5:4b6993765ff556bb8406ee04b69f279f	51.3 kB
134046_1_supp_57870_p2qcgb.pdf md5:a96c509c8d8bba63cd159ceafa373021	720.2 kB
134046_1_supp_57872_p2qmgb.pdf md5:2aed4feb3d9e05875e9940fdae945140	126.8 kB
134046_1_supp_57871_p2qmgb.pdf md5:652eb9086ad48a470a566b1ea48354c2	272.0 kB
134046_1_supp_57873_p2qmgb.pdf md5:675ec0dfe391fa77d781c9f79688d6bc	68.8 kB
134046_1_supp_57869_p2qggg.pdf md5:0c558a68c3ef3b5d7c6e7f7e8ec1b44f	555.3 kB
J._Biol._Chem.-2018-Saladi-4913-27.pdf md5:d8b35f94ff471c91d479bccc5f414993	2.1 MB
