Published March 30, 2018 | Published + Supplemental Material
Journal Article | Open Access

A statistical model for improved membrane protein expression using sequence-derived features

Abstract

The heterologous expression of integral membrane proteins (IMPs) remains a major bottleneck in the characterization of this important protein class. IMP expression levels are currently unpredictable, which makes the pursuit of IMPs for structural and biophysical characterization challenging and inefficient. Experimental evidence demonstrates that changes within the nucleotide or amino-acid sequence of a given IMP can dramatically affect expression levels; yet these observations have not resulted in generalizable approaches to improve expression levels. Here, we develop a data-driven statistical predictor named IMProve that, using only sequence information, increases the likelihood of selecting an IMP that expresses in E. coli. The IMProve model, trained on experimental data, combines a set of sequence-derived features into an IMProve score, where higher scores indicate a higher probability of success. The model is rigorously validated against a variety of independent datasets that contain a wide range of experimental outcomes from various IMP expression trials. The results demonstrate that use of the model can more than double the number of successfully expressed targets at any experimental scale. IMProve can immediately be used to identify favorable targets for characterization. Most notably, IMProve demonstrates for the first time that IMP expression levels can be predicted directly from sequence.

Additional Information

© 2018 American Society for Biochemistry and Molecular Biology, Inc. Published under license by The American Society for Biochemistry and Molecular Biology, Inc. Received November 22, 2017. Accepted January 29, 2018.

We thank Daniel Daley and Thomas Miller's group for discussion, Yaser Abu-Mostafa and Yisong Yue for guidance regarding machine learning, Niles Pierce for providing NUPACK source code (33), Welison Floriano and Naveed Near-Ansari for maintaining local computing resources, and Samuel Schulte for suggesting the model's name. We thank Michiel Niesen, Stephen Marshall, Thomas Miller, Reid van Lehn, James Bowie, and Tom Rapoport for comments on the manuscript. Models and analyses were made possible by raw experimental data provided by Daniel Daley and Mikaela Rapp (20); Nir Fluman (29); Edda Kloppmann, Brian Kloss, and Marco Punta from NYCOMPS (2, 3); Pikyee Ma (46); Renaud Wagner (49); Florent Bernaudat (53); and Constance Jeffrey (47).

We acknowledge funding from an NIH Pioneer Award to WMC (5DP1GM105385); a Benjamin M. Rosen graduate fellowship, an NIH/NRSA training grant (5T32GM07616), and an NSF Graduate Research Fellowship to SMS; and an Arthur A. Noyes Summer Undergraduate Research Fellowship to NJ. Computational time was provided by Stephen Mayo and Douglas Rees. This material is based upon work supported by the National Science Foundation Graduate Research Fellowship Program under Grant No. 1144469. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation. This work used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by National Science Foundation grant number ACI-1053575 (108). The authors declare that they have no conflicts of interest with the contents of this article.

Author Contributions: S.M.S., A.M., and W.M.C. conceived the project. S.M.S. developed the approach. S.M.S., A.M., and N.J. compiled sequence and experimental data. N.J. created code to demonstrate feasibility. S.M.S. performed all published calculations. S.M.S. and W.M.C. wrote the manuscript.

Attached Files

Published - J._Biol._Chem.-2018-Saladi-4913-27.pdf

Supplemental Material - 134046_1_supp_57854_p2qtl9__1_.xlsx

Supplemental Material - 134046_1_supp_57863_p2q6g6.docx

Supplemental Material - 134046_1_supp_57869_p2qggg.pdf

Supplemental Material - 134046_1_supp_57870_p2qcgb.pdf

Supplemental Material - 134046_1_supp_57871_p2qmgb.pdf

Supplemental Material - 134046_1_supp_57872_p2qmgb.pdf

Supplemental Material - 134046_1_supp_57873_p2qmgb.pdf

Files (4.1 MB total)

md5:4b6993765ff556bb8406ee04b69f279f  51.3 kB
md5:a96c509c8d8bba63cd159ceafa373021  720.2 kB
md5:2aed4feb3d9e05875e9940fdae945140  126.8 kB
md5:652eb9086ad48a470a566b1ea48354c2  272.0 kB
md5:675ec0dfe391fa77d781c9f79688d6bc  68.8 kB
md5:e5a936546b6dea10ef541a7bbc83ca33  233.7 kB
md5:0c558a68c3ef3b5d7c6e7f7e8ec1b44f  555.3 kB
md5:d8b35f94ff471c91d479bccc5f414993  2.1 MB

Additional details

Created: August 19, 2023
Modified: October 18, 2023