Published August 21, 2020 | Published + Supplemental Material
Journal Article Open

Signal Peptides Generated by Attention-Based Neural Networks

Abstract

Short (15–30 residue) chains of amino acids at the amino termini of expressed proteins known as signal peptides (SPs) specify secretion in living cells. We trained an attention-based neural network, the Transformer model, on data from all available organisms in Swiss-Prot to generate SP sequences. Experimental testing demonstrates that the model-generated SPs are functional: when appended to enzymes expressed in an industrial Bacillus subtilis strain, the SPs lead to secreted activity that is competitive with industrially used SPs. Additionally, the model-generated SPs are diverse in sequence, sharing as little as 58% sequence identity to the closest known native signal peptide and 73% ± 9% on average.

Additional Information

© 2020 American Chemical Society. This is an open access article published under an ACS AuthorChoice License, which permits copying and redistribution of the article or any adaptations for non-commercial purposes. Received: April 21, 2020; Published: July 10, 2020.

The authors would like to thank Yisong Yue, Taehwan Kim, and other instructors of the Spring 2017 CS159 course at Caltech for initial guidance, and Zheyuan (Steve) Guo and Lucas Schaus for helpful discussions. Additionally, the authors would like to thank the team members of BASF Enzymes for being gracious hosts over the course of this project and Twist Biosciences for providing DNA at educational rates.

Author Contributions: Z.W., K.K.Y., and M.J.L. contributed equally. Z.W., F.H.A., and K.K.Y. conceived and directed this study. K.K.Y., A.L., and Z.W. obtained training data and trained the models. Z.W., M.J.L., and D. Wernick planned the in vivo experimental validation. M.J.L. and A.B. performed the experimental validation. Z.W. analyzed the experimental results. D. Weiner advised the study. Z.W., F.H.A., K.K.Y., and M.J.L. wrote the paper. All authors edited and approved the manuscript.

This work was supported by BASF through the California Research Alliance (CARA), the National Science Foundation Division of Chemical, Bioengineering, Environmental and Transport Systems (CBET-1937902), a National Science Foundation Graduate Fellowship GRF2017227007 (to Z.W.), and through generous research credits provided by Amazon Web Services.

The authors declare the following competing financial interest(s): Provisional patent applications have been filed based on the results presented here.

Notes: The trained Transformer model for generating signal peptides and the data used to train the model will be available at https://github.com/fhalab/SPGen.

Attached Files

Published - acssynbio.0c00219.pdf

Supplemental Material - sb0c00219_si_001.pdf

Supplemental Material - sb0c00219_si_002.xlsx

Files

Files (6.7 MB)

md5:cf9be4d50bdc83eb33cd6e95cbe6450a (3.0 MB)
md5:c2bd4f8f9e2ce36fb0ac2144ce6e7856 (50.5 kB)
md5:c712ed9d81b14eea9e06ac5f616feae6 (3.7 MB)

Additional details

Created: August 19, 2023
Modified: December 22, 2023