Published March 18, 2022 | Submitted + Supplemental Material
Journal Article Open

evSeq: Cost-Effective Amplicon Sequencing of Every Variant in a Protein Library

Abstract

Widespread availability of protein sequence-fitness data would revolutionize both our biochemical understanding of proteins and our ability to engineer them. Unfortunately, even though thousands of protein variants are generated and evaluated for fitness during a typical protein engineering campaign, most are never sequenced, leaving a wealth of potential sequence-fitness information untapped. Primarily, this is because sequencing is unnecessary for many protein engineering strategies; the added cost and effort of sequencing are thus unjustified. It also results from the fact that, even though many lower-cost sequencing strategies have been developed, they often require at least some access to and experience with sequencing or computational resources, both of which can be barriers to access. Here, we present every variant sequencing (evSeq), a method and collection of tools/standardized components for sequencing a variable region within every variant gene produced during a protein engineering campaign at a cost of cents per variant. evSeq was designed to democratize low-cost sequencing for protein engineers and, indeed, anyone interested in engineering biological systems. Execution of its wet-lab component is simple, requires no sequencing experience to perform, relies only on resources and services typically available to biology labs, and slots neatly into existing protein engineering workflows. Analysis of evSeq data is likewise made simple by its accompanying software (found at github.com/fhalab/evSeq, documentation at fhalab.github.io/evSeq), which can be run on a personal laptop and was designed to be accessible to users with no computational experience. Low-cost and easy-to-use, evSeq makes the collection of extensive protein variant sequence-fitness data practical.

Additional Information

© 2022 American Chemical Society. Received: November 24, 2021; Published: February 17, 2022.

The authors thank Shan Li, Adrienne Rollie, and Eric Brustad at Illumina, Inc.: Shan Li and Adrienne Rollie for helping us troubleshoot the evSeq method and Eric Brustad for critical reading of the manuscript. The authors also thank fellow Arnold laboratory members Nathaniel Goldberg and Nicholas Porter for implementing evSeq (which pointed us to necessary improvements), Anders Knight for suggesting and prototyping evSeq software features, Ella Watkins-Dulaney for assistance in building the TrpB libraries, and Sabine Brinkmann-Chen for critical reading of the manuscript.

This work was supported by an Amgen Chem-Bio-Engineering Award (CBEA) and by the NSF Division of Chemical, Bioengineering, Environmental, and Transport Systems (CBET 1937902). This material is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Basic Energy Sciences, under award number DE-SC0022218.

This report was prepared as an account of work sponsored by an agency of the United States Government. Neither the United States Government nor any agency thereof, nor any of their employees, makes any warranty, express or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government or any agency thereof. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof.

Author Contributions: Author contributions are provided using the CRediT taxonomy. B.J.W.: conceptualization, methodology, software, validation, investigation, writing – original draft, writing – review and editing, and funding acquisition. K.E.J.: methodology, data collection, software, investigation, writing – original draft, writing – review and editing, visualization, and funding acquisition. P.J.A.: methodology, data collection, software, validation, investigation, writing – original draft, writing – review and editing, and visualization. F.H.A.: resources, writing – original draft, writing – review and editing, and funding acquisition.

Data Availability: All raw and processed data generated by this study can be found at CaltechData (DOI: 10.22002/D1.2140). The software version used to analyze all data in this study is tagged as v1.0.0 on the associated GitHub repository.

The authors declare no competing financial interest.

Attached Files

Submitted - 2021.11.18.469179v1.full.pdf

Supplemental Material - sb1c00592_si_001.pdf

Files (2.1 MB)

md5:0d566293014163073b3ef6aac9248820 (748.9 kB)
md5:12682c960cad285b57c5e9ef8b52f1e7 (1.4 MB)

Additional details

Created: August 22, 2023
Modified: December 22, 2023