In a recent study published in Nature Communications, researchers introduced a machine learning (ML)-assisted enzyme engineering platform that integrates cell-free DNA assembly, gene expression, and functional assays.
Applied to amide synthetases, this approach evaluated 1217 variants across 10,953 reactions, successfully predicting optimized enzymes for synthesizing nine pharmaceutical compounds. The ML models significantly enhanced enzyme activity, offering a scalable method for designing specialized biocatalysts.
Background
Engineered enzymes play a crucial role in energy, materials, and medicine by enhancing natural functions or enabling new chemical reactions. However, traditional directed evolution methods are limited by low-throughput screening and restricted sequence exploration, making enzyme optimization challenging.
Computational strategies such as ML and de novo protein design offer solutions, but they require large, high-quality datasets in which genotype-phenotype links are preserved. This study aimed to overcome these limitations by developing an ML-guided, cell-free platform to rapidly map enzyme sequence-function relationships, using high-throughput screening and ridge regression models to optimize an amide synthetase for nine pharmaceutical reactions.
Methods and Procedures
The research combined cell-free DNA assembly, gene expression, and functional assays to study wild-type (wt)-McbA and muGFP in E. coli. Both genes were codon-optimized and cloned into the pJL1 plasmid, with McbA featuring an N-terminal CAT-Strep-Linker fusion (CSL-tag) for purification. The DNA library was generated using PCR, Gibson assembly, and iterative site saturation mutagenesis, with ML predictions refining McbA variants.
Expression was performed in E. coli BL21 Star (DE3) on LB plates with 50 µg/mL kanamycin. Recombinant proteins were purified using Strep-Tactin affinity chromatography, and concentrations determined via NanoDrop. Purified McbA variants were stored at 4 °C or −20 °C.
Functional assays included fluorescence measurements for muGFP and high-throughput amide synthetase activity screening. Reactions were conducted in 384-well plates, incubated at 37 °C, and analyzed using liquid chromatography-mass spectrometry (LC-MS). An ATP regeneration assay incorporating polyphosphate kinase (PPK12) maintained ATP levels. Preparative-scale biosynthesis of moclobemide was also conducted, followed by extraction and purification.
Findings and Analysis
The ML-guided design-build-test-learn (DBTL) workflow improved directed evolution for biocatalysis. Wild-type McbA (wt-McbA) was tested across 1100 reactions to identify its substrate preferences and limitations. McbA efficiently synthesized 16 high-value molecules, including 11 pharmaceuticals, but struggled with certain large or aliphatic substrates.
To accelerate enzyme engineering, researchers used cell-free protein synthesis (CFPS) to generate sequence-function data, eliminating transformation and cloning steps. This enabled rapid construction of sequence-defined mutant libraries for ML model training. By engineering McbA to synthesize moclobemide, metoclopramide, and cinchocaine, key mutations influencing substrate specificity were identified.
A hot-spot screen of 64 residues revealed beneficial mutations, which were then used to train ridge regression models. ML predictions were tested alongside traditional iterative saturation mutagenesis (ISM), producing a quadruple McbA mutant with a 42-fold increase in catalytic efficiency for moclobemide synthesis. Metoclopramide synthesis improved 30-fold, though cinchocaine remained challenging.
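The scale of such a hot-spot screen can be illustrated with a short Python sketch that enumerates every single-residue substitution at a set of positions. The sequence below is a made-up 64-residue stand-in, not the real McbA sequence, and the function name is hypothetical. The assumption that each variant is assayed once per target compound is ours as well. Under those assumptions, 64 positions with 19 substitutions each, plus the wild type, give counts that match the 1217 variants and 10,953 reactions reported above, although the paper's exact accounting may differ.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 canonical amino acids


def site_saturation_library(wild_type, positions):
    """Return labels for every single-residue substitution at 0-based positions."""
    variants = []
    for pos in positions:
        original = wild_type[pos]
        for aa in AMINO_ACIDS:
            if aa != original:
                # Conventional mutation label with 1-based numbering, e.g. "M1A"
                variants.append(f"{original}{pos + 1}{aa}")
    return variants


# Hypothetical 64-residue placeholder sequence (NOT the real McbA sequence).
toy_sequence = "MKTAYIAKQR" * 6 + "MKTA"
hot_spots = range(len(toy_sequence))  # assumption: all 64 positions are hot spots

library = site_saturation_library(toy_sequence, hot_spots)
n_variants = len(library) + 1   # single mutants plus the wild type
n_reactions = n_variants * 9    # assumption: one reaction per variant per compound
print(len(library), n_variants, n_reactions)
```

Because each variant carries an explicit label, the activity measured for it can be tied directly to a defined sequence, which is what makes the data usable for model training.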
Discussion and Insights
This ML-guided, high-throughput protein engineering framework accelerates enzyme design without requiring extensive computational resources. By integrating cell-free expression (CFE), mutagenesis, and ML, the approach streamlines directed evolution campaigns.
Across the 1217 variants explored, the researchers identified 19 residue positions critical for biocatalysis. Engineered variants exhibited activity improvements ranging from 1.6- to 42-fold, with one achieving 96% conversion for moclobemide synthesis.
ML models trained on single mutations successfully predicted higher-order mutants with superior activity, reducing the search space and increasing success rates. While the approach is adaptable to other enzymes, data collection and the ML models may need fine-tuning.
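As a rough illustration of how a model trained only on single mutations can rank higher-order mutants, the following Python sketch fits a ridge regression to one-hot encoded single mutants and scores all double mutants. It is a minimal sketch under stated assumptions, not the authors' pipeline: the mutation labels and log2 fold-change values are invented, the one-hot encoding and the regularization strength `alpha` are our choices, and `one_hot` and `fit_ridge` are hypothetical helper names.

```python
from itertools import combinations

import numpy as np


def one_hot(variant, mutations):
    """Encode a variant (a set of mutation labels) as a 0/1 feature vector."""
    return np.array([1.0 if m in variant else 0.0 for m in mutations])


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    # Solve (Xc^T Xc + alpha * I) w = Xc^T yc
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b


# Hypothetical mutation labels and log2 fold-changes (made-up numbers).
mutations = ["A10G", "L45F", "S88T", "V120I", "D200E"]
log2_fold = {"A10G": 1.2, "L45F": 0.8, "S88T": -0.5, "V120I": 0.3, "D200E": -1.0}

# Training set: the wild type (no mutations) plus each single mutant.
train_variants = [frozenset()] + [frozenset([m]) for m in mutations]
X = np.array([one_hot(v, mutations) for v in train_variants])
y = np.array([0.0] + [log2_fold[m] for m in mutations])

w, b = fit_ridge(X, y, alpha=0.1)

# Score every double mutant, none of which appeared in the training data.
candidates = [frozenset(c) for c in combinations(mutations, 2)]
scores = [float(one_hot(c, mutations) @ w + b) for c in candidates]
ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
best_variant, best_score = ranked[0]
print(sorted(best_variant), round(best_score, 3))
```

A linear model of this kind treats mutation effects as additive on the log scale, so it cannot capture interactions between residues. That limitation is one reason predicted combinations still need to be built and tested experimentally.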
By leveraging CFE and LC-MS screening, researchers shortened enzyme engineering cycles from weeks to days. Combined with de novo design strategies, this method holds promise for pharmaceutical and industrial applications.
Conclusion
This study introduced an ML-guided enzyme engineering platform that integrates cell-free DNA assembly, gene expression, and functional assays to efficiently map protein sequence-function relationships.
Applied to amide synthetases, it evaluated 1217 variants and identified optimized enzymes for pharmaceutical synthesis. The method significantly enhanced enzyme activity, providing a scalable, high-throughput strategy for biocatalyst engineering. With catalytic efficiency improvements of up to 42-fold, this ML-driven workflow holds potential for broad pharmaceutical and industrial applications.
Journal Reference
Landwehr et al., 2025. Accelerated enzyme engineering by machine-learning guided cell-free expression. Nature Communications, 16(1). DOI:10.1038/s41467-024-55399-0 https://www.nature.com/articles/s41467-024-55399-0