Although E. coli has been studied extensively, the functions of 30% of its proteins remain unknown. To address this, researchers used artificial intelligence (AI) to identify 464 types of enzymes among these uncharacterized proteins. Three of the proteins predicted by the AI were then validated through in vitro enzyme assays.
On the 24th, 2023, KAIST (Korea Advanced Institute of Science and Technology) announced a breakthrough by a joint research team, including Gi Bae Kim, Ji Yeon Kim, Dr Jong An Lee, and Distinguished Professor Sang Yup Lee from the Department of Chemical and Biomolecular Engineering at KAIST, along with Dr Charles J. Norsigian and Professor Bernhard O. Palsson from the Department of Bioengineering at UCSD.
The team developed DeepECtransformer, an AI capable of predicting enzyme functions from protein sequences. This AI-based prediction system allows for the rapid and accurate identification of enzyme functions.
Enzymes play a crucial role in catalyzing biological reactions, and understanding the function of each enzyme is vital for comprehending the diverse chemical reactions and metabolic characteristics within living organisms.
The Enzyme Commission (EC) number, a classification system devised by the International Union of Biochemistry and Molecular Biology, categorizes enzymes by the reactions they catalyze, and knowing which EC numbers a genome encodes aids in understanding the metabolic traits of an organism. However, current technologies lack the ability to swiftly analyze enzymes and their corresponding EC numbers in a genome.
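To make the classification concrete: an EC number has four dot-separated fields, running from the broad enzyme class down to a serial number for the specific enzyme. The short Python sketch below parses one. The names `ECNumber`, `parse_ec` and `class_name` are illustrative only and do not come from the study.

```python
from typing import NamedTuple

# The seven top-level EC classes in the IUBMB nomenclature.
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
    7: "Translocases",
}


class ECNumber(NamedTuple):
    enzyme_class: int   # broad reaction type (1-7)
    subclass: int       # second level of the hierarchy
    sub_subclass: int   # third level of the hierarchy
    serial: int         # serial number of the specific enzyme


def parse_ec(text: str) -> ECNumber:
    """Parse a fully specified EC number such as 'EC 1.1.1.1'."""
    cleaned = text.strip()
    if cleaned.upper().startswith("EC"):
        cleaned = cleaned[2:].strip()
    parts = cleaned.split(".")
    if len(parts) != 4 or not all(p.isdigit() for p in parts):
        raise ValueError(f"not a fully specified EC number: {text!r}")
    number = ECNumber(*(int(p) for p in parts))
    if number.enzyme_class not in EC_CLASSES:
        raise ValueError(f"unknown EC class in {text!r}")
    return number


def class_name(number: ECNumber) -> str:
    """Return the name of the top-level class for an EC number."""
    return EC_CLASSES[number.enzyme_class]


if __name__ == "__main__":
    ec = parse_ec("EC 1.1.1.1")  # alcohol dehydrogenase
    print(ec, class_name(ec))
```

Partially specified entries such as `1.1.1.-` are rejected here for simplicity; a real annotation pipeline would probably need to accept them.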
While various deep learning-based methodologies have been developed for analyzing biological sequences and predicting protein functions, many suffer from the "black box" problem, where the AI’s inference process cannot be interpreted. Previous AI-driven prediction systems for enzyme functions have also been reported, but they neither address the black box issue nor provide a fine-grained interpretation of the reasoning process.
The joint research team tackled this challenge by creating DeepECtransformer, an AI incorporating deep learning and a protein homology analysis module for predicting enzyme functions.
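The article does not say how the deep learning model and the homology module are combined. One plausible arrangement, shown here purely as an assumption, is to trust the neural network when it returns a prediction and to fall back to a homology search otherwise. In the Python sketch below, `annotate`, `toy_network` and `toy_homology` are hypothetical names, and the toy rules have no biological meaning.

```python
from typing import Callable, List, Tuple

# A predictor takes a protein sequence and returns zero or more EC numbers.
Predictor = Callable[[str], List[str]]


def annotate(
    sequence: str,
    predict_with_network: Predictor,
    search_homologs: Predictor,
) -> Tuple[List[str], str]:
    """Return (EC numbers, source), trying the network before homology.

    Assumption: the homology search is used only when the network
    returns no prediction. The source article does not confirm this.
    """
    ec_numbers = predict_with_network(sequence)
    if ec_numbers:
        return list(ec_numbers), "neural_network"
    ec_numbers = search_homologs(sequence)
    if ec_numbers:
        return list(ec_numbers), "homology"
    return [], "unannotated"


def toy_network(sequence: str) -> List[str]:
    """Stand-in for a trained model; the rule is arbitrary."""
    return ["1.1.1.1"] if sequence.startswith("MK") else []


def toy_homology(sequence: str) -> List[str]:
    """Stand-in for a sequence-similarity search; the rule is arbitrary."""
    return ["3.4.21.4"] if "GDSGGP" in sequence else []


if __name__ == "__main__":
    for seq in ("MKAAA", "AAGDSGGPAA", "AAAA"):
        print(seq, annotate(seq, toy_network, toy_homology))
```

Passing the two predictors in as plain callables keeps the fallback logic separate from whichever model or alignment tool is actually used.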
The transformer architecture, commonly used in natural language processing, was applied to extract features related to enzyme function from the entire protein sequence. This approach enabled the accurate prediction of EC numbers, and DeepECtransformer can predict a total of 5,360 EC numbers.
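Applying a language model to proteins starts with treating each amino acid as a token, much as a word is treated in a sentence. The Python sketch below shows one simple way to do that. The vocabulary layout, the padding scheme and the `encode` function are assumptions made for illustration, not the encoding used by DeepECtransformer.

```python
from typing import List

# The 20 standard amino acids, in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

PAD = 0  # fills unused positions so that every input has the same length
UNK = 1  # stands for any letter outside the 20 standard amino acids

# Standard residues receive the IDs 2..21.
TOKEN_IDS = {aa: i + 2 for i, aa in enumerate(AMINO_ACIDS)}


def encode(sequence: str, max_len: int) -> List[int]:
    """Convert a protein sequence into a fixed-length list of token IDs.

    Sequences longer than max_len are truncated; shorter ones are
    right-padded with PAD.
    """
    ids = [TOKEN_IDS.get(aa, UNK) for aa in sequence.upper()[:max_len]]
    return ids + [PAD] * (max_len - len(ids))


if __name__ == "__main__":
    print(encode("ACX", 5))
```

A transformer would then map each token ID to a learned vector and let every position attend to every other position in the sequence.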
An in-depth analysis of the transformer architecture revealed that, during the inference process, the AI relies on information about catalytic active sites and cofactor binding sites—essential elements for enzyme function.
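The kind of inspection described here can be illustrated with scaled dot-product attention, the core operation of a transformer: each residue receives a weight, and the positions with the largest weights are the ones the model draws on most. The Python sketch below uses made-up two-dimensional vectors. The functions `attention_weights` and `top_positions` are hypothetical and are not the analysis code from the study.

```python
import math
from typing import List, Sequence


def attention_weights(
    query: Sequence[float], keys: Sequence[Sequence[float]]
) -> List[float]:
    """Scaled dot-product attention weights of one query over all keys."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    # Subtract the maximum score before exponentiating for numerical stability.
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_positions(weights: Sequence[float], n: int) -> List[int]:
    """Indices of the n residues with the largest attention weights."""
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    return order[:n]


if __name__ == "__main__":
    # Toy vectors with no biological meaning: one key per residue.
    query = [1.0, 0.0]
    keys = [[0.1, 0.3], [2.0, 0.0], [0.2, 0.1], [1.5, 0.5]]
    weights = attention_weights(query, keys)
    print(top_positions(weights, 2))
```

In a trained model, the highly weighted positions could then be compared with annotated active sites or cofactor binding sites to see whether they coincide.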
By delving into the black box of DeepECtransformer, the researchers confirmed that the AI autonomously identified features crucial for enzyme function during the learning process.
“By utilizing the prediction system we developed, we were able to predict the functions of enzymes that had not yet been identified and verify them experimentally.”
Gi Bae Kim, Study First Author, Korea Advanced Institute of Science & Technology
Kim added, “By using DeepECtransformer to identify previously unknown enzymes in living organisms, we will be able to more accurately analyze various facets involved in the metabolic processes of organisms, such as the enzymes needed to biosynthesize various useful compounds or the enzymes needed to biodegrade plastics.”
“DeepECtransformer, which quickly and accurately predicts enzyme functions, is a key technology in functional genomics, enabling us to analyze the function of entire enzymes at the systems level. We will be able to use it to develop eco-friendly microbial factories based on comprehensive genome-scale metabolic models, potentially minimizing missing information of metabolism.”
Sang Yup Lee, Professor, Korea Advanced Institute of Science & Technology
The research team's work on DeepECtransformer is detailed in a paper authored by Gi Bae Kim, Professor Sang Yup Lee of the Department of Chemical and Biomolecular Engineering at KAIST, and their colleagues.
The study received financial support from the “Development of next-generation biorefinery platform technologies for leading bio-based chemicals industry project (2022M3J5A1056072)” and the “Development of platform technologies of microbial cell factories for the next-generation biorefineries project (2022M3J5A1056117)” funded by the National Research Foundation and backed by the Korean Ministry of Science and ICT. Distinguished Professor Sang Yup Lee from KAIST led these projects.
Journal Reference:
Kim, G. B., et al. (2023) Functional annotation of enzyme-encoding genes using deep learning with transformer layers. Nature Communications. doi.org/10.1038/s41467-023-43216-z