Reviewed by Lexie Corner, Jun 24 2024
Researchers at Michigan State University are creating a new artificial intelligence (AI) program, TopoFormer, that has the potential to expedite and lower the cost of medication development. TopoFormer is a software application that utilizes the same technology as ChatGPT. It enables traditional computer models to provide more accurate predictions of a drug's interaction with its biological target by incorporating crucial data that the models were previously unable to use. The study has been published in the journal Nature Machine Intelligence.
TopoFormer has been developed by an interdisciplinary group led by Guowei Wei, a Michigan State University Research Foundation Professor in the Department of Mathematics. It extends the capacity of standard AI-based drug-interaction models to predict potential drug efficacy by converting three-dimensional molecular information into data that the models can use.
With AI, you could make drug discovery faster, more efficient, and cheaper.
Guowei Wei, Department of Mathematics, Michigan State University
Wei also holds appointments in the Department of Biochemistry and Molecular Biology and the Department of Electrical and Computer Engineering.
Instructions for Structure
According to Wei, developing a single drug in the US takes about ten years and costs about $2 billion. About half of that time is spent testing the drug through trials, and the other half is spent finding a new therapeutic candidate to test.
TopoFormer can reduce the time needed for development. This can cut development expenses, eventually lowering the drug’s price for patients. This could be especially beneficial for rare diseases, where the limited number of patients often necessitates higher drug prices to recover costs.
Even though researchers currently use computer models to assist in the drug discovery process, the numerous variables involved present limitations.
In our body, we have over 20,000 proteins. When a disease comes up, some or one of those is targeted.
Guowei Wei, Department of Mathematics, Michigan State University
The first step is finding out which protein or proteins are affected by a disease. Researchers then use those proteins as targets when looking for compounds that can lessen, stop, or otherwise mitigate the effects of the illness.
When I have a target, I try to find a lot of potential drugs for that particular target.
Guowei Wei, Department of Mathematics, Michigan State University
Once they know which proteins to target with a drug, scientists can feed molecular sequences from the proteins and candidate drugs into traditional computer models. By forecasting interactions between drugs and targets, the models inform decisions about which drugs to develop and test in clinical trials.
These models can forecast certain interactions solely from the drug's and protein's chemical composition. However, they fail to account for crucial interactions resulting from molecular shape and three-dimensional, or 3D, structure.
One example is the drug ibuprofen, which chemists discovered in the 1960s. Ibuprofen exists as two distinct molecules that share the same chemical sequence but have slightly different three-dimensional structures. Only one of the two configurations can attach to the proteins linked to pain and relieve a headache.
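The point about shape can be illustrated with a toy calculation. The sketch below is not taken from the study: it uses made-up coordinates and hypothetical labels (A to D) for four groups around a central atom, and shows that a molecule and its mirror image can have the same parts and the same pairwise distances while differing in handedness, which is the kind of information a composition-only description cannot see.

```python
import math
from itertools import combinations

# Hypothetical positions of four different groups around a central atom.
# The labels and coordinates are illustrative only, not real ibuprofen data.
form_a = {
    "A": (1.0, 1.0, 1.0),
    "B": (1.0, -1.0, -1.0),
    "C": (-1.0, 1.0, -1.0),
    "D": (-1.0, -1.0, 1.0),
}


def mirror(points):
    """Reflect every point through the x = 0 plane."""
    return {name: (-x, y, z) for name, (x, y, z) in points.items()}


def sorted_distances(points):
    """All pairwise distances, sorted; a shape summary that ignores handedness."""
    coords = list(points.values())
    return sorted(round(math.dist(p, q), 6) for p, q in combinations(coords, 2))


def signed_volume(points, order=("A", "B", "C", "D")):
    """Determinant of the edge vectors from the first point; its sign flips on reflection."""
    a, b, c, d = (points[name] for name in order)
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    w = [d[i] - a[i] for i in range(3)]
    return (u[0] * (v[1] * w[2] - v[2] * w[1])
            - u[1] * (v[0] * w[2] - v[2] * w[0])
            + u[2] * (v[0] * w[1] - v[1] * w[0]))


form_b = mirror(form_a)

# Same parts and same distances, but opposite handedness.
print(sorted(form_a) == sorted(form_b))
print(sorted_distances(form_a) == sorted_distances(form_b))
print(signed_volume(form_a), signed_volume(form_b))
```

The two forms agree on every check that ignores orientation, and only the signed volume tells them apart, which is why a model needs some form of 3D input to distinguish them.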
Wei said, “Current deep learning models can’t account for the shape of drugs or proteins when predicting how they’ll work together.”
This is where TopoFormer comes in. This artificial intelligence is a transformer model, the same kind that powers ChatGPT, the chatbot from OpenAI (GPT stands for “generative pre-trained transformer”).
In other words, TopoFormer is trained to read data in one format and convert it to another. Here, it takes three-dimensional data about the shapes of proteins and drugs and translates it into a one-dimensional format understandable by existing models.
“Topo” refers to the mathematical tools Wei and his colleagues developed to translate 3D structures into 1D sequences. These tools are known as “topological Laplacians.”
Tens of thousands of protein-drug interactions are used to train the new model, with each interaction between two molecules being recorded as a "word" or a piece of code. The words are combined to describe the drug-protein complex and provide a record of its structure.
Wei said, “In such a way, you have many, many words knitted together like a sentence.”
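The structure-to-sequence idea can be sketched in a few lines. The example below is a heavily simplified stand-in, not TopoFormer's actual method: instead of topological Laplacians, it counts protein-drug contacts and connected pieces of the contact graph at several distance cutoffs. The count of connected pieces equals the number of zero eigenvalues of the graph Laplacian, so it is one small piece of the information such tools capture. All function names, coordinates, and cutoffs are hypothetical.

```python
import math


def contacts(protein, ligand, cutoff):
    """Index pairs (protein atom, ligand atom) no farther apart than `cutoff`."""
    return [(i, j)
            for i, p in enumerate(protein)
            for j, q in enumerate(ligand)
            if math.dist(p, q) <= cutoff]


def count_components(n_nodes, edges):
    """Connected components of a graph, found with union-find."""
    parent = list(range(n_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
    return len({find(x) for x in range(n_nodes)})


def structure_to_sequence(protein, ligand, cutoffs):
    """One 'word' per distance cutoff; the ordered list of words is the 'sentence'."""
    n_protein = len(protein)
    n_total = n_protein + len(ligand)
    sentence = []
    for cutoff in cutoffs:
        pairs = contacts(protein, ligand, cutoff)
        # Ligand atoms are numbered after the protein atoms in the joint graph.
        edges = [(i, n_protein + j) for i, j in pairs]
        word = (len(pairs), count_components(n_total, edges))
        sentence.append(word)
    return sentence


# Made-up 3D coordinates for a two-atom "protein" and a two-atom "drug".
protein = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
ligand = [(1.0, 0.0, 0.0), (4.0, 3.0, 0.0)]
print(structure_to_sequence(protein, ligand, cutoffs=[0.5, 1.5, 3.5]))
```

Each tuple summarizes the complex at one length scale, and the ordered list is a 1D sequence that a sequence model could read. The real model is trained on far richer descriptors than these two counts.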
These sentences can provide additional context to models predicting new drug interactions. If a new drug were a book, TopoFormer could take a basic story concept and develop it into a complete plotline, ready to be written.
Journal Reference:
Chen, D., et al. (2024) Multiscale topology-enabled structure-to-sequence transformer for protein–ligand interaction predictions. Nature Machine Intelligence. doi.org/10.1038/s42256-024-00855-1