Sep 4 2019
Scientists are harnessing artificial intelligence (AI) to find genes that cause disease. A KAUST research team is developing a combined deep learning approach that uses data from multiple sources to teach algorithms how to find patterns between genes and diseases.
Machine learning uses statistical models and algorithms to identify patterns and associations in data in order to solve specific problems. Given enough known data, such as tagged images of “Jack,” the system can eventually learn to suggest other, untagged images that contain Jack.
Scientists are using this AI application to discover genes that cause diseases. However, only a limited number of genes have been experimentally verified as causative. This means that researchers do not have much data to feed into their programs to help them learn the patterns that characterize gene-disease associations. They therefore have to be creative in finding ways to teach machine learning algorithms to learn and then search for these patterns.
Database and information management specialist Panagiotis Kalnis, computational bioscientist Xin Gao, and co-workers have developed a deep learning model that, according to them, outperforms existing state-of-the-art approaches.
First, they turned to well-known databases to extract information on gene locations and functions and on how and when genes switch on and off. This information was used to teach algorithms to find genes that function together. Then, they gathered data on the features of genetic diseases from other databases. This trained the algorithms to detect diseases with similar manifestations. The researchers combined these datasets with data on the known associations between 12,231 genes and 3,209 diseases.
The KAUST model takes the patterns learned from how genes network and from the similarities among genetic diseases, and feeds them into a deep learning model known as a graph convolutional network. This produces another set of data that is arranged in matrices, like those used in recommendation systems, to predict gene-disease associations.
The model was able to detect complex, nonlinear associations between genes and diseases, enabling it to go on to predict new associations.
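The general idea described above, passing gene and disease similarity graphs through a graph convolution and then scoring every gene-disease pair in a matrix as a recommendation system would, can be illustrated with a minimal sketch. This is not the KAUST team's implementation: the toy graphs, the one-hot input features, the embedding size of 2, the random untrained weights, and every function name below are assumptions chosen only for illustration, and the symmetric normalization shown is the form commonly used for graph convolutional networks rather than one confirmed by the study.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def identity(n):
    """One-hot node features: an n x n identity matrix (an assumption for this toy example)."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def normalize_adjacency(adj):
    """Add self-loops, then scale each entry by 1 / sqrt(deg_i * deg_j)."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]  # at least 1 because of the self-loop
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(adj, features, weights):
    """One graph convolution: ReLU(A_norm @ X @ W)."""
    h = matmul(matmul(normalize_adjacency(adj), features), weights)
    return [[max(0.0, v) for v in row] for row in h]


def random_matrix(rows, cols, rng):
    """Untrained weights; a real model would learn these from known associations."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def score_pairs(gene_emb, disease_emb):
    """Score every gene-disease pair by the dot product of their embeddings."""
    disease_emb_t = [list(col) for col in zip(*disease_emb)]
    return matmul(gene_emb, disease_emb_t)


# Hypothetical toy graphs: an edge means "these genes function together"
# or "these diseases have similar manifestations".
gene_adj = [
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 0],
]
disease_adj = [
    [0, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]

rng = random.Random(0)
gene_emb = gcn_layer(gene_adj, identity(4), random_matrix(4, 2, rng))
disease_emb = gcn_layer(disease_adj, identity(3), random_matrix(3, 2, rng))

# 4 x 3 matrix: one score per (gene, disease) pair; higher means "more likely associated".
scores = score_pairs(gene_emb, disease_emb)
for row in scores:
    print([round(v, 3) for v in row])
```

Because the weights here are random, the printed scores carry no biological meaning. In a trained model, the weights would be fitted so that known gene-disease pairs score high, and high-scoring unknown pairs would become candidate predictions.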
“By making use of more information, we achieved better accuracy than the state-of-the-art methods currently in use. But even though we outperformed other methods in our experiments, it is still not accurate enough to be applied to industry.”
Peng Han, Study First Author, KAUST
The researchers plan to improve the accuracy of their model by adding more kinds of data. They will also apply the technique to other types of problems where only limited data is available, such as suggesting new locations to visit based on a user’s past preferences.