Reviewed by Alex SmithFeb 3 2022
Data is not only the answer to various questions in the field of business but also in the field of biomedical research.
To create prevention strategies or new therapies for diseases, researchers require a large quantity of data that is of better quality as well as fast. However, quality is frequently very variable, and the incorporation of different data sets is often quite impossible.
The Computational Health Center at Helmholtz Munich, one of the largest research centers for artificial intelligence in medical science in Europe is currently being set up under the guidance of Fabian Theis.
In close collaboration with the Technical University of Munich (TUM), more than one hundred researchers are harnessing artificial intelligence and machine learning to discover solutions to exactly these issues, thus supporting medical innovations for a healthier society. In an issue of the journal Nature Methods, they showcase three articles with innovative new solutions.
It’s been a crazy 4 weeks, with many of our scientific stories and methods coming to fruition in that same time window. Our research groups focus on using single-cell genomics to understand the origin of disease in a mechanistic fashion – for this we leverage and develop machine learning approaches to better represent this complex data.
Fabian Theis, Head of Computational Health Center, Helmholtz Munich
Fabian Theis is also a professor for Mathematical Modeling of Biological Systems at TUM.
In the three new papers, we worked on single-cell data integration, trajectory learning, and spatial resolution, respectively. Besides the applications shown in the papers, we expect to support the next generation of single-cell research towards disease understanding.
Fabian Theis, Head of Computational Health Center, Helmholtz Munich
The following paragraphs are the most recent solutions formulated by Helmholtz Munich and TUM scientists:
Solving the Data Integration Challenge
To check whether an observation made in a single dataset can be universal, one can check whether the same can be seen in other datasets of the same system.
In single-cell data, so-called batch effects tend to complicate combining datasets in this way. These are differences in the molecular profiles between samples as they were produced at a different time, from a different person, or in a different place.
Conquering these effects is a key challenge in single-cell genomics with over 50 recommended solutions. But which is the best one? A team of scientists around Malte Lücken meticulously curated 86 datasets and compared 16 of the most prevalent data integration techniques on 13 tasks.
After more than 55,000 hours of computation time and a thorough evaluation of 590 results, they put together a guide for enhanced data integration. This facilitates enhanced observations on disease processes spanning datasets at a population scale.
Predicting Cell States with Open-Source Software
Numerous questions in biology spin around continuous processes such as development or regeneration. For any cell in that kind of process, single-cell RNA-sequencing measures gene expression.
The technique, however, is harmful to cells and researchers acquire only static snapshots. Thus, numerous algorithms have been created to rebuild continuous processes from snapshots of gene expression.
A standard limitation: These algorithms cannot offer anything about the direction of the process.
To surpass this limitation, Marius Lange and colleagues formulated a new algorithm termed as CellRank. It approximates directed cell-state trajectories by integrating earlier reconstruction methods with RNA velocity, a theory to estimate gene upregulation or downregulation.
In in-vitro and in-vivo applications, CellRank accurately deduced fate outcomes and recovered formerly known genes. In a lung regeneration example, CellRank projected unique intermediate cell states on a dedifferentiation trajectory whose presence was verified experimentally.
CellRank is an open-source software package that is already used globally by biologists and bioinformaticians to examine multifaceted cellular dynamics in situations like cancer, regeneration or reprogramming.
Visualizing Spatial Omics Analysis
The last few years have witnessed an increasing development of technologies to compute gene expression variation in tissue. The benefit of such technologies is that researchers can observe cells in their context, thus being able to examine concepts of tissue organization and cellular communication.
Scientists require adaptable computational frameworks in order to store, integrate, and visualize the increasing diversity of such data. To handle this challenge, Giovanni Palla, Hannah Spitzer, and colleagues created a new computational framework, termed Squidpy. It allows developers and analysts to handle spatial gene expression data.
Squidpy combines tools for gene expression and image analysis to competently work and interactively visualize spatial omics data. Squidpy is extensible and can be interfaced with several machine learning tools in the python ecosystem. Researchers worldwide are already using it to examine spatial molecular data.
Journal References:
Luecken, M. D., et al. (2022) Benchmarking atlas-level data integration in single-cell genomics. Nature Methods. doi.org/10.6084/m9.figshare.12420968.v7.
Lange, M., et al. (2022) CellRank for directed single-cell fate mapping. Nature Methods. doi.org/10.1038/s41592-021-01346-6.
Palla, G., et al. (2022) Squidpy: a scalable framework for spatial omics analysis. Nature Methods. doi.org/10.1038/s41592-021-01358-2.