Reviewed by Lexie Corner, Apr 10 2025
A team of scientists from Portugal and Canada has developed a new tool called ulrb, which uses machine learning to autonomously identify rare microorganisms in microbiome datasets. This approach addresses a key challenge in microbial ecology: distinguishing less common organisms from the dominant ones in natural environments. The study was published in Communications Biology.
The tool was developed through an international collaboration involving CIIMAR, the Faculty of Sciences of the University of Porto, the Institute of Bioengineering and Biosciences (iBB) at the Instituto Superior Técnico of the University of Lisbon, the School of Electrical Engineering and Computer Science (EECS) at the University of Ottawa, and the Faculty of Computer Science at Dalhousie University in Canada.
This development stems from the Ph.D. project of CIIMAR student Francisco Pascoal, supervised by CIIMAR researcher Catarina Magalhães and co-supervised by researchers Rodrigo Costa (iBB) and Paula Branco (EECS). The new software is expected to improve the accuracy and depth of ecological analyses of various microbiomes and ecosystems, advancing the understanding of microbial diversity and its role in ecosystem resilience.
What is the Rare Biosphere?
Microbial communities usually show a pattern in which only a few species are highly abundant, while the majority are rare and make up the “rare biosphere.” For instance, a liter of seawater may contain thousands of prokaryotic species, but only 2 to 5% of these are abundant. The rest are rare and difficult to detect because of current methodological constraints.
Why Is It So Important to Study Rare Microorganisms?
Despite their low abundance, rare species contribute most of the planet’s genetic diversity and are essential for ecosystem resilience.
Francisco Pascoal explained, “If the most abundant species are threatened by climate change, other rare species can take over and ensure the functions of the microbiome, keeping the ecosystem stable.”
Thus, the rare biosphere plays a crucial role in how ecosystems respond to significant environmental changes, such as those caused by climate change. Studying these rare organisms helps researchers understand ecosystem resilience and reactions to environmental shifts.
What is Innovative About ulrb?
By utilizing unsupervised machine learning techniques, ulrb enables researchers to efficiently and accurately identify rare microorganisms within a community. A notable advantage of this method is its adaptability to various methodological contexts. The algorithm learns the patterns within the data itself, regardless of its origin.
“The possibility of identifying rare microorganisms arose with the development of high-throughput DNA sequencing technologies, but even with this data, it was never clear among peers how to identify rare microorganisms, as they were overshadowed by the abundant ones. Thus, many researchers limited themselves to establishing random levels of abundance, which was an insufficient approach since it was not supported by biological justification. With this new method, we were able to use sequencing data to automatically distinguish which microorganisms are rare, based on the information provided in each sample.”
Francisco Pascoal, Study First Author and Student, CIIMAR
To automate the identification of rare microorganisms, the researchers developed an algorithm that groups microorganisms based on the similarity of their abundance patterns across different samples. This grouping process, which relies on the relative differences in abundance, is fully automated and can be applied to datasets of any size, providing consistent and reliable results with ecological and biological significance.
“Basically, the algorithm ‘learns’ what the abundance groups in a community are and matches them up with an abundance classification, which makes it possible to distinguish microorganisms that are rare from those that are abundant,” said Pascoal.
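The grouping idea described above can be illustrated with a minimal Python sketch. It is not the ulrb implementation (ulrb is an R package) and does not use its API. The article does not say which clustering method ulrb applies, so the sketch assumes a simple alternating k-medoids iteration on the relative abundances within a single sample, and it assumes three groups labeled "rare", "undetermined", and "abundant". The function names, labels, and read counts are all hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: NOT the ulrb algorithm or API.
# Assumption: cluster one sample's relative abundances into k groups with a
# simple alternating k-medoids iteration, then name the groups by medoid rank.

LABELS = ("rare", "undetermined", "abundant")  # hypothetical class names


def k_medoids_1d(values, k=3, max_iter=100):
    """Return k medoids (sorted ascending) for a list of 1-D values."""
    ordered = sorted(set(values))
    if k < 2 or len(ordered) < k:
        raise ValueError("need k >= 2 and at least k distinct values")
    # Deterministic start: spread initial medoids across the sorted values.
    medoids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    for _ in range(max_iter):
        # Assignment step: each value joins its nearest medoid.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - medoids[j]))
            clusters[nearest].append(v)
        # Update step: the new medoid is the member with the smallest
        # total distance to the other members of its cluster.
        new_medoids = []
        for j, members in enumerate(clusters):
            if not members:  # defensive guard: keep the old medoid
                new_medoids.append(medoids[j])
                continue
            best = min(members, key=lambda c: sum(abs(c - m) for m in members))
            new_medoids.append(best)
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return sorted(medoids)


def classify_sample(counts):
    """Map each taxon in ONE sample to a label, given taxon -> read count."""
    total = sum(counts.values())
    # Relative abundance of every taxon that was actually observed.
    rel = {taxon: c / total for taxon, c in counts.items() if c > 0}
    medoids = k_medoids_1d(list(rel.values()), k=len(LABELS))
    result = {}
    for taxon, abundance in rel.items():
        nearest = min(range(len(medoids)), key=lambda j: abs(abundance - medoids[j]))
        result[taxon] = LABELS[nearest]
    return result


# Made-up read counts for a single sample (not data from the study).
example_counts = {
    "taxon_a": 5000, "taxon_b": 4200,
    "taxon_c": 320, "taxon_d": 250, "taxon_e": 210,
    "taxon_f": 6, "taxon_g": 4, "taxon_h": 3, "taxon_i": 2, "taxon_j": 1,
}
labels = classify_sample(example_counts)
for taxon, label in labels.items():
    print(taxon, label)
```

In this toy sample the boundaries between groups come from the gaps in the data rather than from a cutoff chosen in advance, which is the contrast with fixed abundance thresholds that the researchers describe. Readers who want the actual method should consult the ulrb package and the paper cited below.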
What are the Possible Applications?
The ulrb tool is compatible with data generated from standard microbial ecology methods and could be valuable for studying new diseases and the spread of invasive species. Because it also works with non-microbial data, it can be used to identify animal and plant species at risk in specific contexts, making it a useful tool for environmental monitoring.
ulrb is available as an open-source R package on CRAN and GitHub. The research team has also developed a website with learning resources to help users utilize the tool.
Journal Reference:
Pascoal, F., et al. (2025) Definition of the microbial rare biosphere through unsupervised machine learning. Communications Biology. doi.org/10.1038/s42003-025-07912-4