
New Machine Learning Model Handles Technical Variability to Accelerate Microbiome-Based Cancer Detection
In the rapidly evolving field of microbiome research, a significant challenge is the variability in data produced by different laboratories. Slightly different experimental procedures can drastically change these sensitive measurements, which hampers the reproducibility and generalizability of findings and poses obstacles to developing reliable diagnostic tools and therapeutic strategies. To address this issue, a team led by Columbia researcher Tal Korem, PhD, has introduced DEBIAS-M, a novel machine-learning-based computational model designed to correct processing biases in microbiome studies, enhancing cross-study integration and predictive accuracy.
"Microbiome research holds incredible potential for early disease detection and personalized medicine," Korem explains. "But the variability introduced by different lab protocols has made it difficult to translate findings into real-world applications. DEBIAS-M is our solution to that problem."
Microbiome profiling involves complex procedures, including sample collection, DNA extraction, sequencing, and data analysis. Each step can introduce biases. For instance, certain DNA extraction kits may be more efficient for Gram-positive than for Gram-negative bacteria, leading to skewed representations of microbial populations. Such biases complicate the comparison of results across different studies, making it difficult to develop broadly applicable microbiome-based models.
DEBIAS-M (Domain adaptation with phenotype Estimation and Batch Integration Across Studies of the Microbiome) uses computational methods to "learn" bias-correction factors for each microbe within each batch of samples. Combining machine learning techniques with statistical modeling, Korem and his team trained the model on large collections of publicly available microbiome datasets to estimate each protocol’s processing bias for each microbe. By correcting for these biases, the model simultaneously minimizes batch effects and enhances the detection of true associations between microbial profiles and clinical phenotypes. This approach allows researchers to integrate data from multiple studies more effectively, leading to more accurate and generalizable predictive models.
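To make the idea of per-microbe, per-batch correction factors concrete, here is a minimal sketch in Python. It is not the DEBIAS-M package or its API, and it does not learn anything: it assumes each protocol distorts abundances by a fixed multiplicative efficiency per microbe, and it assumes the matching correction factors are already known. The function name `apply_bias_correction`, the simulated data, and the efficiency values are all hypothetical and chosen only for illustration.

```python
import numpy as np

def apply_bias_correction(abundances, batch_ids, log_factors):
    """Rescale each microbe by its batch-specific factor, then renormalize.

    abundances  : (n_samples, n_taxa) relative abundances
    batch_ids   : (n_samples,) integer batch label per sample
    log_factors : (n_batches, n_taxa) log correction factors (assumed known here)
    """
    abundances = np.asarray(abundances, dtype=float)
    weights = np.exp(log_factors[batch_ids])      # one row of factors per sample
    corrected = abundances * weights
    return corrected / corrected.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
n_taxa = 4
true_profiles = rng.dirichlet(np.ones(n_taxa), size=6)   # simulated "true" communities
batch_ids = np.array([0, 0, 0, 1, 1, 1])                 # two labs, three samples each

# Hypothetical per-protocol measurement efficiencies for each microbe.
efficiency = np.array([[1.0, 0.5, 2.0, 1.0],
                       [0.25, 1.0, 1.0, 4.0]])

# What each lab would observe: distorted abundances, renormalized to sum to 1.
observed = true_profiles * efficiency[batch_ids]
observed /= observed.sum(axis=1, keepdims=True)

# Assumption: the correction factors are exactly the inverse efficiencies.
log_factors = -np.log(efficiency)
recovered = apply_bias_correction(observed, batch_ids, log_factors)

print(np.allclose(recovered, true_profiles))
```

In this toy setting the corrected profiles match the simulated true ones because the factors were supplied by hand. According to the description above, DEBIAS-M instead estimates such factors from the data, so that batch effects shrink and associations with clinical phenotypes become easier to detect.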
"One of the biggest limitations in microbiome research has been the lack of reproducibility across studies," says Korem, an assistant professor of systems biology and member of the Herbert Irving Comprehensive Cancer Center. "We needed a way to separate real biological signals from artifacts introduced by different lab protocols. DEBIAS-M helps us do just that."
Korem and his team then applied DEBIAS-M to several publicly available datasets, focusing on those related to HIV, colorectal cancer, and cervical neoplasia. They demonstrated that DEBIAS-M outperformed traditional batch-correction methods, such as ComBat and voom-SNM, in improving the predictive performance of microbiome-based models. Notably, the bias-correction factors inferred by DEBIAS-M were found to be stable, interpretable, and closely associated with specific experimental protocols. This differs from previous methods, which often change the data in ways that are hard for researchers to understand or explain.
DEBIAS-M has further implications for diagnostic applications, particularly in early cancer detection. "Microbiome signatures are increasingly being explored as biomarkers for disease," Korem explained. "If we can make these signatures more reliable across studies and populations, we can move closer to real-world applications, like microbiome-based diagnostics for early cancer detection."
Some of these potential applications are already underway. Korem recently published a study on the vaginal microbiome that provided insights into the microbial factors influencing preeclampsia, potentially paving the way for new screening tools. Korem and his colleagues are also pursuing research into the role of the salivary microbiome in Barrett’s esophagus, a precursor to esophageal cancer. These studies underscore the growing recognition that microbial communities may serve as early indicators of disease.
The hope is to develop early detection tools that could get patients treatment when it is most effective. Korem and his team have made DEBIAS-M available as an open-source package, aiming to accelerate its adoption as a standard tool in microbiome data analysis. By making it accessible to all researchers, they hope to promote greater collaboration and consistency across studies.
"The long-term goal is to integrate microbiome-based diagnostics into routine clinical care," Korem said. "We’re still in the early stages, but with methods like DEBIAS-M, we are making significant strides toward that future."
Additional Information
Tal Korem, PhD, is an Assistant Professor in the Departments of Systems Biology and Obstetrics & Gynecology. He is a member of the Tumor Biology and Microenvironment program at the Herbert Irving Comprehensive Cancer Center and Columbia’s Program for Mathematical Genomics (PMG).
The study, “Processing-bias correction with DEBIAS-M improves cross-study generalization of microbiome-based prediction models,” was published March 27th, 2025 in Nature Microbiology.