Dark Matter Matters: AI Makes DNA Dark Matter Useful

What is the minimal description that captures a space? Asking a mathematician’s basic question of a biological dataset reveals interesting answers about biology itself. This summarizes our underlying approach to subtyping hematological cancer.

Disease subtyping, a central tenet of precision medicine, is the challenging task of identifying and classifying patients with similar presentations of a complex and intricate disease. Accurate subtyping can guide better-informed treatment options for a given individual.

Today, a patient’s data can be collected from a multitude of perspectives (modes): genomic/DNA, transcriptomic/RNA, proteomic, histopathologic images, radiographic and other images, electronic medical records that include a plethora of readouts over time, and much more. Given the current state of our understanding of human diseases, more is indeed more when it comes to data modalities.

The specialized AI algorithm (ReVeal) cleanly separates the subtypes, shown in distinct colors (top right, as opposed to bottom left). The portions of DNA used by ReVeal are the dark-matter regions, shown as black segments on the 22 autosomes.

However, understanding how a given type of data can help answer a specific question is an intriguing problem. Because most human diseases are complicated and heterogeneous, accurately subtyping a disease from data can open up a range of treatment options in a clinical setting. For example, a therapy with strong side effects could be justified if data could predict that a patient is likely to decline rapidly without treatment.

Today, IBM Research and the Munich Leukemia Laboratory are publishing new research in PLOS Computational Biology that aims to subtype different hematological (blood) cancers based on omic data, that is, information about the roles, relationships and actions of the various types of molecules that make up an organism’s cells. In this case, we looked specifically at the human genome, including both non-dark DNA and dark matter DNA. Very conservatively speaking, we currently know nothing at all about 50 percent of the human genome, called the “dark matter”, much like our very limited understanding of the dark matter of our universe.¹

Because the subtypes of a given cancer share the same tumor cells of origin, molecular subtyping is harder. We took our analysis further by asking whether DNA alone (not RNA or proteins) provides adequate information to subtype these closely related cancers. Our analysis led to two breakthroughs in this space:

DNA alone contains enough signal to subtype blood cancers: DNA is considered the blueprint of the organism. It encodes genes, and regions outside of genes play direct or indirect roles in turning genes on and off.
“Dark matter” DNA plays a much larger role than previously thought in influencing the phenotype of cells and tissues: Our research found that dark matter DNA alone is adequate for subtyping the cancer. This overturns the general belief that dark matter lies largely outside the functional or consequential realm, and shows that it deserves more study.
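To make the idea of subtyping from dark matter DNA alone more concrete, here is a minimal sketch, assuming each patient's DNA is summarized as a list of variant positions and the dark-matter regions are given as a precomputed list of fixed-width genomic bins. The bin width, the bin list and the function name bin_counts are hypothetical choices for illustration; this is not the featurization used by ReVeal.

```python
from collections import Counter

# Hypothetical bin width in base pairs; not a value taken from the study.
BIN_SIZE = 1_000_000


def bin_counts(variants, dark_bins, bin_size=BIN_SIZE):
    """Count one patient's variants per genomic bin, keeping only dark-matter bins.

    variants:  iterable of (chromosome, position) pairs for a single patient.
    dark_bins: ordered list of (chromosome, bin_index) pairs assumed to be
               "dark matter" (a hypothetical, precomputed annotation).
    Returns a feature vector aligned with dark_bins; variants falling in any
    other bin are simply ignored.
    """
    counts = Counter((chrom, pos // bin_size) for chrom, pos in variants)
    # Counter returns 0 for bins with no variants.
    return [counts[b] for b in dark_bins]


# Usage: three hypothetical dark-matter bins and four variants for one patient.
dark = [("chr1", 0), ("chr1", 2), ("chr7", 5)]
patient = [("chr1", 10), ("chr1", 999_999), ("chr1", 1_500_000), ("chr7", 5_200_000)]
print(bin_counts(patient, dark))  # [2, 0, 1]
```

The variant at chr1:1,500,000 falls in bin 1, which is not in the dark-matter list, so it does not contribute. Stacking such vectors for many patients gives a matrix that any classifier can consume.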

The off-the-shelf AI algorithms that we used for this problem were inadequate, underscoring the importance of domain-specific nuances in the statistical learning process. We designed a stochastic regularization AI model specifically for DNA data to address the confounding heterogeneity in these datasets. The model also works well for other phenotypes, including treatment responses, which suggests a molecular basis for those phenotypes.
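The post does not describe ReVeal's internals, so the sketch below uses a swapped-in, generic technique to illustrate what stochastic regularization can look like: a random-subspace ensemble of nearest-centroid classifiers. Each ensemble member sees only a random subset of features (similar in spirit to dropout), so no single noisy feature dominates the prediction. The function fit_predict and its parameters n_models, frac and seed are hypothetical.

```python
import random
from collections import Counter


def centroid(rows):
    """Mean feature vector of a list of equal-length rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sq_dist(a, b, idx):
    """Squared Euclidean distance restricted to the feature indices in idx."""
    return sum((a[i] - b[i]) ** 2 for i in idx)


def fit_predict(train_x, train_y, test_x, n_models=25, frac=0.5, seed=0):
    """Random-subspace ensemble of nearest-centroid classifiers (illustrative only).

    Each member draws a random subset of features and assigns every test
    sample to the class whose centroid is closest on that subset; the final
    label is the majority vote across members.
    """
    rng = random.Random(seed)
    n_feat = len(train_x[0])
    k = max(1, int(frac * n_feat))
    labels = sorted(set(train_y))
    # Per-class centroids on all features; each member looks at a subset.
    cents = {
        lab: centroid([x for x, y in zip(train_x, train_y) if y == lab])
        for lab in labels
    }
    votes = [Counter() for _ in test_x]
    for _ in range(n_models):
        idx = rng.sample(range(n_feat), k)  # random feature subset
        for v, x in zip(votes, test_x):
            best = min(labels, key=lambda lab: sq_dist(x, cents[lab], idx))
            v[best] += 1
    return [v.most_common(1)[0][0] for v in votes]


# Usage: two toy "subtypes" with four features each.
train_x = [[0, 0, 1, 0], [1, 0, 0, 0], [10, 9, 10, 10], [9, 10, 10, 10]]
train_y = ["A", "A", "B", "B"]
print(fit_predict(train_x, train_y, [[0, 1, 0, 0], [10, 10, 9, 10]]))  # ['A', 'B']
```

Averaging over many random feature subsets trades a little bias for lower variance, which is one common way to cope with heterogeneous, noisy features. ReVeal's actual regularization scheme may differ substantially.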

Using the AI models we designed, which we named ReVeal, we achieved 75 percent accuracy in identifying blood cancers using either non-dark DNA or dark matter DNA, compared with just 35 percent accuracy using standard AI methods.¹

These results and the models we created lay the groundwork for further exploring the significance of dark matter DNA in blood cancers, and potentially in other types of cancer.


References

Parida, L. et al. Dark-matter matters: Discriminating subtle blood cancers using the darkest DNA. PLOS Computational Biology vol. 15 e1007332 (2019).
