However, understanding how a particular type of data can help answer a specific question is an intriguing problem. Because most human diseases are complicated and heterogeneous, using data to accurately subtype a disease can open up a wide range of treatment options in a clinical setting. For example, a therapy with strong side effects could be justified if data could predict the likelihood of a patient's rapid decline without treatment.
Today, IBM Research and the Munich Leukemia Laboratory are publishing new research in PLOS Computational Biology that aims to subtype different hematological (blood) cancers based on omic data, that is, information about the roles, relationships and actions of the various types of molecules that make up the cells of an organism. In this case, we looked specifically at elements of the human genome, including both well-characterized DNA and "dark matter" DNA. Very conservatively speaking, we currently know nothing at all about 50 percent of the human genome, the so-called "dark matter", much as our understanding of the dark matter of our universe is very limited.¹
Because the subtypes of a given cancer share the same tumor cells of origin, molecular subtyping is especially hard. We took our analysis further by asking whether DNA alone (not RNA or proteins) provides enough information to subtype these closely related cancers. Our findings led to two breakthroughs in this space:
The off-the-shelf AI algorithms we first applied to this problem proved inadequate, underscoring the importance of domain-specific nuances in the statistical learning process. We therefore designed a stochastic regularization AI model, specifically for DNA data, to address the confounding heterogeneity in these datasets. The model also works well for other phenotypes, including treatment response, which suggests a molecular basis for those phenotypes.
Using the AI models we designed, which we named ReVeal, we achieved 75 percent accuracy in identifying blood cancers from either non-dark DNA or dark matter DNA, compared with just 35 percent accuracy using standard AI methods.¹
These results and the models we created lay the groundwork for further exploring the significance of dark matter DNA in blood cancers, and potentially in other types of cancer.