26 Sep 2022
6 minute read

Why now is the time to accelerate discoveries in health care

IBM Research is working with partners to find solutions to important health care problems in multiple ways, starting with decreasing the time it takes to discover new drugs and treatment approaches.

There are few things more fundamental to people than their health. Despite major advances in health care, many serious challenges remain. Developing new ways to treat patients is a complex, lengthy, and expensive process. But as we have seen with the rapid development of treatments during the COVID-19 pandemic, the scientific community can come together to find answers to urgent questions much more rapidly than has been possible in the past.

At IBM Research, we believe that a step change in the way we solve scientific problems is on the horizon. Recent advances in AI, hybrid cloud, high-performance computing, and quantum computing have ushered in a new era of better, faster, and more cost-effective decision-making grounded in the scientific method. We call this accelerated discovery.

As IBM Research focuses on technology-driven drug discovery and development with our partners, we have three main goals. First, we aim to accelerate the discovery of therapeutic molecules and biomarkers by creating and deploying accelerator technologies. Second, we want to validate and scale these technologies by developing an ecosystem of partners. Third, we want to promote technical and scientific excellence through knowledge discovery and open-source science.

Our broad research focus

Our work cuts across many phases of the drug discovery and development process, including discovering new molecular entities, finding ways to repurpose existing drugs, improving the safety and efficacy of drug treatment, discovering novel biomarkers, developing disease staging and progression models, and enhancing clinical trials. Let’s dive into some of these areas in a little more detail.

Discovering new molecular entities 

A “new molecular entity” is essentially a novel chemical structure that has the potential to treat a disease. Creating a new drug often starts with identifying a new molecular entity, but there are two main challenges in doing so. First, we need to be able to accurately search and score the vast space of possible molecules, which we do by using available experimental and computational data. Second, any candidate molecules we uncover must be studied to be sure they are also effective, safe, and easily produced.

We’ve partnered with Cleveland Clinic to test our new molecular entity discovery pipeline. It can classify whether molecules will work for their intended use, generate new molecules, and interpret predictions of their efficacy.

Drug repurposing

Finding new uses for drugs that are already on the market or have passed safety tests can save a great deal of money and time compared with discovering and developing entirely new drugs. To date, finding new use cases has mainly been serendipitous, and recent efforts to systematically identify candidates for repurposing have been hampered by the fact that observational data often yield false-positive findings.

We have developed a drug-repurposing pipeline and validated it with Teva Pharmaceuticals. It is also now being applied in joint research with Cleveland Clinic. First, a data extraction process efficiently retrieves relevant cohorts containing thousands of features and multiple clinically relevant outcomes. The drug repurposing engine then applies IBM’s open-source causal inference library (called causallib) to these cohorts to obtain statistically sound estimates of the effect of each drug on each outcome, filtering the results to obtain statistically significant candidates for further consideration.
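To make the confounding problem concrete, here is a minimal sketch of inverse probability weighting (IPW), one common causal inference technique. It is written in plain Python and does not use causallib or reproduce its API. The cohort, the single binary confounder, and the function names are hypothetical and chosen only for illustration, and the significance filtering step is not shown.

```python
from collections import defaultdict

# Hypothetical toy cohort: one (confounder, treated, outcome) tuple per patient.
# The confounder (e.g. a risk group) drives both who gets the drug and the
# outcome rate, while the drug itself has no effect within either group.
COHORT = (
    [(0, 1, 1), (0, 1, 0)] + [(0, 0, 1)] * 3 + [(0, 0, 0)] * 3
    + [(1, 1, 0)] * 6 + [(1, 0, 0)] * 2
)


def naive_effect(cohort):
    """Difference in mean outcome between treated and untreated patients."""
    treated = [y for _, a, y in cohort if a == 1]
    control = [y for _, a, y in cohort if a == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def ipw_effect(cohort):
    """IPW estimate of the treatment effect, adjusting for the confounder.

    The propensity score is the observed treated fraction in each confounder
    stratum. This assumes every stratum contains both treated and untreated
    patients; otherwise a weight would divide by zero.
    """
    counts = defaultdict(lambda: [0, 0])  # stratum -> [n_total, n_treated]
    for c, a, _ in cohort:
        counts[c][0] += 1
        counts[c][1] += a
    propensity = {c: n_treated / n for c, (n, n_treated) in counts.items()}

    sums = {0: [0.0, 0.0], 1: [0.0, 0.0]}  # arm -> [sum of w*y, sum of w]
    for c, a, y in cohort:
        p_received = propensity[c] if a == 1 else 1.0 - propensity[c]
        weight = 1.0 / p_received
        sums[a][0] += weight * y
        sums[a][1] += weight
    return sums[1][0] / sums[1][1] - sums[0][0] / sums[0][1]


if __name__ == "__main__":
    print("naive estimate:", naive_effect(COHORT))
    print("IPW estimate:  ", ipw_effect(COHORT))
```

In this toy cohort the naive comparison suggests the drug lowers the outcome rate by 0.25, while the weighted estimate is approximately zero. That gap is the kind of false-positive signal that unadjusted observational data can produce.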

Improving drug safety and efficacy 

Models informed by “real-world” patient data generated from routine health care interactions can provide a way to understand disease and drug mechanisms, as well as drug side effects that can’t be replicated in clinical trials. For example, once drugs have been approved for clinical use, it might be possible to better understand how they interact with other drugs, or with other medical conditions a patient may have. In addition, subgroup effects may only appear at this stage, because the drug is used in broader populations than were included in sufficient numbers in clinical trials. Knowing these effects can help improve safety and efficacy, but these models can’t at present exploit all population data to infer population mechanisms, predict performance, and enhance safety. It may take years before enough data is available to detect such issues using current methods.

To address this, we developed Hybrid Modeling as a Service (HMaaS), which scales simulation and AI on the cloud to automate the construction of a quantitative map between drug and disease mechanisms at the population level. Simulation of any pharmacometrics model occurs continuously on the cloud. This allows us to explore parameter spaces, access key features of response, and build a database for surrogate training. Novel AI methods such as generative adversarial networks (GANs) act as inverse surrogates for model parameter inference.
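The simulate, store, and invert loop can be sketched at toy scale. The example below is an illustration under stated assumptions, not HMaaS itself: it uses a textbook one-compartment intravenous bolus model, a small hypothetical parameter grid, and a nearest-neighbour lookup standing in for the GAN-based inverse surrogate. All names and values are invented for the example.

```python
import math


def simulate(k_elim, volume, dose=100.0, times=(1.0, 4.0, 8.0)):
    """One-compartment IV bolus model: C(t) = dose / volume * exp(-k_elim * t)."""
    return tuple(dose / volume * math.exp(-k_elim * t) for t in times)


def build_database(k_grid, v_grid):
    """Sweep the parameter grid and store (parameters, simulated response) pairs."""
    return [((k, v), simulate(k, v)) for k in k_grid for v in v_grid]


def infer_parameters(database, observed):
    """Stand-in inverse surrogate: return the parameters whose simulated
    response is closest (in squared error) to the observed concentrations."""
    def squared_error(entry):
        return sum((sim - obs) ** 2 for sim, obs in zip(entry[1], observed))
    return min(database, key=squared_error)[0]


# Hypothetical grids: elimination rate (1/h) and distribution volume (L).
K_GRID = [0.05 * i for i in range(1, 11)]
V_GRID = [10.0 * i for i in range(1, 6)]
DATABASE = build_database(K_GRID, V_GRID)

if __name__ == "__main__":
    # Pretend measurement: the true response with a 2% upward bias.
    observed = tuple(c * 1.02 for c in simulate(0.2, 30.0))
    print(infer_parameters(DATABASE, observed))
```

A lookup table like this only works for a handful of parameters. A learned inverse surrogate is one way to generalize between simulated points and scale to larger parameter spaces.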

Discovering composite clinical biomarkers 

Modern discovery and development of new patient treatments depend on clinical biomarkers that may indicate risk, progression to disease, and likely response to different treatments. Such biomarkers come from multiple data sources, including clinical records, imaging, and molecular data, and can be combined to identify personalized composite phenotypes. These phenotypes, and the modalities they are extracted from, operate at different biological, contextual, and time scales. Composite biomarkers can therefore incorporate many different factors that contribute to our understanding of disease processes and treatment response.

We’re working with the JDRF, Cleveland Clinic, and other partners to build multimodal representations that can accelerate composite clinical biomarker detection and quantification and improve patient stratification. We engage the scientific community through many venues, including the MICCAI workshop and ISBI challenge, in lively discussions of how rapid innovation in deep learning and machine learning can drive biomarker discovery in medical images, pathology, and more. We also offer FuseMedML, an open-source Python framework for accelerating ML-based discovery in the medical field.

Multimodal representations should lead to better predictions of disease progression and treatment response. Generating them is facilitated by several IBM technologies that generate and integrate modality-specific encodings. These include molecular analyses through deep learning on medical images, tissue images, genomic and clinical data, and multiple data integration and fusion techniques. While biomarkers from modality-specific representations can enhance clinical trials through patient stratification and disease staging, a holistic view of the patient that also encodes the patient journey can accelerate the next generation of discovery by revealing the connections between biomarkers across biological, contextual, and time scales, which can in turn lead to hypotheses for new drug targets.

Disease staging and progression modeling 

One of the main bottlenecks in drug discovery is the high failure rate of clinical trials. It’s challenging to identify the relevant patient populations and therapeutic endpoints, leading to a fragmented understanding of disease progression.  

Working with the Broad Institute, the CHDI Foundation, the JDRF and the Michael J. Fox Foundation, we’re creating a layered approach to computational disease progression modeling. The foundation is a general model development and management workbench. To facilitate extracting salient representations from real-world patient data, we’ve developed computational phenotyping methods. We’ve also created a suite of reusable AI models to help uncover disease stages and progression patterns from complex longitudinal data. Our approach accelerates the discovery process by quantifying disease status, predicting future risk, supporting time-to-event analysis, and visualizing models and data.
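As a small illustration of what time-to-event analysis involves, the sketch below implements the Kaplan-Meier estimator, a standard way to estimate the probability of remaining event-free over time when some patients leave the study before the event occurs. It is a generic textbook method shown under hypothetical data, not a description of the IBM models.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: follow-up time for each patient.
    events:    1 if the event (e.g. disease onset) was observed at that time,
               0 if the patient was censored (left the study event-free).
    """
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(set(durations)):
        observed = sum(1 for d, e in zip(durations, events) if d == t and e)
        leaving = sum(1 for d in durations if d == t)
        if observed:
            # Patients censored at time t still count as at risk at t.
            survival *= 1.0 - observed / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


if __name__ == "__main__":
    # Hypothetical follow-up times in years for five patients.
    durations = [2, 3, 3, 5, 8]
    events = [1, 1, 0, 1, 0]
    for time, prob in kaplan_meier(durations, events):
        print(f"t={time}: estimated event-free probability {prob:.2f}")
```

With these five hypothetical patients, the estimated event-free probability steps down to about 0.80, 0.60, and 0.30 at years 2, 3, and 5. Censored patients contribute to the at-risk count until they leave, which is what distinguishes this from a simple fraction of events.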

An example of how this works is our collaboration with JDRF, which led to a publication earlier this year in The Lancet Diabetes & Endocrinology. The paper describes the next logical step with these developing biomarkers: charting a path towards screening for type 1 diabetes (T1D), particularly in young children with higher genetic or familial risk. This could also help with selecting the right patients for clinical trials.

Clinical trial enhancement 

The success of clinical trials is affected by inefficiencies across the different trial phases, with downstream impacts on participant recruitment, engagement and retention, and adequate representation of diverse study populations. 

Working with Cleveland Clinic and Trinity College Dublin, we’re developing AI and machine-learning strategies that enable new ways to design clinical trials, select participants, and monitor clinical trials while addressing longstanding bottlenecks in clinical trial diversity, recruitment, and retention.

We’re using IBM’s Deep Search platform, as well as ontology and knowledge graph technologies, to address the unmet needs for trial enhancement. Specifically, we use machine learning and natural-language processing to ingest and analyze large amounts of unstructured data (such as scientific literature from PubMed), and a suite of knowledge graph tools to link it with real-world data (like individual records). Our combined technologies enable researchers to create actionable insights for trial enhancements such as engagement and retention. We’ll be validating our tools for predicting engagement and retention using randomized controlled trial data collected as part of the EU’s SEURO project.

We’re working to build AI-enabled processes and capabilities that will help us and our partners overcome longstanding bottlenecks in the discovery and development of novel therapeutics at unprecedented speed, automation, and scale. This is just the beginning of the work we’re undertaking, and we invite you to learn more about our work accelerating discoveries in health care research.


  1. Learn about this work in our research publications here, here, here, and here.
  2. Learn more about this work here and here.
  3. To learn more about this work, read here and here.
  4. Learn more about our research here, here, here, and here.
  5. Read more about these models here, here, here, here, and here.
  6. Learn more about the tools here and here.