About cookies on this site Our websites require some cookies to function properly (required). In addition, other cookies may be used with your consent to analyze site usage, improve the user experience and for advertising. For more information, please review your options. By visiting our website, you agree to our processing of information as described in IBM’sprivacy statement. To provide a smooth navigation, your cookie preferences will be shared across the IBM web domains listed here.
Publication
Experimental Biology and Medicine
Paper
Topology and redescriptions detect multiple alternative biological pathways from clinical phenotypes
Abstract
Biological pathways play a crucial role in the properties of diseases and are important in drug discovery. Identifying the logical relationships among distinctive phenotypic clusters could reveal possible connections to the underlying pathways. However, this process is challenging since clinical phenotypes are often available through unstructured electronic health records. Moreover, in the absence of a standardized questionnaire, there could be bias among physicians toward selecting certain medical terms. In this article, we develop an efficient pipeline to address these challenges and help practitioners to reveal the pathways associated with the disease. We use topological data analysis and redescriptions and propose a pipeline of four phases: (1) pre-processing the clinical notes to extract the salient concepts, (2) constructing a feature space of the patients to characterize the extracted concepts, (3) leveraging the topological properties to distill the available knowledge and visualize the extracted features, and finally, (4) investigating the bias in the clinical notes of the selected features and identify possible pathways. Our experiments on a publicly available dataset of COVID-19 clinical notes testify that our pipeline can indeed extract meaningful pathways.