Dzung Phan, Vinicius Lima
INFORMS 2023
Public scientific databases (e.g., GEO, GWAS Catalog, PubMed) contain extensive knowledge about biomedical entities and the relations among them, accumulated by scientists through decades of experiments and analyses. This abundance of information often makes it difficult for scientists to formulate hypotheses and to find answers to them by searching for and reading the relevant literature. To support these efforts, we introduce GENET, an AI-enhanced visual analytics workflow that helps domain scientists visually and interactively explore biomedical entity networks extracted from databases, the scientific literature, and other sources. The workflow consists of four steps: 1) biological network analysis: identify genes/SNPs of interest that are associated with a given target disease, using a neural network trained to predict links between diseases and genes/SNPs via surrogate genes/SNPs; 2) literature evidence mining: starting from the biological entities of interest, extract entity relations from the literature and label entity types and relation types using large language models; 3) pre-processing: generate entity embeddings with pre-trained biomedical language models (e.g., BioBERT, BioLinkBERT), then cluster the entities and relations; 4) interactive visualization: display the biomedical entity networks and provide interactive handles for exploring them. The workflow enables users to formulate hypotheses, check them against evidence from the scientific literature and databases, and gain insights through interactive visualizations. In this talk, we introduce the system and demonstrate a use case.
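The abstract does not say how steps 2 to 4 are implemented. The Python sketch below is only a rough illustration under stated assumptions. It assumes step 2 yields (head, relation, tail) triples and step 3 yields one vector per entity. The triples, the 2-D toy embeddings (stand-ins for BioBERT/BioLinkBERT vectors), the choice of k-means (the abstract names no clustering algorithm), and every function name are hypothetical. The sketch builds an undirected entity network, clusters the entities, and groups a node's neighbors by cluster, which is the kind of view an interactive explorer might expose.

```python
import math
from collections import defaultdict

# Hypothetical step-2 output: (head, relation, tail) triples such as an
# LLM-based extractor might return. Names and relations are illustrative only.
TRIPLES = [
    ("BRCA1", "associated_with", "breast cancer"),
    ("BRCA1", "associated_with", "ovarian cancer"),
    ("TP53", "associated_with", "breast cancer"),
    ("TP53", "interacts_with", "MDM2"),
    ("SNP_A", "located_in", "BRCA1"),
]

# Hypothetical step-3 embeddings. Real language-model vectors are much
# higher-dimensional; these 2-D toy vectors only stand in for them.
EMBEDDINGS = {
    "BRCA1": (0.90, 0.10),
    "TP53": (0.80, 0.20),
    "MDM2": (0.85, 0.15),
    "SNP_A": (0.70, 0.30),
    "breast cancer": (0.10, 0.90),
    "ovarian cancer": (0.15, 0.85),
}


def build_network(triples):
    """Undirected entity network: node -> {neighbor: relation label}."""
    network = defaultdict(dict)
    for head, relation, tail in triples:
        network[head][tail] = relation
        network[tail][head] = relation
    return network


def kmeans(points, k, iters=20):
    """Plain k-means with a deterministic farthest-first initialisation."""
    names = sorted(points)
    centroids = [points[names[0]]]
    while len(centroids) < k:
        # Next centroid: the point farthest from all centroids chosen so far.
        farthest = max(names, key=lambda n: min(math.dist(points[n], c) for c in centroids))
        centroids.append(points[farthest])
    labels = {}
    for _ in range(iters):
        # Assignment step: each entity goes to its nearest centroid.
        labels = {
            n: min(range(k), key=lambda i: math.dist(points[n], centroids[i]))
            for n in names
        }
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for i in range(k):
            members = [points[n] for n in names if labels[n] == i]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels


def neighbors_by_cluster(network, labels, node):
    """Group a node's neighbors by cluster, e.g. to drive an interactive view."""
    grouped = defaultdict(list)
    for neighbor in sorted(network[node]):
        grouped[labels[neighbor]].append(neighbor)
    return dict(grouped)


if __name__ == "__main__":
    network = build_network(TRIPLES)
    labels = kmeans(EMBEDDINGS, k=2)
    print(labels)
    print(neighbors_by_cluster(network, labels, "BRCA1"))
```

With these toy vectors the genes and the SNP fall in one cluster and the two diseases in the other, so expanding "BRCA1" shows its SNP neighbor separately from its disease neighbors. In a real deployment the embeddings would come from a pre-trained model and the clustering method and number of clusters would need to be chosen for the data.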