Towards Verifying Results from Biomedical NLP Machine Learning Models Using the UMLS: Cases of Classification and Named Entity Recognition
Machine learning (ML) for biomedical research is one of the fastest-growing research areas today. For NLP specifically, free-text healthcare reports are an important resource whose processing can potentially contribute to patient diagnosis, treatment, and management. However, the inability to explain the outputs of ML algorithms is currently a barrier to the use of these models in a clinical setting. We present a method that uses the ontologies and knowledge bases in the Unified Medical Language System (UMLS) to verify and explain the output of biomedical ML models. Our verifier takes as input the results from an ML model and uses the UMLS to correlate the task results with the model's confidence in each result. We applied this architecture to two tasks using textual cancer pathology reports: ICD-O topography classification and named entity recognition. For the former, we found that the presence of certain entities in a report is inversely related to the model's confidence values; for the latter, we identified categories of errors associated with lower confidence values. Our approach therefore not only verifies the accuracy of ML model results, but also provides explanations that may be used to improve model design and performance.
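The correlation step the abstract describes (relating the presence of UMLS entities in a report to the model's confidence in its prediction for that report) could be sketched as follows. This is a minimal illustrative assumption, not the authors' implementation: reports are represented as sets of UMLS Concept Unique Identifiers (CUIs), and for each CUI we compare mean confidence on reports with vs. without that entity.

```python
from statistics import mean

def confidence_gap_by_entity(reports):
    """reports: list of (set_of_cuis, model_confidence) pairs.

    Returns {cui: mean_conf_with - mean_conf_without}.
    A negative gap suggests the entity's presence is inversely
    related to the model's confidence, as the abstract reports
    for the topography-classification task.
    """
    all_cuis = set().union(*(ents for ents, _ in reports))
    gaps = {}
    for cui in all_cuis:
        conf_with = [c for ents, c in reports if cui in ents]
        conf_without = [c for ents, c in reports if cui not in ents]
        # Only compute a gap when both groups are non-empty.
        if conf_with and conf_without:
            gaps[cui] = mean(conf_with) - mean(conf_without)
    return gaps

# Toy data (CUIs and confidences are invented for illustration):
reports = [
    ({"C0027651", "C0006826"}, 0.62),
    ({"C0027651"}, 0.58),
    ({"C0006826"}, 0.91),
    (set(), 0.88),
]
gaps = confidence_gap_by_entity(reports)
# In this toy data, C0027651 appears only in low-confidence reports,
# so its gap is negative.
```

In practice one would use a statistical association measure (e.g. point-biserial correlation) rather than a raw mean difference, but the mean gap conveys the idea of tying UMLS entity presence to model confidence.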