Computers in Biology and Medicine

The effect of machine learning explanations on user trust for automated diagnosis of COVID-19



Recent years have seen deep neural networks (DNNs) gain widespread acceptance for a range of computer vision tasks, including medical imaging. Motivated by their performance, multiple studies have designed deep convolutional neural network architectures tailored to detect COVID-19 cases from chest computed tomography (CT) images. However, a fundamental limitation of DNN models is their inability to explain the reasoning behind a diagnosis. Explainability is essential for medical diagnosis, where understanding the reason for a decision is as important as the decision itself. A variety of algorithms have been proposed that generate explanations and aim to enhance users' trust in DNN models. Yet the influence of these machine learning explanations on clinicians' trust in complex healthcare decision tasks is not well understood. This study evaluates the quality of explanations generated for a deep learning model that detects COVID-19 from CT images and examines how the quality of these explanations influences clinicians' trust. First, we collect radiologist-annotated explanations of the CT images for the diagnosis of COVID-19 to create the ground truth. We then compare these ground truth explanations with the machine learning explanations. Our evaluation shows that the explanations produced by different algorithms were often correct (high precision) when compared to the radiologist-annotated ground truth, but a significant number of explanations were missed (significantly lower recall). We further conduct a controlled experiment to study the influence of machine learning explanations on clinicians' trust in the diagnosis of COVID-19. Our findings show that while clinicians' trust in the automated diagnosis increases with the explanations, their reliance on the diagnosis decreases, as clinicians are less likely to rely on algorithms that are not close to human judgement. Clinicians want higher recall of the explanations for a better understanding of an automated diagnosis system.
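The precision/recall comparison between machine-generated and radiologist-annotated explanations can be illustrated with a small sketch. The abstract does not state the granularity at which explanations were matched, so this example assumes each explanation is a binary pixel mask of the same shape (1 = region marked as evidence for COVID-19). The helper `explanation_precision_recall` and the toy masks are hypothetical and are not the authors' implementation.

```python
from __future__ import annotations

from typing import Sequence

# Assumption: a 2-D binary mask where 1 marks a pixel flagged as explanatory.
Mask = Sequence[Sequence[int]]


def explanation_precision_recall(predicted: Mask, ground_truth: Mask) -> tuple[float, float]:
    """Hypothetical helper: pixel-wise precision and recall of a machine-generated
    explanation mask against a radiologist-annotated ground-truth mask."""
    if len(predicted) != len(ground_truth) or any(
        len(p_row) != len(g_row) for p_row, g_row in zip(predicted, ground_truth)
    ):
        raise ValueError("masks must have the same shape")

    tp = fp = fn = 0
    for p_row, g_row in zip(predicted, ground_truth):
        for p, g in zip(p_row, g_row):
            if p and g:
                tp += 1  # explanation agrees with the radiologist
            elif p:
                fp += 1  # explanation highlights a region the radiologist did not
            elif g:
                fn += 1  # radiologist-marked region the explanation missed

    # Precision: how much of the highlighted area is correct.
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: how much of the annotated evidence the explanation recovers.
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

For example, if the ground truth marks two 2x2 lesions on a 4x4 image and the explanation highlights only three pixels inside the first lesion, precision is 1.0 while recall is 3/8 = 0.375. This is the same "often correct but incomplete" pattern the evaluation reports, in toy form.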