Results of the 2016 International Skin Imaging Collaboration International Symposium on Biomedical Imaging challenge: Comparison of the accuracy of computer algorithms to dermatologists for the diagnosis of melanoma from dermoscopic images

Michael A. Marchetti; Noel C.F. Codella; Stephen W. Dusza; David A. Gutman; Brian Helba; Aadi Kalloo; Nabin Mishra; Cristina Carrera; M. Emre Celebi; Jennifer L. DeFazio; Natalia Jaimes; Ashfaq A. Marghoob; Elizabeth Quigley; Alon Scope; Oriol Yélamos; Allan C. Halpern

doi:10.1016/j.jaad.2017.08.016

J. Am. Acad. Dermatol.


01 Feb 2018

Abstract

Background: Computer vision may aid in melanoma detection.

Objective: We sought to compare the melanoma diagnostic accuracy of computer algorithms with that of dermatologists using dermoscopic images.

Methods: We conducted a cross-sectional study using 100 randomly selected dermoscopic images (50 melanomas, 44 nevi, and 6 lentigines) from an international computer vision melanoma challenge dataset (n = 379), along with individual algorithm results from 25 teams. We used 5 methods (nonlearned and machine learning) to combine individual automated predictions into "fusion" algorithms. In a companion study, 8 dermatologists classified the lesions in the 100 images as either benign or malignant.

Results: The average sensitivity and specificity of dermatologists in classification were 82% and 59%, respectively. At 82% sensitivity, dermatologist specificity was similar to that of the top challenge algorithm (59% vs. 62%, P = .68) but lower than that of the best-performing fusion algorithm (59% vs. 76%, P = .02). The receiver operating characteristic (ROC) area of the top fusion algorithm was greater than the mean ROC area of dermatologists (0.86 vs. 0.71, P = .001).

Limitations: The dataset lacked the full spectrum of skin lesions encountered in clinical practice, particularly banal lesions. Readers and algorithms were not provided clinical data (eg, age or lesion history/symptoms). Results obtained using our study design cannot be extrapolated to clinical practice.

Conclusion: Deep learning computer vision systems classified melanoma dermoscopy images with accuracy that exceeded some but not all dermatologists.
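The metrics named in the abstract can be illustrated with a minimal sketch: sensitivity and specificity, specificity at a fixed sensitivity, ROC area, and fusion of per-team scores. Several assumptions apply. Averaging each image's malignancy score across teams stands in for a simple nonlearned fusion and is not necessarily one of the study's 5 fusion methods. The function names (`fuse_mean`, `sens_spec`, `specificity_at_sensitivity`, `roc_auc`) and the six-lesion toy scores are hypothetical and are not study data. ROC area is computed with the Mann-Whitney rank formulation. No P values are computed, because the study's statistical tests are not described here.

```python
from typing import List, Sequence, Tuple


def fuse_mean(scores_by_team: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical nonlearned fusion: average each image's score across teams."""
    n_images = len(scores_by_team[0])
    return [sum(team[i] for team in scores_by_team) / len(scores_by_team)
            for i in range(n_images)]


def sens_spec(labels: Sequence[int], preds: Sequence[int]) -> Tuple[float, float]:
    """Sensitivity and specificity of binary calls (1 = melanoma, 0 = benign)."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def specificity_at_sensitivity(labels: Sequence[int], scores: Sequence[float],
                               target: float) -> float:
    """Best specificity over thresholds whose sensitivity is at least `target`."""
    best = 0.0
    for t in sorted(set(scores)):
        # Call a lesion malignant when its score reaches the threshold.
        preds = [1 if s >= t else 0 for s in scores]
        sens, spec = sens_spec(labels, preds)
        if sens >= target:
            best = max(best, spec)
    return best


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC area: chance a random melanoma outscores a random benign lesion (ties = 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data (hypothetical): 3 melanomas followed by 3 benign lesions.
    labels = [1, 1, 1, 0, 0, 0]
    team_a = [0.875, 0.75, 0.375, 0.625, 0.25, 0.125]
    team_b = [0.75, 0.25, 0.875, 0.125, 0.375, 0.5]
    fused = fuse_mean([team_a, team_b])
    for name, s in [("team_a", team_a), ("team_b", team_b), ("fused", fused)]:
        print(name, round(roc_auc(labels, s), 3),
              round(specificity_at_sensitivity(labels, s, 0.82), 3))
```

In this toy case the two teams misrank different lesions, so their averaged scores separate the classes better than either team alone. The fused ROC area is 1.0, compared with 8/9 and 7/9 for the individual teams. This is the kind of complementarity that score fusion relies on. It does not reproduce the study's results.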