Big data for medical image analysis: A performance study
Big data systems can be used to facilitate powerful medical image analysis at scale. Understanding their behaviors in this context can lead to many benefits, ranging from superior infrastructure configurations to optimized parallel algorithm implementations. This paper is, to our knowledge, a first step towards developing such an understanding for state-of-the-art big data platforms. We characterize a representative medical image segmentation pipeline, detailing the per-stage CPU, memory, I/O reads and writes, and execution time patterns. This characterization has already helped us overcome a bottleneck persistently causing analysis to crash unexpectedly, and avoid poor architecture choices on storage and parallel execution.