EfficientMedSAM: Accelerating Medical Image Segmentation via Neural Architecture Search and Knowledge Distillation
Abstract
Medical image segmentation is crucial for precise diagnosis, treatment planning, and disease monitoring in clinical practice. Despite advances in segmentation models inspired by the Segment Anything Model (SAM), their real-time application is limited by the computational cost of transformer-based architectures. We introduce EfficientMedSAM, a suite of high-speed, memory-efficient foundation models for universal medical image segmentation. Our approach leverages differentiable neural architecture search (NAS) to explore a novel search space that favors efficient operations over conventional attention mechanisms. Candidate architectures are further refined through knowledge distillation from larger MedSAM models and evaluated on the Kvasir dataset of endoscopic images. EfficientMedSAM achieves competitive mean Average Precision (mAP) while substantially reducing Multiply-Accumulate operations (MACs) and model parameters, thereby improving throughput. We integrate a knowledge distillation (KD) pipeline that transfers knowledge from both logits and attention maps, using saliency maps as proxies for attention-map-based distillation. Our findings establish a proof of concept for large-scale distributed training on the SA-Med2D-20M dataset, paving the way for real-time medical image segmentation.
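The two-term KD objective described above (logit distillation plus saliency-map matching as a proxy for attention-map distillation) can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the temperature `T`, weight `alpha`, and the use of temperature-scaled KL divergence for logits and MSE for saliency maps are assumed choices for exposition.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-softened softmax along the last axis."""
    z = z / T
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, student_sal, teacher_sal,
            T=2.0, alpha=0.5):
    """Combined KD loss: logit distillation + saliency-map matching.

    T and alpha are illustrative hyperparameters, not values from the paper.
    """
    # Logit term: KL divergence between temperature-softened class
    # distributions, scaled by T^2 as in standard logit distillation.
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = np.sum(p_t * (np.log(p_t + 1e-8) - np.log(p_s + 1e-8)),
                axis=-1).mean() * T * T
    # Saliency term: MSE between student and teacher saliency maps,
    # standing in for attention maps that the student may lack.
    mse = np.mean((student_sal - teacher_sal) ** 2)
    return alpha * kl + (1.0 - alpha) * mse
```

With identical student and teacher outputs both terms vanish, so the loss is zero; in training, the student minimizes this objective while the teacher is frozen.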