Dzung Phan, Vinicius Lima
INFORMS 2023
This paper addresses few-shot semantic segmentation (FSS) guided by text, where unseen novel classes are segmented using image and text references as in-context examples, without any training. We improve the quality and stability of the segmentation masks produced by FSS by incorporating the capability of open-vocabulary zero-shot semantic segmentation (ZSS) built on image and text foundation models. We propose a training-free approach based on multimodal feature matching that performs segmentation by identifying regions in a target image whose features match those of both the image and text references. Experimental results demonstrate that the proposed method outperforms state-of-the-art FSS and ZSS methods.
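The abstract describes matching target-image features against both an image reference and a text reference. Below is a minimal sketch of that idea, not the paper's actual method: it assumes patch features from a frozen vision encoder, a class-name embedding from a frozen text encoder in the same space, masked average pooling to build an image prototype, and cosine similarity plus a threshold to produce the mask. All names, weights, and thresholds are illustrative assumptions.

```python
# Hypothetical sketch of training-free multimodal feature matching for segmentation.
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def segment_by_matching(target_feats, ref_feats, ref_mask, text_feat,
                        alpha=0.5, threshold=0.35):
    """target_feats: (H, W, D) patch features of the target image.
    ref_feats:   (H, W, D) patch features of the reference image.
    ref_mask:    (H, W) binary mask of the novel class in the reference image.
    text_feat:   (D,) embedding of the class name / text reference.
    alpha:       assumed weight blending image- and text-based similarities.
    Returns a binary (H, W) mask for the target image."""
    H, W, D = target_feats.shape
    tgt = l2_normalize(target_feats.reshape(-1, D))        # (HW, D)

    # Image prototype: masked average pooling over reference foreground patches.
    fg = ref_feats[ref_mask.astype(bool)]                  # (N_fg, D)
    proto = l2_normalize(fg.mean(axis=0))                  # (D,)
    txt = l2_normalize(text_feat)                          # (D,)

    # Cosine similarity of every target patch to the image and text references.
    sim_img = tgt @ proto                                  # (HW,)
    sim_txt = tgt @ txt                                    # (HW,)
    score = alpha * sim_img + (1.0 - alpha) * sim_txt

    return (score.reshape(H, W) > threshold).astype(np.uint8)

# Toy usage with random arrays standing in for real encoder outputs.
rng = np.random.default_rng(0)
tgt_f = rng.normal(size=(16, 16, 64))
ref_f = rng.normal(size=(16, 16, 64))
ref_m = np.zeros((16, 16)); ref_m[4:10, 4:10] = 1
txt_f = rng.normal(size=64)
mask = segment_by_matching(tgt_f, ref_f, ref_m, txt_f)
print(mask.shape, mask.sum())
```

In practice the two similarity maps would come from real foundation-model features rather than random arrays; the blending weight and threshold here are placeholders, since the source does not specify how the image and text cues are combined.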