Dzung Phan, Vinicius Lima
INFORMS 2023
This paper addresses few-shot semantic segmentation (FSS) guided by text, where unseen novel classes are segmented using image and text references as in-context examples, without any training. We improve the quality and stability of the segmentation masks produced by FSS by incorporating the capability of open-vocabulary zero-shot semantic segmentation (ZSS) built on image and text foundation models. We propose a training-free approach based on multimodal feature matching that performs segmentation by identifying regions in a target image whose features match those of both the image and text references. Experimental results demonstrate that the proposed method outperforms state-of-the-art FSS and ZSS methods.
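The abstract describes matching target-image features against both an image reference and a text reference. Below is a minimal sketch of that idea, not the paper's actual method: it assumes patch features from a frozen vision encoder, a class-name embedding from a frozen text encoder in the same space, masked average pooling to build an image prototype, and cosine similarity plus a threshold to produce the mask. All names, weights, and thresholds are illustrative assumptions.

```python
# Hypothetical sketch of training-free multimodal feature matching for segmentation.
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def segment_by_matching(target_feats, ref_feats, ref_mask, text_feat,
                        alpha=0.5, threshold=0.35):
    """target_feats: (H, W, D) patch features of the target image.
    ref_feats:   (H, W, D) patch features of the reference image.
    ref_mask:    (H, W) binary mask of the novel class in the reference image.
    text_feat:   (D,) embedding of the class name / text reference.
    alpha:       assumed weight blending image- and text-based similarities.
    Returns a binary (H, W) mask for the target image."""
    H, W, D = target_feats.shape
    tgt = l2_normalize(target_feats.reshape(-1, D))        # (HW, D)

    # Image prototype: masked average pooling over reference foreground patches.
    fg = ref_feats[ref_mask.astype(bool)]                  # (N_fg, D)
    proto = l2_normalize(fg.mean(axis=0))                  # (D,)
    txt = l2_normalize(text_feat)                          # (D,)

    # Cosine similarity of every target patch to the image and text references.
    sim_img = tgt @ proto                                  # (HW,)
    sim_txt = tgt @ txt                                    # (HW,)
    score = alpha * sim_img + (1.0 - alpha) * sim_txt

    return (score.reshape(H, W) > threshold).astype(np.uint8)

# Toy usage with random arrays standing in for real encoder outputs.
rng = np.random.default_rng(0)
tgt_f = rng.normal(size=(16, 16, 64))
ref_f = rng.normal(size=(16, 16, 64))
ref_m = np.zeros((16, 16)); ref_m[4:10, 4:10] = 1
txt_f = rng.normal(size=64)
mask = segment_by_matching(tgt_f, ref_f, ref_m, txt_f)
print(mask.shape, mask.sum())
```

In practice the two similarity maps would come from real foundation-model features rather than random arrays; the blending weight and threshold here are placeholders, since the source does not specify how the image and text cues are combined.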