VibroFM: Towards Micro Foundation Models for Robust Multimodal IoT Sensing
Abstract
This paper argues for the feasibility and utility of micro foundation models (μFMs), a key direction for future smart IoT/CPS systems that exploits advances in self-supervised pretraining to support multiple downstream tasks. We demonstrate key beneficial properties: latent representations that are independent of the downstream task, robustness to domain shifts, and the ability to learn from unlabeled data. Importantly, these properties emerge after pretraining with only moderate amounts of unlabeled data, hence the name μFMs. To make the argument, evaluate model efficacy, and surface some of the underlying challenges, we describe a vibration-based μFM, called VibroFM, that is pretrained on moderate amounts of unlabeled acoustic and seismic sensing data to support target classification and tracking applications. VibroFM is pretrained in an environment-agnostic fashion and can then be fine-tuned to a given deployment using a small amount of in-situ labeled data. We show that VibroFM (i) improves the robustness of several downstream tasks, (ii) adapts efficiently to different environmental conditions (with only a small amount of fine-tuning), and (iii) allows few-shot generalization to unseen targets. We further show that VibroFM can execute in real time on embedded sensor nodes. We compare the robustness and performance of VibroFM with those of conventional supervised deep neural networks and show the advantages of VibroFM. Given that μFMs can execute in resource-limited settings and need only moderate amounts of data for pretraining, we conclude that micro foundation models are a promising research direction for the IoT/CPS community.