Publication
ACL 2023
Conference paper

pNLP-Mixer: an Efficient all-MLP Architecture for Language

Abstract

Large pre-trained language models based on the transformer architecture have drastically changed the natural language processing (NLP) landscape. However, deploying these models for on-device applications on constrained hardware such as smartwatches is impractical due to their size and inference cost. As an alternative to transformer-based architectures, recent work on efficient NLP has shown that weight-efficient models can reach competitive performance on simple tasks, such as slot filling and intent classification, with model sizes on the order of one megabyte. This work introduces the pNLP-Mixer architecture, an embedding-free MLP-Mixer model for on-device NLP that achieves high weight efficiency thanks to a novel projection layer. We evaluate a pNLP-Mixer model of only two megabytes on two multilingual semantic parsing datasets, MTOP and multiATIS. On MTOP, our quantized model achieves 99.2% of the performance of mBERT while using 85x fewer parameters. Our model also consistently beats the state of the art in tiny models (pQRNN) of the exact same size by a margin of more than 5%.
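The abstract only names the building blocks of the architecture: a non-trainable, embedding-free projection layer that turns tokens into feature vectors, followed by MLP-Mixer blocks and task heads for intent classification and slot filling. The snippet below is a minimal PyTorch sketch of that general design, not the paper's exact implementation; the hash_projection function, every layer size, and the label counts are illustrative assumptions.

# Minimal sketch of an embedding-free "projection + MLP-Mixer" model, assuming:
#   - a hashing-based projection that maps each token to a fixed feature vector
#     (the paper's actual projection layer may differ),
#   - standard MLP-Mixer blocks (token-mixing and channel-mixing MLPs),
#   - joint intent-classification and slot-filling heads, as in MTOP/multiATIS.
# All hyperparameters and label counts below are placeholders.

import torch
import torch.nn as nn


def hash_projection(tokens, feature_size=256, num_hashes=64):
    """Map token strings to dense feature vectors without an embedding table:
    each token activates num_hashes positions of a feature_size-dim vector
    via Python's built-in hash (illustrative only)."""
    feats = torch.zeros(len(tokens), feature_size)
    for i, tok in enumerate(tokens):
        for h in range(num_hashes):
            feats[i, hash((tok, h)) % feature_size] = 1.0
    return feats  # (seq_len, feature_size)


class MixerBlock(nn.Module):
    def __init__(self, seq_len, hidden_dim, token_dim=64, channel_dim=256):
        super().__init__()
        self.norm1 = nn.LayerNorm(hidden_dim)
        # Token-mixing MLP: mixes information across the sequence dimension.
        self.token_mlp = nn.Sequential(
            nn.Linear(seq_len, token_dim), nn.GELU(), nn.Linear(token_dim, seq_len)
        )
        self.norm2 = nn.LayerNorm(hidden_dim)
        # Channel-mixing MLP: mixes information across the feature dimension.
        self.channel_mlp = nn.Sequential(
            nn.Linear(hidden_dim, channel_dim), nn.GELU(), nn.Linear(channel_dim, hidden_dim)
        )

    def forward(self, x):  # x: (batch, seq_len, hidden_dim)
        y = self.norm1(x).transpose(1, 2)          # (batch, hidden_dim, seq_len)
        x = x + self.token_mlp(y).transpose(1, 2)  # token mixing + residual
        x = x + self.channel_mlp(self.norm2(x))    # channel mixing + residual
        return x


class PNLPMixerSketch(nn.Module):
    def __init__(self, seq_len=64, feature_size=256, hidden_dim=64,
                 num_blocks=2, num_intents=10, num_slot_labels=20):
        super().__init__()
        self.bottleneck = nn.Linear(feature_size, hidden_dim)  # replaces embeddings
        self.blocks = nn.Sequential(*[MixerBlock(seq_len, hidden_dim)
                                      for _ in range(num_blocks)])
        self.slot_head = nn.Linear(hidden_dim, num_slot_labels)  # per-token labels
        self.intent_head = nn.Linear(hidden_dim, num_intents)    # sequence label

    def forward(self, projected):  # projected: (batch, seq_len, feature_size)
        x = self.blocks(self.bottleneck(projected))
        return self.slot_head(x), self.intent_head(x.mean(dim=1))

A short usage example under the same assumptions: project the tokens, pad the sequence to the model's fixed length, and read off per-token slot logits plus a single intent logit vector.

tokens = ["set", "an", "alarm", "for", "7", "am"]
feats = hash_projection(tokens)                                        # (6, 256)
feats = torch.nn.functional.pad(feats, (0, 0, 0, 64 - len(tokens)))    # pad to seq_len=64
slot_logits, intent_logits = PNLPMixerSketch()(feats.unsqueeze(0))     # (1, 64, 20), (1, 10)

Because the projection is computed on the fly from token hashes rather than looked up in a trained embedding matrix, the parameter budget is dominated by the small bottleneck and Mixer layers, which is what makes megabyte-scale models feasible.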

Date

09 Jul 2023
