Paper

Analogue speech recognition based on physical computing

Abstract

With the rise of decentralized computing, such as in the Internet of Things, autonomous driving and personalized healthcare, it is increasingly important to process time-dependent signals ‘at the edge’ efficiently: right at the place where the temporal data are collected, avoiding time-consuming, insecure and costly communication with a centralized computing facility (or ‘cloud’). However, modern-day processors often cannot meet the constrained power and time budgets of edge systems because of intrinsic limitations imposed by their architecture (von Neumann bottleneck) or domain conversions (analogue to digital and time to frequency). Here we propose an edge temporal-signal processor based on two in-materia computing systems for both feature extraction and classification, reaching near-software accuracy for the TI-46-Word1 and Google Speech Commands2 datasets. First, a nonlinear, room-temperature reconfigurable-nonlinear-processing-unit3,4 layer realizes analogue, time-domain feature extraction from the raw audio signals, similar to the human cochlea. Second, an analogue in-memory computing chip5, consisting of memristive crossbar arrays, implements a compact neural network trained on the extracted features for classification. With submillisecond latency, reconfigurable-nonlinear-processing-unit-based feature extraction consuming roughly 300 nJ per inference, and the analogue in-memory computing-based classifier using around 78 µJ (with potential for roughly 10 µJ)6, our findings offer a promising avenue for advancing the compactness, efficiency and performance of heterogeneous smart edge processors through in-materia computing hardware.