Using models uploaded to MLCommons, an industry benchmarking and collaboration site, the team could compare their demo system’s efficacy to that of models running on digital hardware. Benchmark data from MLCommons’ MLPerf repository showed that the IBM prototype was seven times faster than the best MLPerf submission in the same network category, while maintaining high accuracy. The model was trained on GPUs using hardware-aware training and then deployed on the team’s analog AI chip.
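The core idea behind hardware-aware training is to expose the network, during GPU training, to the kind of weight perturbations it will encounter on analog devices, so the learned weights tolerate that variability at inference time. The following is a minimal sketch of one common approach, injecting multiplicative weight noise into every forward pass of a toy linear model; the noise level and training settings are illustrative assumptions, not the team's actual recipe.

```python
import numpy as np

rng = np.random.default_rng(0)

def hw_aware_train(X, y, noise_std=0.05, lr=0.1, steps=500):
    """Train a linear model while injecting multiplicative weight noise
    (an assumed 5% here) into each forward pass, mimicking analog
    device variability so the weights learn to tolerate it."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        # Perturb weights as analog conductance drift/noise would.
        w_noisy = w * (1.0 + noise_std * rng.standard_normal(w.shape))
        pred = X @ w_noisy
        # Gradient of mean-squared error, computed through the noisy weights.
        grad = X.T @ (pred - y) / len(y)
        w -= lr * grad
    return w

# Synthetic data: targets generated by a known weight vector.
X = rng.standard_normal((256, 4))
w_true = np.array([1.0, -2.0, 0.5, 3.0])
y = X @ w_true

w = hw_aware_train(X, y)

# "Deploy" with fresh simulated device noise: accuracy degrades gracefully.
w_deployed = w * (1.0 + 0.05 * rng.standard_normal(w.shape))
mse = np.mean((X @ w_deployed - y) ** 2)
```

Because the multiplicative noise is zero-mean, the expected gradient is unbiased, so training still converges near the true weights while becoming robust to the perturbations it will see after deployment.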
The second experiment was considerably larger, and it hints at a future in which generative AI systems built on analog chips could take the place of digital ones. It aimed to implement a large, complex model using five of the team’s chips stitched together, with off-chip digital computations simulated to showcase the scalability of analog AI. The researchers ran a recurrent neural network transducer (RNNT) speech-to-text model from MLPerf to transcribe, letter by letter, what a person is saying. RNNTs are popular in many real-world applications today, including virtual assistants, media content search and subtitling systems, and clinical documentation and dictation.
The system contained 45 million weights on 140 million PCM devices across five chips. It was able to take audio of people speaking and transcribe it with accuracy very close to that of digital hardware setups. Unlike the first demo, this one was not entirely end-to-end, meaning it did require some off-chip digital computation. However, so little additional compute is involved that, had it been implemented on chip, the resulting energy efficiency would still exceed that of products on the market today.
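The reason the device count exceeds the weight count is that analog in-memory computing typically stores each signed weight across multiple non-negative conductances, commonly a differential pair, and computes matrix-vector products physically: inputs are applied as voltages, each device contributes a current by Ohm’s law, and currents sum along each output line by Kirchhoff’s current law. A minimal sketch of that encoding and readout, with an assumed conductance limit:

```python
import numpy as np

def encode_weights(W, g_max=25.0):
    """Encode signed weights as pairs of non-negative conductances
    (G_plus, G_minus), as in a differential weight cell. g_max is an
    assumed device conductance limit, used only to set the scale."""
    scale = g_max / np.max(np.abs(W))
    G = W * scale
    g_plus = np.maximum(G, 0.0)    # positive part of each weight
    g_minus = np.maximum(-G, 0.0)  # negative part of each weight
    return g_plus, g_minus, scale

def analog_mvm(g_plus, g_minus, scale, x):
    """Matrix-vector product as the crossbar performs it: each device
    contributes I = G * V (Ohm's law), currents sum along each output
    line (Kirchhoff's law), and a differential read subtracts the
    positive and negative current sums."""
    i_plus = g_plus @ x
    i_minus = g_minus @ x
    return (i_plus - i_minus) / scale

W = np.array([[0.5, -1.0],
              [2.0, 0.25]])
x = np.array([1.0, -3.0])

gp, gm, s = encode_weights(W)
y = analog_mvm(gp, gm, s, x)  # matches W @ x
```

In an idealized model like this, the differential readout reproduces `W @ x` exactly; on real PCM devices the same operation is subject to noise and drift, which is what hardware-aware training compensates for.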
Once again using data uploaded to MLCommons, the team compared their network’s efficacy to that of RNNTs running on digital hardware. MLPerf data showed that the IBM prototype was estimated to be roughly 14 times more energy efficient, measured in performance per watt, than comparable systems. This is the first analog system that IBM researchers have been able to test against MLPerf, as past experiments were simply too small to compare.
Natural-language tasks aren’t the only AI problems that analog AI could solve; IBM researchers are working on a host of other uses. In a paper published earlier this month in Nature Electronics, the team showed that an energy-efficient, scalable mixed-signal analog chip architecture can achieve high accuracy on CIFAR-10, a standard image-recognition dataset for computer vision.
These chips were conceived and designed by IBM researchers at labs in Tokyo; Zurich; Yorktown Heights, New York; and Almaden, California, and built by an external fabrication company. The phase-change memory and metal levels were processed and validated at IBM Research’s lab in the Albany Nanotech Complex.
If you were to combine the benefits of the work published today in Nature, such as large arrays and parallel data transport, with the capable digital compute blocks of the chip shown in the Nature Electronics paper, you would have many of the building blocks needed to realize the vision of a fast, low-power analog AI inference accelerator. By pairing these designs with hardware-resilient training algorithms, the team expects these devices to deliver software-equivalent neural network accuracies for a wide range of AI models in the future.
While this work is a large step forward for analog AI systems, there is still much to be done before machines containing these sorts of devices reach the market. The team’s near-term goal is to bring the two workstreams above together into one analog mixed-signal chip. The team is also exploring how foundation models could be implemented on their chips.
Analog AI is now very much on the path to solving the sorts of AI problems that today’s digital systems are tackling, and the vision of power-conscious analog AI, married up with the digital systems we use today, is becoming clearer.
Ambrogio, S., Narayanan, P., Okazaki, A. et al. An analog-AI chip for energy-efficient speech recognition and transcription. Nature 620, 768–775 (2023).