“Architecturally, NorthPole blurs the boundary between compute and memory,” Modha said. “At the level of individual cores, NorthPole appears as memory-near-compute, and from outside the chip, at the level of input-output, it appears as an active memory.” This makes NorthPole easy to integrate into systems and significantly reduces the load on the host machine.
But NorthPole’s biggest advantage is also a constraint: it can only easily pull from the memory it has onboard. All of the speedups possible on the chip would be undercut if it had to fetch information from somewhere else. NorthPole can still support larger neural networks through an approach called scale-out: the network is broken down into smaller sub-networks that each fit within NorthPole’s model memory, and these sub-networks are connected together across multiple NorthPole chips. So while there is ample memory on a NorthPole (or collectively on a set of NorthPoles) for many of the models that would be useful in specific applications, this chip is not meant to be a jack of all trades. “We can’t run GPT-4 on this, but we could serve many of the models enterprises need,” Modha said. “And, of course, NorthPole is only for inferencing.”
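To make the scale-out idea concrete, here is a minimal sketch of one way a network could be split into chip-sized pieces. It is purely illustrative and is not IBM’s toolchain: the function name `partition_layers`, the per-layer sizes, and the 100 MB per-chip budget are all hypothetical, and the sketch counts only weight memory, ignoring activations and inter-chip bandwidth. It greedily packs consecutive layers onto one chip until the next layer would overflow that chip’s budget, then starts a new chip, so each sub-network feeds its output to the next.

```python
from typing import List


def partition_layers(layer_bytes: List[int], chip_budget: int) -> List[List[int]]:
    """Hypothetical sketch: pack consecutive layers into per-chip sub-networks.

    Layers keep their original order, so sub-network k passes its output
    activations to sub-network k + 1 on the next chip. Only weight sizes are
    counted; a real mapping would also budget for activations and I/O.
    """
    partitions: List[List[int]] = []
    current: List[int] = []
    used = 0
    for idx, size in enumerate(layer_bytes):
        if size > chip_budget:
            # A single layer larger than one chip would itself need splitting.
            raise ValueError(f"layer {idx} ({size}) does not fit on one chip")
        if used + size > chip_budget:
            # Close the current chip and start filling the next one.
            partitions.append(current)
            current, used = [], 0
        current.append(idx)
        used += size
    if current:
        partitions.append(current)
    return partitions


if __name__ == "__main__":
    # Hypothetical per-layer weight sizes in MB and a hypothetical 100 MB budget.
    sizes_mb = [40, 30, 50, 20, 60, 10]
    for chip, layers in enumerate(partition_layers(sizes_mb, chip_budget=100)):
        print(f"chip {chip}: layers {layers}")
    # Prints:
    # chip 0: layers [0, 1]
    # chip 1: layers [2, 3]
    # chip 2: layers [4, 5]
```

When sub-networks must stay contiguous, this greedy packing uses the fewest chips for a given budget; the harder engineering problems in a real scale-out system are balancing compute across chips and keeping inter-chip traffic low.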
This efficiency also means the device doesn’t need bulky liquid-cooling systems to run; fans and heat sinks are more than enough, so it could be deployed in some rather small spaces.
While research into the NorthPole chip is still ongoing, its structure lends itself to emerging AI use cases as well as more established ones.
In testing, the NorthPole team focused primarily on computer vision, in part because funding for the project came from the U.S. Department of Defense. Some of the primary applications under consideration were object detection, image segmentation, and video classification. But the chip was also tested in other arenas, such as natural language processing (on the encoder-only BERT model) and speech recognition (on the DeepSpeech2 model). The team is currently exploring how to map decoder-only large language models onto NorthPole scale-out systems.
When you think of these AI tasks, all sorts of fantastical use cases spring to mind, from autonomous vehicles to robotics, digital assistants, and spatial computing. Many edge applications that require processing massive amounts of data in real time could be well suited to NorthPole. For example, it could be the sort of device needed to move autonomous vehicles beyond machines that require set maps and routes and operate only on a small scale, toward ones that can think and react to the rare edge-case situations that make navigating the real world so challenging even for proficient human drivers. These edge cases are the exact sweet spot for future NorthPole applications. NorthPole could enable satellites that monitor agriculture and manage wildlife populations, monitor vehicle and freight traffic for safer and less congested roads, operate robots safely, and detect cyber threats for safer businesses.
This is just the start of Modha’s work on NorthPole. NorthPole was fabricated on a 12 nm process, while the current state of the art for CPUs is 3 nm, and IBM itself is already years into research on 2 nm nodes. That means there are several generations of chip process technology NorthPole could be implemented on, in addition to fundamental architectural innovations, to keep finding efficiency and performance gains.
But for Modha, this is one important milestone along a continuum that has dominated the last 19 years of his professional career. He has been working on digital brain-inspired chips throughout that time, knowing that the brain is the most energy-efficient processor we know of, and searching for ways to replicate that digitally. TrueNorth was fully inspired by the structure of neurons in the brain, and had as many digital “synapses” in it as the brain of a bee. But in 2015, sitting on a park bench in San Francisco, Modha was thinking through his work to date, he said. He believed there was something to be gained by marrying the best of traditional processing devices with the structure of processing in the brain, where memory and processing are interspersed throughout. The answer was “brain-inspired computing, with silicon speed,” according to Modha.
Over the next eight years, Modha and his colleagues were single-minded and hermetic in their goal of turning this vision into reality. Toiling inconspicuously in Almaden, the team gave no lectures and published no papers on their work until this year. Each person brought different skills and perspectives, yet everyone collaborated so that the team’s contribution as a whole was much greater than the sum of its parts. Now, the plan is to show what NorthPole can do, while exploring how to translate the design to smaller chip production processes and further exploring its architectural possibilities.
This work stemmed from a simple question: how can we make computers that work like the brain? After years of fundamental research, it has come up with an answer. It is the kind of work that is really only possible today at a place like IBM Research, where there is the time and space to explore the big questions in computing and where they can take us. “NorthPole is a faint representation of the brain in the mirror of a silicon wafer,” Modha said.