Carver Mead, an electrical engineering researcher at the California Institute of Technology, had a huge influence on the field of neuromorphic computing in the 1990s, when he and his colleagues realized it was possible to create an analog device that, at a phenomenological level, resembles the firing of neurons.
Decades later, this is essentially what chips like Hermes and IBM’s other prototype analog AI chip are doing: Analog units both perform calculations and store synaptic weights, much like neurons in the brain do. Both analog chips contain millions of nanoscale phase-change memory (PCM) devices, a sort of analog computing version of brain cells.
The PCM devices are assigned their weights by flowing an electrical current through them, changing the physical state of a piece of chalcogenide glass. As more current flows, the glass is rearranged from a crystalline to an amorphous solid, which makes it less conductive. Because each device’s conductance encodes a weight, changing it changes the result of the matrix multiplication operations run through it. After an AI model is trained in software, all of its synaptic weights are stored in these PCM devices, just as memories are stored in biological synapses.
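The multiply-accumulate trick can be sketched in a few lines. This is a toy model, not IBM's implementation: each device's conductance stores a weight, applied voltages act as inputs, and the currents that accumulate on each output wire are the sums of products.

```python
import numpy as np

# Toy sketch of analog matrix-vector multiplication in a crossbar.
# Each device's conductance G[i, j] encodes a trained weight; input
# activations are applied as voltages V[i]. Ohm's law gives each
# device's current, and Kirchhoff's current law sums the currents
# flowing into each output wire.
rng = np.random.default_rng(0)

G = rng.uniform(0.1, 1.0, size=(4, 3))  # conductances = stored weights
V = rng.uniform(0.0, 0.5, size=4)       # input voltages = activations

# Current collected on each output wire: I[j] = sum_i V[i] * G[i, j]
I = V @ G

# This is the same matrix multiply a digital chip would compute, but
# in the analog version it happens in the memory itself, with no
# weight movement.
print(I)
```

The point of the sketch is the locality: the weights never leave the devices that store them, which is what sidesteps the von Neumann bottleneck described above.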
“Synapses store information, but they also help compute,” says IBM Research scientist Ghazi Sarwat Syed, who works on designing the materials and device architectures used in PCM. “For certain computations, such as deep neural network inference, co-locating compute and memory in PCM not only overcomes the von Neumann bottleneck, but these devices also store intermediate values beyond just the ones and zeros of typical transistors.” The aim is to create devices that compute with greater precision, can be densely packed onto a chip, and can be programmed with ultra-low currents and power.
“Furthermore, we’re trying to give these devices more flavor,” he says. “Biological synapses store information in a nonvolatile way for a long time, but they also have changes that are short-lived.” So, his team is working on ways to make changes in the analog memory that better emulate biological synapses. Once that exists, researchers can craft new algorithms to solve problems that digital computers struggle with.
One shortcoming of these analog devices, Bragaglia notes, is that they are currently limited to inferencing. “There are no devices that can be used for training because the accuracy of moving the weights isn’t there yet,” she says. The weights can be cemented into PCM cells once an AI model has been trained on digital architecture, but changing the weights directly through training isn’t yet precise enough. Plus, PCM devices are not durable enough to have their conductance changed a trillion or more times, as would happen during training, according to Syed.
Multiple teams at IBM Research are working to address the issues created by non-ideal material properties and insufficient computational fidelity. One such approach involves new algorithms that work around the errors created during model weight updates in PCM. They’re still in development, but early results suggest that it will soon be possible to perform model training on analog devices.
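Why update precision matters so much can be shown with a toy example. This is our illustration, not IBM's algorithm: fitting a one-weight model by gradient descent, once with exact updates and once where each update is quantized to the coarse conductance steps an imprecise device can take.

```python
import numpy as np

# Toy sketch of the analog-training precision problem: fit y = 3x by
# gradient descent. With exact updates the weight converges to 3.0;
# when each update is rounded to coarse device-style steps, small
# late-stage corrections round away to nothing and training stalls.
rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, 200)
y = 3.0 * x                      # ground-truth weight is 3.0

def train(quantum):
    w = 0.0
    for _ in range(300):
        grad = np.mean((w * x - y) * x)
        update = -0.5 * grad             # ideal update
        if quantum > 0:
            # the device can only move in multiples of `quantum`
            update = quantum * np.round(update / quantum)
        w += update
    return w

w_exact = train(0.0)     # converges to ~3.0
w_coarse = train(0.4)    # stalls well short of 3.0
print(w_exact, w_coarse)
```

Real device non-idealities (write noise, asymmetric and nonlinear updates) are messier than simple rounding, but the stall behavior is the same basic failure mode the new algorithms aim to work around.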
Bragaglia is involved in a materials science approach to this problem: a different kind of memory device called resistive random-access memory, or RRAM. RRAM functions on principles similar to PCM’s, storing the values of synaptic weights in a physical device. An atomic filament sits between two electrodes, inside an insulator. During AI training, the input voltage changes the oxidation state of the filament, finely altering its resistance, and this resistance is read out as a weight during inferencing. These cells are arranged on a chip in crossbar arrays, creating a network of synaptic weights. So far, this structure has shown promise for analog chips that can perform computation while remaining flexible to updates. It was made possible only after years of material and algorithm co-optimization by several teams of researchers at IBM.
Beyond the way memories are stored, the way data flows through some neuromorphic chips can be fundamentally different from the way it flows through conventional ones. In a typical synchronous circuit (most computer processors), streams of data are clock-based: a continuously oscillating electrical signal synchronizes the actions of the circuit. There can be different structures and multiple layers of clocks, including a clock multiplier that lets a microprocessor run at a different rate than the rest of the circuit. But on a basic level, the circuit keeps switching, and drawing power, even when no data is being processed.
Instead of this, biology uses event-driven spikes, says Syed. “Our nerve cells are communicating sparsely, which is why we’re so efficient,” he adds. In other words, the brain only works when it must, so by adopting this asynchronous, event-driven approach to data processing, an artificial emulation can save significant amounts of energy.
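The energy argument can be made concrete with a back-of-the-envelope operation count. The numbers below are illustrative assumptions, not measurements: a clocked design evaluates every unit on every tick, while an event-driven design only touches units that spike.

```python
import numpy as np

# Illustrative comparison of work done by a clocked circuit versus an
# event-driven one on the same sparse activity pattern. A clocked
# design evaluates every neuron on every tick; an event-driven design
# does work only when a spike actually occurs.
rng = np.random.default_rng(2)

n_neurons = 1_000
n_ticks = 1_000
spike_prob = 0.01   # assume neurons are active ~1% of the time

spikes = rng.random((n_ticks, n_neurons)) < spike_prob

clocked_ops = n_ticks * n_neurons   # evaluate everything, every tick
event_ops = int(spikes.sum())       # evaluate only on spikes

print(clocked_ops, event_ops)   # event-driven does roughly 1% of the work
```

The sparser the activity, the larger the gap, which is the intuition behind Syed's point about the brain's sparse communication.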
All three of the brain-inspired chips at IBM Research were designed with a standard clocked process, though.
In one of these cases, IBM Research staff say they’re making significant headway into edge and data center applications. “We want to learn from the brain,” says IBM Fellow Dharmendra Modha, “but we want to learn from the brain in a mathematical fashion while optimizing for silicon.” His lab, which developed NorthPole, doesn’t mimic the phenomena of neurons and synapses via transistor physics, but digitally captures their approximate mathematics. NorthPole is axiomatically designed and incorporates brain-inspired low precision; a distributed, modular, core array with massive compute parallelism within and among cores; memory near compute; and networks-on-chip. NorthPole has also moved from TrueNorth’s spiking neurons and asynchronous design to a synchronous design.
For TrueNorth, an experimental processor that was an early springboard for the more sophisticated NorthPole, Modha and his team realized that event-driven spikes use silicon-based transistors inefficiently. Neurons in the brain fire at about 10 hertz (10 times a second), whereas today’s transistors run at gigahertz rates: the transistors in IBM’s z16 run at 5 GHz, and the transistors in a MacBook’s 6-core Intel Core i7 run at 2.6 GHz. If the synapses in the human brain operated at the same rate as a laptop, “our brain would explode,” says Syed. In neuromorphic chips such as Hermes, or brain-inspired ones like NorthPole, the goal is to combine the bio-inspiration of how data is processed with the high-bandwidth operation required by AI applications.
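The scale of that rate mismatch is worth making explicit, using the figures quoted above:

```python
# The arithmetic behind the mismatch: transistor clock rates versus
# the ~10 Hz firing rate of biological neurons.
neuron_hz = 10        # neurons fire ~10 times per second
z16_hz = 5.0e9        # IBM z16 transistors: 5 GHz
laptop_hz = 2.6e9     # 6-core Intel Core i7: 2.6 GHz

print(f"z16 transistors: {z16_hz / neuron_hz:.1e}x faster than neurons")
print(f"laptop transistors: {laptop_hz / neuron_hz:.1e}x faster than neurons")
```

Silicon running hundreds of millions of times faster than biology is exactly why directly mimicking slow, spiking neurons wastes the hardware's strengths.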
Because of their choice to move away from neuron-like spiking and other features that mimic the physics of the brain, Modha says his group leans more towards the term ‘brain-inspired’ computing than ‘neuromorphic.’ He envisions that NorthPole has lots of room for growth, because they can tweak the architecture in purely mathematical and application-centered ways to achieve more gains while also exploiting silicon scaling and lessons gleaned from user feedback. And the data show that their strategy worked: In new results from Modha’s team, NorthPole performed inference on a 3-billion-parameter model 46.9 times faster than the next most energy-efficient GPU, at 72.7 times higher energy efficiency than the next lowest latency one.
Researchers may still be defining what neuromorphic computing is or the best ways to build brain-inspired circuits, says Syed, but they tend to agree that it’s well suited for edge applications — phones, self-driving cars, and other applications that can take advantage of fast, efficient AI inferencing with pre-trained models. A benefit of using PCM chips on the edge, Sebastian says, is that they can be exceptionally small, performant, and inexpensive.
Robotics applications could be well suited for brain-inspired computing, says Modha, as could video analytics, such as in-store security cameras. Putting neuromorphic computing to work in edge applications could also help with data privacy, says Bragaglia, as in-device inference chips would mean data doesn’t need to be shuttled back and forth between devices, or to the cloud, to perform AI inferencing.
Whatever brain-inspired or neuromorphic processors end up coming out on top, researchers also agree that the current crop of AI models is too large and power-hungry to run efficiently on classical CPUs and GPUs. A new generation of circuits is needed to run these massive models.
“It’s a very exciting goal,” says Bragaglia. “It’s very hard, but it’s very exciting. And it’s in progress.”