AI transformers shed light on the brain’s mysterious astrocytes

Researchers explore the parallels between how AI transformers process information and how networks of neurons and astrocytes do the same.

Astrocytes are among the most abundant cells in the brain. Neurons, however, tend to get most of the attention, in part because their electrical signals are easier to measure than the chemical signals that neurons and astrocytes silently exchange with one another.

That’s now changing as these star-shaped cells, woven in among the brain’s neurons, come into sharper view. New calcium-imaging techniques have allowed scientists to visualize and study the signals astrocytes use to amplify and quiet their neighboring neurons. Now, transformers, the powerful architecture underpinning generative AI chatbots like ChatGPT and Bard, are providing new insights into what astrocytes might be doing computationally.

In a new study in the journal Proceedings of the National Academy of Sciences (PNAS), researchers at IBM, Harvard, and MIT use transformers to explore how networks of astrocytes and neurons process information and contribute to learning and memory in the brain. Their model is the first to show theoretically how neurons and astrocytes may communicate while processing language and images.

“Neurons in the brain helped inspire artificial neural networks in modern AI,” said Dmitry Krotov, an AI researcher at IBM Research. “We wanted to flip that around and see what recent advances in AI could teach us about the biological computation of neurons and astrocytes.”

Transformers were originally designed to handle language but are now widely used to process images, speech, and audio. Before transformers, neural networks had to be trained on labeled datasets that were costly to compile. Transformers eliminated the bottleneck by being able to ingest massive, raw datasets and extract their underlying structure. By creating a compressed representation of large-scale data, transformer-based AI models known as foundation models could be fine-tuned and applied to a range of tasks.

Before transformers, the most common AI architecture for training a language model was the recurrent neural network (RNN). An RNN would compare each word in a sentence to a hidden state determined by the preceding words. Transformers, by contrast, have an “attention” mechanism that gives them a longer context window. By holding more words in memory, they can compare all the words in a sentence at once, building up a richer representation of how words relate to each other and reducing the need for labeled examples to train the model.
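The attention mechanism described above can be sketched in a few lines. Here is a minimal self-attention example in NumPy; the function name, dimensions, and random embeddings are illustrative, not taken from any particular model:

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: every query attends to all keys at once."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                   # pairwise query-key similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys, row-wise
    return weights @ V, weights

# Five "words", each a 4-dimensional embedding (random, for illustration only)
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))
out, w = attention(X, X, X)  # self-attention: queries, keys, values all come from X
print(w.shape)               # (5, 5): one attention weight per pair of words
```

Each row of `w` is a probability distribution over all the words in the "sentence," which is what lets the model relate every word to every other word in a single step rather than sequentially, as an RNN does.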

A biological analog for transformers

Transformers have led to a paradigm shift in machine learning since their introduction in 2017. Instead of having to train an AI model for each task, transformers let you recycle the same model over and over with additional tuning. Transformers made AI vastly more accessible and led to breakthroughs like watsonx, IBM’s new platform for tuning and deploying AI at enterprise scale.

Like RNNs for words, and convolutional neural networks (CNNs) for images, transformers are made up of networks of artificial ‘neurons.’ At IBM, Krotov had long been interested in exploring neural networks as digital analogs to associative memory, which is responsible for things like putting names to faces or evoking the smell of a strawberry when we see one.

The transformer paper grabbed Krotov’s interest, and not just for its catchy title, "Attention Is All You Need." For years, he had been working on increasing the memory capacity of neural networks. In a 2021 paper, he and John Hopfield, a neuroscientist at Princeton, showed that the attention mechanism in transformers, and associative memory in biological neurons, were computationally similar. But for biological neurons to match the memory capacity of AI transformers, they hypothesized, three or more neurons would have to be connected by a single synapse. In nature, however, only two neurons connect at a synapse.
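In schematic form (using our own notation, not the 2021 paper's exact formulation), the correspondence can be written side by side. A transformer computes a softmax-weighted sum of stored values for a query $q$; a modern Hopfield network (dense associative memory) retrieves a pattern by pulling its state $\xi$ toward a softmax-weighted sum of stored memories, the rows of $X$:

```latex
% Attention: weight the values V by softmax similarity between query q and keys K
\mathrm{Attn}(q) = V^{\top}\,\mathrm{softmax}\!\left(\tfrac{1}{\sqrt{d}}\,K q\right)

% Dense associative memory retrieval: update the state toward stored patterns X
\xi^{\mathrm{new}} = X^{\top}\,\mathrm{softmax}\!\left(\beta\,X \xi\right)
```

With $K = V = X$ and inverse temperature $\beta = 1/\sqrt{d}$, the two expressions have the same form, which is the kind of computational similarity between attention and associative memory that the 2021 paper describes.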

As the transformer was becoming the default architecture in AI, the astrocyte’s star was also rising. For years, astrocytes had been viewed as support cells for neurons, regulating blood flow and clearing excess neurotransmitters. But calcium-imaging studies were revealing that astrocytes were also important for information processing.

Leo Kozachkov was a PhD student at MIT when he read the first paper that confirmed astrocytes could send calcium signals fast enough to communicate with neurons and influence their behavior. He joined the MIT-IBM Watson AI Lab as an intern in 2022. Together, he and Krotov decided to explore whether astrocytes were involved in memory.

A biologically plausible transformer made of astrocytes and neurons

Neurons communicate with each other by sending chemicals, called neurotransmitters, across a tiny synaptic gap. One astrocyte may have millions of tentacles, or processes, each one wrapped around a different gap, where it can sense neurotransmitter levels and either mute or amplify the neurons’ message by taking up neurotransmitters or releasing its own.

By collecting signals from millions of synapses at once, astrocytes may serve as a kind of memory buffer, said Krotov, storing and integrating information received from nearby neurons. To build a model of how this works, Krotov and Kozachkov took the math underlying an AI transformer and applied it to their neuron-astrocyte model. Over many iterations, and with input from neuroscientists, including study co-author Ksenia Kastanenka at Harvard, they came up with a theoretical model that described how neuron-astrocyte networks “read” and “write” to memory.
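The study's actual neuron-astrocyte equations are more involved, but the "memory buffer" idea, an astrocyte slowly integrating activity across many synapses at once, can be pictured with a toy sketch. Everything below (the function name, the leak constant, the dynamics) is our own simplification for illustration, not the paper's model:

```python
import numpy as np

def astrocyte_buffer(synaptic_activity, leak=0.9):
    """Toy model: an astrocyte leakily integrates activity at many synapses.

    synaptic_activity: (timesteps, n_synapses) array of activity levels.
    Returns the astrocyte's slow, calcium-like state trace over time.
    """
    state = np.zeros(synaptic_activity.shape[1])
    trace = []
    for s_t in synaptic_activity:
        # Slow integration: the state retains most of its past value
        # and folds in a little of the current synaptic traffic.
        state = leak * state + (1 - leak) * s_t
        trace.append(state.copy())
    return np.array(trace)

rng = np.random.default_rng(1)
activity = rng.random((100, 8))      # 100 timesteps of activity at 8 synapses
calcium = astrocyte_buffer(activity)
print(calcium.shape)                 # (100, 8): one slow trace per synapse
```

Because the state decays slowly, it carries information about recent synaptic traffic forward in time, which is the buffer-like role the researchers hypothesize for astrocytes.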

To test their hypothesis, they fed their model the abstract of their PNAS paper and teased out just the astrocyte’s response, computing its fluctuating calcium levels as the model processed each word. They did the same for a pre-trained language transformer, recording its fluctuating attention as it processed each word and computing each word’s importance relative to the others.

When the researchers compared both responses, they found that the rise and fall of the astrocyte’s modeled calcium signal and the transformer’s attention signal were nearly identical. A related experiment with an image transformer had similar results. This suggested that their biological neuron-astrocyte model could process data in much the same way transformers do.
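Comparing two per-word traces like this amounts to correlating two time series. As a minimal sketch of that comparison step (the traces below are synthetic stand-ins, not the study's actual calcium or attention signals):

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation between two equal-length signal traces."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a - a.mean()  # center each trace so only the rise-and-fall shape matters
    b = b - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Synthetic stand-ins: a smooth "calcium" trace and a noisy copy as "attention"
rng = np.random.default_rng(2)
t = np.linspace(0, 4 * np.pi, 50)
calcium = np.sin(t) + 1.5                    # per-word modeled calcium level
attn = calcium + 0.05 * rng.normal(size=50)  # per-word attention weight
print(round(pearson(calcium, attn), 2))      # near 1.0 for near-identical signals
```

A correlation close to 1.0 is what "nearly identical rise and fall" means quantitatively: the two signals move up and down together, even if their absolute scales differ.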

“Astrocytes are found almost everywhere in the brain, from the hypothalamus to the visual cortex,” said Kozachkov. “And transformers are becoming nearly as ubiquitous in AI because they’re so good at processing all data types – text, images, audio. It makes sense that these two types of networks might perform similar computations.”