IBM’s Dmitry Krotov wants to crack the ‘physics’ of memory
Thrust into the spotlight after his mentor won the Nobel Prize last year, Krotov continues to search for an AI architecture that is both interpretable and capable of explaining how our own memory works.
Dmitry “Dima” Krotov was among the first to congratulate AI pioneer John Hopfield on his Nobel Prize in Physics last fall. “John, wow!” he texted Hopfield on the morning the award became public. “Just WOW!!”
Since the announcement, Krotov, one of Hopfield’s close collaborators, has helped explain to the world how Hopfield’s single-layer digital neural network led to the “deep” networks in use today. At Princeton, the two researchers invented something called dense associative memory, which lifted the memory storage limits of those early Hopfield networks, opening them to practical applications.
Now a researcher at IBM, Krotov is carrying on Hopfield’s ideas, building computational models to improve artificial intelligence and even to understand the underpinnings of intelligence itself.
Associative memory may never displace transformers as the backbone of generative AI, but it could provide ideas for making AI more transparent and comprehensible to us humans. And Krotov’s parallel work, using associative memory to model biological computation, could help explain how our brains manage to squeeze so much information into such a small space.
It may not be immediately clear what this all has to do with physics, the science concerned with the nature and evolution of physical matter. But if you have the time, Krotov will elaborate at length, something he now does regularly after the talks he gives outside the IBM lab in Cambridge where he works.
“Computation is a physical process,” he said recently. “We can study the flow of bits just as we study the flow of atoms.”
Traditional software runs on hard-coded instructions for processing data. The nodes and weights underpinning today’s LLM chatbots, by contrast, are designed to learn from raw data, with no explicit instruction, allowing them to take on more complex, open-ended problems.
As in the brain, the connections, or synapses, between these digital neurons grow stronger during learning. From their conception in the 1950s, digital neural networks were viewed as an alternative to traditional programs. But amid questions about their computational limitations, public perception shifted and interest waned, triggering what was later termed the “AI Winter.”
In 1982, Hopfield helped to revive interest in neural networks by showing that a 30-node network linked by 435 weights could store and retrieve patterns of 1s and 0s. His inspiration came from magnetic materials, and how one atom’s spin could influence its neighbors to fundamentally change the system’s behavior.
On their own, atoms or neurons behave predictably. But in a network, their collective properties can change as they interact, creating what physicists call emergent behavior. Similar unexpected shifts have been observed in all kinds of neural networks as the amount of data they are trained on scales up, giving them capabilities that go beyond their explicit training.
In Hopfield networks and other energy-based models, the system’s dynamic behavior can be expressed through an “energy” function, which assigns lower energy to correct configurations of neurons or real data points and higher energy to undesirable configurations or incorrect data.
The network’s dynamics progressively lower this energy, bringing the system toward one of potentially millions of stable, minimal-energy states. Hopfield showed that neurons could iteratively encode and retrieve stored patterns by minimizing the energy of the network.
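In its standard textbook form (notation varies across papers), that energy and the neuron update rule that lowers it look like this:

```latex
% Classical Hopfield network: energy function and asynchronous update rule
E(\mathbf{s}) = -\tfrac{1}{2}\sum_{i \neq j} w_{ij}\, s_i s_j ,
\qquad
s_i \leftarrow \operatorname{sign}\Big(\sum_{j} w_{ij}\, s_j\Big)
```

Here each neuron state $s_i$ is $\pm 1$ and the weights $w_{ij}$ are symmetric; every update either lowers the energy or leaves it unchanged, so the network inevitably settles into a local minimum.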
“You can think about computation as a ball rolling down a rugged landscape,” says Krotov. “The ball stops when it reaches one of the local minima. At that point, the computation is done. You take that state and convert it to the answer to the question you just asked.”
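To make that picture concrete, here is a minimal sketch of a classical Hopfield network in Python (a toy illustration, not code from Krotov or Hopfield; the pattern sizes and function names are made up): patterns are stored with a Hebbian rule, and retrieval repeatedly flips neurons in whichever direction does not raise the energy.

```python
import numpy as np

def train(patterns):
    """Hebbian storage: each row of `patterns` is a vector of +1/-1 values."""
    n = patterns.shape[1]
    W = patterns.T @ patterns / n        # symmetric weight matrix
    np.fill_diagonal(W, 0)               # no self-connections
    return W

def energy(W, s):
    """The Hopfield energy: lower means closer to a stored pattern."""
    return -0.5 * s @ W @ s

def recall(W, s, steps=200):
    """Asynchronous updates: flip one neuron at a time; the energy never increases."""
    s = s.copy()
    rng = np.random.default_rng(1)
    for _ in range(steps):
        i = rng.integers(len(s))
        s[i] = 1 if W[i] @ s >= 0 else -1
    return s

# Toy run: store two random 16-neuron patterns, then recover one from a corrupted copy.
rng = np.random.default_rng(0)
patterns = rng.choice([-1, 1], size=(2, 16))
W = train(patterns)
noisy = patterns[0].copy()
noisy[:3] *= -1                          # flip three bits
restored = recall(W, noisy)
print(energy(W, noisy), energy(W, restored))   # the "ball" has rolled downhill
print(np.array_equal(restored, patterns[0]))   # usually True for a few well-separated patterns
```

With only two patterns, retrieval is easy; the classical network’s capacity tops out at roughly 0.14 patterns per neuron, which is exactly the limit dense associative memory was designed to break.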
Transformers, by contrast, use a self-attention mechanism to process sequences of words, pixels, and other elements to encode and retrieve patterns. Their ability to process information in parallel makes them more computationally efficient than energy-based models, but their complexity makes them far less interpretable.
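For comparison, the self-attention step at the core of a transformer is usually written in its standard scaled dot-product form:

```latex
% Scaled dot-product attention (standard form)
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V
```

where $Q$, $K$, and $V$ are query, key, and value matrices computed from the input sequence and $d_k$ is the key dimension; every token attends to every other token in a single parallel pass, with no energy function guaranteeing where the computation will settle.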
Krotov grew up in Russia. His mother taught English; his grandparents taught physics and math and helped to set him on a similar path.
He won several math and physics competitions as a teenager, earning a spot at the elite boarding school in town, and later, Moscow State University, where he studied physics. He applied to a PhD program in Russia, but when he didn’t get in, he chose Princeton and eventually started working with a well-known theoretical physicist, Alexander Polyakov.
Krotov might have stayed there, in high-energy physics, if not for a seminar that introduced him to open questions in living systems. As his interests shifted, he moved to an office in the biology department, a few doors down from Hopfield, a polymath condensed matter physicist who had become famous for his insights into not only semiconductors but also proteins, genetics, and neural computation.
He was also a friendly sort. “Hopfield was the type willing to talk to random graduate students,” Krotov remembers. “He was one of those rare people who was both creative and technically skillful. He knew how to ask provocative questions.”
After finishing his thesis, Krotov joined the Institute for Advanced Study at Princeton as Hopfield’s postdoc. Most of Hopfield’s Nobel-winning work had been done three decades earlier at Caltech and Bell Labs, where he drew physicists into his orbit to study the computation behind memory. “He cast this new problem in the language of physics and that’s why the community followed him,” said Krotov.
Like nerve cells, Hopfield networks use recurrent feedback loops to summarize and store information, but they have nowhere near the brain’s memory capacity. Together, Hopfield and Krotov realized that far more memories could be packed into the same number of neurons by letting a single synapse connect more than two neurons at once, creating higher-order interactions.
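Concretely (this is a simplified paraphrase of their formulation, not the exact notation), the energy of a dense associative memory sums a rapidly growing function of the overlap between the current state and each stored pattern:

```latex
% Dense associative memory (simplified): xi^mu are the stored patterns, s the neuron states
E(\mathbf{s}) = -\sum_{\mu} F\big(\boldsymbol{\xi}^{\mu} \cdot \mathbf{s}\big),
\qquad F(x) = x^{n}
```

With $n = 2$ this essentially recovers the classical Hopfield energy; raising $n$ sharpens the minima around each stored pattern and lets the network hold far more memories in the same number of neurons.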
Their 2016 paper on dense associative memory helped to revive interest in Hopfield networks and energy-based models. Around the same time, a team at Google published its groundbreaking “attention” paper, paving the way for today’s transformer-based chatbots and LLM agents.
After finishing his postdoc, Krotov was recruited to join the new MIT-IBM Watson AI Lab. Over the last seven years, he has extended the work he did with Hopfield on associative memory to other AI architectures, as well as the brain.
In 2023, Krotov and his colleagues introduced the “energy transformer,” which reimagines the flexible but opaque architecture of a transformer as one whose computation is constrained by an energy function. In the energy transformer, you can watch as new “memory” patterns form and see where they end up. Locating stored information in a standard transformer, by contrast, is still virtually impossible because of the complex flow of data through its many layers.
“You can see where patterns in the energy transformer are stored, and you can watch as the model gradually extracts the relevant information from its memory banks,” says Benjamin Hoover, an IBM researcher who works closely with Krotov.
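The following is not the energy transformer itself, just a toy numpy sketch of the underlying idea (the real architecture adds layer norms, a Hopfield module, and other components, and all names and shapes here are illustrative): token vectors are updated by gradient descent on a scalar energy, and the gradient of a logsumexp term turns out to be a softmax-weighted, attention-like lookup over stored memory vectors.

```python
import numpy as np

def energy(X, M, beta=2.0):
    """Toy energy over token vectors X (T x d) given memory vectors M (K x d).
    Lower energy means the tokens sit closer to stored memories."""
    scores = beta * X @ M.T                                  # (T, K) similarities
    return -np.logaddexp.reduce(scores, axis=1).sum() / beta + 0.5 * (X ** 2).sum()

def descend(X, M, beta=2.0, lr=0.05, steps=100):
    """Gradient descent on the energy. The gradient of the logsumexp term is a
    softmax-weighted mixture of memories, i.e., an attention-like retrieval step."""
    X = X.copy()
    for _ in range(steps):
        scores = beta * X @ M.T
        attn = np.exp(scores - scores.max(axis=1, keepdims=True))
        attn /= attn.sum(axis=1, keepdims=True)              # softmax over memories
        X -= lr * (X - attn @ M)                             # dE/dX = X - attn @ M
    return X

# Example: four random "tokens" drift toward (mixtures of) three stored memories.
rng = np.random.default_rng(0)
M = rng.standard_normal((3, 8))
X = rng.standard_normal((4, 8))
print(energy(X, M), energy(descend(X, M), M))                # the second value should be lower
```

The point is only that computation here is explicit energy descent, so intermediate states can be inspected, which is the property Hoover describes above.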
On a second front, Krotov is exploring the links between associative memory and diffusion models, which learn to generate entirely new, realistic images by effectively correcting errors, or statistical noise, that’s been added to an image.
This error-correction process is not so different from Hopfield networks and other energy-based models that store and recall information. The older models, however, were traditionally seen as more faithful to their training data. When information in memory was retrieved, it usually came out looking the same.
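One standard way to see the link (this identity is textbook material rather than anything specific to Krotov’s papers): if a model’s probability density is written in Boltzmann form, the denoising direction a diffusion model learns, known as the score, is simply downhill motion on an energy landscape:

```latex
% Boltzmann form and its score: denoising is descent on an energy landscape
p(\mathbf{x}) \propto e^{-E(\mathbf{x})}
\quad\Longrightarrow\quad
\nabla_{\mathbf{x}} \log p(\mathbf{x}) = -\nabla_{\mathbf{x}} E(\mathbf{x})
```

Each denoising step nudges a noisy image toward lower energy, much as a Hopfield network nudges a corrupted pattern toward a stored memory.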
That faithfulness breaks down, as Krotov helped show, when energy models are fed a lot more data. Instead of returning a carbon copy, they might output something far more imaginative. The researchers now hypothesize that the transition from memorizing data to generating something wildly new might be a function of how much data the model is given. Previously, researchers dismissed these novel outputs as “spurious states,” essentially mistakes.
“These spurious states are very stable and reproducible, but they are also different from the training data,” said Bao Pham, an IBM intern and PhD student at RPI who recently presented the work at ICLR 2025’s associative memory workshop.
Krotov is also exploring how similar principles may apply to the brain, specifically to how regular nerve cells and astrocytes, one of the brain’s most abundant cell types, transmit information. With researchers at MIT, Krotov recently built a model of neuron-astrocyte interactions that revealed intriguing parallels between how the brain stores and retrieves memories and how digital networks do the same. They recently published their work in the journal PNAS.
“For years, people thought astrocytes were just support cells, but now, thanks to better imaging tools, we’re starting to realize they play a much bigger role,” said Leo Kozachkov, who worked on the project as a PhD student at MIT and is now a Goldstine postdoctoral fellow at IBM. “This research offers a new framework for exploring what these cells are doing. It could also provide new ideas for treating brain disorders and creating alternate AI architectures.”
AI’s most venerable conference, NeurIPS, grew out of the informal “Hopfests” started in the 1980s for neural network enthusiasts to share their work. NeurIPS is held each December, and for the last decade, Krotov has stayed the entire week to catch up with colleagues.
This year, however, he arrived late, so he could be in Stockholm to watch Hopfield accept his medal, along with the other 2024 Nobel laureates, including Geoffrey Hinton, who shared the physics prize. Amid several days of talks, tours, and performances, Krotov had a chance to rub shoulders with people he never dreamed of meeting. While waiting for the dining room to open one morning, he was even mistaken for a family member of Han Kang, the South Korean novelist who won the literature prize.
For the ceremony itself, Krotov traded his typical uniform of jeans and a t-shirt for a tailored tailcoat. Over lunch one day he passed around his phone to share photos with his colleagues at IBM. One researcher asked about the tailcoat. “A formal tux – the type with the tails,” he explained.
A physics researcher from MIT had dropped in for lunch and did a double take, as he realized where Krotov had been. “You touched a Nobel!” he said, his eyes growing wide. Krotov shrugged and chuckled. “I would never have imagined that I’d be invited anywhere near that room,” he said.
In his interactions around the lab, Krotov comes across as approachable and curious, not unlike his mentor. Except now it’s his turn to mentor a new generation of scientists.
“I've learned an enormous amount from Dima,” says Kozachkov. “The big lessons I’ve taken from working with him are to pursue ideas you find beautiful and interesting, and to be thorough and precise in thinking about them.”
Although there are still many promising avenues for transformers, Krotov is thinking about what comes next for the industry.
“Early quantum mechanics, before Schrödinger’s equation, looked like AI today — a lot of fascinating empirical observations with no theoretical understanding of what was going on,” he said. “Physicists have some of the best mathematical tools to study emergence, a phenomenon we see everywhere in AI. We should take advantage of them.”