
It takes a village to make open infrastructure for AI a reality

At this year’s AI Hardware Forum, IBM Research and its partners showcased how they’re transforming the AI age with complete solutions in an open ecosystem.

“The next leap in AI is not going to be made by a single company or a single lab,” proclaimed Griselda Bonilla, senior technical staff member at IBM Research. “It’s going to be done with collaboration.”

Standing center stage in the Thomas J. Watson Research Center’s auditorium, Bonilla delivered these opening remarks as she kicked off the sixth annual AI Hardware Forum, sponsored by the IBM Research AI Hardware Center. What followed was a day of presentations and panels led by AI researchers from industry and academia, centered around building open software that supports a heterogeneous hardware ecosystem for the future of AI.

Since its launch in 2019, the AI Hardware Center’s goal has been to help realize AI’s potential, and that work is proceeding apace. Its biggest victory to date has been the launch of IBM’s Spyre AI accelerator for IBM z17 and Power11 systems. Evolving out of prototypes from the Center, this dedicated AI system-on-chip was developed in partnership with the IBM Infrastructure team. Spyre is now a product-grade AI accelerator, poised to dramatically improve the power performance of generative AI workflows, while delivering the security and reliability IBM hardware is known for.

Since last year’s forum, IBM Research has also announced a breakthrough in co-packaged optics intended to supercharge generative AI. One of the biggest bottlenecks in AI computing isn’t chip power; it’s moving all the data. The new interconnect device increases the optical fiber density at the edge of a chip by six times over existing optical waveguides, offering an estimated 80-fold improvement in bandwidth. It stands to revolutionize power efficiency for datacenters, allowing processors to work at full capacity rather than idling while they wait for data to travel over copper, meaning fewer processors could be used for model training and inference.

Along with the emphasis on hardware, completeness was a central theme of this year’s AI Hardware Forum. “A chip alone isn’t enough,” said IBM Research’s VP of hybrid cloud Mukesh Khare. As powerful as Spyre (or any other company’s eventual AI accelerator) may be, it needs to be backed up by a full software stack. And that stack won’t be built by IBM alone.

The sixth annual AI Hardware Forum brought together IBM Research's existing partners, alongside representatives from potential new partner companies and universities, for the AI Hardware Center's flagship event.

Growing partnerships

None of these achievements would mean much in a vacuum; they depend on the AI Hardware Center’s partners in industry and academia. To that end, Khare announced the next evolution of IBM Research’s partnership with the University at Albany, which will now include additional projects and updated hardware for the Center for Emerging AI Systems, the testbed collaboration formed by the two institutions in 2023.

IBM and UAlbany's first prototype computing cluster was built with experimental Spyre devices in 2024, and the center will now receive the new product-grade Spyre accelerators. The university will be IBM Research’s first external partner to have them. They will also be available to other institutions that want to pursue research collaborations.

Along with the new devices, UAlbany and IBM have seven new AI research projects eligible for funding in the coming year, adding to the five that began last year. The new work spans from nitty-gritty computer science research on AI models to specific scientific applications of AI. One project in the math and statistics department, for example, is entitled “Scalable and Expressive Attention Mechanisms for NLP,” while a cancer epidemiology and biostatistics project is called “Accelerating Mutational Signature Extraction Using IBM Spyre Accelerators.”

This year IBM also launched a collaboration with the National University of Singapore. Researchers in the new center will be working on projects around weather modeling and prediction, sustainable urban development, and security. They’ll be making use of IBM’s full-stack AI infrastructure, including Spyre accelerators and the open-source Granite family of models, while simultaneously focusing on microelectronics research to accelerate AI systems from the chip up.

AI Hardware Center Director Jeff Burns outlined where the Center has been — and where it is going next.

An evolving center

The audience at the 2025 AI Hardware Forum included the Center’s existing partners in industry and academia. And as Khare pointed out, representatives of more than 40 non-member companies and more than 10 non-member universities were also in attendance. This participation and interest from potential new partners spoke to the continuous evolution of the AI Hardware Center, whose focus isn’t limited to inventing chips. To Ed Barth, a business development executive focused on this space for IBM Research, the AI Hardware Center can be thought of as an entry point to the network of collaborations that will be required to drive the whole field forward.

This year, IBM showcased new contributions to open-source projects aimed at building the next generation of open, extensible inference systems. In adapting open-source tools to maximize Spyre’s performance and portability, IBM Research has contributed improvements back to the community. For example, IBM adopted the inference serving library vLLM as Spyre’s inference runtime and torch.compile as its frontend compiler. The work done to make this possible will allow developers to integrate Spyre and other emerging accelerators into their stacks with minimal effort.
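
To give a rough sense of the open interfaces involved, here is a minimal sketch of generic torch.compile and vLLM usage. It is illustrative only, not IBM’s Spyre integration itself, and the model name and settings below are placeholder choices:

```python
# Illustrative sketch only: generic torch.compile and vLLM usage, not the
# Spyre-specific integration. Model name and settings are placeholders.
import torch
from vllm import LLM, SamplingParams

# torch.compile as a frontend compiler: capture an ordinary PyTorch module's
# graph so it can be lowered to whatever backend is registered.
model = torch.nn.Linear(512, 512)
compiled_model = torch.compile(model)
_ = compiled_model(torch.randn(1, 512))

# vLLM as the inference runtime: the same serving API regardless of which
# accelerator plugin sits underneath.
llm = LLM(model="ibm-granite/granite-3.0-2b-instruct")  # placeholder model
outputs = llm.generate(
    ["What does a co-packaged optics module do?"],
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)
```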

In the coming year, the AI Hardware Center will focus on enhancing and enriching content for member organizations with a deepening commitment to the open-source software that supports AI compute, including PyTorch, vLLM, and llm-d. At the same time, the Center will continue evolving its foundational work on digital and analog accelerators, including reduced precision math and algorithms for AI.

The Center’s efforts in heterogeneous integration and advanced packaging, including concepts like chiplets and 3D stacking, will keep moving forward, too. The goal is to extend what can be done with existing and future CMOS devices, including co-packaged optics and wafer-level fan-out packaging.

Far-reaching goals

Several talks over the course of the day discussed the growing energy demands of AI workloads. Minlan Yu, professor of computer science at Harvard University, took the stage to break down AI’s energy hunger in greater detail. She noted that power is one of the greatest constraints on the growth of AI, and that AI datacenters break the power grid’s normal assumptions about load diversity: different power users need different amounts of energy in different places, which naturally creates a leveling effect over time. Datacenters, on the other hand, demand massive amounts of power at a small number of sites, with a high concentration of machines in a small space and rapid swings in their power needs. In other words, AI datacenters pack enormous power consumption into a few places and rarely let up.

As part of Yu’s work, she and her Harvard collaborators are looking into new ways to transform datacenters from passive consumers to active self-regulating collaborators, all while modernizing market structures in the grid to accommodate changing load requirements.

Mike Schulte, a senior fellow at AMD Research, pointed out that reasoning and agents are fueling a surge in compute demand. At the root of the power crisis, he explained, is the fact that today’s massive volume of inference requests requires datacenters filled with CPUs and GPUs, while exponential growth in AI model size is simultaneously driving huge increases in the energy needed for training. The energy required to move data is also a major issue, meaning there will likely be higher demand for AI compute that is kept as local as possible. Meanwhile, for the first time, hardware is being developed faster than software, and the growth of compute efficiency is slowing. That creates an opportunity for AI-specific compute that increases efficiency. Low-precision number formats will likely be part of this future, Schulte said, along with reducing memory energy, increasing energy bandwidth, processing in memory, and optical communication, all things that both AMD and IBM are working on.

On the software ecosystem side of things, multiple talks focused on the open-source community’s role in integrating backend software with different processors, as well as building, training, and deploying models. In his presentation on PyTorch, Meta senior engineer Alban Desmaison outlined how it can work as a backend for multiple AI accelerator types, in line with IBM’s commitment to advancing heterogeneous compute. And in a plenary talk on building, training, and deploying models, Red Hat director of engineering and IBM distinguished engineer Carlos Costa took the stage to talk about how vLLM and llm-d are making it easier to scale LLM inference, which has become more complex in this multi-accelerator, high-demand era.
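
To give a sense of what that portability means in practice, here is a minimal, hypothetical sketch (not from Desmaison’s talk) of device-agnostic PyTorch code; the same few lines run unchanged whether the selected backend is a CPU, a GPU, or another registered accelerator:

```python
# Minimal, illustrative sketch of device-agnostic PyTorch code. The check
# below only looks for CUDA; other accelerator backends expose their own
# device types that can be selected the same way.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(256, 10).to(device)
inputs = torch.randn(8, 256, device=device)
print(model(inputs).shape)  # torch.Size([8, 10]) on whichever device was picked
```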

“It’s old-fashioned to talk about just training and inference,” said AI Hardware Center Director Jeff Burns in his Forum talk. Nowadays it’s more accurate to break the divisions down into data preparation, distributed training, model customization, and inference, he argued. As he reflected on how far the Center has come, from the first discussions of low-precision arithmetic to adding AI to compute by putting it on a PCIe card, Burns emphasized how the field is far from settled: “I still think we’re in inning one of at least a nine-inning ballgame.”
