Enhancing enterprise AI with the IBM Spyre Accelerator

At Hot Chips 2024, IBM previewed its new Spyre accelerator chip for IBM Z, designed in collaboration with IBM Research, to scale the enterprise AI workloads of tomorrow.

The explosion in AI’s capabilities over the last few years has been immense. And we’re starting to see how AI can be applied to real business use cases at the scale needed for enterprise demands.

In 2022, IBM unveiled the IBM z16, the latest system that brought powerful AI capabilities to IBM Z for the first time. Onboard the microprocessor chip, Telum, was a new AI accelerator jointly designed by IBM Research and IBM Infrastructure. It brought the ability to run AI inferencing at the speed of a transaction — like checking for fraud during a credit card swipe — to IBM Z.

The research behind AI accelerators continued, and a very similar architecture was applied to a new IBM artificial intelligence unit (AIU) prototype chip. The capabilities of the single accelerator in Telum were dramatically expanded by the 32 accelerator cores in the AIU prototype. And now, IBM has worked to evolve this prototype chip into an enterprise-grade product so that it can be incorporated into the next-generation mainframe. The result of that work is the new IBM Spyre accelerator, previewed at the Hot Chips 2024 conference in Palo Alto, California.

Image: Telum II and Spyre chip.

As the newest AI accelerator, Spyre shares a very similar architecture to that first prototype. Spyre has 32 individual accelerator cores onboard, and contains 25.6 billion transistors using 14 miles of wire. It will be produced using 5 nm node process technology, and each Spyre is mounted on a PCIe card. Cards can be clustered together — for example, a cluster of 8 cards adds 256 additional accelerator cores to a single IBM Z system.
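The cluster arithmetic above is easy to check with a few lines of Python. This is only an illustrative sketch: the figure of 32 cores per card comes from the article, while the function and constant names are made up for this example.

```python
# Accelerator cores per Spyre chip, as stated in the article.
CORES_PER_SPYRE = 32


def cluster_cores(num_cards: int, cores_per_card: int = CORES_PER_SPYRE) -> int:
    """Return the total accelerator cores added by a cluster of PCIe cards.

    Hypothetical helper for illustration; assumes one Spyre chip per card.
    """
    if num_cards < 0:
        raise ValueError("num_cards must be non-negative")
    return num_cards * cores_per_card


if __name__ == "__main__":
    # A cluster of 8 cards, as in the article's example.
    print(cluster_cores(8))
```

For the 8-card cluster mentioned above, this gives 8 × 32 = 256 additional cores.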

Roughly 70% of the entire world’s transactions by value run through IBM mainframes. And now, there’s a simple way to bring generative AI to these mission-critical machines. The new Spyre accelerators are meant to help enterprise users expand their AI capabilities as their needs grow. As users begin to test and deploy new AI models and programs across their organizations, they’re going to require more horsepower to get these tasks done. Every day, new use cases tailor-made for businesses are springing up, from generative AI solutions for automating business processes to generative systems for app modernization. With the Spyre accelerator, businesses can deploy AI software on Z while benefiting from the security and reliability IBM Z offers.

The Spyre accelerator is the first system-on-a-chip that will allow future IBM Z systems to perform AI inferencing at an even greater scale than is available today. Working with the IBM Z team, IBM Research has helped build a device that brings modern AI workloads to the mainframe. This work puts the IBM Z system on a path where generative AI and model fine-tuning are possible on premises. The teams’ innovative design approach allows the PCIe cards to be slotted into an IBM Z system.

As with Telum and the AIU prototype before it, Spyre’s architecture is far more efficient for AI tasks than industry-standard CPUs. In traditional computing structures, instructions and data for calculations are constantly transferred between the processor unit and memory. But as most AI calculations involve matrix and vector multiplication, the IBM Research chip architecture for AI devices features a simpler layout than CPUs that are designed to be jacks-of-all-trades. Here, the chip has been designed to send data directly from one compute engine to the next, leading to energy savings. This family of processors also uses a range of lower-precision numeric formats (such as int4 and int8) to make running an AI model more energy efficient and far less memory intensive.
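To give a rough sense of why lower-precision formats save memory, here is a minimal Python sketch of symmetric int8 quantization. It is a generic textbook scheme, not a description of how Spyre or Telum actually quantize models, and the function names and the one-million-weight example are assumptions made purely for illustration.

```python
def quantize_int8(values):
    """Map floats to integer codes in [-127, 127] plus one shared scale.

    Generic symmetric per-tensor quantization; an assumption for
    illustration, not IBM's documented method.
    """
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float values from the integer codes."""
    return [c * scale for c in codes]


def weight_bytes(num_weights, bits_per_weight):
    """Bytes needed to store the weights, rounded up to whole bytes."""
    return (num_weights * bits_per_weight + 7) // 8


if __name__ == "__main__":
    weights = [0.5, -1.0, 0.25, 0.0]
    codes, scale = quantize_int8(weights)
    print(codes, dequantize(codes, scale))

    # Storage for a hypothetical model with one million weights.
    for bits in (32, 8, 4):
        print(bits, "bits:", weight_bytes(1_000_000, bits), "bytes")
```

In this sketch, moving from 32-bit floats to int8 cuts weight storage by a factor of four, and int4 by a factor of eight, at the cost of a small rounding error of at most half the scale per weight.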

The new Spyre Accelerator will lead to exciting potential new use cases for IBM Z. Beyond simply detecting fraud in transactions, a system equipped with a Spyre cluster could leverage much more complex AI models to identify intricate fraud patterns that a less sophisticated model might have missed.

Image: The reverse of the card.

It also opens up how IBM Z can make use of generative AI and watsonx, IBM’s AI and data platform. Spyre brings the ability to run products like watsonx Code Assistant, which allows businesses to modernize code bases on mainframes, with far greater efficacy. You can use generative AI to understand what code is doing in your application, and what needs to be updated, amended, or just removed.

This is only the next step of what IBM Research sees as possible when it comes to AI on IBM Z. Teams are working to move beyond inferencing, to find effective and robust ways to fine-tune, and potentially even train, models on mainframes. With systems like these, it’s easy to envision a future where organizations and businesses that want to keep their data secure on their premises (or can’t move data for regulatory or privacy reasons) could begin to train and deploy models on platforms like watsonx entirely within their organization, with all data remaining securely in place.
