As the newest AI accelerator, Spyre has an architecture very similar to that of the first prototype. Spyre has 32 individual accelerator cores onboard and contains 25.6 billion transistors connected by 14 miles of wire. It will be produced using 5 nm process technology, and each Spyre chip is mounted on a PCIe card. Cards can be clustered together: for example, a cluster of 8 cards adds 256 additional accelerator cores (8 × 32) to a single IBM Z system.
Roughly 70% of the world’s transactions by value run through IBM mainframes, and now there’s a simple way to bring generative AI to these mission-critical machines. The new Spyre accelerators are meant to help enterprise users expand their AI capabilities as their needs grow. As users begin to test and deploy new AI models and programs across their organizations, they will need more horsepower to get these tasks done. New use cases tailor-made for businesses are springing up every day, from generative AI solutions that automate business processes to generative systems for app modernization. With the Spyre Accelerator, businesses can deploy AI software on IBM Z while benefiting from the security and reliability the platform offers.
The Spyre accelerator is the first system-on-a-chip that will allow future IBM Z systems to perform AI inferencing at a greater scale than is available today. Working with the IBM Z team, IBM Research has helped build a device that brings modern AI workloads to the mainframe. This work puts IBM Z on a path where generative AI and model fine-tuning are possible on premises. The teams’ design packages Spyre on PCIe cards that can be slotted directly into an IBM Z system.
As with Telum and the AIU prototype before it, Spyre’s architecture is far more efficient for AI tasks than industry-standard CPUs. In traditional computing architectures, instructions and data for calculations are constantly transferred between the processor and memory. But because most AI calculations involve matrix and vector multiplication, the IBM Research chip architecture for AI devices uses a simpler layout than CPUs, which are designed to be jacks-of-all-trades. Here, the chip is designed to send data directly from one compute engine to the next, which saves energy. This family of processors also uses a range of lower-precision numeric formats (such as int4 and int8) to make running an AI model more energy efficient and far less memory intensive.
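To make the memory argument for int4 and int8 concrete, here is a minimal sketch of per-tensor symmetric linear quantization. It is a generic textbook technique chosen for illustration, not a description of how Spyre actually encodes or scales its low-precision values; the function names (`quantize_symmetric`, `dequantize`, `storage_bytes`) and the sample weights are hypothetical. Each float is mapped to a small signed integer plus one shared scale factor, so an 8-value tensor that needs 32 bytes in fp32 fits in 8 bytes as int8 or 4 bytes as packed int4, at the cost of a bounded rounding error.

```python
def quantize_symmetric(values, bits):
    """Per-tensor symmetric linear quantization to signed `bits`-bit integers (illustrative only)."""
    qmax = 2 ** (bits - 1) - 1  # 127 for int8, 7 for int4; symmetric range leaves -128 / -8 unused
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0  # one float scale shared by every value
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Recover approximate floats; per-value error is about scale / 2 at most."""
    return [x * scale for x in q]


def storage_bytes(n_values, bits):
    """Bytes needed if values are packed tightly (assumes two int4 values per byte)."""
    return (n_values * bits + 7) // 8


# Hypothetical example weights, not taken from any real model.
weights = [0.82, -1.27, 0.05, 0.4, -0.33, 1.1, -0.9, 0.0]

for bits in (8, 4):
    q, scale = quantize_symmetric(weights, bits)
    recon = dequantize(q, scale)
    max_err = max(abs(w - r) for w, r in zip(weights, recon))
    print(f"int{bits}: codes={q} max_err={max_err:.4f} bytes={storage_bytes(len(weights), bits)}")

print(f"fp32 bytes={storage_bytes(len(weights), 32)}")  # 32, versus 8 for int8 and 4 for int4
```

The trade-off shows up directly in the printed output: int4 halves storage again relative to int8, but its larger scale step produces a larger reconstruction error, which is why real deployments pick the format per workload.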
The new Spyre Accelerator could open up new use cases for IBM Z. Beyond simply detecting fraud in transactions, a system equipped with a Spyre cluster could use much more complex AI models to identify intricate fraud patterns that a less sophisticated model might miss.
It also expands how IBM Z can make use of generative AI and watsonx, IBM’s AI and data platform. With Spyre, products like watsonx Code Assistant, which helps businesses modernize code bases on mainframes, can run with far greater efficacy. You can use generative AI to understand what the code in your application is doing and what needs to be updated, amended, or removed.
This is only the next step in what IBM Research sees as possible for AI on IBM Z. Teams are working to move beyond inferencing and find effective, robust ways to fine-tune, and potentially even train, models on mainframes. With systems like these, it’s easy to envision a future where organizations that want to keep their data secure on their premises (or can’t move it for regulatory or privacy reasons) train and deploy models on platforms like watsonx entirely within their organization, with all data remaining securely in place.