Consider what can happen in the instant between swiping your credit card and having your transaction approved. It’s already a modern miracle that trillions of transactions authorizing sales in split seconds can take place around the world each day. But in that tiny window of time, what if a powerful AI system could determine whether that transaction was fraudulent, rather than figuring it out after the fact?
Today, IBM is unveiling the IBM Telum Processor, a new CPU chip that will allow IBM clients to use deep learning inference at scale. Telum is IBM’s first commercial processor to contain on-chip acceleration for AI inferencing. This could lead to breakthroughs in combating fraud, in credit approval, claims and settlements, and in financial trading, with systems able to conduct inferencing at the speed of a transaction.
From the dawn of the digital age, IBM has regularly produced new chip designs. From the earliest mainframes to today’s servers, we’ve released ever more powerful CPUs every few years. The original IBM Z and Power systems were major shifts in technology capability, and we believe Telum is the next shift for IBM. The AI technology powering Telum comes, in part, from years of work by the AI hardware team at IBM Research.
Artificial intelligence is enabling automation in a wide spectrum of industries but requires very high computational horsepower. Roughly six years ago, we started looking into building purpose-built AI hardware to meet the future challenges that will require dedicated processing power for AI systems. Over that time, we’ve built three generations of AI cores, and in 2019 launched the AI Hardware Center in Albany, New York to create a wider AI hardware-software ecosystem. Since 2017, we’ve been consistently improving the performance efficiency of our AI chips, boosting power performance by 2.5 times each year.
Our goal is to continue improving AI hardware compute efficiency by 2.5 times every year for a decade, achieving 1,000 times better performance by 2029.
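As a rough check on how these two figures relate, the sketch below compounds a 2.5x yearly gain from a baseline of 1. The baseline year and the once-per-year compounding model are assumptions made for illustration; the text does not spell out how the 2.5x rate and the 1,000x target are reconciled.

```python
import math

ANNUAL_GAIN = 2.5   # yearly efficiency multiplier stated in the text
TARGET = 1_000      # cumulative improvement target stated in the text


def cumulative_gain(years: int, annual_gain: float = ANNUAL_GAIN) -> float:
    """Cumulative improvement after `years` of compounding at `annual_gain`."""
    return annual_gain ** years


def years_to_reach(target: float, annual_gain: float = ANNUAL_GAIN) -> float:
    """Fractional number of years of compounding needed to reach `target`."""
    return math.log(target) / math.log(annual_gain)


if __name__ == "__main__":
    for year in range(1, 11):
        print(f"year {year:2d}: {cumulative_gain(year):10.1f}x")
    print(f"years to reach {TARGET}x: {years_to_reach(TARGET):.2f}")
```

Under this simple model, the cumulative gain passes 1,000x during the eighth year, so a sustained 2.5x rate would more than cover the 1,000x target within the decade.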
Our most recent AI core design was presented at the 2021 International Solid-State Circuits Conference (ISSCC) as the world’s first energy-efficient AI chip. It’s at the forefront of low-precision AI training and inference, and is built on 7nm chip technology. Over the past few years, we’ve been working with the IBM Systems teams to integrate the AI core technology from this chip into IBM Z. This work eventually became part of a Telum-based system, which we expect in the first half of next year.
We see Telum as the next major step on the path of our processor technology, much as the inventions of the mainframe and the server were before it. The challenges facing businesses around the world keep getting more complex, and Telum will help solve these problems for years to come.
How it works
In traditional computing systems built around CPUs, calculations are performed by repeatedly transferring instructions and data between memory and the processor. But AI workloads have much higher computational requirements and operate on large quantities of data. So as you infuse AI into application workflows, it is critical to have a heterogeneous system in which CPU and AI cores are tightly integrated on the same chip to support very low-latency AI inference.
Telum achieves exactly that, providing a dedicated AI resource for future IBM systems alongside the traditional horsepower of the CPU. The CPU cores handle general-purpose software applications, the AI cores are highly efficient at running deep learning workloads, and the tight coupling of the two types of cores helps facilitate fast data exchange.
Telum follows IBM’s history of a full-stack approach to system design—co-optimizing silicon technology, hardware, firmware, operating systems, and middleware—for our clients’ most critical workloads. With Telum, clients can process tens of thousands of transactions infused with AI every second.
The chip contains eight processor cores running at a clock frequency of more than 5GHz, optimized for the demands of enterprise-class workloads. The completely redesigned cache and chip-interconnection infrastructure provides 32MB of cache per core. The chip also contains 22 billion transistors and 19 miles of wire on 17 metal layers.
This horsepower is necessary to keep up with the demands of enterprise-grade AI solutions. AI model sizes are getting larger, as are the datasets they rely on, and the systems used to deploy them are getting increasingly complex. Telum represents the first IBM product developed using research advances from the AI Hardware Center. We believe that the future of AI is moving from systems that rely on reams of data and machine learning models to ones that can reason and infer more like humans, providing context and nuance to future business decisions. We call this Neurosymbolic AI, and we’re working on our own hardware to support this vision, not just in the lab but in the enterprise.
AI keeps proliferating into more and more enterprise systems. Even today, the fraud detection systems that many financial institutions use to determine whether transactions are legitimate are based on AI. But even the fastest systems tend to react seconds, minutes, or hours after the transaction happens, and tend to be quite compute-intensive. By then, the bad actor has had time to get away with the stolen items or cash. According to a recent Federal Trade Commission report, consumers lost more than $3.3 billion to fraud in 2020, up from $1.8 billion in 2019.
With Telum, financial institutions will be able to move from fraud detection to fraud prevention, catching instances of fraud while the transaction is still ongoing. The dedicated AI core in Telum makes future IBM systems ideal for deploying AI across a wide range of financial services workloads, such as payment and loan processing, clearing and settling trades, detecting money laundering, and risk analysis. But the potential use cases go beyond banking. Our research team thinks these AI cores could be applicable in natural language processing, computer vision, speech and genomics, and more, enabling the AI revolution powered by deep learning.
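The difference between detection and prevention can be sketched in a few lines. This is a toy illustration only, not Telum’s programming interface: `Transaction`, `Decision`, `toy_risk_score`, and the 0.9 threshold are all hypothetical names and values, and the scoring rule stands in for a real deep learning model.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class Transaction:
    account_id: str
    amount: float


@dataclass(frozen=True)
class Decision:
    approved: bool
    risk: float


ScoreFn = Callable[[Transaction], float]


def toy_risk_score(txn: Transaction) -> float:
    """Hypothetical stand-in for a deep learning model; returns a risk in [0, 1]."""
    return 0.95 if txn.amount > 5_000 else 0.05


def authorize_inline(txn: Transaction, score_fn: ScoreFn,
                     threshold: float = 0.9) -> Decision:
    """Prevention: score while the transaction is in flight and decline if risky."""
    risk = score_fn(txn)
    return Decision(approved=risk < threshold, risk=risk)


def authorize_then_review(txn: Transaction, score_fn: ScoreFn,
                          threshold: float = 0.9) -> Tuple[Decision, bool]:
    """Detection: approve first, then flag the transaction for later follow-up."""
    decision = Decision(approved=True, risk=score_fn(txn))
    flagged = decision.risk >= threshold
    return decision, flagged


if __name__ == "__main__":
    suspicious = Transaction(account_id="acct-1", amount=9_000.0)
    print(authorize_inline(suspicious, toy_risk_score))
    print(authorize_then_review(suspicious, toy_risk_score))
```

In the inline path the risky transaction is never approved, whereas in the review path it is approved and only flagged afterward. The inline path is only practical if scoring is fast enough to fit inside the transaction itself.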
We are at a pivotal moment in computing, in the midst of a transition where AI is being applied everywhere. That has implications for the hardware and systems we design and the software we build. Such transitions don’t happen often, perhaps once every 10 to 30 years. The last significant shift into a new form of compute was in the ’80s. With the new artificial intelligence workloads we’re facing, there’s an opportunity for great ideas from our team at IBM Research, working closely with IBM Systems, to create new AI compute capabilities and lay the foundation for the next generation of computing.