Publication
ISSCC 2024
Short course
Architecture and Design Approaches to ML Hardware Acceleration: High-Performance Compute Environments
Abstract
With the recent explosion in generative AI and large language models, hardware acceleration has become particularly important in high-performance compute environments. In such applications, AI accelerators must support a broad range of models and enable workflows spanning model pre-training, fine-tuning, and inference. System-level design and software co-optimization must be considered to balance compute and communication costs, especially as inference workloads drive aggressive latency targets and model size growth drives the use of distributed systems. This talk will discuss these considerations in the context of high-performance system deployments and explore approaches to AI accelerator circuit design, as well as research roadmaps to improve both compute efficiency and communication bandwidth.
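The compute/communication balance called out in the abstract can be made concrete with a back-of-envelope latency model. The Python sketch below estimates per-token decode latency for a tensor-parallel LLM deployment; every hardware figure and model shape in it is an illustrative assumption, not a number from the talk.

```python
# Back-of-envelope model of per-token decode latency for a tensor-parallel
# LLM deployment. All numbers below are illustrative assumptions.

def decode_latency_per_token(
    params_b: float,           # model size in billions of parameters
    tp_degree: int,            # tensor-parallel width (number of accelerators)
    mem_bw_tbs: float,         # per-accelerator memory bandwidth, TB/s
    link_bw_gbs: float,        # per-accelerator interconnect bandwidth, GB/s
    hidden_dim: int,           # model hidden dimension
    n_layers: int,             # number of transformer layers
    bytes_per_param: int = 2,  # fp16/bf16 weights
    allreduce_latency_us: float = 10.0,  # assumed launch/sync cost per all-reduce
) -> tuple[float, float]:
    """Return (compute_ms, comm_ms) for one decode step.

    Single-token decode is memory-bandwidth bound: each step streams the
    local weight shard once. Tensor parallelism adds two all-reduces of a
    hidden-dim activation vector per layer (attention out-projection and
    MLP down-projection); at decode-time message sizes, the fixed per-op
    latency dominates the wire time.
    """
    # Weight bytes each accelerator must read per token.
    shard_bytes = params_b * 1e9 * bytes_per_param / tp_degree
    compute_ms = shard_bytes / (mem_bw_tbs * 1e12) * 1e3

    # Ring all-reduce moves ~2*(p-1)/p of the vector over the slowest link.
    vec_bytes = hidden_dim * bytes_per_param
    allreduce_bytes = 2 * (tp_degree - 1) / tp_degree * vec_bytes
    per_op_s = allreduce_latency_us * 1e-6 + allreduce_bytes / (link_bw_gbs * 1e9)
    comm_ms = 2 * n_layers * per_op_s * 1e3
    return compute_ms, comm_ms


# Example: a hypothetical 70B-parameter model sharded across 8 accelerators.
c, m = decode_latency_per_token(
    params_b=70, tp_degree=8, mem_bw_tbs=3.0,
    link_bw_gbs=450, hidden_dim=8192, n_layers=80)
print(f"compute ~ {c:.2f} ms/token, communication ~ {m:.2f} ms/token")
```

Under these assumed numbers, weight streaming costs roughly 5.8 ms per token while the 160 all-reduces add about 1.6 ms, almost entirely fixed latency rather than bandwidth. This illustrates why the abstract pairs compute efficiency with communication bandwidth and latency as joint targets: scaling out the tensor-parallel width cuts the compute term but grows the communication term, so neither can be optimized in isolation.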