Together with precision scaling, the IBM Research AI Hardware Center is also pushing the boundaries of multi-chip and chiplet packaging in both lateral and vertical dimensions.
At this year's IEEE Electronic Components and Technology Conference (ECTC), it announced the new direct bonded heterogeneous integration (DBHi) technology [1], which uses a silicon bridge, bonded between chips with copper pillars, as a chip-to-chip communication link. That's not all. The Center also reported for the first time on other advances in packaging technologies, including advanced laminates, 3D integration (3Di), hybrid bonding, photonics, improved thermal management, and AI-enabled modeling for packaging applications.
As a show of confidence in the future of this packaging technology, IBM is making significant investments in partnership with NY CREATES at our Albany, New York research site to further expand the capabilities and scope of our advanced heterogeneous integration agenda.
Even beyond the impressive capabilities of digital reduced-precision AI compute cores and their scalability potential with 2D and 3D packaging, the broad innovation horizon of the IBM Research AI Hardware Center roadmap extends to novel analog AI architectures. Neural networks are mapped onto arrays of non-volatile memory (NVM) elements acting as synaptic weights, and the multiplication and addition operations are performed in place, largely eliminating costly data movement and promising unparalleled speedup and energy efficiency for AI workloads.
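To make the in-memory compute idea concrete, here is a minimal, idealized Python sketch of an NVM crossbar performing a matrix-vector multiply in a single analog step. All device values, array sizes, and names here are illustrative assumptions of ours, not parameters of IBM's chips.

```python
import numpy as np

# Illustrative crossbar model: each synaptic weight is stored as the
# conductance of an NVM device at row i, column j.
rng = np.random.default_rng(0)
weights = rng.uniform(-1.0, 1.0, size=(4, 3))  # target weights (normalized)
g_max = 25e-6                                  # assumed max device conductance (S)

# Conductances cannot be negative, so signed weights are encoded as a
# differential pair of devices (one for positive, one for negative parts).
g_pos = np.where(weights > 0, weights, 0.0) * g_max
g_neg = np.where(weights < 0, -weights, 0.0) * g_max

# Input activations are applied as voltages on the rows.
x = np.array([0.2, -0.5, 0.1, 0.8])

# Ohm's law per device and Kirchhoff's current law per column compute the
# whole product in place: I[j] = sum_i V[i] * (G_pos[i, j] - G_neg[i, j]).
currents = x @ (g_pos - g_neg)

# Rescaling the column currents recovers the digital result.
print(currents / g_max)   # approx. equal to x @ weights
print(x @ weights)
```

In this idealized model the two results match exactly; on real devices, programming noise, drift, and read nonidealities are what make iso-accuracy a research challenge.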
In June 2021, the IBM Research AI Hardware Center reached another significant milestone, announcing the world's first 14-nanometer, fully on-hardware deep learning inference technology, demonstrated on not one but two types of analog AI compute chips based on back-end-inserted phase-change memory (PCM).
On one hand, an all-analog chip [2] relies on 35 million PCM devices and time-encoded communication to perform end-to-end, multi-layer neural network inference with 8.9 million synaptic weights, without extensive use of analog-to-digital converters (ADCs).
Benchmarking on the widely used Modified National Institute of Standards and Technology (MNIST) image dataset yields high classification accuracy. On the other hand, a mixed-precision analog chip [3] demonstrates inference on MNIST that matches digital accuracy, while scalability to large neural networks is evidenced by a ResNet-9 network running at 85.6% classification accuracy on the CIFAR-10 dataset at record speed and efficiency (10.5 TOPS per watt and 1.6 TOPS per square millimeter).
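The duration-format communication behind the all-analog chip [2] can be illustrated with a small, idealized simulation: an activation is encoded as how long a fixed read voltage is applied, so the charge integrated on a column wire accumulates weight times duration without a conventional per-value ADC. This is a conceptual sketch under our own assumed values, not the chip's actual circuit (which, for instance, handles signed weights with differential device pairs).

```python
import numpy as np

# Idealized duration-format MAC: activations become pulse widths,
# weights are conductances, and the column integrates charge.
V_READ = 0.2            # assumed fixed read voltage (V)
T_MAX = 100e-9          # assumed maximum pulse duration (s)
G_UNIT = 25e-6          # assumed unit conductance (S)

weights = np.array([0.7, -0.3, 0.5])       # normalized weights
activations = np.array([0.9, 0.4, 0.6])    # normalized, in [0, 1]

durations = activations * T_MAX            # time-encode the activations
g = weights * G_UNIT                       # map weights to conductances

# Charge on the shared column wire: Q = sum_i G[i] * V_READ * t[i].
# The multiply-accumulate happens as physics, not digital arithmetic.
charge = np.sum(g * V_READ * durations)

# Rescaling the integrated charge recovers the dot product.
print(charge / (G_UNIT * V_READ * T_MAX))  # approx. weights @ activations
print(weights @ activations)
```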
Details on the design and performance of the two chips are described in papers selected as highlights of the 2021 IEEE Symposia on VLSI Technology and Circuits.
While the IBM Research AI Hardware Center team continues to address in-memory compute challenges toward iso-accuracy on large networks at record efficiency, everyone can now use the IBM Analog Hardware Acceleration Toolkit to simulate both inference and training on artificial synaptic arrays [5].
A no-code version of the toolkit can be accessed via the AI Hardware Composer, which comes with predefined presets users can choose from to build their networks.
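For a taste of the toolkit's PyTorch interface, here is a minimal sketch following the aihwkit documentation; the layer sizes and toy data are arbitrary. An analog layer trains like any other torch module, with the analog-aware optimizer routing weight updates through the simulated device model.

```python
from torch import Tensor
from torch.nn.functional import mse_loss

from aihwkit.nn import AnalogLinear
from aihwkit.optim import AnalogSGD

# Toy data: two 4-dimensional inputs and their 2-dimensional targets.
x = Tensor([[0.1, 0.2, 0.4, 0.3], [0.2, 0.1, 0.1, 0.3]])
y = Tensor([[1.0, 0.5], [0.7, 0.3]])

# A fully connected layer whose weights live on a simulated analog crossbar.
model = AnalogLinear(4, 2)

# Analog-aware SGD applies updates through the simulated device model.
opt = AnalogSGD(model.parameters(), lr=0.1)
opt.regroup_param_groups(model)

for epoch in range(100):
    opt.zero_grad()
    pred = model(x)
    loss = mse_loss(pred, y)
    loss.backward()
    opt.step()
```

Swapping in different device presets (for example, PCM-like configurations) changes how noise and update nonidealities affect training, which is exactly what the toolkit is designed to explore.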
Precision scaling, advanced chiplet packaging, and in-memory computing constitute the foundation needed to optimize data movement and maximize throughput. The IBM Research AI Hardware Center will continue to meet AI's technology challenges and sustain its pace of innovation by combining some or all of these elements in its upcoming cores.
1. Sikka, K., Divakaruni, R., et al. "Direct Bonded Heterogeneous Integration (DBHi) Si Bridge." 2021 IEEE 71st Electronic Components and Technology Conference (ECTC), Proceedings, p. 136.
2. Narayanan, P., Burr, G. W., et al. "Fully on-chip MAC at 14nm enabled by accurate row-wise programming of PCM-based weights and parallel vector-transport in duration-format." 2021 Symposium on VLSI Technology, Digest of Technical Papers, T13-3.
3. Khaddam-Aljameh, R., Eleftheriou, E., et al. "A 14nm CMOS and PCM-based In-Memory Compute Core using an array of 300ps/LSB Linearized CCO-based ADCs and local digital processing." 2021 Symposium on VLSI Technology, Digest of Technical Papers, JFS2-5.
4. Venkataramani, S., Gopalakrishnan, K., et al. "AI Accelerator for Ultra-low Precision Training and Inference." 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA).
5. Rasch, M. J., Narayanan, V., et al. "A Flexible and Fast PyTorch Toolkit for Simulating Training and Inference on Analog Crossbar Arrays." 2021 IEEE 3rd International Conference on Artificial Intelligence Circuits and Systems (AICAS), Proceedings.