VLSI 2020: IBM Research highlights nanosheet, AI processor and photonics advances

At the 2020 Symposia on VLSI Technology and Circuits this week, IBM Research is presenting a variety of papers, short courses, workshops and virtual sessions that demonstrate the latest advances in systems research. Our research spotlights key developments for hybrid cloud infrastructure and AI, marked by improvements in performance, energy efficiency, and area scaling, and by support for new workloads.

At VLSI’s first-ever virtual conference, IBM researchers are presenting their work on a universal air spacer compatible with different transistor architectures, whether fin field-effect transistor (FinFET) or nanosheet. Another team of IBM researchers demonstrates a new AI processor core design whose hardware utilization improvements led to notable gains in training efficiency and performance. In a third paper, researchers focused on faster silicon photonics-based network switching, with the goal of eventually making these networks more useful for data centers.

Air Spacer

The new air spacer design, imaged with a transmission electron microscope.

In their paper, “Improved Air Spacer Co-Integrated with Self-Aligned Contact (SAC) and Contact Over Active Gate (COAG) for Highly Scaled CMOS Technology,” IBM researchers described how the new air spacer reduces effective capacitance – a critical factor impacting the characteristics of CMOS devices – by 15 percent through a lower spacer dielectric constant, yielding performance gains and power reductions at the same time. Although SAC and COAG have been adopted in FinFET technology to reduce the footprint of transistors and standard cells, co-integrating air spacers with SAC and COAG has been challenging.

The spacer is an isolation layer between the gate and the source and drain contacts of a transistor, which is essentially an electronic switch. When the gate is on, electricity flows from the source to the drain, and the gate serves as a valve. The spacer ensures that only the gate controls this flow and that the gate is electrically isolated from the source and drain. Without the spacer, the gate cannot serve as a valve.

Researchers positioned their improved air spacer as a viable approach to enhance energy efficiency and performance of advanced CMOS technology by reducing parasitic capacitance, the unwanted capacitance between the parts of an electronic component or circuit due to their proximity to one another.
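
To make the link between dielectric constant and capacitance concrete, the following minimal sketch uses the idealized parallel-plate formula C = ε0·k·A/d. It is an illustration only, not the model used in the paper: the function names, the geometry, and the dielectric constants (about 7 for a nitride-like solid spacer, about 1 for air) are assumptions chosen for the example.

```python
EPSILON_0 = 8.854e-12  # vacuum permittivity in farads per metre


def plate_capacitance(k, area_m2, gap_m):
    """Idealized parallel-plate capacitance: C = eps0 * k * A / d."""
    return EPSILON_0 * k * area_m2 / gap_m


def relative_reduction(k_old, k_new):
    """Fractional drop in capacitance when only the dielectric constant changes."""
    return 1.0 - k_new / k_old


# Hypothetical values for illustration; none of these come from the paper.
K_SOLID_SPACER = 7.0  # assumed nitride-like dielectric
K_AIR = 1.0           # air is close to vacuum
AREA_M2 = 1e-14       # assumed gate-to-contact overlap area
GAP_M = 1e-8          # assumed spacer thickness (10 nm)

c_solid = plate_capacitance(K_SOLID_SPACER, AREA_M2, GAP_M)
c_air = plate_capacitance(K_AIR, AREA_M2, GAP_M)
print(f"solid spacer: {c_solid:.3e} F, air spacer: {c_air:.3e} F")
print(f"idealized reduction: {relative_reduction(K_SOLID_SPACER, K_AIR):.1%}")
```

In a real device only part of the spacer can be replaced by air, and the gate-to-contact capacitance is one of several contributions to effective capacitance, so the achievable reduction is much smaller than this idealized ratio suggests.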

The paper introduces a new process to form air spacers and provides a practical approach to enabling an electronic device to consume less power while achieving better performance. Notably, introducing the new air spacer module into 7nm FinFET yields larger performance gains than the more costly and disruptive scaling of FinFET to 5nm. The researchers expect their work will help pave the way for the technology’s adoption in FinFET and nanosheet transistors in the coming years.

Paper authors: Kangguo Cheng, Chanro Park, Heng Wu, Juntao Li, Son Nguyen, Jingyun Zhang, Miaomiao Wang, Sanjay Mehta, Zuoguang Liu, Richard Conti, Nicolas Loubet, Julien Frougier, Andrew Greene, Tenko Yamashita, Bala Haran, Rama Divakaruni

AI Processor Core

The Digital AI Core with heterogeneous compute engines, featuring dual corelet architecture, shared L1 scratchpad, and memory neighbor interface.

A worldwide team of IBM researchers described a hardware demonstration of a processor core that can be applied to both AI training and inference applications in their paper, “A 3.0 TFLOPS 0.62V Scalable Processor Core for High Compute Utilization AI Training and Inference.” The researchers achieved leading-edge compute efficiency for robust AI computations via efficient heterogeneous 2-D systolic array-SIMD (single instruction, multiple data) compute engines leveraging compact DLFloat16 Floating Point Units (FPUs). DLFloat is a 16-bit floating point format designed by IBM for deep learning training and inference.
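
The article does not spell out the DLFloat16 bit layout. The sketch below assumes the layout usually given for the format: 1 sign bit, 6 exponent bits with a bias of 31, and 9 fraction bits, with no subnormal numbers. Those widths, the bias, and the function name `decode_dlfloat16` are assumptions for illustration. The sketch also ignores the format's special encodings for NaN and infinity, so treat it as a rough decoder rather than a reference implementation.

```python
EXP_BITS = 6    # assumed exponent width
FRAC_BITS = 9   # assumed fraction width
EXP_BIAS = 31   # assumed exponent bias


def decode_dlfloat16(bits):
    """Decode a 16-bit pattern under the assumed 1/6/9 layout (no NaN/inf handling)."""
    if not 0 <= bits <= 0xFFFF:
        raise ValueError("expected a 16-bit unsigned integer")
    sign = -1.0 if (bits >> (EXP_BITS + FRAC_BITS)) & 1 else 1.0
    exponent = (bits >> FRAC_BITS) & ((1 << EXP_BITS) - 1)
    fraction = bits & ((1 << FRAC_BITS) - 1)
    if exponent == 0 and fraction == 0:
        return sign * 0.0  # all-zero magnitude encodes zero
    # Every other pattern is treated as a normal number with an implicit leading 1.
    significand = 1.0 + fraction / float(1 << FRAC_BITS)
    return sign * significand * 2.0 ** (exponent - EXP_BIAS)


for pattern in (0x3E00, 0x4000, 0xBF00):
    print(f"0x{pattern:04X} -> {decode_dlfloat16(pattern)}")
```

Compared with IEEE half precision (5 exponent bits, 10 fraction bits), a 6-bit exponent trades one bit of precision for a wider dynamic range, which is the kind of trade-off a format designed for deep learning would be expected to make.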

For this study, the researchers optimized a Gen 1 core they first published in 2018, focusing on circuit design, architecture, and software enhancements to produce testchips with Gen 2 cores. This updated Gen 2 design features two corelets working in parallel and sharing memory to facilitate efficient computations. The resulting Gen 2 testchip achieved 5.5x power-efficiency improvements over their Gen 1 testchip for deep learning training and inference workflows while using a smaller supply voltage than their first-generation core. Each of the two corelets in the new design has 64 processing elements (each with multiple FPUs) that perform convolution and matrix multiplication operations, which account for more than 80 percent of the overall workload in deep learning.
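
As a rough picture of how a 2-D array of processing elements carries out matrix multiplication, the sketch below simulates a generic output-stationary systolic array in plain Python. It is not the dataflow of the IBM core, which this article does not describe. The function name and the skewed timing scheme are assumptions: processing element (i, j) keeps a running sum, and the operand pair with inner index k reaches it at cycle i + j + k.

```python
def systolic_matmul(a, b):
    """Multiply a (M x K) by b (K x N) by stepping a grid of PEs cycle by cycle."""
    m, k_dim, n = len(a), len(b), len(b[0])
    if any(len(row) != k_dim for row in a):
        raise ValueError("inner dimensions of a and b do not match")
    acc = [[0.0] * n for _ in range(m)]  # one accumulator per processing element
    total_cycles = m + n + k_dim - 2     # the last operands reach the far-corner PE
    for cycle in range(total_cycles):
        for i in range(m):
            for j in range(n):
                k = cycle - i - j        # operand index arriving at PE (i, j) now
                if 0 <= k < k_dim:
                    acc[i][j] += a[i][k] * b[k][j]  # one multiply-accumulate
    return acc


print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

Each processing element performs at most one multiply-accumulate per cycle and only ever sees operands passed in from its neighbours, which is why this style of array keeps many arithmetic units busy with little memory traffic.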

This advancement is part of the Digital AI Core accelerator research in the IBM Research AI Hardware Center. AI hardware accelerators can be used for building and deploying neural network models for applications such as speech recognition, natural language processing and computer vision. This latest chip focuses on 16-bit training and inference, but the researchers have also published progress toward 8-bit training and toward inference at as few as 2 bits.

Paper authors: Jinwook Oh, SaeKyu Lee, Mingu Kang, Matthew Ziegler, Joel Silberman, Ankur Agrawal, Swagath Venkataramani, Bruce Fleischer, Michael Guillorn, Jungwook Choi, Wei Wang, Silvia Mueller, Shimon Ben-Yehuda, James Bonanno, Nianzheng Cao, Robert Casatuta, Chia-Yu Chen, Matt Cohen, Ophir Erez, Thomas Fox, George Gristede, Howard Haynie, Vicktoria Ivanov, Siyu Koswatta, Shih-Hsien Lo, Martin Lutz, Gary Maier, Alex Mesh, Yevgeny Nustov, Scot Rider, Marcel Schaal, Michael Scheuermann, Xiao Sun, Naigang Wang, Fanchieh Yee, Ching Zhou, Vinay Shah, Brian Curran, Vijayalakshmi Srinivasan, Pong-Fei Lu, Sunil Shukla, Kailash Gopalakrishnan, Leland Chang

Silicon Photonics

The silicon photonics switch module.

In the paper, “A Monolithically Integrated Silicon Photonics 8×8 Switch in 90nm SOI CMOS,” IBM researchers from the U.S. and Canada presented a silicon photonics-based network switch integrated with switching and control electronics. Silicon photonics, an evolving technology in which optical rays transfer data between computer chips, provides an affordable way to build faster switches. Optical rays can carry far more data in less time than electrical conductors.

IBM researchers have created one of the best-performing high-speed photonic switches, closing the performance gap with packet switching, which the internet uses to send data together with information about where the data should be delivered. They have also simplified many of the problems that arise when building electronics and photonics on the same chip. Their goal is to include all of the necessary electronics in order to reduce the packaging load and make a switch that’s both easier to manufacture and more affordable to implement.

The new optical circuit switching technology enables switch reconfiguration times of less than 15 nanoseconds while avoiding the high power consumption of more conventional packet-based electronic switches, which require optical-to-electronic domain conversion. The technology uses a scalable process with simple flip chip packaging. Flip chip is a method for interconnecting integrated circuit chips, microelectromechanical systems, or other semiconductor components to external circuitry.
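
One rough way to see why reconfiguration time matters for circuit switching: while a switch reconfigures, the affected paths carry no data, so the useful fraction of time is hold / (hold + reconfiguration). The sketch below uses the 15-nanosecond bound quoted above. The hold times and the function name are hypothetical, and the model ignores everything else that affects a real network.

```python
def link_utilization(hold_s, reconfig_s):
    """Fraction of time a circuit carries data between reconfigurations."""
    if hold_s < 0 or reconfig_s < 0 or hold_s + reconfig_s == 0:
        raise ValueError("times must be non-negative and not both zero")
    return hold_s / (hold_s + reconfig_s)


RECONFIG_S = 15e-9  # upper bound on reconfiguration time quoted in the article

# Hypothetical hold times: how long a circuit stays up before the next change.
for hold_s in (100e-9, 1e-6, 10e-6):
    utilization = link_utilization(hold_s, RECONFIG_S)
    print(f"hold {hold_s * 1e9:8.0f} ns -> utilization {utilization:.3f}")
```

The shorter the reconfiguration time, the shorter the data bursts a circuit switch can serve before the overhead dominates.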

Paper authors: Jonathan E. Proesel, Nicolas Dupuis, Herschel Ainspan, Christian W. Baks, Fuad Doany, Nicolas Boyer, Elaine Cyr, Benjamin G. Lee

Additional Works

Other accepted VLSI papers from IBM and AI Hardware Center members, in addition to those above, include:

“Selective Enablement of Dual Dipoles for Near Bandedge Multi-Vt Solution in High Performance FinFET and Nanosheet Technologies,” R. Bao, K. Watanabe, J. Zhang, H. Zhou, M. Sankarapandian, J. Li, S. Pancharatnam, P. Jamison, R. G. Southwick, M. Wang, J. J. Demarest, J. Guo, N. Loubet, V. Basker, D. Guo, V. Narayanan, B. Haran, H. Bu, M. Khare

“Si Incorporation Into AsSeGe Chalcogenides for High Thermal Stability, High Endurance and Extremely Low Vth Drift 3D Stackable Cross-point Memory,” H. Y. Cheng, I. T. Kuo, W. C. Chien, C. W. Yeh, Y. C. Chou, N. Gong, L. Gignac, C. H. Yang, C. W. Cheng, C. Lavoie, M. Hopstaken, B. R. Bruce, L. Buzi, E. K. Lai, F. Carta, A. Ray, M. H. Lee, H. Y. Ho, W. Kim, M. BrightSky, H. L. Lung

“Structural and Electrical Demonstration of SiGe Cladded Channel for PMOS Stacked Nanosheet Gate-All-Around Devices,” S. Mochizuki, B. Colombeau, J. Zhang, S. C. Kung, M. Stolfi, H. Zhou, M. Breton, K. Watanabe, J. Li, H. Jagannathan, M. Cogorno, T. Mandrekar, P. Chen, N. Loubet, S. Natarajan, B. Haran

“Composite Interconnects for High-Performance Computing Beyond the 7nm Node,” P. Bhosale, S. Parikh, N. Lanzillo, T. Nogami, R. Tao, M. Gage, R. Shaviv, A. Simon, M. Stolfi, S. Reidy, N. Loubet, B. Haran

“A no-verification Multi-Level-Cell (MLC) operation in Cross-Point OTS-PCM,” N. Gong, W. Chien, Y. Chou, C. Yeh, N. Li, H. Cheng, C. Cheng, I. Kuo, C. Yang, R. Bruce, A. Ray, L. Gignac, Y. Lin, C. Miller, T. Perri, W. Kim, L. Buzi, H. Utomo, F. Carta, E. Lai, H. Ho, H. Lung, M. BrightSky

“A 25-50Gb/s 2.22pJ/b NRZ RX with Dual-Bank and 3-tap Speculative DFE for Microprocessor Application in 7nm FinFET CMOS,” Y. You, G. Wiedemeier, C. Marquart, C. Steffen, E. English, De. Yilma, T. Pham, V. Nammi, J. Okyere, N. Blanchard, A. Sutton, Z. Zhang, D. Friend, D. Barba, T. Bohlke, M. Spear, V. Raj, J. Crugnale, D. Dreps, P.A. Francese, M. Kossel, T. Morf

These advances are part of IBM’s systems research group, which includes initiatives focusing on hybrid cloud, AI hardware, and exploratory science.