ScaleCom: Scalable Sparsified Gradient Compression for Communication-Efficient Distributed Training. Chia-Yu Chen, Jiamin Ni, et al. NeurIPS 2020.
A 3.0 TFLOPS 0.62V Scalable Processor Core for High Compute Utilization AI Training and Inference. Jinwook Oh, Sae Kyu Lee, et al. VLSI Circuits 2020.
DyVEDeep: Dynamic Variable Effort Deep Neural Networks. Sanjay Ganapathy, Swagath Venkataramani, et al. ACM TECS, 2020.
Hybrid 8-bit floating point (HFP8) training and inference for deep neural networks. Xiao Sun, Jungwook Choi, et al. NeurIPS 2019.
Memory and Interconnect Optimizations for Peta-Scale Deep Learning Systems. Swagath Venkataramani, Vijayalakshmi Srinivasan, et al. HiPC 2019.
Performance-driven Programming of Multi-TFLOP Deep Learning Accelerators. Swagath Venkataramani, Jungwook Choi, et al. IISWC 2019.
DeepTools: Compiler and Execution Runtime Extensions for RaPiD AI Accelerator. Swagath Venkataramani, Jungwook Choi, et al. IEEE Micro, 2019.
Dynamic Spike Bundling for Energy-Efficient Spiking Neural Networks. Sarada Krithivasan, Sanchari Sen, et al. ISLPED 2019.
BiScaled-DNN: Quantizing long-tailed datastructures with two scale factors for deep neural networks. Shubham Jain, Swagath Venkataramani, et al. DAC 2019.
Low-overhead Error Prediction and Preemption in Deep Neural Network Using Apriori Network Statistics. US11016840, granted 24 May 2021.
Programmable Data Delivery by Load and Store Agents on a Processing Chip Interfacing with On-chip Memory Components and Directing Data to External Memory Components. US10838868, granted 16 Nov 2020.
Processor and Memory Transparent Convolutional Lowering and Auto Zero Padding for Deep Neural Network Implementations. US10565285, granted 17 Feb 2020.
Mori Ohara: Deputy Director, IBM Research Tokyo; Distinguished Engineer; Chief SW Engineer for Hybrid Cloud on IBM HW.