4-bit quantization of LSTM-based speech recognition models. Andrea Fasoli, Chia-Yu Chen, et al. INTERSPEECH 2021.
Efficacy of Pruning in Ultra-Low Precision DNNs. Sanchari Sen, Swagath Venkataramani, et al. ISLPED 2021.
RaPiD: AI Accelerator for Ultra-Low Precision Training and Inference. Swagath Venkataramani, Vijayalakshmi Srinivasan, et al. ISCA 2021.
Efficient Management of Scratch-Pad Memories in Deep Learning Accelerators. Subhankar Pal, Swagath Venkataramani, et al. ISPASS 2021.
A 7nm 4-Core AI Chip with 25.6TFLOPS Hybrid FP8 Training, 102.4TOPS INT4 Inference and Workload-Aware Throttling. Ankur Agrawal, Saekyu Lee, et al. ISSCC 2021.
Value Similarity Extensions for Approximate Computing in General-Purpose Processors. Younghoon Kim, Swagath Venkataramani, et al. DATE 2021.
ScaleCom: Scalable Sparsified Gradient Compression for Communication-Efficient Distributed Training. Chia-Yu Chen, Jiamin Ni, et al. NeurIPS 2020.
Efficient AI System Design with Cross-Layer Approximate Computing. Swagath Venkataramani, Xiao Sun, et al. Proceedings of the IEEE, 2020.
A 3.0 TFLOPS 0.62V Scalable Processor Core for High Compute Utilization AI Training and Inference. Jinwook Oh, Sae Kyu Lee, et al. VLSI Circuits 2020.
06 Nov 2023, US11810340: System And Method For Consensus-based Representation And Error Checking For Neural Networks
11 May 2023, CNZL202010150294.1: Programmable Data Delivery To A System Of Shared Processing Elements With Shared Memory
09 Jan 2023, US11551054: System-aware Selective Quantization For Performance Optimized Distributed Deep Learning
Kaoutar El Maghraoui, Principal Research Scientist and Manager, AIU Spyre Model Enablement, AI Hardware Center