4-bit quantization of LSTM-based speech recognition modelsAndrea FasoliChia-Yu Chenet al.2021INTERSPEECH 2021
Efficacy of Pruning in Ultra-Low Precision DNNsSanchari SenSwagath Venkataramaniet al.2021ISLPED 2021
RaPiD: AI Accelerator for Ultra-Low Precision Training and InferenceSwagath VenkataramaniVijayalakshmi Srinivasanet al.2021ISCA 2021
Efficient Management of Scratch-Pad Memories in Deep Learning AcceleratorsSubhankar PalSwagath Venkataramaniet al.2021ISPASS 2021
A 7nm 4-Core AI Chip with 25.6TFLOPS Hybrid FP8 Training, 102.4TOPS INT4 Inference and Workload-Aware ThrottlingAnkur AgrawalSaekyu Leeet al.2021ISSCC 2021
Value Similarity Extensions for Approximate Computing in General-Purpose ProcessorsYounghoon KimSwagath Venkataramaniet al.2021DATE 2021
ScaleCom: Scalable Sparsified Gradient Compression for Communication-Efficient Distributed Training Chia-Yu ChenJiamin Niet al.2020NeurIPS 2020
Efficient AI System Design with Cross-Layer Approximate ComputingSwagath VenkataramaniXiao Sunet al.2020Proceedings of the IEEE
A 3.0 TFLOPS 0.62V Scalable Processor Core for High Compute Utilization AI Training and InferenceJinwook OhSae Kyu Leeet al.2020VLSI Circuits 2020
11 Nov 2024US12141513Method To Map Convolutional Layers Of Deep Neural Network On A Plurality Of Processing Elements With Simd Execution Units, Private Memories, And Connected As A 2d Systolic Processor Array
30 Apr 2024TWI840790Single Function To Perform Combined Matrix Multiplication And Bias Add Operations
21 Apr 2024JP7477249System-aware Selective Quantization For Performance Optimized Distributed Deep Learning
27 Nov 2023US11831467Programmable Multicast Protocol For Ring-topology Based Artificial Intelligence Systems
KEKaoutar El MaghraouiPrincipal Research Scientist and Manager, AIU Spyre Model Enablement, AI Hardware Center