George Saon

Title

Speech strategy lead, distinguished research scientist

Bio

George Saon received his M.Sc. and Ph.D. degrees in Computer Science from Henri Poincaré University in Nancy, France, in 1994 and 1997, respectively. In 1995, he obtained his engineering diploma from the Polytechnic University of Bucharest, Romania. From 1994 to 1998, he worked on two-dimensional stochastic models for off-line handwriting recognition at the Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA). Since 1998, Dr. Saon has been with the IBM T.J. Watson Research Center, where he has worked on a variety of problems spanning several areas of large vocabulary continuous speech recognition, such as discriminative feature processing, acoustic modeling, speaker adaptation, and large vocabulary decoding algorithms. Several of the techniques he co-invented are well known in the speech community, including heteroscedastic discriminant analysis (HDA), lattice-MLLR, fast FSM-based Viterbi decoding, i-vector speaker adaptation for DNNs, and joint CNN/DNN training. Since 2001, Dr. Saon has been a key member of IBM's speech recognition team, which participated in several U.S. government-sponsored evaluations for the EARS, SPINE, GALE, RATS and BOLT programs. He has published over 150 conference and journal papers and holds several patents in the field of ASR. He is the recipient of three best paper awards (EARS RT'04, INTERSPEECH 2010, ASRU 2011) and has served as an elected member of the IEEE Speech and Language Technical Committee.

Publications

Exploring the Limits of Conformer CTC-Encoder for Speech Emotion Recognition using Large Language Models
- Edmilson Da Silva Morais, Hagai Aronowitz, et al.
- 2025
- INTERSPEECH 2025

LLM based Text Generation for Improved Low-resource Speech Recognition Models
- Tohru Nagano, Gakuto Kurata, et al.
- 2025
- ICASSP 2025

Knowledge Distillation Based Training of Unified Conformer CTC Models for Multi-form ASR
- Takashi Fukuda, Gakuto Kurata, et al.
- 2025
- ICASSP 2025

A Non-autoregressive Model for Joint STT and TTS
- Vishal Sunder, Brian Kingsbury, et al.
- 2025
- ICASSP 2025

Exploring the limits of decoder-only models trained on public speech recognition corpora
- Ankit Gupta, George Saon, et al.
- 2024
- INTERSPEECH 2024

Multiple Representation Transfer from Large Language Models to End-to-End ASR Systems
- Takuma Udagawa, Masayuki Suzuki, et al.
- 2024
- ICASSP 2024

Semi-Autoregressive Streaming ASR With Label Context
- Siddhant Arora, George Saon, et al.
- 2024
- ICASSP 2024

Diagonal State Space Augmented Transformers for Speech Recognition
- George Saon, Ankit Gupta, et al.
- 2023
- ICASSP 2023

Towards Reducing the Need for Speech Training Data To Build Spoken Language Understanding Systems
- Samuel Thomas, Jeff Kuo, et al.
- 2022
- ICASSP 2022

Improving End-to-End Models for Set Prediction in Spoken Language Understanding
- Jeff Kuo, Zoltan Tuske, et al.
- 2022
- ICASSP 2022

Blog posts

IBM Granite model tops Hugging Face speech recognition leaderboard
News
Mike Murphy
16 Jun 2025
- AI

Patents

Accuracy Of Streaming RNN Transducer

Accuracy Of Streaming RNN Transducer

Soft-forgetting For Connectionist Temporal Classification Based Automatic Speech Recognition

Reducing Exposure Bias In Machine Learning Training Of Sequence-to-sequence Transducers

Integrating Dialog History Into End-to-end Spoken Language Understanding Systems

Soft-forgetting For Connectionist Temporal Classification Based Automatic Speech Recognition

Training End-to-end Spoken Language Understanding Systems With Unordered Entities

Soft-forgetting For Connectionist Temporal Classification Based Automatic Speech Recognition

Chunking And Overlap Decoding Strategy For Streaming RNN Transducers For Speech Recognition

Customization Of Recurrent Neural Network Transducers For Speech Recognition

Top collaborators

Brian Kingsbury

Samuel Thomas

Gakuto Kurata

Takashi Fukuda