George Saon

Title

Speech strategy lead, distinguished research scientist

Bio

George Saon received his M.Sc. and PhD degrees in Computer Science from Henri Poincare University in Nancy, France, in 1994 and 1997, respectively. In 1995, he obtained his engineering diploma from the Polytechnic University of Bucharest, Romania. From 1994 to 1998, he worked on two-dimensional stochastic models for off-line handwriting recognition at the Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA). Since 1998, Dr. Saon has been with the IBM T.J. Watson Research Center, where he has worked on a variety of problems spanning several areas of large-vocabulary continuous speech recognition, such as discriminative feature processing, acoustic modeling, speaker adaptation, and large-vocabulary decoding algorithms. Several of the techniques he co-invented are well known in the speech community, including heteroscedastic discriminant analysis (HDA), lattice-MLLR, fast FSM-based Viterbi decoding, i-vector speaker adaptation for DNNs, and joint CNN/DNN training. Since 2001, Dr. Saon has been a key member of IBM's speech recognition team, which participated in several U.S. government-sponsored evaluations for the EARS, SPINE, GALE, RATS, and BOLT programs. He has published over 150 conference and journal papers and holds several patents in the field of ASR. He is the recipient of three best paper awards (EARS RT'04, INTERSPEECH 2010, ASRU 2011) and has served as an elected member of the IEEE Speech and Language Technical Committee.

Publications

Exploring the Limits of Conformer CTC-Encoder for Speech Emotion Recognition using Large Language Models
- - Edmilson Da Silva Morais
  - Hagai Aronowitz
  - et al.
- 2025
- INTERSPEECH 2025
Exploring the limits of decoder-only models trained on public speech recognition corpora
- - Ankit Gupta
  - George Saon
  - et al.
- 2024
- INTERSPEECH 2024
Semi-Autoregressive Streaming ASR With Label Context
- - Siddhant Arora
  - George Saon
  - et al.
- 2024
- ICASSP 2024
Multiple Representation Transfer from Large Language Models to End-to-End ASR Systems
- - Takuma Udagawa
  - Masayuki Suzuki
  - et al.
- 2024
- ICASSP 2024
Diagonal State Space Augmented Transformers for Speech Recognition
- - George Saon
  - Ankit Gupta
  - et al.
- 2023
- ICASSP 2023
Towards Reducing the Need for Speech Training Data To Build Spoken Language Understanding Systems
- - Samuel Thomas
  - Jeff Kuo
  - et al.
- 2022
- ICASSP 2022
Improving End-to-End Models for Set Prediction in Spoken Language Understanding
- - Jeff Kuo
  - Zoltan Tuske
  - et al.
- 2022
- ICASSP 2022
Towards efficient end-to-end speech recognition with biologically-inspired neural networks
- - Thomas Bohnstingl
  - Ayush Garg
  - et al.
- 2021
- NeurIPS 2021

Blog posts

IBM Granite model tops Hugging Face speech recognition leaderboard
News
Mike Murphy
16 Jun 2025
- AI

Patents

Soft-forgetting For Connectionist Temporal Classification Based Automatic Speech Recognition

Chunking And Overlap Decoding Strategy For Streaming RNN Transducers For Speech Recognition

Integrating Text Inputs For Training And Adapting Neural Network Transducer ASR Models

Customization Of Recurrent Neural Network Transducers For Speech Recognition

Integrating Text Inputs For Training And Adapting Neural Network Transducer ASR Models

Accuracy Of Streaming RNN Transducer

Fast - Soft-forgetting For Connectionist Temporal Classification Based Automatic Speech Recognition

Multiplicative Integration In Neural Network Transducer Models For End-to-end Speech Recognition

Soft-forgetting For Connectionist Temporal Classification Based Automatic Speech Recognition

Vocal Recognition Using Generally Available Speech-to-text Systems And User-defined Vocal Training

Top collaborators

Brian Kingsbury

Samuel Thomas

Xiaodong Cui

Thomas Ortner