Fine-Grained Textual Knowledge Transfer to Improve RNN Transducers for Speech Recognition and Understanding. Vishal Sunder, Samuel Thomas, et al. ICASSP 2023.
Global RNN Transducer Models For Multi-dialect Speech Recognition. Takashi Fukuda, Samuel Thomas, et al. INTERSPEECH 2022.
Extending RNN-T-based speech recognition systems with emotion and language classification. Zvi Kons, Hagai Aronowitz, et al. INTERSPEECH 2022.
Tokenwise Contrastive Pretraining for Finer Speech-to-BERT Alignment in End-to-End Speech-to-Intent Systems. Vishal Sunder, Eric Fosler-Lussier, et al. INTERSPEECH 2022.
Everything at Once - Multi-modal Fusion Transformer for Video Retrieval. Nina Shvetsova, Brian Chen, et al. CVPR 2022.
Towards Reducing the Need for Speech Training Data To Build Spoken Language Understanding Systems. Samuel Thomas, Jeff Kuo, et al. ICASSP 2022.
A new data augmentation method for intent classification enhancement and its application on spoken conversation datasets. Zvi Kons, Aharon Satt, et al. ICASSP 2022.
Integrating Text Inputs For Training and Adapting RNN Transducer ASR Models. Samuel Thomas, Brian Kingsbury, et al. ICASSP 2022.
Towards End-to-end Integration of Dialog History For Improved Spoken Language Understanding. Vishal Sunder, Samuel Thomas, et al. ICASSP 2022.
Improving End-to-End Models for Set Prediction in Spoken Language Understanding. Jeff Kuo, Zoltan Tuske, et al. ICASSP 2022.