Comparison of Multilingual Self-Supervised and Weakly-Supervised Speech Pre-Training for Adaptation to Unseen Languages. Andrew Rouditchenko, Sameer Khurana, et al. 2023. INTERSPEECH 2023.
Resource-Efficient and Cross-Modal Learning Toward Foundation Models. Pin-Yu Chen, Chao-han Huck Yang, et al. 2023. INTERSPEECH 2023.
Smartwatch-derived Acoustic Markers for Deficits in Cognitively Relevant Everyday Functioning. Yasunori Yamada, Kaoru Shinkawa, et al. 2023. ICDH 2023.
Remote Inference of Cognitive Scores in ALS Patients Using a Picture Description. Carla Agurto Rios, Guillermo Cecchi, et al. 2023. ICDH 2023.
Effective Training of RNN Transducer Models on Diverse Sources of Speech and Text Data. Takashi Fukuda, Samuel Thomas. 2023. ICASSP 2023.
Multi-Speaker Data Augmentation for Improved end-to-end Automatic Speech Recognition. Samuel Thomas, Hong-Kwang J. Kuo, et al. 2023. ICASSP 2023.
Low-Resource Music Genre Classification with Cross-Modal Neural Model Reprogramming. Yun-ning Hung, Chao-han Huck Yang, et al. 2023. ICASSP 2023.
Fine-Grained Textual Knowledge Transfer to Improve RNN Transducers for Speech Recognition and Understanding. Vishal Sunder, Samuel Thomas, et al. 2023. ICASSP 2023.
C2KD: Cross-Lingual Cross-Modal Knowledge Distillation for Multilingual Text-Video Retrieval. Andrew Rouditchenko, Yung-Sung Chuang, et al. 2023. ICASSP 2023.
Diagonal State Space Augmented Transformers for Speech Recognition. George Saon, Ankit Gupta, et al. 2023. ICASSP 2023.