Speech Technologies

We specialize in speech and multimodal AI for interaction, analytics, media, and security applications. Through advanced research and development, we create AI algorithms, technology components, solutions, and services that enhance the experience and capabilities offered to enterprises, mobile users, and application and content developers.

Publications

Authors Title Conference/Journal Year Link
Zvi Kons, Aharon Satt, Hong-Kwang Kuo, Samuel Thomas, Boaz Carmeli, Ron Hoory, Brian Kingsbury A New Data Augmentation Method for Intent Classification Enhancement and its Application on Spoken Conversation Datasets ICASSP 2022 2022 Link
Hagai Aronowitz, Itai Gat, Edmilson Morais, Weizhong Zhu, Ron Hoory Towards a Common Speech Analysis Engine ICASSP 2022 2022 Link
Itai Gat, Weizhong Zhu, Edmilson Morais, Ron Hoory, Hagai Aronowitz Speaker Normalization for Self-Supervised Speech Emotion Recognition ICASSP 2022 2022 Link
Edmilson Morais, Ron Hoory, Weizhong Zhu, Itai Gat, Matheus Damasceno and Hagai Aronowitz Speech Emotion Recognition Using Self-Supervised Features ICASSP 2022 2022 Link
A. Ben-David, S. Shechtman Acquiring conversational speaking style from multi-speaker spontaneous dialog corpus for prosody-controllable sequence-to-sequence speech synthesis SSW 2021 2021 Link
S. Shechtman, R. Fernandez, A. Sorin, D. Haws Synthesis of expressive speaking styles with limited training data in a multi-speaker, prosody-controllable sequence-to-sequence architecture Interspeech 2021 2021 Link
S. Shechtman, D. Haws, R. Fernandez Stable Checkpoint Selection and Evaluation in Sequence-to-Sequence Speech Synthesis ICASSP 2021 2021 Link
Samuel Thomas, Jeff Kuo, George Saon, Zoltan Tuske, Gakuto Kurata, Zvi Kons, Ron Hoory, Brian Kingsbury RNN Transducer Models for Spoken Language Understanding ICASSP 2021 2021 Link
S. Shechtman, R. Fernandez, D. Haws Supervised and Unsupervised Approaches for Controlling Narrow Lexical Focus in Sequence-to-Sequence Speech Synthesis SLT 2021 2021  
H. Aronowitz, W. Zhu, M. Suzuki, G. Kurata, R. Hoory New Advances in Speaker Diarization Interspeech 2020 2020 Link
S. Rozenberg, H. Aronowitz, R. Hoory Siamese X-Vector Reconstruction for Domain Adapted Speaker Recognition Interspeech 2020 2020 Link
A. Sorin, S. Shechtman, R. Hoory Principal Style Components: Expressive Style Control and Cross-Speaker Transfer in Neural TTS Interspeech 2020 2020 Link
H. Kuo, Z. Tuske, S. Thomas, Y. Huang, K. Audhkhasi, B. Kingsbury, G. Kurata, Z. Kons, R. Hoory, L. Lastras End-to-End Spoken Language Understanding without Full Transcripts Interspeech 2020 2020 Link
Y. Huang, H. Kuo, S. Thomas, Z. Kons, K. Audhkhasi, B. Kingsbury, R. Hoory, M. Picheny Leveraging Unpaired Text Data for Training End-to-End Speech-to-Intent Systems ICASSP 2020 2020 Link
H. Aronowitz, W. Zhu Context and Uncertainty Modeling for Online Speaker Change Detection ICASSP 2020 2020 Link
Z. Kons, S. Shechtman, A. Sorin, C. Rabinovitz, R. Hoory High quality, lightweight and adaptable TTS using LPCNet Interspeech 2019 2019 Link
S. Shechtman, A. Sorin Sequence to Sequence Neural Speech Synthesis with Prosody Modification Capabilities SSW 2019 2019 Link
Z. Kons, S. Shechtman, A. Sorin, R. Hoory, C. Rabinovitz, E. Da Silva Morais Neural TTS Voice Conversion SLT 2018 2018 Link
Y. Mass, S. Shechtman, M. Mordechai, R. Hoory, O. Sar Shalom, G. Lev, D. Konopnicki Word Emphasis Prediction for Expressive Text to Speech Interspeech 2018 2018 Link
A. Sorin, S. Shechtman, Z. Kons, R. Hoory, S. Ben-David, J. Pavitt, S. Rozenberg, C. Rabinovitz, T. Drory The IBM Virtual Voice Creator Interspeech 2018 (Show & Tell demo) 2018 Link
A. Aides, D. Dov, H. Aronowitz Robust Audiovisual Liveness Detection for Biometric Authentication Using Deep Joint Embedding and Dynamic Time Warping ICASSP 2018 2018 Link
S. Shechtman, M. Mordechay Emphatic Speech Prosody Prediction with Deep LSTM Networks ICASSP 2018 2018 Link
K. A. Lee, V. Hautamäki, T. Kinnunen, A. Larcher, C. Zhang, A. Nautsch, T. Stafylakis, G. Liu, M. Rouvier, W. Rao, F. Alegre, J. Ma, M. W. Mak, A. K. Sarkar, H. Delgado, R. Saeidi, H. Aronowitz, et al. The I4U Mega Fusion and Collaboration for NIST Speaker Recognition Evaluation 2016 Interspeech 2017 2017 Link
A. Sorin, S. Shechtman, A. Rendel Semi Parametric Concatenative TTS with Instant Voice Modification Capabilities Interspeech 2017 2017 Link
A. Satt, S. Rozenberg, R. Hoory Efficient Emotion Recognition from Speech Using Deep Learning on Spectrograms Interspeech 2017 2017 Link
A. Rendel, R. Fernandez, Z. Kons, A. Rosenberg, R. Hoory, B. Ramabhadran Weakly-Supervised Phrase Assignment from Text in a Speech-Synthesis System Using Noisy Labels Interspeech 2017 2017 Link
R. Fernandez, A. Rosenberg, A. Sorin, B. Ramabhadran, R. Hoory Voice-Transformation-Based Data Augmentation for Prosodic Classification ICASSP 2017 2017 Link
H. Aronowitz Speaker Recognition using Common Passphrases in RedDots ICASSP 2017 2017 Link
H. Aronowitz Inter Dataset Variability Modelling for Speaker Recognition ICASSP 2017 2017 Link
A. Aides, H. Aronowitz Text-Dependent Audiovisual Synchrony Detection for Spoofing Detection in Mobile Person Recognition Interspeech, 2016 2016 Link
S. Shechtman, A. Sorin Wideband Harmonic Model: Alignment and Noise Modeling for High Quality Speech Synthesis 9th ISCA Speech Synthesis Workshop, 2016 2016 Link
Y. Solewicz, H. Aronowitz, T. Becker Reducing Noise Bias in the i-Vector Space for Speaker Recognition Speaker Odyssey, 2016 2016 Link
A. Rendel, R. Fernandez, R. Hoory, B. Ramabhadran Using Continuous Lexical Embeddings to Improve Symbolic-Prosody Prediction in a Text-to-Speech Front-End ICASSP, 2016 2016 Link
H. Aronowitz Speaker Recognition using Matched Filters ICASSP, 2016 2016 Link
O. Plchot, L. Burget, H. Aronowitz, P. Matějka Audio Enhancing with DNN Autoencoders for Speaker Recognition ICASSP, 2016 2016 Link
R. Fernandez, A. Rendel, B. Ramabhadran, R. Hoory Using Deep Bidirectional Recurrent Neural Networks for Prosodic-Target Prediction in a Unit-Selection Text-to-Speech System Interspeech, 2015 2015 Link
H. Aronowitz Exploiting Supervector Structure for Speaker Recognition Trained on a Small Development Set Interspeech, 2015 2015 Link
H. Aronowitz Score Stabilization for Speaker Recognition Trained on a Small Development Set Interspeech, 2015 2015  
K. A. Lee, A. Larcher, G. Wang, P. Kenny, N. Brummer, D. van Leeuwen, H. Aronowitz, M. Kockmann, C. Vaquero, B. Ma, H. Li, T. Stafylakis, J. Alam, A. Swart, J. Perez The RedDots Data Collection for Speaker Recognition Interspeech, 2015 2015 Link
A. Sorin, S. Shechtman, V. Pollet Coherent Modification of Pitch and Energy for Expressive Prosody Implantation ICASSP, 2015 2015 Link
H. Aronowitz, M. Li, O. Toledo-Ronen, S. Harary, A. Geva, S. Ben-David, A. Rendel, R. Hoory, N. Ratha, S. Pankanti, D. Nahamoo Multi-Modal Biometrics for Mobile Authentication IJCB, 2014 2014 Link
Aharon Satt, Ron Hoory, Alexandra König, Pauline Aalten, Philippe H Robert Speech-based automatic and robust detection of very early dementia Interspeech, 2014 2014 Link
E. Bozkurt, O. Toledo-Ronen, A. Sorin, R. Hoory Exploring Modulation Spectrum Features for Speech-Based Depression Level Classification Interspeech, 2014 2014 Link
R. Fernandez, A. Rendel, B. Ramabhadran, R. Hoory Prosody Contour Prediction with Long Short-Term Memory, Bi-Directional, Deep Recurrent Neural Networks Interspeech, 2014 2014 Link
H. Aronowitz, A. Rendel Domain Adaptation for Text Dependent Speaker Verification Interspeech, 2014 2014 Link
A. Sorin, S. Shechtman, V. Pollet Refined Inter-segment Joining in Multi-Form Speech Synthesis Interspeech, 2014 2014 Link
H. Aronowitz Compensating Inter-Dataset Variability in PLDA Hyper-Parameters for Robust Speaker Recognition Speaker Odyssey, 2014 2014 Link
H. Aronowitz Inter Dataset Variability Compensation for Speaker Recognition ICASSP, 2014 2014  
J. Cui, J. Mamou, B. Kingsbury, B. Ramabhadran Automatic Keyword Selection for Keyword Search Development and Tuning ICASSP, 2014 2014 Link
E. Vaiciukynas, A. Verikas, A. Gelzinis, M. Bacauskiene, Z. Kons, A. Satt, R. Hoory Fusion of voice signal information for detection of mild laryngeal pathology Appl. Soft Comput. (ASC) 18:91-103 (2014) 2014 Link
O. Barkan, J. Weill, L. Wolf and H. Aronowitz Fast high dimensional vector multiplication based face recognition ICCV, 2013 2013 Link
H. Aronowitz, O. Barkan On Leveraging Conversational Data for Building a Text Dependent Speaker Verification System Interspeech, 2013 2013
A. Satt, A. Sorin, O. Toledo-Ronen, O. Barkan, I. Kompatsiaris, A. Kokonozi, M. Tsolaki Evaluation of speech-based protocol for detection of early-stage dementia INTERSPEECH 2013:1692-1696, 2013 2013  
Z. Kons, H. Aronowitz Voice transformation-based spoofing of text-dependent speaker verification systems INTERSPEECH 2013:945-949, 2013 2013  
Z. Kons, O. Toledo-Ronen Audio event classification using deep neural networks INTERSPEECH 2013:1482-1486, 2013 2013  
O. Barkan, H. Aronowitz Diffusion Maps for PLDA-based Speaker Verification ICASSP, 2013 2013  
O. Toledo-Ronen, A. Sorin Voice-based sadness and anger recognition with cross-corpora evaluation ICASSP 2013:7517-7521, 2013 2013  
J. Cui, X. Cui, B. Ramabhadran, J. Kim, B. Kingsbury, J. Mamou, L. Mangu, M. Picheny, T. N. Sainath, A. Sethy Developing speech recognition systems for corpus indexing under the IARPA Babel program ICASSP 2013:6753-6757, 2013 2013  
J. Mamou, J. Cui, X. Cui, M. J. F. Gales, B. Kingsbury, K. Knill, L. Mangu, D. Nolden, M. Picheny, B. Ramabhadran, R. Schluter, A. Sethy, P. C. Woodland System combination and score normalization for spoken term detection ICASSP 2013:8272-8276, 2013 2013  
B. Kingsbury, J. Cui, X. Cui, M. J. F. Gales, K. Knill, J. Mamou, L. Mangu, D. Nolden, M. Picheny, B. Ramabhadran, R. Schluter, A. Sethy, P. C. Woodland A high-performance Cantonese keyword search system ICASSP 2013:8277-8281, 2013 2013  
R. Fernandez, A. Rendel, B. Ramabhadran, R. Hoory F0 contour prediction with a deep belief network-Gaussian process hybrid model ICASSP 2013:6885-6889, 2013 2013  
S. Shechtman Transient modeling for overlap-add sinusoidal model of speech ICASSP 2013:8189-8192, 2013 2013  
M. Saraclar, A. Sethy, B. Ramabhadran, L. Mangu, J. Cui, X. Cui, B. Kingsbury, J. Mamou An empirical study of confusion modeling in keyword search for low resource languages ASRU 2013:464-469, 2013 2013  
B. Bhana, S.V. Flowerday, A. Satt Using Participatory Crowdsourcing in South Africa to Create a Safer Living Environment International Journal of Distributed Sensor Networks, 2013 2013  
R. Piderit, S. Flowerday, and A. Satt Identifying Barriers to Citizen Participation in Public Safety Crowdsourcing in East London Joint International Conference on Engineering Education and Research and International Conference on Information Technology, Cape Town, South Africa, 2013 2013  
L. Cilliers, S. Flowerday, and A. Satt Can Information Security Produce Trust in a Public Safety Smart City Project? Joint International Conference on Engineering Education and Research and International Conference on Information Technology, Cape Town, South Africa, 2013 2013  
J. Mamou, J. Cui, X. Cui, M. J. F. Gales, B. Kingsbury, K. Knill, L. Mangu, D. Nolden, M. Picheny, B. Ramabhadran, R. Schluter, A. Sethy, P. C. Woodland Developing Keyword Search under the IARPA Babel Program Afeka Speech Processing Conference, 2013 2013
O. Toledo-Ronen, A. Sorin Emotion Detection for Dementia Patients Monitoring Afeka Speech Processing Conference, 2013 2013  
A. Satt, A. Sorin, O. Toledo-Ronen Vocal Biomarkers for Dementia Patient Monitoring Afeka Speech Processing Conference, 2013 2013  
O. Barkan, H. Aronowitz Non-linear i-Vector Extraction for Speaker Recognition Afeka Speech Processing Conference, 2013 2013  
H. Aronowitz, Y. Solewicz, O. Toledo-Ronen Online Two Speaker Diarization Speaker Odyssey, 2012 2012  
H. Aronowitz Text Dependent Speaker Verification Using a Small Development Set Speaker Odyssey, 2012 2012  
O. Toledo-Ronen, H. Aronowitz Confidence for Speaker Diarization using PCA Spectral Ratio INTERSPEECH 2012 2012  
A. Sorin, S. Shechtman, V. Pollet Psychoacoustic Segment Scoring for Multi-Form Speech Synthesis INTERSPEECH 2012 2012  
H. Aronowitz, O. Barkan Efficient approximated i-vector extraction ICASSP, 2012 2012  
A. Rendel, A. Sorin, R. Hoory, A. Breen Towards automatic phonetic segmentation for TTS ICASSP 2012:4533-4536, 2012 2012  
H. Roitman, J. Mamou, S. Mehta, A. Satt, LV Subramaniam Harnessing the crowds for smart city sensing Proceedings of the 1st international workshop on Multimodal crowd sensing, pp. 17-18, 2012 2012
T. Shoham, D. Malah, S. Shechtman Quality Preserving Compression of a Concatenative Text-To-Speech Acoustic Database IEEE Transactions on Audio, Speech & Language Processing (TASLP) 20(3):1056-1068 (2012) 2012  
S. Trewin, C. Swart, L. Koved, J. Martino, K. Singh, S. Ben-David Biometric authentication on a mobile device: a study of user effort, error and task disruption ACSAC 2012:159-168, 2012 2012  
H. Aronowitz Voice Biometrics for User Authentication Afeka Speech Processing Conference, 2012 2012  
J. Mamou, B. Ramabhadran, A. Sethy New Developments in Spoken Query Transcription Afeka Speech Processing Conference, 2012 2012  
H. Aronowitz, R. Hoory, J. Pelecanos, D. Nahamoo New Developments in Voice Biometrics for User Authentication Interspeech, 2011 2011
H. Aronowitz Speaker Diarization using A Priori Acoustic Information Interspeech, 2011 2011  
Y. Solewicz, H. Aronowitz Implicit Segmentation in Two-Wire Speaker Recognition Interspeech, 2011 2011  
H. Aronowitz, O. Barkan New Developments in Joint Factor Analysis for Speaker Verification Interspeech, 2011 2011  
O. Toledo-Ronen, H. Aronowitz, R. Hoory, J. W. Pelecanos, D. Nahamoo Towards Goat Detection in Text-Dependent Speaker Verification INTERSPEECH 2011:9-12, 2011 2011  
J. Mamou, A. Sethy, B. Ramabhadran, R. Hoory, P. Vozila Improved Spoken Query Transcription Using Co-Occurrence Information INTERSPEECH 2011:1473-1476, 2011 2011  
A. Sorin, S. Shechtman, V. Pollet Uniform Speech Parameterization for Multi-Form Segment Synthesis INTERSPEECH 2011:337-340, 2011 2011  
A. Sorin, H. Aronowitz, J. Mamou, O. Toledo-Ronen, R. Hoory, M. Kuritzky, Y. Erez, B. Ramabhadran, A. Sethy Speech processing and retrieval in a personal memory aid system for the elderly ICASSP 2011:1749-1752, 2011 2011  
O. Toledo-Ronen, H. Aronowitz Detecting Goats in Speaker Verification Systems Afeka Speech Processing Conference, June 2011 2011  
S. Tiomkin, D. Malah, Z. Kons and S. Shechtman A Hybrid Text-to-Speech System that Combines Concatenative and Statistical Synthesis Units IEEE Transactions on Audio, Speech and Language Processing, 2010 2010
S. Tiomkin, D. Malah and S. Shechtman Statistical Text-To-Speech Synthesis based on Segment-wise Representation with a Norm Constraint IEEE Transactions on Audio, Speech and Language Processing, July 2010 2010
S. Shechtman and A. Sorin Sinusoidal model parameterization for HMM-based TTS system Interspeech, 2010, Makuhari, Japan 2010  
H. Aronowitz Unsupervised Compensation of Intra-Session Intra-Speaker Variability for Speaker Diarization Odyssey 2010, Brno, Czech Republic 2010  
H. Aronowitz and V. Aronowitz Efficient score normalization for speaker recognition ICASSP 2010, Dallas, USA 2010  
T. Shoham, D. Malah and S. Shechtman Footprint Reduction of Concatenative Text-To-Speech Synthesizers using Polynomial Temporal Decomposition ISCCSP 2010, Limassol, Cyprus 2010  
A. Sorin and R. Hoory Automatic Speech Transcription in AAL Solutions AMI Workshop 2009, Salzburg, Austria 2009  
Y. Solewicz, H. Aronowitz Two-Wire Nuisance Attribute Projection Interspeech 2009, Brighton, UK 2009  
J. Huerta, C. Wu, A. Sakrajda, S. Caskey, E. Jan, A. Faisman, S. Ben-David, W. Liu, U. Stewart, M. Frissora, D. Lubensky, A. Lee RTTS: Towards Enterprise-level Real-Time Speech Transcription and Translation Services Interspeech 2009, Brighton, UK 2009
A. Kaplan, J. Mamou, F. Gallo, and B. Sznajder Multimedia Feature Extraction in the SAPIR Project UIMA Workshop at GSCL 2009, Potsdam, Germany 2009  
J. Mamou, Y. Mass, M. Shmueli-Scheuer, B. Sznajder A Unified Inverted Index for an efficient Image and Text Retrieval SIGIR, 2009, Boston, USA 2009  
B. Ramabhadran, A. Sethy, J. Mamou, B. Kingsbury, U. Chaudhari Fast Decoding for Open Vocabulary Spoken Term Detection NAACL-HLT, 2009, Boulder, USA 2009  
S. Shechtman and R. Tachibana Efficient Gradient F0 Tree Model for Prosody Modeling and Unit-selection, Applied for the Embedded American English Concatenative TTS ICASSP, 2009, Taipei, Taiwan 2009
R. Fernandez, Z. Kons, S. Shechtman, Z. Shuang, R. Hoory, B. Ramabhadran and Y. Qin The IBM Submission to the 2008 Text-to-Speech Blizzard Challenge Blizzard Workshop, Sep. 2008, Brisbane, Australia 2008
S. Tiomkin and D. Malah Statistical Text-to-Speech Synthesis with Improved Dynamics Interspeech, Sep. 2008, Brisbane, Australia 2008
H. Aronowitz Online Vocabulary Adaptation Using Contextual Information and Information Retrieval Interspeech, Sep. 2008, Brisbane, Australia 2008
H. Aronowitz and Y. Solewicz Speaker Recognition in Two Wire Test Sessions Interspeech, Sep. 2008, Brisbane, Australia 2008
J. Mamou, B. Ramabhadran Phonetic Query Expansion for Spoken Document Retrieval Interspeech, Sep. 2008, Brisbane, Australia 2008
J. Mamou, Y. Mass, B. Ramabhadran, B. Sznajder Combination of Multiple Speech Transcription Methods for Vocabulary Independent Search Search in Spontaneous Conversational Speech Workshop, SIGIR 2008, Singapore 2008  
A. Geven, M. Tscheligi, A. Sorin and H. Aronowitz Presenting a speech-based mobile reminder system SiMPE 2008, Sept. 2008, Amsterdam, Netherlands 2008  
V. Mylonakis, J. Soldatos, A. Pnevmatikakis, L. Polymenakos, A. Sorin and H. Aronowitz Using Robust Audio and Video Processing Technologies to Alleviate the Elderly Cognitive Decline PETRA 2008, July 2008, Athens, Greece 2008  
B. Sznajder, J. Mamou, Y. Mass, and M. Shmueli-Scheuer Metric inverted - an efficient inverted indexing method for metric spaces Efficiency Issues in Information Retrieval Workshop, ECIR 2008 2008  
W. Allasia, F. Falchi, F. Gallo, M. Kacimi, A. Kaplan, J. Mamou, Y. Mass and N. Orio Audio-visual content analysis in P2P networks: the SAPIR approach 1st Workshop on Automated Information Extraction in Media Production, AIEMPro'08 2008  
S. Chu, H. Kuo, L. Mangu, Y. Liu, S. Qin, Q. Shi, S. Zhang, H. Aronowitz Recent advances in the IBM GALE Mandarin transcription system ICASSP, Apr. 2008, Las Vegas, USA 2008