Publication
INTERSPEECH 2023
Tutorial
Resource-Efficient and Cross-Modal Learning Toward Foundation Models
Abstract
In this tutorial, the first session will introduce the theoretical advantages of large-scale pre-trained foundation models through universal approximation theory, and show how to adapt large speech and acoustic models efficiently using parameter-efficient learning. Next, the second session will introduce effective cross-modal pre-training of representations across the visual, speech, and language modalities; such representations can be learned without necessarily requiring aligned data across modalities and can also benefit tasks within individual modalities. Finally, the third session will explore multimedia-processing applications that benefit from pre-trained acoustic and language modelling, together with benchmark performance.
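As a rough illustration of the parameter-efficient learning mentioned in the first session, the sketch below shows a LoRA-style low-rank adapter in PyTorch; this is an assumed example, not the tutorial's own code, and the layer size and rank are placeholders. The pre-trained weights stay frozen and only the small low-rank matrices are updated.

```python
# Minimal sketch of parameter-efficient fine-tuning in the LoRA style
# (illustrative; the tutorial may cover other adapter/prompt methods).
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():      # freeze the pre-trained weights
            p.requires_grad = False
        # Small trainable low-rank factors A (rank x in) and B (out x rank)
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        # Frozen path plus scaled low-rank update: base(x) + scale * x A^T B^T
        return self.base(x) + self.scale * (x @ self.A.t() @ self.B.t())

# Example usage: wrap one projection layer of a (placeholder) pre-trained model.
layer = nn.Linear(768, 768)                   # stands in for a frozen model layer
peft_layer = LoRALinear(layer, rank=8)
trainable = sum(p.numel() for p in peft_layer.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable}")   # only the rank-8 A/B matrices train
```

Because only the low-rank factors receive gradients, the number of updated parameters is a small fraction of the full model, which is the core idea behind adapting large speech and acoustic foundation models efficiently.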