Channel-mapping for speech corpus recycling

Osamu Ichikawa; Steven J. Rennie; Takashi Fukuda; Masafumi Nishimura

doi:10.1109/ICASSP.2013.6639052

ICASSP 2013

Conference paper

18 Oct 2013



Abstract

The performance of automatic speech recognition (ASR) is heavily dependent on the acoustic environment in the target domain. Large investments have focused on ways to record speech data in specific environments. In contrast, recent Internet services using hand-held devices such as smartphones have created opportunities to acquire huge amounts of 'live' speech data at low cost. There are practical demands to reuse this abundant data in different acoustic environments. To transform such source data for a target domain, developers can use channel mapping and noise addition. However, channel mapping of the data is difficult without stereo mapping data or impulse response data. We tested GMM-based channel mapping with a vector Taylor series (VTS) formulation on a per-utterance basis. We found this type of channel mapping effectively simulated our target domain data. © 2013 IEEE.
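As an illustration of what channel mapping and noise addition mean at the feature level, the following is a minimal sketch and not the authors' implementation. It assumes log-power (for example log-mel) features and the common VTS mismatch function y = x + h + log(1 + exp(n - x - h)), with the phase term ignored. The function name and the toy values are hypothetical, and the GMM-based per-utterance estimation of the channel described in the abstract is not reproduced here.

```python
import numpy as np

def apply_channel_and_noise(x, h, n):
    """Map source log-power features toward a target environment.

    x: (frames, bins) source features in the log-power domain (assumed).
    h: (bins,) channel bias; a linear filter becomes an additive
       offset in the log-power domain.
    n: (bins,) or (frames, bins) additive noise, also in log-power.
    """
    x = np.asarray(x, dtype=float)
    through_channel = x + h  # channel mapping: additive shift per bin
    # Noise addition: log(exp(x + h) + exp(n)), computed stably.
    return np.logaddexp(through_channel, n)

# Toy usage with made-up values: noise far below the speech level,
# so the output is essentially the channel-shifted input.
x = np.zeros((2, 3))
h = np.array([1.0, -0.5, 0.0])
n = np.full(3, -50.0)
y = apply_channel_and_noise(x, h, n)
```

When the noise is far below the speech level, the output reduces to x + h, which is why the channel term has to be estimated well before noise is added.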
