A multimodal speaker detection and tracking system for teleconferencing
Abstract
A serious problem in both audio and video conferencing facilities available today is the difficulty in determining who is speaking among a large number of participants. There is a strong need for developing meeting room infrastructure and teleconference facilities that improve the sense of presence and participation experienced in remote meetings. We present a distributed multimodal tracking system that uses multiple cameras and microphones to automatically select the current speaker among multiple meeting participants. The system actively obtains and transmits video showing a good view of the selected speaker. The tracking system is integrated into a web-based video conferencing application that connects seven meeting rooms around the globe. An important part of designing such a system is to determine sensor placement and configuration through systematic experiments in the actual rooms where the system is deployed.