A scalable feature learning and tag prediction framework for natural environment sounds

Prasanna Sattigeri; Jayaraman J. Thiagarajan; M. Shah; Karthikeyan Natesan Ramamurthy; Andreas Spanias

doi:10.1109/ACSSC.2014.7094773

ACSSC 2014

Conference paper

24 Apr 2015

Abstract

Building feature extraction approaches that can effectively characterize natural environment sounds is challenging due to their dynamic nature. In this paper, we develop a framework for extracting features and obtaining semantic inferences from such data. In particular, we propose a new pooling strategy for deep architectures that can preserve the temporal dynamics in the resulting representation. By constructing an ensemble of semantic embeddings, we employ an l1-reconstruction based prediction algorithm for estimating the relevant tags. We evaluate our approach on challenging environmental sound recognition datasets and show that the proposed features outperform traditional spectral features.
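As a rough illustration of what l1-reconstruction based tag prediction can look like, the sketch below reconstructs a query feature vector as a sparse combination of training feature vectors and scores each tag by the reconstruction weight carried by the training examples that have it. This is not the authors' algorithm: the abstract does not give the formulation, and the ensemble of semantic embeddings is omitted here. The names `l1_reconstruct` and `predict_tag_scores`, the ISTA solver, the regularization weight `lam`, and the weight-based scoring rule are all assumptions made for illustration.

```python
import numpy as np


def soft_threshold(x, t):
    """Element-wise soft-thresholding, the proximal operator of t * ||x||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def l1_reconstruct(D, y, lam=0.1, n_iter=500):
    """Approximately solve min_a 0.5 * ||y - D a||^2 + lam * ||a||_1 with ISTA.

    D: (dim, n_train) matrix whose columns are training feature vectors.
    y: (dim,) query feature vector.
    """
    # Step size is 1 / L, where L is the Lipschitz constant of the gradient
    # of the quadratic term (squared spectral norm of D).
    L = np.linalg.norm(D, 2) ** 2
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - y)
        a = soft_threshold(a - grad / L, lam / L)
    return a


def predict_tag_scores(D, T, y, lam=0.1):
    """Score tags for query y (hypothetical scoring rule, assumed here).

    T: (n_train, n_tags) binary matrix; T[i, k] = 1 if training example i
       carries tag k.
    Returns a (n_tags,) vector of scores in [0, 1].
    """
    weights = np.abs(l1_reconstruct(D, y, lam))
    total = weights.sum()
    if total == 0.0:
        # No training example was selected; no evidence for any tag.
        return np.zeros(T.shape[1])
    return (weights @ T) / total
```

In this sketch, a query that is reconstructed mostly from training sounds sharing a given tag receives a score near 1 for that tag; thresholding or ranking the scores would then give the predicted tags.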

Poster