Clustering with modified cosine distance learned from constraints

Leonid Rachevsky; Dimitri Kanevsky; Ruhi Sarikaya; Bhuvana Ramabhadran

INTERSPEECH 2011

Conference paper

01 Dec 2011

Abstract

In this paper we present a modified cosine similarity metric that makes features more discriminative. The new metric is defined via various linear transformations of the original feature space into a space in which samples are better separated. These transformations are learned from a set of constraints that represent available domain knowledge, by solving related optimization problems. We present results on two natural language call routing datasets that show significant improvements, ranging from 3% to 5% absolute, in the purity of clusters obtained in an unsupervised fashion. Copyright © 2011 ISCA.
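The abstract does not give the formula for the modified metric, so the following is only a minimal sketch of one common reading: cosine similarity computed after applying a linear map A to both feature vectors. The function names, the matrix A, and the example vectors are hypothetical and chosen for illustration; in the paper the transformation is learned from constraints, whereas here it is fixed by hand.

```python
import math

def transform(A, x):
    """Apply the linear map A (given as a list of rows) to vector x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def cosine_similarity(u, v):
    """Standard cosine similarity; returns 0.0 if either vector is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def modified_cosine_similarity(A, x, y):
    """Cosine similarity measured in the transformed space (assumed form)."""
    return cosine_similarity(transform(A, x), transform(A, y))

# Hypothetical example: two vectors that are orthogonal in the original space.
x = [1.0, 1.0]
y = [1.0, -1.0]
identity = [[1.0, 0.0], [0.0, 1.0]]
# Hand-picked diagonal map that down-weights the second feature (an assumption,
# standing in for a transformation that would be learned from constraints).
A = [[1.0, 0.0], [0.0, 0.1]]

sim_original = modified_cosine_similarity(identity, x, y)
sim_transformed = modified_cosine_similarity(A, x, y)
```

With the identity map the two vectors have similarity 0; after down-weighting the feature on which they disagree, they become nearly parallel. A learned transformation would aim for the same kind of effect, pulling together samples the constraints say belong together and separating the rest.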
