When people are asked whether the emotions in two speech samples belong to the same category, their judgment depends on the size of the target category. Hierarchical clustering is well suited to simulating this human perception of relative similarity between emotions in speech. To make clustering results better reflect subjective similarity, we devised a hierarchical clustering method that uses a new type of relative similarity data, obtained by tagging the most similar pair in each set of three samples. This type of data allowed us to create a closed-loop algorithm for feature weight learning that uses clustering performance as the objective function. In classifying utterances of a specific Japanese sentence recorded at a real call center, the method reduced errors by 15.2%. Copyright © 2011 ISCA.
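The closed-loop idea can be illustrated with a minimal sketch. It is not the algorithm of the paper, whose acoustic features, linkage criterion, and weight-update rule are not given in this abstract. The sketch assumes a weighted Euclidean distance, average-linkage agglomerative clustering, and a simple greedy search over feature weights. All function names, the toy data, and the triplets are hypothetical. Each triplet `(i, j, k)` records that samples `i` and `j` were tagged as the most similar pair among the three, and the objective is the fraction of triplets for which the tagged pair is also the first of the three pairs to be merged in the dendrogram.

```python
from itertools import combinations


def weighted_distance(x, y, weights):
    """Weighted Euclidean distance between two feature vectors."""
    return sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y)) ** 0.5


def merge_heights(points, weights):
    """Average-linkage agglomerative clustering (assumed linkage).

    Returns a dict mapping each sample pair (i, j), i < j, to the linkage
    distance at which the two samples first fall into the same cluster.
    """
    n = len(points)
    dist = {(i, j): weighted_distance(points[i], points[j], weights)
            for i, j in combinations(range(n), 2)}
    clusters = {i: [i] for i in range(n)}
    heights = {}
    while len(clusters) > 1:
        best = None
        for a, b in combinations(sorted(clusters), 2):
            pairs = [(min(p, q), max(p, q))
                     for p in clusters[a] for q in clusters[b]]
            linkage = sum(dist[pq] for pq in pairs) / len(pairs)
            if best is None or linkage < best[0]:
                best = (linkage, a, b, pairs)
        linkage, a, b, pairs = best
        for pq in pairs:
            heights[pq] = linkage  # these samples are joined at this merge
        clusters[a] = clusters[a] + clusters.pop(b)
    return heights


def triplet_agreement(points, weights, triplets):
    """Fraction of triplets whose tagged pair (i, j) merges strictly first."""
    heights = merge_heights(points, weights)

    def h(i, j):
        return heights[(min(i, j), max(i, j))]

    hits = sum(1 for i, j, k in triplets if h(i, j) < min(h(i, k), h(j, k)))
    return hits / len(triplets)


def learn_weights(points, triplets, rounds=5, factors=(0.0, 0.5, 2.0)):
    """Greedy search (an assumption): rescale one weight at a time and keep
    a change only if the clustering-based objective strictly improves."""
    weights = [1.0] * len(points[0])
    best = triplet_agreement(points, weights, triplets)
    for _ in range(rounds):
        for f in range(len(weights)):
            for factor in factors:
                trial = list(weights)
                trial[f] *= factor
                if not any(trial):  # skip the degenerate all-zero weighting
                    continue
                score = triplet_agreement(points, trial, triplets)
                if score > best:
                    weights, best = trial, score
    return weights, best


# Hypothetical toy data: feature 0 matches the tagged similarities,
# feature 1 is a large-scale distractor.
points = [[0.0, 5.0], [0.1, -4.0], [1.0, 4.5], [1.1, -5.0]]
triplets = [(0, 1, 2), (0, 1, 3), (2, 3, 0), (2, 3, 1)]

base_score = triplet_agreement(points, [1.0, 1.0], triplets)
learned_weights, learned_score = learn_weights(points, triplets)
print(base_score, learned_weights, learned_score)
```

Because the search accepts a weight change only when the triplet agreement of the resulting dendrogram improves, the learned weights never score below the uniform starting point on the training triplets. Any other search strategy could be substituted without changing the evaluation loop.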