Fast ordering of large categorical datasets for visualization

Alina Beygelzimer; Chang-Shing Perng; Sheng Ma

doi:10.3233/ida-2002-6406

Intelligent Data Analysis

Paper

01 Jan 2002



Abstract

An important issue in visualizing categorical data is how to order categorical values: non-numeric values that have no natural ordering, which makes them difficult to map to visual coordinates. The focus of this paper is on constructing categorical orderings efficiently without compromising their visual quality. To avoid the inherent intractability of previous discrete formulations, we consider a continuous relaxation of the problem that can be solved exactly using the spectral method. The latter is based on computing spectral information (eigenvectors) of a matrix derived from the similarity matrix of the dataset. However, even computing the similarity matrix itself is prohibitive for large datasets. To achieve greater efficiency, we propose a new multi-level scheme based on an approximate representation of the matrix. We show that it is sufficient to compute only a small portion of the matrix, of size linear rather than quadratic in the number of objects, to guarantee a small probability of approximation error. Thus an effective ordering can be constructed without computing most pairwise similarities of values. Experiments qualitatively verify the effectiveness of the resulting visualizations. © 2002 IOS Press. All rights reserved.
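The spectral relaxation summarized above can be illustrated with a small sketch. It is not the authors' implementation, and several pieces are assumptions: the similarity between two values of an attribute is taken to be the dot product of their co-occurrence profiles over the other attributes, and the relaxed objective is the common one of minimizing sum_ij S_ij (x_i - x_j)^2 over unit-length x orthogonal to the all-ones vector. Its minimizer is the Fiedler vector, the eigenvector of the Laplacian L = D - S for the second-smallest eigenvalue. The helper names `value_similarity`, `fiedler_vector` and `spectral_order` are hypothetical, and the eigenvector is found by plain power iteration so the sketch needs only the standard library.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple


def value_similarity(
    records: Sequence[Dict[str, Hashable]], attr: str
) -> Tuple[List[Hashable], List[List[float]]]:
    """Hypothetical similarity between the values of one categorical attribute.

    Each value gets a profile counting the (attribute, value) pairs it co-occurs
    with in the other columns; similarity is the dot product of two profiles.
    """
    profiles: Dict[Hashable, Counter] = {}
    for rec in records:
        prof = profiles.setdefault(rec[attr], Counter())
        for key, val in rec.items():
            if key != attr:
                prof[(key, val)] += 1
    values = sorted(profiles, key=repr)  # arbitrary but deterministic start order
    n = len(values)
    S = [[0.0] * n for _ in range(n)]
    # Full O(n^2) pairwise computation: the cost the multi-level scheme aims to avoid.
    for i in range(n):
        for j in range(i + 1, n):
            pi, pj = profiles[values[i]], profiles[values[j]]
            S[i][j] = S[j][i] = float(sum(c * pj[k] for k, c in pi.items()))
    return values, S


def fiedler_vector(S: List[List[float]], iters: int = 2000) -> List[float]:
    """Approximate the eigenvector of L = D - S for its second-smallest eigenvalue.

    Power iteration on M = c*I - L, with the all-ones direction projected out
    after every step, converges to that eigenvector when c exceeds the largest
    eigenvalue of L.
    """
    n = len(S)
    if n < 2:
        return [0.0] * n
    deg = [sum(row) for row in S]
    c = 2.0 * max(deg) + 1.0  # Gershgorin: eigenvalues of L lie in [0, 2*max degree]
    x = [i - (n - 1) / 2.0 for i in range(n)]  # non-constant, zero-mean start vector
    for _ in range(iters):
        # y = (c*I - L) x, expanded as (c - deg_i) x_i + sum_j S_ij x_j
        y = [(c - deg[i]) * x[i] + sum(S[i][j] * x[j] for j in range(n)) for i in range(n)]
        mean = sum(y) / n
        y = [v - mean for v in y]  # remove the component along the all-ones vector
        norm = sum(v * v for v in y) ** 0.5
        if norm == 0.0:
            break
        x = [v / norm for v in y]
    return x


def spectral_order(values: Sequence[Hashable], S: List[List[float]]) -> List[Hashable]:
    """Order values by their Fiedler-vector coordinate (the relaxed 1-D layout)."""
    f = fiedler_vector(S)
    idx = sorted(range(len(values)), key=lambda i: f[i])
    return [values[i] for i in idx]
```

For example, if records pair colors with sizes so that red shares a size with orange, orange with yellow, and yellow with green, then `spectral_order(*value_similarity(records, "color"))` should lay the colors out along that chain, up to reversal, because the sign of an eigenvector is arbitrary. The sketch computes every pairwise similarity and uses dense O(n^2) iterations; for large inputs a sparse matrix with a Lanczos solver such as SciPy's `scipy.sparse.linalg.eigsh` would be the usual choice, and the paper's approximate multi-level representation is not reproduced here.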
