A method to accelerate human-in-the-loop clustering
Abstract
Data analysis tasks often require grouping information to identify trends and associations. However, as the number of elements rises into the hundreds or thousands, the cost of having a person perform the groupings unassisted quickly becomes prohibitive. Previous approaches have combined traditional clustering techniques with manual interaction steps, yielding human-in-the-loop clustering algorithms that incorporate user feedback by reweighting features or adjusting a similarity function. In practice, however, many grouping tasks lack both a feature set and a well-defined (dis)similarity metric; they have only a subject matter expert with an implicit understanding of the correct relationships between elements, based on the domain and the task at hand. We present a refine-and-lock clustering interaction model and demonstrate that it is more effective for cognitive-assisted human clustering than other interaction models such as split/merge and must-link/can't-link. Our approach offers effective automatic clustering assistance even in the absence of clear features or a definitive similarity metric, ensures that every cluster has final user approval, and achieves at least a 3.94x improvement in time to completion over other interactive clustering approaches.