Publication
WWW 2019
Conference paper
Identifying high value opportunities for human in the loop lexicon expansion
Abstract
Many real-world analytics problems examine multiple entities or classes that may appear in a corpus. For example, in a customer satisfaction survey analysis there are over 60 categories of (somewhat overlapping) concerns. Each of these is backed by a lexicon of terminology associated with the concern (e.g., "Easy, user friendly process" or "Process confusing, too many handoffs"). These lexicons need to be expanded by a subject matter expert, as the terminology is not always straightforward (e.g., "handoffs" may also include "ping-pong" and "hot potato" as relevant terms). But given that subject matter expert time is costly, which of the 60+ lexicons should we expand first? We propose a metric for evaluating an existing set of lexicons and providing guidance on which are likely to benefit most from human-in-the-loop expansion. Using our ranking results, we achieved a 4x improvement in impact when expanding the first few lexicons on our suggested list as compared to a random selection.