About cookies on this site Our websites require some cookies to function properly (required). In addition, other cookies may be used with your consent to analyze site usage, improve the user experience and for advertising. For more information, please review your options. By visiting our website, you agree to our processing of information as described in IBM’sprivacy statement. To provide a smooth navigation, your cookie preferences will be shared across the IBM web domains listed here.
Publication
ACL 2022
Paper
Domain Representative Keywords Selection: A Probabilistic Approach
Abstract
We propose a probabilistic approach to select a subset of a target domain representative keywords from a candidate set, contrasting with a context domain. Such a task is crucial for many downstream tasks in natural language processing. To contrast the target domain and the context domain, we adopt the two-component mixture model concept to generate a distribution of candidate keywords. It provides more importance to the distinctive keywords of the target domain than common keywords contrasting with the context domain. To support the representativeness of the selected keywords towards the target domain, we introduce an optimization algorithm for selecting the subset from the generated candidate distribution. We have shown that the optimization algorithm can be efficiently implemented with a near-optimal approximation guarantee. Finally, extensive experiments on multiple domains demonstrate the superiority of our approaches over other baselines for the tasks of keyword summary generation and trending keywords selection.