Publication
SDM 2021
Conference paper
PhraseScope: An effective and unsupervised framework for mining high quality phrases
Abstract
Phrase mining is a fundamental NLP task that can significantly affect the efficacy of many downstream applications. Many supervised and unsupervised phrase mining approaches have been proposed. Some rely on linguistic analyzers, and others are language agnostic. A daunting challenge in this task is to distinguish quality phrases from noise phrases, which tightly coexist with quality phrases across the entire frequency spectrum. Most existing approaches to phrase mining, however, rely on frequency-based statistics and hence suffer from quality loss. In this paper, we propose an unsupervised phrase mining framework, “PhraseScope”, which consists of a sequence of filters, namely cohesion, domain, and graph filters, to remove noise phrases. Each filter is responsible for removing noise phrases with particular characteristics. Collectively, our proposed filters are capable of detecting and removing noise phrases effectively while preserving quality phrases. Our results show significant improvement in both recall and precision over state-of-the-art frameworks when tested on datasets from three different domains.
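The abstract describes the framework only at the level of "a sequence of filters", so the following is a minimal sketch of that sequential structure and not the paper's method. The names `run_pipeline` and `make_threshold_filter`, the score dictionaries, and the 0.5 threshold are all hypothetical; the actual cohesion, domain, and graph filters are not specified in the abstract and are replaced here by simple threshold checks over invented toy scores.

```python
from typing import Callable, Dict, Iterable, List

# A filter takes candidate phrases and returns the ones it keeps.
PhraseFilter = Callable[[List[str]], List[str]]


def run_pipeline(candidates: Iterable[str], filters: List[PhraseFilter]) -> List[str]:
    """Apply each filter in order; later filters only see earlier survivors."""
    survivors = list(candidates)
    for phrase_filter in filters:
        survivors = phrase_filter(survivors)
    return survivors


def make_threshold_filter(scores: Dict[str, float], threshold: float) -> PhraseFilter:
    """Hypothetical stand-in filter: keep phrases whose score meets the threshold."""
    def keep(phrases: List[str]) -> List[str]:
        # Phrases with no score default to 0.0 and are dropped.
        return [p for p in phrases if scores.get(p, 0.0) >= threshold]
    return keep


# Toy scores invented for illustration only (not taken from the paper).
cohesion_scores = {"machine learning": 0.9, "of the": 0.1,
                   "last week": 0.8, "neural network": 0.85}
domain_scores = {"machine learning": 0.9, "last week": 0.2, "neural network": 0.7}
graph_scores = {"machine learning": 0.8, "neural network": 0.75}

filters = [
    make_threshold_filter(cohesion_scores, 0.5),  # stand-in for the cohesion filter
    make_threshold_filter(domain_scores, 0.5),    # stand-in for the domain filter
    make_threshold_filter(graph_scores, 0.5),     # stand-in for the graph filter
]

candidates = ["machine learning", "of the", "last week", "neural network"]
result = run_pipeline(candidates, filters)
print(result)
```

In this toy run each stage removes a different kind of candidate: "of the" fails the first check and "last week" fails the second, which mirrors the idea that each filter targets noise with particular characteristics. Because the filters run in sequence, a phrase is kept only if every filter keeps it.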