Interactive dictionary expansion using neural language models
Dictionaries and ontologies are foundational elements of systems that extract knowledge from unstructured text. As new content arrives, however, keeping these dictionaries up to date is a crucial task. In this paper, we propose a human-in-the-loop (HumL) dictionary expansion approach that couples a lightweight neural language model with tight HumL supervision to assist the user in building and maintaining a domain-specific dictionary from an input text corpus. The approach follows the explore/exploit paradigm: it discovers new instances in the text corpus (explore) and uses the accepted dictionary entries to predict new “unseen” terms that do not yet occur in the corpus (exploit). We evaluate our approach on a real-world scenario in the healthcare domain, in which we construct a dictionary of adverse drug reactions using user blogs as the input text corpus. The evaluation shows that our approach lets the user easily extend the input dictionary, and that tight human-in-the-loop integration results in a 216% improvement in effectiveness.
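To make the explore/exploit loop concrete, the following is a minimal sketch and not the implementation evaluated in the paper. It assumes that explore ranks corpus terms by cosine similarity to the centroid of the accepted entries, and that exploit proposes unseen two-word phrases by recombining the words of accepted two-word entries. The toy embeddings, the function names (explore, exploit, expand) and the accept callback standing in for the human reviewer are all hypothetical.

```python
import math
from itertools import product

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def centroid(vectors):
    """Component-wise mean of a list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def explore(embeddings, dictionary, k=3):
    """Assumed explore step: rank corpus terms outside the dictionary
    by similarity to the centroid of the accepted entries."""
    center = centroid([embeddings[t] for t in dictionary if t in embeddings])
    candidates = [t for t in embeddings if t not in dictionary]
    ranked = sorted(candidates, key=lambda t: cosine(embeddings[t], center), reverse=True)
    return ranked[:k]

def exploit(dictionary, corpus_terms):
    """Assumed exploit step: recombine the words of accepted two-word entries
    into phrases that appear neither in the dictionary nor in the corpus."""
    pairs = [t.split() for t in dictionary if len(t.split()) == 2]
    heads = sorted({p[0] for p in pairs})
    tails = sorted({p[1] for p in pairs})
    known = set(dictionary) | set(corpus_terms)
    return [f"{h} {t}" for h, t in product(heads, tails) if f"{h} {t}" not in known]

def expand(embeddings, dictionary, accept, rounds=2):
    """Human-in-the-loop expansion: only proposals the reviewer accepts are added."""
    dictionary = set(dictionary)
    for _ in range(rounds):
        proposals = explore(embeddings, dictionary) + exploit(dictionary, embeddings)
        dictionary |= {t for t in proposals if accept(t)}
    return dictionary

# Toy data (hypothetical): term -> 2-dimensional embedding learned from a corpus.
toy_embeddings = {
    "severe headache": [0.9, 0.1],
    "mild nausea": [0.8, 0.2],
    "severe nausea": [0.85, 0.15],
    "blog post": [0.1, 0.9],
}
seed_dictionary = {"severe headache", "mild nausea"}

# The lambda plays the role of the user rejecting an off-topic suggestion.
expanded = expand(toy_embeddings, seed_dictionary, accept=lambda t: t != "blog post", rounds=1)
print(sorted(expanded))
```

In this sketch the reviewer's decisions feed back into both steps: each accepted term shifts the centroid used by explore and enlarges the word pool used by exploit, which is one plausible reading of the tight human-in-the-loop integration described above.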