Learning-based methods with human-in-the-loop for entity resolution

Sairam Gurajada; Kun Qian; Lucian Popa; Prithviraj Sen

doi:10.1145/3357384.3360316

CIKM 2019

Conference paper

03 Nov 2019

Abstract

This tutorial is intended for researchers and practitioners working in data integration and, in particular, in entity resolution (ER), a sub-area focused on linking entities across heterogeneous datasets. We outline the ideal requirements of modern ER systems: (1) capture domain knowledge via (minimal) human interaction, (2) provide as much automation as possible via machine learning techniques, and (3) achieve high explainability. We describe recent research trends towards bringing such ideal ER systems closer to reality. We begin with an overview of human-in-the-loop methods that are based on techniques such as crowdsourcing and active learning. We then dive into recent trends that involve deep learning techniques such as representation learning to automate feature engineering, and combinations of transfer and active learning to reduce the number of user labels required. We also discuss how explainable AI relates to ER, and outline some of the recent advances towards explainable ER.
