Learning-based methods with human-in-the-loop for entity resolution
This tutorial is intended for researchers and practitioners working in data integration and, in particular, entity resolution (ER), a sub-area focused on linking entities across heterogeneous datasets. We outline the ideal requirements of modern ER systems: (1) capture domain knowledge via (minimal) human interaction, (2) provide as much automation as possible via machine learning techniques, and (3) achieve high explainability. We describe recent research trends that bring such ideal ER systems closer to reality. We begin with an overview of human-in-the-loop methods based on techniques such as crowdsourcing and active learning. We then dive into recent trends that involve deep learning techniques, such as representation learning to automate feature engineering and combinations of transfer and active learning to reduce the number of user labels required. We also discuss how explainable AI relates to ER and outline some recent advances towards explainable ER.