Publication
CIKM 2023
Conference paper
Related Table Search for Numeric data using Large Language Models and Enterprise Knowledge Graphs
Abstract
Searching for related tables is a crucial part of enterprise data lake exploration. However, data lakes often contain numeric tables with unreliable column headers, and ID columns whose text names have been lost. Finding such related numeric tables in large data lakes is a challenging task. State-of-the-art related table search relies on text values in tables and cannot be applied to numeric tables. On the other hand, the state of the art for semantic labeling of numeric tables using enterprise knowledge graphs (EKGs) has clear sources of semantic ambiguity, because it relies on heuristic and rule-based approaches to determine numeric types and EKG labels, which leads to poor performance. In this paper, we propose a system, NumSearchLLM, that leverages LLMs alongside EKGs to reduce the ambiguity in semantic labeling of numeric columns and to support both joinable table search and more general table relatedness tasks. Specifically, we use LLMs to: (i) discover new relationships absent from EKGs; (ii) validate numeric types assigned by heuristics; and (iii) check whether the semantic labels assigned to the columns of a table form a meaningful schema. We also show how EKGs can be used in conjunction with LLMs to fix the labeling inconsistencies that the LLMs detect, by finding alternate labels. We show that by integrating LLMs with EKGs, we can achieve superior performance in joinable and related table search tasks compared with current approaches.
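The abstract describes a loop in which an LLM checks whether a table's column labels form a meaningful schema and the EKG supplies alternate labels when the check fails. It does not say how the alternates are searched. The following is a minimal sketch of that idea, assuming that each column already has a ranked list of candidate EKG labels and that the LLM schema check can be treated as a yes/no function. The names `repair_labels`, `toy_coherence_check`, and the `DOMAIN` table are hypothetical. In particular, the LLM is replaced here by a toy rule (all labels must belong to one domain), so this should not be read as NumSearchLLM's actual method.

```python
from itertools import product
from typing import Callable, Dict, List, Optional

Schema = Dict[str, str]


def repair_labels(
    candidates: Dict[str, List[str]],
    is_coherent: Callable[[Schema], bool],
    max_tries: int = 1000,
) -> Optional[Schema]:
    """Return the first label assignment (in candidate-rank order) that passes
    the schema check, or None if none does within max_tries attempts.

    candidates: column name -> candidate labels, best first (hypothetically
    produced by heuristics plus EKG lookups).
    is_coherent: stand-in for an LLM "is this a meaningful schema?" check.
    """
    columns = list(candidates)
    # product() keeps the top-ranked label for earlier columns as long as
    # possible and swaps in alternates for later columns first.
    for i, combo in enumerate(product(*(candidates[c] for c in columns))):
        if i >= max_tries:
            break
        schema = dict(zip(columns, combo))
        if is_coherent(schema):
            return schema
    return None


# Hypothetical toy knowledge: which domain each label belongs to.
DOMAIN = {
    "employee_id": "hr",
    "salary": "hr",
    "age": "hr",
    "product_id": "retail",
    "price": "retail",
}


def toy_coherence_check(schema: Schema) -> bool:
    """Toy replacement for the LLM check: all labels must share one known domain."""
    domains = {DOMAIN.get(label) for label in schema.values()}
    return len(domains) == 1 and None not in domains


if __name__ == "__main__":
    # Heuristics rank "price" first for col_1, which clashes with "employee_id";
    # the repair step falls back to the alternate label "salary".
    candidates = {
        "col_0": ["employee_id", "product_id"],
        "col_1": ["price", "salary"],
    }
    print(repair_labels(candidates, toy_coherence_check))
```

Using exhaustive enumeration keeps the sketch simple. Because the number of combinations grows multiplicatively with the number of columns, `max_tries` bounds the cost. A real system would likely prune candidates more aggressively.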