Using word embedding to enable semantic queries in relational databases
Abstract
We investigate opportunities for exploiting Artificial Intelligence (AI) techniques for enhancing capabilities of relational databases. In particular, we explore applications of Natural Language Processing (NLP) techniques to endow relational databases with capabilities that were very hard to realize in practice. We apply an unsupervised neural-network based NLP idea, Distributed Representation via Word Embedding, to extract latent information from a relational table. The word embedding model is based on meaningful textual view of a relational database and captures inter-/intra-attribute relationships between database tokens. For each database token, the model includes a vector that encodes these contextual semantic relationships. These vectors enable processing a new class of SQL-based business intelligence queries called cognitive intelligence (CI) queries that use the generated vectors to analyze contextual semantic relationships between database tokens. The cognitive capabilities enable complex queries such as semantic matching, reasoning queries such as analogies, predictive queries using entities not present in a database, and using knowledge from external sources.