Publication
NeurIPS 2023
Talk

FlowPilot: An LLM-powered system for enterprise data integration

Abstract

Traditional data integration techniques often require complex coding and a deep understanding of data architectures, which can be daunting for non-specialists. As AI evolves, there is a growing need for tools that democratize data access and analysis. We present FlowPilot, a novel system that departs from current one-shot text-to-SQL paradigms, which often fail to answer complex queries. A key innovation in our work is the automated generation of the training/fine-tuning dataset from a dynamic set of inputs, including metadata from enterprise catalogs, database schemas, and query logs. The generated dataset is then used to fine-tune an LLM tailored to the customer; by embedding the relevant schemas, relationships, and patterns into its core knowledge, the model is able to understand the context of the enterprise data. FlowPilot mitigates errors during both the training and inference phases by using uncertainty estimation to assess query validity and alignment with user intent, and by allowing the model to execute and refine statements in a sandbox environment. A coordinator integrates fine-tuned text-to-SQL, text-to-Python, and text-to-chart models to deliver thorough answers to a spectrum of data-related questions. FlowPilot's user-friendly interface comprises three synchronized, AI-powered interactive views: chat, flow, and data. This arrangement lets users select their preferred mode of interaction throughout their conversation with the databases. FlowPilot offers an advanced approach to data integration that combines generative AI with a streamlined data pre-processing method. It introduces a novel conversational text-to-SQL feature that aims to make data access simpler and responses more reliable, thereby enhancing user interactions with enterprise databases.
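
The execute-and-refine step mentioned in the abstract can be illustrated with a small sketch. This is not FlowPilot's implementation, which the abstract does not detail. It assumes an in-memory SQLite database as the sandbox, and a hypothetical `generate_sql` callable standing in for the fine-tuned text-to-SQL model. The names `answer_in_sandbox`, `toy_generator`, and `SCHEMA` are likewise invented for illustration.

```python
import sqlite3
from typing import Callable, Optional, Tuple

# Hypothetical model interface (an assumption, not FlowPilot's API):
# (question, schema DDL, previous error or None) -> candidate SQL string.
SqlGenerator = Callable[[str, str, Optional[str]], str]


def answer_in_sandbox(
    question: str,
    schema_ddl: str,
    generate_sql: SqlGenerator,
    max_attempts: int = 3,
) -> Tuple[str, int]:
    """Try candidate SQL against an empty in-memory copy of the schema.

    Any database error is fed back to the generator so it can refine the
    statement. Returns the first statement that executes, together with
    the number of attempts it took.
    """
    sandbox = sqlite3.connect(":memory:")  # isolated from any real data
    try:
        sandbox.executescript(schema_ddl)
        error: Optional[str] = None
        for attempt in range(1, max_attempts + 1):
            sql = generate_sql(question, schema_ddl, error)
            try:
                sandbox.execute(sql).fetchall()
                return sql, attempt
            except sqlite3.Error as exc:
                error = str(exc)  # passed to the next generation round
        raise RuntimeError(
            f"no valid statement after {max_attempts} attempts: {error}"
        )
    finally:
        sandbox.close()


def toy_generator(question: str, schema: str, previous_error: Optional[str]) -> str:
    # Stand-in for a model: the first guess uses a column that does not
    # exist; once it sees an error, it switches to the correct column.
    if previous_error is None:
        return "SELECT name, total FROM orders"
    return "SELECT customer, total FROM orders"


SCHEMA = "CREATE TABLE orders (customer TEXT, total REAL);"
sql, attempts = answer_in_sandbox(
    "Show order totals per customer", SCHEMA, toy_generator
)
print(sql, attempts)
```

In this sketch only execution errors drive refinement. The uncertainty estimation and intent-alignment checks described in the abstract would be separate signals and are not modeled here.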