02 Jul 2024

News

6 minute read

IBM’s text-to-SQL generator takes top place on a benchmark for handling complex database queries

The solution, based on IBM's Granite code model, is part of a larger effort to bring generative AI into data services to help businesses extract fresh insights from large databases.

From website clicks to sales reports, organizations are gathering and storing more data than ever. But tools for finding the information you need across databases, data warehouses, and data lakehouses, and transforming it into something useful, have not kept pace.

Many businesses fail to unlock the full value of their data because employees either can’t find what they’re looking for or can’t translate their questions into the code required to unlock the answers.

Generative AI is set to simplify the process, with large language models (LLMs) removing key roadblocks that currently make finding, retrieving, and transforming tabular data so difficult. SQL is the dominant language for interacting with databases. But in any given enterprise there are a limited number of people who understand how large databases are laid out and can query them in SQL, effectively limiting who can access the data to uncover insights to improve the business.

To open enterprise data to more users, IBM and others in tech have focused on teaching LLMs to write SQL, or structured query language. In a recent milestone, IBM’s Granite code model jumped to the top of the BIRD leaderboard that measures how well LLMs can parse a question in natural language and translate it to SQL, to be run on the real data to answer the question.

IBM researchers named their BIRD submission, ExSL+granite-20b-code, in part for its extractive schema-linking technique that does the work of identifying how the database is organized (its schema), and retrieving the right data tables and columns to answer the question. Researchers tuned three versions of an IBM Granite 20B code model to find the most relevant data columns, identify linkages between values needed to answer the query, and generate the SQL code.

IBM’s text-to-SQL generator still has some way to go before it can code like a pro. Even as BIRD’s top performer, it answered just 68% of questions correctly, compared to 93% of engineers who volunteered to take the test. But given the rapid progress LLMs have made on other coding tasks, including refactoring programs written in COBOL to Java, the gap between AI and human-generated SQL may soon close.

On BIRD’s benchmark for code execution speed (how much computation is required to run the AI-generated SQL against the database), BIRD evaluators gave IBM’s solution an 80, just below the 90 that volunteer engineers received. The next-best AI systems earned a 65.

The SQL code generator is one of several technologies that IBM researchers are developing to help enterprises find, retrieve, transform, and visualize their data. IBM has already rolled out other LLM-powered components that enrich structured data with descriptions and business terminology to make database tables and columns easier to find. The technologies were recently introduced in IBM’s Knowledge Catalog and watsonx.data products.

“We’re on a mission to drive AI into the entire data services pipeline,” said Lisa Amini, a research director at IBM who led the team that developed the data enrichment technologies and SQL generator. “The features we’re developing can help data stewards and engineers to be more productive, and data and business analysts to reach insights faster.”

Search-locate-and-compare with generative AI

Like the humble Excel spreadsheet, relational databases are organized into data tables, each split into rows and columns. But the relational database was designed to change in real time, and to support multiple users at once. IBM researchers in 1970 were the first to come up with the idea to structure data relationally, instead of hierarchically. The SQL programming language, also invented at IBM, came along a few years later. Both innovations allowed people to compare subsets of the data and bring to light relationships that might otherwise have gone unnoticed.

The birth of the internet and big data naturally gave way to bigger databases. Today a database held by a bank or insurance company might contain tens of thousands of tables, often with cryptic names like “cust_id” or“full_nm.” Finding what you needed was never exactly easy, but the increasing scale of the problem convinced IBM Research that a solution was needed. One team was handed the text-to-SQL challenge while another was tasked with related search issues.

BIRD is one of the more challenging text-to-SQL benchmarks because it draws from 95 large databases covering topics as varied as blockchain, hockey, and healthcare. An AI system needs to transform a question like, “What percentage of soccer players born in Denmark weigh more than 154 lbs?” into a prompt for the code model, along with insights to help the model focus on the most relevant data.

IBM researchers came up with a three-step solution.

In the first step, called schema linking, key words in the question are matched against data tables and columns in the database needed to answer the question. Researchers tuned an IBM Granite model to find and extract the relevant columns of data to populate a set of focused sub-tables.

In the second step, called content linking, the sub-tables are passed as a string representation to another instance of the Granite model that’s been trained to generate up to 10 pieces of SQL code comparing columns of data with specific values relevant to the query (for example, “playerinfo.nation = 'Denmark,”).

In the final step, a third instance of the Granite model takes the initial question, the focused data tables, and the highest scoring lines of code, and generates a series of SQL queries. The system upvotes the queries that execute and picks the one that repeats most often.

IBM’s solution is unique in two ways. While most text-to-SQL systems currently generate column names for relevant sub-tables, IBM researchers chose an extractive method, that led to a seven-times speedup. In the content-linking step, they chose a generative, rather than search-based, method for improved accuracy.

“Extractive schema linking is a faster way to do it,” said Michael Glass, an LLM researcher who led IBM’s text-to-SQL team. “If you have a large schema, you might care about speed. For content-linking, what matters most is precision. We had to make trade-offs between recall and precision.”

IBM’s solution shot to the top of the BIRD leaderboard in May. But as difficult as BIRD might be, its test database comes pre-labeled in natural language. In the real world, the situation is far messier, which is why researchers fine-tuned an LLM to tag lakehouse data with natural language descriptions to make it easier to find and transform. (IBM Knowledge Catalog and watsonx.data recently incorporated this technology).

Talking to your data through a conversational GUI

IBM researchers designed a conversational GUI, or CGUI, that allows data engineers, stewards, and analysts to talk to their data. The chat interface on the left is aligned with the GUI on the right, allowing users to seamlessly interact with, and review, the AI system’s work.

In addition to the text-to-SQL generator, IBM researchers are developing a user experience that makes it easier to interact with your structured data. Their goal is to create a conversational graphical user interface (GUI) that combines the personal feel of an AI chat interface with the intuitiveness of a web-based GUI.

For inspiration, researchers collected extensive feedback from data stewards and engineers as well as business analysts. “We spent time learning about the user before we wrote a single line of code,” said Daniel Weidele, an IBM researcher who led the team that developed the conversational GUI, or CGUI.

To align the two interface modalities — text and graphics — IBM researchers made sure that any questions and answers that appear in the chat box are mirrored in the GUI, where the user can visualize results and explore further with a click.

Let’s say you want to know more about the purchasing habits of high-net-worth companies. You would start by curating a new table, and typing your question into the chat window on the left. An LLM-powered assistant instantly brings up a set of suggested tables on the right with short and long descriptions of the data.

schema graph 1 — The schema graph shows example questions on the right.

To find additional tables that could be relevant, you would click on the tab, “schema graph,” which brings up a map of how all the tables in the collection relate conceptually. “You may not be familiar with all the data in the lakehouse,” said Weidele. “How can you even ask for products a customer bought if you don’t know you have that table?”

A tab for “example questions” offers prompts designed to help you refine your search, for example, to companies with a certain amount of gross revenue. The suggested questions also provide a creativity boost, allowing you to expand your search in new and potentially fruitful directions.

After submitting your revised question in the chatbox, the LLM returns a table with results and a breakdown of how it computed its answer. Click on the “decomposition” tab, and you can review each step the system took to solve the problem, along with the SQL code it ran to get there. Editing the code is as simple as clicking on it.

decomposition 2.png — Decomposition shows the model showing its work.

“Seeing the code helps you trust what the model is saying,” said Weidele. “It also helps you refine your query if the results don’t capture your intent.”

SQL answer 3.png — SQL answer shows solution.

Embedded in the CGUI is a visualization tool that lets you conversationally turn your results into a chart that can be edited and exported into a slide deck to more clearly communicate takeaway messages to colleagues.

bar chart 4.png — Shows barchart that can be exported as a PNG.

What’s next

IBM researchers are working with developers to incorporate these new generative AI features into several watsonx products. They are also continuing to improve the text-to-SQL generator and underlying language models.

Subscribe to our Future Forward newsletter and stay up to date on the latest research news

Subscribe to our newsletter

An artist’s tribute to modern AI
Q & A
Kim Martineau
27 Oct 2025
Expanding AI model training and inference for the open-source community
News
Peter Hess
21 Oct 2025
Toucan: A new goldmine for tool-calling AI agents
Release
Kim Martineau
17 Oct 2025
Lightweight tools for ‘steering’ LLMs down the right path
Research
Kim Martineau
15 Oct 2025

Search-locate-and-compare with generative AI

Talking to your data through a conversational GUI

What’s next

Related posts

An artist’s tribute to modern AI

Expanding AI model training and inference for the open-source community

Toucan: A new goldmine for tool-calling AI agents

Lightweight tools for ‘steering’ LLMs down the right path