Localization is the first big task tackled with an agent: finding the files and lines of code in a codebase that are causing a given error. If you can't find where a bug is in your code, you can't fix it.
When an error is spotted by a quality assurance (QA) engineer, they'll file a bug report, which goes into the developer's backlog and adds to the pile of bugs to sift through. Finding the offending line, and ensuring that altering it won't affect anything else in the codebase, is a time-consuming process.
But with the SWE localization agent, a developer can open a bug report they've received on GitHub and tag it with "ibm-ai-agent-swe-1.0," and the agent will quickly work in the background to find the troublesome code. Once it's found the location, it'll suggest a fix that the developer could implement to resolve the issue. The developer can then review the proposed fix and decide if it's the best way to solve the problem, potentially even calling on other agents to help.
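Triggering the agent amounts to adding that label to an issue, which can be done through GitHub's REST API. The sketch below builds (but does not send) the request; the repository, issue number, and token are placeholders for illustration.

```python
import json
import urllib.request

GITHUB_API = "https://api.github.com"


def build_label_request(owner: str, repo: str,
                        issue: int, token: str) -> urllib.request.Request:
    """Build the GitHub REST call that adds the agent's trigger label
    to an issue: POST /repos/{owner}/{repo}/issues/{issue_number}/labels."""
    url = f"{GITHUB_API}/repos/{owner}/{repo}/issues/{issue}/labels"
    body = json.dumps({"labels": ["ibm-ai-agent-swe-1.0"]}).encode()
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )


# "my-org", "my-repo", and issue 123 are hypothetical.
req = build_label_request("my-org", "my-repo", 123, "<token>")
# urllib.request.urlopen(req) would actually apply the label,
# at which point the agent picks up the issue in the background.
```

In practice a developer would more likely add the label through the GitHub UI or the `gh` CLI; the point is that the agent's entire interface is an issue label, so it slots into whatever tooling already touches the issue tracker.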
The localization tool is just one of a few new AI agents IBM Research has developed that aim to take a chunk out of the workload developers have on their plate. There's also one for editing lines of code based on developer requests, which relies on IBM's Granite LLM on watsonx, via PDL, IBM's Prompt Declaration Language. Another agent can develop and execute tests to ensure that code will run as intended. In each case, the agents can be invoked right where developers already work, such as in GitHub.
On average, the SWE agents can localize and fix problems within five minutes, and in testing they achieved a 23.7% success rate on SWE-bench, a benchmark that measures how well AI agents can solve real-world problems found on GitHub. That score places the IBM SWE agent high on the SWE-bench leaderboard, above many other agents that rely on massive frontier models like GPT-4o and Claude 3.
In each case, these agents observe, think, and act. Unlike a single LLM or foundation model, an agent can call on different models and stores of information, choosing whichever answers a question most efficiently. It can also plan out the steps required to carry out a task, something an LLM cannot do on its own, and it can complete complex tasks without additional input from a user.
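The observe-think-act loop can be sketched in a few lines. Everything here is invented for illustration: the tool names, the fake bug report, and the fixed two-step plan stand in for what a real agent would get from a planning model and real code-search tools.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    # name -> callable: the different tools and stores of
    # information the agent can call on
    tools: dict
    trace: list = field(default_factory=list)

    def run(self, bug_report: str) -> str:
        # Think: plan the steps needed for the task. A real agent
        # would ask a planning model; here the plan is hard-coded.
        plan = ["localize", "propose_fix"]
        observation = bug_report
        for step in plan:
            # Act: invoke the chosen tool, then observe its result,
            # feeding each observation into the next step.
            observation = self.tools[step](observation)
            self.trace.append((step, observation))
        return observation


# Stub tools standing in for code search and patch generation.
tools = {
    "localize": lambda report: "src/parser.py:42",
    "propose_fix": lambda loc: f"patch for {loc}",
}
agent = Agent(tools)
result = agent.run("TypeError in parser")
# result is the final proposed fix, produced end-to-end
# without further input from the user
```

The design point the article is making lives in that loop: the agent chains tool calls and carries state between them, whereas a bare LLM answers one prompt at a time.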
It made sense for IBM to build agentic tools like these, argues Ruchir Puri, chief scientist at IBM Research, not just for its own developers, but for all the enterprise developers IBM strives to assist. There are other competitive SWE agent tools looking to aid developers at work, but they primarily rely on massive, proprietary frontier models that cost a great deal at inference time. "Our goal was to build IBM AI Agent SWE for enterprises who want a cost efficient SWE agent to run wherever their code resides — even behind your firewall — while still being performant," Puri said.