
IBM Granite has new experimental features for developers to test

IBM has open-sourced a pair of prototype tools for detecting hallucinations in RAG applications and estimating model uncertainty. If feedback is positive, these capabilities could be added to the next Granite release.

The promise of large language models is offset by one notable issue: they can be unpredictable. It’s not always easy to tell whether the information an LLM just generated is accurate or grounded in trusted documents.

IBM Research has designed a pair of open-source low-rank adapters, or LoRAs, to give developers more control over AI content generation. These experimental adapters can be swapped in and out of IBM’s Granite 3.0 Instruct 8B model to provide an additional quality check. The LoRAs are available on Hugging Face, and if they prove to be useful, the data used to train them will be incorporated into Granite 3.1.
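For developers who want to try the adapters, attaching one looks much like loading any other LoRA with Hugging Face’s peft library. Below is a minimal sketch; the adapter repository name is a placeholder, so check the Granite Experiments listings on Hugging Face for the actual identifiers.

```python
# Minimal sketch: attaching an experimental LoRA to Granite 3.0 8B Instruct.
# The adapter repo ID below is a placeholder, not an official name.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_MODEL = "ibm-granite/granite-3.0-8b-instruct"
ADAPTER = "ibm-granite/<experimental-lora-adapter>"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
base = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL, torch_dtype=torch.bfloat16, device_map="auto"
)

# The base weights stay frozen; the LoRA layers sit on top, so the adapter
# can be removed or swapped for a different experiment at any time.
model = PeftModel.from_pretrained(base, ADAPTER)
```

Because the adapter sits on top of frozen base weights, peft’s disable_adapter context manager can temporarily switch the model back to plain Granite behavior when the extra check isn’t needed.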

The LoRAs are part of Granite Experiments, a new IBM Research playground for testing ideas and gauging whether AI developers find them valuable. If the two experiments are successful, researchers plan to release more open-source features for community testing.

“We wanted to give developers a chance to try out these features and provide feedback,” said Kate Soule, director of technical product management at IBM Research. “LoRAs provide a way to experiment and iteratively build out capabilities quickly.”

One LoRA acts as a kind of weather forecaster, estimating the likelihood that the model’s generated answer is correct. The other LoRA flags potentially hallucinated content in retrieval-augmented generation (RAG) applications, identifying when the model’s responses may have drifted from the source documents provided.

The new LoRAs were built for Granite Instruct 8B models, but similar LoRAs could be built for other Granite 3.0 models using the same pipeline. Both LoRAs accept structured inputs, allowing developers to turn features on and off as needed. The experimental LoRAs are meant to complement IBM’s standalone Granite Guardian models for detecting biased, dangerous, and inaccurate content in AI training data and at inference time.

A sentence-by-sentence source check

RAG applications help improve an LLM’s reliability by restricting its responses to a database or set of documents related to a specialized domain, like HR policy or US environmental law. RAG keeps LLMs on topic and reduces the chance of ‘hallucinations,’ or responses that sound authoritative but may be incorrect or misleading.

But even with RAG, LLMs sometimes improvise in ambiguous situations. IBM’s RAG LoRA is a kind of improv detector. For each sentence the model generates, the LoRA cites a source and rates how likely the sentence is to be grounded in it. Sentences with no apparent connection to the source documents are flagged as a high hallucination risk.
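The exact prompt and output format are defined by the adapter’s model card. Purely to illustrate how an application might consume per-sentence grounding scores, here is a hypothetical post-processing helper; the record fields and threshold are assumptions, not IBM’s schema.

```python
# Hypothetical post-processing of per-sentence grounding results.
# The fields below are illustrative assumptions, not the adapter's real schema.
from dataclasses import dataclass
from typing import Optional

@dataclass
class SentenceCheck:
    sentence: str
    source_id: Optional[str]   # the passage the sentence was tied to, if any
    grounding_score: float     # 0.0 = no support found, 1.0 = strongly supported

def flag_hallucination_risks(checks: list[SentenceCheck],
                             threshold: float = 0.5) -> list[SentenceCheck]:
    """Return sentences whose grounding score falls below the risk threshold."""
    return [c for c in checks if c.source_id is None or c.grounding_score < threshold]
```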

“It gives you more information,” says Chulaka Gunasekara, an IBM researcher focused on RAG and AI reasoning. “If you see the citation, you know what’s behind the model’s answer and whether to trust it.”

Calculating the odds of an accurate response

If you follow the weather, you probably know that predictions come with uncertainty estimates. A forecast calling for 90% chance of rain is a strong signal to pack an umbrella. If chances are only 10%, it’s probably safe to leave the umbrella at home.

IBM created a LoRA that works in a similar way, rating its confidence in the LLM’s answer. “If the model’s confidence is high, the answer is most likely accurate,” says Prasanna Sattigeri, an IBM researcher focused on AI safety. “If confidence is low, I can delegate my question to another model to see if it does better.”
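Sattigeri’s delegation idea amounts to a simple routing rule. The sketch below assumes the primary model returns an answer along with a confidence score between 0 and 1; the real adapter’s output format may differ.

```python
# Confidence-based routing sketch. `primary` and `fallback` are hypothetical
# callables; `primary` is assumed to return (answer, confidence in [0, 1]).
def answer_with_fallback(question, primary, fallback, min_confidence=0.7):
    answer, confidence = primary(question)   # e.g. Granite with the uncertainty LoRA
    if confidence >= min_confidence:
        return answer
    # Low confidence: delegate the question to a second model, as described above.
    return fallback(question)
```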

IBM’s LoRA for measuring uncertainty comes out of AI model calibration work done at the MIT-IBM Watson AI Lab and presented at the 2024 International Conference on Machine Learning. Researchers provided their target Granite model with a dataset of questions, and had the model generate answers that were checked for accuracy against the ground-truth answers.

Researchers used this evaluation data to train an auxiliary model to predict the probability that the AI-generated answers were correct, without seeing the actual answers. “The auxiliary model learns what the Granite model knows and doesn't know, allowing it to generalize to unseen question-answer pairs,” says Kristjan Greenewald, an AI researcher at IBM who co-authored the work.
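As a rough illustration of the auxiliary-model idea, and not the MIT-IBM calibration method itself, one could fit a lightweight probe on the graded evaluation data that predicts the probability of a correct answer from the question text alone.

```python
# Toy stand-in for the auxiliary model: a text probe trained on graded answers.
# This is an illustrative simplification, not the published calibration method.
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def fit_correctness_probe(questions, was_correct):
    """questions: list of question strings; was_correct: 1/0 grades vs. ground truth."""
    probe = make_pipeline(HashingVectorizer(n_features=2**16),
                          LogisticRegression(max_iter=1000))
    probe.fit(questions, was_correct)
    return probe

# probe.predict_proba([new_question])[0, 1] then estimates the chance that the
# model's answer to new_question will be correct.
```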

The auxiliary model would have been difficult to deploy because of its size, so researchers trained a LoRA to imitate it. Now, when this LoRA is flipped on, an accuracy rating appears alongside responses generated by IBM’s Granite Instruct model.

What’s next

With feedback from developers, IBM researchers could incorporate these LoRA capabilities into the next iteration of Granite. The LoRAs are an experiment to see whether it’s possible to solicit feedback earlier in the AI-model development cycle and speed up innovation.

Under the existing process, it can take up to a year for AI model updates to be released. By contrast, traditional software updates can ship in as little as a day or two. If the new IBM adapters prove to be popular, researchers plan to develop new specialized features and gather community feedback.