IBM’s Granite model is one of the most transparent LLMs in the world
A new report from Stanford University’s Center for Research on Foundation Models showed that IBM’s model scored a perfect 100% in several categories designed to measure how open models really are.
Generative AI and foundation models have already begun to revolutionize how the world lives and works. Popular chatbots and models are used by tens of millions of people around the world each day. But how these models work, including what data they're trained on and how that data is weighted, is often obscured.
To ensure that AI benefits the greatest number of people as safely and equitably as possible, IBM is committed to transparency with its models. That commitment, along with the goal of spurring global innovation, is one of the major reasons we recently open-sourced our Granite code models. And today, our commitment to transparency was bolstered by the AI community itself.
Stanford's Center for Research on Foundation Models released its second Foundation Model Transparency Index (FMTI), which scores major models on a set of 100 indicators intended to show, in detail, how transparent each model is for developers and end users. In the new report, IBM's Granite large language model scored better overall than many popular models.
To create the report, members of each organization’s model team compiled information on how their respective models performed on the 100 indicators chosen by Stanford. These include things like data sources, training compute resources, harmful data filtration, model components, terms of service, and licensing details.
The team at Stanford took each group's compilation and scored it against a rubric for each indicator. Each organization was given time to dispute any of Stanford's scores before they were published today.
In many of the categories tested by the FMTI, IBM's Granite model scored a perfect 100%, far surpassing the averages for those categories. In the risk mitigations category, Granite was the clear leader. In the data category, Granite performed better than any other model except ServiceNow's StarCoder, which is a code model rather than a natural-language model. Granite also scored a perfect 100% in the compute, methods, and model basics categories. Overall, IBM's Granite model scored 64%, with only three of the models surveyed performing better.
These results were based on a report submitted to Stanford in February, before IBM open-sourced any of its models. The team at IBM Research that compiled the report included IBM Fellow Kush Varshney, who believes that with the latest version of the Granite models open to anyone, IBM's transparency score would likely be even higher.
For IBM, making transparent models isn't just important from an ethical standpoint; it's a sensible business decision. Many enterprises have been reluctant to roll out LLMs at scale, and the reason is the same as in any other supply chain decision: businesses want to know where their supply of goods or services comes from.
Even when there are clear use cases for adopting AI models, companies must contend with opacity around the provenance of a model's training data, how the model was trained, and how it's filtered for hate, abuse, and profanity, among myriad other questions. With models that are transparent by default, businesses can spend more time looking for solutions to their problems rather than worrying about the reliability of the models they're using.