“We wanted a model with 95% of the big model’s accuracy but fast enough to be useful when low latency and high throughput are needed,” said Bishwaranjan Bhattacharjee, a senior technical leader who led the development of both models.
Similar versions of both HAP detectors have been available on IBM’s watsonx AI platform for more than a year — and in 11 languages, including English. IBM used a version of granite-guardian-hap-38m to vet all the data that went into its Granite language and code models.
By making both HAP detectors available on Hugging Face, IBM is continuing its tradition of promoting trustworthy AI.
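The models can be pulled down like any other Hugging Face checkpoint. The sketch below shows one way to score sentences with the smaller detector, assuming the standard transformers sequence-classification interface and the ibm-granite/granite-guardian-hap-38m repo id; the label convention (index 1 for harmful content here) and the 0.5 flagging threshold are assumptions to be checked against the model card.

```python
# Minimal sketch: screening sentences with the 38M HAP detector via transformers.
# Assumes the standard sequence-classification interface; confirm the repo id,
# label order, and a sensible threshold against the model card on Hugging Face.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "ibm-granite/granite-guardian-hap-38m"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

sentences = [
    "Thanks so much for your help today.",
    "An example sentence you want to screen before training.",
]
inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Assumption: index 1 is the hate/abuse/profanity class.
hap_prob = torch.softmax(logits, dim=-1)[:, 1]
flagged = hap_prob > 0.5  # threshold is a tunable choice, not a fixed spec

for text, prob, flag in zip(sentences, hap_prob.tolist(), flagged.tolist()):
    print(f"flagged={flag}\tscore={prob:.3f}\t{text}")
```

The same loop, run over a corpus sentence by sentence, is the shape of the data-vetting step described above: sentences the classifier flags can be dropped or routed for review before they ever reach a training set.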
“People tend to treat social harm and environmental harm as separate issues with LLMs, but it’s possible to minimize both,” said Kush Varshney, an IBM Fellow who has led IBM’s efforts to adapt its Trust 360 toolkits for generative AI.
Both HAP filters currently flag problematic sentences. Researchers are trying to develop more granular tools to isolate offensive words and phrases within a sentence. At the 2023 EMNLP conference, they demoed a multi-lingual HAP visualization tool that could eventually be used to hide offensive content from users.