Project Debater argues for key point analysis of surveys, reviews, and social media

We’ve released a new online demo of technology that identifies main points and their prevalence in opinionated text, and organized a shared task at the ArgMining workshop to push the field forward.

Since it was made public in 2018, Project Debater has taken on one-on-one live, long-form debates and answered crowdsourced questions. In March of this year, we published details of its architecture and performance evaluation,¹ and made 12 elements of its technology freely available as APIs for academic research.

Those APIs included novel Key Point Analysis (KPA) technology, which identifies main points from opinionated text. And because not all arguments are made from podiums, or in 2,000-word editorials, we’ve put Debater’s KPA to work analyzing social media posts, product and business reviews,² and perhaps the pinnacle of all open-ended, meaningful, but difficult to understand opinionated text—survey results.

Our team shared Debater’s latest progress in KPA at this year’s EMNLP 21 conference, including a web demo³ that allows anyone to try KPA, without any programming required.


Since it was first deployed, Debater’s KPA has analyzed more than 6 million comments.

Key point analysis has practical applications in many businesses. Analysts in companies exploring the voice of their customers can use it to tease out what people think of their companies and products. HR analytics teams can use it to present the voice of the employees to executives, and analysts in market intelligence teams can use KPA to identify new trends in public opinion in relevant markets.

More comments? Yes, please.

The KPA service starts by summarizing a collection of comments on a given topic to establish a small set of key points. Then, the prominence of each point is determined by the number of matching mentions in the given comments. This prominence, or “salience,” results in a reported fraction of matching comments.
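The salience calculation described above can be sketched in a few lines. This is a minimal illustration, not Debater's implementation: it assumes the matching step has already produced (comment, key point) pairs, and the function name `key_point_salience` and the sample key points are hypothetical.

```python
from collections import Counter


def key_point_salience(matches, num_comments):
    """Return {key_point: fraction of all comments matched to it}.

    `matches` is an iterable of (comment_id, key_point) pairs. In the real
    service these pairs come from a matching model; here they are assumed
    to be given.
    """
    # Count each comment at most once per key point.
    unique_matches = set(matches)
    counts = Counter(key_point for _, key_point in unique_matches)
    # most_common() orders key points from most to least prevalent.
    return {kp: n / num_comments for kp, n in counts.most_common()}


# Hypothetical data: 5 comments, two of which match no key point.
matches = [
    (0, "Traffic congestion needs to be addressed"),
    (1, "Traffic congestion needs to be addressed"),
    (3, "Traffic congestion needs to be addressed"),
    (1, "Housing is too expensive"),
    (0, "Traffic congestion needs to be addressed"),  # duplicate, ignored
]
salience = key_point_salience(matches, num_comments=5)
for key_point, fraction in salience.items():
    print(f"{fraction:.0%}  {key_point}")
```

Note that a single comment can match more than one key point, so the fractions need not sum to one.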

It highlights the most prevalent points in the data as a set of concise, human-readable sentences, rather than the word clouds or strings of clipped phrases that existing opinion analysis solutions provide.

And the more comments, the better.

Human capacity for accurately understanding the nuance of surveys, reviews, and social posts tops out at around 200 comments. Up to that number, humans actually do better at predicting meaning than Debater or any other NLP tech.

But as human acuity drops with each passing post, Debater’s KPA capability shines.

We put it to the test, analyzing more than 550,000 comments from an internal engagement survey taken by more than 300,000 IBM employees in 2020. The immediate benefit that we and our HR colleagues noticed was how the KPA tech automatically identified new recurring points as the data came in, for example on pandemic-related topics.

IBM’s HR team found the results insightful and compelling, and will use Debater’s KPA again to analyze this year’s employee survey. And after a market technology evaluation, Debater’s KPA was recently selected as the core analysis technology in a new self-service solution developed for all HR Analytics needs at IBM. This technology is also available for commercial use through the IBM Research Early Access Program.

No training required.

“Training data” is often described as the backbone of how AI technologies learn and understand the topics that developers task them with. Machine learning has to “learn” from something. So, many existing text analytics solutions for opinion analysis require training, or the definition of new rules, when applied to new data or domains.

Not so in the case of Debater’s KPA, whose deep learning models have been shown to perform well across different domains, with no additional domain-specific training data required.⁴ This drastically reduces—by up to 80% according to some users—the time and the manual effort required to turn this kind of raw data into actionable insights.

But there’s always room for improvement.

Existing users of KPA already note the differentiated quality of analysis generated by the technology when compared to existing approaches. However, as KPA addresses a wide range of important tasks, we wanted to tap into the talent and capabilities of the academic community to further advance the field.

To facilitate this, we initiated a new shared task at this week’s ArgMining 21 workshop, a companion event to EMNLP 21. We solicited participants to develop new algorithms and methods that improve the quality of the generated key points and the accuracy of the mapping between the key points and the comments. We defined a common dataset based on arguments on controversial topics collected through crowdsourcing, including “for and against the regulation of social media” and “for and against mandatory routine child vaccinations.”

The quality of each system was benchmarked against a gold standard.
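To make the idea of benchmarking against a gold standard concrete, here is a minimal sketch that scores a system's predicted comment-to-key-point matches against gold labels. It assumes plain set-based precision and recall, which may differ from the official measures defined in the task overview listed in the references. The function name and the sample IDs are hypothetical.

```python
def match_precision_recall(predicted, gold):
    """Compare predicted (comment_id, key_point_id) pairs with gold pairs.

    Returns (precision, recall). This is an illustrative metric, not
    necessarily the one used in the shared task.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical gold labels and system output.
gold = {("c1", "kp1"), ("c2", "kp1"), ("c3", "kp2")}
predicted = {("c1", "kp1"), ("c3", "kp2"), ("c4", "kp2")}

precision, recall = match_precision_recall(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here two of the three predicted pairs appear in the gold set, and two of the three gold pairs were found.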

Seventeen teams participated in the shared task. The top systems provided novel algorithms that can make Key Point Analysis more accurate and faster. The team of Alshomary, et al., out of Germany, produced the winning system, which used contrastive learning to improve results of KPA matching.⁵ We posted the complete results in Overview of the 2021 Key Point Analysis Shared Task.⁶

Test your own debates

You can interact with Debater’s online demos, including KPA, by selecting “login as guest” on our website.

API keys for programmatic use are provided freely for academic use, and for a free trial for business use. If you have any questions, please send an e-mail to project.debater@il.ibm.com.


Notes

Note 1: EXAMPLE: In a community survey from 2016 and 2017, conducted in the city of Austin, Texas, citizens were asked “If there was ONE thing you could share with the Mayor regarding the city of Austin (any comment, suggestion, etc.), what would it be?” Read the team’s analysis and tutorial of how Debater found insight in the open-ended responses.

References

1. Slonim, N., Bilu, Y., Alzate, C., et al. An autonomous debating system. Nature, 591(7850), 379–384 (2021).
2. Bar-Haim, R., Eden, L., Kantor, Y., et al. Every Bite Is an Experience: Key Point Analysis of Business Reviews. ACL-IJCNLP (2021).
3. Bar-Haim, R., Kantor, Y., Venezian, E., et al. Project Debater APIs: Decomposing the AI Grand Challenge. EMNLP (Demos) (2021).
4. Bar-Haim, R., Kantor, Y., Eden, L., et al. Quantitative Argument Summarization and Beyond: Cross-Domain Key Point Analysis. EMNLP (2020).
5. Alshomary, M., Gurcke, T., Syed, S., et al. Key Point Analysis via Contrastive Learning and Extractive Argument Summarization. Argument Mining Workshop (2021).
6. Friedman, R., Dankin, L., Hou, Y., et al. Overview of the 2021 Key Point Analysis Shared Task. Argument Mining Workshop (2021).
