IBM Research | Debater Datasets

IBM Project Debater
Debater Datasets


Project Debater Datasets

The development of an automatic debating system naturally involves advancing research in a range of artificial intelligence fields. This page presents several annotated datasets developed as part of Project Debater to facilitate this research, organized by the research sub-fields described below.

Argument Mining is a prominent research frontier. Within this field, we distinguish between Argument Detection (detecting and segmenting argument components such as claims and evidence) and Argument Stance Classification (determining the polarity of an argument component with respect to a given topic).

Beyond argument mining, a debating system must also address the challenge of interactivity, i.e., the ability to understand and rebut the opponent's speech. Debate Speech Analysis is a new research field that focuses on this challenge.

Another important aspect of a debating system is the ability to interact with its surroundings in a human-like manner: it should be able both to articulate its own arguments and to listen to arguments made by others. For the former, the text-to-speech system must demonstrate human-like expressiveness to keep human listeners engaged. The latter may call for speech-to-text systems designed specifically for a debating scenario.

Finally, a debating system naturally relies on more fundamental NLP capabilities. One example is the ability to assess the semantic relatedness of different pieces of text and to weave them into a coherent narrative. The system should also be able to identify the basic concepts mentioned in a text. The benchmark data released so far in this context are described in the Basic NLP Tasks section.

Yoav Katz, Manager, Project Debater team, IBM Research - Haifa


Noam Slonim, Principal Investigator, Project Debater team, IBM Research - Haifa


This page allows you to download copies of the Project Debater Datasets.

The datasets are released under the following licensing and copyright terms, unless specified otherwise in their release notes:

To download, please fill in the request forms below.

Other datasets are expected to be released over time.


Argument Detection

The various argument detection datasets differ in size (e.g., number of topics), type of element detected (claims, claim sentences, or evidence), and method used for detection (pre-selected articles vs. automatic retrieval). The table below lists the different datasets and provides information on their characteristics:

Dataset	Reference	Topics	Element	Method
IBM Debater® - Argumentative Sentences in Recorded Debates A new annotation layer for 700 sentences extracted from ASR (automatic speech recognition) output of debate speeches on controversial topics. A sentence is annotated as positive if it contains an argument for the given topic. The sentences were sampled from the releases of IBM Debater® - Recorded Debating Datasets.	Findings of EMNLP 2020	20	700	Extracted from ASR output of debate speeches on controversial topics
IBM Debater® - Evidences Sentences 29,429 pairs of a topic and a sentence, each with a score between 0 and 1 indicating the extent to which the sentence can serve as evidence that supports or contests the topic. The sentences were extracted from Wikipedia, and the scores were obtained by manual crowd-sourced annotation (Figure Eight platform, www.figure-eight.com).	AAAI 2020	222	29,429	Automatically retrieved Wikipedia sentences with manual crowd-sourced scoring
IBM Debater® - Claim Sentences Search Claim sentence search results of the q_mc query, containing 1.49M sentences, and a claim sentence test set containing the 2.5K top-predicted sentences along with their labels. The sentences were retrieved from Wikipedia 2017. The dataset includes: - readme_mc_queries.txt - Readme of the claim sentence search results - readme_test_set.txt - Readme of the test set - q_mc_train.csv - Sentences retrieved by the q_mc query on 70 train topics - q_mc_heldout.csv - Sentences retrieved by the q_mc query on 30 held-out topics - q_mc_test.csv - Sentences retrieved by the q_mc query on 50 test topics - test_set.csv - Top predictions of our system along with their labels	COLING 2018	150 (70 train, 30 held-out, 50 test)	Claim Sentence	Automatically retrieved Wikipedia sentences
IBM Debater® - Evidence Sentences 5,785 pairs of a topic and a sentence with a binary annotation indicating whether the sentence is valid evidence relevant for the topic. The sentences were extracted from Wikipedia, and the prior for a positive instance is 41%. The data set includes 118 diverse topics, from domains such as politics, science and education. Each topic generally deals with one clearly identifiable concept. The data set is split into two sets: 83 topics for training (4,066 sentences), and 35 topics for testing (1,719 sentences).	ACL 2018	118 (83 train, 35 test)	Evidence	Automatically retrieved Wikipedia sentences
IBM Debater® - Claims and Evidence 2,294 labeled claims and 4,690 labeled evidence for 58 different topics. These data were published by Rinott et al. at EMNLP-2015 and extend the CE-ACL-2014 data. The dataset includes: - Two CSV files containing, for each topic, the claims and evidence identified for it in relevant Wikipedia articles. - The original Wikipedia articles (from the April 2012 Wikipedia dump) as text files, cleaned of any Wikisyntax or HTML markup.	EMNLP 2015	58 (leave one topic out)	Claim/Evidence	Pre-selected Wikipedia articles
IBM Debater® - Claims and Evidence 1,392 labeled claims for 33 different topics, and 1,291 labeled evidence for 350 distinct claims in 12 different topics. These data were published by Aharoni et al. in the First Workshop on Argumentation Mining at ACL-2014. The dataset includes: - Two CSV files containing, for each topic, the claims and evidence identified for it in relevant Wikipedia articles. - The original Wikipedia articles (from the April 2012 Wikipedia dump) as text files, cleaned of any Wikisyntax or HTML markup.	ArgMining 2014	33 (leave one topic out)	Claim/Evidence	Pre-selected Wikipedia articles
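
Several datasets above are evaluated per topic; the claims and evidence data, for example, list "leave one topic out" in the Topics column. The following is a minimal Python sketch of such topic-disjoint folds. It assumes each row has already been loaded as a hypothetical (topic, sentence, label) tuple; the actual column layout of the released CSV files may differ.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

# Hypothetical row layout: (topic, sentence, label). The released CSV files may
# use different columns, so adapt the loading step to the actual headers.
Record = Tuple[str, str, int]
Fold = Tuple[str, List[Record], List[Record]]


def leave_one_topic_out(records: List[Record]) -> Iterator[Fold]:
    """Yield (held_out_topic, train, test); no topic appears in both splits."""
    by_topic: Dict[str, List[Record]] = defaultdict(list)
    for rec in records:
        by_topic[rec[0]].append(rec)
    for held_out in sorted(by_topic):
        test = by_topic[held_out]
        train = [r for topic, recs in by_topic.items() if topic != held_out for r in recs]
        yield held_out, train, test


# Usage with made-up rows:
rows = [
    ("topic A", "s1", 1), ("topic A", "s2", 0),
    ("topic B", "s3", 1), ("topic C", "s4", 0),
]
for topic, train, test in leave_one_topic_out(rows):
    print(topic, len(train), len(test))
# topic A 2 2
# topic B 3 1
# topic C 3 1
```

Splitting by topic rather than by sentence keeps sentences about the same topic out of both training and test data, which is the point of the leave-one-topic-out setting.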

Argument Quality

Dataset	Reference	Topics	Element	Method
IBM Debater® - Evidence Quality 5,697 evidence pairs, each annotated for which of the two is more convincing in the context of a given topic. The evidence sentences were taken from the IBM Debater® - Evidence Sentences dataset and cover 69 different topics. The dataset is split into 4,319 pairs for training and 1,378 pairs for testing.	ACL 2019	69	Evidence	Automatically retrieved Wikipedia sentences
IBM Debater® - IBM-ArgQ-6.3kArgs - 6.3K arguments with point-wise quality labels (before cleansing) IBM Debater® - IBM-ArgQ-5.3kArgs - 5.3K arguments with point-wise quality labels (after cleansing) IBM Debater® - IBM-ArgQ-14kPairs - 14K argument pairs with pair-wise quality labels (before cleansing) IBM Debater® - IBM-ArgQ-9.1kPairs - 9.1K argument pairs with pair-wise quality labels (after cleansing) 6.3K arguments actively collected from the crowd on 22 debatable motions. All arguments were labeled for point-wise quality, and a set of 14K argument pairs was labeled for pair-wise quality. The dataset includes the arguments both before and after cleansing.	EMNLP 2019	22	Argument	Arguments actively collected from the crowd
IBM Debater® - IBM-ArgQ-Rank-30kArgs 30,497 arguments actively collected from the crowd on 71 debatable topics, labeled for point-wise quality and stance. Each argument is accompanied by two types of quality scores, as described in the paper, and a stance score. The arguments are split into train, dev, and test sets.	arXiv	71	Argument	Arguments actively collected from the crowd
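
The evidence and argument quality datasets include pair-wise labels stating which item in a pair is more convincing. As a minimal sketch of how such labels can be used, the code below aggregates them into a per-item win rate and measures how often a scoring function agrees with the annotated winner. The (first_id, second_id, winner_id) layout is a hypothetical assumption, and the win rate is a simple stand-in, not the scoring method described in the papers.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical pair layout: (first_id, second_id, winner_id), where winner_id is
# the item annotators judged more convincing.
Pair = Tuple[str, str, str]


def win_rates(pairs: List[Pair]) -> Dict[str, float]:
    """Fraction of its comparisons that each item won (simple aggregation)."""
    wins: Counter = Counter()
    appearances: Counter = Counter()
    for first, second, winner in pairs:
        if winner not in (first, second):
            raise ValueError(f"winner {winner!r} is not part of the pair")
        appearances[first] += 1
        appearances[second] += 1
        wins[winner] += 1
    return {item: wins[item] / n for item, n in appearances.items()}


def pairwise_accuracy(pairs: List[Pair], score: Dict[str, float]) -> float:
    """Share of pairs where the higher-scored item is the annotated winner.

    Ties in score count as predicting the second item of the pair.
    """
    correct = 0
    for first, second, winner in pairs:
        predicted = first if score[first] > score[second] else second
        correct += predicted == winner
    return correct / len(pairs)


pairs = [("e1", "e2", "e1"), ("e1", "e3", "e1"), ("e2", "e3", "e3")]
rates = win_rates(pairs)
print(rates)                            # {'e1': 1.0, 'e2': 0.0, 'e3': 0.5}
print(pairwise_accuracy(pairs, rates))  # 1.0 (in-sample, for illustration only)
```

In practice a scorer would be trained on the train pairs and its pairwise accuracy measured on the held-out test pairs.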

Argument Stance Classification and Sentiment Analysis

Dataset	Reference	Content	Source
IBM Debater® - Sentiment Lexicon of IDiomatic Expressions (SLIDE) 5,000 frequently occurring idioms with sentiment annotation. The idioms were selected from Wiktionary, and over 40% of them were found to be sentiment-bearing via crowdsourced labeling. The dataset includes the idioms, their sentiment labels, and the distribution of the crowdsourced sentiment annotations.	LREC 2018	5,000 frequently occurring idioms with sentiment annotation	Manually annotated idioms from Wiktionary
IBM Debater® - Sentiment Composition Lexicons Sentiment composition lexicons containing 2,783 words and sentiment lexicons containing 66K unigrams and 262K bigrams. The lexicons were learned from a large proprietary English corpus. The dataset includes: - ReleaseNotes.txt - release notes - SEMANTIC_CLASSES.xlsx - composition lexicons for reversers, propagators, and dominators - ADJECTIVES.xlsx - composition lexicons for two pairs of gradable adjectives - LEXICON_UG.txt - unigram sentiment lexicon - LEXICON_BG.txt - bigram sentiment lexicon	COLING 2018	Sentiment composition lexicons containing 2,783 words and sentiment lexicons containing 66K unigrams and 262K bigrams	Automatically learned from a large proprietary English corpus
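
The composition lexicons classify words by how they affect the sentiment of the phrase they modify. The toy sketch below illustrates the idea for two of the classes, assuming that a reverser flips the polarity of its argument (so "reduce unemployment" reads as positive) and that a propagator passes it through unchanged. The lexicon entries and scores are invented for illustration and are not taken from the released files; dominators are not modeled.

```python
from typing import Dict

# Toy entries invented for illustration; NOT taken from LEXICON_UG.txt or
# SEMANTIC_CLASSES.xlsx.
UNIGRAM_SENTIMENT: Dict[str, float] = {"unemployment": -0.8, "profit": 0.7}
COMPOSITION_CLASS: Dict[str, str] = {
    "reduce": "reverser",      # assumed to flip the polarity of its argument
    "increase": "propagator",  # assumed to pass the argument's polarity through
}


def compose(modifier: str, argument: str) -> float:
    """Score a (modifier, argument) phrase under simplified composition rules."""
    arg_score = UNIGRAM_SENTIMENT.get(argument, 0.0)
    if COMPOSITION_CLASS.get(modifier) == "reverser":
        return -arg_score
    # Propagators, and (as a simplifying assumption) unlisted modifiers,
    # keep the argument's polarity unchanged.
    return arg_score


print(compose("reduce", "unemployment"))    # 0.8
print(compose("increase", "unemployment"))  # -0.8
```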

Debate Speech Analysis

Dataset	Reference	Speeches	Topics	Contents
IBM Debater® - Recorded Debating Dataset - Release #5 (Full version - 2 parts) + Counter speech annotations 3,562 speeches recorded by professional debaters discussing 440 controversial topics (with their automatic and manually-corrected transcript texts), and an annotation specifying the response speeches recorded for each speech. The dataset includes: - Audio files of all debate speeches - Automatic and manually-corrected transcripts of the speeches, in both raw and cleaned (processed) versions - An annotation specifying the response speeches recorded for each speech, and the type of the response (explicit/implicit) - Metadata describing the speeches, such as the topic discussed in each speech Size: 30 + 21.7 GB IBM Debater® - Recorded Debating Dataset - Release #5 (Downsampled audio files) + Counter speech annotations 3,562 speeches recorded by professional debaters discussing 440 controversial topics (with their automatic and manually-corrected transcript texts), and an annotation specifying the response speeches recorded for each speech. The dataset includes: - Audio files of all debate speeches (down-sampled, mono & compressed with flac) - Automatic and manually-corrected transcripts of the speeches, in both raw and cleaned (processed) versions - An annotation specifying the response speeches recorded for each speech, and the type of the response (explicit/implicit) - Metadata describing the speeches, such as the topic discussed in each speech Size: 21.2 GB IBM Debater® - Recorded Debating Dataset - Release #5 (Light version - no audio files) + Counter speech annotations 3,562 speeches recorded by professional debaters discussing 440 controversial topics (with their automatic and manually-corrected transcript texts), and an annotation specifying the response speeches recorded for each speech. The dataset includes: - Automatic and manually-corrected transcripts of the speeches, in both raw and cleaned (processed) versions - An annotation specifying the response speeches recorded for each speech, and the type of the response (explicit/implicit) - Metadata describing the speeches, such as the topic discussed in each speech Size: 12.1 MB	ACL 2020	3,562	440	- Recordings of expert debaters - Automatic and manually-corrected transcripts of the speeches, in both raw and cleaned (processed) versions - An annotation specifying the response speeches recorded for each speech, and the type of the response (explicit/implicit) - Metadata describing the speeches, such as the topic discussed in each speech
IBM Debater® - Recorded Debating Dataset - Release #4 (Full version) + Annotated general-purpose claim-rebuttal pairs 200 speeches recorded by professional debaters discussing 50 controversial topics (with their manual and automatic transcriptions), and 55 general-purpose claim-rebuttal pairs, along with the results of several annotation experiments performed on these data. The dataset includes: - Audio files of 200 debating speeches. [first released in IBM Debater® - Recorded Debating Dataset - Release #2] - Manual and automatic transcripts of the speeches, in both raw and cleaned (processed) versions. [first released in IBM Debater® - Recorded Debating Dataset - Release #2] - 55 general-purpose claim-rebuttal pairs written by an expert human debater - The results of several annotation experiments performed using the general-purpose claim-rebuttal pairs and the speeches Size: 3.2 GB IBM Debater® - Recorded Debating Dataset - Release #4 (Compressed audio files) + Annotated general-purpose claim-rebuttal pairs 200 speeches recorded by professional debaters discussing 50 controversial topics (with their manual and automatic transcriptions), and 55 general-purpose claim-rebuttal pairs, along with the results of several annotation experiments performed on these data. The dataset includes: - Audio files of 200 debating speeches (down-sampled, mono & compressed with flac). [first released in IBM Debater® - Recorded Debating Dataset - Release #2] - Manual and automatic transcripts of the speeches, in both raw and cleaned (processed) versions. [first released in IBM Debater® - Recorded Debating Dataset - Release #2] - 55 general-purpose claim-rebuttal pairs written by an expert human debater - The results of several annotation experiments performed using the general-purpose claim-rebuttal pairs and the speeches Size: 1.2 GB IBM Debater® - Recorded Debating Dataset - Release #4 (Light version - no audio files) + Annotated general-purpose claim-rebuttal pairs 200 speeches recorded by professional debaters discussing 50 controversial topics (with their manual and automatic transcriptions), and 55 general-purpose claim-rebuttal pairs, along with the results of several annotation experiments performed on these data. The dataset includes: - Manual and automatic transcripts of the speeches, in both raw and cleaned (processed) versions. [first released in IBM Debater® - Recorded Debating Dataset - Release #2] - 55 general-purpose claim-rebuttal pairs written by an expert human debater - The results of several annotation experiments performed using the general-purpose claim-rebuttal pairs and the speeches Size: 3.3 MB	EMNLP 2019 EMNLP 2018	200	50	- Recordings of expert debaters - 55 general-purpose claim and rebuttal pairs written by an expert human debater - An annotation specifying for each of the 50 controversial topics, which of the 55 general-purpose claims is relevant to the topic - An annotation of general-purpose claims relevant to a topic, specifying whether a relevant claim was mentioned in speeches discussing the topic - An annotation of general-purpose claims and sentences from speeches in which they were mentioned, specifying whether the claim was mentioned in the sentence - An annotation of general-purpose rebuttals, specifying whether they are a plausible response to general-purpose claims mentioned in speeches
IBM Debater® - Recorded Debating Dataset - Release #3 (Full version) + Annotated mined claims 400 speeches recorded by professional debaters about 200 controversial topics (with their manual and automatic transcripts) and 4,876 mined claims annotated as mentioned explicitly, implicitly, or not at all, in those speeches. The dataset includes: - Audio files of 400 debating speeches - Manual and automatic transcripts of the speeches, in both raw and cleaned (processed) versions - 4,876 annotated mined claims Size: 5.6GB IBM Debater® - Recorded Debating Dataset - Release #3 (Compressed audio files) + Annotated mined claims 400 speeches recorded by professional debaters about 200 controversial topics (with their manual and automatic transcripts) and 4,876 mined claims annotated as mentioned explicitly, implicitly, or not at all, in those speeches. The dataset includes: - Audio files of 400 debating speeches (down-sampled, mono & compressed with flac) - Manual and automatic transcripts of the speeches, in both raw and cleaned (processed) versions - 4,876 annotated mined claims Size: 2.4GB IBM Debater® - Recorded Debating Dataset - Release #3 (Light version - no audio files) + Annotated mined claims 400 speeches recorded by professional debaters about 200 controversial topics (with their manual and automatic transcripts) and 4,876 mined claims annotated as mentioned explicitly, implicitly, or not at all, in those speeches. The dataset includes: - Manual and automatic transcripts of the speeches, in both raw and cleaned (processed) versions - 4,876 annotated mined claims Size: 5.16MB	ArgMining 2019 @ACL LREC 2018	400	200	Recordings of expert debaters + mined claims annotated in a listening comprehension task
IBM Debater® - Recorded Debating Dataset - Release #2 (Full version) + Annotated arguments 200 speeches recorded by professional debaters about controversial topics (with their manual and automatic transcripts) and 756 arguments annotated as mentioned/not mentioned in these speeches. Note: the 60 speeches from Release #1 are not included in this dataset. The dataset includes: - Audio files of 200 debating speeches - Manual and automatic transcripts of the speeches, in both raw and cleaned (processed) versions - 756 annotated arguments Size: 5GB IBM Debater® - Recorded Debating Dataset - Release #2 (Compressed audio files) + Annotated arguments 200 speeches recorded by professional debaters about controversial topics (with their manual and automatic transcripts) and 756 arguments annotated as mentioned/not mentioned in these speeches. Note: the 60 speeches from Release #1 are not included in this dataset. The dataset includes: - Audio files of 200 debating speeches (down-sampled, mono & compressed with flac) - Manual and automatic transcripts of the speeches, in both raw and cleaned (processed) versions - 756 annotated arguments Size: 1.19GB IBM Debater® - Recorded Debating Dataset - Release #2 (Light version - no audio files) + Annotated arguments 200 speeches recorded by professional debaters about controversial topics (with their manual and automatic transcripts) and 756 arguments annotated as mentioned/not mentioned in these speeches. Note: the 60 speeches from Release #1 are not included in this dataset. The dataset includes: - Manual and automatic transcripts of the speeches, in both raw and cleaned (processed) versions - 756 annotated arguments Size: 2.87MB	EMNLP 2018 LREC 2018	200	50	Recordings of expert debaters + arguments annotated in a listening comprehension task
IBM Debater® - Recorded Debating Dataset - Release #1 (Full version) 60 speeches recorded by professional debaters about controversial topics, and their manual and automatic transcripts, in both raw and cleaned (processed) versions. The dataset includes: - Audio files of 60 debating speeches - Manual and automatic transcripts of the speeches, raw and cleaned versions Size: 1.62GB IBM Debater® - Recorded Debating Dataset - Release #1 (Compressed audio files) 60 speeches recorded by professional debaters about controversial topics, and their manual and automatic transcripts, in both raw and cleaned (processed) versions. The dataset includes: - Audio files of 60 debating speeches (down-sampled, mono & compressed with flac) - Manual and automatic transcripts of the speeches, raw and cleaned versions Size: 326MB IBM Debater® - Recorded Debating Dataset - Release #1 (Light version - no audio files) 60 speeches recorded by professional debaters about controversial topics, and their manual and automatic transcripts, in both raw and cleaned (processed) versions. The dataset includes: - Manual and automatic transcripts of the speeches, raw and cleaned versions Size: 1MB	LREC 2018	60	16	Recordings of 10 expert debaters

Basic NLP Tasks

Dataset	Reference	Number of Topics	Type of elements	Number of pairs
IBM Debater® - Wikipedia Oriented Relatedness Dataset (WORD) A large semantic relatedness dataset composed of 19,276 pairs of Wikipedia concepts with manual scores for their level of relatedness.	LREC 2018	143 (82 train, 41 test)	Wikipedia Entities	19,276 (12,969 train, 6,307 test)
IBM Debater® - Multi-word Term Relatedness Benchmark (TR9856) Term-relatedness values for 9,856 pairs of terms. These data were published by Levy et al. at ACL-2015. The dataset includes: - Release Notes.txt - release notes describing the data - TermRelatednessResults.csv - the dataset - TermRelatednessLabeling.doc - the guidelines used for labeling the data	ACL 2015	47	Words and Multi-word Terms	9,856

Classes of Principled Arguments (CoPAs)

Dataset	Reference	Content
IBM Debater® - CoPA-Motion Labeling Matching of motions to CoPAs	ACL 2019	Association of CoPAs to 689 motions
IBM Debater® - CoPA-Speech Labeling Fraction of claims mentioned in speeches, by CoPA	ACL 2019	Statistics for the number of claims mentioned in recorded speeches from each CoPA

Key Point Analysis

Dataset	Reference	Topics	Pairs	Source
IBM Debater® - SurveyKP - 15K labeled (sentence, key point) pairs 15,189 (sentence, key point) pairs labeled as matching/non-matching, for a single topic. For each pair, the topic and stance are also indicated.	EMNLP 2023 Industry Track	1	15,189	The sentences were sampled from open-ended responses to the 2016-2017 Austin Community Survey. The key points were automatically extracted by our KPA system from the entire survey, with minor manual edits. The stance of each pair (whether the feedback is positive or negative) is also indicated. The labeling of each (sentence, key point) pair as matching/non-matching was performed manually by an in-house team of annotators.
IBM Debater® - ArgKP - 2023 9.2K labeled (argument, key point) pairs 9,281 (argument, key point) pairs labeled as matching/non-matching, for 10 controversial topics. For each pair, the topic and stance are also indicated.	EMNLP 2023 Industry Track	10	9,281	Similar to the ArgKP dataset (see below), ArgKP-2023 includes pro and con arguments from the IBM-ArgQ-Rank-30kArgs dataset, for 10 additional topics. Key points were extracted automatically by our system. The labeling of each (argument, key point) pair as matching/non-matching was performed manually, via crowd-sourcing.
IBM Debater® - ArgKP - 2021 27.5K labeled (argument, key point) pairs 27,519 (argument, key point) pairs labeled as matching/non-matching, for 31 controversial topics. For each pair, the topic and stance are also indicated.	Argument Mining workshop @ EMNLP 2021	31	27,519	ArgKP dataset extended with 3 additional topics. The extension arguments were actively collected from the crowd and labeled for stance. As in ArgKP, the key points for each new topic were manually composed by an expert debater. The labeling of each new (argument, key point) pair as matching/non-matching was performed manually, via crowd-sourcing.
IBM Debater® - ArgKP - 24K labeled (argument, key point) pairs 24,093 (argument, key point) pairs labeled as matching/non-matching, for 28 controversial topics. For each pair, the topic and stance are also indicated.	ACL 2020	28	24,093	The arguments, along with their topic and stance, were taken from the IBM-ArgQ-Rank-30kArgs dataset (see above). The key points for each topic were manually composed by an expert debater. The labeling of each (argument, key point) pair as matching/non-matching was performed manually, via crowd-sourcing.
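
In the key point datasets, each (argument, key point) pair is labeled as matching or non-matching. A natural summary of these labels, sketched below, counts how many arguments match each key point per topic and stance, which ranks the key points by prevalence. The (topic, stance, argument, key_point, is_match) row layout is a hypothetical assumption; key points with no matches are omitted, and an argument that matches several key points is counted for each of them.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical labeled row: (topic, stance, argument, key_point, is_match).
# The column names and order in the released files may differ.
Row = Tuple[str, int, str, str, bool]
Group = Tuple[str, int]


def key_point_prevalence(rows: List[Row]) -> Dict[Group, List[Tuple[str, int]]]:
    """For each (topic, stance), rank key points by their number of matched arguments."""
    counts: Dict[Group, Counter] = defaultdict(Counter)
    for topic, stance, _argument, key_point, is_match in rows:
        if is_match:
            counts[(topic, stance)][key_point] += 1
    return {group: c.most_common() for group, c in counts.items()}


rows = [
    ("t", 1, "a1", "kp1", True),
    ("t", 1, "a2", "kp1", True),
    ("t", 1, "a2", "kp2", False),
    ("t", 1, "a3", "kp2", True),
    ("t", -1, "a4", "kp3", True),
]
print(key_point_prevalence(rows))
# {('t', 1): [('kp1', 2), ('kp2', 1)], ('t', -1): [('kp3', 1)]}
```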
