This repository contains the data and explanations required to reconstruct the dataset as described in the paper “Detecting Persuasive Arguments based on Author-Reader Personality Traits and their Interaction”.
Dataset:
The data is in CSV format and contains 9747 quadruplets of the form:
comment_id, submission_id, comment_author_id, delta_reader_id
For example:
ceefsm0, 1u4fqn, jmsolerm, CleanMyWounds53013
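The rows can be loaded with Python's standard csv module. The following is a minimal sketch, not part of the original release: the names Quadruplet and read_quadruplets are hypothetical, and it assumes the fields may be followed by a space after each comma (as in the example above) and that the file may or may not start with a header row.

```python
import csv
from typing import Iterable, Iterator, NamedTuple


class Quadruplet(NamedTuple):
    """One dataset row (hypothetical container, named after the columns)."""
    comment_id: str
    submission_id: str
    comment_author_id: str
    delta_reader_id: str


def read_quadruplets(lines: Iterable[str]) -> Iterator[Quadruplet]:
    """Yield one Quadruplet per well-formed CSV line.

    `lines` can be an open file object or any iterable of strings.
    Blank lines, rows without exactly four fields, and a header row
    (if present) are skipped.
    """
    reader = csv.reader(lines, skipinitialspace=True)
    for row in reader:
        fields = [field.strip() for field in row]
        if len(fields) != 4:
            continue
        if fields[0] == "comment_id":  # assumed header row
            continue
        yield Quadruplet(*fields)


if __name__ == "__main__":
    sample = ["ceefsm0, 1u4fqn, jmsolerm, CleanMyWounds53013"]
    for quad in read_quadruplets(sample):
        print(quad.comment_id, quad.submission_id)
```

To read the actual file, pass an open file handle instead of the sample list, for example `open(path, newline="", encoding="utf-8")`, where path points to the CSV file in this repository.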
Extract the content using the Reddit API
- First, access the comment: https://www.reddit.com/api/info.json?id=comment_id (the API expects the ID as a full name, e.g. t1_ceefsm0 for a comment). From the response you can get:
  - The content of the comment
  - The link to the submission, by extracting the “parent_id” field
- Then get the content of the submission by accessing https://www.reddit.com/api/info.json?id=parent_id and extracting the “selftext” field.
- To access the content written by a user (comment_author_id or delta_reader_id), use https://www.reddit.com/user/comment_author_id/comments/ to get the comments and https://www.reddit.com/user/comment_author_id/posts/ to get the posts, substituting the relevant user ID in the URL.
To read more about the API, see: https://www.reddit.com/dev/api/
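The retrieval steps above can be sketched in Python using only the standard library. This is an illustrative sketch under stated assumptions, not the authors' code: all function names and the USER_AGENT string are hypothetical; it assumes the info endpoint takes full names (t1_ prefix for comments), that the response is a listing whose first child holds the fields "body", "parent_id" and "selftext", and that unauthenticated requests are still accepted (Reddit may rate-limit them or require OAuth). For a comment that replies to another comment, "parent_id" may point to that comment rather than the submission; in that case the submission_id column prefixed with t3_ could be used instead.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

INFO_URL = "https://www.reddit.com/api/info.json?id={fullname}"
USER_AGENT = "dataset-reconstruction-sketch/0.1"  # hypothetical value


def comment_fullname(comment_id: str) -> str:
    """Prefix a bare comment ID with t1_ (assumed full-name format)."""
    return comment_id if comment_id.startswith("t1_") else "t1_" + comment_id


def info_url(fullname: str) -> str:
    """Build the info.json URL for a full name such as t1_... or t3_..."""
    return INFO_URL.format(fullname=fullname)


def first_child(listing: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the data dict of the first item in a listing, or None."""
    children = listing.get("data", {}).get("children", [])
    return children[0]["data"] if children else None


def fetch_json(url: str) -> Dict[str, Any]:
    """Download and decode a JSON document (performs a network request)."""
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def fetch_comment_and_parent(comment_id: str) -> Optional[Dict[str, Any]]:
    """Fetch a comment, then the item named by its parent_id field."""
    comment = first_child(fetch_json(info_url(comment_fullname(comment_id))))
    if comment is None:  # deleted or unavailable comment
        return None
    parent = first_child(fetch_json(info_url(comment["parent_id"])))
    return {
        "comment_body": comment.get("body"),
        "parent_id": comment["parent_id"],
        # selftext is only present when the parent is a submission
        "submission_selftext": parent.get("selftext") if parent else None,
    }
```

Calling fetch_comment_and_parent("ceefsm0") would issue two requests, one for the comment and one for its parent. Comments or submissions that were deleted since the dataset was collected may come back empty, so callers should handle a None result.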
If you use this data, please cite:
@inproceedings{umap-Shmueli-Scheuer19,
  author    = {Michal Shmueli{-}Scheuer and Jonathan Herzig and David Konopnicki and Tommy Sandbank},
  title     = {Detecting Persuasive Arguments based on Author-Reader Personality Traits and their Interaction},
  booktitle = {Proceedings of the 27th {ACM} Conference on User Modeling, Adaptation and Personalization, {UMAP} 2019, Larnaca, Cyprus, June 9-12, 2019.},
  pages     = {211--215},
  year      = {2019},
  url       = {https://doi.org/10.1145/3320435.3320467},
  doi       = {10.1145/3320435.3320467}
}
Contact
- Michal Shmueli-Scheuer
Information Retrieval Solutions, IBM Research - Haifa