Social question answering: Textual, user, and network features for best answer prediction
Abstract
Community question answering (CQA) sites use a collaborative paradigm to satisfy complex information needs. Although the task of matching questions to their best answers has been tackled for more than a decade, the social question-answering practice is a complex process. The factors influencing the accuracy of question-answer matching are many and hard to disentangle. We approach the task from an applicationoriented perspective, probing the space of several dimensions relevant to this problem: features, algorithms, and topics. We gather under a learning to rank framework the most extensive feature set used in literature to date, including 225 features from five different families. We test the power of such features in predicting the best answer to a question on the largest dataset from Yahoo Answers used for this task so far (40M answers) and provide a faceted analysis of the results along different topical areas and question types. We propose a novel family of distributional semantics measures that most of the time can seamlessly replace widely used linguistic similarity features, being more than one order of magnitude faster to compute and providing greater predictive power. The best feature set reaches an improvement between 11% and 26% in P@1 compared to recent well-established state-of-the-art methods.