For machines to develop capabilities like common sense, they must be able to do more than pick correct answers—they must be able to justify their decisions. IBM researchers have developed ways for AI to explain the reasoning behind common-sense decisions.
Imagine a scenario where a child and an AI system are asked the same common-sense question:
If you are hungry, what should you do?
- Eat?
- Or play?
Given the advances made by today’s AI systems, both the child and the AI are likely to logically conclude that a hungry person should eat.
When asked to justify their answers—which are correct—the child would have little difficulty explaining why she should eat when hungry. The AI, however, would be hard-pressed to rationalize its response.
At this week’s Association for Computational Linguistics (ACL) conference, we’re presenting a new dataset to help bridge the AI explainability gap in common-sense question answering,1 as illustrated by this example. We have created and publicly released a one-of-a-kind dataset, Explanations for CommonsenseQA (ECQA), to teach AI systems how to reason about the correct and incorrect answers to everyday common-sense questions.
The idea is to improve AI’s trustworthiness by giving it the ability to explain its answers—correct as well as incorrect.
Introducing AI to the concept of an explanation
One of our greatest challenges was clearly defining what we mean when we ask an AI system to explain why one answer is correct while another is incorrect.
Working closely with the Indian Institute of Technology (IIT) Delhi as part of IBM’s AI Horizons Network, we developed the notion of positive and negative properties as the defining framework for an “explanation.” In this setup, a property is a common sense fact. For example, here is a set of possible common sense facts to explain the correct answer of “eat food” in our question posed about hunger:
- Eating food gives the body energy.
- Energy is what the body needs to soothe its hunger.
These properties can either be supporting facts in favor of the correct answer choice (positive properties) or justifications refuting the incorrect answer choices (negative properties).
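To make the framework concrete, here is a minimal sketch of how one annotated example might be represented in code. The field names and the negative property shown are illustrative assumptions, not the dataset’s actual schema:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ECQAExample:
    """One annotated example in the ECQA style described above.

    Field names are illustrative, not the dataset's real schema.
    """
    question: str
    correct_choice: str
    incorrect_choices: List[str]
    positive_properties: List[str]   # facts supporting the correct choice
    negative_properties: List[str]   # facts refuting the incorrect choices
    free_flow_explanation: str       # natural-language summary

example = ECQAExample(
    question="If you are hungry, what should you do?",
    correct_choice="eat food",
    incorrect_choices=["play"],
    positive_properties=[
        "Eating food gives the body energy.",
        "Energy is what the body needs to soothe its hunger.",
    ],
    # hypothetical negative property, for illustration only
    negative_properties=["Playing does not satisfy hunger."],
    free_flow_explanation=(
        "A hungry person should eat because food provides the energy "
        "the body needs; playing would not address the hunger."
    ),
)
```

Structuring annotations this way keeps each explanation tied to the specific answer choice it supports or refutes, rather than to the question as a whole.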
The science behind common sense
The emergence of large QA datasets, combined with powerful pre-trained language models, has helped rapidly advance the field of automated question answering in the past few years. The CommonsenseQA (CQA) dataset created in 2019 by researchers at Tel Aviv University and Allen Institute for Artificial Intelligence, for example, lists a series of common-sense questions and the human-annotated answers for them.
For all its strengths, CQA doesn’t enable AI to explain what makes a given answer correct or incorrect.2
Our aim is to retrieve and generate explanations for a given set of triplets, each comprising a question, its correct answer choice, and its incorrect answer choices.
To create the ECQA dataset, we used crowdsourcing to annotate more than 11,000 QA pairs from the CQA dataset with positive and negative properties, as well as free-flow explanations written in natural language. This crowdsourced data enabled us to build AI models that offer explanations for the correct as well as incorrect answer choices for a given common-sense question.
Figure 1, below, illustrates an example from the CQA dataset, along with our human-annotated explanation, containing positive properties to support the correct answer choice (in green), negative properties to refute the incorrect choices (in red), and a free-flow natural-language explanation (in blue). This figure also captures the CoS-E (Commonsense Explanations) explanation from a prior work (Rajani et al., 2019) for the same example.3
We also created a retrieval system, eXplanation Retriever (XR), shown in Figure 2, that represents properties in a latent space (a representation learned with deep learning techniques) and retrieves the relevant facts for a CQA example from a given common-sense knowledge corpus.
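The retrieve-by-similarity idea behind XR can be sketched in a few lines. This toy version embeds text as a normalized bag-of-words vector and ranks corpus facts by cosine similarity to the query; XR’s actual encoder is a learned deep model, so the `embed` function here is a stand-in assumption:

```python
import numpy as np

def embed(text, vocab):
    """Toy bag-of-words embedding; stands in for a learned deep encoder."""
    tokens = text.lower().split()
    vec = np.array([tokens.count(w) for w in vocab], dtype=float)
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

def retrieve(query, corpus, top_k=2):
    """Rank corpus facts by cosine similarity to the query embedding."""
    vocab = sorted({w for t in corpus + [query] for w in t.lower().split()})
    q = embed(query, vocab)
    scored = sorted(corpus, key=lambda fact: -float(np.dot(q, embed(fact, vocab))))
    return scored[:top_k]

corpus = [
    "eating food gives the body energy",
    "energy is what the body needs to soothe its hunger",
    "playing is a fun activity",
]
best = retrieve("if you are hungry you should eat food", corpus, top_k=1)
```

A learned encoder would also match paraphrases (e.g., "hungry" to "hunger"), which is exactly what this word-overlap toy cannot do and why a latent space helps.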
Our generation system, eXplanation Generator (XG), is based on OpenAI’s GPT-2 model. XG can generate the common-sense properties for a given question, answer choice, and the (in)correctness flag for that answer choice (as shown in the upper part of Figure 3). XG also has a free-flow explanation generation model that produces explanations in natural language (as shown in the lower part of Figure 3).
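Conditioning a GPT-2-style generator on a (question, answer choice, flag) triple amounts to serializing the triple into a single prompt string. The tag names and layout below are illustrative assumptions, not the paper’s exact input format:

```python
def build_xg_prompt(question, answer_choice, is_correct):
    """Serialize one (question, choice, flag) triple into a single
    conditioning string for a GPT-2-style generator.

    The 'question:'/'choice:'/'label:' tags are hypothetical; the
    real XG input encoding may differ.
    """
    flag = "correct" if is_correct else "incorrect"
    return (
        f"question: {question} "
        f"choice: {answer_choice} "
        f"label: {flag} "
        f"explanation:"
    )

prompt = build_xg_prompt(
    "If you are hungry, what should you do?", "eat food", True
)
```

The model is then asked to continue the string after `explanation:`, so the same network can produce supporting text for correct choices and refuting text for incorrect ones, steered only by the flag.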
Evaluating the common sense verdict
XR outperformed one popular information retrieval method—known as BM25—by a relative gain of 100% when retrieving explanations for the correct answers from our ECQA corpus of annotated properties.
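For readers unfamiliar with the baseline, BM25 scores a document by term frequency, inverse document frequency, and a length normalization. A minimal sketch of the classical Okapi BM25 formula (not the exact baseline configuration used in our experiments) looks like this:

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Minimal Okapi BM25: score each doc against the query.

    k1 and b are the standard free parameters (term-frequency
    saturation and length normalization).
    """
    tokenized = [d.lower().split() for d in docs]
    N = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / N
    df = Counter()                      # document frequency per term
    for d in tokenized:
        for term in set(d):
            df[term] += 1
    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (N - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            )
        scores.append(score)
    return scores

docs = [
    "eating food gives the body energy",
    "energy is what the body needs to soothe its hunger",
    "playing is a fun activity",
]
scores = bm25_scores("hungry eat food", docs)
```

Like the toy embedding above, BM25 rewards only exact term matches, which is one reason a learned latent-space retriever such as XR can improve on it substantially.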
Our “property generation model” earned a respectable alignment F1 score of 36.4 between generated properties and gold properties. Here, the F1 score captures the quality of the overlap between the generated set of properties and the gold set, averaged over the entire test set. Further, our “free-flow explanation generation model” achieved an STS-BERT score (a semantic similarity score) of 61.9 with gold free-flow explanations.
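The set-level F1 computation can be sketched as follows. The alignment in the paper is based on semantic similarity; this sketch parameterizes the matching predicate and, for the demonstration, plugs in a simple normalized exact match:

```python
def property_f1(generated, gold, match):
    """Set-overlap F1 between generated and gold properties.

    `match(a, b)` decides when two properties align; the paper uses a
    semantic-similarity alignment, which this predicate stands in for.
    """
    if not generated or not gold:
        return 0.0
    # precision: fraction of generated properties that align with some gold one
    precision = sum(
        1 for g in generated if any(match(g, ref) for ref in gold)
    ) / len(generated)
    # recall: fraction of gold properties covered by some generated one
    recall = sum(
        1 for ref in gold if any(match(g, ref) for g in generated)
    ) / len(gold)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

exact = lambda a, b: a.strip(" .").lower() == b.strip(" .").lower()

gen = ["eating food gives the body energy", "food is tasty"]
gold = ["Eating food gives the body energy.", "Energy soothes hunger."]
f1 = property_f1(gen, gold, exact)   # one of two on each side aligns
```

In the example, one of the two generated properties aligns with one of the two gold properties, giving precision and recall of 0.5 each and hence an F1 of 0.5.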
To ensure that the performance numbers for both of our generative models capture human perception of the quality of the generated explanations, we chose the semantic similarity metrics (from among STS-BERT, SPICE, CIDEr, METEOR, and ROUGE) that we found to be maximally correlated with human judgment.
Our research opens up several avenues for researchers as well as practitioners in the field. One prominent real-world application is primary education, where the techniques developed in this work could be used to build novel AI apps that converse with children to boost their general understanding of the world around them, using common-sense explanations. This could include helping them better understand why some routine phenomena in their lives behave as they do—e.g., why one should give way to a speeding ambulance.
Further, the underlying concept for introducing explainability into AI could likewise be applied beyond common sense questions. With the right domain expertise added to the ECQA dataset, AI could be made to explain right and wrong answers in any number of areas, including science, medicine, or finance.
- Shourya Aggarwal, Divyanshu Mandowara, Vishwajeet Agrawal, Dinesh Khandelwal, Parag Singla, Dinesh Garg. Explanations for CommonsenseQA: New Dataset and Models. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). (2021).↩
- Alon Talmor, Jonathan Herzig, Nicholas Lourie, and Jonathan Berant. CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge. Proceedings of NAACL-HLT, pages 4149–4158. (2019).↩
- Nazneen Fatema Rajani, Bryan McCann, Caiming Xiong, and Richard Socher. Explain Yourself! Leveraging Language Models for Commonsense Reasoning. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. (2019).↩