Publication
SOLI 2017
Conference paper
Effective data curation for frequently asked questions
Abstract
Frequently asked question (FAQ) systems help operate IT services and reduce their costs. Preparing FAQ data requires curating the available heterogeneous question-and-answer (QA) data sets and creating FAQ clusters. We identified the labor intensiveness of data curation as a major problem that strongly affects the quality of the final FAQ output. To address this problem, we designed a FAQ creation system with a strong focus on the effectiveness of its data-curation component. We conducted a field study on two sources: incident reports and a QA forum. The incident reports yielded a high F-score of 89.9% (precision: 82.5%, recall: 100%). Applying the same set of parameters to 300 entries from the QA forum, we achieved an F-score of 94.3% (precision: 94.9%, recall: 93.8%).
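For readers who want to relate the quoted precision and recall figures to an F-score, the following is a minimal sketch that assumes the balanced F1 measure (the harmonic mean of precision and recall). The abstract does not say which F-score variant or averaging scheme the paper uses, so the function name `f_score` and the `beta` parameter are illustrative assumptions, not the authors' method.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """Return the F-beta score; beta=1 gives the harmonic mean (F1).

    Assumption: the abstract does not state which F-score variant was used.
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when both inputs are zero
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision/recall pairs quoted in the abstract, expressed as fractions.
incident_reports = f_score(0.825, 1.000)
qa_forum = f_score(0.949, 0.938)

print(f"incident reports: {incident_reports:.1%}")
print(f"QA forum:         {qa_forum:.1%}")
```

With these rounded inputs, the balanced F1 comes to about 94.3% for the QA forum, matching the reported value, and about 90.4% for the incident reports. The abstract reports 89.9% for the latter, so the paper may rely on unrounded inputs or a different averaging scheme.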