A multi-layer system for semantic textual similarity

Ngoc Phuoc An Vo; Octavian Popescu

doi:10.5220/0006045800560067

IC3K 2016

Conference paper

09 Nov 2016

Abstract

Building a system able to cope with the various phenomena that fall under the umbrella of semantic similarity is far from trivial. A system's performance almost always varies inconsistently and unpredictably from one corpus to another. We analyzed the source of this variance and found that it is related to the distribution of word-pair similarity across the topics of the various corpora. We then used this insight to build a four-module system that takes into account not only string and semantic word similarity, but also word alignment and sentence structure. The system consistently achieves accuracy very close to the state of the art, and in some cases sets a new state of the art. It is based on a multi-layer architecture and can deal with heterogeneous corpora that may not have been generated by the same distribution.
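The abstract does not say how the four modules are implemented or combined. Purely as an illustration, the Python sketch below turns one sentence pair into four module scores: string similarity, word-level semantic similarity, greedy word alignment, and a crude sentence-structure proxy. It then averages those scores. Everything in it is an assumption and not the authors' method. That includes every function name, the `difflib`-based stand-in for a semantic word-similarity resource, the 0.8 alignment threshold, and the equal weights. A real system would plug in a lexical-semantic resource and learn how to combine the module outputs from annotated data.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, List, Optional

_PUNCT = ".,;:!?\"'()"


def tokenize(sentence: str) -> List[str]:
    # Lowercased whitespace tokenization with surrounding punctuation stripped.
    tokens = (t.strip(_PUNCT).lower() for t in sentence.split())
    return [t for t in tokens if t]


def string_similarity(a: str, b: str) -> float:
    # Module 1 (hypothetical): character-level similarity of the raw strings, in [0, 1].
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def word_similarity(w1: str, w2: str) -> float:
    # Hypothetical stand-in for a semantic word-similarity resource
    # (e.g. WordNet or embeddings); here just character overlap of the two words.
    return SequenceMatcher(None, w1, w2).ratio()


def semantic_word_score(ta: List[str], tb: List[str],
                        sim: Callable[[str, str], float] = word_similarity) -> float:
    # Module 2 (hypothetical): for each token take its best match in the other
    # sentence, average per direction, then average the two directions.
    if not ta or not tb:
        return 0.0

    def directed(x: List[str], y: List[str]) -> float:
        return sum(max(sim(w, v) for v in y) for w in x) / len(x)

    return (directed(ta, tb) + directed(tb, ta)) / 2


def alignment_score(ta: List[str], tb: List[str],
                    sim: Callable[[str, str], float] = word_similarity,
                    threshold: float = 0.8) -> float:
    # Module 3 (hypothetical): greedy one-to-one alignment. Each token in ta is
    # paired with its best unused token in tb; pairs below threshold stay unaligned.
    if not ta or not tb:
        return 0.0
    used = set()
    aligned = 0
    for w in ta:
        best_j, best_s = -1, 0.0
        for j, v in enumerate(tb):
            if j in used:
                continue
            s = sim(w, v)
            if s > best_s:
                best_j, best_s = j, s
        if best_s >= threshold:
            used.add(best_j)
            aligned += 1
    # Fraction of all tokens (from both sentences) that ended up aligned.
    return 2 * aligned / (len(ta) + len(tb))


def structure_score(ta: List[str], tb: List[str]) -> float:
    # Module 4 (hypothetical, very crude): length ratio as a proxy for structural similarity.
    if not ta or not tb:
        return 0.0
    return min(len(ta), len(tb)) / max(len(ta), len(tb))


def features(a: str, b: str) -> Dict[str, float]:
    # Collect one score per module for the sentence pair (a, b).
    ta, tb = tokenize(a), tokenize(b)
    return {
        "string": string_similarity(a, b),
        "semantic": semantic_word_score(ta, tb),
        "alignment": alignment_score(ta, tb),
        "structure": structure_score(ta, tb),
    }


def combine(feats: Dict[str, float],
            weights: Optional[Dict[str, float]] = None) -> float:
    # Placeholder combination: equal weights. A trained regressor would replace this.
    if weights is None:
        weights = {k: 1.0 / len(feats) for k in feats}
    return sum(weights[k] * feats[k] for k in feats)
```

For example, `combine(features("A man is playing a guitar.", "A man plays the guitar."))` should score higher than the same sentence paired with an unrelated one. Keeping each module as a separate feature, rather than folding all of them into a single measure, lets the combination step weight the modules differently, which is one way to handle corpora whose word-pair similarity distributions differ.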