SenseSpotting: Never let your parallel data tie you to an old domain

Marine Carpuat; H. Daumé III; Katharine Henry; Ann Irvine; Jagadeesh Jagarlamudi; Rachel Rudinger

ACL 2013

Conference paper

04 Aug 2013

Abstract

Words often gain new senses in new domains. Being able to automatically identify, from a corpus of monolingual text, which word tokens are being used in a previously unseen sense has applications to machine translation and other tasks sensitive to lexical semantics. We define a task, SenseSpotting, in which we build systems to spot tokens that have new senses in new-domain text. Instead of relying on difficult and expensive annotation, we build a gold standard by leveraging cheaply available parallel corpora, targeting our approach to the problem of domain adaptation for machine translation. Our system achieves F-measures of as much as 80% when applied to word types it has never seen before. Our approach is based on a large set of novel features that capture varied aspects of how words change when used in new domains. © 2013 Association for Computational Linguistics.
