Extracting Semantic Hierarchies from A Large On-Line Dictionary

Martin S. Chodorow

ACL 1985

Conference paper

08 Jul 1985

Extracting Semantic Hierarchies from A Large On-Line Dictionary

Abstract

Dictionaries are rich sources of detailed semantic information, but in order to use the information for natural language processing, it must be organized systematically. This paper describes automatic and semi-automatic procedures for extracting and organizing semantic fea- ture information implicit in dictionary definitions. Two head-finding heuristics are described for locating the genus terms in noun and verb definitions. The assumption is that the genus term represents inherent features of the word it defines. The two heuristics have been used to process definitions of 40,000 nouns and 8,000 verbs, producing indexes in which each genus term is associated with the words it defined. The Sprout program interactively grows a taxonomic "tree" from any specified root feature by consulting the genus index. Its output is a tree in which all of the nodes have the root feature for at least one of their senses. The Filter program uses an inverted form of the genus index. Filtering begins with an initial filter file consisting of words that have a given feature (e.g. [+human]) in all of their senses. The program then locates, in the index, words whose genus terms all appear in the filter file. The output is a list of new words that have the given feature in all of their senses.

Conference paper