Evaluating computer-generated domain-oriented vocabularies
Abstract
It is generally accepted that natural language understanding systems cannot currently deal successfully with unrestricted text, except in very superficial ways. Certainly no current NL system exhibits any significant degree of understanding over arbitrary subject matter, and there is no convincing reason to believe this situation will change in the near future. Successful systems have therefore been restricted to specific applications in particular discourse domains. Where users are expected to provide the domain vocabulary (e.g., TEAM, TQA), it would be very desirable to offer at least suggestions as to what this vocabulary might be, because a good part of the difficulty in customizing a general system lies in supplying the domain vocabulary and specifying its grammatical properties. This paper discusses some methods for identifying domain vocabulary, as well as techniques for evaluating the quality of the resulting word list. © 1990.