About cookies on this site Our websites require some cookies to function properly (required). In addition, other cookies may be used with your consent to analyze site usage, improve the user experience and for advertising. For more information, please review your options. By visiting our website, you agree to our processing of information as described in IBM’sprivacy statement. To provide a smooth navigation, your cookie preferences will be shared across the IBM web domains listed here.
Abstract
The Web is a vast source of information. However, due to the disparate authorship of Web pages, this information is buried in its amorphous and chaotic structure. At the same time, with the pervasiveness of Web access, an increasing number of users is relying on Web search engines for interesting information. We are increased in identifying how pieces of information are related as they are presented on the Web. One such problem is studying patterns of occurrences of related phrases in Web documents and in identifying relationships between these phrases. We call these the duality problems of the Web. Duality problems are materialized in trying to define and identify two sets of inter-related concepts, and are solved by iteratively refining mutually dependent coarse definitions of these concepts. In this paper we define and formalize the general duality problem of relations on the Web. Duality of patterns and relationships are of importance because they allow us to define the rules of patterns and relationships iteratively through the multitude of their occurrences. Our solution includes Web crawling to iteratively refine the definition of patterns and relations. As an example we solve the problem of identifying acronyms and their expansions through patterns of occurrences of (acronym, expansion) pairs as they occur in Web pages.