Publication
EITC 2005
Conference paper
Advanced technology for managing XML document collection
Abstract
Organizing large document collections so that information can be found easily and quickly has always been a challenging problem. In the last few years, XML has become the de facto standard for content publishing and data exchange. The proliferation of XML documents and data has created new challenges and opportunities for managing document collections. Existing technologies for automatically organizing document collections are either imprecise or based only on simple grouping criteria. Since XML documents are self-describing, it is possible to automatically categorize them precisely, according to their content. With the availability of standard XML query languages such as XQuery, much more powerful folder and categorization technologies are now feasible. To address this new challenge and exploit this new opportunity, this paper describes a new and powerful categorization technology. This technology fully exploits the rich data model and semantic information embedded in XML documents to dynamically categorize XML document collections precisely. Besides supporting directory-like document look-up operations, this technology also provides advanced operations such as multi-path navigation and document traversal across multiple collections. A preliminary performance study shows that this new categorization technology is both efficient and scalable. Thus, it is an ideal technology for automating the process of organizing and categorizing XML documents. © 2005 IEEE.
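The general idea of query-driven categorization can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes each folder is defined by a path expression, and it substitutes the limited XPath subset in Python's `xml.etree.ElementTree` for XQuery. The sample documents, element names, and the `categorize` helper are all hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample collection; element names are illustrative only.
DOCS = {
    "a.xml": "<paper><venue>EITC</venue><year>2005</year></paper>",
    "b.xml": "<paper><venue>VLDB</venue><year>2005</year></paper>",
    "c.xml": "<report><year>2004</year></report>",
}

# Assumption: each folder family is defined by one path expression.
# ElementTree's XPath subset stands in for XQuery here.
FOLDER_RULES = {
    "by-venue": "./venue",
    "by-year": "./year",
}


def categorize(docs, rules):
    """Map (folder family, value) to the documents whose content matches."""
    folders = defaultdict(list)
    for name, text in docs.items():
        root = ET.fromstring(text)
        for family, path in rules.items():
            # A document lands in one folder per matching node, so the
            # same document can be reached along several paths.
            for node in root.findall(path):
                value = (node.text or "").strip()
                folders[(family, value)].append(name)
    return dict(folders)


FOLDERS = categorize(DOCS, FOLDER_RULES)
```

In this sketch `a.xml` is reachable both through the venue folder and through the year folder, which loosely mirrors the multi-path navigation mentioned in the abstract. Because folder membership is derived from document content, re-running `categorize` after the collection changes regroups the documents without manual filing.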