Template detection via data mining and its applications

Ziv Bar-Yossef; Sridhar Rajagopalan

doi:10.1145/511446.511522

WWW 2002

Conference paper

01 Dec 2002

Abstract

We formulate and propose the template detection problem, and suggest a practical solution for it based on counting frequent item sets. We show that the use of templates is pervasive on the web. We describe three principles, which characterize the assumptions made by hypertext information retrieval (IR) and data mining (DM) systems, and show that templates are a major source of violation of these principles. As a consequence, basic "pure" implementations of simple search algorithms coupled with template detection and elimination show surprising increases in precision at all levels of recall.
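
As a rough illustration of the frequency-counting idea mentioned in the abstract, the sketch below flags text blocks that recur across many pages of one site and removes them before indexing. It is not the authors' algorithm: the abstract does not specify how pages are segmented or how item sets are formed, so the block-per-page representation, the function names `detect_template_blocks` and `strip_template`, the `min_support` threshold, and the sample pages are all assumptions made for this example. It also counts only single items (item sets of size one).

```python
from collections import Counter
from typing import Dict, List, Set


def detect_template_blocks(pages: Dict[str, List[str]], min_support: int = 3) -> Set[str]:
    """Return blocks that occur on at least `min_support` distinct pages.

    Hypothetical helper: each page is assumed to be pre-split into text blocks.
    """
    support: Counter = Counter()
    for blocks in pages.values():
        # Count each block once per page, so support = number of pages containing it.
        support.update(set(blocks))
    return {block for block, count in support.items() if count >= min_support}


def strip_template(pages: Dict[str, List[str]], template: Set[str]) -> Dict[str, List[str]]:
    """Drop template blocks from every page, keeping the remaining blocks in order."""
    return {url: [b for b in blocks if b not in template] for url, blocks in pages.items()}


# Made-up sample data: three pages sharing a navigation bar and a footer.
pages = {
    "/a": ["Home | About | Contact", "Article on frequent item sets", "Copyright Example Corp"],
    "/b": ["Home | About | Contact", "Notes on hypertext retrieval", "Copyright Example Corp"],
    "/c": ["Home | About | Contact", "A page about precision and recall", "Copyright Example Corp"],
}

template = detect_template_blocks(pages, min_support=3)
cleaned = strip_template(pages, template)
```

Here the shared navigation bar and footer reach the support threshold and are removed, leaving only page-specific text for a downstream search or mining step. A fuller treatment would need approximate matching of near-identical blocks and a principled choice of which pages to group together, neither of which this sketch attempts.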
