Publication
WWW 2010
Conference paper

Topic initiator detection on the world wide web

View publication

Abstract

In this paper we introduce a new Web mining and search technique - Topic Initiator Detection (TID) on the Web. Given a topic query on the Internet and the resulting collection of time-stamped web documents which contain the query keywords, the task of TID is to automatically return which web document (or its author) initiated the topic or was the first to discuss about the topic. To deal with the TID problem, we design a system framework and propose algorithm InitRank (Initiator Ranking) to rank the web documents by their possibility to be the topic initiator. We first extract features from the web documents and design several topic initiator indicators. Then, we propose a TCL graph which integrates the Time, Content and Link information and design an optimization framework over the graph to compute InitRank. Experiments show that compared with baseline methods, such as direct time sorting, well-known link based ranking algorithms PageRank and HITS, InitRank achieves the best overall performance with high effectiveness and robustness. In case studies, we successfully detected (1) the first web document related to a famous rumor of an Australia product banned in USA and (2) the pre-release of IBM and Google Cloud Computing collaboration before the official announcement. © 2010 International World Wide Web Conference Committee (IW3C2).

Date

Publication

WWW 2010

Authors

Topics

Share