Stochastic models for the web graph
Abstract
The web may be viewed as a directed graph each of whose vertices is a static HTML web page, and each of whose edges corresponds to a hyperlink from one web page to another. In this paper we propose and analyze random graph models inspired by a series of empirical observations on the web. Our graph models differ from the traditional Gn,p models in two ways: 1. Independently chosen edges do not result in the statistics (degree distributions, clique multitudes) observed on the web. Thus, edges in our model are statistically dependent on each other. 2. Our model introduces new vertices in the graph as time evolves. This captures the fact that the web is changing with time. Our results are two fold: we show that graphs generated using our model exhibit the statistics observed on the web graph, and additionally, that natural graph models proposed earlier do not exhibit them. This remains true even when these earlier models are generalized to account for the arrival of vertices over time. In particular, the sparse random graphs in our models exhibit properties that do not arise in far denser random graphs generated by Erdos-Renyi models.