Distribution-free bounds for relational classification



Statistical relational learning (SRL) is a subarea of machine learning that addresses statistical inference on data that are correlated rather than independently and identically distributed (i.i.d.), as is generally assumed. For the traditional i.i.d. setting, distribution-free bounds such as the Hoeffding bound exist; they provide confidence bounds on the generalization error of a classification algorithm given its hold-out error on a sample of size N. No bounds of this form currently exist for the types of interactions in the data that relational classification algorithms consider. In this paper, we extend the Hoeffding bounds to the relational setting. In particular, we derive distribution-free bounds for certain classes of data generation models that do not produce i.i.d. data and that are based on the types of interactions considered by relational classification algorithms developed in SRL. We conduct empirical studies on synthetic and real data, which show that these data generation models are indeed realistic and that the derived bounds are tight enough for practical use. © 2011 Springer-Verlag London Limited.
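For orientation, the classical i.i.d. Hoeffding bound mentioned above can be sketched as follows. This is a minimal illustration of the standard two-sided bound for a loss in [0, 1], P(|true error - hold-out error| >= eps) <= 2 exp(-2 N eps^2); it does not reproduce the relational bounds derived in the paper, which the abstract does not state. The function names and the default confidence level are assumptions made for this example.

```python
import math


def hoeffding_epsilon(n, delta):
    """Half-width eps of the two-sided Hoeffding interval.

    Solves 2 * exp(-2 * n * eps**2) = delta for eps, assuming n i.i.d.
    hold-out examples and a 0/1 loss. (Hypothetical helper name.)
    """
    if n <= 0:
        raise ValueError("n must be a positive sample size")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def hoeffding_interval(holdout_error, n, delta=0.05):
    """Interval containing the generalization error with probability
    at least 1 - delta, clipped to the valid error range [0, 1]."""
    eps = hoeffding_epsilon(n, delta)
    return max(0.0, holdout_error - eps), min(1.0, holdout_error + eps)


if __name__ == "__main__":
    # Example: hold-out error of 0.10 measured on N = 1000 examples.
    low, high = hoeffding_interval(0.10, 1000, delta=0.05)
    print(f"95% interval: [{low:.4f}, {high:.4f}]")
```

Because the half-width shrinks as 1/sqrt(N), quadrupling the hold-out sample halves the interval. That scaling relies on independence between examples, which is the assumption relational data violates.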


08 May 2011