Effective estimation of posterior probabilities: Explaining the accuracy of randomized decision tree approaches

Wei Fan; Ed Greengrass; Joe McCloskey; Philip S. Yu; Kevin Drummey

doi:10.1109/ICDM.2005.54

ICDM 2005

Conference paper

27 Nov 2005



Abstract

There has been an increasing number of independently proposed methods that introduce randomization at different stages of decision tree construction in order to build multiple trees. Randomized decision tree methods have been reported to be significantly more accurate than widely accepted single decision trees. This holds even though the training procedure of some of these methods incorporates a surprising degree of randomness, which runs counter to the generally accepted practice of using gain functions to choose the optimal feature at each node and computing a single tree that fits the data. One important question that is not yet well understood is the reason behind this high accuracy. We provide an insight based on posterior probability estimation. We first establish the relationship between effective posterior probability estimation and effective loss reduction. We then argue that randomized decision tree methods effectively approximate the true probability distribution using the decision tree hypothesis space. We conduct experiments using both synthetic and real-world datasets under both 0-1 and cost-sensitive loss functions. © 2005 IEEE.
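The link between posterior probability estimation and loss reduction can be illustrated with a small sketch. It assumes that each randomized tree returns a class-probability vector for an example x and that the ensemble averages these vectors. The helper names (`average_posterior`, `decide_zero_one`, `decide_cost_sensitive`) and the cost matrix are hypothetical illustrations, not code from the paper. Given a posterior estimate, the loss-minimizing prediction is the class with the smallest expected loss. Under 0-1 loss, this is simply the most probable class. Under a cost-sensitive loss, the decision depends on the actual probability values, which is why accurate estimates matter.

```python
from typing import List, Sequence


def average_posterior(tree_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average per-tree posterior estimates P_k(y | x) into one estimate.

    Assumption: every tree outputs a probability vector over the same classes.
    """
    n_trees = len(tree_probs)
    n_classes = len(tree_probs[0])
    return [sum(p[c] for p in tree_probs) / n_trees for c in range(n_classes)]


def decide_zero_one(posterior: Sequence[float]) -> int:
    """Under 0-1 loss the expected loss of predicting c is 1 - P(c | x),
    so the optimal prediction is the class with the largest posterior."""
    return max(range(len(posterior)), key=lambda c: posterior[c])


def decide_cost_sensitive(posterior: Sequence[float],
                          cost: Sequence[Sequence[float]]) -> int:
    """cost[pred][true] is the (hypothetical) loss of predicting `pred`
    when the true class is `true`. Pick the prediction that minimizes
    the expected loss sum_true P(true | x) * cost[pred][true]."""
    def expected_loss(pred: int) -> float:
        return sum(posterior[t] * cost[pred][t] for t in range(len(posterior)))
    return min(range(len(cost)), key=expected_loss)


if __name__ == "__main__":
    # Three illustrative trees voting on a binary problem.
    trees = [[0.9, 0.1], [0.6, 0.4], [0.3, 0.7]]
    p = average_posterior(trees)
    print(p)
    print(decide_zero_one(p))
    # Missing class 1 costs 5; a false alarm on class 1 costs 1.
    print(decide_cost_sensitive(p, [[0, 5], [1, 0]]))
```

With these three trees the averaged posterior is about (0.6, 0.4). Under 0-1 loss, class 0 is chosen. If missing class 1 costs five times as much as a false alarm, the expected losses are 2.0 for predicting class 0 and 0.6 for predicting class 1, so class 1 is chosen. The same ensemble therefore yields different optimal decisions depending on the loss function.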
