Efficient construction of regression trees with range and region splitting
Abstract
We propose a method for constructing regression trees with range and region splitting. We present an efficient algorithm for computing the optimal two-dimensional region that minimizes the mean squared error of an objective numeric attribute in a given database. As two-dimensional regions, we consider a class ℛ of grid regions, such as "x-monotone," "rectilinear-convex," and "rectangular" regions, in the plane associated with two numeric attributes, and we compute the optimal region R ∈ ℛ. In constructing regression trees, we propose using a test that splits the data into the records that lie inside the region R and those that lie outside it. Experiments confirm that region splitting yields compact and accurate regression trees in many domains.
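To make the optimization objective concrete, the following is a minimal sketch for the rectangular class only. It is an exhaustive search over grid rectangles, not the efficient algorithm the abstract refers to. It assumes the two numeric attributes have already been bucketed into a grid, with `count[i][j]` holding the number of records in a cell and `total[i][j]` holding the sum of the objective attribute there. These names, and the function `best_rectangle_split`, are hypothetical. The sketch relies on the standard identity that, for a fixed data set, minimizing the mean squared error of an inside/outside split is equivalent to maximizing S_in^2/n_in + S_out^2/n_out, where S and n are the sum and count on each side.

```python
def prefix_sums(grid):
    """2-D prefix sums: p[i][j] is the sum of grid[0:i][0:j]."""
    rows, cols = len(grid), len(grid[0])
    p = [[0.0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows):
        for j in range(cols):
            p[i + 1][j + 1] = grid[i][j] + p[i][j + 1] + p[i + 1][j] - p[i][j]
    return p


def rect_sum(p, i1, i2, j1, j2):
    """Sum over rows i1..i2-1 and columns j1..j2-1, using prefix sums p."""
    return p[i2][j2] - p[i1][j2] - p[i2][j1] + p[i1][j1]


def best_rectangle_split(count, total):
    """Exhaustively find the grid rectangle whose inside/outside split
    minimizes squared error (hypothetical helper, rectangles only).

    Returns (score, (i1, i2, j1, j2)) with half-open index ranges, where
    score = S_in^2 / n_in + S_out^2 / n_out. A larger score means a
    smaller squared error for the split.
    """
    rows, cols = len(count), len(count[0])
    pc, pt = prefix_sums(count), prefix_sums(total)
    n_all, s_all = pc[rows][cols], pt[rows][cols]
    best_score, best_rect = float("-inf"), None
    for i1 in range(rows):
        for i2 in range(i1 + 1, rows + 1):
            for j1 in range(cols):
                for j2 in range(j1 + 1, cols + 1):
                    n_in = rect_sum(pc, i1, i2, j1, j2)
                    n_out = n_all - n_in
                    if n_in == 0 or n_out == 0:
                        continue  # a split must leave records on both sides
                    s_in = rect_sum(pt, i1, i2, j1, j2)
                    s_out = s_all - s_in
                    score = s_in * s_in / n_in + s_out * s_out / n_out
                    if score > best_score:
                        best_score, best_rect = score, (i1, i2, j1, j2)
    return best_score, best_rect


# Toy 4x4 grid: one record per cell, objective value 10 inside a 2x2 block.
count = [[1] * 4 for _ in range(4)]
total = [[0.0] * 4 for _ in range(4)]
for i in (1, 2):
    for j in (1, 2):
        total[i][j] = 10.0

score, rect = best_rectangle_split(count, total)
print(score, rect)
```

With prefix sums, each candidate rectangle is evaluated in constant time, but the search still visits every rectangle, so its cost grows with the fourth power of the grid side length. It is meant only to illustrate what "optimal region" means here. The x-monotone and rectilinear-convex classes admit far more regions than this enumeration covers.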