# Efficient construction of regression trees with range and region splitting

## Abstract

We propose an efficient way of constructing re gression trees in order to predict the objective nu meric attribute values of given tuples. A regres sion tree is a rooted binary tree such that each internal node contains a test, which can be ex pressed as an RDB query, for splitting tuples into two disjoint classes and passing data in each class down to the left or right subtree. The mean of the objective attribute values at the leaf is used as the predicted value of the tuple. To test a numeric attribute, traditional approaches use a guillotine-cut splitting that classifies data into those below a given value and others. Instead, we consider a family R of grid-regions in the plane associated with two given numeric at-tributes. We propose to use a test that splits data into those that lie inside a region R and those that lie outside. The contributions of this paper are as follows. We present an efficient algorithm for computing R 6 Jl that minimizes the mean squared error after the introduction of the test with the region R. Experiments confirmed that the use of region splitting gives a smaller mean squared error of re gression trees. Our approach can also generate smaller regression trees.