Trees Weighting Random Forest method for classifying high-dimensional noisy data

Hong Bo Li; Wei Wang; Hong Wei Ding; Jin Dong

doi:10.1109/ICEBE.2010.99

ICEBE 2010

Conference paper

01 Dec 2010

Abstract

Random forest is an ensemble learning method composed of multiple decision trees, each grown on a random sample of the input data and split at each node on a random subset of features. Owing to its good classification and generalization ability, random forest has been applied successfully in various domains. However, random forest generates many noisy trees when it learns from a high-dimensional data set that contains many noise features. These noisy trees reduce classification accuracy and can even lead to wrong decisions for new instances. In this paper, we present a new approach to this problem, named Trees Weighting Random Forest (TWRF), which weights the trees according to their classification ability. The Out-Of-Bag data, that is, the subset of the training data left out by Bagging and not used to build a given decision tree, is used to evaluate that tree. For simplicity, we choose accuracy as the index of a tree's classification ability and set it as the tree's weight. Experiments show that TWRF performs better than the original random forest and other traditional methods, such as C4.5 and Naïve Bayes. © 2010 IEEE.
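The weighting idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: one-split stumps stand in for full decision trees, the function names (`fit_stump`, `fit_twrf`, `predict_twrf`) and parameters are hypothetical, and combining trees by a weighted vote is an assumption about how the weights are used, since the abstract only states that each tree's Out-Of-Bag accuracy is set as its weight.

```python
import random
from collections import Counter, defaultdict

def fit_stump(X, y, features):
    """Fit a one-split tree (a simplified stand-in for a full decision tree)."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = sum(lab != l_lab for lab in left) + sum(lab != r_lab for lab in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:  # no valid split: always predict the majority class
        majority = Counter(y).most_common(1)[0][0]
        return (0, float("inf"), majority, majority)
    return best[1:]  # (feature, threshold, left label, right label)

def predict_stump(stump, row):
    f, t, l_lab, r_lab = stump
    return l_lab if row[f] <= t else r_lab

def fit_twrf(X, y, n_trees=25, n_features=2, seed=0):
    """Grow trees on bootstrap samples; weight each by its Out-Of-Bag accuracy."""
    rng = random.Random(seed)
    n = len(X)
    forest = []
    for _ in range(n_trees):
        in_bag = [rng.randrange(n) for _ in range(n)]      # bootstrap sample (Bagging)
        oob = sorted(set(range(n)) - set(in_bag))          # rows this tree never saw
        features = rng.sample(range(len(X[0])), n_features)  # random feature subset
        stump = fit_stump([X[i] for i in in_bag], [y[i] for i in in_bag], features)
        if oob:
            weight = sum(predict_stump(stump, X[i]) == y[i] for i in oob) / len(oob)
        else:
            weight = 0.0  # assumption: a tree with no OOB rows gets no say
        forest.append((stump, weight))
    return forest

def predict_twrf(forest, row):
    """Weighted vote (assumed aggregation): each tree adds its weight to its class."""
    votes = defaultdict(float)
    for stump, weight in forest:
        votes[predict_stump(stump, row)] += weight
    return max(votes, key=votes.get)

# Synthetic data: feature 0 is informative, features 1-3 are pure noise.
data_rng = random.Random(1)
X, y = [], []
for i in range(60):
    label = i % 2
    X.append([label + data_rng.gauss(0, 0.2)] + [data_rng.random() for _ in range(3)])
    y.append(label)

forest = fit_twrf(X, y)
prediction = predict_twrf(forest, [1.0, 0.5, 0.5, 0.5])
```

In this sketch, trees that happen to split on a noise feature tend to score near chance on their Out-Of-Bag rows, so they contribute less to the vote than trees that found the informative feature. An unweighted random forest would instead give every tree an equal vote.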
