Classifier Recommendation Using Data Complexity Measures
Abstract
Applying machine learning to new and unfamiliar domains calls for increasing automation in choosing a learning algorithm suited to the data arising from each domain. Meta-learning could address this need, since it has been widely used in recent years to recommend the most suitable algorithms for a new dataset. Using data complexity measures as meta-features can make the resulting meta-models more interpretable and helps differentiate the performance of a set of techniques by capturing the class overlap imposed by feature values, as well as the separability and distribution of the data points. In this paper we compare the effectiveness of several standard regression models in predicting the accuracies of classifiers on classification problems from the OpenML repository. We show that the models can predict the classifiers' accuracies with low mean squared error and identify the best classifier for a problem, yielding statistically significant improvements over a randomly chosen classifier or a fixed classifier believed to be good on average.
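To make the setup concrete, the following is a minimal sketch, not the paper's actual pipeline, of how complexity measures can serve as meta-features for a regression meta-model that recommends a classifier. The number of measures, the one-hot encoding of candidate classifiers, and the random placeholder data (standing in for real OpenML results) are all illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Meta-dataset: one row per (dataset, classifier) pair.
# X_meta columns: data complexity measures of the dataset
# plus a one-hot encoding of the candidate classifier.
# y_meta: observed test accuracy of that classifier on that dataset.
rng = np.random.default_rng(0)
n_datasets, n_classifiers, n_measures = 200, 5, 8

complexity = rng.random((n_datasets, n_measures))  # placeholder complexity measures
accuracy = rng.random((n_datasets, n_classifiers))  # placeholder observed accuracies

X_meta, y_meta = [], []
for d in range(n_datasets):
    for c in range(n_classifiers):
        X_meta.append(np.concatenate([complexity[d], np.eye(n_classifiers)[c]]))
        y_meta.append(accuracy[d, c])
X_meta, y_meta = np.array(X_meta), np.array(y_meta)

# Regression meta-model: predicts a classifier's accuracy from the
# complexity measures of the dataset it would be applied to.
meta_model = RandomForestRegressor(n_estimators=200, random_state=0)
meta_model.fit(X_meta, y_meta)

def recommend(complexity_measures):
    """Return the index of the classifier with the highest
    predicted accuracy for a new dataset."""
    candidates = np.array([
        np.concatenate([complexity_measures, np.eye(n_classifiers)[c]])
        for c in range(n_classifiers)
    ])
    return int(np.argmax(meta_model.predict(candidates)))

new_dataset_measures = rng.random(n_measures)
print("Recommended classifier index:", recommend(new_dataset_measures))
```

In this framing, recommending a classifier reduces to a regression problem over (dataset, classifier) pairs, which is what allows standard regression models to be compared by their mean squared error on held-out meta-data.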