Performance characterization of a data mining application via hardware-based monitoring
Abstract
In many fields, such as data mining and e-commerce, performance issues are typically addressed by waiting for the next generation of processors and/or distributing the application across a parallel environment. An alternative has been to instrument the code so that observation can drive modifications to improve performance. Success is typically measured by the improvement in wall-clock time of program execution. The latest generation of commercial processors (IBM Power/PowerPC, Compaq Alpha, Intel Pentium III) includes programmable hardware counters that gather data for performance monitoring. These counters allow internal processor events to be observed without impacting the performance of the program being monitored. This paper explores the use of performance monitoring to characterize the machine-learning-based data mining program C4.5 running on an IBM Power II processor node in an IBM RS/6000 SP. The development and verification of a methodology for using the performance monitoring hardware are presented. The starting point of this work is an existing performance monitoring application, which has been extended to allow monitoring of individual programs running on the single-chip implementation of the Power II architecture. Examples of the data collected from monitoring C4.5 are presented and analyzed. Drawing on the experience gained from the work on a single node, the potential issues in extending this methodology to a parallel environment such as the IBM RS/6000 SP are explored. © 2001 SPIE.