Pipe breaks in urban water distribution networks incur significant economic and social costs, putting both the service quality and the profits of water utilities at risk. To cope with this, utilities want scheduled preventive maintenance, which aims to predict which pipes are likely to break and to repair them proactively. Physical models developed for understanding and predicting pipe failure are usually expensive and can therefore be applied only to a limited number of trunk pipes. As an alternative, statistical models that predict pipe breaks from historical data are far less expensive and have therefore attracted considerable interest from water utilities recently. In this paper, we report a novel data mining prediction system built for a water utility in a large Chinese city. We describe various aspects of building such a system, including problem formulation, data cleaning, model construction, and the evaluation of attribute importance according to the requirements of end users in water utilities. Our prediction system achieves satisfactory results. For example, with the system trained on the dataset available at the end of 2010, the water utility could have avoided 50% of pipe breaks in 2011 by examining only 6.98% of its pipes in advance. During the construction of the system, we found that, interestingly, the extremely skewed distribution of break and non-break pipes is not an obstacle. This lesson could serve as a practical reference both for academic studies on imbalanced learning and for future explorations of pipe failure prediction problems. © 2013 IEEE.
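The headline figure above (50% of breaks caught by examining 6.98% of pipes) is a ranking-style measure: pipes are ordered by predicted risk, and one asks how far down the list an inspector must go to cover a target share of the breaks that actually occurred. A minimal sketch of that calculation follows, assuming each pipe has a risk score and a 0/1 break label for the evaluation year. The function name `inspection_fraction` and the toy data are hypothetical illustrations, not the authors' code or data.

```python
def inspection_fraction(scores, broke, target_recall=0.5):
    """Smallest fraction of pipes, taken in descending score order,
    that contains at least `target_recall` of all observed breaks.

    scores: predicted risk per pipe (higher means more likely to break)
    broke:  1 if the pipe broke in the evaluation period, else 0
    """
    if len(scores) != len(broke):
        raise ValueError("scores and broke must have the same length")
    if not 0 < target_recall <= 1:
        raise ValueError("target_recall must be in (0, 1]")
    total_breaks = sum(broke)
    if total_breaks == 0:
        raise ValueError("no breaks observed, recall is undefined")

    # Rank pipes from highest to lowest predicted risk.
    ranked = sorted(zip(scores, broke), key=lambda pair: pair[0], reverse=True)

    caught = 0
    for k, (_, label) in enumerate(ranked, start=1):
        caught += label
        if caught / total_breaks >= target_recall:
            return k / len(ranked)
    return 1.0


# Toy data (hypothetical): 10 pipes, 4 of which broke.
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.05, 0.03, 0.01]
toy_broke = [1, 0, 1, 0, 0, 1, 0, 0, 0, 1]

fraction = inspection_fraction(toy_scores, toy_broke, target_recall=0.5)
print(f"Inspect the top {fraction:.0%} of pipes to cover half of the breaks")
```

Because the measure depends only on the ordering of pipes, it stays meaningful even when breaks are a very small minority of the data, unlike plain accuracy.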