Exascale applications will exploit massive amounts of parallelism. Analyzing computation and communication requirements at the thread level provides important insight into application behavior and helps optimize the design of exascale architectures. Performing such an analysis is challenging because exascale systems are not yet available: the target applications can be profiled only on existing machines, processing significantly smaller data sets and exploiting significantly less parallelism. To tackle this problem, we propose a methodology that couples (a) unsupervised machine-learning techniques to consistently classify threads across different program runs, and (b) extrapolation techniques to learn how each thread class behaves at scale. The main contribution of this work is a classification methodology that assigns a class to each thread observed in a set of experimental runs carried out while varying the degree of parallelism and the processed data size. Based on this classification, we generate an extrapolation model per thread class to predict the profile at scales significantly larger than those of the initial experiments. Per-thread-class extrapolation models simplify the analysis of exascale systems because we reason about a small number of thread classes rather than a huge number of individual threads. We apply the methodology to several computing domains, including large-scale graph analytics, fluid dynamics, and radio astronomy. The proposed approach accurately classifies threads where state-of-the-art techniques fail, and the resulting extrapolation models achieve prediction errors below 10% on a real-life radio-astronomy case study.
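The two-stage idea (unsupervised thread classification followed by per-class extrapolation) can be illustrated with a minimal sketch. All details here are assumptions for illustration only: the features (fraction of time spent computing vs. communicating), the use of k-means as the clustering technique, the power-law scaling model, and every constant are hypothetical stand-ins, not the paper's actual profiles, models, or data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic per-thread profiles from small-scale runs (hypothetical data).
# Feature vector per thread: (fraction of time computing, fraction communicating).
# Two assumed behavior classes: compute-bound and communication-bound threads.
compute = rng.normal([0.9, 0.1], 0.02, size=(60, 2))
comm = rng.normal([0.3, 0.7], 0.02, size=(60, 2))
profiles = np.vstack([compute, comm])

def kmeans(x, init, iters=50):
    """Minimal Lloyd's k-means; `init` lists indices of the initial centers."""
    centers = x[init].copy()
    for _ in range(iters):
        labels = np.argmin(((x[:, None] - centers) ** 2).sum(-1), axis=1)
        centers = np.array([x[labels == j].mean(axis=0)
                            for j in range(len(centers))])
    return labels, centers

# Stage 1: classify threads into behavior classes across runs.
labels, centers = kmeans(profiles, init=[0, -1])

# Stage 2: per-class extrapolation. Fit a power law t = a * n^b to the
# per-thread work observed at small problem sizes n, then predict at scale.
sizes = np.array([1e4, 1e5, 1e6])   # problem sizes of the initial experiments
work = 2.0 * sizes ** 1.1           # synthetic per-thread work measurements
b, log_a = np.polyfit(np.log(sizes), np.log(work), 1)
predict = lambda n: np.exp(log_a) * n ** b

print(predict(1e9))  # extrapolated per-thread work at an exascale-like size
```

Working with one extrapolation model per class, as above, is what keeps the analysis tractable: only a handful of fits are maintained instead of one per thread.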