Programming for large-scale, multicore-based architectures requires adequate tools that offer ease of programming and do not hinder application performance. StarSs is a family of parallel programming models based on automatic function-level parallelism that targets productivity. StarSs deploys a data-flow model: it analyzes dependencies between tasks and manages their execution, exploiting their concurrency as much as possible. This paper introduces Cluster Superscalar (ClusterSs), a new StarSs member designed to execute on clusters of SMPs (Symmetric Multiprocessors). ClusterSs tasks are asynchronously created and assigned to the available resources with the support of the IBM APGAS runtime, which provides an efficient and portable communication layer based on one-sided communication. We present the design of ClusterSs on top of APGAS, as well as the programming model and execution runtime for Java applications. Finally, we evaluate the productivity of ClusterSs, both in terms of programmability and performance and compare it to that of the IBM X10 language. Copyright © 2012 John Wiley & Sons, Ltd. Copyright © 2012 John Wiley & Sons, Ltd.