Many fundamental graph mining problems, such as maximal clique enumeration and subgraph isomorphism, can be solved using combinatorial algorithms that are naturally expressed in a recursive form. However, recursive graph mining algorithms suffer from a high algorithmic complexity and long execution times. Moreover, because the recursive nature of these algorithms causes unpredictable execution and memory access patterns, parallelizing them on modern computer architectures poses challenges. In this work, we describe an efficient manycore CPU implementation of maximal clique enumeration (MCE), a basic building block of several social and biological network mining algorithms. First, we improve the single-thread performance of MCE by accelerating its computation-intensive kernels through cache-conscious data structures and vector instructions. Then, we develop a multi-core solution and eliminate its scalability bottlenecks by minimizing the scheduling and the memory-management overheads. On highly-parallel modern CPUs, we demonstrate an up to 19-fold performance improvement compared to a state-of-the-art multi-core implementation of MCE.