Structure learning is a key problem in using Bayesian networks for data mining tasks but its computation complexity increases dramatically with the number of features in the dataset. Thus, it is computationally intractable to extend structure learning to large networks without using a scalable parallel approach. This work explores computation primitives to parallelize the first phase of Cheng et al.'s (Artificial Intelligence, 137(1-2):43-90, 2002) Bayesian network structure learning algorithm. The proposed primitives are highly suitable for multithreading architectures. Firstly, we propose a waitfree table construction primitive for building potential tables from the training data in parallel. Notably, this primitive allows multiple cores to update a potential table simultaneously without appealing to any lock operation, allowing all cores to be fully utilized. Secondly, the marginalization primitive is proposed to enable efficient statistics tests to be performed on all pairs of variables in the learning algorithm. These primitives are quantitatively evaluated on a 32-core platform and the experiment results show 23:5× speedup compared to a single thread implementation.