We illustrate how Graphics Processing Units (GPUs) can speed up computationally intensive image processing operations. In particular, we demonstrate the use of the NVIDIA CUDA architecture to implement a color digital halftoning algorithm based on Direct Binary Search (DBS). Halftoning a color image is more computationally expensive than the single-color case, because dot interaction between different color planes must also be minimized. We propose processing all color planes in parallel. In addition, we process several non-overlapping neighborhoods in parallel, exploiting the GPU's parallel architecture to further improve computational efficiency. This parallel approach allows us to use a large neighborhood and filter size, and thus achieve high halftone quality, with minimal impact on performance. © 2012 IEEE.
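
As a rough illustration of the non-overlapping-neighborhood idea, the following CPU-side Python sketch groups pixel positions into batches whose surrounding windows cannot overlap, so every position in a batch could in principle be handled concurrently. It is not the DBS update itself and not the authors' CUDA implementation. The function name `parallel_batches` and the parameter `spacing` are hypothetical. The sketch assumes square windows of side at most `spacing` centered on each pixel. On a GPU, each batch entry would roughly correspond to one thread, and the color planes could be mapped to a further grid dimension.

```python
from typing import Iterator, List, Tuple

Pixel = Tuple[int, int]  # (row, column)


def parallel_batches(height: int, width: int, spacing: int) -> Iterator[List[Pixel]]:
    """Yield batches of pixel coordinates that can be processed together.

    Pixels in the same batch lie on a grid with pitch `spacing`, so any two
    distinct pixels differ by at least `spacing` along some axis. Square
    windows of side <= `spacing` centered on them therefore do not overlap.
    (Hypothetical helper for illustration; not taken from the paper.)
    """
    if spacing < 1:
        raise ValueError("spacing must be at least 1")
    # One batch per (row offset, column offset) inside a spacing x spacing cell.
    for dy in range(spacing):
        for dx in range(spacing):
            batch = [
                (y, x)
                for y in range(dy, height, spacing)
                for x in range(dx, width, spacing)
            ]
            if batch:
                yield batch
```

For a 6 × 7 image with `spacing = 3`, this produces nine batches that together visit each of the 42 pixels exactly once. A larger neighborhood means more batches, each with fewer concurrent positions.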