Advances in Next Generation Sequencing (NGS) technologies have led to a proliferation of genomic applications that detect DNA mutations and guide personalized medicine. These applications incur an enormous computational cost because of the large amount of genomic data they process. Although FPGAs can reduce the time needed to process this data, their limited memory capacity often restricts the potential gains. To overcome this limitation, platforms that support IBM CAPI (Coherent Accelerator Processor Interface) give FPGAs direct access to CPU memory. This paper proposes a hardware/software co-design for k-mer counting, one of the most time-consuming phases of genomic applications. The proposed co-design targets CAPI-enabled FPGAs and is integrated into SMUFIN, a state-of-the-art reference-free method for finding DNA mutations. Results show that the proposed co-design outperforms the CPU-only design by 2.14×, consumes 2.93× less energy, and requires 1.57× less memory.
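
For readers unfamiliar with the term, k-mer counting tallies how often each length-k substring occurs across a set of sequencing reads. The following is a minimal software-only sketch of that idea, not the co-design proposed here nor SMUFIN's actual implementation. The function name `count_kmers`, the example reads, and the choice to skip windows containing bases other than A, C, G, and T are assumptions made purely for illustration.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring (k-mer) across a collection of reads.

    Illustrative sketch only: a hypothetical helper, not SMUFIN's code.
    """
    counts = Counter()
    for read in reads:
        # Slide a window of width k over the read, one base at a time.
        # Reads shorter than k yield an empty range and contribute nothing.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Assumption of this sketch: ignore windows with ambiguous
            # bases such as 'N'.
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    return counts


# Two short, made-up reads that share all of their 3-mers.
counts = count_kmers(["ACGTAC", "CGTACG"], k=3)
```

Even in this toy form, the table of counts grows with the number of distinct k-mers in the input, which suggests why memory capacity becomes a limiting factor at genomic scale.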