Three-dimensional (3D) computed tomography (CT) is one of the key components of many clinical workflows. Because CT reconstruction has been known as a compute-intensive workload, accelerating this workload using special-purpose accelerators, such as GPUs and FPGAs, or multi-socket server-grade processors has been widely studied. Due to recent advances in semiconductor technologies, even commodity processors, such as those used in PCs, can provide sufficient computing power for CT reconstruction by multiple cores with vector processing units. Despite their huge computing power, commodity processors often provide limited system memory bandwidth compared to server-grade processors due to constraints in cost and energy consumption. In this paper, we describe our memory-optimization technique and its implementation targeting on general-purpose processors with limited memory bandwidth. By reducing the memory-bandwidth requirement with batch processing, the memory optimization achieved up to 80% performance improvements in RabbitCT, a widely-used CT benchmark, on a quad-core processor with limited memory bandwidth. Without the memory optimization, the performance did not scale with more than two cores. The implementation can process about 40 projection images per second for the most common problem size of 512A3 with only four cores used. It is therefore practical to use such commodity processors in real CT systems without additional accelerators, which trade greatly increased cost and energy consumption for higher throughput.