IS&T/SPIE Electronic Imaging 1991
Scaled DCT algorithms for JPEG and MPEG implementations on fused multiply/add architectures

We introduce new scaled Discrete Cosine Transform (SDCT) and inverse SDCT algorithms optimized for architectures in which a fused multiply/add is a primitive arithmetic operation. Explicit algorithms are derived for 1-dimensional inputs of 8 points and for 2-dimensional inputs of 8 x 8 points. The latter require 416 operations. When the constants are programmable, descaling plus computing the inverse DCT on 8 x 8 points can be done with 417 operations.
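
To make the notion of a scaled DCT concrete, the following is a minimal sketch of the general idea only, not the algorithm derived in the paper. It evaluates the 8-point DCT-II directly from its definition (far more than the operation counts quoted above) and shows that the per-coefficient normalization can be left out of the transform and folded into a quantization table instead. The function names, the sample input, and the quantization steps are all hypothetical choices made for illustration.

```python
import math

N = 8

# Per-coefficient normalization of the orthonormal DCT-II.
SCALE = [math.sqrt(1.0 / N)] + [math.sqrt(2.0 / N)] * (N - 1)


def scaled_dct_1d(x):
    """8-point DCT-II sums without the normalization factors.

    Each inner step has the form acc = a * b + acc, which is the
    shape of operation a fused multiply/add executes as one primitive.
    """
    out = []
    for k in range(N):
        acc = 0.0
        for n in range(N):
            acc = x[n] * math.cos((2 * n + 1) * k * math.pi / (2 * N)) + acc
        out.append(acc)
    return out


def dct_1d(x):
    """Orthonormal 8-point DCT-II: the scaled output times SCALE."""
    return [c * s for c, s in zip(SCALE, scaled_dct_1d(x))]


def folded_table(q):
    """Fold the normalization into the quantizer: one multiply per
    coefficient replaces a separate scaling step and a division."""
    return [c / step for c, step in zip(SCALE, q)]


# Hypothetical sample row and quantization steps.
x = [52, 55, 61, 66, 70, 61, 64, 73]
q = [16, 11, 10, 16, 24, 40, 51, 61]

full = dct_1d(x)
scaled = scaled_dct_1d(x)
mult = folded_table(q)

# Quantizer inputs computed both ways agree up to rounding error.
direct = [f / step for f, step in zip(full, q)]
folded = [s * m for s, m in zip(scaled, mult)]
```

In a JPEG or MPEG style pipeline the table returned by `folded_table` would be computed once per quantization table, so the deferred scaling adds no per-block work beyond the multiply that quantization already needs.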