A scalable queuing service based on an in-memory data grid
Yuan Wang, Han Chen, et al.
ICEBE 2010
Pure software HDTV video decoding is still a challenging task on entry-level to mid-range desktop and notebook PCs, even with today's microprocessor clock frequencies measured in GHz. This paper shows that the performance bottleneck in a software MPEG-2 decoder has shifted to memory operations, as microprocessor technologies, including multimedia instruction extensions, have improved rapidly in recent years. Our study exploits concurrency at the macroblock level to alleviate this bottleneck. First, the paper introduces an interleaved block-order data layout to improve CPU cache performance. Second, it describes an algorithm to explicitly prefetch macroblocks for motion compensation. Finally, it presents an algorithm to schedule interleaved decoding and output at the macroblock level. Our implementation and experiments show that these methods can effectively hide the latency of memory and the frame buffer. The optimizations improve the performance of a multimedia-instruction-optimized software MPEG-2 decoder by a factor of about two. On a PC with a 933 MHz Pentium III CPU, the decoder can decode and display 1280 × 720 HDTV streams at over 62 frames per second. © 2005 Springer Science+Business Media, Inc.
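To illustrate why a block-order layout can help cache behavior, the following is a minimal sketch, not the paper's implementation. It assumes one byte per luma pixel, 16 × 16 macroblocks, a cache-line-aligned frame buffer, and a 32-byte cache line (passed as a parameter). The function names are hypothetical, and the interleaving scheme mentioned in the abstract is not specified there, so it is not reproduced here; only the plain block-order idea is shown.

```python
MB = 16  # macroblock edge in pixels (MPEG-2 luma macroblocks are 16x16)


def raster_index(x, y, width):
    """Byte offset of luma pixel (x, y) in a conventional row-major frame."""
    return y * width + x


def block_order_index(x, y, width):
    """Byte offset of pixel (x, y) when each 16x16 macroblock is stored
    contiguously and the macroblocks themselves follow raster order.
    Assumes width is a multiple of MB."""
    mbs_per_row = width // MB
    mb_x, in_x = divmod(x, MB)
    mb_y, in_y = divmod(y, MB)
    mb_number = mb_y * mbs_per_row + mb_x
    return mb_number * MB * MB + in_y * MB + in_x


def cache_lines_touched(index_fn, mb_x, mb_y, width, line_size=32):
    """Count the distinct cache lines covered by one macroblock under the
    given layout (line_size is an assumed cache-line size in bytes)."""
    lines = set()
    for in_y in range(MB):
        for in_x in range(MB):
            offset = index_fn(mb_x * MB + in_x, mb_y * MB + in_y, width)
            lines.add(offset // line_size)
    return len(lines)


if __name__ == "__main__":
    width = 1280
    print("raster:     ", cache_lines_touched(raster_index, 3, 2, width))
    print("block order:", cache_lines_touched(block_order_index, 3, 2, width))
```

Under these assumptions, a macroblock in a 1280-pixel-wide raster frame spans 16 cache lines (one per row, each only half used), while the same macroblock in block order occupies 8 fully used lines. How closely this matches the layout evaluated in the paper is not something the abstract establishes.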