Software compression in the client/server environment
Abstract
Lempel-Ziv based compression algorithms are universal, not assuming any prior knowledge of the file to be compressed or its statistics. Accordingly, the reference dictionary of these textual substitution compression algorithms includes only segments of the already-processed portion of the file. It is often the case, though, that both, compressor and decompressor, even when they reside on different sites, share knowledge of files (e.g., devices managed by a server, or software customers holding older releases of products). For such cases, we suggest to add shared files to the reference dictionary. Preferably, files to be included are those which resemble the file to be compressed. Such an extension of the reference dictionary lengthens the matches found while compressing the file, and thus lessens the number of matches needed to cover the file. We found out that with a careful selection (which can be automated) of the shared files to be included, the advantage of the decrease in the number of matches overwhelms the disadvantage of the increase in the number of bits needed to express the index of each match in the extended dictionary. Altogether, compression attainable by our proposed scheme can be significantly better than with the original Lempel-Ziv dictionary. Maintaining and searching a dictionary that is much larger than the original Lempel-Ziv dictionary demand strong computational resources and suits off-line more than on-line compression. We thus conclude that in the client/server environment, where shared files commonly exist, and the server enjoys extensive computational resources, the scheme suggested is advantageous for transferring files from the server to its clients.