Published November 2000 | public
Journal Article Open

PPM performance with BWT complexity: a fast and effective data compression algorithm

Abstract

This paper introduces a new data compression algorithm. The goal underlying this new code design is to achieve a single lossless compression algorithm with the excellent compression ratios of the prediction by partial matching (PPM) algorithms and the low complexity of codes based on the Burrows-Wheeler Transform (BWT). Like the BWT-based codes, the proposed algorithm requires worst-case O(n) computational complexity and memory; in contrast, the unbounded-context PPM algorithm, called PPM*, requires worst-case O(n^2) computational complexity. Like PPM*, the proposed algorithm allows the use of unbounded contexts. Using standard data sets for comparison, the proposed algorithm achieves compression performance better than that of the BWT-based codes and comparable to that of PPM*. In particular, the proposed algorithm yields an average rate of 2.29 bits per character (bpc) on the Calgary corpus; this result compares favorably with the 2.33 and 2.34 bpc of PPM5 and PPM*, respectively (PPM algorithms), the 2.43 bpc of BW94 (the original BWT-based code), and the 3.64 and 2.69 bpc of compress and gzip, respectively (popular Unix compression algorithms based on Lempel-Ziv (LZ) coding techniques) on the same data set. The given code does not, however, match the best reported compression performance (2.12 bpc with PPMZ9) listed on the Calgary corpus results web page at the time of this publication. Results on the Canterbury corpus give a similar relative standing. The proposed algorithm gives an average rate of 2.15 bpc on the Canterbury corpus, while the Canterbury corpus web page gives average rates of 1.99 bpc for PPMZ9, 2.11 bpc for PPM5, 2.15 bpc for PPM7, 2.23 bpc for BZIP2 (a popular BWT-based code), and 3.31 and 2.53 bpc for compress and gzip, respectively.
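
To make the Calgary corpus comparison easier to read, the reported rates can be turned into relative savings. The following is only an illustrative sketch, not code from the paper: the function names are hypothetical, and the compression-ratio helper assumes an uncompressed input of 8 bits per character, which the abstract does not state.

```python
# Average rates in bits per character (bpc) on the Calgary corpus,
# copied from the abstract above.
CALGARY_BPC = {
    "proposed": 2.29,
    "PPM5": 2.33,
    "PPM*": 2.34,
    "BW94": 2.43,
    "compress": 3.64,
    "gzip": 2.69,
    "PPMZ9": 2.12,
}

# Assumption (not from the abstract): raw text uses one 8-bit byte per character.
RAW_BITS_PER_CHAR = 8


def relative_saving(baseline_bpc, new_bpc):
    """Fraction of bits saved by new_bpc relative to baseline_bpc.

    A negative value means new_bpc is worse than the baseline.
    """
    return (baseline_bpc - new_bpc) / baseline_bpc


def compression_ratio(bpc, raw_bits=RAW_BITS_PER_CHAR):
    """Uncompressed size divided by compressed size, under the 8-bit assumption."""
    return raw_bits / bpc


def savings_table(rates, reference="proposed"):
    """Saving of the reference code relative to every other code in rates."""
    ref = rates[reference]
    return {
        name: relative_saving(bpc, ref)
        for name, bpc in rates.items()
        if name != reference
    }


if __name__ == "__main__":
    for name, saving in sorted(savings_table(CALGARY_BPC).items()):
        print(f"{name:>8}: {saving:+.1%}")
```

Under these assumptions, the 2.29 bpc of the proposed algorithm corresponds to roughly 5.8% fewer bits than the 2.43 bpc of BW94, while the negative entry for PPMZ9 reflects that its 2.12 bpc remains better.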

Additional Information

© Copyright 2000 IEEE. Reprinted with permission. Manuscript received February 15, 2000; revised August 8, 2000. This work was partially supported by NSF Award CCR-9909026.

Files

EFFprocieee00.pdf (173.9 kB)
md5:a1c93e61b1a28ec339296122b469488b

Additional details

Created: August 21, 2023
Modified: October 16, 2023