Welcome to the new version of CaltechAUTHORS. Login is currently restricted to library staff. If you notice any issues, please email coda@library.caltech.edu
Published 2000 | public
Book Section - Chapter Open

PPM performance with BWT complexity: a new method for lossless data compression

Abstract

This work combines a new fast context-search algorithm with the lossless source coding models of PPM to achieve a lossless data compression algorithm with the linear context-search complexity and memory of BWT and Ziv-Lempel codes and the compression performance of PPM-based algorithms. Both sequential and nonsequential encoding are considered. The proposed algorithm yields an average rate of 2.27 bits per character (bpc) on the Calgary corpus, comparing favorably to the 2.33 and 2.34 bpc of PPM5 and PPM* and the 2.43 bpc of BW94 but not matching the 2.12 bpc of PPMZ9, which, at the time of this publication, gives the greatest compression of all algorithms reported on the Calgary corpus results page. The proposed algorithm gives an average rate of 2.14 bpc on the Canterbury corpus. The Canterbury corpus Web page gives average rates of 1.99 bpc for PPMZ9, 2.11 bpc for PPM5, 2.15 bpc for PPM7, and 2.23 bpc for BZIP2 (a BWT-based code) on the same data set.

Additional Information

© Copyright 2002 IEEE. Reprinted with permission. This material is based upon work partially supported by NSF Award No. CCR-9909026.

Files

EFFdcc00.pdf
Files (236.5 kB)
Name Size Download all
md5:0ba8daff5d65ac84b2796fe175643a1d
236.5 kB Preview Download

Additional details

Created:
August 22, 2023
Modified:
October 16, 2023