PPM performance with BWT complexity: a new method for lossless data compression
- Creators
- Effros, Michelle
Abstract
This work combines a new fast context-search algorithm with the lossless source coding models of PPM to achieve a lossless data compression algorithm with the linear context-search complexity and memory of BWT and Ziv-Lempel codes and the compression performance of PPM-based algorithms. Both sequential and nonsequential encoding are considered. The proposed algorithm yields an average rate of 2.27 bits per character (bpc) on the Calgary corpus, comparing favorably to the 2.33 and 2.34 bpc of PPM5 and PPM* and the 2.43 bpc of BW94 but not matching the 2.12 bpc of PPMZ9, which, at the time of this publication, gives the greatest compression of all algorithms reported on the Calgary corpus results page. The proposed algorithm gives an average rate of 2.14 bpc on the Canterbury corpus. The Canterbury corpus Web page gives average rates of 1.99 bpc for PPMZ9, 2.11 bpc for PPM5, 2.15 bpc for PPM7, and 2.23 bpc for BZIP2 (a BWT-based code) on the same data set.
Additional Information
© Copyright 2002 IEEE. Reprinted with permission. This material is based upon work partially supported by NSF Award No. CCR-9909026.Files
Name | Size | Download all |
---|---|---|
md5:0ba8daff5d65ac84b2796fe175643a1d
|
236.5 kB | Preview Download |
Additional details
- Eprint ID
- 7402
- Resolver ID
- CaltechAUTHORS:EFFdcc00
- Created
-
2007-02-12Created from EPrint's datestamp field
- Updated
-
2021-11-08Created from EPrint's last_modified field