Lossless Source Coding

Lossless source coding is data compression where decompression gives back an exact copy of the original. It is used in modems, for compressing text files, where legal issues require it, etc. Although images, sound, and video are more often compressed using a lossy compression technique to achieve better compression many compressors have a lossless mode. In addition, a lossless algorithm may be used as a building block in designing a lossy compressor.

Some examples:

  1. Fixed-length to fixed-length (FF) coding: Suppose you have a program that uses files that contain only decimal digits, spaces, tabs, carriage returns, dots, and commas. On most computers these files would be stored in eight-bit ASCII. It is easy to write a special-purpose compressor that will compress these files by a factor of two just by encoding each ASCII character into a four-bit codeword.

  2. Fixed-length to variable-length (FV) coding: Even if all 256 possible characters occur in the file, it is still possible to compress it if some characters are more frequent than others. Suppose the files described in the previous example are not guaranteed to contain only the fifteen characters mentioned, but the remaining 241 characters occur very infrequently. The fifteen frequent characters could be encoded using codewords 0001 through 1111, while 0000 would be reserved as an escape code to indicate that the following eight bits should be interpreted as a single character.

  3. Variable-length to fixed-length (VF) coding: Suppose you have some files representing black and white images, with black pixels represented by 1-bits and white pixels by a 0-bits. Often such files will have long runs of identical bits. A simple run-length compressor might encode runs of 1 through 128 0-bits using codewords 00000000 through 01111111 and encode runs of 1 through 128 1-bits using codewords 10000000 through 11111111. If there are many more long runs than short runs this would give good compression.

  4. Variable-length to variable-length (VV) coding: The compressor in the previous example might be improved by using short codewords for frequently-occurring run lengths at the expense of lengthening the codewords for less frequent run lengths.

Note that not all compressors fall into one of these four classes.