Lossless Source Coding
Lossless source coding is data compression where decompression gives back
an exact copy of the original. It is used in modems, for compressing text files,
where legal issues require it, etc. Although images, sound, and video are more often
compressed using a lossy compression technique to achieve better compression
many compressors have a lossless mode. In addition, a lossless algorithm may be
used as a building block in designing a lossy compressor.
Some examples:
- Fixed-length to fixed-length (FF) coding: Suppose you have a program that uses
files that contain only decimal digits, spaces, tabs, carriage returns, dots, and commas.
On most computers these files would be stored in eight-bit ASCII. It is easy to write
a special-purpose compressor that will compress these files by a factor of two just
by encoding each ASCII character into a four-bit codeword.
- Fixed-length to variable-length (FV) coding: Even if all 256 possible characters occur
in the file, it is still possible to compress it if some characters are more frequent than
others. Suppose the files described in the previous example are not guaranteed to
contain only the fifteen characters mentioned, but the remaining 241 characters occur
very infrequently. The fifteen frequent characters could be encoded using codewords
0001 through 1111, while 0000 would be reserved as an escape code to indicate that
the following eight bits should be interpreted as a single character.
- Variable-length to fixed-length (VF) coding: Suppose you have some files
representing black and white images, with black pixels represented by 1-bits and white
pixels by a 0-bits. Often such files will have long runs of identical bits. A simple
run-length compressor might encode runs of 1 through 128 0-bits using codewords
00000000 through 01111111 and encode runs of 1 through 128 1-bits using codewords
10000000 through 11111111. If there are many more long runs than short runs
this would give good compression.
- Variable-length to variable-length (VV) coding: The compressor in the previous
example might be improved by using short codewords for frequently-occurring
run lengths at the expense of lengthening the codewords for less frequent run
lengths.
Note that not all compressors fall into one of these four classes.