We can compress data if each symbol is translated into a code-word and, on
average, the lengths of the code-words are smaller than the lengths of the
symbols.
The encoder and the decoder share a probabilistic model that supplies the
variable-length encoder/decoder with the probability of each symbol
(see Figure 1).
The most probable symbols are represented by the shorter code-words, and
vice versa.
Figure 1: Block diagram of entropy encoding/decoding.
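For illustration, consider a toy example (a minimal sketch in Python, assuming a hypothetical four-symbol alphabet and a hand-made prefix code, not the actual codec of Figure 1): assigning shorter code-words to the more probable symbols brings the average code-word length below the 2 bits/symbol of a fixed-length representation.

    # Hypothetical 4-symbol source and a hand-made prefix code in which
    # the most probable symbols get the shortest code-words.
    probabilities = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
    code = {"a": "0", "b": "10", "c": "110", "d": "111"}  # prefix-free

    # Average code-word length (bits/symbol) under this model.
    avg_length = sum(probabilities[s] * len(code[s]) for s in probabilities)

    # A fixed-length code for 4 symbols needs 2 bits/symbol.
    print(f"average length = {avg_length} bits/symbol (vs. 2 bits/symbol fixed-length)")

For this distribution the average length is 1.75 bits/symbol, so the variable-length code compresses relative to the fixed-length representation.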
2 Bit of data and bit of information
Data is the representation of information.
Lossless data compression uses a shorter representation of the same information.
By definition, a bit of data stores a bit of information if and only if it
represents the occurrence of an equiprobable event (an event that can be
true or false with the same probability).
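For example, the outcome of a fair coin toss is an equiprobable event, so a single bit of data (say, 0 for heads and 1 for tails) stores exactly one bit of information.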
By definition, a symbol $s$ with probability $p(s)$ stores

$$I(s) = -\log_2 p(s)$$    (Eq:symbol_information)

bits of information.
The length of the code-word therefore depends on the probability as

$$l(s) \approx -\log_2 p(s) = I(s).$$
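As a quick numerical check (a minimal sketch, using a few arbitrary probabilities rather than any particular source), the information content of a symbol, and hence its ideal code-word length, grows as its probability decreases:

    import math

    # Information content I(s) = -log2 p(s) for a few assumed probabilities;
    # this is also the ideal (possibly fractional) code-word length in bits.
    for p in (0.5, 0.25, 0.1, 0.01):
        information = -math.log2(p)
        print(f"p = {p:<5} -> I = {information:.2f} bits")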
3 Entropy of an information source
The entropy $H$ measures the amount of information per symbol that a source
of information produces, on average, i.e.,

$$H = -\sum_{s=1}^{N} p(s)\,\log_2 p(s)$$    (1)

bits-of-information/symbol, where $N$ is the size of the source alphabet
(the number of different symbols).
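Eq. (1) can be computed directly; the following sketch (reusing the hypothetical four-symbol distribution from the earlier example) gives $H = 1.75$ bits/symbol, which matches the average code-word length of that prefix code:

    import math

    def entropy(probabilities):
        """Entropy of a source in bits-of-information/symbol, as in Eq. (1)."""
        return -sum(p * math.log2(p) for p in probabilities if p > 0)

    # Hypothetical source with N = 4 symbols.
    H = entropy([0.5, 0.25, 0.125, 0.125])
    print(f"H = {H} bits/symbol")  # 1.75, matching the average code-word length above

This is no coincidence: the entropy is a lower bound on the average number of bits per symbol that any lossless code can achieve for that source.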