Let’s suppose that we use the Spanish alphabet. Humans know that symbols does not form words in any order (this fact can help us to formulate the following VLC (Variable Length Codec)):
In Spanish there are 28 letters. Therefore, to encode, for example, the word “preciosa”, the first symbol “p” can be represented by it index inside the Spahish alphabet with a code-word of 5 bits. In this try, the encoding is not a very efficient, but this is the first letter ... For the second one “r” we can see (using a Spanish dictionary) that after a “p”, the following symbols are possible: (1) “a”, (2) “e”, (3) “i”, (4) “l”, (5) “n”, (5) “o”, (7) “r”, (8) “s” and (9) “u”. Therefore, we don’t need 5 bits now, 4 are enough.
Chances | p | r | e | c | i | o | s | a |
1 | a | a | a | a | a | a | s | a |
2 | l | e | e | b | e | n | i | |
3 | l | i | i | c | i | o | o | |
4 | l | o | d | l | p | |||
5 | o | n | u | f | o | s | ||
6 | f | o | g | u | t | |||
7 | r | h | ||||||
8 | t | s | i | |||||
9 | h | u | j | |||||
10 | e | l | ||||||
11 | m | m | ||||||
12 | n | |||||||
13 | n | |||||||
14 | o | |||||||
15 | p | |||||||
16 | r | |||||||
17 | s | |||||||
18 | t | |||||||
19 | v | |||||||
20 | z | |||||||
: | ||||||||
28 | ||||||||
bits | 5 | 4 | 3 | 5 | 3 | 3 | 0 | 2 |
total | 5 | 9 | 12 | 17 | 20 | 23 | 23 | 25 |
The compression ratio has been 401 /25:1.