A universal entropy encoder

Juan Francisco Rodríguez Herrera
Vicente González Ruiz

September 12, 2016

Contents

1 The idea
2 Encoder (only one symbol)
3 Decoder (only one symbol)

1 The idea

The encoder and the decoder share the same knowledge about the source of symbols. Instead of transmitting a symbol directly, the encoder answers a sequence of yes/no guesses about it, outputting one bit per guess; if every guess is true or false with probability 1/2, each transmitted bit resolves the maximum possible amount of uncertainty.
2 Encoder (only one symbol)

  1. While the decoder cannot know the symbol:
    1. Make an assertion about the symbol that allows the decoder to reduce its uncertainty about it. This guess should be true or false with the same probability.
    2. Output a bit of code that indicates whether the guess was true or false.
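
The guessing loop above can be sketched as a binary search over an ordered list of candidate symbols: under the assumption that all candidates are equally probable, the guess "the symbol lies in the lower half of the remaining candidates" is true or false with probability close to 1/2. The function name and the list-based interface are illustrative, not part of the original text.

```python
def encode_symbol(symbol, candidates):
    """Encode one symbol as the outcomes of the encoder's guesses.

    Each iteration guesses "the symbol lies in the lower half of the
    remaining candidates" and outputs one bit with the outcome.
    """
    bits = []
    while len(candidates) > 1:       # the decoder cannot yet know the symbol
        lower_half = candidates[:len(candidates) // 2]
        guess_is_true = symbol in lower_half
        bits.append(1 if guess_is_true else 0)  # the bit of code
        candidates = lower_half if guess_is_true else candidates[len(lower_half):]
    return bits
```

For example, with the 26 candidates "a"–"z", encoding "p" outputs one bit per guess until a single candidate remains.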

3 Decoder (only one symbol)

  1. While the symbol is not known without uncertainty:
    1. Make the same guess as the encoder.
    2. Input a bit of code that represents the result of that guess.
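
A sketch of the decoding loop, assuming the shared guess is "the symbol lies in the lower half of an ordered candidate list" (the function name and interface are illustrative):

```python
def decode_symbol(bits, candidates):
    """Recover one symbol by replaying the encoder's guesses.

    Each received bit tells whether the shared guess ("the symbol is
    in the lower half of the remaining candidates") was true.
    """
    stream = iter(bits)
    while len(candidates) > 1:   # uncertainty remains
        lower_half = candidates[:len(candidates) // 2]
        if next(stream) == 1:    # the guess was true
            candidates = lower_half
        else:
            candidates = candidates[len(lower_half):]
    return candidates[0]         # known without uncertainty
```

For example, with the candidates "a"–"z", the bit sequence 0, 1, 1, 0, 0 decodes to "p".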

Example

Let’s suppose that we use the Spanish alphabet. Humans know that symbols cannot form words in an arbitrary order, and this fact can help us to formulate the following VLC (Variable-Length Code):

In Spanish there are 28 letters. Therefore, to encode, for example, the word “preciosa”, the first symbol “p” can be represented by its index inside the Spanish alphabet with a code-word of 5 bits (since 2^5 = 32 ≥ 28). At this point the encoding is not very efficient, but this is only the first letter. For the second one, “r”, we can see (using a Spanish dictionary) that after a “p” only the following symbols are possible: (1) “a”, (2) “e”, (3) “i”, (4) “l”, (5) “n”, (6) “o”, (7) “r”, (8) “s” and (9) “u”. Therefore, we don’t need 5 bits now; 4 are enough.
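
The code-word lengths follow directly from the sizes of the candidate sets: n equally likely candidates need ceil(log2(n)) bits, and a single candidate needs none. A minimal sketch; the candidate counts for the letters after “r” are assumptions consistent with the bit counts of the example.

```python
from math import ceil, log2

# Number of candidate symbols at each position of "preciosa":
# 28 for the first letter (the whole alphabet), 9 after "p" (from the
# dictionary), and assumed counts for the remaining positions.
candidate_counts = {"p": 28, "r": 9, "e": 5, "c": 20, "i": 5, "o": 5,
                    "s": 1, "a": 3}

def code_length(n):
    """Bits of a fixed-length code-word indexing n equiprobable options."""
    return ceil(log2(n)) if n > 1 else 0

lengths = [code_length(n) for n in candidate_counts.values()]
print(lengths)       # bits spent on each symbol
print(sum(lengths))  # total bits for "preciosa"
```

An uncompressed encoding would spend 5 bits on each of the 8 letters (40 bits in total), so the 25-bit result yields the 40/25 ≈ 1.6:1 compression ratio of the example.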

Chances   p   r   e   c   i   o   s   a
   1      a   a   a   a   a   a   s   a
   2      b   e   e   b   e   n       i
   3      c   i   i   c   i   o       o
   4          l   o   d   l   p
   5          n   u   f   o   s
   6          o       g
   7          r       h
   8          s       i
   9          u       j
  10                  l
  11                  m
  12                  n
  13                  ñ
  14                  o
  15                  p
  16                  r
  17                  s
  18                  t
  19                  v
  20                  z
   :      :
  28

bits      5   4   3   5   3   3   0   2
total     5   9  12  17  20  23  23  25

The compression ratio has been 40/25:1 ≈ 1.6:1.