A universal entropy encoder

Suppose that we use the Spanish alphabet. We know that letters cannot be combined into words in any arbitrary order, and this fact can be used to build the following VLC (Variable-Length Code):

The Spanish alphabet has 28 letters. Therefore, to encode, for example, the word “preciosa”, the first symbol “p” can be represented by its index in the Spanish alphabet with a code word of 5 bits, since 28 ≤ 2^5 = 32. This is not very efficient, but it is the first letter and nothing is known about it yet. For the second symbol, “r”, we can see (using a Spanish dictionary) that after a “p” only the following symbols are possible: (1) “a”, (2) “e”, (3) “i”, (4) “l”, (5) “n”, (6) “o”, (7) “r”, (8) “s” and (9) “u”. Therefore, 5 bits are no longer needed: since 9 ≤ 2^4 = 16, 4 bits are enough to index one of these 9 symbols.
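The rule behind these bit counts can be sketched in Python: a fixed-length code word of ⌈log2 n⌉ bits is enough to send the position of the actual symbol among n possible ones, and no bits at all are needed when only one symbol is possible. The names `code_length`, `encode_index` and `AFTER_P` are hypothetical helpers introduced only for this illustration; the candidate list is the one given above for the letters that may follow “p”.

```python
def code_length(n: int) -> int:
    """Bits needed to send the index of one of n possible symbols.

    (n - 1).bit_length() equals ceil(log2(n)) for n >= 1 and uses only
    integer arithmetic, so powers of two suffer no rounding errors.
    """
    if n < 1:
        raise ValueError("there must be at least one possible symbol")
    return (n - 1).bit_length()


def encode_index(symbol: str, candidates: list[str]) -> str:
    """Return the 0-based position of `symbol` in `candidates` as a bit string."""
    width = code_length(len(candidates))
    if width == 0:
        return ""  # only one possible symbol: nothing has to be sent
    return format(candidates.index(symbol), f"0{width}b")


# Letters that may follow "p", as listed in the text.
AFTER_P = ["a", "e", "i", "l", "n", "o", "r", "s", "u"]

print(code_length(28))             # 5 bits for the first letter
print(code_length(len(AFTER_P)))   # 4 bits for the letter after "p"
print(encode_index("r", AFTER_P))  # 0110 ("r" is option 7, index 6)
```

The sketch numbers the options from 0, whereas the text numbers them from 1; either convention works as long as encoder and decoder agree.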

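For each letter of “preciosa”, the table shows the symbols that can occur in that position given the previous letters, how many they are (chances), the bits needed to index one of them, and the running total of bits.

Symbol  Possible symbols                         Chances  Bits  Total
p       all 28 letters of the alphabet           28       5     5
r       a e i l n o r s u                        9        4     9
e       a e i o u                                5        3     12
c       a b c d f g h i j l m n ñ o p r s t v z  20       5     17
i       a e i l o u                              6        3     20
o       a n o p s t                              6        3     23
s       s                                        1        0     23
a       a i o                                    3        2     25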
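The whole table can be checked with the following Python sketch of the encoder and of a matching decoder. It rests on assumptions that are not in the text: the context model is a hypothetical dictionary `CONTEXT` that maps each prefix of “preciosa” to the symbols the table allows after it (a real coder would derive these lists from a Spanish dictionary); the first letter is indexed among the 27 letters a to z plus ñ, because the text counts 28 letters without listing them (⌈log2 27⌉ = 5 as well, so the bit counts do not change); and the decoder is told how many symbols to read, since the text does not say how the end of the word is signalled.

```python
from itertools import accumulate

# Assumption: 27-letter alphabet (a-z plus ñ); it also needs 5 bits, like 28.
ALPHABET = list("abcdefghijklmnñopqrstuvwxyz")

# Hypothetical context model: prefix -> symbols allowed next (from the table).
CONTEXT = {
    "": ALPHABET,
    "p": list("aeilnorsu"),
    "pr": list("aeiou"),
    "pre": list("abcdfghijlmnñoprstvz"),
    "prec": list("aeilou"),
    "preci": list("anopst"),
    "precio": list("s"),
    "precios": list("aio"),
}


def code_length(n: int) -> int:
    """ceil(log2(n)) computed with integers; 0 when only one symbol is possible."""
    return (n - 1).bit_length()


def encode(word: str) -> tuple[str, list[int]]:
    """Return the bit stream for `word` and the number of bits spent per symbol."""
    bits, lengths = [], []
    for i, symbol in enumerate(word):
        candidates = CONTEXT[word[:i]]  # symbols allowed after the prefix
        width = code_length(len(candidates))
        if width > 0:
            bits.append(format(candidates.index(symbol), f"0{width}b"))
        lengths.append(width)
    return "".join(bits), lengths


def decode(stream: str, n_symbols: int) -> str:
    """Rebuild the word using the same context model as the encoder."""
    word, pos = "", 0
    for _ in range(n_symbols):
        candidates = CONTEXT[word]
        width = code_length(len(candidates))
        index = int(stream[pos:pos + width], 2) if width > 0 else 0
        pos += width
        word += candidates[index]
    return word


stream, lengths = encode("preciosa")
print(lengths)                    # [5, 4, 3, 5, 3, 3, 0, 2]
print(list(accumulate(lengths)))  # [5, 9, 12, 17, 20, 23, 23, 25]
print(decode(stream, 8))          # preciosa
```

Because the decoder consults exactly the same context as the encoder, it always knows how many bits to read next, so the code words need no separators.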

The compression ratio has been 40/25 = 1.6:1, where 40 bits is what the 8 letters cost with a fixed-length code of 5 bits per letter.
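The same computation can be written for any word of n letters. In the formula below, C_i is notation introduced here (it does not appear in the text) for the set of symbols that can occur at position i given the previous i-1 letters; for “preciosa” the sizes |C_1|, ..., |C_8| are 28, 9, 5, 20, 6, 6, 1 and 3. The baseline assumed in the numerator is a fixed-length code of ⌈log2 28⌉ = 5 bits per letter.

```latex
R = \frac{n \lceil \log_2 28 \rceil}{\sum_{i=1}^{n} \lceil \log_2 |C_i| \rceil}
  = \frac{8 \cdot 5}{5 + 4 + 3 + 5 + 3 + 3 + 0 + 2}
  = \frac{40}{25} = 1.6
```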


Contents

1 The idea

2 Encoder (only one symbol)

3 Decoder (only one symbol)

Example