Probabilistic models

Juan Francisco Rodríguez Herrera
Vicente González Ruiz

July 10, 2017

Contents

1 Functionality
2 Static models
3 Adaptive models
4 Encoding
5 Decoding
6 Initially empty models
7 Encoder
8 Decoder
9 Models with memory
10 Encoder
11 Decoder

1 Functionality

2 Static models

Let’s go to the lab!

  1. Download http://www.ace.ual.es/~vruiz/doctorado/text-compression.tar.gz and make huff_s0.
  2. Encode each image of The Test Image Corpus using the codec. Continue filling the table:
        Codec | lena boats peppers zelda
    ----------+--------------------------
          RLE | ....  ....    ....  ....
      BWT+RLE | ....  ....    ....  ....
         LZSS | ....  ....    ....  ....
          LZW | ....  ....    ....  ....
      huff_s0 | ....  ....    ....  ....

  3. Check that the huff_s0 codec is lossless.
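
A small Python helper can be used to fill each cell of the table and to carry out step 3. The file names are only illustrative; the actual names depend on how each codec is invoked:

    import filecmp
    import os

    def report(original, compressed, decompressed):
        """Return the compressed size relative to the original size and
        whether the codec was lossless (decoded file identical to the original)."""
        ratio = os.path.getsize(compressed) / os.path.getsize(original)
        lossless = filecmp.cmp(original, decompressed, shallow=False)
        return ratio, lossless

    # Hypothetical file names:
    # print(report("lena.pgm", "lena.huff", "lena_decoded.pgm"))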

3 Adaptive models

4 Encoding

  1. Assign the same probability to all the symbols.
  2. While the input is not exhausted:
    1. Encode the next symbol.
    2. Update (increase) its probability (see the sketch below).
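
A minimal sketch of this loop in Python, assuming a byte alphabet and a hypothetical function encode(symbol, counts) that drives an entropy coder (arithmetic or adaptive Huffman) with the current counts:

    def adaptive_encode(data, encode):
        """Adaptive order-0 model: all the symbols start with the same count and
        the count of a symbol is increased right after it has been encoded."""
        counts = [1] * 256        # step 1: the same probability for all the symbols
        for s in data:            # step 2: while the input is not exhausted
            encode(s, counts)     # step 2.a: encode the next symbol
            counts[s] += 1        # step 2.b: update (increase) its probability

Because a count is increased only after the corresponding symbol has been coded, the decoder can perform exactly the same update and both models stay synchronized without transmitting any probabilities.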

5 Decoding

  1. Identical to step 1 of the encoder.
  2. While the input is not exhausted:
    1. Decode the next symbol.
    2. Identical to step 2.b of the encoder.
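
Under the same assumptions (decode(counts) is the hypothetical counterpart that returns the next symbol, and the number of symbols to decode is assumed to be known, e.g. from a header), the decoder mirrors the encoder:

    def adaptive_decode(n_symbols, decode):
        """Rebuild the data using exactly the same model updates as the encoder."""
        counts = [1] * 256            # step 1: identical to step 1 of the encoder
        output = bytearray()
        for _ in range(n_symbols):    # step 2: while the input is not exhausted
            s = decode(counts)        # step 2.a: decode the next symbol
            counts[s] += 1            # step 2.b: identical to step 2.b of the encoder
            output.append(s)
        return bytes(output)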

Let’s go to the lab!

  1. Download http://www.ace.ual.es/~vruiz/doctorado/progs.tar.gz and make huff_a0 and make arith_a0.
  2. Encode each image of The Test Image Corpus using these codecs. Continue filling the table:
        Codec | lena boats peppers zelda Average
    ----------+----------------------------------
            : |    :     :       :     :       :
      huff_s0 | ....  ....    ....  ....    ....
      huff_a0 | ....  ....    ....  ....    ....
     arith_a0 | ....  ....    ....  ....    ....

  3. Remember to check that both codecs are lossless.

6 Initially empty models

7 Encoder

  1. Set the probability of the ESC symbol to 1.0 (and the probability of the rest of the symbols to 0.0).
  2. While the input is not exhausted:
    1. s ← next symbol.
    2. If s has been found before, then:
      1. Encode s and output c(s).
    3. Else:
      1. Output c(ESC).
      2. Output a raw symbol s.
    4. Update p(s) (see the sketch below).
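
A sketch of this escape mechanism in Python. The hooks encode(symbol, counts) and output_raw(symbol) are hypothetical (they stand for an entropy coder and for the raw, uncoded output), and ESC is a sentinel that never appears in the input alphabet:

    ESC = object()   # escape: a pseudo-symbol that does not belong to the alphabet

    def empty_model_encode(data, encode, output_raw):
        """Encode with an initially empty model: only ESC is known at the start."""
        counts = {ESC: 1}                      # step 1: p(ESC) = 1, the rest 0
        for s in data:                         # step 2
            if s in counts:                    # step 2.2: s has been found before
                encode(s, counts)              #   output c(s)
            else:                              # step 2.3: first occurrence of s
                encode(ESC, counts)            #   output c(ESC)
                output_raw(s)                  #   output a raw symbol s
            counts[s] = counts.get(s, 0) + 1   # step 2.4: update p(s)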

8 Decoder

  1. Identical to step 1 of the encoder.
  2. While the input is not exhausted:
    1. c(s) ← next code-word.
    2. Decode s.
    3. If s = ESC, then:
      1. Input a raw symbol s.
    4. Update p(s).
    5. Output s.
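
The corresponding decoder, under the same assumptions (decode(counts) returns the next decoded symbol, or the ESC sentinel when an escape code-word is found, and input_raw() reads a raw symbol):

    ESC = object()   # sentinel for the escape symbol, as in the encoder sketch

    def empty_model_decode(n_symbols, decode, input_raw):
        """Decode data that was encoded with an initially empty model."""
        counts = {ESC: 1}                      # step 1: identical to the encoder
        output = []
        for _ in range(n_symbols):             # step 2
            s = decode(counts)                 # steps 2.1 and 2.2
            if s is ESC:                       # step 2.3: an unseen symbol follows
                s = input_raw()                #   read it as a raw symbol
            counts[s] = counts.get(s, 0) + 1   # step 2.4: update p(s)
            output.append(s)                   # step 2.5: output s
        return output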

Let’s go to the lab!

  1. make arith-n-c and make arith-n-d.
  2.         Codec | lena boats peppers zelda Average
     --------------+----------------------------------
                 : |    :     :       :     :       :
    arith-n-c -o 0 | ....  ....    ....  ....    .... #1
     BWT | ahuff_0 | ....  ....    ....  ....    ....
          bzip2 -9 | ....  ....    ....  ....    .... #2
           gzip -9 | ....  ....    ....  ....    .... #3

    #1 -> Similar to arith_a0 but using an initially
          empty model.

    #2 -> Similar to BWT | ahuff_0

    #3 -> Similar to LZ77 + ahuff_0

  3. Check the reversibility.

9 Models with memory

10 Encoder

  1. Create an empty model for every context 𝒞[i], 0 ≤ i ≤ k.
  2. Create a non-empty model for the context of order -1.
  3. While the input is not exhausted:
    1. s ← next symbol.
    2. i ← k (except for the first symbol, where i ← 0).
    3. While p(s|𝒞[i]) = 0 (it is the first time that s follows the context 𝒞[i]):
      1. Output c(ESC|𝒞[i]).
      2. Update p(ESC|𝒞[i]).
      3. Update p(s|𝒞[i]) (insert s into the 𝒞[i] context).
      4. i ← i - 1.
    4. Output c(s|𝒞[i]). The symbols found in the contexts of order > i must be excluded from the current (𝒞[i]) context, because s is none of them.
    5. If i ≥ 0, update p(s|𝒞[i]) (see the sketch below).
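
A compact sketch of this context-based encoder in Python (essentially a PPM-style scheme; the exclusion of step 3.4 is omitted for brevity). The hook encode(symbol, counts) again stands for a hypothetical arithmetic coder driven by the counts of a single context:

    ESC = object()                  # escape: a pseudo-symbol outside the alphabet

    def context_encode(data, k, alphabet, encode):
        """Order-k context modeling with escapes (exclusion not implemented)."""
        models = {}                                    # context (tuple) -> symbol counts
        minus_1 = {a: 1 for a in alphabet}             # step 2: non-empty order -1 model

        history = []
        for s in data:                                 # step 3.1: s <- next symbol
            i = min(k, len(history))                   # step 3.2, limited by the history
            while i >= 0:
                ctx = tuple(history[len(history) - i:])
                m = models.setdefault(ctx, {ESC: 1})   # step 1: contexts start empty
                if s in m:                             # p(s|C[i]) != 0: stop escaping
                    break
                encode(ESC, m)                         # step 3.3.1: output c(ESC|C[i])
                m[ESC] += 1                            # step 3.3.2: update p(ESC|C[i])
                m[s] = 1                               # step 3.3.3: insert s into C[i]
                i -= 1                                 # step 3.3.4
            if i >= 0:
                encode(s, m)                           # step 3.4: output c(s|C[i])
                m[s] += 1                              # step 3.5: update p(s|C[i])
            else:
                encode(s, minus_1)                     # fall back to the order -1 model
            history.append(s)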

Example

  Input | Output                                           | Prob. of the output | Related contexts
  ------+--------------------------------------------------+---------------------+--------------------------------
    a   | c_{M[𝒞[0]]}(ESC)·c_{M[𝒞[-1]]}(a)                 | 1 · 1/(r+1)         | M[𝒞[0]] = {ESC:2, a:1}
    b   | c_{M[a]}(ESC)·c_{M[𝒞[0]]}(ESC)·c_{M[𝒞[-1]]}(b)   | 1 · 2/3 · 1/r       | M[a] = {ESC:1, b:1}
        |                                                  |                     | M[𝒞[0]] = {ESC:3, a:1, b:1}
    a   | c_{M[b]}(ESC)·c_{M[𝒞[0]]}(a)                     | 1 · 1/3             | M[b] = {ESC:1, a:1}
        |                                                  |                     | M[𝒞[0]] = {ESC:3, a:2, b:1}
    b   | c_{M[a]}(b)                                      | 1/2                 | M[a] = {ESC:1, b:2}
    c   | c_{M[b]}(ESC)·c_{M[𝒞[0]]}(ESC)·c_{M[𝒞[-1]]}(c)   | 1/2 · 1/4 · 1/(r-1) | M[b] = {ESC:1, a:1, c:1}
        |                                                  |                     | M[𝒞[0]] = {ESC:4, a:2, b:1, c:1}
    b   | c_{M[c]}(ESC)·c_{M[𝒞[0]]}(b)                     | 1 · 1/5             | M[c] = {ESC:1, b:1}
        |                                                  |                     | M[𝒞[0]] = {ESC:4, a:2, b:2, c:1}
    a   | c_{M[b]}(a)                                      | 1/3                 | M[b] = {ESC:1, a:2, c:1}
    b   | c_{M[a]}(b)                                      | 2/3                 | M[a] = {ESC:1, b:3}
    a   | c_{M[b]}(a)                                      | 2/4                 | M[b] = {ESC:1, a:3, c:1}
    b   | c_{M[a]}(b)                                      | 3/4                 | M[a] = {ESC:1, b:4}
    a   | c_{M[b]}(a)                                      | 3/5                 | M[b] = {ESC:1, a:4, c:1}
    a   | c_{M[a]}(ESC)·c_{M[𝒞[0]]}(a)                     | 1/5 · 2/4           | M[a] = {ESC:1, a:1, b:4}
        |                                                  |                     | M[𝒞[0]] = {ESC:4, a:3, b:2, c:1}
    a   | c_{M[a]}(a)                                      | 1/6                 | M[a] = {ESC:1, a:2, b:4}
    a   | c_{M[a]}(a)                                      | 2/7                 | M[a] = {ESC:1, a:3, b:4}
    a   | c_{M[a]}(a)                                      | 3/8                 | M[a] = {ESC:1, a:4, b:4}
    a   | c_{M[a]}(a)                                      | 4/9                 | M[a] = {ESC:1, a:5, b:4}
    a   | c_{M[a]}(a)                                      | 5/10                | M[a] = {ESC:1, a:6, b:4}

Figure 1: An example of context-based statistical encoding.

11 Decoder

  1. Identical to steps 1 and 2 of the encoder.
  2. While the input is not exhausted:
    1. i ← k (except for the first symbol, where i ← 0).
    2. s ← next decoded symbol.
    3. While s = ESC:
      1. Update p(ESC|𝒞[i]).
      2. i ← i - 1.
      3. s ← next decoded symbol.
    4. Update p(s|𝒞[i]).
    5. While i < k:
      1. i ← i + 1.
      2. Update p(s|𝒞[i]) (insert s into the 𝒞[i] context).
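
A matching decoder sketch under the same assumptions (decode(counts) is the hypothetical counterpart of encode(); it returns the decoded symbol, or the ESC sentinel when an escape code-word is found):

    ESC = object()                  # sentinel for the escape symbol, as in the encoder sketch

    def context_decode(n_symbols, k, alphabet, decode):
        """Rebuild the data by mirroring every model update of the encoder."""
        models = {}                                    # context (tuple) -> symbol counts
        minus_1 = {a: 1 for a in alphabet}             # the order -1 model is never empty

        history = []
        for _ in range(n_symbols):                     # step 2
            i = min(k, len(history))                   # step 2.1, limited by the history
            escaped = []                               # contexts abandoned through ESC
            while True:
                if i >= 0:
                    ctx = tuple(history[len(history) - i:])
                    m = models.setdefault(ctx, {ESC: 1})
                else:
                    m = minus_1                        # fall back to the order -1 model
                s = decode(m)                          # steps 2.2 and 2.3.3
                if s is not ESC:
                    break
                m[ESC] += 1                            # step 2.3.1: update p(ESC|C[i])
                escaped.append(m)
                i -= 1                                 # step 2.3.2
            if i >= 0:
                m[s] += 1                              # step 2.4: update p(s|C[i])
            for m in escaped:                          # step 2.5: insert s into the
                m[s] = 1                               # contexts that escaped
            history.append(s)
        return history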

Let’s go to the lab!

  1.         Codec | lena boats peppers zelda Average
     --------------+----------------------------------
                 : |    :     :       :     :       :
    arith-n-c -o 1 | ....  ....    ....  ....    ....
    arith-n-c -o 2 | ....  ....    ....  ....    ....
    arith-n-c -o 3 | ....  ....    ....  ....    ....

  2. Check the reversibility.
