Probabilistic models

Juan Francisco Rodríguez Herrera
Vicente González Ruiz

July 10, 2017

Contents

1 Functionality
2 Static models
3 Adaptive models
4 Encoding
5 Decoding
6 Initially empty models
7 Encoder
8 Decoder
9 Models with memory
10 Encoder
11 Decoder

1 Functionality

2 Static models

Let’s go to the lab!

  1. Download http://www.ace.ual.es/~vruiz/doctorado/text-compression.tar.gz and make huff_s0.
  2. Encode each image of The Test Image Corpus using the codec. Continue filling the table:
        Codec | lena boats peppers zelda
    ----------+--------------------------
          RLE | ....  ....    ....  ....
      BWT+RLE | ....  ....    ....  ....
         LZSS | ....  ....    ....  ....
          LZW | ....  ....    ....  ....
      huff_s0 | ....  ....    ....  ....

  3. Check that the huff_s0 codec is lossless.
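
A small Python helper can be used to fill each cell of the table and to carry out step 3. The file names are only illustrative; the actual names depend on how each codec is invoked:

    import filecmp
    import os

    def report(original, compressed, decompressed):
        """Return the compressed size relative to the original size and
        whether the codec was lossless (decoded file identical to the original)."""
        ratio = os.path.getsize(compressed) / os.path.getsize(original)
        lossless = filecmp.cmp(original, decompressed, shallow=False)
        return ratio, lossless

    # Hypothetical file names:
    # print(report("lena.pgm", "lena.huff", "lena_decoded.pgm"))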

3 Adaptive models

4 Encoding

  1. Assign the same probability to all the symbols.
  2. While the input is not exhausted:
    1. Encode the next symbol.
    2. Update (increase) its probability (see the sketch below).
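
A minimal sketch of this loop in Python, assuming a byte alphabet and a hypothetical function encode(symbol, counts) that drives an entropy coder (arithmetic or adaptive Huffman) with the current counts:

    def adaptive_encode(data, encode):
        """Adaptive order-0 model: all the symbols start with the same count and
        the count of a symbol is increased right after it has been encoded."""
        counts = [1] * 256        # step 1: the same probability for all the symbols
        for s in data:            # step 2: while the input is not exhausted
            encode(s, counts)     # step 2.a: encode the next symbol
            counts[s] += 1        # step 2.b: update (increase) its probability

Because a count is increased only after the corresponding symbol has been coded, the decoder can perform exactly the same update and both models stay synchronized without transmitting any probabilities.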

5 Decoding

  1. Identical to step 1 of the encoder.
  2. While the input is not exhausted:
    1. Decode the next symbol.
    2. Identical to step 2.b of the encoder.
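
Under the same assumptions (decode(counts) is the hypothetical counterpart that returns the next symbol, and the number of symbols to decode is assumed to be known, e.g. from a header), the decoder mirrors the encoder:

    def adaptive_decode(n_symbols, decode):
        """Rebuild the data using exactly the same model updates as the encoder."""
        counts = [1] * 256            # step 1: identical to step 1 of the encoder
        output = bytearray()
        for _ in range(n_symbols):    # step 2: while the input is not exhausted
            s = decode(counts)        # step 2.a: decode the next symbol
            counts[s] += 1            # step 2.b: identical to step 2.b of the encoder
            output.append(s)
        return bytes(output)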

Let’s go to the lab!

  1. Download http://www.ace.ual.es/~vruiz/doctorado/progs.tar.gz and make huff_a0 and make arith_a0.
  2. Encode each image of The Test Image Corpus using these codecs. Continue filling the table:
        Codec | lena boats peppers zelda Average
    ----------+----------------------------------
            : |    :     :       :     :       :
      huff_s0 | ....  ....    ....  ....    ....
      huff_a0 | ....  ....    ....  ....    ....
     arith_a0 | ....  ....    ....  ....    ....

  3. Remember to check that both codecs are lossless.

6 Initially empty models

7 Encoder

  1. Set the probability of the ESC symbol to 1.0 (and the probability of the rest of the symbols to 0.0).
  2. While the input is not exhausted:
    1. s ← next symbol.
    2. If s has been found before, then:
      1. Encode s and output c(s).
    3. Else:
      1. Output c(ESC).
      2. Output a raw symbol s.
    4. Update p(s) (see the sketch below).
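
A sketch of this escape mechanism in Python. The hooks encode(symbol, counts) and output_raw(symbol) are hypothetical (they stand for an entropy coder and for the raw, uncoded output), and ESC is a sentinel that never appears in the input alphabet:

    ESC = object()   # escape: a pseudo-symbol that does not belong to the alphabet

    def empty_model_encode(data, encode, output_raw):
        """Encode with an initially empty model: only ESC is known at the start."""
        counts = {ESC: 1}                      # step 1: p(ESC) = 1, the rest 0
        for s in data:                         # step 2
            if s in counts:                    # step 2.2: s has been found before
                encode(s, counts)              #   output c(s)
            else:                              # step 2.3: first occurrence of s
                encode(ESC, counts)            #   output c(ESC)
                output_raw(s)                  #   output a raw symbol s
            counts[s] = counts.get(s, 0) + 1   # step 2.4: update p(s)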

8 Decoder

  1. Identical to step 1 of the encoder.
  2. While the input is not exhausted:
    1. c(s) ← next code-word.
    2. Decode s.
    3. If s = ESC, then:
      1. Input a raw symbol s.
    4. Update p(s).
    5. Output s.
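
The corresponding decoder, under the same assumptions (decode(counts) returns the next decoded symbol, or the ESC sentinel when an escape code-word is found, and input_raw() reads a raw symbol):

    ESC = object()   # sentinel for the escape symbol, as in the encoder sketch

    def empty_model_decode(n_symbols, decode, input_raw):
        """Decode data that was encoded with an initially empty model."""
        counts = {ESC: 1}                      # step 1: identical to the encoder
        output = []
        for _ in range(n_symbols):             # step 2
            s = decode(counts)                 # steps 2.1 and 2.2
            if s is ESC:                       # step 2.3: an unseen symbol follows
                s = input_raw()                #   read it as a raw symbol
            counts[s] = counts.get(s, 0) + 1   # step 2.4: update p(s)
            output.append(s)                   # step 2.5: output s
        return output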

Let’s go to the lab!

  1. make arith-n-c and make arith-n-d.
  2.         Codec | lena boats peppers zelda Average
     --------------+----------------------------------
                 : |    :     :       :     :       :
    arith-n-c -o 0 | ....  ....    ....  ....    .... #1
     BWT | ahuff_0 | ....  ....    ....  ....    ....
          bzip2 -9 | ....  ....    ....  ....    .... #2
           gzip -9 | ....  ....    ....  ....    .... #3

    #1 -> Similar to arith_a0 but using an initially
          empty model.

    #2 -> Similar to BWT | ahuff_0

    #3 -> Similar to LZ77 + ahuff_0

  3. Check the reversibility.

9 Models with memory

10 Encoder

  1. Create an empty model for every context 𝒞[i], 0 ≤ i ≤ k.
  2. Create a non-empty model for the context of order -1.
  3. While the input is not exhausted:
    1. s ← next symbol.
    2. i ← k (except for the first symbol, where i ← 0).
    3. While p(s|𝒞[i]) = 0 (it is the first time that s follows the context 𝒞[i]):
      1. Output c(ESC|𝒞[i]).
      2. Update p(ESC|𝒞[i]).
      3. Update p(s|𝒞[i]) (insert s into the 𝒞[i] context).
      4. i ← i - 1.
    4. Output c(s|𝒞[i]). The symbols found in the contexts of order > i must be excluded from the current (𝒞[i]) context, because s is none of them.
    5. If i ≥ 0, update p(s|𝒞[i]) (see the sketch below).
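
A compact sketch of this context-based encoder in Python (essentially a PPM-style scheme; the exclusion of step 3.4 is omitted for brevity). The hook encode(symbol, counts) again stands for a hypothetical arithmetic coder driven by the counts of a single context:

    ESC = object()                  # escape: a pseudo-symbol outside the alphabet

    def context_encode(data, k, alphabet, encode):
        """Order-k context modeling with escapes (exclusion not implemented)."""
        models = {}                                    # context (tuple) -> symbol counts
        minus_1 = {a: 1 for a in alphabet}             # step 2: non-empty order -1 model

        history = []
        for s in data:                                 # step 3.1: s <- next symbol
            i = min(k, len(history))                   # step 3.2, limited by the history
            while i >= 0:
                ctx = tuple(history[len(history) - i:])
                m = models.setdefault(ctx, {ESC: 1})   # step 1: contexts start empty
                if s in m:                             # p(s|C[i]) != 0: stop escaping
                    break
                encode(ESC, m)                         # step 3.3.1: output c(ESC|C[i])
                m[ESC] += 1                            # step 3.3.2: update p(ESC|C[i])
                m[s] = 1                               # step 3.3.3: insert s into C[i]
                i -= 1                                 # step 3.3.4
            if i >= 0:
                encode(s, m)                           # step 3.4: output c(s|C[i])
                m[s] += 1                              # step 3.5: update p(s|C[i])
            else:
                encode(s, minus_1)                     # fall back to the order -1 model
            history.append(s)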

Example

  Input | Output                                           | Prob. of the output | Related contexts
  ------+--------------------------------------------------+---------------------+--------------------------------
    a   | c_{M[𝒞[0]]}(ESC)·c_{M[𝒞[-1]]}(a)                 | 1 · 1/(r+1)         | M[𝒞[0]] = {ESC:2, a:1}
    b   | c_{M[a]}(ESC)·c_{M[𝒞[0]]}(ESC)·c_{M[𝒞[-1]]}(b)   | 1 · 2/3 · 1/r       | M[a] = {ESC:1, b:1}
        |                                                  |                     | M[𝒞[0]] = {ESC:3, a:1, b:1}
    a   | c_{M[b]}(ESC)·c_{M[𝒞[0]]}(a)                     | 1 · 1/3             | M[b] = {ESC:1, a:1}
        |                                                  |                     | M[𝒞[0]] = {ESC:3, a:2, b:1}
    b   | c_{M[a]}(b)                                      | 1/2                 | M[a] = {ESC:1, b:2}
    c   | c_{M[b]}(ESC)·c_{M[𝒞[0]]}(ESC)·c_{M[𝒞[-1]]}(c)   | 1/2 · 1/4 · 1/(r-1) | M[b] = {ESC:1, a:1, c:1}
        |                                                  |                     | M[𝒞[0]] = {ESC:4, a:2, b:1, c:1}
    b   | c_{M[c]}(ESC)·c_{M[𝒞[0]]}(b)                     | 1 · 1/5             | M[c] = {ESC:1, b:1}
        |                                                  |                     | M[𝒞[0]] = {ESC:4, a:2, b:2, c:1}
    a   | c_{M[b]}(a)                                      | 1/3                 | M[b] = {ESC:1, a:2, c:1}
    b   | c_{M[a]}(b)                                      | 2/3                 | M[a] = {ESC:1, b:3}
    a   | c_{M[b]}(a)                                      | 2/4                 | M[b] = {ESC:1, a:3, c:1}
    b   | c_{M[a]}(b)                                      | 3/4                 | M[a] = {ESC:1, b:4}
    a   | c_{M[b]}(a)                                      | 3/5                 | M[b] = {ESC:1, a:4, c:1}
    a   | c_{M[a]}(ESC)·c_{M[𝒞[0]]}(a)                     | 1/5 · 2/4           | M[a] = {ESC:1, a:1, b:4}
        |                                                  |                     | M[𝒞[0]] = {ESC:4, a:3, b:2, c:1}
    a   | c_{M[a]}(a)                                      | 1/6                 | M[a] = {ESC:1, a:2, b:4}
    a   | c_{M[a]}(a)                                      | 2/7                 | M[a] = {ESC:1, a:3, b:4}
    a   | c_{M[a]}(a)                                      | 3/8                 | M[a] = {ESC:1, a:4, b:4}
    a   | c_{M[a]}(a)                                      | 4/9                 | M[a] = {ESC:1, a:5, b:4}
    a   | c_{M[a]}(a)                                      | 5/10                | M[a] = {ESC:1, a:6, b:4}

Figure 1: An example of context-based statistical encoding.

11 Decoder

  1. Identical to steps 1 and 2 of the encoder.
  2. While the input is not exhausted:
    1. i ← k (except for the first symbol, where i ← 0).
    2. s ← next decoded symbol.
    3. While s = ESC:
      1. Update p(ESC|𝒞[i]).
      2. i ← i - 1.
      3. s ← next decoded symbol.
    4. Update p(s|𝒞[i]).
    5. While i < k:
      1. i ← i + 1.
      2. Update p(s|𝒞[i]) (insert s into the 𝒞[i] context).
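
A matching decoder sketch under the same assumptions (decode(counts) is the hypothetical counterpart of encode(); it returns the decoded symbol, or the ESC sentinel when an escape code-word is found):

    ESC = object()                  # sentinel for the escape symbol, as in the encoder sketch

    def context_decode(n_symbols, k, alphabet, decode):
        """Rebuild the data by mirroring every model update of the encoder."""
        models = {}                                    # context (tuple) -> symbol counts
        minus_1 = {a: 1 for a in alphabet}             # the order -1 model is never empty

        history = []
        for _ in range(n_symbols):                     # step 2
            i = min(k, len(history))                   # step 2.1, limited by the history
            escaped = []                               # contexts abandoned through ESC
            while True:
                if i >= 0:
                    ctx = tuple(history[len(history) - i:])
                    m = models.setdefault(ctx, {ESC: 1})
                else:
                    m = minus_1                        # fall back to the order -1 model
                s = decode(m)                          # steps 2.2 and 2.3.3
                if s is not ESC:
                    break
                m[ESC] += 1                            # step 2.3.1: update p(ESC|C[i])
                escaped.append(m)
                i -= 1                                 # step 2.3.2
            if i >= 0:
                m[s] += 1                              # step 2.4: update p(s|C[i])
            for m in escaped:                          # step 2.5: insert s into the
                m[s] = 1                               # contexts that escaped
            history.append(s)
        return history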

Let’s go to the lab!

  1.         Codec | lena boats peppers zelda Average
     --------------+----------------------------------
                 : |    :     :       :     :       :
    arith-n-c -o 1 | ....  ....    ....  ....    ....
    arith-n-c -o 2 | ....  ....    ....  ....    ....
    arith-n-c -o 3 | ....  ....    ....  ....    ....

  2. Check the reversibility.
