The Context-based Text Predictive transform

Juan Francisco Rodríguez Herrera
Vicente González Ruiz

September 12, 2016

Contents

1 An MTF transform that uses a probabilistic model
2 0-order encoder
3 0-order decoder
4 N-order encoder
5 N-order decoder

1 An MTF transform that uses a probabilistic model

The MTF transform replaces each symbol by its position in a list that is reordered by recency. The Text Predictive transform studied here uses a probabilistic model instead: every node of the list also stores a count of its symbol, the list is kept sorted by these counts, and the position of each symbol in its list (the prediction error) is what is output. The better the counts predict the input, the closer the prediction errors stay to 0.

2 0-order encoder

  1. Step 1 of the MTF transform, although now every node of the list also stores a count of its symbol.
  2. While the input is not exhausted:
    1. s ← next input symbol.
    2. c ← position of s in L (the prediction error).
    3. Output c (in the example below, a symbol whose count is still 0 is output verbatim).
    4. Update the count of L[c] (the count of s) and keep L sorted.

Example

                 L  | Input | Output
 -------------------+-------+-------
  a, 0  b, 0  c, 0  |   b   |   b
  b, 1  a, 0  c, 0  |   a   |   a
  a, 1  b, 1  c, 0  |   a   |   0
  a, 2  b, 1  c, 0  |   a   |   0
  a, 3  b, 1  c, 0  |   a   |   0
  a, 4  b, 1  c, 0  |   a   |   0
  a, 5  b, 1  c, 0  |   a   |   0
  a, 6  b, 1  c, 0  |   b   |   1
  a, 6  b, 2  c, 0  |   b   |   1
  a, 6  b, 3  c, 0  |   a   |   0
  a, 7  b, 3  c, 0  |   b   |   1
  a, 7  b, 4  c, 0  |   a   |   0
  a, 8  b, 4  c, 0  |   a   |   0
  a, 9  b, 4  c, 0  |   c   |   c
  a, 9  b, 4  c, 1  |   a   |   0
  a,10  b, 4  c, 1  |   a   |   0
  a,11  b, 4  c, 1  |   b   |   1
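
A minimal Python sketch of this encoder (bump and tpt0_encode are illustrative names, not part of the lab's tpt program). It assumes the two conventions visible in the example: a symbol whose count is still 0 is output verbatim, and an updated node moves up past every node whose count is smaller than or equal to its own, which reproduces the tie-breaking of the table.

  def bump(L, c):
      """Increment the count of the node at position c and bubble it up
      past every node with a smaller or equal count."""
      sym, count = L[c]
      L[c] = (sym, count + 1)
      while c > 0 and L[c - 1][1] <= L[c][1]:
          L[c - 1], L[c] = L[c], L[c - 1]
          c -= 1

  def tpt0_encode(message, alphabet):
      L = [(s, 0) for s in alphabet]  # step 1: full list, every count is 0
      codes = []
      for s in message:               # step 2
          c = next(i for i, (sym, _) in enumerate(L) if sym == s)  # step 2.2
          codes.append(s if L[c][1] == 0 else c)                   # step 2.3
          bump(L, c)                                               # step 2.4
      return codes

  # The Input column of the example produces its Output column:
  print(tpt0_encode("baaaaaabbabaacaab", "abc"))
  # ['b', 'a', 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 'c', 0, 0, 1]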

3 0-order decoder

  1. Step 1 of the encoder.
  2. While the input is not exhausted:
    1. c ← next input code.
    2. s ← L[c].
    3. Output s.
    4. Step 2.4 of the encoder.
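
A matching decoder sketch, reusing bump() and tpt0_encode() from the encoder sketch above; since the list evolves identically at both sides, the received position (or verbatim symbol) is enough to recover s.

  def tpt0_decode(codes, alphabet):
      L = [(s, 0) for s in alphabet]  # step 1 of the encoder
      message = ""
      for code in codes:              # step 2
          if isinstance(code, str):   # a verbatim (count-0) symbol
              c = next(i for i, (sym, _) in enumerate(L) if sym == code)
          else:                       # a position in L
              c = code
          message += L[c][0]          # steps 2.2 and 2.3
          bump(L, c)                  # step 2.4 of the encoder
      return message

  # Round trip:
  assert tpt0_decode(tpt0_encode("baaaaaabbabaacaab", "abc"), "abc") == "baaaaaabbabaacaab"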

Let’s go to the lab!

  1. make tpt (Notice that tpt should be invoked using the syntax tpt e <order> < input_file > output_file).
  2.                             Codec | lena boats peppers zelda Average  
    -----------------------------------+--------------------------------  
                                     : |    :     :      :     :       :  
                tpt 0 | arith-n-c -o 0 | ....  ....   ....  ....    ....  
          tpt 0 | rle | arith-n-c -o 0 | ....  ....   ....  ....    ....  
          bwt | tpt 0 | arith-n-c -o 0 | ....  ....   ....  ....    ....  
    bwt | tpt 0 | rle | arith-n-c -o 0 | ....  ....   ....  ....    ....

  3. Are they lossless?

4 N-order encoder

  1. Let 𝒞[i] be the context (the previous i symbols) of s and L𝒞[i] the list for that context. If i > 0, the lists are initially empty; for i = 0, the list is full and the count of every node is 0.
  2. Let k be the order of the prediction.
  3. Let H be a list of already tested symbols. All symbols in H must be different.
  4. While the input is not exhausted:
    1. s ← the next input symbol.
    2. i ← k (except for the first symbol, where i ← 0).
    3. While s ∉ L𝒞[i]:
      1. H ← reduce(H ∪ L𝒞[i]) (reduce() deletes the repeated nodes).
      2. Update the count of s in L𝒞[i] and keep it sorted.
      3. i ← i − 1.
    4. Let c be the position of s in L𝒞[i].
    5. c ← c + (the number of symbols of H that are not in L𝒞[i]). In this way, the decoder will know the length of the context where s happens, and the same symbol is not counted twice.
    6. Output c.
    7. Update the count of s in L𝒞[i] and keep it sorted.
    8. H ← ∅.

Example (k = 1)

  Input | Output | Related contexts
 -------+--------+---------------------------------
    a   |   a    | L = {a,1}
    b   |   b    | La = {b,1}, L = {b,1 a,1}
    a   |   1    | Lb = {a,1}, L = {a,2 b,1}
    b   |   0    | La = {b,2}
    c   |   c    | Lb = {c,1 a,1}, L = {a,2 c,1 b,1}
    b   |   2    | Lc = {b,1}, L = {b,2 a,2 c,1}
    a   |   1    | Lb = {a,2 c,1}
    b   |   0    | La = {b,3}
    a   |   0    | Lb = {a,3 c,1}
    b   |   0    | La = {b,4}
    a   |   0    | Lb = {a,4 c,1}
    a   |   1    | La = {b,4 a,1}, L = {a,3 b,2 c,1}
    a   |   1    | La = {b,4 a,2}
    a   |   1    | La = {b,4 a,3}
    a   |   1    | La = {a,4 b,4}
    a   |   0    | La = {a,5 b,4}
    a   |   0    | La = {a,6 b,4}
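
A possible Python sketch of this encoder (find, bump_sym and tptN_encode are illustrative names). It reads step 4.5, "do not count the same symbol twice", as: c is the position of s in reduce(H + L𝒞[i]), with the symbols of H first; this is the reading that matches step 2.4 of the decoder below. As in the example, every list starts empty and a never-seen symbol is output verbatim.

  from collections import defaultdict

  def find(L, s):
      """Position of symbol s in the list L, or None."""
      for i, (sym, _) in enumerate(L):
          if sym == s:
              return i
      return None

  def bump_sym(L, s):
      """Insert s with count 1, or increment its count, and bubble it up
      past every node with a smaller or equal count."""
      i = find(L, s)
      if i is None:
          L.append((s, 0))
          i = len(L) - 1
      sym, count = L[i]
      L[i] = (sym, count + 1)
      while i > 0 and L[i - 1][1] <= L[i][1]:
          L[i - 1], L[i] = L[i], L[i - 1]
          i -= 1

  def tptN_encode(message, k):
      lists = defaultdict(list)        # step 1: one list per context
      codes = []
      for n, s in enumerate(message):  # step 4.1
          H = []                       # steps 3 and 4.8
          i = min(k, n)                # step 4.2
          while i > 0 and find(lists[message[n - i:n]], s) is None:  # step 4.3
              L = lists[message[n - i:n]]
              for sym, _ in L:         # step 4.3.1: H <- reduce(H + L)
                  if sym not in H:
                      H.append(sym)
              bump_sym(L, s)           # step 4.3.2
              i -= 1                   # step 4.3.3
          L = lists[message[n - i:n]]
          c = find(L, s)
          if c is None:                # s was never seen before:
              codes.append(s)          # output it verbatim, as in the example
          else:                        # steps 4.4 to 4.6: the position of s
                                       # in reduce(H + L)
              codes.append(len(H) + sum(1 for sym, _ in L[:c] if sym not in H))
          bump_sym(L, s)               # step 4.7
      return codes

  # The Input column of the example (k = 1) produces its Output column:
  print(tptN_encode("ababcbababaaaaaaa", 1))
  # ['a', 'b', 1, 0, 'c', 2, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0]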

5 N-order decoder

  1. Steps 1, 2 and 3 of the encoder.
  2. While the input is not exhausted:
    1. c ← the next input code.
    2. i ← k (except for the first symbol, where i ← 0).
    3. While L𝒞[i][c] does not exist:
      1. H ← reduce(H ∪ L𝒞[i]).
      2. i ← i − 1.
    4. s ← reduce(H ∪ L𝒞[i])[c].
    5. Update the count of s in L𝒞[i] and keep it sorted.
    6. While i < k:
      1. i ← i + 1.
      2. Insert the symbol s in L𝒞[i].
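
The corresponding decoder sketch, reusing find() and bump_sym() from the encoder sketch. The test of step 2.3 becomes "c does not address a symbol of reduce(H + L𝒞[i])", and step 2.6 inserts s into the escaped (longer) contexts, leaving every list exactly as the encoder left it.

  def tptN_decode(codes, k):
      lists = defaultdict(list)
      message = ""
      for n, code in enumerate(codes):  # step 2.1
          H = []
          i = min(k, n)                 # step 2.2
          while i > 0:                  # step 2.3
              L = lists[message[n - i:]]
              if isinstance(code, str):
                  if find(L, code) is not None:  # the verbatim symbol is here
                      break
              elif code < len(H) + sum(1 for sym, _ in L if sym not in H):
                  break                 # c addresses a symbol of reduce(H + L)
              for sym, _ in L:          # step 2.3.1
                  if sym not in H:
                      H.append(sym)
              i -= 1                    # step 2.3.2
          L = lists[message[n - i:]]
          if isinstance(code, str):     # a verbatim symbol
              s = code
          else:                         # step 2.4: s <- reduce(H + L)[c]
              merged = H + [sym for sym, _ in L if sym not in H]
              s = merged[code]
          bump_sym(L, s)                # step 2.5
          for j in range(i + 1, min(k, n) + 1):    # step 2.6: rebuild the
              bump_sym(lists[message[n - j:]], s)  # escaped contexts
          message += s
      return message

  # Round trip:
  codes = tptN_encode("ababcbababaaaaaaa", 1)
  assert tptN_decode(codes, 1) == "ababcbababaaaaaaa"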

Let’s go to the lab!

  1.                             Codec | lena boats peppers zelda Average  
    -----------------------------------+--------------------------------  
                                     : |    :     :      :     :       :  
                tpt 1 | arith-n-c -o 0 | ....  ....   ....  ....    ....  
                tpt 2 | arith-n-c -o 0 | ....  ....   ....  ....    ....  
          bwt | tpt 1 | arith-n-c -o 0 | ....  ....   ....  ....    ....