The Context-based Text Predictive transform

Juan Francisco Rodríguez Herrera
Vicente González Ruiz

September 12, 2016

Contents

1 An MTF transform that uses a probabilistic model
2 0-order encoder
3 0-order decoder
4 N-order encoder
5 N-order decoder

1 An MTF transform that uses a probabilistic model

The MTF transform replaces each symbol by its position in a list that is reordered by recency. The Text Predictive transform studied here uses a probabilistic model instead: every node of the list also stores a count of its symbol, the list is kept sorted by these counts, and the position of each symbol in its list (the prediction error) is what is output. The better the counts predict the input, the closer the prediction errors stay to 0.

2 0-order encoder

  1. Step 1 of the MTF transform, although now every node of the list also stores a count of its symbol.
  2. While the input is not exhausted:
    1. s ← next input symbol.
    2. c ← position of s in L (the prediction error).
    3. Output c (in the example below, a symbol whose count is still 0 is output verbatim).
    4. Update the count of L[c] (the count of s) and keep L sorted.

Example

                 L  | Input | Output
 -------------------+-------+-------
  a, 0  b, 0  c, 0  |   b   |   b
  b, 1  a, 0  c, 0  |   a   |   a
  a, 1  b, 1  c, 0  |   a   |   0
  a, 2  b, 1  c, 0  |   a   |   0
  a, 3  b, 1  c, 0  |   a   |   0
  a, 4  b, 1  c, 0  |   a   |   0
  a, 5  b, 1  c, 0  |   a   |   0
  a, 6  b, 1  c, 0  |   b   |   1
  a, 6  b, 2  c, 0  |   b   |   1
  a, 6  b, 3  c, 0  |   a   |   0
  a, 7  b, 3  c, 0  |   b   |   1
  a, 7  b, 4  c, 0  |   a   |   0
  a, 8  b, 4  c, 0  |   a   |   0
  a, 9  b, 4  c, 0  |   c   |   c
  a, 9  b, 4  c, 1  |   a   |   0
  a,10  b, 4  c, 1  |   a   |   0
  a,11  b, 4  c, 1  |   b   |   1
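
A minimal Python sketch of this encoder (bump and tpt0_encode are illustrative names, not part of the lab's tpt program). It assumes the two conventions visible in the example: a symbol whose count is still 0 is output verbatim, and an updated node moves up past every node whose count is smaller than or equal to its own, which reproduces the tie-breaking of the table.

  def bump(L, c):
      """Increment the count of the node at position c and bubble it up
      past every node with a smaller or equal count."""
      sym, count = L[c]
      L[c] = (sym, count + 1)
      while c > 0 and L[c - 1][1] <= L[c][1]:
          L[c - 1], L[c] = L[c], L[c - 1]
          c -= 1

  def tpt0_encode(message, alphabet):
      L = [(s, 0) for s in alphabet]  # step 1: full list, every count is 0
      codes = []
      for s in message:               # step 2
          c = next(i for i, (sym, _) in enumerate(L) if sym == s)  # step 2.2
          codes.append(s if L[c][1] == 0 else c)                   # step 2.3
          bump(L, c)                                               # step 2.4
      return codes

  # The Input column of the example produces its Output column:
  print(tpt0_encode("baaaaaabbabaacaab", "abc"))
  # ['b', 'a', 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 'c', 0, 0, 1]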

3 0-order decoder

  1. Step 1 of the encoder.
  2. While the input is not exhausted:
    1. c ← next input code.
    2. s ← L[c].
    3. Output s.
    4. Step 2.4 of the encoder.
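
A matching decoder sketch, reusing bump() and tpt0_encode() from the encoder sketch above; since the list evolves identically at both sides, the received position (or verbatim symbol) is enough to recover s.

  def tpt0_decode(codes, alphabet):
      L = [(s, 0) for s in alphabet]  # step 1 of the encoder
      message = ""
      for code in codes:              # step 2
          if isinstance(code, str):   # a verbatim (count-0) symbol
              c = next(i for i, (sym, _) in enumerate(L) if sym == code)
          else:                       # a position in L
              c = code
          message += L[c][0]          # steps 2.2 and 2.3
          bump(L, c)                  # step 2.4 of the encoder
      return message

  # Round trip:
  assert tpt0_decode(tpt0_encode("baaaaaabbabaacaab", "abc"), "abc") == "baaaaaabbabaacaab"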

Let’s go to the lab!

  1. make tpt (Notice that tpt should be invoked using the syntax tpt e <order> < input_file > output_file).
  2.                             Codec | lena boats peppers zelda Average  
    -----------------------------------+--------------------------------  
                                     : |    :     :      :     :       :  
                tpt 0 | arith-n-c -o 0 | ....  ....   ....  ....    ....  
          tpt 0 | rle | arith-n-c -o 0 | ....  ....   ....  ....    ....  
          bwt | tpt 0 | arith-n-c -o 0 | ....  ....   ....  ....    ....  
    bwt | tpt 0 | rle | arith-n-c -o 0 | ....  ....   ....  ....    ....

  3. Are they lossless?

4 N-order encoder

  1. Let 𝒞[i] be the context (the previous i symbols) of s and L𝒞[i] the list for that context. If i > 0, the lists are initially empty; for i = 0, the list is full and the count of every node is 0.
  2. Let k be the order of the prediction.
  3. Let H be a list of already tested symbols. All symbols in H must be different.
  4. While the input is not exhausted:
    1. s ← the next input symbol.
    2. i ← k (except for the first symbol, where i ← 0).
    3. While s ∉ L𝒞[i]:
      1. H ← reduce(H ∪ L𝒞[i]) (reduce() deletes the repeated nodes).
      2. Update the count of s in L𝒞[i] and keep it sorted.
      3. i ← i − 1.
    4. Let c be the position of s in L𝒞[i].
    5. c ← c + (the number of symbols of H that are not in L𝒞[i]). In this way, the decoder will know the length of the context where s happens, and the same symbol is not counted twice.
    6. Output c.
    7. Update the count of s in L𝒞[i] and keep it sorted.
    8. H ← ∅.

Example (k = 1)

  Input | Output | Related contexts
 -------+--------+---------------------------------
    a   |   a    | L = {a,1}
    b   |   b    | La = {b,1}, L = {b,1 a,1}
    a   |   1    | Lb = {a,1}, L = {a,2 b,1}
    b   |   0    | La = {b,2}
    c   |   c    | Lb = {c,1 a,1}, L = {a,2 c,1 b,1}
    b   |   2    | Lc = {b,1}, L = {b,2 a,2 c,1}
    a   |   1    | Lb = {a,2 c,1}
    b   |   0    | La = {b,3}
    a   |   0    | Lb = {a,3 c,1}
    b   |   0    | La = {b,4}
    a   |   0    | Lb = {a,4 c,1}
    a   |   1    | La = {b,4 a,1}, L = {a,3 b,2 c,1}
    a   |   1    | La = {b,4 a,2}
    a   |   1    | La = {b,4 a,3}
    a   |   1    | La = {a,4 b,4}
    a   |   0    | La = {a,5 b,4}
    a   |   0    | La = {a,6 b,4}
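
A possible Python sketch of this encoder (find, bump_sym and tptN_encode are illustrative names). It reads step 4.5, "do not count the same symbol twice", as: c is the position of s in reduce(H + L𝒞[i]), with the symbols of H first; this is the reading that matches step 2.4 of the decoder below. As in the example, every list starts empty and a never-seen symbol is output verbatim.

  from collections import defaultdict

  def find(L, s):
      """Position of symbol s in the list L, or None."""
      for i, (sym, _) in enumerate(L):
          if sym == s:
              return i
      return None

  def bump_sym(L, s):
      """Insert s with count 1, or increment its count, and bubble it up
      past every node with a smaller or equal count."""
      i = find(L, s)
      if i is None:
          L.append((s, 0))
          i = len(L) - 1
      sym, count = L[i]
      L[i] = (sym, count + 1)
      while i > 0 and L[i - 1][1] <= L[i][1]:
          L[i - 1], L[i] = L[i], L[i - 1]
          i -= 1

  def tptN_encode(message, k):
      lists = defaultdict(list)        # step 1: one list per context
      codes = []
      for n, s in enumerate(message):  # step 4.1
          H = []                       # steps 3 and 4.8
          i = min(k, n)                # step 4.2
          while i > 0 and find(lists[message[n - i:n]], s) is None:  # step 4.3
              L = lists[message[n - i:n]]
              for sym, _ in L:         # step 4.3.1: H <- reduce(H + L)
                  if sym not in H:
                      H.append(sym)
              bump_sym(L, s)           # step 4.3.2
              i -= 1                   # step 4.3.3
          L = lists[message[n - i:n]]
          c = find(L, s)
          if c is None:                # s was never seen before:
              codes.append(s)          # output it verbatim, as in the example
          else:                        # steps 4.4 to 4.6: the position of s
                                       # in reduce(H + L)
              codes.append(len(H) + sum(1 for sym, _ in L[:c] if sym not in H))
          bump_sym(L, s)               # step 4.7
      return codes

  # The Input column of the example (k = 1) produces its Output column:
  print(tptN_encode("ababcbababaaaaaaa", 1))
  # ['a', 'b', 1, 0, 'c', 2, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0]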

5 N-order decoder

  1. Steps 1, 2 and 3 of the encoder.
  2. While the input is not exhausted:
    1. c ← the next input code.
    2. i ← k (except for the first symbol, where i ← 0).
    3. While L𝒞[i][c] does not exist:
      1. H ← reduce(H ∪ L𝒞[i]).
      2. i ← i − 1.
    4. s ← reduce(H ∪ L𝒞[i])[c].
    5. Update the count of s in L𝒞[i] and keep it sorted.
    6. While i < k:
      1. i ← i + 1.
      2. Insert the symbol s in L𝒞[i].
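
The corresponding decoder sketch, reusing find() and bump_sym() from the encoder sketch. The test of step 2.3 becomes "c does not address a symbol of reduce(H + L𝒞[i])", and step 2.6 inserts s into the escaped (longer) contexts, leaving every list exactly as the encoder left it.

  def tptN_decode(codes, k):
      lists = defaultdict(list)
      message = ""
      for n, code in enumerate(codes):  # step 2.1
          H = []
          i = min(k, n)                 # step 2.2
          while i > 0:                  # step 2.3
              L = lists[message[n - i:]]
              if isinstance(code, str):
                  if find(L, code) is not None:  # the verbatim symbol is here
                      break
              elif code < len(H) + sum(1 for sym, _ in L if sym not in H):
                  break                 # c addresses a symbol of reduce(H + L)
              for sym, _ in L:          # step 2.3.1
                  if sym not in H:
                      H.append(sym)
              i -= 1                    # step 2.3.2
          L = lists[message[n - i:]]
          if isinstance(code, str):     # a verbatim symbol
              s = code
          else:                         # step 2.4: s <- reduce(H + L)[c]
              merged = H + [sym for sym, _ in L if sym not in H]
              s = merged[code]
          bump_sym(L, s)                # step 2.5
          for j in range(i + 1, min(k, n) + 1):    # step 2.6: rebuild the
              bump_sym(lists[message[n - j:]], s)  # escaped contexts
          message += s
      return message

  # Round trip:
  codes = tptN_encode("ababcbababaaaaaaa", 1)
  assert tptN_decode(codes, 1) == "ababcbababaaaaaaa"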

Let’s go to the lab!

  1.                             Codec | lena boats peppers zelda Average  
    -----------------------------------+--------------------------------  
                                     : |    :     :      :     :       :  
                tpt 1 | arith-n-c -o 0 | ....  ....   ....  ....    ....  
                tpt 2 | arith-n-c -o 0 | ....  ....   ....  ....    ....  
          bwt | tpt 1 | arith-n-c -o 0 | ....  ....   ....  ....    ....