Example
The transform is done by sorting all rotations of the text in lexicographic order, then taking the last column. For example, the text "^BANANA|" is transformed into "BNN^AA|A" through these steps (the red | character indicates the 'EOF' pointer):
Transformation | ||||
---|---|---|---|---|
Input | All Rotations |
Sorting All Rows in Alphabetical Order by their first letters |
Taking Last Column |
Output Last Column |
^BANANA| | ^BANANA| |^BANANA A|^BANAN NA|^BANA ANA|^BAN NANA|^BA ANANA|^B BANANA|^ | ANANA|^B ANA|^BAN A|^BANAN BANANA|^ NANA|^BA NA|^BANA ^BANANA| |^BANANA | ANANA|^B ANA|^BAN A|^BANAN BANANA|^ NANA|^BA NA|^BANA ^BANANA| |^BANANA | BNN^AA|A |
The following pseudocode gives a simple (though inefficient) way to calculate the BWT and its inverse. It assumes that the input string s
contains a special character 'EOF' which is the last character, occurs nowhere else in the text, and is ignored during sorting.
To understand why this creates more-easily-compressible data, consider transforming a long English text frequently containing the word "the". Sorting the rotations of this text will often group rotations starting with "he " together, and the last character of that rotation (which is also the character before the "he ") will usually be "t", so the result of the transform would contain a number of "t" characters along with the perhaps less-common exceptions (such as if it contains "Brahe ") mixed in. So it can be seen that the success of this transform depends upon one value having a high probability of occurring before a sequence, so that in general it needs fairly long samples (a few kilobytes at least) of appropriate data (such as text).
The remarkable thing about the BWT is not that it generates a more easily encoded output—an ordinary sort would do that—but that it is reversible, allowing the original document to be re-generated from the last column data.
The inverse can be understood this way. Take the final table in the BWT algorithm, and erase all but the last column. Given only this information, you can easily reconstruct the first column. The last column tells you all the characters in the text, so just sort these characters alphabetically to get the first column. Then, the first and last columns (of each row) together give you all pairs of successive characters in the document, where pairs are taken cyclically so that the last and first character form a pair. Sorting the list of pairs gives the first and second columns. Continuing in this manner, you can reconstruct the entire list. Then, the row with the "end of file" character at the end is the original text. Reversing the example above is done like this:
Inverse Transformation | |||
---|---|---|---|
Input | |||
BNN^AA|A | |||
Add 1 | Sort 1 | Add 2 | Sort 2 |
B N N ^ A A | A | A A A B N N ^ | | BA NA NA ^B AN AN |^ A| | AN AN A| BA NA NA ^B |^ |
Add 3 | Sort 3 | Add 4 | Sort 4 |
BAN NAN NA| ^BA ANA ANA |^B A|^ | ANA ANA A|^ BAN NAN NA| ^BA |^B | BANA NANA NA|^ ^BAN ANAN ANA| |^BA A|^B | ANAN ANA| A|^B BANA NANA NA|^ ^BAN |^BA |
Add 5 | Sort 5 | Add 6 | Sort 6 |
BANAN NANA| NA|^B ^BANA ANANA ANA|^ |^BAN A|^BA | ANANA ANA|^ A|^BA BANAN NANA| NA|^B ^BANA |^BAN | BANANA NANA|^ NA|^BA ^BANAN ANANA| ANA|^B |^BANA A|^BAN | ANANA| ANA|^B A|^BAN BANANA NANA|^ NA|^BA ^BANAN |^BANA |
Add 7 | Sort 7 | Add 8 | Sort 8 |
BANANA| NANA|^B NA|^BAN ^BANANA ANANA|^ ANA|^BA |^BANAN A|^BANA | ANANA|^ ANA|^BA A|^BANA BANANA| NANA|^B NA|^BAN ^BANANA |^BANAN | BANANA|^ NANA|^BA NA|^BANA ^BANANA| ANANA|^B ANA|^BAN |^BANANA A|^BANAN | ANANA|^B ANA|^BAN A|^BANAN BANANA|^ NANA|^BA NA|^BANA ^BANANA| |^BANANA |
Output | |||
^BANANA| |
Read more about this topic: Burrows–Wheeler Transform
Famous quotes containing the word example:
“Our intellect is not the most subtle, the most powerful, the most appropriate, instrument for revealing the truth. It is life that, little by little, example by example, permits us to see that what is most important to our heart, or to our mind, is learned not by reasoning but through other agencies. Then it is that the intellect, observing their superiority, abdicates its control to them upon reasoned grounds and agrees to become their collaborator and lackey.”
—Marcel Proust (18711922)