Burrows–Wheeler Transform - Example

Example

The transform is done by sorting all rotations of the text in lexicographic order, then taking the last column. For example, the text "^BANANA|" is transformed into "BNN^AA|A" through these steps (the red | character indicates the 'EOF' pointer):

Transformation
Input All
Rotations
Sorting All Rows in Alphabetical
Order by their first letters
Taking
Last Column
Output
Last Column
^BANANA| ^BANANA| |^BANANA A|^BANAN NA|^BANA ANA|^BAN NANA|^BA ANANA|^B BANANA|^ ANANA|^B ANA|^BAN A|^BANAN BANANA|^ NANA|^BA NA|^BANA ^BANANA| |^BANANA ANANA|^B ANA|^BAN A|^BANAN BANANA|^ NANA|^BA NA|^BANA ^BANANA| |^BANANA BNN^AA|A

The following pseudocode gives a simple (though inefficient) way to calculate the BWT and its inverse. It assumes that the input string s contains a special character 'EOF' which is the last character, occurs nowhere else in the text, and is ignored during sorting.

function BWT (string s) create a table, rows are all possible rotations of s sort rows alphabetically return (last column of the table) function inverseBWT (string s) create empty table repeat length(s) times insert s as a column of table before first column of the table // first insert creates first column sort rows of the table alphabetically return (row that ends with the 'EOF' character)

To understand why this creates more-easily-compressible data, consider transforming a long English text frequently containing the word "the". Sorting the rotations of this text will often group rotations starting with "he " together, and the last character of that rotation (which is also the character before the "he ") will usually be "t", so the result of the transform would contain a number of "t" characters along with the perhaps less-common exceptions (such as if it contains "Brahe ") mixed in. So it can be seen that the success of this transform depends upon one value having a high probability of occurring before a sequence, so that in general it needs fairly long samples (a few kilobytes at least) of appropriate data (such as text).

The remarkable thing about the BWT is not that it generates a more easily encoded output—an ordinary sort would do that—but that it is reversible, allowing the original document to be re-generated from the last column data.

The inverse can be understood this way. Take the final table in the BWT algorithm, and erase all but the last column. Given only this information, you can easily reconstruct the first column. The last column tells you all the characters in the text, so just sort these characters alphabetically to get the first column. Then, the first and last columns (of each row) together give you all pairs of successive characters in the document, where pairs are taken cyclically so that the last and first character form a pair. Sorting the list of pairs gives the first and second columns. Continuing in this manner, you can reconstruct the entire list. Then, the row with the "end of file" character at the end is the original text. Reversing the example above is done like this:

Inverse Transformation
Input
BNN^AA|A
Add 1 Sort 1 Add 2 Sort 2
B N N ^ A A | A A A A B N N ^ | BA NA NA ^B AN AN |^ A| AN AN A| BA NA NA ^B |^
Add 3 Sort 3 Add 4 Sort 4
BAN NAN NA| ^BA ANA ANA |^B A|^ ANA ANA A|^ BAN NAN NA| ^BA |^B BANA NANA NA|^ ^BAN ANAN ANA| |^BA A|^B ANAN ANA| A|^B BANA NANA NA|^ ^BAN |^BA
Add 5 Sort 5 Add 6 Sort 6
BANAN NANA| NA|^B ^BANA ANANA ANA|^ |^BAN A|^BA ANANA ANA|^ A|^BA BANAN NANA| NA|^B ^BANA |^BAN BANANA NANA|^ NA|^BA ^BANAN ANANA| ANA|^B |^BANA A|^BAN ANANA| ANA|^B A|^BAN BANANA NANA|^ NA|^BA ^BANAN |^BANA
Add 7 Sort 7 Add 8 Sort 8
BANANA| NANA|^B NA|^BAN ^BANANA ANANA|^ ANA|^BA |^BANAN A|^BANA ANANA|^ ANA|^BA A|^BANA BANANA| NANA|^B NA|^BAN ^BANANA |^BANAN BANANA|^ NANA|^BA NA|^BANA ^BANANA| ANANA|^B ANA|^BAN |^BANANA A|^BANAN ANANA|^B ANA|^BAN A|^BANAN BANANA|^ NANA|^BA NA|^BANA ^BANANA| |^BANANA
Output
^BANANA|

Read more about this topic:  Burrows–Wheeler Transform

Other articles related to "example":

Example

Example may also refer to:

  • Example (musician), a British musician
  • Example, Florida rock band For Squirrels' sole major-label album, released in 1995
  • example.com, example.net, example.org, example.edu and .example, domain names reserved for use in documentation as examples
  • HMS Example (P165), an Archer-class patrol and training vessel of the British Royal Navy
  • The Example, a 1634 play by James Shirley
  • The Example (comics), a 2009 graphic novel by Tom Taylor and Colin Wilson
Wave Front Set - Definition - Example
... Then the projection on the first component of a distribution's wave front set is nothing else than its classical singular support, i.e ... the complement of the set on which its restriction would be a smooth function ...
Finite Model Theory - Basic Challenges - Characterisation of A Class of Structures - Example
4 ... Next we have to scale the structures up by increasing m ...

Famous quotes containing the word example:

    Our intellect is not the most subtle, the most powerful, the most appropriate, instrument for revealing the truth. It is life that, little by little, example by example, permits us to see that what is most important to our heart, or to our mind, is learned not by reasoning but through other agencies. Then it is that the intellect, observing their superiority, abdicates its control to them upon reasoned grounds and agrees to become their collaborator and lackey.
    Marcel Proust (1871–1922)