Burrows–Wheeler Transform

The Burrows–Wheeler transform (BWT, also called block-sorting compression), is an algorithm used in data compression techniques such as bzip2. It was invented by Michael Burrows and David Wheeler in 1994 while working at DEC Systems Research Center in Palo Alto, California. It is based on a previously unpublished transformation discovered by Wheeler in 1983.

When a character string is transformed by the BWT, none of its characters change value. The transformation permutes the order of the characters. If the original string had several substrings that occurred often, then the transformed string will have several places where a single character is repeated multiple times in a row. This is useful for compression, since it tends to be easy to compress a string that has runs of repeated characters by techniques such as move-to-front transform and run-length encoding.

For example:

Input SIX.MIXED.PIXIES.SIFT.SIXTY.PIXIE.DUST.BOXES
Output TEXYDST.E.IXIXIXXSSMPPS.B..E.S.EUSFXDIIOIIIT

The output is easier to compress because it has many repeated characters. In fact, in the transformed string, there are a total of six runs of identical characters: XX, SS, PP, .., II, and III, which together make 13 out of the 44 characters in it.

Read more about Burrows–Wheeler Transform:  Example, Optimization, Bijective Variant, Dynamic Burrows–Wheeler Transform, Sample Implementation, BWT in Bioinformatics

Famous quotes containing the word transform:

    Government ... thought [it] could transform the country through massive national programs, but often the programs did not work. Too often they only made things worse. In our rush to accomplish great deeds quickly, we trampled on sound principles of restraint and endangered the rights of individuals.
    Gerald R. Ford (b. 1913)