Burrows–Wheeler Transform - Sample Implementation

Sample Implementation

This Python implementation sacrifices speed for simplicity: the program is short, but takes more than the linear time that would be desired in a practical implementation.

Using the null character as the end of file marker, and using s + s to construct the ith rotation of s, the forward transform takes the last character of each of the sorted rows:

def bwt(s): """Apply Burrows-Wheeler transform to input string.""" assert "\0" not in s, "Input string cannot contain null character ('\\0')" s += "\0" # Add end of file marker table = sorted(s + s for i in range(len(s))) # Table of rotations of string last_column = for row in table] # Last characters of each row return "".join(last_column) # Convert list of characters into string

The inverse transform repeatedly inserts r as the left column of the table and sorts the table. After the whole table is built, it returns the row that ends with null, minus the null.

def ibwt(r): """Apply inverse Burrows-Wheeler transform.""" table = * len(r) # Make empty table for i in range(len(r)): table = sorted(r + table for i in range(len(r))) # Add a column of r s = # Find the correct row (ending in "\0") return s.rstrip("\0") # Get rid of trailing null character

Here is another, more efficient method for the inverse transform. Although more complex, it increases the speed greatly when decoding lengthy strings.

def ibwt(r, *args): "Inverse Burrows-Wheeler transform. args is the original index \ if it was not indicated by a null byte" firstCol = "".join(sorted(r)) count = *256 byteStart = *256 output = * len(r) shortcut = *len(r) #Generates shortcut lists for i in range(len(r)): shortcutIndex = ord(r) shortcut = count count += 1 shortcutIndex = ord(firstCol) if byteStart == -1: byteStart = i localIndex = (r.index("\x00") if not args else args) for i in range(len(r)): #takes the next index indicated by the transformation vector nextByte = r output = nextByte shortcutIndex = ord(nextByte) #assigns localIndex to the next index in the transformation vector localIndex = byteStart + shortcut return "".join(output).rstrip("\x00")

Read more about this topic:  Burrows–Wheeler Transform

Famous quotes containing the word sample:

    As a rule they will refuse even to sample a foreign dish, they regard such things as garlic and olive oil with disgust, life is unliveable to them unless they have tea and puddings.
    George Orwell (1903–1950)