Data Conversion - Lost and Inexact Data Conversion

The objective of data conversion is to preserve all of the data and as much of the embedded information as possible. This is possible only if the target format supports the same features and data structures as the source file. Converting a word processing document to a plain text file necessarily loses formatting information, because plain text does not support word processing constructs such as marking a word as boldface. For this reason, conversion to a format that lacks a feature important to the user is rarely carried out, though it may be necessary for interoperability, e.g. converting a file from one version of Microsoft Word to an earlier version so that it can be used by people who do not have the later version installed.
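The loss described above can be made concrete with a small sketch. It assumes, purely for illustration, that HTML markup stands in for a word processing format; the class name TextOnly and the function to_plain_text are hypothetical names, not part of any converter mentioned here.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collects character data and discards every tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Only the text between tags is kept; the tags themselves are dropped.
        self.parts.append(data)


def to_plain_text(markup: str) -> str:
    """Convert marked-up text to plain text, losing all formatting."""
    parser = TextOnly()
    parser.feed(markup)
    parser.close()
    return "".join(parser.parts)


source = "<p>Delete the file <b>only</b> after the backup finishes.</p>"
plain = to_plain_text(source)
print(plain)  # Delete the file only after the backup finishes.
```

All of the characters survive, but nothing in the output records that "only" was emphasized, so the original cannot be reconstructed from the converted file.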

Loss of information can be mitigated by approximation in the target format. A character like ä cannot be represented in ASCII, since the ASCII standard lacks it, but the information may be retained by approximating the character as ae. This is not an optimal solution: it can affect operations like searching and copying, and if a language distinguishes between ä and ae, the approximation does lose information.
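A minimal sketch of this kind of approximation follows. The mapping table APPROXIMATIONS and the function to_ascii are hypothetical and cover only a few German-style cases; a real converter would need a policy for every character outside ASCII.

```python
# Hypothetical, deliberately small transliteration table (an assumption,
# not a standard): each non-ASCII character maps to an ASCII approximation.
APPROXIMATIONS = {
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
}


def to_ascii(text: str) -> str:
    """Approximate known characters, then replace anything else with '?'."""
    approximated = "".join(APPROXIMATIONS.get(ch, ch) for ch in text)
    # Characters still outside ASCII have no approximation and are lost.
    return approximated.encode("ascii", errors="replace").decode("ascii")


print(to_ascii("Bär"))   # Baer
print(to_ascii("Baer"))  # Baer
```

Both inputs produce the same output, so the conversion cannot be reversed without extra knowledge: a search for "Bär" in the converted text also matches words that were originally spelled with "ae".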

Data conversion can also suffer from inexactitude, the result of converting between formats that are conceptually different. One example is the WYSIWYG paradigm, used in word processors and desktop publishing applications, versus the structural-descriptive paradigm, found in SGML, XML and many applications derived from them, like HTML and MathML. Using a WYSIWYG HTML editor conflates the two paradigms, and the result is HTML files with suboptimal, if not nonstandard, code. In the WYSIWYG paradigm a double linebreak signifies a new paragraph, as that is the visual cue for such a construct, but a WYSIWYG HTML editor will usually convert such a sequence to <br><br>, which structurally is no new paragraph at all.

As another example, converting from PDF to an editable word processor format is difficult, because PDF records textual information like engraving on stone, with each character given a fixed position and linebreaks hard-coded, whereas word processor formats accommodate text reflow. PDF does not need a word space character: the space between two letters and the space between two words differ only in size. Therefore, a title with ample letter-spacing for effect will usually end up with spaces in the word processor file; for example, INTRODUCTION set with a spacing of 1 em becomes I N T R O D U C T I O N in the word processor.
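Both mismatches can be sketched in a few lines. The code below is illustrative only: naive_html and structural_html are hypothetical names for the visual and the structural treatment of linebreaks, and glyphs_to_text assumes a made-up input of (character, x_start, x_end) tuples with a hand-picked gap threshold, not the data model of any real PDF library.

```python
import html


def naive_html(text: str) -> str:
    """Visual approach: every linebreak becomes a <br> element."""
    return html.escape(text).replace("\n", "<br>")


def structural_html(text: str) -> str:
    """Structural approach: a blank line separates <p> elements."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    return "".join(f"<p>{html.escape(p)}</p>" for p in paragraphs)


def layout(word: str, advance: float, width: float = 6.0):
    """Place glyphs left to right; the gap between them is advance - width."""
    return [(ch, i * advance, i * advance + width) for i, ch in enumerate(word)]


def glyphs_to_text(glyphs, gap_threshold: float) -> str:
    """Rebuild text from positioned glyphs, guessing a space at wide gaps."""
    out = []
    prev_end = None
    for ch, x_start, x_end in glyphs:
        if prev_end is not None and x_start - prev_end > gap_threshold:
            out.append(" ")  # the gap is only a distance; a space is a guess
        out.append(ch)
        prev_end = x_end
    return "".join(out)


text = "First paragraph.\n\nSecond paragraph."
print(naive_html(text))       # First paragraph.<br><br>Second paragraph.
print(structural_html(text))  # <p>First paragraph.</p><p>Second paragraph.</p>

tight = layout("INTRO", advance=6.5)    # gaps of 0.5 units
spaced = layout("INTRO", advance=18.0)  # gaps of 12 units, e.g. 1 em
print(glyphs_to_text(tight, gap_threshold=3.0))   # INTRO
print(glyphs_to_text(spaced, gap_threshold=3.0))  # I N T R O
```

The threshold is the weak point of the second sketch: any fixed value that separates words in body text will also split a letter-spaced title, because the positions alone do not say which gaps were meant as word boundaries.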
