E-text - "Just Plain Text" | Technology Trends

"Just Plain Text"

In some communities, "E-text" is used much more narrowly, to refer to electronic documents that are, so to speak, "just plain ASCII." By this is meant not only that the document is a plain text file, but that it has no information beyond "the text itself". Michael S. Hart, for example, argued that this was the only way to have true portability. However, this usage is now uncommon, as the notion of "just plain ASCII", though very attractive at first glance, has turned out to have difficulties.

First, this narrow sense can only deal with texts in languages that use only the unaccented letters of the English alphabet. Spanish ñ cannot be represented (unless awkwardly as "~n" or some such); the accented vowels used in most European languages are unavailable; and non-Latin writing systems cannot be represented at all.
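The limitation can be seen directly in a few lines of Python. This is only an illustrative sketch; the function names are invented for this example and do not come from any e-text tool.

```python
def is_plain_ascii(text: str) -> bool:
    """Return True if every character fits in the 128-character ASCII set."""
    return text.isascii()


def ascii_fallback(text: str) -> str:
    """Force text into ASCII, replacing each unrepresentable character with '?'."""
    return text.encode("ascii", errors="replace").decode("ascii")


print(is_plain_ascii("senor"))   # True
print(is_plain_ascii("señor"))   # False: ñ is outside ASCII
print(ascii_fallback("señor"))   # se?or
```

The replacement is lossy: once ñ has become "?", nothing in the file records which letter was there.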

Second, of course, diagrams and pictures cannot be accommodated, and many books have at least some such material.

Third, "E-texts" in this narrow sense have no principled or reliable way to distinguish "the text" from other things that occur in a work. For example, page numbers, page headers, and footnotes might be omitted, or might simply appear as additional lines of text, perhaps with blank lines before and after (or not). An ornate separator line might be represented instead by a line of asterisks (or not). Chapter and section titles, likewise, are just additional lines of text: they might be detectable by capitalization if they were all caps in the original (or not).

In consequence of this, such texts cannot be reliably re-formatted. A program cannot reliably tell where footnotes attach to the text, where headers and footers are, or perhaps even where paragraphs break, so it cannot re-arrange the text to accurately represent it in another way, such as on a screen of a different size, or read aloud for the visually impaired. Programs might apply heuristics to guess at the structure and do their best, but this can easily fail.
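A minimal sketch of such a heuristic, and of how it goes wrong, might look like the following. The rules used here (blank lines separate blocks; a single all-caps line is a heading) are assumptions chosen for illustration, not a description of any particular program.

```python
import re
from typing import List, Tuple


def guess_structure(text: str) -> List[Tuple[str, str]]:
    """Label each blank-line-separated block as 'heading' or 'paragraph' by guesswork."""
    blocks = [b.strip() for b in re.split(r"\n\s*\n", text) if b.strip()]
    labeled = []
    for block in blocks:
        is_single_line = "\n" not in block
        # Guess: a single line written entirely in capitals is a heading.
        if is_single_line and block.isupper():
            labeled.append(("heading", block))
        else:
            labeled.append(("paragraph", block))
    return labeled


sample = "CHAPTER ONE\n\nIt was a dark night.\n\n12\n\nSTOP!\n\nThe story goes on."
for kind, block in guess_structure(sample):
    print(kind, "|", block)
```

On this sample the guess is right for "CHAPTER ONE", but the stray page number "12" is labeled a paragraph and the shouted line "STOP!" is labeled a heading. Nothing in the file itself says otherwise.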

In addition, even the simplest tables pose problems. It is only feasible to lay out a table without markup if you assume a monospaced (fixed-pitch) font; and then if some reader uses a different font, all bets are off.
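The dependence on exact character columns can be sketched in code. The font problem itself is visual, so the example below shows the analogous problem for a program: the table is recoverable only while every character stays in its original column. The helper names and the column widths are assumptions made up for this illustration.

```python
def layout_table(rows, widths):
    """Pad each cell with spaces to a fixed width, as a monospaced layout requires."""
    return "\n".join(
        "".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
        for row in rows
    )


def read_table(text, widths):
    """Recover cells purely from character offsets; there is no other cue to use."""
    rows = []
    for line in text.splitlines():
        cells, start = [], 0
        for w in widths:
            cells.append(line[start:start + w].strip())
            start += w
        rows.append(cells)
    return rows


rows = [["Title", "Year"], ["Moby Dick", "1851"], ["Emma", "1815"]]
widths = [12, 6]
table = layout_table(rows, widths)
print(table)

# Collapsing runs of spaces (as reflowing software may do) destroys the columns.
collapsed = "\n".join(" ".join(line.split()) for line in table.splitlines())
print(read_table(table, widths) == rows)      # True
print(read_table(collapsed, widths) == rows)  # False
```

Because "Moby Dick" itself contains a space, the collapsed text no longer indicates where the first column ends.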

Fourth, and a perhaps surprisingly important issue, the narrow sense of "E-text" affords no way to represent information about the work. For example, is it the first or the tenth edition? Who prepared it, and what rights do they reserve or grant to others? Is this the raw version straight off a scanner, or has it been proofread and corrected? Metadata relating to the text is sometimes included with an e-text, but there is by this definition no way to say whether or where it is present.

In fact, any e-text uses some selection of control characters, spaces, tabs, and the like to express some distinctions: spaces between words are standard in English, and two returns followed by five spaces make something that looks like a common kind of paragraph. Individual texts often use idiosyncratic conventions, such as "###" plus a number to mark page breaks. All these are, properly speaking, markup, but not in a documented or reliable form.
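As a sketch of what reading such ad hoc markup involves, the following splits a text on page-break markers. The exact form of the marker (a line consisting of "###", optional spaces, and a page number, placed at the start of each page) is an assumption; a real text might follow a different convention, which is precisely the difficulty.

```python
import re

# Assumed convention: a line such as "### 12" starts page 12.
PAGE_MARK = re.compile(r"^###\s*(\d+)\s*$")


def split_pages(text):
    """Map page number to page text; lines before the first marker are ignored."""
    pages = {}
    current = None
    for line in text.splitlines():
        match = PAGE_MARK.match(line)
        if match:
            current = int(match.group(1))
            pages[current] = []
        elif current is not None:
            pages[current].append(line)
    return {number: "\n".join(lines).strip() for number, lines in pages.items()}


sample = "### 1\nCall me Ishmael.\n### 2\nSome years ago."
print(split_pages(sample))
```

A reader program has to be told this convention separately, because the file does not document it, and a text that writes its markers differently would silently yield no pages at all.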

The narrow sense of "E-text" has fallen out of favor. Nevertheless, many such texts are freely available on the Web, perhaps as much because they are easily produced as because of any purported portability advantage. For many years Project Gutenberg strongly favored this model of text but has, with time, begun to develop and distribute more capable forms such as HTML.
