All four TCP text projects are produced in the same way and to the same standards, which are documented, at least in part, on the TCP web site.
- Accuracy. The TCP strives to produce texts that are as accurately transcribed as possible, with a specified overall accuracy rate of 99.995% or better (i.e. one error or fewer per 20,000 characters).
- Keying. Given the nature of the material, the only method found to deliver such accuracy economically has been to have the books keyed by data conversion firms under contract.
- Quality control. Accuracy of transcription and aptness of markup are assessed in all cases by a group of library-based proofers and reviewers managed by the University of Michigan DLPS.
- Encoding. All resultant text files are marked up in valid SGML or XML (SGML is archived, XML is exported) conforming to a proprietary Document Type Definition (DTD) derived from the P3/P4 versions of the Text Encoding Initiative (TEI) standard.
- Purposeful markup. Compared to the full TEI, the TCP DTD is very simple and intended to capture only the features most useful for intelligible display, intelligent navigation, and productive searching. The TCP practice is to capture, so far as feasible, the overall hierarchical structure of each book (parts, sections, chapters, etc.); the features that tend to mark the beginnings and ends of divisions (headings, explicits, salutations, valedictions, datelines, bylines, epigraphs, etc.); the most significant elements of discourse and organization (paragraphs in prose, lines and stanzas in verse, speeches, speakers, and stage directions in drama, notes, block quotes, sequential numerations of all kinds); and only the most essential aspects of physical formatting (page breaks, lists, tables, font changes).
- Fidelity to the original. In each case, the text is intended to represent the book as originally printed, so far as that is possible. Printer's errors are preserved, hand-written changes are ignored, duplicate scans are omitted, out-of-order images are keyed in the intended order, and most of the unusual characters of the original are preserved.
- Ease of reading and searching. The transcriptions are carried out character by character. At the same time, on the theory that all transcription is a kind of translation from one symbolic system to another, the TCP tends to define characters more by their meaning than by their form, and maps eccentric letter-forms to meaningful modern equivalents, generally in keeping with the Unicode definition of "character."
- Languages. Though most of the TCP texts are in English, many are not. Books and divisions of books not in English are tagged with an appropriate language code, but are not otherwise distinguished.
- Omitted material. The TCP produces Latin-alphabet text. Non-textual material such as musical notation, mathematical formulae, and illustrations (except for any text it may contain) is omitted and its location is marked with a special tag. Extended text in non-Latin alphabets (Greek, Hebrew, Persian, etc.) is also omitted.
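
The accuracy target stated above (99.995%, or one error per 20,000 characters) can be checked with simple arithmetic. The following is a minimal sketch in Python, not a TCP tool: the function names `allowed_errors` and `meets_target` are hypothetical and chosen only for illustration, and exact fractions are used so that the boundary case is not affected by floating-point rounding.

```python
from fractions import Fraction

# Target accuracy from the text: 99.995%, expressed exactly.
TARGET_ACCURACY = Fraction(99995, 100000)


def allowed_errors(char_count: int) -> Fraction:
    """Return the maximum number of errors permitted in char_count characters.

    Hypothetical helper: the error budget is char_count * (1 - accuracy).
    """
    return char_count * (1 - TARGET_ACCURACY)


def meets_target(char_count: int, error_count: int) -> bool:
    """Return True if the observed accuracy is at or above the target."""
    if char_count <= 0:
        raise ValueError("char_count must be positive")
    observed = Fraction(char_count - error_count, char_count)
    return observed >= TARGET_ACCURACY


if __name__ == "__main__":
    print(allowed_errors(20_000))      # error budget for 20,000 characters
    print(meets_target(20_000, 1))     # one error in 20,000 characters
    print(meets_target(20_000, 2))     # two errors in 20,000 characters
```

Under these assumptions, a text of one million characters could contain at most 50 transcription errors and still meet the stated rate.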