Unicode

Unicode is a computing industry standard for the consistent encoding, representation and handling of text expressed in most of the world's writing systems. It was developed in conjunction with the Universal Character Set standard and is published in book form as The Unicode Standard. The latest version of Unicode consists of a repertoire of more than 110,000 characters covering 100 scripts, a set of code charts for visual reference, an encoding methodology and set of standard character encodings, an enumeration of character properties such as upper and lower case, and a set of reference data computer files. It also includes a number of related items, such as rules for normalization, decomposition, collation, rendering, and bidirectional display order (for the correct display of text containing both right-to-left scripts, such as Arabic and Hebrew, and left-to-right scripts). As of September 2012, the most recent version is Unicode 6.2.
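As an illustration of the character properties and normalization rules mentioned above, the following is a minimal sketch using Python's standard `unicodedata` module. The helper name `describe` and the sample characters are chosen here for illustration only and are not part of the standard. The values reported come from whichever version of the Unicode Character Database ships with the Python interpreter in use.

```python
import unicodedata

def describe(ch):
    """Look up a few Unicode character properties for a single character."""
    return {
        "name": unicodedata.name(ch),
        "category": unicodedata.category(ch),   # general category, e.g. "Lu" = uppercase letter
        "bidi": unicodedata.bidirectional(ch),  # bidirectional class, e.g. "L" or "R"
    }

# A left-to-right Latin letter and a right-to-left Hebrew letter (U+05D0).
print(describe("A"))
print(describe("\u05d0"))

# Normalization: "e" followed by U+0301 (combining acute accent)
# composes to the single code point U+00E9 under NFC, and
# U+00E9 decomposes back to the two-code-point sequence under NFD.
composed = unicodedata.normalize("NFC", "e\u0301")
decomposed = unicodedata.normalize("NFD", "\u00e9")
print(len(composed), len(decomposed))
```

The two strings render identically but compare as unequal until they are brought to the same normalization form, which is why the standard defines normalization rules at all.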

Unicode's success at unifying character sets has led to its widespread and predominant use in the internationalization and localization of computer software. The standard has been implemented in many recent technologies, including XML, the Java programming language, the Microsoft .NET Framework, and modern operating systems.

Unicode can be implemented by different character encodings. The most commonly used encodings are UTF-8, UTF-16 and the now-obsolete UCS-2. UTF-8 uses one byte for each ASCII character, which has the same code value in both UTF-8 and ASCII, and up to four bytes for other characters. UCS-2 uses a single 16-bit code unit (two 8-bit bytes) for each character but cannot encode every character in the current Unicode standard. UTF-16 extends UCS-2, using a pair of 16-bit code units (four 8-bit bytes) to represent each of the additional characters.
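The difference in storage between the encodings can be seen in a short sketch using Python's built-in `str.encode`. The function name `encoded_sizes` and the four sample characters are illustrative choices, not something taken from the standard. The `utf-16-be` codec is used so that no byte order mark is added to the count.

```python
def encoded_sizes(ch):
    """Return the number of bytes needed to encode ch in UTF-8 and UTF-16."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        "utf-16": len(ch.encode("utf-16-be")),  # big-endian, no byte order mark
    }

samples = [
    "A",           # U+0041, an ASCII character
    "\u00e9",      # U+00E9, Latin small letter e with acute
    "\u20ac",      # U+20AC, euro sign
    "\U0001D11E",  # U+1D11E, musical symbol G clef (beyond the 16-bit range)
]

for ch in samples:
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))

# A character beyond the 16-bit range is stored in UTF-16 as two
# 16-bit code units, which is what UCS-2 cannot express.
print("\U0001D11E".encode("utf-16-be").hex())
```

The ASCII character takes one byte in UTF-8, and its UTF-8 byte is the same as its ASCII byte. The last sample takes four bytes in both encodings: four single bytes in UTF-8, and two 16-bit units in UTF-16.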
