Character General Category

Each code point has a single General Category property. The major categories are Letter, Mark, Number, Punctuation, Symbol, Separator and Other, and each is subdivided into minor categories. The General Category alone is not sufficient for every purpose, because legacy encodings often gave a single code point several characteristics. For example, U+000A LINE FEED (LF) in ASCII is both a control character and a formatting separator, yet its Unicode General Category is only "Other, Control". Other properties must therefore often be consulted to describe the characteristics and behaviour of a code point. The possible General Categories are:
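As an illustration, Python's standard `unicodedata` module exposes the General Category through `unicodedata.category()`, which returns the two-letter value (for example `Cc`). The sketch below is a minimal example under that assumption; the `describe` helper and the `MAJOR` table are hypothetical names written for this illustration, not part of any library. It shows that LF reports only `Cc`, while its separator role has to be read from a different property, here Bidi_Class via `unicodedata.bidirectional()`. Results reflect the Unicode version bundled with the interpreter.

```python
import unicodedata

# Hypothetical lookup table (not part of unicodedata): the first letter of a
# two-letter General Category value names its major category.
MAJOR = {
    "L": "Letter", "M": "Mark", "N": "Number", "P": "Punctuation",
    "S": "Symbol", "Z": "Separator", "C": "Other",
}


def describe(ch: str) -> str:
    """Summarise one character's General Category and, separately, its Bidi_Class."""
    gc = unicodedata.category(ch)       # General Category, e.g. 'Cc'
    bc = unicodedata.bidirectional(ch)  # Bidi_Class: an independent property
    return f"U+{ord(ch):04X}: {gc} ({MAJOR[gc[0]]}), Bidi_Class={bc}"


# LF is only "Other, Control" by General Category, but its Bidi_Class 'B'
# (paragraph separator) records the separator role that ASCII also gave it.
for ch in ["A", "\n", "\u2028", "\u2029"]:
    print(describe(ch))
```

For LF this prints `U+000A: Cc (Other), Bidi_Class=B`: the General Category and the Bidi_Class each capture one of the two roles that ASCII folded into a single code.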

General Category
Value  Category (major, minor)     Basic type    Character assigned   Count             Remarks
Letter
Lu     Letter, uppercase           Graphic       Character
Ll     Letter, lowercase           Graphic       Character
Lt     Letter, titlecase           Graphic       Character
Lm     Letter, modifier            Graphic       Character
Lo     Letter, other               Graphic       Character
Mark
Mn     Mark, nonspacing            Graphic       Character
Mc     Mark, spacing combining     Graphic       Character
Me     Mark, enclosing             Graphic       Character
Number
Nd     Number, decimal digit       Graphic       Character                              All these, and only these, have Numeric Type = Decimal
Nl     Number, letter              Graphic       Character
No     Number, other               Graphic       Character
Punctuation
Pc     Punctuation, connector      Graphic       Character
Pd     Punctuation, dash           Graphic       Character
Ps     Punctuation, open           Graphic       Character
Pe     Punctuation, close          Graphic       Character
Pi     Punctuation, initial quote  Graphic       Character                              May behave like Ps or Pe depending on usage
Pf     Punctuation, final quote    Graphic       Character                              May behave like Ps or Pe depending on usage
Po     Punctuation, other          Graphic       Character
Symbol
Sm     Symbol, math                Graphic       Character
Sc     Symbol, currency            Graphic       Character
Sk     Symbol, modifier            Graphic       Character
So     Symbol, other               Graphic       Character
Separator
Zs     Separator, space            Graphic       Character
Zl     Separator, line             Format        Character                              Only U+2028 line separator (LSEP)
Zp     Separator, paragraph        Format        Character                              Only U+2029 paragraph separator (PSEP)
Other
Cc     Other, control              Control       Character            65 (fixed)        No name
Cf     Other, format               Format        Character
Cs     Other, surrogate            Surrogate     Not (but abstract)   2,048 (fixed)     No name
Co     Other, private use          Private-use   Not (but abstract)   137,468 (fixed)   No name; 6,400 in BMP, 131,068 in Planes 15–16
Cn     Other, not assigned         Noncharacter  Not                  66 (fixed)        No name
       (Cn, continued)             Reserved      Not                  not fixed         No name
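The fixed counts and the single-character remarks in the table can be checked against the Unicode data bundled with Python's standard `unicodedata` module. The sketch below rests on a few stated assumptions: `is_noncharacter` is a hypothetical helper written from the noncharacter definition (U+FDD0..U+FDEF plus the last two code points of every plane), and "`unicodedata.decimal()` returns a value" is used as a stand-in for Numeric Type = Decimal. Because noncharacters and reserved code points share the value Cn, noncharacters are counted by code point range rather than by category.

```python
import sys
import unicodedata
from collections import Counter


def is_noncharacter(cp: int) -> bool:
    """Hypothetical helper: U+FDD0..U+FDEF, or U+xxFFFE / U+xxFFFF in any plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


counts = Counter()          # General Category value -> number of code points
noncharacters = 0
noncharacters_all_cn = True
nd_iff_decimal = True       # assumption: Nd <=> unicodedata.decimal() has a value

for cp in range(sys.maxunicode + 1):  # U+0000..U+10FFFF on Python 3.3+
    ch = chr(cp)                      # lone surrogates are allowed in str
    gc = unicodedata.category(ch)
    counts[gc] += 1
    if is_noncharacter(cp):
        noncharacters += 1
        if gc != "Cn":
            noncharacters_all_cn = False
    has_decimal = unicodedata.decimal(ch, None) is not None
    if (gc == "Nd") != has_decimal:
        nd_iff_decimal = False

for gc in ("Cc", "Cs", "Co", "Zl", "Zp"):
    print(gc, counts[gc])
print("noncharacters:", noncharacters, "all Cn:", noncharacters_all_cn)
print("Nd iff decimal value:", nd_iff_decimal)
```

The loop visits all 1,114,112 code points, so it is far slower than a single lookup. The total for Cn itself is deliberately not checked: reserved code points shrink with every Unicode version, which is why the table marks that count as not fixed.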
