Comma-separated Values - Basic Rules and Examples

Basic Rules and Examples

Many informal documents exist that describe "CSV" formats. IETF RFC 4180 (summarized above) defines the format for the "text/csv" MIME type registered with the IANA (Shafranovich 2005). Another relevant specification is provided by Fielded Text. Creativyst (2010) provides an overview of the variations found in the most widely used applications and explains how CSV can best be used and supported.

Rules typical of these and other "CSV" specifications and implementations are as follows:

  • CSV is a delimited data format that has fields/columns separated by the comma character and records/rows terminated by newlines.
  • A CSV file does not require a specific character encoding, byte order, or line terminator format (some software does not support all line-end variations).
  • A record ends at a line terminator. However, line terminators can be embedded as data within fields, so software must recognize quoted line terminators (see below) in order to correctly assemble an entire record that may span multiple lines.
  • All records should have the same number of fields, in the same order.
  • Data within fields is interpreted as a sequence of characters, not as a sequence of bits or bytes (see RFC 2046, section 4.1). For example, the numeric quantity 65535 may be represented as the 5 ASCII characters "65535" (or perhaps other forms such as "0xFFFF", "000065535.000E+00", etc.), but not as a sequence of 2 bytes intended to be treated as a single binary integer rather than as two characters. If this "plain text" convention is not followed, the CSV file no longer contains sufficient information to be interpreted correctly, is unlikely to survive transmission across differing computer architectures, and does not conform to the text/csv MIME type.
  • Adjacent fields must be separated by a single comma. However, "CSV" formats vary greatly in this choice of separator character. In particular, in locales where the comma is used as a decimal separator, a semicolon, TAB, or another character is used instead.
1997,Ford,E350
  • Any field may be quoted (that is, enclosed within double-quote characters). Some fields must be quoted, as specified in following rules.
"1997","Ford","E350"
  • Fields with embedded commas must be quoted.
1997,Ford,E350,"Super, luxurious truck"
  • Fields with embedded double-quote characters must be quoted, and each embedded double-quote character must be represented by a pair of double-quote characters.
1997,Ford,E350,"Super, ""luxurious"" truck"
  • Fields with embedded line breaks must be quoted (however, many CSV implementations simply do not support this).
1997,Ford,E350,"Go get one now
they are going fast"
  • In some CSV implementations, leading and trailing spaces and tabs are trimmed. This practice is controversial, and does not accord with RFC 4180, which states "Spaces are considered part of a field and should not be ignored."
1997, Ford, E350
not same as
1997,Ford,E350
  • In CSV implementations that do trim leading or trailing spaces, fields with such spaces as meaningful data must be quoted.
1997,Ford,E350," Super luxurious truck "
  • The first record may be a "header", which contains column names in each of the fields (there is no reliable way to tell whether a file does this or not; however, it is uncommon to use characters other than letters, digits, and underscores in such column names).
Year,Make,Model
1997,Ford,E350
2000,Mercury,Cougar
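
The quoting rules above (embedded commas, doubled double-quotes, embedded line breaks, and a header record) can be checked with a short sketch. It assumes Python's standard csv module as the parser; the helper name parse_csv, the Description column, and the way the sample records are combined are illustrative choices, not part of any CSV specification.

```python
import csv
import io

# Sample text assembled from the example records above. The Description
# column and the pairing of records with descriptions are assumptions
# made for this illustration.
SAMPLE = (
    'Year,Make,Model,Description\n'
    '1997,Ford,E350,"Super, ""luxurious"" truck"\n'
    '2000,Mercury,Cougar,"Go get one now\nthey are going fast"\n'
)

def parse_csv(text):
    """Return (header, rows) for CSV text whose first record is a header."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)       # first record holds the column names
    return header, list(reader)

header, rows = parse_csv(SAMPLE)

for row in rows:
    print(row)
```

The third physical line of the sample does not start a new record: the parser sees that the quoted field is still open and folds the line break into the field value. The doubled double-quotes come back as single double-quote characters.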

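The rules on separator choice and on spaces around fields can be illustrated the same way. This is a minimal sketch assuming Python's standard csv module; the helper parse_line and the semicolon-separated record with a decimal comma are made up for the example.

```python
import csv
import io

def parse_line(line, **options):
    """Parse one CSV record and return its fields as a list of strings."""
    return next(csv.reader(io.StringIO(line), **options))

# By default spaces are kept as part of the field, as RFC 4180 states.
spaced = parse_line("1997, Ford, E350")

# skipinitialspace=True drops spaces that directly follow a delimiter;
# trailing spaces are left untouched.
trimmed = parse_line("1997, Ford, E350", skipinitialspace=True)

# Quoting protects leading and trailing spaces that are meaningful data.
quoted = parse_line('1997,Ford,E350," Super luxurious truck "',
                    skipinitialspace=True)

# Semicolon as field separator, comma as decimal separator (hypothetical record).
semicolon = parse_line("1997;Ford;E350;2,35", delimiter=";")

for fields in (spaced, trimmed, quoted, semicolon):
    print(fields)
```

The first two results differ only in the leading spaces of the second and third fields, which is why a reader and a writer must agree on whether trimming is in effect.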