Entity–attribute–value Model - Sparse Columns As An Alternative To EAV

Sparse Columns As An Alternative To EAV

Microsoft SQL Server 2008 offers a (proprietary) approach: columns with an atomic data type (e.g., numeric, varchar or datetime columns) can be designated as sparse simply by including the word SPARSE in the column definition of the CREATE TABLE statement. Sparse columns optimize the storage of NULL values (which now take up no space at all) and are useful when the majority records in a table will have NULL values for that column. Indexes on sparse columns are also optimized: only those rows with values are indexed. In addition, the contents of all sparse columns in a particular row of a table can be collectively aggregated into a single XML column (a column set), whose contents are of the form *.... In fact, if a column set is defined for a table as part of a CREATE TABLE statement, all sparse columns subsequently defined are typically added to it. This has the interesting consequence that the SQL statement SELECT * from will not return the individual sparse columns, but concatenate all of them into a single XML column whose name is that of the column set (which therefore acts as a virtual, computed column).

Sparse columns are convenient for business applications such as product information, where the applicable attributes can be highly variable depending on the product type, but where the total number of variable attributes per product type are relatively modest. They do not offer a solution for the modeling of clinical data:

  • The maximum number of sparse columns in a table is 30,000, which falls short for clinical data, where the possible number of attributes is one order of magnitude larger. Therefore, this is not a solution for modeling *all* possible clinical attributes for a patient.
  • Addition of new attributes – an important consideration in biomedicine, where knowledge of diseases evolves and new laboratory tests are being developed – still requires a DBA, and the problem of building a user interface to such data is not addressed: only the storage mechanism is streamlined.
    Applications can dynamically add and remove sparse columns from a table at run-time: an attempt to perform such an action in a multi-user scenario where other users/processes are still using the table would be prevented for tables without sparse columns. This capability offers power and flexibility, but can result in significant performance penalties, in part because any compiled query plans that use this table are automatically invalidated. In addition, dynamic column addition or removal is an operation that should be audited, at the very least – column removal can cause data loss – and allowing an application to modify a table without maintaining some kind of a trail – including a justification for the action – is not good software practice. Such a feature therefore invites abuse and should be used infrequently and judiciously.
  • Another major limitation is that SQL constraints (e.g., range checks) cannot be applied to sparse columns: the only check that is applied is for correct data type. These would have to be implemented in metadata tables and middle-tier code, as is done in production EAV systems. (This consideration also applies to business applications as well.)
  • SQL Server has limitations on row size: the total contents of all atomic-datatype columns, sparse and non-sparse, in a row that contain data cannot exceed 8016 bytes if that table contains a sparse column. Further, sparse columns that happen to contain data have a storage overhead of 4 bytes per column in addition to storage for the data type itself (e.g., 4 bytes for datetime columns). This impacts the amount of sparse-column data that you can associate with a given row. This size restriction is relaxed for the varchar data type, which means that, if one hits row-size limits in a production system, one has to work around it by designating sparse columns as varchar even though they may have a different intrinsic data type. Unfortunately, this approach now subverts server-side data-type checking.

For more details, see http://msdn.microsoft.com/en-us/library/cc280604.aspx

Read more about this topic:  Entity–attribute–value Model

Famous quotes containing the words sparse, columns and/or alternative:

    The report reflects incredibly terrible judgments, shockingly sparse concern for human life, instances of officials lacking the courage to exercise the responsibilities of their high office and some very bewildering thought processes.
    Jane Jarrell Smith, U.S. widow of American astronaut Michael J. Smith. As quoted in Newsweek magazine, p. 13 (June 30, 1986)

    Rage is the only quality which has kept me, or anybody I have ever studied, writing columns for newspapers.
    Jimmy Breslin (b. 1929)

    A mental disease has swept the planet: banalization.... Presented with the alternative of love or a garbage disposal unit, young people of all countries have chosen the garbage disposal unit.
    Ivan Chtcheglov (b. 1934)