Kullback–Leibler Divergence

In probability theory and information theory, the Kullback–Leibler divergence (also called information divergence, information gain, relative entropy, or KLIC) is a non-symmetric measure of the difference between two probability distributions P and Q. Specifically, the Kullback–Leibler divergence of Q from P is a measure of the information lost when Q is used to approximate P: with logarithms taken to base 2, it is the expected number of extra bits required to code samples from P when using a code optimized for Q rather than a code optimized for P. Typically P represents the "true" distribution of data or observations, or a precisely calculated theoretical distribution, while Q represents a theory, model, description, or approximation of P.
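As an illustration of the "extra bits" reading, the following is a minimal Python sketch for discrete distributions. The function name `kl_divergence_bits` and the example distributions `p` and `q` are assumptions made for this example, not part of any standard library.

```python
import math

def kl_divergence_bits(p, q):
    """D_KL(P || Q) in bits for discrete distributions given as sequences.

    Hypothetical helper for illustration; p and q are assumed to be
    probability vectors over the same outcomes.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # convention: a zero-probability outcome contributes 0
        if qi == 0.0:
            return math.inf  # P puts mass where Q puts none
        total += pi * math.log2(pi / qi)
    return total

p = [0.5, 0.25, 0.25]  # "true" distribution P (illustrative values)
q = [0.25, 0.25, 0.5]  # approximating distribution Q (illustrative values)

# Expected code length when coding samples from P with a code optimized for Q,
# minus the expected length with a code optimized for P (the entropy of P).
cross_entropy = -sum(pi * math.log2(qi) for pi, qi in zip(p, q))
entropy = -sum(pi * math.log2(pi) for pi in p)
extra_bits = cross_entropy - entropy

print(kl_divergence_bits(p, q))  # 0.25
print(extra_bits)                # 0.25
```

For these values the divergence and the difference in expected code length agree at 0.25 bits per sample, which is the coding interpretation described above.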

Although it is often intuited as a metric or distance, the KL divergence is not a true metric. For example, it is not symmetric: the KL divergence from P to Q is generally not the same as the KL divergence from Q to P. It also does not satisfy the triangle inequality. However, its infinitesimal form, specifically its Hessian, is a metric tensor: it is the Fisher information metric.
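Both points can be checked numerically on a simple family. The sketch below uses Bernoulli distributions; the helper `kl_bernoulli` and the chosen parameter values are assumptions for illustration. It compares the two directions of the divergence, then compares the divergence between two nearby parameters with the quadratic form given by the Fisher information of the Bernoulli family, 1 / (θ(1 − θ)).

```python
import math

def kl_bernoulli(a, b):
    """D_KL(Bernoulli(a) || Bernoulli(b)) in nats, assuming 0 < a, b < 1.

    Hypothetical helper for illustration.
    """
    return a * math.log(a / b) + (1 - a) * math.log((1 - a) / (1 - b))

# Asymmetry: swapping the arguments changes the value.
forward = kl_bernoulli(0.5, 0.1)
reverse = kl_bernoulli(0.1, 0.5)
print(forward, reverse)

# Local behaviour: for a small step eps, the divergence is approximately
# 0.5 * I(theta) * eps**2, where I(theta) is the Fisher information.
theta, eps = 0.3, 1e-3
fisher = 1.0 / (theta * (1 - theta))
exact = kl_bernoulli(theta, theta + eps)
approx = 0.5 * fisher * eps ** 2
print(exact, approx)
```

The two directions give clearly different values, while the exact divergence and the quadratic approximation agree closely for a small step, with the gap shrinking as `eps` decreases.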

KL divergence is a special case of a broader class of divergences called f-divergences. It was introduced by Solomon Kullback and Richard Leibler in 1951 as the directed divergence between two distributions. It can also be derived as a Bregman divergence.
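For discrete distributions, both relationships can be written out as a short sketch. The notation P(x), Q(x) for probability mass functions and the symbols D_f and B_F are assumptions introduced here for illustration. Choosing f(t) = t log t in the f-divergence, or the negative entropy F as the generator of the Bregman divergence, gives the KL divergence:

```latex
% f-divergence with f(t) = t \log t
D_f(P \,\|\, Q) = \sum_x Q(x)\, f\!\left(\frac{P(x)}{Q(x)}\right)
               = \sum_x P(x) \log \frac{P(x)}{Q(x)}

% Bregman divergence generated by F(p) = \sum_x p(x) \log p(x)
B_F(P, Q) = F(P) - F(Q) - \langle \nabla F(Q),\, P - Q \rangle
          = \sum_x P(x) \log \frac{P(x)}{Q(x)} - \sum_x P(x) + \sum_x Q(x)
```

When P and Q are both normalized, the last two sums in the Bregman expression cancel, leaving the same quantity as the f-divergence expression.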
