In information theory, self-information is a measure of the information content associated with the outcome of a random variable. It is expressed in a unit of information, for example bits, nats, or hartleys, depending on the base of the logarithm used in its calculation. The term self-information is also sometimes used as a synonym for entropy, i.e. the expected value of self-information in the first sense, because $H(X) = I(X; X)$, where $I(X; X)$ is the mutual information of X with itself. These two meanings are not equivalent, and this article covers the first sense only. For the other sense, see Entropy.
By definition, the amount of self-information contained in a probabilistic event depends only on the probability of that event: the smaller its probability, the larger the self-information associated with receiving the information that the event indeed occurred.
Further, by definition, the measure of self-information is non-negative and additive. If an event C is the intersection of two independent events A and B, then the amount of information conveyed by the announcement that C has occurred equals the sum of the amounts of information conveyed by the announcements of event A and of event B: $I(A \cap B) = I(A) + I(B)$.
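Taking these properties into account, the self-information $I(\omega_n)$ associated with an outcome $\omega_n$ of probability $P(\omega_n)$ is:
$$I(\omega_n) = -\log P(\omega_n) = \log \frac{1}{P(\omega_n)}$$
This definition satisfies the above conditions: since $0 < P(\omega_n) \le 1$, the logarithm is non-positive and so $I(\omega_n) \ge 0$; and for independent events $P(A \cap B) = P(A)\,P(B)$, so the logarithm turns the product into the sum $I(A) + I(B)$. The base of the logarithm is not fixed by the definition: with base 2 the unit of $I(\omega_n)$ is the bit, with the natural logarithm (base e) it is the nat, and with base 10 it is the hartley.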
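The definition and the effect of the logarithm's base can be sketched in Python as follows. The helper name `self_information` is a hypothetical illustration, not a library function, and it assumes the outcome's probability is already known and lies in (0, 1].

```python
import math

def self_information(p: float, base: float = 2.0) -> float:
    """Self-information -log_base(p) of an outcome with probability p.

    Hypothetical helper for illustration; p must lie in (0, 1].
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log(p, base)

p = 0.25
bits = self_information(p, 2)            # log2(4) = 2 bits
nats = self_information(p, math.e)       # ln(4), about 1.386 nats
hartleys = self_information(p, 10)       # log10(4), about 0.602 hartleys

# Additivity for independent events: P(A and B) = P(A) * P(B).
p_a, p_b = 0.5, 0.25
joint = self_information(p_a * p_b)
separate = self_information(p_a) + self_information(p_b)
print(joint, separate)  # both equal 3 bits, up to floating-point rounding
```

Changing the base only rescales the result by a constant factor (1 bit = ln 2, about 0.693 nats), which is why the definition can leave the base unspecified.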
As a quick illustration, when a fair coin is tossed 4 times, the information content of the outcome "4 heads" (or of any other specific sequence, each of which has probability 1/16) is $-\log_2(1/16) = 4$ bits, while the information content of getting any result other than the specified one (probability 15/16) is $-\log_2(15/16) \approx 0.09$ bits. See below for detailed examples.
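The coin-toss figures can be checked numerically. This sketch assumes a fair coin, so that each of the 2^4 = 16 possible sequences of four tosses is equally likely; the variable names are illustrative only.

```python
import math

# Assumes a fair coin: each of the 2**4 = 16 sequences of four tosses is equally likely.
n_tosses = 4
p_specific = 0.5 ** n_tosses         # probability of one specified sequence, 1/16
p_other = 1.0 - p_specific           # probability of any other sequence, 15/16

info_specific = -math.log2(p_specific)   # 4 bits
info_other = -math.log2(p_other)         # log2(16/15), about 0.093 bits

print(f"I(specified sequence) = {info_specific:.2f} bits")
print(f"I(any other sequence) = {info_other:.2f} bits")
```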
This measure has also been called surprisal, as it represents the "surprise" of seeing the outcome (a highly improbable outcome is very surprising). This term was coined by Myron Tribus in his 1961 book Thermostatics and Thermodynamics.
The information entropy of a random variable is the expected value of the self-information of its outcomes: $H(X) = \mathrm{E}[I(X)] = -\sum_{x} P(X = x) \log P(X = x)$.
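A short sketch of entropy as expected self-information, assuming a discrete distribution given as a list of probabilities that sum to 1. The function name `entropy` is a hypothetical helper, and outcomes with probability 0 are skipped, following the usual convention that 0 log 0 = 0.

```python
import math

def entropy(probs, base=2.0):
    """Expected self-information: sum of p * (-log p) over outcomes.

    Hypothetical helper; zero-probability outcomes contribute nothing (0 log 0 = 0).
    """
    return sum(p * -math.log(p, base) for p in probs if p > 0)

fair_coin = [0.5, 0.5]       # 1 bit
biased_coin = [0.9, 0.1]     # about 0.469 bits: the likely outcome carries little surprise
fair_die = [1 / 6] * 6       # log2(6), about 2.585 bits

for name, dist in [("fair coin", fair_coin), ("biased coin", biased_coin), ("fair die", fair_die)]:
    print(f"{name}: {entropy(dist):.3f} bits")
```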
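Used as a penalty for a probabilistic forecast, self-information (the negative logarithm of the probability the forecast assigned to the outcome that actually occurred) is the logarithmic scoring rule, which is an example of a strictly proper scoring rule.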
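Propriety means a forecaster minimizes the expected penalty by reporting the true probabilities. The sketch below illustrates this for one assumed true distribution over two outcomes and a handful of candidate forecasts; the function names and numbers are hypothetical, and the result follows from Gibbs' inequality rather than from this particular example.

```python
import math

def log_score(forecast, outcome, base=2.0):
    """Penalty: self-information of the realized outcome under the forecast (hypothetical helper)."""
    return -math.log(forecast[outcome], base)

def expected_log_score(true_probs, forecast, base=2.0):
    """Average penalty when outcomes are actually drawn from true_probs."""
    return sum(p * log_score(forecast, i, base) for i, p in enumerate(true_probs) if p > 0)

true_probs = [0.7, 0.3]  # assumed "true" distribution for this illustration
candidates = [[0.5, 0.5], [0.6, 0.4], [0.7, 0.3], [0.8, 0.2], [0.95, 0.05]]

scores = {tuple(q): expected_log_score(true_probs, q) for q in candidates}
best = min(scores, key=scores.get)
print(best)  # the truthful forecast (0.7, 0.3) has the lowest expected penalty
```

The minimum expected penalty equals the entropy of the true distribution; any other forecast adds a positive Kullback-Leibler divergence term.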