# G-test - Relation To Mutual Information

For analysis of contingency tables the value of G can also be expressed in terms of mutual information.

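Let

$$N = \sum_{ij} O_{ij}, \quad \pi_{ij} = \frac{O_{ij}}{N}, \quad \pi_{i\cdot} = \frac{\sum_j O_{ij}}{N}, \quad \text{and} \quad \pi_{\cdot j} = \frac{\sum_i O_{ij}}{N},$$

where $O_{ij}$ is the observed count in row $i$ and column $j$ of the table.

Then G can be expressed in several alternative forms:

$$G = 2N \sum_{ij} \pi_{ij} \left( \ln \pi_{ij} - \ln \pi_{i\cdot} - \ln \pi_{\cdot j} \right),$$

$$G = 2N \left[ H(r) + H(c) - H(r,c) \right],$$

$$G = 2N \operatorname{MI}(r,c),$$

where the entropy of a discrete random variable $X$ is defined as

$$H(X) = -\sum_{x \in \operatorname{Supp}(X)} p(x) \ln p(x),$$

and where

$$\operatorname{MI}(r,c) = H(r) + H(c) - H(r,c)$$

is the mutual information between the row vector $r$ and the column vector $c$ of the contingency table.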
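The equivalence between G and mutual information can be checked numerically. The following is a minimal sketch, assuming natural logarithms, the convention that zero cells contribute nothing to the sums, and a made-up 2×3 table of counts; the function names are illustrative and do not come from any library.

```python
import math

def entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute 0."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def g_from_counts(table):
    """G = 2 * sum O_ij * ln(O_ij / E_ij), with E_ij = row_i * col_j / N."""
    n = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    total = 0.0
    for i, row in enumerate(table):
        for j, observed_count in enumerate(row):
            if observed_count > 0:
                expected = row_sums[i] * col_sums[j] / n
                total += observed_count * math.log(observed_count / expected)
    return 2.0 * total

def g_from_mutual_information(table):
    """G = 2 * N * MI(r, c), with MI = H(r) + H(c) - H(r, c)."""
    n = sum(sum(row) for row in table)
    joint = [count / n for row in table for count in row]
    row_marginals = [sum(row) / n for row in table]
    col_marginals = [sum(col) / n for col in zip(*table)]
    mi = entropy(row_marginals) + entropy(col_marginals) - entropy(joint)
    return 2.0 * n * mi

# Hypothetical counts, chosen only for illustration.
observed = [[10, 20, 30],
            [20, 15, 5]]

g_direct = g_from_counts(observed)
g_mi = g_from_mutual_information(observed)
print(g_direct, g_mi)  # the two values should agree up to rounding error
```

For a table whose rows are exactly proportional, such as `[[10, 20], [20, 40]]`, both functions should return a value that is zero up to rounding, reflecting zero mutual information between rows and columns.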

It can also be shown that the inverse document frequency weighting commonly used for text retrieval is an approximation of G, applicable when the row sum for the query is much smaller than the row sum for the remainder of the corpus. Similarly, Bayesian inference applied to the choice between a single multinomial distribution for all rows of the contingency table taken together and the more general alternative of a separate multinomial per row produces results very similar to the G statistic.

