The conditional probability P ( Y ≤ 1/3 | X ) may be defined as the best predictor of the indicator
given X. That is, it minimizes the mean square error E ( I - g(X) )2 on the class of all random variables of the form g (X).
In the case f = f1 the corresponding function g = g1 may be calculated explicitly,
Alternatively, the limiting procedure may be used,
giving the same result.
Thus, P ( Y ≤ 1/3 | X ) = g1 (X). The expectation of this random variable is equal to the (unconditional) probability, E ( P ( Y ≤ 1/3 | X ) ) = P ( Y ≤ 1/3 ), namely,
which is an instance of the law of total probability E ( P ( A | X ) ) = P ( A ).
In the case f = f2 the corresponding function g = g2 probably cannot be calculated explicitly. Nevertheless it exists, and can be computed numerically. Indeed, the space L2 (Ω) of all square integrable random variables is a Hilbert space; the indicator I is a vector of this space; and random variables of the form g (X) are a (closed, linear) subspace. The orthogonal projection of this vector to this subspace is well-defined. It can be computed numerically, using finite-dimensional approximations to the infinite-dimensional Hilbert space.
Once again, the expectation of the random variable P ( Y ≤ 1/3 | X ) = g2 (X) is equal to the (unconditional) probability, E ( P ( Y ≤ 1/3 | X ) ) = P ( Y ≤ 1/3 ), namely,
However, the Hilbert space approach treats g2 as an equivalence class of functions rather than an individual function. Measurability of g2 is ensured, but continuity (or even Riemann integrability) is not. The value g2 (0.5) is determined uniquely, since the point 0.5 is an atom of the distribution of X. Other values x are not atoms, thus, corresponding values g2 (x) are not determined uniquely. Once again, "the concept of a conditional probability with regard to an isolated hypothesis whose probability equals 0 is inadmissible." (Kolmogorov; quoted in ).
Alternatively, the same function g (be it g1 or g2) may be defined as the Radon–Nikodym derivative
where measures μ, ν are defined by
for all Borel sets That is, μ is the (unconditional) distribution of X, while ν is one third of its conditional distribution,
Both approaches (via the Hilbert space, and via the Radon–Nikodym derivative) treat g as an equivalence class of functions; two functions g and g′ are treated as equivalent, if g (X) = g′ (X) almost surely. Accordingly, the conditional probability P ( Y ≤ 1/3 | X ) is treated as an equivalence class of random variables; as usual, two random variables are treated as equivalent if they are equal almost surely.
Other articles related to "conditional probability, probability, conditional":
... Conditioning (probability) Conditional expectation Conditional probability distribution Regular conditional probability Disintegration theorem Bayes' theorem de Finetti's ...
... Formally, is defined as the probability of according to a new probability function on the sample space, such that outcomes not in have probability 0 and that it ... A new probability distribution (denoted by the conditional notation) is to be assigned on to reflect this ... Substituting 1 and 2 into 3 to select So the new probability distribution is Now for a general event ...
... The conditional probability P ( Y ≤ 1/3
Famous quotes containing the words probability and/or conditional:
“The probability of learning something unusual from a newspaper is far greater than that of experiencing it; in other words, it is in the realm of the abstract that the more important things happen in these times, and it is the unimportant that happens in real life.”
—Robert Musil (18801942)
“Computer mediation seems to bathe action in a more conditional light: perhaps it happened; perhaps it didnt. Without the layered richness of direct sensory engagement, the symbolic medium seems thin, flat, and fragile.”
—Shoshana Zuboff (b. 1951)