A **sample space**, S, can be any set, finite or infinite, discrete or continuous

An **event** is a (measurable) subset of the sample space

Examples:

- Flipping a fair coin: S = {H, T}, P(H) = P(T) = 1/2. The events are {}, {H}, {T}, and {H, T}; the last is the event of getting either a head or a tail, and it has probability 1.
- Rolling a fair die: S = {1, 2, 3, 4, 5, 6}, P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = 1/6
- Choosing a card: S = 52 cards, each with a suit (S, H, D, C) and a rank (2, 3, 4, ..., 10, J, Q, K, A)
- Urn models: An urn contains balls of different colors. A ball is selected at random and recorded. It may or may not be replaced before the next ball is selected.
- Selecting a real number between 0 and 1. If the number is chosen uniformly (no part of the interval is favored over another), then the probability of the event (a, b) with 0 < a < b < 1 is the length of the interval, b - a.
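The finite examples above can be sketched directly in Python by treating the sample space as a set and events as subsets, with the equally-likely rule P[A] = |A|/|S|. The function and variable names here are illustrative, not from the notes:

```python
from fractions import Fraction

def prob(event, sample_space):
    """P[event] when every outcome in sample_space is equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))

# Fair die: the event "roll an even number".
die = {1, 2, 3, 4, 5, 6}
even = {2, 4, 6}
print(prob(even, die))        # 1/2

# A 52-card deck built from suits and ranks, as in the card example.
suits = "SHDC"
ranks = [str(n) for n in range(2, 11)] + ["J", "Q", "K", "A"]
deck = {(s, r) for s in suits for r in ranks}
hearts = {(s, r) for (s, r) in deck if s == "H"}
print(prob(hearts, deck))     # 1/4
```

Using `Fraction` keeps the probabilities exact instead of rounding them to floats.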

A **probability distribution** on S satisfies the following axioms:

- P[A] is non-negative for any event A
- P[S] = 1
- If A and B are mutually exclusive, then P[A U B] = P[A] + P[B]
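The three axioms can be checked by enumeration for the equally-likely distribution on a small sample space. A minimal Python sketch, with illustrative names:

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}            # fair die

def P(A):
    """Uniform distribution on S: P[A] = |A| / |S|."""
    return Fraction(len(A), len(S))

A, B = {1, 2}, {5, 6}             # mutually exclusive events

print(P(A) >= 0)                  # non-negativity
print(P(S))                       # P[S] = 1
print(P(A | B) == P(A) + P(B))    # additivity for disjoint A, B
```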

On a finite sample space, the **discrete uniform probability distribution**
gives the same probability, P[s] = 1/|S|, to each point s in S.

If S is the interval [a, b], the **continuous uniform probability distribution**
gives probability P[ [c, d] ] = (d-c)/(b-a) to a subinterval of S.
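Both uniform distributions can be written as one-liners. A small sketch (function names are mine, not from the notes):

```python
from fractions import Fraction

# Discrete uniform on a finite S: each point gets probability 1/|S|.
def discrete_uniform(S):
    return {s: Fraction(1, len(S)) for s in S}

# Continuous uniform on [a, b]: a subinterval [c, d] gets (d - c)/(b - a).
def interval_prob(c, d, a, b):
    assert a <= c <= d <= b
    return (d - c) / (b - a)

print(discrete_uniform({1, 2, 3, 4, 5, 6})[3])   # 1/6
print(interval_prob(0.25, 0.75, 0.0, 1.0))       # 0.5
```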

The **conditional probability** of an event A with respect to an event
B with non-zero probability is P[A|B] = P[A.B]/P[B], where A.B denotes
the intersection of A and B.

A and B are **independent** if P[A.B] = P[A]P[B]; equivalently, when P[B] > 0, P[A|B] = P[A].
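Both definitions can be verified by enumeration. In this small sketch the events A = "even" and B = "at most 4" on a fair die happen to be independent (the names are illustrative):

```python
from fractions import Fraction

space = {1, 2, 3, 4, 5, 6}      # fair die, all outcomes equally likely
A = {2, 4, 6}                    # even roll
B = {1, 2, 3, 4}                 # roll of at most 4

def prob(event):
    return Fraction(len(event), len(space))

def cond(A, B):
    """P[A|B] = P[A.B] / P[B], defined when P[B] > 0."""
    return prob(A & B) / prob(B)

print(cond(A, B))                          # 1/2, the same as P[A]
print(prob(A & B) == prob(A) * prob(B))    # True: A and B are independent
```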

**Bayes' Theorem:** P[A|B] = P[A]P[B|A]/P[B]
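A worked instance, using a hypothetical two-urn setup of my own (the numbers are illustrative, not from the notes): urn 1 holds 3 red and 1 blue ball, urn 2 holds 1 red and 3 blue; an urn is chosen uniformly (event A = "urn 1"), then a ball is drawn (event B = "red").

```python
from fractions import Fraction

P_A = Fraction(1, 2)               # P[A]: urn 1 is chosen
P_B_given_A = Fraction(3, 4)       # P[B|A]: red from urn 1
P_B_given_notA = Fraction(1, 4)    # P[B|not A]: red from urn 2

# Total probability: P[B] = P[B|A]P[A] + P[B|not A]P[not A]
P_B = P_B_given_A * P_A + P_B_given_notA * (1 - P_A)

# Bayes' Theorem: P[A|B] = P[A] P[B|A] / P[B]
P_A_given_B = P_A * P_B_given_A / P_B
print(P_A_given_B)   # 3/4
```

Seeing a red ball raises the probability that urn 1 was chosen from 1/2 to 3/4.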

A **random variable** is a real-valued function on a sample space.
We will usually assume the sample space is discrete to avoid measurability
problems.

A probability distribution on the sample space induces a probability mass function for the random variable, X, via P[X = r] = P[{s : X(s) = r}]

Two random variables, X and Y, are **independent** if P[ X = p and Y = q ] =
P[X = p]P[Y = q] for all p and q.
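This condition can be checked exhaustively on a small sample space. A sketch with two fair dice, where X and Y are the two rolls (the names are illustrative):

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered pairs of fair-die rolls, all 36 equally likely.
space = list(product(range(1, 7), repeat=2))

def P(pred):
    """Probability of the event {s in space : pred(s)}."""
    hits = sum(1 for s in space if pred(s))
    return Fraction(hits, len(space))

X = lambda s: s[0]   # first roll
Y = lambda s: s[1]   # second roll

# Check P[X = p and Y = q] == P[X = p] * P[Y = q] for every pair (p, q).
independent = all(
    P(lambda s, p=p, q=q: X(s) == p and Y(s) == q)
    == P(lambda s, p=p: X(s) == p) * P(lambda s, q=q: Y(s) == q)
    for p in range(1, 7) for q in range(1, 7)
)
print(independent)   # True
```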

The **expected value** of X, written E[X], is the weighted average of all possible
values of X weighted by their probabilities: E[X] = sum of r P[X = r] over all values r taken by X.
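For one roll of a fair die, the weighted average works out to 7/2. A minimal sketch:

```python
from fractions import Fraction

# E[X] for a fair die: each value 1..6 weighted by probability 1/6.
p = Fraction(1, 6)
E = sum(v * p for v in range(1, 7))
print(E)   # 7/2
```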

The **variance** of X is Var[X] = E[ (X - E[X])^{2} ]
= E[ X^{2} - 2XE[X] + E[X]^{2} ]
= E[X^{2}] - 2E[X]E[X] + E[X]^{2}
= E[X^{2}] - E[X]^{2}
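The derivation above says the definition E[(X - E[X])^2] and the shortcut E[X^2] - E[X]^2 must agree; a quick numeric check on a fair die (illustrative names):

```python
from fractions import Fraction

p = Fraction(1, 6)
values = range(1, 7)
E = sum(v * p for v in values)                 # E[X] = 7/2
E_sq = sum(v * v * p for v in values)          # E[X^2] = 91/6

var_definition = sum((v - E) ** 2 * p for v in values)   # E[(X - E[X])^2]
var_shortcut = E_sq - E ** 2                             # E[X^2] - E[X]^2

print(var_definition, var_shortcut)   # 35/12 35/12
```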

The **standard deviation** is the square root of the variance.

The **covariance** of two random variables, X and Y, is

E[ (X - E[X])(Y - E[Y]) ] = E[XY] - E[X]E[Y]

If X and Y are independent this is zero (though zero covariance does not
imply independence), so it measures the degree of dependence between
X and Y. If you normalize it by dividing by the standard deviations of
X and Y, you get the **correlation coefficient**.
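A sketch on two fair dice, with illustrative variable names: X is the first roll, Z the second (independent of X), and Y = X + Z (dependent on X).

```python
from fractions import Fraction
from itertools import product
import math

space = list(product(range(1, 7), repeat=2))   # 36 equally likely outcomes
P = Fraction(1, len(space))

def E(f):
    return sum(f(s) * P for s in space)

X = lambda s: s[0]          # first roll
Z = lambda s: s[1]          # second roll, independent of X
Y = lambda s: s[0] + s[1]   # sum of the rolls, dependent on X

def cov(U, V):
    """E[UV] - E[U]E[V]."""
    return E(lambda s: U(s) * V(s)) - E(U) * E(V)

print(cov(X, Z))     # 0, since X and Z are independent
print(cov(X, Y))     # 35/12, nonzero dependence

def corr(U, V):
    """Covariance normalized by the standard deviations."""
    sd = lambda W: math.sqrt(cov(W, W))
    return float(cov(U, V)) / (sd(U) * sd(V))

print(round(corr(X, Y), 4))
```

Note that cov(X, Y) = cov(X, X) + cov(X, Z) = Var[X] = 35/12, and the correlation of X with Y comes out to 1/sqrt(2), strictly between 0 and 1.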