A sample space, S, can be any set: finite or infinite, discrete or continuous.
An event is a (measurable) subset of the sample space.
Examples:
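A probability distribution on S assigns a probability P[A] to each event A and satisfies the following axioms: P[A] >= 0 for every event A; P[S] = 1; and P[A or B] = P[A] + P[B] whenever A and B are disjoint (more generally, for any countable sequence of pairwise disjoint events).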
On a finite sample space, the discrete uniform probability distribution gives the same probability, P[s] = 1/|S|, to each point s in S.
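As a minimal sketch of the discrete uniform distribution, the snippet below builds P[s] = 1/|S| for a finite sample space and sums point probabilities to get the probability of an event. The fair-die sample space and the helper names uniform_distribution and probability are illustrative assumptions, not part of the notes.

```python
from fractions import Fraction

def uniform_distribution(sample_space):
    """Give every point of a finite sample space probability 1/|S|."""
    points = list(sample_space)
    return {s: Fraction(1, len(points)) for s in points}

def probability(dist, event):
    """P[event]: sum the probabilities of the points that lie in the event."""
    return sum((p for s, p in dist.items() if s in event), Fraction(0))

# Illustrative sample space: the six faces of a fair die.
die = uniform_distribution(range(1, 7))
even = {2, 4, 6}

print(probability(die, even))      # 1/2
print(probability(die, set(die)))  # 1, the whole sample space
```

Exact fractions are used here so that equalities between probabilities can be checked without rounding error.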
If S is the interval [a, b] with a < b, the continuous uniform probability distribution gives probability P[ [c, d] ] = (d-c)/(b-a) to each subinterval [c, d] of S.
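The interval formula can be sketched as a small function. The name uniform_interval_probability and the sample numbers are hypothetical, chosen only for illustration.

```python
def uniform_interval_probability(a, b, c, d):
    """P[[c, d]] under the continuous uniform distribution on [a, b]."""
    if not (a < b and a <= c <= d <= b):
        raise ValueError("need a < b and a <= c <= d <= b")
    return (d - c) / (b - a)

print(uniform_interval_probability(0, 10, 2, 5))  # 0.3
print(uniform_interval_probability(0, 10, 4, 4))  # 0.0, a single point
```

The second call shows that under this distribution a single point has probability zero, even though it is a possible outcome.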
The conditional probability of an event A with respect to an event B with non-zero probability is P[A|B] = P[A.B]/P[B], where A.B denotes the intersection of A and B.
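To make the definition concrete, here is a small sketch that computes P[A|B] by counting outcomes. It assumes a uniform distribution on rolls of two fair dice; the sample space, the events, and the helper names are illustrative choices.

```python
from fractions import Fraction
from itertools import product

# Illustrative sample space: ordered rolls of two fair dice, all equally likely.
S = list(product(range(1, 7), repeat=2))

def prob(event):
    """P[event] under the uniform distribution on S; event is a predicate."""
    return Fraction(sum(1 for s in S if event(s)), len(S))

def conditional(a, b):
    """P[A|B] = P[A.B] / P[B], defined only when P[B] is non-zero."""
    p_b = prob(b)
    if p_b == 0:
        raise ValueError("conditioning event has probability zero")
    return prob(lambda s: a(s) and b(s)) / p_b

def sum_is_8(s):
    return s[0] + s[1] == 8

def first_is_even(s):
    return s[0] % 2 == 0

print(prob(sum_is_8))                        # 5/36
print(conditional(sum_is_8, first_is_even))  # 1/6
```

Knowing that the first die is even raises the probability of a total of 8 from 5/36 to 1/6.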
A and B are independent if P[A.B] = P[A]P[B]. When P[B] is non-zero, this is equivalent to P[A|B] = P[A].
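The product rule for independent events can be checked by direct enumeration. The sketch below again assumes two fair dice, and the particular events are picked only to show one independent pair and one dependent pair.

```python
from fractions import Fraction
from itertools import product

# Illustrative sample space: ordered rolls of two fair dice.
S = list(product(range(1, 7), repeat=2))

def prob(event):
    """P[event] under the uniform distribution on S."""
    return Fraction(sum(1 for s in S if event(s)), len(S))

def independent(a, b):
    """True when P[A.B] = P[A]P[B]."""
    return prob(lambda s: a(s) and b(s)) == prob(a) * prob(b)

def first_is_even(s):
    return s[0] % 2 == 0

def sum_is_7(s):
    return s[0] + s[1] == 7

def sum_is_8(s):
    return s[0] + s[1] == 8

print(independent(first_is_even, sum_is_7))  # True
print(independent(first_is_even, sum_is_8))  # False
```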
Bayes' Theorem: P[A|B] = P[A]P[B|A]/P[B]
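One way to sanity-check Bayes' Theorem is to compute P[A|B] twice, once from the definition and once from the right-hand side of the theorem. The sketch below does this for two illustrative events on a pair of fair dice; the events and helper names are assumptions made for the example.

```python
from fractions import Fraction
from itertools import product

# Illustrative sample space: ordered rolls of two fair dice.
S = list(product(range(1, 7), repeat=2))

def prob(event):
    """P[event] under the uniform distribution on S."""
    return Fraction(sum(1 for s in S if event(s)), len(S))

def given(a, b):
    """P[A|B] computed directly from the definition P[A.B]/P[B]."""
    return prob(lambda s: a(s) and b(s)) / prob(b)

def first_is_6(s):
    return s[0] == 6

def sum_at_least_10(s):
    return s[0] + s[1] >= 10

direct = given(first_is_6, sum_at_least_10)
via_bayes = (prob(first_is_6) * given(sum_at_least_10, first_is_6)
             / prob(sum_at_least_10))

print(direct, via_bayes)  # 1/2 1/2
```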
A random variable is a real-valued function on a sample space. We will usually assume the sample space is discrete to avoid measurability problems.
A probability distribution on the sample space induces a probability density function for the random variable, X (often called a probability mass function when X is discrete), via P[X = r] = P[ {s in S : X(s) = r} ].
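The induced distribution can be sketched by grouping sample points according to the value X takes on them. The example assumes two fair dice with X equal to their total; induced_distribution and total are hypothetical names.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Illustrative sample space and distribution: two fair dice, uniform.
S = list(product(range(1, 7), repeat=2))
P = {s: Fraction(1, len(S)) for s in S}

def induced_distribution(X, P):
    """Map each value r to P[X = r], the total probability of {s : X(s) = r}."""
    dist = defaultdict(Fraction)  # missing values start at Fraction(0)
    for s, p in P.items():
        dist[X(s)] += p
    return dict(dist)

def total(s):
    """The random variable X: the sum of the two dice."""
    return s[0] + s[1]

dist = induced_distribution(total, P)
print(dist[7])  # 1/6
print(dist[2])  # 1/36
```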
Two random variables, X and Y, are independent if P[ X = p and Y = q ] = P[X = p]P[Y = q] for all p and q.
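For a finite sample space, the condition can be tested by brute force over every pair of values. This is a sketch under the assumption of two fair dice; independent_rvs and the three random variables are illustrative.

```python
from fractions import Fraction
from itertools import product

# Illustrative sample space: ordered rolls of two fair dice.
S = list(product(range(1, 7), repeat=2))

def prob(event):
    """P[event] under the uniform distribution on S."""
    return Fraction(sum(1 for s in S if event(s)), len(S))

def independent_rvs(X, Y):
    """Check P[X = p and Y = q] = P[X = p]P[Y = q] for all values p and q."""
    xs = {X(s) for s in S}
    ys = {Y(s) for s in S}
    return all(
        prob(lambda s: X(s) == p and Y(s) == q)
        == prob(lambda s: X(s) == p) * prob(lambda s: Y(s) == q)
        for p in xs for q in ys
    )

def first(s):
    return s[0]

def second(s):
    return s[1]

def total(s):
    return s[0] + s[1]

print(independent_rvs(first, second))  # True
print(independent_rvs(first, total))   # False
```

The first die and the total are dependent because, for example, a first roll of 1 rules out a total of 12.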
The expected value of X, written E[X], is the average of all possible values of X weighted by their probabilities: E[X] is the sum of r P[X = r] over all values r of X.
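As a short illustration, the weighted average can be computed from a table of values and probabilities. The fair and loaded dice below are made-up examples, and expected_value is a hypothetical helper name.

```python
from fractions import Fraction

def expected_value(pmf):
    """E[X]: sum of r * P[X = r] over all values r."""
    return sum((r * p for r, p in pmf.items()), Fraction(0))

# Illustrative distributions: a fair die, and a die loaded towards 6.
fair = {r: Fraction(1, 6) for r in range(1, 7)}
loaded = {r: Fraction(1, 10) for r in range(1, 6)}
loaded[6] = Fraction(1, 2)

print(expected_value(fair))    # 7/2
print(expected_value(loaded))  # 9/2
```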
The variance of X is Var[X] = E[ (X - E[X])^2 ] = E[ X^2 - 2XE[X] + E[X]^2 ] = E[X^2] - 2E[X]E[X] + E[X]^2 = E[X^2] - E[X]^2.
The standard deviation is the square root of the variance.
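The two expressions for the variance, E[(X - E[X])^2] and E[X^2] - E[X]^2, can be compared numerically. The sketch below does so for a fair die, an illustrative choice; the function names are assumptions.

```python
from fractions import Fraction
from math import sqrt

def expectation(pmf, f=lambda r: r):
    """E[f(X)]: sum of f(r) * P[X = r] over all values r."""
    return sum((f(r) * p for r, p in pmf.items()), Fraction(0))

def variance_by_definition(pmf):
    """Var[X] as the expected squared deviation from the mean."""
    mu = expectation(pmf)
    return expectation(pmf, lambda r: (r - mu) ** 2)

def variance_by_shortcut(pmf):
    """Var[X] as E[X^2] minus the square of E[X]."""
    return expectation(pmf, lambda r: r * r) - expectation(pmf) ** 2

die = {r: Fraction(1, 6) for r in range(1, 7)}

print(variance_by_definition(die))  # 35/12
print(variance_by_shortcut(die))    # 35/12
print(sqrt(variance_by_definition(die)))  # standard deviation, about 1.708
```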
The covariance of two random variables, X and Y, is
Cov[X, Y] = E[ (X - E[X])(Y - E[Y]) ] = E[XY] - E[X]E[Y]
If X and Y are independent, the covariance is zero, so it gives a measure of the (linear) dependence between X and Y. The converse does not hold: dependent variables can also have zero covariance. Dividing the covariance by the product of the standard deviations of X and Y gives the correlation coefficient.
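A small sketch of covariance and the correlation coefficient, assuming a uniform distribution on rolls of two fair dice. The random variables and the names E, cov and corr are illustrative. The last pair, the difference and the total of the dice, is included to show two variables that are dependent (they always have the same parity) yet have covariance zero.

```python
from fractions import Fraction
from itertools import product
from math import sqrt

# Illustrative sample space: ordered rolls of two fair dice, all equally likely.
S = list(product(range(1, 7), repeat=2))

def E(X):
    """Expected value of X under the uniform distribution on S."""
    return sum((Fraction(X(s)) for s in S), Fraction(0)) / len(S)

def cov(X, Y):
    """Covariance: E[XY] - E[X]E[Y]."""
    return E(lambda s: X(s) * Y(s)) - E(X) * E(Y)

def corr(X, Y):
    """Covariance divided by the product of the standard deviations."""
    return float(cov(X, Y)) / sqrt(cov(X, X) * cov(Y, Y))

def first(s):
    return s[0]

def second(s):
    return s[1]

def total(s):
    return s[0] + s[1]

def diff(s):
    return s[0] - s[1]

print(cov(first, second))             # 0, independent
print(cov(first, total))              # 35/12
print(round(corr(first, total), 4))   # 0.7071
print(cov(diff, total))               # 0, although diff and total are dependent
```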