A hypothesis is a function on the sample space, assigning a value to each point in the sample space. If the possible values are {0, 1}, then we can identify a hypothesis with the subset of points that are assigned value 1. The error of a hypothesis is the probability of the set of points where it disagrees with the true hypothesis. Learning from examples is the process of making independent random observations and eliminating those hypotheses that disagree with the observations.
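The setup above can be sketched concretely. This is an illustrative toy example, not from the source: the sample space, the particular hypotheses, and the function names are all assumptions, with a small uniform space so error is just the fraction of disagreeing points.

```python
import random

# Hypothetical sample space {0, ..., 9} with the uniform distribution.
# Hypotheses are identified with the subset of points assigned value 1.
SPACE = list(range(10))
true_h = {0, 1, 2, 3, 4}      # the (assumed) true hypothesis
hypotheses = [
    {0, 1, 2, 3, 4},          # the true hypothesis itself: error 0
    {0, 1, 2, 3},             # disagrees on point 4: error 1/10
    {5, 6, 7, 8, 9},          # disagrees everywhere: error 1
]

def error(h, target, space):
    """Error = probability of the set where h and target disagree
    (the symmetric difference, under the uniform distribution here)."""
    return len(h ^ target) / len(space)

def eliminate(hyps, observations, target):
    """Keep only the hypotheses consistent with every observation,
    i.e. those agreeing with the target on each observed point."""
    return [h for h in hyps
            if all((x in h) == (x in target) for x in observations)]

obs = [random.choice(SPACE) for _ in range(20)]
survivors = eliminate(hypotheses, obs, true_h)
```

Note that the true hypothesis always survives elimination; the question the rest of the note answers is how many observations are needed before, with high probability, no high-error hypothesis survives alongside it.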

How many examples do you need to see so that you have a given level of confidence that picking a hypothesis consistent with your observations will give you error less than E?

Suppose we have a finite set of hypotheses, H, and that we make m observations. If h is a hypothesis with error greater than E, then the probability that it is consistent with a given observation is less than 1 - E, and the probability that it is consistent with all m observations is less than (1 - E)^m, which is less than exp(-Em). Therefore, by the union bound over the at most |H| such hypotheses, the probability that some hypothesis with error greater than E survives all m observations is less than |H| exp(-Em). We can set this bound at some desired level, say d, and solve for m, giving

m > (ln |H| + ln(1/d)) / E
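Solving the bound for m can be written directly as a small helper. The function name and the worked numbers below are illustrative choices, not from the source:

```python
from math import ceil, log

def sample_complexity(H_size, eps, delta):
    """Smallest integer m satisfying m > (ln|H| + ln(1/delta)) / eps,
    so that after m observations every surviving hypothesis has error
    at most eps with probability at least 1 - delta."""
    return ceil((log(H_size) + log(1.0 / delta)) / eps)

# e.g. 1000 hypotheses, target error 0.1, confidence 95% (delta = 0.05):
m = sample_complexity(1000, 0.1, 0.05)   # -> 100
```

Note that m grows only logarithmically in |H| and in 1/d, but linearly in 1/E: halving the tolerable error doubles the required number of examples, while squaring the size of the hypothesis class merely doubles the ln |H| term.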

### Reference:

Tom M. Mitchell, *Machine Learning*, McGraw-Hill, 1997.