In a room with *k* people, how likely is it that there will
be at least two people with the same birthday? How large should
*k* be to ensure better than even odds that at least one birthday
is shared?
The event "At least two people share a birthday" includes many
subevents such as "Three people share a birthday" and
"Two pairs of people share two different birthdays".
The complementary event "No two people share a birthday"
is simpler to work with.
Ignoring leap years and assuming that all the remaining 365
birthdays are equally likely (and that *k* isn't larger
than 365), the number of ways that *k*
different birthdays can be assigned to *k* people is
365! / ( 365 - *k* )!, while the total number of ways
of assigning birthdays to *k* people is 365^{k}.
The ratio of these gives the desired probability, which can
be written

(365/365) * (364/365) * (363/365) * ... * ( (365 - *k* + 1)/365 )

which goes below 1/2 at 23 people.

This can also be calculated using the conditional probability

P( person *i*+1 has a different birthday | the previous *i* people didn't
share a birthday) = (365 - *i*)/365

In general, the probability that no value is assigned twice when assigning
*n* different equally likely values to *k* objects is

(1 - 1/*n*) * (1 - 2/*n*) * (1 - 3/*n*) * ... * (1 - (*k*-1)/*n*)

We can upper bound this using 1 - *x* < exp( -*x* ):

exp(-1/*n*) * exp(-2/*n*) * exp (-3/*n*) * ... * exp( (1-*k*)/*n*)

= exp( -*k*(*k*-1)/2*n* )

so for *k* larger than about sqrt( 2*n* ) there is likely to be
some value assigned twice.

This has applications to the design of cryptographically strong hash
functions. The hash function should not be invertible - it should be
computationally too expensive to find a message that matches a
given hash value - and it should also be difficult to find collisions -
two messages that hash to the same value. If a machine can compute
one million hash values per second, it would take 600,000 years of random
searching to find a message that hashed to a given 64-bit value, but it would
only take about an hour to find a collision.

### References:

Randomized Algorithms, by Rajeev Motwani and Prabhakar Raghavan,
Cambridge University Press, 1995
Applied Cryptography, by Bruce Schneier,
Wiley, 1996