The event "At least two people share a birthday" includes many subevents such as "Three people share a birthday" and "Two pairs of people share two different birthdays". The complementary event "No two people share a birthday" is simpler to work with. Ignoring leap years and assuming that all the remaining 365 birthdays are equally likely (and that k isn't larger than 365), the number of ways that k different birthdays can be assigned to k people is 365! / ( 365 - k )!, while the total number of ways of assigning birthdays to k people is 365k. The ratio of these gives the desired probability, which can be written
(365/365) * (364/365) * (363/365) * ... * ( (365 - k + 1)/365 )
which goes below 1/2 at 23 people.
This can also be calculated using the conditional probability
P( person i+1 has a different birthday | the previous i people didn't share a birthday) = (365 - i)/365
In general, the probability that no value is assigned twice when assigning n different equally likely values to k objects is
(1 - 1/n) * (1 - 2/n) * (1 - 3/n) * ... * (1 - (k-1)/n)
We can upper bound this using 1 - x < exp( -x ):
exp(-1/n) * exp(-2/n) * exp (-3/n) * ... * exp( (1-k)/n)
= exp( -k(k-1)/2n )
so for k larger than about sqrt( 2n ) there is likely to be some value assigned twice.
This has applications to the design of cryptographically strong hash functions. The hash function should not be invertible - it should be computationally too expensive to find a message that matches a given hash value - and it should also be difficult to find collisions - two messages that hash to the same value. If a machine can compute one million hash values per second, it would take 600,000 years of random searching to find a message that hashed to a given 64-bit value, but it would only take about an hour to find a collision.
Applied Cryptography, by Bruce Schneier, Wiley, 1996