The problems below are related to applications in **machine learning** and **analysis of data**. These techniques are the foundation of making decisions like "what language is this web page written in?" Probability and dice -------------------- Given a set of dice, how many distinct throws are there overall? (requires just list of dice) Given a set of dice, how many ways are there to throw a 12? A 6-sided die has been shaved on the 2/5 axis. As a result, the probabilities of different rolls are are non-uniform: Face Probability ---- ----------- 1 0.162 2 0.176 3 0.162 4 0.162 5 0.176 6 0.162 Table: probabilities of a shaved die A bag contains ten 6-sided dice. One is shaved as described above; the others are fair (which to say all faces are equally probable). - Define a function that uses Racket's `random` function to simulate a fair die - Define a function that uses Racket's `random` function to simulate the shaved die - What examples and tests can you write? Can a test distinguish, with 100% certainty, between a fair die and a shaved die? (You may find it helpful to write tests using `check-range`.) What data structure best represents a die? Design a data structure that is capable of answering these questions: - What is the probability of rolling a given face? - Using the die and the `random` function, what is the result of a simulated roll of the die? - An experimenter is rolling one of two dice $D$ and $D'$. Write a function that takes as arguments the dice $D$ and $D'$ and an observed roll $R$. Return the *weight of evidence* in favor of the proposition that the experimenter used die $D$. The weight of evidence should be measure in *decibans*. If you've forgotten how to compute weight of evidence, choosing between just the two dice you would have $$ W(D:R) = 10 \log_{10} \frac{P(R|D)}{P(R|D')} $$ which gives the weight measured in decibans. - Opinions vary, but if a proposition is plausible, many people might expect 15 decibans to constitute strong evidence in favor of the proposition. Comparing the fair 6-sided die with the shaved 6-sided die, how many rolls would you need to expect to accumulate a weight of evidence of 15 decibans in favor of the correct die? Solve this problem by writing a function that answers the question for *any* pair of dice. - Suppose a gambler has a bag of ten dice and one is shaved. You're allowed to reach into the bag, pick a die, and roll it as many times as you like. Based on the results of the rolls, you will bet a thousand dollars that you know whether the die was shaved. The bet will be settled with a micrometer, which will answer the question unambiguously. You want to have a 99% chance of winning this bet. How many rolls should you insist on? *What function do you write* to answer this kind of question? What information in the problem needs to be input to this function? Do you want or need auxiliary functions? How will you structure the program? Conditional probability ----------------------- I'm going to use dice to illustrate a technique that plays a major role in many algorithms and data structures used for modeling and learning. The technique provides a way to account for probabilities of events that are not themselves observed, but which affect an observation. (NEED A REAL-WORLD EXAMPLE HERE.) The law in question is simple: given any observation $O$ and set of mutually exclusive events $E_1, \ldots, E_n$, then we can compute the probability of\ $O$ by summing up the probability of\ $O$ conditional on each event: - $P(O|E_i)$ is the probability of $O$ given $E_i$ - $P(E_i)$ the probability of $E_i$ - Therefore the probability of $O$ overall is $$ P(O) = \sum_{i=1}^n P(O | E_i) P(E_i) $$ Another way to arrive at the same formula is that if $i \ne j$, then not only are $E_i$ and $E_j$ mutually exclusive events, but so are $O \land E_i$ and $O \land E_j$. That means $$ P(O) = \sum_{i=1}^n P(O \land E_i) $$ at which point our standard way of handling conjunction kicks in Programming conditional probability ----------------------------------- If we have a particular observation $O$ in mind, we can code $P(O|E_i)$ as a function of the two variables $O$ and $E_i$. And if we have an additional observable set $F_i$, let's code $P(O|E_i \land F_i)$ as a function of the three variables $O$, $E_i$, and $F_i$. We work our way from simple functions to more complex functions: - Define `characteristic-bool` as a function that takes a Boolean and returns 1 if the Boolean is true and 0 if it is false. - Define `P-total-given-both` as a function of three arguments which returns the probability that the first argument equals the total of the second and third arguments. - Define `P-total-given-first-and-d4` which takes two arguments and gives the probability that the first argument is the total of the second argument and the result of throwing a fair four-sided die. - Define `P-total-given-2d4` which takes one arguments and gives the probability that the argument is the total of the throwing two independent, fair, four-sided dice. - **Extra Credit**: James Bond is playing craps with Blofeld. Blofeld surreptitiously substitutes two shaved dice for Bond's fair six-sided dice. What is the probability that Bond rolls a seven? - **To do after we do lists**: make the dice arguments - **To do when we get to anonymous lambdas**: abstract over everything and make things higher-order. Probability in real life ------------------------ 3. *Problem-solving by calculation*. In the United States, breast cancer kills more women than any other cancer except lung cancer, and it is the second most frequently diagnosed form of cancer after skin cancer. Breast cancer is commonly screened for using a diagnostic tool called a *mammogram*. Here are some facts about breast cancer and mammograms: - The United states has about 91 million women between the ages of 18 and 64. Of these, about 232,340 are expected to be diagnosed with invasive breast cancer. - If a woman has invasive breast cancer, the probability that a mammogram shows breast cancer (a *true positive*) is about 78%. (This probability varies significantly with the age of the woman.) - If a woman does *not* have invasive breast cancer, the probability that a mammogram shows breast cancer (a *false positive*) is about 7.67%. This problem has several parts: A) A woman between the ages of 18 and 64 tests positive on a mammogram. Using DrRacket, calculate the probability that she has invasive breast cancer. B) Using the same technique as you used in the previous part, calculate the probability that the woman does *not* have invasive breast cancer. C) Generalize your calculation by defining a function that can be used over and over to calculate probabilities based on the observed results of medical screening tests. Use the design recipe from Figure 4 on page 21.