% COMP 50 Homework 1

Homework Assignment: Introduction to Racket
===========================================

The purpose of this assignment is threefold:

  - Introduce you to the infrastructure of homework: DrRacket (and `provide`?)

  - Create simple functions that will start you thinking in domains
    where we can solve problems later

  - Solve a simple problem by computer

Please solve all the numbered problems 1 through 4.
In addition, you may solve extra-credit problems
**D**, **R1**, **R2**, **RS**, **RG1**, and **RG2**.

Notes on probability
--------------------

If $H$ is a hypothesis and $O$ is an observation, then $P(O|H)$ is the
probability of the observation given the hypothesis, and $P(H|O)$ is
the probability of the hypothesis given the observation.
For any two propositions $A$ and $B$ we have
$$ P(A \land B) = P(A | B) P(B) = P(B | A) P(A) $$
so these probabilities can be related: knowing $P(O|H)$, $P(H)$, and
$P(O)$ is enough to determine $P(H|O)$.

Frequently we will want to use an observation to adjust our opinion of
a hypothesis.
We write $\mathit{Odds}(H|O)$ for the "odds" of hypothesis $H$ given
observation $O$.
The definition is
$$ \mathit{Odds}(H|O) = \frac{P(H|O)}{P(\lnot H|O)} $$
The *logarithm* of the odds is called the *weight of evidence* adduced
by\ $O$ in favor of\ $H$.
Weights of evidence from independent observations are *additive*.

TO COME: NOTES ON PRIOR AND POSTERIOR PROBABILITY

Main problems
-------------

Solve the following problems:

1.  *GPS coordinates*.

    **QUESTION: IS IT BETTER TO MAKE THIS THE LAB PROBLEM AND TO ASK
    STUDENT FOR HOMEWORK TO CONVERT A NONNEGATIVE MAGNITUDE IN RADIANS
    TO DD/MM/SS.SSS FORMAT?**

    When *people* exchange GPS coordinates, they use compass
    directions, degrees, minutes, and seconds.
    But when *computers* work with GPS coordinates,
    they use angles measured in radians.
    In this problem you will write functions to convert human-readable
    GPS coordinates into radians.

    Using the design recipe from Figure 4 on page 21, define functions
    `north`, `south`, `east`, and `west`, each of which takes three
    arguments and returns an angle in radians:

    >`(north` *dd* *mm* *ss.sss*`)` returns a latitude in radians, as does `south`.
    >
    >`(east` *ddd* *mm* *ss.sss*`)` returns a longitude in radians, as does `west`.

    Callers of these functions must respect these contracts:

      - *dd*, *ddd*, and *mm* are integers (which are exact numbers)
      - *ss.sss* may be an exact number or an inexact number
      - $0 \le \mathit{dd} \le 90$ (for north/south coordinates)
      - $0 \le \mathit{ddd} \le 180$ (for east/west coordinates)
      - $0 \le \mathit{mm} < 60$
      - $0 \le \mathit{ss.sss} < 60$

    You must **follow the full design recipe**
    (XXX WHERE IS THE DESIGN RECIPE LOCATED?).

    >*Note*: this problem amounts to a change of number system.
    >The combination of sexagesimal and decimal number systems used
    >here is unusual, but the ability to change number systems is a
    >fundamental one used in many computations. For example, the
    >computer's internal number system is different from the decimal
    >number system that we use to communicate with the computer.

2.  *Plane geometry*.
    Chapter 6 describes the `posn` structure, which represents a
    position using $(x, y)$ coordinates.
    If two `posn`s $A$ and $B$ are distinct, then they define a unique
    line on the plane.
    Your job is to define a function that tells on which side of this
    line a third point $C$ falls.

    SERIES OF PICTURES HERE, INCLUDING THREE DIFFERENT C'S WITH
    POSITIVE, NEGATIVE, AND ZERO DIFFERENCES.
    http://cl.ly/240w1J2G132O

    Using the design recipe from Figure 4 on page 21, define a function
    `sign-of-location` such that

      - If point $C$ is to the left of the line drawn from $A$ to $B$,
        then `(sign-of-location A B C)` returns a negative number.
      - If point $C$ is on the line drawn from $A$ to $B$,
        then `(sign-of-location A B C)` returns zero
        (or if the computation is inexact, a number very close to zero).
      - If point $C$ is to the right of the line drawn from $A$ to $B$,
        then `(sign-of-location A B C)` returns a positive number.

    *Hint*: How would you expect the value `(sign-of-location A B C)`
    to be related to the value `(sign-of-location B A C)`?
    Is there a mathematical law that should relate the two?

    *Hint*: Any line in the plane can be characterized by a triple of
    coefficients $(a, b, c)$; the line is the set of points $(x, y)$
    satisfying the equation $ax + by + c = 0$.
    The coefficients are not unique; they can vary by a multiplicative
    factor $\mu$.
    In particular, if $\mu \ne 0$, then the triple
    $(\mu a, \mu b, \mu c)$ characterizes the same line as $(a, b, c)$.
    The coefficients also obey the invariant that coefficients $a$ and
    $b$ cannot *both* be zero.

3.  *Problem-solving by calculation*.
    In the United States, breast cancer kills more women than any other
    cancer except lung cancer, and it is the second most frequently
    diagnosed form of cancer after skin cancer.
    Breast cancer is commonly screened for using a diagnostic tool
    called a *mammogram*.
    Here are some facts about breast cancer and mammograms:

      - The United States has about 91 million women between the ages
        of 18 and 64. Of these, about 232,340 are expected to be
        diagnosed with invasive breast cancer.

      - If a woman has invasive breast cancer, the probability that a
        mammogram shows breast cancer (a *true positive*) is about 78%.
        (This probability varies significantly with the age of the
        woman.)

      - If a woman does *not* have invasive breast cancer, the
        probability that a mammogram shows breast cancer (a *false
        positive*) is about 7.67%.

    This problem has several parts:

    A) A woman between the ages of 18 and 64 tests positive on a
       mammogram. Using DrRacket, calculate the probability that she
       has invasive breast cancer.
    B) Using the same technique as you used in the previous part,
       calculate the probability that the woman does *not* have
       invasive breast cancer.
    C) Generalize your calculation by defining a function that can be
       used over and over to calculate probabilities based on the
       observed results of medical screening tests.
       Use the design recipe from Figure 4 on page 21.

Extra credit
------------

Extend your plane geometry to include a distance calculation:

  - **D**: Write a function `directed-distance` such that
    `(directed-distance A B C)` gives the distance and direction of
    point $C$ from the line defined by points $A$ and $B$.

Professor Ramsey has a collection of funny dice.
His collection includes a 4-sided die, a 6-sided die, an 8-sided die,
a 10-sided die, and a 20-sided die.
These are abbreviated d4, d6, d8, and so on.
A d$N$ die has $N$ sides numbered from 1 to $N$.

If Professor Ramsey rolls the d4 and d20 together (the "small group")
and adds up the numbers, he gets a total between 2 and 24.
If he rolls the d6, d8, and d10 together (the "big group") and adds up
the numbers, he gets a total between 3 and 24.

Solve the following problems:

  - **R1**: Professor Ramsey rolls a 6.
    What is the weight of evidence in favor of the proposition that
    Professor Ramsey is rolling the big group?
    What is the weight of evidence in favor of the proposition that
    Professor Ramsey is rolling the small group?

  - **R2**: Via telepathic brain wave, you receive the information
    that Professor Ramsey dislikes the d4, and so when rolling dice, he
    chooses the big group twice as often as the small group.
    You are told that he rolled a 12.
    What are the chances he chose the big group for that roll?

Super extra credit
------------------

Professor Ramsey chooses one group of dice or the other and makes a
long sequence of rolls, discarding all numbers except those between 5
and 20.
Here are two examples of such sequences:

  - 13 20 15 13 11 17 11 7 12 6 13

  - 8 18 13 11 10 19 17 18 8 15 20

Answer this question:

  - **RS**: Supposing each group was chosen with equal probability,
    which group was most likely used to roll the first sequence?
    Which group was most likely used to roll the second sequence?

Now, Professor Ramsey offers to play the following game:

 1. He will choose but not reveal one of the two groups of dice.

 2. As many times as you wish, you may pay him a quarter.
    On receiving the money, he will roll the dice until they produce a
    number between 5 and 20, and he will tell you that number.

 3. At any point you may ask the rolling to stop and you may guess
    which group of dice Professor Ramsey is using.
    If you guess correctly, he will pay you $20.

Professor Ramsey charges $10 for the privilege of playing this game.
If you guess without paying for any die rolls, you win half of the
time, and your average return is $10, so in the long run you could
expect to break even.

Please answer these questions about the game:

  - **RG1**: Is the 25-cent price for a die roll in step 2 a fair
    price? Or could Professor Ramsey be charging too much (or too
    little)?

  - **RG2**: How many die rolls would you have to pay for to increase
    your chances of winning the game to 90%?

*Hint*: Assume that the professor is using the big group of dice, and
compute the *expected weight of evidence* for a single die roll.
Then do the same for the small group of dice.
(The expected weight of evidence is the average amount of evidence
accumulated per roll: the sum, over every possible roll, of the weight
of evidence contributed by that roll, weighted by the probability of
that roll.)
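
The definition in the hint can be written as a formula.
What follows is only a sketch in notation that the assignment itself
does not fix: $r$ stands for a single reported roll, $H$ for the
hypothesis that is assumed to be true, and $E[W \mid H]$ is a name
introduced here for the expected weight of evidence in favor of $H$.
$$ E[W \mid H] = \sum_{r} P(r \mid H) \, \log \mathit{Odds}(H \mid r) $$
If the two groups are assumed to be equally likely before any roll is
seen, so that $P(H) = P(\lnot H)$, the odds reduce to a ratio of the
two roll probabilities:
$$ \mathit{Odds}(H \mid r) = \frac{P(H \mid r)}{P(\lnot H \mid r)} = \frac{P(r \mid H)\,P(H)}{P(r \mid \lnot H)\,P(\lnot H)} = \frac{P(r \mid H)}{P(r \mid \lnot H)} $$
In the game, $r$ ranges over the totals from 5 to 20, and each
probability is the probability of that total *given* that the total
falls in that range, because rolls outside the range are never
reported.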