
Lec4

April 18, 2005

Relations in categorical data Conditional probability


Outline: Conditional probability; Independence of events; The Bayes rule; Tree diagram; Simpson's paradox.

Conditional Probability
If P(B) ≠ 0, the conditional probability of event A given that B has occurred, denoted by P(A | B), is defined by

    P(A | B) = P(A and B) / P(B)

We can rewrite the formula:

    P(A and B) = P(A | B) P(B) = P(B | A) P(A)

Example
A focus group of 10 consumers has been selected to view a new TV commercial. After the viewing, 2 members of the focus group will be randomly selected and asked to answer detailed questions about the commercial. The group contains 4 men and 6 women.

    P(first chosen person is female) = ?
    P(second person is female | first person is female) = ?
    P(both people are female) = ?
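The three questions in the focus-group example can be worked out with the multiplication rule P(A and B) = P(B | A) P(A). The sketch below is one way to do the arithmetic with exact fractions; the variable names are illustrative and not part of the lecture.

```python
from fractions import Fraction

women, total = 6, 10  # group composition from the example: 4 men, 6 women

# P(first chosen person is female)
p_first_f = Fraction(women, total)

# P(second is female | first is female): one woman has already been removed
p_second_f_given_first_f = Fraction(women - 1, total - 1)

# Multiplication rule: P(both female) = P(second F | first F) * P(first F)
p_both_f = p_second_f_given_first_f * p_first_f

print(p_first_f, p_second_f_given_first_f, p_both_f)  # 3/5 5/9 1/3
```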

Independence
Events A and B are independent if

    P(A | B) = P(A)

or equivalently

    P(B | A) = P(B)
    P(A and B) = P(A) P(B)

This means that the probability that A occurs is unchanged by information about whether B has occurred (and vice versa).

If A1, ..., An are independent,

    P(A1 and A2 and ... and An) = P(A1) P(A2) ... P(An)

Note: When A and B are disjoint,

    P(A or B) = P(A) + P(B)
    P(A and B) = P(∅) = 0

Example
John and Paul go duck hunting together. Suppose that John hits a target with probability 0.3 and Paul, independently, with probability 0.1. They both fire one shot at a duck.
Given that exactly one shot hits the duck, what is the conditional probability that it is John's shot? That it is Paul's?
Given that the duck is hit, what is the conditional probability that John hit it? That Paul hit it?
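One way to answer the duck-hunting questions is to use independence to get the probability of each joint outcome and then apply the definition of conditional probability. This is a minimal sketch; the variable names are illustrative.

```python
p_john, p_paul = 0.3, 0.1  # hit probabilities from the example

# Independence lets us multiply the probabilities of the individual outcomes.
p_only_john = p_john * (1 - p_paul)   # John hits, Paul misses
p_only_paul = (1 - p_john) * p_paul   # Paul hits, John misses
p_exactly_one = p_only_john + p_only_paul

# The duck is hit unless both shots miss.
p_hit = 1 - (1 - p_john) * (1 - p_paul)

# Conditioning on "exactly one shot hits"
john_given_one = p_only_john / p_exactly_one
paul_given_one = p_only_paul / p_exactly_one

# Conditioning on "the duck is hit"; "John hit it" is contained in that event
john_given_hit = p_john / p_hit
paul_given_hit = p_paul / p_hit

print(f"{john_given_one:.3f} {paul_given_one:.3f}")  # 0.794 0.206
print(f"{john_given_hit:.3f} {paul_given_hit:.3f}")  # 0.811 0.270
```

The last two values do not sum to 1, because both shots can hit the duck at the same time.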

Bayes's Rule
For any two events A and B,

    A = (A and B) or (A and B^c) = (A ∩ B) ∪ (A ∩ B^c)

Therefore,

    P(A) = P(A and B) + P(A and B^c)
         = P(A | B) P(B) + P(A | B^c) P(B^c)

If we know the probabilities P(A | B), P(A | B^c) and P(B), we can reverse the conditioning using Bayes's rule:

    P(B | A) = P(A | B) P(B) / P(A)
             = P(A | B) P(B) / [ P(A | B) P(B) + P(A | B^c) P(B^c) ]

Example
A blood test is 99% effective in detecting a certain disease when the disease is present. However, the test also yields a false-positive result for 2% of healthy patients tested. Suppose 0.1% of the population has the disease. What is the probability that a randomly tested individual actually has the disease given that his or her test result is positive?

    D = an individual has the disease
    T = test result is positive
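The blood-test question asks for P(D | T), which follows from Bayes's rule with P(T | D) = 0.99, P(T | D^c) = 0.02 and P(D) = 0.001. The helper below is a hypothetical sketch of that calculation; the function name `posterior` and its parameter names are assumptions made for illustration.

```python
def posterior(prior, p_pos_given_d, p_pos_given_not_d):
    """Return P(D | T) from P(D), P(T | D) and P(T | D^c) via Bayes's rule."""
    # Law of total probability: P(T) = P(T | D) P(D) + P(T | D^c) P(D^c)
    p_pos = p_pos_given_d * prior + p_pos_given_not_d * (1 - prior)
    return p_pos_given_d * prior / p_pos

# Numbers from the example: 0.1% prevalence, 99% detection, 2% false positives
p_d_given_t = posterior(prior=0.001, p_pos_given_d=0.99, p_pos_given_not_d=0.02)
print(f"{p_d_given_t:.4f}")  # 0.0472
```

Even with a positive result, the probability of disease is under 5%, because the disease is rare and false positives among the healthy majority outnumber true positives.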

The Tree Diagram


5% of male high school athletes go on to play at college level. Of these, 1.7% enter major league professional sports. Given that he doesn't compete in college, the probability that a high school athlete reaches professional play is 0.01%.

    A = {competes in college}
    B = {competes professionally}

What is the probability that a high school athlete competes in college and enters major league professional sports?
What is the probability that a high school athlete will go on to professional sports?
What proportion of professional athletes competed in college?
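The three questions about the athletes correspond to multiplying along one branch of the tree, summing over the branches that end in B, and then applying Bayes's rule. A minimal sketch of the arithmetic, with illustrative variable names:

```python
p_a = 0.05                # P(A): competes in college
p_b_given_a = 0.017       # P(B | A): turns professional after college play
p_b_given_not_a = 0.0001  # P(B | A^c): turns professional without college play

# Multiply along the branch A -> B
p_a_and_b = p_b_given_a * p_a

# Add the two branches that end in B
p_b = p_a_and_b + p_b_given_not_a * (1 - p_a)

# Bayes's rule: proportion of professionals who competed in college
p_a_given_b = p_a_and_b / p_b

print(f"{p_a_and_b:.5f} {p_b:.6f} {p_a_given_b:.3f}")  # 0.00085 0.000945 0.899
```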

Relationships in Categorical Data
A two-way table is a way to display the data from two categorical variables.

Example: Surgery death rates in two hospitals

                Hospital A   Hospital B
    died                63           16
    survived          2037          784
    total             2100          800

Hospital A: 3% of surgery patients die
Hospital B: 2% of surgery patients die
Question: Which hospital is safer?

Example (cont.)
Patients are classified as being in either poor or good condition before surgery.

                  good cond.        poor cond.
                   A      B          A      B
    died           6      8         57      8
    survived     594    592       1443    192
    total        600    600       1500    200

For patients in good condition:
    hospital A: 1% of surgery patients die
    hospital B: 1.3% of surgery patients die
For patients in poor condition:
    hospital A: 3.8% of surgery patients die
    hospital B: 4% of surgery patients die

An association or comparison that holds for all of several groups can reverse direction when the data are combined to form a single group. This reversal is called Simpson's paradox.
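The reversal in the hospital example can be checked directly from the counts. The sketch below recomputes the death rate within each condition group and for the combined data; the dictionary layout and names are illustrative choices, not part of the lecture.

```python
# (died, total) counts from the hospital tables, by patient condition
counts = {
    "good": {"A": (6, 600), "B": (8, 600)},
    "poor": {"A": (57, 1500), "B": (8, 200)},
}

# Death rate within each condition group
by_group = {
    cond: {h: died / total for h, (died, total) in hospitals.items()}
    for cond, hospitals in counts.items()
}

# Death rate after pooling both condition groups for each hospital
combined = {}
for h in ("A", "B"):
    died = sum(counts[cond][h][0] for cond in counts)
    total = sum(counts[cond][h][1] for cond in counts)
    combined[h] = died / total

for cond, rates in by_group.items():
    print(cond, {h: f"{r:.1%}" for h, r in rates.items()})
print("combined", {h: f"{r:.1%}" for h, r in combined.items()})
```

Hospital A has the lower death rate in both condition groups but the higher rate overall, because most of its patients (1500 of 2100) arrive in poor condition.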
