Conditional Probability

5.4. Conditional Probability

Independent events are unrelated events: knowing that one occurred does not change the probability of the other. For example, a fair die roll coming up 1 is independent of whether the previous roll came up 1.
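The die example can be checked by listing every outcome of two rolls. The following is a minimal sketch, assuming a fair six-sided die and using the standard product rule for independence, P(A and B) = P(A) P(B), which this section does not state explicitly. The helper name `prob` is hypothetical.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes (first roll, second roll) of a fair die.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event given as a predicate on (first, second)."""
    favorable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favorable, len(outcomes))

p_first = prob(lambda o: o[0] == 1)                # first roll is 1
p_second = prob(lambda o: o[1] == 1)               # second roll is 1
p_both = prob(lambda o: o[0] == 1 and o[1] == 1)   # both rolls are 1

# Product rule for independent events: P(A and B) == P(A) * P(B).
print(p_both == p_first * p_second)  # True
```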

Many common events are not independent. Consider and discuss the following events and whether the probability of each is high or low:

  • your bicycle needs repairs,

  • you have a bicycle accident,

  • your bicycle needs repairs given that you had a bicycle accident,

  • you had a bicycle accident, given that your bicycle needs repairs.

Question: Are the bicycle repairs and the bicycle accident independent events?

If they are not independent, we call them dependent, or conditionally related.

Question: Which statement is more logical? Or are they equally logical?

  1. Whether your bike needs repairs (event R) is related to whether you have a bike accident (event A).

  2. Whether you have a bike accident (event A) is related to whether your bike needs repairs (event R).

The conditional probability of an event is based on prior knowledge of conditions that might be related to the event (based on the Wikipedia article on Bayes’ Theorem). We speak of the conditional probability that event Y occurs, given that event X occurred. We denote the probability as P(Y|X).

For example, we may want to know the probability that your bike needs repairs (event R) given that you had a bike accident (event A). We denote the probability as P(R|A).

The quantity P(R|A) is a conditional probability. Here A is considered the prior event. We say we are conditioning on A.

The (conditional) probability of Y given X can be calculated using the mathematical definition

\[P(Y|X) = \frac{P(X \text{ and } Y)}{P(X)}.\]

The probability P(X) of the prior event X is called the prior probability.
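As a small illustration of the definition, the sketch below conditions on a die roll being even. The events chosen here (X: the roll is even, Y: the roll is 2) and the helper `prob` are assumptions made for illustration, not part of the bicycle example.

```python
from fractions import Fraction

outcomes = range(1, 7)  # faces of a fair six-sided die (assumed fair)

def prob(event):
    """Probability of an event given as a predicate on a single face."""
    favorable = sum(1 for face in outcomes if event(face))
    return Fraction(favorable, len(outcomes))

def is_even(face):   # event X
    return face % 2 == 0

def is_two(face):    # event Y
    return face == 2

p_x = prob(is_even)                                   # P(X) = 1/2
p_x_and_y = prob(lambda f: is_even(f) and is_two(f))  # P(X and Y) = 1/6

# Definition: P(Y|X) = P(X and Y) / P(X)
p_y_given_x = p_x_and_y / p_x
print(p_y_given_x)  # 1/3
```

Knowing that the roll is even raises the probability of a 2 from 1/6 to 1/3, so these two events are not independent.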

We can derive Bayes’ Theorem by first writing the conditional probability of X given Y

\[P(X|Y) = \frac{P(X \text{ and } Y)}{P(Y)}.\]

Solving each of the previous two equations for P(X and Y) shows that

\[P(Y|X) P(X) = P(X|Y) P(Y)\]

or

\[P(Y|X) = \frac{P(X|Y) P(Y)}{P(X)}.\]

Bayes’ Theorem expresses one conditional probability in terms of the other. It captures how a probability should change to account for related evidence.
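Bayes' Theorem translates directly into a one-line function. This is a sketch only: the function name `bayes` and the die events used to exercise it (X: the roll is even, Y: the roll is 2) are assumptions chosen for illustration.

```python
from fractions import Fraction

def bayes(p_x_given_y, p_y, p_x):
    """Return P(Y|X) = P(X|Y) * P(Y) / P(X)."""
    if p_x == 0:
        raise ValueError("P(X) must be positive to condition on X")
    return p_x_given_y * p_y / p_x

# Fair die. X: the roll is even. Y: the roll is 2.
p_x_given_y = Fraction(1)   # a roll of 2 is always even
p_y = Fraction(1, 6)
p_x = Fraction(1, 2)

print(bayes(p_x_given_y, p_y, p_x))  # 1/3
```

The result matches what the definition of conditional probability gives directly: (1/6) / (1/2) = 1/3.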

Example: Suppose 231 college teens have a parent with a college degree; 214 college teens have no parent with a college degree; 49 teens who graduated from high school and did not go to college have a parent with a college degree; and 298 non-college teens have no parent with a college degree. (This problem is adapted from OpenIntro Statistics, David M. Diez, Christopher D. Barr, and Mine Çetinkaya-Rundel, Creative Commons, 2016.)

Do the following problems and interpret your answers in complete sentences.

(a) Calculate the probability that a teen drawn from this pool goes to college, given that they have a parent with a college degree.

PC: A parent of the teen has a college degree

PN: Neither parent of the teen has a college degree

C: The teen is in college

NC: The teen graduated from high school and is not in college

There are 792 teens

(a) P(C | PC) = P(C and PC)/P(PC) = (231/792)/(280/792) = 231/280 = 0.825. The probability that a teen goes to college, given that they have a parent with a college degree, is 82.5%.

(b) Calculate the probability that a teen drawn from this pool does not go to college, given that they have a parent with a college degree.

P(NC | PC) = P(NC and PC)/P(PC) = (49/792)/(280/792) = 49/280 = 0.175. The probability that a teen does not go to college, given that they have a parent with a college degree, is 17.5%.

(c) Calculate the probability that a teen drawn from this pool has a parent with a college degree, given that the teen is in college.

P(PC | C) = P(PC and C)/P(C) = (231/792)/((231 + 214)/792) = 231/445 ≈ 0.519. The probability that a teen has a parent with a college degree, given that the teen is in college, is about 51.9%.
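The calculations in parts (a) through (c) can be reproduced from the four counts. The following is a minimal sketch using exact fractions; the dictionary layout and variable names are illustrative choices, not part of the original problem.

```python
from fractions import Fraction

# Counts from the example: (teen status, parent status) -> number of teens.
counts = {
    ("C", "PC"): 231,   # in college, parent has a degree
    ("C", "PN"): 214,   # in college, no parent has a degree
    ("NC", "PC"): 49,   # not in college, parent has a degree
    ("NC", "PN"): 298,  # not in college, no parent has a degree
}

n_pc = counts[("C", "PC")] + counts[("NC", "PC")]  # teens with a degreed parent: 280
n_c = counts[("C", "PC")] + counts[("C", "PN")]    # teens in college: 445

p_c_given_pc = Fraction(counts[("C", "PC")], n_pc)    # part (a)
p_nc_given_pc = Fraction(counts[("NC", "PC")], n_pc)  # part (b)
p_pc_given_c = Fraction(counts[("C", "PC")], n_c)     # part (c)

print(float(p_c_given_pc))            # 0.825
print(float(p_nc_given_pc))           # 0.175
print(round(float(p_pc_given_c), 3))  # 0.519
```

Because the total of 792 cancels in each ratio, the conditional probabilities can be computed from the counts alone.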

(d) Verify that Bayes’ Theorem holds for some data from this problem.

According to Bayes’ Theorem, P(C | PC) = P(PC | C) P(C) / P(PC) = (231/445)(445/792)/(280/792) = 231/280 = 0.825, which agrees with (a).
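The same check can be done numerically. This sketch, with illustrative variable names, computes P(C | PC) once directly from the counts and once through Bayes' Theorem, using exact fractions so that the comparison is not affected by rounding.

```python
from fractions import Fraction

total = 231 + 214 + 49 + 298          # 792 teens in the pool

p_c = Fraction(231 + 214, total)      # P(C): teen is in college
p_pc = Fraction(231 + 49, total)      # P(PC): a parent has a degree
p_pc_given_c = Fraction(231, 231 + 214)

# Direct calculation from the counts, as in part (a).
direct = Fraction(231, 231 + 49)

# Bayes' Theorem: P(C | PC) = P(PC | C) * P(C) / P(PC)
via_bayes = p_pc_given_c * p_c / p_pc

print(via_bayes)            # 33/40
print(via_bayes == direct)  # True
```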

(e) Which of the above quantities do you consider to be of the most practical interest?

I consider (a) the probability that a teen goes to college, given that they have a parent with a college degree, to be of the most practical interest because it is interesting to consider whether the teen’s presence in college is related to whether the teen has a parent with a college degree.

(f) Suppose we are trying to make decisions about whether a teen who graduates from high school is likely to go promptly into college. Calculate the most relevant probabilities, explain their significance in the context of the practical problem at hand, and describe possible limitations of their applicability.

To look at the likelihood that a teen goes to college, we can look at whether they have a parent with a college degree.

Part (a) shows that the probability that the teen goes to college, given that they have a parent with a college degree, is 82.5%.

(Part (b) shows that the probability that the teen does not go to college, given that they have a parent with a college degree, is 17.5%.)

This result seems to suggest that if a parent has a college degree, the teen is likely to go into college. One limitation of this conclusion is that it is based only on the data provided. We would need reasons to believe that the data are representative of all teens who graduated from high school, both those in college and those not in college.

Another point of caution is whether a parent’s education is actually relevant to whether the teen goes promptly to college. For example, suppose that instead of a parent’s education we recorded whether a parent’s favorite color is purple. We might see that 82.5% of teens who have a parent whose favorite color is purple are in college and 17.5% are not. This would simply reflect that going to college is more common than not, rather than indicate that a parental preference for purple is relevant.
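One way to probe relevance with the data at hand is to compare the conditional probability with the overall proportion of teens in college: if the two events were independent, P(C | PC) would equal P(C). The sketch below makes that comparison for the counts in this example; the variable names are illustrative, and the comparison describes only this pool of 792 teens.

```python
from fractions import Fraction

# Counts from the example above.
c_pc, c_pn = 231, 214    # in college: parent has / lacks a degree
nc_pc, nc_pn = 49, 298   # not in college: parent has / lacks a degree
total = c_pc + c_pn + nc_pc + nc_pn

p_c = Fraction(c_pc + c_pn, total)            # baseline P(C)
p_c_given_pc = Fraction(c_pc, c_pc + nc_pc)   # P(C | PC)
p_c_given_pn = Fraction(c_pn, c_pn + nc_pn)   # P(C | PN)

print(round(float(p_c), 3))           # 0.562
print(round(float(p_c_given_pc), 3))  # 0.825
print(round(float(p_c_given_pn), 3))  # 0.418
```

In this pool, P(C | PC) is well above the baseline P(C) and P(C | PN) is below it, which is what we would expect to see if a parent's education is related to college attendance. An irrelevant attribute would leave the conditional probability close to the baseline.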

5.4.1. Exercises

Exercise 1: Suppose 35 spam email messages contain the word “free”; 25 spam messages do not contain the word “free”; 3 non-spam messages contain the word “free”; and 37 non-spam messages do not contain the word “free”. (This problem is adapted from OpenIntro Statistics, David M. Diez, Christopher D. Barr, and Mine Çetinkaya-Rundel, Creative Commons, 2016.)

Do the following problems and interpret your answer in complete sentences.

(a) Calculate the probability that a message is spam, given that it contains the word “free”.

(b) Calculate the probability that a message is not spam, given that it contains the word “free”.

(c) Calculate the probability that a message contains the word “free”, given that it is spam.

(d) Verify that Bayes’ Theorem holds for some data from this problem.

(e) Which of the above quantities do you consider to be of the most practical interest?

(f) Suppose we are trying to make decisions about whether a message is spam. Calculate the most relevant probabilities, explain their significance in the context of the practical problem at hand, and describe possible limitations of their applicability.

Exercise 2: Suppose the disease meningitis causes a stiff neck 70% of the time, the prior probability of any patient having meningitis is 1/50,000, and the prior probability of any patient having a stiff neck is 1/100. (Source: Artificial Intelligence: A Modern Approach, Third Edition, Stuart J. Russell and Peter Norvig, Prentice Hall, Upper Saddle River, 2010.) Use Bayes’ Theorem to calculate the probability that a patient has meningitis, given that they have a stiff neck. Show how you apply Bayes’ Theorem. Interpret the result in a complete sentence.