Conditional probability is one of the most important concepts in statistics: the entire field of Bayesian statistics is built on it. However, it is also one of the most misleading concepts, one that many people misuse in real life. The prosecutor’s fallacy is a good example of the severe consequences that a wrong interpretation of conditional probability can lead to.

We begin with a simple example: we can be almost certain that if it is raining (hypothesis A), it will be cloudy (hypothesis B). In other words, the probability P(B|A) is nearly 100%. The question now is: if it is cloudy, will it rain soon? Here we are looking for P(A|B), and we can easily see that this probability is much lower than the first one. Lesson learned: P(B|A) is not the same thing as P(A|B).
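The asymmetry is easy to check numerically. The sketch below uses toy numbers that are pure assumptions for illustration (a 10% chance of rain, a 40% chance of clouds, and clouds almost certain when it rains):

```python
# Toy numbers, assumed for illustration only:
p_rain = 0.10                 # P(A): it is raining
p_cloudy = 0.40               # P(B): it is cloudy
p_cloudy_given_rain = 0.95    # P(B|A): almost always cloudy when raining

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_rain_given_cloudy = p_cloudy_given_rain * p_rain / p_cloudy
print(p_rain_given_cloudy)  # 0.2375 -- far below the 0.95 in the other direction
```

Even with clouds nearly guaranteed during rain, clouds alone only make rain about 24% likely under these numbers.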

That conclusion looks ridiculously simple; any student who has taken STAT101 knows it. However, in a more complicated situation such as the one below, conditional probability can easily be misinterpreted.

Suppose that police pick up a suspect and match his or her DNA to evidence collected at a crime scene. Suppose that the likelihood of a match, purely by chance, is only 1 in 10,000. Is this also the chance that he is innocent? It’s easy to make this leap, but you shouldn’t.

Here’s why. Suppose the city in which the person lives has 500,000 adult inhabitants. Given the 1 in 10,000 likelihood of a random DNA match, you’d expect that about 50 people in the city would have DNA that also matches the sample. So the suspect is only 1 of 50 people who could have been at the crime scene. Based on the DNA evidence only, the person is almost certainly innocent, not certainly guilty.
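The back-of-the-envelope count above can be reproduced directly with the numbers from the text:

```python
population = 500_000             # adult inhabitants of the city
p_match_by_chance = 1 / 10_000   # chance a random person's DNA matches

# Expected number of people in the city whose DNA matches the sample
expected_matches = population * p_match_by_chance  # about 50

# The suspect is just one of those ~50 matching people, so based on the
# DNA match alone the chance of guilt is roughly 1 in 50, not 9,999 in 10,000.
p_guilty_given_match = 1 / expected_matches
print(expected_matches, p_guilty_given_match)  # 50.0 0.02
```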

The generic insight is that the probability of a hypothesis given the available evidence is not equal to the probability of the evidence given that the hypothesis is true.

In Bayesian terms, the likelihood is P(B|A), where B is the event that the person’s DNA matches and A is the event that the person is innocent; P(A) is the prior and P(B) is the evidence. What we are looking for is the posterior probability P(A|B).
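These quantities can be plugged into Bayes’ theorem. The sketch below uses the numbers from the example; the extra assumption P(B|not A) = 1 (the actual culprit’s DNA always matches) is ours, added so that the evidence term P(B) can be computed:

```python
# Prior P(A): everyone in the city except the one unknown culprit is innocent
p_innocent = 499_999 / 500_000
# Likelihood P(B|A): an innocent person's DNA matches purely by chance
p_match_given_innocent = 1 / 10_000
# Assumption (ours): the culprit's DNA matches with certainty
p_match_given_guilty = 1.0

# Evidence: P(B) = P(B|A) P(A) + P(B|not A) P(not A)
p_match = (p_match_given_innocent * p_innocent
           + p_match_given_guilty * (1 - p_innocent))

# Posterior: P(A|B) = P(B|A) P(A) / P(B)
p_innocent_given_match = p_match_given_innocent * p_innocent / p_match
print(round(p_innocent_given_match, 4))  # 0.9804
```

The posterior probability of innocence given a match is about 98%, which agrees with the “1 of 50” count: roughly 49 of the 50 matching people are innocent.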

To see real-life examples where this has actually happened, read about the Sally Clark case in Britain (1998), the O.J. Simpson case (1995), and People v. Collins (1968).

This fallacy appears in medicine too, especially in drug and disease testing, where many people cannot distinguish between the probability that the test is positive given that the person is really sick, P(positive|sick), and the probability that the person is sick given that the test is positive, P(sick|positive). For example, we often hear that “the detection rate of the test is 90%”, which means that P(positive|sick) = 90%. However, what we really care about is P(sick|positive), which reflects the reliability of the test: in real life we do not know whether the person is sick, and we need to know whether we can trust a positive result. We can compute it using Bayes’ theorem, and the result is usually much lower than 90%.
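A minimal sketch makes the gap concrete. The 90% sensitivity comes from the text; the prevalence (1%) and false-positive rate (5%) are assumptions of ours, chosen only to illustrate the effect:

```python
sensitivity = 0.90       # P(positive | sick), the "90%" from the text
prevalence = 0.01        # P(sick) -- assumed
false_positive = 0.05    # P(positive | not sick) -- assumed

# Total probability of a positive test:
# P(positive) = P(pos|sick) P(sick) + P(pos|not sick) P(not sick)
p_positive = sensitivity * prevalence + false_positive * (1 - prevalence)

# Bayes' theorem: P(sick | positive) = P(positive | sick) P(sick) / P(positive)
p_sick_given_positive = sensitivity * prevalence / p_positive
print(round(p_sick_given_positive, 3))  # 0.154
```

Under these assumed numbers, a positive result means only about a 15% chance of actually being sick, because false positives among the large healthy population swamp the true positives.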