In today’s CS109 lecture, Joe Blitzstein told a story about going to a biology conference and asking everybody there to give him a definition of the p-value. The result was “weird” to him: no one could give a good answer about what a p-value is. This story struck me deeply because, even though I have read A LOT of articles on p-values this year, I always have this insecure feeling that I don’t truly understand what they mean. So here I am, trying to write down my definition:

The p-value is the probability of observing a result as extreme as, or more extreme than, the one actually observed, given that the null hypothesis is true.
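To make the definition concrete for myself, here is a minimal sketch using a made-up example that is not from the lecture: we toss a coin 100 times and see 60 heads, and the null hypothesis is that the coin is fair. The function names and the numbers are my own assumptions. The one-sided p-value is just the probability of getting 60 or more heads from a fair coin.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def one_sided_p_value(heads, n, p_null=0.5):
    """P(X >= heads) when X ~ Binomial(n, p_null), i.e. under the null."""
    return sum(binom_pmf(k, n, p_null) for k in range(heads, n + 1))


# Hypothetical observation: 60 heads in 100 tosses of a supposedly fair coin.
p_value = one_sided_p_value(60, 100)
print(f"one-sided p-value: {p_value:.4f}")
```

The sum runs over every outcome that is at least as extreme as the observed one, which is exactly the "as extreme as, or more extreme than" part of the definition.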

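In other words, the p-value tells us how surprising the observed result would be if only luck were at work, that is, if the null hypothesis were true. So if we obtain a small p-value (usually < 5%), the result would be rare enough under the null hypothesis that we reject it and conclude that there is probably a real difference, not just a lucky fluke. Note that the p-value is not the probability that the result is due to luck; it is computed assuming the null hypothesis is true.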

Something that I need to emphasize: the p-value is not an indication of practical significance. Even with a tiny p-value, we still cannot conclude whether the difference between H0 and H1 is actually large. This is where the effect size comes into play. It’s a quantitative measure of the real strength of the phenomenon. We can compute it directly, or use the confidence interval to infer it (or can we? I’m not sure about this point…). Anyway, the important point is that nowadays more and more journals demand evidence of practical significance alongside statistical significance (the p-value).
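Here is a small sketch of the gap between statistical and practical significance, using hypothetical summary statistics that I made up: two groups of 200,000 people each, with means of 100.0 and 100.2 and a known standard deviation of 15. It assumes a simple two-sample z-test with equal group sizes, and it reports Cohen's d (the difference divided by the standard deviation) as one common effect size, together with a 95% confidence interval for the difference.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def two_sample_z(mean_a, mean_b, sd, n):
    """Two-sided z-test for two groups of size n with a known, shared sd.

    Returns (p_value, cohens_d, ci_95) for the difference mean_b - mean_a.
    """
    diff = mean_b - mean_a
    se = sd * sqrt(2 / n)  # standard error of the difference in means
    z = diff / se
    p_value = 2 * (1 - STD_NORMAL.cdf(abs(z)))
    cohens_d = diff / sd  # effect size in units of standard deviations
    z_crit = STD_NORMAL.inv_cdf(0.975)  # about 1.96
    ci_95 = (diff - z_crit * se, diff + z_crit * se)
    return p_value, cohens_d, ci_95


# Hypothetical numbers: a 0.2-point difference on a scale with sd = 15.
p_value, cohens_d, ci_95 = two_sample_z(100.0, 100.2, sd=15.0, n=200_000)
print(f"p-value:   {p_value:.6f}")
print(f"Cohen's d: {cohens_d:.4f}")
print(f"95% CI for the difference: ({ci_95[0]:.3f}, {ci_95[1]:.3f})")
```

With these numbers the p-value is far below 0.05, yet Cohen's d is tiny and the whole confidence interval sits at a fraction of a point. So the confidence interval does seem to help here: it gives a range of plausible values for the size of the difference, which the p-value alone does not.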

For a long time, many researchers (particularly in social science fields such as psychology) have misinterpreted the p-value and treated it as the ultimate indication of the viability of their work: a p-value of 0.051 means trash, whereas a p-value of 0.0049 totally changes the game. Some people try to manipulate their p-value by repeating the experiment or adding observations until they get a p-value below the threshold, which is called p-value fishing (also known as p-hacking). Moreover, it can be shown that with a large enough sample, any nonzero difference, however tiny, can produce a p-value less than 0.05, but that doesn’t mean the observed difference has any practical value.
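To see why adding observations until the p-value looks good is a problem, here is a rough simulation under assumptions of my own choosing: the null hypothesis is really true (the data are standard normal with mean 0), the researcher runs a z-test after every 10 observations up to 200, and declares success as soon as any look gives p < 0.05. The seed, the batch size, and the number of simulated experiments are all arbitrary.

```python
import random
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def p_value_mean_zero(total, n):
    """Two-sided z-test p-value for H0: mean = 0, with known sd = 1."""
    z = total / sqrt(n)
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def run_experiment(rng, max_n=200, look_every=10, alpha=0.05):
    """Simulate one experiment where H0 is true.

    Returns (significant_at_any_look, significant_at_final_n). A researcher
    who stops at the first significant look would report a finding exactly
    when the first value is True.
    """
    total = 0.0
    significant_at_any_look = False
    for n in range(1, max_n + 1):
        total += rng.gauss(0.0, 1.0)  # data generated under the null
        if n % look_every == 0 and p_value_mean_zero(total, n) < alpha:
            significant_at_any_look = True
    significant_at_final_n = p_value_mean_zero(total, max_n) < alpha
    return significant_at_any_look, significant_at_final_n


rng = random.Random(0)  # fixed seed so the run is repeatable
results = [run_experiment(rng) for _ in range(2000)]
peeking_rate = sum(r[0] for r in results) / len(results)
fixed_rate = sum(r[1] for r in results) / len(results)
print(f"false positives with peeking:      {peeking_rate:.3f}")
print(f"false positives with fixed n=200:  {fixed_rate:.3f}")
```

Testing only once at the planned sample size keeps the false positive rate near the nominal 5%, while checking repeatedly and stopping at the first "significant" look pushes it well above that, even though there is no real effect at all.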