This morning I came across an article on normality testing, a problem I thought about a lot while doing a project for my Statistics class last year. I analyzed airplane accidents; more precisely, I compared the number of fatalities between two periods, 1994-2001 and 2001-2009 (I was trying to find a difference in safety before and after September 11). The distribution is shown below:

As we can see, the density plot is heavily skewed, so normality is out of the question. With a small sample, this means we may not be able to use parametric tests that assume normality (such as Student's t-test). With a large sample, thanks to the Central Limit Theorem, we can ignore this condition, because the sample mean is approximately normal even when the data themselves are not. So the question that popped into my head was: is normality testing really helpful, especially for large samples?
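A small simulation can make the large-sample argument concrete. The sketch below does not use the accident data; it assumes an Exponential(1) population (skewness 2) as a stand-in for strongly skewed data and checks how often a nominal 95% interval, mean ± z·s/√n, actually contains the true mean. It uses the normal critical value instead of Student's t, which is itself a large-sample approximation, so coverage at very small n is expected to fall short of 95%. The name `ci_coverage` is just an illustrative helper.

```python
import random
import statistics

# Two-sided 95% critical value from the standard normal (about 1.96).
Z_975 = statistics.NormalDist().inv_cdf(0.975)


def ci_coverage(n, reps=2000, seed=2):
    """Fraction of nominal 95% intervals, mean +/- z * s / sqrt(n), that contain
    the true mean when sampling n values from Exponential(1) (true mean = 1.0).
    Exponential(1) is an assumed stand-in for strongly right-skewed data."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        mean = statistics.fmean(sample)
        half_width = Z_975 * statistics.stdev(sample) / n ** 0.5
        if mean - half_width <= 1.0 <= mean + half_width:
            hits += 1
    return hits / reps


for n in (5, 30, 300):
    print(f"n = {n:>3}: coverage = {ci_coverage(n):.3f}")
```

If the Central Limit Theorem is doing its job, coverage should approach 0.95 as n grows, even though every sample comes from a clearly non-normal population.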

The article (“Is normality testing ‘essentially useless’?”) is a question on CrossValidated. Its author quoted a colleague’s argument:

We usually apply normality tests to the results of processes that, under the null, generate random variables that are only asymptotically or nearly normal (with the ‘asymptotically’ part dependent on some quantity which we cannot make large); In the era of cheap memory, big data, and fast processors, normality tests should always reject the null of normal distribution for large (though not insanely large) samples. And so, perversely, normality tests should only be used for small samples, when they presumably have lower power and less control over type I rate.
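The colleague’s point about large samples can be reproduced with a short simulation. This sketch uses the Jarque-Bera test, chosen here only because its asymptotic p-value (a chi-square with 2 degrees of freedom) has the closed form exp(-JB/2); it is not necessarily the test the quoted author had in mind. The population is an assumed “nearly normal” one, a lognormal with σ = 0.1, whose skewness is only about 0.3. The function names are illustrative.

```python
import math
import random
import statistics


def jarque_bera_pvalue(sample):
    """Asymptotic Jarque-Bera p-value: JB = n/6 * (S^2 + (K - 3)^2 / 4),
    compared with a chi-square distribution with 2 degrees of freedom."""
    n = len(sample)
    mean = statistics.fmean(sample)
    m2 = statistics.fmean([(x - mean) ** 2 for x in sample])
    m3 = statistics.fmean([(x - mean) ** 3 for x in sample])
    m4 = statistics.fmean([(x - mean) ** 4 for x in sample])
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    jb = n / 6.0 * (skew ** 2 + excess_kurtosis ** 2 / 4.0)
    # The chi-square(2) survival function is exactly exp(-x / 2).
    return math.exp(-jb / 2.0)


def rejection_rate(n, reps=200, alpha=0.05, seed=1):
    """Share of samples of size n from a lognormal(0, 0.1) population (assumed
    "nearly normal", skewness about 0.3) that the test rejects at level alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        sample = [rng.lognormvariate(0.0, 0.1) for _ in range(n)]
        if jarque_bera_pvalue(sample) < alpha:
            rejections += 1
    return rejections / reps


for n in (20, 200, 2000):
    print(f"n = {n:>4}: rejection rate = {rejection_rate(n):.2f}")
```

The rejection rate should climb toward 1 as n grows, although the population is exactly as close to normal at n = 2000 as it is at n = 20.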

This is also my view of normality testing, and most of the answers back up this argument. One answer explained the true purpose of a normality test in more detail:

The question normality tests answer: Is there convincing evidence of any deviation from the Gaussian ideal? With moderately large real data sets, the answer is almost always yes.

The question scientists often expect the normality test to answer: Do the data deviate enough from the Gaussian ideal to “forbid” use of a test that assumes a Gaussian distribution? Scientists often want the normality test to be the referee that decides when to abandon conventional (ANOVA, etc.) tests and instead analyze transformed data or use a rank-based nonparametric test or a resampling or bootstrap approach. For this purpose, normality tests are not very useful.
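Of the alternatives the answer lists, a resampling approach is easy to sketch. Below is a minimal Monte Carlo permutation test using the difference in means between two groups, the kind of comparison I made between the two periods. Strictly, it tests whether both groups could come from the same distribution, and it builds the null distribution by reshuffling group labels, so it makes no normality assumption. The data in the usage lines are synthetic exponential draws, not the real accident figures, and `permutation_test_mean_diff` is a hypothetical helper name.

```python
import random
import statistics


def permutation_test_mean_diff(a, b, n_resamples=10_000, seed=0):
    """Two-sided Monte Carlo permutation test using the difference in means.

    Under the null hypothesis that both groups come from the same distribution,
    group labels are exchangeable, so the null distribution is built by
    reshuffling the pooled values. No normality assumption is needed."""
    rng = random.Random(seed)
    observed = statistics.fmean(a) - statistics.fmean(b)
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Counting the observed split as one resample keeps the p-value above zero.
    p_value = (extreme + 1) / (n_resamples + 1)
    return observed, p_value


# Synthetic, right-skewed data (NOT real accident figures).
rng = random.Random(42)
period_1 = [rng.expovariate(1 / 40) for _ in range(60)]
period_2 = [rng.expovariate(1 / 40) for _ in range(60)]
diff, p = permutation_test_mean_diff(period_1, period_2)
print(f"difference in means = {diff:.2f}, p = {p:.3f}")
```

A rank-based test such as Mann-Whitney would be another option from the same list; the permutation version is shown because it keeps the mean, the quantity a t-test would compare.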

So there it is, the ugly truth: if we want to know whether a parametric test that assumes normality can be applied, normality testing is not the way to go. The question now is: what should we do instead to answer it? Some answers suggested a “look and see” approach: inspect the normality of the sample visually (for example with a Q-Q plot). However, in some cases this can be difficult and time-consuming…
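For the visual route, a common tool is a normal Q-Q plot: sorted observations plotted against standard normal quantiles, where points close to a straight line suggest approximate normality. The sketch below only computes the plot coordinates with the standard library (the plotting positions (i - 0.5)/n are one convention among several) and summarizes them by their correlation, a rough numeric aid in the spirit of a probability-plot correlation coefficient rather than a replacement for looking at the plot. Drawing the plot would need a plotting library, which is not shown, and both samples are synthetic.

```python
import random
import statistics


def qq_points(sample):
    """Return (theoretical, observed) pairs for a normal Q-Q plot.
    Theoretical quantiles use plotting positions (i - 0.5) / n."""
    n = len(sample)
    observed = sorted(sample)
    std_normal = statistics.NormalDist()
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, observed))


def qq_correlation(points):
    """Pearson correlation of the Q-Q coordinates; values near 1 mean a straight line."""
    xs, ys = zip(*points)
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(3)
normal_sample = [rng.gauss(0.0, 1.0) for _ in range(200)]
skewed_sample = [rng.expovariate(1.0) for _ in range(200)]
for name, sample in (("normal", normal_sample), ("skewed", skewed_sample)):
    print(name, round(qq_correlation(qq_points(sample)), 3))
```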