Today, while reviewing the regression techniques in the ESL book (The Elements of Statistical Learning; by the way, this book is pure gold, I highly recommend it!), I stumbled upon the chi-squared distribution. Concretely, the authors show that:
$$(N - p - 1)\,\hat{\sigma}^2 \sim \sigma^2 \chi^2_{N-p-1},$$

a chi-squared distribution with $N - p - 1$ degrees of freedom, with $\hat{\sigma}^2$ an unbiased estimate of $\sigma^2$.
They use these distributional properties to form hypothesis tests and confidence intervals for the parameters $\beta_j$.
It has been a very long time since I last saw the chi-squared distribution, and of course my understanding of it has become quite rusty. So I think this is a good chance to revisit this important distribution.
Before talking about the chi-squared distribution, we need to review some notions. First, we have the gamma function:
For $\alpha > 0$, the gamma function $\Gamma(\alpha)$ is defined by:

$$\Gamma(\alpha) = \int_0^\infty x^{\alpha - 1} e^{-x} \, dx$$
and for any positive integer $n$, we have: $\Gamma(n) = (n - 1)!$.
Then $\Gamma(1/2) = \sqrt{\pi}$. The definition of the gamma function implies that if we set

$$f(x; \alpha) = \frac{x^{\alpha - 1} e^{-x}}{\Gamma(\alpha)}, \quad x \ge 0$$

then $f(x; \alpha) \ge 0$ and $\int_0^\infty f(x; \alpha) \, dx = \Gamma(\alpha) / \Gamma(\alpha) = 1$.
Thus $f(x; \alpha)$ satisfies the two basic properties of a probability density function.
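These properties are easy to check numerically. Below is a quick sanity check using Python's built-in `math.gamma` (the integral is approximated with a crude Riemann sum, so it only gets close to 1):

```python
import math

# Gamma(n) = (n - 1)! for positive integers n
for n in range(1, 8):
    assert math.isclose(math.gamma(n), math.factorial(n - 1))

# Gamma(1/2) = sqrt(pi)
assert math.isclose(math.gamma(0.5), math.sqrt(math.pi))

# f(x; a) = x^(a-1) e^(-x) / Gamma(a) integrates to (approximately) 1
a = 2.5
step = 0.001
xs = [i * step for i in range(1, 50001)]  # grid on (0, 50]
integral = sum(x ** (a - 1) * math.exp(-x) / math.gamma(a) for x in xs) * step
print(round(integral, 3))  # close to 1
```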
We will now use this function to define the Gamma distribution and then the Chi-squared distribution.
A continuous random variable $X$ is said to have a Gamma distribution if the pdf of $X$ is:

$$f(x; \alpha, \beta) = \begin{cases} \dfrac{1}{\beta^\alpha \Gamma(\alpha)} x^{\alpha - 1} e^{-x/\beta} & x \ge 0 \\ 0 & \text{otherwise} \end{cases}$$
where the parameters $\alpha$ and $\beta$ are positive. The standard Gamma distribution has $\beta = 1$, so the pdf of a standard gamma rv is the $f(x; \alpha)$ given above.
The Gamma distribution is widely used to model the extent of degradation such as corrosion, creep, wear or survival time.
The Gamma distribution is a family of distributions. Both the Exponential distribution and the Chi-squared distribution are special cases of the Gamma.
As we can see, the gamma distribution takes two arguments. The first, $\alpha$, defines the shape of the density. When $\alpha = 1$, the gamma reduces exactly to the exponential distribution with mean $\beta$; when $\alpha = \nu/2$ and $\beta = 2$, it is the chi-squared distribution with $\nu$ degrees of freedom.
Now we will define the chi-squared distribution.
Let $\nu$ be a positive integer. Then a random variable $X$ is said to have a chi-squared distribution with parameter $\nu$ if the pdf of $X$ is the gamma density with $\alpha = \nu/2$ and $\beta = 2$. The pdf of a chi-squared rv is thus:

$$f(x; \nu) = \begin{cases} \dfrac{1}{2^{\nu/2} \Gamma(\nu/2)} x^{(\nu/2) - 1} e^{-x/2} & x \ge 0 \\ 0 & x < 0 \end{cases}$$
The parameter $\nu$ is called the number of degrees of freedom (df) of $X$.
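The "special case of the gamma" claim can be verified numerically. This sketch (with my own helper names) checks that the chi-squared density equals the gamma density with $\alpha = \nu/2$ and $\beta = 2$ at several points:

```python
import math

def chi2_pdf(x, nu):
    """Chi-squared density with nu degrees of freedom, zero for x < 0."""
    if x < 0:
        return 0.0
    return x ** (nu / 2 - 1) * math.exp(-x / 2) / (2 ** (nu / 2) * math.gamma(nu / 2))

def gamma_pdf(x, alpha, beta):
    if x < 0:
        return 0.0
    return x ** (alpha - 1) * math.exp(-x / beta) / (beta ** alpha * math.gamma(alpha))

# chi-squared(nu) is exactly gamma(alpha = nu/2, beta = 2)
for nu in (1, 2, 3, 5, 10):
    for x in (0.5, 1.0, 4.0):
        assert math.isclose(chi2_pdf(x, nu), gamma_pdf(x, nu / 2, 2.0))
```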
The chi-squared distribution is important because it is the basis for a number of procedures in statistical inference. The central role played by the chi-squared distribution in inference springs from its relationship to normal distributions. Concretely, the chi-squared distribution with $k$ degrees of freedom is the distribution of a sum of the squares of $k$ independent standard normal random variables. For example, in linear regression the errors are assumed to follow a normal distribution, so the estimate of the error variance, which is built from a sum of squares of these values, follows a (scaled) chi-squared distribution.
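This "sum of squared standard normals" characterization is easy to check by simulation. A chi-squared rv with $k$ df has mean $k$ and variance $2k$, so the sample moments of simulated sums should land near those values (sample sizes here are arbitrary choices of mine):

```python
import math
import random

random.seed(0)

# If Z_1, ..., Z_k are iid N(0, 1), then Z_1^2 + ... + Z_k^2 is
# chi-squared with k df, which has mean k and variance 2k.
k, n_samples = 5, 200_000
samples = [sum(random.gauss(0, 1) ** 2 for _ in range(k))
           for _ in range(n_samples)]

mean = sum(samples) / n_samples
var = sum((s - mean) ** 2 for s in samples) / n_samples
print(round(mean, 2), round(var, 2))  # mean near 5, variance near 10
```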
The chi-squared distribution is the foundation of chi-squared tests. There are two common types:
- The goodness-of-fit test.
- The test of independence.
Perhaps we will look at these tests in detail in another post.