Chi-squared distribution revisited

Today, while reviewing the regression techniques in the ESL book (The Elements of Statistical Learning, by the way this book is pure gold, I highly recommend it!), I stumbled upon the chi-squared distribution. Concretely, the authors show that:

(N-p-1) \hat{\sigma}^2 \sim \sigma^2 \chi^2_{N-p-1}

a chi-squared distribution with N-p-1 degrees of freedom, where \hat{\sigma}^2 is an unbiased estimate of \sigma^2 .

They use these distributional properties to form hypothesis tests and confidence intervals for the parameters \beta_j .
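As a sanity check, this distributional result can be illustrated by simulation. The sketch below (using numpy; the design matrix, noise level, and seed are arbitrary choices of mine, not from the book) compares the sample mean and variance of (N-p-1)\hat{\sigma}^2 / \sigma^2 against the mean \nu and variance 2\nu of a \chi^2_\nu variable:

```python
import numpy as np

# Simulation sketch: check that (N-p-1) * sigma_hat^2 / sigma^2 behaves like
# a chi-squared variable with N-p-1 degrees of freedom.
rng = np.random.default_rng(0)
N, p, sigma, reps = 50, 3, 2.0, 20000
X = np.column_stack([np.ones(N), rng.standard_normal((N, p))])  # design with intercept
beta = rng.standard_normal(p + 1)

H = X @ np.linalg.pinv(X)                        # hat matrix: projects y onto col(X)
Y = (X @ beta)[:, None] + sigma * rng.standard_normal((N, reps))
resid = Y - H @ Y                                # residuals, one column per replication
sigma2_hat = (resid ** 2).sum(axis=0) / (N - p - 1)  # unbiased estimates of sigma^2
stat = (N - p - 1) * sigma2_hat / sigma ** 2

df = N - p - 1
print(stat.mean())  # close to df = 46
print(stat.var())   # close to 2 * df = 92
```

A chi-squared variable with \nu degrees of freedom has mean \nu and variance 2\nu, which is what the two printed values should approximate.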

It has been a very long time since I last saw the chi-squared distribution, and of course my understanding of it has become quite rusty. So I think this is a good chance to revisit this important distribution.

Before talking about the chi-squared distribution, we need to review some notions. First, we have the gamma function:

For \alpha > 0, the gamma function \Gamma(\alpha) is defined by:

\Gamma(\alpha) = \int_0^\infty x^{\alpha-1}e^{-x}dx

and for any positive integer n, we have: \Gamma(n) = (n-1)!.
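This factorial identity is easy to check with the standard library's math.gamma:

```python
from math import gamma, factorial

# Gamma(n) = (n-1)! for every positive integer n
for n in range(1, 8):
    assert abs(gamma(n) - factorial(n - 1)) < 1e-9 * factorial(n - 1)

print(gamma(5))  # Gamma(5) = 4! = 24
```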

Now let:

f(x;\alpha) = \frac{x^{\alpha-1}e^{-x}}{\Gamma(\alpha)}  if x \geq 0


f(x;\alpha) = 0 otherwise.

Then f(x;\alpha) \geq 0, and the definition of the gamma function implies that:

\int_0^\infty f(x;\alpha)dx = \frac{\Gamma(\alpha)}{\Gamma(\alpha)} = 1

Thus f(x;\alpha) satisfies the two basic properties of a probability density function.
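We can also verify numerically that this density integrates to 1 for several shape values; a small sketch, assuming scipy is available:

```python
from math import gamma, exp
from scipy.integrate import quad

def std_gamma_pdf(x, alpha):
    """Standard gamma density f(x; alpha) as defined above."""
    return x ** (alpha - 1) * exp(-x) / gamma(alpha) if x >= 0 else 0.0

# The integral over [0, inf) should equal Gamma(alpha)/Gamma(alpha) = 1.
for alpha in (1.0, 2.5, 7.0):
    total, _ = quad(std_gamma_pdf, 0, float("inf"), args=(alpha,))
    assert abs(total - 1.0) < 1e-8

print("all integrals equal 1 (to numerical tolerance)")
```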

We will now use this function to define the Gamma distribution and then the Chi-squared distribution. 

A continuous random variable X is said to have a Gamma distribution if the pdf of X is:

f(x;\alpha,\beta) = \frac{1}{\beta^\alpha \Gamma(\alpha)} x^{\alpha-1} e^{-x/\beta} for x \geq 0


f(x;\alpha,\beta) = 0 otherwise

where the parameters \alpha and \beta are positive. The standard Gamma distribution has \beta = 1, so the pdf of a standard gamma random variable is the f(x;\alpha) given above.

The Gamma distribution is widely used to model the extent of degradation such as corrosion, creep, wear or survival time.

The Gamma distribution is a family of distributions. Both the Exponential distribution and the Chi-squared distribution are special cases of the Gamma.

As we can see, the gamma distribution takes two parameters. The first (\alpha) defines the shape. When \alpha = 1, the gamma reduces exactly to the exponential distribution with scale \beta; when \alpha = \nu/2 and \beta = 2, it is the chi-squared distribution with \nu degrees of freedom.
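These special cases can be checked directly with scipy.stats (assuming scipy is available); note that scipy's scale parameter plays the role of \beta:

```python
import numpy as np
from scipy import stats

x = np.linspace(0.1, 10, 50)

# alpha = 1 reduces the gamma density to the exponential (same scale beta)
beta = 2.0
assert np.allclose(stats.gamma.pdf(x, a=1, scale=beta), stats.expon.pdf(x, scale=beta))

# alpha = nu/2, beta = 2 gives the chi-squared density with nu degrees of freedom
nu = 5
assert np.allclose(stats.gamma.pdf(x, a=nu / 2, scale=2), stats.chi2.pdf(x, df=nu))

print("gamma matches exponential and chi-squared in the special cases")
```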

Now we will define the chi-squared distribution.

Let \nu be a positive integer. Then a random variable X is said to have a chi-squared distribution with parameter \nu if the pdf of X is the gamma density with \alpha = \nu/2 and \beta = 2. The pdf of a chi-squared rv is thus:

f(x;\nu) = \frac{1}{2^{\nu/2}\Gamma(\nu/2)}x^{(\nu/2)-1}e^{-x/2} for x \geq 0


f(x;\nu) = 0 otherwise

The parameter \nu is called the degrees of freedom (df) of X.
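To guard against typos in the density (the x^{(\nu/2)-1} exponent is easy to get wrong), the formula can be compared against scipy's implementation; a quick check, assuming scipy is available:

```python
from math import gamma, exp
from scipy.stats import chi2

def chi2_pdf(x, nu):
    """Chi-squared pdf written directly from the gamma-density definition above."""
    if x < 0:
        return 0.0
    return x ** (nu / 2 - 1) * exp(-x / 2) / (2 ** (nu / 2) * gamma(nu / 2))

# Agreement with scipy.stats.chi2 at a few arbitrary points
for x in (0.5, 1.0, 3.7):
    assert abs(chi2_pdf(x, 5) - chi2.pdf(x, df=5)) < 1e-12

print("hand-written pdf matches scipy.stats.chi2")
```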

The chi-squared distribution is important because it is the basis for a number of procedures in statistical inference. The central role played by the chi-squared distribution in inference springs from its relationship to normal distributions: the chi-squared distribution with k degrees of freedom is the distribution of a sum of the squares of k independent standard normal random variables. For example, in linear regression the residuals y_i - \hat{y}_i are normally distributed, so the sum of their squares, which appears in the variance estimate \hat{\sigma}^2, follows a (scaled) chi-squared distribution.
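The sum-of-squares characterization can be checked by simulation; a minimal sketch using numpy (the degrees of freedom, sample size, and seed are arbitrary choices):

```python
import numpy as np

# Summing the squares of k independent standard normals, many times over,
# should produce samples from a chi-squared distribution with k df.
rng = np.random.default_rng(1)
k, reps = 4, 100_000
samples = (rng.standard_normal((reps, k)) ** 2).sum(axis=1)

# A chi-squared_k variable has mean k and variance 2k.
print(samples.mean())  # close to k = 4
print(samples.var())   # close to 2k = 8
```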

The chi-squared distribution is the foundation of chi-squared tests. There are two main types:

  • The goodness-of-fit test.
  • The test of independence.

Perhaps we will look at these tests in detail in another post.

