# Chi-squared distribution revisited

Today, while reviewing the regression techniques in the ESL book (The Elements of Statistical Learning, btw this book is pure gold, I highly recommend it!), I stumbled upon the chi-squared distribution. Concretely, the authors show that:

$(N-p-1)\,\hat{\sigma}^2 \sim \sigma^2 \chi^2_{N-p-1}$

i.e. a chi-squared distribution with $N-p-1$ degrees of freedom, where $\hat{\sigma}^2$ is an unbiased estimate of $\sigma^2$.

They use these distributional properties to form hypothesis tests and confidence intervals for the parameters $\beta_j$.
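As a quick sanity check of that result, here is a minimal simulation sketch (assuming NumPy; the sample size, coefficients, and noise level below are made up for illustration). The statistic $(N-p-1)\hat{\sigma}^2/\sigma^2$ should have mean $N-p-1$ and variance $2(N-p-1)$, the moments of a $\chi^2_{N-p-1}$ distribution.

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 50, 3                  # sample size and number of predictors (illustrative)
sigma = 2.0                   # true error standard deviation
beta = np.array([1.0, -2.0, 0.5, 3.0])   # intercept + p coefficients (made up)

stats = []
for _ in range(10_000):
    X = np.column_stack([np.ones(N), rng.standard_normal((N, p))])
    y = X @ beta + sigma * rng.standard_normal(N)
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta_hat
    sigma2_hat = resid @ resid / (N - p - 1)           # unbiased estimate of sigma^2
    stats.append((N - p - 1) * sigma2_hat / sigma**2)  # should be chi^2 with N-p-1 df

stats = np.array(stats)
print(stats.mean())  # ~ N - p - 1 = 46
print(stats.var())   # ~ 2 * (N - p - 1) = 92
```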

It has been a very long time since I last saw the chi-squared distribution, and of course my understanding of it has become quite rusty. So I think this is a good chance to revisit this important distribution.

Before talking about the chi-squared distribution, we need to review some notions. First, we have the gamma function:

For $\alpha > 0$, the gamma function $\Gamma(\alpha)$ is defined by:

$\Gamma(\alpha) = \int_0^\infty x^{\alpha-1}e^{-x}dx$

and for any positive integer $n$, we have: $\Gamma(n) = (n-1)!$.
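As a quick sanity check of that identity, here is a minimal sketch using only Python's standard library:

```python
import math

# Gamma(n) should equal (n-1)! for every positive integer n.
for n in range(1, 7):
    print(n, math.gamma(n), math.factorial(n - 1))  # e.g. n=6 -> 120.0 and 120
```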

Now let:

$f(x; \alpha) = \frac{x^{\alpha-1}e^{-x}}{\Gamma(\alpha)}$ if $x \geq 0$

and

$f(x; \alpha) = 0$ otherwise.

Then $f(x; \alpha) \geq 0$, and the definition of the gamma function implies that:

$\int_0^\infty f(x; \alpha)dx = \frac{\Gamma(\alpha)}{\Gamma(\alpha)} = 1$

Thus $f(x; \alpha)$ satisfies the two basic properties of a probability density function.
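We can verify the normalization numerically. Here is a minimal sketch (assuming NumPy and SciPy are available) that integrates the standard gamma density for a few values of $\alpha$:

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma as gamma_fn

def standard_gamma_pdf(x, alpha):
    """Standard gamma density: x^(alpha-1) e^(-x) / Gamma(alpha)."""
    return x ** (alpha - 1) * np.exp(-x) / gamma_fn(alpha)

for alpha in [1.0, 2.0, 3.5, 5.0]:
    total, _ = quad(standard_gamma_pdf, 0, np.inf, args=(alpha,))
    print(alpha, round(total, 6))  # each integral should be ~1.0
```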

We will now use this function to define the Gamma distribution and then the Chi-squared distribution.

A continuous random variable $X$ is said to have a Gamma distribution if the pdf of $X$ is:

$f(x;\alpha,\beta) = \frac{1}{\beta^\alpha \Gamma(\alpha)} x^{\alpha-1} e^{-x/\beta}$ for $x \geq 0$

and

$f(x;\alpha,\beta) = 0$ otherwise

where the parameters $\alpha$ and $\beta$ are positive. The standard Gamma distribution has $\beta = 1$, so the pdf of a standard gamma is the $f(x; \alpha)$ given above.
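To make the parameterization concrete, here is a small sketch (assuming SciPy) comparing this density with `scipy.stats.gamma`, whose `a` argument is the shape $\alpha$ and whose `scale` argument is $\beta$:

```python
import numpy as np
from scipy.special import gamma as gamma_fn
from scipy.stats import gamma

def gamma_pdf(x, alpha, beta):
    """Gamma density: x^(alpha-1) e^(-x/beta) / (beta^alpha Gamma(alpha))."""
    return x ** (alpha - 1) * np.exp(-x / beta) / (beta ** alpha * gamma_fn(alpha))

x = np.linspace(0.1, 10, 50)
alpha, beta = 2.0, 3.0
print(np.allclose(gamma_pdf(x, alpha, beta), gamma.pdf(x, a=alpha, scale=beta)))  # True
```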

The Gamma distribution is widely used to model the extent of degradation (such as corrosion, creep, or wear) and to model survival times.

The Gamma distribution is a family of distributions. Both the Exponential distribution and the Chi-squared distribution are special cases of the Gamma.

As we can see, the gamma distribution takes two parameters. The first ($\alpha$) defines the shape and the second ($\beta$) defines the scale. When $\alpha = 1$, the gamma reduces exactly to the exponential distribution with mean $\beta$, and when $\alpha = \nu/2$ and $\beta = 2$, it becomes the chi-squared distribution with $\nu$ degrees of freedom, as we will see below.
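Here is a minimal sketch of these two special cases (assuming SciPy): with $\alpha = 1$ the gamma matches an exponential with mean $\beta$, and with $\alpha = \nu/2$ and $\beta = 2$ it matches a chi-squared with $\nu$ degrees of freedom.

```python
import numpy as np
from scipy.stats import gamma, expon, chi2

x = np.linspace(0.1, 15, 50)

# Exponential: gamma with shape alpha = 1 (the scale beta is the exponential mean)
beta = 2.0
print(np.allclose(gamma.pdf(x, a=1, scale=beta), expon.pdf(x, scale=beta)))  # True

# Chi-squared: gamma with shape alpha = nu/2 and scale beta = 2
nu = 5
print(np.allclose(gamma.pdf(x, a=nu / 2, scale=2), chi2.pdf(x, df=nu)))      # True
```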

Now we will define the chi-squared distribution.

Let $\nu$ be a positive integer. Then a random variable $X$ is said to have a chi-squared distribution with parameter $\nu$ if the pdf of $X$ is the gamma density with $\alpha = \nu/2$ and $\beta = 2$. The pdf of a chi-squared rv is thus:

$f(x; \nu) = \frac{1}{2^{\nu/2}\Gamma(\nu/2)}x^{\nu/2 - 1}e^{-x/2}$ for $x \geq 0$

and

$f(x; \nu) = 0$ otherwise

The parameter $\nu$ is called the number of degrees of freedom (df) of $X$.
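As a quick check of this density (a sketch assuming SciPy), we can compare it with `scipy.stats.chi2`:

```python
import numpy as np
from scipy.special import gamma as gamma_fn
from scipy.stats import chi2

def chi2_pdf(x, nu):
    """Chi-squared density: x^(nu/2 - 1) e^(-x/2) / (2^(nu/2) Gamma(nu/2))."""
    return x ** (nu / 2 - 1) * np.exp(-x / 2) / (2 ** (nu / 2) * gamma_fn(nu / 2))

x = np.linspace(0.5, 20, 50)
for nu in [1, 3, 5, 10]:
    print(nu, np.allclose(chi2_pdf(x, nu), chi2.pdf(x, df=nu)))  # True for each nu
```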

The chi-squared distribution is important because it is the basis for a number of procedures in statistical inference. The central role played by the chi-squared distribution in inference springs from its relationship to the normal distribution. Concretely, the chi-squared distribution with $k$ degrees of freedom is the distribution of a sum of the squares of $k$ independent standard normal random variables. For example, in the case of linear regression, the residuals $(y_i - \hat{y}_i)$ follow a normal distribution, so the estimate $\hat{\sigma}^2$ of the error variance, which is built from the sum of the squares of these residuals, follows a scaled chi-squared distribution; this is exactly the result from ESL quoted at the top of this post.
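A small simulation makes this relationship tangible (a sketch assuming NumPy): the sum of the squares of $k$ independent standard normals should have mean $k$ and variance $2k$, the moments of a $\chi^2_k$ distribution.

```python
import numpy as np

rng = np.random.default_rng(0)
k = 5                    # degrees of freedom
n_samples = 100_000

# Sum of squares of k independent standard normal draws
z = rng.standard_normal((n_samples, k))
sums_of_squares = (z ** 2).sum(axis=1)

print(sums_of_squares.mean())  # ~ k = 5
print(sums_of_squares.var())   # ~ 2k = 10
```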

The chi-squared distribution is the foundation of chi-squared tests. There are two main types:

• The goodness-of-fit test.
• The test of independence.

Perhaps we will look at these tests in detail in another post.