Kahibaro

13.3.2 Normal distribution

Understanding the Normal Distribution

The normal distribution is one of the most important continuous probability distributions in statistics. Many natural and human-made measurements are approximately normally distributed, especially when they result from the sum of many small, random influences (for example, heights, test scores, measurement errors).

In this chapter, we focus specifically on what makes a distribution “normal,” how it is described, and how it is used in practice.

Shape and Key Features

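A normal distribution has a very distinctive shape: a single-peaked, symmetric "bell curve" that is highest at the mean and falls away on both sides, with tails that extend indefinitely and approach, but never touch, the horizontal axis.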

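Two parameters completely describe a normal distribution: the mean $\mu$, which fixes the center of the curve, and the standard deviation $\sigma > 0$, which fixes its spread.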

If a random variable $X$ is normally distributed with mean $\mu$ and standard deviation $\sigma$, we write
$$
X \sim N(\mu, \sigma^2).
$$

Here $ \sigma^2 $ is the variance.

Symmetry and the Mean/Median/Mode

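For a normal distribution, the curve is symmetric about the vertical line $x = \mu$, and the mean, median, and mode all coincide at $\mu$.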

The Probability Density Function (PDF)

The normal distribution is continuous, so it is described by a probability density function (PDF), not by a list of discrete probabilities.

For $X \sim N(\mu, \sigma^2)$, the PDF is
$$
f(x) = \frac{1}{\sigma \sqrt{2\pi}} \exp\!\left( -\frac{(x - \mu)^2}{2\sigma^2} \right).
$$
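As a rough illustration, the formula above can be typed out term by term and evaluated. The sketch below is not tied to any particular application: the values `mu = 100` and `sigma = 15` are hypothetical, and `normal_pdf` is a name introduced here only for the example. It also cross-checks the result against Python's standard-library `statistics.NormalDist`, which takes the standard deviation $\sigma$ as its second argument, not the variance $\sigma^2$.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x, written directly from the formula."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coeff * math.exp(exponent)


# Hypothetical parameters, chosen only for illustration.
mu, sigma = 100.0, 15.0

peak = normal_pdf(mu, mu, sigma)            # height of the curve at the mean
left = normal_pdf(mu - sigma, mu, sigma)    # one standard deviation below
right = normal_pdf(mu + sigma, mu, sigma)   # one standard deviation above

# Cross-check with the standard library (second argument is sigma, not sigma^2).
library_peak = NormalDist(mu, sigma).pdf(mu)

print(f"f(mu)         = {peak:.6f}")
print(f"f(mu - sigma) = {left:.6f}")
print(f"f(mu + sigma) = {right:.6f}")
print(f"library f(mu) = {library_peak:.6f}")
```

The two values one standard deviation either side of the mean agree, which is the symmetry of the curve showing up numerically.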

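Key points: $f(x) > 0$ for every real $x$, the total area under the curve is 1, and probabilities are areas under the curve, so that
$$
P(a \le X \le b) = \int_a^b f(x)\, dx.
$$

These integrals have no closed-form expression in terms of elementary functions, so in practice we use standard normal tables, calculators, or statistical software.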
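Since probabilities for a normal variable are areas under its PDF, one way to see what tables and software are doing is to approximate such an area numerically. The following is a minimal sketch using the midpoint rule; the parameters, the interval $[85, 130]$, and the helper names `normal_pdf` and `area_under_pdf` are all assumptions made for the example, and real software uses more refined methods.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / (
        sigma * math.sqrt(2.0 * math.pi)
    )


def area_under_pdf(a, b, mu, sigma, steps=10_000):
    """Approximate P(a <= X <= b) with the midpoint rule."""
    width = (b - a) / steps
    total = 0.0
    for i in range(steps):
        midpoint = a + (i + 0.5) * width
        total += normal_pdf(midpoint, mu, sigma)
    return total * width


# Hypothetical parameters and interval, chosen only for illustration.
mu, sigma = 100.0, 15.0
approx = area_under_pdf(85.0, 130.0, mu, sigma)

# The same probability from the library's CDF: P(X <= 130) - P(X <= 85).
dist = NormalDist(mu, sigma)
exact = dist.cdf(130.0) - dist.cdf(85.0)

print(f"midpoint rule: {approx:.6f}")
print(f"library CDF:   {exact:.6f}")
```

The two numbers should agree to several decimal places, which is a useful sanity check when you are unsure whether you have set up a probability correctly.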

The Standard Normal Distribution

A particularly important special case is the standard normal distribution, which has mean 0 and standard deviation 1:

$$
Z \sim N(0, 1).
$$

Its PDF is
$$
\phi(z) = \frac{1}{\sqrt{2\pi}} \exp\!\left( -\frac{z^2}{2} \right).
$$

The symbol $ \phi(z) $ is often used for this PDF. The cumulative distribution function (CDF) of the standard normal, usually written as $ \Phi(z) $, is
$$
\Phi(z) = P(Z \le z).
$$

Standard normal tables give values of $ \Phi(z) $ for many $z$ values. Software and calculators can compute $ \Phi(z) $ directly.
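To show what "computing $\Phi(z)$ directly" can look like, here is a small sketch based on the standard identity $\Phi(z) = \tfrac{1}{2}\left(1 + \operatorname{erf}(z/\sqrt{2})\right)$, where $\operatorname{erf}$ is the error function available as `math.erf` in Python. The function name `phi_cdf` and the sample $z$ values are chosen here for illustration only.

```python
import math
from statistics import NormalDist


def phi_cdf(z):
    """Standard normal CDF via Phi(z) = (1 + erf(z / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


standard_normal = NormalDist(0.0, 1.0)

# Compare the hand-written version with the library for a few z values.
for z in (-1.96, 0.0, 1.0, 1.96):
    print(
        f"z = {z:5.2f}  Phi(z) = {phi_cdf(z):.4f}  "
        f"library = {standard_normal.cdf(z):.4f}"
    )
```

The printed values can be compared against a standard normal table as a check on how to read the table.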

The $z$-Score and Standardization

Any normal random variable $X \sim N(\mu, \sigma^2)$ can be converted into a standard normal variable $Z$ by the transformation
$$
Z = \frac{X - \mu}{\sigma}.
$$

This process is called standardization.

The quantity
$$
z = \frac{x - \mu}{\sigma}
$$
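for a particular observed value $x$ is called the $z$-score of $x$. It tells you how many standard deviations $x$ is above or below the mean: a positive $z$ means $x$ is above the mean, a negative $z$ means it is below, and $z = 0$ means $x$ equals the mean.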

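Because of standardization, probabilities involving any normal variable $X$ can be turned into probabilities involving $Z$:
$$
P(X \le x) = P\!\left( Z \le \frac{x - \mu}{\sigma} \right) = \Phi\!\left( \frac{x - \mu}{\sigma} \right).
$$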
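A short worked example may help make standardization concrete. The numbers below ($\mu = 100$, $\sigma = 15$, and the values 85 and 120) are hypothetical. The sketch computes a probability for $X$ in two ways, once by converting to a $z$-score and using the standard normal, and once directly, to show that they agree.

```python
from statistics import NormalDist

# Hypothetical parameters and observed value, chosen only for illustration.
mu, sigma = 100.0, 15.0
x = 120.0

# Step 1: standardize the observed value.
z = (x - mu) / sigma

# Step 2: look up the z-score in the standard normal distribution.
standard_normal = NormalDist()          # defaults to mean 0, standard deviation 1
p_via_z = standard_normal.cdf(z)

# The same probability computed directly from N(mu, sigma^2).
p_direct = NormalDist(mu, sigma).cdf(x)

# An interval works the same way: P(85 < X <= 120) = Phi(z_high) - Phi(z_low).
z_low = (85.0 - mu) / sigma
p_interval = standard_normal.cdf(z) - standard_normal.cdf(z_low)

print(f"z-score of {x}: {z:.4f}")
print(f"P(X <= {x}) via z: {p_via_z:.4f}, direct: {p_direct:.4f}")
print(f"P(85 < X <= 120): {p_interval:.4f}")
```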

The Empirical Rule (68–95–99.7 Rule)

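For any normal distribution $N(\mu, \sigma^2)$, approximate proportions of the data lie within certain distances of the mean: about 68% within one standard deviation ($\mu \pm \sigma$), about 95% within two standard deviations ($\mu \pm 2\sigma$), and about 99.7% within three standard deviations ($\mu \pm 3\sigma$).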

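This pattern is sometimes called the 68–95–99.7 rule or the empirical rule. It gives a quick way to judge how unusual a data point is: a value more than two standard deviations from the mean occurs only about 5% of the time, and a value more than three standard deviations from the mean only about 0.3% of the time.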
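The percentages in the empirical rule are rounded values of $\Phi(k) - \Phi(-k)$ for $k = 1, 2, 3$. The sketch below recomputes them with the standard library; the helper name `within_k_sd` is introduced here only for the example.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1


def within_k_sd(k):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


coverage = {k: within_k_sd(k) for k in (1, 2, 3)}

for k, p in coverage.items():
    print(f"within {k} standard deviation(s): {p:.4f}")
```

Because standardization removes $\mu$ and $\sigma$, the same three proportions apply to every normal distribution, whatever its mean and standard deviation.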

Using the Normal Distribution in Practice

Here are typical ways the normal distribution is used:

Approximating Real-World Measurements

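Many measurements or test scores are modeled as normal distributions. Once you assume $X \sim N(\mu, \sigma^2)$, you can estimate the probability that a value falls in a given range, find the value that cuts off a given percentage of the population, and compare individual values using $z$-scores.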

Finding Percentiles and Cutoffs

For a proportion $p$ between 0 and 1, the $100p$th percentile of a normal distribution (for example, the 95th percentile when $p = 0.95$) is the value $x_p$ such that
$$
P(X \le x_p) = p.
$$

Using standardization:

  1. Find $z_p$ so that $\Phi(z_p) = p$ (from tables or software).
  2. Convert back: $x_p = \mu + z_p \sigma$.

This is how you find, for example, the score that marks the top 5% of a normally distributed test: take $p = 0.95$.
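The two steps above can be sketched in code. The test-score parameters ($\mu = 70$, $\sigma = 10$) are hypothetical; `NormalDist.inv_cdf` from Python's standard library plays the role of reading a table "backwards" to find $z_p$ from $p$.

```python
from statistics import NormalDist

# Hypothetical test with normally distributed scores.
mu, sigma = 70.0, 10.0

# The top 5% starts at the point with 95% of scores at or below it.
p = 0.95

# Step 1: find z_p with Phi(z_p) = p.
z_p = NormalDist().inv_cdf(p)

# Step 2: convert back to the raw-score scale.
x_p = mu + z_p * sigma

# Sanity check: the CDF at the cutoff should return p.
check = NormalDist(mu, sigma).cdf(x_p)

print(f"z_p = {z_p:.4f}")
print(f"cutoff score x_p = {x_p:.2f}")
print(f"P(X <= x_p) = {check:.4f}")
```

Under these assumed parameters, the cutoff lands a little over 1.6 standard deviations above the mean.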

Converting Between Raw Scores and $z$-Scores

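To go from a raw score $x$ to a $z$-score, use $z = (x - \mu)/\sigma$; to go back from a $z$-score to a raw score, use $x = \mu + z\sigma$. This allows you to compare results from different normal distributions on a common scale (the $z$-scale).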
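As an illustration of comparing results on the $z$-scale, the sketch below uses two made-up tests with different means and standard deviations. All numbers and the helper names `z_score` and `raw_score` are assumptions for the example.

```python
def z_score(x, mu, sigma):
    """Convert a raw score to a z-score."""
    return (x - mu) / sigma


def raw_score(z, mu, sigma):
    """Convert a z-score back to a raw score."""
    return mu + z * sigma


# Hypothetical results on two differently scaled tests.
z_a = z_score(85.0, mu=70.0, sigma=10.0)      # Test A: mean 70, sd 10
z_b = z_score(620.0, mu=500.0, sigma=100.0)   # Test B: mean 500, sd 100

print(f"Test A z-score: {z_a:.2f}")
print(f"Test B z-score: {z_b:.2f}")

# The larger z-score is the stronger result relative to its own distribution.
better = "A" if z_a > z_b else "B"
print(f"Relatively stronger result: Test {better}")

# Converting back recovers the original raw score.
print(f"Round trip for Test A: {raw_score(z_a, 70.0, 10.0):.1f}")
```

Here the raw score of 620 is numerically much larger than 85, yet the Test A result sits further above its own mean when measured in standard deviations.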

Normal Approximation to Other Distributions (Idea Only)

The normal distribution is often used to approximate other distributions under certain conditions, especially when sample sizes are large.

One important case is when a sum (or average) of many small, independent random effects is involved. Under suitable conditions, such sums are approximately normal. This idea is formalized in the Central Limit Theorem, which is treated elsewhere, but it helps explain why the normal distribution appears so frequently.
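A small simulation can make this idea tangible without any theory. The sketch below is only an informal illustration, not a demonstration of the theorem: it adds up 12 independent uniform random numbers many times and checks how much of the resulting data lies within one standard deviation of its mean. The choices of 12 terms, 20,000 repetitions, and the fixed seed are arbitrary assumptions made for reproducibility.

```python
import random
from statistics import NormalDist, fmean, stdev

random.seed(0)  # fixed seed so the run is reproducible


def sum_of_uniforms(n_terms):
    """Sum of n_terms independent Uniform(0, 1) random numbers."""
    return sum(random.random() for _ in range(n_terms))


# Each sample is a sum of 12 small, independent random effects.
samples = [sum_of_uniforms(12) for _ in range(20_000)]

sample_mean = fmean(samples)
sample_sd = stdev(samples)

# Share of simulated sums lying within one standard deviation of their mean.
share_within_1sd = sum(
    abs(s - sample_mean) <= sample_sd for s in samples
) / len(samples)

# The corresponding share for an exactly normal distribution.
normal_share = NormalDist().cdf(1.0) - NormalDist().cdf(-1.0)

print(f"sample mean: {sample_mean:.3f}, sample sd: {sample_sd:.3f}")
print(f"share within 1 sd (simulated): {share_within_1sd:.3f}")
print(f"share within 1 sd (normal):    {normal_share:.3f}")
```

Although each individual term is uniform, not bell-shaped, the share of sums within one standard deviation should come out close to the normal value of about 68%.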

Limitations and Cautions

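While the normal distribution is very useful, it is not always appropriate: strongly skewed data, data with heavy tails or frequent outliers, and quantities restricted to a limited range (such as counts near zero or proportions near 0 or 1) are often poorly described by a normal model, so it is worth checking the shape of the data before assuming normality.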

Despite these limitations, the normal distribution remains a central tool in probability and statistics, especially when working with continuous data, $z$-scores, and approximate probabilities.
