Kahibaro

Standard deviation

Standard deviation is a number that tells you how spread out a set of data values is around its mean (average). In this chapter, we focus on understanding what standard deviation measures, how to compute it in simple cases, and how to interpret it in context.

Because this chapter is part of Descriptive Statistics, we will assume you already know what a mean is and have a basic idea of what “spread” or “variability” means from the chapter on variance. Here, we connect that idea to a more practical and commonly used measure: standard deviation.

Why standard deviation is useful

When you have a data set, two questions often matter:

  1. Where are the data values centered? (This is measured by the mean or other “center” measures.)
  2. How spread out are the data values? (This is measured by things like range, variance, and standard deviation.)

Range (largest minus smallest) uses only two data points. Variance uses all the data points but is expressed in “squared units,” which can feel unnatural. Standard deviation fixes this: it uses all the data points and is expressed in the same units as the original data.

For example, if you measure test scores in points, the standard deviation is also in points. If you measure height in centimeters, the standard deviation is in centimeters.

Relationship between variance and standard deviation

Variance measures average squared distance from the mean. Standard deviation is simply the square root of the variance.

If the variance is $s^2$, the standard deviation is
$$
s = \sqrt{s^2}.
$$

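So the variance is the square of the standard deviation, and the standard deviation is the square root of the variance.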

In practice, people usually talk about standard deviation more often than variance because its units are easier to interpret.
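The relationship can be checked numerically. The following is a minimal sketch using Python's standard `statistics` module; the list `scores` is a hypothetical sample invented for illustration and does not come from this chapter.

```python
import math
import statistics

# Hypothetical sample of test scores, measured in points (illustrative only).
scores = [70, 75, 80, 85, 90]

variance = statistics.variance(scores)  # sample variance, in points squared
std_dev = statistics.stdev(scores)      # sample standard deviation, in points

print(variance)                                      # 62.5
print(math.isclose(std_dev, math.sqrt(variance)))    # True
```

The variance of 62.5 is in "points squared", while its square root (a little under 8) is back in points, which is why the standard deviation is the easier number to talk about.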

Population vs. sample standard deviation

There are two common situations: you have data for an entire population, or you have only a sample drawn from a larger population.

The formulas are slightly different in the denominator, just as for variance.

Population standard deviation

Suppose your population has $N$ values:
$$
x_1, x_2, \dots, x_N
$$
and the population mean is
$$
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i.
$$

The population variance is
$$
\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2,
$$
and the population standard deviation is
$$
\sigma = \sqrt{\sigma^2}
= \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2}.
$$

Here the symbol $\sigma$ (Greek letter “sigma”) is used for the population standard deviation.
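As a sketch, the population formula translates almost term by term into Python. The function name `population_std_dev` and the data in `population` are assumptions made for this example, not part of the chapter.

```python
import math
import statistics

def population_std_dev(values):
    """Population standard deviation: average the squared deviations over N, then take the root."""
    n = len(values)
    mu = sum(values) / n                                   # population mean
    variance = sum((x - mu) ** 2 for x in values) / n      # divide by N
    return math.sqrt(variance)

# Hypothetical complete population of eight values (illustrative only).
population = [2, 4, 4, 4, 5, 5, 7, 9]

sigma = population_std_dev(population)
print(sigma)  # 2.0

# The standard library's pstdev uses the same divide-by-N definition.
print(math.isclose(sigma, statistics.pstdev(population)))  # True
```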

Sample standard deviation

Now suppose you only have a sample of $n$ values from a larger population:
$$
x_1, x_2, \dots, x_n,
$$
with sample mean
$$
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i.
$$

The sample variance is
$$
s^2 = \frac{1}{n - 1}\sum_{i=1}^{n}(x_i - \bar{x})^2,
$$
and the sample standard deviation is
$$
s = \sqrt{s^2}
= \sqrt{\frac{1}{n - 1}\sum_{i=1}^{n}(x_i - \bar{x})^2}.
$$

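Notice the denominator $n - 1$ instead of $n$. This adjustment (often called Bessel's correction) is used when the sample serves to estimate variability in a larger population: deviations measured from the sample mean $\bar{x}$ tend to understate the spread around the true population mean, and dividing by $n - 1$ compensates for that. The important practical point: when working with a sample, use $n - 1$ in the denominator.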
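To see the effect of the denominator, the sketch below applies both definitions to the same numbers using the standard `statistics` module. The list `sample` is hypothetical data chosen for illustration.

```python
import statistics

# Hypothetical sample of eight values (illustrative only).
sample = [2, 4, 4, 4, 5, 5, 7, 9]

s = statistics.stdev(sample)    # sample formula: divides by n - 1
p = statistics.pstdev(sample)   # population formula: divides by n

print(s > p)  # True
```

Dividing by the smaller number $n - 1$ always gives a slightly larger result than dividing by $n$, and the gap shrinks as the sample grows.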

Step-by-step computation (sample standard deviation)

To make the process concrete, here is the step-by-step procedure for a sample:

  1. List your data values.
  2. Compute the sample mean $\bar{x}$.
  3. Subtract the mean from each value to get deviations:
    $x_i - \bar{x}$.
  4. Square each deviation: $(x_i - \bar{x})^2$.
  5. Sum the squared deviations:
    $\sum_{i=1}^{n}(x_i - \bar{x})^2$.
  6. Divide by $n - 1$ to get the sample variance $s^2$.
  7. Take the square root of the variance to get $s$, the standard deviation.

These same steps apply to the population standard deviation, with the only change being that you divide by $N$ instead of $n - 1$ in step 6.
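The seven steps above can be sketched in Python, with one line per step. The function name `sample_std_dev_steps` and the data set are hypothetical, chosen only to keep the arithmetic easy to follow by hand.

```python
import math

def sample_std_dev_steps(values):
    """Follow the step-by-step procedure for a sample standard deviation."""
    n = len(values)
    if n < 2:
        raise ValueError("a sample standard deviation needs at least two values")
    mean = sum(values) / n                        # step 2: sample mean
    deviations = [x - mean for x in values]       # step 3: deviations from the mean
    squared = [d ** 2 for d in deviations]        # step 4: squared deviations
    total = sum(squared)                          # step 5: sum of squared deviations
    variance = total / (n - 1)                    # step 6: sample variance
    return math.sqrt(variance)                    # step 7: standard deviation

# Step 1: list the data values (hypothetical sample).
data = [4, 8, 6, 2]

s = sample_std_dev_steps(data)
print(round(s, 3))  # 2.582
```

For this data the mean is 5, the deviations are -1, 3, 1, -3, the squared deviations sum to 20, and the sample variance is 20 / 3, so $s = \sqrt{20/3} \approx 2.582$.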

Interpreting the size of the standard deviation

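The standard deviation tells you how tightly or loosely the data are grouped around the mean. A small standard deviation means the values cluster close to the mean; a large one means they are spread widely.

The actual number has to be interpreted in the context of the units and the mean. For example, a standard deviation of 5 points is large if the mean score is 10 points but small if the mean score is 1,000 points.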

Standard deviation and distance from the mean

Standard deviation gives a rough sense of how far a “typical” data value lies from the mean, but it is not exactly the average distance. It’s the square root of the average squared distance.
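The difference between the two ideas can be seen by computing both for the same data. This is a small sketch with hypothetical values; the name `mean_abs_dev` (the plain average distance from the mean) is introduced here only for the comparison.

```python
import statistics

# Hypothetical data, treated as a complete population (illustrative only).
values = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(values)

# Plain average distance from the mean.
mean_abs_dev = sum(abs(x - mean) for x in values) / len(values)

# Square root of the average squared distance from the mean.
std_dev = statistics.pstdev(values)

print(mean_abs_dev)  # 1.5
print(std_dev)       # 2.0
```

Squaring gives the larger deviations extra weight, so the standard deviation comes out at least as large as the average distance when both divide by the same count.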

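However, for many kinds of data (especially when the distribution is roughly bell-shaped), a useful rule of thumb is that about 68% of the values fall within one standard deviation of the mean, about 95% within two, and about 99.7% within three.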

The exact percentages depend on the shape of the distribution and are explored more formally in other chapters (particularly when you study the normal distribution).
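For an exactly normal distribution, the share of values within $k$ standard deviations of the mean can be computed directly. The sketch below uses `statistics.NormalDist` from the Python standard library (available from Python 3.8); the helper name `share_within` is an assumption made for this example.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```

Real data are rarely exactly normal, so these figures are best read as approximations rather than guarantees.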

Standard deviation in grouped data (optional idea)

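Often data are summarized in frequency tables, where exact raw values are not all listed, but counts (frequencies) of each value or each class are given. In such cases, each value (or, for classes, each class midpoint) is weighted by its frequency when computing the mean and the sum of squared deviations.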

The underlying idea does not change: standard deviation is still the square root of the variance, and it still measures spread around the mean.
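One way to carry this out is to weight each distinct value by its frequency. The sketch below is an illustration under stated assumptions: the function name `grouped_std_dev`, its `sample` flag, and the small frequency table are all hypothetical. For class intervals, the class midpoints would stand in for `values`, which makes the result an approximation.

```python
import math
import statistics

def grouped_std_dev(values, freqs, sample=True):
    """Standard deviation from a frequency table of (value, frequency) pairs."""
    n = sum(freqs)                                            # total number of observations
    mean = sum(x * f for x, f in zip(values, freqs)) / n      # frequency-weighted mean
    ss = sum(f * (x - mean) ** 2 for x, f in zip(values, freqs))
    divisor = n - 1 if sample else n                          # n - 1 for a sample, n for a population
    return math.sqrt(ss / divisor)

# Hypothetical frequency table: the value 1 occurs twice, 2 four times, 3 twice.
values = [1, 2, 3]
freqs = [2, 4, 2]

# Expanding the table back into raw data gives the same answer.
expanded = [x for x, f in zip(values, freqs) for _ in range(f)]
print(math.isclose(grouped_std_dev(values, freqs), statistics.stdev(expanded)))  # True
```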

Comparing variability with standard deviations

Standard deviation is especially helpful for comparing how variable two data sets are, even if they have similar means.

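Suppose Data Set A and Data Set B both have a mean of $50$. If Data Set A has the smaller standard deviation, its values lie closer to $50$, while the values in Data Set B are more spread out, even though the two means are identical.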
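As a hypothetical illustration, the two lists below were made up so that both have a mean of 50 while differing greatly in spread. The sketch uses the standard `statistics` module.

```python
import statistics

# Two hypothetical data sets with the same mean (illustrative only).
data_a = [48, 49, 50, 51, 52]
data_b = [30, 40, 50, 60, 70]

print(statistics.mean(data_a), statistics.mean(data_b))  # 50 50

s_a = statistics.stdev(data_a)
s_b = statistics.stdev(data_b)

print(round(s_a, 2))  # 1.58
print(round(s_b, 2))  # 15.81
```

The means cannot tell the two sets apart, but the standard deviations show that the values in `data_b` are about ten times more spread out than those in `data_a`.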

This kind of comparison is common in many fields.

Practical notes

Understanding standard deviation prepares you for later chapters, particularly those on the normal distribution, where it is used more formally.
