
Mean

The mean is one of the most common measures of the “center” of a set of data. In everyday language, it is often called the “average.”

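In this chapter we focus on the arithmetic mean, the mean of grouped data, the weighted mean, how to interpret the mean, how the mean changes when the data change, its sensitivity to outliers, and the notation for sample and population means.

You can assume that ideas like “data set” and “descriptive statistics” have already been introduced elsewhere.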

The arithmetic mean (the usual average)

Suppose you have $n$ numbers:
$$
x_1, x_2, x_3, \dots, x_n.
$$

The arithmetic mean (usually just called “the mean”) is defined as
$$
\bar{x} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n}.
$$

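We often write $\bar{x}$ (read “x bar”) for the mean, $n$ for the number of data values, and $x_i$ for the $i$-th value.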

Using the summation symbol $\sum$, the same formula becomes
$$
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i.
$$

This is simply: add all the values, then divide by how many there are.

Example of computing an arithmetic mean

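Data: $5,\ 7,\ 3,\ 5$.

There are $n = 4$ values, and their sum is $5 + 7 + 3 + 5 = 20$, so
$$
\bar{x} = \frac{20}{4} = 5.
$$

So the mean of this data set is $5$.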
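The definition can also be sketched in a few lines of Python. The function name `arithmetic_mean` is an illustrative choice, not something from this chapter; in practice the standard library's `statistics.mean` does the same job.

```python
def arithmetic_mean(values):
    """Add all the values, then divide by how many there are."""
    if not values:
        # The mean is undefined for an empty data set (division by zero).
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


data = [5, 7, 3, 5]           # the data from the example above
print(arithmetic_mean(data))  # 20 / 4 = 5.0
```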

Mean of grouped data (with frequencies)

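Sometimes data are given in a frequency table rather than as a full list. For example, suppose the value $2$ occurs $3$ times, the value $4$ occurs $5$ times, and the value $7$ occurs $2$ times.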

You can think of this as the data set
$$
2,2,2,4,4,4,4,4,7,7
$$
but we do not want to write them all out.

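Let $x_1, x_2, \dots, x_k$ be the distinct values and $f_1, f_2, \dots, f_k$ their frequencies, so that the total number of data values is $n = f_1 + f_2 + \dots + f_k$.

Then the mean is
$$
\bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i}.
$$

For the data above, this gives $\bar{x} = \dfrac{3\cdot 2 + 5\cdot 4 + 2\cdot 7}{3 + 5 + 2} = \dfrac{40}{10} = 4$.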
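As a rough sketch, the grouped-data mean multiplies each distinct value by its frequency, adds the products, and divides by the total frequency. The code below assumes the table is stored as a Python dictionary mapping value to frequency (a representation chosen here for illustration), with the frequencies read off from the expanded list $2,2,2,4,4,4,4,4,7,7$ above.

```python
def frequency_mean(freq_table):
    """Mean of grouped data given as {value: frequency}."""
    total_count = sum(freq_table.values())
    if total_count == 0:
        raise ValueError("the frequency table contains no data")
    # Each value contributes once per occurrence: value * frequency.
    weighted_sum = sum(value * freq for value, freq in freq_table.items())
    return weighted_sum / total_count


table = {2: 3, 4: 5, 7: 2}                 # value: frequency
expanded = [2, 2, 2, 4, 4, 4, 4, 4, 7, 7]  # the same data written out

print(frequency_mean(table))               # 40 / 10 = 4.0
print(sum(expanded) / len(expanded))       # 4.0, the same result
```

Both computations agree because the frequency table is only a compressed way of writing the full list.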

Example: mean from a frequency table

Suppose a test score table is:

Compute the mean:

You do not need to list all 7 scores to find the mean.

Weighted mean

A weighted mean is a generalization of the frequency mean. It is used when some values “count more” than others, not because they appear more often, but because they have different weights or importance.

If we have values $x_1, x_2, \dots, x_k$ with associated weights $w_1, w_2, \dots, w_k$, then the weighted mean is
$$
\bar{x}_w = \frac{\sum_{i=1}^{k} w_i x_i}{\sum_{i=1}^{k} w_i}.
$$

Here the $w_i$ do not have to be whole numbers, and they do not have to be counts. They are just nonnegative numbers, not all zero, representing how “strongly” each value should influence the mean.

Example: course grade as a weighted mean

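Suppose a course grade is calculated from three components that count for $40\%$, $25\%$, and $35\%$ of the grade, and a student scores $80$, $70$, and $90$ on them, respectively.

Let the weights be the percentages (in decimal form): $w_1 = 0.40$, $w_2 = 0.25$, $w_3 = 0.35$, with scores $x_1 = 80$, $x_2 = 70$, $x_3 = 90$.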

Then
$$
\bar{x}_w = \frac{0.40\cdot80 + 0.25\cdot70 + 0.35\cdot90}{0.40 + 0.25 + 0.35}.
$$

Since $0.40 + 0.25 + 0.35 = 1$, this simplifies to
$$
\bar{x}_w = 0.40\cdot80 + 0.25\cdot70 + 0.35\cdot90.
$$

Now compute:
$$
0.40\cdot80 = 32,\quad 0.25\cdot70 = 17.5,\quad 0.35\cdot90 = 31.5.
$$
Add:
$$
\bar{x}_w = 32 + 17.5 + 31.5 = 81.
$$

So the course grade, the weighted mean of the three scores, is $81$.

The key idea: values with higher weight pull the weighted mean more strongly toward themselves.
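The course-grade calculation can be reproduced with a small helper. This is a minimal sketch: the name `weighted_mean` and its argument order are assumptions made for illustration, and the result is compared with `math.isclose` because decimal weights such as `0.35` are not stored exactly in floating point.

```python
import math


def weighted_mean(values, weights):
    """Sum of weight * value, divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("the weights must have a positive sum")
    return sum(w * x for w, x in zip(weights, values)) / total_weight


scores = [80, 70, 90]
weights = [0.40, 0.25, 0.35]

grade = weighted_mean(scores, weights)
print(math.isclose(grade, 81))              # True

# Dividing by the sum of the weights means they need not add up to 1:
# percentages written as whole numbers give the same grade.
print(weighted_mean(scores, [40, 25, 35]))  # 8100 / 100 = 81.0
```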

Interpreting the mean

The mean has several important interpretations.

Balance point interpretation

If we imagine each data value as a point mass placed on a number line, the mean is the balance point of all these masses.

For data $x_1, \dots, x_n$, the mean $\bar{x}$ is the unique point where the “torque” (or total moment) to the left equals the torque to the right.
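One way to check the balance-point idea numerically is to add up the distances to the mean on each side. The sketch below reuses the small data set $5, 7, 3, 5$ from earlier in the chapter; the variable names are illustrative only.

```python
data = [5, 7, 3, 5]
mean = sum(data) / len(data)  # 5.0

# Total "moment" on each side: distance from the mean, one unit of mass per value.
left_moment = sum(mean - x for x in data if x < mean)
right_moment = sum(x - mean for x in data if x > mean)

print(left_moment, right_moment)   # 2.0 2.0
print(sum(x - mean for x in data)) # 0.0: the deviations cancel out
```

With data whose mean is not exactly representable in floating point, the two sides may differ by a tiny rounding error rather than match exactly.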

This is why:

Center of “typical” value

In descriptive statistics, the mean is often used as a measure of central tendency:

However, the mean alone does not tell you how spread out the data are, nor whether the data have extreme values or are skewed. Those ideas are treated in other chapters (like variance, standard deviation, and shape of distributions).

Effect of changing data on the mean

Certain operations on the data change the mean in predictable ways. These rules are very useful for mental calculations and for understanding how the mean behaves.

Let $x_1, \dots, x_n$ have mean $\bar{x}$.

Adding a constant to all data values

If we add the same number $c$ to every value to get new data
$$
y_i = x_i + c \quad (i = 1,\dots,n),
$$
then the new mean $\bar{y}$ is
$$
\bar{y} = \bar{x} + c.
$$

So adding a constant to all data points shifts the mean by that constant.
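The shift rule follows directly from the definition of the mean. Substituting $y_i = x_i + c$ and splitting the sum gives

$$
\bar{y} = \frac{1}{n} \sum_{i=1}^{n} (x_i + c) = \frac{1}{n} \sum_{i=1}^{n} x_i + \frac{1}{n} \cdot n c = \bar{x} + c.
$$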

Multiplying all data values by a constant

If we multiply every value by $c$:
$$
y_i = c x_i,
$$
then the new mean is
$$
\bar{y} = c \bar{x}.
$$

So multiplying all data points by a constant scales the mean by that constant.
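Both rules, shifting and scaling, can be checked on a small example. The sketch below uses `fractions.Fraction` so that the comparisons are exact rather than subject to floating-point rounding; the data and the constant `c = 10` are arbitrary choices for illustration.

```python
from fractions import Fraction


def mean(values):
    """Exact mean of a list of integers, as a Fraction."""
    return Fraction(sum(values), len(values))


x = [5, 7, 3, 5]
c = 10

shifted = [xi + c for xi in x]  # y_i = x_i + c
scaled = [c * xi for xi in x]   # y_i = c * x_i

print(mean(shifted) == mean(x) + c)  # True: 15 == 5 + 10
print(mean(scaled) == c * mean(x))   # True: 50 == 10 * 5
```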

Combining two data sets

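Consider two data sets: the first has $n_1$ values with mean $\bar{x}_1$, and the second has $n_2$ values with mean $\bar{x}_2$.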

If we combine them into one big data set of $n_1 + n_2$ values, the combined mean $\bar{x}$ is
$$
\bar{x} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}.
$$

This formula is often used, for example, when combining class averages or averages from different time periods.
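A short sketch of the combined-mean formula follows. The function name `combined_mean` and the two small data sets are made up for illustration; the point is that the formula needs only the sizes and means of the two groups, not the individual values.

```python
def combined_mean(n1, mean1, n2, mean2):
    """Mean of the merged data set, from the two sizes and the two means."""
    if n1 + n2 == 0:
        raise ValueError("at least one data set must be non-empty")
    return (n1 * mean1 + n2 * mean2) / (n1 + n2)


first = [3, 4, 5]  # n1 = 3, mean 4
second = [8, 10]   # n2 = 2, mean 9

combined = combined_mean(3, 4, 2, 9)
direct = sum(first + second) / len(first + second)

print(combined, direct)  # 6.0 6.0
print((4 + 9) / 2)       # 6.5: simply averaging the two means is wrong here
```

The plain average of the two means is only correct when both data sets have the same size; otherwise the larger set must count for more, which is what the factors $n_1$ and $n_2$ do.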

Sensitivity to extreme values (outliers)

The mean is sensitive to outliers—data values that are much larger or smaller than most of the rest.

Example:

By changing just one value from $5$ to $50$, the mean changed from $3$ to $12$, which no longer looks like a “typical” value in the data. This shows that a single extreme value can pull the mean far away from where most of the data lie.

Understanding this helps you interpret the mean correctly rather than taking it as a perfect description of the data.
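The effect is easy to reproduce. The data below are an assumption: the list $1, 2, 3, 4, 5$ is one data set that is consistent with the means $3$ and $12$ quoted above, chosen here only for illustration.

```python
original = [1, 2, 3, 4, 5]       # assumed data with mean 3
with_outlier = [1, 2, 3, 4, 50]  # the same data with 5 replaced by 50

mean_before = sum(original) / len(original)
mean_after = sum(with_outlier) / len(with_outlier)

print(mean_before)  # 3.0
print(mean_after)   # 12.0

# How many values lie below the new mean?
below = sum(1 for x in with_outlier if x < mean_after)
print(below, "of", len(with_outlier))  # 4 of 5
```

Four of the five values are smaller than the new mean, so $12$ describes the outlier's pull more than it describes a typical value.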

Sample mean vs population mean (notation only)

In statistics it is common to distinguish between the mean of an entire population and the mean of a sample drawn from that population.

The population mean is usually denoted by $\mu$ (the Greek letter “mu”), while the sample mean is usually denoted by $\bar{x}$.

The formulas look the same, but the interpretation differs: $\mu$ is a fixed (but often unknown) characteristic of the whole population, while $\bar{x}$ is computed from the sample and used to estimate $\mu$.

The deeper ideas of estimation and sampling are treated in other chapters; here we only note the notational difference.

Summary

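In this chapter you learned that:

- The arithmetic mean is the sum of the values divided by how many there are: $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$.
- For grouped data, each value is multiplied by its frequency before adding, and the total is divided by the sum of the frequencies.
- The weighted mean $\bar{x}_w = \frac{\sum w_i x_i}{\sum w_i}$ lets some values influence the result more than others.
- The mean is the balance point of the data on a number line.
- Adding a constant $c$ to every value shifts the mean by $c$, and multiplying every value by $c$ scales the mean by $c$.
- The mean of two combined data sets is $\frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}$.
- The mean is sensitive to outliers.
- The population mean is written $\mu$ and the sample mean is written $\bar{x}$.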

These facts form the basis for using the mean in more advanced parts of descriptive and inferential statistics.
