Inferential statistics is about using information from a part of a group (a sample) to say something about the whole group (a population), while taking uncertainty into account. It contrasts with descriptive statistics, which only summarizes the data you actually have.
In this chapter you will see the basic ideas that lie behind the more specific topics of confidence intervals and hypothesis testing. The focus here is on the general logic and structure of inference, not on all the formulas or special cases (those belong to the subchapters).
Populations, Samples, and Parameters
Inferential statistics always involves three key ideas:
- A population is the complete set you care about: all voters in a country, all manufactured bolts from a factory in a year, all students at a school, etc.
- A sample is a subset of the population that you actually observe or measure.
- A parameter is a number that describes the population, such as the true average height of all adults in a city or the true proportion of defective items from a factory.
You typically do not know the parameter. Instead, you:
- Collect a sample.
- Calculate statistics from the sample (for example, sample mean, sample proportion).
- Use these statistics to infer something about the unknown population parameter.
Statistics are known (computed from data). Parameters are unknown constants you are trying to learn about.
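The distinction can be made concrete with a small simulation. The sketch below (with an invented population of heights) treats a list as the entire population, so the parameter can be computed directly; in real inference only the sample statistic would be available.

```python
import random
import statistics

random.seed(0)

# Hypothetical population: heights (cm) of 10,000 adults (values invented).
population = [random.gauss(170, 8) for _ in range(10_000)]

# The parameter: the true population mean (normally unknown in practice).
mu = statistics.mean(population)

# The statistic: the mean of one random sample of size 50.
sample = random.sample(population, 50)
x_bar = statistics.mean(sample)

print(f"parameter mu    = {mu:.2f}")
print(f"statistic x_bar = {x_bar:.2f}")  # varies from sample to sample
```

Rerunning with a different seed would give a different `x_bar` but the same `mu`, which is exactly the statistic/parameter distinction.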
Sampling and Randomness
Not every sample is equally useful for inference. For the results to be trustworthy, how the sample was selected often matters more than its sheer size.
Two important ideas:
- Random sampling
A sample is random if every member of the population had a known (often equal) chance of being selected. This helps avoid systematic bias, where certain types of individuals are over- or under-represented.
- Independence
Observations are independent if knowing one observation gives you no information about another. Many standard methods in inferential statistics assume independence (or approximate independence). For example, if you randomly sample people without replacement from a large population, independence is approximately satisfied.
Good inference usually relies on:
- Samples that are representative of the population.
- Procedures that are as random as practical.
- Awareness of any deviations from ideal random sampling (for example, nonresponse, convenience sampling).
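To illustrate why the selection method matters, the sketch below (with invented data) compares a simple random sample to a convenience "sample" drawn from one end of a list in which the trait of interest is correlated with position, as in a list sorted by age.

```python
import random

random.seed(1)

# Hypothetical population: 0 = does not support a policy, 1 = supports it.
# Support probability rises with index (e.g., the list is sorted by age).
population = [1 if random.random() < i / 10_000 else 0 for i in range(10_000)]
true_p = sum(population) / len(population)  # close to 0.5

# Simple random sample: every member has the same chance of selection.
srs = random.sample(population, 200)

# Convenience "sample": just the first 200 entries (systematically biased).
convenience = population[:200]

print(f"true proportion:        {true_p:.2f}")
print(f"random sample estimate: {sum(srs) / 200:.2f}")
print(f"convenience estimate:   {sum(convenience) / 200:.2f}")
```

The random sample lands near the true proportion; the convenience sample is badly off, and no increase in its size would fix that.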
Sampling Variability and the Idea of a Sampling Distribution
If you took many different random samples from the same population and computed the same statistic each time (for example, the sample mean), you would not get exactly the same answer every time. The values would vary from sample to sample. This is called sampling variability.
The sampling distribution of a statistic is the distribution of its values over all possible random samples (of a given size) from the population.
Important consequences:
- A single statistic (such as one sample mean) is not equal to the parameter; it is just an estimate that varies from sample to sample.
- Inferential methods must account for this variability to avoid overconfidence in our conclusions.
Many standard inferential procedures (like those in confidence intervals and hypothesis tests) are built on approximations to sampling distributions. For example:
- Under some conditions, the sampling distribution of the sample mean is approximately normal (bell-shaped), even if the population itself is not. This is a consequence of the central limit theorem (discussed elsewhere in the course).
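A sampling distribution can be approximated by brute force: draw many samples from the same (here simulated, deliberately non-normal) population and record the statistic each time. This sketch does that for the sample mean.

```python
import random
import statistics

random.seed(2)

# A clearly non-normal population: exponential-ish waiting times, mean ~10.
population = [random.expovariate(1 / 10) for _ in range(100_000)]

def sample_mean(n):
    return statistics.mean(random.sample(population, n))

# Draw many samples of size 40 and record each sample mean.
means = [sample_mean(40) for _ in range(2_000)]

# The sampling distribution of the mean clusters around the population mean
# and is much less spread out than the population itself.
print(f"population mean      ≈ {statistics.mean(population):.2f}")
print(f"mean of sample means ≈ {statistics.mean(means):.2f}")
print(f"population sd        ≈ {statistics.stdev(population):.2f}")
print(f"sd of sample means   ≈ {statistics.stdev(means):.2f}")  # ≈ sd/sqrt(40)
```

A histogram of `means` would look roughly bell-shaped even though the population is strongly skewed, which is the central limit theorem at work.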
Point Estimates and Interval Estimates
Inferential statistics uses sample statistics in two main ways:
- Point estimate
A point estimate is a single number used as a best guess for a parameter. For example:
- The sample mean $\bar{x}$ as a point estimate of the population mean $\mu$.
- The sample proportion $\hat{p}$ as a point estimate of the population proportion $p$.
Point estimates are simple but do not show how uncertain they are.
- Interval estimate
An interval estimate gives a range of plausible values for the parameter. A key example is the confidence interval, which you will study in detail in the next subchapter. In general form, an interval estimate looks like:
$$\text{estimate} \pm \text{margin of error}.$$
The margin of error reflects the typical size of sampling variability for the statistic.
Inferential statistics is largely about constructing reasonable point and interval estimates and interpreting them correctly.
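The two kinds of estimate can be computed side by side. The sketch below uses an invented poll result and the standard-error formula for a proportion, $\sqrt{\hat{p}(1-\hat{p})/n}$, with 1.96 as the usual approximate 95% multiplier (both discussed in the sections that follow).

```python
import math

# Hypothetical poll: 540 of 1,000 respondents say "yes" (numbers invented).
n, yes = 1_000, 540

# Point estimate: a single best guess for the population proportion p.
p_hat = yes / n

# Interval estimate: estimate ± margin of error.
se = math.sqrt(p_hat * (1 - p_hat) / n)
margin = 1.96 * se
print(f"point estimate:    {p_hat:.3f}")
print(f"interval estimate: ({p_hat - margin:.3f}, {p_hat + margin:.3f})")
```

The point estimate alone (0.540) hides the uncertainty; the interval makes it explicit that values a few percentage points away are also plausible.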
Standard Error and Margin of Error
To turn a sample statistic into an interval that reflects uncertainty, we need to know how much that statistic typically varies from sample to sample. This is where the standard error comes in.
- The standard error (SE) of a statistic measures the typical size of its sampling variability. It is essentially the standard deviation of its sampling distribution.
Once you have a standard error, many interval estimates look like:
$$\text{estimate} \pm \text{(multiplier)} \times \text{SE}.$$
- The multiplier depends on how confident you want to be (for example, 95% confidence vs 99%) and on the shape of the sampling distribution (often related to the normal distribution or $t$-distribution).
- The product of the multiplier and the SE is usually called the margin of error.
Understanding standard error and margin of error is central to both confidence intervals and hypothesis tests.
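For a sample mean, the standard error is the sample standard deviation divided by $\sqrt{n}$. The sketch below (with invented measurements) computes it and shows how the multiplier controls the margin of error; the normal-based multipliers 1.96 and 2.576 are used here as approximations, with the $t$-distribution refinement left to the confidence-interval subchapter.

```python
import math
import statistics

# Hypothetical sample of 30 bolt lengths in mm (values invented).
sample = [49.8, 50.1, 50.3, 49.9, 50.0, 50.2, 49.7, 50.1, 50.4, 49.9,
          50.0, 50.2, 49.8, 50.1, 50.0, 49.9, 50.3, 50.0, 49.8, 50.2,
          50.1, 49.9, 50.0, 50.2, 49.7, 50.1, 50.0, 49.9, 50.3, 50.0]
n = len(sample)

# Standard error of the mean: sample sd divided by sqrt(n).
se = statistics.stdev(sample) / math.sqrt(n)

# Margin of error = multiplier × SE; higher confidence needs a larger
# multiplier, which widens the interval.
margin_95 = 1.96 * se
margin_99 = 2.576 * se
print(f"SE ≈ {se:.4f}, 95% margin ≈ {margin_95:.4f}, 99% margin ≈ {margin_99:.4f}")
```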
Confidence, Probability, and Uncertainty
Inferential statistics uses probability to express uncertainty. Two key ideas:
- Uncertainty is about methods, not about fixed past data
Once you have collected a sample and computed an interval, the population parameter is either inside that interval or not; it is not “random.” The probability statements in inferential statistics are usually about the procedure in the long run, not about the specific number in hand.
- Confidence level vs probability
When we say “a 95% confidence interval,” we mean:
- If we repeated the whole process (sampling and interval construction) many times, about 95% of the intervals computed this way would contain the true parameter.
- This does not mean there is a 95% chance that this particular interval contains the parameter in a literal sense. However, in practice, people often informally talk that way, so it is important to know the more precise interpretation.
Inferential statistics provides a structured way to quantify uncertainty so that conclusions are not simply “yes/no” but graded by how strongly the data support them.
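The long-run interpretation of "95% confidence" can be checked by simulation. The sketch below (with an invented population whose mean we control) repeats the whole sampling-and-interval procedure many times and counts how often the interval captures the true mean; with the approximate 1.96 multiplier the coverage comes out close to, if slightly below, 95%.

```python
import math
import random
import statistics

random.seed(3)

MU, SIGMA, N = 100, 15, 40  # true (simulated) population parameters

def one_interval():
    """Draw one sample and return an approximate 95% interval for the mean."""
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    x_bar = statistics.mean(sample)
    margin = 1.96 * statistics.stdev(sample) / math.sqrt(N)
    return x_bar - margin, x_bar + margin

# Repeat the whole procedure many times and count how often the interval
# captures the true mean: the "95%" refers to this long-run rate.
trials = 2_000
hits = sum(lo <= MU <= hi for lo, hi in (one_interval() for _ in range(trials)))
print(f"coverage ≈ {hits / trials:.3f}")
```

Each individual interval either contains `MU` or it does not; only the long-run rate is (approximately) 95%.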
The General Logic of Hypothesis Testing
Hypothesis testing is one of the main tools of inferential statistics. The details and formulas are covered in its own subchapter; here the goal is to present the basic logical structure.
A typical hypothesis test follows this pattern:
- State hypotheses about a parameter
- The null hypothesis $H_0$ usually represents “no effect,” “no difference,” or a status quo value of a parameter (for example, $\mu = 0$ or $p = 0.5$).
- The alternative hypothesis $H_1$ (or $H_a$) represents the type of difference or effect you are interested in (for example, $\mu > 0$ or $p \neq 0.5$).
- Assume the null hypothesis is true
Under that assumption, you work out (or rely on known results for) the sampling distribution of your chosen test statistic.
- Compute a test statistic from the data
A test statistic is a function of the sample that measures how far the sample is from what you would expect under $H_0$ (often measured in “standard errors”).
- Calculate a p-value
The p-value is the probability, under $H_0$, of getting a test statistic at least as extreme as the one observed. Symbolically:
$$\text{p-value} = P(\text{test statistic is as or more extreme than observed} \mid H_0\ \text{true}).$$
- Make a decision
- Compare the p-value to a pre-chosen significance level $\alpha$ (for example, $\alpha = 0.05$).
- If p-value $\le \alpha$, you reject $H_0$ (this is sometimes called a “statistically significant” result).
- If p-value $> \alpha$, you do not reject $H_0$ (you conclude that there is not enough evidence against it, given the data and the test).
Hypothesis testing does not prove that a hypothesis is true or false. Instead, it measures how compatible the observed data are with the null hypothesis, according to a specified model and significance level.
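The whole pattern can be run end to end for a simple case. The sketch below tests an invented coin-fairness example ($H_0: p = 0.5$ against $H_1: p \neq 0.5$) with a normal approximation for the test statistic; the specific counts are made up for illustration.

```python
import math
from statistics import NormalDist

# Hypothetical coin-fairness test: H0: p = 0.5, H1: p != 0.5.
# Observed: 560 heads in 1,000 flips (numbers invented for illustration).
n, heads = 1_000, 560
p0 = 0.5

# Test statistic: distance from the H0 expectation, in standard errors.
p_hat = heads / n
se = math.sqrt(p0 * (1 - p0) / n)  # SE computed under H0
z = (p_hat - p0) / se

# Two-sided p-value: probability (under H0) of a result at least this extreme.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

alpha = 0.05
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
print("reject H0" if p_value <= alpha else "do not reject H0")
```

Note that rejecting $H_0$ here says the data are hard to reconcile with a fair coin, not that the coin's bias has been "proved."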
Types of Errors in Decision Making
Since inferential statistics uses samples subject to variability, mistakes are possible even when standard methods are followed correctly. In hypothesis testing, there are two basic types of errors:
- Type I error
Rejecting a true null hypothesis.
The probability of a type I error is usually denoted by $\alpha$ and is chosen in advance (for example, 0.05).
- Type II error
Failing to reject a false null hypothesis.
The probability of a type II error is usually denoted by $\beta$, and $1 - \beta$ is called the power of the test (the probability of correctly detecting an effect when it exists).
Inferential statistics involves balancing these risks: making it unlikely that you reject $H_0$ when it is true, while still having enough power to detect meaningful differences.
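The type I error rate can be seen directly by simulation: generate data from a world where $H_0$ really is true and count how often the test rejects anyway. This sketch uses an approximate z-test on simulated normal data.

```python
import math
import random
import statistics
from statistics import NormalDist

random.seed(4)

ALPHA = 0.05
MU0, SIGMA, N = 50, 10, 30  # simulated truth: H0 (mu = MU0) holds exactly

def p_value_when_h0_true():
    """Sample from a population where the null hypothesis really is true."""
    sample = [random.gauss(MU0, SIGMA) for _ in range(N)]
    z = (statistics.mean(sample) - MU0) / (statistics.stdev(sample) / math.sqrt(N))
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Even with H0 true, roughly alpha of the tests reject it: the type I error rate.
trials = 2_000
false_rejections = sum(p_value_when_h0_true() <= ALPHA for _ in range(trials))
print(f"type I error rate ≈ {false_rejections / trials:.3f}")
```

The rate hovers near the chosen $\alpha$; it cannot be driven to zero without also destroying the test's power.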
Role of Sample Size
Sample size has a strong influence on inference:
- Larger samples generally lead to smaller standard errors.
- Smaller standard errors lead to narrower confidence intervals and more sensitive hypothesis tests.
- With very large samples, even very tiny differences from the null hypothesis can become statistically significant, which emphasizes the difference between:
- Statistical significance (unlikely to be due to chance under the model).
- Practical significance (large enough to matter in real-world terms).
Inferential statistics requires you to think both statistically and contextually when interpreting results.
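The effect of sample size follows directly from the SE formula for a mean, $\sigma/\sqrt{n}$: quadrupling $n$ halves the SE. A quick sketch (with an invented population standard deviation):

```python
import math

sigma = 20  # hypothetical population standard deviation

# SE of a sample mean shrinks like 1/sqrt(n): quadrupling n halves the SE,
# and the 95% margin of error shrinks with it.
for n in [25, 100, 400, 1_600]:
    se = sigma / math.sqrt(n)
    print(f"n = {n:5d}  SE = {se:.2f}  margin (95%) ≈ {1.96 * se:.2f}")
```

This also explains why huge samples flag tiny effects as significant: the margin of error becomes small enough that even a practically negligible difference falls outside it.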
Assumptions and Model-Based Inference
Many inferential techniques are model-based. They assume that:
- The data come from a particular kind of random process (for example, independent observations from a normal distribution).
- Certain conditions hold (for example, approximate normality, equal variances, or large enough sample size for an approximation).
If the assumptions are seriously violated, the conclusions of the inferential methods may be misleading.
Key practices include:
- Checking whether assumptions seem roughly reasonable using plots or simple diagnostics.
- Using alternative methods when assumptions appear badly violated (for example, nonparametric methods or resampling methods, which are discussed elsewhere).
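As a taste of the resampling alternative mentioned above, the sketch below applies a basic bootstrap percentile interval to an invented, visibly skewed sample: instead of assuming a normal sampling model, it resamples the data with replacement and reads the interval off the simulated distribution of the statistic. Details belong to the later material on resampling.

```python
import random
import statistics

random.seed(5)

# Hypothetical skewed sample (values invented) where normality is doubtful.
sample = [1.2, 0.4, 2.8, 0.9, 5.1, 1.7, 0.6, 3.3, 1.1, 7.9,
          0.8, 2.2, 1.5, 4.0, 0.7, 1.9, 6.2, 1.3, 2.6, 0.5]

# Bootstrap: resample the data (with replacement) many times and observe
# how the statistic varies across resamples.
boot_means = [
    statistics.mean(random.choices(sample, k=len(sample)))
    for _ in range(5_000)
]
boot_means.sort()

# A simple 95% percentile interval from the bootstrap distribution.
lo = boot_means[int(0.025 * len(boot_means))]
hi = boot_means[int(0.975 * len(boot_means))]
print(f"bootstrap 95% interval for the mean: ({lo:.2f}, {hi:.2f})")
```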
Putting It Together
Inferential statistics provides a framework to answer questions such as:
- “What range of values for the true average test score is consistent with what we observed?”
- “Is there evidence that a new medication changes recovery times compared to the old one?”
- “Based on a sample, what can we say about the proportion of defective products produced by a machine?”
The central pieces you have seen are:
- Distinguishing populations, samples, parameters, and statistics.
- Recognizing sampling variability and the role of sampling distributions.
- Using point estimates and interval estimates (with standard error and margin of error).
- Understanding the logic of hypothesis testing, p-values, and error types.
- Appreciating the importance of assumptions, sample size, and the difference between statistical and practical significance.
The subchapters on confidence intervals and hypothesis testing will build directly on these ideas and show how to apply them in specific, commonly used methods.