Confidence intervals

A confidence interval is a way to use sample data to give a range of plausible values for an unknown population parameter, such as a mean or a proportion. Interval estimation is one of the two main tasks of inferential statistics (the other is hypothesis testing): instead of giving only a single estimate, we report an interval together with a confidence level.

In this chapter, we focus on what is specific to confidence intervals: what they mean, how they are built in simple common cases, and how to interpret them correctly.

The idea of a confidence interval

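Suppose you want to estimate the average height of all students in a large school (the population mean $\mu$), but you only measure the heights of a small group (a sample). From the sample, you can compute:

  • the sample mean $\bar{x}$, which serves as a point estimate of $\mu$;
  • a measure of how much $\bar{x}$ would vary from one random sample to another (its standard error).

Because a different sample would give a different $\bar{x}$, we surround the point estimate with a margin of error.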

A typical confidence interval has the form
$$
\text{estimate} \;\pm\; \text{margin of error}.
$$

For a mean, you might see:
$$
\bar{x} \;\pm\; \text{(critical value)} \times \text{(standard error)}.
$$

The standard error measures the typical size of sampling fluctuations in your estimate. The critical value depends on the confidence level (for example, $95\%$) and on the sampling distribution (like the normal or $t$-distribution).

The result is a range:
$$
[\text{lower bound},\ \text{upper bound}],
$$
which we report along with the confidence level.

Confidence level and its meaning

A confidence level (such as $90\%$, $95\%$, or $99\%$) describes the long-run performance of the method, not a probability that the specific interval you computed is correct.

Imagine you repeatedly:

  1. Draw a random sample from the same population.
  2. Compute a confidence interval using the same procedure and same confidence level.

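Then, for a $95\%$ confidence level, about $95\%$ of the intervals produced this way would contain the true parameter, and about $5\%$ would miss it.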

Once you have computed one particular interval from your data, it is a fixed range and the true parameter is a fixed (unknown) number, so the parameter is either inside the interval or not; nothing random remains. The confidence level describes the reliability of the procedure, not a probability attached to this one specific interval.
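The repeated-sampling description above can be checked with a small simulation. This is a sketch in Python (standard library only) under assumed settings: a normal population with mean $\mu = 50$ and known $\sigma = 10$, samples of size $n = 25$, and the known-$\sigma$ interval $\bar{x} \pm z^* \sigma/\sqrt{n}$ discussed later in this chapter. All of these values, and the function name `simulate_coverage`, are arbitrary choices for illustration.

```python
import random
import statistics
from statistics import NormalDist


def simulate_coverage(mu=50.0, sigma=10.0, n=25, level=0.95, reps=2000, seed=1):
    """Fraction of repeated z-intervals that contain the true mean mu.

    Assumes a normal population with known sigma (illustrative values).
    """
    rng = random.Random(seed)                       # seeded so runs are reproducible
    z_star = NormalDist().inv_cdf((1 + level) / 2)  # two-sided critical value
    se = sigma / n ** 0.5                           # standard error of the sample mean
    hits = 0
    for _ in range(reps):
        # Step 1: draw a fresh random sample from the same population.
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.fmean(sample)
        # Step 2: build the interval with the same procedure and level.
        lower, upper = xbar - z_star * se, xbar + z_star * se
        if lower <= mu <= upper:
            hits += 1
    return hits / reps


print(simulate_coverage())             # near 0.95
print(simulate_coverage(level=0.90))   # near 0.90
```

The printed fractions should land near the nominal levels but not exactly on them, because each run uses only a finite number of repetitions.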

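A common, but technically incorrect, wording is: "There is a $95\%$ probability that the true mean lies in this interval." A more accurate wording is: "We are $95\%$ confident that this interval contains the true mean," where "$95\%$ confident" refers to the long-run success rate of the method.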

Structure of a confidence interval

Most basic confidence intervals follow this pattern:
$$
\text{estimate} \;\pm\; (\text{critical value}) \times (\text{standard error}).
$$

The more variable your estimate is (larger standard error), the wider your interval. A higher confidence level also leads to a larger critical value, and therefore a wider interval.

Confidence interval for a population mean (large-sample / known $\sigma$ case)

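We start with a basic, idealized case that illustrates the structure clearly. Assume:

  • the data are a random sample of size $n$ from the population, with sample mean $\bar{x}$;
  • the population standard deviation $\sigma$ is known;
  • either the population is normally distributed or $n$ is large enough that the sampling distribution of $\bar{x}$ is approximately normal.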

Then the standard error of the sample mean is
$$
\text{SE}(\bar{x}) = \frac{\sigma}{\sqrt{n}}.
$$
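As an informal check of this formula, the sketch below simulates many samples from an assumed normal population (mean $0$, $\sigma = 10$, both chosen arbitrarily) and compares the spread of the resulting sample means with $\sigma/\sqrt{n}$. The function name `sd_of_sample_means` is hypothetical, introduced only for this illustration.

```python
import random
import statistics
from math import sqrt


def sd_of_sample_means(sigma=10.0, n=25, reps=5000, seed=2):
    """Simulate many samples of size n and return the SD of their means.

    Assumes a normal population with mean 0 (illustrative choice).
    """
    rng = random.Random(seed)  # seeded for reproducibility
    means = []
    for _ in range(reps):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return statistics.stdev(means)


sigma, n = 10.0, 25
print("simulated SD of x-bar:", round(sd_of_sample_means(sigma, n), 3))
print("formula sigma/sqrt(n):", sigma / sqrt(n))  # 2.0
```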

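For a confidence level such as $95\%$, there is a corresponding $z$-critical value $z^*$ such that the central $95\%$ of the standard normal distribution lies between $-z^*$ and $z^*$:
$$
P(-z^* \le Z \le z^*) = 0.95, \qquad Z \sim N(0, 1).
$$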

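Common $z^*$ values:

  • $90\%$ confidence: $z^* \approx 1.645$
  • $95\%$ confidence: $z^* \approx 1.96$
  • $99\%$ confidence: $z^* \approx 2.576$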
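Critical values can also be computed rather than looked up. A minimal sketch, assuming Python 3.8+ for `statistics.NormalDist`: a two-sided confidence level $C$ leaves $(1-C)/2$ in each tail, so $z^*$ is the $(1+C)/2$ quantile of the standard normal distribution. The helper name `z_star` is our own, not part of any library.

```python
from statistics import NormalDist


def z_star(level):
    """Two-sided z critical value: the middle `level` of N(0, 1) lies in [-z*, z*]."""
    if not 0 < level < 1:
        raise ValueError("level must be strictly between 0 and 1")
    # Leave (1 - level) / 2 in each tail, so z* is the (1 + level) / 2 quantile.
    return NormalDist().inv_cdf((1 + level) / 2)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z* = {z_star(level):.3f}")
# 90%: z* = 1.645
# 95%: z* = 1.960
# 99%: z* = 2.576
```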

The confidence interval for $\mu$ is then
$$
\bar{x} \;\pm\; z^* \cdot \frac{\sigma}{\sqrt{n}}.
$$
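Translated directly into code, the known-$\sigma$ interval might look like the sketch below. The function name `mean_ci_known_sigma` and the inputs ($\bar{x} = 100$, $\sigma = 15$, $n = 36$) are hypothetical, chosen so that the standard error is a round $15/\sqrt{36} = 2.5$.

```python
from math import sqrt
from statistics import NormalDist


def mean_ci_known_sigma(xbar, sigma, n, level=0.95):
    """z-interval for a population mean when sigma is known."""
    z = NormalDist().inv_cdf((1 + level) / 2)  # critical value z*
    se = sigma / sqrt(n)                       # standard error of x-bar
    margin = z * se                            # margin of error
    return xbar - margin, xbar + margin


lower, upper = mean_ci_known_sigma(xbar=100, sigma=15, n=36)
print(f"95% CI: [{lower:.2f}, {upper:.2f}]")  # 95% CI: [95.10, 104.90]
```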

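The two endpoints are:
$$
\text{lower bound} = \bar{x} - z^* \cdot \frac{\sigma}{\sqrt{n}}, \qquad \text{upper bound} = \bar{x} + z^* \cdot \frac{\sigma}{\sqrt{n}}.
$$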

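This formula shows that:

  • the interval is centered at the sample mean $\bar{x}$;
  • the margin of error $z^* \cdot \sigma/\sqrt{n}$ grows with the confidence level (through $z^*$) and with the population variability $\sigma$;
  • the margin of error shrinks as the sample size $n$ increases.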

In practice, $\sigma$ is usually unknown; then we typically replace it with the sample standard deviation $s$ and take the critical value from a $t$-distribution with $n - 1$ degrees of freedom instead of the standard normal. Detailed work with the $t$-distribution belongs to another chapter on distributions; here we focus on the idea that the critical value changes depending on the distribution used.

Confidence interval for a population proportion

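Now consider estimating a population proportion $p$ (for example, the proportion of people in a city who support a certain policy). Suppose:

  • you take a random sample of size $n$;
  • $x$ of the sampled individuals have the characteristic of interest;
  • the sample proportion is $\hat{p} = \dfrac{x}{n}$, which serves as the point estimate of $p$.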

For a sufficiently large sample size (so that the sampling distribution of $\hat{p}$ is approximately normal), the standard error of $\hat{p}$ is
$$
\text{SE}(\hat{p}) \approx \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}.
$$

For a confidence level such as $95\%$, we again use a $z$-critical value $z^*$, as in the mean case. The confidence interval for $p$ is
$$
\hat{p} \;\pm\; z^* \cdot \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}.
$$
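A corresponding sketch for a proportion follows; this normal-approximation formula is often called the Wald interval. The function name `proportion_ci` and the example counts ($520$ successes out of $n = 1000$) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, level=0.95):
    """Normal-approximation (Wald) interval for a population proportion."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p_hat = successes / n                          # point estimate
    z = NormalDist().inv_cdf((1 + level) / 2)      # critical value z*
    se = sqrt(p_hat * (1 - p_hat) / n)             # estimated standard error
    margin = z * se
    return p_hat - margin, p_hat + margin


lower, upper = proportion_ci(520, 1000)
print(f"[{lower:.3f}, {upper:.3f}]")  # [0.489, 0.551]
```

The sketch applies the formula exactly as written, so for $\hat{p}$ close to $0$ or $1$ it can return bounds below $0$ or above $1$; it does not clip them.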

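As before:

  • the interval is centered at the point estimate $\hat{p}$;
  • the margin of error is the critical value times the standard error, $z^* \cdot \sqrt{\hat{p}(1-\hat{p})/n}$;
  • the endpoints are $\hat{p} - z^* \cdot \sqrt{\hat{p}(1-\hat{p})/n}$ and $\hat{p} + z^* \cdot \sqrt{\hat{p}(1-\hat{p})/n}$.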

This is one of the most common types of confidence intervals in introductory statistics, especially in survey results.

How sample size and confidence level affect width

Three key factors influence the width of a confidence interval:

  1. Sample size $n$
    The standard error usually contains a factor $\dfrac{1}{\sqrt{n}}$, so:
    • Larger $n$ → smaller standard error → narrower interval.
    • Smaller $n$ → larger standard error → wider interval.
  2. Confidence level
    The critical value (such as $z^*$) increases with the confidence level:
    • Higher confidence (e.g., $99\%$ instead of $95\%$) → larger critical value → wider interval.
    • Lower confidence (e.g., $90\%$ instead of $95\%$) → smaller critical value → narrower interval.

    There is a trade-off: at a fixed sample size, more confidence means less precision (a wider interval), and more precision (a narrower interval) means less confidence.
  3. Variability in the data
    The population standard deviation $\sigma$, or the estimated variability (like $\hat{p}(1-\hat{p})$ for proportions), affects the standard error:
    • More variability → larger standard error → wider interval.
    • Less variability → smaller standard error → narrower interval.
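The three factors can be seen side by side by computing the margin of error $z^* \cdot \sigma/\sqrt{n}$ for a few settings. This sketch uses the known-$\sigma$ mean case from earlier in the chapter; the helper name `margin_of_error` and the values ($\sigma = 10$ or $20$, $n = 100$ or $400$) are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(sigma, n, level):
    """Margin of error z* * sigma / sqrt(n) for the known-sigma mean interval."""
    z = NormalDist().inv_cdf((1 + level) / 2)
    return z * sigma / sqrt(n)


# 1. Sample size: quadrupling n halves the margin (the 1/sqrt(n) factor).
print(round(margin_of_error(10, 100, 0.95), 3))  # 1.96
print(round(margin_of_error(10, 400, 0.95), 3))  # 0.98

# 2. Confidence level: higher level -> larger z* -> wider interval.
for level in (0.90, 0.95, 0.99):
    print(level, round(margin_of_error(10, 100, level), 3))

# 3. Variability: doubling sigma doubles the margin.
print(round(margin_of_error(20, 100, 0.95), 3))  # 3.92
```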

These relationships are crucial when planning studies: to achieve a certain margin of error at a chosen confidence level, you can determine what sample size you need.
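Solving $E = z^* \cdot \sigma/\sqrt{n}$ for $n$ gives $n = (z^* \sigma / E)^2$, rounded up to a whole number. For a proportion, the analogous formula is $n = (z^*)^2\, p(1-p)/E^2$; since $p$ is unknown before the study, a common conservative choice is the guess $p = 0.5$, which makes $p(1-p)$ as large as possible. The sketch below assumes these textbook formulas; the function names and example targets are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def z_star(level):
    """Two-sided standard normal critical value."""
    return NormalDist().inv_cdf((1 + level) / 2)


def n_for_mean(sigma, margin, level=0.95):
    """Smallest n with z* * sigma / sqrt(n) <= margin (known sigma)."""
    return ceil((z_star(level) * sigma / margin) ** 2)


def n_for_proportion(margin, level=0.95, p_guess=0.5):
    """Sample size for a proportion; p_guess = 0.5 is the conservative default."""
    return ceil(z_star(level) ** 2 * p_guess * (1 - p_guess) / margin ** 2)


print(n_for_mean(sigma=15, margin=3))  # 97
print(n_for_proportion(margin=0.03))   # 1068
```

Rounding up, rather than to the nearest integer, guarantees that the achieved margin of error is no larger than the target.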

Assumptions behind confidence intervals

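Confidence intervals are not magic; they rely on assumptions about how the data were collected and which distributions are appropriate. Typical assumptions for simple intervals are:

  • the data come from a random sample (or a randomized experiment), so that the observations are independent;
  • the sampling distribution of the estimate is approximately normal, which holds if the population is normal or the sample size is large enough;
  • for proportions, the sample contains enough observations in each category (a common rule of thumb asks for at least about $10$ successes and $10$ failures).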

If these assumptions are badly violated, the advertised confidence level (e.g., $95\%$) might no longer be accurate.
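One way to see a violated assumption is to compute the actual coverage of the proportion interval when the normal approximation is poor. The sketch below sums binomial probabilities over every possible count $x$, so it gives the exact coverage of $\hat{p} \pm z^* \sqrt{\hat{p}(1-\hat{p})/n}$ rather than a simulated estimate. The two scenarios ($n = 1000$, $p = 0.5$ versus $n = 20$, $p = 0.05$) and the function name `wald_coverage` are illustrative assumptions.

```python
from math import comb, sqrt
from statistics import NormalDist


def wald_coverage(n, p, level=0.95):
    """Exact coverage of the normal-approximation proportion interval.

    Adds up P(X = x) for every count x whose interval contains the true p.
    """
    z = NormalDist().inv_cdf((1 + level) / 2)
    total = 0.0
    for x in range(n + 1):
        p_hat = x / n
        margin = z * sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - margin <= p <= p_hat + margin:
            total += comb(n, x) * p ** x * (1 - p) ** (n - x)  # binomial P(X = x)
    return total


print(round(wald_coverage(n=1000, p=0.5), 3))  # close to 0.95
print(round(wald_coverage(n=20, p=0.05), 3))   # well below 0.95
```

In the second scenario the expected number of successes is only $np = 1$. A sample with $x = 0$ gives $\hat{p} = 0$ and a zero-width interval that cannot contain $p$; that alone happens with probability $0.95^{20} \approx 0.36$, so the coverage falls far below the advertised $95\%$.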

Correct and incorrect interpretations

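Interpreting confidence intervals accurately is essential. Some common points:

  • Correct: "We are $95\%$ confident that the interval from $L$ to $U$ contains the true parameter," meaning the method captures the parameter in about $95\%$ of repeated samples.
  • Incorrect: "There is a $95\%$ probability that the true parameter lies between $L$ and $U$." For a computed interval, the parameter either is or is not inside it.
  • Incorrect: "$95\%$ of the individual data values lie in the interval." A confidence interval describes a population parameter, not the spread of individual observations.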

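Some further interpretive points:

  • A narrow interval indicates a precise estimate; a wide interval signals substantial uncertainty.
  • Values inside the interval are plausible values for the parameter; values outside it are less plausible given the data, but not impossible.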

Confidence intervals should be reported alongside the point estimate, because they convey the uncertainty as well as the central value.

Using confidence intervals in practice

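In real applications, confidence intervals are used to express uncertainty in many settings, for example the margin of error reported with opinion polls and surveys, estimated averages in scientific measurements, and estimated effects or rates in medical studies.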

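When reporting results, it is common to write something like (for a hypothetical poll): "An estimated $52\%$ of respondents support the policy ($95\%$ confidence interval: $49\%$ to $55\%$)."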

Such statements indicate both the estimate and its precision, helping others evaluate how reliable the estimate is.

Summary

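In this chapter, we have focused on what is distinctive to confidence intervals in inferential statistics:

  • a confidence interval reports a range of plausible values for a parameter together with a confidence level;
  • the confidence level describes the long-run success rate of the procedure, not the probability that one computed interval is correct;
  • basic intervals have the form estimate $\pm$ (critical value) $\times$ (standard error), as in the intervals for a mean with known $\sigma$ and for a proportion;
  • the width of an interval depends on the sample size, the confidence level, and the variability in the data;
  • the advertised confidence level is only reliable when the underlying assumptions hold.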

More advanced chapters can extend these ideas to other parameters, more complicated data structures, and alternative interval-construction methods.
