
Inferential Statistics

Inferential statistics is about using information from a part of a group (a sample) to say something about the whole group (a population), while taking uncertainty into account. It contrasts with descriptive statistics, which only summarizes the data you actually have.

In this chapter you will see the basic ideas that lie behind the more specific topics of confidence intervals and hypothesis testing. The focus here is on the general logic and structure of inference, not on all the formulas or special cases (those belong to the subchapters).

Populations, Samples, and Parameters

Inferential statistics always involves three key ideas:

  1. Population
    The entire group you want to draw conclusions about.
  2. Sample
    The subset of the population you actually observe.
  3. Parameter and statistic
    A parameter is a numerical summary of the population (for example, the population mean $\mu$); a statistic is the corresponding summary computed from the sample (for example, the sample mean $\bar{x}$).

You typically do not know the parameter. Instead, you:

  1. collect a sample from the population,
  2. compute a statistic from the sample, and
  3. use that statistic to draw conclusions about the parameter.

Statistics are known (computed from data). Parameters are unknown constants you are trying to learn about.

Sampling and Randomness

Not every sample is equally useful for inference. To trust the results, the method of selecting the sample matters more than its sheer size.

Two important ideas:

  1. Random sampling
    A sample is random if every member of the population had a known (often equal) chance of being selected. This helps avoid systematic bias, where certain types of individuals are over- or under-represented.
  2. Independence
    Observations are independent if knowing one observation gives you no information about another. Many standard methods in inferential statistics assume independence (or approximate independence). For example, if you randomly sample people without replacement from a large population, independence is approximately satisfied.
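The two ideas above can be sketched in a few lines of code. This is an illustrative example with a made-up population of ID numbers, not part of any real study:

```python
import random

# Simple random sampling: every member of the (hypothetical) population
# has an equal chance of being selected.
population = list(range(1, 101))          # a made-up population of 100 IDs

random.seed(42)                            # fixed seed for reproducibility
sample = random.sample(population, k=10)   # sampling without replacement

# For a small sample drawn without replacement from a much larger
# population, the observations are approximately independent.
print(len(sample), len(set(sample)))       # 10 distinct members
```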

Good inference usually relies on:

  • a sampling method that is random (or as close to random as is practical), and
  • observations that can reasonably be treated as independent.

Sampling Variability and the Idea of a Sampling Distribution

If you took many different random samples from the same population and computed the same statistic each time (for example, the sample mean), you would not get exactly the same answer every time. The values would vary from sample to sample. This is called sampling variability.

The sampling distribution of a statistic is the distribution of its values over all possible random samples (of a given size) from the population.

Important consequences:

  • A statistic computed from one sample is a single draw from its sampling distribution, so it carries uncertainty.
  • The spread of the sampling distribution tells you how far the statistic typically falls from the parameter it estimates.

Many standard inferential procedures (like those in confidence intervals and hypothesis tests) are built on approximations to sampling distributions. For example:

  • By the central limit theorem, the sampling distribution of the sample mean is approximately normal for large samples, even when the population itself is not normal.
  • The sampling distribution of a sample proportion is approximately normal when the sample is large enough.
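For instance, the sampling distribution of the sample mean can be simulated directly. The following sketch uses a synthetic population (all values are made up for illustration):

```python
import random
import statistics

random.seed(0)
# A hypothetical population with mean ~50 and standard deviation ~10.
population = [random.gauss(50, 10) for _ in range(100_000)]
pop_mean = statistics.mean(population)

# Draw many samples of size n and record the sample mean each time.
n = 25
sample_means = [
    statistics.mean(random.sample(population, n)) for _ in range(2_000)
]

# The sample means cluster around the population mean, and their spread
# (the standard error) is roughly sigma / sqrt(n) = 10 / sqrt(25) = 2.
print(round(statistics.mean(sample_means), 1))
print(round(statistics.stdev(sample_means), 2))
```

The histogram of `sample_means` would look approximately normal, which is exactly what the central limit theorem predicts.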

Point Estimates and Interval Estimates

Inferential statistics uses sample statistics in two main ways:

  1. Point estimate
    A point estimate is a single number used as a best guess for a parameter. For example:
    • The sample mean $\bar{x}$ as a point estimate of the population mean $\mu$.
    • The sample proportion $\hat{p}$ as a point estimate of the population proportion $p$.

Point estimates are simple but do not show how uncertain they are.

  2. Interval estimate
    An interval estimate gives a range of plausible values for the parameter. A key example is the confidence interval, which you will study in detail in the next subchapter. In general form, an interval estimate looks like:
    $$\text{estimate} \pm \text{margin of error}.$$

The margin of error reflects the typical size of sampling variability for the statistic.

Inferential statistics is largely about constructing reasonable point and interval estimates and interpreting them correctly.
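As a small sketch of point estimates (the data values below are made up for illustration):

```python
import statistics

# Point estimate of a population mean from a small sample of heights (cm).
heights = [170, 165, 180, 175, 168, 172, 177, 169]
xbar = statistics.mean(heights)       # estimate of the population mean mu

# Point estimate of a population proportion from hypothetical survey answers.
answers = [1, 0, 1, 1, 0, 1, 0, 1]    # 1 = "yes"
p_hat = sum(answers) / len(answers)   # estimate of the population proportion p

print(xbar, p_hat)                    # single numbers, no indication of uncertainty
```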

Standard Error and Margin of Error

To turn a sample statistic into an interval that reflects uncertainty, we need to know how much that statistic typically varies from sample to sample. This is where the standard error comes in: the standard error (SE) of a statistic is the standard deviation of its sampling distribution, usually estimated from the data. For the sample mean, for example, $SE = s/\sqrt{n}$, where $s$ is the sample standard deviation and $n$ is the sample size.

Once you have a standard error, many interval estimates look like:
$$\text{estimate} \pm \text{(multiplier)} \times \text{SE}.$$

Understanding standard error and margin of error is central to both confidence intervals and hypothesis tests.
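Putting the pieces together, a rough 95% interval for a mean can be sketched as follows (the heights are made-up values, and the multiplier 1.96 comes from the normal approximation):

```python
import math
import statistics

heights = [170, 165, 180, 175, 168, 172, 177, 169]   # illustrative data (cm)
n = len(heights)
xbar = statistics.mean(heights)        # point estimate
s = statistics.stdev(heights)          # sample standard deviation

se = s / math.sqrt(n)                  # standard error of the sample mean
margin = 1.96 * se                     # multiplier 1.96 for ~95% coverage

print(f"{xbar:.1f} +/- {margin:.1f}")  # estimate +/- margin of error
```

For small samples a larger multiplier from the t distribution would normally be used; that refinement belongs to the confidence-interval subchapter.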

Confidence, Probability, and Uncertainty

Inferential statistics uses probability to express uncertainty. Two key ideas:

  1. Uncertainty is about methods, not about fixed past data
    Once you have collected a sample and computed an interval, the population parameter is either inside that interval or not; it is not “random.” The probability statements in inferential statistics are usually about the procedure in the long run, not about the specific number in hand.
  2. Confidence level vs probability
    When we say “a 95% confidence interval,” we mean:
    • If we repeated the whole process (sampling and interval construction) many times, about 95% of the intervals computed this way would contain the true parameter.
    • This does not mean there is a 95% chance that this particular interval contains the parameter in a literal sense. However, in practice, people often informally talk that way, so it is important to know the more precise interpretation.
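The long-run interpretation can be checked by simulation. This sketch repeats the whole process (sampling plus interval construction) many times for a known, made-up population and counts how often the interval covers the true mean:

```python
import math
import random
import statistics

random.seed(1)
mu, sigma, n = 100, 15, 40     # hypothetical population parameters
reps = 2_000
covered = 0

for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    lo, hi = xbar - 1.96 * se, xbar + 1.96 * se
    if lo <= mu <= hi:              # does this interval contain the truth?
        covered += 1

print(covered / reps)               # close to 0.95
```

Each individual interval either contains $\mu$ or it does not; the 95% refers to the long-run fraction of intervals that do.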

Inferential statistics provides a structured way to quantify uncertainty so that conclusions are not simply “yes/no” but graded by how strongly the data support them.

The General Logic of Hypothesis Testing

Hypothesis testing is one of the main tools of inferential statistics. The details and formulas are covered in its own subchapter; here the goal is to present the basic logical structure.

A typical hypothesis test follows this pattern:

  1. State hypotheses about a parameter
    • The null hypothesis $H_0$ usually represents “no effect,” “no difference,” or a status quo value of a parameter (for example, $\mu = 0$ or $p = 0.5$).
    • The alternative hypothesis $H_1$ (or $H_a$) represents the type of difference or effect you are interested in (for example, $\mu > 0$ or $p \neq 0.5$).
  2. Assume the null hypothesis is true
    Under that assumption, you work out (or rely on known results for) the sampling distribution of your chosen test statistic.
  3. Compute a test statistic from the data
    A test statistic is a function of the sample that measures how far the sample is from what you would expect under $H_0$ (often measured in “standard errors”).
  4. Calculate a p-value
    The p-value is the probability, under $H_0$, of getting a test statistic at least as extreme as the one observed. Symbolically:
    $$\text{p-value} = P(\text{test statistic is as or more extreme than observed} \mid H_0\ \text{true}).$$
  5. Make a decision
    • Compare the p-value to a pre-chosen significance level $\alpha$ (for example, $\alpha = 0.05$).
    • If p-value $\le \alpha$, you reject $H_0$ (this is sometimes called a “statistically significant” result).
    • If p-value $> \alpha$, you do not reject $H_0$ (you conclude that there is not enough evidence against it, given the data and the test).

Hypothesis testing does not prove that a hypothesis is true or false. Instead, it measures how compatible the observed data are with the null hypothesis, according to a specified model and significance level.
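The five steps above can be sketched for a one-sample test of $H_0: \mu = 50$ using a normal approximation (the data values are invented for illustration; a t distribution would be more precise for a sample this small):

```python
import math
import statistics

data = [52.1, 49.8, 53.4, 51.0, 50.5, 52.8, 48.9, 51.7, 52.3, 50.9]
mu0 = 50.0                                   # step 1: value of mu under H0

n = len(data)
xbar = statistics.mean(data)
se = statistics.stdev(data) / math.sqrt(n)
z = (xbar - mu0) / se                        # steps 2-3: distance from H0 in SEs

# Step 4: two-sided p-value from the standard normal distribution,
# using 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
p_value = math.erfc(abs(z) / math.sqrt(2))

# Step 5: compare with a pre-chosen significance level.
alpha = 0.05
print(round(z, 2), round(p_value, 4), p_value <= alpha)
```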

Types of Errors in Decision Making

Since inferential statistics uses samples subject to variability, mistakes are possible even when standard methods are followed correctly. In hypothesis testing, there are two basic types of errors:

  1. Type I error
    Rejecting $H_0$ when it is actually true. With a significance level of $\alpha$, the probability of a Type I error is $\alpha$ when $H_0$ holds.
  2. Type II error
    Failing to reject $H_0$ when it is actually false. The probability of correctly rejecting a false $H_0$ is called the power of the test.

Inferential statistics involves balancing these risks: making it unlikely that you reject $H_0$ when it is true, while still having enough power to detect meaningful differences.
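The Type I error rate can also be checked by simulation: when $H_0$ is true, a test at $\alpha = 0.05$ should reject in roughly 5% of samples. A sketch with made-up parameters:

```python
import math
import random
import statistics

random.seed(7)
mu0, sigma, n, alpha = 50.0, 5.0, 30, 0.05
reps = 4_000
rejections = 0

for _ in range(reps):
    sample = [random.gauss(mu0, sigma) for _ in range(n)]   # H0 is true here
    se = statistics.stdev(sample) / math.sqrt(n)
    z = (statistics.mean(sample) - mu0) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))
    if p_value <= alpha:                                    # a Type I error
        rejections += 1

print(rejections / reps)   # near alpha = 0.05
```

The normal approximation with an estimated standard deviation makes the simulated rate land slightly above 5% for moderate n; a t-based test would match $\alpha$ more closely.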

Role of Sample Size

Sample size has a strong influence on inference:

  • Larger samples give smaller standard errors, and therefore narrower confidence intervals and more powerful tests.
  • With very large samples, even tiny differences of no practical importance can be statistically significant.
  • With small samples, estimates are imprecise and real effects may go undetected.

Inferential statistics requires you to think both statistically and contextually when interpreting results.
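The first point follows directly from the formula $SE = \sigma/\sqrt{n}$, which this tiny sketch (with an arbitrary $\sigma = 10$) makes concrete:

```python
import math

# The standard error of the mean shrinks like 1 / sqrt(n):
# quadrupling the sample size halves the SE.
sigma = 10.0
for n in [25, 100, 400]:
    print(n, sigma / math.sqrt(n))   # 25 2.0, 100 1.0, 400 0.5
```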

Assumptions and Model-Based Inference

Many inferential techniques are model-based. They assume that:

  • the observations were produced by a random process of a particular form (for example, independent draws from some population or distribution), and
  • the sampling distribution of the statistic can be described or approximated (for example, by a normal distribution).

If the assumptions are seriously violated, the conclusions of the inferential methods may be misleading.

Key practices include:

  • checking assumptions where possible (for example, with plots or diagnostics),
  • being cautious in interpretation when assumptions are doubtful, and
  • choosing methods whose assumptions match how the data were actually collected.

Putting It Together

Inferential statistics provides a framework to answer questions such as:

  • What is a plausible range of values for a population mean or proportion?
  • Is an observed difference or effect real, or could it be explained by sampling variability alone?

The central pieces you have seen are:

  • populations, samples, parameters, and statistics,
  • sampling variability and sampling distributions,
  • point and interval estimates, standard errors, and margins of error, and
  • the logic of hypothesis testing, p-values, and the two types of errors.

The subchapters on confidence intervals and hypothesis testing will build directly on these ideas and show how to apply them in specific, commonly used methods.
