13 Probability and Statistics

Why Study Probability and Statistics?

Probability and statistics provide tools to understand and work with uncertainty and data. Many everyday situations do not have guaranteed outcomes: weather forecasts, games of chance, medical test results, traffic, stock prices, and even online product recommendations all involve uncertainty. Probability gives a language and framework to describe and analyze this uncertainty. Statistics uses data to learn, to make decisions, and to evaluate how reliable those decisions are.

In this chapter you will not yet go into technical details (these appear in later chapters such as “Probability Basics” or “Descriptive Statistics”). Instead, you will get a broad picture of what this subject is about, what kinds of questions it tries to answer, and how its two main parts—probability and statistics—relate to each other.

The Two Sides: Probability vs. Statistics

Probability and statistics are closely connected but not the same thing. They answer different types of questions.

Probability starts from a model of how random events behave and uses it to predict chances of outcomes.
Statistics starts from data and uses it to learn something about the world, often about an underlying probability model.

A rough way to distinguish:

Probability: “Given the rules, what is likely to happen?”
Statistics: “Given what happened, what can we say about the rules?”

Probability: From Rules to Outcomes

In probability, you assume you know how a random process works (or you choose a model for it), and then you calculate chances.

Typical probability questions:

If a fair coin is flipped 10 times, what is the chance of getting exactly 6 heads?
If the chance of rain tomorrow is $40\%$, what is the chance it will not rain?
In a simple lottery, what is the probability your ticket wins?

Here, the “rules” might be:

A fair coin: probability of heads is $0.5$ and tails is $0.5$.
A six-sided die: each face $1,2,3,4,5,6$ is equally likely with probability $1/6$.
A specified lottery mechanism: for example, drawing balls without replacement.

Once these rules (or assumptions) are in place, probability theory tells you how to combine them to find the chance of different outcomes or combinations of events.
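
As a small illustration of going from rules to outcomes, the sketch below works out two of the questions listed above. It assumes the flips are independent and the coin is fair; the function name `prob_exact_heads` is introduced here for illustration only.

```python
from math import comb

def prob_exact_heads(n_flips: int, k_heads: int, p_heads: float = 0.5) -> float:
    """Chance of exactly k_heads in n_flips independent flips (binomial formula)."""
    return comb(n_flips, k_heads) * p_heads**k_heads * (1 - p_heads)**(n_flips - k_heads)

# Rule: fair coin, 10 independent flips. Question: exactly 6 heads?
p_six_heads = prob_exact_heads(10, 6)   # equals 210 / 1024

# Rule: chance of rain is 40%. Question: chance of no rain (the complement)?
p_no_rain = 1 - 0.40

print(f"P(exactly 6 heads in 10 flips) = {p_six_heads:.4f}")
print(f"P(no rain) = {p_no_rain:.2f}")
```

Both answers follow purely from the assumed rules; no data are involved.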

Statistics: From Outcomes to Rules

In statistics, you start with data that has already been collected and try to learn something about the process that produced it.

Typical statistical questions:

After weighing a sample of apples from a farm, what can we say about the average weight of all apples on that farm?
A medical trial compares a new treatment with a standard one. From the patient outcomes, can we conclude that the new treatment is better?
A poll asks a random sample of voters which candidate they support. What can we say about the true proportion of all voters who support each candidate?

Here, you do not know the “rules” exactly. Instead, you:

Use data to estimate unknown quantities (for example, an average or a proportion).
Assess how much uncertainty there is in your estimates.
Test whether patterns in the data are strong enough to support certain conclusions.
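
The polling question can be sketched in a few lines. The example below is only an illustration: the responses are simulated, the value `TRUE_SUPPORT` is a made-up "hidden rule" that a real analyst would not know, and the standard-error formula assumes a simple random sample.

```python
import math
import random

random.seed(0)  # fixed seed so the illustration is repeatable

# Hypothetical "true rule", hidden from the analyst: 52% support candidate A.
TRUE_SUPPORT = 0.52
responses = [random.random() < TRUE_SUPPORT for _ in range(1000)]

# Estimate the unknown proportion from the data.
n = len(responses)
p_hat = sum(responses) / n

# Assess uncertainty: standard error of a sample proportion.
std_error = math.sqrt(p_hat * (1 - p_hat) / n)

print(f"estimated support = {p_hat:.3f}, standard error = {std_error:.3f}")
```

The estimate will usually land near the hidden value without matching it exactly, which is why the uncertainty is reported alongside it.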

Randomness, Uncertainty, and Variability

Probability and statistics both deal with situations where things are not perfectly predictable. This can be because:

The process is genuinely random (for example, quantum events, radioactive decay).
The system is too complex to track exactly (for example, traffic patterns, human behavior).
We only observe a small part of a large population (for example, a sample survey).

Some common ideas:

Random experiment: A process with an outcome that is not known in advance, even if repeated under the same conditions (for example, rolling a die).
Outcome: The result of one run of the experiment (for example, rolling a 4).
Uncertainty: You do not know in advance which outcome will occur.
Variability: Repeated observations do not all give the same value; they vary.

Probability provides a way to quantify uncertainty. Statistics uses that quantification to understand and model variability in data.
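
Variability is easy to see in a simulation. The sketch below rolls a simulated fair die and prints how often each face appears; the helper `roll_counts` and the roll counts 60 and 6000 are arbitrary choices made for this illustration.

```python
import random
from collections import Counter

random.seed(1)  # fixed seed so the illustration is repeatable

def roll_counts(n_rolls: int) -> Counter:
    """Roll a simulated fair die n_rolls times and count each face."""
    return Counter(random.randint(1, 6) for _ in range(n_rolls))

for n_rolls in (60, 6000):
    counts = roll_counts(n_rolls)
    freqs = {face: round(counts[face] / n_rolls, 3) for face in range(1, 7)}
    print(n_rolls, freqs)
```

With few rolls the relative frequencies typically scatter noticeably around $1/6 \approx 0.167$; with many rolls they tend to settle closer to it.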

Key Objects in Probability and Statistics

Later chapters will define these precisely; here we describe them conceptually.

Populations and Samples

In statistics, you often distinguish between:

Population: The entire collection you are interested in. Examples:

All people living in a country.
All manufactured parts from a factory.
All tosses of a particular coin, in principle.

Sample: A smaller subset that you actually observe or measure. Examples:

1,000 people selected from the country for a survey.
50 parts taken from the factory’s production line.
20 coin tosses recorded in an experiment.

The goal is to use the sample to learn about the population. Careful methods are needed to select samples in ways that make such learning trustworthy.
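
The relationship between a population and a sample can be sketched with made-up numbers. Here the population of part weights is simulated (mean 100 g, standard deviation 5 g are assumptions for illustration), and the sample is drawn so that every part is equally likely to be chosen.

```python
import random
import statistics

random.seed(2)  # fixed seed so the illustration is repeatable

# Hypothetical population: weights (in grams) of 10,000 manufactured parts.
population = [random.gauss(100, 5) for _ in range(10_000)]

# Sample: 50 parts chosen at random, each equally likely to be picked.
sample = random.sample(population, 50)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)
print(f"population mean = {population_mean:.2f} g, sample mean = {sample_mean:.2f} g")
```

In practice only the sample mean is available; the population mean is shown here just to make the comparison visible.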

Events and Probabilities

On the probability side, the objects of interest include:

Event: A set of outcomes that share a particular property.

Example: “The die shows an even number” is the event $\{2, 4, 6\}$.
Example: “It rains tomorrow” is an event in an experiment with two basic outcomes: rain and no rain.

Probability of an event: A number between $0$ and $1$ describing how likely the event is to occur.

Probability $0$ means it never occurs (under the model).
Probability $1$ means it always occurs (under the model).
Probability $0.5$ means the event is as likely to occur as not.

You use probability rules to combine and compare events and their chances.
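
One way to make events concrete is to treat them as sets of outcomes. The sketch below assumes a single roll of a fair die, so every outcome is equally likely; the function `prob` and the event `at_least_five` are introduced here for illustration.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}   # all outcomes of one die roll
even = {2, 4, 6}                    # event: "the die shows an even number"
at_least_five = {5, 6}              # event: "the die shows 5 or 6"

def prob(event: set) -> Fraction:
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))

print(prob(even))                    # 1/2
print(prob(even | at_least_five))    # "even or at least five": {2, 4, 5, 6} -> 2/3
print(prob(even & at_least_five))    # "even and at least five": {6} -> 1/6
print(prob(sample_space - even))     # "not even": {1, 3, 5} -> 1/2
```

Set union, intersection, and difference correspond to "or", "and", and "not" for events.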

Data

Statistics focuses on data: collected information, usually in numerical or categorical form.

Examples of data:

Heights (in cm) of 100 students.
Daily rainfall (in mm) over a year.
Survey responses: “yes/no,” “agree/disagree,” or multiple-choice categories.

Data can be:

Quantitative (numerical): Values lie on a number line, so you can compute averages, differences, and so on.
Qualitative (categorical): Values name types or categories, so you count how many observations fall into each category.

Data are often summarized and visualized before any deeper analysis is done.
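
The two kinds of data call for different first summaries. The values below are made up for illustration: an average for the quantitative data and category counts for the categorical data.

```python
import statistics
from collections import Counter

heights_cm = [162.0, 175.5, 168.2, 181.0, 170.3]   # quantitative (made-up values)
answers = ["yes", "no", "yes", "yes", "no"]        # categorical (made-up values)

# Quantitative data: an average is meaningful.
mean_height = statistics.mean(heights_cm)

# Categorical data: count how many responses fall into each category.
answer_counts = Counter(answers)

print(f"mean height = {mean_height:.1f} cm")
print(dict(answer_counts))
```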

The Flow from Probability to Statistics and Back

Probability and statistics are deeply connected. A typical cycle looks like this:

Modeling with probability: You propose a probability model for a situation. For example, you might assume that measurement errors are “normally distributed” around the true value, or that customers arrive at random times according to some distribution.
Data collection: You gather data by conducting experiments, running surveys, or recording measurements.
Statistical analysis: You work with the data in several ways:

Summarize the data.
Check whether the data appear consistent with your model.
Estimate unknown parameters (for example, an average, a variance, or a probability).
Quantify uncertainty and test hypotheses.

Model revision: If the data do not match the model well, you revise the probability model and repeat the cycle.

In many real applications, the model is never “perfect.” Instead, you look for models that are useful—they describe reality well enough for practical decisions.
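
The cycle can be sketched with a coin. Everything in this example is hypothetical: the recorded data (70 heads in 100 flips), the helper `binom_pmf`, and especially the cutoff `THRESHOLD`, which is an illustrative choice rather than a rule.

```python
from math import comb

# 1. Model: the coin is fair, so heads has probability 0.5 on each flip.
MODEL_P = 0.5

# 2. Data (hypothetical): 100 recorded flips, 70 of them heads.
n_flips, n_heads = 100, 70

def binom_pmf(k: int, n: int, p: float) -> float:
    """Chance of exactly k heads in n independent flips with P(heads) = p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# 3. Analysis: estimate the heads probability, then ask how often the model
#    would produce a count at least this far from the expected 50.
p_hat = n_heads / n_flips
distance = abs(n_heads - n_flips * MODEL_P)
tail_prob = sum(
    binom_pmf(k, n_flips, MODEL_P)
    for k in range(n_flips + 1)
    if abs(k - n_flips * MODEL_P) >= distance
)

# 4. Revision: a very small tail probability suggests revisiting the model.
THRESHOLD = 0.01  # illustrative cutoff, an assumption rather than a rule
print(f"estimate = {p_hat:.2f}, tail probability = {tail_prob:.6f}")
print("revise model" if tail_prob < THRESHOLD else "keep model for now")
```

Later chapters treat this kind of check formally as hypothesis testing; here it only shows how model, data, analysis, and revision connect.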

Everyday Situations Involving Probability and Statistics

Even without doing formal calculations, you already make informal probabilistic and statistical judgments:

Weather forecasting: Hearing “$60\%$ chance of rain” is a probability statement. Meteorologists use past data (statistics) to build models (probability) that produce such forecasts.
Games and gambling: Rolling dice, drawing cards, or spinning a roulette wheel all involve known rules and random outcomes. Casinos rely on careful probability calculations; players often make intuitive but sometimes mistaken probability judgments.
Medicine: Drug trials collect data on patient responses. Statistical analysis helps determine whether a treatment is effective and safe, and quantifies risks and benefits.
Quality control: Factories inspect samples of products. Statistics helps to estimate the defect rate and decide whether the process is working properly.
Finance and insurance: Insurance companies estimate the chance and cost of events such as accidents or natural disasters. They use probability models and past data to set premiums.
Polling and elections: Polls ask a sample of people how they intend to vote. Statistical methods turn these responses into predictions and attach margins of error.

In all these cases, the key questions are:

“How likely is this event?”
“How much trust can we place in this estimate or prediction?”
“Is this pattern in the data real, or could it be due to chance variation?”

Probability and statistics supply tools to answer such questions systematically.

The Role of Assumptions and Models

Neither probability nor statistics can work without assumptions. A model is a simplified description of reality that captures essential features while ignoring others.

Examples of common modeling assumptions:

Assuming a coin is fair: $P(\text{heads}) = P(\text{tails}) = 0.5$.
Assuming measurement errors follow a particular distribution.
Assuming each person in the population has the same chance of being selected for a survey, and that selections are independent.

These assumptions are not always exactly true, but they can be reasonable approximations. Statistics then asks:

Are these assumptions plausible when compared with data?
How sensitive are the conclusions to the assumptions?

Learning to question and understand assumptions is a central skill in probability and statistics.
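
Sensitivity to an assumption can be explored by changing it and recomputing. The sketch below revisits the chance of exactly 6 heads in 10 flips and varies the assumed probability of heads; the function name and the list of trial values are chosen for illustration only.

```python
from math import comb

def prob_six_of_ten(p_heads: float) -> float:
    """Chance of exactly 6 heads in 10 independent flips when P(heads) = p_heads."""
    return comb(10, 6) * p_heads**6 * (1 - p_heads)**4

# Vary the fairness assumption and see how much the answer moves.
for p_heads in (0.45, 0.50, 0.55, 0.60):
    print(f"P(heads) = {p_heads:.2f} -> P(exactly 6 heads) = {prob_six_of_ten(p_heads):.3f}")
```

If the printed values differ enough to change a decision, the fairness assumption deserves a closer look before relying on the result.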

Typical Questions This Part of the Course Will Address

Later chapters in this section of the course will address questions such as:

How do we assign probabilities in simple situations?
How do we describe and summarize data with numbers and graphs?
How do we quantify the center and spread (variability) of a dataset?
How do we model random variables and their distributions?
How do we estimate unknown parameters (like means and proportions) and express uncertainty using confidence intervals?
How do we test claims about populations using sample data (hypothesis testing)?

This introductory chapter prepares you conceptually for these topics by highlighting the types of problems probability and statistics are designed to solve and how they complement each other.

Building Intuition and Caution

When working with probability and statistics, two habits are especially important:

Developing intuition: Try to get an informal sense of whether a probability or statistical result “makes sense” in context. For example, does it seem reasonable that a certain rare event happened purely by chance, or is it more likely that some assumption is wrong?
Being cautious: Many incorrect conclusions arise from:

Confusing correlation with causation.
Misinterpreting probabilities (for example, misunderstanding “$p$-values” or risk figures).
Ignoring how data were collected or how representative a sample is.

Later chapters in this section will introduce standard, reliable methods designed to reduce these errors and to guide sound interpretation.

How This Section Fits into the Overall Course

Within the broader course, “Probability and Statistics” plays a particular role:

It connects earlier ideas from arithmetic and algebra (for example, working with numbers and formulas) to real-world decision making under uncertainty.
It supports later topics in various areas of mathematics and applied fields, such as:

Random processes in differential equations (stochastic differential equations).
Data-driven modeling in science and engineering.
Risk analysis in economics and finance.
Learning algorithms in computer science and machine learning.

The next chapters—“Probability Basics,” “Random Variables,” “Probability Distributions,” “Descriptive Statistics,” and “Inferential Statistics”—will build from this conceptual foundation to give you specific tools and techniques.

13.1 Probability Basics
13.2 Random Variables
13.3 Probability Distributions
13.4 Descriptive Statistics
13.5 Inferential Statistics
