
Probability Distributions

In probability, we are often interested in more than just “what can happen” (the sample space) and “how likely is each event.” We also care about numerical outcomes: scores, heights, waiting times, profits, and so on. A probability distribution is a way to describe, in a precise and compact form, the probabilities of all possible values of a random variable.

This chapter gives a general introduction to probability distributions before you study particular examples (like the binomial and normal distributions) in later chapters.

Random variables and their distributions

A random variable is a variable whose value is determined by a random process. It assigns a number to each outcome of an experiment. For example: the number rolled on a die, the number of heads in a series of coin flips, or the time you wait for a bus.

The probability distribution of a random variable is the rule that tells you how probabilities are spread over its possible values.

Informally: the distribution tells you which values the random variable can take and how likely each value (or range of values) is.

Discrete vs continuous distributions

Random variables, and therefore distributions, are usually divided into two types:

  1. Discrete random variables take values from a finite or countable set, such as the result of a die roll or a count of heads.
  2. Continuous random variables take values from a continuum, such as an interval of real numbers; heights and waiting times are typical examples.

These two types of variables have probability distributions described in different ways.

Discrete probability distributions

For a discrete random variable $X$, you can list each possible value and the probability that $X$ takes that value.

The function that assigns these probabilities is called the probability mass function (pmf) of $X$.

If the possible values of $X$ are $x_1, x_2, x_3, \dots$, then the pmf is the function
$$
p(x) = P(X = x).
$$
This function must satisfy:

  1. $p(x) \ge 0$ for all possible $x$.
  2. The sum of probabilities over all possible values equals 1:
    $$
    \sum_x p(x) = 1.
    $$
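
As a quick check, these two conditions are easy to verify in code. Here is a minimal Python sketch (the dictionary pmf below is just an example) that tests them for the pmf of a fair six-sided die:

    # pmf of a fair six-sided die, stored as value -> probability
    pmf = {x: 1/6 for x in range(1, 7)}

    # Condition 1: every probability is non-negative
    assert all(p >= 0 for p in pmf.values())

    # Condition 2: the probabilities sum to 1 (allowing for float rounding)
    assert abs(sum(pmf.values()) - 1) < 1e-12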

Simple examples of discrete distributions

  1. Single fair die
Let $X$ be the number rolled on a fair six-sided die. The possible values are $1, 2, 3, 4, 5, 6$.
    The probability mass function is
    $$
    p(x) = P(X = x) = \frac{1}{6} \quad \text{for } x = 1,2,3,4,5,6.
    $$
    For all other $x$, $p(x) = 0$.
  2. Counting heads
Flip a fair coin twice and let $Y$ be the number of heads. The possible values are $0, 1, 2$.
    A simple table can describe the distribution:

      $y$    $P(Y = y)$
      0      $1/4$
      1      $1/2$
      2      $1/4$

Tables like this are a common way to present discrete distributions.
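
To make the table concrete, here is a short Python sketch that derives the same distribution by enumerating the four equally likely outcomes of two flips:

    from itertools import product

    # The 4 equally likely outcomes of two fair coin flips
    outcomes = list(product("HT", repeat=2))

    # Count heads in each outcome and accumulate probabilities
    pmf = {}
    for outcome in outcomes:
        y = outcome.count("H")
        pmf[y] = pmf.get(y, 0) + 1 / len(outcomes)

    for y in sorted(pmf):
        print(y, pmf[y])  # 0 0.25, then 1 0.5, then 2 0.25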

Distribution functions for discrete variables

Besides the pmf, another useful way to describe a probability distribution is the cumulative distribution function (cdf).

For any real number $x$, the cdf $F(x)$ of a random variable $X$ is defined by
$$
F(x) = P(X \le x).
$$

For a discrete $X$, $F(x)$ increases in jumps at the possible values of $X$.

For example, if $Y$ is the number of heads when flipping a fair coin twice, then:

For $x < 0$, $F(x) = 0$; for $0 \le x < 1$, $F(x) = 1/4$; for $1 \le x < 2$, $F(x) = 3/4$; and for $x \ge 2$, $F(x) = 1$.
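
A minimal sketch of this step function in Python, built directly from the pmf of $Y$:

    # pmf of Y = number of heads in two fair coin flips
    pmf = {0: 0.25, 1: 0.5, 2: 0.25}

    def cdf(x):
        # F(x) = P(Y <= x): sum the pmf over all values <= x
        return sum(p for value, p in pmf.items() if value <= x)

    print(cdf(-1), cdf(0.5), cdf(1.5), cdf(2))  # 0 0.25 0.75 1.0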

The cdf has some general properties that hold for any random variable:

  1. $F$ is non-decreasing: if $a \le b$, then $F(a) \le F(b)$.
  2. $F(x) \to 0$ as $x \to -\infty$ and $F(x) \to 1$ as $x \to \infty$.
  3. Interval probabilities can be read off from $F$: $P(a < X \le b) = F(b) - F(a)$.

Continuous probability distributions

For a continuous random variable $X$, the probability of taking any single exact value is 0:
$$
P(X = a) = 0 \quad \text{for any real } a.
$$

Instead, probabilities are assigned to intervals of values (for example, $P(a < X \le b)$).

The rule that assigns probabilities to intervals is often described using a probability density function (pdf), $f(x)$.

Informally, $f(x)$ is a non-negative function such that probabilities are given by areas under the curve of $f$:
$$
P(a \le X \le b) = \int_a^b f(x)\,dx.
$$

A pdf must satisfy:

  1. $f(x) \ge 0$ for all $x$,
  2. The total area under the curve is 1:
    $$
    \int_{-\infty}^{\infty} f(x)\,dx = 1.
    $$
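
As with the pmf conditions, these requirements can be sanity-checked numerically. The sketch below uses a made-up example pdf, $f(x) = 2x$ on $[0, 1]$, and approximates the total area with a midpoint Riemann sum:

    def f(x):
        # Example pdf: f(x) = 2x on [0, 1] and 0 elsewhere
        return 2 * x if 0 <= x <= 1 else 0.0

    # Midpoint Riemann sum over [0, 1], where f is nonzero
    n = 100_000
    dx = 1 / n
    area = sum(f((i + 0.5) * dx) * dx for i in range(n))
    print(area)  # approximately 1.0, so f is a valid pdf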

Example of a simple continuous distribution

Consider a random variable $X$ that is equally likely to take any value between 0 and 1 (this is called a uniform distribution on $[0,1]$).

The pdf is
$$
f(x) = \begin{cases}
1, & 0 \le x \le 1,\\[4pt]
0, & \text{otherwise.}
\end{cases}
$$

The total area under this curve from 0 to 1 is
$$
\int_0^1 1\,dx = 1,
$$
so it is a valid pdf.

The probability that $X$ lies between $0.2$ and $0.5$ is
$$
P(0.2 \le X \le 0.5) = \int_{0.2}^{0.5} 1\,dx = 0.3.
$$
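
You can also estimate this probability by simulation, since Python's random.random() draws from exactly this uniform distribution on $[0, 1]$:

    import random

    random.seed(42)  # fixed seed for reproducible runs
    n = 100_000

    # Fraction of uniform draws landing in [0.2, 0.5]
    hits = sum(1 for _ in range(n) if 0.2 <= random.random() <= 0.5)
    print(hits / n)  # close to the exact value 0.3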

Distribution functions for continuous variables

The cdf $F(x)$ of a continuous random variable $X$ is still defined by
$$
F(x) = P(X \le x).
$$

If $X$ has pdf $f(x)$, then
$$
F(x) = \int_{-\infty}^{x} f(t)\,dt.
$$

When the pdf is nice enough (for example, continuous), the cdf is differentiable and its derivative recovers the pdf:
$$
F'(x) = f(x).
$$

For the uniform example above,
$$
F(x) =
\begin{cases}
0, & x < 0,\\[4pt]
x, & 0 \le x \le 1,\\[4pt]
1, & x > 1.
\end{cases}
$$
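
In code, this piecewise cdf amounts to clamping $x$ to the interval $[0, 1]$:

    def uniform_cdf(x):
        # F(x) = P(X <= x) for the uniform distribution on [0, 1]
        if x < 0:
            return 0.0
        if x > 1:
            return 1.0
        return x

    # P(0.2 <= X <= 0.5) = F(0.5) - F(0.2)
    print(uniform_cdf(0.5) - uniform_cdf(0.2))  # 0.3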

Expected value and variance of a distribution

To summarize a probability distribution numerically, two key concepts are:

  1. the expected value (or mean, $\mu$), which measures the center of the distribution, and
  2. the variance ($\sigma^2$), whose square root is the standard deviation, which measures how spread out the values are.

You will use these quantities frequently when working with specific distributions.

Expected value for discrete distributions

If $X$ is a discrete random variable with pmf $p(x)$, its expected value $E[X]$ (or $\mu$) is
$$
E[X] = \sum_x x\,p(x).
$$

This is a weighted average of the possible values, with weights equal to their probabilities.

The variance of $X$, denoted $\operatorname{Var}(X)$ or $\sigma^2$, is
$$
\operatorname{Var}(X) = E\big[(X - \mu)^2\big]
= \sum_x (x - \mu)^2 p(x).
$$
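
For the fair die from earlier, both quantities follow directly from these formulas:

    # pmf of a fair six-sided die
    pmf = {x: 1/6 for x in range(1, 7)}

    # Expected value: probability-weighted average of the values
    mu = sum(x * p for x, p in pmf.items())

    # Variance: probability-weighted average squared deviation from mu
    var = sum((x - mu) ** 2 * p for x, p in pmf.items())

    print(mu, var)  # 3.5 and 35/12, approximately 2.9167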

Expected value for continuous distributions

If $X$ is a continuous random variable with pdf $f(x)$, then the expected value is
$$
E[X] = \int_{-\infty}^{\infty} x\,f(x)\,dx.
$$

The variance is
$$
\operatorname{Var}(X) = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx,
$$
where $\mu = E[X]$.
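
For the uniform distribution on $[0, 1]$ from the earlier example, these integrals can be approximated numerically; the exact values are $\mu = 1/2$ and $\sigma^2 = 1/12$:

    # Midpoint approximation of E[X] and Var(X) for the uniform
    # distribution on [0, 1], where f(x) = 1
    n = 100_000
    dx = 1 / n
    midpoints = [(i + 0.5) * dx for i in range(n)]

    mu = sum(x * dx for x in midpoints)               # E[X]   ~ 0.5
    var = sum((x - mu) ** 2 * dx for x in midpoints)  # Var(X) ~ 1/12

    print(mu, var)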

The details of computing $E[X]$ and $\operatorname{Var}(X)$ for particular distributions (like the binomial or normal) will be covered in their own chapters. What matters here is that:

  1. the expected value is a probability-weighted average that locates the center of a distribution,
  2. the variance measures spread around that center, and
  3. for discrete variables these are sums, while for continuous variables they are integrals.

Interpreting and using probability distributions

A probability distribution is more than just a formula. It is a model of how a random variable behaves.

Some common uses:

  1. computing the probability of events of interest (for example, a value between $0.2$ and $0.5$),
  2. summarizing typical behavior with the mean and variance,
  3. comparing observed data with what a model predicts, and
  4. simulating random outcomes on a computer.

In later chapters on the binomial and normal distributions, you will see concrete, widely used examples of probability distributions, and you will learn specific formulas and methods for working with them. Here, the key ideas are:

  1. a random variable assigns a number to each outcome, and its distribution spreads probability over those numbers,
  2. discrete distributions are described by a pmf, continuous ones by a pdf,
  3. the cdf $F(x) = P(X \le x)$ works for both types, and
  4. the expected value and variance summarize a distribution's center and spread.
