Descriptive Statistics

Overview

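Descriptive statistics is about summarizing and describing data. Instead of looking at a long list of numbers, we use a small set of numbers, tables, and graphs to capture the main features of the data: where the values are centered, how spread out they are, what shape the distribution has, and whether any values are unusual.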

Descriptive statistics does not try to make predictions about the future or about a larger population. It only describes what has been observed. Using these summaries is the first step before doing any deeper statistical analysis.

In this chapter, we focus on the main descriptive tools that will later connect to the specific measures: mean, variance, and standard deviation.

Types of Data and Why They Matter

Before summarizing, it is important to keep in mind what kind of data you have, because not all descriptive summaries are appropriate for all data types.

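Broadly, you may have categorical data, where each value is a label or group (such as Car, Bus, or Bike), or numerical data, where each value is a number (such as a test score).

For categorical data, we usually describe how often each category occurs, using counts, relative frequencies, bar charts, and pie charts.

For numerical data, we usually describe the center, the spread, and the shape of the distribution, along with any outliers.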

The measures in later subsections (mean, variance, standard deviation) are for numerical data.

Frequency Tables

A frequency table shows how often each value (or each category) appears.

For categorical data, a simple frequency table might look like:

Category | Count | Relative frequency
---------|-------|-------------------
Car | 18 | 0.45
Bus | 12 | 0.30
Bike | 10 | 0.25
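
As a minimal sketch of how such a table can be built, the following Python code counts categories and divides each count by the total. The list `responses` is hypothetical raw data, constructed only so that its counts match the table above.

```python
from collections import Counter

# Hypothetical raw observations, built to match the table above:
# 18 "Car", 12 "Bus", 10 "Bike" (40 responses in total).
responses = ["Car"] * 18 + ["Bus"] * 12 + ["Bike"] * 10

counts = Counter(responses)
total = sum(counts.values())

# Relative frequency = count / total number of observations.
table = {category: (n, n / total) for category, n in counts.items()}

for category, (n, rel) in table.items():
    print(f"{category:<5} | {n:>5} | {rel:.2f}")
```

The relative frequencies always add up to 1, which is a quick check that no observation was lost.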

For numerical data with many possible values, we usually group values into intervals (also called classes or bins). For example, for test scores:

Score range | Count
-----------|------
0–49 | 2
50–59 | 5
60–69 | 10
70–79 | 8
80–89 | 4
90–100 | 1

Frequency tables make it easier to see patterns such as which values or ranges are most common.
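
Grouping numerical values into intervals can be sketched as follows. The bin edges follow the score ranges in the table above; the helper name `bin_label` and the list `scores` are made up for illustration and are not the data behind that table.

```python
from collections import Counter

# Bin edges follow the score ranges in the table above; the last bin includes 100.
BINS = [(0, 49), (50, 59), (60, 69), (70, 79), (80, 89), (90, 100)]

def bin_label(score):
    """Return the 'low-high' label of the bin that contains an integer score."""
    for low, high in BINS:
        if low <= score <= high:
            return f"{low}-{high}"
    raise ValueError(f"score out of range: {score}")

# Made-up scores for illustration only.
scores = [35, 52, 58, 61, 64, 67, 69, 72, 75, 78, 83, 88, 95, 100]

bin_counts = Counter(bin_label(s) for s in scores)

for low, high in BINS:
    label = f"{low}-{high}"
    print(f"{label:<6} | {bin_counts[label]}")
```

This sketch assumes whole-number scores. With decimal scores, the bins would need to be defined so that values such as 49.5 fall into exactly one interval.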

Graphical Summaries

Graphs are visual forms of descriptive statistics. They help you see patterns at a glance.

Bar charts

A bar chart is used mainly for categorical data.

Bar charts help compare sizes of categories easily.

Histograms

A histogram is used for numerical data that have been grouped into intervals.
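
A rough text version of a histogram can be drawn directly from grouped counts. The sketch below reuses the counts from the test-score frequency table earlier in this chapter; the function name `text_histogram` is hypothetical.

```python
# Counts taken from the test-score frequency table earlier in this chapter.
score_counts = {
    "0-49": 2, "50-59": 5, "60-69": 10, "70-79": 8, "80-89": 4, "90-100": 1,
}

def text_histogram(counts, symbol="#"):
    """Return one line per bin, with a bar whose length equals the count."""
    width = max(len(label) for label in counts)
    return [f"{label:<{width}} | {symbol * n}" for label, n in counts.items()]

lines = text_histogram(score_counts)
print("\n".join(lines))
```

Here the bar length is simply the count. The first interval (0 to 49) is wider than the others, so a carefully drawn histogram would account for the unequal widths rather than compare raw counts.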

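Histograms are useful for seeing the shape of a distribution, where its values are concentrated, how spread out they are, and whether there are gaps or unusual values.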

Pie charts

A pie chart is a circle divided into slices, usually for categorical data.

Pie charts are mainly used to emphasize how a whole is divided among categories.

Boxplots (Box-and-Whisker Plots)

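A boxplot is a compact summary of a numerical data set using a few special values: the minimum, the first quartile (Q1), the median, the third quartile (Q3), and the maximum.

These five values form the five-number summary. A boxplot shows a box from Q1 to Q3, a line inside the box at the median, and whiskers extending from the box toward the minimum and maximum, with possible outliers often drawn as separate points.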

Boxplots are helpful for quickly comparing different groups side by side and for spotting outliers and skewness.
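
The five-number summary (minimum, first quartile, median, third quartile, maximum) can be computed with Python's standard `statistics` module. This is a sketch under one assumption: quartiles are computed with the `"inclusive"` method of `statistics.quantiles`. Textbooks and software use several quartile conventions, so results for small data sets may differ slightly. The function name and data are hypothetical.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    # Assumption: the "inclusive" quartile convention; other conventions exist.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

# Hypothetical data, deliberately unsorted.
data = [7, 1, 5, 3, 9, 2, 8, 4, 6]

summary = five_number_summary(data)
print(summary)
```

For this data the box of a boxplot would run from 3 to 7 with the median line at 5.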

Measures of Center

Measures of center try to capture a “typical” or “central” value for a numerical data set.

Common measures of center are the mean, the median, and the mode.

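You should know conceptually that the mean uses every value and so can be pulled toward extreme values, the median is the middle value of the ordered data and is much less affected by extremes, and the mode is the most frequent value.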

The exact computation and deeper properties of the mean are treated in the “Mean” subsection; here we only place it among other descriptive measures.
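
As a small illustration of how the three common measures of center (mean, median, mode) can disagree, the sketch below uses Python's standard `statistics` module on a hypothetical data set that contains one large value.

```python
import statistics

# Hypothetical data set with one unusually large value (40).
values = [2, 3, 3, 4, 5, 6, 40]

mean = statistics.mean(values)      # sum 63 divided by 7 values
median = statistics.median(values)  # middle value of the sorted data
mode = statistics.mode(values)      # most frequent value

print(f"mean={mean}, median={median}, mode={mode}")
```

The single value 40 pulls the mean up to 9, while the median stays at 4, which is closer to where most of the values lie.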

Measures of Spread

Describing a data set’s center is not enough; it also matters how spread out the values are. Two groups can have the same mean but very different variability.

Key concepts are the range, the interquartile range (IQR), the variance, and the standard deviation.

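Conceptually, the range is the distance between the largest and smallest values, the IQR is the distance between the third and first quartiles, and the variance and standard deviation measure how far the values typically lie from the mean.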

The range is easy to compute but depends only on two values. Measures like variance, standard deviation, and IQR use more of the data and give a richer description of variability.
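
The following sketch compares two hypothetical groups that share the same mean but differ in spread. It makes two assumptions that the later subsections may treat differently: the variance and standard deviation use the population form (dividing by the number of values), and quartiles use the `"inclusive"` method of `statistics.quantiles`.

```python
import statistics

# Two hypothetical groups with the same mean (50) but different variability.
group_a = [48, 49, 50, 51, 52]
group_b = [10, 30, 50, 70, 90]

def spread_summary(values):
    """Return the mean and several measures of spread for a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "range": max(values) - min(values),
        "iqr": q3 - q1,
        # Population variance and standard deviation (divide by n).
        "variance": statistics.pvariance(values),
        "std_dev": statistics.pstdev(values),
    }

summary_a = spread_summary(group_a)
summary_b = spread_summary(group_b)
print(summary_a)
print(summary_b)
```

Both groups have mean 50, but every spread measure is much larger for the second group, which is exactly the difference the mean alone cannot show.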

Shape of a Distribution

The shape of a numerical distribution describes the overall pattern you see in a histogram or boxplot.

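Some common shapes are symmetric, skewed to the right (a longer tail toward large values), skewed to the left (a longer tail toward small values), and distributions with a single peak or with several peaks.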

Understanding shape guides you in choosing appropriate summary measures and later modeling choices.

Outliers

An outlier is a data point that is unusually far from the rest of the values.

Outliers can occur for different reasons, such as measurement or data-entry errors, or genuinely unusual observations.

Descriptively, outliers matter because they can strongly affect summaries such as the mean, the range, and the standard deviation.

Boxplots and numerical criteria (for example, based on quartiles and IQR) are common tools to flag possible outliers, though deciding what to do with them depends on context.
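
One widely used criterion based on quartiles and IQR flags values that lie more than 1.5 × IQR below Q1 or above Q3. The sketch below assumes that rule and the `"inclusive"` quartile method of `statistics.quantiles`; the text above does not fix a particular criterion, and the function name and data are hypothetical.

```python
import statistics

def flag_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical data set with one unusually large value.
values = [2, 3, 3, 4, 5, 6, 40]

possible_outliers = flag_outliers(values)
print(possible_outliers)
```

The function only flags candidates. Whether a flagged value is an error to correct or a genuine observation to keep still depends on context.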

Summarizing a Data Set in Practice

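When you describe a data set, you often combine several descriptive statistics instead of relying on just one number. A typical descriptive summary for numerical data might include a measure of center, a measure of spread, a description of the shape, a note of any outliers, and a graph such as a histogram or boxplot.

For categorical data, a typical descriptive summary might include a frequency table with counts and relative frequencies, together with a bar chart or pie chart.

Descriptive statistics provide the foundation for the later chapters in this section: mean, variance, and standard deviation.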
