16.3 Introduction to NumPy

What Is NumPy?

NumPy is a popular Python library for working with numbers and arrays of data. It is especially important in data science because it lets you:

Store large sets of numbers efficiently
Do math on whole arrays at once (vectorized operations)
Work with data in a way that is similar to Excel tables or mathematical matrices

In most code, NumPy is imported like this:

import numpy as np

You will see np used a lot as the short name.

Installing and Importing NumPy

If NumPy is not already installed, you can install it using pip in the terminal:

pip install numpy

Then, in your Python script or interactive session:

import numpy as np

Now you can use everything in NumPy with the np. prefix, such as np.array(), np.mean(), and more.

NumPy Arrays vs Python Lists

Python already has lists, so why use NumPy arrays?

Key differences:

A NumPy array:

Is usually faster and uses less memory for numeric data
Is designed for math operations on many values at once
Stores all elements with a single data type (for example, all integers or all floats)

A Python list:

Can store mixed types (numbers, strings, etc.)
Is more general-purpose, but slower for large numeric calculations
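To make the type difference concrete, here is a minimal sketch. The variable names (ints, mixed, items) are only for illustration, and the exact integer type name can differ between platforms and NumPy versions.

```python
import numpy as np

# All elements of an array share one data type (dtype).
ints = np.array([1, 2, 3])
print(ints.dtype)      # an integer dtype (exact name depends on the platform)

# Mixing integers and floats: NumPy converts everything to float.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)     # float64
print(mixed)           # [1.  2.5 3. ]

# A Python list keeps each element's own type.
items = [1, 2.5, "three"]
print([type(x).__name__ for x in items])   # ['int', 'float', 'str']
```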

Example: a Python list vs a NumPy array:

import numpy as np
# Python list
numbers_list = [1, 2, 3, 4]
# NumPy array
numbers_array = np.array([1, 2, 3, 4])
print(numbers_list)   # [1, 2, 3, 4]
print(numbers_array)  # [1 2 3 4]

Both may look similar when printed, but they behave differently in math.
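A short sketch of that difference, reusing the same two variables: the * and + operators repeat and join lists, but do arithmetic on arrays.

```python
import numpy as np

numbers_list = [1, 2, 3, 4]
numbers_array = np.array([1, 2, 3, 4])

# For a list, * repeats the list and + joins two lists together.
print(numbers_list * 2)               # [1, 2, 3, 4, 1, 2, 3, 4]
print(numbers_list + numbers_list)    # [1, 2, 3, 4, 1, 2, 3, 4]

# For an array, the same operators do arithmetic on each element.
print(numbers_array * 2)              # [2 4 6 8]
print(numbers_array + numbers_array)  # [2 4 6 8]
```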

Creating NumPy Arrays

From Python lists

The most basic way to create an array is from a Python list:

import numpy as np
a = np.array([1, 2, 3, 4])
print(a)
print(type(a))  # <class 'numpy.ndarray'>

A NumPy array has the type numpy.ndarray (often just called an “ndarray”).

Multi-dimensional arrays

You can create 2D arrays (like a table or matrix) with a list of lists:

b = np.array([[1, 2, 3],
              [4, 5, 6]])
print(b)

This represents 2 rows and 3 columns.

Arrays filled with zeros or ones

NumPy can quickly create arrays filled with zeros or ones. This is useful when you need a starting array.

zeros = np.zeros(5)        # 1D array of length 5
ones = np.ones((2, 3))     # 2 rows, 3 columns
print(zeros)
print(ones)

Arrays with a range of numbers

To create sequences of numbers:

np.arange(start, stop, step) – like range(), but returns an array (stop is not included)
np.linspace(start, stop, num) – num evenly spaced values from start to stop (both included)

x = np.arange(0, 10, 2)      # 0, 2, 4, 6, 8
y = np.linspace(0, 1, 5)     # 0. , 0.25, 0.5 , 0.75, 1.
print(x)
print(y)

Array Shape and Dimensions

NumPy arrays can have different numbers of dimensions:

1D: [1, 2, 3]
2D: table of rows and columns
3D or more: used for more advanced data

Useful attributes:

array.shape – the size of each dimension
array.ndim – the number of dimensions
array.size – total number of elements

import numpy as np
a = np.array([1, 2, 3, 4])
b = np.array([[1, 2, 3],
              [4, 5, 6]])
print(a.shape)  # (4,)
print(a.ndim)   # 1
print(b.shape)  # (2, 3)
print(b.ndim)   # 2
print(b.size)   # 6
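The same attributes work for arrays with more dimensions. The sketch below uses a made-up 3D array (the name c and its shape are only for illustration) to show how shape, ndim, and size relate.

```python
import numpy as np

# Hypothetical example: 2 "sheets", each a table with 3 rows and 4 columns.
c = np.zeros((2, 3, 4))
print(c.shape)  # (2, 3, 4)
print(c.ndim)   # 3
print(c.size)   # 24  (2 * 3 * 4)
```

The size is always the product of the numbers in shape.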

Basic Indexing and Slicing

NumPy indexing is similar to Python lists, but also works in multiple dimensions.

Indexing 1D arrays

a = np.array([10, 20, 30, 40, 50])
print(a[0])   # first element: 10
print(a[2])   # third element: 30
print(a[-1])  # last element: 50

Slicing 1D arrays

Use [start:stop] (stop is not included), and optional step:

print(a[1:4])   # elements at indices 1, 2, 3 -> [20 30 40]
print(a[:3])    # from start to index 2 -> [10 20 30]
print(a[2:])    # from index 2 to end -> [30 40 50]
print(a[::2])   # every 2nd element -> [10 30 50]

Indexing 2D arrays

For 2D arrays, you use [row, column]:

b = np.array([[1, 2, 3],
              [4, 5, 6]])
print(b[0, 0])  # first row, first column -> 1
print(b[1, 2])  # second row, third column -> 6

You can also slice rows and columns:

print(b[0, :])   # first row -> [1 2 3]
print(b[:, 1])   # second column -> [2 5]
print(b[:, 0:2]) # all rows, first two columns

Vectorized Operations (Math with Arrays)

One of NumPy’s biggest strengths is that you can apply operations to whole arrays at once without writing explicit loops.
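As a rough illustration of what "without explicit loops" means, the sketch below doubles a small array in two ways. The array prices and its values are made up for this example.

```python
import numpy as np

prices = np.array([10.0, 20.0, 30.0])   # hypothetical values

# Loop version: handle one element at a time.
doubled_loop = []
for p in prices:
    doubled_loop.append(p * 2)

# Vectorized version: one expression for the whole array.
doubled_vec = prices * 2

print(doubled_vec)                               # [20. 40. 60.]
print(np.array_equal(doubled_loop, doubled_vec)) # True: same values either way
```

Both versions give the same values; the vectorized one is shorter and, for large arrays, usually much faster.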

Element-wise operations

When you add, subtract, multiply, or divide arrays of the same shape, NumPy performs the operation element by element.

import numpy as np
a = np.array([1, 2, 3])
b = np.array([10, 20, 30])
print(a + b)   # [11 22 33]
print(b - a)   # [ 9 18 27]
print(a * b)   # [ 10  40  90]
print(b / a)   # [10. 10. 10.]

You can also do math with scalars (single numbers):

print(a * 2)   # [2 4 6]
print(a + 5)   # [6 7 8]

This is called broadcasting: a smaller value (like a single number) is stretched to match the shape of the array in an operation.
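Broadcasting is not limited to single numbers. As a small sketch (the names m and row are only for illustration), a 1D array can also be stretched across every row of a 2D array, provided its length matches the number of columns.

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])       # shape (2, 3)
row = np.array([10, 20, 30])    # shape (3,)

# row is treated as if it were repeated for each row of m.
print(m + row)
# [[11 22 33]
#  [14 25 36]]
```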

Common mathematical functions

NumPy provides many functions that work element-wise on arrays:

x = np.array([0, 1, 2, 3])
print(np.sqrt(x))   # square root
print(np.exp(x))    # e^x
print(np.sin(x))    # sine

Each function applies to every element of the array.

Basic Statistics with NumPy

NumPy makes it easy to compute simple statistics, which is very useful in data science.

Given an array data:

np.mean(data) – average (mean)
np.median(data) – median
np.min(data) / np.max(data) – minimum and maximum
np.std(data) – standard deviation (spread of values; the population version by default)
np.sum(data) – sum of all values

Example:

import numpy as np
data = np.array([10, 20, 20, 40, 50])
print("Mean:", np.mean(data))
print("Median:", np.median(data))
print("Min:", np.min(data))
print("Max:", np.max(data))
print("Standard deviation:", np.std(data))
print("Sum:", np.sum(data))
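To see what the standard deviation measures, the sketch below recomputes it step by step for the same data and compares the result with np.std. The intermediate names (deviations, variance, std_manual) are only for illustration.

```python
import numpy as np

data = np.array([10, 20, 20, 40, 50])

mean = data.sum() / data.size            # 140 / 5
deviations = data - mean                 # distance of each value from the mean
variance = (deviations ** 2).mean()      # average squared distance
std_manual = np.sqrt(variance)

print(mean)                                   # 28.0
print(variance)                               # 216.0
print(np.isclose(std_manual, np.std(data)))   # True
print(np.std(data, ddof=1))   # sample standard deviation (divides by n - 1)
```

By default np.std divides by the number of values; passing ddof=1 divides by one less, which is the usual choice when the data is a sample from a larger group.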

For 2D arrays, you can compute along rows or columns by using the axis argument:

axis=0 – works down the rows, giving one result per column
axis=1 – works across the columns, giving one result per row

m = np.array([[1, 2, 3],
              [4, 5, 6]])
print(np.sum(m))           # sum of all values
print(np.sum(m, axis=0))   # sum per column -> [5 7 9]
print(np.sum(m, axis=1))   # sum per row    -> [ 6 15]
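The axis argument works the same way with other statistics functions. Here is a minimal sketch with made-up test scores, where each row is a student and each column is a test.

```python
import numpy as np

# Hypothetical scores: each row is a student, each column is a test.
scores = np.array([[80, 90, 70],
                   [60, 70, 80]])

print(np.mean(scores, axis=0))   # average per test (column)  -> [70. 80. 75.]
print(np.mean(scores, axis=1))   # average per student (row)  -> [80. 70.]
print(np.max(scores, axis=1))    # best score per student     -> [90 80]
```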

Reshaping Arrays

Sometimes you need to change the shape of an array without changing its data. NumPy lets you do this with reshape.

The total number of elements must stay the same. For example, if you have 6 elements, you can reshape into 2x3 or 3x2.

import numpy as np
a = np.array([1, 2, 3, 4, 5, 6])
b = a.reshape((2, 3))   # 2 rows, 3 columns
c = a.reshape((3, 2))   # 3 rows, 2 columns
print(b)
print(c)

You can let NumPy figure out one dimension by using -1:

d = a.reshape((2, -1))  # 2 rows, NumPy decides columns
print(d)
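The sketch below checks the shapes that -1 produces and shows what happens when the element count does not match. The name e is only for illustration.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])

d = a.reshape((2, -1))
print(d.shape)          # (2, 3)

e = a.reshape((-1, 1))  # a single column; NumPy works out 6 rows
print(e.shape)          # (6, 1)

# 6 elements cannot fill a 4x2 array (8 slots), so NumPy raises an error.
try:
    a.reshape((4, 2))
except ValueError as err:
    print("Cannot reshape:", err)
```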

Random Numbers with NumPy

Random numbers are often used in data science for testing, simulation, and splitting data.

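NumPy's random number tools live in the np.random module. Functions such as np.random.rand are the older (legacy) style; the newer style creates a generator object with np.random.default_rng(). The older functions are used here because they are simpler for beginners: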

Random floats between 0 and 1

import numpy as np
r = np.random.rand(5)      # 5 random floats in [0, 1)
print(r)

Random integers

ints = np.random.randint(0, 10, size=5)  # 5 integers from 0 to 9
print(ints)

Random 2D arrays

matrix = np.random.rand(2, 3)  # 2x3 array of random floats
print(matrix)
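For comparison, here is a sketch of the same three tasks in the newer style, using a generator created with np.random.default_rng. The seed value 42 is arbitrary; passing a seed makes the results repeatable from run to run.

```python
import numpy as np

rng = np.random.default_rng(42)      # 42 is an arbitrary seed

r = rng.random(5)                    # 5 random floats in [0, 1)
ints = rng.integers(0, 10, size=5)   # 5 integers from 0 to 9
matrix = rng.random((2, 3))          # 2x3 array of random floats

print(r)
print(ints)
print(matrix)
```

Note that rng.random takes the shape as a single tuple, while np.random.rand takes each dimension as a separate argument.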

A Tiny Data Science Example with NumPy

Here is a small example to show how NumPy might be used in a simple data science task.

Imagine we have the daily temperatures (in °C) for a week:

import numpy as np
temps = np.array([18.5, 20.1, 19.8, 22.0, 21.5, 19.0, 18.0])
print("Average temperature:", np.mean(temps))
print("Maximum temperature:", np.max(temps))
print("Minimum temperature:", np.min(temps))
# Convert all temperatures to Fahrenheit: F = C * 9/5 + 32
temps_f = temps * 9/5 + 32
print("Temperatures in Fahrenheit:", temps_f)

This example shows:

How to store numeric data in an array
How to compute simple statistics
How to apply a formula to all data at once using vectorized operations
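As one possible extension of this example, the sketch below combines a statistic with a vectorized operation to show how far each day was from the weekly average. The name diff_from_avg is only for illustration.

```python
import numpy as np

temps = np.array([18.5, 20.1, 19.8, 22.0, 21.5, 19.0, 18.0])

# Subtract the weekly average from every day's temperature at once.
diff_from_avg = temps - np.mean(temps)

# Positive values are warmer than average, negative values are cooler.
print(np.round(diff_from_avg, 2))
```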

Summary

In this section, you:

Learned what NumPy is and how to install/import it
Created 1D and 2D arrays using np.array, np.zeros, np.ones, np.arange, and np.linspace
Saw how to inspect array shape and dimensions
Practiced indexing and slicing, including 2D indexing
Used vectorized operations for fast math on arrays
Calculated basic statistics like mean, min, max, and standard deviation
Reshaped arrays with reshape
Generated random numbers with np.random

These basics will prepare you to combine NumPy with other libraries like pandas and matplotlib in later chapters.
