What Is NumPy?
NumPy is a popular Python library for working with numbers and arrays of data. It is especially important in data science because it lets you:
- Store large sets of numbers efficiently
- Do math on whole arrays at once (vectorized operations)
- Work with data in a way that is similar to Excel tables or mathematical matrices
In most code, NumPy is imported like this:
import numpy as np
You will see np used a lot as the short name.
Installing and Importing NumPy
If NumPy is not already installed, you can install it using pip in the terminal:
pip install numpy
Then, in your Python script or interactive session:
import numpy as np
Now you can use everything in NumPy with the np. prefix, such as np.array(), np.mean(), and more.
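One quick way to confirm the import works is to print the installed version. This is only a sanity-check sketch; the version string you see depends on your environment.

```python
import numpy as np

# np.__version__ is a string such as "1.26.4"; the exact value depends on your install
print("NumPy version:", np.__version__)
```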
NumPy Arrays vs Python Lists
Python already has lists, so why use NumPy arrays?
Key differences:
- A NumPy array:
  - Is usually faster and uses less memory for numeric data
  - Is designed for math operations on many values at once
  - Stores all elements as a single data type (its dtype), e.g., all integers or all floats
- A Python list:
  - Can store mixed types (numbers, strings, etc.)
  - Is more general-purpose, but slower for large numeric calculations
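The point about element types can be checked through an array's dtype attribute. The sketch below uses illustrative variable names (ints, mixed) and shows that putting one float among integers converts every element to a float.

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed = np.array([1, 2.5, 3])   # one float makes every element a float

print(ints.dtype)    # an integer dtype (exact width depends on the platform)
print(mixed.dtype)   # float64
print(mixed)
```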
Example: a Python list vs a NumPy array:
# Python list
numbers_list = [1, 2, 3, 4]
# NumPy array
import numpy as np
numbers_array = np.array([1, 2, 3, 4])
print(numbers_list)
print(numbers_array)
Both may look similar when printed, but they behave differently in math.
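To make that difference concrete, here is a small sketch that applies the same operators to the list and the array defined above. With a list, * and + repeat and concatenate; with an array, they do arithmetic on each element.

```python
import numpy as np

numbers_list = [1, 2, 3, 4]
numbers_array = np.array([1, 2, 3, 4])

print(numbers_list * 2)    # [1, 2, 3, 4, 1, 2, 3, 4]  (the list is repeated)
print(numbers_array * 2)   # [2 4 6 8]  (each element is doubled)

print(numbers_list + numbers_list)     # [1, 2, 3, 4, 1, 2, 3, 4]  (concatenation)
print(numbers_array + numbers_array)   # [2 4 6 8]  (element-wise addition)
```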
Creating NumPy Arrays
From Python lists
The most basic way to create an array is from a Python list:
import numpy as np
a = np.array([1, 2, 3, 4])
print(a)
print(type(a)) # <class 'numpy.ndarray'>
A NumPy array has the type numpy.ndarray (often just called an “ndarray”).
Multi-dimensional arrays
You can create 2D arrays (like a table or matrix) with a list of lists:
b = np.array([[1, 2, 3],
              [4, 5, 6]])
print(b)
This represents 2 rows and 3 columns.
Arrays filled with zeros or ones
NumPy can quickly create arrays filled with zeros or ones. This is useful when you need a starting array.
zeros = np.zeros(5) # 1D array of length 5
ones = np.ones((2, 3)) # 2 rows, 3 columns
print(zeros)
print(ones)
Arrays with a range of numbers
To create sequences of numbers:
- np.arange(start, stop, step) – like range() but returns an array
- np.linspace(start, stop, num) – evenly spaced values between start and stop (inclusive)
x = np.arange(0, 10, 2) # 0, 2, 4, 6, 8
y = np.linspace(0, 1, 5) # 0. , 0.25, 0.5 , 0.75, 1.
print(x)
print(y)
Array Shape and Dimensions
NumPy arrays can have different numbers of dimensions:
- 1D: [1, 2, 3]
- 2D: table of rows and columns
- 3D or more: used for more advanced data
Useful attributes:
- array.shape – the size of each dimension
- array.ndim – the number of dimensions
- array.size – total number of elements
import numpy as np
a = np.array([1, 2, 3, 4])
b = np.array([[1, 2, 3],
              [4, 5, 6]])
print(a.shape) # (4,)
print(a.ndim) # 1
print(b.shape) # (2, 3)
print(b.ndim) # 2
print(b.size) # 6
Basic Indexing and Slicing
NumPy indexing is similar to Python lists, but also works in multiple dimensions.
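As a rough comparison, the sketch below indexes the same data stored as a list of lists and as a 2D array. The names nested and grid are illustrative only.

```python
import numpy as np

nested = [[1, 2, 3],
          [4, 5, 6]]
grid = np.array(nested)

print(nested[1][2])   # 6  (list of lists: index the row, then the column)
print(grid[1, 2])     # 6  (array: row and column in one pair of brackets)
print(grid[1][2])     # 6  (chained indexing also works on arrays)
```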
Indexing 1D arrays
a = np.array([10, 20, 30, 40, 50])
print(a[0]) # first element: 10
print(a[2]) # third element: 30
print(a[-1]) # last element: 50
Slicing 1D arrays
Use [start:stop] (stop is not included), with an optional step as [start:stop:step]:
print(a[1:4]) # elements at indices 1, 2, 3 -> [20 30 40]
print(a[:3]) # from start to index 2 -> [10 20 30]
print(a[2:]) # from index 2 to end -> [30 40 50]
print(a[::2]) # every 2nd element -> [10 30 50]
Indexing 2D arrays
For 2D arrays, you use [row, column]:
b = np.array([[1, 2, 3],
              [4, 5, 6]])
print(b[0, 0]) # first row, first column -> 1
print(b[1, 2]) # second row, third column -> 6
You can also slice rows and columns:
print(b[0, :]) # first row -> [1 2 3]
print(b[:, 1]) # second column -> [2 5]
print(b[:, 0:2]) # all rows, first two columns
Vectorized Operations (Math with Arrays)
One of NumPy’s biggest strengths is that you can apply operations to whole arrays at once without writing explicit loops.
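To illustrate what "without writing explicit loops" means, the following sketch doubles every value twice: once with a Python loop and once with a single vectorized expression. The prices array and its values are made up for illustration.

```python
import numpy as np

prices = np.array([10.0, 20.0, 30.0])  # illustrative values

# Explicit loop: handle one element at a time
doubled_loop = []
for p in prices:
    doubled_loop.append(p * 2)

# Vectorized: one expression covers the whole array
doubled_vec = prices * 2

print(doubled_loop)
print(doubled_vec)    # [20. 40. 60.]
```

Both approaches produce the same numbers; the vectorized form is shorter and, for large arrays, usually much faster.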
Element-wise operations
When you add, subtract, multiply, or divide arrays of the same shape, NumPy performs the operation element by element.
import numpy as np
a = np.array([1, 2, 3])
b = np.array([10, 20, 30])
print(a + b) # [11 22 33]
print(b - a) # [ 9 18 27]
print(a * b) # [ 10 40 90]
print(b / a) # [10. 10. 10.]
You can also do math with scalars (single numbers):
print(a * 2) # [2 4 6]
print(a + 5) # [6 7 8]
Combining an array with a single number like this relies on broadcasting: NumPy stretches the smaller value (here, the scalar) to match the array's shape before applying the operation.
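Broadcasting is not limited to single numbers. As a minimal sketch, a 1D array can also be stretched across each row of a 2D array when the lengths line up. The names table and offsets are illustrative.

```python
import numpy as np

table = np.array([[1, 2, 3],
                  [4, 5, 6]])      # shape (2, 3)
offsets = np.array([10, 20, 30])   # shape (3,)

# offsets is applied to each row of table
result = table + offsets
print(result)
# [[11 22 33]
#  [14 25 36]]
```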
Common mathematical functions
NumPy provides many functions that work element-wise on arrays:
x = np.array([0, 1, 2, 3])
print(np.sqrt(x)) # square root
print(np.exp(x)) # e^x
print(np.sin(x)) # sine
Each function applies to every element of the array.
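A version with easy-to-check values may help. The sketch below takes the square root of an array of perfect squares (an illustrative input), so each output can be verified by hand.

```python
import numpy as np

squares = np.array([0, 1, 4, 9])
roots = np.sqrt(squares)

print(roots)         # [0. 1. 2. 3.]
print(roots.dtype)   # float64 (the result is floating point even for integer input)
```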
Basic Statistics with NumPy
NumPy makes it easy to compute simple statistics, which is very useful in data science.
Given an array data:
- np.mean(data) – average (mean)
- np.median(data) – median
- np.min(data) / np.max(data) – minimum and maximum
- np.std(data) – standard deviation (spread of values)
- np.sum(data) – sum of all values
Example:
import numpy as np
data = np.array([10, 20, 20, 40, 50])
print("Mean:", np.mean(data))
print("Median:", np.median(data))
print("Min:", np.min(data))
print("Max:", np.max(data))
print("Standard deviation:", np.std(data))
print("Sum:", np.sum(data))
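One detail worth knowing about np.std: by default it computes the population standard deviation (dividing by n). If you want the sample standard deviation (dividing by n - 1), you can pass ddof=1. A short sketch using the same data:

```python
import numpy as np

data = np.array([10, 20, 20, 40, 50])

population_std = np.std(data)          # default ddof=0: divide by n
sample_std = np.std(data, ddof=1)      # ddof=1: divide by n - 1

print("Population std:", population_std)
print("Sample std:", sample_std)
```

The sample value is slightly larger because it divides the same squared deviations by a smaller number.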
For 2D arrays, you can compute along rows or columns by using the axis argument:
- axis=0 – collapses the rows, giving one result per column
- axis=1 – collapses the columns, giving one result per row
m = np.array([[1, 2, 3],
              [4, 5, 6]])
print(np.sum(m)) # sum of all values
print(np.sum(m, axis=0)) # sum per column -> [5 7 9]
print(np.sum(m, axis=1)) # sum per row -> [ 6 15]
Reshaping Arrays
Sometimes you need to change the shape of an array without changing its data. NumPy lets you do this with reshape.
The total number of elements must stay the same. For example, if you have 6 elements, you can reshape into 2x3 or 3x2.
import numpy as np
a = np.array([1, 2, 3, 4, 5, 6])
b = a.reshape((2, 3)) # 2 rows, 3 columns
c = a.reshape((3, 2)) # 3 rows, 2 columns
print(b)
print(c)
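If the requested shape does not match the number of elements, reshape raises a ValueError. The sketch below deliberately asks for an impossible shape to show what that looks like.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])

try:
    a.reshape((4, 2))   # 4 * 2 = 8, but a has only 6 elements
except ValueError as err:
    print("Reshape failed:", err)
```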
You can let NumPy figure out one dimension by using -1:
d = a.reshape((2, -1)) # 2 rows, NumPy decides columns
print(d)
Random Numbers with NumPy
Random numbers are often used in data science for testing, simulation, and splitting data.
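NumPy's random number tools live in the np.random module. Older code calls functions such as np.random.rand directly, while newer code first creates a generator with np.random.default_rng(). This chapter uses the older functions because they are simpler for beginners.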
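For reference, the newer generator style might look like the following sketch. The seed value 42 is an arbitrary choice; fixing a seed makes the sequence repeatable between runs.

```python
import numpy as np

rng = np.random.default_rng(seed=42)   # the seed value 42 is arbitrary

floats = rng.random(5)                 # 5 floats in [0, 1)
ints = rng.integers(0, 10, size=5)     # 5 integers from 0 to 9

print(floats)
print(ints)
```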
Random floats between 0 and 1
import numpy as np
r = np.random.rand(5) # 5 random floats in [0, 1)
print(r)
Random integers
ints = np.random.randint(0, 10, size=5) # 5 integers from 0 to 9
print(ints)
Random 2D arrays
matrix = np.random.rand(2, 3) # 2x3 array of random floats
print(matrix)
A Tiny Data Science Example with NumPy
Here is a small example to show how NumPy might be used in a simple data science task.
Imagine we have the daily temperatures (in °C) for a week:
import numpy as np
temps = np.array([18.5, 20.1, 19.8, 22.0, 21.5, 19.0, 18.0])
print("Average temperature:", np.mean(temps))
print("Maximum temperature:", np.max(temps))
print("Minimum temperature:", np.min(temps))
# Convert all temperatures to Fahrenheit: F = C * 9/5 + 32
temps_f = temps * 9/5 + 32
print("Temperatures in Fahrenheit:", temps_f)
This example shows:
- How to store numeric data in an array
- How to compute simple statistics
- How to apply a formula to all data at once using vectorized operations
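As a quick check on the conversion, you could invert the formula and confirm that you get the original values back. This is a sketch, with temps_back as an illustrative name; np.allclose is used because floating-point arithmetic can introduce tiny rounding differences.

```python
import numpy as np

temps = np.array([18.5, 20.1, 19.8, 22.0, 21.5, 19.0, 18.0])
temps_f = temps * 9/5 + 32

# Invert the formula: C = (F - 32) * 5/9
temps_back = (temps_f - 32) * 5/9

# allclose tolerates tiny floating-point rounding differences
print(np.allclose(temps, temps_back))   # True
```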
Summary
In this chapter, you:
- Learned what NumPy is and how to install/import it
- Created 1D and 2D arrays using np.array, np.zeros, np.ones, np.arange, and np.linspace
- Saw how to inspect array shape and dimensions
- Practiced indexing and slicing, including 2D indexing
- Used vectorized operations for fast math on arrays
- Calculated basic statistics like mean, min, max, and standard deviation
- Reshaped arrays with reshape
- Generated random numbers with np.random
These basics will prepare you to combine NumPy with other libraries like pandas and matplotlib in later chapters.