
Introduction to NumPy

What Is NumPy?

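NumPy is a popular Python library for working with numbers and arrays of data. It is especially important in data science because it lets you store data in arrays, run math on whole arrays at once without writing explicit loops, and compute common statistics with a single function call.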

In most code, NumPy is imported like this:

import numpy as np

You will see np used a lot as the short name.


Installing and Importing NumPy

If NumPy is not already installed, you can install it using pip in the terminal:

pip install numpy

Then, in your Python script or interactive session:

import numpy as np

Now you can use everything in NumPy with the np. prefix, such as np.array(), np.mean(), and more.


NumPy Arrays vs Python Lists

Python already has lists, so why use NumPy arrays?

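Key differences: arithmetic on a NumPy array is applied element by element to the whole array, while the same operators on a list concatenate or repeat it. Arrays can also have several dimensions, with a shape you can inspect and change.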

Example: a Python list vs a NumPy array:

# Python list
numbers_list = [1, 2, 3, 4]
# NumPy array
import numpy as np
numbers_array = np.array([1, 2, 3, 4])
print(numbers_list)
print(numbers_array)

Both may look similar when printed, but they behave differently in math.
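To make the difference in math concrete, here is a minimal sketch that applies the same operator to the list and to the array from the example above. The names doubled_list and doubled_array are only illustrative.

```python
import numpy as np

numbers_list = [1, 2, 3, 4]
numbers_array = np.array([1, 2, 3, 4])

# Multiplying a list by 2 repeats the list
doubled_list = numbers_list * 2
print(doubled_list)    # [1, 2, 3, 4, 1, 2, 3, 4]

# Multiplying an array by 2 multiplies every element
doubled_array = numbers_array * 2
print(doubled_array)   # [2 4 6 8]
```

The same goes for +: two lists are joined end to end, while two arrays of the same shape are added element by element.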


Creating NumPy Arrays

From Python lists

The most basic way to create an array is from a Python list:

import numpy as np
a = np.array([1, 2, 3, 4])
print(a)
print(type(a))  # <class 'numpy.ndarray'>

A NumPy array has the type numpy.ndarray (often just called an “ndarray”).

Multi-dimensional arrays

You can create 2D arrays (like a table or matrix) with a list of lists:

b = np.array([[1, 2, 3],
              [4, 5, 6]])
print(b)

This represents 2 rows and 3 columns.

Arrays filled with zeros or ones

NumPy can quickly create arrays filled with zeros or ones. This is useful when you need a starting array.

zeros = np.zeros(5)        # 1D array of length 5
ones = np.ones((2, 3))     # 2 rows, 3 columns
print(zeros)
print(ones)

Arrays with a range of numbers

To create sequences of numbers:

x = np.arange(0, 10, 2)      # 0, 2, 4, 6, 8
y = np.linspace(0, 1, 5)     # 0. , 0.25, 0.5 , 0.75, 1.
print(x)
print(y)
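The two functions treat the end value differently, which is easy to miss. The sketch below uses the same start and stop for both to show this. The names stepped and spaced are illustrative only.

```python
import numpy as np

# arange(start, stop, step): you choose the step, and stop is excluded
stepped = np.arange(0, 10, 2)
print(stepped)        # [0 2 4 6 8]
print(len(stepped))   # 5

# linspace(start, stop, num): you choose how many values, and stop is included
spaced = np.linspace(0, 10, 5)
print(spaced.tolist())   # [0.0, 2.5, 5.0, 7.5, 10.0]
print(len(spaced))       # 5
```

As a rule of thumb, arange fits when the step size matters and linspace fits when the number of points matters.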

Array Shape and Dimensions

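NumPy arrays can have different numbers of dimensions: a 1D array is a simple sequence of values, a 2D array is a table with rows and columns, and arrays with three or more dimensions are also possible.

Useful attributes are shape (the length of each dimension, as a tuple), ndim (the number of dimensions), and size (the total number of elements):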

import numpy as np
a = np.array([1, 2, 3, 4])
b = np.array([[1, 2, 3],
              [4, 5, 6]])
print(a.shape)  # (4,)
print(a.ndim)   # 1
print(b.shape)  # (2, 3)
print(b.ndim)   # 2
print(b.size)   # 6
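The same attributes work for arrays with more than two dimensions. As a small sketch, here is a 3D array built with np.zeros (the name c is just an example):

```python
import numpy as np

# 2 blocks, each with 3 rows and 4 columns
c = np.zeros((2, 3, 4))

print(c.ndim)    # 3
print(c.shape)   # (2, 3, 4)
print(c.size)    # 24, which is 2 * 3 * 4

# ndim is always the length of the shape tuple
print(len(c.shape) == c.ndim)   # True
```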

Basic Indexing and Slicing

NumPy indexing is similar to Python lists, but also works in multiple dimensions.

Indexing 1D arrays

a = np.array([10, 20, 30, 40, 50])
print(a[0])   # first element: 10
print(a[2])   # third element: 30
print(a[-1])  # last element: 50

Slicing 1D arrays

Use [start:stop] (stop is not included), with an optional step written as [start:stop:step]:

print(a[1:4])   # elements at indices 1, 2, 3 -> [20 30 40]
print(a[:3])    # from start to index 2 -> [10 20 30]
print(a[2:])    # from index 2 to end -> [30 40 50]
print(a[::2])   # every 2nd element -> [10 30 50]

Indexing 2D arrays

For 2D arrays, you use [row, column]:

b = np.array([[1, 2, 3],
              [4, 5, 6]])
print(b[0, 0])  # first row, first column -> 1
print(b[1, 2])  # second row, third column -> 6

You can also slice rows and columns:

print(b[0, :])   # first row -> [1 2 3]
print(b[:, 1])   # second column -> [2 5]
print(b[:, 0:2]) # all rows, first two columns
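The following sketch shows what the last slice returns and how the result's shape depends on whether you use a single index or a slice. The name first_two_cols is illustrative.

```python
import numpy as np

b = np.array([[1, 2, 3],
              [4, 5, 6]])

# All rows, first two columns
first_two_cols = b[:, 0:2]
print(first_two_cols)         # [[1 2]
                              #  [4 5]]
print(first_two_cols.shape)   # (2, 2)

# A single index drops that dimension; a slice keeps it
print(b[:, 1].shape)     # (2,)   -> 1D array
print(b[:, 1:2].shape)   # (2, 1) -> still 2D, with one column
```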

Vectorized Operations (Math with Arrays)

One of NumPy’s biggest strengths is that you can apply operations to whole arrays at once without writing explicit loops.

Element-wise operations

When you add, subtract, multiply, or divide arrays of the same shape, NumPy performs the operation element by element.

import numpy as np
a = np.array([1, 2, 3])
b = np.array([10, 20, 30])
print(a + b)   # [11 22 33]
print(b - a)   # [ 9 18 27]
print(a * b)   # [ 10  40  90]
print(b / a)   # [10. 10. 10.]

You can also do math with scalars (single numbers):

print(a * 2)   # [2 4 6]
print(a + 5)   # [6 7 8]

This is called broadcasting: the smaller value (here a single number) is stretched to match the shape of the array before the operation is applied.
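Broadcasting is not limited to single numbers. As a small sketch, a 1D array can be combined with a 2D array when its length matches the number of columns. The names m, row, and result are illustrative.

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])        # shape (2, 3)
row = np.array([10, 20, 30])     # shape (3,)

# row is stretched across both rows of m
result = m + row
print(result)   # [[11 22 33]
                #  [14 25 36]]
```

If the shapes cannot be matched this way (for example, a row of length 2 against 3 columns), NumPy raises a ValueError instead of guessing.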

Common mathematical functions

NumPy provides many functions that work element-wise on arrays:

x = np.array([0, 1, 2, 3])
print(np.sqrt(x))   # square root
print(np.exp(x))    # e^x
print(np.sin(x))    # sine

Each function applies to every element of the array.
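For comparison, here is a sketch of the same square-root calculation written once with an explicit loop over the standard library's math.sqrt and once with np.sqrt. The input values are chosen so the results are whole numbers.

```python
import math
import numpy as np

values = np.array([0, 1, 4, 9])

# Explicit loop: call math.sqrt on one element at a time
roots_loop = [math.sqrt(v) for v in values]

# Vectorized: one call handles the whole array
roots_np = np.sqrt(values)

print(roots_loop)   # [0.0, 1.0, 2.0, 3.0]
print(roots_np)     # [0. 1. 2. 3.]
```

Both give the same numbers, but the NumPy version is shorter and returns an array that is ready for further array operations.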


Basic Statistics with NumPy

NumPy makes it easy to compute simple statistics, which is very useful in data science.

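Given an array data, functions such as np.mean(data), np.median(data), np.min(data), np.max(data), np.std(data), and np.sum(data) each return the corresponding statistic.

Example: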

import numpy as np
data = np.array([10, 20, 20, 40, 50])
print("Mean:", np.mean(data))
print("Median:", np.median(data))
print("Min:", np.min(data))
print("Max:", np.max(data))
print("Standard deviation:", np.std(data))
print("Sum:", np.sum(data))

For 2D arrays, you can compute along rows or columns by using the axis argument:

m = np.array([[1, 2, 3],
              [4, 5, 6]])
print(np.sum(m))           # sum of all values
print(np.sum(m, axis=0))   # sum per column -> [5 7 9]
print(np.sum(m, axis=1))   # sum per row    -> [ 6 15]
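The axis argument works the same way for the other statistics functions. A short sketch with np.mean and np.max on the same array (the names col_means, row_means, and row_max are illustrative):

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

col_means = np.mean(m, axis=0)   # one mean per column
row_means = np.mean(m, axis=1)   # one mean per row
row_max = np.max(m, axis=1)      # largest value in each row

print(col_means)   # [2.5 3.5 4.5]
print(row_means)   # [2. 5.]
print(row_max)     # [3 6]
```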

Reshaping Arrays

Sometimes you need to change the shape of an array without changing its data. NumPy lets you do this with reshape.

The total number of elements must stay the same. For example, if you have 6 elements, you can reshape into 2x3 or 3x2.

import numpy as np
a = np.array([1, 2, 3, 4, 5, 6])
b = a.reshape((2, 3))   # 2 rows, 3 columns
c = a.reshape((3, 2))   # 3 rows, 2 columns
print(b)
print(c)

You can let NumPy figure out one dimension by using -1:

d = a.reshape((2, -1))  # 2 rows, NumPy decides columns
print(d)
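To illustrate the rule that the total number of elements must stay the same, this sketch checks the shape that -1 produces and then tries a reshape that cannot work. The flag reshape_failed exists only for this demonstration.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])

# 6 elements in 2 rows -> NumPy works out 3 columns
d = a.reshape((2, -1))
print(d.shape)   # (2, 3)

# 4 x 2 would need 8 elements, but a has only 6
reshape_failed = False
try:
    a.reshape((4, 2))
except ValueError as err:
    reshape_failed = True
    print("Cannot reshape:", err)
```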

Random Numbers with NumPy

Random numbers are often used in data science for testing, simulation, and splitting data.

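NumPy has a random module, np.random. It offers older functions that you call directly, such as np.random.rand, and a newer style in which you first create a generator with np.random.default_rng(). For beginners, the older functions are simpler: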

Random floats between 0 and 1

import numpy as np
r = np.random.rand(5)      # 5 random floats in [0, 1)
print(r)

Random integers

ints = np.random.randint(0, 10, size=5)  # 5 integers from 0 to 9
print(ints)

Random 2D arrays

matrix = np.random.rand(2, 3)  # 2x3 array of random floats
print(matrix)
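For reference, here is a sketch of the same three tasks in the newer style based on np.random.default_rng. The seed value 42 is an arbitrary choice made here so that the numbers are the same on every run.

```python
import numpy as np

# Create a generator; the seed (42, chosen arbitrarily) makes results repeatable
rng = np.random.default_rng(seed=42)

floats = rng.random(5)                 # 5 random floats in [0, 1)
ints = rng.integers(0, 10, size=5)     # 5 integers from 0 to 9
matrix = rng.random((2, 3))            # 2x3 array of random floats

print(floats)
print(ints)
print(matrix)
```

Note that rng.random takes the shape as a single tuple, whereas np.random.rand takes each dimension as a separate argument.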

A Tiny Data Science Example with NumPy

Here is a small example to show how NumPy might be used in a simple data science task.

Imagine we have the daily temperatures (in °C) for a week:

import numpy as np
temps = np.array([18.5, 20.1, 19.8, 22.0, 21.5, 19.0, 18.0])
print("Average temperature:", np.mean(temps))
print("Maximum temperature:", np.max(temps))
print("Minimum temperature:", np.min(temps))
# Convert all temperatures to Fahrenheit: F = C * 9/5 + 32
temps_f = temps * 9/5 + 32
print("Temperatures in Fahrenheit:", temps_f)

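This example shows how to store measurements in an array, summarize them with np.mean, np.max, and np.min, and convert every value at once with a single vectorized formula.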

Summary

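In this chapter, you installed and imported NumPy, created arrays in several ways, inspected their shape and dimensions, indexed and sliced them, applied vectorized operations, computed basic statistics, reshaped arrays, and generated random numbers.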

These basics will prepare you to combine NumPy with other libraries like pandas and matplotlib in later chapters.
