Understanding Files
In programming, a file is a named collection of data stored on a device (like your hard drive, SSD, or USB stick). Python programs often read from files and write data to them, so understanding what a file is—and how it’s organized—is important before you start working with them in code.
This section focuses on what files are conceptually, not yet on the Python code to use them.
Files as Long-Term Storage
When your program runs, it uses memory (RAM), which is temporary. As soon as the program stops (or the computer is turned off), anything stored only in memory is lost.
A file provides persistent storage:
- Data is saved on disk, not just in memory.
- The data stays there until you delete or change the file.
- You (or other programs) can use the data again later.
Examples of files you already use:
- A .txt document with notes.
- An .mp3 music file.
- A .jpg photo.
- A .py Python script.
From Python’s point of view, all of these are just data stored in files.
File Names and Extensions
Every file has:
- A name, like shopping_list
- Often an extension, like .txt or .csv
Together they form the file name, like:
- shopping_list.txt
- data.csv
- script.py
The extension:
- Is usually the part after the last dot in the file name.
- Gives humans and programs a hint about the file’s format or type.
- Does not determine what the file actually contains; it is a naming convention, so renaming a file does not change the data inside it.
Some common extensions you will likely meet in Python work:
- .txt – plain text
- .csv – comma-separated values (simple table-like data)
- .json – structured data, often used for configuration or web APIs
- .py – Python source code files
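As a small preview, the "part after the last dot" rule can be seen with Python's standard pathlib module. This is only a sketch: the helper split_name and the file names archive.tar.gz and README are made up for illustration and are not part of the original examples.

```python
from pathlib import PurePath


def split_name(file_name):
    """Return (name, extension) for a file name, using pathlib's rules."""
    p = PurePath(file_name)
    # .suffix is the part from the last dot onward (empty if there is no dot);
    # .stem is everything before it.
    return p.stem, p.suffix


# Hypothetical file names, used only to illustrate the convention.
for file_name in ["shopping_list.txt", "data.csv", "archive.tar.gz", "README"]:
    print(file_name, "->", split_name(file_name))
```

Only the last dot counts, so archive.tar.gz has the extension .gz, and a name without a dot has no extension at all.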
Where Files Live: Folders and Paths
Files are organized in folders (also called directories). Folders can contain:
- Files
- Other folders
You can think of it as a tree:
- The top level is often called the root.
- Each folder can have subfolders, and so on.
To uniquely describe where a file is, you use a path.
Absolute vs Relative Paths (Concept Only)
Without going into Python code yet, you will see two general ideas:
- Absolute path: Describes the location from the root of the file system.
  Example (Windows): C:\Users\Alice\Documents\notes.txt
  Example (macOS/Linux): /home/alice/Documents/notes.txt
- Relative path: Describes the location starting from “where you are now” (the current folder).
  Example: notes.txt (the file is in the current folder)
  Example: data/values.csv (inside a data subfolder)
You will use these paths in Python when opening files.
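For a first impression of how this looks in code, the sketch below uses the standard pathlib module to check whether the example paths are absolute or relative. It only inspects the path text and never touches the disk. The folder /home/alice/project is a hypothetical "current folder" chosen for illustration.

```python
from pathlib import PurePosixPath, PureWindowsPath

# The example paths from above, written as path objects.
windows_abs = PureWindowsPath(r"C:\Users\Alice\Documents\notes.txt")
posix_abs = PurePosixPath("/home/alice/Documents/notes.txt")
relative = PurePosixPath("data/values.csv")

print(windows_abs.is_absolute())  # True
print(posix_abs.is_absolute())    # True
print(relative.is_absolute())     # False

# A relative path only becomes a full location once it is combined with
# a starting folder. This starting folder is an assumption for the example.
current_folder = PurePosixPath("/home/alice/project")
full_path = current_folder / relative
print(full_path)  # /home/alice/project/data/values.csv
```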
How Computers See Files: Bytes
Inside, a file is just a sequence of bytes.
- A byte is a small unit of data (8 bits).
- Every type of file—text, image, audio—is stored as bytes.
- The meaning of those bytes depends on the file format.
From a programming perspective, you can think of a file as:
- An ordered sequence:
  $$ b_1, b_2, b_3, \dots, b_n $$
- Where each $b_i$ is a byte.
Python can read or write these bytes directly or interpret them as text using an encoding (like UTF-8).
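The link between characters and bytes can be sketched without any file at all. The short string below is an arbitrary example; it is encoded to UTF-8 bytes and then decoded back.

```python
# An arbitrary example string; \u00e9 is the character "e with acute accent".
text = "Hi \u00e9"

# Encoding turns characters into bytes.
data = text.encode("utf-8")
byte_values = list(data)
print(byte_values)            # [72, 105, 32, 195, 169]

# Four characters become five bytes, because the accented letter
# needs two bytes in UTF-8.
print(len(text), len(data))   # 4 5

# Decoding with the same encoding recovers the original text.
round_trip = data.decode("utf-8")
print(round_trip == text)     # True
```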
Text Files vs Binary Files
A very important distinction for programming is between text files and binary files.
Text Files
Text files store human-readable text. They contain characters like letters, digits, punctuation, and special characters (spaces, newlines, etc.), all represented using an encoding.
Common examples:
- .txt – plain text
- .py – Python code
- .csv – table-like data, where lines and commas separate values
- .json – structured text data
Conceptually:
- The file is a sequence of characters.
- An encoding (such as UTF-8) turns each character into bytes and back.
When Python works with text files, it typically:
- Reads bytes from the file.
- Decodes them into characters using an encoding.
Binary Files
Binary files store data that is not meant to be read as plain text. The bytes represent other kinds of information, such as:
- Pixel data in an image (.png, .jpg)
- Sound samples in an audio file (.mp3, .wav)
- Compiled programs (.exe)
- Some specialized data formats
When a human opens a binary file in a text editor, it usually looks like random characters, because the bytes are not meant to map cleanly to readable text.
When Python works with binary files, it:
- Reads and writes raw bytes directly.
- Does not try to interpret them as characters.
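As a preview of the difference, the sketch below reads the same file twice: once in text mode, where Python decodes the bytes into characters, and once in binary mode, where it returns the raw bytes. The file name greeting.txt and its contents are made up for this example, and the file lives in a temporary folder that is removed afterwards.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "greeting.txt"  # hypothetical file name

    # Create a small text file to read back.
    with open(path, "w", encoding="utf-8") as f:
        f.write("caf\u00e9")

    # Text mode ("r"): bytes are decoded into characters.
    with open(path, "r", encoding="utf-8") as f:
        as_text = f.read()

    # Binary mode ("rb"): raw bytes, no decoding.
    with open(path, "rb") as f:
        as_bytes = f.read()

print(type(as_text).__name__, len(as_text))    # str 4
print(type(as_bytes).__name__, len(as_bytes))  # bytes 5
print(as_bytes)                                # b'caf\xc3\xa9'
```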
How Programs Use Files
Most interactions with a file follow the same general pattern:
- Open the file.
- Read from it or write to it.
- Close the file.
Conceptually, when a file is open, Python has a connection to that file on disk. You then:
- Read its contents (e.g., to process saved data).
- Write new data (e.g., to log results, save a configuration, store user input).
The details of how to open, read, write, and close files are handled in later sections of this chapter. For now, the key idea is that:
- A file is an external, named storage of data that your program can use over multiple runs.
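The open, read or write, close pattern can be sketched as follows. This is only a preview under simple assumptions: the file name notes.txt and its contents are invented, and a temporary folder is used so that nothing is left behind on disk.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "notes.txt"  # hypothetical file name

    # Open the file for writing, write to it, then close it explicitly.
    out_file = open(path, "w", encoding="utf-8")
    out_file.write("remember the milk")
    out_file.close()

    # Open it again for reading. The with-statement closes the file
    # automatically when the indented block ends.
    with open(path, "r", encoding="utf-8") as in_file:
        contents = in_file.read()

print(contents)        # remember the milk
print(in_file.closed)  # True
```

The with-statement form is generally preferred, since the file is closed even if an error occurs inside the block.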
Files and Data Organization
Files are one of the simplest ways to organize data:
- You can use one file per type of data, such as:
  - users.txt for user names
  - settings.json for configuration
- You can use folders to group related files:
  - data/2024/sales.csv
  - data/2024/customers.csv
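A folder layout like this can be built and inspected with pathlib. The sketch below creates the folder structure inside a temporary directory, so it does not touch your real files; the CSV contents are placeholder header lines invented for the example.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)

    # Create the nested folders data/2024 in one step.
    year_dir = base / "data" / "2024"
    year_dir.mkdir(parents=True)

    # Placeholder contents, only so that the files exist.
    (year_dir / "sales.csv").write_text("item,amount\n", encoding="utf-8")
    (year_dir / "customers.csv").write_text("name\n", encoding="utf-8")

    # Walk the tree and collect every .csv file, relative to the base folder.
    found = sorted(p.relative_to(base).as_posix() for p in base.rglob("*.csv"))

print(found)  # ['data/2024/customers.csv', 'data/2024/sales.csv']
```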
Later, you might use more advanced storage (like databases), but these typically still rely on files underneath. Knowing what a file is and how it behaves is the foundation.
Why Files Matter in Python
Many practical Python programs involve files, for example:
- Loading input data from a text or CSV file.
- Saving the results of a calculation.
- Keeping logs of what your program did.
- Reading configuration from JSON files.
- Storing small amounts of data without needing a full database.
All of those tasks rely on the basic concept:
- A file is a named, persistent sequence of bytes on disk that your program can read from and write to.
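One of the tasks listed above, keeping configuration in a JSON file, can be sketched with the standard json module. The settings dictionary and the file name settings.json are illustrative assumptions, and a temporary folder stands in for a real project folder.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical settings, invented for this example.
settings = {"theme": "dark", "font_size": 12}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "settings.json"

    # Save the settings as JSON text.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(settings, f)

    # Load them again, as a later run of the program might do.
    with open(path, "r", encoding="utf-8") as f:
        loaded = json.load(f)

print(loaded == settings)  # True
```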
In the next sections of this chapter, you will see how to use Python to read existing files, create new ones, and work with their contents.