5.3 Image Layers Explained

Understanding Image Layers

Docker images are made of layers stacked on top of each other. Each layer stores a change to the filesystem, and together these changes form a complete image. When you run a container, Docker combines all these layers into a single view that looks like one filesystem.

Layers let Docker avoid storing the same data multiple times. If several images share a common base layer, Docker keeps that layer only once on your machine. This saves disk space and also speeds up many operations.

How Layers Are Created

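Every time you build an image from a Dockerfile, Docker creates new layers from certain instructions. In most cases, each instruction that changes the filesystem becomes its own layer. For example, consider this simple Dockerfile:

# Start from the official Ubuntu base image and its existing layers
FROM ubuntu:22.04
RUN apt-get update
# ubuntu:22.04 does not ship Python, so install it alongside curl
RUN apt-get install -y curl python3
COPY app/ /usr/src/app/
# Assumes app.py sits at the top of the copied app/ directory
CMD ["python3", "/usr/src/app/app.py"]

Docker turns these instructions into an ordered set of layers.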

The base image line, the FROM instruction, defines the starting layers. The official ubuntu:22.04 image is itself made of multiple layers maintained by its authors. Your Dockerfile then adds more layers on top.

Each RUN that installs packages or modifies files writes a new layer. The COPY instruction that brings your code into the image also creates a layer. The final CMD instruction does not change the filesystem, so Docker records it as image metadata rather than as a filesystem layer. The result is an ordered chain of layers, one for each step of your build that changed the filesystem.
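The split between layer-creating and metadata-only instructions can be sketched in a few lines of Python. This is a simplified model, not how Docker itself parses a Dockerfile: the `classify` function and the `LAYER_INSTRUCTIONS` set are hypothetical names, and the model assumes that only RUN, COPY, and ADD produce filesystem layers.

```python
# Simplified assumption: only these instructions write a filesystem layer.
LAYER_INSTRUCTIONS = {"RUN", "COPY", "ADD"}

# A Dockerfile shaped like the example in this section.
DOCKERFILE = """\
FROM ubuntu:22.04
RUN apt-get update
RUN apt-get install -y curl python3
COPY app/ /usr/src/app/
CMD ["python3", "/usr/src/app/app.py"]
"""


def classify(dockerfile: str) -> list[tuple[str, str]]:
    """Return (instruction, kind) pairs; kind is 'base', 'layer', or 'metadata'."""
    result = []
    for line in dockerfile.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        keyword = line.split(maxsplit=1)[0].upper()
        if keyword == "FROM":
            kind = "base"  # brings in the base image's existing layers
        elif keyword in LAYER_INSTRUCTIONS:
            kind = "layer"
        else:
            kind = "metadata"
        result.append((keyword, kind))
    return result


for keyword, kind in classify(DOCKERFILE):
    print(f"{keyword:<5} -> {kind}")
```

In this model, the two RUN lines and the COPY line are reported as layers, while CMD is reported as metadata.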

The Union Filesystem and the Layer Stack

To make layered images usable, Docker uses a union filesystem (on Linux, typically OverlayFS through the overlay2 storage driver). It presents multiple read-only layers and one writable layer as if they were a single filesystem.

At the bottom of the stack are the base layers from the FROM image. On top of those are your custom layers from RUN, COPY and similar instructions. When a container starts, Docker adds one more layer on top, the container layer, which is writable.

Reads are resolved from the top down. Docker uses the version of a file found in the highest layer that contains it, and looks further down the stack only if the layers above do not have the file. Writes always go into the container’s writable layer, never into the read-only image layers.

This design ensures that many containers can share the same image layers. Each container only needs its own thin writable layer for any runtime changes.
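The lookup and write rules above can be imitated with Python's `collections.ChainMap`, which searches a list of dictionaries in order and sends all writes to the first one. This is only an analogy for the layer stack, assuming each layer is a dictionary from path to file contents; the layer names and file contents are made up for illustration.

```python
from collections import ChainMap

# Each layer is modelled as a dict mapping a path to file contents.
base_layer = {"/etc/os-release": "Ubuntu 22.04", "/bin/sh": "shell binary"}
app_layer = {"/usr/src/app/app.py": "print('v1')"}

# Two containers started from the same image: each gets its own writable
# layer on top, while the image layers underneath are shared objects.
writable_a = {}
writable_b = {}
container_a = ChainMap(writable_a, app_layer, base_layer)  # topmost layer first
container_b = ChainMap(writable_b, app_layer, base_layer)

# A read falls through the stack until some layer has the path.
print(container_a["/bin/sh"])

# A write lands only in the first map, the container's writable layer.
container_a["/tmp/scratch.txt"] = "runtime data"

print("/tmp/scratch.txt" in container_a)  # visible in container A
print("/tmp/scratch.txt" in container_b)  # not visible in container B
print(app_layer, base_layer)              # image layers are unchanged
```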

Copy on Write Behavior

Image layers are read-only, and each container gets its own writable layer. When a container modifies a file that originally came from an image layer, Docker uses a technique called copy-on-write.

When the container first writes to a file, Docker copies that file from the lower, read-only layer into the container’s writable layer. The container then changes this copy. From that point on, the container sees only its modified version. Other containers that use the same image still see the original file because they have their own writable layers.

If you delete a file inside a container, Docker cannot remove it from the original image layer. Instead, it records a marker, often called a whiteout, in the container’s writable layer that hides the file. The file still exists in the underlying image layer on disk, but it is not visible inside that container.

This behavior is important for understanding why containers can diverge from the images they were created from, even though they share common layers beneath.
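A toy model can make the copy-up and hiding steps concrete. The `Container` class and `WHITEOUT` sentinel below are hypothetical and work on in-memory dictionaries; real storage drivers operate on files and directories, so treat this as a sketch of the idea rather than of Docker's implementation.

```python
WHITEOUT = object()  # sentinel standing in for a deletion marker


class Container:
    def __init__(self, image_layers):
        self.image_layers = image_layers  # shared, never modified; topmost first
        self.writable = {}                # private to this container

    def read(self, path):
        if path in self.writable:
            entry = self.writable[path]
            if entry is WHITEOUT:
                raise FileNotFoundError(path)  # hidden by a deletion marker
            return entry
        for layer in self.image_layers:
            if path in layer:
                return layer[path]
        raise FileNotFoundError(path)

    def append(self, path, text):
        # Copy the current contents up into the writable layer, then change
        # the private copy. The image layer is left as it was.
        original = self.read(path)
        self.writable[path] = original + text

    def delete(self, path):
        self.read(path)                  # raises if the file is not visible
        self.writable[path] = WHITEOUT   # hide it instead of removing it


image = [{"/etc/motd": "hello\n"}]
first = Container(image)
second = Container(image)

first.append("/etc/motd", "edited by first\n")
print(repr(first.read("/etc/motd")))   # the modified private copy
print(repr(second.read("/etc/motd")))  # still the original from the image

second.delete("/etc/motd")
print("/etc/motd" in image[0])         # the image layer still holds the file
```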

Layer Caching During Builds

When you run docker build, Docker tries to reuse existing layers from previous builds. This is called layer caching and it can speed up builds dramatically if your Dockerfile is structured well.

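For each instruction, Docker checks whether it has already built the same step on top of the same parent layer. For COPY and ADD, it also compares checksums of the files being copied. For RUN, it compares only the command text, not what the command would produce. If nothing relevant has changed, Docker reuses the previously built layer. If something has changed, Docker must rebuild that instruction and all instructions that follow it.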

The cache works top to bottom. Once a layer cannot be reused, every subsequent layer in the Dockerfile is rebuilt. For example, if you change something in a RUN instruction near the top, all later steps will also be rebuilt and cached again.

Important rule: As soon as Docker cannot use the cache for one Dockerfile instruction, it rebuilds every instruction after it, even if those later instructions have not changed.

This rule explains why the order of instructions in a Dockerfile has such a strong effect on build performance. It also explains why putting frequently changing content, such as your application source code, in later instructions makes builds much faster.
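The invalidation rule and the effect of instruction order can be simulated with chained cache keys. The sketch below assumes a much simpler keying scheme than Docker really uses: each key is a hash of the parent key, the instruction text, and a checksum of any copied files. The function names and the `src-v1` / `src-v2` checksums are hypothetical.

```python
import hashlib


def cache_key(parent_key, instruction, file_checksum):
    # The parent key is part of the hash, so a change in any earlier step
    # changes the key of every later step as well.
    payload = "\n".join([parent_key, instruction, file_checksum])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build(base, steps, cache):
    """Return 'CACHED' or 'BUILT' for each (instruction, file_checksum) step."""
    report = []
    parent = base
    for instruction, file_checksum in steps:
        key = cache_key(parent, instruction, file_checksum)
        if key in cache:
            report.append("CACHED")
        else:
            cache.add(key)
            report.append("BUILT")
        parent = key
    return report


def dependencies_first(src):
    return [("RUN apt-get update", ""),
            ("RUN apt-get install -y curl", ""),
            ("COPY app/ /usr/src/app/", src)]


def source_first(src):
    return [("COPY app/ /usr/src/app/", src),
            ("RUN apt-get update", ""),
            ("RUN apt-get install -y curl", "")]


# Source code copied last: editing the code only rebuilds the COPY step.
cache = set()
build("ubuntu:22.04", dependencies_first("src-v1"), cache)
late_copy = build("ubuntu:22.04", dependencies_first("src-v2"), cache)
print(late_copy)

# Source code copied first: the same edit forces every step to rebuild.
cache = set()
build("ubuntu:22.04", source_first("src-v1"), cache)
early_copy = build("ubuntu:22.04", source_first("src-v2"), cache)
print(early_copy)
```

With the source copied last, only the final step is rebuilt after a code change. With the source copied first, the package installation steps are rebuilt too, even though their text did not change.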

Layer Reuse Across Images

Layers are identified by a content-based hash (a SHA-256 digest). If two different images share a layer with the same content, Docker stores that layer once and both images reference it.

For example, if you have several images based on node:18, they share all the node:18 layers. Docker will only need to pull those layers from a registry once. Further pulls of other node:18 based images will be faster and use less network traffic because only the new layers on top have to be downloaded.

This reuse applies not only across your own images, but also across official and third-party images, as long as they use the exact same layer contents. It is one of the main reasons container images are efficient for distribution.
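Content-based identification can be illustrated with a small content-addressed store. The sketch below assumes each layer is just a byte string and that an image is a list of layer digests; the `registry`, `local_store`, and `pull` names, as well as the layer contents, are made up for illustration and do not mirror the real registry protocol.

```python
import hashlib


def digest(content: bytes) -> str:
    """Identify a layer by the SHA-256 hash of its content."""
    return "sha256:" + hashlib.sha256(content).hexdigest()


# Two hypothetical images that share the same two base layers.
shared_base = [b"operating system files", b"node runtime files"]
api_image = shared_base + [b"api source code"]
web_image = shared_base + [b"web source code"]

# The registry holds every layer, keyed by digest.
registry = {digest(layer): layer for layer in api_image + web_image}


def pull(manifest, registry, local_store):
    """Download only the layers whose digest is not stored locally yet."""
    downloaded = []
    for layer_digest in manifest:
        if layer_digest in local_store:
            continue  # already present, nothing to transfer
        local_store[layer_digest] = registry[layer_digest]
        downloaded.append(layer_digest)
    return downloaded


local_store = {}
first_pull = pull([digest(layer) for layer in api_image], registry, local_store)
second_pull = pull([digest(layer) for layer in web_image], registry, local_store)

print(len(first_pull))   # all three layers of the first image
print(len(second_pull))  # only the one layer that is new
print(len(local_store))  # four unique layers stored for two images
```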

Layers and Image Size

Each layer adds to the final image size. The total size of an image is the sum of the sizes of its layers. The extra disk space an image needs on your machine can be smaller, because layers that are already stored locally are not stored again. If several layers contain large files, the image becomes large.

Removing a file in a later layer does not reduce the size of earlier layers. The file data remains in the lower layer. The upper layer only records that the file should no longer appear. This subtle behavior means that poorly structured Dockerfiles can easily lead to images that are larger than expected.
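The arithmetic behind this can be shown with a toy calculation. The layer contents and sizes below (in megabytes) are invented for illustration, and a deleted file is modelled as a `DELETED` entry that carries no data; none of this reflects the on-disk format Docker uses.

```python
DELETED = None  # stands in for the marker that hides a file; it holds no data


def image_size(layers):
    """Sum the file data stored in every layer, hidden or not."""
    return sum(size for layer in layers
               for size in layer.values() if size is not DELETED)


def visible_size(layers):
    """Sum only the files that are visible in the merged filesystem."""
    merged = {}
    for layer in layers:  # bottom layer first, upper layers override
        merged.update(layer)
    return sum(size for size in merged.values() if size is not DELETED)


# Download in one step, then install and delete the archive in a later step.
two_steps = [
    {"/tmp/big.tar": 500},
    {"/usr/bin/tool": 20, "/tmp/big.tar": DELETED},
]

# Download, install, and delete within a single step: the archive never
# ends up in any layer.
one_step = [
    {"/usr/bin/tool": 20},
]

print(image_size(two_steps), visible_size(two_steps))
print(image_size(one_step), visible_size(one_step))
```

Both variants show the same 20 MB of visible files, but the two-step variant still carries the 500 MB archive in its lower layer.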

You will explore methods to control and reduce image size in the chapter dedicated to image size optimization. Here, it is enough to understand that each layer is immutable once built and that the way you write your instructions determines how much data ends up frozen into each layer.

Inspecting Layers in Practice

Docker lets you see information about layers to help you understand how an image is built. The docker image inspect command prints an image's metadata, including the list of layer digests under RootFS.Layers, and docker history lists the instruction that produced each layer along with its size. More advanced tools and registry interfaces can also show how many layers an image has and how large they are.

When you pull an image, Docker prints the layers as they download, one line per layer. Layers that are already present locally are reported as "Already exists" instead of being downloaded again. These messages make the layering mechanism visible and show when your system is reusing existing content.
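Layer information can also be read programmatically. The sketch below wraps the `docker image inspect` command, which prints a JSON array with a `RootFS.Layers` list for each image. The helper names `layer_digests` and `inspect_layers` are hypothetical, and the second function assumes the Docker CLI is installed and the image is already present locally.

```python
import json
import subprocess


def layer_digests(inspect_output: str) -> list[str]:
    """Extract the layer digests from `docker image inspect` JSON output."""
    images = json.loads(inspect_output)  # the command prints a JSON array
    return images[0]["RootFS"]["Layers"]


def inspect_layers(image: str) -> list[str]:
    """Run `docker image inspect IMAGE` and return its layer digests."""
    completed = subprocess.run(
        ["docker", "image", "inspect", image],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the image is unknown
    )
    return layer_digests(completed.stdout)
```

On a machine with Docker available, `inspect_layers("ubuntu:22.04")` would return one digest per layer of that image, bottom layer first.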

By understanding how image layers work, you gain the foundation needed to design faster builds, smaller images, and more efficient use of storage in later chapters.
