7.5.4 Namespaces

Isolating the World with Namespaces

Linux namespaces give the kernel a way to make one system look like many smaller, isolated systems, all running on the same machine. Each process can see its own version of certain global resources, such as process IDs, network interfaces, or mounted filesystems. This idea is at the heart of containers on Linux, but namespaces are a general kernel feature that can be used without any container runtime.

At a high level, a namespace limits the visibility of some resource to a particular group of processes. When a process is placed into a namespace, what it sees for that resource can differ entirely from what processes outside the namespace see. The same kernel manages everything, but each namespace presents its own "view of the world" to its members.

Namespaces isolate views of resources among groups of processes. They do not, by themselves, guarantee security or resource limits. Combine namespaces with other features such as cgroups, seccomp filters, and mandatory access control (MAC) systems like SELinux or AppArmor for stronger isolation.

This chapter focuses on the specific types of namespaces, how they change a process's view of the system, and the basic kernel interfaces used to work with them. Details of container tooling and resource control belong in other chapters.

Core Namespace Concepts

A namespace is identified inside the kernel by an internal object, but from user space you typically see it via special files under /proc and use system calls to create or join namespaces.

Each process has a set of namespace memberships, one per namespace type. Conceptually, you can imagine:

$$
\text{Process} = \{ \text{PID namespace}, \text{Mount namespace}, \text{UTS namespace}, \dots \}
$$

Two processes are in the same namespace for a given resource if they share the same namespace object for that resource. They may share some namespaces and differ in others. For example, they might share the same mount and network namespaces but have different PID namespaces.
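The sharing rule can be sketched as a toy model. Nothing here touches the kernel: the Namespace and Process classes and the shared_kinds helper are hypothetical names chosen for illustration, and object identity stands in for "same namespace object".

```python
from itertools import count

_next_id = count(1)  # hypothetical stand-in for a kernel namespace identity


class Namespace:
    """One namespace object of a given kind (illustrative, not a kernel type)."""

    def __init__(self, kind):
        self.kind = kind
        self.ident = next(_next_id)


class Process:
    """A process modeled as a mapping from namespace kind to namespace object."""

    def __init__(self, name, namespaces):
        self.name = name
        self.namespaces = dict(namespaces)  # kind -> Namespace

    def clone(self, name, new_kinds=()):
        # The child shares every namespace object with its parent,
        # except for the kinds that are explicitly requested as new.
        child = Process(name, self.namespaces)
        for kind in new_kinds:
            child.namespaces[kind] = Namespace(kind)
        return child


def shared_kinds(a, b):
    """Return the namespace kinds for which a and b hold the same object."""
    return {kind for kind, ns in a.namespaces.items() if b.namespaces.get(kind) is ns}


init = Process("init", {kind: Namespace(kind) for kind in ("pid", "mnt", "net")})
worker = init.clone("worker", new_kinds=("pid",))
print(sorted(shared_kinds(init, worker)))  # ['mnt', 'net']
```

The worker matches the example in the text: it shares the mount and network namespaces with its parent but has its own PID namespace.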

Namespaces are hierarchical only for some types. In hierarchical namespaces, like PID and user namespaces, you can have a parent-child relationship so that a process in a parent can be aware of or control parts of the child namespace. Other namespaces do not have a strict tree in the same way, but processes still move into them via explicit operations.

System Calls for Namespaces

User space programs do not manage namespaces by direct manipulation of kernel structures. Instead, they rely on system calls that integrate namespace operations with process creation and control.

The main system calls related to namespaces are:

clone is an extended version of process creation that can place the new process into new namespaces. By passing specific flags, you request that the child be created in a new instance of a particular namespace type.

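unshare takes the current process and detaches it from one or more namespaces by creating new ones for those resources and moving the calling process into them. This allows an existing process to "split off" its view without creating a child. PID namespaces are an exception, as are time namespaces: the caller keeps its existing namespace, and only the children it creates afterwards are placed in the new one.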

setns lets a process join an existing namespace, identified via a file descriptor that points to a namespace inode, typically somewhere under /proc. This permits one process to enter the namespaces of another, subject to permission checks.

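The flags that represent namespace types are shared between these calls. They include constants such as CLONE_NEWNS for mount namespaces or CLONE_NEWNET for network namespaces. clone and unshare take them as a bitmask of the namespaces to create, while setns uses them to check the type of namespace that the file descriptor refers to (a value of 0 accepts any type).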
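As a rough sketch of how these flags combine, the following builds a flag mask and passes it to unshare through ctypes. The numeric values are those defined in <linux/sched.h>; the CLONE_FLAGS table and the flags_for and unshare helper names are assumptions made for this example. Calling unshare usually fails with EPERM unless the process has the required privilege.

```python
import ctypes
import os

# Flag values as defined in <linux/sched.h>; the short keys are this
# example's own naming, chosen to match the entries under /proc/<pid>/ns.
CLONE_FLAGS = {
    "time": 0x00000080,    # CLONE_NEWTIME
    "mnt": 0x00020000,     # CLONE_NEWNS
    "cgroup": 0x02000000,  # CLONE_NEWCGROUP
    "uts": 0x04000000,     # CLONE_NEWUTS
    "ipc": 0x08000000,     # CLONE_NEWIPC
    "user": 0x10000000,    # CLONE_NEWUSER
    "pid": 0x20000000,     # CLONE_NEWPID
    "net": 0x40000000,     # CLONE_NEWNET
}


def flags_for(*kinds):
    """OR together the CLONE_NEW* flags for the given namespace kinds."""
    mask = 0
    for kind in kinds:
        mask |= CLONE_FLAGS[kind]
    return mask


def unshare(*kinds):
    """Call unshare(2) through libc; raise OSError (often EPERM) on failure."""
    libc = ctypes.CDLL(None, use_errno=True)
    if libc.unshare(flags_for(*kinds)) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


print(hex(flags_for("uts", "ipc")))  # 0xc000000
```

Python 3.12 and later also expose os.unshare and os.setns directly, which would make the ctypes wrapper unnecessary there.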

An example of using unshare from the command line is the unshare utility, which wraps the system call. For instance, you might run:

unshare --uts --ipc --mount --pid --fork /bin/bash

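Run as root (or with --user --map-root-user added, where unprivileged user namespaces are allowed), this creates new UTS, IPC, mount, and PID namespaces for the shell you start, so that its view of hostname, IPC objects, mounts, and process IDs is isolated from your original shell. The --fork option is needed because a new PID namespace only takes effect for children of the process that created it. Tools such as ps read /proc, so they keep showing the original process list until /proc is remounted inside the new namespaces, which the --mount-proc option does for you.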

Mount Namespaces

A mount namespace isolates the set of mounted filesystems a process can see. The same underlying block devices and filesystems exist, but different mount namespaces can mount them at different paths, or not at all.

When a process creates a new mount namespace, the namespace starts with a copy of the creator's mount list. Later changes in one mount namespace, like mounting or unmounting a filesystem, are not visible in other mount namespaces unless mount propagation is in effect, that is, unless the mount point is marked shared or slave rather than private.

This ability to give processes different views of the filesystem is a crucial building block for containers, chroots, and sandboxing tools. Combining mount namespaces with chroot or pivot_root allows you to create an isolated root filesystem tree that bears little resemblance to the host's actual layout.

From /proc/<pid>/mountinfo you can inspect what the current mount namespace presents to a process. Two processes in different mount namespaces will report different mount trees even though they share the same underlying kernel and physical disks.
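A minimal sketch of reading that file follows, assuming the line layout documented in proc(5): fixed fields, zero or more optional propagation fields, a lone "-" separator, then filesystem type and source. The function names are hypothetical, and the sample line is the one used in the man page.

```python
def parse_mountinfo_line(line):
    """Split one /proc/<pid>/mountinfo line into its main fields."""
    # A lone " - " separates the per-mount fields from the filesystem fields.
    left, _, right = line.strip().partition(" - ")
    fields = left.split()
    fs_fields = right.split()
    return {
        "mount_id": int(fields[0]),
        "parent_id": int(fields[1]),
        "device": fields[2],
        "root": fields[3],
        "mount_point": fields[4],
        "options": fields[5].split(","),
        "propagation": fields[6:],  # e.g. ["shared:7"] or ["master:1"]; may be empty
        "fstype": fs_fields[0],
        "source": fs_fields[1],
    }


def read_mounts(pid="self"):
    """Return the parsed mount list as seen by the given process."""
    with open(f"/proc/{pid}/mountinfo") as handle:
        return [parse_mountinfo_line(line) for line in handle if line.strip()]


sample = "36 35 98:0 /mnt1 /mnt2 rw,noatime master:1 - ext3 /dev/root rw,errors=continue"
entry = parse_mountinfo_line(sample)
print(entry["mount_point"], entry["fstype"], entry["propagation"])
```

Comparing read_mounts() output for two processes in different mount namespaces would show the differing trees. The propagation field is where shared and slave markings appear.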

UTS Namespaces

UTS namespaces isolate system identifiers that are traditionally global, such as hostname and NIS domain name. The name comes from struct utsname, the structure filled in by the uname system call, whose name in turn derives from "UNIX Time-sharing System".

In an isolated UTS namespace, a process can change the hostname via calls like sethostname without affecting the hostname seen by processes outside that namespace. Each UTS namespace therefore has its own local notion of the system name.

You can see the current hostname with the hostname command. If you run it in different UTS namespaces, it may show different values, even though the same kernel is running everything. This is why each container can appear to have its own hostname.

UTS namespaces are often created together with PID and mount namespaces for a minimal container environment, so that the isolated processes not only have their own processes and filesystem tree, but also their own system name.

PID Namespaces

PID namespaces control the set of process identifiers that a process can see. Each PID namespace has its own process tree and its own mapping of numeric PIDs to tasks.

Inside a PID namespace, the first process created becomes PID 1 for that namespace. This process is special within the namespace: orphaned processes are reparented to it, so it is responsible for reaping zombie children, and it is often used as the "init" process of a container. If PID 1 in a PID namespace exits, the kernel kills every remaining process in that namespace with SIGKILL, and no new processes can be created in it.

PID namespaces are hierarchical. A parent PID namespace can see the processes in its child namespaces. The same process can have multiple PIDs, one per namespace in the hierarchy, and you might see a mapping like:

$$
\text{PID}_{\text{host}} \neq \text{PID}_{\text{container}}
$$

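For example, the host might see a process as PID 12345, while inside the child PID namespace the same process is PID 1. You can inspect these relationships through the NSpid field of /proc/<pid>/status, which lists the PID of the process in each nested PID namespace from the outermost visible one to the innermost, and through other proc entries.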
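The per-namespace PIDs can be read from the NSpid line of the status file. The sketch below assumes the format described in proc(5), with one PID per nesting level and the outermost first; parse_nspid is a hypothetical helper name and the sample text is made up to match the example above.

```python
def parse_nspid(status_text):
    """Return the PIDs from the NSpid line, outermost namespace first."""
    for line in status_text.splitlines():
        if line.startswith("NSpid:"):
            return [int(pid) for pid in line.split()[1:]]
    return []  # older kernels do not provide the field


# Invented sample matching the example in the text: PID 12345 on the
# host, PID 1 inside the child PID namespace.
sample_status = "Name:\tbash\nPid:\t12345\nNSpid:\t12345\t1\n"
print(parse_nspid(sample_status))  # [12345, 1]
```

On a live system the same function can be applied to the contents of /proc/<pid>/status. A process that is not in a nested PID namespace reports a single value.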

This hierarchical model allows management tools in the host namespace to inspect or kill processes inside containers, while processes inside the container only see their own local PID space. It is a form of isolation that avoids PID collisions and hides unrelated processes.

Network Namespaces

Network namespaces give each namespace its own independent network stack. This includes its own set of network interfaces, IP addresses, routing tables, ARP tables, firewall rules, and so on.

In a fresh network namespace, the only network interface is a loopback device, which starts in the down state. To connect such a namespace to the outside world or to other namespaces, you typically create virtual interfaces such as veth pairs. One end of the pair is placed into the namespace and can be assigned IP addresses and routes that apply only within that namespace, while the other end stays outside, for example attached to a bridge on the host.

The key idea is that network state exists once per namespace. Processes in two different network namespaces can bind to the same address and port number without conflict because the kernel treats the namespaces as separate networking domains.
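For contrast, here is a small sketch of what happens inside a single network namespace: a second socket binding to a port that is already in use is rejected. The helper name second_bind_conflicts is hypothetical. With the two sockets in different network namespaces, the second bind would be expected to succeed instead.

```python
import errno
import socket


def second_bind_conflicts():
    """Bind two TCP sockets to the same port within one network namespace.

    Returns True if the second bind fails with EADDRINUSE.
    """
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        first.bind(("0.0.0.0", 0))  # port 0 lets the kernel pick a free port
        first.listen(1)
        port = first.getsockname()[1]
        try:
            second.bind(("0.0.0.0", port))
        except OSError as exc:
            return exc.errno == errno.EADDRINUSE
        return False
    finally:
        first.close()
        second.close()


print(second_bind_conflicts())
```

Neither socket sets SO_REUSEADDR or SO_REUSEPORT, so the conflict shown is the default behavior.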

Tools like ip netns provide a convenient interface around network namespaces. For example, you can create a network namespace and run a command inside it with:

ip netns add testns
ip netns exec testns bash

Within that shell, commands like ip addr or ip route reflect the configuration of the testns network namespace, not the host network stack.

IPC Namespaces

IPC namespaces isolate System V IPC objects and POSIX message queues. IPC stands for interprocess communication. Traditional IPC facilities, such as shared memory segments, semaphores, and message queues, are typically global resources identified by keys or IDs.

With IPC namespaces, each namespace has its own separate set of these objects. A process in one IPC namespace cannot directly see or attach to the IPC objects in another. Tools like ipcs will show different lists of IPC resources depending on which IPC namespace the process belongs to.

This isolation prevents accidental or malicious interference between processes in different environments that might otherwise share IPC keys. It is particularly important when running multiple containerized applications that use generic or well-known IPC identifiers.

User Namespaces

User namespaces are one of the most powerful and subtle namespace types. They change how the kernel interprets user and group IDs for a process, permitting different identity mappings inside and outside the namespace.

Within a user namespace, a process can appear as root (UID 0) while corresponding to an unprivileged user ID on the host. This mapping is described by tables that relate inside-namespace UIDs and GIDs to outside-namespace UIDs and GIDs.

Conceptually, you can think of a mapping as a piecewise linear function:

$$
\text{UID}_{\text{inside}} \mapsto \text{UID}_{\text{outside}}
$$

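with regions that are contiguous ranges. These mappings are configured via /proc/<pid>/uid_map and /proc/<pid>/gid_map, where each line has the form "inside-start outside-start length" and maps a block of consecutive IDs. Files like /etc/subuid and /etc/subgid specify what ranges of subordinate IDs the system is willing to allocate for an unprivileged user; helper programs such as newuidmap and newgidmap consult them when writing the maps.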
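The mapping can be sketched as a lookup over ranges, assuming the three-column uid_map format (first ID inside the namespace, first ID outside, range length). The function names and the sample map are illustrative assumptions, not values from a real system.

```python
def parse_id_map(text):
    """Parse uid_map/gid_map content into (inside, outside, length) tuples."""
    ranges = []
    for line in text.splitlines():
        if line.strip():
            inside, outside, length = (int(field) for field in line.split())
            ranges.append((inside, outside, length))
    return ranges


def to_outside(ranges, inside_id):
    """Translate an in-namespace ID to the outside ID, or None if unmapped."""
    for inside, outside, length in ranges:
        if inside <= inside_id < inside + length:
            return outside + (inside_id - inside)
    return None


# Hypothetical map: root inside is UID 1000 outside, and inside UIDs
# 1..65536 come from a subordinate range starting at 100000.
sample_map = "0 1000 1\n1 100000 65536\n"
ranges = parse_id_map(sample_map)
print(to_outside(ranges, 0), to_outside(ranges, 5), to_outside(ranges, 70000))
# 1000 100004 None
```

Each range is one linear piece of the mapping. An ID that falls outside every range has no counterpart on the other side.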

Inside a user namespace, processes with UID 0 have capabilities that are valid only within that namespace, that is, over resources the namespace owns. They can, for example, create additional namespaces or manipulate certain resources that belong to that namespace. However, this root is not root on the host: its capabilities do not extend to resources owned by the parent namespace, and it accesses host files with the permissions of the host UID it maps to.

User namespaces are hierarchical. Since Linux 3.8, creating a user namespace requires no capability at all, although a process needs privilege in the parent namespace (or a setuid helper) to install mappings beyond its own UID and GID. Child namespaces can further remap IDs, but cannot elevate privileges with respect to their parent.

User namespaces allow unprivileged processes to create environments where they are root inside, but not on the host. This enables unprivileged containers but also introduces complex security considerations, so many distributions configure user namespaces conservatively.

Cgroup Namespaces

Although cgroups themselves are discussed in another chapter, there is a namespace type that affects how cgroups appear to processes. A cgroup namespace gives processes their own view of the cgroup hierarchy, rooted at some point in the actual cgroup tree.

When a process creates a new cgroup namespace, the cgroup it belongs to at that moment becomes the root of the namespace. Its /proc/self/cgroup entries and the visible cgroup paths are then shown relative to that apparent root. This does not change the underlying resource control or the actual cgroup placement, but it hides parts of the cgroup hierarchy above the namespace root.
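The effect on reported paths can be approximated as follows. The sketch assumes the "hierarchy-ID:controller-list:path" line format of /proc/<pid>/cgroup and is meant for cgroups at or below the namespace root. The paths and helper names are hypothetical, and the kernel's own path rendering is only being imitated here.

```python
import posixpath


def parse_cgroup_line(line):
    """Split one /proc/<pid>/cgroup line into (hierarchy_id, controllers, path)."""
    hierarchy_id, controllers, path = line.strip().split(":", 2)
    controller_list = controllers.split(",") if controllers else []
    return int(hierarchy_id), controller_list, path


def apparent_path(real_path, ns_root):
    """Show real_path the way it would look from a namespace rooted at ns_root."""
    relative = posixpath.relpath(real_path, ns_root)
    return "/" if relative == "." else "/" + relative


# Hypothetical cgroup v2 entry as seen from the host.
_, _, host_path = parse_cgroup_line("0::/docker/abc/app")
print(apparent_path(host_path, "/docker/abc"))  # /app
```

Only the displayed path changes; the process stays in the same cgroup and under the same limits.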

This is useful for making containers self-contained from the perspective of tooling that inspects /proc or cgroup-related filesystems. Tools inside a container can see and manage cgroups relative to their own apparent root without needing to know about host-level structure.

Time Namespaces

Time namespaces provide per-namespace control over certain system clocks. This allows different groups of processes to perceive time differently from each other, within defined limits.

Time namespaces, available since Linux 5.6, virtualize the monotonic and boot-time clocks, CLOCK_MONOTONIC and CLOCK_BOOTTIME. The real-time clock, CLOCK_REALTIME, which reflects wall-clock time, is not namespaced and stays shared across the whole system.

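Inside a time namespace, offsets can be applied to these clocks. The offsets are written to /proc/<pid>/timens_offsets and can only be set while no process has yet entered the namespace. For example, you might have: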

$$
\text{CLOCK\_MONOTONIC}_{\text{namespace}}(t) = \text{CLOCK\_MONOTONIC}_{\text{host}}(t) + \Delta
$$

where $\Delta$ is a fixed offset configured for the namespace. This is useful for testing time-dependent behavior, simulating system uptime, or running reproducible workloads that rely on monotonic time.
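The offset arithmetic can be sketched in a few lines, assuming the "clock seconds nanoseconds" line layout that time_namespaces(7) describes for /proc/<pid>/timens_offsets. The helper names and sample values are illustrative.

```python
NSEC_PER_SEC = 1_000_000_000


def parse_timens_offsets(text):
    """Parse timens_offsets content into {clock_name: offset_in_nanoseconds}."""
    offsets = {}
    for line in text.splitlines():
        if line.strip():
            clock, secs, nsecs = line.split()
            offsets[clock] = int(secs) * NSEC_PER_SEC + int(nsecs)
    return offsets


def namespaced_ns(host_ns, offset_ns):
    """Apply the fixed offset: clock inside the namespace = host clock + delta."""
    return host_ns + offset_ns


# Hypothetical offsets: two days on the monotonic clock, seven on boottime.
sample_offsets = "monotonic 172800 0\nboottime 604800 0\n"
offsets = parse_timens_offsets(sample_offsets)
print(namespaced_ns(5 * NSEC_PER_SEC, offsets["monotonic"]) // NSEC_PER_SEC)  # 172805
```

Working in integer nanoseconds avoids floating-point rounding when the offset is large.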

Time namespaces interact with timer and sleep APIs, such as clock_gettime, clock_nanosleep, and related functionality, so that processes see their adjusted notion of time consistently.

Namespace Files in /proc

Linux exposes namespace membership through special files in /proc/<pid>/ns. Each of these entries is a symbolic link whose target is a string of the form type:[inode], such as net:[4026531992], where the inode number identifies the namespace. For instance, listing /proc/self/ns may show entries such as mnt, uts, pid, net, ipc, user, cgroup, and time, plus pid_for_children and time_for_children for the namespaces that future children will join.

These links are not ordinary files; they represent references to namespace objects. If you open one of these paths and pass the resulting file descriptor to setns, you can join the corresponding namespace, subject to permission checks.

Because each link target is unique per namespace, you can compare them across processes to determine if they share a given namespace. The device and inode numbers reported by stat, which follows the link, are often used for that purpose. If two processes have mnt links that refer to the same device and inode, they are in the same mount namespace.
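A minimal sketch of that comparison follows. It assumes a Linux /proc filesystem and link targets of the form type:[inode]; the helper names are hypothetical. Reading another user's namespace links may fail with a permission error.

```python
import os
import re

_LINK_RE = re.compile(r"^(\w+):\[(\d+)\]$")


def parse_ns_link(target):
    """Split a link target such as 'mnt:[4026531840]' into (kind, inode)."""
    match = _LINK_RE.match(target)
    if match is None:
        raise ValueError(f"unexpected namespace link target: {target!r}")
    return match.group(1), int(match.group(2))


def namespace_id(pid, kind):
    """Return (device, inode) identifying the namespace of the given kind."""
    info = os.stat(f"/proc/{pid}/ns/{kind}")  # os.stat follows the link
    return info.st_dev, info.st_ino


def same_namespace(pid_a, pid_b, kind):
    """True if both processes are in the same namespace of the given kind."""
    return namespace_id(pid_a, kind) == namespace_id(pid_b, kind)


print(parse_ns_link("mnt:[4026531840]"))  # ('mnt', 4026531840)
```

For example, same_namespace(os.getpid(), 1, "net") would report whether the current process shares a network namespace with PID 1, provided the caller is allowed to inspect it.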

Nesting and Combining Namespaces

Namespaces are most powerful when combined. A typical container might involve a user namespace for identity mapping, a PID namespace for process isolation, a mount namespace for filesystem isolation, a network namespace for network stack isolation, and UTS and IPC namespaces for name and IPC isolation. Cgroup and time namespaces can further refine the environment.

Not all namespaces are hierarchical in the same way, but there are common patterns:

User namespaces form a base for unprivileged containers because privilege within a namespace is decoupled from host privilege.

PID namespaces are often created inside a user namespace, which then owns them, giving each container its own process tree under a container-specific root user.

Mount and network namespaces are frequently created together so that the container has its own filesystem view and networking.

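The order of creation is important because some operations require particular capabilities at the time the namespace is created. For example, creating any namespace other than a user namespace requires CAP_SYS_ADMIN, which an unprivileged process can only obtain by creating a user namespace first. When CLONE_NEWUSER is combined with other CLONE_NEW* flags in one clone or unshare call, the kernel creates the user namespace first so that the remaining namespaces are owned by it.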

From a kernel perspective, each task references a small structure of namespace pointers (struct nsproxy), with the user namespace reached through the task's credentials. Operations that affect a resource consult the namespace associated with that resource type for the current process. This design allows the kernel to multiplex global resources across many isolated views.

Limitations and Interactions

Namespaces are a mechanism for isolation of views, not a full virtualization solution by themselves. There are several important limitations and interactions:

Security is not guaranteed by namespaces alone. Many kernel interfaces and side channels, such as certain /proc details or hardware performance counters, may leak information across namespaces unless carefully managed alongside other hardening features.

Resource limits come from cgroups, quotas, and related mechanisms, not from namespaces. It is easy to create an isolated environment that still has unrestricted access to CPU, memory, or disk bandwidth unless these are managed separately.

Some operations remain global or are only partially namespaced, depending on kernel version and configuration. The set of resources covered by each namespace type has evolved over time.

Privilege transitions must be handled carefully. User namespaces, in particular, provide a complex model where processes can be privileged inside a namespace yet unprivileged outside. Kernel bugs in this area can have serious security implications, which is why distributions sometimes restrict or disable unprivileged user namespaces.

Despite these caveats, namespaces provide the fundamental isolation building blocks used by higher level systems such as container runtimes, sandboxing tools, and multi-tenant hosting environments. Understanding namespaces at this level gives you insight into how a single Linux kernel can host many seemingly independent systems, and into which additional mechanisms are needed to do so safely.
