Overview: What Namespaces Solve
Namespaces isolate global kernel resources so that different sets of processes see different “worlds”:
- Different PIDs
- Different network stack
- Different filesystem layout
- Different hostname / domain
- Different users and IDs
- Different IPC objects
- Different control groups
They are the core building block of containers, but they are much more general: any process can enter its own set of namespaces and live in a “virtualized” view of the machine.
From here on, assume you already understand basic process lifecycle and cgroups at a high level; this chapter focuses on namespaces specifically.
Types of Linux Namespaces
Each namespace type virtualizes a particular kernel resource. A process belongs to exactly one namespace of each type at any time, so it is simultaneously in one PID namespace, one mount namespace, one network namespace, and so on.
The main types (as of modern kernels) are:
- Mount (CLONE_NEWNS)
- UTS (CLONE_NEWUTS)
- IPC (CLONE_NEWIPC)
- PID (CLONE_NEWPID)
- Network (CLONE_NEWNET)
- User (CLONE_NEWUSER)
- Cgroup (CLONE_NEWCGROUP)
- Time (CLONE_NEWTIME), a newer addition that is often less covered
Each is identified in the kernel by an internal ID and exposed to user space via /proc.
How Namespaces Are Represented
Every process has namespace membership tracked by the kernel. User space visibility is primarily through:
- The /proc/<pid>/ns/ directory
- Each file in this directory is a special symbolic link referring to a namespace
Example:
$ ls -l /proc/$$/ns
total 0
lrwxrwxrwx 1 user user 0 ... cgroup -> 'cgroup:[4026531835]'
lrwxrwxrwx 1 user user 0 ... ipc -> 'ipc:[4026531839]'
lrwxrwxrwx 1 user user 0 ... mnt -> 'mnt:[4026531840]'
lrwxrwxrwx 1 user user 0 ... net -> 'net:[4026531992]'
lrwxrwxrwx 1 user user 0 ... pid -> 'pid:[4026531836]'
lrwxrwxrwx 1 user user 0 ... pid_for_children -> 'pid:[4026531836]'
lrwxrwxrwx 1 user user 0 ... time -> 'time:[4026531834]'
lrwxrwxrwx 1 user user 0 ... user -> 'user:[4026531837]'
lrwxrwxrwx 1 user user 0 ... uts -> 'uts:[4026531838]'
The bracketed number is a kernel-internal namespace ID.
Two PIDs share a namespace if their corresponding /proc/<pid>/ns/<type> links point to the same underlying object (same ID).
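That comparison can be scripted directly. A minimal sketch (the helper name same_ns is made up here, not a real tool) that reports whether two PIDs share a namespace of a given type:

```shell
# same_ns <pid1> <pid2> <type>: succeed if both PIDs' ns links match.
# Reading another user's /proc/<pid>/ns links may require privileges.
same_ns() {
    a=$(readlink "/proc/$1/ns/$3") || return 2
    b=$(readlink "/proc/$2/ns/$3") || return 2
    [ "$a" = "$b" ]
}

# A process trivially shares every namespace with itself:
same_ns $$ $$ uts && echo "same uts namespace"
```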
Mount (mnt) Namespace
Mount namespaces provide each process group with its own view of the filesystem tree:
- Separate set of mount points
- Ability to mount/umount without affecting other namespaces
- Foundation for chroot-like isolation, but more powerful
Key behaviors:
- When a new mount namespace is created, it starts with a copy of the parent’s mount table.
- Subsequent mounts/unmounts are local to that namespace (unless marked as shared/propagated).
Practical implications:
- Containers get their own root filesystem by combining a mount namespace with pivot_root or chroot.
- You can hide host paths, add bind mounts, or mount /proc, /sys, etc., specifically for that environment.
Example (using unshare from util-linux):
# Create a new mount namespace and run a shell
sudo unshare --mount --fork /bin/bash
# Inside, mount a tmpfs that only this shell can see
mount -t tmpfs tmpfs /mnt
# From another shell (host namespace), /mnt is unaffected
Mount propagation flags (e.g., MS_SHARED, MS_PRIVATE) control how mounts propagate between namespaces; these are advanced but crucial for real container implementations.
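You can inspect the current mount namespace and its propagation state without privileges: /proc/self/mountinfo has one line per mount, and shared mounts carry a "shared:N" tag among the optional fields. A small sketch:

```shell
# How many mounts are visible in this process's mount namespace?
wc -l < /proc/self/mountinfo

# How many of them are marked shared (tagged "shared:N" before the "-"
# separator in each mountinfo line)?
awk '{ for (i = 7; $i != "-"; i++) if ($i ~ /^shared:/) { n++; break } }
     END { print n + 0, "shared mounts" }' /proc/self/mountinfo
```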
UTS Namespace (Hostname / Domainname)
UTS (UNIX Timesharing System) namespaces isolate:
- System hostname (gethostname, sethostname)
- NIS domain name (getdomainname, setdomainname)
This is what allows each container to have its own “host” name without changing the real host.
Example:
# New UTS namespace; change hostname only there
sudo unshare --uts --fork /bin/bash
hostname container1
hostname
Outside that namespace, hostname remains the host’s original name.
IPC Namespace
IPC namespaces isolate:
- System V IPC objects: message queues, semaphores, shared memory
- POSIX message queues (each IPC namespace has its own set; a /dev/mqueue mount shows the queues of the mounting process’s IPC namespace)
Without IPC namespaces, SysV resources are global per-kernel and visible to any process that knows their keys.
With IPC namespaces:
- A container’s shared memory segments (shmget, shmat) are not visible in another container.
- System V IPC limits (such as shmmax) are applied per IPC namespace.
You can examine IPC objects with tools like ipcs and observe differences across namespaces.
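The same information is also exposed through /proc/sysvipc, whose object tables are filtered by the caller's IPC namespace. A quick unprivileged sketch (each file has a header line plus one line per object):

```shell
# Count SysV IPC objects visible from this process's IPC namespace.
for f in shm msg sem; do
    printf '%s objects: %s\n' "$f" "$(($(wc -l < "/proc/sysvipc/$f") - 1))"
done
```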
PID Namespace
PID namespaces provide processes with their own PID numbering:
- Each PID namespace has its own PID 1.
- A process can have different PIDs in different nested namespaces.
- Parent namespaces can see (and usually control) processes in child PID namespaces, but not vice versa.
Key properties:
- PID namespace nesting: like a tree. The initial namespace is the “root”.
- A process may have:
- PID $p_0$ in the initial namespace
- PID $p_1$ in a child PID namespace
- etc., as you nest further
- Signal delivery is constrained by namespace boundaries and user namespace mappings.
PID 1 semantics in a PID namespace:
- The first process in a PID namespace gets PID 1 in that namespace.
- It has special behavior: it becomes the “init” for that namespace.
- If it exits, all other processes in that PID namespace are terminated (like system shutdown).
Example:
# New PID namespace, mount namespace, and a shell
sudo unshare --pid --mount-proc --fork /bin/bash
# Inside
echo "PID in this namespace: $$"
ps -o pid,ppid,cmd
# Host's perspective:
ps -o pid,ppid,cmd | grep bash
--mount-proc remounts /proc inside the new namespace so /proc shows the namespaced PIDs, not the host’s global ones.
Nested view:
- From outside, you see all processes with their host PIDs.
- From inside, you only see processes in that PID namespace (and possibly descendants), with local numbering.
This is crucial for containers: processes think they are PID 1 in a “full” system.
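The unshare example above requires root. Where unprivileged user namespaces are enabled, a user namespace can be stacked in front so no sudo is needed; a sketch that simply reports unavailability on hardened systems:

```shell
# Unprivileged variant: creating a user namespace first grants the
# capabilities needed for the new PID namespace. The first process forked
# into it becomes PID 1 there.
unshare --user --map-root-user --pid --fork --mount-proc \
    sh -c 'echo "PID inside the new namespace: $$"' 2>/dev/null \
    || echo "unprivileged PID namespaces unavailable here"
```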
Network Namespace
Network namespaces create isolated network stacks:
- Own set of network interfaces
- Own IP addresses, routing tables, ARP tables
- Own netfilter/iptables/nftables rules
- Own
/proc/sys/netsysctls
The initial network namespace contains the host’s real interfaces (e.g., eth0, wlan0).
A new network namespace starts with:
- Only a loopback interface (lo), which starts out down
- No routes and no external connectivity until configured
To connect network namespaces, you typically use:
- Virtual Ethernet (veth) pairs
- Bridges
- Tunnels
Example:
# Create a new net namespace and run a shell
sudo unshare --net --uts --fork /bin/bash
# Inside the net namespace:
ip link # Only 'lo'
ip addr add 10.0.0.2/24 dev lo
ip link set lo up
Typical container networking:
- The host creates a veth pair: veth-host and veth-guest.
- veth-guest is moved into the container’s net namespace.
- veth-host is attached to a Linux bridge (e.g., docker0).
- Routes and NAT rules are set on the host to give containers external access.
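The veth pattern just described can be sketched as a script. This is illustrative only: it requires root and iproute2, and the namespace name demo, the interface names veth-host/veth-guest, and the 10.200.0.0/24 addresses are arbitrary placeholders; without root it just reports that it was skipped.

```shell
# Sketch: wire a named network namespace to the host with a veth pair.
if [ "$(id -u)" -ne 0 ] || ! ip netns add demo 2>/dev/null; then
    result="skipped: needs root and netns support"
else
    ip link add veth-host type veth peer name veth-guest
    ip link set veth-guest netns demo
    ip addr add 10.200.0.1/24 dev veth-host
    ip link set veth-host up
    ip netns exec demo ip addr add 10.200.0.2/24 dev veth-guest
    ip netns exec demo ip link set veth-guest up
    ip netns exec demo ip link set lo up
    if ip netns exec demo ping -c 1 10.200.0.1 >/dev/null 2>&1; then
        result="namespaces connected"
    else
        result="veth pair created, ping did not get through"
    fi
    ip netns del demo    # deleting the netns tears the pair down too
fi
echo "$result"
```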
Tools:
- ip netns add/del/exec — manage persistent named network namespaces via /var/run/netns.
- ip link set dev X netns Y — move an interface into a given net namespace.
User Namespace
User namespaces map user and group IDs inside the namespace to different IDs outside:
- Inside a user namespace, a process can have UID 0 (root) but map to a non-root UID on the host.
- This enables “rootless” containers: they appear to run as root in their namespace but have limited power on the host.
Key concepts:
- UID/GID mappings are controlled via:
  - /proc/<pid>/uid_map
  - /proc/<pid>/gid_map
  - setgroups handling via /proc/<pid>/setgroups
- A mapping entry has the form:
inside_id outside_id length
Meaning: the length IDs starting at inside_id in the namespace map to the length IDs starting at outside_id in the parent user namespace.
Example (simplified):
- Inside namespace: UID 0–65535
- Host: UID 100000–165535
Mapping:
0 100000 65536
So inside-UID 0 is host-UID 100000, inside-UID 1 is host-UID 100001, and so on.
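The arithmetic behind a mapping entry is simple enough to sketch as a helper. Here map_uid and its arguments are illustrative names, not a real tool; it applies one "inside outside length" entry:

```shell
# map_uid <uid> <inside_start> <outside_start> <length>:
# print the outside (host) UID for an inside UID, or "unmapped" if the
# UID falls outside the mapped range.
map_uid() {
    if [ "$1" -ge "$2" ] && [ "$1" -lt "$(($2 + $4))" ]; then
        echo "$(($3 + $1 - $2))"
    else
        echo "unmapped"
    fi
}

map_uid 0 0 100000 65536       # → 100000
map_uid 1 0 100000 65536       # → 100001
map_uid 70000 0 100000 65536   # → unmapped (outside the 65536-ID range)
```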
Security implications:
- User namespaces significantly change permission checks: many kernel operations consider the “user namespace owning UID” and capability sets relative to that namespace.
- Processes may have capabilities (CAP_SYS_ADMIN etc.) inside their user namespace without having them on the host.
- You must think carefully about which kernel features are still reachable from a user-namespaced root.
Unprivileged user namespaces:
- Many distributions allow unprivileged users to create user namespaces (unshare --user) with configured ID ranges.
- Some distros restrict this for security hardening (e.g., via the kernel.unprivileged_userns_clone sysctl).
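A quick probe shows whether the current system allows this. Run as a normal user (no sudo); --map-root-user makes unshare write the uid_map entry for you, so id -u reports 0 inside the namespace:

```shell
# Can an unprivileged user create a user namespace here?
unshare --user --map-root-user sh -c 'id -u; cat /proc/self/uid_map' 2>/dev/null \
    || echo "unprivileged user namespaces are disabled here"
```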
Combined with other namespaces:
- Typically, a container uses a user namespace plus PID, mount, net, UTS, IPC, cgroup namespaces.
- The user namespace is often created first; additional namespaces are created “under” it, so capability checks are evaluated within that user namespace.
Cgroup Namespace
Cgroup namespaces virtualize the view of control groups:
- Without cgroup namespaces, a process inside a container would see the host’s full cgroup hierarchy in /proc/self/cgroup or /sys/fs/cgroup.
- With cgroup namespaces, a container sees its own cgroup as the root, even though it is nested deeper in the host’s tree.
Key effects:
- Hides host-level cgroup paths from containers.
- Makes in-container tools (like systemd or metrics collectors) think they are at the top of the cgroup hierarchy.
- Does not by itself enforce resource limiting — that is still done by cgroup controllers — but changes how the hierarchy is presented.
Note: cgroup v1 and v2 behave differently in detail, but the namespace idea is the same: localized view of cgroup paths.
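This localized view is easy to observe: /proc/self/cgroup reports paths relative to the caller's cgroup namespace root, so a container with its own cgroup namespace typically sees just "0::/" on a cgroup v2 host, while the initial namespace sees the full nested path.

```shell
# cgroup membership of this process, relative to its cgroup namespace.
# cgroup v2 prints a single "0::<path>" line; v1 prints one line per
# controller hierarchy.
cat /proc/self/cgroup
```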
Time Namespace (Newer)
Time namespaces allow processes to have:
- Per-namespace offsets for CLOCK_MONOTONIC and CLOCK_BOOTTIME
- A potentially different perceived system boot time
Motivations:
- Testing software that depends on uptime or monotonic clocks.
- Simulating long-running environments without changing host clock.
- Container migrations and checkpoint/restore.
Key aspects:
- The real-time clock (CLOCK_REALTIME) is not virtualized; time namespaces deal only with monotonic-style clocks.
- Offsets are visible via /proc/<pid>/timens_offsets.
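The offsets file can be read without privileges; in the initial time namespace all offsets are zero. The file exists only on kernels built with time-namespace support (5.6 and later), so a tolerant sketch:

```shell
# Per-clock offsets of this process's time namespace (zero in the initial
# namespace); falls back to a message on older kernels.
cat /proc/self/timens_offsets 2>/dev/null \
    || echo "time namespaces not supported by this kernel"
```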
Example usage is more specialized and often integrated with tools like CRIU (Checkpoint/Restore In Userspace).
Creating and Joining Namespaces Programmatically
At the system call level, namespaces are manipulated by:
- clone(2) — create a new process and, optionally, new namespaces (CLONE_NEW* flags).
- unshare(2) — detach the calling process from its current namespaces and move it into newly created ones for the specified types.
- setns(2) — join an existing namespace referenced by a file descriptor.
Typical patterns:
- Use clone to start a process directly in a new namespace:
pid_t child = clone(child_func, child_stack,
                    CLONE_NEWUTS | CLONE_NEWPID | SIGCHLD, arg);
- Use unshare and then execve (note: after unshare(CLONE_NEWPID) the caller keeps its own PID; only its children are created in the new PID namespace):
unshare(CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWPID);
execve("/bin/bash", ...);
- Use setns to enter a namespace that another process is already in:
int fd = open("/proc/1234/ns/net", O_RDONLY);
setns(fd, CLONE_NEWNET); // enter PID 1234's net namespace
close(fd);
Constraints:
- Many namespace operations require capabilities in the relevant user namespace (e.g., CAP_SYS_ADMIN).
- Often you must create a user namespace first to relax some requirements for the caller.
Inspecting and Comparing Namespaces
Besides /proc/<pid>/ns, there are several ways to inspect and analyze namespaces:
- lsns — list all namespaces of a given type and the processes in them:
sudo lsns
sudo lsns -t net
- readlink /proc/<pid>/ns/<type> — see the namespace ID
- ip netns list — list named network namespaces
- unshare — quickly create and test behavior in new namespaces
For debugging container internals:
- Check which namespaces a container’s main process is in.
- Use nsenter to join them:
sudo nsenter --target <pid> --mount --uts --pid --net --ipc /bin/bash
Here, nsenter is essentially a convenient wrapper around setns.
Namespace Relationships and Nesting
A process has:
- One mount namespace
- One UTS namespace
- One IPC namespace
- One PID namespace (with possible ancestors)
- One network namespace
- One user namespace (with possible ancestors)
- One cgroup namespace
- One time namespace
Key structural properties:
- User namespace hierarchy: user namespaces form a tree; privilege checks and uid/gid mappings depend on parent-child relationships.
- PID namespaces: hierarchical; a process is visible upward but not downward.
- Network and IPC namespaces: generally not hierarchical in the same sense; they’re just separate instances.
Lifecycle:
- A namespace is reference-counted:
- It exists as long as at least one process, open file descriptor, or bind mount refers to it (e.g., a bind-mounted /proc/self/ns/net).
- Once the last reference goes away, the namespace is destroyed.
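The file-descriptor case is easy to demonstrate without privileges: simply opening an ns file takes a reference that would keep the namespace alive even if every process in it exited.

```shell
# Hold a reference to this shell's UTS namespace via an open fd.
exec 3< /proc/self/ns/uts
readlink /proc/self/fd/3     # prints something like uts:[4026531838]
exec 3<&-                    # drop the reference again
```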
Interaction across namespaces:
- You can combine types arbitrarily (e.g., same PID namespace but different network namespaces).
- Security and isolation properties depend on which namespaces are used together and how user namespaces are configured.
Namespaces and Containers: Conceptual Mapping
While containers are not a kernel primitive, they are mostly composed of:
- PID namespace — isolated process tree
- Mount namespace — isolated filesystem
- UTS namespace — per-container hostname
- Network namespace — per-container interfaces + routing
- IPC namespace — per-container SysV IPC
- User namespace — remapped root and user IDs
- Cgroup namespace + cgroups — per-container resource view and control
Higher-level tools (Docker, Podman, Kubernetes runtimes, LXC/LXD, systemd-nspawn) are essentially elaborate frontends that:
- Set up mount trees, networking, id mappings, and cgroups.
- Create and join namespaces appropriately.
- Manage lifecycle, images, and configuration.
As you study containers more deeply, understanding each namespace’s semantics and interactions is essential for debugging, security analysis, and performance tuning.
Practical Experiments to Understand Namespaces
To make the concepts concrete, useful hands-on exercises include:
- Use unshare to create single-type namespaces:
  - UTS: modify the hostname without affecting the host.
  - PID: run ps inside vs. outside and compare.
  - Mount: mount a filesystem in the new namespace and verify it is invisible in the host one.
  - Net: configure lo and veth pairs, ping between namespaces, and examine routing tables.
- Use lsns to see how many namespaces your system already uses (many service managers isolate things).
- Pick a container PID and use /proc/<pid>/ns plus nsenter to “step into” its namespaces and observe what differs from the host.
These experiments help connect the abstract idea of “isolated kernel resources” with observable differences in process behavior and system layout.