Overview: What Namespaces Solve
Namespaces isolate global kernel resources so that different sets of processes see different “worlds”:
- Different PIDs
- Different network stack
- Different filesystem layout
- Different hostname / domain
- Different users and IDs
- Different IPC objects
- Different control groups
They are the core building block of containers, but they are much more general: any process can enter its own set of namespaces and live in a “virtualized” view of the machine.
From here on, assume you already understand basic process lifecycle and cgroups at a high level; this chapter focuses on namespaces specifically.
Types of Linux Namespaces
Each namespace type virtualizes a particular kernel resource. A process belongs to exactly one namespace of each type at any time, so it is simultaneously in one PID namespace, one mount namespace, one network namespace, and so on.
The main types (as of modern kernels) are:
- Mount (CLONE_NEWNS)
- UTS (CLONE_NEWUTS)
- IPC (CLONE_NEWIPC)
- PID (CLONE_NEWPID)
- Network (CLONE_NEWNET)
- User (CLONE_NEWUSER)
- Cgroup (CLONE_NEWCGROUP)
- Time (CLONE_NEWTIME), a newer addition that is often less covered
Each is identified in the kernel by an internal ID and exposed to user space via /proc.
How Namespaces Are Represented
Every process has namespace membership tracked by the kernel. User space visibility is primarily through:
- The /proc/<pid>/ns/ directory
- Each file in this directory is a special symbolic link referring to a namespace
Example:
$ ls -l /proc/$$/ns
total 0
lrwxrwxrwx 1 user user 0 ... cgroup -> 'cgroup:[4026531835]'
lrwxrwxrwx 1 user user 0 ... ipc -> 'ipc:[4026531839]'
lrwxrwxrwx 1 user user 0 ... mnt -> 'mnt:[4026531840]'
lrwxrwxrwx 1 user user 0 ... net -> 'net:[4026531992]'
lrwxrwxrwx 1 user user 0 ... pid -> 'pid:[4026531836]'
lrwxrwxrwx 1 user user 0 ... pid_for_children -> 'pid:[4026531836]'
lrwxrwxrwx 1 user user 0 ... time -> 'time:[4026531834]'
lrwxrwxrwx 1 user user 0 ... user -> 'user:[4026531837]'
lrwxrwxrwx 1 user user 0 ... uts -> 'uts:[4026531838]'
The bracketed number is a kernel-internal namespace ID.
Two PIDs share a namespace if their corresponding /proc/<pid>/ns/<type> links point to the same underlying object (same ID).
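That comparison can be scripted directly. A minimal sketch (the helper name same_ns is made up here, not a real tool) that reports whether two PIDs share a namespace of a given type:

```shell
# same_ns <pid1> <pid2> <type>: succeed if both PIDs' ns links match.
# Reading another user's /proc/<pid>/ns links may require privileges.
same_ns() {
    a=$(readlink "/proc/$1/ns/$3") || return 2
    b=$(readlink "/proc/$2/ns/$3") || return 2
    [ "$a" = "$b" ]
}

# A process trivially shares every namespace with itself:
same_ns $$ $$ uts && echo "same uts namespace"
```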
Mount (mnt) Namespace
Mount namespaces provide each process group with its own view of the filesystem tree:
- Separate set of mount points
- Ability to mount/umount without affecting other namespaces
- Foundation for chroot-like isolation, but more powerful
Key behaviors:
- When a new mount namespace is created, it starts with a copy of the parent’s mount table.
- Subsequent mounts/unmounts are local to that namespace (unless marked as shared/propagated).
Practical implications:
- Containers get their own root filesystem by combining a mount namespace with pivot_root or chroot.
- You can hide host paths, add bind mounts, or mount /proc, /sys, etc., specifically for that environment.
Example (using unshare from util-linux):
# Create a new mount namespace and run a shell
sudo unshare --mount --fork /bin/bash
# Inside, mount a tmpfs that only this shell can see
mount -t tmpfs tmpfs /mnt
# From another shell (host namespace), /mnt is unaffected
Mount propagation flags (e.g., MS_SHARED, MS_PRIVATE) control how mounts propagate between namespaces; these are advanced but crucial for real container implementations.
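You can inspect the current mount namespace and its propagation state without privileges: /proc/self/mountinfo has one line per mount, and shared mounts carry a "shared:N" tag among the optional fields. A small sketch:

```shell
# How many mounts are visible in this process's mount namespace?
wc -l < /proc/self/mountinfo

# How many of them are marked shared (tagged "shared:N" before the "-"
# separator in each mountinfo line)?
awk '{ for (i = 7; $i != "-"; i++) if ($i ~ /^shared:/) { n++; break } }
     END { print n + 0, "shared mounts" }' /proc/self/mountinfo
```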
UTS Namespace (Hostname / Domainname)
UTS (UNIX Timesharing System) namespaces isolate:
- System hostname (gethostname, sethostname)
- NIS domain name (getdomainname, setdomainname)
This is what allows each container to have its own “host” name without changing the real host.
Example:
# New UTS namespace; change hostname only there
sudo unshare --uts --fork /bin/bash
hostname container1
hostname
Outside that namespace, hostname remains the host’s original name.
IPC Namespace
IPC namespaces isolate:
- System V IPC objects: message queues, semaphores, shared memory
- POSIX message queues (each IPC namespace has its own set; a /dev/mqueue mount shows the queues of the mounting process’s IPC namespace)
Without IPC namespaces, SysV resources are global per-kernel and visible to any process that knows their keys.
With IPC namespaces:
- A container’s shared memory segments (shmget, shmat) are not visible in another container.
- System V IPC limits (such as shmmax) are applied per IPC namespace.
You can examine IPC objects with tools like ipcs and observe differences across namespaces.
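The same information is also exposed through /proc/sysvipc, whose object tables are filtered by the caller's IPC namespace. A quick unprivileged sketch (each file has a header line plus one line per object):

```shell
# Count SysV IPC objects visible from this process's IPC namespace.
for f in shm msg sem; do
    printf '%s objects: %s\n' "$f" "$(($(wc -l < "/proc/sysvipc/$f") - 1))"
done
```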
PID Namespace
PID namespaces provide processes with their own PID numbering:
- Each PID namespace has its own PID 1.
- A process can have different PIDs in different nested namespaces.
- Parent namespaces can see (and usually control) processes in child PID namespaces, but not vice versa.
Key properties:
- PID namespace nesting: like a tree. The initial namespace is the “root”.
- A process may have:
- PID $p_0$ in the initial namespace
- PID $p_1$ in a child PID namespace
- etc., as you nest further
- Signal delivery is constrained by namespace boundaries and user namespace mappings.
PID 1 semantics in a PID namespace:
- The first process in a PID namespace gets PID 1 in that namespace.
- It has special behavior: it becomes the “init” for that namespace.
- If it exits, all other processes in that PID namespace are terminated (like system shutdown).
Example:
# New PID namespace, mount namespace, and a shell
sudo unshare --pid --mount-proc --fork /bin/bash
# Inside
echo "PID in this namespace: $$"
ps -o pid,ppid,cmd
# Host's perspective:
ps -o pid,ppid,cmd | grep bash
--mount-proc remounts /proc inside the new namespace so /proc shows the namespaced PIDs, not the host’s global ones.
Nested view:
- From outside, you see all processes with their host PIDs.
- From inside, you only see processes in that PID namespace (and possibly descendants), with local numbering.
This is crucial for containers: processes think they are PID 1 in a “full” system.
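The unshare example above requires root. Where unprivileged user namespaces are enabled, a user namespace can be stacked in front so no sudo is needed; a sketch that simply reports unavailability on hardened systems:

```shell
# Unprivileged variant: creating a user namespace first grants the
# capabilities needed for the new PID namespace. The first process forked
# into it becomes PID 1 there.
unshare --user --map-root-user --pid --fork --mount-proc \
    sh -c 'echo "PID inside the new namespace: $$"' 2>/dev/null \
    || echo "unprivileged PID namespaces unavailable here"
```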
Network Namespace
Network namespaces create isolated network stacks:
- Own set of network interfaces
- Own IP addresses, routing tables, ARP tables
- Own netfilter/iptables/nftables rules
- Own
/proc/sys/netsysctls
The initial network namespace contains the host’s real interfaces (e.g., eth0, wlan0).
A new network namespace starts with:
- Only a loopback interface (lo), which starts out down
- No routes and no external connectivity until configured
To connect network namespaces, you typically use:
- Virtual Ethernet (veth) pairs
- Bridges
- Tunnels
Example:
# Create a new net namespace and run a shell
sudo unshare --net --uts --fork /bin/bash
# Inside the net namespace:
ip link # Only 'lo'
ip addr add 10.0.0.2/24 dev lo
ip link set lo up
Typical container networking:
- The host creates a veth pair: veth-host and veth-guest.
- veth-guest is moved into the container’s net namespace.
- veth-host is attached to a Linux bridge (e.g., docker0).
- Routes and NAT rules are set on the host to give containers external access.
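The veth pattern just described can be sketched as a script. This is illustrative only: it requires root and iproute2, and the namespace name demo, the interface names veth-host/veth-guest, and the 10.200.0.0/24 addresses are arbitrary placeholders; without root it just reports that it was skipped.

```shell
# Sketch: wire a named network namespace to the host with a veth pair.
if [ "$(id -u)" -ne 0 ] || ! ip netns add demo 2>/dev/null; then
    result="skipped: needs root and netns support"
else
    ip link add veth-host type veth peer name veth-guest
    ip link set veth-guest netns demo
    ip addr add 10.200.0.1/24 dev veth-host
    ip link set veth-host up
    ip netns exec demo ip addr add 10.200.0.2/24 dev veth-guest
    ip netns exec demo ip link set veth-guest up
    ip netns exec demo ip link set lo up
    if ip netns exec demo ping -c 1 10.200.0.1 >/dev/null 2>&1; then
        result="namespaces connected"
    else
        result="veth pair created, ping did not get through"
    fi
    ip netns del demo    # deleting the netns tears the pair down too
fi
echo "$result"
```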
Tools:
- ip netns add/del/exec — manage persistent named network namespaces via /var/run/netns.
- ip link set dev X netns Y — move an interface into a given net namespace.
User Namespace
User namespaces map user and group IDs inside the namespace to different IDs outside:
- Inside a user namespace, a process can have UID 0 (root) but map to a non-root UID on the host.
- This enables “rootless” containers: they appear to run as root in their namespace but have limited power on the host.
Key concepts:
- UID/GID mappings are controlled via:
  - /proc/<pid>/uid_map
  - /proc/<pid>/gid_map
  - setgroups handling via /proc/<pid>/setgroups
- A mapping entry has the form:
inside_id outside_id length
Meaning: the length IDs starting at inside_id in the namespace map to the length IDs starting at outside_id in the parent user namespace.
Example (simplified):
- Inside namespace: UID 0–65535
- Host: UID 100000–165535
Mapping:
0 100000 65536
So inside-UID 0 is host-UID 100000, inside-UID 1 is host-UID 100001, and so on.
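The arithmetic behind a mapping entry is simple enough to sketch as a helper. Here map_uid and its arguments are illustrative names, not a real tool; it applies one "inside outside length" entry:

```shell
# map_uid <uid> <inside_start> <outside_start> <length>:
# print the outside (host) UID for an inside UID, or "unmapped" if the
# UID falls outside the mapped range.
map_uid() {
    if [ "$1" -ge "$2" ] && [ "$1" -lt "$(($2 + $4))" ]; then
        echo "$(($3 + $1 - $2))"
    else
        echo "unmapped"
    fi
}

map_uid 0 0 100000 65536       # → 100000
map_uid 1 0 100000 65536       # → 100001
map_uid 70000 0 100000 65536   # → unmapped (outside the 65536-ID range)
```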
Security implications:
- User namespaces significantly change permission checks: many kernel operations consider the “user namespace owning UID” and capability sets relative to that namespace.
- Processes may have capabilities (CAP_SYS_ADMIN etc.) inside their user namespace without having them on the host.
- You must think carefully about which kernel features are still reachable from a user-namespaced root.
Unprivileged user namespaces:
- Many distributions allow unprivileged users to create user namespaces (unshare --user) with configured ID ranges.
- Some distros restrict this for security hardening (e.g., via the kernel.unprivileged_userns_clone sysctl).
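A quick probe shows whether the current system allows this. Run as a normal user (no sudo); --map-root-user makes unshare write the uid_map entry for you, so id -u reports 0 inside the namespace:

```shell
# Can an unprivileged user create a user namespace here?
unshare --user --map-root-user sh -c 'id -u; cat /proc/self/uid_map' 2>/dev/null \
    || echo "unprivileged user namespaces are disabled here"
```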
Combined with other namespaces:
- Typically, a container uses a user namespace plus PID, mount, net, UTS, IPC, cgroup namespaces.
- The user namespace is often created first; additional namespaces are created “under” it, so capability checks are evaluated within that user namespace.
Cgroup Namespace
Cgroup namespaces virtualize the view of control groups:
- Without cgroup namespaces, a process inside a container would see the host’s full cgroup hierarchy in /proc/self/cgroup or /sys/fs/cgroup.
- With cgroup namespaces, a container sees its own cgroup as the root, even though it is nested deeper in the host’s tree.
Key effects:
- Hides host-level cgroup paths from containers.
- Makes in-container tools (like systemd or metrics collectors) think they are at the top of the cgroup hierarchy.
- Does not by itself enforce resource limiting — that is still done by cgroup controllers — but changes how the hierarchy is presented.
Note: cgroup v1 and v2 behave differently in detail, but the namespace idea is the same: localized view of cgroup paths.
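This localized view is easy to observe: /proc/self/cgroup reports paths relative to the caller's cgroup namespace root, so a container with its own cgroup namespace typically sees just "0::/" on a cgroup v2 host, while the initial namespace sees the full nested path.

```shell
# cgroup membership of this process, relative to its cgroup namespace.
# cgroup v2 prints a single "0::<path>" line; v1 prints one line per
# controller hierarchy.
cat /proc/self/cgroup
```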
Time Namespace (Newer)
Time namespaces allow processes to have:
- Per-namespace offsets for CLOCK_MONOTONIC and CLOCK_BOOTTIME
- A potentially different perceived system boot time
Motivations:
- Testing software that depends on uptime or monotonic clocks.
- Simulating long-running environments without changing host clock.
- Container migrations and checkpoint/restore.
Key aspects:
- The real-time clock (CLOCK_REALTIME) is not virtualized; time namespaces deal only with monotonic-style clocks.
- Offsets are visible via /proc/<pid>/timens_offsets.
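The offsets file can be read without privileges; in the initial time namespace all offsets are zero. The file exists only on kernels built with time-namespace support (5.6 and later), so a tolerant sketch:

```shell
# Per-clock offsets of this process's time namespace (zero in the initial
# namespace); falls back to a message on older kernels.
cat /proc/self/timens_offsets 2>/dev/null \
    || echo "time namespaces not supported by this kernel"
```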
Example usage is more specialized and often integrated with tools like CRIU (Checkpoint/Restore In Userspace).
Creating and Joining Namespaces Programmatically
At the system call level, namespaces are manipulated by:
- clone(2) — create a new process and, optionally, new namespaces (CLONE_NEW* flags).
- unshare(2) — detach the calling process from its current namespaces and move it into newly created ones for the specified types.
- setns(2) — join an existing namespace referenced by a file descriptor.
Typical patterns:
- Use clone to start a process directly in a new namespace:
pid_t child = clone(child_func, child_stack,
                    CLONE_NEWUTS | CLONE_NEWPID | SIGCHLD, arg);
- Use unshare and then execve (note: after unshare(CLONE_NEWPID) the caller keeps its own PID; only its children are created in the new PID namespace):
unshare(CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWPID);
execve("/bin/bash", ...);
- Use setns to enter a namespace that another process is already in:
int fd = open("/proc/1234/ns/net", O_RDONLY);
setns(fd, CLONE_NEWNET); // enter PID 1234's net namespace
close(fd);
Constraints:
- Many namespace operations require capabilities in the relevant user namespace (e.g., CAP_SYS_ADMIN).
- Often you must create a user namespace first to relax some requirements for the caller.
Inspecting and Comparing Namespaces
Besides /proc/<pid>/ns, there are several ways to inspect and analyze namespaces:
- lsns — list all namespaces of a given type and the processes in them:
sudo lsns
sudo lsns -t net
- readlink /proc/<pid>/ns/<type> — see the namespace ID
- ip netns list — list named network namespaces
- unshare — quickly create and test behavior in new namespaces
For debugging container internals:
- Check which namespaces a container’s main process is in.
- Use nsenter to join them:
sudo nsenter --target <pid> --mount --uts --pid --net --ipc /bin/bash
Here, nsenter is essentially a convenient wrapper around setns.
Namespace Relationships and Nesting
A process has:
- One mount namespace
- One UTS namespace
- One IPC namespace
- One PID namespace (with possible ancestors)
- One network namespace
- One user namespace (with possible ancestors)
- One cgroup namespace
- One time namespace
Key structural properties:
- User namespace hierarchy: user namespaces form a tree; privilege checks and uid/gid mappings depend on parent-child relationships.
- PID namespaces: hierarchical; a process is visible upward but not downward.
- Network and IPC namespaces: generally not hierarchical in the same sense; they’re just separate instances.
Lifecycle:
- A namespace is reference-counted:
- It exists as long as at least one process, open file descriptor, or bind mount refers to it (e.g., a bind-mounted /proc/self/ns/net).
- Once the last reference goes away, the namespace is destroyed.
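The file-descriptor case is easy to demonstrate without privileges: simply opening an ns file takes a reference that would keep the namespace alive even if every process in it exited.

```shell
# Hold a reference to this shell's UTS namespace via an open fd.
exec 3< /proc/self/ns/uts
readlink /proc/self/fd/3     # prints something like uts:[4026531838]
exec 3<&-                    # drop the reference again
```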
Interaction across namespaces:
- You can combine types arbitrarily (e.g., same PID namespace but different network namespaces).
- Security and isolation properties depend on which namespaces are used together and how user namespaces are configured.
Namespaces and Containers: Conceptual Mapping
While containers are not a kernel primitive, they are mostly composed of:
- PID namespace — isolated process tree
- Mount namespace — isolated filesystem
- UTS namespace — per-container hostname
- Network namespace — per-container interfaces + routing
- IPC namespace — per-container SysV IPC
- User namespace — remapped root and user IDs
- Cgroup namespace + cgroups — per-container resource view and control
Higher-level tools (Docker, Podman, Kubernetes runtimes, LXC/LXD, systemd-nspawn) are essentially elaborate frontends that:
- Set up mount trees, networking, id mappings, and cgroups.
- Create and join namespaces appropriately.
- Manage lifecycle, images, and configuration.
As you study containers more deeply, understanding each namespace’s semantics and interactions is essential for debugging, security analysis, and performance tuning.
Practical Experiments to Understand Namespaces
To make the concepts concrete, useful hands-on exercises include:
- Use unshare to create single-type namespaces:
  - UTS: modify the hostname without affecting the host.
  - PID: run ps inside vs. outside and compare.
  - Mount: mount a filesystem in the new namespace and verify it is invisible in the host one.
  - Net: configure lo and veth pairs, ping between namespaces, and examine routing tables.
- Use lsns to see how many namespaces your system already uses (many service managers isolate things).
- Pick a container PID and use /proc/<pid>/ns plus nsenter to “step into” its namespaces and observe what differs from the host.
These experiments help connect the abstract idea of “isolated kernel resources” with observable differences in process behavior and system layout.