7.5.1 Process lifecycle

Overview

Every running program in Linux is represented by a process. The process lifecycle is the story of how a process is born, how it runs, how it pauses or waits, and how it eventually dies. Understanding this lifecycle is essential for diagnosing performance problems, debugging, and reasoning about how user space interacts with the kernel.

This section focuses on what is specific to the process lifecycle. General concepts of memory management, signals, and interprocess communication are covered elsewhere.

From program to process

When you start a program in Linux, for example by typing ls in a shell, several distinct steps happen between the moment you request it and the moment the corresponding process is running.

At the user space level, a shell is itself a process. When you enter a command, the shell interprets it, resolves the executable path, and then asks the kernel to create a new process to run that program. In POSIX environments, the classic pattern uses fork() followed by one of the exec() family of calls. On Linux, the fork() that a C program calls is a library wrapper: the kernel exposes a more general process creation primitive, clone(), and the C library typically builds the familiar fork() semantics on top of it.

The high level view is that there is always a parent process that requests the creation of a child. User space processes do not appear spontaneously. Even system services started early in boot are descendants of the initial user space process that the kernel starts at the end of boot, PID 1, which on modern Linux systems is most often systemd, though other init systems are also used.

Process creation: fork-like operations

Linux historically supports Unix-like semantics based on fork(), which creates a new process by duplicating the calling process. At a high level, fork() creates a child process with its own process identifier (PID). After a successful fork(), both the parent and child resume execution at the same point in code, but with different return values so that they can distinguish their roles: fork() returns 0 in the child and the child's PID in the parent. If the call fails, it returns -1 in the parent and no child is created.
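The two roles can be seen in a short sketch. It uses Python's os.fork(), which wraps the same operation; the helper name fork_roles and the exit code 7 are illustrative choices, not part of any API.

```python
import os


def fork_roles():
    """Fork once and return (child PID seen by the parent, child's exit code)."""
    pid = os.fork()
    if pid == 0:
        # Child branch: fork() returned 0 here.
        # Leave immediately with a recognizable exit code.
        os._exit(7)
    # Parent branch: fork() returned the child's PID, a positive integer.
    _, status = os.waitpid(pid, 0)
    return pid, os.waitstatus_to_exitcode(status)


if __name__ == "__main__":
    child_pid, code = fork_roles()
    print(f"parent {os.getpid()} created child {child_pid}, which exited with {code}")
```

The child uses os._exit() rather than returning, so that it never continues into code that only the parent is meant to run.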

Internally, Linux does not copy all memory immediately. Instead, it uses a technique called copy-on-write. Parent and child initially share the same physical memory pages, marked as read only. When one of them writes to a page, the kernel creates a private copy of that page for the writing process. This avoids expensive copying at the moment of fork and makes process creation much cheaper, especially when the child replaces its program image shortly afterwards.
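Copy-on-write itself is invisible to user space, but its visible consequence is easy to check: a write made by the child after fork is never seen by the parent. The sketch below shows only that isolation, not the page sharing underneath it, and the function name write_after_fork is a hypothetical helper.

```python
import os


def write_after_fork():
    """Return (value the child sees after writing, value the parent still sees)."""
    data = ["original"]
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        os.close(read_fd)
        data[0] = "changed in child"   # the write lands in the child's private copy
        os.write(write_fd, data[0].encode())
        os._exit(0)
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as reader:
        child_view = reader.read().decode()   # read until the child closes the pipe
    os.waitpid(pid, 0)
    return child_view, data[0]


if __name__ == "__main__":
    print(write_after_fork())
```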

Linux generalizes process and thread creation with a primitive called clone(). This system call allows the caller to select which resources are shared and which are not. A process created by fork() gets its own copies of the address space, file descriptor table, and other state, whereas a threading library calls clone() with flags such as CLONE_VM and CLONE_FILES that share memory and other resources, so that threads live in the same address space.

From the lifecycle point of view, the key aspects of process creation are the assignment of a new PID, the creation of a task structure in the kernel, the definition of the initial register set, and the setup of scheduling state so that the new process can compete for CPU time.

Executing a new program: exec phase

If the goal is to run a different program than the parent, the child process typically calls one of the exec family of functions, such as execve(), after fork. The exec family replaces the current process image with a new program. This means that the process keeps the same PID and many kernel level attributes, but its code and data segments are replaced with those from the new executable and, where the executable names one, its interpreter (for example a script interpreter or the dynamic loader).

The lifecycle transition here is from "child process that still looks like the parent" to "process running a new program". The kernel discards the old user space address space, loads the executable from disk, maps the program into memory, sets up the stack, environment variables, and arguments (argv and envp), and then sets the instruction pointer to the entry point of the new program.

The combination of fork() followed by execve() is the canonical way shells and many other programs create new processes. In modern systems, there are also optimizations such as posix_spawn() that try to create and exec a new program in a more efficient, single logical step, though internally it still corresponds to a sequence of kernel operations that end in a new process image being attached to a task.
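As a rough illustration of the single-step style, Python exposes posix_spawn() as os.posix_spawn(). The sketch below assumes sys.executable points at a usable interpreter; spawn_and_wait is a hypothetical helper name.

```python
import os
import sys


def spawn_and_wait(exit_code):
    """Spawn a new interpreter that exits with exit_code; return the code observed."""
    argv = [sys.executable, "-c", f"import sys; sys.exit({exit_code})"]
    # One call creates the child and loads the new program into it.
    pid = os.posix_spawn(sys.executable, argv, os.environ)
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)


if __name__ == "__main__":
    print(spawn_and_wait(3))
```

The caller still has to wait for the child, exactly as it would after fork() and execve().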

A process is identified by its PID. After exec, the PID remains the same, but the user space program image is completely replaced. Creation of a new process requires a new PID; exec alone never creates a new PID.
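The statement that exec keeps the PID can be checked with a small experiment: fork a child, let it exec a fresh interpreter that prints its own PID, and compare that with the value fork() returned. This is a sketch under the assumption that sys.executable is a valid interpreter path; pid_across_exec is an illustrative name.

```python
import os
import sys


def pid_across_exec():
    """Return (child PID from fork, PID reported by the program run via exec)."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            os.close(read_fd)
            os.dup2(write_fd, 1)   # make the pipe the child's standard output
            # Replace this process image with a new interpreter.
            os.execv(sys.executable,
                     [sys.executable, "-c", "import os; print(os.getpid())"])
        finally:
            os._exit(127)          # only reached if exec failed
    os.close(write_fd)
    with os.fdopen(read_fd) as reader:
        reported = int(reader.read().strip())
    os.waitpid(pid, 0)
    return pid, reported


if __name__ == "__main__":
    print(pid_across_exec())
```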

Process identifiers and relationships

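Every process has a PID, which is unique at a given point in time within one PID namespace. PIDs are assigned from a bounded range of integers; on Linux the upper limit is configurable through /proc/sys/kernel/pid_max. The kernel hands them out in increasing order and wraps around when it reaches the limit, so a PID can be reused once the process that held it has exited and been reaped.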

Each process also has a parent process identifier, PPID, which records the PID of its current parent, initially the process that created it. This parent-child relationship creates a process tree. At the root of this tree is PID 1, the init system, which is created by the kernel at the end of the boot sequence. All other user space processes are descendants of this root; kernel threads form a separate subtree under kthreadd (PID 2).

During the lifecycle, parent and child are linked in several ways. The parent can wait for the child to terminate. Signals such as SIGCHLD inform the parent when the child exits or stops. If the parent exits before the child, the child becomes an orphan process and is reparented to a special process, usually pid 1, which then becomes responsible for performing cleanup when the child eventually terminates.

Scheduling and running

Once a process exists and has a program loaded, it enters the running part of its lifecycle. At this stage, the kernel scheduler decides when and on which CPU core the process runs. From the point of view of lifecycle, a process alternates among several basic states, such as running on the CPU, runnable but waiting its turn, or sleeping while it waits for some resource or event.

A process that is currently executing instructions on a CPU core is in the running state. When the scheduler decides that another task should run, it preempts the process: it removes the process from the CPU and places it back on a run queue. A process that is ready to run but is not currently on a CPU is described as runnable. Linux tools such as ps report both cases with the same state letter, R.

When a process makes a blocking system call, for example to read from disk or wait for input, the kernel may put it to sleep. A sleeping process does not consume CPU time. It waits in the kernel until the condition it is waiting for becomes true, such as data becoming available. Only then does the kernel move the process back to the runnable set.

A process only runs when scheduled on a CPU. Most of its lifetime is typically spent off CPU, either runnable or sleeping while it waits for I/O or other events.
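The difference between elapsed time and CPU time makes this concrete. The sketch below compares the two across a blocking sleep; sleep_cost is a hypothetical helper, and the exact numbers depend on the machine.

```python
import time


def sleep_cost(seconds=0.2):
    """Return (wall-clock seconds elapsed, CPU seconds consumed) across a sleep."""
    wall_start = time.monotonic()
    cpu_start = time.process_time()
    time.sleep(seconds)   # blocking call: the process is off the CPU while it waits
    wall = time.monotonic() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu


if __name__ == "__main__":
    wall, cpu = sleep_cost()
    print(f"wall {wall:.3f}s, cpu {cpu:.6f}s")
```

The CPU figure should be a tiny fraction of the wall-clock figure, because the process spends almost the whole interval asleep.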

In addition to ordinary sleeping and running, there are more specific internal states, such as interruptible and uninterruptible sleep, traced states, and stopped states related to signals and debugging, but the key lifecycle idea is that a process continuously transitions among active CPU execution, readiness to run, and waiting.
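On Linux, the current state of a process can be read from /proc/<pid>/stat as a single letter: R for running or runnable, S for interruptible sleep, D for uninterruptible sleep, T for stopped, t for tracing stop, and Z for zombie. The following sketch extracts that field. It assumes /proc is mounted, and the helper names are illustrative.

```python
import os


def parse_stat_state(stat_line):
    """Extract the one-letter state field from the text of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')' characters, so split after the last ')'.
    after_comm = stat_line.rsplit(")", 1)[1]
    return after_comm.split()[0]


def process_state(pid):
    """Read and parse the state of a live process (requires /proc)."""
    with open(f"/proc/{pid}/stat") as f:
        return parse_stat_state(f.read())


if __name__ == "__main__":
    print(process_state(os.getpid()))
```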

Waiting and blocking

A process often must wait. The lifecycle includes several types of waits. The most common are waits for I/O, such as when reading from a disk or socket, and waits for other processes, such as a parent waiting for a child.

When a process calls blocking I/O, the kernel checks whether the data is immediately available. If not, the kernel marks the process as sleeping and records the event that will wake it up, such as a completion of a disk operation or receipt of a network packet. The scheduler then chooses another process to run. When the event occurs, the kernel changes the sleeping process back to a runnable state, and at some later scheduling decision the process will run again and return from its system call with the requested data.

Waiting for a child process is a lifecycle specific variant. When a process calls wait() or similar functions, it may block until any child changes state, for example by exiting or stopping. When the kernel detects that a child has terminated, it wakes up the waiting parent so that the parent can collect the child's exit status and release the kernel resources associated with that child.
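A minimal sketch of a parent waiting for several children follows. It uses os.wait(), which blocks until any child terminates; run_children is a hypothetical helper and the exit codes are arbitrary.

```python
import os


def run_children(exit_codes):
    """Fork one child per exit code and collect {pid: exit code} as they finish."""
    pending = set()
    for code in exit_codes:
        pid = os.fork()
        if pid == 0:
            os._exit(code)        # child: terminate at once with its assigned code
        pending.add(pid)

    results = {}
    while pending:
        pid, status = os.wait()   # blocks until any child terminates
        if pid in pending:
            pending.discard(pid)
            results[pid] = os.waitstatus_to_exitcode(status)
    return results


if __name__ == "__main__":
    print(run_children([0, 1, 2]))
```

Children may be reported in any order, which is why the results are keyed by PID.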

Nonblocking variants of many operations exist. In those cases, the lifecycle is similar except that the process may check for availability and, if the data is not there, immediately return to user space and do other work instead of entering a sleeping state.
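The contrast with a blocking read can be sketched with a pipe switched to nonblocking mode. The function names try_read and nonblocking_pipe_demo are illustrative.

```python
import os


def try_read(fd, size=1024):
    """Return the available bytes, or None if the read would have blocked."""
    try:
        return os.read(fd, size)
    except BlockingIOError:
        return None


def nonblocking_pipe_demo():
    read_fd, write_fd = os.pipe()
    os.set_blocking(read_fd, False)
    first = try_read(read_fd)    # pipe is empty: returns at once instead of sleeping
    os.write(write_fd, b"ready")
    second = try_read(read_fd)   # data is present: returned immediately
    os.close(read_fd)
    os.close(write_fd)
    return first, second


if __name__ == "__main__":
    print(nonblocking_pipe_demo())
```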

Signals and lifecycle transitions

Signals are asynchronous notifications that can change a process state abruptly. While the details of signals are covered elsewhere, here they matter as lifecycle triggers.

Signals can request that a process terminate, stop, or continue. For example, SIGTERM normally asks a process to terminate gracefully, while SIGKILL forces immediate termination. SIGSTOP can cause a process to stop execution, moving it into a stopped state, and SIGCONT can resume it, bringing it back to the runnable set.

From the lifecycle point of view, signal delivery can cause transitions from running to stopped, from running or sleeping to terminated, or from stopped to runnable. If a signal is configured with a custom handler, the process temporarily diverts its execution to that handler function, then returns to normal flow. If the signal instead takes its default action of termination or stopping, the kernel changes the process state accordingly when the signal is delivered.
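These transitions can be observed from a parent by asking waitpid() to report stop and continue events as well as termination. The sketch below is one way to do that; stop_continue_kill is a hypothetical helper.

```python
import os
import signal
import time


def stop_continue_kill():
    """Drive a child through stopped, continued, and killed; report each step."""
    pid = os.fork()
    if pid == 0:
        try:
            while True:
                time.sleep(1)   # child: idle until a signal changes its state
        finally:
            os._exit(1)

    events = []
    os.kill(pid, signal.SIGSTOP)
    _, status = os.waitpid(pid, os.WUNTRACED)    # also report stopped children
    events.append(("stopped", os.WIFSTOPPED(status)))

    os.kill(pid, signal.SIGCONT)
    _, status = os.waitpid(pid, os.WCONTINUED)   # also report resumed children
    events.append(("continued", os.WIFCONTINUED(status)))

    os.kill(pid, signal.SIGKILL)
    _, status = os.waitpid(pid, 0)
    killed = os.WIFSIGNALED(status) and os.WTERMSIG(status) == signal.SIGKILL
    events.append(("killed", killed))
    return events


if __name__ == "__main__":
    print(stop_continue_kill())
```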

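Some signals are also used to inform parents about child lifecycle events. For example, when a child exits, the kernel sends SIGCHLD to the parent. The default action for this signal is to ignore it, so a parent that wants to react must install a handler, which it can use to trigger wait() calls or other cleanup actions. If the parent explicitly sets SIGCHLD to be ignored, Linux reaps its children automatically and they do not become zombies.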
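A handler-driven reaper might look like the sketch below. It assumes it runs in the main thread, since Python only allows signal handlers to be installed there. The names reaped, on_sigchld, and run_child_with_handler are illustrative.

```python
import os
import signal
import time

reaped = {}   # pid -> exit code, filled in by the handler


def on_sigchld(signum, frame):
    """Reap every child that has terminated, without blocking."""
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            return                # no children left at all
        if pid == 0:
            return                # children exist, but none has exited yet
        reaped[pid] = os.waitstatus_to_exitcode(status)


def run_child_with_handler(exit_code, timeout=5.0):
    """Fork a child and let the SIGCHLD handler collect its exit code."""
    previous = signal.signal(signal.SIGCHLD, on_sigchld)
    try:
        pid = os.fork()
        if pid == 0:
            os._exit(exit_code)
        deadline = time.monotonic() + timeout
        while pid not in reaped and time.monotonic() < deadline:
            time.sleep(0.01)      # the handler runs while the parent idles
        return reaped.get(pid)
    finally:
        if previous is not None:
            signal.signal(signal.SIGCHLD, previous)


if __name__ == "__main__":
    print(run_child_with_handler(5))
```

The handler loops because several children can exit before a single SIGCHLD is handled.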

Process exit and termination

Eventually, every process reaches the end of its execution. Termination can be voluntary, for example when the main function returns or when the program calls exit() or _exit(). It can also be forced by the kernel in response to unrecoverable exceptions, security violations, or signals such as SIGKILL.

When a process terminates, user space execution stops and the kernel runs its exit path. The kernel closes file descriptors, releases memory mappings, frees most kernel data structures associated with the process, and updates accounting information. The process then enters a "zombie" state if its parent still has to collect the exit status, or a "dead" state, in which it is removed entirely, if nobody needs that status.

The process exit code is a small integer, usually in the range 0 to 255 for shell-visible codes, that indicates success or failure to interested observers. By convention, 0 means success and nonzero values indicate errors. The shell and other supervisory processes use this code to decide what to do next.

If the process terminates because of a fatal signal, the kernel records that signal as the reason. From the outside, tools may report that the process was "killed by signal N" rather than "exited with status X". This difference matters when debugging abnormal terminations.
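The status value returned by wait() encodes which of the two cases occurred. A minimal sketch of decoding it follows; describe_status and run_child are hypothetical helpers.

```python
import os
import signal


def describe_status(status):
    """Turn a wait status into the kind of message tools usually print."""
    if os.WIFEXITED(status):
        return f"exited with status {os.WEXITSTATUS(status)}"
    if os.WIFSIGNALED(status):
        return f"killed by signal {os.WTERMSIG(status)}"
    return "other state change"


def run_child(action):
    """Fork, run action() in the child, and describe how the child ended."""
    pid = os.fork()
    if pid == 0:
        try:
            action()
        finally:
            os._exit(0)   # reached only if action() returns or raises
    _, status = os.waitpid(pid, 0)
    return describe_status(status)


if __name__ == "__main__":
    print(run_child(lambda: os._exit(3)))
    print(run_child(lambda: os.kill(os.getpid(), signal.SIGKILL)))
```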

Zombie processes and reaping

When a process terminates, the kernel cannot always immediately discard all of its bookkeeping data. The parent process may still want to call wait() to learn the exit status. For this reason, the kernel keeps a minimal record of the process. A process in this state is commonly called a zombie.

A zombie does not have an active thread of execution. It does not run or consume CPU time. Most of its resources such as memory and file descriptors have already been released. What remains is a small entry in the kernel's task table that stores the PID, exit code, and some accounting data. The process remains in this zombie state until its parent calls wait() or a related function and collects its termination status.

A zombie process is not running. It is a dead process that has not yet been reaped by its parent. Zombies hold a PID and small bookkeeping data until the parent calls wait().

Reaping is the act of calling wait() (or variants such as waitpid()) to collect the child's exit status and allow the kernel to finally discard its remaining entry. Once a zombie has been reaped, it disappears completely from process listings and its PID may eventually be reused for a new process.
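The zombie window can be observed directly on Linux. The sketch below waits for a child to exit without reaping it, reads its state from /proc, and only then reaps it. It assumes /proc is mounted; state_of and observe_zombie are illustrative names.

```python
import os


def state_of(pid):
    """Return the one-letter state from /proc/<pid>/stat."""
    with open(f"/proc/{pid}/stat") as f:
        return f.read().rsplit(")", 1)[1].split()[0]


def observe_zombie():
    """Return (state before reaping, whether /proc/<pid> exists after reaping)."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)
    # Block until the child has exited, but leave it unreaped (WNOWAIT).
    os.waitid(os.P_PID, pid, os.WEXITED | os.WNOWAIT)
    before = state_of(pid)
    os.waitpid(pid, 0)   # reap: the kernel can now discard the entry
    return before, os.path.exists(f"/proc/{pid}")


if __name__ == "__main__":
    print(observe_zombie())
```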

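If a parent stays alive but never calls wait(), its zombies accumulate, and each one continues to occupy a PID. The kernel does not clean these up on the parent's behalf while the parent is running. Once the parent itself exits, its remaining children, including any zombies, are reparented to pid 1 or to a subreaper. The init system reaps adopted children by calling wait() when they terminate, which prevents them from exhausting the PID space.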

Orphans and reparenting

If a parent process terminates while it still has running children, those children become orphans. From a lifecycle perspective, they continue to run and behave normally. The kernel simply assigns them a new parent, typically the init process with PID 1 or a subreaper process configured to take over child responsibilities.
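Reparenting can be watched from inside the orphan. In the sketch below a middle process forks a grandchild and exits; the grandchild polls getppid() until it changes and reports the new value. Which PID it reports depends on the environment (1, or a subreaper), so the code only checks that the parent changed. orphan_new_parent is a hypothetical helper.

```python
import os
import time


def orphan_new_parent(timeout=5.0):
    """Return (PID of the exited middle process, PPID the orphan sees afterwards)."""
    read_fd, write_fd = os.pipe()
    middle = os.fork()
    if middle == 0:
        os.close(read_fd)
        original_parent = os.getpid()
        if os.fork() == 0:
            # Grandchild: wait until the kernel has assigned a new parent.
            deadline = time.monotonic() + timeout
            while os.getppid() == original_parent and time.monotonic() < deadline:
                time.sleep(0.01)
            os.write(write_fd, str(os.getppid()).encode())
            os._exit(0)
        os._exit(0)   # middle process exits, orphaning the grandchild
    os.close(write_fd)
    os.waitpid(middle, 0)
    with os.fdopen(read_fd) as reader:
        adopted_by = int(reader.read())
    return middle, adopted_by


if __name__ == "__main__":
    print(orphan_new_parent())
```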

Reparenting ensures that every process always has a parent in the process tree, even if the original creator dies. This is important because someone must be responsible for reaping the process when it terminates. PIDs are a limited resource, so the kernel needs some parent to call wait() eventually.

Subreapers provide more flexible control. A process can request to act as a subreaper, on Linux through prctl() with the PR_SET_CHILD_SUBREAPER option, which means that when its descendant processes become orphaned, they are reparented to this subreaper rather than to PID 1. This is often used in container runtimes. Within a container, the container's init process becomes the effective root of a process subtree, responsible for lifecycle management of its descendants.
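Python has no direct wrapper for prctl(), so the sketch below reaches it through ctypes. That is an assumption about the environment: it expects Linux with a C library that exports prctl. The constant 36 is the value of PR_SET_CHILD_SUBREAPER in <linux/prctl.h>, and the helper names are illustrative.

```python
import ctypes
import os
import time

PR_SET_CHILD_SUBREAPER = 36   # value from <linux/prctl.h>


def set_subreaper(enabled):
    """Turn the calling process's subreaper attribute on or off."""
    libc = ctypes.CDLL(None, use_errno=True)
    args = [ctypes.c_ulong(v) for v in (1 if enabled else 0, 0, 0, 0)]
    if libc.prctl(ctypes.c_int(PR_SET_CHILD_SUBREAPER), *args) != 0:
        raise OSError(ctypes.get_errno(), "prctl(PR_SET_CHILD_SUBREAPER) failed")


def adopt_orphan(timeout=5.0):
    """Return True if an orphaned grandchild was reparented to this process."""
    set_subreaper(True)
    try:
        read_fd, write_fd = os.pipe()
        middle = os.fork()
        if middle == 0:
            os.close(read_fd)
            me = os.getpid()
            if os.fork() == 0:
                # Grandchild: wait for reparenting, then report the new parent.
                deadline = time.monotonic() + timeout
                while os.getppid() == me and time.monotonic() < deadline:
                    time.sleep(0.01)
                os.write(write_fd, str(os.getppid()).encode())
                os._exit(0)
            os._exit(0)           # middle process exits, orphaning the grandchild
        os.close(write_fd)
        os.waitpid(middle, 0)
        with os.fdopen(read_fd) as reader:
            new_parent = int(reader.read())
        adopted = new_parent == os.getpid()
        if adopted:
            os.wait()             # reaping the orphan is now this process's job
        return adopted
    finally:
        set_subreaper(False)


if __name__ == "__main__":
    print(adopt_orphan())
```

Note the final os.wait(): a subreaper that adopts orphans also inherits the duty to reap them.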

Process lifecycle and namespaces

While namespaces are discussed in detail elsewhere, they interact strongly with the process lifecycle. A namespace isolates particular kernel resources, such as PIDs, from other parts of the system. In a PID namespace, the process tree appears to start at a different root. The lifecycle steps (creation, running, waiting, and exiting) are the same, but the visible PIDs and parent relationships inside the namespace may be different from those seen in the parent namespace.

Creation of a new PID namespace typically uses clone() with the CLONE_NEWPID flag. The first process in a PID namespace has PID 1 inside that namespace, and it assumes the role of init for that namespace, including reaping zombies. When that namespace init exits, the kernel kills all remaining processes in the namespace with SIGKILL, which is an important lifecycle boundary, similar to system shutdown in the global namespace.

Lifecycle and cgroups

Control groups, or cgroups, provide a way to group processes for resource accounting and control. From a lifecycle perspective, a process enters one or more cgroups when it is created or when it is later moved by an administrator or a management process.
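A process can inspect its own cgroup membership through /proc/self/cgroup, where each line has the form hierarchy-ID:controller-list:path (on a cgroup v2 system there is a single line starting with "0::"). The parser below is a sketch of reading that format; the function names are illustrative.

```python
def parse_cgroup_line(line):
    """Split one /proc/<pid>/cgroup line into (hierarchy id, controllers, path)."""
    hierarchy_id, controllers, path = line.rstrip("\n").split(":", 2)
    return int(hierarchy_id), [c for c in controllers.split(",") if c], path


def current_cgroups(pid="self"):
    """Return the parsed cgroup memberships of a process (requires /proc)."""
    with open(f"/proc/{pid}/cgroup") as f:
        return [parse_cgroup_line(line) for line in f if line.strip()]


if __name__ == "__main__":
    for entry in current_cgroups():
        print(entry)
```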

The transitions among running, sleeping, and exiting still occur as usual, but cgroups can impose additional limits. For example, if a CPU quota is applied to a cgroup, processes within that group may be throttled. If a memory limit is exceeded, the kernel's out of memory killer may select one or more processes in that cgroup to terminate. These controls effectively shape the lifecycle of the contained processes by adding resource driven termination or throttling events.

Cgroups also allow structured cleanup. A cgroup can only be removed once it no longer contains any processes, so every remaining member must first exit, be killed, or be moved elsewhere. This creates a natural grouping of process lifecycles associated with higher level constructs such as containers or services.

Process lifecycle and system shutdown

When the system shuts down or reboots, process lifecycles reach a global endpoint. The init system coordinates a controlled termination of running services and applications. From the kernel's perspective, shutdown is a sequence of termination requests and cleanup operations applied to almost all processes.

The init process sends signals, usually SIGTERM followed by SIGKILL if necessary, to processes under its control. Each process transitions through its normal exit path or is forcibly terminated. The init system waits for services to stop, reaps zombies, and ensures no critical processes are left running that would interfere with unmounting filesystems or finalizing device state.

This orchestrated termination is different from an isolated process exit, but uses the same fundamental lifecycle mechanisms. At the end of a graceful shutdown, no user space processes other than the init system remain. The init system then asks the kernel to halt, power off, or restart, and the kernel executes platform specific code to do so.

Summary

The Linux process lifecycle covers several key stages. A process is created as a child of an existing process using fork-like mechanisms, and it may then replace its program image using exec. Once started, it alternates between running on a CPU, being runnable, and sleeping while it waits for resources, with signals able to alter its state asynchronously.

Termination occurs voluntarily or in response to signals or kernel decisions. On exit, a process becomes a zombie until its parent reaps it by calling wait(). If the parent dies first, the kernel reparents the child to a suitable ancestor, typically pid 1 or a configured subreaper. Namespaces and cgroups affect how lifecycles are seen and constrained, but they build on the same kernel primitives.

By viewing processes through this lifecycle lens, you gain a deeper understanding of how the kernel manages the creation, existence, and destruction of the activities that make up a Linux system.
