Overview
In distributed memory programming with MPI, the basic unit of execution is the MPI process. Understanding what an MPI process is, how it is created and identified, and how it interacts with other processes is essential before you can make sense of any MPI communication.
This chapter focuses on the life cycle and properties of MPI processes, how they differ from threads, and the implications for program design and correctness.
What Is an MPI Process
An MPI process is a regular operating system process that participates in an MPI application. Each MPI process has its own address space, its own variables, its own heap and stack, and generally cannot directly access the memory of another MPI process. All communication between MPI processes goes through MPI library calls.
When you run an MPI program, you typically start several independent processes that all execute the same program image. This model is often called SPMD, which stands for Single Program, Multiple Data. Each process has its own view of the world and decides, usually based on its rank, which part of the work to do.
Key property: Different MPI processes do not share memory. Any data exchange between MPI processes must happen through MPI communication calls.
Starting MPI Processes
MPI processes are started by an MPI launcher, not by calling main several times manually. On many systems using MPI and SLURM, you might see commands like:
srun --mpi=pmix_v3 -n 4 ./my_mpi_program
or, on systems where mpirun or mpiexec is used directly:
mpirun -np 4 ./my_mpi_program
In both cases, four MPI processes will be created, and each will start executing the same program from main.
The MPI standard does not define the launcher itself, only that once your program begins, you must call MPI_Init (or MPI_Init_thread) before using any other MPI function. After that call, your code is officially running as one MPI process that is part of a group of MPI processes.
Inside your main, the structure is typically:
#include <mpi.h>
#include <stdio.h>
int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    /* Work as an MPI process */

    MPI_Finalize();
    return 0;
}
Every process enters MPI_Init, and every process must eventually call MPI_Finalize. The creation and termination of MPI processes themselves are managed by the runtime environment and launcher.
Ranks and Communicators
An MPI process is identified by an integer rank within an MPI communicator. The most fundamental communicator is MPI_COMM_WORLD, which contains all processes in the MPI job.
To query the number of processes and the rank of the current process, you use:
int size, rank;
MPI_Comm_size(MPI_COMM_WORLD, &size);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
After these calls, size holds the total number of MPI processes in MPI_COMM_WORLD, and rank holds the unique identifier of the calling process within that communicator. Ranks are always integers from 0 to size - 1.
Important rule: Within a communicator, each MPI process has a unique rank in the range 0, 1, ..., size - 1. All point-to-point and many collective operations refer to processes by their rank.
For example, in a run with 4 processes:
- The process with rank = 0 is often called the root process.
- The processes with rank = 1, 2, 3 are the other worker processes.
The choice of what each rank does is up to your program. MPI does not enforce special behavior for rank 0; it is only a common convention to use rank 0 for coordination, I/O, or printing status messages.
SPMD Programming Style with MPI Processes
In the SPMD style, all MPI processes execute the same program. The different roles and work assignments are determined by rank. A very simple pattern looks like:
if (rank == 0) {
    /* root process: e.g., read input, distribute work */
} else {
    /* worker processes: e.g., perform computations */
}
More complex programs avoid a strict root/worker distinction and instead partition data and computation uniformly across ranks. For example, if you have an array of N elements and size MPI processes, each process can handle approximately N / size elements, as sketched below.
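A minimal sketch of such a block partition, assuming rank and size have been obtained as shown above and N is the length of the global array; it also covers the case where N is not evenly divisible by size:
/* Block decomposition: each rank owns the contiguous index range
   [start, start + count) of a global array with N elements. */
int base = N / size;            /* minimum number of elements per rank         */
int rest = N % size;            /* elements left over after even division      */
int count = base + (rank < rest ? 1 : 0);              /* my number of elements */
int start = rank * base + (rank < rest ? rank : rest); /* my first global index */
/* This process is responsible for elements start .. start + count - 1. */
The first rest ranks each receive one extra element, so all N elements are covered without gaps or overlap for any value of size.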
The important idea is that the program logic is written once, but the behavior differs per process based on rank. You should think of each MPI process as an independent entity that follows the same script but with different local data and perhaps different branches.
Memory and Data Isolation
Each MPI process has its own private memory space. If you declare a variable like:
double x = 0.0;
each MPI process has its own x. Changing x in one process does not affect x in any other process.
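As a minimal, self-contained sketch of this isolation, the following program gives each process's private x a different value; no assignment is ever visible to any other process:
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double x = 0.0;      /* every process has its own, independent x        */
    x = 10.0 * rank;     /* this change is invisible to all other processes */
    printf("Rank %d sees x = %.1f\n", rank, x);

    MPI_Finalize();
    return 0;
}
Running it with four processes prints four lines, each reporting a different value of x.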
This model sharply contrasts with shared memory threaded programming, such as with OpenMP, where multiple threads can access the same variable in memory. With MPI, there is no hidden sharing. The only way to make the values in different processes consistent is explicit communication.
This isolation has several consequences for program design:
First, there are no data races in MPI arising from unsynchronized shared variables, because no variables are truly shared between processes. However, there can still be logical races and ordering issues in communication, such as mismatched sends and receives.
Second, each process is responsible for its own memory allocation. If you allocate an array with malloc or new, that array is local to the process. If you want each process to have its own part of a large global array, you must define how that global array is split into local pieces and how processes exchange boundary data when needed.
Third, pointers and addresses are meaningful only within a single MPI process. You cannot send a raw pointer to another process and expect it to point to something useful in the receiver. When using MPI communication calls, you send the contents of memory locations, not their addresses.
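The following fragment illustrates the second and third points. It is only a sketch, reusing the hypothetical block decomposition variables start and count from the earlier example and assuming it sits inside the usual main structure:
/* Uses <stdlib.h> for malloc and free. Each process allocates and
   initializes only the part of the global array that it owns. */
double *local = malloc(count * sizeof(double));
for (int i = 0; i < count; i++) {
    local[i] = 0.0;      /* local index i corresponds to global index start + i */
}
/* 'local' is an address in this process's own address space and is meaningless
   in any other process. MPI calls transfer the contents that such a pointer
   refers to, never the pointer itself. */
free(local);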
Process Topology and Placement
MPI processes are typically placed across nodes in a cluster. Some processes may run on the same node and share a physical memory system, while others run on different nodes and communicate through the network. The MPI layer abstracts this detail and presents a uniform view where all processes belong to a single communicator.
However, process placement often affects performance. Several aspects are important:
First, nearby processes, for example processes on the same node or on the same physical socket, can communicate faster than distant ones. HPC job schedulers and MPI launchers allow you to control how many processes run on each node and sometimes to control their binding to CPU cores. While the detailed control of placement belongs to other chapters, it is useful to remember that ranks are not inherently tied to specific nodes or cores.
Second, MPI provides concepts such as process topologies and Cartesian communicators, which allow you to arrange ranks logically in a grid or other structure. The mapping from these logical coordinates to physical hardware can be important for performance but does not change the basic idea that each MPI process has a unique rank in a communicator. A brief code sketch of a Cartesian communicator follows after the next point.
Third, on hybrid systems, a single node may run multiple MPI processes, each potentially using multiple threads. In that case, processes and threads coexist. Here it is important to distinguish that an MPI process is still an operating system process, while threads are lighter weight entities within that process.
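The sketch below illustrates the Cartesian communicators mentioned in the second point. It is only a preview, assuming MPI_Init has been called and that rank and size are already known: MPI_Dims_create chooses a balanced two-dimensional grid, MPI_Cart_create builds a communicator with that logical layout, and MPI_Cart_coords reports the grid coordinates of the calling process.
int dims[2] = {0, 0};       /* let MPI choose both grid dimensions */
int periods[2] = {0, 0};    /* non-periodic in both directions     */
int coords[2];
MPI_Comm cart_comm;

MPI_Dims_create(size, 2, dims);
MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 1, &cart_comm);

int cart_rank;
MPI_Comm_rank(cart_comm, &cart_rank);
MPI_Cart_coords(cart_comm, cart_rank, 2, coords);
printf("World rank %d is rank %d at (%d, %d) in a %d x %d grid\n",
       rank, cart_rank, coords[0], coords[1], dims[0], dims[1]);

MPI_Comm_free(&cart_comm);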
MPI Processes vs Threads
It is useful to compare MPI processes to threads, especially if you have seen shared memory programming before.
An MPI process is a full process, with its own address space, stack, and heap. Communication between MPI processes is always explicit and usually more expensive than communication between threads on the same node. Processes are isolated from each other by the operating system.
A thread, such as a POSIX thread or an OpenMP thread, lives inside a process. All threads within a process share the same address space. They can read and write the same variables in memory, which allows faster data sharing but introduces the need for careful synchronization to avoid race conditions.
Some key contrasts are:
First, memory model: MPI processes have disjoint, private memory spaces. Threads share a single memory space inside a process.
Second, startup and management: MPI processes are created at job launch and, in most applications, their number remains fixed throughout the run. Threads can be created dynamically inside a process, and their number can vary over time.
Third, failure isolation: If one MPI process crashes, the whole MPI job often fails. If one thread crashes or corrupts shared memory, it usually brings down the entire process and thus all threads in it.
Fourth, programming implications: With MPI, you spend more effort managing data distribution and communication. With threads, you spend more effort managing synchronization and avoiding data races.
Hybrid programming models combine MPI processes across nodes with threads within each process. That topic is handled in a separate chapter. Here, the important point is that MPI processes are the outer units of execution in a distributed memory program.
Rank-Based Control Flow and Program Structure
Since all MPI processes start running the same code, your first task in any MPI program is to query your rank and size and then decide what to do. A minimal MPI program that distinguishes behavior by rank might look like:
#include <mpi.h>
#include <stdio.h>
int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    printf("Hello from rank %d of %d\n", rank, size);

    MPI_Finalize();
    return 0;
}

All processes execute this code, but each prints a different rank. This example illustrates a central pattern: the same code, but different local state.
More sophisticated structures involve dividing ranks into groups or roles. For example, you might choose one rank to perform file I/O, some ranks to handle boundary condition updates, and all ranks to participate in numerical computation.
You can also create new communicators with subsets of processes. Inside each communicator, the processes receive new ranks, which may differ from their ranks in MPI_COMM_WORLD. That allows you to organize processes hierarchically or by function. The detailed mechanisms of communicators and groups are covered elsewhere, but for MPI processes, it is important to understand that a process can have multiple ranks, one per communicator, and that rank is always relative to a specific communicator.
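As a small preview of this idea (the mechanisms are covered in detail elsewhere), the following sketch splits MPI_COMM_WORLD into two sub-communicators based on whether the world rank is even or odd; within each new communicator, processes receive fresh ranks starting from 0. It assumes rank has been obtained as usual.
int color = rank % 2;        /* processes with the same color end up in the same communicator */
MPI_Comm sub_comm;
MPI_Comm_split(MPI_COMM_WORLD, color, rank, &sub_comm);

int sub_rank, sub_size;
MPI_Comm_rank(sub_comm, &sub_rank);
MPI_Comm_size(sub_comm, &sub_size);
printf("World rank %d is rank %d of %d in its sub-communicator\n",
       rank, sub_rank, sub_size);

MPI_Comm_free(&sub_comm);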
Process Lifetime and Finalization
The lifetime of an MPI process, from the point of view of MPI itself, begins at MPI_Init and ends at MPI_Finalize. Before MPI_Init and after MPI_Finalize, MPI calls are generally not allowed.
Inside this lifetime, the process participates fully in communication and collective operations, may create new communicators, and may synchronize with other processes.
There are a few rules about process lifetime that you must respect:
All MPI processes that call MPI_Init must also call MPI_Finalize, and they must do so in a logically consistent way. For example, if a collective operation involves all processes in a communicator, every process in that communicator is required to participate.
Early termination of a single process by calling exit or by throwing an exception that bypasses MPI_Finalize usually leads to an abnormal termination of the entire MPI job. Some MPI implementations provide extensions for fault tolerance, but these are advanced features and not part of typical introductory usage.
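If a process must deliberately terminate the whole job, for example after detecting an unrecoverable error, the usual approach is to call MPI_Abort rather than exit. A minimal sketch, where fatal_error_detected is a hypothetical flag set by your own error checking:
if (fatal_error_detected) {
    fprintf(stderr, "Rank %d: fatal error, aborting the MPI job\n", rank);
    MPI_Abort(MPI_COMM_WORLD, 1);   /* requests termination of all processes in the job */
}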
This global dependence of processes on each other means that designing consistent control flow is critical. If one process gets stuck waiting for a message that never arrives because another process has already exited or skipped a matching call, the entire program can deadlock.
Practical Tips for Working with MPI Processes
A few practical points help you reason about MPI processes correctly in your early MPI programs.
First, imagine separate programs: When you think about MPI processes, imagine that you are running several copies of your program in separate terminals. Each has its own variables and memory. The only way they can share information is through sending and receiving messages. This mental model helps prevent accidental assumptions of shared state.
Second, always check rank and size at the beginning: Most MPI programs start by calling MPI_Comm_rank and MPI_Comm_size as early as possible after MPI_Init. This makes it easier to structure your code around the roles of processes.
Third, use rank 0 carefully for I/O: It is common to let rank 0 handle reading input and printing progress. However, doing all work on rank 0 can create bottlenecks. As your programs grow, consider distributed I/O, where multiple ranks read and write their own portions of data; a simple sketch follows after the last tip.
Fourth, be explicit about ownership: For any global logical object, such as an array or a grid, decide which process owns which parts of it. Write this ownership rule down in comments. This discipline helps you understand which process should allocate which data, and which processes need to communicate.
Fifth, be wary of hard-coded ranks: In simple examples, you might write code that assumes a specific number of processes, for instance, four processes with ranks 0 to 3, each with a custom role. For real applications, try to design algorithms that work correctly for any size that satisfies basic conditions, such as divisibility of data size. This makes your code more portable and flexible.
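As a simple illustration of the distributed I/O mentioned in the third tip, one basic pattern is file-per-process output, where each rank writes its own portion of the data to its own file. This is only a sketch; the file name scheme and the written content are purely illustrative, and rank is assumed to be known.
/* Uses <stdio.h>, already included in the examples above.
   Each rank writes its own data to a file named after its rank. */
char filename[64];
snprintf(filename, sizeof(filename), "output_rank_%d.dat", rank);

FILE *f = fopen(filename, "w");
if (f != NULL) {
    fprintf(f, "Data owned by rank %d\n", rank);   /* placeholder for real output */
    fclose(f);
}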
Summary
MPI processes are the fundamental units of execution in a distributed memory MPI program. Each MPI process is an operating system process with a private memory space. Processes join an MPI job at MPI_Init, obtain a rank within communicators, participate in communication and computation according to their rank, and eventually call MPI_Finalize.
Ranks serve as the logical identifiers of processes, and the SPMD style uses the same program for all processes but with role decisions based on rank. The isolation of memory between processes simplifies some aspects of parallel programming, such as avoiding data races from shared variables, but shifts complexity to explicit data distribution and communication.
A clear understanding of MPI processes, their identity, their isolation, and their lifetime is a prerequisite for effective use of the MPI communication routines that are introduced in later chapters.