Understanding MPI Processes
In MPI, processes are the fundamental units of execution. Everything in MPI revolves around how these processes are created, identified, and made to cooperate.
This chapter focuses on:
- What an MPI process is (in contrast to threads)
- How MPI processes are started and identified
- Ranks and `MPI_COMM_WORLD`
- Basic process-related MPI calls
- How MPI processes interact with the underlying cluster
- Common patterns and pitfalls involving MPI processes
What Is an MPI Process?
An MPI process is:
- A normal operating system process
- With its own:
- Address space (private memory)
- Stack and heap
- File descriptors
- Running the same executable on each participating node (SPMD model)
Crucially:
- MPI processes do not share memory by default.
- Any data exchanged between processes must use MPI communication routines (send/receive, collectives, etc.).
This is fundamentally different from shared-memory threading models (like OpenMP) where multiple threads share a single address space.
The SPMD Model
Most MPI programs follow the Single Program, Multiple Data (SPMD) pattern:
- You compile one MPI program (one executable).
- You run it as multiple processes via an MPI launcher (e.g., `mpirun`, `srun`, `mpiexec`).
- Each process:
- Executes the same code
- May follow different code paths depending on its rank
- Typically operates on a different subset of the data
Conceptually:
- All MPI processes start in `main()`.
- Behavior is often distinguished with `if (rank == 0) { ... } else { ... }` and similar constructs, as in the sketch below.
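To make this concrete, here is a small, self-contained sketch (illustrative only; the 100-element block per rank is an arbitrary choice for this example). Every process executes the same `main()`, rank 0 takes an extra code path, and each rank operates on its own block of indices. `MPI_Comm_rank` and `MPI_Comm_size` are introduced in the following sections:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   // this process's rank
    MPI_Comm_size(MPI_COMM_WORLD, &size);   // total number of processes

    if (rank == 0) {
        // Different code path: only rank 0 prints the job-wide header.
        printf("SPMD job with %d processes\n", size);
    }

    // Different data: each rank sums its own block of 100 indices.
    long local_sum = 0;
    for (int i = rank * 100; i < (rank + 1) * 100; i++) {
        local_sum += i;
    }
    printf("Rank %d: local sum over its block = %ld\n", rank, local_sum);

    MPI_Finalize();
    return 0;
}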
Starting MPI Processes
You do not call `fork()` or create MPI processes yourself. Instead:
- You run your MPI program through an MPI launcher such as `mpirun`, `mpiexec`, or the job scheduler’s MPI integration (e.g., `srun` with SLURM).
Typical usage:
mpirun -np 4 ./my_mpi_program

or with a scheduler:

srun -n 4 ./my_mpi_program

Here:
- `-np 4` or `-n 4` requests 4 MPI processes.
- The MPI runtime plus the scheduler decide where (on which nodes and cores) to place those processes.
- Each of the 4 processes will execute the same `./my_mpi_program` binary, starting at `main()`.
Process Ranks and `MPI_COMM_WORLD`
Every MPI process is given an integer identifier called a rank within a communicator.
The default communicator that includes all processes in the MPI job is:
MPI_COMM_WORLD
Within `MPI_COMM_WORLD`:
- Ranks range from `0` to `size-1`.
- `size` is the total number of MPI processes in the job.
Two fundamental calls:
`MPI_Comm_size(MPI_COMM_WORLD, &size)`
Gets the number of processes in the communicator.
`MPI_Comm_rank(MPI_COMM_WORLD, &rank)`
Gets the calling process’s rank within that communicator.
Minimal MPI skeleton (C):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);                  // Start MPI

    int size, rank;
    MPI_Comm_size(MPI_COMM_WORLD, &size);    // total # of processes
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);    // my process ID (0..size-1)

    // Example: only rank 0 prints a global message
    if (rank == 0) {
        printf("Running with %d MPI processes\n", size);
    }

    // All ranks print their own identity
    printf("Hello from rank %d of %d\n", rank, size);

    MPI_Finalize();                          // Cleanly shut down MPI
    return 0;
}

Key ideas:
- Rank `0` is often called the root or master (by convention, not by requirement).
- All processes can participate equally; special roles are assigned only by the program logic.
Process Lifetime and Initialization
For an MPI process, the life cycle is:
- The launcher starts the OS process and links it into the MPI job.
- The process calls `MPI_Init` (or `MPI_Init_thread`).
- The process runs MPI and non-MPI code.
- The process calls `MPI_Finalize`.
- The OS process exits.
Important points:
- All MPI processes in a job are typically started at the same time.
- All processes must call `MPI_Init` before using MPI functions and `MPI_Finalize` when done.
- After `MPI_Finalize`, no MPI calls are allowed (except a few special cases dictated by the standard); a sketch of the full life cycle follows below.
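As a minimal sketch of this life cycle (not the only way to write it; plain `MPI_Init` works just as well), the following program uses `MPI_Init_thread` to request the `MPI_THREAD_FUNNELED` support level and checks what the library actually provides:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    // 1. The launcher has already started this OS process.
    // 2. Initialize MPI; MPI_Init_thread additionally negotiates a
    //    thread-support level (MPI_THREAD_FUNNELED is requested here).
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // 3. MPI and non-MPI work happens between Init and Finalize.
    if (rank == 0 && provided < MPI_THREAD_FUNNELED) {
        printf("Warning: requested thread level not available\n");
    }

    // 4. Shut MPI down; no MPI calls are allowed after this point.
    MPI_Finalize();

    // 5. The OS process exits.
    return 0;
}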
MPI Processes and Memory
Each MPI process has its own separate memory:
- Variables in one process are not visible to another process unless explicitly sent via MPI.
- There is no implicit sharing of pointers/arrays between processes.
Example implication:
- If each process does `double a[1000];`, there are `size` independent arrays, one per process.
- Modifying `a[0]` in rank 3 modifies only rank 3’s array, not the others (see the sketch below).
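The following sketch (illustrative only) makes the separation visible: every rank writes its own rank number into its private array, and no rank ever sees another rank’s value:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // Every process has its own private copy of this array.
    double a[1000];
    a[0] = (double)rank;   // each rank writes a different value

    // Each rank sees only the value it wrote itself; the writes of
    // other ranks are invisible without explicit MPI communication.
    printf("Rank %d sees a[0] = %.1f\n", rank, a[0]);

    MPI_Finalize();
    return 0;
}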
This separation:
- Makes MPI suitable for distributed memory systems.
- Forces you to think carefully about data distribution and communication patterns.
Process Mapping and Placement
How MPI processes are assigned to hardware resources affects performance:
- On a single node:
- Several MPI processes can run on different cores of the same CPU.
- Across multiple nodes:
- Processes are distributed across nodes according to the scheduler’s allocation and the MPI runtime’s mapping policy.
Typical mapping options (launcher-dependent):
- `--map-by`, `--rank-by` (Open MPI)
- Scheduler options like `--ntasks`, `--ntasks-per-node`, `--cpus-per-task` (SLURM)
Conceptual mapping example:
- 8 MPI processes, 2 nodes, 4 cores each:
- Node 0: ranks 0,1,2,3
- Node 1: ranks 4,5,6,7
The logical rank does not have to match the physical core index; mapping is configurable and performance-sensitive but is not controlled by standard MPI calls.
Basic Process-Related MPI Calls
Beyond `MPI_Comm_rank` and `MPI_Comm_size`, some common process-related routines include:
`MPI_Get_processor_name(char *name, int *resultlen)`
Returns the name of the node (host) a process is running on.
`MPI_Barrier(MPI_COMM_WORLD)`
A synchronization point: all processes block until every process has called `MPI_Barrier` on the same communicator.
These are useful for:
- Debugging placement (`rank` + processor name), as in the sketch below.
- Ensuring certain code regions are entered or exited together (barrier).
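A short sketch (illustrative only) combining both calls: each rank reports the host it is running on, with a barrier so that no rank reports before every process has reached the same point:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // Buffer for the host name; MPI defines the maximum length.
    char name[MPI_MAX_PROCESSOR_NAME];
    int resultlen;
    MPI_Get_processor_name(name, &resultlen);

    // Wait until every process has reached this point.
    MPI_Barrier(MPI_COMM_WORLD);

    // Report placement: which rank runs on which node.
    printf("Rank %d of %d runs on %s\n", rank, size, name);

    MPI_Finalize();
    return 0;
}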
Multiple Communicators and Process Grouping (Overview Only)
Although `MPI_COMM_WORLD` includes all processes, you can:
- Create sub-communicators that contain subsets of processes.
- Use these for:
- Splitting the job into groups (e.g., by node, by role, by problem domain).
- Implementing multi-level algorithms.
At this stage, know only that:
- A process can belong to multiple communicators.
- It has a different rank in each communicator.
- `MPI_COMM_WORLD` is just the default “everyone” communicator (see the preview sketch below).
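As a brief preview (the details belong to a later discussion of communicators), the sketch below uses `MPI_Comm_split` to divide `MPI_COMM_WORLD` into an “even” group and an “odd” group; note that the same process has a different rank in the new communicator:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);

    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    // Processes with the same "color" end up in the same sub-communicator.
    int color = world_rank % 2;            // 0 = even ranks, 1 = odd ranks
    MPI_Comm subcomm;
    MPI_Comm_split(MPI_COMM_WORLD, color, world_rank, &subcomm);

    // The same process has a different rank in the new communicator.
    int sub_rank;
    MPI_Comm_rank(subcomm, &sub_rank);
    printf("World rank %d is rank %d in the %s group\n",
           world_rank, sub_rank, color == 0 ? "even" : "odd");

    MPI_Comm_free(&subcomm);
    MPI_Finalize();
    return 0;
}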
Common Process-Related Patterns
A few typical patterns for working with MPI processes:
Single Root for I/O
Only one process (often rank 0) handles expensive or shared operations, such as:
- Reading the input file
- Writing global results
- Printing progress messages
Pattern:
if (rank == 0) {
    // perform I/O or coordination
}

Data is then broadcast or scattered to other processes using MPI collectives (see the sketch below).
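A sketch of this pattern (illustrative only; the parameter `nsteps` is a hypothetical stand-in for real input data): rank 0 obtains a configuration value and distributes it to every other rank with `MPI_Bcast`:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int nsteps = 0;                 // hypothetical run parameter
    if (rank == 0) {
        // In a real code this would come from an input file.
        nsteps = 1000;
    }

    // Collective call: every rank participates, and afterwards all
    // ranks hold the value that rank 0 (the root) provided.
    MPI_Bcast(&nsteps, 1, MPI_INT, 0, MPI_COMM_WORLD);

    printf("Rank %d will run %d steps\n", rank, nsteps);

    MPI_Finalize();
    return 0;
}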
Process-Specific Work Decomposition
Processes divide work according to their ranks. For example, for a loop over N elements:
for (int i = rank; i < N; i += size) {
    // each process handles every size-th element
}

or block decomposition:
int chunk = N / size;
int start = rank * chunk;
int end = (rank == size - 1) ? N : start + chunk;
for (int i = start; i < end; i++) {
    // each process handles a contiguous chunk
}

These patterns exploit the rank to distribute work.
Typical Mistakes Involving MPI Processes
Some frequent errors when dealing with MPI processes:
- Forgetting `MPI_Init` or `MPI_Finalize`
- Leads to crashes or undefined behavior.
- Assuming shared memory between processes
- Modifying an array in one rank doesn’t change it in others.
- Using rank identities inconsistently
- E.g., assuming rank numbers correspond to physical topology when they do not.
- Running mismatched process counts
- Program logic assumes a certain number of ranks (e.g., exactly 4) but is launched with a different `-np` value (a defensive check is sketched after this list).
- Rank 0 overload
- Making rank 0 do all heavy work (e.g., all I/O or all computation) while others idle.
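One way to guard against a mismatched process count (a sketch, assuming the program genuinely requires exactly 4 ranks; the constant name `REQUIRED_RANKS` is hypothetical) is to check the communicator size right after initialization and abort cleanly otherwise:

#include <mpi.h>
#include <stdio.h>

#define REQUIRED_RANKS 4   // hypothetical requirement of this program

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (size != REQUIRED_RANKS) {
        if (rank == 0) {
            fprintf(stderr, "This program needs exactly %d ranks, got %d\n",
                    REQUIRED_RANKS, size);
        }
        // Terminate all processes in the job with a non-zero error code.
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    /* ... normal program logic for exactly REQUIRED_RANKS processes ... */

    MPI_Finalize();
    return 0;
}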
Summary
- An MPI process is an ordinary OS process participating in an MPI job.
- Processes are started by the launcher (e.g., `mpirun`, `srun`), not created inside the code.
- Each process is identified by a rank within a communicator, most commonly `MPI_COMM_WORLD`.
- Processes do not share memory; communication is explicit via MPI calls.
- Rank-based logic (SPMD) is the core way to differentiate behavior and divide work.
- Correct and efficient use of MPI processes is the foundation for all higher-level MPI communication and parallel algorithms.