Understanding MPI Processes
In MPI, processes are the fundamental units of execution. Everything in MPI revolves around how these processes are created, identified, and made to cooperate.
This chapter focuses on:
- What an MPI process is (in contrast to threads)
- How MPI processes are started and identified
- Ranks and `MPI_COMM_WORLD`
- Basic process-related MPI calls
- How MPI processes interact with the underlying cluster
- Common patterns and pitfalls involving MPI processes
What Is an MPI Process?
An MPI process is:
- A normal operating system process
- With its own:
- Address space (private memory)
- Stack and heap
- File descriptors
- Running the same executable on each participating node (SPMD model)
Crucially:
- MPI processes do not share memory by default.
- Any data exchanged between processes must use MPI communication routines (send/receive, collectives, etc.).
This is fundamentally different from shared-memory threading models (like OpenMP) where multiple threads share a single address space.
The SPMD Model
Most MPI programs follow the Single Program, Multiple Data (SPMD) pattern:
- You compile one MPI program (one executable).
- You run it as multiple processes via an MPI launcher (e.g., `mpirun`, `srun`, `mpiexec`).
- Each process:
- Executes the same code
- May follow different code paths depending on its rank
- Typically operates on a different subset of the data
Conceptually:
- All MPI processes start in `main()`.
- Behavior is often distinguished with `if (rank == 0) { ... } else { ... }` and similar constructs, as in the sketch below.
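To make this concrete, here is a small, self-contained sketch (illustrative only; the 100-element block per rank is an arbitrary choice for this example). Every process executes the same `main()`, rank 0 takes an extra code path, and each rank operates on its own block of indices. `MPI_Comm_rank` and `MPI_Comm_size` are introduced in the following sections:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   // this process's rank
    MPI_Comm_size(MPI_COMM_WORLD, &size);   // total number of processes

    if (rank == 0) {
        // Different code path: only rank 0 prints the job-wide header.
        printf("SPMD job with %d processes\n", size);
    }

    // Different data: each rank sums its own block of 100 indices.
    long local_sum = 0;
    for (int i = rank * 100; i < (rank + 1) * 100; i++) {
        local_sum += i;
    }
    printf("Rank %d: local sum over its block = %ld\n", rank, local_sum);

    MPI_Finalize();
    return 0;
}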
Starting MPI Processes
You do not call `fork()` or create MPI processes yourself. Instead:
- You run your MPI program through an MPI launcher such as `mpirun`, `mpiexec`, or the job scheduler’s MPI integration (e.g., `srun` with SLURM).
Typical usage:
mpirun -np 4 ./my_mpi_program

or with a scheduler:

srun -n 4 ./my_mpi_program

Here:
- `-np 4` or `-n 4` requests 4 MPI processes.
- The MPI runtime plus the scheduler decide where (on which nodes and cores) to place those processes.
- Each of the 4 processes will execute the same `./my_mpi_program` binary, starting at `main()`.
Process Ranks and `MPI_COMM_WORLD`
Every MPI process is given an integer identifier called a rank within a communicator.
The default communicator that includes all processes in the MPI job is:
MPI_COMM_WORLD
Within `MPI_COMM_WORLD`:
- Ranks range from `0` to `size-1`.
- `size` is the total number of MPI processes in the job.
Two fundamental calls:
`MPI_Comm_size(MPI_COMM_WORLD, &size)`
Gets the number of processes in the communicator.
`MPI_Comm_rank(MPI_COMM_WORLD, &rank)`
Gets the calling process’s rank within that communicator.
Minimal MPI skeleton (C):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);                  // Start MPI

    int size, rank;
    MPI_Comm_size(MPI_COMM_WORLD, &size);    // total # of processes
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);    // my process ID (0..size-1)

    // Example: only rank 0 prints a global message
    if (rank == 0) {
        printf("Running with %d MPI processes\n", size);
    }

    // All ranks print their own identity
    printf("Hello from rank %d of %d\n", rank, size);

    MPI_Finalize();                          // Cleanly shut down MPI
    return 0;
}

Key ideas:
- Rank `0` is often called the root or master (by convention, not by requirement).
- All processes can participate equally; special roles are assigned only by the program logic.
Process Lifetime and Initialization
For an MPI process, the life cycle is:
- The launcher starts the OS process and links it into the MPI job.
- The process calls `MPI_Init` (or `MPI_Init_thread`).
- The process runs MPI and non-MPI code.
- The process calls `MPI_Finalize`.
- The OS process exits.
Important points:
- All MPI processes in a job are typically started at the same time.
- All processes must call `MPI_Init` before using MPI functions and `MPI_Finalize` when done.
- After `MPI_Finalize`, no MPI calls are allowed (except a few special cases dictated by the standard); a sketch of the full life cycle follows below.
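As a minimal sketch of this life cycle (not the only way to write it; plain `MPI_Init` works just as well), the following program uses `MPI_Init_thread` to request the `MPI_THREAD_FUNNELED` support level and checks what the library actually provides:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    // 1. The launcher has already started this OS process.
    // 2. Initialize MPI; MPI_Init_thread additionally negotiates a
    //    thread-support level (MPI_THREAD_FUNNELED is requested here).
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // 3. MPI and non-MPI work happens between Init and Finalize.
    if (rank == 0 && provided < MPI_THREAD_FUNNELED) {
        printf("Warning: requested thread level not available\n");
    }

    // 4. Shut MPI down; no MPI calls are allowed after this point.
    MPI_Finalize();

    // 5. The OS process exits.
    return 0;
}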
MPI Processes and Memory
Each MPI process has its own separate memory:
- Variables in one process are not visible to another process unless explicitly sent via MPI.
- There is no implicit sharing of pointers/arrays between processes.
Example implication:
- If each process does `double a[1000];`, there are `size` independent arrays, one per process.
- Modifying `a[0]` in rank 3 modifies only rank 3’s array, not the others (see the sketch below).
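The following sketch (illustrative only) makes the separation visible: every rank writes its own rank number into its private array, and no rank ever sees another rank’s value:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // Every process has its own private copy of this array.
    double a[1000];
    a[0] = (double)rank;   // each rank writes a different value

    // Each rank sees only the value it wrote itself; the writes of
    // other ranks are invisible without explicit MPI communication.
    printf("Rank %d sees a[0] = %.1f\n", rank, a[0]);

    MPI_Finalize();
    return 0;
}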
This separation:
- Makes MPI suitable for distributed memory systems.
- Forces you to think carefully about data distribution and communication patterns.
Process Mapping and Placement
How MPI processes are assigned to hardware resources affects performance:
- On a single node:
- Several MPI processes can run on different cores of the same CPU.
- Across multiple nodes:
- Processes are distributed across nodes according to the scheduler’s allocation and the MPI runtime’s mapping policy.
Typical mapping options (launcher-dependent):
- `--map-by`, `--rank-by` (Open MPI)
- Scheduler options like `--ntasks`, `--ntasks-per-node`, `--cpus-per-task` (SLURM)
Conceptual mapping example:
- 8 MPI processes, 2 nodes, 4 cores each:
- Node 0: ranks 0,1,2,3
- Node 1: ranks 4,5,6,7
The logical rank does not have to match the physical core index; mapping is configurable and performance-sensitive but is not controlled by standard MPI calls.
Basic Process-Related MPI Calls
Beyond `MPI_Comm_rank` and `MPI_Comm_size`, some common process-related routines include:
`MPI_Get_processor_name(char *name, int *resultlen)`
Returns the name of the node (host) a process is running on.
`MPI_Barrier(MPI_COMM_WORLD)`
A synchronization point: all processes block until every process has called `MPI_Barrier` on the same communicator.
These are useful for:
- Debugging placement (`rank` + processor name), as in the sketch below.
- Ensuring certain code regions are entered or exited together (barrier).
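A short sketch (illustrative only) combining both calls: each rank reports the host it is running on, with a barrier so that no rank reports before every process has reached the same point:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // Buffer for the host name; MPI defines the maximum length.
    char name[MPI_MAX_PROCESSOR_NAME];
    int resultlen;
    MPI_Get_processor_name(name, &resultlen);

    // Wait until every process has reached this point.
    MPI_Barrier(MPI_COMM_WORLD);

    // Report placement: which rank runs on which node.
    printf("Rank %d of %d runs on %s\n", rank, size, name);

    MPI_Finalize();
    return 0;
}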
Multiple Communicators and Process Grouping (Overview Only)
Although `MPI_COMM_WORLD` includes all processes, you can:
- Create sub-communicators that contain subsets of processes.
- Use these for:
- Splitting the job into groups (e.g., by node, by role, by problem domain).
- Implementing multi-level algorithms.
At this stage, know only that:
- A process can belong to multiple communicators.
- It has a different rank in each communicator.
- `MPI_COMM_WORLD` is just the default “everyone” communicator (see the preview sketch below).
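As a brief preview (the details belong to a later discussion of communicators), the sketch below uses `MPI_Comm_split` to divide `MPI_COMM_WORLD` into an “even” group and an “odd” group; note that the same process has a different rank in the new communicator:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);

    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    // Processes with the same "color" end up in the same sub-communicator.
    int color = world_rank % 2;            // 0 = even ranks, 1 = odd ranks
    MPI_Comm subcomm;
    MPI_Comm_split(MPI_COMM_WORLD, color, world_rank, &subcomm);

    // The same process has a different rank in the new communicator.
    int sub_rank;
    MPI_Comm_rank(subcomm, &sub_rank);
    printf("World rank %d is rank %d in the %s group\n",
           world_rank, sub_rank, color == 0 ? "even" : "odd");

    MPI_Comm_free(&subcomm);
    MPI_Finalize();
    return 0;
}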
Common Process-Related Patterns
A few typical patterns for working with MPI processes:
Single Root for I/O
Only one process (often rank 0) handles expensive or shared operations, such as:
- Reading the input file
- Writing global results
- Printing progress messages
Pattern:
if (rank == 0) {
    // perform I/O or coordination
}

Data is then broadcast or scattered to other processes using MPI collectives (see the sketch below).
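A sketch of this pattern (illustrative only; the parameter `nsteps` is a hypothetical stand-in for real input data): rank 0 obtains a configuration value and distributes it to every other rank with `MPI_Bcast`:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int nsteps = 0;                 // hypothetical run parameter
    if (rank == 0) {
        // In a real code this would come from an input file.
        nsteps = 1000;
    }

    // Collective call: every rank participates, and afterwards all
    // ranks hold the value that rank 0 (the root) provided.
    MPI_Bcast(&nsteps, 1, MPI_INT, 0, MPI_COMM_WORLD);

    printf("Rank %d will run %d steps\n", rank, nsteps);

    MPI_Finalize();
    return 0;
}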
Process-Specific Work Decomposition
Processes divide work according to their ranks. For example, for a loop over N elements:
for (int i = rank; i < N; i += size) {
    // each process handles every size-th element
}

or block decomposition:
int chunk = N / size;
int start = rank * chunk;
int end = (rank == size - 1) ? N : start + chunk;
for (int i = start; i < end; i++) {
    // each process handles a contiguous chunk
}

These patterns exploit the rank to distribute work.
Typical Mistakes Involving MPI Processes
Some frequent errors when dealing with MPI processes:
- Forgetting `MPI_Init` or `MPI_Finalize`
- Leads to crashes or undefined behavior.
- Assuming shared memory between processes
- Modifying an array in one rank doesn’t change it in others.
- Using rank identities inconsistently
- E.g., assuming rank numbers correspond to physical topology when they do not.
- Running mismatched process counts
- Program logic assumes a certain number of ranks (e.g., exactly 4) but is launched with a different `-np` value (a defensive check is sketched after this list).
- Rank 0 overload
- Making rank 0 do all heavy work (e.g., all I/O or all computation) while others idle.
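One way to guard against a mismatched process count (a sketch, assuming the program genuinely requires exactly 4 ranks; the constant name `REQUIRED_RANKS` is hypothetical) is to check the communicator size right after initialization and abort cleanly otherwise:

#include <mpi.h>
#include <stdio.h>

#define REQUIRED_RANKS 4   // hypothetical requirement of this program

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (size != REQUIRED_RANKS) {
        if (rank == 0) {
            fprintf(stderr, "This program needs exactly %d ranks, got %d\n",
                    REQUIRED_RANKS, size);
        }
        // Terminate all processes in the job with a non-zero error code.
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    /* ... normal program logic for exactly REQUIRED_RANKS processes ... */

    MPI_Finalize();
    return 0;
}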
Summary
- An MPI process is an ordinary OS process participating in an MPI job.
- Processes are started by the launcher (e.g., `mpirun`, `srun`), not created inside the code.
- Each process is identified by a rank within a communicator, most commonly `MPI_COMM_WORLD`.
- Processes do not share memory; communication is explicit via MPI calls.
- Rank-based logic (SPMD) is the core way to differentiate behavior and divide work.
- Correct and efficient use of MPI processes is the foundation for all higher-level MPI communication and parallel algorithms.