Why Process Topologies Matter
In distributed-memory programming with MPI, processes are usually identified by a single number: their rank in MPI_COMM_WORLD. For many algorithms, however, there is a natural geometric or logical structure to how processes should interact (lines, rings, 2D/3D grids, etc.).
Process topologies let you express this structure directly in MPI so that:
- Your code more clearly matches the algorithm (e.g., 2D domain decomposition).
- You simplify neighborhood communication (e.g., stencils, nearest-neighbor).
- The MPI library can try to map your logical layout to the physical network more efficiently.
Process topologies are logical: they provide a structured view of ranks through new communicators. They do not physically move processes, but they can help achieve better performance and cleaner code.
Types of Process Topologies in MPI
MPI defines two main kinds of process topologies:
- Cartesian (grid) topologies: processes arranged in 1D, 2D, or 3D grids.
- Graph topologies: processes arranged according to an arbitrary graph.
Many HPC codes use Cartesian topologies because they match structured meshes, grids, and stencil computations. Graph topologies are more general and fit irregular communication patterns.
All topologies in MPI are attached to communicators. You start from an existing communicator (often MPI_COMM_WORLD), define a topology, and get back a new communicator with that topology attached.
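As a small illustration (a sketch assuming MPI has already been initialized), MPI_Topo_test reports which kind of topology, if any, is attached to a communicator:

int topo_type;
// MPI_COMM_WORLD has no topology attached, so this returns MPI_UNDEFINED;
// on a communicator from MPI_Cart_create it would return MPI_CART.
MPI_Topo_test(MPI_COMM_WORLD, &topo_type);
if (topo_type == MPI_UNDEFINED) {
    // plain communicator: no topology information to query
}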
Cartesian Topologies
Creating a Cartesian Communicator
You describe a grid of processes by specifying:
- Number of dimensions: ndims.
- Size in each dimension: dims[0..ndims-1].
- Whether each dimension is periodic: periods[0..ndims-1].
MPI helps you distribute processes into this grid with MPI_Dims_create and then builds the new communicator with MPI_Cart_create.
int ndims = 2;
int dims[2] = {0, 0}; // 0 means "let MPI choose"
int periods[2] = {0, 0}; // non-periodic in both directions
int reorder = 1; // allow rank reordering for performance
MPI_Comm cart_comm;
int size;
MPI_Comm_size(MPI_COMM_WORLD, &size);
// Let MPI choose a good 2D grid layout for "size" processes
MPI_Dims_create(size, ndims, dims);
// Create Cartesian communicator
MPI_Cart_create(MPI_COMM_WORLD, ndims, dims, periods,
                reorder, &cart_comm);

Key points:
- MPI_Dims_create tries to find a "nice" factorization of the total number of processes into grid dimensions, often preferring nearly square grids for 2D.
- reorder = 1 tells MPI it may change rank numbers to improve the mapping to hardware; your code should use Cartesian coordinates, not raw ranks, once you commit to this topology.
- The resulting cart_comm has an associated Cartesian topology; MPI_COMM_WORLD remains untouched.
Coordinates and Ranks
In a Cartesian communicator, each process has both:
- A rank in the communicator.
- A coordinate vector in the Cartesian grid.
You can convert between them:
- MPI_Cart_coords: rank → coordinates.
- MPI_Cart_rank: coordinates → rank.
int my_rank, my_coords[2];
MPI_Comm_rank(cart_comm, &my_rank);
// Get my coordinates in the 2D grid
MPI_Cart_coords(cart_comm, my_rank, 2, my_coords);
// Example: find rank at coordinates (x, y)
int x = 0, y = 0;
int origin_rank;
int target_coords[2] = {x, y};
MPI_Cart_rank(cart_comm, target_coords, &origin_rank);

This mapping is extremely useful in domain-decomposed codes: coordinates often directly correspond to subdomain indices.
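For instance, a minimal sketch of turning coordinates into subdomain bounds, assuming a hypothetical NX x NY global grid that divides evenly across the process grid (NX and NY are illustrative, not part of MPI):

const int NX = 1024, NY = 1024;        // hypothetical global grid size
int nx_local = NX / dims[0];           // cells per process along x
int ny_local = NY / dims[1];           // cells per process along y
// This rank owns [x_start, x_start + nx_local) x [y_start, y_start + ny_local)
int x_start = my_coords[0] * nx_local;
int y_start = my_coords[1] * ny_local;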
Finding Neighbors: Stencil and Domain Decomposition
Nearest-neighbor communication appears frequently in structured-grid solvers (stencils, finite differences, etc.). Cartesian topologies make it easy to find neighbors with MPI_Cart_shift.
For a 2D grid, suppose dimension 0 is the "x" direction and dimension 1 is "y". You can find neighbors in each direction:
int source, dest;
// Shift along x-direction (dim=0)
MPI_Cart_shift(cart_comm, 0, 1, &source, &dest);
// "source" is neighbor at -1 in x
// "dest" is neighbor at +1 in x
// Shift along y-direction (dim=1)
MPI_Cart_shift(cart_comm, 1, 1, &source, &dest);
// "source" is neighbor at -1 in y
// "dest" is neighbor at +1 in y
If the grid is non-periodic, boundary processes will have MPI_PROC_NULL as a neighbor when there is no process in that direction. This is convenient: many MPI routines ignore MPI_PROC_NULL without extra checks.
Typical usage: exchanging "halo" or "ghost" cells with immediate neighbors, using MPI_Sendrecv or paired non-blocking sends/receives with the neighbor ranks returned by MPI_Cart_shift, as sketched below.
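A minimal halo-exchange sketch along the x-direction, assuming each rank packs its boundary columns into contiguous buffers of NY_LOCAL values (the buffer names and size are illustrative):

enum { NY_LOCAL = 16 };                               // illustrative local column height
double send_left[NY_LOCAL], send_right[NY_LOCAL];     // boundary columns to send
double recv_left[NY_LOCAL], recv_right[NY_LOCAL];     // ghost columns to receive
// ... fill send_left/send_right from the local array ...
int left, right;
// Neighbors at -1 and +1 along x (dimension 0)
MPI_Cart_shift(cart_comm, 0, 1, &left, &right);
// Send my right boundary to the right neighbor, receive my left ghost from the left neighbor
MPI_Sendrecv(send_right, NY_LOCAL, MPI_DOUBLE, right, 0,
             recv_left,  NY_LOCAL, MPI_DOUBLE, left,  0,
             cart_comm, MPI_STATUS_IGNORE);
// Send my left boundary to the left neighbor, receive my right ghost from the right neighbor
MPI_Sendrecv(send_left,  NY_LOCAL, MPI_DOUBLE, left,  1,
             recv_right, NY_LOCAL, MPI_DOUBLE, right, 1,
             cart_comm, MPI_STATUS_IGNORE);
// On a non-periodic grid, boundary ranks see MPI_PROC_NULL for a missing
// neighbor and the corresponding transfers become no-ops.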
Periodic Boundaries
For some problems (e.g., simulation on a torus or with periodic boundary conditions), edges of the domain wrap around. This is encoded via the periods array.
Example: 2D periodic grid in both directions:
int periods[2] = {1, 1}; // periodic in x and y
MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods,
1, &cart_comm);
In this case, MPI_Cart_shift will always return valid ranks; the "right" neighbor of the rightmost process is the leftmost process, and similarly for the other boundaries.
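On periodic dimensions, MPI_Cart_rank also accepts out-of-range coordinates and wraps them back into range, which can be handy (a small sketch using the periodic cart_comm from above):

int wrapped_rank;
int coords_outside[2] = {dims[0], 0};   // one step past the last x index
// With periods[0] = 1 the coordinate wraps around, so this is the rank at (0, 0)
MPI_Cart_rank(cart_comm, coords_outside, &wrapped_rank);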
Sub-Communicators for Rows and Columns
Often you need to perform collective operations along rows or columns of your process grid (e.g., for parallel matrix algorithms). MPI_Cart_sub creates sub-communicators by collapsing some dimensions.
Example: from a 2D grid, create row-wise and column-wise communicators:
int remain_dims_row[2] = {0, 1}; // keep y-dimension, collapse x
int remain_dims_col[2] = {1, 0}; // keep x-dimension, collapse y
MPI_Comm row_comm, col_comm;
MPI_Cart_sub(cart_comm, remain_dims_row, &row_comm);
MPI_Cart_sub(cart_comm, remain_dims_col, &col_comm);

Each process now belongs to:
- one row communicator (all processes with the same x coordinate).
- one column communicator (all processes with the same y coordinate).
You can then perform MPI_Bcast, MPI_Allreduce, etc. along rows or columns independently, which is standard in many parallel linear algebra and FFT algorithms.
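A brief sketch of such per-row / per-column collectives (buffer contents and counts are illustrative):

double row_block[128];                  // illustrative data, filled by rank 0 of each row
// Rank 0 of each row communicator broadcasts its block to that row only
MPI_Bcast(row_block, 128, MPI_DOUBLE, 0, row_comm);

double col_local = 1.0, col_sum;        // illustrative per-process contribution
// Independent sum within each column communicator
MPI_Allreduce(&col_local, &col_sum, 1, MPI_DOUBLE, MPI_SUM, col_comm);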
Graph and Distributed Graph Topologies
Cartesian topologies assume a grid-like structure. For more irregular communication patterns, MPI provides graph and distributed graph topologies.
Classic Graph Topology
In a graph topology, processes are vertices, and edges indicate potential communication partners. You describe the whole graph, then let MPI create a communicator that encapsulates it.
Basic ideas:
- You specify for each rank what its neighbors are.
- MPI can potentially reorder ranks to match hardware better (similar to Cartesian).
- You can query neighbors via MPI calls rather than manually storing neighbor lists.
However, the classic graph interface scales poorly to very large systems because every process must supply the entire, global graph when the communicator is created.
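For reference, a sketch of the classic interface for a 4-process ring; every rank passes the full graph description, which is precisely the scalability problem (assumes at least 4 processes; extra ranks receive MPI_COMM_NULL):

// 4-process ring: vertex i is connected to (i-1) mod 4 and (i+1) mod 4.
// graph_index[i] is the cumulative edge count up to and including vertex i.
int nnodes = 4;
int graph_index[4] = {2, 4, 6, 8};
int edges[8] = {1, 3,  2, 0,  3, 1,  0, 2};
MPI_Comm ring_comm;
MPI_Graph_create(MPI_COMM_WORLD, nnodes, graph_index, edges, 1, &ring_comm);
// Neighbors can then be queried with MPI_Graph_neighbors_count / MPI_Graph_neighbors.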
Distributed Graph Topology
To handle scalability, MPI introduced distributed graph topologies (MPI_Dist_graph_create and related routines). Each process specifies its own neighbors locally, and MPI constructs a topology communicator.
Key characteristics:
- Each rank provides its outgoing edges (neighbors it communicates with).
- The full graph does not need to be gathered on one process.
- Good for applications where each process knows its local neighbors (e.g., domain decompositions on unstructured meshes).
Example schematic (using the C API, omitting full error checking):
// Suppose each process knows its out-degree and destination ranks
int outdegree = ...;            // number of neighbors this rank sends to
int *destinations = ...;        // array of neighbor ranks
MPI_Comm graph_comm;
// The same lists are passed for incoming and outgoing edges, i.e. the
// communication graph is assumed to be symmetric here.
MPI_Dist_graph_create_adjacent(
    MPI_COMM_WORLD,
    outdegree, destinations, MPI_UNWEIGHTED,   // incoming edges (sources)
    outdegree, destinations, MPI_UNWEIGHTED,   // outgoing edges (destinations)
    MPI_INFO_NULL, 1, &graph_comm);

Once created, you can query the neighbors using:
- MPI_Dist_graph_neighbors_count
- MPI_Dist_graph_neighbors
You can then implement generic "send to all neighbors" / "receive from all neighbors" patterns without hard-coding rank IDs.
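A sketch of that pattern on the graph_comm created above, using the neighbor-query calls and an MPI-3 neighborhood collective (payloads and counts are illustrative; malloc requires <stdlib.h>):

int indeg, outdeg, weighted;
MPI_Dist_graph_neighbors_count(graph_comm, &indeg, &outdeg, &weighted);

int *sources = malloc(indeg * sizeof(int));    // ranks I receive from
int *dests   = malloc(outdeg * sizeof(int));   // ranks I send to
MPI_Dist_graph_neighbors(graph_comm, indeg, sources, MPI_UNWEIGHTED,
                         outdeg, dests, MPI_UNWEIGHTED);

// Exchange one value with every neighbor in a single call: each rank sends
// its value to all outgoing neighbors and receives one slot per incoming neighbor.
double sendval = 42.0;                          // illustrative payload
double *recvvals = malloc(indeg * sizeof(double));
MPI_Neighbor_allgather(&sendval, 1, MPI_DOUBLE,
                       recvvals, 1, MPI_DOUBLE, graph_comm);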
Graph topologies are particularly useful for:
- Irregular or unstructured meshes.
- Applications with evolving communication patterns that are not well modeled by a simple grid.
Performance and Mapping Considerations
Process topologies are not just syntactic sugar; they can influence performance when used correctly.
Rank Reordering
Both Cartesian and graph topologies allow MPI to reorder ranks when creating the new communicator:
- If reorder = 1, MPI can choose a mapping from logical positions (coordinates or graph nodes) to physical ranks to reduce communication cost (e.g., place neighbors close together in the network).
- Use communicator-local ranks (MPI_Comm_rank on the topology communicator) and coordinates, not global ranks, once you commit to a topology.
You should not assume any particular relationship between MPI_COMM_WORLD ranks and those in your topology communicator if reordering is allowed.
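A quick sanity check (purely informational; correct code should not rely on the result):

int world_rank, cart_rank;
MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
MPI_Comm_rank(cart_comm, &cart_rank);
// With reorder = 1 these two ranks may differ; address processes through
// cart_rank and Cartesian coordinates, never through world_rank.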
Matching Topology to Algorithm and Hardware
Some guidelines:
- Use Cartesian topologies when:
- Your domain is logically structured (1D/2D/3D grids).
- You mostly have nearest-neighbor communication in regular patterns.
- Use distributed graph topologies when:
- Your domain is irregular or unstructured.
- Neighborhood sizes vary and global description is expensive.
Letting MPI know your topology gives the library more information to:
- Optimize message routing and ordering.
- Perform better mapping onto NUMA domains, network switches, etc. (depending on implementation and job launcher).
Performance benefits vary by system and MPI implementation, but on large machines and communication-heavy codes, they can be significant.
When to Use Process Topologies
They are most beneficial when:
- You have repeated, structured communication patterns (e.g., stencils in time-stepping codes).
- Your code naturally thinks in terms of "neighbors", "rows/columns", or "subdomains".
- You want clearer, more maintainable code where communication patterns are explicit.
You may not need them when:
- Your application uses mostly global collectives on a single communicator.
- Communication patterns are simple and not repeated heavily.
- You primarily use higher-level libraries that already manage their own communicators and topologies.
In practice, many mature HPC libraries (e.g., parallel linear algebra packages) internally use process topologies to organize their computations; understanding them helps you reason about how these libraries scale and behave.