
Collective communication

What Makes Collective Communication Different

In distributed-memory programming with MPI, collective communication refers to operations that involve all processes in a communicator (often MPI_COMM_WORLD) according to a well-defined pattern. Unlike point-to-point communication, where one sender is matched with one receiver, a collective must be called by every process in the communicator, and the library is free to implement it with optimized algorithms such as trees or recursive doubling.

Collective operations are essential for distributing input data, combining partial results, synchronizing phases of a computation, and redistributing data between stages of an algorithm.

This chapter assumes you already understand basic MPI, communicators, and point-to-point communication.

Categories of Collective Operations

Collective operations can be grouped conceptually by what they accomplish: synchronization (barriers), data movement (broadcast, scatter, gather, allgather, all-to-all), and reductions (reduce, allreduce, scan).

MPI provides blocking variants of all of these operations and, since MPI-3, non-blocking variants as well.

Barriers and Synchronization

A barrier ensures that no process passes the barrier until all processes in the communicator have reached it.

Example:

double start, end;
MPI_Barrier(MPI_COMM_WORLD);
/* Now all processes start this timed section "together" */
start = MPI_Wtime();
/* compute and communicate */
end = MPI_Wtime();

One-to-Many and Many-to-One Operations

Broadcast (`MPI_Bcast`)

Broadcast sends the same data from one root process to all other processes.

Key features: every process calls MPI_Bcast with the same root, count, and datatype; the buffer is read on the root and overwritten with the root's data on all other processes.

Typical usage:

int root = 0;
double params[10];
if (rank == root) {
    /* initialize params */
}
MPI_Bcast(params, 10, MPI_DOUBLE, root, MPI_COMM_WORLD);
/* Now every process has the same "params" */

Common use cases: distributing configuration parameters or command-line options read on rank 0, and sharing lookup tables or global constants that every process needs.

Gather (`MPI_Gather`) and Allgather (`MPI_Allgather`)

Gather

Gather collects values from all processes to a single root.

Usage example:

int local = rank;  /* each process has one integer */
int *all = NULL;
if (rank == 0) {
    all = malloc(size * sizeof(int));
}
MPI_Gather(&local, 1, MPI_INT,
           all,    1, MPI_INT,
           0, MPI_COMM_WORLD);
/* At root: all[i] == i from process i */

Common use cases: collecting partial results onto one process for output, writing a single file from the root, and gathering per-process statistics or diagnostics for reporting.

Allgather

Allgather is like gather but distributes the full collected result back to every process.

int local = rank;
int *all = malloc(size * sizeof(int));
MPI_Allgather(&local, 1, MPI_INT,
              all,    1, MPI_INT,
              MPI_COMM_WORLD);
/* Now every process has all[0..size-1] */

Use cases: situations where every process needs the full collection, such as sharing per-process counts, offsets, or boundary values with all ranks.

Scatter (`MPI_Scatter`)

Scatter distributes distinct chunks of an array from root to all processes.

Example:

double *global_data = NULL;
double local_chunk[CHUNK];
if (rank == 0) {
    global_data = malloc(CHUNK * size * sizeof(double));
    /* fill global_data */
}
MPI_Scatter(global_data, CHUNK, MPI_DOUBLE,
            local_chunk, CHUNK, MPI_DOUBLE,
            0, MPI_COMM_WORLD);
/* Each process now has its own piece of global_data */

Use cases: distributing rows, blocks, or chunks of an input array that was read or generated on the root.

Reduction Operations

Basic Reductions: `MPI_Reduce` and `MPI_Allreduce`

Reductions combine values from all processes using an operation such as MPI_SUM, MPI_PROD, MPI_MAX, or MPI_MIN; logical/bitwise operations and user-defined operations are also available.

`MPI_Reduce`

Combines data from all processes and stores the result on a root.

double local_sum = /* partial sum on each process */;
double global_sum;
MPI_Reduce(&local_sum, &global_sum, 1,
           MPI_DOUBLE, MPI_SUM,
           0, MPI_COMM_WORLD);
/* Only root=0 has global_sum */

`MPI_Allreduce`

Same as MPI_Reduce, but the result is sent back to all processes.

double local_sum = /* partial sum */;
double global_sum;
MPI_Allreduce(&local_sum, &global_sum, 1,
              MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
/* All processes now know global_sum */

Typical uses: dot products and norms, global residual or error estimates, convergence checks, and global minima/maxima such as a shared time-step size.

Other Reduction Variants

Beyond MPI_Reduce and MPI_Allreduce, MPI offers MPI_Scan (inclusive prefix reduction), MPI_Exscan (exclusive prefix reduction), and MPI_Reduce_scatter, which combines a reduction with a scatter of the result.

Example (MPI_Scan):

int local = 1;
int prefix_sum;
MPI_Scan(&local, &prefix_sum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
/* On rank r: prefix_sum == r+1, assuming ranks start from 0 */

All-to-All Communication

All-to-all patterns let each process exchange distinct data with every other process.

`MPI_Alltoall`

Each process sends a separate block to every other process and receives one from each.

Example (fixed-size blocks):

int sendbuf[size]; /* sendbuf[i] is intended for rank i */
int recvbuf[size];
for (int i = 0; i < size; ++i) {
    sendbuf[i] = rank * 100 + i;  /* identify source and destination */
}
MPI_Alltoall(sendbuf, 1, MPI_INT,
             recvbuf, 1, MPI_INT,
             MPI_COMM_WORLD);
/* recvbuf[j] came from process j */

Use cases: matrix transposes, data redistribution in parallel FFTs, and the bucket-exchange step of parallel sorting algorithms.

Variable-Sized All-to-All

When each process sends a different amount of data to each destination, MPI_Alltoallv is used. Each process supplies per-destination send counts and displacements into its send buffer, plus per-source receive counts and displacements into its receive buffer.

Conceptually, it is an MPI_Alltoall in which every block may have a different length and position. A sketch follows below.
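
The following sketch assumes, as in the earlier examples, that rank and size come from MPI_Comm_rank/MPI_Comm_size and that stdlib.h is available for malloc. The traffic pattern is made up for illustration: rank r sends dest + 1 integers to each destination dest, so every rank receives rank + 1 integers from each source.

/* Per-destination counts and displacements for MPI_Alltoallv */
int *sendcounts = malloc(size * sizeof(int));
int *sdispls    = malloc(size * sizeof(int));
int *recvcounts = malloc(size * sizeof(int));
int *rdispls    = malloc(size * sizeof(int));

int send_total = 0, recv_total = 0;
for (int i = 0; i < size; ++i) {
    sendcounts[i] = i + 1;      /* amount destined for rank i  */
    recvcounts[i] = rank + 1;   /* amount arriving from rank i */
    sdispls[i] = send_total;    /* offsets into the flat buffers */
    rdispls[i] = recv_total;
    send_total += sendcounts[i];
    recv_total += recvcounts[i];
}

int *sendbuf = malloc(send_total * sizeof(int));
int *recvbuf = malloc(recv_total * sizeof(int));
for (int i = 0; i < send_total; ++i) {
    sendbuf[i] = rank;          /* tag outgoing data with the sender */
}

MPI_Alltoallv(sendbuf, sendcounts, sdispls, MPI_INT,
              recvbuf, recvcounts, rdispls, MPI_INT,
              MPI_COMM_WORLD);
/* recvbuf[rdispls[j] .. rdispls[j] + recvcounts[j] - 1] came from rank j */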

Neighbor and Topology-Aware Collectives

In codes using MPI process topologies (e.g., Cartesian grids), neighbor collectives perform collectives between processes that are neighbors in that topology, not all processes.

Examples include MPI_Neighbor_allgather, MPI_Neighbor_alltoall, and their variable-sized "v" variants.

Use cases: halo (ghost-cell) exchange in stencil computations on structured grids, and exchanges along graph edges in unstructured applications.

Benefits: the communication pattern is expressed directly through the topology, the code stays concise, and the library can exploit the topology information for process placement and message scheduling. A small sketch follows below.
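
As a minimal sketch, assuming a 1-D periodic Cartesian topology over all size processes, each process sends one value to its two neighbors and receives one value from each; the order of the received values is the neighbor order defined by the Cartesian topology (negative direction first, then positive).

int dims[1]    = { size };
int periods[1] = { 1 };          /* wrap around at the ends */
MPI_Comm cart;
MPI_Cart_create(MPI_COMM_WORLD, 1, dims, periods, 0, &cart);

double local_value = /* e.g. boundary value of this rank's subdomain */;
double halo[2];                  /* [0] from the left neighbor, [1] from the right */

MPI_Neighbor_allgather(&local_value, 1, MPI_DOUBLE,
                       halo,         1, MPI_DOUBLE,
                       cart);
/* halo[] now holds the neighbors' boundary values */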

Blocking vs Non-blocking Collectives

Many collective operations have non-blocking forms, e.g. MPI_Ibcast, MPI_Ireduce, MPI_Iallreduce, MPI_Igather, and MPI_Ialltoall.

Characteristics: the call returns immediately with an MPI_Request, the operation completes only after a matching MPI_Wait or MPI_Test, and the buffers involved must not be touched in between.

Typical pattern:

MPI_Request req;
MPI_Ibcast(params, 10, MPI_DOUBLE, 0, MPI_COMM_WORLD, &req);
/* Overlap useful work that does not depend on params */
do_local_work();
/* Make sure the broadcast finished before using params */
MPI_Wait(&req, MPI_STATUS_IGNORE);

Benefits: communication can be overlapped with independent computation, and several collectives can be in flight at once, reducing idle time.

Caveats: the buffers must not be read or modified until the request completes, the amount of overlap actually achieved depends on the MPI implementation and the network hardware, and every process must still post the same collectives in the same order.

Correct Usage and Common Pitfalls

Matching Participation

All processes in the communicator must call the collective, call it in the same order relative to other collectives, and pass matching arguments (root, counts, datatypes, and reduction operation).

Common errors: calling a collective inside a branch that only some ranks execute (e.g. inside if (rank == 0)), mismatched roots or counts, and issuing collectives in different orders on different ranks.

These lead to deadlocks or mysterious hangs.
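
The sketch below shows the most frequent form of this mistake, reusing params from the broadcast example above.

/* BROKEN: only rank 0 calls the collective; the other ranks never
   participate, so they never receive the data and later collectives
   mismatch, typically hanging the program. */
if (rank == 0) {
    MPI_Bcast(params, 10, MPI_DOUBLE, 0, MPI_COMM_WORLD);
}

/* CORRECT: every rank calls the collective unconditionally; the root
   argument alone determines where the data originates. */
MPI_Bcast(params, 10, MPI_DOUBLE, 0, MPI_COMM_WORLD);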

Root and Buffer Rules

For root-based collectives (e.g., MPI_Bcast, MPI_Gather, MPI_Scatter), the root argument must be the same on every process, and arguments that are significant only at the root (such as the receive buffer of MPI_Gather) may be NULL on the other ranks.

Memory layout must be consistent with the call: the root's receive buffer for MPI_Gather must hold recvcount elements per process (size * recvcount in total), the root's send buffer for MPI_Scatter must likewise hold sendcount elements per process, and send and receive buffers must not overlap unless MPI_IN_PLACE is used, as sketched below.
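
As a sketch of the MPI_IN_PLACE convention, which avoids a separate send buffer when the input and output of a reduction live in the same array:

double values[3];  /* assumed to be filled with partial results computed locally */
/* MPI_IN_PLACE as the send buffer tells MPI to take the input from
   (and write the result into) "values" on every process. */
MPI_Allreduce(MPI_IN_PLACE, values, 3,
              MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
/* values[] now holds the global sums on every process */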

Collectives Are Implicitly Synchronized, But…

Collectives provide the necessary synchronization for their own function: when MPI_Bcast returns, the buffer holds the broadcast data; when MPI_Reduce returns on the root, the result there is complete.

However, most collectives are not barriers: a process may return from a collective before other processes have entered or completed it, so a collective must not be relied on to synchronize unrelated work. Use MPI_Barrier when all processes genuinely need to reach a common point, and note that even a barrier says nothing about aligning wall-clock time.

Performance Considerations for Collective Operations

Collectives are often crucial to parallel performance because they involve every process in the communicator, they frequently sit inside the innermost iteration or time-stepping loop, and their cost grows with both the number of processes and the message size.

Key performance aspects: message size (small messages are latency-bound, large ones bandwidth-bound), process count, network topology, and the algorithm the MPI library selects internally (e.g. binomial trees, recursive doubling, ring algorithms).

Practical advice: prefer collectives over hand-written loops of point-to-point messages, aggregate many small reductions into a single call on an array (see the sketch below), avoid unnecessary MPI_Barrier calls, and consider non-blocking collectives when independent work is available to overlap.
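
As a sketch of the aggregation advice (the quantity names local_mass, local_energy, and local_momentum are made up for illustration): instead of three separate MPI_Allreduce calls, pack the three scalars into one array and reduce once.

/* Combining several scalar reductions into a single call reduces the
   number of latency-dominated messages. The local_* values are assumed
   to have been computed earlier. */
double local_vals[3]  = { local_mass, local_energy, local_momentum };
double global_vals[3];
MPI_Allreduce(local_vals, global_vals, 3,
              MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
/* One message instead of three: global_vals[0..2] hold the global sums */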

Designing Algorithms Around Collectives

When designing distributed-memory algorithms, think in terms of communication phases built from collectives, choose a data decomposition that maps naturally onto scatter/gather/all-to-all patterns, and keep the number of global synchronization points (especially allreduce-style operations) as low as the algorithm allows.

Using collective communication effectively is central to scalable distributed-memory programming: it enables clear, concise code while leveraging highly optimized implementations inside MPI.
