MPI and parallel workloads in containers

Why MPI in Containers on OpenShift Is Different

Running MPI and other tightly coupled parallel workloads in containers on OpenShift introduces a few unique considerations compared to traditional bare‑metal HPC.

This chapter focuses on these container‑ and OpenShift‑specific aspects, not on MPI fundamentals themselves.

Containerizing MPI Applications

MPI runtimes and base images

Typical MPI stacks to containerize:

Common approaches:

Key containerization rules:

Image layout and multi‑stage builds

Because MPI applications can be large and require compilers, use multi‑stage builds:

Example Dockerfile pattern (simplified):

# Build stage: full UBI with compilers and MPI development headers
FROM registry.access.redhat.com/ubi9/ubi AS build
RUN yum install -y openmpi-devel make gcc && yum clean all
ENV PATH=/usr/lib64/openmpi/bin:$PATH \
    LD_LIBRARY_PATH=/usr/lib64/openmpi/lib
WORKDIR /src
COPY . .
RUN mpicc -O3 -o my_mpi_app main.c

# Runtime stage: minimal image with only the MPI runtime and the binary
FROM registry.access.redhat.com/ubi9/ubi-minimal
RUN microdnf install -y openmpi && microdnf clean all
ENV PATH=/usr/lib64/openmpi/bin:$PATH \
    LD_LIBRARY_PATH=/usr/lib64/openmpi/lib
COPY --from=build /src/my_mpi_app /usr/local/bin/my_mpi_app
ENTRYPOINT ["/usr/local/bin/my_mpi_app"]

Points specific to MPI:

Process model inside containers

Each MPI rank is just a Linux process inside a pod’s container or containers. Common patterns:

Choosing between these depends on:
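
As a quick illustration of the single‑pod pattern, the hedged sketch below runs four ranks inside one container, so MPI can use shared‑memory transports and no pod‑to‑pod networking is involved; the image name is a placeholder for the image built above:

# Hypothetical smoke test: all ranks in a single pod/container.
oc run mpi-smoke \
  --image=registry.example.com/my-mpi-app:latest \
  --restart=Never \
  --command -- mpirun -np 4 /usr/local/bin/my_mpi_app
oc logs -f pod/mpi-smoke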

Networking Considerations for MPI on OpenShift

MPI performance and correctness are heavily influenced by how pods are networked.

Pod‑to‑pod connectivity

MPI requires:

In OpenShift, this means:

Common patterns:
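
One widely used building block, sketched below with placeholder names, is a headless Service in front of a StatefulSet of workers, giving each worker a stable DNS name that can go straight into a hostfile:

cat <<'EOF' | oc apply -f -
apiVersion: v1
kind: Service
metadata:
  name: mpi-workers            # placeholder; must match the StatefulSet's serviceName
spec:
  clusterIP: None              # headless: DNS resolves directly to pod IPs
  selector:
    app: mpi-worker
EOF

With this in place, StatefulSet pods resolve as mpi-worker-0.mpi-workers.<namespace>.svc.cluster.local and so on, stable across pod restarts.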

High‑performance fabrics (InfiniBand, RDMA, SR‑IOV)

For HPC‑grade performance, MPI often uses:

On OpenShift, this usually involves:

Container‑specific constraints:

If RDMA is not available or not exposed inside pods, MPI falls back to TCP over the cluster network. That is usually adequate for modest‑scale or throughput‑oriented HPC, but not for latency‑sensitive, tightly coupled codes.
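
Where SR‑IOV has been configured by a cluster administrator, a worker pod typically attaches to the fabric through a Multus network annotation plus a device resource request. In the hedged sketch below, the network name and the resource name are placeholders that must match the cluster's SriovNetwork configuration:

cat <<'EOF' | oc apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: mpi-rdma-worker
  annotations:
    k8s.v1.cni.cncf.io/networks: sriov-ib-network   # placeholder NetworkAttachmentDefinition
spec:
  containers:
  - name: worker
    image: registry.example.com/my-mpi-app:latest
    command: ["sleep", "infinity"]
    resources:
      requests:
        openshift.io/sriov_ib: "1"    # placeholder device resource name
      limits:
        openshift.io/sriov_ib: "1"
EOF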

Job Orchestration Patterns for MPI on OpenShift

The main challenge is mapping MPI’s notion of “ranks” and “hosts” to OpenShift’s pods and controllers.

Using Jobs and Pods directly

Minimal pattern:

Typical sequence:

  1. Deploy N worker pods via a Job or a separate Deployment.
  2. When workers are ready, the launcher pod:
    • Queries the API (or reads a ConfigMap/Downward API) to obtain all worker pod IPs.
    • Builds a hostfile dynamically.
    • Invokes mpirun -np N --hostfile hosts.txt my_mpi_app.
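
Concretely, the launcher's glue logic might look like the hedged sketch below; the label selector, slot count, and rank count are assumptions, and the launcher's service account needs RBAC permission to list pods:

# Collect the IPs of running workers, build a hostfile, start the job.
oc get pods -l app=mpi-worker --field-selector=status.phase=Running \
  -o jsonpath='{range .items[*]}{.status.podIP}{"\n"}{end}' \
  | sed 's/$/ slots=1/' > hosts.txt
mpirun -np 4 --hostfile hosts.txt /usr/local/bin/my_mpi_app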

Considerations:

MPI Operators and MPIJob CRDs

An MPI Operator introduces a Custom Resource Definition (CRD), often named MPIJob, which automates orchestration:
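
The shape of such a resource, here following the Kubeflow MPI Operator's v2beta1 API (field names vary between operators and versions, and the image is a placeholder), is roughly:

cat <<'EOF' | oc apply -f -
apiVersion: kubeflow.org/v2beta1
kind: MPIJob
metadata:
  name: my-mpi-job
spec:
  slotsPerWorker: 1
  runPolicy:
    cleanPodPolicy: Running
  mpiReplicaSpecs:
    Launcher:
      replicas: 1
      template:
        spec:
          containers:
          - name: launcher
            image: registry.example.com/my-mpi-app:latest
            command: ["mpirun", "-np", "4", "/usr/local/bin/my_mpi_app"]
    Worker:
      replicas: 4
      template:
        spec:
          containers:
          - name: worker
            image: registry.example.com/my-mpi-app:latest
EOF

The operator builds the hostfile and wires launcher to workers itself, so none of the manual discovery from the previous section is needed.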

Benefits specific to OpenShift:

Limitations:

Handling failures and retries

In an HPC job scheduler, node or rank failure may abort the whole job or trigger specific recovery logic. In Kubernetes/OpenShift:

Patterns to align MPI with OpenShift behavior:
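
One common alignment, sketched below with placeholder names, is to make any pod failure fatal to the whole job instead of letting Kubernetes retry individual ranks behind MPI's back:

cat <<'EOF' | oc apply -f -
apiVersion: batch/v1
kind: Job
metadata:
  name: mpi-launcher
spec:
  backoffLimit: 0              # a failed launcher pod fails the Job outright
  template:
    spec:
      restartPolicy: Never     # never restart a crashed rank in place
      containers:
      - name: launcher
        image: registry.example.com/my-mpi-app:latest
        # hostfile assumed to be mounted, e.g. from a ConfigMap
        command: ["mpirun", "-np", "4", "--hostfile", "/etc/mpi/hosts.txt",
                  "/usr/local/bin/my_mpi_app"]
EOF

Retries, if desired, then happen by resubmitting the whole job, which matches how most MPI codes expect to be restarted (often from a checkpoint).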

Resource Management and Scheduling

Parallel HPC jobs are resource‑hungry and sensitive to placement.

CPU and memory requests/limits

For each MPI pod:

Alignment with MPI:
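
For example, setting requests equal to limits with integer CPU counts puts the pod in the Guaranteed QoS class, which (assuming the nodes run the kubelet's static CPU manager policy) gives each rank exclusive cores rather than shares of a noisy pool:

cat <<'EOF' | oc apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: mpi-worker-pinned
spec:
  containers:
  - name: worker
    image: registry.example.com/my-mpi-app:latest
    command: ["sleep", "infinity"]
    resources:
      requests:
        cpu: "4"               # integer CPUs + requests == limits => Guaranteed QoS
        memory: 8Gi
      limits:
        cpu: "4"
        memory: 8Gi
EOF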

Node selection and topology

MPI performance depends on:

In OpenShift you use:

For tightly coupled jobs:
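
One hedged sketch of such placement: use required pod affinity so that all workers of a job land in the same topology domain. The label and topologyKey below are assumptions; a topologyKey of kubernetes.io/hostname would instead force all workers onto one node:

cat <<'EOF' | oc apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: mpi-worker-0
  labels:
    app: mpi-worker
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: mpi-worker
        topologyKey: topology.kubernetes.io/zone   # co-locate workers within one zone
  containers:
  - name: worker
    image: registry.example.com/my-mpi-app:latest
    command: ["sleep", "infinity"]
EOF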

GPUs and accelerators

For MPI+GPU workloads:

Hybrid MPI+GPU patterns:
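
Assuming the NVIDIA GPU Operator is installed (so that nodes advertise the nvidia.com/gpu resource), a one‑GPU‑per‑rank worker might look like the sketch below; the image name is a placeholder:

cat <<'EOF' | oc apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: mpi-gpu-worker
spec:
  containers:
  - name: worker
    image: registry.example.com/my-mpi-cuda-app:latest
    command: ["sleep", "infinity"]
    resources:
      limits:
        nvidia.com/gpu: "1"    # one GPU per rank keeps the rank-to-GPU mapping trivial
EOF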

Storage and Data Locality for Parallel Jobs

Parallel workloads often need:

Container‑specific points:

Interaction with MPI:
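
A common arrangement, sketched here with placeholder names, is a single ReadWriteMany claim mounted at the same path in every rank's pod, mimicking the shared parallel filesystem MPI codes usually assume:

cat <<'EOF' | oc apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mpi-shared-data
spec:
  accessModes: ["ReadWriteMany"]          # all worker pods mount it concurrently
  resources:
    requests:
      storage: 100Gi
  storageClassName: my-rwx-storageclass   # placeholder; must support RWX (e.g. CephFS, NFS)
EOF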

Security and MPI in Containers

MPI workloads often expect low friction with system resources; OpenShift adds important security controls.

Areas that often require attention:

Balancing security and performance:
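
For example, RDMA‑capable MPI stacks usually need to lock memory; the hedged sketch below adds only the IPC_LOCK capability rather than running privileged, which in turn requires a cluster‑admin‑granted SCC that permits it:

cat <<'EOF' | oc apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: mpi-rdma-rank
spec:
  containers:
  - name: worker
    image: registry.example.com/my-mpi-app:latest
    command: ["sleep", "infinity"]
    securityContext:
      capabilities:
        add: ["IPC_LOCK"]      # memlock for RDMA; denied by the restricted SCC
EOF
# Granting the capability is an admin action, e.g.:
#   oc adm policy add-scc-to-user <custom-scc> -z <serviceaccount> -n <namespace>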

Debugging and Performance Tuning

Debugging MPI inside OpenShift differs from SSHing into a node and running ad hoc commands.

Observability in containerized MPI jobs

Leverage:

Common debugging tips:
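
Most of these boil down to leaning on oc rather than ssh; the object names below are placeholders:

oc logs -f job/my-mpi-job                  # stream launcher (rank 0) output
oc get pods -l app=mpi-worker -o wide      # see which node each rank landed on
oc exec -it mpi-worker-0 -- bash           # shell into a running rank's container
oc debug node/<node-name>                  # host-level view when pods look healthy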

Tuning steps specific to containers

Design Patterns and Best Practices

A few practical patterns emerge for MPI and parallel workloads on OpenShift:

By aligning MPI’s rank‑centric view with OpenShift’s pod‑ and controller‑centric model, you can keep the benefits of containerized, cloud‑native operations while still running demanding parallel workloads effectively.
