Why Case Studies Matter
Scientific HPC case studies show how abstract ideas—parallelism, scaling, memory, I/O—translate into real decisions: which algorithms to use, how to organize data, how to run and manage jobs, and what “success” means (e.g., faster time‑to‑solution, better resolution, more simulations).
In this chapter, the focus is on patterns that appear across scientific domains, illustrated by concrete examples. For each domain, pay attention to:
- What is being computed (simulation, data analysis, optimization, …)
- How the computation is parallelized (MPI, OpenMP, GPUs, hybrid, …)
- Resource usage (cores, GPUs, memory, storage, I/O)
- Performance and scaling behavior
- Workflow structure (preprocessing, main compute, postprocessing, visualization)
The goal is to help you recognize typical HPC “shapes” you will see again and again.
Climate and Weather Modeling
Problem Setting
Climate and numerical weather prediction (NWP) codes solve partial differential equations (PDEs) for the atmosphere and oceans on a global or regional grid. Typical goals:
- Weather forecasts from hours to days ahead
- Seasonal and climate projections over decades or centuries
- High‑resolution regional simulations for impact studies
These models are often coupled (atmosphere–ocean–land–ice), highly parallel, and run routinely: operational forecasts on round‑the‑clock schedules, climate projections over very long simulated periods.
Computational Characteristics
- Structured grids on spheres or regional domains, often decomposed horizontally into subdomains.
- Time stepping: millions of time steps for century‑scale climate runs.
- Stencil operations: each grid point updated from neighboring points at each step.
- Communication pattern: nearest‑neighbor halo exchanges between subdomains.
Parallelization Strategies
- Domain decomposition + MPI:
- The global grid is split into horizontal tiles; each MPI process owns one tile.
- At every time step, processes exchange boundary “halo” cells with neighbors.
- Hybrid MPI + OpenMP:
- One MPI process per NUMA domain or socket; OpenMP threads iterate over vertical levels or subsets of the grid.
- GPU acceleration (in newer models):
- Stencil kernels (advection, diffusion, physics parameterizations) ported to CUDA, OpenACC, or OpenMP target directives.
- Asynchronous halo exchanges overlapping with GPU computation.
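To make the domain‑decomposition and halo‑exchange pattern above concrete, here is a minimal sketch, assuming mpi4py and NumPy are available (array sizes and the diffusion coefficient are illustrative, not taken from any real model). Each rank owns a 1D slab with one ghost cell per side and swaps halos with its neighbors before every stencil update; real weather and climate codes do the same thing in 2D/3D for many coupled fields.

```python
# halo_1d.py -- minimal 1D halo-exchange sketch; run e.g. with: mpirun -n 4 python halo_1d.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

n_local = 100                         # interior cells owned by this rank
u = np.zeros(n_local + 2)             # one ghost (halo) cell at each end
u[1:-1] = rank                        # arbitrary initial data

left = rank - 1 if rank > 0 else MPI.PROC_NULL         # PROC_NULL makes domain ends no-ops
right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

for step in range(10):
    # Halo exchange: send my edge cells, receive the neighbors' edge cells into my ghosts.
    comm.Sendrecv(u[1:2], dest=left, recvbuf=u[-1:], source=right)
    comm.Sendrecv(u[-2:-1], dest=right, recvbuf=u[0:1], source=left)
    # Explicit diffusion stencil: each interior cell is updated from its two neighbors.
    u[1:-1] += 0.1 * (u[:-2] - 2.0 * u[1:-1] + u[2:])

print(f"rank {rank}: mean interior value = {u[1:-1].mean():.3f}")
```

The same structure carries over to 2D/3D tiles: only the number of neighbors and the shape of the halo buffers change, which is why nearest‑neighbor communication shows up at every single time step.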
Example: Global Weather Forecast
- Objective: 7‑day global forecast at ~10 km resolution.
- Resources: tens of thousands of CPU cores or thousands of GPUs.
- Workflow:
- Data assimilation: Incorporate observations into a best estimate of current state (itself an HPC workload).
- Forecast integration: Run the model forward in time.
- Postprocessing & products: Interpolate to user grids, compute diagnostics, archive outputs.
HPC Challenges and Lessons
- Strong scaling limit: As you increase cores, halo communication and global reductions dominate; beyond some point, adding cores gives little benefit.
- I/O bottlenecks: Terabytes per day of model output; requires parallel I/O, compression, and careful selection of diagnostics.
- Resilience: Long climate runs must survive node failures—checkpointing and restart are essential.
- Parameter sweeps: Many ensemble members with slightly different initial conditions; schedulers and job arrays are heavily used.
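The ensemble pattern in the last bullet is usually driven by scheduler job arrays. A minimal sketch, assuming a Slurm‑like scheduler that sets SLURM_ARRAY_TASK_ID and a purely illustrative perturbation scheme (file names and the perturbation amplitude are hypothetical):

```python
# run_member.py -- prepare one ensemble member per job-array task.
# Submitted e.g. as: sbatch --array=0-49 wrapper.sh, where wrapper.sh invokes this script.
import os
import numpy as np

member = int(os.environ.get("SLURM_ARRAY_TASK_ID", 0))    # array index -> ensemble member id

if os.path.exists("base_state.npy"):
    base_state = np.load("base_state.npy")                # shared unperturbed initial state
else:
    base_state = np.zeros(1000)                           # placeholder so the sketch runs standalone

rng = np.random.default_rng(seed=member)                  # reproducible, member-specific seed
perturbed = base_state + 1e-3 * rng.standard_normal(base_state.shape)

outdir = f"member_{member:03d}"
os.makedirs(outdir, exist_ok=True)
np.save(os.path.join(outdir, "initial_state.npy"), perturbed)
# A real workflow would now launch the forecast model with this member's input directory.
print(f"prepared ensemble member {member} in {outdir}")
```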
Key takeaway: Climate and weather codes are canonical examples of regular, grid‑based, communication‑intensive MPI (often hybrid) applications with demanding I/O.
Astrophysics and Cosmology
Problem Setting
Astrophysics uses HPC for simulations of:
- Formation of galaxies and large‑scale structure in the universe
- Stellar evolution and supernova explosions
- Compact object mergers (black holes, neutron stars)
The physics involves gravity, hydrodynamics or magnetohydrodynamics (MHD), and sometimes general relativity and radiation transport.
Computational Characteristics
- Particles and/or grids:
- N‑body particles for dark matter dynamics.
- Adaptive mesh refinement (AMR) grids for gas or MHD.
- Multiscale: Large dynamic range in space and time; small regions require high resolution.
- Irregular structures: Clusters, filaments, shocks—nonuniform distribution of work.
Parallelization Strategies
- MPI over domain decomposition:
- Space split into subdomains; each MPI rank holds particles and grid cells in its region.
- Tree and AMR structures:
- Hierarchical data structures distributed across ranks.
- Tree traversals and AMR operations can cause load imbalance.
- Hybrid & GPU use:
- Gravity solvers (e.g., tree or particle‑mesh methods) and hydrodynamics kernels ported to GPUs.
- One MPI rank per GPU, with threads or CUDA streams exploiting fine‑grained parallelism.
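As a toy illustration of the spatial decomposition described above (pure NumPy, no MPI; particle counts, box size, and the artificial "clump" that mimics structure formation are all made up), the sketch below assigns particles to ranks by position and measures the resulting load imbalance:

```python
# Toy spatial decomposition: assign particles to ranks by position and check the balance.
import numpy as np

rng = np.random.default_rng(42)
n_particles, box, n_ranks = 100_000, 100.0, 8

# Half the particles uniform, half clustered in a "halo" to mimic structure formation.
uniform = rng.uniform(0.0, box, size=(n_particles // 2, 3))
clump = rng.normal(loc=0.2 * box, scale=0.02 * box, size=(n_particles // 2, 3)) % box
pos = np.vstack([uniform, clump])

# Slab decomposition along x: rank i owns x in [i*box/n_ranks, (i+1)*box/n_ranks).
owner = np.minimum((pos[:, 0] / (box / n_ranks)).astype(int), n_ranks - 1)

counts = np.bincount(owner, minlength=n_ranks)
print("particles per rank:", counts)
print(f"load imbalance (max/mean): {counts.max() / counts.mean():.2f}")
```

Once matter clusters, a static slab decomposition like this becomes badly imbalanced, which is exactly why production codes repartition their domains dynamically as the simulation evolves.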
Example: Large‑Scale Cosmological Simulation
- Objective: Evolve billions to trillions of particles in a box hundreds of Mpc on a side.
- Resources: Hundreds of thousands of cores, petabytes of storage.
- Workflow:
- Generate initial conditions (displacement fields).
- Main time integration loop (gravity + hydrodynamics).
- Periodic snapshot outputs and analysis.
HPC Challenges and Lessons
- Load balancing: Galaxy clusters form in some regions, leaving others sparse; domain decomposition must adapt dynamically.
- Communication patterns: Tree/mesh methods require nonlocal data exchanges; efficient communication and overlapping with computation are critical.
- Data volume: Snapshots can be petabytes; in situ or in transit analysis is used to reduce I/O.
- Numerical accuracy vs. performance: Gravity is sensitive to numerical errors; code must balance precision (e.g., double vs. mixed precision) against speed.
Key takeaway: Astrophysics simulations highlight dynamic load balancing, hierarchical algorithms, and extreme data challenges.
Computational Fluid Dynamics (CFD) and Engineering
Problem Setting
CFD is central to:
- Aerodynamics (aircraft, cars, wind turbines)
- Turbomachinery (jet engines, gas turbines)
- Process engineering (chemical reactors, pipelines)
- Environmental flows (urban wind, pollutant dispersion)
Codes solve the Navier–Stokes equations, often with turbulence models or direct numerical simulation (DNS) for research.
Computational Characteristics
- Structured or unstructured meshes:
- Structured grids for simple geometries (channels, pipes).
- Unstructured meshes for complex shapes (aircraft, cars).
- Stencil‑like kernels on structured grids, but more irregular memory access on unstructured meshes.
- High arithmetic intensity in turbulence models or advanced discretizations.
Parallelization Strategies
- MPI domain decomposition:
- Mesh partitioned into subdomains; interface faces require communication.
- Hybrid MPI/OpenMP:
- MPI across nodes; OpenMP threads over cells, elements, or faces within a subdomain.
- GPU acceleration:
- Finite volume/finite element loops on GPUs; careful data layout to ensure coalesced memory access.
Example: Aircraft Wing Simulation
- Objective: Simulate turbulent flow around a wing section at realistic Reynolds numbers.
- Resources: From a few hundred to tens of thousands of cores, depending on resolution.
- Workflow:
- Mesh generation (often on workstations or smaller clusters).
- Steady‑state RANS or unsteady simulation on the cluster.
- Postprocessing: lift/drag coefficients, flow visualizations.
HPC Challenges and Lessons
- Scalability on unstructured meshes:
- Graph partitioning (e.g., via METIS/ParMETIS) to balance elements per rank and minimize communication.
- Preconditioners and solvers:
- Linear solvers (e.g., Krylov methods) dominate runtime; preconditioners must be parallel and cache‑friendly.
- Cache and memory bandwidth:
- Core performance often limited by memory access patterns; data structure choices matter.
- Design space exploration:
- Many geometry or parameter variants; embarrassingly parallel ensembles can saturate schedulers.
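To illustrate why the solver stack matters, here is a small sketch of a preconditioned Krylov solve with SciPy (assumed to be available); the 1D Poisson‑like matrix and the simple Jacobi preconditioner stand in for the much larger pressure or implicit systems a CFD code would assemble.

```python
# Sketch: preconditioned conjugate-gradient solve for a sparse SPD system (illustrative).
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 2_000
# 1D Poisson-like matrix as a stand-in for a CFD pressure or implicit system.
A = sp.diags([-1.0, 2.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)

# Jacobi (diagonal) preconditioner: apply diag(A)^{-1} to each residual.
inv_diag = 1.0 / A.diagonal()
M = spla.LinearOperator((n, n), matvec=lambda x: inv_diag * x)

iters = 0
def count(xk):
    global iters
    iters += 1

x, info = spla.cg(A, b, M=M, callback=count)
print("converged" if info == 0 else f"cg returned info={info}", "after", iters, "iterations")
```

A diagonal preconditioner barely helps for this matrix; production CFD codes rely on stronger options such as ILU or multigrid, and making those preconditioners scale in parallel is often the real bottleneck.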
Key takeaway: CFD showcases domain decomposition, linear solver performance, and the interplay between mesh quality, partitioning, and scalability.
Molecular Dynamics and Computational Chemistry
Problem Setting
Molecular dynamics (MD) and related methods simulate motion and interactions of atoms and molecules to study:
- Protein folding and conformational changes
- Drug binding and free energies
- Material properties (polymers, alloys)
- Membranes and biomolecular complexes
Simulations typically cover nanoseconds to microseconds of physical time (milliseconds only with specialized hardware or enhanced‑sampling methods), with time steps of a few femtoseconds.
Computational Characteristics
- Short‑range interactions (Lennard–Jones, short‑range Coulomb) with cutoffs and neighbor lists.
- Long‑range electrostatics using particle‑mesh Ewald (PME) or related methods.
- Fine time stepping: millions to billions of steps.
- Regular kernels inside each step but complex multi‑component algorithms.
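The cutoff‑plus‑neighbor‑list idea above can be sketched with a simple cell list (pure Python/NumPy, non‑periodic, all sizes illustrative): atoms are binned into cells at least one cutoff wide, so each atom only needs to check the 27 surrounding cells instead of every other atom.

```python
# Toy cell-list neighbor search: count pairs within a cutoff without O(N^2) distance checks.
from itertools import product
import numpy as np

rng = np.random.default_rng(0)
n_atoms, box, cutoff = 5_000, 20.0, 1.2
pos = rng.uniform(0.0, box, size=(n_atoms, 3))

n_cells = int(box // cutoff)                       # cells at least one cutoff wide
cell_of = np.minimum((pos / (box / n_cells)).astype(int), n_cells - 1)

cells = {}                                         # cell index -> list of atom ids
for i, c in enumerate(map(tuple, cell_of)):
    cells.setdefault(c, []).append(i)

pairs = 0
for c, atoms in cells.items():
    for d in product((-1, 0, 1), repeat=3):        # this cell plus its 26 neighbors
        nb = tuple(c[k] + d[k] for k in range(3))  # no periodic wrap in this toy version
        for i in atoms:
            for j in cells.get(nb, []):
                if j > i and np.linalg.norm(pos[i] - pos[j]) < cutoff:
                    pairs += 1
print("pairs within cutoff:", pairs)
```

Production MD codes add periodic boundaries, rebuild the lists only every few steps, and vectorize the distance checks, but the same decomposition into cells underlies spatial domain decomposition across ranks.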
Parallelization Strategies
- Spatial domain decomposition + MPI:
- Space split into cells; atoms assigned to ranks based on position.
- Neighbor lists updated periodically; forces computed using local and halo atoms.
- Force decomposition & task‑based approaches in some codes.
- GPU acceleration:
- Short‑range force kernels run on GPUs; PME may use separate ranks or GPUs.
- One or more GPUs per node; CPUs handle orchestration and some auxiliary tasks.
Example: Protein–Ligand Binding Simulation
- Objective: Estimate binding free energy of a small molecule to a protein.
- Resources: From single‑GPU workstations to modest HPC clusters with many GPUs.
- Workflow:
- System preparation (solvation, ionization).
- Equilibration runs.
- Production MD (possibly many replicas with different starting conditions).
- Analysis (RMSD, free energy estimators).
HPC Challenges and Lessons
- Strong scaling limits:
- For a single system, adding more ranks eventually makes communication dominate; a given system size typically scales efficiently only up to a modest number of nodes.
- Ensemble simulations:
- To use large machines efficiently, run many independent replicas in parallel, each moderately parallelized.
- GPU utilization:
- MD kernels are compute‑heavy and vectorizable, making them well suited to GPUs; performance hinges on good GPU occupancy and minimizing host–device transfers.
- Load balancing:
- Inhomogeneous systems (e.g., membrane + solvent) can cause some domains to have many more atoms than others.
Key takeaway: MD is a classic example of moderately strong scaling per simulation plus massive ensemble parallelism across simulations, with heavy GPU usage.
Bioinformatics and Genomics
Problem Setting
Genomics and bioinformatics use HPC for:
- Genome assembly from short reads
- Alignment of sequencing reads to reference genomes
- Variant calling and functional annotation
- Metagenomics and transcriptomics analyses
These are typically data‑intensive rather than numerically intensive.
Computational Characteristics
- Huge input datasets (terabytes of reads).
- String processing, graph algorithms, and hashing dominate compute.
- Irregular memory access and branching; often memory bandwidth and latency limited.
- Embarrassingly parallel tasks at multiple stages (per‑sample, per‑contig, per‑chromosome).
Parallelization Strategies
- Coarse‑grained parallelism:
- Process many samples or read chunks in parallel.
- Use job arrays on the scheduler for thousands of similar jobs.
- Thread‑level parallelism:
- Many tools use OpenMP or Pthreads within a node.
- Distributed memory:
- For large assemblies, data and computation are distributed across many nodes (e.g., distributed de Bruijn graphs).
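A minimal sketch of the coarse‑grained, per‑sample parallelism described above (sample names and the commented‑out aligner command are placeholders, not a real pipeline): on a single node, independent samples can simply be fanned out over a process pool; across nodes, the same fan‑out is usually expressed as a job array or handed to a workflow engine.

```python
# Sketch: process independent samples in parallel on one node.
from concurrent.futures import ProcessPoolExecutor, as_completed

samples = [f"sample_{i:03d}" for i in range(16)]        # hypothetical sample IDs

def align(sample: str) -> str:
    # A real pipeline would launch an aligner here, e.g. something like
    #   subprocess.run(["bwa", "mem", "ref.fa", f"{sample}.fastq"], check=True)
    # For this sketch we only pretend and return the sample name.
    return sample

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=8) as pool:    # 8 samples in flight at a time
        futures = {pool.submit(align, s): s for s in samples}
        for fut in as_completed(futures):
            print("finished", fut.result())
```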
Example: Whole‑Genome Variant Calling Pipeline
- Objective: Identify variants in hundreds or thousands of human genomes.
- Resources: From small clusters to large centers; thousands of cores over days to weeks.
- Workflow:
- Read QC and trimming.
- Alignment to reference genome.
- Sort, mark duplicates, and recalibrate.
- Variant calling and joint genotyping.
- Annotation and reporting.
HPC Challenges and Lessons
- I/O and storage:
- Pipelines read and write many large intermediate files; parallel filesystems and smart caching are essential.
- Workflow management:
- Complex DAGs of tasks with dependencies; workflow engines (Snakemake, Nextflow, Cromwell) orchestrate jobs on clusters.
- Throughput vs. latency:
- Focus is often on processing as many samples per day as possible, not on minimizing time for a single sample.
- Reproducibility:
- Strict versioning and containerization are widespread due to clinical relevance.
Key takeaway: Genomics emphasizes I/O, workflow orchestration, and embarrassingly parallel throughput rather than extreme per‑job scalability.
High‑Energy Physics (HEP)
Problem Setting
Large experiments (e.g., at the LHC) produce enormous volumes of collision data. HPC is used for:
- Detector simulation (Monte Carlo)
- Event reconstruction
- Analysis of recorded events
- Theoretical simulations (lattice QCD, perturbative calculations)
Here we highlight two different patterns: Monte Carlo event simulation and lattice QCD.
Monte Carlo Event Simulation
Computational Characteristics
- Embarrassingly parallel: each simulated event is independent.
- Complex local computations per event (particle interactions, detector response).
- Moderate memory usage per task; huge overall data volume.
Parallelization Strategies
- Massive task parallelism:
- Millions to billions of events, each independent; they are typically grouped into many parallel jobs.
- Perfectly suited to distributed computing grids and cluster job arrays.
- Multi‑threading/GPU within an event:
- Newer frameworks vectorize or offload event steps to GPUs.
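A toy version of this task parallelism (pure Python/NumPy; the "detector response" is invented for illustration): every event depends only on its own random seed, so a pool of workers, a job array, or a grid site can all process disjoint chunks independently.

```python
# Toy Monte Carlo event loop: each event is independent, so events parallelize trivially.
import numpy as np
from multiprocessing import Pool

def simulate_event(seed: int) -> float:
    """Stand-in for a detector-simulation step: returns one 'measured' quantity."""
    rng = np.random.default_rng(seed)
    energy = rng.exponential(scale=10.0)                      # fake primary energy
    return energy * (1.0 + 0.05 * rng.standard_normal())      # fake detector smearing

if __name__ == "__main__":
    n_events = 100_000
    with Pool(processes=8) as pool:
        results = pool.map(simulate_event, range(n_events), chunksize=1_000)
    print(f"simulated {n_events} events, mean response {np.mean(results):.2f}")
```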
HPC Challenges and Lessons
- Resource federation: global computing grids coordinate thousands of sites.
- Data management: replicating and accessing petabytes of event data.
- Efficiency: aim for high throughput and high utilization; individual job performance is less critical than overall rate.
Lattice QCD
Computational Characteristics
- Discrete space‑time lattice; large sparse linear systems.
- Heavy use of iterative solvers and stencil‑like operations.
- Highly regular but communication‑intensive patterns (nearest‑neighbor on 4D lattices).
Parallelization Strategies
- Domain decomposition + MPI in 4D.
- GPU acceleration for linear algebra kernels; often one rank per GPU.
- Mixed precision techniques to speed up iterative solvers.
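The mixed‑precision idea in the last bullet can be sketched with dense iterative refinement in NumPy (the matrix is a made‑up well‑conditioned test case; real lattice QCD codes apply the same trick inside Krylov solvers on huge sparse operators): do the expensive solves in single precision and correct the residual in double precision.

```python
# Toy mixed-precision iterative refinement: cheap float32 solves, float64 residuals.
import numpy as np

rng = np.random.default_rng(1)
n = 500
A64 = rng.standard_normal((n, n)) + n * np.eye(n)    # well-conditioned test matrix
b64 = rng.standard_normal(n)

A32 = A64.astype(np.float32)                         # low-precision copy used for the solves
x = np.zeros(n)

for it in range(10):
    r = b64 - A64 @ x                                # residual in double precision
    # A real solver would reuse a low-precision factorization or preconditioner here.
    dx = np.linalg.solve(A32, r.astype(np.float32))  # correction computed in single precision
    x = x + dx.astype(np.float64)
    rel = np.linalg.norm(r) / np.linalg.norm(b64)
    print(f"iter {it}: relative residual {rel:.2e}")
    if rel < 1e-12:
        break
```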
HPC Challenges and Lessons
- Strong scaling up to very large core counts, but limited by latency and global reductions.
- Machine‑specific tuning: performance highly sensitive to network, cache, and GPU characteristics.
Key takeaway: HEP showcases both embarrassingly parallel Monte Carlo workflows and tightly coupled, stencil‑based simulations.
Earth Sciences and Natural Hazards
Problem Setting
Earth sciences use HPC for:
- Seismic wave propagation and earthquake modeling
- Tsunami and storm surge simulations
- Volcanic eruption and landslide modeling
- Groundwater and reservoir simulations
These applications directly support hazard assessment and risk mitigation.
Computational Characteristics
- Wave and transport PDEs on 2D/3D meshes.
- Time‑critical in some cases (early warning, real‑time forecasting).
- Complex geometries (topography, subsurface structures).
- Multi‑physics coupling (e.g., earthquake rupture + wave propagation).
Parallelization Strategies
- MPI domain decomposition of the mesh.
- Hybrid MPI/OpenMP and GPU:
- Seismic wave equations implemented as high‑order stencil kernels on CPUs or GPUs.
- GPUs provide significant speedups for high‑order methods.
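A minimal sketch of the kind of stencil kernel mentioned above (1D acoustic wave equation with second‑order finite differences; all parameters are invented): it also shows the simplest form of output decimation, writing only every Nth time level, which becomes essential once the fields are 3D and time‑dependent.

```python
# Toy 1D acoustic wave propagation: second-order finite differences in space and time,
# keeping snapshots only every `output_every` steps (a simple form of output decimation).
import numpy as np

nx, nt = 2_000, 5_000
dx, c = 10.0, 3000.0                    # grid spacing [m], wave speed [m/s]
dt = 0.5 * dx / c                       # time step chosen to satisfy the CFL condition
output_every = 500

u_prev = np.zeros(nx)
u = np.zeros(nx)
u[nx // 2] = 1.0                        # point disturbance at the domain center

snapshots = []
for step in range(nt):
    lap = np.zeros(nx)
    lap[1:-1] = (u[:-2] - 2.0 * u[1:-1] + u[2:]) / dx**2
    u_next = 2.0 * u - u_prev + (c * dt) ** 2 * lap
    u_prev, u = u, u_next
    if step % output_every == 0:
        snapshots.append(u.copy())      # a real code would do a parallel write or in situ analysis

print(f"kept {len(snapshots)} of {nt} time levels")
```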
Example: Regional Earthquake Scenario Simulation
- Objective: Model ground motion for a hypothetical earthquake to produce hazard maps.
- Resources: Thousands to tens of thousands of CPU cores or hundreds of GPUs.
- Workflow:
- Build geological and source models.
- Run wave propagation simulation.
- Generate shaking intensity maps and risk metrics.
HPC Challenges and Lessons
- Real‑time constraints:
- For early warning, computation must keep up with or outpace real time.
- I/O and visualization:
- Large time‑dependent 3D fields; strategies include output decimation and in situ visualization.
- Mesh resolution vs. runtime:
- Trade‑offs between capturing high‑frequency waves and computational cost.
Key takeaway: Earth‑hazard applications illustrate the use of HPC for time‑critical simulations with strong societal impact.
Cross‑Cutting Patterns from Scientific Case Studies
Across these domains, a few recurring patterns emerge:
1. Workload Types
- Tightly coupled simulations (climate, CFD, lattice QCD): require fast interconnects, careful parallelization, and attention to scalability and communication patterns.
- Embarrassingly or loosely coupled tasks (Monte Carlo, genomics pipelines, MD ensembles): dominated by workflow management, job arrays, and efficient use of cluster queues.
2. Parallelization Models
- MPI is almost universal for distributed memory.
- Hybrid MPI + threads is common for node‑level performance on CPUs.
- GPUs and accelerators are increasingly central, especially for regular, compute‑intensive kernels.
- Ensembles: Many independent moderate‑size jobs used to fill large systems.
3. Performance Concerns
- Scaling:
- Strong scaling limits appear in tightly coupled simulations; beyond a point, communication dominates.
- I/O:
- Many scientific codes are I/O‑bound when output frequency or resolution is high.
- Load balance:
- Adaptive meshes, inhomogeneous physics, and data‑dependent workloads require dynamic balancing and smart partitioning.
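The strong‑scaling limit can be quantified with Amdahl's law, speedup(p) = 1 / (s + (1 − s)/p), where s is the fraction of the runtime that does not parallelize (serial work, global reductions, communication that grows with p). A quick back‑of‑the‑envelope, with an assumed s of 2%:

```python
# Strong-scaling estimate from Amdahl's law: speedup(p) = 1 / (s + (1 - s)/p).
serial_fraction = 0.02                     # assume 2% of the runtime cannot be parallelized

for p in (1, 16, 256, 4096, 65536):
    speedup = 1.0 / (serial_fraction + (1.0 - serial_fraction) / p)
    efficiency = speedup / p
    print(f"{p:6d} cores: speedup {speedup:8.1f}, parallel efficiency {efficiency:6.1%}")
```

Even a 2% non‑parallel fraction caps the achievable speedup at 50×, which is why tightly coupled codes hit a scaling wall long before the machine runs out of cores.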
4. Workflow and Operations
- End‑to‑end pipelines:
- Preprocessing → main compute → postprocessing/analysis → archiving.
- Checkpoint/restart:
- Long‑running simulations rely heavily on fault tolerance.
- Reproducibility and provenance:
- Scientific results must be reproducible; environment control and consistent software stacks are crucial.
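A minimal checkpoint/restart pattern looks like the sketch below (file names, state size, and checkpoint interval are arbitrary; real simulations checkpoint large distributed state with parallel I/O libraries rather than a single NumPy file):

```python
# Minimal checkpoint/restart pattern: periodically save state, resume from the last checkpoint.
import os
import numpy as np

CHECKPOINT = "checkpoint.npz"

def save_checkpoint(step: int, state: np.ndarray) -> None:
    tmp = "checkpoint_tmp.npz"
    np.savez(tmp, step=step, state=state)
    os.replace(tmp, CHECKPOINT)            # atomic rename: never leaves a half-written checkpoint

def load_checkpoint():
    if os.path.exists(CHECKPOINT):
        data = np.load(CHECKPOINT)
        return int(data["step"]), data["state"]
    return 0, np.zeros(1_000)              # fresh start if no checkpoint exists

start, state = load_checkpoint()
for step in range(start, 10_000):
    state = state + 1e-3                   # stand-in for one expensive time step
    if step % 1_000 == 0:
        save_checkpoint(step + 1, state)   # checkpoint every 1000 steps
print("run finished at step 10000")
```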
Understanding these real‑world patterns will help you reason about how to design, run, and optimize your own HPC workloads in scientific contexts.