Overview
High performance computing is deeply embedded in modern industry, even in places where it is not immediately visible. In this chapter the focus is on concrete patterns of how companies use HPC to create products, reduce risk, shorten time to market, and cut costs. The goal is not to teach new technical mechanisms, which are covered elsewhere in the course, but to show how those mechanisms are combined into complete, real world workflows.
Industrial HPC differs from academic HPC in three important ways. First, the primary metric is business value, not publications or novelty. Second, reliability and turnaround time are often more important than squeezing out the last bit of theoretical performance. Third, HPC is usually one step inside a much larger pipeline that includes data management, engineering processes, compliance, and integration with existing IT systems.
In the following sections you will see typical end to end case studies from selected industries. Each one highlights which parts of the HPC stack are critical, how parallelism is structured in practice, what constraints companies care about, and what trade offs they accept.
In industrial HPC, time to useful result and cost per result usually matter more than raw peak performance.
Automotive and Aerospace Design
In automotive and aerospace companies, HPC is a core tool for virtual prototyping. It replaces a large number of expensive and slow physical tests with digital simulations. The dominant applications are computational fluid dynamics (CFD) and structural mechanics, but electromagnetics, acoustics, and multiphysics simulations are also common.
A typical automotive CFD workflow starts from a CAD geometry of a car or engine component. The geometry is cleaned and meshed, which generates millions to billions of cells that discretize the fluid domain. The CFD solver then runs on a cluster using distributed memory parallelism. Most production codes use MPI across nodes, often with some shared memory or GPU acceleration inside each node.
The industrial constraint that shapes these jobs is turnaround time. Engineers want to test many design variants, not just one. For example, evaluating aerodynamic drag for dozens of configurations under different yaw angles and speeds can quickly generate hundreds of simulation cases. Instead of one massive simulation, the workload becomes a large ensemble of medium sized jobs. This leads to heavy use of job schedulers, job arrays, and parameter sweeps. Load balancing across many jobs is as important as scaling a single job to many cores.
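As a rough illustration, a sketch like the following could generate the case list for such a sweep and hand it to the scheduler as a single job array. The yaw angles, speeds, file names, and the run_cfd_case.sh submission script are hypothetical placeholders; real pipelines usually wrap this step in a workflow tool.

```python
# Minimal sketch of a CFD parameter sweep submitted as one Slurm job array.
# Parameter values, paths, and run_cfd_case.sh are hypothetical placeholders.
import itertools
import subprocess
from pathlib import Path

yaw_angles = [0, 5, 10, 15]          # degrees, illustrative values
speeds = [100, 130, 160]             # km/h, illustrative values

cases = list(itertools.product(yaw_angles, speeds))

# Write one line per case; each array task selects its line at run time.
case_file = Path("cases.txt")
case_file.write_text("\n".join(f"{yaw} {speed}" for yaw, speed in cases) + "\n")

# Submit a single array job covering all cases (standard sbatch --array syntax).
subprocess.run(
    ["sbatch", f"--array=1-{len(cases)}", "run_cfd_case.sh", str(case_file)],
    check=True,
)
```

Each array task would then read its line of cases.txt, set up a case directory, and launch the solver, while the scheduler balances the independent tasks across the cluster.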
In aerospace structural analysis, similar patterns arise for stress, fatigue, and vibration analyses of wings, fuselages, and engines. Large finite element models with tens of millions of degrees of freedom are solved in parallel. Here, reproducibility and certification are critical. Results may feed into regulatory approval, so companies rely on well validated commercial solvers, version controlled input decks, and carefully managed software environments. It is common to freeze compiler versions and library stacks for certified workflows, even if newer versions could run faster.
A recurring compromise in this sector is between mesh resolution and time to answer. A finer mesh can yield more accurate drag or stress predictions, but at a much higher computational cost. Engineering groups often define standard meshes that are “good enough” for concept design and reserve very high resolution meshes for late stage verification. HPC resources are therefore tiered, with many cheap runs for initial screening and a smaller number of very large runs for final checks.
Energy and Oil & Gas
The energy industry, especially oil and gas, has used HPC at scale for decades. The main use cases include seismic imaging, reservoir simulation, and increasingly wind and solar resource modeling.
Seismic imaging is a classic HPC problem. Companies collect massive volumes of sensor data from seismic surveys, then run large computations to reconstruct an image of the subsurface. The underlying algorithms often involve wave equation solvers and inverse problems. These workloads use thousands of MPI processes, heavy I/O, and carefully tuned parallel filesystems. Turnaround time directly maps to business value, because faster imaging enables faster drilling decisions.
Reservoir simulation models the flow of oil, gas, and water in underground reservoirs over time. These are long running simulations that can take hours or days even on large clusters. What matters here is not just a single simulation, but large scenario sets. Companies evaluate different production strategies, well placements, or operational plans, each corresponding to one or more runs. The resulting pattern is many related jobs with slightly different parameters, all reading similar input data and generating significant output. Checkpointing is important to avoid restarts after failures and to support long optimization loops.
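The checkpoint pattern itself is simple to sketch. The following is a minimal, generic restart loop rather than any particular reservoir simulator's API; load and save use plain pickling, and advance_one_step stands in for whatever the real code provides.

```python
# Generic checkpoint/restart loop; the state and step functions are placeholders,
# not a real simulator API.
import pickle
from pathlib import Path

CHECKPOINT = Path("checkpoint.pkl")
TOTAL_STEPS = 10_000
CHECKPOINT_EVERY = 100

def initial_state():
    return {"step": 0, "fields": {}}   # placeholder for real simulation state

def advance_one_step(state):
    state["step"] += 1                 # placeholder for the real physics update
    return state

# Resume from the last checkpoint if one exists, otherwise start fresh.
if CHECKPOINT.exists():
    state = pickle.loads(CHECKPOINT.read_bytes())
else:
    state = initial_state()

while state["step"] < TOTAL_STEPS:
    state = advance_one_step(state)
    if state["step"] % CHECKPOINT_EVERY == 0:
        CHECKPOINT.write_bytes(pickle.dumps(state))  # atomic rename omitted for brevity
```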
Power grid and renewable energy companies use HPC for grid stability simulations, unit commitment optimization, and weather driven forecasting. For example, high resolution weather forecasts feed into wind farm output predictions, which in turn help plan grid operations. These workflows combine traditional simulation codes with data driven models, often involving machine learning accelerated by GPUs. The HPC aspect is not only the large compute jobs, but also the tight timing requirements. Forecasts must arrive before operational deadlines, so scheduling, resource guarantees, and quality of service policies matter.
A characteristic feature in this sector is close coupling between HPC and business risk. Simulation results influence multi million or billion dollar decisions. Because of this, there is a strong emphasis on verification, validation, and traceability. Production runs use standardized scripts and environments. Ad hoc experiments are possible, but only formally approved workflows feed final decisions.
Finance and Risk Analysis
Banks, hedge funds, and insurance companies use HPC to price complex financial instruments, assess risk, and comply with regulatory stress testing requirements. Here the dominant patterns differ from the physics based simulations of engineering. Many computations are embarrassingly parallel Monte Carlo simulations or scenario based portfolio evaluations.
In a Monte Carlo option pricing job, the same pricing model is applied across millions of random paths. Each path is mostly independent. This maps naturally to data parallel workloads. Large clusters or GPU nodes run thousands of processes or threads, each handling a subset of the paths. The key challenge is orchestrating massive numbers of short lived tasks efficiently and managing randomness carefully. Reproducibility in risk calculations is important. Companies control random number generator seeds and sequences across distributed runs to ensure that results can be reproduced for audits.
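A common way to obtain reproducible, non-overlapping random streams across workers is to derive per-task generators from one recorded master seed. The sketch below uses NumPy's SeedSequence spawning; the payoff model and parameter values are illustrative, not a production pricing model.

```python
# Reproducible Monte Carlo pricing sketch: one master seed, one child stream per chunk.
# The geometric Brownian motion payoff below is illustrative, not a production model.
import numpy as np

def price_chunk(seed_seq, n_paths, s0=100.0, k=105.0, r=0.01, sigma=0.2, t=1.0):
    rng = np.random.default_rng(seed_seq)          # independent stream for this chunk
    z = rng.standard_normal(n_paths)
    s_t = s0 * np.exp((r - 0.5 * sigma**2) * t + sigma * np.sqrt(t) * z)
    return np.exp(-r * t) * np.maximum(s_t - k, 0.0).mean()

master = np.random.SeedSequence(20240101)          # recorded so audits can rerun the job
children = master.spawn(8)                         # e.g. one child stream per worker

# In production these chunks would run on separate workers; here they run in a loop.
estimates = [price_chunk(child, n_paths=250_000) for child in children]
print("price estimate:", float(np.mean(estimates)))
```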
Insurance risk and regulatory stress testing often involve evaluating portfolios under thousands of economic scenarios. Each scenario requires reading relevant data, running risk models, and summarizing outputs. The workload is again an ensemble of runs. Rather than one monolithic MPI job, companies may use workflow managers or distributed task systems on top of the scheduler. This allows dynamic load balancing and failure handling at the job level, not only at the process level.
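At small scale, the same pattern can be sketched with a standard task pool: independent scenario evaluations are farmed out dynamically, and failed tasks are collected for resubmission rather than aborting the whole run. The scenario list and evaluate_scenario are placeholders for real risk models.

```python
# Sketch of scenario-level task farming with per-task failure handling.
# evaluate_scenario and the scenario list are placeholders for real risk models.
from concurrent.futures import ProcessPoolExecutor, as_completed

def evaluate_scenario(scenario_id):
    # Placeholder: load scenario data, run the risk model, return a summary figure.
    return {"scenario": scenario_id, "loss": 0.0}

scenarios = range(1000)
results, failed = [], []

with ProcessPoolExecutor(max_workers=32) as pool:
    futures = {pool.submit(evaluate_scenario, s): s for s in scenarios}
    for fut in as_completed(futures):
        sid = futures[fut]
        try:
            results.append(fut.result())
        except Exception as exc:       # keep going; record failures for resubmission
            failed.append((sid, exc))

print(f"{len(results)} scenarios done, {len(failed)} to retry")
```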
Latency constraints in finance differ from those of traditional batch oriented HPC. Some workloads are overnight batch jobs that must complete before markets open. Others are intraday or even near real time risk checks. For time critical workloads, organizations partition clusters, reserving dedicated capacity or using priority queues. GPU acceleration is common for Monte Carlo workloads because GPUs are well suited for SIMD style number crunching when models can be vectorized effectively.
Cost and regulation strongly influence technical choices. Cloud based HPC plays an increasingly important role since financial firms can burst to cloud resources for regulatory peaks, such as quarterly or yearly stress tests, without building massive on premises clusters that sit idle most of the time. However, strict controls on data privacy, audit trails, and deterministic behavior must be satisfied. This leads to heavy use of standardized containers and infrastructure as code in financial HPC environments.
Manufacturing and Consumer Products
In manufacturing and consumer product companies, HPC is used across the product lifecycle, from concept design through manufacturing process optimization and digital twins.
Consumer goods companies simulate fluid mixing, packaging integrity, and structural behavior of products under transport or usage. For example, a company might simulate how a detergent moves in a washing machine, how a bottle behaves during a drop, or how air circulates inside a refrigerator. These are usually medium sized CFD and finite element computations. The main driver is reduction of physical prototypes and faster design cycles.
In manufacturing process engineering, HPC helps optimize casting, welding, machining, and additive manufacturing. Thermal and mechanical simulations are used to predict residual stresses, distortions, and defects. These simulations often involve complex materials and nonlinear physics, which leads to long compute times and challenging convergence behavior. Here, HPC is needed not just for resolution, but also for exploring process parameter spaces. Companies run design of experiments campaigns where many simulations, each with different process parameters, are run in parallel. Schedulers and workflow tools orchestrate hundreds or thousands of jobs, each moderate in size.
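A design of experiments campaign of this kind often begins with nothing more than generating the parameter table and one case directory per combination. A minimal full-factorial sketch, with hypothetical process parameters and file layout:

```python
# Minimal full-factorial design-of-experiments setup.
# Parameter names, values, and the directory layout are hypothetical.
import itertools
import json
from pathlib import Path

parameters = {
    "preheat_temp_C": [150, 200, 250],
    "laser_power_W": [200, 300],
    "scan_speed_mm_s": [600, 900, 1200],
}

names = list(parameters)
for i, values in enumerate(itertools.product(*parameters.values())):
    case_dir = Path(f"doe_case_{i:04d}")
    case_dir.mkdir(exist_ok=True)
    # Each case gets its own input file; a scheduler or workflow tool then runs them in parallel.
    (case_dir / "params.json").write_text(json.dumps(dict(zip(names, values)), indent=2))
```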
The digital twin concept brings these ideas together. A digital twin is a virtual representation of a physical asset that is updated with real world data. For example, a manufacturer may maintain a digital twin of a turbine blade in service. Sensor data from the physical equipment feeds into models that estimate remaining life or optimal maintenance schedules. HPC enters when the twin uses detailed physics based simulations that run repeatedly as new data arrives. These workloads combine data ingestion, model calibration, and simulation. They require predictable performance, automated job submission, and robust error handling, since they often run continuously over long periods.
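In caricature, the compute side of such a twin is a long running loop: watch for new sensor data, recalibrate, submit a simulation, handle failures, repeat. The sketch below uses placeholder functions only; a real deployment would sit behind a workflow engine or message queue rather than a bare polling loop.

```python
# Caricature of a digital twin update loop; all functions are placeholders.
import logging
import time

logging.basicConfig(level=logging.INFO)

def fetch_new_sensor_data():
    return None                    # placeholder: query a data store, None if nothing new

def calibrate_model(data):
    return {"calibrated": True}    # placeholder: fit model parameters to the new data

def submit_simulation(model):
    logging.info("submitting physics simulation for calibrated model")  # placeholder

while True:
    try:
        data = fetch_new_sensor_data()
        if data is not None:
            model = calibrate_model(data)
            submit_simulation(model)
    except Exception:
        logging.exception("twin update failed; will retry on next cycle")
    time.sleep(600)                # poll every 10 minutes (illustrative interval)
```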
An important constraint in manufacturing environments is integration with existing enterprise systems such as product lifecycle management and manufacturing execution systems. HPC workflows must fit into established approval chains, data repositories, and versioning systems. This influences how input data is stored and accessed, how simulation results are archived, and how traceability is maintained for regulatory or liability reasons.
Pharmaceuticals and Healthcare
Pharmaceutical and biotech companies depend heavily on HPC for drug discovery, molecular modeling, and, increasingly, genomics and personalized medicine. The workloads combine large scale simulation with data intensive bioinformatics.
Molecular dynamics (MD) simulations are a key application. These simulate the motion of atoms in proteins, drug molecules, and solvents over time. Modern MD codes are highly optimized and use both MPI and GPU acceleration. Industrial drug discovery workflows rarely run one huge simulation in isolation. Instead, they run many medium sized simulations that explore different molecules, binding poses, or environmental conditions. Ensemble simulations and replica exchange methods generate large numbers of trajectories. Job schedulers must manage thousands of concurrent or queued runs, often with dependencies between phases of simulation and analysis.
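Dependencies between simulation and analysis phases are often expressed directly to the scheduler. The sketch below submits, for each replica, a hypothetical analysis job that starts only after its simulation job succeeds, using standard Slurm dependency flags; the scripts run_md.sh and analyze_md.sh are placeholders.

```python
# Sketch: chain an analysis job after a simulation job with a Slurm dependency.
# The scripts run_md.sh and analyze_md.sh are hypothetical placeholders.
import subprocess

def submit(*args):
    # --parsable makes sbatch print just the job id, which we capture for chaining.
    out = subprocess.run(["sbatch", "--parsable", *args],
                         capture_output=True, text=True, check=True)
    return out.stdout.strip().split(";")[0]

for replica in range(16):
    sim_id = submit("run_md.sh", f"replica_{replica}")
    # The analysis task starts only if the corresponding simulation completed successfully.
    submit(f"--dependency=afterok:{sim_id}", "analyze_md.sh", f"replica_{replica}")
```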
Docking and virtual screening extend this pattern. Companies may screen millions of candidate molecules against a biological target. Each docking calculation is relatively small but must be repeated many times. This creates a bag of tasks workload. HPC clusters are used as high throughput engines. Success is measured by how many compounds per day can be screened, given a fixed budget. This encourages careful tuning of queue policies, job packing, and use of preemptible or lower priority jobs for less critical tasks.
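The bag of tasks pattern is easy to sketch: rather than launching one scheduler job per molecule, candidates are packed into chunks so each worker amortizes startup cost over many small docking calls. Here dock_one and the ligand identifiers are placeholders for a real docking tool.

```python
# Bag-of-tasks sketch: pack many small docking calculations into larger chunks.
# dock_one and the ligand identifiers are placeholders for a real docking tool.
from concurrent.futures import ProcessPoolExecutor

def dock_one(ligand_id):
    return (ligand_id, 0.0)          # placeholder: run docking, return a score

def dock_chunk(ligand_ids):
    # One worker processes a whole chunk, amortizing per-task startup overhead.
    return [dock_one(lid) for lid in ligand_ids]

ligands = [f"ligand_{i:07d}" for i in range(1_000_000)]
chunk_size = 5_000
chunks = [ligands[i:i + chunk_size] for i in range(0, len(ligands), chunk_size)]

with ProcessPoolExecutor(max_workers=64) as pool:
    scores = [hit for chunk_result in pool.map(dock_chunk, chunks) for hit in chunk_result]

print(f"scored {len(scores)} compounds")
```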
Genomics and medical imaging bring data volumes to the forefront. Genome sequencing pipelines include alignment, variant calling, and annotation. Each step is moderately compute intensive and highly data intensive. Pipelines must process thousands of samples reliably. Healthcare related HPC workflows often run in regulated environments with strict privacy and security requirements. Access control, data encryption, and auditability are central concerns, so clusters run with tighter operational constraints than many research systems.
In this sector, turnaround time impacts clinical and business outcomes. Faster analysis can speed up clinical trials or diagnosis. Yet results must be reproducible and traceable, so companies rely heavily on workflow management systems that formalize each step. Containerization is widely used to ensure that pipelines run identically across clusters or cloud resources, while satisfying regulatory expectations on software validation.
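Container use in these pipelines typically means that every step runs the same pinned image on every cluster or cloud node. As a hedged illustration, the sketch below wraps one hypothetical per-sample step with Apptainer; the image name, the run_alignment command inside it, and the paths are all placeholders.

```python
# Sketch of a containerized per-sample pipeline step.
# pipeline_v1.2.3.sif, run_alignment, and the paths are hypothetical placeholders.
import subprocess
from pathlib import Path

IMAGE = "pipeline_v1.2.3.sif"      # pinned container image, versioned for reproducibility

def align_sample(sample_id):
    fastq = Path("data") / f"{sample_id}.fastq.gz"
    bam = Path("results") / f"{sample_id}.bam"
    bam.parent.mkdir(exist_ok=True)
    # Every node runs exactly the same software stack via the pinned image.
    subprocess.run(
        ["apptainer", "exec", IMAGE, "run_alignment", str(fastq), str(bam)],
        check=True,
    )
    return bam

for sample in ["S0001", "S0002", "S0003"]:
    align_sample(sample)
```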
Media, Entertainment, and Visualization
Media and entertainment companies use HPC for visual effects (VFX), animation, rendering, and post production. The underlying computation is governed not by the field equations of engineering simulation, but by the physics of light and materials as approximated in rendering algorithms.
Feature film rendering is a canonical industrial HPC case study. A single movie can require millions of frames of high resolution images, each composed of multiple layers and passes. Rendering each frame can take minutes to hours, even on powerful hardware. However, frames are independent or loosely coupled. This results in a very large number of embarrassingly parallel jobs. Studios operate render farms that are essentially HPC clusters tailored to this workload.
Schedulers distribute frames or tiles of frames to compute nodes. Jobs tend to be relatively short, so scheduling efficiency and overhead matter. Resource usage is dominated by CPU time and sometimes GPU time, with moderate memory requirements. Parallel filesystems are important, because assets such as textures, geometry, and caches must be accessible at high throughput to all nodes. Techniques like caching, local scratch storage, and data prefetching are used to avoid I/O bottlenecks.
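One common mitigation for this I/O pattern can be sketched simply: copy heavyweight shared assets to node-local scratch once per node, then render that node's assigned frames against the local copy. The paths, the SCRATCH_DIR environment variable, and the render_frame command are hypothetical placeholders.

```python
# Sketch: stage shared assets to node-local scratch, then render assigned frames.
# Paths, the SCRATCH_DIR variable, and render_frame are hypothetical placeholders.
import os
import shutil
import subprocess
from pathlib import Path

shared_assets = Path("/projects/show/assets")            # on the parallel filesystem
local_scratch = Path(os.environ.get("SCRATCH_DIR", "/tmp")) / "assets"

# Stage once per node; later frames reuse the local copy instead of hitting shared storage.
if not local_scratch.exists():
    shutil.copytree(shared_assets, local_scratch)

frames = range(1001, 1021)                               # frame range assigned to this node
for frame in frames:
    subprocess.run(
        ["render_frame", "--assets", str(local_scratch), "--frame", str(frame)],
        check=True,
    )
```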
Turnaround in this industry is defined by production schedules. Deadlines are fixed by release dates and marketing plans. Artists interactively refine scenes, which means there is a mix of interactive preview rendering and offline final quality rendering. Clusters must support both. Priority systems are tuned so that shots on the critical path receive more resources. This is similar in spirit to priority queues for urgent scientific runs, but driven by production schedules rather than proposal reviews.
Visualization and post production for scientific and engineering companies also rely on HPC. For example, rendering large simulation datasets into visual form can require large memory and compute resources. Here, compute nodes with powerful GPUs and large memory are used to interactively explore datasets through remote visualization tools. The goal is often human insight and communication rather than a single numerical metric.
Cross-Cutting Patterns and Lessons
Across these industrial case studies, several cross-cutting patterns emerge that are useful for beginners to recognize.
First, many industrial workloads are not a single monolithic parallel job, but ensembles of related jobs. These can be parameter sweeps, Monte Carlo sets, scenario analyses, or large frame collections. As a result, efficient use of schedulers, job arrays, and workflow tools is central. Good industrial HPC users design pipelines that tolerate machine failures, use checkpoints, and can restart individual tasks without manual intervention.
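A simple way to make individual tasks restartable without manual intervention is to make them idempotent: each task checks for a completion marker and skips itself if its work is already done, so a failed campaign can simply be resubmitted. In the sketch below, run_task is a placeholder for a real simulation or analysis step.

```python
# Idempotent task sketch: completed work is skipped on resubmission, so restarts are cheap.
# run_task is a placeholder for a real simulation or analysis step.
from pathlib import Path

def run_task(task_id, workdir):
    (workdir / "result.dat").write_text(f"output of task {task_id}\n")  # placeholder work

def run_if_needed(task_id):
    workdir = Path(f"task_{task_id:05d}")
    done_marker = workdir / "DONE"
    if done_marker.exists():
        return                       # already completed in a previous submission
    workdir.mkdir(exist_ok=True)
    run_task(task_id, workdir)
    done_marker.touch()              # written last, so partial work is never marked done

for task in range(200):
    run_if_needed(task)
```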
Second, strict nontechnical requirements often dominate design choices. Regulatory compliance, data privacy, intellectual property protection, and integration with corporate IT systems all shape how HPC is operated. Standardized environments, containers, and rigorous software change control are common. The fastest code in a lab may not be acceptable in production unless it can be validated and maintained.
Third, cost and time trade offs are explicit. Many companies track cost per simulation, cost per screened compound, or cost per rendered frame. Decisions about mesh size, job scaling, and use of accelerators are framed in economic terms. Cloud HPC appears when elastic capacity is valuable, but it must be weighed against long term cost, data transfer, and governance.
Finally, success in industrial HPC depends on collaboration between domain experts, software developers, and HPC specialists. Engineers and scientists define models and constraints. Developers implement and maintain codes. HPC teams tune performance, manage clusters, and design workflows. The examples in this chapter show that real world HPC is a socio-technical system, not only a collection of parallel algorithms.
In industry, HPC is valuable only when it is tightly integrated into decision making workflows and delivers reliable, timely, and economically justified results.
These case studies illustrate how the concepts introduced earlier in the course appear in real organizations. As you work on your own projects, you can think in these same terms: which decisions your computations support, what constraints matter, and how to structure your use of HPC resources to maximize useful outcomes.