Classical HPC Meets Quantum Computing
Quantum computing and classical high-performance computing are starting to interact in practical workflows. This chapter focuses not on how quantum computers work in detail, but on how they are combined with clusters, supercomputers, and traditional parallel applications.
Why Quantum Needs HPC
Quantum processors are currently small, noisy, and accessed as remote services. They are powerful only for very specific algorithmic kernels. Everything around those kernels (preparation, error mitigation, orchestration, and analysis) is a classical task that can be computationally intensive. This is where HPC systems come in.
A typical quantum workflow involves several classical steps: formulating the problem and transforming it into a quantum-friendly representation; optimizing and compiling the quantum circuit; choosing parameter values, performing classical preconditioning or approximation, and managing millions of circuit executions for sampling. Afterwards, the results must be analyzed, processed, and fed into a larger simulation or optimization loop. Each of these steps can benefit from parallel processing on CPUs, GPUs, or other accelerators.
The result is a hybrid model, often described as quantum-classical or HPC-quantum integration. The quantum hardware acts as a specialized accelerator that is tightly coupled with a large amount of classical compute power and memory.
Quantum as a Specialized Accelerator in HPC
Conceptually, one can think of a quantum processing unit, often called a QPU, as another type of accelerator in a heterogeneous system. However, there are important differences compared to GPUs or other classical accelerators.
QPU calls have high latency because they are usually accessed over a network, and the device may be shared among many users. The number of logical operations a QPU can perform in a single circuit is limited by decoherence and noise, and error correction is not yet widely available. Quantum operations are non-deterministic, so one usually needs to execute the same circuit many times to estimate probabilities or expectation values. Access is often through cloud APIs with queueing and time-sliced usage, not through direct PCIe attachment or tightly coupled on-node devices, although some early on-premise integration experiments exist.
For an HPC user, this means that the QPU is best treated as a remote, scarce, and stochastic accelerator. The host HPC system must hide latency through asynchronous calls and batch submission of circuits, manage queueing and scheduling, and aggregate results while overlapping other work.
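As a concrete illustration of latency hiding, the sketch below submits a batch of circuits asynchronously from a thread pool while the host continues with classical work. It is a minimal sketch: `submit_circuit` and `classical_work` are hypothetical placeholders for whatever a given provider's client library and the application's pre-processing actually look like.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import time

def submit_circuit(circuit_id, shots):
    """Hypothetical stand-in for a provider's blocking QPU call.
    A real client would serialize a circuit and poll a remote queue."""
    time.sleep(0.5)  # emulate network and queue latency
    return {"circuit": circuit_id, "counts": {"0": shots // 2, "1": shots - shots // 2}}

def classical_work():
    """Placeholder for pre/post-processing that overlaps the QPU calls."""
    return sum(i * i for i in range(10_000))

# Submit a batch of circuits without blocking, overlap classical work,
# then aggregate the results as they arrive.
with ThreadPoolExecutor(max_workers=8) as pool:
    futures = [pool.submit(submit_circuit, cid, shots=1024) for cid in range(32)]
    partial = classical_work()  # runs while circuits sit in the remote queue
    results = [f.result() for f in as_completed(futures)]

print(f"collected {len(results)} results; classical work value {partial}")
```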
In current practice, quantum devices are not replacements for HPC systems. They act as remote, specialized accelerators that are orchestrated by powerful classical HPC resources.
Hybrid Quantum-Classical Algorithms in HPC Workflows
Many of the first practical quantum use cases are hybrid algorithms. These combine a quantum kernel with a classical optimization or simulation loop. The quantum part usually evaluates a cost, an energy, or a probability, while the classical part updates parameters and handles most of the heavy computation and storage.
A simple abstract pattern is as follows. Start with classical input data and an initial guess for parameters. Use the HPC system to encode data, simulate or precondition it, and construct a parameterized quantum circuit. Submit the circuit to the QPU, possibly many times, to obtain measurement statistics or expectation values. Transfer the results back to the HPC environment. Use a classical optimizer, often running in parallel and possibly on GPUs, to update parameters or refine the problem representation. Decide on new circuits or new problem instances and repeat the cycle until a stopping criterion is satisfied. Finally, post-process the results using standard HPC techniques and integrate them into downstream simulation, visualization, or analysis stages.
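The following sketch makes this loop concrete. It is a self-contained illustration only: `evaluate_on_qpu` is a noisy stand-in for a real device call (the "true" cost is simply $\cos\theta$, with shot noise emulated by sampling), and the optimizer is a plain finite-difference gradient step rather than anything production-grade.

```python
import random
import math

def evaluate_on_qpu(theta, shots=1024):
    """Stand-in for a parameterized-circuit expectation value.
    Shot noise is emulated by sampling, as a real QPU estimate would be."""
    p = (1 + math.cos(theta)) / 2          # probability of measuring |0>
    ones = sum(random.random() > p for _ in range(shots))
    return 1 - 2 * ones / shots            # estimated <Z>, in [-1, 1]

theta, step, eps = 2.0, 0.4, 0.1
for it in range(50):                        # classical optimization loop
    grad = (evaluate_on_qpu(theta + eps) - evaluate_on_qpu(theta - eps)) / (2 * eps)
    theta -= step * grad                    # parameter update on the host
print(f"final theta ~ {theta:.3f}, cost ~ {evaluate_on_qpu(theta):.3f}")
```

Even in this toy version, the structure is visible: each iteration spends a few cheap operations on the classical side and thousands of shots on the quantum side, and all decisions about convergence live in the classical loop.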
Variational algorithms, such as variational quantum eigensolvers in chemistry or quantum approximate optimization methods in combinatorial optimization, are standard examples of this pattern. The classical side runs an iterative optimization over parameters, which can be computationally heavy and parallelizable. The quantum side provides specific evaluations that would be hard or impossible to compute efficiently by purely classical means, at least in principle.
From the HPC perspective, the important aspect is that total runtime and resource consumption are dominated by the classical steps in most current scenarios. This includes running large numbers of trial circuits, performing classical simulations for benchmarking and debugging, and processing noisy measurements. This balance may shift as quantum hardware matures, but hybrid control and orchestration on HPC systems will remain central.
Quantum Circuit Simulation on Supercomputers
At present, one of the largest uses of HPC in the quantum space is the simulation of quantum circuits on classical hardware. The memory needed to store a quantum state vector grows exponentially with the number of qubits: an ideal pure state of $n$ qubits needs $2^n$ complex amplitudes. If each amplitude uses 16 bytes, the total memory is
$$
M = 16 \times 2^n \text{ bytes}.
$$
This growth rapidly exceeds the capacity of individual machines. HPC clusters with large aggregate memory and fast interconnects are therefore essential for simulating circuits with many qubits.
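A few lines of Python make the scaling vivid; the numbers follow directly from the formula above, with 16 bytes per amplitude corresponding to double-precision real and imaginary parts.

```python
# Memory needed for a full state vector of n qubits at 16 bytes per amplitude.
for n in (30, 40, 45, 50):
    m = 16 * 2**n
    print(f"{n:2d} qubits: {m / 2**30:12.1f} GiB  ({m / 2**50:10.6f} PiB)")
```

At 30 qubits the state fits in 16 GiB, a single workstation; at 50 qubits it already needs 16 PiB, more aggregate memory than even the largest current systems provide.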
Distributed simulators spread the state vector across many nodes and apply quantum gates using MPI combined with multithreading or GPUs. Communication patterns are designed to minimize data movement over the network, and careful layout is needed to align qubit operations with local memory regions whenever possible. These simulations are used for algorithm prototyping, verification of quantum hardware, and exploration of the limits of classical simulation.
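To see why layout matters, consider applying a single-qubit gate to a dense state vector. A gate on qubit $k$ pairs amplitudes whose indices differ only in bit $k$; a minimal single-node numpy sketch (no MPI distribution) looks like this:

```python
import numpy as np

def apply_single_qubit_gate(state, gate, k, n):
    """Apply a 2x2 gate to qubit k of an n-qubit state vector.
    Reshaping exposes bit k as its own axis, so the gate becomes
    a small matrix product over that axis."""
    psi = state.reshape((2**(n - k - 1), 2, 2**k))   # axis 1 is qubit k
    psi = np.einsum('ab,ibj->iaj', gate, psi)
    return psi.reshape(-1)

n = 20                                # 2^20 amplitudes, 16 MiB at complex128
state = np.zeros(2**n, dtype=np.complex128)
state[0] = 1.0                        # |00...0>
hadamard = np.array([[1, 1], [1, -1]], dtype=np.complex128) / np.sqrt(2)
state = apply_single_qubit_gate(state, hadamard, k=0, n=n)
print(np.nonzero(state)[0], np.round(state[:2], 3))
```

In a distributed simulator, the same pairing determines communication: gates on low-order qubits touch only node-local memory, while gates on the highest-order qubits pair amplitudes that live on different nodes and force network exchange.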
There are different simulation approaches. State vector simulators track the full quantum state and can reproduce exact results for moderate numbers of qubits. Tensor network simulators exploit structure and low entanglement to compress the state and simulate larger systems in some cases. Stabilizer and quasi-probability methods focus on specific classes of circuits. All these approaches share an HPC flavor, since they require careful parallelization, memory management, and performance tuning.
For HPC practitioners, quantum simulation codes are interesting workloads because they stress multiple aspects of the system. They often combine MPI for distributed memory, OpenMP or CUDA for node-level parallelism, heavy use of memory bandwidth, and careful scheduling of collective communication. Some supercomputing centers maintain dedicated allocations and software stacks for such simulations.
HPC-Quantum Integration Architectures
There are several emerging architectural patterns for integrating HPC and quantum resources. These differ in the physical location of the QPUs, the network connections, and the software layers that glue them together.
A common pattern is loose coupling through quantum cloud services. In this case, the HPC job runs on a standard cluster or supercomputer and communicates with provider APIs over wide area networks. The quantum backends may reside in different data centers. This model is relatively easy to adopt and mirrors the way many organizations already use public cloud resources, but it adds significant network latency and variable queue times.
An intermediate pattern is tight coupling within the same data center. Here, the QPUs are housed in the same facility as the supercomputer, connected with low-latency, high-bandwidth networks and integrated into the center's security, scheduling, and monitoring infrastructure. Jobs running on the supercomputer can reach the QPUs with less overhead and more predictable performance.
The most integrated pattern envisions QPUs as first-class devices managed by the HPC resource manager. In such models, the scheduler can allocate both classical nodes and quantum devices to a single job, apply fair-share policies, and enforce accounting and quotas. Workflows can request specific numbers of QPU shots or time windows, and pre- and post-processing can run next to the quantum control systems. This kind of integration is experimental, but several centers are exploring it.
In all architectures, data management and latency hiding are key concerns. Since measurement results are classical and small compared to typical HPC data volumes, the main challenge is not moving large datasets to and from QPUs, but coordinating many small, latency-sensitive calls within large parallel jobs. This often leads to designs where the actual quantum API calls are issued from a small subset of processes, while the rest of the job focuses on parallel simulation, optimization, or analysis with occasional synchronization points.
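A common embodiment of this design, sketched below with mpi4py, is to let rank 0 act as the sole gateway to the quantum service while the other ranks stay busy with classical work; `call_quantum_service` is a hypothetical placeholder for the actual provider API.

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

def call_quantum_service(tasks):
    """Hypothetical placeholder for the provider API call;
    only one process talks to the remote service."""
    return [{"task": t, "value": 0.0} for t in tasks]

# Every rank produces its own quantum tasks from local work.
local_tasks = [f"circuit_{rank}_{i}" for i in range(4)]

# Gather all tasks on rank 0, issue one batched remote call there,
# then scatter the results back so each rank post-processes locally.
all_tasks = comm.gather(local_tasks, root=0)
if rank == 0:
    flat = [t for group in all_tasks for t in group]
    results = call_quantum_service(flat)
    chunks = [results[i * 4:(i + 1) * 4] for i in range(comm.Get_size())]
else:
    chunks = None
local_results = comm.scatter(chunks, root=0)
print(f"rank {rank} received {len(local_results)} results")
```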
Programming Models and Software Stacks for Integrated Workflows
Quantum programming environments are evolving toward the same kind of layered stacks that exist in classical HPC. At the top are problem-domain libraries, for example chemistry or optimization frameworks that hide quantum and classical details. Beneath them are quantum SDKs that define circuits, gate sets, measurement operations, and backends. At the bottom sit device-specific compilers and control stacks.
To integrate with HPC, these SDKs must fit into existing workflows and tooling. This typically means that they provide Python or C++ libraries that can be called from MPI-enabled applications, support asynchronous submission of batches of circuits, and can serialize and deserialize circuits efficiently. They are often installed as part of the HPC software stack, sometimes with environment modules, and configured to interact with external or co-located quantum backends.
For developers used to MPI and OpenMP, hybrid programming patterns start to look familiar. One can use MPI ranks to distribute problem instances, parameter vectors, or circuit templates. Each rank can prepare its own quantum tasks and either simulate them locally using classical simulators, or submit them to real devices. Within each rank, threads or GPUs can accelerate classical pre-processing and result analysis. The quantum backend is accessed through well-defined client libraries, much as one would call linear algebra libraries.
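A minimal sketch of this pattern, again with mpi4py, distributes a parameter sweep across ranks and lets each rank evaluate its slice independently. The local evaluation reuses the classical $\cos\theta$ stand-in from the variational sketch earlier in the chapter; in production it would be a simulator or device submission.

```python
from mpi4py import MPI
import math
import random

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

def simulate_locally(theta, shots=2048):
    """Local classical stand-in for a circuit evaluation."""
    p = (1 + math.cos(theta)) / 2
    ones = sum(random.random() > p for _ in range(shots))
    return 1 - 2 * ones / shots

# Rank 0 prepares the sweep and scatters one slice of parameters per rank.
params = [[i * 0.1 + r for i in range(8)] for r in range(size)] if rank == 0 else None
my_params = comm.scatter(params, root=0)

# Each rank evaluates its slice independently, then results are gathered.
my_values = [simulate_locally(t) for t in my_params]
all_values = comm.gather(my_values, root=0)
if rank == 0:
    print(f"collected {sum(len(v) for v in all_values)} evaluations")
```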
Some quantum SDKs also support local or remote state-vector and tensor network simulators that are themselves parallel. When deployed on clusters, these simulators may use MPI internally, which raises integration questions. A common arrangement is to have a high-level driver program that uses MPI to distribute work across many nodes, while each node runs a standalone simulator instance that uses multithreading or GPU offloading only. More complex setups, where the simulator uses MPI across nodes, require careful planning of communicator hierarchies and resource allocation.
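For the more complex setups just mentioned, MPI's communicator-splitting machinery is the natural tool. The sketch below, a minimal illustration rather than a recommended production layout, carves MPI_COMM_WORLD into node-local sub-communicators so that one driver process per node could launch a threaded or GPU-offloaded simulator instance, while the drivers talk to each other through their own communicator.

```python
from mpi4py import MPI

world = MPI.COMM_WORLD

# Group processes that share a node into one communicator each.
node_comm = world.Split_type(MPI.COMM_TYPE_SHARED)

# Pick one driver per node (local rank 0) and connect the drivers
# in their own communicator; other ranks receive MPI.COMM_NULL.
is_driver = node_comm.Get_rank() == 0
driver_comm = world.Split(0 if is_driver else MPI.UNDEFINED, world.Get_rank())

if is_driver:
    # Here the per-node simulator instance would be launched, using
    # node_comm.Get_size() local processes or the node's GPUs.
    print(f"driver {driver_comm.Get_rank()} of {driver_comm.Get_size()}")
```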
In integrated workflows, quantum libraries are just another part of the software stack, and must coexist with MPI, OpenMP, GPU frameworks, and filesystem conventions used on the cluster.
Scheduling, Resource Management, and Workflows
Integrating quantum resources into the scheduling and resource management model of HPC centers is an active area of development. Traditional schedulers such as SLURM or PBS were designed for CPU and GPU nodes. Extending them to support QPUs involves both technical and policy questions.
One simple approach is external scheduling. The HPC job runs normally, and quantum calls are routed to a separate cloud scheduler controlled by the quantum provider. From the HPC center's perspective, QPU usage is invisible. This is easy to implement but makes it hard to reason about total turnaround time or enforce integrated quotas.
A more integrated approach is to treat QPUs as generic, countable resources. In SLURM, for example, they can be modeled as generic resources that jobs can request. When a user submits a hybrid job, they specify both the number of nodes and the amount of QPU time or number of shots. The scheduler ensures that both classical and quantum resources are available before starting the job. The job itself then communicates with the local quantum control layer rather than with an external cloud.
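As a hypothetical illustration, a batch script for such a setup might look like the following. The `qpu` generic resource name and the endpoint variable are invented for the example; actual resource names and hand-off mechanisms depend entirely on how a given center configures its scheduler.

```bash
#!/bin/bash
#SBATCH --job-name=hybrid-vqe
#SBATCH --nodes=4                 # classical side of the job
#SBATCH --time=02:00:00
#SBATCH --gres=qpu:1              # hypothetical QPU generic resource

# Launch the classical driver; it talks to the local quantum control
# layer rather than an external cloud. Both the environment variable
# and the driver's flag are illustrative, site-defined names.
srun python hybrid_driver.py --qpu-endpoint "$SLURM_QPU_ENDPOINT"
```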
Workflow systems such as CWL, Snakemake, or domain-specific orchestration tools are also being extended to support tasks that involve quantum computations. In large workflows, not all steps require quantum resources. Some runs may use only classical simulators for testing or parameter exploration. The orchestration system therefore must be able to switch between simulator backends running on the cluster and real devices, potentially based on runtime decisions or current queue lengths.
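In a Python-driven workflow, such a runtime switch can be as simple as a small factory function. The backend names and the `QPU_AVAILABLE` signal below are purely illustrative; a real orchestrator might instead query queue lengths or calibration data.

```python
import os

def select_backend(stage):
    """Choose a simulator or hardware backend per workflow stage.
    Names and the QPU_AVAILABLE signal are illustrative only."""
    if stage in ("test", "parameter_scan"):
        return "local_statevector_simulator"
    if os.environ.get("QPU_AVAILABLE") == "1":
        return "on_prem_qpu"
    return "cluster_simulator"       # fallback when no device is free

for stage in ("test", "parameter_scan", "production"):
    print(stage, "->", select_backend(stage))
```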
For users, this means that hybrid HPC-quantum jobs will often be described as workflows with multiple stages. One stage might generate circuit batches or parameter sets. Another stage distributes these tasks to parallel workers. Some tasks run on classical nodes, some make remote QPU calls, and some perform post-processing and checkpointing. Understanding job dependencies, resource requests, and potential bottlenecks becomes essential.
Challenges of Integrating Quantum and HPC
While the idea of using quantum resources from HPC systems is attractive, there are significant challenges that beginners should be aware of.
One challenge is performance unpredictability. Queue wait times on shared quantum services, variable noise levels, and occasional calibration periods can make runtime hard to predict. This complicates job planning, especially for large ensemble runs that combine many quantum calls with time-limited allocations on supercomputers.
Another challenge is error handling and reproducibility. Quantum devices are inherently noisy, and device calibrations change over time. Two runs of the same workflow can produce slightly different measurement statistics. From an HPC perspective, this contrasts with the usual expectation of bitwise reproducibility or at least stable numerical behavior. Hybrid workflows must incorporate error bounds, confidence intervals, and statistical checks, and must record device calibration metadata to allow meaningful comparisons.
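As a concrete example of such a statistical check, the snippet below turns raw shot counts into an expectation value with a standard-error estimate, so that two runs can be compared against their uncertainties rather than expected to match bitwise. It assumes single-qubit measurements in the computational basis; the counts shown are made-up illustrative values.

```python
import math

def z_expectation_with_error(counts):
    """Estimate <Z> and its standard error from measurement counts.
    counts maps bitstrings '0'/'1' to how often each was observed."""
    shots = counts.get("0", 0) + counts.get("1", 0)
    p0 = counts.get("0", 0) / shots
    expval = 2 * p0 - 1                            # <Z> = p0 - p1
    stderr = 2 * math.sqrt(p0 * (1 - p0) / shots)  # binomial error, scaled
    return expval, stderr

run_a = {"0": 532, "1": 492}   # illustrative counts from two runs
run_b = {"0": 518, "1": 506}
for name, counts in (("run A", run_a), ("run B", run_b)):
    ev, se = z_expectation_with_error(counts)
    print(f"{name}: <Z> = {ev:+.3f} +/- {se:.3f}")
```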
Data localization and security also raise integration questions. Many science and industry applications involve sensitive data that cannot easily leave institutional boundaries. If quantum services are only available through public cloud, one must design anonymization or encoding strategies that do not leak sensitive information, or prioritize on-premise QPU installations. Compliance and export control rules can further constrain where and how quantum resources are used.
Finally, there is a software maturity gap. Classical HPC software ecosystems have decades of optimization and standardization. Quantum SDKs are younger, with rapidly changing APIs, backends, and performance characteristics. Integrating these into stable production workflows requires careful version management, containerization, and testing strategies.
Integrated HPC-quantum workflows must handle unpredictable performance, noisy results, changing calibrations, and evolving software, all while fitting into existing scheduling and data governance models.
Outlook for Quantum-HPC Co-design
As both HPC and quantum systems move toward exascale and beyond, co-design of hardware and software for joint use is becoming a research topic. Co-design means that architectures, algorithms, compilers, and applications are developed together rather than in isolation.
Future supercomputers may include dedicated quantum accelerators in the same facility, with network fabrics and control stacks tuned for low-latency quantum-classical interaction. Compilers may automatically partition portions of an application into quantum kernels, based on high-level directives or annotations, similar to how OpenACC or other directive-based models handle GPUs. Runtime systems may schedule quantum calls dynamically based on current load, noise levels, and application priorities.
For application domains such as materials science, cryptography, machine learning, and optimization, long-term roadmaps already consider which parts of the workload might benefit from quantum acceleration. This leads to algorithmic studies where classical HPC and quantum approaches are compared not in isolation, but as components of a combined workflow. The goal is to estimate whole pipeline speedups and energy usage, not just improvements in isolated kernels.
In this co-design view, HPC is not replaced by quantum computing. Instead, quantum becomes another element in a diverse ecosystem that includes CPUs, GPUs, specialized ASICs, FPGAs, and near-memory accelerators. Understanding how to orchestrate these heterogeneous resources, measure their combined performance, and design scalable workflows will be a key skill for future HPC practitioners.