Kahibaro

Intel oneAPI

Overview of Intel oneAPI in HPC

Intel oneAPI is Intel’s unified programming and tool framework that replaces the older Intel Parallel Studio and classic Intel Compiler Suite. In an HPC context, you mostly meet it as a family of compilers and performance tools that target Intel CPUs and GPUs with a single, consistent toolchain. It is designed to support C, C++, Fortran, and data-parallel offload, and to work well with common build systems used on clusters.

For this chapter, we focus on what you need to know as an HPC beginner who will compile and run codes on systems that provide Intel oneAPI.

Components Relevant to HPC Users

From the full oneAPI ecosystem, only a few parts are immediately important for compiling and running scientific codes.

The central component is the Intel oneAPI DPC++/C++ Compiler, typically accessed as icx and icpx for C and C++. These compilers are based on LLVM and replace the older icc and icpc. On many clusters, you will still find both the classic and the new compilers available, but the oneAPI compilers are the long-term option.

For Fortran, you use the Intel Fortran Compiler Classic (ifort) or the newer Intel Fortran Compiler (ifx). Both are often shipped as part of the oneAPI HPC Toolkit. Classic ifort has been widely used for legacy and production codes, while ifx is the new LLVM-based Fortran compiler that is gradually reaching full language and feature parity.

There are additional oneAPI tools such as math and MPI libraries, profilers, and GPU offload tools. These are important to HPC applications, but they are covered in other chapters. Here you only need to understand that the compilers can automatically link against Intel’s optimized libraries when used with the right options and modules.

Loading Intel oneAPI on HPC Systems

On clusters, Intel oneAPI is almost always provided through environment modules. The specific module names differ between systems but often include strings like intel, intel-oneapi, or intel-compilers.

You typically start with something like:

module avail intel
module load intel-oneapi

or

module load intel/2024.0

Once the module is loaded, your PATH and related environment variables are updated so that commands like icx, icpx, ifx, and ifort are available and pick up Intel’s libraries by default.
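A quick way to confirm that the environment is set up is to check where each compiler driver resolves on your PATH. This sketch runs anywhere; on a machine without oneAPI loaded, each tool simply reports as missing:

```shell
# List where each Intel compiler driver resolves, or note if it is missing.
for tool in icx icpx ifx ifort; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool -> $(command -v "$tool")"
  else
    echo "$tool -> not found (did you run 'module load'?)"
  fi
done > toolcheck.txt
cat toolcheck.txt
```

If a tool is reported as missing after loading the module, compare your loaded modules (`module list`) against the site documentation.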

Some systems also provide separate modules for components such as intel-oneapi-compilers, intel-oneapi-mkl, or intel-oneapi-mpi. When building codes, make sure that your loaded modules match what your build system expects. A mismatch between compiler and MPI modules often leads to link errors.

Always document the exact Intel oneAPI module name and version used to build and run an application, for example in a README or job script comment.
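One lightweight way to do this is to write a small provenance file at build time. The module name and flags below are placeholders for whatever your site actually provides:

```shell
# Record which toolchain produced this binary; adjust the module string
# to match the output of `module list` on your system.
{
  echo "built:  $(date -u +%Y-%m-%dT%H:%M:%SZ)"
  echo "module: intel-oneapi/2024.0"
  echo "flags:  -O3 -xHost"
} > BUILD_INFO.txt
cat BUILD_INFO.txt
```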

Intel C and C++ Compilers in oneAPI

The oneAPI C and C++ compilers are accessed through several command names.

icx is the C compiler front end for Intel oneAPI, and icpx is the C++ front end. These are the recommended compilers for new development and are compatible in many respects with clang. You use them much like gcc or clang:

icx -O3 -march=native -o mycode mycode.c
icpx -O2 -g -o mycode_cpp mycode.cpp

On systems that still provide the classic compilers, icc and icpc are the older C and C++ compilers. Many legacy build systems expect these names. If your system has both, read the local documentation to see which is recommended. Intel has deprecated icc and icpc, and recent oneAPI releases no longer include them, so expect icx and icpx to be the only option going forward.

The compilers aim to be drop-in replacements for other command-line compilers on UNIX-like systems. Most flags you know from GCC, such as -c, -o, and -I, behave the same way. Optimizations and diagnostics, however, often use Intel-specific flags that you should become familiar with.

Intel Fortran Compilers

Fortran remains central to many HPC codes. Intel provides two main Fortran compiler commands in the oneAPI environment.

ifort is the classic Intel Fortran Compiler. It is widely used for large, existing Fortran codebases and supports a broad range of Fortran standards and vendor extensions. Many production climate, CFD, and physics codes rely on it, although Intel has deprecated ifort in favor of ifx in recent oneAPI releases.

ifx is the newer Fortran compiler based on the same LLVM infrastructure as icx and icpx. It aims for source compatibility with ifort, but there are still differences and some features are being added incrementally. On many systems, ifort will still be the default choice for established codes, while ifx is recommended for new projects or for gradual evaluation.

You compile Fortran code in the usual manner:

ifort -O3 -xHost -o solver solver.f90
ifx -O3 -o solver_new solver.f90

If you switch between ifort and ifx, keep in mind that numerical results and performance can differ slightly due to different optimization engines and math library interactions. In sensitive simulations, validate the new compiler’s results.

Optimization Flags and Tuning for Intel Hardware

In HPC, Intel compilers are used primarily for their optimization capabilities on Intel processors and, increasingly, on Intel GPUs.

At a basic level, you control optimization with the -O family of flags, such as -O0, -O1, -O2, and -O3. -O0 turns off optimizations and is useful for debugging. -O2 is a good general setting. -O3 enables more aggressive optimizations that can improve performance but sometimes increase compile time or change numerical behavior slightly.

A common Intel-specific option is -xHost, which tells the compiler to generate instructions tuned for the CPU you are compiling on. This can produce faster binaries but limits portability if you later run the program on older processors that lack those instructions.

Another approach is to specify exactly which instruction set to use, with flags like -xAVX2 or -xCORE-AVX512. These options tell the compiler which vector instruction set to target and can significantly influence performance on modern CPUs.

For some codes, you can also experiment with -ipo, which enables interprocedural optimization across multiple source files. This can increase performance at the cost of longer compilation and more complex linking.

When using aggressive optimization flags such as -O3, -ipo, or architecture-specific -x... options, always verify numerical correctness with test cases before trusting performance results.
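A minimal correctness check can be as simple as diffing the output of an aggressively optimized build against a conservative reference build. The sketch below fakes the two result files so the comparison logic is runnable anywhere; in practice they would come from your -O2 and -O3 binaries:

```shell
# Stand-ins for the outputs of a reference (-O2) run and an optimized
# (-O3 -xHost) run; replace these with your real program output.
printf '1.000000\n2.000000\n' > result_ref.txt
printf '1.000000\n2.000000\n' > result_opt.txt

if diff -q result_ref.txt result_opt.txt >/dev/null; then
  echo "PASS: optimized build matches reference"
else
  # For floating-point sensitive codes, a stricter math model
  # (e.g. -fp-model precise) may be worth testing.
  echo "FAIL: results differ between builds"
fi
```

For floating-point results, an exact diff may be too strict; a tolerance-based comparison is often more appropriate.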

Vectorization and Reports

Intel oneAPI compilers are well known for their vectorization capabilities. They attempt to use SIMD instructions to operate on multiple data elements in parallel. The actual vectorization behavior is not always obvious from the source, so you should use compiler reports.

Typical options include:

icx -O3 -qopt-report=3 -o mycode mycode.c
ifort -O2 -qopt-report=5 -qopt-report-phase=vec -o solver solver.f90

These options generate detailed text reports describing which loops were vectorized and why some loops were not. The exact option naming can vary slightly between compiler generations, so read the installed compiler’s documentation or man pages on your system.

You can then inspect the loop reports to see, for example, that a loop was not vectorized due to assumed data dependencies, function calls inside the loop, or unsupported constructs. This information is essential for targeted performance improvements.
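Once reports are generated, a simple grep pass can summarize the problem loops. This assumes the classic compilers' *.optrpt report files; newer icx releases may place reports elsewhere, so check your installed compiler's documentation:

```shell
# Collect all "not vectorized" remarks from any .optrpt files in the
# current directory (produces an empty file if none exist).
grep -s -i -n "not vectorized" ./*.optrpt > vec_misses.txt || true
echo "loops flagged: $(wc -l < vec_misses.txt)"
```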

Treat vectorization reports as your primary insight into how Intel compilers actually exploit SIMD. They show which loops are vectorized and which remain scalar, which is critical for performance-oriented HPC code.

Debugging, Warnings, and Strictness

Even as you focus on performance, you should use compiler warnings and debug information to catch errors early.

With Intel oneAPI compilers, you can enable debug information using -g, and you can often combine -g with moderate optimization, such as -O2, when you need to debug performance-sensitive code.

You can increase warning levels using GCC-compatible options such as -Wall and -Wextra in C and C++. These are supported by icx and icpx and help catch common mistakes, unused variables, and suspicious constructs. For Fortran, refer to your compiler’s documentation for the recommended warning set.

When debugging difficult runtime issues, you may want to temporarily disable optimizations with -O0. At this level, variable lifetimes and code structure map more closely to the original source, which simplifies the use of debuggers.

Integration with MPI and Math Libraries

On most HPC systems, Intel compilers are combined with optimized MPI implementations and math libraries. The details of MPI usage are covered in other chapters, but a few points matter for Intel oneAPI integration.

If your system has Intel MPI as part of oneAPI, you will usually compile with wrapper commands such as mpiicc, mpiicpc, mpiifort, or newer wrappers that target icx and ifx. These wrappers automatically pass the right include paths and libraries to the underlying Intel compilers.
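As a sketch, an MPI build and launch might look like the following. The wrapper and launcher names (mpiicc, mpirun) and the Slurm directives are typical but site-dependent, so the example is written out as a job script rather than executed directly:

```shell
# Generate an example batch script; scheduler directives and module
# names are illustrative and must be adapted to your site.
cat > run_mpi.sh <<'EOF'
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4

module load intel-oneapi

mpiicc -O2 -o hello_mpi hello_mpi.c
mpirun -np "$SLURM_NTASKS" ./hello_mpi
EOF
chmod +x run_mpi.sh
```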

For linear algebra and related operations, the Intel oneAPI Math Kernel Library (MKL) provides tuned routines. Linking to MKL manually can be complex, so prefer the documented link-line helpers or environment variables that come with the oneAPI MKL module. Some clusters provide convenience variables or wrapper scripts that simplify MKL usage.
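For example, recent oneAPI compilers accept a convenience flag that handles MKL include paths and libraries. The fragment below assumes -qmkl (classic icc/ifort spelled it -mkl); your site's documented link line should take precedence:

```shell
# Write a Makefile fragment that enables MKL through the compiler driver.
# Assumption: the installed compiler supports -qmkl; verify locally.
printf 'CC      = icx\nCFLAGS  = -O2 -qmkl\nLDFLAGS = -qmkl\n' > mkl.mk
cat mkl.mk
```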

Always check your site-specific documentation, since HPC centers often provide recommended compiler and library combinations to ensure both correctness and performance on their hardware.

Using Intel oneAPI with Make and CMake

In practical HPC work, you rarely call compilers manually. Instead, you rely on build systems such as Make and CMake, which are covered elsewhere. Here we focus only on how Intel oneAPI fits into these tools.

With Make-based builds, you typically set the CC, CXX, and FC variables to Intel compiler commands before running make. For example:

export CC=icx
export CXX=icpx
export FC=ifx
make

Some legacy projects may still expect icc, icpc, and ifort. In that case, either adjust the Makefile to use the newer compilers or load a compatibility module that provides the classic compiler names.
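Rather than editing the Makefile, you can usually override the compiler on the make command line. The toy Makefile below just prints which compiler it would use, so the mechanism can be demonstrated without Intel tools installed:

```shell
# Create a minimal Makefile whose default compiler is gcc...
printf 'CC = gcc\nshow:\n\t@echo "compiler: $(CC)"\n' > Makefile.demo

# ...then override CC at invocation time, as you would with icx.
make -f Makefile.demo show CC=icx
```

This prints `compiler: icx`, showing that the command-line assignment takes precedence over the value in the Makefile.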

With CMake, you specify the compilers at configure time, for example:

cmake -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DCMAKE_Fortran_COMPILER=ifx ..

CMake then configures the build for the Intel compilers and adjusts flags accordingly. On many clusters, site-specific CMake toolchain files exist that preconfigure Intel oneAPI integration. If those are available, prefer them, because they encode local best practices.

When switching from one compiler family to another, such as GCC to Intel oneAPI, always clean your build directory or rebuild from scratch to avoid inconsistent object files and link errors.
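A safe pattern is to configure each compiler family in its own fresh build directory (names here are arbitrary). In this sketch, the cmake step is allowed to fail gracefully so the commands run even where CMake or the project sources are absent:

```shell
# Fresh out-of-source build tree for the Intel toolchain.
rm -rf build-intel
mkdir build-intel
(cd build-intel && cmake -DCMAKE_C_COMPILER=icx \
      -DCMAKE_CXX_COMPILER=icpx \
      -DCMAKE_Fortran_COMPILER=ifx ..) \
  || echo "configure failed; check loaded modules and source path"
```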

Typical Usage Patterns on Clusters

As a beginner, you will most often use Intel oneAPI in a few recurring scenarios on an HPC cluster.

You log in, load the Intel compiler and possibly Intel MPI and MKL modules, configure your build system to use icx, icpx, and ifx or ifort, and then run make or cmake to compile your application.

You choose an optimization level, usually -O2 or -O3, and perhaps an architecture-specific flag such as -xHost. You then produce one build intended for debugging, for example with -O0 -g, and another intended for production performance, with higher optimization and possibly interprocedural optimization and vectorization reports enabled.
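The two-build pattern can be captured in a small script so both variants are rebuilt the same way every time (file names and flags below are illustrative):

```shell
# Write a helper script that builds a debug and a production binary
# from the same source; adapt names and flags to your project.
cat > build_both.sh <<'EOF'
#!/bin/sh
set -e
icx -O0 -g     -o mycode_dbg mycode.c   # debugging build
icx -O3 -xHost -o mycode_opt mycode.c   # production build
EOF
chmod +x build_both.sh
```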

Once built, you run the application on the cluster using job scripts and the scheduler. If performance is not satisfactory, you look at compiler vectorization reports, adjust flags, or consult profiling tools, many of which integrate directly with Intel built binaries.

By understanding how Intel oneAPI compilers fit into this process, you can build efficient codes on systems where Intel hardware and toolchains are dominant, and you can work with existing HPC software stacks that have been tuned over many years around these tools.
