Overview of LLVM in HPC
LLVM is a modular compiler infrastructure rather than a single compiler. In HPC, you most often meet it through its compiler frontends (Clang for C and C++, Flang for Fortran) and through libraries that many tools build on.
Key ideas relevant to HPC:
- Modern, standards-conforming frontends (clang, clang++, flang)
- A rich optimization pipeline working on an intermediate representation (LLVM IR)
- Good diagnostics and tooling (static analysis, sanitizers, code formatters)
- Backends that target many CPU architectures (x86_64, ARM, POWER, etc.) and some accelerators
On many clusters, LLVM-based compilers coexist with GCC and vendor compilers. Choosing between them is often about language support, optimization quality for your target, and available tooling.
LLVM Toolchain Components You’ll See on Clusters
For HPC work you typically interact with:
- clang – C compiler
- clang++ – C++ compiler
- flang or flang-new – Fortran compiler (status can be cluster-dependent)
- llvm-ar, llvm-ranlib, llvm-nm – binary utilities
- opt – optimization passes at IR level (advanced use)
- clang-tidy, clang-format – static analysis and formatting (mainly for C/C++)
Compilers are usually provided through environment modules, e.g.:
module avail llvm
module load llvm/16.0.6
Exact names and versions depend on the site.
Basic Usage Patterns for Clang/LLVM
Invoking the compiler
Typical invocations resemble GCC:
# C
clang -O3 -march=native -o mycode mycode.c
# C++
clang++ -O3 -std=c++20 -march=native -o mycode mycode.cpp
# Fortran (if flang is available)
flang -O3 -o mycode mycode.f90
Common flags map closely to GCC for portability on clusters:
- -c – compile only (produce an object file)
- -o – output file name
- -I, -L, -l – include path, library path, and linking flags
- -std= – language standard selection
- -O0, -O1, -O2, -O3, -Ofast – optimization levels
You usually use these inside build systems like Make or CMake rather than by hand for large projects.
Optimization Levels and HPC Considerations
Clang supports familiar optimization flags:
- -O0: no optimization, best for debugging
- -O2: good default; enables most safe optimizations
- -O3: more aggressive loop and inlining optimizations
- -Ofast: like -O3 plus flags that may violate strict standards (e.g. relaxed floating-point rules)
For scientific codes where numerical reproducibility matters, test carefully when using -Ofast, and consider explicit floating-point flags (below).
CPU-Specific Tuning with LLVM
LLVM’s backends allow tuning for specific architectures:
- -march= – target CPU architecture
- -mtune= – optimize scheduling for a CPU (if distinct from -march)
Examples:
# Optimize for the node’s current CPU
clang -O3 -march=native mycode.c -o mycode
# Optimize for Intel Ice Lake (example, check your cluster docs)
clang -O3 -march=icelake-server mycode.c -o mycode
# Optimize for AMD Zen 3 (e.g., EPYC Milan)
clang -O3 -march=znver3 mycode.c -o mycode
On shared clusters, -march=native is safe when compiling on the same kind of compute node you will run on. For cross-compilation or heterogeneous clusters, follow site recommendations for -march=.
Vectorization and Floating-Point Behavior
LLVM’s loop vectorizer and SLP vectorizer are essential for SIMD in HPC.
Enabling/controlling vectorization
At -O2 and above, basic vectorization is usually on by default for suitable loops. You can influence it with:
- -fvectorize – enable loop vectorization (often on at -O2+)
- -fslp-vectorize – enable the SLP vectorizer (often on at -O2+)
- -Rpass=loop-vectorize – report loops that were vectorized
- -Rpass-missed=loop-vectorize – report loops that could not be vectorized
Example:
clang -O3 -Rpass=loop-vectorize -Rpass-missed=loop-vectorize mycode.c
This prints messages during compilation about which loops were vectorized and why some were not, which is helpful for performance tuning.
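As a concrete illustration, consider a small saxpy-style kernel (a hypothetical example, not taken from any particular code). With unit-stride accesses, no loop-carried dependence, and restrict-qualified pointers, it is the kind of loop the vectorizer typically reports as vectorized:
/* vec_example.c -- hypothetical kernel used to illustrate vectorization remarks.
 * Build with, e.g.:
 *   clang -O3 -Rpass=loop-vectorize -Rpass-missed=loop-vectorize -c vec_example.c
 */
void saxpy(int n, float a, const float *restrict x, float *restrict y)
{
    /* Unit-stride loop with no loop-carried dependence: a typical
     * candidate for LLVM's loop vectorizer. */
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
With a recent clang, the remark for this loop usually states the chosen vectorization width; if you introduce a loop-carried dependence between iterations, the missed-optimization remarks typically explain why the loop could not be vectorized.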
Floating-point optimization flags
LLVM provides fine-grained control over floating-point optimizations:
- -ffast-math – enables aggressive FP optimizations, may change results
- -fno-math-errno – allow faster math functions without setting errno
- -funsafe-math-optimizations – relax FP safety constraints
- -freciprocal-math – allow use of reciprocal approximations
- -fno-fast-math – disable fast-math behavior
-Ofast implies -ffast-math. For codes needing bitwise reproducibility across platforms, prefer -O2 or -O3 with more conservative FP flags and test thoroughly.
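To make the reproducibility concern concrete, here is a small hypothetical program whose naive summation is sensitive to the reassociation that -ffast-math permits; built with -O2 and with -Ofast, the printed value may differ in the last digits:
/* fastmath_demo.c -- hypothetical example of fast-math sensitivity.
 * Compare, for example:
 *   clang -O2    fastmath_demo.c -o sum_strict && ./sum_strict
 *   clang -Ofast fastmath_demo.c -o sum_fast   && ./sum_fast
 */
#include <stdio.h>

int main(void)
{
    double sum = 0.0;
    /* Naive summation: the rounding error depends on the order of the
     * additions, which fast-math allows the compiler to change
     * (e.g. by reassociating or vectorizing the reduction). */
    for (int i = 1; i <= 10000000; ++i)
        sum += 1.0 / (double)i;
    printf("sum = %.17g\n", sum);
    return 0;
}
Agreement between such builds on a toy case is not a guarantee for your real code; only tests on representative inputs justify keeping the faster flags.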
OpenMP with Clang/LLVM
LLVM has its own OpenMP runtime (libomp) and supports OpenMP offload to some accelerators (depending on version and build).
Basic usage for CPU-only OpenMP with Clang:
clang -O3 -fopenmp mycode.c -o mycode
clang++ -O3 -fopenmp mycode.cpp -o mycode
Notes for HPC:
- Some clusters provide separate modules like llvm-openmp or link flags like -lomp if -fopenmp is not enough by default.
- If you mix compilers (e.g., clang with gfortran libraries), check OpenMP runtime compatibility.
- For GPU offload via OpenMP, support is highly version- and hardware-dependent; follow your site’s documentation if available.
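The snippet below is a minimal CPU-only OpenMP sketch (a hypothetical array update) matching the clang -fopenmp invocation shown above:
/* omp_example.c -- minimal OpenMP sketch; build with
 *   clang -O3 -fopenmp omp_example.c -o omp_example
 */
#include <omp.h>
#include <stdio.h>

#define N 1000000

int main(void)
{
    static double a[N];

    /* Iterations are divided among the threads of the team; with the
     * default schedule each thread typically gets a contiguous chunk. */
    #pragma omp parallel for
    for (int i = 0; i < N; ++i)
        a[i] = 2.0 * i;

    printf("max threads: %d, a[N-1] = %f\n", omp_get_max_threads(), a[N - 1]);
    return 0;
}
Set OMP_NUM_THREADS (and any site-recommended affinity variables) in your job script to control how many threads the runtime uses.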
Interoperability with Other Compilers and Libraries
LLVM is often used alongside GCC and vendor compilers:
- Linking: mixing object files from different C/C++ compilers is usually possible if they use the same ABI and compatible standard libraries.
- Libraries: many numerical libraries (BLAS, LAPACK, FFT, etc.) can be linked to Clang-compiled code exactly as with GCC, e.g.:
clang -O3 main.c -lblas -llapack -o main
Potential issues to watch:
- C++ standard library mismatches (libstdc++ vs libc++)
- Fortran name mangling and runtime libraries when mixing Fortran and C/C++ from different compiler families
On clusters, module systems often provide “compiler families” where compilers and libraries are tested together. Use matching modules to avoid ABI issues.
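As a concrete linking example, the sketch below calls the BLAS daxpy routine from C and links against a reference BLAS as in the command above. It assumes the common Fortran symbol convention (a trailing underscore and pass-by-reference arguments); some BLAS builds use different naming, so check your library's documentation.
/* daxpy_demo.c -- hypothetical example of linking a numerical library.
 * Build with, e.g.:  clang -O3 daxpy_demo.c -lblas -o daxpy_demo
 */
#include <stdio.h>

/* Fortran-style BLAS interface: y = alpha*x + y (assumed trailing underscore). */
extern void daxpy_(const int *n, const double *alpha,
                   const double *x, const int *incx,
                   double *y, const int *incy);

int main(void)
{
    int n = 4, inc = 1;
    double alpha = 2.0;
    double x[] = {1.0, 2.0, 3.0, 4.0};
    double y[] = {1.0, 1.0, 1.0, 1.0};

    daxpy_(&n, &alpha, x, &inc, y, &inc);

    for (int i = 0; i < n; ++i)
        printf("y[%d] = %g\n", i, y[i]);  /* expected: 3 5 7 9 */
    return 0;
}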
Diagnostics, Sanitizers, and Tooling
One of LLVM’s strengths in HPC development is tooling that helps you find bugs and performance issues earlier.
Better diagnostics
Clang provides readable error and warning messages. You can make them stricter:
clang -Wall -Wextra -Wpedantic -O2 mycode.c -o mycode
These warnings often flag constructs that lead to undefined behavior, as well as portability problems that can cause mysterious crashes at large scale.
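For example, the following intentionally sloppy (hypothetical) function compiles silently without warning flags, but -Wall -Wextra will typically flag both the signed/unsigned comparison in the loop condition and the unused parameter:
/* warn_demo.c -- hypothetical snippet; try: clang -Wall -Wextra -Wpedantic -c warn_demo.c */
#include <stddef.h>

double sum_array(const double *x, size_t n, double scale)
{
    double s = 0.0;
    for (int i = 0; i < n; ++i)  /* int compared against size_t */
        s += x[i];
    return s;                    /* 'scale' is never used */
}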
Sanitizers (debug-time error detection)
LLVM implements many sanitizers that instrument your code:
- AddressSanitizer (ASan): -fsanitize=address
- UndefinedBehaviorSanitizer (UBSan): -fsanitize=undefined
- ThreadSanitizer (TSan): -fsanitize=thread (for threaded codes)
- MemorySanitizer (MSan): -fsanitize=memory (where supported)
Example (debug build with UB and address sanitizers):
clang -g -O1 -fsanitize=address,undefined mycode.c -o mycode_asan
This instrumentation slows down your program, so you typically use sanitizers on smaller test cases before scaling up to the full cluster job.
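For instance, this deliberately buggy (hypothetical) program writes one element past the end of a heap allocation; built with the sanitizer flags above, the run usually aborts with a heap-buffer-overflow report pointing at the offending line:
/* asan_demo.c -- deliberate off-by-one error for illustration.
 * Build and run, e.g.:
 *   clang -g -O1 -fsanitize=address,undefined asan_demo.c -o asan_demo
 *   ./asan_demo
 */
#include <stdlib.h>

int main(void)
{
    double *v = malloc(10 * sizeof *v);
    for (int i = 0; i <= 10; ++i)  /* off-by-one: writes v[10] */
        v[i] = (double)i;
    free(v);
    return 0;
}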
Static analysis and formatting
For larger codebases:
- clang-tidy: static analysis and style/linting for C/C++
- clang-format: automatic formatting to a given style
Example:
clang-tidy mycode.cpp -- -I/path/to/includes
clang-format -i mycode.cpp
These tools are often run locally during development, but they’re also available on many clusters.
LLVM and Build Systems
When using Make or CMake, you often control LLVM usage simply by setting the compiler variables.
Make
In a Makefile:
CC = clang
CXX = clang++
FC = flang # if available
CFLAGS = -O3 -march=native
CXXFLAGS = -O3 -march=native
FFLAGS = -O3 -march=native
Running make then builds with the LLVM-based compilers, provided your rules (or make’s implicit rules) use these variables.
CMake
When configuring a project:
cmake -DCMAKE_C_COMPILER=clang \
-DCMAKE_CXX_COMPILER=clang++ \
-DCMAKE_BUILD_TYPE=Release \
..
For Fortran (if using LLVM Fortran):
cmake -DCMAKE_Fortran_COMPILER=flang ..
Cluster-provided CMake toolchains or modules may already preconfigure these for you; check site documentation.
Practical Guidance for Using LLVM on HPC Systems
- Check available versions: newer LLVM often has better optimization and OpenMP/OpenMP-offload support. Use the recommended version for your system.
- Follow compiler families: use libraries and MPI stacks built with the same compiler family when possible.
- Start with moderate optimizations: begin with -O2 or -O3; only move to -Ofast or aggressive FP flags once you have tests in place.
- Use diagnostics during development: combine -Wall -Wextra -Wpedantic and sanitizers on smaller runs to catch issues before large-scale jobs.
- Compare with other compilers: for performance-critical kernels, compare LLVM builds with GCC or vendor compilers. Performance can vary by architecture and code pattern.
Understanding LLVM’s role and capabilities prepares you to make informed choices about compilers, debugging, and optimization strategies on modern HPC systems.