Why precompiled software is common on HPC clusters
On most HPC systems, users rely heavily on software that has already been compiled and installed by system administrators. This is because:
- Many HPC applications are large, complex, and depend on many libraries.
- Administrators can compile them with machine-specific optimizations and MPI libraries.
- Central installations reduce duplicated effort and ensure consistent, tested configurations.
- Licensing and security are easier to manage centrally.
Your main tasks as a user are therefore:
- Discover what is installed.
- Select appropriate versions.
- Combine multiple tools and libraries correctly.
- Make your own code link against preinstalled libraries.
Where precompiled software lives on a cluster
Clusters usually have a structured software stack, often under directories like:
- /usr/bin, /usr/lib – system-level tools, compilers, basic libraries.
- /opt, /apps, /software, /sw – site-specific applications and libraries.
- /usr/local – locally installed extras (sometimes used, sometimes not).
- Shared module trees – e.g. /usr/share/modulefiles, /etc/modulefiles, /gpfs/software/modules.
You normally do not need to know exact paths; instead, you use:
- Environment modules (or Lmod) for most scientific packages.
- Package managers (Spack, EasyBuild, system package managers) indirectly, via modules.
- Site-specific wrapper scripts, such as matlab, abaqus, g09, etc.
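If you are curious which module system your cluster runs and which directories it actually searches, a quick hedged check (output and paths vary by site; the commands below should work with both Lmod and recent Environment Modules):
# Report which module system is installed and its version
module --version
# List the module trees it searches, one per line (MODULEPATH is colon-separated)
echo "$MODULEPATH" | tr ':' '\n'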
Discovering available precompiled software
Using the module system
On most clusters, the module system is your primary interface to precompiled software.
Common commands (exact syntax may vary slightly by site):
- List all available modules: module avail or module av
- Search by keyword: module spider mpi or module avail gcc
- Show detailed info about a module: module show openmpi/4.1.1 or module spider hdf5/1.12.1
Typical information you see:
- What environment variables it sets (e.g. PATH, LD_LIBRARY_PATH, CPATH, PKG_CONFIG_PATH); you can verify these changes yourself, as sketched below.
- Which modules it depends on (e.g. a library built with a specific compiler or MPI).
- Usage notes and site-specific documentation.
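One simple way to see exactly what a module changes is to snapshot your environment before and after loading it; a minimal sketch (the compiler version is only an example, pick one from module avail):
# Capture the environment, load a module, and compare
env | sort > env_before.txt
module load gcc/12.2.0
env | sort > env_after.txt
diff env_before.txt env_after.txt | grep '^[<>]'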
Site documentation and software catalogs
Clusters often provide:
- A web page listing installed software, version policies, and recommended defaults.
- Site-specific wikis with:
- Example job scripts using particular applications.
- Notes on limitations (e.g. no GUI support for some tools).
- Special license or usage constraints (commercial solvers, CFD codes, etc.).
Always check your site documentation for:
- The preferred/standard modules for compilers and MPI.
- Any “default” software stacks your center recommends.
Loading and managing software environments
Basic workflow with modules
The typical sequence:
- Check what’s available: module avail
- Load required modules: module load gcc/12.2.0, module load openmpi/4.1.5, module load fftw/3.3.10
- Verify settings: which mpicc, echo $LD_LIBRARY_PATH
Additional useful commands:
- module list – show currently loaded modules.
- module unload modulename – remove a module.
- module purge – unload all modules (start clean).
- module swap old new – replace one module with another (a short example follows).
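For instance, module swap lets you replace one version of a loaded module with another without starting over; a minimal sketch (versions are illustrative):
module load gcc/11.3.0
module list                          # gcc/11.3.0 appears among the loaded modules
module swap gcc/11.3.0 gcc/12.2.0
module list                          # gcc/12.2.0 has replaced it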
Dealing with module hierarchies
Many systems use hierarchical modules, where:
- You first load a toolchain or compiler module (e.g. gcc, intel, nvhpc).
- This then exposes compatible libraries and tools built with that compiler.
- MPI implementations (e.g. openmpi, mpich, intel-mpi) are layered on top.
- Domain libraries and applications are layered above compilers and MPI.
Example:
module purge
module load gcc/12.2.0
module load openmpi/4.1.5
module load hdf5/1.12.2-mpi
module load petsc/3.20
If you change the base compiler, most dependent modules will no longer be valid; you typically:
module purge
module load intel/2024.0
module load intel-mpi/2021
module load hdf5/1.12.2-mpi
Making environments reproducible
To ensure that you and collaborators use the same stack:
- Save your loaded modules: run module list and copy the output into your job script or a setup script (a one-liner for this is sketched at the end of this subsection).
- Create a small shell script:
# env_hpc.sh
module purge
module load gcc/12.2.0
module load openmpi/4.1.5
module load hdf5/1.12.2-mpi
module load python/3.11
Then in your batch script or interactive shell:
source env_hpc.sh
- Avoid relying on “default” modules when long-term reproducibility matters; specify versions.
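To make the record-keeping automatic, you can capture the loaded stack every time the setup script is sourced; a minimal sketch (the output file name is an example, and note that module list writes to stderr on many systems):
source env_hpc.sh
module list 2>&1 | tee modules_used.txt   # keep a copy of exactly which modules were loaded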
Linking your own code against precompiled libraries
Precompiled numerical libraries on clusters (BLAS, LAPACK, FFTW, PETSc, etc.) are usually made available as modules. Once loaded, they expose:
- Header search paths via CPATH / C_INCLUDE_PATH / CPLUS_INCLUDE_PATH.
- Library search paths via LIBRARY_PATH and LD_LIBRARY_PATH.
- Optional helper variables or tools (e.g. MKLROOT, pkg-config files); a quick way to inspect these is sketched below.
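A quick way to inspect what a given library module exposes, sketched here with FFTW as an example (the version is illustrative, and the pkg-config line only works if the module adds a .pc file to PKG_CONFIG_PATH):
module load fftw/3.3.10
module show fftw/3.3.10            # lists the PATH/CPATH/LIBRARY_PATH changes the module makes
pkg-config --cflags --libs fftw3   # prints compile and link flags if an fftw3.pc file is provided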
Common patterns to compile your code:
# Using library-specific wrappers or variables
module load gcc/12.2.0
module load openblas/0.3.26
gcc -O3 -o myprog myprog.c -lopenblas   # OpenBLAS provides BLAS and LAPACK in one library
For more complex libraries:
module load petsc/3.20
mpicc mycode.c \
$(pkg-config --cflags --libs petsc) \
    -o mycode_petsc
Always consult:
- module show <libname> for flags or environment variables.
- Site docs for recommended compile and link lines.
Using precompiled applications
For end-user applications (CFD codes, chemistry packages, etc.), clusters generally provide:
- A module to expose the main executables.
- One or more MPI-enabled variants.
- Sometimes multiple build “flavors” (e.g. CPU-only versus GPU-accelerated).
Typical usage workflow:
- Load the application module: module load lammps/2023.08-mpi
- Check what executable you should use: which lmp_mpi
- In your job script, run it through srun, mpirun, or the site’s recommended launcher: srun lmp_mpi -in in.lammps
Be careful to:
- Match the job’s requested resources (number of tasks, GPUs) to the application’s parallel mode.
- Use the execution pattern recommended by your site (e.g. srun vs mpirun); a quick pre-launch check is sketched below.
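A small pre-launch check inside the job script can catch resource mismatches early; this sketch assumes Slurm (the SLURM_* variables are standard Slurm job variables, and the input file is the example from above):
# Print what the scheduler actually granted before launching the solver
echo "Tasks: ${SLURM_NTASKS:-unset}, nodes: ${SLURM_JOB_NUM_NODES:-unset}"
srun lmp_mpi -in in.lammps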
MPI, compilers, and compatibility
Precompiled parallel libraries and applications are tightly coupled to:
- A specific compiler family and version.
- A specific MPI implementation and ABI.
Implications for you:
- You generally must use the same compiler and MPI modules that were used to build the libraries you depend on.
- Mixing, for example, gcc/13 with a library built against gcc/10 often fails at link or runtime.
- Similarly, linking against an openmpi-built library but running with mpich is unsupported.
Module hierarchies are designed to prevent most incompatible combinations, but you should still:
- Avoid manually manipulating LD_LIBRARY_PATH in ways that override module settings.
- Use the MPI compiler wrappers that match the loaded MPI module: mpicc, mpicxx, mpif90 from openmpi or mpich (see the sketch below).
- Compile everything in your workflow (your code plus dependencies) within one coherent module stack.
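A hedged way to confirm that your wrapper and MPI library line up (the binary name is the example from the PETSc section above):
which mpicc                       # should point into the currently loaded MPI module
mpicc -show                       # MPICH/Intel MPI print the underlying compile line; Open MPI uses mpicc -showme
ldd ./mycode_petsc | grep -i mpi  # shows which MPI library an existing binary actually links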
Application-specific environment helpers
Some precompiled packages install helper commands or scripts, such as:
- matlab – launches MATLAB (possibly with special flags on clusters).
- vmd, gmx, gaussian, etc. – site-specific wrappers that:
- Set application-specific environment variables.
- Ensure correct license server options.
- Apply cluster-specific defaults (e.g. no GUI, batch-only).
When using these:
- Prefer the wrapper after loading the module, e.g.:
module load gromacs/2023-mpi
which gmx_mpi
srun gmx_mpi mdrun -s topol.tpr
- Read the module show output and any README or notes the module prints when loaded (see the sketch below).
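In practice, module help is often where sites attach usage notes for such wrappers; a minimal sketch (module name and version are examples):
module help gromacs/2023-mpi   # site-provided usage notes, license reminders, launcher advice
module show gromacs/2023-mpi   # the exact variables and paths the module sets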
Using precompiled Python and R stacks
Clusters often provide:
- Precompiled Python with scientific packages (NumPy, SciPy, MPI4Py, etc.).
- R with common statistical and HPC libraries.
Typical usage:
module load python/3.11
python -c "import numpy, mpi4py"
or:
module load r/4.3
Rscript myscript.R
Good practices:
- Use site-provided virtual environments or per-user package directories that are compatible with the cluster Python/R (one common pattern is sketched after this list).
- When performance matters, prefer the precompiled, optimized NumPy/SciPy linked against the cluster’s BLAS/LAPACK.
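One common pattern for the first point is a per-project virtual environment built on top of the cluster’s Python module, so it reuses the optimized NumPy/SciPy; a minimal sketch (paths, versions, and the package name are examples):
module load python/3.11
python -m venv --system-site-packages "$HOME/venvs/myproject"   # venv can see the module's packages
source "$HOME/venvs/myproject/bin/activate"
pip install --upgrade pip
pip install somepackage          # your extra, project-specific packages go here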
Handling multiple versions and conflicts
Because many versions exist, you must manage choices:
- Use module avail appname to see all versions.
- Prefer versions your site marks as “default” or “recommended” for new users.
- If you hit problems, try:
- A newer version (bug fixes).
- An older version (compatibility with existing scripts).
- Use module purge before switching toolchains to avoid subtle conflicts.
Example:
module purge
module load gcc/12.2.0
module load openmpi/4.1.5
module load hdf5/1.12.2-mpi
If you later decide to use an Intel-based stack:
module purge
module load intel/2024.0
module load intel-mpi/2021
module load hdf5/1.12.2-mpi
Precompiled software in batch jobs
Everything you set up interactively must also be set up inside batch jobs. Common pattern in a job script:
#!/bin/bash
#SBATCH --job-name=myjob
#SBATCH --ntasks=32
#SBATCH --time=02:00:00
#SBATCH --partition=standard
module purge
module load gcc/12.2.0
module load openmpi/4.1.5
module load lammps/2023.08-mpi
srun lmp_mpi -in in.lammps
Points to remember:
- Batch jobs start with a minimal environment; modules must be loaded inside the script.
- Do not rely on interactive shell startup files unless your center explicitly recommends it.
- Keep the module section near the top of the script for clarity and reproducibility.
When precompiled software is missing or insufficient
Sometimes the cluster does not provide exactly what you need:
- The package is missing entirely.
- The version you need is too old or too new.
- The application requires special options not used by the central build.
Common approaches:
- Ask your support team to install or upgrade the package:
- Provide the name, version, and why you need it.
- Use user-level installs:
- Python: pip install --user somepackage (respecting site guidelines).
- R: install packages into your personal library.
- Build smaller tools from source in your home or project space, using the same compiler and MPI modules as the cluster stack (see the sketch after this list).
- Use containers (Singularity/Apptainer) when supported, if you need a completely custom environment.
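A hedged sketch of the build-from-source approach, assuming an autotools-style tool (module names, versions, the tool, and paths are all examples):
module purge
module load gcc/12.2.0 openmpi/4.1.5
tar xf mytool-1.0.tar.gz && cd mytool-1.0
./configure --prefix="$HOME/sw/mytool"
make -j 4 && make install
export PATH="$HOME/sw/mytool/bin:$PATH"   # make the new tool visible in your shell and job scripts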
Even if you build your own code, it’s usually best to:
- Compile against the cluster’s optimized BLAS, LAPACK, MPI, and other core libraries.
- Use existing compiler and MPI modules for consistency and performance.
Good practices when using precompiled software
- Always document which modules (and versions) you used for:
- Publications.
- Project reports.
- Sharing with collaborators.
- Keep a small “environment script” for each project to reload the same stack.
- Prefer site-recommended modules and application builds, especially when starting out.
- Test your workflow interactively on small inputs before scaling out in batch jobs.
- Avoid manually overriding environment variables that modules manage, unless you understand the implications.
Using precompiled software effectively is largely about managing environments carefully and consistently; once you master modules and the site’s conventions, most HPC numerical libraries and applications become straightforward to use.