Why precompiled software is common on HPC clusters
On most HPC systems, users rely heavily on software that has already been compiled and installed by system administrators. This is because:
- Many HPC applications are large, complex, and depend on many libraries.
- Administrators can compile them with machine-specific optimizations and MPI libraries.
- Central installations reduce duplicated effort and ensure consistent, tested configurations.
- Licensing and security are easier to manage centrally.
Your main tasks as a user are therefore:
- Discover what is installed.
- Select appropriate versions.
- Combine multiple tools and libraries correctly.
- Make your own code link against preinstalled libraries.
Where precompiled software lives on a cluster
Clusters usually have a structured software stack, often under directories like:
- /usr/bin, /usr/lib – system-level tools, compilers, basic libraries.
- /opt, /apps, /software, /sw – site-specific applications and libraries.
- /usr/local – locally installed extras (sometimes used, sometimes not).
- Shared module trees – e.g. /usr/share/modulefiles, /etc/modulefiles, /gpfs/software/modules.
You normally do not need to know exact paths; instead, you use:
- Environment modules (or Lmod) for most scientific packages.
- Package managers (Spack, EasyBuild, system package managers) indirectly, via modules.
- Site-specific wrapper scripts, such as matlab, abaqus, g09, etc.
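If you are curious which module system your cluster runs and which directories it actually searches, a quick hedged check (output and paths vary by site; the commands below should work with both Lmod and recent Environment Modules):
# Report which module system is installed and its version
module --version
# List the module trees it searches, one per line (MODULEPATH is colon-separated)
echo "$MODULEPATH" | tr ':' '\n'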
Discovering available precompiled software
Using the module system
On most clusters, the module system is your primary interface to precompiled software.
Common commands (exact syntax may vary slightly by site):
- List all available modules: module avail or module av
- Search by keyword: module spider mpi or module avail gcc
- Show detailed info about a module: module show openmpi/4.1.1 or module spider hdf5/1.12.1
Typical information you see:
- What environment variables it sets (e.g. PATH, LD_LIBRARY_PATH, CPATH, PKG_CONFIG_PATH); you can verify these changes yourself, as sketched below.
- Which modules it depends on (e.g. a library built with a specific compiler or MPI).
- Usage notes and site-specific documentation.
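One simple way to see exactly what a module changes is to snapshot your environment before and after loading it; a minimal sketch (the compiler version is only an example, pick one from module avail):
# Capture the environment, load a module, and compare
env | sort > env_before.txt
module load gcc/12.2.0
env | sort > env_after.txt
diff env_before.txt env_after.txt | grep '^[<>]'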
Site documentation and software catalogs
Clusters often provide:
- A web page listing installed software, version policies, and recommended defaults.
- Site-specific wikis with:
- Example job scripts using particular applications.
- Notes on limitations (e.g. no GUI support for some tools).
- Special license or usage constraints (commercial solvers, CFD codes, etc.).
Always check your site documentation for:
- The preferred/standard modules for compilers and MPI.
- Any “default” software stacks your center recommends.
Loading and managing software environments
Basic workflow with modules
The typical sequence:
- Check what’s available: module avail
- Load required modules: module load gcc/12.2.0, module load openmpi/4.1.5, module load fftw/3.3.10
- Verify settings: which mpicc, echo $LD_LIBRARY_PATH
Additional useful commands:
- module list – show currently loaded modules.
- module unload modulename – remove a module.
- module purge – unload all modules (start clean).
- module swap old new – replace one module with another (a short example follows).
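For instance, module swap lets you replace one version of a loaded module with another without starting over; a minimal sketch (versions are illustrative):
module load gcc/11.3.0
module list                          # gcc/11.3.0 appears among the loaded modules
module swap gcc/11.3.0 gcc/12.2.0
module list                          # gcc/12.2.0 has replaced it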
Dealing with module hierarchies
Many systems use hierarchical modules, where:
- You first load a toolchain or compiler module (e.g. gcc, intel, nvhpc).
- This then exposes compatible libraries and tools built with that compiler.
- MPI implementations (e.g. openmpi, mpich, intel-mpi) are layered on top.
- Domain libraries and applications are layered above compilers and MPI.
Example:
module purge
module load gcc/12.2.0
module load openmpi/4.1.5
module load hdf5/1.12.2-mpi
module load petsc/3.20
If you change the base compiler, most dependent modules will no longer be valid; you typically:
module purge
module load intel/2024.0
module load intel-mpi/2021
module load hdf5/1.12.2-mpi
Making environments reproducible
To ensure that you and collaborators use the same stack:
- Save your loaded modules: run module list and copy the output into your job script or a setup script (a one-liner for this is sketched at the end of this subsection).
- Create a small shell script:
# env_hpc.sh
module purge
module load gcc/12.2.0
module load openmpi/4.1.5
module load hdf5/1.12.2-mpi
module load python/3.11
Then in your batch script or interactive shell:
source env_hpc.sh
- Avoid relying on “default” modules when long-term reproducibility matters; specify versions.
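To make the record-keeping automatic, you can capture the loaded stack every time the setup script is sourced; a minimal sketch (the output file name is an example, and note that module list writes to stderr on many systems):
source env_hpc.sh
module list 2>&1 | tee modules_used.txt   # keep a copy of exactly which modules were loaded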
Linking your own code against precompiled libraries
Precompiled numerical libraries on clusters (BLAS, LAPACK, FFTW, PETSc, etc.) are usually made available as modules. Once loaded, they expose:
- Header search paths via CPATH / C_INCLUDE_PATH / CPLUS_INCLUDE_PATH.
- Library search paths via LIBRARY_PATH and LD_LIBRARY_PATH.
- Optional helper variables or tools (e.g. MKLROOT, pkg-config files); a quick way to inspect these is sketched below.
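A quick way to inspect what a given library module exposes, sketched here with FFTW as an example (the version is illustrative, and the pkg-config line only works if the module adds a .pc file to PKG_CONFIG_PATH):
module load fftw/3.3.10
module show fftw/3.3.10            # lists the PATH/CPATH/LIBRARY_PATH changes the module makes
pkg-config --cflags --libs fftw3   # prints compile and link flags if an fftw3.pc file is provided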
Common patterns to compile your code:
# Using library-specific wrappers or variables
module load gcc/12.2.0
module load openblas/0.3.26
gcc -O3 -o myprog myprog.c -lopenblas   # OpenBLAS provides BLAS and LAPACK in one library
For more complex libraries:
module load petsc/3.20
mpicc mycode.c \
$(pkg-config --cflags --libs petsc) \
    -o mycode_petsc
Always consult:
- module show <libname> for flags or environment variables.
- Site docs for recommended compile and link lines.
Using precompiled applications
For end-user applications (CFD codes, chemistry packages, etc.), clusters generally provide:
- A module to expose the main executables.
- One or more MPI-enabled variants.
- Sometimes multiple build “flavors” (e.g. CPU-only versus GPU-accelerated).
Typical usage workflow:
- Load the application module: module load lammps/2023.08-mpi
- Check what executable you should use: which lmp_mpi
- In your job script, run it through srun, mpirun, or the site’s recommended launcher: srun lmp_mpi -in in.lammps
Be careful to:
- Match the job’s requested resources (number of tasks, GPUs) to the application’s parallel mode.
- Use the execution pattern recommended by your site (e.g. srun vs mpirun); a quick pre-launch check is sketched below.
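A small pre-launch check inside the job script can catch resource mismatches early; this sketch assumes Slurm (the SLURM_* variables are standard Slurm job variables, and the input file is the example from above):
# Print what the scheduler actually granted before launching the solver
echo "Tasks: ${SLURM_NTASKS:-unset}, nodes: ${SLURM_JOB_NUM_NODES:-unset}"
srun lmp_mpi -in in.lammps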
MPI, compilers, and compatibility
Precompiled parallel libraries and applications are tightly coupled to:
- A specific compiler family and version.
- A specific MPI implementation and ABI.
Implications for you:
- You generally must use the same compiler and MPI modules that were used to build the libraries you depend on.
- Mixing, for example, gcc/13 with a library built against gcc/10 often fails at link or runtime.
- Similarly, linking against an openmpi-built library but running with mpich is unsupported.
Module hierarchies are designed to prevent most incompatible combinations, but you should still:
- Avoid manually manipulating LD_LIBRARY_PATH in ways that override module settings.
- Use the MPI compiler wrappers that match the loaded MPI module: mpicc, mpicxx, mpif90 from openmpi or mpich (see the sketch below).
- Compile everything in your workflow (your code plus dependencies) within one coherent module stack.
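A hedged way to confirm that your wrapper and MPI library line up (the binary name is the example from the PETSc section above):
which mpicc                       # should point into the currently loaded MPI module
mpicc -show                       # MPICH/Intel MPI print the underlying compile line; Open MPI uses mpicc -showme
ldd ./mycode_petsc | grep -i mpi  # shows which MPI library an existing binary actually links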
Application-specific environment helpers
Some precompiled packages install helper commands or scripts, such as:
- matlab – launches MATLAB (possibly with special flags on clusters).
- vmd, gmx, gaussian, etc. – site-specific wrappers that:
- Set application-specific environment variables.
- Ensure correct license server options.
- Apply cluster-specific defaults (e.g. no GUI, batch-only).
When using these:
- Prefer the wrapper after loading the module, e.g.:
module load gromacs/2023-mpi
which gmx_mpi
srun gmx_mpi mdrun -s topol.tpr
- Read the module show output and any README or notes the module prints when loaded (see the sketch below).
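In practice, module help is often where sites attach usage notes for such wrappers; a minimal sketch (module name and version are examples):
module help gromacs/2023-mpi   # site-provided usage notes, license reminders, launcher advice
module show gromacs/2023-mpi   # the exact variables and paths the module sets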
Using precompiled Python and R stacks
Clusters often provide:
- Precompiled Python with scientific packages (NumPy, SciPy, MPI4Py, etc.).
- R with common statistical and HPC libraries.
Typical usage:
module load python/3.11
python -c "import numpy, mpi4py"
or:
module load r/4.3
Rscript myscript.R
Good practices:
- Use site-provided virtual environments or per-user package directories that are compatible with the cluster Python/R (one common pattern is sketched after this list).
- When performance matters, prefer the precompiled, optimized NumPy/SciPy linked against the cluster’s BLAS/LAPACK.
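One common pattern for the first point is a per-project virtual environment built on top of the cluster’s Python module, so it reuses the optimized NumPy/SciPy; a minimal sketch (paths, versions, and the package name are examples):
module load python/3.11
python -m venv --system-site-packages "$HOME/venvs/myproject"   # venv can see the module's packages
source "$HOME/venvs/myproject/bin/activate"
pip install --upgrade pip
pip install somepackage          # your extra, project-specific packages go here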
Handling multiple versions and conflicts
Because many versions exist, you must manage choices:
- Use module avail appname to see all versions.
- Prefer versions your site marks as “default” or “recommended” for new users.
- If you hit problems, try:
- A newer version (bug fixes).
- An older version (compatibility with existing scripts).
- Use module purge before switching toolchains to avoid subtle conflicts.
Example:
module purge
module load gcc/12.2.0
module load openmpi/4.1.5
module load hdf5/1.12.2-mpi
If you later decide to use an Intel-based stack:
module purge
module load intel/2024.0
module load intel-mpi/2021
module load hdf5/1.12.2-mpi
Precompiled software in batch jobs
Everything you set up interactively must also be set up inside batch jobs. Common pattern in a job script:
#!/bin/bash
#SBATCH --job-name=myjob
#SBATCH --ntasks=32
#SBATCH --time=02:00:00
#SBATCH --partition=standard
module purge
module load gcc/12.2.0
module load openmpi/4.1.5
module load lammps/2023.08-mpi
srun lmp_mpi -in in.lammps
Points to remember:
- Batch jobs start with a minimal environment; modules must be loaded inside the script.
- Do not rely on interactive shell startup files unless your center explicitly recommends it.
- Keep the module section near the top of the script for clarity and reproducibility.
When precompiled software is missing or insufficient
Sometimes the cluster does not provide exactly what you need:
- The package is missing entirely.
- The version you need is too old or too new.
- The application requires special options not used by the central build.
Common approaches:
- Ask your support team to install or upgrade the package:
- Provide the name, version, and why you need it.
- Use user-level installs:
- Python: pip install --user somepackage (respecting site guidelines).
- R: install packages into your personal library.
- Build smaller tools from source in your home or project space, using the same compiler and MPI modules as the cluster stack (see the sketch after this list).
- Use containers (Singularity/Apptainer) when supported, if you need a completely custom environment.
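A hedged sketch of the build-from-source approach, assuming an autotools-style tool (module names, versions, the tool, and paths are all examples):
module purge
module load gcc/12.2.0 openmpi/4.1.5
tar xf mytool-1.0.tar.gz && cd mytool-1.0
./configure --prefix="$HOME/sw/mytool"
make -j 4 && make install
export PATH="$HOME/sw/mytool/bin:$PATH"   # make the new tool visible in your shell and job scripts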
Even if you build your own code, it’s usually best to:
- Compile against the cluster’s optimized BLAS, LAPACK, MPI, and other core libraries.
- Use existing compiler and MPI modules for consistency and performance.
Good practices when using precompiled software
- Always document which modules (and versions) you used for:
- Publications.
- Project reports.
- Sharing with collaborators.
- Keep a small “environment script” for each project to reload the same stack.
- Prefer site-recommended modules and application builds, especially when starting out.
- Test your workflow interactively on small inputs before scaling out in batch jobs.
- Avoid manually overriding environment variables that modules manage, unless you understand the implications.
Using precompiled software effectively is largely about managing environments carefully and consistently; once you master modules and the site’s conventions, most HPC numerical libraries and applications become straightforward to use.