Key Ideas: Installing Software in HPC and Linux
On HPC systems, you almost never install software the same way you would on your laptop. You typically:
- Do not have root/administrator access.
- Need multiple versions of the same software (e.g., several MPI or Python versions).
- Care about performance, compiler compatibility, and dependencies.
- Want installs to be reproducible and shareable across users.
This chapter introduces the main concepts that underlie software installation in a Linux-based HPC environment, without going deep into any one tool or package manager.
System-wide vs. user-level installation
System-wide installation
System administrators typically install:
- Compilers, MPI stacks, numerical libraries
- Core tools (shells, editors, debuggers, profilers)
- Widely-used applications (e.g., GROMACS, LAMMPS, VASP, OpenFOAM)
Characteristics:
- Installed under directories like `/usr`, `/usr/local`, or a shared tree like `/apps` or `/opt`.
- Usually managed with environment modules (discussed elsewhere in this course).
- Compiled and tuned for the cluster hardware.
- All users can access them, but only admins can modify them.
You mostly “install” these by loading modules, not by running installers yourself.
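For example, a typical session on a module-based cluster might look like the following; the module names and versions are illustrative and vary by site:

```bash
# See what software the admins provide (output varies by cluster)
module avail

# Load a compiler and an application; names/versions are illustrative
module load gcc/13.1
module load gromacs/2023

# Inspect what a module changes in your environment
module show gromacs/2023
```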
User-level installation
Users often need:
- Custom or newer versions than what the admins provide.
- Experimental libraries or niche tools.
- Personal Python or R environments.
Because you usually lack root access, you install into:
- Your home directory, e.g. `~/software` or `~/.local`
- A project or group directory, e.g. `/project/mygroup/software`
Concepts:
- Install prefix: a directory where software is installed.
- System-wide example: `/usr/local`
- User-level example: `$HOME/.local`, `$HOME/software/mytool-1.2`
- You then modify your environment (`PATH`, `LD_LIBRARY_PATH`, etc.) to use software from that prefix.
Static vs. dynamic linking (conceptual level)
How software uses libraries affects what and how you install.
- Static linking:
- The code from libraries is copied into the executable at link time.
- Result: a larger single binary that does not require the libraries at runtime.
- Pros: Fewer runtime library dependencies, sometimes easier to run on other systems.
- Cons: Larger executables, harder to update a library separately, sometimes less flexible for HPC tuning.
- Dynamic linking:
- The executable depends on shared libraries (`.so` files on Linux) located at runtime.
- Pros: Smaller executables; you can update a library without recompiling everything; one library can be shared among many programs.
- Cons: Requires the right library versions to be available at runtime; `LD_LIBRARY_PATH` and module loading become important.
Most HPC software on clusters is dynamically linked and relies on the cluster’s library stacks.
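You can inspect a binary's shared-library dependencies with `ldd`; the libraries and paths in this sketch are illustrative:

```bash
# Show which shared libraries an executable needs and where the
# dynamic linker currently finds them (paths are illustrative)
ldd ./my_simulation
# libmpi.so.40  => /apps/openmpi/4.1.6/lib/libmpi.so.40 (0x...)
# libfftw3.so.3 => not found   <- load the right module or fix LD_LIBRARY_PATH
```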
Package-based vs. source-based installation
Binary packages
On personal Linux systems, admins use distributions’ package managers:
- Debian/Ubuntu: `apt` (`.deb` packages)
- Red Hat/CentOS/Rocky: `yum`, `dnf` (`.rpm` packages)
Conceptually:
- Precompiled binaries packaged with metadata.
- Dependency handling is automatic at the system level.
On clusters:
- These tools are mostly used by admins, not users.
- Users might use per-user package managers (e.g., in Python or R) that behave similarly in concept but operate in user space.
You still need to understand that “installing from a package”:
- Chooses pre-built binaries for a specific architecture.
- Usually has limited tuning for your exact workload.
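As a contrast, compare a system-level install (root required) with a per-user one; the package names here are illustrative:

```bash
# System-level, admin-only: requires root privileges
sudo apt install libhdf5-dev    # Debian/Ubuntu
sudo dnf install hdf5-devel     # Red Hat family

# User-level analogue: installs into ~/.local, no root needed
pip install --user h5py
```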
Source-based installation
HPC software is often installed from source code to:
- Match specific CPU architecture and instruction sets (e.g., AVX2, AVX-512).
- Use a particular compiler (e.g., Intel, GCC, LLVM).
- Link against specific optimized libraries (e.g., vendor BLAS, MPI, GPU libraries).
Typical build pattern (conceptual):
- Configure: detect system features, compilers, libraries, and set build options.
- Compile: convert source code to object code, apply optimizations.
- Link: produce executables and libraries.
- Install: copy built files into the chosen prefix.
You usually select:
- Installation location (prefix).
- Optimization and debug options.
- Dependencies (which MPI, which BLAS/LAPACK, etc.).
On HPC systems, admins often use specialized tools (e.g., Spack, EasyBuild) to automate and standardize source builds; for users, manual builds are common for specific tools.
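For a classic autotools-style package, the pattern above often reduces to the following sketch; the tool name `mytool` and its layout are hypothetical:

```bash
# Unpack the source (tarball name is hypothetical)
tar xf mytool-1.2.tar.gz
cd mytool-1.2

# Configure: pick a prefix you can write to
./configure --prefix=$HOME/software/mytool-1.2

# Compile (in parallel) and install into the prefix
make -j 8
make install
```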
The “install prefix” and adjusting your environment
When you “install” software, the main conceptual tasks are:
- Choose a prefix / installation path
Examples:
- System: `/usr`, `/usr/local`, `/opt/hpc`
- User: `$HOME/.local`, `$HOME/software/gromacs-2024`, `/project/mygroup/tools`
- Arrange subdirectories
Common conventions under a prefix:
- `bin/`: executables
- `lib/` or `lib64/`: libraries
- `include/`: header files
- `share/`: documentation, data, examples
- Expose the software to your environment
Most tools are found using environment variables:
- `PATH`: where the shell looks for executables
- `LD_LIBRARY_PATH`: where the dynamic linker looks for shared libraries
- `CPATH`, `LIBRARY_PATH`, `PKG_CONFIG_PATH`: where compilers and build tools look for headers and libraries
- Tool-specific variables (e.g., `PYTHONPATH` for Python modules)
Conceptually, “installing” for your personal use often means:
- Put binaries into some prefix under your control.
- Add `$PREFIX/bin` to `$PATH`.
- Ensure libraries are visible via `LD_LIBRARY_PATH` or an `rpath` (if used).
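Done by hand, e.g. in your `~/.bashrc`, this might look like the following sketch (the prefix is hypothetical):

```bash
# Hypothetical user-level install prefix
PREFIX=$HOME/software/mytool-1.2

# Let the shell find the executables
export PATH=$PREFIX/bin:$PATH

# Let the dynamic linker find the shared libraries
export LD_LIBRARY_PATH=$PREFIX/lib:$LD_LIBRARY_PATH
```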
Environment modules wrap all of this into simple `module load` commands.
Versioning and coexistence of multiple installs
On clusters, many versions often coexist, for example:
- `gcc/9.5`, `gcc/11.3`, `gcc/13.1`
- `python/3.10`, `python/3.11`
- `openmpi/4.1.6`, `mpich/4.1`
Concepts:
- Prefix per version:
- Each version installed to its own directory.
- Example: `/apps/gromacs/2021`, `/apps/gromacs/2023`
- Versioned names:
- Executables may carry version numbers (`python3.10`, `python3.11`), or be made available through modules.
- Non-interference:
- Different versions can be loaded by adjusting environment variables differently, often via modules.
As a user, your core conceptual task is to choose a consistent stack:
- Compiler version
- MPI implementation and version
- Math libraries
- Application version
and ensure they are compatible.
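On a module-based cluster, picking a stack is typically a sequence of loads; the names and versions below are illustrative, and incompatible combinations usually fail at load time or at link time:

```bash
# One consistent toolchain (names/versions are illustrative)
module load gcc/11.3
module load openmpi/4.1.6    # MPI built with gcc/11.3
module load openblas/0.3.21  # math library built with the same compiler
module load gromacs/2023     # application built on top of this stack
```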
Dependencies and dependency resolution
Most non-trivial software depends on other software, such as:
- Libraries (e.g., `libhdf5`, `libfftw`, `libcuda`)
- Compilers and toolchains
- Interpreters or runtimes (Python, Java, etc.)
Conceptually:
- Direct dependencies: software you explicitly know you need.
- Transitive dependencies: dependencies of your dependencies (e.g., HDF5 needing MPI).
- Build-time vs. runtime dependencies:
- Build-time: needed to compile and link.
- Runtime: needed when you actually execute the program.
On a cluster:
- System-level tools (modules, Spack, EasyBuild) help admins manage these.
- For users, the main concept is: when installing software, ensure the required dependencies are available (often by loading modules first), and ensure your build uses them correctly.
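The build-time/runtime distinction looks like this in practice, assuming the cluster's HDF5 module sets the compiler search paths; the module name and source file are hypothetical:

```bash
# Build time: the dependency must be loaded so headers and libs are found
module load hdf5/1.14
mpicc -o writer writer.c -lhdf5

# Run time: the same library must still be visible (e.g., load the module
# in your job script), or ./writer will fail with a missing .so error
module load hdf5/1.14
./writer
```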
Installation in interpreted environments (Python, R, etc.)
While full details are covered elsewhere, some concepts are common:
- Base runtime vs. add-on packages:
- Python interpreter vs. `pip`/`conda` packages.
- R interpreter vs. CRAN/Bioconductor packages.
- Virtual environments / isolated environments:
- User-controlled installation locations isolated from system-wide packages.
- Avoid conflicts between different project requirements.
- Binary wheels vs. source builds:
- Install from prebuilt binary wheels when possible (fast, easy).
- Fall back to source builds when necessary (requires compilers and dev libraries).
On HPC clusters, you often:
- Load a provided base runtime (e.g., `module load python/3.11`).
- Create project-specific environments in your home or project space.
- Install packages into those environments without root access.
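A common sketch of this pattern (the module name and environment path are hypothetical):

```bash
# Start from an admin-provided interpreter
module load python/3.11

# Create an isolated environment in project space (path is hypothetical)
python -m venv /project/mygroup/envs/myproj

# Activate it and install packages into it; no root needed
source /project/mygroup/envs/myproj/bin/activate
pip install numpy h5py
```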
Build configurations and reproducibility
Even without going into specific tools, some general concepts are important:
- Configuration files or scripts:
- Record compiler choices, optimization flags, MPI location, library paths.
- Make builds reproducible and easier to share with others.
- Build types:
- Debug vs. optimized (covered more deeply elsewhere).
- Documentation of the build:
- Keep notes or scripts of:
- The modules you loaded.
- The configure options or CMake options used.
- The exact versions (git tags, release numbers).
Reproducible installation is crucial in HPC for:
- Comparing performance across systems.
- Debugging and verifying scientific results.
- Helping admins or collaborators reproduce your environment.
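One simple habit is to capture the whole build in a small script rather than typing commands interactively; a sketch, with all names, versions, and options hypothetical:

```bash
#!/bin/bash
# build-mytool.sh: records modules, versions, and options (all hypothetical)
set -e

module purge
module load gcc/11.3 openmpi/4.1.6

VERSION=1.2
PREFIX=$HOME/software/mytool-$VERSION

./configure --prefix=$PREFIX --enable-mpi CFLAGS="-O2"
make -j 8
make install

# Snapshot the modules used (module list prints to stderr on most systems)
module list 2> "$PREFIX/build-modules.txt"
```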
Permissions, quotas, and shared spaces
When deciding where and how to install software, consider:
- Filesystem permissions:
- You cannot write to system directories.
- You can write to your home directory and possibly group/project directories.
- Quotas:
- Home space might be small and backed up.
- Project space is larger but may have different policies.
- Shared vs. personal installs:
- Personal installs: in your home or user-owned directory.
- Group installs: in a project directory where multiple users need access.
- Requires appropriate UNIX permissions or access control lists (ACLs).
Conceptually, installing software for shared use means:
- Agreeing on a shared prefix.
- Ensuring directory permissions allow others to read/execute (and possibly write, if they co-maintain the install).
- Documenting how to use it (e.g., small shell scripts or environment modulefiles).
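In practice this is mostly standard UNIX permissions; a sketch, with the path and group name hypothetical:

```bash
# Hypothetical shared prefix owned by the group 'mygroup'
PREFIX=/project/mygroup/software/mytool-1.2

# Group members can read and execute; others get no access
chgrp -R mygroup "$PREFIX"
chmod -R g+rX,o-rwx "$PREFIX"

# Or grant a specific co-maintainer write access via an ACL
setfacl -R -m u:alice:rwX "$PREFIX"
```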
High-level workflow for installing user software on an HPC cluster
Putting these concepts together, a typical logical workflow is:
- Identify what you need:
- Application name and version.
- Required libraries and runtimes.
- Check system-provided options:
- Use environment modules or documentation to see if admins already provide it.
- Plan your installation:
- Choose prefix: personal or project-level.
- Decide whether to use binaries or build from source.
- Prepare the environment:
- Load compatible modules (compiler, MPI, libraries).
- Ensure dependencies are present.
- Install:
- For source: configure, build, install into your prefix.
- For Python/R: create an environment and install packages there.
- Expose the result:
- Adjust `PATH`, `LD_LIBRARY_PATH`, and other variables.
- Optionally create a simple modulefile or wrapper script.
- Document:
- Record the steps, versions, and environment so you and others can reproduce it.
These are the fundamental concepts that underpin how software is installed and managed in Linux-based HPC environments, regardless of the specific tools used.