Goals of this Chapter
By the end of this chapter, you should be able to:
- Understand why Linux is the default environment for HPC systems.
- Recognize common characteristics of HPC-oriented Linux installations.
- Log in to a remote Linux machine and work with a shell.
- Navigate the filesystem and manipulate files and directories.
- Understand how software is typically provided and managed on clusters.
- Know, at a high level, how environment modules and software stacks fit together.
Topics that have their own sections later (e.g., “Environment modules”, which appears both in this chapter and under “Reproducibility and Software Environments”) are only introduced here and explored more fully in their dedicated chapters.
Operating Systems in HPC: Why Linux?
Most modern HPC systems run a variant of Linux. This is not accidental; it follows from several practical needs in large-scale computing:
- Open-source and customizable
- System vendors and HPC centers can modify the kernel, drivers, and system services.
- Features not needed for batch, non-interactive workloads can be removed or disabled for better performance and predictability (e.g., reducing background tasks or desktop-related services).
- Ecosystem and tool support
- Compilers, debuggers, profilers, MPI libraries, numerical libraries, and job schedulers are all primarily developed and tested on Linux.
- Many scientific and engineering codes assume a POSIX/Linux environment.
- Scalability and manageability
- Linux supports clustering tools, configuration management, and monitoring at very large scales.
- System administrators can automate deployment and updates across thousands of nodes.
- Hardware support
- New CPUs, interconnects, GPUs, and accelerators almost always provide Linux drivers and software stacks first.
- Cost and licensing
- As a free and open-source OS, Linux avoids per-node license costs and allows flexible deployment across many nodes.
In practice, HPC clusters are usually built on top of specialized Linux distributions or derivatives (e.g., CentOS/AlmaLinux/Rocky Linux, Ubuntu LTS, SUSE, or vendor-specific HPC stacks). They are tuned for:
- Batch workloads (through job schedulers)
- Parallel file systems
- High-speed network interconnects
- Security and multi-user environments
Working in a Linux Shell on HPC Systems
HPC users typically interact with the system via a shell (command-line interface), not a graphical desktop. Even if you have used Linux on a laptop with a graphical desktop, an HPC cluster is usually accessed as a remote text-based environment.
Remote Access: SSH
To reach the HPC system, you normally connect from your local machine to a login node using ssh (Secure Shell). A typical connection command has the form:
:::bash
ssh username@cluster.example.edu

Key characteristics:
- Secure, encrypted connection
Both your commands and the data you see are encrypted.
- Authentication methods
- Passwords (less common on large clusters).
- SSH key pairs (common and recommended).
- Institutional single sign-on or multi-factor authentication.
- No computation on your local machine
Once connected, commands run on the remote login node, not your local computer.
Graphical or file transfer tools often use SSH under the hood (e.g., scp, sftp, or GUI tools like WinSCP or MobaXterm).
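For example, a few common transfer commands run from your local machine; the hostname and directory names below are placeholders for your own site and project layout:
:::bash
# Copy a local input file to your home directory on the cluster
scp input.dat username@cluster.example.edu:~/project/

# Copy a results directory from the cluster back to your local machine
scp -r username@cluster.example.edu:~/project/results ./

# rsync transfers only files that have changed, which is useful for repeated syncs
rsync -av data/ username@cluster.example.edu:~/project/data/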
The Shell Environment
When you log in, you are placed in a shell such as bash or zsh.
Key ideas relevant for HPC:
- Non-interactive vs interactive shells
- Interactive shells are used when you type commands directly.
- Non-interactive shells run scripts (e.g., job scripts in the scheduler).
- Prompt and current working directory
Your shell prompt often encodes your username, hostname, and current directory.
Example prompt:
:::text
user@login1:~/project$
- Configuration files
Shell startup files like ~/.bashrc, ~/.bash_profile, or ~/.profile are often used to customize your environment. On shared clusters, these files should remain lightweight:
- Avoid heavy initialization (e.g., loading many modules, starting long-running processes).
- Avoid interactive prompts or commands that might break batch jobs.
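A minimal sketch of a batch-safe ~/.bashrc, assuming bash; the guard keeps interactive-only settings (the editor choice and alias here are just examples) out of non-interactive shells such as those started by batch jobs:
:::bash
# ~/.bashrc -- keep it lightweight on shared clusters

# Safe for both interactive and non-interactive shells
export EDITOR=nano

# Stop here for non-interactive shells (e.g., shells started by batch jobs)
[[ $- != *i* ]] && return

# Interactive-only conveniences below this line
alias ll='ls -lh'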
Filesystems and Directory Structures in HPC Linux
While the detailed operation of parallel filesystems is covered elsewhere, it is important to understand how directories are typically organized on a Linux-based HPC cluster and what this implies for your day-to-day work.
Key Directories
Some common top-level directories:
- / – the root of the filesystem.
- /home – usually contains users’ home directories:
- /home/username is typically where you land when you log in.
- Backups might be enabled; quotas (limits on storage usage) often apply.
- /scratch or /work – one or more large, high-performance spaces used for temporary or working data:
- Often not backed up but optimized for throughput and parallel access.
- May have automatic cleanup policies (e.g., delete files older than N days).
- /project or /group – shared project spaces:
- Designed for collaboration within a group or project.
- Larger quotas than home directories.
- /usr, /opt, /apps – locations where system-wide software is installed:
- System compilers, MPI libraries, numerical libraries, and applications.
- These may be managed by environment modules.
The exact names and policies vary between clusters; always consult your site’s documentation.
Your Home Directory
Common properties on HPC clusters:
- Network-mounted: accessible from multiple nodes (so your data and configuration are visible across login and compute nodes).
- Quota-limited: there is often a strict storage limit.
- Best used for:
- Source code.
- Build scripts and job scripts.
- Small input data and configuration files.
- Not ideal for:
- Large simulation outputs.
- Many small temporary files created at scale.
Working (Scratch) Directories
High-performance work often happens in a scratch or work space:
- High bandwidth and low latency for parallel I/O.
- Designed to handle:
- Large files.
- High concurrency (many processes reading/writing).
- Policies:
- Limited retention time; files older than a threshold may be automatically deleted.
- Typically no backup; you are responsible for moving important data to safer storage.
A common workflow is:
- Prepare job scripts and small inputs in your home or project directory.
- Copy or link data to a scratch area.
- Run computational jobs there.
- Copy essential results back to home or project space.
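A sketch of this workflow as plain shell commands, assuming a scratch area at /scratch/$USER and a project space at /project/mygroup (both placeholders; adjust to your site's layout and naming):
:::bash
# 1. Prepare a working directory in scratch
mkdir -p /scratch/$USER/run01

# 2. Copy (or link) the inputs from project space
cp /project/mygroup/inputs/config.in /scratch/$USER/run01/

# 3. Run the job from the scratch directory (normally via the scheduler)
cd /scratch/$USER/run01

# 4. After the job finishes, copy essential results back to safer storage
cp results.dat /project/mygroup/results/run01_results.dat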
Permissions and Multi-User Considerations
Linux access control on HPC systems is a key part of the multi-user environment:
- Each file and directory has an owner, group, and permission bits for:
- user (u), group (g), and others (o).
- Permissions determine who can read (r), write (w), or execute (x).
Typical patterns:
- Default group membership is often tied to your primary research group.
- Shared project directories may be configured with group permissions to facilitate collaboration.
- Some centers enforce default group settings using special flags (like the setgid bit on directories), so newly created files inherit group ownership.
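As an illustration, the commands below set up a shared directory with group access and the setgid bit; the group name mygroup and the paths are hypothetical:
:::bash
# Give the research group read/write/traverse access, and keep others out
chmod u=rwx,g=rwx,o= /project/mygroup/shared

# Set the setgid bit so files created inside inherit the directory's group
chmod g+s /project/mygroup/shared

# Change the group of an existing file so collaborators can read it
chgrp mygroup results.dat
chmod g+r results.dat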
Detailed coverage of Linux permissions and advanced group management belongs in general-purpose Linux tutorials, but as an HPC user you must be aware that:
- You share the filesystem with many other users.
- Incorrect permissions can expose your data to others or prevent collaborators from accessing shared results.
Basic Linux Command Line Usage in an HPC Context
You will commonly use a subset of Linux commands focused on:
- Navigating the filesystem.
- Inspecting and editing text files.
- Managing data transfers.
- Running and monitoring processes (when allowed on login nodes).
The specific command names and behavior are standard Linux; what is particular to HPC is how and where you use them.
Navigation and File Management
Core commands you will likely use daily:
- pwd – show your current working directory.
- ls – list files in a directory.
- cd – change directory.
- cp – copy files and directories.
- mv – move or rename files.
- rm – remove files.
- mkdir – create directories.
- rmdir – remove empty directories.
Some HPC-specific considerations:
- Be careful with rm -r or rm -rf, especially in shared spaces and scratch directories.
- Large directory trees with many files can make ls and other tools slow; consider using more targeted operations (find with restrictions, careful directory layouts).
- Cluster policies may limit certain operations on login nodes if they stress shared filesystems (e.g., running scripts that create millions of small files).
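For example, in a directory containing very many files, a targeted find is often gentler on the shared filesystem than listing everything; the name pattern here is only illustrative:
:::bash
# List only recent log files in the current directory, without descending further
find . -maxdepth 1 -name '*.log' -mtime -2

# Count files instead of printing them all
find . -maxdepth 1 -type f | wc -l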
Viewing and Editing Text Files
Most of your interaction with code and job scripts will be via text files. Common tools include:
- cat, less, head, tail – to view the contents of files.
- Text editors:
- nano – typically the easiest for beginners.
- vim, emacs – powerful, but with a steeper learning curve.
On HPC clusters, you often edit:
- Source files (.c, .cpp, .f90, .py, etc.).
- Job scripts for the scheduler.
- Configuration files (e.g., .bashrc, application input files).
Because many clusters are accessed over relatively slow network connections:
- Editors that minimize screen redraws (e.g., vim, nano) work better than heavy graphical tools.
- Some institutions encourage local editing on your machine combined with file synchronization (scp, rsync, Git) to reduce interactive editing time on the cluster.
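A sketch of that edit-locally, sync-to-cluster pattern, run from your local machine (the hostname, project path, and excluded directory are placeholders):
:::bash
# Push local edits to the cluster, skipping build artifacts
rsync -av --exclude 'build/' ~/myproject/ username@cluster.example.edu:~/myproject/

# Or keep the project in Git and pull it on the cluster
ssh username@cluster.example.edu 'cd ~/myproject && git pull'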
Managing Processes on Login Nodes
While full job scheduling is covered later, there are some important rules and common commands on Linux login nodes:
- Login nodes are shared among many users and are not intended for heavy computation.
- Short, light tasks are usually acceptable:
- Compiling small programs.
- Running quick tests.
- Editing files, checking logs, using version control.
Common commands:
- ps, top, htop – list or monitor processes. top and htop might be restricted on some systems for privacy reasons; usage policies vary.
- kill – terminate your own processes.
- ulimit – view limits set on processes (e.g., maximum memory, stack size), often configured by the HPC center.
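A few typical login-node checks, limited to your own processes (the PID shown is a placeholder):
:::bash
# Show your own processes on this login node
ps -u $USER -o pid,etime,%cpu,%mem,cmd

# Terminate one of your processes by its PID (replace 12345 with the real PID)
kill 12345

# If it ignores the polite request, force it
kill -9 12345

# Show the current resource limits for your shell
ulimit -a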
For any long or CPU-intensive work, you should use the job scheduler instead of running directly on the login node.
Software Installation Concepts on HPC Linux
Unlike a personal Linux machine, an HPC cluster usually does not allow users to install software system-wide with sudo or the system package manager. Instead, software management is structured differently to support:
- Multiple versions of the same tool (e.g., several compiler versions).
- Different builds for different CPUs, GPUs, or interconnects.
- Isolation between users and projects.
System-Wide vs User-Level Software
Two broad levels of software on an HPC Linux system:
- System-provided software (managed by administrators):
- Core compilers.
- MPI libraries.
- Math and scientific libraries.
- Popular applications (e.g., GROMACS, LAMMPS, or VASP, depending on the site).
- Job scheduler commands.
- User-installed software (within your home or project directories):
- You build and install software in a directory you control, such as:
- ~/software
- /project/mygroup/software
- You adjust your environment (PATH, LD_LIBRARY_PATH, etc.) to use these.
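A minimal sketch of a user-level install into your home directory, assuming a classic configure/make build; the package directory name and install prefix are placeholders:
:::bash
# Build and install into a prefix you own
./configure --prefix=$HOME/software/mytool
make -j 4
make install

# Make the tool visible in this shell session
export PATH=$HOME/software/mytool/bin:$PATH
export LD_LIBRARY_PATH=$HOME/software/mytool/lib:$LD_LIBRARY_PATH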
Interaction between these levels is typically managed through environment modules.
Package Managers and Build Systems
Although you usually cannot run system package managers like apt or yum on the cluster yourself (because you lack administrative privileges), you may see:
- Admins using those tools behind the scenes to provide software.
- Users relying on language-specific or user-level package managers:
- pip (Python)
- conda or mamba
- R’s install.packages()
- spack (an HPC-oriented package manager)
- brew in some user spaces
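For example, Python packages can usually be installed without administrative rights, either into a virtual environment in your home directory or into your per-user site directory (the environment path and package are illustrative):
:::bash
# Option 1: a virtual environment in your home directory
python3 -m venv ~/envs/myproject
source ~/envs/myproject/bin/activate
pip install numpy

# Option 2: install into your per-user site-packages
pip install --user numpy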
Compilers and build systems (e.g., make, cmake) are standard Linux tools, but used in combination with environment modules and job-scheduler-friendly workflows.
Environment Modules (Overview)
Environment modules provide a flexible way to select which compilers, libraries, and applications you want to use in your current shell session. They are particularly important in HPC because:
- Multiple versions and toolchains must coexist.
- Different applications may require incompatible dependencies.
- You often need to reproduce exactly the same software environment across multiple runs.
Common module systems include:
- Environment Modules (the classic module command).
- Lmod (a Lua-based module system).
Typical commands:
- module avail – list available modules.
- module load name – add a module to your environment.
- module list – see currently loaded modules.
- module unload name – remove a module.
- module purge – remove all loaded modules.
Internally, modules adjust environment variables such as:
- PATH – where the shell looks for executables.
- LD_LIBRARY_PATH – where dynamic libraries are searched for.
- CPATH, LIBRARY_PATH, and others.
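A typical session might look like the following; the module names and version numbers are site-specific and shown only as placeholders:
:::bash
# Start from a clean environment
module purge

# Load a compiler and an MPI library (names and versions vary by site)
module load gcc/12.2.0
module load openmpi/4.1.5

# Confirm what is loaded and what it changed
module list
which mpicc
echo $PATH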
This chapter only introduces the concept. Later chapters on environment modules and on Reproducibility and Software Environments will cover:
- How modules interact with job scripts.
- Strategies for recording and reproducing module sets.
- Using modules alongside containers and software stacks.
The Linux Environment in Batch and Interactive Jobs
Once you start using the job scheduler (discussed later), you interact with Linux in two main modes:
- Interactive jobs
You request resources from the scheduler and get a shell on a compute node, where you can:
- Run interactive tests.
- Debug performance issues.
- Work within the same environment that your batch jobs will use.
- Batch jobs
You submit a job script describing:
- Resources required (CPUs, memory, GPUs).
- Commands to run (e.g., srun, mpirun, python script.py).
In both cases, job scripts often:
- Load required modules.
- Set shell environment variables.
- Create or navigate to appropriate directories in the filesystem.
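As a sketch, such a job script might look like the following; the #SBATCH directives assume a Slurm-like scheduler (covered later), and the module versions, directories, and application name are placeholders:
:::bash
#!/bin/bash
#SBATCH --job-name=example
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --time=00:30:00

# Load the software environment needed by this job
module purge
module load gcc/12.2.0 openmpi/4.1.5

# Set job-specific environment variables and move to the working directory
export OMP_NUM_THREADS=1
cd /scratch/$USER/run01

# Run the application
srun ./my_application input.cfg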
The Linux environment inside these jobs is closely related to your login environment, but not identical:
- Some startup files may or may not be sourced.
- The scheduler may set certain environment variables (e.g., job ID, list of assigned nodes).
- PATH and LD_LIBRARY_PATH may differ depending on how you write the job script.
Understanding these differences is crucial for debugging problems where applications behave differently when run interactively vs under the scheduler.
Best Practices for Working in the Linux HPC Environment
To use the Linux environment on an HPC system effectively and politely:
- Respect login node policies
- Use them only for light tasks.
- Run heavy computations and large parallel jobs through the scheduler.
- Organize your filesystem usage
- Use home for code and small files.
- Use project and scratch spaces for large data and temporary results.
- Clean up scratch directories regularly.
- Use environment modules intentionally
- Load only what you need.
- Keep track of which modules are required for your workflows.
- Avoid hard-coding full paths to tools when a module can provide them.
- Avoid cluttering shell startup files
- Do not load large numbers of modules or run heavy commands automatically.
- Be mindful that non-interactive shells (like those in batch jobs) may read your startup files.
- Document your environment
- Record module lists and software versions in project notes or job outputs.
- This helps with reproducibility and debugging.
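One lightweight way to do this, sketched here under the assumption that you use environment modules, is to snapshot the loaded modules and key tool locations from inside the job script:
:::bash
# Inside a job script: record the environment used for this run
module list 2>&1 | tee modules_used.txt
which python mpicc >> modules_used.txt
env | sort > environment_snapshot.txt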
These habits make it easier to transition from basic Linux usage to more advanced HPC workflows that rely on job scheduling, parallel programming, and performance optimization.