Why filesystems matter in HPC
On an HPC system, you will rarely interact with disks directly. Instead, you work with:
- A filesystem: how files and directories are organized and accessed.
- A directory structure: how those files and directories are laid out in a tree.
Understanding this is crucial because:
- Different directories live on different types of storage (fast/slow, local/networked).
- Some locations are backed up, others are temporary and purged.
- Job schedulers and applications often expect files in specific places.
This chapter focuses on the basic concepts and layout you’ll encounter on Linux-based HPC systems.
The Unix filesystem as a tree
Linux uses a single-rooted tree filesystem:
- The root of the tree is `/`
- Everything else is a directory or file under `/`
For example:
```
/
├── bin
├── boot
├── dev
├── etc
├── home
│   ├── alice
│   └── bob
├── lib
├── opt
├── tmp
├── usr
└── var
```
There are no separate drive letters (like C: on Windows). Additional disks or network filesystems are mounted into this tree (e.g. /home, /scratch, /projects).
Absolute vs relative paths
A path describes where a file or directory is located in the tree.
- Absolute path: starts from `/`, always valid regardless of where you are.
  - Examples: `/home/alice`, `/scratch/alice/job1/output.txt`
- Relative path: starts from your current directory.
  - If you are in `/home/alice`, then `projects` refers to `/home/alice/projects`.
Special entries:
- `.` : current directory
- `..` : parent directory
Examples:
- `./script.sh` — `script.sh` in the current directory
- `../data` — the `data` directory one level up
- `../../input` — go up two levels, then into `input`
In scripts and job files, prefer absolute paths to avoid ambiguity.
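The difference is easiest to see in a short shell session. This is a sketch using a throwaway directory tree; the names `alice` and `projects` are made up for illustration:

```shell
# Build a small throwaway tree to experiment in (hypothetical names)
base=$(mktemp -d)
mkdir -p "$base/home/alice/projects"

cd "$base/home/alice"

# Absolute path: valid no matter where you currently are
ls "$base/home/alice/projects"

# Relative path: resolved against the current directory
ls projects        # same directory as the absolute listing above
cd projects
cd ..              # .. moves one level up, back to alice
pwd                # prints the absolute path of the current directory
```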
Key user directories on HPC systems
Exact names vary by cluster, but you’ll commonly see:
- `/home/USERNAME` — your home directory
  - Typically small quota
  - Backed up
  - Best for source code, small scripts, configuration files
- `/scratch/USERNAME` or similar — scratch / temporary storage
  - Much larger, high-performance
  - Often not backed up
  - May be purged after N days
  - Intended for large input/output, temporary results
- `/project/PROJECTNAME` or `/work/PROJECTNAME` — project storage
  - Shared by a group
  - Quota per project
  - Sometimes backed up, sometimes not (depends on system)
- `/tmp` — system-wide temporary directory
  - Local or shared
  - Not a safe long-term location
On many systems, there is also local scratch per node:
- Something like `/local_scratch` or `/tmp` on compute nodes
- Accessible only during the job
- Very fast, but contents disappear once the job finishes or the node reboots
Always read your site documentation to know what each location is for.
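One quick way to see which filesystem backs a given location is `df`. This is a sketch; the exact mount points and filesystem names vary by site, so substitute the paths your cluster actually uses:

```shell
# Show which filesystem (and how much free space) backs a directory.
# The "Filesystem" column reveals whether it is local or networked.
df -h "$HOME"
df -h /tmp
```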
The Linux directory hierarchy: what you’ll actually use
For HPC beginners, you do not need to memorize the entire FHS (Filesystem Hierarchy Standard), but you should recognize:
- `/` — root of the filesystem
- `/home` — user home directories
- `/scratch`, `/work`, `/project` — HPC-specific data areas
- `/bin`, `/usr/bin` — executables (system tools and common utilities)
- `/lib`, `/usr/lib` — system libraries
- `/etc` — system configuration (read-only for regular users)
- `/tmp` — temporary files
- `/opt` — optional / third-party software installs
- `/usr/local` — locally installed software (often site-specific HPC software)
As a normal user, you typically:
- Read from many of these
- Write only in:
  - Your home directory
  - Assigned scratch/work/project directories
  - Temporary locations like `/tmp` (where allowed)
Your working context: HOME and current directory
Two concepts matter a lot when navigating and running jobs:
- Home directory: where you “live”
  - Shown by `echo $HOME`
  - Usually `/home/USERNAME`
- Current working directory (CWD): where you are right now
  - Shown by `pwd` (covered in the command-line chapter)
  - Changes as you move around (e.g. with `cd`)
Many tools and job schedulers:
- Start from your home directory when you log in
- Use your current directory as the default location for relative paths
In job scripts, it’s common to include something like:
`cd $SLURM_SUBMIT_DIR` or `cd /scratch/$USER/myproject` to ensure the job runs in the right spot.
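As a sketch, a fragment like the following could appear near the top of a Slurm job script. Under Slurm, `$SLURM_SUBMIT_DIR` is set by the scheduler to the directory the job was submitted from; the fallback to `$PWD` here is only so the fragment also runs outside a job:

```shell
# Change into the submission directory (sketch of a Slurm job-script fragment).
# $SLURM_SUBMIT_DIR is set by Slurm inside a job; fall back to the current
# directory when testing this interactively.
workdir=${SLURM_SUBMIT_DIR:-$PWD}
cd "$workdir"
echo "Running in: $(pwd)"
```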
File types you’ll encounter
Linux distinguishes several file types; two are most important for beginners:
- Regular files
- Text files, executables, binaries, logs, etc.
- Directories
- Containers for files and other directories
Others you may see (mostly system-level):
- Symbolic links (symlinks)
- Devices
- Named pipes, sockets
For HPC usage, be aware of symbolic links: they can point across filesystems, which can be handy (e.g. a link in your home directory that points to a scratch directory).
Symbolic links and paths
A symbolic link (or symlink) is a special file that points to another path.
For example:
```
/home/alice
└── data -> /scratch/alice/data
```
Here, data is a symlink. When you access /home/alice/data, you actually access /scratch/alice/data.
Why this matters in HPC:
- You can keep a stable path in your home directory that points to a large, fast storage area.
- Moving large directories between storage areas can be replaced by:
- Moving the data once
- Updating a symlink
Be aware that:
- Symlinks can break if the target is removed or renamed.
- Tools that operate on directories might follow symlinks and traverse into scratch or network filesystems unexpectedly.
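A minimal sketch of this pattern, using temporary directories to stand in for a home directory and a scratch area (on a real system these would be something like `/home/$USER` and `/scratch/$USER`):

```shell
# Stand-ins for a home directory and a scratch area (hypothetical paths)
scratch=$(mktemp -d)
homedir=$(mktemp -d)
mkdir -p "$scratch/data"

# Create a stable link in "home" that points at the scratch data
ln -s "$scratch/data" "$homedir/data"

ls -l "$homedir/data"            # the listing shows: data -> <scratch path>
touch "$homedir/data/input.txt"  # actually writes into the scratch area

# Caveat: removing or renaming "$scratch/data" would leave the
# symlink dangling (pointing at a path that no longer exists).
```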
Directory structures for HPC projects
A clear directory layout helps with:
- Keeping large files off your limited home space
- Making job scripts and workflows more maintainable
- Sharing code and data across jobs
A simple, HPC-friendly pattern:
```
/home/USERNAME/
└── projects/
    └── mycode/          # source code, scripts, small config

/scratch/USERNAME/
└── mycode-runs/
    ├── input/           # large input data
    ├── output/          # job results
    └── logs/            # job logs, stdout/stderr
```

You might then:
- Edit and version-control code in `/home/USERNAME/projects/mycode`
- Copy or link input data into `/scratch/USERNAME/mycode-runs/input`
- Point job scripts at `/scratch/USERNAME/mycode-runs/output` for result files
Over time, you can evolve this into more complex structures, but the main idea is to separate code/config (small) from data/results (large) and put large items on appropriate storage.
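Setting up this layout is a few `mkdir -p` calls. A sketch, using temporary directories in place of the real roots (in real use, replace them with `/home/$USER` and `/scratch/$USER`):

```shell
# Stand-ins for the real storage roots (hypothetical)
home_root=$(mktemp -d)
scratch_root=$(mktemp -d)

# Code and config live on small, backed-up home storage
mkdir -p "$home_root/projects/mycode"

# Data and results live on large, fast scratch storage
mkdir -p "$scratch_root/mycode-runs/input" \
         "$scratch_root/mycode-runs/output" \
         "$scratch_root/mycode-runs/logs"
```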
Hidden files and configuration
Files and directories whose names start with `.` are considered hidden.
Examples:
- `~/.bashrc`
- `~/.ssh/`
- `~/.config/`
They are widely used for:
- Shell configuration
- Application settings
- Credentials (e.g. SSH keys)
On HPC systems:
- You may need to edit or create some of these for your environment.
- They live in your home directory and are usually small (which is good for quotas).
Hidden directories follow the same rules as any others; they’re just not shown by default in some listings or tools.
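A quick demonstration of the difference between a plain listing and one that includes hidden entries (the file names here are made up):

```shell
# Make a throwaway directory with one visible and one hidden file
dir=$(mktemp -d)
cd "$dir"
touch visible.txt .hidden_config

ls      # lists only visible.txt
ls -a   # also lists .hidden_config (plus the . and .. entries)
```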
Ownership, permissions, and filesystem boundaries (at a high level)
Full details of permissions and user management belong elsewhere, but a few filesystem-specific points matter:
- What you can read/write in a directory depends on ownership and permissions.
- Different mounted filesystems (for example, `/home` vs `/scratch`) may:
  - Use different quotas (limits on space and number of files)
  - Have different backup and retention policies
  - Be accessed differently from login vs compute nodes
Quotas are enforced per user or per project on a filesystem; they’re an attribute of the filesystem, not of individual directories alone.
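Ownership can always be inspected with `ls -ld`; quota reporting, by contrast, depends on the filesystem, so the quota commands in the comments below are common examples rather than universal ones (check your site documentation for the exact command):

```shell
# Inspect ownership and permissions on a directory you use.
# The first column shows the type and mode; the third and fourth
# show the owning user and group.
ls -ld "$HOME"

# Quota commands differ per filesystem; typical examples include:
#   quota -s                        # classic per-user quota report
#   lfs quota -u $USER /scratch     # Lustre filesystems
```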
Common HPC filesystem layouts
An HPC cluster often has multiple filesystems mounted in the same namespace. A simplified example:
```
/
├── home        # network filesystem A
│   ├── alice
│   └── bob
├── scratch     # parallel filesystem B
│   ├── alice
│   └── bob
├── project     # project-oriented filesystem C
│   ├── proj1
│   ├── proj2
│   └── proj3
└── software    # shared application stack
    ├── compilers
    ├── mpi
    └── apps
```

They may be backed by different technologies (NFS, Lustre, GPFS, etc.), but from your perspective:
- They look like normal directories
- Performance and behavior differ
From a usage standpoint:
- Don’t put big data in `/home`
- Do put large working sets and I/O-heavy files in `/scratch` or the equivalent
- Organize shared work in `/project/PROJECTNAME`
Directory structures in batch jobs
Jobs typically:
- Start on a compute node
- Have some current working directory (often where you submitted the job)
Because filesystem performance and layout matter:
- It is common to copy input data from slow/archival storage to fast scratch at the start of a job.
- At the end of a job, you may copy back results or summaries to your home or project space.
A typical directory-aware job flow:
- Submit the job from `/home/USERNAME/projects/mycode`
- In the job script:
  - `cd /scratch/USERNAME/mycode-runs/run001`
  - Copy or link input from project storage
  - Run the application, writing output locally
  - Copy final results or selected files back to `/home` or `/project`
This minimizes unnecessary load on shared home or archival filesystems.
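The stage-in / compute / stage-out pattern can be sketched as follows. All paths are hypothetical stand-ins (created with `mktemp` so the sketch runs anywhere); a real job script would use your site's project and scratch locations, and the `tr` command is only a placeholder for the real application:

```shell
# Stand-ins for /project/PROJECTNAME and /scratch/$USER (hypothetical)
project=$(mktemp -d)
scratch=$(mktemp -d)

run="$scratch/run001"
mkdir -p "$run"

# Stage in: copy input from project storage to fast scratch
echo "input data" > "$project/input.dat"
cp "$project/input.dat" "$run/"

# Compute: run in scratch, writing output locally
cd "$run"
tr a-z A-Z < input.dat > output.dat   # placeholder for the real application

# Stage out: copy only the results back to project storage
cp output.dat "$project/"
```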
Practical conventions and tips
- Use descriptive directory names for runs: `run001`, `run-strong-scaling-64`, `test-small`, etc.
- Keep logs and output in separate subdirectories to avoid clutter.
- Avoid extremely deep or overly nested trees that are hard to navigate.
- Avoid millions of tiny files in one directory; many filesystems handle this poorly. Group small files into subdirectories, or use archive formats when appropriate.
- Use absolute paths in scripts when in doubt, especially for critical input/output directories.
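For the many-small-files problem, one common approach is to bundle a run directory into a single archive with `tar`. A sketch, using a throwaway directory with made-up file names:

```shell
# Bundle a directory of many small files into one archive, so the
# filesystem sees one large file instead of thousands of tiny ones.
work=$(mktemp -d)
mkdir -p "$work/run001"
for i in 1 2 3; do echo "$i" > "$work/run001/part$i.txt"; done

cd "$work"
tar czf run001.tar.gz run001/   # create a compressed archive
tar tzf run001.tar.gz           # list its contents without extracting
```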
Understanding how Linux filesystems and directory structures are organized on your HPC system will help you:
- Avoid running out of space
- Place data where it performs best
- Write more robust, portable job scripts and workflows.