Why filesystems matter in HPC
On an HPC system, you will rarely interact with disks directly. Instead, you work with:
- A filesystem: how files and directories are organized and accessed.
- A directory structure: how those files and directories are laid out in a tree.
Understanding this is crucial because:
- Different directories live on different types of storage (fast/slow, local/networked).
- Some locations are backed up, others are temporary and purged.
- Job schedulers and applications often expect files in specific places.
This chapter focuses on the basic concepts and layout you’ll encounter on Linux-based HPC systems.
The Unix filesystem as a tree
Linux uses a single-rooted tree filesystem:
- The root of the tree is `/`
- Everything else is a directory or file under `/`
For example:
```
/
├── bin
├── boot
├── dev
├── etc
├── home
│   ├── alice
│   └── bob
├── lib
├── opt
├── tmp
├── usr
└── var
```
There are no separate drive letters (like C: on Windows). Additional disks or network filesystems are mounted into this tree (e.g. /home, /scratch, /projects).
Absolute vs relative paths
A path describes where a file or directory is located in the tree.
- Absolute path: starts from `/`, always valid regardless of where you are.
  - Examples: `/home/alice`, `/scratch/alice/job1/output.txt`
- Relative path: starts from your current directory.
  - If you are in `/home/alice`, then `projects` refers to `/home/alice/projects`.
Special entries:
- `.` : current directory
- `..` : parent directory
Examples:
- `./script.sh` — `script.sh` in the current directory
- `../data` — the `data` directory one level up
- `../../input` — go up two levels, then into `input`
In scripts and job files, prefer absolute paths to avoid ambiguity.
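The difference is easiest to see in a short shell session. This is a sketch using a throwaway directory tree; the names `alice` and `projects` are made up for illustration:

```shell
# Build a small throwaway tree to experiment in (hypothetical names)
base=$(mktemp -d)
mkdir -p "$base/home/alice/projects"

cd "$base/home/alice"

# Absolute path: valid no matter where you currently are
ls "$base/home/alice/projects"

# Relative path: resolved against the current directory
ls projects        # same directory as the absolute listing above
cd projects
cd ..              # .. moves one level up, back to alice
pwd                # prints the absolute path of the current directory
```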
Key user directories on HPC systems
Exact names vary by cluster, but you’ll commonly see:
- `/home/USERNAME` — your home directory
  - Typically small quota
  - Backed up
  - Best for source code, small scripts, configuration files
- `/scratch/USERNAME` or similar — scratch / temporary storage
  - Much larger, high-performance
  - Often not backed up
  - May be purged after N days
  - Intended for large input/output, temporary results
- `/project/PROJECTNAME` or `/work/PROJECTNAME` — project storage
  - Shared by a group
  - Quota per project
  - Sometimes backed up, sometimes not (depends on system)
- `/tmp` — system-wide temporary directory
  - Local or shared
  - Not a safe long-term location
On many systems, there is also local scratch per node:
- Something like `/local_scratch` or `/tmp` on compute nodes
- Accessible only during the job
- Very fast, but contents disappear once the job finishes or the node reboots
Always read your site documentation to know what each location is for.
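One quick way to see which filesystem backs a given location is `df`. This is a sketch; the exact mount points and filesystem names vary by site, so substitute the paths your cluster actually uses:

```shell
# Show which filesystem (and how much free space) backs a directory.
# The "Filesystem" column reveals whether it is local or networked.
df -h "$HOME"
df -h /tmp
```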
The Linux directory hierarchy: what you’ll actually use
For HPC beginners, you do not need to memorize the entire FHS (Filesystem Hierarchy Standard), but you should recognize:
- `/` — root of the filesystem
- `/home` — user home directories
- `/scratch`, `/work`, `/project` — HPC-specific data areas
- `/bin`, `/usr/bin` — executables (system tools and common utilities)
- `/lib`, `/usr/lib` — system libraries
- `/etc` — system configuration (read-only for regular users)
- `/tmp` — temporary files
- `/opt` — optional / third-party software installs
- `/usr/local` — locally installed software (often site-specific HPC software)
As a normal user, you typically:
- Read from many of these
- Write only in:
  - Your home directory
  - Assigned scratch/work/project directories
  - Temporary locations like `/tmp` (where allowed)
Your working context: HOME and current directory
Two concepts matter a lot when navigating and running jobs:
- Home directory: where you “live”
  - Shown by `echo $HOME`
  - Usually `/home/USERNAME`
- Current working directory (CWD): where you are right now
  - Shown by `pwd` (covered in the command-line chapter)
  - Changes as you move around (e.g. with `cd`)
Many tools and job schedulers:
- Start from your home directory when you log in
- Use your current directory as the default location for relative paths
In job scripts, it’s common to include something like:
`cd $SLURM_SUBMIT_DIR` or `cd /scratch/$USER/myproject` to ensure the job runs in the right spot.
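As a sketch, a fragment like the following could appear near the top of a Slurm job script. Under Slurm, `$SLURM_SUBMIT_DIR` is set by the scheduler to the directory the job was submitted from; the fallback to `$PWD` here is only so the fragment also runs outside a job:

```shell
# Change into the submission directory (sketch of a Slurm job-script fragment).
# $SLURM_SUBMIT_DIR is set by Slurm inside a job; fall back to the current
# directory when testing this interactively.
workdir=${SLURM_SUBMIT_DIR:-$PWD}
cd "$workdir"
echo "Running in: $(pwd)"
```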
File types you’ll encounter
Linux distinguishes several file types; two are most important for beginners:
- Regular files
- Text files, executables, binaries, logs, etc.
- Directories
- Containers for files and other directories
Others you may see (mostly system-level):
- Symbolic links (symlinks)
- Devices
- Named pipes, sockets
For HPC usage, be aware of symbolic links: they can point across filesystems, which can be handy (e.g. a link in your home directory that points to a scratch directory).
Symbolic links and paths
A symbolic link (or symlink) is a special file that points to another path.
For example:
```
/home/alice
└── data -> /scratch/alice/data
```
Here, data is a symlink. When you access /home/alice/data, you actually access /scratch/alice/data.
Why this matters in HPC:
- You can keep a stable path in your home directory that points to a large, fast storage area.
- Moving large directories between storage areas can be replaced by:
- Moving the data once
- Updating a symlink
Be aware that:
- Symlinks can break if the target is removed or renamed.
- Tools that operate on directories might follow symlinks and traverse into scratch or network filesystems unexpectedly.
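A minimal sketch of this pattern, using temporary directories to stand in for a home directory and a scratch area (on a real system these would be something like `/home/$USER` and `/scratch/$USER`):

```shell
# Stand-ins for a home directory and a scratch area (hypothetical paths)
scratch=$(mktemp -d)
homedir=$(mktemp -d)
mkdir -p "$scratch/data"

# Create a stable link in "home" that points at the scratch data
ln -s "$scratch/data" "$homedir/data"

ls -l "$homedir/data"            # the listing shows: data -> <scratch path>
touch "$homedir/data/input.txt"  # actually writes into the scratch area

# Caveat: removing or renaming "$scratch/data" would leave the
# symlink dangling (pointing at a path that no longer exists).
```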
Directory structures for HPC projects
A clear directory layout helps with:
- Keeping large files off your limited home space
- Making job scripts and workflows more maintainable
- Sharing code and data across jobs
A simple, HPC-friendly pattern:
```
/home/USERNAME/
└── projects/
    └── mycode/          # source code, scripts, small config

/scratch/USERNAME/
└── mycode-runs/
    ├── input/           # large input data
    ├── output/          # job results
    └── logs/            # job logs, stdout/stderr
```

You might then:
- Edit and version-control code in `/home/USERNAME/projects/mycode`
- Copy or link input data into `/scratch/USERNAME/mycode-runs/input`
- Point job scripts at `/scratch/USERNAME/mycode-runs/output` for result files
Over time, you can evolve this into more complex structures, but the main idea is to separate code/config (small) from data/results (large) and put large items on appropriate storage.
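Setting up this layout is a few `mkdir -p` calls. A sketch, using temporary directories in place of the real roots (in real use, replace them with `/home/$USER` and `/scratch/$USER`):

```shell
# Stand-ins for the real storage roots (hypothetical)
home_root=$(mktemp -d)
scratch_root=$(mktemp -d)

# Code and config live on small, backed-up home storage
mkdir -p "$home_root/projects/mycode"

# Data and results live on large, fast scratch storage
mkdir -p "$scratch_root/mycode-runs/input" \
         "$scratch_root/mycode-runs/output" \
         "$scratch_root/mycode-runs/logs"
```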
Hidden files and configuration
Files and directories whose names start with `.` are considered hidden.
Examples:
- `~/.bashrc`
- `~/.ssh/`
- `~/.config/`
They are widely used for:
- Shell configuration
- Application settings
- Credentials (e.g. SSH keys)
On HPC systems:
- You may need to edit or create some of these for your environment.
- They live in your home directory and are usually small (which is good for quotas).
Hidden directories follow the same rules as any others; they’re just not shown by default in some listings or tools.
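A quick demonstration of the difference between a plain listing and one that includes hidden entries (the file names here are made up):

```shell
# Make a throwaway directory with one visible and one hidden file
dir=$(mktemp -d)
cd "$dir"
touch visible.txt .hidden_config

ls      # lists only visible.txt
ls -a   # also lists .hidden_config (plus the . and .. entries)
```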
Ownership, permissions, and filesystem boundaries (at a high level)
Full details of permissions and user management belong elsewhere, but a few filesystem-specific points matter:
- What you can read/write in a directory depends on ownership and permissions.
- Different mounted filesystems (for example, `/home` vs `/scratch`) may:
  - Use different quotas (limits on space and number of files)
  - Have different backup and retention policies
  - Be accessed differently from login vs compute nodes
Quotas are enforced per user or per project on a filesystem; they’re an attribute of the filesystem, not of individual directories alone.
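Ownership can always be inspected with `ls -ld`; quota reporting, by contrast, depends on the filesystem, so the quota commands in the comments below are common examples rather than universal ones (check your site documentation for the exact command):

```shell
# Inspect ownership and permissions on a directory you use.
# The first column shows the type and mode; the third and fourth
# show the owning user and group.
ls -ld "$HOME"

# Quota commands differ per filesystem; typical examples include:
#   quota -s                        # classic per-user quota report
#   lfs quota -u $USER /scratch     # Lustre filesystems
```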
Common HPC filesystem layouts
An HPC cluster often has multiple filesystems mounted in the same namespace. A simplified example:
```
/
├── home        # network filesystem A
│   ├── alice
│   └── bob
├── scratch     # parallel filesystem B
│   ├── alice
│   └── bob
├── project     # project-oriented filesystem C
│   ├── proj1
│   ├── proj2
│   └── proj3
└── software    # shared application stack
    ├── compilers
    ├── mpi
    └── apps
```

They may be backed by different technologies (NFS, Lustre, GPFS, etc.), but from your perspective:
- They look like normal directories
- Performance and behavior differ
From a usage standpoint:
- Don’t put big data in `/home`
- Do put large working sets and I/O-heavy files in `/scratch` or the equivalent
- Organize shared work in `/project/PROJECTNAME`
Directory structures in batch jobs
Jobs typically:
- Start on a compute node
- Have some current working directory (often where you submitted the job)
Because filesystem performance and layout matter:
- It is common to copy input data from slow/archival storage to fast scratch at the start of a job.
- At the end of a job, you may copy back results or summaries to your home or project space.
A typical directory-aware job flow:
- Submit the job from `/home/USERNAME/projects/mycode`
- In the job script:
  - `cd /scratch/USERNAME/mycode-runs/run001`
  - Copy or link input from project storage
  - Run the application, writing output locally
  - Copy final results or selected files back to `/home` or `/project`
This minimizes unnecessary load on shared home or archival filesystems.
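The stage-in / compute / stage-out pattern can be sketched as follows. All paths are hypothetical stand-ins (created with `mktemp` so the sketch runs anywhere); a real job script would use your site's project and scratch locations, and the `tr` command is only a placeholder for the real application:

```shell
# Stand-ins for /project/PROJECTNAME and /scratch/$USER (hypothetical)
project=$(mktemp -d)
scratch=$(mktemp -d)

run="$scratch/run001"
mkdir -p "$run"

# Stage in: copy input from project storage to fast scratch
echo "input data" > "$project/input.dat"
cp "$project/input.dat" "$run/"

# Compute: run in scratch, writing output locally
cd "$run"
tr a-z A-Z < input.dat > output.dat   # placeholder for the real application

# Stage out: copy only the results back to project storage
cp output.dat "$project/"
```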
Practical conventions and tips
- Use descriptive directory names for runs: `run001`, `run-strong-scaling-64`, `test-small`, etc.
- Keep logs and output in separate subdirectories to avoid clutter.
- Avoid extremely deep or overly nested trees that are hard to navigate.
- Avoid millions of tiny files in one directory; many filesystems handle this poorly. Group small files into subdirectories, or use archive formats when appropriate.
- Use absolute paths in scripts when in doubt, especially for critical input/output directories.
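For the many-small-files problem, one common approach is to bundle a run directory into a single archive with `tar`. A sketch, using a throwaway directory with made-up file names:

```shell
# Bundle a directory of many small files into one archive, so the
# filesystem sees one large file instead of thousands of tiny ones.
work=$(mktemp -d)
mkdir -p "$work/run001"
for i in 1 2 3; do echo "$i" > "$work/run001/part$i.txt"; done

cd "$work"
tar czf run001.tar.gz run001/   # create a compressed archive
tar tzf run001.tar.gz           # list its contents without extracting
```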
Understanding how Linux filesystems and directory structures are organized on your HPC system will help you:
- Avoid running out of space
- Place data where it performs best
- Write more robust, portable job scripts and workflows.