Kahibaro

Filesystems and directory structures

Why filesystems matter in HPC

On an HPC system, you will rarely interact with disks directly. Instead, you work with directories, paths, and several filesystems (home, scratch, project storage) that the administrators have mounted into one tree.

Understanding this is crucial because where you put your files determines performance, quota limits, backup coverage, and whether your jobs can find their data.

This chapter focuses on the basic concepts and layout you’ll encounter on Linux-based HPC systems.

The Unix filesystem as a tree

Linux uses a single-rooted tree filesystem: everything lives under the root directory /, and every file is reached by a path of directories descending from it.

For example:

/
├── bin
├── boot
├── dev
├── etc
├── home
│   ├── alice
│   └── bob
├── lib
├── opt
├── tmp
├── usr
└── var

There are no separate drive letters (like C: on Windows). Additional disks or network filesystems are mounted into this tree (e.g. /home, /scratch, /projects).

Absolute vs relative paths

A path describes where a file or directory is located in the tree. An absolute path starts at the root, e.g. /home/alice/data.txt; a relative path is resolved from your current directory, e.g. results/run1.log.

Special entries: . refers to the current directory, .. to the parent directory, and ~ to your home directory.

In scripts and job files, prefer absolute paths to avoid ambiguity.
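The special path entries can be seen in action; this sketch uses /tmp and throwaway directory names, so the specific paths are illustrative only:

```shell
cd /tmp                 # absolute path: starts at the root /
mkdir -p demo/sub       # relative path: created inside /tmp
cd demo/sub             # now in /tmp/demo/sub
pwd                     # prints the absolute path of the current directory
cd ..                   # .. is the parent directory: back to /tmp/demo
cd ~                    # ~ expands to your home directory
```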

Key user directories on HPC systems

Exact names vary by cluster, but you'll commonly see a home directory (/home/USERNAME: small, often backed up), a scratch filesystem (/scratch/USERNAME: large and fast, but typically purged and not backed up), and project or group storage (/project/NAME or similar).

On many systems, there is also local scratch per node (often /tmp or /local/scratch): it is the fastest option for temporary files, but it exists only on that compute node and is cleaned after the job.

Always read your site documentation to know what each location is for.
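To check which filesystem a given directory actually lives on, and how full it is, df works on any mounted path; the /scratch path below is an assumption about your site's layout:

```shell
df -h "$HOME"          # shows the filesystem backing your home directory
# df -h /scratch       # uncomment if your site mounts scratch at /scratch
```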

The Linux directory hierarchy: what you’ll actually use

For HPC beginners, you do not need to memorize the entire FHS (Filesystem Hierarchy Standard), but you should recognize the common top-level directories: /home (user directories), /tmp (temporary files), /etc (system configuration), /usr and /opt (installed software), and /dev and /proc (devices and kernel interfaces).

As a normal user, you typically read from the system directories but write only to your own areas: home, scratch, project space, and /tmp.

Your working context: HOME and current directory

Two concepts matter a lot when navigating and running jobs: your home directory (available in the HOME environment variable and abbreviated as ~) and your current working directory (shown by pwd), against which all relative paths are resolved.

Many tools and job schedulers resolve relative paths from the current working directory; Slurm, for example, records the submission directory in SLURM_SUBMIT_DIR and by default starts your job there.

In job scripts, it’s common to include something like:

cd $SLURM_SUBMIT_DIR

or

cd /scratch/$USER/myproject

to ensure the job runs in the right spot.
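A minimal illustration of the two concepts, runnable in any interactive shell:

```shell
echo "$HOME"   # your home directory, set by the system at login
pwd            # your current working directory
cd             # cd with no argument always returns to $HOME
pwd            # now prints the same path as $HOME
```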

File types you’ll encounter

Linux distinguishes several file types; the two most important for beginners are regular files (data, scripts, programs) and directories.

Others you may see (mostly system-level) are symbolic links, device files (under /dev), named pipes, and sockets.

For HPC usage, be aware of symbolic links: they can point across filesystems, which can be handy (e.g. a link in your home directory that points to a scratch directory).
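You can tell the type of an entry from the first character of ls -l output; this sketch builds a throwaway directory to show the three types you'll meet most often:

```shell
cd "$(mktemp -d)"            # work in a fresh temporary directory
mkdir subdir                 # a directory
touch regular.txt            # a regular file
ln -s regular.txt link.txt   # a symbolic link
ls -l   # first column starts with d (directory), - (regular file), l (symlink)
```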

Symbolic links and paths

A symbolic link (or symlink) is a special file that points to another path.

For example:

/home/alice
└── data -> /scratch/alice/data

Here, data is a symlink. When you access /home/alice/data, you actually access /scratch/alice/data.
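Creating such a link takes one command. The sketch below uses a /tmp stand-in for the target so it runs anywhere; on a real cluster the target would be something like /scratch/$USER/data:

```shell
target=/tmp/symlink-demo/scratch-data     # stand-in for /scratch/alice/data
mkdir -p "$target"
ln -sfn "$target" /tmp/symlink-demo/data  # -f/-n safely replace an existing link
ls -l /tmp/symlink-demo/data              # shows: data -> /tmp/symlink-demo/scratch-data
readlink /tmp/symlink-demo/data           # prints the link's target path
```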

Why this matters in HPC: a symlink gives you a short, stable path in your home directory while the data itself lives on storage better suited to large files.

Be aware that a symlink can become dangling: if the target is removed (for instance by a scratch purge), the link remains but every access to it fails with "No such file or directory".

Directory structures for HPC projects

A clear directory layout helps with finding results later, cleaning up finished runs, staying within quotas, and keeping small code separate from large data.

A simple, HPC-friendly pattern:

/home/USERNAME/
└── projects/
    └── mycode/          # source code, scripts, small config
/scratch/USERNAME/
└── mycode-runs/
    ├── input/           # large input data
    ├── output/          # job results
    └── logs/            # job logs, stdout/stderr

You might then keep the code in /home under version control, point your jobs at the run directories on /scratch, and copy only the final results back to safer storage.

Over time, you can evolve this into more complex structures, but the main idea is to separate code/config (small) from data/results (large) and put large items on appropriate storage.
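The layout above can be created in a couple of commands; mkdir -p creates missing parents and is harmless if the directories already exist (the /tmp fallback is only so the sketch runs outside a cluster):

```shell
mkdir -p "$HOME/projects/mycode"
SCRATCH="${SCRATCH:-/tmp/${USER:-me}-scratch}"   # on a real cluster: /scratch/$USER
mkdir -p "$SCRATCH/mycode-runs/input" "$SCRATCH/mycode-runs/output" "$SCRATCH/mycode-runs/logs"
ls "$SCRATCH/mycode-runs"   # input  logs  output
```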

Hidden files and configuration

Files and directories whose names start with . are considered hidden: plain ls omits them, while ls -a shows them.

Examples are ~/.bashrc, ~/.ssh/, and ~/.config/. They are widely used for per-user configuration: shell settings, SSH keys, editor preferences, and application caches.

On HPC systems, hidden directories such as ~/.cache or ~/.conda can grow large unnoticed and consume your home quota, so include them when investigating disk usage.

Hidden directories follow the same rules as any others; they’re just not shown by default in some listings or tools.
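A quick demonstration in a throwaway directory:

```shell
cd "$(mktemp -d)"
touch visible.txt .hidden-config   # a leading dot marks an entry as hidden
ls        # lists only visible.txt
ls -a     # also shows . .. and .hidden-config
```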

Ownership, permissions, and filesystem boundaries (at a high level)

Full details of permissions and user management belong elsewhere, but a few filesystem-specific points matter: every file has an owner, a group, and permission bits, and shared project directories usually rely on group ownership so collaborators can access each other's files.

Quotas are enforced per user or per project on a filesystem; they’re an attribute of the filesystem, not of individual directories alone.
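A short, generic illustration of ownership and permission bits (standard chmod semantics, nothing site-specific):

```shell
cd "$(mktemp -d)"
touch results.txt
ls -l results.txt       # e.g. -rw-r--r-- 1 owner group ... results.txt
chmod 600 results.txt   # owner read/write only; group and others get nothing
ls -l results.txt       # now -rw-------
```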

Common HPC filesystem layouts

An HPC cluster often has multiple filesystems mounted in the same namespace. A simplified example:

/
├── home            # network filesystem A
│   ├── alice
│   └── bob
├── scratch         # parallel filesystem B
│   ├── alice
│   └── bob
├── project         # project-oriented filesystem C
│   ├── proj1
│   ├── proj2
│   └── proj3
└── software        # shared application stack
    ├── compilers
    ├── mpi
    └── apps

They may be backed by different technologies (NFS, Lustre, GPFS, etc.), but from your perspective they all appear as ordinary directories within the same tree.

From a usage standpoint, what differs is capacity, speed, quota, backup policy, and purge policy, which is why it pays to know which directory lives on which filesystem.
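On Linux you can ask which filesystem type backs a given path; the type names you'll see (nfs, lustre, gpfs, ext4, ...) depend entirely on the site:

```shell
df -T "$HOME"           # GNU df: adds a Type column (e.g. ext4, nfs, lustre)
stat -f -c %T "$HOME"   # alternative: print just the filesystem type name
```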

Directory structures in batch jobs

Jobs typically start in some working directory, read input files, and write output and log files.

Because filesystem performance and layout matter, direct heavy I/O to scratch or node-local storage rather than to home, and have the job script change into the intended directory explicitly instead of relying on defaults.

A typical directory-aware job flow:

  1. Submit job from /home/USERNAME/projects/mycode
  2. Job script:
    • cd /scratch/USERNAME/mycode-runs/run001
    • Copy or link input from project storage
    • Run the application, writing output locally
    • Copy final results or selected files back to /home or /project

This minimizes unnecessary load on shared home or archival filesystems.
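As a sketch, the flow above might look like this in a Slurm batch script. The #SBATCH options, project paths, and application name are placeholders to adapt to your site; the /tmp fallback only lets the script run outside a cluster:

```shell
#!/bin/bash
#SBATCH --job-name=mycode-run001
#SBATCH --output=run001.log

SCRATCH_BASE="${SCRATCH_BASE:-/tmp/${USER:-me}}"  # on a real cluster: /scratch/$USER
RUN_DIR="$SCRATCH_BASE/mycode-runs/run001"

mkdir -p "$RUN_DIR"
cd "$RUN_DIR" || exit 1                  # fail fast if the directory is unusable

# Stage in, compute on fast scratch, stage out only what you need to keep:
# cp /project/proj1/input.dat .          # placeholder project path
# ./my_app input.dat > output.dat        # placeholder application
# cp output.dat "$HOME/results/"         # copy final results back to home
echo "job running in $(pwd)"
```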

Practical conventions and tips

In practice: keep small code and configuration in home, run jobs in per-run scratch directories (run001, run002, ...), copy back only the results you want to keep, and consult your site documentation for quotas and purge policies.

Understanding how Linux filesystems and directory structures are organized on your HPC system will help you keep your data safe, stay within quotas, and run jobs in the right place.
