What GPFS Is in an HPC Context
GPFS (IBM Spectrum Scale, formerly General Parallel File System) is IBM’s high‑performance, distributed parallel filesystem widely used on large HPC systems. From a user’s point of view it looks like a normal POSIX filesystem (you ls, cd, cp as usual), but internally it:
- Stripes file data across many disks and servers for bandwidth and capacity.
- Manages metadata in a distributed, highly scalable way.
- Provides advanced features like replication, tiering, and quotas.
You might see it referred to in documentation or mount output as gpfs or Spectrum Scale, and mounted under paths like /gpfs/, /ibm/, or a site‑specific prefix.
Typical GPFS Layout on HPC Systems
Exact layout is site‑specific, but common patterns include:
- A large shared GPFS filesystem for:
- Project data (e.g. /gpfs/projects, /gpfs/work).
- Shared software stacks (e.g. /gpfs/apps).
- Sometimes scratch (e.g. /gpfs/scratch).
- Per‑user home directories may or may not be on GPFS.
Key aspects for users:
- One namespace, many servers: You see a single directory tree even though data is spread over many disks and servers.
- Same view from all nodes: Login and compute nodes see the same GPFS mount(s), simplifying running jobs and sharing results.
To see GPFS mounts, you might use:
$ df -hT | grep gpfs
$ mount | grep gpfs
How GPFS Organizes and Moves Data (User‑Relevant View)
Internally, GPFS uses:
- NSDs (Network Shared Disks): Logical disks that may represent RAID arrays, JBODs, etc.
- Filesystem‑wide striping: Files are split into blocks and distributed across NSDs.
Things that matter for users:
- Block size and striping: Affect performance for large sequential I/O versus many small files. These are typically configured by administrators, not users (a quick way to check what a path reports is shown below).
- Storage pools / tiers:
- High‑performance pool (fast SSD or NVMe).
- Capacity pool (slower, larger HDDs).
- Optional archive tier (tape integration via HSM).
- Replication / failure groups:
- Data and/or metadata can be replicated for resilience.
- Can increase reliability but also change performance characteristics.
You may encounter pool‑specific paths or projects (e.g. /gpfs/fast, /gpfs/archive), with guidance on where to put what type of data.
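If you are curious which filesystem a path lives on and what block size it reports, standard tools give a quick view (the project path below is illustrative; pool placement itself is usually not visible to ordinary users):
# Show the filesystem type and usage for a path ("gpfs" in the Type column indicates GPFS/Spectrum Scale)
df -hT /gpfs/projects/myproj
# Show the block size the filesystem reports for that path (GNU coreutils stat)
stat -f /gpfs/projects/myproj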
Using GPFS Effectively as a User
From the shell, you interact with GPFS as with any POSIX filesystem. The main difference is how you choose where and how to store data for performance and reliability.
Choosing the Right Location
Clusters often define conventions such as:
- Home on GPFS:
- Small, backed up, slower I/O.
- Good for source code, scripts, configs.
- Work / project space on GPFS:
- Larger quota.
- Good for medium‑term research data, shared project files.
- Scratch on GPFS (if provided):
- Very large, high throughput.
- Not backed up, purged regularly.
- Intended for temporary job output.
Follow site documentation. Common recommendations:
- Do not run large I/O jobs in your home directory.
- Use designated GPFS scratch/project directories for high‑volume data.
File and Directory Operations
All normal tools work:
# Create directories
mkdir /gpfs/projects/myproj/output
# Copy data to a fast GPFS scratch
cp big_input.dat /gpfs/scratch/$USER/
# Move results back to project space
mv /gpfs/scratch/$USER/job123/* /gpfs/projects/myproj/results/
But your choices can impact performance:
- Avoid creating millions of tiny files in a single directory; use hierarchical directory structures.
- Group related outputs in subdirectories (run001/, run002/), which can help metadata operations (one way to bucket an existing flat directory is sketched below).
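For example, if a run has already produced a large, flat directory of small files, a simple bucketing pass can spread them into subdirectories. This is only a sketch; the path, file pattern, and two‑character prefix are illustrative:
# Move files into subdirectories named after the first two characters of each
# filename, so that no single directory accumulates a huge number of entries.
outdir=/gpfs/scratch/$USER/run001
for f in "$outdir"/*.dat; do
    [ -e "$f" ] || continue                 # skip if the glob matched nothing
    prefix=$(basename "$f" | cut -c1-2)
    mkdir -p "$outdir/$prefix"
    mv "$f" "$outdir/$prefix/"
done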
Performance Characteristics and Best Practices
GPFS is designed for high bandwidth concurrent I/O, but how you access it matters.
When GPFS Shines
GPFS is particularly strong for:
- Large, sequential reads/writes:
- E.g. simulation checkpoints, trajectory files, large matrices.
- Many nodes accessing large files in parallel:
- E.g. parallel I/O from an MPI or GPU application.
- Shared datasets read by many jobs:
- E.g. reference databases, meshes, training datasets.
In such cases, striping and multiple I/O servers can deliver aggregate bandwidth far above any single disk.
Workloads That Stress GPFS
Patterns that often hurt performance:
- Huge numbers of small files, especially if:
- Created/deleted frequently.
- Located in a few hot directories.
- Metadata‑heavy operations:
- ls -R on large trees.
- find on large directory hierarchies.
- Massive rm -rf operations.
User strategies to mitigate:
- Use fewer, larger files where reasonable.
- Use tools like tar to package many small files (a fuller sketch follows this list):
tar czf results_run123.tar.gz results_run123/
- Be careful with recursive operations; limit their scope or run them during low‑usage windows if possible.
- Prefer parallel I/O libraries (e.g. MPI‑IO, HDF5, NetCDF) where appropriate instead of per‑process small text files.
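A slightly fuller version of the tar step above, which checks that the archive lists cleanly before deleting the original small files (the directory name is illustrative):
# Package a finished run, verify the archive, and only then remove the originals.
tar czf results_run123.tar.gz results_run123/ \
  && tar tzf results_run123.tar.gz > /dev/null \
  && rm -rf results_run123/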
Parallel I/O on GPFS
GPFS is built to support parallel I/O APIs. As a user:
- Check site recommendations for:
- Using MPI‑IO or parallel HDF5/NetCDF.
- Preferred chunk sizes and access patterns.
- On some systems, special build options or libraries are provided to make parallel I/O work efficiently with GPFS.
Conceptually, best practice is:
- Many processes writing to a small number of shared files in parallel, using a parallel I/O library, rather than each process writing its own file in the same directory.
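As a rough sketch only: assume a hypothetical MPI application mysim built against parallel HDF5 and launched with Slurm's srun, on an MPI stack whose MPI‑IO layer is ROMIO (which can read optional hints from a file named by the ROMIO_HINTS environment variable). The hint names and values are examples, not site recommendations:
# Optional MPI-IO (ROMIO) hints; only honored by ROMIO-based MPI stacks,
# and the right values are site- and workload-specific.
cat > romio_hints <<'EOF'
romio_cb_write enable
cb_buffer_size 16777216
EOF
export ROMIO_HINTS=$PWD/romio_hints
# All ranks write collectively into one shared HDF5 file on GPFS,
# instead of each rank creating its own small file in the same directory.
srun -n 256 ./mysim --output /gpfs/scratch/$USER/run001/checkpoint.h5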
Reliability, Limits, and Policies
GPFS installations expose certain operational rules that users should understand.
Quotas
GPFS commonly enforces:
- User quotas: total space (block quota) and sometimes number of files (inode quota).
- Group or project quotas.
To check quotas, your site may provide:
# Example commands (site-specific)
quota -s
mmlsquota -u $USER fsname       # may require a site-specific wrapper tool
If you hit a quota:
- Writes may fail with “No space left on device” even if df shows free space.
- Remove unneeded files (the sketch just below shows one way to find large directories) or ask the support team about quota increases.
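If you are not sure what is using your quota, a scoped du gives a quick overview (the path is illustrative; du walks the whole tree and is itself metadata‑heavy, so keep it targeted):
# Summarize space used by each top-level directory under your scratch area,
# sorted by size, largest last.
du -sh /gpfs/scratch/$USER/* 2>/dev/null | sort -h | tail -n 10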
Snapshots and Backups
GPFS can support:
- Snapshots: read‑only, point‑in‑time views of a filesystem or subtree.
- Backup integration with external systems.
Usage details are site‑dependent, but practically:
- Some sites give access to snapshot directories (e.g. .snapshots, .snapshot) where you can restore prior versions of files.
- Don’t rely on GPFS scratch being backed up; project/home space might be, but confirm with your site.
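As a minimal sketch, assuming your site exposes a .snapshots directory with dated snapshot names (both the layout and the names below are hypothetical), restoring an older version of a file is just a copy:
# List available snapshots of a project directory (names are site-specific)
ls /gpfs/projects/myproj/.snapshots/
# Copy back an older version of a file from one snapshot (hypothetical name)
cp /gpfs/projects/myproj/.snapshots/daily-2024-05-01/config.yaml ./config.yaml.restored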
Data Lifetimes and Purge Policies
Even though GPFS is persistent storage, HPC centers often define:
- Purge policies for scratch GPFS:
- Files not accessed for N days may be deleted automatically.
- Archive policies:
- Old, inactive data may be moved to cheaper storage tiers.
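To get a rough idea of which of your files might be purge candidates under such a policy, a narrowly scoped find can report files not accessed recently; find is itself metadata‑heavy, so keep it limited (the path and 30‑day window are illustrative):
# List files under one run directory that have not been accessed in 30+ days.
find /gpfs/scratch/$USER/run001 -type f -atime +30 -ls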
Always read site documentation about:
- Where your data is safe for long‑term storage.
- Which directories are purged and on what schedule.
Practical Tips and Common Pitfalls
Checking GPFS Status (User Perspective)
Direct administrative commands (like mm* tools) are typically restricted, but you can:
- Check mounts and basic health indications:
df -hT | grep gpfs
- Look for maintenance announcements:
- Login message of the day on login nodes.
- Email or web portal notifications.
If jobs fail with I/O errors, especially around maintenance windows or announced outages, GPFS issues may be involved; consult cluster status pages or support.
Cleaning Up Safely and Efficiently
To avoid overloading GPFS with metadata operations:
- Use more targeted deletions:
# Delete specific job outputs rather than everything:
rm -rf /gpfs/scratch/$USER/job123/
- For very large directories:
- Consider deleting in smaller batches or using site‑provided cleanup tools.
- Some sites provide optimized purge scripts that perform deletions more gently.
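As a minimal sketch of the batched approach mentioned above (the directory name and pacing are illustrative, not a site recommendation):
# Delete a very large directory tree one subdirectory at a time, with a short
# pause between batches to spread out the metadata load.
cd /gpfs/scratch/$USER/old_run || exit 1
for d in */; do
    rm -rf "$d"
    sleep 2
done
cd /gpfs/scratch/$USER && rm -rf old_run   # remove remaining files and the directory itself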
Don’t Assume GPFS = Local Disk
GPFS is networked and shared:
- Bandwidth and latency differ from local SSDs.
- Heavy I/O by many users at once can affect performance.
Where appropriate:
- Copy small sets of files to node‑local storage (e.g. a local SSD, $TMPDIR) within jobs.
- Run the compute there and copy results back to GPFS at the end.
Example Slurm job snippet:
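# Stage input to fast node-local storage ($TMPDIR), run the computation there,
# then copy the results back to GPFS at the end (paths are illustrative).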
cp /gpfs/projects/myproj/input.dat $TMPDIR/
mycode -i $TMPDIR/input.dat -o $TMPDIR/output.dat
cp $TMPDIR/output.dat /gpfs/projects/myproj/results/
How to Learn GPFS Details on Your Cluster
Because GPFS is highly configurable, you should always:
- Read your site’s storage documentation, which may define:
- Names and purposes of different GPFS filesystems.
- Quotas, purge rules, and backup behavior.
- Recommended I/O practices for that installation.
- Look for:
- Pages mentioning “IBM Spectrum Scale” or “GPFS”.
- Diagrams of storage tiers and examples of good/bad usage patterns.
If unsure whether a path is on GPFS, use:
df -hT /path/of/interest
and check the Type column: a type of gpfs indicates a GPFS (Spectrum Scale) filesystem.
Summary of User‑Relevant Points
- GPFS (IBM Spectrum Scale) is a high‑performance parallel filesystem appearing as a normal POSIX filesystem but backed by many servers and disks.
- It is often used for shared project, work, and scratch spaces on HPC clusters.
- It performs best with large, sequential, parallel I/O and struggles with extremely metadata‑heavy workloads and huge numbers of tiny files in a single place.
- Quotas, purge policies, and sometimes snapshots are enforced at the GPFS level; know the rules for each GPFS filesystem you use.
- Combine GPFS with good data management practices (appropriate directories, parallel I/O libraries, local scratch when appropriate) to get both performance and reliability.