Overview of NFS in HPC
Network File System (NFS) is one of the oldest and most widely used networked filesystems in Unix/Linux environments. In HPC, NFS is typically used for shared but low‑to‑moderate performance storage, not for the highest‑throughput parallel I/O.
In a cluster, NFS usually serves:
- Home directories
- Small project spaces
- Configuration files and shared tools
- License servers/support files
More performance‑critical workloads (large parallel I/O, scratch) are usually placed on dedicated parallel filesystems like Lustre or GPFS, but NFS remains important infrastructure.
Basic NFS Architecture in a Cluster
NFS follows a classic client–server model:
- NFS server: a storage server that exports one or more directories over the network.
- NFS clients: other systems (login nodes, compute nodes, management nodes) that mount these exports, accessing them as if they were local directories.
- Network transport: traditionally TCP over Ethernet or InfiniBand (or IP over InfiniBand). Performance and reliability depend heavily on network quality.
In an HPC cluster, common patterns are:
- A single NFS server exporting /home to all nodes
- One or more NFS servers exporting shared software trees, e.g. /opt, /apps, or /cm/shared
- Management servers exporting configuration directories (e.g. /var/yp, /tftpboot, provisioning/management data)
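To make the client–server relationship concrete, here is a minimal sketch of such a setup; the server name nfs01, network range, and options are illustrative only and vary by site:

# On the NFS server (hypothetical host "nfs01"): an entry in /etc/exports
/home    10.1.0.0/16(rw,sync,no_subtree_check)

# On each client node: a matching entry in /etc/fstab
nfs01:/home    /home    nfs    rw,hard,vers=4.2    0 0

Administrators usually manage these entries centrally through the cluster's provisioning system, so users rarely see or edit them directly.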
NFS Versions and Their Impact in HPC
You’ll commonly encounter:
- NFSv3
- Stateless protocol
- Widely supported and simple
- Separate locking protocol (NLM)
- No built‑in strong security (relies mostly on host‑based trust, firewalls)
- Often still used for compatibility
- NFSv4 / NFSv4.1 / NFSv4.2
- Stateful, with integrated locking and better semantics
- Supports ACLs and more advanced security (Kerberos)
- NFSv4.1 introduces sessions and better parallelism (pNFS support)
- Standard in modern Linux distributions
In HPC, administrators balance:
- Compatibility with older OS images and tools
- Performance characteristics for many concurrent clients
- Security requirements (especially on multi‑user or multi‑tenant clusters)
As a user, you typically don’t choose the NFS version directly; the cluster is configured so mounts use the site‑preferred version.
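If you are curious which version is actually in use, you can inspect the negotiated mount options from any node; $HOME here is just a convenient example of an NFS-mounted path:

# List mounted NFS filesystems with their negotiated options (look for vers=)
nfsstat -m

# Or query the mount backing a specific directory
findmnt -T $HOME -o TARGET,SOURCE,FSTYPE,OPTIONS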
What NFS Is Typically Used For in HPC
NFS is well suited for:
- Home directories
- Your shell startup files (.bashrc, .bash_profile)
- Source code, small datasets, scripts, configuration
- Moderate I/O, usually from a single user at a time
- Shared software trees
- Compilers, libraries, and tools that don’t need extreme performance
- Small executables and libraries (e.g. admin-maintained tools in /opt)
- Configuration and management data
- Centralized configurations for cluster management
- Shared license files or small data for engineering tools
These usages emphasize simplicity and centralization over maximum throughput.
NFS is not usually used for:
- High‑bandwidth parallel I/O from thousands of processes
- Very large scratch spaces with heavy write loads
- Latency‑sensitive metadata operations at massive scale
Those are handled by parallel filesystems introduced in other chapters.
Basic NFS Usage from a User Perspective
On most clusters, NFS mounts are already configured by the administrators. As a user:
- Your home directory might be something like /home/username or /users/username.
- Shared software could live at paths like /opt, /apps, or /software.
You generally interact with files over NFS no differently than with local files:
- Standard commands: ls, cp, mv, rm, mkdir, etc.
- Editing files via vim, nano, emacs, or IDEs connected over SSH
However, because NFS is network‑based, performance and behavior differ from local disk:
- File operations may feel slower under load
- Many simultaneous jobs accessing the same NFS area can cause contention
Recognizing NFS Mounts
To see which directories are on NFS, you can run:
mount | grep nfs

or

df -hT | grep nfs

This helps you understand where your data lives and what performance to expect.
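To check a single directory rather than scanning all mounts, the following commands (using your home directory as an example) report the filesystem backing it:

# Filesystem type, size, and usage for the mount behind $HOME
df -hT $HOME

# Just the filesystem type name (e.g. "nfs")
stat -f -c %T $HOME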
Performance Characteristics and Limitations
NFS is not a parallel filesystem in the HPC sense; it has several characteristics that matter for cluster workloads.
Single Server Bottlenecks
- Traditional NFS setups have one server (or a small number) handling all requests.
- A large number of clients (hundreds or thousands of nodes) may overload a single NFS server if jobs hammer it with I/O.
- This is why admins often discourage heavy I/O in home directories and provide scratch or parallel storage for data‑intensive runs.
Metadata and Small-File Overheads
- Operations on many small files (e.g. millions of small output files) cause:
- High metadata load (lookups, create/delete, attribute checks)
- Many RPC calls across the network
- This can significantly slow jobs and affect other users.
For HPC workloads, this motivates:
- Using fewer, larger files where possible
- Avoiding massive small‑file creation on NFS
- Using parallel filesystems for large, structured outputs
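A quick way to spot an emerging small-file problem is simply to count what has accumulated in an NFS directory; the path below is a placeholder:

# Count files under a results directory; very large counts on NFS
# usually mean the output strategy should be rethought
find $HOME/myproject/results -type f | wc -l

# Total size of the same directory tree
du -sh $HOME/myproject/results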
Caching Effects
NFS clients cache file data and attributes to reduce network round trips. Consequences:
- Performance improves for repeated reads of the same data.
- There may be delays in seeing changes when multiple jobs/users access the same files concurrently.
- Some operations may not be immediately visible across nodes (e.g. very small timing windows in rapidly changing files).
For most basic workflows this is invisible, but it can matter for tightly coupled jobs that concurrently access rapidly changing files or busy shared directories.
NFS Mount Options Relevant to HPC
Administrators tune mount options to balance performance, robustness, and consistency. Some common options you might see (for information only):
- rw / ro – read-write vs read-only
- hard – NFS operations retry indefinitely on server failures (common in HPC to avoid silent corruption)
- intr / soft (less common in HPC) – allow operations to be interrupted or to time out, but can risk data consistency
- noatime – avoids updating access times, reducing metadata writes
- rsize / wsize – I/O block sizes for performance tuning
- vers=n – specify NFS protocol version
On many clusters, these options are set centrally in /etc/fstab or automounter maps and are not user-modifiable.
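As an illustration only (actual servers, paths, and values differ per site), a centrally managed entry combining several of these options might look like:

# Hypothetical /etc/fstab line maintained by administrators, not by users
nfsserver:/export/home   /home   nfs   rw,hard,noatime,vers=4.2,rsize=1048576,wsize=1048576   0 0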
Typical Best Practices for Users on NFS
While configuration is mostly out of your hands, you can use NFS effectively by following simple practices:
1. Avoid Heavy I/O in Home Directories
- Use NFS-mounted home for:
- Source code
- Submission scripts
- Small configuration files
- Small to moderate result sets for post-processing and archiving
- Use dedicated scratch / parallel filesystem for:
- Large simulation outputs
- Checkpoints of large runs
- High-rate logging or writing from many processes
A common pattern:
# On login node
cd $HOME
cp -r myproject $SCRATCH/myproject_run
# Edit and submit from SCRATCH location
cd $SCRATCH/myproject_run
sbatch run.slurm
# After job finishes, copy key results back to HOME (NFS) for archiving
cp important_results.tar.gz $HOME/results/

2. Limit Small-File Explosion
- Group related data into fewer, larger files (e.g. HDF5, NetCDF) when possible.
- Avoid creating one file per process per timestep in NFS directories.
- Periodically clean up obsolete files and directories.
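If a run does generate many small files on scratch, one simple habit is to bundle them into a single archive before anything is copied to NFS; the paths and file pattern below are placeholders:

# Bundle many small per-step files into one archive on scratch,
# then copy only the archive to the NFS-mounted home directory
cd $SCRATCH/myproject_run
tar czf results_bundle.tar.gz step_*.dat
cp results_bundle.tar.gz $HOME/results/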
3. Be Careful with Concurrent Writing
- Many processes writing to the same NFS file simultaneously can cause poor performance or undefined behavior, depending on how the application is written.
- Prefer:
- Each process writing its own output in a controlled pattern, or
- Using libraries designed for concurrent I/O, and placing data on storage that supports it well.
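As a sketch of the first option, in a Slurm job each task can write its own file using the per-task SLURM_PROCID variable set by srun; the solver name, its option, and the output location are placeholders:

# Launched via srun: each task writes a separate file on scratch
# instead of all tasks appending to one file over NFS
OUTFILE="$SCRATCH/myproject_run/output_rank_${SLURM_PROCID}.dat"
./my_solver --output "$OUTFILE"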
4. Consider Job Startup Storms
- Launching thousands of processes that all read the same executable or input from NFS at once can overload the server.
- Admins may mitigate this via caching or local copies, but as a user:
- Avoid huge simultaneous compilations on login/home if not necessary.
- Prefer staging large shared input datasets to scratch.
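On Slurm-based systems, one way to stage a shared input once per node instead of once per process is the standard sbcast tool; the application and file names here are placeholders:

# Inside a job script: copy the input to node-local storage on every
# allocated node, then have all tasks read the local copy
sbcast $SCRATCH/big_input.dat /tmp/big_input.dat
srun ./my_app --input /tmp/big_input.dat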
Common NFS-Related Issues Users May See
You may encounter:
- “Stale NFS file handle” errors
- Happen when files or directories are changed/removed while mounted
- Often transient; reloading directory views or re-running commands sometimes helps
- Persistent problems should be reported to support
- Freezes or slow responses in directories
- When the NFS server is heavily loaded or unresponsive
- Commands like ls may hang if they access a busy or unavailable NFS mount
- Permission mismatches
- NFS often relies on consistent user and group IDs across nodes (e.g. via LDAP)
- If mapping is misconfigured, you may see unexpected permission denied errors
On a managed HPC system, you typically report these behaviors to the support team rather than modify NFS settings yourself.
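When you do contact support, a few harmless checks can make the report more precise; problem_file below stands in for whichever file misbehaves:

# Re-resolve the current directory; this sometimes clears a stale handle
cd "$PWD"

# Confirm your numeric user and group IDs (should be consistent across nodes)
id

# Show ownership as raw UIDs/GIDs to spot ID-mapping problems
ls -ln problem_file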
Security Considerations (High-Level)
NFS security in HPC environments is typically based on:
- Network isolation (internal cluster networks, firewalls)
- Consistent user authentication (e.g. centralized directory services)
- NFSv4 features like Kerberos in more security-conscious or multi-tenant environments
For most users, the main implications are:
- Treat NFS areas as shared infrastructure with other users on the cluster.
- Do not store highly sensitive personal data there unless explicitly allowed and protected.
- Respect file permissions, quotas, and site policy.
Summary: NFS’s Role in an HPC Filesystem Mix
Within an HPC cluster’s overall storage architecture, NFS usually:
- Provides convenient, centralized, moderate-performance shared storage.
- Hosts home directories, small shared software trees, and configuration.
- Is not the right place for heavy, large-scale parallel I/O.
Effective HPC workflows typically:
- Use NFS for scripts, configuration, small data, and long-term personal storage.
- Use high-performance parallel or scratch filesystems for data-intensive runs and large datasets.
Understanding where NFS fits lets you design workflows that are both efficient and kind to shared infrastructure.