
4.8.1 NFS

Overview

Network File System, or NFS, is one of the oldest and most widely used technologies to share files over a network. In HPC clusters it usually appears as a simple, general purpose shared filesystem that all nodes can see. It is often used for home directories, small project areas, and configuration files, while heavier parallel filesystems are used for high bandwidth data.

From a user's perspective an NFS mount looks like any other directory in the Linux tree. You cd into it, run ls, and read and write files. The important differences lie in performance, scalability, reliability, and how it behaves under load on a cluster.

Basic idea of NFS in an HPC cluster

NFS follows a client server model. One or more NFS servers export directories, and all other nodes mount those directories over the network. The compute nodes act as NFS clients. When a process on a compute node opens or writes a file on an NFS mount, the data travels over the interconnect to the NFS server.

The key point in an HPC environment is that many nodes and many processes may access the same NFS server at once. This magnifies design choices that do not matter much on a single workstation.

On most clusters you will see NFS used for:

User home directories, often mounted on login nodes and compute nodes.

Small shared software trees or configuration directories.

System wide configuration files or small databases that do not carry heavy I/O loads.

It is usually not used for large checkpoint files, simulation output, or heavy parallel I/O.

NFS versions and relevance to HPC

Modern clusters normally use NFS version 3 or version 4. Older versions are rare and newer extensions may be used in specialized cases.

NFSv3 is stateless from the server perspective. Each request is handled without the server keeping much client state. This can help with server robustness, but limits the semantics and features.

NFSv4 is stateful and includes features such as stronger locking semantics and integrated security mechanisms. It can support compound operations and can be more efficient in some workloads.

On a typical cluster you do not need to worry about version selection as a user. System administrators choose and configure the version. What matters to you is that both versions aim to provide a standard POSIX style interface, but under contention and load they can behave differently from a local filesystem in subtle ways.

Mounts, paths, and visibility

Because NFS is mounted into the existing directory tree, it can be easy to forget that a directory is remote. On an HPC cluster you might see paths like /home/user, /proj/group, or /shared/software that are in fact NFS mounts.

You can check whether a directory is on NFS using commands like df or mount. For example:

$ df -h /home
Filesystem           Size  Used Avail Use% Mounted on
nfs-server:/export   10T   3T    7T   30% /home
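
The mount command gives a similar view and also shows the filesystem type and mount options, including the protocol version. The output below is only illustrative; the exact options vary by site:

$ mount | grep ' type nfs'
nfs-server:/export on /home type nfs4 (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,...)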

From the application viewpoint these paths behave like local paths, but performance and failure modes are determined by the remote NFS server and the network.

It is common that:

The same NFS mount is visible on login nodes and on compute nodes, which makes it convenient for editing code and then running it on the cluster.

Temporary directories such as /tmp or node local scratch areas are not on NFS. Using these correctly has a large impact on performance.

Performance characteristics

NFS was designed for general purpose file sharing, not for extreme parallel performance. Several properties matter in HPC:

First, there is a single, central server, or a small group of servers. All I/O from all nodes passes through that limited resource. Bandwidth and IOPS are constrained.

Second, the latency of each operation is higher than with local storage. Every metadata operation, such as stat, open, close, and directory listing, involves network communication with the server.

Third, NFS relies on client side caching and offers only relaxed consistency guarantees. Clients cache file attributes and data and only periodically revalidate them with the server. In highly parallel runs with many writers this can cause surprising visibility delays or apparent inconsistencies.

In practice this leads to these typical patterns:

Reading or writing many small files on NFS from many tasks is very slow.

Frequently calling ls, stat, or scanning large directory trees on NFS can become a bottleneck.

Heavy simultaneous writes by many tasks to a single NFS directory can overload the server.

Large sequential reads or writes by a single user can achieve reasonable throughput, but will still be limited by the server and network.

Important rule: Avoid heavy parallel I/O on NFS from many ranks or threads. Use NFS for light activity such as code, scripts, and configuration, not as a high performance data path.
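
A quick, informal way to see the small file penalty described above for yourself is to unpack the same archive of many small files once on an NFS backed directory and once on node local storage and compare the times. The file names and paths here are only examples; your site's mount points may differ:

$ mkdir -p /home/$USER/nfs_test /tmp/$USER/local_test
$ time tar -xf many_small_files.tar -C /home/$USER/nfs_test    # NFS backed home (assumed)
$ time tar -xf many_small_files.tar -C /tmp/$USER/local_test   # node local /tmp (assumed)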

Common usage patterns in HPC

Most sites adopt similar guidelines for NFS usage.

Home directories on NFS are intended for source code, scripts, job submission files, build trees, and small results. They are usually backed up. They are not designed for large simulation outputs or massive temporary files.

Project or group directories on NFS may hold shared input data, reference manuals, and common software built by the group. Again, the expectation is moderate I/O.

Local scratch or parallel filesystems are provided for I/O intensive workloads. A typical workflow is to copy or stage input files from NFS to scratch, run the computation, and copy selected output back to NFS for long term storage and backup.
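
A minimal sketch of this staging pattern, assuming a Slurm scheduler, a scratch filesystem mounted on all nodes at /scratch, and NFS backed project space under /proj/group. All names, paths, and the application itself are examples and vary by site:

#!/bin/bash
#SBATCH --job-name=staged_run
#SBATCH --ntasks=64

# Stage input from NFS backed project space to the scratch filesystem.
SCRATCH_DIR=/scratch/$USER/$SLURM_JOB_ID
mkdir -p "$SCRATCH_DIR"
cp /proj/group/inputs/config.dat "$SCRATCH_DIR"/

# Run with the working directory on scratch, so heavy I/O stays off NFS.
cd "$SCRATCH_DIR"
srun "$SLURM_SUBMIT_DIR"/my_simulation config.dat

# Copy only the curated results back to NFS for long term storage and backup.
cp results_summary.dat /proj/group/results/
rm -rf "$SCRATCH_DIR"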

As a user you should learn which directories on your system are NFS backed and which are not. Your center usually publishes this information in documentation or login messages.

Typical bottlenecks and pitfalls

When many users run jobs that perform I/O on NFS at the same time, the server and network can become overloaded. This can degrade responsiveness for the entire cluster.

Common pitfalls include:

Writing logs from every MPI rank directly into the same NFS directory. Thousands of ranks each writing many small log files or appending to a single log file can saturate NFS.

Using NFS as a working directory for builds involving many small files, for example large C++ projects or Python environments with many tiny files. Builds will be much slower than on local storage.

Running codes that create and delete many small temporary files on NFS during the main computation. Repeated metadata operations are especially expensive.

Using job array tasks that all access the same NFS files at job start and end, for example reading configuration files or writing result summaries at exactly the same time.

These patterns can cause long waits in system calls like open, close, and fsync, even though CPU usage is low. Jobs appear stuck because they are blocked on I/O to a saturated NFS server.

Important rule: Do not write from many processes into a single NFS directory or file at the same time unless the volume is very small. Use node local or parallel filesystems for bulk I/O, then copy results back.
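
One common way to follow this rule for logging is to let every rank write its log to node local storage and gather the logs afterwards. The sketch below assumes a Slurm system with local storage under /tmp; the application option --logdir and all paths are hypothetical:

#!/bin/bash
#SBATCH --ntasks=256

# Each rank writes its log to node local /tmp instead of an NFS directory.
export LOGDIR=/tmp/$USER/$SLURM_JOB_ID
srun --ntasks-per-node=1 mkdir -p "$LOGDIR"
srun ./my_app --logdir "$LOGDIR"

# After the run, one task per node bundles its local logs and copies a single
# archive per node back to the NFS backed home directory.
mkdir -p /home/$USER/job_logs
srun --ntasks-per-node=1 bash -c \
  'tar -czf /home/$USER/job_logs/${SLURM_JOB_ID}_$(hostname).tar.gz -C "$LOGDIR" .'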

Correctness, locking, and consistency

NFS aims to provide near POSIX semantics, but there are limitations that become visible in parallel use.

Client side caching means that a process on one node may not see a change written by a process on another node immediately. Attribute cache timeouts and read caches control when clients check back with the server. This can affect file based synchronization patterns.
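
You can usually see the relevant client cache settings among the mount options: acregmin, acregmax, acdirmin, acdirmax, and actimeo control how long attributes are cached before the client revalidates with the server. The values shown here are only illustrative:

$ findmnt -T /home -o FSTYPE,OPTIONS
FSTYPE OPTIONS
nfs4   rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,acregmin=3,acregmax=60,acdirmin=30,acdirmax=60,...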

File locking is supported in NFS, but the implementation and reliability depend on version and configuration. In practice many HPC codes avoid relying on NFS locks for correctness. Locking across many nodes via NFS can be slow and fragile.

Write ordering and visibility can also be subtle. When multiple processes append to a file, the order of appended data is not guaranteed to be predictable, and partial writes may appear if the application does not use proper synchronization.

For these reasons, you should not use NFS for process synchronization or for structured, shared file formats that assume strict local filesystem semantics, unless the library or application has been explicitly designed and tested for NFS.

Good practices for users

Within the constraints of a shared HPC environment, you can use NFS effectively by following a few practical habits.

First, use NFS mainly for lighter duties. Store source code, job scripts, small configuration files, and modest sized outputs there. This takes advantage of the convenience and backup of centralized storage without overloading it.

Second, for scratch data and large I/O, use the storage recommended by your center. This might be a parallel filesystem or node local directories such as /scratch or /tmp. Many applications allow you to place temporary files in a separate directory through environment variables or configuration options.
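
For example, many programs and libraries honor the TMPDIR environment variable, so a job script can point temporary files at node local storage. The paths, the Slurm variable, and the application with its option are assumptions about your site:

# In the job script, before launching the application.
export TMPDIR=/tmp/$USER/$SLURM_JOB_ID
mkdir -p "$TMPDIR"
srun ./my_app --input /proj/group/inputs/case1.dat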

Third, limit the number of files and directories you create on NFS. Deep directory trees with many tiny files slow down operations like ls and can hurt performance for all users.
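
One simple way to reduce the file count is to bundle many small files into a single archive before placing it on NFS (names are examples):

$ tar -czf run42_output.tar.gz run42_output/
$ cp run42_output.tar.gz /proj/group/archive/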

Fourth, avoid using NFS for heavy per time step output, fine grained checkpointing, or high frequency logging. Collect logs locally and merge them after the job, or reduce logging verbosity when running at scale.

Fifth, use tools like rsync to move data between NFS and scratch efficiently. This reduces metadata operations and can be more robust than many small individual cp actions.
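
For example, a single recursive rsync per direction moves a whole directory tree in one pass and only transfers files that have changed (paths are illustrative):

$ rsync -a /proj/group/inputs/case1/ /scratch/$USER/case1/             # stage in
$ rsync -a /scratch/$USER/case1/results/ /proj/group/results/case1/    # stage out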

Important rule: Perform I/O intensive work on non NFS storage, then copy only necessary results to NFS for long term availability and backup.

Failure modes and resilience

NFS centralizes storage, and this has implications for fault tolerance on a cluster.

If the NFS server becomes overloaded or partially fails, clients may experience long pauses or apparent hangs. Commands involving the NFS mount, such as ls or cd into a mounted directory, can block while the kernel waits for the server. In severe cases jobs may stall for long periods or fail with I/O errors.
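
One way to recognize this situation: processes stuck on a stalled NFS mount typically sit in uninterruptible sleep (state D) while using almost no CPU, which you can check with standard tools, for example:

$ ps -eo pid,stat,wchan:24,cmd | awk 'NR==1 || $2 ~ /^D/'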

Some clusters configure automounters that mount NFS directories on demand. If the automounter or server is slow, first access to an NFS path in a job can take noticeable time. This can be amplified if thousands of jobs try to mount the same directory simultaneously.

Since NFS based home directories are often used for shell configuration files, a failure of the NFS home server can even prevent logins from working normally. This is one reason why I/O policy on home directories is important.

For critical results, you should understand the backup policy for NFS mounted areas on your system. Many sites back up home and project directories, but not scratch or parallel filesystems. This makes NFS a good resting place for curated data, but not a place to rely on for high volume, high speed file creation during jobs.

NFS and small scale development

Despite its limitations for large scale workloads, NFS is convenient and suitable for small scale development and testing on a cluster.

You can keep your git repositories and development environments in your NFS backed home directory. This lets you edit from a login node, a workstation using a remote mount, or even a web based IDE, and still run the code on the cluster without copying it.

Short test jobs with modest I/O are usually fine on NFS. You can prototype code, generate small test datasets, and verify behavior before moving to large scale runs on scratch and parallel filesystems.

This pattern, develop and test on NFS, run at scale on scratch, archive back to NFS, is typical of many HPC workflows and balances productivity with cluster wide performance.

Summary

NFS provides a familiar, convenient way for all nodes of an HPC cluster to share files, and it is a central part of many cluster environments. However, it is not a high performance parallel filesystem. Its client server design, caching behavior, and metadata costs make it unsuitable for heavy, large scale I/O.

Use NFS where it is strong, for shared configuration, code, scripts, and moderate volumes of data, and avoid it for bulk parallel access from many processes. Understanding this boundary and following site guidelines will make your jobs more reliable and responsive and will help maintain good performance for the entire HPC system.
