Responsible Computing Practices

Why Responsible Computing Matters in HPC

HPC systems are powerful, expensive, and shared by many users. How you use them affects:

Energy consumption and environmental impact
Availability of resources for other users
Scientific reliability and reproducibility
Data security, privacy, and compliance
The longevity and health of shared infrastructure

Responsible computing is about aligning your technical work with ethical, institutional, and community expectations, not just “getting results faster.”

Fair and Respectful Use of Shared Resources

Avoiding Resource Hoarding

On shared clusters, resources are limited:

Request only the CPUs/GPUs, memory, and time you realistically need.
Don’t submit many oversized jobs “just in case.”
Avoid occupying large numbers of nodes with idle interactive sessions or test jobs.

Practical habits:

Start with small test jobs, then scale up when the configuration is correct.
Use job arrays instead of thousands of separate, tiny jobs when supported.
Clean up old reservations or long-running idle sessions.
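
A minimal sketch of the job-array pattern, assuming Slurm: Slurm sets SLURM_ARRAY_TASK_ID in each array element, so a single script can pick its own parameter set. The parameter grid is a hypothetical example, and other schedulers expose the array index under different variable names.

```python
import os
import sys

# Hypothetical parameter grid; each array element handles exactly one entry.
PARAMETERS = [
    {"temperature": t, "seed": s}
    for t in (280, 300, 320)
    for s in (1, 2)
]


def select_parameters(task_id: int) -> dict:
    """Return the parameter set for this array index, or raise if out of range."""
    if not 0 <= task_id < len(PARAMETERS):
        raise IndexError(f"array index {task_id} outside 0..{len(PARAMETERS) - 1}")
    return PARAMETERS[task_id]


def main() -> None:
    # Assumes Slurm; this variable is only set inside an array job.
    task_id_text = os.environ.get("SLURM_ARRAY_TASK_ID")
    if task_id_text is None:
        print("SLURM_ARRAY_TASK_ID is not set; submit this as an array job", file=sys.stderr)
        return
    params = select_parameters(int(task_id_text))
    print(f"task {task_id_text}: running with {params}")
    # ... the actual computation for `params` would go here ...


if __name__ == "__main__":
    main()
```

Submitted once, for example with sbatch --array=0-5 job.sh (where job.sh is a hypothetical batch script that runs this file), this yields six tasks under one job ID instead of six separate jobs; a suffix such as --array=0-5%2 caps how many tasks run at the same time.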

Playing Nicely With the Scheduler

Schedulers implement site policies; working with them is part of responsible use:

Use the correct queues/partitions for your job size and urgency.
Respect project or account limits; don’t try to bypass them with extra accounts.
Avoid repeatedly canceling and resubmitting to “game” the queue.
Use job priorities (if available) only as documented by your site.

If you have a genuine urgent need (paper deadline, instrument time, etc.), talk to support staff instead of trying to hack around policies.

Being a Good Neighbor on Shared Nodes

On nodes shared by multiple users:

Stay within requested resources; if you’re given 4 cores, don’t spawn 64 threads.
Limit background and monitoring processes to what you actually need; they compete with other users’ jobs for the same cores and memory.
Avoid running heavy computations on login nodes unless explicitly allowed.
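
One way to stay within your allocation is to size worker pools from what the scheduler granted rather than from the node’s total core count, which is what os.cpu_count() reports. The sketch below assumes Slurm’s SLURM_CPUS_PER_TASK (set only when --cpus-per-task is requested) and falls back to the Linux CPU affinity mask; check both assumptions against your site’s setup.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def allocated_cpus() -> int:
    """Best-effort count of CPUs allocated to this process (assumes Slurm or Linux)."""
    from_scheduler = os.environ.get("SLURM_CPUS_PER_TASK")
    if from_scheduler:
        return max(1, int(from_scheduler))
    if hasattr(os, "sched_getaffinity"):
        # Linux only: reflects the CPU set the job is bound to, if the scheduler binds it.
        return max(1, len(os.sched_getaffinity(0)))
    return 1  # Conservative default; os.cpu_count() would report the whole node.


def square(x: int) -> int:
    return x * x


if __name__ == "__main__":
    workers = allocated_cpus()
    # Avoid N processes x M library threads; worker processes inherit this setting,
    # and it only affects libraries (e.g., OpenMP-based BLAS) loaded after it is set.
    os.environ.setdefault("OMP_NUM_THREADS", "1")
    with ProcessPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(square, range(8)))
    print(f"{workers} workers -> {results}")
```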

If you need exclusive-node access, request it explicitly as a resource, not by over-allocating CPU or memory.

Reducing Waste: Compute, Storage, and I/O

Minimizing Wasted Compute Time

Wasted compute is both an energy and fairness issue:

Validate your code with small problem sizes before large production runs.
Use checkpoints so long jobs can resume after failure instead of restarting from scratch.
Monitor early-time behavior of long jobs to catch obvious problems (e.g., divergence, NaNs, exploding memory usage).
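
Checkpointing can be sketched as a loop that periodically saves its state and, on startup, resumes from the last saved state. This minimal sketch assumes a small JSON-serializable state and a hypothetical file name; real codes often use their own restart formats. Writing to a temporary file and renaming it means a crash mid-write never leaves a truncated checkpoint.

```python
import json
import os
import tempfile

CHECKPOINT = "checkpoint.json"  # hypothetical path; keep it on persistent storage


def save_checkpoint(state: dict, path: str = CHECKPOINT) -> None:
    """Write to a temp file in the same directory, then rename over the old checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            json.dump(state, handle)
            handle.flush()
            os.fsync(handle.fileno())
        # Atomic on POSIX when source and target are on the same filesystem.
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise


def load_checkpoint(path: str = CHECKPOINT) -> dict:
    """Resume from the last checkpoint, or start fresh if none exists."""
    try:
        with open(path) as handle:
            return json.load(handle)
    except FileNotFoundError:
        return {"step": 0, "total": 0}


def run(n_steps: int, every: int = 100, path: str = CHECKPOINT) -> dict:
    state = load_checkpoint(path)
    for step in range(state["step"], n_steps):
        state["total"] += step  # stand-in for one unit of real work
        state["step"] = step + 1
        if state["step"] % every == 0:
            save_checkpoint(state, path)
    save_checkpoint(state, path)
    return state
```

If the job is killed at step 180, the next submission reloads the step-100 checkpoint and repeats only 80 steps instead of the whole run; choose the interval so that the cost of saving stays small relative to the work it protects.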

When you discover a bug or bad parameter set:

Cancel affected jobs instead of letting them run to completion.
Document what went wrong so you (and teammates) don’t repeat the waste.
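
A complementary habit is to make jobs stop themselves when they go wrong, so a diverging run ends early instead of consuming its whole allocation. A minimal sketch, where the update rule is a hypothetical stand-in and the plausibility limit is an arbitrary example value:

```python
import math


class RunAborted(RuntimeError):
    """Raised when the simulation state is no longer physically plausible."""


def check_state(step: int, values: list, limit: float = 1e12) -> None:
    """Raise if any value is NaN/inf or exceeds a (hypothetical) plausibility limit."""
    for value in values:
        if not math.isfinite(value):
            raise RunAborted(f"non-finite value at step {step}")
        if abs(value) > limit:
            raise RunAborted(f"value {value:.3g} exceeds {limit:.3g} at step {step}")


def simulate(growth: float, n_steps: int = 1000) -> int:
    values = [1.0, -2.0]
    for step in range(n_steps):
        values = [v * growth for v in values]  # stand-in for a real update
        check_state(step, values)
    return n_steps
```

In a real job script, catching RunAborted and exiting with a nonzero status (for example via sys.exit) lets the scheduler record the job as failed, which makes the problem visible in your job history.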

Responsible Storage Usage

Storage is also a shared resource, often expensive and energy-intensive:

Delete temporary and intermediate files when they are no longer needed.
Compress data where feasible, especially archives or infrequently accessed data.
Avoid saving huge logs or debug output by default; keep them short and targeted.
Respect quotas; don’t bypass them by scattering data across many directories or projects.
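
Cleaning up and compressing can be combined: pack a finished run directory into one compressed archive, check that every file made it in, and only then remove the originals. A sketch with hypothetical paths; the check compares file names only, not contents, and deletion happens only when explicitly requested.

```python
import os
import shutil
import tarfile


def archive_run(run_dir: str, delete_originals: bool = False) -> str:
    """Create <run_dir>.tar.gz; optionally delete run_dir after verifying the listing."""
    run_dir = os.path.abspath(run_dir)
    archive_path = run_dir + ".tar.gz"
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(run_dir, arcname=os.path.basename(run_dir))

    # Every file under run_dir must appear in the archive before anything is deleted.
    parent = os.path.dirname(run_dir)
    expected = set()
    for root, _dirs, files in os.walk(run_dir):
        for name in files:
            expected.add(os.path.relpath(os.path.join(root, name), parent))
    with tarfile.open(archive_path, "r:gz") as tar:
        archived = {m.name for m in tar.getmembers() if not m.isdir()}
    missing = expected - archived
    if missing:
        raise RuntimeError(f"archive is missing {len(missing)} files; originals kept")

    if delete_originals:
        shutil.rmtree(run_dir)
    return archive_path
```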

Distinguish between:

Short-term scratch space (for transient files; you should expect it to be purged).
Long-term project/archive storage (for results, curated data, and documentation).
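
Because scratch is purged, it helps to see which of your files are old enough to be at risk, then move what matters to project storage and delete the rest yourself. A sketch that lists files not modified for a given number of days; the 30-day threshold is a hypothetical example (check your site’s actual purge policy), and it deletes nothing unless dry_run is False. Point it at your own directory, not the whole filesystem, since walking a tree is itself metadata load.

```python
import os
import time
from typing import List, Optional


def stale_files(root: str, days: float, now: Optional[float] = None) -> List[str]:
    """Return paths under root whose modification time is older than `days`."""
    cutoff = (time.time() if now is None else now) - days * 86400
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.lstat(path).st_mtime < cutoff:
                    stale.append(path)
            except FileNotFoundError:
                continue  # removed while we were walking
    return sorted(stale)


def clean_scratch(root: str, days: float = 30, dry_run: bool = True) -> List[str]:
    """Print (dry run) or delete stale files; 30 days is a hypothetical policy."""
    candidates = stale_files(root, days)
    for path in candidates:
        if dry_run:
            print(f"would delete {path}")
        else:
            os.remove(path)
    return candidates
```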

Responsible I/O Behavior

Heavy I/O can affect other users and filesystem health:

Avoid creating millions of tiny files; prefer fewer, larger files or structured formats.
Don’t use parallel filesystems as personal backup targets for unstructured junk.
Stagger metadata- and I/O-heavy operations (e.g., recursive ls, find, or rsync over large directory trees) rather than running many of them at once or across the entire filesystem.
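
As an example of preferring fewer, larger files, many small records can go into one JSON Lines file (one JSON object per line) instead of one file each. The record fields are hypothetical; for large numeric arrays a structured format such as HDF5 or NetCDF is usually a better fit than text. Several processes appending to the same file on a parallel filesystem can interleave writes, so give each job or array task its own output file.

```python
import json
from typing import Iterable, Iterator


def write_records(path: str, records: Iterable[dict]) -> int:
    """Append records as one JSON object per line; return how many were written."""
    count = 0
    # One open file with buffered writes: far fewer metadata operations
    # than creating a separate file for every record.
    with open(path, "a", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record) + "\n")
            count += 1
    return count


def read_records(path: str) -> Iterator[dict]:
    """Yield records back one at a time without loading the whole file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield json.loads(line)
```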

If you plan very I/O-intensive workflows, talk to system staff about best practices and suitable filesystems.

Data Responsibility: Privacy, Compliance, and Integrity

Handling Sensitive and Controlled Data

If you work with human, proprietary, or controlled data:

Know which systems are approved for that data type; don’t copy it to unapproved clusters or cloud services.
Follow institutional policies (IRB, GDPR, HIPAA, export control, etc.) that apply to your project.
Never store passwords, tokens, or private keys in code repositories or world-readable directories.

When sharing data for collaboration or publication:

Anonymize or de-identify human data where required.
Strip metadata that may reveal sensitive information (locations, IDs, etc.) when necessary.
Use proper licenses and respect data usage restrictions from providers.
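
One common de-identification step is replacing direct identifiers with keyed pseudonyms and dropping name-like fields. This sketch uses HMAC-SHA256 with a secret key held by the data custodian, so pseudonyms stay consistent across files but recipients cannot recompute them. It is pseudonymization, not full anonymization: quasi-identifiers such as dates, locations, or rare attributes can still re-identify people and must be handled under your institution’s rules. The field names are hypothetical, and the key should be loaded from a protected location, never stored in code.

```python
import hashlib
import hmac


def pseudonym(identifier: str, key: bytes) -> str:
    """Keyed hash of an identifier; same key and input always give the same pseudonym."""
    digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]  # shortened for readability; keep more characters for large cohorts


def deidentify(rows: list, key: bytes,
               id_field: str = "patient_id",                 # hypothetical column
               drop_fields: tuple = ("name", "address")) -> list:  # hypothetical columns
    cleaned = []
    for row in rows:
        out = {k: v for k, v in row.items() if k not in drop_fields}
        out[id_field] = pseudonym(str(row[id_field]), key)
        cleaned.append(out)
    return cleaned
```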

Protecting Data Integrity

Responsible computing includes preserving trust in your results:

Use checksums or hashes for important datasets transferred between systems.
Keep raw data read-only and separate from processed outputs.
Version your datasets and document transformations (e.g., pre-processing steps, filtering criteria).
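
Checksums and read-only raw data can be scripted together: build a SHA-256 manifest once, verify it after every transfer, and remove write permission from the raw files. A sketch; the manifest uses the "digest, two spaces, relative path" layout that sha256sum -c also reads when run from the dataset’s root directory. Keep the manifest outside the directory it describes.

```python
import hashlib
import os
import stat
from typing import List


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(root: str, manifest: str) -> None:
    with open(manifest, "w", encoding="utf-8") as out:
        for dirpath, dirnames, files in os.walk(root):
            dirnames.sort()  # deterministic order, so manifests can be diffed
            for name in sorted(files):
                path = os.path.join(dirpath, name)
                out.write(f"{sha256_of(path)}  {os.path.relpath(path, root)}\n")


def verify_manifest(root: str, manifest: str) -> List[str]:
    """Return relative paths that are missing or whose checksum changed."""
    bad = []
    with open(manifest, encoding="utf-8") as handle:
        for line in handle:
            expected, rel = line.rstrip("\n").split("  ", 1)
            path = os.path.join(root, rel)
            if not os.path.exists(path) or sha256_of(path) != expected:
                bad.append(rel)
    return bad


def make_read_only(root: str) -> None:
    """Remove write permission for owner, group, and others on every file under root."""
    no_write = ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            os.chmod(path, stat.S_IMODE(os.stat(path).st_mode) & no_write)
```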

If you detect corruption, mislabeling, or incomplete data:

Stop dependent analyses until you understand the issue.
Document the problem and, if relevant to others, report it to your team or facility.

Security-Conscious Behavior on Shared Systems

Account and Credential Hygiene

On shared HPC systems, your account is a trust boundary:

Use strong, unique passwords where passwords are used at all; prefer SSH keys or institutional SSO.
Protect private keys with passphrases.
Never share accounts or passwords, even within a team.
Don’t embed credentials in scripts, job files, or environment variables that others can read.
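
Instead of embedding a token in a script, you can keep it in a file only you can read and have code refuse to use it otherwise. A minimal sketch; the default path is hypothetical, and the check looks only at standard permission bits, so it will not detect access granted through ACLs.

```python
import os
import stat


class InsecureCredential(RuntimeError):
    """Raised when a credential file is accessible to group or other users."""


def load_token(path: str = "~/.config/my_service/token") -> str:  # hypothetical path
    path = os.path.expanduser(path)
    mode = stat.S_IMODE(os.stat(path).st_mode)
    # Refuse files that group members or other users could read or modify.
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        raise InsecureCredential(
            f"{path} has mode {mode:o}; run 'chmod 600 {path}' first")
    with open(path, encoding="utf-8") as handle:
        return handle.read().strip()
```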

If you suspect compromise (odd processes, unexplained logins, changed files):

Notify the HPC support team promptly.
Change related passwords and revoke affected keys.

Safe Software Practices

Your jobs run within a shared environment; avoid introducing risk:

Don’t run unverified binaries from untrusted sources.
Prefer official modules, containers, or vetted software repositories.
Avoid modifying system paths, LD_LIBRARY_PATH, or other global settings in ways that might affect other users.
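
When a tool genuinely needs a different library path, one option is to scope the change to that single child process by passing a modified copy of the environment, leaving your session and any shared setup scripts untouched. A sketch; the library directory in the usage below is a hypothetical placeholder.

```python
import os
import subprocess
from typing import List, Optional


def env_with_prepended_path(var: str, directory: str, base: Optional[dict] = None) -> dict:
    """Return a copy of the environment with `directory` prepended to `var`."""
    env = dict(os.environ if base is None else base)  # a copy; os.environ is untouched
    current = env.get(var)
    env[var] = directory if not current else directory + os.pathsep + current
    return env


def run_with_private_libs(cmd: List[str], lib_dir: str) -> subprocess.CompletedProcess:
    """Run one command with lib_dir first on LD_LIBRARY_PATH, for that command only."""
    env = env_with_prepended_path("LD_LIBRARY_PATH", lib_dir)
    return subprocess.run(cmd, env=env, check=True, capture_output=True, text=True)
```

For example, run_with_private_libs(["./my_tool", "--help"], "/path/to/my/libs") (both names hypothetical) affects only that invocation; the next command you run sees your normal environment.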

When distributing your own code:

Include clear build instructions and minimal dependencies.
Avoid requiring privileged operations (e.g., sudo) on shared systems.

Research Integrity and Reproducibility as Responsibility

Honest Reporting of Results

HPC lets you generate large volumes of results quickly; integrity matters:

Don’t cherry-pick only “good” runs without documenting selection criteria.
Keep track of failed runs; they may be scientifically relevant or indicate modeling issues.
If you discover bugs that invalidate previous results, you have a duty to correct the record, for example by informing collaborators and, where results were published, issuing a correction.
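
Keeping track of failed runs and documenting selection criteria is easier when the criteria are code rather than a manual judgment. A sketch in which every exclusion is recorded with its reason; the run fields and the convergence threshold are hypothetical examples.

```python
import json
from typing import Callable, Dict, List, Tuple


def select_runs(runs: List[dict],
                criteria: Dict[str, Callable[[dict], bool]]) -> Tuple[List[dict], dict]:
    """Return (kept runs, report); the report lists each excluded run and why."""
    kept, excluded = [], []
    for run in runs:
        failed = [name for name, test in criteria.items() if not test(run)]
        if failed:
            excluded.append({"run_id": run["run_id"], "failed_criteria": failed})
        else:
            kept.append(run)
    report = {"criteria": list(criteria), "total": len(runs),
              "kept": len(kept), "excluded": excluded}
    return kept, report


# Hypothetical criteria; the names double as human-readable documentation.
CRITERIA = {
    "completed": lambda r: r["status"] == "COMPLETED",
    "converged (residual < 1e-6)": lambda r: r.get("residual", float("inf")) < 1e-6,
}

if __name__ == "__main__":
    sample = [{"run_id": 1, "status": "COMPLETED", "residual": 1e-8},
              {"run_id": 2, "status": "FAILED"}]
    _kept, selection_report = select_runs(sample, CRITERIA)
    # Save this report alongside the results it describes.
    print(json.dumps(selection_report, indent=2))
```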

Documenting Computational Workflows

Other researchers—and your future self—should understand what you did:

Record software versions, compiler options, input parameters, and environment settings.
Save job scripts and configuration files used for key results.
Use simple automation (scripts, Makefiles, workflow tools) to encode procedures rather than manual “click paths.”
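
Recording versions, parameters, and environment settings can be automated by writing a small provenance file next to each result. A sketch; which environment variables to capture is a hypothetical choice (LOADEDMODULES is set by Environment Modules and Lmod, if your site uses them), and the git lookup is best-effort, returning None outside a repository.

```python
import datetime
import json
import os
import platform
import subprocess
import sys
from typing import Optional

# Hypothetical selection; add whatever your results actually depend on.
CAPTURED_ENV = ("SLURM_JOB_ID", "OMP_NUM_THREADS", "LOADEDMODULES")


def git_commit() -> Optional[str]:
    """Commit hash of the current directory's repository, or None if unavailable."""
    try:
        result = subprocess.run(["git", "rev-parse", "HEAD"],
                                capture_output=True, text=True, check=True)
        return result.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return None


def provenance(parameters: dict) -> dict:
    return {
        "timestamp_utc": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "host": platform.node(),
        "python": platform.python_version(),
        "command_line": sys.argv,
        "git_commit": git_commit(),
        "parameters": parameters,
        "environment": {k: os.environ[k] for k in CAPTURED_ENV if k in os.environ},
    }


def write_provenance(path: str, parameters: dict) -> None:
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(provenance(parameters), handle, indent=2, sort_keys=True)
```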

This reduces unintentional misrepresentation (e.g., misremembered parameters, undocumented changes) and prevents needless reruns.

Collaboration, Attribution, and Community Norms

Giving Credit Where It’s Due

HPC work builds on shared infrastructure and software:

Acknowledge HPC centers and funding sources according to their citation guidelines.
Cite major software packages, libraries, and datasets you rely on.
Attribute contributions from colleagues, students, and support staff appropriately.

Respecting Others’ Work and Time

Systems staff and colleagues maintain the environment you rely on:

Read documentation and policies before requesting exceptions.
Ask concise, well-prepared questions, including error messages and job IDs.
Share solutions or scripts that may benefit others (within your project and, when allowed, publicly).

Personal and Team-Level Practices

Planning and Reviewing Compute Use

Treat compute consumption as something to plan and review:

Estimate compute budgets for projects (core-hours, GPU-hours, storage).
Periodically review which jobs and data actually contributed to outputs.
Adjust configurations to reduce obvious waste (e.g., oversized jobs, unnecessary replications).
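
A compute budget is mostly multiplication: core-hours = number of jobs × nodes per job × cores per node × wall-clock hours, and GPU-hours likewise with GPUs per node. A sketch that assumes charging by allocated (not used) resources, which is common but site-specific, plus a hypothetical 20% contingency for debugging and reruns; all numbers are examples.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class JobSpec:
    count: int           # how many jobs of this shape
    nodes: int
    cores_per_node: int
    gpus_per_node: int
    wall_hours: float    # expected runtime, not the requested limit


def budget(jobs: List[JobSpec], contingency: float = 0.2) -> dict:
    """Total core-hours and GPU-hours, inflated by a contingency fraction."""
    core_hours = sum(j.count * j.nodes * j.cores_per_node * j.wall_hours for j in jobs)
    gpu_hours = sum(j.count * j.nodes * j.gpus_per_node * j.wall_hours for j in jobs)
    factor = 1 + contingency  # reruns, debugging, failed jobs
    return {"core_hours": core_hours * factor, "gpu_hours": gpu_hours * factor}
```

For instance, 50 hypothetical production jobs on 4 nodes of 64 cores for 12 hours come to 50 × 4 × 64 × 12 = 153,600 core-hours before contingency; comparing such estimates with actual usage afterwards shows where the waste was.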

Team Norms and Onboarding

Teams can institutionalize responsible behavior:

Establish internal guidelines for job sizes, data retention, and checkpointing.
Document “house styles” for directories, naming, and job scripts.
Onboard new members explicitly about cluster policies and responsible use, not just “how to run a job.”

Responding to Problems and Incidents

When You Make a Mistake

Misconfigurations and bugs that waste resources or disrupt other users are common. When one happens to you:

Cancel problematic jobs immediately if you realize they are misbehaving.
Inform support if the mistake might have affected others (e.g., I/O storms, runaway processes).
Reflect and update your workflow so the error is less likely to recur (e.g., add sanity checks).
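
Sanity checks work best before submission, when a mistake costs nothing. A sketch of a preflight check that collects every problem in a job configuration rather than stopping at the first; the configuration keys and the limits are hypothetical, so take real limits from your site’s documentation.

```python
import os
from typing import List

SITE_LIMITS = {"max_nodes": 64, "cores_per_node": 128, "max_wall_hours": 48}  # hypothetical


def preflight(config: dict, limits: dict = SITE_LIMITS) -> List[str]:
    """Return a list of problems; an empty list means the config looks submittable."""
    problems = []
    if not 1 <= config.get("nodes", 0) <= limits["max_nodes"]:
        problems.append(f"nodes must be between 1 and {limits['max_nodes']}")
    cores = config.get("tasks_per_node", 1) * config.get("cpus_per_task", 1)
    if cores > limits["cores_per_node"]:
        problems.append("tasks_per_node x cpus_per_task exceeds the cores on one node")
    if not 0 < config.get("wall_hours", 0) <= limits["max_wall_hours"]:
        problems.append(f"wall_hours must be in (0, {limits['max_wall_hours']}]")
    if not os.path.isfile(config.get("input_file", "")):
        problems.append(f"input file not found: {config.get('input_file')!r}")
    out_dir = config.get("output_dir", "")
    if not os.path.isdir(out_dir) or not os.access(out_dir, os.W_OK):
        problems.append(f"output directory missing or not writable: {out_dir!r}")
    elif os.listdir(out_dir):
        problems.append(f"output directory not empty; refusing to overwrite: {out_dir!r}")
    return problems
```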

Reporting Issues Responsibly

If you notice:

Security concerns (suspicious activity, exposed credentials)
Policy violations (data misuse, abusive resource use)
Technical problems affecting many users

then:

Report to the appropriate contact (helpdesk, security office, PI) rather than ignoring it.
Provide specific information (time, system, job IDs) without speculating or accusing individuals in public channels.

Responsible computing on HPC is as much about behavior and judgment as it is about technical skill. Adopting these practices improves environmental sustainability, protects shared resources, and strengthens the reliability and credibility of your scientific or engineering work.
