Why Responsible Computing Matters in HPC
HPC systems are powerful, expensive, and shared by many users. How you use them affects:
- Energy consumption and environmental impact
- Availability of resources for other users
- Scientific reliability and reproducibility
- Data security, privacy, and compliance
- The longevity and health of shared infrastructure
Responsible computing is about aligning your technical work with ethical, institutional, and community expectations, not just “getting results faster.”
Fair and Respectful Use of Shared Resources
Avoiding Resource Hoarding
On shared clusters, resources are limited:
- Request only the CPUs/GPUs, memory, and time you realistically need.
- Don’t submit many oversized jobs “just in case.”
- Avoid holding interactive sessions or test jobs on large node counts.
Practical habits:
- Start with small test jobs, then scale up when the configuration is correct.
- Use job arrays instead of thousands of separate, tiny jobs when supported.
- Clean up old reservations or long-running idle sessions.
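With a job array, each array element takes its share of the work from an index, so you don't need one submission per input. The following is a minimal Python sketch. It assumes a Slurm-style scheduler that exposes the element index in the SLURM_ARRAY_TASK_ID environment variable, and an array submitted as 0-9 (for example with sbatch --array=0-9). The helper name chunk_for_task and the input file names are hypothetical.

```python
import os


def chunk_for_task(items, task_id, num_tasks):
    """Return the contiguous slice of `items` assigned to array element `task_id`.

    Work is split as evenly as possible: the first `len(items) % num_tasks`
    elements receive one extra item each.
    """
    if not 0 <= task_id < num_tasks:
        raise ValueError(f"task_id {task_id} outside 0..{num_tasks - 1}")
    base, extra = divmod(len(items), num_tasks)
    start = task_id * base + min(task_id, extra)
    stop = start + base + (1 if task_id < extra else 0)
    return items[start:stop]


if __name__ == "__main__":
    # Hypothetical inputs: 1000 small cases handled by a 10-element array
    # instead of 1000 separate job submissions.
    inputs = [f"case_{i:04d}.dat" for i in range(1000)]
    # Slurm sets SLURM_ARRAY_TASK_ID inside array jobs; default to 0 for local tests.
    task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))
    num_tasks = 10  # assumed to match the submitted array range 0-9
    for path in chunk_for_task(inputs, task_id, num_tasks):
        print("processing", path)
```

Each array element processes about 100 cases. The scheduler tracks one array instead of a thousand independent jobs.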
Playing Nicely With the Scheduler
Schedulers implement site policies; working with them is part of responsible use:
- Use the correct queues/partitions for your job size and urgency.
- Respect project or account limits; don’t try to bypass them with extra accounts.
- Avoid repeatedly canceling and resubmitting to “game” the queue.
- Use job priorities (if available) only as documented by your site.
If you have a genuine urgent need (paper deadline, instrument time, etc.), talk to support staff instead of trying to hack around policies.
Being a Good Neighbor on Shared Nodes
On nodes shared by multiple users:
- Stay within requested resources; if you’re given 4 cores, don’t spawn 64 threads.
- Keep background and monitoring processes light; don’t leave resource-hungry tools running alongside other users’ jobs.
- Avoid running heavy computations on login nodes unless explicitly allowed.
If you need exclusive access to a node, request it explicitly as a resource, not by over-allocating CPUs or memory.
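To keep a multithreaded program within its allocation, derive the thread count from what the job was actually granted, not from the node's total core count. Below is a minimal sketch. It assumes Linux, because os.sched_getaffinity is not available on every platform, and it optionally reads a Slurm-style SLURM_CPUS_PER_TASK variable. The function name allowed_threads is hypothetical.

```python
import os


def allowed_threads(env=None):
    """Pick a thread count that stays within the CPUs granted to this job."""
    env = os.environ if env is None else env
    # CPUs this process may run on; on Linux this reflects the affinity
    # mask / cpuset the scheduler applied to the job.
    if hasattr(os, "sched_getaffinity"):
        visible = len(os.sched_getaffinity(0))
    else:
        visible = os.cpu_count() or 1
    requested = env.get("SLURM_CPUS_PER_TASK")
    if requested is not None and requested.isdigit() and int(requested) > 0:
        # Never exceed either the scheduler's grant or the visible CPUs.
        return min(int(requested), visible)
    return visible


if __name__ == "__main__":
    n = allowed_threads()
    # OpenMP runtimes read OMP_NUM_THREADS when they initialize, so set it
    # before loading libraries that use OpenMP; an existing value is kept.
    os.environ.setdefault("OMP_NUM_THREADS", str(n))
    print(f"using {n} threads")
```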
Reducing Waste: Compute, Storage, and I/O
Minimizing Wasted Compute Time
Wasted compute is both an energy and fairness issue:
- Validate your code with small problem sizes before large production runs.
- Use checkpoints so long jobs can resume after failure instead of restarting from scratch.
- Monitor early-time behavior of long jobs to catch obvious problems (e.g., divergence, NaNs, exploding memory usage).
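Checkpointing can be as simple as saving the loop state periodically and reloading it on restart. The sketch below uses JSON and an atomic rename. It assumes the state is small and JSON-serializable. The file name, state layout, and step counts are hypothetical, and real codes often use their own or library-provided checkpoint formats.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state atomically so a crash mid-write never leaves a truncated file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)  # rename is atomic on POSIX filesystems
    except BaseException:
        os.unlink(tmp)
        raise


def load_checkpoint(path, default):
    """Return the saved state, or `default` if no checkpoint exists yet."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return default


def run(path, total_steps=1000, every=100):
    state = load_checkpoint(path, {"step": 0, "value": 0.0})
    for step in range(state["step"], total_steps):
        state["value"] += 0.5  # stand-in for one unit of real work
        state["step"] = step + 1
        if state["step"] % every == 0:
            save_checkpoint(path, state)
    return state
```

If a job is killed and resubmitted, calling run again resumes from the last saved step instead of from zero. At most `every` steps of work are lost.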
When you discover a bug or bad parameter set:
- Cancel affected jobs instead of letting them run to completion.
- Document what went wrong so you (and teammates) don’t repeat the waste.
Responsible Storage Usage
Storage is also a shared resource, often expensive and energy-intensive:
- Delete temporary and intermediate files when they are no longer needed.
- Compress data where feasible, especially archives or infrequently accessed data.
- Avoid saving huge logs or debug output by default; keep them short and targeted.
- Respect quotas; don’t bypass them by scattering data across many directories or projects.
Distinguish between:
- Short-term scratch space (for transient files; you should expect it to be purged).
- Long-term project/archive storage (for results, curated data, and documentation).
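A periodic report of old files makes cleanup easier before purges or quota limits force the issue. The minimal sketch below lists files not modified in a given number of days but does not delete them. The threshold and directory are hypothetical. A walk like this is itself metadata-heavy, so run it on your own directories rather than across the whole filesystem.

```python
import os
import stat
import time


def stale_files(root, days, now=None):
    """Yield (path, size_bytes) for regular files under `root` older than `days`."""
    now = time.time() if now is None else now
    cutoff = now - days * 86400
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: do not follow symlinks
            except FileNotFoundError:
                continue  # file vanished during the walk
            if stat.S_ISREG(st.st_mode) and st.st_mtime < cutoff:
                yield path, st.st_size


def report(root, days=30):
    """Print stale files with their sizes and return the total size in bytes."""
    total = 0
    for path, size in sorted(stale_files(root, days)):
        total += size
        print(f"{size:>12d}  {path}")
    print(f"{total / 1e9:.2f} GB in files older than {days} days")
    return total
```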
Responsible I/O Behavior
Heavy I/O can affect other users and filesystem health:
- Avoid creating millions of tiny files; prefer fewer, larger files or structured formats.
- Don’t use parallel filesystems as personal backup targets for unstructured junk.
- Stagger I/O-heavy operations (e.g., mass ls, find, or rsync runs across the entire filesystem) when possible.
If you plan very I/O-intensive workflows, talk to system staff about best practices and suitable filesystems.
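When a workflow naturally produces many small outputs, bundling them into a single archive keeps the file count and the filesystem's metadata load low. The minimal sketch below uses Python's standard tarfile module. The directory names are hypothetical. For structured numerical data, formats such as HDF5 or NetCDF are often a better fit.

```python
import os
import tarfile


def pack_directory(src_dir, archive_path, compress=True):
    """Bundle every file under src_dir into one (optionally gzip-compressed) tar.

    archive_path should lie outside src_dir so the archive does not include itself.
    """
    mode = "w:gz" if compress else "w"
    count = 0
    with tarfile.open(archive_path, mode) as tar:
        for dirpath, _dirnames, filenames in os.walk(src_dir):
            for name in sorted(filenames):
                full = os.path.join(dirpath, name)
                # Store paths relative to src_dir so the archive is portable.
                tar.add(full, arcname=os.path.relpath(full, src_dir))
                count += 1
    return count


def read_member(archive_path, member_name):
    """Read one small file back without unpacking the whole archive."""
    with tarfile.open(archive_path, "r:*") as tar:
        f = tar.extractfile(member_name)
        return f.read()
```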
Data Responsibility: Privacy, Compliance, and Integrity
Handling Sensitive and Controlled Data
If you work with human, proprietary, or controlled data:
- Know which systems are approved for that data type; don’t copy it to unapproved clusters or cloud services.
- Follow institutional policies (IRB, GDPR, HIPAA, export control, etc.) that apply to your project.
- Never store passwords, tokens, or private keys in code repositories or world-readable directories.
When sharing data for collaboration or publication:
- Anonymize or de-identify human data where required.
- Strip metadata that may reveal sensitive information (locations, IDs, etc.) when necessary.
- Use proper licenses and respect data usage restrictions from providers.
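For tabular data, one common building block of de-identification is replacing direct identifiers with keyed pseudonyms. The minimal sketch below uses HMAC-SHA256 from the standard library. Note these caveats:

- The field names are hypothetical.
- The key must be protected like any other credential.
- Pseudonymized data can still count as personal data (for example under GDPR), so this does not replace the review your institution requires.

```python
import hashlib
import hmac


def pseudonymize(identifier, key):
    """Map an identifier to a stable pseudonym that is infeasible to recompute without `key`."""
    digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]  # truncated for readability; keep more bits for very large datasets


def deidentify_record(record, key, id_fields=("patient_id",), drop_fields=("name", "email")):
    """Return a copy with identifier fields pseudonymized and direct identifiers removed."""
    out = {k: v for k, v in record.items() if k not in drop_fields}
    for field in id_fields:
        if field in out:
            out[field] = pseudonymize(str(out[field]), key)
    return out
```

The same identifier always maps to the same pseudonym under a given key, so records can still be linked across files without exposing the original ID.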
Protecting Data Integrity
Responsible computing includes preserving trust in your results:
- Use checksums or hashes for important datasets transferred between systems.
- Keep raw data read-only and separate from processed outputs.
- Version your datasets and document transformations (e.g., pre-processing steps, filtering criteria).
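Checksums are most useful when recorded once at the source and re-checked after every transfer. The minimal sketch below writes and verifies a SHA-256 manifest with hashlib. The manifest uses the "hash, two spaces, path" line layout that tools such as sha256sum produce. The function names are hypothetical.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def write_manifest(root, manifest_path):
    """Record one 'hash  relative/path' line per file under root.

    Keep manifest_path outside root so the manifest does not list itself.
    """
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            entries.append((os.path.relpath(full, root), sha256_of(full)))
    with open(manifest_path, "w") as f:
        for rel, digest in sorted(entries):
            f.write(f"{digest}  {rel}\n")
    return len(entries)


def verify_manifest(root, manifest_path):
    """Return the relative paths that are missing or whose hash has changed."""
    mismatches = []
    with open(manifest_path) as f:
        for line in f:
            if not line.strip():
                continue
            expected, rel = line.rstrip("\n").split("  ", 1)
            full = os.path.join(root, rel)
            if not os.path.exists(full) or sha256_of(full) != expected:
                mismatches.append(rel)
    return mismatches
```

Run write_manifest on the source system, copy the manifest along with the data, and run verify_manifest on the destination. An empty list means every file arrived intact.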
If you detect corruption, mislabeling, or incomplete data:
- Stop dependent analyses until you understand the issue.
- Document the problem and, if relevant to others, report it to your team or facility.
Security-Conscious Behavior on Shared Systems
Account and Credential Hygiene
On shared HPC systems, your account is a trust boundary:
- Use strong, unique passwords where passwords are used at all; prefer SSH keys or institutional SSO.
- Protect private keys with passphrases.
- Never share accounts or passwords, even within a team.
- Don’t embed credentials in scripts, job files, or environment variables that others can read.
If you suspect compromise (odd processes, unexplained logins, changed files):
- Notify the HPC support team promptly.
- Change related passwords and revoke affected keys.
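One way to keep secrets out of scripts and job files is to read them at run time from a file only you can read, and to refuse to proceed if its permissions are too open. This is similar in spirit to how OpenSSH rejects private keys that others can read. The minimal sketch below assumes a POSIX system. The function name read_secret is hypothetical, and so is any path you pass it, such as os.path.expanduser("~/.config/myproject/token").

```python
import os
import stat


def read_secret(path):
    """Read a secret from `path`, refusing files that group or others can access."""
    st = os.stat(path)
    if st.st_mode & (stat.S_IRWXG | stat.S_IRWXO):
        raise PermissionError(
            f"{path} is accessible by group/others "
            f"(mode {stat.S_IMODE(st.st_mode):o}); run chmod 600 on it first"
        )
    if st.st_uid != os.getuid():
        raise PermissionError(f"{path} is not owned by the current user")
    with open(path) as f:
        # Return the value for direct use; avoid printing it or
        # exporting it into the environment of child processes.
        return f.read().strip()
```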
Safe Software Practices
Your jobs run within a shared environment; avoid introducing risk:
- Don’t run unverified binaries from untrusted sources.
- Prefer official modules, containers, or vetted software repositories.
- Avoid modifying system paths, LD_LIBRARY_PATH, or other global settings in ways that might affect other users.
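If a tool needs an extra library directory, you can change the environment only for the process that needs it. This avoids exporting the change in a shared shell startup file. The minimal sketch below uses subprocess with a copied environment. The library path is hypothetical. On most clusters, loading the appropriate module is the preferred route.

```python
import os
import subprocess


def env_with_prepended_path(var, directory, base=None):
    """Return a copy of the environment with `directory` prepended to `var`."""
    env = dict(os.environ if base is None else base)
    current = env.get(var)
    env[var] = directory if not current else directory + os.pathsep + current
    return env


def run_with_libdir(cmd, libdir):
    """Run `cmd` with libdir on LD_LIBRARY_PATH for that child process only."""
    env = env_with_prepended_path("LD_LIBRARY_PATH", libdir)
    # The parent's os.environ is untouched; only the child sees the change.
    return subprocess.run(cmd, env=env, check=True, capture_output=True, text=True)
```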
When distributing your own code:
- Include clear build instructions and minimal dependencies.
- Avoid requiring privileged operations (e.g., sudo) on shared systems.
Research Integrity and Reproducibility as Responsibility
Honest Reporting of Results
HPC lets you generate large volumes of results quickly; integrity matters:
- Don’t cherry-pick only “good” runs without documenting selection criteria.
- Keep track of failed runs; they may be scientifically relevant or indicate modeling issues.
- If you discover bugs that invalidate previous results, treat it as a duty to correct the record where appropriate.
Documenting Computational Workflows
Other researchers—and your future self—should understand what you did:
- Record software versions, compiler options, input parameters, and environment settings.
- Save job scripts and configuration files used for key results.
- Use simple automation (scripts, Makefiles, workflow tools) to encode procedures rather than manual “click paths.”
This reduces unintentional misrepresentation (e.g., misremembered parameters, undocumented changes) and prevents needless reruns.
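Much of this record can be captured automatically at the start of each run. The minimal sketch below writes a JSON provenance file next to the outputs. Several choices are assumptions to adapt to your project:

- which environment variables to keep (LOADEDMODULES is set by Environment Modules and Lmod);
- the output file name;
- the use of git rev-parse HEAD, which only works inside a Git checkout.

```python
import json
import os
import platform
import subprocess
import sys
from datetime import datetime, timezone


def git_commit(repo_dir="."):
    """Return the current commit hash, or None outside a Git repository."""
    try:
        out = subprocess.run(
            ["git", "rev-parse", "HEAD"], cwd=repo_dir,
            capture_output=True, text=True, check=True,
        )
        return out.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return None


def record_provenance(path, params,
                      env_keys=("SLURM_JOB_ID", "OMP_NUM_THREADS", "LOADEDMODULES")):
    """Write software, platform, environment, and parameter details to `path`."""
    info = {
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "python": sys.version,
        "platform": platform.platform(),
        "hostname": platform.node(),
        "argv": sys.argv,
        "git_commit": git_commit(),
        # Only the listed variables are kept, so secrets in the environment
        # are not copied into the record by accident.
        "environment": {k: os.environ[k] for k in env_keys if k in os.environ},
        "parameters": params,
    }
    with open(path, "w") as f:
        json.dump(info, f, indent=2, sort_keys=True)
    return info
```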
Collaboration, Attribution, and Community Norms
Giving Credit Where It’s Due
HPC work builds on shared infrastructure and software:
- Acknowledge HPC centers and funding sources according to their citation guidelines.
- Cite major software packages, libraries, and datasets you rely on.
- Attribute contributions from colleagues, students, and staff support appropriately.
Respecting Others’ Work and Time
Systems staff and colleagues maintain the environment you rely on:
- Read documentation and policies before requesting exceptions.
- Ask concise, well-prepared questions, including error messages and job IDs.
- Share solutions or scripts that may benefit others (within your project and, when allowed, publicly).
Personal and Team-Level Practices
Planning and Reviewing Compute Use
Treat compute consumption as something to plan and review:
- Estimate compute budgets for projects (core-hours, GPU-hours, storage).
- Periodically review which jobs and data actually contributed to outputs.
- Adjust configurations to reduce obvious waste (e.g., oversized jobs, unnecessary replications).
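A back-of-the-envelope budget is the product of job size, run time, and number of runs. The minimal sketch below uses hypothetical example numbers. How your center charges varies by site, for example whether a whole node is billed even when you request a few cores, or how GPUs are weighted.

```python
def core_hours(nodes, cores_per_node, wall_hours, runs=1):
    """Core-hours if every core on every node is billed for the full wall time."""
    return nodes * cores_per_node * wall_hours * runs


def gpu_hours(gpus_per_job, wall_hours, runs=1):
    """GPU-hours for `runs` jobs that each hold `gpus_per_job` GPUs."""
    return gpus_per_job * wall_hours * runs


def budget_with_margin(base, margin=0.2):
    """Add a contingency for test runs and reruns (20% is an arbitrary example)."""
    return base * (1 + margin)


if __name__ == "__main__":
    # Hypothetical campaign: 40 runs on 2 nodes x 64 cores, 6 hours each.
    base = core_hours(nodes=2, cores_per_node=64, wall_hours=6, runs=40)
    print(f"{base:,} core-hours; with margin: {budget_with_margin(base):,.0f}")
```

Comparing this estimate with the usage your center reports after the campaign shows where jobs were oversized or rerun.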
Team Norms and Onboarding
Teams can institutionalize responsible behavior:
- Establish internal guidelines for job sizes, data retention, and checkpointing.
- Document “house styles” for directories, naming, and job scripts.
- Onboard new members explicitly about cluster policies and responsible use, not just “how to run a job.”
Responding to Problems and Incidents
When You Make a Mistake
Misconfigurations and bugs that waste resources or disrupt others are common; what matters is how you respond:
- Cancel problematic jobs immediately if you realize they are misbehaving.
- Inform support if the mistake might have affected others (e.g., I/O storms, runaway processes).
- Reflect and update your workflow so the error is less likely to recur (e.g., add sanity checks).
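A sanity check that stops a run as soon as its results become meaningless turns a wasted allocation into a short, failed job. The minimal sketch below assumes the simulation state is a list of floats. The magnitude limit, check interval, and update rule are hypothetical stand-ins.

```python
import math
import sys


class SanityError(RuntimeError):
    """Raised when the simulation state is no longer meaningful."""


def check_state(step, values, max_abs=1e12):
    """Fail fast on NaN/Inf or runaway magnitudes instead of running to completion."""
    for i, v in enumerate(values):
        if not math.isfinite(v):
            raise SanityError(f"step {step}: non-finite value {v} at index {i}")
        if abs(v) > max_abs:
            raise SanityError(
                f"step {step}: |value| {abs(v):.3g} exceeds {max_abs:.3g} at index {i}"
            )


def simulate(values, steps, growth, check_every=10):
    for step in range(1, steps + 1):
        values = [v * growth for v in values]  # stand-in for the real update
        if step % check_every == 0:
            check_state(step, values)
    return values


def main():
    try:
        simulate([1.0, 2.0], steps=10_000, growth=1.5)  # deliberately diverges
    except SanityError as err:
        print(f"aborting: {err}", file=sys.stderr)
        return 1  # a non-zero exit status lets the scheduler record the job as failed
    return 0
```

In a job's entry point, calling sys.exit(main()) makes the job end promptly with a failed status that is easy to spot in the scheduler's accounting. Otherwise it would burn its full wall time.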
Reporting Issues Responsibly
If you notice:
- Security concerns (suspicious activity, exposed credentials)
- Policy violations (data misuse, abusive resource use)
- Technical problems affecting many users
then:
- Report to the appropriate contact (helpdesk, security office, PI) rather than ignoring it.
- Provide specific information (time, system, job IDs) without speculating or accusing individuals in public channels.
Responsible computing on HPC is as much about behavior and judgment as it is about technical skill. Adopting these practices improves environmental sustainability, protects shared resources, and strengthens the reliability and credibility of your scientific or engineering work.