Kahibaro

Backup and restore

Goals of backup and restore in OpenShift

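In OpenShift, backup and restore is about preserving and recovering cluster control plane state (etcd), workload data on persistent volumes, and platform configuration.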

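For production clusters, you typically design backup around two targets: the recovery point objective (RPO), which is how much data you can afford to lose, and the recovery time objective (RTO), which is how long recovery may take.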

The goal is not only to copy data, but to be able to rebuild the platform and workloads to a known-good state.

What to back up in OpenShift

Think in terms of distinct layers. Each layer has different tools and strategies.

1. Cluster control plane state (etcd)

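etcd is the source of truth for almost all Kubernetes and OpenShift API resources, such as namespaces, Deployments, Services, Routes, ConfigMaps, Secrets, and RBAC roles and bindings.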

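Key points: an etcd snapshot is tied to the cluster it came from, including its OpenShift version and infrastructure layout, so it is used to recover that cluster rather than to move workloads elsewhere.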

2. Workload data (persistent volumes)

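Persistent Volumes (PVs) store application data outside the API server, so that data is not part of an etcd snapshot.

Backups here depend on the storage backend and on whether the application needs crash-consistent or application-consistent copies.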

3. Platform configuration and tooling

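Beyond etcd and PVs, you also need to protect the platform's own configuration and tooling, such as cluster-level settings, installed Operators and their configuration, and the manifests used to deploy workloads.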

Many of these should be declarative and stored in Git (GitOps), so Git itself becomes a key part of your backup strategy.

Backup strategies and patterns

Cluster-wide vs application-focused backups

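You usually combine cluster-wide backups (etcd snapshots) with application-focused backups (namespace manifests plus their persistent data).

This allows you to recover the whole platform after a catastrophic failure, or to restore a single application without rolling back the rest of the cluster.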

Logical vs physical backups

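Logical backups (exported manifests, database dumps) are more portable and tolerate version differences better; physical backups (etcd snapshots, volume snapshots) are faster to take and restore, but are more tightly coupled to specific versions and infrastructure.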

RPO/RTO and scheduling

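Common practice is to derive the backup interval from the RPO: if a workload may lose at most one hour of data, it needs a successful backup at least every hour.

Use tiered schedules so that critical workloads get tighter RPO and RTO targets than less important ones.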
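To check whether a real backup history actually meets an RPO, look at the largest gap between consecutive successful backups, including the gap from the last backup until now. The sketch below assumes you already have the completion timestamps of successful backups; the function names are hypothetical.

```python
from datetime import datetime, timedelta


def worst_case_data_loss(backup_times: list[datetime], now: datetime) -> timedelta:
    """Largest window of data that a failure could have destroyed.

    A failure just before a backup completes loses everything since the
    previous successful backup, so the worst case is the largest gap between
    consecutive backups, counting the gap from the newest backup to `now`.
    """
    if not backup_times:
        raise ValueError("no successful backups recorded")
    times = sorted(backup_times) + [now]
    return max(later - earlier for earlier, later in zip(times, times[1:]))


def meets_rpo(backup_times: list[datetime], now: datetime, rpo: timedelta) -> bool:
    """True if no gap in the backup history exceeds the RPO."""
    return worst_case_data_loss(backup_times, now) <= rpo


if __name__ == "__main__":
    start = datetime(2024, 1, 1)
    # Hourly backups, but the ones at 03:00 and 04:00 failed.
    history = [start + timedelta(hours=h) for h in (0, 1, 2, 5)]
    print(worst_case_data_loss(history, now=start + timedelta(hours=6)))  # 3:00:00
```

A nominally hourly schedule with two failed runs already violates a two-hour RPO, which is why monitoring backup success matters as much as the schedule itself.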

Backing up etcd in OpenShift

OpenShift provides cluster-native mechanisms and best practices; exact commands differ slightly by version, but the core ideas are stable.
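In OpenShift 4 releases, the documented etcd backup procedure runs the `cluster-backup.sh` script on one control-plane node from a debug pod, where the node's root filesystem is mounted at `/host`; per the documentation, the script writes an etcd snapshot plus an archive of static pod resources. The sketch below only builds the `oc debug` command line for that procedure and does not execute it. The node name is a placeholder, and you should check the exact steps against the documentation for your version.

```python
import shlex

# Script path and backup directory as given in the OpenShift 4 etcd backup docs.
BACKUP_SCRIPT = "/usr/local/bin/cluster-backup.sh"
DEFAULT_BACKUP_DIR = "/home/core/assets/backup"


def etcd_backup_command(node: str, backup_dir: str = DEFAULT_BACKUP_DIR) -> list[str]:
    """Build an `oc debug` command that runs the etcd backup script on `node`.

    The debug pod mounts the node's root filesystem at /host, so the script
    runs inside `chroot /host`. `backup_dir` is a path on the node itself.
    """
    return [
        "oc", "debug", "--as-root", f"node/{node}", "--",
        "chroot", "/host", BACKUP_SCRIPT, backup_dir,
    ]


if __name__ == "__main__":
    # "master-0" is a hypothetical node name; list real control-plane nodes
    # with `oc get nodes -l node-role.kubernetes.io/master`.
    print(shlex.join(etcd_backup_command("master-0")))
```

Passing the list to `subprocess.run(..., check=True)` would execute it; keeping the command as a list avoids shell-quoting problems. Remember that the snapshot lands on the node's disk and still has to be copied off the cluster.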

When to take etcd backups

Take an etcd snapshot on a regular schedule and immediately before disruptive operations such as cluster upgrades.

Characteristics and constraints

Storage and retention

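Treat etcd backups like high-value secrets, because a snapshot contains every Secret object stored in the cluster. Store copies off-cluster with restricted access, and keep enough of them to cover your RPO.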
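A simple retention policy keeps the newest N snapshots and records a checksum for each stored copy, so that corruption can be detected before you need to restore. The sketch below is a hypothetical helper, not part of OpenShift; it assumes you already know when each snapshot was taken.

```python
import hashlib
from datetime import datetime
from pathlib import Path


def select_for_deletion(taken_at: dict[str, datetime], keep: int) -> list[str]:
    """Return snapshot names to delete so that only the `keep` newest remain."""
    if keep < 1:
        raise ValueError("keep at least one snapshot")
    newest_first = sorted(taken_at, key=taken_at.__getitem__, reverse=True)
    return sorted(newest_first[keep:])


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks so large snapshots fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Store the checksum next to each off-cluster copy and recompute it before a restore; a mismatch means the copy should not be trusted.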

Backing up workloads and namespaces

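Whenever possible, back up application configuration at a higher level than etcd, for example as per-namespace manifests, so that a single application can be restored without rolling back the whole cluster.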

Resource manifests

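Typical practice is to keep, per namespace, the manifests for Deployments/StatefulSets, Services, Routes, ConfigMaps, Secrets, RBAC objects, and PVC definitions.

Using oc, you can export these resources as YAML or JSON with `oc get <kind> -n <namespace> -o yaml`.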
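Manifests exported from a live cluster contain fields that the API server manages (`status`, `metadata.uid`, `metadata.resourceVersion`, and so on), which get in the way when you re-apply them to a restored or different cluster. A minimal sketch, assuming the object has already been parsed into a Python dict (for example from `-o json` output); the list of stripped fields is a reasonable starting point, not an exhaustive one. Some kinds need more cleanup; for instance, a Service carries an allocated `spec.clusterIP`.

```python
import copy
import json

# Metadata fields set by the API server; they should not be re-applied.
SERVER_MANAGED_METADATA = (
    "uid", "resourceVersion", "generation", "creationTimestamp",
    "managedFields", "selfLink",
)


def clean_manifest(obj: dict) -> dict:
    """Return a copy of a Kubernetes object suitable for storing and re-applying."""
    cleaned = copy.deepcopy(obj)
    cleaned.pop("status", None)  # runtime state, recomputed by controllers
    metadata = cleaned.get("metadata", {})
    for field in SERVER_MANAGED_METADATA:
        metadata.pop(field, None)
    # The last-applied annotation duplicates the whole object; drop it as well.
    annotations = metadata.get("annotations", {})
    annotations.pop("kubectl.kubernetes.io/last-applied-configuration", None)
    if not annotations:
        metadata.pop("annotations", None)
    return cleaned


if __name__ == "__main__":
    # Hypothetical exported ConfigMap from a hypothetical "shop" namespace.
    exported = {
        "apiVersion": "v1", "kind": "ConfigMap",
        "metadata": {"name": "app-config", "namespace": "shop",
                     "uid": "1234", "resourceVersion": "42"},
        "data": {"mode": "prod"},
    }
    print(json.dumps(clean_manifest(exported), sort_keys=True))
```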

GitOps-centric backup

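With GitOps, the desired state of namespaces and applications lives in Git, and a GitOps controller continuously reconciles the cluster to match it.

In a disaster, you can rebuild configuration by pointing a new or restored cluster at the same Git repositories; persistent data still has to be restored separately.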

Persistent storage backup and restore

Storage backups are highly dependent on backends; OpenShift’s storage model (PVs, PVCs, StorageClasses) is the abstraction, but backup is implemented below or beside it.

Storage-level snapshots and backups

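For CSI-based or cloud-native storage, point-in-time snapshots are usually taken through the Kubernetes VolumeSnapshot API or through the storage provider's own tooling.

Key considerations: a volume snapshot is only crash-consistent unless the application is quiesced first, and a snapshot that lives on the same storage system as the original volume does not protect against the loss of that system.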
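With a CSI driver that supports snapshots, a point-in-time copy of a PVC is requested by creating a `VolumeSnapshot` object in the `snapshot.storage.k8s.io/v1` API. The sketch below builds such an object as a dict that you could serialize and pass to `oc apply -f -` (which accepts JSON as well as YAML); the snapshot class, PVC, and namespace names are hypothetical.

```python
import json


def volume_snapshot(name: str, namespace: str, pvc_name: str,
                    snapshot_class: str) -> dict:
    """Build a CSI VolumeSnapshot object for an existing PVC."""
    return {
        "apiVersion": "snapshot.storage.k8s.io/v1",
        "kind": "VolumeSnapshot",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # The VolumeSnapshotClass selects the CSI driver and deletion policy.
            "volumeSnapshotClassName": snapshot_class,
            # The source PVC must be in the same namespace as the snapshot.
            "source": {"persistentVolumeClaimName": pvc_name},
        },
    }


if __name__ == "__main__":
    snap = volume_snapshot("db-data-snap", "shop", "db-data", "csi-snapclass")
    print(json.dumps(snap, indent=2))
```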

Application-consistent backups

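For databases and other transactional systems, a raw storage snapshot taken while writes are in flight may not be usable without recovery.

Backups must be consistent from the application's point of view, which usually means using the database's own dump tool or briefly quiescing writes before the snapshot is taken.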
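The usual pattern for an application-consistent snapshot is: quiesce the application, take the snapshot, and always resume the application, even if the snapshot fails. In the sketch below, `quiesce`, `snapshot`, and `resume` are hypothetical callables you would supply (for example, wrapping a database's own flush-and-lock command and its release); the point is the ordering and the guaranteed cleanup.

```python
from typing import Callable


def consistent_snapshot(quiesce: Callable[[], None],
                        snapshot: Callable[[], str],
                        resume: Callable[[], None]) -> str:
    """Take a snapshot while the application is quiesced; return its ID.

    `resume` runs in a `finally` block so that a failed snapshot never
    leaves the application frozen.
    """
    quiesce()
    try:
        return snapshot()
    finally:
        resume()


if __name__ == "__main__":
    events: list[str] = []

    def fake_snapshot() -> str:
        events.append("snapshot")
        return "snap-001"

    snap_id = consistent_snapshot(
        quiesce=lambda: events.append("quiesce"),
        snapshot=fake_snapshot,
        resume=lambda: events.append("resume"),
    )
    print(snap_id, events)  # snap-001 ['quiesce', 'snapshot', 'resume']
```

Keep the quiesced window as short as possible: a storage snapshot usually completes quickly, whereas copying the snapshot elsewhere can happen after `resume`.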

Stateless vs stateful apps

Tools and approaches for backup and restore

This chapter does not prescribe a specific vendor, but there are common types of tools used with OpenShift.

Cluster-native and CLI-based workflows

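Typical building blocks are `oc` for exporting resources, the documented etcd backup procedure on control-plane nodes, and scheduled jobs (for example, Kubernetes CronJobs) that run backup scripts.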

Operator-based backup solutions

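Backup solutions often come as Operators, installed from OperatorHub and configured through custom resources.

Advantages: the backup tooling is installed and upgraded the same way as other cluster components, and backup schedules and policies can be declared as Kubernetes objects and kept in Git.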

Storage-integrated solutions

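Many enterprise storage platforms provide their own snapshot, replication, and backup features.

In OpenShift, you typically expose these through the platform's CSI driver and StorageClasses, and coordinate them with application-level backups.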

Restore scenarios and workflows

Recovery is where design choices around backup become visible. It’s important to distinguish what is being restored and where.

1. Restoring an entire cluster from etcd

Use this after a catastrophic failure of the control plane or a severe cluster-wide misconfiguration.

High-level flow:

  1. Prepare a clean set of control-plane nodes, matching:
    • OpenShift version.
    • Infrastructure layout (IPs, hostnames, etc.).
  2. Follow the official recovery procedure to:
    • Bootstrap a new control plane.
    • Restore etcd from a previously taken snapshot.
  3. Verify:
    • All API resources are present.
    • Nodes rejoin and become Ready.
    • Operators reconcile and reach healthy status.
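The verification step can be partly automated. `oc get clusteroperators -o json` and `oc get nodes -o json` return objects whose `status.conditions` include `Available`/`Degraded` (for ClusterOperators) and `Ready` (for nodes). The sketch below assumes that output has already been parsed with `json.loads`; treating "Available=True and Degraded=False" as healthy is a simplification for illustration.

```python
from typing import Optional


def condition(obj: dict, cond_type: str) -> Optional[str]:
    """Return the status ("True"/"False"/"Unknown") of a condition, if present."""
    for cond in obj.get("status", {}).get("conditions", []):
        if cond.get("type") == cond_type:
            return cond.get("status")
    return None


def unhealthy_operators(cluster_operators: dict) -> list[str]:
    """Names of ClusterOperators that are not Available or are Degraded."""
    unhealthy = []
    for item in cluster_operators.get("items", []):
        available = condition(item, "Available") == "True"
        degraded = condition(item, "Degraded") == "True"
        if not available or degraded:
            unhealthy.append(item["metadata"]["name"])
    return sorted(unhealthy)


def not_ready_nodes(nodes: dict) -> list[str]:
    """Names of nodes whose Ready condition is not True."""
    return sorted(
        node["metadata"]["name"]
        for node in nodes.get("items", [])
        if condition(node, "Ready") != "True"
    )
```

Operators can take a while to settle after a restore, so in practice you would poll these checks until both lists are empty or a timeout expires.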

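Limitations: an etcd restore rolls the API state of the entire cluster back to the snapshot time, so every change made after the snapshot is lost, including changes to unrelated applications.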

2. Restoring a namespace or application in-place

For localized incidents (accidental deletion, misconfiguration of a single app):

  1. Reapply manifests from Git or backups:
    • Namespaces, roles, bindings, Deployments/StatefulSets, Services, Routes, ConfigMaps, Secrets, PVC definitions.
  2. Restore PV data:
    • From snapshot → new PVC.
    • From backup archive → populate volume.
    • From DB dump.
  3. Connect workload to restored PVC:
    • Either point the workload at the new PVC name, or restore into a PVC with the original name.
    • Restart Pods/StatefulSets so they attach to restored data.
  4. Validate:
    • Application functionality.
    • Data correctness and consistency.
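For the "from snapshot → new PVC" path, a CSI-backed PVC can name a `VolumeSnapshot` as its `dataSource`, and the provisioner populates the new volume from it. A minimal sketch that builds such a PVC; the names, size, and StorageClass are hypothetical, and the requested size must be at least the snapshot's restore size.

```python
def pvc_from_snapshot(name: str, namespace: str, snapshot_name: str,
                      storage_class: str, size: str,
                      access_modes: tuple[str, ...] = ("ReadWriteOnce",)) -> dict:
    """Build a PVC that is pre-populated from an existing VolumeSnapshot."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "storageClassName": storage_class,
            "accessModes": list(access_modes),
            "resources": {"requests": {"storage": size}},
            # The snapshot must be in the same namespace as the new PVC.
            "dataSource": {
                "name": snapshot_name,
                "kind": "VolumeSnapshot",
                "apiGroup": "snapshot.storage.k8s.io",
            },
        },
    }
```

To reuse the original claim name so the workload needs no change, scale the workload down, delete the old PVC, and pass its name as `name`; otherwise, create the PVC under a new name and update the workload to reference it.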

3. Cross-cluster migration or DR failover

For DR or migration (e.g., region A → region B):

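Key concerns: the target cluster may differ from the source in OpenShift version, StorageClasses, network layout, and DNS names, so manifests and data often need adjusting before they can be restored there.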
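One concrete adjustment when moving PVC manifests between clusters is StorageClass names, which often differ between environments. The sketch below rewrites `spec.storageClassName` using a mapping you define, and drops `spec.volumeName`, which binds a claim to a specific PV that will not exist on the target cluster. The mapping and class names are hypothetical.

```python
import copy


def remap_pvc(pvc: dict, class_map: dict[str, str]) -> dict:
    """Adapt a PVC manifest from the source cluster for the target cluster."""
    out = copy.deepcopy(pvc)
    spec = out.setdefault("spec", {})
    source_class = spec.get("storageClassName")
    if source_class is not None:
        if source_class not in class_map:
            # Fail loudly rather than silently falling back to a default class.
            raise KeyError(f"no target StorageClass mapped for {source_class!r}")
        spec["storageClassName"] = class_map[source_class]
    # volumeName pins the claim to a PV that only exists on the source cluster.
    spec.pop("volumeName", None)
    return out
```

Raising on an unmapped class is a deliberate choice: a PVC that silently lands on the wrong storage tier is harder to notice than a failed migration step.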

Operational considerations and best practices

Test restores regularly

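Backups are only as good as your ability to restore them, so restore into a non-production environment on a regular schedule and check that the restored applications and data actually work.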
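A restore test should compare the restored data with what was backed up, not only confirm that the restore finished. A minimal sketch for file-based data, assuming both the source copy and the restored volume can be mounted as directories where you can run Python: build a path-to-SHA-256 manifest of each tree and diff them. For databases, comparing query results is usually more meaningful than comparing files. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def checksum_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # read_bytes() loads the whole file; stream in chunks for very large files.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest


def diff_manifests(expected: dict[str, str], actual: dict[str, str]) -> dict[str, list[str]]:
    """Report files that are missing, unexpected, or changed after a restore."""
    return {
        "missing": sorted(expected.keys() - actual.keys()),
        "unexpected": sorted(actual.keys() - expected.keys()),
        "changed": sorted(p for p in expected.keys() & actual.keys()
                          if expected[p] != actual[p]),
    }
```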

Align with upgrades and maintenance

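Relate backup and restore to other operational procedures, for example by taking an etcd snapshot before every cluster upgrade and running post-restore verification after maintenance.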
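A pre-upgrade policy can be expressed as a simple gate: do not start an upgrade unless a recent etcd snapshot exists. The sketch below assumes you can list the completion times of your etcd backups; the function name and the 24-hour default are arbitrary examples, not OpenShift requirements.

```python
from datetime import datetime, timedelta


def upgrade_allowed(snapshot_times: list[datetime], now: datetime,
                    max_age: timedelta = timedelta(hours=24)) -> bool:
    """Allow an upgrade only if the newest etcd snapshot is recent enough."""
    if not snapshot_times:
        return False
    age = now - max(snapshot_times)
    # A snapshot "from the future" points to clock skew; refuse rather than trust it.
    return timedelta(0) <= age <= max_age
```

In a pipeline, a `False` result would stop the upgrade job and prompt an operator to take a fresh snapshot first.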

Security and compliance

Separation of concerns

Clear responsibilities and interfaces simplify incident response.

Documentation and automation

Designing a backup and restore plan for OpenShift

When creating a plan, you typically define:

  1. Scope:
    • Which clusters (prod, stage, dev) and which namespaces.
  2. Protection levels:
    • Critical, important, non-critical workloads and their RPO/RTO.
  3. Mechanisms:
    • etcd snapshots.
    • Storage-level snapshots/backups.
    • Application-level dumps.
    • Git/GitOps for configuration.
  4. Runbooks:
    • Entire cluster failure.
    • Single namespace/data corruption.
    • Cross-cluster migration or DR.
  5. Validation:
    • Regular restore tests.
    • Audits of coverage (ensure new apps are included).
  6. Integration with upgrades and operations:
    • Pre-upgrade snapshot policies.
    • Post-restore verification procedures.
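The "audits of coverage" step is straightforward to automate: compare the namespaces that exist with the namespaces the plan assigns to a protection level. A minimal sketch; the tiers, RPO/RTO values, and namespace names are hypothetical, and skipping platform namespaces by name is a simplifying assumption (their API objects are assumed to be covered by etcd snapshots).

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class ProtectionLevel:
    name: str
    rpo: timedelta  # maximum tolerable data loss
    rto: timedelta  # maximum tolerable time to restore


# Hypothetical tiers; real values come from your own requirements.
CRITICAL = ProtectionLevel("critical", rpo=timedelta(hours=1), rto=timedelta(hours=4))
IMPORTANT = ProtectionLevel("important", rpo=timedelta(hours=24), rto=timedelta(hours=24))
NON_CRITICAL = ProtectionLevel("non-critical", rpo=timedelta(days=7), rto=timedelta(days=3))


def is_platform_namespace(ns: str) -> bool:
    """Namespaces created by OpenShift/Kubernetes itself (simplified check)."""
    return ns in ("default", "openshift") or ns.startswith(("openshift-", "kube-"))


def uncovered_namespaces(existing: list[str], plan: dict[str, ProtectionLevel]) -> list[str]:
    """Namespaces that exist in the cluster but have no protection level assigned."""
    return sorted(ns for ns in existing if ns not in plan and not is_platform_namespace(ns))
```

The list of existing namespaces could come from `oc get namespaces -o jsonpath='{.items[*].metadata.name}'`; running the audit regularly catches new applications that were never added to the plan.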

A robust backup and restore strategy lets you perform maintenance and respond to failures confidently, and it is a central part of operating OpenShift as a reliable platform.
