Why rolling updates matter
Rolling updates let you deploy new versions of your application without stopping service. Instead of replacing all pods at once, OpenShift gradually replaces old pods with new ones, keeping the application available.
Rollbacks are the counterpart: if something goes wrong with a new version, you quickly return to a previously working state.
In OpenShift, rolling behavior is implemented by:
- Deployment objects (Kubernetes-native)
- DeploymentConfig objects (OpenShift-specific)
Both support rolling updates and rollbacks, but the mechanisms and knobs differ slightly.
This chapter focuses on:
- How rolling updates work conceptually in OpenShift
- How to control rollout behavior
- How to monitor rollouts
- How to perform and understand rollbacks
- Common pitfalls and best practices
Rolling updates with Deployments
A Deployment manages a ReplicaSet, which in turn manages pods. During a rolling update:
- A new ReplicaSet is created for the new version.
- Pods from the new ReplicaSet are gradually scaled up.
- Pods from the old ReplicaSet are gradually scaled down.
- Service traffic is continuously routed to all ready pods (old + new) via the underlying Service.
You control the rollout strategy via the spec.strategy field of the Deployment.
RollingUpdate strategy
For a rolling update, spec.strategy.type is RollingUpdate (the default):
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 25%
      maxSurge: 25%

Key parameters:
maxUnavailable
- Maximum number of pods that can be unavailable during the update
- Can be an absolute number (1, 2, …) or a percentage (25%)
- Affects how aggressively old pods can be taken down before new ones become ready

maxSurge
- Maximum number of extra pods (beyond the desired replica count) allowed during the update
- Also numeric or percentage
- Affects how many new pods can be started at once
Example:
- Desired replicas: 4
- maxUnavailable: 1
- maxSurge: 1
During the rollout:
- Up to 5 pods may run at once (4 desired + 1 surge).
- At least 3 pods must remain available.
Choosing values is a trade-off between:
- Availability (keep more pods of any version running)
- Speed (roll out faster with larger maxSurge/maxUnavailable)
- Resource usage (surge pods consume additional CPU/memory/storage)
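For a critical service where availability matters more than rollout speed, a conservative combination might look like the following sketch (the values are assumptions to adapt to your capacity headroom):

spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0   # never dip below the desired replica count
      maxSurge: 1         # introduce one extra pod at a time

With these values, all four old pods stay available throughout the rollout, at the cost of one extra pod's worth of resources while it runs.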
Deployment update triggers
Deployments create new revisions when certain fields change, most commonly:
- Container image (spec.template.spec.containers[].image)
- Environment variables
- ConfigMap/Secret references (if you change the Pod template to use new keys or names)
- Labels/annotations on the Pod template
A typical update workflow:
- Edit the Deployment (YAML, oc set image, or via the web console).
- A new ReplicaSet is created.
- Rolling update starts according to the configured strategy.
Example using oc:
oc set image deployment/myapp mycontainer=image-registry.example.com/team/myapp:v2
This triggers a new revision of the Deployment and starts a rolling update.
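Changing environment variables on the Pod template has the same effect. For example (the variable name LOG_LEVEL is purely illustrative):

oc set env deployment/myapp LOG_LEVEL=debug

This also creates a new revision and replaces pods in the same controlled fashion.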
Controlling progress and timeouts
Deployments track rollout progress:
- spec.progressDeadlineSeconds defines how long OpenShift/Kubernetes waits for the rollout to make progress.
- If the rollout stalls (e.g., pods crash-loop or never become ready), it is marked as failed.
Example:
spec:
  progressDeadlineSeconds: 600  # 10 minutes
If pods fail readiness checks or cannot be scheduled, the Deployment will not complete, and you’ll see conditions like ProgressDeadlineExceeded.
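You can query the Progressing condition directly; a sketch, assuming the Deployment is named myapp:

oc get deployment myapp -o jsonpath='{.status.conditions[?(@.type=="Progressing")].reason}'
# prints ProgressDeadlineExceeded if the rollout has stalled past the deadline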
OpenShift commands to inspect:
oc rollout status deployment/myapp
oc describe deployment/myapp

Rolling updates with DeploymentConfigs
DeploymentConfig is an OpenShift-specific controller with its own rolling strategy and trigger system. It behaves similarly to a Deployment but uses ReplicationController objects instead of ReplicaSet.
Rolling strategy in DeploymentConfig
The rolling strategy is defined under spec.strategy.type: Rolling:
spec:
  strategy:
    type: Rolling
    rollingParams:
      maxUnavailable: 25%
      maxSurge: 25%
      intervalSeconds: 1
      timeoutSeconds: 600
      updatePeriodSeconds: 1

Important parameters:
maxUnavailable / maxSurge
- Same concept as for Deployment.

intervalSeconds
- Time between polling deployment status and making progress decisions.

updatePeriodSeconds
- Time to wait between individual pod updates (throttling the rollout).

timeoutSeconds
- Overall timeout for the rollout; if exceeded, the deployment is considered failed.
You can also configure pre/post lifecycle hooks in rollingParams (e.g., to run migrations before switching fully to new pods), but hook details are typically treated in more advanced chapters.
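As a taste of what a hook looks like, here is a minimal sketch of a pre hook that runs a command in a new pod before the rollout proceeds (the migration script name is a placeholder):

spec:
  strategy:
    type: Rolling
    rollingParams:
      pre:
        failurePolicy: Abort        # abort the rollout if the hook fails
        execNewPod:
          containerName: mycontainer
          command: ["/bin/sh", "-c", "./run-migrations.sh"]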
DeploymentConfig triggers and image changes
DeploymentConfig supports triggers to start new deployments automatically:
Typical triggers:
- Image change trigger:
- Deploys automatically when a referenced image stream tag changes.
- Config change trigger:
- Deploys when the pod template changes in the DeploymentConfig.
Example snippet:
spec:
  triggers:
  - type: ConfigChange
  - type: ImageChange
    imageChangeParams:
      automatic: true
      containerNames:
      - mycontainer
      from:
        kind: ImageStreamTag
        name: myapp:latest
Whenever the ImageStreamTag myapp:latest is updated, a new rollout is initiated.
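With this trigger in place, retagging the image stream is enough to deploy. For example (the registry path is illustrative):

oc tag image-registry.example.com/team/myapp:v2 myapp:latest
# myapp:latest now points at v2, which kicks off a new rollout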
`oc` commands for DeploymentConfigs
Key commands:
- Start a new rollout manually:
oc rollout latest dc/myapp
- Watch rollout:
oc rollout status dc/myapp
The rollout behavior is still governed by rollingParams even if the rollout is manual.
Monitoring and managing rollouts
Regardless of whether you use Deployment or DeploymentConfig, you need to:
- Track rollout progress
- Inspect failures
- Potentially pause or resume rollouts
Inspecting rollout status
For Deployments:
oc rollout status deployment/myapp
oc get deployment myapp
oc describe deployment myapp

For DeploymentConfigs:
oc rollout status dc/myapp
oc get dc myapp
oc describe dc myapp

Typical things to look for:
- Conditions such as Progressing, Available, ReplicaFailure
- Events showing scheduling issues, image pull errors, or readiness probe failures
- Number of updated/available/unavailable replicas
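For a quick numeric view during a rollout, you can pull the replica counters straight from status (a sketch for a Deployment named myapp):

oc get deployment myapp -o jsonpath='updated={.status.updatedReplicas} available={.status.availableReplicas} unavailable={.status.unavailableReplicas}{"\n"}'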
Pausing and resuming rollouts (Deployments only)
Deployments support pausing:
oc rollout pause deployment/myapp
# modify spec.template, e.g., add environment variables, sidecars, etc.
oc rollout resume deployment/myapp

While paused, changes to the deployment specification are recorded but not applied to pods until you resume. This is useful for batching multiple changes into a single rollout.
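A concrete sketch of batching two changes into one rollout (the image tag and variable name are assumptions):

oc rollout pause deployment/myapp
oc set image deployment/myapp mycontainer=image-registry.example.com/team/myapp:v3
oc set env deployment/myapp FEATURE_FLAG=enabled
oc rollout resume deployment/myapp   # both changes roll out together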
Rollbacks: reverting to a previous version
If a new version misbehaves (errors, performance issues, failed health checks), you can roll back.
The core idea:
- Each rollout is stored as a revision (Deployment revision or DeploymentConfig version).
- A rollback sets the controller’s template back to a previous revision and starts another rolling update.
Rollbacks with Deployments
To view revisions and history:
oc rollout history deployment/myapp
oc rollout history deployment/myapp --revision=3

To roll back to the previous revision:

oc rollout undo deployment/myapp

To roll back to a specific revision:

oc rollout undo deployment/myapp --to-revision=3

Outcome:
- The Deployment spec (pod template) is reverted to the target revision.
- A new rolling update is started from the current pods to the reverted template.
Note:
- By default, Deployments keep a number of old ReplicaSets (spec.revisionHistoryLimit). If old revisions are garbage-collected, you cannot roll back to them.
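You can set the limit explicitly; for example, to keep ten old ReplicaSets available as rollback targets:

spec:
  revisionHistoryLimit: 10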
Rollbacks with DeploymentConfigs
DeploymentConfigs have a similar concept, with an internal version number.
View rollout history:
oc rollout history dc/myapp
oc rollout history dc/myapp --revision=3

Rollback:
oc rollout undo dc/myapp
# or to a specific revision:
oc rollout undo dc/myapp --to-revision=3
You can also manually set spec.template from a previous revision if you need finer control, but oc rollout undo is usually sufficient.
What gets reverted (and what doesn’t)
Rollbacks typically revert:
- Pod template (container images, env vars, ports, volumes, probes)
- Labels/annotations on the template
They do not revert:
- Persistent data in volumes or databases
- External services or configuration outside this specific controller
- Other resources like ConfigMaps or Secrets that may have changed independently
This means a “rollback” in OpenShift is a workload configuration rollback, not a full environment time machine.
Dealing with failed rollouts
During a rollout, failures can occur due to:
- Broken container image (crash loop, application error)
- Failing liveness or readiness probes
- Misconfigured environment variables or secrets
- Resource constraints (pods cannot schedule)
- Networking issues
Detecting failures
Use:
oc rollout status deployment/myapp
oc logs deployment/myapp
oc describe pod <pod-name>

or for DeploymentConfigs:
oc rollout status dc/myapp
oc logs dc/myapp
oc describe pod <pod-name>

Signs of a failed rollout:
- oc rollout status does not complete or reports failure
- Pods stuck in CrashLoopBackOff or ImagePullBackOff
- Conditions such as ProgressDeadlineExceeded
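Listing the pods behind the workload makes these states easy to spot (the app label is an assumption; use whatever selector your Pod template carries):

oc get pods -l app=myapp
# look for STATUS values such as CrashLoopBackOff or ImagePullBackOff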
Reacting to failures
Common responses:
- Immediate rollback:
- If the new version is clearly broken, use oc rollout undo to restore service quickly.
- Fix-forward:
- If the issue is minor and quick to fix, you may choose to push a corrected version as a new rollout rather than rolling back.
- Adjust strategy:
- If failures are related to load or capacity, you might:
- Reduce maxSurge/maxUnavailable
- Increase resources or limits
- Re-tune probes so they reflect realistic startup and readiness behavior (a probe sketch follows this list)
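As an example of probe re-tuning, a readiness probe with a more realistic startup allowance might look like this (path, port, and timings are assumptions to adapt to your application):

readinessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 15   # give the app time to start before the first check
  periodSeconds: 5
  failureThreshold: 3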
Zero-downtime and readiness considerations
Rolling updates depend heavily on pods becoming ready before they are counted toward available capacity.
Key aspects:
- Readiness probes:
- Until a pod passes its readiness probe, it will not receive traffic via Services or Routes.
- If you misconfigure readiness (too strict, wrong path/port), the rollout can stall.
- Shutdown behavior:
- When a pod is terminated during a rolling update, it receives a SIGTERM and has a grace period (terminationGracePeriodSeconds) to shut down gracefully.
- If your app ignores termination signals or takes too long, users may experience errors.
- Stateful components:
- For stateful or session-heavy apps, consider:
- Allowing some time for connections to drain before killing old pods.
- Using readiness/lifecycle hooks (e.g., preStop) to coordinate shutdown (a sketch follows below).
- Deeper details are covered in chapters on storage and stateful applications.
Tuning readiness and termination behavior is essential for true zero-downtime rolling updates.
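For example, a minimal preStop hook that delays shutdown so in-flight connections can drain might look like this sketch (the sleep duration is an assumption; size it to your traffic):

spec:
  template:
    spec:
      terminationGracePeriodSeconds: 60   # must cover the preStop delay plus shutdown time
      containers:
      - name: mycontainer
        lifecycle:
          preStop:
            exec:
              command: ["/bin/sh", "-c", "sleep 10"]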
Blue-green and canary as alternatives
Rolling updates are not the only strategy:
- Blue-green:
- Run old (blue) and new (green) versions side-by-side.
- Switch traffic entirely at once using routing rules.
- Provides a clear, quick rollback path (switch back to blue).
- Canary:
- Send a small fraction of traffic to the new version.
- Gradually increase if no issues appear.
OpenShift’s rolling update feature can be combined with these patterns (for example, by using multiple Deployments and Routes), but the implementation details of blue-green and canary are typically addressed in more advanced deployment chapters.
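As a flavor of how such a pattern can look in OpenShift, a Route can split traffic between two Services backed by separate Deployments (service names and weights are assumptions; a minimal canary sketch):

apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: myapp
spec:
  to:
    kind: Service
    name: myapp-stable
    weight: 90
  alternateBackends:
  - kind: Service
    name: myapp-canary
    weight: 10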
Best practices for rolling updates and rollbacks
- Treat your workload definitions (Deployment, DeploymentConfig) as version-controlled code.
- Use clear version tags for images (avoid mutable tags like latest in production).
- Always define readiness and liveness probes appropriate to your application.
- Start with conservative rollout parameters:
- maxUnavailable: 0 and a small maxSurge for critical services
- Relax these as you gain confidence and need faster rollouts.
- Regularly test rollback procedures in non-production environments so the process is familiar and reliable.
- Monitor not just rollout status, but also:
- Application-level metrics (latency, error rates)
- Logs during and after rollouts
- Keep revisionHistoryLimit at a sensible value so you have meaningful rollback targets without uncontrolled history growth.
Using rolling updates and rollbacks correctly lets you ship changes frequently while maintaining stability and user experience—one of the core advantages of OpenShift-based application deployment.