Concept of Vertical Scaling in OpenShift
Vertical scaling in OpenShift means changing the resources available to a pod (and therefore its containers), rather than changing the number of pod replicas. Instead of adding more pods, you give more CPU, memory, or other resources to the existing pods.
In Kubernetes/OpenShift terms, this is about adjusting:
- requests – what a pod is guaranteed to get.
- limits – the maximum a pod is allowed to use.
Vertical scaling is especially relevant for:
- Stateful or legacy applications that cannot scale out horizontally.
- Workloads with strong single-threaded bottlenecks.
- Applications with strict licensing per node/core.
Horizontal Pod Autoscaling (HPA) and cluster-autoscaling focus on more pods and more nodes; vertical scaling focuses on bigger pods.
Resource Requests and Limits
Vertical scaling is implemented primarily by adjusting CPU and memory requests/limits in pod specs (via Deployment, StatefulSet, DeploymentConfig, etc.).
Example:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
      - name: app
        image: myorg/example:latest
        resources:
          requests:
            cpu: "250m"
            memory: "512Mi"
          limits:
            cpu: "2"
            memory: "4Gi"

Typical vertical scaling actions:
- Scale up: increase requests and/or limits (e.g. from 512Mi to 2Gi memory).
- Scale down: decrease requests and/or limits.
Implications:
- If requests are increased, the pod may need to be rescheduled to a node that has enough free capacity.
- If limits are too low, the pod may be throttled on CPU or OOMKilled on memory.
- If requests are too high, scheduling options shrink and overall cluster utilization gets worse.
Manual Vertical Scaling Workflow
Vertical scaling can be done manually by editing the workload spec and letting OpenShift roll out new pods.
Typical workflow:
- Observe metrics:
- Pod CPU usage, throttling, memory usage, OOM kills.
- Update the workload definition:
- oc edit deployment my-app or oc set resources deployment/my-app --limits=cpu=2,memory=4Gi --requests=cpu=1,memory=2Gi
- Apply new configuration:
- OpenShift triggers a rollout (new pods with new resources).
- Verify:
- oc get pods to see the new pods.
- Re-check metrics to ensure the new size is sufficient.
Example with oc:
# Increase CPU and memory for a deployment
oc set resources deployment/my-app \
  --requests=cpu=500m,memory=1Gi \
  --limits=cpu=2,memory=4Gi

Manual scaling is simple but:
- Depends on human analysis and intervention.
- May be too slow for rapid workload changes.
- Can be error-prone in large environments.
Vertical Pod Autoscaling Concepts
Kubernetes provides the Vertical Pod Autoscaler (VPA), and similar functionality can be used in OpenShift depending on the cluster version, installed Operators, or custom tooling.
The idea:
- Continuously observe pod resource usage.
- Recommend or apply new CPU/memory requests/limits.
- Trigger pod restarts with updated resource settings.
Typical VPA components (conceptual):
- Recommender – analyzes historical metrics (CPU, memory usage).
- Updater – decides which pods should be updated and when.
- Admission controller/webhook – mutates pod specs at creation time to include recommended resources.
Modes you may encounter:
- Off / Recommendation-only: produces suggestions but doesn’t change pods.
- Initial: sets resources only when a pod is created.
- Auto: can evict and recreate pods with updated resources.
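For illustration, a VerticalPodAutoscaler object as defined by the upstream VPA project might look like the sketch below; the exact API group/version and availability depend on which Operator or tooling is installed in the cluster, and the target reuses the earlier example Deployment.

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: example-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-app
  updatePolicy:
    updateMode: "Auto"          # alternatives: "Off" (recommend only), "Initial"
  resourcePolicy:
    containerPolicies:
    - containerName: app
      minAllowed:               # floor for recommendations
        cpu: "250m"
        memory: "256Mi"
      maxAllowed:               # ceiling for recommendations
        cpu: "4"
        memory: "8Gi"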
Use cases:
- Batch workloads with unpredictable size.
- Long-running services where it’s hard to tune requests manually.
- Environments where you want to gradually converge on right-sizing.
Trade-offs:
- VPA may evict pods, causing restarts.
- Works best with robust applications that can handle restarts and short downtimes.
- Needs properly configured metrics and retention to make good decisions.
Vertical Scaling and Scheduling
Vertical scaling affects how pods fit onto nodes:
- Increasing requests makes pods “bigger” from the scheduler’s perspective.
- Larger pods may only fit on a few nodes, impacting placement.
- If no node can satisfy the increased request, the pod stays in Pending state.
Considerations:
- Node size vs pod size:
- If nodes are small and pods are huge, fragmentation increases, and scheduling becomes difficult.
- Overcommitment:
- Setting limits > requests allows overcommit; many pods can use more at peak than they are guaranteed.
- High overcommit needs careful monitoring; otherwise pods may experience contention and throttling.
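A minimal sketch of such an overcommitted container spec (values purely illustrative); because requests are lower than limits, the pod gets the Burstable QoS class and the node can be overcommitted:

resources:
  requests:
    cpu: "250m"        # what the scheduler reserves on the node
    memory: "512Mi"
  limits:
    cpu: "1"           # the pod may burst up to 4x its CPU request
    memory: "1Gi"      # requests < limits => Burstable QoS, overcommit possible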
In OpenShift clusters with cluster autoscaling, vertical scaling can:
- Trigger node scale-out if pods with larger requests cannot be placed.
- Cause scale-in delays, as large pods may pin nodes from being emptied.
Vertical vs Horizontal Scaling in OpenShift
Vertical scaling and horizontal scaling often complement each other.
Typical patterns:
- Step 1: Right-size vertically:
- Make sure each pod has a reasonable amount of resources.
- Avoid tiny pods that constantly OOM or huge pods that waste resources.
- Step 2: Scale horizontally:
- Use HPA or manual replica adjustments to handle varying load.
When vertical scaling is more appropriate:
- Apps that must maintain strong session locality or in-memory state that does not easily shard.
- Licensed software limited by number of instances or cores.
- Workloads benefiting from high per-pod memory (e.g. large in-memory caches, some JVM apps).
When horizontal scaling is preferred:
- Stateless microservices.
- Web frontends where adding pods is cheap.
- Workloads designed around distributed parallelism.
In some scenarios, you may need both:
- Start with large-enough pods to be efficient.
- Then use HPA to increase/decrease replica count.
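As a sketch of that combination (names and thresholds are illustrative): once the pods are right-sized, an HPA can vary the replica count with load:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: example-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-app          # the vertically right-sized Deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # scale out when average CPU usage exceeds 70% of requests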
Vertical Scaling Strategies and Best Practices
Start with requests, not limits
- Requests drive scheduling and capacity planning.
- Set requests to a reasonable baseline (e.g. 50–70% of typical usage).
- Use limits to prevent extreme outliers, but keep them above normal peaks to reduce throttling.
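As an illustrative starting point (numbers assumed, not measured), for a service whose typical usage is around 500m CPU and 800Mi memory this guideline could translate into:

resources:
  requests:
    cpu: "300m"        # roughly 50–70% of typical usage
    memory: "512Mi"
  limits:
    cpu: "1"           # kept above normal peaks to reduce throttling
    memory: "1536Mi"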
Use data, not guesses
Base vertical scaling decisions on:
- Historical metrics from OpenShift monitoring (CPU, memory).
- OOM kill events and restart reasons.
- Response time / throughput metrics when saturation is suspected.
Example steps:
- Observe pods running at >80% memory with occasional OOMKilled events.
- Increase memory request/limit by a small factor (e.g. +25–50%).
- Watch for stability over several peak cycles.
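One way to collect this data from the command line (assuming the cluster metrics API / OpenShift monitoring is available):

# Current CPU and memory usage of the pods in the current project
oc adm top pods

# Container state, restart count, and last termination reason (e.g. OOMKilled)
oc describe pod <pod-name>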
Beware of JVM and managed runtimes
Java, .NET, and other managed runtimes may:
- Size their heap based on available memory, not just requests.
- Hit limits and cause OutOfMemoryError if limits are too low.
- Need tuning flags (e.g. heap size) aligned with container memory limits.
For these workloads, vertical scaling is tightly coupled with runtime tuning.
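A sketch of such alignment for a JVM workload (flag value and sizes are illustrative; exact tuning depends on the runtime and its version):

containers:
- name: app
  image: myorg/example:latest        # assumed to be a JVM-based image
  resources:
    requests:
      memory: "1Gi"
    limits:
      memory: "2Gi"
  env:
  - name: JAVA_TOOL_OPTIONS
    # let the JVM size its heap as a fraction of the container memory limit
    value: "-XX:MaxRAMPercentage=75.0"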
Plan for restarts
Changing pod resources typically implies recreating pods:
- For stateless workloads with multiple replicas, this is usually fine.
- For single-instance or stateful apps, plan:
- Maintenance windows.
- Readiness/liveness probes to ensure healthy rollouts.
- Use of a PodDisruptionBudget to limit concurrent disruptions.
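A minimal PodDisruptionBudget sketch (the label selector is assumed to match the workload's pod labels):

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 1              # keep at least one pod running during voluntary disruptions
  selector:
    matchLabels:
      app: my-app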
Interactions with quotas and limits
Projects/namespaces may have:
- ResourceQuotas that cap cumulative CPU/memory usage.
- LimitRanges that define default and maximum requests/limits.
When vertically scaling:
- Check that new pod sizes fit within quotas.
- Ensure they do not exceed project-level max limits.
- Coordinate with cluster administrators when higher resources are needed.
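For reference, illustrative ResourceQuota and LimitRange objects that a new pod size would have to fit within (all values are examples, not recommendations):

apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
spec:
  hard:
    requests.cpu: "8"            # total CPU requests allowed in the project
    requests.memory: 16Gi
    limits.cpu: "16"
    limits.memory: 32Gi
---
apiVersion: v1
kind: LimitRange
metadata:
  name: container-limits
spec:
  limits:
  - type: Container
    defaultRequest:              # applied when a container sets no requests
      cpu: "250m"
      memory: 512Mi
    default:                     # applied when a container sets no limits
      cpu: "1"
      memory: 1Gi
    max:                         # per-container ceiling
      cpu: "4"
      memory: 8Gi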
Typical Vertical Scaling Scenarios in OpenShift
Scenario 1: Single-pod legacy application
- Old monolithic app, difficult to run multiple replicas safely.
- Approach:
- Start with conservative resources.
- Increase memory/CPU stepwise while observing metrics.
- Add a second replica only if the app can handle basic HA.
Scenario 2: Memory-heavy analytics job
- Batch job using in-memory datasets.
- Approach:
- Request a large amount of memory per pod.
- Use vertical scaling based on job success/failure and runtime.
- Optionally combine with multiple pods if the algorithm supports partitioning.
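A sketch of such a batch job (image name and resource sizes are assumptions for illustration):

apiVersion: batch/v1
kind: Job
metadata:
  name: analytics-job
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: analytics
        image: myorg/analytics:latest   # assumed image
        resources:
          requests:
            cpu: "2"
            memory: "16Gi"              # sized for the in-memory dataset
          limits:
            cpu: "4"
            memory: "24Gi"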
Scenario 3: Bursty but low-concurrency workload
- Internal tool used heavily in short bursts.
- Horizontal scaling might not help much if concurrent sessions are limited.
- Approach:
- Provide more CPU per pod so bursts are absorbed quickly.
- Possibly use vertical autoscaling to adapt between quiet and busy periods.
Summary
Vertical scaling in OpenShift focuses on changing pod size (CPU/memory) instead of pod count. It relies on careful configuration of requests and limits, observation of real resource usage, and understanding of scheduling and application behavior.
When used thoughtfully and often in combination with horizontal scaling, vertical scaling helps:
- Avoid resource starvation and OOM kills.
- Improve performance of workloads that cannot easily scale out.
- Increase overall cluster efficiency by right-sizing workloads to their actual needs.