
Horizontal pod autoscaling

Concept and Goals of Horizontal Pod Autoscaling

Horizontal Pod Autoscaling (HPA) in OpenShift automatically adjusts the number of pod replicas for a workload based on observed metrics. Instead of manually scaling a Deployment/DeploymentConfig up or down, HPA continuously evaluates metrics such as CPU or custom application metrics and changes replica counts accordingly.

Key characteristics:

  • HPA changes the number of replicas (scaling out and in); it does not change the resources of individual pods.
  • Scaling is always bounded by a configured minimum and maximum replica count.
  • Decisions are driven by observed metrics (CPU, memory, or custom metrics), so a working metrics stack is required.
  • It can target any scalable controller, such as a Deployment, ReplicaSet, StatefulSet, or DeploymentConfig.

HPA is best suited for:

  • Stateless, horizontally scalable workloads such as web frontends, APIs, and stateless workers.
  • Workloads whose load varies over time and correlates with a measurable metric.
  • Applications that start quickly enough for new replicas to absorb load spikes.

How HPA Works in OpenShift

At a high level, HPA involves:

  1. Metrics collection
    • A metrics stack (e.g., OpenShift’s built‑in metrics, cluster monitoring) gathers pod and/or custom metrics.
  2. HPA controller
    • A control loop (in the Kubernetes control plane) periodically checks metrics against the scaling rules defined in the HPA object.
  3. Replica adjustment
    • The controller calculates the desired replica count and updates the target controller (Deployment, DeploymentConfig, etc.).
  4. Workload reconciliation
    • The target controller creates or removes pods to reach the desired number of replicas.

The HPA controller uses the observed metrics and a target value to compute the desired replicas. For resource metrics like CPU, a typical formula is:

$$
\text{desiredReplicas} = \text{currentReplicas} \times \frac{\text{currentMetric}}{\text{targetMetric}}
$$

rounded to an integer and clamped between minReplicas and maxReplicas.
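For example, if 4 replicas are running at an average of 90% CPU utilization against a 70% target, the controller rounds up:

$$
\text{desiredReplicas} = \left\lceil 4 \times \frac{90}{70} \right\rceil = \lceil 5.14 \rceil = 6
$$

subject to the configured minReplicas and maxReplicas bounds.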

Supported Metrics Types

OpenShift’s HPA uses the same metric types as upstream Kubernetes, but the actual availability of each depends on cluster configuration.

Common categories:

  • Resource metrics: CPU and memory usage of pods, supplied by the cluster metrics pipeline.
  • Pods metrics: custom per‑pod application metrics (e.g., requests per second), averaged across pods.
  • Object metrics: metrics describing some other Kubernetes object (e.g., request rate on an Ingress/Route).
  • External metrics: values from systems outside the cluster (e.g., a message queue depth).

On many OpenShift clusters, you will most commonly start with CPU (and sometimes memory) metrics, and only later integrate custom or external metrics as your observability and metrics stack matures.
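On clusters where memory metrics are available, a memory‑based target is declared the same way as CPU inside an HPA's metrics list. A sketch (the utilization value is illustrative):

```yaml
metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        # Percentage of the pods' memory *requests*, not their limits
        averageUtilization: 75
```

Note that memory often does not fall when load drops (many runtimes hold on to allocated memory), so memory-based scaling deserves extra caution.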

Defining an HPA Object

The HPA is defined as a standard Kubernetes resource (HorizontalPodAutoscaler) that references a scalable target and one or more metrics.

Basic fields:

  • scaleTargetRef: the controller to scale (API version, kind, and name).
  • minReplicas / maxReplicas: the bounds the autoscaler must stay within.
  • metrics: one or more metric specifications with their targets.
  • behavior (optional, autoscaling/v2): fine‑grained scale‑up and scale‑down policies.

Example: CPU-based HPA for a Deployment:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-frontend-hpa
  namespace: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-frontend
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70

This configuration:

  • Targets the Deployment named web-frontend in the my-app namespace.
  • Keeps the replica count between 2 and 10.
  • Aims to keep average CPU utilization across the pods at about 70% of their CPU requests.

Creating and Managing HPAs with `oc`

You can define HPAs either by manifest or via oc commands.

Creating a basic HPA

CPU-based HPA using oc:

oc autoscale deployment/web-frontend \
  --cpu-percent=70 \
  --min=2 \
  --max=10

For DeploymentConfig:

oc autoscale dc/my-api \
  --cpu-percent=60 \
  --min=1 \
  --max=8

This command generates an HPA resource for the chosen object in the current project/namespace.

Inspecting and describing HPAs

List HPAs:

oc get hpa

Check details and current scaling decisions:

oc describe hpa web-frontend-hpa

You’ll see information like:

  • The current and target metric values (e.g., cpu: 45% / 70%).
  • The configured minimum and maximum replica counts.
  • The current and desired replica counts.
  • Conditions and recent events explaining scaling decisions, or why scaling is blocked.

Updating and deleting HPAs

Update the HPA manifest (oc edit hpa web-frontend-hpa) or apply a changed YAML:

oc apply -f web-frontend-hpa.yaml

Delete an HPA (stopping autoscaling but not removing the workload):

oc delete hpa web-frontend-hpa

Interaction with Resource Requests and Limits

HPA behavior is strongly influenced by resource requests and limits:

  • Utilization-based metrics (type: Utilization) are computed as a percentage of the pods' requests, not their limits.
  • If containers have no CPU request, the HPA cannot compute CPU utilization and will not scale on that metric.
  • Limits cap what a pod may consume, but they do not enter the utilization calculation.

Best practice:

  • Set realistic CPU (and memory) requests on every autoscaled workload, based on observed typical usage.
  • Keep requests consistent across a workload's containers so utilization percentages stay meaningful.

Example pod spec excerpt that HPA will rely on:

resources:
  requests:
    cpu: "200m"
    memory: "256Mi"
  limits:
    cpu: "500m"
    memory: "512Mi"

If averageUtilization is 70, HPA aims for about 70% of the CPU requests, not the limits.
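Applied to the excerpt above, the utilization target works out as:

$$
\text{targetUsage} = 0.70 \times 200\,\text{m} = 140\,\text{m CPU per pod}
$$

Sustained average usage above roughly 140m triggers a scale‑up; the 500m limit plays no role in this calculation.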

Scaling Behavior: Stabilization and Cooldown

HPA is not instantaneous; it has built‑in protections to avoid oscillation ("flapping"):

  • The controller re-evaluates metrics periodically (every 15 seconds by default in upstream Kubernetes).
  • Small deviations from the target (within a tolerance of about 10% by default) do not trigger scaling.
  • Scale‑down decisions are smoothed over a stabilization window (5 minutes by default), so brief dips in load do not immediately remove pods.

In autoscaling/v2, you can configure behavior more explicitly (if supported/enabled in your OpenShift version):

spec:
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
        - type: Percent
          value: 100
          periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 50
          periodSeconds: 60

This example:

  • Allows scale‑up of at most 100% of current replicas (i.e., doubling) per 60‑second period, after a 60‑second stabilization window.
  • Allows scale‑down of at most 50% of current replicas per 60‑second period, and only after load has stayed low for a 5‑minute stabilization window.

Using Custom and External Metrics (Conceptual Overview)

In more advanced setups, HPA can scale based on custom or external metrics via metrics adapters. Without going into installation details:

  • Custom metrics (per‑pod application metrics such as request rate) are served through the custom metrics API (custom.metrics.k8s.io), typically by an adapter such as the Prometheus Adapter.
  • External metrics (values that live outside the cluster, such as a cloud queue length) are served through the external metrics API (external.metrics.k8s.io).

A conceptual metric definition within an HPA might look like:

metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "50"

or

metrics:
  - type: External
    external:
      metric:
        name: job_queue_length
      target:
        type: AverageValue
        averageValue: "100"

These require that:

  • A metrics adapter exposing the custom or external metrics API is installed and healthy in the cluster.
  • The application (or an exporter) actually publishes the metric under the exact name referenced in the HPA.
  • The metric is a sensible scaling signal, i.e., it rises and falls with load.
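Multiple metrics can also be combined in a single HPA: the controller computes a desired replica count for each metric and uses the largest. A sketch combining the earlier CPU target with the (illustrative) per‑pod request-rate metric:

```yaml
metrics:
  # Scale on CPU utilization relative to requests...
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  # ...and on a per-pod application metric (illustrative name);
  # the higher of the two computed replica counts wins.
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "50"
```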

Workload Considerations and Patterns

Not all workloads are good candidates for HPA. Consider:

  • Statelessness: stateless services scale cleanly; stateful workloads may need careful coordination before adding replicas actually helps.
  • Startup time: pods that take minutes to become ready cannot absorb sudden spikes, no matter how quickly the HPA reacts.
  • Shared bottlenecks: if a database or external dependency is the limit, adding replicas only moves the queue.

Also consider:

  • Graceful shutdown and connection draining, so scale‑down does not drop in-flight requests.
  • Singleton or leader-elected components, which must not run more than one active replica.
  • PodDisruptionBudgets and anti-affinity rules, which can interact with rapid replica changes.

HPA and Other Scaling Mechanisms

HPA operates at the pod replica level and interacts with other scaling features:

  • Vertical Pod Autoscaler (VPA) adjusts per-pod resource requests; using HPA and VPA on the same resource metric (e.g., CPU) can cause them to fight and is generally discouraged.
  • Cluster autoscaling adds or removes nodes; HPA can only create pods that the cluster has capacity to schedule.
  • Manual scaling: while an HPA manages a workload, it overrides manually set replica counts, so avoid hand-editing spec.replicas on autoscaled objects.

In OpenShift environments, teams often:

  • Pair HPA with cluster autoscaling so new replicas have somewhere to run.
  • Start with CPU-based scaling and move to custom metrics once the monitoring stack matures.
  • Keep VPA, if used at all, in recommendation-only mode for horizontally autoscaled workloads.

Observability and Troubleshooting HPA

To operate HPA effectively, you must be able to see what it is doing and why.

Useful checks:

  • oc get hpa — shows current/target metrics and replica counts at a glance.
  • oc describe hpa <name> — shows conditions and events explaining recent decisions.
  • oc adm top pods — confirms the metrics pipeline is actually reporting pod usage.
  • oc get events — surfaces scheduling or quota problems that block new replicas.

Common symptoms and their likely causes:

  • Metrics show <unknown>: the metrics stack is unhealthy, or the pods have no CPU requests.
  • Stuck at maxReplicas: the utilization target is too low for real load, or the bottleneck is elsewhere.
  • Replica count oscillates: thresholds are too aggressive; lengthen the scale‑down stabilization window.
  • No scaling despite load: the HPA references the wrong target, or the load does not show up in the configured metric.

Instrumenting your application with appropriate metrics, and reviewing both HPA status and application performance, is key to tuning autoscaling.

Best Practices for Horizontal Pod Autoscaling in OpenShift

Key recommendations:

  • Always set CPU (and memory) requests on autoscaled workloads; utilization targets are meaningless without them.
  • Choose minReplicas high enough for availability and maxReplicas low enough to respect cluster capacity and quotas.
  • Start with a moderate utilization target (e.g., 60–80% CPU) and tune based on observed behavior.
  • Configure conservative scale‑down behavior to avoid flapping.
  • Load-test the autoscaling configuration before relying on it in production.
  • Watch HPA status and events routinely, not just application metrics.

By combining these practices with OpenShift’s monitoring and logging capabilities, you can build applications that respond automatically and predictably to changing load using horizontal pod autoscaling.
