Metrics and alerts

Types of Metrics in OpenShift

In OpenShift, most “monitoring” data you work with falls into a small set of metric types. Understanding them helps you create meaningful alerts instead of noisy ones.

Core metric types

Prometheus, and therefore OpenShift monitoring, works with four core metric types:

  • Counter: a value that only ever increases (e.g. total HTTP requests served); usually queried with rate() or irate().
  • Gauge: a value that can go up and down (e.g. current memory usage, queue length).
  • Histogram: observations counted into buckets (e.g. request latency); queried with histogram_quantile().
  • Summary: quantiles pre-computed by the client library.
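
These types are visible directly in the plain-text format an application exposes on its /metrics endpoint; a small sketch with made-up names and values:

# TYPE http_requests_total counter
http_requests_total{method="GET", status="200"} 1027

# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 4.72e+07

# TYPE http_request_duration_seconds histogram
http_request_duration_seconds_bucket{le="0.1"} 850
http_request_duration_seconds_bucket{le="0.5"} 990
http_request_duration_seconds_bucket{le="+Inf"} 1027
http_request_duration_seconds_sum 123.4
http_request_duration_seconds_count 1027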

High-level categories of metrics

At a higher level, the metrics you work with fall into platform metrics, collected out of the box by the cluster monitoring stack for nodes, the control plane, and cluster components, and application metrics, exposed by your own workloads and collected once user workload monitoring is enabled.

Metrics Collection and Exposure

Prometheus as the core metrics engine

OpenShift’s metrics and alerting are centered on Prometheus: the Cluster Monitoring Operator deploys and manages a Prometheus-based stack (Prometheus, Alertmanager, and supporting components), so you consume it rather than install and configure it yourself.

How metrics get into Prometheus

Prometheus relies on a pull model (scraping): it periodically fetches metrics over HTTP from the endpoints (typically /metrics) of the targets it has been told about.

In OpenShift, you generally don’t hand-edit prometheus.yml. Instead, you use ServiceMonitor and PodMonitor custom resources to declare which Services and Pods should be scraped.

Example: ServiceMonitor (conceptual)

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app-monitor
  namespace: my-namespace
spec:
  selector:
    matchLabels:
      app: my-app            # scrape Services carrying this label
  namespaceSelector:
    matchNames:
      - my-namespace         # only look for Services in this namespace
  endpoints:
    - port: metrics          # named Service port to scrape
      interval: 30s          # scrape every 30 seconds
      path: /metrics         # HTTP path exposing the metrics
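
For the ServiceMonitor above to find anything, the application needs a Service that carries the matching label and a named metrics port. A minimal sketch of such a Service (the port number 8080 is an assumption):

apiVersion: v1
kind: Service
metadata:
  name: my-app
  namespace: my-namespace
  labels:
    app: my-app              # matched by the ServiceMonitor's selector
spec:
  selector:
    app: my-app              # selects the application pods
  ports:
    - name: metrics          # referenced by name in the ServiceMonitor endpoint
      port: 8080             # assumed port where /metrics is served
      targetPort: 8080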

Working with Metrics via the OpenShift Console

The OpenShift web console provides multiple entry points for metrics: Observe → Metrics for ad-hoc PromQL queries, Observe → Dashboards for prebuilt dashboards, and the Metrics tab on individual workloads.

Metrics visible here are primarily platform-level metrics; application metrics appear when user workload monitoring and the relevant ServiceMonitor/PodMonitor are configured.
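
On current OpenShift 4.x releases, user workload monitoring is switched on through the cluster-monitoring-config ConfigMap in the openshift-monitoring namespace; a minimal sketch:

apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    enableUserWorkload: true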

Alerting Concepts in OpenShift

Alerts in OpenShift are defined in Prometheus alerting rules and handled by Alertmanager. The Cluster Monitoring Operator manages both for the platform stack.

Key components

  • Prometheus: evaluates alerting rules against the metrics it has scraped.
  • Alertmanager: receives firing alerts and handles grouping, silencing, and routing to receivers.
  • PrometheusRule resources: where alerting rules are defined (see Defining Alerting Rules below).
  • Cluster Monitoring Operator: deploys and manages the platform stack and its default rules.

PromQL for Alerting

PromQL (Prometheus Query Language) is used both for dashboards and alerts. For alerts, you usually:

  1. Transform raw metrics into a useful signal, often using:
    • rate() / irate() for Counters
    • sum, avg, max, min, count for aggregation
    • histogram_quantile() for latency metrics from histograms
  2. Apply a condition:
    • Compare against thresholds:
      • some_metric > threshold
    • Check ratios:
      • error_rate / total_request_rate > 0.05
  3. Use label filters to scope what you alert on:
    • metric{namespace="my-namespace", app="my-app"}

Common helper expressions, using illustrative metric names such as http_requests_total (also used in the rule example below) and a latency histogram http_request_duration_seconds:
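
# Per-second request rate over the last 5 minutes (counter + rate)
sum(rate(http_requests_total{app="my-app"}[5m]))

# Error ratio: share of requests answered with a 5xx status
  sum(rate(http_requests_total{app="my-app", status=~"5.."}[5m]))
/
  sum(rate(http_requests_total{app="my-app"}[5m]))

# 95th percentile latency from a histogram (metric name assumed)
histogram_quantile(0.95,
  sum by (le) (rate(http_request_duration_seconds_bucket{app="my-app"}[5m])))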

Defining Alerting Rules

Cluster-level alerting rules for OpenShift components are managed by the platform. To extend alerting (especially for applications), OpenShift uses the PrometheusRule custom resource (API group monitoring.coreos.com), shown in the example below.

Structure of an alert rule

An alert rule contains an alert name, a PromQL expression (expr), an optional for duration the condition must hold before the alert fires, labels such as severity, and annotations such as summary and description.

Conceptual example (application-oriented):

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: my-app-rules
  namespace: my-namespace
spec:
  groups:
    - name: my-app-availability
      rules:
        - alert: MyAppHighErrorRate
          expr: |
            (
              sum(rate(http_requests_total{app="my-app", status=~"5.."}[5m]))
            /
              sum(rate(http_requests_total{app="my-app"}[5m]))
            ) > 0.05
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "High error rate for my-app"
            description: "More than 5% of my-app requests have been failing for 10 minutes."

This example illustrates typical alert design practices: it alerts on an error ratio computed with rate() rather than on raw counts, uses a for duration of 10 minutes so brief spikes do not fire it, carries a severity label that Alertmanager can route on, and has summary and description annotations that tell the responder what is happening.

Alert Lifecycle and States

Prometheus internally tracks alert states before passing them to Alertmanager: an alert is inactive while its expression returns nothing, pending once the expression is true but the for duration has not yet elapsed, and firing when the condition has held for the full duration.

Alertmanager processes firing alerts: it deduplicates and groups related alerts, applies silences and inhibition rules, and routes what remains to the configured receivers (email, chat, paging systems, and so on).

In the console under Observe → Alerting, you can see which alerts are currently firing or pending, the alerting rules behind them, and any silences that are in place.

Metrics, SLIs, SLOs, and Alerting

Metrics form the basis of Service Level Indicators (SLIs), which you use to implement Service Level Objectives (SLOs). Alerts should typically be SLO-driven, not just raw metric thresholds.

Typical SLIs in OpenShift environments include request availability (the share of requests answered successfully), request latency (often a high percentile taken from a histogram), and error rate for the endpoints users depend on.

SLO-aligned alerting patterns include burn-rate alerts, which fire when the error budget is being consumed too fast, and multi-window variants that pair a short window (fast detection) with a long window (resistance to noise).

These patterns are implemented using different PromQL windows and thresholds, all backed by the same base metrics.
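
For example, a fast-burn expression for a 99.9% availability SLO (0.1% error budget) might look like the sketch below; the factor 14.4 and the 1h/5m window pair follow the common multi-window burn-rate recipe for a 30-day budget, and the metric name is again assumed:

# Fires only if the error ratio is high over both a long and a short window
(
    sum(rate(http_requests_total{app="my-app", status=~"5.."}[1h]))
  /
    sum(rate(http_requests_total{app="my-app"}[1h]))
) > (14.4 * 0.001)
and
(
    sum(rate(http_requests_total{app="my-app", status=~"5.."}[5m]))
  /
    sum(rate(http_requests_total{app="my-app"}[5m]))
) > (14.4 * 0.001)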

Best Practices for Metrics and Alerts in OpenShift

Metric design

  • Expose counters for things that happen (requests, errors) and gauges for current state, and derive rates with rate() in PromQL rather than pre-computing them.
  • Keep label cardinality bounded: labels like namespace, app, or status are useful; per-request or per-user identifiers are not.
  • Serve metrics on a stable /metrics endpoint behind a named Service port so the ServiceMonitor stays simple.

Alert design

  • Alert on symptoms users actually feel (error ratios, latency SLIs) rather than on every underlying resource metric.
  • Use for durations so short spikes do not page anyone.
  • Give every alert a severity label and summary/description annotations that tell the responder what is wrong and where to look.

Operational considerations

  • Leave the platform’s built-in rules to the Cluster Monitoring Operator and add your own rules in separate PrometheusRule objects.
  • Use silences during planned maintenance instead of loosening or deleting rules.
  • Regularly review which alerts fire and how noisy they are, and tune thresholds and durations accordingly.

Putting It Together in an OpenShift Environment

In practice, using metrics and alerts on OpenShift typically looks like:

  1. Instrument your application with metrics (e.g. Prometheus client libraries).
  2. Expose /metrics and create a Service pointing to that port.
  3. Create a ServiceMonitor so user workload monitoring scrapes your app.
  4. Explore metrics using the console’s Metrics tab and refine your PromQL queries.
  5. Define PrometheusRule objects with well-designed alerting rules.
  6. Use the Alerts view to:
    • Observe alert states.
    • Tune thresholds and durations.
    • Create silences during maintenance.

This workflow builds on the OpenShift monitoring stack that is already collecting cluster metrics and shipping default alerts, allowing you to extend it with application-specific metrics and alerting tailored to your workloads.
