
Built-in monitoring stack

Overview of the OpenShift Monitoring Stack

OpenShift ships with an opinionated, fully integrated monitoring stack built on top of popular open-source projects, primarily Prometheus and Alertmanager. This stack is installed by default, managed by Operators, and supported as part of the platform itself.

This chapter focuses on what is in that stack, how it is structured, and how you interact with it as a cluster admin and as an application developer.

Architecture and Components

The built-in monitoring stack is composed of multiple components running in specific namespaces, each with a distinct responsibility.

Core Components

Prometheus

Prometheus is the core time-series database and metrics scraper: it discovers targets, pulls metrics over HTTP at regular intervals, stores them as time series, and evaluates recording and alerting rules.

OpenShift runs multiple Prometheus instances for different scopes: one set for platform (cluster) monitoring and, when enabled, a separate instance for user workload monitoring.

Alertmanager

Alertmanager receives the alerts that Prometheus fires, then deduplicates, groups, silences, and routes them to receivers such as email, webhooks, or paging services.

OpenShift provides a managed Alertmanager instance, alertmanager-main in the openshift-monitoring namespace, which runs with multiple replicas for high availability.

Thanos Querier (or Aggregation Layer)

To provide a unified query entry point, OpenShift deploys a Thanos Querier in openshift-monitoring. It fans queries out to the platform and user workload Prometheus instances, deduplicates and merges the results, and enforces namespace-scoped access control for tenants.

Metrics Collectors and Exporters

Several components expose or collect metrics: node-exporter runs as a DaemonSet to surface node-level metrics, kube-state-metrics reports the state of Kubernetes objects, and core components such as the API server, etcd, and the kubelet expose their own metrics endpoints that Prometheus scrapes.

Namespaces and Logical Separation

The monitoring stack is split by namespaces, which also reflect responsibilities: platform monitoring components run in openshift-monitoring, while user workload monitoring components run in openshift-user-workload-monitoring.

This separation allows platform and application monitoring to be configured, scaled, and secured independently.

Cluster vs User Workload Monitoring

The built-in monitoring stack is deliberately split into cluster monitoring and user workload monitoring. Their differences are important both operationally and for security.

Cluster Monitoring (Platform)

Cluster monitoring is installed and enabled by default and is managed end to end by the Cluster Monitoring Operator.

Key characteristics: it watches core platform components such as the API server, etcd, kubelets, and cluster Operators; it ships with a curated set of alerting rules; and its scrape configuration is owned by the Operator rather than edited directly.

As an administrator, you typically tune retention, resources, and storage through the cluster monitoring ConfigMap, configure Alertmanager receivers and routes, and respond to the built-in alerts.

User Workload Monitoring

User workload monitoring is an opt-in capability that lets application teams have their own workloads scraped and their own alerting rules evaluated, without touching the platform Prometheus.

When enabled, a dedicated Prometheus instance (and optionally a dedicated Alertmanager) is deployed in openshift-user-workload-monitoring, and ServiceMonitor, PodMonitor, and PrometheusRule resources in user namespaces are discovered automatically.

This separation allows application teams to monitor their services in a self-service fashion while the platform monitoring stack remains isolated and protected.

Configuration of the Built-in Monitoring Stack

The monitoring stack is managed primarily through custom resources and configuration objects that Operators reconcile. You do not manually edit the deployments.

Cluster Monitoring Configuration

Cluster-level configuration is done via a ConfigMap named cluster-monitoring-config in the openshift-monitoring namespace.

Typical structure (simplified):

apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      retention: 15d
      resources:
        requests:
          memory: 2Gi
        limits:
          memory: 4Gi
    alertmanagerMain:
      resources:
        requests:
          memory: 512Mi
    telemeterClient:
      enabled: true

You use this to tune retention and resource sizing, configure persistent storage, enable user workload monitoring, and set up features such as remote write.

The Cluster Monitoring Operator reconciles this config and adjusts deployments automatically.
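One of the most common settings in this ConfigMap is turning on user workload monitoring. A minimal sketch, using the documented enableUserWorkload flag (merge it into any existing config.yaml content rather than replacing it):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    enableUserWorkload: true
```

Once the Operator reconciles this, the components in openshift-user-workload-monitoring are deployed automatically.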

User Workload Monitoring Configuration

User workload monitoring is configured via a ConfigMap named user-workload-monitoring-config in the openshift-user-workload-monitoring namespace.

Simplified example:

apiVersion: v1
kind: ConfigMap
metadata:
  name: user-workload-monitoring-config
  namespace: openshift-user-workload-monitoring
data:
  config.yaml: |
    prometheus:
      retention: 7d
      resources:
        requests:
          memory: 2Gi
    alertmanager:
      enabled: true

You use this to control retention, resource sizing, and storage for the user workload Prometheus, and to enable a dedicated Alertmanager for application alerts.

Configuring Alertmanager for Notifications

The platform Alertmanager is configured from a Secret named alertmanager-main in openshift-monitoring; the Operator manages the surrounding deployment details, so you only supply the Alertmanager configuration itself.

Example template of the Alertmanager configuration (YAML stored as string in the Secret):

global:
  resolve_timeout: 5m
route:
  receiver: 'default'
  routes:
  - match:
      severity: critical
    receiver: 'pager'
  - match_re:
      severity: "warning|info"
    receiver: 'email'
receivers:
- name: 'default'
- name: 'pager'
  webhook_configs:
  - url: 'https://pagerduty.example.com/…'
- name: 'email'
  email_configs:
  - to: 'ops@example.com'

Typical admin tasks include defining receivers for paging, chat, and email systems, routing alerts by labels such as severity or namespace, and maintaining silences and inhibition rules.

User workload Alertmanager (if enabled) is configured similarly, but as a separate instance to handle application alerts.

Metrics Collection for Applications

The built-in stack gives you a standard way to have Prometheus scrape your apps: expose metrics in the Prometheus text format, then declare the scrape targets with ServiceMonitor or PodMonitor resources.

Exposing Metrics

Applications need to expose an HTTP endpoint with metrics in Prometheus format, e.g.:

# HELP http_requests_total Total number of HTTP requests
# TYPE http_requests_total counter
http_requests_total{method="GET",code="200"} 1024

Most languages have Prometheus client libraries that do this automatically.
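To make the exposition format concrete, here is a minimal sketch in Python that renders a counter in that text format using only the standard library. In practice you would use an official client library (such as prometheus_client for Python), which also handles content types, label escaping, and metric registries; render_counter and the label layout here are illustrative, not a real API.

```python
# Hand-rolled sketch of the Prometheus text exposition format.
# Real applications should use a Prometheus client library instead.
from collections import Counter

def render_counter(name: str, help_text: str, samples: Counter) -> str:
    """Render one counter metric family in Prometheus text format."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in sorted(samples.items()):
        label_str = ",".join(f'{k}="{v}"' for k, v in labels)
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"

# Track request counts by (method, code) label pairs.
requests = Counter()
requests[(("method", "GET"), ("code", "200"))] += 1024

# Prints the same three lines shown in the example above.
print(render_counter("http_requests_total",
                     "Total number of HTTP requests", requests))
```

An HTTP handler that returns this string at /metrics is all Prometheus needs to scrape the application.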

Using ServiceMonitor and PodMonitor

Instead of manually editing Prometheus configuration, you create custom resources: a ServiceMonitor selects Services whose endpoints should be scraped, and a PodMonitor selects Pods directly when there is no Service in front of them.

Example ServiceMonitor (in a user namespace):

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: myapp-monitor
  namespace: myapp-namespace
spec:
  selector:
    matchLabels:
      app: myapp
  endpoints:
  - port: metrics
    path: /metrics
    interval: 30s

When user workload monitoring is enabled, the user workload Prometheus discovers ServiceMonitor and PodMonitor resources in your namespaces and starts scraping the matching endpoints automatically; no cluster-admin intervention is required per application.
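Alerting rules for applications follow the same declarative pattern, as PrometheusRule resources in the user namespace. A sketch reusing the hypothetical myapp names and the http_requests_total counter from earlier; the expression, threshold, and durations are illustrative:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: myapp-alerts
  namespace: myapp-namespace
spec:
  groups:
  - name: myapp.rules
    rules:
    - alert: MyAppHighErrorRate
      # Fire when 5xx responses exceed 1 per second over 5 minutes.
      expr: sum(rate(http_requests_total{code=~"5.."}[5m])) > 1
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: "myapp is serving an elevated rate of 5xx errors"
```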

Labeling and Multi-tenancy

Metrics scraped by the user workload Prometheus are labeled to preserve context: the source namespace, pod, service, and job are attached as labels, so metrics from different tenants never collide.

RBAC and the query layer ensure that users can only query metrics from namespaces they have access to, while cluster administrators retain a cluster-wide view.

Accessing Metrics and the Monitoring UI

The built-in stack is tightly integrated with the OpenShift web console and APIs.

Web Console Views

In the OpenShift web console you can browse prebuilt dashboards, run ad-hoc PromQL queries, and view, silence, and filter alerts, typically under the Observe section.

What you see is filtered by RBAC: developers see the metrics and alerts of their own projects, while administrators see cluster-wide views.

Query Endpoints

Under the hood, queries go through the Thanos Querier or a similar aggregation component, which fans each query out to the platform and user workload Prometheus instances, merges the results, and enforces namespace-level access control.

You can also use the oc CLI or the HTTP API to run PromQL queries against the query endpoint, authenticating with a bearer token (for example, one obtained via oc whoami -t).

Extending and Integrating the Built-in Stack

The built-in monitoring stack is designed as the default, supported solution; you can still integrate it with external systems.

Remote Write and External Storage

Depending on OpenShift version and configuration policy, the built-in Prometheus may support remote write, which forwards selected metrics to an external long-term storage system.

This is usually configured in the cluster monitoring configuration:

prometheusK8s:
  remoteWrite:
  - url: https://external-metrics.example.com/api/v1/write
    writeRelabelConfigs:
    - sourceLabels: [__name__]
      regex: "container_.*"
      action: keep

Check version-specific documentation and organizational policy before enabling remote write.

External Dashboards

While OpenShift provides built-in visualization, you can also point external dashboarding tools such as Grafana at the query endpoint, or build dashboards on top of metrics shipped out via remote write.

This approach is common when you need longer retention, cross-cluster views, or dashboard customization beyond what the built-in console offers.

Operational Considerations

The built-in monitoring stack is a first-class platform component; treating it like any other application may lead to issues. The key operational concerns are resource usage, storage, and behavior across upgrades.

Resource Usage and Sizing

Monitoring can be resource-intensive: Prometheus memory and storage consumption grow with the number of scrape targets, the cardinality of the collected series, and the retention period.

As an admin, you size resource requests and limits for the monitoring components, watch for high-cardinality metrics, and keep retention aligned with the available storage.

Storage and Retention

Prometheus uses local storage on persistent volumes: without a PersistentVolumeClaim, metric data lives on ephemeral storage and is lost when the pod is restarted or rescheduled.

Typical practices include configuring a volumeClaimTemplate with an appropriate storage class and size, and choosing a retention period that fits comfortably within that volume.
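Persistent storage for the platform Prometheus is requested through the cluster monitoring ConfigMap. A sketch, where the storage class name and size are assumptions to adapt to your environment:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      retention: 15d
      volumeClaimTemplate:
        spec:
          storageClassName: fast     # assumed storage class name
          resources:
            requests:
              storage: 100Gi         # size to fit the retention period
```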

Availability and Upgrades

Because monitoring is part of the platform, its components are versioned and upgraded together with the cluster, and you should not replace or hand-patch them.

The Operators orchestrate configuration changes, rolling restarts, and upgrades of every monitoring component, and they revert manual modifications to the resources they manage.

Typical Usage Patterns

To tie everything together, here is how the built-in stack is typically used: administrators configure retention, storage, and Alertmanager receivers and enable user workload monitoring; application teams expose metrics, create ServiceMonitor and PrometheusRule resources in their namespaces, and follow their dashboards and alerts in the console.

The built-in monitoring stack provides a standardized, supported foundation for metrics and alerts throughout an OpenShift cluster, without requiring you to build and maintain your own Prometheus deployment from scratch.
