Why Operators Matter in OpenShift
Operators are central to how OpenShift delivers “platform as a product” rather than a loose collection of components. They encode operational knowledge—installation, configuration, upgrades, scaling, and recovery—into software that runs in the cluster.
In OpenShift, Operators are not just an optional add-on; they are used to manage:
- Core cluster components (e.g., networking, storage, authentication).
- Platform services (e.g., logging, monitoring, service mesh).
- Application-level services (e.g., databases, message queues, AI/ML stacks).
Understanding Operators and platform services is critical because:
- Many features you “just turn on” in OpenShift are actually delivered as Operators.
- Day-2 operations (upgrades, backup/restore, tuning) often happen via Operators.
- Custom workloads (especially in enterprise and HPC contexts) are increasingly packaged as Operators.
This chapter gives you a conceptual model for how Operators are used to deliver and manage platform services, and how that shapes daily work on OpenShift.
The Operator-Based Platform Model
OpenShift treats much of the cluster as managed by Operators rather than by ad‑hoc scripts or manual configuration. The pattern is:
- Desired state is declared using Kubernetes-style resources (often Custom Resource Definitions, CRDs).
- Operators continuously reconcile actual state to match the desired state.
- Platform services are expressed as custom resources, which you create or adjust, instead of manually configuring deployments, config maps, and secrets.
This has several implications:
- Consistency: Every cluster with the same Operator and the same custom resources converges to the same platform behavior.
- Upgradeability: Operators encode upgrade logic, avoiding fragile manual upgrade playbooks.
- Composability: Multiple Operators can collaborate to provide a composite service (e.g., monitoring stack, service mesh).
Conceptually, you can think of OpenShift as:
- Core Kubernetes + CRDs + a large set of system-level Operators that deliver features.
- Optional platform Operators you can add to extend the cluster for your workloads.
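The declarative pattern above can be made concrete with a minimal sketch. The group, kind, and fields below are hypothetical (they do not belong to any real operator); the point is that you declare intent and the Operator reconciles toward it:

```yaml
# Hypothetical custom resource for an operator-managed cache service.
# The API group, kind, and all field names here are illustrative only.
apiVersion: example.com/v1alpha1
kind: CacheCluster
metadata:
  name: team-cache
  namespace: team-a
spec:
  replicas: 3        # desired state: the operator scales toward this
  version: "7.2"     # the operator handles the upgrade mechanics
  storage:
    size: 10Gi       # the operator provisions matching PVCs
```

You never create the underlying deployments, services, or PVCs yourself; the Operator derives them from this spec and keeps them in sync.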
Categories of Platform Services Delivered via Operators
In OpenShift, many “platform features” are delivered as installable or built-in Operators. Common categories include:
Cluster Foundation Services
These are usually installed and managed automatically as part of OpenShift:
- Cluster Network Operator: Configures and manages cluster networking (CNI, network policy enforcement, etc.).
- Machine Config Operator: Manages node-level configuration such as OS updates and system settings.
- Authentication, Ingress, and Storage Operators: Govern cluster-wide identity, external access, and storage backends.
As a user or admin, you generally:
- Do not create deployments directly for these components.
- Do adjust their behavior via custom resources they expose (for example, updating a custom resource to change an ingress or storage configuration).
Observability and Operations Services
Monitoring, logging, and related operations tools are typically implemented with Operators:
- Monitoring stack: Managed by Operators that deploy and maintain Prometheus, Alertmanager, and related components.
- Logging stack: Operators that manage log collectors and storage backends.
- Tracing and APM: Often delivered as optional Operators (Jaeger, Tempo, etc.).
By treating these as Operator-based services:
- Platform teams can roll out standard monitoring/logging across namespaces.
- Developers can often opt in by creating specific custom resources (e.g., a ServiceMonitor for scraping metrics).
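A minimal ServiceMonitor, following the Prometheus Operator's CRD (the label selector and port name are assumptions about the application being scraped):

```yaml
# Tell the monitoring stack to scrape an application's metrics endpoint.
# Assumes the app's Service carries the label app: my-app and exposes
# a port named "metrics".
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
  namespace: team-a
spec:
  selector:
    matchLabels:
      app: my-app        # match the target Service's labels
  endpoints:
    - port: metrics      # named port on the Service
      interval: 30s      # scrape interval
```

Creating this one resource is the entire opt-in: the monitoring Operators discover it and configure Prometheus accordingly.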
Data and Middleware Services
Commonly offered as Operator-backed services:
- Databases: PostgreSQL, MySQL, MongoDB, Redis, and others as “database-as-a-service” on the cluster.
- Message brokers: Kafka, AMQ, RabbitMQ, etc.
- API gateways and service meshes: e.g., OpenShift Service Mesh, 3scale.
The typical pattern for using these services:
- Install the Operator (platform admin task).
- Developer creates a custom resource (e.g., a Kafka or PostgresCluster object).
- Operator provisions and manages the service (scaling, failover, backups, etc.).
AI/ML and Data Platforms
Modern OpenShift environments frequently add Operators for:
- AI/ML platforms (e.g., OpenShift AI stacks).
- GPU management and scheduling.
- Data science toolchains and notebook environments.
These platform services encapsulate complex infrastructure (accelerators, storage, specialized runtimes) in a way that is consumable by developers and data scientists via a small set of custom resources.
Lifecycle of Platform Services with Operators
Platform services managed by Operators typically follow a predictable lifecycle that mirrors how you work with applications, but at a higher level of abstraction.
1. Installation
- Cluster admins install Operators using the catalog and lifecycle tooling in OpenShift.
- Installation results in:
- Operator controller deployments in specific namespaces.
- Registration of CRDs that define new resource types.
Platform services only become available to users when their Operators are present and appropriately scoped (cluster-wide or namespaced).
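Under the hood, installation via the Operator Lifecycle Manager (OLM) is itself declarative: a Subscription resource names the package, catalog, and update channel. A sketch (the package and channel names are placeholders):

```yaml
# Subscribe the cluster to an operator from a catalog source.
# OLM installs the operator and keeps it updated per the channel policy.
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: my-operator
  namespace: openshift-operators
spec:
  name: my-operator              # package name in the catalog (placeholder)
  channel: stable                # update channel to follow
  source: redhat-operators       # catalog source
  sourceNamespace: openshift-marketplace
  installPlanApproval: Automatic # or Manual, to gate upgrades
```

Switching `channel` or setting `installPlanApproval: Manual` is how admins control when and how the Operator (and thus the managed service) is upgraded.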
2. Configuration via Custom Resources
Instead of editing lower-level Kubernetes objects, you manipulate higher-level custom resources that describe:
- Service size and capacity (replicas, sizing tiers).
- Connectivity and access settings.
- Storage, backup, and retention policies.
- Integration with other services (e.g., observability, auth).
The Operator reads this desired state and:
- Creates/updates deployments, services, routes, PVCs, config maps, and secrets as needed.
- Applies embedded best practices (e.g., secure defaults, recommended topology).
3. Day-2 Operations
Day-2 operations are where Operators add most value:
- Upgrades and patches: Changing the Operator channel or version can upgrade the whole managed service stack.
- Scaling: Adjusting a field (e.g., replica count or tier) in the custom resource can trigger controlled scaling.
- Backup and restore: Some platform Operators expose custom resources for backup jobs, restore operations, or snapshot policies.
- Policy changes: Modifying configuration in the custom resource is reconciled automatically, ensuring drift correction.
4. Decommissioning
To remove a managed platform service:
- Delete the relevant custom resource(s).
- Optionally, remove the Operator if no longer needed.
The Operator is responsible for “tear down” behavior (e.g., deleting deployments, cleaning up PVCs depending on policy).
Roles and Responsibilities Around Operators
In an OpenShift environment, different personas interact with Operators and platform services in specific ways.
Platform / Cluster Administrators
Typical responsibilities:
- Decide which Operators are approved and installed cluster-wide.
- Manage Operator lifecycles: channels, versions, and upgrade windows.
- Configure global platform services:
- Cluster-wide logging and monitoring.
- Shared database or message bus services.
- Security-related services (certificate management, secret management).
Critical considerations:
- Security: Some Operators require elevated permissions; admins decide where they are allowed and how they’re configured.
- Multi-tenant isolation: Choosing whether platform services are shared across namespaces or dedicated per project.
- SLA and support: Preferring supported Operators for production-critical services.
Application and Service Developers
Typical interactions:
- Request installation of Operators they need (if not already available).
- Create and maintain custom resources for:
- Databases and caches used by their applications.
- Messaging systems and integration middleware.
- Specialized runtimes (e.g., service mesh sidecars, AI toolchains).
The key mindset shift:
- Instead of deploying raw deployments and statefulsets for complex services, developers treat them as platform building blocks exposed via higher-level CRDs.
SRE / DevOps Practitioners
SREs often:
- Monitor the health and behavior of Operators themselves.
- Integrate Operator-managed services into:
- Centralized alerting policies.
- Capacity and cost management.
- Backup and disaster recovery procedures.
- Tune Operator configurations for:
- Performance (e.g., resource reservations, high availability).
- Compliance (e.g., logging retention, encryption).
Design Patterns Enabled by Operators
With platform services exposed through Operators, several patterns become easier to implement.
Self-Service Platform Services
Teams can provision services on demand:
- A team creates a custom resource for “their” database or Kafka cluster.
- Quotas and policies control how big and how many.
- No tickets to infrastructure teams for each instance; governance is encoded as policies and Operator defaults.
This enables a “platform as a product” approach: the platform team curates Operators and default configurations; application teams consume them as services.
Opinionated Defaults and Guardrails
Operators can:
- Enforce security baselines (TLS, RBAC, encryption).
- Attach observability by default (metrics, logs, tracing).
- Set sane resource requests/limits and recommended topology.
Platform teams use this to provide:
- A golden path: recommended ways to use particular services.
- Guardrails rather than hard prohibitions, by shaping defaults and constraints.
Composable Platform Stacks
Multiple Operators can be composed to form:
- Full observability stacks (metrics + logs + tracing).
- API management plus service mesh plus identity integration.
- AI/ML pipelines combining GPU scheduling, data services, and notebook environments.
From the user’s perspective, these appear as a small set of custom resources that “stand up” complex capabilities with minimal effort.
Operational Considerations and Trade-Offs
Running a platform powered by many Operators brings benefits, but also requires conscious design and governance.
Version and Channel Management
Each Operator typically supports:
- Channels (e.g., stable, fast, preview).
- Streams of updates that may introduce breaking changes.
Platform teams need a strategy for:
- Which channels are allowed for production vs. development.
- When to roll out updates and how to validate them.
- How to align platform service versions with application compatibility.
Dependencies and Interactions
Operators may rely on:
- Specific Kubernetes/OpenShift versions.
- Other platform Operators (e.g., monitoring stack).
This leads to:
- The need to consider dependency chains when upgrading.
- Testing for interactions (e.g., changes in logging affecting compliance pipelines).
Resource and Capacity Impact
Each Operator and its managed services consume:
- CPU and memory (controllers, sidecars, operators’ own pods).
- Storage (metrics, logs, databases).
- Network bandwidth (especially for observability stacks and messaging).
As more platform services are added:
- Capacity planning must consider both user workloads and platform overhead.
- Isolation strategies (separate infrastructure nodes, taints/tolerations) may be used for heavy platform components.
Security and Compliance
Operators influence security posture by:
- Managing critical components (auth, certificates, ingress, storage).
- Handling secrets and credentials for platform services.
- Exposing or restricting capabilities to tenant namespaces.
Operational focus areas:
- Ensuring Operators run with least privilege.
- Verifying that platform services satisfy:
- Encryption at rest and in transit requirements.
- Audit logging requirements.
- Data locality and retention policies.
How Operators Shape Daily Work on OpenShift
In an Operator- and platform-service-centric OpenShift cluster, daily activities look different than on a bare Kubernetes cluster:
- Provisioning a service: You create a custom resource; you don’t manually deploy all of its components.
- Changing configuration: You edit a high-level spec; the Operator performs all underlying changes and restarts as needed.
- Upgrading: You update an Operator’s version or channel and watch it handle the controlled rollout.
- Troubleshooting platform behavior: You examine custom resources, Operator logs, and their status fields rather than only deployments and pods.
This shifts focus from micromanaging low-level objects to managing intent and policies, with Operators carrying out the actual work.
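When troubleshooting, the status stanza of a custom resource is usually the first stop. OpenShift's own cluster operators report a conventional set of conditions; the exact types vary by operator, but the shape looks roughly like this:

```yaml
# Typical status reported on an operator-managed resource
# (condition types follow OpenShift's ClusterOperator convention;
# individual operators define their own sets).
status:
  conditions:
    - type: Available      # the managed service is functional
      status: "True"
    - type: Progressing    # a rollout or upgrade is in flight
      status: "False"
    - type: Degraded       # something needs attention
      status: "False"
      reason: AsExpected
```

Reading these conditions, together with the Operator's own logs, usually localizes a problem faster than inspecting every underlying deployment and pod.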
Summary
In OpenShift, Operators are the primary mechanism by which platform services—monitoring, logging, networking, storage, middleware, AI/ML, and more—are delivered, configured, and operated.
Understanding this Operator-based platform model helps you:
- See OpenShift not only as “Kubernetes with extras” but as a curated, operator-driven platform.
- Interact with complex services through simple, declarative custom resources.
- Collaborate effectively across roles (admins, developers, SREs) by using Operators as the shared abstraction for platform capabilities.
Subsequent chapters that dive into Operators and specific platform services will build on this view, showing how particular Operators are installed, configured, and used in real-world scenarios.