Why Operators Exist in OpenShift
Operators are OpenShift’s way of automating the lifecycle of complex software running on the cluster. Instead of cluster admins manually installing, configuring, upgrading, backing up, and healing services (databases, message queues, monitoring stacks, etc.), Operators codify this operational knowledge into software.
In OpenShift, Operators are:
- Kubernetes-native: they rely on Custom Resource Definitions (CRDs) and controllers.
- Continuously running: they watch the cluster state and reconcile it to match what the user requested.
- Versioned and distributable: they are packaged, installed, and updated like software, typically via the Operator Lifecycle Manager (OLM).
Key motivation points:
- Move from “run this YAML once” to “declare the desired state; the Operator keeps it correct over time.”
- Encapsulate vendor or expert know‑how (proper config, tuning, upgrade order) into software.
- Provide a uniform way to manage platform services and applications, not just the cluster itself.
Core Concepts: CRDs, Controllers, and the Operator Pattern
An Operator is typically built on three building blocks that already exist in Kubernetes:
- Custom Resource Definition (CRD)
  Defines a new type of Kubernetes resource, such as PostgresCluster, Kafka, or OpenShiftAPIServer. This lets you describe higher‑level services as simple YAML objects.
- Controller (the Operator controller)
  A process (usually running in a pod) that:
  - Watches the cluster for changes to its Custom Resources.
  - Compares actual state (pods, services, config, etc.) to desired state (what you declared in the CR).
  - Takes actions to reconcile differences (create pods, change configuration, roll out upgrades).
- Custom Resource (CR)
  A specific instance of the resource type defined by a CRD. For example, a YAML manifest that says:
  - “Create a database with 3 replicas, 100Gi storage, and a daily backup schedule.”
  The Operator reads this CR and performs the necessary steps.
In OpenShift, this pattern is standard for both cluster platform components and add‑on software.
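To make the relationship between these building blocks concrete, here is a minimal Python sketch that models a CRD-like schema and one CR as plain dictionaries. It does not talk to a cluster. The Database kind, the example.com/v1 API group, the field names, and the validate_spec helper are all assumptions made for illustration; a real CRD describes its schema in OpenAPI v3 rather than as a Python mapping.

```python
# Hypothetical, heavily simplified stand-in for a CRD schema:
# field name -> expected Python type.
DATABASE_SPEC_SCHEMA = {"replicas": int, "storage": str, "backupSchedule": str}

# A Custom Resource: one instance of the hypothetical "Database" kind.
my_database = {
    "apiVersion": "example.com/v1",  # assumed group/version
    "kind": "Database",
    "metadata": {"name": "orders-db", "namespace": "shop"},
    "spec": {"replicas": 3, "storage": "100Gi", "backupSchedule": "0 2 * * *"},
}


def validate_spec(spec, schema):
    """Return a list of problems; an empty list means the spec fits the schema."""
    problems = []
    for field, expected in schema.items():
        if field not in spec:
            problems.append(f"missing field: {field}")
        elif not isinstance(spec[field], expected):
            problems.append(f"{field}: expected {expected.__name__}")
    return problems


print(validate_spec(my_database["spec"], DATABASE_SPEC_SCHEMA))
```

The controller is the missing third piece: it would watch for objects like my_database and act on their spec.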
Types of Operators in OpenShift
OpenShift uses Operators at multiple layers. While the exact boundaries can blur, it’s useful to distinguish a few categories:
Cluster and Infrastructure Operators
These Operators manage the OpenShift platform itself. They are usually installed and managed automatically as part of the cluster.
Examples (names are illustrative, not exhaustive):
- Cluster Version Operator (CVO)
  Coordinates the overall cluster version, ensuring that all core components are at compatible versions during upgrades.
- Cluster Operators for core services
  Each major OpenShift subsystem (API server, OAuth, Ingress, Storage, Monitoring, etc.) has its own Operator, often surfaced via ClusterOperator resources. They:
  - Watch “config” custom resources (like APIServer, Ingress, Authentication).
  - Configure and roll out the corresponding deployments and daemons.
  - Report status (Available, Degraded, Progressing) through their ClusterOperator resources.
Characteristics:
- Installed by default with the platform.
- Critical for cluster health and functionality.
- Upgraded as part of the OpenShift upgrade process, driven by the CVO.
Platform and Add‑On Operators
These provide additional platform services that are not strictly required for the cluster to run but are used in most environments:
- Logging stacks, monitoring add‑ons, service meshes.
- Storage solutions, backup systems, registry mirrors, etc.
Characteristics:
- Often installed via the OperatorHub and managed by the Operator Lifecycle Manager.
- May be cluster‑scoped or namespace‑scoped.
- Can be optionally enabled/disabled depending on requirements.
Application Operators
These Operators manage workloads that are applications from the cluster user’s perspective, such as:
- Databases (PostgreSQL, MySQL, MongoDB).
- Message queues and streaming systems (Kafka, RabbitMQ).
- AI/ML platforms, big data stacks, or specialized application suites.
Characteristics:
- Enable “database‑as‑a‑service” style usage on OpenShift.
- Typically expose CRDs like Database, KafkaCluster, or MLWorkspace.
- May be provided by vendors, open source communities, or developed in‑house.
How Operators Behave in an OpenShift Cluster
Reconciliation and Desired State
Operators implement a continuous reconciliation loop:
- You create or modify a Custom Resource (for example, MyDatabase).
- The Operator is notified of the change (via Kubernetes watch).
- It:
  - Reads the desired specification in the CR.
  - Inspects the current cluster state (deployments, pods, PVCs, config maps, secrets, etc.).
  - Calculates the “diff” and applies the necessary changes.
- It repeats this process indefinitely, reacting to:
  - User changes to the CR.
  - External events (node failures, pod crashes, resource pressure).
This makes Operators robust against drift: if someone accidentally deletes a pod or service, the Operator recreates it to maintain the desired state.
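The loop described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how any particular Operator is implemented: the "cluster" is an in-memory dictionary, and the resource names and the plan and reconcile helpers are hypothetical. A real controller would read and write objects through the Kubernetes API and be triggered by watch events.

```python
import copy


def plan(desired, actual):
    """Compare desired and actual state; return a list of (verb, name) actions."""
    actions = []
    for name, spec in desired.items():
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != spec:
            actions.append(("update", name))
    # Anything that exists but is no longer desired gets removed.
    for name in actual:
        if name not in desired:
            actions.append(("delete", name))
    return actions


def reconcile(desired, actual):
    """Run one reconciliation pass against an in-memory 'cluster' (a dict)."""
    actions = plan(desired, actual)
    for verb, name in actions:
        if verb == "delete":
            del actual[name]
        else:  # create and update both write the desired spec
            actual[name] = copy.deepcopy(desired[name])
    return actions


# Desired state derived from a hypothetical MyDatabase CR.
desired = {
    "statefulset/mydb": {"replicas": 3},
    "service/mydb": {"port": 5432},
}
cluster = {}

first = reconcile(desired, cluster)   # creates both resources
del cluster["service/mydb"]           # simulate an accidental deletion
second = reconcile(desired, cluster)  # drift is detected and repaired
third = reconcile(desired, cluster)   # nothing left to do
print(first, second, third)
```

Because each pass compares the full desired state with the full actual state, running it again after any disturbance converges back to what the CR declares.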
Operator Scope: Cluster‑Scoped vs Namespace‑Scoped
In OpenShift you will encounter Operators with different scopes:
- Cluster‑Scoped Operators
  - Manage resources across the entire cluster.
  - Often deployed in system namespaces (for example, openshift-*).
  - Handle shared services or critical infrastructure (API, Ingress, cluster‑wide storage classes).
- Namespace‑Scoped (or Tenant‑Scoped) Operators
  - Manage resources only in specific namespaces or projects.
  - Safer for multi‑tenant environments, as they limit impact and permission scope.
  - Suitable when different teams or projects need isolated instances of the same service.
The scope is controlled through:
- The permissions granted to the Operator’s Service Account (RBAC).
- Whether its CRDs are intended for cluster‑wide or namespaced use.
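As a rough illustration of how permissions bound an Operator's scope, the following Python sketch evaluates a simplified grant. The rule shape borrows the resources and verbs fields used by Kubernetes RBAC rules, but the grant dictionary, the two example Operators, and the can_act helper are assumptions for this example; API groups, resource names, and the actual Role and ClusterRole binding objects are deliberately left out.

```python
def rule_allows(rule, verb, resource):
    """True if one RBAC-style rule covers the verb and resource ('*' matches anything)."""
    verbs, resources = rule["verbs"], rule["resources"]
    return (verb in verbs or "*" in verbs) and (
        resource in resources or "*" in resources
    )


def can_act(grant, verb, resource, namespace):
    """Check a simplified grant: {"namespace": <name or None>, "rules": [...]}.

    namespace=None stands for a cluster-wide grant.
    """
    if grant["namespace"] is not None and grant["namespace"] != namespace:
        return False
    return any(rule_allows(rule, verb, resource) for rule in grant["rules"])


# Hypothetical grants for two Operators.
namespaced_operator = {
    "namespace": "team-a",
    "rules": [
        {
            "resources": ["statefulsets", "services"],
            "verbs": ["get", "list", "watch", "create", "update"],
        }
    ],
}
cluster_operator = {
    "namespace": None,
    "rules": [{"resources": ["*"], "verbs": ["*"]}],
}

print(can_act(namespaced_operator, "create", "services", "team-a"))
print(can_act(namespaced_operator, "create", "services", "team-b"))
```

The namespaced grant cannot touch anything outside team-a, which is the property that makes namespace‑scoped Operators easier to reason about in multi‑tenant clusters.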
Interaction with OpenShift Features
Operators integrate well with OpenShift capabilities:
- Role‑Based Access Control (RBAC)
  - Operators act under a specific service account.
  - RBAC determines which resources they can watch, create, or modify.
  - This enables fine‑grained control over what an Operator may operate on.
- Security Context Constraints (SCCs)
  - Operators that create pods must respect SCCs, ensuring that workloads run with proper security settings.
  - Platform Operators often use more privileged SCCs; application Operators should use restricted ones where possible.
- Monitoring and Status Conditions
  - Many Operators report status via CR conditions and/or ClusterOperator status.
  - OpenShift’s built‑in monitoring can scrape Operator metrics and fire alerts based on them.
  - This allows cluster admins to see when an Operator is Degraded, Progressing, or Available.
Common Lifecycle Tasks Automated by Operators
Operators automate a set of recurring tasks that would otherwise require scripts or manual runbooks:
Installation and Initial Configuration
- Creating deployments, StatefulSets, services, and other Kubernetes resources.
- Applying default configuration based on the CR specification.
- Validating user input and rejecting invalid configurations early.
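The installation steps above can be sketched as a function that turns a CR into the child resources an Operator would create. This is an illustrative sketch only: the Database CR, the DEFAULTS values, and the render_resources helper are hypothetical, and the manifests are trimmed (no pod template or volume claims), so they are not complete enough to apply to a cluster.

```python
DEFAULTS = {"replicas": 1, "storage": "10Gi"}  # assumed defaults, not from a real Operator


def render_resources(cr):
    """Turn a hypothetical Database CR into trimmed StatefulSet and Service manifests."""
    spec = {**DEFAULTS, **cr.get("spec", {})}  # user values override defaults
    replicas = spec["replicas"]
    # Reject invalid input before creating anything.
    if isinstance(replicas, bool) or not isinstance(replicas, int) or replicas < 1:
        raise ValueError("spec.replicas must be a positive integer")

    name = cr["metadata"]["name"]
    labels = {"app": name}
    statefulset = {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            "serviceName": name,
            "selector": {"matchLabels": labels},
            # Pod template and volumeClaimTemplates omitted for brevity;
            # spec["storage"] would size the volume claim.
        },
    }
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "labels": labels},
        "spec": {"selector": labels, "clusterIP": "None"},  # headless Service
    }
    return [statefulset, service]


resources = render_resources({"metadata": {"name": "orders-db"}, "spec": {"replicas": 3}})
print([r["kind"] for r in resources])
```

Validating before rendering means a bad CR fails fast with a clear message instead of producing a half-created set of resources.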
Scaling and Topology Changes
- Adjusting the number of replicas for a service based on CR settings.
- Managing topology (for example, leader/follower database replicas or multi‑AZ placements).
- Coordinating rebalances or resharding for distributed systems.
Upgrades and Version Management
- Running safe rolling upgrades to new versions of an application or platform component.
- Enforcing supported upgrade paths (for example, blocking incompatible version jumps).
- Running any necessary pre‑ or post‑upgrade actions (schema migrations, compatibility checks).
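One way to picture upgrade-path enforcement is a small policy function plus a planner that walks through intermediate versions. The policy below (no downgrades, no major-version changes, at most one minor step at a time) is an assumption chosen for the example, as are the helper names; real Operators define their own supported paths.

```python
def parse_version(text):
    """Parse 'MAJOR.MINOR.PATCH' into a tuple of ints."""
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)


def upgrade_allowed(current, target):
    """Hypothetical policy: no downgrades, no major jumps, at most one minor step."""
    cur, tgt = parse_version(current), parse_version(target)
    if tgt <= cur:
        return False  # same version or downgrade
    if tgt[0] != cur[0]:
        return False  # major version change not supported in this sketch
    return tgt[1] - cur[1] <= 1  # patch bump or the next minor only


def plan_upgrade(current, target, available):
    """Return the ordered versions to install, or None if no supported path exists."""
    path = []
    while current != target:
        candidates = [
            v
            for v in available
            if upgrade_allowed(current, v)
            and parse_version(v) <= parse_version(target)
        ]
        if not candidates:
            return None
        # Take the largest permitted step toward the target.
        current = max(candidates, key=parse_version)
        path.append(current)
    return path


print(plan_upgrade("1.1.0", "1.3.2", ["1.1.4", "1.2.0", "1.3.2"]))
```

With this policy, a request to jump two minor versions is either routed through an intermediate release or refused when none is available.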
Backup, Restore, and Maintenance Operations
- Scheduling regular backups (snapshots, exports).
- Exposing custom actions via CR fields or special “action” resources (for example, “trigger backup now”).
- Automating maintenance tasks like compaction, index cleanup, or certificate rotations.
Self‑Healing and Reliability Behavior
- Recreating missing pods, services, or PVCs that were accidentally deleted.
- Detecting and reacting to unhealthy states reported by readiness/liveness probes.
- Optionally triggering failover between replicas or regions.
How Operators Are Exposed to Users in OpenShift
While details of installation and lifecycle management are handled elsewhere, it is important to understand how Operators appear from a user perspective in OpenShift.
OperatorHub and Operator Catalogs
In the OpenShift web console:
- OperatorHub shows a catalog of Operators available from:
  - Red Hat certified sources.
  - Community sources.
  - Custom organizational catalogs.
- Users with appropriate permissions can:
  - Discover Operators by category (databases, monitoring, storage, etc.).
  - Read brief descriptions, provider info, and version details.
  - Initiate installation in the cluster or a specific namespace.
Installed Operators and Provided APIs
Once installed:
- The Installed Operators view lists Operators available in a namespace or cluster‑wide.
- Each Operator exposes one or more APIs (its CRDs). For each API, you can:
  - Create new instances (CRs) via web forms or YAML editors.
  - Inspect existing instances, including status fields.
  - See documentation or examples provided by the Operator.
From the CLI side, you can:
- Discover CRDs installed by Operators using commands like:
  oc get crds
- Interact with the Operator’s resources like any other Kubernetes objects:
  oc get <kind> (for example, oc get kafka)
  oc describe <kind> <name>
  oc apply -f my-cr.yaml
Status and Health Indications
Operators generally communicate health in several ways:
- Status fields in Custom Resources
  - status.conditions with types like Ready, Progressing, Degraded.
  - Human‑readable messages explaining what the Operator is currently doing or why it is blocked.
- ClusterOperator Resources (for platform Operators)
  - Expose high‑level conditions: Available, Progressing, Degraded.
  - Used by OpenShift to determine cluster readiness and upgrade safety.
These indicators help admins quickly locate which Operator or CR is causing a problem.
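As a small illustration of reading these conditions programmatically, the Python sketch below flags Operators that are unavailable or degraded. The sample data is invented and only mimics the shape of ClusterOperator objects (for instance, the items list returned by oc get clusteroperators -o json); the helper names are assumptions for this example.

```python
def condition_status(resource, cond_type):
    """Return the status string ('True', 'False', 'Unknown') of a condition, or None."""
    for cond in resource.get("status", {}).get("conditions", []):
        if cond.get("type") == cond_type:
            return cond.get("status")
    return None


def needs_attention(cluster_operators):
    """Names of operators that are not Available or that are Degraded."""
    flagged = []
    for co in cluster_operators:
        unavailable = condition_status(co, "Available") != "True"
        degraded = condition_status(co, "Degraded") == "True"
        if unavailable or degraded:
            flagged.append(co["metadata"]["name"])
    return flagged


# Sample data shaped like ClusterOperator objects; the values are invented.
sample = [
    {
        "metadata": {"name": "ingress"},
        "status": {
            "conditions": [
                {"type": "Available", "status": "True"},
                {"type": "Progressing", "status": "False"},
                {"type": "Degraded", "status": "False"},
            ]
        },
    },
    {
        "metadata": {"name": "authentication"},
        "status": {
            "conditions": [
                {"type": "Available", "status": "True"},
                {"type": "Degraded", "status": "True", "message": "example failure"},
            ]
        },
    },
]

print(needs_attention(sample))
```

Note that condition statuses are strings rather than booleans, so the comparison is against "True", and a missing Available condition is treated here as a problem.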
Design and Usage Considerations
When working with Operators on OpenShift—either as a cluster admin selecting Operators or as a developer consuming them—there are several aspects to keep in mind.
Trust and Source of the Operator
- Certified / Vendor‑Supported Operators
  - Often come with support guarantees.
  - Typically recommended for production use, especially for critical data services.
- Community Operators
  - Useful for experimentation, non‑critical workloads, or evaluation.
  - May have varying levels of quality and maintenance.
- In‑House Operators
  - Tailored to the organization’s internal platforms and policies.
  - Require internal expertise to maintain.
Permission and Security Boundaries
- Evaluate the RBAC permissions requested by Operators:
  - Broad cluster‑admin‑like access may be necessary for platform Operators but should be carefully controlled.
  - Application Operators should follow least privilege and be limited to the namespaces they manage.
- Consider multi‑tenant implications:
  - Decide whether a service should be shared cluster‑wide via a cluster‑scoped Operator or isolated per team/project via a namespace‑scoped Operator.
Operational Behavior and Upgrades
- Understand how an Operator handles:
  - Upgrades between versions.
  - Rollbacks, if they are supported at all.
  - Configuration changes that may be disruptive (for example, increasing storage, changing data layout).
- Review the Operator’s documentation for:
  - Supported configurations and limits.
  - Backup and disaster recovery guidance.
  - Observability (logs, metrics, alerts) integration.
When to Use Operators in OpenShift
Operators are beneficial when:
- The managed application has a non‑trivial lifecycle:
  - Requires careful installation, ordered upgrades, migrations.
  - Needs continuous tuning or monitoring of runtime state.
- You want a higher‑level API:
  - Rather than managing raw deployments, services, and config maps, you manage a single CR describing intent.
- You need consistency:
  - Multiple environments (dev, test, prod) can use the same CRs to describe services, with the Operator enforcing consistent behavior.
They may be less useful for:
- Very simple, stateless applications where a basic Deployment and Service are sufficient.
- One‑off jobs or workloads with no ongoing lifecycle beyond start and finish.
Summary
In OpenShift, Operators are the primary mechanism for managing both the platform itself and complex services running on it. By extending Kubernetes with custom resources and controllers, they turn operational knowledge into continuously running automation.
Key points:
- Operators implement the desired‑state pattern for higher‑level services, not just basic workloads.
- OpenShift relies heavily on Operators for its own core functionality, as well as optional add‑on services and user applications.
- Users interact with Operators mainly through CRDs (via YAML or the web console) and observe their behavior through status fields and cluster health indicators.
- Choosing and using Operators requires attention to trust, permissions, scope, and lifecycle behavior, especially in production and multi‑tenant environments.