Concept and Motivation
In Kubernetes and OpenShift, many platform features and add‑ons start as “someone’s controller + a bunch of YAML + runbooks.” As clusters grow, this model becomes hard to operate and standardize.
Operators were introduced to solve this by turning operational knowledge into software:
- They are Kubernetes-native applications that manage the entire lifecycle of another application or platform component.
- They encode human operational expertise (installation, upgrades, recovery, tuning) into automated, declarative logic.
- They use Kubernetes APIs and patterns (custom resources, reconciliation loops) instead of out‑of‑band scripts or manual steps.
In OpenShift, Operators are central to how the platform itself is installed, configured, and kept healthy.
Definition
An Operator is:
A Kubernetes controller (or set of controllers) that uses Custom Resource Definitions (CRDs) to manage a specific application or capability, automating its lifecycle in a way that mimics a skilled human operator.
Key characteristics:
- API-driven: Exposes one or more custom APIs via CRDs.
- Reconciler-based: Continuously compares desired state (from CRs) with actual state and takes actions to align them.
- Lifecycle-aware: Knows how to install, configure, upgrade, scale, heal, and sometimes back up and restore the thing it manages.
- Domain-specific: Encodes knowledge specific to a database, message queue, monitoring system, or platform feature.
Core Building Blocks
While detailed mechanics belong to later chapters, it is important here to understand the basic ingredients that make an Operator:
- Custom Resource Definition (CRD)
- Extends the Kubernetes API with new resource types (for example, Kafka, EtcdCluster, ServiceMeshControlPlane).
- Lets users express high‑level intent using YAML, just like built‑in resources.
- Custom Resource (CR)
- An instance of a CRD that declares the desired state (for example, “a 3‑node database cluster with encryption enabled”).
- Users create/edit CRs; the Operator watches and reacts.
- Controller / Reconciliation Loop
- The Operator’s business logic that:
- Watches for changes to relevant resources (CRs and built‑in objects).
- Computes the difference between desired and actual state.
- Performs actions (create/update/delete Kubernetes resources, call APIs, etc.) until the cluster reaches the desired state.
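The watch, diff, and act cycle described above can be sketched in a few lines. This is a minimal illustration only, assuming the cluster is modelled as a plain dictionary from object name to spec; the function name reconcile and the object names are hypothetical, and a real Operator would call the Kubernetes API instead of mutating a dictionary.

```python
def reconcile(desired: dict, actual: dict) -> list:
    """Run one reconciliation pass, mutating `actual` toward `desired`.

    Both arguments map an object name to its spec (hypothetical model).
    Returns the list of actions that were taken.
    """
    actions = []
    # Create anything that is missing and update anything that has drifted.
    for name, spec in desired.items():
        if name not in actual:
            actual[name] = dict(spec)
            actions.append(("create", name))
        elif actual[name] != spec:
            actual[name] = dict(spec)
            actions.append(("update", name))
    # Delete anything that exists but is no longer desired.
    for name in sorted(set(actual) - set(desired)):
        del actual[name]
        actions.append(("delete", name))
    return actions


if __name__ == "__main__":
    desired = {"db-0": {"image": "db:1.0"}, "db-1": {"image": "db:1.0"}}
    actual = {}
    print(reconcile(desired, actual))  # [('create', 'db-0'), ('create', 'db-1')]
    print(reconcile(desired, actual))  # [] because nothing is left to change
```

Running the pass a second time produces no actions, which is the property that lets a controller call it repeatedly without side effects.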
What Problems Operators Solve
Operators aim to remove manual, error‑prone procedures and replace them with consistent automation that works across clusters and environments. Typical challenges they address:
- Complex installation
- Multi-component systems with many manifests, ordering requirements, and dependencies.
- Operator: creates all required Kubernetes objects in the right order from a single CR.
- Configuration drift
- Manual tweaks over time lead to inconsistent and fragile setups.
- Operator: treats the CR as the single source of truth and continually enforces it.
- Upgrades and compatibility
- Coordinating multi-step upgrades, schema migrations, API version changes.
- Operator: implements version‑aware upgrade paths and validations.
- High availability and self‑healing
- Detecting failed instances, rebalancing, or reconfiguring clustering.
- Operator: monitors health via Kubernetes status and app‑specific checks and reacts automatically.
- Operational consistency across teams
- Different admins applying different procedures.
- Operator: standardizes operational behavior behind a declarative API.
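As an illustration of the "right order" point under complex installation, the sketch below derives a creation order from a dependency map using Python's standard graphlib module (Python 3.9 or later). The dependency map and the function name creation_order are assumptions made for this example; real Operators decide ordering in their own ways.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical dependency map: each kind lists the kinds that must exist first.
DEPENDENCIES = {
    "Secret": set(),
    "PersistentVolumeClaim": set(),
    "Service": set(),
    "StatefulSet": {"Secret", "PersistentVolumeClaim", "Service"},
}


def creation_order(dependencies: dict) -> list:
    """Return the kinds in an order where prerequisites always come first."""
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError as exc:
        # The detected cycle is the second element of the exception's args.
        raise ValueError(f"circular dependency: {exc.args[1]}") from exc


if __name__ == "__main__":
    # StatefulSet always comes last; the other three have no ordering constraint.
    print(creation_order(DEPENDENCIES))
```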
How Operators Differ from Plain YAML and Helm Charts
Without Operators, you typically:
- Apply static manifests (or Helm charts) to create resources.
- Rely on:
- Manual processes (runbooks).
- External automation (scripts, Ansible, CI jobs) to handle lifecycle tasks.
Operators differ in several ways:
- Stateful, continuous control vs. one-time templating
- Helm: renders and applies templates once; “upgrade” is another apply with new templates.
- Operator: constantly reconciles, reconfigures, and heals over time.
- Domain logic vs. generic templating
- Operators can:
- Talk to app-specific APIs.
- Inspect internal health/metrics.
- Orchestrate multi-step operations (e.g., rolling config changes with app‑level checks).
- Platform visibility and integration
- Operators use the Kubernetes control loop model directly: they are first‑class controllers that the cluster can introspect and manage.
Types and Levels of Operators
Not all Operators are equally sophisticated. They vary in what parts of the lifecycle they automate.
Common capability levels (conceptually):
- Basic install/configure
- Install components when a CR is created.
- Update resources when CRs change.
- Upgrades and version management
- Coordinate rolling upgrades.
- Understand version compatibility and block unsafe transitions.
- Health management and repair
- Detect pod or instance failures and recreate/replace them.
- Reconcile replication factors or cluster membership.
- Day‑2 operations
- Perform and manage backups/restores.
- Run scheduled maintenance tasks and tuning.
- Autonomous operations
- Reactive scaling or topology changes based on metrics.
- Policy‑driven reconfiguration or failover.
In OpenShift, you will encounter Operators at multiple levels of sophistication, depending on their purpose.
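To make "understand version compatibility and block unsafe transitions" concrete, here is a small sketch that treats supported upgrades as a directed graph and searches for a chain of single-step upgrades. The version numbers, the UPGRADE_EDGES table, and the function name upgrade_path are all hypothetical; they do not describe any particular Operator.

```python
from collections import deque

# Hypothetical upgrade graph: version -> versions reachable in one step.
UPGRADE_EDGES = {
    "1.0": {"1.1"},
    "1.1": {"1.2", "2.0"},
    "1.2": {"2.0"},
    "2.0": set(),
}


def upgrade_path(current: str, target: str, edges: dict = UPGRADE_EDGES):
    """Return the shortest chain of supported upgrades, or None if unsupported."""
    queue = deque([[current]])
    seen = {current}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        # Breadth-first search, visiting successors in a stable order.
        for nxt in sorted(edges.get(path[-1], ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


if __name__ == "__main__":
    print(upgrade_path("1.0", "2.0"))  # ['1.0', '1.1', '2.0']
    print(upgrade_path("2.0", "1.0"))  # None, so the downgrade would be blocked
```

An Operator following this idea would refuse a requested version when no path exists, and otherwise step through the intermediate versions one at a time.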
Operators in the OpenShift Context
While the overall “Operators and Platform Services” chapter covers the ecosystem and management aspects, this section focuses on what distinguishes Operators in OpenShift’s design:
- Core platform managed by Operators
Many built‑in OpenShift capabilities (networking, ingress, storage, authentication, cluster version) are themselves controlled by Operators. You typically configure the platform by editing custom resources they own.
- Standardized delivery via Operator packaging
Operators in OpenShift are published and installed with metadata describing:
- Which APIs (CRDs) they provide.
- What versions they support.
- Upgrade paths between versions.
- Opinionated automation for production use
Operators shipped with OpenShift aim to encode Red Hat’s recommended operational practices for running those components reliably.
Example: High-Level Operator Workflow
Conceptually, using an Operator looks like this:
- The Operator (and its CRDs) are available in the cluster.
- A user creates a custom resource, for example:

    apiVersion: example.com/v1
    kind: ExampleApp
    metadata:
      name: example
    spec:
      replicas: 3
      storageSize: 100Gi
      enableTLS: true

- The Operator:
- Sees this new ExampleApp CR.
- Creates/updates Deployments, Services, PersistentVolumeClaims, Secrets, etc.
- Configures the application to run with 3 replicas, 100Gi storage, TLS enabled.
- If a node fails or a pod is deleted:
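- Kubernetes recreates or reschedules the affected pods through its workload controllers (for example, a Deployment or StatefulSet).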
- The Operator may also:
- Reassign roles in the cluster.
- Reconfigure peers.
- Ensure the application remains in the desired state.
- If the user edits the CR to change replicas to 5:
- The Operator adjusts the underlying resources and application configuration accordingly.
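The step where the Operator turns an ExampleApp CR into child resources can be sketched as a pure function from the CR to a list of desired objects. This is a simplified illustration: the returned dictionaries are stand-ins rather than complete Kubernetes manifests, and the function name render_children and the child naming scheme are assumptions.

```python
def render_children(cr: dict) -> list:
    """Derive simplified child objects from an ExampleApp custom resource."""
    name = cr["metadata"]["name"]
    spec = cr["spec"]
    children = [
        {"kind": "Deployment", "name": name, "replicas": spec["replicas"]},
        {"kind": "Service", "name": name},
        {
            "kind": "PersistentVolumeClaim",
            "name": f"{name}-data",  # hypothetical naming scheme
            "storage": spec["storageSize"],
        },
    ]
    # A TLS Secret is only desired when the CR asks for TLS.
    if spec.get("enableTLS"):
        children.append({"kind": "Secret", "name": f"{name}-tls"})
    return children


if __name__ == "__main__":
    cr = {
        "apiVersion": "example.com/v1",
        "kind": "ExampleApp",
        "metadata": {"name": "example"},
        "spec": {"replicas": 3, "storageSize": "100Gi", "enableTLS": True},
    }
    for child in render_children(cr):
        print(child["kind"], child["name"])
    # Editing the CR changes the desired children on the next pass.
    cr["spec"]["replicas"] = 5
    print(render_children(cr)[0]["replicas"])  # 5
```

Because the function depends only on the CR, editing the CR is enough to change what the Operator considers the desired state.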
Benefits and Trade-Offs
Benefits
- Operational automation: Encapsulates complex logic so users interact with a simpler, declarative interface.
- Consistency: Same behavior across clusters and environments.
- Upgradability: Built‑in, tested upgrade paths for many components.
- Integration: Native use of Kubernetes primitives (RBAC, namespaces, events, conditions, statuses).
Trade‑offs
- Abstraction complexity: Users must understand the CRD’s model and semantics.
- Dependency on Operator quality: Bugs or limitations in Operators can affect the managed component.
- Ecosystem variance: Different Operators may expose different conventions or levels of automation.
When to Use Operators
In the context of OpenShift, Operators are especially suitable when you need to:
- Manage stateful or complex systems (databases, queues, monitoring stacks, service meshes).
- Provide shared platform services to many applications.
- Offer self‑service capabilities to developers without giving them low‑level access to sensitive platform components.
- Standardize and automate day‑2 operations (upgrades, backups, tuning) across clusters.
Later chapters will cover how Operators are installed, managed, and which platform services they typically provide in OpenShift. Here, the key takeaway is that Operators are the primary pattern for expressing and automating operational expertise as Kubernetes‑native software.