Node Types in OpenShift
OpenShift clusters are composed of several types of nodes, each with specific responsibilities. While the architecture overview already introduced the idea of control plane vs worker, here we focus on how OpenShift refines these into concrete machine roles.
At a high level, an OpenShift node is just a machine (virtual or physical) that runs the OpenShift software stack and participates in the cluster. How that node is labeled, configured, and scheduled defines its role.
Typical node types in OpenShift:
- Control plane (master) nodes
- Worker (compute) nodes
- Infrastructure nodes
- Specialized nodes (e.g. GPU, storage-heavy, edge, single-node)
These are mostly conventions implemented via labels, taints, and machine configuration, not different binaries.
Control Plane Nodes
Control plane nodes (often called “masters”) run the core Kubernetes and OpenShift control plane components.
Characteristics specific to control plane nodes:
- Run API-centric workloads only
  - kube-apiserver, kube-scheduler, kube-controller-manager
  - OpenShift-specific control plane components (e.g. openshift-apiserver, the OAuth server, the etcd cluster)
- Not used for regular application pods
  - In production, they are typically tainted to prevent user workloads from scheduling there.
- High availability focus
  - Commonly deployed as 3 (or more) nodes to form a highly available etcd quorum and a resilient control plane.
- Configuration management
  - Managed as a group via the master MachineConfigPool (MCP) and associated MachineConfig resources.
In most OpenShift deployments you do not run your applications on control plane nodes; their role is to maintain cluster state and respond to API requests.
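For illustration, the taint involved lives on the node object itself. A minimal sketch of how a control plane node is labeled and tainted follows; the node name is a placeholder, and newer OpenShift versions use node-role.kubernetes.io/control-plane alongside or instead of master:

apiVersion: v1
kind: Node
metadata:
  name: master-0                          # placeholder node name
  labels:
    node-role.kubernetes.io/master: ""    # canonical control plane role label
spec:
  taints:
  - key: node-role.kubernetes.io/master   # repels pods that do not tolerate it
    effect: NoSchedule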
Worker Nodes
Worker nodes (often called “compute” nodes) run your application workloads and most platform services that are not part of the core control plane.
Key characteristics:
- Primary scheduling target for application pods
  - Default node group where deployments, workloads, and many Operators run.
- Scalable pool
  - You scale application capacity by adding/removing worker nodes.
- General-purpose resources
  - Typical workers have balanced CPU, memory, and network capacity.
OpenShift manages worker nodes through the worker MachineConfigPool and, in many cases, through Machine API resources that allow automatic provisioning and scaling (details of provisioning are covered in the installation and deployment chapters).
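Without going into provisioning details, the rough shape of a Machine API resource backing a worker pool is sketched below; the name, labels, and the omitted providerSpec are placeholders and differ per platform and cluster:

apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  name: mycluster-worker-extra            # placeholder; real names embed the cluster ID
  namespace: openshift-machine-api
spec:
  replicas: 3                             # desired number of worker machines in this pool
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-machineset: mycluster-worker-extra
  template:
    metadata:
      labels:
        machine.openshift.io/cluster-api-machineset: mycluster-worker-extra
    spec:
      metadata:
        labels:
          node-role.kubernetes.io/worker: ""   # role label applied to the resulting nodes
      providerSpec: {}                    # platform-specific machine details omitted here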
Infrastructure Nodes
Infrastructure (infra) nodes are worker nodes dedicated to cluster infrastructure workloads instead of user applications.
They still run the node components (kubelet, CRI-O) like any worker, but their scheduling is constrained to specific infrastructure pods, for example:
- Ingress controllers (routers)
- Internal registries
- Monitoring stack (Prometheus, Alertmanager)
- Logging components
- Some platform Operators and shared services
Why use infra nodes:
- Isolation of cluster services from user applications
  - Reduces “noisy neighbor” effects.
  - Helps meet performance and SLO requirements for critical cluster services.
- Clear capacity planning
  - You can size infra nodes based on known cluster services, separately from user workload sizing.
- Security and compliance
  - Some organizations want strict separation between cluster management components and tenant workloads.
How infra nodes are typically implemented:
- They are standard worker machines with:
  - Node labels such as node-role.kubernetes.io/infra=""
    - OpenShift also often uses node-role.kubernetes.io/worker="" for these nodes, so they remain in the worker MCP but are distinguished by the infra label.
  - Taints to repel general workloads, for example node-role.kubernetes.io/infra:NoSchedule
  - Tolerations and node selectors on infra workloads so only those pods land on infra nodes.
The exact label/taint patterns can vary by OpenShift version and organizational standards, but the principle remains: use labels+taints to dedicate a class of worker nodes to infrastructure services.
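As a concrete example of the workload side, a commonly documented pattern moves the default ingress controller (router) onto infra nodes by setting its nodePlacement. A sketch, assuming the infra label and taint described above:

apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  name: default
  namespace: openshift-ingress-operator
spec:
  nodePlacement:
    nodeSelector:
      matchLabels:
        node-role.kubernetes.io/infra: ""   # only infra nodes are eligible
    tolerations:
    - key: node-role.kubernetes.io/infra    # tolerate the infra taint
      operator: Exists
      effect: NoSchedule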
Specialized Worker Roles
Beyond basic worker and infra roles, OpenShift encourages defining specialized worker groups for specific types of workloads or hardware. These are implemented with the same mechanisms: labels, taints, and custom MachineConfigPools.
Common examples:
GPU / Accelerator Nodes
Nodes with GPUs or other accelerators are dedicated to ML/AI, visualization, or HPC-style workloads.
Typical characteristics:
- Special hardware:
  - NVIDIA/AMD GPUs, FPGAs, or other accelerators.
- Special drivers and runtime
  - Managed through vendor-specific Operators (e.g. the NVIDIA GPU Operator).
  - Might use a dedicated MachineConfigPool, e.g. gpu.
- Scheduling controls
  - Labeled, e.g. node-role.kubernetes.io/gpu="" or accelerator=nvidia.
  - Workloads request GPUs using device plugin resources such as nvidia.com/gpu (see the sketch after this list).
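A minimal sketch of a Pod that requests a GPU; the node label, pod name, and image are illustrative, while nvidia.com/gpu is the resource name exposed by the NVIDIA device plugin:

apiVersion: v1
kind: Pod
metadata:
  name: gpu-workload                  # placeholder name
spec:
  nodeSelector:
    accelerator: nvidia               # illustrative label from the list above
  containers:
  - name: trainer
    image: myorg/gpu-trainer:latest   # placeholder image
    resources:
      limits:
        nvidia.com/gpu: 1             # device plugin resource; schedules the pod onto a GPU node

If GPU nodes are also tainted, the pod additionally needs a matching toleration, exactly as in the infra example later in this chapter.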
Storage-Heavy or I/O Nodes
Some clusters use nodes optimized for I/O-heavy workloads (databases, caches, file-serving workloads).
Examples:
- Nodes with:
  - High IOPS local SSDs
  - Special networking or storage adapters
- Labeled and configured so only specific stateful applications are scheduled there.
- May host certain storage-related daemons (depending on storage solution and architecture, covered in storage chapters).
Edge or Remote Site Nodes
For edge deployments or remote sites:
- Small form-factor or resource-constrained machines.
- Might be part of:
  - Edge clusters, or
  - Remotely located worker nodes for data locality.
- Often have different MachineConfigPools and update policies to reflect connectivity and operational constraints.
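For instance, a remote pool can be paused so configuration rollouts wait for a maintenance window. A sketch of such a custom pool follows; the edge pool name and the node-role.kubernetes.io/edge label are assumptions rather than OpenShift defaults:

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: edge                                    # hypothetical custom pool
spec:
  paused: true                                  # hold MachineConfig rollouts until unpaused
  machineConfigSelector:
    matchExpressions:
    - key: machineconfiguration.openshift.io/role
      operator: In
      values: [worker, edge]                    # pool consumes worker configs plus edge-specific ones
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/edge: ""          # nodes carrying this label join the pool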
Single-Node OpenShift (SNO) as a Special Case
Single-Node OpenShift (SNO) is a special deployment model where control plane and worker roles run on a single machine.
Role characteristics in SNO:
- All control plane components and application workloads share one node.
- Still logically separated:
- Components are labeled and scheduled according to the same role principles,
- But physical isolation is not present.
- Useful for:
- Edge deployments
- Development/test environments
- Appliances or specialized on-prem installations
Management concepts like MachineConfigPool still exist, but they apply to a single node.
Node Roles, Labels, and MachineConfigPools
Machine roles in OpenShift are mainly implemented using three mechanisms:
- Node roles and labels
  - OpenShift automatically creates some canonical labels, such as:
    - node-role.kubernetes.io/master="" (or control-plane)
    - node-role.kubernetes.io/worker=""
  - Admins can add additional role labels, for example:
    - node-role.kubernetes.io/infra=""
    - node-role.kubernetes.io/gpu=""
  - Workloads then use nodeSelector and affinity/antiAffinity to control where pods run.
- Taints and tolerations
  - Used to repel workloads from certain node types.
  - Example: infra nodes tainted with node-role.kubernetes.io/infra:NoSchedule; only infra workloads with corresponding tolerations are allowed to schedule there.
- MachineConfigPools (MCPs)
  - MCPs group nodes with similar configuration needs.
  - Common pools: master and worker.
  - Custom pools, for example: infra, gpu, edge.
  - MachineConfigs target specific MCPs (see the sketch after this list), allowing per-role differences in:
    - OS configuration
    - kubelet settings
    - kernel parameters
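To make that last point concrete, here is a sketch of a MachineConfig that sets kernel arguments for a hypothetical gpu pool; the pool, its role label value, and the arguments are illustrative, while the MachineConfig kind and kernelArguments field follow the standard pattern:

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-gpu-kernelargs                         # placeholder name
  labels:
    machineconfiguration.openshift.io/role: gpu   # matched by the gpu pool's machineConfigSelector
spec:
  kernelArguments:
  - hugepagesz=1G                                 # illustrative kernel parameters for this role
  - hugepages=16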
Understanding this mapping is key:
- Node role / label → scheduling (which workloads go where).
- Taints/tolerations → strong scheduling constraints.
- MachineConfigPool → how nodes of that role are configured and updated.
Assigning Workloads to Node Roles
From an application or platform perspective, using node roles correctly is about:
- Choosing where workloads should run:
  - Business apps → general workers or specialized pools (e.g. gpu).
  - Cluster services → infra nodes.
  - Control plane components → control plane nodes only.
- Configuring workloads appropriately:
  - Use nodeSelector or nodeAffinity to target the right labels.
  - Use tolerations only when you intentionally schedule onto tainted nodes.
Example of a Deployment that targets infra nodes only:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: infra-service
spec:
  replicas: 2
  selector:
    matchLabels:
      app: infra-service
  template:
    metadata:
      labels:
        app: infra-service
    spec:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
      tolerations:
      - key: "node-role.kubernetes.io/infra"
        operator: "Exists"
        effect: "NoSchedule"
      containers:
      - name: app
        image: myorg/infra-service:latest

This Deployment will:
- Run only on nodes labeled node-role.kubernetes.io/infra="".
- Be scheduled even if those nodes are tainted with NoSchedule for that key.
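Because the mechanisms above also list affinity, the same targeting can be written with nodeAffinity instead of nodeSelector. A sketch of the relevant pod spec fragment, using the same infra label and taint:

spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: node-role.kubernetes.io/infra
            operator: Exists            # matches any node carrying the infra role label
  tolerations:
  - key: node-role.kubernetes.io/infra
    operator: Exists
    effect: NoSchedule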
Design Considerations for Node Role Layout
When planning node roles in an OpenShift cluster, you typically decide:
- How many control plane nodes:
  - Usually 3 in production for quorum and availability.
- How many and what kind of worker pools:
  - General-purpose workers
  - Infra nodes
  - GPU or compute-optimized pools
  - Storage or I/O optimized pools
  - Edge or remote-site workers
Key factors that influence design:
- Workload types and SLAs
  - Latency-sensitive, batch, ML/AI, etc.
- Security and multi-tenancy needs
  - Need for strong isolation between platform services and tenant workloads.
- Operational model
  - How you patch and update different pools.
  - Tolerance for draining and rebooting nodes per role.
By carefully defining node roles and using labels, taints, and MachineConfigPools, you can tightly control where workloads run, how cluster resources are used, and how different classes of machines are managed.