Why Case Studies Matter
Seeing how real organizations use OpenShift helps connect concepts from earlier chapters to concrete decisions, trade-offs, and outcomes:
- How teams structure projects and namespaces for real environments.
- How they choose deployment models (on-prem, cloud, managed).
- How they integrate CI/CD, security, and observability into daily work.
- How they handle scale, multi-tenancy, and regulated environments.
This chapter walks through several typical industry scenarios, focusing on architecture choices, workflows, and lessons learned rather than technical basics.
Case Study 1: Financial Services – Regulated Banking Platform
Context and Goals
A large bank wants to modernize core customer-facing applications:
- Replace legacy application servers with containers.
- Run both cloud-native microservices and modernized monoliths.
- Meet strict regulatory/compliance requirements (auditability, data locality).
- Enable faster release cycles without sacrificing control.
They select OpenShift primarily because of:
- Integrated security and compliance features.
- Strong RBAC model and multi-tenancy.
- Support for hybrid deployment (on-prem + cloud).
- Enterprise support and certified ecosystem.
High-Level Architecture
- Deployment model:
- Primary OpenShift clusters in two on-prem data centers (active/active).
- Additional managed OpenShift clusters in a public cloud for non-production.
- Separation of concerns:
- Dedicated clusters for:
  - `prod` – isolated, most strictly controlled.
  - `nonprod` (dev, test, perf) – more flexible, used for experimentation.
- Networking and access:
- Internal applications accessed via internal Routes behind corporate load balancers.
- External-facing apps exposed through DMZ with strict firewall rules.
- NetworkPolicies implemented to enforce zero-trust principles between namespaces.
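A zero-trust posture like this usually starts from a default-deny policy in each namespace, with explicit allow rules layered on top. A minimal sketch (namespace and resource names are illustrative, not the bank's actual configuration):

```yaml
# Deny all ingress traffic to pods in the namespace by default.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: retail-banking-prod
spec:
  podSelector: {}          # an empty selector matches every pod in the namespace
  policyTypes:
    - Ingress
---
# Then explicitly re-allow traffic between pods in the same namespace.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-same-namespace
  namespace: retail-banking-prod
spec:
  podSelector: {}
  ingress:
    - from:
        - podSelector: {}  # any pod in this namespace
  policyTypes:
    - Ingress
```

Further policies would then whitelist specific cross-namespace flows, such as traffic from the ingress router or from approved peer applications.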
Projects, Namespaces, and Multi-Tenancy
Security and isolation requirements drive a careful namespace design:
- Project per application domain (e.g., `retail-banking`, `corporate-banking`).
- Environment separation via namespaces: `retail-banking-dev`, `retail-banking-test`, `retail-banking-prod`.
- RBAC:
  - Developers get `edit` in `-dev` and `-test` namespaces, `view` only in `-prod`.
  - Operations and SRE teams get `admin` in production namespaces.
  - Service accounts restricted to minimal required privileges, combined with Security Context Constraints (SCCs) tailored for the bank.
This structure allows strict change control in production while giving developers autonomy in non-prod environments.
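In RBAC terms, a layout like this maps identity-provider groups onto OpenShift's built-in cluster roles per namespace. A sketch with a hypothetical group name:

```yaml
# Developers may create and modify workloads in the dev namespace...
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: developers-edit
  namespace: retail-banking-dev
subjects:
  - kind: Group
    name: retail-banking-developers   # hypothetical identity-provider group
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: edit                          # built-in role: modify most resources
  apiGroup: rbac.authorization.k8s.io
---
# ...but are read-only in production.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: developers-view
  namespace: retail-banking-prod
subjects:
  - kind: Group
    name: retail-banking-developers
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: view                          # built-in role: read-only access
  apiGroup: rbac.authorization.k8s.io
```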
Application Deployment Pattern
The bank has a mix of:
- New microservices (REST APIs, event-driven components).
- Modernized monoliths running in containers.
- Batch/ETL workloads for daily processing.
Typical patterns:
- `Deployment` objects with:
  - Rolling updates for stateless services.
  - Strict resource requests/limits and QoS policies per service tier.
- `StatefulSet` for databases and stateful middleware in some non-critical cases; mission-critical databases remain on dedicated DB platforms, accessed from OpenShift via internal Services.
- Configurations externalized using `ConfigMaps` and `Secrets` (for connection strings, credentials, keys).
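Put together, a service following these patterns might look like the following sketch (image path, names, and resource sizing are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payments-api
  namespace: retail-banking-prod
spec:
  replicas: 3
  strategy:
    type: RollingUpdate            # zero-downtime updates for stateless services
  selector:
    matchLabels:
      app: payments-api
  template:
    metadata:
      labels:
        app: payments-api
    spec:
      containers:
        - name: payments-api
          image: registry.internal.example/retail-banking/payments-api:1.4.2
          resources:
            requests:              # guaranteed baseline, used for scheduling
              cpu: 250m
              memory: 256Mi
            limits:                # hard ceiling per the service tier's QoS policy
              cpu: "1"
              memory: 512Mi
          envFrom:
            - configMapRef:
                name: payments-api-config       # externalized configuration
            - secretRef:
                name: payments-api-credentials  # connection strings, keys
```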
CI/CD and Governance
The bank implements a controlled yet automated pipeline:
- Build:
- CI (e.g., Jenkins or GitLab CI) triggers builds on code changes.
- Container images built and scanned for vulnerabilities before being pushed to a private registry.
- Deploy:
- GitOps (e.g., Argo CD) used for production:
- Desired state stored in Git repositories.
- Changes go through pull requests and approval processes.
- OpenShift Pipelines (Tekton) commonly used in non-prod for more flexible experimentation.
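In Argo CD, the link between a Git repository and a production namespace is expressed as an `Application` resource. A sketch with hypothetical repository and path names:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: retail-banking-prod
  namespace: openshift-gitops
spec:
  project: retail-banking
  source:
    repoURL: https://git.internal.example/platform/retail-banking-config.git
    targetRevision: main           # only merged, approved commits reach prod
    path: overlays/prod
  destination:
    server: https://kubernetes.default.svc
    namespace: retail-banking-prod
  syncPolicy:
    automated:
      prune: true                  # remove resources deleted from Git
      selfHeal: true               # revert manual drift back to the Git state
```

With `selfHeal` enabled, any out-of-band change in the cluster is reverted to the Git-declared state, which is what makes the audit trail trustworthy.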
Key governance features:
- Image policies and image signing required for production.
- Admission controllers ensure:
- Only images from approved registries are allowed.
- Mandatory labels and annotations (e.g., ownership, data classification).
- Resource limits defined for all workloads.
Observability and Operations
- Centralized logging integrated with the bank’s SIEM platform.
- Metrics scraped from OpenShift and application components for capacity planning and SLO monitoring.
- Alerts integrated with on-call tools and incident management workflows.
Outcomes and Lessons
- Benefits:
- Reduced deployment time from weeks to hours.
- Easier auditability due to GitOps and OpenShift’s event logging.
- Challenges:
- Cultural change: moving teams from manual deployments to pipelines.
- Managing multiple clusters and aligning security policies across them.
- Key practice:
- Standardize “golden templates” for projects, RBAC, and network policies to keep large environments manageable.
Case Study 2: E‑Commerce – Highly Scalable Customer-Facing Platform
Context and Goals
A global e-commerce company needs to:
- Handle large traffic spikes (sales events, holidays).
- Experiment rapidly with features (A/B testing).
- Deliver consistent performance to a global user base.
They choose OpenShift for:
- Kubernetes-based orchestration with opinionated defaults.
- Integrated routing and load balancing.
- Smooth integration with cloud load balancers and managed storage.
- Built-in tools for canary and rolling deployments.
Cluster Layout and Scaling Strategy
- Cloud-first deployment across multiple regions.
- Multiple OpenShift clusters, each in a different region, fronted by global DNS and cloud load balancers.
- Horizontal scalability emphasized:
- Worker nodes autoscaled based on cluster metrics.
- Horizontal Pod Autoscalers (HPAs) configured for core services.
For example:
- Frontend services: scale based on CPU and custom HTTP latency metrics.
- Recommendation and search services: scale based on queue depth or request rate.
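A CPU-based HPA for such a frontend service could be sketched as follows (thresholds and replica bounds are illustrative; custom metrics such as HTTP latency additionally require a custom metrics adapter):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: frontend
  namespace: frontend
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: frontend
  minReplicas: 4                   # baseline capacity outside sales events
  maxReplicas: 40                  # ceiling during traffic spikes
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out when average CPU exceeds 70%
```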
Application Design on OpenShift
Key workloads:
- Stateless microservices (cart, catalog, checkout, user profile).
- Background processing (order fulfillment, inventory updates).
- API gateway and BFF (Backend-for-Frontend) layers.
Implementation patterns:
- Each service runs as a `Deployment`, with blue-green or canary deployment strategies managed via Deployments or specialized Operators.
- Separate namespaces for core domains (e.g., `checkout`, `catalog`, `search`).
- Caching layers (Redis, in-memory caches) either:
- Deployed as Operators on OpenShift, or
- Consumed as managed cloud services.
OpenShift Routes used heavily for:
- Exposing microservices to the API gateway.
- Performing path-based routing and TLS termination.
- Integrating with web application firewalls (WAFs).
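A Route performing path-based routing with edge TLS termination might be sketched like this (hostname and service names are illustrative):

```yaml
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: catalog
  namespace: catalog
spec:
  host: api.shop.example.com
  path: /catalog                   # path-based routing to the catalog service
  to:
    kind: Service
    name: catalog
  port:
    targetPort: http
  tls:
    termination: edge              # TLS terminated at the router
    insecureEdgeTerminationPolicy: Redirect   # force HTTP -> HTTPS
```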
CI/CD and Feature Delivery
The e-commerce team emphasizes fast iterations:
- Developers work in feature branches with short-lived namespaces:
  - `user-feature-<id>` projects created via automation.
  - Automated cleanup of unused namespaces.
- Pipelines (Tekton or external CI/CD platforms):
- Build images, run automated tests, run performance smoke tests.
- Deploy to staging environments, then to production via manual approval or canary policies.
Canary flow example:
- Deploy new version to a small subset of pods in production namespace.
- Use traffic-splitting (via service mesh or routing rules) to send a small portion of traffic.
- Observe metrics (error rate, latency, conversion).
- Promote or roll back using OpenShift’s deployment history.
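When routing rules rather than a service mesh do the traffic splitting, an OpenShift Route can weight traffic across backends. A sketch sending roughly 10% of requests to the canary (service names are illustrative):

```yaml
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: checkout
  namespace: checkout
spec:
  to:
    kind: Service
    name: checkout-stable
    weight: 90                     # ~90% of traffic to the current version
  alternateBackends:
    - kind: Service
      name: checkout-canary
      weight: 10                   # ~10% of traffic to the canary
  port:
    targetPort: http
```

Promotion then means shifting weight toward the canary service; rollback means setting its weight back to zero.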
Observability and Business Metrics
Beyond basic technical monitoring:
- Correlate technical metrics with business KPIs (checkout success rate, response time vs. cart abandonment).
- Distributed tracing deployed for critical user journeys (e.g., from home page to payment).
- Log sampling combined with full logging for error and security events.
Outcomes and Lessons
- Benefits:
- Ability to handle extreme load spikes by autoscaling pods and nodes.
- Safer experimentation with canary deployments and traffic shifting.
- Challenges:
- Cost management for autoscaling clusters across regions.
- Complex cross-region data consistency.
- Key practice:
- Treat OpenShift configuration as code; consistently reuse Helm charts, Kustomize overlays, or Operators to avoid drift between environments and regions.
Case Study 3: Telecommunications – Network Functions and Edge
Context and Goals
A large telecom operator aims to:
- Virtualize network functions (VNFs/CNFs) on cloud-native infrastructure.
- Support 5G workloads and edge computing use cases.
- Standardize operations across central and edge sites.
OpenShift is selected because of:
- Support for telco-specific configurations and Operators.
- Ability to run on bare metal with specialized networking.
- Ecosystem support for hardened, certified CNFs.
Architecture Overview
- Central data centers:
- Large OpenShift clusters for control-plane network functions, OSS/BSS, analytics.
- Edge locations:
- Smaller OpenShift clusters (or Single-node OpenShift) for local packet processing and low-latency services.
- Connectivity:
- Encrypted tunnels and strict segmentation between edge and core.
- Integration with telecom-grade underlay networks.
Specialized Node and Workload Configuration
- Node pools (machine sets) tuned for:
- Data-plane workloads: CPU pinning, huge pages, SR-IOV for high-performance networking.
- Control-plane applications: standard virtualization-like tuning.
- Use of specific SCCs and node labels:
  - CNF pods scheduled only on appropriately tuned nodes via `nodeSelector` and `affinity` rules.
  - Taints and tolerations used to ensure telco workloads do not share nodes with general-purpose applications.
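A data-plane CNF pod spec combining these mechanisms might look like the following sketch (labels, taint keys, image path, and sizing are illustrative; exclusive CPU pinning additionally requires a static CPU manager policy on the node and integer CPU requests):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: packet-processor
spec:
  nodeSelector:
    node-role.kubernetes.io/cnf-dataplane: ""   # only tuned data-plane nodes
  tolerations:
    - key: cnf-dataplane            # matching node taint keeps general apps off
      operator: Exists
      effect: NoSchedule
  containers:
    - name: packet-processor
      image: registry.internal.example/telco/packet-processor:2.1.0
      resources:
        requests:
          cpu: "8"                  # integer CPUs, eligible for pinning
          memory: 2Gi
          hugepages-1Gi: 4Gi        # pre-allocated huge pages on the node
        limits:                     # requests == limits -> Guaranteed QoS class
          cpu: "8"
          memory: 2Gi
          hugepages-1Gi: 4Gi
```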
Lifecycle and Operations
- Operators used extensively:
- For CNF lifecycle management (installation, upgrade, rollback).
- For cluster configuration (networking plugins, SR-IOV configuration).
- Rolling upgrades coordinated to maintain carrier-grade SLAs:
- Change windows and maintenance processes tightly integrated with OpenShift’s upgrade workflows.
- Pre-production lab clusters mirror production for validation.
Outcomes and Lessons
- Benefits:
- Faster rollout of new network services.
- Better utilization of hardware compared to standalone appliances.
- Challenges:
- Deep expertise required in both networking and Kubernetes/OpenShift.
- Strict performance and latency testing for each upgrade or change.
- Key practice:
- Standardize “telco profiles” for nodes and Operators to ensure consistent deployments across many sites.
Case Study 4: Research and HPC – Data Science at Scale
Context and Goals
A research organization wants to:
- Support data scientists and researchers with flexible compute.
- Run batch workloads and interactive notebooks.
- Mix on-prem HPC resources with cloud capacity.
They adopt OpenShift to:
- Provide a self-service platform for running containerized jobs.
- Isolate users and teams securely.
- Integrate with existing HPC clusters and storage systems.
Architecture and Workflows
- Hybrid deployment:
- On-prem OpenShift clusters close to HPC storage.
- Optional burst capacity in cloud-based OpenShift clusters for peak workloads.
- Workload types:
- Interactive workloads: Jupyter notebooks, web-based analytics UIs.
- Batch workloads: data preprocessing, simulation, model training.
- GPU-accelerated jobs: deep learning, scientific computing.
Platform Usage Patterns
- Multi-tenant namespaces:
  - `team-astro`, `team-bio`, `team-ml`, each with quotas and limits.
  - Projects pre-configured with shared `ConfigMaps` for data locations and standard images.
- Job orchestration:
  - `Job` and `CronJob` resources used for batch processing.
  - Integration with existing schedulers or workflow managers where needed (e.g., Argo Workflows).
- GPUs and accelerators:
- Specialized node pools with GPUs, managed with GPU Operators.
- Quotas controlling GPU access per team.
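A nightly GPU training job for one of the teams could be sketched as a `CronJob` (schedule, image, and GPU count are illustrative; the `nvidia.com/gpu` resource name assumes the NVIDIA GPU Operator is installed):

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-training
  namespace: team-ml
spec:
  schedule: "0 2 * * *"            # every night at 02:00
  jobTemplate:
    spec:
      backoffLimit: 2              # retry a failed run up to twice
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: train
              image: registry.internal.example/team-ml/trainer:0.9.0
              resources:
                limits:
                  nvidia.com/gpu: 1   # counted against the team's GPU quota
```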
Data and Compliance Considerations
- Research data often sensitive (e.g., medical, genomic):
- Strict RBAC and network isolation between teams.
- Encrypted storage and restrictions on external data egress.
- Data ingestion and ETL pipelines run inside OpenShift to enforce policies.
Outcomes and Lessons
- Benefits:
- Self-service for researchers while centralizing security and operations.
- Better reproducibility thanks to containerized environments.
- Challenges:
- Teaching non-DevOps users how to work with containers and OpenShift.
- Complex data access patterns and storage performance tuning.
- Key practice:
- Provide curated base images, templates, and documentation for common workflows so users don’t have to become platform experts.
Case Study 5: Public Sector – Compliance-Driven Digital Services
Context and Goals
A government agency wants to:
- Deliver citizen-facing services online (portals, APIs).
- Comply with strict government regulations around security, privacy, and data residency.
- Standardize development practices across many internal teams and contractors.
OpenShift is chosen because:
- It is available in hardened, certified variants for government regions.
- It provides integrated tools for security, compliance, and auditing.
- It supports both on-premises and cloud-based environments.
Platform Design and Governance
- Central platform team provides shared OpenShift clusters as a service to internal departments.
- Tenants (different departments/programs) receive:
- Dedicated projects for dev/test/prod.
- Pre-configured RBAC roles and resource quotas.
- Policy-enforced ingress/egress controls.
- Standard CI/CD patterns provided as reusable templates and reference implementations.
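Per-tenant resource quotas are one concrete piece of this pre-configuration. A sketch for a hypothetical department namespace (the limits shown are illustrative):

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: tenant-quota
  namespace: dept-portal-prod      # hypothetical tenant namespace
spec:
  hard:
    requests.cpu: "20"             # total CPU the tenant may request
    requests.memory: 40Gi
    limits.cpu: "40"
    limits.memory: 80Gi
    pods: "100"                    # cap on concurrent pods
```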
Security and Compliance Focus
Typical measures:
- Mandatory image scanning and signing; only approved base images allowed.
- Admission policies ensuring:
- No privileged containers.
- Only certain SCCs can be used in production.
- Comprehensive logging forwarded to centralized government logging platforms for audit and incident response.
Outcomes and Lessons
- Benefits:
- Shared, centrally governed platform reduces duplication and risk.
- Faster onboarding of new projects with standard patterns.
- Challenges:
- Balancing stringent security control with developer productivity.
- Coordinating between multiple agencies and contractors.
- Key practice:
- Document clear “platform contracts”: what the platform guarantees and what application teams are responsible for (e.g., application-level encryption, input validation).
Cross-Cutting Themes and Takeaways
Across these diverse industries, some common patterns emerge:
- Platform as a product:
- Successful organizations treat OpenShift as an internal product with roadmaps, SLAs, and support processes.
- Configuration and policy as code:
- GitOps, Operators, and templates reduce configuration drift and improve auditability.
- Strong multi-tenancy and RBAC:
- Project and namespace design is critical to security and operability.
- Standardization vs. flexibility:
- A catalog of approved images, Operators, and templates gives consistency while still allowing teams to innovate.
- Investment in people and processes:
- Training, documentation, and clear operational procedures are as important as the platform features themselves.
These case studies illustrate how the abstract concepts from earlier chapters translate into real-world platform architectures and workflows. They also highlight that technical choices in OpenShift are tightly connected to organizational structure, risk tolerance, and business goals.