OpenShift HCI is not only a packaging choice. It is an architecture decision about how compute, storage, virtualization, recovery, and day-2 platform operations work together after a team chooses OpenShift as a private-cloud or VMware-exit destination.
This reference architecture is for platform teams evaluating OpenShift storage, hyper-converged storage for OpenShift, or a VMware-to-OpenShift migration. It focuses on the storage layer underneath virtual machines, containers, databases, platform services, and mixed stateful workloads.
The core idea is simple: start with a vSAN-like operating shape when that helps adoption, but avoid designing the next platform around the same coupling that made the old one hard to change.
Architecture Goal
The goal is to give OpenShift teams an HCI storage model that is familiar enough for infrastructure teams, native enough for platform teams, and flexible enough to evolve into hybrid or disaggregated storage later.
A good OpenShift HCI storage architecture should support:
- OpenShift-native provisioning through CSI and StorageClasses.
- VM disks for OpenShift Virtualization and KubeVirt-adjacent workloads.
- Persistent volumes for databases, queues, platform services, and AI workloads.
- Snapshots, clones, backup, and recovery workflows that fit day-2 operations.
- Failure-domain placement that survives realistic node, rack, or maintenance events.
- A migration path from hyper-converged to hybrid or disaggregated storage when scale changes.
For teams leaving VMware, this matters because the storage system is no longer hidden behind vSAN or a datastore abstraction. OpenShift makes storage policy visible through Kubernetes objects, which is good, but it also means the architecture must be explicit.
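As an illustration of that visibility, a storage class in this model is an ordinary Kubernetes object the platform team can review and version. The sketch below is hypothetical: the provisioner name and parameter keys depend on the CSI driver actually deployed and are shown only to make the shape concrete.

```yaml
# Hypothetical StorageClass sketch: the provisioner name and the
# "parameters" keys are placeholders that vary by CSI driver.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: latency-critical
provisioner: csi.example-block.io   # placeholder CSI driver name
parameters:
  replication: "2"                  # driver-specific key, shown as an example
  qosClass: low-latency             # driver-specific key, shown as an example
reclaimPolicy: Delete
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
```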
Reference Topology
| Layer | What it does | Design note |
|---|---|---|
| OpenShift platform | Schedules containers, VM workloads, operators, and platform services | Keep storage policies visible to platform teams through StorageClasses. |
| Workload layer | Runs databases, VM disks, message brokers, AI services, and stateful apps | Separate workload classes by latency, recovery, and isolation needs. |
| CSI policy layer | Maps PVCs and VM disks to storage capabilities | Use clear classes for performance, replication, snapshots, and topology. |
| simplyblock storage layer | Provides low-latency block storage over NVMe-oF | Use hyper-converged, hybrid, or disaggregated layouts without changing the storage story. |
| Physical infrastructure | Servers, NVMe devices, network, racks, and failure domains | Validate NIC, switch, rack, zone, and power-domain assumptions before production. |
In a hyper-converged starting point, OpenShift worker nodes contribute both compute and storage. That gives smaller platform teams a simpler operational unit and can make the first migration wave easier to explain. The risk is that storage and compute rarely grow at the same rate forever. The architecture should therefore define how the team can add storage capacity, add storage performance, or separate storage from compute without changing the entire platform strategy.
Node and Role Layout
Use a small number of explicit node roles instead of treating every node as identical:
| Node role | Typical responsibility | Storage design note |
|---|---|---|
| Control-plane nodes | OpenShift control plane and cluster API availability | Keep production storage data paths off the control plane unless the cluster profile explicitly requires compact mode. |
| HCI worker nodes | Run applications and contribute storage capacity/performance | Use consistent NVMe, NIC, and failure-domain layout so replicas or protection groups are meaningful. |
| Storage-heavy workers | Add storage density for mixed or hybrid layouts | Useful when capacity grows faster than application CPU demand. |
| Compute-heavy workers | Run CPU/GPU-heavy workloads with less local storage contribution | Useful for AI, analytics, or application tiers that should consume shared block storage without adding storage media. |
| Edge or remote nodes | Smaller footprints with local failure-domain constraints | Keep topology and recovery rules stricter because there are fewer nodes to absorb failures. |
The mistake to avoid is pretending that all worker nodes are interchangeable if their hardware, network path, or failure domains are different. OpenShift scheduling can only make good placement decisions when the platform team models the topology accurately.
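One way to model the topology accurately is to label nodes with their role and failure domain so schedulers and CSI drivers can act on them. The excerpt below is a sketch: Kubernetes standardizes only the region and zone labels, so the rack and storage-contribution labels are custom conventions assumed for illustration.

```yaml
# Node metadata sketch: topology.kubernetes.io/zone is a well-known label;
# the rack and storage-contribution labels are assumed custom conventions.
apiVersion: v1
kind: Node
metadata:
  name: hci-worker-03
  labels:
    node-role.kubernetes.io/worker: ""
    topology.kubernetes.io/zone: zone-a
    topology.example.io/rack: rack-2        # custom failure-domain label
    storage.example.io/contributes: "true"  # custom marker for HCI storage workers
```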
Deployment Model Decision
| Model | Best fit | Tradeoff |
|---|---|---|
| Hyper-converged | Smaller OpenShift footprints, fast VMware-exit pilots, edge clusters, or teams that want one operational unit | Compute and storage capacity can grow unevenly. |
| Disaggregated | Larger platforms where storage and compute scale independently | Requires a cleaner storage network, placement model, and capacity plan. |
| Hybrid | Mixed estate with different workload classes or growth patterns | Requires explicit placement and policy standards. |
OpenShift teams often start with HCI because it is easy to explain and operate. That is reasonable, but the first topology should not trap the team: a platform that begins as HCI may still need independent storage scaling later.
Use this decision rule:
- Choose hyper-converged when the team values simplicity, smaller starting footprint, local storage contribution, and a vSAN-like operating model.
- Choose disaggregated when storage growth, storage performance, or resilience requirements exceed the application node growth rate.
- Choose hybrid when one platform needs more than one storage shape, such as VM disks on HCI nodes and large database volumes on dedicated storage nodes.
Network and Data-Path Requirements
OpenShift HCI storage should be designed as an east-west data path, not an afterthought on the same network used for every control-plane concern.
At minimum, document:
- Which NICs carry storage traffic.
- Whether storage traffic is isolated with VLANs, routing, or dedicated interfaces.
- Which switches and links sit inside the same failure domain.
- Whether the target transport is NVMe/TCP, NVMe/RoCE, or both.
- How multipath or reconnect behavior is expected to work under link failure.
- Which metrics show early network saturation or tail-latency growth.
NVMe/TCP is often the pragmatic default because it works over standard Ethernet and fits many OpenShift private-cloud environments. NVMe/RoCE can be useful when a team has RDMA-capable networking and can operate that fabric correctly. The important point is not choosing a protocol for a benchmark slide; it is choosing a transport the platform team can operate under failure and maintenance conditions.
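A dedicated storage VLAN can be expressed declaratively through the OpenShift NMState operator. The sketch below is an assumption-laden example: the NIC name, VLAN ID, and addressing are environment-specific, and a static address like this would in practice be set per node or replaced with DHCP.

```yaml
# Sketch of a dedicated storage VLAN via the NMState operator.
# Interface name, VLAN ID, and addressing are environment-specific assumptions.
apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: storage-vlan
spec:
  nodeSelector:
    node-role.kubernetes.io/worker: ""
  desiredState:
    interfaces:
      - name: ens3f0.200          # assumed NIC and VLAN tag
        type: vlan
        state: up
        vlan:
          base-iface: ens3f0
          id: 200
        ipv4:
          enabled: true
          dhcp: false
          address:
            - ip: 192.168.200.13  # example only; use per-node policies or DHCP in practice
              prefix-length: 24
```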
Failure-Domain Rules
Use these rules before declaring an OpenShift HCI design production-ready:
- Map storage replicas or protection groups across meaningful failure domains, not only across random nodes.
- Check whether a node drain creates temporary storage and compute concentration risk.
- Keep platform services and VM disks away from a single noisy-neighbor storage class.
- Define recovery behavior for a node, rack, power, or zone event before workload onboarding.
- Monitor p95 and p99 latency, not only throughput and average IOPS.
- Test rebuild or recovery behavior while workloads are still running.
- Document what “degraded but acceptable” means for each workload class.
The subtle risk in HCI is correlated failure. If compute, storage, and critical workloads all concentrate on the same failure domain, a node or rack event can become larger than expected. The reference architecture should make those boundaries visible.
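One way to make a failure-domain boundary visible in the platform itself, assuming the CSI driver publishes topology labels, is a topology-restricted storage class. The provisioner and zone values below are placeholders.

```yaml
# Sketch: restrict volume placement to a known failure domain.
# Assumes the CSI driver (placeholder name) reports topology labels.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: general-stateful-zone-a
provisioner: csi.example-block.io       # placeholder CSI driver name
volumeBindingMode: WaitForFirstConsumer # delay binding until pod placement is known
allowedTopologies:
  - matchLabelExpressions:
      - key: topology.kubernetes.io/zone
        values:
          - zone-a
```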
StorageClass Pattern
Most OpenShift HCI clusters should not expose one default storage class for every workload. Use a small number of understandable classes:
| StorageClass intent | Typical workloads | Policy focus |
|---|---|---|
| latency-critical | Databases, metadata stores, latency-sensitive services | Low latency, isolation, predictable queue depth. |
| vm-disk | OpenShift Virtualization and KubeVirt VM disks | VM disk behavior, snapshot/clone workflows, migration planning. |
| general-stateful | Message brokers, platform services, normal PVCs | Balanced performance and capacity efficiency. |
| capacity-efficient | Less latency-sensitive data services | Efficiency, thin provisioning, and lower-cost growth. |
| migration-staging | Temporary migration waves, clones, validation environments | Clear lifecycle owner and cleanup policy. |
Each class should have an owner, a purpose, and measurable behavior. If the storage class list grows without governance, the platform becomes harder to support. If the list is too small, teams lose the ability to protect critical workloads.
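To show how a class maps to a concrete workload, the sketch below provisions a blank VM disk through Containerized Data Importer (CDI), which OpenShift Virtualization uses for VM disks. The names and size are arbitrary examples; the class name matches the vm-disk intent above.

```yaml
# Sketch: a blank VM disk provisioned through CDI against the vm-disk class.
# Names and size are arbitrary examples.
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: app-vm-root-disk
spec:
  source:
    blank: {}
  storage:
    storageClassName: vm-disk
    accessModes:
      - ReadWriteMany   # often needed for live migration, if the driver supports it
    resources:
      requests:
        storage: 60Gi
```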
Example StorageClass Review Checklist
Before exposing a class broadly, answer these questions:
- What workloads should use this class?
- What workloads should not use it?
- What latency and throughput behavior should users expect?
- Does it support snapshots and clones?
- What is the failure-domain policy?
- What happens during node drain or node failure?
- Who owns capacity and performance alerts?
- How will stale PVCs and migration staging volumes be cleaned up?
Answering these questions matters more than the exact class names. Teams can change naming conventions later, but unclear policy usually turns into production confusion.
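For the snapshot question in particular, support in practice means a VolumeSnapshotClass exists for the driver and a snapshot round-trip has been tested. A minimal sketch, with a placeholder driver name and an assumed existing PVC:

```yaml
# Sketch: snapshot class plus a snapshot of an existing PVC.
# The driver name is a placeholder for the CSI driver actually deployed.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: block-snapshots
driver: csi.example-block.io
deletionPolicy: Delete
---
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: orders-db-pre-upgrade
spec:
  volumeSnapshotClassName: block-snapshots
  source:
    persistentVolumeClaimName: orders-db-data   # assumed existing PVC name
```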
Operational Readiness Gate
Use this gate before production onboarding:
| Gate | Pass condition |
|---|---|
| Topology | Storage placement aligns with node, rack, and zone failure domains. |
| StorageClasses | Each class has a documented workload intent and owner. |
| VM disk behavior | VM disks can be provisioned, snapshotted, cloned, and recovered through the expected workflow. |
| Database behavior | p95 and p99 latency are tested under realistic write pressure. |
| Maintenance | Node drain and upgrade behavior are tested while representative workloads run. |
| Recovery | Backup, restore, and rollback paths are tested, not only documented. |
| Observability | Alerts exist for latency, saturation, capacity, failed volume operations, and rebuild pressure. |
The readiness gate should be owned by the platform team and the storage owner together. If only one side signs off, the design usually misses either operational fit or storage correctness.
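For the observability row of the gate, alert rules can live next to the platform as code. The sketch below assumes the Prometheus Operator's PrometheusRule resource; the metric name is a placeholder, since latency metrics differ per storage driver, and the 5 ms threshold is an arbitrary example.

```yaml
# Sketch: a p99 write-latency alert as a PrometheusRule.
# The metric name is a placeholder; substitute whatever your storage
# driver or exporter actually publishes. The threshold is an example.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: storage-latency
spec:
  groups:
    - name: storage.latency
      rules:
        - alert: StorageWriteP99High
          expr: |
            histogram_quantile(0.99,
              sum(rate(storage_write_latency_seconds_bucket[5m])) by (le, storageclass)
            ) > 0.005
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "p99 write latency above 5ms for {{ $labels.storageclass }}"
```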
Where simplyblock Fits
Simplyblock fits when OpenShift HCI is part of a broader private-cloud or VMware-exit program and the storage layer needs to support serious stateful workloads. The platform provides Kubernetes-native block storage with NVMe/TCP and NVMe/RoCE support, CSI integration, snapshots, clones, and flexible deployment models.
For OpenShift teams, the useful distinction is not “HCI or not HCI.” The better question is whether the storage layer can support the first topology and still evolve later.
Planning OpenShift HCI storage?
Talk to a storage architect about the right starting topology for VM disks, databases, and OpenShift stateful workloads.
Questions and Answers
Is OpenShift HCI the same as vSAN on OpenShift?
No. OpenShift HCI can deliver a similar operational outcome, where compute and storage run close together, but it should still use OpenShift-native workflows, CSI, and a storage layer that is not tied to VMware.
Should OpenShift HCI use one storage class?
Usually no. One default class is easy early on but weak for production. VM disks, databases, and general stateful services often need different latency, protection, and recovery policies.
Can OpenShift HCI evolve into disaggregated storage later?
Yes, if the storage platform supports more than one deployment model. That is one reason simplyblock positions HCI, hybrid, and disaggregated deployment as architecture choices rather than separate products.
How does this relate to OpenShift Data Foundation?
OpenShift Data Foundation is a common reference point for OpenShift storage. Teams still evaluate alternatives when they need different performance, operating-model, or VMware-exit characteristics.
What should teams validate before production?
Validate p99 latency, node drain behavior, failure domains, recovery workflows, and whether the storage class model matches real workload classes.