Stretched Clusters

A stretched kMetal under cluster spans more than one site so tenant control planes stay available when a zone, data center, or region goes offline. Three deployment patterns are supported.

Pattern A — 3-zone distributed under cluster

The three under-cluster CP nodes sit in three separate failure zones, one node per zone (rooms, racks, buildings, or short-haul DC pairs). Each zone also carries worker nodes for tenant VMs.

Zone 1            Zone 2            Zone 3
[ CP-1, W-1 ]   [ CP-2, W-2 ]   [ CP-3, W-3 ]
       \              |              /
        \             |             /
         '------ ≤ 10 ms RTT ------'
  • Sites needed: 3 on-prem zones.
  • RTT budget: ≤ 10 ms between zones (etcd replicates synchronously through Raft, so every write waits on a cross-zone round trip).
  • Failure tolerance: any single zone can fail; the two remaining CP nodes preserve quorum.
  • Where it fits: campus-scale, multi-room data centers, co-located DC pairs with low-latency inter-site links.
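
The failure-tolerance claim above follows from majority quorum arithmetic. The sketch below is illustrative only and not part of kMetal: the function names and the `zone-N` labels are hypothetical, and it assumes every CP node is a voting etcd member.

```python
from typing import Dict, List


def quorum_size(members: int) -> int:
    """Smallest majority of a cluster with `members` voting members."""
    return members // 2 + 1


def survives_zone_loss(placement: Dict[str, List[str]], lost_zone: str) -> bool:
    """True if the members outside `lost_zone` still form a majority."""
    total = sum(len(nodes) for nodes in placement.values())
    remaining = total - len(placement.get(lost_zone, []))
    return remaining >= quorum_size(total)


# Pattern A placement: one CP node per zone (hypothetical zone labels).
pattern_a = {"zone-1": ["CP-1"], "zone-2": ["CP-2"], "zone-3": ["CP-3"]}

for zone in pattern_a:
    print(zone, "lost -> quorum kept:", survives_zone_loss(pattern_a, zone))
```

With one CP node per zone, losing any zone leaves two of three members, which meets the quorum of two. Placing two CP nodes in the same zone would break that property for that zone.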

Pattern B — 2 data zones + cloud arbiter

Two on-prem zones carry CP and worker nodes; a cloud VM runs the third CP node as the etcd arbiter.

Zone 1 (on-prem)    Zone 2 (on-prem)    Cloud arbiter
[ CP-1, W-1 ]       [ CP-2, W-2 ]       [ CP-3, no workers ]
  • Sites needed: 2 on-prem + 1 cloud VM.
  • RTT budget: ≤ 10 ms between the two on-prem zones; cloud arbiter tolerates higher RTT (quorum traffic only).
  • Failure tolerance: any one of the three sites (either on-prem zone or the cloud arbiter) can fail without losing quorum.
  • Where it fits: organizations with two real data centers but not three. The cloud node hosts no tenant VMs.
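
One way to sanity-check a planned Pattern B layout is to compare measured round-trip times against a per-link budget. This is a minimal sketch under stated assumptions: the 10 ms figure comes from the RTT budget above, but the arbiter budget is a caller-supplied placeholder (this page gives no number for it), and the function name, site labels, and sample measurements are hypothetical.

```python
from typing import Dict, List, Tuple

ON_PREM_BUDGET_MS = 10.0  # on-prem zone to on-prem zone, from the RTT budget above


def links_over_budget(
    rtt_ms: Dict[Tuple[str, str], float],
    arbiter: str,
    arbiter_budget_ms: float,
) -> List[Tuple[str, str]]:
    """Return the links whose measured RTT exceeds their budget.

    Links that touch the arbiter use `arbiter_budget_ms` (an assumption the
    caller must choose); all other links use the 10 ms on-prem budget.
    """
    over = []
    for (a, b), rtt in rtt_ms.items():
        budget = arbiter_budget_ms if arbiter in (a, b) else ON_PREM_BUDGET_MS
        if rtt > budget:
            over.append((a, b))
    return over


# Hypothetical measurements in milliseconds.
measured = {
    ("zone-1", "zone-2"): 4.0,
    ("zone-1", "arbiter"): 35.0,
    ("zone-2", "arbiter"): 60.0,
}
# 50.0 is a placeholder budget for the arbiter links, not a kMetal figure.
print(links_over_budget(measured, arbiter="arbiter", arbiter_budget_ms=50.0))
```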

Pattern C — Under-cluster CP on cloud, workers on-prem

The under-cluster CP nodes run in a cloud region. Compute hosts running tenant VMs stay on-prem. Workers connect outbound to the cloud control plane via Konnectivity.

        Cloud region                          On-prem
[ CP-1, CP-2, CP-3 ]  <-- konnectivity tunnel --  [ W-1, W-2, W-3 ... ]
  • Sites needed: 1 on-prem site + 1 cloud account.
  • RTT budget: WAN-tolerant (worker → CP only; no synchronous Raft across the WAN).
  • Firewall: no inbound rules are needed on the on-prem side, because the worker initiates the tunnel.
  • Where it fits: edge, branch, or factory deployments where the on-prem footprint is small and the management plane runs centrally in a cloud account.
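
Because the worker initiates the connection, a quick pre-flight check from an on-prem host is to confirm that an outbound TCP connection to the cloud control plane succeeds. The sketch below is a generic reachability probe, not Konnectivity itself; the function name is hypothetical, and the host and port in the usage comment are placeholders to replace with your own endpoint.

```python
import socket
import time


def outbound_connect_ms(host: str, port: int, timeout_s: float = 5.0) -> float:
    """Open an outbound TCP connection and return the connect time in ms.

    Raises OSError (including timeouts) if the endpoint cannot be reached,
    which usually points at an egress firewall or routing problem.
    """
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout_s):
        pass  # connection established; close it immediately
    return (time.monotonic() - start) * 1000.0


# Usage (placeholder endpoint, run from an on-prem worker host):
#   print(outbound_connect_ms("cp.example.com", 443))
```

The connect time approximates one worker-to-CP round trip, so the same probe gives a rough picture of the WAN latency the tunnel will see.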

Picking a pattern

Concern                         Pattern A       Pattern B                      Pattern C
Sites                           3               2 + cloud VM                   1 + cloud
RTT requirement (worst link)    ≤ 10 ms         ≤ 10 ms between on-prem zones  WAN-tolerant
Tenant data plane crosses WAN?  No              No                             No
Fit                             Metro / campus  Two-DC orgs                    Edge / branch
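
The table can be read as a simple decision rule. The sketch below encodes one possible reading of it; the function name, its parameters, and the order in which patterns are preferred are assumptions made for illustration, not an official selection procedure.

```python
from typing import Optional

ON_PREM_RTT_BUDGET_MS = 10.0  # from the "RTT requirement" row


def suggest_pattern(
    on_prem_zones: int,
    worst_on_prem_rtt_ms: Optional[float],
    has_cloud: bool,
) -> Optional[str]:
    """Suggest "A", "B", or "C", or None if no pattern fits.

    `worst_on_prem_rtt_ms` is the slowest link between on-prem zones,
    or None when there is only one on-prem site.
    """
    low_latency = (
        worst_on_prem_rtt_ms is not None
        and worst_on_prem_rtt_ms <= ON_PREM_RTT_BUDGET_MS
    )
    if on_prem_zones >= 3 and low_latency:
        return "A"  # three zones within the RTT budget
    if on_prem_zones >= 2 and low_latency and has_cloud:
        return "B"  # two zones plus a cloud arbiter
    if on_prem_zones >= 1 and has_cloud:
        return "C"  # control plane in the cloud, workers on-prem
    return None


print(suggest_pattern(3, 4.0, False))
print(suggest_pattern(2, 8.0, True))
print(suggest_pattern(1, None, True))
```

Real selection also depends on factors the table does not capture, such as data-residency rules or the cost of the inter-site links, so treat the result as a starting point.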

In all three patterns, the under cluster's etcd is synchronously replicated across the participating zones, so committed writes to tenant control planes survive site loss with zero RPO. RTO is bounded by Raft re-election plus pod rescheduling.