Platform Upgrades

Procedures for upgrading kMetal platform components and the under cluster.

Upgrade Strategy

kMetal upgrades happen at two layers, decoupled from each other:

  1. Under cluster Kubernetes version — managed by your under-cluster installer (kubeadm or similar). Driven through standard Kubernetes upgrade procedures.
  2. kMetal platform components — managed by the kmetal Helm umbrella chart. A chart-version bump pins the tested combination of every sub-chart (Kamaji, KubeVirt, Kube-OVN, MetalLB, cert-manager, CAPI providers, …) — you upgrade them together as one atomic chart release.

Tenant cluster upgrades are independent of both layers (see User Guide: Upgrade Clusters).

Under Cluster Upgrade

Pre-upgrade checks

kubectl get nodes
kubectl get pods -A | grep -v Running | grep -v Completed

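# Take an etcd snapshot first (kubeadm's etcd requires its client certificates; paths are the kubeadm defaults)
kubectl -n kube-system exec etcd-<node> -- etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  snapshot save /tmp/etcd-backup.db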
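
The grep filter above also hides pods that report Running while some of their containers are not ready (for example 0/1 Running). A minimal sketch of a stricter filter follows; the function name unhealthy_pods is hypothetical, and it assumes the default column order of kubectl get pods -A --no-headers (NAMESPACE, NAME, READY, STATUS, ...).

```shell
# Hypothetical helper: reads `kubectl get pods -A --no-headers` output on stdin
# and prints pods that are neither Completed nor fully ready.
# Assumes the default columns: NAMESPACE NAME READY STATUS RESTARTS AGE.
unhealthy_pods() {
  awk '{
    if ($4 == "Completed") next          # finished Jobs are fine
    split($3, ready, "/")                # READY column, e.g. 1/2
    if ($4 != "Running" || ready[1] != ready[2])
      print $1 "/" $2, $3, $4
  }'
}

# Usage against a live cluster:
#   kubectl get pods -A --no-headers | unhealthy_pods
```

An empty result suggests every pod is either Completed or Running with all containers ready.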

Kubernetes version upgrade (kubeadm)

Upgrade nodes one at a time, control-plane nodes first, then workers:

Control-plane node

kubectl drain <cp-node> --ignore-daemonsets

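# On the node (apt package versions have no leading "v", e.g. 1.34.7):
sudo apt-get install -y kubeadm=<target-version>-*   # upgrade kubeadm itself first
sudo kubeadm upgrade plan
sudo kubeadm upgrade apply <target-version>      # e.g. v1.34.7; first control-plane node only, the others run: sudo kubeadm upgrade node
sudo apt-get install -y kubelet=<target-version>-* kubectl=<target-version>-*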
sudo systemctl daemon-reload && sudo systemctl restart kubelet

kubectl uncordon <cp-node>

Worker node

kubectl drain <worker-node> --ignore-daemonsets --delete-emptydir-data

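# On the node (apt package versions have no leading "v", e.g. 1.34.7):
sudo apt-get install -y kubeadm=<target-version>-*   # upgrade kubeadm itself first
sudo kubeadm upgrade node
sudo apt-get install -y kubelet=<target-version>-* kubectl=<target-version>-*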
sudo systemctl daemon-reload && sudo systemctl restart kubelet

kubectl uncordon <worker-node>

Use the kMetal kubeVersion constraint in the chart's Chart.yaml to confirm the target Kubernetes version is supported before upgrading.
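
One way to read that constraint without unpacking the chart is to pipe helm show chart through a small filter. This is a sketch only: the function name chart_kube_version is hypothetical, and it assumes kubeVersion appears as a top-level, single-line key in Chart.yaml.

```shell
# Hypothetical helper: reads a Chart.yaml on stdin and prints the value of its
# top-level kubeVersion key with any surrounding quotes removed.
chart_kube_version() {
  sed -n 's/^kubeVersion:[[:space:]]*//p' | tr -d "\"'"
}

# Usage (prints the semver constraint, which you then compare by hand
# against the target Kubernetes version):
#   helm show chart oci://ghcr.io/clastix/oci/kmetal --version <new-chart-version> | chart_kube_version
```

The helper only extracts the constraint string; it does not evaluate whether a given Kubernetes version satisfies it.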

Platform Component Upgrades

The umbrella chart pins each component version, so bumping components means upgrading the release to a new chart version that carries the tested combination of all sub-chart versions:

helm upgrade kmetal oci://ghcr.io/clastix/oci/kmetal \
  --version <new-chart-version> \
  --namespace kmetal-flux \
  --values kmetal-values.yaml \
  --wait --timeout=20m

helm history kmetal -n kmetal-flux

Before any production upgrade, diff the rendered output:

helm template kmetal oci://ghcr.io/clastix/oci/kmetal --version <new-chart-version> \
  --namespace kmetal-flux --no-hooks \
  --values kmetal-values.yaml > rendered-new.yaml
helm get manifest kmetal -n kmetal-flux > rendered-current.yaml
diff -u rendered-current.yaml rendered-new.yaml
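
A full manifest diff can be long. As a rough first pass, the sketch below lists only the container images that differ between the two rendered files. The function names images_in and image_changes are hypothetical, and the matching is a plain text scan for image: lines, not a YAML parse, so treat the result as a hint rather than a complete inventory.

```shell
# Hypothetical helper: prints the sorted, de-duplicated values of every
# "image:" line in a rendered manifest (plain text match, not a YAML parser).
images_in() {
  sed -n -E 's/^[[:space:]]*(- )?image:[[:space:]]*//p' "$1" | tr -d "\"'" | LC_ALL=C sort -u
}

# Hypothetical helper: prints "- image" for images found only in the first
# file and "+ image" for images found only in the second file.
image_changes() {
  old_list=$(mktemp) && new_list=$(mktemp) || return 1
  images_in "$1" > "$old_list"
  images_in "$2" > "$new_list"
  LC_ALL=C comm -23 "$old_list" "$new_list" | sed 's/^/- /'
  LC_ALL=C comm -13 "$old_list" "$new_list" | sed 's/^/+ /'
  rm -f "$old_list" "$new_list"
}

# Usage:
#   image_changes rendered-current.yaml rendered-new.yaml
```

No output means the two files reference the same set of images, although other fields may still differ.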

If the upgrade leaves bad state, roll back:

helm rollback kmetal <previous-revision> -n kmetal-flux
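
To pick a value for <previous-revision>, the sketch below scans the table printed by helm history and returns the most recent revision before the current one whose status is deployed or superseded. The function name previous_good_revision is hypothetical, and it assumes Helm's default table output with a header row; a superseded revision is not guaranteed to have been healthy, so confirm the choice against the history before rolling back.

```shell
# Hypothetical helper: reads `helm history <release>` table output on stdin and
# prints the newest revision, excluding the last row (the current revision),
# whose STATUS column is "deployed" or "superseded".
previous_good_revision() {
  awk 'NR > 1 && NF { rows[++n] = $0 }     # skip the header and blank lines
       END {
         for (k = n - 1; k >= 1; k--) {    # walk backwards, skipping the newest row
           m = split(rows[k], f)
           for (i = 2; i <= m; i++)
             if (f[i] == "deployed" || f[i] == "superseded") { print f[1]; exit }
         }
       }'
}

# Usage:
#   helm history kmetal -n kmetal-flux | previous_good_revision
```

The function prints nothing when the history holds a single revision, in which case there is nothing to roll back to.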

t.b.d.: Per-component independent upgrade paths (overriding a single sub-chart version while keeping the rest pinned) are not yet documented in this section.

Infrastructure Provider Upgrades

t.b.d.

Tenant Cluster Upgrades

User Guide Content

Tenant cluster upgrade procedures (control plane version bump, worker node rollout, verification) are documented in the User Guide: Upgrade Clusters.

This section covers only platform-level upgrades.

Upgrade Verification

Post-upgrade checks

# Cluster version
kubectl version
kubectl get nodes -o wide

# Control-plane health (componentstatuses is deprecated)
kubectl get --raw='/readyz?verbose'

# Chart release
helm status kmetal -n kmetal-flux
helm history kmetal -n kmetal-flux

# Component pods healthy
kubectl get pods -A | grep -v Running | grep -v Completed

# Tenant control planes still healthy
kubectl get tenantcontrolplanes -A
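
After an under-cluster upgrade it helps to confirm that every node is Ready and reports the target kubelet version. The sketch below filters kubectl get nodes output for nodes that do not; the function name nodes_off_target is hypothetical, and it assumes the default column order (NAME, STATUS, ROLES, AGE, VERSION).

```shell
# Hypothetical helper: reads `kubectl get nodes --no-headers` output on stdin
# and prints nodes that are not plain "Ready" or whose VERSION column differs
# from the target version passed as the first argument (e.g. v1.34.7).
nodes_off_target() {
  awk -v want="$1" '$2 != "Ready" || $5 != want { print $1, $2, $5 }'
}

# Usage:
#   kubectl get nodes --no-headers | nodes_off_target <target-version>
```

A node that is still cordoned shows up with status Ready,SchedulingDisabled, which is a reminder to run kubectl uncordon on it.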

Rollback

# Roll back the chart release
helm rollback kmetal <previous-revision> -n kmetal-flux

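# If the under-cluster Kubernetes upgrade failed, restore from the etcd snapshot.
# Note: snapshot restore only writes a new data directory; etcd must then be
# restarted against that directory (see Disaster Recovery).
kubectl -n kube-system exec etcd-<node> -- \
  etcdctl snapshot restore /tmp/etcd-backup.db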

t.b.d.: The full disaster-recovery flow after a failed upgrade is not yet documented in this section; see Disaster Recovery for the broader procedure.