Component Best Practices

Operational guidance for configuring kMetal platform components in production.

Start with chart defaults

The umbrella chart ships defaults tested against the reference deployments. Override only what your environment actually requires (NIC names, IP pool ranges, node-path mappings, resource sizing). Leave the rest at defaults until you have a concrete reason to change it.

Per-environment overlays

Maintain one values overlay per environment (lab, staging, production). Keep the overlays small — they should contain only the keys that legitimately differ between environments. The chart's own values.yaml provides the canonical defaults.

helm install kmetal oci://ghcr.io/clastix/oci/kmetal \
  --namespace kmetal-flux \
  --values kmetal-values-prod.yaml
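
As an illustration of how small an overlay can be, the sketch below writes a production overlay containing a single key. The key path network.kubeOvn.tunnelInterface comes from the component notes on this page; the function name write_prod_overlay and the NIC name eth1 are hypothetical placeholders, and the layout of metallb.pools is deliberately left out because it must follow the chart's own values.yaml.

```shell
# Write a minimal production overlay (hypothetical content) and print its path.
write_prod_overlay() {
  local out="${1:-kmetal-values-prod.yaml}"
  cat > "$out" <<'EOF'
# Only keys that differ from the chart defaults belong here.
network:
  kubeOvn:
    tunnelInterface: eth1   # placeholder NIC name, replace per environment
# metallb.pools: add the reserved range here, using the schema from the chart's values.yaml
EOF
  echo "$out"
}
```

Helm accepts several --values flags and gives priority to the right-most file, so a shared base file can be passed first and the environment overlay last.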

Test in non-production first

Always validate values changes in a non-production environment before applying them in production.

# Dry-run render
helm template kmetal oci://ghcr.io/clastix/oci/kmetal \
  --namespace kmetal-flux \
  --values kmetal-values.yaml > rendered.yaml

# Diff against the current release (requires the helm-diff plugin)
helm diff upgrade kmetal oci://ghcr.io/clastix/oci/kmetal \
  --namespace kmetal-flux \
  --values kmetal-values.yaml
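
A rendered manifest can also be checked against a live API server without persisting anything. The sketch below is one possible approach, not part of the chart: it assumes kubectl's current context points at the non-production cluster and that any CRDs the chart relies on are already installed there. The function name validate_rendered is hypothetical.

```shell
# Submit the rendered manifest as a server-side dry run: the API server
# validates and admits the objects but does not store them.
validate_rendered() {
  local manifest="${1:-rendered.yaml}"
  if [ ! -s "$manifest" ]; then
    echo "missing or empty manifest: $manifest" >&2
    return 1
  fi
  kubectl apply --dry-run=server -f "$manifest"
}
```

Typical use is to run validate_rendered rendered.yaml right after the helm template step and stop if it returns non-zero.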

High availability

For production:

  • Run at least 3 control-plane nodes on the under cluster.
  • Keep Kamaji at replicas: 2 or more and pin it to control-plane nodes (this matches the chart's default behavior).
  • Place tenant control-plane VIPs (MetalLB) on a dedicated address pool with enough headroom for your tenant count.
  • Spread workers across failure domains for tenant compute resilience.
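
The first item in the list above can be checked before a rollout. This is a minimal sketch that assumes the under cluster labels its control-plane nodes with the standard node-role.kubernetes.io/control-plane label; the function names are hypothetical.

```shell
# Count under-cluster nodes that carry the control-plane role label.
count_control_plane_nodes() {
  kubectl get nodes -l node-role.kubernetes.io/control-plane -o name \
    | wc -l | tr -d ' '
}

# Fail when fewer than the required number of nodes (default 3) is found.
check_control_plane_ha() {
  local required="${1:-3}" found
  found="$(count_control_plane_nodes)"
  if [ "$found" -lt "$required" ]; then
    echo "FAIL: $found control-plane node(s), need at least $required" >&2
    return 1
  fi
  echo "OK: $found control-plane node(s)"
}
```

Calling check_control_plane_ha with no argument applies the three-node minimum; pass a different number to raise the bar.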

Resource limits

Set requests and limits for every component you customize. Without them, components compete for under-cluster resources with no upper bound.

Reasonable starting points (tune from observed usage):

Component            Requests (CPU / memory)   Limits (CPU / memory)
Kamaji               1 / 512Mi                 2 / 1Gi
cert-manager         100m / 128Mi              200m / 256Mi
MetalLB controller   50m / 50Mi                100m / 100Mi

Monitor real usage with kubectl top and adjust.
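
To compare a CPU figure reported by kubectl top with a configured limit, both values first need a common unit. The helper below is a rough sketch: it handles only whole cores ("2") and millicore values ("100m"), and the 80% threshold is an arbitrary assumption rather than a kMetal recommendation. Both function names are hypothetical.

```shell
# Convert a CPU quantity to millicores. Supports whole cores and the "m" suffix only.
cpu_to_millicores() {
  case "$1" in
    *m) echo "${1%m}" ;;
    *)  echo "$(( $1 * 1000 ))" ;;
  esac
}

# Succeed when usage is at or above pct percent (default 80) of the limit.
cpu_near_limit() {
  local usage limit pct="${3:-80}"
  usage="$(cpu_to_millicores "$1")"
  limit="$(cpu_to_millicores "$2")"
  [ $(( usage * 100 )) -ge $(( limit * pct )) ]
}
```

For example, cpu_near_limit 170m 200m succeeds, which would suggest that a cert-manager pod using 170m against the 200m starting limit deserves a higher limit.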

Monitor after every change

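# Confirm the new revision is recorded and check its status
helm history kmetal -n kmetal-flux
# List pods that are neither Running nor Completed
kubectl get pods -A | grep -v Running | grep -v Completed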

If a release leaves pods in CrashLoopBackOff or Pending, roll back:

helm rollback kmetal <previous-revision> -n kmetal-flux
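
The check and the rollback can be combined into a simple gate. This is a sketch under stated assumptions: it inspects pods cluster-wide, so unrelated failing workloads will also trigger it, and it should run only after pods have had time to settle. It relies on helm rollback defaulting to the previous release when no revision is given. The function names are hypothetical.

```shell
# Print pods whose STATUS column (4th field) is neither Running nor Completed.
unhealthy_pods() {
  kubectl get pods -A --no-headers \
    | awk '$4 != "Running" && $4 != "Completed"'
}

# Roll the release back to its previous revision if any pod is unhealthy.
rollback_if_unhealthy() {
  local release="${1:-kmetal}" namespace="${2:-kmetal-flux}" bad
  bad="$(unhealthy_pods)"
  if [ -n "$bad" ]; then
    echo "Unhealthy pods detected:" >&2
    echo "$bad" >&2
    # No revision argument: helm rolls back to the previous release.
    helm rollback "$release" -n "$namespace"
    return 1
  fi
  echo "All pods Running or Completed"
}
```

A non-zero return signals that a rollback was attempted, so a pipeline can treat it as a failed deployment.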

Component-specific notes

  • Kube-OVN: network.kubeOvn.tunnelInterface is mandatory and per-deployment. The chart can't guess your NIC name.
  • MetalLB: metallb.pools must point at an IP range your network team has reserved for the under cluster's external traffic.
  • local-path-provisioner: acceptable for lab use; replace with a vendor CSI driver for production storage.
  • KubeVirt: kubevirt.version is pinned to a vendored upstream manifest. Don't change it unless you also re-vendor.