Component Best Practices¶
Operational guidance for configuring kMetal platform components in production.
Start with chart defaults¶
The umbrella chart ships defaults tested against the reference deployments. Override only what your environment actually requires (NIC names, IP pool ranges, node-path mappings, resource sizing). Leave the rest at defaults until you have a concrete reason to change it.
Per-environment overlays¶
Maintain one values overlay per environment (lab, staging, production). Keep the overlays small — they should contain only the keys that legitimately differ between environments. The chart's own values.yaml provides the canonical defaults.
```shell
helm install kmetal oci://ghcr.io/clastix/oci/kmetal \
  --namespace kmetal-flux \
  --values kmetal-values-prod.yaml
```
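As a rough illustration of how small an overlay can be, the sketch below writes a production overlay containing a single key. `network.kubeOvn.tunnelInterface` is the key named under Component-specific notes; the NIC name `eth1` is a placeholder assumption, not a recommended value.

```shell
# Write a minimal production overlay; every key not listed here
# falls back to the chart's own values.yaml.
cat > kmetal-values-prod.yaml <<'EOF'
network:
  kubeOvn:
    tunnelInterface: eth1   # placeholder: use the NIC name on your nodes
EOF
```

Keeping the overlay this narrow makes it easy to see at a glance what differs between environments.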
Test in non-production first¶
Always validate values changes in a non-production environment before applying them in production.
```shell
# Dry-run render
helm template kmetal oci://ghcr.io/clastix/oci/kmetal \
  --namespace kmetal-flux \
  --values kmetal-values.yaml > rendered.yaml

# Diff against the current release (requires the helm-diff plugin)
helm diff upgrade kmetal oci://ghcr.io/clastix/oci/kmetal \
  --namespace kmetal-flux \
  --values kmetal-values.yaml
```
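Where the helm-diff plugin is not available, a plain `diff` of two renders gives a similar preview. The sketch below assumes you saved one render before editing the values file and one after; the file names and the `render_changed` helper are illustrative, not part of the chart.

```shell
# Compare two rendered manifests and save a unified diff.
# Prints a one-line summary; the diff itself lands in render.diff.
render_changed() {
  before="$1"
  after="$2"
  diff -u "$before" "$after" > render.diff
  case $? in
    0) echo "no changes" ;;
    1) echo "changes written to render.diff" ;;
    *) echo "diff failed" >&2; return 2 ;;
  esac
}

# Example (files produced by two `helm template ... > file` runs):
# render_changed rendered-before.yaml rendered-after.yaml
```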
High availability¶
For production:
- Run at least 3 control-plane nodes on the under cluster.
- Keep Kamaji at `replicas: 2+` and pin it to control-plane nodes (matches chart default behavior).
- Place tenant control-plane VIPs (MetalLB) on a dedicated address pool with enough headroom for your tenant count.
- Spread workers across failure domains for tenant compute resilience.
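The first item in the list above can be checked mechanically. This is a minimal sketch: the `check_control_plane_count` helper is hypothetical, and it assumes control-plane nodes carry the standard `node-role.kubernetes.io/control-plane` label.

```shell
# Reads `kubectl get nodes --no-headers` output on stdin (one node per line)
# and fails when fewer than the required number of nodes are listed.
check_control_plane_count() {
  min="${1:-3}"
  count=$(grep -c . || true)   # count non-empty lines
  if [ "$count" -ge "$min" ]; then
    echo "ok: $count control-plane nodes"
  else
    echo "too few: $count control-plane nodes (want >= $min)"
    return 1
  fi
}

# Example, run against the under cluster:
# kubectl get nodes -l node-role.kubernetes.io/control-plane --no-headers \
#   | check_control_plane_count 3
```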
Resource limits¶
Set requests and limits for every component you customize. Otherwise components compete unbounded for under-cluster resources.
Reasonable starting points (tune from observed usage):
| Component | Requests (CPU / memory) | Limits (CPU / memory) |
|---|---|---|
| Kamaji | 1 / 512Mi | 2 / 1Gi |
| cert-manager | 100m / 128Mi | 200m / 256Mi |
| MetalLB controller | 50m / 50Mi | 100m / 100Mi |
Monitor real usage with `kubectl top` (which needs the Metrics API, typically provided by metrics-server) and adjust the values accordingly.
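To compare observed usage against a chosen limit, the output of `kubectl top pods` can be filtered with a few lines of awk. This is only a sketch: the `flag_cpu_over` helper is hypothetical, and it assumes the default three-column output (name, CPU, memory) of a single-namespace query.

```shell
# Reads `kubectl top pods --no-headers` output on stdin and prints the
# pods whose CPU usage exceeds the given threshold in millicores.
flag_cpu_over() {
  limit_m="$1"
  awk -v limit="$limit_m" '{
    cpu = $2
    if (cpu ~ /m$/) { sub(/m$/, "", cpu) }   # "250m" -> 250 millicores
    else            { cpu = cpu * 1000 }     # "2"    -> 2000 millicores
    if (cpu + 0 > limit + 0) print $1, $2
  }'
}

# Example: flag pods using more than 1.5 CPU
# kubectl top pods -n <namespace> --no-headers | flag_cpu_over 1500
```

Pods that are flagged repeatedly are candidates for a higher limit; pods that never come close may be over-provisioned.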
Monitor after every change¶
After each release, check pod status. If a release leaves pods in `CrashLoopBackOff` or `Pending`, roll back to the last working revision.
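A minimal sketch of the rollback, assuming the release name `kmetal` and namespace `kmetal-flux` from the install example, and assuming the release is managed directly with the Helm CLI rather than reconciled by a GitOps controller. The `rollback_release` wrapper is illustrative.

```shell
# Show recent revisions, then roll back to the previous one.
# With no revision argument, `helm rollback` targets the previous release.
rollback_release() {
  release="${1:-kmetal}"
  namespace="${2:-kmetal-flux}"
  helm history "$release" --namespace "$namespace" --max 5
  helm rollback "$release" --namespace "$namespace"
}

# Example:
# rollback_release kmetal kmetal-flux
```

Reviewing `helm history` first confirms which revision you are returning to before anything changes.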
Component-specific notes¶
- Kube-OVN: `network.kubeOvn.tunnelInterface` is mandatory and per-deployment. The chart can't guess your NIC name.
- MetalLB: `metallb.pools` must point at an IP range your network team has reserved for the under cluster's external traffic.
- local-path-provisioner: Acceptable for lab use; replace with a vendor CSI driver for production storage.
- KubeVirt: `kubevirt.version` is pinned to a vendored upstream manifest. Don't change it unless you also re-vendor.
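Because the tunnel interface name must match the hosts, it can help to confirm the NIC exists on a node before setting it. The sketch below assumes Linux nodes exposing `/sys/class/net`; the `nic_exists` helper and the name `eth1` are illustrative.

```shell
# Return success if a network interface with this name exists on the
# current Linux host (run it on a node, not on your workstation).
nic_exists() {
  [ -n "$1" ] && [ -e "/sys/class/net/$1" ]
}

# Example:
# nic_exists eth1 && echo "eth1 present" || echo "eth1 missing"
```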
Related Documentation¶
- Component Configuration — the override pattern
- Platform Values — top-level values shape
- Helm Values Reference — full per-component schema