
Installation Troubleshooting

Common issues during kMetal installation.

Registry Authentication Failed

Image pulls from ghcr.io fail with an authentication error, and the affected pods show ErrImagePull or ImagePullBackOff.

# Recreate the registry pull secret in the chart's namespace
kubectl delete secret clastix-ghcr -n kmetal-flux --ignore-not-found
kubectl create secret docker-registry clastix-ghcr \
  --docker-server=ghcr.io \
  --docker-username=<username> \
  --docker-password=<token> \
  --namespace=kmetal-flux

# Re-run the install / upgrade so the secret is consumed
helm upgrade kmetal oci://ghcr.io/clastix/oci/kmetal \
  --namespace kmetal-flux \
  --values kmetal-values.yaml \
  --wait --timeout=15m
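
If pulls still fail after the upgrade, it can help to confirm that the recreated secret really holds an entry for ghcr.io. The following is a minimal sketch, not part of the chart: the helper name has_registry_auth is hypothetical, and it does a plain text match on the decoded secret rather than a full JSON parse.

```shell
# Hypothetical helper: reads a decoded .dockerconfigjson on stdin and
# succeeds if it contains a quoted entry for the given registry host.
has_registry_auth() {
  grep -F -q "\"$1\""
}

# Usage (requires cluster access):
#   kubectl get secret clastix-ghcr -n kmetal-flux \
#     -o jsonpath='{.data.\.dockerconfigjson}' | base64 --decode \
#     | has_registry_auth ghcr.io && echo "secret targets ghcr.io"
```

A match only shows that the secret points at the right registry; an expired or under-scoped token will still be rejected by ghcr.io.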

Helm Install Timeout

Installation exceeds Helm's default timeout of 5 minutes, often because component pods are still pulling images. Raise the timeout:

helm install kmetal oci://ghcr.io/clastix/oci/kmetal \
  --namespace kmetal-flux \
  --create-namespace \
  --values kmetal-values.yaml \
  --wait --timeout=20m

While the install is in progress, you can watch it from a second terminal:

kubectl get pods -A | grep -v Running | grep -v Completed
helm status kmetal -n kmetal-flux
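
The grep filter above hides pods that are Running but not yet Ready, and those are the pods that --wait is still blocked on. A small sketch of a stricter filter follows; the helper name not_ready_pods is hypothetical, and it assumes the default column order of kubectl get pods -A (NAMESPACE NAME READY STATUS RESTARTS AGE).

```shell
# Hypothetical helper: reads `kubectl get pods -A --no-headers` on stdin and
# prints namespace/name, READY and STATUS for every pod that is not fully
# ready. Completed pods (finished Jobs) are skipped.
not_ready_pods() {
  awk '{
    split($3, r, "/")   # READY column, e.g. "0/1"
    if ($4 != "Completed" && ($4 != "Running" || r[1] != r[2])) {
      print $1 "/" $2, $3, $4
    }
  }'
}

# Usage (requires cluster access):
#   kubectl get pods -A --no-headers | not_ready_pods
```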

Release Stuck in pending-install or pending-upgrade

A previous failed or interrupted attempt left the Helm release in a transient state. Inspect the history and roll back to the last good revision:

helm history kmetal -n kmetal-flux
helm rollback kmetal <last-good-revision> -n kmetal-flux
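
Picking the revision from the history table can be scripted. This is a sketch under stated assumptions: the helper name last_good_revision is hypothetical, and it treats any revision whose row contains the status deployed or superseded as "good". Check the DESCRIPTION column yourself before rolling back.

```shell
# Hypothetical helper: reads the table printed by `helm history` on stdin
# and prints the last listed revision whose row contains the status
# "deployed" or "superseded". Prints nothing if there is no such revision.
last_good_revision() {
  awk 'NR > 1 {
    for (i = 2; i <= NF; i++) {
      if ($i == "deployed" || $i == "superseded") { rev = $1; break }
    }
  }
  END { if (rev != "") print rev }'
}

# Usage (requires cluster access):
#   rev=$(helm history kmetal -n kmetal-flux | last_good_revision)
#   [ -n "$rev" ] && helm rollback kmetal "$rev" -n kmetal-flux
```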

If no good revision exists, uninstall and reinstall:

helm uninstall kmetal -n kmetal-flux
helm install kmetal oci://ghcr.io/clastix/oci/kmetal \
  --namespace kmetal-flux \
  --values kmetal-values.yaml \
  --wait --timeout=20m

Insufficient Resources

Pods stay Pending because no node has enough free CPU or memory to schedule them.

kubectl get pods -A | grep Pending
kubectl describe pod <pod> -n <namespace>     # look for Insufficient cpu/memory

kubectl top nodes                             # requires metrics-server
kubectl describe nodes | grep -A 5 'Allocated resources'

Fix: add nodes, or reduce resource requests in the chart values overlay.

PVC Stuck Pending

PersistentVolumeClaims stay Pending, often because the cluster has no default StorageClass.

kubectl get pvc -A
kubectl describe pvc <pvc> -n <namespace>

kubectl get storageclass        # exactly one should be marked (default)

If no StorageClass is marked as default, set one:

kubectl patch storageclass <name> \
  -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
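
The "exactly one default" check can be done mechanically. A minimal sketch follows; the helper name count_default_sc is hypothetical, and it relies on kubectl printing the "(default)" marker as a separate word after the class name.

```shell
# Hypothetical helper: reads `kubectl get storageclass --no-headers` on stdin
# and prints how many classes carry the "(default)" marker.
count_default_sc() {
  awk '$2 == "(default)" { n++ } END { print n + 0 }'
}

# Usage (requires cluster access):
#   kubectl get storageclass --no-headers | count_default_sc
```

A result of 0 calls for the patch above. If the result is greater than 1, patch the extra classes with the same annotation set to "false".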

MetalLB Not Assigning IPs

LoadBalancer services stuck pending.

kubectl get pods -n kmetal-metallb
kubectl logs -n kmetal-metallb -l app.kubernetes.io/component=speaker --tail=100

kubectl get ipaddresspool -n kmetal-metallb
kubectl get l2advertisement -n kmetal-metallb

The chart creates the IPAddressPool / L2Advertisement from metallb.pools and metallb.l2Advertisements in your values overlay. If those are empty, no IPs will be assigned. Set them in your overlay and run helm upgrade.
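
To see which services are affected, and to confirm they receive an address after the upgrade, the pending LoadBalancer services can be listed. This is a sketch only: the helper name pending_lb_services is hypothetical, and it assumes the default column order of kubectl get svc -A (NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE).

```shell
# Hypothetical helper: reads `kubectl get svc -A --no-headers` on stdin and
# prints namespace/name for LoadBalancer services without an external IP.
pending_lb_services() {
  awk '$3 == "LoadBalancer" && $5 == "<pending>" { print $1 "/" $2 }'
}

# Usage (requires cluster access):
#   kubectl get svc -A --no-headers | pending_lb_services
#   helm get values kmetal -n kmetal-flux   # check the metallb section of the overlay
```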

Kube-OVN Pods Not Ready

kubectl get pods -n kube-system -l app=ovs
kubectl get pods -n kube-system -l app=kube-ovn-controller
kubectl get pods -n kube-system -l app=kube-ovn-cni

The most common cause is network.kubeOvn.tunnelInterface being unset or pointing at a NIC that doesn't exist on every node. The chart cannot guess the NIC name, so the value must match a NIC present on every under-cluster node.
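
Whether a given NIC exists can be checked on each node from the output of ip -o link show. A minimal sketch, assuming shell access to the nodes; the helper name has_nic and the interface name eth1 in the usage line are hypothetical.

```shell
# Hypothetical helper: reads `ip -o link show` output on stdin and succeeds
# if an interface with the given name is listed. A peer suffix such as
# "@if2" is stripped before comparing.
has_nic() {
  awk -v nic="$1" -F': ' '
    { name = $2; sub(/@.*/, "", name); if (name == nic) found = 1 }
    END { exit !found }'
}

# Usage (run on every node; eth1 is a placeholder for your tunnelInterface):
#   ip -o link show | has_nic eth1 || echo "eth1 missing on $(hostname)"
```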

Collect Diagnostics

helm status kmetal -n kmetal-flux > helm-status.txt
helm history kmetal -n kmetal-flux > helm-history.txt

kubectl get pods -A > all-pods.txt
kubectl get events -A --sort-by='.lastTimestamp' > events.txt
kubectl top nodes > nodes.txt        # requires metrics-server
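
When one of these commands fails (for example kubectl top without metrics-server), it is useful to keep collecting the rest and record the error. The wrapper below is a sketch, not a kMetal tool; the helper name collect and the kmetal-diag directory are hypothetical.

```shell
# Hypothetical helper: runs a command, saves its stdout and stderr to the
# given file, and reports success or failure without aborting.
collect() {
  out=$1
  shift
  if "$@" > "$out" 2>&1; then
    echo "ok   $out"
  else
    echo "FAIL $out"
  fi
}

# Usage (requires cluster access):
#   mkdir -p kmetal-diag && cd kmetal-diag
#   collect helm-status.txt  helm status kmetal -n kmetal-flux
#   collect helm-history.txt helm history kmetal -n kmetal-flux
#   collect all-pods.txt     kubectl get pods -A
#   collect events.txt       kubectl get events -A --sort-by=.lastTimestamp
#   collect nodes.txt        kubectl top nodes
#   cd .. && tar czf kmetal-diag.tar.gz kmetal-diag
```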