Cluster Troubleshooting
Common issues with tenant cluster operations.
Cluster Creation Fails
Cluster stuck in provisioning phase.
# Check cluster and control plane
kubectl get cluster,tenantcontrolplane <name> -n <namespace>
kubectl describe tenantcontrolplane <name> -n <namespace>
# Check control plane pods
kubectl get pods -n kmetal-kamaji -l kamaji.clastix.io/name=<name>
kubectl logs -n kmetal-kamaji <control-plane-pod>
# Check events
kubectl get events -n <namespace> --sort-by='.lastTimestamp'
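On a busy namespace the event list can be long. A minimal sketch that narrows it to warnings, assuming the standard `type` field selector for events; the function name `warning_events` is hypothetical:

```shell
# Hypothetical helper: list only Warning events in a namespace, oldest first.
# Argument: <namespace>
warning_events() {
  kubectl get events -n "$1" --field-selector type=Warning --sort-by='.lastTimestamp'
}
```

Call it as `warning_events <namespace>`; the most recent warnings appear at the bottom of the output.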
Control Plane Not Ready
TenantControlPlane shows not ready.
# Check control plane status
kubectl get tenantcontrolplane <name> -n <namespace> -o yaml
# Check datastore
kubectl get pods -n kmetal-kamaji -l app.kubernetes.io/name=etcd
# Restart control plane pods (all pods matching the label are deleted and recreated)
kubectl delete pod -n kmetal-kamaji -l kamaji.clastix.io/name=<name>
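After a restart it helps to block until the replacement pods report Ready instead of polling by hand. A sketch under the assumption that the pods are recreated with the same label; the function name and the 180-second timeout are illustrative:

```shell
# Hypothetical helper: delete the control plane pods, then wait for the
# replacements to become Ready. Argument: <name> of the TenantControlPlane.
restart_control_plane() {
  name="$1"
  kubectl delete pod -n kmetal-kamaji -l "kamaji.clastix.io/name=${name}" &&
    kubectl wait pod -n kmetal-kamaji -l "kamaji.clastix.io/name=${name}" \
      --for=condition=Ready --timeout=180s
}
```

If `kubectl wait` times out, the pod logs and the datastore check above are the next places to look.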
Worker Nodes Not Joining
Machines created but nodes don't appear.
# Check machines
kubectl get machines -n <namespace>
kubectl describe machine <machine-name> -n <namespace>
# Check control plane endpoint (prints <host>:<port>)
kubectl get tenantcontrolplane <name> -n <namespace> -o jsonpath='{.status.controlPlaneEndpoint.host}:{.status.controlPlaneEndpoint.port}'
curl -k https://<host>:<port>/healthz
# Check infrastructure provider logs
kubectl logs -n kmetal-capi-providers -l control-plane=controller-manager --tail=100
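The endpoint query above prints a single `host:port` string, so the health probe needs the two parts separated. A small sketch with hypothetical helper names; the 5-second timeout is an arbitrary choice:

```shell
# Split a "host:port" string into its parts.
endpoint_host() { printf '%s\n' "${1%:*}"; }
endpoint_port() { printf '%s\n' "${1##*:}"; }

# Probe the API server health endpoint for a "host:port" string.
# -k skips TLS verification, as in the curl call above.
check_healthz() {
  curl -k --silent --max-time 5 \
    "https://$(endpoint_host "$1"):$(endpoint_port "$1")/healthz"
}
```

For example, `check_healthz 10.0.0.5:6443` (an illustrative address) should normally print `ok` when the API server is reachable and healthy.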
Kubeconfig Not Working
Cannot connect with downloaded kubeconfig.
# Re-download kubeconfig
kubectl get secret <cluster>-admin-kubeconfig -n <namespace> \
-o jsonpath='{.data.admin\.conf}' | base64 -d > cluster.kubeconfig
# Check endpoint and DNS (the query prints <host>:<port>; resolve the host part only)
kubectl get tenantcontrolplane <name> -n <namespace> -o jsonpath='{.status.controlPlaneEndpoint.host}:{.status.controlPlaneEndpoint.port}'
nslookup <host>
# Test connection
export KUBECONFIG=cluster.kubeconfig
kubectl cluster-info
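One possible cause is a kubeconfig whose server address no longer matches the control plane endpoint. The sketch below compares the two; both function names are hypothetical, and it assumes the kubeconfig holds a single cluster entry with an `https://` server URL:

```shell
# Print the API server URL stored in a kubeconfig file.
# Argument: path to the kubeconfig.
kubeconfig_server() {
  kubectl config view --kubeconfig="$1" -o jsonpath='{.clusters[0].cluster.server}'
}

# Succeed when a server URL (https://host:port) matches a host:port endpoint.
# Arguments: <server-url> <host:port>
server_matches_endpoint() {
  [ "${1#https://}" = "$2" ]
}
```

Usage: `server_matches_endpoint "$(kubeconfig_server cluster.kubeconfig)" "<host>:<port>" || echo "kubeconfig points elsewhere"`.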
Cluster Scaling Stuck
Machines not scaling as expected.
# Check MachineDeployment
kubectl get machinedeployment -n <namespace>
kubectl describe machinedeployment <name> -n <namespace>
# Check machines
kubectl get machines -n <namespace>
# Check infrastructure provider
kubectl logs -n kmetal-capi-providers -l control-plane=controller-manager --tail=100
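When many machines are involved, a per-phase count shows at a glance where scaling stalls. A sketch that assumes each Machine exposes `.status.phase` (as Cluster API Machines do); the helper names are hypothetical:

```shell
# Print one "name<TAB>phase" line per Machine in the namespace.
# Argument: <namespace>
machine_phases() {
  kubectl get machines -n "$1" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.phase}{"\n"}{end}'
}

# Count machines per phase from "name<TAB>phase" lines on stdin.
count_phases() {
  awk -F '\t' 'NF >= 2 && $2 != "" { n[$2]++ } END { for (p in n) print p, n[p] }' | sort
}
```

Usage: `machine_phases <namespace> | count_phases`. A count that stays in a phase such as Provisioning points toward the infrastructure provider logs.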
Cluster Not Deleting
Cluster deletion hangs.
# Check finalizers
kubectl get cluster <name> -n <namespace> -o yaml | grep finalizers -A 5
# Delete machines first
kubectl delete machines --all -n <namespace> --wait=false
# Remove finalizers only as a last resort (this skips controller cleanup and can orphan resources)
kubectl patch cluster <name> -n <namespace> -p '{"metadata":{"finalizers":[]}}' --type=merge
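Before patching anything, it can help to see which objects are actually waiting on a finalizer. A minimal sketch with hypothetical helper names, relying on kubectl printing an empty string for a missing `deletionTimestamp`:

```shell
# Print "kind/name<TAB>deletionTimestamp<TAB>finalizers" for the cluster's objects.
# Argument: <namespace>
deletion_state() {
  kubectl get cluster,tenantcontrolplane,machines -n "$1" \
    -o jsonpath='{range .items[*]}{.kind}/{.metadata.name}{"\t"}{.metadata.deletionTimestamp}{"\t"}{.metadata.finalizers}{"\n"}{end}'
}

# Keep only objects that already carry a deletion timestamp,
# meaning deletion was requested and a finalizer is still holding them.
pending_deletion() {
  awk -F '\t' '$2 != ""'
}
```

Usage: `deletion_state <namespace> | pending_deletion`. The finalizer names in the last column indicate which controller has not finished its cleanup.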
Collect Diagnostics
# Management cluster view
kubectl get cluster,tenantcontrolplane,machines -n <namespace> -o yaml > cluster-resources.yaml
kubectl get events -n <namespace> --sort-by='.lastTimestamp' > events.txt
kubectl logs -n kmetal-kamaji -l kamaji.clastix.io/name=<name> --tail=500 > control-plane.log
# Tenant cluster view (if accessible)
kubectl --kubeconfig=<cluster>.kubeconfig get nodes,pods -A -o yaml > tenant-resources.yaml
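The management cluster commands above can be wrapped so that every run lands in its own archive, which is easier to attach to a support ticket. A sketch only; the function name, directory naming, and archive layout are assumptions:

```shell
# Hypothetical helper: gather management-cluster diagnostics into a
# timestamped directory and archive it. Arguments: <namespace> <name>
collect_diagnostics() {
  ns="$1"; name="$2"
  out="diagnostics-${name}-$(date +%Y%m%d-%H%M%S)"
  mkdir -p "$out" || return 1
  kubectl get cluster,tenantcontrolplane,machines -n "$ns" -o yaml > "$out/cluster-resources.yaml"
  kubectl get events -n "$ns" --sort-by='.lastTimestamp' > "$out/events.txt"
  kubectl logs -n kmetal-kamaji -l "kamaji.clastix.io/name=${name}" --tail=500 > "$out/control-plane.log"
  # Print the archive path so callers can pick it up.
  tar -czf "$out.tar.gz" "$out" && echo "$out.tar.gz"
}
```

Running `collect_diagnostics <namespace> <name>` prints the path of the resulting `.tar.gz`. Review the archive before sharing it, since resource dumps and logs may contain sensitive values.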