
Cluster Maintenance

Perform routine maintenance on your clusters.

Monitor Cluster Health

# View all clusters
kubectl get tenantcontrolplanes -A

# Inspect a specific cluster's spec and status
kubectl get tenantcontrolplane my-cluster -n my-namespace -o yaml

# List the cluster's control plane pods
kubectl get pods -n kmetal-kamaji -l kamaji.clastix.io/name=my-cluster

# Check worker node status
kubectl --kubeconfig=my-cluster.kubeconfig get nodes
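On clusters with many workers, scanning get nodes output by eye is error-prone. The following is a minimal sketch, assuming the default kubectl get nodes layout where STATUS is the second column. The function name nodes_not_ready is hypothetical. It reads --no-headers output on stdin and prints every node whose status does not begin with Ready. A cordoned node shows Ready,SchedulingDisabled, so it counts as ready here.

```shell
#!/bin/sh
# Hypothetical helper: print nodes whose STATUS column does not start with "Ready".
# Input: output of `kubectl get nodes --no-headers`
# (assumed columns: NAME STATUS ROLES AGE VERSION).
nodes_not_ready() {
  awk '$2 !~ /^Ready/ { print $1 " " $2 }'
}
```

Usage: kubectl --kubeconfig=my-cluster.kubeconfig get nodes --no-headers | nodes_not_ready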

Monitor Resources

Via Console

  1. Navigate to Clusters → Select your cluster
  2. View resource usage dashboard
  3. Check control plane and worker metrics

Via kubectl

# Control plane pod resources (kubectl top requires metrics-server in the queried cluster)
kubectl top pods -n kmetal-kamaji -l kamaji.clastix.io/name=my-cluster

# Worker node resources
kubectl --kubeconfig=my-cluster.kubeconfig top nodes

# Pod resource usage across all namespaces
kubectl --kubeconfig=my-cluster.kubeconfig top pods -A
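Before draining a worker, it can help to know which nodes are already close to their memory limit. This is a minimal sketch, assuming kubectl top nodes prints NAME, CPU(cores), CPU%, MEMORY(bytes) and MEMORY% in that order. The function name nodes_over_memory and its default threshold of 80 percent are illustrative choices, not part of the platform.

```shell
#!/bin/sh
# Hypothetical helper: list nodes at or above a memory-utilisation threshold.
# Input: output of `kubectl top nodes --no-headers`
# (assumed columns: NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%).
# First argument: threshold in percent (defaults to 80).
nodes_over_memory() {
  awk -v t="${1:-80}" '{
    pct = $5
    sub(/%$/, "", pct)            # strip the trailing percent sign
    if (pct + 0 >= t + 0) print $1 " " $5
  }'
}
```

Usage: kubectl --kubeconfig=my-cluster.kubeconfig top nodes --no-headers | nodes_over_memory 80. Nodes that report `<unknown>` metrics convert to 0 and are not listed.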

Drain and Cordon Nodes

# Cordon node (mark it unschedulable; pods already running stay in place)
kubectl --kubeconfig=my-cluster.kubeconfig cordon worker-node-1

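# Drain node: cordons it, then evicts its pods so their controllers reschedule them elsewhere.
# --force also deletes pods not managed by a controller; those pods are not recreated.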
kubectl --kubeconfig=my-cluster.kubeconfig drain worker-node-1 \
  --ignore-daemonsets \
  --delete-emptydir-data \
  --force

# Uncordon node once maintenance is finished (allows scheduling again)
kubectl --kubeconfig=my-cluster.kubeconfig uncordon worker-node-1
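The three commands above are usually run as one sequence per worker. The following is a sketch of that sequence under stated assumptions. The function maintain_node and the variable KUBECONFIG_FILE are hypothetical. The maintenance step is whatever command the caller passes in. kubectl drain already cordons the node, so there is no separate cordon call. This sketch deliberately omits --force, so an unmanaged pod makes the drain fail instead of being deleted.

```shell
#!/bin/sh
# Hypothetical helper: drain one worker, run a maintenance command, then uncordon it.
# KUBECONFIG_FILE is an assumed variable; it defaults to my-cluster.kubeconfig.
maintain_node() {
  node="$1"
  shift
  kc="--kubeconfig=${KUBECONFIG_FILE:-my-cluster.kubeconfig}"

  # Drain (which cordons first); stop if it fails or exceeds the timeout.
  kubectl "$kc" drain "$node" --ignore-daemonsets --delete-emptydir-data --timeout=10m || return 1

  # Run the caller-supplied maintenance command; keep the node cordoned on failure.
  if ! "$@"; then
    echo "maintenance on $node failed; leaving it cordoned" >&2
    return 1
  fi

  kubectl "$kc" uncordon "$node"
}
```

Usage: maintain_node worker-node-1 ./reboot-worker.sh worker-node-1. Here reboot-worker.sh stands in for your own maintenance script. Leaving a failed node cordoned keeps new pods off it until someone inspects it.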

Certificate Rotation

Control plane certificates are automatically rotated by Kamaji.

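Worker node (kubelet) certificates are renewed through CertificateSigningRequests (CSRs). Requests that are not approved automatically remain Pending until someone approves them: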

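# List certificate signing requests (pending ones show CONDITION "Pending")
kubectl --kubeconfig=my-cluster.kubeconfig get csr

# Check kubelet client certificate expiration (run on the worker itself; kubeadm-provisioned path)
openssl x509 -noout -enddate -in /var/lib/kubelet/pki/kubelet-client-current.pem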

# Approve a pending request (check its REQUESTOR and SIGNERNAME first)
kubectl --kubeconfig=my-cluster.kubeconfig certificate approve <csr-name>
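Approving every pending CSR blindly would let any client obtain a certificate. The following is a minimal sketch that narrows approval to kubelet serving-certificate requests made by node identities. It assumes the default kubectl get csr columns: NAME, AGE, SIGNERNAME, REQUESTOR, an optional REQUESTEDDURATION, and CONDITION. The function names and the KUBECONFIG_FILE variable are hypothetical.

```shell
#!/bin/sh
# Hypothetical filter: print names of pending kubelet serving-certificate CSRs
# requested by node identities (system:node:<name>).
# Input: `kubectl get csr --no-headers`; CONDITION is taken from the last column.
pending_kubelet_serving_csrs() {
  awk '$3 == "kubernetes.io/kubelet-serving" && $4 ~ /^system:node:/ && $NF == "Pending" { print $1 }'
}

# Hypothetical wrapper: approve each CSR selected by the filter above.
approve_pending_kubelet_serving_csrs() {
  kc="--kubeconfig=${KUBECONFIG_FILE:-my-cluster.kubeconfig}"
  kubectl "$kc" get csr --no-headers | pending_kubelet_serving_csrs |
    while read -r csr; do
      kubectl "$kc" certificate approve "$csr"
    done
}
```

Run pending_kubelet_serving_csrs on its own first to review what would be approved. Then call the wrapper.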

Cluster Pause/Resume

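To temporarily stop tenant workloads without deleting the cluster, scale the workers down through the Cluster topology, then scale them back up to resume. The tenant control plane keeps running while workers are at zero. The patches below change only the first MachineDeployment (index 0). If the topology defines several, repeat the patch for each index. When resuming, use the replica count the cluster had before pausing; 3 is only an example.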

Pause workers

kubectl patch cluster my-cluster -n <tenant-namespace> --type=json \
  -p '[{"op":"replace","path":"/spec/topology/workers/machineDeployments/0/replicas","value":0}]'

Resume workers

kubectl patch cluster my-cluster -n <tenant-namespace> --type=json \
  -p '[{"op":"replace","path":"/spec/topology/workers/machineDeployments/0/replicas","value":3}]'
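When the topology has more than one MachineDeployment, one JSON Patch can scale all of them at once. The following is a sketch, assuming every entry lives under spec.topology.workers.machineDeployments and already has a replicas field; JSON Patch "replace" fails on a missing path. The function names build_replicas_patch and scale_all_workers are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers: set the same replica count on every worker MachineDeployment
# in a ClusterClass-based Cluster, instead of only index 0.

# Build a JSON Patch with one "replace" op per MachineDeployment index (0..count-1).
build_replicas_patch() {
  count="$1"
  replicas="$2"
  ops=""
  i=0
  while [ "$i" -lt "$count" ]; do
    op="{\"op\":\"replace\",\"path\":\"/spec/topology/workers/machineDeployments/$i/replicas\",\"value\":$replicas}"
    ops="${ops:+$ops,}$op"    # prepend a comma except before the first op
    i=$((i + 1))
  done
  printf '[%s]\n' "$ops"
}

# Count the topology's MachineDeployments, then patch them all in one request.
scale_all_workers() {
  cluster="$1"
  namespace="$2"
  replicas="$3"
  count=$(kubectl get cluster "$cluster" -n "$namespace" \
    -o jsonpath='{.spec.topology.workers.machineDeployments[*].name}' | wc -w | tr -d '[:space:]')
  kubectl patch cluster "$cluster" -n "$namespace" --type=json \
    -p "$(build_replicas_patch "$count" "$replicas")"
}
```

This sketch sets one value for every MachineDeployment. Record the per-deployment counts before pausing so you can restore them individually. For example, run kubectl get cluster my-cluster -n <tenant-namespace> -o jsonpath='{.spec.topology.workers.machineDeployments[*].replicas}'.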

Control-plane pause

Pausing the Kamaji control plane (replicas=0) is not yet supported. The kubevirt-kubeadm ClusterClass does not currently expose a control-plane replicas variable. Today the only options are scaling workers down (above) or deleting the cluster entirely.

Troubleshooting

Cluster Not Ready

# Check control plane status
kubectl describe tenantcontrolplane my-cluster -n my-namespace

# Check control plane pods
kubectl get pods -n kmetal-kamaji -l kamaji.clastix.io/name=my-cluster

# View kube-apiserver logs for a control plane pod
kubectl logs -n kmetal-kamaji <control-plane-pod> -c kube-apiserver
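When the cause is unclear, it helps to capture logs from every control plane component at once. The following is a sketch under assumptions. It assumes the pods carry the kamaji.clastix.io/name label used above. It also assumes each pod runs containers named kube-apiserver, kube-controller-manager and kube-scheduler; only kube-apiserver is confirmed by the command above. The function name collect_cp_logs and the output directory layout are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: save recent logs from each control plane container of one
# tenant cluster. Arguments: cluster name, namespace (default kmetal-kamaji),
# output directory (default ./cp-logs-<cluster>).
collect_cp_logs() {
  cluster="$1"
  ns="${2:-kmetal-kamaji}"
  outdir="${3:-./cp-logs-$cluster}"
  mkdir -p "$outdir" || return 1
  for pod in $(kubectl get pods -n "$ns" -l "kamaji.clastix.io/name=$cluster" -o name); do
    name="${pod#pod/}"    # strip the "pod/" prefix from `-o name` output
    for c in kube-apiserver kube-controller-manager kube-scheduler; do
      # Errors (e.g. a container that never started) are captured in the same file.
      kubectl logs -n "$ns" "$pod" -c "$c" --tail=500 > "$outdir/$name-$c.log" 2>&1
    done
  done
}
```

For a container in a crash loop, add --previous to the kubectl logs call to read the last terminated instance's output.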

Worker Nodes Not Joining

# Check machine status
kubectl get machines -n my-namespace
kubectl describe machine <machine-name> -n my-namespace

# Check bootstrap configuration
kubectl get kubeadmconfigs -n my-namespace
kubectl describe kubeadmconfig <config-name> -n my-namespace
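To find which Machines to describe, filter on the Machine phase. The following is a minimal sketch that reads custom-columns output instead of the default table, because the default columns differ between Cluster API versions. It assumes the Machine status exposes .status.phase, with values such as Pending, Provisioning, Running or Failed. The function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical filter: print Machines whose phase is not Running.
# Input: "NAME PHASE" lines; a missing phase shows as <none> and is reported.
machines_not_running() {
  awk '$2 != "Running" { print $1 " " $2 }'
}

# Hypothetical wrapper: query one namespace and apply the filter.
list_stuck_machines() {
  kubectl get machines -n "$1" --no-headers \
    -o custom-columns=NAME:.metadata.name,PHASE:.status.phase | machines_not_running
}
```

Usage: list_stuck_machines my-namespace. Then run kubectl describe machine on each name it prints.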