Cluster Maintenance¶
Perform routine maintenance on your clusters.
Monitor Cluster Health¶
# View all clusters
kubectl get tenantcontrolplanes -A
# Check specific cluster
kubectl get tenantcontrolplane my-cluster -n my-namespace -o yaml
# View control plane pods
kubectl get pods -n kmetal-kamaji -l kamaji.clastix.io/name=my-cluster
# Check worker node status
kubectl --kubeconfig=my-cluster.kubeconfig get nodes
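The worker node check above can be turned into a scriptable gate. The following is a minimal sketch, assuming the default `kubectl get nodes` table layout (node name in the first column, status in the second). `not_ready_nodes` is a hypothetical helper name, not part of any tool.

```shell
# Hypothetical helper: print the names of worker nodes whose STATUS
# column does not start with "Ready" (for example NotReady or Unknown).
# Assumes the default table output: NAME STATUS ROLES AGE VERSION.
not_ready_nodes() (
  kubeconfig="$1"
  kubectl --kubeconfig="$kubeconfig" get nodes --no-headers |
    awk '$2 !~ /^Ready/ { print $1 }'
)

# Usage (not executed here):
#   not_ready_nodes my-cluster.kubeconfig
```

Cordoned nodes report `Ready,SchedulingDisabled` and are treated as ready by this filter. Empty output means no node was reported as not ready.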
Monitor Resources¶
Via Console¶
- Navigate to Clusters → Select your cluster
- View resource usage dashboard
- Check control plane and worker metrics
Via kubectl¶
# Control plane resources
kubectl top pods -n kmetal-kamaji -l kamaji.clastix.io/name=my-cluster
# Worker node resources
kubectl --kubeconfig=my-cluster.kubeconfig top nodes
# Cluster-wide resource usage
kubectl --kubeconfig=my-cluster.kubeconfig top pods -A
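To flag heavily loaded workers from a script, the `top nodes` output can be filtered by CPU percentage. This is a sketch under two assumptions: a metrics API (such as metrics-server) is available in the tenant cluster, and the CPU percentage is the third column of the default output. The helper name `busy_nodes` and the default threshold of 80 are illustrative only.

```shell
# Hypothetical helper: print nodes whose CPU percentage exceeds a threshold.
# Assumes the default `kubectl top nodes` column order:
# NAME, CPU cores, CPU percent, memory bytes, memory percent.
busy_nodes() (
  kubeconfig="$1"
  threshold="${2:-80}"   # percent; 80 is an arbitrary example default
  kubectl --kubeconfig="$kubeconfig" top nodes --no-headers |
    awk -v t="$threshold" '{
      cpu = $3
      sub(/%/, "", cpu)              # strip the trailing percent sign
      if (cpu + 0 > t) print $1, $3  # node name and its CPU percentage
    }'
)

# Usage (not executed here):
#   busy_nodes my-cluster.kubeconfig 90
```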
Drain and Cordon Nodes¶
# Cordon node (prevent new pods)
kubectl --kubeconfig=my-cluster.kubeconfig cordon worker-node-1
# Drain node (evict workloads; --force also deletes pods not managed by a controller)
kubectl --kubeconfig=my-cluster.kubeconfig drain worker-node-1 \
--ignore-daemonsets \
--delete-emptydir-data \
--force
# Uncordon node
kubectl --kubeconfig=my-cluster.kubeconfig uncordon worker-node-1
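The cordon, drain, and uncordon steps above can be wrapped around a maintenance command so the node is always returned to service afterwards. This is a sketch, not a prescribed procedure: `with_drained_node` is a hypothetical name, the 300-second drain timeout is an arbitrary choice, and `--force` is deliberately left out so that pods without a controller block the drain instead of being deleted.

```shell
# Hypothetical wrapper: cordon and drain a node, run a caller-supplied
# maintenance command, then uncordon the node.
# If the drain fails, maintenance is skipped and the node stays cordoned.
with_drained_node() (
  kubeconfig="$1"; node="$2"; shift 2
  k() { kubectl --kubeconfig="$kubeconfig" "$@"; }

  # drain cordons the node as well; the explicit cordon mirrors the manual steps
  k cordon "$node" || exit 1
  if ! k drain "$node" --ignore-daemonsets --delete-emptydir-data --timeout=300s; then
    echo "drain of $node failed; node left cordoned" >&2
    exit 1
  fi

  status=0
  "$@" || status=$?          # run the maintenance command, remember its result
  k uncordon "$node" || exit 1
  exit "$status"
)

# Usage (not executed here; the ssh command is only an example):
#   with_drained_node my-cluster.kubeconfig worker-node-1 ssh worker-node-1 sudo reboot
```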
Certificate Rotation¶
Control plane certificates are automatically rotated by Kamaji.
For worker node certificates:
# List certificate signing requests (look for Pending entries)
kubectl --kubeconfig=my-cluster.kubeconfig get csr
# Approve a pending certificate request
kubectl --kubeconfig=my-cluster.kubeconfig certificate approve <csr-name>
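When many requests are listed, it helps to print only the ones still waiting for approval. The sketch below assumes the default `kubectl get csr` table output, where the condition is the last column. `pending_csrs` is a hypothetical helper name.

```shell
# Hypothetical helper: print the names of CSRs whose CONDITION column
# (the last column of the default table output) is exactly "Pending".
pending_csrs() (
  kubeconfig="$1"
  kubectl --kubeconfig="$kubeconfig" get csr --no-headers |
    awk '$NF == "Pending" { print $1 }'
)

# Usage (not executed here):
#   pending_csrs my-cluster.kubeconfig
```

The helper only lists names. Review the requestor and signer of each request before approving it, rather than piping the list directly into `certificate approve`.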
Cluster Pause/Resume¶
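To temporarily stop tenant workloads without deleting the cluster, scale the workers down to zero through the Cluster topology, then restore them by scaling back up. The patches below target the first MachineDeployment (index 0); the resume value of 3 is an example, so use the replica count the cluster had before pausing.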
Pause workers¶
kubectl patch cluster my-cluster -n <tenant-namespace> --type=json \
-p '[{"op":"replace","path":"/spec/topology/workers/machineDeployments/0/replicas","value":0}]'
Resume workers¶
kubectl patch cluster my-cluster -n <tenant-namespace> --type=json \
-p '[{"op":"replace","path":"/spec/topology/workers/machineDeployments/0/replicas","value":3}]'
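For clusters with more than one MachineDeployment, or to restore the exact previous size, the patch can be generated instead of typed by hand. This is a sketch: `replicas_patch`, `current_replicas`, and `scale_workers` are hypothetical helper names, and the JSON path is the one used in the patch commands above.

```shell
# Hypothetical helper: build the JSON patch for one MachineDeployment.
#   $1 = index in spec.topology.workers.machineDeployments, $2 = replicas
replicas_patch() {
  printf '[{"op":"replace","path":"/spec/topology/workers/machineDeployments/%s/replicas","value":%s}]' "$1" "$2"
}

# Hypothetical helper: read the current replica count so it can be restored later.
current_replicas() (
  cluster="$1"; namespace="$2"; index="$3"
  kubectl get cluster "$cluster" -n "$namespace" \
    -o jsonpath="{.spec.topology.workers.machineDeployments[$index].replicas}"
)

# Hypothetical helper: apply the patch to the Cluster object.
scale_workers() (
  cluster="$1"; namespace="$2"; index="$3"; replicas="$4"
  kubectl patch cluster "$cluster" -n "$namespace" --type=json \
    -p "$(replicas_patch "$index" "$replicas")"
)

# Usage (not executed here):
#   before="$(current_replicas my-cluster <tenant-namespace> 0)"
#   scale_workers my-cluster <tenant-namespace> 0 0          # pause
#   scale_workers my-cluster <tenant-namespace> 0 "$before"  # resume
```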
Control-plane pause
Pausing the Kamaji control plane (replicas=0) is not available yet: the kubevirt-kubeadm ClusterClass does not currently expose a control-plane replicas variable. The only options today are scaling the workers down (above) or deleting the cluster entirely.
Troubleshooting¶
Cluster Not Ready¶
# Check control plane status
kubectl describe tenantcontrolplane my-cluster -n my-namespace
# Check control plane pods
kubectl get pods -n kmetal-kamaji -l kamaji.clastix.io/name=my-cluster
# View control plane logs
kubectl logs -n kmetal-kamaji <control-plane-pod> -c kube-apiserver
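When the control plane runs several pods, the log command above can be looped over all of them. This is a minimal sketch that reuses the namespace, label selector, and container name shown above. `control_plane_logs` is a hypothetical helper name and the default of 100 lines is arbitrary.

```shell
# Hypothetical helper: print recent kube-apiserver logs from every
# control plane pod of a tenant cluster.
#   $1 = cluster name, $2 = namespace (default kmetal-kamaji), $3 = lines per pod
control_plane_logs() (
  cluster="$1"; namespace="${2:-kmetal-kamaji}"; lines="${3:-100}"
  # -o name prints one "pod/<name>" entry per line
  for pod in $(kubectl get pods -n "$namespace" \
      -l "kamaji.clastix.io/name=$cluster" -o name); do
    echo "=== $pod ==="
    kubectl logs -n "$namespace" "$pod" -c kube-apiserver --tail="$lines"
  done
)

# Usage (not executed here):
#   control_plane_logs my-cluster
```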