Monitoring
The kMetal umbrella chart does not bundle a monitoring stack. If your environment needs Prometheus, Grafana, or Alertmanager, deploy them separately and point them at the kMetal components below.
Scrape targets
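The platform components expose Prometheus metrics through their Services. The ServiceMonitor patterns below (a Prometheus Operator CRD) select each component's Service by label in its namespace and scrape its metrics port. Depending on how your Prometheus instance's `serviceMonitorSelector` is configured, a ServiceMonitor may also need an extra label (for example, your Helm release label) before Prometheus picks it up.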
Kamaji controller
```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kamaji-controller
  namespace: monitoring # wherever your Prometheus Operator runs
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: kamaji
  endpoints:
    - port: metrics
      interval: 30s
  namespaceSelector:
    matchNames:
      - kmetal-kamaji
```
cert-manager
```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: cert-manager
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: cert-manager
  endpoints:
    - port: tcp-prometheus-servicemonitor
      interval: 30s
  namespaceSelector:
    matchNames:
      - kmetal-cert-manager
```
MetalLB, KubeVirt, Kube-OVN, CAPI
t.b.d. Worked ServiceMonitor examples for the remaining platform components have not been written yet. Each upstream chart documents its own metrics endpoint and labels; refer to those until this section is populated.
Useful metrics
Metric names below are illustrative. Verify what your installed component versions actually expose before wiring alerts.
- `certmanager_certificate_ready_status`: certificate readiness per Certificate object
- `certmanager_certificate_expiration_timestamp_seconds`: certificate expiration time, for certificate-rotation alerting
- Kamaji controller metrics: see the Kamaji documentation for the current set
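Since metric names have to be checked against the versions you actually run, it helps to list what a component's `/metrics` endpoint really exposes. The sketch below is a hypothetical helper (`metric_names` is not part of kMetal or any upstream tool) that reads the Prometheus text exposition format on stdin and prints the distinct sample names with a given prefix. The Service name and port in the usage comment are assumptions; adjust them to your install.

```shell
# metric_names PREFIX (hypothetical helper): reads Prometheus text exposition
# format on stdin and prints the distinct sample names that start with PREFIX.
metric_names() {
  grep -v '^#' \
    | sed -e 's/[{ ].*//' \
    | grep "^$1" \
    | sort -u
}
# How it works: drop HELP/TYPE comment lines, cut each sample line at its first
# "{" or space (leaving only the name), keep names with the prefix, dedupe.

# Example usage (assumes the cert-manager controller Service is named
# "cert-manager" in kmetal-cert-manager and serves metrics on port 9402):
#   kubectl -n kmetal-cert-manager port-forward svc/cert-manager 9402:9402 &
#   curl -s http://127.0.0.1:9402/metrics | metric_names certmanager_
```

Histogram and summary families show up as separate `_bucket`, `_sum`, and `_count` names, which is usually what you want when writing PromQL against them.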
Quick checks without a metrics stack
If you haven't deployed a metrics stack yet, the same health signals can be checked directly with kubectl:
```shell
# Anything not Running or Completed
kubectl get pods -A | grep -v Running | grep -v Completed

# Certificates not ready
kubectl get certificates -A | grep -v True

# LoadBalancer services pending an IP
kubectl get svc -A | grep LoadBalancer | grep -i pending

# Tenant control planes
kubectl get tenantcontrolplane -A

# Recent warnings
kubectl get events -A --field-selector type=Warning --sort-by='.lastTimestamp' | tail -20
```
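The pod check above hides every line containing `Running`, so a pod stuck at `0/1 Running` passes unnoticed, and none of the checks look at certificate expiry. A minimal sketch of two stdin filters that cover both cases follows. The function names are hypothetical, and the expiry filter assumes RFC 3339 UTC timestamps (the form Kubernetes uses when serializing `status.notAfter`), which sort correctly as plain strings.

```shell
# unhealthy_pods (hypothetical helper): reads `kubectl get pods -A --no-headers`
# output (NAMESPACE NAME READY STATUS RESTARTS AGE) on stdin and prints pods
# that are not Running/Completed, or are Running with fewer ready containers
# than total (for example "0/1 Running").
unhealthy_pods() {
  awk '
    $4 == "Completed" { next }            # finished Job pods are fine
    {
      split($3, ready, "/")               # READY column, e.g. "1/2"
      if ($4 != "Running" || ready[1] != ready[2])
        print $1 "/" $2 " " $3 " " $4
    }
  '
}

# expiring_before CUTOFF (hypothetical helper): reads "NAMESPACE NAME NOTAFTER"
# lines on stdin and prints certificates whose notAfter is earlier than CUTOFF.
# Both values must be RFC 3339 UTC timestamps (e.g. 2025-01-05T00:00:00Z) so a
# string comparison orders them correctly. Lines with no notAfter are flagged.
expiring_before() {
  awk -v cutoff="$1" '
    NF < 3 { print $1 "/" $2 " (no notAfter yet)"; next }
    $3 < cutoff { print $1 "/" $2 " " $3 }
  '
}

# Example usage (needs cluster access; `date -d` is GNU date syntax):
#   kubectl get pods -A --no-headers | unhealthy_pods
#   kubectl get certificates -A \
#     -o jsonpath='{range .items[*]}{.metadata.namespace}{" "}{.metadata.name}{" "}{.status.notAfter}{"\n"}{end}' \
#     | expiring_before "$(date -u -d '+14 days' +%Y-%m-%dT%H:%M:%SZ)"
```

Passing the cutoff in as an argument keeps the filter free of platform-specific date parsing; on BSD or macOS, `date -u -v+14d +%Y-%m-%dT%H:%M:%SZ` produces the same cutoff format.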
Alerts
t.b.d. A canonical alert ruleset (which metrics matter, threshold values, severity mapping) has not been defined yet. Until it is, start from each upstream chart's recommended PrometheusRules and tune them for your environment.