
Monitoring

The kMetal umbrella chart does not bundle a monitoring stack. If your environment needs Prometheus, Grafana, or Alertmanager, deploy them separately and point them at the kMetal components below.

Scrape targets

The platform components expose Prometheus metrics on standard ports. The most useful ServiceMonitor patterns are shown below.

Kamaji controller

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kamaji-controller
  namespace: monitoring         # a namespace your Prometheus instance watches for ServiceMonitors
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: kamaji
  endpoints:
    - port: metrics
      interval: 30s
  namespaceSelector:
    matchNames:
      - kmetal-kamaji

cert-manager

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: cert-manager
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: cert-manager
  endpoints:
    - port: tcp-prometheus-servicemonitor
      interval: 30s
  namespaceSelector:
    matchNames:
      - kmetal-cert-manager
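
A ServiceMonitor whose label selector or port name does not match any Service produces no scrape targets and no error, so it can be worth confirming the match before relying on either manifest above. The following is a minimal sketch, not part of kMetal: `services_with_port` is a hypothetical helper that reads `<service-name><TAB><port-names>` lines and prints the Services that expose the named port. The namespace, label, and port name in the usage comment are taken from the Kamaji example and may differ in your installation.

```shell
# Print the names of Services that expose a port with the given name.
# Reads lines of the form "<service-name><TAB><port-name> <port-name> ..." from stdin.
# (Hypothetical helper for illustration.)
services_with_port() {
  awk -F '\t' -v want="$1" '{
    n = split($2, ports, " ")
    for (i = 1; i <= n; i++) if (ports[i] == want) { print $1; break }
  }'
}

# Usage against a live cluster: list Services in kmetal-kamaji that carry the
# selector label, then keep only those with a port named "metrics".
#   kubectl -n kmetal-kamaji get svc -l app.kubernetes.io/name=kamaji \
#     -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{range .spec.ports[*]}{.name}{" "}{end}{"\n"}{end}' \
#     | services_with_port metrics
```

If the pipeline prints nothing, either the label or the port name in the ServiceMonitor needs adjusting to match the installed chart.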

MetalLB, KubeVirt, Kube-OVN, CAPI

To be determined. Worked ServiceMonitor examples for the remaining platform components are not yet available in this section. Each upstream chart documents its own metrics endpoint and labels; refer to those until this section is populated.

Useful metrics

Metric names below are illustrative — verify what's actually exposed in your installed component versions before wiring alerts.

  • certmanager_certificate_ready_status: certificate readiness per Certificate object
  • certmanager_certificate_expiration_timestamp_seconds: expiration time, for certificate-rotation alerting
  • Kamaji controller metrics: see the Kamaji documentation for the current set
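
One way to verify which metric names a component exposes is to read its metrics endpoint directly and list the declared metric families. The sketch below assumes the endpoint serves the Prometheus text exposition format, where each family is announced by a `# TYPE <name> <type>` line. `metric_names` is a hypothetical helper, and the Service name `cert-manager` and port 9402 in the usage comment are assumptions to check against your installation.

```shell
# List the metric names declared in Prometheus exposition text read from stdin.
# Each metric family is announced by a line of the form "# TYPE <name> <type>".
# (Hypothetical helper for illustration.)
metric_names() {
  awk '$1 == "#" && $2 == "TYPE" { print $3 }' | sort -u
}

# Usage against a live cluster (Service name and port are assumptions):
#   kubectl -n kmetal-cert-manager port-forward svc/cert-manager 9402:9402 &
#   curl -s http://localhost:9402/metrics | metric_names
```

Comparing this list against the names used in dashboards and alert rules catches renamed or missing metrics before an alert silently stops firing.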

Quick checks without a metrics stack

If you haven't deployed a metrics stack yet, you can check the same health signals with kubectl:

# Anything not Running or Completed
kubectl get pods -A | grep -v Running | grep -v Completed

# Certificates not ready
kubectl get certificates -A | grep -v True

# LoadBalancer services pending an IP
kubectl get svc -A | grep LoadBalancer | grep -i pending

# Tenant control planes
kubectl get tenantcontrolplane -A

# Recent warnings
kubectl get events -A --field-selector type=Warning --sort-by='.lastTimestamp' | tail -20
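
The pod check above filters on the STATUS column only, so a pod that is Running but has containers that are not ready (for example `0/1`) passes unnoticed. As a hedged sketch, the hypothetical `unhealthy_pods` filter below also compares the two halves of the READY column. It assumes the default column layout of `kubectl get pods -A` (NAMESPACE, NAME, READY, STATUS, ...).

```shell
# Filter `kubectl get pods -A` output read from stdin.
# Prints "<namespace>/<name> <ready> <status>" for every pod that is neither
# Completed nor fully ready and Running. The header line is skipped.
# (Hypothetical helper for illustration.)
unhealthy_pods() {
  awk 'NR == 1 { next }
       $4 == "Completed" { next }
       { split($3, ready, "/") }
       $4 != "Running" || ready[1] != ready[2] { print $1 "/" $2, $3, $4 }'
}

# Usage against a live cluster:
#   kubectl get pods -A | unhealthy_pods
```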

Alerts

To be determined. A canonical alert ruleset (which metrics matter, threshold values, severity mapping) is not yet available in this section. Until it is, start from each upstream chart's recommended PrometheusRules and tune them for your environment.

Next Steps