
Scaling the kMetal

This guide describes strategies for scaling kMetal as workloads, tenant cluster counts, and performance requirements grow.

Scaling Overview

Scaling Dimensions

  1. Under Cluster: Node scaling, resource scaling
  2. Platform Components: Horizontal scaling, vertical scaling
  3. Tenant Clusters: Control plane scaling, worker node scaling

Scaling Strategies

| Strategy | Use Case | Benefits | Considerations |
|---|---|---|---|
| Horizontal Scaling | Increased workload | Better fault tolerance | Network complexity |
| Vertical Scaling | Resource-intensive workloads | Simplified architecture | Single point of failure |
| Cluster Scaling | More tenant clusters | Workload isolation | Management overhead |
| Component Scaling | Specific bottlenecks | Targeted optimization | Dependency management |

Under Cluster Scaling

Node Scaling

Adding Under Cluster Nodes

# Check current node capacity
kubectl get nodes -o wide
kubectl top nodes

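# Add new nodes to the cluster (example for kubeadm clusters)
# On an existing control plane node: print a join command with a fresh token
kubeadm token create --print-join-command

# On the new node: run the printed command, which has this form
kubeadm join <control-plane-ip>:6443 --token <token> --discovery-token-ca-cert-hash <hash>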

# Label nodes for specific workloads
kubectl label nodes <node-name> node-role.kubernetes.io/platform=true
kubectl label nodes <node-name> workload-type=control-plane

Node Affinity for Platform Components

# Platform component node affinity
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: node-role.kubernetes.io/platform
            operator: In
            values: ["true"]
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
          - key: workload-type
            operator: In
            values: ["control-plane"]

Resource Planning

Capacity Planning Formula

Total Resource Requirement = Base Platform + (Tenant Clusters * Per-Cluster Overhead)

Base Platform Resources:

- CPU: 4-8 cores
- Memory: 8-16 GB
- Storage: 100-500 GB

Per-Tenant Cluster Overhead:

- Control Plane: 1-2 cores, 2-4 GB RAM
- Monitoring: 0.5 cores, 1 GB RAM
- Network: 0.1 cores, 0.5 GB RAM
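The formula can be turned into a quick estimate. The sketch below plugs the low and high ends of the ranges listed above into the formula for a given tenant count; the function name `estimate_capacity` is only illustrative, and storage is left out because the source gives no per-tenant storage figure.

```shell
#!/usr/bin/env bash
# Estimate under-cluster CPU and memory for N tenant clusters using
# Total = Base Platform + (Tenant Clusters * Per-Cluster Overhead).
# estimate_capacity is a hypothetical helper name, not a kMetal tool.
estimate_capacity() {
  awk -v n="$1" 'BEGIN {
    # Per-tenant overhead = control plane + monitoring + network.
    cpu_lo = 4 + n * (1 + 0.5 + 0.1)
    cpu_hi = 8 + n * (2 + 0.5 + 0.1)
    mem_lo = 8 + n * (2 + 1 + 0.5)
    mem_hi = 16 + n * (4 + 1 + 0.5)
    printf "tenants=%d cpu=%.1f-%.1f cores memory=%.1f-%.1f GB\n",
           n, cpu_lo, cpu_hi, mem_lo, mem_hi
  }'
}

estimate_capacity 10
# prints: tenants=10 cpu=20.0-34.0 cores memory=43.0-71.0 GB
```

Treat the result as a planning range rather than a guarantee; measured usage from the monitoring commands in the next section should take precedence.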

Resource Monitoring

# Monitor resource usage
kubectl top nodes
kubectl top pods -A --sort-by=cpu
kubectl top pods -A --sort-by=memory

# Check resource requests vs limits
kubectl describe nodes | grep -A 5 "Allocated resources"

# Monitor storage usage
df -h /var/lib/kubelet
kubectl get pvc -A

Platform Component Scaling

Each component in the chart accepts a replicas value (operator-facing) and a full sub-chart override block (resources, anti-affinity, PDB, etc.); see Helm Values Reference. Typical production tuning is:

  • Kamaji: kamaji.replicas: 2 or higher, with sub-chart overrides for resource requests/limits and pod-anti-affinity.
  • cert-manager, MetalLB, Kube-OVN, KubeVirt: chart defaults are reasonable; adjust via the corresponding sub-chart override block if you measure resource pressure.

t.b.d.: a worked production-sizing example with concrete numbers will be added to this section.

Tenant Cluster Scaling

User Guide Content

Tenant cluster scaling operations (control plane, worker nodes, auto-scaling) are documented in the User Guide: Scale Clusters.

This section focuses on platform-level scaling. For tenant-specific scaling, refer to the user guide.

Data Layer Scaling

Each tenant cluster references its own Kamaji DataStore CR. The chart does not deploy a shared platform datastore — datastore choice (etcd vs CNPG, replica count, sizing) is per-tenant. See Hosted Control Plane for the model.

t.b.d.: worked datastore sizing guidance per tenant tier will be added to this section.

Monitoring and Observability Scaling

The kMetal umbrella chart does not ship a monitoring stack. If you operate Prometheus / Alertmanager separately, scale it according to that stack's own guidance.

Auto-scaling Implementation

To scale the Kamaji controller, change kamaji.replicas in the chart values and re-run helm upgrade. Running an HPA/VPA against the Kamaji Deployment is not the recommended pattern: once sized, the controller is steady-state, and its load is driven by growth in tenant count rather than by CPU/memory utilization.
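As a rough sketch of that workflow, the helper below wraps the upgrade in a function. The release name, chart reference, and namespace are placeholders that depend on how the chart was installed, and `scale_kamaji` is a hypothetical name; only the kamaji.replicas key comes from this guide.

```shell
#!/usr/bin/env bash
# Set kamaji.replicas on an existing release and keep all other values.
# All four arguments are supplied by the operator; none are kMetal defaults.
scale_kamaji() {
  local release="$1" chart="$2" namespace="$3" replicas="$4"
  helm upgrade "$release" "$chart" \
    --namespace "$namespace" \
    --reuse-values \
    --set "kamaji.replicas=${replicas}"
}

# Usage (hypothetical names):
#   scale_kamaji <release> <chart-ref> <namespace> 3
```

If the values are kept in a versioned values file, editing kamaji.replicas there and passing the file with -f is usually easier to audit than --set.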

t.b.d.: if your environment needs HPA-style autoscaling on any platform Deployment, note that the exact Deployment and container names vary with chart rendering. Resolve them with kubectl get deploy -n <namespace> before authoring the HPA/VPA.

Scaling Monitoring

If you have a Prometheus stack deployed (operator's choice), useful scrape signals include:

  • Per-node CPU/memory utilization (node_cpu_seconds_total, node_memory_*).
  • Per-tenant control plane pod resource usage (container_memory_usage_bytes filtered by namespace="kmetal-kamaji").
  • Tenant count growth (count of tenantcontrolplanes.kamaji.clastix.io resources over time).

Exact metric names depend on the Kamaji and exporter versions you've deployed; verify against your live stack before wiring alerts.
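Tenant count can also be sampled without Prometheus. The sketch below counts TenantControlPlane resources across all namespaces using the CRD name given above; `count_tenants` is an illustrative helper name, and it assumes kubectl is pointed at the under cluster.

```shell
#!/usr/bin/env bash
# Count TenantControlPlane resources in the under cluster.
# With no resources, kubectl writes its notice to stderr, so the count is 0.
count_tenants() {
  kubectl get tenantcontrolplanes.kamaji.clastix.io -A --no-headers 2>/dev/null \
    | wc -l | tr -d ' '
}

# Usage: append a timestamped sample, e.g. from a cron job, to track growth.
#   printf '%s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$(count_tenants)" >> tenant-count.log
```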

t.b.d.: a canonical PromQL ruleset for scaling-related alerts will be added to this section.

Scaling Alerts

Scaling Alerts Configuration

# Scaling alerts
# tenant_clusters_per_node and kamaji_controller_memory_usage are placeholder
# metric names; substitute metrics or recording rules from your own stack.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: scaling-alerts
spec:
  groups:
  - name: scaling
    rules:
    - alert: HighNodeUtilization
      # Busy CPU percentage per node, derived from node_exporter's node_cpu_seconds_total
      expr: 100 * (1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))) > 80
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Node {{ $labels.instance }} CPU utilization is high"
        description: "Node {{ $labels.instance }} has CPU utilization above 80%"

    - alert: HighTenantDensity
      expr: tenant_clusters_per_node > 20
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "High tenant density on node {{ $labels.node }}"
        description: "Node {{ $labels.node }} has more than 20 tenant clusters"

    - alert: KamajiControllerMemoryHigh
      expr: kamaji_controller_memory_usage > 6000000000  # 6 GB, in bytes
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Kamaji controller memory usage is high"
        description: "Kamaji controller memory usage is above 6GB"

Tenant Cluster Autoscaling

Worker scaling inside a tenant cluster is driven by a per-tenant Cluster Autoscaler running in the under cluster, in the tenant's namespace. There is one autoscaler instance per tenant cluster, and it uses that tenant's kubeconfig to watch the tenant's api-server.

The flow:

  1. A workload in the tenant cluster goes Pending because no node has room.
  2. The autoscaler, watching the tenant's api-server, sees the unschedulable pod.
  3. It scales the matching CAPI MachineDeployment (in the under cluster) up by one replica.
  4. CAPK provisions a new KubeVirt VM as a worker node.
  5. The new worker joins the tenant cluster; the Pending pod schedules onto it.

Scale-down works in reverse: when nodes stay underutilized for the configured period, the autoscaler scales the MachineDeployment down and the corresponding worker VM is decommissioned.
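To watch this flow from both sides, something like the following can help. It is a sketch only: the function names are illustrative, the kubeconfig path and namespace are operator-supplied, and it assumes the standard Cluster API MachineDeployment resource is readable in the tenant's namespace of the under cluster.

```shell
#!/usr/bin/env bash
# Tenant side (steps 1-2): list Pending pods using the tenant kubeconfig.
# Pending also covers pods that are scheduled but still starting, so check
# the pod events to confirm a pod is actually unschedulable.
pending_pods() {
  kubectl --kubeconfig "$1" get pods -A \
    --field-selector=status.phase=Pending --no-headers
}

# Under-cluster side (steps 3-5): show MachineDeployment replica counts
# in the tenant's namespace.
pool_replicas() {
  kubectl get machinedeployments.cluster.x-k8s.io -n "$1"
}

# Usage (placeholders):
#   pending_pods <path-to-tenant-kubeconfig>
#   pool_replicas <tenant-namespace>
```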

Multiple node sizes per tenant

A tenant can expose multiple worker shapes by giving each one its own MachineDeployment with a distinct nodeSelector / taints profile. The autoscaler will scale each MachineDeployment independently based on which one matches the pending workload's scheduling constraints. Typical use: a small pool for control-plane-light workloads, a large pool for memory-heavy work, an optional gpu pool when GPU passthrough is in play.

Scale-from-zero

MachineDeployments can be configured with min: 0, so a tenant pool sits at zero VMs when no workload needs it and provisions on demand. This is the right setting for niche pools (rarely-used GPU pool, on-demand batch pool) — it eliminates the cost of idle VMs.
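How the minimum is set depends on how kMetal exposes it. Assuming the pool bounds are read from the annotations used by the upstream Cluster Autoscaler Cluster API provider (an assumption, not something this guide confirms), a sketch of setting them looks like this; `set_pool_bounds` and its arguments are hypothetical.

```shell
#!/usr/bin/env bash
# Set autoscaler min/max on a MachineDeployment via the upstream
# Cluster API provider annotations. Whether kMetal expects these
# annotations or its own values is an assumption to verify first.
set_pool_bounds() {
  local namespace="$1" md="$2" min="$3" max="$4"
  kubectl annotate machinedeployment "$md" -n "$namespace" --overwrite \
    "cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size=${min}" \
    "cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size=${max}"
}

# Usage (placeholders): allow a GPU pool to idle at zero and grow to four.
#   set_pool_bounds <tenant-namespace> <machinedeployment-name> 0 4
```

With a minimum of zero, the autoscaler has no running node to inspect, so upstream also needs the pool's node shape declared some other way; check the provider documentation for how that is supplied in your version.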