Storage Configuration¶

Configure storage for the kMetal platform and tenant clusters.

Platform Storage¶

The under cluster runs two StorageClasses from a single local-path-provisioner Deployment, with different node-path maps:

`StorageClass`	Backing path	Consumer
`local-path`	Control-plane nodes only (e.g., `/var/data/local` on `c1`)	System workloads pinned to CP nodes: Kamaji etcd, CDI scratch, platform-side PVs.
`tenant-storage-class`	Worker nodes only (e.g., `/var/data/tenant` on `w1`, `w2`)	Tenant-served via `kubevirt-csi-driver` — the tenant CSI controllers create DataVolumes here, and the resulting PVCs bind on whichever worker the importer pod runs on (which matches where the consuming tenant worker VM lives).

Both SCs use reclaimPolicy: Delete, volumeBindingMode: WaitForFirstConsumer, and allowVolumeExpansion: true.

HelmRelease shape¶

apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: local-path-provisioner
  namespace: kmetal-flux
spec:
  targetNamespace: kmetal-local-path-storage
  chart:
    spec:
      chart: local-path-provisioner
      version: 0.0.30
  values:
    nodeSelector:
      node-role.kubernetes.io/control-plane: ""   # provisioner pod pinned to CP
    storageClassConfigs:
      local-path:
        storageClass:
          create: true
          defaultClass: false
          defaultVolumeType: hostPath
          reclaimPolicy: Delete
          volumeBindingMode: WaitForFirstConsumer
        nodePathMap:
          - node: c1
            paths:
              - /var/data/local
          - node: DEFAULT_PATH_FOR_NON_LISTED_NODES
            paths: []                                # ⇒ refuse to bind on non-listed nodes
      tenant-storage-class:
        storageClass:
          create: true
          defaultClass: false
          defaultVolumeType: hostPath
          reclaimPolicy: Delete
          volumeBindingMode: WaitForFirstConsumer
        nodePathMap:
          - node: w1
            paths:
              - /var/data/tenant
          - node: w2
            paths:
              - /var/data/tenant
          - node: DEFAULT_PATH_FOR_NON_LISTED_NODES
            paths: []

Path-map split is the security boundary

The nodePathMap for local-path lists only CP nodes; the map for tenant-storage-class lists only workers. DEFAULT_PATH_FOR_NON_LISTED_NODES: [] (empty path list) tells the provisioner to refuse binding on any other node. This is what stops a tenant PVC from accidentally landing on a CP-node disk, and vice versa.
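One way to check the split is to request a local-path volume from a pod pinned to a worker. The manifest below is a sketch under assumptions: the names `pathmap-probe` and `probe`, the `busybox:1.36` image, and the `default` namespace are illustrative, and it assumes the `kubernetes.io/hostname` label on `w1` equals the node name. If the path maps behave as described, the PVC should stay Pending, because the provisioner has no path for `w1` under local-path.

```yaml
# Probe: a local-path claim consumed by a pod scheduled onto worker w1.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pathmap-probe          # hypothetical name
  namespace: default
spec:
  storageClassName: local-path # CP-only class from the HelmRelease above
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: pathmap-probe          # hypothetical name
  namespace: default
spec:
  nodeSelector:
    kubernetes.io/hostname: w1 # assumes the hostname label matches the node name
  containers:
    - name: probe
      image: busybox:1.36      # any small image works
      command: ["sleep", "3600"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: pathmap-probe
```

Running `kubectl describe pvc pathmap-probe` should then show the provisioning events. Delete both objects after the check.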

CDI StorageProfile pin¶

local-path-provisioner supports only ReadWriteOnce (RWO) volumes with volumeMode: Filesystem. The auto-derived StorageProfile already lists exactly that, but pinning it explicitly documents the policy in GitOps and survives a future swap to a multi-capability backend (Rook-Ceph, Longhorn, vendor CSI) where CDI's default picks may be wrong:

apiVersion: cdi.kubevirt.io/v1beta1
kind: StorageProfile
metadata:
  name: tenant-storage-class
spec:
  claimPropertySets:
    - accessModes:
        - ReadWriteOnce
      volumeMode: Filesystem

If you swap tenant-storage-class for a backend that advertises RWX+Block first (such as some SAN-backed CSIs), keep this pin. Without it, CDI picks RWX+Block for blank DataVolume requests, and the importer pod fails with a permission-denied error on the raw block device.
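To see the pin take effect, a blank DataVolume can be created by hand. This is a sketch under assumptions: the name `example-blank` and the 5Gi size are illustrative, and the `alpha` namespace is borrowed from the quota example later on this page. Because it uses `spec.storage` rather than `spec.pvc` and omits accessModes and volumeMode, CDI fills both in from the StorageProfile.

```yaml
# Blank DataVolume that relies on the StorageProfile for access mode and volume mode.
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: example-blank            # hypothetical name
  namespace: alpha               # assumed tenant namespace on the under cluster
spec:
  source:
    blank: {}                    # empty disk, no import source
  storage:
    storageClassName: tenant-storage-class
    resources:
      requests:
        storage: 5Gi
```

With the pin in place, the resulting PVC should request ReadWriteOnce and Filesystem. Since the class uses WaitForFirstConsumer, the DataVolume may wait until a consumer pod is scheduled before it binds.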

Replacing local-path with a production CSI¶

To run tenant-served storage on a vendor CSI driver instead:

1. Install the vendor CSI driver on the under cluster (its own namespace, RBAC, etc.).
2. Create a StorageClass literally named tenant-storage-class pointing at the vendor provisioner.
3. Keep the CDI StorageProfile above, adjusting claimPropertySets to the access mode the vendor SC reports as default.
4. Remove the tenant-storage-class entry from the local-path-provisioner storageClassConfigs so there is no duplicate SC.

The tenant-side kubevirt SC always references infraStorageClassName: tenant-storage-class — it doesn't know which backend serves the SC, so no tenant-cluster change is needed.
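The replacement StorageClass could look like the sketch below. The provisioner name `csi.vendor.example.com` is a placeholder, not a real driver, and any `parameters` block depends entirely on the vendor. The binding mode and expansion settings simply mirror the values this page lists for the bundled classes and may need to change for your backend.

```yaml
# Under cluster: vendor-backed class that reuses the fixed name tenants depend on.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: tenant-storage-class            # must keep this exact name
provisioner: csi.vendor.example.com     # placeholder; use your driver's provisioner name
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer # mirrors the bundled class; adjust if the vendor requires Immediate
allowVolumeExpansion: true              # only if the vendor driver supports expansion
```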

NFS Storage¶

Operator's choice

NFS is not bundled with the kMetal chart. The example below uses nfs-subdir-external-provisioner as a common operator pick; substitute whatever NFS CSI driver your environment runs.

# nfs-provisioner.yaml
# StorageClass served by nfs-subdir-external-provisioner. The provisioner
# value must match PROVISIONER_NAME in the Deployment below.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-storage
provisioner: k8s-sigs.io/nfs-subdir-external-provisioner
parameters:
  archiveOnDelete: "false"
reclaimPolicy: Retain
volumeBindingMode: Immediate
mountOptions:
  - hard
  - nfsvers=4.1
---
# The NFS server and export path are set on the Deployment, not on the
# StorageClass. The nfs-provisioner ServiceAccount and its RBAC must already exist.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nfs-provisioner
  namespace: kube-system
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nfs-provisioner
  template:
    metadata:
      labels:
        app: nfs-provisioner
    spec:
      serviceAccountName: nfs-provisioner
      containers:
      - name: nfs-provisioner
        image: registry.k8s.io/sig-storage/nfs-subdir-external-provisioner:v4.0.2
        volumeMounts:
        - name: nfs-client-root
          mountPath: /persistentvolumes
        env:
        - name: PROVISIONER_NAME
          value: k8s-sigs.io/nfs-subdir-external-provisioner
        - name: NFS_SERVER
          value: nfs-server.company.com
        - name: NFS_PATH
          value: /exports/kubernetes
      volumes:
      - name: nfs-client-root
        nfs:
          server: nfs-server.company.com
          path: /exports/kubernetes

Tenant Cluster Storage¶

Tenants get persistent storage in their cluster via a kubevirt StorageClass that maps onto the under cluster's tenant-storage-class. See Tenant Storage (CSI) for the full split-topology wiring.
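For orientation, the tenant-side objects might look like the sketch below. The class name `kubevirt`, the `bus: scsi` parameter, and the PVC name `app-data` are assumptions based on the upstream kubevirt-csi-driver examples, not confirmed kMetal defaults. If the platform already creates the tenant StorageClass for you, only the PVC is relevant.

```yaml
# Tenant cluster: StorageClass that forwards requests to the under cluster.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: kubevirt                                 # assumed name
provisioner: csi.kubevirt.io
parameters:
  infraStorageClassName: tenant-storage-class    # under-cluster class that serves the volume
  bus: scsi                                      # assumed; follows the upstream example
---
# Tenant cluster: a workload claim against that class.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data                                 # hypothetical name
  namespace: default
spec:
  storageClassName: kubevirt
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
```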

Per-tenant storage quota¶

Cap each tenant's total storage with a ResourceQuota in the tenant's under-cluster namespace, keyed on tenant-storage-class.storageclass.storage.k8s.io/requests.storage:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: alpha-storage
  namespace: alpha
spec:
  hard:
    tenant-storage-class.storageclass.storage.k8s.io/requests.storage: 10Gi

The kmetal-webhook Deployment (installed per tenant by the kmetal-webhook chart with webhook.storageClass=tenant-storage-class) pushes a ValidatingWebhookConfiguration into the tenant cluster pointing back at this quota — so tenants see a clear ResourceQuota … exhausted error on PVC creation rather than a delayed DataVolume failure.

# See what tenants are consuming
kubectl get resourcequota -A -o custom-columns=NS:.metadata.namespace,NAME:.metadata.name,USED:.status.used,HARD:.status.hard \
  | grep -i storage

# Raise a tenant's cap
kubectl patch resourcequota alpha-storage -n alpha --type=merge \
  -p '{"spec":{"hard":{"tenant-storage-class.storageclass.storage.k8s.io/requests.storage":"50Gi"}}}'
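Kubernetes also supports a per-class cap on the number of claims, using the `<class>.storageclass.storage.k8s.io/persistentvolumeclaims` key. The sketch below adds it to the same quota object. The limit of 20 is an arbitrary illustration, and this page does not say whether kmetal-webhook surfaces this key to tenants, so treat it as an under-cluster guard only.

```yaml
# Under cluster: cap both total requested bytes and the number of claims for one tenant.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: alpha-storage
  namespace: alpha
spec:
  hard:
    tenant-storage-class.storageclass.storage.k8s.io/requests.storage: 10Gi
    tenant-storage-class.storageclass.storage.k8s.io/persistentvolumeclaims: "20"  # illustrative limit
```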

CDI on the under cluster¶

KubeVirt's Containerized Data Importer (CDI) handles golden-image imports and DataVolume lifecycle on the under cluster. The CDI operator runs in the system-cdi namespace; it is required for tenant CSI to work because every tenant PVC ultimately becomes a CDI-managed DataVolume on tenant-storage-class. No configuration is needed beyond installing the operator, which the chart handles. The only admin task is the StorageProfile pin shown above.

Distributed Storage¶

Operator's choice

The under cluster ships only local-path-provisioner by default. Distributed storage is the operator's pick — install Rook-Ceph, Longhorn, a vendor CSI driver, or anything else compatible with the under cluster's Kubernetes version. The examples below are illustrative starting points, not kMetal-specific configuration.

Rook-Ceph¶

# rook-ceph-cluster.yaml
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  cephVersion:
    image: quay.io/ceph/ceph:v18.2.0
  dataDirHostPath: /var/lib/rook
  mon:
    count: 3
    allowMultiplePerNode: false
  mgr:
    count: 2
    allowMultiplePerNode: false
  dashboard:
    enabled: true
    ssl: true
  storage:
    useAllNodes: true
    useAllDevices: false
    deviceFilter: "^sd[b-z]"
    config:
      osdsPerDevice: "1"
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-ceph-block
provisioner: rook-ceph.rbd.csi.ceph.com
parameters:
  clusterID: rook-ceph
  pool: replicapool
  imageFormat: "2"
  imageFeatures: layering
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
  csi.storage.k8s.io/fstype: ext4
allowVolumeExpansion: true
reclaimPolicy: Delete
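The StorageClass above references `pool: replicapool`, which has to exist as a CephBlockPool before volumes can be provisioned. A minimal sketch follows. The replica count of 3 and the `host` failure domain are assumptions that fit the three-monitor cluster shown here; adjust them to your node count.

```yaml
# Block pool backing the rook-ceph-block StorageClass.
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool      # must match the pool parameter in the StorageClass
  namespace: rook-ceph
spec:
  failureDomain: host    # spread replicas across nodes
  replicated:
    size: 3              # assumed; needs at least three OSD hosts
```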

Longhorn¶

# longhorn-storage.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Delete
volumeBindingMode: Immediate
parameters:
  numberOfReplicas: "3"
  staleReplicaTimeout: "2880"
  fromBackup: ""
  fsType: "ext4"
  dataLocality: "disabled"

Object Storage¶

MinIO¶

# minio-deployment.yaml
# Four-node distributed MinIO. Requires a headless Service named minio and a
# minio-credentials Secret in the same namespace.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: minio
  namespace: minio-system
spec:
  serviceName: minio
  replicas: 4
  selector:
    matchLabels:
      app: minio
  template:
    metadata:
      labels:
        app: minio
    spec:
      containers:
      - name: minio
        image: quay.io/minio/minio:latest   # pin a specific release tag for production
        args:
        - server
        - --console-address
        - ":9001"
        - http://minio-{0...3}.minio.minio-system.svc.cluster.local/data
        env:
        - name: MINIO_ROOT_USER
          valueFrom:
            secretKeyRef:
              name: minio-credentials
              key: rootUser
        - name: MINIO_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: minio-credentials
              key: rootPassword
        ports:
        - containerPort: 9000
          name: api
        - containerPort: 9001
          name: console
        volumeMounts:
        - name: data
          mountPath: /data
        resources:
          requests:
            cpu: "500m"
            memory: "1Gi"
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      # local-path binds only on the nodes listed in its nodePathMap; use a
      # class that spans several nodes if the replicas should be spread out
      storageClassName: local-path
      resources:
        requests:
          storage: 100Gi
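The StatefulSet depends on two objects it does not define: a headless Service named `minio` (so the `minio-{0...3}.minio.minio-system.svc.cluster.local` hostnames resolve) and the `minio-credentials` Secret. A sketch of both follows. The credential values are placeholders, and in a GitOps setup the Secret would normally come from your secret-management tooling rather than a plain manifest.

```yaml
# Headless Service that gives each MinIO pod a stable DNS name.
apiVersion: v1
kind: Service
metadata:
  name: minio                # must match spec.serviceName in the StatefulSet
  namespace: minio-system
spec:
  clusterIP: None            # headless: DNS returns the pod addresses directly
  selector:
    app: minio
  ports:
    - name: api
      port: 9000
    - name: console
      port: 9001
---
# Root credentials consumed through secretKeyRef in the StatefulSet.
apiVersion: v1
kind: Secret
metadata:
  name: minio-credentials
  namespace: minio-system
type: Opaque
stringData:
  rootUser: "<minio-root-user>"          # placeholder
  rootPassword: "<minio-root-password>"  # placeholder
```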

Backup Storage¶

Operator's choice

kMetal does not bundle a backup tool. The examples below show Velero — a common operator choice — configured against S3-compatible object storage. Adapt to whatever backup tooling your environment uses.

Velero with S3¶

# velero-backup-location.yaml
apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
  name: default
  namespace: velero
spec:
  provider: aws
  objectStorage:
    bucket: kmetal-backups
    prefix: velero
  config:
    region: us-west-2
    s3ForcePathStyle: "false"
    s3Url: https://s3.us-west-2.amazonaws.com
---
apiVersion: velero.io/v1
kind: VolumeSnapshotLocation
metadata:
  name: default
  namespace: velero
spec:
  provider: aws
  config:
    region: us-west-2

Velero with MinIO¶

# velero-minio-backup.yaml
apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
  name: minio
  namespace: velero
spec:
  provider: aws
  objectStorage:
    bucket: velero
  config:
    region: minio
    s3ForcePathStyle: "true"
    s3Url: http://minio.minio-system:9000
    publicUrl: https://minio.company.com
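Once a BackupStorageLocation is available, backups are usually driven by a Schedule. The sketch below is illustrative only: the name `nightly-alpha`, the 02:00 cron expression, the seven-day TTL, and the choice of the `alpha` namespace are assumptions, not kMetal defaults.

```yaml
# Nightly backup of one tenant namespace into the MinIO-backed location.
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: nightly-alpha        # hypothetical name
  namespace: velero
spec:
  schedule: "0 2 * * *"      # cron syntax: every day at 02:00
  template:
    includedNamespaces:
      - alpha                # assumed tenant namespace
    storageLocation: minio   # the BackupStorageLocation defined above
    ttl: 168h0m0s            # keep each backup for seven days
```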

Storage Performance¶

Benchmark Storage¶

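# Deploy fio for benchmarking. --command overrides the image entrypoint so the pod starts a shell.
# This pod mounts no PVC: mount a claim from the StorageClass under test at /data
# before running the jobs below, otherwise they do not exercise that storage.
kubectl run fio-benchmark --rm -it --image=ljishen/fio --command -- /bin/bash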

# Run sequential write test
fio --name=seqwrite --rw=write --bs=1M --size=1G --numjobs=1 --runtime=60 --time_based --filename=/data/test

# Run random write test
fio --name=randwrite --rw=randwrite --bs=4K --size=1G --numjobs=4 --runtime=60 --time_based --filename=/data/test

# Run read test
fio --name=read --rw=read --bs=1M --size=1G --numjobs=1 --runtime=60 --time_based --filename=/data/test
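For the fio jobs above to measure a StorageClass, a PVC from that class has to be mounted at /data. A sketch of such a pod is shown below. The names `fio-data` and `fio-benchmark`, the 5Gi size, and the choice of tenant-storage-class are assumptions. Benchmarking local-path instead would likely also need a toleration for the control-plane taint, since that class binds only on CP nodes.

```yaml
# Claim from the class under test.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fio-data                           # hypothetical name
  namespace: default
spec:
  storageClassName: tenant-storage-class   # swap for the class you want to measure
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
---
# Idle pod with the claim mounted at /data; run fio through kubectl exec.
apiVersion: v1
kind: Pod
metadata:
  name: fio-benchmark
  namespace: default
spec:
  restartPolicy: Never
  containers:
    - name: fio
      image: ljishen/fio
      command: ["sleep", "3600"]           # override the entrypoint and keep the pod alive
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: fio-data
```

After the pod is Running, the jobs can be started with `kubectl exec -it fio-benchmark -- fio ...` using the same flags as above.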

Monitor Storage Usage¶

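# List PVs with capacity, class, and bound claim
kubectl get pv

# List PVCs in all namespaces
kubectl get pvc -A

# Bytes used per mounted PVC on a node (kubectl top has no pv subcommand;
# values appear only for volume types that report stats to the kubelet)
kubectl get --raw /api/v1/nodes/<node-name>/proxy/stats/summary | \
  jq '.pods[].volume[]? | select(.pvcRef) | {pvc: .pvcRef.name, namespace: .pvcRef.namespace, usedBytes, capacityBytes}'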

# Per-node storage usage
kubectl get --raw /api/v1/nodes/<node-name>/proxy/stats/summary | \
  jq '.node.fs'

Storage Troubleshooting¶

Debug PV/PVC Issues¶

# Check PVC status
kubectl get pvc -A
kubectl describe pvc <pvc-name> -n <namespace>

# Check PV binding
kubectl get pv
kubectl describe pv <pv-name>

# Check storage class
kubectl get storageclass
kubectl describe storageclass <storage-class-name>

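# Check provisioner logs (generic form; adjust namespace and label to your provisioner)
kubectl logs -n kube-system -l app=<provisioner-name> -f

# Bundled local-path-provisioner: runs in the HelmRelease targetNamespace.
# The label selector assumes the chart's default labels.
kubectl logs -n kmetal-local-path-storage -l app.kubernetes.io/name=local-path-provisioner -f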

CSI Driver Debugging¶

The kubevirt-csi-driver runs in a split topology — the controller is a Deployment in the under cluster's tenant namespace (one per tenant cluster), and the node DaemonSet runs inside the tenant cluster.

# --- Controller side (under cluster) ---
# One Deployment per tenant cluster, named kubevirt-csi-controller-<cluster>
kubectl get deployment -A -l app=kubevirt-csi-controller
kubectl get pods -A -l app=kubevirt-csi-controller

# Controller logs (sidecars: csi-provisioner, csi-attacher, csi-snapshotter, csi-resizer, csi-driver)
kubectl logs -n <tenant-namespace> deployment/kubevirt-csi-controller-<cluster> -c csi-provisioner
kubectl logs -n <tenant-namespace> deployment/kubevirt-csi-controller-<cluster> -c csi-driver

# --- Node side (tenant cluster) ---
kubectl --kubeconfig=<tenant>.kubeconfig -n kube-system get ds kubevirt-csi-node
kubectl --kubeconfig=<tenant>.kubeconfig -n kube-system logs -l app=kubevirt-csi-node -c csi-driver

# Volume attachments (tenant cluster)
kubectl --kubeconfig=<tenant>.kubeconfig get volumeattachment

For non-kubevirt-csi drivers (NFS, Rook-Ceph, vendor CSI, etc.), the controller usually runs in kube-system or a vendor-specific namespace on the under cluster — adapt the selectors accordingly.

Storage Migration¶

Requires a snapshot-capable CSI driver

local-path-provisioner (the chart default) does not support snapshots. To use the example below, install a vendor CSI driver that ships a VolumeSnapshotClass, then set volumeSnapshotClassName and storageClassName to the classes your driver provides.

# Create snapshot — replace volumeSnapshotClassName with a class your CSI driver provides
kubectl create -f - <<EOF
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: my-snapshot
  namespace: default
spec:
  volumeSnapshotClassName: <your-csi-snapshot-class>
  source:
    persistentVolumeClaimName: my-pvc
EOF

# Restore from snapshot. The PVC must be in the same namespace as the snapshot,
# and storageClassName must name a class served by the snapshot-capable CSI
# driver (local-path cannot restore from a snapshot)
kubectl create -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: restored-pvc
  namespace: default
spec:
  storageClassName: <your-csi-storage-class>
  dataSource:
    name: my-snapshot
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
EOF
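If your CSI driver does not install a VolumeSnapshotClass of its own, one can be defined by hand. The sketch below uses placeholder values: `example-snapclass` and `csi.vendor.example.com` are hypothetical, and the driver field must match the provisioner name of the StorageClass that backs the source PVC.

```yaml
# Snapshot class for a snapshot-capable CSI driver.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: example-snapclass          # hypothetical; use as volumeSnapshotClassName
driver: csi.vendor.example.com     # placeholder; must equal the StorageClass provisioner
deletionPolicy: Delete             # remove the backend snapshot with the VolumeSnapshot
```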