Storage Configuration
Configure storage for kMetal platform and tenant clusters.
Platform Storage
The under cluster runs two StorageClasses from a single local-path-provisioner Deployment, with different node-path maps:

| StorageClass | Backing path | Consumer |
|---|---|---|
| local-path | Control-plane nodes only (e.g., /var/data/local on c1) | System workloads pinned to CP nodes: Kamaji etcd, CDI scratch, platform-side PVs. |
| tenant-storage-class | Worker nodes only (e.g., /var/data/tenant on w1, w2) | Tenant-served via kubevirt-csi-driver: the tenant CSI controllers create DataVolumes here, and the resulting PVCs bind on whichever worker the importer pod runs on (which matches where the consuming tenant worker VM lives). |
Both SCs use reclaimPolicy: Delete, volumeBindingMode: WaitForFirstConsumer, and allowVolumeExpansion: true.
HelmRelease shape
```yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: local-path-provisioner
  namespace: kmetal-flux
spec:
  # interval and chart.spec.sourceRef (both required by Flux) are omitted from this excerpt
  targetNamespace: kmetal-local-path-storage
  chart:
    spec:
      chart: local-path-provisioner
      version: 0.0.30
  values:
    nodeSelector:
      node-role.kubernetes.io/control-plane: ""   # provisioner pod pinned to CP
    storageClassConfigs:
      local-path:
        storageClass:
          create: true
          defaultClass: false
          defaultVolumeType: hostPath
          reclaimPolicy: Delete
          volumeBindingMode: WaitForFirstConsumer
        nodePathMap:
          - node: c1
            paths:
              - /var/data/local
          - node: DEFAULT_PATH_FOR_NON_LISTED_NODES
            paths: []   # ⇒ refuse to bind on non-listed nodes
      tenant-storage-class:
        storageClass:
          create: true
          defaultClass: false
          defaultVolumeType: hostPath
          reclaimPolicy: Delete
          volumeBindingMode: WaitForFirstConsumer
        nodePathMap:
          - node: w1
            paths:
              - /var/data/tenant
          - node: w2
            paths:
              - /var/data/tenant
          - node: DEFAULT_PATH_FOR_NON_LISTED_NODES
            paths: []
```
Path-map split is the security boundary
The nodePathMap for local-path lists only CP nodes, and the map for tenant-storage-class lists only workers. A DEFAULT_PATH_FOR_NON_LISTED_NODES entry with paths: [] (an empty path list) tells the provisioner to refuse to provision on any node that is not listed. This stops a tenant PVC from accidentally landing on a CP-node disk, and vice versa.
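The lookup rule behind this boundary can be modeled in a few lines of bash. This is a minimal sketch, not part of local-path-provisioner: resolve_node_paths and the two map variables are hypothetical, and the maps are hard-coded from the HelmRelease above. A listed node uses its own paths, and an unlisted node falls back to DEFAULT_PATH_FOR_NON_LISTED_NODES. An empty list, or no fallback at all, means the provisioner refuses.

```bash
#!/usr/bin/env bash
# Hypothetical model of local-path-provisioner's nodePathMap lookup (not part of
# the provisioner). Maps mirror the HelmRelease values; "" stands for paths: [].
# Needs bash 4.3+ (associative arrays and namerefs).

declare -A LOCAL_PATH_MAP=(
  [c1]="/var/data/local"
  [DEFAULT_PATH_FOR_NON_LISTED_NODES]=""
)
declare -A TENANT_MAP=(
  [w1]="/var/data/tenant"
  [w2]="/var/data/tenant"
  [DEFAULT_PATH_FOR_NON_LISTED_NODES]=""
)

# resolve_node_paths <map-variable-name> <node>
# Prints the path to use on <node>, or returns 1 where the provisioner refuses.
resolve_node_paths() {
  local -n map="$1"
  local node="$2" paths=""
  if [[ -n "${map[$node]+set}" ]]; then
    paths="${map[$node]}"                                  # node listed explicitly
  elif [[ -n "${map[DEFAULT_PATH_FOR_NON_LISTED_NODES]+set}" ]]; then
    paths="${map[DEFAULT_PATH_FOR_NON_LISTED_NODES]}"      # fallback entry
  fi
  # Empty result: listed with paths: [], or no usable fallback entry.
  if [[ -z "$paths" ]]; then
    echo "refuse: no path for node $node" >&2
    return 1
  fi
  echo "$paths"
}
```

For example, resolve_node_paths TENANT_MAP w1 prints /var/data/tenant. Both resolve_node_paths TENANT_MAP c1 and resolve_node_paths LOCAL_PATH_MAP w1 return 1, which models the CP/worker separation described above.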
CDI StorageProfile pin
local-path only supports RWO + Filesystem. The auto-derived StorageProfile already lists exactly that, but pinning it explicitly documents the policy in GitOps and survives a future swap to a multi-capability backend (Rook-Ceph, Longhorn, vendor CSI) where CDI's default picks may be wrong:
```yaml
apiVersion: cdi.kubevirt.io/v1beta1
kind: StorageProfile
metadata:
  name: tenant-storage-class
spec:
  claimPropertySets:
    - accessModes:
        - ReadWriteOnce
      volumeMode: Filesystem
```
If you swap tenant-storage-class for a backend that advertises RWX + Block first (as some SAN-backed CSI drivers do), keep this pin. Without it, CDI picks RWX + Block for blank DataVolume requests, and the importer pod fails with a permission-denied error on the raw block device.
Replacing local-path with a production CSI
To run tenant-served storage on a vendor CSI driver instead:
- Install the vendor CSI driver on the under cluster (its own namespace, RBAC, etc.).
- Remove the tenant-storage-class entry from the local-path-provisioner storageClassConfigs. StorageClass names are unique and a StorageClass's provisioner field cannot be changed in place, so the local-path-backed class must be gone before the vendor-backed one is created. Volumes already provisioned by local-path stay where they are; this swap does not move data.
- Create a StorageClass literally named tenant-storage-class pointing at the vendor provisioner.
- Keep the CDI StorageProfile above, adjusting claimPropertySets to the access mode the vendor SC reports as default.
The tenant-side kubevirt SC always references infraStorageClassName: tenant-storage-class — it doesn't know which backend serves the SC, so no tenant-cluster change is needed.
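The two objects that must line up after the swap are the StorageClass named tenant-storage-class and its StorageProfile pin. Below is a minimal sketch of a renderer for both. render_tenant_storage is hypothetical, and the provisioner name passed in (csi.example.com in the test) is a placeholder for whatever your vendor driver registers. Driver-specific StorageClass parameters still have to be added by hand.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints a tenant-storage-class StorageClass and the
# matching CDI StorageProfile pin for a vendor CSI provisioner.
# Usage: render_tenant_storage <provisioner> [accessMode] [volumeMode]

render_tenant_storage() {
  local provisioner="$1"
  local access_mode="${2:-ReadWriteOnce}"
  local volume_mode="${3:-Filesystem}"

  if [[ -z "$provisioner" ]]; then
    echo "provisioner name is required" >&2
    return 1
  fi
  # Reject values Kubernetes does not accept for these fields.
  case "$access_mode" in
    ReadWriteOnce|ReadOnlyMany|ReadWriteMany|ReadWriteOncePod) ;;
    *) echo "unsupported access mode: $access_mode" >&2; return 1 ;;
  esac
  case "$volume_mode" in
    Filesystem|Block) ;;
    *) echo "unsupported volume mode: $volume_mode" >&2; return 1 ;;
  esac

  cat <<EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: tenant-storage-class
provisioner: ${provisioner}
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true   # only if the driver supports expansion
---
apiVersion: cdi.kubevirt.io/v1beta1
kind: StorageProfile
metadata:
  name: tenant-storage-class
spec:
  claimPropertySets:
    - accessModes:
        - ${access_mode}
      volumeMode: ${volume_mode}
EOF
}
```

Once the local-path-backed class is gone, a typical use would be render_tenant_storage csi.example.com ReadWriteOnce Block | kubectl apply -f -. Invalid access or volume modes fail before anything reaches the cluster.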
NFS Storage
Operator's choice
NFS is not bundled with the kMetal chart. The example below uses nfs-subdir-external-provisioner as a common operator pick; substitute whatever NFS provisioner or CSI driver (for example csi-driver-nfs) your environment runs.
```yaml
# nfs-provisioner.yaml
# nfs-subdir-external-provisioner: the StorageClass "provisioner" must equal the
# Deployment's PROVISIONER_NAME. The NFS server and export are configured on the
# Deployment; this provisioner does not read server/share StorageClass parameters.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-storage
provisioner: k8s-sigs.io/nfs-subdir-external-provisioner
reclaimPolicy: Retain
volumeBindingMode: Immediate
mountOptions:
  - hard
  - nfsvers=4.1
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nfs-provisioner
  namespace: kube-system
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nfs-provisioner
  template:
    metadata:
      labels:
        app: nfs-provisioner
    spec:
      serviceAccountName: nfs-provisioner   # ServiceAccount and RBAC not shown
      containers:
        - name: nfs-provisioner
          image: registry.k8s.io/sig-storage/nfs-subdir-external-provisioner:v4.0.2
          volumeMounts:
            - name: nfs-client-root
              mountPath: /persistentvolumes
          env:
            - name: PROVISIONER_NAME
              value: k8s-sigs.io/nfs-subdir-external-provisioner
            - name: NFS_SERVER
              value: nfs-server.company.com
            - name: NFS_PATH
              value: /exports/kubernetes
      volumes:
        - name: nfs-client-root
          nfs:
            server: nfs-server.company.com
            path: /exports/kubernetes
```
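An external provisioner only acts on PVCs whose StorageClass provisioner field equals the name it registers (PROVISIONER_NAME here). If the two drift apart, PVCs stay Pending, waiting for a provisioner that never picks them up. Below is a rough pre-apply check, assuming the one-StorageClass, one-Deployment layout of nfs-provisioner.yaml. check_nfs_provisioner_name is hypothetical and matches lines with awk rather than parsing YAML, so quoted values or unusual layouts can confuse it.

```bash
#!/usr/bin/env bash
# Hypothetical lint: compare the StorageClass "provisioner:" value with the
# PROVISIONER_NAME env var in the same multi-document manifest.
# Line-oriented awk, not a YAML parser.

check_nfs_provisioner_name() {
  local file="$1" sc_name env_name
  # Top-level "provisioner:" key (the StorageClass field, no indentation).
  sc_name="$(awk '/^provisioner:/ {print $2; exit}' "$file")"
  # Value on the line that follows "- name: PROVISIONER_NAME".
  env_name="$(awk '/name: PROVISIONER_NAME/ {getline; print $2; exit}' "$file")"
  if [[ -z "$sc_name" || -z "$env_name" ]]; then
    echo "could not find both provisioner fields in $file" >&2
    return 2
  fi
  if [[ "$sc_name" != "$env_name" ]]; then
    echo "mismatch: StorageClass uses '$sc_name', Deployment sets '$env_name'" >&2
    return 1
  fi
  echo "ok: $sc_name"
}
```

Running check_nfs_provisioner_name nfs-provisioner.yaml before kubectl apply catches a renamed provisioner on one side only.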
Tenant Cluster Storage
Tenants get persistent storage in their cluster via a kubevirt StorageClass that maps onto the under cluster's tenant-storage-class. See Tenant Storage (CSI) for the full split-topology wiring.
Per-tenant storage quota
Cap each tenant's total storage with a ResourceQuota in the tenant's under-cluster namespace, keyed on tenant-storage-class.storageclass.storage.k8s.io/requests.storage:
```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: alpha-storage
  namespace: alpha
spec:
  hard:
    tenant-storage-class.storageclass.storage.k8s.io/requests.storage: 10Gi
```
The kmetal-webhook Deployment (installed per tenant by the kmetal-webhook chart with webhook.storageClass=tenant-storage-class) pushes a ValidatingWebhookConfiguration into the tenant cluster pointing back at this quota — so tenants see a clear ResourceQuota … exhausted error on PVC creation rather than a delayed DataVolume failure.
```bash
# See what tenants are consuming
kubectl get resourcequota -A -o custom-columns=NS:.metadata.namespace,NAME:.metadata.name,USED:.status.used,HARD:.status.hard \
  | grep -i storage
# Raise a tenant's cap
kubectl patch resourcequota alpha-storage -n alpha --type=merge \
  -p '{"spec":{"hard":{"tenant-storage-class.storageclass.storage.k8s.io/requests.storage":"50Gi"}}}'
```
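The quota key is always <storageclass-name>.storageclass.storage.k8s.io/requests.storage, so onboarding a tenant only means filling in a namespace and a cap. Below is a minimal sketch with two parts. The first renders a quota following the <namespace>-storage naming used for alpha above. The second turns the used/hard quantities from the quota status into a percentage. render_tenant_quota, to_bytes and percent_used are hypothetical helpers. to_bytes handles plain integers plus the Ki..Ti and k..T suffixes, not fractional or exponent forms.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for per-tenant storage quotas on tenant-storage-class.

SC_NAME="tenant-storage-class"

# render_tenant_quota <namespace> <size>  -> ResourceQuota YAML on stdout
render_tenant_quota() {
  local ns="$1" size="$2"
  cat <<EOF
apiVersion: v1
kind: ResourceQuota
metadata:
  name: ${ns}-storage
  namespace: ${ns}
spec:
  hard:
    ${SC_NAME}.storageclass.storage.k8s.io/requests.storage: ${size}
EOF
}

# to_bytes <quantity>  -> integer bytes (integers with optional Ki..Ti or k..T)
to_bytes() {
  local q="$1" num unit re='^([0-9]+)([A-Za-z]*)$'
  if [[ "$q" =~ $re ]]; then
    num=$(( 10#${BASH_REMATCH[1]} ))   # force base 10 (no octal on leading zeros)
    unit="${BASH_REMATCH[2]}"
  else
    echo "unsupported quantity: $q" >&2; return 1
  fi
  case "$unit" in
    "") echo "$num" ;;
    Ki) echo $(( num * 1024 )) ;;
    Mi) echo $(( num * 1024**2 )) ;;
    Gi) echo $(( num * 1024**3 )) ;;
    Ti) echo $(( num * 1024**4 )) ;;
    k)  echo $(( num * 1000 )) ;;
    M)  echo $(( num * 1000**2 )) ;;
    G)  echo $(( num * 1000**3 )) ;;
    T)  echo $(( num * 1000**4 )) ;;
    *)  echo "unsupported unit: $unit" >&2; return 1 ;;
  esac
}

# percent_used <used> <hard>  -> integer percentage, rounded down
percent_used() {
  local used hard
  used="$(to_bytes "$1")" || return 1
  hard="$(to_bytes "$2")" || return 1
  (( hard > 0 )) || { echo "hard limit must be > 0" >&2; return 1; }
  echo $(( used * 100 / hard ))
}
```

For example, render_tenant_quota beta 20Gi | kubectl apply -f - creates beta-storage in namespace beta. percent_used 7Gi 10Gi prints 70.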
CDI on the under cluster
KubeVirt's Containerized Data Importer (CDI) handles golden-image imports and the DataVolume lifecycle on the under cluster. The CDI operator runs in the system-cdi namespace. It is required for tenant CSI to work, because every tenant PVC ultimately becomes a CDI-managed DataVolume on tenant-storage-class. Beyond installing the operator (the chart handles that), the only admin work is the StorageProfile pin shown above.
Distributed Storage
Operator's choice
The under cluster ships only local-path-provisioner by default. Distributed storage is the operator's pick — install Rook-Ceph, Longhorn, a vendor CSI driver, or anything else compatible with the under cluster's Kubernetes version. The examples below are illustrative starting points, not kMetal-specific configuration.
Rook-Ceph
```yaml
# rook-ceph-cluster.yaml
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  cephVersion:
    image: quay.io/ceph/ceph:v18.2.0
  dataDirHostPath: /var/lib/rook
  mon:
    count: 3
    allowMultiplePerNode: false
  mgr:
    count: 2
    allowMultiplePerNode: false
  dashboard:
    enabled: true
    ssl: true
  storage:
    useAllNodes: true
    useAllDevices: false
    deviceFilter: "^sd[b-z]"
    config:
      osdsPerDevice: "1"
---
# Pool referenced by the StorageClass below (3 replicas, one per host)
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool
  namespace: rook-ceph
spec:
  failureDomain: host
  replicated:
    size: 3
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-ceph-block
provisioner: rook-ceph.rbd.csi.ceph.com
parameters:
  clusterID: rook-ceph
  pool: replicapool
  imageFormat: "2"
  imageFeatures: layering
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  # Needed for allowVolumeExpansion to work
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
  csi.storage.k8s.io/fstype: ext4
allowVolumeExpansion: true
reclaimPolicy: Delete
```
Longhorn
```yaml
# longhorn-storage.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Delete
volumeBindingMode: Immediate
parameters:
  numberOfReplicas: "3"
  staleReplicaTimeout: "2880"
  fromBackup: ""
  fsType: "ext4"
  dataLocality: "disabled"
```
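Object Storage
MinIO
```yaml
# minio-deployment.yaml
# Headless Service: gives each StatefulSet pod a stable DNS name
# (minio-0.minio.minio-system.svc.cluster.local, ...).
apiVersion: v1
kind: Service
metadata:
  name: minio
  namespace: minio-system
spec:
  clusterIP: None
  selector:
    app: minio
  ports:
    - name: api
      port: 9000
    - name: console
      port: 9001
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: minio
  namespace: minio-system
spec:
  serviceName: minio
  replicas: 4
  podManagementPolicy: Parallel   # start all peers together; distributed MinIO waits for its peers
  selector:
    matchLabels:
      app: minio
  template:
    metadata:
      labels:
        app: minio
    spec:
      containers:
        - name: minio
          image: quay.io/minio/minio:latest   # pin a specific release tag in production
          args:
            - server
            - --console-address
            - ":9001"
            - http://minio-{0...3}.minio.minio-system.svc.cluster.local/data
          env:
            # Reads the minio-credentials Secret (keys rootUser, rootPassword), not shown
            - name: MINIO_ROOT_USER
              valueFrom:
                secretKeyRef:
                  name: minio-credentials
                  key: rootUser
            - name: MINIO_ROOT_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: minio-credentials
                  key: rootPassword
          ports:
            - containerPort: 9000
              name: api
            - containerPort: 9001
              name: console
          volumeMounts:
            - name: data
              mountPath: /data
          resources:
            requests:
              cpu: "500m"
              memory: "1Gi"
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: [ "ReadWriteOnce" ]
        # local-path only provisions on nodes in its nodePathMap (CP nodes in the
        # HelmRelease above), so these pods can only run there; with a single
        # listed node all four drives share one disk. Prefer a distributed class.
        storageClassName: local-path
        resources:
          requests:
            storage: 100Gi
```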
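Four replicas at 100Gi each give 400Gi of raw capacity. MinIO erasure-codes objects across the drives, however, so usable capacity depends on the parity setting. Below is a small calculator for a single erasure set of N equal drives with parity P (usable ≈ drive size × (N − P), ignoring metadata overhead). minio_usable is hypothetical, and the parity value is an assumption to check against your MinIO release. MinIO's documentation lists EC:2 as the default for a 4-drive set, and the MINIO_STORAGE_CLASS_STANDARD setting overrides the default.

```bash
#!/usr/bin/env bash
# Hypothetical estimate of usable capacity for a single MinIO erasure set:
#   usable = drive_size * (drives - parity)
# Ignores metadata overhead and assumes equal-sized drives.

# minio_usable <drives> <drive_size> <parity>  (size in any one unit, e.g. GiB)
minio_usable() {
  local drives="$1" size="$2" parity="$3" v
  for v in "$drives" "$size" "$parity"; do
    [[ "$v" =~ ^[0-9]+$ ]] || { echo "not a non-negative integer: $v" >&2; return 1; }
  done
  # MinIO documents a maximum parity of half the drives in the set.
  if (( 10#$parity * 2 > 10#$drives )); then
    echo "parity $parity is too high for $drives drives" >&2
    return 1
  fi
  echo $(( 10#$size * (10#$drives - 10#$parity) ))
}
```

With the StatefulSet above and EC:2, minio_usable 4 100 2 prints 200 (GiB). With EC:1 the result would be 300, at the cost of tolerating one fewer lost drive.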
Backup Storage
Operator's choice
kMetal does not bundle a backup tool. The examples below show Velero — a common operator choice — configured against S3-compatible object storage. Adapt to whatever backup tooling your environment uses.
Velero with S3
```yaml
# velero-backup-location.yaml
apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
  name: default
  namespace: velero
spec:
  provider: aws
  objectStorage:
    bucket: kmetal-backups
    prefix: velero
  config:
    region: us-west-2
    s3ForcePathStyle: "false"
    s3Url: https://s3.us-west-2.amazonaws.com
---
# The aws VolumeSnapshotter only snapshots EBS-backed volumes
apiVersion: velero.io/v1
kind: VolumeSnapshotLocation
metadata:
  name: default
  namespace: velero
spec:
  provider: aws
  config:
    region: us-west-2
```
Velero with MinIO
```yaml
# velero-minio-backup.yaml
apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
  name: minio
  namespace: velero
spec:
  provider: aws
  objectStorage:
    bucket: velero
  config:
    region: minio
    s3ForcePathStyle: "true"
    s3Url: http://minio.minio-system:9000
    publicUrl: https://minio.company.com
```
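With a BackupStorageLocation in place, recurring backups are driven by a Velero Schedule object whose spec.template is a Backup spec. Below is a minimal sketch that renders one Schedule per tenant namespace. render_velero_schedule, the -daily name suffix, the 02:00 cron default and the 30-day TTL are all illustrative assumptions. Pass default or minio as the location to match whichever BackupStorageLocation you created.

```bash
#!/usr/bin/env bash
# Hypothetical renderer for a daily Velero Schedule covering one namespace.
# Name suffix, default cron (02:00 daily) and 30-day TTL are assumptions.

# render_velero_schedule <namespace> [storage-location] [cron]
render_velero_schedule() {
  local ns="$1" location="${2:-default}" cron="${3:-0 2 * * *}"
  if [[ -z "$ns" ]]; then
    echo "namespace is required" >&2
    return 1
  fi
  cat <<EOF
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: ${ns}-daily
  namespace: velero
spec:
  schedule: "${cron}"
  template:
    includedNamespaces:
      - ${ns}
    storageLocation: ${location}
    ttl: 720h0m0s
EOF
}
```

For example, render_velero_schedule alpha minio | kubectl apply -f - schedules nightly backups of the alpha namespace to the MinIO location.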
Storage Performance
Benchmark Storage
```bash
# Start a throwaway shell in the fio image. Without a mounted PVC, /data is the
# container's writable layer, so these numbers do not reflect any StorageClass.
kubectl run fio-benchmark --rm -it --image=ljishen/fio --command -- /bin/sh
mkdir -p /data
# Run sequential write test
fio --name=seqwrite --rw=write --bs=1M --size=1G --numjobs=1 --runtime=60 --time_based --filename=/data/test
# Run random write test
fio --name=randwrite --rw=randwrite --bs=4K --size=1G --numjobs=4 --runtime=60 --time_based --filename=/data/test
# Run sequential read test
fio --name=read --rw=read --bs=1M --size=1G --numjobs=1 --runtime=60 --time_based --filename=/data/test
```
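To measure a StorageClass rather than the container layer, /data has to be a mounted PVC. Below is a sketch that renders a PVC on a given class plus a one-shot Pod that runs a single fio job against it. render_fio_bench is hypothetical. It assumes the ljishen/fio image has fio on its PATH, that a 2Gi claim is enough for the 1G test file, and that bypassing the page cache with --direct=1 is wanted. The optional node argument pins the pod, for example to a node listed in the class's nodePathMap. The toleration lets it land on tainted control-plane nodes, where local-path lives.

```bash
#!/usr/bin/env bash
# Hypothetical renderer: a PVC on <storage-class> plus a one-shot fio Pod that
# writes to it, so results reflect that class rather than the container layer.

# render_fio_bench <storage-class> [rw] [block-size] [node]
render_fio_bench() {
  local sc="$1" rw="${2:-write}" bs="${3:-1M}" node="${4:-}"
  local node_selector=""
  if [[ -z "$sc" ]]; then
    echo "storage class is required" >&2
    return 1
  fi
  # Optional pin, e.g. to a node listed in the class's nodePathMap.
  if [[ -n "$node" ]]; then
    node_selector="  nodeSelector:
    kubernetes.io/hostname: ${node}"
  fi
  cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fio-bench
spec:
  storageClassName: ${sc}
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 2Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: fio-bench
spec:
  restartPolicy: Never
${node_selector}
  tolerations:
    # Allows scheduling onto tainted control-plane nodes (where local-path lives).
    - key: node-role.kubernetes.io/control-plane
      operator: Exists
      effect: NoSchedule
  containers:
    - name: fio
      image: ljishen/fio
      command: ["fio"]
      args:
        - --name=bench
        - --rw=${rw}
        - --bs=${bs}
        - --size=1G
        - --runtime=60
        - --time_based
        - --direct=1   # bypass the page cache
        - --filename=/data/test
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: fio-bench
EOF
}
```

Typical use: render_fio_bench local-path write 1M c1 | kubectl apply -f -. Once the pod is running, read the results with kubectl logs -f fio-bench, then clean up with kubectl delete pod,pvc fio-bench.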
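Monitor Storage Usage
```bash
# List PVs (capacity, status, claim, StorageClass)
kubectl get pv
# List PVCs in all namespaces
kubectl get pvc -A
# Per-node filesystem usage (kubelet stats summary)
kubectl get --raw /api/v1/nodes/<node-name>/proxy/stats/summary | \
  jq '.node.fs'
# Per-PVC usage on that node, for volume types that report stats
# (kubectl top only covers nodes and pods, not PVs)
kubectl get --raw /api/v1/nodes/<node-name>/proxy/stats/summary | \
  jq '.pods[].volume[]? | select(.pvcRef) | {pvc: .pvcRef.name, namespace: .pvcRef.namespace, usedBytes, capacityBytes}'
```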
Storage Troubleshooting
Debug PV/PVC Issues
```bash
# Check PVC status
kubectl get pvc -A
kubectl describe pvc <pvc-name> -n <namespace>
# Check PV binding
kubectl get pv
kubectl describe pv <pv-name>
# Check storage class
kubectl get storageclass
kubectl describe storageclass <storage-class-name>
# Check provisioner logs (namespace and label depend on the provisioner;
# local-path-provisioner runs in kmetal-local-path-storage here)
kubectl logs -n <provisioner-namespace> -l app=<provisioner-name> -f
```
CSI Driver Debugging
The kubevirt-csi-driver runs in a split topology: the controller is a Deployment in the under cluster's tenant namespace (one per tenant cluster), and the node DaemonSet runs inside the tenant cluster.
```bash
# --- Controller side (under cluster) ---
# One Deployment per tenant cluster, named kubevirt-csi-controller-<cluster>
kubectl get deployment -A -l app=kubevirt-csi-controller
kubectl get pods -A -l app=kubevirt-csi-controller
# Controller logs (containers: csi-driver plus the csi-provisioner, csi-attacher,
# csi-snapshotter and csi-resizer sidecars)
kubectl logs -n <tenant-namespace> deployment/kubevirt-csi-controller-<cluster> -c csi-provisioner
kubectl logs -n <tenant-namespace> deployment/kubevirt-csi-controller-<cluster> -c csi-driver
# --- Node side (tenant cluster) ---
kubectl --kubeconfig=<tenant>.kubeconfig -n kube-system get ds kubevirt-csi-node
kubectl --kubeconfig=<tenant>.kubeconfig -n kube-system logs -l app=kubevirt-csi-node -c csi-driver
# Volume attachments (tenant cluster)
kubectl --kubeconfig=<tenant>.kubeconfig get volumeattachment
```
For non-kubevirt-csi drivers (NFS, Rook-Ceph, vendor CSI, etc.), the controller usually runs in kube-system or a vendor-specific namespace on the under cluster — adapt the selectors accordingly.
Storage Migration
Requires a snapshot-capable CSI driver
local-path-provisioner (the chart default) does not support snapshots. To use the example below, install a CSI driver that supports snapshots and ships a VolumeSnapshotClass (the cluster also needs the snapshot CRDs and the snapshot-controller). Then set volumeSnapshotClassName and storageClassName to match what your driver provides. A snapshot can only be restored by the CSI driver that created it, so this copies data between volumes on one driver; it does not move data between backends.
```bash
# Create snapshot (replace volumeSnapshotClassName with a class your CSI driver provides)
kubectl create -f - <<EOF
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: my-snapshot
  namespace: default
spec:
  volumeSnapshotClassName: <your-csi-snapshot-class>
  source:
    persistentVolumeClaimName: my-pvc
EOF
# Restore from snapshot: the PVC must live in the snapshot's namespace, use a
# StorageClass served by the same CSI driver, and request at least the snapshot's restoreSize
kubectl create -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: restored-pvc
  namespace: default
spec:
  storageClassName: <your-csi-storage-class>
  dataSource:
    name: my-snapshot
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
EOF
```