Persistent storage
This guide walks through Rune’s storage subsystem end-to-end: declaring a volume, mounting it, growing a 3-replica stateful set with a claim template, and finally snapshotting and restoring.
If you want the model first, read the storage concept.
1. Pick a storage class
Rune seeds two classes on first boot:
```sh
$ rune storageclass list
NAME        DRIVER      DEFAULT   RECLAIM   AGE
local       local       true      retain    2m
local-host  local-host  false     retain    2m
```
local is the default. Volumes that omit storageClassName resolve to it.
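The resolution rule can be pictured as a small lookup. The sketch below is a hypothetical Python model, not Rune's code. StorageClass and resolve_storage_class are illustrative names, and the rule that exactly one class must be flagged default is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StorageClass:
    name: str
    driver: str
    default: bool


def resolve_storage_class(requested: Optional[str], classes: List[StorageClass]) -> StorageClass:
    """Pick the class a volume binds to (hypothetical model of the rule above)."""
    if requested:
        # An explicit storageClassName must name an existing class.
        for sc in classes:
            if sc.name == requested:
                return sc
        raise LookupError(f"storage class {requested!r} not found")
    # storageClassName omitted: fall back to the class flagged as default.
    defaults = [sc for sc in classes if sc.default]
    if len(defaults) != 1:
        # Assumption: zero or several defaults is treated as a configuration error.
        raise LookupError(f"expected exactly one default class, found {len(defaults)}")
    return defaults[0]


# The two classes seeded on first boot, as listed above.
seeded = [
    StorageClass("local", "local", default=True),
    StorageClass("local-host", "local-host", default=False),
]
print(resolve_storage_class(None, seeded).name)          # local
print(resolve_storage_class("local-host", seeded).name)  # local-host
```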
2. Mount an existing Volume in a service
Declare both resources in the same castfile and apply with one rune cast.
```yaml
---
volume:
  name: web-data
  namespace: default
  storageClassName: local
  size: 5Gi
  accessMode: ReadWriteOnce
  reclaimPolicy: retain
---
service:
  name: web
  namespace: default
  image: ghcr.io/example/web:1.0.0
  scale: 1                      # RWO + claim → scale must be 1
  ports:
    - { name: http, port: 8080 }
  volumes:
    - name: data
      mountPath: /var/lib/web
      claim:
        name: web-data
```
```sh
rune cast web.yaml
rune volume get web-data
# STATUS: Bound
```
Restart the service; the data survives:
```sh
rune restart web
ls /var/lib/rune/volumes/default/web-data   # files still there
```
fsUser / fsGroup / fsMode
A fresh ext4 mount is owned by root:root with mode 0700. Containers
that run as a non-root uid (most modern images) hit EACCES on the
first write. Tell Rune to chown / chmod the mount root for you:
```yaml
volumes:
  - name: data
    mountPath: /var/lib/web
    fsUser: 1000     # uid that owns the mount root
    fsGroup: 1000    # gid that owns the mount root
    fsMode: "0775"   # optional; octal string so the leading zero is kept
    claim:
      name: web-data
```
The fix-up is applied to the mount root only (subPath ownership is yours to
manage) and idempotently: Rune skips the chown when ownership already
matches, so subsequent reconciles don’t stomp on in-place changes.
Works for any driver where Rune owns the mount path (local,
do-volume, …); skipped automatically when the operator omits the
field, so local-host paths you manage by hand are left alone.
This replaces the older initSteps: chown recipe for the common
case — keep initSteps for anything more elaborate than chown.
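Below is a minimal Python sketch of the idempotent mount-root fix-up. It assumes the semantics described above: the mount root only, skipped when already correct, and fsMode parsed as an octal string. plan_mount_root_fixups and apply_mount_root_fixups are hypothetical names, and Rune's own implementation may differ.

```python
import os
import stat


def plan_mount_root_fixups(st_uid, st_gid, st_mode, fs_user=None, fs_group=None, fs_mode=None):
    """Return the changes the mount root needs; an empty dict means nothing to do."""
    plan = {}
    # An omitted fsUser / fsGroup leaves that half of the ownership untouched.
    want_uid = st_uid if fs_user is None else fs_user
    want_gid = st_gid if fs_group is None else fs_group
    if (want_uid, want_gid) != (st_uid, st_gid):
        plan["chown"] = (want_uid, want_gid)
    if fs_mode is not None:
        # fsMode is an octal *string* ("0775"), so parse it in base 8.
        want_mode = int(fs_mode, 8)
        if stat.S_IMODE(st_mode) != want_mode:
            plan["chmod"] = want_mode
    return plan


def apply_mount_root_fixups(path, fs_user=None, fs_group=None, fs_mode=None):
    """Apply the plan to the mount root only; nothing below it is touched."""
    st = os.stat(path)
    plan = plan_mount_root_fixups(st.st_uid, st.st_gid, st.st_mode, fs_user, fs_group, fs_mode)
    if "chown" in plan:
        os.chown(path, *plan["chown"])  # requires root or CAP_CHOWN
    if "chmod" in plan:
        os.chmod(path, plan["chmod"])
    return plan


# A fresh mount: root:root, mode 0700 (0o040700 includes the directory bit).
print(plan_mount_root_fixups(0, 0, 0o040700, fs_user=1000, fs_group=1000, fs_mode="0775"))
# Already correct: the next reconcile is a no-op.
print(plan_mount_root_fixups(1000, 1000, 0o040775, fs_user=1000, fs_group=1000, fs_mode="0775"))
```

Splitting the pure "plan" step from the "apply" step keeps the skip-when-matching decision testable without root privileges.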
3. Run a 3-replica stateful set with claimTemplate
claim shares one volume across the whole service. For per-replica state
(databases, queues), use claimTemplate — Rune auto-provisions one volume per
replica with stable per-ordinal names.
```yaml
service:
  name: postgres
  namespace: prod
  image: postgres:16
  scale: 3
  env:
    POSTGRES_PASSWORD: changeme
  ports:
    - { name: pg, port: 5432 }
  volumes:
    - name: pgdata
      mountPath: /var/lib/postgresql/data
      claimTemplate:
        size: 10Gi
        accessMode: ReadWriteOnce
        # storageClassName omitted → resolves to the default class (local)
```
```sh
rune cast postgres.yaml
rune volume list -n prod
# pgdata-postgres-0   local   Bound   10Gi   ReadWriteOnce
# pgdata-postgres-1   local   Bound   10Gi   ReadWriteOnce
# pgdata-postgres-2   local   Bound   10Gi   ReadWriteOnce
```
The names pgdata-postgres-{0,1,2} are stable: replica 1 always rebinds to
pgdata-postgres-1. Scaling down does not reclaim the per-ordinal
volumes — they stay Available so a future scale-up reattaches the same
data. Only an explicit rune service delete --cascade runs the
VolumeCleanupFinalizer and removes the per-replica volumes.
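The naming and scale-down behaviour can be modelled in a few lines. This is a simplified, hypothetical Python sketch. The <volumes[].name>-<service>-<ordinal> pattern is inferred from the pgdata-postgres-{0,1,2} example, and phases are reduced to Bound and Available.

```python
from typing import Dict, List


def claim_template_names(volume_name: str, service: str, scale: int) -> List[str]:
    """Stable per-ordinal volume names, e.g. pgdata-postgres-0 .. pgdata-postgres-2."""
    return [f"{volume_name}-{service}-{ordinal}" for ordinal in range(scale)]


def reconcile_scale(volumes: Dict[str, str], volume_name: str, service: str, scale: int) -> Dict[str, str]:
    """Map of volume name -> phase after scaling to `scale` replicas."""
    wanted = set(claim_template_names(volume_name, service, scale))
    result = dict(volumes)
    for name in wanted:
        # Create the volume if it is new, otherwise rebind the same one (same data).
        result[name] = "Bound"
    for name in volumes:
        if name not in wanted:
            # Scale-down never deletes; only `rune service delete --cascade` does.
            result[name] = "Available"
    return result


state = reconcile_scale({}, "pgdata", "postgres", 3)
state = reconcile_scale(state, "pgdata", "postgres", 1)  # -1 and -2 become Available
state = reconcile_scale(state, "pgdata", "postgres", 3)  # the same volumes are Bound again
```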
4. Snapshot a volume
```sh
rune snapshot create pgdata-postgres-0 \
  --name pgdata-2025-11-15 \
  -n prod

rune snapshot get pgdata-2025-11-15 -n prod
# STATUS: Ready
```
Snapshot drivers vary:
- local: filesystem copy (cp -a). Synchronous.
- do-volume: DigitalOcean snapshot API.
- local-host: not supported; the API rejects the write.
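For the local driver, a snapshot is a synchronous filesystem copy. The Python sketch below approximates cp -a with shutil.copytree. local_snapshot is a hypothetical helper, and unlike cp -a it does not preserve file ownership: shutil.copy2 keeps permission bits and timestamps, not uid/gid.

```python
import os
import shutil
import tempfile


def local_snapshot(volume_dir: str, snapshot_dir: str) -> str:
    """Copy a volume directory into a new snapshot directory and return its path.

    Synchronous: when this returns, the snapshot is complete.
    """
    # symlinks=True copies links as links (as cp -a does); the default copy_function,
    # shutil.copy2, keeps permission bits and timestamps. copytree raises
    # FileExistsError if snapshot_dir already exists, so a snapshot is never overwritten.
    shutil.copytree(volume_dir, snapshot_dir, symlinks=True)
    return snapshot_dir


# Usage with throw-away directories standing in for a volume and the snapshot store.
volume = tempfile.mkdtemp()
with open(os.path.join(volume, "PG_VERSION"), "w") as f:
    f.write("16\n")
snapshot = local_snapshot(volume, os.path.join(tempfile.mkdtemp(), "pgdata-2025-11-15"))
```

Restoring is the same copy in the other direction, into a freshly provisioned volume directory.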
5. Restore into a new volume
```sh
rune volume restore pgdata-restore \
  --from-snapshot pgdata-2025-11-15 \
  --snapshot-namespace prod \
  --storage-class local \
  -n prod
```
A new Volume row is created and provisioned from the snapshot. Mount it on a
sidecar or one-shot job to verify:
```yaml
service:
  name: pg-verify
  namespace: prod
  image: postgres:16
  scale: 1
  command: ["sleep", "infinity"]
  volumes:
    - name: data
      mountPath: /var/lib/postgresql/data
      claim:
        name: pgdata-restore
```
```sh
rune exec pg-verify -- ls /var/lib/postgresql/data
```
Using local-host for pre-existing host paths
local-host binds an arbitrary pre-existing host directory. The operator
must allow-list the root in the runefile:
```toml
[storage]
hostPathAllowlist = ["/mnt/rune"]
allowCreateMissing = false
```
Then declare the volume with the host path on parameters:
```yaml
volume:
  name: shared-cache
  namespace: default
  storageClassName: local-host
  size: 0
  accessMode: ReadWriteOnce
  parameters:
    hostPath: /mnt/rune/shared-cache
```
createIfMissing: "true" on parameters is honoured only when
allowCreateMissing = true in the runefile (which is the default in
runed --dev-mode).
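The allow-list check is worth getting right, because a naive string-prefix test would accept /mnt/rune-other as if it were under /mnt/rune. Below is a hypothetical Python sketch that compares normalised path components instead and honours createIfMissing only when allowCreateMissing is set. check_host_path and its return values are illustrative. A production check should also resolve symlinks on the host (for example with os.path.realpath).

```python
import posixpath
from pathlib import PurePosixPath
from typing import Iterable


def check_host_path(host_path: str, allowlist: Iterable[str], *, exists: bool,
                    create_if_missing: bool = False, allow_create_missing: bool = False) -> str:
    """Return "bind" or "create-and-bind", or raise if the volume must be rejected."""
    if not host_path.startswith("/"):
        raise ValueError("hostPath must be absolute")
    # normpath collapses ".." lexically, so /mnt/rune/../etc becomes /etc.
    path = PurePosixPath(posixpath.normpath(host_path))
    roots = [PurePosixPath(posixpath.normpath(r)) for r in allowlist]
    # Component-wise containment: /mnt/rune-other is NOT under /mnt/rune.
    if not any(path == root or root in path.parents for root in roots):
        raise PermissionError(f"{host_path} is outside hostPathAllowlist")
    if exists:
        return "bind"
    # createIfMissing: "true" only counts when the runefile sets allowCreateMissing = true.
    if create_if_missing and allow_create_missing:
        return "create-and-bind"
    raise FileNotFoundError(host_path)
```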
Using do-volume for DigitalOcean Block Storage
The do-volume driver provisions, attaches, snapshots and reclaims
DO Block Storage volumes via the DigitalOcean API. End-to-end first-time
setup is three steps.
Step 1 — Mint a scoped DO API token
In the DigitalOcean console: API → Tokens → Generate New Token. Choose Custom Scopes (not Full Access) and grant exactly the permissions the driver uses:
| Resource | Operations |
|---|---|
| block_storage | create, read, delete |
| block_storage_action | create |
| actions | read |
| droplet | read |
| block_storage_snapshot | create, read, delete (omit if you don’t use rune snapshot) |
See the service-spec reference’s scope table
for the per-endpoint breakdown of what each scope unlocks. The one
that’s easy to miss is block_storage_action:create — without it
provisioning appears to work and attach silently 401s, leaving the
volume stuck Available with the consuming instance pending.
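If you script token setup, the table above translates directly into a checklist. The hypothetical Python helper below reports which resource:operation scopes a token still lacks. How you obtain the granted set (for example, by copying it from the console) is left open.

```python
from typing import Dict, List, Set

# Scopes from the table above, written as "resource:operation".
REQUIRED: Dict[str, Set[str]] = {
    "block_storage": {"create", "read", "delete"},
    "block_storage_action": {"create"},
    "actions": {"read"},
    "droplet": {"read"},
}
# Only needed if you use rune snapshot.
SNAPSHOT: Dict[str, Set[str]] = {"block_storage_snapshot": {"create", "read", "delete"}}


def missing_scopes(granted: Set[str], uses_snapshots: bool = True) -> List[str]:
    """Sorted list of required scopes absent from `granted`."""
    required = dict(REQUIRED)
    if uses_snapshots:
        required.update(SNAPSHOT)
    needed = {f"{resource}:{op}" for resource, ops in required.items() for op in ops}
    return sorted(needed - granted)


# The easy-to-miss case: everything except block_storage_action:create.
granted = {"block_storage:create", "block_storage:read", "block_storage:delete",
           "actions:read", "droplet:read"}
print(missing_scopes(granted, uses_snapshots=False))  # ['block_storage_action:create']
```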
Step 2 — Create a Rune Secret holding the token
The driver reads the token from a Rune Secret rather than the
runefile so it can rotate without restarting runed. The secret’s
data field must be named token:
```sh
rune create secret do-api-token \
  --from-literal=token=dop_v1_<your_token_here> \
  -n shared
```
Step 3 — Create the StorageClass
Reference the secret on apiToken using the FQDN secret-reference
form secret:<name>.<namespace>.rune/<key>. Since StorageClass is
cluster-scoped, the FQDN form pins the lookup to one namespace so a
single shared secret serves every namespace’s volumes — see the
shorthand vs FQDN note
for why the shorthand secret:<name>/<key> is the wrong choice here.
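To make the difference between the two reference forms concrete, here is a hypothetical Python parser. The regular expressions are assumptions inferred from the description above, not Rune's implementation. So is the rule that a shorthand reference resolves against the consuming volume's namespace.

```python
import re
from dataclasses import dataclass
from typing import Optional

# secret:<name>.<namespace>.rune/<key>  (namespace pinned)
_FQDN = re.compile(r"secret:(?P<name>[^/]+)\.(?P<ns>[^./]+)\.rune/(?P<key>[^/]+)")
# secret:<name>/<key>                   (namespace left to the caller)
_SHORT = re.compile(r"secret:(?P<name>[^./]+)/(?P<key>[^/]+)")


@dataclass(frozen=True)
class SecretRef:
    name: str
    namespace: Optional[str]  # None for the shorthand form
    key: str


def parse_secret_ref(ref: str) -> SecretRef:
    m = _FQDN.fullmatch(ref)
    if m:
        return SecretRef(m["name"], m["ns"], m["key"])
    m = _SHORT.fullmatch(ref)
    if m:
        return SecretRef(m["name"], None, m["key"])
    raise ValueError(f"not a secret reference: {ref!r}")


def lookup_namespace(ref: SecretRef, volume_namespace: str) -> str:
    # Assumption: shorthand resolves per volume, so a cluster-scoped StorageClass
    # would look for the secret in every consuming namespace.
    return ref.namespace or volume_namespace


fqdn = parse_secret_ref("secret:do-api-token.shared.rune/token")
short = parse_secret_ref("secret:do-api-token/token")
for ns in ("default", "prod"):
    print(ns, lookup_namespace(fqdn, ns), lookup_namespace(short, ns))
```

Under that assumption, the FQDN form always looks in shared, while the shorthand would need a copy of the secret in every namespace that consumes the class.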
DO volumes are region-pinned, so the StorageClass also names the region; for a multi-region cluster create one StorageClass per region.
```yaml
storageClass:
  name: do-volumes-nyc3
  driver: do-volume
  parameters:
    region: nyc3
    fsType: ext4
    apiToken: secret:do-api-token.shared.rune/token
```
```sh
rune storageclass create -f do-volumes-nyc3.yaml
rune get storageclasses
# NAME              DRIVER      DEFAULT
# do-volumes-nyc3   do-volume   false
```
Verify
Provision a one-off volume to confirm the token and scopes are correct before pointing real workloads at the class:
```sh
cat <<'EOF' | rune cast -
volume:
  name: do-smoke-test
  namespace: default
  storageClassName: do-volumes-nyc3
  size: "10Gi"
  accessMode: ReadWriteOnce
EOF

rune get volume do-smoke-test -n default
# STATUS: Available   HANDLE: <do-volume-id>

# Quick attach test using a throw-away service. If this stalls in
# Pending with `dovolume: action ... errored`, the token is missing
# block_storage_action:create or actions:read.
cat <<'EOF' | rune cast -
service:
  name: do-smoke
  image: alpine:3.19
  command: ["sleep", "infinity"]
  volumes:
    - name: data
      mountPath: /data
      claim:
        name: do-smoke-test
EOF
rune get service do-smoke
# STATUS: Running

rune delete service do-smoke
rune delete volume do-smoke-test -n default
```
If any verification step fails, the storage-resources reference maps each DO API endpoint to the scope it requires and the failure mode you’ll see without it.
Two do-volume gotchas
Sizing is ceil(bytes / 1e9), not ceil(GiB). DigitalOcean
Volumes are sized in decimal GB (10⁹ bytes), while Rune’s size: <quantity>
field accepts Kubernetes-style quantities, including binary suffixes (Gi = 2³⁰ bytes). The
driver rounds up to the next whole DO GB, so size: 1Gi
(1,073,741,824 bytes) provisions a 2 GB DO Volume — and DO bills
per-GB-month. Write sizes in plain GB (size: 1G) if you want a 1:1
mapping. Sizes ≥ 10 Gi land within a few percent of the requested
amount, so this only bites on tiny volumes.
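The rounding rule is easy to check before you pick a size. The Python sketch below assumes integer quantities with Kubernetes-style suffixes: decimal k, M, G, T and binary Ki, Mi, Gi, Ti. parse_quantity and do_size_gb are hypothetical helpers that mirror the ceil(bytes / 1e9) rule described above.

```python
import re

_MULTIPLIERS = {
    "": 1,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,     # decimal (SI)
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,  # binary
}


def parse_quantity(quantity: str) -> int:
    """Bytes for an integer quantity such as "10Gi" or "1G" (fractions not handled)."""
    m = re.fullmatch(r"(\d+)(k|M|G|T|Ki|Mi|Gi|Ti)?", quantity.strip())
    if not m:
        raise ValueError(f"unsupported quantity: {quantity!r}")
    return int(m.group(1)) * _MULTIPLIERS[m.group(2) or ""]


def do_size_gb(quantity: str) -> int:
    """Whole decimal GB the DO Volume gets: ceil(bytes / 1e9), in integer math."""
    return -(-parse_quantity(quantity) // 10**9)


for q in ("1Gi", "1G", "5Gi", "10Gi"):
    print(q, parse_quantity(q), do_size_gb(q))
```

With these numbers, 1Gi becomes a 2 GB volume, roughly 86% more than requested. By contrast, 10Gi (about 10.74 GB) becomes 11 GB, roughly 2.4% more. Integer floor division avoids float rounding surprises at exact multiples of 10⁹.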
reclaimPolicy: retain does not reclaim the underlying DO Volume.
When the Rune Volume row is deleted, the DO Volume keeps existing
(and being billed) until you delete it manually:
```sh
doctl compute volume list          # find the volume ID
doctl compute volume delete <id>
```
Use reclaimPolicy: delete on the StorageClass if you want Rune to
reap the DO Volume when the Rune Volume row goes away. retain is
the safer default for irreplaceable data — make sure it matches your
intent before you delete the row.
Surviving a droplet rebuild
DO Volumes outlive the droplet they’re attached to — the durability
story do-volume exists for. The supported recovery path when
terraform apply destroys + recreates the droplet:
- Fresh droplet boots; the reserved IP reattaches automatically.
- New runed comes up with the same node-role and the same node hostname (the latter is what the driver matches against /v2/droplets?name=…; see the hostname caveat below).
- The agent’s volumes Subsystem walks every Volume row whose BoundNode matches this node. For each, it calls Driver.Attach against the existing DO Volume ID (the handle on the row).
- EnsureFormatted is a no-op: lsblk reports the existing ext4, so mkfs is skipped.
- The mount target is recreated under /var/lib/rune/mounts/<volume-id>/ and services come up against the existing data.
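The recovery loop above can be sketched as follows. Everything here is hypothetical: VolumeRow, the Driver protocol and FakeDriver stand in for Rune's agent and the do-volume driver. Only the control flow is taken from the description: filter by BoundNode, attach by handle, never reformat an existing filesystem, and remount under /var/lib/rune/mounts/<volume-id>/.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol


@dataclass
class VolumeRow:
    namespace: str
    name: str
    id: str
    handle: str       # existing DO Volume ID
    bound_node: str


class Driver(Protocol):
    def attach(self, handle: str, node: str) -> str: ...  # returns a device path
    def fs_type(self, device: str) -> str: ...            # "" when unformatted (lsblk's view)
    def mkfs(self, device: str, fs_type: str) -> None: ...
    def mount(self, device: str, target: str) -> None: ...


def ensure_formatted(driver: Driver, device: str, fs_type: str = "ext4") -> bool:
    """Run mkfs only on a blank device; return True if it ran."""
    if driver.fs_type(device):
        return False  # existing filesystem: never reformat
    driver.mkfs(device, fs_type)
    return True


def reattach_after_rebuild(rows: List[VolumeRow], node: str, driver: Driver,
                           mount_root: str = "/var/lib/rune/mounts") -> List[str]:
    targets = []
    for row in rows:
        if row.bound_node != node:
            continue  # bound to another node; not this agent's job
        device = driver.attach(row.handle, node)
        ensure_formatted(driver, device)
        target = f"{mount_root}/{row.id}/"
        driver.mount(device, target)
        targets.append(target)
    return targets


class FakeDriver:
    """In-memory stand-in for the do-volume driver, for illustration only."""

    def __init__(self, filesystems: Dict[str, str]):
        self.filesystems = filesystems  # handle -> existing fs type
        self.mkfs_calls: List[str] = []
        self.mounts: Dict[str, str] = {}

    def attach(self, handle: str, node: str) -> str:
        return f"/dev/fake/{handle}"

    def fs_type(self, device: str) -> str:
        return self.filesystems.get(device.rsplit("/", 1)[-1], "")

    def mkfs(self, device: str, fs_type: str) -> None:
        self.mkfs_calls.append(device)

    def mount(self, device: str, target: str) -> None:
        self.mounts[target] = device


rows = [
    VolumeRow("prod", "pgdata-postgres-0", "vol-a", "do-123", "node-1"),
    VolumeRow("prod", "pgdata-postgres-1", "vol-b", "do-456", "node-2"),
]
driver = FakeDriver({"do-123": "ext4"})
mounted = reattach_after_rebuild(rows, "node-1", driver)
```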
Caveats:
- The Volume row’s namespace and name must persist across the rebuild (they’re what BoundNode lookups key on). If you’re seeding the cluster from a fresh state store on the new droplet, also restore the Volume rows (rune cast the same YAML against the new cluster).
- The droplet’s region must still match the StorageClass region; DO refuses cross-region attaches. If you’re moving regions, you’re doing a snapshot-restore, not a rebuild.
- Hostname collisions inside one DO account will surface as “no DO droplet matches hostname …” on the first Attach attempt — make sure the new droplet’s hostname is unique.
Cleaning up
```sh
rune service delete pg-verify -n prod           # releases the claim on pgdata-restore
rune snapshot delete pgdata-2025-11-15 -n prod
rune volume delete pgdata-restore -n prod
rune service delete postgres -n prod --cascade  # also removes per-replica volumes
```
Without --cascade, the per-replica volumes survive the service deletion;
that’s the safe default for stateful workloads.
When provisioning fails
A driver failure marks the volume Failed; the controller retries with
backoff and, after exhausting retries, freezes it in Stalled. Fix the
underlying problem (allowlist, API token, capacity, …) then drive the
controller again:
```sh
rune volume retry-provision pgdata-postgres-1 -n prod
```
If an instance died but the volume is still flagged Bound, break the bind:
```sh
rune volume detach pgdata-postgres-1 -n prod
```
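The Failed → Stalled progression can be modelled as a small state machine with capped exponential backoff. This is a hypothetical Python sketch. The retry limit and the delays are assumptions, not Rune's actual values. So is treating retry-provision as "reset the failure count and try again".

```python
from dataclasses import dataclass
from typing import Optional

MAX_RETRIES = 5      # hypothetical limit before a volume is frozen in Stalled
BASE_DELAY_S = 2.0   # hypothetical first backoff delay
MAX_DELAY_S = 300.0  # hypothetical cap on the backoff delay


@dataclass
class VolumeStatus:
    phase: str = "Pending"
    failures: int = 0


def backoff_delay(failures: int) -> float:
    """2s, 4s, 8s, ... doubling per consecutive failure, capped at MAX_DELAY_S."""
    return min(BASE_DELAY_S * 2 ** (failures - 1), MAX_DELAY_S)


def on_provision_result(status: VolumeStatus, ok: bool) -> Optional[float]:
    """Record one provisioning attempt; return the delay before the next retry, or None."""
    if ok:
        status.phase, status.failures = "Available", 0
        return None
    status.failures += 1
    if status.failures > MAX_RETRIES:
        status.phase = "Stalled"  # the controller stops retrying on its own
        return None
    status.phase = "Failed"
    return backoff_delay(status.failures)


def retry_provision(status: VolumeStatus) -> None:
    """Model of `rune volume retry-provision`: clear the failure history and start over."""
    status.phase, status.failures = "Pending", 0
```

Freezing in Stalled rather than retrying forever keeps a misconfigured allowlist or token from hammering the driver. That is why an explicit retry-provision is needed once the underlying problem is fixed.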