Persistent storage

This guide walks through Rune’s storage subsystem end-to-end: declaring a volume, mounting it, growing a 3-replica stateful set with a claim template, and finally snapshotting and restoring.

If you want the model first, read the storage concept.

Rune seeds two classes on first boot:

$ rune storageclass list
NAME        DRIVER      DEFAULT  RECLAIM  AGE
local       local       true     retain   2m
local-host  local-host  false    retain   2m

local is the default. Volumes that omit storageClassName resolve to it.

Declare the volume and the service that mounts it in the same castfile and apply both with one rune cast.

web.yaml
---
volume:
  name: web-data
  namespace: default
  storageClassName: local
  size: 5Gi
  accessMode: ReadWriteOnce
  reclaimPolicy: retain
---
service:
  name: web
  namespace: default
  image: ghcr.io/example/web:1.0.0
  scale: 1 # RWO + claim → scale must be 1
  ports:
    - { name: http, port: 8080 }
  volumes:
    - name: data
      mountPath: /var/lib/web
      claim:
        name: web-data

Apply the castfile and confirm that the volume binds:

rune cast web.yaml
rune volume get web-data
# STATUS: Bound

Restart the service — the data survives:

rune restart web
ls /var/lib/rune/volumes/default/web-data # files still there

A fresh ext4 mount is owned by root:root with mode 0700. Containers that run as a non-root uid (most modern images) hit EACCES on the first write. Tell Rune to chown / chmod the mount root for you:

volumes:
  - name: data
    mountPath: /var/lib/web
    fsUser: 1000 # uid that owns the mount root
    fsGroup: 1000 # gid that owns the mount root
    fsMode: "0775" # optional; octal string so leading zero is kept
    claim:
      name: web-data

These settings apply to the mount root only (subPath ownership is yours to manage) and are applied idempotently: Rune skips the chown when ownership already matches, so subsequent reconciles don’t stomp on in-place changes. They work for any driver where Rune owns the mount path (local, do-volume, …) and are skipped automatically when the operator omits the fields, so local-host paths you manage by hand are left alone.
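
The skip-when-already-matching behaviour can be illustrated with a small Python sketch. This is not Rune's implementation: the helper name ensure_mount_root and its boolean return value are hypothetical, and it assumes fsMode is parsed as an octal string.

```python
import os
import stat


def ensure_mount_root(path, fs_user=None, fs_group=None, fs_mode=None):
    """Hypothetical helper: apply ownership and mode to the mount root only.

    Returns True if anything was changed, False if the directory already
    matched (the idempotent fast path) or no field was set.
    """
    if fs_user is None and fs_group is None and fs_mode is None:
        return False  # fields omitted: leave the path alone

    st = os.stat(path)
    changed = False

    # Only chown when the current owner differs from the requested one.
    want_uid = st.st_uid if fs_user is None else fs_user
    want_gid = st.st_gid if fs_group is None else fs_group
    if (st.st_uid, st.st_gid) != (want_uid, want_gid):
        os.chown(path, want_uid, want_gid)
        changed = True

    # Only chmod when the permission bits differ.
    if fs_mode is not None:
        want_mode = int(fs_mode, 8)  # "0775" -> 0o775
        if stat.S_IMODE(st.st_mode) != want_mode:
            os.chmod(path, want_mode)
            changed = True

    return changed
```

Calling the helper twice with the same arguments changes the directory at most once; the second call finds nothing to do, which is the property that keeps repeated reconciles from overwriting in-place changes.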

This replaces the older initSteps: chown recipe for the common case — keep initSteps for anything more elaborate than chown.

3. Run a 3-replica stateful set with claimTemplate

claim shares one volume across the whole service. For per-replica state (databases, queues), use claimTemplate — Rune auto-provisions one volume per replica with stable per-ordinal names.

service:
  name: postgres
  namespace: prod
  image: postgres:16
  scale: 3
  env:
    POSTGRES_PASSWORD: changeme
  ports:
    - { name: pg, port: 5432 }
  volumes:
    - name: pgdata
      mountPath: /var/lib/postgresql/data
      claimTemplate:
        size: 10Gi
        accessMode: ReadWriteOnce
        # storageClassName omitted → resolves to the default class (local)

Save this as postgres.yaml, apply it, and list the per-replica volumes:

rune cast postgres.yaml
rune volume list -n prod
# pgdata-postgres-0 local Bound 10Gi ReadWriteOnce
# pgdata-postgres-1 local Bound 10Gi ReadWriteOnce
# pgdata-postgres-2 local Bound 10Gi ReadWriteOnce

The names pgdata-postgres-{0,1,2} are stable: replica 1 always rebinds to pgdata-postgres-1. Scaling down does not reclaim the per-ordinal volumes — they stay Available so a future scale-up reattaches the same data. Only an explicit rune service delete --cascade runs the VolumeCleanupFinalizer and removes the per-replica volumes.
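
A short Python sketch of the naming scheme, for illustration only. The pattern <volume>-<service>-<ordinal> is inferred from the listing above, the function names are hypothetical, and the sketch assumes scale-down removes the highest ordinals first, which this guide does not state.

```python
def replica_volume_name(volume, service, ordinal):
    """Name pattern inferred from the listing: <volume>-<service>-<ordinal>."""
    return f"{volume}-{service}-{ordinal}"


def volumes_for_scale(volume, service, scale):
    """Volumes mounted by replicas 0..scale-1."""
    return [replica_volume_name(volume, service, i) for i in range(scale)]


names_at_3 = volumes_for_scale("pgdata", "postgres", 3)
names_at_1 = volumes_for_scale("pgdata", "postgres", 1)

# Assumption: scaling 3 -> 1 drops ordinals 2 and 1. Their volumes are not
# deleted; they stay Available and are reattached on a later scale-up.
left_available = [n for n in names_at_3 if n not in names_at_1]

print(names_at_3)      # ['pgdata-postgres-0', 'pgdata-postgres-1', 'pgdata-postgres-2']
print(left_available)  # ['pgdata-postgres-1', 'pgdata-postgres-2']
```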

rune snapshot create pgdata-postgres-0 \
--name pgdata-2025-11-15 \
-n prod
rune snapshot get pgdata-2025-11-15 -n prod
# STATUS: Ready

Snapshot drivers vary:

  • local: filesystem copy (cp -a). Synchronous.
  • do-volume: DigitalOcean snapshot API.
  • local-host: not supported; the API rejects the write.

Restore a snapshot into a new volume:

rune volume restore pgdata-restore \
--from-snapshot pgdata-2025-11-15 \
--snapshot-namespace prod \
--storage-class local \
-n prod

A new Volume row is created and provisioned from the snapshot. Mount it on a sidecar or one-shot job to verify:

service:
  name: pg-verify
  namespace: prod
  image: postgres:16
  scale: 1
  command: ["sleep", "infinity"]
  volumes:
    - name: data
      mountPath: /var/lib/postgresql/data
      claim:
        name: pgdata-restore

Then inspect the restored data directory:

rune exec pg-verify -- ls /var/lib/postgresql/data

Using local-host for pre-existing host paths

local-host binds an arbitrary pre-existing host directory. The operator must allow-list the root in the runefile:

/etc/rune/runefile.toml
[storage]
hostPathAllowlist = ["/mnt/rune"]
allowCreateMissing = false

Then declare the volume with the host path on parameters:

volume:
  name: shared-cache
  namespace: default
  storageClassName: local-host
  size: 0
  accessMode: ReadWriteOnce
  parameters:
    hostPath: /mnt/rune/shared-cache

createIfMissing: "true" on parameters is honoured only when allowCreateMissing = true in the runefile (which is the default in runed --dev-mode).
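
One way to picture the allow-list rule is a prefix check on normalised paths. The Python sketch below is an assumption about how such a check could work, not a description of what runed does; host_path_allowed is a hypothetical name, and the sketch does not resolve symlinks.

```python
import os.path


def host_path_allowed(host_path, allowlist):
    """Hypothetical check: True if host_path is an allow-listed root or lies beneath one."""
    if not os.path.isabs(host_path):
        return False  # only absolute host paths are considered
    candidate = os.path.normpath(host_path)  # collapses "..", "//" and trailing "/"
    for root in allowlist:
        if not os.path.isabs(root):
            continue  # ignore malformed allow-list entries
        root = os.path.normpath(root)
        # commonpath compares whole path components, so "/mnt/rune-other"
        # does not count as being under "/mnt/rune".
        if os.path.commonpath([candidate, root]) == root:
            return True
    return False


print(host_path_allowed("/mnt/rune/shared-cache", ["/mnt/rune"]))  # True
print(host_path_allowed("/mnt/rune/../etc", ["/mnt/rune"]))        # False
```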

Using do-volume for DigitalOcean Block Storage

The do-volume driver provisions, attaches, snapshots and reclaims DO Block Storage volumes via the DigitalOcean API. End-to-end first-time setup is three steps.

In the DigitalOcean console: API → Tokens → Generate New Token. Choose Custom Scopes (not Full Access) and grant exactly the permissions the driver uses:

Resource                  Operations
block_storage             create, read, delete
block_storage_action      create
actions                   read
droplet                   read
block_storage_snapshot    create, read, delete (omit if you don’t use rune snapshot)

See the service-spec reference’s scope table for the per-endpoint breakdown of what each scope unlocks. The one that’s easy to miss is block_storage_action:create — without it provisioning appears to work and attach silently 401s, leaving the volume stuck Available with the consuming instance pending.

Step 2 — Create a Rune Secret holding the token

The driver reads the token from a Rune Secret rather than the runefile so it can rotate without restarting runed. The secret’s data field must be named token:

rune create secret do-api-token \
--from-literal=token=dop_v1_<your_token_here> \
-n shared

Reference the secret on apiToken using the FQDN secret-reference form secret:<name>.<namespace>.rune/<key>. Since StorageClass is cluster-scoped, the FQDN form pins the lookup to one namespace so a single shared secret serves every namespace’s volumes — see the shorthand vs FQDN note for why the shorthand secret:<name>/<key> is the wrong choice here.
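
To make the two reference forms concrete, here is a minimal Python parser for them. It is a sketch under stated assumptions, not Rune's resolver: parse_secret_ref is a hypothetical name, and the code assumes secret names and namespaces contain no dots.

```python
def parse_secret_ref(ref):
    """Parse 'secret:<name>.<namespace>.rune/<key>' or shorthand 'secret:<name>/<key>'.

    Returns (name, namespace, key); namespace is None for the shorthand form,
    meaning the caller would have to supply one from context.
    """
    prefix = "secret:"
    if not ref.startswith(prefix):
        raise ValueError(f"not a secret reference: {ref!r}")

    target, sep, key = ref[len(prefix):].partition("/")
    if not sep or not target or not key:
        raise ValueError(f"expected '<target>/<key>' in {ref!r}")

    parts = target.split(".")
    if len(parts) == 3 and parts[2] == "rune":
        return parts[0], parts[1], key  # FQDN form: namespace is pinned
    if len(parts) == 1:
        return parts[0], None, key      # shorthand: namespace left open
    raise ValueError(f"unrecognised secret target: {target!r}")


print(parse_secret_ref("secret:do-api-token.shared.rune/token"))
# ('do-api-token', 'shared', 'token')
print(parse_secret_ref("secret:do-api-token/token"))
# ('do-api-token', None, 'token')
```

The shorthand result carries no namespace, which is why it is a poor fit for a cluster-scoped StorageClass: there is no single surrounding namespace to fall back on.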

DO volumes are region-pinned, so the StorageClass also names the region; for a multi-region cluster create one StorageClass per region.

storageClass:
  name: do-volumes-nyc3
  driver: do-volume
  parameters:
    region: nyc3
    fsType: ext4
    apiToken: secret:do-api-token.shared.rune/token

Save this as do-volumes-nyc3.yaml, create the class, and confirm it is listed:

rune storageclass create -f do-volumes-nyc3.yaml
rune get storageclasses
# NAME DRIVER DEFAULT
# do-volumes-nyc3 do-volume false

Provision a one-off volume to confirm the token and scopes are correct before pointing real workloads at the class:

cat <<'EOF' | rune cast -
volume:
  name: do-smoke-test
  namespace: default
  storageClassName: do-volumes-nyc3
  size: "10Gi"
  accessMode: ReadWriteOnce
EOF
rune get volume do-smoke-test -n default
# STATUS: Available HANDLE: <do-volume-id>

# Quick attach test using a throw-away service. If this stalls in
# Pending with `dovolume: action ... errored`, the token is missing
# block_storage_action:create or actions:read.
cat <<'EOF' | rune cast -
service:
  name: do-smoke
  image: alpine:3.19
  command: ["sleep", "infinity"]
  volumes:
    - name: data
      mountPath: /data
      claim:
        name: do-smoke-test
EOF
rune get service do-smoke
# STATUS: Running

rune delete service do-smoke
rune delete volume do-smoke-test -n default

If any verification step fails, the storage-resources reference maps each DO API endpoint to the scope it requires and the failure mode you’ll see without it.

Sizing is ceil(bytes / 1e9), not ceil(GiB). DigitalOcean Volumes are sized in decimal GB (10⁹ bytes), whereas Rune’s size: <quantity> field accepts Kubernetes-style binary suffixes (Gi = 2³⁰ bytes). The driver rounds up to the next whole DO GB, so size: 1Gi (1,073,741,824 bytes) provisions a 2 GB DO Volume, and DO bills per GB-month. Write sizes in decimal GB (size: 1G) if you want a 1:1 mapping. Sizes of 10Gi and above land within a few percent of the requested amount, so this only bites on tiny volumes.
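
The rounding rule is easy to check by hand. The Python sketch below reproduces the arithmetic only; the function names are hypothetical, and the parser handles just whole-number quantities with the suffixes shown, not the full Kubernetes quantity grammar.

```python
BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}
DECIMAL = {"M": 10**6, "G": 10**9, "T": 10**12}


def quantity_to_bytes(quantity):
    """Convert e.g. '10Gi' or '1G' to bytes (whole numbers only)."""
    # Binary suffixes are checked first so 'Gi' is never mistaken for 'G'.
    for suffix, factor in {**BINARY, **DECIMAL}.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # no suffix: plain bytes


def do_volume_gb(quantity):
    """ceil(bytes / 1e9), using integer arithmetic to avoid float rounding."""
    return -(-quantity_to_bytes(quantity) // 10**9)


for q in ("1Gi", "1G", "5Gi", "10Gi"):
    print(q, "->", do_volume_gb(q), "GB")
# 1Gi -> 2 GB
# 1G -> 1 GB
# 5Gi -> 6 GB
# 10Gi -> 11 GB
```

For 10Gi the provisioned 11 GB is about 2.4% more than the 10,737,418,240 bytes requested, while for 1Gi the provisioned size is nearly double.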

reclaimPolicy: retain does not reclaim the underlying DO Volume. When the Rune Volume row is deleted, the DO Volume keeps existing (and being billed) until you delete it manually:

doctl compute volume list # find the volume ID
doctl compute volume delete <id>

Use reclaimPolicy: delete on the StorageClass if you want Rune to reap the DO Volume when the Rune Volume row goes away. retain is the safer default for irreplaceable data — make sure it matches your intent before you delete the row.

DO Volumes outlive the droplet they’re attached to; that durability is what do-volume exists for. When terraform apply destroys and recreates the droplet, the supported recovery path is:

  1. Fresh droplet boots; the reserved IP reattaches automatically.
  2. New runed comes up with the same node-role + the same node hostname (the latter is what the driver matches against /v2/droplets?name=… — see the hostname caveat earlier on this page).
  3. The agent’s volumes subsystem walks every Volume row whose BoundNode matches this node. For each, it calls Driver.Attach against the existing DO Volume ID (the handle on the row).
  4. EnsureFormatted is a no-op — lsblk reports the existing ext4, mkfs is skipped.
  5. The mount target is recreated under /var/lib/rune/mounts/<volume-id>/ and services come up against the existing data.

Caveats:

  • The Volume row’s namespace and name must persist across the rebuild (they’re what BoundNode lookups key on). If you’re seeding the cluster from a fresh state store on the new droplet, also restore the Volume rows (rune cast the same YAML against the new cluster).
  • The droplet’s region must still match the StorageClass region — DO refuses cross-region attaches. If you’re moving regions, you’re doing a snapshot-restore, not a rebuild.
  • Hostname collisions inside one DO account will surface as “no DO droplet matches hostname …” on the first Attach attempt — make sure the new droplet’s hostname is unique.
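
The recovery walk described above can be modelled in a few lines of Python. Everything here is a simplified stand-in: FakeDriver, Volume and reattach_on_boot are hypothetical names, no DigitalOcean API is called, and the sketch assumes the <volume-id> in the mount path is the handle stored on the row.

```python
from dataclasses import dataclass, field


@dataclass
class Volume:
    name: str
    namespace: str
    handle: str      # existing DO Volume ID stored on the row
    bound_node: str


@dataclass
class FakeDriver:
    """Stand-in for the driver; records calls instead of touching any API."""
    existing_fs: dict                      # handle -> filesystem already on the device
    log: list = field(default_factory=list)

    def attach(self, handle, node):
        self.log.append(("attach", handle, node))

    def ensure_formatted(self, handle, fs_type="ext4"):
        if self.existing_fs.get(handle) == fs_type:
            self.log.append(("format-skipped", handle))  # data is preserved
        else:
            self.existing_fs[handle] = fs_type
            self.log.append(("mkfs", handle))

    def mount(self, handle, target):
        self.log.append(("mount", handle, target))


def reattach_on_boot(node, volumes, driver):
    """Walk rows bound to this node and bring each volume back."""
    for vol in volumes:
        if vol.bound_node != node:
            continue  # bound elsewhere; not this node's job
        driver.attach(vol.handle, node)
        driver.ensure_formatted(vol.handle)
        # Assumption: the mount directory is keyed by the handle.
        driver.mount(vol.handle, f"/var/lib/rune/mounts/{vol.handle}/")
```

Because the existing filesystem is detected before any format step, a rebuilt node replays attach and mount without ever running mkfs on a volume that already holds data.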

To clean up the resources created in this guide:

rune snapshot delete pgdata-2025-11-15 -n prod
rune volume delete pgdata-restore -n prod
rune service delete postgres -n prod --cascade # also removes per-replica volumes

Without --cascade, the per-replica volumes survive the service deletion — that’s the safe default for stateful workloads.

A driver failure marks the volume Failed; the controller retries with backoff and, after exhausting retries, freezes it in Stalled. Fix the underlying problem (allowlist, API token, capacity, …) then drive the controller again:

rune volume retry-provision pgdata-postgres-1 -n prod
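
The Failed-then-Stalled progression described above can be sketched as a retry loop in Python. The retry count, the doubling delay and the function name are assumptions made for illustration; this guide does not specify the controller's actual backoff parameters.

```python
def provision_with_retries(attempt, max_retries=5, base_delay=1.0, sleep=lambda s: None):
    """Run attempt() with exponential backoff.

    Returns (final_status, delays). Each exception corresponds to the volume
    being marked Failed; once retries are exhausted the status is Stalled.
    """
    delays = []
    for n in range(max_retries + 1):  # first try plus max_retries retries
        try:
            attempt()
            return "Available", delays
        except Exception:
            if n == max_retries:
                break  # retries exhausted
            delay = base_delay * 2**n  # assumed schedule: 1s, 2s, 4s, ...
            delays.append(delay)
            sleep(delay)
    return "Stalled", delays


def always_fails():
    raise RuntimeError("driver error")


print(provision_with_retries(always_fails, max_retries=3))
# ('Stalled', [1.0, 2.0, 4.0])
```

In this model a Stalled volume is never retried automatically, which matches the need for an explicit retry-provision once the underlying problem is fixed.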

If an instance died but the volume is still flagged Bound, break the bind:

rune volume detach pgdata-postgres-1 -n prod