Chris Lu a10607f90a Add Terraform support for VM-based SeaweedFS deployment (#9754)
* terraform: add cloud-agnostic core renderer module

Renders per-node weed argv, systemd units, config files, disk-mount and secret-fetch scripts, and cloud-init from an address map. Creates zero cloud resources. Flags verified against the weed binary: volume uses -mserver for the master list, gRPC is -port.grpc (auto http+10000), minFreeSpacePercent is a string, filer store via -defaultStoreDir.

* terraform: add mTLS and JWT security module

Generates the CA, per-component certs with distinct CNs, and JWT signing keys via the tls/random providers. Emits a core_security object plus PEMs for secret-store delivery.

* terraform: add AWS deployment module and examples

Reserves stable ENIs first, renders config via the core, then creates instances, prevent_destroy EBS data disks mounted at /data, and the cluster security group. With enable_security, generates certs/JWT, stores them in SSM SecureString, grants an instance role, and fetches them at boot so secrets stay out of user_data. Keyed for_each on every stateful tier.

* terraform: add local cluster test harnesses

run_local_cluster.sh and run_local_secure.sh render a cluster with the core and run real weed processes, asserting master quorum, volume registration, filer/s3 round-trips, mutual-TLS formation, and JWT enforcement. Use an isolated high port range with a guard so they never touch a cluster already running on the machine. The weed binary defaults to $(go env GOPATH)/bin/weed.

* terraform: add CI workflow and README

fmt/validate/tofu-test plus smoke jobs that build weed and run both harnesses.

* terraform: guard against empty filesystem UUID in mount script

An empty UUID made grep -q match any fstab line, skipping the fstab entry and breaking the mount. Fail fast when blkid returns no UUID.

* terraform: sanitize cluster name in WEED_CLUSTER env keys

Hyphens or spaces in cluster_name produced invalid systemd/bash env var names; map non-alphanumerics to underscores.

* terraform: omit empty jwt.signing block from security.toml

With enable_security and no JWT key, the template emitted [jwt.signing] key="". Gate the block on a non-empty key and cover it with a test.

* terraform: mark core security input as sensitive

The security object carries JWT signing keys; keep them out of plan output and known values.

* terraform: enforce jwt_length minimum of 32

* terraform: note region/AZ coupling in HA example

* terraform: guard WORKDIR before recursive delete in test harnesses

* terraform: fix README fence language and test count

* terraform: handle embedded s3 with no filer nodes

Indexing sort(keys(var.filers))[0] errored at plan time when embedded S3 was enabled but no filers were defined; fall back to an empty config source.

* terraform: scope kms:Decrypt to a configurable key arn

Replace the hardcoded Resource="*" with a kms_key_arn variable (default "*") so production can restrict decrypt to a specific CMK.

* terraform: encrypt EBS data volumes at rest

Set encrypted = true on the volume/filer data disks and the all-in-one example disk.

* terraform: protect filer instances from API termination

Filers hold the leveldb2 metadata store, so they are stateful and get the same disable_api_termination as masters and volumes.

* terraform: stop instance before detaching in all-in-one example

* terraform: drop stale references to the removed plan doc

* terraform: correct stale mount-step comment in aws module

* terraform: mark Terraform support as experimental in README
2026-05-30 23:43:17 -07:00

SeaweedFS on Terraform

Experimental. This Terraform support is an early scaffold under active development. Interfaces (variables, outputs, module layout) may change without notice, and not every tier is implemented yet. Not recommended for production without your own review and testing.

Self-contained Terraform/OpenTofu modules to deploy SeaweedFS on cloud VMs running the weed binary directly under systemd. No Helm and no Kubernetes required.

What works today is verified end-to-end against a real weed cluster (see "Test it locally" below).

Layout

terraform/
  modules/
    core/        cloud-agnostic renderer (ZERO cloud resources): turns an
                 address map + config into per-node weed argv, systemd units,
                 config files, disk-mount + secret-fetch scripts, and cloud-init.
                 Both the cloud wrappers and the local harnesses consume `nodes`.
    security/    cloud-agnostic CA + per-component mTLS certs (distinct CNs) +
                 JWT signing keys (tls/random providers). Emits a `core_security`
                 object ready for the core, plus PEMs for secret-store delivery.
    aws/         thin AWS wrapper: reserves stable ENIs (fixed private IPs)
                 first, feeds them to core, creates instances + protected EBS
                 data disks + SG; with enable_security it generates certs/JWT,
                 stores them in SSM SecureString, grants an instance role, and
                 renders a boot fetch-secrets.sh. Keyed for_each throughout.
  examples/
    aws-ha-distributed/   3 masters + 3 volumes (1/AZ) + 2 filers + 1 S3,
                          secure-by-default (mTLS via SSM-delivered certs)
    aws-all-in-one/       single `weed server` instance via core directly
  test/
    local/         render a cluster with core and run it as real weed processes
                   on 127.0.0.1 (no cloud, no docker), then assert it works
    local-secure/  generate certs/JWT with the security module, run a real
                   mTLS cluster, and assert it forms + enforces JWT auth

Design in one paragraph

The SeaweedFS Helm chart is the structural reference, not a dependency. A cloud-agnostic core renders everything portable; thin per-cloud wrappers provision the infrastructure. Addressing is an input to the core (the wrapper reserves static IPs first), so the wrapper -> core dependency is one-way with no apply-time cycle. Stateful tiers (master/volume/filer) are keyed for_each maps, never count, so a middle node can be replaced without reindexing its peers or reattaching the wrong disk. Flag names are verified against the real weed binary (notably: volume uses -mserver for the master list; gRPC is -port.grpc, which defaults to the HTTP port + 10000; minFreeSpacePercent is a string).
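
As a rough illustration of the flag conventions above, the shell sketch below assembles a volume-server command line. The helper names (grpc_port, volume_argv) and the addresses are hypothetical and not taken from the core module. Passing -port.grpc explicitly is only there to show the http+10000 rule; weed derives it on its own when the flag is omitted.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; the core module renders argv in Terraform.

# Default gRPC port: HTTP port + 10000.
grpc_port() {
  echo $(( $1 + 10000 ))
}

# volume_argv HTTP_PORT MASTER_ADDR...
# Joins the master addresses with commas for -mserver.
volume_argv() {
  http_port=$1
  shift
  masters=$(IFS=,; echo "$*")
  echo "weed volume -port=$http_port -port.grpc=$(grpc_port "$http_port") -mserver=$masters"
}

volume_argv 8080 10.0.0.1:9333 10.0.0.2:9333
# prints: weed volume -port=8080 -port.grpc=18080 -mserver=10.0.0.1:9333,10.0.0.2:9333
```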

Test it locally

Requires tofu (or terraform), jq, curl, and a weed binary. Renders a 3-master + volume + filer + S3 cluster from the core module and runs it as real processes, asserting quorum, volume registration, and filer/S3 round-trips:

cd terraform/test/local
WEED=/path/to/weed ./run_local_cluster.sh
# => 7 passed, 0 failed

The harness uses a high port range (29333/28080/28888/28333) so it does not collide with a SeaweedFS cluster already running on the machine, and aborts if a required port is taken. KEEP=1 ./run_local_cluster.sh leaves the cluster up.
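
A port guard of this kind could look roughly like the sketch below. It is not the harness's actual code: the busy_ports helper is hypothetical, and it assumes a Linux host where `ss -ltnH` lists listening TCP sockets with the local address in the fourth column.

```shell
#!/bin/sh
# busy_ports PORT...
# Reads "ss -ltnH"-style lines on stdin and prints every requested port
# that already has a listener.
busy_ports() {
  # Column 4 is "addr:port"; keep only the part after the last colon.
  listening=$(awk '{ n = split($4, a, ":"); print a[n] }')
  for p in "$@"; do
    if printf '%s\n' "$listening" | grep -qx "$p"; then
      echo "$p"
    fi
  done
}

# Example use (assumption: ss from iproute2 is installed):
#   taken=$(ss -ltnH | busy_ports 29333 28080 28888 28333)
#   [ -z "$taken" ] || { echo "ports already in use: $taken" >&2; exit 1; }
```

Reading the listener list from stdin keeps the check independent of the tool that produces it, so the same function can be exercised with canned input.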

mTLS end-to-end

cd terraform/test/local-secure
WEED=/path/to/weed ./run_local_secure.sh
# => generates a CA + component certs + JWT, renders security.toml, runs a real
#    mTLS cluster, asserts master/volume/filer form over mutual TLS and that the
#    filer enforces JWT signing (unsigned writes get 401). 5 passed.

Plan-level tests (no cloud)

cd terraform/modules/core && tofu test
# => 11 passed: peers list, -mserver vs -master, metrics gating,
#    security.toml conditions, all-in-one inheritance, ...

Validate the cloud wrappers

cd terraform/examples/aws-ha-distributed && tofu init && tofu validate

Running tofu apply also requires AWS credentials, a VPC, subnets, and an AMI with weed installed (bake one with Packer, or install weed at boot).

Status

Implemented and verified:

  • Tiers: master / volume / filer / s3 / all-in-one rendered by the core.
  • Disk mount: cloud-init runs a mount-disks.sh that auto-discovers the data device (or takes explicit candidate devices), checks blkid before mkfs so an existing filesystem is not reformatted, mounts it, and persists the mount to /etc/fstab by UUID. The AWS wrapper wires the protected EBS disk to /data.
  • mTLS + JWT: the security module generates the CA + per-component certs (distinct CNs) + JWT keys; the AWS wrapper stores them in SSM SecureString, grants an instance role, and the core renders a boot fetch script so secrets stay OUT of user_data. Proven end-to-end by test/local-secure.
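
The "persist to /etc/fstab by UUID" step from the Disk mount item can be sketched as follows. This is a minimal illustration, not the rendered mount-disks.sh: the function name, the ext4 filesystem type, and the mount options are assumptions. It covers only the fstab step, including a fail-fast check for an empty UUID.

```shell
#!/bin/sh
# ensure_fstab_entry UUID MOUNTPOINT [FSTAB]
# Appends a UUID-based entry once; repeated calls leave the file unchanged.
ensure_fstab_entry() {
  uuid=$1
  mountpoint=$2
  fstab=${3:-/etc/fstab}

  # Refuse an empty UUID (e.g. blkid returned nothing) instead of writing
  # or matching a meaningless entry.
  if [ -z "$uuid" ]; then
    echo "no filesystem UUID for $mountpoint" >&2
    return 1
  fi

  # -F treats the pattern as a fixed string; the trailing space avoids
  # matching a longer UUID that merely shares this prefix.
  if ! grep -qF "UUID=$uuid " "$fstab" 2>/dev/null; then
    # Assumed filesystem type and options, for illustration only.
    printf 'UUID=%s %s ext4 defaults,nofail 0 2\n' "$uuid" "$mountpoint" >> "$fstab"
  fi
}

# Example use (assumes /dev/nvme1n1 is the data disk):
#   ensure_fstab_entry "$(blkid -s UUID -o value /dev/nvme1n1)" /data
```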

Not yet implemented:

  • sftp / admin / worker tiers (core does not render them yet).
  • GCP/Azure wrappers, the data-plane provider, and the K8s-native module.
  • CA in Vault PKI (currently TF-generated, so the CA key lives in state). KMS decrypt in the IAM policy is scoped by ViaService, but the kms_key_arn variable defaults to "*"; set it to the ARN of the SSM CMK in production.

Gotchas

  • This weed build starts an Iceberg REST catalog on :8181 by default; set s3.iceberg_port = 0 to disable, or a port to relocate it.
  • Disk auto-discovery assumes a single attached data disk per node. For multiple data disks, pass explicit devices candidates in disk_mounts.

Security note

tofu output/state can contain secrets. Generate secrets outside Terraform (cloud secret manager / Vault) and fetch them at boot; never commit secret material.
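
Fetching a secret at boot might look like the sketch below, assuming the AWS CLI is installed on the instance and its role may read the parameter. The fetch_secret helper, the parameter name, and the destination path are hypothetical and do not reproduce the rendered fetch-secrets.sh.

```shell
#!/bin/sh
# fetch_secret PARAMETER_NAME DEST
# Reads one SSM SecureString and writes it to DEST, readable by the owner only.
fetch_secret() {
  name=$1
  dest=$2
  (
    # The subshell keeps the restrictive umask from leaking to the caller.
    umask 077
    if aws ssm get-parameter --name "$name" --with-decryption \
         --query 'Parameter.Value' --output text > "$dest.tmp"; then
      mv "$dest.tmp" "$dest"
    else
      # Do not leave a partial or empty file behind on failure.
      rm -f "$dest.tmp"
      exit 1
    fi
  )
}

# Example use (hypothetical parameter name and path):
#   fetch_secret /seaweedfs/demo/ca.crt /etc/seaweedfs/ca.crt
```

Writing to a temporary file first means a failed fetch never leaves a truncated certificate where weed would try to load it.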