fix(s3): encrypt SSE-S3 KEK at rest with AES-GCM wrapping (#8880)

* fix(s3): encrypt SSE-S3 KEK at rest using passphrase-derived wrapping key

* fix(s3): surface KEK migration failures instead of silently dropping them

The legacy-plaintext -> encrypted-at-rest path used to swallow both
wrapKEK and updateKEKContent errors. An operator who configured a
passphrase but had a filer permission issue (or a wrap failure) saw
nothing in the logs and the KEK stayed on disk in plaintext, with the
migration retried on every restart and silently failing every time.

Log each failure path explicitly so the unmigrated state is visible.
The server still starts with the in-memory key loaded — refusing to
boot here would be worse than the warning.

Addresses gemini and coderabbit reviews on PR #8880.

* fix(s3): use a per-installation random salt for KEK wrapping HKDF

The original implementation hardcoded
"seaweedfs-sse-s3-kek-wrapping-v1" as the HKDF salt for the KEK
wrapping key. Two SeaweedFS installations using the same passphrase
therefore produced byte-identical wrapping keys, opening a
precomputation/rainbow-table angle against weaker passphrases.

Generate a random 32-byte salt every time wrapKEK runs and embed it
in the on-disk payload alongside the AES-GCM ciphertext. The new
format is base64(magic("SWv2") || salt || nonce || ciphertext+tag);
unwrapKEK detects the magic and reads the salt out of the payload.
KEKs wrapped under the legacy fixed-salt format still unwrap cleanly
and are opportunistically re-wrapped into v2 on the next load so
operators get the stronger format without manual migration.

Addresses gemini review on PR #8880.
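
The v2 layout described above can be sketched in isolation. This is a minimal illustration, not the code from the commit: the names `deriveKey`, `wrap`, and `unwrap` are hypothetical, only v2 payloads are handled, and HKDF-SHA256 is written out by hand (single output block, per RFC 5869) purely to keep the example free of third-party imports.

```go
package main

import (
	"bytes"
	"crypto/aes"
	"crypto/cipher"
	"crypto/hmac"
	"crypto/rand"
	"crypto/sha256"
	"encoding/base64"
	"errors"
	"io"
)

const saltSize = 32

var magic = []byte("SWv2")

// deriveKey is a hand-rolled, single-block HKDF-SHA256 (RFC 5869).
// Assumption: it stands in for the hkdf package used in the real code.
func deriveKey(passphrase string, salt []byte) []byte {
	extract := hmac.New(sha256.New, salt) // HKDF-Extract: PRK = HMAC(salt, IKM)
	extract.Write([]byte(passphrase))
	prk := extract.Sum(nil)

	expand := hmac.New(sha256.New, prk) // HKDF-Expand: T(1) = HMAC(PRK, info || 0x01)
	expand.Write([]byte("kek-wrapping"))
	expand.Write([]byte{0x01})
	return expand.Sum(nil) // 32 bytes, enough for one AES-256 key
}

func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// wrap returns base64(magic || salt || nonce || ciphertext+tag).
func wrap(passphrase string, kek []byte) (string, error) {
	salt := make([]byte, saltSize)
	if _, err := io.ReadFull(rand.Reader, salt); err != nil {
		return "", err
	}
	gcm, err := newGCM(deriveKey(passphrase, salt))
	if err != nil {
		return "", err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := io.ReadFull(rand.Reader, nonce); err != nil {
		return "", err
	}
	out := append([]byte{}, magic...)
	out = append(out, salt...)
	out = append(out, nonce...)
	out = gcm.Seal(out, nonce, kek, nil) // appends ciphertext+tag after the header
	return base64.StdEncoding.EncodeToString(out), nil
}

// unwrap reverses wrap. It rejects anything that is not a v2 payload.
func unwrap(passphrase, wrapped string) ([]byte, error) {
	raw, err := base64.StdEncoding.DecodeString(wrapped)
	if err != nil {
		return nil, err
	}
	header := len(magic) + saltSize
	if len(raw) <= header || !bytes.HasPrefix(raw, magic) {
		return nil, errors.New("not a v2 payload")
	}
	gcm, err := newGCM(deriveKey(passphrase, raw[len(magic):header]))
	if err != nil {
		return nil, err
	}
	rest := raw[header:]
	if len(rest) < gcm.NonceSize() {
		return nil, errors.New("payload too short")
	}
	return gcm.Open(nil, rest[:gcm.NonceSize()], rest[gcm.NonceSize():], nil)
}
```

Because the salt is drawn fresh on every call, two `wrap` calls with the same passphrase and KEK should yield different output, while either output still unwraps with that passphrase.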

* fix(s3): plumb KEK passphrase from env into the global key manager

InitializeGlobalSSES3KeyManager used to ignore the kekPassphrase
field entirely — the global manager was always constructed with the
empty constructor, so the encrypted-at-rest code path never engaged
in production. Read the passphrase from WEED_S3_SSE_KEK_PASSPHRASE
and apply it before InitializeWithFiler so the load path picks up the
encrypted format. Log a warning when the env var is unset to make the
plaintext fallback visible to operators upgrading from earlier builds.

Adds SetKEKPassphrase as the public seam used by the global init and
by tests, plus regression tests for the wrap/unwrap round-trip,
random-salt independence across managers sharing a passphrase, and
the no-passphrase fallback that preserves the legacy hex-decode path.

Addresses coderabbit review on PR #8880.

* fix(s3): drop redundant base64 decode in KEK migration check

unwrapKEK already does the base64 decode and the magic-prefix check;
isV2WrappedKEK was repeating both passes purely so the migration
branch in loadSuperKeyFromFiler could ask "was this v1 or v2". Have
unwrapKEK return the version flag directly and delete the redundant
helper. Single decode pass per load.

Addresses gemini review on PR #8880.
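
The single-pass idea can be illustrated with a small parser that reports the format while it splits the payload. This is a sketch under assumptions: `parseWrapped` is a hypothetical helper that takes the already base64-decoded bytes, whereas the commit folds this logic into unwrapKEK itself.

```go
package main

import "bytes"

const saltLen = 32

var (
	v2Magic    = []byte("SWv2")
	legacySalt = []byte("seaweedfs-sse-s3-kek-wrapping-v1")
)

// parseWrapped splits a decoded payload into the HKDF salt and the
// remaining nonce||ciphertext+tag bytes, and reports whether the payload
// was in the v2 format. Callers can use isV2 to decide on a rewrap without
// decoding or inspecting the payload a second time.
func parseWrapped(raw []byte) (salt, rest []byte, isV2 bool) {
	header := len(v2Magic) + saltLen
	if len(raw) > header && bytes.HasPrefix(raw, v2Magic) {
		return raw[len(v2Magic):header], raw[header:], true
	}
	// No magic: treat the whole payload as v1 and use the fixed salt.
	return legacySalt, raw, false
}
```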

* fix(s3): updateKEKContent must overwrite, not create

filer_pb.MkFile maps to CreateEntry, which is O_EXCL: it fails with
ErrEntryAlreadyExists when the file already exists. Both KEK migration
paths (legacy v1→v2 rewrap and plaintext→encrypted) call
updateKEKContent against the entry they just read, so MkFile errored
out every time and the migrations only ran in memory while the on-disk
KEK stayed in its old format. The previous commit logged the failure
loudly but the result was the same: a pre-existing deployment never
got migrated.

Switch updateKEKContent to LookupDirectoryEntry + UpdateEntry so the
overwrite actually persists. Surface the lookup/update errors so
the caller's existing "migration failed; KEK still on disk in old
form" warnings fire on the right cases.

Addresses coderabbit critical review on PR #8880.

* chore(s3): drop unused generateAndSaveSuperKeyToFiler

Master removed this as dead code in #8913 and I reintroduced it during
the merge resolution thinking the migration paths still needed it.
On second look it has no callers in this branch — every KEK-creation
path on PR #8880 goes through the existing reader code that handles
"file not present, generate" inline. Drop the duplicate.

Addresses gemini medium review on PR #8880.

* fix(s3): updateKEKContent honours km.kekPath instead of hardcoded path

The migration write path always pointed at /etc/s3/sse_kek, even when
the manager was configured with an operator-overridden kekPath.
Split km.kekPath at the last "/" so the lookup + UpdateEntry land on
the same file the read path used. An empty kekPath, or one without a
directory component, falls back to the defaultKEKPath location.

Addresses gemini medium review on PR #8880.
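
A standalone sketch of the split, with hypothetical names (`splitPath`, `defaultDir`, `defaultName`) mirroring the `idx <= 0` check in the diff, makes the fallback cases explicit:

```go
package main

import "strings"

const (
	defaultDir  = "/etc/s3"
	defaultName = "sse_kek"
)

// splitPath splits an absolute file path into (directory, name) at the
// last "/". The idx <= 0 check covers three cases: an empty path, a path
// with no slash at all, and a file sitting directly under "/".
func splitPath(p string) (dir, name string) {
	idx := strings.LastIndex(p, "/")
	if idx <= 0 {
		return defaultDir, defaultName
	}
	return p[:idx], p[idx+1:]
}
```

Under this check, a path such as "/custom/keys/kek" splits into "/custom/keys" and "kek", while "/kek" resolves to the default location rather than to the filer root. Whether that last case matters in practice depends on how operators set kekPath.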

* fix(s3): KEK passphrase via Viper key, with env-var fallback

The KEK passphrase was read straight from os.Getenv, but every other
SSE-S3 secret (s3.sse.kek, s3.sse.key) goes through Viper so an
operator can set them in security.toml or via WEED_ env vars
interchangeably. Add s3.sse.kek.passphrase to the same path; the
existing SSES3KEKPassphraseEnv lookup stays as a fallback so
deployments wired before this commit keep working.

Addresses gemini medium review on PR #8880.
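
The lookup order can be sketched without Viper. Assumptions: `resolvePassphrase` is a hypothetical helper, the `config` map stands in for the Viper store, and `getenv` would be os.Getenv in a real caller.

```go
package main

// resolvePassphrase prefers the config key and falls back to the
// environment variable. It also reports where the value came from so a
// caller can log the plaintext fallback when source is "none".
func resolvePassphrase(config map[string]string, getenv func(string) string) (value, source string) {
	if v := config["s3.sse.kek.passphrase"]; v != "" {
		return v, "config"
	}
	if v := getenv("WEED_S3_SSE_KEK_PASSPHRASE"); v != "" {
		return v, "env"
	}
	return "", "none"
}
```

Passing the environment lookup as a function keeps the precedence testable without mutating the process environment.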
Chris Lu
2026-05-04 19:21:41 -07:00
committed by GitHub
parent e1d5e3899f
commit 0a91b57f16
2 changed files with 369 additions and 16 deletions
+280 -16
@@ -15,6 +15,8 @@ import (
"io"
mathrand "math/rand"
"net/http"
"os"
"strings"
"sync"
"time"
@@ -269,15 +271,25 @@ func DeserializeSSES3Metadata(data []byte, keyManager *SSES3KeyManager) (*SSES3K
// SSES3KeyManager manages SSE-S3 encryption keys using envelope encryption
// Instead of storing keys in memory, it uses a super key (KEK) to encrypt/decrypt DEKs
type SSES3KeyManager struct {
mu sync.RWMutex
superKey []byte // 256-bit master key (KEK - Key Encryption Key)
filerClient filer_pb.FilerClient // Filer client for KEK persistence
kekPath string // Path in filer where KEK is stored (e.g., /etc/s3/sse_kek)
mu sync.RWMutex
superKey []byte // 256-bit master key (KEK - Key Encryption Key)
filerClient filer_pb.FilerClient // Filer client for KEK persistence
kekPath string // Path in filer where KEK is stored (e.g., /etc/s3/sse_kek)
kekPassphrase string // If set, KEK is encrypted at rest using a key derived from this passphrase
}
const (
// KEK storage layout on the filer. The migration write path
// (updateKEKContent, via splitKEKPath) falls back to this directory +
// filename split when km.kekPath is empty; defaultKEKPath is the joined
// form kept for the existing reader code.
SSES3KEKDirectory = "/etc/s3"
SSES3KEKParentDir = "/etc"
SSES3KEKDirName = "s3"
SSES3KEKFileName = "sse_kek"
// Legacy KEK path on the filer (backward compatibility)
defaultKEKPath = "/etc/s3/sse_kek"
defaultKEKPath = SSES3KEKDirectory + "/" + SSES3KEKFileName
// security.toml keys (also settable via env vars WEED_S3_SSE_KEK / WEED_S3_SSE_KEY):
//
@@ -288,16 +300,139 @@ const (
// s3.sse.key: any secret string; a 256-bit key is derived via HKDF-SHA256.
// Cannot be used while /etc/s3/sse_kek exists — the filer file must be
// deleted first (to avoid silently orphaning old data).
sseS3KEKConfigKey = "s3.sse.kek"
sseS3KeyConfigKey = "s3.sse.key"
sseS3KEKConfigKey = "s3.sse.kek"
sseS3KeyConfigKey = "s3.sse.key"
sseS3KEKPassphraseConfigKey = "s3.sse.kek.passphrase"
)
// NewSSES3KeyManager creates a new SSE-S3 key manager with envelope encryption
func NewSSES3KeyManager() *SSES3KeyManager {
// This will be initialized properly when attached to an S3ApiServer
return &SSES3KeyManager{
// legacyKEKWrappingSalt is the fixed salt the original implementation used
// for HKDF derivation. It is retained for backward compatibility — KEKs
// wrapped before per-installation salts shipped (the v1 format below) are
// still unwrappable. New writes always use a random salt.
var legacyKEKWrappingSalt = []byte("seaweedfs-sse-s3-kek-wrapping-v1")
// kekWrappedV2Magic identifies the new on-disk format that prefixes the
// wrapped KEK with a random salt. Seeing this magic at byte 0 of the
// decoded payload tells unwrapKEK to read the per-installation salt
// instead of falling back to legacyKEKWrappingSalt.
var kekWrappedV2Magic = []byte{0x53, 0x57, 0x76, 0x32} // "SWv2"
// kekRandomSaltSize is the per-installation salt length in bytes for HKDF.
// 32 bytes matches the SHA-256 output length, which RFC 5869 names as the
// ideal salt length.
const kekRandomSaltSize = 32
// NewSSES3KeyManager creates a new SSE-S3 key manager with envelope encryption.
// If kekPassphrase is non-empty, the KEK is encrypted at rest using a key derived from it.
func NewSSES3KeyManager(kekPassphrase ...string) *SSES3KeyManager {
km := &SSES3KeyManager{
kekPath: defaultKEKPath,
}
if len(kekPassphrase) > 0 {
km.kekPassphrase = kekPassphrase[0]
}
return km
}
// deriveWrappingKey derives a 256-bit AES key from the configured passphrase
// using HKDF-SHA256 with the supplied salt. Per-installation random salts
// land in the v2 format; the legacy fixed salt is still accepted for KEKs
// that were wrapped before random salts shipped.
func (km *SSES3KeyManager) deriveWrappingKey(salt []byte) ([]byte, error) {
if km.kekPassphrase == "" {
return nil, fmt.Errorf("no KEK passphrase configured")
}
hkdfReader := hkdf.New(sha256.New, []byte(km.kekPassphrase), salt, []byte("kek-wrapping"))
wrappingKey := make([]byte, SSES3KeySize)
if _, err := io.ReadFull(hkdfReader, wrappingKey); err != nil {
return nil, fmt.Errorf("HKDF derive wrapping key: %w", err)
}
return wrappingKey, nil
}
// wrapKEK encrypts the KEK using AES-GCM with a freshly-derived wrapping
// key. Output is base64(magic || salt || nonce || ciphertext+tag). The
// random salt defends against rainbow-table precomputation when several
// installations share a passphrase, and storing it next to the ciphertext
// means a passphrase rotation needs no separate salt migration.
func (km *SSES3KeyManager) wrapKEK(kek []byte) ([]byte, error) {
salt := make([]byte, kekRandomSaltSize)
if _, err := io.ReadFull(rand.Reader, salt); err != nil {
return nil, fmt.Errorf("generate KEK salt: %w", err)
}
wrappingKey, err := km.deriveWrappingKey(salt)
if err != nil {
return nil, err
}
block, err := aes.NewCipher(wrappingKey)
if err != nil {
return nil, err
}
gcm, err := cipher.NewGCM(block)
if err != nil {
return nil, err
}
nonce := make([]byte, gcm.NonceSize())
if _, err := io.ReadFull(rand.Reader, nonce); err != nil {
return nil, err
}
header := make([]byte, 0, len(kekWrappedV2Magic)+len(salt))
header = append(header, kekWrappedV2Magic...)
header = append(header, salt...)
sealed := gcm.Seal(append(header, nonce...), nonce, kek, nil) // magic || salt || nonce || ciphertext+tag
return []byte(base64.StdEncoding.EncodeToString(sealed)), nil
}
// unwrapKEK decrypts a wrapped KEK produced by wrapKEK. Two on-disk formats
// are accepted:
//
// v2 (preferred): magic("SWv2") || salt || nonce || ciphertext+tag — the
// salt is read from the payload before HKDF runs.
// v1 (legacy): nonce || ciphertext+tag — falls back to the fixed
// legacyKEKWrappingSalt; rewrapping into v2 happens via the migration
// path in loadSuperKeyFromFiler.
//
// The returned `isV2` flag tells the caller which format was on disk, so
// the migration path can rewrap legacy entries without re-decoding the
// base64 payload a second time.
func (km *SSES3KeyManager) unwrapKEK(wrapped []byte) (kek []byte, isV2 bool, err error) {
raw, err := base64.StdEncoding.DecodeString(string(wrapped))
if err != nil {
return nil, false, fmt.Errorf("base64 decode wrapped KEK: %w", err)
}
salt := legacyKEKWrappingSalt
payload := raw
if len(raw) > len(kekWrappedV2Magic)+kekRandomSaltSize && bytes.Equal(raw[:len(kekWrappedV2Magic)], kekWrappedV2Magic) {
salt = raw[len(kekWrappedV2Magic) : len(kekWrappedV2Magic)+kekRandomSaltSize]
payload = raw[len(kekWrappedV2Magic)+kekRandomSaltSize:]
isV2 = true
}
wrappingKey, err := km.deriveWrappingKey(salt)
if err != nil {
return nil, false, err
}
block, err := aes.NewCipher(wrappingKey)
if err != nil {
return nil, false, err
}
gcm, err := cipher.NewGCM(block)
if err != nil {
return nil, false, err
}
if len(payload) < gcm.NonceSize() {
return nil, false, fmt.Errorf("wrapped KEK too short")
}
nonce := payload[:gcm.NonceSize()]
ciphertext := payload[gcm.NonceSize():]
out, err := gcm.Open(nil, nonce, ciphertext, nil)
if err != nil {
return nil, false, err
}
return out, isV2, nil
}
// deriveKeyFromSecret derives a 256-bit key from an arbitrary secret string
@@ -458,10 +593,54 @@ func (km *SSES3KeyManager) loadSuperKeyFromFiler() error {
return fmt.Errorf("KEK entry is empty")
}
// Decode hex-encoded key
key, err := hex.DecodeString(string(entry.Content))
if err != nil {
return fmt.Errorf("failed to decode KEK: %w", err)
var key []byte
if km.kekPassphrase != "" {
// Try to unwrap encrypted KEK first
var wasV2 bool
key, wasV2, err = km.unwrapKEK(entry.Content)
if err == nil {
// Successful unwrap: if the payload was the legacy fixed-salt
// format, opportunistically rewrap it under a fresh per-installation
// salt so the next restart picks up the stronger format. The
// version flag comes straight out of unwrapKEK, avoiding a second
// base64 decode pass over the same content.
if !wasV2 {
if rewrapped, wrapErr := km.wrapKEK(key); wrapErr != nil {
glog.Warningf("SSE-S3 KeyManager: failed to rewrap legacy fixed-salt KEK to v2: %v", wrapErr)
} else if updErr := km.updateKEKContent(rewrapped); updErr != nil {
glog.Warningf("SSE-S3 KeyManager: failed to persist v2-rewrapped KEK: %v", updErr)
} else {
glog.V(1).Infof("SSE-S3 KeyManager: migrated KEK from fixed-salt v1 to per-installation salt v2")
}
}
} else {
// Fall back: maybe this is a legacy plaintext hex KEK — try to decode and re-wrap
legacyKey, hexErr := hex.DecodeString(string(entry.Content))
if hexErr != nil || len(legacyKey) != SSES3KeySize {
return fmt.Errorf("failed to unwrap KEK: %w", err)
}
glog.Warningf("SSE-S3 KeyManager: migrating plaintext KEK to encrypted storage")
key = legacyKey
// Re-save in encrypted form. Both failure modes used to be swallowed,
// which left the KEK on disk in plaintext while startup proceeded —
// an operator setting a passphrase saw a silent no-op and no signal
// that the migration had failed. Log loudly so the next restart
// makes the unmigrated state obvious; we still load the in-memory
// key so the server stays up.
wrapped, wrapErr := km.wrapKEK(key)
if wrapErr != nil {
glog.Errorf("SSE-S3 KeyManager: failed to wrap legacy KEK during migration; KEK remains plaintext on filer: %v", wrapErr)
} else if updErr := km.updateKEKContent(wrapped); updErr != nil {
glog.Errorf("SSE-S3 KeyManager: failed to persist wrapped KEK during migration; KEK remains plaintext on filer: %v", updErr)
}
}
} else {
// Legacy plaintext hex mode
glog.Warningf("SSE-S3 KeyManager: KEK stored in plaintext — set a KEK passphrase for encrypted storage")
key, err = hex.DecodeString(string(entry.Content))
if err != nil {
return fmt.Errorf("failed to decode KEK: %w", err)
}
}
if len(key) != SSES3KeySize {
@@ -472,6 +651,57 @@ func (km *SSES3KeyManager) loadSuperKeyFromFiler() error {
return nil
}
// updateKEKContent overwrites the existing KEK file content in the filer.
// Used by the plaintext→encrypted migration path and by the v1→v2 salt
// rewrap; both run after a successful read of the current KEK, so the
// entry is guaranteed to exist. MkFile uses CreateEntry which fails with
// ErrEntryAlreadyExists when the file is already there — we need
// UpdateEntry instead so the migration actually persists.
//
// Splits km.kekPath at the last "/" so an operator-overridden path is
// honoured. Defaults match defaultKEKPath when km.kekPath is unset.
func (km *SSES3KeyManager) updateKEKContent(content []byte) error {
dir, name := splitKEKPath(km.kekPath)
ctx := context.Background()
return km.filerClient.WithFilerClient(false, func(client filer_pb.SeaweedFilerClient) error {
resp, err := client.LookupDirectoryEntry(ctx, &filer_pb.LookupDirectoryEntryRequest{
Directory: dir,
Name: name,
})
if err != nil {
return fmt.Errorf("lookup KEK entry: %w", err)
}
entry := resp.Entry
if entry == nil {
return fmt.Errorf("KEK entry not found at %s/%s", dir, name)
}
entry.Content = content
if entry.Attributes == nil {
entry.Attributes = &filer_pb.FuseAttributes{}
}
entry.Attributes.FileMode = 0600
entry.Attributes.FileSize = uint64(len(content))
entry.Attributes.Mtime = time.Now().Unix()
return filer_pb.UpdateEntry(ctx, client, &filer_pb.UpdateEntryRequest{
Directory: dir,
Entry: entry,
})
})
}
// splitKEKPath splits an absolute KEK file path into (directory, name).
// Falls back to the default location if the path is empty or has no slash.
func splitKEKPath(p string) (dir, name string) {
if p == "" {
return SSES3KEKDirectory, SSES3KEKFileName
}
idx := strings.LastIndex(p, "/")
if idx <= 0 {
return SSES3KEKDirectory, SSES3KEKFileName
}
return p[:idx], p[idx+1:]
}
// GetOrCreateKey gets an existing key or creates a new one
// With envelope encryption, we always generate a new DEK since we don't store them
func (km *SSES3KeyManager) GetOrCreateKey(keyID string) (*SSES3Key, error) {
@@ -570,9 +800,26 @@ func (km *SSES3KeyManager) GetMasterKey() []byte {
return derived
}
// SSES3KEKPassphraseEnv is the legacy environment variable from which the
// global SSE-S3 key manager picks up its KEK-wrapping passphrase. The Viper
// config key sseS3KEKPassphraseConfigKey ("s3.sse.kek.passphrase") is the
// preferred way to set it — same precedence as s3.sse.kek and s3.sse.key —
// but the env var is honoured as a fallback so deployments that wired only
// the env keep working.
const SSES3KEKPassphraseEnv = "WEED_S3_SSE_KEK_PASSPHRASE"
// Global SSE-S3 key manager instance
var globalSSES3KeyManager = NewSSES3KeyManager()
// SetKEKPassphrase configures the KEK-wrapping passphrase. Must be called
// before InitializeWithFiler — the load path reads the passphrase to decide
// whether to attempt unwrap or fall back to plaintext-hex parsing.
func (km *SSES3KeyManager) SetKEKPassphrase(passphrase string) {
km.mu.Lock()
defer km.mu.Unlock()
km.kekPassphrase = passphrase
}
// GetSSES3KeyManager returns the global SSE-S3 key manager
func GetSSES3KeyManager() *SSES3KeyManager {
return globalSSES3KeyManager
@@ -596,8 +843,25 @@ func (k *KeyManagerFilerClient) WithFilerClient(streamingMode bool, fn func(file
return pb.WithGrpcFilerClient(streamingMode, 0, filerAddress, k.grpcDialOption, fn)
}
// InitializeGlobalSSES3KeyManager initializes the global key manager with filer access
// InitializeGlobalSSES3KeyManager initializes the global key manager with
// filer access. The KEK-wrapping passphrase is sourced from the Viper
// config key s3.sse.kek.passphrase (matching the s3.sse.kek and
// s3.sse.key conventions, settable via security.toml or
// WEED_S3_SSE_KEK_PASSPHRASE env), with a fallback to the bare
// SSES3KEKPassphraseEnv lookup for deployments wired before the Viper key
// existed. If neither is set the KEK falls back to plaintext at-rest
// storage (with a startup warning).
func InitializeGlobalSSES3KeyManager(filerClient *wdclient.FilerClient, grpcDialOption grpc.DialOption) error {
passphrase := util.GetViper().GetString(sseS3KEKPassphraseConfigKey)
if passphrase == "" {
passphrase = os.Getenv(SSES3KEKPassphraseEnv)
}
if passphrase != "" {
globalSSES3KeyManager.SetKEKPassphrase(passphrase)
} else {
glog.Warningf("SSE-S3 KeyManager: neither %s nor %s is set; the KEK will be stored on the filer in plaintext. Set one to enable encrypted-at-rest KEK storage.", sseS3KEKPassphraseConfigKey, SSES3KEKPassphraseEnv)
}
wrapper := &KeyManagerFilerClient{
FilerClient: filerClient,
grpcDialOption: grpcDialOption,
@@ -0,0 +1,89 @@
package s3api
import (
"testing"
)
// TestSetKEKPassphraseEnablesEncryptedRoundTrip exercises the wrap/unwrap
// round-trip after SetKEKPassphrase configures the manager. This is the
// primary behaviour the new env-var wiring relies on: once the passphrase
// is in place, wrapKEK produces v2-format ciphertext that unwrapKEK can
// reverse. Without the SetKEKPassphrase plumbing, deriveWrappingKey would
// fail with "no KEK passphrase configured".
func TestSetKEKPassphraseEnablesEncryptedRoundTrip(t *testing.T) {
km := NewSSES3KeyManager()
km.SetKEKPassphrase("test-passphrase-32-bytes-or-anything")
original := make([]byte, SSES3KeySize)
for i := range original {
original[i] = byte(i)
}
wrapped, err := km.wrapKEK(original)
if err != nil {
t.Fatalf("wrapKEK: %v", err)
}
got, isV2, err := km.unwrapKEK(wrapped)
if err != nil {
t.Fatalf("unwrapKEK: %v", err)
}
if !isV2 {
t.Fatal("wrapped output should be reported as v2 by unwrapKEK")
}
if string(got) != string(original) {
t.Fatalf("round-trip mismatch: got %x want %x", got, original)
}
}
// TestSetKEKPassphraseDifferentInstancesNoCollision proves random salts
// per-installation actually decorrelate two managers using the same
// passphrase: their wrapped payloads must differ even when the input KEK
// matches.
func TestSetKEKPassphraseDifferentInstancesNoCollision(t *testing.T) {
a := NewSSES3KeyManager()
a.SetKEKPassphrase("shared-passphrase")
b := NewSSES3KeyManager()
b.SetKEKPassphrase("shared-passphrase")
kek := make([]byte, SSES3KeySize)
for i := range kek {
kek[i] = 0xAB
}
wa, err := a.wrapKEK(kek)
if err != nil {
t.Fatalf("wrapKEK a: %v", err)
}
wb, err := b.wrapKEK(kek)
if err != nil {
t.Fatalf("wrapKEK b: %v", err)
}
if string(wa) == string(wb) {
t.Fatal("two managers with the same passphrase produced byte-identical wrapped output; salt is not random")
}
// Both must still self-roundtrip; check the unwrap error as well as the bytes.
if got, _, err := a.unwrapKEK(wa); err != nil || string(got) != string(kek) {
t.Fatalf("manager a failed self roundtrip: %v", err)
}
if got, _, err := b.unwrapKEK(wb); err != nil || string(got) != string(kek) {
t.Fatalf("manager b failed self roundtrip: %v", err)
}
}
// TestNoPassphraseKeepsLegacyHexDecodePath checks the preconditions for the
// historical plaintext-hex path: a freshly constructed manager has no
// passphrase, so loadSuperKeyFromFiler takes the hex-decode branch, and
// wrapKEK refuses to run. That's the fallback
// InitializeGlobalSSES3KeyManager keeps for upgrades.
func TestNoPassphraseKeepsLegacyHexDecodePath(t *testing.T) {
km := NewSSES3KeyManager()
if km.kekPassphrase != "" {
t.Fatalf("default passphrase should be empty, got %q", km.kekPassphrase)
}
// wrapKEK requires a passphrase; without one it must surface the error
// rather than producing unencrypted output.
if _, err := km.wrapKEK(make([]byte, SSES3KeySize)); err == nil {
t.Fatal("wrapKEK without passphrase should fail; got nil error")
}
}