From 9ae905e4567656d0d8c98d41d913d5c404743a41 Mon Sep 17 00:00:00 2001 From: Chris Lu Date: Tue, 21 Apr 2026 20:20:11 -0700 Subject: [PATCH] feat(security): hot-reload HTTPS certs without restart (k8s cert-manager) (#9181) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * feat(security): hot-reload HTTPS certs for master/volume/filer/webdav/admin S3 and filer already use a refreshing pemfile provider for their HTTPS cert, so rotated certificates (e.g. from k8s cert-manager) are picked up without a restart. Master, volume, webdav, and admin, however, passed cert/key paths straight to ServeTLS/ListenAndServeTLS and loaded once at startup — rotating those certs required a pod restart. Add a small helper NewReloadingServerCertificate in weed/security that wraps pemfile.Provider and returns a tls.Config.GetCertificate closure, then wire it into the four remaining HTTPS entry points. httpdown now also calls ServeTLS when TLSConfig carries a GetCertificate/Certificates but CertFile/KeyFile are empty, so volume server can pre-populate TLSConfig. A unit test exercises the rotation path (write cert, rotate on disk, assert the callback returns the new cert) with a short refresh window. * refactor(security): route filer/s3 HTTPS through the shared cert reloader Before: filer.go and s3.go each kept a *certprovider.Provider on the options struct plus a duplicated GetCertificateWithUpdate method. Both were loading pemfile themselves. Behaviorally they already reloaded, but the logic was duplicated two ways and neither path was shared with the newly-added master/volume/webdav/admin wiring. After: both use security.NewReloadingServerCertificate like the other servers. The per-struct certProvider field and GetCertificateWithUpdate method are removed, along with the now-unused certprovider and pemfile imports. Net: -32 lines, one code path for all HTTPS cert reloading. No behavior change — the refresh window, cache, and handshake contract are identical (the helper wraps the same pemfile.NewProvider). * feat(security): hot-reload HTTPS client certs for mount/backup/upload/etc The HTTP client in weed/util/http/client loaded the mTLS client cert once at startup via tls.LoadX509KeyPair. That left every long-lived HTTPS client process (weed mount, backup, filer.copy, filer→volume, s3→filer/volume) unable to pick up a rotated client cert without a restart — even though the same cert-manager setup was already rotating the server side fine. Swap the client cert loader for a tls.Config.GetClientCertificate callback backed by the same refreshing pemfile provider. New TLS handshakes pick up the rotated cert; in-flight pooled connections keep their old cert and drop as normal transport churn happens. To keep this reusable from both server and client TLS code without an import cycle (weed/security already imports weed/util/http/client for LoadHTTPClientFromFile), extract the pemfile wrapper into a new weed/security/certreload subpackage. weed/security keeps its thin NewReloadingServerCertificate wrapper. The existing unit test moves with the implementation. gRPC mTLS was already handled by security.LoadServerTLS / LoadClientTLS; this PR does not change any gRPC paths. MQ broker, MQ agent, Kafka gateway, and FUSE mount control plane are gRPC-only and therefore already rotate. CA bundles (ClientCAs / RootCAs / grpc.ca) are still loaded once — noted as a known limitation in the wiki. * fix(security): address PR review feedback on cert reloader Bots (gemini-code-assist + coderabbit) flagged three real issues and a couple of nits. Addressing them here: 1. KeyMaterial used context.Background(). The grpc pemfile provider's KeyMaterial blocks until material arrives or the context deadline expires; with Background() a slow disk could hang the TLS handshake indefinitely. Switched both the server and client callbacks to use hello.Context() / cri.Context() so a stuck read is bounded by the handshake timeout. 2. Admin server loaded TLS inside the serve goroutine. If the cert was bad, the goroutine returned but startAdminServer kept blocking on <-ctx.Done() with no listener, making the process look healthy with nothing bound. Moved TLS setup to run before the goroutine starts and propagate errors via fmt.Errorf; also captures the provider and defers Close(). 3. HTTP client discarded the certprovider.Provider from NewClientGetCertificate. That leaked the refresh goroutine, and NewHttpClientWithTLS had a worse case where a CA-file failure after provider creation orphaned the provider entirely. Added a certProvider field and a Close() method on HTTPClient, and made the constructors close the provider on subsequent error paths. 4. Server-side paths (master/volume/filer/s3/webdav/admin) now retain the provider. filer and webdav run ServeTLS synchronously, so a plain defer works. master/volume/s3 dispatch goroutines and return while the server keeps running, so they hook Close() into grace.OnInterrupt. 5. Test: certreload_test now tolerates transient read/parse errors during file rotation (writeSelfSigned rewrites cert before key) and reports the last error only if the deadline expires. No user-visible behavior change for the happy path. * test(tls): add end-to-end HTTPS cert rotation integration test Boots a real `weed master` with HTTPS enabled, captures the leaf cert served at TLS handshake time, atomically rewrites the cert/key files on disk (the same rename-in-place pattern kubelet does when it swaps a cert-manager Secret), and asserts that a subsequent TLS handshake observes the rotated leaf — with no process restart, no SIGHUP, no reloader sidecar. Verifies the full path: on-disk change → pemfile refresh tick → provider.KeyMaterial → tls.Config.GetCertificate → server TLS handshake. Runtime is ~1s by exposing the reloader's refresh window as an env var (WEED_TLS_CERT_REFRESH_INTERVAL) and setting it to 500ms for the test. The same env var is user-facing — documented in the wiki — so operators running short-lived certs (Vault, cert-manager with duration: 24h, etc.) can tighten the rotation-pickup window without a rebuild. Defaults to 5h to preserve prior behavior. security.CredRefreshingInterval is kept for API compatibility but now aliases certreload.DefaultRefreshInterval so the same env controls both gRPC mTLS and HTTPS reload. * ci(tls): wire the TLS rotation integration test into GitHub Actions Mirrors the existing vacuum-integration-tests.yml shape: Ubuntu runner, Go 1.25, build weed, run `go test` in test/tls_rotation, upload master logs on failure. 10-minute job timeout; the test itself finishes in about a second because WEED_TLS_CERT_REFRESH_INTERVAL is set to 500ms inside the test. Runs on every push to master and on every PR to master. * fix(tls): address follow-up PR review comments Three new comments on the integration test + volume shutdown path: 1. Test: peekServerCert was swallowing every dial/handshake error, which meant waitForCert's "last err: " fatal message lost all diagnostic value. Thread errors back through: peekServerCert now returns (*x509.Certificate, error), and waitForCert records the latest error so a CI flake points at the actual cause (master didn't come up, handshake rejected, CA pool mismatch, etc.). 2. Test: set HOME= on the master subprocess. Viper today registers the literal path "$HOME/.seaweedfs" without env expansion, so a developer's ~/.seaweedfs/security.toml is accidentally invisible — the test was relying on that. Pinning HOME is belt-and-braces against a future viper upgrade that does expand env vars. 3. volume.go: startClusterHttpService's provider close was registered via grace.OnInterrupt, which fires on SIGTERM but NOT on the v.shutdownCtx.Done() path used by mini / integration tests. The pemfile refresh goroutine leaked in that shutdown path. Now the helper returns a close func and the caller invokes it on BOTH shutdown paths for parity. Also add MinVersion: TLS 1.2 to the test's tls.Config to quiet the ast-grep static-analysis nit — zero-risk since the pool only trusts our in-memory CA. Test runs clean 3/3. --- .github/workflows/tls-rotation-tests.yml | 56 +++ .../tls_rotation_integration_test.go | 323 ++++++++++++++++++ weed/command/admin.go | 75 ++-- weed/command/filer.go | 25 +- weed/command/master.go | 15 +- weed/command/s3.go | 25 +- weed/command/volume.go | 35 +- weed/command/webdav.go | 9 +- weed/security/certreload/certreload.go | 120 +++++++ weed/security/certreload/certreload_test.go | 122 +++++++ weed/security/tls.go | 8 +- weed/security/tls_reload.go | 20 ++ weed/util/http/client/http_client.go | 89 +++-- weed/util/httpdown/http_down.go | 11 +- 14 files changed, 824 insertions(+), 109 deletions(-) create mode 100644 .github/workflows/tls-rotation-tests.yml create mode 100644 test/tls_rotation/tls_rotation_integration_test.go create mode 100644 weed/security/certreload/certreload.go create mode 100644 weed/security/certreload/certreload_test.go create mode 100644 weed/security/tls_reload.go diff --git a/.github/workflows/tls-rotation-tests.yml b/.github/workflows/tls-rotation-tests.yml new file mode 100644 index 000000000..7b010e430 --- /dev/null +++ b/.github/workflows/tls-rotation-tests.yml @@ -0,0 +1,56 @@ +name: "TLS Rotation Integration Tests" + +on: + push: + branches: [ master ] + pull_request: + branches: [ master ] + +permissions: + contents: read + +jobs: + tls-rotation-tests: + name: TLS Rotation Integration Tests + runs-on: ubuntu-22.04 + timeout-minutes: 10 + steps: + - name: Set up Go 1.x + uses: actions/setup-go@v6 + with: + go-version: ^1.25 + id: go + + - name: Check out code into the Go module directory + uses: actions/checkout@v6 + + - name: Build weed binary + run: | + cd weed && go build -o weed . + + - name: Run TLS Rotation Integration Tests + working-directory: test/tls_rotation + run: | + go test -v -count=1 -timeout 5m + + - name: Collect server logs on failure + if: failure() + run: | + echo "Collecting master logs from temp directories..." + mkdir -p /tmp/tls-rotation-test-logs + find /tmp -maxdepth 1 -type d -name "TestMasterHTTPS*" 2>/dev/null | while read dir; do + if [ -d "$dir" ]; then + echo "Found test directory: $dir" + cp -r "$dir" /tmp/tls-rotation-test-logs/ 2>/dev/null || true + fi + done + echo "Collected logs:" + find /tmp/tls-rotation-test-logs -type f -name "*.log" 2>/dev/null || echo "No logs found" + + - name: Archive logs + if: failure() + uses: actions/upload-artifact@v7 + with: + name: tls-rotation-test-logs + path: /tmp/tls-rotation-test-logs/ + retention-days: 14 diff --git a/test/tls_rotation/tls_rotation_integration_test.go b/test/tls_rotation/tls_rotation_integration_test.go new file mode 100644 index 000000000..e57a69a3a --- /dev/null +++ b/test/tls_rotation/tls_rotation_integration_test.go @@ -0,0 +1,323 @@ +// Package tls_rotation exercises HTTPS certificate rotation end-to-end: +// start a real `weed master` with an HTTPS listener, capture the leaf +// served at handshake time, rewrite the cert/key files on disk, and +// assert that a subsequent handshake sees the new leaf — all without +// stopping the master process. The test shortens the reloader's refresh +// window to ~half a second via WEED_TLS_CERT_REFRESH_INTERVAL so it +// completes in seconds rather than hours. +package tls_rotation + +import ( + "context" + "crypto/ecdsa" + "crypto/elliptic" + "crypto/rand" + "crypto/tls" + "crypto/x509" + "crypto/x509/pkix" + "encoding/pem" + "fmt" + "math/big" + "net" + "os" + "os/exec" + "path/filepath" + "strconv" + "syscall" + "testing" + "time" +) + +// TestMasterHTTPSCertRotation boots `weed master` with HTTPS, confirms +// the initial leaf is served, rotates the cert/key pair on disk, and +// asserts the rotated leaf is served on subsequent TLS handshakes. +func TestMasterHTTPSCertRotation(t *testing.T) { + if testing.Short() { + t.Skip("skipping HTTPS rotation integration test in -short mode") + } + + weedBin := findWeedBinary(t) + + dir := t.TempDir() + tlsDir := filepath.Join(dir, "tls") + if err := os.MkdirAll(tlsDir, 0o755); err != nil { + t.Fatalf("mkdir tls: %v", err) + } + certPath := filepath.Join(tlsDir, "server.crt") + keyPath := filepath.Join(tlsDir, "server.key") + + ca, caKey := generateCA(t) + leafSerial1 := big.NewInt(10001) + leafSerial2 := big.NewInt(10002) + + // Initial leaf on disk. + writeLeaf(t, certPath, keyPath, ca, caKey, leafSerial1) + + masterDir := filepath.Join(dir, "master") + if err := os.MkdirAll(masterDir, 0o755); err != nil { + t.Fatalf("mkdir master: %v", err) + } + // Empty security.toml so the master doesn't pick up a user's + // ~/.seaweedfs/security.toml during the test. + if err := os.WriteFile(filepath.Join(masterDir, "security.toml"), []byte("# test\n"), 0o644); err != nil { + t.Fatalf("write security.toml: %v", err) + } + + // Master auto-derives gRPC port as port+10000 when -port.grpc is + // unset, so both must fit in uint16. Pin both explicitly. + port, grpcPort := getFreeTCPPort(t), getFreeTCPPort(t) + + ctx, cancel := context.WithTimeout(context.Background(), 90*time.Second) + defer cancel() + + cmd := exec.CommandContext(ctx, weedBin, "master", + "-ip", "127.0.0.1", + "-port", strconv.Itoa(port), + "-port.grpc", strconv.Itoa(grpcPort), + "-mdir", masterDir, + ) + cmd.Dir = masterDir + cmd.Env = append(os.Environ(), + // Isolate HOME so the subprocess cannot pick up a developer's + // ~/.seaweedfs/security.toml. Viper's AddConfigPath uses the + // literal string "$HOME/.seaweedfs" without env expansion today, + // so this is only belt-and-braces — but it insures us against a + // future viper upgrade that does expand env vars. + "HOME="+dir, + "WEED_HTTPS_MASTER_CERT="+certPath, + "WEED_HTTPS_MASTER_KEY="+keyPath, + // Short refresh window so rotation completes in seconds. + "WEED_TLS_CERT_REFRESH_INTERVAL=500ms", + ) + logPath := filepath.Join(masterDir, "master.log") + logOut, err := os.Create(logPath) + if err != nil { + t.Fatalf("create master log: %v", err) + } + cmd.Stdout = logOut + cmd.Stderr = logOut + + if err := cmd.Start(); err != nil { + t.Fatalf("start master: %v", err) + } + t.Cleanup(func() { + if cmd.Process != nil { + _ = cmd.Process.Signal(syscall.SIGTERM) + done := make(chan struct{}) + go func() { _ = cmd.Wait(); close(done) }() + select { + case <-done: + case <-time.After(10 * time.Second): + _ = cmd.Process.Kill() + <-done + } + } + _ = logOut.Close() + if t.Failed() { + if b, readErr := os.ReadFile(logPath); readErr == nil { + t.Logf("master.log:\n%s", string(b)) + } + } + }) + + caPool := x509.NewCertPool() + caPool.AddCert(ca) + + addr := fmt.Sprintf("127.0.0.1:%d", port) + + // 1. Wait for the initial leaf to appear. Master takes a few seconds + // to open its HTTPS listener. + waitForCert(t, addr, caPool, leafSerial1, 30*time.Second, "initial cert") + + // Sanity: same handshake twice still observes the initial leaf. + got, err := peekServerCert(addr, caPool) + if err != nil || got == nil || got.SerialNumber.Cmp(leafSerial1) != 0 { + t.Fatalf("second probe before rotation did not return initial leaf: cert=%v err=%v", got, err) + } + + // 2. Rotate on disk. pemfile watches mtime, so each file's write is + // an atomic rename (tempfile in the same directory). + writeLeaf(t, certPath, keyPath, ca, caKey, leafSerial2) + + // 3. Wait for new leaf to take over. With a 500ms refresh and no + // connection pooling (tls.Dial opens a fresh conn each time), this + // should take a couple of seconds. + waitForCert(t, addr, caPool, leafSerial2, 15*time.Second, "rotated cert") +} + +// waitForCert polls until a TLS handshake against addr yields a peer +// cert with the expected serial, or fails the test at the deadline. +// The last handshake error is surfaced in the fatal message so that a +// CI flake makes the root cause obvious (master didn't come up, TLS +// handshake rejected, CA pool mismatch, etc.). +func waitForCert(t *testing.T, addr string, caPool *x509.CertPool, wantSerial *big.Int, within time.Duration, label string) { + t.Helper() + deadline := time.Now().Add(within) + var lastErr error + var lastSerial *big.Int + for time.Now().Before(deadline) { + cert, err := peekServerCert(addr, caPool) + if err != nil { + lastErr = err + } else if cert != nil { + lastSerial = cert.SerialNumber + if cert.SerialNumber.Cmp(wantSerial) == 0 { + return + } + } + time.Sleep(250 * time.Millisecond) + } + t.Fatalf("timeout waiting for %s (want serial %s, last seen %v, last err %v)", label, wantSerial, lastSerial, lastErr) +} + +// peekServerCert opens a one-shot TLS connection and returns the leaf. +// Errors (dial failure, handshake rejection, empty peer chain) are +// returned rather than swallowed, so the caller can surface them when +// the test times out. +func peekServerCert(addr string, caPool *x509.CertPool) (*x509.Certificate, error) { + d := &net.Dialer{Timeout: 2 * time.Second} + conn, err := tls.DialWithDialer(d, "tcp", addr, &tls.Config{ + RootCAs: caPool, + ServerName: "localhost", + MinVersion: tls.VersionTLS12, + }) + if err != nil { + return nil, err + } + defer conn.Close() + state := conn.ConnectionState() + if len(state.PeerCertificates) == 0 { + return nil, fmt.Errorf("handshake returned empty peer chain") + } + return state.PeerCertificates[0], nil +} + +func getFreeTCPPort(t *testing.T) int { + t.Helper() + ln, err := net.Listen("tcp", "127.0.0.1:0") + if err != nil { + t.Fatalf("listen ephemeral: %v", err) + } + port := ln.Addr().(*net.TCPAddr).Port + _ = ln.Close() + return port +} + +func findWeedBinary(t *testing.T) string { + t.Helper() + candidates := []string{ + "../../weed/weed", + "../weed/weed", + "./weed", + } + for _, c := range candidates { + if _, err := os.Stat(c); err == nil { + abs, absErr := filepath.Abs(c) + if absErr == nil { + return abs + } + return c + } + } + if path, err := exec.LookPath("weed"); err == nil { + return path + } + t.Skip("weed binary not found — build with `cd weed && go build` first") + return "" +} + +// --- cert fixtures ------------------------------------------------------- + +// generateCA returns a self-signed CA cert and its private key. +func generateCA(t *testing.T) (*x509.Certificate, *ecdsa.PrivateKey) { + t.Helper() + key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader) + if err != nil { + t.Fatalf("gen CA key: %v", err) + } + tmpl := &x509.Certificate{ + SerialNumber: big.NewInt(1), + Subject: pkix.Name{CommonName: "tls-rotation-test-CA"}, + NotBefore: time.Now().Add(-time.Hour), + NotAfter: time.Now().Add(24 * time.Hour), + IsCA: true, + BasicConstraintsValid: true, + KeyUsage: x509.KeyUsageCertSign | x509.KeyUsageCRLSign, + } + der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key) + if err != nil { + t.Fatalf("create CA cert: %v", err) + } + parsed, err := x509.ParseCertificate(der) + if err != nil { + t.Fatalf("parse CA cert: %v", err) + } + return parsed, key +} + +// writeLeaf signs a new leaf cert with the given serial and writes it +// plus its key to the given paths via atomic rename — the pattern +// Kubernetes (cert-manager → Secret volume mount) produces in practice. +func writeLeaf(t *testing.T, certPath, keyPath string, ca *x509.Certificate, caKey *ecdsa.PrivateKey, serial *big.Int) { + t.Helper() + leafKey, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader) + if err != nil { + t.Fatalf("gen leaf key: %v", err) + } + tmpl := &x509.Certificate{ + SerialNumber: serial, + Subject: pkix.Name{ + CommonName: "localhost", + }, + NotBefore: time.Now().Add(-time.Hour), + NotAfter: time.Now().Add(24 * time.Hour), + KeyUsage: x509.KeyUsageDigitalSignature | x509.KeyUsageKeyEncipherment, + ExtKeyUsage: []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth, x509.ExtKeyUsageClientAuth}, + DNSNames: []string{"localhost"}, + IPAddresses: []net.IP{net.ParseIP("127.0.0.1"), net.ParseIP("::1")}, + } + der, err := x509.CreateCertificate(rand.Reader, tmpl, ca, &leafKey.PublicKey, caKey) + if err != nil { + t.Fatalf("create leaf cert: %v", err) + } + + atomicWritePEM(t, certPath, "CERTIFICATE", der) + + keyDER, err := x509.MarshalECPrivateKey(leafKey) + if err != nil { + t.Fatalf("marshal leaf key: %v", err) + } + atomicWritePEM(t, keyPath, "EC PRIVATE KEY", keyDER) +} + +// atomicWritePEM writes a PEM file via tempfile-in-same-directory plus +// rename, matching what kubelet does when it swaps the ..data symlink +// for a renewed Secret. Ensures the reader never sees a truncated file. +func atomicWritePEM(t *testing.T, path, blockType string, der []byte) { + t.Helper() + dir := filepath.Dir(path) + tmp, err := os.CreateTemp(dir, ".tls-*") + if err != nil { + t.Fatalf("create tempfile: %v", err) + } + ok := false + defer func() { + if !ok { + _ = os.Remove(tmp.Name()) + } + }() + if err := pem.Encode(tmp, &pem.Block{Type: blockType, Bytes: der}); err != nil { + tmp.Close() + t.Fatalf("pem encode: %v", err) + } + if err := tmp.Close(); err != nil { + t.Fatalf("close tempfile: %v", err) + } + if err := os.Chmod(tmp.Name(), 0o600); err != nil { + t.Fatalf("chmod tempfile: %v", err) + } + if err := os.Rename(tmp.Name(), path); err != nil { + t.Fatalf("rename tempfile onto %s: %v", path, err) + } + ok = true +} diff --git a/weed/command/admin.go b/weed/command/admin.go index 31ddd7a2c..e29e83983 100644 --- a/weed/command/admin.go +++ b/weed/command/admin.go @@ -4,6 +4,7 @@ import ( "bufio" "context" "crypto/rand" + "crypto/tls" "fmt" "log" "net" @@ -411,43 +412,57 @@ func startAdminServer(ctx context.Context, options AdminOptions, enableUI bool, Handler: handler, } + // Decide TLS configuration BEFORE launching the server goroutine, so a + // bad cert or a missing key surfaces as a startup error instead of a + // silently returned goroutine that leaves startAdminServer blocked on + // ctx.Done() with no listener. + var ( + clientCertFile, + certFile, + keyFile string + ) + useTLS := false + useMTLS := false + + if viper.GetString("https.admin.key") != "" { + useTLS = true + certFile = viper.GetString("https.admin.cert") + keyFile = viper.GetString("https.admin.key") + } + + if viper.GetString("https.admin.ca") != "" { + useMTLS = true + clientCertFile = viper.GetString("https.admin.ca") + } + + if useMTLS { + server.TLSConfig = security.LoadClientTLSHTTP(clientCertFile) + } + + if useTLS { + getCert, certProvider, certErr := security.NewReloadingServerCertificate(certFile, keyFile) + if certErr != nil { + return fmt.Errorf("load admin HTTPS certificate: %w", certErr) + } + defer certProvider.Close() + if server.TLSConfig == nil { + server.TLSConfig = &tls.Config{} + } + server.TLSConfig.GetCertificate = getCert + } + // Start server go func() { log.Printf("Starting SeaweedFS Admin Server on port %d", *options.port) - - // start http or https server with security.toml - var ( - clientCertFile, - certFile, - keyFile string - ) - useTLS := false - useMTLS := false - - if viper.GetString("https.admin.key") != "" { - useTLS = true - certFile = viper.GetString("https.admin.cert") - keyFile = viper.GetString("https.admin.key") - } - - if viper.GetString("https.admin.ca") != "" { - useMTLS = true - clientCertFile = viper.GetString("https.admin.ca") - } - - if useMTLS { - server.TLSConfig = security.LoadClientTLSHTTP(clientCertFile) - } - + var serveErr error if useTLS { log.Printf("Starting SeaweedFS Admin Server with TLS on port %d", *options.port) - err = server.ListenAndServeTLS(certFile, keyFile) + serveErr = server.ListenAndServeTLS("", "") } else { - err = server.ListenAndServe() + serveErr = server.ListenAndServe() } - - if err != nil && err != http.ErrServerClosed { - log.Printf("Failed to start server: %v", err) + if serveErr != nil && serveErr != http.ErrServerClosed { + log.Printf("Failed to start server: %v", serveErr) } }() diff --git a/weed/command/filer.go b/weed/command/filer.go index 0781aae0d..0e9f4fe6a 100644 --- a/weed/command/filer.go +++ b/weed/command/filer.go @@ -15,8 +15,6 @@ import ( "time" "github.com/spf13/viper" - "google.golang.org/grpc/credentials/tls/certprovider" - "google.golang.org/grpc/credentials/tls/certprovider/pemfile" "google.golang.org/grpc/reflection" "github.com/seaweedfs/seaweedfs/weed/credential" @@ -82,7 +80,6 @@ type FilerOptions struct { allowedOrigins *string exposeDirectoryData *bool tusBasePath *string - certProvider certprovider.Provider s3ConfigFile *string // optional path to static S3 identity config mountPeerRegistryEnable *bool // accept MountRegister/MountList RPCs (peer chunk sharing tier 1) // shutdownCtx, when non-nil, tells startFiler to gracefully shut down its @@ -315,15 +312,6 @@ func runFiler(cmd *Command, args []string) bool { return true } -// GetCertificateWithUpdate Auto refreshing TSL certificate -func (fo *FilerOptions) GetCertificateWithUpdate(*tls.ClientHelloInfo) (*tls.Certificate, error) { - certs, err := fo.certProvider.KeyMaterial(context.Background()) - if certs == nil { - return nil, err - } - return &certs.Certs[0], err -} - func (fo *FilerOptions) startFiler() { defaultMux := http.NewServeMux() @@ -500,14 +488,11 @@ func (fo *FilerOptions) startFiler() { caCertFile := viper.GetString("https.filer.ca") disbaleTlsVerifyClientCert := viper.GetBool("https.filer.disable_tls_verify_client_cert") - pemfileOptions := pemfile.Options{ - CertFile: certFile, - KeyFile: keyFile, - RefreshDuration: security.CredRefreshingInterval, - } - if fo.certProvider, err = pemfile.NewProvider(pemfileOptions); err != nil { - glog.Fatalf("pemfile.NewProvider(%v) failed: %v", pemfileOptions, err) + getCert, certProvider, err := security.NewReloadingServerCertificate(certFile, keyFile) + if err != nil { + glog.Fatalf("Filer failed to load HTTPS certificate: %v", err) } + defer certProvider.Close() caCertPool := x509.NewCertPool() if caCertFile != "" { @@ -524,7 +509,7 @@ func (fo *FilerOptions) startFiler() { } tlsConfig := &tls.Config{ - GetCertificate: fo.GetCertificateWithUpdate, + GetCertificate: getCert, ClientAuth: clientAuth, ClientCAs: caCertPool, } diff --git a/weed/command/master.go b/weed/command/master.go index 940aea680..bd4a0e6cc 100644 --- a/weed/command/master.go +++ b/weed/command/master.go @@ -328,7 +328,20 @@ func startMaster(masterOption MasterOptions, masterWhiteList []string) { } if useTLS { - go newHttpServer(r, tlsConfig).ServeTLS(masterListener, certFile, keyFile) + getCert, certProvider, err := security.NewReloadingServerCertificate(certFile, keyFile) + if err != nil { + glog.Fatalf("failed to load master HTTPS certificate: %v", err) + } + // Master runs ServeTLS in a goroutine and this function then blocks + // on shutdownCtx / select{}; tie the pem refresh goroutine to the + // existing interrupt hook instead of a local defer that would fire + // while the server is still running. + grace.OnInterrupt(certProvider.Close) + if tlsConfig == nil { + tlsConfig = &tls.Config{} + } + tlsConfig.GetCertificate = getCert + go newHttpServer(r, tlsConfig).ServeTLS(masterListener, "", "") } else { go newHttpServer(r, nil).Serve(masterListener) } diff --git a/weed/command/s3.go b/weed/command/s3.go index dd97eab46..89e916cad 100644 --- a/weed/command/s3.go +++ b/weed/command/s3.go @@ -15,8 +15,6 @@ import ( "time" "github.com/gorilla/mux" - "google.golang.org/grpc/credentials/tls/certprovider" - "google.golang.org/grpc/credentials/tls/certprovider/pemfile" "google.golang.org/grpc/reflection" "github.com/seaweedfs/seaweedfs/weed/glog" @@ -62,7 +60,6 @@ type S3Options struct { localFilerSocket *string dataCenter *string localSocket *string - certProvider certprovider.Provider idleTimeout *int concurrentUploadLimitMB *int concurrentFileUploadLimit *int @@ -228,15 +225,6 @@ func runS3(cmd *Command, args []string) bool { } -// GetCertificateWithUpdate Auto refreshing TSL certificate -func (s3opt *S3Options) GetCertificateWithUpdate(*tls.ClientHelloInfo) (*tls.Certificate, error) { - certs, err := s3opt.certProvider.KeyMaterial(context.Background()) - if certs == nil { - return nil, err - } - return &certs.Certs[0], err -} - // resolveExternalUrl returns the external URL from the flag or falls back to the S3_EXTERNAL_URL env var. func (s3opt *S3Options) resolveExternalUrl() string { if s3opt.externalUrl != nil && *s3opt.externalUrl != "" { @@ -415,14 +403,11 @@ func (s3opt *S3Options) startS3Server() bool { glog.Fatalf("S3 API Server error: -s3.port.https (%d) cannot be the same as -s3.port (%d)", *s3opt.portHttps, *s3opt.port) } - pemfileOptions := pemfile.Options{ - CertFile: *s3opt.tlsCertificate, - KeyFile: *s3opt.tlsPrivateKey, - RefreshDuration: security.CredRefreshingInterval, - } - if s3opt.certProvider, err = pemfile.NewProvider(pemfileOptions); err != nil { - glog.Fatalf("pemfile.NewProvider(%v) failed: %v", pemfileOptions, err) + getCert, certProvider, err := security.NewReloadingServerCertificate(*s3opt.tlsCertificate, *s3opt.tlsPrivateKey) + if err != nil { + glog.Fatalf("S3 API Server failed to load HTTPS certificate: %v", err) } + grace.OnInterrupt(certProvider.Close) caCertPool := x509.NewCertPool() if *s3opt.tlsCACertificate != "" { @@ -440,7 +425,7 @@ func (s3opt *S3Options) startS3Server() bool { } tlsConfig := &tls.Config{ - GetCertificate: s3opt.GetCertificateWithUpdate, + GetCertificate: getCert, ClientAuth: clientAuth, ClientCAs: caCertPool, } diff --git a/weed/command/volume.go b/weed/command/volume.go index f5366964a..137b4b747 100644 --- a/weed/command/volume.go +++ b/weed/command/volume.go @@ -2,6 +2,7 @@ package command import ( "context" + "crypto/tls" "fmt" "net/http" httppprof "net/http/pprof" @@ -312,7 +313,7 @@ func (v VolumeServerOptions) startVolumeServer(volumeFolders, maxVolumeCounts, v } // starting the cluster http server - clusterHttpServer := v.startClusterHttpService(volumeMux) + clusterHttpServer, closeCert := v.startClusterHttpService(volumeMux) grace.OnReload(volumeServer.LoadNewVolumes) grace.OnReload(volumeServer.Reload) @@ -329,6 +330,9 @@ func (v VolumeServerOptions) startVolumeServer(volumeFolders, maxVolumeCounts, v } shutdown(publicHttpDown, clusterHttpServer, grpcS, volumeServer) + if closeCert != nil { + closeCert() + } stopChan <- true }) @@ -337,6 +341,9 @@ func (v VolumeServerOptions) startVolumeServer(volumeFolders, maxVolumeCounts, v case <-stopChan: case <-v.shutdownCtx.Done(): shutdown(publicHttpDown, clusterHttpServer, grpcS, volumeServer) + if closeCert != nil { + closeCert() + } } } else { select { @@ -442,7 +449,13 @@ func (v VolumeServerOptions) startPublicHttpService(handler http.Handler) httpdo return publicHttpDown } -func (v VolumeServerOptions) startClusterHttpService(handler http.Handler) httpdown.Server { +// startClusterHttpService starts the volume cluster HTTP server and +// returns it along with a close func for the cert reloader's refresh +// goroutine (nil when HTTPS is disabled). The caller is responsible +// for invoking the close func on every shutdown path — both the +// SIGTERM/grace.OnInterrupt path and the shutdownCtx path used by +// mini/integration tests. +func (v VolumeServerOptions) startClusterHttpService(handler http.Handler) (httpdown.Server, func()) { var ( certFile, keyFile string ) @@ -461,8 +474,7 @@ func (v VolumeServerOptions) startClusterHttpService(handler http.Handler) httpd httpDown := httpdown.HTTP{ KillTimeout: time.Minute, StopTimeout: 30 * time.Second, - CertFile: certFile, - KeyFile: keyFile} + } httpS := &http.Server{Handler: handler} if viper.GetString("https.volume.ca") != "" { @@ -471,11 +483,24 @@ func (v VolumeServerOptions) startClusterHttpService(handler http.Handler) httpd security.FixTlsConfig(util.GetViper(), httpS.TLSConfig) } + var closeCert func() + if certFile != "" && keyFile != "" { + getCert, certProvider, err := security.NewReloadingServerCertificate(certFile, keyFile) + if err != nil { + glog.Fatalf("Volume server failed to load TLS certificate: %v", err) + } + closeCert = certProvider.Close + if httpS.TLSConfig == nil { + httpS.TLSConfig = &tls.Config{} + } + httpS.TLSConfig.GetCertificate = getCert + } + clusterHttpServer := httpDown.Serve(httpS, listener) go func() { if e := clusterHttpServer.Wait(); e != nil { glog.Fatalf("Volume server fail to serve: %v", e) } }() - return clusterHttpServer + return clusterHttpServer, closeCert } diff --git a/weed/command/webdav.go b/weed/command/webdav.go index 6332832d9..050e194ed 100644 --- a/weed/command/webdav.go +++ b/weed/command/webdav.go @@ -2,6 +2,7 @@ package command import ( "context" + "crypto/tls" "fmt" "net/http" "os" @@ -150,7 +151,13 @@ func (wo *WebDavOption) startWebDav() bool { if *wo.tlsPrivateKey != "" { glog.V(0).Infof("Start Seaweed WebDav Server %s at https %s", version.Version(), listenAddress) - if err = httpS.ServeTLS(webDavListener, *wo.tlsCertificate, *wo.tlsPrivateKey); err != nil && err != http.ErrServerClosed { + getCert, certProvider, err := security.NewReloadingServerCertificate(*wo.tlsCertificate, *wo.tlsPrivateKey) + if err != nil { + glog.Fatalf("WebDav Server failed to load TLS certificate: %v", err) + } + defer certProvider.Close() + httpS.TLSConfig = &tls.Config{GetCertificate: getCert} + if err = httpS.ServeTLS(webDavListener, "", ""); err != nil && err != http.ErrServerClosed { glog.Fatalf("WebDav Server Fail to serve: %v", err) } } else { diff --git a/weed/security/certreload/certreload.go b/weed/security/certreload/certreload.go new file mode 100644 index 000000000..c0288198e --- /dev/null +++ b/weed/security/certreload/certreload.go @@ -0,0 +1,120 @@ +// Package certreload wraps grpc's pemfile.Provider so both TLS servers +// (weed/security) and TLS clients (weed/util/http/client) can share one +// reloading cert implementation without an import cycle between them. +// +// Lives in its own subpackage because weed/security already imports +// weed/util/http/client (for LoadHTTPClientFromFile). +package certreload + +import ( + "context" + "crypto/tls" + "fmt" + "os" + "time" + + "google.golang.org/grpc/credentials/tls/certprovider" + "google.golang.org/grpc/credentials/tls/certprovider/pemfile" +) + +// RefreshIntervalEnv names an environment variable that overrides the +// refresh cadence. Accepts any time.ParseDuration value (e.g. "30m", +// "500ms"). Primarily a hook for integration tests that need rotation +// to complete in seconds, but also useful in production when paired +// with short-lived certs (e.g. Vault-issued). +const RefreshIntervalEnv = "WEED_TLS_CERT_REFRESH_INTERVAL" + +// DefaultRefreshInterval is the cadence at which the pemfile provider +// stats cert/key files on disk. It re-parses only when mtime/contents +// change, so the hot path (KeyMaterial() on each TLS handshake) stays +// cheap. +// +// 5 hours matches the prior constant used for gRPC mTLS. Resolved once +// at process start from RefreshIntervalEnv if set. +var DefaultRefreshInterval = resolveRefreshInterval(5 * time.Hour) + +func resolveRefreshInterval(fallback time.Duration) time.Duration { + if s := os.Getenv(RefreshIntervalEnv); s != "" { + if d, err := time.ParseDuration(s); err == nil && d > 0 { + return d + } + } + return fallback +} + +// NewServerGetCertificate returns a callback suitable for +// tls.Config.GetCertificate. It reloads certFile and keyFile from disk +// on each refresh tick so rotated certs (e.g. from k8s cert-manager) +// are picked up without a restart. Caller should Close() the returned +// provider at shutdown. +func NewServerGetCertificate(certFile, keyFile string) (func(*tls.ClientHelloInfo) (*tls.Certificate, error), certprovider.Provider, error) { + return newServerGetCertificate(certFile, keyFile, DefaultRefreshInterval) +} + +func newServerGetCertificate(certFile, keyFile string, refresh time.Duration) (func(*tls.ClientHelloInfo) (*tls.Certificate, error), certprovider.Provider, error) { + provider, err := newProvider(certFile, keyFile, refresh) + if err != nil { + return nil, nil, err + } + get := func(hello *tls.ClientHelloInfo) (*tls.Certificate, error) { + // KeyMaterial blocks until the pemfile provider has read the files + // for the first time. Use the handshake context so a stuck read + // bounds the handshake instead of hanging it forever. + ctx := context.Background() + if hello != nil { + ctx = hello.Context() + } + return current(ctx, provider, certFile) + } + return get, provider, nil +} + +// NewClientGetCertificate returns a callback suitable for +// tls.Config.GetClientCertificate. Fires per TLS handshake, so long-lived +// HTTPS clients (FUSE mount, backup, filer→volume, etc.) pick up rotated +// client mTLS certs as pooled connections recycle. +func NewClientGetCertificate(certFile, keyFile string) (func(*tls.CertificateRequestInfo) (*tls.Certificate, error), certprovider.Provider, error) { + return newClientGetCertificate(certFile, keyFile, DefaultRefreshInterval) +} + +func newClientGetCertificate(certFile, keyFile string, refresh time.Duration) (func(*tls.CertificateRequestInfo) (*tls.Certificate, error), certprovider.Provider, error) { + provider, err := newProvider(certFile, keyFile, refresh) + if err != nil { + return nil, nil, err + } + get := func(cri *tls.CertificateRequestInfo) (*tls.Certificate, error) { + ctx := context.Background() + if cri != nil { + ctx = cri.Context() + } + return current(ctx, provider, certFile) + } + return get, provider, nil +} + +func newProvider(certFile, keyFile string, refresh time.Duration) (certprovider.Provider, error) { + if certFile == "" || keyFile == "" { + return nil, fmt.Errorf("both certFile and keyFile are required") + } + provider, err := pemfile.NewProvider(pemfile.Options{ + CertFile: certFile, + KeyFile: keyFile, + RefreshDuration: refresh, + }) + if err != nil { + return nil, fmt.Errorf("pemfile.NewProvider: %w", err) + } + return provider, nil +} + +func current(ctx context.Context, provider certprovider.Provider, certFile string) (*tls.Certificate, error) { + km, err := provider.KeyMaterial(ctx) + if err != nil { + return nil, err + } + if km == nil || len(km.Certs) == 0 { + return nil, fmt.Errorf("no TLS key material available for %s", certFile) + } + cert := km.Certs[0] + return &cert, nil +} diff --git a/weed/security/certreload/certreload_test.go b/weed/security/certreload/certreload_test.go new file mode 100644 index 000000000..12399d5b0 --- /dev/null +++ b/weed/security/certreload/certreload_test.go @@ -0,0 +1,122 @@ +package certreload + +import ( + "crypto/ecdsa" + "crypto/elliptic" + "crypto/rand" + "crypto/x509" + "crypto/x509/pkix" + "encoding/pem" + "math/big" + "os" + "path/filepath" + "testing" + "time" +) + +// TestServerGetCertificatePicksUpNewFile writes an initial cert/key +// pair, starts the reloader, then overwrites the files with a new pair +// and asserts that the callback eventually returns the new certificate. +func TestServerGetCertificatePicksUpNewFile(t *testing.T) { + dir := t.TempDir() + certPath := filepath.Join(dir, "tls.crt") + keyPath := filepath.Join(dir, "tls.key") + + if err := writeSelfSigned(certPath, keyPath, "first"); err != nil { + t.Fatalf("write first cert: %v", err) + } + + // Use a short refresh window so the test doesn't wait 5h. + getCert, provider, err := newServerGetCertificate(certPath, keyPath, 100*time.Millisecond) + if err != nil { + t.Fatalf("newServerGetCertificate: %v", err) + } + defer provider.Close() + + c1, err := getCert(nil) + if err != nil { + t.Fatalf("initial getCert: %v", err) + } + first, err := x509.ParseCertificate(c1.Certificate[0]) + if err != nil { + t.Fatalf("parse first cert: %v", err) + } + if first.Subject.CommonName != "first" { + t.Fatalf("want CN=first, got %q", first.Subject.CommonName) + } + + // Rotate the files. pemfile watches mtime; sleep so the new files have + // a later mtime even on filesystems with 1s granularity. + time.Sleep(1100 * time.Millisecond) + if err := writeSelfSigned(certPath, keyPath, "second"); err != nil { + t.Fatalf("write second cert: %v", err) + } + + // The pemfile poller can briefly observe a mismatched cert/key pair + // during rotation (cert written before key). Tolerate transient errors + // and only fail at the deadline; keep the last error for diagnostics. + var lastErr error + deadline := time.Now().Add(5 * time.Second) + for time.Now().Before(deadline) { + c2, err := getCert(nil) + if err != nil { + lastErr = err + time.Sleep(100 * time.Millisecond) + continue + } + parsed, err := x509.ParseCertificate(c2.Certificate[0]) + if err != nil { + lastErr = err + time.Sleep(100 * time.Millisecond) + continue + } + if parsed.Subject.CommonName == "second" { + return + } + time.Sleep(100 * time.Millisecond) + } + t.Fatalf("reloader did not pick up rotated cert within timeout; last error: %v", lastErr) +} + +func writeSelfSigned(certPath, keyPath, commonName string) error { + key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader) + if err != nil { + return err + } + tmpl := &x509.Certificate{ + SerialNumber: big.NewInt(time.Now().UnixNano()), + Subject: pkix.Name{CommonName: commonName}, + NotBefore: time.Now().Add(-time.Hour), + NotAfter: time.Now().Add(time.Hour), + KeyUsage: x509.KeyUsageDigitalSignature, + ExtKeyUsage: []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth}, + } + der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key) + if err != nil { + return err + } + certOut, err := os.Create(certPath) + if err != nil { + return err + } + if err := pem.Encode(certOut, &pem.Block{Type: "CERTIFICATE", Bytes: der}); err != nil { + certOut.Close() + return err + } + if err := certOut.Close(); err != nil { + return err + } + keyDER, err := x509.MarshalECPrivateKey(key) + if err != nil { + return err + } + keyOut, err := os.Create(keyPath) + if err != nil { + return err + } + if err := pem.Encode(keyOut, &pem.Block{Type: "EC PRIVATE KEY", Bytes: keyDER}); err != nil { + keyOut.Close() + return err + } + return keyOut.Close() +} diff --git a/weed/security/tls.go b/weed/security/tls.go index 2e2af94b3..07f5d5cba 100644 --- a/weed/security/tls.go +++ b/weed/security/tls.go @@ -10,11 +10,11 @@ import ( "path/filepath" "slices" "strings" - "time" "github.com/spf13/viper" "github.com/seaweedfs/seaweedfs/weed/glog" + "github.com/seaweedfs/seaweedfs/weed/security/certreload" "github.com/seaweedfs/seaweedfs/weed/util" util_http_client "github.com/seaweedfs/seaweedfs/weed/util/http/client" "google.golang.org/grpc" @@ -24,7 +24,11 @@ import ( "google.golang.org/grpc/security/advancedtls" ) -const CredRefreshingInterval = time.Duration(5) * time.Hour +// CredRefreshingInterval is the refresh cadence for gRPC mTLS certs. +// Shares its source of truth with certreload.DefaultRefreshInterval so +// a single WEED_TLS_CERT_REFRESH_INTERVAL env var tunes both gRPC and +// HTTPS cert reload. +var CredRefreshingInterval = certreload.DefaultRefreshInterval type Authenticator struct { AllowedWildcardDomain string diff --git a/weed/security/tls_reload.go b/weed/security/tls_reload.go new file mode 100644 index 000000000..aa25075e4 --- /dev/null +++ b/weed/security/tls_reload.go @@ -0,0 +1,20 @@ +package security + +import ( + "crypto/tls" + + "github.com/seaweedfs/seaweedfs/weed/security/certreload" + "google.golang.org/grpc/credentials/tls/certprovider" +) + +// NewReloadingServerCertificate returns a GetCertificate callback for +// tls.Config.GetCertificate, backed by a refreshing pem provider so +// rotated certs (e.g. from Kubernetes cert-manager) are picked up +// without a restart. +// +// Thin wrapper over certreload; the shared implementation lives in the +// subpackage so both server TLS (this package) and HTTP client TLS +// (weed/util/http/client) can use it without an import cycle. +func NewReloadingServerCertificate(certFile, keyFile string) (func(*tls.ClientHelloInfo) (*tls.Certificate, error), certprovider.Provider, error) { + return certreload.NewServerGetCertificate(certFile, keyFile) +} diff --git a/weed/util/http/client/http_client.go b/weed/util/http/client/http_client.go index 3291c56a2..ebc938ad3 100644 --- a/weed/util/http/client/http_client.go +++ b/weed/util/http/client/http_client.go @@ -11,6 +11,9 @@ import ( "strings" "sync" + "google.golang.org/grpc/credentials/tls/certprovider" + + "github.com/seaweedfs/seaweedfs/weed/security/certreload" util "github.com/seaweedfs/seaweedfs/weed/util" "github.com/spf13/viper" ) @@ -23,6 +26,25 @@ type HTTPClient struct { Client *http.Client Transport *http.Transport expectHttpsScheme bool + // certProvider, when non-nil, owns a background refresh goroutine for + // the client mTLS cert/key pair. Close() must be called to stop it. + certProvider certprovider.Provider +} + +// Close stops any background cert refresh goroutine. Safe to call on a +// client that was constructed without mTLS. Existing pooled connections +// are also closed via CloseIdleConnections. +func (httpClient *HTTPClient) Close() { + if httpClient == nil { + return + } + if httpClient.certProvider != nil { + httpClient.certProvider.Close() + httpClient.certProvider = nil + } + if httpClient.Client != nil { + httpClient.Client.CloseIdleConnections() + } } func (httpClient *HTTPClient) Do(req *http.Request) (*http.Response, error) { @@ -100,7 +122,7 @@ func NewHttpClient(clientName ClientName, opts ...HttpClientOpt) (*HTTPClient, e var tlsConfig *tls.Config = nil if httpClient.expectHttpsScheme { - clientCertPair, err := getClientCertPair(clientName) + certFileName, keyFileName, hasClientCert, err := clientCertPaths(clientName) if err != nil { return nil, err } @@ -110,20 +132,24 @@ func NewHttpClient(clientName ClientName, opts ...HttpClientOpt) (*HTTPClient, e return nil, err } - if clientCertPair != nil || len(clientCaCert) != 0 { + if hasClientCert || len(clientCaCert) != 0 { caCertPool, err := createHTTPClientCertPool(clientCaCert, clientCaCertName) if err != nil { return nil, err } tlsConfig = &tls.Config{ - Certificates: []tls.Certificate{}, RootCAs: caCertPool, InsecureSkipVerify: false, } - if clientCertPair != nil { - tlsConfig.Certificates = append(tlsConfig.Certificates, *clientCertPair) + if hasClientCert { + getClientCert, provider, err := certreload.NewClientGetCertificate(certFileName, keyFileName) + if err != nil { + return nil, fmt.Errorf("error loading client certificate and key: %s", err) + } + tlsConfig.GetClientCertificate = getClientCert + httpClient.certProvider = provider } } @@ -175,20 +201,20 @@ func getFileContentFromSecurityConfiguration(clientName ClientName, fileType str return nil, "", nil } -func getClientCertPair(clientName ClientName) (*tls.Certificate, error) { - certFileName := getStringOptionFromSecurityConfiguration(clientName, "cert") - keyFileName := getStringOptionFromSecurityConfiguration(clientName, "key") - if certFileName == "" && keyFileName == "" { - return nil, nil +// clientCertPaths reads the https..{cert,key} paths from the +// security config, validates they're either both set or both empty, and +// returns them along with a hasClientCert flag. Loading is deferred to +// certreload so the cert/key pair is picked up from disk on rotation. +func clientCertPaths(clientName ClientName) (certFile, keyFile string, hasClientCert bool, err error) { + certFile = getStringOptionFromSecurityConfiguration(clientName, "cert") + keyFile = getStringOptionFromSecurityConfiguration(clientName, "key") + if certFile == "" && keyFile == "" { + return "", "", false, nil } - if certFileName != "" && keyFileName != "" { - clientCert, err := tls.LoadX509KeyPair(certFileName, keyFileName) - if err != nil { - return nil, fmt.Errorf("error loading client certificate and key: %s", err) - } - return &clientCert, nil + if certFile == "" || keyFile == "" { + return "", "", false, fmt.Errorf("https.%s: both cert and key must be set (got cert=%q key=%q)", clientName.LowerCaseString(), certFile, keyFile) } - return nil, fmt.Errorf("error loading key pair: key `%s` and certificate `%s`", keyFileName, certFileName) + return certFile, keyFile, true, nil } func getClientCaCert(clientName ClientName) ([]byte, string, error) { @@ -208,35 +234,44 @@ func NewHttpClientWithTLS(certFile, keyFile, caFile string, insecureSkipVerify b return nil, fmt.Errorf("both cert and key are required for mTLS, got cert=%q key=%q", certFile, keyFile) } - var clientCert *tls.Certificate + var getClientCert func(*tls.CertificateRequestInfo) (*tls.Certificate, error) if certFile != "" && keyFile != "" { - cert, err := tls.LoadX509KeyPair(certFile, keyFile) + cb, provider, err := certreload.NewClientGetCertificate(certFile, keyFile) if err != nil { return nil, fmt.Errorf("error loading client certificate and key: %s", err) } - clientCert = &cert + getClientCert = cb + httpClient.certProvider = provider + } + // closeProviderOnError ensures the cert reloader's background refresh + // goroutine is shut down if any subsequent step fails before we hand + // the client back to the caller. + closeProviderOnError := func() { + if httpClient.certProvider != nil { + httpClient.certProvider.Close() + httpClient.certProvider = nil + } } var caCertPool *x509.CertPool if caFile != "" { caCert, err := os.ReadFile(caFile) if err != nil { + closeProviderOnError() return nil, fmt.Errorf("error reading CA cert %s: %s", caFile, err) } caCertPool, err = createHTTPClientCertPool(caCert, caFile) if err != nil { + closeProviderOnError() return nil, err } } - if clientCert != nil || caCertPool != nil || insecureSkipVerify { + if getClientCert != nil || caCertPool != nil || insecureSkipVerify { tlsConfig = &tls.Config{ - Certificates: []tls.Certificate{}, - RootCAs: caCertPool, - InsecureSkipVerify: insecureSkipVerify, - } - if clientCert != nil { - tlsConfig.Certificates = append(tlsConfig.Certificates, *clientCert) + GetClientCertificate: getClientCert, + RootCAs: caCertPool, + InsecureSkipVerify: insecureSkipVerify, } } diff --git a/weed/util/httpdown/http_down.go b/weed/util/httpdown/http_down.go index 5cbd9611c..8981aaefc 100644 --- a/weed/util/httpdown/http_down.go +++ b/weed/util/httpdown/http_down.go @@ -289,10 +289,15 @@ func (s *server) manage() { func (s *server) serve() { stats.BumpSum(s.stats, "serve", 1) - if s.certFile == "" && s.keyFile == "" { - s.serveErr <- s.server.Serve(s.listener) - } else { + switch { + case s.certFile != "" || s.keyFile != "": s.serveErr <- s.server.ServeTLS(s.listener, s.certFile, s.keyFile) + case s.server.TLSConfig != nil && (len(s.server.TLSConfig.Certificates) > 0 || s.server.TLSConfig.GetCertificate != nil): + // TLSConfig carries the certificate source (e.g. a hot-reloading + // GetCertificate callback). Pass empty file args so ServeTLS uses it. + s.serveErr <- s.server.ServeTLS(s.listener, "", "") + default: + s.serveErr <- s.server.Serve(s.listener) } close(s.serveDone) close(s.serveErr)