seaweedfs

mirror of https://github.com/seaweedfs/seaweedfs.git synced 2026-06-13 23:36:45 +03:00

Author	SHA1	Message	Date
Chris Lu	79a48256f5	fix(s3): populate s3:prefix from query param for ListObjects policy conditions (#8971 ) * fix(s3): populate s3:prefix from query param for ListObjects policy conditions (#8969) ListObjectsV2/V1 requests with prefix-restricted STS session policies were denied because: 1. s3:prefix was derived from objectKey, which the auth middleware set to the prefix value, but the resource ARN then included the prefix (e.g. arn:aws:s3:::bucket/prefix) instead of staying at bucket level (arn:aws:s3:::bucket) as AWS requires for ListBucket. 2. When objectKey was empty (no middleware propagation), s3:prefix was never populated from the query parameter at all. Now AuthorizeAction extracts the prefix query parameter directly, sets it as s3:prefix in the request context, and uses a bucket-level resource ARN when the objectKey matches the propagated prefix. * fix(s3): use AWS-style wildcard matching for StringLike policy conditions filepath.Match treats * as not matching /, which breaks IAM StringLike conditions on paths (e.g. arn:aws:s3:::bucket/* won't match nested keys). Replace with a case-sensitive variant of AwsWildcardMatch that correctly treats * as matching any character including /. * refactor(s3): replace regex wildcard matching with string-based matcher Use the existing wildcard.MatchesWildcard utility instead of compiling and caching regexes for IAM wildcard matching. Removes the regexCache, its mutex, and the sync import. * refactor(s3): inline and remove AwsWildcardMatch wrapper functions Replace all call sites with direct wildcard.MatchesWildcard calls. * fix(s3): scope s3:prefix condition key to list operations only The s3:prefix logic was running for all actions, so a GetObject on "foo/bar" would wrongly populate s3:prefix. Restrict it to action "List" and always reset resourceObjectKey to "" so the resource ARN stays at bucket level. 
Also set s3:prefix to "" when no prefix is provided, so policies with StringEquals {"s3:prefix": ""} evaluate correctly.	2026-04-07 13:21:30 -07:00
Chris Lu	a4753b6a3b	S3: delay empty folder cleanup to prevent Spark write failures (#8970 ) * S3: delay empty folder cleanup to prevent Spark write failures (#8963) Empty folders were being cleaned up within seconds, causing Apache Spark (s3a) writes to fail when temporary directories like _temporary/0/task_xxx/ were briefly empty. - Increase default cleanup delay from 5s to 2 minutes - Only process queue items that have individually aged past the delay (previously the entire queue was drained once any item triggered) - Make the delay configurable via filer.toml: [filer.options] s3.empty_folder_cleanup_delay = "2m" * test: increase cleanup wait timeout to match 2m delay The empty folder cleanup delay was increased to 2 minutes, so the Spark integration test needs to wait longer for temporary directories to disappear. * fix: eagerly clean parent directories after empty folder deletion After deleting an empty folder, immediately try to clean its parent rather than relying on cascading metadata events that each re-enter the 2-minute delay queue. This prevents multi-minute waits when cleaning nested temporary directory trees (e.g. Spark's _temporary hierarchy with 3+ levels would take 6m+ vs near-instant). Fixes the CI failure where lingering _temporary parent directories were not cleaned within the test's 3-minute timeout.	2026-04-07 13:20:59 -07:00
Chris Lu	761ec7da00	fix(iceberg): use dot separator for namespace paths instead of unit separator (#8960) * fix(iceberg): use dot separator for namespace paths instead of unit separator The Iceberg REST Catalog handler was using \x1F (unit separator) to join multi-level namespaces when constructing S3 location and filer paths. The S3 Tables storage layer uses "." (dot) as the namespace separator, causing tables created via the Iceberg REST API to point to different paths than where S3 Tables actually stores them. Fixes #8959 * fix(iceberg): use dot separator in log messages for readable namespace output * fix(iceberg): use path.Join for S3 location path segments Use path.Join to construct the namespace/table path segments in fallback S3 locations for robustness and consistency with handleCreateTable. * test(iceberg): add multi-level namespace integration tests for Spark and Trino Add regression tests for #8959 that create a two-level namespace (e.g. "analytics.daily"), create a table under it, insert data, and query it back. This exercises the dot-separated namespace path construction and verifies that Spark/Trino can actually read the data at the S3 location returned by the Iceberg REST API. * fix(test): enable nested namespace in Trino Iceberg catalog config Trino requires `iceberg.rest-catalog.nested-namespace-enabled=true` to support multi-level namespaces. Without this, CREATE SCHEMA with a dotted name fails with "Nested namespace is not enabled for this catalog". * fix(test): parse Trino COUNT(*) output as integer instead of substring match Avoids false matches from strings.Contains(output, "3") by parsing the actual numeric result with strconv.Atoi and asserting equality. * fix(test): use separate Trino config for nested namespace test The nested-namespace-enabled=true setting in Trino changes how SHOW SCHEMAS works, causing "Internal error" for all tests sharing that catalog config. 
Move the flag to a dedicated config used only by TestTrinoMultiLevelNamespace. * fix(iceberg): support parent query parameter in ListNamespaces for nested namespaces Add handling for the Iceberg REST spec's `parent` query parameter in handleListNamespaces. When Trino has nested-namespace-enabled=true, it sends `GET /v1/namespaces?parent=<ns>` to list child namespaces. The parent value is decoded from the Iceberg unit separator format and converted to a dot-separated prefix for the S3 Tables layer. Also simplify TestTrinoMultiLevelNamespace to focus on namespace operations (create, list, show tables) rather than data operations, since Trino's REST catalog has a non-empty location check that conflicts with server-side metadata creation. * fix(test): expand Trino multi-level namespace test and merge config helpers - Expand TestTrinoMultiLevelNamespace to create a table with explicit location, insert rows, query them back, and verify the S3 file path contains the dot-separated namespace (not \x1F). This ensures the original #8959 bug would be caught by the Trino integration test. - Merge writeTrinoConfig and writeTrinoNestedNamespaceConfig into a single parameterized function using functional options.	2026-04-07 12:21:22 -07:00
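The separator mismatch behind #8959 can be shown in a few lines. A minimal sketch, assuming hypothetical helpers `namespaceToDotted` and `tableLocation` and an illustrative bucket layout; the real handler code is not part of this log.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// unitSeparator is the byte the Iceberg REST API uses between namespace levels.
const unitSeparator = "\x1f"

// namespaceToDotted converts a namespace encoded with the unit separator
// (e.g. the decoded value of a `parent` query parameter) into the
// dot-separated form that the commit message says the S3 Tables layer uses.
// Hypothetical helper for illustration.
func namespaceToDotted(encoded string) string {
	return strings.Join(strings.Split(encoded, unitSeparator), ".")
}

// tableLocation builds an S3 location from its segments with path.Join.
// The bucket/namespace/table layout here is an assumption.
func tableLocation(bucket, encodedNamespace, table string) string {
	return "s3://" + path.Join(bucket, namespaceToDotted(encodedNamespace), table)
}

func main() {
	ns := "analytics" + unitSeparator + "daily"
	fmt.Println(namespaceToDotted(ns))
	fmt.Println(tableLocation("warehouse", ns, "events"))
}
```

Joining with `\x1F` on one side and `.` on the other yields two different paths for the same table, which is why data written through one API was not visible through the other.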
Chris Lu	d4548376a1	fix(ec): off-by-one in nLargeBlockRows causes EC read corruption (#8957 ) * fix(ec): off-by-one in nLargeBlockRows causes EC read corruption (#8947) The nLargeBlockRows formula in locateOffset used (shardDatSize-1)/largeBlockLength, which produces an off-by-one error when shardDatSize is an exact multiple of largeBlockLength (e.g. a 30GB volume with 10 data shards = 3GB per shard). This causes needles in the last large block row to be mislocated as small blocks, reading from completely wrong shard positions and returning garbage data. Fix: remove the -1 from locateOffset and only apply it in the ecdFileSize fallback path (old volumes without datFileSize in .vif), where it's needed to handle the ambiguous case conservatively. Also fix ReadEcShardNeedle to pass offset=0 to ReadBytes, consistent with the scrub path, since the bytes buffer already starts at position 0. * fix: add volume context to EC read errors, remove contextless glog The glog.Errorf in ReadBytes logged "entry not found" without any volume ID, making it impossible to identify which volume was affected. Remove this contextless log and instead add volume ID, needle ID, offset, and size to the error returned from the EC read path. The EC scrub callers already wrap errors with volume context.	2026-04-07 12:02:51 -07:00
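The off-by-one is easiest to see as plain integer arithmetic. A minimal sketch, assuming a 1 GiB large block and hypothetical function names; it reproduces only the formula quoted in the commit message, not the surrounding `locateOffset` logic.

```go
package main

import "fmt"

// largeBlockLength is assumed to be 1 GiB for this illustration.
const largeBlockLength = int64(1) << 30

// rowsWithMinusOne is the formula the commit message describes as buggy
// inside locateOffset.
func rowsWithMinusOne(shardDatSize int64) int64 {
	return (shardDatSize - 1) / largeBlockLength
}

// rowsExact is the formula after removing the -1.
func rowsExact(shardDatSize int64) int64 {
	return shardDatSize / largeBlockLength
}

func main() {
	// 30 GiB volume with 10 data shards: exactly 3 GiB per shard.
	shard := 3 * largeBlockLength
	fmt.Printf("buggy=%d fixed=%d\n", rowsWithMinusOne(shard), rowsExact(shard))
	// prints: buggy=2 fixed=3
}
```

The two formulas disagree only when the shard size is an exact multiple of the large block length, so the last large block row is the one that gets misclassified as small blocks.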
Chris Lu	45bf3ad058	shell: add s3.user.* and s3.policy.attach|detach commands (#8954) * shell: add s3.user.* and s3.policy.attach|detach commands Add focused IAM shell commands following a noun-verb model: - s3.user.create: create user with auto-generated or explicit credentials - s3.user.list: tabular listing with status, policies, key count - s3.user.show: detailed user view (status, source, policies, credentials) - s3.user.delete: delete a user - s3.user.enable: enable a disabled user - s3.user.disable: disable a user (preserves credentials and policies) - s3.policy.attach: attach a named policy to a user - s3.policy.detach: detach a policy from a user These commands are thin wrappers over the existing IAM gRPC service, producing human-readable output instead of raw protobuf text. This is part of a larger effort to replace the monolithic s3.configure command with a composable set of single-purpose commands. * shell: address review feedback for s3.user.* and s3.policy.attach|detach - Return flag parse errors instead of swallowing them (all commands) - Use GetConfiguration instead of N+1 GetUser calls in s3.user.list - Add nil check for resp.Identity in s3.user.show - Fix GetPolicy error masking in s3.policy.attach (wrap original error) - Simplify joinMax using strings.Join * shell: add nil identity guards and wrap gRPC errors - Add nil check for resp.Identity in policy_attach, policy_detach, user_enable, user_disable - Wrap GetUser errors with user context for better diagnostics	2026-04-07 11:26:57 -07:00
Chris Lu	d123a2768b	shell: add s3.accesskey.*, s3.anonymous.*, s3.serviceaccount.* commands (#8955) * shell: add s3.accesskey.*, s3.anonymous.*, s3.serviceaccount.* commands Add credential, anonymous access, and service account management commands: Access key commands: - s3.accesskey.create: add credentials to an existing user - s3.accesskey.list: list access keys for a user (key ID + status) - s3.accesskey.delete: remove a specific access key - s3.accesskey.rotate: atomic create-new + delete-old key rotation Anonymous access commands: - s3.anonymous.set: set/remove public access on a bucket - s3.anonymous.get: show anonymous access for a bucket - s3.anonymous.list: list all buckets with anonymous access Service account commands: - s3.serviceaccount.create: create with optional action subset and expiry - s3.serviceaccount.list: tabular listing, optionally filtered by parent - s3.serviceaccount.show: detailed view of a service account - s3.serviceaccount.delete: remove a service account These replace the credential and anonymous portions of the monolithic s3.configure and s3.bucket.access commands. 
* shell: address review feedback for s3.accesskey.*, s3.anonymous.*, s3.serviceaccount.* - Return flag parse errors instead of swallowing them (all commands) - Add action validation in s3.anonymous.set (Read, Write, List, Tagging, Admin) - Fix s3.serviceaccount.create output: note to use list for server-assigned ID since CreateServiceAccountResponse does not return the ID * shell: fix bucket matching and action validation in s3.anonymous.* - Use SplitN instead of HasSuffix for bucket name matching to avoid false positives when one bucket name is a suffix of another - Make action validation case-insensitive with canonical normalization * shell: fix nil panics, dedup actions, validate service account actions - Fix nil-pointer panic in getOrCreateAnonymousUser when GetUser returns err==nil with nil Identity (status.FromError(nil) returns nil status) - Add nil Identity guards in s3.anonymous.get and s3.anonymous.list - Deduplicate action values in s3.anonymous.set (e.g. -access Read,Read) - Add action validation in s3.serviceaccount.create with case normalization * shell: dedup actions and reject negative expiry in s3.serviceaccount.create - Deduplicate -actions values (e.g. Read,read,Read produces one entry) - Reject negative -expiry values instead of silently treating as no expiration	2026-04-07 11:20:15 -07:00
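The case-insensitive validation and de-duplication of action lists described above might look roughly like this. It is a sketch under assumptions: `normalizeActions` and `canonicalActions` are hypothetical names; only the five action names come from the commit message.

```go
package main

import (
	"fmt"
	"strings"
)

// canonicalActions maps a lower-cased action name to its canonical spelling.
// The five names are the ones listed in the commit message.
var canonicalActions = map[string]string{
	"read":    "Read",
	"write":   "Write",
	"list":    "List",
	"tagging": "Tagging",
	"admin":   "Admin",
}

// normalizeActions parses a comma-separated action list, validates each entry
// case-insensitively, and returns canonical names without duplicates, in
// first-seen order. Hypothetical helper for illustration.
func normalizeActions(csv string) ([]string, error) {
	seen := make(map[string]bool)
	var out []string
	for _, raw := range strings.Split(csv, ",") {
		name := strings.TrimSpace(raw)
		if name == "" {
			continue
		}
		canonical, ok := canonicalActions[strings.ToLower(name)]
		if !ok {
			return nil, fmt.Errorf("unknown action %q", name)
		}
		if !seen[canonical] {
			seen[canonical] = true
			out = append(out, canonical)
		}
	}
	return out, nil
}

func main() {
	actions, err := normalizeActions("Read,read,Read,list")
	fmt.Println(actions, err)

	_, err = normalizeActions("Read,Delete")
	fmt.Println(err)
}
```

Normalizing before de-duplicating is what makes `Read,read,Read` collapse to a single entry, as the last bullet of the commit describes.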
Chris Lu	733517df30	fix(s3): s3:PutObject bucket policy now implicitly allows multipart uploads (#8968 ) * fix(s3): s3:PutObject bucket policy now implicitly allows multipart uploads The PolicyEngine.evaluateStatement() method used raw regex matching for actions, bypassing the multipart-inherits-PutObject logic that only existed in the unused CompiledStatement.MatchesAction() code path. When a bucket policy granted only s3:PutObject, multipart upload operations (CreateMultipartUpload, UploadPart, CompleteMultipartUpload, etc.) were denied, forcing users to explicitly list every multipart action. Fixes https://github.com/seaweedfs/seaweedfs/discussions/8751 * fix(s3): add s3:UploadPartCopy to multipartActionSet and improve test coverage Add missing S3_ACTION_UPLOAD_PART_COPY constant and include it in multipartActionSet so UploadPartCopy is implicitly allowed by s3:PutObject. Also add a bucket-ARN sub-test for ListBucketMultipartUploads to verify that an object-only resource pattern does not match bucket-level requests.	2026-04-07 11:13:29 -07:00
Chris Lu	0fed72d95a	volume.tier.move: fulfill target replication before deleting old replicas (#8950 ) * volume.tier.move: fulfill target replication before deleting old replicas When -toReplication is specified, volume.tier.move now creates all required replicas on the destination tier before deleting old replicas. This closes the data-loss window where only one copy existed on the target tier while awaiting volume.fix.replication. If replication fulfillment fails, old replicas are preserved and marked writable so the volume remains accessible. Also extracts replicateVolumeToServer and configureVolumeReplication helpers to reduce duplication across volume.tier.move and volume.fix.replication. Fixes #8937 * volume.tier.move: always fulfill replication before deleting old replicas When -toReplication is specified, use that replication setting. Otherwise, read the volume's existing replication from the super block. In both cases, all required replicas are created on the destination tier before old replicas are deleted. If replication fulfillment fails (e.g. not enough destination nodes), old replicas are preserved and marked writable so no data is lost. 
* volume.tier.move: address review feedback on ensureReplicationFulfilled - Add 5s delay before re-collecting topology to allow master heartbeat propagation after the move - Add nil guard for targetTierReplicas to prevent panic if the moved replica is not yet visible in the topology - Treat configureVolumeReplication failure as a hard error instead of a warning, so the rollback logic preserves old replicas * volume.tier.move: harden replication config error handling - Make configureVolumeReplication failure on the primary moved replica a hard error that aborts the move, instead of logging and continuing - Configure replication metadata on all existing target-tier replicas (not just newly created ones) when -toReplication is specified - Deletion of old replicas cannot affect new replicas since the locations list only contains pre-move servers (verified, no change) * volume.tier.move: fix cleanup deleting fulfilled replicas and broken recovery Fix 1: The cleanup loop now preserves pre-existing target-tier replicas that ensureReplicationFulfilled counted toward the replication target. Previously, a mixed-tier volume with an existing replica on the target tier could have that replica deleted right after being counted as fulfilled, leaving the volume under-replicated. ensureReplicationFulfilled now returns a preserveServers set that the deletion loop checks before removing any old replica. Fix 2: Failure paths after LiveMoveVolume (which deletes the source replica) now use restoreSurvivingReplicasWritable instead of markVolumeReplicasWritable. The old helper stopped on first error, so attempting to mark the already-deleted source writable would prevent all surviving replicas from being restored. The new helper skips the deleted source and continues through all remaining locations, logging per-replica errors instead of aborting. 
* volume.tier.move: mark preserved replicas writable, skip nodes with existing volume Fix 1: Preserved pre-existing target-tier replicas were left read-only after the move completed. They were marked read-only at the start (along with all other replicas) but never restored since the old code deleted them. Now they are explicitly marked writable before cleanup. Fix 2: The fulfillment loop could pick a candidate node that already hosts this volume on a different disk type, causing a VolumeCopy conflict. Added a guard that skips any node already hosting the volume (on any disk) before attempting replication.	2026-04-06 14:55:37 -07:00
dependabot[bot]	d0692f14ad	build(deps): bump github.com/aws/aws-sdk-go-v2/credentials from 1.19.13 to 1.19.14 (#8942 ) build(deps): bump github.com/aws/aws-sdk-go-v2/credentials Bumps [github.com/aws/aws-sdk-go-v2/credentials](https://github.com/aws/aws-sdk-go-v2) from 1.19.13 to 1.19.14. - [Release notes](https://github.com/aws/aws-sdk-go-v2/releases) - [Commits](https://github.com/aws/aws-sdk-go-v2/compare/credentials/v1.19.13...credentials/v1.19.14) --- updated-dependencies: - dependency-name: github.com/aws/aws-sdk-go-v2/credentials dependency-version: 1.19.14 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-04-06 13:26:02 -07:00
Chris Lu	69218c88fe	fix(stats): fix build on openbsd, solaris, and windows (#8951 ) fix(stats): replace undefined calculateDiskRemaining with inline calculation disk_openbsd.go, disk_solaris.go, and disk_windows.go all call calculateDiskRemaining() which is never defined, causing build failures on those platforms. Replace with the same inline calculation used in disk_supported.go.	2026-04-06 12:48:48 -07:00
Chris Lu	b0a4647d87	fix: prevent stack overflow in ECBalanceTask.reportProgress (#8949 ) * fix: prevent stack overflow in ECBalanceTask.reportProgress Add re-entry guard to reportProgress() to prevent infinite recursion. The progressCallback invoked by ReportProgressWithStage can re-enter reportProgress, causing a stack overflow that crashes the worker process (goroutine stack exceeds 1GB limit after ~22M frames). * fix: use atomics for progress and re-entry guard to avoid data races Address review feedback: GetProgress() can be called from a different goroutine while reportProgress is updating the value. Use atomic operations for both the progress field (via Float64bits/Float64frombits) and the reporting re-entry guard (via CompareAndSwap).	2026-04-06 12:26:38 -07:00
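A re-entry guard of the kind described above can be built from `sync/atomic` alone. A minimal sketch, assuming a hypothetical `task` type; the field and method names are illustrative and are not taken from `ECBalanceTask`.

```go
package main

import (
	"fmt"
	"math"
	"sync/atomic"
)

// task is a hypothetical stand-in for a worker task that reports progress.
type task struct {
	progressBits atomic.Uint64 // float64 progress stored via math.Float64bits
	reporting    atomic.Bool   // true while a report is in flight
	callback     func(progress float64)
}

// reportProgress records the progress and invokes the callback, unless a
// report is already in flight, in which case it returns without calling the
// callback again. That breaks the recursion when the callback re-enters.
func (tk *task) reportProgress(progress float64) {
	tk.progressBits.Store(math.Float64bits(progress))
	if !tk.reporting.CompareAndSwap(false, true) {
		return
	}
	defer tk.reporting.Store(false)
	if tk.callback != nil {
		tk.callback(progress)
	}
}

// GetProgress is safe to call from another goroutine.
func (tk *task) GetProgress() float64 {
	return math.Float64frombits(tk.progressBits.Load())
}

func main() {
	tk := &task{}
	calls := 0
	tk.callback = func(p float64) {
		calls++
		tk.reportProgress(p) // re-entrant call; returns before invoking the callback again
	}
	tk.reportProgress(42)
	fmt.Println("callback calls:", calls, "progress:", tk.GetProgress())
}
```

Storing the float as its bit pattern in an `atomic.Uint64` avoids a mutex on the read path, which matches the `Float64bits`/`Float64frombits` approach named in the commit.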
dependabot[bot]	83a632669a	build(deps): bump github.com/aws/aws-sdk-go-v2/service/s3 from 1.96.0 to 1.98.0 (#8943 ) build(deps): bump github.com/aws/aws-sdk-go-v2/service/s3 Bumps [github.com/aws/aws-sdk-go-v2/service/s3](https://github.com/aws/aws-sdk-go-v2) from 1.96.0 to 1.98.0. - [Release notes](https://github.com/aws/aws-sdk-go-v2/releases) - [Commits](https://github.com/aws/aws-sdk-go-v2/compare/service/s3/v1.96.0...service/s3/v1.98.0) --- updated-dependencies: - dependency-name: github.com/aws/aws-sdk-go-v2/service/s3 dependency-version: 1.98.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-04-06 10:51:10 -07:00
dependabot[bot]	331d76e024	build(deps): bump google.golang.org/api from 0.267.0 to 0.274.0 (#8945 ) Bumps [google.golang.org/api](https://github.com/googleapis/google-api-go-client) from 0.267.0 to 0.274.0. - [Release notes](https://github.com/googleapis/google-api-go-client/releases) - [Changelog](https://github.com/googleapis/google-api-go-client/blob/main/CHANGES.md) - [Commits](https://github.com/googleapis/google-api-go-client/compare/v0.267.0...v0.274.0) --- updated-dependencies: - dependency-name: google.golang.org/api dependency-version: 0.274.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-04-06 10:50:49 -07:00
dependabot[bot]	2b73db9c71	build(deps): bump go.etcd.io/etcd/client/v3 from 3.6.9 to 3.6.10 (#8944 ) Bumps [go.etcd.io/etcd/client/v3](https://github.com/etcd-io/etcd) from 3.6.9 to 3.6.10. - [Release notes](https://github.com/etcd-io/etcd/releases) - [Commits](https://github.com/etcd-io/etcd/compare/v3.6.9...v3.6.10) --- updated-dependencies: - dependency-name: go.etcd.io/etcd/client/v3 dependency-version: 3.6.10 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-04-06 10:50:41 -07:00
dependabot[bot]	9a7c731e68	build(deps): bump github.com/hashicorp/vault/api from 1.22.0 to 1.23.0 (#8941 ) Bumps [github.com/hashicorp/vault/api](https://github.com/hashicorp/vault) from 1.22.0 to 1.23.0. - [Release notes](https://github.com/hashicorp/vault/releases) - [Changelog](https://github.com/hashicorp/vault/blob/main/CHANGELOG-v1.10-v1.15.md) - [Commits](https://github.com/hashicorp/vault/compare/api/v1.22.0...api/v1.23.0) --- updated-dependencies: - dependency-name: github.com/hashicorp/vault/api dependency-version: 1.23.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-04-06 10:50:21 -07:00
dependabot[bot]	5c9d3949be	build(deps): bump actions/upload-artifact from 4 to 7 (#8940 ) Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 4 to 7. - [Release notes](https://github.com/actions/upload-artifact/releases) - [Commits](https://github.com/actions/upload-artifact/compare/v4...v7) --- updated-dependencies: - dependency-name: actions/upload-artifact dependency-version: '7' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-04-06 10:50:13 -07:00
dependabot[bot]	7dd6d5547e	build(deps): bump docker/login-action from 4.0.0 to 4.1.0 (#8939 ) Bumps [docker/login-action](https://github.com/docker/login-action) from 4.0.0 to 4.1.0. - [Release notes](https://github.com/docker/login-action/releases) - [Commits](https://github.com/docker/login-action/compare/v4...v4.1.0) --- updated-dependencies: - dependency-name: docker/login-action dependency-version: 4.1.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-04-06 10:50:06 -07:00
dependabot[bot]	b201386c8c	build(deps): bump actions/download-artifact from 4 to 8 (#8938 ) Bumps [actions/download-artifact](https://github.com/actions/download-artifact) from 4 to 8. - [Release notes](https://github.com/actions/download-artifact/releases) - [Commits](https://github.com/actions/download-artifact/compare/v4...v8) --- updated-dependencies: - dependency-name: actions/download-artifact dependency-version: '8' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-04-06 10:50:00 -07:00
Mmx233	3cea900241	fix: replication sinks upload ciphertext for SSE-encrypted objects (#8931 ) * fix: decrypt SSE-encrypted objects in S3 replication sink * fix: add SSE decryption support to GCS, Azure, B2, Local sinks * fix: return error instead of warning for SSE-C objects during replication * fix: close readers after upload to prevent resource leaks * fix: return error for unknown SSE types instead of passing through ciphertext * refactor(repl_util): extract CloseReader/CloseMaybeDecryptedReader helpers The io.Closer close-on-error and defer-close pattern was duplicated in copyWithDecryption and the S3 sink. Extract exported helpers to keep a single implementation and prevent future divergence. * fix(repl_util): warn on mixed SSE types across chunks in detectSSEType detectSSEType previously returned the SSE type of the first encrypted chunk without inspecting the rest. If an entry somehow has chunks with different SSE types, only the first type's decryption would be applied. Now scans all chunks and logs a warning on mismatch. * fix(repl_util): decrypt inline SSE objects during replication Small SSE-encrypted objects stored in entry.Content were being copied as ciphertext because: 1. detectSSEType only checked chunk metadata, but inline objects have no chunks — now falls back to checking entry.Extended for SSE keys 2. Non-S3 sinks short-circuited on len(entry.Content)>0, bypassing the decryption path — now call MaybeDecryptContent before writing Adds MaybeDecryptContent helper for decrypting inline byte content. * fix(repl_util): add KMS initialization for replication SSE decryption SSE-KMS decryption was not wired up for filer.backup — the only initialization was for SSE-S3 key manager. CreateSSEKMSDecryptedReader requires a global KMS provider which is only loaded by the S3 API auth-config path. Add InitializeSSEForReplication helper that initializes both SSE-S3 (from filer KEK) and SSE-KMS (from Viper config [kms] section / WEED_KMS_* env vars). 
Replace the SSE-S3-only init in filer_backup.go. * fix(replicator): initialize SSE decryption for filer.replicate The SSE decryption setup was only added to filer_backup.go, but the notification-based replicator (filer.replicate) uses the same sinks and was missing the required initialization. Add SSE init in NewReplicator so filer.replicate can decrypt SSE objects. * refactor(repl_util): fold entry param into CopyFromChunkViews Remove the CopyFromChunkViewsWithEntry wrapper and add the entry parameter directly to CopyFromChunkViews, since all callers already pass it. * fix(repl_util): guard SSE init with sync.Once, error on mixed SSE types InitializeWithFiler overwrites the global superKey on every call. Wrap InitializeSSEForReplication with sync.Once so repeated calls (e.g. from NewReplicator) are safe. detectSSEType now returns an error instead of logging a warning when chunks have inconsistent SSE types, so replication aborts rather than silently applying the wrong decryption to some chunks. * fix(repl_util): allow SSE init retry, detect conflicting metadata, add tests - Replace sync.Once with mutex+bool so transient failures (e.g. filer unreachable) don't permanently prevent initialization. Only successful init flips the flag; failed attempts allow retries. - Remove v.IsSet("kms") guard that prevented env-only KMS configs (WEED_KMS_*) from being detected. Always attempt KMS loading and let LoadConfigurations handle "no config found". - detectSSEType now checks for conflicting extended metadata keys (e.g. both SeaweedFSSSES3Key and SeaweedFSSSEKMSKey present) and returns an error instead of silently picking the first match. - Add table-driven tests for detectSSEType, MaybeDecryptReader, and MaybeDecryptContent covering plaintext, uniform SSE, mixed chunks, inline SSE via extended metadata, conflicting metadata, and SSE-C. 
* test(repl_util): add SSE-S3 and SSE-KMS integration tests Add round-trip encryption/decryption tests: - SSE-S3: encrypt with CreateSSES3EncryptedReader, decrypt with CreateSSES3DecryptedReader, verify plaintext matches - SSE-KMS: encrypt with AES-CTR, wire a mock KMSProvider via SetGlobalKMSProvider, build serialized KMS metadata, verify MaybeDecryptReader and MaybeDecryptContent produce correct plaintext Fix existing tests to check io.ReadAll errors. * test(repl_util): exercise full SSE-S3 path through MaybeDecryptReader Replace direct CreateSSES3DecryptedReader calls with end-to-end tests that go through MaybeDecryptReader → decryptSSES3 → DeserializeSSES3Metadata → GetSSES3IV → CreateSSES3DecryptedReader. Uses WEED_S3_SSE_KEK env var + a mock filer client to initialize the global key manager with a test KEK, then SerializeSSES3Metadata to build proper envelope-encrypted metadata. Cleanup restores the key manager state. * fix(localsink): write to temp file to prevent truncated replicas The local sink truncated the destination file before writing content. If decryption or chunk copy failed, the file was left empty/truncated, destroying the previous replica. Write to a temp file in the same directory and atomically rename on success. On any error the temp file is cleaned up and the existing replica is untouched. --------- Co-authored-by: Chris Lu <chris.lu@gmail.com>	2026-04-06 00:32:27 -07:00
Chris Lu	7ab6306e15	fix(kafka): resolve consumer group resumption timeout in e2e tests (#8935 ) * fix(kafka): resolve consumer group resumption timeout in e2e tests Three issues caused ConsumerGroupResumption to time out when the second consumer tried to resume from committed offsets: 1. ForceCompleteRebalance deadlock: performCleanup() held group.Mu.Lock then called ForceCompleteRebalance() which tried to acquire the same lock — a guaranteed deadlock on Go's non-reentrant sync.Mutex. Fixed by requiring callers to hold the lock (matching actual call sites). 2. Unbounded fallback fetch: when the multi-batch fetch timed out, the fallback GetStoredRecords call used the connection context (no deadline). A slow broker gRPC call could block the data-plane goroutine indefinitely, causing head-of-line blocking for all responses on that connection. Fixed with a 10-second timeout. 3. HWM lookup failure caused empty responses: after a consumer leaves and the partition is deactivated, GetLatestOffset can fail. The fetch handler treated this as "no data" and entered the long-poll loop (up to 10s × 4 retries = 40s timeout). Fixed by assuming data may exist when HWM lookup fails, so the actual fetch determines availability. * fix(kafka): address review feedback on HWM sentinel and fallback timeout - Don't expose synthetic HWM (requestedOffset+1) to clients; keep result.highWaterMark at 0 when the real HWM lookup fails. - Tie fallback timeout to client's MaxWaitTime instead of a fixed 10s, so one slow partition doesn't hold the reader beyond the request budget. * fix(kafka): use large HWM sentinel and clamp fallback timeout - Use requestedOffset+10000 as sentinel HWM instead of +1, so FetchMultipleBatches doesn't artificially limit to 1 record. - Add 2s floor to fallback timeout so disk reads via gRPC have a reasonable chance even when maxWaitMs is small or zero. 
* fix(kafka): use MaxInt64 sentinel and derive HWM from fetch result - Use math.MaxInt64 as HWM sentinel to avoid integer overflow risk (previously requestedOffset+10000 could wrap on large offsets). - After the fetch, derive a meaningful HWM from newOffset so the client never sees MaxInt64 or 0 in the response. * fix(kafka): use remaining time budget for fallback fetch The fallback was restarting the full maxWaitMs budget even though the multi-batch fetch already consumed part of it. Now compute remaining time from either the parent context deadline or maxWaitMs minus elapsed, skip the fallback if budget is exhausted, and clamp to [2s, 10s] bounds.	2026-04-05 20:13:57 -07:00
Chris Lu	72eb93919c	fix(gcssink): prevent empty object finalization on write failure (#8933 ) * fix(gcssink): prevent empty object finalization on write failure The GCS writer was created unconditionally with defer wc.Close(), which finalizes the upload even when content decryption or copy fails. This silently overwrites valid objects with empty data. Remove the unconditional defer, explicitly close on success to propagate errors, and delete the object on write failure. * fix(gcssink): use context cancellation instead of obj.Delete on failure obj.Delete() after a failed write would delete the existing object at that key, causing data loss on updates. Use a cancelable context instead — cancelling before Close() aborts the GCS upload without touching any pre-existing object.	2026-04-05 16:07:49 -07:00
Chris Lu	4fd974b16b	fix(azuresink): delete freshly created blob on write failure (#8934 ) * fix(azuresink): delete freshly created blob on write failure appendBlobClient.Create() runs before content decryption and copy. If MaybeDecryptContent or CopyFromChunkViews fails, an empty blob is left behind, silently replacing any previous valid data. Add cleanup that deletes the blob on content write errors when we were the ones who created it. * fix(azuresink): track recreated blobs for cleanup on write failure handleExistingBlob deletes and recreates the blob when overwrite is needed, but freshlyCreated was only set on the initial Create success path. Set freshlyCreated = needsWrite after handleExistingBlob so recreated blobs are also cleaned up on content write failure.	2026-04-05 16:07:34 -07:00
Chris Lu	b8fc99a9cd	fix(s3): apply PutObject multipart expansion to STS session policies (#8932 ) * fix(s3): apply PutObject multipart expansion to STS session policy evaluation (#8929) PR #8445 added logic to implicitly grant multipart upload actions when s3:PutObject is authorized, but only in the S3 API policy engine's CompiledStatement.MatchesAction(). STS session policies are evaluated through the IAM policy engine's matchesActions() -> awsIAMMatch() path, which did plain pattern matching without the multipart expansion. Add the same multipart expansion logic to the IAM policy engine's matchesActions() so that session policies containing s3:PutObject correctly allow multipart upload operations. * fix: make multipart action set lookup case-insensitive and optimize Address PR review feedback: - Lowercase multipartActionSet keys and use strings.ToLower for lookup, since AWS IAM actions are case-insensitive - Only check for s3:PutObject permission when the requested action is actually a multipart action, avoiding unnecessary awsIAMMatch calls - Add test case for case-insensitive multipart action matching	2026-04-05 14:06:50 -07:00
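The multipart expansion described above might look roughly like this sketch. It is not the IAM engine's `matchesActions()`: the function name `matchesActionSketch` is hypothetical, the list of multipart actions is an assumption, and a plain case-insensitive comparison stands in for `awsIAMMatch` (no wildcard support).

```go
package main

import (
	"fmt"
	"strings"
)

// multipartActionSet uses lowercase keys so lookups can be case-insensitive.
// Assumption: the exact set of actions in SeaweedFS may differ.
var multipartActionSet = map[string]struct{}{
	"s3:createmultipartupload":   {},
	"s3:uploadpart":              {},
	"s3:completemultipartupload": {},
	"s3:abortmultipartupload":    {},
	"s3:listmultipartuploadparts": {},
}

// matchesActionSketch reports whether the requested action is allowed, either
// directly or implicitly because s3:PutObject is granted and the requested
// action is a multipart upload action.
func matchesActionSketch(allowed []string, requested string) bool {
	for _, a := range allowed {
		if strings.EqualFold(a, requested) { // stand-in for pattern matching
			return true
		}
	}
	// Only look for s3:PutObject when the request is a multipart action.
	if _, ok := multipartActionSet[strings.ToLower(requested)]; !ok {
		return false
	}
	for _, a := range allowed {
		if strings.EqualFold(a, "s3:PutObject") {
			return true
		}
	}
	return false
}

func main() {
	policy := []string{"s3:PutObject"}
	fmt.Println(matchesActionSketch(policy, "S3:UploadPart")) // true
	fmt.Println(matchesActionSketch(policy, "s3:GetObject"))  // false
}
```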
Mmx233	69cd5fa37b	fix: S3 sink puts all entry.Extended into Tagging header instead of only object tags (#8930 ) * test: add failing tests for S3 sink buildTaggingString * fix: S3 sink should only put object tags into Tagging header * fix: avoid sending empty x-amz-tagging header	2026-04-05 12:16:04 -07:00
Chris Lu	076d504044	fix(admin): reduce memory usage and verbose logging for large clusters (#8927 ) * fix(admin): reduce memory usage and verbose logging for large clusters (#8919) The admin server used excessive memory and produced thousands of log lines on clusters with many volumes (e.g., 33k volumes). Three root causes: 1. Scanner duplicated all volume metrics: getVolumeHealthMetrics() created VolumeHealthMetrics objects, then convertToTaskMetrics() copied them all into identical types.VolumeHealthMetrics. Now uses the task-system type directly, eliminating the duplicate allocation and removing convertToTaskMetrics. 2. All previous task states loaded at startup: LoadTasksFromPersistence read and deserialized every .pb file from disk, logging each one. With thousands of balance tasks persisted, this caused massive startup I/O, memory usage, and log noise (including unguarded DEBUG glog.Infof per task). Now starts with an empty queue — the scanner re-detects current needs from live cluster state. Terminal tasks are purged from memory and disk when new scan results arrive. 3. Verbose per-volume/per-node logging: V(2) and V(3) logs produced thousands of lines per scan. Per-volume logs bumped to V(4), per-node/rack/disk logs bumped to V(3). Topology summary now logs counts instead of full node ID arrays. Also removes lastTopologyInfo field from MaintenanceScanner — the raw protobuf topology is returned as a local value and not retained between 30-minute scans. * fix(admin): delete stale task files at startup, add DeleteAllTaskStates Old task .pb files from previous runs were left on disk. The periodic CleanupCompletedTasks still loads all files to find completed ones — the same expensive 4GB path from the pprof profile. Now at startup, DeleteAllTaskStates removes all .pb files by scanning the directory without reading or deserializing them. The scanner will re-detect any tasks still needed from live cluster state. 
* fix(admin): don't persist terminal tasks to disk CompleteTask was saving failed/completed tasks to disk where they'd accumulate. The periodic cleanup only triggered for completed tasks, not failed ones. Now terminal tasks are deleted from disk immediately and only kept in memory for the current session's UI. * fix(admin): cap in-memory tasks to 100 per job type Without a limit, the task map grows unbounded — balance could create thousands of pending tasks for a cluster with many imbalanced volumes. Now AddTask rejects new tasks when a job type already has 100 in the queue. The scanner will re-detect skipped volumes on the next scan. * fix(admin): address PR review - memory-only purge, active-only capacity - purgeTerminalTasks now only cleans in-memory map (terminal tasks are already deleted from disk by CompleteTask) - Per-type capacity limit counts only active tasks (pending/assigned/ in_progress), not terminal ones - When at capacity, purge terminal tasks first before rejecting * fix(admin): fix orphaned comment, add TaskStatusCancelled to terminal switch - Move hasQueuedOrActiveTaskForVolume comment to its function definition - Add TaskStatusCancelled to the terminal state switch in CompleteTask so cancelled task files are deleted from disk	2026-04-04 18:45:57 -07:00
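The per-type capacity rule can be illustrated with a small sketch. All type and function names here are hypothetical, and the sequence "purge terminal tasks when the type is full, then count only active tasks" is one reading of the commit message rather than the actual `AddTask` implementation.

```go
package main

import "fmt"

type taskStatus int

const (
	statusPending taskStatus = iota
	statusAssigned
	statusInProgress
	statusCompleted
	statusFailed
	statusCancelled
)

const maxTasksPerType = 100 // cap from the commit message

type task struct {
	id      int
	jobType string
	status  taskStatus
}

type taskQueue struct{ tasks map[int]*task }

func isTerminal(s taskStatus) bool {
	switch s {
	case statusCompleted, statusFailed, statusCancelled:
		return true
	}
	return false
}

// count returns the total and active (non-terminal) tasks of one job type.
func (q *taskQueue) count(jobType string) (total, active int) {
	for _, t := range q.tasks {
		if t.jobType != jobType {
			continue
		}
		total++
		if !isTerminal(t.status) {
			active++
		}
	}
	return total, active
}

// addTask rejects a task only when the job type already has the maximum
// number of active tasks; terminal tasks are purged from memory first.
func (q *taskQueue) addTask(t *task) bool {
	total, active := q.count(t.jobType)
	if total >= maxTasksPerType {
		for id, old := range q.tasks {
			if old.jobType == t.jobType && isTerminal(old.status) {
				delete(q.tasks, id) // deleting during range is safe in Go
			}
		}
	}
	if active >= maxTasksPerType {
		return false
	}
	q.tasks[t.id] = t
	return true
}

func main() {
	q := &taskQueue{tasks: map[int]*task{}}
	for i := 0; i < maxTasksPerType; i++ {
		q.addTask(&task{id: i, jobType: "balance"})
	}
	fmt.Println(q.addTask(&task{id: 1000, jobType: "balance"})) // false
}
```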
Chris Lu	2c8a1ea6cc	fix(docker): disable glibc _FORTIFY_SOURCE for aarch64-musl cross builds When cross-compiling aws-lc-sys for aarch64-unknown-linux-musl using aarch64-linux-gnu-gcc, glibc's _FORTIFY_SOURCE generates calls to __memcpy_chk, __fprintf_chk etc. which don't exist in musl, causing linker errors. Disable it via CFLAGS_aarch64_unknown_linux_musl.	2026-04-04 14:25:05 -07:00
Chris Lu	4efe0acaf5	fix(master): fast resume state and default resumeState to true (#8925 ) * fix(master): fast resume state and default resumeState to true When resumeState is enabled in single-master mode, the raft server had existing log entries so the self-join path couldn't promote to leader. The server waited the full election timeout (10-20s) before self-electing. Fix by temporarily setting election timeout to 1ms before Start() when in single-master + resumeState mode with existing log, then restoring the original timeout after leader election. This makes resume near-instant. Also change the default for resumeState from false to true across all CLI commands (master, mini, server) so state is preserved by default. * fix(master): prevent fastResume goroutine from hanging forever Use defer to guarantee election timeout is always restored, and bound the polling loop with a timeout so it cannot spin indefinitely if leader election never succeeds. * fix(master): use ticker instead of time.After in fastResume polling loop	2026-04-04 14:15:56 -07:00
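A bounded polling loop of the kind the last two fixes describe could be sketched like this. The `stubRaft` type and the `fastResume` signature are hypothetical stand-ins for the raft server; the sketch only shows the three ideas named in the commit: restore the election timeout via `defer`, poll with a ticker, and give up after a timeout.

```go
package main

import (
	"fmt"
	"time"
)

// stubRaft is a hypothetical stand-in for the raft server.
type stubRaft struct {
	electionTimeout time.Duration
	leader          bool
}

func newStubRaft(leader bool) *stubRaft {
	return &stubRaft{electionTimeout: 10 * time.Second, leader: leader}
}

// fastResume temporarily shortens the election timeout, waits until the node
// is leader or maxWaitMs has passed, and always restores the original timeout.
func fastResume(r *stubRaft, maxWaitMs int) bool {
	original := r.electionTimeout
	r.electionTimeout = time.Millisecond
	defer func() { r.electionTimeout = original }() // restored on every path

	ticker := time.NewTicker(5 * time.Millisecond) // one ticker, not time.After per loop
	defer ticker.Stop()
	deadline := time.NewTimer(time.Duration(maxWaitMs) * time.Millisecond)
	defer deadline.Stop()

	for {
		if r.leader {
			return true
		}
		select {
		case <-ticker.C:
			// poll again
		case <-deadline.C:
			return false // leader election never succeeded within the bound
		}
	}
}

func main() {
	r := newStubRaft(true)
	fmt.Println(fastResume(r, 100), r.electionTimeout) // true 10s
}
```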
Chris Lu	0da1794856	fix(rust): remove transitive openssl dependency from seaweed-volume reqwest's default features include native-tls which depends on openssl-sys, causing builds to fail on musl targets where OpenSSL headers are not available. Since we already use rustls-tls, disable default features to eliminate the openssl-sys dependency entirely.	2026-04-04 14:07:01 -07:00
Chris Lu	47baf6c841	fix(docker): add Rust volume server pre-build to latest and dev container workflows Both container_latest.yml and container_dev.yml use Dockerfile.go_build which expects weed-volume-prebuilt/ with pre-compiled Rust binaries, but neither workflow produced them, causing COPY failures during docker build. Add build-rust-binaries jobs that natively cross-compile for amd64 and arm64, then download and place the artifacts in the Docker build context. Also fix the trivy-scan local build path in container_latest.yml.	2026-04-04 13:53:13 -07:00
Chris Lu	d37b592bc4	Update object_store_users_templ.go	2026-04-04 11:52:57 -07:00
Chris Lu	896114d330	fix(admin): fix master leader link showing incorrect port in Admin UI (#8924) fix(admin): use gRPC address for current server in RaftListClusterServers The old Raft implementation was returning the HTTP address (ms.option.Master) for the current server, while peers used gRPC addresses (peer.ConnectionString). The Admin UI's GetClusterMasters() converts all addresses from gRPC to HTTP via GrpcAddressToServerAddress (port - 10000), which produced a negative port (-667) for the current server since its address was already in HTTP format (port 9333). Use ToGrpcAddress() for consistency with both HashicorpRaft (which stores gRPC addresses) and old Raft peers. Fixes #8921	2026-04-04 11:50:43 -07:00
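The arithmetic behind the negative port is easy to reproduce. The helper names below are hypothetical; only the 10000 offset and the port values come from the commit message.

```go
package main

import "fmt"

const grpcPortOffset = 10000 // gRPC port = HTTP port + 10000

// grpcToHTTPPort (hypothetical helper) mirrors the "port - 10000" conversion.
func grpcToHTTPPort(grpcPort int) int { return grpcPort - grpcPortOffset }

// httpToGrpcPort (hypothetical helper) is the inverse conversion.
func httpToGrpcPort(httpPort int) int { return httpPort + grpcPortOffset }

func main() {
	// Correct input: a gRPC port is converted back to the HTTP port.
	fmt.Println(grpcToHTTPPort(19333)) // 9333

	// The bug: an address already in HTTP form is converted a second time.
	fmt.Println(grpcToHTTPPort(9333)) // -667

	// The fix: report the current server as a gRPC address first.
	fmt.Println(grpcToHTTPPort(httpToGrpcPort(9333))) // 9333
}
```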
Chris Lu	f6df7126b6	feat(admin): add profiling options for debugging high memory/CPU usage (#8923 ) * feat(admin): add profiling options for debugging high memory/CPU usage Add -debug, -debug.port, -cpuprofile, and -memprofile flags to the admin command, matching the profiling support already available in master, volume, and other server commands. This enables investigation of resource usage issues like #8919. * refactor(admin): move profiling flags into AdminOptions struct Move cpuprofile and memprofile flags from global variables into the AdminOptions struct and init() function for consistency with other flags. * fix(debug): bind pprof server to localhost only and document profiling flags StartDebugServer was binding to all interfaces (0.0.0.0), exposing runtime profiling data to the network. Restrict to 127.0.0.1 since this is a development/debugging tool. Also add a "Debugging and Profiling" section to the admin command's help text documenting the new flags.	2026-04-04 10:05:19 -07:00
Chris Lu	9add18e169	fix(volume-rust): fix volume balance between Go and Rust servers (#8915 ) Two bugs prevented reliable volume balancing when a Rust volume server is the copy target: 1. find_last_append_at_ns returned None for delete tombstones (Size==0 in dat header), falling back to file mtime truncated to seconds. This caused the tail step to re-send needles from the last sub-second window. Fix: change `needle_size <= 0` to `< 0` since Size==0 delete needles still have a valid timestamp in their tail. 2. VolumeTailReceiver called read_body_v2 on delete needles, which have no DataSize/Data/flags — only checksum+timestamp+padding after the header. Fix: skip read_body_v2 when size == 0, reject negative sizes. Also: - Unify gRPC server bind: use TcpListener::bind before spawn for both TLS and non-TLS paths, propagating bind errors at startup. - Add mixed Go+Rust cluster test harness and integration tests covering VolumeCopy in both directions, copy with deletes, and full balance move with tail tombstone propagation and source deletion. - Make FindOrBuildRustBinary configurable for default vs no-default features (4-byte vs 5-byte offsets).	2026-04-04 09:13:23 -07:00
Chris Lu	d1823d3784	fix(s3): include static identities in listing operations (#8903 ) * fix(s3): include static identities in listing operations Static identities loaded from -s3.config file were only stored in the S3 API server's in-memory state. Listing operations (s3.configure shell command, aws iam list-users) queried the credential manager which only returned dynamic identities from the backend store. Register static identities with the credential manager after loading so they are included in LoadConfiguration and ListUsers results, and filtered out before SaveConfiguration to avoid persisting them to the dynamic store. Fixes https://github.com/seaweedfs/seaweedfs/discussions/8896 * fix: avoid mutating caller's config and defensive copies - SaveConfiguration: use shallow struct copy instead of mutating the caller's config.Identities field - SetStaticIdentities: skip nil entries to avoid panics - GetStaticIdentities: defensively copy PolicyNames slice to avoid aliasing the original * fix: filter nil static identities and sync on config reload - SetStaticIdentities: filter nil entries from the stored slice (not just from staticNames) to prevent panics in LoadConfiguration/ListUsers - Extract updateCredentialManagerStaticIdentities helper and call it from both startup and the grace.OnReload handler so the credential manager's static snapshot stays current after config file reloads * fix: add mutex for static identity fields and fix ListUsers for store callers - Add sync.RWMutex to protect staticIdentities/staticNames against concurrent reads during config reload - Revert CredentialManager.ListUsers to return only store users, since internal callers (e.g. 
DeletePolicy) look up each user in the store and fail on non-existent static entries - Merge static usernames in the filer gRPC ListUsers handler instead, via the new GetStaticUsernames method - Fix CI: TestIAMPolicyManagement/managed_policy_crud_lifecycle was failing because DeletePolicy iterated static users that don't exist in the store * fix: show static identities in admin UI and weed shell The admin UI and weed shell s3.configure command query the filer's credential manager via gRPC, which is a separate instance from the S3 server's credential manager. Static identities were only registered on the S3 server's credential manager, so they never appeared in the filer's responses. - Add CredentialManager.LoadS3ConfigFile to parse a static S3 config file and register its identities - Add FilerOptions.s3ConfigFile so the filer can load the same static config that the S3 server uses - Wire s3ConfigFile through in weed mini and weed server modes - Merge static usernames in filer gRPC ListUsers handler - Add CredentialManager.GetStaticUsernames helper - Add sync.RWMutex to protect concurrent access to static identity fields - Avoid importing weed/filer from weed/credential (which pulled in filer store init() registrations and broke test isolation) - Add docker/compose/s3_static_users_example.json * fix(admin): make static users read-only in admin UI Static users loaded from the -s3.config file should not be editable or deletable through the admin UI since they are managed via the config file. - Add IsStatic field to ObjectStoreUser, set from credential manager - Hide edit, delete, and access key buttons for static users in the users table template - Show a "static" badge next to static user names - Return 403 Forbidden from UpdateUser and DeleteUser API handlers when the target user is a static identity * fix(admin): show details for static users GetObjectStoreUserDetails called credentialManager.GetUser which only queries the dynamic store. 
For static users this returned ErrUserNotFound. Fall back to GetStaticIdentity when the store lookup fails. * fix(admin): load static S3 identities in admin server The admin server has its own credential manager (gRPC store) which is a separate instance from the S3 server's and filer's. It had no static identity data, so IsStaticIdentity returned false (edit/delete buttons shown) and GetStaticIdentity returned nil (details page failed). Pass the -s3.config file path through to the admin server and call LoadS3ConfigFile on its credential manager, matching the approach used for the filer. * fix: use protobuf is_static field instead of passing config file path The previous approach passed -s3.config file path to every component (filer, admin). This is wrong because the admin server should not need to know about S3 config files. Instead, add an is_static field to the Identity protobuf message. The field is set when static identities are serialized (in GetStaticIdentities and LoadS3ConfigFile). Any gRPC client that loads configuration via GetConfiguration automatically sees which identities are static, without needing the config file. - Add is_static field (tag 8) to iam_pb.Identity proto message - Set IsStatic=true in GetStaticIdentities and LoadS3ConfigFile - Admin GetObjectStoreUsers reads identity.IsStatic from proto - Admin IsStaticUser helper loads config via gRPC to check the flag - Filer GetUser gRPC handler falls back to GetStaticIdentity - Remove s3ConfigFile from AdminOptions and NewAdminServer signature	2026-04-03 20:01:28 -07:00
Chris Lu	0798b274dd	feat(s3): add concurrent chunk prefetch for large file downloads (#8917 ) * feat(s3): add concurrent chunk prefetch for large file downloads Add a pipe-based prefetch pipeline that overlaps chunk fetching with response writing during S3 GetObject, SSE downloads, and filer proxy. While chunk N streams to the HTTP response, fetch goroutines for the next K chunks establish HTTP connections to volume servers ahead of time, eliminating the RTT gap between sequential chunk fetches. Uses io.Pipe for minimal memory overhead (~1MB per download regardless of chunk size, vs buffering entire chunks). Also increases the streaming read buffer from 64KB to 256KB to reduce syscall overhead. Benchmark results (64KB chunks, prefetch=4): - 0ms latency: 1058 → 2362 MB/s (2.2× faster) - 5ms latency: 11.0 → 41.7 MB/s (3.8× faster) - 10ms latency: 5.9 → 23.3 MB/s (4.0× faster) - 20ms latency: 3.1 → 12.1 MB/s (3.9× faster) * fix: address review feedback for prefetch pipeline - Fix data race: use chunkPipeResult (pointer) on channel to avoid copying struct while fetch goroutines write to it. Confirmed clean with -race detector. - Remove concurrent map write: retryWithCacheInvalidation no longer updates fileId2Url map. Producer only reads it; consumer never writes. - Use mem.Allocate/mem.Free for copy buffer to reduce GC pressure. - Add local cancellable context so consumer errors (client disconnect) immediately stop the producer and all in-flight fetch goroutines. fix(test): remove dead code and add Range header support in test server - Remove unused allData variable in makeChunksAndServer - Add Range header handling to createTestServer for partial chunk read coverage (206 Partial Content, 416 Range Not Satisfiable) * fix: correct retry condition and goroutine leak in prefetch pipeline - Fix retry condition: use result.fetchErr/result.written instead of copied to decide cache-invalidation retry. 
The old condition wrongly triggered retry when the fetch succeeded but the response writer failed on the first write (copied==0 despite fetcher having data). Now matches the sequential path (stream.go:197) which checks whether the fetcher itself wrote zero bytes. - Fix goroutine leak: when the producer's send to the results channel is interrupted by context cancellation, the fetch goroutine was already launched but the result was never sent to the channel. The drain loop couldn't handle it. Now waits on result.done before returning so every fetch goroutine is properly awaited.	2026-04-03 19:57:30 -07:00
Chris Lu	3efe88c718	feat(s3): store and return checksum headers for additional checksum algorithms (#8914 ) * feat(s3): store and return checksum headers for additional checksum algorithms When clients upload with --checksum-algorithm (SHA256, CRC32, etc.), SeaweedFS validated the checksum but discarded it. The checksum was never stored in metadata or returned in PUT/HEAD/GET responses. Now the checksum is computed alongside MD5 during upload, stored in entry extended attributes, and returned as the appropriate x-amz-checksum-* header in all responses. Fixes #8911 * fix(s3): address review feedback and CI failures for checksum support - Gate GET/HEAD checksum response headers on x-amz-checksum-mode: ENABLED per AWS S3 spec, fixing FlexibleChecksumError on ranged GETs and multipart copies - Verify computed checksum against client-provided header value for non-chunked uploads, returning BadDigest on mismatch - Add nil check for getCheckSumWriter to prevent panic - Handle comma-separated values in X-Amz-Trailer header - Use ordered slice instead of map for deterministic checksum header selection; extract shared mappings into package-level vars * fix(s3): skip checksum header for ranged GET responses The stored checksum covers the full object. Returning it for ranged (partial) responses causes SDK checksum validation failures because the SDK validates the header value against the partial content received. Skip emitting x-amz-checksum-* headers when a Range request header is present, fixing PyArrow large file read failures. * fix(s3): reject unsupported checksum algorithm with 400 detectRequestedChecksumAlgorithm now returns an error code when x-amz-sdk-checksum-algorithm or x-amz-checksum-algorithm contains an unsupported value, instead of silently ignoring it. 
* feat(s3): compute composite checksum for multipart uploads Store the checksum algorithm during CreateMultipartUpload, then during CompleteMultipartUpload compute a composite checksum from per-part checksums following the AWS S3 spec: concatenate raw per-part checksums, hash with the same algorithm, format as "base64-N" where N is part count. The composite checksum is persisted on the final object entry and returned in HEAD/GET responses (gated on x-amz-checksum-mode: ENABLED). Reuses existing per-part checksum storage from putToFiler and the getCheckSumWriter/checksumHeaders infrastructure. * fix(s3): validate checksum algorithm in CreateMultipartUpload, error on missing part checksums - Move detectRequestedChecksumAlgorithm call before mkdir callback so an unsupported algorithm returns 400 before the upload is created - Change computeCompositeChecksum to return an error when a part is missing its checksum (the upload was initiated with a checksum algorithm, so all parts must have checksums) - Propagate the error as ErrInvalidPart in CompleteMultipartUpload * fix(s3): return checksum header in CompleteMultipartUpload response, validate per-part algorithm - Add ChecksumHeaderName/ChecksumValue fields to CompleteMultipartUploadResult and set the x-amz-checksum-* HTTP response header in the handler, matching the AWS S3 CompleteMultipartUpload response spec - Validate that each part's stored checksum algorithm matches the upload's expected algorithm before assembling the composite checksum; return an error if a part was uploaded with a different algorithm	2026-04-03 18:37:54 -07:00
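The composite checksum rule (concatenate raw per-part checksums, hash with the same algorithm, append the part count) can be sketched for SHA-256 as follows. The function names are hypothetical and this is not the `computeCompositeChecksum` implementation; it assumes per-part checksums are stored as standard base64 strings.

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
	"fmt"
)

// partChecksum returns the base64-encoded SHA-256 of one part's data.
func partChecksum(data []byte) string {
	sum := sha256.Sum256(data)
	return base64.StdEncoding.EncodeToString(sum[:])
}

// compositeSHA256 (hypothetical name) builds a "base64-N" composite checksum
// from per-part checksums. A missing part checksum is an error.
func compositeSHA256(partChecksums []string) (string, error) {
	if len(partChecksums) == 0 {
		return "", fmt.Errorf("no parts")
	}
	var concat []byte
	for i, p := range partChecksums {
		if p == "" {
			return "", fmt.Errorf("part %d is missing its checksum", i+1)
		}
		raw, err := base64.StdEncoding.DecodeString(p)
		if err != nil {
			return "", fmt.Errorf("part %d: %w", i+1, err)
		}
		concat = append(concat, raw...) // raw digest bytes, not the base64 text
	}
	sum := sha256.Sum256(concat)
	return fmt.Sprintf("%s-%d", base64.StdEncoding.EncodeToString(sum[:]), len(partChecksums)), nil
}

func main() {
	parts := []string{partChecksum([]byte("part one")), partChecksum([]byte("part two"))}
	composite, err := compositeSHA256(parts)
	fmt.Println(composite, err)
}
```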
Chris Lu	36f37b9b6a	fix(filer): remove cancellation guard from RollbackTransaction and clean up #8909 (#8916 ) * fix(filer): remove cancellation guard from RollbackTransaction and clean up #8909 RollbackTransaction is a cleanup operation that must succeed even when the context is cancelled — guarding it causes the exact orphaned state that #8909 was trying to prevent. Also: - Use single-evaluation `if err := ctx.Err(); err != nil` pattern instead of double-calling ctx.Err() - Remove spurious blank lines before guards - Add context.DeadlineExceeded test coverage - Simplify tests from ~230 lines to ~130 lines * fix(filer): call cancel() in expiredCtx and test rollback with expired context - Call cancel() instead of suppressing it to avoid leaking timer resources - Test RollbackTransaction with both cancelled and expired contexts	2026-04-03 17:55:27 -07:00
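The difference between a guarded operation and an unguarded rollback can be sketched as below. The `stubStore` type and the choice of `InsertEntry` as the guarded method are hypothetical; the sketch only illustrates the single-evaluation `ctx.Err()` guard and the decision to let rollback run under a cancelled context.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// stubStore is a hypothetical stand-in for a filer store.
type stubStore struct {
	inTxn   bool
	entries []string
}

// InsertEntry refuses to start work under a cancelled or expired context.
func (s *stubStore) InsertEntry(ctx context.Context, name string) error {
	if err := ctx.Err(); err != nil { // single evaluation of ctx.Err()
		return err
	}
	s.entries = append(s.entries, name)
	return nil
}

// RollbackTransaction is cleanup, so it deliberately has no cancellation
// guard: it must succeed even when ctx is already cancelled.
func (s *stubStore) RollbackTransaction(ctx context.Context) error {
	s.inTxn = false
	return nil
}

// cancelledContext returns a context that is already cancelled.
func cancelledContext() context.Context {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	return ctx
}

func main() {
	s := &stubStore{inTxn: true}
	ctx := cancelledContext()
	err := s.InsertEntry(ctx, "a")
	fmt.Println(errors.Is(err, context.Canceled)) // true
	fmt.Println(s.RollbackTransaction(ctx), s.inTxn)
}
```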
os-pradipbabar	d5128f00f1	fix: Prevent orphaned metadata from cancelled S3 operations (Issue #8908) (#8909) fix(filer): check if context was already cancelled before ignoring cancellation	2026-04-03 16:22:46 -07:00
Lars Lehtonen	d49c2a7364	chore(weed/admin/plugin): prune unused functions (#8912) Co-authored-by: Chris Lu <chrislusf@users.noreply.github.com>	2026-04-03 16:05:42 -07:00
Chris Lu	995dfc4d5d	chore: remove ~50k lines of unreachable dead code (#8913 ) * chore: remove unreachable dead code across the codebase Remove ~50,000 lines of unreachable code identified by static analysis. Major removals: - weed/filer/redis_lua: entire unused Redis Lua filer store implementation - weed/wdclient/net2, resource_pool: unused connection/resource pool packages - weed/plugin/worker/lifecycle: unused lifecycle plugin worker - weed/s3api: unused S3 policy templates, presigned URL IAM, streaming copy, multipart IAM, key rotation, and various SSE helper functions - weed/mq/kafka: unused partition mapping, compression, schema, and protocol functions - weed/mq/offset: unused SQL storage and migration code - weed/worker: unused registry, task, and monitoring functions - weed/query: unused SQL engine, parquet scanner, and type functions - weed/shell: unused EC proportional rebalance functions - weed/storage/erasure_coding/distribution: unused distribution analysis functions - Individual unreachable functions removed from 150+ files across admin, credential, filer, iam, kms, mount, mq, operation, pb, s3api, server, shell, storage, topology, and util packages * fix(s3): reset shared memory store in IAM test to prevent flaky failure TestLoadIAMManagerFromConfig_EmptyConfigWithFallbackKey was flaky because the MemoryStore credential backend is a singleton registered via init(). Earlier tests that create anonymous identities pollute the shared store, causing LookupAnonymous() to unexpectedly return true. Fix by calling Reset() on the memory store before the test runs. * style: run gofmt on changed files * fix: restore KMS functions used by integration tests * fix(plugin): prevent panic on send to closed worker session channel The Plugin.sendToWorker method could panic with "send on closed channel" when a worker disconnected while a message was being sent. 
The race was between streamSession.close() closing the outgoing channel and sendToWorker writing to it concurrently. Add a done channel to streamSession that is closed before the outgoing channel, and check it in sendToWorker's select to safely detect closed sessions without panicking.	2026-04-03 16:04:27 -07:00
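The done-channel technique can be shown in a reduced form. This sketch is a simplification and differs from the commit: the `session` type is hypothetical, and it never closes the outgoing channel at all, so the done channel is the only close signal and a send can never hit a closed channel.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errSessionClosed = errors.New("session closed")

// session is a hypothetical, reduced version of a worker stream session.
type session struct {
	outgoing  chan string
	done      chan struct{}
	closeOnce sync.Once
}

func newSession(buffer int) *session {
	return &session{outgoing: make(chan string, buffer), done: make(chan struct{})}
}

// close signals shutdown by closing done. It is safe to call more than once.
// Simplification: outgoing is left open and reclaimed by the garbage collector.
func (s *session) close() {
	s.closeOnce.Do(func() { close(s.done) })
}

// send delivers msg unless the session is closed. It returns an error instead
// of panicking when the worker has disconnected.
func (s *session) send(msg string) error {
	select {
	case <-s.done: // fast path: already closed
		return errSessionClosed
	default:
	}
	select {
	case <-s.done:
		return errSessionClosed
	case s.outgoing <- msg:
		return nil
	}
}

func main() {
	s := newSession(1)
	fmt.Println(s.send("hello")) // <nil>
	s.close()
	fmt.Println(s.send("late")) // session closed
}
```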
Chris Lu	8fad85aed7	feat(s3): support WEED_S3_SSE_KEY env var for SSE-S3 KEK (#8904 ) * feat(s3): support WEED_S3_SSE_KEY env var for SSE-S3 KEK Add support for providing the SSE-S3 Key Encryption Key (KEK) via the WEED_S3_SSE_KEY environment variable (hex-encoded 256-bit key). This avoids storing the master key in plaintext on the filer at /etc/s3/sse_kek. Key source priority: 1. WEED_S3_SSE_KEY environment variable (recommended) 2. Existing filer KEK at /etc/s3/sse_kek (backward compatible) 3. Auto-generate and save to filer (deprecated for new deployments) Existing deployments with a filer-stored KEK continue to work unchanged. A deprecation warning is logged when auto-generating a new filer KEK. * refactor(s3): derive KEK from any string via HKDF instead of requiring hex Accept any secret string in WEED_S3_SSE_KEY and derive a 256-bit key using HKDF-SHA256 instead of requiring a hex-encoded key. This is simpler for users — no need to generate hex, just set a passphrase. * feat(s3): add WEED_S3_SSE_KEK and WEED_S3_SSE_KEY env vars for KEK Two env vars for providing the SSE-S3 Key Encryption Key: - WEED_S3_SSE_KEK: hex-encoded, same format as /etc/s3/sse_kek. If the filer file also exists, they must match. - WEED_S3_SSE_KEY: any string, 256-bit key derived via HKDF-SHA256. Refuses to start if /etc/s3/sse_kek exists (must delete first). Only one may be set. Existing filer-stored KEKs continue to work. Auto-generating and storing new KEKs on filer is deprecated. * fix(s3): stop auto-generating KEK, fail only when SSE-S3 is used Instead of auto-generating a KEK and storing it on the filer when no key source is configured, simply leave SSE-S3 disabled. Encrypt and decrypt operations return a clear error directing the user to set WEED_S3_SSE_KEK or WEED_S3_SSE_KEY. * refactor(s3): move SSE-S3 KEK config to security.toml Move KEK configuration from standalone env vars to security.toml's new [sse_s3] section, following the same pattern as JWT keys and TLS certs. 
[sse_s3] kek = "" # hex-encoded 256-bit key (same format as /etc/s3/sse_kek) key = "" # any string, HKDF-derived Viper's WEED_ prefix auto-mapping provides env var support: WEED_SSE_S3_KEK and WEED_SSE_S3_KEY. All existing behavior is preserved: filer KEK fallback, mismatch detection, and HKDF derivation. * refactor(s3): rename SSE-S3 config keys to s3.sse.kek / s3.sse.key Use [s3.sse] section in security.toml, matching the existing naming convention (e.g. [s3.]). Env vars: WEED_S3_SSE_KEK, WEED_S3_SSE_KEY. fix(s3): address code review findings for SSE-S3 KEK - Don't hold mutex during filer retry loop (up to 20s of sleep). Lock only to write filerClient and superKey. - Remove dead generateAndSaveSuperKeyToFiler and unused constants. - Return error from deriveKeyFromSecret instead of ignoring it. - Fix outdated doc comment on InitializeWithFiler. - Use t.Setenv in tests instead of manual os.Setenv/Unsetenv. * fix(s3): don't block startup on filer errors when KEK is configured - When s3.sse.kek is set, a temporarily unreachable filer no longer prevents startup. The filer consistency check becomes best-effort with a warning. - Same treatment for s3.sse.key: filer unreachable logs a warning instead of failing. - Rewrite error messages to suggest migration instead of file deletion, avoiding the risk of orphaning encrypted data. Finding 3 (restore auto-generation) intentionally skipped — auto-gen was removed by design to avoid storing plaintext KEK on filer. * fix(test): set WEED_S3_SSE_KEY in SSE integration test server startup SSE-S3 no longer auto-generates a KEK, so integration tests must provide one. Set WEED_S3_SSE_KEY=test-sse-s3-key in all weed mini invocations in the test Makefile.	2026-04-03 13:01:21 -07:00
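Deriving a 256-bit key from an arbitrary secret with HKDF-SHA256 can be sketched using only the standard library. The commit message does not state the salt or info values, so the empty salt and the `"sse-s3-kek-example"` info string below are assumptions for illustration, and `deriveKEK` is a hypothetical name rather than the real `deriveKeyFromSecret`. Keys derived this way will not match keys derived by SeaweedFS.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"fmt"
)

// hkdfSHA256 implements RFC 5869 extract-and-expand for a single 32-byte
// output block, which is all a 256-bit key needs.
func hkdfSHA256(secret, salt, info []byte) [32]byte {
	if len(salt) == 0 {
		salt = make([]byte, sha256.Size) // RFC 5869 default: HashLen zero bytes
	}
	// Extract: PRK = HMAC(salt, secret)
	extract := hmac.New(sha256.New, salt)
	extract.Write(secret)
	prk := extract.Sum(nil)

	// Expand: T(1) = HMAC(PRK, info || 0x01)
	expand := hmac.New(sha256.New, prk)
	expand.Write(info)
	expand.Write([]byte{0x01})

	var key [32]byte
	copy(key[:], expand.Sum(nil))
	return key
}

// deriveKEK (hypothetical) turns any passphrase into a 256-bit key.
// Assumption: salt and info are illustrative, not the values SeaweedFS uses.
func deriveKEK(secret string) [32]byte {
	return hkdfSHA256([]byte(secret), nil, []byte("sse-s3-kek-example"))
}

func main() {
	key := deriveKEK("test-sse-s3-key")
	fmt.Printf("%d bytes: %x\n", len(key), key)
}
```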
Chris Lu	2e98902f29	fix(s3): use URL-safe secret keys for dashboard users and service accounts (#8902) * fix(s3): use URL-safe secret keys for admin dashboard users and service accounts The dashboard's generateSecretKey() used base64.StdEncoding which produces +, /, and = characters that break S3 signature authentication. Reuse the IAM package's GenerateSecretAccessKey() which was already fixed in #7990. Fixes #8898 * fix: handle error from GenerateSecretAccessKey instead of ignoring it	2026-04-03 11:20:28 -07:00
Jaehoon Kim	d3cea714d0	fix(filer.backup): local sink readonly permission (#8907)	2026-04-03 05:36:56 -07:00
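The encoding problem is visible with two bytes of input. The sketch below uses `base64.RawURLEncoding` as one way to get a URL-safe key; whether `GenerateSecretAccessKey()` uses this exact encoding is not stated in the commit, and `generateURLSafeSecret` is a hypothetical name.

```go
package main

import (
	"crypto/rand"
	"encoding/base64"
	"fmt"
	"strings"
)

// generateURLSafeSecret (hypothetical) returns a random key that contains
// none of the characters +, /, or =.
func generateURLSafeSecret(nBytes int) (string, error) {
	buf := make([]byte, nBytes)
	if _, err := rand.Read(buf); err != nil {
		return "", err // propagate the error instead of ignoring it
	}
	return base64.RawURLEncoding.EncodeToString(buf), nil
}

// isURLSafe reports whether s is free of the problematic characters.
func isURLSafe(s string) bool {
	return !strings.ContainsAny(s, "+/=")
}

func main() {
	sample := []byte{0xfb, 0xff}
	fmt.Println(base64.StdEncoding.EncodeToString(sample))    // +/8=
	fmt.Println(base64.RawURLEncoding.EncodeToString(sample)) // -_8

	secret, err := generateURLSafeSecret(30)
	fmt.Println(len(secret), isURLSafe(secret), err) // 40 true <nil>
}
```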
dependabot[bot]	91087c0737	build(deps): bump github.com/go-jose/go-jose/v4 from 4.1.3 to 4.1.4 in /test/kafka (#8899 ) build(deps): bump github.com/go-jose/go-jose/v4 in /test/kafka Bumps [github.com/go-jose/go-jose/v4](https://github.com/go-jose/go-jose) from 4.1.3 to 4.1.4. - [Release notes](https://github.com/go-jose/go-jose/releases) - [Commits](https://github.com/go-jose/go-jose/compare/v4.1.3...v4.1.4) --- updated-dependencies: - dependency-name: github.com/go-jose/go-jose/v4 dependency-version: 4.1.4 dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-04-03 00:13:07 -07:00
dependabot[bot]	d2d21cd26b	build(deps): bump github.com/go-jose/go-jose/v4 from 4.1.3 to 4.1.4 (#8900 ) Bumps [github.com/go-jose/go-jose/v4](https://github.com/go-jose/go-jose) from 4.1.3 to 4.1.4. - [Release notes](https://github.com/go-jose/go-jose/releases) - [Commits](https://github.com/go-jose/go-jose/compare/v4.1.3...v4.1.4) --- updated-dependencies: - dependency-name: github.com/go-jose/go-jose/v4 dependency-version: 4.1.4 dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-04-03 00:13:01 -07:00
Chris Lu	0503311ded	Merge branch 'master' of https://github.com/seaweedfs/seaweedfs	2026-04-02 18:39:28 -07:00
Chris Lu	bb23939b36	fix(volume-rust): resolve gRPC bind address from hostname SocketAddr::parse() only accepts numeric IPs, so binding the gRPC server to "localhost:18833" panicked. Use tokio::net::lookup_host() to resolve hostnames before passing to tonic's serve_with_shutdown.	2026-04-02 18:36:45 -07:00
Chris Lu	059bee683f	feat(s3): add STS GetFederationToken support (#8891)
* feat(s3): add STS GetFederationToken support
  Implement the AWS STS GetFederationToken API, which allows long-term IAM users to obtain temporary credentials scoped down by an optional inline session policy. This is useful for server-side applications that mint per-user temporary credentials.
  Key behaviors:
  - Requires SigV4 authentication from a long-term IAM user
  - Rejects calls from temporary credentials (session tokens)
  - Name parameter (2-64 chars) identifies the federated user
  - DurationSeconds supports 900-129600 (15 min to 36 hours, default 12h)
  - Optional inline session policy for permission scoping
  - Caller's attached policies are embedded in the JWT token
  - Returns federated user ARN: arn:aws:sts::<account>:federated-user/<Name>
  No performance impact on the S3 hot path: credential vending is a separate control-plane operation, and all policy data is embedded in the stateless JWT token.
* fix(s3): address GetFederationToken PR review feedback
  - Fix Name validation: max 32 chars (not 64) per AWS spec, add regex validation for [\w+=,.@-]+ character whitelist
  - Refactor parseDurationSeconds into parseDurationSecondsWithBounds to eliminate duplicated duration parsing logic
  - Add sts:GetFederationToken permission check via VerifyActionPermission mirroring the AssumeRole authorization pattern
  - Change GetPoliciesForUser to return ([]string, error) so callers fail closed on policy-resolution failures instead of silently returning nil
  - Move temporary-credentials rejection before SigV4 verification for early rejection and proper test coverage
  - Update tests: verify specific error message for temp cred rejection, add regex validation test cases (spaces, slashes rejected)
* refactor(s3): use sts.Action* constants instead of hard-coded strings
  Replace hard-coded "sts:AssumeRole" and "sts:GetFederationToken" strings in VerifyActionPermission calls with sts.ActionAssumeRole and sts.ActionGetFederationToken package constants.
* fix(s3): pass through sts: prefix in action resolver and merge policies
  Two fixes:
  1. mapBaseActionToS3Format now passes through "sts:" prefix alongside "s3:" and "iam:", preventing sts:GetFederationToken from being rewritten to s3:sts:GetFederationToken in VerifyActionPermission. This also fixes the existing sts:AssumeRole permission checks.
  2. GetFederationToken policy embedding now merges identity.PolicyNames (from SigV4 identity) with policies from the IAM manager (which may include group-attached policies), deduplicated via a map. Previously the IAM manager lookup was skipped when identity.PolicyNames was non-empty, causing group policies to be omitted from the token.
* test(s3): add integration tests for sts: action passthrough and policy merge
  Action resolver tests:
  - TestMapBaseActionToS3Format_ServicePrefixPassthrough: verifies s3:, iam:, and sts: prefixed actions pass through unchanged while coarse actions (Read, Write) are mapped to S3 format
  - TestResolveS3Action_STSActionsPassthrough: verifies sts:AssumeRole, sts:GetFederationToken, sts:GetCallerIdentity pass through ResolveS3Action unchanged with both nil and real HTTP requests
  Policy merge tests:
  - TestGetFederationToken_GetPoliciesForUser: tests IAMManager.GetPoliciesForUser with no user store (error), missing user, user with policies, user without
  - TestGetFederationToken_PolicyMergeAndDedup: tests that identity.PolicyNames and IAM-manager-resolved policies are merged and deduplicated (SharedPolicy appears in both sources, result has 3 unique policies)
  - TestGetFederationToken_PolicyMergeNoManager: tests that when IAM manager is unavailable, identity.PolicyNames alone are embedded
* test(s3): add end-to-end integration tests for GetFederationToken
  Add integration tests that call GetFederationToken using real AWS SigV4 signed HTTP requests against a running SeaweedFS instance, following the existing pattern in test/s3/iam/s3_sts_assume_role_test.go.
  Tests:
  - TestSTSGetFederationTokenValidation: missing name, name too short/long, invalid characters, duration too short/long, malformed policy, anonymous rejection (7 subtests)
  - TestSTSGetFederationTokenRejectTemporaryCredentials: obtains temp creds via AssumeRole then verifies GetFederationToken rejects them
  - TestSTSGetFederationTokenSuccess: basic success, custom 1h duration, 36h max duration with expiration time verification
  - TestSTSGetFederationTokenWithSessionPolicy: creates a bucket, obtains federated creds with GetObject-only session policy, verifies GetObject succeeds and PutObject is denied using the AWS SDK S3 client
	2026-04-02 17:37:05 -07:00
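The validation and policy-merge rules described in commit 059bee683f can be sketched roughly as follows. This is a minimal illustration, not the SeaweedFS code: the function names (validateName, parseDuration, mergePolicies) and constants are hypothetical, the bounds are the ones stated in the commit message after the review fix (Name 2-32 chars matching [\w+=,.@-]+, DurationSeconds 900-129600 with a 12h default), and sorting the merged policies is an addition made here only to keep the output deterministic.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strconv"
)

// Bounds as stated in the commit message; identifiers are hypothetical.
const (
	minNameLen      = 2
	maxNameLen      = 32
	minDuration     = 900    // 15 minutes
	maxDuration     = 129600 // 36 hours
	defaultDuration = 43200  // 12 hours
)

// Character whitelist quoted in the commit message, anchored at both ends.
var nameRe = regexp.MustCompile(`^[\w+=,.@-]+$`)

// validateName checks length and character set. Go's \w is ASCII-only, so
// for any name that passes the regex, byte length equals character count.
func validateName(name string) error {
	if len(name) < minNameLen || len(name) > maxNameLen {
		return fmt.Errorf("name must be %d-%d characters", minNameLen, maxNameLen)
	}
	if !nameRe.MatchString(name) {
		return fmt.Errorf("name %q contains disallowed characters", name)
	}
	return nil
}

// parseDuration applies the default when the parameter is absent and
// rejects values outside the allowed range.
func parseDuration(raw string) (int, error) {
	if raw == "" {
		return defaultDuration, nil
	}
	n, err := strconv.Atoi(raw)
	if err != nil {
		return 0, fmt.Errorf("invalid DurationSeconds %q: %w", raw, err)
	}
	if n < minDuration || n > maxDuration {
		return 0, fmt.Errorf("DurationSeconds must be %d-%d", minDuration, maxDuration)
	}
	return n, nil
}

// mergePolicies unions both sources and deduplicates via a map.
// The final sort is an assumption added for stable output.
func mergePolicies(identityPolicies, managerPolicies []string) []string {
	seen := make(map[string]struct{})
	for _, p := range identityPolicies {
		seen[p] = struct{}{}
	}
	for _, p := range managerPolicies {
		seen[p] = struct{}{}
	}
	merged := make([]string, 0, len(seen))
	for p := range seen {
		merged = append(merged, p)
	}
	sort.Strings(merged)
	return merged
}

func main() {
	fmt.Println(validateName("app-user@example.com"))
	fmt.Println(validateName("has space"))
	fmt.Println(parseDuration(""))
	fmt.Println(mergePolicies(
		[]string{"ReadOnly", "SharedPolicy"},
		[]string{"SharedPolicy", "GroupPolicy"},
	))
}
```

Merging unconditionally, rather than consulting the second source only when the first is empty, is what keeps group-attached policies from being dropped, which is the behavior the commit message describes fixing.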
Chris Lu	b8236a10d1	perf(docker): pre-build Rust binaries to avoid 5-hour QEMU emulation Cross-compile Rust volume server natively for amd64/arm64 using musl targets in a separate job, then inject pre-built binaries into the Docker build. This replaces the ~5-hour QEMU-emulated cargo build with ~15 minutes of native cross-compilation. The Dockerfile falls back to building from source when no pre-built binary is found, preserving local build compatibility.	2026-04-02 16:57:28 -07:00
Chris Lu	a4b896a224	fix(s3): skip directories before marker in ListObjectVersions pagination (#8890)
* fix(s3): skip directories before marker in ListObjectVersions pagination
  ListObjectVersions was re-traversing the entire directory tree from the beginning on every paginated request, only skipping entries at the leaf level. For buckets with millions of objects in deep hierarchies, this caused exponentially slower responses as pagination progressed.
  Two optimizations:
  1. Use keyMarker to compute a startFrom position at each directory level, skipping directly to the relevant entry instead of scanning from the beginning (mirroring how ListObjects uses marker descent).
  2. Skip recursing into subdirectories whose keys are entirely before the keyMarker.
  Changes per-page cost from O(entries_before_marker) to O(tree_depth).
* test(s3): add integration test for deep-hierarchy version listing pagination
  Adds TestVersioningPaginationDeepDirectoryHierarchy which creates objects across 20 subdirectories at depth 6 (mimicking Veeam 365 backup layout) and paginates through them with small maxKeys. Verifies correctness (no duplicates, sorted order, all objects found) and checks that later pages don't take dramatically longer than earlier ones, the symptom of the pre-fix re-traversal bug. Also tests delimiter+pagination interaction across subdirectories.
* test(s3): strengthen deep-hierarchy pagination assertions
  - Replace timing warning (t.Logf) with a failing assertion (t.Errorf) so pagination regressions actually fail the test.
  - Replace generic count/uniqueness/sort checks on CommonPrefixes with exact equality against the expected prefix slice, catching wrong-but-sorted results.
* test(s3): use allKeys for exact assertion in deep-hierarchy pagination test
  Wire the allKeys slice (previously unused dead code) into the version listing assertion, replacing generic count/uniqueness/sort checks with an exact equality comparison against the keys that were created.
	2026-04-02 15:59:52 -07:00
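The marker-descent idea in commit a4b896a224 can be illustrated with a small in-memory sketch. Everything here is hypothetical (the entry type, walk, page, and the sample tree); the real code works against filer directory listings, which this does not model. The sketch assumes each directory's children are kept sorted by name, splits the marker into one path component per level, and uses a binary search to jump to the start position instead of scanning from the first entry.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// entry is a hypothetical stand-in for a directory entry.
type entry struct {
	name     string
	isDir    bool
	children []*entry // kept sorted by name
}

func file(name string) *entry { return &entry{name: name} }

func dir(name string, children ...*entry) *entry {
	sort.Slice(children, func(i, j int) bool { return children[i].name < children[j].name })
	return &entry{name: name, isDir: true, children: children}
}

// walk appends keys that come after marker (exclusive) in traversal order,
// stopping once out holds limit keys. marker is relative to directory d.
func walk(d *entry, prefix, marker string, limit int, out *[]string) {
	// Split the marker into this level's component and the remainder.
	first, rest := marker, ""
	if i := strings.IndexByte(marker, '/'); i >= 0 {
		first, rest = marker[:i], marker[i+1:]
	}
	// Start position: the first child whose name is not before the marker
	// component. Children before it (files and whole subtrees) are skipped.
	start := sort.Search(len(d.children), func(i int) bool {
		return d.children[i].name >= first
	})
	for _, c := range d.children[start:] {
		if len(*out) >= limit {
			return
		}
		childMarker := ""
		if c.name == first {
			if !c.isDir {
				continue // the marker key itself belongs to the previous page
			}
			childMarker = rest // only this subtree still needs the marker
		}
		if c.isDir {
			walk(c, prefix+c.name+"/", childMarker, limit, out)
		} else {
			*out = append(*out, prefix+c.name)
		}
	}
}

// page returns up to limit keys that follow marker.
func page(root *entry, marker string, limit int) []string {
	var out []string
	walk(root, "", marker, limit, &out)
	return out
}

func main() {
	root := dir("",
		dir("a", file("1"), file("2")),
		dir("b", dir("c", file("x"), file("y")), file("z")),
		file("d"),
	)
	marker := ""
	for {
		keys := page(root, marker, 2)
		if len(keys) == 0 {
			break
		}
		fmt.Println(keys)
		marker = keys[len(keys)-1]
	}
}
```

Because only the subtree on the marker's own path receives a non-empty marker, each page performs one lookup per directory level before it starts emitting keys, rather than revisiting every entry returned on earlier pages.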


13384 Commits