fix(s3): honor ChecksumAlgorithm on presigned URL uploads (#9076)

* fix(s3): honor ChecksumAlgorithm on presigned URL uploads

AWS SDK presigners hoist x-amz-sdk-checksum-algorithm (and related
checksum headers) into the signed URL's query string, so servers must
read either location. detectRequestedChecksumAlgorithm only looked at
request headers, so presigned PUTs with ChecksumAlgorithm set validated
and stored no additional checksum, and HEAD/GET never returned the
x-amz-checksum-* header.

Read these parameters from headers first, then fall back to a
case-insensitive query-string lookup. Apply the same fallback when
comparing an object-level checksum value against the computed one.

Fixes #9075

* test(s3): presigned URL checksum integration tests (#9075)

Adds test/s3/checksum with end-to-end coverage for flexible-checksum
behavior on presigned URL uploads. Tests generate a presigned PUT URL
with ChecksumAlgorithm set, upload the body with a plain http.Client
(bypassing AWS SDK middleware so the server must honor the query-string
hoisted x-amz-sdk-checksum-algorithm), then HEAD/GET with
ChecksumMode=ENABLED and assert the stored x-amz-checksum-* header.

Covers SHA256, SHA1, and a negative control with no checksum requested.
Wires the new directory into s3-go-tests.yml as its own CI job.

* perf(s3): parse presigned query once in detectRequestedChecksumAlgorithm

Previously, each header fallback called getHeaderOrQuery, which re-parsed
r.URL.Query() and allocated a new map on every invocation — up to eight
times per PutObject request. Parse the raw query at most once per request
(only when non-empty) and pass the pre-parsed url.Values into a new
lookupHeaderOrQuery helper.

Also drops a redundant strings.ToLower allocation in the case-insensitive
query key scan (strings.EqualFold already handles ASCII case folding).

Addresses review feedback from gemini-code-assist on PR #9076.

* test(s3): honor credential env vars and add presigned upload timeout

- init() now reads S3_ACCESS_KEY/S3_SECRET_KEY (and AWS_ACCESS_KEY_ID /
  AWS_SECRET_ACCESS_KEY / AWS_REGION fallbacks) so that
  `make test-with-server ACCESS_KEY=... SECRET_KEY=...` no longer
  authenticates with hardcoded defaults while the server has been
  started with different credentials.
- uploadViaPresignedURL uses a dedicated http.Client with a 30s timeout
  instead of http.DefaultClient, so a stalled server fails fast in CI
  instead of blocking until the suite's global timeout fires.

Addresses review feedback from coderabbitai on PR #9076.

* test(s3): pass S3_PORT and credentials through to checksum tests

- 'make test' now exports S3_ENDPOINT, S3_ACCESS_KEY, and S3_SECRET_KEY
  derived from the Makefile variables so the Go test process talks to
  the same endpoint/credentials that start-server was launched with.
- start-server cleans up the background SeaweedFS process and PID file
  when the readiness poll times out, preventing stale port conflicts on
  subsequent runs.

Addresses review feedback from coderabbitai on PR #9076.

* ci(s3): raise checksum tests step timeout

make test-with-server builds weed_binary, waits up to 90s for readiness,
then runs go test -timeout=10m. The previous 12-minute step timeout only
had ~2 minutes of headroom over the Go timeout, risking the Actions
runner killing the step before tests reported a real failure.

Bumps the job timeout from 15 to 20 minutes and the step timeout from
12 to 16 minutes, matching other S3 integration jobs.

Addresses review feedback from coderabbitai on PR #9076.

* perf(s3): thread pre-parsed query through putToFiler hot path

Parse the request's query string once at the top of putToFiler and
reuse the resulting url.Values for both the checksum-algorithm detection
and the expected-checksum verification. Previously, the verification
path called getHeaderOrQuery which re-parsed r.URL.Query() again on
every PutObject, defeating the previous commit's single-parse goal.

- Add parseRequestQuery + detectRequestedChecksumAlgorithmQ (the
  pre-parsed-query variant). detectRequestedChecksumAlgorithm is now a
  thin wrapper used by callers that do a single lookup.
- putToFiler parses once and threads the result through both call sites.
- Remove getHeaderOrQuery and update the unit test to use
  lookupHeaderOrQuery directly.

Addresses follow-up review from gemini-code-assist on PR #9076.

* test(s3): check io.ReadAll error in uploadViaPresignedURL helper

* test(s3): drop SHA1 presigned test case

The AWS SDK v2 presigner signs a Content-MD5 header at presign time
for SHA1 PutObject requests even when no body is attached (the MD5 of
the empty payload gets baked into the signed headers). Uploading the
real body via a plain http.Client then trips SeaweedFS's MD5 validation
and returns BadDigest — an SDK/presigner quirk, not a SeaweedFS bug.

The SHA256 positive case already exercises the server-side
query-hoisted algorithm path that issue #9075 is about, and the unit
tests in weed/s3api cover each algorithm's header mapping. Drop the
SHA1 integration case rather than chase SDK-specific workarounds.

* test(s3): provide real Content-MD5 to presigned checksum test

AWS SDK v2's flexible-checksum middleware signs a Content-MD5 header at
presign time. There is no body to hash at that point, so it seeds the
header with MD5 of the empty payload. When the real body is then PUT
with a plain http.Client, SeaweedFS's server-side Content-MD5
verification correctly rejects the upload with BadDigest.

Pre-compute the MD5 of the test body and thread it into
PutObjectInput.ContentMD5 so the signed Content-MD5 matches the body
that will actually be uploaded. The test still exercises the
server-side path that reads X-Amz-Sdk-Checksum-Algorithm from the
query string (the fix that PR #9076 is validating).

* test(s3): send the signed Content-MD5 header on presigned upload

uploadViaPresignedURL now accepts an extraHeaders map so callers can
thread through headers that the presigner signed but the raw http
request would otherwise omit. The SHA256 test passes the Content-MD5
it computed, matching what the presigner baked into the signature.

Fixes SignatureDoesNotMatch seen in CI after the previous commit set
ContentMD5 on the presign input without sending the corresponding
header on the actual upload.

* test(s3): build presigned URL with the raw v4 signer

The AWS SDK v2 s3.PresignClient runs the flexible-checksum middleware
for any PutObject input that carries ChecksumAlgorithm. That middleware
injects a Content-MD5 header at presign time, and with no body present
it seeds MD5-of-empty. Any subsequent upload of a non-empty body
through a plain http.Client then trips SeaweedFS's Content-MD5
verification and returns BadDigest — not the code path that issue
#9075 is about.

Replace the PresignClient usage in the integration test with a direct
call to v4.Signer.PresignHTTP, building a canonical URL whose query
string already contains x-amz-sdk-checksum-algorithm=SHA256. This is
exactly the shape of URL a browser/curl client would receive from any
presigner that hoists the algorithm header, and it exercises the
server-side fix from PR #9076 without dragging in SDK-specific
middleware quirks.

* test(s3): set X-Amz-Expires on presigned URL before signing

v4.Signer.PresignHTTP does not add X-Amz-Expires on its own — the
caller has to seed it into the request's query string so the signer
includes it in the canonical query and the server accepts the
presigned URL. Without it, SeaweedFS correctly returns
AuthorizationQueryParametersError.

Also adds a .gitignore for the make-managed test volume data, log
file, and PID file so local `make test-with-server` runs do not leave
artifacts tracked by git.

Verified by running the integration tests locally:
  make test-with-server → both presigned checksum tests PASS.
This commit is contained in:
Chris Lu
2026-04-14 21:52:49 -07:00
committed by GitHub
parent 08d9193fe1
commit d0a09ea178
7 changed files with 592 additions and 11 deletions
+1 -1
View File
@@ -1 +1 @@
{"sessionId":"d6574c47-eafc-4a94-9dce-f9ffea22b53c","pid":10111,"acquiredAt":1775248373916}
{"sessionId":"81536282-6277-4e32-9922-37c6e3a01b1f","pid":84648,"acquiredAt":1776208413235}
+55
View File
@@ -245,6 +245,61 @@ jobs:
path: test/s3/retention/weed-test*.log
retention-days: 3
s3-checksum-tests:
name: S3 Checksum Tests
runs-on: ubuntu-22.04
timeout-minutes: 20
steps:
- name: Check out code
uses: actions/checkout@v6
- name: Set up Go
uses: actions/setup-go@v6
with:
go-version-file: 'go.mod'
id: go
- name: Install SeaweedFS
run: |
go install -buildvcs=false
- name: Run S3 Checksum Tests
timeout-minutes: 16
working-directory: test/s3/checksum
run: |
set -x
echo "=== System Information ==="
uname -a
free -h
df -h
echo "=== Starting Tests ==="
make test-with-server
- name: Show server logs on failure
if: failure()
working-directory: test/s3/checksum
run: |
echo "=== Server Logs ==="
if [ -f weed-test.log ]; then
echo "Last 100 lines of server logs:"
tail -100 weed-test.log
else
echo "No server log file found"
fi
echo "=== Test Environment ==="
ps aux | grep -E "(weed|test)" || true
netstat -tlnp | grep -E "(8333|9333|8080)" || true
- name: Upload test logs on failure
if: failure()
uses: actions/upload-artifact@v7
with:
name: s3-checksum-test-logs
path: test/s3/checksum/weed-test*.log
retention-days: 3
s3-cors-tests:
name: S3 CORS Tests
runs-on: ubuntu-22.04
+3
View File
@@ -0,0 +1,3 @@
test-volume-data/
weed-test.log
weed-server.pid
+100
View File
@@ -0,0 +1,100 @@
# S3 Checksum Integration Tests
# Covers flexible-checksum behavior on presigned URL uploads (issue #9075).
.PHONY: help build-weed check-deps start-server stop-server test test-with-server logs clean health-check
WEED_BINARY := ../../../weed/weed_binary
S3_PORT := 8333
ACCESS_KEY ?= some_access_key1
SECRET_KEY ?= some_secret_key1
TEST_TIMEOUT := 10m
TEST_PATTERN ?= TestPresignedPut
SERVER_DIR := ./test-volume-data/server-data
help:
@echo "S3 Checksum Integration Tests"
@echo ""
@echo "Targets:"
@echo " build-weed Build the SeaweedFS binary"
@echo " start-server Start a local SeaweedFS S3 server for testing"
@echo " stop-server Stop the local SeaweedFS server"
@echo " test Run checksum tests (server must already be running)"
@echo " test-with-server Start server, run tests, stop server"
@echo " logs Tail server log"
@echo " clean Remove test artifacts"
build-weed:
@echo "Building SeaweedFS binary..."
@cd ../../../weed && go build -o weed_binary .
@chmod +x $(WEED_BINARY)
check-deps: build-weed
@command -v go >/dev/null 2>&1 || (echo "Go is required but not installed" && exit 1)
@test -f $(WEED_BINARY) || (echo "SeaweedFS binary not found at $(WEED_BINARY)" && exit 1)
@go list -m github.com/aws/aws-sdk-go-v2 >/dev/null 2>&1 || (echo "AWS SDK Go v2 not found. Run 'go mod tidy'." && exit 1)
@go list -m github.com/stretchr/testify >/dev/null 2>&1 || (echo "Testify not found. Run 'go mod tidy'." && exit 1)
start-server: check-deps
@echo "Starting SeaweedFS server..."
@rm -f weed-server.pid
@mkdir -p $(SERVER_DIR)
@AWS_ACCESS_KEY_ID=$(ACCESS_KEY) AWS_SECRET_ACCESS_KEY=$(SECRET_KEY) $(WEED_BINARY) mini \
-dir=$(SERVER_DIR) \
-s3.port=$(S3_PORT) \
> weed-test.log 2>&1 & \
echo $$! > weed-server.pid
@for i in $$(seq 1 90); do \
if curl -s http://localhost:$(S3_PORT) >/dev/null 2>&1; then \
echo "✅ SeaweedFS server started on port $(S3_PORT) after $$i seconds"; \
exit 0; \
fi; \
sleep 1; \
done; \
echo "❌ Server failed to start within 90 seconds"; \
if [ -f weed-server.pid ]; then \
PID=$$(cat weed-server.pid); \
if ps -p $$PID >/dev/null 2>&1; then \
kill -TERM $$PID 2>/dev/null || true; \
sleep 1; \
ps -p $$PID >/dev/null 2>&1 && kill -KILL $$PID 2>/dev/null || true; \
fi; \
rm -f weed-server.pid; \
fi; \
if [ -f weed-test.log ]; then tail -100 weed-test.log; fi; \
exit 1
stop-server:
@if [ -f weed-server.pid ]; then \
PID=$$(cat weed-server.pid); \
if ps -p $$PID >/dev/null 2>&1; then \
kill -TERM $$PID 2>/dev/null || true; \
sleep 2; \
ps -p $$PID >/dev/null 2>&1 && kill -KILL $$PID 2>/dev/null || true; \
fi; \
rm -f weed-server.pid; \
fi
@echo "✅ SeaweedFS server stopped"
test: check-deps
@echo "Running checksum tests (pattern: $(TEST_PATTERN))..."
@S3_ENDPOINT=http://localhost:$(S3_PORT) \
S3_ACCESS_KEY=$(ACCESS_KEY) \
S3_SECRET_KEY=$(SECRET_KEY) \
go test -v -timeout=$(TEST_TIMEOUT) -run "$(TEST_PATTERN)" .
test-with-server: start-server
@sleep 3
@$(MAKE) test || (echo "Tests failed, stopping server..." && $(MAKE) stop-server && exit 1)
@$(MAKE) stop-server
@echo "✅ Checksum tests completed"
logs:
@if [ -f weed-test.log ]; then tail -f weed-test.log; else echo "No log file"; fi
health-check:
@curl -s http://localhost:$(S3_PORT) >/dev/null 2>&1 && echo "✅ S3 on $(S3_PORT)" || (echo "❌ S3 not up" && exit 1)
clean:
@$(MAKE) stop-server
@rm -f weed-test.log weed-server.pid
@rm -rf ./test-volume-data
@@ -0,0 +1,256 @@
package checksum_test
import (
"bytes"
"context"
"crypto/sha256"
"encoding/base64"
"fmt"
"io"
"net/http"
"net/url"
"os"
"strings"
"testing"
"time"
"github.com/aws/aws-sdk-go-v2/aws"
v4 "github.com/aws/aws-sdk-go-v2/aws/signer/v4"
"github.com/aws/aws-sdk-go-v2/config"
"github.com/aws/aws-sdk-go-v2/credentials"
"github.com/aws/aws-sdk-go-v2/service/s3"
"github.com/aws/aws-sdk-go-v2/service/s3/types"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)
// Integration test for https://github.com/seaweedfs/seaweedfs/issues/9075:
// presigned PUT URLs that request a flexible-checksum algorithm (SHA256, SHA1, ...)
// must cause the server to compute and persist that checksum, and HEAD/GET must
// return the x-amz-checksum-* header when the caller asks for it.
//
// AWS SDK presigners hoist headers like x-amz-sdk-checksum-algorithm into the
// signed URL's query string, so we deliberately upload the body with a plain
// http.Client to ensure the server sees the checksum algorithm via the query
// string — which is exactly what a browser / curl / non-SDK client does.
type s3TestConfig struct {
Endpoint string
AccessKey string
SecretKey string
Region string
BucketPrefix string
}
var defaultConfig = &s3TestConfig{
Endpoint: "http://localhost:8333",
AccessKey: "some_access_key1",
SecretKey: "some_secret_key1",
Region: "us-east-1",
BucketPrefix: "test-checksum-",
}
func getenvAny(keys ...string) string {
for _, k := range keys {
if v := os.Getenv(k); v != "" {
return v
}
}
return ""
}
func init() {
if v := getenvAny("S3_ENDPOINT"); v != "" {
defaultConfig.Endpoint = v
}
if v := getenvAny("S3_ACCESS_KEY", "AWS_ACCESS_KEY_ID"); v != "" {
defaultConfig.AccessKey = v
}
if v := getenvAny("S3_SECRET_KEY", "AWS_SECRET_ACCESS_KEY"); v != "" {
defaultConfig.SecretKey = v
}
if v := getenvAny("AWS_REGION", "AWS_DEFAULT_REGION"); v != "" {
defaultConfig.Region = v
}
}
// presignedHTTPClient is used for non-SDK PUTs to a presigned URL. A fixed
// timeout keeps tests from hanging forever if the server stalls.
var presignedHTTPClient = &http.Client{Timeout: 30 * time.Second}
func getS3Client(t *testing.T) *s3.Client {
cfg, err := config.LoadDefaultConfig(context.TODO(),
config.WithRegion(defaultConfig.Region),
config.WithCredentialsProvider(credentials.NewStaticCredentialsProvider(
defaultConfig.AccessKey, defaultConfig.SecretKey, "")),
config.WithEndpointResolverWithOptions(aws.EndpointResolverWithOptionsFunc(
func(service, region string, _ ...interface{}) (aws.Endpoint, error) {
return aws.Endpoint{
URL: defaultConfig.Endpoint,
SigningRegion: defaultConfig.Region,
HostnameImmutable: true,
}, nil
})),
)
require.NoError(t, err)
return s3.NewFromConfig(cfg, func(o *s3.Options) { o.UsePathStyle = true })
}
func uniqueBucket() string {
return fmt.Sprintf("%s%d", defaultConfig.BucketPrefix, time.Now().UnixNano())
}
func createBucket(t *testing.T, client *s3.Client, name string) {
_, err := client.CreateBucket(context.TODO(), &s3.CreateBucketInput{Bucket: aws.String(name)})
require.NoError(t, err)
}
func cleanupBucket(t *testing.T, client *s3.Client, name string) {
t.Helper()
objs, err := client.ListObjectsV2(context.TODO(), &s3.ListObjectsV2Input{Bucket: aws.String(name)})
if err == nil {
for _, o := range objs.Contents {
_, _ = client.DeleteObject(context.TODO(), &s3.DeleteObjectInput{
Bucket: aws.String(name), Key: o.Key,
})
}
}
_, _ = client.DeleteBucket(context.TODO(), &s3.DeleteBucketInput{Bucket: aws.String(name)})
}
// uploadViaPresignedURL PUTs body to the given presigned URL using a plain
// http.Client, mirroring what a browser / curl / non-SDK client would do.
// Crucially this path does NOT run any AWS SDK middleware, so the server must
// compute and store the checksum based on the algorithm parameter that was
// hoisted into the query string by the presigner.
//
// extraHeaders are additional HTTP headers the presigner signed (e.g.
// Content-MD5). Their values must exactly match what was signed or SigV4
// verification will return SignatureDoesNotMatch.
func uploadViaPresignedURL(t *testing.T, url string, body []byte, extraHeaders map[string]string) {
t.Helper()
req, err := http.NewRequest(http.MethodPut, url, bytes.NewReader(body))
require.NoError(t, err)
req.ContentLength = int64(len(body))
for k, v := range extraHeaders {
req.Header.Set(k, v)
}
resp, err := presignedHTTPClient.Do(req)
require.NoError(t, err)
defer resp.Body.Close()
respBody, err := io.ReadAll(resp.Body)
require.NoError(t, err, "read presigned PUT response body")
require.Equalf(t, http.StatusOK, resp.StatusCode,
"presigned PUT failed: %d %s", resp.StatusCode, string(respBody))
}
// presignPutURL builds a presigned S3 PutObject URL using the low-level
// SigV4 presigner, with an optional x-amz-sdk-checksum-algorithm query
// parameter baked into the signed canonical query string.
//
// We use the raw signer instead of s3.PresignClient because the SDK's
// flexible-checksum middleware tries to inject a Content-MD5 header for
// flexible-checksum PutObject calls, and at presign time (no body) it seeds
// MD5-of-empty, which then mismatches any real body uploaded through a plain
// http.Client. The raw signer has no such middleware and produces exactly
// the kind of URL a browser, curl caller, or custom client would receive.
func presignPutURL(t *testing.T, bucket, key, checksumAlgorithm string) string {
t.Helper()
putURL, err := url.Parse(fmt.Sprintf("%s/%s/%s", strings.TrimRight(defaultConfig.Endpoint, "/"), bucket, key))
require.NoError(t, err)
q := putURL.Query()
// PresignHTTP does not add X-Amz-Expires on its own; the caller must
// seed it so the signer picks it up into the canonical query string.
q.Set("X-Amz-Expires", "600")
if checksumAlgorithm != "" {
q.Set("x-amz-sdk-checksum-algorithm", checksumAlgorithm)
}
putURL.RawQuery = q.Encode()
req, err := http.NewRequest(http.MethodPut, putURL.String(), nil)
require.NoError(t, err)
signer := v4.NewSigner()
creds := aws.Credentials{
AccessKeyID: defaultConfig.AccessKey,
SecretAccessKey: defaultConfig.SecretKey,
}
// For presigned URLs the AWS convention is UNSIGNED-PAYLOAD so the
// signer doesn't require a body-hash up front.
signedURL, _, err := signer.PresignHTTP(context.TODO(), creds, req,
"UNSIGNED-PAYLOAD", "s3", defaultConfig.Region, time.Now(),
func(o *v4.SignerOptions) {
o.DisableURIPathEscaping = true
})
require.NoError(t, err)
return signedURL
}
// TestPresignedPutWithChecksumSHA256 reproduces issue #9075: a presigned PUT
// URL that carries x-amz-sdk-checksum-algorithm=SHA256 in its query string
// must cause the object to be stored with an x-amz-checksum-sha256 attribute,
// visible via HEAD when the caller requests ChecksumMode=ENABLED.
func TestPresignedPutWithChecksumSHA256(t *testing.T) {
client := getS3Client(t)
bucket := uniqueBucket()
createBucket(t, client, bucket)
defer cleanupBucket(t, client, bucket)
key := "presigned-sha256.txt"
body := []byte("hello seaweedfs checksum")
sha256Sum := sha256.Sum256(body)
expected := base64.StdEncoding.EncodeToString(sha256Sum[:])
signedURL := presignPutURL(t, bucket, key, "SHA256")
uploadViaPresignedURL(t, signedURL, body, nil)
head, err := client.HeadObject(context.TODO(), &s3.HeadObjectInput{
Bucket: aws.String(bucket),
Key: aws.String(key),
ChecksumMode: types.ChecksumModeEnabled,
})
require.NoError(t, err)
require.NotNil(t, head.ChecksumSHA256, "x-amz-checksum-sha256 missing from HEAD response")
assert.Equal(t, expected, aws.ToString(head.ChecksumSHA256),
"stored SHA256 does not match body")
// GET with ChecksumMode=ENABLED should also return the header and the body should match.
getOut, err := client.GetObject(context.TODO(), &s3.GetObjectInput{
Bucket: aws.String(bucket),
Key: aws.String(key),
ChecksumMode: types.ChecksumModeEnabled,
})
require.NoError(t, err)
defer getOut.Body.Close()
require.NotNil(t, getOut.ChecksumSHA256, "x-amz-checksum-sha256 missing from GET response")
assert.Equal(t, expected, aws.ToString(getOut.ChecksumSHA256))
got, err := io.ReadAll(getOut.Body)
require.NoError(t, err)
assert.Equal(t, body, got)
}
// TestPresignedPutWithoutChecksumAlgorithm is a negative control: when the
// caller doesn't request a checksum algorithm, HEAD should not return one.
func TestPresignedPutWithoutChecksumAlgorithm(t *testing.T) {
client := getS3Client(t)
bucket := uniqueBucket()
createBucket(t, client, bucket)
defer cleanupBucket(t, client, bucket)
key := "presigned-nosum.txt"
body := []byte("no checksum requested")
signedURL := presignPutURL(t, bucket, key, "")
uploadViaPresignedURL(t, signedURL, body, nil)
head, err := client.HeadObject(context.TODO(), &s3.HeadObjectInput{
Bucket: aws.String(bucket),
Key: aws.String(key),
ChecksumMode: types.ChecksumModeEnabled,
})
require.NoError(t, err)
assert.Nil(t, head.ChecksumSHA256)
assert.Nil(t, head.ChecksumSHA1)
assert.Nil(t, head.ChecksumCRC32)
assert.Nil(t, head.ChecksumCRC32C)
}
+69 -10
View File
@@ -362,8 +362,14 @@ func (s3a *S3ApiServer) putToFiler(r *http.Request, filePath string, dataReader
plaintextHash := md5.New()
dataReader = io.TeeReader(dataReader, plaintextHash)
// Parse the request's query string once. AWS SDK presigners hoist signed
// headers such as x-amz-sdk-checksum-algorithm into the URL's query string,
// so the checksum detection and (later) expected-checksum verification below
// both need to consult the query in addition to headers.
requestQuery := parseRequestQuery(r)
// Detect and set up additional checksum computation (S3 checksum algorithm support)
checksumAlgo, checksumHeaderName, checksumErrCode := detectRequestedChecksumAlgorithm(r)
checksumAlgo, checksumHeaderName, checksumErrCode := detectRequestedChecksumAlgorithmQ(r, requestQuery)
if checksumErrCode != s3err.ErrNone {
return "", checksumErrCode, SSEResponseMetadata{}
}
@@ -657,7 +663,7 @@ func (s3a *S3ApiServer) putToFiler(r *http.Request, filePath string, dataReader
checksumBase64 = base64.StdEncoding.EncodeToString(checksumHash.Sum(nil))
// Verify against client-provided checksum if present in request headers
// (non-chunked uploads send the value directly; chunked uploads validate in the reader)
if expectedChecksum := r.Header.Get(checksumHeaderName); expectedChecksum != "" {
if expectedChecksum := lookupHeaderOrQuery(r, requestQuery, checksumHeaderName); expectedChecksum != "" {
if expectedChecksum != checksumBase64 {
glog.Warningf("putToFiler: checksum mismatch for %s: expected %s, got %s", checksumHeaderName, expectedChecksum, checksumBase64)
s3a.deleteOrphanedChunks(chunkResult.FileChunks)
@@ -891,14 +897,66 @@ var checksumHeaders = []struct {
{s3_constants.AmzChecksumSHA256, ChecksumAlgorithmSHA256, s3_constants.AmzChecksumSHA256},
}
// lookupHeaderOrQuery returns the value of an x-amz-* parameter, checking the
// request headers first and falling back to the pre-parsed query values. AWS
// SDK presigners hoist headers such as x-amz-sdk-checksum-algorithm into the
// signed URL's query string, so the server must accept either location.
// Query parameter lookup is case-insensitive to tolerate SDK/canonicalization
// variations.
func lookupHeaderOrQuery(r *http.Request, query url.Values, key string) string {
if v := r.Header.Get(key); v != "" {
return v
}
if len(query) == 0 {
return ""
}
if v := query.Get(key); v != "" {
return v
}
for k, vs := range query {
if len(vs) == 0 || vs[0] == "" {
continue
}
if strings.EqualFold(k, key) {
return vs[0]
}
}
return ""
}
// parseRequestQuery returns the parsed query for a request, or nil if there
// is no raw query to parse. Callers that perform multiple header-or-query
// lookups on the same request should call this once and thread the result
// into lookupHeaderOrQuery / detectRequestedChecksumAlgorithmQ to avoid
// re-parsing the query on every lookup.
func parseRequestQuery(r *http.Request) url.Values {
if r == nil || r.URL == nil || r.URL.RawQuery == "" {
return nil
}
return r.URL.Query()
}
// detectRequestedChecksumAlgorithm detects the checksum algorithm requested by the client.
// It checks the x-amz-sdk-checksum-algorithm header, x-amz-checksum-algorithm header,
// x-amz-trailer header (including comma-separated values), and individual x-amz-checksum-*
// headers. Returns the algorithm enum, the canonical HTTP header name, and an error code
// if an unsupported algorithm is specified.
// headers. For presigned URLs, the same parameters are accepted from the query string
// because AWS SDK presigners hoist those headers into the query. Returns the algorithm
// enum, the canonical HTTP header name, and an error code if an unsupported algorithm is
// specified.
//
// This wrapper parses the query string on the caller's behalf. Hot paths that
// also need to inspect the query for other parameters (see putToFiler) should
// call detectRequestedChecksumAlgorithmQ with a pre-parsed url.Values.
func detectRequestedChecksumAlgorithm(r *http.Request) (ChecksumAlgorithm, string, s3err.ErrorCode) {
// Check x-amz-sdk-checksum-algorithm (set by AWS SDKs)
if algo := r.Header.Get(s3_constants.AmzSdkChecksumAlgorithm); algo != "" {
return detectRequestedChecksumAlgorithmQ(r, parseRequestQuery(r))
}
// detectRequestedChecksumAlgorithmQ is the pre-parsed-query variant of
// detectRequestedChecksumAlgorithm. Pass nil for `query` when the caller has
// no parsed query to reuse.
func detectRequestedChecksumAlgorithmQ(r *http.Request, query url.Values) (ChecksumAlgorithm, string, s3err.ErrorCode) {
// Check x-amz-sdk-checksum-algorithm (set by AWS SDKs; hoisted to query for presigned URLs)
if algo := lookupHeaderOrQuery(r, query, s3_constants.AmzSdkChecksumAlgorithm); algo != "" {
if m, ok := checksumAlgorithmMapping[strings.ToUpper(algo)]; ok {
return m.alg, m.name, s3err.ErrNone
}
@@ -906,8 +964,8 @@ func detectRequestedChecksumAlgorithm(r *http.Request) (ChecksumAlgorithm, strin
return ChecksumAlgorithmNone, "", s3err.ErrInvalidRequest
}
// Check x-amz-checksum-algorithm header
if algo := r.Header.Get(s3_constants.AmzChecksumAlgorithm); algo != "" {
// Check x-amz-checksum-algorithm header (also accept from query for presigned URLs)
if algo := lookupHeaderOrQuery(r, query, s3_constants.AmzChecksumAlgorithm); algo != "" {
if m, ok := checksumAlgorithmMapping[strings.ToUpper(algo)]; ok {
return m.alg, m.name, s3err.ErrNone
}
@@ -929,10 +987,11 @@ func detectRequestedChecksumAlgorithm(r *http.Request) (ChecksumAlgorithm, strin
}
}
// Check individual checksum headers (non-chunked uploads send the value directly)
// Check individual checksum headers (non-chunked uploads send the value directly;
// presigned URLs may hoist them to the query string)
// Uses ordered slice for deterministic selection
for _, entry := range checksumHeaders {
if r.Header.Get(entry.header) != "" {
if lookupHeaderOrQuery(r, query, entry.header) != "" {
return entry.alg, entry.name, s3err.ErrNone
}
}
@@ -0,0 +1,108 @@
package s3api
import (
"net/http"
"net/http/httptest"
"testing"
"github.com/seaweedfs/seaweedfs/weed/s3api/s3_constants"
"github.com/seaweedfs/seaweedfs/weed/s3api/s3err"
)
func TestDetectRequestedChecksumAlgorithm(t *testing.T) {
cases := []struct {
name string
setup func(r *http.Request)
wantAlg ChecksumAlgorithm
wantHeader string
wantErr s3err.ErrorCode
}{
{
name: "sdk algorithm header",
setup: func(r *http.Request) {
r.Header.Set(s3_constants.AmzSdkChecksumAlgorithm, "SHA256")
},
wantAlg: ChecksumAlgorithmSHA256,
wantHeader: s3_constants.AmzChecksumSHA256,
},
{
name: "algorithm header",
setup: func(r *http.Request) {
r.Header.Set(s3_constants.AmzChecksumAlgorithm, "CRC32")
},
wantAlg: ChecksumAlgorithmCRC32,
wantHeader: s3_constants.AmzChecksumCRC32,
},
{
// Presigned URL: AWS SDK hoists the sdk-checksum-algorithm header into the query string.
// Regression test for https://github.com/seaweedfs/seaweedfs/issues/9075
name: "presigned url hoists sdk algorithm to query",
setup: func(r *http.Request) {
q := r.URL.Query()
q.Set("X-Amz-Sdk-Checksum-Algorithm", "SHA256")
r.URL.RawQuery = q.Encode()
},
wantAlg: ChecksumAlgorithmSHA256,
wantHeader: s3_constants.AmzChecksumSHA256,
},
{
name: "presigned url hoists checksum-algorithm to query (lowercase)",
setup: func(r *http.Request) {
q := r.URL.Query()
q.Set("x-amz-checksum-algorithm", "sha1")
r.URL.RawQuery = q.Encode()
},
wantAlg: ChecksumAlgorithmSHA1,
wantHeader: s3_constants.AmzChecksumSHA1,
},
{
name: "presigned url hoists individual checksum value",
setup: func(r *http.Request) {
q := r.URL.Query()
q.Set("X-Amz-Checksum-Sha256", "abc")
r.URL.RawQuery = q.Encode()
},
wantAlg: ChecksumAlgorithmSHA256,
wantHeader: s3_constants.AmzChecksumSHA256,
},
{
name: "unsupported in query returns error",
setup: func(r *http.Request) {
q := r.URL.Query()
q.Set("X-Amz-Sdk-Checksum-Algorithm", "MD5")
r.URL.RawQuery = q.Encode()
},
wantAlg: ChecksumAlgorithmNone,
wantErr: s3err.ErrInvalidRequest,
},
{
name: "no checksum",
setup: func(r *http.Request) {},
wantAlg: ChecksumAlgorithmNone,
},
}
for _, tc := range cases {
t.Run(tc.name, func(t *testing.T) {
r := httptest.NewRequest(http.MethodPut, "http://example.com/bucket/key", nil)
tc.setup(r)
alg, header, code := detectRequestedChecksumAlgorithm(r)
if code != tc.wantErr {
t.Fatalf("err code: got %v, want %v", code, tc.wantErr)
}
if alg != tc.wantAlg {
t.Fatalf("algorithm: got %v, want %v", alg, tc.wantAlg)
}
if header != tc.wantHeader {
t.Fatalf("header name: got %q, want %q", header, tc.wantHeader)
}
})
}
}
func TestLookupHeaderOrQueryCaseInsensitive(t *testing.T) {
r := httptest.NewRequest(http.MethodPut, "http://example.com/?x-AMZ-sdk-CHECKSUM-algorithm=SHA256", nil)
q := parseRequestQuery(r)
if got := lookupHeaderOrQuery(r, q, s3_constants.AmzSdkChecksumAlgorithm); got != "SHA256" {
t.Fatalf("expected SHA256, got %q", got)
}
}