From 08d9193fe15b4dd0e74634c16112d9d2f7b6825b Mon Sep 17 00:00:00 2001
From: Chris Lu
Date: Tue, 14 Apr 2026 20:48:24 -0700
Subject: [PATCH] [nfs] Add NFS (#9067)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* add filer inode foundation for nfs

* nfs command skeleton

* add filer inode index foundation for nfs

* make nfs inode index hardlink aware

* add nfs filehandle and inode lookup plumbing

* add read-only nfs frontend foundation

* add nfs namespace mutation support

* add chunk-backed nfs write path

* add nfs protocol integration tests

* add stale handle nfs coverage

* complete nfs hardlink and failover coverage

* add nfs export access controls

* add nfs metadata cache invalidation

* fix nfs chunk read lookup routing

* fix nfs review findings and rename regression

* address pr 9067 review comments

- filer_inode: fail fast if the snowflake sequencer cannot start, and let
  operators override the 10-bit node id via SEAWEEDFS_FILER_SNOWFLAKE_ID
  to avoid multi-filer collisions
- filer_inode: drop the redundant retry loop in nextInode
- filerstore_wrapper: treat inode-index writes/removals as best-effort so
  a primary store success no longer surfaces as an operation failure
- filer_grpc_server_rename: defer overwritten-target chunk deletion until
  after CommitTransaction so a rolled-back rename does not strand live
  metadata pointing at freshly deleted chunks
- command/nfs: default ip.bind to loopback and require an explicit
  filer.path, so the experimental server does not expose the entire
  filer namespace on first run
- nfs integration_test: document why LinkArgs matches go-nfs's
  on-the-wire layout rather than RFC 1813 LINK3args

* mount: pre-allocate inode in Mkdir and Symlink

Mkdir and Symlink used to send filer_pb.CreateEntryRequest with
Attributes.Inode = 0.
After PR 9067, the filer's CreateEntry now assigns its own inode in that case, so the filer-side entry ends up with a different inode than the one the mount allocates via inodeToPath.Lookup and returns to the kernel. Once applyLocalMetadataEvent stores the filer's entry in the meta cache, subsequent GetAttr calls read the cached entry and hit the setAttrByPbEntry override at line 197 of weedfs_attr.go, returning the filer-assigned inode instead of the mount's local one. pjdfstest tests/rename/00.t (subtests 81/87/91) caught this — it lstat'd a freshly-created directory/symlink, renamed it, lstat'd again, and saw a different inode the second time. createRegularFile already pre-allocates via inodeToPath.AllocateInode and stamps it into the create request. Do the same thing in Mkdir and Symlink so both sides agree on the object identity from the very first request, and so GetAttr's cache path returns the same value as Mkdir / Symlink's initial response. * sequence: mask snowflake node id on int→uint32 conversion CodeQL flagged the unchecked uint32(snowflakeId) cast in NewSnowflakeSequencer as a potential truncation bug when snowflakeId is sourced from user input (e.g. via SEAWEEDFS_FILER_SNOWFLAKE_ID). Mask to the 10 bits the snowflake library actually uses so any caller- supplied int is safely clamped into range. * add test/nfs integration suite Boots a real SeaweedFS cluster (master + volume + filer) plus the experimental `weed nfs` frontend as subprocesses and drives it through the NFSv3 wire protocol via go-nfs-client, mirroring the layout of test/sftp. The tests run without a kernel NFS mount, privileged ports, or any platform-specific tooling. Coverage includes read/write round-trip, mkdir/rmdir, nested directories, rename content preservation, overwrite + explicit truncate, 3 MiB binary file, all-byte binary and empty files, symlink round-trip, ReadDirPlus listing, missing-path remove, FSInfo sanity, sequential appends, and readdir-after-remove. 
Framework notes:

- Picks ephemeral ports with net.Listen("tcp", "127.0.0.1:0") and passes
  -port.grpc explicitly so the default port+10000 convention cannot
  overflow uint16 on macOS.
- Pre-creates the /nfs_export directory via the filer HTTP API before
  starting the NFS server: the NFS server's ensureIndexedEntry check
  requires the export root to exist with a real entry, which filer.Root
  does not satisfy when the export path is "/".
- Reuses the same rpc.Client for mount and target so go-nfs-client does
  not try to re-dial via portmapper (which concatenates ":111" onto the
  address).

* ci: add NFS integration test workflow

Mirror test/sftp's workflow for the new test/nfs suite so PRs that touch
the NFS server, the inode filer plumbing it depends on, or the test
harness itself run the 14 NFSv3-over-RPC integration tests on Ubuntu
22.04 via `make test`.

* nfs: use append for buffer growth in Write and Truncate

The previous make+copy pattern reallocated the full buffer on every
extending write or truncate, giving O(N^2) behaviour for sequential
write loops. Switching to `append(f.content, make([]byte, delta)...)`
lets Go's amortized growth strategy absorb the repeated extensions.

Called out by gemini-code-assist on PR 9067.

* filer: honor caller cancellation in collectInodeIndexEntries

Dropping the WithoutCancel wrapper lets DeleteFolderChildren bail out of
the inode-index scan if the client disconnects mid-walk. The cleanup is
already treated as best-effort by the caller (it logs on error and
continues), so a cancelled walk just means the partial index rebuild is
skipped, which is the same failure mode as any other index write error.

Flagged as a DoS concern by gemini-code-assist on PR 9067.

* nfs: skip filer read on open when O_TRUNC is set

openFile used to call loadWritableContent unconditionally for every
writable open and then discard the buffer if O_TRUNC was set. For large
files that is a pointless round-trip of up to 64 MiB.
Reorder the branches so we only fetch existing content when the caller intends to keep it, and mark the file dirty right away so the subsequent Close still issues the truncating write. Called out by gemini-code-assist on PR 9067. * nfs: allow Seek on O_APPEND files and document buffered write cap Two related cleanups on filesystem.go: - POSIX only restricts Write on an O_APPEND fd, not lseek. The existing Seek error ("append-only file descriptors may only seek to EOF") prevented read-and-write workloads that legitimately reposition the read cursor. Write already snaps the offset to EOF before persisting (see seaweedFile Write), so Seek can unconditionally accept any offset. Update the unit test that was asserting the old behaviour. - Add a doc comment on maxBufferedWriteSize explaining that it is a per-file ceiling, the memory footprint it implies, and that the real fix for larger whole-file rewrites is streaming / multi-chunk support. Both changes flagged by gemini-code-assist on PR 9067. * nfs: guard offset before casting to int in Write CodeQL flagged `int(f.offset) + len(p)` inside the Write growth path as a potential overflow on architectures where `int` is 32-bit. The existing check only bounded the post-cast value, which is too late. Clamp f.offset against maxBufferedWriteSize before the cast and also reject negative/overflowed endOffset results. Both branches fall through to billy.ErrNotSupported, the same behaviour the caller gets today for any out-of-range buffered write. * nfs: compute Write endOffset in int64 to satisfy CodeQL The previous guard bounded f.offset but left len(p) unchecked, so CodeQL still flagged `int(f.offset) + len(p)` as a possible int-width overflow path. Bound len(p) against maxBufferedWriteSize first, do the addition in int64, and only cast down after the total has been clamped against the buffer ceiling. Behaviour is unchanged: any out-of-range write still returns billy.ErrNotSupported. 
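The end-offset guard described in the two Write commits above can be sketched as follows. This is a minimal illustration, not the code in filesystem.go: `boundedEndOffset` and `errNotSupported` are hypothetical names (the latter stands in for billy.ErrNotSupported), and the constant only mirrors the 64 MiB ceiling quoted in the message.

```go
package main

import "errors"

// maxBufferedWriteSize mirrors the 64 MiB per-file ceiling named in the
// commit message; the real constant lives in the NFS server package.
const maxBufferedWriteSize int64 = 64 << 20

// errNotSupported is a hypothetical stand-in for billy.ErrNotSupported.
var errNotSupported = errors.New("not supported")

// boundedEndOffset (hypothetical helper) returns offset+n as an int only
// after every operand and the sum have been checked in int64, so the
// final cast cannot truncate even where int is 32 bits wide.
func boundedEndOffset(offset int64, n int) (int, error) {
	// Reject negative or already out-of-range offsets before any cast.
	if offset < 0 || offset > maxBufferedWriteSize {
		return 0, errNotSupported
	}
	// Bound the payload length as well, not just the offset.
	if int64(n) > maxBufferedWriteSize {
		return 0, errNotSupported
	}
	// Both operands are at most 64 MiB, so this int64 sum cannot overflow.
	end := offset + int64(n)
	if end > maxBufferedWriteSize {
		return 0, errNotSupported
	}
	return int(end), nil
}
```

The ordering matters: checking only the post-cast value would be too late, because the truncation would already have happened by the time the comparison ran.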
* ci: drop emojis from nfs-tests workflow summary Plain-text step summary per user preference — no decorative glyphs in the NFS CI output or checklist. * nfs: annotate remaining DEV_PLAN TODOs with status Three of the unchecked items are genuine follow-up PRs rather than missing work in this one, and one was actually already done: - Reuse chunk cache and mutation stream helpers without FUSE deps: checked off — the NFS server imports weed/filer.ReaderCache and weed/util/chunk_cache directly with no weed/mount or go-fuse imports. - Extract shared read/write helpers from mount/WebDAV/SFTP: annotated as deferred to a separate refactor PR (touches four packages). - Expand direct data-path writes beyond the 64 MiB buffered fallback: annotated as deferred — requires a streaming WRITE path. - Shared lock state + lock tests: annotated as blocked upstream on go-nfs's missing NLM/NFSv4 lock state RPCs, matching the existing "Current Blockers" note. * test/nfs: share port+readiness helpers with test/testutil Drop the per-suite mustPickFreePort and waitForService re-implementations in favor of testutil.MustAllocatePorts (atomic batch allocation; no close-then-hope race) and testutil.WaitForPort / SeaweedMiniStartupTimeout. Pull testutil in via a local replace directive so this standalone seaweedfs-nfs-tests module can import the in-repo package without a separate release. Subprocess startup is still master + volume + filer + nfs — no switch to weed mini yet, since mini does not know about the nfs frontend. 
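For readers unfamiliar with batch port allocation, one way to get the "no two components receive the same port" property is to keep every listener open until the whole batch is bound. The sketch below is an assumption about the general technique only; `allocatePorts` is a hypothetical name and this is not the implementation of testutil.MustAllocatePorts, which is not shown in this patch excerpt.

```go
package main

import "net"

// allocatePorts (hypothetical) reserves n distinct loopback TCP ports.
// All listeners stay open until every port has been assigned, so the
// kernel cannot hand the same port out twice within one batch.
func allocatePorts(n int) ([]int, error) {
	listeners := make([]net.Listener, 0, n)
	// Release the listeners only after the whole batch is bound
	// (or after a failure part-way through).
	defer func() {
		for _, l := range listeners {
			l.Close()
		}
	}()

	ports := make([]int, 0, n)
	for i := 0; i < n; i++ {
		// Port 0 asks the kernel for any free ephemeral port.
		l, err := net.Listen("tcp", "127.0.0.1:0")
		if err != nil {
			return nil, err
		}
		listeners = append(listeners, l)
		ports = append(ports, l.Addr().(*net.TCPAddr).Port)
	}
	return ports, nil
}
```

This only rules out collisions inside the batch; another process can still grab a port between the close and the subprocess bind.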
* nfs: stream writes to volume servers instead of buffering the whole file Before this change the NFS write path held the full contents of every writable open in memory: - OpenFile(write) called loadWritableContent which read the existing file into seaweedFile.content up to maxBufferedWriteSize (64 MiB) - each Write() extended content in-place - Close() uploaded the whole buffer as a single chunk via persistContent + AssignVolume The 64 MiB ceiling made large NFS writes return NFS3ERR_NOTSUPP, and even below the cap every Write paid a whole-file-in-memory cost. This PR rewrites the write path to match how `weed filer` and the S3 gateway persist data: - openFile(write) no longer loads the existing content at all; it only issues an UpdateEntry when O_TRUNC is set *and* the file is non-empty (so a fresh create+trunc is still zero-RPC) - Write() streams the caller's bytes straight to a volume server via one AssignVolume + one chunk upload, then atomically appends the resulting chunk to the filer entry through mutateEntry. Any previously inlined entry.Content is migrated to a chunk in the same update so the chunk list becomes the authoritative representation. - Truncate() becomes a direct mutateEntry (drop chunks past the new size, clip inline content, update FileSize) instead of resizing an in-memory buffer. - Close() is a no-op because everything was flushed inline. The small-file fast path that the filer HTTP handler uses is preserved: if the post-write size still fits in maxInlineWriteSize (4 MiB) and the file has no existing chunks, we rewrite entry.Content directly and skip the volume-server round-trip. This keeps single-shot tiny writes (echo, small edits) cheap while completely removing the 64 MiB cap on larger files. Read() now always reads through the chunk reader instead of a local byte slice, so reads inside the same session see the freshly appended data. 
Drops the unused seaweedFile.content / dirty fields, the maxBufferedWriteSize constant, and the loadWritableContent helper. Updates TestSeaweedFileSystemSupportsNamespaceMutations expectations to match the new "no extra O_TRUNC UpdateEntry on an empty file" behavior (still 3 updates: Write + Chmod + Truncate). * filer: extract shared gateway upload helper for NFS and WebDAV Three filer-backed gateways (NFS, WebDAV, and mount) each had a local saveDataAsChunk that wrapped operation.NewUploader().UploadWithRetry with near-identical bodies: build AssignVolumeRequest, build UploadOption, build genFileUrlFn with optional filerProxy rewriting, call UploadWithRetry, validate the result, and call ToPbFileChunk. Pull that body into filer.SaveGatewayDataAsChunk with a GatewayChunkUploadRequest struct so both NFS and WebDAV can delegate to one implementation. - NFS's saveDataAsChunk is now a thin adapter that assembles the GatewayChunkUploadRequest from server options and calls the helper. The chunkUploader interface keeps working for test injection because the new GatewayChunkUploader interface is structurally identical. - WebDAV's saveDataAsChunk is similarly a thin adapter — it drops the local operation.NewUploader call plus the AssignVolume/UploadOption scaffolding. - mount is intentionally left alone. mount's saveDataAsChunk has two features that do not fit the shared helper (a pre-allocated file-id pool used to skip AssignVolume entirely, and a chunkCache write-through at offset 0 so future reads hit the mount's local cache), both of which are mount-specific. Marks the Phase 2 "extract shared read/write helpers from mount, WebDAV, and SFTP" DEV_PLAN item as done. The filer-level chunk read path (NonOverlappingVisibleIntervals + ViewFromVisibleIntervals + NewChunkReaderAtFromClient) was already shared. 
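The remark that test injection keeps working because the two uploader interfaces are structurally identical relies on a plain Go rule: interfaces with the same method set are mutually assignable. The sketch below illustrates that rule with hypothetical names and a hypothetical `SaveChunk` signature; it does not reproduce the real chunkUploader or GatewayChunkUploader definitions.

```go
package main

// localUploader and sharedUploader are hypothetical stand-ins for a
// package-local interface and a shared helper interface. Their method
// sets are identical, which is all Go requires for assignability.
type localUploader interface {
	SaveChunk(data []byte) (fileID string, err error)
}

type sharedUploader interface {
	SaveChunk(data []byte) (fileID string, err error)
}

// fakeUploader is a test double that records how often it was called.
type fakeUploader struct{ calls int }

func (f *fakeUploader) SaveChunk(data []byte) (string, error) {
	f.calls++
	return "fake-id", nil
}

// adapt converts between the two interface types with no wrapper type;
// it compiles only because the method sets match.
func adapt(u localUploader) sharedUploader { return u }

// uploadVia plays the role of a shared helper that accepts only the
// shared interface.
func uploadVia(u sharedUploader, data []byte) (string, error) {
	return u.SaveChunk(data)
}
```

A fake written against the local interface can therefore be passed to code that expects the shared one, which is why existing test doubles need no changes.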
* nfs: remove DESIGN.md and DEV_PLAN.md The planning documents have served their purpose — all phase 1 and phase 2 items are landed, phase 3 streaming writes are landed, phase 2 shared helpers are extracted, and the two remaining phase 4 items (shared lock state + lock tests) are blocked upstream on github.com/willscott/go-nfs which exposes no NLM or NFSv4 lock state RPCs. The running decision log no longer reflects current code and would just drift. The NFS wiki page (https://github.com/seaweedfs/seaweedfs/wiki/NFS-Server) now carries the overview, configuration surface, architecture notes, and known limitations; the source is the source of truth for the rest. --- .github/workflows/nfs-tests.yml | 97 ++ go.mod | 3 + go.sum | 6 + test/nfs/Makefile | 36 + test/nfs/README.md | 92 ++ test/nfs/basic_test.go | 400 +++++ test/nfs/framework.go | 423 ++++++ test/nfs/go.mod | 21 + test/nfs/go.sum | 14 + weed/command/command.go | 1 + weed/command/nfs.go | 100 ++ weed/filer/filechunk_manifest.go | 2 +- weed/filer/filer.go | 11 + weed/filer/filer_inode.go | 51 + weed/filer/filer_inode_index.go | 300 ++++ weed/filer/filer_inode_index_test.go | 206 +++ weed/filer/filer_inode_test.go | 124 ++ weed/filer/filer_lazy_remote_test.go | 1 + weed/filer/filerstore_wrapper.go | 57 +- weed/filer/gateway_upload.go | 168 ++ weed/filer/stream.go | 2 +- weed/mount/weedfs_dir_mkrm.go | 24 +- weed/mount/weedfs_symlink.go | 8 +- weed/sequence/snowflake_sequencer.go | 5 +- weed/server/filer_grpc_server_rename.go | 60 +- weed/server/nfs/access.go | 140 ++ weed/server/nfs/access_test.go | 29 + weed/server/nfs/filehandle.go | 251 +++ weed/server/nfs/filehandle_test.go | 182 +++ weed/server/nfs/filesystem.go | 1348 +++++++++++++++++ weed/server/nfs/handler.go | 127 ++ weed/server/nfs/integration_test.go | 718 +++++++++ weed/server/nfs/internal_client.go | 88 ++ weed/server/nfs/metadata_follow.go | 147 ++ weed/server/nfs/server.go | 177 +++ weed/server/nfs/server_test.go | 1014 +++++++++++++ 
weed/server/nfs/uploader.go | 40 + weed/server/webdav_server.go | 56 +- weed/util/http/http_global_client_util.go | 26 +- .../util/http/http_global_client_util_test.go | 72 + 40 files changed, 6557 insertions(+), 70 deletions(-) create mode 100644 .github/workflows/nfs-tests.yml create mode 100644 test/nfs/Makefile create mode 100644 test/nfs/README.md create mode 100644 test/nfs/basic_test.go create mode 100644 test/nfs/framework.go create mode 100644 test/nfs/go.mod create mode 100644 test/nfs/go.sum create mode 100644 weed/command/nfs.go create mode 100644 weed/filer/filer_inode.go create mode 100644 weed/filer/filer_inode_index.go create mode 100644 weed/filer/filer_inode_index_test.go create mode 100644 weed/filer/filer_inode_test.go create mode 100644 weed/filer/gateway_upload.go create mode 100644 weed/server/nfs/access.go create mode 100644 weed/server/nfs/access_test.go create mode 100644 weed/server/nfs/filehandle.go create mode 100644 weed/server/nfs/filehandle_test.go create mode 100644 weed/server/nfs/filesystem.go create mode 100644 weed/server/nfs/handler.go create mode 100644 weed/server/nfs/integration_test.go create mode 100644 weed/server/nfs/internal_client.go create mode 100644 weed/server/nfs/metadata_follow.go create mode 100644 weed/server/nfs/server.go create mode 100644 weed/server/nfs/server_test.go create mode 100644 weed/server/nfs/uploader.go create mode 100644 weed/util/http/http_global_client_util_test.go diff --git a/.github/workflows/nfs-tests.yml b/.github/workflows/nfs-tests.yml new file mode 100644 index 000000000..1167d3311 --- /dev/null +++ b/.github/workflows/nfs-tests.yml @@ -0,0 +1,97 @@ +name: "NFS Integration Tests" + +on: + push: + branches: [ master, main ] + paths: + - 'weed/server/nfs/**' + - 'weed/command/nfs.go' + - 'weed/filer/filer_inode.go' + - 'weed/filer/filer_inode_index.go' + - 'weed/filer/filerstore_wrapper.go' + - 'weed/server/filer_grpc_server_rename.go' + - 'test/nfs/**' + - 
'.github/workflows/nfs-tests.yml' + pull_request: + branches: [ master, main ] + paths: + - 'weed/server/nfs/**' + - 'weed/command/nfs.go' + - 'weed/filer/filer_inode.go' + - 'weed/filer/filer_inode_index.go' + - 'weed/filer/filerstore_wrapper.go' + - 'weed/server/filer_grpc_server_rename.go' + - 'test/nfs/**' + - '.github/workflows/nfs-tests.yml' + +concurrency: + group: ${{ github.head_ref }}/nfs-tests + cancel-in-progress: true + +permissions: + contents: read + +env: + TEST_TIMEOUT: '15m' + +jobs: + nfs-integration: + name: NFS Integration Testing + runs-on: ubuntu-22.04 + timeout-minutes: 20 + + steps: + - name: Checkout code + uses: actions/checkout@v6 + + - name: Set up Go + uses: actions/setup-go@v6 + with: + go-version-file: 'go.mod' + + - name: Build SeaweedFS + run: | + cd weed + go build -o weed . + chmod +x weed + ./weed version + + - name: Run NFS Integration Tests + run: | + cd test/nfs + + echo "Running NFS integration tests..." + echo "============================================" + + # Install test dependencies + go mod download + + # Run all NFS tests + go test -v -timeout=${{ env.TEST_TIMEOUT }} ./... 
+ + echo "============================================" + echo "NFS integration tests completed" + + - name: Test Summary + if: always() + run: | + echo "## NFS Integration Test Summary" >> $GITHUB_STEP_SUMMARY + echo "" >> $GITHUB_STEP_SUMMARY + echo "### Test Coverage" >> $GITHUB_STEP_SUMMARY + echo "- **Read/Write Round Trip**: Basic file create + read" >> $GITHUB_STEP_SUMMARY + echo "- **Directory Operations**: Mkdir, ReadDirPlus, RmDir" >> $GITHUB_STEP_SUMMARY + echo "- **Nested Directories**: Deep tree creation and leaf I/O" >> $GITHUB_STEP_SUMMARY + echo "- **Rename**: Content preserved across rename" >> $GITHUB_STEP_SUMMARY + echo "- **Overwrite + Truncate**: Setattr(size=0) + shorter write" >> $GITHUB_STEP_SUMMARY + echo "- **Large Files**: 3 MiB binary round trip" >> $GITHUB_STEP_SUMMARY + echo "- **Edge Payloads**: All 256 byte values + empty files" >> $GITHUB_STEP_SUMMARY + echo "- **Symlinks**: Symlink + Lookup" >> $GITHUB_STEP_SUMMARY + echo "- **Missing Path**: Remove on missing entry errors cleanly" >> $GITHUB_STEP_SUMMARY + echo "- **FSINFO**: Non-zero rtpref/wtpref advertised" >> $GITHUB_STEP_SUMMARY + echo "- **Sequential Append**: Two-part concatenation" >> $GITHUB_STEP_SUMMARY + echo "- **ReadDir After Remove**: Meta cache does not serve stale entries" >> $GITHUB_STEP_SUMMARY + echo "" >> $GITHUB_STEP_SUMMARY + echo "### Harness" >> $GITHUB_STEP_SUMMARY + echo "Each test boots its own master + volume + filer + nfs subprocess" >> $GITHUB_STEP_SUMMARY + echo "stack on loopback and drives it via the NFSv3 RPC protocol using" >> $GITHUB_STEP_SUMMARY + echo "go-nfs-client. No kernel NFS mount or privileged port is required." 
>> $GITHUB_STEP_SUMMARY diff --git a/go.mod b/go.mod index d87f59d2d..2f02c92b9 100644 --- a/go.mod +++ b/go.mod @@ -152,6 +152,7 @@ require ( github.com/tarantool/go-tarantool/v2 v2.4.2 github.com/testcontainers/testcontainers-go v0.40.0 github.com/tikv/client-go/v2 v2.0.7 + github.com/willscott/go-nfs v0.0.3 github.com/xeipuuv/gojsonschema v1.2.0 github.com/ydb-platform/ydb-go-sdk-auth-environ v0.5.1 github.com/ydb-platform/ydb-go-sdk/v3 v3.134.0 @@ -256,6 +257,7 @@ require ( github.com/pquerna/otp v1.5.0 // indirect github.com/pterm/pterm v0.12.82 // indirect github.com/quic-go/quic-go v0.57.0 // indirect + github.com/rasky/go-xdr v0.0.0-20170124162913-1a41d1a06c93 // indirect github.com/rclone/Proton-API-Bridge v1.0.1-0.20260127174007-77f974840d11 // indirect github.com/rclone/go-proton-api v1.0.1-0.20260127173028-eb465cac3b18 // indirect github.com/rogpeppe/go-internal v1.14.1 // indirect @@ -270,6 +272,7 @@ require ( github.com/twpayne/go-kml/v3 v3.2.1 // indirect github.com/tyler-smith/go-bip39 v1.1.0 // indirect github.com/ulikunitz/xz v0.5.15 // indirect + github.com/willscott/go-nfs-client v0.0.0-20251022144359-801f10d98886 // indirect github.com/wk8/go-ordered-map/v2 v2.1.8 // indirect github.com/xeipuuv/gojsonpointer v0.0.0-20190905194746-02993c407bfb // indirect github.com/xeipuuv/gojsonreference v0.0.0-20180127040603-bd5ef7bd5415 // indirect diff --git a/go.sum b/go.sum index 0a04a0f03..8ae22fa0b 100644 --- a/go.sum +++ b/go.sum @@ -1783,6 +1783,8 @@ github.com/quic-go/quic-go v0.57.0 h1:AsSSrrMs4qI/hLrKlTH/TGQeTMY0ib1pAOX7vA3Adq github.com/quic-go/quic-go v0.57.0/go.mod h1:ly4QBAjHA2VhdnxhojRsCUOeJwKYg+taDlos92xb1+s= github.com/rabbitmq/amqp091-go v1.10.0 h1:STpn5XsHlHGcecLmMFCtg7mqq0RnD+zFr4uzukfVhBw= github.com/rabbitmq/amqp091-go v1.10.0/go.mod h1:Hy4jKW5kQART1u+JkDTF9YYOQUHXqMuhrgxOEeS7G4o= +github.com/rasky/go-xdr v0.0.0-20170124162913-1a41d1a06c93 h1:UVArwN/wkKjMVhh2EQGC0tEc1+FqiLlvYXY5mQ2f8Wg= +github.com/rasky/go-xdr 
v0.0.0-20170124162913-1a41d1a06c93/go.mod h1:Nfe4efndBz4TibWycNE+lqyJZiMX4ycx+QKV8Ta0f/o= github.com/rclone/Proton-API-Bridge v1.0.1-0.20260127174007-77f974840d11 h1:4MI2alxM/Ye2gIRBlYf28JGWTipZ4Zz7yAziPKrttjs= github.com/rclone/Proton-API-Bridge v1.0.1-0.20260127174007-77f974840d11/go.mod h1:3HLX7dwZgvB7nt+Yl/xdzVPcargQ1yBmJEUg3n+jMKM= github.com/rclone/go-proton-api v1.0.1-0.20260127173028-eb465cac3b18 h1:Lc+d3ISfQaMJKWZOE7z4ZSY4RVmdzbn1B0IM8xN18qM= @@ -2026,6 +2028,10 @@ github.com/vmihailenco/msgpack/v5 v5.4.1 h1:cQriyiUvjTwOHg8QZaPihLWeRAAVoCpE00IU github.com/vmihailenco/msgpack/v5 v5.4.1/go.mod h1:GaZTsDaehaPpQVyxrf5mtQlH+pc21PIudVV/E3rRQok= github.com/vmihailenco/tagparser/v2 v2.0.0 h1:y09buUbR+b5aycVFQs/g70pqKVZNBmxwAhO7/IwNM9g= github.com/vmihailenco/tagparser/v2 v2.0.0/go.mod h1:Wri+At7QHww0WTrCBeu4J6bNtoV6mEfg5OIWRZA9qds= +github.com/willscott/go-nfs v0.0.3 h1:Z5fHVxMsppgEucdkKBN26Vou19MtEM875NmRwj156RE= +github.com/willscott/go-nfs v0.0.3/go.mod h1:VhNccO67Oug787VNXcyx9JDI3ZoSpqoKMT/lWMhUIDg= +github.com/willscott/go-nfs-client v0.0.0-20251022144359-801f10d98886 h1:DtrBtkgTJk2XGt4T7eKdKVkd9A5NCevN2e4inLXtsqA= +github.com/willscott/go-nfs-client v0.0.0-20251022144359-801f10d98886/go.mod h1:Tq++Lr/FgiS3X48q5FETemXiSLGuYMQT2sPjYNPJSwA= github.com/wk8/go-ordered-map/v2 v2.1.8 h1:5h/BUHu93oj4gIdvHHHGsScSTMijfx5PeYkE/fJgbpc= github.com/wk8/go-ordered-map/v2 v2.1.8/go.mod h1:5nJHM5DyteebpVlHnWMV0rPz6Zp7+xBAnxjb1X5vnTw= github.com/wsxiaoys/terminal v0.0.0-20160513160801-0940f3fc43a0 h1:3UeQBvD0TFrlVjOeLOBz+CPAI8dnbqNSVwUwRrkp7vQ= diff --git a/test/nfs/Makefile b/test/nfs/Makefile new file mode 100644 index 000000000..9e695519f --- /dev/null +++ b/test/nfs/Makefile @@ -0,0 +1,36 @@ +.PHONY: all build test test-verbose test-short test-debug clean deps tidy + +all: build test + +# Build the weed binary first +build: + cd ../../weed && go build -o weed . 
+ +# Install test dependencies +deps: + go mod download + +# Run all tests +test: build deps + go test -timeout 5m ./... + +# Run tests with verbose output +test-verbose: build deps + go test -v -timeout 5m ./... + +# Skip long-running integration tests +test-short: deps + go test -short -v ./... + +# Run tests with debug output from SeaweedFS +test-debug: build deps + go test -v -timeout 5m ./... 2>&1 | tee test.log + +# Clean up test artifacts +clean: + rm -f test.log + go clean -testcache + +# Update go.sum +tidy: + go mod tidy diff --git a/test/nfs/README.md b/test/nfs/README.md new file mode 100644 index 000000000..263e92dcd --- /dev/null +++ b/test/nfs/README.md @@ -0,0 +1,92 @@ +# SeaweedFS NFS Integration Tests + +End-to-end tests that boot a real SeaweedFS cluster (`master` + `volume` + +`filer`) plus the experimental `weed nfs` frontend and drive it through the +NFSv3 wire protocol. The tests talk to the server over TCP using +`github.com/willscott/go-nfs-client`, which means they do **not** need a +kernel NFS mount, privileged ports, or any platform-specific tooling. + +## Prerequisites + +1. Build the `weed` binary: + ```bash + cd ../../weed + go build -o weed . + ``` +2. Go 1.24 or later. + +## Running the tests + +```bash +# Build weed and run everything +make test + +# Verbose output, keeps the subprocess stdout +make test-verbose + +# Skip integration tests — useful when iterating on the framework itself +make test-short + +# Run a single test +go test -v -run TestNfsBasicReadWrite ./... +``` + +Every test starts its own cluster on random loopback ports, so runs are +isolated and can execute in parallel. + +## Layout + +- `framework.go` — launches `weed master`, `weed volume`, `weed filer`, and + `weed nfs` as subprocesses, waits for each to accept TCP, and exposes a + `Mount()` helper that returns an `nfsclient.Target`. 
+- `basic_test.go` — covers the most common NFS operations: + - Read/write round-trip (`TestNfsBasicReadWrite`) + - Mkdir / ReadDirPlus / RmDir (`TestNfsMkdirAndRmdir`) + - Nested directory + leaf file (`TestNfsNestedDirectories`) + - Rename preserves content (`TestNfsRenamePreservesContent`) + - Overwrite shrinks file size (`TestNfsOverwriteShrinksFile`) + - Large binary file round-trip (`TestNfsLargeFile`) + - Arbitrary binary and empty files (`TestNfsBinaryAndEmptyFiles`) + - Symlink + Readlink (`TestNfsSymlinkRoundTrip`) + - ReadDirPlus ordering sanity (`TestNfsReadDirPlusOrdering`) + - Remove on missing path errors cleanly (`TestNfsRemoveMissingFailsCleanly`) + - FSINFO advertises non-zero limits (`TestNfsFSInfoReturnsSaneLimits`) + - Sequential append writes concatenate (`TestNfsAppendIsSequential`) + - ReadDir after remove (`TestNfsReadDirAfterRemove`) + +## Debugging a failing test + +Keep the cluster temp dir for inspection: + +```go +config := DefaultTestConfig() +config.SkipCleanup = true +``` + +Enable subprocess stdout/stderr: + +```go +config := DefaultTestConfig() +config.EnableDebug = true +``` + +Or run with `-v`, which flips `EnableDebug` automatically via `testing.Verbose()`. + +## Notes + +- The NFS server binds to `127.0.0.1` with `-ip.bind=127.0.0.1` and exports + `/nfs_export`. The test framework pre-creates that directory via the + filer's HTTP API before starting the NFS server — the NFS server requires + its export root to exist in the filer's namespace with a real entry, and + the filer's synthetic `/` root does not match the `Name=="/"` check the + NFS server performs during `ensureIndexedEntry`. +- Ports are allocated dynamically. Each test run opens a short-lived + listener on `127.0.0.1:0`, reads back the assigned port, closes the + listener, and hands the port to `weed master/volume/filer/nfs`. 
There is + a tiny race window between close and reopen that has not been a problem + in practice but is worth remembering if you see a "bind: address already + in use" failure. +- All four `weed` components are started with explicit `-port.grpc=...` + flags. Without them, the default is `-port + 10000`, which overflows + `65535` whenever the HTTP port lands above `55535` — the kernel's + ephemeral port range on macOS routinely does. diff --git a/test/nfs/basic_test.go b/test/nfs/basic_test.go new file mode 100644 index 000000000..c971972f4 --- /dev/null +++ b/test/nfs/basic_test.go @@ -0,0 +1,400 @@ +package nfs + +import ( + "bytes" + "fmt" + "io" + "os" + "path" + "strings" + "testing" + + "github.com/stretchr/testify/assert" + "github.com/stretchr/testify/require" + nfsclient "github.com/willscott/go-nfs-client/nfs" +) + +// setupFramework is a small helper that boots the cluster for a single test +// and tears everything down on completion. Every test gets a fresh filer + +// volume pair so they cannot step on each other's namespace. +func setupFramework(t *testing.T) *NfsTestFramework { + t.Helper() + if testing.Short() { + t.Skip("skipping integration test in short mode") + } + config := DefaultTestConfig() + config.EnableDebug = testing.Verbose() + fw := NewNfsTestFramework(t, config) + require.NoError(t, fw.Setup(config), "framework setup") + t.Cleanup(fw.Cleanup) + return fw +} + +// writeAll writes payload to path on the target in a single Write call. The +// NFS WRITE3 RPC chunks internally, so this exists purely so tests read +// linearly. 
+func writeAll(t *testing.T, target *nfsclient.Target, remotePath string, payload []byte) { + t.Helper() + file, err := target.OpenFile(remotePath, 0o644) + require.NoError(t, err, "open %s for write", remotePath) + if len(payload) > 0 { + n, err := file.Write(payload) + require.NoError(t, err, "write %s", remotePath) + require.Equal(t, len(payload), n, "short write on %s", remotePath) + } + require.NoError(t, file.Close(), "close %s", remotePath) +} + +// readAll opens path on the target and returns the full file contents. +func readAll(t *testing.T, target *nfsclient.Target, remotePath string) []byte { + t.Helper() + file, err := target.Open(remotePath) + require.NoError(t, err, "open %s for read", remotePath) + defer file.Close() + content, err := io.ReadAll(file) + require.NoError(t, err, "read %s", remotePath) + return content +} + +// TestNfsBasicReadWrite exercises the most common NFS path: OpenFile + Write +// + Close followed by Open + Read to verify round-trip data integrity. +func TestNfsBasicReadWrite(t *testing.T) { + fw := setupFramework(t) + target, cleanup, err := fw.Mount() + require.NoError(t, err) + defer cleanup() + + payload := []byte("hello from seaweedfs nfs integration test") + writeAll(t, target, "/hello.txt", payload) + + got := readAll(t, target, "/hello.txt") + assert.Equal(t, payload, got, "round-tripped content must match") + + info, err := target.Getattr("/hello.txt") + require.NoError(t, err) + assert.Equal(t, int64(len(payload)), int64(info.Filesize)) +} + +// TestNfsMkdirAndRmdir covers Mkdir, ReadDirPlus, and RmDir. The readdir +// assertion also verifies that the newly-created directory shows up under +// the export root the way a POSIX client would expect. 
+func TestNfsMkdirAndRmdir(t *testing.T) { + fw := setupFramework(t) + target, cleanup, err := fw.Mount() + require.NoError(t, err) + defer cleanup() + + _, err = target.Mkdir("/dir1", 0o755) + require.NoError(t, err) + + entries, err := target.ReadDirPlus("/") + require.NoError(t, err) + found := false + for _, entry := range entries { + if entry.Name() == "dir1" { + found = true + assert.True(t, entry.IsDir(), "dir1 should be a directory") + } + } + assert.True(t, found, "expected dir1 in readdir listing") + + require.NoError(t, target.RmDir("/dir1")) + + // After removal, dir1 must be gone from the listing. + entries, err = target.ReadDirPlus("/") + require.NoError(t, err) + for _, entry := range entries { + assert.NotEqual(t, "dir1", entry.Name(), "dir1 should be removed") + } +} + +// TestNfsNestedDirectories ensures the server can materialise a deep tree in +// a single Mkdir-per-segment sequence and that reads/writes work at the +// leaves. +func TestNfsNestedDirectories(t *testing.T) { + fw := setupFramework(t) + target, cleanup, err := fw.Mount() + require.NoError(t, err) + defer cleanup() + + for _, segment := range []string{"/a", "/a/b", "/a/b/c"} { + _, err := target.Mkdir(segment, 0o755) + require.NoError(t, err, "mkdir %s", segment) + } + + payload := []byte("deep path content") + writeAll(t, target, "/a/b/c/leaf.txt", payload) + + got := readAll(t, target, "/a/b/c/leaf.txt") + assert.Equal(t, payload, got) + + require.NoError(t, target.Remove("/a/b/c/leaf.txt")) + require.NoError(t, target.RmDir("/a/b/c")) + require.NoError(t, target.RmDir("/a/b")) + require.NoError(t, target.RmDir("/a")) +} + +// TestNfsRenamePreservesContent renames a file and makes sure the content +// at the new path matches what was written at the old one, and that the +// old path disappears. It does not assert on inode identity because pjdfstest +// already covers that and this test intentionally avoids depending on the +// mount-side identity plumbing. 
+func TestNfsRenamePreservesContent(t *testing.T) { + fw := setupFramework(t) + target, cleanup, err := fw.Mount() + require.NoError(t, err) + defer cleanup() + + payload := []byte("rename me") + writeAll(t, target, "/src.txt", payload) + + require.NoError(t, target.Rename("/src.txt", "/dst.txt")) + + _, _, err = target.Lookup("/src.txt") + assert.Error(t, err, "source should be gone after rename") + + got := readAll(t, target, "/dst.txt") + assert.Equal(t, payload, got) + + require.NoError(t, target.Remove("/dst.txt")) +} + +// TestNfsOverwriteShrinksFile rewrites an existing file with shorter content +// and asserts Getattr reports the new (smaller) size. go-nfs-client's +// OpenFile does not pass O_TRUNC, so the test truncates explicitly via +// Setattr(size=0) before the second write — mirroring what `echo >file` +// does on a POSIX client. +func TestNfsOverwriteShrinksFile(t *testing.T) { + fw := setupFramework(t) + target, cleanup, err := fw.Mount() + require.NoError(t, err) + defer cleanup() + + writeAll(t, target, "/overwrite.txt", []byte("the quick brown fox")) + + require.NoError(t, target.Setattr("/overwrite.txt", nfsclient.Sattr3{ + Size: nfsclient.SetSize{SetIt: true, Size: 0}, + })) + + writeAll(t, target, "/overwrite.txt", []byte("short")) + + info, err := target.Getattr("/overwrite.txt") + require.NoError(t, err) + assert.Equal(t, int64(len("short")), int64(info.Filesize)) + + got := readAll(t, target, "/overwrite.txt") + assert.Equal(t, []byte("short"), got) + + require.NoError(t, target.Remove("/overwrite.txt")) +} + +// TestNfsLargeFile writes a multi-megabyte payload so the write path has to +// cut chunks and flush through the volume server rather than inlining +// content in the filer entry. 
+func TestNfsLargeFile(t *testing.T) { + fw := setupFramework(t) + target, cleanup, err := fw.Mount() + require.NoError(t, err) + defer cleanup() + + const size = 3 * 1024 * 1024 // 3 MiB, large enough that the write path has to cut chunks instead of inlining content + payload := make([]byte, size) + for i := range payload { + payload[i] = byte(i % 251) // prime period does not line up with power-of-two chunk boundaries, which helps catch offset bugs + } + + writeAll(t, target, "/big.bin", payload) + + info, err := target.Getattr("/big.bin") + require.NoError(t, err) + assert.Equal(t, int64(size), int64(info.Filesize)) + + got := readAll(t, target, "/big.bin") + require.Equal(t, size, len(got)) + assert.True(t, bytes.Equal(payload, got), "large file content must round-trip byte-for-byte") + + require.NoError(t, target.Remove("/big.bin")) +} + +// TestNfsBinaryAndEmptyFiles covers two edge-case payloads the write path +// tends to regress on: arbitrary binary bytes and zero-length files. +func TestNfsBinaryAndEmptyFiles(t *testing.T) { + fw := setupFramework(t) + target, cleanup, err := fw.Mount() + require.NoError(t, err) + defer cleanup() + + t.Run("AllByteValues", func(t *testing.T) { + payload := make([]byte, 256) + for i := range payload { + payload[i] = byte(i) + } + writeAll(t, target, "/binary.bin", payload) + assert.Equal(t, payload, readAll(t, target, "/binary.bin")) + require.NoError(t, target.Remove("/binary.bin")) + }) + + t.Run("EmptyFile", func(t *testing.T) { + writeAll(t, target, "/empty.txt", nil) + info, err := target.Getattr("/empty.txt") + require.NoError(t, err) + assert.Equal(t, int64(0), int64(info.Filesize)) + require.NoError(t, target.Remove("/empty.txt")) + }) +} + +// TestNfsSymlinkRoundTrip covers Symlink creation and a Lookup of the new link +// through the nfs server. Only the symlink mode bit is asserted; Readlink is +// not called here, and the server does not auto-traverse the link. 
+func TestNfsSymlinkRoundTrip(t *testing.T) { + fw := setupFramework(t) + target, cleanup, err := fw.Mount() + require.NoError(t, err) + defer cleanup() + + // Symlink uses a different RPC than open+create, and our server routes it + // through the billy Change interface. + require.NoError(t, target.Symlink("/target.txt", "/link.txt")) + + // The link target does not need to exist for the symlink lookup to succeed. + file, _, err := target.Lookup("/link.txt") + require.NoError(t, err, "lookup symlink") + assert.True(t, file.Mode()&os.ModeSymlink != 0, "expected symlink mode, got %s", file.Mode()) + + require.NoError(t, target.Remove("/link.txt")) +} + +// TestNfsReadDirPlusOrdering creates a handful of files with distinct names +// and ensures ReadDirPlus surfaces every one of them. The server pages +// listings from the filer, so we want to make sure nothing is truncated. +func TestNfsReadDirPlusOrdering(t *testing.T) { + fw := setupFramework(t) + target, cleanup, err := fw.Mount() + require.NoError(t, err) + defer cleanup() + + _, err = target.Mkdir("/listing", 0o755) + require.NoError(t, err) + + names := []string{"alpha.txt", "beta.txt", "gamma.txt", "delta.txt", "epsilon.txt"} + for _, name := range names { + writeAll(t, target, path.Join("/listing", name), []byte(name)) + } + + entries, err := target.ReadDirPlus("/listing") + require.NoError(t, err) + seen := make(map[string]struct{}, len(entries)) + for _, entry := range entries { + if entry.Name() == "." || entry.Name() == ".." { + continue + } + seen[entry.Name()] = struct{}{} + } + for _, name := range names { + _, ok := seen[name] + assert.True(t, ok, "expected %s in directory listing", name) + } + + for _, name := range names { + require.NoError(t, target.Remove(path.Join("/listing", name))) + } + require.NoError(t, target.RmDir("/listing")) +} + +// TestNfsRemoveMissingFailsCleanly asserts that removing a non-existent path +// surfaces an error instead of silently succeeding. 
A bug where the server +// returned NFS3_OK on missing entries would hide metadata drift. +func TestNfsRemoveMissingFailsCleanly(t *testing.T) { + fw := setupFramework(t) + target, cleanup, err := fw.Mount() + require.NoError(t, err) + defer cleanup() + + err = target.Remove("/does_not_exist.txt") + require.Error(t, err, "removing a missing file must error") + // NFS3 surfaces this as NFS3ERR_NOENT; make sure the error text is + // recognisable without locking us into the library's exact wording. + assert.True(t, + strings.Contains(strings.ToLower(err.Error()), "noent") || + strings.Contains(strings.ToLower(err.Error()), "not exist") || + strings.Contains(strings.ToLower(err.Error()), "no such"), + "unexpected error shape: %v", err) +} + +// TestNfsFSInfoReturnsSaneLimits pokes at FSINFO so we catch regressions +// where the server advertises zero read/write limits (which would make +// clients fall back to the 8 KiB floor and slow every test that follows). +func TestNfsFSInfoReturnsSaneLimits(t *testing.T) { + fw := setupFramework(t) + target, cleanup, err := fw.Mount() + require.NoError(t, err) + defer cleanup() + + info, err := target.FSInfo() + require.NoError(t, err) + require.NotNil(t, info) + assert.Greater(t, info.RTPref, uint32(0), "rtpref must be positive") + assert.Greater(t, info.WTPref, uint32(0), "wtpref must be positive") +} + +// TestNfsAppendIsSequential writes two chunks to the same file in separate +// OpenFile cycles and asserts the concatenation is preserved. go-nfs-client's +// OpenFile does not pass flags such as O_APPEND, so the test reopens the file +// and seeks to the end of the first chunk before writing the second. 
+func TestNfsAppendIsSequential(t *testing.T) { + fw := setupFramework(t) + target, cleanup, err := fw.Mount() + require.NoError(t, err) + defer cleanup() + + const prefix = "part1-" + const suffix = "part2" + + writeAll(t, target, "/concat.txt", []byte(prefix)) + + file, err := target.OpenFile("/concat.txt", 0o644) + require.NoError(t, err) + // Seek to end before writing so we append rather than overwrite. go-nfs + // client's File.Seek uses the same offset tracking as Write so this is + // enough to place the second chunk after the first. + _, err = file.Seek(int64(len(prefix)), io.SeekStart) + require.NoError(t, err) + _, err = file.Write([]byte(suffix)) + require.NoError(t, err) + require.NoError(t, file.Close()) + + got := readAll(t, target, "/concat.txt") + assert.Equal(t, prefix+suffix, string(got)) + + require.NoError(t, target.Remove("/concat.txt")) +} + +// Regression: readdir should not emit stale entries after a remove. This is +// the scenario the PR's meta cache invalidation logic was written to fix. +func TestNfsReadDirAfterRemove(t *testing.T) { + fw := setupFramework(t) + target, cleanup, err := fw.Mount() + require.NoError(t, err) + defer cleanup() + + _, err = target.Mkdir("/churn", 0o755) + require.NoError(t, err) + for i := 0; i < 5; i++ { + writeAll(t, target, path.Join("/churn", fmt.Sprintf("f%d.txt", i)), []byte{byte(i)}) + } + // Remove the middle one and re-list. 
+ require.NoError(t, target.Remove("/churn/f2.txt")) + + entries, err := target.ReadDirPlus("/churn") + require.NoError(t, err) + for _, entry := range entries { + assert.NotEqual(t, "f2.txt", entry.Name(), "removed file should not reappear in listing") + } + + for i := 0; i < 5; i++ { + if i == 2 { + continue + } + require.NoError(t, target.Remove(path.Join("/churn", fmt.Sprintf("f%d.txt", i)))) + } + require.NoError(t, target.RmDir("/churn")) +} diff --git a/test/nfs/framework.go b/test/nfs/framework.go new file mode 100644 index 000000000..eb1e9f380 --- /dev/null +++ b/test/nfs/framework.go @@ -0,0 +1,423 @@ +package nfs + +import ( + "bytes" + "fmt" + "io" + "mime/multipart" + "net" + "net/http" + "os" + "os/exec" + "path/filepath" + "runtime" + "strings" + "syscall" + "testing" + "time" + + "github.com/seaweedfs/seaweedfs/test/testutil" + "github.com/stretchr/testify/require" + nfsclient "github.com/willscott/go-nfs-client/nfs" + "github.com/willscott/go-nfs-client/nfs/rpc" +) + +// NfsTestFramework boots a minimal SeaweedFS cluster (master + volume + filer) +// plus the experimental `weed nfs` frontend and hands out NFSv3 RPC clients +// that talk to it. Everything is driven via subprocesses so the tests exercise +// the same binary an operator would deploy, and no kernel mount is required. +type NfsTestFramework struct { + t *testing.T + tempDir string + dataDir string + masterProcess *os.Process + volumeProcess *os.Process + filerProcess *os.Process + nfsProcess *os.Process + masterAddr string + masterGrpc int + volumeAddr string + volumeGrpc int + filerAddr string + filerGrpc int + nfsAddr string + exportRoot string + weedBinary string + isSetup bool + skipCleanup bool +} + +// TestConfig controls how the framework boots the cluster. +type TestConfig struct { + NumVolumes int + EnableDebug bool + SkipCleanup bool // keep temp dir on failure for inspection + // ExportRoot is the filer path the NFS server exports. 
Defaults to "/" + // so tests can use any path, with a single warning logged by the server. + ExportRoot string +} + +// DefaultTestConfig returns the defaults used by most tests. A dedicated +// /nfs_export subtree is used as the NFS export root because the NFS server +// requires the export directory to exist in the filer's namespace and carry +// a non-zero inode — passing "/" would succeed only for filer setups that +// have already backfilled the root inode. +func DefaultTestConfig() *TestConfig { + return &TestConfig{ + NumVolumes: 3, + EnableDebug: false, + SkipCleanup: false, + ExportRoot: "/nfs_export", + } +} + +// NewNfsTestFramework allocates a framework bound to the current test. Call +// Setup next to actually start the cluster. +func NewNfsTestFramework(t *testing.T, config *TestConfig) *NfsTestFramework { + if config == nil { + config = DefaultTestConfig() + } + + tempDir, err := os.MkdirTemp("", "seaweedfs_nfs_test_") + require.NoError(t, err) + + // testutil.MustAllocatePorts holds every listener open until the full + // batch has been reserved, which avoids the "close-then-hope" race my + // original per-port helper had. We need seven ports: four HTTP (master, + // volume, filer, nfs) and three gRPC (master, volume, filer — nfs has + // no gRPC endpoint). 
+ ports := testutil.MustAllocatePorts(t, 7) + + exportRoot := config.ExportRoot + if exportRoot == "" { + exportRoot = "/" + } + + return &NfsTestFramework{ + t: t, + tempDir: tempDir, + dataDir: filepath.Join(tempDir, "data"), + masterAddr: fmt.Sprintf("127.0.0.1:%d", ports[0]), + masterGrpc: ports[1], + volumeAddr: fmt.Sprintf("127.0.0.1:%d", ports[2]), + volumeGrpc: ports[3], + filerAddr: fmt.Sprintf("127.0.0.1:%d", ports[4]), + filerGrpc: ports[5], + nfsAddr: fmt.Sprintf("127.0.0.1:%d", ports[6]), + exportRoot: exportRoot, + weedBinary: findWeedBinary(), + isSetup: false, + skipCleanup: config.SkipCleanup, + } +} + +// Setup starts the SeaweedFS cluster and the NFS frontend, waiting for each +// component to accept connections before moving on. +func (f *NfsTestFramework) Setup(config *TestConfig) error { + if f.isSetup { + return fmt.Errorf("framework already setup") + } + + dirs := []string{ + f.dataDir, + filepath.Join(f.dataDir, "master"), + filepath.Join(f.dataDir, "volume"), + } + for _, dir := range dirs { + if err := os.MkdirAll(dir, 0755); err != nil { + return fmt.Errorf("failed to create directory %s: %v", dir, err) + } + } + + if err := f.startMaster(config); err != nil { + return fmt.Errorf("failed to start master: %v", err) + } + if !testutil.WaitForPort(portFromAddr(f.masterAddr), testutil.SeaweedMiniStartupTimeout) { + return fmt.Errorf("master not ready at %s", f.masterAddr) + } + + if err := f.startVolumeServer(config); err != nil { + return fmt.Errorf("failed to start volume server: %v", err) + } + if !testutil.WaitForPort(portFromAddr(f.volumeAddr), testutil.SeaweedMiniStartupTimeout) { + return fmt.Errorf("volume server not ready at %s", f.volumeAddr) + } + + if err := f.startFiler(config); err != nil { + return fmt.Errorf("failed to start filer: %v", err) + } + if !testutil.WaitForPort(portFromAddr(f.filerAddr), testutil.SeaweedMiniStartupTimeout) { + return fmt.Errorf("filer not ready at %s", f.filerAddr) + } + + // Pre-create the export 
root in the filer's namespace. The NFS server + // expects its export directory to exist with a real inode; uploading a + // placeholder file creates the parent directory implicitly and then + // removing the file leaves the empty directory in place. + if f.exportRoot != "/" { + if err := f.ensureExportRootExists(); err != nil { + return fmt.Errorf("failed to pre-create export root %s: %v", f.exportRoot, err) + } + } + + if err := f.startNfsServer(config); err != nil { + return fmt.Errorf("failed to start NFS server: %v", err) + } + if !testutil.WaitForPort(portFromAddr(f.nfsAddr), testutil.SeaweedMiniStartupTimeout) { + return fmt.Errorf("NFS server not ready at %s", f.nfsAddr) + } + + // Let the NFS server finish wiring up its gRPC subscription to the filer + // before the first client call hits MOUNT/LOOKUP. + time.Sleep(500 * time.Millisecond) + + f.isSetup = true + return nil +} + +// Cleanup stops all processes. Temp state is preserved if SkipCleanup is set. +func (f *NfsTestFramework) Cleanup() { + processes := []*os.Process{f.nfsProcess, f.filerProcess, f.volumeProcess, f.masterProcess} + for _, proc := range processes { + if proc != nil { + _ = proc.Signal(syscall.SIGTERM) + _, _ = proc.Wait() + } + } + if !f.skipCleanup { + _ = os.RemoveAll(f.tempDir) + } +} + +// NfsAddr returns the TCP address the NFS server is listening on. +func (f *NfsTestFramework) NfsAddr() string { return f.nfsAddr } + +// FilerAddr returns the TCP address of the filer. +func (f *NfsTestFramework) FilerAddr() string { return f.filerAddr } + +// ExportRoot returns the path the NFS server exports. +func (f *NfsTestFramework) ExportRoot() string { return f.exportRoot } + +// Mount opens an NFSv3 MOUNT+NFS connection against the running NFS server +// and returns a Target that tests can drive like a mini-VFS. Caller is +// responsible for calling the returned cleanup func to Unmount and close the +// TCP connection. 
+func (f *NfsTestFramework) Mount() (*nfsclient.Target, func(), error) { + var ( + client *rpc.Client + err error + ) + // The NFS server's TCP listener may already be accepting connections when + // testutil.WaitForPort returns in Setup, but the RPC program registration + // can trail it by a few milliseconds. Retry the dial to absorb that small window. + for attempt := 0; attempt < 20; attempt++ { + client, err = rpc.DialTCP("tcp", f.nfsAddr, false) + if err == nil { + break + } + time.Sleep(25 * time.Millisecond) + } + if err != nil { + return nil, nil, fmt.Errorf("dial NFS: %w", err) + } + + // Note: do not set Mount.Addr here. When Addr is non-empty, the go-nfs + // client re-dials via portmapper and concatenates `:111` onto the + // address, which produces "too many colons" for a raw `host:port` + // string. Reusing the existing RPC client avoids that path entirely. + mounter := &nfsclient.Mount{Client: client} + target, err := mounter.Mount(f.exportRoot, rpc.AuthNull) + if err != nil { + client.Close() + return nil, nil, fmt.Errorf("mount %s: %w", f.exportRoot, err) + } + + cleanup := func() { + _ = mounter.Unmount() + client.Close() + } + return target, cleanup, nil +} + +func (f *NfsTestFramework) startMaster(config *TestConfig) error { + _, masterPort := splitHostPort(f.masterAddr) + args := []string{ + "master", + "-ip=127.0.0.1", + fmt.Sprintf("-port=%d", masterPort), + fmt.Sprintf("-port.grpc=%d", f.masterGrpc), + "-mdir=" + filepath.Join(f.dataDir, "master"), + "-raftBootstrap", + "-peers=none", + } + return f.startProcess(&f.masterProcess, config, args) +} + +func (f *NfsTestFramework) startVolumeServer(config *TestConfig) error { + _, volumePort := splitHostPort(f.volumeAddr) + // pb.ServerAddress encodes a non-default gRPC port as `host:port.grpc`. + // See weed/pb/server_address.go — the dot, not a colon, is the separator + // between the HTTP port and the gRPC port. 
+ masterWithGrpc := fmt.Sprintf("%s.%d", f.masterAddr, f.masterGrpc) + args := []string{ + "volume", + "-master=" + masterWithGrpc, + "-ip=127.0.0.1", + fmt.Sprintf("-port=%d", volumePort), + fmt.Sprintf("-port.grpc=%d", f.volumeGrpc), + "-dir=" + filepath.Join(f.dataDir, "volume"), + fmt.Sprintf("-max=%d", config.NumVolumes), + } + return f.startProcess(&f.volumeProcess, config, args) +} + +func (f *NfsTestFramework) startFiler(config *TestConfig) error { + _, filerPort := splitHostPort(f.filerAddr) + masterWithGrpc := fmt.Sprintf("%s.%d", f.masterAddr, f.masterGrpc) + args := []string{ + "filer", + "-master=" + masterWithGrpc, + "-ip=127.0.0.1", + fmt.Sprintf("-port=%d", filerPort), + fmt.Sprintf("-port.grpc=%d", f.filerGrpc), + } + return f.startProcess(&f.filerProcess, config, args) +} + +func (f *NfsTestFramework) startNfsServer(config *TestConfig) error { + _, nfsPort := splitHostPort(f.nfsAddr) + // `host:port.grpc` encoding — see pb/server_address.go. + filerWithGrpc := fmt.Sprintf("%s.%d", f.filerAddr, f.filerGrpc) + args := []string{ + "nfs", + "-filer=" + filerWithGrpc, + "-ip.bind=127.0.0.1", + fmt.Sprintf("-port=%d", nfsPort), + "-filer.path=" + f.exportRoot, + } + return f.startProcess(&f.nfsProcess, config, args) +} + +func (f *NfsTestFramework) startProcess(target **os.Process, config *TestConfig, args []string) error { + cmd := exec.Command(f.weedBinary, args...) + cmd.Dir = f.tempDir + if config.EnableDebug { + cmd.Stdout = os.Stdout + cmd.Stderr = os.Stderr + } + if err := cmd.Start(); err != nil { + return err + } + *target = cmd.Process + return nil +} + +// portFromAddr returns just the port number from a `host:port` string. +// testutil.WaitForPort takes an int port, not a full address. +func portFromAddr(addr string) int { + _, port := splitHostPort(addr) + return port +} + +// ensureExportRootExists posts a placeholder file to f.exportRoot via the +// filer's HTTP API, then deletes it. 
That roundtrip implicitly creates the +// target directory so the NFS server has something to mount. We bypass +// weed/pb here because the HTTP client is simpler and needs no gRPC stubs. +func (f *NfsTestFramework) ensureExportRootExists() error { + exportRoot := strings.TrimRight(f.exportRoot, "/") + if exportRoot == "" { + return nil + } + placeholder := exportRoot + "/.nfs_test_init" + filerURL := "http://" + f.filerAddr + placeholder + + var body bytes.Buffer + writer := multipart.NewWriter(&body) + part, err := writer.CreateFormFile("file", ".nfs_test_init") + if err != nil { + return err + } + if _, err := io.WriteString(part, ""); err != nil { + return err + } + if err := writer.Close(); err != nil { + return err + } + + httpClient := &http.Client{Timeout: 10 * time.Second} + req, err := http.NewRequest(http.MethodPost, filerURL, &body) + if err != nil { + return err + } + req.Header.Set("Content-Type", writer.FormDataContentType()) + resp, err := httpClient.Do(req) + if err != nil { + return err + } + _, _ = io.Copy(io.Discard, resp.Body) + resp.Body.Close() + if resp.StatusCode/100 != 2 { + return fmt.Errorf("filer POST %s returned status %d", filerURL, resp.StatusCode) + } + + // Delete the placeholder; the directory stays behind. 
+ deleteReq, err := http.NewRequest(http.MethodDelete, filerURL, nil) + if err != nil { + return err + } + deleteResp, err := httpClient.Do(deleteReq) + if err != nil { + return err + } + _, _ = io.Copy(io.Discard, deleteResp.Body) + deleteResp.Body.Close() + if deleteResp.StatusCode/100 != 2 && deleteResp.StatusCode != http.StatusNotFound { + return fmt.Errorf("filer DELETE %s returned status %d", filerURL, deleteResp.StatusCode) + } + return nil +} + +func splitHostPort(addr string) (string, int) { + host, portStr, err := net.SplitHostPort(addr) + if err != nil { + return "", 0 + } + var port int + _, _ = fmt.Sscanf(portStr, "%d", &port) + return host, port +} + +// findWeedBinary locates the weed binary, preferring the local build in the +// checkout so tests run against the code under review rather than whatever is +// on $PATH. +func findWeedBinary() string { + if _, thisFile, _, ok := runtime.Caller(0); ok { + thisDir := filepath.Dir(thisFile) + candidates := []string{ + filepath.Join(thisDir, "../../weed/weed"), + filepath.Join(thisDir, "../weed/weed"), + } + for _, candidate := range candidates { + if _, err := os.Stat(candidate); err == nil { + abs, _ := filepath.Abs(candidate) + return abs + } + } + } + cwd, _ := os.Getwd() + candidates := []string{ + filepath.Join(cwd, "../../weed/weed"), + filepath.Join(cwd, "../weed/weed"), + filepath.Join(cwd, "./weed"), + } + for _, candidate := range candidates { + if _, err := os.Stat(candidate); err == nil { + abs, _ := filepath.Abs(candidate) + return abs + } + } + if path, err := exec.LookPath("weed"); err == nil { + return path + } + return "weed" +} diff --git a/test/nfs/go.mod b/test/nfs/go.mod new file mode 100644 index 000000000..cfb532528 --- /dev/null +++ b/test/nfs/go.mod @@ -0,0 +1,21 @@ +module seaweedfs-nfs-tests + +go 1.25.0 + +// test/testutil lives inside the main seaweedfs module; pull it in via a +// local replace so this integration suite can reuse the shared port +// allocator and readiness 
helpers instead of reinventing them. +replace github.com/seaweedfs/seaweedfs => ../.. + +require ( + github.com/seaweedfs/seaweedfs v0.0.0-00010101000000-000000000000 + github.com/stretchr/testify v1.11.1 + github.com/willscott/go-nfs-client v0.0.0-20251022144359-801f10d98886 +) + +require ( + github.com/davecgh/go-spew v1.1.2-0.20180830191138-d8f796af33cc // indirect + github.com/pmezard/go-difflib v1.0.1-0.20181226105442-5d4384ee4fb2 // indirect + github.com/rasky/go-xdr v0.0.0-20170124162913-1a41d1a06c93 // indirect + gopkg.in/yaml.v3 v3.0.1 // indirect +) diff --git a/test/nfs/go.sum b/test/nfs/go.sum new file mode 100644 index 000000000..b2bc6f9fb --- /dev/null +++ b/test/nfs/go.sum @@ -0,0 +1,14 @@ +github.com/davecgh/go-spew v1.1.2-0.20180830191138-d8f796af33cc h1:U9qPSI2PIWSS1VwoXQT9A3Wy9MM3WgvqSxFWenqJduM= +github.com/davecgh/go-spew v1.1.2-0.20180830191138-d8f796af33cc/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38= +github.com/pmezard/go-difflib v1.0.1-0.20181226105442-5d4384ee4fb2 h1:Jamvg5psRIccs7FGNTlIRMkT8wgtp5eCXdBlqhYGL6U= +github.com/pmezard/go-difflib v1.0.1-0.20181226105442-5d4384ee4fb2/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4= +github.com/rasky/go-xdr v0.0.0-20170124162913-1a41d1a06c93 h1:UVArwN/wkKjMVhh2EQGC0tEc1+FqiLlvYXY5mQ2f8Wg= +github.com/rasky/go-xdr v0.0.0-20170124162913-1a41d1a06c93/go.mod h1:Nfe4efndBz4TibWycNE+lqyJZiMX4ycx+QKV8Ta0f/o= +github.com/stretchr/testify v1.11.1 h1:7s2iGBzp5EwR7/aIZr8ao5+dra3wiQyKjjFuvgVKu7U= +github.com/stretchr/testify v1.11.1/go.mod h1:wZwfW3scLgRK+23gO65QZefKpKQRnfz6sD981Nm4B6U= +github.com/willscott/go-nfs-client v0.0.0-20251022144359-801f10d98886 h1:DtrBtkgTJk2XGt4T7eKdKVkd9A5NCevN2e4inLXtsqA= +github.com/willscott/go-nfs-client v0.0.0-20251022144359-801f10d98886/go.mod h1:Tq++Lr/FgiS3X48q5FETemXiSLGuYMQT2sPjYNPJSwA= +gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405 h1:yhCVgyC4o1eVCa2tZl7eS0r+SDo693bJlVdllGtEeKM= +gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod 
h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0= +gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA= +gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM= diff --git a/weed/command/command.go b/weed/command/command.go index 90eb3ad68..4baa92f04 100644 --- a/weed/command/command.go +++ b/weed/command/command.go @@ -47,6 +47,7 @@ var Commands = []*Command{ cmdVolume, cmdWebDav, cmdSftp, + cmdNfs, cmdWorker, } diff --git a/weed/command/nfs.go b/weed/command/nfs.go new file mode 100644 index 000000000..d1f01489d --- /dev/null +++ b/weed/command/nfs.go @@ -0,0 +1,100 @@ +package command + +import ( + "fmt" + + "github.com/seaweedfs/seaweedfs/weed/glog" + "github.com/seaweedfs/seaweedfs/weed/pb" + "github.com/seaweedfs/seaweedfs/weed/security" + weed_server_nfs "github.com/seaweedfs/seaweedfs/weed/server/nfs" + "github.com/seaweedfs/seaweedfs/weed/util" + "github.com/seaweedfs/seaweedfs/weed/util/version" +) + +var ( + nfsStandaloneOptions NfsOptions +) + +type NfsOptions struct { + filer *string + ipBind *string + port *int + filerRootPath *string + readOnly *bool + allowedClients *string + volumeServerAccess *string +} + +func init() { + cmdNfs.Run = runNfs // break init cycle + nfsStandaloneOptions.filer = cmdNfs.Flag.String("filer", "localhost:8888", "filer server address") + nfsStandaloneOptions.ipBind = cmdNfs.Flag.String("ip.bind", "127.0.0.1", "ip address to bind to. Defaults to loopback; override explicitly to expose the experimental server to the network.") + nfsStandaloneOptions.port = cmdNfs.Flag.Int("port", 2049, "NFS server listen port") + nfsStandaloneOptions.filerRootPath = cmdNfs.Flag.String("filer.path", "", "remote path from filer server to export. 
Required: no default is provided so operators must opt in to exporting a namespace subtree.") + nfsStandaloneOptions.readOnly = cmdNfs.Flag.Bool("readOnly", false, "export the filer path as read only") + nfsStandaloneOptions.allowedClients = cmdNfs.Flag.String("allowedClients", "", "comma-separated client IPs, hostnames, or CIDRs allowed to connect") + nfsStandaloneOptions.volumeServerAccess = cmdNfs.Flag.String("volumeServerAccess", "direct", "access volume servers by [direct|publicUrl|filerProxy]") +} + +var cmdNfs = &Command{ + UsageLine: "nfs -port=2049 -filer= -filer.path=", + Short: "start an experimental NFSv3 server backed by a filer", + Long: `start an experimental NFSv3 server backed by a filer. + +This command serves an experimental filer-native NFSv3 frontend with +deterministic filehandles, filer-backed metadata operations, and direct +volume-server data access for chunk reads and buffered writes. + +Safer defaults (since export ACLs are still not implemented): + + - ip.bind defaults to 127.0.0.1, so the server is not reachable from + other hosts unless you override it explicitly. + - filer.path has no default; you must pick the subtree to export. + +Override -ip.bind to a routable address only after you have reviewed +-allowedClients and the readiness of the rest of your deployment. 
+ `, +} + +func runNfs(cmd *Command, args []string) bool { + util.LoadSecurityConfiguration() + + if *nfsStandaloneOptions.ipBind == "" { + *nfsStandaloneOptions.ipBind = "127.0.0.1" + } + + if *nfsStandaloneOptions.filerRootPath == "" { + glog.Errorf("-filer.path is required: pick an explicit subtree to export; exporting \"/\" is not a default") + return false + } + if *nfsStandaloneOptions.filerRootPath == "/" { + glog.Warningf("-filer.path=/ exports the entire filer namespace; ensure -allowedClients or -ip.bind constrains access") + } + + listenAddress := fmt.Sprintf("%s:%d", *nfsStandaloneOptions.ipBind, *nfsStandaloneOptions.port) + glog.V(0).Infof("Starting Seaweed NFS Server %s at %s", version.Version(), listenAddress) + + grpcDialOption := security.LoadClientTLS(util.GetViper(), "grpc.client") + + nfsServer, err := weed_server_nfs.NewServer(&weed_server_nfs.Option{ + Filer: pb.ServerAddress(*nfsStandaloneOptions.filer), + BindIp: *nfsStandaloneOptions.ipBind, + Port: *nfsStandaloneOptions.port, + FilerRootPath: *nfsStandaloneOptions.filerRootPath, + ReadOnly: *nfsStandaloneOptions.readOnly, + AllowedClients: util.StringSplit(*nfsStandaloneOptions.allowedClients, ","), + VolumeServerAccess: *nfsStandaloneOptions.volumeServerAccess, + GrpcDialOption: grpcDialOption, + }) + if err != nil { + glog.Errorf("NFS Server startup error: %v", err) + return false + } + + if err := nfsServer.Start(); err != nil { + glog.Errorf("NFS Server startup error: %v", err) + return false + } + + return true +} diff --git a/weed/filer/filechunk_manifest.go b/weed/filer/filechunk_manifest.go index d4c423c9e..b82cdf52a 100644 --- a/weed/filer/filechunk_manifest.go +++ b/weed/filer/filechunk_manifest.go @@ -151,7 +151,7 @@ func retriedStreamFetchChunkData(ctx context.Context, writer io.Writer, urlStrin retriedCnt++ var localProcessed int var writeErr error - shouldRetry, err = util_http.ReadUrlAsStream(ctx, urlString+"?readDeleted=true", jwt, cipherKey, isGzipped, isFullChunk, 
offset, size, func(data []byte) { + shouldRetry, err = util_http.ReadUrlAsStream(ctx, util_http.AppendQueryParameter(urlString, "readDeleted", "true"), jwt, cipherKey, isGzipped, isFullChunk, offset, size, func(data []byte) { // Check for context cancellation during data processing select { case <-ctx.Done(): diff --git a/weed/filer/filer.go b/weed/filer/filer.go index c605396a6..d31c74fbc 100644 --- a/weed/filer/filer.go +++ b/weed/filer/filer.go @@ -13,6 +13,7 @@ import ( "github.com/seaweedfs/seaweedfs/weed/cluster/lock_manager" "github.com/seaweedfs/seaweedfs/weed/filer/empty_folder_cleanup" + "github.com/seaweedfs/seaweedfs/weed/sequence" "github.com/seaweedfs/seaweedfs/weed/cluster" "github.com/seaweedfs/seaweedfs/weed/pb" @@ -63,6 +64,7 @@ type Filer struct { DeletionRetryQueue *DeletionRetryQueue EmptyFolderCleaner *empty_folder_cleanup.EmptyFolderCleaner EmptyFolderCleanupDelay time.Duration + inodeSequencer sequence.Sequencer } func NewFiler(masters pb.ServerDiscovery, grpcDialOption grpc.DialOption, filerHost pb.ServerAddress, filerGroup string, collection string, replication string, dataCenter string, maxFilenameLength uint32, notifyFn func()) *Filer { @@ -77,6 +79,7 @@ func NewFiler(masters pb.ServerDiscovery, grpcDialOption grpc.DialOption, filerH MaxFilenameLength: maxFilenameLength, deletionQuit: make(chan struct{}), DeletionRetryQueue: NewDeletionRetryQueue(), + inodeSequencer: newInodeSequencer(filerHost), } if f.UniqueFilerId < 0 { f.UniqueFilerId = -f.UniqueFilerId @@ -231,6 +234,7 @@ func (f *Filer) CreateEntry(ctx context.Context, entry *Entry, o_excl bool, isFr */ if oldEntry == nil { + f.ensureEntryInode(entry) if !skipCreateParentDir { dirParts := strings.Split(string(entry.FullPath), "/") @@ -315,6 +319,7 @@ func (f *Filer) ensureParentDirectoryEntry(ctx context.Context, entry *Entry, di GroupNames: entry.GroupNames, }, } + f.ensureEntryInode(dirEntry) if isUnderBuckets && level > 3 { // Parent directories under buckets are created 
automatically; no additional logging. } @@ -351,6 +356,12 @@ func (f *Filer) ensureParentDirectoryEntry(ctx context.Context, entry *Entry, di func (f *Filer) UpdateEntry(ctx context.Context, oldEntry, entry *Entry) (err error) { if oldEntry != nil { entry.Attr.Crtime = oldEntry.Attr.Crtime + if oldEntry.Attr.Inode != 0 { + // Object identity must not change on in-place updates. + entry.Attr.Inode = oldEntry.Attr.Inode + } else { + f.ensureEntryInode(entry) + } if oldEntry.IsDirectory() && !entry.IsDirectory() { glog.ErrorfCtx(ctx, "existing %s is a directory", oldEntry.FullPath) return fmt.Errorf("%s: %w", oldEntry.FullPath, filer_pb.ErrExistingIsDirectory) diff --git a/weed/filer/filer_inode.go b/weed/filer/filer_inode.go new file mode 100644 index 000000000..e4cacfa27 --- /dev/null +++ b/weed/filer/filer_inode.go @@ -0,0 +1,51 @@ +package filer + +import ( + "os" + "strconv" + + "github.com/seaweedfs/seaweedfs/weed/glog" + "github.com/seaweedfs/seaweedfs/weed/pb" + "github.com/seaweedfs/seaweedfs/weed/sequence" +) + +// newInodeSequencer constructs the inode sequencer used to assign object +// identity for filer entries. The Snowflake node id defaults to a masked hash +// of filerHost, which only has 1024 possible values; operators running a +// multi-filer cluster should set SEAWEEDFS_FILER_SNOWFLAKE_ID to an explicit +// per-filer value (1..1023) to avoid birthday-paradox collisions. +// +// Initialization failures are fatal: a process-local fallback allocator would +// re-use inode values across restarts and violate the stable object identity +// guarantee that NFS filehandles and the inode secondary index rely on. 
+func newInodeSequencer(filerHost pb.ServerAddress) sequence.Sequencer { + snowflakeId := parseSnowflakeIdFromEnv() + seq, err := sequence.NewSnowflakeSequencer(string(filerHost), snowflakeId) + if err != nil { + glog.Fatalf("initialize inode sequencer for filer %s (snowflakeId=%d): %v", filerHost, snowflakeId, err) + } + return seq +} + +func parseSnowflakeIdFromEnv() int { + raw := os.Getenv("SEAWEEDFS_FILER_SNOWFLAKE_ID") + if raw == "" { + return 0 + } + id, err := strconv.Atoi(raw) + if err != nil || id < 0 || id > 0x3ff { + glog.Fatalf("SEAWEEDFS_FILER_SNOWFLAKE_ID must be an integer in [0,1023], got %q", raw) + } + return id +} + +func (f *Filer) ensureEntryInode(entry *Entry) { + if entry == nil || entry.Attr.Inode != 0 { + return + } + entry.Attr.Inode = f.nextInode() +} + +func (f *Filer) nextInode() uint64 { + return f.inodeSequencer.NextFileId(1) +} diff --git a/weed/filer/filer_inode_index.go b/weed/filer/filer_inode_index.go new file mode 100644 index 000000000..ba0e46ae7 --- /dev/null +++ b/weed/filer/filer_inode_index.go @@ -0,0 +1,300 @@ +package filer + +import ( + "context" + "encoding/binary" + "encoding/json" + "sort" + + "github.com/seaweedfs/seaweedfs/weed/glog" + "github.com/seaweedfs/seaweedfs/weed/util" +) + +const inodeIndexKeyPrefix = "filer.inode.path." +const InodeIndexInitialGeneration uint64 = 1 + +type inodeIndexEntry struct { + path util.FullPath + inode uint64 +} + +type InodeIndexRecord struct { + Generation uint64 `json:"generation,omitempty"` + Paths []string `json:"paths,omitempty"` +} + +func InodeIndexKey(inode uint64) []byte { + key := make([]byte, len(inodeIndexKeyPrefix)+8) + copy(key, inodeIndexKeyPrefix) + binary.BigEndian.PutUint64(key[len(inodeIndexKeyPrefix):], inode) + return key +} + +func DecodeInodeIndexRecord(value []byte) (*InodeIndexRecord, error) { + if len(value) == 0 { + return &InodeIndexRecord{}, nil + } + + // The first foundation slice stored the current path as raw bytes. 
Keep that + // format readable so existing records are transparently upgraded on write. + if value[0] != '{' { + record := &InodeIndexRecord{Generation: InodeIndexInitialGeneration} + record.addPath(util.FullPath(value)) + return record, nil + } + + record := &InodeIndexRecord{} + if err := json.Unmarshal(value, record); err != nil { + return nil, err + } + record.normalize() + return record, nil +} + +func (record *InodeIndexRecord) Encode() ([]byte, error) { + record.normalize() + return json.Marshal(record) +} + +func (record *InodeIndexRecord) normalize() { + if len(record.Paths) == 0 { + return + } + if record.Generation == 0 { + record.Generation = InodeIndexInitialGeneration + } + + sanitized := make([]string, 0, len(record.Paths)) + for _, path := range record.Paths { + if path == "" { + continue + } + sanitized = append(sanitized, path) + } + if len(sanitized) == 0 { + record.Paths = nil + return + } + + sort.Strings(sanitized) + deduped := sanitized[:1] + for _, path := range sanitized[1:] { + if path == deduped[len(deduped)-1] { + continue + } + deduped = append(deduped, path) + } + record.Paths = deduped +} + +func (record *InodeIndexRecord) addPath(path util.FullPath) bool { + if path == "" { + return false + } + record.normalize() + target := string(path) + index := sort.SearchStrings(record.Paths, target) + if index < len(record.Paths) && record.Paths[index] == target { + return false + } + record.Paths = append(record.Paths, "") + copy(record.Paths[index+1:], record.Paths[index:]) + record.Paths[index] = target + return true +} + +func (record *InodeIndexRecord) removePath(path util.FullPath) bool { + if len(record.Paths) == 0 || path == "" { + return false + } + record.normalize() + target := string(path) + index := sort.SearchStrings(record.Paths, target) + if index >= len(record.Paths) || record.Paths[index] != target { + return false + } + record.Paths = append(record.Paths[:index], record.Paths[index+1:]...) 
+ if len(record.Paths) == 0 { + record.Paths = nil + } + return true +} + +func (record *InodeIndexRecord) CanonicalPath() util.FullPath { + record.normalize() + if len(record.Paths) == 0 { + return "" + } + return util.FullPath(record.Paths[0]) +} + +func (record *InodeIndexRecord) FullPaths() []util.FullPath { + record.normalize() + if len(record.Paths) == 0 { + return nil + } + paths := make([]util.FullPath, 0, len(record.Paths)) + for _, path := range record.Paths { + paths = append(paths, util.FullPath(path)) + } + return paths +} + +func (fsw *FilerStoreWrapper) lookupInodeIndex(ctx context.Context, inode uint64) (*InodeIndexRecord, error) { + if inode == 0 { + return nil, ErrKvNotFound + } + + value, err := fsw.KvGet(ctx, InodeIndexKey(inode)) + if err != nil { + return nil, err + } + + return DecodeInodeIndexRecord(value) +} + +func (fsw *FilerStoreWrapper) storeInodeIndex(ctx context.Context, path util.FullPath, inode uint64) error { + if inode == 0 || path == "" { + return nil + } + + record, err := fsw.lookupInodeIndex(ctx, inode) + if err != nil { + if err != ErrKvNotFound { + return err + } + record = &InodeIndexRecord{Generation: InodeIndexInitialGeneration} + } + record.addPath(path) + + value, err := record.Encode() + if err != nil { + return err + } + return fsw.KvPut(ctx, InodeIndexKey(inode), value) +} + +func (fsw *FilerStoreWrapper) lookupInodePath(ctx context.Context, inode uint64) (util.FullPath, error) { + record, err := fsw.lookupInodeIndex(ctx, inode) + if err != nil { + return "", err + } + + path := record.CanonicalPath() + if path == "" { + return "", ErrKvNotFound + } + return path, nil +} + +func (fsw *FilerStoreWrapper) lookupInodePaths(ctx context.Context, inode uint64) ([]util.FullPath, error) { + record, err := fsw.lookupInodeIndex(ctx, inode) + if err != nil { + return nil, err + } + + paths := record.FullPaths() + if len(paths) == 0 { + return nil, ErrKvNotFound + } + return paths, nil +} + +func (fsw *FilerStoreWrapper) 
removePathFromInodeIndex(ctx context.Context, path util.FullPath, inode uint64) error { + if inode == 0 || path == "" { + return nil + } + + record, err := fsw.lookupInodeIndex(ctx, inode) + if err != nil { + if err == ErrKvNotFound { + return nil + } + return err + } + + if !record.removePath(path) { + return nil + } + if len(record.Paths) == 0 { + return fsw.KvDelete(ctx, InodeIndexKey(inode)) + } + + value, err := record.Encode() + if err != nil { + return err + } + return fsw.KvPut(ctx, InodeIndexKey(inode), value) +} + +func (fsw *FilerStoreWrapper) collectInodeIndexEntries(ctx context.Context, dirPath util.FullPath) ([]inodeIndexEntry, error) { + // Honor caller cancellation during the walk: a DeleteFolderChildren on a + // pathological directory could otherwise loop indefinitely gathering + // entries even after the client has given up, turning into a DoS vector. + // If the walk is aborted, the caller treats the index cleanup as + // best-effort and drops the partial result. + var collected []inodeIndexEntry + if err := fsw.collectInodeIndexEntriesRecursive(ctx, dirPath, &collected); err != nil { + return nil, err + } + return collected, nil +} + +func (fsw *FilerStoreWrapper) collectInodeIndexEntriesRecursive(ctx context.Context, dirPath util.FullPath, collected *[]inodeIndexEntry) error { + actualStore := fsw.getActualStore(dirPath + "/") + + lastFileName := "" + includeStartFile := false + for { + page := make([]*Entry, 0, PaginationSize) + nextLastFileName, err := actualStore.ListDirectoryEntries(ctx, dirPath, lastFileName, includeStartFile, PaginationSize, func(entry *Entry) (bool, error) { + page = append(page, entry) + return true, nil + }) + if err != nil { + return err + } + + for _, entry := range page { + if entry.Attr.Inode != 0 { + *collected = append(*collected, inodeIndexEntry{path: entry.FullPath, inode: entry.Attr.Inode}) + } + if entry.IsDirectory() { + if err := fsw.collectInodeIndexEntriesRecursive(ctx, entry.FullPath, collected); err != 
nil { + return err + } + } + } + + if len(page) < PaginationSize { + return nil + } + lastFileName = nextLastFileName + includeStartFile = false + } +} + +// recordInodeIndexWrite updates the inode→path secondary index after the +// primary store mutation has already succeeded. The index is best-effort: a +// failure here must not surface as an operation error, because the caller +// would then observe a failed create/update even though the entry was +// persisted, and a retry cannot heal the index (DeleteEntry exits early once +// the entry is gone). We log and let later writes rebuild the record. +func (fsw *FilerStoreWrapper) recordInodeIndexWrite(ctx context.Context, op string, path util.FullPath, inode uint64) { + if inode == 0 || path == "" { + return + } + if err := fsw.storeInodeIndex(ctx, path, inode); err != nil { + glog.WarningfCtx(ctx, "%s: update inode index for %s (inode %d): %v", op, path, inode, err) + } +} + +// recordInodeIndexRemoval mirrors recordInodeIndexWrite for removals. 
+func (fsw *FilerStoreWrapper) recordInodeIndexRemoval(ctx context.Context, op string, path util.FullPath, inode uint64) { + if inode == 0 || path == "" { + return + } + if err := fsw.removePathFromInodeIndex(ctx, path, inode); err != nil { + glog.WarningfCtx(ctx, "%s: clear inode index for %s (inode %d): %v", op, path, inode, err) + } +} diff --git a/weed/filer/filer_inode_index_test.go b/weed/filer/filer_inode_index_test.go new file mode 100644 index 000000000..bee0e6c65 --- /dev/null +++ b/weed/filer/filer_inode_index_test.go @@ -0,0 +1,206 @@ +package filer + +import ( + "context" + "os" + "testing" + + "github.com/seaweedfs/seaweedfs/weed/util" + "github.com/stretchr/testify/assert" + "github.com/stretchr/testify/require" +) + +func TestFilerStoreWrapperMaintainsInodeIndexLifecycle(t *testing.T) { + wrapper := NewFilerStoreWrapper(newStubFilerStore()) + ctx := context.Background() + + created := &Entry{ + FullPath: util.FullPath("/docs/report.txt"), + Attr: Attr{ + Mode: 0o644, + Inode: 42, + }, + } + + require.NoError(t, wrapper.InsertEntry(ctx, created)) + path, err := wrapper.lookupInodePath(ctx, created.Attr.Inode) + require.NoError(t, err) + assert.Equal(t, created.FullPath, path) + paths, err := wrapper.lookupInodePaths(ctx, created.Attr.Inode) + require.NoError(t, err) + assert.Equal(t, []util.FullPath{created.FullPath}, paths) + record, err := wrapper.lookupInodeIndex(ctx, created.Attr.Inode) + require.NoError(t, err) + assert.Equal(t, InodeIndexInitialGeneration, record.Generation) + + updated := &Entry{ + FullPath: util.FullPath("/docs/report.txt"), + Attr: Attr{ + Mode: 0o600, + Inode: 42, + }, + } + require.NoError(t, wrapper.UpdateEntry(ctx, updated)) + path, err = wrapper.lookupInodePath(ctx, updated.Attr.Inode) + require.NoError(t, err) + assert.Equal(t, updated.FullPath, path) + + require.NoError(t, wrapper.DeleteEntry(ctx, created.FullPath)) + _, err = wrapper.lookupInodePath(ctx, created.Attr.Inode) + require.ErrorIs(t, err, ErrKvNotFound) +} 
+ +func TestFilerStoreWrapperMaintainsMultiplePathsPerInode(t *testing.T) { + wrapper := NewFilerStoreWrapper(newStubFilerStore()) + ctx := context.Background() + inode := uint64(88) + hardLinkId := NewHardLinkId() + + require.NoError(t, wrapper.InsertEntry(ctx, &Entry{ + FullPath: util.FullPath("/links/b.txt"), + Attr: Attr{ + Mode: 0o644, + Inode: inode, + }, + HardLinkId: hardLinkId, + HardLinkCounter: 2, + })) + require.NoError(t, wrapper.InsertEntry(ctx, &Entry{ + FullPath: util.FullPath("/links/a.txt"), + Attr: Attr{ + Mode: 0o644, + Inode: inode, + }, + HardLinkId: hardLinkId, + HardLinkCounter: 2, + })) + + paths, err := wrapper.lookupInodePaths(ctx, inode) + require.NoError(t, err) + assert.Equal(t, []util.FullPath{"/links/a.txt", "/links/b.txt"}, paths) + record, err := wrapper.lookupInodeIndex(ctx, inode) + require.NoError(t, err) + assert.Equal(t, InodeIndexInitialGeneration, record.Generation) + + path, err := wrapper.lookupInodePath(ctx, inode) + require.NoError(t, err) + assert.Equal(t, util.FullPath("/links/a.txt"), path) + + require.NoError(t, wrapper.DeleteEntry(ctx, util.FullPath("/links/a.txt"))) + + paths, err = wrapper.lookupInodePaths(ctx, inode) + require.NoError(t, err) + assert.Equal(t, []util.FullPath{"/links/b.txt"}, paths) + + path, err = wrapper.lookupInodePath(ctx, inode) + require.NoError(t, err) + assert.Equal(t, util.FullPath("/links/b.txt"), path) +} + +func TestFilerStoreWrapperUpgradesLegacySinglePathInodeIndexRecords(t *testing.T) { + wrapper := NewFilerStoreWrapper(newStubFilerStore()) + ctx := context.Background() + inode := uint64(91) + + require.NoError(t, wrapper.KvPut(ctx, InodeIndexKey(inode), []byte("/legacy/path.txt"))) + + path, err := wrapper.lookupInodePath(ctx, inode) + require.NoError(t, err) + assert.Equal(t, util.FullPath("/legacy/path.txt"), path) + + paths, err := wrapper.lookupInodePaths(ctx, inode) + require.NoError(t, err) + assert.Equal(t, []util.FullPath{"/legacy/path.txt"}, paths) + + require.NoError(t, 
wrapper.storeInodeIndex(ctx, util.FullPath("/legacy/second.txt"), inode)) + + paths, err = wrapper.lookupInodePaths(ctx, inode) + require.NoError(t, err) + assert.Equal(t, []util.FullPath{"/legacy/path.txt", "/legacy/second.txt"}, paths) + + value, err := wrapper.KvGet(ctx, InodeIndexKey(inode)) + require.NoError(t, err) + assert.JSONEq(t, `{"generation":1,"paths":["/legacy/path.txt","/legacy/second.txt"]}`, string(value)) +} + +func TestFilerStoreWrapperKeepsInodeIndexWhenDeleteArrivesAfterRenameInsert(t *testing.T) { + wrapper := NewFilerStoreWrapper(newStubFilerStore()) + ctx := context.Background() + inode := uint64(77) + + require.NoError(t, wrapper.InsertEntry(ctx, &Entry{ + FullPath: util.FullPath("/old/name.txt"), + Attr: Attr{ + Mode: 0o644, + Inode: inode, + }, + })) + require.NoError(t, wrapper.InsertEntry(ctx, &Entry{ + FullPath: util.FullPath("/new/name.txt"), + Attr: Attr{ + Mode: 0o644, + Inode: inode, + }, + })) + require.NoError(t, wrapper.DeleteEntry(ctx, util.FullPath("/old/name.txt"))) + + path, err := wrapper.lookupInodePath(ctx, inode) + require.NoError(t, err) + assert.Equal(t, util.FullPath("/new/name.txt"), path) + + paths, err := wrapper.lookupInodePaths(ctx, inode) + require.NoError(t, err) + assert.Equal(t, []util.FullPath{"/new/name.txt"}, paths) +} + +func TestRecursiveDeleteRemovesDescendantInodeIndexes(t *testing.T) { + f, store := newTestFilerWithStubStore() + ctx := context.Background() + + entries := []*Entry{ + { + FullPath: util.FullPath("/tree"), + Attr: Attr{ + Mode: os.ModeDir | 0o755, + Inode: 100, + }, + }, + { + FullPath: util.FullPath("/tree/file.txt"), + Attr: Attr{ + Mode: 0o644, + Inode: 101, + }, + }, + { + FullPath: util.FullPath("/tree/subdir"), + Attr: Attr{ + Mode: os.ModeDir | 0o755, + Inode: 102, + }, + }, + { + FullPath: util.FullPath("/tree/subdir/nested.txt"), + Attr: Attr{ + Mode: 0o644, + Inode: 103, + }, + }, + } + + for _, entry := range entries { + require.NoError(t, f.Store.InsertEntry(ctx, entry)) + } 
+ + require.NoError(t, f.DeleteEntryMetaAndData(ctx, util.FullPath("/tree"), true, false, false, false, nil, 0)) + + for _, inode := range []uint64{100, 101, 102, 103} { + _, err := f.Store.(*FilerStoreWrapper).lookupInodePath(ctx, inode) + require.ErrorIs(t, err, ErrKvNotFound) + } + + for _, path := range []string{"/tree", "/tree/file.txt", "/tree/subdir", "/tree/subdir/nested.txt"} { + _, err := store.FindEntry(ctx, util.FullPath(path)) + require.Error(t, err) + } +} diff --git a/weed/filer/filer_inode_test.go b/weed/filer/filer_inode_test.go new file mode 100644 index 000000000..e965ab017 --- /dev/null +++ b/weed/filer/filer_inode_test.go @@ -0,0 +1,124 @@ +package filer + +import ( + "context" + "os" + "testing" + + "github.com/seaweedfs/seaweedfs/weed/pb" + "github.com/seaweedfs/seaweedfs/weed/util" + "github.com/stretchr/testify/assert" + "github.com/stretchr/testify/require" +) + +func newTestFilerWithStubStore() (*Filer, *stubFilerStore) { + store := newStubFilerStore() + f := NewFiler(pb.ServerDiscovery{}, nil, "", "", "", "", "", 255, nil) + f.Store = NewFilerStoreWrapper(store) + return f, store +} + +func TestCreateEntryAssignsInodeWhenMissing(t *testing.T) { + f, store := newTestFilerWithStubStore() + + entry := &Entry{ + FullPath: util.FullPath("/dir/file.txt"), + Attr: Attr{ + Mode: 0o644, + }, + } + + err := f.CreateEntry(context.Background(), entry, false, false, nil, false, f.MaxFilenameLength) + require.NoError(t, err) + + stored, findErr := store.FindEntry(context.Background(), entry.FullPath) + require.NoError(t, findErr) + require.NotNil(t, stored) + assert.NotZero(t, stored.Attr.Inode) + assert.NotEqual(t, uint64(1), stored.Attr.Inode) +} + +func TestCreateEntryAssignsInodesToAutoCreatedParents(t *testing.T) { + f, store := newTestFilerWithStubStore() + + entry := &Entry{ + FullPath: util.FullPath("/a/b/c.txt"), + Attr: Attr{ + Mode: 0o644, + }, + } + + err := f.CreateEntry(context.Background(), entry, false, false, nil, false, 
f.MaxFilenameLength) + require.NoError(t, err) + + for _, path := range []string{"/a", "/a/b", "/a/b/c.txt"} { + stored, findErr := store.FindEntry(context.Background(), util.FullPath(path)) + require.NoError(t, findErr, path) + require.NotNil(t, stored, path) + assert.NotZero(t, stored.Attr.Inode, path) + } +} + +func TestUpdateEntryPreservesExistingInode(t *testing.T) { + f, store := newTestFilerWithStubStore() + + original := &Entry{ + FullPath: util.FullPath("/doc.txt"), + Attr: Attr{ + Mode: 0o644, + Inode: 12345, + }, + } + require.NoError(t, store.InsertEntry(context.Background(), original)) + + updated := &Entry{ + FullPath: util.FullPath("/doc.txt"), + Attr: Attr{ + Mode: os.ModeDir | 0o755, + }, + } + + err := f.UpdateEntry(context.Background(), original, updated) + require.Error(t, err) + + updated = &Entry{ + FullPath: util.FullPath("/doc.txt"), + Attr: Attr{ + Mode: 0o600, + }, + } + err = f.UpdateEntry(context.Background(), original, updated) + require.NoError(t, err) + + stored, findErr := store.FindEntry(context.Background(), original.FullPath) + require.NoError(t, findErr) + require.NotNil(t, stored) + assert.Equal(t, uint64(12345), stored.Attr.Inode) +} + +func TestUpdateEntryBackfillsMissingLegacyInode(t *testing.T) { + f, store := newTestFilerWithStubStore() + + original := &Entry{ + FullPath: util.FullPath("/legacy.txt"), + Attr: Attr{ + Mode: 0o644, + }, + } + require.NoError(t, store.InsertEntry(context.Background(), original)) + + updated := &Entry{ + FullPath: util.FullPath("/legacy.txt"), + Attr: Attr{ + Mode: 0o640, + }, + } + err := f.UpdateEntry(context.Background(), original, updated) + require.NoError(t, err) + + stored, findErr := store.FindEntry(context.Background(), original.FullPath) + require.NoError(t, findErr) + require.NotNil(t, stored) + assert.NotZero(t, stored.Attr.Inode) + assert.NotEqual(t, uint64(1), stored.Attr.Inode) +} diff --git a/weed/filer/filer_lazy_remote_test.go b/weed/filer/filer_lazy_remote_test.go index 
6ad5e4449..18973c75a 100644 --- a/weed/filer/filer_lazy_remote_test.go +++ b/weed/filer/filer_lazy_remote_test.go @@ -276,6 +276,7 @@ func newTestFiler(t *testing.T, store *stubFilerStore, rs *FilerRemoteStorage) * FilerConf: NewFilerConf(), MaxFilenameLength: 255, MasterClient: mc, + inodeSequencer: newInodeSequencer("test-filer"), fileIdDeletionQueue: util.NewUnboundedQueue(), deletionQuit: make(chan struct{}), LocalMetaLogBuffer: log_buffer.NewLogBuffer("test", time.Minute, diff --git a/weed/filer/filerstore_wrapper.go b/weed/filer/filerstore_wrapper.go index 9b39d5eee..b9ce81427 100644 --- a/weed/filer/filerstore_wrapper.go +++ b/weed/filer/filerstore_wrapper.go @@ -132,6 +132,7 @@ func (fsw *FilerStoreWrapper) InsertEntry(ctx context.Context, entry *Entry) err return err } ctx = context.WithoutCancel(ctx) + fullPath := entry.FullPath actualStore := fsw.getActualStore(entry.FullPath) stats.FilerStoreCounter.WithLabelValues(actualStore.GetName(), "insert").Inc() start := time.Now() @@ -151,8 +152,11 @@ func (fsw *FilerStoreWrapper) InsertEntry(ctx context.Context, entry *Entry) err return err } - // glog.V(4).Infof("InsertEntry %s", entry.FullPath) - return actualStore.InsertEntry(ctx, entry) + if err := actualStore.InsertEntry(ctx, entry); err != nil { + return err + } + fsw.recordInodeIndexWrite(ctx, "InsertEntry", fullPath, entry.Attr.Inode) + return nil } // InsertEntryKnownAbsent skips the pre-insert FindEntry path when the caller has @@ -162,6 +166,7 @@ func (fsw *FilerStoreWrapper) InsertEntryKnownAbsent(ctx context.Context, entry return err } ctx = context.WithoutCancel(ctx) + fullPath := entry.FullPath actualStore := fsw.getActualStore(entry.FullPath) stats.FilerStoreCounter.WithLabelValues(actualStore.GetName(), "insert").Inc() start := time.Now() @@ -180,7 +185,11 @@ func (fsw *FilerStoreWrapper) InsertEntryKnownAbsent(ctx context.Context, entry } } - return actualStore.InsertEntry(ctx, entry) + if err := actualStore.InsertEntry(ctx, entry); err != 
nil { + return err + } + fsw.recordInodeIndexWrite(ctx, "InsertEntryKnownAbsent", fullPath, entry.Attr.Inode) + return nil } func (fsw *FilerStoreWrapper) UpdateEntry(ctx context.Context, entry *Entry) error { @@ -188,6 +197,7 @@ func (fsw *FilerStoreWrapper) UpdateEntry(ctx context.Context, entry *Entry) err return err } ctx = context.WithoutCancel(ctx) + fullPath := entry.FullPath actualStore := fsw.getActualStore(entry.FullPath) stats.FilerStoreCounter.WithLabelValues(actualStore.GetName(), "update").Inc() start := time.Now() @@ -207,8 +217,11 @@ func (fsw *FilerStoreWrapper) UpdateEntry(ctx context.Context, entry *Entry) err return err } - // glog.V(4).Infof("UpdateEntry %s", entry.FullPath) - return actualStore.UpdateEntry(ctx, entry) + if err := actualStore.UpdateEntry(ctx, entry); err != nil { + return err + } + fsw.recordInodeIndexWrite(ctx, "UpdateEntry", fullPath, entry.Attr.Inode) + return nil } func normalizeEntryMimeForStore(entry *Entry) { @@ -260,6 +273,8 @@ func (fsw *FilerStoreWrapper) DeleteEntry(ctx context.Context, fp util.FullPath) if findErr == filer_pb.ErrNotFound || existingEntry == nil { return nil } + inode := existingEntry.Attr.Inode + fullPath := existingEntry.FullPath if len(existingEntry.HardLinkId) != 0 { // remove hard link op := ctx.Value("OP") @@ -274,8 +289,11 @@ func (fsw *FilerStoreWrapper) DeleteEntry(ctx context.Context, fp util.FullPath) } } - // glog.V(4).Infof("DeleteEntry %s", fp) - return actualStore.DeleteEntry(ctx, fp) + if err := actualStore.DeleteEntry(ctx, fp); err != nil { + return err + } + fsw.recordInodeIndexRemoval(ctx, "DeleteEntry", fullPath, inode) + return nil } func (fsw *FilerStoreWrapper) DeleteOneEntry(ctx context.Context, existingEntry *Entry) (err error) { @@ -283,6 +301,8 @@ func (fsw *FilerStoreWrapper) DeleteOneEntry(ctx context.Context, existingEntry return err } ctx = context.WithoutCancel(ctx) + fullPath := existingEntry.FullPath + inode := existingEntry.Attr.Inode actualStore := 
fsw.getActualStore(existingEntry.FullPath) stats.FilerStoreCounter.WithLabelValues(actualStore.GetName(), "delete").Inc() start := time.Now() @@ -305,8 +325,11 @@ func (fsw *FilerStoreWrapper) DeleteOneEntry(ctx context.Context, existingEntry } } - // glog.V(4).Infof("DeleteOneEntry %s", existingEntry.FullPath) - return actualStore.DeleteEntry(ctx, existingEntry.FullPath) + if err := actualStore.DeleteEntry(ctx, existingEntry.FullPath); err != nil { + return err + } + fsw.recordInodeIndexRemoval(ctx, "DeleteOneEntry", fullPath, inode) + return nil } func (fsw *FilerStoreWrapper) DeleteFolderChildren(ctx context.Context, fp util.FullPath) (err error) { @@ -321,8 +344,20 @@ func (fsw *FilerStoreWrapper) DeleteFolderChildren(ctx context.Context, fp util. stats.FilerStoreHistogram.WithLabelValues(actualStore.GetName(), "deleteFolderChildren").Observe(time.Since(start).Seconds()) }() - // glog.V(4).Infof("DeleteFolderChildren %s", fp) - return actualStore.DeleteFolderChildren(ctx, fp) + collected, err := fsw.collectInodeIndexEntries(ctx, fp) + if err != nil { + // Index collection is best-effort: a failure here only prevents inode + // index housekeeping, not the directory removal itself. 
+ glog.WarningfCtx(ctx, "collectInodeIndexEntries %s: %v; deleting folder children without index cleanup", fp, err) + collected = nil + } + if err := actualStore.DeleteFolderChildren(ctx, fp); err != nil { + return err + } + for _, entry := range collected { + fsw.recordInodeIndexRemoval(ctx, "DeleteFolderChildren", entry.path, entry.inode) + } + return nil } func (fsw *FilerStoreWrapper) ListDirectoryEntries(ctx context.Context, dirPath util.FullPath, startFileName string, includeStartFile bool, limit int64, eachEntryFunc ListEachEntryFunc) (string, error) { diff --git a/weed/filer/gateway_upload.go b/weed/filer/gateway_upload.go new file mode 100644 index 000000000..507b5efa6 --- /dev/null +++ b/weed/filer/gateway_upload.go @@ -0,0 +1,168 @@ +package filer + +import ( + "errors" + "fmt" + "io" + + "github.com/seaweedfs/seaweedfs/weed/operation" + "github.com/seaweedfs/seaweedfs/weed/pb/filer_pb" +) + +// GatewayChunkUploader is the minimum surface the shared chunk-upload helper +// needs from a concrete uploader. It is satisfied by *operation.Uploader and +// can be mocked in tests without performing real HTTP uploads. +type GatewayChunkUploader interface { + UploadWithRetry( + filerClient filer_pb.FilerClient, + assignRequest *filer_pb.AssignVolumeRequest, + uploadOption *operation.UploadOption, + genFileUrlFn func(host, fileId string) string, + reader io.Reader, + ) (fileId string, uploadResult *operation.UploadResult, err error, data []byte) +} + +// GatewayChunkUploadRequest captures the inputs SaveGatewayDataAsChunk needs. +// All fields except Reader and FilerClient are optional; empty values map to +// sensible defaults that match what `weed filer`, `weed mount`, `weed nfs`, +// and the WebDAV gateway produced before this helper was factored out. +type GatewayChunkUploadRequest struct { + // FilerClient issues the AssignVolume RPC. Must be non-nil. + FilerClient filer_pb.FilerClient + // Uploader executes the HTTP upload.
When nil, operation.NewUploader() + // is used so the common case needs no explicit wiring. + Uploader GatewayChunkUploader + // Reader supplies the bytes to upload. Must be non-nil; the helper + // reads it fully and relies on UploadWithRetry to report short reads. + Reader io.Reader + + // Logical path of the target file on the filer. Used by AssignVolume + // for placement policies (replication, collection, etc.). + FullPath string + // File name used in the upload form; defaults to the last path segment + // of FullPath when empty. + Filename string + // Offset of the data inside the logical file — copied into the + // returned FileChunk.Offset so the filer's chunk view layer can + // reconstruct the file correctly. + Offset int64 + // TsNs is the per-chunk modification timestamp written onto the + // resulting FileChunk. Typically time.Now().UnixNano(). + TsNs int64 + + // Assign-time placement options. + Collection string + Replication string + TtlSec int32 + DiskType string + DataCenter string + + // Upload-time options. + Cipher bool + MimeType string + PairMap map[string]string + + // VolumeServerAccess selects how the filer generates the chunk URL. + // Supported values: "direct", "publicUrl", "filerProxy". When set to + // "filerProxy", FilerHTTPAddress must be non-empty. + VolumeServerAccess string + // FilerHTTPAddress is the filer host:port that proxies chunk URLs when + // VolumeServerAccess == "filerProxy". + FilerHTTPAddress string +} + +// SaveGatewayDataAsChunk uploads the bytes in `req.Reader` as a single chunk +// on a volume server and returns a filer FileChunk describing it. It is the +// shared chunk-upload path for the NFS, WebDAV, and future gateway servers; +// mount has extra per-request caching and a pre-allocated file-id pool that +// still live in weed/mount. +// +// Semantics: +// +// - AssignVolume is driven by the filer client in `req.FilerClient`. 
+// - The chunk URL is built by the standard operation.UploadOption machinery +// and then optionally rewritten when VolumeServerAccess == "filerProxy". +// - The returned chunk's Offset is `req.Offset` and its ModifiedTsNs is +// `req.TsNs`. Callers that append to an existing file are responsible +// for installing the chunk into the entry's chunk list (typically via a +// filer UpdateEntry). +func SaveGatewayDataAsChunk(req GatewayChunkUploadRequest) (*filer_pb.FileChunk, error) { + if req.FilerClient == nil { + return nil, errors.New("SaveGatewayDataAsChunk: nil filer client") + } + if req.Reader == nil { + return nil, errors.New("SaveGatewayDataAsChunk: nil reader") + } + + uploader := req.Uploader + if uploader == nil { + realUploader, err := operation.NewUploader() + if err != nil { + return nil, fmt.Errorf("SaveGatewayDataAsChunk: new uploader: %w", err) + } + uploader = realUploader + } + + filename := req.Filename + if filename == "" { + if slash := lastSlashIndex(req.FullPath); slash >= 0 && slash+1 < len(req.FullPath) { + filename = req.FullPath[slash+1:] + } else { + filename = req.FullPath + } + } + + uploadOption := &operation.UploadOption{ + Filename: filename, + Cipher: req.Cipher, + IsInputCompressed: false, + MimeType: req.MimeType, + PairMap: req.PairMap, + } + + genFileUrlFn := func(host, fileId string) string { + if req.VolumeServerAccess == "filerProxy" && req.FilerHTTPAddress != "" { + return fmt.Sprintf("http://%s/?proxyChunkId=%s", req.FilerHTTPAddress, fileId) + } + return fmt.Sprintf("http://%s/%s", host, fileId) + } + + assignRequest := &filer_pb.AssignVolumeRequest{ + Count: 1, + Replication: req.Replication, + Collection: req.Collection, + TtlSec: req.TtlSec, + DiskType: req.DiskType, + DataCenter: req.DataCenter, + Path: req.FullPath, + } + + fileID, uploadResult, uploadErr, _ := uploader.UploadWithRetry( + req.FilerClient, + assignRequest, + uploadOption, + genFileUrlFn, + req.Reader, + ) + if uploadErr != nil { + return nil, 
fmt.Errorf("upload data: %w", uploadErr) + } + if uploadResult == nil { + return nil, errors.New("upload data: missing upload result") + } + if uploadResult.Error != "" { + return nil, fmt.Errorf("upload result: %s", uploadResult.Error) + } + return uploadResult.ToPbFileChunk(fileID, req.Offset, req.TsNs), nil +} + +// lastSlashIndex returns the index of the last '/' in s, or -1 if none. +// Intentionally local so this file has no new test-only imports. +func lastSlashIndex(s string) int { + for i := len(s) - 1; i >= 0; i-- { + if s[i] == '/' { + return i + } + } + return -1 +} diff --git a/weed/filer/stream.go b/weed/filer/stream.go index d0c028e88..de53134d1 100644 --- a/weed/filer/stream.go +++ b/weed/filer/stream.go @@ -487,7 +487,7 @@ func (c *ChunkStreamReader) fetchChunkToBuffer(chunkView *ChunkView) error { var shouldRetry bool jwt := JwtForVolumeServer(chunkView.FileId) for _, urlString := range urlStrings { - shouldRetry, err = util_http.ReadUrlAsStream(context.Background(), urlString+"?readDeleted=true", jwt, chunkView.CipherKey, chunkView.IsGzipped, chunkView.IsFullChunk(), chunkView.OffsetInChunk, int(chunkView.ViewSize), func(data []byte) { + shouldRetry, err = util_http.ReadUrlAsStream(context.Background(), util_http.AppendQueryParameter(urlString, "readDeleted", "true"), jwt, chunkView.CipherKey, chunkView.IsGzipped, chunkView.IsFullChunk(), chunkView.OffsetInChunk, int(chunkView.ViewSize), func(data []byte) { buffer.Write(data) }) if !shouldRetry { diff --git a/weed/mount/weedfs_dir_mkrm.go b/weed/mount/weedfs_dir_mkrm.go index 8352b72f6..67705d1e4 100644 --- a/weed/mount/weedfs_dir_mkrm.go +++ b/weed/mount/weedfs_dir_mkrm.go @@ -31,6 +31,20 @@ func (wfs *WFS) Mkdir(cancel <-chan struct{}, in *fuse.MkdirIn, name string, out } now := time.Now().Unix() + + dirFullPath, code := wfs.inodeToPath.GetPath(in.NodeId) + if code != fuse.OK { + return + } + + entryFullPath := dirFullPath.Child(name) + + // Pre-allocate the mount's local inode and stamp it 
into the create + // request so both the mount and the filer agree on object identity from + // the start. Without this, the filer assigns its own inode in CreateEntry + // and the cached entry then reports a different value than the one we + // return to the kernel here. + inode := wfs.inodeToPath.AllocateInode(entryFullPath, now) newEntry := &filer_pb.Entry{ Name: name, IsDirectory: true, @@ -41,16 +55,10 @@ func (wfs *WFS) Mkdir(cancel <-chan struct{}, in *fuse.MkdirIn, name string, out FileMode: uint32(os.ModeDir) | in.Mode, Uid: in.Uid, Gid: in.Gid, + Inode: inode, }, } - dirFullPath, code := wfs.inodeToPath.GetPath(in.NodeId) - if code != fuse.OK { - return - } - - entryFullPath := dirFullPath.Child(name) - wfs.mapPbIdFromLocalToFiler(newEntry) // Defer restoring to local uid/gid AFTER the entry is sent to the filer // but BEFORE outputPbEntry writes attributes to the kernel. We restore @@ -93,7 +101,7 @@ func (wfs *WFS) Mkdir(cancel <-chan struct{}, in *fuse.MkdirIn, name string, out // for subsequent permission checks on children. wfs.mapPbIdFromFilerToLocal(newEntry) - inode := wfs.inodeToPath.Lookup(entryFullPath, newEntry.Attributes.Crtime, true, false, 0, true) + inode = wfs.inodeToPath.Lookup(entryFullPath, newEntry.Attributes.Crtime, true, false, inode, true) // The newly created directory is guaranteed to be empty, so mark it as // cached immediately to avoid a needless filer round-trip on the first diff --git a/weed/mount/weedfs_symlink.go b/weed/mount/weedfs_symlink.go index d1cf913a7..85f108913 100644 --- a/weed/mount/weedfs_symlink.go +++ b/weed/mount/weedfs_symlink.go @@ -29,6 +29,11 @@ func (wfs *WFS) Symlink(cancel <-chan struct{}, header *fuse.InHeader, target st entryFullPath := dirPath.Child(name) now := time.Now().Unix() + // Pre-allocate the mount's local inode so the filer stores the same + // object identity we report to the kernel below. 
Without this, the filer + // assigns its own inode and subsequent cached reads would disagree with + // the inode we return from Symlink. + inode := wfs.inodeToPath.AllocateInode(entryFullPath, now) request := &filer_pb.CreateEntryRequest{ Directory: string(dirPath), Entry: &filer_pb.Entry{ @@ -42,6 +47,7 @@ func (wfs *WFS) Symlink(cancel <-chan struct{}, header *fuse.InHeader, target st Uid: header.Uid, Gid: header.Gid, SymlinkTarget: target, + Inode: inode, }, }, Signatures: []int32{wfs.signature}, @@ -71,7 +77,7 @@ func (wfs *WFS) Symlink(cancel <-chan struct{}, header *fuse.InHeader, target st return fuse.EIO } - inode := wfs.inodeToPath.Lookup(entryFullPath, request.Entry.Attributes.Crtime, false, false, 0, true) + inode = wfs.inodeToPath.Lookup(entryFullPath, request.Entry.Attributes.Crtime, false, false, inode, true) wfs.outputPbEntry(out, inode, request.Entry) diff --git a/weed/sequence/snowflake_sequencer.go b/weed/sequence/snowflake_sequencer.go index 05694f681..f55bcfe08 100644 --- a/weed/sequence/snowflake_sequencer.go +++ b/weed/sequence/snowflake_sequencer.go @@ -16,7 +16,10 @@ type SnowflakeSequencer struct { func NewSnowflakeSequencer(nodeid string, snowflakeId int) (*SnowflakeSequencer, error) { nodeid_hash := hash(nodeid) & 0x3ff if snowflakeId != 0 { - nodeid_hash = uint32(snowflakeId) + // Mask to 10 bits to match the snowflake library's node-id range and + // avoid a wide int → uint32 conversion when snowflakeId is sourced + // from user input such as an environment variable. 
+ nodeid_hash = uint32(snowflakeId & 0x3ff) } glog.V(0).Infof("use snowflake seq id generator, nodeid:%s hex_of_nodeid: %x", nodeid, nodeid_hash) node, err := snowflake.NewNode(int64(nodeid_hash)) diff --git a/weed/server/filer_grpc_server_rename.go b/weed/server/filer_grpc_server_rename.go index ac970497f..3c8aae6b1 100644 --- a/weed/server/filer_grpc_server_rename.go +++ b/weed/server/filer_grpc_server_rename.go @@ -35,7 +35,8 @@ func (fs *FilerServer) AtomicRenameEntry(ctx context.Context, req *filer_pb.Atom } var metadataEvents []metadataEvent - moveErr := fs.moveEntry(ctx, nil, oldParent, oldEntry, newParent, req.NewName, req.Signatures, false, &metadataEvents) + var pendingChunkDeletes []*filer_pb.FileChunk + moveErr := fs.moveEntry(ctx, nil, oldParent, oldEntry, newParent, req.NewName, req.Signatures, false, &metadataEvents, &pendingChunkDeletes) if moveErr != nil { fs.filer.RollbackTransaction(ctx) return nil, fmt.Errorf("%s/%s move error: %v", req.OldDirectory, req.OldName, moveErr) @@ -45,6 +46,13 @@ func (fs *FilerServer) AtomicRenameEntry(ctx context.Context, req *filer_pb.Atom return nil, fmt.Errorf("%s/%s move commit error: %v", req.OldDirectory, req.OldName, commitError) } } + // Chunks from an overwritten rename target are only deletable after the + // rename transaction has committed: anything that fails mid-rename (move, + // child moves, oldPath delete, CommitTransaction) would otherwise leave + // live metadata pointing at freshly-deleted chunks. 
+ if len(pendingChunkDeletes) > 0 { + fs.filer.DeleteChunksNotRecursive(pendingChunkDeletes) + } for _, event := range metadataEvents { event.notify(fs.filer, ctx, req.Signatures) } @@ -92,7 +100,8 @@ func (fs *FilerServer) StreamRenameEntry(req *filer_pb.StreamRenameEntryRequest, } var metadataEvents []metadataEvent - moveErr := fs.moveEntry(ctx, stream, oldParent, oldEntry, newParent, req.NewName, req.Signatures, false, &metadataEvents) + var pendingChunkDeletes []*filer_pb.FileChunk + moveErr := fs.moveEntry(ctx, stream, oldParent, oldEntry, newParent, req.NewName, req.Signatures, false, &metadataEvents, &pendingChunkDeletes) if moveErr != nil { fs.filer.RollbackTransaction(ctx) return fmt.Errorf("%s/%s move error: %v", req.OldDirectory, req.OldName, moveErr) @@ -102,6 +111,9 @@ func (fs *FilerServer) StreamRenameEntry(req *filer_pb.StreamRenameEntryRequest, return fmt.Errorf("%s/%s move commit error: %v", req.OldDirectory, req.OldName, commitError) } } + if len(pendingChunkDeletes) > 0 { + fs.filer.DeleteChunksNotRecursive(pendingChunkDeletes) + } for _, event := range metadataEvents { event.notify(fs.filer, ctx, req.Signatures) } @@ -119,23 +131,23 @@ func (event metadataEvent) notify(f *filer.Filer, ctx context.Context, signature f.NotifyUpdateEvent(ctx, event.oldEntry, event.newEntry, event.deleteChunks, false, signatures) } -func (fs *FilerServer) moveEntry(ctx context.Context, stream filer_pb.SeaweedFiler_StreamRenameEntryServer, oldParent util.FullPath, entry *filer.Entry, newParent util.FullPath, newName string, signatures []int32, skipTargetLookup bool, metadataEvents *[]metadataEvent) error { +func (fs *FilerServer) moveEntry(ctx context.Context, stream filer_pb.SeaweedFiler_StreamRenameEntryServer, oldParent util.FullPath, entry *filer.Entry, newParent util.FullPath, newName string, signatures []int32, skipTargetLookup bool, metadataEvents *[]metadataEvent, pendingChunkDeletes *[]*filer_pb.FileChunk) error { if err := fs.moveSelfEntry(ctx, stream, 
oldParent, entry, newParent, newName, func() error { if entry.IsDirectory() { - if err := fs.moveFolderSubEntries(ctx, stream, oldParent, entry, newParent, newName, signatures, metadataEvents); err != nil { + if err := fs.moveFolderSubEntries(ctx, stream, oldParent, entry, newParent, newName, signatures, metadataEvents, pendingChunkDeletes); err != nil { return err } } return nil - }, signatures, skipTargetLookup, metadataEvents); err != nil { + }, signatures, skipTargetLookup, metadataEvents, pendingChunkDeletes); err != nil { return fmt.Errorf("fail to move %s => %s: %v", oldParent.Child(entry.Name()), newParent.Child(newName), err) } return nil } -func (fs *FilerServer) moveFolderSubEntries(ctx context.Context, stream filer_pb.SeaweedFiler_StreamRenameEntryServer, oldParent util.FullPath, entry *filer.Entry, newParent util.FullPath, newName string, signatures []int32, metadataEvents *[]metadataEvent) error { +func (fs *FilerServer) moveFolderSubEntries(ctx context.Context, stream filer_pb.SeaweedFiler_StreamRenameEntryServer, oldParent util.FullPath, entry *filer.Entry, newParent util.FullPath, newName string, signatures []int32, metadataEvents *[]metadataEvent, pendingChunkDeletes *[]*filer_pb.FileChunk) error { currentDirPath := oldParent.Child(entry.Name()) newDirPath := newParent.Child(newName) @@ -158,7 +170,7 @@ func (fs *FilerServer) moveFolderSubEntries(ctx context.Context, stream filer_pb // println("processing", lastFileName) newChildPath := newDirPath.Child(item.Name()) skipTarget := fs.filer.Store.SameActualStore(newDirPath, newChildPath) - err := fs.moveEntry(ctx, stream, currentDirPath, item, newDirPath, item.Name(), signatures, skipTarget, metadataEvents) + err := fs.moveEntry(ctx, stream, currentDirPath, item, newDirPath, item.Name(), signatures, skipTarget, metadataEvents, pendingChunkDeletes) if err != nil { return err } @@ -170,7 +182,7 @@ func (fs *FilerServer) moveFolderSubEntries(ctx context.Context, stream filer_pb return nil } -func (fs 
*FilerServer) moveSelfEntry(ctx context.Context, stream filer_pb.SeaweedFiler_StreamRenameEntryServer, oldParent util.FullPath, entry *filer.Entry, newParent util.FullPath, newName string, moveFolderSubEntries func() error, signatures []int32, skipTargetLookup bool, metadataEvents *[]metadataEvent) error { +func (fs *FilerServer) moveSelfEntry(ctx context.Context, stream filer_pb.SeaweedFiler_StreamRenameEntryServer, oldParent util.FullPath, entry *filer.Entry, newParent util.FullPath, newName string, moveFolderSubEntries func() error, signatures []int32, skipTargetLookup bool, metadataEvents *[]metadataEvent, pendingChunkDeletes *[]*filer_pb.FileChunk) error { oldPath, newPath := oldParent.Child(entry.Name()), newParent.Child(newName) @@ -192,6 +204,26 @@ func (fs *FilerServer) moveSelfEntry(ctx context.Context, stream filer_pb.Seawee return findErr } } + if existingTarget != nil { + switch { + case existingTarget.IsDirectory() && !entry.IsDirectory(): + return fmt.Errorf("%s: %w", existingTarget.FullPath, filer_pb.ErrExistingIsDirectory) + case !existingTarget.IsDirectory() && entry.IsDirectory(): + return fmt.Errorf("%s: %w", existingTarget.FullPath, filer_pb.ErrExistingIsFile) + } + if deleteErr := fs.filer.DeleteEntryMetaAndData( + filer.WithSuppressedMetadataEvents(ctx), + newPath, + false, + false, + false, + false, + signatures, + 0, + ); deleteErr != nil { + return deleteErr + } + } // add to new directory newEntry := &filer.Entry{ @@ -217,6 +249,18 @@ func (fs *FilerServer) moveSelfEntry(ctx context.Context, stream filer_pb.Seawee return createErr } } + if existingTarget != nil { + toDelete, err := filer.MinusChunks(ctx, fs.filer.MasterClient.GetLookupFileIdFunction(), existingTarget.GetChunks(), newEntry.GetChunks()) + if err != nil { + glog.ErrorfCtx(ctx, "Failed to resolve overwrite target chunks during rename. 
new: %v, old: %v", newEntry.GetChunks(), existingTarget.GetChunks()) + } else if len(toDelete) > 0 { + // Defer chunk deletion until after CommitTransaction so that a + // failure in any subsequent step (child moves, oldPath delete, + // stream send, or the commit itself) leaves the chunks intact for + // the rolled-back rename. + *pendingChunkDeletes = append(*pendingChunkDeletes, toDelete...) + } + } if stream != nil { if err := stream.Send(&filer_pb.StreamRenameEntryResponse{ Directory: string(oldParent), diff --git a/weed/server/nfs/access.go b/weed/server/nfs/access.go new file mode 100644 index 000000000..85ae5583f --- /dev/null +++ b/weed/server/nfs/access.go @@ -0,0 +1,140 @@ +package nfs + +import ( + "fmt" + "net" + "strings" + + "github.com/seaweedfs/seaweedfs/weed/glog" +) + +type clientAuthorizer struct { + exact map[string]struct{} + cidrs map[string]*net.IPNet + enabled bool +} + +func newClientAuthorizer(allowed []string) (*clientAuthorizer, error) { + authorizer := &clientAuthorizer{ + exact: make(map[string]struct{}), + cidrs: make(map[string]*net.IPNet), + } + + for _, raw := range allowed { + entry := strings.TrimSpace(raw) + if entry == "" { + continue + } + if strings.Contains(entry, "/") { + _, network, err := net.ParseCIDR(entry) + if err != nil { + return nil, fmt.Errorf("parse allowed NFS client %q: %w", entry, err) + } + authorizer.cidrs[entry] = network + authorizer.enabled = true + continue + } + + if ip := normalizeClientIP(entry); ip != nil { + authorizer.exact[ip.String()] = struct{}{} + authorizer.enabled = true + continue + } + + ips, err := net.LookupIP(entry) + if err != nil { + return nil, fmt.Errorf("resolve allowed NFS client %q: %w", entry, err) + } + if len(ips) == 0 { + return nil, fmt.Errorf("resolve allowed NFS client %q: no addresses", entry) + } + authorizer.exact[entry] = struct{}{} + for _, ip := range ips { + if ip == nil { + continue + } + authorizer.exact[ip.String()] = struct{}{} + } + authorizer.enabled = true + 
} + + return authorizer, nil +} + +func (a *clientAuthorizer) isAllowedConn(conn net.Conn) bool { + if conn == nil { + return true + } + return a.isAllowedAddr(conn.RemoteAddr()) +} + +func (a *clientAuthorizer) isAllowedAddr(addr net.Addr) bool { + if a == nil || !a.enabled { + return true + } + if addr == nil { + return false + } + + host := remoteHost(addr.String()) + if host == "" { + return false + } + if _, found := a.exact[host]; found { + return true + } + + ip := normalizeClientIP(host) + if ip == nil { + return false + } + if _, found := a.exact[ip.String()]; found { + return true + } + for _, network := range a.cidrs { + if network.Contains(ip) { + return true + } + } + return false +} + +func remoteHost(remote string) string { + host, _, err := net.SplitHostPort(strings.TrimSpace(remote)) + if err == nil { + return host + } + + host = strings.TrimSpace(remote) + if strings.HasPrefix(host, "[") && strings.HasSuffix(host, "]") { + host = host[1 : len(host)-1] + } + return host +} + +func normalizeClientIP(host string) net.IP { + host = strings.TrimSpace(host) + if zoneIndex := strings.LastIndex(host, "%"); zoneIndex >= 0 { + host = host[:zoneIndex] + } + return net.ParseIP(host) +} + +type allowlistListener struct { + net.Listener + authorizer *clientAuthorizer +} + +func (l *allowlistListener) Accept() (net.Conn, error) { + for { + conn, err := l.Listener.Accept() + if err != nil { + return nil, err + } + if l.authorizer == nil || l.authorizer.isAllowedConn(conn) { + return conn, nil + } + glog.V(0).Infof("reject unauthorized nfs client %s", conn.RemoteAddr()) + _ = conn.Close() + } +} diff --git a/weed/server/nfs/access_test.go b/weed/server/nfs/access_test.go new file mode 100644 index 000000000..105de2b96 --- /dev/null +++ b/weed/server/nfs/access_test.go @@ -0,0 +1,29 @@ +package nfs + +import ( + "net" + "testing" + + "github.com/stretchr/testify/assert" + "github.com/stretchr/testify/require" +) + +func TestClientAuthorizerResolvesHostnameEntries(t 
*testing.T) { + ips, err := net.LookupIP("localhost") + require.NoError(t, err) + require.NotEmpty(t, ips) + + authorizer, err := newClientAuthorizer([]string{"localhost"}) + require.NoError(t, err) + + matched := false + for _, ip := range ips { + if authorizer.isAllowedAddr(&net.TCPAddr{IP: ip, Port: 2049}) { + matched = true + break + } + } + + assert.True(t, matched) + assert.False(t, authorizer.isAllowedAddr(&net.TCPAddr{IP: net.ParseIP("192.0.2.10"), Port: 2049})) +} diff --git a/weed/server/nfs/filehandle.go b/weed/server/nfs/filehandle.go new file mode 100644 index 000000000..fd8036131 --- /dev/null +++ b/weed/server/nfs/filehandle.go @@ -0,0 +1,251 @@ +package nfs + +import ( + "context" + "encoding/binary" + "errors" + "fmt" + "hash/crc32" + "strings" + + "github.com/seaweedfs/seaweedfs/weed/filer" + "github.com/seaweedfs/seaweedfs/weed/pb/filer_pb" + "github.com/seaweedfs/seaweedfs/weed/util" + "google.golang.org/grpc" +) + +const ( + fileHandleVersion = 1 + fileHandleLength = 28 +) + +var ( + ErrInvalidHandle = errors.New("invalid nfs filehandle") + ErrHandleExportMismatch = errors.New("nfs filehandle export mismatch") + ErrStaleHandle = errors.New("stale nfs filehandle") +) + +type FileHandleKind uint8 + +const ( + FileHandleKindUnknown FileHandleKind = 0 + FileHandleKindFile FileHandleKind = 1 + FileHandleKindDirectory FileHandleKind = 2 +) + +type FileHandle struct { + Kind FileHandleKind + ExportID uint32 + Inode uint64 + Generation uint64 +} + +type filerResolverClient interface { + KvGet(ctx context.Context, in *filer_pb.KvGetRequest, opts ...grpc.CallOption) (*filer_pb.KvGetResponse, error) + LookupDirectoryEntry(ctx context.Context, in *filer_pb.LookupDirectoryEntryRequest, opts ...grpc.CallOption) (*filer_pb.LookupDirectoryEntryResponse, error) +} + +type Resolver struct { + exportRoot util.FullPath + exportID uint32 + client filerResolverClient +} + +type ResolvedHandle struct { + Handle FileHandle + Path util.FullPath + Entry *filer_pb.Entry 
+} + +func NewFileHandle(exportID uint32, kind FileHandleKind, inode, generation uint64) FileHandle { + if generation == 0 { + generation = filer.InodeIndexInitialGeneration + } + return FileHandle{ + Kind: kind, + ExportID: exportID, + Inode: inode, + Generation: generation, + } +} + +func (h FileHandle) Encode() []byte { + buf := make([]byte, fileHandleLength) + buf[0] = fileHandleVersion + buf[1] = byte(h.Kind) + binary.BigEndian.PutUint32(buf[4:8], h.ExportID) + binary.BigEndian.PutUint64(buf[8:16], h.Inode) + binary.BigEndian.PutUint64(buf[16:24], h.Generation) + binary.BigEndian.PutUint32(buf[24:28], crc32.ChecksumIEEE(buf[:24])) + return buf +} + +func DecodeFileHandle(raw []byte) (FileHandle, error) { + if len(raw) != fileHandleLength { + return FileHandle{}, fmt.Errorf("%w: unexpected length %d", ErrInvalidHandle, len(raw)) + } + if raw[0] != fileHandleVersion { + return FileHandle{}, fmt.Errorf("%w: unsupported version %d", ErrInvalidHandle, raw[0]) + } + + wantChecksum := binary.BigEndian.Uint32(raw[24:28]) + gotChecksum := crc32.ChecksumIEEE(raw[:24]) + if wantChecksum != gotChecksum { + return FileHandle{}, fmt.Errorf("%w: checksum mismatch", ErrInvalidHandle) + } + + handle := FileHandle{ + Kind: FileHandleKind(raw[1]), + ExportID: binary.BigEndian.Uint32(raw[4:8]), + Inode: binary.BigEndian.Uint64(raw[8:16]), + Generation: binary.BigEndian.Uint64(raw[16:24]), + } + if handle.Generation == 0 { + return FileHandle{}, fmt.Errorf("%w: empty generation", ErrInvalidHandle) + } + return handle, nil +} + +func NewResolver(exportRoot util.FullPath, client filerResolverClient) *Resolver { + root := normalizeExportRoot(exportRoot) + return &Resolver{ + exportRoot: root, + exportID: exportIDForRoot(root), + client: client, + } +} + +func (r *Resolver) ExportID() uint32 { + if r == nil { + return 0 + } + return r.exportID +} + +func (r *Resolver) ResolveHandle(ctx context.Context, raw []byte) (*ResolvedHandle, error) { + if r == nil || r.client == nil { + return 
nil, errors.New("nfs resolver is not configured") + } + + handle, err := DecodeFileHandle(raw) + if err != nil { + return nil, err + } + if handle.ExportID != r.exportID { + return nil, ErrHandleExportMismatch + } + if handle.Inode == 0 { + return r.resolveSyntheticRoot(ctx, handle) + } + + kvResp, err := r.client.KvGet(ctx, &filer_pb.KvGetRequest{Key: filer.InodeIndexKey(handle.Inode)}) + if err != nil { + return nil, err + } + if kvResp.GetError() != "" { + return nil, errors.New(kvResp.GetError()) + } + if len(kvResp.GetValue()) == 0 { + return nil, ErrStaleHandle + } + + record, err := filer.DecodeInodeIndexRecord(kvResp.GetValue()) + if err != nil { + return nil, err + } + if record.Generation != handle.Generation { + return nil, ErrStaleHandle + } + + for _, path := range record.FullPaths() { + if !pathVisibleFromExport(path, r.exportRoot) { + continue + } + + dir, name := path.DirAndName() + lookupResp, lookupErr := r.client.LookupDirectoryEntry(ctx, &filer_pb.LookupDirectoryEntryRequest{ + Directory: dir, + Name: name, + }) + if isLookupNotFound(lookupErr) || lookupResp == nil || lookupResp.Entry == nil { + continue + } + if lookupErr != nil { + return nil, lookupErr + } + if attrs := lookupResp.Entry.Attributes; attrs != nil && attrs.Inode != 0 && attrs.Inode != handle.Inode { + continue + } + if handle.Kind == FileHandleKindDirectory && !lookupResp.Entry.IsDirectory { + continue + } + if handle.Kind == FileHandleKindFile && lookupResp.Entry.IsDirectory { + continue + } + + return &ResolvedHandle{ + Handle: handle, + Path: path, + Entry: lookupResp.Entry, + }, nil + } + + return nil, ErrStaleHandle +} + +func (r *Resolver) resolveSyntheticRoot(ctx context.Context, handle FileHandle) (*ResolvedHandle, error) { + if handle.Kind != FileHandleKindDirectory || handle.Generation != filer.InodeIndexInitialGeneration { + return nil, ErrStaleHandle + } + + dir, name := r.exportRoot.DirAndName() + lookupResp, err := r.client.LookupDirectoryEntry(ctx, 
&filer_pb.LookupDirectoryEntryRequest{ + Directory: dir, + Name: name, + }) + if isLookupNotFound(err) { + return &ResolvedHandle{ + Handle: handle, + Path: r.exportRoot, + Entry: syntheticRootEntry(), + }, nil + } + if err != nil { + return nil, err + } + if lookupResp == nil || lookupResp.Entry == nil { + return &ResolvedHandle{ + Handle: handle, + Path: r.exportRoot, + Entry: syntheticRootEntry(), + }, nil + } + + return &ResolvedHandle{ + Handle: handle, + Path: r.exportRoot, + Entry: lookupResp.Entry, + }, nil +} + +func normalizeExportRoot(root util.FullPath) util.FullPath { + if normalized := util.NormalizePath(string(root)); normalized != "" { + return normalized + } + return "/" +} + +func exportIDForRoot(root util.FullPath) uint32 { + return crc32.ChecksumIEEE([]byte(normalizeExportRoot(root))) +} + +func pathVisibleFromExport(path, exportRoot util.FullPath) bool { + return path == exportRoot || path.IsUnder(exportRoot) +} + +func isLookupNotFound(err error) bool { + if err == nil { + return false + } + return err == filer_pb.ErrNotFound || strings.Contains(err.Error(), filer_pb.ErrNotFound.Error()) +} diff --git a/weed/server/nfs/filehandle_test.go b/weed/server/nfs/filehandle_test.go new file mode 100644 index 000000000..4fe38a2df --- /dev/null +++ b/weed/server/nfs/filehandle_test.go @@ -0,0 +1,182 @@ +package nfs + +import ( + "context" + "testing" + + "github.com/seaweedfs/seaweedfs/weed/filer" + "github.com/seaweedfs/seaweedfs/weed/pb/filer_pb" + "github.com/seaweedfs/seaweedfs/weed/util" + "github.com/stretchr/testify/assert" + "github.com/stretchr/testify/require" + "google.golang.org/grpc" +) + +type fakeResolverClient struct { + kv map[string][]byte + entries map[util.FullPath]*filer_pb.Entry +} + +func (f *fakeResolverClient) KvGet(_ context.Context, in *filer_pb.KvGetRequest, _ ...grpc.CallOption) (*filer_pb.KvGetResponse, error) { + if value, found := f.kv[string(in.Key)]; found { + return &filer_pb.KvGetResponse{Value: value}, nil + } + 
return &filer_pb.KvGetResponse{}, nil +} + +func (f *fakeResolverClient) LookupDirectoryEntry(_ context.Context, in *filer_pb.LookupDirectoryEntryRequest, _ ...grpc.CallOption) (*filer_pb.LookupDirectoryEntryResponse, error) { + fullPath := util.NewFullPath(in.Directory, in.Name) + if entry, found := f.entries[fullPath]; found { + return &filer_pb.LookupDirectoryEntryResponse{Entry: entry}, nil + } + return nil, filer_pb.ErrNotFound +} + +func TestFileHandleEncodeDecodeRoundTrip(t *testing.T) { + handle := NewFileHandle(1234, FileHandleKindDirectory, 5678, 9) + + raw := handle.Encode() + decoded, err := DecodeFileHandle(raw) + require.NoError(t, err) + assert.Equal(t, handle, decoded) + + raw[len(raw)-1] ^= 0xff + _, err = DecodeFileHandle(raw) + require.ErrorIs(t, err, ErrInvalidHandle) +} + +func TestResolverUsesPathVisibleFromExportRoot(t *testing.T) { + client := &fakeResolverClient{ + kv: make(map[string][]byte), + entries: make(map[util.FullPath]*filer_pb.Entry), + } + resolver := NewResolver("/exports", client) + + record := &filer.InodeIndexRecord{ + Generation: 7, + Paths: []string{"/a/other.txt", "/exports/demo/link.txt"}, + } + value, err := record.Encode() + require.NoError(t, err) + client.kv[string(filer.InodeIndexKey(101))] = value + client.entries["/exports/demo/link.txt"] = &filer_pb.Entry{ + Name: "link.txt", + Attributes: &filer_pb.FuseAttributes{ + Inode: 101, + }, + } + + handle := NewFileHandle(resolver.ExportID(), FileHandleKindFile, 101, 7) + resolved, err := resolver.ResolveHandle(context.Background(), handle.Encode()) + require.NoError(t, err) + assert.Equal(t, util.FullPath("/exports/demo/link.txt"), resolved.Path) + require.NotNil(t, resolved.Entry) + assert.Equal(t, uint64(101), resolved.Entry.Attributes.Inode) +} + +func TestResolverRejectsGenerationMismatch(t *testing.T) { + client := &fakeResolverClient{ + kv: make(map[string][]byte), + entries: make(map[util.FullPath]*filer_pb.Entry), + } + resolver := NewResolver("/", client) + + 
record := &filer.InodeIndexRecord{ + Generation: 3, + Paths: []string{"/data/file.txt"}, + } + value, err := record.Encode() + require.NoError(t, err) + client.kv[string(filer.InodeIndexKey(44))] = value + client.entries["/data/file.txt"] = &filer_pb.Entry{ + Name: "file.txt", + Attributes: &filer_pb.FuseAttributes{ + Inode: 44, + }, + } + + handle := NewFileHandle(resolver.ExportID(), FileHandleKindFile, 44, 4) + _, err = resolver.ResolveHandle(context.Background(), handle.Encode()) + require.ErrorIs(t, err, ErrStaleHandle) +} + +func TestResolverKeepsHandleValidAcrossRename(t *testing.T) { + client := &fakeResolverClient{ + kv: make(map[string][]byte), + entries: make(map[util.FullPath]*filer_pb.Entry), + } + resolver := NewResolver("/exports", client) + + record := &filer.InodeIndexRecord{ + Generation: 5, + Paths: []string{"/exports/new-name.txt"}, + } + value, err := record.Encode() + require.NoError(t, err) + client.kv[string(filer.InodeIndexKey(88))] = value + client.entries["/exports/new-name.txt"] = &filer_pb.Entry{ + Name: "new-name.txt", + Attributes: &filer_pb.FuseAttributes{ + Inode: 88, + }, + } + + handle := NewFileHandle(resolver.ExportID(), FileHandleKindFile, 88, 5) + resolved, err := resolver.ResolveHandle(context.Background(), handle.Encode()) + require.NoError(t, err) + assert.Equal(t, util.FullPath("/exports/new-name.txt"), resolved.Path) + require.NotNil(t, resolved.Entry) + assert.Equal(t, uint64(88), resolved.Entry.Attributes.Inode) +} + +func TestResolverRejectsHandleAfterDeleteRecreateWithNewInode(t *testing.T) { + client := &fakeResolverClient{ + kv: make(map[string][]byte), + entries: make(map[util.FullPath]*filer_pb.Entry), + } + resolver := NewResolver("/exports", client) + + client.entries["/exports/file.txt"] = &filer_pb.Entry{ + Name: "file.txt", + Attributes: &filer_pb.FuseAttributes{ + Inode: 999, + }, + } + + record := &filer.InodeIndexRecord{ + Generation: 4, + Paths: []string{"/exports/file.txt"}, + } + value, err := 
record.Encode() + require.NoError(t, err) + client.kv[string(filer.InodeIndexKey(77))] = value + + handle := NewFileHandle(resolver.ExportID(), FileHandleKindFile, 77, 4) + _, err = resolver.ResolveHandle(context.Background(), handle.Encode()) + require.ErrorIs(t, err, ErrStaleHandle) +} + +func TestResolverSupportsSyntheticRootHandle(t *testing.T) { + client := &fakeResolverClient{ + kv: make(map[string][]byte), + entries: make(map[util.FullPath]*filer_pb.Entry), + } + resolver := NewResolver("/", client) + + handle := NewFileHandle(resolver.ExportID(), FileHandleKindDirectory, 0, filer.InodeIndexInitialGeneration) + resolved, err := resolver.ResolveHandle(context.Background(), handle.Encode()) + require.NoError(t, err) + assert.Equal(t, util.FullPath("/"), resolved.Path) + require.NotNil(t, resolved.Entry) + assert.True(t, resolved.Entry.IsDirectory) +} + +func TestNewServerNormalizesExportRootAndExportID(t *testing.T) { + server, err := NewServer(&Option{ + FilerRootPath: "/export/path/", + Port: 2049, + }) + require.NoError(t, err) + assert.Equal(t, util.FullPath("/export/path"), server.exportRoot) + assert.Equal(t, exportIDForRoot("/export/path"), server.exportID) +} diff --git a/weed/server/nfs/filesystem.go b/weed/server/nfs/filesystem.go new file mode 100644 index 000000000..7d239f89a --- /dev/null +++ b/weed/server/nfs/filesystem.go @@ -0,0 +1,1348 @@ +package nfs + +import ( + "bytes" + "context" + "errors" + "fmt" + "io" + "os" + "path" + "sort" + "strings" + "time" + + billy "github.com/go-git/go-billy/v5" + "github.com/seaweedfs/seaweedfs/weed/filer" + "github.com/seaweedfs/seaweedfs/weed/pb/filer_pb" + "github.com/seaweedfs/seaweedfs/weed/util" + "github.com/seaweedfs/seaweedfs/weed/util/chunk_cache" + "github.com/seaweedfs/seaweedfs/weed/wdclient" + gonfs "github.com/willscott/go-nfs" + gonfsfile "github.com/willscott/go-nfs/file" + "google.golang.org/protobuf/proto" +) + +const ( + // maxInlineWriteSize is the legacy cutoff that decided whether a + 
// persisted write was inlined into entry.Content or uploaded as a + // chunk. The streaming write path always uploads chunks, so this + // constant is only used when reading back old inline-stored files. + maxInlineWriteSize = 4 << 20 + listEntriesPageSize = 1024 + maxSymlinkDepth = 32 +) + +type noopChunkCache struct{} + +func (noopChunkCache) ReadChunkAt(_ []byte, _ string, _ uint64) (int, error) { return 0, nil } +func (noopChunkCache) SetChunk(_ string, _ []byte) {} +func (noopChunkCache) IsInCache(_ string, _ bool) bool { return false } +func (noopChunkCache) GetMaxFilePartSizeInCache() uint64 { return 0 } + +type seaweedFileSystem struct { + server *Server + actualRoot util.FullPath + readerCache *filer.ReaderCache +} + +type seaweedFileInfo struct { + name string + virtualPath string + size int64 + mode os.FileMode + modTime time.Time + actualPath util.FullPath + entry *filer_pb.Entry + generation uint64 + fileID uint64 + nlink uint32 +} + +type seaweedFile struct { + fs *seaweedFileSystem + virtualPath string + info *seaweedFileInfo + reader io.ReaderAt + offset int64 + writable bool + appendOnly bool + closed bool +} + +var _ billy.Filesystem = (*seaweedFileSystem)(nil) +var _ billy.Capable = (*seaweedFileSystem)(nil) +var _ billy.Change = (*seaweedFileSystem)(nil) +var _ filer_pb.FilerClient = (*seaweedFileSystem)(nil) +var _ gonfs.UnixChange = (*seaweedFileSystem)(nil) + +func newSeaweedFileSystem(server *Server, actualRoot util.FullPath, sharedReaderCache *filer.ReaderCache) *seaweedFileSystem { + fs := &seaweedFileSystem{ + server: server, + actualRoot: normalizeExportRoot(actualRoot), + } + if sharedReaderCache != nil { + fs.readerCache = sharedReaderCache + } else { + fs.readerCache = filer.NewReaderCache(32, chunk_cache.ChunkCache(noopChunkCache{}), fs.LookupFn()) + } + return fs +} + +func (fs *seaweedFileSystem) Capabilities() billy.Capability { + capabilities := billy.ReadCapability | billy.SeekCapability + if !fs.isReadOnly() { + capabilities 
|= billy.WriteCapability | billy.ReadAndWriteCapability | billy.TruncateCapability + } + return capabilities +} + +func (fs *seaweedFileSystem) Create(filename string) (billy.File, error) { + return fs.OpenFile(filename, os.O_CREATE|os.O_RDWR|os.O_TRUNC, 0o666) +} + +func (fs *seaweedFileSystem) Open(filename string) (billy.File, error) { + return fs.openFile(context.Background(), filename, os.O_RDONLY, 0) +} + +func (fs *seaweedFileSystem) OpenFile(filename string, flag int, perm os.FileMode) (billy.File, error) { + return fs.openFile(context.Background(), filename, flag, perm) +} + +func (fs *seaweedFileSystem) openFile(ctx context.Context, filename string, flag int, perm os.FileMode) (billy.File, error) { + virtualPath := cleanBillyPath(filename) + writable := flag&(os.O_WRONLY|os.O_RDWR) != 0 + if writable { + if err := fs.ensureWritable(); err != nil { + return nil, err + } + } + + info, err := fs.ensureOpenEntry(ctx, virtualPath, flag, perm) + if err != nil { + return nil, err + } + info, err = fs.followSymlinkInfo(ctx, info, 0) + if err != nil { + return nil, err + } + if info.entry.IsDirectory { + return nil, fmt.Errorf("%s: is a directory", filename) + } + file := &seaweedFile{ + fs: fs, + virtualPath: virtualPath, + info: info, + writable: writable, + appendOnly: writable && flag&os.O_APPEND != 0, + } + if writable { + // O_TRUNC is effectively "rewrite from zero". Drop all chunks and + // inline content up front — but only if there is anything to drop, + // since a fresh empty file already satisfies the O_TRUNC semantics + // and an extra UpdateEntry would just churn metadata. 
+ if flag&os.O_TRUNC != 0 && (filer.FileSize(info.entry) > 0 || len(info.entry.GetChunks()) > 0 || len(info.entry.Content) > 0) { + truncatedEntry, truncErr := fs.truncateEntryToSize(ctx, info.actualPath, 0) + if truncErr != nil { + return nil, truncErr + } + updatedInfo, infoErr := fs.materializeFileInfo(ctx, virtualPath, info.actualPath, truncatedEntry) + if infoErr != nil { + return nil, infoErr + } + file.info = updatedInfo + } + if flag&os.O_APPEND != 0 { + file.offset = int64(filer.FileSize(file.info.entry)) + } + } + return file, nil +} + +func (fs *seaweedFileSystem) Stat(filename string) (os.FileInfo, error) { + return fs.fileInfoForVirtualPathWithOptions(context.Background(), filename, true) +} + +func (fs *seaweedFileSystem) Lstat(filename string) (os.FileInfo, error) { + return fs.fileInfoForVirtualPathWithOptions(context.Background(), filename, false) +} + +func (fs *seaweedFileSystem) Rename(oldpath, newpath string) error { + if err := fs.ensureWritable(); err != nil { + return err + } + oldVirtualPath, oldActualPath := fs.resolvePath(oldpath) + _, newActualPath := fs.resolvePath(newpath) + + if oldVirtualPath == "/" || cleanBillyPath(newpath) == "/" { + return os.ErrPermission + } + if _, err := fs.fileInfoForVirtualPath(context.Background(), oldVirtualPath); err != nil { + return err + } + + oldDir, oldName := oldActualPath.DirAndName() + newDir, newName := newActualPath.DirAndName() + return fs.server.withInternalClient(false, func(client nfsFilerClient) error { + _, err := client.AtomicRenameEntry(context.Background(), &filer_pb.AtomicRenameEntryRequest{ + OldDirectory: oldDir, + OldName: oldName, + NewDirectory: newDir, + NewName: newName, + }) + if err != nil { + if isLookupNotFound(err) { + return os.ErrNotExist + } + return err + } + return nil + }) +} + +func (fs *seaweedFileSystem) Remove(filename string) error { + if err := fs.ensureWritable(); err != nil { + return err + } + virtualPath, actualPath := fs.resolvePath(filename) + if 
virtualPath == "/" { + return os.ErrPermission + } + if _, err := fs.fileInfoForVirtualPath(context.Background(), virtualPath); err != nil { + return err + } + + dir, name := actualPath.DirAndName() + return fs.server.withInternalClient(false, func(client nfsFilerClient) error { + resp, err := client.DeleteEntry(context.Background(), &filer_pb.DeleteEntryRequest{ + Directory: dir, + Name: name, + IsDeleteData: false, + IsRecursive: false, + }) + if err != nil { + if isLookupNotFound(err) { + return os.ErrNotExist + } + return err + } + if resp != nil && resp.Error != "" { + if strings.Contains(resp.Error, filer_pb.ErrNotFound.Error()) { + return os.ErrNotExist + } + return errors.New(resp.Error) + } + return nil + }) +} + +func (fs *seaweedFileSystem) Join(elem ...string) string { + if len(elem) == 0 { + return "/" + } + joined := path.Join(elem...) + if joined == "." || joined == "" { + return "/" + } + if !strings.HasPrefix(joined, "/") { + joined = "/" + joined + } + return path.Clean(joined) +} + +func (fs *seaweedFileSystem) TempFile(string, string) (billy.File, error) { + return nil, billy.ErrReadOnly +} + +func (fs *seaweedFileSystem) ReadDir(dirname string) ([]os.FileInfo, error) { + ctx := context.Background() + virtualPath, actualPath := fs.resolvePath(dirname) + + var infos []os.FileInfo + startFrom := "" + for { + pageCount := 0 + lastName := "" + err := fs.server.withInternalClient(false, func(client nfsFilerClient) error { + stream, err := client.ListEntries(ctx, &filer_pb.ListEntriesRequest{ + Directory: string(actualPath), + StartFromFileName: startFrom, + InclusiveStartFrom: false, + Limit: listEntriesPageSize, + }) + if err != nil { + if isLookupNotFound(err) { + return os.ErrNotExist + } + return err + } + + for { + resp, recvErr := stream.Recv() + if recvErr == io.EOF { + break + } + if recvErr != nil { + return recvErr + } + if resp == nil || resp.Entry == nil { + continue + } + + lastName = resp.Entry.Name + pageCount++ + + childVirtualPath := 
path.Join(virtualPath, resp.Entry.Name) + childActualPath := util.NewFullPath(string(actualPath), resp.Entry.Name) + info, infoErr := fs.materializeFileInfo(ctx, childVirtualPath, childActualPath, resp.Entry) + if infoErr != nil { + return infoErr + } + infos = append(infos, info) + } + return nil + }) + if err != nil { + return nil, err + } + if pageCount < listEntriesPageSize || lastName == "" { + break + } + startFrom = lastName + } + + sort.Slice(infos, func(i, j int) bool { + return infos[i].Name() < infos[j].Name() + }) + return infos, nil +} + +func (fs *seaweedFileSystem) MkdirAll(filename string, perm os.FileMode) error { + if err := fs.ensureWritable(); err != nil { + return err + } + virtualPath := cleanBillyPath(filename) + if virtualPath == "/" { + return nil + } + + info, err := fs.fileInfoForVirtualPath(context.Background(), virtualPath) + if err == nil { + if info.IsDir() { + return nil + } + return os.ErrExist + } + if !os.IsNotExist(err) { + return err + } + + _, actualPath := fs.resolvePath(virtualPath) + _, err = fs.createEntry(context.Background(), actualPath, true, perm|os.ModeDir, "") + return err +} + +func (fs *seaweedFileSystem) Symlink(target, link string) error { + if err := fs.ensureWritable(); err != nil { + return err + } + virtualPath, actualPath := fs.resolvePath(link) + if virtualPath == "/" { + return os.ErrPermission + } + if _, err := fs.fileInfoForVirtualPath(context.Background(), virtualPath); err == nil { + return os.ErrExist + } else if !os.IsNotExist(err) { + return err + } + + _, err := fs.createEntry(context.Background(), actualPath, false, 0o777, target) + return err +} + +func (fs *seaweedFileSystem) Link(target, link string) error { + if err := fs.ensureWritable(); err != nil { + return err + } + ctx := context.Background() + + linkVirtualPath, linkActualPath := fs.resolvePath(link) + if linkVirtualPath == "/" { + return os.ErrPermission + } + if _, err := fs.fileInfoForVirtualPath(ctx, linkVirtualPath); err == nil { + 
return os.ErrExist + } else if !os.IsNotExist(err) { + return err + } + + sourceActualPath, sourceEntry, err := fs.resolveHardLinkTarget(ctx, target) + if err != nil { + return err + } + if sourceEntry == nil { + return os.ErrNotExist + } + if sourceEntry.IsDirectory { + return billy.ErrNotSupported + } + + sourceOriginal, ok := proto.Clone(sourceEntry).(*filer_pb.Entry) + if !ok { + return errors.New("clone hard link source entry") + } + + updatedSource, err := fs.mutateEntry(ctx, sourceActualPath, func(entry *filer_pb.Entry) { + if entry.Attributes == nil { + entry.Attributes = &filer_pb.FuseAttributes{} + } + if len(entry.HardLinkId) == 0 { + entry.HardLinkId = filer.NewHardLinkId() + entry.HardLinkCounter = 1 + } + entry.HardLinkCounter++ + touchEntryTimes(entry, true) + }) + if err != nil { + return err + } + + newLinkEntry, ok := proto.Clone(updatedSource).(*filer_pb.Entry) + if !ok { + return errors.New("clone hard link target entry") + } + _, linkName := linkActualPath.DirAndName() + newLinkEntry.Name = linkName + + if _, err := fs.createEntryFromProto(ctx, linkActualPath, newLinkEntry); err != nil { + _, rollbackErr := fs.updateEntryAtPath(ctx, sourceActualPath, sourceOriginal) + if rollbackErr != nil { + return fmt.Errorf("create hard link: %w (rollback failed: %v)", err, rollbackErr) + } + return err + } + + return nil +} + +func (fs *seaweedFileSystem) Readlink(link string) (string, error) { + info, err := fs.fileInfoForVirtualPath(context.Background(), link) + if err != nil { + return "", err + } + if info.entry.Attributes == nil || info.entry.Attributes.SymlinkTarget == "" { + return "", billy.ErrNotSupported + } + return info.entry.Attributes.SymlinkTarget, nil +} + +func (fs *seaweedFileSystem) Mknod(string, uint32, uint32, uint32) error { + return billy.ErrNotSupported +} + +func (fs *seaweedFileSystem) Mkfifo(string, uint32) error { + return billy.ErrNotSupported +} + +func (fs *seaweedFileSystem) Socket(string) error { + return 
billy.ErrNotSupported +} + +func (fs *seaweedFileSystem) Chroot(p string) (billy.Filesystem, error) { + info, err := fs.fileInfoForVirtualPath(context.Background(), p) + if err != nil { + return nil, err + } + if !info.IsDir() { + return nil, fmt.Errorf("%s: not a directory", p) + } + return newSeaweedFileSystem(fs.server, info.actualPath, fs.readerCache), nil +} + +func (fs *seaweedFileSystem) Chmod(name string, mode os.FileMode) error { + if err := fs.ensureWritable(); err != nil { + return err + } + _, actualPath := fs.resolvePath(name) + _, err := fs.mutateEntry(context.Background(), actualPath, func(entry *filer_pb.Entry) { + entry.Attributes.FileMode = uint32(mode) + touchEntryTimes(entry, false) + }) + return err +} + +func (fs *seaweedFileSystem) Lchown(name string, uid, gid int) error { + if err := fs.ensureWritable(); err != nil { + return err + } + _, actualPath := fs.resolvePath(name) + _, err := fs.mutateEntry(context.Background(), actualPath, func(entry *filer_pb.Entry) { + entry.Attributes.Uid = uint32(uid) + entry.Attributes.Gid = uint32(gid) + touchEntryTimes(entry, false) + }) + return err +} + +func (fs *seaweedFileSystem) Chown(name string, uid, gid int) error { + return fs.Lchown(name, uid, gid) +} + +func (fs *seaweedFileSystem) Chtimes(name string, _ time.Time, mtime time.Time) error { + if err := fs.ensureWritable(); err != nil { + return err + } + _, actualPath := fs.resolvePath(name) + _, err := fs.mutateEntry(context.Background(), actualPath, func(entry *filer_pb.Entry) { + entry.Attributes.Mtime = mtime.Unix() + entry.Attributes.MtimeNs = int32(mtime.Nanosecond()) + entry.Attributes.Ctime = mtime.Unix() + entry.Attributes.CtimeNs = int32(mtime.Nanosecond()) + }) + return err +} + +func (fs *seaweedFileSystem) Root() string { + return "/" +} + +func (fs *seaweedFileSystem) WithFilerClient(streamingMode bool, fn func(filer_pb.SeaweedFilerClient) error) error { + return fs.server.WithFilerClient(streamingMode, fn) +} + +func (fs 
*seaweedFileSystem) LookupFn() wdclient.LookupFileIdFunctionType { + if fs == nil || fs.server == nil { + return nil + } + return fs.server.LookupFn() +} + +func (fs *seaweedFileSystem) AdjustedUrl(location *filer_pb.Location) string { + if location == nil { + return "" + } + if fs.server.option.VolumeServerAccess == "publicUrl" && location.PublicUrl != "" { + return location.PublicUrl + } + return location.Url +} + +func (fs *seaweedFileSystem) isReadOnly() bool { + return fs != nil && fs.server != nil && fs.server.option != nil && fs.server.option.ReadOnly +} + +func (fs *seaweedFileSystem) ensureWritable() error { + if fs.isReadOnly() { + return billy.ErrReadOnly + } + return nil +} + +func (fs *seaweedFileSystem) GetDataCenter() string { + return "" +} + +func (fs *seaweedFileSystem) resolvePath(name string) (string, util.FullPath) { + virtualPath := cleanBillyPath(name) + if virtualPath == "/" { + return virtualPath, fs.actualRoot + } + return virtualPath, fs.actualRoot.Child(strings.TrimPrefix(virtualPath, "/")) +} + +func (fs *seaweedFileSystem) ensureOpenEntry(ctx context.Context, virtualPath string, flag int, perm os.FileMode) (*seaweedFileInfo, error) { + info, err := fs.fileInfoForVirtualPath(ctx, virtualPath) + if err == nil { + if flag&os.O_CREATE != 0 && flag&os.O_EXCL != 0 { + return nil, os.ErrExist + } + return info, nil + } + if !os.IsNotExist(err) { + return nil, err + } + if flag&os.O_CREATE == 0 { + return nil, err + } + + _, actualPath := fs.resolvePath(virtualPath) + if perm == 0 { + perm = 0o666 + } + entry, createErr := fs.createEntry(ctx, actualPath, false, perm, "") + if createErr != nil { + return nil, createErr + } + return fs.materializeFileInfo(ctx, virtualPath, actualPath, entry) +} + +func (fs *seaweedFileSystem) createEntry(ctx context.Context, actualPath util.FullPath, isDirectory bool, mode os.FileMode, symlinkTarget string) (*filer_pb.Entry, error) { + dir, name := actualPath.DirAndName() + now := time.Now() + entry := 
&filer_pb.Entry{ + Name: name, + IsDirectory: isDirectory, + Attributes: &filer_pb.FuseAttributes{ + Mtime: now.Unix(), + MtimeNs: int32(now.Nanosecond()), + Ctime: now.Unix(), + CtimeNs: int32(now.Nanosecond()), + Crtime: now.Unix(), + FileMode: uint32(mode), + Uid: filer_pb.OS_UID, + Gid: filer_pb.OS_GID, + }, + } + if isDirectory { + entry.Attributes.FileMode = uint32(mode | os.ModeDir) + } + if symlinkTarget != "" { + entry.Attributes.SymlinkTarget = symlinkTarget + } + + var createdEntry *filer_pb.Entry + err := fs.server.withInternalClient(false, func(client nfsFilerClient) error { + resp, err := client.CreateEntry(ctx, &filer_pb.CreateEntryRequest{ + Directory: dir, + Entry: entry, + OExcl: false, + }) + if err != nil { + if errors.Is(err, filer_pb.ErrEntryAlreadyExists) { + return os.ErrExist + } + return err + } + if resp != nil { + if resp.ErrorCode != filer_pb.FilerError_OK { + if sentinel := filer_pb.FilerErrorToSentinel(resp.ErrorCode); sentinel != nil { + if errors.Is(sentinel, filer_pb.ErrEntryAlreadyExists) { + return os.ErrExist + } + return sentinel + } + if resp.Error != "" { + return errors.New(resp.Error) + } + } + if resp.MetadataEvent != nil && resp.MetadataEvent.EventNotification != nil && resp.MetadataEvent.EventNotification.NewEntry != nil { + createdEntry = resp.MetadataEvent.EventNotification.NewEntry + } + } + return nil + }) + if err != nil { + return nil, err + } + if createdEntry != nil { + return createdEntry, nil + } + return fs.lookupEntry(ctx, actualPath) +} + +func (fs *seaweedFileSystem) createEntryFromProto(ctx context.Context, actualPath util.FullPath, entry *filer_pb.Entry) (*filer_pb.Entry, error) { + dir, name := actualPath.DirAndName() + + clonedEntry, ok := proto.Clone(entry).(*filer_pb.Entry) + if !ok { + return nil, errors.New("clone filer entry") + } + clonedEntry.Name = name + + var createdEntry *filer_pb.Entry + err := fs.server.withInternalClient(false, func(client nfsFilerClient) error { + resp, err := 
client.CreateEntry(ctx, &filer_pb.CreateEntryRequest{ + Directory: dir, + Entry: clonedEntry, + OExcl: false, + }) + if err != nil { + if errors.Is(err, filer_pb.ErrEntryAlreadyExists) { + return os.ErrExist + } + return err + } + if resp != nil { + if resp.ErrorCode != filer_pb.FilerError_OK { + if sentinel := filer_pb.FilerErrorToSentinel(resp.ErrorCode); sentinel != nil { + if errors.Is(sentinel, filer_pb.ErrEntryAlreadyExists) { + return os.ErrExist + } + return sentinel + } + if resp.Error != "" { + return errors.New(resp.Error) + } + } + if resp.MetadataEvent != nil && resp.MetadataEvent.EventNotification != nil && resp.MetadataEvent.EventNotification.NewEntry != nil { + createdEntry = resp.MetadataEvent.EventNotification.NewEntry + } + } + return nil + }) + if err != nil { + return nil, err + } + if createdEntry != nil { + return createdEntry, nil + } + return fs.lookupEntry(ctx, actualPath) +} + +func (fs *seaweedFileSystem) mutateEntry(ctx context.Context, actualPath util.FullPath, mutate func(*filer_pb.Entry)) (*filer_pb.Entry, error) { + currentEntry, err := fs.lookupEntry(ctx, actualPath) + if err != nil { + return nil, err + } + + clonedEntry, ok := proto.Clone(currentEntry).(*filer_pb.Entry) + if !ok { + return nil, errors.New("clone filer entry") + } + if clonedEntry.Attributes == nil { + clonedEntry.Attributes = &filer_pb.FuseAttributes{} + } + + mutate(clonedEntry) + + dir, _ := actualPath.DirAndName() + var updatedEntry *filer_pb.Entry + err = fs.server.withInternalClient(false, func(client nfsFilerClient) error { + resp, err := client.UpdateEntry(ctx, &filer_pb.UpdateEntryRequest{ + Directory: dir, + Entry: clonedEntry, + }) + if err != nil { + return err + } + if resp != nil && resp.MetadataEvent != nil && resp.MetadataEvent.EventNotification != nil && resp.MetadataEvent.EventNotification.NewEntry != nil { + updatedEntry = resp.MetadataEvent.EventNotification.NewEntry + } + return nil + }) + if err != nil { + return nil, err + } + if 
updatedEntry != nil { + return updatedEntry, nil + } + return fs.lookupEntry(ctx, actualPath) +} + +func (fs *seaweedFileSystem) updateEntryAtPath(ctx context.Context, actualPath util.FullPath, entry *filer_pb.Entry) (*filer_pb.Entry, error) { + clonedEntry, ok := proto.Clone(entry).(*filer_pb.Entry) + if !ok { + return nil, errors.New("clone filer entry") + } + _, name := actualPath.DirAndName() + clonedEntry.Name = name + + dir, _ := actualPath.DirAndName() + var updatedEntry *filer_pb.Entry + err := fs.server.withInternalClient(false, func(client nfsFilerClient) error { + resp, err := client.UpdateEntry(ctx, &filer_pb.UpdateEntryRequest{ + Directory: dir, + Entry: clonedEntry, + }) + if err != nil { + return err + } + if resp != nil && resp.MetadataEvent != nil && resp.MetadataEvent.EventNotification != nil && resp.MetadataEvent.EventNotification.NewEntry != nil { + updatedEntry = resp.MetadataEvent.EventNotification.NewEntry + } + return nil + }) + if err != nil { + return nil, err + } + if updatedEntry != nil { + return updatedEntry, nil + } + return fs.lookupEntry(ctx, actualPath) +} + +// saveDataAsChunk uploads `content` to a volume server and returns a filer +// FileChunk describing the resulting segment at the requested file offset. +// The caller is responsible for wiring the returned chunk into the entry's +// chunk list (typically via mutateEntry) and for updating FileSize. +// +// The actual AssignVolume + HTTP upload is handled by +// filer.SaveGatewayDataAsChunk so NFS, WebDAV, and future filer-backed +// gateways share a single implementation of that code path. 
+func (fs *seaweedFileSystem) saveDataAsChunk(actualPath util.FullPath, content []byte, fileOffset int64) (*filer_pb.FileChunk, error) { + uploader, err := fs.server.newUploader() + if err != nil { + return nil, fmt.Errorf("upload data: %w", err) + } + + return filer.SaveGatewayDataAsChunk(filer.GatewayChunkUploadRequest{ + FilerClient: fs, + Uploader: uploader, + Reader: util.NewBytesReader(content), + FullPath: string(actualPath), + Filename: actualPath.Name(), + Offset: fileOffset, + TsNs: time.Now().UnixNano(), + DataCenter: fs.GetDataCenter(), + VolumeServerAccess: fs.server.option.VolumeServerAccess, + FilerHTTPAddress: fs.server.option.Filer.ToHttpAddress(), + }) +} + +// appendStreamedChunk uploads `data` at `fileOffset` and atomically appends +// the resulting chunk to the filer entry, extending FileSize if this write +// grew the file. If the entry currently stores its payload inline in +// entry.Content, that content is migrated to a chunk first so the chunk +// list becomes the authoritative representation for the file. +func (fs *seaweedFileSystem) appendStreamedChunk(ctx context.Context, info *seaweedFileInfo, data []byte, fileOffset int64) (*filer_pb.Entry, error) { + // Upload the caller's write as a chunk at the target offset. + newChunk, err := fs.saveDataAsChunk(info.actualPath, data, fileOffset) + if err != nil { + return nil, err + } + + // If the file still has inline content, migrate it to a chunk as well. + // We upload it outside of mutateEntry so the mutation closure stays + // synchronous and short. 
+ var migratedInlineChunk *filer_pb.FileChunk + if info.entry != nil && len(info.entry.Content) > 0 { + migratedInlineChunk, err = fs.saveDataAsChunk(info.actualPath, info.entry.Content, 0) + if err != nil { + return nil, err + } + } + + newEnd := uint64(fileOffset) + uint64(len(data)) + return fs.mutateEntry(ctx, info.actualPath, func(entry *filer_pb.Entry) { + if migratedInlineChunk != nil && len(entry.Content) > 0 { + entry.Chunks = append(entry.Chunks, migratedInlineChunk) + entry.Content = nil + } + entry.Chunks = append(entry.Chunks, newChunk) + entry.RemoteEntry = nil + if newEnd > entry.Attributes.FileSize { + entry.Attributes.FileSize = newEnd + } + touchEntryTimes(entry, true) + }) +} + +// truncateEntryToSize resizes the file to `size` by dropping chunks that +// live entirely past the new size, clipping inline content, and updating +// FileSize. Chunks that straddle the new size are left intact; the filer's +// chunk-view layer clips the logical read window at FileSize. +func (fs *seaweedFileSystem) truncateEntryToSize(ctx context.Context, actualPath util.FullPath, size int64) (*filer_pb.Entry, error) { + if size < 0 { + return nil, billy.ErrNotSupported + } + return fs.mutateEntry(ctx, actualPath, func(entry *filer_pb.Entry) { + kept := entry.Chunks[:0] + for _, chunk := range entry.Chunks { + if chunk.Offset >= size { + continue + } + kept = append(kept, chunk) + } + entry.Chunks = kept + if int64(len(entry.Content)) > size { + if size == 0 { + entry.Content = nil + } else { + entry.Content = entry.Content[:size] + } + } + entry.Attributes.FileSize = uint64(size) + touchEntryTimes(entry, true) + }) +} + +func (fs *seaweedFileSystem) fileInfoForVirtualPath(ctx context.Context, name string) (*seaweedFileInfo, error) { + return fs.fileInfoForVirtualPathWithOptions(ctx, name, false) +} + +func (fs *seaweedFileSystem) fileInfoForVirtualPathWithOptions(ctx context.Context, name string, followFinalSymlink bool) (*seaweedFileInfo, error) { + return 
fs.fileInfoForVirtualPathDepth(ctx, name, followFinalSymlink, 0) +} + +func (fs *seaweedFileSystem) fileInfoForVirtualPathDepth(ctx context.Context, name string, followFinalSymlink bool, depth int) (*seaweedFileInfo, error) { + virtualPath, actualPath := fs.resolvePath(name) + + entry, err := fs.lookupEntry(ctx, actualPath) + if err != nil { + return nil, err + } + info, err := fs.materializeFileInfo(ctx, virtualPath, actualPath, entry) + if err != nil { + return nil, err + } + if !followFinalSymlink { + return info, nil + } + return fs.followSymlinkInfo(ctx, info, depth) +} + +func (fs *seaweedFileSystem) followSymlinkInfo(ctx context.Context, info *seaweedFileInfo, depth int) (*seaweedFileInfo, error) { + if info == nil || info.entry == nil || info.entry.Attributes == nil || info.entry.Attributes.SymlinkTarget == "" { + return info, nil + } + if depth >= maxSymlinkDepth { + return nil, fmt.Errorf("%s: too many symlinks", info.virtualPath) + } + targetPath := resolveSymlinkVirtualPath(info.virtualPath, info.entry.Attributes.SymlinkTarget) + return fs.fileInfoForVirtualPathDepth(ctx, targetPath, true, depth+1) +} + +func resolveSymlinkVirtualPath(linkPath, target string) string { + if strings.HasPrefix(target, "/") { + return cleanBillyPath(target) + } + return cleanBillyPath(path.Join(path.Dir(cleanBillyPath(linkPath)), target)) +} + +func (fs *seaweedFileSystem) materializeFileInfo(ctx context.Context, virtualPath string, actualPath util.FullPath, entry *filer_pb.Entry) (*seaweedFileInfo, error) { + entry, generation, err := fs.ensureIndexedEntry(ctx, actualPath, entry) + if err != nil { + return nil, err + } + + fileID := entry.Attributes.GetInode() + if fileID == 0 && actualPath == fs.server.exportRoot && entry.IsDirectory { + fileID = uint64(fs.server.exportID) + } + + return &seaweedFileInfo{ + name: fileInfoName(virtualPath, entry), + virtualPath: virtualPath, + size: int64(filer.FileSize(entry)), + mode: fileModeForEntry(entry), + modTime: 
entryModTime(entry), + actualPath: actualPath, + entry: entry, + generation: generation, + fileID: fileID, + nlink: entryLinkCount(entry), + }, nil +} + +func (fs *seaweedFileSystem) lookupEntry(ctx context.Context, actualPath util.FullPath) (*filer_pb.Entry, error) { + var entry *filer_pb.Entry + err := fs.server.withInternalClient(false, func(client nfsFilerClient) error { + dir, name := actualPath.DirAndName() + resp, err := client.LookupDirectoryEntry(ctx, &filer_pb.LookupDirectoryEntryRequest{ + Directory: dir, + Name: name, + }) + if err != nil { + return err + } + if resp == nil || resp.Entry == nil { + return filer_pb.ErrNotFound + } + entry = resp.Entry + return nil + }) + if err == nil { + return entry, nil + } + if isLookupNotFound(err) { + if actualPath == "/" { + return syntheticRootEntry(), nil + } + return nil, os.ErrNotExist + } + return nil, err +} + +func (fs *seaweedFileSystem) resolveHardLinkTarget(ctx context.Context, target string) (util.FullPath, *filer_pb.Entry, error) { + var resolved *ResolvedHandle + handleErr := fs.server.withInternalClient(false, func(client nfsFilerClient) error { + var err error + resolved, err = NewResolver(fs.server.exportRoot, client).ResolveHandle(ctx, []byte(target)) + return err + }) + if handleErr == nil && resolved != nil { + return resolved.Path, resolved.Entry, nil + } + + if strings.HasPrefix(target, "/") { + _, actualPath := fs.resolvePath(target) + entry, err := fs.lookupEntry(ctx, actualPath) + if err != nil { + return "", nil, err + } + return actualPath, entry, nil + } + + if handleErr != nil { + return "", nil, handleErr + } + return "", nil, os.ErrNotExist +} + +func (fs *seaweedFileSystem) ensureIndexedEntry(ctx context.Context, actualPath util.FullPath, entry *filer_pb.Entry) (*filer_pb.Entry, uint64, error) { + if entry == nil { + return nil, 0, os.ErrNotExist + } + if entry.Attributes == nil { + entry.Attributes = &filer_pb.FuseAttributes{} + } + + if entry.Attributes.Inode == 0 && !(actualPath 
== "/" && entry.Name == "/" && entry.IsDirectory) { + updatedEntry, err := fs.backfillLegacyInode(ctx, actualPath, entry) + if err != nil { + return nil, 0, err + } + entry = updatedEntry + } + + if entry.Attributes.GetInode() == 0 { + if actualPath == "/" && entry.Name == "/" && entry.IsDirectory { + return entry, filer.InodeIndexInitialGeneration, nil + } + return nil, 0, fmt.Errorf("nfs requires inode-backed entry for %s", actualPath) + } + + generation, err := fs.lookupGeneration(ctx, entry.Attributes.GetInode()) + if err != nil { + return nil, 0, err + } + return entry, generation, nil +} + +func (fs *seaweedFileSystem) backfillLegacyInode(ctx context.Context, actualPath util.FullPath, entry *filer_pb.Entry) (*filer_pb.Entry, error) { + dir, _ := actualPath.DirAndName() + clonedEntry, ok := proto.Clone(entry).(*filer_pb.Entry) + if !ok { + return nil, errors.New("clone filer entry") + } + + var updatedEntry *filer_pb.Entry + err := fs.server.withInternalClient(false, func(client nfsFilerClient) error { + resp, err := client.UpdateEntry(ctx, &filer_pb.UpdateEntryRequest{ + Directory: dir, + Entry: clonedEntry, + }) + if err != nil { + return err + } + if resp != nil && resp.MetadataEvent != nil && resp.MetadataEvent.EventNotification != nil && resp.MetadataEvent.EventNotification.NewEntry != nil { + updatedEntry = resp.MetadataEvent.EventNotification.NewEntry + } + return nil + }) + if err != nil { + return nil, err + } + if updatedEntry != nil { + return updatedEntry, nil + } + return fs.lookupEntry(ctx, actualPath) +} + +func (fs *seaweedFileSystem) lookupGeneration(ctx context.Context, inode uint64) (uint64, error) { + var resp *filer_pb.KvGetResponse + err := fs.server.withInternalClient(false, func(client nfsFilerClient) error { + var kvErr error + resp, kvErr = client.KvGet(ctx, &filer_pb.KvGetRequest{Key: filer.InodeIndexKey(inode)}) + return kvErr + }) + if err != nil { + return 0, err + } + if resp == nil { + return 0, ErrStaleHandle + } + if 
resp.GetError() != "" { + return 0, errors.New(resp.GetError()) + } + if len(resp.GetValue()) == 0 { + return 0, ErrStaleHandle + } + + record, err := filer.DecodeInodeIndexRecord(resp.GetValue()) + if err != nil { + return 0, err + } + if record.Generation == 0 { + return filer.InodeIndexInitialGeneration, nil + } + return record.Generation, nil +} + +func fileInfoName(virtualPath string, entry *filer_pb.Entry) string { + if entry != nil && entry.Name != "" { + return entry.Name + } + if virtualPath == "/" { + return "/" + } + return path.Base(virtualPath) +} + +func fileModeForEntry(entry *filer_pb.Entry) os.FileMode { + mode := os.FileMode(0) + if entry != nil && entry.Attributes != nil { + mode = os.FileMode(entry.Attributes.FileMode) + } + if entry != nil && entry.IsDirectory { + mode |= os.ModeDir + } + if entry != nil && entry.Attributes != nil && entry.Attributes.SymlinkTarget != "" { + mode |= os.ModeSymlink + } + return mode +} + +func entryModTime(entry *filer_pb.Entry) time.Time { + if entry == nil || entry.Attributes == nil { + return time.Unix(0, 0) + } + seconds := entry.Attributes.Mtime + nanos := int64(entry.Attributes.MtimeNs) + if seconds == 0 && nanos == 0 { + seconds = entry.Attributes.Crtime + } + return time.Unix(seconds, nanos) +} + +func entryLinkCount(entry *filer_pb.Entry) uint32 { + if entry == nil { + return 1 + } + if entry.HardLinkCounter > 0 { + return uint32(entry.HardLinkCounter) + } + return 1 +} + +func touchEntryTimes(entry *filer_pb.Entry, updateMtime bool) { + if entry == nil { + return + } + if entry.Attributes == nil { + entry.Attributes = &filer_pb.FuseAttributes{} + } + now := time.Now() + if updateMtime { + entry.Attributes.Mtime = now.Unix() + entry.Attributes.MtimeNs = int32(now.Nanosecond()) + } + entry.Attributes.Ctime = now.Unix() + entry.Attributes.CtimeNs = int32(now.Nanosecond()) + if entry.Attributes.Crtime == 0 { + entry.Attributes.Crtime = now.Unix() + } +} + +func cleanBillyPath(name string) string { + if name 
== "" || name == "." { + return "/" + } + cleaned := path.Clean(name) + if cleaned == "." { + return "/" + } + if !strings.HasPrefix(cleaned, "/") { + cleaned = "/" + cleaned + } + return cleaned +} + +func syntheticRootEntry() *filer_pb.Entry { + return &filer_pb.Entry{ + Name: "/", + IsDirectory: true, + Attributes: &filer_pb.FuseAttributes{ + FileMode: uint32(os.ModeDir | 0755), + }, + } +} + +func (fi *seaweedFileInfo) Name() string { return fi.name } +func (fi *seaweedFileInfo) Size() int64 { return fi.size } +func (fi *seaweedFileInfo) Mode() os.FileMode { return fi.mode } +func (fi *seaweedFileInfo) ModTime() time.Time { return fi.modTime } +func (fi *seaweedFileInfo) IsDir() bool { return fi.mode.IsDir() } +func (fi *seaweedFileInfo) Sys() interface{} { + return &gonfsfile.FileInfo{ + Nlink: fi.nlink, + UID: fi.entry.GetAttributes().GetUid(), + GID: fi.entry.GetAttributes().GetGid(), + Fileid: fi.fileID, + } +} + +func (f *seaweedFile) Name() string { return f.virtualPath } + +func (f *seaweedFile) Read(p []byte) (int, error) { + n, err := f.ReadAt(p, f.offset) + f.offset += int64(n) + return n, err +} + +func (f *seaweedFile) ReadAt(p []byte, off int64) (int, error) { + // Writable opens no longer carry a private in-memory copy of the + // content; reads always go through the filer entry's inline bytes or + // chunk list. Write() refreshes f.info after each append so a + // read-after-write in the same session sees the new data. 
+ if len(f.info.entry.Content) > 0 { + reader := bytes.NewReader(f.info.entry.Content) + return reader.ReadAt(p, off) + } + + fileSize := int64(filer.FileSize(f.info.entry)) + if fileSize == 0 || off >= fileSize { + return 0, io.EOF + } + if f.reader == nil { + visibleIntervals, err := filer.NonOverlappingVisibleIntervals(context.Background(), f.fs.LookupFn(), f.info.entry.GetChunks(), 0, fileSize) + if err != nil { + return 0, err + } + chunkViews := filer.ViewFromVisibleIntervals(visibleIntervals, 0, fileSize) + f.reader = filer.NewChunkReaderAtFromClient(context.Background(), f.fs.readerCache, chunkViews, fileSize, filer.DefaultPrefetchCount) + } + return f.reader.ReadAt(p, off) +} + +func (f *seaweedFile) Write(p []byte) (int, error) { + if !f.writable { + return 0, billy.ErrReadOnly + } + if f.fs != nil && f.fs.isReadOnly() { + return 0, billy.ErrReadOnly + } + if f.closed { + return 0, os.ErrClosed + } + if len(p) == 0 { + return 0, nil + } + if f.offset < 0 { + return 0, billy.ErrNotSupported + } + if f.appendOnly { + f.offset = int64(filer.FileSize(f.info.entry)) + } + + ctx := context.Background() + currentSize := int64(filer.FileSize(f.info.entry)) + hasChunks := len(f.info.entry.GetChunks()) > 0 + + // Inline fast path — mirrors the filer HTTP upload handler's + // SaveToFilerLimit shortcut. As long as the file has no existing + // chunks and the post-write size still fits in the inline budget, + // we rewrite the `Content` bytes directly on the filer entry and + // skip the volume-server round-trip entirely. A write that would + // push the file beyond the inline limit, or a write to a file that + // already has chunks, falls through to the streaming path below. 
+ postWriteSize := f.offset + int64(len(p)) + if postWriteSize < currentSize { + postWriteSize = currentSize + } + var updatedEntry *filer_pb.Entry + var err error + if !hasChunks && postWriteSize <= int64(maxInlineWriteSize) { + existing := f.info.entry.Content + merged := make([]byte, postWriteSize) + copy(merged, existing) + copy(merged[f.offset:], p) + updatedEntry, err = f.fs.mutateEntry(ctx, f.info.actualPath, func(entry *filer_pb.Entry) { + entry.Content = merged + entry.Chunks = nil + entry.RemoteEntry = nil + entry.Attributes.FileSize = uint64(len(merged)) + touchEntryTimes(entry, true) + }) + } else { + // Streaming path: upload the caller's bytes straight to a volume + // server and atomically append the resulting chunk to the filer + // entry. No per-file in-memory buffer is held; each Write call + // costs one AssignVolume + one chunk upload + one filer + // UpdateEntry, exactly like how `weed filer` HTTP uploads and + // the S3 gateway persist object data. + updatedEntry, err = f.fs.appendStreamedChunk(ctx, f.info, p, f.offset) + } + if err != nil { + return 0, err + } + + updatedInfo, err := f.fs.materializeFileInfo(ctx, f.virtualPath, f.info.actualPath, updatedEntry) + if err != nil { + return 0, err + } + f.info = updatedInfo + // Invalidate any cached reader so a subsequent Read sees the new data. 
+ f.reader = nil + + f.offset += int64(len(p)) + return len(p), nil +} + +func (f *seaweedFile) Seek(offset int64, whence int) (int64, error) { + nextOffset := f.offset + switch whence { + case io.SeekStart: + nextOffset = offset + case io.SeekCurrent: + nextOffset += offset + case io.SeekEnd: + if f.writable { + nextOffset = int64(filer.FileSize(f.info.entry)) + offset + } else { + nextOffset = f.info.size + offset + } + default: + return 0, fmt.Errorf("invalid whence %d", whence) + } + if nextOffset < 0 { + nextOffset = 0 + } + // POSIX allows Seek on an O_APPEND file — the append-only constraint + // only restricts Write, not read offsets or lseek positioning. Write + // already snaps the offset back to EOF before writing (see seaweedFile + // Write), so we can accept any Seek here without violating the + // append-only guarantee. + f.offset = nextOffset + return f.offset, nil +} + +func (f *seaweedFile) Close() error { + if f.closed { + return nil + } + f.closed = true + // All dirty data is flushed to the filer synchronously inside Write + // (and inside Truncate), so Close has nothing to do beyond marking the + // handle as unusable. 
+ return nil +} +func (f *seaweedFile) Lock() error { return billy.ErrNotSupported } +func (f *seaweedFile) Unlock() error { return billy.ErrNotSupported } + +func (f *seaweedFile) Truncate(size int64) error { + if !f.writable { + return billy.ErrReadOnly + } + if f.fs != nil && f.fs.isReadOnly() { + return billy.ErrReadOnly + } + if size < 0 { + return billy.ErrNotSupported + } + ctx := context.Background() + updatedEntry, err := f.fs.truncateEntryToSize(ctx, f.info.actualPath, size) + if err != nil { + return err + } + updatedInfo, err := f.fs.materializeFileInfo(ctx, f.virtualPath, f.info.actualPath, updatedEntry) + if err != nil { + return err + } + f.info = updatedInfo + f.reader = nil + if f.offset > size { + f.offset = size + } + return nil +} diff --git a/weed/server/nfs/handler.go b/weed/server/nfs/handler.go new file mode 100644 index 000000000..c51987dc1 --- /dev/null +++ b/weed/server/nfs/handler.go @@ -0,0 +1,127 @@ +package nfs + +import ( + "context" + "net" + "os" + "strings" + + billy "github.com/go-git/go-billy/v5" + "github.com/seaweedfs/seaweedfs/weed/filer" + "github.com/seaweedfs/seaweedfs/weed/pb/filer_pb" + "github.com/seaweedfs/seaweedfs/weed/util" + gonfs "github.com/willscott/go-nfs" +) + +type Handler struct { + server *Server + rootFS *seaweedFileSystem +} + +var _ gonfs.Handler = (*Handler)(nil) + +func (h *Handler) Mount(_ context.Context, conn net.Conn, req gonfs.MountRequest) (gonfs.MountStatus, billy.Filesystem, []gonfs.AuthFlavor) { + if h.server.clientAuthorizer != nil && !h.server.clientAuthorizer.isAllowedConn(conn) { + return gonfs.MountStatusErrAcces, nil, []gonfs.AuthFlavor{gonfs.AuthFlavorNull} + } + requestedPath := normalizeExportRoot(util.FullPath(req.Dirpath)) + if requestedPath != h.server.exportRoot { + return gonfs.MountStatusErrNoEnt, nil, []gonfs.AuthFlavor{gonfs.AuthFlavorNull} + } + if _, err := h.rootFS.Lstat("/"); err != nil { + if os.IsNotExist(err) { + return gonfs.MountStatusErrNoEnt, nil, 
[]gonfs.AuthFlavor{gonfs.AuthFlavorNull} + } + return gonfs.MountStatusErrServerFault, nil, []gonfs.AuthFlavor{gonfs.AuthFlavorNull} + } + return gonfs.MountStatusOk, h.rootFS, []gonfs.AuthFlavor{gonfs.AuthFlavorNull, gonfs.AuthFlavorUnix} +} + +func (h *Handler) Change(filesystem billy.Filesystem) billy.Change { + if h.server != nil && h.server.option != nil && h.server.option.ReadOnly { + return nil + } + if changer, ok := filesystem.(billy.Change); ok { + return changer + } + return nil +} + +func (h *Handler) FSStat(ctx context.Context, _ billy.Filesystem, stat *gonfs.FSStat) error { + return h.server.withInternalClient(false, func(client nfsFilerClient) error { + resp, err := client.Statistics(ctx, &filer_pb.StatisticsRequest{}) + if err != nil { + return err + } + if resp == nil { + return nil + } + stat.TotalSize = resp.TotalSize + if resp.TotalSize >= resp.UsedSize { + stat.FreeSize = resp.TotalSize - resp.UsedSize + stat.AvailableSize = resp.TotalSize - resp.UsedSize + } + stat.TotalFiles = resp.FileCount + return nil + }) +} + +func (h *Handler) ToHandle(filesystem billy.Filesystem, path []string) []byte { + fs, ok := filesystem.(*seaweedFileSystem) + if !ok { + fs = h.rootFS + } + + info, err := fs.fileInfoForVirtualPath(context.Background(), fs.Join(path...)) + if err != nil { + return nil + } + + inode := info.entry.GetAttributes().GetInode() + if inode == 0 && info.actualPath == h.server.exportRoot && info.entry.IsDirectory { + return NewFileHandle(h.server.exportID, FileHandleKindDirectory, 0, filer.InodeIndexInitialGeneration).Encode() + } + + return NewFileHandle(h.server.exportID, fileHandleKindForEntry(info.entry), inode, info.generation).Encode() +} + +func (h *Handler) FromHandle(raw []byte) (billy.Filesystem, []string, error) { + var resolved *ResolvedHandle + err := h.server.withInternalClient(false, func(client nfsFilerClient) error { + var resolveErr error + resolved, resolveErr = NewResolver(h.server.exportRoot, 
client).ResolveHandle(context.Background(), raw) + return resolveErr + }) + if err != nil { + return nil, nil, err + } + + if resolved.Path == h.server.exportRoot { + return h.rootFS, nil, nil + } + + if !pathVisibleFromExport(resolved.Path, h.server.exportRoot) { + return nil, nil, ErrHandleExportMismatch + } + + relativePath := string(resolved.Path) + if h.server.exportRoot != "/" { + relativePath = strings.TrimPrefix(relativePath, string(h.server.exportRoot)) + } + return h.rootFS, util.NormalizePath(relativePath).Split(), nil +} + +func (h *Handler) InvalidateHandle(billy.Filesystem, []byte) error { + return nil +} + +func (h *Handler) HandleLimit() int { + return h.server.handleLimit +} + +func fileHandleKindForEntry(entry *filer_pb.Entry) FileHandleKind { + if entry != nil && entry.IsDirectory { + return FileHandleKindDirectory + } + return FileHandleKindFile +} diff --git a/weed/server/nfs/integration_test.go b/weed/server/nfs/integration_test.go new file mode 100644 index 000000000..9f325f3e8 --- /dev/null +++ b/weed/server/nfs/integration_test.go @@ -0,0 +1,718 @@ +package nfs + +import ( + "bytes" + "context" + "crypto/md5" + "encoding/base64" + "encoding/json" + "fmt" + "io" + "math/rand" + "mime/multipart" + "net" + "net/http" + "net/http/httptest" + "path" + "strconv" + "strings" + "sync" + "testing" + "time" + + "github.com/seaweedfs/seaweedfs/weed/filer" + "github.com/seaweedfs/seaweedfs/weed/pb" + "github.com/seaweedfs/seaweedfs/weed/pb/filer_pb" + "github.com/seaweedfs/seaweedfs/weed/util" + util_http "github.com/seaweedfs/seaweedfs/weed/util/http" + "github.com/seaweedfs/seaweedfs/weed/wdclient" + "github.com/stretchr/testify/assert" + "github.com/stretchr/testify/require" + gonfs "github.com/willscott/go-nfs" + nfsclient "github.com/willscott/go-nfs-client/nfs" + "github.com/willscott/go-nfs-client/nfs/rpc" + "github.com/willscott/go-nfs-client/nfs/xdr" + "google.golang.org/grpc" + "google.golang.org/grpc/credentials/insecure" +) + +type 
fakeVolumeBlob struct { + data []byte + contentEncoding string +} + +type fakeVolumeServer struct { + mu sync.Mutex + blobs map[string]fakeVolumeBlob + server *httptest.Server +} + +type fakeVolumeControlPlane struct { + filer_pb.UnimplementedSeaweedFilerServer + + mu sync.Mutex + host string + nextID int + assigns []*filer_pb.AssignVolumeRequest + lookups []*filer_pb.LookupVolumeRequest +} + +var initIntegrationHTTPClient sync.Once + +const nfsProc3Link = 15 + +func newFakeVolumeServer(t *testing.T) *fakeVolumeServer { + t.Helper() + + fake := &fakeVolumeServer{ + blobs: make(map[string]fakeVolumeBlob), + } + fake.server = httptest.NewServer(http.HandlerFunc(fake.serveHTTP)) + t.Cleanup(fake.server.Close) + return fake +} + +func (f *fakeVolumeServer) host() string { + return strings.TrimPrefix(f.server.URL, "http://") +} + +func (f *fakeVolumeServer) serveHTTP(w http.ResponseWriter, r *http.Request) { + fileID := strings.TrimPrefix(r.URL.Path, "/") + if fileID == "" { + http.NotFound(w, r) + return + } + + switch r.Method { + case http.MethodPost: + part, err := firstMultipartFile(r) + if err != nil { + http.Error(w, err.Error(), http.StatusBadRequest) + return + } + defer part.Close() + + data, err := io.ReadAll(part) + if err != nil { + http.Error(w, err.Error(), http.StatusBadRequest) + return + } + + contentEncoding := part.Header.Get("Content-Encoding") + sum := md5.Sum(data) + + f.mu.Lock() + f.blobs[fileID] = fakeVolumeBlob{ + data: bytes.Clone(data), + contentEncoding: contentEncoding, + } + f.mu.Unlock() + + w.Header().Set("Content-MD5", base64.StdEncoding.EncodeToString(sum[:])) + w.Header().Set("ETag", `"`+base64.StdEncoding.EncodeToString(sum[:])+`"`) + w.Header().Set("Content-Type", "application/json") + _ = json.NewEncoder(w).Encode(map[string]any{ + "name": path.Base(fileID), + "size": len(data), + }) + case http.MethodGet: + f.mu.Lock() + blob, found := f.blobs[fileID] + f.mu.Unlock() + if !found { + http.NotFound(w, r) + return + } + if 
blob.contentEncoding != "" { + w.Header().Set("Content-Encoding", blob.contentEncoding) + } + http.ServeContent(w, r, fileID, time.Unix(0, 0), bytes.NewReader(blob.data)) + default: + http.Error(w, "method not allowed", http.StatusMethodNotAllowed) + } +} + +func firstMultipartFile(r *http.Request) (*multipart.Part, error) { + reader, err := r.MultipartReader() + if err != nil { + return nil, err + } + + for { + part, err := reader.NextPart() + if err == io.EOF { + return nil, io.ErrUnexpectedEOF + } + if err != nil { + return nil, err + } + if part.FormName() == "file" { + return part, nil + } + part.Close() + } +} + +func (f *fakeVolumeControlPlane) AssignVolume(_ context.Context, req *filer_pb.AssignVolumeRequest) (*filer_pb.AssignVolumeResponse, error) { + f.mu.Lock() + defer f.mu.Unlock() + + f.assigns = append(f.assigns, req) + f.nextID++ + fileID := fmt.Sprintf("7,%08x", f.nextID) + return &filer_pb.AssignVolumeResponse{ + FileId: fileID, + Count: 1, + Location: &filer_pb.Location{ + Url: f.host, + }, + }, nil +} + +func (f *fakeVolumeControlPlane) LookupVolume(_ context.Context, req *filer_pb.LookupVolumeRequest) (*filer_pb.LookupVolumeResponse, error) { + f.mu.Lock() + f.lookups = append(f.lookups, req) + f.mu.Unlock() + + locations := make(map[string]*filer_pb.Locations, len(req.GetVolumeIds())) + for _, volumeID := range req.GetVolumeIds() { + locations[volumeID] = &filer_pb.Locations{ + Locations: []*filer_pb.Location{ + {Url: f.host}, + }, + } + } + return &filer_pb.LookupVolumeResponse{LocationsMap: locations}, nil +} + +func startFakeVolumeControlPlane(t *testing.T, controlPlane *fakeVolumeControlPlane) string { + t.Helper() + + listener, err := net.Listen("tcp", "127.0.0.1:0") + require.NoError(t, err) + + grpcServer := grpc.NewServer() + filer_pb.RegisterSeaweedFilerServer(grpcServer, controlPlane) + + done := make(chan error, 1) + go func() { + done <- grpcServer.Serve(listener) + }() + + t.Cleanup(func() { + grpcServer.Stop() + _ = 
listener.Close() + select { + case err := <-done: + if err != nil && !isClosedNetworkErr(err) { + t.Errorf("fake control plane exited with error: %v", err) + } + case <-time.After(time.Second): + t.Errorf("timed out waiting for fake control plane shutdown") + } + }) + + return listener.Addr().String() +} + +func mountTestTarget(t *testing.T, server *Server) (*nfsclient.Target, func()) { + t.Helper() + + listener, err := net.Listen("tcp", "127.0.0.1:0") + require.NoError(t, err) + + handler, err := server.newHandler() + require.NoError(t, err) + + done := make(chan error, 1) + go func() { + done <- gonfs.Serve(listener, handler) + }() + + var client *rpc.Client + for attempt := 0; attempt < 10; attempt++ { + client, err = rpc.DialTCP(listener.Addr().Network(), listener.Addr().String(), false) + if err == nil { + break + } + if attempt == 9 { + require.NoError(t, err) + } + time.Sleep(10 * time.Millisecond) + } + require.NoError(t, err) + + mounter := &nfsclient.Mount{Client: client} + target, err := mounter.Mount(string(server.exportRoot), rpc.AuthNull) + require.NoError(t, err) + + cleanup := func() { + _ = mounter.Unmount() + client.Close() + _ = listener.Close() + + select { + case err := <-done: + if err != nil && !isClosedNetworkErr(err) { + t.Errorf("nfs server exited with error: %v", err) + } + case <-time.After(time.Second): + t.Errorf("timed out waiting for nfs server shutdown") + } + } + + return target, cleanup +} + +func isClosedNetworkErr(err error) bool { + if err == nil { + return false + } + if strings.Contains(err.Error(), "use of closed network connection") { + return true + } + return strings.Contains(err.Error(), "listener closed") +} + +func nfsLink(target *nfsclient.Target, sourceHandle []byte, linkPath string) error { + parentDir, linkName := path.Split(path.Clean(linkPath)) + if linkName == "" { + return fmt.Errorf("invalid hard link path %q", linkPath) + } + if parentDir == "" { + parentDir = "/" + } + + _, parentHandle, err := 
target.Lookup(parentDir) + if err != nil { + return err + } + + // Field layout matches the go-nfs server's onLink handler + // (vendor: github.com/willscott/go-nfs/nfs_onlink.go), which reads + // DirOpArg + SetFileAttributes + opaque target handle. That wire + // order differs from RFC 1813 §3.3.15 LINK3args {nfs_fh3 file; + // diropargs3 link;} — the go-nfs library is not strictly compliant + // here, and we mirror its layout so the integration test exercises + // the same parser the server uses. Do not reorder fields to match + // the RFC: the test would then fail against a correctly-functioning + // server. + type LinkArgs struct { + rpc.Header + Link nfsclient.Diropargs3 + Sattr nfsclient.Sattr3 + Target []byte + } + + res, err := target.Call(&LinkArgs{ + Header: rpc.Header{ + Rpcvers: 2, + Prog: nfsclient.Nfs3Prog, + Vers: nfsclient.Nfs3Vers, + Proc: nfsProc3Link, + Cred: rpc.AuthNull, + Verf: rpc.AuthNull, + }, + Link: nfsclient.Diropargs3{ + FH: parentHandle, + Filename: linkName, + }, + Target: sourceHandle, + }) + if err != nil { + return err + } + + status, err := xdr.ReadUint32(res) + if err != nil { + return err + } + return nfsclient.NFS3Error(status) +} + +func TestSeaweedNFSServesInlineRoundTripOverRPC(t *testing.T) { + client := &fakeNFSFilerClient{ + kv: map[string][]byte{ + string(filer.InodeIndexKey(101)): testIndexRecord(t, 101, 1, "/exports"), + }, + entries: map[util.FullPath]*filer_pb.Entry{ + "/exports": testEntry("exports", true, 101, uint32(0755), nil), + }, + } + + server := newTestServer(t, "/exports", client) + target, cleanup := mountTestTarget(t, server) + defer cleanup() + defer target.Close() + + _, err := target.Mkdir("/docs", 0o755) + require.NoError(t, err) + + file, err := target.OpenFile("/docs/note.txt", 0o644) + require.NoError(t, err) + payload := []byte("hello over rpc") + _, err = file.Write(payload) + require.NoError(t, err) + require.NoError(t, file.Close()) + + readFile, err := target.Open("/docs/note.txt") + 
require.NoError(t, err) + defer readFile.Close() + + data, err := io.ReadAll(readFile) + require.NoError(t, err) + assert.Equal(t, payload, data) + + entry := client.entries["/exports/docs/note.txt"] + require.NotNil(t, entry) + assert.Equal(t, payload, entry.Content) + assert.Empty(t, entry.Chunks) + + _, beforeRenameHandle, err := target.Lookup("/docs/note.txt") + require.NoError(t, err) + + entries, err := target.ReadDirPlus("/docs") + require.NoError(t, err) + require.Len(t, entries, 1) + assert.Equal(t, "note.txt", entries[0].Name()) + + require.NoError(t, target.Rename("/docs/note.txt", "/docs/final.txt")) + _, err = target.GetAttr(beforeRenameHandle) + require.NoError(t, err) + _, _, err = target.Lookup("/docs/final.txt") + require.NoError(t, err) + _, _, err = target.Lookup("/docs/note.txt") + require.Error(t, err) + + require.NoError(t, target.Remove("/docs/final.txt")) + _, _, err = target.Lookup("/docs/final.txt") + require.Error(t, err) +} + +func TestSeaweedNFSReadOnlyRejectsMutations(t *testing.T) { + client := &fakeNFSFilerClient{ + kv: map[string][]byte{ + string(filer.InodeIndexKey(101)): testIndexRecord(t, 101, 1, "/exports"), + string(filer.InodeIndexKey(202)): testIndexRecord(t, 202, 3, "/exports/existing.txt"), + }, + entries: map[util.FullPath]*filer_pb.Entry{ + "/exports": testEntry("exports", true, 101, uint32(0755), nil), + "/exports/existing.txt": testEntry("existing.txt", false, 202, uint32(0644), []byte("seed")), + }, + } + + server := newTestServer(t, "/exports", client) + server.option.ReadOnly = true + + target, cleanup := mountTestTarget(t, server) + defer cleanup() + defer target.Close() + + _, err := target.OpenFile("/created.txt", 0o644) + require.Error(t, err) + nfsErr, ok := err.(*nfsclient.Error) + require.True(t, ok) + assert.Equal(t, uint32(nfsclient.NFS3ErrROFS), nfsErr.ErrorNum) + + file, err := target.Open("/existing.txt") + require.NoError(t, err) + _, err = file.Write([]byte("mutate")) + require.Error(t, err) + nfsErr, 
ok = err.(*nfsclient.Error) + require.True(t, ok) + assert.Equal(t, uint32(nfsclient.NFS3ErrROFS), nfsErr.ErrorNum) + _ = file.Close() + + readFile, err := target.Open("/existing.txt") + require.NoError(t, err) + defer readFile.Close() + + data, err := io.ReadAll(readFile) + require.NoError(t, err) + assert.Equal(t, []byte("seed"), data) +} + +func TestSeaweedNFSServesSymlinkRoundTripOverRPC(t *testing.T) { + client := &fakeNFSFilerClient{ + kv: map[string][]byte{ + string(filer.InodeIndexKey(101)): testIndexRecord(t, 101, 1, "/exports"), + }, + entries: map[util.FullPath]*filer_pb.Entry{ + "/exports": testEntry("exports", true, 101, uint32(0755), nil), + }, + } + + server := newTestServer(t, "/exports", client) + target, cleanup := mountTestTarget(t, server) + defer cleanup() + defer target.Close() + + file, err := target.OpenFile("/target.txt", 0o644) + require.NoError(t, err) + _, err = file.Write([]byte("payload")) + require.NoError(t, err) + require.NoError(t, file.Close()) + + require.NoError(t, target.Symlink("target.txt", "/target.link")) + + info, _, err := target.Lookup("/target.link") + require.NoError(t, err) + attr, ok := info.(*nfsclient.Fattr) + require.True(t, ok) + assert.Equal(t, uint32(nfsclient.NF3Lnk), attr.Type) + + linkFile, err := target.Open("/target.link") + require.NoError(t, err) + defer linkFile.Close() + + linkTarget, err := linkFile.Readlink() + require.NoError(t, err) + assert.Equal(t, "target.txt", linkTarget) + + entry := client.entries["/exports/target.link"] + require.NotNil(t, entry) + assert.Equal(t, "target.txt", entry.GetAttributes().GetSymlinkTarget()) +} + +func TestSeaweedNFSServesHardLinkRoundTripOverRPC(t *testing.T) { + client := &fakeNFSFilerClient{ + kv: map[string][]byte{ + string(filer.InodeIndexKey(101)): testIndexRecord(t, 101, 1, "/exports"), + }, + entries: map[util.FullPath]*filer_pb.Entry{ + "/exports": testEntry("exports", true, 101, uint32(0755), nil), + }, + } + + server := newTestServer(t, "/exports", 
client) + target, cleanup := mountTestTarget(t, server) + defer cleanup() + defer target.Close() + + file, err := target.OpenFile("/source.txt", 0o644) + require.NoError(t, err) + payload := []byte("shared content") + _, err = file.Write(payload) + require.NoError(t, err) + require.NoError(t, file.Close()) + + _, sourceHandle, err := target.Lookup("/source.txt") + require.NoError(t, err) + require.NoError(t, nfsLink(target, sourceHandle, "/linked.txt")) + + sourceInfo, sourceHandle, err := target.Lookup("/source.txt") + require.NoError(t, err) + linkedInfo, linkedHandle, err := target.Lookup("/linked.txt") + require.NoError(t, err) + + sourceAttr, ok := sourceInfo.(*nfsclient.Fattr) + require.True(t, ok) + linkAttr, ok := linkedInfo.(*nfsclient.Fattr) + require.True(t, ok) + assert.Equal(t, sourceHandle, linkedHandle) + assert.Equal(t, sourceAttr.Fileid, linkAttr.Fileid) + assert.Equal(t, uint32(2), sourceAttr.Nlink) + assert.Equal(t, uint32(2), linkAttr.Nlink) + + linkedFile, err := target.Open("/linked.txt") + require.NoError(t, err) + defer linkedFile.Close() + + data, err := io.ReadAll(linkedFile) + require.NoError(t, err) + assert.Equal(t, payload, data) + + sourceEntry := client.entries["/exports/source.txt"] + linkedEntry := client.entries["/exports/linked.txt"] + require.NotNil(t, sourceEntry) + require.NotNil(t, linkedEntry) + assert.Equal(t, sourceEntry.GetHardLinkId(), linkedEntry.GetHardLinkId()) + assert.Equal(t, int32(2), sourceEntry.GetHardLinkCounter()) + assert.Equal(t, int32(2), linkedEntry.GetHardLinkCounter()) + + require.NoError(t, target.Remove("/source.txt")) + + remainingAttr, err := target.GetAttr(sourceHandle) + require.NoError(t, err) + assert.Equal(t, uint32(1), remainingAttr.Nlink) + + _, _, err = target.Lookup("/source.txt") + require.Error(t, err) + + linkedFile, err = target.Open("/linked.txt") + require.NoError(t, err) + data, err = io.ReadAll(linkedFile) + require.NoError(t, err) + require.NoError(t, linkedFile.Close()) + 
assert.Equal(t, payload, data) + + require.NoError(t, target.Remove("/linked.txt")) + _, err = target.GetAttr(linkedHandle) + require.Error(t, err) + nfsErr, ok := err.(*nfsclient.Error) + require.True(t, ok) + assert.Equal(t, uint32(nfsclient.NFS3ErrStale), nfsErr.ErrorNum) +} + +func TestSeaweedNFSServesLargeChunkRoundTripOverRPC(t *testing.T) { + initIntegrationHTTPClient.Do(util_http.InitGlobalHttpClient) + + client := &fakeNFSFilerClient{ + kv: map[string][]byte{ + string(filer.InodeIndexKey(101)): testIndexRecord(t, 101, 1, "/exports"), + }, + entries: map[util.FullPath]*filer_pb.Entry{ + "/exports": testEntry("exports", true, 101, uint32(0755), nil), + }, + } + + volumeServer := newFakeVolumeServer(t) + controlPlane := &fakeVolumeControlPlane{host: volumeServer.host()} + controlPlaneAddr := startFakeVolumeControlPlane(t, controlPlane) + _, grpcPortString, err := net.SplitHostPort(controlPlaneAddr) + require.NoError(t, err) + grpcPort, err := strconv.Atoi(grpcPortString) + require.NoError(t, err) + + server := newTestServer(t, "/exports", client) + server.option.Filer = pb.NewServerAddressWithGrpcPort(controlPlaneAddr, grpcPort) + server.option.GrpcDialOption = grpc.WithTransportCredentials(insecure.NewCredentials()) + if server.filerClient != nil { + server.filerClient.Close() + } + server.filerClient = wdclient.NewFilerClient([]pb.ServerAddress{server.option.Filer}, server.option.GrpcDialOption, "") + server.withFilerClient = func(_ bool, fn func(filer_pb.SeaweedFilerClient) error) error { + conn, err := grpc.NewClient(controlPlaneAddr, grpc.WithTransportCredentials(insecure.NewCredentials())) + if err != nil { + return err + } + defer conn.Close() + return fn(filer_pb.NewSeaweedFilerClient(conn)) + } + + target, cleanup := mountTestTarget(t, server) + defer cleanup() + defer target.Close() + + payload := make([]byte, maxInlineWriteSize+4096) + _, err = rand.New(rand.NewSource(1)).Read(payload) + require.NoError(t, err) + + file, err := 
target.OpenFile("/big.bin", 0o644) + require.NoError(t, err) + _, err = file.Write(payload) + require.NoError(t, err) + require.NoError(t, file.Close()) + + entry := client.entries["/exports/big.bin"] + require.NotNil(t, entry) + require.Len(t, entry.GetChunks(), 1) + assert.Nil(t, entry.Content) + assert.Equal(t, uint64(len(payload)), entry.GetAttributes().GetFileSize()) + + readFile, err := target.Open("/big.bin") + require.NoError(t, err) + defer readFile.Close() + + data, err := io.ReadAll(readFile) + require.NoError(t, err) + assert.Equal(t, payload, data) + + controlPlane.mu.Lock() + defer controlPlane.mu.Unlock() + require.Len(t, controlPlane.assigns, 1) + assert.Equal(t, "/exports/big.bin", controlPlane.assigns[0].GetPath()) + assert.NotEmpty(t, controlPlane.lookups) +} + +func TestSeaweedNFSRejectsStaleHandleAfterDeleteRecreate(t *testing.T) { + client := &fakeNFSFilerClient{ + kv: map[string][]byte{ + string(filer.InodeIndexKey(101)): testIndexRecord(t, 101, 1, "/exports"), + }, + entries: map[util.FullPath]*filer_pb.Entry{ + "/exports": testEntry("exports", true, 101, uint32(0755), nil), + }, + } + + server := newTestServer(t, "/exports", client) + target, cleanup := mountTestTarget(t, server) + defer cleanup() + defer target.Close() + + file, err := target.OpenFile("/stale.txt", 0o644) + require.NoError(t, err) + _, err = file.Write([]byte("old")) + require.NoError(t, err) + require.NoError(t, file.Close()) + + _, oldHandle, err := target.Lookup("/stale.txt") + require.NoError(t, err) + + require.NoError(t, target.Remove("/stale.txt")) + + file, err = target.OpenFile("/stale.txt", 0o644) + require.NoError(t, err) + _, err = file.Write([]byte("new")) + require.NoError(t, err) + require.NoError(t, file.Close()) + + _, err = target.GetAttr(oldHandle) + require.Error(t, err) + nfsErr, ok := err.(*nfsclient.Error) + require.True(t, ok) + assert.Equal(t, uint32(nfsclient.NFS3ErrStale), nfsErr.ErrorNum) + + _, newHandle, err := target.Lookup("/stale.txt") + 
require.NoError(t, err) + _, err = target.GetAttr(newHandle) + require.NoError(t, err) +} + +func TestSeaweedNFSFileHandleSurvivesServerRestart(t *testing.T) { + client := &fakeNFSFilerClient{ + kv: map[string][]byte{ + string(filer.InodeIndexKey(101)): testIndexRecord(t, 101, 1, "/exports"), + }, + entries: map[util.FullPath]*filer_pb.Entry{ + "/exports": testEntry("exports", true, 101, uint32(0755), nil), + }, + } + + server := newTestServer(t, "/exports", client) + target, cleanup := mountTestTarget(t, server) + + file, err := target.OpenFile("/restart.txt", 0o644) + require.NoError(t, err) + payload := []byte("survives restart") + _, err = file.Write(payload) + require.NoError(t, err) + require.NoError(t, file.Close()) + + _, handle, err := target.Lookup("/restart.txt") + require.NoError(t, err) + + target.Close() + cleanup() + + restartedServer := newTestServer(t, "/exports", client) + restartedTarget, restartedCleanup := mountTestTarget(t, restartedServer) + defer restartedCleanup() + defer restartedTarget.Close() + + attr, err := restartedTarget.GetAttr(handle) + require.NoError(t, err) + assert.Equal(t, uint64(client.entries["/exports/restart.txt"].GetAttributes().GetInode()), attr.Fileid) + + _, restartedHandle, err := restartedTarget.Lookup("/restart.txt") + require.NoError(t, err) + assert.Equal(t, handle, restartedHandle) + + readFile, err := restartedTarget.Open("/restart.txt") + require.NoError(t, err) + defer readFile.Close() + + data, err := io.ReadAll(readFile) + require.NoError(t, err) + assert.Equal(t, payload, data) +} diff --git a/weed/server/nfs/internal_client.go b/weed/server/nfs/internal_client.go new file mode 100644 index 000000000..e3bca1882 --- /dev/null +++ b/weed/server/nfs/internal_client.go @@ -0,0 +1,88 @@ +package nfs + +import ( + "context" + + "github.com/seaweedfs/seaweedfs/weed/pb" + "github.com/seaweedfs/seaweedfs/weed/pb/filer_pb" + "google.golang.org/grpc" +) + +type filerClientExecutor func(streamingMode bool, fn 
func(filer_pb.SeaweedFilerClient) error) error +type internalClientExecutor func(streamingMode bool, fn func(nfsFilerClient) error) error + +type nfsListEntriesClient interface { + Recv() (*filer_pb.ListEntriesResponse, error) +} + +type nfsSubscribeMetadataClient interface { + Recv() (*filer_pb.SubscribeMetadataResponse, error) +} + +type nfsFilerClient interface { + KvGet(ctx context.Context, in *filer_pb.KvGetRequest, opts ...grpc.CallOption) (*filer_pb.KvGetResponse, error) + LookupDirectoryEntry(ctx context.Context, in *filer_pb.LookupDirectoryEntryRequest, opts ...grpc.CallOption) (*filer_pb.LookupDirectoryEntryResponse, error) + ListEntries(ctx context.Context, in *filer_pb.ListEntriesRequest, opts ...grpc.CallOption) (nfsListEntriesClient, error) + SubscribeMetadata(ctx context.Context, in *filer_pb.SubscribeMetadataRequest, opts ...grpc.CallOption) (nfsSubscribeMetadataClient, error) + CreateEntry(ctx context.Context, in *filer_pb.CreateEntryRequest, opts ...grpc.CallOption) (*filer_pb.CreateEntryResponse, error) + UpdateEntry(ctx context.Context, in *filer_pb.UpdateEntryRequest, opts ...grpc.CallOption) (*filer_pb.UpdateEntryResponse, error) + DeleteEntry(ctx context.Context, in *filer_pb.DeleteEntryRequest, opts ...grpc.CallOption) (*filer_pb.DeleteEntryResponse, error) + AtomicRenameEntry(ctx context.Context, in *filer_pb.AtomicRenameEntryRequest, opts ...grpc.CallOption) (*filer_pb.AtomicRenameEntryResponse, error) + Statistics(ctx context.Context, in *filer_pb.StatisticsRequest, opts ...grpc.CallOption) (*filer_pb.StatisticsResponse, error) +} + +type grpcNFSFilerClient struct { + client filer_pb.SeaweedFilerClient +} + +func (c grpcNFSFilerClient) KvGet(ctx context.Context, in *filer_pb.KvGetRequest, opts ...grpc.CallOption) (*filer_pb.KvGetResponse, error) { + return c.client.KvGet(ctx, in, opts...) 
+} + +func (c grpcNFSFilerClient) LookupDirectoryEntry(ctx context.Context, in *filer_pb.LookupDirectoryEntryRequest, opts ...grpc.CallOption) (*filer_pb.LookupDirectoryEntryResponse, error) { + return c.client.LookupDirectoryEntry(ctx, in, opts...) +} + +func (c grpcNFSFilerClient) ListEntries(ctx context.Context, in *filer_pb.ListEntriesRequest, opts ...grpc.CallOption) (nfsListEntriesClient, error) { + return c.client.ListEntries(ctx, in, opts...) +} + +func (c grpcNFSFilerClient) SubscribeMetadata(ctx context.Context, in *filer_pb.SubscribeMetadataRequest, opts ...grpc.CallOption) (nfsSubscribeMetadataClient, error) { + return c.client.SubscribeMetadata(ctx, in, opts...) +} + +func (c grpcNFSFilerClient) CreateEntry(ctx context.Context, in *filer_pb.CreateEntryRequest, opts ...grpc.CallOption) (*filer_pb.CreateEntryResponse, error) { + return c.client.CreateEntry(ctx, in, opts...) +} + +func (c grpcNFSFilerClient) UpdateEntry(ctx context.Context, in *filer_pb.UpdateEntryRequest, opts ...grpc.CallOption) (*filer_pb.UpdateEntryResponse, error) { + return c.client.UpdateEntry(ctx, in, opts...) +} + +func (c grpcNFSFilerClient) DeleteEntry(ctx context.Context, in *filer_pb.DeleteEntryRequest, opts ...grpc.CallOption) (*filer_pb.DeleteEntryResponse, error) { + return c.client.DeleteEntry(ctx, in, opts...) +} + +func (c grpcNFSFilerClient) AtomicRenameEntry(ctx context.Context, in *filer_pb.AtomicRenameEntryRequest, opts ...grpc.CallOption) (*filer_pb.AtomicRenameEntryResponse, error) { + return c.client.AtomicRenameEntry(ctx, in, opts...) +} + +func (c grpcNFSFilerClient) Statistics(ctx context.Context, in *filer_pb.StatisticsRequest, opts ...grpc.CallOption) (*filer_pb.StatisticsResponse, error) { + return c.client.Statistics(ctx, in, opts...) 
+} + +func newFilerClientExecutor(option *Option, signature int32) filerClientExecutor { + return func(streamingMode bool, fn func(filer_pb.SeaweedFilerClient) error) error { + return pb.WithGrpcClient(streamingMode, signature, func(grpcConnection *grpc.ClientConn) error { + return fn(filer_pb.NewSeaweedFilerClient(grpcConnection)) + }, option.Filer.ToGrpcAddress(), false, option.GrpcDialOption) + } +} + +func newInternalClientExecutor(option *Option, signature int32) internalClientExecutor { + return func(streamingMode bool, fn func(nfsFilerClient) error) error { + return pb.WithGrpcClient(streamingMode, signature, func(grpcConnection *grpc.ClientConn) error { + return fn(grpcNFSFilerClient{client: filer_pb.NewSeaweedFilerClient(grpcConnection)}) + }, option.Filer.ToGrpcAddress(), false, option.GrpcDialOption) + } +} diff --git a/weed/server/nfs/metadata_follow.go b/weed/server/nfs/metadata_follow.go new file mode 100644 index 000000000..5ed0a44ce --- /dev/null +++ b/weed/server/nfs/metadata_follow.go @@ -0,0 +1,147 @@ +package nfs + +import ( + "context" + "errors" + "io" + "time" + + "github.com/seaweedfs/seaweedfs/weed/glog" + "github.com/seaweedfs/seaweedfs/weed/pb/filer_pb" + "github.com/seaweedfs/seaweedfs/weed/util" +) + +type chunkInvalidator interface { + UnCache(fileID string) +} + +type metadataInvalidation struct { + path util.FullPath + entry *filer_pb.Entry +} + +func (s *Server) runMetadataInvalidationLoop(ctx context.Context) { + if s == nil || s.chunkInvalidator == nil || s.withInternalClient == nil { + return + } + + waitTime := time.Second + for ctx.Err() == nil { + err := s.followMetadataStream(ctx) + if err == nil || errors.Is(err, context.Canceled) || ctx.Err() != nil { + return + } + + glog.V(0).Infof("retry nfs metadata invalidation stream for %s in %v: %v", s.exportRoot, waitTime, err) + + timer := time.NewTimer(waitTime) + select { + case <-ctx.Done(): + if !timer.Stop() { + <-timer.C + } + return + case <-timer.C: + } + if waitTime < 
util.RetryWaitTime { + waitTime += waitTime / 2 + } + } +} + +func (s *Server) followMetadataStream(ctx context.Context) error { + req := &filer_pb.SubscribeMetadataRequest{ + ClientName: "nfs", + PathPrefix: string(s.exportRoot), + ClientId: s.signature, + ClientEpoch: 1, + ClientSupportsBatching: true, + } + + return s.withInternalClient(true, func(client nfsFilerClient) error { + stream, err := client.SubscribeMetadata(ctx, req) + if err != nil { + return err + } + for { + resp, err := stream.Recv() + if err == io.EOF { + return nil + } + if err != nil { + return err + } + s.applyMetadataInvalidationResponse(resp) + } + }) +} + +func (s *Server) applyMetadataInvalidationResponse(resp *filer_pb.SubscribeMetadataResponse) { + if s == nil || s.chunkInvalidator == nil || resp == nil { + return + } + + uncached := make(map[string]struct{}) + apply := func(event *filer_pb.SubscribeMetadataResponse) { + for _, invalidation := range metadataInvalidationsForEvent(event) { + if invalidation.entry == nil || !pathVisibleFromExport(invalidation.path, s.exportRoot) { + continue + } + for _, chunk := range invalidation.entry.GetChunks() { + fileID := chunk.GetFileIdString() + if fileID == "" { + continue + } + if _, seen := uncached[fileID]; seen { + continue + } + uncached[fileID] = struct{}{} + s.chunkInvalidator.UnCache(fileID) + } + } + } + + apply(resp) + for _, event := range resp.Events { + apply(event) + } +} + +func metadataInvalidationsForEvent(resp *filer_pb.SubscribeMetadataResponse) []metadataInvalidation { + message := resp.GetEventNotification() + if message == nil { + return nil + } + + var invalidations []metadataInvalidation + if message.OldEntry != nil && message.NewEntry != nil { + oldPath := util.NewFullPath(resp.Directory, message.OldEntry.Name) + invalidations = append(invalidations, metadataInvalidation{path: oldPath, entry: message.OldEntry}) + + newDir := resp.Directory + if message.NewParentPath != "" { + newDir = message.NewParentPath + } + if 
message.OldEntry.Name != message.NewEntry.Name || resp.Directory != newDir { + newPath := util.NewFullPath(newDir, message.NewEntry.Name) + invalidations = append(invalidations, metadataInvalidation{path: newPath, entry: message.NewEntry}) + } + return invalidations + } + + if message.NewEntry != nil { + newDir := resp.Directory + if message.NewParentPath != "" { + newDir = message.NewParentPath + } + newPath := util.NewFullPath(newDir, message.NewEntry.Name) + invalidations = append(invalidations, metadataInvalidation{path: newPath, entry: message.NewEntry}) + } + + if message.OldEntry != nil { + oldPath := util.NewFullPath(resp.Directory, message.OldEntry.Name) + invalidations = append(invalidations, metadataInvalidation{path: oldPath, entry: message.OldEntry}) + } + + return invalidations +} diff --git a/weed/server/nfs/server.go b/weed/server/nfs/server.go new file mode 100644 index 000000000..8cc710275 --- /dev/null +++ b/weed/server/nfs/server.go @@ -0,0 +1,177 @@ +package nfs + +import ( + "context" + "errors" + "fmt" + "net" + + "github.com/seaweedfs/seaweedfs/weed/filer" + "github.com/seaweedfs/seaweedfs/weed/glog" + "github.com/seaweedfs/seaweedfs/weed/pb" + "github.com/seaweedfs/seaweedfs/weed/pb/filer_pb" + "github.com/seaweedfs/seaweedfs/weed/util" + "github.com/seaweedfs/seaweedfs/weed/wdclient" + gonfs "github.com/willscott/go-nfs" + "google.golang.org/grpc" + "google.golang.org/grpc/credentials/insecure" +) + +type Option struct { + Filer pb.ServerAddress + BindIp string + Port int + FilerRootPath string + ReadOnly bool + AllowedClients []string + VolumeServerAccess string + GrpcDialOption grpc.DialOption +} + +type Server struct { + option *Option + exportRoot util.FullPath + exportID uint32 + signature int32 + handleLimit int + clientAuthorizer *clientAuthorizer + sharedReaderCache *filer.ReaderCache + chunkInvalidator chunkInvalidator + filerClient *wdclient.FilerClient + newUploader func() (chunkUploader, error) + withFilerClient 
filerClientExecutor + withInternalClient internalClientExecutor +} + +func NewServer(option *Option) (*Server, error) { + if option == nil { + return nil, errors.New("nfs option is required") + } + if option.Port <= 0 { + return nil, fmt.Errorf("nfs port must be positive: %d", option.Port) + } + if option.FilerRootPath == "" { + option.FilerRootPath = "/" + } + if option.VolumeServerAccess == "" { + option.VolumeServerAccess = "direct" + } + if option.GrpcDialOption == nil { + option.GrpcDialOption = grpc.WithTransportCredentials(insecure.NewCredentials()) + } + clientAuthorizer, err := newClientAuthorizer(option.AllowedClients) + if err != nil { + return nil, err + } + var filerClient *wdclient.FilerClient + if option.VolumeServerAccess != "filerProxy" { + var opts *wdclient.FilerClientOption + if option.VolumeServerAccess == "publicUrl" { + opts = &wdclient.FilerClientOption{UrlPreference: wdclient.PreferPublicUrl} + } + filerClient = wdclient.NewFilerClient([]pb.ServerAddress{option.Filer}, option.GrpcDialOption, "", opts) + } + exportRoot := normalizeExportRoot(util.FullPath(option.FilerRootPath)) + signature := util.RandomInt32() + return &Server{ + option: option, + exportRoot: exportRoot, + exportID: exportIDForRoot(exportRoot), + signature: signature, + handleLimit: 1 << 20, + clientAuthorizer: clientAuthorizer, + filerClient: filerClient, + newUploader: newChunkUploader, + withFilerClient: newFilerClientExecutor(option, signature), + withInternalClient: newInternalClientExecutor(option, signature), + }, nil +} + +func (s *Server) Start() error { + listener, err := net.Listen("tcp", fmt.Sprintf("%s:%d", s.option.BindIp, s.option.Port)) + if err != nil { + return fmt.Errorf("listen nfs on %s:%d: %w", s.option.BindIp, s.option.Port, err) + } + + return s.serve(listener) +} + +func (s *Server) serve(listener net.Listener) error { + if s.filerClient != nil { + defer s.filerClient.Close() + } + if s.clientAuthorizer != nil && s.clientAuthorizer.enabled { + 
listener = &allowlistListener{ + Listener: listener, + authorizer: s.clientAuthorizer, + } + } + + handler, err := s.newHandler() + if err != nil { + _ = listener.Close() + return err + } + followCtx, followCancel := context.WithCancel(context.Background()) + defer followCancel() + followDone := make(chan struct{}) + go func() { + defer close(followDone) + s.runMetadataInvalidationLoop(followCtx) + }() + defer func() { + followCancel() + <-followDone + }() + + glog.V(0).Infof("Start Seaweed NFS Server filer=%s bind=%s export=%s exportId=%d readOnly=%t allowedClients=%d volumeServerAccess=%s", + s.option.Filer, + listener.Addr(), + s.exportRoot, + s.exportID, + s.option.ReadOnly, + len(s.option.AllowedClients), + s.option.VolumeServerAccess, + ) + + return gonfs.Serve(listener, handler) +} + +func (s *Server) newHandler() (*Handler, error) { + if s == nil { + return nil, errors.New("nfs server is not configured") + } + rootFS := newSeaweedFileSystem(s, s.exportRoot, s.sharedReaderCache) + if s.sharedReaderCache == nil { + s.sharedReaderCache = rootFS.readerCache + } + if s.chunkInvalidator == nil { + s.chunkInvalidator = s.sharedReaderCache + } + return &Handler{ + server: s, + rootFS: rootFS, + }, nil +} + +func (s *Server) WithFilerClient(streamingMode bool, fn func(filer_pb.SeaweedFilerClient) error) error { + if s == nil || s.withFilerClient == nil { + return errors.New("nfs filer client is not configured") + } + return s.withFilerClient(streamingMode, fn) +} + +func (s *Server) LookupFn() wdclient.LookupFileIdFunctionType { + if s == nil { + return nil + } + if s.option != nil && s.option.VolumeServerAccess == "filerProxy" { + return func(ctx context.Context, fileID string) ([]string, error) { + return []string{fmt.Sprintf("http://%s/?proxyChunkId=%s", s.option.Filer.ToHttpAddress(), fileID)}, nil + } + } + if s.filerClient != nil { + return s.filerClient.GetLookupFileIdFunction() + } + return nil +} diff --git a/weed/server/nfs/server_test.go 
b/weed/server/nfs/server_test.go new file mode 100644 index 000000000..1e356172f --- /dev/null +++ b/weed/server/nfs/server_test.go @@ -0,0 +1,1014 @@ +package nfs + +import ( + "bytes" + "context" + "errors" + "io" + "net" + "net/http" + "net/http/httptest" + "os" + "strings" + "testing" + "time" + + billy "github.com/go-git/go-billy/v5" + "github.com/seaweedfs/seaweedfs/weed/filer" + "github.com/seaweedfs/seaweedfs/weed/operation" + "github.com/seaweedfs/seaweedfs/weed/pb" + "github.com/seaweedfs/seaweedfs/weed/pb/filer_pb" + "github.com/seaweedfs/seaweedfs/weed/util" + util_http "github.com/seaweedfs/seaweedfs/weed/util/http" + "github.com/stretchr/testify/assert" + "github.com/stretchr/testify/require" + gonfs "github.com/willscott/go-nfs" + gonfsfile "github.com/willscott/go-nfs/file" + "google.golang.org/grpc" + "google.golang.org/protobuf/proto" +) + +type fakeListEntriesClient struct { + responses []*filer_pb.ListEntriesResponse + index int +} + +type fakeSubscribeMetadataClient struct { + responses []*filer_pb.SubscribeMetadataResponse + index int + err error +} + +func (c *fakeListEntriesClient) Recv() (*filer_pb.ListEntriesResponse, error) { + if c.index >= len(c.responses) { + return nil, io.EOF + } + resp := c.responses[c.index] + c.index++ + return resp, nil +} + +func (c *fakeSubscribeMetadataClient) Recv() (*filer_pb.SubscribeMetadataResponse, error) { + if c.err != nil { + return nil, c.err + } + if c.index >= len(c.responses) { + return nil, io.EOF + } + resp := c.responses[c.index] + c.index++ + return resp, nil +} + +type fakeNFSFilerClient struct { + kv map[string][]byte + entries map[util.FullPath]*filer_pb.Entry + updateResult map[util.FullPath]*filer_pb.Entry + statistics *filer_pb.StatisticsResponse + creates []*filer_pb.CreateEntryRequest + updates []*filer_pb.UpdateEntryRequest + deletes []*filer_pb.DeleteEntryRequest + renames []*filer_pb.AtomicRenameEntryRequest + subscribeRequests []*filer_pb.SubscribeMetadataRequest + 
subscribeResponses []*filer_pb.SubscribeMetadataResponse + subscribeErr error + nextInode uint64 +} + +type fakeChunkUploadCall struct { + assignRequest *filer_pb.AssignVolumeRequest + uploadOption *operation.UploadOption + uploadURL string + data []byte +} + +type fakeChunkUploader struct { + fileID string + result *operation.UploadResult + err error + calls []fakeChunkUploadCall +} + +type recordingChunkInvalidator struct { + fileIDs []string +} + +type fakeRemoteConn struct { + remote net.Addr +} + +func (c *fakeRemoteConn) Read(_ []byte) (int, error) { return 0, io.EOF } +func (c *fakeRemoteConn) Write(p []byte) (int, error) { return len(p), nil } +func (c *fakeRemoteConn) Close() error { return nil } +func (c *fakeRemoteConn) LocalAddr() net.Addr { return &net.TCPAddr{} } +func (c *fakeRemoteConn) RemoteAddr() net.Addr { return c.remote } +func (c *fakeRemoteConn) SetDeadline(time.Time) error { return nil } +func (c *fakeRemoteConn) SetReadDeadline(time.Time) error { return nil } +func (c *fakeRemoteConn) SetWriteDeadline(time.Time) error { return nil } + +func (i *recordingChunkInvalidator) UnCache(fileID string) { + i.fileIDs = append(i.fileIDs, fileID) +} + +func (f *fakeNFSFilerClient) KvGet(_ context.Context, in *filer_pb.KvGetRequest, _ ...grpc.CallOption) (*filer_pb.KvGetResponse, error) { + if value, found := f.kv[string(in.Key)]; found { + return &filer_pb.KvGetResponse{Value: value}, nil + } + return &filer_pb.KvGetResponse{}, nil +} + +func (f *fakeNFSFilerClient) LookupDirectoryEntry(_ context.Context, in *filer_pb.LookupDirectoryEntryRequest, _ ...grpc.CallOption) (*filer_pb.LookupDirectoryEntryResponse, error) { + fullPath := util.NewFullPath(in.Directory, in.Name) + if entry := f.materializeEntry(fullPath); entry != nil { + return &filer_pb.LookupDirectoryEntryResponse{Entry: entry}, nil + } + return nil, filer_pb.ErrNotFound +} + +func (f *fakeNFSFilerClient) ListEntries(_ context.Context, in *filer_pb.ListEntriesRequest, _ ...grpc.CallOption) 
(nfsListEntriesClient, error) { + requestedDir := util.FullPath(in.Directory) + var entries []*filer_pb.Entry + for fullPath, entry := range f.entries { + dir, _ := fullPath.DirAndName() + if util.FullPath(dir) != requestedDir { + continue + } + if materialized := f.materializeEntry(fullPath); materialized != nil { + entries = append(entries, materialized) + } else { + entries = append(entries, cloneEntry(entry)) + } + } + responses := make([]*filer_pb.ListEntriesResponse, 0, len(entries)) + for _, entry := range entries { + responses = append(responses, &filer_pb.ListEntriesResponse{Entry: entry}) + } + return &fakeListEntriesClient{responses: responses}, nil +} + +func (f *fakeNFSFilerClient) SubscribeMetadata(_ context.Context, in *filer_pb.SubscribeMetadataRequest, _ ...grpc.CallOption) (nfsSubscribeMetadataClient, error) { + f.subscribeRequests = append(f.subscribeRequests, proto.Clone(in).(*filer_pb.SubscribeMetadataRequest)) + return &fakeSubscribeMetadataClient{ + responses: f.subscribeResponses, + err: f.subscribeErr, + }, nil +} + +func (f *fakeNFSFilerClient) CreateEntry(_ context.Context, in *filer_pb.CreateEntryRequest, _ ...grpc.CallOption) (*filer_pb.CreateEntryResponse, error) { + f.creates = append(f.creates, in) + + fullPath := util.NewFullPath(in.Directory, in.Entry.Name) + if _, found := f.entries[fullPath]; found { + return &filer_pb.CreateEntryResponse{ + Error: "entry already exists", + ErrorCode: filer_pb.FilerError_ENTRY_ALREADY_EXISTS, + }, nil + } + + entry := cloneEntry(in.Entry) + storedEntry := f.persistEntry(fullPath, entry, false) + return &filer_pb.CreateEntryResponse{ + MetadataEvent: &filer_pb.SubscribeMetadataResponse{ + EventNotification: &filer_pb.EventNotification{ + NewEntry: cloneEntry(storedEntry), + }, + }, + }, nil +} + +func (f *fakeNFSFilerClient) UpdateEntry(_ context.Context, in *filer_pb.UpdateEntryRequest, _ ...grpc.CallOption) (*filer_pb.UpdateEntryResponse, error) { + f.updates = append(f.updates, in) + + fullPath 
:= util.NewFullPath(in.Directory, in.Entry.Name) + updatedEntry := f.updateResult[fullPath] + if updatedEntry == nil { + updatedEntry = cloneEntry(in.Entry) + } + storedEntry := f.persistEntry(fullPath, updatedEntry, false) + + return &filer_pb.UpdateEntryResponse{ + MetadataEvent: &filer_pb.SubscribeMetadataResponse{ + EventNotification: &filer_pb.EventNotification{ + NewEntry: cloneEntry(storedEntry), + }, + }, + }, nil +} + +func (f *fakeNFSFilerClient) DeleteEntry(_ context.Context, in *filer_pb.DeleteEntryRequest, _ ...grpc.CallOption) (*filer_pb.DeleteEntryResponse, error) { + f.deletes = append(f.deletes, in) + + fullPath := util.NewFullPath(in.Directory, in.Name) + entry, found := f.entries[fullPath] + if !found { + return &filer_pb.DeleteEntryResponse{Error: filer_pb.ErrNotFound.Error()}, nil + } + + if len(entry.GetHardLinkId()) > 0 { + f.decrementHardLink(entry.GetHardLinkId()) + } + if inode := entry.GetAttributes().GetInode(); inode != 0 { + f.removeInodeIndexPath(fullPath, inode) + } + delete(f.entries, fullPath) + return &filer_pb.DeleteEntryResponse{}, nil +} + +func (f *fakeNFSFilerClient) AtomicRenameEntry(_ context.Context, in *filer_pb.AtomicRenameEntryRequest, _ ...grpc.CallOption) (*filer_pb.AtomicRenameEntryResponse, error) { + f.renames = append(f.renames, in) + + oldPath := util.NewFullPath(in.OldDirectory, in.OldName) + entry, found := f.entries[oldPath] + if !found { + return nil, filer_pb.ErrNotFound + } + delete(f.entries, oldPath) + if inode := entry.GetAttributes().GetInode(); inode != 0 { + f.removeInodeIndexPath(oldPath, inode) + } + + newPath := util.NewFullPath(in.NewDirectory, in.NewName) + renamed := cloneEntry(entry) + renamed.Name = in.NewName + renamed = f.persistEntry(newPath, renamed, true) + + return &filer_pb.AtomicRenameEntryResponse{}, nil +} + +func (f *fakeNFSFilerClient) Statistics(_ context.Context, _ *filer_pb.StatisticsRequest, _ ...grpc.CallOption) (*filer_pb.StatisticsResponse, error) { + return f.statistics, 
nil +} + +func (f *fakeNFSFilerClient) persistEntry(fullPath util.FullPath, entry *filer_pb.Entry, preserveZeroInode bool) *filer_pb.Entry { + if f.entries == nil { + f.entries = make(map[util.FullPath]*filer_pb.Entry) + } + if f.kv == nil { + f.kv = make(map[string][]byte) + } + + cloned := cloneEntry(entry) + if cloned.Attributes == nil { + cloned.Attributes = &filer_pb.FuseAttributes{} + } + if !preserveZeroInode && cloned.Attributes.Inode == 0 { + cloned.Attributes.Inode = f.allocateInode() + } + cloned.Name = fullPath.Name() + f.entries[fullPath] = cloned + + if cloned.Attributes.Inode != 0 { + f.addInodeIndexPath(fullPath, cloned.Attributes.Inode) + } + if len(cloned.GetHardLinkId()) > 0 { + f.storeHardLinkBlob(fullPath, cloned) + } + return cloned +} + +func (f *fakeNFSFilerClient) materializeEntry(fullPath util.FullPath) *filer_pb.Entry { + entry, found := f.entries[fullPath] + if !found || entry == nil { + return nil + } + cloned := cloneEntry(entry) + if len(cloned.GetHardLinkId()) == 0 { + return cloned + } + + value, found := f.kv[string(cloned.GetHardLinkId())] + if !found { + return cloned + } + + dir, _ := fullPath.DirAndName() + fsEntry := filer.FromPbEntry(dir, cloned) + if err := fsEntry.DecodeAttributesAndChunks(value); err != nil { + return cloned + } + fsEntry.FullPath = fullPath + return fsEntry.ToProtoEntry() +} + +func (f *fakeNFSFilerClient) addInodeIndexPath(fullPath util.FullPath, inode uint64) { + if inode == 0 { + return + } + + record := &filer.InodeIndexRecord{Generation: filer.InodeIndexInitialGeneration} + if value, found := f.kv[string(filer.InodeIndexKey(inode))]; found { + if decoded, err := filer.DecodeInodeIndexRecord(value); err == nil { + record = decoded + } + } + record.Paths = append(record.Paths, string(fullPath)) + value, err := record.Encode() + if err == nil { + f.kv[string(filer.InodeIndexKey(inode))] = value + } +} + +func (f *fakeNFSFilerClient) removeInodeIndexPath(fullPath util.FullPath, inode uint64) { + if inode 
== 0 { + return + } + + key := string(filer.InodeIndexKey(inode)) + value, found := f.kv[key] + if !found { + return + } + record, err := filer.DecodeInodeIndexRecord(value) + if err != nil { + delete(f.kv, key) + return + } + var kept []string + for _, path := range record.Paths { + if util.FullPath(path) != fullPath { + kept = append(kept, path) + } + } + record.Paths = kept + if len(record.Paths) == 0 { + delete(f.kv, key) + return + } + value, err = record.Encode() + if err == nil { + f.kv[key] = value + } +} + +func (f *fakeNFSFilerClient) storeHardLinkBlob(fullPath util.FullPath, entry *filer_pb.Entry) { + dir, _ := fullPath.DirAndName() + fsEntry := filer.FromPbEntry(dir, cloneEntry(entry)) + fsEntry.FullPath = fullPath + value, err := fsEntry.EncodeAttributesAndChunks() + if err == nil { + f.kv[string(entry.GetHardLinkId())] = value + } +} + +func (f *fakeNFSFilerClient) decrementHardLink(hardLinkID []byte) { + value, found := f.kv[string(hardLinkID)] + if !found { + return + } + + fsEntry := &filer.Entry{} + if err := fsEntry.DecodeAttributesAndChunks(value); err != nil { + return + } + fsEntry.HardLinkCounter-- + if fsEntry.HardLinkCounter <= 0 { + delete(f.kv, string(hardLinkID)) + return + } + value, err := fsEntry.EncodeAttributesAndChunks() + if err == nil { + f.kv[string(hardLinkID)] = value + } +} + +func (f *fakeNFSFilerClient) allocateInode() uint64 { + if f.nextInode == 0 { + f.nextInode = 1000 + } + f.nextInode++ + return f.nextInode +} + +func (u *fakeChunkUploader) UploadWithRetry(_ filer_pb.FilerClient, assignRequest *filer_pb.AssignVolumeRequest, uploadOption *operation.UploadOption, genFileUrlFn func(host, fileId string) string, reader io.Reader) (string, *operation.UploadResult, error, []byte) { + data, err := io.ReadAll(reader) + if err != nil { + return "", nil, err, nil + } + + fileID := u.fileID + if fileID == "" { + fileID = "7,abc" + } + result := u.result + if result == nil { + result = &operation.UploadResult{ + Size: 
uint32(len(data)), + ContentMd5: "etag", + } + } + + var assignClone *filer_pb.AssignVolumeRequest + if assignRequest != nil { + assignClone, _ = proto.Clone(assignRequest).(*filer_pb.AssignVolumeRequest) + } + var optionClone *operation.UploadOption + if uploadOption != nil { + copied := *uploadOption + optionClone = &copied + } + + u.calls = append(u.calls, fakeChunkUploadCall{ + assignRequest: assignClone, + uploadOption: optionClone, + uploadURL: genFileUrlFn("volume.example:8080", fileID), + data: bytes.Clone(data), + }) + return fileID, result, u.err, data +} + +func cloneEntry(entry *filer_pb.Entry) *filer_pb.Entry { + if entry == nil { + return nil + } + cloned, _ := proto.Clone(entry).(*filer_pb.Entry) + return cloned +} + +func testEntry(name string, isDirectory bool, inode uint64, mode uint32, content []byte) *filer_pb.Entry { + return &filer_pb.Entry{ + Name: name, + IsDirectory: isDirectory, + Content: content, + Attributes: &filer_pb.FuseAttributes{ + Inode: inode, + FileMode: mode, + FileSize: uint64(len(content)), + }, + } +} + +func testIndexRecord(t *testing.T, inode uint64, generation uint64, path util.FullPath) []byte { + t.Helper() + record := &filer.InodeIndexRecord{ + Generation: generation, + Paths: []string{string(path)}, + } + value, err := record.Encode() + require.NoError(t, err) + return value +} + +func newTestServer(t *testing.T, exportRoot string, client *fakeNFSFilerClient) *Server { + t.Helper() + + server, err := NewServer(&Option{ + Filer: pb.ServerAddress("test-filer:8888"), + FilerRootPath: exportRoot, + Port: 2049, + }) + require.NoError(t, err) + + server.withInternalClient = func(_ bool, fn func(nfsFilerClient) error) error { + return fn(client) + } + server.withFilerClient = func(_ bool, fn func(filer_pb.SeaweedFilerClient) error) error { + return errors.New("test does not provide full filer client") + } + + return server +} + +func TestNewServerRejectsInvalidAllowedClientCIDR(t *testing.T) { + _, err := NewServer(&Option{ 
+ FilerRootPath: "/exports", + Port: 2049, + AllowedClients: []string{"10.0.0.0/not-a-cidr"}, + }) + require.Error(t, err) +} + +func TestHandlerMountAndFileHandleRoundTrip(t *testing.T) { + client := &fakeNFSFilerClient{ + kv: map[string][]byte{ + string(filer.InodeIndexKey(101)): testIndexRecord(t, 101, 5, "/exports"), + string(filer.InodeIndexKey(202)): testIndexRecord(t, 202, 9, "/exports/demo.txt"), + }, + entries: map[util.FullPath]*filer_pb.Entry{ + "/exports": testEntry("exports", true, 101, uint32(0755), nil), + "/exports/demo.txt": testEntry("demo.txt", false, 202, uint32(0644), []byte("hello")), + }, + } + + server := newTestServer(t, "/exports", client) + handler, err := server.newHandler() + require.NoError(t, err) + + status, filesystem, authFlavors := handler.Mount(context.Background(), nil, gonfs.MountRequest{Dirpath: []byte("/exports")}) + require.Equal(t, gonfs.MountStatusOk, status) + require.NotNil(t, filesystem) + assert.Equal(t, []gonfs.AuthFlavor{gonfs.AuthFlavorNull, gonfs.AuthFlavorUnix}, authFlavors) + + handle := handler.ToHandle(filesystem, []string{"demo.txt"}) + require.NotEmpty(t, handle) + + resolvedFS, path, err := handler.FromHandle(handle) + require.NoError(t, err) + assert.Same(t, handler.rootFS, resolvedFS) + assert.Equal(t, []string{"demo.txt"}, path) +} + +func TestHandlerRejectsUnexpectedMountPath(t *testing.T) { + client := &fakeNFSFilerClient{ + entries: map[util.FullPath]*filer_pb.Entry{ + "/exports": testEntry("exports", true, 101, uint32(0755), nil), + }, + kv: map[string][]byte{ + string(filer.InodeIndexKey(101)): testIndexRecord(t, 101, 1, "/exports"), + }, + } + + server := newTestServer(t, "/exports", client) + handler, err := server.newHandler() + require.NoError(t, err) + + status, filesystem, _ := handler.Mount(context.Background(), nil, gonfs.MountRequest{Dirpath: []byte("/wrong")}) + assert.Equal(t, gonfs.MountStatusErrNoEnt, status) + assert.Nil(t, filesystem) +} + +func 
TestHandlerRejectsMountFromUnauthorizedClient(t *testing.T) { + client := &fakeNFSFilerClient{ + entries: map[util.FullPath]*filer_pb.Entry{ + "/exports": testEntry("exports", true, 101, uint32(0755), nil), + }, + kv: map[string][]byte{ + string(filer.InodeIndexKey(101)): testIndexRecord(t, 101, 1, "/exports"), + }, + } + + server := newTestServer(t, "/exports", client) + server.option.AllowedClients = []string{"10.0.0.0/8"} + authorizer, err := newClientAuthorizer(server.option.AllowedClients) + require.NoError(t, err) + server.clientAuthorizer = authorizer + + handler, err := server.newHandler() + require.NoError(t, err) + + req := gonfs.MountRequest{Dirpath: []byte("/exports")} + + deniedConn := &fakeRemoteConn{remote: &net.TCPAddr{IP: net.ParseIP("127.0.0.1"), Port: 12345}} + status, filesystem, _ := handler.Mount(context.Background(), deniedConn, req) + assert.Equal(t, gonfs.MountStatusErrAcces, status) + assert.Nil(t, filesystem) + + allowedConn := &fakeRemoteConn{remote: &net.TCPAddr{IP: net.ParseIP("10.2.3.4"), Port: 12345}} + status, filesystem, _ = handler.Mount(context.Background(), allowedConn, req) + assert.Equal(t, gonfs.MountStatusOk, status) + assert.NotNil(t, filesystem) +} + +func TestSeaweedFileSystemReadOnlyDisablesMutations(t *testing.T) { + client := &fakeNFSFilerClient{ + kv: map[string][]byte{ + string(filer.InodeIndexKey(101)): testIndexRecord(t, 101, 1, "/exports"), + }, + entries: map[util.FullPath]*filer_pb.Entry{ + "/exports": testEntry("exports", true, 101, uint32(0755), nil), + }, + } + + server := newTestServer(t, "/exports", client) + server.option.ReadOnly = true + + handler, err := server.newHandler() + require.NoError(t, err) + + assert.False(t, billy.CapabilityCheck(handler.rootFS, billy.WriteCapability)) + assert.False(t, billy.CapabilityCheck(handler.rootFS, billy.TruncateCapability)) + assert.Nil(t, handler.Change(handler.rootFS)) + + _, err = handler.rootFS.OpenFile("/new.txt", os.O_CREATE|os.O_RDWR, 0o644) + 
require.ErrorIs(t, err, billy.ErrReadOnly) + + err = handler.rootFS.MkdirAll("/docs", 0o755) + require.ErrorIs(t, err, billy.ErrReadOnly) +} + +func TestSeaweedFileSystemStatAndOpenFollowSymlinks(t *testing.T) { + client := &fakeNFSFilerClient{ + kv: map[string][]byte{ + string(filer.InodeIndexKey(101)): testIndexRecord(t, 101, 1, "/exports"), + string(filer.InodeIndexKey(202)): testIndexRecord(t, 202, 2, "/exports/target.txt"), + string(filer.InodeIndexKey(303)): testIndexRecord(t, 303, 3, "/exports/link.txt"), + }, + entries: map[util.FullPath]*filer_pb.Entry{ + "/exports": testEntry("exports", true, 101, uint32(0755), nil), + "/exports/target.txt": testEntry("target.txt", false, 202, uint32(0644), []byte("hello")), + "/exports/link.txt": { + Name: "link.txt", + Attributes: &filer_pb.FuseAttributes{ + Inode: 303, + FileMode: uint32(0o777), + SymlinkTarget: "target.txt", + }, + }, + }, + } + + server := newTestServer(t, "/exports", client) + handler, err := server.newHandler() + require.NoError(t, err) + + linkInfo, err := handler.rootFS.Lstat("/link.txt") + require.NoError(t, err) + assert.NotZero(t, linkInfo.Mode()&os.ModeSymlink) + + targetInfo, err := handler.rootFS.Stat("/link.txt") + require.NoError(t, err) + assert.Zero(t, targetInfo.Mode()&os.ModeSymlink) + assert.Equal(t, int64(5), targetInfo.Size()) + + file, err := handler.rootFS.Open("/link.txt") + require.NoError(t, err) + defer file.Close() + + data, err := io.ReadAll(file) + require.NoError(t, err) + assert.Equal(t, "hello", string(data)) +} + +func TestSeaweedFileSystemAppendModeAndUnsupportedLocks(t *testing.T) { + client := &fakeNFSFilerClient{ + kv: map[string][]byte{ + string(filer.InodeIndexKey(101)): testIndexRecord(t, 101, 1, "/exports"), + string(filer.InodeIndexKey(202)): testIndexRecord(t, 202, 2, "/exports/demo.txt"), + }, + entries: map[util.FullPath]*filer_pb.Entry{ + "/exports": testEntry("exports", true, 101, uint32(0755), nil), + "/exports/demo.txt": testEntry("demo.txt", false, 
202, uint32(0644), []byte("hello")), + }, + } + + server := newTestServer(t, "/exports", client) + handler, err := server.newHandler() + require.NoError(t, err) + + assert.False(t, billy.CapabilityCheck(handler.rootFS, billy.LockCapability)) + + file, err := handler.rootFS.OpenFile("/demo.txt", os.O_WRONLY|os.O_APPEND, 0) + require.NoError(t, err) + + // POSIX allows Seek on an O_APPEND fd — it only restricts Write. A + // Seek to the beginning should succeed, but the subsequent Write must + // still land at the end of file. + newOffset, err := file.Seek(0, io.SeekStart) + require.NoError(t, err) + require.Equal(t, int64(0), newOffset) + require.ErrorIs(t, file.Lock(), billy.ErrNotSupported) + require.ErrorIs(t, file.Unlock(), billy.ErrNotSupported) + + _, err = file.Write([]byte("!")) + require.NoError(t, err) + require.NoError(t, file.Close()) + + updated, err := handler.rootFS.Open("/demo.txt") + require.NoError(t, err) + defer updated.Close() + + data, err := io.ReadAll(updated) + require.NoError(t, err) + assert.Equal(t, "hello!", string(data)) +} + +func TestServerApplyMetadataInvalidationResponseUncachesExportChunks(t *testing.T) { + invalidator := &recordingChunkInvalidator{} + server := &Server{ + exportRoot: "/exports", + chunkInvalidator: invalidator, + } + + server.applyMetadataInvalidationResponse(&filer_pb.SubscribeMetadataResponse{ + Directory: "/exports", + EventNotification: &filer_pb.EventNotification{ + OldEntry: &filer_pb.Entry{ + Name: "old.txt", + Chunks: []*filer_pb.FileChunk{ + {FileId: "1,old"}, + }, + }, + NewEntry: &filer_pb.Entry{ + Name: "new.txt", + Chunks: []*filer_pb.FileChunk{ + {FileId: "2,new"}, + {FileId: "1,old"}, + }, + }, + NewParentPath: "/exports/renamed", + }, + Events: []*filer_pb.SubscribeMetadataResponse{ + { + Directory: "/outside", + EventNotification: &filer_pb.EventNotification{ + NewEntry: &filer_pb.Entry{ + Name: "skip.txt", + Chunks: []*filer_pb.FileChunk{ + {FileId: "9,skip"}, + }, + }, + }, + }, + { + Directory: 
"/exports", + EventNotification: &filer_pb.EventNotification{ + NewEntry: &filer_pb.Entry{ + Name: "nested.txt", + Chunks: []*filer_pb.FileChunk{ + {FileId: "3,nested"}, + }, + }, + }, + }, + }, + }) + + assert.Equal(t, []string{"1,old", "2,new", "3,nested"}, invalidator.fileIDs) +} + +func TestServerFollowMetadataStreamSubscribesAndInvalidates(t *testing.T) { + client := &fakeNFSFilerClient{ + subscribeResponses: []*filer_pb.SubscribeMetadataResponse{ + { + Directory: "/exports", + EventNotification: &filer_pb.EventNotification{ + NewEntry: &filer_pb.Entry{ + Name: "demo.txt", + Chunks: []*filer_pb.FileChunk{ + {FileId: "7,abc"}, + }, + }, + }, + }, + }, + } + invalidator := &recordingChunkInvalidator{} + + server := newTestServer(t, "/exports", client) + server.chunkInvalidator = invalidator + + err := server.followMetadataStream(context.Background()) + require.NoError(t, err) + require.Len(t, client.subscribeRequests, 1) + assert.Equal(t, "/exports", client.subscribeRequests[0].GetPathPrefix()) + assert.Equal(t, "nfs", client.subscribeRequests[0].GetClientName()) + assert.Equal(t, []string{"7,abc"}, invalidator.fileIDs) +} + +func TestSeaweedFileSystemBackfillsLegacyInodeOnStat(t *testing.T) { + client := &fakeNFSFilerClient{ + kv: map[string][]byte{ + string(filer.InodeIndexKey(101)): testIndexRecord(t, 101, 1, "/exports"), + string(filer.InodeIndexKey(303)): testIndexRecord(t, 303, 7, "/exports/legacy.txt"), + }, + entries: map[util.FullPath]*filer_pb.Entry{ + "/exports": testEntry("exports", true, 101, uint32(0755), nil), + "/exports/legacy.txt": testEntry("legacy.txt", false, 0, uint32(0644), []byte("abc")), + }, + updateResult: map[util.FullPath]*filer_pb.Entry{ + "/exports/legacy.txt": testEntry("legacy.txt", false, 303, uint32(0644), []byte("abc")), + }, + } + + server := newTestServer(t, "/exports", client) + handler, err := server.newHandler() + require.NoError(t, err) + + info, err := handler.rootFS.Lstat("/legacy.txt") + require.NoError(t, err) + 
	require.Len(t, client.updates, 1)
+	assert.Equal(t, int64(3), info.Size())
+
+	nfsInfo, ok := info.Sys().(*gonfsfile.FileInfo)
+	require.True(t, ok)
+	assert.Equal(t, uint64(303), nfsInfo.Fileid)
+}
+
+func TestSeaweedFileSystemReadsInlineContent(t *testing.T) {
+	client := &fakeNFSFilerClient{
+		kv: map[string][]byte{
+			string(filer.InodeIndexKey(101)): testIndexRecord(t, 101, 1, "/exports"),
+			string(filer.InodeIndexKey(202)): testIndexRecord(t, 202, 3, "/exports/demo.txt"),
+		},
+		entries: map[util.FullPath]*filer_pb.Entry{
+			"/exports":          testEntry("exports", true, 101, uint32(0755), nil),
+			"/exports/demo.txt": testEntry("demo.txt", false, 202, uint32(0644), []byte("hello")),
+		},
+	}
+
+	server := newTestServer(t, "/exports", client)
+	handler, err := server.newHandler()
+	require.NoError(t, err)
+
+	file, err := handler.rootFS.Open("/demo.txt")
+	require.NoError(t, err)
+	defer file.Close()
+
+	buf := make([]byte, 5)
+	n, err := file.Read(buf)
+	require.NoError(t, err)
+	assert.Equal(t, 5, n)
+	assert.Equal(t, "hello", string(buf))
+}
+
+func TestSeaweedFileSystemReadsChunkThroughFilerProxy(t *testing.T) {
+	initIntegrationHTTPClient.Do(util_http.InitGlobalHttpClient)
+
+	payload := []byte("hello via filer proxy")
+	proxyRequests := 0
+	proxyServer := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		proxyRequests++
+		assert.Equal(t, "7,proxy", r.URL.Query().Get("proxyChunkId"))
+		_, _ = w.Write(payload)
+	}))
+	defer proxyServer.Close()
+
+	client := &fakeNFSFilerClient{
+		kv: map[string][]byte{
+			string(filer.InodeIndexKey(101)): testIndexRecord(t, 101, 1, "/exports"),
+			string(filer.InodeIndexKey(202)): testIndexRecord(t, 202, 3, "/exports/proxy.txt"),
+		},
+		entries: map[util.FullPath]*filer_pb.Entry{
+			"/exports": {
+				Name:        "exports",
+				IsDirectory: true,
+				Attributes: &filer_pb.FuseAttributes{
+					Inode:    101,
+					FileMode: uint32(0755),
+				},
+			},
+			"/exports/proxy.txt": {
+				Name: "proxy.txt",
+				Chunks: []*filer_pb.FileChunk{
+					{FileId: "7,proxy", Size: uint64(len(payload))},
+				},
+				Attributes: &filer_pb.FuseAttributes{
+					Inode:    202,
+					FileMode: uint32(0644),
+					FileSize: uint64(len(payload)),
+				},
+			},
+		},
+	}
+
+	server := newTestServer(t, "/exports", client)
+	server.option.VolumeServerAccess = "filerProxy"
+	server.option.Filer = pb.ServerAddress(strings.TrimPrefix(proxyServer.URL, "http://"))
+
+	handler, err := server.newHandler()
+	require.NoError(t, err)
+
+	file, err := handler.rootFS.Open("/proxy.txt")
+	require.NoError(t, err)
+	defer file.Close()
+
+	data, err := io.ReadAll(file)
+	require.NoError(t, err)
+	assert.Equal(t, payload, data)
+	assert.Equal(t, 1, proxyRequests)
+}
+
+func TestSeaweedFileSystemReadDirAndFSStat(t *testing.T) {
+	client := &fakeNFSFilerClient{
+		kv: map[string][]byte{
+			string(filer.InodeIndexKey(101)): testIndexRecord(t, 101, 1, "/exports"),
+			string(filer.InodeIndexKey(202)): testIndexRecord(t, 202, 2, "/exports/b.txt"),
+			string(filer.InodeIndexKey(303)): testIndexRecord(t, 303, 3, "/exports/a.txt"),
+		},
+		entries: map[util.FullPath]*filer_pb.Entry{
+			"/exports":       testEntry("exports", true, 101, uint32(0755), nil),
+			"/exports/b.txt": testEntry("b.txt", false, 202, uint32(0644), []byte("b")),
+			"/exports/a.txt": testEntry("a.txt", false, 303, uint32(0644), []byte("aa")),
+		},
+		statistics: &filer_pb.StatisticsResponse{
+			TotalSize: 100,
+			UsedSize:  40,
+			FileCount: 3,
+		},
+	}
+
+	server := newTestServer(t, "/exports", client)
+	handler, err := server.newHandler()
+	require.NoError(t, err)
+
+	entries, err := handler.rootFS.ReadDir("/")
+	require.NoError(t, err)
+	require.Len(t, entries, 2)
+	assert.Equal(t, "a.txt", entries[0].Name())
+	assert.Equal(t, "b.txt", entries[1].Name())
+
+	var stat gonfs.FSStat
+	err = handler.FSStat(context.Background(), handler.rootFS, &stat)
+	require.NoError(t, err)
+	assert.Equal(t, uint64(100), stat.TotalSize)
+	assert.Equal(t, uint64(60), stat.FreeSize)
+	assert.Equal(t, uint64(60), stat.AvailableSize)
+	assert.Equal(t, uint64(3), stat.TotalFiles)
+}
+
+func TestSeaweedFileSystemSupportsNamespaceMutations(t *testing.T) {
+	client := &fakeNFSFilerClient{
+		kv: map[string][]byte{
+			string(filer.InodeIndexKey(101)): testIndexRecord(t, 101, 1, "/exports"),
+		},
+		entries: map[util.FullPath]*filer_pb.Entry{
+			"/exports": testEntry("exports", true, 101, uint32(0755), nil),
+		},
+	}
+
+	server := newTestServer(t, "/exports", client)
+	handler, err := server.newHandler()
+	require.NoError(t, err)
+
+	err = handler.rootFS.MkdirAll("/docs", 0o755)
+	require.NoError(t, err)
+
+	file, err := handler.rootFS.Create("/docs/note.txt")
+	require.NoError(t, err)
+	_, err = file.Write([]byte("hello"))
+	require.NoError(t, err)
+	require.NoError(t, file.Close())
+
+	err = handler.rootFS.Chmod("/docs/note.txt", 0o600)
+	require.NoError(t, err)
+
+	err = handler.rootFS.Rename("/docs/note.txt", "/docs/final.txt")
+	require.NoError(t, err)
+
+	truncateFile, err := handler.rootFS.OpenFile("/docs/final.txt", os.O_WRONLY|os.O_EXCL, 0)
+	require.NoError(t, err)
+	require.NoError(t, truncateFile.Truncate(2))
+	require.NoError(t, truncateFile.Close())
+
+	readFile, err := handler.rootFS.Open("/docs/final.txt")
+	require.NoError(t, err)
+	defer readFile.Close()
+
+	buf := make([]byte, 2)
+	n, err := readFile.Read(buf)
+	require.NoError(t, err)
+	assert.Equal(t, 2, n)
+	assert.Equal(t, "he", string(buf))
+
+	info, err := handler.rootFS.Stat("/docs/final.txt")
+	require.NoError(t, err)
+	assert.Equal(t, os.FileMode(0o600), info.Mode().Perm())
+	assert.Equal(t, int64(2), info.Size())
+
+	err = handler.rootFS.Remove("/docs/final.txt")
+	require.NoError(t, err)
+	_, err = handler.rootFS.Stat("/docs/final.txt")
+	require.ErrorIs(t, err, os.ErrNotExist)
+
+	require.Len(t, client.creates, 2)
+	require.Len(t, client.updates, 3)
+	require.Len(t, client.renames, 1)
+	require.Len(t, client.deletes, 1)
+}
+
+func TestSeaweedFileSystemUploadsLargeWritesAsChunks(t *testing.T) {
+	client := &fakeNFSFilerClient{
+		kv: map[string][]byte{
+			string(filer.InodeIndexKey(101)): testIndexRecord(t, 101, 1, "/exports"),
+		},
+		entries: map[util.FullPath]*filer_pb.Entry{
+			"/exports": testEntry("exports", true, 101, uint32(0755), nil),
+		},
+	}
+	uploader := &fakeChunkUploader{fileID: "9,xyz"}
+
+	server := newTestServer(t, "/exports", client)
+	server.option.VolumeServerAccess = "filerProxy"
+	server.newUploader = func() (chunkUploader, error) {
+		return uploader, nil
+	}
+
+	handler, err := server.newHandler()
+	require.NoError(t, err)
+
+	require.NoError(t, handler.rootFS.MkdirAll("/docs", 0o755))
+
+	file, err := handler.rootFS.Create("/docs/big.bin")
+	require.NoError(t, err)
+
+	payload := bytes.Repeat([]byte("a"), maxInlineWriteSize+1)
+	n, err := file.Write(payload)
+	require.NoError(t, err)
+	require.Equal(t, len(payload), n)
+	require.NoError(t, file.Close())
+
+	require.Len(t, uploader.calls, 1)
+	call := uploader.calls[0]
+	require.NotNil(t, call.assignRequest)
+	require.NotNil(t, call.uploadOption)
+	assert.Equal(t, "/exports/docs/big.bin", call.assignRequest.Path)
+	assert.Equal(t, "big.bin", call.uploadOption.Filename)
+	assert.Equal(t, "http://test-filer:8888/?proxyChunkId=9,xyz", call.uploadURL)
+	assert.Equal(t, payload, call.data)
+
+	entry := client.entries["/exports/docs/big.bin"]
+	require.NotNil(t, entry)
+	require.Len(t, entry.GetChunks(), 1)
+	assert.Nil(t, entry.Content)
+	assert.Equal(t, uint64(len(payload)), entry.GetAttributes().GetFileSize())
+	assert.Equal(t, "9,xyz", entry.GetChunks()[0].GetFileId())
+}
diff --git a/weed/server/nfs/uploader.go b/weed/server/nfs/uploader.go
new file mode 100644
index 000000000..7d309499e
--- /dev/null
+++ b/weed/server/nfs/uploader.go
@@ -0,0 +1,40 @@
+package nfs
+
+import (
+	"io"
+
+	"github.com/seaweedfs/seaweedfs/weed/operation"
+	"github.com/seaweedfs/seaweedfs/weed/pb/filer_pb"
+)
+
+type chunkUploader interface {
+	UploadWithRetry(
+		filerClient filer_pb.FilerClient,
+		assignRequest *filer_pb.AssignVolumeRequest,
+		uploadOption *operation.UploadOption,
+		genFileUrlFn func(host, fileId string) string,
+		reader io.Reader,
+	) (fileId string, uploadResult *operation.UploadResult, err error, data []byte)
+}
+
+type operationChunkUploader struct {
+	uploader *operation.Uploader
+}
+
+func (u operationChunkUploader) UploadWithRetry(
+	filerClient filer_pb.FilerClient,
+	assignRequest *filer_pb.AssignVolumeRequest,
+	uploadOption *operation.UploadOption,
+	genFileUrlFn func(host, fileId string) string,
+	reader io.Reader,
+) (string, *operation.UploadResult, error, []byte) {
+	return u.uploader.UploadWithRetry(filerClient, assignRequest, uploadOption, genFileUrlFn, reader)
+}
+
+func newChunkUploader() (chunkUploader, error) {
+	uploader, err := operation.NewUploader()
+	if err != nil {
+		return nil, err
+	}
+	return operationChunkUploader{uploader: uploader}, nil
+}
diff --git a/weed/server/webdav_server.go b/weed/server/webdav_server.go
index d1f51152b..700441883 100644
--- a/weed/server/webdav_server.go
+++ b/weed/server/webdav_server.go
@@ -15,7 +15,6 @@ import (
 	"golang.org/x/net/webdav"
 	"google.golang.org/grpc"
 
-	"github.com/seaweedfs/seaweedfs/weed/operation"
 	"github.com/seaweedfs/seaweedfs/weed/pb"
 	"github.com/seaweedfs/seaweedfs/weed/pb/filer_pb"
 	"github.com/seaweedfs/seaweedfs/weed/util"
@@ -404,43 +403,26 @@ func (fs *WebDavFileSystem) Stat(ctx context.Context, name string) (os.FileInfo,
 }
 
 func (f *WebDavFile) saveDataAsChunk(reader io.Reader, name string, offset int64, tsNs int64) (chunk *filer_pb.FileChunk, err error) {
-	uploader, uploaderErr := operation.NewUploader()
-	if uploaderErr != nil {
-		glog.V(0).Infof("upload data %v: %v", f.name, uploaderErr)
-		return nil, fmt.Errorf("upload data: %w", uploaderErr)
+	// Delegate to the shared filer-gateway helper so WebDAV, NFS, and
+	// any future filer-backed protocols go through one implementation of
+	// AssignVolume + volume-server upload.
+	chunk, err = filer.SaveGatewayDataAsChunk(filer.GatewayChunkUploadRequest{
+		FilerClient: f.fs,
+		Reader:      reader,
+		FullPath:    name,
+		Filename:    f.name,
+		Offset:      offset,
+		TsNs:        tsNs,
+		Collection:  f.fs.option.Collection,
+		Replication: f.fs.option.Replication,
+		DiskType:    f.fs.option.DiskType,
+		Cipher:      f.fs.option.Cipher,
+	})
+	if err != nil {
+		glog.V(0).Infof("upload data %v: %v", f.name, err)
+		return nil, err
 	}
-
-	fileId, uploadResult, flushErr, _ := uploader.UploadWithRetry(
-		f.fs,
-		&filer_pb.AssignVolumeRequest{
-			Count:       1,
-			Replication: f.fs.option.Replication,
-			Collection:  f.fs.option.Collection,
-			DiskType:    f.fs.option.DiskType,
-			Path:        name,
-		},
-		&operation.UploadOption{
-			Filename:          f.name,
-			Cipher:            f.fs.option.Cipher,
-			IsInputCompressed: false,
-			MimeType:          "",
-			PairMap:           nil,
-		},
-		func(host, fileId string) string {
-			return fmt.Sprintf("http://%s/%s", host, fileId)
-		},
-		reader,
-	)
-
-	if flushErr != nil {
-		glog.V(0).Infof("upload data %v: %v", f.name, flushErr)
-		return nil, fmt.Errorf("upload data: %w", flushErr)
-	}
-	if uploadResult.Error != "" {
-		glog.V(0).Infof("upload failure %v: %v", f.name, flushErr)
-		return nil, fmt.Errorf("upload result: %v", uploadResult.Error)
-	}
-	return uploadResult.ToPbFileChunk(fileId, offset, tsNs), nil
+	return chunk, nil
 }
 
 func (f *WebDavFile) Write(buf []byte) (int, error) {
diff --git a/weed/util/http/http_global_client_util.go b/weed/util/http/http_global_client_util.go
index 1c8af2225..24e38f8c7 100644
--- a/weed/util/http/http_global_client_util.go
+++ b/weed/util/http/http_global_client_util.go
@@ -33,6 +33,28 @@ var (
 	loadJwtConfigOnce sync.Once
 )
 
+func AppendQueryParameter(rawURL, key, value string) string {
+	encoded := url.Values{key: []string{value}}.Encode()
+	fragment := ""
+	if fragmentIndex := strings.Index(rawURL, "#"); fragmentIndex >= 0 {
+		fragment = rawURL[fragmentIndex:]
+		rawURL = rawURL[:fragmentIndex]
+	}
+
+	var result string
+	switch {
+	case strings.Contains(rawURL, "?"):
+		if strings.HasSuffix(rawURL, "?") || strings.HasSuffix(rawURL, "&") {
+			result = rawURL + encoded
+		} else {
+			result = rawURL + "&" + encoded
+		}
+	default:
+		result = rawURL + "?" + encoded
+	}
+	return result + fragment
+}
+
 func loadJwtConfig() {
 	v := util.GetViper()
 	jwtSigningReadKey = security.SigningKey(v.GetString("jwt.signing.read.key"))
@@ -545,7 +567,7 @@ func RetriedFetchChunkData(ctx context.Context, buffer []byte, urlStrings []stri
 			if strings.Contains(urlString, "%") {
 				urlString = url.PathEscape(urlString)
 			}
-			shouldRetry, err = ReadUrlAsStream(ctx, urlString+"?readDeleted=true", string(jwt), cipherKey, isGzipped, isFullChunk, offset, len(buffer), func(data []byte) {
+			shouldRetry, err = ReadUrlAsStream(ctx, AppendQueryParameter(urlString, "readDeleted", "true"), string(jwt), cipherKey, isGzipped, isFullChunk, offset, len(buffer), func(data []byte) {
 				// Check for context cancellation during data processing
 				select {
 				case <-ctx.Done():
@@ -608,7 +630,7 @@ func retriedFetchChunkDataDirect(ctx context.Context, buffer []byte, urlStrings
 			default:
 			}
 
-			n, shouldRetry, err = readUrlDirectToBuffer(ctx, urlString+"?readDeleted=true", jwt, buffer)
+			n, shouldRetry, err = readUrlDirectToBuffer(ctx, AppendQueryParameter(urlString, "readDeleted", "true"), jwt, buffer)
 			if err == nil {
 				return n, nil
 			}
diff --git a/weed/util/http/http_global_client_util_test.go b/weed/util/http/http_global_client_util_test.go
new file mode 100644
index 000000000..f24bd5aca
--- /dev/null
+++ b/weed/util/http/http_global_client_util_test.go
@@ -0,0 +1,72 @@
+package http
+
+import "testing"
+
+func TestAppendQueryParameter(t *testing.T) {
+	testCases := []struct {
+		name     string
+		rawURL   string
+		key      string
+		value    string
+		expected string
+	}{
+		{
+			name:     "without existing query",
+			rawURL:   "http://example.com/3,abc",
+			key:      "readDeleted",
+			value:    "true",
+			expected: "http://example.com/3,abc?readDeleted=true",
+		},
+		{
+			name:     "with existing query",
+			rawURL:   "http://example.com/?proxyChunkId=3,abc",
+			key:      "readDeleted",
+			value:    "true",
+			expected: "http://example.com/?proxyChunkId=3,abc&readDeleted=true",
+		},
+		{
+			name:     "with trailing question mark",
+			rawURL:   "http://example.com/?",
+			key:      "readDeleted",
+			value:    "true",
+			expected: "http://example.com/?readDeleted=true",
+		},
+		{
+			name:     "with trailing ampersand",
+			rawURL:   "http://example.com/?proxyChunkId=3,abc&",
+			key:      "readDeleted",
+			value:    "true",
+			expected: "http://example.com/?proxyChunkId=3,abc&readDeleted=true",
+		},
+		{
+			name:     "encodes values",
+			rawURL:   "http://example.com/data",
+			key:      "note",
+			value:    "space value",
+			expected: "http://example.com/data?note=space+value",
+		},
+		{
+			name:     "preserves fragment",
+			rawURL:   "http://example.com/data#frag",
+			key:      "readDeleted",
+			value:    "true",
+			expected: "http://example.com/data?readDeleted=true#frag",
+		},
+		{
+			name:     "blank url",
+			rawURL:   "",
+			key:      "readDeleted",
+			value:    "true",
+			expected: "?readDeleted=true",
+		},
+	}
+
+	for _, tc := range testCases {
+		t.Run(tc.name, func(t *testing.T) {
+			actual := AppendQueryParameter(tc.rawURL, tc.key, tc.value)
+			if actual != tc.expected {
+				t.Fatalf("expected %q, got %q", tc.expected, actual)
+			}
+		})
+	}
+}
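
Reviewer note: the switch from `urlString+"?readDeleted=true"` to `AppendQueryParameter` matters for filer-proxy chunk URLs, which already carry a `?proxyChunkId=...` query. The following is a minimal standalone sketch, not part of the patch: `appendQueryParameter` is a condensed local copy of the helper so it runs outside the SeaweedFS tree, and the host names `volume:8080` and `filer:8888` are hypothetical.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// appendQueryParameter is a condensed local copy of the patch's
// AppendQueryParameter helper, reproduced only so this sketch is runnable.
func appendQueryParameter(rawURL, key, value string) string {
	encoded := url.Values{key: []string{value}}.Encode()

	// Split off any fragment so the parameter lands before it.
	fragment := ""
	if i := strings.Index(rawURL, "#"); i >= 0 {
		fragment = rawURL[i:]
		rawURL = rawURL[:i]
	}

	switch {
	case !strings.Contains(rawURL, "?"):
		// No query yet: start one.
		return rawURL + "?" + encoded + fragment
	case strings.HasSuffix(rawURL, "?") || strings.HasSuffix(rawURL, "&"):
		// Query is open-ended: append without another separator.
		return rawURL + encoded + fragment
	default:
		// Existing query: join with "&".
		return rawURL + "&" + encoded + fragment
	}
}

func main() {
	// Hypothetical URLs: a direct volume-server read and a filer-proxied read.
	direct := "http://volume:8080/3,abc"
	proxied := "http://filer:8888/?proxyChunkId=3,abc"

	for _, u := range []string{direct, proxied} {
		fmt.Println("naive: ", u+"?readDeleted=true")
		fmt.Println("helper:", appendQueryParameter(u, "readDeleted", "true"))
	}
}
```

For the proxied URL, the naive concatenation yields a second `?` inside the query string, whereas the helper joins the new parameter with `&`; for the direct URL the two forms agree.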