What can it do?
If you have two or more SeaweedFS clusters with filers, you can now asynchronously replicate changes between them.
It can run in either Active-Active mode or Active-Passive mode.
Usage
From any computer that can reach the filer and volume servers of both SeaweedFS clusters, run one of these commands:
# synchronize both clusters in both directions (active-active)
weed filer.sync -a <filer1_host>:<filer1_port> -b <filer2_host>:<filer2_port>
# synchronize only specific folders:
weed filer.sync -a <filer1_host>:<filer1_port> -b <filer2_host>:<filer2_port> -a.path /filer1/path1 -b.path /filer2/path2
# active-passive mode, replicate all changes in filer1 to filer2
weed filer.sync -a <filer1_host>:<filer1_port> -b <filer2_host>:<filer2_port> -isActivePassive
On startup, it either bootstraps from the beginning of time or resumes from the last replication checkpoint. After that, it runs continuously and persists checkpoints periodically.
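The bootstrap-or-resume behavior can be sketched as a small model. This is not SeaweedFS's implementation. `Event`, `CheckpointStore` and `replay` are hypothetical names. The checkpoint is kept in memory instead of being persisted, and "periodically" is modeled as "every N applied events" rather than on a timer.

```go
package main

import "fmt"

// Event is a simplified metadata change; TsNs orders events in a filer's log.
// Hypothetical type, for illustration only.
type Event struct {
	TsNs int64
	Path string
}

// CheckpointStore is a hypothetical in-memory stand-in for wherever the
// replication offset of each source filer would be persisted.
type CheckpointStore struct {
	offsets map[string]int64
}

// Load returns the last checkpoint; 0 means "start from the beginning of time".
func (s *CheckpointStore) Load(filer string) int64 { return s.offsets[filer] }

// Save records the timestamp of the last replicated event.
func (s *CheckpointStore) Save(filer string, ts int64) { s.offsets[filer] = ts }

// replay applies every event newer than the stored checkpoint (events are
// assumed to be in timestamp order). It saves the checkpoint after every
// `every` applied events and once more at the end.
func replay(store *CheckpointStore, filer string, events []Event, every int, apply func(Event)) int {
	last := store.Load(filer)
	applied := 0
	for _, ev := range events {
		if ev.TsNs <= last {
			continue // already replicated before a restart
		}
		apply(ev)
		last = ev.TsNs
		applied++
		if applied%every == 0 {
			store.Save(filer, last)
		}
	}
	store.Save(filer, last)
	return applied
}

func main() {
	store := &CheckpointStore{offsets: map[string]int64{}}
	show := func(ev Event) { fmt.Println("replay", ev.Path) }

	events := []Event{{1, "/a"}, {2, "/b"}, {3, "/c"}}
	fmt.Println(replay(store, "filer1", events, 2, show)) // first run bootstraps all three events

	events = append(events, Event{4, "/d"})
	fmt.Println(replay(store, "filer1", events, 2, show)) // after a restart only /d is new
}
```

The checkpoint is saved only every few events. After a crash, some events after the last checkpoint may therefore be replayed again. That is acceptable as long as replaying a change is idempotent.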
On Kubernetes with the SeaweedFS Operator
If you are running SeaweedFS on Kubernetes via the SeaweedFS Operator, you can run weed filer.sync as a sidecar inside the filer pod itself, so it shares the pod's lifecycle and dials the local filer over loopback. Each ComponentSpec exposes a sidecars field that accepts arbitrary v1.Container entries.
apiVersion: seaweed.seaweedfs.com/v1
kind: Seaweed
metadata:
  name: prod
spec:
  image: chrislusf/seaweedfs:latest
  master:
    replicas: 3
  volume:
    replicas: 3
    requests:
      storage: 100Gi
  filer:
    replicas: 2
    sidecars:
      - name: filer-sync
        image: chrislusf/seaweedfs:latest
        command:
          - weed
          - filer.sync
          - -a=localhost:8888
          - -b=remote-filer.dr.svc.cluster.local:8888
          - -isActivePassive
Because the sidecar runs alongside every filer replica, the example above starts one sync process per filer replica (two here). For active-passive replication you typically want a single sync process. Either run the sidecar on a separate single-replica filer, or run filer.sync as its own Deployment instead.
How does it work?
Each filer keeps its own local change log. weed filer.sync reads these logs and replays them on the other cluster.
weed filer.sync remembers each filer's "signature" and its replication checkpoints, so you can safely stop weed filer.sync and start it again later.
The signature also ensures that the same change is applied only once on each filer. As a result, Active-Active synchronization does not bounce a single file update back and forth between the clusters.
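Here is a toy model of how signatures stop the ping-pong in active-active mode. It assumes, following the description above, two things: each filer adds its own signature when it records a change, and filer.sync skips events that already carry the destination filer's signature. The types and the numeric signature values are hypothetical, not SeaweedFS's internal data structures.

```go
package main

import "fmt"

// Event is a simplified change-log entry. Signatures lists the filers that
// have already recorded this change (hypothetical model).
type Event struct {
	Path       string
	Signatures []int32
}

// Filer is a toy filer with a signature and an append-only change log.
type Filer struct {
	Signature int32
	Log       []Event
}

// Write records a change, keeping any incoming signatures and adding this filer's own.
func (f *Filer) Write(path string, incoming []int32) {
	sigs := append(append([]int32{}, incoming...), f.Signature)
	f.Log = append(f.Log, Event{Path: path, Signatures: sigs})
}

func hasSignature(sigs []int32, s int32) bool {
	for _, x := range sigs {
		if x == s {
			return true
		}
	}
	return false
}

// syncOnce replays src's log from offset `from` onto dst, skipping events
// that dst has already seen. It returns how many events were applied and
// the offset to resume from next time.
func syncOnce(src, dst *Filer, from int) (applied, next int) {
	end := len(src.Log)
	for _, ev := range src.Log[from:end] {
		if hasSignature(ev.Signatures, dst.Signature) {
			continue // dst already has this change, so it is not sent back
		}
		dst.Write(ev.Path, ev.Signatures)
		applied++
	}
	return applied, end
}

func main() {
	a := &Filer{Signature: 101}
	b := &Filer{Signature: 202}
	a.Write("/docs/report.txt", nil) // a user writes on cluster A

	toB, offA := syncOnce(a, b, 0)   // A -> B: applied on B
	back, _ := syncOnce(b, a, 0)     // B -> A: skipped, the event carries A's signature
	again, _ := syncOnce(a, b, offA) // A -> B resumes after its checkpoint: nothing new
	fmt.Println(toB, back, again, len(a.Log), len(b.Log)) // prints: 1 0 0 1 1
}
```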
More clusters?
If there are 3 or more clusters, you can choose a fully connected setup, a chained setup, or a more complex topology.
You can mix and match to set up filer synchronization as you wish.
Fully Connected Setup?
It is tempting to create a fully connected network topology, for example by running one weed filer.sync for each pair of clusters. Such a topology may seem to provide redundancy in case of network failures.
cluster1 <-- filer.sync --> cluster2
cluster2 <-- filer.sync --> cluster3
cluster3 <-- filer.sync --> cluster1
However, this topology has a loop.
Every filer leaves a signature on each message, and filer.sync uses these signatures to avoid processing the same message twice. However, a node inside a loop can receive the same message from two different neighbors, so this mechanism cannot identify the duplicated messages.
Because most metadata messages are idempotent, a loop is inefficient but still works.
For directory renames, however, execution order matters. Loops should therefore be avoided; otherwise directories can become inconsistent.
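The following sketch reuses the same toy signature model (hypothetical types, not SeaweedFS internals). It simulates one change made on cluster1 propagating through a fully connected triangle and through a chain. In the triangle, cluster2 and cluster3 each receive the change along two paths whose signature lists differ, so the signature check lets it through twice.

```go
package main

import "fmt"

type Event struct {
	Path       string
	Signatures []int32 // filers that have already recorded this change
}

type Filer struct {
	Signature int32
	Log       []Event
}

func (f *Filer) Write(path string, incoming []int32) {
	sigs := append(append([]int32{}, incoming...), f.Signature)
	f.Log = append(f.Log, Event{Path: path, Signatures: sigs})
}

func hasSignature(sigs []int32, s int32) bool {
	for _, x := range sigs {
		if x == s {
			return true
		}
	}
	return false
}

// syncOnce replays src's log from `from` onto dst, skipping events dst has seen.
func syncOnce(src, dst *Filer, from int) (applied, next int) {
	end := len(src.Log)
	for _, ev := range src.Log[from:end] {
		if hasSignature(ev.Signatures, dst.Signature) {
			continue
		}
		dst.Write(ev.Path, ev.Signatures)
		applied++
	}
	return applied, end
}

// link is one direction of a filer.sync process; indexes 0, 1, 2 stand for
// cluster1, cluster2, cluster3.
type link struct{ src, dst int }

// simulate writes one change on cluster1 (index 0), then runs every link
// repeatedly until nothing more is applied. It returns how many times the
// change was applied on each destination index.
func simulate(links []link) map[int]int {
	clusters := []*Filer{{Signature: 1}, {Signature: 2}, {Signature: 3}}
	clusters[0].Write("/x", nil)
	offsets := map[link]int{}
	appliedTo := map[int]int{}
	for {
		progress := 0
		for _, l := range links {
			n, next := syncOnce(clusters[l.src], clusters[l.dst], offsets[l])
			offsets[l] = next
			appliedTo[l.dst] += n
			progress += n
		}
		if progress == 0 {
			return appliedTo
		}
	}
}

func main() {
	triangle := []link{{0, 1}, {1, 0}, {1, 2}, {2, 1}, {2, 0}, {0, 2}}
	chain := []link{{0, 1}, {1, 0}, {1, 2}, {2, 1}}
	fmt.Println("triangle:", simulate(triangle)) // cluster2 and cluster3 apply the change twice
	fmt.Println("chain:", simulate(chain))       // each applies it once
}
```

Applying a create twice is harmless. With renames, however, the two copies can interleave with other changes in different orders on different clusters.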
Chained Setup
cluster1 <-- filer.sync --> cluster2 <-- filer.sync --> cluster3
One-Master-Multiple-Slaves
With weed filer.sync -isActivePassive, nothing stops you from setting up multiple follower clusters.
cluster1 -- filer.sync --> cluster2
cluster1 -- filer.sync --> cluster3
cluster1 -- filer.sync --> cluster4
Multiple-Master-Multiple-Slaves
Multiple active-active clusters with chained follower clusters should also work:
cluster1 <-- filer.sync --> cluster2 -- filer.sync --> cluster3 -- filer.sync --> cluster4
|
+----- filer.sync --> cluster5
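Before deploying several filer.sync processes, you may want to check a planned topology for loops. The sketch below rests on two assumptions. First, each filer.sync process is listed once as a pair of clusters, so an active-active process is one link, not two. Second, every link is treated as undirected, because a change reaching a cluster along two paths causes duplicates either way. A union-find pass then reports the first link that closes a loop. `findLoop` is a hypothetical helper, not part of SeaweedFS.

```go
package main

import "fmt"

// findLoop treats each filer.sync process as an undirected link between two
// clusters. It returns the first link whose endpoints are already connected,
// i.e. the link that closes a loop.
func findLoop(links [][2]string) ([2]string, bool) {
	parent := map[string]string{}
	find := func(x string) string {
		if _, ok := parent[x]; !ok {
			parent[x] = x
		}
		for parent[x] != x {
			parent[x] = parent[parent[x]] // path halving
			x = parent[x]
		}
		return x
	}
	for _, l := range links {
		ra, rb := find(l[0]), find(l[1])
		if ra == rb {
			return l, true
		}
		parent[ra] = rb // merge the two connected groups
	}
	return [2]string{}, false
}

func main() {
	fullyConnected := [][2]string{{"cluster1", "cluster2"}, {"cluster2", "cluster3"}, {"cluster3", "cluster1"}}
	chained := [][2]string{{"cluster1", "cluster2"}, {"cluster2", "cluster3"}}
	oneToMany := [][2]string{{"cluster1", "cluster2"}, {"cluster1", "cluster3"}, {"cluster1", "cluster4"}}
	for _, t := range [][][2]string{fullyConnected, chained, oneToMany} {
		if l, ok := findLoop(t); ok {
			fmt.Println("loop closed by", l[0], "<->", l[1])
		} else {
			fmt.Println("no loop")
		}
	}
}
```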
Different Sync strategy for different folders
Here, the first folder is synchronized active-active between two clusters, while /home/public on cluster1 is synchronized in one direction only, to cluster2:
cluster1:/home/chris <-- filer.sync --> cluster2:/Users/chris
cluster1:/home/public -- filer.sync --> cluster2:/home/www/public
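With -a.path and -b.path, entries under one folder appear under the other folder on the peer cluster. The sketch below shows the prefix translation this implies, using the folders from the example above. `mapPath` is a hypothetical helper that illustrates the idea; it is not SeaweedFS code, and filer.sync's exact edge-case handling may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// mapPath translates path p under sourceDir (one side's -a.path/-b.path) to
// the matching path under targetDir on the other side. ok is false when p
// lies outside the synchronized folder and should be ignored.
func mapPath(sourceDir, targetDir, p string) (string, bool) {
	sourceDir = strings.TrimSuffix(sourceDir, "/")
	targetDir = strings.TrimSuffix(targetDir, "/")
	// Require a "/" boundary so /home/christine does not match /home/chris.
	if p != sourceDir && !strings.HasPrefix(p, sourceDir+"/") {
		return "", false
	}
	return targetDir + strings.TrimPrefix(p, sourceDir), true
}

func main() {
	fmt.Println(mapPath("/home/chris", "/Users/chris", "/home/chris/notes/todo.txt"))
	fmt.Println(mapPath("/home/public", "/home/www/public", "/home/public/index.html"))
	fmt.Println(mapPath("/home/chris", "/Users/chris", "/home/christine/a.txt"))
}
```

For the active-active pair, the reverse direction would use the same translation with the two folders swapped.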
Filer Proxy
By default, filer.sync uploads files directly to volume servers. This is the most efficient approach because it avoids extra hops and distributes the network traffic.
However, it uses the volume server IP addresses configured within each cluster. Since filer.sync usually runs across networks, these IPs may not be reachable from filer.sync because of network configuration (for example, when cluster1 and cluster2 are on different hosting providers). In this case, use the filerProxy option so that filer.sync performs all transfers through the filer. To enable it, add -a.filerProxy and/or -b.filerProxy to the weed filer.sync command line.
Debug log
To see the details of all transfers performed by filer.sync, add -a.debug and/or -b.debug to the weed filer.sync command line.
Server-Side Encryption (SSE)
weed filer.sync copies encrypted chunk data and encryption metadata as-is between clusters; no decryption or re-encryption occurs. This means both clusters must share the same SSE key configuration:
- SSE-S3: Both filers must use the same KEK (configured via WEED_S3_SSE_KEK or WEED_S3_SSE_KEY in security.toml, or the same /etc/s3/sse_kek file on the filer).
- SSE-KMS: Both clusters must have access to the same KMS provider and keys.
- SSE-C: Works automatically since the encryption metadata is copied with the chunks.
Warning: If the destination cluster uses different SSE keys, replicated encrypted objects will be stored but cannot be decrypted. Reads will fail or return corrupt data.
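A key mismatch only shows up later as failed reads, so it can be worth comparing SSE-S3 keys before starting filer.sync. One approach, sketched below, compares SHA-256 fingerprints of the configured key material, so the key itself never leaves either host. This assumes both clusters store the KEK in the same encoding. The helper names are hypothetical; this is not a SeaweedFS tool.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"os"
)

// kekFingerprint returns a short identifier derived from the KEK bytes,
// suitable for comparing keys without revealing them.
func kekFingerprint(kek []byte) string {
	sum := sha256.Sum256(kek)
	return hex.EncodeToString(sum[:8])
}

// fingerprintFile fingerprints a key file, ignoring surrounding whitespace
// such as a trailing newline (an assumption about how the file was written).
func fingerprintFile(path string) (string, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return "", err
	}
	return kekFingerprint(bytes.TrimSpace(data)), nil
}

// sameKEK reports whether two key values have the same fingerprint.
func sameKEK(a, b []byte) bool { return kekFingerprint(a) == kekFingerprint(b) }

func main() {
	// Pass a key file path, e.g. /etc/s3/sse_kek, to print its fingerprint;
	// run this on each cluster's filer host and compare the outputs.
	if len(os.Args) > 1 {
		fp, err := fingerprintFile(os.Args[1])
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		fmt.Println(fp)
		return
	}
	fmt.Println(sameKEK([]byte("key-one"), []byte("key-one")), sameKEK([]byte("key-one"), []byte("key-two")))
}
```

If the KEK is configured through WEED_S3_SSE_KEK rather than a file, fingerprint that value on each side instead.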
Limitations
This should scale fairly well, but it is limited by network bandwidth and latency. Changes are received within milliseconds and replayed right away. Even so, data can become inconsistent if the same file is changed in quick succession in two distant data centers.
For large clusters, the rate of change may be so high that replication cannot catch up. You may want to synchronize only a specific folder to reduce the workload.