mirror of
https://github.com/seaweedfs/seaweedfs.git
synced 2026-06-13 23:36:45 +03:00
Created Worker (markdown)
# This is still work in progress!

# Weed Worker

The `weed worker` command starts a maintenance worker that connects to an admin server to process cluster maintenance tasks.

## Overview

Workers are distributed maintenance agents that connect to the admin server to process various maintenance tasks such as:
- **Vacuum**: Reclaim disk space by removing deleted files
- **Erasure Coding**: Convert volumes to erasure-coded format for storage efficiency
- **Remote Upload**: Upload volumes to remote/cloud storage
- **Replication**: Fix replication issues and maintain data consistency
- **Balance**: Redistribute volumes across volume servers for load balancing

Workers automatically register with the admin server and receive tasks based on their capabilities and current load.

## Usage

```bash
weed worker [options]
```

## Options

| Option | Default | Description |
|--------|---------|-------------|
| `-admin` | localhost:23646 | Admin server address |
| `-capabilities` | vacuum,erasure_coding,balance | Comma-separated list of task types this worker can handle |
| `-maxConcurrent` | 2 | Maximum number of concurrent tasks |
| `-heartbeat` | 30s | Heartbeat interval to admin server |
| `-taskInterval` | 5s | Task request interval |

## Examples

### Basic Usage

```bash
# Start worker connecting to local admin server
weed worker -admin=localhost:23646

# Connect to remote admin server
weed worker -admin=admin.example.com:23646

# Start worker with custom admin server and port
weed worker -admin=192.168.1.100:8080
```

### Capability Configuration

```bash
# Worker that only handles vacuum tasks
weed worker -admin=localhost:23646 -capabilities=vacuum

# Worker that handles vacuum and replication tasks
weed worker -admin=localhost:23646 -capabilities=vacuum,replication

# Worker with all capabilities listed explicitly
# (the default set is vacuum,erasure_coding,balance; see Options)
weed worker -admin=localhost:23646 -capabilities=vacuum,ec,remote,replication,balance

# Worker using capability aliases
weed worker -admin=localhost:23646 -capabilities=vacuum,ec,remote,replication
```

### Performance Tuning

```bash
# High-performance worker with more concurrent tasks
weed worker -admin=localhost:23646 -maxConcurrent=8

# More frequent task requests for busy clusters
weed worker -admin=localhost:23646 -taskInterval=2s

# Custom heartbeat interval
weed worker -admin=localhost:23646 -heartbeat=10s
```

## Task Capabilities

Workers can be configured to handle specific types of maintenance tasks:

### Available Task Types

| Capability | Description |
|------------|-------------|
| `vacuum` | Reclaim disk space by removing deleted files |
| `erasure_coding` | Convert volumes to erasure-coded format |
| `balance` | Redistribute volumes for load balancing |

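As a minimal sketch, a wrapper script could check a `-capabilities` value against the three names in the table before starting a worker. The function name `validate_capabilities` is hypothetical and not part of `weed`. The check only knows the three table entries, so aliases such as `ec` would be rejected.

```bash
# Hypothetical helper, not part of weed: checks a comma-separated
# capability list against the names documented in the table above.
validate_capabilities() {
  [ -n "$1" ] || { echo "empty capability list" >&2; return 1; }
  # Split on commas by turning them into spaces; empty entries are skipped.
  for cap in $(printf '%s' "$1" | tr ',' ' '); do
    case "$cap" in
      vacuum|erasure_coding|balance) ;;
      *) echo "unknown capability: $cap" >&2; return 1 ;;
    esac
  done
  return 0
}
```

A script might then run `validate_capabilities "vacuum,balance" && weed worker -admin=localhost:23646 -capabilities=vacuum,balance`.
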
## Worker Architecture

### Worker Lifecycle

1. **Registration**: Worker connects to admin server via gRPC
2. **Capabilities**: Worker reports its capabilities to admin
3. **Task Request**: Worker periodically requests tasks from admin
4. **Task Execution**: Worker processes assigned tasks
5. **Heartbeat**: Worker sends periodic heartbeats to admin
6. **Graceful Shutdown**: Worker completes current tasks before stopping

### Connection Details

- **Protocol**: gRPC connection to admin server
- **Port**: Admin HTTP port + 10000 (e.g., admin on 23646 → gRPC on 33646)
- **Security**: Supports TLS using `[grpc.worker]` configuration
- **Fallback**: Falls back to insecure connection if TLS unavailable

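The port rule above is plain arithmetic. The sketch below derives the gRPC port from an admin address of the form `host:port`; the helper name `admin_grpc_port` is an assumption made for illustration and is not a `weed` command.

```bash
# Hypothetical helper: applies the "admin HTTP port + 10000" rule
# to an address given as host:port.
admin_grpc_port() {
  http_port="${1##*:}"            # text after the last colon
  echo $((http_port + 10000))
}

admin_grpc_port "localhost:23646"      # prints 33646
admin_grpc_port "192.168.1.100:8080"   # prints 18080
```

This is the port that firewalls between workers and the admin server need to allow.
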
## Configuration

### Security Configuration

Workers read TLS configuration from `security.toml`:

```toml
[grpc.worker]
cert = "/etc/ssl/worker.crt"
key = "/etc/ssl/worker.key"
ca = "/etc/ssl/ca.crt"
```

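Before starting a worker with TLS, it can help to confirm that the files named in `security.toml` are readable by the worker's user. The following is a sketch only; `check_tls_files` is a hypothetical helper, and the paths passed to it are whatever the configuration lists.

```bash
# Hypothetical pre-flight check: reports which of the given files are
# missing or unreadable. Returns non-zero if any file fails the check.
check_tls_files() {
  status=0
  for f in "$@"; do
    if [ ! -r "$f" ]; then
      echo "not readable: $f" >&2
      status=1
    fi
  done
  return "$status"
}
```

With the paths from the example above, the call would be `check_tls_files /etc/ssl/worker.crt /etc/ssl/worker.key /etc/ssl/ca.crt`. If OpenSSL is installed, `openssl x509 -noout -checkend 0 -in /etc/ssl/worker.crt` additionally exits non-zero when the certificate has expired.
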
### Worker Identification

- **Worker ID**: Automatically generated unique identifier
- **Address**: Worker's network address (auto-detected)
- **Capabilities**: Reported task capabilities
- **Status**: Current worker status (active, idle, busy)

## Task Processing

### Concurrent Task Handling

- **Max Concurrent**: Configurable via `-maxConcurrent` (default: 2)
- **Task Queue**: Workers maintain internal task queues
- **Load Balancing**: Admin distributes tasks based on worker load
- **Task Completion**: Workers report task completion status

### Task Request Cycle

1. Worker requests tasks from admin server
2. Admin assigns tasks based on worker capabilities and load
3. Worker processes tasks concurrently
4. Worker reports task completion/failure
5. Cycle repeats based on `-taskInterval`

## Monitoring and Status

### Worker Status

Workers report the following status information:
- **Worker ID**: Unique identifier
- **Current Load**: Number of active tasks
- **Capabilities**: Supported task types
- **Last Heartbeat**: Timestamp of last heartbeat
- **Tasks Completed**: Total completed tasks
- **Tasks Failed**: Total failed tasks
- **Uptime**: Worker uptime duration

### Health Monitoring

- **Heartbeat**: Periodic heartbeat to admin server
- **Task Timeout**: Tasks have configurable timeouts
- **Error Reporting**: Failed tasks are reported to admin
- **Automatic Retry**: Failed tasks may be retried

## Best Practices

### Deployment

1. **Multiple Workers**: Deploy multiple workers for redundancy
2. **Capability Specialization**: Consider specialized workers for specific tasks
3. **Resource Allocation**: Ensure adequate CPU and memory for concurrent tasks
4. **Network Connectivity**: Ensure reliable connection to admin server

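One way to approach capability specialization is to generate one command line per worker role. The sketch below only prints the commands rather than running them. `worker_cmd` and the `ADMIN_ADDR` environment variable are hypothetical names; the flags are the ones documented under Options.

```bash
# Hypothetical wrapper: prints the `weed worker` command line for one
# specialized worker. ADMIN_ADDR is an assumed environment variable.
worker_cmd() {
  caps="$1"
  max="${2:-2}"   # falls back to the documented default of 2
  echo "weed worker -admin=${ADMIN_ADDR:-localhost:23646} -capabilities=${caps} -maxConcurrent=${max}"
}

worker_cmd vacuum 4         # a vacuum-only worker with 4 concurrent tasks
worker_cmd erasure_coding   # an erasure-coding worker with default concurrency
```

Each printed line can be placed in a separate service unit or container so that a failure in one role does not stop the others.
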
### Performance

1. **Concurrent Tasks**: Tune `-maxConcurrent` based on available resources
2. **Task Interval**: Adjust `-taskInterval` based on cluster activity
3. **Heartbeat Frequency**: Balance between responsiveness and overhead
4. **Resource Monitoring**: Monitor worker resource usage

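To make the heartbeat trade-off concrete, the sketch below computes how many heartbeats one worker sends per hour for a given interval. The helper name `heartbeats_per_hour` is made up for this illustration; multiply the result by the number of workers to estimate the load on the admin server.

```bash
# Heartbeats sent per hour by one worker, given the interval in seconds.
heartbeats_per_hour() {
  echo $((3600 / $1))
}

heartbeats_per_hour 30   # default -heartbeat=30s: prints 120
heartbeats_per_hour 10   # -heartbeat=10s: prints 360
```

Shortening the interval from 30s to 10s triples the heartbeat traffic in exchange for faster detection of a lost worker.
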
### Security

1. **TLS Configuration**: Use TLS for production deployments
2. **Network Security**: Secure communication between workers and admin
3. **Access Control**: Limit worker deployment to trusted systems
4. **Certificate Management**: Manage and rotate TLS certificates

## Troubleshooting

### Common Issues

1. **Cannot connect to admin server**:
   - Verify admin server address and port
   - Check network connectivity
   - Ensure admin server is running
   - Verify gRPC port (admin HTTP port + 10000)

2. **No tasks received**:
   - Check worker capabilities match available tasks
   - Verify worker registration with admin
   - Check admin server logs for task assignment
   - Ensure worker is not overloaded

3. **TLS connection failures**:
   - Verify `security.toml` configuration
   - Check certificate paths and permissions
   - Ensure certificates are valid
   - Check certificate compatibility

4. **Task execution failures**:
   - Check worker logs for error details
   - Verify worker has necessary permissions
   - Check disk space and resources
   - Ensure target volumes are accessible

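For the connection checks in the first item, a quick TCP probe of the derived gRPC port can narrow down the cause. This is a sketch that assumes `nc` (netcat) is installed; `probe_admin_grpc` is a hypothetical helper. It only shows that the port accepts connections, not that the TLS or gRPC handshake would succeed.

```bash
# Hypothetical probe: tests whether the gRPC port derived from an admin
# address (HTTP port + 10000) accepts TCP connections. Assumes `nc`.
probe_admin_grpc() {
  host="${1%:*}"                   # text before the last colon
  port=$(( ${1##*:} + 10000 ))     # admin HTTP port + 10000
  if nc -z -w 3 "$host" "$port" 2>/dev/null; then
    echo "reachable: $host:$port"
  else
    echo "unreachable: $host:$port" >&2
    return 1
  fi
}
```

For example, `probe_admin_grpc admin.example.com:23646` probes port 33646 on `admin.example.com`.
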
### Debug Information

Enable debug logging:

```bash
# Run with verbose logging
weed worker -admin=localhost:23646 -v=4
```

### Worker Logs

Workers log important events:
- Connection status to admin server
- Task assignments and completion
- Error conditions and failures
- Heartbeat and health information

## Task-Specific Information

### Vacuum Tasks

- **Purpose**: Reclaim disk space from deleted files
- **Requirements**: Access to volume servers
- **Duration**: Varies based on volume size and deleted data
- **Impact**: Temporary increase in I/O during vacuum process

### Erasure Coding Tasks

- **Purpose**: Convert volumes to erasure-coded format
- **Requirements**: Multiple volume servers for redundancy
- **Duration**: Long-running, depends on volume size
- **Impact**: Reduces storage requirements but increases complexity

### Remote Upload Tasks

- **Purpose**: Upload volumes to remote/cloud storage
- **Requirements**: Cloud storage credentials and connectivity
- **Duration**: Depends on volume size and upload bandwidth
- **Impact**: Enables tiered storage and backup strategies

### Replication Tasks

- **Purpose**: Fix replication consistency issues
- **Requirements**: Access to master and volume servers
- **Duration**: Quick, depends on replication factor
- **Impact**: Ensures data consistency and availability

### Balance Tasks

- **Purpose**: Redistribute volumes across volume servers
- **Requirements**: Multiple volume servers
- **Duration**: Depends on data movement requirements
- **Impact**: Improves cluster load distribution

## Related Commands

- [`weed admin`](Weed-Admin.md): Start admin server that manages workers
- [`weed master`](https://github.com/seaweedfs/seaweedfs/wiki/Master-Server): Start master servers
- [`weed volume`](https://github.com/seaweedfs/seaweedfs/wiki/Volume-Server): Start volume servers
- [`weed scaffold`](https://github.com/seaweedfs/seaweedfs/wiki/Scaffold): Generate configuration files

## See Also

- [SeaweedFS Architecture](https://github.com/seaweedfs/seaweedfs/wiki/SeaweedFS-Architecture)
- [Maintenance Operations](https://github.com/seaweedfs/seaweedfs/wiki/Maintenance)
- [Security Configuration](https://github.com/seaweedfs/seaweedfs/wiki/Security-Configuration)
- [Erasure Coding](https://github.com/seaweedfs/seaweedfs/wiki/Erasure-Coding)
- [Remote Storage](https://github.com/seaweedfs/seaweedfs/wiki/Remote-Storage)