Created Worker (markdown)

Chris Lu
2025-07-06 22:37:31 -07:00
parent 6ad192fa9c
commit ff1bea50c4
# Weed Worker
**Note: This page is still a work in progress.**

The `weed worker` command starts a maintenance worker that connects to an admin server to process cluster maintenance tasks.
## Overview
Workers are distributed maintenance agents that connect to the admin server to process various maintenance tasks such as:
- **Vacuum**: Reclaim disk space by removing deleted files
- **Erasure Coding**: Convert volumes to erasure-coded format for storage efficiency
- **Remote Upload**: Upload volumes to remote/cloud storage
- **Replication**: Fix replication issues and maintain data consistency
- **Balance**: Redistribute volumes across volume servers for load balancing

Workers automatically register with the admin server and receive tasks based on their capabilities and current load.
## Usage
```bash
weed worker [options]
```
## Options
| Option | Default | Description |
|--------|---------|-------------|
| `-admin` | localhost:23646 | Admin server address |
| `-capabilities` | vacuum,erasure_coding,balance | Comma-separated list of task types this worker can handle |
| `-maxConcurrent` | 2 | Maximum number of concurrent tasks |
| `-heartbeat` | 30s | Interval between heartbeats sent to the admin server |
| `-taskInterval` | 5s | Interval between task requests to the admin server |
## Examples
### Basic Usage
```bash
# Start worker connecting to local admin server
weed worker -admin=localhost:23646
# Connect to remote admin server
weed worker -admin=admin.example.com:23646
# Start worker with custom admin server and port
weed worker -admin=192.168.1.100:8080
```
### Capability Configuration
```bash
# Worker that only handles vacuum tasks
weed worker -admin=localhost:23646 -capabilities=vacuum
# Worker that handles vacuum and replication tasks
weed worker -admin=localhost:23646 -capabilities=vacuum,replication
# Worker handling all five task types, using short capability names (ec = erasure_coding)
weed worker -admin=localhost:23646 -capabilities=vacuum,ec,remote,replication,balance
```
### Performance Tuning
```bash
# High-performance worker with more concurrent tasks
weed worker -admin=localhost:23646 -maxConcurrent=8
# More frequent task requests for busy clusters
weed worker -admin=localhost:23646 -taskInterval=2s
# Custom heartbeat interval
weed worker -admin=localhost:23646 -heartbeat=10s
```
## Task Capabilities
Workers can be configured to handle specific types of maintenance tasks:
### Available Task Types
| Capability | Description |
|------------|-------------|
| `vacuum` | Reclaim disk space by removing deleted files |
| `erasure_coding` | Convert volumes to erasure-coded format |
| `balance` | Redistribute volumes for load balancing |
## Worker Architecture
### Worker Lifecycle
1. **Registration**: Worker connects to admin server via gRPC
2. **Capabilities**: Worker reports its capabilities to admin
3. **Task Request**: Worker periodically requests tasks from admin
4. **Task Execution**: Worker processes assigned tasks
5. **Heartbeat**: Worker sends periodic heartbeats to admin
6. **Graceful Shutdown**: Worker completes current tasks before stopping
### Connection Details
- **Protocol**: gRPC connection to admin server
- **Port**: Admin HTTP port + 10000 (e.g., admin on 23646 → gRPC on 33646)
- **Security**: Supports TLS via the `[grpc.worker]` section of `security.toml`
- **Fallback**: Falls back to an insecure connection if TLS is unavailable
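Because the worker reaches the admin over gRPC on the HTTP port plus 10000, firewall rules have to allow that second port as well. The bash helper below derives it from the value passed to `-admin`. It is a sketch: `admin_grpc_address` is a hypothetical name, not part of `weed`, and it assumes a plain `host:port` string (bracketed IPv6 addresses are not handled).

```bash
#!/usr/bin/env bash
# Derive the admin gRPC address from the -admin value (HTTP port + 10000).
# admin_grpc_address is a hypothetical helper, not a weed subcommand.
admin_grpc_address() {
  local admin=$1
  local host=${admin%:*}    # everything before the last ':'
  local port=${admin##*:}   # everything after the last ':'
  echo "${host}:$((port + 10000))"
}
```

For example, `admin_grpc_address localhost:23646` prints `localhost:33646`, and `admin_grpc_address 192.168.1.100:8080` prints `192.168.1.100:18080`.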
## Configuration
### Security Configuration
Workers read TLS configuration from `security.toml`:
```toml
[grpc.worker]
cert = "/etc/ssl/worker.crt"
key = "/etc/ssl/worker.key"
ca = "/etc/ssl/ca.crt"
```
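Before starting a worker with TLS, it can help to confirm that the files named in `[grpc.worker]` exist and that the worker certificate chains to the CA. This is a minimal sketch: `check_tls_files` and `verify_worker_cert` are hypothetical helper names, and the chain check relies on the standard `openssl verify` command rather than anything built into `weed`.

```bash
#!/usr/bin/env bash
# Pre-flight checks for the files referenced by [grpc.worker] in security.toml.
# check_tls_files and verify_worker_cert are hypothetical helper names.

check_tls_files() {
  # Fail if any argument is missing or not readable by the current user
  local f rc=0
  for f in "$@"; do
    if [ ! -r "$f" ]; then
      echo "missing or unreadable: $f" >&2
      rc=1
    fi
  done
  return "$rc"
}

verify_worker_cert() {
  # $1 = worker certificate, $2 = CA certificate
  # openssl verify checks the certificate chain and its validity period
  openssl verify -CAfile "$2" "$1"
}
```

Run it as the same user that will run `weed worker`, for example `check_tls_files /etc/ssl/worker.crt /etc/ssl/worker.key /etc/ssl/ca.crt && verify_worker_cert /etc/ssl/worker.crt /etc/ssl/ca.crt`.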
### Worker Identification
- **Worker ID**: Automatically generated unique identifier
- **Address**: Worker's network address (auto-detected)
- **Capabilities**: Reported task capabilities
- **Status**: Current worker status (active, idle, busy)
## Task Processing
### Concurrent Task Handling
- **Max Concurrent**: Configurable via `-maxConcurrent` (default: 2)
- **Task Queue**: Workers maintain internal task queues
- **Load Balancing**: Admin distributes tasks based on worker load
- **Task Completion**: Workers report task completion status
### Task Request Cycle
1. Worker requests tasks from admin server
2. Admin assigns tasks based on worker capabilities and load
3. Worker processes tasks concurrently
4. Worker reports task completion/failure
5. Cycle repeats based on `-taskInterval`
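The cycle above amounts to a polling loop that never runs more than `-maxConcurrent` tasks at once. The bash toy below illustrates only that bounded-concurrency idea and is not the worker's actual implementation: `request_task` and `run_task` are hypothetical stand-ins for the gRPC exchange with the admin server, and the task queue is fake.

```bash
#!/usr/bin/env bash
# Toy model of the task request cycle, NOT the real worker implementation.
# request_task and run_task are hypothetical stand-ins for gRPC calls to the admin.
MAX_CONCURRENT=${MAX_CONCURRENT:-2}   # plays the role of -maxConcurrent
TASK_INTERVAL=${TASK_INTERVAL:-5}     # plays the role of -taskInterval (seconds)
QUEUE=(vacuum-1 erasure_coding-2 balance-3 vacuum-4)  # fake tasks waiting on the admin
PIDS=()    # background tasks started by this worker
PEAK=0     # highest number of tasks seen running at once

request_task() {
  # Steps 1-2: ask for one task; fail if the admin has nothing to assign
  [ "${#QUEUE[@]}" -gt 0 ] || return 1
  NEXT_TASK=${QUEUE[0]}
  QUEUE=("${QUEUE[@]:1}")
}

run_task() {
  # Steps 3-4: do the work, then report completion
  sleep 0.1
  echo "completed $1"
}

prune_finished() {
  # Keep only the PIDs of tasks that are still running
  local live=() pid
  for pid in "${PIDS[@]}"; do
    if kill -0 "$pid" 2>/dev/null; then live+=("$pid"); fi
  done
  PIDS=("${live[@]}")
}

request_cycle() {
  # $1 = number of polling rounds to simulate
  local rounds=$1 i
  for ((i = 0; i < rounds; i++)); do
    prune_finished
    # Request only as many tasks as there are free slots
    while [ "${#PIDS[@]}" -lt "$MAX_CONCURRENT" ] && request_task; do
      run_task "$NEXT_TASK" &
      PIDS+=("$!")
    done
    if [ "${#PIDS[@]}" -gt "$PEAK" ]; then PEAK=${#PIDS[@]}; fi
    sleep "$TASK_INTERVAL"   # Step 5: repeat after the task interval
  done
  wait   # let in-flight tasks finish before returning
}
```

With the defaults, `request_cycle 3` starts at most two tasks per round, so the four fake tasks are handed out over two rounds. Requesting only as many tasks as there are free slots is what keeps a busy worker from pulling more work than it can run.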
## Monitoring and Status
### Worker Status
Workers report the following status information:
- **Worker ID**: Unique identifier
- **Current Load**: Number of active tasks
- **Capabilities**: Supported task types
- **Last Heartbeat**: Timestamp of last heartbeat
- **Tasks Completed**: Total completed tasks
- **Tasks Failed**: Total failed tasks
- **Uptime**: Worker uptime duration
### Health Monitoring
- **Heartbeat**: Periodic heartbeat to admin server
- **Task Timeout**: Tasks have configurable timeouts
- **Error Reporting**: Failed tasks are reported to admin
- **Automatic Retry**: Failed tasks may be retried
## Best Practices
### Deployment
1. **Multiple Workers**: Deploy multiple workers for redundancy
2. **Capability Specialization**: Consider specialized workers for specific tasks
3. **Resource Allocation**: Ensure adequate CPU and memory for concurrent tasks
4. **Network Connectivity**: Ensure reliable connection to admin server
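One way to apply capability specialization is to generate one `weed worker` command per capability group. The bash sketch below uses only the flags documented on this page; the function names, the `caps:maxConcurrent` spec format, and the `DRY_RUN` switch are hypothetical conveniences. By default it only prints the commands. For redundancy the workers would normally run on separate hosts, typically under a service manager.

```bash
#!/usr/bin/env bash
# Print (or launch) one `weed worker` per capability group.
# worker_cmd, launch_workers, the "caps:max" spec format and DRY_RUN are hypothetical.
ADMIN=${ADMIN:-localhost:23646}

worker_cmd() {
  # $1 = comma-separated capabilities, $2 = value for -maxConcurrent
  printf 'weed worker -admin=%s -capabilities=%s -maxConcurrent=%s\n' "$ADMIN" "$1" "$2"
}

launch_workers() {
  # Each argument looks like "capabilities:maxConcurrent", e.g. "vacuum:4"
  local spec caps max
  for spec in "$@"; do
    caps=${spec%%:*}
    max=${spec##*:}
    if [ "${DRY_RUN:-1}" = 1 ]; then
      worker_cmd "$caps" "$max"   # default: only print the command
    else
      weed worker -admin="$ADMIN" -capabilities="$caps" -maxConcurrent="$max" &
    fi
  done
}
```

`launch_workers vacuum:4 erasure_coding:1 balance:2` prints three commands; `DRY_RUN=0 launch_workers vacuum:4 erasure_coding:1 balance:2` starts them in the background.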
### Performance
1. **Concurrent Tasks**: Tune `-maxConcurrent` based on available resources
2. **Task Interval**: Adjust `-taskInterval` based on cluster activity
3. **Heartbeat Frequency**: Tune `-heartbeat` to balance responsiveness against overhead; shorter intervals report status sooner but send more messages
4. **Resource Monitoring**: Monitor worker resource usage
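To seed `-maxConcurrent` from the host's resources, the value can be derived from the CPU count. The rule in this sketch (half the cores, clamped between 1 and 8) is an arbitrary assumption, not a SeaweedFS recommendation, and `suggest_max_concurrent` is a hypothetical helper. Tasks such as vacuum add I/O load, so watch disk and network utilization before raising the value.

```bash
#!/usr/bin/env bash
# Suggest a starting -maxConcurrent value from the CPU count.
# ASSUMPTION: "half the cores, between 1 and 8" is an arbitrary rule for this sketch.
suggest_max_concurrent() {
  local cores=${1:-$(nproc)}   # use the argument if given, else this host's CPU count
  local n=$((cores / 2))
  if [ "$n" -lt 1 ]; then n=1; fi
  if [ "$n" -gt 8 ]; then n=8; fi
  echo "$n"
}
```

Usage: `weed worker -admin=localhost:23646 -maxConcurrent="$(suggest_max_concurrent)"`.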
### Security
1. **TLS Configuration**: Use TLS for production deployments
2. **Network Security**: Secure communication between workers and admin
3. **Access Control**: Limit worker deployment to trusted systems
4. **Certificate Management**: Manage and rotate TLS certificates
## Troubleshooting
### Common Issues
1. **Cannot connect to admin server**:
   - Verify admin server address and port
   - Check network connectivity
   - Ensure admin server is running
   - Verify gRPC port (admin HTTP port + 10000)
2. **No tasks received**:
   - Check worker capabilities match available tasks
   - Verify worker registration with admin
   - Check admin server logs for task assignment
   - Ensure worker is not overloaded
3. **TLS connection failures**:
   - Verify `security.toml` configuration
   - Check certificate paths and permissions
   - Ensure certificates are valid
   - Check certificate compatibility
4. **Task execution failures**:
   - Check worker logs for error details
   - Verify worker has necessary permissions
   - Check disk space and resources
   - Ensure target volumes are accessible
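For connection problems, a quick reachability test of both the admin HTTP port and its gRPC port (HTTP port + 10000) helps separate network issues from worker issues. This bash sketch uses bash's `/dev/tcp` pseudo-device and GNU `timeout`; `port_open` and `check_admin` are hypothetical helper names. A successful TCP connect does not prove that TLS or gRPC will work, only that the port is reachable.

```bash
#!/usr/bin/env bash
# Reachability check for the admin HTTP port and its gRPC port (HTTP port + 10000).
# port_open and check_admin are hypothetical helper names.

port_open() {
  # Try a TCP connect through bash's /dev/tcp; give up after 3 seconds (GNU timeout)
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

check_admin() {
  # $1 = admin host, $2 = admin HTTP port
  local host=$1 http_port=$2
  local grpc_port=$((http_port + 10000))
  local port rc=0
  for port in "$http_port" "$grpc_port"; do
    if port_open "$host" "$port"; then
      echo "ok: $host:$port"
    else
      echo "unreachable: $host:$port"
      rc=1
    fi
  done
  return "$rc"
}
```

For example, `check_admin localhost 23646` tests ports 23646 and 33646 and returns non-zero if either is unreachable.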
### Debug Information
Enable debug logging:
```bash
# Run with verbose logging
weed worker -admin=localhost:23646 -v=4
```
### Worker Logs
Workers log important events:
- Connection status to admin server
- Task assignments and completion
- Error conditions and failures
- Heartbeat and health information
## Task-Specific Information
### Vacuum Tasks
- **Purpose**: Reclaim disk space from deleted files
- **Requirements**: Access to volume servers
- **Duration**: Varies based on volume size and deleted data
- **Impact**: Temporary increase in I/O during vacuum process
### Erasure Coding Tasks
- **Purpose**: Convert volumes to erasure-coded format
- **Requirements**: Multiple volume servers for redundancy
- **Duration**: Long-running, depends on volume size
- **Impact**: Reduces storage requirements but increases complexity
### Remote Upload Tasks
- **Purpose**: Upload volumes to remote/cloud storage
- **Requirements**: Cloud storage credentials and connectivity
- **Duration**: Depends on volume size and upload bandwidth
- **Impact**: Enables tiered storage and backup strategies
### Replication Tasks
- **Purpose**: Fix replication consistency issues
- **Requirements**: Access to master and volume servers
- **Duration**: Usually quick; depends on the replication factor
- **Impact**: Ensures data consistency and availability
### Balance Tasks
- **Purpose**: Redistribute volumes across volume servers
- **Requirements**: Multiple volume servers
- **Duration**: Depends on data movement requirements
- **Impact**: Improves cluster load distribution
## Related Commands
- [`weed admin`](Weed-Admin.md): Start admin server that manages workers
- [`weed master`](https://github.com/seaweedfs/seaweedfs/wiki/Master-Server): Start master servers
- [`weed volume`](https://github.com/seaweedfs/seaweedfs/wiki/Volume-Server): Start volume servers
- [`weed scaffold`](https://github.com/seaweedfs/seaweedfs/wiki/Scaffold): Generate configuration files
## See Also
- [SeaweedFS Architecture](https://github.com/seaweedfs/seaweedfs/wiki/SeaweedFS-Architecture)
- [Maintenance Operations](https://github.com/seaweedfs/seaweedfs/wiki/Maintenance)
- [Security Configuration](https://github.com/seaweedfs/seaweedfs/wiki/Security-Configuration)
- [Erasure Coding](https://github.com/seaweedfs/seaweedfs/wiki/Erasure-Coding)
- [Remote Storage](https://github.com/seaweedfs/seaweedfs/wiki/Remote-Storage)