admin: expose Prometheus metrics (#9652)

* admin: add -metricsPort flag to expose Prometheus metrics

The admin command had no metrics endpoint, so passing -metricsPort
(as the operator does for spec.admin.metricsPort) crashed the process
with "flag provided but not defined". Wire up -metricsPort/-metricsIp
and start the shared Prometheus metrics server, matching filer, master,
and volume.
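
The failure mode described above can be reproduced with the standard library alone. The following is a minimal sketch, not the admin command's actual code: `parseAdminFlags` is a hypothetical helper, and it uses `flag.ContinueOnError` so that the error is returned instead of terminating the process.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// parseAdminFlags is a hypothetical helper for illustration. It parses args
// with a FlagSet that defines -metricsPort only when defineMetrics is true.
func parseAdminFlags(defineMetrics bool, args []string) (int, error) {
	fs := flag.NewFlagSet("admin", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage text out of the demo output
	port := 0
	if defineMetrics {
		fs.IntVar(&port, "metricsPort", 0, "Prometheus metrics listen port")
	}
	err := fs.Parse(args)
	return port, err
}

func main() {
	args := []string{"-metricsPort=9327"}

	// Without the flag definition, parsing fails.
	_, err := parseAdminFlags(false, args)
	fmt.Println("before:", err)

	// With the flag defined, the same arguments parse cleanly.
	port, err := parseAdminFlags(true, args)
	fmt.Println("after:", port, err)
}
```

With an exit-on-error flag set, the same parse failure ends the process, which would match the crash reported here.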

* admin: emit maintenance task and worker fleet metrics

Add Prometheus metrics for the admin server's distinctive work: the
maintenance task queue and the worker fleet that executes it.

Task lifecycle: maintenance_tasks_by_status / _by_type gauges (snapshot
of the queue), maintenance_tasks_completed_total{type,outcome} counter
and maintenance_task_duration_seconds{type} histogram (recorded when a
task reaches a terminal state), and last/next scan timestamp gauges.

Worker fleet: workers_connected and worker_slots{used,max} gauges, plus
worker_events_total{event}, which counts register, unregister, and
stale-removal events.

Gauges are snapshotted by a background goroutine on the admin server;
counters and the histogram are recorded at their event sites.
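
The snapshot pattern for gauges might look roughly like the sketch below. It uses only the standard library, and the `task` type, `countByStatus`, and `snapshotLoop` are hypothetical names for illustration; the real code publishes to Prometheus gauges rather than printing.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// task is a hypothetical, simplified stand-in for a maintenance task.
type task struct {
	Type   string
	Status string
}

// countByStatus computes the per-status totals that a gauge snapshot
// would publish.
func countByStatus(tasks []task) map[string]int {
	counts := make(map[string]int)
	for _, t := range tasks {
		counts[t.Status]++
	}
	return counts
}

// snapshotLoop calls snapshot immediately and then once per interval
// until ctx is cancelled.
func snapshotLoop(ctx context.Context, interval time.Duration, snapshot func()) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		snapshot()
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
		}
	}
}

func main() {
	tasks := []task{
		{Type: "vacuum", Status: "pending"},
		{Type: "balance", Status: "pending"},
		{Type: "vacuum", Status: "completed"},
	}
	ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
	defer cancel()
	snapshotLoop(ctx, 20*time.Millisecond, func() {
		fmt.Println(countByStatus(tasks))
	})
}
```

Gauges describe current state, so recomputing them periodically is sufficient. Counters and histograms describe events, so they must be recorded where each event happens.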

* admin: read worker slot totals under lock, clear next-scan gauge when idle

GetWorkers returns live worker pointers; summing CurrentLoad/MaxConcurrent
outside the queue lock races with task assignment and completion. Add
GetWorkerSlotTotals to aggregate under the lock.
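
A minimal sketch of aggregating under the lock is shown below. The `worker` and `queue` types and the `assign` and `slotTotals` methods are simplified, hypothetical stand-ins; only the field names CurrentLoad and MaxConcurrent come from this commit message.

```go
package main

import (
	"fmt"
	"sync"
)

// worker is a hypothetical, simplified stand-in for a maintenance worker.
type worker struct {
	CurrentLoad   int
	MaxConcurrent int
}

// queue owns the workers; every read or write of a worker goes through mu.
type queue struct {
	mu      sync.RWMutex
	workers map[string]*worker
}

// assign claims one slot on the named worker, if one is free.
func (q *queue) assign(id string) bool {
	q.mu.Lock()
	defer q.mu.Unlock()
	w, ok := q.workers[id]
	if !ok || w.CurrentLoad >= w.MaxConcurrent {
		return false
	}
	w.CurrentLoad++
	return true
}

// slotTotals sums used and total slots while holding the read lock, so the
// sum cannot interleave with a concurrent assign.
func (q *queue) slotTotals() (used, total int) {
	q.mu.RLock()
	defer q.mu.RUnlock()
	for _, w := range q.workers {
		used += w.CurrentLoad
		total += w.MaxConcurrent
	}
	return used, total
}

func main() {
	q := &queue{workers: map[string]*worker{
		"a": {MaxConcurrent: 4},
		"b": {MaxConcurrent: 2},
	}}
	var wg sync.WaitGroup
	for i := 0; i < 3; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			q.assign("a")
		}()
	}
	wg.Wait()
	used, total := q.slotTotals()
	fmt.Println(used, total) // prints "3 6"
}
```

Returning plain totals instead of live pointers keeps callers from reading worker fields after the lock has been released.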

Also reset maintenance_next_scan_timestamp_seconds to 0 when the scanner
is not running, so it can't retain a stale value after a stop.
Chris Lu
2026-05-24 14:09:02 -07:00
committed by GitHub
parent 6fc212cedb
commit 25beb7ec48
6 changed files with 198 additions and 7 deletions
@@ -30,6 +30,7 @@ import (
"github.com/seaweedfs/seaweedfs/weed/glog"
"github.com/seaweedfs/seaweedfs/weed/pb"
"github.com/seaweedfs/seaweedfs/weed/security"
stats_collect "github.com/seaweedfs/seaweedfs/weed/stats"
"github.com/seaweedfs/seaweedfs/weed/util"
"github.com/seaweedfs/seaweedfs/weed/util/grace"
)
@@ -50,6 +51,8 @@ type AdminOptions struct {
dataDir *string
icebergPort *int
urlPrefix *string
metricsHttpPort *int
metricsHttpIp *string
debug *bool
debugPort *int
cpuProfile *string
@@ -70,6 +73,8 @@ func init() {
a.readOnlyPassword = cmdAdmin.Flag.String("readOnlyPassword", "", "read-only user password (optional, for view-only access; requires adminPassword to be set)")
a.icebergPort = cmdAdmin.Flag.Int("iceberg.port", 8181, "Iceberg REST Catalog port (0 to hide in UI)")
a.urlPrefix = cmdAdmin.Flag.String("urlPrefix", "", "URL path prefix when running behind a reverse proxy under a subdirectory (e.g. /seaweedfs)")
a.metricsHttpPort = cmdAdmin.Flag.Int("metricsPort", 0, "Prometheus metrics listen port")
a.metricsHttpIp = cmdAdmin.Flag.String("metricsIp", "", "metrics listen ip. If empty, listens on all interfaces.")
a.debug = cmdAdmin.Flag.Bool("debug", false, "serves runtime profiling data via pprof on the port specified by -debug.port")
a.debugPort = cmdAdmin.Flag.Int("debug.port", 6060, "http port for debugging")
a.cpuProfile = cmdAdmin.Flag.String("cpuprofile", "", "cpu profile output file")
@@ -160,6 +165,12 @@ var cmdAdmin = &Command{
weed admin -debug -debug.port=6060 -master="localhost:9333"
weed admin -cpuprofile=cpu.prof -memprofile=mem.prof -master="localhost:9333"
Metrics:
- Use -metricsPort to expose Prometheus metrics at http://<host>:<metricsPort>/metrics
- Use -metricsIp to bind the metrics endpoint to a specific ip (default: all interfaces)
- Metrics are disabled when -metricsPort is 0 (the default)
- Example: weed admin -metricsPort=9327 -master="localhost:9333"
Configuration File:
- The security.toml file is read from ".", "$HOME/.seaweedfs/",
"/usr/local/etc/seaweedfs/", or "/etc/seaweedfs/", in that order
@@ -257,6 +268,12 @@ func runAdmin(cmd *Command, args []string) bool {
}
fmt.Printf("Plugin: Enabled\n")
// Start Prometheus metrics endpoint if a port is configured
if *a.metricsHttpPort > 0 {
fmt.Printf("Metrics: http://%s/metrics\n", stats_collect.JoinHostPort(*a.metricsHttpIp, *a.metricsHttpPort))
}
go stats_collect.StartMetricsServer(*a.metricsHttpIp, *a.metricsHttpPort)
// Set up graceful shutdown
ctx, cancel := context.WithCancel(context.Background())
defer cancel()