fix(filer.sync): keep sync_offset fresh while the source is read-only (#9589)

* fix(filer.sync): keep sync_offset fresh while the source is read-only

sync_offset holds the timestamp of the last replicated source event, so
monitoring derives lag as now - sync_offset. A read-only source emits no
metadata events, so the gauge froze at the last write and the derived lag
grew without bound, making alert thresholds unusable.

The source filer now sends an idle heartbeat carrying its current time
while a subscriber is caught up to the buffer head. filer.sync uses it to
advance the gauge, so now - sync_offset reflects real lag. Heartbeats are
opt-in (client_supports_idle_heartbeat), are never written to the metadata
log, and do not move the resume checkpoint, so a restart still resumes
from the last real event.
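
The split between the gauge and the resume checkpoint described above can
be sketched as follows. This is a minimal illustration, not the filer.sync
code: the syncState type, its fields, and its method names are hypothetical
and chosen only to show that a heartbeat advances the exported timestamp
while leaving the restart position alone.

```go
package main

import "fmt"

// syncState is a hypothetical stand-in for the subscriber's bookkeeping.
// It is not a SeaweedFS type.
type syncState struct {
	gaugeTsNs      int64 // value exported as sync_offset
	checkpointTsNs int64 // resume position used after a restart
}

// onEvent handles a real replicated metadata event: both the gauge and
// the resume checkpoint move to the event timestamp.
func (s *syncState) onEvent(tsNs int64) {
	s.gaugeTsNs = tsNs
	s.checkpointTsNs = tsNs
}

// onIdleHeartbeat handles an idle heartbeat: only the gauge moves, so a
// restart still resumes from the last real event.
func (s *syncState) onIdleHeartbeat(tsNs int64) {
	s.gaugeTsNs = tsNs
}

// lagNs is the lag a monitor would derive as now - sync_offset.
func (s *syncState) lagNs(nowNs int64) int64 {
	return nowNs - s.gaugeTsNs
}

func main() {
	s := &syncState{}
	s.onEvent(100)         // last write on the source
	s.onIdleHeartbeat(500) // source is idle, subscriber is caught up
	fmt.Println("gauge:", s.gaugeTsNs)
	fmt.Println("checkpoint:", s.checkpointTsNs)
	fmt.Println("lag at now=520:", s.lagNs(520))
}
```

Without the heartbeat, the lag at now=520 in this sketch would be 420 and
would keep growing for as long as the source stayed read-only.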

* fix(filer.sync): gate idle heartbeat on the read cursor, not SinceNs

In metadata-chunks mode, persisted entries replay as log file refs and
never reach eachLogEntryFn, so lastSeenTsNs stays put and a caught-up
subscriber with an old SinceNs would never get a heartbeat. Use the
read cursor (lastReadTime) instead, which advances in that mode too. Take
the max of it and lastSeenTsNs so the in-memory backlog-then-idle case
still works while the cursor returned to the caller has not yet been
updated.
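
A rough sketch of the gate described above, under stated assumptions: the
function names, the plain int64 timestamps, and the comparison against a
bufferHeadTsNs value are all hypothetical simplifications. Only the idea of
taking the larger of the read cursor and lastSeenTsNs, and the opt-in flag,
come from this commit message.

```go
package main

import "fmt"

// heartbeatCursor returns the position used to decide whether a
// subscriber is caught up: the larger of the read cursor and the
// timestamp of the last entry delivered to the per-entry callback.
func heartbeatCursor(lastReadTsNs, lastSeenTsNs int64) int64 {
	if lastSeenTsNs > lastReadTsNs {
		return lastSeenTsNs
	}
	return lastReadTsNs
}

// shouldSendHeartbeat is a hypothetical gate. It assumes "caught up"
// means the cursor has reached bufferHeadTsNs, and that the client
// opted in to idle heartbeats.
func shouldSendHeartbeat(clientSupports bool, lastReadTsNs, lastSeenTsNs, bufferHeadTsNs int64) bool {
	if !clientSupports {
		return false
	}
	return heartbeatCursor(lastReadTsNs, lastSeenTsNs) >= bufferHeadTsNs
}

func main() {
	// Metadata-chunks mode: the read cursor reached the head (200) but
	// lastSeenTsNs is still at the old SinceNs (50).
	fmt.Println(shouldSendHeartbeat(true, 200, 50, 200))

	// In-memory backlog then idle: entries up to 200 were delivered, but
	// the read cursor (120) has not been updated yet.
	fmt.Println(shouldSendHeartbeat(true, 120, 200, 200))

	// Still behind the head on both measures: no heartbeat.
	fmt.Println(shouldSendHeartbeat(true, 120, 150, 200))
}
```

Gating on lastSeenTsNs alone would reject the first case, which is the
stuck-subscriber scenario this commit addresses.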
Chris Lu
2026-05-20 11:26:37 -07:00
committed by GitHub
parent 4385b86bf1
commit 5af7d12f04
6 changed files with 200 additions and 7 deletions
@@ -439,6 +439,13 @@ func doSubscribeFilerMetaChanges(clientId int32, clientEpoch int32, sourceGrpcDi
		StartTsNs:      sourceFilerOffsetTsNs,
		StopTsNs:       0,
		EventErrorType: pb.RetryForeverOnError,
		// While the source has only read activity it emits no metadata events, so
		// the watermark above never advances and sync_offset would look stuck.
		// The idle heartbeat moves the gauge to the source's current time once we
		// are caught up, so now-sync_offset reflects real lag and stays alertable.
		OnIdleHeartbeat: func(tsNs int64) {
			statsCollect.FilerSyncOffsetGauge.WithLabelValues(sourceFiler.String(), targetFiler.String(), clientName, sourcePath).Set(float64(tsNs))
		},
	}
	return pb.FollowMetadata(sourceFiler, sourceGrpcDialOption, metadataFollowOption, processEventFnWithOffset)