feat: add rollup_blocks_seen counter and last_rollup_block_seen_timestamp gauge #275

Merged: Evalir merged 3 commits into main from evalir/host-blocks-seen-metric on May 8, 2026

@Evalir Evalir commented May 7, 2026

Description

Adds two metrics on the ingress side of the EnvTask rollup-block subscription so operators can detect a silently-dead WS subscription that previously looked identical to a builder choosing not to build:

  • signet.builder.rollup_blocks_seen (counter) — incremented on every observed rollup-block notification before any Quincey/profitability/timestamp logic
  • signet.builder.last_rollup_block_seen_timestamp (gauge) — wall-clock Unix seconds at the most recent observation

Both are labeled with rollup_chain_id so that operators running multiple builders against different rollups can tell them apart.

The rollup-block stream (ru_provider.subscribe_blocks) is the WS-driven heartbeat in this builder — host headers are fetched via HTTP per slot — so this is where a silently-dead WS would manifest as a stalled metric.
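The recording step described above can be sketched with the standard library alone. This is a hypothetical model, not the code in this PR: `RollupBlockMetrics` and `drain_notifications` are invented names, the atomics stand in for whatever the metrics recorder does, and the rollup_chain_id label is omitted for brevity. The point it illustrates is ordering: the observation is recorded before any build decision, so the counter and gauge advance even when nothing is built.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::{SystemTime, UNIX_EPOCH};

/// Hypothetical stand-in for the two metrics added in this PR.
#[derive(Debug, Default)]
pub struct RollupBlockMetrics {
    /// Models `signet.builder.rollup_blocks_seen` (monotonic counter).
    blocks_seen: AtomicU64,
    /// Models `signet.builder.last_rollup_block_seen_timestamp` (Unix seconds).
    last_seen_unix_secs: AtomicU64,
}

impl RollupBlockMetrics {
    /// Record one observed rollup-block notification.
    pub fn record_rollup_block_seen(&self) {
        // Wall-clock time; falls back to 0 if the clock is before the epoch.
        let now = SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .map(|d| d.as_secs())
            .unwrap_or(0);
        self.blocks_seen.fetch_add(1, Ordering::Relaxed);
        self.last_seen_unix_secs.store(now, Ordering::Relaxed);
    }

    pub fn blocks_seen(&self) -> u64 {
        self.blocks_seen.load(Ordering::Relaxed)
    }

    pub fn last_seen_unix_secs(&self) -> u64 {
        self.last_seen_unix_secs.load(Ordering::Relaxed)
    }
}

/// Hypothetical ingress loop: every notification is recorded first, and only
/// then does the (assumed) build decision run. Returns how many blocks the
/// caller-supplied predicate chose to build on.
pub fn drain_notifications<I, F>(
    metrics: &RollupBlockMetrics,
    notifications: I,
    should_build: F,
) -> usize
where
    I: IntoIterator<Item = u64>,
    F: Fn(u64) -> bool,
{
    let mut built = 0;
    for block_number in notifications {
        // Recorded unconditionally, before any build logic.
        metrics.record_rollup_block_seen();
        if should_build(block_number) {
            built += 1;
        }
    }
    built
}
```

With a predicate that always declines to build, the counter still advances once per notification, which is what lets an operator tell a quiet builder from a dead subscription.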

Related Issue

Suggested alert rule

# Builder has not seen a rollup block in ~3 slots: likely stalled WS or hung process.
- alert: BuilderRollupBlocksStale
  expr: time() - signet_builder_last_rollup_block_seen_timestamp > 36
  for: 1m
  labels: { severity: page }
  annotations:
    summary: "Builder {{ $labels.rollup_chain_id }} has not observed a rollup block in 3+ slots"

Tune 36 to 3 * SLOT_DURATION for the deployed chain. The signet_ prefix reflects the dot-to-underscore conversion done by metrics-exporter-prometheus.
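The arithmetic behind the alert can be checked with a small sketch. The function names here are hypothetical, and `prometheus_name` models only the dot-to-underscore step mentioned above, not the exporter's full name sanitization. The 12-second slot used in the note below is inferred from 36 = 3 * SLOT_DURATION and may not match the deployed chain.

```rust
/// Simplified model of the name conversion: dots become underscores.
/// (Assumption: this ignores any other sanitization the exporter performs.)
pub fn prometheus_name(metric_name: &str) -> String {
    metric_name.replace('.', "_")
}

/// Alert threshold in seconds: three slots without a rollup block.
pub const fn stale_threshold_secs(slot_duration_secs: u64) -> u64 {
    3 * slot_duration_secs
}

/// Mirrors the alert expression `time() - last_seen > threshold`.
/// `saturating_sub` guards against a gauge that is slightly ahead of `now`.
pub fn is_stale(now_unix_secs: u64, last_seen_unix_secs: u64, slot_duration_secs: u64) -> bool {
    now_unix_secs.saturating_sub(last_seen_unix_secs) > stale_threshold_secs(slot_duration_secs)
}
```

For a 12-second slot, `stale_threshold_secs(12)` is 36, and because the comparison is strict, a gap of exactly 36 seconds does not yet count as stale.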

Testing

  • make fmt passes
  • make clippy passes (with -D warnings)
  • make test passes (19 tests, +1 new)
  • New test added: record_rollup_block_seen_advances_counter_and_gauge in src/metrics.rs uses metrics_util::debugging::DebuggingRecorder to verify the counter advances by N for N calls and the gauge is set to a wall-clock timestamp within the observed window, with the rollup_chain_id label attached

[Claude Code]

… gauge

Records two metrics on the ingress side of the EnvTask rollup-block
subscription, labeled by host_chain_id. They advance even when the builder
skips block construction, so operators can detect a silently-dead WS
subscription that previously looked identical to a builder choosing not to
build.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Evalir commented May 7, 2026

This stack of pull requests is managed by Graphite. Learn more about stacking.

The chain-following subscription is to rollup blocks (ru_provider.subscribe_blocks),
not host blocks. Rename the metric, label, and helper to match what is actually
observed; the WS that can silently die is the rollup-block one.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Evalir Evalir changed the title from "feat: add host_blocks_seen counter and last_host_block_seen_timestamp gauge" to "feat: add rollup_blocks_seen counter and last_rollup_block_seen_timestamp gauge" on May 7, 2026

@Fraser999 Fraser999 left a comment

Just one tiny nit.

Comment thread src/metrics.rs Outdated
match (key.kind(), key.key().name()) {
    (MetricKind::Counter, ROLLUP_BLOCKS_SEEN) => counter_value = Some(value),
    (MetricKind::Gauge, LAST_ROLLUP_BLOCK_SEEN_TIMESTAMP) => gauge_value = Some(value),
    _ => {}

nit: we could always panic here rather than no-op

yep i think this is probably preferable to silently doing nothing

Address review nit: the local recorder in record_rollup_block_seen_advances_counter_and_gauge
should only ever see the two metrics under test. Silently no-op'ing on anything else
hides isolation bugs. Bind the catch-all so the panic names what leaked in.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
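
The change described in this commit message, binding the catch-all arm so that the panic names the unexpected metric, might look roughly like the sketch below. It is a standalone approximation rather than the merged code: `MetricKind` is a local stand-in enum, `collect` and the tuple-based snapshot are hypothetical, and the constant values are assumed to equal the metric names given in the PR description.

```rust
// Assumed values; the PR shows only the constant names, not their contents.
pub const ROLLUP_BLOCKS_SEEN: &str = "signet.builder.rollup_blocks_seen";
pub const LAST_ROLLUP_BLOCK_SEEN_TIMESTAMP: &str =
    "signet.builder.last_rollup_block_seen_timestamp";

/// Local stand-in for the recorder's metric kind (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MetricKind {
    Counter,
    Gauge,
}

/// Walk a snapshot of (kind, name, value) entries and pull out the two
/// metrics under test. Anything else is treated as a test-isolation bug.
pub fn collect(snapshot: &[(MetricKind, &str, f64)]) -> (Option<f64>, Option<f64>) {
    let mut counter_value = None;
    let mut gauge_value = None;
    for &(kind, name, value) in snapshot {
        match (kind, name) {
            (MetricKind::Counter, ROLLUP_BLOCKS_SEEN) => counter_value = Some(value),
            (MetricKind::Gauge, LAST_ROLLUP_BLOCK_SEEN_TIMESTAMP) => gauge_value = Some(value),
            // Bind the catch-all so the panic message names what leaked in.
            unexpected => panic!("unexpected metric in local recorder: {unexpected:?}"),
        }
    }
    (counter_value, gauge_value)
}
```

Compared with a silent `_ => {}` arm, a stray metric now fails the test with its kind and name in the message instead of being ignored.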
@Evalir Evalir merged commit 76f1bef into main May 8, 2026
7 checks passed

Evalir commented May 8, 2026

Merge activity

@Evalir Evalir deleted the evalir/host-blocks-seen-metric branch May 8, 2026 12:20

2 participants