Skip to content

fix(host): bound preimage hint retries instead of looping forever#2597

Open
BrianBland wants to merge 1 commit intomainfrom
brianbland/bound-online-preimage-retries
Open

fix(host): bound preimage hint retries instead of looping forever#2597
BrianBland wants to merge 1 commit intomainfrom
brianbland/bound-online-preimage-retries

Conversation

@BrianBland
Copy link
Copy Markdown
Contributor

@BrianBland BrianBland commented May 8, 2026

Summary

  • OnlineHostBackend::get_preimage had two infinite-loop paths: a continue after every handle_hint failure with no cap or backoff, and an empty spin when no hint was routed and the key was absent from the KV. Either pinned the PreimageServer task forever, hanging Host::build_witness for the full PROVER_TIMEOUT and burning CPU + RPC budget.
  • Cap retries at MAX_HINT_ATTEMPTS = 5 with exponential backoff (50 ms → 1 s), propagate the last error after exhaustion, and fail fast with a clear error when no hint has been routed.

@cb-heimdall
Copy link
Copy Markdown
Collaborator

cb-heimdall commented May 8, 2026

🟡 Heimdall Review Status

Requirement Status More Info
Reviews 🟡 0/1
Denominator calculation
Show calculation
1 if user is bot 0
1 if user is external 0
2 if repo is sensitive 0
From .codeflow.yml 1
Additional review requirements
Show calculation
Max 0
0
From CODEOWNERS 0
Global minimum 0
Max 1
1
1 if commit is unverified 0
Sum 1

@vercel
Copy link
Copy Markdown

vercel Bot commented May 8, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
base Ignored Ignored Preview May 8, 2026 10:30pm

Request Review

Comment thread crates/proof/host/src/lib.rs Outdated
Comment on lines +31 to +32
INITIAL_HINT_RETRY_BACKOFF, MAX_HINT_ATTEMPTS, MAX_HINT_RETRY_BACKOFF, OfflineHostBackend,
OnlineHostBackend,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These constants (MAX_HINT_ATTEMPTS, INITIAL_HINT_RETRY_BACKOFF, MAX_HINT_RETRY_BACKOFF) are only used internally by OnlineHostBackend::get_preimage. Re-exporting them from the crate root makes them part of the public API, which means changing a retry tuning parameter becomes a semver-breaking change.

Consider keeping them pub only within the module (or pub(crate)) and removing them from the re-exports here and in backend/mod.rs, unless there's a downstream consumer that needs to reference them (e.g. for configuration or testing from another crate).

@linear
Copy link
Copy Markdown

linear Bot commented May 8, 2026

CHAIN-4237

@BrianBland BrianBland force-pushed the brianbland/bound-online-preimage-retries branch from c89fd6f to c1833db Compare May 8, 2026 22:13
Comment thread crates/proof/host/src/backend/online.rs Outdated
Comment on lines +90 to +91
if preimage.is_some() {
return preimage.ok_or(PreimageOracleError::KeyNotFound);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: We just confirmed preimage.is_some() on line 90, so the ok_or(KeyNotFound) on line 91 is dead code — it can never produce Err. Same pattern on line 119-120 inside the retry loop.

Consider using Ok(preimage.expect("checked is_some")) or restructuring with if let Some(data) = preimage { return Ok(data); } to make the intent clearer and avoid surfacing a misleading KeyNotFound variant in a path that can never fire.

OnlineHostBackend::get_preimage's previous loop had no retry cap, no
backoff, and no error path: a persistently-failing handle_hint pinned
the PreimageServer task indefinitely, and a missing hint with the key
absent from the KV produced an empty spin.

Cap retries at MAX_HINT_ATTEMPTS with exponential backoff between
attempts, propagate the last error after exhaustion, and fail fast
with a clear error when no hint has been routed at all.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@BrianBland BrianBland force-pushed the brianbland/bound-online-preimage-retries branch from c1833db to 688500a Compare May 8, 2026 22:30
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 8, 2026

Review Summary

The fix correctly addresses both infinite-loop paths in OnlineHostBackend::get_preimage:

  • Path A (RPC failures): Bounded retry loop with exponential backoff (50ms to 1s cap, 5 attempts max), propagating the last error on exhaustion.
  • Path B (no hint routed): Fail-fast with a descriptive error instead of spinning on an empty last_hint.

No issues found. The implementation is clean:

  • Snapshotting the hint once before the retry loop avoids racing with the hint writer — good design choice.
  • The Ok branch correctly returns early when the hint populates the key, and fails fast with KeyNotFound when the hint succeeds but doesn't produce the requested preimage (retrying the same deterministic hint would be futile).
  • Constants are module-private (const, not pub const), keeping retry tuning out of the public API.
  • Both tests exercise the two formerly-infinite paths with appropriate timeouts and deterministic failure triggers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants