Skip to content

feat(tbtc/signer): Phase 7.2b-1 InteractiveAggregate completion marker#4055

Merged
mswilkison merged 6 commits into
extraction/frost-signer-mirror-2026-05-26from
feat/phase-7-2b-1-aggregate-completion-marker-2026-06-13
Jun 14, 2026
Merged

feat(tbtc/signer): Phase 7.2b-1 InteractiveAggregate completion marker#4055
mswilkison merged 6 commits into
extraction/frost-signer-mirror-2026-05-26from
feat/phase-7-2b-1-aggregate-completion-marker-2026-06-13

Conversation

@mswilkison

Copy link
Copy Markdown
Contributor

Phase 7.2b-1: InteractiveAggregate completion marker

First implementation PR of Phase 7.2b (design signed off in #4054 — gates-doc Decision Log entry 9). Engine-side only — no blame, no envelopes — preserving the crypto-only engine boundary (the Q1 correction): all envelope verification and authoritative blame remain Go-side. Targets the mirror branch directly and is independent of #4054 (the code touches no docs; #4054 touches no source).

What it does

Adds a durable per-attempt "aggregated" marker so re-aggregating a completed interactive attempt is rejected (InteractiveAttemptAlreadyAggregated) rather than recomputed (frozen Phase 7 spec; design §6). This is the only engine-side state 7.2b adds.

Changes

  • errors.rs: new EngineError::InteractiveAttemptAlreadyAggregated { session_id, attempt_id } — code interactive_attempt_already_aggregated, recovery class recoverable (benign duplicate; the engine is healthy).
  • engine/state.rs: aggregated_interactive_attempt_markers: HashSet<String> on SessionState.
  • engine/persistence.rs: mirrored Vec<String> on PersistedSessionState with #[serde(default, skip_serializing_if = "Vec::is_empty")] (loads pre-7.2b state; writes nothing when empty), bounded via the existing consumed-registry helpers, wired through both TryFrom conversions.
  • engine/interactive.rs: interactive_aggregate pre-checks the marker before recomputing; keeps the aggregation lock-free; then re-acquires the lock, re-checks (a concurrent-duplicate race guard), inserts the marker, and persists with rollback-on-failure — mirroring the Round2 consume-before-release pattern — before reporting success.

Durable-retention confirmation (design §9)

The §9 question — is the group public key package durably retained beyond the signing-session TTL (needed by the later 7.2b-3 verify-share FFI)? — is already satisfied: dkg_public_key_package lives on the persisted session and sweep_expired_interactive_state clears only the live attempt's nonces. No new persistence is added here.

Tests

  • Re-aggregating a completed attempt is rejected with the correct code.
  • The completion marker survives simulate_process_restart_for_tests() + reload (persistence round-trip).
  • Error code / recovery class / message format pinned.

Verification

cargo fmt, cargo clippy --all-targets --all-features -- -D warnings clean, full suite 274 lib pass / 0 fail / 1 ignored, Phase-5 chaos suite all scenarios pass.

🤖 Generated with Claude Code

Adds a durable per-attempt "aggregated" marker so re-aggregating a completed interactive attempt is rejected rather than recomputed (frozen Phase 7 spec; Phase 7.2b design section 6). Engine-side only: no blame and no envelopes, preserving the crypto-only engine boundary (the Q1 correction) - all envelope verification and authoritative blame stay Go-side.

- New EngineError::InteractiveAttemptAlreadyAggregated (code interactive_attempt_already_aggregated, recovery_class recoverable).
- aggregated_interactive_attempt_markers: HashSet<String> on SessionState, mirrored as Vec<String> on PersistedSessionState with serde(default) for backward-compat with pre-7.2b state, bounded via the existing consumed-registry helpers and wired through both persistence conversions.
- interactive_aggregate pre-checks the marker before recomputing, keeps the aggregation lock-free, then re-acquires the lock, re-checks (a concurrent-duplicate race guard), inserts the marker, and persists with rollback-on-failure - mirroring the Round2 consume-before-release pattern - before reporting success.

The design's section 9 durable-wallet-pubkey-package-retention question (needed by the 7.2b-3 verify-share FFI) is confirmed already satisfied: the DKG public key package persists on the session and survives the interactive-attempt TTL sweep, so no new persistence is added here.

Tests: repeat-aggregate rejected; completion marker survives restart+reload; error code/recovery/message pinned.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Jun 13, 2026

Copy link
Copy Markdown

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: a835dc96-cf92-46e7-9db1-100290924551

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/phase-7-2b-1-aggregate-completion-marker-2026-06-13

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Addresses Codex's review of #4055: the completion marker made a successfully-computed aggregate unrecoverable from the engine if the host lost the FFI response after the marker persisted - a liveness/idempotency regression and a restart-divergence. Per section 6's own 'deterministic over public data, not security-load-bearing' rationale, rejecting a retry bought nothing, so the engine now re-emits instead of erroring.

The per-attempt completion marker becomes a completion record: aggregated_interactive_attempt_signatures maps attempt_id -> the public aggregate signature hex (BTreeMap on SessionState + PersistedSessionState, serde(default) backward-compat, bounded as before via a shared length-based capacity helper). A repeat InteractiveAggregate returns the stored signature; a concurrent duplicate that raced past the pre-check returns the identical signature without double-storing; success telemetry counts only a freshly recorded aggregation. The persisted signature is public (it goes on-chain), so this respects the never-persist-secrets freeze. EngineError::InteractiveAttemptAlreadyAggregated is removed (re-aggregation no longer errors).

Tests updated: a repeat aggregate (immediate and across restart+reload) returns the same signature. fmt + clippy --all-targets --all-features -D + full suite (273) + Phase-5 chaos suite green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
mswilkison added a commit that referenced this pull request Jun 13, 2026
Match the 7.2b-1 implementation change (PR #4055): the InteractiveAggregate completion marker becomes a completion record (attempt_id -> aggregate signature hex), and a repeat aggregate returns the stored signature (idempotent) rather than an 'already aggregated' error. Addresses a Codex review finding that the reject-based marker made a successfully-computed signature unrecoverable from the engine if the host lost the FFI response. §11 acceptance updated accordingly.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
mswilkison and others added 2 commits June 13, 2026 18:09
…/Gemini P2)

Both bots' re-review converged on a race in the idempotency follow-up: when two InteractiveAggregate calls for the same attempt race past the empty pre-check, the loser found the record already present, skipped the store, but still returned the signature IT recomputed. Different responsive subsets can produce different VALID signatures for the same attempt, so the value a caller received could diverge from the persisted one that restarts and later re-emissions return - breaking the idempotent completion-record contract.

The post-lock path now returns the canonical PERSISTED signature when the attempt is already recorded (early return, no re-store, no success count), matching the pre-check; dropped the recorded_new flag. Strengthened the idempotency test to overwrite the stored record with a sentinel and assert the repeat returns exactly that (proving return-of-recorded-value, which a plain equality check could not distinguish from a deterministic recompute).

fmt + clippy --all-targets --all-features -D + full suite (273) + Phase-5 chaos green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…it (Codex P2)

Codex's 3rd-round re-review: the completion record is keyed by attempt_id, which does NOT bind taproot_merkle_root, and the coordinator is in the threat model. A reused attempt_id carrying a different root/message would re-emit the stored signature, which would not verify under the caller's requested tweaked key.

Re-emission (both the pre-check and the post-race collision path) now returns the recorded signature only when it verifies under THIS request's tweaked group key and message; a reused attempt_id with mismatched aggregate inputs is rejected with a validation error rather than handed a non-verifying signature. Added recorded_aggregate_matches_request (decode + tweak + verify, mirroring the aggregate self-verify).

Test rework: dropped the garbage sentinel (now correctly rejected by the verify), kept the same-inputs idempotent + restart re-emission tests, added a mismatched-root rejection assertion. fmt + clippy -D + full suite (273) + Phase-5 chaos green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
mswilkison added a commit that referenced this pull request Jun 13, 2026
Match the 7.2b-1 fix (PR #4055): re-emission of the completion record is validated against the request's tweaked key/message - attempt_id does not bind the taproot root, so a reused attempt_id with different aggregate inputs is rejected rather than handed a non-verifying signature (Codex re-review finding). Drops the now-inaccurate 'or erroring' on the same-inputs path.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…dex P2)

Codex 4th-round re-review: a completed aggregate persists a completion record keyed by attempt_id, and the reload path rejects an empty key, so an empty attempt_id (malformed or malicious) could write durable state that fails to reload - bricking the engine on restart. interactive_aggregate now rejects an empty attempt_id up front (before any decode/aggregate/store), matching the loader's non-empty invariant. Not idempotency-specific: the write path simply didn't enforce what the read path requires.

Test: aggregate with an empty attempt_id returns a validation error and persists nothing. fmt + clippy -D + full suite (274) + Phase-5 chaos green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
mswilkison added a commit that referenced this pull request Jun 13, 2026
The open-questions doc's closing paragraph still said owner sign-off was pending and 7.2b-1 could only start afterward, contradicting the signed-off status header + Decision Log entry 9 added in the same PR. Updated to reflect the sign-off and that 7.2b-1 is implemented (PR #4055).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Per design decision after 5 review rounds on the re-emission path: the InteractiveAggregate completion record reverts from idempotent signature re-emission to a simple completion MARKER that rejects re-aggregation of a completed attempt (InteractiveAttemptAlreadyAggregated). Matches frozen spec section 6 ('mark the session complete'); a lost signature is recovered with a fresh ROAST attempt, not re-aggregate.

Eliminates the re-emission surface that accumulated findings: the concurrent-race signature divergence, the request package/root binding on re-emit, and the lost-shares recovery gap - none apply when a completed attempt is simply rejected. The empty-attempt_id guard + restart-safety rationale are kept (the marker set's loader still rejects empty keys).

- aggregated_interactive_attempt_markers: HashSet<String> on SessionState / Vec<String> on PersistedSessionState (serde default, bounded, sorted), replacing the attempt_id->signature map. - interactive_aggregate rejects a completed attempt at the pre-check and the post-aggregation re-check; no signature stored or re-emitted. - Re-added EngineError::InteractiveAttemptAlreadyAggregated.

Tests: re-aggregation rejected (immediate + across restart+reload); empty attempt_id rejected. fmt + clippy -D + full suite (275) + chaos green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
mswilkison added a commit that referenced this pull request Jun 13, 2026
Match the implementation decision (PR #4055): section 6 reverts from idempotent signature re-emission to a completion MARKER that rejects re-aggregation; recovery of a lost signature is via a fresh ROAST attempt. Records why the re-emit variant was dropped (concurrent-race divergence, request-input binding, lost-shares recovery gap). section 11 acceptance + 'completion record'->'completion marker' terminology updated across the design + open-questions docs.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@mswilkison mswilkison merged commit d57237d into extraction/frost-signer-mirror-2026-05-26 Jun 14, 2026
19 checks passed
@mswilkison mswilkison deleted the feat/phase-7-2b-1-aggregate-completion-marker-2026-06-13 branch June 14, 2026 00:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant