Skip to content

feat(frost/roast): RFC-21 Phase 3.1 -- coordinator skeleton + seed bridge#3968

Merged
mswilkison merged 2 commits into
feat/frost-schnorr-migration-scaffoldfrom
feat/frost-roast-coordinator-state-2026-05-22
May 23, 2026
Merged

feat(frost/roast): RFC-21 Phase 3.1 -- coordinator skeleton + seed bridge#3968
mswilkison merged 2 commits into
feat/frost-schnorr-migration-scaffoldfrom
feat/frost-roast-coordinator-state-2026-05-22

Conversation

@mswilkison

Copy link
Copy Markdown
Contributor

Summary

First Phase-3 implementation PR for RFC-21. Introduces the ROAST
coordinator state-machine surface (Coordinator interface, in-memory
implementation, attempt-handle identity, state enum) plus the sterile
seed-folding adapter that lets the new [32]byte AttemptSeed drive
the legacy SelectCoordinator helper without modifying it.

No production code path uses the new `Coordinator` yet. Phase 3
"ships unused" per the RFC. Phase 4 wires it into receivers behind the
`frost_roast_retry` build tag.

What lands

`pkg/frost/roast/coordinator_state.go`

Surface Role
`AttemptState` enum `Pending / Collecting / Aggregating / Succeeded / Transitioned` with `String()`.
`AttemptHandle` Opaque per-attempt identity. `ContextHash()` accessor cross-checks the bound context.
`Coordinator` interface `BeginAttempt(ctx) → handle`, `State(handle) → state`, `SelectedCoordinator(handle) → member`. Later Phase-3 PRs (3.2 / 3.3 / 3.4) extend with `TransitionMessage`, `AggregateBundle`, `VerifyBundle`, and `NextAttempt`.
`NewInMemoryCoordinator()` Concurrent-safe via `sync.Mutex` + `atomic.Uint64` next-id counter.
`ErrUnknownAttempt` Sentinel for handle/instance mismatch.

`pkg/frost/roast/seed_bridge.go`

Surface Role
`foldAttemptSeed(seed [32]byte) int64` First 8 bytes BE → int64 reinterpretation. Sterile, named, non-cryptographic adapter. Documented contract: byte-identical input must produce byte-identical output on every honest signer.

`BeginAttempt` calls `foldAttemptSeed` and forwards to the existing
`SelectCoordinator` to elect the attempt's coordinator. The legacy
helper itself is not modified -- the bridge is the only thing
between RFC-21 contexts and the legacy seed format.

Why the seed bridge

The legacy `SelectCoordinator` takes `(seed int64, attemptNumber uint)`
and is correct in isolation. RFC-21 widens `AttemptSeed` to
`[32]byte` for the canonical-hash binding. We could rewrite the
shuffle, but rewriting cryptographic-consensus logic that already
agrees across the network is the wrong trade-off; the audit and
behaviour are settled.

The bridge satisfies the resolved decision in RFC-21:

"BeginAttempt wraps it with a sterile bridge that folds the new
[32]byte AttemptSeed into the legacy parameter shape... The bridge
is named, isolated, and exhaustively tested so later edits cannot
accidentally desynchronise it."

Test coverage

`coordinator_state_test.go` (9 tests)

  • `TestBeginAttempt_ReturnsHandleWithMatchingContextHash`
  • `TestBeginAttempt_HandlesAreDistinctAcrossAttempts`
  • `TestBeginAttempt_RejectsEmptyIncludedSet` (defence-in-depth)
  • `TestState_ReturnsCollectingAfterBegin`
  • `TestState_UnknownHandleReturnsSentinel`
  • `TestSelectedCoordinator_ReturnsMemberFromIncludedSet`
  • `TestSelectedCoordinator_IsDeterministicForSameContext` -- two
    independent `Coordinator` instances agree on the elected member
  • `TestSelectedCoordinator_DifferentAttemptNumbersCanProduceDifferentLeaders`
    -- 16 attempts produce ≥2 distinct leaders, defending the ROAST
    leader-rotation property
  • `TestSelectedCoordinator_UnknownHandleReturnsSentinel`
  • `TestInMemoryCoordinator_ConcurrentBeginAttemptsAreRaceSafe` --
    16 goroutines × 50 calls each, all handles unique
  • `TestAttemptState_String` -- all enum values + unknown sentinel

`seed_bridge_test.go` (5 tests)

  • `TestFoldAttemptSeed_IsDeterministic`
  • `TestFoldAttemptSeed_TakesFirst8BytesBigEndian` -- specific
    byte pattern verified
  • `TestFoldAttemptSeed_IgnoresBytesAfterIndex7` -- documents the
    contract: bytes 8..31 don't influence output (still bound at
    the `AttemptContext.Hash()` layer)
  • `TestFoldAttemptSeed_FirstByteSwept` -- 256-value sweep of the
    high byte produces 256 distinct outputs (no collisions)
  • `TestFoldAttemptSeed_GoldenFixture` -- literal int64 value
    locks the wire-format reduction; literal drift caught at
    code review

Verification

Command Result
`go build ./...` clean
`go test ./pkg/frost/roast/...` pass (14 cases)
`go test -race ./pkg/frost/roast/...` pass
`go test -tags 'frost_native frost_tbtc_signer' ./pkg/frost/...` pass (5 packages)
`staticcheck -checks '-SA1019' ./pkg/frost/roast/...` silent
`go vet ./pkg/frost/roast/...` clean

Test plan

  • CI green.
  • Reviewer confirms the seed bridge's discard of bytes 8..31 is
    acceptable. (Bytes 8..31 still appear in `AttemptContext.Hash()`,
    so any mutation is detected at the protocol-message layer in
    Phase 1B; the bridge merely reduces 256-bit input to the 64-bit
    width `SelectCoordinator` needs.)
  • Reviewer confirms the `Coordinator` interface scope is
    appropriate for Phase 3.1 (state surface only). Phase 3.2 will
    extend with `TransitionMessage` types.

Refs RFC-21 Phase 3 (`docs/rfc/rfc-21-*`). Stacked at the integration
tip after #3967 merged.

…idge

Introduces the ROAST coordinator state machine surface specified in
RFC-21 Layer B, plus the sterile seed-folding adapter that bridges the
new [32]byte AttemptSeed into the legacy int64 seed accepted by the
existing SelectCoordinator helper.

* pkg/frost/roast/coordinator_state.go
  - AttemptState enum (Pending / Collecting / Aggregating /
    Succeeded / Transitioned) with String() for log/test
    readability.
  - AttemptHandle opaque per-attempt identity; ContextHash()
    accessor.
  - Coordinator interface: BeginAttempt, State, SelectedCoordinator.
    Later Phase-3 PRs (3.2, 3.3, 3.4) extend the interface with
    TransitionMessage / LocalEvidenceSnapshot types, AggregateBundle,
    VerifyBundle, and the NextAttempt policy function.
  - NewInMemoryCoordinator factory; concurrent-safe via sync.Mutex
    and atomic next-id counter.
  - ErrUnknownAttempt sentinel for handle/instance mismatch.

* pkg/frost/roast/seed_bridge.go
  - foldAttemptSeed: first 8 bytes BE -> int64 reinterpretation.
    Documented as non-cryptographic, deterministic adapter only.
    Used by BeginAttempt to call SelectCoordinator without
    modifying the legacy helper.

No production code path uses the new Coordinator yet -- consistent
with RFC-21 Phase 3's "ships unused" requirement. Phase 4 wires the
state machine into receivers behind the frost_roast_retry build tag.

Tests:

* coordinator_state_test.go (9 cases)
  - handle.ContextHash matches input AttemptContext.Hash
  - distinct handles across BeginAttempt calls
  - empty included set rejected
  - State after Begin is Collecting
  - State for unknown handle returns sentinel
  - SelectedCoordinator returns member from included set
  - SelectedCoordinator deterministic across independent Coordinator
    instances given identical context
  - Attempt-number rotation: 16 attempts produce >=2 distinct
    coordinators (defends ROAST's leader-rotation property)
  - Concurrent BeginAttempt (16 goroutines * 50 calls each)
    produces unique handles under sync.Mutex
  - AttemptState.String for every defined value plus unknown sentinel

* seed_bridge_test.go (5 cases)
  - determinism on identical input
  - first-8-bytes-BE decode verified against a specific byte pattern
  - bytes 8..31 do not influence output (contract documentation)
  - 256-value sweep of the high byte produces 256 distinct int64
  - golden fixture locks the wire-format reduction; literal
    drift is caught at code review

All pass under: go test ./pkg/frost/roast/..., go test -race
./pkg/frost/roast/..., go test -tags 'frost_native frost_tbtc_signer'
./pkg/frost/..., staticcheck -checks '-SA1019' ./pkg/frost/roast/...,
go vet ./pkg/frost/roast/...
The map-literal column alignment in the CI's gofmt -s pass differs
from my local alignment; gofmt prefers a single space between the
longest key and the value. Apply the formatter's preference so
client-format passes.

Pure formatting change. No behaviour change.
@mswilkison mswilkison merged commit 2f1fd39 into feat/frost-schnorr-migration-scaffold May 23, 2026
15 checks passed
@mswilkison mswilkison deleted the feat/frost-roast-coordinator-state-2026-05-22 branch May 23, 2026 01:07
mswilkison added a commit that referenced this pull request May 23, 2026
…nceSnapshot (#3969)

## Summary

Second Phase-3 implementation PR. Adds the wire types the
coordinator-aggregation flow (RFC-21 Resolved Decisions) needs:

- **\`LocalEvidenceSnapshot\`** — per-signer signed evidence carrying
  \`SenderID\`, \`AttemptContextHash\`, the canonical sorted
  \`OverflowEntry\` slice, and an \`OperatorSignature\` placeholder.
- **\`TransitionMessage\`** — coordinator-aggregated bundle carrying the
  attempt context hash, the elected coordinator's member index, a
  \`SenderID\`-ascending \`Bundle\` of snapshots, and a
  \`CoordinatorSignature\` placeholder.
- **\`OverflowEntry\`** — JSON-friendly canonical key/value pair for one
  \`attempt.Evidence.Overflows\` entry.

**No transport, aggregation, or signature handling yet.** Phase 3.2 is
the wire-type slice only. Phase 3.3 introduces the canonical-encoding
contract those signatures cover plus the aggregate / verify routines.

Stacked on #3968 (Phase 3.1).

## Why this scope

The RFC's Phase 3 work was always going to mix \"new types\" with
\"new behaviour over those types.\" Landing the types first means
Phase 3.3 reviews can focus on the canonical-encoding /
signature-verification logic without rereading the JSON shapes.
Types are small, mechanical, and review-easy.

## What's in the bundle

| Surface | Notes |
|---|---|
| \`LocalEvidenceSnapshot\` | \`SenderIDValue\`, \`AttemptContextHash\`
(32 bytes exact), \`Overflows\` (sorted asc by Sender),
\`OperatorSignature\` (≤ 256 bytes). |
| \`TransitionMessage\` | \`AttemptContextHash\` (32 bytes exact),
\`CoordinatorIDValue\`, \`Bundle\` (sorted asc by Sender, ≤ 256 entries,
each binding to the bundle's hash), \`CoordinatorSignature\` (≤ 256
bytes). |
| \`OverflowEntry\` | \`{Sender, Count}\`. |
| Type prefix | \`frost_signing/roast/\` -- distinct from
\`native_frost\` / \`native_tbtc_signer\`. |
| Validation in Unmarshal | zero-sender, 32-byte hash, max-snapshots
cap, max-signature caps, ordering, intra-bundle hash consistency,
snapshot validity. |

## Why JSON

Per RFC-21 Resolved Decision 6: matches Phase 1B's FROST/tbtc-signer
protocol messages and inherits envelope-level operator-key signing
from \`pkg/net\`. Raw JSON never lands on the wire; the messages
implement \`net.TaggedUnmarshaler\` and ride the existing
\`pkg/net/gen/pb\`
envelope when sent.

## Test coverage

22 cases across two test surfaces. Highlights:

- Type strings are stable and under the \`frost_signing/roast/\`
  prefix.
- \`NewLocalEvidenceSnapshot\` sorts \`Overflows\` ascending by Sender;
  empty evidence omits the field via \`omitempty\`.
- JSON round-trips preserve all fields.
- Validation rejects every malformed shape: zero sender, wrong
  hash length, oversize signature, unsorted overflows, duplicate
  overflow sender, empty bundle, oversize bundle, zero
  coordinator, oversize coordinator signature, invalid contained
  snapshot, duplicate bundle sender, mismatched bundle-vs-snapshot
  hash.
- Identical inputs produce byte-identical JSON (deterministic
  canonical form).

## Phase 3 status

| PR | Scope | State |
|---|---|---|
| 3.1 (#3968) | Coordinator skeleton + seed bridge | open |
| **3.2 (this)** | **TransitionMessage + LocalEvidenceSnapshot types** |
**open** |
| 3.3 | Aggregation + bundle verification | next |
| 3.4 | NextAttempt policy + thresholds | after 3.3 |

## Verification

| Command | Result |
|---|---|
| \`go build ./...\` | clean |
| \`go test ./pkg/frost/roast/...\` | pass (29 cases total in this
package now) |
| \`go test -race ./pkg/frost/roast/...\` | pass |
| \`go test -tags 'frost_native frost_tbtc_signer' ./pkg/frost/...\` |
pass (5 packages) |
| \`staticcheck -checks '-SA1019' ./pkg/frost/roast/...\` | silent |
| \`go vet ./pkg/frost/roast/...\` | clean |
| \`gofmt -l ./pkg/frost/roast/\` | silent |

## Test plan

- [ ] CI green.
- [ ] Reviewer confirms the size caps (\`MaxSnapshotsPerBundle = 256\`,
\`MaxOperatorSignatureBytes = 256\`, \`MaxCoordinatorSignatureBytes =
256\`)
  are appropriate ceilings for production signing groups.
- [ ] Reviewer confirms the deferred-signature contract is
  acceptable for Phase 3.2 -- signature computation/verification
  lands in Phase 3.3.
mswilkison added a commit that referenced this pull request May 23, 2026
…ion (#3970)

## Summary

Third Phase-3 implementation PR. Adds the methods that drive the
ROAST coordinator-aggregation flow defined in RFC-21's **Resolved
Decisions** section:

| Method | Role |
|---|---|
| \`RecordEvidence(handle, snap)\` | Accept a peer's signed
\`LocalEvidenceSnapshot\`. Validates structure, verifies the operator
signature, checks the snapshot's \`AttemptContextHash\` matches the
handle's bound context, applies first-write-wins / equal-or-reject. |
| \`AggregateBundle(handle)\` | Called by the elected coordinator. Sorts
accumulated snapshots ascending by \`SenderID\`, builds the
\`TransitionMessage\`, signs the canonical bundle bytes, transitions
state to \`Transitioned\`. |
| \`VerifyBundle(handle, msg)\` | Called by every receiver. Verifies
coordinator signature, every snapshot's operator signature, and -- if
the receiver has submitted its own snapshot -- presence of that snapshot
in the bundle (censorship detection). |

**Stacked on #3969 (Phase 3.2).**

## What's new

### \`pkg/frost/roast/signature.go\`

- \`Signer\` / \`SignatureVerifier\` interfaces (Phase 4 wires them to
\`pkg/net\`'s operator-key + member-keys surfaces).
- \`NoOpSigner\` / \`NoOpSignatureVerifier\` for tests that don't
exercise the crypto pipeline.
- \`CanonicalSnapshotBytes\` -- JSON of snapshot fields *excluding*
\`OperatorSignature\`.
- \`CanonicalBundleBytes\` -- JSON of bundle fields *excluding*
\`CoordinatorSignature\` but *including* every snapshot's
\`OperatorSignature\`, so the coordinator's signature attests to the
exact assembled set.
- \`verifySnapshotSignature\` / \`verifyBundleSignature\` /
\`verifyOwnObservationsPresent\` -- the receiver-side checks, each
testable in isolation.
- Sentinels: \`ErrSignatureInvalid\`, \`ErrSignatureMissing\`,
\`ErrCensorshipDetected\`.

### \`pkg/frost/roast/coordinator_state.go\` (extended)

- \`Coordinator\` interface gains \`RecordEvidence\`,
\`AggregateBundle\`, \`VerifyBundle\`.
- \`NewInMemoryCoordinatorWithSigning(selfMember, signer, verifier)\` --
production constructor (Phase 4 callers).
- \`NewInMemoryCoordinator()\` preserved as a Phase-3.1-compatible
convenience that uses no-op signing.
- New sentinels: \`ErrNotAggregator\`, \`ErrAttemptStateInvalid\`,
\`ErrAttemptContextMismatch\`, \`ErrSnapshotConflict\`.

### \`pkg/frost/roast/transition_message.go\` (touched)

- \`validate()\` methods promoted to public \`Validate()\` on both
\`LocalEvidenceSnapshot\` and \`TransitionMessage\` so callers that
construct messages in memory can validate without a marshal/unmarshal
round-trip.

## What's tested

### \`signature_test.go\` (13 cases)

Signature interfaces, canonical encodings, signature verification
round-trips (via a deterministic SHA-256 fake signer/verifier),
tampered-payload rejection, coordinator-mismatch rejection,
censorship-detection helper (missing snapshot, mutated signature, skip
semantics).

### \`bundle_aggregation_test.go\` (11 cases)

- \`RecordEvidence\`: nil rejection, unknown handle, context hash
mismatch, bad signature, valid-and-idempotent re-submission, conflict
rejection, self-submission tracking.
- \`AggregateBundle\`: non-aggregator rejection, signed bundle build
(size, ordering, signature, terminal state), **deterministic bundle JSON
across different record orderings**.
- \`VerifyBundle\`: valid acceptance, **censorship detection**,
coordinator-signature forgery, snapshot-signature forgery,
attempt-context mismatch, nil message, unknown attempt, concurrent
record-and-aggregate safety.

### Verification

| Command | Result |
|---|---|
| \`go build ./...\` | clean |
| \`go test ./pkg/frost/roast/...\` | pass |
| \`go test -race ./pkg/frost/roast/...\` | pass |
| \`go test -tags 'frost_native frost_tbtc_signer' ./pkg/frost/...\` |
pass (5 packages) |
| \`staticcheck -checks '-SA1019' ./pkg/frost/roast/...\` | silent |
| \`go vet ./pkg/frost/roast/...\` | clean |
| \`gofmt -l ./pkg/frost/roast/\` | silent |

## Why the censorship-detection check is what it is

A receiver that has submitted its own snapshot but is missing from
the bundle has two possible explanations: (1) the elected coordinator
maliciously dropped the snapshot, or (2) the bundle was assembled
before the receiver's submission arrived. In either case, feeding
the bundle into \`NextAttempt\` would penalise the receiver (via
silence-parking), so the bundle must be rejected pending re-broadcast
on the next attempt. \`ErrCensorshipDetected\` is the unambiguous
signal.

When the receiver has not yet submitted (selfMember == 0 or
selfSubmission == nil), the check is skipped: there is no submitted
snapshot whose presence to verify.

## Phase 3 status

| PR | Scope | State |
|---|---|---|
| 3.1 (#3968) | Coordinator skeleton + seed bridge | open |
| 3.2 (#3969) | TransitionMessage + LocalEvidenceSnapshot | open |
| **3.3 (this)** | **Aggregation + bundle verification** | **open** |
| 3.4 | NextAttempt policy + thresholds | next |

## Test plan

- [ ] CI green.
- [ ] Reviewer confirms the censorship-detection semantics are
  acceptable (specifically that we *reject* the bundle on missing
  self-snapshot rather than accept it and let silence-parking
  catch it).
- [ ] Reviewer confirms the canonical-encoding contract (snapshot
  omits its own signature; bundle includes snapshot signatures but
  omits its own).
mswilkison added a commit that referenced this pull request May 23, 2026
#3971)

## Summary

**Closes Phase 3 of RFC-21.** Adds the deterministic
\`(AttemptContext, TransitionMessage) -> AttemptContext\` policy that
makes ROAST-aware retry possible.

Two honest signers fed the same previous context and the same
verified bundle compute byte-identical next contexts -- the
foundational invariant the coordinator-aggregation model exists to
enforce.

Stacked on #3970 (Phase 3.3).

## What lands

### Policy (RFC-21 Layer B)

| Step | Logic |
|---|---|
| 1. Permanent exclusion (transport) | overflow count summed across
bundle ≥ \`OverflowExclusionThreshold\` (constant = 4). |
| 2. Permanent exclusion (validation) | reject events; no-op hook
(reject category lands in a later phase). |
| 3. Silence parking (strictly transient) | senders in prev IncludedSet
not present in bundle, not now permanently excluded → moved to
TransientlyParked for **one attempt only**. |
| 4. Reinstatement | prev TransientlyParked members rejoin IncludedSet
automatically. |
| 5. Infeasibility | if next IncludedSet < threshold →
\`ErrAttemptInfeasible\`. |

### AttemptContext: \`TransientlyParked\` field

Parking metadata must flow between attempts because two honest
coordinators computing the next attempt must agree on who is parked.
Including \`TransientlyParked\` in the canonical hash binds the
contract.

The Phase 1A pinned-fixture test's inline reference encoder is
updated in lockstep, so any future drift between the production
encoder and the reference is still caught at code review.

### Why "strictly transient"

This was the formal mitigation Gemini's Phase-3 design review asked
for. A peer falsely labelled silent (network blip, coordinator
censorship caught at \`VerifyBundle\`) is reinstated by the very next
attempt without intervention. Permanent exclusion only follows from
overflow or validation reject, neither of which can fire on a
slow-but-honest peer.

### \`NextAttempt(handle, bundle, threshold, dkgGroupPublicKey)\`

| Parameter | Why |
|---|---|
| \`handle\` | identifies the previous attempt (its IncludedSet +
ExcludedSet + TransientlyParked) |
| \`bundle\` | verified TransitionMessage (caller must call
\`VerifyBundle\` first) |
| \`threshold\` | FROST signing threshold \`t\` for the key group;
constant across a session |
| \`dkgGroupPublicKey\` | per RFC-21 Decision 2: extract from FFI signer
material at attempt construction |

Threshold = 0 disables the infeasibility check (test seam; never
production).

## Test coverage

15 new cases in \`next_attempt_test.go\` covering:

- No-evidence baseline (IncludedSet unchanged, AttemptNumber++)
- Overflow at exact threshold triggers permanent exclusion
- Overflow below threshold does not exclude
- Silent member → \`TransientlyParked\`, not \`ExcludedSet\`
- Previously parked → reinstated to \`IncludedSet\`
- **Full park/reinstate cycle across N → N+1 → N+2** -- the
  defining test for "strictly transient" parking
- Original signer set size preserved across transitions
- Determinism: same inputs → same next-context hash
- Infeasibility error when below threshold
- Threshold = 0 disables check
- Overflow summed across observers (not maxed)
- Nil bundle rejected
- Unknown handle returns \`ErrUnknownAttempt\`
- \`OverflowExclusionThreshold\` constant matches RFC-21 spec

## Phase 3 status

| PR | Scope | State |
|---|---|---|
| 3.1 (#3968) | Coordinator skeleton + seed bridge | open,
Gemini-approved |
| 3.2 (#3969) | TransitionMessage + LocalEvidenceSnapshot | open,
Gemini-approved |
| 3.3 (#3970) | Aggregation + bundle verification | open,
Gemini-approved |
| **3.4 (this)** | **NextAttempt policy + thresholds** | **open** |

Phase 3 is now feature-complete. **Phase 4** wires receivers to a
real coordinator instance behind the \`frost_roast_retry\` build tag.

## Verification

| Command | Result |
|---|---|
| \`go build ./...\` | clean |
| \`go test ./pkg/frost/roast/...\` | pass |
| \`go test -race ./pkg/frost/roast/...\` | pass |
| \`go test -tags 'frost_native frost_tbtc_signer' ./pkg/frost/...\` |
pass (5 packages) |
| \`staticcheck -checks '-SA1019' ./pkg/frost/roast/...\` | silent |
| \`go vet ./pkg/frost/roast/...\` | clean |
| \`gofmt -l ./pkg/frost/roast/\` | silent |

## Test plan

- [ ] CI green.
- [ ] Reviewer confirms the canonical-hash extension is acceptable
  (Phase 1B field is optional during Phases 1-5, so live peers
  running older code can still interop without consuming the hash).
- [ ] Reviewer confirms the parking discipline as documented matches
  what RFC-21's Resolved Decisions section specifies.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant