fix(audit): stamp seq on tool_exec + session_end events by initializ-mk · Pull Request #173 · initializ/forge

initializ-mk · 2026-06-16T16:11:14Z

Symptom

A user on v0.15.0 reported `tool_exec` rows in the audit stream were missing the `seq` field even though they fired inside an A2A invocation. Their Slack-triggered task showed:

```
session_start seq=1 ✓
guardrail_check seq=2 ✓
llm_call seq=3 ✓
tool_exec (start) (none) ← BUG
tool_exec (end) (none) ← BUG
llm_call seq=4 ✓
session_end seq=5 ✓
invocation_complete seq=6 ✓
```

This breaks the FWS-8 guarantee that every event from a given `correlation_id` exposes a monotonically increasing `seq` consumers use for gap detection. `tool_exec` events also lose `trace_id` / `span_id` cross-link (Phase 4 / #105).

Root cause

`registerAuditHooks` registered the tool-exec hooks with `func(_ context.Context, ...)` — discarding the per-invocation ctx — and called `auditLogger.Emit(...)` instead of `EmitFromContext(ctx, ...)`. The per-invocation `SequenceCounter` lives on ctx; plain `Emit` can't see it.

Same pattern existed at one more site: the JSON-RPC `tasks/send` handler's outbound-guardrail-failure `session_end` branch at `runner.go:~1546` — the parallel success-path SSE branch right below it correctly used `EmitFromContext`, this one had drifted.

Fix

Three sites changed from `Emit(...)` to `EmitFromContext(ctx, ...)`:

Site	Was	Now
`registerAuditHooks` BeforeToolExec	`func(_ context.Context, ...)` + `Emit`	`func(ctxStart context.Context, ...)` + `EmitFromContext`
`registerAuditHooks` AfterToolExec	`func(_ context.Context, ...)` + `Emit`	`func(ctxEnd context.Context, ...)` + `EmitFromContext`
JSON-RPC tasks/send outbound-guardrail-failure session_end	`Emit`	`EmitFromContext(ctx, ...)`

Sites left alone (intentional, documented inline)

Site	Why `Emit` is correct
`makeAuthAuditCallback` (auth_verify / auth_fail)	Auth middleware runs before the runner installs the seq counter on ctx.
Egress proxy `OnAttempt` with `source: proxy`	Subprocess HTTP CONNECT through the proxy has no Go ctx tying it back to the active A2A request.
MCP tool conflict at startup	Pre-invocation, no scope.
Schedule dispatcher (`schedule_fire` / `schedule_complete`)	Scheduler tick runs outside any A2A request scope.

The in-process egress enforcer's `OnAttempt` at `runner.go:~496` already uses `EmitFromContext` and stamps seq correctly.

Test plan

New regression test `TestToolExecAudit_CarriesSequenceFromContext` fires the registered hooks against a context carrying a `SequenceCounter`, parses the emitted NDJSON, and asserts `BeforeToolExec.seq=1` + `AfterToolExec.seq=2`. Pre-fix this test fails with `seq=0` on both events.
`go test -count=1 ./...` clean in forge-cli.
`golangci-lint run ./forge-cli/...` → 0 issues.
`gofmt -w` applied.

Schema impact

None. The events were already declared to carry `seq` — the field's `omitempty` JSON tag just dropped it when the value was zero. After the fix, consumers see the field they were already expecting; consumers ignoring unknown keys see no change.

Suggested follow-up (not in this PR)

The auth_verify / auth_fail seq gap is structurally fixable by moving the seq-counter installation earlier in the request pipeline (before the auth middleware). Worth filing as its own issue — same for adding a "audit emit must be EmitFromContext when inside an invocation scope" lint or linter pragma so this kind of drift gets caught at PR time.

Three audit-emit sites inside the per-A2A-invocation scope were using plain AuditLogger.Emit() instead of EmitFromContext(ctx, ...), which meant the per-invocation sequence counter installed on ctx at request entry never reached those events. Consumers relying on FWS-8's "every event from a given correlation_id exposes a monotonically increasing seq" guarantee saw gaps in their gap-detection pipeline exactly at tool_exec and the outbound-guardrail-failure session_end branch. Reported on v0.15.0: a Slack-triggered code-review task's audit stream had session_start (seq=1), guardrail_check (seq=2), llm_call (seq=3), then tool_exec start/end with NO seq, then llm_call (seq=4), session_end (seq=5), invocation_complete (seq=6) — confirming the counter advances but tool_exec emits skip it. Three sites fixed: - registerAuditHooks BeforeToolExec (runner.go:~1990) — was `func(_ context.Context, ...)` then plain Emit; now takes ctxStart and calls EmitFromContext. - registerAuditHooks AfterToolExec (runner.go:~2016) — same. - JSON-RPC tasks/send handler outbound-guardrail-failure session_end (runner.go:~1546) — was plain Emit; now EmitFromContext(ctx, ...) matching the parallel success-path SSE branch right below it. Sites left as plain Emit (intentional, documented inline): - makeAuthAuditCallback (auth_verify / auth_fail) — auth middleware runs BEFORE the runner installs the seq counter on ctx. - egress proxy OnAttempt with source=proxy — subprocess HTTP CONNECT has no Go ctx tying it back to the active A2A request. - MCP tool conflict at startup — pre-invocation, no scope. - Schedule dispatcher (schedule_fire / schedule_complete) — scheduler tick runs outside any A2A request scope. Regression test: TestToolExecAudit_CarriesSequenceFromContext fires the registered hooks against a context carrying a SequenceCounter, parses the emitted NDJSON, and asserts BeforeToolExec.seq=1 + AfterToolExec.seq=2.

…seq=1 (closes #174) Pre-fix: auth_verify / auth_fail audit events lacked the seq field because the SequenceCounter was installed downstream (in the runner's per-A2A-request setup) of the auth middleware. The auth chain's OnAuth callback used plain AuditLogger.Emit and couldn't reach a counter that didn't exist yet on req.Context(). Symptom (user-visible on v0.15.0): auth_verify (no seq) ← BUG session_start seq=1 guardrail_check seq=2 llm_call seq=3 ... The FWS-8 contract promises every event under a given correlation_id exposes a monotonically increasing seq for gap detection. auth_verify sitting outside the sequence breaks that. Fix: install the SequenceCounter on r.Context() BEFORE the auth chain runs, via a thin middleware wrapper around auth.Middleware in the runner. The auth callback now uses EmitFromContext(req.Context(), ...) and stamps seq=1 on auth_verify. The runner's per-A2A-request setup calls coreruntime.EnsureSequenceCounter (new helper) which detects the existing counter and reuses it — so session_start lands seq=2 and the sequence is gap-free. Three pieces: - forge-core/runtime/audit_schema.go gains EnsureSequenceCounter: returns ctx unchanged when a counter is present, installs a fresh one when absent. Used at every invocation-entry point that may run downstream of an upstream middleware. - forge-cli/runtime/sequence_counter_middleware.go (new) is the thin wrapper: composes (next) → authMW(next) → install counter → serve. Cost: ~24B counter allocation per request, even on skip paths — simpler than threading skip-path knowledge into the wrapper, and the alloc is in the same ballpark as the request struct. - forge-cli/runtime/runner.go: * AuthMiddleware: auth.Middleware(authCfg) → installSequenceCounterMiddleware(auth.Middleware(authCfg)) * makeAuthAuditCallback: Emit → EmitFromContext(req.Context()) * Three in-request WithSequenceCounter(ctx, new(...)) calls → EnsureSequenceCounter(ctx). Counter is reused from the auth wrapper (auth ON path) or installed fresh (--no-auth path). * Scheduler dispatcher's WithSequenceCounter call is left alone — it runs outside any HTTP request, no upstream counter. Three other plain-Emit sites stay as documented in PR #173: - source=proxy egress: subprocess HTTP CONNECT has no Go ctx - mcp_tool_conflict: startup-time, pre-invocation - schedule_fire / schedule_complete: scheduler tick, no A2A scope End-to-end smoke (--no-auth path): session_start seq=1 session_end seq=2 invocation_complete seq=3 End-to-end smoke (auth on, internal token): auth_verify seq=1 ← FIXED session_start seq=2 session_end seq=3 invocation_complete seq=4 Tests: - TestAuthAudit_SeqStampedWhenCounterInstalled — the #174 pin: with a counter on req.Context(), auth_verify lands seq=1 - TestAuthAudit_NoSeqWhenCounterAbsent — back-compat: no counter → omitempty drops the field (legacy embedders without the wrapper) - TestSequenceCounterMiddleware_InstallsCounterBeforeNext — the wrapper installs the counter before delegating - TestEnsureSequenceCounter_ReusesExisting — runner must not clobber the auth-installed counter - TestEnsureSequenceCounter_InstallsFresh — --no-auth path: install a fresh counter when none is present Schema impact: none. The events were already declared to carry seq.

Reflects PRs #173 and #174 (both merged to main) in the canonical audit doc and the comprehensive knowledge skill. docs/security/audit-logging.md — under "Sequence numbers", added: - Counter installation order — the SequenceCounter is installed by installSequenceCounterMiddleware (wraps the auth chain) so auth_verify / auth_fail land seq=1; the runner's request entry uses EnsureSequenceCounter to reuse the wrapper-installed counter and only allocate fresh on the --no-auth path. Pinned by TestAuthAudit_SeqStampedWhenCounterInstalled + TestEnsureSequenceCounter_ReusesExisting. - Emit invariant — per-invocation events MUST emit via EmitFromContext (or a typed helper); plain Emit skips the counter and the trace cross-link. The regression behind #173 (tool_exec + session_end) and #174 (auth_verify). A 4-row table names the sites that intentionally stay on plain Emit (egress proxy subprocess CONNECT, MCP startup events, scheduler tick, startup banners) and why. Issue #175 is named as the lint follow-up. .claude/skills/forge.md — three sections updated: - § 3 path trace: middleware order now shows installSequenceCounterMiddleware as outermost; runner request entry uses EnsureSequenceCounter. - § 12.4 FWS-8 section: added the Emit invariant paragraph naming the regression pins and exception list. - § 18 FWS-8 row: appended #173 / #174 follow-up summary + #175 lint tracker. No code change. TOC anchors unchanged. Broken-link sweep clean.

initializ-mk force-pushed the fix/audit-tool-exec-seq branch from 2bf4007 to d857bdd Compare June 16, 2026 16:11

This was referenced Jun 16, 2026

Audit emit: stamp seq on auth_verify / auth_fail by installing the sequence counter before the auth middleware #174

Closed

Lint: AuditLogger.Emit inside per-invocation scope should be EmitFromContext #175

Closed

initializ-mk merged commit 16cfcf8 into main Jun 16, 2026
9 checks passed

initializ-mk mentioned this pull request Jun 16, 2026

fix(audit): install SequenceCounter before auth so auth_verify lands seq=1 (closes #174) #176

Merged

6 tasks

initializ-mk mentioned this pull request Jun 16, 2026

docs: audit seq emit invariant + counter install order (post-#173/#174) #178

Merged

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(audit): stamp seq on tool_exec + session_end events#173

fix(audit): stamp seq on tool_exec + session_end events#173
initializ-mk merged 1 commit into
mainfrom
fix/audit-tool-exec-seq

initializ-mk commented Jun 16, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

initializ-mk commented Jun 16, 2026

Symptom

Root cause

Fix

Sites left alone (intentional, documented inline)

Test plan

Schema impact

Suggested follow-up (not in this PR)

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant