Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
35 changes: 29 additions & 6 deletions .claude/skills/forge.md
Original file line number Diff line number Diff line change
Expand Up @@ -124,11 +124,16 @@ The path of a single A2A invocation:
```
1. Inbound HTTP request → POST / with JSON-RPC envelope
forge-cli/server/a2a_server.go:handleJSONRPC
2. Middleware chain:
auth (forge-core/auth/middleware.go) — verifies bearer
rate-limit — per-IP, FWS-10
2. Middleware chain (outermost → innermost):
installSequenceCounterMiddleware — installs the
per-request SequenceCounter on r.Context()
BEFORE the auth chain runs so auth_verify /
auth_fail land seq=1 (FWS-8, fix #174)
auth (forge-core/auth/middleware.go) — verifies bearer;
OnAuth callback emits via EmitFromContext so
seq + trace_id + workflow tags are stamped
rate-limit — per-IP, FWS-10
CORS, security headers, request-size limits
audit middleware — auth_verify/auth_fail
3. Dispatch to method handler:
tasks/send → runs the agent (long)
tasks/cancel → signals an in-flight invocation (instant)
Expand All @@ -137,7 +142,8 @@ The path of a single A2A invocation:
4. Request entry (forge-cli/runtime/runner.go):
correlation_id generated
task_id from params.ID
WithSequenceCounter installed (FWS-8)
EnsureSequenceCounter reuses the auth-installed counter
(or installs a fresh one on the --no-auth path) — FWS-8
LLMUsageAccumulator installed (FWS-3)
CancellationRegistry registers a CancelCauseFunc (FWS-4)
5. AgentExecutor loop:
Expand Down Expand Up @@ -552,6 +558,23 @@ fan-out across configured sinks.
responsible for routing captured payloads to a store appropriate to
their sensitivity — Forge does not redact.

**Emit invariant.** Every audit emission inside an A2A invocation
scope MUST go through `AuditLogger.EmitFromContext(ctx, ...)` (or one
of the typed helpers — `EmitLLMCall`, `EmitToolExec`, etc.). That's
the path that picks the per-invocation `SequenceCounter` off ctx and
stamps `seq` + `trace_id` + `span_id` + workflow tags. Plain `Emit`
emits raw — no seq, no trace link, no workflow tags. Regression pins:
`TestToolExecAudit_CarriesSequenceFromContext` (PR #173,
`tool_exec` + `session_end`),
`TestAuthAudit_SeqStampedWhenCounterInstalled` (PR #176, `auth_verify`
once the counter is installed by the middleware wrapper). Sites that
still call plain `Emit` are explicitly outside any invocation scope:
the egress proxy's subprocess CONNECT (no Go ctx tying back),
MCP-server startup (`mcp_server_started` / `_failed` / `_degraded`,
pre-invocation), and the scheduler tick (`schedule_fire` /
`schedule_complete`, runs on its own timer). Those events
intentionally have no `seq`.

**Sinks** (FWS-7):

| Sink | Always on? | Notes |
Expand Down Expand Up @@ -1009,7 +1032,7 @@ Every event also carries `schema_version: "1.0"` (FWS-8) and `seq`
| **FWS-5** | #89 | Platform policy enforcement at runtime — workspace-level deploy-time bounds (egress / tools / models / sizes); `forge package` policy-ready manifests | `docs/security/platform-policy.md` |
| **FWS-6** | #90 | Three-layer platform policy + channel scope — system / user / workspace layers compose by union + most-restrictive; `denied_channels` first-class; `forge channel disable/enable` edits the user layer | `docs/security/platform-policy.md` |
| **FWS-7** | #95 | Audit event export capability — Unix Domain Socket sink + HTTP localhost fallback; fire-and-forget; `audit_export_status` periodic per-sink health | `docs/security/audit-logging.md` § Audit Event Export |
| **FWS-8** | #91 | Hardened audit emission — `schema_version` + monotonic `seq` per invocation; default metadata-only invariant pinned by regression test; opt-in `AuditPayloadCapture` with per-field byte caps | `docs/security/audit-logging.md` § Schema contract |
| **FWS-8** | #91 | Hardened audit emission — `schema_version` + monotonic `seq` per invocation; default metadata-only invariant pinned by regression test; opt-in `AuditPayloadCapture` with per-field byte caps. Follow-ups: #173 (PR closed the seq gap on `tool_exec` / `session_end` — 3 sites switched from plain `Emit` to `EmitFromContext`) and #174 (PR moved the `SequenceCounter` installation upstream of the auth middleware via a wrapper + `EnsureSequenceCounter` so `auth_verify` / `auth_fail` land seq=1). #175 tracks a follow-up lint to catch future `Emit`-instead-of-`EmitFromContext` drift. | `docs/security/audit-logging.md` § Schema contract / § Sequence numbers |
| **FWS-9** | #100 | Ops logger output stream separation — stdout for `JSONLogger`, stderr stays for audit NDJSON | `docs/security/audit-logging.md` § Streams |
| **FWS-10** | #110 | Rate-limit configurability + orchestration-friendly defaults + cancel exemption — `server.rate_limit:` yaml block + CLI flags + env; `tasks/cancel` exempt from the write bucket by default | `docs/reference/forge-yaml-schema.md` § `server.rate_limit` |
| **OTel v1** | #108 | OpenTelemetry tracing — shipped across phases #101-#107 (PRs #122-#128). Tracer seam → OTLP provider → config resolver + CLI flags → span instrumentation across A2A/executor/LLM/tool → audit ↔ trace cross-link → end-to-end A2A propagation → build-time egress merge. Off by default; reuses the egress-enforced transport. | `docs/core-concepts/observability-tracing.md` |
Expand Down
43 changes: 43 additions & 0 deletions docs/security/audit-logging.md
Original file line number Diff line number Diff line change
Expand Up @@ -490,6 +490,49 @@ start their own counters. Events emitted outside any invocation scope
(`policy_loaded`, `agent_card_published`, `audit_export_status`) omit
`seq` entirely.

#### Counter installation order

The per-invocation `SequenceCounter` is installed on `r.Context()` by
`installSequenceCounterMiddleware`, which wraps the auth middleware so
the counter is already on context before the auth chain runs. This
puts `auth_verify` / `auth_fail` first in the sequence (`seq=1`) and
keeps the rest of the per-invocation events (`session_start`,
`guardrail_check`, `llm_call`, `tool_exec`, `invocation_complete`,
etc.) gap-free under the same `(correlation_id, task_id)` group. The
runner's request entry calls `coreruntime.EnsureSequenceCounter` —
which reuses the wrapper-installed counter when present and installs a
fresh one on the `--no-auth` path, so no embedder configuration loses
seq stamping. Pinned by `TestAuthAudit_SeqStampedWhenCounterInstalled`
and `TestEnsureSequenceCounter_ReusesExisting` (issue #174).

#### Emit invariant

The seq counter is picked up by `AuditLogger.EmitFromContext(ctx, ...)`
(and the typed helpers built on top of it — `EmitLLMCall`,
`EmitToolExec`, `EmitInvocationComplete`, `EmitInvocationCancelled`,
the egress and guardrail emit paths). Plain `AuditLogger.Emit` skips
the counter and the trace cross-link — so every audit emission that
happens inside an invocation scope MUST go through `EmitFromContext`.
This was the regression behind issues #173 (three sites — the
`BeforeToolExec` / `AfterToolExec` hook callbacks and the
outbound-guardrail-failure `session_end` emit — had drifted to plain
`Emit` and lost seq on `tool_exec` + that branch's `session_end`) and
#174 (the auth callback couldn't use `EmitFromContext` until the
counter was installed upstream of the auth middleware). Pinned by
`TestToolExecAudit_CarriesSequenceFromContext`. Sites that still call
plain `Emit` are explicitly outside any invocation scope and are
documented inline:

| Site | Why plain `Emit` |
|---|---|
| Egress proxy `OnAttempt` with `source=proxy` | Subprocess HTTP `CONNECT` has no Go ctx tying back to the A2A request |
| MCP server startup events (`mcp_server_started` / `_failed` / `_degraded`) | Pre-invocation; no scope |
| Scheduler tick (`schedule_fire` / `schedule_complete` / `schedule_skip` / `schedule_modify`) | Runs on its own timer outside any A2A request |
| Startup banners (`policy_loaded`, `agent_card_published`, `audit_export_status`) | Pre-invocation; no scope |

Issue #175 tracks a follow-up vet/lint pass to catch future
`Emit`-instead-of-`EmitFromContext` drift on per-invocation events.

### Schema versioning policy

| Change | Bumps version? |
Expand Down
Loading