feat: LLM command-approval classifier (auto mode) by thomaslwang · Pull Request #33586 · anomalyco/opencode

thomaslwang · 2026-06-24T03:48:34Z

Issue for this PR

Type of change

Bug fix
New feature
Refactor / code improvement
Documentation

What does this PR do?

Adds an opt-in "auto mode" classifier that gates the would-auto-approve path in Permission.ask. When a rule resolves to allow, a model is consulted first; it can allow (proceed silently), block (deny-and-continue — returns a ClassifierDeniedError the agent sees as a tool error, with no halt), or fail closed to a human prompt on error/escalation. It never overrides an explicit user deny/ask. Off by default.
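
As a rough illustration of the decision flow described above, here is a minimal sketch. It is not the code in this PR: `RuleAction`, `Verdict`, `GateOutcome`, and `gate` are hypothetical names, and the classifier is modeled as a synchronous thunk for brevity, whereas the real gate runs through the Effect runtime.

```typescript
// Hypothetical types for illustration only; the PR's real signatures are not shown here.
type RuleAction = "allow" | "ask" | "deny";
type Verdict = "allow" | "block" | "escalate";
type GateOutcome = "proceed" | "deny-and-continue" | "ask-human" | "deny";

// Decide what happens to a tool call given the resolved rule and a classifier thunk.
function gate(rule: RuleAction, classify: () => Verdict): GateOutcome {
  // An explicit user deny/ask is never overridden, so the classifier is not consulted.
  if (rule === "deny") return "deny";
  if (rule === "ask") return "ask-human";

  // Only the would-auto-approve path reaches the classifier.
  try {
    const verdict = classify();
    if (verdict === "allow") return "proceed";
    // "block" is where a ClassifierDeniedError would be handed back to the agent.
    if (verdict === "block") return "deny-and-continue";
    return "ask-human"; // escalation
  } catch {
    return "ask-human"; // classifier error: fail closed to a human prompt
  }
}
```

In this sketch a thrown classifier error and an explicit escalation both end at the human prompt, which matches the fail-closed behavior described above.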

Why it's built this way:

The gate sits at the single !needsAsk decision in Permission.ask, so it covers every permissioned tool (bash, edit, webfetch, MCP, task, external-dir), not just bash. Read-only tools short-circuit before any model call.
The classifier is fed a reasoning-blind transcript — user text + the bare tool-call payload only, no assistant prose and no prior tool output — so tool-sourced/injected content can't grant permission and the model can't be talked into a call by the agent's own narration.
session/tools.ts passes the gate as a thunk run through the existing EffectBridge (run.run), which supplies the captured request context, so Permission.ask's requirement set stays never.
Denials are counted per session (3 consecutive / 20 total → escalate to the human), reset each user turn, so a false positive can't loop forever.
The backend is pluggable; the default calls the user's configured model via the AI SDK. og-local/og-saas backends are present but fail closed until implemented.
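
The denial-counting rule above could look roughly like the sketch below. The thresholds (3 consecutive, 20 total) come from this description; the class name `DenialCounter`, its methods, and two details are assumptions: that an allowed call resets the consecutive streak, and that a new user turn resets both counters.

```typescript
// Thresholds from the PR description; everything else here is a hypothetical sketch.
const MAX_CONSECUTIVE_DENIALS = 3;
const MAX_TOTAL_DENIALS = 20;

class DenialCounter {
  private consecutive = 0;
  private total = 0;

  // Record a classifier denial. Returns true when the gate should escalate to the human.
  recordDenial(): boolean {
    this.consecutive += 1;
    this.total += 1;
    return (
      this.consecutive >= MAX_CONSECUTIVE_DENIALS ||
      this.total >= MAX_TOTAL_DENIALS
    );
  }

  // Assumption: an allowed call breaks the consecutive streak but leaves the total alone.
  recordAllow(): void {
    this.consecutive = 0;
  }

  // Assumption: both counters are cleared at the start of each user turn.
  resetForUserTurn(): void {
    this.consecutive = 0;
    this.total = 0;
  }
}
```

Under these assumptions a run of false positives ends at a human prompt after at most three blocked calls in a row, instead of looping.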

Config is a new classifier block in core/v1/config.
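
For orientation, the shape of that block might be sketched as below. The field names (backend, model, endpoint, apiKey, allow, soft_deny, environment) come from the commit message; the types, the optionality, and the example values are guesses, and `ClassifierConfig` is a hypothetical name rather than the schema in core/v1/config.

```typescript
// Field names are from the commit message; types and optionality are assumptions.
interface ClassifierConfig {
  backend?: string; // "og-local" and "og-saas" exist but fail closed until implemented
  model?: string;
  endpoint?: string;
  apiKey?: string;
  // Policy slots. Assumed here to be lists of natural-language rules.
  allow?: string[];
  soft_deny?: string[];
  environment?: string[];
}

// Placeholder values only; none of these strings are defaults from the PR.
const exampleClassifierConfig: ClassifierConfig = {
  model: "provider/model-id",
  soft_deny: ["Commands that delete files outside the project directory"],
  environment: ["Local development machine"],
};
```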

How did you verify your code works?

bun run typecheck is clean, and bun test test/classifier.test.ts passes (11 tests covering: no assistant-prose leak into the transcript, unparseable verdict → fail closed, the safe-tool allowlist, and the copy-then-edit policy slots). The pre-push checks pass.
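
To make the "unparseable verdict → fail closed" case concrete, here is a minimal sketch of a parser for the single-pass `<block>yes/no` reply mentioned in the commit message. The exact reply format is an assumption (a closing `</block>` tag is assumed), and `parseVerdict` is a hypothetical name, not the function under test.

```typescript
type Verdict = "allow" | "block" | "escalate";

// Parse a reply assumed to look like <block>yes</block> or <block>no</block>.
// Anything that does not match is treated as an escalation (fail closed).
function parseVerdict(reply: string): Verdict {
  const match = /<block>\s*(yes|no)\s*<\/block>/i.exec(reply);
  if (match === null) return "escalate";
  const answer = (match[1] ?? "").toLowerCase();
  return answer === "yes" ? "block" : "allow";
}
```

The point of the design is that free-form model output never silently becomes an allow: only an explicit "no" inside the expected tag does.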

Screenshots / recordings

N/A — no UI changes.

Checklist

I have tested my changes locally
I have not included unrelated changes in this PR

Opt-in classifier that gates auto-approved tool calls, after Claude Code "auto mode". Off by default.

- Pluggable ClassifierProvider; default uses the user's configured model via the AI SDK (single-pass <block>yes/no).
- Hooks Permission.ask on the would-auto-approve path only: block -> deny-and-continue (ClassifierDeniedError, surfaces as a tool error, no halt); classifier error/escalation -> fail closed (human ask). Never overrides an explicit user deny/ask.
- Reasoning-blind transcript (user text + assistant tool calls only): prompt-injection + anti-rationalization defense.
- Safe-tool allowlist short-circuit; per-session denial counters (3-consecutive / 20-total escalation, reset each user turn).
- New `classifier` config block (backend/model/endpoint/apiKey + allow/soft_deny/environment policy slots, copy-then-edit).

Tests cover reasoning-blindness, verdict parsing (fail-closed), allowlist, and policy slots.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
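
The reasoning-blind transcript can be illustrated with a small filter like the one below. The message and entry shapes are hypothetical stand-ins (opencode's real message types are not shown in this PR text), and `buildTranscript` is an illustrative name; the sketch only demonstrates the rule that user text and bare tool-call payloads pass through while assistant prose and tool output are dropped.

```typescript
// Hypothetical message shapes for illustration only.
type Message =
  | { role: "user"; text: string }
  | { role: "assistant"; text: string; toolCalls: { tool: string; input: unknown }[] }
  | { role: "tool"; output: string };

type TranscriptEntry =
  | { kind: "user"; text: string }
  | { kind: "tool_call"; tool: string; input: unknown };

// Keep user text and bare tool-call payloads; drop assistant prose and tool output.
function buildTranscript(messages: Message[]): TranscriptEntry[] {
  const entries: TranscriptEntry[] = [];
  for (const msg of messages) {
    if (msg.role === "user") {
      entries.push({ kind: "user", text: msg.text });
    } else if (msg.role === "assistant") {
      // The assistant's narration (msg.text) is deliberately not copied.
      for (const call of msg.toolCalls) {
        entries.push({ kind: "tool_call", tool: call.tool, input: call.input });
      }
    }
    // role === "tool": output is omitted so injected content cannot reach the classifier.
  }
  return entries;
}
```

Because tool output never enters the transcript in this sketch, text planted in a fetched page or file cannot argue for its own approval.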

github-actions · 2026-06-24T04:23:59Z

Thanks for updating your PR! It now meets our contributing guidelines. 👍

github-actions Bot added the needs:compliance label (this means the issue will auto-close after 2 hours) Jun 24, 2026

thomaslwang mentioned this pull request Jun 24, 2026

[FEATURE]: LLM command-approval classifier ("auto mode") for permission gating #33585

Open

1 task

github-actions Bot removed the needs:compliance label (this means the issue will auto-close after 2 hours) Jun 24, 2026

github-actions Bot mentioned this pull request Jun 24, 2026

📊 AI CLI Tool Community Daily Report 2026-06-24 litang9/big_model_radar#114

Open

thomaslwang wants to merge 1 commit into anomalyco:dev from openguardrails:feat/auto-mode-classifier
