fix(sdk): sandbox custom-code evaluators by default (RestrictedPython runner) #4636

Draft
mmabrouk wants to merge 1 commit into main from fix/restricted-evaluator-runner-default

@mmabrouk

Member

Context

Custom-code evaluators executed user-supplied Python with a raw exec() inside the services process. There was no sandbox unless an operator explicitly set AGENTA_SERVICES_CODE_SANDBOX_RUNNER=daytona. On a self-hosted deployment, any authenticated user who could create a custom-code evaluator could run arbitrary code on the host. The older RestrictedPython sandbox was dropped when evaluation moved to the new runner architecture, which left raw exec() (local) as the default.

Changes

Add a RestrictedPython-based runner and make it the default. AGENTA_SERVICES_CODE_SANDBOX_RUNNER now selects one of three runners:

  • restricted (new default): in-process RestrictedPython sandbox. Strict pure-stdlib import allowlist, no filesystem, network, or host access.
  • local: raw exec(), no sandbox. Explicit opt-in for trusted or single-tenant deployments.
  • daytona: isolated remote sandbox (unchanged).
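
The selection described above can be sketched as follows. This is a minimal illustration, not the SDK's registry code: `select_runner_name`, `VALID_RUNNERS`, and `DEFAULT_RUNNER` are hypothetical names, and normalizing the value with `strip().lower()` is an assumption. The legacy `AGENTA_SERVICES_SANDBOX_RUNNER` variable and the Daytona key check are not modeled.

```python
import os

# Hypothetical names for illustration; the real registry may be structured differently.
VALID_RUNNERS = ("restricted", "local", "daytona")
DEFAULT_RUNNER = "restricted"


def select_runner_name(environ=None):
    """Return the runner name selected by AGENTA_SERVICES_CODE_SANDBOX_RUNNER."""
    environ = os.environ if environ is None else environ
    raw = environ.get("AGENTA_SERVICES_CODE_SANDBOX_RUNNER", "")
    # Assumption: an unset or empty value falls back to the sandboxed default.
    name = raw.strip().lower() or DEFAULT_RUNNER
    if name not in VALID_RUNNERS:
        # Fail loudly on unknown values instead of silently picking a runner.
        raise ValueError(
            f"Unknown sandbox runner {raw!r}; expected one of {', '.join(VALID_RUNNERS)}"
        )
    return name
```

Raising on an unknown value, rather than falling back, avoids a typo such as `restrcted` quietly selecting a different runner than the operator intended.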

The new sandbox closes the two holes the previous RestrictedPython sandbox had:

  1. The old code injected the real __import__, so import os worked. The new runner installs a guarded __import__ that allows only a pure-stdlib allowlist (math, statistics, datetime, json, re, random, string, typing, collections, itertools, functools). httpx and anything with host or network reach are excluded.
  2. The old code never set _getattr_, leaving the ().__class__.__bases__[0].__subclasses__() gadget open. The new runner sets safer_getattr, which blocks dunder and underscore attribute access.
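
The two guards can be illustrated with a stdlib-only sketch. `guarded_import` and `guarded_getattr` are hypothetical stand-ins, not the PR's code: the actual runner relies on RestrictedPython's `compile_restricted` (which rewrites attribute access to go through `_getattr_`) and its `safer_getattr`. A plain `exec()` with a trimmed `__builtins__`, as used in the demo, is not a sandbox by itself; it only shows how the import hook behaves.

```python
import builtins

# Allowlist taken from the PR description.
ALLOWED_MODULES = frozenset({
    "math", "statistics", "datetime", "json", "re", "random",
    "string", "typing", "collections", "itertools", "functools",
})


def guarded_import(name, globals=None, locals=None, fromlist=(), level=0):
    """Import hook that admits only allowlisted top-level modules."""
    if level != 0:
        raise ImportError("relative imports are not allowed")
    root = name.split(".", 1)[0]
    if root not in ALLOWED_MODULES:
        raise ImportError(f"import of {name!r} is not allowed")
    return builtins.__import__(name, globals, locals, fromlist, level)


def guarded_getattr(obj, name, *default):
    """Reject dunder and underscore attribute names, in the spirit of safer_getattr."""
    if not isinstance(name, str) or name.startswith("_"):
        raise AttributeError(f"access to {name!r} is not allowed")
    return getattr(obj, name, *default)


if __name__ == "__main__":
    # The import statement resolves __import__ from the supplied builtins.
    sandbox_globals = {"__builtins__": {"__import__": guarded_import}}
    exec("import math\nresult = math.sqrt(16)", sandbox_globals)
    print(sandbox_globals["result"])  # 4.0
```

With these guards, `import os` raises `ImportError` at the hook, and the first step of the class gadget, `().__class__`, raises `AttributeError` before `__bases__` or `__subclasses__()` can be reached.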

Before:

env = {}
exec(code, env)       # full builtins, any import
fn = env["evaluate"]

After:

byte_code = compile_restricted(code, ...)   # guarded builtins, allowlisted imports
exec(byte_code, restricted_globals)         # safer_getattr, guarded getitem/getiter/write

import os, import subprocess, import httpx, __import__('os'), open(...), eval(...), and the class-gadget escape all raise instead of running.

Behavior change (rollout note)

The default flips from unrestricted exec to the sandbox. Existing self-hosted evaluators that rely on non-allowlisted imports (for example httpx or os) or on raw exec will now fail under the default. They must opt back in with AGENTA_SERVICES_CODE_SANDBOX_RUNNER=local (trusted only) or move to daytona. The import and error messages name this escape hatch. RestrictedPython is a new dependency, so self-hosters need to rebuild the services image, not just change an env var. Agenta Cloud is unaffected; it already isolates evaluator execution.
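
For comparison, an evaluator that stays within the allowlist might look like the sketch below. The four-argument `evaluate` signature is an assumption based on the v1 custom-evaluator interface and is not stated in this PR; the function imports only `re` and `typing`, both on the allowlist, so it should not need the `local` opt-in.

```python
import re
from typing import Dict


def evaluate(
    app_params: Dict[str, str],
    inputs: Dict[str, str],
    output: str,
    correct_answer: str,
) -> float:
    """Return 1.0 when output matches the expected answer after normalization."""

    def normalize(text: str) -> str:
        # Collapse whitespace runs and ignore case and surrounding spaces.
        return re.sub(r"\s+", " ", text).strip().lower()

    return 1.0 if normalize(output) == normalize(correct_answer) else 0.0
```

An evaluator that instead calls out over the network with `httpx` would be rejected at import time under the `restricted` default.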

Tests / notes

  • New sdks/python/oss/tests/pytest/utils/test_restricted_runner.py: v1 and v2 evaluators return floats, allowlisted imports work, and import os/subprocess/httpx, __import__('os'), open(), eval(), and the class-gadget escape are all blocked. Registry selection is covered (default restricted, local opt-in, legacy AGENTA_SERVICES_SANDBOX_RUNNER, daytona-without-key raises, unknown value raises).
  • The existing test_code_v0.py suite now runs under the restricted default.
  • uv run pytest on both files: 60 passed, 2 daytona-only skipped. ruff format and ruff check clean.
  • Docs: a self-hosting :::warning on the custom-evaluator page and the three runner options in the self-host configuration reference. Example env files and Helm values updated to show restricted as the default and warn on local.
  • Not yet verified end-to-end on a live self-hosted stack (the dev services image needs a rebuild to install RestrictedPython before the default works there).

… runner)

Custom-code evaluators ran user Python with raw exec() in the services
process, sandboxed only if an operator opted into Daytona. Add a
RestrictedPython runner and make it the default; keep raw exec() as an
explicit 'local' opt-in. The new sandbox uses a guarded __import__ limited
to a pure-stdlib allowlist and safer_getattr to block the class-gadget
escape, closing the two holes the previous RestrictedPython sandbox had.
@vercel

vercel Bot commented Jun 11, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| agenta-documentation | Ready | Preview, Comment | Jun 11, 2026 9:33am |

@coderabbitai

coderabbitai Bot commented Jun 11, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 9194e3ca-208a-41c2-981f-b81a61a9d761

@coderabbitai coderabbitai Bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
hosting/kubernetes/ee/values.ee.example.yaml (1)

116-125: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Helm schema blocks the new "restricted" default across both example files. Both the EE and OSS Kubernetes examples now document and set sandboxRunner: restricted, but hosting/kubernetes/helm/values.schema.json defines the enum as ["local", "daytona"] and omits "restricted". Users applying these examples will hit schema validation failures. The schema enum must be updated to ["restricted", "local", "daytona"] to match the runtime contract and allow the new default value.

🧹 Nitpick comments (1)
sdks/python/agenta/sdk/engines/running/runners/restricted.py (1)

175-182: 💤 Low value

Redundant SyntaxError handler (harmless).

The SyntaxError catch at lines 178-179 is redundant since syntax errors are already caught during compilation at lines 146-147. However, defensive error handling is acceptable here.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 53ec6584-c261-4e60-99ba-f4d6cfa0e963

📥 Commits

Reviewing files that changed from the base of the PR and between 787ed8b and e99bee5.

⛔ Files ignored due to path filters (1)
  • sdks/python/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (15)
  • api/oss/src/utils/env.py
  • docs/docs/evaluation/configure-evaluators/07-custom-evaluator.mdx
  • docs/docs/self-host/02-configuration.mdx
  • hosting/docker-compose/ee/env.ee.dev.example
  • hosting/docker-compose/ee/env.ee.gh.example
  • hosting/docker-compose/oss/env.oss.dev.example
  • hosting/docker-compose/oss/env.oss.gh.example
  • hosting/kubernetes/ee/values.ee.example.yaml
  • hosting/kubernetes/oss/values.oss.example.yaml
  • sdks/python/agenta/sdk/engines/running/runners/registry.py
  • sdks/python/agenta/sdk/engines/running/runners/restricted.py
  • sdks/python/agenta/sdk/engines/running/sandbox.py
  • sdks/python/oss/tests/pytest/utils/test_code_v0.py
  • sdks/python/oss/tests/pytest/utils/test_restricted_runner.py
  • sdks/python/pyproject.toml
💤 Files with no reviewable changes (1)
  • sdks/python/agenta/sdk/engines/running/sandbox.py

@junaway

junaway commented Jun 11, 2026

Contributor

@mmabrouk
Adding back the restricted runtime is great; making it the default is not.
If a user cares about this and understands the implications of the different options, they will make the change. For everybody else, local is much less restrictive.
