
[design] Class-oriented authoring API for the Agenta SDK (POC)#4627

Draft
mmabrouk wants to merge 3 commits into main from design/class-based-sdk


@mmabrouk


Context

Today users write evaluators and applications by decorating plain functions. Schemas are inferred from function signatures at runtime, and the settings/inputs/outputs contracts are untyped dicts. This makes it hard to see what a workflow expects at a glance, and means typos in column names or config keys fail deep inside an evaluation run rather than at definition time.

This PR proposes a class-oriented authoring model as an alternative. None of the code in this PR runs. It is a design POC that shows what the developer experience could look like.

What this adds

Eight annotated example files under docs/designs/class-based-sdk/:

The core pattern. A class IS the workflow. You declare three inner Pydantic models and implement one method:

```python
import agenta as ag
from pydantic import BaseModel

# Before (today)
@ag.evaluator(slug="rubric-judge", name="Rubric Judge")
async def rubric_judge(inputs: dict, outputs, trace) -> dict:
    ...  # no schema, no validation, raw dicts

# Proposed (ag.Evaluator is not implemented yet; this is the surface under review)
class RubricJudge(ag.Evaluator):
    slug = "rubric-judge"
    name = "Rubric Judge"

    class Parameters(BaseModel):     # -> schemas.parameters (the UI config form)
        judge_model: str = "gpt-4o-mini"
        rubric: str = "..."

    class Inputs(BaseModel):         # -> schemas.inputs (testset columns consumed)
        expected_answer: str | None = None

    class Outputs(BaseModel):        # -> schemas.outputs (score columns in the UI)
        score: float
        verdict: str

    async def evaluate(self, *, inputs: Inputs, outputs, parameters: Parameters) -> Outputs:
        ...
```

The three Pydantic models compile directly to JsonSchemas.parameters/inputs/outputs in the existing WorkflowRevisionData. The class is a typed front-end over the workflow data model, not a parallel system. Everything underneath (middleware chain, handler registry, tracing, upsert, serving) reuses the current engine.
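To make the class-to-schema mapping concrete, here is a minimal sketch of how a base class could compile the three inner models into a `schemas` dict at the moment a subclass is defined. The `Evaluator` base, its `__init_subclass__` hook, and `schema_for` are hypothetical; the sketch also swaps Pydantic for stdlib dataclasses and a tiny type-to-JSON table so it runs without dependencies. A real implementation would presumably call Pydantic's `model_json_schema()` and write the result into `WorkflowRevisionData` instead.

```python
import dataclasses
import typing
from typing import Any, Optional

# Simplified type -> JSON Schema "type" mapping; anything else falls back to "object".
_JSON_TYPES = {str: "string", float: "number", int: "integer", bool: "boolean"}


def schema_for(model: type) -> dict[str, Any]:
    """Build a minimal JSON-Schema-like dict from a dataclass (stand-in for Pydantic)."""
    hints = typing.get_type_hints(model)
    properties: dict[str, Any] = {}
    required: list[str] = []
    for f in dataclasses.fields(model):
        tp = hints[f.name]
        if typing.get_origin(tp) is typing.Union:  # Optional[X] is Union[X, None]
            tp = next(a for a in typing.get_args(tp) if a is not type(None))
        properties[f.name] = {"type": _JSON_TYPES.get(tp, "object")}
        if f.default is dataclasses.MISSING and f.default_factory is dataclasses.MISSING:
            required.append(f.name)  # no default means the field is required
    return {"type": "object", "properties": properties, "required": required}


class Evaluator:
    """Hypothetical base class: compiles the inner models when a subclass is defined."""

    schemas: dict[str, Any] = {}

    def __init_subclass__(cls, **kwargs: Any) -> None:
        super().__init_subclass__(**kwargs)
        cls.schemas = {
            "parameters": schema_for(cls.Parameters),
            "inputs": schema_for(cls.Inputs),
            "outputs": schema_for(cls.Outputs),
        }


class RubricJudge(Evaluator):
    slug = "rubric-judge"

    @dataclasses.dataclass
    class Parameters:
        judge_model: str = "gpt-4o-mini"
        rubric: str = "..."

    @dataclasses.dataclass
    class Inputs:
        expected_answer: Optional[str] = None

    @dataclasses.dataclass
    class Outputs:
        score: float
        verdict: str


print(RubricJudge.schemas["outputs"]["required"])  # ['score', 'verdict']
print(RubricJudge.schemas["inputs"]["required"])   # []
```

Doing the compilation in `__init_subclass__` means a malformed model fails when the module is imported, which is the "fail at definition time" property the Context section asks for.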

Framework adapters (06_framework_adapters.py). Three tiers for teams already using OpenAI Agents SDK, Pydantic AI, or LangGraph:

  • Manual: build the framework agent inside run() from parameters.
  • Factory: subclass ag.ext.openai_agents.Application and implement build(parameters); the base class runs the agent it returns.
  • Automatic: ag.Application.from_agent(existing_agent). An AgentAdapter port extracts Parameters/Inputs/Outputs from the agent object without mutating it.
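The Factory and Automatic tiers can be sketched with plain Python, assuming a toy agent object in place of a real OpenAI Agents SDK, Pydantic AI, or LangGraph agent. `ToyAgent`, `FactoryApplication`, `AgentAdapter`, and `ToyAgentAdapter` are all hypothetical names for illustration. The Factory tier is a template method (the base owns `run()`, the subclass owns `build()`), and the adapter is a read-only port; this sketch only extracts the Parameters half, not Inputs or Outputs.

```python
from dataclasses import dataclass
from typing import Any, Protocol


@dataclass(frozen=True)
class ToyAgent:
    """Stand-in for a framework agent object (not a real SDK class)."""
    instructions: str
    model: str

    def respond(self, text: str) -> str:
        return f"[{self.model}] {self.instructions}: {text}"


class FactoryApplication:
    """Factory tier (hypothetical): subclasses implement build(); the base runs it."""

    def build(self, parameters: dict[str, Any]) -> ToyAgent:
        raise NotImplementedError

    def run(self, parameters: dict[str, Any], user_input: str) -> str:
        agent = self.build(parameters)  # a fresh agent per call, built from the config
        return agent.respond(user_input)


class Summarizer(FactoryApplication):
    def build(self, parameters: dict[str, Any]) -> ToyAgent:
        return ToyAgent(instructions=parameters["instructions"], model=parameters["model"])


class AgentAdapter(Protocol):
    """Automatic tier (hypothetical port): read configuration off an existing agent."""

    def extract_parameters(self, agent: Any) -> dict[str, Any]: ...


class ToyAgentAdapter:
    def extract_parameters(self, agent: Any) -> dict[str, Any]:
        # Read-only: copies attributes into a plain dict and never mutates the agent.
        return {"instructions": agent.instructions, "model": agent.model}


existing = ToyAgent(instructions="Summarize", model="gpt-4o-mini")
adapter: AgentAdapter = ToyAgentAdapter()
params = adapter.extract_parameters(existing)
print(Summarizer().run(params, "long text"))  # [gpt-4o-mini] Summarize: long text
```

Because the adapter returns plain parameters, the same dict could feed the Factory tier's `build()`, which is one way the two tiers might share a code path.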

Config-only workflows (07_config_only.py). ag.Configuration has Parameters but no handler. It is a versioned, deployable config store (prompts, routing tables, rubrics) that any service can pull with afetch(environment="production"). This generalizes prompt management without a special-case API.
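The "versioned, deployable" behaviour can be illustrated with an in-memory stand-in. `ConfigStore`, `commit`, and `deploy` are hypothetical; only the `afetch(environment=...)` call shape comes from the proposal. In this sketch every commit appends an immutable revision, and an environment is just a pointer to one revision.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ConfigStore:
    """Hypothetical in-memory stand-in for a versioned, deployable config registry."""

    revisions: list[dict[str, Any]] = field(default_factory=list)
    environments: dict[str, int] = field(default_factory=dict)  # env name -> version

    def commit(self, parameters: dict[str, Any]) -> int:
        self.revisions.append(dict(parameters))  # shallow copy; old versions stay put
        return len(self.revisions) - 1           # list index doubles as the version

    def deploy(self, version: int, environment: str) -> None:
        self.environments[environment] = version  # repoint the environment, no copy

    async def afetch(self, environment: str) -> dict[str, Any]:
        version = self.environments[environment]  # KeyError if nothing is deployed
        return dict(self.revisions[version])


store = ConfigStore()
v0 = store.commit({"rubric": "Be strict."})
v1 = store.commit({"rubric": "Be lenient."})
store.deploy(v0, environment="production")
store.deploy(v1, environment="staging")
print(asyncio.run(store.afetch(environment="production")))  # {'rubric': 'Be strict.'}
```

Treating deployment as moving a pointer makes rollback a redeploy of an older version, with no data copied.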

Testsets as classes (08_testsets.py). A Case inner model becomes the column schema. ag.aevaluate can check compatibility between testset columns and application/evaluator inputs before running anything, so a missing column raises at submit time rather than after 200 LLM calls.
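A pre-run compatibility check like the one described for ag.aevaluate could be as simple as a set difference between the testset's columns and each consumer's required input fields. The sketch below uses dataclasses as stand-ins for the `Case` and `Inputs` models; `check_compatibility` and `required_fields` are hypothetical helpers, not the proposed API. Optional input fields are deliberately not treated as required, so a testset may omit them.

```python
import dataclasses
from typing import Optional


def required_fields(model: type) -> set[str]:
    """Fields without a default, i.e. columns the model cannot run without."""
    return {
        f.name
        for f in dataclasses.fields(model)
        if f.default is dataclasses.MISSING and f.default_factory is dataclasses.MISSING
    }


def check_compatibility(case_model: type, *consumers: type) -> None:
    """Raise before any run if a consumer needs a column the testset lacks."""
    columns = {f.name for f in dataclasses.fields(case_model)}
    missing = {}
    for consumer in consumers:
        gap = required_fields(consumer) - columns
        if gap:
            missing[consumer.__qualname__] = sorted(gap)
    if missing:
        raise ValueError(f"testset is missing required columns: {missing}")


@dataclasses.dataclass
class Case:  # hypothetical testset row schema
    question: str
    expected_answer: Optional[str] = None


@dataclasses.dataclass
class AppInputs:
    question: str


@dataclasses.dataclass
class JudgeInputs:
    question: str
    reference_doc: str  # not a testset column


check_compatibility(Case, AppInputs)  # passes silently
try:
    check_compatibility(Case, AppInputs, JudgeInputs)
except ValueError as err:
    print(err)  # testset is missing required columns: {'JudgeInputs': ['reference_doc']}
```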

Serving (05_serve.py). Each class exposes a standard APIRouter with typed /invoke and /inspect endpoints. You mount it with app.include_router like any other FastAPI router; no custom registration call is needed.
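The shape of those two endpoints can be sketched without a web framework. Here `make_routes` is a hypothetical helper that returns a path-to-handler dict: `/inspect` reports the input and output fields, and `/invoke` builds the typed input object, awaits the handler, and serializes the result. The dataclass constructor only rejects missing or unknown keys, not wrong value types, which is weaker than the Pydantic validation the proposal implies.

```python
import asyncio
import dataclasses
from typing import Any, Awaitable, Callable


@dataclasses.dataclass
class Inputs:
    question: str


@dataclasses.dataclass
class Outputs:
    answer: str


async def run(inputs: Inputs) -> Outputs:
    return Outputs(answer=inputs.question.upper())


def make_routes(
    inputs_model: type,
    outputs_model: type,
    handler: Callable[[Any], Awaitable[Any]],
) -> dict[str, Callable[[dict[str, Any]], Awaitable[dict[str, Any]]]]:
    """Hypothetical stand-in for the per-class router: two routes, typed at the edges."""

    async def inspect(_body: dict[str, Any]) -> dict[str, Any]:
        return {
            "inputs": [f.name for f in dataclasses.fields(inputs_model)],
            "outputs": [f.name for f in dataclasses.fields(outputs_model)],
        }

    async def invoke(body: dict[str, Any]) -> dict[str, Any]:
        result = await handler(inputs_model(**body))  # TypeError on missing/unknown keys
        return dataclasses.asdict(result)

    return {"/inspect": inspect, "/invoke": invoke}


routes = make_routes(Inputs, Outputs, run)
print(asyncio.run(routes["/invoke"]({"question": "hi"})))  # {'answer': 'HI'}
print(asyncio.run(routes["/inspect"]({})))  # {'inputs': ['question'], 'outputs': ['answer']}
```

With FastAPI, the same two handlers would be registered on a `fastapi.APIRouter` and mounted with `app.include_router(router, prefix=...)`, so the class's router composes with the rest of an existing app.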

Notes

This is purely a design document to spark discussion. The ag.Application, ag.Evaluator, ag.Configuration, ag.Testset, and related methods do not exist yet. The main question for review is whether the proposed surface feels right before any implementation starts.

The implementation path (how classes map onto WorkflowRevisionData, the ~20-line seam change in auto_workflow, what new files would live in sdk/authoring/) is sketched in docs/designs/class-based-sdk/README.md.

@vercel

vercel Bot commented Jun 10, 2026


The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| agenta-documentation | Ready | Preview, Comment | Jun 11, 2026 2:48pm |


@coderabbitai

coderabbitai Bot commented Jun 10, 2026


Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.



@mmabrouk mmabrouk requested review from ardaerzin and jp-agenta June 10, 2026 20:14
