
fix(frontend): show user feedback as filterable fields in observability#4652

Open
mmabrouk wants to merge 3 commits into main from fix/observability-feedback-filter

Conversation

@mmabrouk

@mmabrouk mmabrouk commented Jun 11, 2026

Member

Closes #4654

Context

When you sent user feedback on a trace through the API (POST /api/simple/traces/ with an evaluator slug), the Observability trace filter never offered that feedback as a filterable field. The feedback was stored but invisible to filtering.

Two root causes, both on the frontend:

  1. The filter built its feedback options from evaluator.metrics, but its data source returns thin evaluator refs that carry no data/metrics, so the list was always empty (regressed in 5867326f6b).
  2. Evaluators auto-created from feedback infer their output schema with genson over the whole trace data envelope, so the stored schema is wrapped one level deeper than UI-created evaluators. Every frontend consumer expected the flat shape.

Per product decision the backend inference stays as is. The frontend normalizes the shape on read.

Changes

New evaluatorFeedbackSchemasAtom (web/packages/agenta-entities/.../state/evaluatorUtils.ts) resolves each non-archived evaluator's latest revision and exposes its output-metric properties. It follows the existing evaluatorKeyMapAtom pattern. Filters.tsx now reads it instead of the empty .metrics. deriveFeedbackValueType is unchanged, so number/boolean/string detection (including array item types) is preserved.
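As a rough illustration of how resolved output-metric properties could turn into filter options: the sketch below is not the code in this PR. `inferFeedbackValueType` is a hypothetical stand-in for deriveFeedbackValueType (whose real signature is not shown here), and `JsonSchema`, `FeedbackOption` and `buildFeedbackOptions` are names made up for the example. Falling back to the property key when a schema has no title is also an assumption.

```typescript
// Minimal JSON-schema subset used by this sketch (hypothetical type).
type JsonSchema = {
    type?: string | string[]
    items?: JsonSchema
    title?: string
}

type FeedbackValueType = "number" | "boolean" | "string"

// Hypothetical stand-in for deriveFeedbackValueType.
function inferFeedbackValueType(schema: JsonSchema | undefined): FeedbackValueType {
    if (!schema) return "string"
    // Arrays are classified by the type of their items.
    if (schema.type === "array") return inferFeedbackValueType(schema.items)
    const types = Array.isArray(schema.type) ? schema.type : [schema.type]
    if (types.some((t) => t === "number" || t === "integer")) return "number"
    if (types.some((t) => t === "boolean")) return "boolean"
    return "string"
}

interface FeedbackOption {
    value: string
    label: string
    type: FeedbackValueType
}

// Property key becomes the option value, schema.title the label
// (assumption: fall back to the key when no title is present).
function buildFeedbackOptions(properties: Record<string, JsonSchema>): FeedbackOption[] {
    return Object.keys(properties).map((key) => ({
        value: key,
        label: properties[key].title ?? key,
        type: inferFeedbackValueType(properties[key]),
    }))
}
```

With `{score: {type: "number"}}` as input, this yields a single option whose type is "number", which matches the Number filter described in the verification notes.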

resolveOutputSchemaProperties (schema.ts) now unwraps the genson envelope when the output schema is a single outputs object with nested properties.

Before (auto-created feedback evaluator output schema):

{ properties: { outputs: { properties: { score, comment } } } }

After (what every consumer now reads):

{ score, comment }

The unwrap is a strict no-op for every other evaluator. No builtin or human evaluator has a lone outputs key with nested properties.
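To make the unwrap rule concrete, here is a minimal sketch of the check described above. It is not the actual `unwrapEnvelopeProperties` implementation; `SchemaNode` and `unwrapOutputsEnvelope` are hypothetical names, and treating an `outputs` key with empty nested properties as "not an envelope" is an assumption.

```typescript
// Minimal schema node for this sketch (hypothetical type).
type SchemaNode = {
    type?: string
    properties?: Record<string, SchemaNode>
}

function unwrapOutputsEnvelope(
    properties: Record<string, SchemaNode>,
): Record<string, SchemaNode> {
    const keys = Object.keys(properties)
    // Only a lone "outputs" key can be the genson envelope.
    if (keys.length !== 1 || keys[0] !== "outputs") return properties

    const nested = properties["outputs"]?.properties
    // A real metric literally named "outputs" has no nested properties,
    // so it is passed through untouched (empty nesting: assumed passthrough).
    if (!nested || Object.keys(nested).length === 0) return properties

    return nested
}
```

Returning the input object unchanged in every non-envelope case is what keeps the unwrap a no-op for flat schemas.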

Feedback field free-text entry: evaluators without an output schema suggest no metrics, and the feedback-field Select used to clear anything you typed. It now surfaces the typed text as a <typed> (custom) option that commits and persists, so you can filter by a feedback name even when no schema provides one.
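The free-text behavior can be pictured as a small pure function over the option list. This is only a sketch: `SelectOption` and `withCustomOption` are hypothetical names, and appending the custom entry at the end (rather than the start) of the list is an assumption about the UI.

```typescript
interface SelectOption {
    value: string
    label: string
}

// Adds a "<typed> (custom)" entry when the typed text is non-empty
// and does not match an option the schema already suggests.
function withCustomOption(options: SelectOption[], typed: string): SelectOption[] {
    const text = typed.trim()
    if (text === "") return options
    if (options.some((option) => option.value === text)) return options
    return [...options, {value: text, label: `${text} (custom)`}]
}
```

For an evaluator with no schema, `withCustomOption([], "helpfulness")` returns one option labeled "helpfulness (custom)" whose value is the raw typed text.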

Docs: corrected two pages that claimed evaluators are not auto-created. They are, with an output schema inferred from the first annotation.

Tests / notes

  • Unit tests in web/packages/agenta-entities/tests/unit/resolveOutputSchemaProperties.test.ts cover the wrapped unwrap, flat passthrough, a single real metric literally named outputs, a multi-key schema, and empty/missing schemas. 6/6 pass.
  • @agenta/entities typechecks. ESLint is clean on the changed files.
  • Verified on the dev box: sent feedback with a fresh slug via POST /api/simple/traces/, confirmed the evaluator was auto-created with the wrapped schema, then confirmed score/comment appeared as feedback filter fields with score resolving to a Number filter.
  • The evaluator-name dropdown still sources from the full evaluator list (includes archived) while feedback fields come from non-archived evaluators. Not a regression. Flag for follow-up if archived feedback evaluators should also be hidden from the dropdown.

What to QA

  • In a project with user feedback sent via the API, open Observability, open Filter, add an "Is Annotation" condition, pick the evaluator, then open the Feedback field dropdown. It lists the evaluator's metric keys (e.g. score, comment). Selecting a numeric metric gives a Number filter.
  • Send fresh feedback to a brand-new evaluator slug via POST /api/simple/traces/ (data.outputs + references.evaluator.slug + links.invocation). The new evaluator's fields show up in the filter.
  • In the feedback field, type a name that is not suggested (or pick an evaluator with no schema). A <name> (custom) option appears; selecting it keeps the value and the row stays usable.
  • Regression: existing UI-created human evaluators still show their metrics in the filter and in the trace drawer annotation tab.
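For the "fresh feedback" QA step, a request body might be assembled roughly as below. This is a sketch under stated assumptions: only data.outputs, references.evaluator.slug and links.invocation come from the step above. The `trace_id` / `span_id` field names, the placeholder values, and the absence of any outer wrapper key are guesses; check the API reference before relying on them. `buildFeedbackBody` is a hypothetical helper.

```typescript
// Assumed shape of the invocation link; field names are not confirmed by this PR.
interface InvocationLink {
    trace_id: string
    span_id: string
}

interface FeedbackBody {
    data: {outputs: Record<string, unknown>}
    references: {evaluator: {slug: string}}
    links: {invocation: InvocationLink}
}

// Hypothetical helper that assembles the three parts named in the QA step.
function buildFeedbackBody(
    evaluatorSlug: string,
    outputs: Record<string, unknown>,
    invocation: InvocationLink,
): FeedbackBody {
    return {
        data: {outputs},
        references: {evaluator: {slug: evaluatorSlug}},
        links: {invocation},
    }
}

// Placeholder ids: substitute the trace and span of a real invocation.
const body = buildFeedbackBody(
    "qa-fresh-slug",
    {score: 4, comment: "Helpful answer"},
    {trace_id: "<trace-id>", span_id: "<span-id>"},
)
console.log(JSON.stringify(body, null, 2))
```

After sending a body like this to POST /api/simple/traces/, the keys under data.outputs (here score and comment) are what should appear in the Feedback field dropdown.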

The observability trace filter never listed annotation feedback fields
(score, comment, etc.) from evaluators, so feedback sent via the API was
not filterable.

Two causes, both fixed on the frontend:
- The filter read evaluator.metrics off thin list refs that carry no
  data; it now resolves each evaluator's latest revision via a new
  evaluatorFeedbackSchemasAtom.
- Auto-created feedback evaluators store a genson-inferred output schema
  wrapped one level deeper ({outputs:{properties}});
  resolveOutputSchemaProperties now unwraps that envelope so real metric
  keys surface.

Also corrects docs that claimed evaluators are not auto-created.
@coderabbitai

coderabbitai Bot commented Jun 11, 2026


📝 Walkthrough

Summary by CodeRabbit

Release Notes

  • Documentation

    • Clarified evaluator auto-creation behavior: Agenta automatically creates evaluators on first annotation and infers schemas from the provided data; users can pre-create evaluators to enforce specific schema details like fixed labels and required fields.
  • Improvements

    • Enhanced feedback field selection with improved custom entry support and search functionality.

Walkthrough

This pull request implements evaluator feedback schema support by unwrapping genson-inferred output envelopes, building a Jotai atom to expose per-evaluator properties, refactoring the Filters component to consume feedback schemas (and accept typed custom fields), and updating docs to explain evaluator auto-creation and schema inference.

Changes

Evaluator Feedback Schema Pipeline

| Layer / File(s) | Summary |
|---|---|
| Schema unwrapping helper and core logic: web/packages/agenta-entities/src/workflow/core/schema.ts, web/packages/agenta-entities/tests/unit/resolveOutputSchemaProperties.test.ts | unwrapEnvelopeProperties detects and unwraps single-key outputs envelopes in genson-inferred schemas; resolveOutputSchemaProperties applies unwrapping to expose real metric keys. Tests validate envelope unwrapping across edge cases including preserved non-envelope schemas, non-unwrapped outputs with multiple siblings, and null handling. |
| Feedback schema state atom and exports: web/packages/agenta-entities/src/workflow/state/evaluatorUtils.ts, web/packages/agenta-entities/src/workflow/state/index.ts, web/packages/agenta-entities/src/workflow/index.ts | EvaluatorFeedbackSchema interface and evaluatorFeedbackSchemasAtom build per-evaluator output-metric property data by resolving each evaluator's latest revision and applying resolveOutputSchemaProperties. The atom is re-exported through state and workflow barrel modules as public API. |
| Filters component integration: web/oss/src/components/Filters/Filters.tsx | The component imports evaluatorFeedbackSchemasAtom and replaces metrics-based option derivation with schema-based building: iterating each evaluator's properties, using property keys as option values, schema.title as labels, and deriveFeedbackValueType to determine option types. Adds feedbackFieldSearch state and typed-custom/select-preserve behavior for feedback-field <Select>. |
| User documentation updates: docs/docs/observability/trace-with-python-sdk/08-annotate-traces.mdx, docs/docs/tutorials/cookbooks/01-capture-user-feedback.mdx | Documentation clarifies that evaluators are auto-created on first annotation with inferred schemas, and users can pre-create evaluators to control output schema structure (fixed labels, required fields, value ranges). |

Sequence Diagram

sequenceDiagram
  participant Filter as Filters Component
  participant Atom as evaluatorFeedbackSchemasAtom
  participant Schema as resolveOutputSchemaProperties
  participant Evaluator as Evaluator Revision
  Filter->>Atom: useAtomValue(evaluatorFeedbackSchemasAtom)
  Atom->>Evaluator: iterate non-archived evaluators
  Evaluator->>Schema: revision.data
  Schema->>Schema: unwrapEnvelopeProperties
  Schema-->>Atom: properties
  Atom-->>Filter: EvaluatorFeedbackSchema[]
  Filter->>Filter: useMemo: iterate properties
  Filter->>Filter: property key → value, schema.title → label
  Filter-->>Filter: annotationFeedbackOptions

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately summarizes the main frontend fix, showing user feedback as filterable fields in observability, which is the primary objective of this PR. |
| Description check | ✅ Passed | The description is comprehensive and related to the changeset, clearly explaining context, root causes, changes made, and testing/verification steps. |
| Linked Issues check | ✅ Passed | The PR fully addresses linked issue #4654 by fixing both root causes: it reads evaluator feedback schemas from a new atom instead of empty .metrics, and unwraps the genson-inferred nested schema so feedback metrics (score, comment) appear as filterable fields. |
| Out of Scope Changes check | ✅ Passed | All changes align with the linked issue objectives: new atom for feedback schemas, schema unwrapping logic, Filters component updates, and documentation corrections. No unrelated changes detected. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 60.00%. |


@dosubot dosubot Bot added the size:L (This PR changes 100-499 lines, ignoring generated files.), documentation (Improvements or additions to documentation), and Frontend labels Jun 11, 2026
@vercel

vercel Bot commented Jun 11, 2026


The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
|---|---|---|---|
| agenta-documentation | Ready | Preview, Comment | Jun 12, 2026 11:06am |


@github-actions

github-actions Bot commented Jun 11, 2026

Contributor

Railway Preview Environment

Preview URL https://gateway-production-c0e7.up.railway.app/w
Project agenta-oss-pr-4652
Image tag pr-4652-b44023f
Status Deployed
Railway logs Open logs
Workflow logs View workflow run
Updated at 2026-06-12T11:17:33.320Z

@linear-code

linear-code Bot commented Jun 11, 2026


AGE-3826

@junaway

junaway commented Jun 11, 2026

Contributor

@mmabrouk @ardaerzin
Shouldn't this be fixed instead of patched?
Very curious how this never popped up before.

…ty filter

Evaluators without an output schema expose no feedback metrics to suggest,
and the feedback-field Select cleared any typed value. The Select now
surfaces the typed text as a '<typed> (custom)' option that commits and
persists, so users can filter by a feedback name even when the schema can't
provide one.

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
web/oss/src/components/Filters/Filters.tsx (1)

1788-1793: ⚡ Quick win

Debounce feedback-field search updates to reduce rerender churn.

onSearch writes state on every keystroke per row. Please debounce/throttle this handler (even a small 150–250ms debounce) to align with frontend guidance and avoid avoidable render pressure when many rows are open.

As per coding guidelines, “Debounce/throttle search inputs, filters, scroll and resize handlers in React components”.

Source: Coding guidelines
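One way to picture the suggested debounce, as a framework-free sketch rather than the helper the codebase actually uses: `debounce` and the `Timers` interface are hypothetical names, and the injectable timers exist only so the behavior can be exercised without waiting on real time. In a React component the debounced function would also need a stable identity across renders (for example via useMemo or a ref), which this sketch does not cover.

```typescript
// Injectable timer functions (hypothetical interface, for testability).
interface Timers {
    set: (callback: () => void, ms: number) => unknown
    clear: (handle: unknown) => void
}

const realTimers: Timers = {
    set: (callback, ms) => setTimeout(callback, ms),
    clear: (handle) => clearTimeout(handle as ReturnType<typeof setTimeout>),
}

// Returns a wrapper that runs fn once, waitMs after the most recent call.
function debounce<Args extends unknown[]>(
    fn: (...args: Args) => void,
    waitMs: number,
    timers: Timers = realTimers,
): (...args: Args) => void {
    let handle: unknown
    let pending = false
    return (...args: Args) => {
        // A newer call replaces the one still waiting.
        if (pending) timers.clear(handle)
        pending = true
        handle = timers.set(() => {
            pending = false
            fn(...args)
        }, waitMs)
    }
}
```

Wrapping the search-state setter as `debounce(setSearch, 200)` (with `setSearch` standing in for whatever per-row setter is used) would write state once per typing pause instead of once per keystroke.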


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: dcb895e2-cf05-431e-8535-9eb48c345d02

📥 Commits

Reviewing files that changed from the base of the PR and between 5240b9b and 4c13a15.

📒 Files selected for processing (1)
  • web/oss/src/components/Filters/Filters.tsx

Comment on lines +544 to +546
// Free-text the user is typing into a row's feedback-field Select. Lets them name a
// feedback metric even when the evaluator has no output schema to suggest options.
const [feedbackFieldSearch, setFeedbackFieldSearch] = useState<Record<number, string>>({})


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Stale custom-search text can leak across rows after index shifts.

feedbackFieldSearch is keyed by row index, but row deletes/reordering can shift indices while this map is preserved. That can show a stale “(custom)” typed option on the wrong row. Prefer keying by a stable row id, or prune/remap feedbackFieldSearch whenever rows are removed/reset.

Also applies to: 1788-1809
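A minimal sketch of the "prune/remap" option, assuming rows are removed one at a time by index: `removeRowSearch` is a hypothetical helper, not code from this PR. Keying by a stable row id would avoid the remapping altogether.

```typescript
// Drops the removed row's entry and shifts every later index down by one,
// so typed text stays attached to the row it was typed in.
function removeRowSearch(
    search: Record<number, string>,
    removedIndex: number,
): Record<number, string> {
    const next: Record<number, string> = {}
    for (const key of Object.keys(search)) {
        const index = Number(key)
        if (index === removedIndex) continue
        const newIndex = index > removedIndex ? index - 1 : index
        next[newIndex] = search[index]
    }
    return next
}
```

Calling this from the same place that deletes a row keeps the map and the row list in step; a full reset of the rows could simply replace the map with `{}`.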

@mmabrouk mmabrouk requested a review from ardaerzin June 12, 2026 12:29
@mmabrouk

mmabrouk commented Jun 12, 2026

Member Author

> @mmabrouk @ardaerzin Shouldn't this be fixed instead of patched? Very curious how this never popped up before.

How is this a patch? I considered the backend response as the ground truth and fixed the FE to follow that pattern.

The only "patch" is allowing users to write down the name of the feedback in case the evaluator does not have a schema (which is something we allow for the moment)


Labels

documentation (Improvements or additions to documentation), Frontend, size:L (This PR changes 100-499 lines, ignoring generated files.)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

(bug) Feedback sent via the API can't be filtered in observability

3 participants