test(tracing): add round-trip tests for ag.metrics.duration.cumulativ…#4575
test(tracing): add round-trip tests for ag.metrics.duration.cumulativ…#4575justin212407 wants to merge 6 commits into
Conversation
…e scalar shape Signed-off-by: justin212407 <charlesjustin2124@gmail.com>
|
@justin212407 is attempting to deploy a commit to the agenta projects Team on Vercel. A member of the Team first needs to authorize it. |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughSummary by CodeRabbit
WalkthroughAdds two pytest unit tests for parse_spans_from_request covering metric-duration normalization: one ensures a scalar cumulative duration is overwritten by computed wall-clock duration when timestamps exist; the other ensures legacy dict-shaped cumulative metrics are normalized to a scalar when timestamps are absent. ChangesMetric duration ingestion normalization tests
Estimated code review effort🎯 2 (Simple) | ⏱️ ~8 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds unit tests around parse_spans_from_request to ensure duration metric normalization and precedence rules are stable across ingest/parsing.
Changes:
- Adds a test ensuring scalar
duration.cumulativeis accepted and overridden by computed duration when timestamps are present. - Adds a test ensuring legacy
{"total": v}dict shape forduration.cumulativeis normalized to a scalar when timestamps are absent.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # The computed duration (1000ms from timestamps) overwrites the pre-set value; | ||
| # the key contract is that the result is a plain scalar, not a dict. | ||
| assert isinstance(metrics["duration"]["cumulative"], (int, float)) | ||
| assert metrics["duration"]["cumulative"] == 1000.0 |
There was a problem hiding this comment.
Fixed - replaced bare float equality with pytest.approx(1000.0).
| metrics = parsed[0].attributes["ag"]["metrics"] | ||
| # Legacy {"total": v} dict is collapsed to the canonical scalar by | ||
| # _coerce_scalar_metric_value inside initialize_ag_attributes. | ||
| assert metrics["duration"]["cumulative"] == 1922.653 |
There was a problem hiding this comment.
Fixed - added isinstance(metrics["duration"]["cumulative"], (int, float)) scalar contract check and switched to pytest.approx(1922.653).
| assert len(response[0].span_id) == 16 | ||
|
|
||
|
|
||
| def test_parse_spans_from_request_accepts_scalar_duration_cumulative_through_ingest(): |
There was a problem hiding this comment.
Fixed - renamed to test_ingest_accepts_scalar_duration_cumulative.
| assert metrics["duration"]["cumulative"] == 1000.0 | ||
|
|
||
|
|
||
| def test_parse_spans_from_request_normalizes_legacy_dict_duration_cumulative_to_scalar(): |
There was a problem hiding this comment.
Fixed - renamed to test_ingest_normalizes_legacy_dict_duration_cumulative_to_scalar
|
Hi @justin212407, Could you please
|
| # The computed duration (1000ms from timestamps) overwrites the pre-set value; | ||
| # the key contract is that the result is a plain scalar, not a dict. | ||
| assert isinstance(metrics["duration"]["cumulative"], (int, float)) | ||
| assert metrics["duration"]["cumulative"] == 1000.0 |
| metrics = parsed[0].attributes["ag"]["metrics"] | ||
| # Legacy {"total": v} dict is collapsed to the canonical scalar by | ||
| # _coerce_scalar_metric_value inside initialize_ag_attributes. | ||
| assert metrics["duration"]["cumulative"] == 1922.653 |
| span = OTelSpan( | ||
| trace_id=TRACE_HEX, | ||
| span_id=SPAN_HEX, | ||
| span_name="root", | ||
| start_time=1_700_000_000, | ||
| end_time=1_700_000_001, | ||
| attributes={ | ||
| "ag": { | ||
| "metrics": { | ||
| "duration": {"cumulative": 1922.653}, | ||
| } | ||
| } | ||
| }, | ||
| ) |
| span = OTelSpan( | ||
| trace_id=TRACE_HEX, | ||
| span_id=SPAN_HEX, | ||
| span_name="root", | ||
| attributes={ | ||
| "ag": { | ||
| "metrics": { | ||
| "duration": {"cumulative": {"total": 1922.653}}, | ||
| } | ||
| } | ||
| }, | ||
| ) |
Signed-off-by: justin212407 <charlesjustin2124@gmail.com>
Tests pass locally: 2 new tests , full tracing suite 80 passed , full unit suite 737 passed , ruff check and format check clean |
| def test_ingest_accepts_scalar_duration_cumulative(): | ||
| # Canonical scalar shape in attributes — the ingest pipeline accepts it without | ||
| # error. When start_time and end_time are present the computed wall-clock duration | ||
| # takes precedence (1_700_000_001 - 1_700_000_000 = 1s = 1000ms). | ||
| span = OTelSpan( | ||
| trace_id=TRACE_HEX, | ||
| span_id=SPAN_HEX, | ||
| span_name="root", | ||
| start_time=1_700_000_000, | ||
| end_time=1_700_000_001, | ||
| attributes={ | ||
| "ag": { | ||
| "metrics": { | ||
| "duration": {"cumulative": 1922.653}, | ||
| } | ||
| } | ||
| }, | ||
| ) |
| assert len(response[0].span_id) == 16 | ||
|
|
||
|
|
||
| def test_ingest_accepts_scalar_duration_cumulative(): |
There was a problem hiding this comment.
These names were explicitly shortened in the previous review round after Copilot flagged the original test_parse_spans_from_request_accepts_scalar_duration_cumulative_through_ingest names as too long. Reverting to longer names would reintroduce the previous violation. Deferring to @junaway yo make the final call on naming convention.
| assert metrics["duration"]["cumulative"] == pytest.approx(1000.0) | ||
|
|
||
|
|
||
| def test_ingest_normalizes_legacy_dict_duration_cumulative_to_scalar(): |
Signed-off-by: justin212407 <charlesjustin2124@gmail.com>
| assert len(response[0].span_id) == 16 | ||
|
|
||
|
|
||
| def test_ingest_accepts_scalar_duration_cumulative(): |
| assert metrics["duration"]["cumulative"] == pytest.approx(1000.0) | ||
|
|
||
|
|
||
| def test_ingest_normalizes_legacy_dict_duration_cumulative_to_scalar(): |
| start_time=1_700_000_000, | ||
| end_time=1_700_000_001, |
| # The computed duration (1000ms from timestamps) overwrites the pre-set value; | ||
| # the key contract is that the result is a plain scalar, not a dict. | ||
| assert isinstance(metrics["duration"]["cumulative"], (int, float)) | ||
| assert metrics["duration"]["cumulative"] == pytest.approx(1000.0) |
Summary:
Fixes #4172 -
ag.metrics.duration.cumulativequery↔ingest schema asymmetry that caused spans to be silently dropped on re-ingest.During ingestion,
parsing.pycomputed wall-clock duration and wrote it as a raw scalar float. The original Pydantic model (AgMetricEntryAttributes) declaredcumulativeasOptional[Metrics]whereMetrics = Dict[str, NumericJson]- a dict, not a scalar. This meant any span queried via/api/tracing/spans/queryand re-submitted to/api/tracing/spans/ingestfailed Pydantic validation silently, returning{"count": 0, "links": []}with no error surfaced to the caller.Per the team's documented decision in
docs/design/homomorphic-tracing-io/findings.md, the canonical shape forduration.cumulativeis a scalar float. The production fixes (AgScalarMetricEntryAttributesintracing.py,_coerce_scalar_metric_valueinattributes.py, scalar write inparsing.py) were already present in the codebase. This PR adds the missing test coverage for those changes.Testing:
Verified locally
Added or updated tests:
Two new unit tests added to
api/oss/tests/pytest/unit/tracing/utils/test_parsing.py:test_parse_spans_from_request_accepts_scalar_duration_cumulative_through_ingest- Verifies that a span withmetrics.duration.cumulativeset as a scalar float passes through the fullparse_spans_from_requestpipeline without error and the output value is a plainfloat, not a dict. Covers the canonical ingest path.test_parse_spans_from_request_normalizes_legacy_dict_duration_cumulative_to_scalar- Verifies backward compatibility: a span carrying the old dict shapeduration.cumulative = {"total": 1922.653}(as written by older ingestion codeor replayed from a query response) is normalized to the canonical scalar
1922.653by_coerce_scalar_metric_valueinsideinitialize_ag_attributes. No timestamps are supplied so the computed-duration override path is skipped,isolating the normalization behavior.
QA follow-up:
duration.cumulativeas a scalar are still returned correctly by/api/tracing/spans/query.{"count": 0, "links": []}.duration.cumulative = {"total": v}) are normalized correctly on the way through ingest without requiring a backfill.Demo:
N/A - no UI changes.
Checklist:
Contributor Resources: