test(tracing): add round-trip tests for ag.metrics.duration.cumulativ… by justin212407 · Pull Request #4575 · Agenta-AI/agenta

justin212407 · 2026-06-07T20:02:48Z

Summary:

Fixes #4172 - ag.metrics.duration.cumulative query↔ingest schema asymmetry that caused spans to be silently dropped on re-ingest.

During ingestion, parsing.py computed wall-clock duration and wrote it as a raw scalar float. The original Pydantic model (AgMetricEntryAttributes) declared cumulative as Optional[Metrics] where Metrics = Dict[str, NumericJson] - a dict, not a scalar. This meant any span queried via /api/tracing/spans/query and re-submitted to /api/tracing/spans/ingest failed Pydantic validation silently, returning {"count": 0, "links": []} with no error surfaced to the caller.

Per the team's documented decision in docs/design/homomorphic-tracing-io/findings.md, the canonical shape for duration.cumulative is a scalar float. The production fixes (AgScalarMetricEntryAttributes in tracing.py, _coerce_scalar_metric_value in attributes.py, scalar write in parsing.py) were already present in the codebase. This PR adds the missing test coverage for those changes.

Testing:

Verified locally

# New tests only
uv run pytest oss/tests/pytest/unit/tracing/utils/test_parsing.py \
  -v -k "duration_cumulative" --no-header
# 2 passed

# Full tracing suite
uv run pytest oss/tests/pytest/unit/tracing/ -v --no-header -p no:warnings
# 74 passed

# Full unit suite
uv run pytest oss/tests/pytest/unit/ -q --no-header -p no:warnings
# 525 passed

# Linting and formatting
uv run ruff check oss/src/core/tracing/ oss/tests/pytest/unit/tracing/
uv run ruff format --check oss/tests/pytest/unit/tracing/utils/test_parsing.py
# All checks passed

Added or updated tests:

Two new unit tests added to

api/oss/tests/pytest/unit/tracing/utils/test_parsing.py:
test_parse_spans_from_request_accepts_scalar_duration_cumulative_through_ingest - Verifies that a span with metrics.duration.cumulative set as a scalar float passes through the full parse_spans_from_request pipeline without error and the output value is a plain float, not a dict. Covers the canonical ingest path.
test_parse_spans_from_request_normalizes_legacy_dict_duration_cumulative_to_scalar - Verifies backward compatibility: a span carrying the old dict shape duration.cumulative = {"total": 1922.653} (as written by older ingestion code
or replayed from a query response) is normalized to the canonical scalar 1922.653 by _coerce_scalar_metric_value inside initialize_ag_attributes. No timestamps are supplied so the computed-duration override path is skipped,
isolating the normalization behavior.

QA follow-up:

Verify that existing traces in production with duration.cumulative as a scalar are still returned correctly by /api/tracing/spans/query.
Verify that a full query→re-ingest round-trip (the exact repro from the issue) no longer returns {"count": 0, "links": []}.
Verify that traces ingested by older SDK versions (which may have written duration.cumulative = {"total": v}) are normalized correctly on the way through ingest without requiring a backfill.

Demo:

N/A - no UI changes.

Checklist:

I have included a video or screen recording for UI changes, or marked Demo as N/A.
Relevant tests pass locally.
Relevant linting and formatting pass locally.
I have signed the CLA, or I will sign it when the bot prompts me.

Contributor Resources:

…e scalar shape Signed-off-by: justin212407 <charlesjustin2124@gmail.com>

vercel · 2026-06-07T20:02:54Z

@justin212407 is attempting to deploy a commit to the agenta projects Team on Vercel.

A member of the Team first needs to authorize it.

coderabbitai · 2026-06-07T20:03:08Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 6c97097f-8653-4ae8-a515-9eb95415bdfa

📥 Commits

Reviewing files that changed from the base of the PR and between 092b3d6 and d7a28b3.

📒 Files selected for processing (1)

api/oss/tests/pytest/unit/tracing/utils/test_parsing.py

🚧 Files skipped from review as they are similar to previous changes (1)

api/oss/tests/pytest/unit/tracing/utils/test_parsing.py

📝 Walkthrough

Summary by CodeRabbit

Tests
- Added unit tests verifying cumulative duration metrics are normalized: canonical scalar values remain scalars, legacy dict forms are coerced to canonical scalar, and when start/end timestamps are present the cumulative duration is derived from wall-clock time.

Walkthrough

Adds two pytest unit tests for parse_spans_from_request covering metric-duration normalization: one ensures a scalar cumulative duration is overwritten by computed wall-clock duration when timestamps exist; the other ensures legacy dict-shaped cumulative metrics are normalized to a scalar when timestamps are absent.

Changes

Metric duration ingestion normalization tests

Layer / File(s)	Summary
Scalar and dict metric duration handling `api/oss/tests/pytest/unit/tracing/utils/test_parsing.py`	Two tests: one verifies a scalar `ag.metrics.duration.cumulative` is overwritten by computed wall-clock duration when `start_time`/`end_time` exist and remains a numeric scalar; the other verifies legacy dict-shaped `{"total": ...}` cumulative metrics are normalized to a canonical scalar when timestamps are absent.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 60.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly identifies the main change: adding round-trip tests for ag.metrics.duration.cumulative normalization as part of the tracing fix.
Description check	✅ Passed	The description comprehensively explains the issue (`#4172`), the fix, test coverage, verification steps, and clearly relates to the changeset of adding unit tests.
Linked Issues check	✅ Passed	The PR adds test coverage for the production fixes already present in the codebase to address issue `#4172`, validating both scalar ingest path and legacy dict normalization as specified in the issue requirements.
Out of Scope Changes check	✅ Passed	All changes are focused on adding unit test coverage for duration.cumulative normalization; no unrelated modifications to production code or other areas are present.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds unit tests around parse_spans_from_request to ensure duration metric normalization and precedence rules are stable across ingest/parsing.

Changes:

Adds a test ensuring scalar duration.cumulative is accepted and overridden by computed duration when timestamps are present.
Adds a test ensuring legacy {"total": v} dict shape for duration.cumulative is normalized to a scalar when timestamps are absent.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

justin212407 · 2026-06-13T07:30:13Z

+    # The computed duration (1000ms from timestamps) overwrites the pre-set value;
+    # the key contract is that the result is a plain scalar, not a dict.
+    assert isinstance(metrics["duration"]["cumulative"], (int, float))
+    assert metrics["duration"]["cumulative"] == 1000.0


Fixed - replaced bare float equality with pytest.approx(1000.0).

justin212407 · 2026-06-13T07:30:32Z

+    metrics = parsed[0].attributes["ag"]["metrics"]
+    # Legacy {"total": v} dict is collapsed to the canonical scalar by
+    # _coerce_scalar_metric_value inside initialize_ag_attributes.
+    assert metrics["duration"]["cumulative"] == 1922.653


Fixed - added isinstance(metrics["duration"]["cumulative"], (int, float)) scalar contract check and switched to pytest.approx(1922.653).

justin212407 · 2026-06-13T07:30:44Z

    assert len(response[0].span_id) == 16


+def test_parse_spans_from_request_accepts_scalar_duration_cumulative_through_ingest():


Fixed - renamed to test_ingest_accepts_scalar_duration_cumulative.

justin212407 · 2026-06-13T07:30:58Z

+    assert metrics["duration"]["cumulative"] == 1000.0
+
+
+def test_parse_spans_from_request_normalizes_legacy_dict_duration_cumulative_to_scalar():


Fixed - renamed to test_ingest_normalizes_legacy_dict_duration_cumulative_to_scalar

junaway · 2026-06-12T08:07:36Z

Hi @justin212407,

Could you please

fix the agents' comments, then
leave a clear 'Fixed' comment on each of them, then
run all tests locally, and then
let us know when you're done ?

Copilot

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 4 comments.

+    # The computed duration (1000ms from timestamps) overwrites the pre-set value;
+    # the key contract is that the result is a plain scalar, not a dict.
+    assert isinstance(metrics["duration"]["cumulative"], (int, float))
+    assert metrics["duration"]["cumulative"] == 1000.0


+    metrics = parsed[0].attributes["ag"]["metrics"]
+    # Legacy {"total": v} dict is collapsed to the canonical scalar by
+    # _coerce_scalar_metric_value inside initialize_ag_attributes.
+    assert metrics["duration"]["cumulative"] == 1922.653


+    span = OTelSpan(
+        trace_id=TRACE_HEX,
+        span_id=SPAN_HEX,
+        span_name="root",
+        start_time=1_700_000_000,
+        end_time=1_700_000_001,
+        attributes={
+            "ag": {
+                "metrics": {
+                    "duration": {"cumulative": 1922.653},
+                }
+            }
+        },
+    )


+    span = OTelSpan(
+        trace_id=TRACE_HEX,
+        span_id=SPAN_HEX,
+        span_name="root",
+        attributes={
+            "ag": {
+                "metrics": {
+                    "duration": {"cumulative": {"total": 1922.653}},
+                }
+            }
+        },
+    )


Signed-off-by: justin212407 <charlesjustin2124@gmail.com>

justin212407 · 2026-06-13T07:32:28Z

Hi @justin212407,

Could you please

fix the agents' comments, then

leave a clear 'Fixed' comment on each of them, then

run all tests locally, and then
let us know when you're done ?

Tests pass locally: 2 new tests , full tracing suite 80 passed , full unit suite 737 passed , ruff check and format check clean

Copilot

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 3 comments.

+def test_ingest_accepts_scalar_duration_cumulative():
+    # Canonical scalar shape in attributes — the ingest pipeline accepts it without
+    # error. When start_time and end_time are present the computed wall-clock duration
+    # takes precedence (1_700_000_001 - 1_700_000_000 = 1s = 1000ms).
+    span = OTelSpan(
+        trace_id=TRACE_HEX,
+        span_id=SPAN_HEX,
+        span_name="root",
+        start_time=1_700_000_000,
+        end_time=1_700_000_001,
+        attributes={
+            "ag": {
+                "metrics": {
+                    "duration": {"cumulative": 1922.653},
+                }
+            }
+        },
+    )


justin212407 · 2026-06-13T09:51:54Z

    assert len(response[0].span_id) == 16


+def test_ingest_accepts_scalar_duration_cumulative():


These names were explicitly shortened in the previous review round after Copilot flagged the original test_parse_spans_from_request_accepts_scalar_duration_cumulative_through_ingest names as too long. Reverting to longer names would reintroduce the previous violation. Deferring to @junaway yo make the final call on naming convention.

+    assert metrics["duration"]["cumulative"] == pytest.approx(1000.0)
+
+
+def test_ingest_normalizes_legacy_dict_duration_cumulative_to_scalar():


Signed-off-by: justin212407 <charlesjustin2124@gmail.com>

Copilot

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 4 comments.

    assert len(response[0].span_id) == 16


+def test_ingest_accepts_scalar_duration_cumulative():


+    assert metrics["duration"]["cumulative"] == pytest.approx(1000.0)
+
+
+def test_ingest_normalizes_legacy_dict_duration_cumulative_to_scalar():


+        start_time=1_700_000_000,
+        end_time=1_700_000_001,


+    # The computed duration (1000ms from timestamps) overwrites the pre-set value;
+    # the key contract is that the result is a plain scalar, not a dict.
+    assert isinstance(metrics["duration"]["cumulative"], (int, float))
+    assert metrics["duration"]["cumulative"] == pytest.approx(1000.0)


test(tracing): add round-trip tests for ag.metrics.duration.cumulativ…

c5223d0

…e scalar shape Signed-off-by: justin212407 <charlesjustin2124@gmail.com>

Copilot AI review requested due to automatic review settings June 7, 2026 20:02

dosubot Bot added the size:XS This PR changes 0-9 lines, ignoring generated files. label Jun 7, 2026

dosubot Bot added the tests label Jun 7, 2026

Copilot AI reviewed Jun 7, 2026

View reviewed changes

mmabrouk requested a review from jp-agenta June 9, 2026 07:58

Merge branch 'main' into test-tracing-fix

cd2aa47

Merge branch 'main' into test-tracing-fix

1decd6a

Copilot AI review requested due to automatic review settings June 13, 2026 07:27

Copilot AI reviewed Jun 13, 2026

View reviewed changes

test(tracing): address review comments on duration.cumulative tests

092b3d6

Signed-off-by: justin212407 <charlesjustin2124@gmail.com>

justin212407 requested a review from Copilot June 13, 2026 07:32

Copilot AI reviewed Jun 13, 2026

View reviewed changes

justin212407 and others added 2 commits June 13, 2026 15:23

Merge branch 'main' into test-tracing-fix

15775b5

test(tracing): clarify timestamp unit

d7a28b3

Signed-off-by: justin212407 <charlesjustin2124@gmail.com>

Copilot AI review requested due to automatic review settings June 13, 2026 09:59

Copilot AI reviewed Jun 13, 2026

View reviewed changes

		assert len(response[0].span_id) == 16


		def test_parse_spans_from_request_accepts_scalar_duration_cumulative_through_ingest():

		assert metrics["duration"]["cumulative"] == 1000.0


		def test_parse_spans_from_request_normalizes_legacy_dict_duration_cumulative_to_scalar():

		assert len(response[0].span_id) == 16


		def test_ingest_accepts_scalar_duration_cumulative():

		assert metrics["duration"]["cumulative"] == pytest.approx(1000.0)


		def test_ingest_normalizes_legacy_dict_duration_cumulative_to_scalar():

Conversation

justin212407 commented Jun 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary:

Testing:

Verified locally

Added or updated tests:

QA follow-up:

Demo:

Checklist:

Contributor Resources:

Uh oh!

vercel Bot commented Jun 7, 2026

Uh oh!

coderabbitai Bot commented Jun 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Walkthrough

Changes

Estimated code review effort

❌ Failed checks (1 warning)

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

justin212407 Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

justin212407 Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

justin212407 Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

justin212407 Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

junaway commented Jun 12, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

justin212407 commented Jun 13, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

justin212407 Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

justin212407 commented Jun 7, 2026 •

edited

Loading

coderabbitai Bot commented Jun 7, 2026 •

edited

Loading