Eval regression detected (2026-05-07) #27

@github-actions

Regression Report: math-20

Baseline: 0.2.0-cerebras-llama3.1-8b
Current: ci-monitor
Status: REGRESSION DETECTED

Metrics

Metric              Baseline    Current     Change
accuracy            0.9000      0.8500      -5.6%
avg_latency         11.2776     3.3946      -69.9%
total_tokens        3194.0000   1966.0000   -38.4%
avg_tool_accuracy   0.0000      0.0000      N/A
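The Change column is consistent with a plain relative change, (current - baseline) / baseline, with "N/A" printed when the baseline is zero (as for avg_tool_accuracy). The sketch below reproduces the column from the Baseline and Current values. The formula is inferred from the numbers in this report, not taken from the reporting tool's source, and `percent_change` is a hypothetical helper name.

```python
def percent_change(baseline: float, current: float) -> str:
    """Format the relative change from baseline to current, e.g. "-5.6%".

    Assumption (inferred from the table, not from the tool): the change is
    (current - baseline) / baseline, and a zero baseline yields "N/A"
    because the ratio is undefined.
    """
    if baseline == 0:
        return "N/A"
    return f"{(current - baseline) / baseline * 100:.1f}%"


# Baseline and current values copied from the Metrics table of this report.
rows = {
    "accuracy": (0.9000, 0.8500),
    "avg_latency": (11.2776, 3.3946),
    "total_tokens": (3194.0, 1966.0),
    "avg_tool_accuracy": (0.0, 0.0),
}

for name, (baseline, current) in rows.items():
    print(f"{name:<18} {baseline:>10.4f} {current:>10.4f} {percent_change(baseline, current):>7}")
```

Running it yields -5.6%, -69.9%, -38.4% and N/A, matching the table.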

Alerts

  • [WARNING] math-20/accuracy: 0.9000 -> 0.8500 (-5.6%, threshold: 5%)
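Only accuracy triggers an alert even though avg_latency and total_tokens changed by far more than 5%. That is consistent with a direction-aware rule: a metric regresses only when it moves in its "bad" direction by more than the threshold, and lower latency and token counts are improvements. The sketch below reproduces the alert line under those assumptions. The `Metric` class, the `higher_is_better` flag, `find_alerts`, and the fixed "WARNING" severity are all hypothetical; the real tool may configure directions and severities differently.

```python
from dataclasses import dataclass


@dataclass
class Metric:
    name: str
    baseline: float
    current: float
    # Assumption: each metric's "good" direction is configured explicitly.
    higher_is_better: bool


def find_alerts(suite: str, metrics: list[Metric], threshold_pct: float = 5.0) -> list[str]:
    """Return alert lines for metrics that worsened by more than threshold_pct percent."""
    alerts = []
    for m in metrics:
        if m.baseline == 0:
            continue  # relative change is undefined (shown as N/A), so no alert
        change = (m.current - m.baseline) / m.baseline * 100
        # Flip the sign so that a positive value always means "got worse".
        worsened = -change if m.higher_is_better else change
        if worsened > threshold_pct:
            # Assumption: severity is always WARNING in this sketch.
            alerts.append(
                f"[WARNING] {suite}/{m.name}: {m.baseline:.4f} -> {m.current:.4f} "
                f"({change:.1f}%, threshold: {threshold_pct:g}%)"
            )
    return alerts


metrics = [
    Metric("accuracy", 0.9000, 0.8500, higher_is_better=True),
    Metric("avg_latency", 11.2776, 3.3946, higher_is_better=False),
    Metric("total_tokens", 3194.0, 1966.0, higher_is_better=False),
    Metric("avg_tool_accuracy", 0.0, 0.0, higher_is_better=True),
]

for line in find_alerts("math-20", metrics):
    print(line)
```

With the values from this report, the only line printed is the accuracy warning shown above; a non-empty alert list would correspond to the REGRESSION DETECTED status.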

Metadata

Assignees: No one assigned
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests

    Issue actions