Skip to content

Post sticky PR comment summarizing CI test job failures#1412

Open
rgsl888prabhu wants to merge 3 commits into
mainfrom
pr-test-summary-comment
Open

Post sticky PR comment summarizing CI test job failures#1412
rgsl888prabhu wants to merge 3 commits into
mainfrom
pr-test-summary-comment

Conversation

@rgsl888prabhu

@rgsl888prabhu rgsl888prabhu commented Jun 9, 2026

Copy link
Copy Markdown
Collaborator

Add a pr-test-summary job that runs after all test jobs complete and posts a sticky comment on the PR listing which matrix jobs failed with direct links to their logs. The comment updates in-place on re-run.

@rgsl888prabhu rgsl888prabhu requested a review from a team as a code owner June 9, 2026 20:36
@rgsl888prabhu rgsl888prabhu requested a review from msarahan June 9, 2026 20:36
@coderabbitai

coderabbitai Bot commented Jun 9, 2026

Copy link
Copy Markdown

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds a workflow job that runs after tests and invokes ci/utils/pr_test_summary.py to aggregate test job outcomes, extract failures/crashes from logs, and post or update a sticky PR comment with results.

Changes

PR Test Result Aggregation

Layer / File(s) Summary
Test result aggregation script
ci/utils/pr_test_summary.py
A Python utility reads GitHub Actions environment variables, paginates workflow run jobs, filters test jobs by name prefix, categorizes conclusions (failed/passed/skipped), downloads and truncates job logs, detects crash patterns, parses pytest “short test summary info” to collect failing test identifiers (capped per job), builds a markdown summary with counts, log links, and per-job details, and posts or patches a sticky PR comment using the GitHub REST API.
Workflow job for test summary
.github/workflows/pr.yaml
A new pr-test-summary job runs after specified test jobs, performs a sparse checkout of only ci/utils/pr_test_summary.py, executes it with GH_TOKEN, and grants pull-requests: write (and actions: read) to allow updating the PR comment.
Demo failing tests
cpp/tests/routing/unit_tests/breaks.cu, python/cuopt/cuopt/tests/test_ci_summary_demo.py
Added intentionally failing tests (one GoogleTest, two Python tests) to produce CI failures used to verify and demonstrate the PR test summary comment behavior.

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 28.57% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main change: adding a GitHub Actions job that posts a sticky PR comment summarizing CI test job failures.
Description check ✅ Passed The description clearly explains the purpose and functionality of the changes, relating directly to the PR's implementation of a pr-test-summary job and sticky PR comments.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch pr-test-summary-comment

Comment @coderabbitai help to get the list of available commands and usage tips.

Comment thread ci/utils/pr_test_summary.py Outdated
)
lines += ["", "| Job | Logs |", "|-----|------|"]
for job in failed:
lines.append(f"| ❌ `{job['name']}` | [View logs]({job['html_url']}) |")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Surfacing the names of the tests that failed is pretty useful, how hard would it be to bring that back?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test names are already surfaced — _analyze_job_log parses the pytest short test summary info section from each failed job's log and includes individual test IDs in a collapsible dropdown per job. This was added in the second commit on this PR.

@rgsl888prabhu rgsl888prabhu force-pushed the pr-test-summary-comment branch from b88771d to 418777d Compare June 9, 2026 20:43
@github-actions

github-actions Bot commented Jun 9, 2026

Copy link
Copy Markdown

CI Test Summary

✅ All 0 test job(s) passed.

@rgsl888prabhu rgsl888prabhu force-pushed the pr-test-summary-comment branch from 418777d to 9fbc08f Compare June 9, 2026 21:40

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
ci/utils/pr_test_summary.py (1)

103-105: 💤 Low value

Optional: timestamp stripping heuristic could be more robust.

The current heuristic (split on "Z " and check length < 35) works for GitHub Actions timestamps but could incorrectly strip short prefixes in edge cases (e.g., "ABC Z content" → "content").

Since pytest output is predictable, the risk is low and the current implementation is acceptable. If you want perfect robustness, consider a regex like r'^\d{4}-\d{2}-\d{2}T[\d:.]+Z '.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ci/utils/pr_test_summary.py` around lines 103 - 105, The timestamp-stripping
heuristic around variables raw, parts, and line can falsely strip non-timestamp
prefixes; replace the split/length check with a strict regex match using
r'^\d{4}-\d{2}-\d{2}T[\d:.]+Z ' to detect GHA timestamps, and only remove that
prefix when the regex matches (e.g., use re.match on raw and slice off the
matched span), keeping the rest of the logic that assigns to line unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@ci/utils/pr_test_summary.py`:
- Around line 170-172: The current categorization in pr_test_summary.py filters
test_jobs into failed, passed, and skipped using explicit conclusion values and
ignores other conclusions (e.g., "timed_out", "cancelled"), causing incorrect
counts; update the logic around the failed/passed/skipped variables (those list
comprehensions that produce failed, passed, skipped from test_jobs) so that
passed only includes conclusion == "success", skipped only includes conclusion
== "skipped", and treat any other conclusion as failure (either by changing the
failed comprehension to include all non-success/non-skipped conclusions or by
computing failed = total_jobs - len(passed) - len(skipped)) to ensure totals
always sum to the number of test_jobs.

---

Nitpick comments:
In `@ci/utils/pr_test_summary.py`:
- Around line 103-105: The timestamp-stripping heuristic around variables raw,
parts, and line can falsely strip non-timestamp prefixes; replace the
split/length check with a strict regex match using r'^\d{4}-\d{2}-\d{2}T[\d:.]+Z
' to detect GHA timestamps, and only remove that prefix when the regex matches
(e.g., use re.match on raw and slice off the matched span), keeping the rest of
the logic that assigns to line unchanged.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: ce8da740-23e0-441f-9cd4-7d56beab2bcd

📥 Commits

Reviewing files that changed from the base of the PR and between b88771d and 9fbc08f.

📒 Files selected for processing (2)
  • .github/workflows/pr.yaml
  • ci/utils/pr_test_summary.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • .github/workflows/pr.yaml

Comment thread ci/utils/pr_test_summary.py Outdated
Posts a sticky comment on each PR listing which matrix test jobs failed,
with direct links to their logs. Updates on re-run; clears to a pass
message if all jobs succeed.

Uses only actions/github-script (already in repo) and the built-in
GITHUB_TOKEN with pull-requests: write — no S3, no AWS credentials,
no new third-party actions.
@rgsl888prabhu rgsl888prabhu force-pushed the pr-test-summary-comment branch from 9fbc08f to c2950df Compare June 9, 2026 21:46
@rgsl888prabhu rgsl888prabhu self-assigned this Jun 9, 2026
@rgsl888prabhu rgsl888prabhu added non-breaking Introduces a non-breaking change improvement Improves an existing functionality labels Jun 9, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@ci/utils/pr_test_summary.py`:
- Around line 58-63: The GitHub API calls currently use urllib.request.urlopen
without a timeout; add a shared timeout constant (e.g., TIMEOUT_SECONDS = 10)
and pass it to every urlopen call (for example in _paginate and the other two
call sites that use urllib.request.urlopen) as urllib.request.urlopen(req,
timeout=TIMEOUT_SECONDS); update all occurrences to use this constant so every
HTTP request to the GitHub REST API has an explicit timeout.
- Around line 130-133: The parser only looks for lines starting with "FAILED "
so ERROR-summary lines are missed; update the extraction logic (the block that
currently checks if line.startswith("FAILED ")) to also handle lines starting
with "ERROR " (e.g., check startswith for both "FAILED " and "ERROR "), then
extract the test identifier the same way into the failed list (retain the
existing test_id = line[7:].split(" - ")[0].strip() logic or adjust slicing if
you normalize by splitting on the first space). Ensure this change is applied
where the failed list is appended so ERROR entries appear in the comment
details.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 50acfa7f-8113-4bc0-a79c-eecf289cb47e

📥 Commits

Reviewing files that changed from the base of the PR and between 9fbc08f and c2950df.

📒 Files selected for processing (2)
  • .github/workflows/pr.yaml
  • ci/utils/pr_test_summary.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • .github/workflows/pr.yaml

Comment thread ci/utils/pr_test_summary.py Outdated
Comment thread ci/utils/pr_test_summary.py Outdated
Two test failures (one Python, one C++) to verify the PR test summary
comment correctly captures and displays failed test names per job.
Remove before merging.
@rgsl888prabhu rgsl888prabhu requested review from a team as code owners June 9, 2026 21:56
- Add HTTP timeout to all urlopen calls
- Treat timed_out/cancelled jobs as failures in counts
- Capture ERROR lines alongside FAILED in pytest summary parsing

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cpp/tests/routing/unit_tests/breaks.cu`:
- Around line 433-437: The failing GoogleTest TEST(vehicle_breaks,
ci_summary_demo_failure) won't be picked up by ci/utils/pr_test_summary.py
because that parser only extracts IDs from pytest's short test summary; fix by
either removing or converting this intentional failure to a pytest-based test so
the per-test extraction is exercised, or update the parser
(ci/utils/pr_test_summary.py) to explicitly support GTest output (e.g., parse
GoogleTest's short summary or the XML produced via --gtest_output to extract
test names such as vehicle_breaks.ci_summary_demo_failure) and ensure the CI job
emits that GTest output for the parser to consume.
- Around line 433-437: Remove the intentional failing test TEST(vehicle_breaks,
ci_summary_demo_failure) from cpp/tests/routing/unit_tests/breaks.cu (or move it
to a throwaway branch) so CI no longer fails; alternatively, wrap it behind an
explicit opt-in guard (e.g., a preprocessor flag or test filter) so it never
runs in normal PR CI, ensuring the file only contains tests that validate
numerical correctness/regressions.

In `@python/cuopt/cuopt/tests/test_ci_summary_demo.py`:
- Around line 4-17: Remove the intentional failing demo tests from the pytest
suite: delete or relocate python/cuopt/cuopt/tests/test_ci_summary_demo.py
(which contains test_ci_summary_demo_failure and test_ci_summary_demo_failure_2)
out of the python/**/tests/** tree into an opt-in or demo-only path so they no
longer run in the default PR test matrix; if relocating, ensure the new path is
excluded from normal pytest discovery (or only run via an explicit demo CI job).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: edad9b32-c92d-4a15-a868-547e28b66120

📥 Commits

Reviewing files that changed from the base of the PR and between c2950df and ea62e02.

📒 Files selected for processing (2)
  • cpp/tests/routing/unit_tests/breaks.cu
  • python/cuopt/cuopt/tests/test_ci_summary_demo.py

Comment on lines +433 to +437
// Intentional failure to exercise the PR test summary comment feature.
TEST(vehicle_breaks, ci_summary_demo_failure)
{
EXPECT_TRUE(false) << "Intentional failure: remove after verifying PR summary comment.";
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

This C++ demo does not exercise per-test name extraction.

ci/utils/pr_test_summary.py only pulls test IDs out of pytest’s short test summary info, so this GoogleTest failure will show the job as failed but will not populate the per-job failed-test details. If C++ per-test names are part of the validation goal, the parser needs explicit GTest support first.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cpp/tests/routing/unit_tests/breaks.cu` around lines 433 - 437, The failing
GoogleTest TEST(vehicle_breaks, ci_summary_demo_failure) won't be picked up by
ci/utils/pr_test_summary.py because that parser only extracts IDs from pytest's
short test summary; fix by either removing or converting this intentional
failure to a pytest-based test so the per-test extraction is exercised, or
update the parser (ci/utils/pr_test_summary.py) to explicitly support GTest
output (e.g., parse GoogleTest's short summary or the XML produced via
--gtest_output to extract test names such as
vehicle_breaks.ci_summary_demo_failure) and ensure the CI job emits that GTest
output for the parser to consume.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Remove this unconditional red test before merge.

This keeps the C++ test job failing on every run and does not validate solver behavior or a real regression. If you still need it for one-off validation, keep it on a throwaway branch or behind an explicit opt-in path that never runs in normal PR CI.

🧹 Proposed fix
-// Intentional failure to exercise the PR test summary comment feature.
-TEST(vehicle_breaks, ci_summary_demo_failure)
-{
-  EXPECT_TRUE(false) << "Intentional failure: remove after verifying PR summary comment.";
-}

As per coding guidelines, cpp/tests/** should validate numerical correctness/regressions rather than ship intentionally failing tests, and the PR objective already describes these as temporary demo failures to remove before merging.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// Intentional failure to exercise the PR test summary comment feature.
TEST(vehicle_breaks, ci_summary_demo_failure)
{
EXPECT_TRUE(false) << "Intentional failure: remove after verifying PR summary comment.";
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cpp/tests/routing/unit_tests/breaks.cu` around lines 433 - 437, Remove the
intentional failing test TEST(vehicle_breaks, ci_summary_demo_failure) from
cpp/tests/routing/unit_tests/breaks.cu (or move it to a throwaway branch) so CI
no longer fails; alternatively, wrap it behind an explicit opt-in guard (e.g., a
preprocessor flag or test filter) so it never runs in normal PR CI, ensuring the
file only contains tests that validate numerical correctness/regressions.

Source: Coding guidelines

Comment on lines +4 to +17
# Intentional failures to exercise the PR test summary comment feature.
# Remove after verifying the summary comment shows correct output.


def test_ci_summary_demo_failure():
assert False, (
"Intentional failure: remove after verifying PR summary comment."
)


def test_ci_summary_demo_failure_2():
raise RuntimeError(
"Intentional error: remove after verifying PR summary comment."
)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Drop these demo failures from the checked-in pytest suite.

They keep conda-python-tests red on every run and do not cover cuOpt behavior, edge cases, or a regression. Please remove the file before merge, or move this validation to an opt-in/demo-only path outside the normal PR test matrix.

🧹 Proposed fix
-# Intentional failures to exercise the PR test summary comment feature.
-# Remove after verifying the summary comment shows correct output.
-
-
-def test_ci_summary_demo_failure():
-    assert False, (
-        "Intentional failure: remove after verifying PR summary comment."
-    )
-
-
-def test_ci_summary_demo_failure_2():
-    raise RuntimeError(
-        "Intentional error: remove after verifying PR summary comment."
-    )

As per coding guidelines, python/**/tests/** should validate numerical correctness and regressions, and the PR objective says these failures are only temporary for feature verification.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Intentional failures to exercise the PR test summary comment feature.
# Remove after verifying the summary comment shows correct output.
def test_ci_summary_demo_failure():
assert False, (
"Intentional failure: remove after verifying PR summary comment."
)
def test_ci_summary_demo_failure_2():
raise RuntimeError(
"Intentional error: remove after verifying PR summary comment."
)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@python/cuopt/cuopt/tests/test_ci_summary_demo.py` around lines 4 - 17, Remove
the intentional failing demo tests from the pytest suite: delete or relocate
python/cuopt/cuopt/tests/test_ci_summary_demo.py (which contains
test_ci_summary_demo_failure and test_ci_summary_demo_failure_2) out of the
python/**/tests/** tree into an opt-in or demo-only path so they no longer run
in the default PR test matrix; if relocating, ensure the new path is excluded
from normal pytest discovery (or only run via an explicit demo CI job).

Source: Coding guidelines

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
ci/utils/pr_test_summary.py (1)

141-145: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Keep skipped-job counts in the green-path summary.

Line 202 now preserves skipped jobs correctly, but Line 144 still renders All {len(passed)} test job(s) passed. whenever failed is empty. That makes the sticky comment wrong on green runs with skipped matrix jobs and misses the PR objective of reporting pass/fail/skip totals across the full matrix.

💡 Minimal fix
 def _build_body(failed, passed, skipped, job_analysis):
     lines = [_MARKER, "## CI Test Summary", ""]
     if not failed:
-        lines.append(f"✅ All {len(passed)} test job(s) passed.")
+        lines.append(
+            f"✅ 0 failed · {len(passed)} passed · {len(skipped)} skipped"
+        )
     else:
         lines.append(
             f"**{len(failed)} failed** · {len(passed)} passed · {len(skipped)} skipped"
         )

Also applies to: 200-202

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ci/utils/pr_test_summary.py` around lines 141 - 145, The green-path summary
in _build_body currently prints "All {len(passed)} test job(s) passed." and
omits skipped counts; update the branch where failed is empty to include skipped
in the message (e.g., "All X passed, Y skipped" or "X passed and Y skipped"
using len(passed) and len(skipped)), and ensure any other place in _build_body
that renders the all-passed message (see the same logic around the lines
handling passed/skipped at the end of the function) is updated consistently so
the sticky comment reports pass/fail/skip totals across the full matrix.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@ci/utils/pr_test_summary.py`:
- Around line 141-145: The green-path summary in _build_body currently prints
"All {len(passed)} test job(s) passed." and omits skipped counts; update the
branch where failed is empty to include skipped in the message (e.g., "All X
passed, Y skipped" or "X passed and Y skipped" using len(passed) and
len(skipped)), and ensure any other place in _build_body that renders the
all-passed message (see the same logic around the lines handling passed/skipped
at the end of the function) is updated consistently so the sticky comment
reports pass/fail/skip totals across the full matrix.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: a3d89df2-775a-435d-9608-0e605c043ed8

📥 Commits

Reviewing files that changed from the base of the PR and between ea62e02 and efe1d66.

📒 Files selected for processing (1)
  • ci/utils/pr_test_summary.py

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

improvement Improves an existing functionality non-breaking Introduces a non-breaking change

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants