Add copilot-pr-autopilot skill #1944
Open: yeelam-gordon wants to merge 1 commit into `github:staged` from `yeelam-gordon:add-copilot-review-autopilot-skill`
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,153 @@ | ||
| --- | ||
| name: copilot-pr-autopilot | ||
| description: 'Copilot left 14 review comments on your PR — half are nits. Hours of fix → reply → resolve → re-request, and each round lands MORE comments. This skill runs loop engineering: auto-triggers Copilot Code Review via GraphQL (no @copilot mention), triages every open thread (Copilot, humans, advanced-security) with a fix / decline / escalate rubric, dispatches parallel fix sub-agents that obey the repo build/test/lint conventions, commits per iteration, replies+resolves citing the pushed SHA, then re-triggers until HEAD is reviewed with zero threads awaiting the agent''s reply (remaining open threads are explicit hand-offs to the human — escalated declines, design tradeoffs). You merge a clean PR; the bot runs it. Trigger phrases: "address copilot comments", "run a copilot review loop", "fix this PR", "iterate on copilot feedback". Repo-agnostic, gh CLI + PowerShell. Full autopilot needs repo Triage/Write; external PR authors get single-iteration mode plus manual re-trigger (UI 🔄 or substantive-commit push).' | ||
| --- | ||
|
|
||
| # Copilot PR Autopilot | ||
|
|
||
| Drive any GitHub pull request through repeated rounds of Copilot code | ||
| review until the agent has done its job — every Copilot finding has | ||
| a reply from the agent (fix-acknowledgement, decline-with-rationale, | ||
| or explicit escalate-to-user hand-off). Remaining open threads, if | ||
| any, are deliberate hand-offs to the human merge owner — they're | ||
| not loop failures. Repository-agnostic — works on any repo that has | ||
| Copilot Code Review enabled, run from a machine with `gh` CLI | ||
| installed and authenticated (see Prerequisites). | ||
|
|
||
| ## When to Use This Skill | ||
|
|
||
| - The user asks to "request Copilot review" or "run a Copilot review loop" | ||
| on a PR. | ||
| - A PR is functionally complete and the user wants a final correctness pass | ||
| via repeated automated review rounds. | ||
| - A previous Copilot review on the PR has left open threads that need | ||
| triage, fixing, replying, and resolving. | ||
|
|
||
| ## When NOT to Use This Skill | ||
|
|
||
| - The PR is still under active design — wait until the structure is stable; | ||
| otherwise findings churn round-over-round. | ||
| - The user wants human reviewer feedback, not Copilot's. | ||
|
|
||
| ## Prerequisites | ||
|
|
||
| - `gh` CLI installed and authenticated against the target repository. | ||
| - PowerShell on PATH — Windows PowerShell 5.1+ (`powershell.exe`) or | ||
| PowerShell 7+ (`pwsh`). Both are tested. | ||
| - Copilot Code Review is the primary use case (`01-request-review.ps1` | ||
| uses GraphQL `requestReviewsByLogin` to trigger Copilot). It is | ||
| **NOT a hard requirement** — if `01-request-review.ps1` fails | ||
| because Copilot isn't enabled on the repo / account, the agent can | ||
| still drive existing review threads (human, advanced-security, etc.) | ||
| to completion by running steps 3–8 once as a single iteration; just | ||
| skip the trigger + wait. There is no auto-detect for "Copilot | ||
| unavailable" — the agent makes that decision after the trigger | ||
| fails (the script can't reliably tell "Copilot disabled" from | ||
| "Copilot enabled but not yet triggered" from API state alone). | ||
|
|
||
| ### Permissions: who can run the full loop | ||
|
|
||
| The full multi-round autopilot (steps 1 → 9 → 1) needs **Triage or Write** permission on the target repo, because GitHub's only public API for adding the Copilot bot as a reviewer (`requestReviewsByLogin`) is gated on that permission. Verified against the public REST + GraphQL surface in this PR's commit history — there is no public-API path for bot reviewers without write permission. | ||
|
|
||
| | You are… | What works | | ||
| |---|---| | ||
| | **Repo collaborator with Triage / Write** | Full loop: `01` triggers Copilot, `02` waits, `03`–`08` list / triage / fix / reply, loop back to `01`. Hands-off. | | ||
| | **External PR author (no write permission)** | `01` will throw a clear actionable error. Use `-SingleIteration` mode: address all current findings in one pass, then either click the UI 🔄 next to Copilot, **or** push a substantive commit (the `synchronize` event auto-triggers Copilot on most repos). Then re-run `02` to verify. | | ||
|
|
||
| In single-iteration mode the loop's convergence boolean is `Converged: true` iff `OpenThreadsAwaitingReply == 0` (the agent's side is done). The maintainer-side re-trigger then drives any additional rounds. | ||
|
|
||
| Every script dot-sources [scripts/_lib.ps1](scripts/_lib.ps1) which | ||
| runs `Assert-GhReady` on load: if `gh` is missing OR `gh auth status` | ||
| fails, the script halts **before any work** with a single actionable | ||
| error message naming the install command and `gh auth login`. The | ||
| agent should surface that message to the user verbatim and stop the | ||
| loop — do not retry or work around it. | ||
|
|
||
| ## Step-by-Step Workflow | ||
|
|
||
| > **The loop:** steps 1 → 2 → 3 → 4 → 5 → 6 → 7 → 8 → 9, then **back to step 1** if `Converged: false`. Repeat the 1→9 round until step 9 returns `Converged: true`; only then run step 10 once and call `task_complete`. | ||
|
|
||
| Each round runs steps 1–9; step 10 is a one-time cleanup after convergence. The parent agent coordinates; every sub-agent step runs in a fresh context with a bounded budget. Cross-cutting protocol (time-boxing, extension, single-iteration fallback): [orchestration.md](references/orchestration.md). | ||
|
|
||
| 1. **Request review** _(parent)_ — see [orchestration.md#step-1-request-review](references/orchestration.md#step-1-request-review) | ||
| 2. **Wait for review** _(sub-agent, 20-min cap)_ — see [02-wait.md](references/02-wait.md) | ||
|
yeelam-gordon marked this conversation as resolved.
|
||
| 3. **List + categorize open threads** _(sub-agent, 5 min)_ — see [03-list-threads.md](references/03-list-threads.md) | ||
| 4. **Triage** _(sub-agent, 5 min per ≤5 threads)_ — see [04-triage.md](references/04-triage.md) | ||
| 5. **Fix** _(sub-agents, parallel max 5, 5 min each)_ — see [05-fix.md](references/05-fix.md) | ||
| 6. **Build + test per repo conventions** _(sub-agent, 10 min)_ — see [06-build-test.md](references/06-build-test.md) | ||
| 7. **Commit + push** _(parent)_ — see [orchestration.md#step-7-commit-and-push](references/orchestration.md#step-7-commit-and-push) | ||
| 8. **Reply (always) + resolve (conditional)** _(sub-agent drafts, parent posts)_ — see [08-reply-resolve.md](references/08-reply-resolve.md) | ||
| 9. **Convergence verify** _(sub-agent, 3 min)_ — see [09-convergence.md](references/09-convergence.md) | ||
| - **`Converged: false` → loop back to step 1** for another round (re-trigger, wait, list, triage, fix, push, reply, re-check). Each round addresses Copilot's findings on the previous round's HEAD; the loop terminates as soon as Copilot has nothing new to say AND every open thread has a reply from the agent. | ||
| - **`Converged: true` → exit the loop**, run step 10 once, call `task_complete` with the proof. | ||
| 10. **Cleanup outdated** _(parent, post-convergence, once)_ — see [orchestration.md#step-10-cleanup-outdated](references/orchestration.md#step-10-cleanup-outdated) | ||
|
|
||
| Convergence is computed by [scripts/02-check-review-status.ps1](scripts/02-check-review-status.ps1) as a single `Converged: true` boolean. Do **not** call `task_complete` until it returns true; print the proof (`HeadOid`, `LatestCopilotReview.commitOid`, `submittedAt`) in the completion message. | ||
|
|
||
| ## Gotchas | ||
|
|
||
| The bundled scripts enforce the hard correctness invariants (trigger landing via `copilot_work_started` event id, `Converged` requiring HEAD-match + zero-awaiting + at-HEAD review, single-iteration fallback semantics, PR-state guard). Trust them — don't re-derive. The notes below cover decisions the scripts can't make for you: | ||
|
|
||
| - **Reply to every open thread; resolve only when the loop owns the disposition.** For `fix` and `decline` threads, reply + resolve. For `escalate-to-user` threads, reply with the analysis but leave the thread OPEN (`08-reply-and-resolve.ps1 -NoResolve`) so the human merge owner can act on it. See [08-reply-resolve.md](references/08-reply-resolve.md). | ||
| - **Copilot threads are loop-owned; human / advanced-security / other-bot threads default to `escalate-to-user`.** Auto-resolving a human review thread can hide unaddressed concerns. See [04-triage.md](references/04-triage.md) for the rubric. | ||
| - **One focused commit per round, not one per PR.** Bundling rounds destroys the audit trail of which finding drove which change and breaks `git bisect`. See [orchestration.md#step-7-commit-and-push](references/orchestration.md#step-7-commit-and-push). | ||
| - **Build/test/lint with the repo's own commands** (per its `CONTRIBUTING` / `AGENTS` / `README` / `package.json` / `Makefile`) before pushing a fix. Discovery procedure: [06-build-test.md](references/06-build-test.md). | ||
| - **Push back with written rationale** when a Copilot finding would over-engineer the design for a hypothetical edge case. Auto-accepting every suggestion erodes the design — see the `decline` path in [04-triage.md](references/04-triage.md). | ||
| - **Scripting traps** (`gh api graphql -F` type-coercion, `git stash push -m` positional parsing, the three GraphQL traps for the reviewer mutation) are documented in [references/api-quirks.md](references/api-quirks.md). Read before modifying any script. | ||
|
|
||
| ## Troubleshooting | ||
|
|
||
| | Issue | Solution | | ||
| |-------|----------| | ||
| | Script throws `prerequisite missing — gh CLI is not on PATH` | Install `gh` (`winget install GitHub.cli` on Windows; `brew install gh` on macOS; package manager on Linux; or download from https://cli.github.com). Then `gh auth login`. Surface the message to the user and STOP the loop — do not retry. | | ||
| | Script throws `prerequisite missing — gh CLI is not authenticated` | Run `gh auth login`. STOP the loop until the user completes auth. | | ||
| | Trigger fails or no `copilot_work_started` event lands | Push a substantive (non-whitespace) commit — auto-assign on `synchronize` is the most reliable trigger. Persistent failure indicates Copilot Code Review may not be enabled on the repo / account (check repo Settings → Code & automation → Copilot, or account-level Copilot Pro/Pro+). | | ||
| | No new review after waiting ~10 min | Likely causes: a quiet period after a recent dismissal, or suppression of a trivial diff. Push a substantive commit and retry. Do not blindly re-run `01-request-review.ps1`: it reports `InFlight` while Copilot is still a requested reviewer. | | ||
| | Outdated-but-unresolved threads in the open list | Expected: unresolved state is the source of truth. Reply + resolve them like any other open thread. `10-cleanup-outdated.ps1` is only a final safety net. | | ||
| | Unsure whether to fix or decline a finding | See [references/04-triage.md](references/04-triage.md). | | ||
| | Need a reply phrasing for "fixed", "declined", "drift", or "partial" | See the templates under [templates/](templates/): [reply-fix.md](templates/reply-fix.md), [reply-decline.md](templates/reply-decline.md), [reply-drift.md](templates/reply-drift.md), [reply-partial.md](templates/reply-partial.md). | | ||
|
|
||
| ## References | ||
|
|
||
| - [references/orchestration.md](references/orchestration.md) — | ||
| parent-owned loop control: time-boxing & extension protocol, | ||
| sub-agent delegation map, steps 1 / 7 / 10 contracts, | ||
| single-iteration fallback, and loop-wide notes. | ||
| - Per-step sub-agent contracts: | ||
| [references/02-wait.md](references/02-wait.md), | ||
| [references/03-list-threads.md](references/03-list-threads.md), | ||
| [references/04-triage.md](references/04-triage.md) (includes the | ||
| fix-vs-decline rubric), | ||
| [references/05-fix.md](references/05-fix.md), | ||
| [references/06-build-test.md](references/06-build-test.md), | ||
| [references/08-reply-resolve.md](references/08-reply-resolve.md), | ||
| [references/09-convergence.md](references/09-convergence.md). | ||
| - [references/api-quirks.md](references/api-quirks.md) — verified | ||
| GitHub API behavior, dead-ends, and the GraphQL traps for the | ||
| reviewer mutation. | ||
| - Templates (one per reply type): | ||
| [templates/reply-fix.md](templates/reply-fix.md) — accepted-fix | ||
| pattern; [templates/reply-decline.md](templates/reply-decline.md) — | ||
| declined-with-rationale pattern; | ||
| [templates/reply-drift.md](templates/reply-drift.md) — | ||
| PR-description / comment / test-plan drift acknowledgement; | ||
| [templates/reply-partial.md](templates/reply-partial.md) — | ||
| partial fix with deferred follow-up. Cross-cutting reply guidance | ||
| and anti-patterns live in | ||
| [references/08-reply-resolve.md](references/08-reply-resolve.md#reply-guidance). | ||
| - [scripts/_lib.ps1](scripts/_lib.ps1) — shared helpers (`Invoke-Gh`, | ||
| `Invoke-GhGraphQL`, `Resolve-RepoCoords`); dot-sourced by every | ||
| script. | ||
| - [scripts/01-request-review.ps1](scripts/01-request-review.ps1) — | ||
| trigger Copilot review and verify pickup via the | ||
| `copilot_work_started` event. | ||
| - [scripts/02-check-review-status.ps1](scripts/02-check-review-status.ps1) — | ||
| single-shot snapshot of the PR's Copilot review state; emits | ||
| `Converged: true` only when all three conditions hold. | ||
| - [scripts/03-list-open-threads.ps1](scripts/03-list-open-threads.ps1) — | ||
| every unresolved PR review thread from **all reviewers** (Copilot, | ||
| humans, github-advanced-security, etc.). | ||
| - [scripts/08-reply-and-resolve.ps1](scripts/08-reply-and-resolve.ps1) — | ||
| post a reply and, unless `-NoResolve` is passed, resolve the thread in one call. | ||
| - [scripts/10-cleanup-outdated.ps1](scripts/10-cleanup-outdated.ps1) — | ||
| safety net for outdated Copilot threads. | ||
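
The completion rule in the SKILL.md diff above (no `task_complete` until the status snapshot reports `Converged: true`, then print `HeadOid`, `LatestCopilotReview.commitOid`, and `submittedAt` as proof) could be sketched as follows. This is a minimal illustration, not code from the PR: `Get-ConvergenceProof` is a hypothetical helper, and the snapshot shape (top-level `Converged` and `HeadOid`, with `commitOid` and `submittedAt` nested under `LatestCopilotReview`) is an assumption inferred from the field names the document cites.

```pwsh
# Hypothetical helper (not part of the PR): build the completion proof from a
# status snapshot, or return $null while the loop has not converged.
function Get-ConvergenceProof {
    param(
        [Parameter(Mandatory)]
        $Snapshot
    )

    # Gate on the script's own boolean; do not re-derive convergence here.
    if ($Snapshot.Converged -ne $true) {
        return $null
    }

    # Assumed shape: review fields are nested under LatestCopilotReview.
    $review = $Snapshot.LatestCopilotReview
    $fmt = 'HeadOid={0}; LatestCopilotReview.commitOid={1}; submittedAt={2}'
    return ($fmt -f $Snapshot.HeadOid, $review.commitOid, $review.submittedAt)
}

# Hand-written sample snapshot (illustrative values only).
$converged = [pscustomobject]@{
    Converged           = $true
    HeadOid             = 'abc123'
    LatestCopilotReview = [pscustomobject]@{
        commitOid   = 'abc123'
        submittedAt = '2025-01-01T00:00:00Z'
    }
}

$proof = Get-ConvergenceProof -Snapshot $converged
if ($null -ne $proof) {
    # Only at this point would the parent call task_complete with the proof.
    Write-Output $proof
}
```

Returning `$null` rather than throwing keeps the not-yet-converged case an ordinary branch, which matches the loop-back behaviour described for step 9.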
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,49 @@ | ||
| # Step 2: Wait for review | ||
|
|
||
| Sub-agent type: `general-purpose`; budget: **20-minute hard cap** (one | ||
| bounded sub-agent, NOT extension-driven). | ||
|
|
||
| **Skipped** when the loop is in [single-iteration | ||
| mode](orchestration.md#single-iteration-fallback) — there's no Copilot | ||
| review to wait for. | ||
|
|
||
| ## Inputs | ||
|
|
||
| From step 1: | ||
| - `PrNumber`. | ||
| - `baseline` — the `LatestCopilotReview.submittedAt` string captured | ||
| before the trigger fired (empty string if no prior Copilot review). | ||
|
|
||
| ## Return contract | ||
|
|
||
| - `02-check-review-status.ps1` JSON snapshot. | ||
| - `recommendation` ∈ {`ready`, `give-up-push-commit`}. | ||
| - `ready` iff **both** `LatestCopilotReview.submittedAt > baseline` | ||
| AND `ReviewAtHead: true`. | ||
|
|
||
| ## Procedure | ||
|
|
||
| Poll `02-check-review-status.ps1` approximately every **3 minutes** | ||
| until `ready` or the 20-minute cap is hit: | ||
|
|
||
| ```pwsh | ||
| pwsh ./scripts/02-check-review-status.ps1 -PrNumber <n> | ||
| ``` | ||
|
|
||
| - Extract `submittedAt` and `ReviewAtHead` from the JSON each tick. | ||
| - Stop and return `ready` on the first tick that satisfies both | ||
| conditions vs. the captured `baseline`. | ||
| - On cap reached without `ready`, return `give-up-push-commit`. | ||
|
|
||
| ## Gotchas | ||
|
|
||
| - **Don't poll faster than ~3 minutes.** There is no progress signal | ||
| from the API; faster polling only burns budget. | ||
| - **`give-up-push-commit` fallback is parent-driven.** When the | ||
| sub-agent returns this recommendation, the **parent** pushes a | ||
| substantive (non-whitespace) commit — auto-assign on `synchronize` is | ||
| the most reliable trigger. Then the parent re-enters the loop at | ||
| step 1 with a fresh `baseline`. | ||
| - **Single bounded run, not extension-driven.** Do not request | ||
| extensions on this step — if 20 min isn't enough, the right move is | ||
| the `give-up-push-commit` fallback, not more polling. |
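
The step 2 procedure above (poll roughly every 3 minutes, return `ready` only when the review is newer than `baseline` and `ReviewAtHead` is true, otherwise `give-up-push-commit` at the 20-minute cap) might look like the sketch below. All three function names are hypothetical, and the sketch assumes the status script writes a single JSON document to stdout with `ReviewAtHead` at the top level and `submittedAt` under `LatestCopilotReview`; the PR's actual sub-agent contract may differ.

```pwsh
# Hypothetical helpers sketching step 2; names and JSON shape are assumptions.

function ConvertTo-Instant {
    param($Value)
    # Empty or missing means "no review yet" (or no baseline captured).
    if ($null -eq $Value -or "$Value" -eq '') { return $null }
    # Accept an already-parsed [datetime] as well as an ISO 8601 string.
    if ($Value -is [datetime]) { return [datetimeoffset]$Value }
    return [datetimeoffset]::Parse([string]$Value, [cultureinfo]::InvariantCulture)
}

function Test-ReviewReady {
    param($Snapshot, $Baseline = '')
    $submitted = ConvertTo-Instant -Value $Snapshot.LatestCopilotReview.submittedAt
    if ($null -eq $submitted) { return $false }    # no Copilot review at all
    $base = ConvertTo-Instant -Value $Baseline
    # An empty baseline means any review counts as new.
    $isNewer = ($null -eq $base) -or ($submitted -gt $base)
    return ($isNewer -and ($Snapshot.ReviewAtHead -eq $true))
}

function Wait-CopilotReview {
    param(
        [Parameter(Mandatory)] [int] $PrNumber,
        $Baseline = '',
        [int] $CapMinutes = 20,
        [int] $PollSeconds = 180,
        # Injectable for dry runs; the default shells out as in the Procedure.
        [scriptblock] $GetSnapshot = {
            (pwsh ./scripts/02-check-review-status.ps1 -PrNumber $PrNumber | Out-String) |
                ConvertFrom-Json
        }
    )
    $deadline = (Get-Date).AddMinutes($CapMinutes)
    while ($true) {
        $snapshot = & $GetSnapshot
        if (Test-ReviewReady -Snapshot $snapshot -Baseline $Baseline) { return 'ready' }
        # Give up when another full poll interval would reach or overrun the cap.
        if ((Get-Date).AddSeconds($PollSeconds) -ge $deadline) { return 'give-up-push-commit' }
        Start-Sleep -Seconds $PollSeconds
    }
}

# Example call (commented out because it needs a real PR and the bundled script):
# Wait-CopilotReview -PrNumber 123 -Baseline $baseline
```

Timestamps are parsed into `DateTimeOffset` values before comparison so that the check does not depend on whether the JSON layer hands back strings or date objects. The `-GetSnapshot` parameter exists only so the loop can be exercised with canned snapshots.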