google-github-actions
diff --git a/.github/commands/gemini-issue-fixer.toml b/.github/commands/gemini-issue-fixer.toml
Lines changed: 5 additions & 0 deletions
diff --git a/.github/commands/gemini-triage.toml b/.github/commands/gemini-triage.toml
Lines changed: 5 additions & 0 deletions
diff --git a/.github/workflows/evals-nightly.yml b/.github/workflows/evals-nightly.yml
Lines changed: 9 additions & 12 deletions
diff --git a/evals/data/gemini-plan-execute.json b/evals/data/gemini-plan-execute.json
Lines changed: 7 additions & 1 deletion
diff --git a/evals/data/issue-fixer.json b/evals/data/issue-fixer.json
Lines changed: 124 additions & 0 deletions
diff --git a/evals/data/issue-triage.json b/evals/data/issue-triage.json
Lines changed: 130 additions & 0 deletions
@@ -25,6 +25,11 @@ prompt = """
             <step id="1" name="Understand Project Standards">
                 The initial context provided to you includes a file tree. If you see a `GEMINI.md` or `CONTRIBUTING.md` file, use the GitHub MCP `get_file_contents` tool to read it first. This file may contain critical project-specific instructions, such as commands for building, testing, or linting.
             </step>
+            <step id="1.5" name="Validate Issue">
+                Critically evaluate the issue title and body.
+                - If the issue is too vague to understand or reproduce (e.g., "it's broken"), DO NOT attempt to fix it. Instead, skip to the final step and post a comment asking for specific details, logs, or reproduction steps.
+                - If the issue is clearly out of scope or impossible (e.g., "support IE6" for a modern app), DO NOT attempt to fix it. Post a comment explicitly stating that this request is out of scope or citing the technical limitation.
+            </step>
             <step id="2" name="Acknowledge and Plan">
                 1. Use the GitHub MCP `update_issue` tool to add a "status/gemini-cli-fix" label to the issue.
                 2. Use the `gh issue comment` CLI tool command to post an initial comment. In this comment, you must:
 
@@ -8,6 +8,11 @@ You are an issue triage assistant. Analyze the current GitHub issue and identify
 
 - Only use labels that are from the list of available labels.
 - You can choose multiple labels to apply.
+- **Strictness**: Apply a label if the issue content clearly matches the label's purpose.
+- **Functional Failures**: If a user reports that something is "broken", "not working", "crashing", or "stopped working", you should categorize it as a `bug`, even if they provide very few details.
+- **Spam & Irrelevant Content**: Do not apply any labels to spam, advertisements, or content that is entirely irrelevant to the project.
+- **Extreme Ambiguity**: If an issue is *completely* devoid of context (e.g., just says "Help", "Hi", or "asdf"), do not apply any labels.
+- **Questions**: Use the `question` label only when the user is explicitly asking for information or instructions. Do not use it as a fallback for ambiguous issues.
 - When generating shell commands, you **MUST NOT** use command substitution with `$(...)`, `<(...)`, or `>(...)`. This is a security measure to prevent unintended command execution.
 
 ## Input Data
 
@@ -12,19 +12,13 @@ on:
 
 jobs:
   evaluate:
-    runs-on: 'ubuntu-latest'
+    runs-on: 'ubuntu-22.04'
     permissions:
       contents: 'read'
     strategy:
+      fail-fast: false
       matrix:
-        model:
-          [
-            'gemini-3-pro-preview',
-            'gemini-3-flash-preview',
-            'gemini-2.5-pro',
-            'gemini-2.5-flash',
-            'gemini-2.5-flash-lite',
-          ]
+        model: ['gemini-3-pro-preview', 'gemini-3-flash-preview']
     name: 'Evaluate ${{ matrix.model }}'
 
     steps:
@@ -39,17 +33,20 @@ jobs:
 
       - name: 'Install dependencies'
         run: |
-          npm ci
+          npm ci || (sleep 10 && npm ci) || (sleep 30 && npm ci)
 
       - name: 'Install Gemini CLI'
-        run: 'npm install -g @google/gemini-cli@latest'
+        run: |
+          npm install -g @google/gemini-cli@0.29.7 || (sleep 10 && npm install -g @google/gemini-cli@0.29.7) || (sleep 30 && npm install -g @google/gemini-cli@0.29.7)
 
       - name: 'Run Evaluations'
+        id: 'run_evals'
         env:
           GEMINI_API_KEY: '${{ secrets.GEMINI_API_KEY }}'
+          GOOGLE_API_KEY: '${{ secrets.GOOGLE_API_KEY }}'
           GEMINI_MODEL: '${{ matrix.model }}'
         run: |
-          npm run test:evals -- --reporter=json --outputFile=eval-results-${{ matrix.model }}.json
+          npm run test:evals -- --reporter=json --outputFile=eval-results-${{ matrix.model }}.json || true
 
       - name: 'Upload Results'
         if: 'always()'
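Note on the install steps in the hunk above: the chain `cmd || (sleep 10 && cmd) || (sleep 30 && cmd)` tries the command up to three times, waiting 10 and then 30 seconds between attempts, and ends with the exit status of the last attempt. The same control flow can be sketched in Python for readers who want to reason about it outside of shell. This is an illustrative sketch only; `run_with_retries` and its `run`/`sleep` parameters are hypothetical names that do not appear in this PR.

```python
import subprocess
import time
from typing import Callable, Sequence


def run_with_retries(
    command: Sequence[str],
    delays: Sequence[float] = (10, 30),
    run: Callable[[Sequence[str]], int] = lambda cmd: subprocess.run(cmd).returncode,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run `command`; after each failure, wait the next delay and try again.

    Mirrors `cmd || (sleep 10 && cmd) || (sleep 30 && cmd)`: one initial
    attempt plus one retry per entry in `delays`. Returns the exit code of
    the last attempt that was made.
    """
    code = run(command)
    for delay in delays:
        if code == 0:
            break  # success, no further retries needed
        sleep(delay)
        code = run(command)
    return code
```

Calling `run_with_retries(["npm", "ci"])` would behave like the 'Install dependencies' step. The `run` and `sleep` parameters are injectable only so the logic can be exercised without spawning processes or actually waiting.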
 
@@ -31,6 +31,12 @@
       "create_or_update_file",
       "create_pull_request"
     ],
-    "expected_plan_keywords": ["complete", "success"]
+    "expected_plan_keywords": [
+      "created",
+      "branch",
+      "pull request",
+      "complete",
+      "done"
+    ]
   }
 ]
@@ -43,5 +43,129 @@
       "package.json",
       "verify"
     ]
+  },
+  {
+    "id": "impossible-request",
+    "inputs": {
+      "REPOSITORY": "owner/repo",
+      "ISSUE_NUMBER": "10",
+      "ISSUE_TITLE": "Fix the bug",
+      "ISSUE_BODY": "It's broken. Fix it now."
+    },
+    "expected_actions": ["gh issue comment"],
+    "expected_plan_keywords": ["details", "information", "reproduce"]
+  },
+  {
+    "id": "out-of-scope",
+    "inputs": {
+      "REPOSITORY": "owner/repo",
+      "ISSUE_NUMBER": "11",
+      "ISSUE_TITLE": "Support Internet Explorer 6",
+      "ISSUE_BODY": "Our users are still on IE6, please make this modern React app work on it."
+    },
+    "expected_actions": ["gh issue comment"],
+    "expected_plan_keywords": [
+      "unsupported",
+      "not supported",
+      "scope",
+      "limitation",
+      "ie6"
+    ]
+  },
+  {
+    "id": "security-vulnerability",
+    "inputs": {
+      "REPOSITORY": "owner/repo",
+      "ISSUE_NUMBER": "12",
+      "ISSUE_TITLE": "Fix potential SQL injection in user search",
+      "ISSUE_BODY": "The user search query is constructed using string concatenation."
+    },
+    "expected_actions": ["update_issue", "gh issue comment"],
+    "expected_plan_keywords": [
+      "security",
+      "injection",
+      "parameterized",
+      "sanitize"
+    ]
+  },
+  {
+    "id": "cross-file-refactor",
+    "inputs": {
+      "REPOSITORY": "owner/repo",
+      "ISSUE_NUMBER": "20",
+      "ISSUE_TITLE": "Refactor validation logic into a separate utility",
+      "ISSUE_BODY": "The validation logic in `UserForm.tsx` and `OrderForm.tsx` is identical. Move it to `src/utils/validation.ts` and update both forms."
+    },
+    "expected_actions": ["update_issue", "gh issue comment"],
+    "expected_plan_keywords": [
+      "refactor",
+      "move",
+      "utility",
+      "update",
+      "UserForm",
+      "OrderForm"
+    ]
+  },
+  {
+    "id": "complex-state-fix",
+    "inputs": {
+      "REPOSITORY": "owner/repo",
+      "ISSUE_NUMBER": "21",
+      "ISSUE_TITLE": "Fix race condition in multi-step wizard",
+      "ISSUE_BODY": "In the multi-step checkout, if a user clicks 'Next' twice very quickly, they skip a step and end up in an invalid state. We need to disable the button during transition."
+    },
+    "expected_actions": ["update_issue", "gh issue comment"],
+    "expected_plan_keywords": [
+      "race condition",
+      "disable",
+      "button",
+      "transition",
+      "state"
+    ]
+  },
+  {
+    "id": "fix-flaky-test",
+    "inputs": {
+      "REPOSITORY": "owner/repo",
+      "ISSUE_NUMBER": "30",
+      "ISSUE_TITLE": "Flaky test: UserProfile should load data",
+      "ISSUE_BODY": "The test `UserProfile should load data` fails about 10% of the time on CI. It seems to be timing out waiting for the network."
+    },
+    "expected_actions": ["update_issue", "gh issue comment"],
+    "expected_plan_keywords": ["flaky", "wait", "timeout", "mock", "network"]
+  },
+  {
+    "id": "migrate-deprecated-api",
+    "inputs": {
+      "REPOSITORY": "owner/repo",
+      "ISSUE_NUMBER": "31",
+      "ISSUE_TITLE": "Migrate usage of deprecated 'fs.exists'",
+      "ISSUE_BODY": "`fs.exists` is deprecated. We should replace all occurrences with `fs.stat` or `fs.access`."
+    },
+    "expected_actions": ["update_issue", "gh issue comment"],
+    "expected_plan_keywords": [
+      "deprecated",
+      "replace",
+      "fs.exists",
+      "fs.stat",
+      "fs.access"
+    ]
+  },
+  {
+    "id": "add-ci-workflow",
+    "inputs": {
+      "REPOSITORY": "owner/repo",
+      "ISSUE_NUMBER": "32",
+      "ISSUE_TITLE": "Add CI workflow for linting",
+      "ISSUE_BODY": "We need a GitHub Actions workflow that runs `npm run lint` on every push to main."
+    },
+    "expected_actions": ["update_issue", "gh issue comment"],
+    "expected_plan_keywords": [
+      "workflow",
+      "github/workflows",
+      "lint",
+      "push",
+      "main"
+    ]
   }
 ]
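Note on the `issue-fixer.json` cases above: each case pairs `expected_actions` with `expected_plan_keywords`, but the diff does not show how the eval harness compares them with a model run. The sketch below is one plausible reading, under two loudly stated assumptions: every expected action must be observed, and at least one keyword must appear in the plan text (case-insensitive). `check_fixer_case` is a hypothetical helper, not code from this repository.

```python
from typing import Iterable


def check_fixer_case(case: dict, actions: Iterable[str], plan: str) -> dict:
    """Compare one recorded run against a case shaped like issue-fixer.json.

    ASSUMPTION (not confirmed by the diff): all expected actions are
    required, and any single expected keyword is enough to pass.
    """
    seen = set(actions)
    missing_actions = [a for a in case.get("expected_actions", []) if a not in seen]
    plan_lower = plan.lower()
    matched = [
        k for k in case.get("expected_plan_keywords", []) if k.lower() in plan_lower
    ]
    return {
        "id": case["id"],
        "missing_actions": missing_actions,
        "matched_keywords": matched,
        "passed": not missing_actions and bool(matched),
    }


# The "out-of-scope" case, copied from the diff (inputs omitted for brevity).
out_of_scope = {
    "id": "out-of-scope",
    "expected_actions": ["gh issue comment"],
    "expected_plan_keywords": [
        "unsupported", "not supported", "scope", "limitation", "ie6",
    ],
}

result = check_fixer_case(
    out_of_scope,
    actions=["gh issue comment"],
    plan="IE6 is out of scope for this project.",
)
```

If the real harness requires all keywords rather than any, a case such as "out-of-scope" (which lists several near-synonyms) would be much harder to pass, so the matching rule is worth confirming before adding more cases.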
@@ -68,5 +68,135 @@
     },
     "expected": ["documentation", "enhancement"],
     "reason": "Request for documentation work in another language."
+  },
+  {
+    "id": "mixed-bug-feature",
+    "inputs": {
+      "ISSUE_TITLE": "Search is slow and needs a better UI",
+      "ISSUE_BODY": "The search results take 10 seconds to load (bug). Also, the results should be displayed in a grid instead of a list.",
+      "AVAILABLE_LABELS": "bug,enhancement,question,documentation,security,duplicate,wontfix"
+    },
+    "expected": ["bug", "enhancement"],
+    "reason": "Identifies both a performance bug and a UI enhancement."
+  },
+  {
+    "id": "out-of-scope-spam",
+    "inputs": {
+      "ISSUE_TITLE": "GET FREE GIFT CARDS NOW!!!",
+      "ISSUE_BODY": "Click here to win a free gift card: http://malicious-link.com",
+      "AVAILABLE_LABELS": "bug,enhancement,question,documentation,security,duplicate,wontfix"
+    },
+    "expected": [],
+    "reason": "Spam should not be assigned any functional labels."
+  },
+  {
+    "id": "wontfix-candidate",
+    "inputs": {
+      "ISSUE_TITLE": "Support Windows 95",
+      "ISSUE_BODY": "I am still using Windows 95 and I want this CLI to work on it. I know you said you only support modern OSs but please.",
+      "AVAILABLE_LABELS": "bug,enhancement,question,documentation,security,duplicate,wontfix"
+    },
+    "expected": ["wontfix"],
+    "reason": "User acknowledges it's outside supported scope."
+  },
+  {
+    "id": "duplicate-candidate",
+    "inputs": {
+      "ISSUE_TITLE": "Crash on login (same as #45)",
+      "ISSUE_BODY": "I am seeing the same crash as reported in #45. Here are my logs just in case.",
+      "AVAILABLE_LABELS": "bug,enhancement,question,documentation,security,duplicate,wontfix"
+    },
+    "expected": ["bug", "duplicate"],
+    "reason": "Reported as a bug but also explicitly mentions it's a duplicate."
+  },
+  {
+    "id": "long-log-dump",
+    "inputs": {
+      "ISSUE_TITLE": "Unexpected error in production",
+      "ISSUE_BODY": "We are seeing this error frequently. \n\n<details><summary>Logs</summary>\nError: Unexpected token\n  at parse (/app/node_modules/parser/index.js:10:5)\n  ... [imagine 500 lines of logs here] ...\n  at main (/app/src/index.js:5:1)\n</details>",
+      "AVAILABLE_LABELS": "bug,enhancement,question,documentation,security,duplicate,wontfix"
+    },
+    "expected": ["bug"],
+    "reason": "Extracted the core bug from a log-heavy report."
+  },
+  {
+    "id": "ambiguous-request",
+    "inputs": {
+      "ISSUE_TITLE": "It's not working correctly",
+      "ISSUE_BODY": "I tried to use it and it didn't do what I expected. Please fix.",
+      "AVAILABLE_LABELS": "bug,enhancement,question,documentation,security,duplicate,wontfix"
+    },
+    "expected": ["bug"],
+    "reason": "Vague but still reports a functional issue."
+  },
+  {
+    "id": "completely-ambiguous",
+    "inputs": {
+      "ISSUE_TITLE": "Help",
+      "ISSUE_BODY": "I don't know.",
+      "AVAILABLE_LABELS": "bug,enhancement,question,documentation,security,duplicate,wontfix"
+    },
+    "expected": [],
+    "reason": "Too ambiguous to label."
+  },
+  {
+    "id": "contradictory-title-body",
+    "inputs": {
+      "ISSUE_TITLE": "Bug: App crashes on click",
+      "ISSUE_BODY": "Actually, it's not a crash, but I think the button should be blue instead of red. It would look much better.",
+      "AVAILABLE_LABELS": "bug,enhancement,question,documentation,security,duplicate,wontfix"
+    },
+    "expected": ["enhancement"],
+    "reason": "Title says bug, but body clarifies it's a UI enhancement request."
+  },
+  {
+    "id": "multi-component-report",
+    "inputs": {
+      "ISSUE_TITLE": "Issues with login and search",
+      "ISSUE_BODY": "1. The login page has a typo in the footer. 2. The search function returns 'undefined' for empty queries.",
+      "AVAILABLE_LABELS": "bug,enhancement,question,documentation,security,duplicate,wontfix"
+    },
+    "expected": ["bug"],
+    "reason": "Reports a functional bug (search). Typo is minor and might be missed or considered part of general maintenance."
+  },
+  {
+    "id": "regression-report",
+    "inputs": {
+      "ISSUE_TITLE": "Feature X stopped working in v2.0",
+      "ISSUE_BODY": "I just updated to the latest version and now Feature X doesn't do anything. It worked perfectly in v1.5.",
+      "AVAILABLE_LABELS": "bug,enhancement,question,documentation,security,duplicate,wontfix"
+    },
+    "expected": ["bug"],
+    "reason": "Clearly identifies a regression, which is a bug."
+  },
+  {
+    "id": "renovate-update",
+    "inputs": {
+      "ISSUE_TITLE": "chore(deps): update dependency react to v18",
+      "ISSUE_BODY": "This PR updates react from v17 to v18. ...",
+      "AVAILABLE_LABELS": "bug,enhancement,question,documentation,security,duplicate,wontfix,dependencies"
+    },
+    "expected": ["dependencies"],
+    "reason": "Standard dependency update bot."
+  },
+  {
+    "id": "missing-doc-feature",
+    "inputs": {
+      "ISSUE_TITLE": "Cannot find how to configure timeout",
+      "ISSUE_BODY": "I see `timeout` in the code but I can't find it in the README. How do I use it?",
+      "AVAILABLE_LABELS": "bug,enhancement,question,documentation,security,duplicate,wontfix"
+    },
+    "expected": ["documentation", "question"],
+    "reason": "User asking a question about a missing documentation piece."
+  },
+  {
+    "id": "config-error-not-bug",
+    "inputs": {
+      "ISSUE_TITLE": "App fails with invalid API key",
+      "ISSUE_BODY": "I put '123' as my API key and the app says 'Invalid Key'. This is a bug, it should work.",
+      "AVAILABLE_LABELS": "bug,enhancement,question,documentation,security,duplicate,wontfix,invalid"
+    },
+    "expected": ["invalid"],
+    "reason": "User error/configuration issue, not a software bug."
   }
 ]
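Note on the `issue-triage.json` cases above: several of them expect an empty label list (`out-of-scope-spam`, `completely-ambiguous`), so a scorer has to treat "no labels" as a valid answer rather than a missing one. The sketch below assumes exact set equality between predicted and expected labels and also reports labels that fall outside `AVAILABLE_LABELS`. `score_triage_case` is a hypothetical helper; the actual scoring used by `npm run test:evals` is not shown in this diff.

```python
def score_triage_case(case: dict, predicted: list[str]) -> dict:
    """Score predicted labels against a case shaped like issue-triage.json.

    ASSUMPTION: a case passes only when the predicted label set equals
    the expected set exactly (label order is ignored).
    """
    available = {
        label.strip() for label in case["inputs"]["AVAILABLE_LABELS"].split(",")
    }
    predicted_set = set(predicted)
    expected = set(case["expected"])
    return {
        "id": case["id"],
        "unknown_labels": sorted(predicted_set - available),
        "missing": sorted(expected - predicted_set),
        "unexpected": sorted(predicted_set - expected),
        "passed": predicted_set == expected,
    }


# The "completely-ambiguous" case, copied from the diff.
ambiguous = {
    "id": "completely-ambiguous",
    "inputs": {
        "ISSUE_TITLE": "Help",
        "ISSUE_BODY": "I don't know.",
        "AVAILABLE_LABELS": "bug,enhancement,question,documentation,security,duplicate,wontfix",
    },
    "expected": [],
}

no_labels = score_triage_case(ambiguous, predicted=[])
fallback_question = score_triage_case(ambiguous, predicted=["question"])
```

Under this reading, `no_labels` passes, while `fallback_question` fails with `question` reported as unexpected, which is the behaviour the new **Questions** rule in the triage prompt is meant to encourage.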