Draft
35 commits
677b52d
Implement token-saving skill and related evaluation files; update REA…
miroslavpojer May 20, 2026
e709f12
Add title parameter to release notes presence check in PR workflow
miroslavpojer May 20, 2026
f653b27
Update documentation for token-saving skill; add contribution guideli…
miroslavpojer May 21, 2026
fd9f957
Add token-saving skill documentation with response formatting rules
miroslavpojer May 21, 2026
b0eb566
Initial design, review of skill.md files not finished.
miroslavpojer May 21, 2026
dcbfcb5
Update links to living-doc-glossary in various skill documents and ag…
miroslavpojer May 22, 2026
15b5cec
Update links to living-doc-glossary in SKILL.md files and remove obso…
miroslavpojer May 22, 2026
9aab9d1
Add living doc automation scripts for ID assignment, gap detection, i…
miroslavpojer May 22, 2026
c172270
feat: add script to scan .feature files for AC link compliance
miroslavpojer May 22, 2026
ae2fa98
feat: enhance documentation with new skills and update references
miroslavpojer May 22, 2026
313385e
feat: remove outdated implementation roadmap from the repository
miroslavpojer May 22, 2026
8e33c87
Refactor Gherkin step definitions and Living Doc skills
miroslavpojer May 22, 2026
c63f2dd
feat: update Gherkin step definitions and enhance living documentatio…
miroslavpojer May 22, 2026
b983ef6
Add evaluation configurations for Gherkin and PageObject skills
miroslavpojer May 22, 2026
21503cd
feat: enhance Gherkin-Living Doc sync skill with improved description…
miroslavpojer May 22, 2026
ea9b7c8
Refactor living documentation skills for improved clarity and functio…
miroslavpojer May 22, 2026
d13a22f
feat: add evals and trigger evaluations for living-doc and living-doc…
miroslavpojer May 22, 2026
4d94d28
chore: remove roadmap.md as part of project restructuring
miroslavpojer May 22, 2026
8ad9e59
tmp
miroslavpojer May 25, 2026
6ac9ff3
Remove tmp data.
miroslavpojer May 26, 2026
949234b
Updated form Unify project integration.
miroslavpojer May 27, 2026
337bcfd
Updated evals and finished round 1 of trigger evals testing.
miroslavpojer May 27, 2026
e4cef54
Update living documentation agent description and triggers for clarit…
miroslavpojer May 27, 2026
9edf2eb
Update .gitignore and enhance living documentation skills with new co…
miroslavpojer May 27, 2026
8e7ba76
Enhance evals with improved output formatting and additional queries …
miroslavpojer May 27, 2026
17baf4e
Backup from integration
miroslavpojer May 30, 2026
3bdc5c2
partial tune status backup
miroslavpojer May 30, 2026
3fd7e50
Skill gap reduction.
miroslavpojer May 30, 2026
b45c00a
Final fix of gaps and issues.
miroslavpojer May 30, 2026
4f825bc
Test case review.
miroslavpojer May 31, 2026
fafcd64
Tested agent and improved testing doc.
miroslavpojer May 31, 2026
d4462a7
Skill test - round 1.
miroslavpojer May 31, 2026
c615596
Update SKILL.md files to enhance trigger phrases and improve descript…
miroslavpojer May 31, 2026
c70f8ca
Worked in changed from weakness review.
miroslavpojer May 31, 2026
f3c88e3
Testing round 2.
miroslavpojer May 31, 2026
Binary file added .DS_Store
Binary file not shown.
355 changes: 355 additions & 0 deletions .github/agents/evals/living-doc-bdd-copilot/evals.json

Large diffs are not rendered by default.

31 changes: 31 additions & 0 deletions .github/agents/evals/living-doc-bdd-copilot/fixture-map.md
@@ -0,0 +1,31 @@
# Fixture Map — living-doc-bdd-copilot agent evals

## Eval coverage summary

| Eval ID | Category | Description | Fixture files |
|---------|----------|-------------|---------------|
| 1 | happy-path | Business Seed assembly — seed.yaml structure | — |
| 2 | happy-path | Create mode: PageObject generation from crawled surface | — |
| 3 | happy-path | Scenario generation from US ACs | — |
| 4 | regression | RE-SCAN mode — selector drift detection and repair | — |
| 5 | regression | HEALING mode — broken step definitions | — |
| 6 | negative | Unit test request → @sdet-copilot | — |
| 7 | paraphrase | "fix failing tests" → HEALING mode trigger | — |
| 8 | regression | REMOVE mode — full feature removal with pre-deletion checklist | — |
| 9 | regression | Partial state rule: seed.yaml present, manifest.json absent → first run | — |
| 10 | regression | Credential safety — literal credentials in seed.yaml rejected | — |
| 11 | edge-case | Source E guided traversal — blocked crawl, unknown field value | — |
| 12 | output-format | manifest.json entry structure for a scanned route | — |
| 25 | negative | Unit test request → decline + direct to @sdet-copilot | — |

## Trigger eval summary

| Count | Triggers (should_trigger=true) | Non-triggers (should_trigger=false) |
|-------|-------------------------------|--------------------------------------|
| 44 total | 22 true | 22 false |

False cases include:
- `write a unit test` → @sdet-copilot
- `TypeScript quality gate` → @quality-gate-copilot (out of scope)
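
The summary row above can drift from the JSON file as trigger evals are added. A minimal sketch of regenerating it from the data, assuming entries have the `id` / `query` / `should_trigger` / `reason` shape used in `trigger-eval.json`; the function names `summarize_triggers` and `summary_row` and the three-entry sample are illustrative, not part of the repository:

```python
import json
from collections import Counter

# Illustrative sample: three entries in the same shape as trigger-eval.json.
SAMPLE_JSON = """
[
  {"id": 1, "query": "Scan the webapp at https://app.example.com and generate PageObjects",
   "should_trigger": true, "reason": "'scan webapp' trigger phrase"},
  {"id": 18, "query": "Write a unit test for the discount calculation function",
   "should_trigger": false, "reason": "Unit test authoring"},
  {"id": 19, "query": "Update the AC state on US-007-02 to DEPRECATED",
   "should_trigger": true, "reason": "Catalog entity state update"}
]
"""


def summarize_triggers(entries):
    """Count total, triggering, and non-triggering eval entries."""
    counts = Counter(entry["should_trigger"] for entry in entries)
    return {"total": len(entries), "true": counts[True], "false": counts[False]}


def summary_row(summary):
    """Render the counts as one markdown table row in the fixture-map layout."""
    return f"| {summary['total']} total | {summary['true']} true | {summary['false']} false |"


if __name__ == "__main__":
    print(summary_row(summarize_triggers(json.loads(SAMPLE_JSON))))
```

To run this against the real file, the sample string could be replaced with the contents of `trigger-eval.json` read from disk.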

> No fixture files — all evals use inline prompt/expected_output; agent behavior is assessed against the agent.md operating rules and skill definitions.
266 changes: 266 additions & 0 deletions .github/agents/evals/living-doc-bdd-copilot/trigger-eval.json
@@ -0,0 +1,266 @@
[
{
"id": 1,
"query": "Scan the webapp at https://app.example.com and generate PageObjects",
"should_trigger": true,
"reason": "'scan webapp' trigger phrase"
},
{
"id": 2,
"query": "Generate PageObjects for the checkout and login screens",
"should_trigger": true,
"reason": "'generate pageobjects' trigger phrase"
},
{
"id": 3,
"query": "Heal the PageObjects after the UI redesign — selectors are broken",
"should_trigger": true,
"reason": "'heal pageobjects' trigger phrase"
},
{
"id": 4,
"query": "Generate BDD scenarios for the active User Stories",
"should_trigger": true,
"reason": "'generate scenarios' trigger phrase"
},
{
"id": 5,
"query": "Sync the Gherkin feature files with the living doc AC catalog",
"should_trigger": true,
"reason": "'sync gherkin' trigger phrase"
},
{
"id": 6,
"query": "Use Playwright to crawl the application and discover all screens",
"should_trigger": true,
"reason": "'playwright crawl' trigger phrase"
},
{
"id": 7,
"query": "Explore the app and map all the UI surfaces",
"should_trigger": true,
"reason": "'explore the app' trigger phrase"
},
{
"id": 8,
"query": "@bdd-copilot scan the dashboard and generate scenarios",
"should_trigger": true,
"reason": "'bdd copilot' trigger phrase — explicit agent invocation"
},
{
"id": 9,
"query": "@living-doc-bdd-copilot set up the BDD suite for our new module",
"should_trigger": true,
"reason": "'living doc bdd copilot' trigger phrase — explicit agent invocation"
},
{
"id": 10,
"query": "Run the full BDD pipeline — crawl, generate PageObjects, and produce feature files",
"should_trigger": true,
"reason": "'BDD pipeline' trigger phrase"
},
{
"id": 11,
"query": "Crawl the UI to discover all reachable pages",
"should_trigger": true,
"reason": "'crawl the UI' trigger phrase"
},
{
"id": 12,
"query": "Create page objects for the admin portal",
"should_trigger": true,
"reason": "'create page objects' trigger phrase"
},
{
"id": 13,
"query": "Generate a feature file for US-007 — Place an Online Order",
"should_trigger": true,
"reason": "'generate feature file' trigger phrase"
},
{
"id": 14,
"query": "What is the scenario coverage for US-007?",
"should_trigger": true,
"reason": "'scenario coverage' trigger phrase"
},
{
"id": 15,
"query": "Write the step definitions for the checkout scenarios",
"should_trigger": true,
"reason": "'step definitions' trigger phrase"
},
{
"id": 16,
"query": "Generate Gherkin from user story US-003",
"should_trigger": true,
"reason": "'gherkin from user story' trigger phrase"
},
{
"id": 17,
"query": "Create a User Story for the loyalty points redemption feature",
"should_trigger": true,
"reason": "Catalog entity creation is handled by this agent in catalog-operations mode"
},
{
"id": 18,
"query": "Write a unit test for the discount calculation function",
"should_trigger": false,
"reason": "Unit test authoring — out of scope for this toolkit (no @sdet-copilot agent defined)"
},
{
"id": 19,
"query": "Update the AC state on US-007-02 to DEPRECATED",
"should_trigger": true,
"reason": "Catalog entity state update is handled by this agent in catalog-operations mode"
},
{
"id": 20,
"query": "Run the TypeScript quality gate for the frontend",
"should_trigger": false,
"reason": "Quality gate execution — out of scope for this agent"
},
{
"id": 21,
"query": "The manifest.json is missing — start a first exploration run from the seed file",
"should_trigger": true,
"reason": "Partial state: seed present, manifest absent → first exploration run — 'scan webapp' pattern"
},
{
"id": 22,
"query": "The seed.yaml has literal credentials — is that correct?",
"should_trigger": true,
"reason": "Credential safety rule enforcement during seed assembly — BDD session setup task"
},
{
"id": 23,
"query": "I've hit a guided traversal point — the checkout wizard needs a delivery zone code",
"should_trigger": true,
"reason": "Source E guided traversal protocol — blocked crawl point during exploration"
},
{
"id": 24,
"query": "Update the AC on US-007 to change the payment timeout to 30 seconds",
"should_trigger": true,
"reason": "AC update is a catalog layer operation handled by this agent"
},
{
"id": 25,
"query": "Debug the null pointer exception in PaymentService.processOrder()",
"should_trigger": false,
"reason": "Application debugging — outside the living doc / BDD scope"
},
{
"id": 26,
"query": "Add error handling to the checkout API endpoint",
"should_trigger": false,
"reason": "Production code change — outside the living doc / BDD scope"
},
{
"id": 27,
"query": "Write an OpenAPI spec for the orders REST endpoint",
"should_trigger": false,
"reason": "API schema documentation — not living doc entity creation"
},
{
"id": 28,
"query": "Configure the Kubernetes resource limits for the order service",
"should_trigger": false,
"reason": "Infrastructure configuration — outside scope"
},
{
"id": 29,
"query": "Set up a CI pipeline for the frontend build",
"should_trigger": false,
"reason": "CI/CD configuration — outside scope"
},
{
"id": 30,
"query": "Fix the failing unit tests in CartCalculatorTest",
"should_trigger": false,
"reason": "Unit test fix — outside the living doc / BDD scope"
},
{
"id": 31,
"query": "Refactor the PaymentService to use the repository pattern",
"should_trigger": false,
"reason": "Code refactoring — outside the living doc / BDD scope"
},
{
"id": 32,
"query": "Add structured logging to the checkout service",
"should_trigger": false,
"reason": "Application logging — outside scope"
},
{
"id": 33,
"query": "Review this pull request for code quality issues",
"should_trigger": false,
"reason": "Code review — outside scope"
},
{
"id": 34,
"query": "Configure ESLint rules for the frontend project",
"should_trigger": false,
"reason": "Dev tooling configuration — outside scope"
},
{
"id": 35,
"query": "Write a database migration script to add the promo_code column",
"should_trigger": false,
"reason": "DB schema change — outside scope"
},
{
"id": 36,
"query": "Optimize the SQL query in OrderRepository.findByCustomer()",
"should_trigger": false,
"reason": "Query optimization — outside scope"
},
{
"id": 37,
"query": "Set up monitoring alerts for the payment service",
"should_trigger": false,
"reason": "Ops / monitoring — outside scope"
},
{
"id": 38,
"query": "Write technical documentation for the REST API",
"should_trigger": false,
"reason": "Generic tech docs — not living doc entity creation"
},
{
"id": 39,
"query": "How do I set up a multi-stage Docker build for the backend?",
"should_trigger": false,
"reason": "Container/infra question — outside scope"
},
{
"id": 40,
"query": "Run the security vulnerability scan on the checkout service",
"should_trigger": false,
"reason": "Security tooling — outside scope"
},
{
"id": 41,
"query": "Generate a performance report for the checkout flow",
"should_trigger": false,
"reason": "Performance testing — outside scope"
},
{
"id": 42,
"query": "Write a changelog for the v2.1.0 release",
"should_trigger": false,
"reason": "Release management — outside scope"
},
{
"id": 43,
"query": "Configure feature flags for the new checkout flow",
"should_trigger": false,
"reason": "Feature flag setup — outside scope"
},
{
"id": 44,
"query": "Fix the TypeScript compilation error in CheckoutComponent",
"should_trigger": false,
"reason": "Compile error fix — outside scope"
}
]
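
One way a list like this could be consumed is to score a routing function against it. The sketch below is only an illustration under stated assumptions: `keyword_router` is a hypothetical stand-in for the agent's real trigger logic (which this diff does not show), and `TriggerCase` and `score_router` are names introduced here. It reports accuracy, precision, recall, and the IDs of misclassified cases.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TriggerCase:
    id: int
    query: str
    should_trigger: bool


def keyword_router(query):
    """Hypothetical router: fires when any keyword appears in the query."""
    keywords = ("scan", "pageobject", "gherkin", "crawl", "scenario")
    lowered = query.lower()
    return any(keyword in lowered for keyword in keywords)


def score_router(cases, predict):
    """Compare predictions with should_trigger and collect basic metrics."""
    tp = fp = tn = fn = 0
    misses = []
    for case in cases:
        predicted = predict(case.query)
        if predicted and case.should_trigger:
            tp += 1
        elif predicted and not case.should_trigger:
            fp += 1
            misses.append(case.id)
        elif not predicted and case.should_trigger:
            fn += 1
            misses.append(case.id)
        else:
            tn += 1
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "misses": misses,
    }


# Five queries taken from the eval list above.
CASES = [
    TriggerCase(1, "Scan the webapp at https://app.example.com and generate PageObjects", True),
    TriggerCase(14, "What is the scenario coverage for US-007?", True),
    TriggerCase(18, "Write a unit test for the discount calculation function", False),
    TriggerCase(24, "Update the AC on US-007 to change the payment timeout to 30 seconds", True),
    TriggerCase(30, "Fix the failing unit tests in CartCalculatorTest", False),
]

if __name__ == "__main__":
    print(score_router(CASES, keyword_router))
```

With this naive router, case 24 is missed because the query contains none of the keywords. Catalog-operation queries like that one are the kind of case a purely phrase-based trigger would need extra handling for.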