From 677b52d658d76721302f7fedbff6246ab0150150 Mon Sep 17 00:00:00 2001 From: miroslavpojer Date: Wed, 20 May 2026 15:41:47 +0200 Subject: [PATCH 01/35] Implement token-saving skill and related evaluation files; update README and documentation for clarity --- README.md | 2 +- docs/getting-started.md | 2 +- skills/SKILL.md | 66 +++++++ skills/token-saving/evals/evals.json | 186 ++++++++++++++++++ .../evals/files/large-user-service.py | 185 +++++++++++++++++ .../evals/files/pr-description-existing.md | 20 ++ .../evals/files/prior-conversation.txt | 27 +++ .../evals/files/rate-limiter-fix.diff | 28 +++ .../evals/files/sprint-changelog.txt | 36 ++++ skills/token-saving/evals/trigger-eval.json | 107 ++++++++++ 10 files changed, 657 insertions(+), 2 deletions(-) create mode 100644 skills/SKILL.md create mode 100644 skills/token-saving/evals/evals.json create mode 100644 skills/token-saving/evals/files/large-user-service.py create mode 100644 skills/token-saving/evals/files/pr-description-existing.md create mode 100644 skills/token-saving/evals/files/prior-conversation.txt create mode 100644 skills/token-saving/evals/files/rate-limiter-fix.diff create mode 100644 skills/token-saving/evals/files/sprint-changelog.txt create mode 100644 skills/token-saving/evals/trigger-eval.json diff --git a/README.md b/README.md index f90df00..428ed11 100644 --- a/README.md +++ b/README.md @@ -87,7 +87,7 @@ Before building a new skill, check whether one already exists: | [skills.sh](https://skills.sh) | Open registry — install with `npx skills add ` | | [anthropics/skills](https://github.com/anthropics/skills) | Anthropic reference skills including `skill-creator` | | [absa-group/agent-skills](https://github.com/absa-group/agent-skills) | Broader ABSA-owned skill collection | -| [absa-group/cps-agentic-toolkit](https://github.com/absa-group/cps-agentic-toolkit) | CPS team's skill set built on top of this repo | +| 
[absa-group/cps-agentic-toolkit](https://github.com/absa-group/cps-agentic-toolkit) | CPS team's extended skill set (ABSA-internal) | ## Contributing diff --git a/docs/getting-started.md b/docs/getting-started.md index d5402a3..2c9b085 100644 --- a/docs/getting-started.md +++ b/docs/getting-started.md @@ -132,7 +132,7 @@ Project skills take precedence over global skills when both exist. ### Add project-specific skills For skills that only apply to a specific repository, place them in `.github/skills/` within that repo. These are loaded -automatically when Copilot CLI is launched from that directory, layered on top of your personal and CPS base skills. +automatically when Copilot CLI is launched from that directory, layered on top of your personal and shared base skills. ``` your-project-repo/ diff --git a/skills/SKILL.md b/skills/SKILL.md new file mode 100644 index 0000000..473b75d --- /dev/null +++ b/skills/SKILL.md @@ -0,0 +1,66 @@ +--- +name: token-saving +description: > + Always-active response formatting rules — invoke for every reply without exception: coding + questions, code generation, debugging, explanations, comparisons, reviews, diffs, PR updates, + recaps, summaries, workflow tasks, non-technical requests, and anything else. Also invoke on + explicit brevity signals: be concise, keep it short, save tokens, too verbose, shorter, terse, + brief, no fluff, summarise, can you make that shorter. Rules: no filler openers (Certainly!, + Great question!, Happy to help!); no closing platitudes (Let me know if you have questions!); + concise within line limits; skip restating prior context; prefer tables/bullets over prose; + append What changed / Why / How to verify footer only for code-output responses, not Q&A, + reviews, or planning. Boundary: when user explicitly requests full detail, deep dive, complete + explanation, or says "don't hold back", length rules suspend — respond fully. Another active + skill's more specific format requirements take precedence. 
+--- + +# Token-Saving + +Always-active base behaviour. Apply to every response without exception unless the user explicitly requests verbosity. + +## Always apply — response discipline + +- Default to the shortest response that fully answers the question +- Factual or conceptual answers: aim for ≤ 5 prose lines; one minimal code block is permitted and does not count toward that limit +- Action lists and next-step recommendations: cap at 4 bullets; no header line before the list +- Must not repeat context already established in the conversation +- Must not pad responses with preamble ("Great question!", "Certainly!", "As an AI...") +- Must not add closing summaries that restate what was just said +- Stop when the task is complete — must not append "let me know if you need anything else" filler +- Prefer structured output when it improves clarity: bullets, tables, and short code blocks over dense prose +- If another active skill or task requires a more specific output format, that format takes precedence + +## Format code output responses + +End every response where you output code for the user to incorporate — new functions, patches, inline diffs, config snippets, or any code block that represents a change — with exactly this structure (no more, no less): + +``` +**What changed:** +**Why:** +**How to verify:** +``` + +This footer does NOT apply to pure Q&A, reviews, planning, comparisons, or conceptual explanations — only when you are writing or changing code. + +When applying or confirming a bug fix: always show the changed line(s) or a minimal diff, then the footer. A prose description of a code change without showing the code is not sufficient. 
+ +- Must not paste full file contents unless the user explicitly asks +- Show diffs or changed sections only +- Include enough surrounding context for the change to be unambiguous + +## Keep summaries and recaps concise + +- Aim for ≤ 10 lines in any recap +- Prefer linking to files/lines over quoting large blocks +- Use bullet lists over paragraphs +- Summarise deltas — what is different — not what already existed + +## Update PR bodies by appending only + +- Treat the PR description as a changelog — append only, never rewrite +- Append under `## Update YYYY-MM-DD` with the commit hash — use today's date from your system context (the current date, not a guessed or example date) +- Must not delete prior update sections + +## Respond fully when detail is explicitly requested + +If the user explicitly asks for a full explanation, rationale, or deep dive — ALL rules in this skill are suspended for that response. Cover every step, concept, and detail without omitting any part of the topic. Do not apply line limits, bullet caps, or summarisation. diff --git a/skills/token-saving/evals/evals.json b/skills/token-saving/evals/evals.json new file mode 100644 index 0000000..9fe8459 --- /dev/null +++ b/skills/token-saving/evals/evals.json @@ -0,0 +1,186 @@ +{ + "skill_name": "token-saving", + "evals": [ + { + "id": 1, + "category": "happy-path", + "prompt": "What is the purpose of a Dockerfile ENTRYPOINT vs CMD?", + "files": [], + "expected_output": "A concise answer (2\u20135 lines) explaining the difference. No preamble, no closing filler.", + "expectations": [ + "Response does not begin with filler openers: 'Great question!', 'Certainly!', 'Sure!', 'Of course!', 'As an AI...', 'Happy to help!' 
or equivalent", + "Response is \u2264 5 prose lines (a single code block, if present, does not count toward this limit)", + "Response does not end with closing filler such as 'Let me know if you need anything else', 'Feel free to ask', or 'Hope that helps!'", + "Response correctly explains ENTRYPOINT (fixed executable) vs CMD (default overridable args) without padding", + "Response does NOT end with a **What changed** / **Why** / **How to verify** footer — pure Q&A, not a code output response" + ] + }, + { + "id": 2, + "category": "happy-path", + "prompt": "Fix the window comparison bug in the rate limiter. The diff is at evals/files/rate-limiter-fix.diff \u2014 apply the fix and confirm.", + "files": [ + "evals/files/rate-limiter-fix.diff" + ], + "expected_output": "Agent confirms the fix and ends with the exact 3-line footer: **What changed**, **Why**, **How to verify**. Shows only the changed line, not the full file.", + "expectations": [ + "Response ends with exactly the footer: '**What changed:** ', '**Why:** ', '**How to verify:** '", + "Response does NOT include the full contents of rate_limiter.py", + "Response shows only the changed line or a short diff/snippet of the fixed function", + "No preamble opener ('Certainly! I'll fix that for you...')", + "No closing filler after the footer" + ] + }, + { + "id": 3, + "category": "regression", + "prompt": "Context from prior conversation is in evals/files/prior-conversation.txt. Now that we've covered the architecture \u2014 what's the concrete next step I should take?", + "files": [ + "evals/files/prior-conversation.txt" + ], + "expected_output": "Agent gives a direct next-step recommendation without re-summarising the already-established context (stack, architecture decisions).", + "expectations": [ + "Response does NOT restate the tech stack or architecture decisions already established in prior-conversation.txt", + "Response does NOT open with 'As we discussed...' or 'To summarise what we covered...' 
or 'So you have a FastAPI app with...'", + "Response jumps directly to the next-step recommendation", + "Response is \u2264 5 lines", + "No closing filler line" + ] + }, + { + "id": 4, + "category": "regression", + "prompt": "Fix the validate_email bug in evals/files/large-user-service.py \u2014 it currently allows empty strings through. Fix only that function.", + "files": [ + "evals/files/large-user-service.py" + ], + "expected_output": "Agent shows only the fixed validate_email function or a targeted diff of that function. Does not paste the entire large-user-service.py.", + "expectations": [ + "Response does NOT include the full contents of large-user-service.py", + "Response shows only the fixed validate_email function or a targeted diff of that function", + "Response ends with **What changed** / **Why** / **How to verify** footer", + "No preamble opener", + "No closing filler after the footer" + ] + }, + { + "id": 5, + "category": "happy-path", + "prompt": "Give me a concise recap of what changed in the user service this sprint. Use evals/files/sprint-changelog.txt as input.", + "files": [ + "evals/files/sprint-changelog.txt" + ], + "expected_output": "A recap in \u2264 10 bullet lines that summarises the deltas (what changed). Does not quote large blocks from the changelog verbatim.", + "expectations": [ + "Recap is \u2264 10 lines total", + "Uses bullet list format, not prose paragraphs", + "Summarises what changed (deltas), not what existed before", + "Does NOT quote large blocks from sprint-changelog.txt verbatim", + "No closing filler line" + ] + }, + { + "id": 6, + "category": "happy-path", + "prompt": "Update the PR description to reflect the latest commit (abc1234): we added input sanitisation to the registration endpoint. Current PR body is in evals/files/pr-description-existing.md.", + "files": [ + "evals/files/pr-description-existing.md" + ], + "expected_output": "Agent appends a new '## Update YYYY-MM-DD' section (dated with today's date) with commit abc1234. 
Does not rewrite or delete any prior section.", + "expectations": [ + "Agent appends a new section headed '## Update YYYY-MM-DD' \u2014 date matches current date", + "The commit hash abc1234 is included in the appended section", + "The existing PR description body and prior '## Update' sections are NOT modified or deleted", + "Agent does NOT rewrite the entire PR description from scratch", + "No closing filler line" + ] + }, + { + "id": 7, + "category": "negative", + "prompt": "Please give me a full in-depth explanation of how OAuth 2.0 Authorization Code Flow works \u2014 I want every step covered in detail.", + "files": [], + "expected_output": "Agent provides a complete, detailed explanation covering all steps. Conciseness rules are suspended because the user explicitly requested full detail.", + "expectations": [ + "Agent provides a comprehensive explanation of ALL steps: client registration, redirect URI, auth request, user consent, auth code, token exchange, token use", + "Response is NOT artificially truncated \u2014 user explicitly asked for all details", + "Agent does NOT cite conciseness or brevity rules to justify shortening the response", + "Response does NOT end with the code-change footer (not a code change task)" + ] + }, + { + "id": 8, + "category": "regression", + "prompt": "Write a Python function to check if a number is prime.", + "files": [], + "expected_output": "Agent writes a correct, concise isPrime function and ends with the **What changed** / **Why** / **How to verify** footer. No preamble.", + "expectations": [ + "Response does NOT open with 'Certainly!', 'Sure!', 'Great!' or any preamble", + "Function is correct \u2014 returns False for n < 2, uses trial division up to sqrt(n) or equivalent", + "Response ends with **What changed** / **Why** / **How to verify** footer", + "Response does NOT end with 'Let me know if you want me to add tests!' 
or similar filler", + "Response does NOT explain every line of the function with excessive inline commentary" + ] + }, + { + "id": 9, + "category": "paraphrase", + "prompt": "Quick answer only \u2014 what's the difference between a process and a thread?", + "files": [], + "expected_output": "A concise answer (\u2264 4 lines) on the key difference. No acknowledgement of the 'quick answer' request \u2014 it just IS quick.", + "expectations": [ + "Response is \u2264 4 lines", + "Response correctly explains the key difference (a process has its own address space and resources; threads share their process's memory and are cheaper to create and switch)", + "Response does NOT open by acknowledging 'Quick answer:' or 'Here's a quick answer:' \u2014 the terseness is implicit", + "No preamble opener", + "No closing filler", + "Response does NOT end with a **What changed** / **Why** / **How to verify** footer — pure Q&A, not a code output response" + ] + }, + { + "id": 10, + "category": "edge-case", + "prompt": "We've been working on this feature for the past hour. Here's what we've done so far: set up the FastAPI router, added the Pydantic request/response schemas, wired in the database session dependency, and wrote the POST /users endpoint. What should we tackle next?", + "files": [], + "expected_output": "Agent gives a direct next-step suggestion without restating the four things the user just listed.", + "expectations": [ + "Response does NOT restate or re-list the four completed items (router, schemas, DB session, POST endpoint)", + "Response gives a concrete, actionable next step (e.g. 
tests, auth middleware, error handling, GET endpoint)", + "Response is \u2264 4 lines", + "No preamble opener", + "No closing filler", + "Response does NOT end with a **What changed** / **Why** / **How to verify** footer — planning response, not a code output" + ] + }, + { + "id": 11, + "category": "regression", + "prompt": "Review this diff and tell me if the fix looks correct: evals/files/rate-limiter-fix.diff", + "files": [ + "evals/files/rate-limiter-fix.diff" + ], + "expected_output": "Agent reviews the diff and confirms whether the fix is correct. Does NOT append the code-change footer — this is a code review, not a code output response.", + "expectations": [ + "Response confirms the fix is correct (t > window_start correctly evicts timestamps outside the sliding window)", + "Response does NOT end with **What changed** / **Why** / **How to verify** footer — this is a review task, not a code output response", + "Response does not paste the full contents of rate_limiter.py", + "No preamble opener", + "No closing filler" + ] + }, + { + "id": 12, + "category": "happy-path", + "prompt": "Compare synchronous vs asynchronous SQLAlchemy sessions — when should I use each?", + "files": [], + "expected_output": "Agent answers with structured output (table or clearly delineated bullet comparison) rather than a dense prose paragraph. 
Concise, within line limits.", + "expectations": [ + "Response uses structured output — a comparison table or clearly delineated bullet sections — not a prose paragraph", + "Response is concise and does not expand into a full SQLAlchemy tutorial", + "No preamble opener", + "No closing filler", + "Response does NOT end with a **What changed** / **Why** / **How to verify** footer — pure Q&A comparison" + ] + } + ] +} \ No newline at end of file diff --git a/skills/token-saving/evals/files/large-user-service.py b/skills/token-saving/evals/files/large-user-service.py new file mode 100644 index 0000000..0fc1395 --- /dev/null +++ b/skills/token-saving/evals/files/large-user-service.py @@ -0,0 +1,185 @@ +"""User service — business logic layer for user management.""" +from __future__ import annotations + +import hashlib +import re +import uuid +from datetime import datetime, timedelta +from typing import List, Optional + +from sqlalchemy.orm import Session + +from src.models.user import User +from src.schemas.user import UserCreate, UserUpdate +from src.core.security import hash_password, verify_password +from src.core.exceptions import UserNotFoundError, DuplicateEmailError + + +# --------------------------------------------------------------------------- +# Internal helpers +# --------------------------------------------------------------------------- + +_EMAIL_RE = re.compile(r"^[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}$") +_PASSWORD_MIN_LEN = 8 + + +def validate_email(email: str) -> bool: + """Return True if *email* is a syntactically valid e-mail address. + + Bug: currently returns True for empty strings because the regex is only + checked when email is truthy. Fix: add an explicit empty-string guard. 
+ """ + # BUG: missing `if not email: return False` — empty string passes through + return bool(_EMAIL_RE.match(email)) + + +def validate_password(password: str) -> bool: + """Return True if password meets minimum length and complexity rules.""" + if not password or len(password) < _PASSWORD_MIN_LEN: + return False + has_upper = any(c.isupper() for c in password) + has_digit = any(c.isdigit() for c in password) + return has_upper and has_digit + + +# --------------------------------------------------------------------------- +# UserService +# --------------------------------------------------------------------------- + +class UserService: + def __init__(self, db: Session) -> None: + self.db = db + + # ------------------------------------------------------------------ + # Read + # ------------------------------------------------------------------ + + def get_user(self, user_id: str) -> User: + user = self.db.query(User).filter(User.id == user_id).first() + if not user: + raise UserNotFoundError(user_id) + return user + + def get_user_by_email(self, email: str) -> Optional[User]: + return self.db.query(User).filter(User.email == email).first() + + def get_users(self, skip: int = 0, limit: int = 50) -> List[User]: + return ( + self.db.query(User) + .filter(User.is_active.is_(True)) + .offset(skip) + .limit(limit) + .all() + ) + + def search_users(self, query: str, limit: int = 20) -> List[User]: + pattern = f"%{query}%" + return ( + self.db.query(User) + .filter( + (User.email.ilike(pattern)) | (User.display_name.ilike(pattern)) + ) + .limit(limit) + .all() + ) + + # ------------------------------------------------------------------ + # Write + # ------------------------------------------------------------------ + + def register(self, payload: UserCreate) -> User: + if not validate_email(payload.email): + raise ValueError(f"Invalid email address: {payload.email!r}") + if not validate_password(payload.password): + raise ValueError("Password does not meet complexity 
requirements.") + if self.get_user_by_email(payload.email): + raise DuplicateEmailError(payload.email) + + user = User( + id=str(uuid.uuid4()), + email=payload.email.lower().strip(), + password_hash=hash_password(payload.password), + display_name=payload.display_name, + created_at=datetime.utcnow(), + is_active=True, + ) + self.db.add(user) + self.db.commit() + self.db.refresh(user) + return user + + def update_profile(self, user_id: str, payload: UserUpdate) -> User: + user = self.get_user(user_id) + if payload.display_name is not None: + user.display_name = payload.display_name + if payload.email is not None: + if not validate_email(payload.email): + raise ValueError(f"Invalid email address: {payload.email!r}") + existing = self.get_user_by_email(payload.email) + if existing and existing.id != user_id: + raise DuplicateEmailError(payload.email) + user.email = payload.email.lower().strip() + user.updated_at = datetime.utcnow() + self.db.commit() + self.db.refresh(user) + return user + + def change_password( + self, user_id: str, old_password: str, new_password: str + ) -> None: + user = self.get_user(user_id) + if not verify_password(old_password, user.password_hash): + raise ValueError("Current password is incorrect.") + if not validate_password(new_password): + raise ValueError("New password does not meet complexity requirements.") + user.password_hash = hash_password(new_password) + user.updated_at = datetime.utcnow() + self.db.commit() + + def deactivate_user(self, user_id: str) -> None: + user = self.get_user(user_id) + user.is_active = False + user.deactivated_at = datetime.utcnow() + self.db.commit() + + def reactivate_user(self, user_id: str) -> None: + user = self.get_user(user_id) + user.is_active = True + user.deactivated_at = None + user.updated_at = datetime.utcnow() + self.db.commit() + + # ------------------------------------------------------------------ + # Auth helpers + # ------------------------------------------------------------------ + + def 
generate_password_reset_token(self, email: str) -> Optional[str]: + user = self.get_user_by_email(email) + if not user or not user.is_active: + return None + token = hashlib.sha256( + f"{user.id}{user.password_hash}{datetime.utcnow().isoformat()}".encode() + ).hexdigest() + user.reset_token = token + user.reset_token_expires = datetime.utcnow() + timedelta(hours=1) + self.db.commit() + return token + + def consume_password_reset_token(self, token: str, new_password: str) -> bool: + user = ( + self.db.query(User) + .filter(User.reset_token == token) + .first() + ) + if not user: + return False + if user.reset_token_expires < datetime.utcnow(): + return False + if not validate_password(new_password): + raise ValueError("New password does not meet complexity requirements.") + user.password_hash = hash_password(new_password) + user.reset_token = None + user.reset_token_expires = None + user.updated_at = datetime.utcnow() + self.db.commit() + return True diff --git a/skills/token-saving/evals/files/pr-description-existing.md b/skills/token-saving/evals/files/pr-description-existing.md new file mode 100644 index 0000000..217555e --- /dev/null +++ b/skills/token-saving/evals/files/pr-description-existing.md @@ -0,0 +1,20 @@ +## Summary + +Implements the user registration and authentication endpoints for the platform API. + +Adds: +- `POST /auth/register` — creates a new user account with email + password +- `POST /auth/login` — returns a JWT access token on successful credentials +- `POST /auth/logout` — invalidates the current session token in Redis +- Input validation via Pydantic v2 schemas (email format, password complexity) +- Password hashing with bcrypt + +Migrations included. All endpoints covered by integration tests (pytest + TestClient). 
+ +--- + +## Update 2026-05-12 · commit 9f3a21b + +- Added `POST /auth/password-reset/request` and `POST /auth/password-reset/confirm` +- Reset tokens expire after 1 hour; stored hashed in DB +- 8 new unit tests added for reset flow edge cases (expired, reuse, invalid token) diff --git a/skills/token-saving/evals/files/prior-conversation.txt b/skills/token-saving/evals/files/prior-conversation.txt new file mode 100644 index 0000000..1490b10 --- /dev/null +++ b/skills/token-saving/evals/files/prior-conversation.txt @@ -0,0 +1,27 @@ +# Prior conversation — architecture decisions (established context) +# Used as fixture for eval test 3: agent must NOT restate this context in its next reply. + +--- + +User: I'm starting a new backend project. What tech stack would you recommend for a high-throughput REST API with a relational database? +Agent: For a high-throughput REST API with a relational database I'd go with: +- FastAPI (Python) — async-native, fast, great OpenAPI support out of the box +- PostgreSQL — solid relational choice with good JSON/JSONB support if you need it +- Alembic — standard migration tool for SQLAlchemy-based stacks +- Pydantic v2 — schema validation, already bundled with FastAPI +- Redis — for caching and rate-limiting +- AWS ECS (Fargate) — good managed container option if you're on AWS + +User: Perfect, let's go with that. I want to use async SQLAlchemy sessions and run the service on port 8000. + +Agent: Confirmed — async SQLAlchemy with asyncpg driver, port 8000. Standard session factory pattern with a dependency-injected AsyncSession per request. + +User: Great. I've got the project scaffolded: FastAPI app instance created, PostgreSQL async engine configured, Alembic initialised with the first baseline migration, Pydantic v2 schemas for the user model, and Redis client wired in. The service is listening on port 8000. + +Agent: Solid foundation. You have all the core infrastructure pieces in place. + +--- +# End of established context. 
The eval prompt asks: "what's the concrete next step?" +# A correct response jumps straight to the recommendation — it does NOT restate: +# - the stack (FastAPI, PostgreSQL, Redis, Alembic, Pydantic v2, port 8000) +# - the architecture decisions already locked in above diff --git a/skills/token-saving/evals/files/rate-limiter-fix.diff b/skills/token-saving/evals/files/rate-limiter-fix.diff new file mode 100644 index 0000000..858511d --- /dev/null +++ b/skills/token-saving/evals/files/rate-limiter-fix.diff @@ -0,0 +1,28 @@ +diff --git a/src/middleware/rate_limiter.py b/src/middleware/rate_limiter.py +index 3a1f2c8..b9e4d71 100644 +--- a/src/middleware/rate_limiter.py ++++ b/src/middleware/rate_limiter.py +@@ -1,25 +1,25 @@ + import time + from collections import defaultdict + from typing import Dict, List + + + class RateLimiter: + """Sliding window rate limiter keyed by client_id.""" + + def __init__(self, max_requests: int = 100, window_seconds: int = 60): + self.max_requests = max_requests + self.window_seconds = window_seconds + self.requests: Dict[str, List[float]] = defaultdict(list) + + def is_allowed(self, client_id: str) -> bool: + now = time.time() + window_start = now - self.window_seconds +- self.requests[client_id] = [t for t in self.requests[client_id] if t > now] ++ self.requests[client_id] = [t for t in self.requests[client_id] if t > window_start] + self.requests[client_id].append(now) + return len(self.requests[client_id]) <= self.max_requests + + def reset(self, client_id: str) -> None: + self.requests.pop(client_id, None) diff --git a/skills/token-saving/evals/files/sprint-changelog.txt b/skills/token-saving/evals/files/sprint-changelog.txt new file mode 100644 index 0000000..6992ae4 --- /dev/null +++ b/skills/token-saving/evals/files/sprint-changelog.txt @@ -0,0 +1,36 @@ +# Sprint 23 — User Service Changelog +# Period: 2026-04-28 to 2026-05-09 + +## AUTH-201 — Password reset flow +- Added POST /auth/password-reset/request endpoint; sends reset 
token via email +- Added POST /auth/password-reset/confirm endpoint; validates token, sets new password +- Reset tokens expire after 1 hour; stored hashed in users.reset_token column +- Migration: adds reset_token (varchar 64, nullable) and reset_token_expires (timestamptz, nullable) to users table +- Unit tests: 8 new cases covering happy path, expired token, invalid token, reuse prevention + +## USER-88 — Profile update endpoint +- Added PATCH /users/{id}/profile; supports display_name and email changes +- Email change triggers re-verification flow; sets email_verified = false until confirmed +- Duplicate email check added before persisting change +- Integration test: 5 new cases + +## USER-91 — Soft delete / deactivate +- Added DELETE /users/{id} (soft delete); sets is_active = false, records deactivated_at +- GET /users now filters out inactive users by default; added ?include_inactive=true query param for admin use +- Migration: adds deactivated_at (timestamptz, nullable) to users table + +## INFRA-14 — Rate limiting middleware +- Added sliding-window rate limiter (100 req/60 s per client IP) as FastAPI middleware +- Fixed window comparison bug: the filter compared timestamps against `now` instead of `window_start` (every earlier request was discarded, so the limit was never enforced) +- Redis backend for distributed rate limit state; falls back to in-memory if Redis unavailable +- Config: RATE_LIMIT_MAX_REQUESTS, RATE_LIMIT_WINDOW_SECONDS env vars added to .env.example + +## CHORE-09 — Dependency bumps +- Upgraded fastapi 0.109 → 0.111 (security patch: CVE-2024-24762) +- Upgraded pydantic 2.5 → 2.7 (minor; no breaking changes) +- Upgraded alembic 1.13 → 1.14 (minor; added index reflection improvements) + +## TEST-22 — Coverage improvements +- Overall coverage: 61% → 78% +- Added missing edge-case tests for validate_email (empty string, unicode domains, subaddressing) +- Added missing edge-case tests for validate_password (all-digits, all-uppercase, exactly min length) diff --git 
a/skills/token-saving/evals/trigger-eval.json b/skills/token-saving/evals/trigger-eval.json new file mode 100644 index 0000000..f739903 --- /dev/null +++ b/skills/token-saving/evals/trigger-eval.json @@ -0,0 +1,107 @@ +[ + { + "_comment": "token-saving is always-active: should_trigger is true for every prompt type. There are no should_trigger:false cases — that is itself the key assertion. Cases are grouped into: (1) keyword-free prompts (tests that 'always active' is not just keyword matching), (2) explicit-keyword prompts (tests that the description's trigger list is recognised), (3) boundary prompts (explicit verbosity request — skill still LOADS but its rules are suspended per the override section). The trigger-eval passes when ALL 16 cases return true." + }, + { + "id": "t01-generic-coding-no-keyword", + "query": "What does the `__slots__` attribute do in Python?", + "should_trigger": true, + "reason": "Generic coding question with zero token-saving keywords. Always-active skill must load regardless of prompt content." + }, + { + "id": "t02-code-generation-no-keyword", + "query": "Write a Python function that flattens a nested list.", + "should_trigger": true, + "reason": "Code generation task. No conciseness keywords present. Always-active rule applies." + }, + { + "id": "t03-debugging-no-keyword", + "query": "Why does my Dockerfile build succeed but the container exits immediately at startup?", + "should_trigger": true, + "reason": "Debugging question with no skill keywords. Validates always-active claim holds for question-style prompts." + }, + { + "id": "t04-conceptual-explanation-no-keyword", + "query": "Explain eventual consistency in distributed systems.", + "should_trigger": true, + "reason": "Conceptual explanation request. No trigger keywords. Tests that the skill doesn't require topic-match to load." 
+ }, + { + "id": "t05-non-technical-no-keyword", + "query": "What is the difference between a kanban board and a sprint board?", + "should_trigger": true, + "reason": "Non-technical process question. Always-active means the skill loads even outside software engineering topics." + }, + { + "id": "t06-pr-description-update-no-keyword", + "query": "Update the PR description to mention the hotfix for the null pointer in checkout.", + "should_trigger": true, + "reason": "PR workflow task. No explicit conciseness language. Tests always-active across workflow tasks." + }, + { + "id": "t07-recap-request-no-keyword", + "query": "Summarise the changes we made to the auth service today.", + "should_trigger": true, + "reason": "'summarise' appears in the description trigger list. But the primary signal is always-active. Confirms overlap between trigger list and always-active rule." + }, + { + "id": "t08-explicit-be-concise", + "query": "Give me a concise explanation of how TLS handshakes work.", + "should_trigger": true, + "reason": "'concise' is an explicit trigger keyword in the description. Skill must load and apply length constraints." + }, + { + "id": "t09-explicit-save-tokens", + "query": "Keep the response short — save tokens where you can.", + "should_trigger": true, + "reason": "'save tokens' and 'short' are explicit trigger phrases. Tests that the description's keyword list is matched." + }, + { + "id": "t10-explicit-too-verbose", + "query": "Your last answer was too verbose. Give me the same thing but shorter.", + "should_trigger": true, + "reason": "'too verbose' and 'shorter' are both explicit triggers in the description. Should reliably load the skill." + }, + { + "id": "t11-explicit-shorter", + "query": "Can you make that shorter?", + "should_trigger": true, + "reason": "'shorter' is an explicit trigger keyword. Minimal prompt — tests that a single keyword is sufficient." 
+ }, + { + "id": "b01-explicit-full-detail-boundary", + "query": "Give me a full in-depth explanation of OAuth 2.0 Authorization Code Flow — I want every step.", + "should_trigger": true, + "boundary": true, + "reason": "Explicit verbosity request. Skill should still LOAD (trigger=true), but the override section suspends all length and summarisation rules. Trigger result is true; behavioural result is unconstrained response.", + "expected_behaviour": "Skill loads. All length limits and bullet caps are suspended. Full response with every step is produced." + }, + { + "id": "b02-deep-dive-boundary", + "query": "I want a deep dive — don't hold back on detail.", + "should_trigger": true, + "boundary": true, + "reason": "'deep dive' matches the override trigger in the skill body. Skill loads but rules are suspended.", + "expected_behaviour": "Skill loads. No artificial truncation. Response length reflects actual content depth." + }, + { + "id": "b03-rationale-boundary", + "query": "Walk me through the full rationale for choosing event sourcing over CRUD for this audit log.", + "should_trigger": true, + "boundary": true, + "reason": "'full rationale' matches the override trigger. Skill loads but all conciseness rules are suspended for this response.", + "expected_behaviour": "Skill loads. Agent covers all architectural tradeoffs without summarising or capping bullets." + }, + { + "id": "t12-co-trigger-pr-review", + "query": "Review this PR for API contract breaking changes before we merge.", + "should_trigger": true, + "reason": "Token-saving is always-active and must co-load alongside the pr-review skill. pr-review output structure (Blocker/Important/Nit) takes precedence per the skill's precedence rule, but token-saving still loads to suppress filler openers and closing padding." 
+ }, + { + "id": "t13-co-trigger-kudos", + "query": "Nominate Sarah for kudos — she fixed a critical auth bug under pressure.", + "should_trigger": true, + "reason": "Token-saving is always-active and must co-load alongside the kudos skill. Kudos nomination format takes precedence, but token-saving still loads to enforce no-preamble and no-filler rules." + } +] From e709f122c670724ccb8e61ba4a1fa0391c6cd9e4 Mon Sep 17 00:00:00 2001 From: miroslavpojer Date: Wed, 20 May 2026 15:54:08 +0200 Subject: [PATCH 02/35] Add title parameter to release notes presence check in PR workflow --- .github/workflows/check_pr_release_notes.yml | 1 + 1 file changed, 1 insertion(+) diff --git a/.github/workflows/check_pr_release_notes.yml b/.github/workflows/check_pr_release_notes.yml index af3761e..646e098 100644 --- a/.github/workflows/check_pr_release_notes.yml +++ b/.github/workflows/check_pr_release_notes.yml @@ -21,3 +21,4 @@ jobs: github-repository: ${{ github.repository }} pr-number: ${{ github.event.number }} skip-labels: "no RN" + title: "## [Rr]elease [Nn]otes" From f653b27cb224693c8ff1e6dcc8127be8366e3dba Mon Sep 17 00:00:00 2001 From: miroslavpojer Date: Thu, 21 May 2026 08:35:02 +0200 Subject: [PATCH 03/35] Update documentation for token-saving skill; add contribution guidelines and skill overview --- CONTRIBUTING.md | 1 + README.md | 5 +++- docs/README.md | 24 ++++++++++----- docs/token-saving.md | 71 ++++++++++++++++++++++++++++++++++++++++++++ 4 files changed, 92 insertions(+), 9 deletions(-) create mode 100644 docs/token-saving.md diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 7a213fc..66e1e74 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -227,5 +227,6 @@ Before opening a pull request, verify: - [ ] No hardcoded credentials, secrets, or internal paths in skill body or scripts - [ ] Any script in `scripts/` is referenced from `SKILL.md` with usage guidance - [ ] New skill's description does not conflict with or shadow existing skills +- [ ] Skill added to 
the catalog table in `README.md` - [ ] Evals exist (or a note explains why they are not applicable) - [ ] `skills-ref validate ./skills/my-skill` passes (install: `pip install skills-ref`) diff --git a/README.md b/README.md index 428ed11..5e995c4 100644 --- a/README.md +++ b/README.md @@ -75,7 +75,9 @@ For the full guide — what skills are, how they activate, project-scoped instal Browse all available skills in the **[skills/](./skills/)** directory — each skill folder contains a `SKILL.md` with its purpose, trigger phrases, and full instructions. -> The catalog table will be populated as skills are added. See `skills/` for the current set. +| Skill | Description | +|------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------| +| **[token-saving](./skills/token-saving/)** | Always-active response discipline — enforces brevity, no filler openers or closers, structured output, and a What/Why/How footer on code responses. Suspends on explicit "full detail" requests. | ## Finding More Skills @@ -116,3 +118,4 @@ Claude, Cursor, Windsurf, and custom pipelines. ## Troubleshooting Setup issues and common fixes are covered in **[docs/troubleshooting.md](./docs/troubleshooting.md)**. +All documentation guides are indexed at **[docs/](./docs/)**. diff --git a/docs/README.md b/docs/README.md index a13b50d..e60899a 100644 --- a/docs/README.md +++ b/docs/README.md @@ -2,13 +2,21 @@ Navigation hub for all guides in this repository. Browse by category below. 
-| Guide | Audience | Description | -|-----------------------------------------|----------|------------------------------------------------------------------------------------| -| [Getting Started](./getting-started.md) | Users | What skills are, how to install them, Copilot CLI usage | -| [Troubleshooting](./troubleshooting.md) | Users | Setup guides and fixes for install, activation, and proxy issues | -| [Contributing](../CONTRIBUTING.md) | Authors | Skill folder layout, frontmatter, description writing, body guidelines, PR process | -| [Skill Testing](./skill-testing.md) | Authors | Eval creation, fixtures, regression loops, trigger and description optimization | - -> **Keep this index up to date.** When you add a new guide under `docs/`, add a row to the table above. +## Setup & Repository Guides + +| Guide | Description | +|-----------------------------------------|-------------------------------------------------------------------------------------| +| [Getting Started](./getting-started.md) | What skills are, how to install them, Copilot CLI usage | +| [Contributing](../CONTRIBUTING.md) | Skill folder layout, frontmatter, description writing, body guidelines, PR process | +| [Skill Testing](./skill-testing.md) | Eval creation, fixtures, regression loops, trigger and description optimization | +| [Troubleshooting](./troubleshooting.md) | Setup fixes for install, activation, and proxy issues | + +## Skill Guides + +| Guide | Description | +|-------------------------------------|------------------------------------------------------------------------------------| +| [Token Saving](./token-saving.md) | Keeping AI responses concise — how the token-saving skill works and when it applies | + +> **Keep this index up to date.** When you add a new guide, add a row to the appropriate table above. See also the [main README](../README.md) for the skill catalog, scope, and FAQ. 
diff --git a/docs/token-saving.md b/docs/token-saving.md new file mode 100644 index 0000000..22a6a60 --- /dev/null +++ b/docs/token-saving.md @@ -0,0 +1,71 @@ +# Token-Saving Skill + +The `token-saving` skill enforces concise, low-noise AI responses. It is **always active** — it applies to every reply without needing to be triggered by a specific phrase. + +--- + +## What it changes + +Without the skill, AI assistants commonly pad responses with filler openers, closing platitudes, and repeated context. This skill removes that noise. + +| Behaviour | Without skill | With skill | +|-----------|--------------|------------| +| Response opener | "Great question! Certainly, I'd be happy to help..." | Directly answers | +| Closing | "Let me know if you have any questions!" | Stops when done | +| Repeated context | Restates what you just said | Skips it | +| Factual answers | Unlimited prose | ≤ 5 lines | +| Action lists | Unlimited bullets | Capped at 4 | +| Full file dumps | Common | Only when you ask | + +--- + +## Code output footer + +Every response that writes or changes code ends with exactly this footer — no more, no less: + +``` +**What changed:** +**Why:** +**How to verify:** +``` + +This applies to: new functions, patches, inline diffs, config snippets, or any code block representing a change. + +It does **not** apply to: Q&A, reviews, planning, comparisons, or conceptual explanations. + +--- + +## Overriding the skill + +The brevity rules suspend for a single response when you explicitly ask for depth: + +> "Give me a full explanation." +> "Deep dive into this." +> "Don't hold back." +> "Complete explanation." + +The next response is fully unrestricted. Rules resume after that. + +--- + +## Precedence + +If another active skill specifies its own output format (e.g. a review skill with a Blocker / Important / Nit structure), that format takes precedence over this skill's rules. 
+ +--- + +## Installation + +The skill is installed along with the rest of the toolkit: + +```bash +npx skills add https://github.com/AbsaOSS/agentic-toolkit -g +``` + +To install only this skill: + +```bash +npx skills add https://github.com/AbsaOSS/agentic-toolkit -g --skill token-saving +``` + +See [Getting Started](./getting-started.md) for the full install guide. From fd9f957321d16993dbad50d6acaf6fe73aa0b252 Mon Sep 17 00:00:00 2001 From: miroslavpojer Date: Thu, 21 May 2026 08:54:40 +0200 Subject: [PATCH 04/35] Add token-saving skill documentation with response formatting rules --- skills/{ => token-saving}/SKILL.md | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename skills/{ => token-saving}/SKILL.md (100%) diff --git a/skills/SKILL.md b/skills/token-saving/SKILL.md similarity index 100% rename from skills/SKILL.md rename to skills/token-saving/SKILL.md From b0eb566a9b491d87b235efaf4da3463a12b98c18 Mon Sep 17 00:00:00 2001 From: miroslavpojer Date: Thu, 21 May 2026 17:23:44 +0200 Subject: [PATCH 05/35] Initial design, review of skill.md files not finished. 
--- .github/agents/living-doc-copilot.agent.md | 105 ++++++++++ README.md | 14 ++ docs/README.md | 12 ++ docs/getting-started.md | 45 +++++ docs/living-doc-copilot.md | 125 ++++++++++++ skills/living-doc-create-feature/SKILL.md | 111 ++++++++++ .../evals/evals.json | 124 ++++++++++++ .../evals/files/raw-feature-notes.md | 33 +++ .../evals/fixture-map.md | 27 +++ .../evals/trigger-eval.json | 16 ++ .../living-doc-create-functionality/SKILL.md | 133 ++++++++++++ .../evals/evals.json | 112 ++++++++++ .../evals/files/broad-functionality.json | 25 +++ .../evals/fixture-map.md | 28 +++ .../evals/trigger-eval.json | 16 ++ skills/living-doc-create-user-story/SKILL.md | 124 ++++++++++++ .../evals/evals.json | 117 +++++++++++ .../evals/files/incomplete-user-story.json | 23 +++ .../evals/fixture-map.md | 28 +++ .../evals/trigger-eval.json | 16 ++ skills/living-doc-gap-finder/SKILL.md | 191 ++++++++++++++++++ skills/living-doc-gap-finder/evals/evals.json | 110 ++++++++++ .../evals/files/catalog-snapshot.json | 59 ++++++ .../evals/fixture-map.md | 27 +++ .../evals/trigger-eval.json | 16 ++ .../references/glossary.md | 120 +++++++++++ skills/living-doc-impact-analysis/SKILL.md | 148 ++++++++++++++ .../evals/evals.json | 156 ++++++++++++++ .../files/changed-notification-service.py | 54 +++++ .../evals/fixture-map.md | 29 +++ .../evals/trigger-eval.json | 68 +++++++ skills/living-doc-update/SKILL.md | 138 +++++++++++++ skills/living-doc-update/evals/evals.json | 157 ++++++++++++++ .../evals/files/payment-living-doc.md | 34 ++++ skills/living-doc-update/evals/fixture-map.md | 29 +++ .../living-doc-update/evals/trigger-eval.json | 74 +++++++ skills/references/living-doc-glossary.md | 142 +++++++++++++ 37 files changed, 2786 insertions(+) create mode 100644 .github/agents/living-doc-copilot.agent.md create mode 100644 docs/living-doc-copilot.md create mode 100644 skills/living-doc-create-feature/SKILL.md create mode 100644 skills/living-doc-create-feature/evals/evals.json create mode 
100644 skills/living-doc-create-feature/evals/files/raw-feature-notes.md create mode 100644 skills/living-doc-create-feature/evals/fixture-map.md create mode 100644 skills/living-doc-create-feature/evals/trigger-eval.json create mode 100644 skills/living-doc-create-functionality/SKILL.md create mode 100644 skills/living-doc-create-functionality/evals/evals.json create mode 100644 skills/living-doc-create-functionality/evals/files/broad-functionality.json create mode 100644 skills/living-doc-create-functionality/evals/fixture-map.md create mode 100644 skills/living-doc-create-functionality/evals/trigger-eval.json create mode 100644 skills/living-doc-create-user-story/SKILL.md create mode 100644 skills/living-doc-create-user-story/evals/evals.json create mode 100644 skills/living-doc-create-user-story/evals/files/incomplete-user-story.json create mode 100644 skills/living-doc-create-user-story/evals/fixture-map.md create mode 100644 skills/living-doc-create-user-story/evals/trigger-eval.json create mode 100644 skills/living-doc-gap-finder/SKILL.md create mode 100644 skills/living-doc-gap-finder/evals/evals.json create mode 100644 skills/living-doc-gap-finder/evals/files/catalog-snapshot.json create mode 100644 skills/living-doc-gap-finder/evals/fixture-map.md create mode 100644 skills/living-doc-gap-finder/evals/trigger-eval.json create mode 100644 skills/living-doc-gap-finder/references/glossary.md create mode 100644 skills/living-doc-impact-analysis/SKILL.md create mode 100644 skills/living-doc-impact-analysis/evals/evals.json create mode 100644 skills/living-doc-impact-analysis/evals/files/changed-notification-service.py create mode 100644 skills/living-doc-impact-analysis/evals/fixture-map.md create mode 100644 skills/living-doc-impact-analysis/evals/trigger-eval.json create mode 100644 skills/living-doc-update/SKILL.md create mode 100644 skills/living-doc-update/evals/evals.json create mode 100644 skills/living-doc-update/evals/files/payment-living-doc.md create 
mode 100644 skills/living-doc-update/evals/fixture-map.md create mode 100644 skills/living-doc-update/evals/trigger-eval.json create mode 100644 skills/references/living-doc-glossary.md diff --git a/.github/agents/living-doc-copilot.agent.md b/.github/agents/living-doc-copilot.agent.md new file mode 100644 index 0000000..1c29cf1 --- /dev/null +++ b/.github/agents/living-doc-copilot.agent.md @@ -0,0 +1,105 @@ +--- +description: > + Maintain the living documentation catalog — single source of truth for requirements, + behaviours, and traceability. Use for: creating Feature / Functionality / User Story + entities, updating or deprecating entities, analysing code change impact on docs, + finding documentation gaps, and PO planning in PLANNED state. + Triggers: "create user story", "document feature", "update AC", "impact analysis", + "living doc gaps", "PLAN mode", "HEALING mode", "deprecate entity", "living doc copilot", + "add AC to user story", "trace affected features", "update feature registry". +tools: + - read_file + - replace_string_in_file + - create_file + - grep_search + - file_search + - semantic_search +--- + +# @living-doc-copilot + +Requirements layer agent. Owns the living documentation catalog — creates, updates, heals, and plans entities. Does not write code or test files. + +## Initialisation + +On every session start, ask: + +> "Which storage format does your living doc use? Describe the entity structure, field names, and where entities are stored (e.g. YAML files in `docs/living-doc/`, ADO work items, Confluence pages)." + +Wait for the answer before any create or update operation. Extract from the response: +- **Storage location** — where entity files live (path pattern or external system) +- **Entity templates** — expected fields and their names per entity type (US, Feature, Functionality) +- **AC block structure** — how ACs are represented (inline fields, nested list, table) +- **Field name mappings** — e.g. 
what the project calls `state`, `version`, `id` + +Never assume a format. If the answer is incomplete, ask one targeted follow-up before proceeding. + +## Scope + +- Create User Story, Feature, and Functionality entities from business requirements or PO descriptions +- Add, update, or reprioritise Acceptance Criteria on existing entities +- Deprecate entities whose corresponding code has been deleted or superseded +- Promote entities from `PLANNED` to `ACTIVE` state after implementation is confirmed +- Analyse the impact of a code change or PR on the catalog (which entities are affected) +- Find gaps in the catalog: undocumented behaviours, orphan tests, untested ACs (HEALING mode) +- Draft ACs from PO descriptions without existing code, in `PLANNED` state (PLAN mode) + +## Does NOT + +- Write Gherkin scenarios or feature files → hand off to `@bdd-copilot` +- Explore or crawl web apps → hand off to `@bdd-copilot` +- Write any test code → hand off to `@sdet-copilot` +- Repair PageObject selectors or step definitions → hand off to `@bdd-copilot` + +## AC Metadata + +Every AC must carry these fields: + +| Field | Values | +|---|---| +| `state` | `PLANNED` / `ACTIVE` / `DEPRECATED` / `IN_REVIEW` | +| `version` | Semantic version string | +| `pre-conditions` | List of conditions that must hold before the AC can be tested | +| `not_in_scope` | Explicit statement of what is excluded from this AC | + +## Gap Finder modes + +**HEALING** — triggered when living doc has drifted from the codebase: +- Detect stale entities (code deleted, AC never implemented) +- Set `DEPRECATED` state on confirmed stale entities +- Fix broken traceability links: US ↔ Feature ↔ Functionality +- Update `version` fields where incremented +- Remove `pre-conditions` that reference deleted flows +- Does NOT repair PageObject selectors or step definition bindings → `@bdd-copilot` + +**PLAN** — triggered by PO descriptions without existing code: +- Draft ACs from plain-language descriptions +- Present 
draft for confirmation before creating +- Create confirmed entities in `PLANNED` state only + +## Cross-agent HEALING boundary + +This agent heals the **catalog layer** (entities, ACs, traceability links). +`@bdd-copilot` heals the **automation layer** (PageObjects, step definitions, feature files). +Do not cross this boundary. + +> `@bdd-copilot` is the expected cooperating agent for the automation layer. It is deployed separately — if it is not yet available in this repository, hand-off notes should be left as TODO comments for a future BDD session. + +## Skills + +| Skill | Intent | Path | +|---|---|---| +| `living-doc-create-user-story` | Create a new User Story with business-level ACs | `skills/living-doc-create-user-story/SKILL.md` | +| `living-doc-create-feature` | Document a system surface (screen, API, service) | `skills/living-doc-create-feature/SKILL.md` | +| `living-doc-create-functionality` | Define an atomic, testable behaviour | `skills/living-doc-create-functionality/SKILL.md` | +| `living-doc-update` | Amend or deprecate existing entities | `skills/living-doc-update/SKILL.md` | +| `living-doc-impact-analysis` | Trace which entities a code change affects | `skills/living-doc-impact-analysis/SKILL.md` | +| `living-doc-gap-finder` | Find undocumented behaviours and orphan tests | `skills/living-doc-gap-finder/SKILL.md` | + +## Handoff + +**Inbound:** `@bdd-copilot` hands a surface list after Phase 1 exploration. Load it, then create the corresponding Feature and User Story entities. + +**Outbound:** When US and ACs are confirmed and in `ACTIVE` (or `PLANNED`) state, complete with: + +> "US and ACs are ready. Call @bdd-copilot to generate scenarios." diff --git a/README.md b/README.md index 5e995c4..6dca80c 100644 --- a/README.md +++ b/README.md @@ -77,8 +77,22 @@ its purpose, trigger phrases, and full instructions. 
| Skill | Description | |------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------| +| **[living-doc-create-user-story](./skills/living-doc-create-user-story/)** | Create a well-formed User Story with business-level Acceptance Criteria that are traceable, testable, and E2E-ready. | +| **[living-doc-create-feature](./skills/living-doc-create-feature/)** | Document a system surface (UI screen, API endpoint, service) as a Feature entity with ownership and traceability links. | +| **[living-doc-create-functionality](./skills/living-doc-create-functionality/)** | Define an atomic, testable behaviour (Functionality) with AC designed for fast unit or integration tests. | +| **[living-doc-update](./skills/living-doc-update/)** | Amend or deprecate existing User Story, Feature, or Functionality entities — add ACs, change status, update ownership. | +| **[living-doc-impact-analysis](./skills/living-doc-impact-analysis/)** | Trace which Features, Functionalities, User Stories, and Gherkin scenarios are affected by a code change or PR. | +| **[living-doc-gap-finder](./skills/living-doc-gap-finder/)** | Identify undocumented behaviours, orphan tests, and untested ACs. Shared by `@living-doc-copilot` and `@bdd-copilot`. | | **[token-saving](./skills/token-saving/)** | Always-active response discipline — enforces brevity, no filler openers or closers, structured output, and a What/Why/How footer on code responses. Suspends on explicit "full detail" requests. | +## Agent Roster + +Agents are pre-configured AI personas that orchestrate multiple skills for a specific engineering phase. Agent files live in **[.github/agents/](./.github/agents/)**. 
+ +| Agent | Description | +|---|---| +| **[@living-doc-copilot](./.github/agents/living-doc-copilot.agent.md)** | Creates and maintains the living documentation catalog: User Stories, Features, Functionalities, AC updates, impact analysis, gap finding. | + ## Finding More Skills Before building a new skill, check whether one already exists: diff --git a/docs/README.md b/docs/README.md index e60899a..f95207b 100644 --- a/docs/README.md +++ b/docs/README.md @@ -2,6 +2,12 @@ Navigation hub for all guides in this repository. Browse by category below. +## Contents + +- [Setup & Repository Guides](#setup--repository-guides) +- [Skill Guides](#skill-guides) +- [Agent Guides](#agent-guides) + ## Setup & Repository Guides | Guide | Description | @@ -17,6 +23,12 @@ Navigation hub for all guides in this repository. Browse by category below. |-------------------------------------|------------------------------------------------------------------------------------| | [Token Saving](./token-saving.md) | Keeping AI responses concise — how the token-saving skill works and when it applies | +## Agent Guides + +| Guide | Description | +|-----------------------------------------------|------------------------------------------------------------------------------------------| +| [Living Doc Copilot](./living-doc-copilot.md) | How the living-doc-copilot agent works, its scope, modes, and how to trigger it | + > **Keep this index up to date.** When you add a new guide, add a row to the appropriate table above. See also the [main README](../README.md) for the skill catalog, scope, and FAQ. diff --git a/docs/getting-started.md b/docs/getting-started.md index 2c9b085..a9cd7e4 100644 --- a/docs/getting-started.md +++ b/docs/getting-started.md @@ -142,6 +142,51 @@ your-project-repo/ └── SKILL.md ``` +## Install Agents + +Agents are pre-configured AI personas that orchestrate multiple skills for a specific engineering phase. 
Unlike skills (which are auto-activated by description matching), agents are invoked explicitly by name in Copilot Chat. + +### How agents differ from skills + +| | Skills | Agents | +|---|---|---| +| Activation | Auto — triggered when your prompt matches the description | Manual — you @-mention the agent name | +| Scope | Any compatible tool (Copilot, Claude, Cursor) | GitHub Copilot Chat (VS Code) | +| Install location | `~/.agents/skills/` or `.github/skills/` | `.github/agents/` inside your project repo | + +### Install an agent into your project + +Copy the agent file into your project's `.github/agents/` directory: + +```bash +# One-time setup +mkdir -p .github/agents + +# Copy a specific agent +cp path/to/agentic-toolkit/.github/agents/living-doc-copilot.agent.md .github/agents/ +``` + +Or clone the toolkit and copy all agents: + +```bash +git clone https://github.com/AbsaOSS/agentic-toolkit.git +cp agentic-toolkit/.github/agents/*.agent.md .github/agents/ +``` + +Commit the `.github/agents/` directory to share the agents with your team. + +### Use an agent in Copilot Chat + +Open Copilot Chat in VS Code and type `@` followed by the agent name: + +``` +@living-doc-copilot create user story for the login feature +@living-doc-copilot living doc gaps +@living-doc-copilot HEALING mode +``` + +The agent loads its skills on demand and follows its defined scope. See the [Agent Roster](../README.md#agent-roster) for the full list of available agents and their guides. + ## Troubleshooting Running into issues? See **[docs/troubleshooting.md](./troubleshooting.md)** guide. diff --git a/docs/living-doc-copilot.md b/docs/living-doc-copilot.md new file mode 100644 index 0000000..8762c0f --- /dev/null +++ b/docs/living-doc-copilot.md @@ -0,0 +1,125 @@ +# Living Doc Copilot Agent + +`@living-doc-copilot` is the requirements layer agent. It owns the living documentation catalog — creating, updating, healing, and planning entities. It does not write code or test files. 
+ +--- + +## What it does + +| Task | When to use | +|---|---| +| Create User Story / Feature / Functionality | Documenting new business requirements or system surfaces | +| Add or update Acceptance Criteria | After a sprint review, new requirement, or AC priority change | +| Deprecate entities | Code deleted, feature removed, or superseded by a new entity | +| Promote `PLANNED` → `ACTIVE` | After implementation is confirmed | +| Impact analysis | Before merging a PR that touches business logic | +| Gap finding — HEALING mode | Catalog has drifted: orphan tests, stale ACs, broken traceability | +| Gap finding — PLAN mode | PO has descriptions but no code exists yet | + +--- + +## How to trigger it + +``` +create user story for X +document feature — login screen +update AC on US-42 +deprecate the payment-gateway functionality +mark US-17 as ready +what does this change affect? +living doc gaps +HEALING mode +PLAN mode +living doc copilot +``` + +--- + +## Before you start — project setup + +On first use in a project, tell the agent how your living documentation is structured. The agent calls this a **Storage Profile** and uses it to apply the correct field names, AC block layout, and entity templates for your project. + +Examples of what to describe: + +| What to tell the agent | Example | +|---|---| +| Where entities are stored | `docs/living-doc/` as YAML files, or ADO work items, or Confluence pages | +| Entity fields | `id`, `title`, `state`, `acs` — and what each is called in your project | +| AC block structure | Inline fields under each AC, nested list, or table | +| State vocabulary | `PLANNED` / `ACTIVE` / `DEPRECATED` or custom terms your project uses | + +The agent will ask this question automatically at session start. You can also state it upfront before any command: + +``` +Our living doc is stored as YAML files in docs/living-doc/. +User Stories have: id, title, state, acs (list). +Each AC has: id, text, state, version, pre-conditions, not_in_scope. 
+``` + +> If the Storage Profile is incomplete, the agent will ask one targeted follow-up before creating or updating anything. + +--- + +## Modes + +### HEALING mode + +Repairs catalog drift. Triggers when the living doc has fallen behind the codebase: +- Sets `DEPRECATED` state on entities whose code no longer exists +- Fixes broken traceability links (US ↔ Feature ↔ Functionality) +- Updates `version` fields and removes stale `pre-conditions` +- Does **not** repair PageObject selectors or step definitions → `@bdd-copilot` + +> `@bdd-copilot` is the expected cooperating agent for automation-layer healing. It is deployed separately from this agent — if it is not yet available in your repo, record the automation-layer items as TODO notes for a future BDD session. + +### PLAN mode + +Drafts new ACs from PO descriptions before any code exists: +- Presents draft for confirmation before creating +- Creates in `PLANNED` state only — never `ACTIVE` + +--- + +## AC Metadata + +Every AC created or updated by this agent carries: + +| Field | Values | +|---|---| +| `state` | `PLANNED` / `ACTIVE` / `DEPRECATED` / `IN_REVIEW` | +| `version` | Semantic version string | +| `pre-conditions` | Conditions that must hold before the AC can be tested | +| `not_in_scope` | Explicit statement of what is excluded | + +--- + +## Skills used + +| Skill | Purpose | +|---|---| +| `living-doc-create-user-story` | New User Story with business-level ACs | +| `living-doc-create-feature` | New Feature entity (system surface) | +| `living-doc-create-functionality` | New atomic, testable behaviour | +| `living-doc-update` | Amend or deprecate existing entities | +| `living-doc-impact-analysis` | Trace entities affected by a code change | +| `living-doc-gap-finder` | Find undocumented behaviours and orphan tests. **Shared skill** — used top-down here (missing doc entities) and bottom-up by `@bdd-copilot` (scenario coverage gaps against known ACs). 
| + +--- + +## Handoff + +**Inbound:** `@bdd-copilot` hands a surface list after webapp exploration. Load it and create the corresponding Feature and User Story entities. + +**Outbound:** When entities are confirmed and ready: + +> "US and ACs are ready. Call @bdd-copilot to generate scenarios." + +--- + +## Installation + +```bash +npx skills add https://github.com/AbsaOSS/agentic-toolkit -g +``` + +See [Getting Started](./getting-started.md) for the full install guide. diff --git a/skills/living-doc-create-feature/SKILL.md b/skills/living-doc-create-feature/SKILL.md new file mode 100644 index 0000000..cc9a5ed --- /dev/null +++ b/skills/living-doc-create-feature/SKILL.md @@ -0,0 +1,111 @@ +--- +name: living-doc-create-feature +description: > + Define a system surface (UI screen or API endpoint group) as a Feature entity, enabling impact + analysis and change-management traceability in the living documentation. Activate when + documenting a new screen or API endpoint, mapping system surfaces to User Stories, enumerating + which Functionalities a surface owns, or bootstrapping the structural layer between User Stories + and atomic behaviors. + Triggers on: "document a new feature", "create a feature entity", "new screen documentation", + "document an API endpoint", "feature registry", "what feature owns this", "map user story to + feature", "create feature entity", "system surface documentation", "feature owners", + "feature dependencies". + Does NOT trigger for: creating User Stories (use living-doc-create-user-story), defining atomic + behaviors (use living-doc-create-functionality), scanning a webapp for PageObjects + (use living-doc-pageobject-scan), generating scenarios (use living-doc-scenario-creator). +license: Apache-2.0 +compatibility: GitHub Copilot +--- + +# Living Doc — Create Feature + +> **Key concepts:** Feature, Functionality, User Story, AC — see `../references/living-doc-glossary.md`. 
+ +## Step 1 — Identify the system surface + +Before asking, **scan the conversation context** for a surface name, surface type, and owning team already stated by the user. If all three are present, propose the Feature directly and ask for confirmation rather than re-asking the questions. + +Ask only for what is missing: *What system surface does this Feature represent?* + +Select the surface type: + +| Type | Examples | +|---|---| +| `UI` | A web page, modal, or named screen (e.g. Checkout Page, Login Screen) | +| `API` | A REST/GraphQL endpoint or endpoint group, including a backend service's public API contract (e.g. Orders API, Payment Gateway API) | + +**One surface test abstraction ≈ one Feature** — a UI screen has a PageObject, an API endpoint group has an annotated endpoint method. See `../references/living-doc-glossary.md` for details. + +## Step 2 — Describe purpose and scope + +Ask: +- *What user interactions or system calls does this Feature own?* +- *What does it NOT own?* (helps define boundaries) + +Write a one-to-two sentence purpose statement using business language — not implementation detail. + +## Step 3 — Link to User Stories + +Ask: *Which User Stories rely on this Feature?* + +If unknown at creation time, leave empty `[]` but warn: + +> "An orphaned Feature (not linked to any User Story) contributes no traceable business value. +> Link at least one User Story or mark this as exploratory with status: 'candidate'. +> Orphaned Features are surfaced as gaps in living-doc-gap-finder reports." + +## Step 4 — Enumerate Functionalities + +Ask: *What atomic behaviors (Functionalities) does this Feature implement?* + +Functionalities can be empty at creation time — they are built out as development proceeds. +Add as `"functionalities": ["FUNC-"]` references only when the Functionality has been +formally defined. 
If they are described as informal notes or candidates (not yet registered as +FUNC entries), leave the array as `[]` and add a warning: +> "Candidate Functionalities must be formally defined using **living-doc-create-functionality** +> before being linked here." + +## Step 5 — Identify owners and dependencies + +| Field | What to capture | +|---|---| +| `owners` | Team name(s) or individual(s) responsible for this surface | +| `external_dependencies` | Services or systems this Feature calls (e.g. payment-gateway, order-service) | + +## Step 6 — Output canonical Feature entity + +Output using the project's Storage Profile format. Canonical fields: + +| Field | Required | Value | +|---|---|---| +| entity type | Yes | `Feature` | +| `id` | Yes | `FEAT-` (e.g. `FEAT-001`) | +| `name` | Yes | Noun phrase (e.g. "Login Page") | +| `surface_type` | Yes | `UI` \| `API` | +| `purpose` | Yes | One-to-two sentence description in business language | +| `status` | Yes | `planned` \| `active` \| `deprecated` | +| `user_stories` | Yes | List of `US-` IDs (can be `[]` for new Features) | +| `functionalities` | Yes | List of `FUNC-` IDs (can be `[]` initially) | +| `owners` | Yes | Team name(s) | +| `external_dependencies` | No | Names of services or systems this Feature calls | + +## Anti-patterns to flag + +| Anti-pattern | Warning | +|---|---| +| Feature covers multiple unrelated screens | Split into one Feature per distinct screen | +| Feature name is a verb (e.g. "Process Payment") | Feature names should be nouns — name the surface. Verb phrases describe *what the surface does*, which belongs in a Functionality entity (use **living-doc-create-functionality**) | +| Feature has no User Stories and no Functionalities | Orphan Feature — link or delete | +| Shared utility library documented as a Feature | A third-party dependency is not a system surface. List it under `external_dependencies` in the Features that consume it.
Internal module-level behaviors belong in Functionality entities under the API Feature that owns the service contract. | +| Feature name encodes implementation technology (e.g. "React Login Component", "Spring Payment Controller") | Feature names describe the business surface, not the stack. Use "Login Screen" (UI) or "Payment API" (API) — technology choice is an implementation detail that changes without the surface changing. | +| `surface_type` is `UI` for a backend REST controller or service | A REST endpoint group is an `API` surface. `UI` is reserved for screens a human interacts with directly. Misclassification breaks impact analysis routing between frontend and backend changes. | +| Feature shares a name with an existing Feature | Check for duplicates before creating. Identical names indicate a merge candidate or a scope overlap — clarify the boundary before proceeding. | +| `functionalities` field contains User Story IDs (US-nnn) | `functionalities` takes `FUNC-` IDs. User Stories are linked under `user_stories`, not here. | + +## Out-of-scope routing + +| Request type | Use instead | +|---|---| +| Creating a User Story | **living-doc-create-user-story** | +| Defining an atomic behavior (Functionality) | **living-doc-create-functionality** | +| Scanning a webapp for PageObjects | **living-doc-pageobject-scan** | diff --git a/skills/living-doc-create-feature/evals/evals.json b/skills/living-doc-create-feature/evals/evals.json new file mode 100644 index 0000000..3f69b1e --- /dev/null +++ b/skills/living-doc-create-feature/evals/evals.json @@ -0,0 +1,124 @@ +{ + "skill_name": "living-doc-create-feature", + "evals": [ + { + "id": 1, + "category": "happy-path", + "prompt": "I want to document the Checkout Page as a Feature entity.", + "expected_output": "Agent identifies surface type as UI. Asks for: purpose and scope (what user interactions it owns), linked User Stories, known Functionalities, owners, external dependencies. 
Outputs a canonical Feature JSON with id=FEAT-checkout, surface_type=UI, at least one US link, at least two Functionality references, owners, and external_dependencies including payment-gateway and order-service.", + "files": [], + "expectations": [ + "Identifies surface_type as UI for a web page", + "Asks what user interactions the screen owns", + "Asks for User Story links", + "Asks for Functionalities owned by this Feature", + "Asks for owners and external dependencies", + "Outputs valid canonical Feature JSON with all fields populated", + "Notes: 1 Feature ≈ 1 PageObject" + ] + }, + { + "id": 2, + "category": "happy-path", + "prompt": "I want to document the Orders API as a Feature.", + "expected_output": "Agent identifies surface_type as API. Asks for endpoint group description, linked User Stories, owned Functionalities, team owners, and external dependencies (e.g. order-db, notification-service). Outputs Feature JSON with id=FEAT-orders-api, surface_type=API.", + "files": [], + "expectations": [ + "surface_type is API", + "Purpose describes the API endpoint group in business terms", + "Outputs valid canonical Feature JSON" + ] + }, + { + "id": 3, + "category": "regression", + "prompt": "My Feature entity has no User Stories linked and no Functionalities. Is that OK?", + "expected_output": "Agent warns: an orphaned Feature with no User Stories and no Functionalities contributes no traceable business value. Either link to at least one User Story (or flag as 'candidate' status if it is exploratory), or delete it if it is no longer relevant. 
Notes that orphan Features will appear in living-doc-gap-finder reports as a gap.", + "files": [], + "expectations": [ + "Warns about orphaned Feature", + "Suggests linking to at least one User Story or setting status to 'candidate'", + "Notes orphan Features appear in gap reports" + ] + }, + { + "id": 4, + "category": "happy-path", + "prompt": "Should I name my Feature 'Process Payment' or 'Payment Page'?", + "expected_output": "Feature names should be nouns — they identify a system surface. 'Payment Page' is correct. 'Process Payment' is a verb phrase that describes a Functionality or an action, not a surface. The naming rule: if it can be a PageObject class name, it's a good Feature name.", + "files": [], + "expectations": [ + "Recommends noun name 'Payment Page'", + "Explains that verb phrases (Process Payment) belong to Functionalities", + "Mentions PageObject class naming as an alignment check" + ] + }, + { + "id": 5, + "category": "negative", + "prompt": "Create a User Story for the checkout capability.", + "expected_output": "User Story creation — routes to living-doc-create-user-story. This skill creates Feature entities (system surfaces), not User Stories.", + "files": [], + "expectations": [ + "Does not create a User Story", + "Routes to living-doc-create-user-story" + ] + }, + { + "id": 6, + "category": "paraphrase", + "prompt": "I want to add the Notification Service to the living doc as a system component. Where do I start?", + "expected_output": "Agent identifies this as a Feature entity creation (system surface). Asks: what type of surface is it (API, Worker, UI)? What User Stories does it enable? What Functionalities does it own? Who are the owners? What are the external dependencies (SMTP relay, template store)? 
Outputs a canonical Feature JSON for FEAT-notification-service with surface_type=Worker or API, at least one User Story link, owners, and external_dependencies.", + "files": [], + "expectations": [ + "Identifies this as a Feature creation request despite 'system component' phrasing", + "Asks for surface_type (Worker/API for a notification service)", + "Asks for User Story links, owners, and external dependencies", + "Outputs valid canonical Feature JSON" + ] + }, + { + "id": 7, + "category": "edge-case", + "prompt": "We have a shared utility library used by three different services. Should it be documented as a Feature entity?", + "expected_output": "A shared utility library is not a Feature — it is not a system surface with its own UI, API, or event stream. Document it as an external_dependency in the Feature entities of the services that consume it. If the library is substantial enough to warrant its own living doc entry, create it as a Feature with surface_type=Library and explicitly mark it as a shared internal dependency. Note: Features map 1:1 to deployable/distinct surfaces — shared libraries are infrastructure, not surfaces.", + "files": [], + "expectations": [ + "Advises against creating a Feature for a shared utility library by default", + "Recommends listing it as external_dependency in consumer Feature entities", + "Notes the surface_type=Library option for substantial libraries", + "Explains the 1:1 Feature-to-surface mapping rule" + ] + }, + { + "id": 8, + "category": "output-format", + "prompt": "Create a Feature entity for the 'User Profile' screen in our banking app. It is owned by team-identity and depends on the customer-service API.", + "expected_output": "The output contains a single fenced ```json code block with a valid Feature entity. The JSON object includes all required fields: type, id, name, surface_type, purpose, status, user_stories, functionalities, owners, external_dependencies. The id field follows the FEAT- convention. 
The surface_type value is one of: UI, API, Service, Module. No prose appears inside the JSON code block.", + "files": [], + "expectations": [ + "Single fenced ```json code block", + "All required fields present: type, id, name, surface_type, purpose, status, user_stories, functionalities, owners, external_dependencies", + "id follows FEAT- convention", + "surface_type is one of UI/API/Service/Module", + "No prose inside the code block" + ] + }, + { + "id": 9, + "category": "file-based", + "description": "Create a Feature entity from rough notes provided as a markdown file.", + "prompt": "Read evals/files/raw-feature-notes.md and produce a canonical Feature entity JSON from the notes.", + "expected_output": "Agent extracts the surface type, purpose, owners, and dependencies from the notes and produces a Feature entity JSON in a fenced ```json block. The id follows FEAT- convention derived from the feature name in the notes. The surface_type, owners, and external_dependencies fields are populated from the file content. user_stories and functionalities are empty arrays with a warning that they must be linked.", + "files": [ + "evals/files/raw-feature-notes.md" + ], + "expectations": [ + "Feature entity JSON in fenced ```json block", + "id follows FEAT- from the notes", + "surface_type, owners, external_dependencies populated from file", + "user_stories and functionalities are empty arrays with a linking warning" + ] + } + ] +} diff --git a/skills/living-doc-create-feature/evals/files/raw-feature-notes.md b/skills/living-doc-create-feature/evals/files/raw-feature-notes.md new file mode 100644 index 0000000..9341b3e --- /dev/null +++ b/skills/living-doc-create-feature/evals/files/raw-feature-notes.md @@ -0,0 +1,33 @@ +# Raw Feature Notes — Notifications Centre +# Used by: living-doc-create-feature file-based eval +# +# These are rough notes from a discovery session. The agent must convert +# them into a canonical Feature entity JSON. + +## What is it? 
+A screen inside the mobile banking app where customers can see all their +recent alerts (balance updates, payment confirmations, security notices). +The screen is called the "Notifications Centre". + +## Who owns it? +team-notifications (primary owner) +team-security also contributes for security alert types + +## What does it depend on? +- notification-service (backend API that stores and delivers alerts) +- customer-profile-service (to fetch customer preferences for notification types) + +## Surface type +UI — it's a screen in the mobile app + +## Status +In development — expected to go live in Q3 + +## Linked user stories +Not known yet — to be linked during sprint planning + +## Atomic behaviors (functionalities) +- Mark a notification as read +- Filter notifications by type (payments, security, promotions) +- Delete a notification +(These are candidates — not formally defined yet) diff --git a/skills/living-doc-create-feature/evals/fixture-map.md b/skills/living-doc-create-feature/evals/fixture-map.md new file mode 100644 index 0000000..9c0bcd8 --- /dev/null +++ b/skills/living-doc-create-feature/evals/fixture-map.md @@ -0,0 +1,27 @@ +# Fixture Map — living-doc-create-feature + +## Fixture files + +One fixture file: `evals/files/raw-feature-notes.md`, used by the file-based eval (ID 9). All other evals are +conversational; the skill provides procedural guidance for structuring a Feature entity from user-provided answers.
+ +## Eval to fixture mapping + +| Eval ID | Category | Fixture file(s) | Coverage | +|---|---|---|---| +| 1 | happy-path | _(none — conversational)_ | UI surface: Checkout Page — full elicitation workflow | +| 2 | happy-path | _(none)_ | API surface: Orders API — surface type identification | +| 3 | regression | _(none)_ | Orphan Feature warning (no User Stories, no Functionalities) | +| 4 | happy-path | _(none)_ | Anti-pattern: verb-phrase Feature name (Process Payment) | +| 5 | negative | _(none)_ | Routing: User Story creation → living-doc-create-user-story | + +## Trigger eval summary + +14 entries: 10 `should_trigger=true`, 4 `should_trigger=false` + +| Routes to | Query count | +|---|---| +| living-doc-create-user-story | 1 | +| living-doc-create-functionality | 1 | +| living-doc-pageobject-scan | 1 | +| living-doc-scenario-creator | 1 | diff --git a/skills/living-doc-create-feature/evals/trigger-eval.json b/skills/living-doc-create-feature/evals/trigger-eval.json new file mode 100644 index 0000000..da51b6d --- /dev/null +++ b/skills/living-doc-create-feature/evals/trigger-eval.json @@ -0,0 +1,16 @@ +[ + {"id": 1, "query": "Document the checkout page as a Feature entity", "should_trigger": true, "reason": "Explicit 'document feature entity' trigger phrase"}, + {"id": 2, "query": "Create a Feature entity for the Orders API", "should_trigger": true, "reason": "Explicit 'create a feature entity' trigger keyword"}, + {"id": 3, "query": "New screen documentation for the account preferences page", "should_trigger": true, "reason": "'new screen documentation' trigger phrase"}, + {"id": 4, "query": "Document a new API endpoint — the payment initiation endpoint", "should_trigger": true, "reason": "'document an API endpoint' trigger phrase"}, + {"id": 5, "query": "Update the feature registry with the new notification service", "should_trigger": true, "reason": "'feature registry' trigger keyword"}, + {"id": 6, "query": "What feature owns the checkout screen?", 
"should_trigger": true, "reason": "'what feature owns this' trigger phrase"}, + {"id": 7, "query": "Map User Story US-007 to its Feature", "should_trigger": true, "reason": "'map user story to feature' trigger phrase"}, + {"id": 8, "query": "I need to document the discount engine as a system surface", "should_trigger": true, "reason": "Documenting a system surface — Feature creation workflow"}, + {"id": 9, "query": "Create a feature entity for the authentication module", "should_trigger": true, "reason": "Explicit 'create feature entity' trigger"}, + {"id": 10, "query": "What are the owners and dependencies for the checkout feature?", "should_trigger": true, "reason": "Asking about Feature properties — skill can populate or validate"}, + {"id": 11, "query": "Create a user story for the checkout capability", "should_trigger": false, "reason": "User Story creation — routes to living-doc-create-user-story"}, + {"id": 12, "query": "Document the atomic behavior: validate cart before checkout", "should_trigger": false, "reason": "Atomic behavior — routes to living-doc-create-functionality"}, + {"id": 13, "query": "Scan the checkout page for PageObjects", "should_trigger": false, "reason": "UI scan — routes to living-doc-pageobject-scan"}, + {"id": 14, "query": "Generate scenarios for the checkout User Story", "should_trigger": false, "reason": "Scenario creation — routes to living-doc-scenario-creator"} +] diff --git a/skills/living-doc-create-functionality/SKILL.md b/skills/living-doc-create-functionality/SKILL.md new file mode 100644 index 0000000..7a35e95 --- /dev/null +++ b/skills/living-doc-create-functionality/SKILL.md @@ -0,0 +1,133 @@ +--- +name: living-doc-create-functionality +description: > + Define an atomic, testable behavior (Functionality) with Functionality-level Acceptance Criteria + designed to be validated by fast unit or integration tests. 
Activate when documenting an atomic + behavior, component function, or business rule; writing Functionality-level AC; creating the + granular test anchor for a Feature; or identifying reuse candidates across User Stories. + Triggers on: "create a functionality", "document an atomic behavior", "functionality AC", + "unit-testable behavior", "define component behavior", "atomic acceptance criteria", + "document a business rule", "create a functionality entity", "functionality acceptance criteria". + Does NOT trigger for: end-to-end User Stories (use living-doc-create-user-story), system + surface documentation (use living-doc-create-feature), BDD scenario generation + (use living-doc-scenario-creator). + Pairs with living-doc-create-feature and living-doc-scenario-creator. +license: Apache-2.0 +compatibility: GitHub Copilot +--- + +# Living Doc — Create Functionality + +> **Key concepts:** Feature, Functionality, User Story, AC — see `../references/living-doc-glossary.md`. + +## Step 1 — Elicit the behavior + +Before asking, **scan the conversation context** for a behavior phrase and parent Feature already stated by the user. If both are present, form the Functionality name directly and ask for confirmation rather than re-asking the questions. + +Ask only for what is missing: *What is the atomic behavior to document?* + +Express as a **verb phrase** — a single, focused responsibility. The Functionality name follows +the pattern: `` (e.g. "Login Page – Validate Password Strength"). 
+ +``` +✅ "Calculate discount for a cart item given the customer's membership tier" +✅ "Validate that an order quantity is within the allowed range" +✅ "Raise a CartEmptyError when checkout is attempted on an empty cart" + +❌ "Handle the checkout process" (too broad — split into multiple Functionalities) +❌ "The payment page" (that is a Feature, not a Functionality) +``` + +## Step 2 — Identify the parent Feature + +Ask: *Which Feature (system surface) owns this behavior?* + +A Functionality must belong to at least one Feature. If the Feature does not yet exist, suggest +creating it with `living-doc-create-feature` first. + + + +## Step 3 — Elicit Functionality-level Acceptance Criteria + +Functionality ACs describe atomic inputs → outputs. They are: +- **Atomic**: one input condition, one output or side effect per AC +- **Fast-testable**: designed for verification by unit or integration test. E2E tests *can* exercise the same behavior, but they are slow and expensive — they belong in a separate system-test tier, not the fast or regression suite. +- **Unambiguous**: exact error codes, exact output values where relevant + +Use the canonical AC format (see `../references/living-doc-glossary.md`): + +``` +AC:FUNC-- (v – Planned) + – + – : value1, value2, ... ← only when two or more values vary + – Rationale: ← optional +``` + +**Completeness checklist — prompt for each:** + +| Category | Prompt | +|---|---| +| Empty / null input | "What happens when the input is null or empty?" | +| Boundary values | "What happens at the minimum and maximum allowed values?" | +| Invalid type / format | "What error is raised for invalid format, and what is the error code?" | +| Concurrent access | "Is there a race condition? Should this behavior be idempotent?" | +| All error codes | "Are all error codes documented (not just the generic 'error occurred')?" | + +Warn if only happy-path ACs are present — same as for User Stories. 
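
The completeness checklist in Step 3 can be sketched as a small coverage check. This is a minimal illustration in Python and not part of the skill itself: the `AcceptanceCriterion` class, its `category` tag, and the short keys in `CHECKLIST` are hypothetical names introduced here; only the five category labels come from the checklist table.

```python
from dataclasses import dataclass

# Labels copied from the Step 3 checklist table. The short keys are
# hypothetical tags invented for this sketch.
CHECKLIST = {
    "empty_input": "Empty / null input",
    "boundary": "Boundary values",
    "invalid_format": "Invalid type / format",
    "concurrency": "Concurrent access",
    "error_codes": "All error codes",
}


@dataclass
class AcceptanceCriterion:
    """Assumed in-memory form of one AC; not the canonical AC format."""
    ac_id: str
    text: str
    category: str  # a CHECKLIST key, or "happy_path"


def missing_categories(acs):
    """Return the checklist labels that no AC in `acs` covers."""
    covered = {ac.category for ac in acs}
    return [label for key, label in CHECKLIST.items() if key not in covered]


def only_happy_path(acs):
    """True when ACs exist but none of them covers a checklist category."""
    return bool(acs) and all(ac.category not in CHECKLIST for ac in acs)


if __name__ == "__main__":
    acs = [
        AcceptanceCriterion("AC-1", "Returns VALID for a cart with one in-stock item", "happy_path"),
        AcceptanceCriterion("AC-2", "Raises CART_EMPTY when the cart is empty", "empty_input"),
    ]
    print("Missing checklist categories:", missing_categories(acs))
    print("Only happy-path ACs:", only_happy_path(acs))
```

Tagging each AC with a single category keeps the check trivial. A real project would more likely derive coverage from the AC text or from the Storage Profile, which this sketch does not attempt.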
+ +## Step 4 — Flag reuse candidates + +Before creating, check whether an identical behavior already exists under any Feature. **Compare ACs, not names** — the same verb phrase in a different Feature context often produces a legitimately different contract (e.g. "Validate Amount" on a Payment Feature vs. a Transfer Feature may enforce different limits and error codes and must remain separate). + +If the ACs are identical or near-identical across two Features: + +> "This behavior has the same contract as [FUNC-nnn] under [parent Feature]. Consider whether +> both are genuinely the same behavior in different contexts, or whether one can be reused. +> If the contracts are truly identical, consolidating avoids a maintenance burden — a contract +> change must otherwise be applied in every copy, increasing the risk of divergence." + +If contextually distinct despite similar names, create a new Functionality and note the related one for future reviewers. + +## Step 5 — Output canonical Functionality entity + +Output using the project's Storage Profile format (defined per project — see `../../docs/living-doc-copilot.md`). Canonical fields (see `../references/living-doc-glossary.md` for AC format details): + +| Field | Required | Value | +|---|---|---| +| entity type | Yes | `Functionality` | +| `id` | Yes | `FUNC-` (e.g. `FUNC-001`) | +| `name` | Yes | `` (e.g. 
"Login Page – Validate Password Strength") | +| `parent_feature` | Yes | `FEAT-` ID of the owning Feature | +| `status` | Yes | `planned` \| `active` \| `deprecated` | +| `acceptance_criteria` | Yes | List of ACs in the format defined in `../references/living-doc-glossary.md` | + +## Distinguishing Functionality ACs from User Story ACs + +| Dimension | User Story AC | Functionality AC | +|---|---|---| +| Perspective | End user observing outcomes | Developer / component behaviour | +| Scope | Full E2E flow | Single function or method | +| Example | "Order is confirmed and email is sent" | "Returns the discounted total when a valid membership tier is applied" | + +If an AC written here is outcome-based from a user's perspective, it belongs in the User Story — +redirect to `living-doc-create-user-story`. + +## Anti-patterns to flag + +| Anti-pattern | Warning | +|---|---| +| Functionality name is a noun (e.g. "Password Validation") | Names must be verb phrases expressing the atomic behavior — e.g. "Validate Password Strength". A noun names a concept; a verb phrase names what the code does. | +| Functionality AC describes a full user journey (e.g. "User logs in and sees their dashboard") | That is a User Story AC — redirect to **living-doc-create-user-story**. Functionality ACs describe a single function's input → output or side effect. | +| Functionality has only happy-path ACs | Edge cases (null input, boundary values, error codes) are missing. Run through the completeness checklist in Step 3 before confirming. Untested error paths are the most common source of production incidents. | +| AC says "returns error" without specifying the type or code | Specify the error code using the canonical AC format: `– Raises {error code} when …` with `– Error code: CODE_VALUE`. Without a named code, the AC cannot be verified against a specific error contract. | +| AC uses `{placeholder}` for a single fixed value | Write the value inline. 
`{placeholder}` is only justified when two or more values vary across AC variants. | +| Two Functionalities have identical or near-identical ACs | Duplicate ACs create a maintenance burden. Consolidate into one shared Functionality owned by the appropriate parent Feature. | +| Functionality has no parent Feature | A Functionality without a parent Feature is untraceable — it cannot appear in impact analysis. Create or identify the parent Feature first. | + +## Out-of-scope redirects + +| Request type | Correct skill | +|---|---| +| "Create a User Story" | `living-doc-create-user-story` — this skill documents atomic behaviors, not end-to-end User Stories | +| "Create a Feature entity" | `living-doc-create-feature` — a Feature is a system surface, not an atomic behavior | +| "Generate BDD scenarios" | `living-doc-scenario-creator` — scenario generation requires a User Story with ACs | diff --git a/skills/living-doc-create-functionality/evals/evals.json b/skills/living-doc-create-functionality/evals/evals.json new file mode 100644 index 0000000..23c37a1 --- /dev/null +++ b/skills/living-doc-create-functionality/evals/evals.json @@ -0,0 +1,112 @@ +{ + "skill_name": "living-doc-create-functionality", + "evals": [ + { + "id": 1, + "category": "happy-path", + "prompt": "I want to document the atomic behavior: validate that a cart contains at least one in-stock item before checkout.", + "expected_output": "Agent forms verb-phrase name: 'Validate cart contains at least one in-stock item'. Links to FEAT-checkout. Runs completeness checklist: asks about null/empty cart, out-of-stock items, partially in-stock carts, zero-quantity items. Produces at least 4 ACs: empty cart → CART_EMPTY error, all items out of stock → OUT_OF_STOCK error, valid cart → VALID, zero-quantity item → INVALID_QUANTITY. All test_type=unit. 
Outputs canonical Functionality JSON.", + "files": [], + "expectations": [ + "Name is a verb phrase (not a noun)", + "Asks about empty cart, out-of-stock, and boundary conditions", + "Runs the completeness checklist", + "Produces at least 3 ACs covering error codes explicitly", + "All ACs have test_type=unit", + "Outputs valid canonical Functionality JSON" + ] + }, + { + "id": 2, + "category": "happy-path", + "prompt": "Review this Functionality and tell me what's wrong with it. File: evals/files/broad-functionality.json", + "expected_output": "Agent identifies: Blocker — name 'Handle checkout' is too broad (verb phrase should identify a single atomic behavior); Blocker — both ACs are vague ('works correctly', 'handles it appropriately') — not testable; Important — no error codes specified in the error AC; Important — no boundary value conditions. Proposes splitting into multiple focused Functionalities and rewriting ACs with exact When/Then conditions and error codes.", + "files": [ + "evals/files/broad-functionality.json" + ], + "expectations": [ + "Flags 'Handle checkout' as too broad — not atomic", + "Flags 'works correctly' as non-testable — no exact output", + "Flags 'handles it appropriately' as vague — no error code", + "Proposes splitting into at least 2 focused Functionalities", + "Provides rewritten AC examples with exact outputs and error codes" + ] + }, + { + "id": 3, + "category": "happy-path", + "prompt": "Should this AC have test_type=unit or test_type=integration? The behavior is: validate that the customer's email is not already registered.", + "expected_output": "This requires a database read to check existing registrations — it cannot be tested in pure isolation. test_type=integration. Explains: unit tests mock the DB; an integration test uses a real or test DB to verify the uniqueness constraint actually works. 
Notes that if the validation logic is a pure function that takes a list of existing emails as input, it could be unit-testable — in that case unit is preferred.", + "files": [], + "expectations": [ + "Recommends integration because it requires a real DB read", + "Explains the unit vs integration distinction", + "Notes the pure-function alternative that would allow unit testing" + ] + }, + { + "id": 4, + "category": "regression", + "prompt": "I noticed this AC appears in both US-001 and US-007: 'When the cart is empty, validation returns INVALID with code CART_EMPTY'. Should I duplicate it?", + "expected_output": "No. This is a reuse candidate. Create a single Functionality with this AC and link both US-001 and US-007 to it via the user_stories array. Duplicating ACs across User Stories creates maintenance burden — a change to the behavior must be updated in multiple places.", + "files": [], + "expectations": [ + "Flags as a reuse candidate — not a duplication", + "Advises creating one Functionality and linking both USs via user_stories array", + "Explains the maintenance risk of duplication" + ] + }, + { + "id": 5, + "category": "negative", + "prompt": "Create a User Story for the order placement capability.", + "expected_output": "User Story creation — routes to living-doc-create-user-story. This skill documents atomic behaviors (Functionalities), not end-to-end User Stories.", + "files": [], + "expectations": [ + "Does not create a User Story", + "Routes to living-doc-create-user-story" + ] + }, + { + "id": 6, + "category": "paraphrase", + "prompt": "I need to capture the business rule: a gold member gets a 20% discount on all orders over £50. How do I document this in the living doc?", + "expected_output": "Agent identifies this as a Functionality entity. Forms verb-phrase name: 'Apply gold member discount on qualifying orders'. 
Runs completeness checklist: asks about non-gold members, orders exactly £50 (boundary), orders under £50, combination with promo codes. Produces ACs: order>£50 and gold member → 20% discount applied; order<=£50 → no discount; non-gold member → no discount; combined with promo → define stacking rule. All ACs test_type=unit. Outputs canonical Functionality JSON.", + "files": [], + "expectations": [ + "Identifies as Functionality (atomic business rule), not User Story", + "Verb-phrase name captures the business rule precisely", + "Boundary value at £50 is included in the completeness checklist", + "At least 3 ACs with explicit expected outcomes", + "Outputs valid canonical Functionality JSON" + ] + }, + { + "id": 7, + "category": "edge-case", + "prompt": "A Functionality has 12 ACs. Is that a sign of a design problem?", + "expected_output": "Yes. A Functionality with 12 ACs is almost certainly not atomic — it is covering multiple distinct behaviors. An atomic Functionality should have 3-7 ACs covering the happy path, key error codes, and important boundary values. With 12, review whether the ACs can be grouped into 2-3 distinct behaviors that each warrant their own Functionality entity. Split the Functionality by behavioral concern. The goal is one coherent unit-testable behavior per entity.", + "files": [], + "expectations": [ + "Flags 12 ACs as a sign of non-atomic scope", + "Recommends 3-7 ACs as the target range", + "Advises splitting into 2-3 focused Functionalities", + "Explains the atomic unit-testable behavior criterion" + ] + }, + { + "id": 8, + "category": "output-format", + "prompt": "Document the atomic behavior: 'The system deducts the applied voucher discount from the basket total before tax is calculated.'", + "expected_output": "The output contains a fenced ```json code block with a Functionality entity. Required fields: type ('Functionality'), id (FUNC-), name, description, feature_id, acceptance_criteria (array), test_coverage (array), status. 
Acceptance criteria items are phrased in plain business English. No implementation details appear in the JSON values.", + "files": [], + "expectations": [ + "Single fenced ```json code block", + "type field is 'Functionality'", + "id follows FUNC- convention", + "acceptance_criteria is an array of business-language strings", + "No implementation details in JSON values" + ] + } + ] +} diff --git a/skills/living-doc-create-functionality/evals/files/broad-functionality.json b/skills/living-doc-create-functionality/evals/files/broad-functionality.json new file mode 100644 index 0000000..c61971c --- /dev/null +++ b/skills/living-doc-create-functionality/evals/files/broad-functionality.json @@ -0,0 +1,25 @@ +{ + "type": "Functionality", + "id": "FUNC-checkout-draft", + "name": "Handle checkout", + "parent_feature": "FEAT-checkout", + "user_stories": ["US-001"], + "acceptance_criteria": [ + { + "id": "FUNC-checkout-draft-AC-1", + "description": "Checkout works correctly", + "when": "the customer goes through the checkout process", + "then": "the order is placed and everything works", + "priority": "critical", + "test_type": "unit" + }, + { + "id": "FUNC-checkout-draft-AC-2", + "description": "Error handling works", + "when": "something goes wrong during checkout", + "then": "the system handles it appropriately", + "priority": "high", + "test_type": "unit" + } + ] +} diff --git a/skills/living-doc-create-functionality/evals/fixture-map.md b/skills/living-doc-create-functionality/evals/fixture-map.md new file mode 100644 index 0000000..bdd06ec --- /dev/null +++ b/skills/living-doc-create-functionality/evals/fixture-map.md @@ -0,0 +1,28 @@ +# Fixture Map — living-doc-create-functionality + +## Fixture files + +| File | Description | +|---|---| +| `evals/files/broad-functionality.json` | Draft Functionality with over-broad name ("Handle checkout") and vague ACs — tests completeness enforcement | + +## Eval to fixture mapping + +| Eval ID | Category | Fixture file(s) | 
Coverage | +|---|---|---|---| +| 1 | happy-path | _(none — conversational)_ | Full elicitation: cart validation behavior, completeness checklist, atomic ACs, error codes | +| 2 | happy-path | `broad-functionality.json` | Blocker detection: broad name, vague ACs, no error codes | +| 3 | happy-path | _(none)_ | unit vs integration decision for a DB uniqueness check | +| 4 | regression | _(none)_ | Reuse candidate detection: same AC in two User Stories | +| 5 | negative | _(none)_ | Routing: User Story creation → living-doc-create-user-story | + +## Trigger eval summary + +14 entries: 10 `should_trigger=true`, 4 `should_trigger=false` + +| Routes to | Query count | +|---|---| +| living-doc-create-user-story | 1 | +| living-doc-create-feature | 1 | +| living-doc-scenario-creator | 1 | +| living-doc-gap-finder | 1 | diff --git a/skills/living-doc-create-functionality/evals/trigger-eval.json b/skills/living-doc-create-functionality/evals/trigger-eval.json new file mode 100644 index 0000000..e7a6c73 --- /dev/null +++ b/skills/living-doc-create-functionality/evals/trigger-eval.json @@ -0,0 +1,16 @@ +[ + {"id": 1, "query": "Create a functionality for validating cart contents before checkout", "should_trigger": true, "reason": "Explicit 'create a functionality' trigger keyword"}, + {"id": 2, "query": "Document the atomic behavior: apply discount to a cart item", "should_trigger": true, "reason": "'document an atomic behavior' trigger phrase"}, + {"id": 3, "query": "Write Functionality ACs for the discount engine", "should_trigger": true, "reason": "'functionality AC' trigger phrase"}, + {"id": 4, "query": "Define a unit-testable behavior for the coupon validation module", "should_trigger": true, "reason": "'unit-testable behavior' and 'define component behavior' trigger phrases"}, + {"id": 5, "query": "Document the business rule: orders over $100 get free shipping", "should_trigger": true, "reason": "'document a business rule' trigger phrase"}, + {"id": 6, "query": "Create a 
functionality entity for the payment retry logic", "should_trigger": true, "reason": "'create a functionality entity' trigger phrase"}, + {"id": 7, "query": "What ACs should I write for the email validator function?", "should_trigger": true, "reason": "Asking for atomic AC writing — core functionality skill task"}, + {"id": 8, "query": "What test_type should I use for checking DB uniqueness constraints?", "should_trigger": true, "reason": "Deciding unit vs integration — functionality skill task"}, + {"id": 9, "query": "Review this functionality for completeness — it only has a happy path", "should_trigger": true, "reason": "Completeness check of Functionality ACs is a core task"}, + {"id": 10, "query": "I see this AC in both US-001 and US-007 — should I split it out?", "should_trigger": true, "reason": "Reuse candidate identification — a core functionality skill task"}, + {"id": 11, "query": "Create a user story for the checkout capability", "should_trigger": false, "reason": "User Story — routes to living-doc-create-user-story"}, + {"id": 12, "query": "Document the checkout page as a Feature", "should_trigger": false, "reason": "Feature entity — routes to living-doc-create-feature"}, + {"id": 13, "query": "Generate BDD scenarios for US-001", "should_trigger": false, "reason": "Scenario generation — routes to living-doc-scenario-creator"}, + {"id": 14, "query": "Run a gap analysis on the living documentation", "should_trigger": false, "reason": "Gap analysis — routes to living-doc-gap-finder"} +] diff --git a/skills/living-doc-create-user-story/SKILL.md b/skills/living-doc-create-user-story/SKILL.md new file mode 100644 index 0000000..9494bdf --- /dev/null +++ b/skills/living-doc-create-user-story/SKILL.md @@ -0,0 +1,124 @@ +--- +name: living-doc-create-user-story +description: > + Guide the creation of a well-formed User Story (US) with business-level Acceptance Criteria + that are traceable, testable, and E2E-ready. 
Activate when creating a new User Story for a + business capability, eliciting As-a/I-can/so-that narratives, defining US-level Acceptance + Criteria, or validating User Story completeness before handing off to scenario creation. + Triggers on: "create a user story", "new user story for", "write acceptance criteria for", + "document a business requirement", "define US AC", "user story template", "as a user I want", + "elicit requirements", "AC for user story", "US acceptance criteria". + Does NOT trigger for: atomic component behaviors (use living-doc-create-functionality), + documenting system surfaces (use living-doc-create-feature), generating BDD scenarios + (use living-doc-scenario-creator). + Pairs with living-doc-create-functionality and living-doc-scenario-creator. +license: Apache-2.0 +compatibility: GitHub Copilot +--- + +# Living Doc — Create User Story + +> **Key concepts:** Feature, Functionality, User Story, AC — see `../references/living-doc-glossary.md`. + +## Step 1 — Elicit the narrative + +Before asking, **scan the conversation context** for an actor, capability, or business outcome already stated by the user. If all three are present, form the narrative directly and ask for confirmation rather than re-asking the questions. + +Ask only for what is missing: + +1. **Who is the user?** — The actor using the system (a specific role, not "the user") +2. **What do they want to do?** — The capability or action in business terms +3. **Why?** — The business outcome or value delivered + +Form the canonical narrative: + +``` +As a , +I can , +so that . +``` + +**Validation:** +- Actor must be a named role — not "system", "admin", or "the app" +- If the actor given is "the system" or similar, reject it: *"The system is not a valid actor. + Ask: who triggers this action? Who benefits from it? Name that human role."* + System-initiated or background flows do not belong in a User Story — they belong in a + Functionality. 
Redirect to `living-doc-create-functionality` for system-driven behaviors. +- Capability must be an action the user performs — not a technical implementation +- Outcome must describe business value — not system state + +## Step 2 — Establish domain context + +Ask: *Which Feature(s) does this User Story touch?* + +A Feature is a named system surface (UI screen or API endpoint group). If the Feature +has not yet been created as a living doc entity, note it as `[NEW: ]` and suggest creating it +with `living-doc-create-feature` after completing the User Story. + +## Step 3 — Elicit Acceptance Criteria + +Each AC must be: +- **End-to-end** — written from the user's perspective, not the database's +- **Outcome-focused** — "order is confirmed" not "DB record is inserted" +- **Binary** — clear pass/fail; no "should usually" or "typically" +- **Single placeholder** — at most ONE `{placeholder}` per AC statement. If two aspects vary independently, write a separate AC for each. + +Use `{placeholder}` syntax when a value varies, and list the concrete values immediately below: + +``` +AC:US-- (v – Planned) + – + – : value1, value2, ... +``` + +See full AC format and examples in `../references/living-doc-glossary.md`. + +**Completeness check — always ask:** +1. What happens on the happy path? (at least one AC required) +2. What happens when the input is invalid or missing? +3. What happens when a downstream dependency fails? +4. Are there alternative flows (e.g. user is not logged in, item is out of stock)? + +Warn if only happy-path ACs are present: +> "No error or alternative-path ACs were provided. Real systems fail — add at least one +> AC for a failure or edge case before marking this US ready." + +**Warn if an AC reads like a Functionality AC** (too atomic/technical): +> "This AC describes a technical behavior rather than an end-to-end user outcome. +> Consider creating a Functionality entity for this behavior with +> living-doc-create-functionality." 
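Two of the Step 3 rules (at most one `{placeholder}` per AC, and binary wording) are mechanical enough to check automatically. The following is a minimal sketch under stated assumptions, not part of the skill itself: `lint_ac`, `PLACEHOLDER`, and `HEDGES` are hypothetical names, and the hedge list holds only the two phrases quoted above.

```python
import re

# Hypothetical helper, not defined by the skill: checks two Step 3 rules only.
PLACEHOLDER = re.compile(r"\{[^{}]+\}")
HEDGES = ("should usually", "typically")  # the two phrases quoted in Step 3


def lint_ac(statement: str) -> list[str]:
    """Return warnings for one AC statement; an empty list means no issue found."""
    warnings = []
    placeholders = PLACEHOLDER.findall(statement)
    if len(placeholders) > 1:
        warnings.append(
            f"{len(placeholders)} placeholders found; write a separate AC for each"
        )
    lowered = statement.lower()
    for hedge in HEDGES:
        if hedge in lowered:
            warnings.append(f"non-binary wording: '{hedge}'")
    return warnings


if __name__ == "__main__":
    print(lint_ac("Order is confirmed for {payment method}"))
    print(lint_ac("{role} typically sees {message}"))
```

A check like this can only catch surface patterns. Whether an AC is end-to-end and outcome-focused still needs human or agent judgment.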
+ +## Step 4 — Validate and output + +Invariants that must hold before outputting: +- At least one AC exists +- At least one Feature is linked (or flagged as `[NEW]`) +- Status defaults to `planned` +- No open `[TODO]` markers + +Output the User Story using the project's Storage Profile format. Canonical fields: + +| Field | Required | Value | +|---|---|---| +| entity type | Yes | `UserStory` | +| `id` | Yes | `US-` (e.g. `US-001`) | +| `name` | Yes | Short imperative title (e.g. "Customer Login") | +| `status` | Yes | `planned` — default for new entities | +| `as_a` | Yes | Named actor | +| `i_can` | Yes | The capability | +| `so_that` | Yes | Business outcome | +| `features` | Yes | List of `FEAT-` IDs | +| `acceptance_criteria` | Yes | List of ACs in the format defined in `../references/living-doc-glossary.md` | + +## Anti-patterns to flag + +| Anti-pattern | Warning | +|---|---| +| AC says "the system saves to the database" | Technical implementation — restate as user outcome. Provide a rewritten AC: e.g. "When the customer confirms the order, then the order is acknowledged and the customer sees a confirmation message." | +| AC says "unit test passes" | Test is not an AC — describe the behavior, not how it's verified | +| Narrative says "As a system..." | System is not a user — name the human role | +| Same capability described for two different actors | Two actors = two separate User Stories. Different actors have different permissions, audit requirements, and AC sets. Mixing two actor perspectives in one User Story produces ambiguous ACs. Shared Functionalities (e.g. OTP generation, email delivery) can be linked to both User Stories. | +| User Story "I can" clause contains "and" | Multiple capabilities in one User Story — split at each “and”. Each capability has its own failure paths and may touch different Features; bundling them makes ACs ambiguous and traceability impossible. 
| +| AC uses `{placeholder}` for a single value | Placeholder syntax is only justified when two or more values vary. If only one value applies, write it inline. Example: instead of `{error type}: inline validation message`, write `an inline validation message is shown`. | +| AC describes a non-observable outcome | e.g. “a background job processes the record” — the user cannot observe this. Restate as the observable signal (e.g. “the confirmation email arrives within 60 seconds”), or redirect the behavior to a Functionality entity if it is purely technical. | +| AC identifier is missing the version or state | AC format requires `AC:- (v)`. An AC without version or state cannot be traced across releases or marked as deprecated without rewriting its ID. | +| AC behavior already documented in another User Story | Duplicate ACs create a maintenance burden — any change must be applied in every copy. Extract the shared behavior into a Functionality entity and link both User Stories to it. | \ No newline at end of file diff --git a/skills/living-doc-create-user-story/evals/evals.json b/skills/living-doc-create-user-story/evals/evals.json new file mode 100644 index 0000000..401c846 --- /dev/null +++ b/skills/living-doc-create-user-story/evals/evals.json @@ -0,0 +1,117 @@ +{ + "skill_name": "living-doc-create-user-story", + "evals": [ + { + "id": 1, + "category": "happy-path", + "prompt": "I want to create a new User Story for the password reset capability.", + "expected_output": "Agent asks three elicitation questions in sequence: (1) Who is the user? (2) What do they want to do? (3) Why / what business outcome? After answers, forms the As-a/I-can/so-that narrative. Then asks for domain context (which Feature). Then elicits ACs in Given-When-Then. Checks for error and alternative paths (unregistered email, expired token, already-used token). Assigns priorities. 
Outputs canonical User Story JSON with at least 3 ACs (happy path + at least 2 error/alternative).", + "files": [], + "expectations": [ + "Asks actor, capability, and business value as distinct questions before writing narrative", + "Forms As-a/I-can/so-that narrative correctly from answers", + "Asks which Feature(s) this story touches", + "Elicits at least one error-path AC (e.g. unregistered email, expired token)", + "Warns if only happy-path AC is provided", + "Assigns independent priority to each AC", + "Outputs valid canonical UserStory JSON" + ] + }, + { + "id": 2, + "category": "happy-path", + "prompt": "Review this User Story and tell me what's missing. File: evals/files/incomplete-user-story.json", + "expected_output": "Agent identifies: (1) Important — only one happy-path AC provided; no error or alternative path ACs. Prompts for: what happens with an unregistered email? what happens with an expired reset link? what happens if the link is used twice? Proposes three additional ACs covering these gaps. Does not flag the narrative or Feature link as issues.", + "files": [ + "evals/files/incomplete-user-story.json" + ], + "expectations": [ + "Flags that only a happy-path AC exists", + "Prompts for error path: unregistered email address", + "Prompts for error path: expired reset token", + "Prompts for alternative: link already used", + "Does not reject the existing happy-path AC", + "Proposes at least 2 additional ACs with Given-When-Then" + ] + }, + { + "id": 3, + "category": "happy-path", + "prompt": "The actor in my narrative is 'the system'. Is that OK?", + "expected_output": "No. 'The system' is not a valid actor — it is not a human user with a goal. An actor must be a named human role (e.g. 'registered customer', 'support agent', 'finance manager'). System-initiated flows belong in a Functionality or a background process, not a User Story. Ask: who triggers this? 
Who benefits?", + "files": [], + "expectations": [ + "Rejects 'the system' as an actor", + "Explains that actors must be human roles", + "Suggests asking 'who triggers this?' and 'who benefits?'", + "Notes system-initiated flows belong in a Functionality" + ] + }, + { + "id": 4, + "category": "regression", + "prompt": "My AC says: 'When the customer submits the form, the database saves a record to the orders table with status=PENDING.' Is this OK?", + "expected_output": "No. This AC describes a technical implementation detail (database table, status field). It should be restated as a user-observable outcome. Example fix: 'When the customer confirms the order, then the order is acknowledged and the customer sees a confirmation message.' The internal DB state is an implementation concern — not an E2E AC.", + "files": [], + "expectations": [ + "Flags the AC as describing implementation detail, not user outcome", + "Provides a rewritten AC in outcome-focused language", + "Notes that DB state is an implementation concern", + "References the rule: AC must be outcome-focused from the user's perspective" + ] + }, + { + "id": 5, + "category": "negative", + "prompt": "Document the behavior: when discount code is applied to a zero-price item, the discount is silently ignored.", + "expected_output": "This is a Functionality-level behavior (atomic, technical, unit-testable) — routes to living-doc-create-functionality. User Story ACs describe E2E user outcomes, not component-level behaviors.", + "files": [], + "expectations": [ + "Identifies this as a Functionality AC, not a User Story AC", + "Routes to living-doc-create-functionality", + "Explains the distinction: atomic technical behavior vs. E2E user outcome" + ] + }, + { + "id": 6, + "category": "paraphrase", + "prompt": "Help me write a story for a feature where customers can reset their password via SMS instead of email.", + "expected_output": "Agent asks three elicitation questions: (1) Who is the actor? 
(registered customer who has a phone number on file). (2) What do they want to do? (receive a reset code via SMS). (3) Why? (some customers prefer SMS or don't have email access). Forms narrative: As a registered customer, I can reset my password via SMS, so that I can regain access even without email access. Elicits ACs: happy path (valid phone, code sent), unregistered phone, expired code, wrong code, maximum retries.", + "files": [], + "expectations": [ + "Forms As-a/I-can/so-that narrative", + "Elicits at least 3 ACs covering happy path and error paths", + "Asks actor, capability, and business value as distinct questions", + "Outputs valid canonical UserStory JSON" + ] + }, + { + "id": 7, + "category": "edge-case", + "prompt": "The same capability is needed by two different actors: a registered customer can reset their password, and a support agent can trigger a password reset on behalf of a customer. Should these be one User Story or two?", + "expected_output": "These should be two separate User Stories — different actors have different permissions, audit requirements, and AC sets. A customer-initiated reset has no audit trail requirement; an agent-initiated reset must be logged with the agent ID, customer consent, and reason. Shared Functionalities (OTP generation, email delivery) can be reused by linking both User Stories to the same Functionality entities. 
Mixing two actor perspectives in one User Story produces ambiguous ACs.", + "files": [], + "expectations": [ + "Recommends two separate User Stories for two distinct actors", + "Explains that different actors have different AC sets and audit requirements", + "Notes shared Functionalities can be linked to both User Stories", + "Warns that mixing actors produces ambiguous ACs" + ] + }, + { + "id": 8, + "category": "output-format", + "prompt": "Create a user story: 'A customer service agent needs to view the full order history for any customer to resolve disputes.'", + "expected_output": "The output contains a fenced ```json code block with a UserStory entity. Required fields: type ('UserStory'), id (US-), title, as_a, i_want, so_that, acceptance_criteria (array of objects with id and description). The as_a/i_want/so_that fields follow the standard user story template. Each AC has a unique id in the format US--AC-. No Gherkin syntax appears inside the JSON values.", + "files": [], + "expectations": [ + "Single fenced ```json code block", + "type field is 'UserStory'", + "id follows US- convention", + "as_a/i_want/so_that follow standard template", + "Each AC has id in US--AC- format", + "No Gherkin syntax in JSON values" + ] + } + ] +} diff --git a/skills/living-doc-create-user-story/evals/files/incomplete-user-story.json b/skills/living-doc-create-user-story/evals/files/incomplete-user-story.json new file mode 100644 index 0000000..93f6e15 --- /dev/null +++ b/skills/living-doc-create-user-story/evals/files/incomplete-user-story.json @@ -0,0 +1,23 @@ +{ + "type": "UserStory", + "id": "US-042", + "title": "Reset account password", + "status": "draft", + "narrative": { + "as_a": "registered customer", + "i_can": "reset my password using my email address", + "so_that": "I can regain access to my account if I forget my password" + }, + "features": ["FEAT-login"], + "acceptance_criteria": [ + { + "id": "US-042-AC-1", + "description": "Happy path: password reset email is 
sent", + "given": "the customer is on the forgot password page", + "when": "the customer enters their registered email address and submits", + "then": "a password reset email is sent and the customer sees a confirmation message", + "priority": "critical", + "type": "happy_path" + } + ] +} diff --git a/skills/living-doc-create-user-story/evals/fixture-map.md b/skills/living-doc-create-user-story/evals/fixture-map.md new file mode 100644 index 0000000..431652e --- /dev/null +++ b/skills/living-doc-create-user-story/evals/fixture-map.md @@ -0,0 +1,28 @@ +# Fixture Map — living-doc-create-user-story + +## Fixture files + +| File | Description | +|---|---| +| `evals/files/incomplete-user-story.json` | User Story US-042 (password reset) with only a happy-path AC — missing error and alternative paths | + +## Eval to fixture mapping + +| Eval ID | Category | Fixture file(s) | Coverage | +|---|---|---|---| +| 1 | happy-path | _(none — conversational elicitation)_ | Full elicitation workflow: actor → narrative → Feature → ACs → completeness check → output | +| 2 | happy-path | `incomplete-user-story.json` | Completeness check: detects missing error + alternative ACs | +| 3 | happy-path | _(none)_ | Anti-pattern: invalid actor ("the system") | +| 4 | regression | _(none)_ | Anti-pattern: technical AC (DB implementation detail) | +| 5 | negative | _(none)_ | Routing: atomic behavior → living-doc-create-functionality | + +## Trigger eval summary + +14 entries: 10 `should_trigger=true`, 4 `should_trigger=false` + +| Routes to | Query count | +|---|---| +| living-doc-create-feature | 1 | +| living-doc-create-functionality | 1 | +| living-doc-scenario-creator | 1 | +| living-doc-gap-finder | 1 | diff --git a/skills/living-doc-create-user-story/evals/trigger-eval.json b/skills/living-doc-create-user-story/evals/trigger-eval.json new file mode 100644 index 0000000..7f9d3d2 --- /dev/null +++ b/skills/living-doc-create-user-story/evals/trigger-eval.json @@ -0,0 +1,16 @@ +[ + {"id": 1, 
"query": "Create a user story for the password reset feature", "should_trigger": true, "reason": "Explicit 'create a user story' trigger keyword"}, + {"id": 2, "query": "Write acceptance criteria for the login capability", "should_trigger": true, "reason": "Explicit 'write acceptance criteria' trigger keyword"}, + {"id": 3, "query": "I need a new user story — a customer wants to track their delivery", "should_trigger": true, "reason": "Explicit 'new user story' trigger keyword"}, + {"id": 4, "query": "As a customer I want to view my order history", "should_trigger": true, "reason": "As-a narrative format triggers US elicitation"}, + {"id": 5, "query": "Help me document a business requirement for promo codes", "should_trigger": true, "reason": "'document a business requirement' trigger phrase"}, + {"id": 6, "query": "I need to define US AC for the checkout flow", "should_trigger": true, "reason": "'define US AC' trigger phrase"}, + {"id": 7, "query": "User story template for a SaaS onboarding feature", "should_trigger": true, "reason": "'user story template' trigger phrase"}, + {"id": 8, "query": "Elicit requirements for the notifications feature", "should_trigger": true, "reason": "'elicit requirements' trigger phrase"}, + {"id": 9, "query": "Review this user story and tell me what ACs are missing", "should_trigger": true, "reason": "Reviewing US ACs is part of this skill's completeness check"}, + {"id": 10, "query": "Is my narrative well formed? 
'As a system I can process payments'", "should_trigger": true, "reason": "Validating a narrative is a core skill task"}, + {"id": 11, "query": "Document the checkout page as a Feature entity", "should_trigger": false, "reason": "Feature entity creation — routes to living-doc-create-feature"}, + {"id": 12, "query": "Document the atomic behavior: validate cart is not empty", "should_trigger": false, "reason": "Atomic behavior is a Functionality — routes to living-doc-create-functionality"}, + {"id": 13, "query": "Generate BDD scenarios for US-001", "should_trigger": false, "reason": "Scenario generation — routes to living-doc-scenario-creator"}, + {"id": 14, "query": "What test gaps exist in our living documentation?", "should_trigger": false, "reason": "Gap analysis — routes to living-doc-gap-finder"} +] diff --git a/skills/living-doc-gap-finder/SKILL.md b/skills/living-doc-gap-finder/SKILL.md new file mode 100644 index 0000000..9b73181 --- /dev/null +++ b/skills/living-doc-gap-finder/SKILL.md @@ -0,0 +1,191 @@ +--- +name: living-doc-gap-finder +description: > + Identify gaps in the living documentation by combining bottom-up UI/code exploration with + top-down requirement checking. Activate when auditing living doc completeness, finding + undocumented behaviors, discovering orphan tests with no AC link, detecting untested ACs, + producing a documentation coverage gap report, or proposing new living doc entities to fill + identified gaps. Orchestrates living-doc-pageobject-scan, living-doc-scenario-creator (read-only), + and living-doc-create-* skills. + Triggers on: "find what's not documented", "living doc gaps", "what's missing in living doc", + "find undocumented features", "orphan tests", "untested AC", "documentation coverage", + "gap report", "what's not covered", "living doc audit", "documentation audit". + Does NOT trigger for: creating new living doc objects (use living-doc-create-* skills), + generating tutorials (use living-doc-tutorial-creator). 
+ Orchestrates: living-doc-pageobject-scan, living-doc-scenario-creator, and all create-* skills. +license: Apache-2.0 +compatibility: GitHub Copilot +--- + +# Living Doc — Gap Finder + +> **Key concepts:** Feature, Functionality, User Story, AC, PageObject — see `../references/living-doc-glossary.md`. + +## Gap taxonomy + +Five types of gaps are detected, in order of risk: + +| Priority | Gap type | Description | +|---|---|---| +| 1 — Blocker | **Untested AC** | An Active or Implemented AC in a User Story or Functionality has no linked test | +| 2 — Important | **Undocumented UI surface** | A screen or API endpoint exists in the app with no Feature entity | +| 3 — Important | **Orphan Feature** | A Feature entity exists with no linked User Story | +| 4 — Important | **Orphan test** | A test or BDD scenario exists with no linked AC | +| 5 — Nit | **Undocumented Functionality** | A Functionality entity exists with no associated tests | + +## Workflow + +### Step 1 — Bottom-up scan (apply living-doc-pageobject-scan) + +Load and follow the `living-doc-pageobject-scan` skill to build an **inventory** of: +- All discoverable UI screens and API endpoints +- All existing test files and BDD scenarios +- All existing PageObjects and their method coverage + +Output: `inventory.json` — a flat list of discovered artifacts. 
+ +### Step 2 — Top-down entity traversal + +Traverse the entity graph by following relationship fields: +- All User Stories (with their ACs and status) — the root entry points +- All Features (via User Story `features` links) +- All Functionalities (via Feature `functionalities` links) +- All existing test links (test file → AC mappings) + +### Step 3 — Compute gaps + +For each gap type: + +**Gap type 1 — Untested AC:** +``` +For each AC in (UserStory.ACs + Functionality.ACs) + where status IN (Active, Implemented) + where no linked test exists: + → GAP: UNTESTED_AC +``` + +**Gap type 2 — Undocumented UI surface:** +``` +For each item in inventory (screens, API endpoints) + where no Feature entity exists for this surface: + → GAP: UNDOCUMENTED_SURFACE +``` + +**Gap type 3 — Orphan Feature:** +``` +For each Feature reachable via entity relationships + where user_stories == [] AND functionalities == []: + → GAP: ORPHAN_FEATURE +``` + +**Gap type 4 — Orphan test:** +``` +For each test in inventory + where no linked AC exists in any UserStory or Functionality: + → GAP: ORPHAN_TEST +``` + +**Gap type 5 — Undocumented Functionality:** +``` +For each Functionality reachable via Feature `functionalities` links + where no test references this Functionality's ACs: + → GAP: UNDOCUMENTED_FUNCTIONALITY +``` + +### Step 4 — Prioritise by risk + +Sort all gaps by: +1. Priority (Blocker before Important before Nit) +2. Within priority: by the number of dependent entities (higher impact first) +3. 
Within that: alphabetically by entity ID + +### Step 5 — Propose new entities + +For each gap, propose the living doc action: + +| Gap type | Proposed action | +|---|---| +| UNTESTED_AC | Create BDD scenario → `living-doc-scenario-creator` | +| UNDOCUMENTED_SURFACE | Create Feature entity → `living-doc-create-feature` | +| ORPHAN_FEATURE | Link to a User Story or delete if not used | +| ORPHAN_TEST | Link test to an existing AC, or create a Functionality → `living-doc-create-functionality`. **Never delete a test to resolve an orphan — that would silently remove coverage.** If the linked AC ID no longer exists (broken link), choose from: (1) recreate the AC/Functionality if the behavior is still required; (2) update the link to the merged AC ID if the entity was merged; (3) delete the test only after product owner confirmation that the behavior has been intentionally removed. | +| UNDOCUMENTED_FUNCTIONALITY | Create unit/integration tests for the Functionality's ACs | + +> **Out-of-scope actions:** living-doc-gap-finder identifies and proposes new entities — it does +> not create them. Direct creation requests (e.g. "create a User Story", "create a Feature") must +> be delegated to the appropriate skill: `living-doc-create-user-story`, `living-doc-create-feature`, +> or `living-doc-create-functionality`. 
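The gap computation in Step 3 and the ordering in Step 4 can be sketched in a few lines. This is an illustrative sketch only: the `Gap` dataclass, the `dependents` field, and the dict shape of an AC (`"id"` and `"status"` keys) are assumptions made for the example, not the skill's actual data model. It covers gap type 1 only.

```python
from dataclasses import dataclass

# Severity order from the gap taxonomy: Blocker before Important before Nit.
SEVERITY_RANK = {"Blocker": 0, "Important": 1, "Nit": 2}


@dataclass
class Gap:
    type: str
    severity: str
    entity: str
    dependents: int = 0  # hypothetical count of dependent entities


def find_untested_acs(acs, tested_ids):
    """Gap type 1: Active or Implemented ACs with no linked test.

    `acs` is an assumed shape: dicts with "id" and "status" keys.
    """
    return [
        Gap("UNTESTED_AC", "Blocker", ac["id"])
        for ac in acs
        if ac["status"] in ("Active", "Implemented") and ac["id"] not in tested_ids
    ]


def prioritise(gaps):
    """Step 4 ordering: severity, then dependents (descending), then entity ID."""
    return sorted(
        gaps,
        key=lambda g: (SEVERITY_RANK[g.severity], -g.dependents, g.entity),
    )


if __name__ == "__main__":
    acs = [
        {"id": "AC:US-007-02", "status": "Active"},
        {"id": "AC:US-007-01", "status": "Active"},
        {"id": "AC:US-001-01", "status": "Planned"},
    ]
    gaps = find_untested_acs(acs, tested_ids={"AC:US-007-01"})
    gaps.append(Gap("ORPHAN_FEATURE", "Important", "FEAT-orphan"))
    gaps.append(Gap("UNDOCUMENTED_SURFACE", "Important", "/account/preferences", dependents=2))
    for gap in prioritise(gaps):
        print(gap.severity, gap.type, gap.entity)
```

Using a single tuple as the sort key keeps the three tie-break levels from Step 4 in one place. The other four gap types would follow the same filter-then-collect pattern.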
+ +### Step 6 — Output gap report + +```json +{ + "generated_at": "2026-05-15T10:00:00Z", + "documentation_coverage": { + "user_stories_with_full_coverage": 12, + "user_stories_with_gaps": 3, + "coverage_percentage": 80 + }, + "gaps": [ + { + "id": "GAP-001", + "type": "UNTESTED_AC", + "severity": "Blocker", + "entity": "AC:US-007-02", + "description": "Active AC 'Payment declined' has no linked E2E test", + "proposed_action": "Generate BDD scenario using living-doc-scenario-creator for US-007" + }, + { + "id": "GAP-002", + "type": "UNDOCUMENTED_SURFACE", + "severity": "Important", + "entity": "/account/preferences", + "description": "Screen /account/preferences discovered in webapp scan — no Feature entity", + "proposed_action": "Create Feature entity using living-doc-create-feature" + } + ] +} +``` + +## Documentation coverage metric + +``` +Coverage % = (ACs with at least one linked test) / (total ACs) × 100 +``` + +Report separately for: +- User Story ACs (E2E coverage) +- Functionality ACs (unit/integration coverage) + +A project with 100% documentation coverage has every AC backed by at least one test. + +## Large-scale analysis: batching guidance + +When the gap inventory is large (e.g. 100+ orphan tests or undocumented features from a legacy +codebase), running a single full-codebase gap-finder pass produces an unmanageable report. +Instead: + +1. **Batch by domain or Feature area** — process one Feature or service at a time. +2. **Prioritise by business risk** — start with the highest-risk domains first: payment, auth, + security, regulatory compliance. These gaps pose the greatest production risk. +3. **Iterate** — after each batch, link tests, create entities, and re-run gap-finder on that + domain before moving to the next. + +Processing everything at once is discouraged because the resulting gap list is too large to action +without clear prioritisation. 
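The coverage formula from the "Documentation coverage metric" section, reported separately for User Story and Functionality ACs, might be computed as follows. This is a minimal sketch: the function names and the choice to report 0% for an empty AC set are assumptions, not behavior defined by the skill.

```python
def coverage_percentage(ac_ids, tested_ids):
    """Coverage % = ACs with at least one linked test / total ACs x 100."""
    if not ac_ids:
        return 0.0  # assumption: an empty AC set reports 0% instead of raising
    covered = sum(1 for ac_id in ac_ids if ac_id in tested_ids)
    return covered * 100 / len(ac_ids)


def coverage_report(user_story_acs, functionality_acs, tested_ids):
    """Report E2E and unit/integration coverage separately, as whole percentages."""
    return {
        "user_story_acs": round(coverage_percentage(user_story_acs, tested_ids)),
        "functionality_acs": round(coverage_percentage(functionality_acs, tested_ids)),
    }


if __name__ == "__main__":
    us_acs = ["AC:US-001-01", "AC:US-001-02", "AC:US-007-01", "AC:US-007-02"]
    func_acs = ["FUNC-a-AC-1", "FUNC-a-AC-2"]  # hypothetical IDs
    tested = {"AC:US-001-01", "FUNC-a-AC-1", "FUNC-a-AC-2"}
    print(coverage_report(us_acs, func_acs, tested))
```

Keeping the two figures separate avoids a high unit-level number masking missing E2E coverage.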
+ +## Lightweight scenario-coverage report format + +When the focus is specifically on scenario-to-AC coverage (rather than the full gap taxonomy), +or when asked to demonstrate or describe the gap report output format, +use this simplified two-section format: + +**Missing Scenarios** (ACs with no linked Gherkin scenario): +- `` — + +**Missing ACs** (Gherkin scenarios with no corresponding AC): +- `` — + +End with a summary line: `X ACs missing scenarios, Y scenarios missing ACs.` + +This format is diagnostic only — it does not suggest implementation changes. diff --git a/skills/living-doc-gap-finder/evals/evals.json b/skills/living-doc-gap-finder/evals/evals.json new file mode 100644 index 0000000..8fb49d3 --- /dev/null +++ b/skills/living-doc-gap-finder/evals/evals.json @@ -0,0 +1,110 @@ +{ + "skill_name": "living-doc-gap-finder", + "evals": [ + { + "id": 1, + "category": "happy-path", + "prompt": "Run a gap analysis on our living documentation. File: evals/files/catalog-snapshot.json", + "expected_output": "Agent analyzes the snapshot and produces a gap report with: Blocker — US-001-AC-2 and US-001-AC-3 (critical ACs) have no linked tests; Blocker — all 4 ACs of US-007 have no linked tests; Important — /account/preferences screen discovered in webapp with no Feature entity; Important — FEAT-orphan (Legacy Report Screen) has no User Stories and no Functionalities; Important — test_order_history.py and test_login_flow.feature have no linked ACs (orphan tests); Nit — FUNC-apply-discount has 5 ACs with no linked tests. 
Documentation coverage = 1/9 ACs covered = 11%.", + "files": [ + "evals/files/catalog-snapshot.json" + ], + "expectations": [ + "Identifies US-001-AC-2 and US-001-AC-3 as untested critical ACs (Blockers)", + "Identifies all 4 US-007 ACs as untested (Blockers)", + "Identifies /account/preferences as undocumented surface (Important)", + "Identifies FEAT-orphan as orphan Feature (Important)", + "Identifies test_order_history.py and test_login_flow.feature as orphan tests (Important)", + "Identifies FUNC-apply-discount ACs as untested (Nit)", + "Calculates documentation coverage percentage" + ] + }, + { + "id": 2, + "category": "happy-path", + "prompt": "What is documentation coverage and how is it calculated?", + "expected_output": "Documentation coverage = (ACs with at least one linked test) / (total ACs) × 100%. Reported separately for User Story ACs (E2E coverage) and Functionality ACs (unit/integration coverage). A project with 100% documentation coverage has every AC backed by at least one test. The metric drives the gap-finder workflow toward zero gaps.", + "files": [], + "expectations": [ + "Correct formula: covered ACs / total ACs × 100", + "Reported separately for US ACs and Functionality ACs", + "Notes 100% means every AC has at least one test" + ] + }, + { + "id": 3, + "category": "happy-path", + "prompt": "A test file exists with no linked AC. What gap type is this and what should I do?", + "expected_output": "This is an orphan test (Gap type 4 — Important). Resolution options: (1) find an existing AC in the living doc that this test covers and add the link; (2) if no AC exists, create a Functionality entity for the behavior being tested using living-doc-create-functionality, then link the test to the new Functionality's AC. 
Never delete a test to resolve an orphan — that would remove coverage.", + "files": [], + "expectations": [ + "Classifies as Gap type 4: ORPHAN_TEST", + "Provides two resolution options: link to existing AC, or create new Functionality", + "Explicitly warns against deleting the test" + ] + }, + { + "id": 4, + "category": "regression", + "prompt": "We have 200 orphan tests from a legacy codebase. Should I run gap-finder on all of them at once?", + "expected_output": "Batch the gap-finder run by domain or Feature area rather than running across the entire codebase at once. Process the highest-risk areas first (payment, auth, security). For each batch: identify which Functionalities or User Stories the tests correspond to, create missing living doc entities, and link tests. Processing all 200 at once produces an unmanageable gap report — prioritise by business risk.", + "files": [], + "expectations": [ + "Recommends batching by domain or Feature area", + "Prioritises highest-risk areas first (payment, auth, security)", + "Advises against running full-codebase analysis in one pass" + ] + }, + { + "id": 5, + "category": "negative", + "prompt": "Create a new User Story for the account preferences screen.", + "expected_output": "Creating a User Story is not a gap-finder action — routes to living-doc-create-user-story. living-doc-gap-finder identifies and proposes new entities; the creation itself is delegated to the appropriate create-* skill.", + "files": [], + "expectations": [ + "Does not create the User Story", + "Routes to living-doc-create-user-story", + "Notes the gap-finder proposes, the create-* skills execute" + ] + }, + { + "id": 6, + "category": "paraphrase", + "prompt": "Where are the holes in our living documentation? I want to make sure everything is covered before the release.", + "expected_output": "Agent runs the full gap analysis workflow. 
Reports: untested ACs (by priority), undocumented surfaces (features visible in code/UI with no Feature entity), orphan tests (tests with no linked AC), orphan Features (no User Stories or Functionalities). Produces documentation coverage percentage. Highlights Blockers first (untested critical ACs), then Important (undocumented surfaces, orphan tests), then Nit (low-priority untested ACs).", + "files": [], + "expectations": [ + "Identifies this as a gap analysis request despite 'holes' phrasing", + "Reports gap types by severity (Blocker/Important/Nit)", + "Produces documentation coverage percentage", + "Covers all four gap types: untested ACs, undocumented surfaces, orphan tests, orphan features" + ] + }, + { + "id": 7, + "category": "edge-case", + "prompt": "A test is linked to an AC, but the AC was deleted from the living doc. How is this classified and what should I do?", + "expected_output": "This is a broken-link gap (variant of Gap type 4: ORPHAN_TEST). The test references an AC ID that no longer exists. Resolution options: (1) If the behavior the test covers is still required, recreate the Functionality/AC entity and relink. (2) If the behavior has been removed, the test should be deleted after confirming with the product owner. (3) If the AC was merged into another entity, update the test's link comment to the new AC ID. Never delete a test without product owner confirmation.", + "files": [], + "expectations": [ + "Classifies as broken-link orphan test", + "Provides three resolution options", + "Warns against deleting the test without product owner confirmation", + "Notes the possibility of AC merge as a resolution path" + ] + }, + { + "id": 8, + "category": "output-format", + "prompt": "Run a gap analysis and show me exactly what format the output report uses.", + "expected_output": "The gap report has two clearly labelled sections: 'Missing Scenarios' (ACs with no linked Gherkin scenario) and 'Missing ACs' (scenarios with no corresponding AC). 
Each missing item is a bulleted entry with the AC or scenario ID and its description. The report ends with a count summary line: 'X ACs missing scenarios, Y scenarios missing ACs'. No implementation changes are suggested — the report is diagnostic only.", + "files": [], + "expectations": [ + "Two sections: Missing Scenarios and Missing ACs", + "Each item is a bulleted entry with ID and description", + "Count summary line at the end", + "Diagnostic only — no implementation suggestions" + ] + } + ] +} diff --git a/skills/living-doc-gap-finder/evals/files/catalog-snapshot.json b/skills/living-doc-gap-finder/evals/files/catalog-snapshot.json new file mode 100644 index 0000000..1c28db1 --- /dev/null +++ b/skills/living-doc-gap-finder/evals/files/catalog-snapshot.json @@ -0,0 +1,59 @@ +{ + "generated_at": "2026-05-15T08:00:00Z", + "catalog": { + "user_stories": [ + {"id": "US-001", "title": "Place an online order", "status": "ready", "ac_count": 3}, + {"id": "US-002", "title": "View order history", "status": "ready", "ac_count": 2}, + {"id": "US-007", "title": "Apply a promotional discount", "status": "ready", "ac_count": 4} + ], + "features": [ + {"id": "FEAT-checkout", "name": "Checkout Page", "user_stories": ["US-001"]}, + {"id": "FEAT-account", "name": "Account Dashboard", "user_stories": ["US-002"]}, + {"id": "FEAT-promo", "name": "Promotions Module", "user_stories": []}, + {"id": "FEAT-orphan", "name": "Legacy Report Screen", "user_stories": [], "functionalities": []} + ], + "functionalities": [ + { + "id": "FUNC-validate-cart", + "parent_feature": "FEAT-checkout", + "ac_count": 3, + "linked_tests": ["test_validate_cart.py::test_empty_cart"] + }, + { + "id": "FUNC-apply-discount", + "parent_feature": "FEAT-promo", + "ac_count": 5, + "linked_tests": [] + } + ] + }, + "inventory": { + "ui_screens": [ + "/checkout", + "/account/orders", + "/account/preferences", + "/promotions", + "/reports/legacy" + ], + "test_files": [ + {"file": "test_validate_cart.py", 
"linked_ac": "FUNC-validate-cart-AC-1"}, + {"file": "test_order_history.py", "linked_ac": null}, + {"file": "test_login_flow.feature", "linked_ac": null} + ], + "bdd_scenarios": [ + {"scenario": "Customer successfully places an order", "linked_ac": "US-001-AC-1"}, + {"scenario": "View paginated order history", "linked_ac": null} + ] + }, + "known_test_links": { + "US-001-AC-1": "tests/features/checkout.feature::Customer successfully places an order", + "US-001-AC-2": null, + "US-001-AC-3": null, + "US-002-AC-1": null, + "US-002-AC-2": null, + "US-007-AC-1": null, + "US-007-AC-2": null, + "US-007-AC-3": null, + "US-007-AC-4": null + } +} diff --git a/skills/living-doc-gap-finder/evals/fixture-map.md b/skills/living-doc-gap-finder/evals/fixture-map.md new file mode 100644 index 0000000..cb5d19e --- /dev/null +++ b/skills/living-doc-gap-finder/evals/fixture-map.md @@ -0,0 +1,27 @@ +# Fixture Map — living-doc-gap-finder + +## Fixture files + +| File | Description | +|---|---| +| `evals/files/catalog-snapshot.json` | Snapshot of the living doc catalog + webapp inventory showing: 8 uncovered ACs (including 5 critical), 1 undocumented screen, 1 orphan Feature, 2 orphan tests, 5 Functionality ACs with no linked tests | + +## Eval to fixture mapping + +| Eval ID | Category | Fixture file(s) | Coverage | +|---|---|---|---| +| 1 | happy-path | `catalog-snapshot.json` | Full gap analysis: all 5 gap types detected, gap report with severity levels, coverage % calculation | +| 2 | happy-path | _(none)_ | Coverage metric explanation and calculation | +| 3 | happy-path | _(none)_ | Orphan test resolution: link or create Functionality, never delete | +| 4 | regression | _(none)_ | Batch processing advice: domain-by-domain, prioritise by business risk | +| 5 | negative | _(none)_ | Routing: creating a User Story → living-doc-create-user-story | + +## Trigger eval summary + +14 entries: 11 `should_trigger=true`, 3 `should_trigger=false` + +| Routes to | Query count | +|---|---| +| 
living-doc-create-user-story | 1 | +| living-doc-create-feature | 1 | +| living-doc-tutorial-creator | 1 | diff --git a/skills/living-doc-gap-finder/evals/trigger-eval.json b/skills/living-doc-gap-finder/evals/trigger-eval.json new file mode 100644 index 0000000..d558fb7 --- /dev/null +++ b/skills/living-doc-gap-finder/evals/trigger-eval.json @@ -0,0 +1,16 @@ +[ + {"id": 1, "query": "Run a living doc gap analysis", "should_trigger": true, "reason": "'living doc gaps' trigger phrase"}, + {"id": 2, "query": "What's missing in our living documentation?", "should_trigger": true, "reason": "'what's missing in living doc' trigger phrase"}, + {"id": 3, "query": "Find undocumented features in the codebase", "should_trigger": true, "reason": "'find undocumented features' trigger phrase"}, + {"id": 4, "query": "Which tests have no linked acceptance criteria (orphan tests)?", "should_trigger": true, "reason": "'orphan tests' trigger keyword"}, + {"id": 5, "query": "Which ACs have no tests? (untested ACs)", "should_trigger": true, "reason": "'untested AC' trigger keyword"}, + {"id": 6, "query": "What is our documentation coverage percentage?", "should_trigger": true, "reason": "'documentation coverage' trigger keyword"}, + {"id": 7, "query": "Generate a gap report for the payments domain", "should_trigger": true, "reason": "'gap report' trigger keyword"}, + {"id": 8, "query": "What behaviors are not covered in the living doc?", "should_trigger": true, "reason": "'what's not covered' trigger phrase"}, + {"id": 9, "query": "Do a living doc audit before the release", "should_trigger": true, "reason": "'living doc audit' trigger phrase"}, + {"id": 10, "query": "Which User Story ACs are critical but have no BDD scenario?", "should_trigger": true, "reason": "Finding untested critical ACs — core gap-finder task"}, + {"id": 11, "query": "Find what's not documented in our test suite", "should_trigger": true, "reason": "'find what's not documented' trigger phrase"}, + {"id": 12, 
"query": "Create a user story for the preferences screen gap", "should_trigger": false, "reason": "Creating a User Story — routes to living-doc-create-user-story"}, + {"id": 13, "query": "Create a Feature entity for the account preferences screen", "should_trigger": false, "reason": "Creating a Feature — routes to living-doc-create-feature"}, + {"id": 14, "query": "Generate a tutorial from the checkout .feature file", "should_trigger": false, "reason": "Tutorial generation — routes to living-doc-tutorial-creator"} +] diff --git a/skills/living-doc-gap-finder/references/glossary.md b/skills/living-doc-gap-finder/references/glossary.md new file mode 100644 index 0000000..f77bc84 --- /dev/null +++ b/skills/living-doc-gap-finder/references/glossary.md @@ -0,0 +1,120 @@ +# Living Documentation — Shared Glossary + +> **This file has moved.** +> The shared glossary is now at [`skills/references/living-doc-glossary.md`](../../references/living-doc-glossary.md). +> Update any links pointing here to use the new path. + + +--- + +## Core entities + +### User Story (US) + +A business-level requirement expressed from the perspective of a named actor. + +``` +As a , +I can , +so that . +``` + +- ID format: `US-` (e.g. `US-001`) +- Owns: end-to-end **Acceptance Criteria (AC)** written in Given-When-Then format +- Links to: one or more **Features** (system surfaces the User Story touches) +- Status: `draft | ready | in-progress | done | deprecated` + +### Feature + +A named system surface — the structural layer between User Stories and atomic behaviors. + +- ID format: `FEAT-` (e.g. `FEAT-checkout`) +- Surface types: `UI | API | Service | Module` +- Owns: one or more **Functionalities** +- Linked to: one or more **User Stories** +- Status: `active | candidate | deprecated` +- Registry: all Features are listed in `docs/FEATURE_REGISTRY.md` + +**One PageObject ≈ one Feature** for UI surfaces. 
+ +### Functionality (FUNC) + +An atomic, fast-testable behavior — a single verb phrase describing one responsibility. + +- ID format: `FUNC-` (e.g. `FUNC-apply-discount`) +- Belongs to: one parent **Feature** +- Owns: **Functionality-level Acceptance Criteria** (When/Then format, unit/integration-testable) +- Test type per AC: `unit | integration` +- Priority per AC: `critical | high | medium | low` + +Functionalities differ from User Story ACs: they are atomic and fast-testable (unit/integration), +not end-to-end. A single User Story may trigger multiple Functionalities. + +### Acceptance Criterion (AC) + +A binary pass/fail statement that defines a verifiable condition. + +**User Story AC format (end-to-end):** +``` +Given: +When: +Then: +``` + +**Functionality AC format (atomic/fast):** +``` +When: +Then: +``` + +Each AC has: +- A unique ID: `-AC-` (e.g. `US-001-AC-1`, `FUNC-apply-discount-AC-2`) +- A `priority`: `critical | high | medium | low` +- A `test_type` (Functionality ACs only): `unit | integration` + +### PageObject + +A class that encapsulates the selectors and actions of a single UI screen. Used by BDD step +definitions to interact with the application without embedding selectors in step code. + +- Naming: `Page` (e.g. `CheckoutPage`) +- One PageObject per distinct screen or significant modal +- Selector preference: `data-testid` > `aria-label`/role > CSS class (last resort) + +--- + +## Relationship diagram + +``` +User Story (US) + └── triggers / links to → Feature (FEAT) + └── owns → Functionality (FUNC) + └── owns → Functionality ACs + └── maps to → unit/integration tests + └── owns → User Story ACs (Given/When/Then) + └── maps to → BDD Scenarios (.feature files) + └── implemented by → Step Definitions + └── delegates to → PageObjects +``` + +--- + +## Living doc catalog + +The **living doc catalog** is the collection of all canonical entity JSON files in the project. +Typically stored under `docs/living-doc/` or equivalent. 
Gap finder, scenario creator, and +tutorial creator all read from this catalog. + +--- + +## What each skill creates or consumes + +| Skill | Creates | Reads | +|---|---|---| +| `living-doc-create-user-story` | User Story JSON | Feature Registry | +| `living-doc-create-feature` | Feature JSON + FEATURE_REGISTRY.md entry | User Story list | +| `living-doc-create-functionality` | Functionality JSON | Feature JSON | +| `living-doc-pageobject-scan` | PageObject classes + Feature stubs | App URL or test suite | +| `living-doc-scenario-creator` | .feature files | User Story, PageObjects | +| `living-doc-tutorial-creator` | Tutorial markdown | .feature files, User Stories | +| `living-doc-gap-finder` | Gap report | All of the above | diff --git a/skills/living-doc-impact-analysis/SKILL.md b/skills/living-doc-impact-analysis/SKILL.md new file mode 100644 index 0000000..2a2836e --- /dev/null +++ b/skills/living-doc-impact-analysis/SKILL.md @@ -0,0 +1,148 @@ +--- +name: living-doc-impact-analysis +description: > + Analyse the impact of a code change on the living documentation. Given a PR diff, + modified module, or changed API contract, trace which Features, Functionalities, User Stories, + and Gherkin scenarios are affected. Output an impact map that identifies what must be reviewed, + updated, or re-tested. Activate when a PR touches business logic and you need to know what + living doc entities are affected, when a service module is refactored, or when breaking API + changes need living doc coverage traced. + Triggers on: "living doc impact", "what does this change affect", "impact of PR on living doc", + "trace affected user stories", "affected features", "impact analysis", "living doc sign-off", + "what user stories are affected", "which scenarios need re-running", "PR impact on docs". + Does NOT trigger for: updating living doc (use living-doc-update), finding coverage gaps + (use living-doc-gap-finder), creating new entities (use living-doc-create-* skills). 
+ +license: Apache-2.0 +compatibility: GitHub Copilot +--- + +# Living Doc — Impact Analysis + +> **Key concepts:** Feature, Functionality, User Story, AC — see `../references/living-doc-glossary.md`. + +## Step 1 — Identify the changed surface area + +Start from the code change (PR diff, renamed module, deleted endpoint): + +1. List the changed files and classify each: + - **Domain logic** (service, repository, domain model) + - **API contract** (controller, route, OpenAPI spec) + - **Event contract** (schema, Avro, Protobuf) + - **UI component** (page, form, component) + - **Configuration / infrastructure** (no living doc impact unless it changes a business flow) + +2. Map changed files to modules/services using the project structure. + +3. For each changed module, identify the corresponding Feature by traversing entity relationships: + - Which Feature owns this module? (check the Feature's `functionalities` links or ask the owning team) + - Which Functionalities does this module implement? + +## Step 2 — Trace to living doc entities + +Walk the entity hierarchy from Feature → Functionality → User Story: + +``` +Changed module: src/payments/checkout/PromoService.java + → Feature: FEAT-promotions + → Functionalities: FUNC-promo-validate, FUNC-promo-apply + → User Stories: US-042 (apply promo), US-067 (expired promo error) + → ACs affected: AC:US-042-01, AC:US-042-03, AC:US-067-02 + → Linked scenarios: checkout/promo_apply.feature (Scenarios 1, 3), checkout/promo_error.feature (Scenario 2) +``` + +Repeat for every changed module. Consolidate entities that appear more than once — they are +higher-risk and need priority review. + +**Shared utility classes:** If the changed file is a shared utility used by multiple modules +(e.g. `MoneyUtils`, `DateHelper`), fan out the trace to **every** Feature that imports or +depends on that utility. Classify each as **High impact** — a shared utility change propagates +to all consumers and each consumer's ACs must be reviewed. 
Produce a consolidated impact map +that covers all affected Feature areas. + +## Step 3 — Classify the impact level + +| Impact level | Criteria | Action required | +|---|---|---| +| **High** | AC or business rule directly changed or deleted | Must update living doc and re-run linked scenarios | +| **Medium** | Module changed but business rule unchanged (refactor/rename) | Update living doc if method names referenced; confirm scenarios still pass | +| **Low** | Config / infra change that alters a business flow | Update living doc if the flow change is documented; note in PR | +| **None** | Pure infrastructure change (resource limits, scaling, deployment config) with no business flow impact; or test files, mocks, build scripts only | No living doc update needed | + +## Step 4 — Output the impact map + +Emit a structured impact map for the PR or change set: + +``` +IMPACT MAP — PR #217: "Refactor promo validation to support stacked discounts" + Surface area: src/payments/checkout/PromoService.java (domain logic — High) + src/payments/checkout/PromoController.java (API contract — High) + + Affected entities: + Feature: FEAT-promotions (owner: team-payments) + Functionalities: FUNC-promo-validate, FUNC-promo-apply + User Stories: US-042 (high impact), US-067 (high impact), US-089 (medium impact) + + ACs requiring review: + AC:US-042-01 — Happy path: single promo applied correctly + AC:US-042-03 — Stacked promos applied in priority order ← NEW BEHAVIOUR + AC:US-067-02 — Expired promo returns 422 + + Scenarios requiring re-run: + checkout/promo_apply.feature — Scenarios: 1, 3 + checkout/promo_error.feature — Scenario: 2 + + Recommended actions: + 1. Update living-doc: add AC for stacked discount priority order (AC:US-042-03 is new) + → Invoke living-doc-update + 2. Sync Gherkin: promo_apply.feature Scenario 3 needs updating for stacked discount + → Invoke gherkin-living-doc-sync + 3. 
Re-run E2E journeys: US-042 and US-067 critical path scenarios + → Invoke test-e2e-standards +``` + +## Step 5 — Release sign-off checklist + +Before a release, confirm that all High-impact entities have been addressed: + +| Check | Status | +|---|---| +| All High-impact ACs reviewed and updated if needed | ☐ | +| All linked Gherkin scenarios re-run and passing | ☐ | +| living-doc-update applied for any changed business rules | ☐ | +| gherkin-living-doc-sync run for any drifted step text | ☐ | + +Produce this checklist as a PR comment or documentation artefact if requested. + +## Code-level impact report format + +When the change is a **method signature change** or **API contract change**, produce a +code-level impact report with four sections: + +**Direct callers** — classes or methods that call the changed method directly (markdown list). + +**Downstream dependents** — components that use the return value or depend on the changed +contract (markdown list). + +**Required changes** — concrete call-site updates needed (markdown list; include the old and +new signatures in fenced code blocks). + +**Test coverage required** — tests that must be added or updated to cover the new contract +(markdown list). + +Do not include speculative changes beyond the described scope. 
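
The four-section layout described above could be rendered roughly as follows. This is a sketch under stated assumptions: `render_code_impact_report` and its parameters are hypothetical names chosen for the example, and the sample data reuses the `NotificationClient.send()` change from this PR's eval fixture rather than any real analysis output.

```python
from typing import List, Sequence

# Built at runtime so this sketch does not nest literal code fences.
FENCE = "`" * 3


def _section(title: str, items: Sequence[str]) -> List[str]:
    """Render one report section as a bold title followed by a markdown list."""
    lines = [f"**{title}**", ""]
    lines += [f"- {item}" for item in items] or ["- (none identified)"]
    lines.append("")
    return lines


def render_code_impact_report(
    old_signature: str,
    new_signature: str,
    direct_callers: Sequence[str],
    downstream_dependents: Sequence[str],
    required_changes: Sequence[str],
    tests_required: Sequence[str],
) -> str:
    """Assemble the four sections in the order the skill describes."""
    lines: List[str] = []
    lines += _section("Direct callers", direct_callers)
    lines += _section("Downstream dependents", downstream_dependents)
    lines += _section("Required changes", required_changes)
    # Old and new signatures go in fenced code blocks under "Required changes".
    lines += [FENCE + "python", old_signature, FENCE, ""]
    lines += [FENCE + "python", new_signature, FENCE, ""]
    lines += _section("Test coverage required", tests_required)
    return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    print(
        render_code_impact_report(
            old_signature="def send(self, user_id: str, message: str) -> None:",
            new_signature="def send(self, user_id: str, message: str, channel: str) -> None:",
            direct_callers=["OrderService.place_order", "OrderService.cancel_order"],
            downstream_dependents=[],
            required_changes=["Pass an explicit channel at both OrderService call sites"],
            tests_required=["Add tests for the channel parameter"],
        )
    )
```

An empty section is rendered with a placeholder entry instead of being dropped, so a reader can tell "nothing found" apart from "not analysed". That choice is a design assumption of this sketch, not something the skill text mandates.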
+ +## Anti-patterns to flag + +| Anti-pattern | Flag | +|---|---| +| Changed domain logic with no Feature entity defined in the living doc | Missing living doc coverage — flag as a **High-impact gap** and recommend creating documentation with `living-doc-create-functionality` | +| AC not linked to any Gherkin scenario after a High-impact change | Coverage gap — flag for gherkin-living-doc-sync | +| Impact analysis only covers unit/integration tests, not E2E scenarios | Incomplete impact — flag for test-e2e-standards review | + +## Out-of-scope redirects + +| Request type | Correct skill | +|---|---| +| "Update a living doc entity / add a new AC" | `living-doc-update` — this skill analyses impact, it does not edit entities | +| "Which Functionalities have no User Stories / find coverage gaps" | `living-doc-gap-finder` — gap discovery is a separate concern | diff --git a/skills/living-doc-impact-analysis/evals/evals.json b/skills/living-doc-impact-analysis/evals/evals.json new file mode 100644 index 0000000..00b966c --- /dev/null +++ b/skills/living-doc-impact-analysis/evals/evals.json @@ -0,0 +1,156 @@ +{ + "skill_name": "living-doc-impact-analysis", + "evals": [ + { + "id": 1, + "type": "happy-path", + "description": "Developer opens a PR modifying a domain service and wants to know what living doc entities are affected.", + "prompt": "PR #217 modifies PromoService.java to support stacked discounts. What living doc entities does this change affect?", + "expected": { + "skill_triggered": true, + "key_guidance": [ + "Map changed file to Feature via FEATURE_REGISTRY.md", + "Trace Feature → Functionality → User Stories → ACs", + "Classify impact level (High for changed business logic)", + "Output structured impact map" + ] + } + }, + { + "id": 2, + "type": "regression", + "description": "Developer asks which scenarios need re-running after an API contract change.", + "prompt": "We changed the /v2/orders endpoint to add a required 'currency' field. 
Which Gherkin scenarios need to be re-run?", + "expected": { + "skill_triggered": true, + "key_guidance": [ + "Identify changed API contract surface area", + "Map endpoint to Feature and Functionalities", + "List linked User Stories and ACs", + "List linked Gherkin scenarios requiring re-run" + ] + } + }, + { + "id": 3, + "type": "regression", + "description": "Developer needs a living doc impact sign-off for a release.", + "prompt": "We're about to release the checkout refactor. Can you produce a living doc impact sign-off checklist for the release?", + "expected": { + "skill_triggered": true, + "key_guidance": [ + "Identify all High-impact entities", + "Produce release sign-off checklist", + "Include: ACs reviewed, scenarios re-run, living-doc-update applied, gherkin-living-doc-sync run" + ] + } + }, + { + "id": 4, + "type": "regression", + "description": "Changed module has no entry in FEATURE_REGISTRY.md — should flag the gap.", + "prompt": "The ShippingCalculator.java was changed in the PR but it doesn't appear in FEATURE_REGISTRY.md.", + "expected": { + "skill_triggered": true, + "key_guidance": [ + "Flag missing living doc coverage", + "Recommend invoking living-doc-create-functionality for the missing module", + "Note this as a High-impact gap" + ] + } + }, + { + "id": 5, + "type": "regression", + "description": "Infra-only change with no business logic — should return 'None' impact level.", + "prompt": "PR #300 updates the Kubernetes resource limits for the order-service deployment. 
What is the living doc impact?", + "expected": { + "skill_triggered": true, + "key_guidance": [ + "Classify as None impact level", + "Config/infra changes do not require living doc updates", + "Note in PR that no living doc update is needed" + ] + } + }, + { + "id": 6, + "type": "negative", + "description": "Developer asks to update the living doc — should trigger living-doc-update, not living-doc-impact-analysis.", + "prompt": "Update US-042 to add a new AC for the expired promo path.", + "expected": { + "skill_triggered": false, + "reason": "Updating entities is handled by living-doc-update, not living-doc-impact-analysis" + } + }, + { + "id": 7, + "type": "negative", + "description": "Developer asks to find living doc gaps — should trigger living-doc-gap-finder.", + "prompt": "Which Functionalities don't have any User Stories?", + "expected": { + "skill_triggered": false, + "reason": "Finding coverage gaps is handled by living-doc-gap-finder, not living-doc-impact-analysis" + } + }, + { + "id": 8, + "type": "paraphrase", + "description": "Same impact analysis intent — phrased as 'what needs re-testing' rather than 'what is affected'.", + "prompt": "We're about to merge a PR that changes the cart validation logic. What do we need to re-test in the living doc?", + "expected": { + "skill_triggered": true, + "key_guidance": [ + "Map changed code to Feature via FEATURE_REGISTRY.md", + "Trace Feature → Functionality → User Stories → ACs", + "List all linked Gherkin scenarios that need re-running", + "Output structured re-test checklist" + ] + } + }, + { + "id": 9, + "type": "edge-case", + "description": "PR touches a shared utility class used by multiple Features — impact should fan out to all.", + "prompt": "PR #410 modifies MoneyUtils.java, a shared utility class used by checkout, refunds, and the promotions engine. 
What is the living doc impact?", + "expected": { + "skill_triggered": true, + "key_guidance": [ + "Fan out the impact analysis to all Features that reference MoneyUtils", + "List all Functionalities in each Feature that call MoneyUtils", + "Classify all as High impact — shared utility changes affect all consumers", + "Produce a consolidated impact map across all three Feature areas" + ] + } + }, + { + "id": 10, + "type": "output-format", + "prompt": "PaymentGatewayClient.charge() now returns Result instead of ChargeResponse. What does the impact analysis report look like?", + "expected_output": "The impact analysis report has four sections: 'Direct callers' (services or classes that call the changed method), 'Downstream dependents' (components using the return value), 'Required changes' (concrete refactoring steps), and 'Test coverage required' (which tests need updating). Each section uses a markdown list. The method signature change appears in a fenced code block. No speculative changes beyond the described scope are included.", + "files": [], + "expectations": [ + "Four sections: Direct callers, Downstream dependents, Required changes, Test coverage required", + "Each section uses a markdown list", + "Method signature change in a fenced code block", + "No speculative changes beyond the described scope" + ] + }, + { + "id": 11, + "type": "file-based", + "description": "Analyse the impact of a method signature change in a Python service class.", + "prompt": "The file evals/files/changed-notification-service.py shows NotificationClient.send() with its old and new signature. Produce the impact analysis for this change.", + "expected_output": "Agent produces an impact analysis with: Direct callers (callers of NotificationClient.send() visible in the file or implied by the class structure), Required changes (update each call site to pass the new parameter), and Test coverage required (add tests for the new parameter in NotificationClientTest). 
The old and new signatures are shown in fenced code blocks.", + "files": [ + "evals/files/changed-notification-service.py" + ], + "expectations": [ + "Direct callers section identifies the caller", + "Required changes section lists specific call site updates", + "Test coverage section lists tests to add or update", + "Old and new signatures in fenced code blocks" + ] + } + ] +} diff --git a/skills/living-doc-impact-analysis/evals/files/changed-notification-service.py b/skills/living-doc-impact-analysis/evals/files/changed-notification-service.py new file mode 100644 index 0000000..68d452e --- /dev/null +++ b/skills/living-doc-impact-analysis/evals/files/changed-notification-service.py @@ -0,0 +1,54 @@ +""" +NotificationClient — changed method signature. + +Used by: living-doc-impact-analysis file-based eval + +The send() method previously accepted (user_id, message). +It now accepts (user_id, message, channel) where channel is +one of: 'email', 'sms', 'push'. The channel parameter is required +(not optional) to force explicit intent at every call site. + +This file shows: +- The OLD signature (commented out) +- The NEW signature +- An example caller (OrderService) that uses the old signature +""" + + +class NotificationClient: + # OLD signature — no longer valid: + # def send(self, user_id: str, message: str) -> None: + + def send(self, user_id: str, message: str, channel: str) -> None: + """Send a notification to the user via the specified channel. + + Args: + user_id: The unique identifier of the recipient. + message: The notification message text. + channel: Delivery channel — one of: 'email', 'sms', 'push'. + + Raises: + ValueError: If channel is not one of the allowed values. + """ + allowed = {"email", "sms", "push"} + if channel not in allowed: + raise ValueError(f"channel must be one of {allowed}, got '{channel}'") + # ... 
actual delivery logic omitted + + +class OrderService: + """Uses the OLD NotificationClient.send() signature — needs updating.""" + + def __init__(self, notification_client: NotificationClient): + self._notifications = notification_client + + def place_order(self, user_id: str, sku: str, quantity: int) -> dict: + order = {"user_id": user_id, "sku": sku, "quantity": quantity, "status": "placed"} + # BUG: old signature — missing 'channel' argument + self._notifications.send(user_id, f"Order placed for {quantity}x {sku}") + return order + + def cancel_order(self, order_id: str, user_id: str) -> bool: + # BUG: old signature — missing 'channel' argument + self._notifications.send(user_id, f"Order {order_id} has been cancelled") + return True diff --git a/skills/living-doc-impact-analysis/evals/fixture-map.md b/skills/living-doc-impact-analysis/evals/fixture-map.md new file mode 100644 index 0000000..ae432a4 --- /dev/null +++ b/skills/living-doc-impact-analysis/evals/fixture-map.md @@ -0,0 +1,29 @@ +# Living Doc Impact Analysis — Evals Fixture Map + +| Test ID | Category | Fixture | +|---|---|---| +| 1 | happy-path | *(no file — PR domain logic impact map scenario)* | +| 2 | regression | *(no file — API contract change scenario re-run list)* | +| 3 | regression | *(no file — release sign-off checklist scenario)* | +| 4 | regression | *(no file — changed module missing from FEATURE_REGISTRY)* | +| 5 | regression | *(no file — infra-only change None impact level)* | +| 6 | negative | *(no file — update entity redirect to living-doc-update)* | +| 7 | negative | *(no file — gap-finding redirect to living-doc-gap-finder)* | + +## Coverage summary + +- happy-path: 1 (PR domain logic full impact trace) +- regression: 4 (API contract, release sign-off, missing registry entry, infra-only) +- negative: 2 (update entity redirect, gap-finder redirect) + +## Rules exercised + +| Rule | Eval ID | +|---|---| +| Map changed file → Feature → US → scenarios | 1 | +| API contract change 
impact trace | 2 | +| Release sign-off checklist | 3 | +| Flag missing FEATURE_REGISTRY coverage | 4 | +| Classify infra change as None impact | 5 | +| Out-of-scope: update entity → living-doc-update | 6 | +| Out-of-scope: find gaps → living-doc-gap-finder | 7 | diff --git a/skills/living-doc-impact-analysis/evals/trigger-eval.json b/skills/living-doc-impact-analysis/evals/trigger-eval.json new file mode 100644 index 0000000..f0baa5b --- /dev/null +++ b/skills/living-doc-impact-analysis/evals/trigger-eval.json @@ -0,0 +1,68 @@ +[ + { + "id": "t01-impact-explicit", + "query": "What living doc entities does PR #217 affect?", + "should_trigger": true, + "reason": "'impact of PR on living doc' is a listed trigger phrase." + }, + { + "id": "t02-trace-affected-us", + "query": "Which User Stories are affected by the PromoService refactor?", + "should_trigger": true, + "reason": "'trace affected user stories' is a listed trigger phrase." + }, + { + "id": "t03-affected-features", + "query": "Which Features are affected by the checkout module changes?", + "should_trigger": true, + "reason": "'affected features' is a listed trigger phrase." + }, + { + "id": "t04-impact-analysis", + "query": "Run a living doc impact analysis for the payment gateway refactor.", + "should_trigger": true, + "reason": "'impact analysis' is a listed trigger phrase." + }, + { + "id": "t05-living-doc-sign-off", + "query": "I need a living doc sign-off before releasing the checkout changes.", + "should_trigger": true, + "reason": "'living doc sign-off' is a listed trigger phrase." + }, + { + "id": "t06-what-does-change-affect", + "query": "What does the change to ShippingCalculator.java affect in the living doc?", + "should_trigger": true, + "reason": "'what does this change affect' is a listed trigger phrase." 
+ }, + { + "id": "t07-which-scenarios-rerun", + "query": "Which scenarios need re-running after we changed the orders endpoint?", + "should_trigger": true, + "reason": "'which scenarios need re-running' is a listed trigger phrase." + }, + { + "id": "t08-pr-impact-docs", + "query": "Does PR #300 have any impact on the living doc?", + "should_trigger": true, + "reason": "'PR impact on docs' is a listed trigger phrase." + }, + { + "id": "t09-not-update-entity", + "query": "Update US-042 to add a new AC for the expired promo path.", + "should_trigger": false, + "reason": "Updating entities is handled by living-doc-update, not living-doc-impact-analysis." + }, + { + "id": "t10-not-gap-finder", + "query": "Which Functionalities don't have any User Stories?", + "should_trigger": false, + "reason": "Finding coverage gaps is handled by living-doc-gap-finder." + }, + { + "id": "t11-not-create", + "query": "Create a new User Story for the stacked discount feature.", + "should_trigger": false, + "reason": "Creating new entities is handled by living-doc-create-user-story." + } +] diff --git a/skills/living-doc-update/SKILL.md b/skills/living-doc-update/SKILL.md new file mode 100644 index 0000000..9b3e2c5 --- /dev/null +++ b/skills/living-doc-update/SKILL.md @@ -0,0 +1,138 @@ +--- +name: living-doc-update +description: > + Update, amend, or deprecate existing living documentation entities (User Stories, Features, + Functionalities). Activate when adding new ACs to an existing User Story, changing a Feature's + ownership or status, deprecating a Functionality whose code has been deleted, or promoting a + User Story from draft to ready. + Triggers on: "update user story", "add AC to user story", "deprecate feature", "mark US ready", + "change feature owner", "update functionality", "deprecate functionality", + "living doc update", "update living doc entity", "mark feature deprecated", "update AC", + "change status of user story". 
+ Does NOT trigger for: creating new entities from scratch (use living-doc-create-user-story, + living-doc-create-feature, or living-doc-create-functionality), finding gaps + (use living-doc-gap-finder), generating scenarios (use living-doc-scenario-creator). + +license: Apache-2.0 +compatibility: GitHub Copilot +--- + +# Living Doc — Update + +> **Key concepts:** Feature, Functionality, User Story, AC — see `../references/living-doc-glossary.md`. + +## Identify the entity and change type + +Ask: *Which entity is being updated, and what kind of change is this?* + +| Change type | Entity | Update action | +|---|---|---| +| Add a new AC | User Story / Functionality | Append a new AC entry with the next sequential AC ID | +| Modify AC description | User Story / Functionality | Edit the description; keep the AC ID stable | +| Change status | Any entity | Update `status` field; record the transition event | +| Change owner | Feature | Update `owners` field | +| Add a linked User Story | Feature | Append to `user_stories` | +| Deprecate an entity | Any entity | Set `status: deprecated`; add `deprecated_at` and `deprecation_reason` | +| Delete a Functionality | Functionality | Do not delete — deprecate it and link to the commit that removed the code | + +## Update a User Story — add or modify ACs + +When adding a new AC to an existing User Story: + +1. Load the existing User Story entity +2. Assign the next sequential AC ID: `AC:US--` +3. Elicit the new AC using the same completeness checklist as `living-doc-create-user-story`: + - Happy path covered? + - Error paths covered? + - Alternative flows covered? +4. Check whether the new AC affects any existing Gherkin scenarios — flag for + `gherkin-living-doc-sync` if so + +When modifying an existing AC, **keep the AC ID stable** — changing the ID breaks traceability +to linked tests and Gherkin scenarios. Only change the description text or state.
If the changed +AC text affects the wording of linked Gherkin steps, flag the linked scenarios for +`gherkin-living-doc-sync`. + +## Promote a User Story from draft to ready + +Invariants that must hold before setting `status: ready`: + +| Check | Requirement | +|---|---| +| Narrative complete | As-a/I-can/so-that is filled in with a named actor | +| At least one Feature linked | Not `[]` and not `[NEW: ...]` | +| At least one AC | And at least one error/alternative-path AC | +| No open `[TODO]` markers | Description and ACs are finalised | + +Warn if any invariant fails: +> "User Story US-042 cannot be moved to 'ready': no error-path AC exists. Add at least one +> AC for a failure or edge case before promoting." + +## Deprecate a Feature or Functionality + +Use this workflow when code backing an entity is deleted or a business capability is retired. +Set the relevant fields in the project's Storage Profile format: + +| Field | Value | +|---|---| +| `status` | `deprecated` | +| `deprecated_at` | Date of deprecation | +| `deprecation_reason` | Why it was deprecated | +| `superseded_by` | ID of the replacement entity (if applicable) | + +Rules: +- Always deprecate — never delete entities (preserves audit trail) +- Add `superseded_by` when a replacement entity exists +- Flag any Gherkin scenarios linked to the deprecated entity for `gherkin-living-doc-sync` + +## Update Feature ownership or dependencies + +When a team changes ownership of a Feature, update the `owners` field and set `owner_changed_at` +(date) and `owner_change_reason`. If the Feature has open User Stories, notify the new owner. 
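The ownership update described above can be sketched in Python. This is a minimal sketch under stated assumptions: the `Feature` class, its `open_user_stories` attribute, and the `change_owner` helper are hypothetical names introduced for illustration. Only the field names `owners`, `owner_changed_at`, and `owner_change_reason` come from the skill text, and the real entity layout depends on the project's Storage Profile.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Feature:
    """Hypothetical in-memory Feature entity (not the project's storage format)."""
    feature_id: str
    owners: List[str]
    open_user_stories: List[str] = field(default_factory=list)  # assumed attribute
    owner_changed_at: Optional[str] = None
    owner_change_reason: Optional[str] = None


def change_owner(feature: Feature, new_owners: List[str], reason: str,
                 today: Optional[date] = None) -> bool:
    """Apply an ownership change in place.

    Returns True when the Feature still has open User Stories, i.e. when
    the new owner should be notified.
    """
    feature.owners = list(new_owners)
    feature.owner_changed_at = (today or date.today()).isoformat()
    feature.owner_change_reason = reason
    return bool(feature.open_user_stories)


# Usage: hand a Feature over after a team restructure.
feature = Feature("FEAT-001", ["team-checkout"], open_user_stories=["US-042"])
notify_new_owner = change_owner(
    feature, ["team-payments-v2"], "team restructure", today=date(2026, 5, 15)
)
```

Returning the notification flag instead of sending a message keeps the sketch free of side effects; how the new owner is actually notified is left to the project.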
+ +## Descope an AC mid-sprint + +When an AC is moved out of the current sprint but not permanently removed: + +- Add `descoped_at` (date) and `descoped_reason` fields — **do not delete the AC** (preserves audit trail) +- The AC's official lifecycle state remains `Planned` (still required, just deferred) +- Add `future_release` field if the work is planned for a later sprint +- Flag any linked Gherkin scenarios for `@wip` or `@pending` tagging via `gherkin-living-doc-sync` + +``` +AC:US-042-03 (v1.2.0 – Planned) + – Promo codes can be stacked and applied in defined priority order. + – descoped_at: 2026-05-15 + – descoped_reason: Promo stacking rule deferred — too complex for current sprint + – future_release: sprint-52 +``` + +## Routing + +| Request | Correct skill | +|---|---| +| Create a new User Story | `living-doc-create-user-story` | +| Create a new Feature | `living-doc-create-feature` | +| Create a new Functionality | `living-doc-create-functionality` | +| Find gaps in living documentation | `living-doc-gap-finder` | +| Generate Gherkin scenarios from a User Story | `living-doc-scenario-creator` | + +## Output change summary + +After every update, emit a structured change record. 
For **modified AC text**, show the old and +new values clearly labelled, and list any linked Gherkin scenarios that need re-syncing: + +``` +LIVING DOC UPDATE — 2026-05-15 + Entity: US-042 — Customer applies a promotional discount + Changes: + + Added AC AC:US-042-04 (state: Planned) — Promo code expired returns 422 with error message + ~ Modified AC AC:US-042-01: + OLD: "Payment must complete within 3 seconds under normal load (p99 SLA)" + NEW: "Payment must complete within 2 seconds under normal load (p99 SLA)" + Linked Gherkin scenarios requiring re-sync: + → checkout.feature:41 — Scenario: Payment completes within SLA + Downstream flags: + → Run gherkin-living-doc-sync: changed AC text affects linked scenario wording + → Run living-doc-gap-finder to confirm coverage after update +``` diff --git a/skills/living-doc-update/evals/evals.json b/skills/living-doc-update/evals/evals.json new file mode 100644 index 0000000..8a5958f --- /dev/null +++ b/skills/living-doc-update/evals/evals.json @@ -0,0 +1,157 @@ +{ + "skill_name": "living-doc-update", + "evals": [ + { + "id": 1, + "type": "happy-path", + "description": "Developer adds a new AC to an existing User Story.", + "prompt": "I need to add a new acceptance criterion to US-042 covering the case where a promo code has expired. 
How do I update the living doc?", + "expected": { + "skill_triggered": true, + "key_guidance": [ + "Assign next sequential AC ID: US-042-AC-", + "Use AC completeness checklist", + "Flag for gherkin-living-doc-sync if linked scenarios need updating", + "Output change summary" + ] + } + }, + { + "id": 2, + "type": "regression", + "description": "Developer promotes a User Story from draft to ready but some invariants are unmet.", + "prompt": "I want to mark US-089 as ready for development, but I'm not sure it's complete.", + "expected": { + "skill_triggered": true, + "key_guidance": [ + "Check all promotion invariants: narrative complete, Feature linked, AC exists, error-path AC exists, no TODO markers", + "Warn for each failing invariant", + "Block promotion until all invariants pass" + ] + } + }, + { + "id": 3, + "type": "regression", + "description": "Developer deprecates a Functionality whose code has been deleted.", + "prompt": "The `LegacyPaymentGatewayService` has been deleted from the codebase. How do I handle this in the living doc?", + "expected": { + "skill_triggered": true, + "key_guidance": [ + "Set status: deprecated — never delete the entity file", + "Add deprecated_at and deprecation_reason fields", + "Link to commit that deleted the code if possible", + "Flag linked Gherkin scenarios for gherkin-living-doc-sync" + ] + } + }, + { + "id": 4, + "type": "regression", + "description": "Developer changes Feature ownership after team restructure.", + "prompt": "The checkout feature is now owned by team-payments-v2 instead of team-checkout. 
How do I update this in the living doc?", + "expected": { + "skill_triggered": true, + "key_guidance": [ + "Update owners array in Feature JSON", + "Update FEATURE_REGISTRY.md row", + "Add owner_changed_at and owner_change_reason fields", + "Notify new owner if open User Stories exist" + ] + } + }, + { + "id": 5, + "type": "regression", + "description": "Developer modifies an AC description after sprint review.", + "prompt": "After the sprint review, the product owner clarified the wording of US-042-AC-1. How do I update it without breaking traceability?", + "expected": { + "skill_triggered": true, + "key_guidance": [ + "Keep AC ID stable — never change the ID", + "Only update description, given, when, then fields", + "Flag for gherkin-living-doc-sync if linked scenario text needs updating" + ] + } + }, + { + "id": 6, + "type": "negative", + "description": "Developer asks to create a new User Story — should trigger living-doc-create-user-story.", + "prompt": "Create a new User Story for the express checkout flow.", + "expected": { + "skill_triggered": false, + "reason": "Creating new US is handled by living-doc-create-user-story, not living-doc-update" + } + }, + { + "id": 7, + "type": "negative", + "description": "Developer asks to find living doc gaps — should trigger living-doc-gap-finder.", + "prompt": "Which User Stories don't have any linked Gherkin scenarios?", + "expected": { + "skill_triggered": false, + "reason": "Finding gaps is handled by living-doc-gap-finder, not living-doc-update" + } + }, + { + "id": 8, + "type": "paraphrase", + "description": "Same 'add AC' intent phrased as 'update the story' rather than 'add AC'.", + "prompt": "US-089 needs updating — we discovered a new edge case during testing: when the delivery address is outside our shipping zone, the order should be blocked with a clear message. 
Can you update the story?", + "expected": { + "skill_triggered": true, + "key_guidance": [ + "Assign next sequential AC ID", + "AC format: Given customer with out-of-zone address / When order is placed / Then order is blocked with SHIPPING_ZONE_EXCLUDED error", + "Flag for gherkin-living-doc-sync if linked scenarios need updating", + "Output change summary with the new AC" + ] + } + }, + { + "id": 9, + "type": "edge-case", + "description": "AC status changed to 'descoped' mid-sprint — should not be deleted, only status updated.", + "prompt": "We decided during the sprint to descope US-042-AC-3 — the promo stacking rule is moving to a future release. How do I handle this in the living doc without losing the work?", + "expected": { + "skill_triggered": true, + "key_guidance": [ + "Set the AC status to 'descoped' — do not delete the AC", + "Add descoped_at and descoped_reason fields", + "Add a future_release reference if the work is planned for a later sprint", + "Flag any linked Gherkin scenarios for @wip or @pending tagging via gherkin-living-doc-sync" + ] + } + }, + { + "id": 10, + "type": "output-format", + "prompt": "AC-5 on US-042 was changed from 'payment must complete within 3 seconds' to 'payment must complete within 2 seconds'. Show me what the update output looks like.", + "expected_output": "The response shows a diff-style update. The changed AC entry is presented with the old text labelled OLD and the new text labelled NEW (or using strikethrough markdown). The output also lists which Gherkin scenarios reference this AC (with filename and line number) and need re-syncing. 
No content beyond the changed AC and its directly linked scenarios is modified.", + "files": [], + "expectations": [ + "Old text clearly labelled (OLD or strikethrough)", + "New text clearly labelled (NEW)", + "Linked Gherkin scenarios listed with filename and line number", + "No content beyond changed AC and linked scenarios is modified" + ] + }, + { + "id": 11, + "type": "file-based", + "description": "Update a living doc file where AC-2 has been changed to tighten the response time SLA.", + "prompt": "Apply this AC change to evals/files/payment-living-doc.md: AC-2 'Payment must complete within 3 seconds' changes to 'Payment must complete within 1 second (p99)'.", + "expected_output": "Agent shows the old AC-2 text (labelled OLD) and the new AC-2 text (labelled NEW or in diff format). It also lists the Gherkin scenario linked to AC-2 in the file that needs to be re-synced. No other ACs or sections of the living doc are changed.", + "files": [ + "evals/files/payment-living-doc.md" + ], + "expectations": [ + "Old AC-2 text shown (labelled OLD or struck-through)", + "New AC-2 text shown (labelled NEW)", + "Linked Gherkin scenario identified for re-sync", + "No other ACs or sections modified" + ] + } + ] +} diff --git a/skills/living-doc-update/evals/files/payment-living-doc.md b/skills/living-doc-update/evals/files/payment-living-doc.md new file mode 100644 index 0000000..ee2f2f8 --- /dev/null +++ b/skills/living-doc-update/evals/files/payment-living-doc.md @@ -0,0 +1,34 @@ +# Payment Flow — Living Documentation +# Feature: FEAT-payment-flow +# User Story: US-021 — Make a payment +# Used by: living-doc-update file-based eval + +## Purpose +Enables a customer to make a payment from their current account to a +beneficiary. Covers intra-bank (same bank) and inter-bank (external) +payments. + +## Acceptance Criteria + +- **AC-1** — The customer must be authenticated (biometric or PIN) before + initiating a payment. 
+ *Linked scenario*: `checkout.feature:23 — Scenario: Authenticated customer initiates payment` + +- **AC-2** — Payment must complete within 3 seconds under normal load + (p99 SLA). + *Linked scenario*: `checkout.feature:41 — Scenario: Payment completes within SLA` + +- **AC-3** — If the payment amount exceeds R 50 000, a second-factor + approval step is required. + *Linked scenario*: `checkout.feature:58 — Scenario: High-value payment requires second-factor approval` + +- **AC-4** — The customer receives an in-app notification within 5 seconds + of a successful payment. + *Linked scenario*: `notifications.feature:12 — Scenario: Customer receives payment confirmation notification` + +## Out of Scope +- Scheduled/recurring payments (covered by FEAT-scheduled-payments) +- Payments to international beneficiaries (not yet implemented) + +## Owner +team-payments diff --git a/skills/living-doc-update/evals/fixture-map.md b/skills/living-doc-update/evals/fixture-map.md new file mode 100644 index 0000000..81e414b --- /dev/null +++ b/skills/living-doc-update/evals/fixture-map.md @@ -0,0 +1,29 @@ +# Living Doc Update — Evals Fixture Map + +| Test ID | Category | Fixture | +|---|---|---| +| 1 | happy-path | *(no file — add AC to existing User Story scenario)* | +| 2 | regression | *(no file — US promotion invariant check scenario)* | +| 3 | regression | *(no file — deprecate deleted Functionality scenario)* | +| 4 | regression | *(no file — Feature ownership change scenario)* | +| 5 | regression | *(no file — modify AC description without breaking traceability)* | +| 6 | negative | *(no file — create US redirect to living-doc-create-user-story)* | +| 7 | negative | *(no file — gap-finding redirect to living-doc-gap-finder)* | + +## Coverage summary + +- happy-path: 1 (add AC to User Story) +- regression: 4 (US promotion check, deprecate Functionality, Feature ownership change, modify AC) +- negative: 2 (create US redirect, gap-finder redirect) + +## Rules exercised + +| Rule 
| Eval ID | +|---|---| +| Add AC to existing User Story | 1 | +| US promotion invariants check | 2 | +| Deprecate entity — never delete | 3 | +| Feature ownership update in JSON + registry | 4 | +| AC ID stability when modifying description | 5 | +| Out-of-scope: create US → living-doc-create-user-story | 6 | +| Out-of-scope: find gaps → living-doc-gap-finder | 7 | diff --git a/skills/living-doc-update/evals/trigger-eval.json b/skills/living-doc-update/evals/trigger-eval.json new file mode 100644 index 0000000..e3e6294 --- /dev/null +++ b/skills/living-doc-update/evals/trigger-eval.json @@ -0,0 +1,74 @@ +[ + { + "id": "t01-update-us-explicit", + "query": "Update User Story US-042 to add a new AC.", + "should_trigger": true, + "reason": "'update user story' is a listed trigger phrase." + }, + { + "id": "t02-add-ac", + "query": "I need to add an AC to an existing User Story for the expired promo path.", + "should_trigger": true, + "reason": "'add AC to user story' is a listed trigger phrase." + }, + { + "id": "t03-deprecate-functionality", + "query": "How do I deprecate a Functionality in the living doc?", + "should_trigger": true, + "reason": "'deprecate functionality' is a listed trigger phrase." + }, + { + "id": "t04-mark-us-ready", + "query": "I want to mark User Story US-089 as ready for development.", + "should_trigger": true, + "reason": "'mark US ready' is a listed trigger phrase." + }, + { + "id": "t05-change-feature-owner", + "query": "Change the owner of the checkout feature from team-checkout to team-payments-v2.", + "should_trigger": true, + "reason": "'change feature owner' is a listed trigger phrase." + }, + { + "id": "t06-update-functionality", + "query": "Update the living doc Functionality entry for the promo validation module.", + "should_trigger": true, + "reason": "'update functionality' is a listed trigger phrase." 
+ }, + { + "id": "t07-deprecate-feature", + "query": "Mark the legacy payment feature as deprecated in the living doc.", + "should_trigger": true, + "reason": "'mark feature deprecated' is a listed trigger phrase." + }, + { + "id": "t08-update-ac", + "query": "The product owner changed the wording of AC-1 on US-042. How do I update it?", + "should_trigger": true, + "reason": "'update AC' is a listed trigger phrase." + }, + { + "id": "t09-update-feature-registry", + "query": "How do I update the Feature Registry after the team restructure?", + "should_trigger": true, + "reason": "Not a listed trigger phrase verbatim; a Feature Registry update after a team restructure maps to the listed 'change feature owner' intent." + }, + { + "id": "t10-not-create-us", + "query": "Create a new User Story for the express checkout journey.", + "should_trigger": false, + "reason": "Creating new entities is handled by living-doc-create-user-story." + }, + { + "id": "t11-not-gap-finder", + "query": "Find User Stories with no Gherkin scenario coverage.", + "should_trigger": false, + "reason": "Finding coverage gaps is handled by living-doc-gap-finder." + }, + { + "id": "t12-not-scenario-creator", + "query": "Generate Gherkin scenarios for US-042.", + "should_trigger": false, + "reason": "Generating scenarios is handled by living-doc-scenario-creator." + } +] diff --git a/skills/references/living-doc-glossary.md b/skills/references/living-doc-glossary.md new file mode 100644 index 0000000..3e151e7 --- /dev/null +++ b/skills/references/living-doc-glossary.md @@ -0,0 +1,142 @@ +# Living Documentation — Shared Glossary + +All living-doc-* skills operate on the same canonical entity model. +Use these definitions consistently across all skill invocations. + +--- + +## Core entities + +### User Story (US) + +A business-level requirement expressed from the perspective of a named actor. + +``` +As a , +I can , +so that . +``` + +- ID format: `US-` (e.g.
"Customer Login") +- Owns: end-to-end **Acceptance Criteria (AC)** +- Links to: one or more **Features** (system surfaces the User Story touches) +- Status: `planned | active | deprecated` + +### Feature + +A named system surface — the structural layer between User Stories and atomic behaviors. + +- ID format: `FEAT-` (e.g. `FEAT-001`) +- Name: noun phrase identifying the surface (e.g. "Login Page") +- Surface types: + +| Type | Description | Test abstraction | +|---|---|---| +| `UI` | A web page, modal, or named screen | **PageObject** design pattern — class encapsulating selectors and user interactions for one screen. Selector preference: `data-testid` > `aria-label`/role > CSS class. | +| `API` | A REST/GraphQL endpoint or endpoint group. A backend service is documented as an API Feature representing its public contract. | **Annotated endpoint method** — the endpoint method with its API documentation header (OpenAPI annotation, JSDoc, etc.) serves as the living contract anchor. | + +- Owns: one or more **Functionalities** +- Links to: one or more **User Stories** +- Status: `planned | active | deprecated` + +### Functionality (FUNC) + +An atomic, fast-testable behavior — a single verb phrase describing one responsibility. + +- ID format: `FUNC-` (e.g. `FUNC-001`) +- Name: `` (e.g. "Login Page – Validate Password Strength") +- Belongs to: one parent **Feature** +- Owns: **Functionality-level Acceptance Criteria** (atomic input → output statements) +- Status: `planned | active | deprecated` + +Functionalities differ from User Story ACs: they are atomic and fast-testable, not end-to-end. +A single User Story may trigger multiple Functionalities. + +### Acceptance Criterion (AC) + +A binary pass/fail statement that defines a verifiable condition. + +Each AC is: +- **Atomic** — one input condition, one observable outcome +- **Binary** — clear pass/fail; no "usually" or "typically" +- **Single placeholder** — at most ONE `{placeholder}` per AC statement. 
If two aspects vary independently, write a separate AC for each. + +**AC identifier and state format:** + +``` +AC:- (v) + – + – : value1, value2, ... + – Rationale: ← optional +``` + +State values: `Planned | Implemented | Active | Deprecated` + +Deprecated ACs include a removal note: + +``` +AC:- (v – Deprecated – removal planned v) +``` + +**User Story AC examples — end-to-end, written from the user's perspective:** + +``` +AC:US-001-01 (v1.0.0 – Active) + – The login screen displays {required field}. + – Required field: username input, password input, login button + – Rationale: Accessibility standard — all interactive controls must be visible on load. + +AC:US-001-02 (v1.1.0 – Active) + – An inline field validation message is shown when invalid credentials are submitted. + +AC:US-001-03 (v2.1.0 – Deprecated – removal planned v3.0.0) + – A "Remember me" checkbox retains the session across browser restarts. + – Rationale: Deprecated due to security policy change in v2.0 — persistent sessions no longer permitted. +``` + +**Functionality AC examples — atomic input → output:** + +``` +AC:FUNC-001-01 (v1.0.0 – Active) + – Returns valid=true when the password satisfies all complexity rules. + +AC:FUNC-001-02 (v1.0.0 – Active) + – Raises {error code} when the credential check fails. + – Error code: INVALID_PASSWORD, USER_NOT_FOUND, ACCOUNT_LOCKED + – Rationale: Distinct error codes per failure reason, required by the global auth error contract. + +AC:FUNC-001-03 (v1.0.0 – Active) + – Rejects passwords shorter than 8 characters. 
+``` + +--- + +## Relationship diagram + +``` +User Story (US) + └── links to → Feature (FEAT) + └── owns → Functionality (FUNC) + └── owns → Functionality ACs + └── can map to → unit/integration tests + └── owns → User Story ACs + └── can map to → BDD Scenarios (.feature files) + └── implemented by → Step Definitions + └── delegates to → Feature test abstractions + └── can map to → API coverage / contract tests +``` + +--- + +## What each skill creates or consumes + +| Skill | Creates | Reads | +|---|---|---| +| `living-doc-create-user-story` | User Story entity | Feature entities | +| `living-doc-create-feature` | Feature entity | User Story entities | +| `living-doc-create-functionality` | Functionality entity | Feature entity | +| `living-doc-pageobject-scan` | Surface wrapper classes + Feature stubs | App URL or test suite | +| `living-doc-scenario-creator` | BDD scenario files (.feature) | User Story entities, Feature test abstractions | +| `living-doc-tutorial-creator` | Tutorial documents | BDD scenario files, User Story entities | +| `living-doc-gap-finder` | Gap report | All of the above | From dcbfcb5524dfe82f3605da3c8ecfd70b7a6e80c3 Mon Sep 17 00:00:00 2001 From: miroslavpojer Date: Fri, 22 May 2026 11:07:12 +0200 Subject: [PATCH 06/35] Update links to living-doc-glossary in various skill documents and agent files --- .github/agents/living-doc-copilot.agent.md | 6 +- README.md | 2 +- docs/living-doc-copilot.md | 6 +- skills/living-doc-create-feature/SKILL.md | 4 +- .../living-doc-create-functionality/SKILL.md | 8 +- skills/living-doc-create-user-story/SKILL.md | 6 +- skills/living-doc-gap-finder/SKILL.md | 79 +++++++++++++++---- .../references/glossary.md | 2 +- skills/living-doc-impact-analysis/SKILL.md | 2 +- skills/living-doc-update/SKILL.md | 2 +- 10 files changed, 84 insertions(+), 33 deletions(-) diff --git a/.github/agents/living-doc-copilot.agent.md b/.github/agents/living-doc-copilot.agent.md index 1c29cf1..8bd1013 100644 --- 
a/.github/agents/living-doc-copilot.agent.md +++ b/.github/agents/living-doc-copilot.agent.md @@ -46,10 +46,10 @@ Never assume a format. If the answer is incomplete, ask one targeted follow-up b ## Does NOT -- Write Gherkin scenarios or feature files → hand off to `@bdd-copilot` -- Explore or crawl web apps → hand off to `@bdd-copilot` +- Write Gherkin scenarios or feature files → hand off to `@living-doc-bdd-copilot` +- Explore or crawl web apps → hand off to `@living-doc-bdd-copilot` - Write any test code → hand off to `@sdet-copilot` -- Repair PageObject selectors or step definitions → hand off to `@bdd-copilot` +- Repair PageObject selectors or step definitions → hand off to `@living-doc-bdd-copilot` ## AC Metadata diff --git a/README.md b/README.md index 6dca80c..04fbac3 100644 --- a/README.md +++ b/README.md @@ -82,7 +82,7 @@ its purpose, trigger phrases, and full instructions. | **[living-doc-create-functionality](./skills/living-doc-create-functionality/)** | Define an atomic, testable behaviour (Functionality) with AC designed for fast unit or integration tests. | | **[living-doc-update](./skills/living-doc-update/)** | Amend or deprecate existing User Story, Feature, or Functionality entities — add ACs, change status, update ownership. | | **[living-doc-impact-analysis](./skills/living-doc-impact-analysis/)** | Trace which Features, Functionalities, User Stories, and Gherkin scenarios are affected by a code change or PR. | -| **[living-doc-gap-finder](./skills/living-doc-gap-finder/)** | Identify undocumented behaviours, orphan tests, and untested ACs. Shared by `@living-doc-copilot` and `@bdd-copilot`. | +| **[living-doc-gap-finder](./skills/living-doc-gap-finder/)** | Identify undocumented behaviours, orphan tests, and untested ACs. Shared by `@living-doc-copilot` and `@living-doc-bdd-copilot`. 
| | **[token-saving](./skills/token-saving/)** | Always-active response discipline — enforces brevity, no filler openers or closers, structured output, and a What/Why/How footer on code responses. Suspends on explicit "full detail" requests. | ## Agent Roster diff --git a/docs/living-doc-copilot.md b/docs/living-doc-copilot.md index 8762c0f..0bceb11 100644 --- a/docs/living-doc-copilot.md +++ b/docs/living-doc-copilot.md @@ -68,9 +68,9 @@ Repairs catalog drift. Triggers when the living doc has fallen behind the codeba - Sets `DEPRECATED` state on entities whose code no longer exists - Fixes broken traceability links (US ↔ Feature ↔ Functionality) - Updates `version` fields and removes stale `pre-conditions` -- Does **not** repair PageObject selectors or step definitions → `@bdd-copilot` +- Does **not** repair PageObject selectors or step definitions → `@living-doc-bdd-copilot` -> `@bdd-copilot` is the expected cooperating agent for automation-layer healing. It is deployed separately from this agent — if it is not yet available in your repo, record the automation-layer items as TODO notes for a future BDD session. +> `@living-doc-bdd-copilot` is the expected cooperating agent for automation-layer healing. It is deployed separately from this agent — if it is not yet available in your repo, record the automation-layer items as TODO notes for a future BDD session. ### PLAN mode @@ -102,7 +102,7 @@ Every AC created or updated by this agent carries: | `living-doc-create-functionality` | New atomic, testable behaviour | | `living-doc-update` | Amend or deprecate existing entities | | `living-doc-impact-analysis` | Trace entities affected by a code change | -| `living-doc-gap-finder` | Find undocumented behaviours and orphan tests. **Shared skill** — used top-down here (missing doc entities) and bottom-up by `@bdd-copilot` (scenario coverage gaps against known ACs). | +| `living-doc-gap-finder` | Find undocumented behaviours and orphan tests. 
**Shared skill** — used top-down here (missing doc entities) and bottom-up by `@living-doc-bdd-copilot` (scenario coverage gaps against known ACs). | --- diff --git a/skills/living-doc-create-feature/SKILL.md b/skills/living-doc-create-feature/SKILL.md index cc9a5ed..bddbd63 100644 --- a/skills/living-doc-create-feature/SKILL.md +++ b/skills/living-doc-create-feature/SKILL.md @@ -19,7 +19,7 @@ compatibility: GitHub Copilot # Living Doc — Create Feature -> **Key concepts:** Feature, Functionality, User Story, AC — see `../references/living-doc-glossary.md`. +> **Key concepts:** Feature, Functionality, User Story, AC — see [living-doc-glossary](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md). ## Step 1 — Identify the system surface @@ -34,7 +34,7 @@ Select the surface type: | `UI` | A web page, modal, or named screen (e.g. Checkout Page, Login Screen) | | `API` | A REST/GraphQL endpoint or endpoint group, including a backend service's public API contract (e.g. Orders API, Payment Gateway API) | -**One surface test abstraction ≈ one Feature** — a UI screen has a PageObject, an API endpoint group has an annotated endpoint method. See `../references/living-doc-glossary.md` for details. +**One surface test abstraction ≈ one Feature** — a UI screen has a PageObject, an API endpoint group has an annotated endpoint method. See [living-doc-glossary](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md) for details. ## Step 2 — Describe purpose and scope diff --git a/skills/living-doc-create-functionality/SKILL.md b/skills/living-doc-create-functionality/SKILL.md index 7a35e95..0fbc163 100644 --- a/skills/living-doc-create-functionality/SKILL.md +++ b/skills/living-doc-create-functionality/SKILL.md @@ -18,7 +18,7 @@ compatibility: GitHub Copilot # Living Doc — Create Functionality -> **Key concepts:** Feature, Functionality, User Story, AC — see `../references/living-doc-glossary.md`. 
+> **Key concepts:** Feature, Functionality, User Story, AC — see [living-doc-glossary](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md). ## Step 1 — Elicit the behavior @@ -54,7 +54,7 @@ Functionality ACs describe atomic inputs → outputs. They are: - **Fast-testable**: designed for verification by unit or integration test. E2E tests *can* exercise the same behavior, but they are slow and expensive — they belong in a separate system-test tier, not the fast or regression suite. - **Unambiguous**: exact error codes, exact output values where relevant -Use the canonical AC format (see `../references/living-doc-glossary.md`): +Use the canonical AC format (see [living-doc-glossary](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md)): ``` AC:FUNC-- (v – Planned) @@ -90,7 +90,7 @@ If contextually distinct despite similar names, create a new Functionality and n ## Step 5 — Output canonical Functionality entity -Output using the project's Storage Profile format (defined per project — see `../../docs/living-doc-copilot.md`). Canonical fields (see `../references/living-doc-glossary.md` for AC format details): +Output using the project's Storage Profile format (defined per project — see `../../docs/living-doc-copilot.md`). Canonical fields (see [living-doc-glossary](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md) for AC format details): | Field | Required | Value | |---|---|---| @@ -99,7 +99,7 @@ Output using the project's Storage Profile format (defined per project — see ` | `name` | Yes | `` (e.g. 
"Login Page – Validate Password Strength") | | `parent_feature` | Yes | `FEAT-` ID of the owning Feature | | `status` | Yes | `planned` \| `active` \| `deprecated` | -| `acceptance_criteria` | Yes | List of ACs in the format defined in `../references/living-doc-glossary.md` | +| `acceptance_criteria` | Yes | List of ACs in the format defined in [living-doc-glossary](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md) | ## Distinguishing Functionality ACs from User Story ACs diff --git a/skills/living-doc-create-user-story/SKILL.md b/skills/living-doc-create-user-story/SKILL.md index 9494bdf..ba36750 100644 --- a/skills/living-doc-create-user-story/SKILL.md +++ b/skills/living-doc-create-user-story/SKILL.md @@ -18,7 +18,7 @@ compatibility: GitHub Copilot # Living Doc — Create User Story -> **Key concepts:** Feature, Functionality, User Story, AC — see `../references/living-doc-glossary.md`. +> **Key concepts:** Feature, Functionality, User Story, AC — see [living-doc-glossary](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md). ## Step 1 — Elicit the narrative @@ -71,7 +71,7 @@ AC:US-- (v – Planned) – : value1, value2, ... ``` -See full AC format and examples in `../references/living-doc-glossary.md`. +See full AC format and examples in [living-doc-glossary](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md). **Completeness check — always ask:** 1. What happens on the happy path? (at least one AC required) @@ -108,7 +108,7 @@ Output the User Story using the project's Storage Profile format. 
Canonical fiel | `i_can` | Yes | The capability | | `so_that` | Yes | Business outcome | | `features` | Yes | List of `FEAT-` IDs | -| `acceptance_criteria` | Yes | List of ACs in the format defined in `../references/living-doc-glossary.md` | +| `acceptance_criteria` | Yes | List of ACs in the format defined in [living-doc-glossary](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md) | ## Anti-patterns to flag diff --git a/skills/living-doc-gap-finder/SKILL.md b/skills/living-doc-gap-finder/SKILL.md index 9b73181..2acf8ff 100644 --- a/skills/living-doc-gap-finder/SKILL.md +++ b/skills/living-doc-gap-finder/SKILL.md @@ -19,19 +19,23 @@ compatibility: GitHub Copilot # Living Doc — Gap Finder -> **Key concepts:** Feature, Functionality, User Story, AC, PageObject — see `../references/living-doc-glossary.md`. +> **Key concepts:** Feature, Functionality, User Story, AC, PageObject — see [living-doc-glossary](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md). 
## Gap taxonomy -Five types of gaps are detected, in order of risk: +Nine types of gaps are detected, in order of risk: | Priority | Gap type | Description | |---|---|---| | 1 — Blocker | **Untested AC** | An Active or Implemented AC in a User Story or Functionality has no linked test | | 2 — Important | **Undocumented UI surface** | A screen or API endpoint exists in the app with no Feature entity | | 3 — Important | **Orphan Feature** | A Feature entity exists with no linked User Story | -| 4 — Important | **Orphan test** | A test or BDD scenario exists with no linked AC | -| 5 — Nit | **Undocumented Functionality** | A Functionality entity exists with no associated tests | +| 4 — Important | **Orphan User Story** | A User Story exists with no linked Feature | +| 5 — Important | **Orphan Functionality** | A Functionality exists with no parent Feature | +| 6 — Important | **Orphan test** | A test or BDD scenario exists with no linked AC | +| 7 — Important | **Stale reference** | An active test or BDD scenario references a Deprecated AC | +| 8 — Nit | **Undocumented Functionality** | A Functionality entity exists with no associated tests | +| 9 — Nit | **Empty Feature** | A Feature entity exists with no Functionalities defined | ## Workflow @@ -74,24 +78,52 @@ For each item in inventory (screens, API endpoints) **Gap type 3 — Orphan Feature:** ``` For each Feature reachable via entity relationships - where user_stories == [] AND functionalities == []: + where user_stories == []: → GAP: ORPHAN_FEATURE ``` -**Gap type 4 — Orphan test:** +**Gap type 4 — Orphan User Story:** +``` +For each User Story in entity graph + where user_story.features == []: + → GAP: ORPHAN_USER_STORY +``` + +**Gap type 5 — Orphan Functionality:** +``` +For each Functionality in entity graph + where functionality.parent_feature == null: + → GAP: ORPHAN_FUNCTIONALITY +``` + +**Gap type 6 — Orphan test:** ``` For each test in inventory where no linked AC exists in any UserStory or Functionality: 
→ GAP: ORPHAN_TEST ``` -**Gap type 5 — Undocumented Functionality:** +**Gap type 7 — Stale reference:** +``` +For each test in inventory + where linked_ac.status == Deprecated: + → GAP: STALE_REFERENCE +``` + +**Gap type 8 — Undocumented Functionality:** ``` For each Functionality reachable via Feature `functionalities` links where no test references this Functionality's ACs: → GAP: UNDOCUMENTED_FUNCTIONALITY ``` +**Gap type 9 — Empty Feature:** +``` +For each Feature reachable via entity relationships + where functionalities == []: + → GAP: EMPTY_FEATURE +``` + ### Step 4 — Prioritise by risk Sort all gaps by: @@ -107,9 +139,13 @@ For each gap, propose the living doc action: |---|---| | UNTESTED_AC | Create BDD scenario → `living-doc-scenario-creator` | | UNDOCUMENTED_SURFACE | Create Feature entity → `living-doc-create-feature` | -| ORPHAN_FEATURE | Link to a User Story or delete if not used | +| ORPHAN_FEATURE | (1) Confirm the Feature entity actually exists in the storage profile — a broken reference may mean the Feature was renamed or deleted without updating the link. (2) If the Feature exists: link it to an existing User Story or propose creating one. (3) If deletion is the right action: **always confirm with the user before deleting** — state the Feature ID, name, and any Functionalities it owns, and ask explicitly: *"No User Story references FEAT-nnn. Delete this Feature and its N Functionalities?"* | +| ORPHAN_USER_STORY | Link to an existing Feature, or create the missing Feature → `living-doc-create-feature` | +| ORPHAN_FUNCTIONALITY | Link to an existing Feature, or delete if the behavior has no owning surface. Do not delete if tests reference this Functionality's ACs — resolve those first (see ORPHAN_TEST). | | ORPHAN_TEST | Link test to an existing AC, or create a Functionality → `living-doc-create-functionality`. 
**Never delete a test to resolve an orphan — that would silently remove coverage.** If the linked AC ID no longer exists (broken link), choose from: (1) recreate the AC/Functionality if the behavior is still required; (2) update the link to the merged AC ID if the entity was merged; (3) delete the test only after product owner confirmation that the behavior has been intentionally removed. | +| STALE_REFERENCE | Update the test to reference the active replacement AC. If the deprecated behavior was intentionally removed, delete the test after product owner confirmation. If removed in error, reinstate the AC using `living-doc-update`. | | UNDOCUMENTED_FUNCTIONALITY | Create unit/integration tests for the Functionality's ACs | +| EMPTY_FEATURE | Create Functionalities for the Feature's known behaviors → `living-doc-create-functionality` | > **Out-of-scope actions:** living-doc-gap-finder identifies and proposes new entities — it does > not create them. Direct creation requests (e.g. "create a User Story", "create a Feature") must @@ -163,13 +199,28 @@ A project with 100% documentation coverage has every AC backed by at least one t When the gap inventory is large (e.g. 100+ orphan tests or undocumented features from a legacy codebase), running a single full-codebase gap-finder pass produces an unmanageable report. -Instead: +Use the following two-phase strategy: + +### Phase 1 — Baseline: ensure every User Story has at least one covered AC + +Before addressing any other gap type, guarantee minimum traceability across all User Stories: + +1. List all User Stories where **zero ACs** have a linked test. +2. For each, identify the highest-priority AC (first Active AC, or the first AC if none is Active). +3. Create one test or BDD scenario for that AC → `living-doc-scenario-creator`. +4. Repeat until every User Story has at least one covered AC. + +This phase establishes a baseline coverage floor. Do not skip to Phase 2 until all User Stories +have at least one covered AC. 
+ +### Phase 2 — Depth: address gaps in order of size + +Once the baseline is met, continue by tackling the biggest remaining gaps first: -1. **Batch by domain or Feature area** — process one Feature or service at a time. -2. **Prioritise by business risk** — start with the highest-risk domains first: payment, auth, - security, regulatory compliance. These gaps pose the greatest production risk. -3. **Iterate** — after each batch, link tests, create entities, and re-run gap-finder on that - domain before moving to the next. +1. **Rank gap clusters by count** — group all remaining gaps by type and sort descending by number of affected entities. +2. **Work the largest cluster first** — a cluster of 20 UNTESTED_AC gaps in one domain has more impact than 5 scattered ORPHAN_TEST gaps. +3. **Batch by domain** — within a cluster, process one Feature or service at a time. +4. **Iterate** — after each batch, re-run gap-finder on that domain before moving to the next. Processing everything at once is discouraged because the resulting gap list is too large to action without clear prioritisation. diff --git a/skills/living-doc-gap-finder/references/glossary.md b/skills/living-doc-gap-finder/references/glossary.md index f77bc84..b0f5ab6 100644 --- a/skills/living-doc-gap-finder/references/glossary.md +++ b/skills/living-doc-gap-finder/references/glossary.md @@ -1,7 +1,7 @@ # Living Documentation — Shared Glossary > **This file has moved.** -> The shared glossary is now at [`skills/references/living-doc-glossary.md`](../../references/living-doc-glossary.md). +> The shared glossary is now at [skills/references/living-doc-glossary.md](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md). > Update any links pointing here to use the new path. 
diff --git a/skills/living-doc-impact-analysis/SKILL.md b/skills/living-doc-impact-analysis/SKILL.md index 2a2836e..057dac4 100644 --- a/skills/living-doc-impact-analysis/SKILL.md +++ b/skills/living-doc-impact-analysis/SKILL.md @@ -19,7 +19,7 @@ compatibility: GitHub Copilot # Living Doc — Impact Analysis -> **Key concepts:** Feature, Functionality, User Story, AC — see `../references/living-doc-glossary.md`. +> **Key concepts:** Feature, Functionality, User Story, AC — see [living-doc-glossary](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md). ## Step 1 — Identify the changed surface area diff --git a/skills/living-doc-update/SKILL.md b/skills/living-doc-update/SKILL.md index 9b3e2c5..e8712ad 100644 --- a/skills/living-doc-update/SKILL.md +++ b/skills/living-doc-update/SKILL.md @@ -19,7 +19,7 @@ compatibility: GitHub Copilot # Living Doc — Update -> **Key concepts:** Feature, Functionality, User Story, AC — see `../references/living-doc-glossary.md`. +> **Key concepts:** Feature, Functionality, User Story, AC — see [living-doc-glossary](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md). 
## Identify the entity and change type From 15b5cec35afbf2c5b698024c620014b6e73351a6 Mon Sep 17 00:00:00 2001 From: miroslavpojer Date: Fri, 22 May 2026 13:15:07 +0200 Subject: [PATCH 07/35] Update links to living-doc-glossary in SKILL.md files and remove obsolete glossary file --- skills/living-doc-create-feature/SKILL.md | 4 +- .../living-doc-create-functionality/SKILL.md | 8 +- skills/living-doc-create-user-story/SKILL.md | 6 +- skills/living-doc-gap-finder/SKILL.md | 13 +- .../references/glossary.md | 120 ------------------ skills/living-doc-impact-analysis/SKILL.md | 2 +- skills/living-doc-update/SKILL.md | 2 +- skills/references/living-doc-glossary.md | 26 ++++ 8 files changed, 44 insertions(+), 137 deletions(-) delete mode 100644 skills/living-doc-gap-finder/references/glossary.md diff --git a/skills/living-doc-create-feature/SKILL.md b/skills/living-doc-create-feature/SKILL.md index bddbd63..68ebcdd 100644 --- a/skills/living-doc-create-feature/SKILL.md +++ b/skills/living-doc-create-feature/SKILL.md @@ -19,7 +19,7 @@ compatibility: GitHub Copilot # Living Doc — Create Feature -> **Key concepts:** Feature, Functionality, User Story, AC — see [living-doc-glossary](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md). +> **Key concepts:** Feature, Functionality, User Story, AC — see [living-doc-glossary](../references/living-doc-glossary.md) ([remote](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md)). ## Step 1 — Identify the system surface @@ -34,7 +34,7 @@ Select the surface type: | `UI` | A web page, modal, or named screen (e.g. Checkout Page, Login Screen) | | `API` | A REST/GraphQL endpoint or endpoint group, including a backend service's public API contract (e.g. Orders API, Payment Gateway API) | -**One surface test abstraction ≈ one Feature** — a UI screen has a PageObject, an API endpoint group has an annotated endpoint method. 
See [living-doc-glossary](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md) for details. +**One surface test abstraction ≈ one Feature** — a UI screen has a PageObject, an API endpoint group has an annotated endpoint method. See [living-doc-glossary](../references/living-doc-glossary.md) ([remote](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md)) for details. ## Step 2 — Describe purpose and scope diff --git a/skills/living-doc-create-functionality/SKILL.md b/skills/living-doc-create-functionality/SKILL.md index 0fbc163..6053825 100644 --- a/skills/living-doc-create-functionality/SKILL.md +++ b/skills/living-doc-create-functionality/SKILL.md @@ -18,7 +18,7 @@ compatibility: GitHub Copilot # Living Doc — Create Functionality -> **Key concepts:** Feature, Functionality, User Story, AC — see [living-doc-glossary](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md). +> **Key concepts:** Feature, Functionality, User Story, AC — see [living-doc-glossary](../references/living-doc-glossary.md) ([remote](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md)). ## Step 1 — Elicit the behavior @@ -54,7 +54,7 @@ Functionality ACs describe atomic inputs → outputs. They are: - **Fast-testable**: designed for verification by unit or integration test. E2E tests *can* exercise the same behavior, but they are slow and expensive — they belong in a separate system-test tier, not the fast or regression suite. 
- **Unambiguous**: exact error codes, exact output values where relevant -Use the canonical AC format (see [living-doc-glossary](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md)): +Use the canonical AC format (see [living-doc-glossary](../references/living-doc-glossary.md) ([remote](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md))): ``` AC:FUNC-- (v – Planned) @@ -90,7 +90,7 @@ If contextually distinct despite similar names, create a new Functionality and n ## Step 5 — Output canonical Functionality entity -Output using the project's Storage Profile format (defined per project — see `../../docs/living-doc-copilot.md`). Canonical fields (see [living-doc-glossary](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md) for AC format details): +Output using the project's Storage Profile format (defined per project — see `../../docs/living-doc-copilot.md`). Canonical fields (see [living-doc-glossary](../references/living-doc-glossary.md) ([remote](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md)) for AC format details): | Field | Required | Value | |---|---|---| @@ -99,7 +99,7 @@ Output using the project's Storage Profile format (defined per project — see ` | `name` | Yes | `` (e.g. 
"Login Page – Validate Password Strength") | | `parent_feature` | Yes | `FEAT-` ID of the owning Feature | | `status` | Yes | `planned` \| `active` \| `deprecated` | -| `acceptance_criteria` | Yes | List of ACs in the format defined in [living-doc-glossary](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md) | +| `acceptance_criteria` | Yes | List of ACs in the format defined in [living-doc-glossary](../references/living-doc-glossary.md) ([remote](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md)) | ## Distinguishing Functionality ACs from User Story ACs diff --git a/skills/living-doc-create-user-story/SKILL.md b/skills/living-doc-create-user-story/SKILL.md index ba36750..00027f0 100644 --- a/skills/living-doc-create-user-story/SKILL.md +++ b/skills/living-doc-create-user-story/SKILL.md @@ -18,7 +18,7 @@ compatibility: GitHub Copilot # Living Doc — Create User Story -> **Key concepts:** Feature, Functionality, User Story, AC — see [living-doc-glossary](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md). +> **Key concepts:** Feature, Functionality, User Story, AC — see [living-doc-glossary](../references/living-doc-glossary.md) ([remote](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md)). ## Step 1 — Elicit the narrative @@ -71,7 +71,7 @@ AC:US-- (v – Planned) – : value1, value2, ... ``` -See full AC format and examples in [living-doc-glossary](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md). +See full AC format and examples in [living-doc-glossary](../references/living-doc-glossary.md) ([remote](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md)). **Completeness check — always ask:** 1. What happens on the happy path? 
(at least one AC required) @@ -108,7 +108,7 @@ Output the User Story using the project's Storage Profile format. Canonical fiel | `i_can` | Yes | The capability | | `so_that` | Yes | Business outcome | | `features` | Yes | List of `FEAT-` IDs | -| `acceptance_criteria` | Yes | List of ACs in the format defined in [living-doc-glossary](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md) | +| `acceptance_criteria` | Yes | List of ACs in the format defined in [living-doc-glossary](../references/living-doc-glossary.md) ([remote](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md)) | ## Anti-patterns to flag diff --git a/skills/living-doc-gap-finder/SKILL.md b/skills/living-doc-gap-finder/SKILL.md index 2acf8ff..e45739b 100644 --- a/skills/living-doc-gap-finder/SKILL.md +++ b/skills/living-doc-gap-finder/SKILL.md @@ -19,7 +19,7 @@ compatibility: GitHub Copilot # Living Doc — Gap Finder -> **Key concepts:** Feature, Functionality, User Story, AC, PageObject — see [living-doc-glossary](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md). +> **Key concepts:** Feature, Functionality, User Story, AC, PageObject — see [living-doc-glossary](../references/living-doc-glossary.md) ([remote](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md)). ## Gap taxonomy @@ -50,11 +50,12 @@ Output: `inventory.json` — a flat list of discovered artifacts. 
### Step 2 — Top-down entity traversal -Traverse the entity graph by following relationship fields: -- All User Stories (with their ACs and status) — the root entry points -- All Features (via User Story `features` links) -- All Functionalities (via Feature `functionalities` links) -- All existing test links (test file → AC mappings) +Traverse the entity graph top-down, starting from User Stories as roots: + +- **User Stories** (root) — load all entities with their ACs and status +- **Features** — for each User Story, follow its `features` list to reach linked Features +- **Functionalities** — for each Feature, follow its `functionalities` list to reach owned Functionalities +- **Test links** — collect all test file → AC mappings for cross-referencing in Step 3 ### Step 3 — Compute gaps diff --git a/skills/living-doc-gap-finder/references/glossary.md b/skills/living-doc-gap-finder/references/glossary.md deleted file mode 100644 index b0f5ab6..0000000 --- a/skills/living-doc-gap-finder/references/glossary.md +++ /dev/null @@ -1,120 +0,0 @@ -# Living Documentation — Shared Glossary - -> **This file has moved.** -> The shared glossary is now at [skills/references/living-doc-glossary.md](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md). -> Update any links pointing here to use the new path. - - ---- - -## Core entities - -### User Story (US) - -A business-level requirement expressed from the perspective of a named actor. - -``` -As a , -I can , -so that . -``` - -- ID format: `US-` (e.g. `US-001`) -- Owns: end-to-end **Acceptance Criteria (AC)** written in Given-When-Then format -- Links to: one or more **Features** (system surfaces the User Story touches) -- Status: `draft | ready | in-progress | done | deprecated` - -### Feature - -A named system surface — the structural layer between User Stories and atomic behaviors. - -- ID format: `FEAT-` (e.g. 
`FEAT-checkout`) -- Surface types: `UI | API | Service | Module` -- Owns: one or more **Functionalities** -- Linked to: one or more **User Stories** -- Status: `active | candidate | deprecated` -- Registry: all Features are listed in `docs/FEATURE_REGISTRY.md` - -**One PageObject ≈ one Feature** for UI surfaces. - -### Functionality (FUNC) - -An atomic, fast-testable behavior — a single verb phrase describing one responsibility. - -- ID format: `FUNC-` (e.g. `FUNC-apply-discount`) -- Belongs to: one parent **Feature** -- Owns: **Functionality-level Acceptance Criteria** (When/Then format, unit/integration-testable) -- Test type per AC: `unit | integration` -- Priority per AC: `critical | high | medium | low` - -Functionalities differ from User Story ACs: they are atomic and fast-testable (unit/integration), -not end-to-end. A single User Story may trigger multiple Functionalities. - -### Acceptance Criterion (AC) - -A binary pass/fail statement that defines a verifiable condition. - -**User Story AC format (end-to-end):** -``` -Given: -When: -Then: -``` - -**Functionality AC format (atomic/fast):** -``` -When: -Then: -``` - -Each AC has: -- A unique ID: `-AC-` (e.g. `US-001-AC-1`, `FUNC-apply-discount-AC-2`) -- A `priority`: `critical | high | medium | low` -- A `test_type` (Functionality ACs only): `unit | integration` - -### PageObject - -A class that encapsulates the selectors and actions of a single UI screen. Used by BDD step -definitions to interact with the application without embedding selectors in step code. - -- Naming: `Page` (e.g. 
`CheckoutPage`) -- One PageObject per distinct screen or significant modal -- Selector preference: `data-testid` > `aria-label`/role > CSS class (last resort) - ---- - -## Relationship diagram - -``` -User Story (US) - └── triggers / links to → Feature (FEAT) - └── owns → Functionality (FUNC) - └── owns → Functionality ACs - └── maps to → unit/integration tests - └── owns → User Story ACs (Given/When/Then) - └── maps to → BDD Scenarios (.feature files) - └── implemented by → Step Definitions - └── delegates to → PageObjects -``` - ---- - -## Living doc catalog - -The **living doc catalog** is the collection of all canonical entity JSON files in the project. -Typically stored under `docs/living-doc/` or equivalent. Gap finder, scenario creator, and -tutorial creator all read from this catalog. - ---- - -## What each skill creates or consumes - -| Skill | Creates | Reads | -|---|---|---| -| `living-doc-create-user-story` | User Story JSON | Feature Registry | -| `living-doc-create-feature` | Feature JSON + FEATURE_REGISTRY.md entry | User Story list | -| `living-doc-create-functionality` | Functionality JSON | Feature JSON | -| `living-doc-pageobject-scan` | PageObject classes + Feature stubs | App URL or test suite | -| `living-doc-scenario-creator` | .feature files | User Story, PageObjects | -| `living-doc-tutorial-creator` | Tutorial markdown | .feature files, User Stories | -| `living-doc-gap-finder` | Gap report | All of the above | diff --git a/skills/living-doc-impact-analysis/SKILL.md b/skills/living-doc-impact-analysis/SKILL.md index 057dac4..281bc6f 100644 --- a/skills/living-doc-impact-analysis/SKILL.md +++ b/skills/living-doc-impact-analysis/SKILL.md @@ -19,7 +19,7 @@ compatibility: GitHub Copilot # Living Doc — Impact Analysis -> **Key concepts:** Feature, Functionality, User Story, AC — see [living-doc-glossary](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md). 
+> **Key concepts:** Feature, Functionality, User Story, AC — see [living-doc-glossary](../references/living-doc-glossary.md) ([remote](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md)). ## Step 1 — Identify the changed surface area diff --git a/skills/living-doc-update/SKILL.md b/skills/living-doc-update/SKILL.md index e8712ad..4fbef57 100644 --- a/skills/living-doc-update/SKILL.md +++ b/skills/living-doc-update/SKILL.md @@ -19,7 +19,7 @@ compatibility: GitHub Copilot # Living Doc — Update -> **Key concepts:** Feature, Functionality, User Story, AC — see [living-doc-glossary](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md). +> **Key concepts:** Feature, Functionality, User Story, AC — see [living-doc-glossary](../references/living-doc-glossary.md) ([remote](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md)). ## Identify the entity and change type diff --git a/skills/references/living-doc-glossary.md b/skills/references/living-doc-glossary.md index 3e151e7..da8131a 100644 --- a/skills/references/living-doc-glossary.md +++ b/skills/references/living-doc-glossary.md @@ -22,6 +22,10 @@ so that . 
- Owns: end-to-end **Acceptance Criteria (AC)** - Links to: one or more **Features** (system surfaces the User Story touches) - Status: `planned | active | deprecated` +- Deprecation metadata (set when `status: deprecated`): + - `deprecated_at` — date the entity was deprecated + - `deprecation_reason` — why it was deprecated + - `superseded_by` — ID of the replacement entity (optional) ### Feature @@ -38,7 +42,15 @@ A named system surface — the structural layer between User Stories and atomic - Owns: one or more **Functionalities** - Links to: one or more **User Stories** +- `owners`: team or person responsible for this Feature - Status: `planned | active | deprecated` +- Deprecation metadata (set when `status: deprecated`): + - `deprecated_at` — date the entity was deprecated + - `deprecation_reason` — why it was deprecated + - `superseded_by` — ID of the replacement entity (optional) +- Ownership change metadata (set when `owners` changes): + - `owner_changed_at` — date of ownership transfer + - `owner_change_reason` — reason for the transfer ### Functionality (FUNC) @@ -49,6 +61,10 @@ An atomic, fast-testable behavior — a single verb phrase describing one respon - Belongs to: one parent **Feature** - Owns: **Functionality-level Acceptance Criteria** (atomic input → output statements) - Status: `planned | active | deprecated` +- Deprecation metadata (set when `status: deprecated`): + - `deprecated_at` — date the entity was deprecated + - `deprecation_reason` — why it was deprecated + - `superseded_by` — ID of the replacement entity (optional) Functionalities differ from User Story ACs: they are atomic and fast-testable, not end-to-end. A single User Story may trigger multiple Functionalities. 
@@ -79,6 +95,16 @@ Deprecated ACs include a removal note: AC:- (v – Deprecated – removal planned v) ``` +**Descoped ACs** (deferred mid-sprint — state stays `Planned`): + +``` +AC:- (v – Planned) + – + – descoped_at: ← date AC was deferred out of the current sprint + – descoped_reason: + – future_release: ← optional; target sprint or release +``` + **User Story AC examples — end-to-end, written from the user's perspective:** ``` From 9aab9d1b0b06e24a2058120661da1e2de34a53ce Mon Sep 17 00:00:00 2001 From: miroslavpojer Date: Fri, 22 May 2026 13:37:32 +0200 Subject: [PATCH 08/35] Add living doc automation scripts for ID assignment, gap detection, impact tracing, and entity validation - Implemented `next_id.py` for auto-assigning IDs to user stories, features, functionalities, and acceptance criteria. - Created `compute_gaps.py` to analyze a catalog snapshot and identify gaps in documentation and testing coverage. - Developed `trace_impact.py` to trace the impact of code changes on features, functionalities, and user stories based on a catalog. - Added `validate_entity.py` to validate living doc entities against a canonical schema, ensuring required fields and referential integrity. 
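Illustration only, not the contents of `compute_gaps.py`: a minimal sketch of how the four purely structural gap rules from the gap-finder taxonomy (ORPHAN_FEATURE, EMPTY_FEATURE, ORPHAN_USER_STORY, ORPHAN_FUNCTIONALITY) could be evaluated over a catalog snapshot. It assumes the flat catalog layout with top-level `user_stories`, `features`, and `functionalities` lists, and that each Feature carries `user_stories` and `functionalities` link lists. The function name `find_structural_gaps` and the sample data are hypothetical.

```python
from typing import Any


def find_structural_gaps(catalog: dict[str, Any]) -> list[tuple[str, str]]:
    """Return (gap_type, entity_id) pairs for the four structural gap rules.

    Assumed (hypothetical) entity shapes:
      feature:       {"id", "user_stories": [...], "functionalities": [...]}
      user story:    {"id", "features": [...]}
      functionality: {"id", "parent_feature": str | None}
    """
    gaps: list[tuple[str, str]] = []

    for feature in catalog.get("features", []):
        # ORPHAN_FEATURE: no User Story links to this Feature
        if not feature.get("user_stories"):
            gaps.append(("ORPHAN_FEATURE", feature["id"]))
        # EMPTY_FEATURE: the Feature owns no Functionalities
        if not feature.get("functionalities"):
            gaps.append(("EMPTY_FEATURE", feature["id"]))

    for story in catalog.get("user_stories", []):
        # ORPHAN_USER_STORY: the story links to no Feature
        if not story.get("features"):
            gaps.append(("ORPHAN_USER_STORY", story["id"]))

    for func in catalog.get("functionalities", []):
        # ORPHAN_FUNCTIONALITY: no parent Feature recorded
        if func.get("parent_feature") is None:
            gaps.append(("ORPHAN_FUNCTIONALITY", func["id"]))

    return gaps


# Hypothetical sample catalog, for illustration only
SAMPLE_CATALOG = {
    "features": [
        {"id": "FEAT-001", "user_stories": ["US-001"], "functionalities": ["FUNC-001"]},
        {"id": "FEAT-002", "user_stories": [], "functionalities": []},
    ],
    "user_stories": [
        {"id": "US-001", "features": ["FEAT-001"]},
        {"id": "US-002", "features": []},
    ],
    "functionalities": [
        {"id": "FUNC-001", "parent_feature": "FEAT-001"},
        {"id": "FUNC-002", "parent_feature": None},
    ],
}

if __name__ == "__main__":
    for gap_type, entity_id in find_structural_gaps(SAMPLE_CATALOG):
        print(f"{gap_type}: {entity_id}")
```

The test-related gap types (untested AC, orphan test, stale reference, undocumented Functionality) also need the test inventory and AC status, so they are left out of this sketch.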
--- .gitignore | 1 + skills/living-doc-create-feature/SKILL.md | 4 + .../scripts/next_id.py | 142 +++++++ .../living-doc-create-functionality/SKILL.md | 5 + .../scripts/next_id.py | 142 +++++++ skills/living-doc-create-user-story/SKILL.md | 5 + .../scripts/next_id.py | 142 +++++++ skills/living-doc-gap-finder/SKILL.md | 22 ++ .../scripts/compute_gaps.py | 320 +++++++++++++++ skills/living-doc-impact-analysis/SKILL.md | 27 ++ .../scripts/trace_impact.py | 366 ++++++++++++++++++ skills/living-doc-update/SKILL.md | 30 +- .../scripts/validate_entity.py | 348 +++++++++++++++++ 13 files changed, 1550 insertions(+), 4 deletions(-) create mode 100644 .gitignore create mode 100644 skills/living-doc-create-feature/scripts/next_id.py create mode 100644 skills/living-doc-create-functionality/scripts/next_id.py create mode 100644 skills/living-doc-create-user-story/scripts/next_id.py create mode 100644 skills/living-doc-gap-finder/scripts/compute_gaps.py create mode 100644 skills/living-doc-impact-analysis/scripts/trace_impact.py create mode 100644 skills/living-doc-update/scripts/validate_entity.py diff --git a/.gitignore b/.gitignore new file mode 100644 index 0000000..2cceedb --- /dev/null +++ b/.gitignore @@ -0,0 +1 @@ +**/__pycache__/* diff --git a/skills/living-doc-create-feature/SKILL.md b/skills/living-doc-create-feature/SKILL.md index 68ebcdd..dff91c7 100644 --- a/skills/living-doc-create-feature/SKILL.md +++ b/skills/living-doc-create-feature/SKILL.md @@ -74,6 +74,10 @@ FUNC entries), leave the array as `[]` and add a warning: ## Step 6 — Output canonical Feature entity +> **ID assignment:** before assigning a `FEAT-nnn` ID, run +> `python scripts/next_id.py --type FEAT --catalog catalog.json` +> to get the next available ID and avoid collisions. + Output using the project's Storage Profile format. 
Canonical fields: | Field | Required | Value | diff --git a/skills/living-doc-create-feature/scripts/next_id.py b/skills/living-doc-create-feature/scripts/next_id.py new file mode 100644 index 0000000..e79777b --- /dev/null +++ b/skills/living-doc-create-feature/scripts/next_id.py @@ -0,0 +1,142 @@ +#!/usr/bin/env python3 +""" +next_id.py — Living Doc ID Auto-Assigner + +Scans a living doc catalog and returns the next available ID for a given entity type. +Use this before creating a new entity to avoid ID collisions. + +Usage: + python next_id.py --type US --catalog catalog.json → US-005 + python next_id.py --type FEAT --catalog catalog.json → FEAT-012 + python next_id.py --type FUNC --catalog catalog.json → FUNC-003 + python next_id.py --type AC --parent US-007 --catalog catalog.json → AC:US-007-05 + python next_id.py --type AC --parent FUNC-002 --catalog catalog.json → AC:FUNC-002-03 + +Exits with code 0 and prints the next ID on stdout. +Exits with code 1 and prints an error on stderr if the catalog cannot be read, +the entity type is unknown, or --parent is missing when --type AC is used. 
+ +Catalog JSON must contain one of: + - Top-level keys: "user_stories", "features", "functionalities" + - Or nested under a "catalog" key: {"catalog": {"user_stories": [...], ...}} +""" + +import argparse +import json +import re +import sys + +# Maps entity type token → (catalog collection key, ID regex with capture group for the number) +ENTITY_TYPE_MAP: dict[str, tuple[str, re.Pattern]] = { + "US": ("user_stories", re.compile(r"^US-(\d+)$")), + "FEAT": ("features", re.compile(r"^FEAT-(\d+)$")), + "FUNC": ("functionalities", re.compile(r"^FUNC-(\d+)$")), +} + +# Width of the numeric suffix (zero-padded) +ID_WIDTH = 3 + + +def load_catalog(path: str) -> dict: + with open(path) as f: + raw = json.load(f) + # Support both {"catalog": {...}} and flat {"user_stories": [...]} formats + return raw.get("catalog", raw) + + +def next_entity_id(catalog: dict, entity_type: str) -> str: + """ + Return the next sequential ID for US, FEAT, or FUNC entities. + Scans the matching collection for the highest existing numeric suffix. + """ + if entity_type not in ENTITY_TYPE_MAP: + raise ValueError( + f"Unknown entity type '{entity_type}'. " + f"Must be one of: {sorted(ENTITY_TYPE_MAP)}" + ) + collection_key, pattern = ENTITY_TYPE_MAP[entity_type] + entities: list[dict] = catalog.get(collection_key, []) + + max_num = 0 + for entity in entities: + m = pattern.match(entity.get("id", "")) + if m: + max_num = max(max_num, int(m.group(1))) + + return f"{entity_type}-{max_num + 1:0{ID_WIDTH}d}" + + +def next_ac_id(catalog: dict, parent_id: str) -> str: + """ + Return the next sequential AC ID for a given parent entity (User Story or Functionality). + AC format: AC:- (two-digit zero-padded suffix) + + Scans the parent entity's acceptance_criteria list for the highest existing number. 
+ """ + prefix = parent_id.split("-")[0] + collection_map = {"US": "user_stories", "FUNC": "functionalities"} + collection_key = collection_map.get(prefix) + if not collection_key: + raise ValueError( + f"Cannot determine entity collection for parent '{parent_id}'. " + f"Prefix must be 'US' or 'FUNC'." + ) + + entities: list[dict] = catalog.get(collection_key, []) + parent = next((e for e in entities if e.get("id") == parent_id), None) + if parent is None: + raise ValueError(f"Entity '{parent_id}' not found in catalog") + + ac_pattern = re.compile(rf"^AC:{re.escape(parent_id)}-(\d+)$") + max_num = 0 + for ac in parent.get("acceptance_criteria", []): + m = ac_pattern.match(ac.get("id", "")) + if m: + max_num = max(max_num, int(m.group(1))) + + return f"AC:{parent_id}-{max_num + 1:02d}" + + +def main() -> None: + parser = argparse.ArgumentParser( + description="Return the next available living doc entity ID." + ) + parser.add_argument( + "--type", "-t", + required=True, + choices=["US", "FEAT", "FUNC", "AC"], + help="Entity type to generate an ID for", + ) + parser.add_argument( + "--parent", "-p", + help="Parent entity ID — required when --type is AC (e.g. 
US-007 or FUNC-002)",
+    )
+    parser.add_argument(
+        "--catalog", "-c",
+        required=True,
+        help="Path to the catalog JSON file",
+    )
+    args = parser.parse_args()
+
+    if args.type == "AC" and not args.parent:
+        print("Error: --parent is required when --type is AC", file=sys.stderr)
+        sys.exit(1)
+
+    try:
+        catalog = load_catalog(args.catalog)
+        if args.type == "AC":
+            result = next_ac_id(catalog, args.parent)
+        else:
+            result = next_entity_id(catalog, args.type)
+    except (FileNotFoundError, json.JSONDecodeError) as exc:
+        print(f"Error reading catalog: {exc}", file=sys.stderr)
+        sys.exit(1)
+    except ValueError as exc:
+        print(f"Error: {exc}", file=sys.stderr)
+        sys.exit(1)
+
+    print(result)
+
+
+if __name__ == "__main__":
+    main()
diff --git a/skills/living-doc-create-functionality/SKILL.md b/skills/living-doc-create-functionality/SKILL.md
index 6053825..fd08c11 100644
--- a/skills/living-doc-create-functionality/SKILL.md
+++ b/skills/living-doc-create-functionality/SKILL.md
@@ -90,6 +90,11 @@ If contextually distinct despite similar names, create a new Functionality and n
 
 ## Step 5 — Output canonical Functionality entity
 
+> **ID assignment:** before assigning a `FUNC-nnn` ID, run
+> `python scripts/next_id.py --type FUNC --catalog catalog.json`
+> to get the next available ID and avoid collisions.
+> For AC IDs, use `--type AC --parent FUNC-` to get the next sequential AC number.
+
 Output using the project's Storage Profile format (defined per project — see `../../docs/living-doc-copilot.md`).
Canonical fields (see [living-doc-glossary](../references/living-doc-glossary.md) ([remote](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md)) for AC format details): | Field | Required | Value | diff --git a/skills/living-doc-create-functionality/scripts/next_id.py b/skills/living-doc-create-functionality/scripts/next_id.py new file mode 100644 index 0000000..e79777b --- /dev/null +++ b/skills/living-doc-create-functionality/scripts/next_id.py @@ -0,0 +1,142 @@ +#!/usr/bin/env python3 +""" +next_id.py — Living Doc ID Auto-Assigner + +Scans a living doc catalog and returns the next available ID for a given entity type. +Use this before creating a new entity to avoid ID collisions. + +Usage: + python next_id.py --type US --catalog catalog.json → US-005 + python next_id.py --type FEAT --catalog catalog.json → FEAT-012 + python next_id.py --type FUNC --catalog catalog.json → FUNC-003 + python next_id.py --type AC --parent US-007 --catalog catalog.json → AC:US-007-05 + python next_id.py --type AC --parent FUNC-002 --catalog catalog.json → AC:FUNC-002-03 + +Exits with code 0 and prints the next ID on stdout. +Exits with code 1 and prints an error on stderr if the catalog cannot be read, +the entity type is unknown, or --parent is missing when --type AC is used. 
+ +Catalog JSON must contain one of: + - Top-level keys: "user_stories", "features", "functionalities" + - Or nested under a "catalog" key: {"catalog": {"user_stories": [...], ...}} +""" + +import argparse +import json +import re +import sys + +# Maps entity type token → (catalog collection key, ID regex with capture group for the number) +ENTITY_TYPE_MAP: dict[str, tuple[str, re.Pattern]] = { + "US": ("user_stories", re.compile(r"^US-(\d+)$")), + "FEAT": ("features", re.compile(r"^FEAT-(\d+)$")), + "FUNC": ("functionalities", re.compile(r"^FUNC-(\d+)$")), +} + +# Width of the numeric suffix (zero-padded) +ID_WIDTH = 3 + + +def load_catalog(path: str) -> dict: + with open(path) as f: + raw = json.load(f) + # Support both {"catalog": {...}} and flat {"user_stories": [...]} formats + return raw.get("catalog", raw) + + +def next_entity_id(catalog: dict, entity_type: str) -> str: + """ + Return the next sequential ID for US, FEAT, or FUNC entities. + Scans the matching collection for the highest existing numeric suffix. + """ + if entity_type not in ENTITY_TYPE_MAP: + raise ValueError( + f"Unknown entity type '{entity_type}'. " + f"Must be one of: {sorted(ENTITY_TYPE_MAP)}" + ) + collection_key, pattern = ENTITY_TYPE_MAP[entity_type] + entities: list[dict] = catalog.get(collection_key, []) + + max_num = 0 + for entity in entities: + m = pattern.match(entity.get("id", "")) + if m: + max_num = max(max_num, int(m.group(1))) + + return f"{entity_type}-{max_num + 1:0{ID_WIDTH}d}" + + +def next_ac_id(catalog: dict, parent_id: str) -> str: + """ + Return the next sequential AC ID for a given parent entity (User Story or Functionality). + AC format: AC:- (two-digit zero-padded suffix) + + Scans the parent entity's acceptance_criteria list for the highest existing number. 
+ """ + prefix = parent_id.split("-")[0] + collection_map = {"US": "user_stories", "FUNC": "functionalities"} + collection_key = collection_map.get(prefix) + if not collection_key: + raise ValueError( + f"Cannot determine entity collection for parent '{parent_id}'. " + f"Prefix must be 'US' or 'FUNC'." + ) + + entities: list[dict] = catalog.get(collection_key, []) + parent = next((e for e in entities if e.get("id") == parent_id), None) + if parent is None: + raise ValueError(f"Entity '{parent_id}' not found in catalog") + + ac_pattern = re.compile(rf"^AC:{re.escape(parent_id)}-(\d+)$") + max_num = 0 + for ac in parent.get("acceptance_criteria", []): + m = ac_pattern.match(ac.get("id", "")) + if m: + max_num = max(max_num, int(m.group(1))) + + return f"AC:{parent_id}-{max_num + 1:02d}" + + +def main() -> None: + parser = argparse.ArgumentParser( + description="Return the next available living doc entity ID." + ) + parser.add_argument( + "--type", "-t", + required=True, + choices=["US", "FEAT", "FUNC", "AC"], + help="Entity type to generate an ID for", + ) + parser.add_argument( + "--parent", "-p", + help="Parent entity ID — required when --type is AC (e.g. 
US-007 or FUNC-002)",
+    )
+    parser.add_argument(
+        "--catalog", "-c",
+        required=True,
+        help="Path to the catalog JSON file",
+    )
+    args = parser.parse_args()
+
+    if args.type == "AC" and not args.parent:
+        print("Error: --parent is required when --type is AC", file=sys.stderr)
+        sys.exit(1)
+
+    try:
+        catalog = load_catalog(args.catalog)
+        if args.type == "AC":
+            result = next_ac_id(catalog, args.parent)
+        else:
+            result = next_entity_id(catalog, args.type)
+    except (FileNotFoundError, json.JSONDecodeError) as exc:
+        print(f"Error reading catalog: {exc}", file=sys.stderr)
+        sys.exit(1)
+    except ValueError as exc:
+        print(f"Error: {exc}", file=sys.stderr)
+        sys.exit(1)
+
+    print(result)
+
+
+if __name__ == "__main__":
+    main()
diff --git a/skills/living-doc-create-user-story/SKILL.md b/skills/living-doc-create-user-story/SKILL.md
index 00027f0..9a91833 100644
--- a/skills/living-doc-create-user-story/SKILL.md
+++ b/skills/living-doc-create-user-story/SKILL.md
@@ -90,6 +90,11 @@ Warn if only happy-path ACs are present:
 
 ## Step 4 — Validate and output
 
+> **ID assignment:** before assigning a `US-nnn` ID, run
+> `python scripts/next_id.py --type US --catalog catalog.json`
+> to get the next available ID and avoid collisions.
+> For AC IDs, use `--type AC --parent US-` to get the next sequential AC number.
+
 Invariants that must hold before outputting:
 - At least one AC exists
 - At least one Feature is linked (or flagged as `[NEW]`)
diff --git a/skills/living-doc-create-user-story/scripts/next_id.py b/skills/living-doc-create-user-story/scripts/next_id.py
new file mode 100644
index 0000000..e79777b
--- /dev/null
+++ b/skills/living-doc-create-user-story/scripts/next_id.py
@@ -0,0 +1,142 @@
+#!/usr/bin/env python3
+"""
+next_id.py — Living Doc ID Auto-Assigner
+
+Scans a living doc catalog and returns the next available ID for a given entity type.
+Use this before creating a new entity to avoid ID collisions.
+ +Usage: + python next_id.py --type US --catalog catalog.json → US-005 + python next_id.py --type FEAT --catalog catalog.json → FEAT-012 + python next_id.py --type FUNC --catalog catalog.json → FUNC-003 + python next_id.py --type AC --parent US-007 --catalog catalog.json → AC:US-007-05 + python next_id.py --type AC --parent FUNC-002 --catalog catalog.json → AC:FUNC-002-03 + +Exits with code 0 and prints the next ID on stdout. +Exits with code 1 and prints an error on stderr if the catalog cannot be read, +the entity type is unknown, or --parent is missing when --type AC is used. + +Catalog JSON must contain one of: + - Top-level keys: "user_stories", "features", "functionalities" + - Or nested under a "catalog" key: {"catalog": {"user_stories": [...], ...}} +""" + +import argparse +import json +import re +import sys + +# Maps entity type token → (catalog collection key, ID regex with capture group for the number) +ENTITY_TYPE_MAP: dict[str, tuple[str, re.Pattern]] = { + "US": ("user_stories", re.compile(r"^US-(\d+)$")), + "FEAT": ("features", re.compile(r"^FEAT-(\d+)$")), + "FUNC": ("functionalities", re.compile(r"^FUNC-(\d+)$")), +} + +# Width of the numeric suffix (zero-padded) +ID_WIDTH = 3 + + +def load_catalog(path: str) -> dict: + with open(path) as f: + raw = json.load(f) + # Support both {"catalog": {...}} and flat {"user_stories": [...]} formats + return raw.get("catalog", raw) + + +def next_entity_id(catalog: dict, entity_type: str) -> str: + """ + Return the next sequential ID for US, FEAT, or FUNC entities. + Scans the matching collection for the highest existing numeric suffix. + """ + if entity_type not in ENTITY_TYPE_MAP: + raise ValueError( + f"Unknown entity type '{entity_type}'. 
" + f"Must be one of: {sorted(ENTITY_TYPE_MAP)}" + ) + collection_key, pattern = ENTITY_TYPE_MAP[entity_type] + entities: list[dict] = catalog.get(collection_key, []) + + max_num = 0 + for entity in entities: + m = pattern.match(entity.get("id", "")) + if m: + max_num = max(max_num, int(m.group(1))) + + return f"{entity_type}-{max_num + 1:0{ID_WIDTH}d}" + + +def next_ac_id(catalog: dict, parent_id: str) -> str: + """ + Return the next sequential AC ID for a given parent entity (User Story or Functionality). + AC format: AC:- (two-digit zero-padded suffix) + + Scans the parent entity's acceptance_criteria list for the highest existing number. + """ + prefix = parent_id.split("-")[0] + collection_map = {"US": "user_stories", "FUNC": "functionalities"} + collection_key = collection_map.get(prefix) + if not collection_key: + raise ValueError( + f"Cannot determine entity collection for parent '{parent_id}'. " + f"Prefix must be 'US' or 'FUNC'." + ) + + entities: list[dict] = catalog.get(collection_key, []) + parent = next((e for e in entities if e.get("id") == parent_id), None) + if parent is None: + raise ValueError(f"Entity '{parent_id}' not found in catalog") + + ac_pattern = re.compile(rf"^AC:{re.escape(parent_id)}-(\d+)$") + max_num = 0 + for ac in parent.get("acceptance_criteria", []): + m = ac_pattern.match(ac.get("id", "")) + if m: + max_num = max(max_num, int(m.group(1))) + + return f"AC:{parent_id}-{max_num + 1:02d}" + + +def main() -> None: + parser = argparse.ArgumentParser( + description="Return the next available living doc entity ID." + ) + parser.add_argument( + "--type", "-t", + required=True, + choices=["US", "FEAT", "FUNC", "AC"], + help="Entity type to generate an ID for", + ) + parser.add_argument( + "--parent", "-p", + help="Parent entity ID — required when --type is AC (e.g. 
US-007 or FUNC-002)",
+    )
+    parser.add_argument(
+        "--catalog", "-c",
+        required=True,
+        help="Path to the catalog JSON file",
+    )
+    args = parser.parse_args()
+
+    if args.type == "AC" and not args.parent:
+        print("Error: --parent is required when --type is AC", file=sys.stderr)
+        sys.exit(1)
+
+    try:
+        catalog = load_catalog(args.catalog)
+        if args.type == "AC":
+            result = next_ac_id(catalog, args.parent)
+        else:
+            result = next_entity_id(catalog, args.type)
+    except (FileNotFoundError, json.JSONDecodeError) as exc:
+        print(f"Error reading catalog: {exc}", file=sys.stderr)
+        sys.exit(1)
+    except ValueError as exc:
+        print(f"Error: {exc}", file=sys.stderr)
+        sys.exit(1)
+
+    print(result)
+
+
+if __name__ == "__main__":
+    main()
diff --git a/skills/living-doc-gap-finder/SKILL.md b/skills/living-doc-gap-finder/SKILL.md
index e45739b..1bf0fc1 100644
--- a/skills/living-doc-gap-finder/SKILL.md
+++ b/skills/living-doc-gap-finder/SKILL.md
@@ -21,6 +21,28 @@ compatibility: GitHub Copilot
 
 > **Key concepts:** Feature, Functionality, User Story, AC, PageObject — see [living-doc-glossary](../references/living-doc-glossary.md) ([remote](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md)).
 
+## Script — `scripts/compute_gaps.py`
+
+Run this script to compute all 9 gap types deterministically before producing the gap report.
+It takes a catalog snapshot JSON as input and outputs the `gaps[]` array and coverage stats.
+
+```bash
+# Human-readable summary
+python scripts/compute_gaps.py catalog-snapshot.json --summary
+
+# Machine-readable report
+python scripts/compute_gaps.py catalog-snapshot.json --output gap-report.json
+```
+
+The catalog must contain `catalog`, `inventory`, and `known_test_links` sections —
+see `evals/files/catalog-snapshot.json` for a worked example.
+
+Run the script first, then use its output to drive the Prioritise and Propose steps below.
+The Workflow section describes the logic the script encodes — read it for understanding, but +delegate the computation to the script rather than reproducing it through reasoning. + +--- + ## Gap taxonomy Nine types of gaps are detected, in order of risk: diff --git a/skills/living-doc-gap-finder/scripts/compute_gaps.py b/skills/living-doc-gap-finder/scripts/compute_gaps.py new file mode 100644 index 0000000..1f634f2 --- /dev/null +++ b/skills/living-doc-gap-finder/scripts/compute_gaps.py @@ -0,0 +1,320 @@ +#!/usr/bin/env python3 +""" +compute_gaps.py — Living Doc Gap Finder + +Runs all 9 gap-detection algorithms over a catalog snapshot JSON and outputs a +structured gap report. The model uses the report to propose remediation actions +rather than computing the traversal itself — ensuring deterministic, token-efficient results. + +Usage: + python compute_gaps.py + python compute_gaps.py --output report.json + python compute_gaps.py --summary + +Input format: see evals/files/catalog-snapshot.json for a worked example. 
+The catalog JSON must contain: + catalog.user_stories — list of User Story entities + catalog.features — list of Feature entities + catalog.functionalities — list of Functionality entities + inventory.ui_screens — list of discovered UI screen paths (optional) + inventory.api_endpoints — list of discovered API paths (optional) + inventory.test_files — list of {file, linked_ac} test entries (optional) + inventory.bdd_scenarios — list of {scenario, linked_ac} entries (optional) + known_test_links — dict of AC_ID → test path | null + deprecated_acs — list of deprecated AC IDs (optional) +""" + +import argparse +import json +import sys +from datetime import datetime, timezone + +# Gap type → (sort priority, severity label) +PRIORITIES = { + "UNTESTED_AC": (1, "Blocker"), + "UNDOCUMENTED_SURFACE": (2, "Important"), + "ORPHAN_FEATURE": (3, "Important"), + "ORPHAN_USER_STORY": (4, "Important"), + "ORPHAN_FUNCTIONALITY": (5, "Important"), + "ORPHAN_TEST": (6, "Important"), + "STALE_REFERENCE": (7, "Important"), + "UNDOCUMENTED_FUNCTIONALITY": (8, "Nit"), + "EMPTY_FEATURE": (9, "Nit"), +} + + +def load_snapshot(path: str) -> dict: + with open(path) as f: + return json.load(f) + + +def compute_gaps(snapshot: dict) -> list[dict]: + catalog = snapshot.get("catalog", {}) + inventory = snapshot.get("inventory", {}) + known_test_links: dict = snapshot.get("known_test_links", {}) + + user_stories: list[dict] = catalog.get("user_stories", []) + features: list[dict] = catalog.get("features", []) + functionalities: list[dict] = catalog.get("functionalities", []) + + ui_screens: list[str] = inventory.get("ui_screens", []) + api_endpoints: list[str] = inventory.get("api_endpoints", []) + test_files: list[dict] = inventory.get("test_files", []) + bdd_scenarios: list[dict] = inventory.get("bdd_scenarios", []) + + gaps: list[dict] = [] + gap_counter = 0 + + def add_gap(gap_type: str, entity: str, description: str, proposed_action: str) -> None: + nonlocal gap_counter + gap_counter += 1 + 
priority_order, severity = PRIORITIES[gap_type] + gaps.append( + { + "id": f"GAP-{gap_counter:03d}", + "type": gap_type, + "_priority": priority_order, + "severity": severity, + "entity": entity, + "description": description, + "proposed_action": proposed_action, + } + ) + + # ── Build lookup indices ─────────────────────────────────────────────────── + feature_index = {f["id"]: f for f in features} + us_ids = {us["id"] for us in user_stories} + + # Features referenced by at least one User Story back-link + features_linked_from_us: set[str] = set() + for feat in features: + for us_id in feat.get("user_stories", []): + if us_id in us_ids: + features_linked_from_us.add(feat["id"]) + # Also honour forward links on the User Story entity itself + for us in user_stories: + for feat_id in us.get("features", []): + if feat_id in feature_index: + features_linked_from_us.add(feat_id) + + # Effective Functionality count per Feature (union of forward + back links) + feature_func_counts: dict[str, int] = {f["id"]: 0 for f in features} + for fn in functionalities: + parent = fn.get("parent_feature") + if parent and parent in feature_func_counts: + feature_func_counts[parent] += 1 + for feat in features: + declared = feat.get("functionalities", []) + if declared: + feature_func_counts[feat["id"]] = max(feature_func_counts[feat["id"]], len(declared)) + + # Documented surface name tokens for Gap-2 matching (case-insensitive) + documented_surface_tokens: set[str] = set() + for feat in features: + name = feat.get("name", "").lower() + if name: + documented_surface_tokens.add(name) + for path in feat.get("paths", []): + documented_surface_tokens.add(path.lower()) + + # Deprecated ACs from explicit list + link metadata + deprecated_ac_ids: set[str] = set(snapshot.get("deprecated_acs", [])) + for ac_id, link in known_test_links.items(): + if isinstance(link, dict) and link.get("ac_status") == "deprecated": + deprecated_ac_ids.add(ac_id) + + # ── Gap 1 — UNTESTED_AC 
──────────────────────────────────────────────────── + for ac_id, test_link in known_test_links.items(): + if test_link is None: + add_gap( + "UNTESTED_AC", + ac_id, + f"AC '{ac_id}' has no linked test", + "Generate a BDD scenario using living-doc-scenario-creator, " + "or add a unit/integration test and link it to this AC", + ) + + # ── Gap 2 — UNDOCUMENTED_SURFACE ────────────────────────────────────────── + for surface in ui_screens + api_endpoints: + surface_lower = surface.lower().lstrip("/") + matched = any( + surface_lower in token or token in surface_lower + for token in documented_surface_tokens + ) + if not matched: + add_gap( + "UNDOCUMENTED_SURFACE", + surface, + f"Surface '{surface}' exists in the application with no Feature entity", + "Create a Feature entity using living-doc-create-feature", + ) + + # ── Gap 3 — ORPHAN_FEATURE ──────────────────────────────────────────────── + for feat in features: + if not feat.get("user_stories"): + add_gap( + "ORPHAN_FEATURE", + feat["id"], + f"Feature '{feat['id']}' ({feat.get('name', '')}) has no linked User Stories", + "Link to an existing User Story or confirm with the product owner whether to deprecate", + ) + + # ── Gap 4 — ORPHAN_USER_STORY ───────────────────────────────────────────── + us_linked_to_any_feature: set[str] = set() + for feat in features: + for us_id in feat.get("user_stories", []): + us_linked_to_any_feature.add(us_id) + for us in user_stories: + has_forward_link = bool(us.get("features")) + has_back_link = us["id"] in us_linked_to_any_feature + if not has_forward_link and not has_back_link: + add_gap( + "ORPHAN_USER_STORY", + us["id"], + f"User Story '{us['id']}' ({us.get('title', us.get('name', ''))}) has no linked Feature", + "Link to an existing Feature or create the missing Feature using living-doc-create-feature", + ) + + # ── Gap 5 — ORPHAN_FUNCTIONALITY ────────────────────────────────────────── + for fn in functionalities: + if not fn.get("parent_feature"): + add_gap( + 
"ORPHAN_FUNCTIONALITY", + fn["id"], + f"Functionality '{fn['id']}' has no parent Feature", + "Link to an existing Feature; if no owning surface exists, deprecate this Functionality", + ) + + # ── Gap 6 — ORPHAN_TEST ─────────────────────────────────────────────────── + all_tests = test_files + bdd_scenarios + for test in all_tests: + label = test.get("file") or test.get("scenario", "unknown") + if not test.get("linked_ac"): + add_gap( + "ORPHAN_TEST", + label, + f"Test '{label}' has no linked AC", + "Link to an existing AC, or create a Functionality for the behavior using " + "living-doc-create-functionality. Never delete the test to resolve the gap.", + ) + + # ── Gap 7 — STALE_REFERENCE ─────────────────────────────────────────────── + for test in all_tests: + label = test.get("file") or test.get("scenario", "unknown") + linked_ac = test.get("linked_ac") + if linked_ac and linked_ac in deprecated_ac_ids: + add_gap( + "STALE_REFERENCE", + label, + f"Test '{label}' references deprecated AC '{linked_ac}'", + "Update the test to reference the active replacement AC via gherkin-living-doc-sync; " + "if the behavior was intentionally removed, delete the test after product owner confirmation", + ) + + # ── Gap 8 — UNDOCUMENTED_FUNCTIONALITY ──────────────────────────────────── + for fn in functionalities: + if not fn.get("linked_tests"): + ac_count = fn.get("ac_count", 0) + add_gap( + "UNDOCUMENTED_FUNCTIONALITY", + fn["id"], + f"Functionality '{fn['id']}' has {ac_count} AC(s) with no linked tests", + "Create unit or integration tests for this Functionality's ACs and link them", + ) + + # ── Gap 9 — EMPTY_FEATURE ───────────────────────────────────────────────── + for feat in features: + if feature_func_counts.get(feat["id"], 0) == 0 and not feat.get("functionalities"): + add_gap( + "EMPTY_FEATURE", + feat["id"], + f"Feature '{feat['id']}' ({feat.get('name', '')}) has no Functionalities defined", + "Create Functionalities for known behaviors using 
living-doc-create-functionality", + ) + + # Sort: priority ascending, then entity ID ascending + gaps.sort(key=lambda g: (g["_priority"], g["entity"])) + for g in gaps: + del g["_priority"] + + return gaps + + +def coverage_stats(snapshot: dict, gaps: list[dict]) -> dict: + known_test_links: dict = snapshot.get("known_test_links", {}) + total = len(known_test_links) + covered = sum(1 for v in known_test_links.values() if v is not None) + return { + "total_acs": total, + "covered_acs": covered, + "coverage_percentage": round(covered / total * 100, 1) if total else 0.0, + } + + +def build_report(snapshot: dict, gaps: list[dict]) -> dict: + stats = coverage_stats(snapshot, gaps) + severity_counts = {"Blocker": 0, "Important": 0, "Nit": 0} + for g in gaps: + severity_counts[g["severity"]] += 1 + return { + "generated_at": datetime.now(timezone.utc).isoformat(), + "documentation_coverage": stats, + "summary": { + "total_gaps": len(gaps), + "blockers": severity_counts["Blocker"], + "important": severity_counts["Important"], + "nits": severity_counts["Nit"], + }, + "gaps": gaps, + } + + +def print_summary(report: dict) -> None: + cov = report["documentation_coverage"] + summ = report["summary"] + print(f"\n=== Living Doc Gap Report — {report['generated_at']} ===") + print( + f"Coverage: {cov['covered_acs']}/{cov['total_acs']} ACs covered " + f"({cov['coverage_percentage']}%)" + ) + print( + f"Gaps: {summ['total_gaps']} total | " + f"{summ['blockers']} Blocker | " + f"{summ['important']} Important | " + f"{summ['nits']} Nit\n" + ) + for gap in report["gaps"]: + print(f" [{gap['severity']:9s}] {gap['id']} {gap['type']}") + print(f" Entity: {gap['entity']}") + print(f" Action: {gap['proposed_action']}\n") + + +def main() -> None: + parser = argparse.ArgumentParser( + description="Compute living doc gaps from a catalog snapshot." 
+ ) + parser.add_argument("snapshot", help="Path to catalog-snapshot.json") + parser.add_argument( + "--output", "-o", help="Write JSON gap report to this file (default: stdout)" + ) + parser.add_argument( + "--summary", "-s", action="store_true", help="Print human-readable summary" + ) + args = parser.parse_args() + + snapshot = load_snapshot(args.snapshot) + gaps = compute_gaps(snapshot) + report = build_report(snapshot, gaps) + + if args.summary: + print_summary(report) + elif args.output: + with open(args.output, "w") as f: + json.dump(report, f, indent=2) + print(f"Gap report written to {args.output}", file=sys.stderr) + else: + print(json.dumps(report, indent=2)) + + +if __name__ == "__main__": + main() diff --git a/skills/living-doc-impact-analysis/SKILL.md b/skills/living-doc-impact-analysis/SKILL.md index 281bc6f..d6c595a 100644 --- a/skills/living-doc-impact-analysis/SKILL.md +++ b/skills/living-doc-impact-analysis/SKILL.md @@ -21,6 +21,33 @@ compatibility: GitHub Copilot > **Key concepts:** Feature, Functionality, User Story, AC — see [living-doc-glossary](../references/living-doc-glossary.md) ([remote](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md)). +## Script — `scripts/trace_impact.py` + +Run this script to trace changed files to living doc entities before producing the impact map. +The catalog JSON must include a `feature_registry` section mapping path patterns to Feature IDs. 
+ +```bash +# Trace from an explicit file list +python scripts/trace_impact.py --files src/payments/PromoService.java --catalog catalog.json --summary + +# Trace from a unified git diff +python scripts/trace_impact.py --diff changes.diff --catalog catalog.json --output impact.json +``` + +Feature registry format (add to your catalog JSON): +```json +{ + "feature_registry": [ + { "feature_id": "FEAT-001", "paths": ["src/auth/**", "src/security/login*"] } + ] +} +``` + +The script handles Steps 1–2 (file classification and entity traversal). Use its output JSON +to drive Steps 3–5 (impact classification, impact map narrative, and sign-off checklist). + +--- + ## Step 1 — Identify the changed surface area Start from the code change (PR diff, renamed module, deleted endpoint): diff --git a/skills/living-doc-impact-analysis/scripts/trace_impact.py b/skills/living-doc-impact-analysis/scripts/trace_impact.py new file mode 100644 index 0000000..e8199c5 --- /dev/null +++ b/skills/living-doc-impact-analysis/scripts/trace_impact.py @@ -0,0 +1,366 @@ +#!/usr/bin/env python3 +""" +trace_impact.py — Living Doc Impact Tracer + +Given a list of changed files (or a unified diff) and a catalog JSON that includes a +feature_registry, traces which Features, Functionalities, and User Stories are affected +and at what impact level. The model uses the output to produce the narrative impact map +rather than performing the entity traversal through reasoning. 
+ +Usage: + python trace_impact.py --files src/payments/PromoService.java --catalog catalog.json + python trace_impact.py --diff changes.diff --catalog catalog.json + python trace_impact.py --files src/auth/LoginController.java --catalog catalog.json --summary + python trace_impact.py --files src/payments/PromoService.java --catalog catalog.json \ + --output impact.json + +The catalog JSON must include a "feature_registry" section: + { + "feature_registry": [ + { + "feature_id": "FEAT-001", + "paths": ["src/auth/**", "src/security/login*"] + } + ], + "catalog": { ... }, + "known_test_links": { ... } + } + +Path patterns in feature_registry use Unix shell-style wildcards (fnmatch). +""" + +import argparse +import json +import re +import sys +from datetime import datetime, timezone +from fnmatch import fnmatch +from pathlib import Path + + +# ── File classification ──────────────────────────────────────────────────────── + +_DOMAIN_PATTERNS = [ + r".*[Ss]ervice\.(java|py|ts|scala)$", + r".*[Rr]epository\.(java|py|ts|scala)$", + r".*[Dd]omain.*\.(java|py|ts|scala)$", + r".*[Mm]odel\.(java|py|ts|scala)$", + r".*[Uu]se[Cc]ase\.(java|py|ts|scala)$", + r".*[Hh]andler\.(java|py|ts|scala)$", + r".*[Pp]rocessor\.(java|py|ts|scala)$", +] +_API_PATTERNS = [ + r".*[Cc]ontroller\.(java|ts|scala)$", + r".*[Rr]outer?\.(java|py|ts|scala)$", + r".*(openapi|swagger).*\.(yaml|yml|json)$", + r".*[Ee]ndpoint\.(java|py|ts|scala)$", + r".*[Rr]oute\.(java|py|ts|scala)$", + r".*[Rr]esource\.(java|ts|scala)$", +] +_EVENT_PATTERNS = [ + r".*\.(avsc|proto)$", + r".*[Ss]chema\.(json)$", + r".*[Ee]vent.*\.(java|py|ts|scala)$", +] +_UI_PATTERNS = [ + r".*\.(tsx|jsx)$", + r".*[Cc]omponent\.(ts|html)$", + r".*[Pp]age\.(ts|html)$", + r".*[Ff]orm\.(ts|html)$", +] +_CONFIG_PATTERNS = [ + r".*(application|bootstrap)\.(yaml|yml|properties)$", + r".*[Dd]ocker.*", + r".*\.(tf|hcl)$", + r".*(build|Build)\.(gradle|maven|sbt)$", + r".*[Cc]onfig\.(java|py|ts|yaml|yml|json)$", +] +_TEST_PATTERN = 
re.compile(r"(test|spec|mock|fixture|stub)", re.IGNORECASE) + + +def classify_file(path: str) -> str: + """Return a surface category for the given file path.""" + if _TEST_PATTERN.search(path): + return "test_or_mock" + name = Path(path).name + for pattern in _API_PATTERNS: + if re.match(pattern, name) or re.match(pattern, path): + return "api_contract" + for pattern in _EVENT_PATTERNS: + if re.match(pattern, name) or re.match(pattern, path): + return "event_contract" + for pattern in _UI_PATTERNS: + if re.match(pattern, name) or re.match(pattern, path): + return "ui_component" + for pattern in _CONFIG_PATTERNS: + if re.match(pattern, name) or re.match(pattern, path): + return "configuration" + for pattern in _DOMAIN_PATTERNS: + if re.match(pattern, name) or re.match(pattern, path): + return "domain_logic" + return "domain_logic" # safest default for unrecognised source files + + +_CATEGORY_IMPACT = { + "domain_logic": "High", + "api_contract": "High", + "event_contract": "High", + "ui_component": "Medium", + "configuration": "Low", + "test_or_mock": "None", +} + + +def default_impact_level(category: str) -> str: + return _CATEGORY_IMPACT.get(category, "Medium") + + +# ── Catalog helpers ──────────────────────────────────────────────────────────── + +def load_catalog(path: str) -> dict: + with open(path) as f: + return json.load(f) + + +def parse_diff_files(diff_path: str) -> list[str]: + """Extract changed file paths from a unified diff (git diff format).""" + seen: set[str] = set() + changed: list[str] = [] + with open(diff_path) as f: + for line in f: + if line.startswith("+++ ") or line.startswith("--- "): + raw = line[4:].strip() + if raw == "/dev/null": + continue + # Strip git a/ b/ prefixes + clean = re.sub(r"^[ab]/", "", raw) + if clean not in seen: + seen.add(clean) + changed.append(clean) + return changed + + +# ── Core logic ───────────────────────────────────────────────────────────────── + +def match_features( + changed_files: list[str], 
feature_registry: list[dict] +) -> dict[str, list[str]]: + """Map each changed file to the Feature IDs whose path patterns it matches.""" + result: dict[str, list[str]] = {} + for file_path in changed_files: + matched: list[str] = [] + for entry in feature_registry: + feat_id = entry.get("feature_id", "") + for pattern in entry.get("paths", []): + if fnmatch(file_path, pattern) or fnmatch(Path(file_path).name, pattern): + if feat_id not in matched: + matched.append(feat_id) + break + result[file_path] = matched + return result + + +def trace_entities(feature_ids: list[str], catalog_data: dict) -> dict: + """ + Walk Feature → Functionality → User Story → AC chains for the given Feature IDs. + Returns lists of affected entities and linked test artefacts. + """ + inner = catalog_data.get("catalog", catalog_data) + features = {f["id"]: f for f in inner.get("features", [])} + functionalities = {fn["id"]: fn for fn in inner.get("functionalities", [])} + user_stories = {us["id"]: us for us in inner.get("user_stories", [])} + known_test_links: dict = catalog_data.get("known_test_links", {}) + + result: dict = { + "features": [], + "functionalities": [], + "user_stories": [], + "acs_requiring_review": [], + "scenarios_requiring_rerun": [], + } + + visited_features: set[str] = set() + visited_funcs: set[str] = set() + visited_us: set[str] = set() + + for feat_id in feature_ids: + if feat_id in visited_features: + continue + visited_features.add(feat_id) + feat = features.get(feat_id) + if not feat: + continue + result["features"].append({"id": feat_id, "name": feat.get("name", "")}) + + # Functionalities + for func_id in feat.get("functionalities", []): + if func_id in visited_funcs: + continue + visited_funcs.add(func_id) + fn = functionalities.get(func_id) + label = fn.get("name", func_id) if fn else func_id + result["functionalities"].append({"id": func_id, "name": label}) + + # User Stories + for us_id in feat.get("user_stories", []): + if us_id in visited_us: + continue 
+ visited_us.add(us_id) + us = user_stories.get(us_id) + if us: + result["user_stories"].append( + {"id": us_id, "title": us.get("title", us.get("name", ""))} + ) + + # Collect ACs and linked scenarios from the matched entities + reviewed_acs: set[str] = set() + rerun_scenarios: set[str] = set() + for ac_id, test_link in known_test_links.items(): + # Match ACs belonging to affected User Stories or Functionalities + owner = ac_id.split("-AC-")[0] if "-AC-" in ac_id else None + if not owner: + # Try prefix match on US-nnn or FUNC-nnn + for entity_list, key in [(result["user_stories"], "id"), (result["functionalities"], "id")]: + if any(ac_id.startswith(e[key]) for e in entity_list): + owner = "matched" + break + if owner: + if ac_id not in reviewed_acs: + reviewed_acs.add(ac_id) + result["acs_requiring_review"].append(ac_id) + if test_link and isinstance(test_link, str) and test_link not in rerun_scenarios: + rerun_scenarios.add(test_link) + result["scenarios_requiring_rerun"].append(test_link) + + return result + + +def build_impact_map( + changed_files: list[str], + file_to_features: dict[str, list[str]], + catalog: dict, +) -> dict: + unmatched = [f for f, feats in file_to_features.items() if not feats] + matched = {f: feats for f, feats in file_to_features.items() if feats} + + all_feature_ids: list[str] = [] + for feats in matched.values(): + for fid in feats: + if fid not in all_feature_ids: + all_feature_ids.append(fid) + + entities = trace_entities(all_feature_ids, catalog) + + surface_area = [] + for file_path in changed_files: + category = classify_file(file_path) + impact_level = default_impact_level(category) + surface_area.append( + { + "file": file_path, + "category": category, + "impact_level": impact_level, + "matched_features": file_to_features.get(file_path, []), + } + ) + + recommended_actions: list[str] = [] + if entities["acs_requiring_review"]: + recommended_actions.append( + "Review and update affected ACs → invoke living-doc-update" + ) + if 
entities["scenarios_requiring_rerun"]: + recommended_actions.append( + "Re-run linked Gherkin scenarios → invoke test-e2e-standards" + ) + if any(e["category"] in ("domain_logic", "api_contract") for e in surface_area): + recommended_actions.append( + "Sync drifted Gherkin step text → invoke gherkin-living-doc-sync" + ) + if unmatched: + recommended_actions.append( + f"{len(unmatched)} file(s) have no feature_registry entry — flag as High-impact gap " + "and document missing coverage using living-doc-create-functionality" + ) + + return { + "generated_at": datetime.now(timezone.utc).isoformat(), + "surface_area": surface_area, + "affected_entities": entities, + "unmatched_files": unmatched, + "recommended_actions": recommended_actions, + } + + +# ── Output helpers ───────────────────────────────────────────────────────────── + +def print_summary(impact_map: dict) -> None: + print(f"\n=== Living Doc Impact Map — {impact_map['generated_at']} ===") + print("\nSurface area:") + for entry in impact_map["surface_area"]: + feats = ", ".join(entry["matched_features"]) or "UNMATCHED" + print( + f" [{entry['impact_level']:6s}] {entry['file']}" + f" ({entry['category']}) → {feats}" + ) + ent = impact_map["affected_entities"] + if ent["features"]: + print(f"\nAffected Features: {', '.join(f['id'] for f in ent['features'])}") + if ent["functionalities"]: + print(f"Affected Functionalities: {', '.join(f['id'] for f in ent['functionalities'])}") + if ent["user_stories"]: + print(f"Affected User Stories: {', '.join(u['id'] for u in ent['user_stories'])}") + if ent["acs_requiring_review"]: + print(f"\nACs requiring review ({len(ent['acs_requiring_review'])}):") + for ac in ent["acs_requiring_review"]: + print(f" {ac}") + if ent["scenarios_requiring_rerun"]: + print(f"\nScenarios requiring re-run ({len(ent['scenarios_requiring_rerun'])}):") + for s in ent["scenarios_requiring_rerun"]: + print(f" {s}") + if impact_map["unmatched_files"]: + print("\nFiles NOT in feature registry 
(documentation gaps):") + for f in impact_map["unmatched_files"]: + print(f" ! {f}") + if impact_map["recommended_actions"]: + print("\nRecommended actions:") + for action in impact_map["recommended_actions"]: + print(f" → {action}") + + +# ── Entry point ──────────────────────────────────────────────────────────────── + +def main() -> None: + parser = argparse.ArgumentParser( + description="Trace living doc impact from a set of changed files." + ) + source = parser.add_mutually_exclusive_group(required=True) + source.add_argument("--files", "-f", nargs="+", help="Explicit list of changed file paths") + source.add_argument("--diff", "-d", help="Path to a unified diff file") + parser.add_argument( + "--catalog", "-c", required=True, + help="Path to catalog JSON (must include 'feature_registry')" + ) + parser.add_argument("--output", "-o", help="Write JSON impact map to this file") + parser.add_argument("--summary", "-s", action="store_true", help="Print human-readable summary") + args = parser.parse_args() + + catalog = load_catalog(args.catalog) + changed_files = args.files if args.files else parse_diff_files(args.diff) + feature_registry: list[dict] = catalog.get("feature_registry", []) + + file_to_features = match_features(changed_files, feature_registry) + impact_map = build_impact_map(changed_files, file_to_features, catalog) + + if args.summary: + print_summary(impact_map) + elif args.output: + with open(args.output, "w") as f: + json.dump(impact_map, f, indent=2) + print(f"Impact map written to {args.output}", file=sys.stderr) + else: + print(json.dumps(impact_map, indent=2)) + + +if __name__ == "__main__": + main() diff --git a/skills/living-doc-update/SKILL.md b/skills/living-doc-update/SKILL.md index 4fbef57..3c50a22 100644 --- a/skills/living-doc-update/SKILL.md +++ b/skills/living-doc-update/SKILL.md @@ -32,7 +32,7 @@ Ask: *Which entity is being updated, and what kind of change is this?* | Change status | Any entity | Update `status` field; record the 
transition event | | Change owner | Feature | Update `owners` field | | Add a linked User Story | Feature | Append to `user_stories` | -| Deprecate an entity | Any entity | Set `status: deprecated`; add `deprecated_at` and `reason` | +| Deprecate an entity | Any entity | Set `status: deprecated`; add `deprecated_at`, `deprecation_reason`, and optionally `superseded_by` | | Delete a Functionality | Functionality | Do not delete — deprecate it and link to the commit that removed the code | ## Update a User Story — add or modify ACs @@ -53,9 +53,9 @@ to linked tests and Gherkin scenarios. Only change the description text or state AC text affects the wording of linked Gherkin steps, flag the linked scenarios for `gherkin-living-doc-sync`. -## Promote a User Story from draft to ready +## Promote a User Story from planned to active -Invariants that must hold before setting `status: ready`: +Invariants that must hold before setting `status: active`: | Check | Requirement | |---|---| @@ -65,7 +65,7 @@ Invariants that must hold before setting `status: ready`: | No open `[TODO]` markers | Description and ACs are finalised | Warn if any invariant fails: -> "User Story US-042 cannot be moved to 'ready': no error-path AC exists. Add at least one +> "User Story US-042 cannot be promoted from 'planned' to 'active': no error-path AC exists. Add at least one > AC for a failure or edge case before promoting." ## Deprecate a Feature or Functionality @@ -117,6 +117,28 @@ AC:US-042-03 (v1.2.0 – Planned) | Find gaps in living documentation | `living-doc-gap-finder` | | Generate Gherkin scenarios from a User Story | `living-doc-scenario-creator` | +## Script — `scripts/validate_entity.py` + +After updating any entity, run this script to validate the result against the canonical schema. +It checks required fields, ID format, status values, AC structure, and (with `--catalog`) +referential integrity against the full catalog. 
+ +```bash +# Validate a single entity file +python scripts/validate_entity.py entity.json + +# Validate with referential integrity checks +python scripts/validate_entity.py entity.json --catalog catalog.json + +# Machine-readable output (exits 1 if any error) +python scripts/validate_entity.py entity.json --json +``` + +Exits 0 if valid (warnings are non-blocking). Exits 1 if any required field is missing, +an ID format is wrong, or a status value is invalid. + +--- + ## Output change summary After every update, emit a structured change record. For **modified AC text**, show the old and diff --git a/skills/living-doc-update/scripts/validate_entity.py b/skills/living-doc-update/scripts/validate_entity.py new file mode 100644 index 0000000..8b5c61a --- /dev/null +++ b/skills/living-doc-update/scripts/validate_entity.py @@ -0,0 +1,348 @@ +#!/usr/bin/env python3 +""" +validate_entity.py — Living Doc Entity Validator + +Validates a living doc entity JSON against the canonical schema from the glossary. +Checks required fields, ID formats, status values, AC structure, and (optionally) +referential integrity against the full catalog. + +Usage: + python validate_entity.py entity.json + python validate_entity.py entity.json --catalog catalog.json + echo '{...}' | python validate_entity.py - + python validate_entity.py entity.json --json # machine-readable output + +Exits with code 0 if no errors (warnings are non-blocking). +Exits with code 1 if any errors exist. 
+""" + +import argparse +import json +import re +import sys + +# ── Canonical constraints (from living-doc-glossary.md) ─────────────────────── + +VALID_STATUSES = {"planned", "active", "deprecated"} +VALID_SURFACE_TYPES = {"UI", "API"} +VALID_AC_STATUSES = {"Planned", "Active", "Implemented", "Deprecated"} + +ID_PATTERNS: dict[str, re.Pattern] = { + "User Story": re.compile(r"^US-\d{3,}$"), + "Feature": re.compile(r"^FEAT-\d{3,}$"), + "Functionality": re.compile(r"^FUNC-\d{3,}$"), +} + +AC_ID_PATTERN = re.compile(r"^AC:(US|FUNC)-\d+-\d+$") + +REQUIRED_FIELDS: dict[str, list[str]] = { + "User Story": ["id", "name", "status", "features", "acceptance_criteria"], + "Feature": [ + "id", "name", "surface_type", "purpose", "status", + "user_stories", "functionalities", "owners", + ], + "Functionality": ["id", "name", "parent_feature", "status", "acceptance_criteria"], +} + +DEPRECATION_FIELDS = ["deprecated_at", "deprecation_reason"] +VERB_PREFIX_RE = re.compile( + r"^(process|handle|manage|do|perform|run|execute|validate|create|update|delete)\b", + re.IGNORECASE, +) +ERROR_KEYWORDS_RE = re.compile( + r"(error|fail|invalid|not|empty|exceed|deny|unauthori|reject|missing|timeout)", + re.IGNORECASE, +) + + +# ── Validation logic ─────────────────────────────────────────────────────────── + +def validate(entity: dict, catalog: dict | None = None) -> list[dict]: + """ + Validate a single entity dict. + Returns a list of issue dicts: {severity: 'error'|'warning', field: str, message: str}. 
+ """ + issues: list[dict] = [] + + def error(field: str, message: str) -> None: + issues.append({"severity": "error", "field": field, "message": message}) + + def warning(field: str, message: str) -> None: + issues.append({"severity": "warning", "field": field, "message": message}) + + # ── entity_type ────────────────────────────────────────────────────────── + entity_type: str | None = entity.get("entity_type") or entity.get("type") + if not entity_type: + error( + "entity_type", + "Missing 'entity_type'. Must be one of: User Story, Feature, Functionality", + ) + return issues + if entity_type not in REQUIRED_FIELDS: + error( + "entity_type", + f"Unknown entity_type '{entity_type}'. " + f"Must be one of: {list(REQUIRED_FIELDS)}", + ) + return issues + + # ── Required fields ────────────────────────────────────────────────────── + for field in REQUIRED_FIELDS[entity_type]: + value = entity.get(field) + if value is None or value == "": + error(field, f"Required field '{field}' is missing or empty") + + # ── ID format ──────────────────────────────────────────────────────────── + entity_id: str = entity.get("id", "") + id_pattern = ID_PATTERNS.get(entity_type) + if entity_id and id_pattern and not id_pattern.match(entity_id): + example = {"User Story": "US-001", "Feature": "FEAT-001", "Functionality": "FUNC-001"} + error( + "id", + f"ID '{entity_id}' does not match expected format " + f"(e.g. {example.get(entity_type, 'XXX-001')})", + ) + + # ── Status ─────────────────────────────────────────────────────────────── + status: str = entity.get("status", "") + if status and status not in VALID_STATUSES: + error("status", f"Invalid status '{status}'. 
Must be one of: {VALID_STATUSES}") + + # ── Deprecation metadata ───────────────────────────────────────────────── + if status == "deprecated": + for dep_field in DEPRECATION_FIELDS: + if not entity.get(dep_field): + warning( + dep_field, + f"Deprecated entity is missing '{dep_field}' — " + "deprecation metadata is required for audit trail", + ) + + # ── Feature-specific ───────────────────────────────────────────────────── + if entity_type == "Feature": + surface_type: str = entity.get("surface_type", "") + if surface_type and surface_type not in VALID_SURFACE_TYPES: + error( + "surface_type", + f"Invalid surface_type '{surface_type}'. Must be one of: {VALID_SURFACE_TYPES}", + ) + if isinstance(entity.get("owners"), list) and not entity["owners"]: + warning("owners", "Feature has no owners — assign a team or individual") + if isinstance(entity.get("user_stories"), list) and not entity["user_stories"]: + warning( + "user_stories", + "Feature has no linked User Stories — " + "orphan Features appear in gap-finder reports", + ) + purpose: str = entity.get("purpose", "") + if purpose and len(purpose.split()) < 5: + warning( + "purpose", + "Purpose statement is very short — " + "describe the business value in 1-2 sentences", + ) + name: str = entity.get("name", "") + if name and VERB_PREFIX_RE.match(name): + warning( + "name", + f"Feature name '{name}' looks like a verb phrase — " + "Feature names should be nouns (e.g. 
'Login Page', 'Orders API')", + ) + + # ── User Story-specific ────────────────────────────────────────────────── + if entity_type == "User Story": + if isinstance(entity.get("features"), list) and not entity["features"]: + warning( + "features", + "User Story has no linked Features — " + "orphan User Stories appear in gap-finder reports", + ) + acs: list[dict] = entity.get("acceptance_criteria") or [] + if not acs: + error("acceptance_criteria", "User Story must have at least one Acceptance Criterion") + else: + has_error_path = any( + ERROR_KEYWORDS_RE.search(ac.get("description", "")) for ac in acs + ) + if not has_error_path: + warning( + "acceptance_criteria", + "No error-path or alternative-flow AC found — " + "add at least one failure case", + ) + for i, ac in enumerate(acs): + _validate_ac(ac, entity_id, i, error, warning) + + # ── Functionality-specific ─────────────────────────────────────────────── + if entity_type == "Functionality": + name = entity.get("name", "") + if name and " – " not in name and " - " not in name: + warning( + "name", + "Functionality name should follow the pattern " + "'' " + "(e.g. 
'Login Page – Validate Password Strength')", + ) + parent: str = entity.get("parent_feature", "") + if parent and not re.match(r"^FEAT-", parent): + error( + "parent_feature", + f"parent_feature '{parent}' does not look like a valid Feature ID (expected FEAT-nnn)", + ) + acs = entity.get("acceptance_criteria") or [] + if not acs: + warning( + "acceptance_criteria", + "Functionality has no Acceptance Criteria — " + "atomic behaviors should define at least one AC", + ) + for i, ac in enumerate(acs): + _validate_ac(ac, entity_id, i, error, warning) + + # ── Referential integrity ──────────────────────────────────────────────── + if catalog is not None: + _validate_references(entity, entity_type, catalog, warning) + + return issues + + +def _validate_ac( + ac: dict, + parent_id: str, + index: int, + error_fn, + warning_fn, +) -> None: + """Validate a single Acceptance Criterion entry.""" + field_prefix = f"acceptance_criteria[{index}]" + ac_id: str = ac.get("id", "") + if not ac_id: + error_fn(f"{field_prefix}.id", "AC is missing an 'id' field") + elif not AC_ID_PATTERN.match(ac_id): + warning_fn( + f"{field_prefix}.id", + f"AC ID '{ac_id}' does not match expected format " + f"AC:{parent_id}-nn (e.g. AC:{parent_id}-01)", + ) + if not ac.get("description"): + error_fn(f"{field_prefix}.description", "AC is missing a 'description'") + ac_status = ac.get("status", "") + if ac_status and ac_status not in VALID_AC_STATUSES: + warning_fn( + f"{field_prefix}.status", + f"Unrecognised AC status '{ac_status}'. 
" + f"Expected one of: {sorted(VALID_AC_STATUSES)}", + ) + + +def _validate_references( + entity: dict, entity_type: str, catalog: dict, warning_fn +) -> None: + """Check that referenced IDs exist in the catalog (referential integrity).""" + inner = catalog.get("catalog", catalog) + feature_ids = {f["id"] for f in inner.get("features", [])} + us_ids = {us["id"] for us in inner.get("user_stories", [])} + func_ids = {fn["id"] for fn in inner.get("functionalities", [])} + + if entity_type == "User Story": + for fid in entity.get("features", []): + if fid not in feature_ids: + warning_fn("features", f"Referenced Feature '{fid}' not found in catalog") + + if entity_type == "Feature": + for us_id in entity.get("user_stories", []): + if us_id not in us_ids: + warning_fn("user_stories", f"Referenced User Story '{us_id}' not found in catalog") + for fid in entity.get("functionalities", []): + if fid not in func_ids: + warning_fn("functionalities", f"Referenced Functionality '{fid}' not found in catalog") + + if entity_type == "Functionality": + parent = entity.get("parent_feature") + if parent and parent not in feature_ids: + warning_fn( + "parent_feature", + f"Parent Feature '{parent}' not found in catalog", + ) + + +# ── Output formatting ────────────────────────────────────────────────────────── + +def format_report(entity_id: str, issues: list[dict]) -> str: + if not issues: + return f"✓ {entity_id} — validation passed (no issues found)" + error_count = sum(1 for i in issues if i["severity"] == "error") + warning_count = sum(1 for i in issues if i["severity"] == "warning") + lines = [ + f"Validation report for {entity_id}:", + f" {error_count} error(s) {warning_count} warning(s)", + "", + ] + for issue in issues: + prefix = "✗" if issue["severity"] == "error" else "⚠" + lines.append(f" {prefix} [{issue['severity'].upper():7s}] {issue['field']}") + lines.append(f" {issue['message']}") + return "\n".join(lines) + + +# ── Entry point 
──────────────────────────────────────────────────────────────── + +def main() -> None: + parser = argparse.ArgumentParser( + description="Validate a living doc entity against the canonical schema." + ) + parser.add_argument( + "entity", + help="Path to entity JSON file, or '-' to read from stdin", + ) + parser.add_argument( + "--catalog", "-c", + help="Path to catalog JSON — enables referential integrity checks", + ) + parser.add_argument( + "--json", "-j", + action="store_true", + help="Output validation results as JSON instead of human-readable text", + ) + args = parser.parse_args() + + try: + if args.entity == "-": + entity = json.load(sys.stdin) + else: + with open(args.entity) as f: + entity = json.load(f) + except (FileNotFoundError, json.JSONDecodeError) as exc: + print(f"Error reading entity: {exc}", file=sys.stderr) + sys.exit(1) + + catalog: dict | None = None + if args.catalog: + try: + with open(args.catalog) as f: + catalog = json.load(f) + except (FileNotFoundError, json.JSONDecodeError) as exc: + print(f"Error reading catalog: {exc}", file=sys.stderr) + sys.exit(1) + + issues = validate(entity, catalog) + entity_id: str = entity.get("id", "unknown") + has_errors = any(i["severity"] == "error" for i in issues) + + if args.json: + result = { + "entity_id": entity_id, + "valid": not has_errors, + "error_count": sum(1 for i in issues if i["severity"] == "error"), + "warning_count": sum(1 for i in issues if i["severity"] == "warning"), + "issues": issues, + } + print(json.dumps(result, indent=2)) + else: + print(format_report(entity_id, issues)) + + sys.exit(1 if has_errors else 0) + + +if __name__ == "__main__": + main() From c172270dcc5a62786d729a708d4413d59f22c225 Mon Sep 17 00:00:00 2001 From: miroslavpojer Date: Fri, 22 May 2026 15:20:37 +0200 Subject: [PATCH 09/35] feat: add script to scan .feature files for AC link compliance - Implemented `scan_ac_links.py` to check for missing or malformed AC link headers in Gherkin scenarios. 
- Validates AC ID format and checks for duplicates within the same feature file. docs: create gherkin-scenario skill documentation - Added `SKILL.md` for `gherkin-scenario` detailing standards for writing BDD scenarios in Gherkin. - Covers traceability requirements, language use, and anti-pattern avoidance. docs: create gherkin-step skill documentation - Added `SKILL.md` for `gherkin-step` outlining best practices for implementing Gherkin step definitions. - Emphasizes keeping steps thin, encapsulating selectors, and sharing state correctly. docs: create living-doc-pageobject-scan skill documentation - Added `SKILL.md` for `living-doc-pageobject-scan` detailing how to generate and maintain PageObject classes. - Describes modes for creating and maintaining PageObjects, including selector preferences and output artifacts. feat: add manifest diff script for PageObject validation - Implemented `manifest_diff.py` to compare the manifest against PageObject files on disk. - Identifies stale manifest entries and undocumented PageObjects. docs: create living-doc-scenario-creator skill documentation - Added `SKILL.md` for `living-doc-scenario-creator` detailing the process of generating BDD scenarios from User Stories. - Includes workflow steps for mapping acceptance criteria to scenarios and identifying missing steps. feat: add coverage report script for AC tracking - Implemented `coverage_report.py` to generate a report on AC coverage by Gherkin scenarios. - Scans feature files for AC links and compares them against User Stories to identify gaps. 
--- .../agents/living-doc-bdd-copilot.agent.md | 328 ++++++++ CONTRIBUTING.md | 4 +- README.md | 3 +- docs/README.md | 8 +- docs/guides/living-doc-bdd-copilot.md | 184 +++++ docs/{ => guides}/living-doc-copilot.md | 2 +- docs/{ => guides}/token-saving.md | 2 +- docs/testing/agent-testing.md | 231 ++++++ docs/{ => testing}/skill-testing.md | 0 roadmap.md | 733 ++++++++++++++++++ skills/gherkin-living-doc-sync/SKILL.md | 136 ++++ .../scripts/scan_ac_links.py | 138 ++++ skills/gherkin-scenario/SKILL.md | 143 ++++ skills/gherkin-step/SKILL.md | 156 ++++ .../living-doc-create-functionality/SKILL.md | 2 +- skills/living-doc-gap-finder/SKILL.md | 17 +- skills/living-doc-pageobject-scan/SKILL.md | 182 +++++ .../scripts/manifest_diff.py | 131 ++++ skills/living-doc-scenario-creator/SKILL.md | 162 ++++ .../scripts/coverage_report.py | 173 +++++ 20 files changed, 2724 insertions(+), 11 deletions(-) create mode 100644 .github/agents/living-doc-bdd-copilot.agent.md create mode 100644 docs/guides/living-doc-bdd-copilot.md rename docs/{ => guides}/living-doc-copilot.md (98%) rename docs/{ => guides}/token-saving.md (96%) create mode 100644 docs/testing/agent-testing.md rename docs/{ => testing}/skill-testing.md (100%) create mode 100644 roadmap.md create mode 100644 skills/gherkin-living-doc-sync/SKILL.md create mode 100644 skills/gherkin-living-doc-sync/scripts/scan_ac_links.py create mode 100644 skills/gherkin-scenario/SKILL.md create mode 100644 skills/gherkin-step/SKILL.md create mode 100644 skills/living-doc-pageobject-scan/SKILL.md create mode 100644 skills/living-doc-pageobject-scan/scripts/manifest_diff.py create mode 100644 skills/living-doc-scenario-creator/SKILL.md create mode 100644 skills/living-doc-scenario-creator/scripts/coverage_report.py diff --git a/.github/agents/living-doc-bdd-copilot.agent.md b/.github/agents/living-doc-bdd-copilot.agent.md new file mode 100644 index 0000000..0764ee2 --- /dev/null +++ b/.github/agents/living-doc-bdd-copilot.agent.md @@ -0,0 
+1,328 @@ +--- +description: > + Bridge living documentation to executable tests. Explore web apps via MCP Playwright, + generate and maintain PageObjects, Gherkin scenarios, and step definitions. + Covers webapp exploration with Business Seed assembly, iterative UI crawling, scenario + generation from User Story ACs, and BDD suite maintenance (RE-SCAN, HEALING, REMOVE). + Triggers: "scan webapp", + "generate pageobjects", "heal pageobjects", "generate scenarios", "sync gherkin", + "playwright crawl", "explore the app", "bdd copilot", "living doc bdd copilot", + "BDD pipeline", "crawl the UI", "create page objects", "generate feature file", + "scenario coverage", "step definitions", "gherkin from user story". +tools: + - read_file + - replace_string_in_file + - create_file + - grep_search + - file_search + - semantic_search + - run_in_terminal + - mcp_microsoft_pla_browser_navigate + - mcp_microsoft_pla_browser_snapshot + - mcp_microsoft_pla_browser_click + - mcp_microsoft_pla_browser_fill_form + - mcp_microsoft_pla_browser_take_screenshot + - mcp_microsoft_pla_browser_type + - mcp_microsoft_pla_browser_wait_for +--- + +# @living-doc-bdd-copilot + +Automation layer agent. Explores web apps, generates PageObjects, produces Gherkin scenarios and step definitions, and maintains the BDD automation suite. Does not create living documentation catalog entities — that belongs to `@living-doc-copilot`. + +--- + +## Business Seed Assembly + +Before crawling, assemble the Business Seed file at `.copilot/bdd/seed.yaml`. + +Sources A–E — collect from whichever are available: + +| Source | Behaviour | +|---|---| +| **A — Living doc catalog** | Extract Feature names, US titles, and AC texts. Map each Feature to its primary URL/route if known. | +| **B — Sitemap or route config** | Parse route definitions (Angular router, React Router, `sitemap.xml`) to enumerate URL paths. 
| +| **C — OpenAPI / Swagger spec** | Extract endpoint paths; map REST resources to UI screens where obvious. | +| **D — Existing PageObjects** | Load current `.copilot/bdd/manifest.json` if present — treat known surfaces as already discovered. | +| **E — Guided traversal** | See Source E protocol below. | + +**Credential safety rule:** Never store literal credentials in `seed.yaml`. Always use `env:VAR_NAME` as the value, e.g.: + +```yaml +credentials: + username: env:BDD_USERNAME + password: env:BDD_PASSWORD +``` + +**Artifact location:** BDD artifacts can live anywhere in the repository. On session start, discover them: + +1. Search for `seed.yaml` containing a `base_url:` key. +2. Search for `manifest.json` containing an array with `pageobject_path` entries. +3. If found, load both files and record their paths for this session. +4. If NOT found, create them at a sensible location (e.g. alongside the existing living doc catalog directory if one exists, otherwise `.copilot/bdd/`). +5. **On first discovery:** propose adding their locations to `.github/copilot-instructions.md` so every future agent session can load them without searching: + +```markdown +## BDD Artifacts +- **Business Seed:** `/seed.yaml` — webapp routes, credentials (env refs), guided traversal steps +- **Exploration Manifest:** `/manifest.json` — discovered UI surfaces, component IDs, PageObject paths +``` + +Committing both files means every subsequent session resumes from the last known state — no re-crawl required. + +**Output artifact:** `seed.yaml` (path discovered or chosen above) + +```yaml +base_url: https://... +credentials: + username: env:BDD_USERNAME + password: env:BDD_PASSWORD +known_routes: + - path: /login + feature: Authentication + - path: /dashboard + feature: Dashboard +guided_steps: [] # populated during Source E traversal +``` + +--- + +## Iterative Exploration + +**On session start:** Load `seed.yaml`. 
If `.copilot/bdd/manifest.json` is present, load it — treat all listed surfaces as already discovered and resume from there. If manifest is absent, treat this as the first run (clean slate). + +**Partial state rule:** `seed.yaml` present but `manifest.json` absent = first exploration run. Begin crawl from `base_url`; do not assume any surfaces have been discovered. + +**Crawl loop:** + +1. Navigate to each known route from `seed.yaml` using MCP Playwright. +2. Snapshot the page; identify interactive elements, forms, navigation links, and significant UI surfaces. +3. Follow links and expand navigation to discover new routes not in the manifest. +4. For each new surface discovered: add an entry to `manifest.json` (Feature name, URL, component IDs, PageObject path). +5. Repeat until coverage plateau — no new surfaces found in the last full iteration. +6. Report any unreachable areas — auth walls, dead links, CAPTCHA gates, or forms that cannot be progressed due to missing business knowledge (unknown valid input values, business-specific field formats, required lookup codes, conditional field logic). Offer to enrich `seed.yaml` with missing routes, credentials, or form values, then loop. + +**Output artifact:** `.copilot/bdd/manifest.json` + +```json +[ + { + "feature": "Authentication", + "url": "/login", + "component_ids": ["login-form", "username-input", "password-input", "submit-btn"], + "pageobject_path": "tests/pageobjects/LoginPage.ts" + } +] +``` + +--- + +## Source E — Guided Traversal Protocol + +Use when automated crawling cannot proceed — unknown decision points, multi-step wizards, auth flows, role-gated screens, or forms blocked by missing business knowledge (required field values, valid lookup codes, business-specific input formats). + +**Protocol:** + +1. Take a screenshot; show the user what the agent sees. +2. Ask: *"I've reached a decision point at [URL]. What should I do next? (e.g. 
click X, fill field Y with Z, log in as role R, provide the valid value for field F)"* +3. Wait for the user's answer. Execute the described action via MCP Playwright. +4. Immediately append to `guided_steps:` in `seed.yaml`: + +```yaml +guided_steps: + - url: /checkout/payment + action: fill + field: card-number + value: env:TEST_CARD_NUMBER + note: "Test Visa card for payment flow" +``` + +5. Continue crawl from the new state. + +**CAPTCHA rule:** If a CAPTCHA is encountered, pause and ask the user to solve it manually in the browser. Do not attempt automated bypass. Once the user confirms it is solved, continue and record the step with `action: captcha_solved`. + +--- + +## Scenario Generation + +After exploration completes (manifest is up to date): + +1. Use the `living-doc-gap-finder` skill (bottom-up mode) to identify User Stories with `ACTIVE` ACs that have no linked Gherkin scenario. +2. For each gap: load the `living-doc-scenario-creator` skill and generate Gherkin scenario skeletons — one scenario per AC, with the mandatory `# AC:` traceability tag. +3. Write `.feature` files under the project's feature directory. +4. For each generated scenario, resolve step definitions: + a. **Narrow the search scope to the page first** — identify which PageObject the scenario's steps will interact with. Look in step definition files that already import or reference that PageObject; these are the most likely candidates for reuse. + b. **Match by purpose, not just pattern** — read the step's implementation body to confirm it performs the same business action (e.g. a `fill` on `username-input` vs a `fill` on `search-input` look identical in text but serve different purposes). Only reuse if purpose matches. + c. If a purpose-matching step exists, reuse it as-is; note which library file it lives in. + d. Only if no match exists: write a new stub using the `gherkin-step` skill; extend the relevant PageObject where a new UI interaction is needed. +5. 
Update `manifest.json` to record any new PageObject paths created. + +**Gap detection logic:** An AC is considered uncovered if no scenario in any `.feature` file carries the AC's traceability tag (`# AC: `). + +--- + +## Maintenance + +### RE-SCAN mode + +**Trigger:** New feature shipped, UI refactored, or significant route changes. + +**Scope:** Full re-run of every path recorded in `manifest.json`, plus active discovery of new routes not yet in the manifest. + +1. Reload `seed.yaml` and `manifest.json`. +2. For every existing manifest entry: navigate to its URL, snapshot the DOM, and validate that every recorded `component_id` locator still resolves. Flag any locator that no longer matches as `stale`. +3. **Actively discover new routes from each visited page** — do not limit discovery to routes already in `seed.yaml`. On each page snapshot: + - Find all `` links that resolve to new paths not yet in the manifest. + - Find all buttons and interactive components whose purpose suggests navigation to a new screen (e.g. "Create order", "View details", "Go to settings") — click them and record the resulting URL. + - Find tab panels, side-nav items, and wizard steps that expose sub-routes. + - Any new URL discovered this way is a candidate manifest entry; add it and crawl it recursively. +4. Add new surfaces to `manifest.json`; mark removed or stale-locator surfaces as `deprecated`. +5. Update PageObjects for any locators flagged as stale in step 2. +6. Generate new scenarios for newly discovered ACs (Scenario Generation logic). + +### HEALING mode + +**Trigger:** Test suite failures due to selector drift, broken step definitions, or PageObject mismatches. + +**Scope:** Failing tests only — do not touch passing tests or unrelated PageObjects. + +1. Receive or discover the list of failing test names / scenario titles. +2. Trace each failure back to its PageObject and step definition. +3. Navigate to the affected page via MCP Playwright; snapshot the current DOM. +4. 
Find updated element IDs or selectors; update the affected PageObject(s) accordingly. +5. Verify the step definition binding still resolves; fix if broken. +6. Re-run only the previously failing tests to confirm healing. Do not re-run the full suite. + +### REMOVE mode + +**Trigger:** Feature deprecated or deleted from the product. + +**Scope:** Only files linked to the removed entity — do not touch other Features, PageObjects, or step definitions. + +1. Identify the specific Feature/US/AC being removed. +2. Find all `.feature` files whose scenarios carry a `# AC:` tag matching the removed entity's IDs. +3. Find PageObjects referenced only by those scenarios; find step definitions used only by those scenarios. +4. Confirm the full deletion list with the user before touching any file. +5. Remove confirmed files; update `manifest.json` to remove the deprecated entry. +6. Flag linked US/AC entities in the living doc catalog as candidates for deprecation — hand off to `@living-doc-copilot`. 
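Steps 2 and 3 of REMOVE mode can be sketched as a small planning function. This is a minimal illustration under stated assumptions, not the agent's actual implementation: `ScenarioRef`, `DeletionPlan`, `belongsTo`, and `planRemoval` are hypothetical names, and the rule that an AC belongs to an entity when its ID starts with the entity ID is inferred from the `US-001-01` example rather than defined by this patch.

```typescript
// Sketch only: inputs are plain in-memory records, not a real project layout.
interface ScenarioRef {
  featureFile: string;   // path of the .feature file holding the scenario
  acId: string;          // AC ID taken from the scenario's "# AC:" tag
  pageObjects: string[]; // PageObject paths the scenario's steps touch
}

interface DeletionPlan {
  featureFiles: string[];
  pageObjects: string[];
}

// Hypothetical rule: "US-001-01" belongs to "US-001". The trailing hyphen
// stops "US-0010-01" from matching "US-001".
function belongsTo(acId: string, entityId: string): boolean {
  return acId.indexOf(entityId + "-") === 0;
}

function planRemoval(scenarios: ScenarioRef[], removedEntityId: string): DeletionPlan {
  const doomed = scenarios.filter((s) => belongsTo(s.acId, removedEntityId));
  const kept = scenarios.filter((s) => !belongsTo(s.acId, removedEntityId));

  // PageObjects still needed by surviving scenarios must not be deleted.
  const stillUsed: string[] = [];
  for (const s of kept) {
    for (const po of s.pageObjects) stillUsed.push(po);
  }

  const featureFiles: string[] = [];
  const pageObjects: string[] = [];
  for (const s of doomed) {
    if (featureFiles.indexOf(s.featureFile) === -1) featureFiles.push(s.featureFile);
    for (const po of s.pageObjects) {
      if (stillUsed.indexOf(po) === -1 && pageObjects.indexOf(po) === -1) {
        pageObjects.push(po);
      }
    }
  }
  return { featureFiles, pageObjects };
}
```

The result is only a candidate list for the confirmation in step 4. A feature file that also holds surviving scenarios would still need manual review before any deletion.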
+ +--- + +## Scope + +- Load Business Seed (`seed.yaml`) and Exploration Manifest (`manifest.json`) before crawling +- Crawl web app via MCP Playwright using manifest-guided navigation +- Fill forms and traverse wizards using business-supplied test values from `seed.yaml` +- Identify Features from discovered UI surfaces and map them to the living doc catalog +- Detect scenario gaps — existing Gherkin scenarios vs User Story ACs +- Generate Gherkin scenarios from User Story ACs +- Write and extend step definitions +- Heal PageObjects after UI changes (selector drift detection via MCP Playwright) +- Challenge US/AC validity when observed app behaviour has diverged from documented ACs +- Sync Gherkin feature files with living documentation traceability links + +--- + +## Does NOT + +- Create living doc catalog entities (User Stories, Features, Functionalities) → `@living-doc-copilot` +- Write unit or integration tests → `@sdet-copilot` +- Run language-specific quality gates → `@quality-gate-copilot` +- Heal the catalog layer (AC states, traceability links, entity deprecation) → `@living-doc-copilot` + +--- + +## Shared skill note — `living-doc-gap-finder` + +`living-doc-gap-finder` is a shared skill used differently by each agent: + +- **`@living-doc-copilot`** uses it **top-down**: discovering missing documentation entities (Features, US, Functionalities not yet in the catalog). +- **`@living-doc-bdd-copilot`** uses it **bottom-up**: detecting scenario coverage gaps — ACs that exist in the catalog but have no linked Gherkin scenario. + +Load the skill with this distinction in mind. The bottom-up usage is the default context for this agent. + +--- + +## Living Doc Compatibility + +This agent adheres to the canonical living doc entity model. Full definitions are in [living-doc-glossary](../../skills/references/living-doc-glossary.md) ([remote](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md)). 
+ +### Entity IDs + +| Entity | Format | Example | +|---|---|---| +| User Story | `US-` | `US-001` | +| Feature | `FEAT-` | `FEAT-001` | +| Functionality | `FUNC-` | `FUNC-001` | + +### AC format + +Every Acceptance Criterion reference must follow: + +``` +AC:- (v) + – +``` + +State values: `Planned | Implemented | Active | Deprecated` + +### Gherkin traceability tag + +Every `Scenario:` or `Scenario Outline:` must carry a link comment on the line immediately above it: + +```gherkin +# AC: US-001-01 (v1.0.0 – Active) — Customer places an order with a saved payment method +Scenario: Customer successfully places an order +``` + +The AC link is the single source of traceability between a scenario and the living doc catalog. Never delete or rewrite the `# AC:` comment without updating the catalog entity. + +### Feature surface types + +The glossary defines two surface types that determine the test abstraction: + +| Surface | Test abstraction | Selector preference | +|---|---|---| +| `UI` — web page, modal, screen | **PageObject** class — one class per screen | `data-testid` > `aria-label`/role > CSS class (last resort) | +| `API` — REST/GraphQL endpoint | Annotated endpoint method — OpenAPI/JSDoc header as living contract anchor | N/A | + +This agent generates PageObjects only for `UI` Features. API Feature coverage belongs in the contract test layer. + +### AC rules + +- **Atomic** — one input condition, one observable outcome per AC +- **Binary** — clear pass/fail; no "usually" or "typically" +- **Single placeholder** — at most ONE `{placeholder}` per AC statement; if two aspects vary independently, write a separate AC for each + +### Entity status + +`planned | active | deprecated` — only ACs with `active` or `implemented` state should drive scenario generation. Deprecated ACs require `deprecated_at`, `deprecation_reason`, and optionally `superseded_by`. 
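The traceability tag and state rules above can be sketched as a parser plus a gap check. This is an illustration under assumptions: the regular expression is inferred from the single `# AC: US-001-01 (v1.0.0 – Active)` example, and `AcTag`, `CatalogAc`, `parseAcTag`, `taggedAcIds`, and `uncoveredAcs` are hypothetical names that do not come from any skill in this repository.

```typescript
// Sketch only: the tag layout is inferred from the one example in this document.
interface AcTag {
  acId: string;    // e.g. "US-001-01"
  version: string; // e.g. "1.0.0"
  state: string;   // lower-cased, e.g. "active"
}

interface CatalogAc {
  acId: string;
  state: string;
}

// Assumed pattern: "# AC: <id> (v<version> – <State>)", with an en dash or a hyphen.
const AC_TAG = /^\s*#\s*AC:\s*([A-Z]+-\d+-\d+)\s*\(v([\d.]+)\s*[–-]\s*(\w+)\)/;

function parseAcTag(line: string): AcTag | null {
  const m = AC_TAG.exec(line);
  if (m === null) return null;
  return { acId: m[1], version: m[2], state: m[3].toLowerCase() };
}

// Collect every distinct AC ID tagged in one .feature file's text.
function taggedAcIds(featureText: string): string[] {
  const ids: string[] = [];
  for (const line of featureText.split(/\r?\n/)) {
    const tag = parseAcTag(line);
    if (tag !== null && ids.indexOf(tag.acId) === -1) ids.push(tag.acId);
  }
  return ids;
}

// An AC drives scenario generation only when active or implemented,
// and counts as uncovered when no feature file carries its tag.
function uncoveredAcs(catalog: CatalogAc[], featureTexts: string[]): string[] {
  const covered: string[] = [];
  for (const text of featureTexts) {
    for (const id of taggedAcIds(text)) covered.push(id);
  }
  return catalog
    .filter((ac) => {
      const state = ac.state.toLowerCase();
      const drivable = state === "active" || state === "implemented";
      return drivable && covered.indexOf(ac.acId) === -1;
    })
    .map((ac) => ac.acId);
}
```

Lower-casing the state before comparing is a design choice in this sketch: the document writes states as `Active`, `ACTIVE`, and `active` in different places, so a case-insensitive comparison avoids depending on one spelling.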
+ +--- + +## Skills + +| Skill | Intent | Path | +|---|---|---| +| `living-doc-pageobject-scan` | Discover, create, and maintain PageObject classes from a live webapp | `skills/living-doc-pageobject-scan/SKILL.md` | +| `living-doc-scenario-creator` | Generate Gherkin scenario skeletons from User Story ACs | `skills/living-doc-scenario-creator/SKILL.md` | +| `living-doc-gap-finder` | Find ACs with no linked Gherkin scenario (bottom-up usage) | `skills/living-doc-gap-finder/SKILL.md` | +| `gherkin-scenario` | Write BDD Gherkin scenarios in plain business language | `skills/gherkin-scenario/SKILL.md` | +| `gherkin-step` | Implement Gherkin step definitions — clean, reusable, maintainable | `skills/gherkin-step/SKILL.md` | +| `gherkin-living-doc-sync` | Synchronise feature files and scenarios with the living doc catalog | `skills/gherkin-living-doc-sync/SKILL.md` | + +--- + +## Handoff + +**Inbound — from `@living-doc-copilot`:** +Receives a confirmed list of User Stories with `ACTIVE` ACs. Use this as the input for scenario generation. + +**Outbound — after exploration (manifest complete):** +When the manifest is complete and new surfaces have been identified, hand the Feature list to `@living-doc-copilot`: + +> "Surfaces mapped. Call @living-doc-copilot to document them." + +**Outbound — after scenario generation:** + +> "Feature files and steps generated. Call @sdet-copilot for unit tests." diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 66e1e74..3be003b 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -38,7 +38,7 @@ skills/ | `scripts/` | Deterministic or repetitive logic better run as code than described in prose (e.g. 
a validation script, a formatter, a data transformer) | | `references/` | Domain docs, API specs, decision tables, or anything too large to keep in `SKILL.md` without exceeding 500 lines | | `assets/` | Template files, example inputs/outputs, icons — anything the skill produces or consumes | -| `evals/` | Test prompts and assertions to verify skill behavior and trigger accuracy. See [skill-testing.md](./docs/skill-testing.md) | +| `evals/` | Test prompts and assertions to verify skill behavior and trigger accuracy. See [skill-testing.md](./docs/testing/skill-testing.md) | --- @@ -201,7 +201,7 @@ If every test run of your skill independently writes the same helper script (a f Before proposing a PR, verify that your skill activates correctly and produces good output. The full testing methodology — eval creation, fixture management, with/without comparisons, trigger testing, and description optimization using the Anthropic [`skill-creator`](https://github.com/anthropics/skills/tree/main/skills/skill-creator) -skill — is covered in **[docs/skill-testing.md](./docs/skill-testing.md)**. +skill — is covered in **[docs/testing/skill-testing.md](./docs/testing/skill-testing.md)**. --- diff --git a/README.md b/README.md index 04fbac3..b8754d8 100644 --- a/README.md +++ b/README.md @@ -92,6 +92,7 @@ Agents are pre-configured AI personas that orchestrate multiple skills for a spe | Agent | Description | |---|---| | **[@living-doc-copilot](./.github/agents/living-doc-copilot.agent.md)** | Creates and maintains the living documentation catalog: User Stories, Features, Functionalities, AC updates, impact analysis, gap finding. | +| **[@living-doc-bdd-copilot](./.github/agents/living-doc-bdd-copilot.agent.md)** | Automation layer: explores web apps via MCP Playwright, generates PageObjects and Gherkin scenarios, writes step definitions, and maintains the BDD suite across RE-SCAN, HEALING, and REMOVE phases. 
| ## Finding More Skills @@ -108,7 +109,7 @@ Before building a new skill, check whether one already exists: ## Contributing See **[CONTRIBUTING.md](./CONTRIBUTING.md)** for the skill authoring guide — folder layout, frontmatter schema, writing -effective descriptions and bodies, [testing](./docs/skill-testing.md), and the PR checklist. +effective descriptions and bodies, [testing](./docs/testing/skill-testing.md), and the PR checklist. To propose a new skill — or to propose expanding the repo into agents, MCP servers, or plugins — [open an issue](https://github.com/AbsaOSS/agentic-toolkit/issues/new). diff --git a/docs/README.md b/docs/README.md index f95207b..85a459b 100644 --- a/docs/README.md +++ b/docs/README.md @@ -14,20 +14,22 @@ Navigation hub for all guides in this repository. Browse by category below. |-----------------------------------------|-------------------------------------------------------------------------------------| | [Getting Started](./getting-started.md) | What skills are, how to install them, Copilot CLI usage | | [Contributing](../CONTRIBUTING.md) | Skill folder layout, frontmatter, description writing, body guidelines, PR process | -| [Skill Testing](./skill-testing.md) | Eval creation, fixtures, regression loops, trigger and description optimization | +| [Skill Testing](./testing/skill-testing.md) | Eval creation, fixtures, regression loops, trigger and description optimization | +| [Agent Testing](./testing/agent-testing.md) | Eval creation, trigger accuracy tuning, and body quality testing for `.agent.md` files | | [Troubleshooting](./troubleshooting.md) | Setup fixes for install, activation, and proxy issues | ## Skill Guides | Guide | Description | |-------------------------------------|------------------------------------------------------------------------------------| -| [Token Saving](./token-saving.md) | Keeping AI responses concise — how the token-saving skill works and when it applies | +| [Token 
Saving](./guides/token-saving.md) | Keeping AI responses concise — how the token-saving skill works and when it applies | ## Agent Guides | Guide | Description | |-----------------------------------------------|------------------------------------------------------------------------------------------| -| [Living Doc Copilot](./living-doc-copilot.md) | How the living-doc-copilot agent works, its scope, modes, and how to trigger it | +| [Living Doc Copilot](./guides/living-doc-copilot.md) | How the living-doc-copilot agent works, its scope, modes, and how to trigger it | +| [Living Doc BDD Copilot](./guides/living-doc-bdd-copilot.md) | How the living-doc-bdd-copilot agent works: web app exploration, PageObjects, Gherkin generation, and BDD maintenance phases | > **Keep this index up to date.** When you add a new guide, add a row to the appropriate table above. diff --git a/docs/guides/living-doc-bdd-copilot.md b/docs/guides/living-doc-bdd-copilot.md new file mode 100644 index 0000000..1dc19ce --- /dev/null +++ b/docs/guides/living-doc-bdd-copilot.md @@ -0,0 +1,184 @@ +# Living Doc BDD Copilot Agent + +`@living-doc-bdd-copilot` is the automation layer agent. It explores web applications, generates PageObjects, produces Gherkin scenarios and step definitions, and maintains the BDD automation suite across the full engineering pipeline. 
+ +--- + +## What it does + +| Task | What the agent does | +|---|---| +| Explore a web app | Crawl and map UI surfaces; discover Features from the live application | +| Generate PageObjects | Create or update PageObject classes from discovered UI surfaces | +| Generate Gherkin scenarios | Cover User Story ACs with `.feature` files and linked step definitions | +| Sync Gherkin with living doc | Ensure traceability tags in feature files match catalog ACs | +| Heal automation after UI changes | Fix broken selectors, step definitions, and PageObjects (failing tests only) | +| Re-scan after refactor | Full re-crawl of all manifest paths plus active discovery of new routes; update scenarios | +| Remove deprecated feature automation | Clean up `.feature` files, steps, and PageObjects for removed features | +| Generate tutorial documents | Transform executed BDD scenarios into annotated walkthrough documents | + +--- + +## How to trigger it + +``` +scan webapp +generate pageobjects for the login screen +explore the app at https://... +generate scenarios for US-42 +heal pageobjects +sync gherkin to living doc +crawl the UI +living doc bdd copilot +BDD pipeline +create page objects +generate feature file from user story +``` + +--- + +## Before you start — setup files + +The agent uses two persistent files: + +| File | Purpose | +|---|---| +| `seed.yaml` | Business Seed — base URL, credentials (env refs), known routes, guided traversal steps | +| `manifest.json` | Exploration Manifest — all discovered surfaces with Feature name, URL, component IDs, and PageObject path | + +These files can live anywhere in the repository. On each session start, the agent searches for them automatically: + +1. Searches for `seed.yaml` containing a `base_url:` key. +2. Searches for `manifest.json` containing an array with `pageobject_path` entries. +3. If found, loads both files and resumes from the last known state — no re-crawl needed. +4. 
If not found, creates them at a sensible location (alongside your existing living doc catalog directory, or `.copilot/bdd/` if no catalog is present). + +**On first discovery**, the agent will propose adding the file paths to `.github/copilot-instructions.md` so every future session can load them without searching: + +```markdown +## BDD Artifacts +- **Business Seed:** `/seed.yaml` +- **Exploration Manifest:** `/manifest.json` +``` + +**Credential safety:** Credentials in `seed.yaml` must always use `env:VAR_NAME` — never literal values. + +```yaml +base_url: https://your-app.example.com +credentials: + username: env:BDD_USERNAME + password: env:BDD_PASSWORD +known_routes: + - path: /login + feature: Authentication + - path: /dashboard + feature: Dashboard +guided_steps: [] # populated during guided traversal +``` + +--- + +## Pipeline + +### Business Seed assembly + +Collects sources A–E to build `seed.yaml`: + +| Source | What the agent collects | +|---|---| +| A — Living doc catalog | Feature names, US titles, AC texts, route mappings | +| B — Sitemap / route config | URL paths from Angular router, React Router, or `sitemap.xml` | +| C — OpenAPI / Swagger spec | REST endpoint paths, mapped to UI screens where obvious | +| D — Existing PageObjects | Already-discovered surfaces from a previous manifest run | +| E — Guided traversal | Steps recorded live as the agent pauses to ask the user at decision points | + +### Iterative exploration + +The agent navigates the live application via MCP Playwright, snapshots pages, identifies UI surfaces, and builds `manifest.json`. Exploration continues until a coverage plateau — no new surfaces in the last full iteration. + +If the agent hits an auth wall, multi-step wizard, CAPTCHA, or a form it cannot progress due to missing business knowledge (unknown valid input values, required lookup codes, business-specific field formats): + +- It takes a screenshot and describes what it sees. +- It asks you what to do next. 
+- CAPTCHA: it pauses and waits for you to solve it manually in the browser. +- All guided steps are recorded in `seed.yaml` under `guided_steps:` for future re-runs. + +### Scenario generation + +After exploration: + +1. Uses `living-doc-gap-finder` (bottom-up mode) to find `ACTIVE` ACs with no linked Gherkin scenario. +2. Generates `.feature` files with `Given/When/Then` scenarios — one scenario per AC, each with a `# AC:` traceability tag. +3. For each new step, checks for an existing reusable definition: first narrows scope to the relevant PageObject, then confirms the step's purpose matches (not just its text pattern). Reuses if it matches; writes a new stub only if no match exists. +4. Extends the relevant PageObject with any new UI interactions required by the new stubs. + +### Maintenance + +| Mode | When | What the agent does | +|---|---|---| +| **RE-SCAN** | New feature shipped or UI refactored | Full re-crawl of every manifest path plus active discovery of new routes (links, buttons, tabs, wizard steps); updates manifest; generates new scenarios for new ACs | +| **HEALING** | Tests failing due to selector drift | Scoped to failing tests only — navigates affected pages; identifies updated selectors; repairs PageObjects and step bindings; re-runs only the previously failing tests to confirm | +| **REMOVE** | Feature deprecated or deleted | Identifies linked `.feature` files, steps, and PageObjects; confirms before deleting; hands catalog deprecation to `@living-doc-copilot` | + +--- + +## Shared skill — `living-doc-gap-finder` + +`living-doc-gap-finder` is shared between two agents but used in opposite directions: + +| Agent | Direction | What it finds | +|---|---|---| +| `@living-doc-copilot` | **Top-down** | Missing documentation entities (Features, User Stories, Functionalities not yet in the catalog) | +| `@living-doc-bdd-copilot` | **Bottom-up** | ACs that exist in the catalog but have no linked Gherkin scenario | + +When this agent loads 
`living-doc-gap-finder`, it uses the **bottom-up** (scenario coverage) mode. + +--- + +## Skills used + +| Skill | Purpose | +|---|---| +| `living-doc-pageobject-scan` | Discover, create, and maintain PageObject classes | +| `living-doc-scenario-creator` | Generate Gherkin scenario skeletons from User Story ACs | +| `living-doc-gap-finder` | Find ACs with no linked scenario (bottom-up, scenario coverage) | +| `gherkin-scenario` | Write BDD Gherkin scenarios in plain business language | +| `gherkin-step` | Implement step definitions — clean, reusable, maintainable | +| `gherkin-living-doc-sync` | Sync feature files and scenarios with living doc traceability links | + +--- + +## Handoff + +**Inbound — from `@living-doc-copilot`:** +Receives confirmed User Stories with `ACTIVE` ACs. Generates scenarios and steps. + +**Outbound — after exploration (surfaces mapped):** + +> "Surfaces mapped. Call @living-doc-copilot to document them." + +**Outbound — after scenario generation (feature files generated):** + +> "Feature files and steps generated. Call @sdet-copilot for unit tests." + +--- + +## Agent boundaries + +| Concern | Owner | +|---|---| +| Living doc catalog entities (US, Feature, Functionality) | `@living-doc-copilot` | +| AC states, traceability links, entity deprecation | `@living-doc-copilot` | +| Web app exploration, PageObjects, Gherkin, step definitions | `@living-doc-bdd-copilot` (this agent) | +| Unit and integration tests | `@sdet-copilot` | +| CI quality gates and linting | `@quality-gate-copilot` | + +--- + +## Installation + +```bash +npx skills add https://github.com/AbsaOSS/agentic-toolkit -g +``` + +See [Getting Started](../getting-started.md) for the full install guide. 
diff --git a/docs/living-doc-copilot.md b/docs/guides/living-doc-copilot.md similarity index 98% rename from docs/living-doc-copilot.md rename to docs/guides/living-doc-copilot.md index 0bceb11..4c0a615 100644 --- a/docs/living-doc-copilot.md +++ b/docs/guides/living-doc-copilot.md @@ -122,4 +122,4 @@ Every AC created or updated by this agent carries: npx skills add https://github.com/AbsaOSS/agentic-toolkit -g ``` -See [Getting Started](./getting-started.md) for the full install guide. +See [Getting Started](../getting-started.md) for the full install guide. diff --git a/docs/token-saving.md b/docs/guides/token-saving.md similarity index 96% rename from docs/token-saving.md rename to docs/guides/token-saving.md index 22a6a60..2617fe0 100644 --- a/docs/token-saving.md +++ b/docs/guides/token-saving.md @@ -68,4 +68,4 @@ To install only this skill: npx skills add https://github.com/AbsaOSS/agentic-toolkit -g --skill token-saving ``` -See [Getting Started](./getting-started.md) for the full install guide. +See [Getting Started](../getting-started.md) for the full install guide. diff --git a/docs/testing/agent-testing.md b/docs/testing/agent-testing.md new file mode 100644 index 0000000..928992f --- /dev/null +++ b/docs/testing/agent-testing.md @@ -0,0 +1,231 @@ +# Agent Testing Guide + +This document describes how to test, evaluate, and tune `.agent.md` files — specifically how to use `agent-customization` (for structural edits) together with `skill-creator`'s eval methodology (for description trigger accuracy). This is the practical equivalent of [skill-testing.md](./skill-testing.md) applied to agents. + +--- + +## Why agent testing is different from skill testing + +| Dimension | Skill | Agent | +|---|---|---| +| Trigger mechanism | `description:` field in SKILL.md YAML | `description:` field in `.agent.md` YAML | +| Body loaded when? 
| When skill is activated by description match | When user addresses `@agent-name` or description matches | +| What to tune | Description trigger keywords + body instructions | Description trigger keywords + body sections (scope, handoff, maintenance modes) | +| Tool for structural edits | `skill-creator` | `agent-customization` | +| Tool for eval loop | `skill-creator` (fully supported) | `skill-creator` (description eval loop applies directly) | + +The key insight: an agent's `description:` block is read by the same matching mechanism as a skill's `description:`. Everything `skill-creator` does to optimize skill descriptions applies 1-for-1 to agent descriptions. + +--- + +## 1. Recommended workflow + +1. Create trigger eval cases in `.github/agents/evals//trigger-eval.json` +2. Create body eval cases in `.github/agents/evals//evals.json` +3. Start a Copilot Chat session from the repository root +4. Ask Copilot to use the `skill-creator` skill, pointing it at the agent's eval files +5. Review trigger accuracy and output quality +6. Use `agent-customization` to edit structural sections (tools list, scope, handoff, modes) +7. Re-run evals; repeat until stable + +--- + +## 2. File layout + +``` +.github/ + agents/ + my-agent.agent.md ← agent definition + evals/ + my-agent/ + trigger-eval.json ← which prompts should (and should not) invoke the agent + evals.json ← body behavior tests + files/ ← fixture files referenced by evals +``` + +--- + +## 3. Trigger eval format + +Mirrors the skill trigger-eval format exactly. 
Store at `.github/agents/evals//trigger-eval.json`: + +```json +{ + "agent_name": "my-agent", + "evals": [ + { + "id": "should-trigger-1", + "prompt": "scan this webapp and generate pageobjects", + "should_trigger": true + }, + { + "id": "should-trigger-2", + "prompt": "explore the app and create page objects for the login screen", + "should_trigger": true + }, + { + "id": "should-not-trigger-1", + "prompt": "create a user story for the login feature", + "should_trigger": false, + "expected_agent": "living-doc-copilot" + }, + { + "id": "should-not-trigger-2", + "prompt": "write a unit test for the login validator", + "should_trigger": false + } + ] +} +``` + +Write at least **5 should-trigger** and **5 should-not-trigger** cases. Should-not-trigger cases are as important as the positive ones — they catch over-broad descriptions that shadow other agents. + +--- + +## 4. Body eval format + +Store at `.github/agents/evals//evals.json`. Same schema as skill evals: + +```json +{ + "agent_name": "my-agent", + "evals": [ + { + "id": "business-seed-assembly", + "prompt": "I want to set up BDD automation for our app at https://app.example.com. The Angular router is at src/app/app-routing.module.ts.", + "expected_output": "Agent assembles seed.yaml from the router file, proposes base_url, lists known_routes, confirms credential env var names before crawling.", + "files": ["src/app/app-routing.module.ts"] + }, + { + "id": "re-scan-stale-locator", + "prompt": "RE-SCAN — the checkout page was redesigned.", + "expected_output": "Agent loads manifest.json, navigates to /checkout, validates component_id locators, flags stale ones, updates PageObject selectors. Does NOT touch unrelated pages.", + "files": ["manifest.json"] + }, + { + "id": "healing-scope", + "prompt": "HEALING — these 3 scenarios are failing: LoginPage submit, CheckoutPage confirm, DashboardPage filter.", + "expected_output": "Agent scopes work to those 3 failing tests only. 
Does not re-run or touch passing tests.", + "files": [] + } + ] +} +``` + +--- + +## 5. Running the eval loop + +Point `skill-creator` at the agent files — it treats the `description:` block the same way it treats a skill description. + +### Trigger accuracy + +``` +Use the skill-creator skill to optimize the description for .github/agents/my-agent.agent.md +using the trigger evals at .github/agents/evals/my-agent/trigger-eval.json. +``` + +`skill-creator` will propose candidate descriptions, score them against the eval set, and iterate. + +### Body quality + +``` +Use the skill-creator skill to run the body evals for .github/agents/my-agent.agent.md +using .github/agents/evals/my-agent/evals.json. +``` + +Use the same with-skill / baseline comparison flow described in [skill-testing.md](./skill-testing.md). + +--- + +## 6. Structural edits — use `agent-customization` + +When body evals reveal a section is wrong (wrong scope, missing tool, bad handoff), use the `agent-customization` skill to fix the structural parts: + +``` +Use the agent-customization skill to add `mcp_microsoft_pla_browser_wait_for` to the tools list +in .github/agents/my-agent.agent.md. +``` + +``` +Use the agent-customization skill to update the HEALING mode scope in +.github/agents/my-agent.agent.md — it should be scoped to failing tests only. +``` + +`agent-customization` understands `.agent.md` YAML frontmatter and section structure, so it handles these edits safely without breaking the file format. + +--- + +## 7. 
What to tune — agent-specific checklist + +Beyond the standard skill tuning checklist, also verify: + +| Check | Good signal | Bad signal | +|---|---|---| +| **Trigger precision** | Agent fires only for its domain | Fires for requests that belong to another agent | +| **Trigger recall** | All domain phrases trigger it | Mis-fires to default agent for known phrases | +| **Scope boundaries** | Refuses work outside its Does-NOT list | Silently attempts work outside its scope | +| **Mode activation** | RE-SCAN / HEALING / REMOVE activate on correct triggers | Wrong mode fires, or modes don't activate | +| **Handoff clarity** | Outputs correct hand-off message to the right agent | Hands off to wrong agent or swallows the work | +| **Tool completeness** | All tools needed by the body are in the frontmatter `tools:` list | Body references a tool not in `tools:` — it will be unavailable | + +--- + +## 8. Description anti-patterns + +These are the most common description problems observed in agent files: + +**Over-broad description** — causes the agent to shadow other agents: +```yaml +# BAD — fires on almost everything +description: > + Helps with testing, documentation, and web apps. +``` + +**Under-specified triggers** — causes the agent to miss its domain: +```yaml +# BAD — won't fire on "crawl the UI" or "playwright scan" +description: > + Generates BDD tests. +``` + +**Good pattern** — explicit Triggers list with concrete phrases: +```yaml +description: > + Bridge living documentation to executable tests. ... + Triggers: "scan webapp", "generate pageobjects", "heal pageobjects", + "playwright crawl", "BDD pipeline", "crawl the UI". +``` + +--- + +## 9. Regression-first loop + +Same as skill testing — run the full trigger-eval set, fix the largest failure cluster, re-run: + +1. Run full trigger-eval and body-eval sets; save baseline scores +2. Identify largest failure cluster (e.g. 4 should-not-trigger cases fire) +3. Make one description change +4. 
Re-run trigger-eval only +5. Review delta +6. Run full suite +7. Keep or revert +8. Repeat until all trigger evals pass and body eval delta is positive or neutral + +--- + +## 10. Minimal session + +``` +gh copilot +→ "Use the skill-creator skill to test the agent at .github/agents/my-agent.agent.md + using the evals at .github/agents/evals/my-agent/" +→ inspect trigger accuracy and body output diffs +→ use agent-customization to fix structural issues +→ "Use the skill-creator skill to optimize the description using the trigger-eval.json" +→ re-run evals until stable +``` + +--- + +For the full eval methodology (subagent spawning, benchmark aggregation, the viewer), see [skill-testing.md](./skill-testing.md) — the process is identical once the eval files are in place. diff --git a/docs/skill-testing.md b/docs/testing/skill-testing.md similarity index 100% rename from docs/skill-testing.md rename to docs/testing/skill-testing.md diff --git a/roadmap.md b/roadmap.md new file mode 100644 index 0000000..9d854f8 --- /dev/null +++ b/roadmap.md @@ -0,0 +1,733 @@ +# Implementation Roadmap — Agentic Engineering Toolkit + +> **Authored from:** `plugin-spec.md` (last reviewed 2026-05-21). +> The spec file has been removed from the repo; this document is the canonical delivery reference. 
+> **Last updated:** 2026-05-22 + +--- + +## Progress overview + +| Step | Cluster | Agent(s) | Skills | Done | Remaining | +|---|---|---|---|---|---| +| 1 | Living Doc + BDD | `@living-doc-copilot` ✅ `@bdd-copilot` ❌ | 6 / 12 ✅ | 7 files | 7 files | +| 2 | SDET | `@sdet-copilot` ❌ | 0 / 7 | — | 8 files | +| 3 | Code Quality | `@quality-gate-copilot` ❌ | 0 / 7 | — | 8 files | +| 4 | Test Specialist | `@test-specialist-copilot` ❌ | 0 / 6 | — | 7 files | +| 4b | Test Quality | `@test-quality-copilot` ❌ | 0 / 4 | — | 5 files | +| 5 | Standalone | — | 0 / 3 | — | 3 files | +| **Total** | | **6 agents** | **39 skills** | **7** | **38** | + +> **Constraint:** never merge a cluster without both the skill files AND the agent definition in the same PR. + +--- + +## File layout + +``` +skills/ +└── {skill-name}/ + ├── SKILL.md ← required + ├── scripts/ ← optional: executable logic + ├── references/ ← optional: overflow docs (when body approaches 500 lines) + ├── assets/ ← optional: templates, example files + └── evals/ ← optional: trigger + assertion test prompts + +.github/ +└── agents/ + └── {agent-name}.agent.md +``` + +Skill source to migrate from: `/Users/ab024ll/.copilot/skills/{skill-name}/SKILL.md` +Agent files: authored from scratch using the spec definitions below. 
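The layout above marks `SKILL.md` as the only required file in each skill folder. A minimal sketch of a pre-merge check for that rule follows. It assumes every directory directly under `skills/` is a skill folder, except shared folders listed in a hypothetical `ignore` parameter; the function name and the default ignore list are illustrative and not part of the toolkit.

```python
from pathlib import Path


def find_skills_missing_skill_md(skills_root: Path, ignore=("references",)) -> list[str]:
    """Return names of folders under skills_root that have no SKILL.md file.

    Assumption: each direct subdirectory is a skill folder unless its name
    appears in `ignore` (a hypothetical escape hatch for shared folders).
    """
    missing = []
    for entry in sorted(skills_root.iterdir()):
        # Stray files at the top level are not skill folders, so skip them.
        if not entry.is_dir() or entry.name in ignore:
            continue
        if not (entry / "SKILL.md").is_file():
            missing.append(entry.name)
    return missing
```

Calling `find_skills_missing_skill_md(Path("skills"))` from the repository root would list the folders that still need a `SKILL.md`; the optional `scripts/`, `references/`, `assets/`, and `evals/` subfolders are deliberately not checked.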
+ +### Agent file format + +```yaml +--- +description: > + +tools: + - read_file + - replace_string_in_file + - create_file + - grep_search + - file_search + - semantic_search + # + run_in_terminal for agents with execute capability + # + mcp_microsoft_pla_browser_* for @bdd-copilot only +--- + + +``` + +### Validation checklist (every skill before merge) + +- [ ] Folder name matches `name` frontmatter exactly — lowercase kebab-case, ≤ 64 chars +- [ ] `description` ≤ 1024 chars; covers *what* and *when*; includes trigger keywords +- [ ] Body < 500 lines; use `references/` with a pointer in body if needed +- [ ] No hardcoded secrets, credentials, or absolute machine-local paths +- [ ] Scripts in `scripts/` are referenced from `SKILL.md` with usage instructions + +--- + +## Step 1 — Living Doc + BDD Cluster + +### Already done ✅ + +| File | Status | +|---|---| +| `.github/agents/living-doc-copilot.agent.md` | ✅ | +| `skills/living-doc-create-feature/SKILL.md` | ✅ | +| `skills/living-doc-create-functionality/SKILL.md` | ✅ | +| `skills/living-doc-create-user-story/SKILL.md` | ✅ | +| `skills/living-doc-gap-finder/SKILL.md` | ✅ | +| `skills/living-doc-impact-analysis/SKILL.md` | ✅ | +| `skills/living-doc-update/SKILL.md` | ✅ | + +### Still required ❌ + +| File | Type | +|---|---| +| `skills/living-doc-pageobject-scan/SKILL.md` | Skill — migrate from `.copilot/skills/` | +| `skills/living-doc-scenario-creator/SKILL.md` | Skill — migrate | +| `skills/gherkin-scenario/SKILL.md` | Skill — migrate | +| `skills/gherkin-step/SKILL.md` | Skill — migrate | +| `skills/gherkin-living-doc-sync/SKILL.md` | Skill — migrate | +| `.github/agents/bdd-copilot.agent.md` | Agent — author from spec | + +### Agent outline: `@bdd-copilot` + +**Frontmatter:** + +```yaml +--- +description: > + Bridge living documentation to executable tests. Explore web apps via MCP Playwright, + generate and maintain PageObjects, Gherkin scenarios, and step definitions. 
+ Handles Phase 0+1 (Business Seed + exploration), Phase 3 (scenario generation), + Phase 6 maintenance (RE-SCAN, HEALING, REMOVE). + Triggers: "scan webapp", "generate pageobjects", "heal pageobjects", "generate scenarios", + "sync gherkin", "playwright crawl", "explore the app", "bdd copilot", "BDD pipeline". +tools: + - read_file + - replace_string_in_file + - create_file + - grep_search + - file_search + - semantic_search + - run_in_terminal + - mcp_microsoft_pla_browser_navigate + - mcp_microsoft_pla_browser_snapshot + - mcp_microsoft_pla_browser_click + - mcp_microsoft_pla_browser_fill_form + - mcp_microsoft_pla_browser_take_screenshot + - mcp_microsoft_pla_browser_type + - mcp_microsoft_pla_browser_wait_for +--- +``` + +**Required body sections:** + +1. **Phase 0 — Business Seed assembly** + - Sources A–E with behaviour per source + - Credential rule: `env:VAR_NAME` in `seed.yaml` always, never literal values + - Output artifact: `.copilot/bdd/seed.yaml` + +2. **Phase 1 — Iterative exploration** + - Load `seed.yaml` + `manifest.json` (if present from prior run); absent manifest = first iteration + - Crawl loop until coverage plateau (no new surfaces in last iteration) + - Report unreachable areas → enrich seed → loop + - Output artifact: `.copilot/bdd/manifest.json` (Feature name, URL, component IDs, PageObject path) + +3. **Source E — Guided traversal protocol** + - Pause at unknown decision points, take screenshot, ask user + - Immediately append to `guided_steps:` in `seed.yaml`: `url`, `action`, `field`, `value` (`env:VAR` if sensitive), `note` + - CAPTCHA rule: pause, wait for human to solve in browser, continue; still record the step + +4. **Phase 3 — Scenario generation**: gap detection vs existing scenarios; generate via `living-doc-scenario-creator`; write step definitions; extend PageObjects + +5. 
**Phase 6 — Maintenance**: RE-SCAN (new feature/refactor), HEALING (test failures/selector drift), REMOVE (deprecated feature) — triggers and behaviour per mode + +6. **Scope** (10 bullets from spec): + - Load Business Seed + Exploration Manifest before crawling + - Crawl web app via MCP Playwright using manifest-guided navigation + - Fill forms and traverse wizards using business-supplied test values + - Identify Features from discovered UI surfaces + - Detect scenario gaps (existing scenarios vs US ACs) + - Generate Gherkin scenarios from User Story ACs + - Write and extend step definitions + - Heal PageObjects after UI changes (MCP Playwright drift detection) + - Challenge US/AC validity when app behaviour has changed + - Sync Gherkin feature files with living doc + +7. **Does NOT**: create living doc entities (→ `@living-doc-copilot`); write unit/integration tests (→ `@sdet-copilot`); run quality gates (→ `@quality-gate-copilot`) + +8. **Shared skill note**: `living-doc-gap-finder` is used bottom-up here (scenario coverage for known ACs) vs top-down in `@living-doc-copilot` (missing documentation) + +9. **Skills** (6): `living-doc-pageobject-scan`, `living-doc-scenario-creator`, `living-doc-gap-finder`, `gherkin-scenario`, `gherkin-step`, `gherkin-living-doc-sync` — each with path `skills/{name}/SKILL.md` + +10. **Handoff out** (two paths): + - Feature list → `@living-doc-copilot` to document + - After Phase 3: *"Feature files and steps generated. Call @sdet-copilot for unit tests."* + +--- + +### Issue 1.A — Complete remaining Step 1 skills + +**Title:** `[Step 1] Migrate remaining living-doc BDD + gherkin skills` + +**Body:** + +``` +## Summary + +Six skills remain from the Living Doc + BDD cluster. Must ship in the same PR as Issue 1.B +(spec rule: never transfer skills without the agent definition). 
+ +## Files to create + +| Destination | Source | +|---|---| +| `skills/living-doc-pageobject-scan/SKILL.md` | `.copilot/skills/living-doc-pageobject-scan/SKILL.md` | +| `skills/living-doc-scenario-creator/SKILL.md` | `.copilot/skills/living-doc-scenario-creator/SKILL.md` | +| `skills/gherkin-scenario/SKILL.md` | `.copilot/skills/gherkin-scenario/SKILL.md` | +| `skills/gherkin-step/SKILL.md` | `.copilot/skills/gherkin-step/SKILL.md` | +| `skills/gherkin-living-doc-sync/SKILL.md` | `.copilot/skills/gherkin-living-doc-sync/SKILL.md` | + +## Acceptance criteria + +- [ ] All 6 folder names match their `name` frontmatter fields exactly +- [ ] All `description` fields ≤ 1024 chars and include trigger keywords +- [ ] `living-doc-gap-finder` description (already migrated) notes dual shared-skill usage + — verify it covers both top-down (@living-doc-copilot) and bottom-up (@bdd-copilot) +- [ ] `gherkin-scenario` description notes optional @sdet-copilot usage at unit level +- [ ] All bodies < 500 lines (use `references/` with pointer if needed) +- [ ] No hardcoded credentials or absolute local paths +- [ ] Closed in same PR as Issue 1.B + +## Reference + +Spec → Agent Catalog → @bdd-copilot skills table +``` + +--- + +### Issue 1.B — Create @bdd-copilot agent + +**Title:** `[Step 1] Create @bdd-copilot agent definition` + +**Body:** + +``` +## Summary + +Author `.github/agents/bdd-copilot.agent.md` — automation-layer agent for web app +exploration (Phases 0+1), BDD scenario generation (Phase 3), and maintenance (Phase 6). + +## File to create + +`.github/agents/bdd-copilot.agent.md` + +## Required frontmatter + +See roadmap.md → Step 1 → Agent outline: @bdd-copilot for full frontmatter block. +Key requirement: all mcp_microsoft_pla_browser_* tools must be listed. + +## Required body sections + +1. Phase 0 — Business Seed assembly (Sources A–E; credential safety; seed.yaml output) +2. 
Phase 1 — Iterative exploration (load seed + manifest; plateau detection; manifest.json output) +3. Partial state — seed.yaml present but manifest.json absent = treat as first Phase 1 run +4. Source E — Guided traversal (pause/screenshot/ask/execute/write guided_steps; CAPTCHA rule) +5. Phase 3 — Scenario generation with gap detection +6. Phase 6 — Maintenance (RE-SCAN / HEALING / REMOVE) +7. Scope — 10 bullets +8. Does NOT — with redirect targets +9. Shared skill note for living-doc-gap-finder +10. Skills table — 7 entries with paths +11. Handoff out — two paths; prompts verbatim from spec + +## Acceptance criteria + +- [ ] All MCP Playwright tools listed in frontmatter +- [ ] Sources A–E documented with exact behaviour per source +- [ ] Credential safety rule present (env:VAR_NAME, never literal) +- [ ] Partial state handling documented +- [ ] Guided traversal protocol includes CAPTCHA pause-and-wait +- [ ] Phase 6 all three maintenance modes documented +- [ ] All 7 skills referenced as `skills/{name}/SKILL.md` +- [ ] Handoff prompt exact: "Feature files and steps generated. Call @sdet-copilot for unit tests." +- [ ] Closed in same PR as Issue 1.A +``` + +--- + +### Planned agent: `@living-doc-bdd-tutorial-copilot` + +The tutorial generation capability (previously `living-doc-tutorial-creator` skill) will ship as +a dedicated agent rather than as part of `@living-doc-bdd-copilot`. It will own the full +tutorial authoring pipeline: transform executed BDD scenarios into annotated tutorial documents, +SSML narration scripts, and onboarding walkthroughs. 
+ +| Attribute | Value | +|---|---| +| Agent file | `.github/agents/living-doc-bdd-tutorial-copilot.agent.md` | +| Skill | `skills/living-doc-tutorial-creator/SKILL.md` — migrate from `.copilot/skills/` | +| Inbound trigger | Executed `.feature` files + optional screenshots | +| Output | Annotated tutorial `.md`, SSML narration script | +| Step | Separate step (not yet scheduled) | + +--- + +## Step 2 — SDET Cluster + +### Files to create + +| File | Type | +|---|---| +| `skills/tdd-workflow/SKILL.md` | Skill — migrate from `.copilot/skills/` | +| `skills/test-unit-write/SKILL.md` | Skill — migrate | +| `skills/test-unit-review/SKILL.md` | Skill — migrate | +| `skills/test-unit-standards/SKILL.md` | Skill — migrate | +| `skills/test-case-design/SKILL.md` | Skill — migrate | +| `skills/test-data-management/SKILL.md` | Skill — migrate | +| `skills/test-mocking-patterns/SKILL.md` | Skill — migrate | +| `.github/agents/sdet-copilot.agent.md` | Agent — author from spec | + +### Agent outline: `@sdet-copilot` + +**Frontmatter:** + +```yaml +--- +description: > + Daily developer test-engineering companion. Use for: TDD red-green-refactor, writing + unit and integration tests, reviewing existing test files, designing test case tables, + managing test data and fixtures, choosing test doubles. Phase 4 of the engineering + pipeline. Triggers: "write tests", "TDD", "review my tests", "test doubles", + "test data", "red-green-refactor", "sdet copilot", "write unit tests", + "add tests for", "design test cases", "add coverage for". +tools: + - read_file + - replace_string_in_file + - create_file + - grep_search + - file_search + - semantic_search +--- +``` + +**Required body sections:** + +1. 
**Technology-neutral escalation constraint** (4-step): express guidance language-agnostic first; if language-specific tooling required ask *"What is your target technology / language?"*; recommend escalating to `@quality-gate-copilot` with the matching language skill (`qa-python`, `qa-java`, `qa-scala`, `qa-typescript`, `qa-dotnet`); if no match, provide generic guidance and note the gap + +2. **Scope** (6 bullets): TDD workflow; write unit and integration test code; review and audit test files; design test case tables; manage test data; choose test doubles + +3. **Does NOT**: run CI quality gates (→ `@quality-gate-copilot`); write Gherkin/BDD *as standalone BDD pipeline deliverables* (→ `@bdd-copilot`; `gherkin-scenario` available optionally at unit level); handle specialised test types — accessibility, security, E2E, API (→ `@test-specialist-copilot`); improve test quality depth — mutation, property-based, flakiness (→ `@test-quality-copilot`) + +4. **Skills** (7): `tdd-workflow`, `test-unit-write`, `test-unit-review`, `test-unit-standards`, `test-case-design`, `test-data-management`, `test-mocking-patterns`; note `gherkin-scenario` as optional 8th when team uses BDD at unit level + +5. **Handoff out**: *"Tests written. Run @quality-gate-copilot to enforce the gate."* + +--- + +### Issue 2.1 — SDET cluster (skills + agent) + +**Title:** `[Step 2] SDET cluster — migrate 7 skills and create @sdet-copilot agent` + +**Body:** + +``` +## Summary + +Migrate 7 SDET skills and author the @sdet-copilot agent definition as a single PR. 
+ +## Files to create + +skills/tdd-workflow/SKILL.md, skills/test-unit-write/SKILL.md, +skills/test-unit-review/SKILL.md, skills/test-unit-standards/SKILL.md, +skills/test-case-design/SKILL.md, skills/test-data-management/SKILL.md, +skills/test-mocking-patterns/SKILL.md, .github/agents/sdet-copilot.agent.md + +## Acceptance criteria — skills + +- [ ] `tdd-workflow` body references SPEC.md-first pattern in the red-green-refactor cycle +- [ ] `test-unit-standards`, `test-unit-write`, and `test-unit-review` cross-reference each other + correctly (rule set vs procedural write vs procedural review distinction) +- [ ] All bodies < 500 lines; all descriptions ≤ 1024 chars +- [ ] All folder names match `name` frontmatter exactly + +## Acceptance criteria — agent + +- [ ] Escalation path says "recommend @quality-gate-copilot", not "load qa-* skill internally" +- [ ] Gherkin Does NOT entry is qualified: "standalone BDD pipeline deliverable"; + optional unit-level exception noted +- [ ] All 7 skills referenced by path `skills/{name}/SKILL.md` +- [ ] Handoff prompt exact: "Tests written. Run @quality-gate-copilot to enforce the gate." + +## Reference + +Spec → Agent Catalog → @sdet-copilot +``` + +--- + +## Step 3 — Code Quality Cluster + +### Files to create + +| File | Type | +|---|---| +| `skills/qa-python/SKILL.md` | Skill — migrate from `.copilot/skills/` | +| `skills/qa-java/SKILL.md` | Skill — migrate | +| `skills/qa-scala/SKILL.md` | Skill — migrate | +| `skills/qa-typescript/SKILL.md` | Skill — migrate | +| `skills/qa-dotnet/SKILL.md` | Skill — migrate | +| `skills/qa-terraform/SKILL.md` | Skill — migrate | +| `skills/test-coverage-gate/SKILL.md` | Skill — migrate | +| `.github/agents/quality-gate-copilot.agent.md` | Agent — author from spec | + +### Agent outline: `@quality-gate-copilot` + +**Frontmatter:** + +```yaml +--- +description: > + Enforce code quality standards — diagnose and fix CI quality gate failures across all + languages and stacks. 
Use for: linting, formatting, static analysis violations, coverage + thresholds, Javadoc, type annotations, and logging standards. Phase 5 of the pipeline. + Triggers: "quality gate", "CI failing", "coverage below", "lint error", "scalafmt", + "pylint", "quality gate copilot", "fix linting", "coverage threshold", "SpotBugs", + "ESLint violation", "dotnet format", "tflint failure". +tools: + - read_file + - grep_search + - file_search + - semantic_search + - run_in_terminal +--- +``` + +**Required body sections:** + +1. **Scope** (5 bullets): run/fix linting and formatting; fix static analysis violations; configure and enforce coverage thresholds; diagnose CI gate failures per language; apply logging/Javadoc/type annotation standards + +2. **Language routing table** — maps language/stack to skill and path: + + | Language | Skill | Path | + |---|---|---| + | Python | `qa-python` | `skills/qa-python/SKILL.md` | + | Java | `qa-java` | `skills/qa-java/SKILL.md` | + | Scala | `qa-scala` | `skills/qa-scala/SKILL.md` | + | TypeScript / JS | `qa-typescript` | `skills/qa-typescript/SKILL.md` | + | C# / .NET | `qa-dotnet` | `skills/qa-dotnet/SKILL.md` | + | HCL / Terraform | `qa-terraform` | `skills/qa-terraform/SKILL.md` | + | All (coverage) | `test-coverage-gate` | `skills/test-coverage-gate/SKILL.md` | + +3. **Does NOT**: write test code (→ `@sdet-copilot`); handle mutation testing strategy (→ `@test-quality-copilot`); author IaC modules (→ `cps-iac` in `cps-agentic-skills`, not this repo) + +4. **Skills** (7): language column + intent label + path + +--- + +### Issue 3.1 — Code Quality cluster (skills + agent) + +**Title:** `[Step 3] Code Quality cluster — migrate 7 skills and create @quality-gate-copilot agent` + +**Body:** + +``` +## Summary + +Migrate 7 code quality skills and author the @quality-gate-copilot agent as a single PR. 
+ +## Files to create + +skills/qa-python/SKILL.md, skills/qa-java/SKILL.md, skills/qa-scala/SKILL.md, +skills/qa-typescript/SKILL.md, skills/qa-dotnet/SKILL.md, skills/qa-terraform/SKILL.md, +skills/test-coverage-gate/SKILL.md, .github/agents/quality-gate-copilot.agent.md + +## Acceptance criteria — skills + +- [ ] `qa-scala` body covers JMF filter requirement for JaCoCo +- [ ] `test-coverage-gate` distinguishes baseline measurement (no CI block) from + new-code gate (hard fail) +- [ ] Each `qa-*` description includes language-specific trigger keywords +- [ ] All folder names match `name` frontmatter exactly; all descriptions ≤ 1024 chars + +## Acceptance criteria — agent + +- [ ] Language routing table covers all 5 languages + HCL + cross-language coverage +- [ ] `run_in_terminal` present in tools (this agent executes commands) +- [ ] IaC redirect points to `cps-iac` in `cps-agentic-skills`, not this plugin +- [ ] All 7 skills referenced by path + +## Reference + +Spec → Agent Catalog → @quality-gate-copilot +``` + +--- + +## Step 4 — Test Specialist Cluster + +### Files to create + +| File | Type | +|---|---| +| `skills/test-accessibility/SKILL.md` | Skill — migrate from `.copilot/skills/` | +| `skills/test-api-standards/SKILL.md` | Skill — migrate | +| `skills/test-e2e-standards/SKILL.md` | Skill — migrate | +| `skills/test-integration-standards/SKILL.md` | Skill — migrate | +| `skills/test-ui-standards/SKILL.md` | Skill — migrate | +| `skills/test-security/SKILL.md` | Skill — migrate | +| `.github/agents/test-specialist-copilot.agent.md` | Agent — author from spec | + +### Agent outline: `@test-specialist-copilot` + +**Frontmatter:** + +```yaml +--- +description: > + Apply specialised testing for specific test types beyond standard unit tests. + Use for: accessibility (axe-core, WCAG 2.1 AA), API and Pact contract tests, + cross-service E2E, Testcontainers integration isolation, Angular/React/Cypress UI + tests, and SAST/DAST security scanning. 
Triggers: "a11y test", "Pact", + "E2E standards", "security scan", "Cypress", "Testcontainers", "accessibility", + "contract test", "test specialist copilot", "UI tests", "integration isolation". +tools: + - read_file + - replace_string_in_file + - create_file + - grep_search + - file_search + - semantic_search + - run_in_terminal +--- +``` + +**Required body sections:** + +1. **Specialisation routing table**: + + | Concern | Skill | Path | + |---|---|---| + | Accessibility / WCAG | `test-accessibility` | `skills/test-accessibility/SKILL.md` | + | REST + contract (Pact) | `test-api-standards` | `skills/test-api-standards/SKILL.md` | + | Cross-service E2E | `test-e2e-standards` | `skills/test-e2e-standards/SKILL.md` | + | Testcontainers / DB isolation | `test-integration-standards` | `skills/test-integration-standards/SKILL.md` | + | Angular / React / Cypress | `test-ui-standards` | `skills/test-ui-standards/SKILL.md` | + | SAST / DAST / dep scanning | `test-security` | `skills/test-security/SKILL.md` | + +2. **Scope** (6 bullets from spec) + +3. **Does NOT**: write standard unit tests (→ `@sdet-copilot`); run language-specific quality gates (→ `@quality-gate-copilot`); write BDD scenarios (→ `@bdd-copilot`); improve test quality depth (→ `@test-quality-copilot`) + +4. **Skills** (6): Specialisation column + path + +--- + +### Issue 4.1 — Test Specialist cluster (skills + agent) + +**Title:** `[Step 4] Test Specialist cluster — migrate 6 skills and create @test-specialist-copilot agent` + +**Body:** + +``` +## Summary + +Migrate 6 Test Specialist skills and author the @test-specialist-copilot agent as a single PR. 
+ +## Files to create + +skills/test-accessibility/SKILL.md, skills/test-api-standards/SKILL.md, +skills/test-e2e-standards/SKILL.md, skills/test-integration-standards/SKILL.md, +skills/test-ui-standards/SKILL.md, skills/test-security/SKILL.md, +.github/agents/test-specialist-copilot.agent.md + +## Acceptance criteria — skills + +- [ ] `test-accessibility` covers axe-core, jest-axe, cypress-axe, WCAG 2.1 AA +- [ ] `test-api-standards` covers Pact consumer-driven contracts +- [ ] `test-e2e-standards` is clearly distinguished from `test-ui-standards` + (cross-service boundary vs UI-only) +- [ ] `test-integration-standards` covers Testcontainers and isolation/cleanup rules +- [ ] `test-security` covers SAST (Bandit/Semgrep), DAST (ZAP), dep scanning (Snyk/pip-audit) +- [ ] All folder names match `name` frontmatter; all descriptions ≤ 1024 chars + +## Acceptance criteria — agent + +- [ ] Specialisation routing table covers all 6 concerns +- [ ] Does NOT list distinguishes from all four other test agents +- [ ] All 6 skills referenced by path + +## Reference + +Spec → Agent Catalog → @test-specialist-copilot +``` + +--- + +## Step 4b — Test Quality Cluster + +### Files to create + +| File | Type | +|---|---| +| `skills/test-mutation/SKILL.md` | Skill — migrate from `.copilot/skills/` | +| `skills/test-property-based/SKILL.md` | Skill — migrate | +| `skills/test-flakiness-diagnosis/SKILL.md` | Skill — migrate | +| `skills/test-observability/SKILL.md` | Skill — migrate | +| `.github/agents/test-quality-copilot.agent.md` | Agent — author from spec | + +### Agent outline: `@test-quality-copilot` + +**Frontmatter:** + +```yaml +--- +description: > + Improve depth and reliability of existing tests — the quality improvement layer applied + after baseline tests are in place. 
Use for: mutation score improvement (mutmut, PIT, + Stryker), property-based testing (Hypothesis, ScalaCheck, fast-check), flaky test + diagnosis and repair, and observability test assertions (logs, metrics, OTel traces). + Triggers: "mutation", "Hypothesis", "flaky test", "test logs", "test metrics", + "surviving mutants", "property-based", "test quality copilot", "improve test quality". +tools: + - read_file + - replace_string_in_file + - create_file + - grep_search + - file_search + - semantic_search + - run_in_terminal +--- +``` + +**Required body sections:** + +1. **Prerequisite note**: called after `@sdet-copilot` has baseline coverage; this is a depth-improvement layer, not a test-writing starter + +2. **Scope** (4 bullets): mutation testing; property-based testing; flaky test diagnosis and repair; observability assertions + +3. **Does NOT**: write new test suites from scratch (→ `@sdet-copilot`); enforce CI quality gates (→ `@quality-gate-copilot`); handle specialised test types (→ `@test-specialist-copilot`) + +4. **Skills** (4): `test-mutation`, `test-property-based`, `test-flakiness-diagnosis`, `test-observability` — Specialisation column + path each + +5. **Handoff out**: None — this agent is terminal. Quality improvements are applied in place; no downstream phase requires a handoff. + +--- + +### Issue 4b.1 — Test Quality cluster (skills + agent) + +**Title:** `[Step 4b] Test Quality cluster — migrate 4 skills and create @test-quality-copilot agent` + +**Body:** + +``` +## Summary + +Migrate 4 Test Quality skills and author the @test-quality-copilot agent as a single PR. +This is a depth-improvement layer; call after @sdet-copilot has established baseline coverage. 
+ +## Files to create + +skills/test-mutation/SKILL.md, skills/test-property-based/SKILL.md, +skills/test-flakiness-diagnosis/SKILL.md, skills/test-observability/SKILL.md, +.github/agents/test-quality-copilot.agent.md + +## Acceptance criteria — skills + +- [ ] `test-mutation` covers mutmut (Python), PIT (Java/Scala), Stryker (JS/TS) +- [ ] `test-property-based` covers Hypothesis, ScalaCheck, fast-check +- [ ] `test-flakiness-diagnosis` covers async timing, shared state, CI environment differences +- [ ] `test-observability` covers structured log assertions, prometheus_client fake registry, + InMemorySpanExporter for OTel spans +- [ ] All folder names match `name` frontmatter; all descriptions ≤ 1024 chars + +## Acceptance criteria — agent + +- [ ] Prerequisite note present: depth-improvement layer, not starter +- [ ] Handoff out section present and states this agent is terminal (no downstream phase) +- [ ] All 4 skills referenced by path + +## Reference + +Spec → Agent Catalog → @test-quality-copilot +``` + +--- + +## Step 5 — Standalone Skills + +No agent file — these 3 skills are below the 3-skill minimum for a dedicated agent per governance rules. + +### Files to create + +| File | Type | +|---|---| +| `skills/pr-review/SKILL.md` | Skill — migrate from `.copilot/skills/` | +| `skills/contract-openapi/SKILL.md` | Skill — migrate | +| `skills/contract-schema-registry/SKILL.md` | Skill — migrate | + +> **Future note:** `contract-openapi` + `contract-schema-registry` are candidates for a +> `@devops-copilot` agent when IaC skills are consolidated here. Until then, standalone. + +--- + +### Issue 5.1 — Standalone skills + +**Title:** `[Step 5] Migrate standalone skills: pr-review, contract-openapi, contract-schema-registry` + +**Body:** + +``` +## Summary + +Migrate 3 standalone skills — final migration step. No agent file required. 
+ +## Files to create + +| Destination | Source | +|---|---| +| `skills/pr-review/SKILL.md` | `.copilot/skills/pr-review/SKILL.md` | +| `skills/contract-openapi/SKILL.md` | `.copilot/skills/contract-openapi/SKILL.md` | +| `skills/contract-schema-registry/SKILL.md` | `.copilot/skills/contract-schema-registry/SKILL.md` | + +## Acceptance criteria + +- [ ] `pr-review` description states it is language-agnostic +- [ ] `contract-openapi` description explicitly distinguishes from `contract-schema-registry` + (REST/OpenAPI vs event schema registry) +- [ ] `contract-schema-registry` description explicitly distinguishes from `test-api-standards` + (schema registry vs Pact consumer-driven contracts) +- [ ] All folder names match `name` frontmatter; all descriptions ≤ 1024 chars +- [ ] No hardcoded internal paths or credentials + +## Reference + +Spec → Standalone Skills section; Governance Rules → Agent scope (Skills < 3 → standalone) +``` + +--- + +## Post-implementation checklist + +After all steps are merged: + +- [ ] Update `docs/README.md` — add rows to the Skill Guides table for each skill that has a companion guide +- [ ] Update `README.md` — add the full skill catalog and agent roster +- [ ] Verify summary totals: 39 unique skill files, 6 agent files +- [ ] Add evals for at least one skill per cluster under `skills/{name}/evals/` (see `docs/testing/skill-testing.md`) +- [ ] Run trigger accuracy test for shared skills (`living-doc-gap-finder`, `gherkin-scenario`) to verify correct agent activates for each intent +- [ ] Confirm `@bdd-copilot` MCP Playwright tools are available in the target deployment environment +- [ ] Cross-check all agent handoff prompts match each other: + - `@bdd-copilot (explore)` → *"Surfaces mapped. Call @living-doc-copilot to document them."* + - `@living-doc-copilot` → *"US and ACs are ready. Call @bdd-copilot to generate scenarios."* + - `@bdd-copilot` → *"Feature files and steps generated. 
Call @sdet-copilot for unit tests."* + - `@sdet-copilot` → *"Tests written. Run @quality-gate-copilot to enforce the gate."* + - `@quality-gate-copilot` → *"Gate green. Pipeline complete."* + - `@test-quality-copilot` → terminal (no handoff) diff --git a/skills/gherkin-living-doc-sync/SKILL.md b/skills/gherkin-living-doc-sync/SKILL.md new file mode 100644 index 0000000..96d4080 --- /dev/null +++ b/skills/gherkin-living-doc-sync/SKILL.md @@ -0,0 +1,136 @@ +--- +name: gherkin-living-doc-sync +description: > + Synchronise Gherkin feature files and BDD scenarios with the living documentation catalog. + Activate when scenarios diverge from User Story ACs, when step text drifts after a UI + refactor, or when AC link headers are missing or stale. Distinct from gap-finder (which + detects missing coverage) — corrects existing links. + Triggers on: "sync gherkin to living doc", "feature file out of sync", "scenario not linked + to AC", "step text changed", "gherkin drift", "update living doc after BDD change", + "BDD sync", "AC link missing in feature file", "sync scenarios", + "gherkin out of sync with living doc", "traceability broken". + Does NOT trigger for: writing new scenarios (use gherkin-scenario), implementing step + definitions (use gherkin-step), finding living doc gaps (use living-doc-gap-finder), + creating new US/Feature entities (use living-doc-create-user-story). +--- + +# Gherkin ↔ Living Doc Sync + +> **Glossary:** Feature, Functionality, User Story, AC — see [living-doc-glossary](../references/living-doc-glossary.md) ([remote](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md)). + +Sync runs in three directions: (1) feature file → living doc, (2) living doc AC → feature file, +(3) step text → PageObject method signature. + +Use `scripts/scan_ac_links.py` to detect missing or malformed `# AC:` headers before a full +sync run. 
+ +--- + +## Step 1 — Detect the sync direction + +| Change event | Sync direction | Action | +|---|---|---| +| New `.feature` file added | Feature file → living doc | Link each scenario to an AC; create AC if missing | +| User Story AC modified or added | Living doc → feature file | Update or add the corresponding scenario | +| UI refactored (selector / method renamed) | Step text → PageObject | Update step text; re-link to PageObject method | +| US deprecated | Living doc → feature file | Mark linked scenarios as `@deprecated` or remove | +| Scenario added without an AC comment | Feature file → living doc | Propose an AC and add the `# AC:` header | + +--- + +## Step 2 — Audit AC link headers + +**Required AC link format** (from the glossary): + +```gherkin +# AC: US-001-01 (v1.0.0 – Active) — Customer places an order +Scenario: Customer successfully places an order +``` + +- AC ID format: `AC:-` — e.g. `AC:US-001-01`, `AC:FUNC-001-02` +- The `# AC:` comment(s) must appear on the lines immediately above `Scenario:` or `Scenario Outline:`. Multiple `# AC:` lines are allowed — a scenario may cover more than one AC, and annotation comments (e.g. `# @tag`, free-text notes) may also appear in the block. + +**Audit checklist:** +1. Does every `Scenario:` / `Scenario Outline:` have a `# AC:` comment? +2. Does the referenced AC ID exist in the living doc catalog? +3. Does the AC state match (`Active` or `Implemented` — not `Deprecated` or `Planned`)? +4. Does the AC description match the scenario intent? 
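The header rule above allows several `# AC:` lines, plus annotation comments, in the block directly above a scenario. A minimal sketch of collecting that block is shown here. It assumes the block ends at the first non-comment line (a blank line or a Gherkin tag line), and `ac_refs_above` is a hypothetical helper, not part of `scripts/scan_ac_links.py`.

```python
import re

# Same header shape as the required format: "# AC: US-001-01 (v1.0.0 ...) ..."
AC_COMMENT = re.compile(r"^\s*#\s*AC:\s*(\S+)", re.IGNORECASE)


def ac_refs_above(lines: list[str], scenario_index: int) -> list[str]:
    """Return AC references from the comment block directly above a scenario line."""
    refs: list[str] = []
    i = scenario_index - 1
    # Walk upwards while still inside the contiguous comment block.
    while i >= 0 and lines[i].strip().startswith("#"):
        match = AC_COMMENT.match(lines[i])
        if match:
            refs.append(match.group(1).rstrip(",;"))
        i -= 1
    refs.reverse()  # restore top-to-bottom order
    return refs
```

An empty result corresponds to audit question 1 (no `# AC:` comment); each returned reference can then be checked against the catalog for questions 2 to 4.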
+ +For each missing or mismatched link: + +``` +SYNC ACTION: checkout.feature:14 + Scenario: "Customer successfully places an order" + → Missing AC link header + → Proposed link: # AC: US-001-01 (v1.0.0 – Active) — Customer places an order + → Confirm or select a different AC +``` + +--- + +## Step 3 — Detect step text drift + +When step text changes after a UI refactor, the step definition binding breaks: + +``` +DRIFT DETECTED: checkout.feature:17 + Step: "When the customer clicks the Confirm Purchase button" + → No matching step definition found + → Previous match: "When the customer confirms the order" (checkout_steps.py:34) + → PageObject method: CheckoutPage.confirm_order() + → Suggested fix: update step text to "When the customer confirms the order" + OR update the step definition regex to match the new wording +``` + +--- + +## Step 4 — Apply sync changes + +Apply the minimum necessary change per action: + +- **Add missing AC link** → insert `# AC: (v) — ` above `Scenario:` +- **Update stale AC description** → update comment text; do not change the AC ID +- **Update scenario to match revised AC** → update step text; keep the `# AC:` link unchanged +- **Fix broken step text** → update the `.feature` file to match the step definition +- **Mark deprecated scenarios** → add `@deprecated` tag and a comment with the reason +- **AC split into multiple ACs** → update the existing scenario's `# AC:` link to the primary AC; create new scenarios for additional ACs + +Never delete a scenario during sync — flag it with `@review-needed` for developer decision. 
+ +--- + +## Step 5 — Output sync report + +``` +SYNC REPORT — 2026-05-22 + Applied automatically (3): + checkout.feature:14 — added AC link header AC:US-001-01 + checkout.feature:28 — updated AC description (AC:US-001-02) + login.feature:7 — fixed step text drift → "When the user submits valid credentials" + + Requires manual review (1): + checkout.feature:45 — Scenario "Apply promo and checkout" has no matching AC + → Either create a new AC in US-001, or remove this scenario if it is obsolete +``` + +--- + +## Anti-patterns to flag + +| Anti-pattern | Flag | +|---|---| +| Scenario with no AC link | Missing traceability — add link or create AC | +| Two scenarios linked to the same AC | Usually a duplicate — review | +| AC linked from a scenario in a different User Story's feature file | Passive cross-US coverage — permitted but note it in the sync report. Only flag if the scenario's primary intent belongs to a different User Story (misplaced scenario) | +| Step text describes implementation (selector, endpoint) | Gherkin business-language violation — refer to `gherkin-scenario` | + +--- + +## Out-of-scope routing + +| Request | Use instead | +|---|---| +| Writing new Gherkin scenarios from scratch | `gherkin-scenario` | +| Implementing step definition code | `gherkin-step` | +| Finding ACs with no scenario coverage | `living-doc-gap-finder` | +| Creating new User Story or Feature entities | `living-doc-create-user-story` | diff --git a/skills/gherkin-living-doc-sync/scripts/scan_ac_links.py b/skills/gherkin-living-doc-sync/scripts/scan_ac_links.py new file mode 100644 index 0000000..25a5c6f --- /dev/null +++ b/skills/gherkin-living-doc-sync/scripts/scan_ac_links.py @@ -0,0 +1,138 @@ +#!/usr/bin/env python3 +""" +scan_ac_links.py — scan .feature files for missing or malformed AC link headers. 
+ +Usage: + python scan_ac_links.py + +For every Scenario: / Scenario Outline: line found in .feature files, checks that: + - A '# AC: ' comment appears on the line immediately above it + - The AC ID follows the canonical format: AC:- + e.g. AC:US-001-01, AC:FEAT-003-02, AC:FUNC-001-03 + - No two scenarios in the same file reference the same AC ID (duplicate check) + +Exit code: 0 if all checks pass, 1 if any issues are found. + +Glossary reference: skills/references/living-doc-glossary.md +""" + +import re +import sys +from pathlib import Path + +# Matches: # AC: US-001-01 (v1.0.0 – Active) — description +# or simpler: # AC: US-001-01 +AC_COMMENT = re.compile(r"^\s*#\s*AC:\s*(\S+)", re.IGNORECASE) +# Canonical AC ID: AC:- where parent is US-nnn, FEAT-nnn, or FUNC-nnn. +# The "AC:" prefix is optional here: AC_COMMENT already consumes it from the header, +# so the captured reference is normally the bare ID (e.g. US-001-01). +AC_ID_FORMAT = re.compile(r"^(?:AC:)?(US|FEAT|FUNC)-\d{3}-\d{2}$", re.IGNORECASE) +SCENARIO_LINE = re.compile(r"^\s*(Scenario:|Scenario Outline:)\s*(.+)", re.IGNORECASE) + + +def scan_file(path: Path) -> list[dict]: + issues = [] + lines = path.read_text(encoding="utf-8").splitlines() + seen: dict[str, list[int]] = {} + + for i, line in enumerate(lines): + if not SCENARIO_LINE.match(line): + continue + + lineno = i + 1 + scenario_title = SCENARIO_LINE.match(line).group(2).strip() + prev = lines[i - 1].strip() if i > 0 else "" + ac_match = AC_COMMENT.match(prev) + + if not ac_match: + issues.append({ + "file": str(path), + "line": lineno, + "scenario": scenario_title, + "issue": "missing_ac_link", + "detail": "No '# AC: ' comment on the line immediately above this scenario.", + }) + continue + + ac_ref = ac_match.group(1).rstrip(",;") + + if not AC_ID_FORMAT.match(ac_ref): + issues.append({ + "file": str(path), + "line": lineno, + "scenario": scenario_title, + "issue": "malformed_ac_id", + "detail": ( + f"'{ac_ref}' does not match AC:- format " + "(e.g. AC:US-001-01, AC:FEAT-003-02)." 
+ ), + }) + continue + + seen.setdefault(ac_ref.upper(), []).append(lineno) + + for ac_id, lines_found in seen.items(): + if len(lines_found) > 1: + issues.append({ + "file": str(path), + "line": lines_found, + "scenario": None, + "issue": "duplicate_ac_link", + "detail": ( + f"AC '{ac_id}' is linked from {len(lines_found)} scenarios " + f"at lines {lines_found}. Each AC should map to at most one scenario." + ), + }) + + return issues + + +def main(features_dir: str) -> None: + root = Path(features_dir) + if not root.exists(): + print(f"Error: directory not found: {features_dir}") + sys.exit(1) + + feature_files = sorted(root.rglob("*.feature")) + if not feature_files: + print(f"No .feature files found under {features_dir}") + return + + all_issues: list[dict] = [] + for f in feature_files: + all_issues.extend(scan_file(f)) + + if not all_issues: + print(f"✅ All {len(feature_files)} feature file(s) pass AC link checks.") + return + + by_type: dict[str, list] = {} + for issue in all_issues: + by_type.setdefault(issue["issue"], []).append(issue) + + print(f"Found {len(all_issues)} issue(s) in {len(feature_files)} feature file(s):\n") + + labels = { + "missing_ac_link": "MISSING AC LINK", + "malformed_ac_id": "MALFORMED AC ID", + "duplicate_ac_link": "DUPLICATE AC LINK", + } + + for issue_type, items in sorted(by_type.items()): + print(f"{'=' * 60}") + print(f" {labels.get(issue_type, issue_type)} ({len(items)})") + print(f"{'=' * 60}") + for item in items: + loc = item["line"] if isinstance(item["line"], int) else item["line"][0] + print(f" {item['file']}:{loc}") + if item.get("scenario"): + print(f" Scenario: {item['scenario']}") + print(f" → {item['detail']}") + print() + + sys.exit(1) + + +if __name__ == "__main__": + if len(sys.argv) != 2: + print("Usage: python scan_ac_links.py ") + sys.exit(1) + main(sys.argv[1]) diff --git a/skills/gherkin-scenario/SKILL.md b/skills/gherkin-scenario/SKILL.md new file mode 100644 index 0000000..a383b1b --- /dev/null +++ 
b/skills/gherkin-scenario/SKILL.md @@ -0,0 +1,143 @@ +--- +name: gherkin-scenario +description: > + Writing BDD Gherkin scenarios in plain business language. Activate when writing or reviewing + feature files, Given/When/Then steps, Scenario Outlines, Background blocks, or acceptance + criteria expressed as Gherkin. Covers business-language principles, one-behaviour-per-scenario + rule, anti-patterns (implementation leakage, multiple When actions, UI-speak in domain + scenarios), and data-driven scenario design. + Triggers on: "write a Gherkin scenario", "BDD scenario", "feature file", "Given When Then", + "Scenario Outline", "Cucumber scenario", "behave scenario", "acceptance test in Gherkin", + "should I use Background", "BDD anti-patterns", "review my feature file", "BDD scenarios for", + "convert acceptance criteria to Gherkin". + Does NOT trigger for: implementing step definitions (use gherkin-step), writing unit tests + (use test-unit-write), designing a test case table (use test-case-design). + Pairs with gherkin-step for step definition implementation. +--- + +# Gherkin Scenario Standards + +> **Glossary:** User Story, AC, Feature — see [living-doc-glossary](../references/living-doc-glossary.md) ([remote](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md)). + +## Traceability requirement + +Every `Scenario:` or `Scenario Outline:` generated or reviewed by this skill must carry an +AC link comment on the line immediately above it, following the glossary AC ID format: + +```gherkin +# AC: US-001-01 (v1.0.0 – Active) — Happy path: customer places order +Scenario: Customer successfully places an order + ... +``` + +If writing standalone scenarios (no User Story context), use `# AC: STANDALONE` as a placeholder. +Standalone scenarios are permitted when they live outside the project's dedicated living doc +feature directory. 
Tutorial walkthroughs, exploratory probes, and any other developer-authored +scenarios that don't map to a User Story AC all qualify — the decision is the developer's. +`gherkin-living-doc-sync` will note `STANDALONE`-tagged scenarios in its sync report but will +not flag them as traceability gaps. + +--- + +## Write in the ubiquitous language + +Scenarios must use the language of the business domain. Anyone on the product team must be +able to read and verify them without knowing the implementation. + +```gherkin +# ✅ — business language +Given a customer with a gold membership +When they place an order for 2 units of "SKU-100" +Then the order is confirmed and the total is £160.00 + +# ❌ — implementation details +Given the database contains a row in users with tier="gold" +When a POST request is sent to /api/orders with body { "sku": "SKU-100", "qty": 2 } +Then the response status is 201 +``` + +--- + +## Follow Given / When / Then correctly + +| Keyword | Purpose | Rule | +|---------|---------|------| +| **Given** | System state before the action | Preconditions only — no actions | +| **When** | The action the actor takes | Exactly one meaningful action per scenario | +| **Then** | Observable outcome | Assertions only — no actions | +| **And / But** | Continuation | Never as the first step in a block | + +```gherkin +# ✅ +Given the customer's cart contains 3 items +When the customer applies the promo code "SAVE10" +Then the cart total is reduced by 10% + +# ❌ — multiple When actions (split into separate scenarios) +When the customer applies the promo code "SAVE10" +And the customer proceeds to checkout +And the customer enters payment details +``` + +--- + +## One behaviour per scenario + +Each scenario must verify exactly one observable behaviour. If the scenario name contains "and", +it likely tests two behaviours — split it. 
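The "and" rule above lends itself to a quick automated pre-check. The sketch below is only a heuristic under that assumption: it flags scenario titles containing the standalone word "and" so a reviewer can decide whether to split them. `scenarios_naming_two_behaviours` is a hypothetical helper name, not an existing script in this repository.

```python
import re

SCENARIO_LINE = re.compile(r"^\s*Scenario(?: Outline)?:\s*(.+)", re.IGNORECASE)
# Word boundaries keep titles such as "Brand discount" from matching.
AND_WORD = re.compile(r"\band\b", re.IGNORECASE)


def scenarios_naming_two_behaviours(feature_text: str) -> list[str]:
    """Return scenario titles that contain the standalone word 'and'."""
    flagged = []
    for line in feature_text.splitlines():
        match = SCENARIO_LINE.match(line)
        if match and AND_WORD.search(match.group(1)):
            flagged.append(match.group(1).strip())
    return flagged
```

A flagged title is a prompt for review rather than a verdict; a title can contain "and" and still describe a single behaviour.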
+ +--- + +## Use Scenario Outline for data-driven variations + +```gherkin +# ✅ +Scenario Outline: Discount is applied correctly for each membership tier + Given a customer with a <tier> membership + When they purchase an item costing £100.00 + Then the total is £<total> + + Examples: + | tier | total | + | gold | 80.00 | + | silver | 90.00 | + | bronze | 95.00 | +``` + +--- + +## Use Background for shared preconditions + +Use `Background` when **every** scenario in the file shares the same precondition. +Keep Background to 3 steps or fewer. If only 2–3 scenarios share a precondition, +duplicate the `Given` step — prefer clarity over abstraction. + +--- + +## Avoid common anti-patterns + +| Anti-pattern | Problem | Fix | +|---|---|---| +| UI selectors in steps (`I click the "Submit" button`) | Breaks when UI changes | Use domain actions (`the customer submits the order`) | +| Imperative style (`I enter "alice@example.com" in Email field`) | Fragile and verbose | Declarative (`the customer logs in as Alice`) | +| Multiple `When` per scenario | Usually signals multiple behaviours — try to avoid | Prefer splitting into separate scenarios; if all steps represent a single logical action, collapse into one declarative step | +| Assertions in Given/When | Violates keyword semantics | Move all assertions to `Then` | +| Scenario depends on a previous scenario's state | Hidden ordering dependency | Each scenario must be fully self-contained | + +--- + +## Output format for generated scenarios + +Output all generated Gherkin in a single fenced `gherkin` code block starting with `Feature:`. +Use only `Scenario:`, `Scenario Outline:`, `Background:`, `Given`, `When`, `Then`, `And`, `But`, +and `Examples:` inside the block. 
+ +--- + +## Out-of-scope routing + +| Request | Use instead | +|---|---| +| Implementing step definitions | **gherkin-step** | +| Writing unit tests | **test-unit-write** | +| Designing a test case table | **test-case-design** | diff --git a/skills/gherkin-step/SKILL.md b/skills/gherkin-step/SKILL.md new file mode 100644 index 0000000..6084433 --- /dev/null +++ b/skills/gherkin-step/SKILL.md @@ -0,0 +1,156 @@ +--- +name: gherkin-step +description: > + Implementing Gherkin step definitions that are clean, reusable, and maintainable. Activate when + writing or reviewing step definition code, binding Gherkin text to automation, managing shared + state between steps, configuring parameter types, parsing DataTable or DocString arguments, or + setting up Before/After hooks. Covers Python behave, Cucumber for Java and TypeScript, and + Cucumber-Scala idioms. + Triggers on: "step definitions", "implement Gherkin steps", "Cucumber step", "behave step", + "parameter type", "DataTable", "DocString", "Before hook", "After hook", "World object", + "step context", "step state sharing", "how to share state between steps", + "register step definition", "hook setup". + Does NOT trigger for: writing Gherkin scenarios (use gherkin-scenario), writing unit tests + (use test-unit-write). Pairs with gherkin-scenario. +--- + +# Gherkin Step Definition Standards + +> **Glossary:** Feature, PageObject, Functionality — see [living-doc-glossary](../references/living-doc-glossary.md) ([remote](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md)). + +## Keep step definitions thin + +Step definitions are bindings — they translate Gherkin text into calls to PageObjects, domain +objects, or service clients. Business logic must not live in step definitions. 
+ +**Keyword rules:** +- `Given` steps must not contain assertions — they set up preconditions only +- `When` steps must not contain assertions — they perform actions only +- Assertions belong exclusively in `Then` steps + +```python +# ✅ — thin; delegates to PageObject +@when('the customer confirms the order') +def step_confirm_order(context): + context.checkout_page.confirm_order() + +# ❌ — business logic embedded in the step +@when('the customer confirms the order') +def step_confirm_order(context): + context.cart.total *= (1 - context.discount / 100) + context.order_status = "placed" +``` + +--- + +## Encapsulate selectors in PageObjects + +Step definitions for domain-level scenarios must not contain CSS selectors, element IDs, or XPath. +Encapsulate all selector logic in PageObjects (selector preference: `data-testid` > `aria-label`/role > CSS class). + +```typescript +// ✅ — PageObject hides selector details +When("the customer submits the order", async function (this: OrderWorld) { + await this.checkoutPage.submitOrder(); // CheckoutPage owns the selector +}); + +// ❌ — selector leaks into the step definition +When("the customer submits the order", async function (this: OrderWorld) { + await this.page.click('[data-testid="submit-order-btn"]'); +}); +``` + +--- + +## Share state using the context / World object + +Never use global or module-level variables — they cause test contamination across scenarios. +Use the framework-provided context object, which is instantiated fresh for each scenario. 
+ +| Framework | State object | Pattern | +|-----------|-------------|---------| +| behave (Python) | `context` | Attach attributes: `context.order = ...` | +| Cucumber (TypeScript) | `World` class | Extend `World`; access via `this` | +| Cucumber (Java) | Shared class via PicoContainer or Spring | Constructor injection | +| Cucumber (Scala) | Shared class via DI | `ScalaDI` or manual injection | + +```python +# ✅ behave — context carries state across steps +@given('a customer with a "{tier}" membership') +def step_given_customer(context, tier): + context.customer = Customer(tier=tier) + +@then("the discount is {rate:d}%") +def step_assert_discount(context, rate): + assert context.customer.discount_rate() == rate +``` + +--- + +## Use typed parameters + +```python +# ✅ — :d casts to int automatically +@when("the customer purchases {quantity:d} units") +def step_purchase(context, quantity: int): + context.cart.add_item(context.sku, quantity) +``` + +--- + +## Parse DataTable and DocString arguments + +```python +import json + +# ✅ — DataTable as list of dicts +@when("the customer adds the following items") +def step_add_items(context): + for row in context.table: + context.cart.add_item(row["sku"], int(row["quantity"])) + +# ✅ — DocString as raw text +@when("the system receives the following payload") +def step_receive_payload(context): + context.payload = json.loads(context.text) +``` + +--- + +## Configure hooks correctly + +| Hook | Use for | Must not use for | +|------|---------|-----------------| +| `before_scenario` / `Before` | Set up context state, seed data | Asserting behaviour | +| `after_scenario` / `After` | Cleanup: rollback DB, close browser | Seeding data | +| `before_all` / `BeforeAll` | Expensive one-time setup (start containers) | Per-test state | +| `after_all` / `AfterAll` | Stop containers, close connections | Per-test cleanup | + +Tag hooks to scope them to specific scenarios: + +```python +# environment.py: behave discovers hooks by function name, not by decorator +def before_scenario(context, scenario): + if "database" in 
context.tags: + context.db = create_test_db() +``` + +--- + +## Scala — Cucumber-Scala idioms + +```scala +// ✅ — Shared state via constructor injection +class OrderSteps(world: OrderWorld) extends StrictScalaDsl { + + Given("""a customer with a {string} membership""") { (tier: String) => + world.customer = Customer(tier = tier) + } + + When("""the customer purchases {int} units of {string}""") { (qty: Int, sku: String) => + world.order = world.customer.placeOrder(sku, qty) + } + + Then("""the order total is £{double}""") { (expected: Double) => + world.order.total shouldBe expected + } +} +``` diff --git a/skills/living-doc-create-functionality/SKILL.md b/skills/living-doc-create-functionality/SKILL.md index fd08c11..91289a5 100644 --- a/skills/living-doc-create-functionality/SKILL.md +++ b/skills/living-doc-create-functionality/SKILL.md @@ -95,7 +95,7 @@ If contextually distinct despite similar names, create a new Functionality and n > to get the next available ID and avoid collisions. > For AC IDs, use `--type AC --parent FUNC-` to get the next sequential AC number. -Output using the project's Storage Profile format (defined per project — see `../../docs/living-doc-copilot.md`). Canonical fields (see [living-doc-glossary](../references/living-doc-glossary.md) ([remote](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md)) for AC format details): +Output using the project's Storage Profile format (defined per project — see `../../docs/guides/living-doc-copilot.md`). 
Canonical fields (see [living-doc-glossary](../references/living-doc-glossary.md) ([remote](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md)) for AC format details): | Field | Required | Value | |---|---|---| diff --git a/skills/living-doc-gap-finder/SKILL.md b/skills/living-doc-gap-finder/SKILL.md index 1bf0fc1..a730e05 100644 --- a/skills/living-doc-gap-finder/SKILL.md +++ b/skills/living-doc-gap-finder/SKILL.md @@ -11,7 +11,7 @@ description: > "find undocumented features", "orphan tests", "untested AC", "documentation coverage", "gap report", "what's not covered", "living doc audit", "documentation audit". Does NOT trigger for: creating new living doc objects (use living-doc-create-* skills), - generating tutorials (use living-doc-tutorial-creator). + generating tutorials (use @living-doc-bdd-tutorial-copilot — planned). Orchestrates: living-doc-pageobject-scan, living-doc-scenario-creator, and all create-* skills. license: Apache-2.0 compatibility: GitHub Copilot @@ -43,13 +43,26 @@ delegate the computation to the script rather than reproducing it through reason --- +## Usage modes + +This skill is used in two directions depending on which agent calls it: + +| Mode | Caller | What it finds | +|---|---|---| +| **Full gap audit** (default) | `@living-doc-copilot` | All 9 gap types — missing documentation entities, orphan tests, stale references, empty Features | +| **Bottom-up scenario coverage** | `@living-doc-bdd-copilot` | Gap type 1 only, scoped to Gherkin: ACs that exist in the catalog but have no linked `.feature` scenario carrying a `# AC: ` traceability tag | + +When called by `@living-doc-bdd-copilot`, skip Steps 1 and 2 of the Workflow. Go directly to Gap type 1 with the Gherkin-scoped definition below, then output only those results. 
+ +--- + ## Gap taxonomy Nine types of gaps are detected, in order of risk: | Priority | Gap type | Description | |---|---|---| -| 1 — Blocker | **Untested AC** | An Active or Implemented AC in a User Story or Functionality has no linked test | +| 1 — Blocker | **Untested AC** | An Active or Implemented AC in a User Story or Functionality has no linked test. In bottom-up scenario coverage mode (called from `@living-doc-bdd-copilot`): no `.feature` file carries a `# AC: ` traceability tag for this AC. | | 2 — Important | **Undocumented UI surface** | A screen or API endpoint exists in the app with no Feature entity | | 3 — Important | **Orphan Feature** | A Feature entity exists with no linked User Story | | 4 — Important | **Orphan User Story** | A User Story exists with no linked Feature | diff --git a/skills/living-doc-pageobject-scan/SKILL.md b/skills/living-doc-pageobject-scan/SKILL.md new file mode 100644 index 0000000..9fbee99 --- /dev/null +++ b/skills/living-doc-pageobject-scan/SKILL.md @@ -0,0 +1,182 @@ +--- +name: living-doc-pageobject-scan +description: > + Explore an existing web application or test codebase to discover, create, and maintain PageObject + classes — the bottom-up entry point for BDD-driven UI testing. Activate when generating + PageObjects from a live webapp URL or test directory, updating PageObjects after UI changes, + bootstrapping a test suite for a new screen, linking discovered UI surfaces to Feature entities + in the living doc, or detecting PageObject drift after a UI refactor. + Triggers on: "scan this webapp", "generate pageobjects", "update pageobjects", + "pageobject for this screen", "crawl the UI", "discover UI elements", "create page objects", + "scan test suite for pageobjects", "living doc bottom-up", "bootstrap page objects", + "pageobject drift", "sync pageobjects". + Does NOT trigger for: creating User Stories (use living-doc-create-user-story), writing BDD + scenarios (use living-doc-scenario-creator). 
Pairs with living-doc-create-functionality + and living-doc-gap-finder. +--- + +# Living Doc — PageObject Scan + +> **Glossary:** Feature, PageObject, Functionality — see `skills/references/living-doc-glossary.md`. + +**Scope:** This skill generates PageObjects only for `UI` Features (web pages, modals, screens). +API Features use annotated endpoint methods as their living contract anchor — not PageObjects. + +**Selector preference (from glossary):** `data-testid` > `aria-label`/role > CSS class. +Flag any element that only has positional CSS selectors (`nth-child`, `first-of-type`) as fragile +and recommend the development team add a `data-testid` attribute. + +--- + +## Two modes + +| Mode | Input | Use when | +|---|---|---| +| **Create** (initial scan) | App URL or test suite root | No PageObjects exist yet — bootstrapping from scratch | +| **Maintain** (rescan/update) | Existing PageObject files + current app | UI has changed; detect drift and update | + +--- + +## Create mode — initial scan + +### Inputs + +- `url`: root URL of the web application (authenticated access if needed) +- `dir`: path to an existing test suite with step files or PageObject skeletons + +### Workflow + +**1. Crawl the web application** + +Traverse all reachable routes from the root URL: +- Enumerate all distinct routes (paths and query patterns) +- On each route: capture the rendered DOM +- For SPAs: trigger navigation events to reach client-side routes + +**Handling authenticated routes:** + +| Auth type | Strategy | +|---|---| +| Cookie/session | Log in once via Playwright `storageState` and reuse across routes | +| OAuth / OIDC | Inject a pre-issued test token via `localStorage` or `Authorization` header | +| MFA-protected | Use a dedicated test account with MFA disabled, or a TOTP library with a known seed | +| Multi-step wizard | Parse existing step definitions to reconstruct the navigation sequence | + +**2. 
Discover elements per screen** + +For each distinct screen/route, extract: +- Interactive elements: buttons, links, form inputs, dropdowns, checkboxes +- Display elements: tables, lists, notifications, modals +- Page-level: title, heading (h1), primary URL pattern + +**3. Generate PageObject skeleton** + +One PageObject class per distinct screen. Naming: `Page`. + +```python +# ✅ Generated skeleton — Python / Playwright +from playwright.sync_api import expect + +class CheckoutPage: + """Checkout screen: /checkout — FEAT-003""" + + ORDER_SUMMARY = '[data-testid="order-summary"]' + CONFIRM_BUTTON = '[data-testid="confirm-order-btn"]' + PROMO_INPUT = '[data-testid="promo-code-input"]' + ERROR_BANNER = '[data-testid="error-banner"]' + + def __init__(self, page): + self.page = page + + def enter_promo_code(self, code: str) -> None: + self.page.fill(self.PROMO_INPUT, code) + + def confirm_order(self) -> None: + self.page.click(self.CONFIRM_BUTTON) + + def assert_error_visible(self, message: str) -> None: + expect(self.page.locator(self.ERROR_BANNER)).to_contain_text(message) +``` + +Include the Feature ID (`FEAT-`) in the class docstring to maintain traceability to the +living doc catalog. + +Flag fragile selectors: + +> "Element `` has a positional CSS selector. Please add: +> `data-testid=''` — e.g. `data-testid='confirm-order-btn'`" + +**4. Map PageObjects to Feature entities** + +One PageObject ≈ one `UI` Feature. For each generated PageObject: +- If a matching Feature (`FEAT-`) exists in the catalog: link them in the manifest +- If no Feature exists: generate a draft Feature stub (JSON) for `living-doc-create-feature` + +**5. Generate Functionality stubs from discovered elements** + +For each interactive element, propose a Functionality stub (`FUNC-`) with a name following +the glossary pattern ``: + +- Button → `"Checkout Page – Confirm Order"` +- Form → `"Login Page – Submit Credentials"` +- Table → `"Order History Page – Display Order List"` + +Output as draft JSON for review — not auto-committed. 
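The stub names in the examples above can be derived mechanically from a PageObject class name and an action label. The following is a rough sketch under that assumption; the dictionary keys (`name`, `source_pageobject`, `draft`) are illustrative placeholders rather than the catalog's real schema, and both function names are hypothetical.

```python
import re


def page_display_name(class_name: str) -> str:
    """Turn a class name such as 'OrderHistoryPage' into 'Order History Page'."""
    # Insert a space at every lower-to-upper case boundary.
    return re.sub(r"(?<=[a-z])(?=[A-Z])", " ", class_name)


def functionality_stub(class_name: str, action: str) -> dict:
    """Build a draft stub for review; keys are illustrative, not the catalog schema."""
    return {
        # \u2013 is the en dash used in the naming examples above.
        "name": f"{page_display_name(class_name)} \u2013 {action}",
        "source_pageobject": class_name,
        "draft": True,
    }
```

Passing the resulting dictionaries to `json.dumps(..., ensure_ascii=False)` would give reviewable draft JSON; nothing in this sketch writes to the catalog.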
+ +**Dynamic list elements:** + +```python +# ✅ — dynamic lists: use locator methods, not positional selectors +def get_cart_items(self): + return self.page.locator('[data-testid="cart-item"]').all() + +def get_cart_item_by_sku(self, sku: str): + return self.page.locator(f'[data-testid="cart-item"][data-sku="{sku}"]') +``` + +--- + +## Maintain mode — rescan and update + +**1. Diff existing PageObjects against current DOM** + +For each selector in the existing PageObject, check if it still resolves: +- **Present and unchanged**: no action +- **Present but changed**: update selector; log as `UPDATED` +- **Missing**: flag as `BREAKING CHANGE` — linked test steps may fail + +**2. Detect new elements** → propose additions. + +**3. Update PageObject files** — modify selector constants only. Preserve existing action and +assertion method logic. Never auto-delete methods — flag removals for developer review. + +**4. Breaking change report:** + +``` +BREAKING CHANGES DETECTED: + CheckoutPage.CONFIRM_BUTTON: '[data-testid="confirm-order-btn"]' not found in DOM + → Linked step: "When the customer confirms the order" (checkout.feature:14) + → Action required: verify selector and update, or remove step if element is gone +``` + +Use `scripts/manifest_diff.py` to detect stale manifest entries and undocumented PageObject +files before running a full rescan. 
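For orientation, the drift check reduces to two set differences over `pageobject_path` values. The sketch below is a simplification: it compares against an in-memory set of paths, whereas the script itself checks each manifest path on disk. The sample entries are made-up values that reuse the `feature`, `url`, and `pageobject_path` keys the script reads, and `manifest_drift` is a hypothetical name.

```python
def manifest_drift(
    entries: list[dict], files_on_disk: set[str]
) -> tuple[list[str], list[str]]:
    """Return (stale manifest paths, undocumented PageObject files)."""
    manifest_paths = {
        e["pageobject_path"] for e in entries if e.get("pageobject_path")
    }
    stale = sorted(manifest_paths - files_on_disk)         # listed but missing
    undocumented = sorted(files_on_disk - manifest_paths)  # present but unlisted
    return stale, undocumented


# Illustrative manifest entries and disk contents (assumed values).
sample_entries = [
    {"feature": "FEAT-003", "url": "/checkout",
     "pageobject_path": "tests/pages/CheckoutPage.py"},
    {"feature": "FEAT-004", "url": "/login",
     "pageobject_path": "tests/pages/LoginPage.py"},
]
sample_disk = {"tests/pages/CheckoutPage.py", "tests/pages/CartPage.py"}
```

With these samples, `LoginPage.py` would be reported as a stale entry and `CartPage.py` as an undocumented PageObject.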
+ +--- + +## Output artifacts + +| Artifact | Location | +|---|---| +| PageObject files | `tests/pages/Page.py` | +| Draft Feature stubs | `docs/living-doc/features/draft/FEAT-.json` | +| Draft Functionality stubs | `docs/living-doc/functionalities/draft/FUNC-.json` | +| Breaking change report | stdout / PR comment | +| Exploration manifest | Path discovered by agent on session start (search for `manifest.json` with `pageobject_path` entries); created at `.copilot/bdd/manifest.json` only if no existing manifest is found | + +--- + +## Out-of-scope redirects + +| Request | Correct skill | +|---|---| +| Generate BDD scenarios for a User Story | `living-doc-scenario-creator` | +| Create a User Story for this screen | `living-doc-create-user-story` | diff --git a/skills/living-doc-pageobject-scan/scripts/manifest_diff.py b/skills/living-doc-pageobject-scan/scripts/manifest_diff.py new file mode 100644 index 0000000..100d28f --- /dev/null +++ b/skills/living-doc-pageobject-scan/scripts/manifest_diff.py @@ -0,0 +1,131 @@ +#!/usr/bin/env python3 +""" +manifest_diff.py — diff .copilot/bdd/manifest.json against PageObject files on disk. + +Usage: + python manifest_diff.py [--manifest PATH] [--root PROJECT_ROOT] + +Checks: + 1. Manifest entries whose pageobject_path does not exist on disk (stale manifest) + 2. PageObject files on disk not referenced in any manifest entry (undocumented POs) + +Defaults: + --manifest .copilot/bdd/manifest.json + --root . (current working directory) + +Exit code: 0 if manifest and disk are in sync, 1 if drift is detected. 
+ +Glossary reference: skills/references/living-doc-glossary.md +""" + +import argparse +import json +import sys +from pathlib import Path + +# Filename patterns that identify PageObject classes +PO_PATTERNS = [ + "**/*Page.py", + "**/*Page.ts", + "**/*Page.tsx", + "**/*Page.java", + "**/*Page.scala", + "**/*PageObject.py", + "**/*PageObject.ts", +] + + +def load_manifest(path: Path) -> list[dict]: + if not path.exists(): + print(f"Error: manifest not found at {path}") + print(" → Run a scan via @living-doc-bdd-copilot to generate the manifest.") + sys.exit(1) + with path.open(encoding="utf-8") as f: + data = json.load(f) + if not isinstance(data, list): + print(f"Error: manifest must be a JSON array. Found: {type(data).__name__}") + sys.exit(1) + return data + + +def find_pageobject_files(root: Path) -> set[str]: + """Return relative paths of all PageObject files under root.""" + found: set[str] = set() + for pattern in PO_PATTERNS: + for p in root.glob(pattern): + # Skip anything under node_modules, .venv, __pycache__ + parts = p.parts + if any(skip in parts for skip in ("node_modules", ".venv", "__pycache__", ".git")): + continue + found.add(str(p.relative_to(root))) + return found + + +def main() -> None: + parser = argparse.ArgumentParser( + description="Diff .copilot/bdd/manifest.json against PageObject files on disk." 
+ ) + parser.add_argument( + "--manifest", + default=".copilot/bdd/manifest.json", + help="Path to manifest.json (default: .copilot/bdd/manifest.json)", + ) + parser.add_argument( + "--root", + default=".", + help="Project root directory (default: current directory)", + ) + args = parser.parse_args() + + root = Path(args.root).resolve() + manifest_path = root / args.manifest + entries = load_manifest(manifest_path) + + disk_pos = find_pageobject_files(root) + manifest_paths = { + e["pageobject_path"] + for e in entries + if e.get("pageobject_path") + } + + stale = sorted(p for p in manifest_paths if not (root / p).exists()) + undocumented = sorted(disk_pos - manifest_paths) + + if not stale and not undocumented: + print( + f"✅ manifest.json is in sync — " + f"{len(entries)} entries, {len(disk_pos)} PageObject file(s) on disk." + ) + return + + # Build a lookup from path → entry for richer stale output + path_to_entry = {e.get("pageobject_path"): e for e in entries if e.get("pageobject_path")} + + if stale: + print(f"{'=' * 60}") + print(f" STALE MANIFEST ENTRIES ({len(stale)})") + print(f" Path in manifest but not found on disk") + print(f"{'=' * 60}") + for po_path in stale: + entry = path_to_entry.get(po_path, {}) + print(f" Feature : {entry.get('feature', '(unknown)')}") + print(f" URL : {entry.get('url', '(unknown)')}") + print(f" PO path : {po_path}") + print(f" → Action: update path in manifest.json, or run RE-SCAN mode") + print() + + if undocumented: + print(f"{'=' * 60}") + print(f" UNDOCUMENTED PAGE OBJECTS ({len(undocumented)})") + print(f" File on disk but not referenced in manifest.json") + print(f"{'=' * 60}") + for po_path in undocumented: + print(f" {po_path}") + print(f" → Action: add an entry to manifest.json, or delete if the file is obsolete") + print() + + sys.exit(1) + + +if __name__ == "__main__": + main() diff --git a/skills/living-doc-scenario-creator/SKILL.md b/skills/living-doc-scenario-creator/SKILL.md new file mode 100644 index 
0000000..d487886 --- /dev/null +++ b/skills/living-doc-scenario-creator/SKILL.md @@ -0,0 +1,162 @@ +--- +name: living-doc-scenario-creator +description: > + From User Stories and Acceptance Criteria, generate BDD Gherkin scenario skeletons in + .feature files and identify step implementations needed using available PageObjects. + Activate when generating Gherkin scenarios from a User Story, covering US AC with BDD + scenarios, mapping Given-When-Then to PageObject actions, identifying missing step + definitions, or auditing scenario-to-AC coverage. + Triggers on: "create BDD scenarios for user story", "generate scenarios for US", + "cover AC with scenarios", "generate feature file from user story", "BDD from requirements", + "scenario coverage for US", "map AC to scenarios", "gherkin from user story", "scenarios for US-", + "generate .feature file". + Does NOT trigger for: standalone Gherkin without a User Story (use gherkin-scenario), + implementing step definitions (use gherkin-step), writing unit tests (use test-unit-write), + doc gaps or undocumented behaviors (use living-doc-gap-finder). + Pairs with living-doc-create-user-story, gherkin-scenario, and living-doc-pageobject-scan. +--- + +# Living Doc — Scenario Creator + +> **Glossary:** User Story, AC, PageObject, step definitions — see [living-doc-glossary](../references/living-doc-glossary.md) ([remote](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md)). + +## Glossary alignment + +**AC ID format:** `AC:-` — e.g. `AC:US-001-01`, `AC:US-001-02` + +**AC traceability tag** (mandatory — placed above every `Scenario:` line): +```gherkin +# AC: US-001-01 (v1.0.0 – Active) — Happy path: customer places an order +Scenario: Customer successfully places an order +``` + +Only ACs with state `Active` or `Implemented` drive scenario generation. +ACs with state `Planned` or `Deprecated` are excluded from generation; note them in the coverage report. 
+ +--- + +## Inputs required + +| Input | Source | Required | +|---|---|---| +| User Story (with ACs) | Living doc catalog or inline JSON | Yes | +| Available PageObjects | `tests/pages/` directory | Recommended | +| Existing step definitions | `tests/steps/` directory | Recommended | + +If PageObjects or step files are not available, generate scenarios with placeholder step text +and flag all steps as `[STEP: MISSING — implement with PageObject method]`. + +--- + +## Workflow + +### Step 1 — Read the User Story + +Load the User Story. Confirm: +- ID follows `US-` format +- At least one AC exists with state `Active` or `Implemented` +- ACs are atomic — each has one input condition and one observable outcome + +### Step 2 — Map each AC to a scenario + +For each active AC, select the scenario pattern by AC type: +- `happy_path` → `Scenario:` or `Scenario Outline:` (if data-driven) +- `error` → `Scenario: ` +- `alternative` → `Scenario: ` + +Generate a scenario for **every** active AC regardless of priority. Tag low-priority AC +scenarios with `@low-priority` so they can be excluded from smoke runs without losing traceability. + +Map Given-When-Then from the AC to existing step definitions — reuse exact step text where found. 
+ +```gherkin +# AC: US-001-01 (v1.0.0 – Active) — Customer places an order with a saved payment method +Scenario: Customer successfully places an order + Given the customer has items in their cart and a saved payment method + When the customer confirms the order + Then the order is confirmed + And a confirmation email is sent to the customer + And the cart is emptied +``` + +### Step 3 — Identify missing steps + +For each step not found in existing step files: + +``` +MISSING STEP: "Given the customer has items in their cart and a saved payment method" + → PageObject candidate: CheckoutPage (FEAT-003) + → Suggested step file: tests/steps/checkout_steps.py + → Suggested implementation: + @given('the customer has items in their cart and a saved payment method') + def step_customer_has_cart_with_payment(context): + context.checkout_page = CheckoutPage(context.browser) + context.checkout_page.add_item_to_cart("SKU-100", quantity=1) + context.checkout_page.set_saved_payment_method() +``` + +### Step 4 — Validate AC coverage + +Every active AC must map to at least one scenario. +Run `scripts/coverage_report.py ` for a full catalog report. + +``` +AC COVERAGE REPORT — US-001 + AC:US-001-01 (Active, critical): ✅ covered by "Customer successfully places an order" + AC:US-001-02 (Active, critical): ✅ covered by "Order rejected when payment card is declined" + AC:US-001-03 (Active, high): ❌ NOT COVERED — added to gap list + AC:US-001-04 (Deprecated): ⏭ skipped — deprecated AC +``` + +Use `scripts/coverage_report.py` to generate this report across the full catalog. + +### Step 5 — Output artifacts + +**`.feature` file** — one per User Story, named `-.feature`: + +```gherkin +Feature: Place an online order + As a registered customer + I can place an order for in-stock items + So that the items are delivered to my address + + # AC: US-001-01 (v1.0.0 – Active) — Happy path: customer places an order + Scenario: Customer successfully places an order + ... 
+ + # AC: US-001-02 (v1.0.0 – Active) — Payment failure path + Scenario: Order rejected when payment card is declined + ... +``` + +**Missing step report** — list of step functions to implement, grouped by step file. + +**Coverage table** — ACs with coverage status (use `scripts/coverage_report.py`). + +--- + +## Step reuse rules + +1. **Narrow to page scope first** — identify which PageObject the scenario's steps interact with. Only look in step definition files that already import or reference that PageObject; those are the most likely reuse candidates. +2. **Match by purpose, not just text** — read the step implementation body to confirm it performs the same business action. Two steps may have identical text but operate on different elements (e.g. a `fill` on `username-input` vs `search-input`). Only reuse if the purpose matches. +3. If a purpose-matching step exists, reuse it as-is; note the file it lives in. +4. Only if no match exists: write a new stub using the `gherkin-step` skill. If an existing step is close but not identical, suggest a parameter to generalise it rather than duplicating. +5. Never create duplicate step definitions — search before creating. 
+ +## File placement + +| Step domain | Step file | +|---|---| +| Authentication | `tests/steps/auth_steps.py` | +| Checkout / order | `tests/steps/checkout_steps.py` | +| Common / shared | `tests/steps/common_steps.py` | +| Domain-specific | `tests/steps/_steps.py` | + +--- + +## Out-of-scope redirects + +| Request | Correct skill | +|---|---| +| Standalone Gherkin without a User Story | `gherkin-scenario` | +| Writing step definition code | `gherkin-step` | diff --git a/skills/living-doc-scenario-creator/scripts/coverage_report.py b/skills/living-doc-scenario-creator/scripts/coverage_report.py new file mode 100644 index 0000000..d0b328c --- /dev/null +++ b/skills/living-doc-scenario-creator/scripts/coverage_report.py @@ -0,0 +1,173 @@ +#!/usr/bin/env python3 +""" +coverage_report.py — AC coverage report: which User Story ACs have linked Gherkin scenarios. + +Usage: + python coverage_report.py + +Scans recursively for '# AC: US--' traceability comments above +Scenario lines. Loads User Story JSON files from and produces a coverage +table showing which ACs are covered and which are gaps. + +Expected User Story JSON structure: + { + "id": "US-001", + "name": "Customer Login", + "status": "active", + "acceptance_criteria": [ + { + "id": "US-001-01", + "text": "The login screen displays...", + "state": "Active" + } + ] + } + +AC link comment format (written by living-doc-scenario-creator): + # AC: US-001-01 (v1.0.0 – Active) — description + Scenario: ... + +Only ACs with state Active or Implemented are included in the coverage check. +Planned and Deprecated ACs are noted but not counted as gaps. + +Exit code: 0 if all active/implemented ACs are covered, 1 if gaps exist. + +Glossary reference: skills/references/living-doc-glossary.md +""" + +import json +import re +import sys +from pathlib import Path + +# Matches the AC ID in a traceability comment: # AC: US-001-01 ... 
+AC_TAG = re.compile(r"#\s*AC:\s*((?:US|FEAT|FUNC)-\d{3}-\d{2})", re.IGNORECASE) +SCENARIO_LINE = re.compile(r"^\s*(Scenario:|Scenario Outline:)\s*", re.IGNORECASE) + +ACTIVE_STATES = {"active", "implemented"} +SKIP_STATES = {"deprecated", "planned"} + + +def collect_covered_ac_ids(features_dir: Path) -> dict[str, list[str]]: + """Return {ac_id_upper: [feature_filename, ...]} for every AC tag above a Scenario.""" + covered: dict[str, list[str]] = {} + for feature_file in sorted(features_dir.rglob("*.feature")): + lines = feature_file.read_text(encoding="utf-8").splitlines() + for i, line in enumerate(lines): + if not SCENARIO_LINE.match(line): + continue + prev = lines[i - 1].strip() if i > 0 else "" + m = AC_TAG.search(prev) + if m: + ac_id = m.group(1).upper() + covered.setdefault(ac_id, []).append(feature_file.name) + return covered + + +def load_user_stories(living_doc_dir: Path) -> list[dict]: + """Load all User Story JSON files from living_doc_dir or living_doc_dir/user-stories/.""" + search_dirs = [living_doc_dir / "user-stories", living_doc_dir] + stories = [] + for d in search_dirs: + if d.exists(): + for f in sorted(d.glob("*.json")): + try: + data = json.loads(f.read_text(encoding="utf-8")) + # Accept a single US object or a list of US objects + if isinstance(data, list): + stories.extend(data) + elif isinstance(data, dict) and data.get("id", "").startswith("US-"): + stories.append(data) + except (json.JSONDecodeError, OSError) as e: + print(f"Warning: could not parse {f}: {e}", file=sys.stderr) + if stories: + break # stop at the first directory that yields results + return stories + + +def normalise_ac_id(us_id: str, raw_id: str) -> str: + """Normalise AC IDs that may be stored as '01' or 'US-001-01'.""" + raw = raw_id.strip().upper() + if re.match(r"^(US|FEAT|FUNC)-\d{3}-\d{2}$", raw): + return raw + # Stored as just the suffix: '01' → 'US-001-01' + if re.match(r"^\d{2}$", raw): + return f"{us_id.upper()}-{raw}" + return raw + + +def 
main(living_doc_dir: str, features_dir: str) -> None: + ld = Path(living_doc_dir) + fd = Path(features_dir) + + for p, label in [(ld, "living_doc_dir"), (fd, "features_dir")]: + if not p.exists(): + print(f"Error: {label} not found: {p}") + sys.exit(1) + + covered = collect_covered_ac_ids(fd) + stories = load_user_stories(ld) + + if not stories: + print(f"No User Story JSON files found under {living_doc_dir}") + sys.exit(1) + + total_active = 0 + total_covered = 0 + total_gaps = 0 + + for us in sorted(stories, key=lambda s: s.get("id", "")): + us_id = us.get("id", "?") + title = us.get("name") or us.get("title", "?") + us_status = us.get("status", "active").lower() + acs = us.get("acceptance_criteria", []) + + if not acs: + continue + + print(f"\n{'─' * 60}") + print(f" {us_id} — {title} [{us_status}]") + print(f"{'─' * 60}") + + for ac in acs: + raw_id = ac.get("id", "?") + ac_text = (ac.get("text") or ac.get("description", "?"))[:70] + ac_state = ac.get("state", "Active").lower() + + ac_id = normalise_ac_id(us_id, raw_id) + + if ac_state in SKIP_STATES: + print(f" ⏭ {ac_id} [{ac_state}] — {ac_text}") + continue + + total_active += 1 + files = covered.get(ac_id, []) + + if files: + total_covered += 1 + short = ", ".join(files[:3]) + suffix = f" (+{len(files) - 3} more)" if len(files) > 3 else "" + print(f" ✅ {ac_id} [{ac_state}] — {ac_text}") + print(f" ↳ covered by: {short}{suffix}") + else: + total_gaps += 1 + print(f" ❌ {ac_id} [{ac_state}] — {ac_text}") + print(f" ↳ NOT COVERED — add to scenario generation queue") + + pct = (total_covered * 100 // total_active) if total_active else 0 + + print(f"\n{'=' * 60}") + print(f" COVERAGE SUMMARY") + print(f"{'=' * 60}") + print(f" Active / Implemented ACs : {total_active}") + print(f" Covered by scenarios : {total_covered} ({pct}%)") + print(f" Gaps (no scenario) : {total_gaps}") + + sys.exit(0 if total_gaps == 0 else 1) + + +if __name__ == "__main__": + if len(sys.argv) != 3: + print("Usage: python coverage_report.py 
") + sys.exit(1) + main(sys.argv[1], sys.argv[2]) From ae2fa989a909a93c3df4df33fb82609ce34fc052 Mon Sep 17 00:00:00 2001 From: miroslavpojer Date: Fri, 22 May 2026 15:39:46 +0200 Subject: [PATCH 10/35] feat: enhance documentation with new skills and update references --- README.md | 5 +++++ docs/guides/living-doc-copilot.md | 2 +- docs/testing/skill-testing.md | 2 +- skills/gherkin-step/SKILL.md | 23 +++++++++++++++++++++- skills/living-doc-gap-finder/SKILL.md | 3 +-- skills/living-doc-pageobject-scan/SKILL.md | 2 +- 6 files changed, 31 insertions(+), 6 deletions(-) diff --git a/README.md b/README.md index b8754d8..7eb55c7 100644 --- a/README.md +++ b/README.md @@ -83,6 +83,11 @@ its purpose, trigger phrases, and full instructions. | **[living-doc-update](./skills/living-doc-update/)** | Amend or deprecate existing User Story, Feature, or Functionality entities — add ACs, change status, update ownership. | | **[living-doc-impact-analysis](./skills/living-doc-impact-analysis/)** | Trace which Features, Functionalities, User Stories, and Gherkin scenarios are affected by a code change or PR. | | **[living-doc-gap-finder](./skills/living-doc-gap-finder/)** | Identify undocumented behaviours, orphan tests, and untested ACs. Shared by `@living-doc-copilot` and `@living-doc-bdd-copilot`. | +| **[living-doc-pageobject-scan](./skills/living-doc-pageobject-scan/)** | Discover, create, and maintain PageObject classes from a live web application — bootstrapping from scratch and detecting selector drift after UI changes. | +| **[living-doc-scenario-creator](./skills/living-doc-scenario-creator/)** | Generate Gherkin scenario skeletons from User Story ACs — one scenario per AC, coverage report, and missing step identification. | +| **[gherkin-scenario](./skills/gherkin-scenario/)** | Write BDD Gherkin scenarios in plain business language — Given/When/Then rules, anti-patterns, Scenario Outlines, and Background. 
| +| **[gherkin-step](./skills/gherkin-step/)** | Implement clean, reusable step definitions — behave (Python), Cucumber (Java, TypeScript, Scala), parameter types, DataTable, DocString, and hooks. | +| **[gherkin-living-doc-sync](./skills/gherkin-living-doc-sync/)** | Synchronise Gherkin feature files with the living documentation catalog — fix missing AC traceability headers, step text drift, and stale scenario links. | | **[token-saving](./skills/token-saving/)** | Always-active response discipline — enforces brevity, no filler openers or closers, structured output, and a What/Why/How footer on code responses. Suspends on explicit "full detail" requests. | ## Agent Roster diff --git a/docs/guides/living-doc-copilot.md b/docs/guides/living-doc-copilot.md index 4c0a615..435b5eb 100644 --- a/docs/guides/living-doc-copilot.md +++ b/docs/guides/living-doc-copilot.md @@ -108,7 +108,7 @@ Every AC created or updated by this agent carries: ## Handoff -**Inbound:** `@bdd-copilot` hands a surface list after webapp exploration. Load it and create the corresponding Feature and User Story entities. +**Inbound:** `@living-doc-bdd-copilot` hands a surface list after webapp exploration. Load it and create the corresponding Feature and User Story entities. **Outbound:** When entities are confirmed and ready: diff --git a/docs/testing/skill-testing.md b/docs/testing/skill-testing.md index 49aee05..a4942c9 100644 --- a/docs/testing/skill-testing.md +++ b/docs/testing/skill-testing.md @@ -7,7 +7,7 @@ This document provides a comprehensive methodology for testing, evaluating, and ## 1. Recommended workflow 1. Create eval cases in `skills//evals/evals.json` -2. Add fixture files under `skills//evals/fixtures/` when prompts depend on local files +2. Add fixture files under `skills//evals/files/` when prompts depend on local files 3. Start a Copilot CLI session from the repository root 4. Ask Copilot to use the `skill-creator` skill to test the target skill 5. 
Review outputs and diffs diff --git a/skills/gherkin-step/SKILL.md b/skills/gherkin-step/SKILL.md index 6084433..2d0c291 100644 --- a/skills/gherkin-step/SKILL.md +++ b/skills/gherkin-step/SKILL.md @@ -72,7 +72,7 @@ Use the framework-provided context object, which is instantiated fresh for each | behave (Python) | `context` | Attach attributes: `context.order = ...` | | Cucumber (TypeScript) | `World` class | Extend `World`; access via `this` | | Cucumber (Java) | Shared class via PicoContainer or Spring | Constructor injection | -| Cucumber (Scala) | Shared class via DI | `ScalaDI` or manual injection | +| Cucumber (Scala) | Shared class via DI | PicoContainer or manual injection | ```python # ✅ behave — context carries state across steps @@ -85,6 +85,27 @@ def step_assert_discount(context, rate): assert context.customer.discount_rate() == rate ``` +```java +// ✅ Cucumber (Java) — shared state via PicoContainer constructor injection +public class OrderSteps { + private final OrderWorld world; + + public OrderSteps(OrderWorld world) { + this.world = world; + } + + @Given("a customer with a {string} membership") + public void givenCustomer(String tier) { + world.customer = new Customer(tier); + } + + @Then("the discount is {int}%") + public void thenDiscount(int rate) { + assertEquals(rate, world.customer.discountRate()); + } +} +``` + --- ## Use typed parameters diff --git a/skills/living-doc-gap-finder/SKILL.md b/skills/living-doc-gap-finder/SKILL.md index a730e05..8a9aaf4 100644 --- a/skills/living-doc-gap-finder/SKILL.md +++ b/skills/living-doc-gap-finder/SKILL.md @@ -10,8 +10,7 @@ description: > Triggers on: "find what's not documented", "living doc gaps", "what's missing in living doc", "find undocumented features", "orphan tests", "untested AC", "documentation coverage", "gap report", "what's not covered", "living doc audit", "documentation audit". 
- Does NOT trigger for: creating new living doc objects (use living-doc-create-* skills), - generating tutorials (use @living-doc-bdd-tutorial-copilot — planned). + Does NOT trigger for: creating new living doc objects (use living-doc-create-* skills). Orchestrates: living-doc-pageobject-scan, living-doc-scenario-creator, and all create-* skills. license: Apache-2.0 compatibility: GitHub Copilot diff --git a/skills/living-doc-pageobject-scan/SKILL.md b/skills/living-doc-pageobject-scan/SKILL.md index 9fbee99..a86832e 100644 --- a/skills/living-doc-pageobject-scan/SKILL.md +++ b/skills/living-doc-pageobject-scan/SKILL.md @@ -17,7 +17,7 @@ description: > # Living Doc — PageObject Scan -> **Glossary:** Feature, PageObject, Functionality — see `skills/references/living-doc-glossary.md`. +> **Glossary:** Feature, PageObject, Functionality — see [living-doc-glossary](../references/living-doc-glossary.md) ([remote](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md)). **Scope:** This skill generates PageObjects only for `UI` Features (web pages, modals, screens). API Features use annotated endpoint methods as their living contract anchor — not PageObjects. 
From 313385e918893ef1392c13daf3720c4707019a12 Mon Sep 17 00:00:00 2001 From: miroslavpojer Date: Fri, 22 May 2026 15:45:42 +0200 Subject: [PATCH 11/35] feat: remove outdated implementation roadmap from the repository --- .DS_Store | Bin 0 -> 6148 bytes roadmap.md | 733 ----------------------------------------------------- 2 files changed, 733 deletions(-) create mode 100644 .DS_Store delete mode 100644 roadmap.md diff --git a/.DS_Store b/.DS_Store new file mode 100644 index 0000000000000000000000000000000000000000..5008ddfcf53c02e82d7eee2e57c38e5672ef89f6 GIT binary patch literal 6148 -> **Authored from:** `plugin-spec.md` (last reviewed 2026-05-21). -> The spec file has been removed from the repo; this document is the canonical delivery reference. -> **Last updated:** 2026-05-22 - ---- - -## Progress overview - -| Step | Cluster | Agent(s) | Skills | Done | Remaining | -|---|---|---|---|---|---| -| 1 | Living Doc + BDD | `@living-doc-copilot` ✅ `@bdd-copilot` ❌ | 6 / 12 ✅ | 7 files | 7 files | -| 2 | SDET | `@sdet-copilot` ❌ | 0 / 7 | — | 8 files | -| 3 | Code Quality | `@quality-gate-copilot` ❌ | 0 / 7 | — | 8 files | -| 4 | Test Specialist | `@test-specialist-copilot` ❌ | 0 / 6 | — | 7 files | -| 4b | Test Quality | `@test-quality-copilot` ❌ | 0 / 4 | — | 5 files | -| 5 | Standalone | — | 0 / 3 | — | 3 files | -| **Total** | | **6 agents** | **39 skills** | **7** | **38** | - -> **Constraint:** never merge a cluster without both the skill files AND the agent definition in the same PR.
- ---- - -## File layout - -``` -skills/ -└── {skill-name}/ - ├── SKILL.md ← required - ├── scripts/ ← optional: executable logic - ├── references/ ← optional: overflow docs (when body approaches 500 lines) - ├── assets/ ← optional: templates, example files - └── evals/ ← optional: trigger + assertion test prompts - -.github/ -└── agents/ - └── {agent-name}.agent.md -``` - -Skill source to migrate from: `/Users/ab024ll/.copilot/skills/{skill-name}/SKILL.md` -Agent files: authored from scratch using the spec definitions below. - -### Agent file format - -```yaml ---- -description: > - -tools: - - read_file - - replace_string_in_file - - create_file - - grep_search - - file_search - - semantic_search - # + run_in_terminal for agents with execute capability - # + mcp_microsoft_pla_browser_* for @bdd-copilot only ---- - - -``` - -### Validation checklist (every skill before merge) - -- [ ] Folder name matches `name` frontmatter exactly — lowercase kebab-case, ≤ 64 chars -- [ ] `description` ≤ 1024 chars; covers *what* and *when*; includes trigger keywords -- [ ] Body < 500 lines; use `references/` with a pointer in body if needed -- [ ] No hardcoded secrets, credentials, or absolute machine-local paths -- [ ] Scripts in `scripts/` are referenced from `SKILL.md` with usage instructions - ---- - -## Step 1 — Living Doc + BDD Cluster - -### Already done ✅ - -| File | Status | -|---|---| -| `.github/agents/living-doc-copilot.agent.md` | ✅ | -| `skills/living-doc-create-feature/SKILL.md` | ✅ | -| `skills/living-doc-create-functionality/SKILL.md` | ✅ | -| `skills/living-doc-create-user-story/SKILL.md` | ✅ | -| `skills/living-doc-gap-finder/SKILL.md` | ✅ | -| `skills/living-doc-impact-analysis/SKILL.md` | ✅ | -| `skills/living-doc-update/SKILL.md` | ✅ | - -### Still required ❌ - -| File | Type | -|---|---| -| `skills/living-doc-pageobject-scan/SKILL.md` | Skill — migrate from `.copilot/skills/` | -| `skills/living-doc-scenario-creator/SKILL.md` | Skill — migrate | -| 
`skills/gherkin-scenario/SKILL.md` | Skill — migrate | -| `skills/gherkin-step/SKILL.md` | Skill — migrate | -| `skills/gherkin-living-doc-sync/SKILL.md` | Skill — migrate | -| `.github/agents/bdd-copilot.agent.md` | Agent — author from spec | - -### Agent outline: `@bdd-copilot` - -**Frontmatter:** - -```yaml ---- -description: > - Bridge living documentation to executable tests. Explore web apps via MCP Playwright, - generate and maintain PageObjects, Gherkin scenarios, and step definitions. - Handles Phase 0+1 (Business Seed + exploration), Phase 3 (scenario generation), - Phase 6 maintenance (RE-SCAN, HEALING, REMOVE). - Triggers: "scan webapp", "generate pageobjects", "heal pageobjects", "generate scenarios", - "sync gherkin", "playwright crawl", "explore the app", "bdd copilot", "BDD pipeline". -tools: - - read_file - - replace_string_in_file - - create_file - - grep_search - - file_search - - semantic_search - - run_in_terminal - - mcp_microsoft_pla_browser_navigate - - mcp_microsoft_pla_browser_snapshot - - mcp_microsoft_pla_browser_click - - mcp_microsoft_pla_browser_fill_form - - mcp_microsoft_pla_browser_take_screenshot - - mcp_microsoft_pla_browser_type - - mcp_microsoft_pla_browser_wait_for ---- -``` - -**Required body sections:** - -1. **Phase 0 — Business Seed assembly** - - Sources A–E with behaviour per source - - Credential rule: `env:VAR_NAME` in `seed.yaml` always, never literal values - - Output artifact: `.copilot/bdd/seed.yaml` - -2. **Phase 1 — Iterative exploration** - - Load `seed.yaml` + `manifest.json` (if present from prior run); absent manifest = first iteration - - Crawl loop until coverage plateau (no new surfaces in last iteration) - - Report unreachable areas → enrich seed → loop - - Output artifact: `.copilot/bdd/manifest.json` (Feature name, URL, component IDs, PageObject path) - -3. 
**Source E — Guided traversal protocol** - - Pause at unknown decision points, take screenshot, ask user - - Immediately append to `guided_steps:` in `seed.yaml`: `url`, `action`, `field`, `value` (`env:VAR` if sensitive), `note` - - CAPTCHA rule: pause, wait for human to solve in browser, continue; still record the step - -4. **Phase 3 — Scenario generation**: gap detection vs existing scenarios; generate via `living-doc-scenario-creator`; write step definitions; extend PageObjects - -5. **Phase 6 — Maintenance**: RE-SCAN (new feature/refactor), HEALING (test failures/selector drift), REMOVE (deprecated feature) — triggers and behaviour per mode - -6. **Scope** (10 bullets from spec): - - Load Business Seed + Exploration Manifest before crawling - - Crawl web app via MCP Playwright using manifest-guided navigation - - Fill forms and traverse wizards using business-supplied test values - - Identify Features from discovered UI surfaces - - Detect scenario gaps (existing scenarios vs US ACs) - - Generate Gherkin scenarios from User Story ACs - - Write and extend step definitions - - Heal PageObjects after UI changes (MCP Playwright drift detection) - - Challenge US/AC validity when app behaviour has changed - - Sync Gherkin feature files with living doc - -7. **Does NOT**: create living doc entities (→ `@living-doc-copilot`); write unit/integration tests (→ `@sdet-copilot`); run quality gates (→ `@quality-gate-copilot`) - -8. **Shared skill note**: `living-doc-gap-finder` is used bottom-up here (scenario coverage for known ACs) vs top-down in `@living-doc-copilot` (missing documentation) - -9. **Skills** (6): `living-doc-pageobject-scan`, `living-doc-scenario-creator`, `living-doc-gap-finder`, `gherkin-scenario`, `gherkin-step`, `gherkin-living-doc-sync` — each with path `skills/{name}/SKILL.md` - -10. **Handoff out** (two paths): - - Feature list → `@living-doc-copilot` to document - - After Phase 3: *"Feature files and steps generated. 
Call @sdet-copilot for unit tests."* - ---- - -### Issue 1.A — Complete remaining Step 1 skills - -**Title:** `[Step 1] Migrate remaining living-doc BDD + gherkin skills` - -**Body:** - -``` -## Summary - -Six skills remain from the Living Doc + BDD cluster. Must ship in the same PR as Issue 1.B -(spec rule: never transfer skills without the agent definition). - -## Files to create - -| Destination | Source | -|---|---| -| `skills/living-doc-pageobject-scan/SKILL.md` | `.copilot/skills/living-doc-pageobject-scan/SKILL.md` | -| `skills/living-doc-scenario-creator/SKILL.md` | `.copilot/skills/living-doc-scenario-creator/SKILL.md` | -| `skills/gherkin-scenario/SKILL.md` | `.copilot/skills/gherkin-scenario/SKILL.md` | -| `skills/gherkin-step/SKILL.md` | `.copilot/skills/gherkin-step/SKILL.md` | -| `skills/gherkin-living-doc-sync/SKILL.md` | `.copilot/skills/gherkin-living-doc-sync/SKILL.md` | - -## Acceptance criteria - -- [ ] All 6 folder names match their `name` frontmatter fields exactly -- [ ] All `description` fields ≤ 1024 chars and include trigger keywords -- [ ] `living-doc-gap-finder` description (already migrated) notes dual shared-skill usage - — verify it covers both top-down (@living-doc-copilot) and bottom-up (@bdd-copilot) -- [ ] `gherkin-scenario` description notes optional @sdet-copilot usage at unit level -- [ ] All bodies < 500 lines (use `references/` with pointer if needed) -- [ ] No hardcoded credentials or absolute local paths -- [ ] Closed in same PR as Issue 1.B - -## Reference - -Spec → Agent Catalog → @bdd-copilot skills table -``` - ---- - -### Issue 1.B — Create @bdd-copilot agent - -**Title:** `[Step 1] Create @bdd-copilot agent definition` - -**Body:** - -``` -## Summary - -Author `.github/agents/bdd-copilot.agent.md` — automation-layer agent for web app -exploration (Phases 0+1), BDD scenario generation (Phase 3), and maintenance (Phase 6). 
- -## File to create - -`.github/agents/bdd-copilot.agent.md` - -## Required frontmatter - -See roadmap.md → Step 1 → Agent outline: @bdd-copilot for full frontmatter block. -Key requirement: all mcp_microsoft_pla_browser_* tools must be listed. - -## Required body sections - -1. Phase 0 — Business Seed assembly (Sources A–E; credential safety; seed.yaml output) -2. Phase 1 — Iterative exploration (load seed + manifest; plateau detection; manifest.json output) -3. Partial state — seed.yaml present but manifest.json absent = treat as first Phase 1 run -4. Source E — Guided traversal (pause/screenshot/ask/execute/write guided_steps; CAPTCHA rule) -5. Phase 3 — Scenario generation with gap detection -6. Phase 6 — Maintenance (RE-SCAN / HEALING / REMOVE) -7. Scope — 10 bullets -8. Does NOT — with redirect targets -9. Shared skill note for living-doc-gap-finder -10. Skills table — 7 entries with paths -11. Handoff out — two paths; prompts verbatim from spec - -## Acceptance criteria - -- [ ] All MCP Playwright tools listed in frontmatter -- [ ] Sources A–E documented with exact behaviour per source -- [ ] Credential safety rule present (env:VAR_NAME, never literal) -- [ ] Partial state handling documented -- [ ] Guided traversal protocol includes CAPTCHA pause-and-wait -- [ ] Phase 6 all three maintenance modes documented -- [ ] All 7 skills referenced as `skills/{name}/SKILL.md` -- [ ] Handoff prompt exact: "Feature files and steps generated. Call @sdet-copilot for unit tests." -- [ ] Closed in same PR as Issue 1.A -``` - ---- - -### Planned agent: `@living-doc-bdd-tutorial-copilot` - -The tutorial generation capability (previously `living-doc-tutorial-creator` skill) will ship as -a dedicated agent rather than as part of `@living-doc-bdd-copilot`. It will own the full -tutorial authoring pipeline: transform executed BDD scenarios into annotated tutorial documents, -SSML narration scripts, and onboarding walkthroughs. 
- -| Attribute | Value | -|---|---| -| Agent file | `.github/agents/living-doc-bdd-tutorial-copilot.agent.md` | -| Skill | `skills/living-doc-tutorial-creator/SKILL.md` — migrate from `.copilot/skills/` | -| Inbound trigger | Executed `.feature` files + optional screenshots | -| Output | Annotated tutorial `.md`, SSML narration script | -| Step | Separate step (not yet scheduled) | - ---- - -## Step 2 — SDET Cluster - -### Files to create - -| File | Type | -|---|---| -| `skills/tdd-workflow/SKILL.md` | Skill — migrate from `.copilot/skills/` | -| `skills/test-unit-write/SKILL.md` | Skill — migrate | -| `skills/test-unit-review/SKILL.md` | Skill — migrate | -| `skills/test-unit-standards/SKILL.md` | Skill — migrate | -| `skills/test-case-design/SKILL.md` | Skill — migrate | -| `skills/test-data-management/SKILL.md` | Skill — migrate | -| `skills/test-mocking-patterns/SKILL.md` | Skill — migrate | -| `.github/agents/sdet-copilot.agent.md` | Agent — author from spec | - -### Agent outline: `@sdet-copilot` - -**Frontmatter:** - -```yaml ---- -description: > - Daily developer test-engineering companion. Use for: TDD red-green-refactor, writing - unit and integration tests, reviewing existing test files, designing test case tables, - managing test data and fixtures, choosing test doubles. Phase 4 of the engineering - pipeline. Triggers: "write tests", "TDD", "review my tests", "test doubles", - "test data", "red-green-refactor", "sdet copilot", "write unit tests", - "add tests for", "design test cases", "add coverage for". -tools: - - read_file - - replace_string_in_file - - create_file - - grep_search - - file_search - - semantic_search ---- -``` - -**Required body sections:** - -1. 
**Technology-neutral escalation constraint** (4-step): express guidance language-agnostic first; if language-specific tooling required ask *"What is your target technology / language?"*; recommend escalating to `@quality-gate-copilot` with the matching language skill (`qa-python`, `qa-java`, `qa-scala`, `qa-typescript`, `qa-dotnet`); if no match, provide generic guidance and note the gap - -2. **Scope** (6 bullets): TDD workflow; write unit and integration test code; review and audit test files; design test case tables; manage test data; choose test doubles - -3. **Does NOT**: run CI quality gates (→ `@quality-gate-copilot`); write Gherkin/BDD *as standalone BDD pipeline deliverables* (→ `@bdd-copilot`; `gherkin-scenario` available optionally at unit level); handle specialised test types — accessibility, security, E2E, API (→ `@test-specialist-copilot`); improve test quality depth — mutation, property-based, flakiness (→ `@test-quality-copilot`) - -4. **Skills** (7): `tdd-workflow`, `test-unit-write`, `test-unit-review`, `test-unit-standards`, `test-case-design`, `test-data-management`, `test-mocking-patterns`; note `gherkin-scenario` as optional 8th when team uses BDD at unit level - -5. **Handoff out**: *"Tests written. Run @quality-gate-copilot to enforce the gate."* - ---- - -### Issue 2.1 — SDET cluster (skills + agent) - -**Title:** `[Step 2] SDET cluster — migrate 7 skills and create @sdet-copilot agent` - -**Body:** - -``` -## Summary - -Migrate 7 SDET skills and author the @sdet-copilot agent definition as a single PR. 
- -## Files to create - -skills/tdd-workflow/SKILL.md, skills/test-unit-write/SKILL.md, -skills/test-unit-review/SKILL.md, skills/test-unit-standards/SKILL.md, -skills/test-case-design/SKILL.md, skills/test-data-management/SKILL.md, -skills/test-mocking-patterns/SKILL.md, .github/agents/sdet-copilot.agent.md - -## Acceptance criteria — skills - -- [ ] `tdd-workflow` body references SPEC.md-first pattern in the red-green-refactor cycle -- [ ] `test-unit-standards`, `test-unit-write`, and `test-unit-review` cross-reference each other - correctly (rule set vs procedural write vs procedural review distinction) -- [ ] All bodies < 500 lines; all descriptions ≤ 1024 chars -- [ ] All folder names match `name` frontmatter exactly - -## Acceptance criteria — agent - -- [ ] Escalation path says "recommend @quality-gate-copilot", not "load qa-* skill internally" -- [ ] Gherkin Does NOT entry is qualified: "standalone BDD pipeline deliverable"; - optional unit-level exception noted -- [ ] All 7 skills referenced by path `skills/{name}/SKILL.md` -- [ ] Handoff prompt exact: "Tests written. Run @quality-gate-copilot to enforce the gate." - -## Reference - -Spec → Agent Catalog → @sdet-copilot -``` - ---- - -## Step 3 — Code Quality Cluster - -### Files to create - -| File | Type | -|---|---| -| `skills/qa-python/SKILL.md` | Skill — migrate from `.copilot/skills/` | -| `skills/qa-java/SKILL.md` | Skill — migrate | -| `skills/qa-scala/SKILL.md` | Skill — migrate | -| `skills/qa-typescript/SKILL.md` | Skill — migrate | -| `skills/qa-dotnet/SKILL.md` | Skill — migrate | -| `skills/qa-terraform/SKILL.md` | Skill — migrate | -| `skills/test-coverage-gate/SKILL.md` | Skill — migrate | -| `.github/agents/quality-gate-copilot.agent.md` | Agent — author from spec | - -### Agent outline: `@quality-gate-copilot` - -**Frontmatter:** - -```yaml ---- -description: > - Enforce code quality standards — diagnose and fix CI quality gate failures across all - languages and stacks. 
Use for: linting, formatting, static analysis violations, coverage - thresholds, Javadoc, type annotations, and logging standards. Phase 5 of the pipeline. - Triggers: "quality gate", "CI failing", "coverage below", "lint error", "scalafmt", - "pylint", "quality gate copilot", "fix linting", "coverage threshold", "SpotBugs", - "ESLint violation", "dotnet format", "tflint failure". -tools: - - read_file - - grep_search - - file_search - - semantic_search - - run_in_terminal ---- -``` - -**Required body sections:** - -1. **Scope** (5 bullets): run/fix linting and formatting; fix static analysis violations; configure and enforce coverage thresholds; diagnose CI gate failures per language; apply logging/Javadoc/type annotation standards - -2. **Language routing table** — maps language/stack to skill and path: - - | Language | Skill | Path | - |---|---|---| - | Python | `qa-python` | `skills/qa-python/SKILL.md` | - | Java | `qa-java` | `skills/qa-java/SKILL.md` | - | Scala | `qa-scala` | `skills/qa-scala/SKILL.md` | - | TypeScript / JS | `qa-typescript` | `skills/qa-typescript/SKILL.md` | - | C# / .NET | `qa-dotnet` | `skills/qa-dotnet/SKILL.md` | - | HCL / Terraform | `qa-terraform` | `skills/qa-terraform/SKILL.md` | - | All (coverage) | `test-coverage-gate` | `skills/test-coverage-gate/SKILL.md` | - -3. **Does NOT**: write test code (→ `@sdet-copilot`); handle mutation testing strategy (→ `@test-quality-copilot`); author IaC modules (→ `cps-iac` in `cps-agentic-skills`, not this repo) - -4. **Skills** (7): language column + intent label + path - ---- - -### Issue 3.1 — Code Quality cluster (skills + agent) - -**Title:** `[Step 3] Code Quality cluster — migrate 7 skills and create @quality-gate-copilot agent` - -**Body:** - -``` -## Summary - -Migrate 7 code quality skills and author the @quality-gate-copilot agent as a single PR. 
- -## Files to create - -skills/qa-python/SKILL.md, skills/qa-java/SKILL.md, skills/qa-scala/SKILL.md, -skills/qa-typescript/SKILL.md, skills/qa-dotnet/SKILL.md, skills/qa-terraform/SKILL.md, -skills/test-coverage-gate/SKILL.md, .github/agents/quality-gate-copilot.agent.md - -## Acceptance criteria — skills - -- [ ] `qa-scala` body covers JMF filter requirement for JaCoCo -- [ ] `test-coverage-gate` distinguishes baseline measurement (no CI block) from - new-code gate (hard fail) -- [ ] Each `qa-*` description includes language-specific trigger keywords -- [ ] All folder names match `name` frontmatter exactly; all descriptions ≤ 1024 chars - -## Acceptance criteria — agent - -- [ ] Language routing table covers all 5 languages + HCL + cross-language coverage -- [ ] `run_in_terminal` present in tools (this agent executes commands) -- [ ] IaC redirect points to `cps-iac` in `cps-agentic-skills`, not this plugin -- [ ] All 7 skills referenced by path - -## Reference - -Spec → Agent Catalog → @quality-gate-copilot -``` - ---- - -## Step 4 — Test Specialist Cluster - -### Files to create - -| File | Type | -|---|---| -| `skills/test-accessibility/SKILL.md` | Skill — migrate from `.copilot/skills/` | -| `skills/test-api-standards/SKILL.md` | Skill — migrate | -| `skills/test-e2e-standards/SKILL.md` | Skill — migrate | -| `skills/test-integration-standards/SKILL.md` | Skill — migrate | -| `skills/test-ui-standards/SKILL.md` | Skill — migrate | -| `skills/test-security/SKILL.md` | Skill — migrate | -| `.github/agents/test-specialist-copilot.agent.md` | Agent — author from spec | - -### Agent outline: `@test-specialist-copilot` - -**Frontmatter:** - -```yaml ---- -description: > - Apply specialised testing for specific test types beyond standard unit tests. - Use for: accessibility (axe-core, WCAG 2.1 AA), API and Pact contract tests, - cross-service E2E, Testcontainers integration isolation, Angular/React/Cypress UI - tests, and SAST/DAST security scanning. 
Triggers: "a11y test", "Pact", - "E2E standards", "security scan", "Cypress", "Testcontainers", "accessibility", - "contract test", "test specialist copilot", "UI tests", "integration isolation". -tools: - - read_file - - replace_string_in_file - - create_file - - grep_search - - file_search - - semantic_search - - run_in_terminal ---- -``` - -**Required body sections:** - -1. **Specialisation routing table**: - - | Concern | Skill | Path | - |---|---|---| - | Accessibility / WCAG | `test-accessibility` | `skills/test-accessibility/SKILL.md` | - | REST + contract (Pact) | `test-api-standards` | `skills/test-api-standards/SKILL.md` | - | Cross-service E2E | `test-e2e-standards` | `skills/test-e2e-standards/SKILL.md` | - | Testcontainers / DB isolation | `test-integration-standards` | `skills/test-integration-standards/SKILL.md` | - | Angular / React / Cypress | `test-ui-standards` | `skills/test-ui-standards/SKILL.md` | - | SAST / DAST / dep scanning | `test-security` | `skills/test-security/SKILL.md` | - -2. **Scope** (6 bullets from spec) - -3. **Does NOT**: write standard unit tests (→ `@sdet-copilot`); run language-specific quality gates (→ `@quality-gate-copilot`); write BDD scenarios (→ `@bdd-copilot`); improve test quality depth (→ `@test-quality-copilot`) - -4. **Skills** (6): Specialisation column + path - ---- - -### Issue 4.1 — Test Specialist cluster (skills + agent) - -**Title:** `[Step 4] Test Specialist cluster — migrate 6 skills and create @test-specialist-copilot agent` - -**Body:** - -``` -## Summary - -Migrate 6 Test Specialist skills and author the @test-specialist-copilot agent as a single PR. 
- -## Files to create - -skills/test-accessibility/SKILL.md, skills/test-api-standards/SKILL.md, -skills/test-e2e-standards/SKILL.md, skills/test-integration-standards/SKILL.md, -skills/test-ui-standards/SKILL.md, skills/test-security/SKILL.md, -.github/agents/test-specialist-copilot.agent.md - -## Acceptance criteria — skills - -- [ ] `test-accessibility` covers axe-core, jest-axe, cypress-axe, WCAG 2.1 AA -- [ ] `test-api-standards` covers Pact consumer-driven contracts -- [ ] `test-e2e-standards` is clearly distinguished from `test-ui-standards` - (cross-service boundary vs UI-only) -- [ ] `test-integration-standards` covers Testcontainers and isolation/cleanup rules -- [ ] `test-security` covers SAST (Bandit/Semgrep), DAST (ZAP), dep scanning (Snyk/pip-audit) -- [ ] All folder names match `name` frontmatter; all descriptions ≤ 1024 chars - -## Acceptance criteria — agent - -- [ ] Specialisation routing table covers all 6 concerns -- [ ] Does NOT list distinguishes from all four other test agents -- [ ] All 6 skills referenced by path - -## Reference - -Spec → Agent Catalog → @test-specialist-copilot -``` - ---- - -## Step 4b — Test Quality Cluster - -### Files to create - -| File | Type | -|---|---| -| `skills/test-mutation/SKILL.md` | Skill — migrate from `.copilot/skills/` | -| `skills/test-property-based/SKILL.md` | Skill — migrate | -| `skills/test-flakiness-diagnosis/SKILL.md` | Skill — migrate | -| `skills/test-observability/SKILL.md` | Skill — migrate | -| `.github/agents/test-quality-copilot.agent.md` | Agent — author from spec | - -### Agent outline: `@test-quality-copilot` - -**Frontmatter:** - -```yaml ---- -description: > - Improve depth and reliability of existing tests — the quality improvement layer applied - after baseline tests are in place. 
Use for: mutation score improvement (mutmut, PIT, - Stryker), property-based testing (Hypothesis, ScalaCheck, fast-check), flaky test - diagnosis and repair, and observability test assertions (logs, metrics, OTel traces). - Triggers: "mutation", "Hypothesis", "flaky test", "test logs", "test metrics", - "surviving mutants", "property-based", "test quality copilot", "improve test quality". -tools: - - read_file - - replace_string_in_file - - create_file - - grep_search - - file_search - - semantic_search - - run_in_terminal ---- -``` - -**Required body sections:** - -1. **Prerequisite note**: called after `@sdet-copilot` has baseline coverage; this is a depth-improvement layer, not a test-writing starter - -2. **Scope** (4 bullets): mutation testing; property-based testing; flaky test diagnosis and repair; observability assertions - -3. **Does NOT**: write new test suites from scratch (→ `@sdet-copilot`); enforce CI quality gates (→ `@quality-gate-copilot`); handle specialised test types (→ `@test-specialist-copilot`) - -4. **Skills** (4): `test-mutation`, `test-property-based`, `test-flakiness-diagnosis`, `test-observability` — Specialisation column + path each - -5. **Handoff out**: None — this agent is terminal. Quality improvements are applied in place; no downstream phase requires a handoff. - ---- - -### Issue 4b.1 — Test Quality cluster (skills + agent) - -**Title:** `[Step 4b] Test Quality cluster — migrate 4 skills and create @test-quality-copilot agent` - -**Body:** - -``` -## Summary - -Migrate 4 Test Quality skills and author the @test-quality-copilot agent as a single PR. -This is a depth-improvement layer; call after @sdet-copilot has established baseline coverage. 
- -## Files to create - -skills/test-mutation/SKILL.md, skills/test-property-based/SKILL.md, -skills/test-flakiness-diagnosis/SKILL.md, skills/test-observability/SKILL.md, -.github/agents/test-quality-copilot.agent.md - -## Acceptance criteria — skills - -- [ ] `test-mutation` covers mutmut (Python), PIT (Java/Scala), Stryker (JS/TS) -- [ ] `test-property-based` covers Hypothesis, ScalaCheck, fast-check -- [ ] `test-flakiness-diagnosis` covers async timing, shared state, CI environment differences -- [ ] `test-observability` covers structured log assertions, prometheus_client fake registry, - InMemorySpanExporter for OTel spans -- [ ] All folder names match `name` frontmatter; all descriptions ≤ 1024 chars - -## Acceptance criteria — agent - -- [ ] Prerequisite note present: depth-improvement layer, not starter -- [ ] Handoff out section present and states this agent is terminal (no downstream phase) -- [ ] All 4 skills referenced by path - -## Reference - -Spec → Agent Catalog → @test-quality-copilot -``` - ---- - -## Step 5 — Standalone Skills - -No agent file — these 3 skills are below the 3-skill minimum for a dedicated agent per governance rules. - -### Files to create - -| File | Type | -|---|---| -| `skills/pr-review/SKILL.md` | Skill — migrate from `.copilot/skills/` | -| `skills/contract-openapi/SKILL.md` | Skill — migrate | -| `skills/contract-schema-registry/SKILL.md` | Skill — migrate | - -> **Future note:** `contract-openapi` + `contract-schema-registry` are candidates for a -> `@devops-copilot` agent when IaC skills are consolidated here. Until then, standalone. - ---- - -### Issue 5.1 — Standalone skills - -**Title:** `[Step 5] Migrate standalone skills: pr-review, contract-openapi, contract-schema-registry` - -**Body:** - -``` -## Summary - -Migrate 3 standalone skills — final migration step. No agent file required. 
- -## Files to create - -| Destination | Source | -|---|---| -| `skills/pr-review/SKILL.md` | `.copilot/skills/pr-review/SKILL.md` | -| `skills/contract-openapi/SKILL.md` | `.copilot/skills/contract-openapi/SKILL.md` | -| `skills/contract-schema-registry/SKILL.md` | `.copilot/skills/contract-schema-registry/SKILL.md` | - -## Acceptance criteria - -- [ ] `pr-review` description states it is language-agnostic -- [ ] `contract-openapi` description explicitly distinguishes from `contract-schema-registry` - (REST/OpenAPI vs event schema registry) -- [ ] `contract-schema-registry` description explicitly distinguishes from `test-api-standards` - (schema registry vs Pact consumer-driven contracts) -- [ ] All folder names match `name` frontmatter; all descriptions ≤ 1024 chars -- [ ] No hardcoded internal paths or credentials - -## Reference - -Spec → Standalone Skills section; Governance Rules → Agent scope (Skills < 3 → standalone) -``` - ---- - -## Post-implementation checklist - -After all steps are merged: - -- [ ] Update `docs/README.md` — add rows to the Skill Guides table for each skill that has a companion guide -- [ ] Update `README.md` — add the full skill catalog and agent roster -- [ ] Verify summary totals: 39 unique skill files, 6 agent files -- [ ] Add evals for at least one skill per cluster under `skills/{name}/evals/` (see `docs/testing/skill-testing.md`) -- [ ] Run trigger accuracy test for shared skills (`living-doc-gap-finder`, `gherkin-scenario`) to verify correct agent activates for each intent -- [ ] Confirm `@bdd-copilot` MCP Playwright tools are available in the target deployment environment -- [ ] Cross-check all agent handoff prompts match each other: - - `@bdd-copilot (explore)` → *"Surfaces mapped. Call @living-doc-copilot to document them."* - - `@living-doc-copilot` → *"US and ACs are ready. Call @bdd-copilot to generate scenarios."* - - `@bdd-copilot` → *"Feature files and steps generated. 
Call @sdet-copilot for unit tests."* - - `@sdet-copilot` → *"Tests written. Run @quality-gate-copilot to enforce the gate."* - - `@quality-gate-copilot` → *"Gate green. Pipeline complete."* - - `@test-quality-copilot` → terminal (no handoff) From 8e33c87e87a74ac1f636f8868502e52263cf2b09 Mon Sep 17 00:00:00 2001 From: miroslavpojer Date: Fri, 22 May 2026 19:53:17 +0200 Subject: [PATCH 12/35] Refactor Gherkin step definitions and Living Doc skills - Updated Gherkin step definitions to remove Java and Scala references, focusing on Python behave and TypeScript Cucumber. - Enhanced Living Doc PageObject scan to include TypeScript examples and clarified output artifact locations. - Revised Living Doc Scenario Creator to improve missing step handling and stub generation, ensuring better integration with PageObjects. - Created a comprehensive implementation roadmap for the Agentic Engineering Toolkit, detailing progress, file layout, and validation checklists. --- roadmap.md | 729 ++++++++++++++++++++ skills/gherkin-step/SKILL.md | 48 +- skills/living-doc-pageobject-scan/SKILL.md | 52 +- skills/living-doc-scenario-creator/SKILL.md | 56 +- 4 files changed, 814 insertions(+), 71 deletions(-) create mode 100644 roadmap.md diff --git a/roadmap.md b/roadmap.md new file mode 100644 index 0000000..cc6b51f --- /dev/null +++ b/roadmap.md @@ -0,0 +1,729 @@ +# Implementation Roadmap — Agentic Engineering Toolkit + +> **Authored from:** `plugin-spec.md` (last reviewed 2026-05-21). +> The spec file has been removed from the repo; this document is the canonical delivery reference. 
+> **Last updated:** 2026-05-22 + +--- + +## Progress overview + +| Step | Cluster | Agent(s) | Skills | Done | Remaining | +|---|---|---|---|---|---| +| 1 | Living Doc + BDD | `@living-doc-copilot` ✅ `@living-doc-bdd-copilot` ✅ | 11 / 11 ✅ | 13 files | 0 | +| 1b | Tutorial | `@living-doc-bdd-tutorial-copilot` ❌ | 0 / 1 | — | 2 files | +| 2 | SDET | `@sdet-copilot` ❌ | 0 / 7 | — | 8 files | +| 3 | Code Quality | `@quality-gate-copilot` ❌ | 0 / 7 | — | 8 files | +| 4 | Test Specialist | `@test-specialist-copilot` ❌ | 0 / 6 | — | 7 files | +| 4b | Test Quality | `@test-quality-copilot` ❌ | 0 / 4 | — | 5 files | +| 5 | Standalone | — | 0 / 3 | — | 3 files | +| **Total** | | **7 agents** | **39 skills** | **13** | **33** | + +> **Constraint:** never merge a cluster without both the skill files AND the agent definition in the same PR. + +--- + +## File layout + +``` +skills/ +└── {skill-name}/ + ├── SKILL.md ← required + ├── scripts/ ← optional: executable logic + ├── references/ ← optional: overflow docs (when body approaches 500 lines) + ├── assets/ ← optional: templates, example files + └── evals/ ← optional: trigger + assertion test prompts + +.github/ +└── agents/ + └── {agent-name}.agent.md +``` + +Skill source to migrate from: `~/.copilot/skills/{skill-name}/SKILL.md` +Agent files: authored from scratch using the spec definitions below. 
+ +### Agent file format + +```yaml +--- +description: > + +tools: + - read_file + - replace_string_in_file + - create_file + - grep_search + - file_search + - semantic_search + # + run_in_terminal for agents with execute capability + # + mcp_microsoft_pla_browser_* for @living-doc-bdd-copilot only +--- + + +``` + +### Validation checklist (every skill before merge) + +- [ ] Folder name matches `name` frontmatter exactly — lowercase kebab-case, ≤ 64 chars +- [ ] `description` ≤ 1024 chars; covers *what* and *when*; includes trigger keywords +- [ ] Body < 500 lines; use `references/` with a pointer in body if needed +- [ ] No hardcoded secrets, credentials, or absolute machine-local paths +- [ ] Scripts in `scripts/` are referenced from `SKILL.md` with usage instructions + +--- + +## Step 1 — Living Doc + BDD Cluster + +### Completed ✅ + +| File | Status | +|---|---| +| `.github/agents/living-doc-copilot.agent.md` | ✅ | +| `.github/agents/living-doc-bdd-copilot.agent.md` | ✅ | +| `skills/living-doc-create-feature/SKILL.md` | ✅ | +| `skills/living-doc-create-functionality/SKILL.md` | ✅ | +| `skills/living-doc-create-user-story/SKILL.md` | ✅ | +| `skills/living-doc-gap-finder/SKILL.md` | ✅ | +| `skills/living-doc-impact-analysis/SKILL.md` | ✅ | +| `skills/living-doc-update/SKILL.md` | ✅ | +| `skills/living-doc-pageobject-scan/SKILL.md` | ✅ | +| `skills/living-doc-scenario-creator/SKILL.md` | ✅ | +| `skills/gherkin-scenario/SKILL.md` | ✅ | +| `skills/gherkin-step/SKILL.md` | ✅ | +| `skills/gherkin-living-doc-sync/SKILL.md` | ✅ | + +### Agent outline: `@living-doc-bdd-copilot` + +**Frontmatter:** + +```yaml +--- +description: > + Bridge living documentation to executable tests. Explore web apps via MCP Playwright, + generate and maintain PageObjects, Gherkin scenarios, and step definitions. + Handles Phase 0+1 (Business Seed + exploration), Phase 3 (scenario generation), + Phase 6 maintenance (RE-SCAN, HEALING, REMOVE). 
+ Triggers: "scan webapp", "generate pageobjects", "heal pageobjects", "generate scenarios", + "sync gherkin", "playwright crawl", "explore the app", "bdd copilot", "BDD pipeline". +tools: + - read_file + - replace_string_in_file + - create_file + - grep_search + - file_search + - semantic_search + - run_in_terminal + - mcp_microsoft_pla_browser_navigate + - mcp_microsoft_pla_browser_snapshot + - mcp_microsoft_pla_browser_click + - mcp_microsoft_pla_browser_fill_form + - mcp_microsoft_pla_browser_take_screenshot + - mcp_microsoft_pla_browser_type + - mcp_microsoft_pla_browser_wait_for +--- +``` + +**Required body sections:** + +1. **Phase 0 — Business Seed assembly** + - Sources A–E with behaviour per source + - Credential rule: `env:VAR_NAME` in `seed.yaml` always, never literal values + - Output artifact: `.copilot/bdd/seed.yaml` + +2. **Phase 1 — Iterative exploration** + - Load `seed.yaml` + `manifest.json` (if present from prior run); absent manifest = first iteration + - Crawl loop until coverage plateau (no new surfaces in last iteration) + - Report unreachable areas → enrich seed → loop + - Output artifact: `.copilot/bdd/manifest.json` (Feature name, URL, component IDs, PageObject path) + +3. **Source E — Guided traversal protocol** + - Pause at unknown decision points, take screenshot, ask user + - Immediately append to `guided_steps:` in `seed.yaml`: `url`, `action`, `field`, `value` (`env:VAR` if sensitive), `note` + - CAPTCHA rule: pause, wait for human to solve in browser, continue; still record the step + +4. **Phase 3 — Scenario generation**: gap detection vs existing scenarios; generate via `living-doc-scenario-creator`; write step definitions; extend PageObjects + +5. **Phase 6 — Maintenance**: RE-SCAN (new feature/refactor), HEALING (test failures/selector drift), REMOVE (deprecated feature) — triggers and behaviour per mode + +6. 
**Scope** (10 bullets from spec): + - Load Business Seed + Exploration Manifest before crawling + - Crawl web app via MCP Playwright using manifest-guided navigation + - Fill forms and traverse wizards using business-supplied test values + - Identify Features from discovered UI surfaces + - Detect scenario gaps (existing scenarios vs US ACs) + - Generate Gherkin scenarios from User Story ACs + - Write and extend step definitions + - Heal PageObjects after UI changes (MCP Playwright drift detection) + - Challenge US/AC validity when app behaviour has changed + - Sync Gherkin feature files with living doc + +7. **Does NOT**: create living doc entities (→ `@living-doc-copilot`); write unit/integration tests (→ `@sdet-copilot`); run quality gates (→ `@quality-gate-copilot`) + +8. **Shared skill note**: `living-doc-gap-finder` is used bottom-up here (scenario coverage for known ACs) vs top-down in `@living-doc-copilot` (missing documentation) + +9. **Skills** (6): `living-doc-pageobject-scan`, `living-doc-scenario-creator`, `living-doc-gap-finder`, `gherkin-scenario`, `gherkin-step`, `gherkin-living-doc-sync` — each with path `skills/{name}/SKILL.md` + +10. **Handoff out** (two paths): + - Feature list → `@living-doc-copilot` to document + - After Phase 3: *"Feature files and steps generated. Call @sdet-copilot for unit tests."* + +--- + +### Issue 1.A — Complete remaining Step 1 skills + +**Title:** `[Step 1] Migrate remaining living-doc BDD + gherkin skills` + +**Body:** + +``` +## Summary + +Six skills remain from the Living Doc + BDD cluster. Must ship in the same PR as Issue 1.B +(spec rule: never transfer skills without the agent definition). 
+ +## Files to create + +| Destination | Source | +|---|---| +| `skills/living-doc-pageobject-scan/SKILL.md` | `.copilot/skills/living-doc-pageobject-scan/SKILL.md` | +| `skills/living-doc-scenario-creator/SKILL.md` | `.copilot/skills/living-doc-scenario-creator/SKILL.md` | +| `skills/gherkin-scenario/SKILL.md` | `.copilot/skills/gherkin-scenario/SKILL.md` | +| `skills/gherkin-step/SKILL.md` | `.copilot/skills/gherkin-step/SKILL.md` | +| `skills/gherkin-living-doc-sync/SKILL.md` | `.copilot/skills/gherkin-living-doc-sync/SKILL.md` | + +## Acceptance criteria + +- [ ] All 6 folder names match their `name` frontmatter fields exactly +- [ ] All `description` fields ≤ 1024 chars and include trigger keywords +- [ ] `living-doc-gap-finder` description (already migrated) notes dual shared-skill usage + — verify it covers both top-down (@living-doc-copilot) and bottom-up (@living-doc-bdd-copilot) +- [ ] `gherkin-scenario` description notes optional @sdet-copilot usage at unit level +- [ ] All bodies < 500 lines (use `references/` with pointer if needed) +- [ ] No hardcoded credentials or absolute local paths +- [ ] Closed in same PR as Issue 1.B + +## Reference + +Spec → Agent Catalog → @living-doc-bdd-copilot skills table +``` + +--- + +### Issue 1.B — Create @living-doc-bdd-copilot agent + +**Title:** `[Step 1] Create @living-doc-bdd-copilot agent definition` + +**Body:** + +``` +## Summary + +Author `.github/agents/living-doc-bdd-copilot.agent.md` — automation-layer agent for web app +exploration (Phases 0+1), BDD scenario generation (Phase 3), and maintenance (Phase 6). + +## File to create + +`.github/agents/living-doc-bdd-copilot.agent.md` + +## Required frontmatter + +See roadmap.md → Step 1 → Agent outline: @living-doc-bdd-copilot for full frontmatter block. +Key requirement: all mcp_microsoft_pla_browser_* tools must be listed. + +## Required body sections + +1. Phase 0 — Business Seed assembly (Sources A–E; credential safety; seed.yaml output) +2. 
Phase 1 — Iterative exploration (load seed + manifest; plateau detection; manifest.json output) +3. Partial state — seed.yaml present but manifest.json absent = treat as first Phase 1 run +4. Source E — Guided traversal (pause/screenshot/ask/execute/write guided_steps; CAPTCHA rule) +5. Phase 3 — Scenario generation with gap detection +6. Phase 6 — Maintenance (RE-SCAN / HEALING / REMOVE) +7. Scope — 10 bullets +8. Does NOT — with redirect targets +9. Shared skill note for living-doc-gap-finder +10. Skills table — 6 entries with paths +11. Handoff out — two paths; prompts verbatim from spec + +## Acceptance criteria + +- [ ] All MCP Playwright tools listed in frontmatter +- [ ] Sources A–E documented with exact behaviour per source +- [ ] Credential safety rule present (env:VAR_NAME, never literal) +- [ ] Partial state handling documented +- [ ] Guided traversal protocol includes CAPTCHA pause-and-wait +- [ ] Phase 6 all three maintenance modes documented +- [ ] All 6 skills referenced as `skills/{name}/SKILL.md` +- [ ] Handoff prompt exact: "Feature files and steps generated. Call @sdet-copilot for unit tests." +- [ ] Closed in same PR as Issue 1.A +``` + +--- + +### Planned agent: `@living-doc-bdd-tutorial-copilot` + +The tutorial generation capability (previously `living-doc-tutorial-creator` skill) will ship as +a dedicated agent rather than as part of `@living-doc-bdd-copilot`. It will own the full +tutorial authoring pipeline: transform executed BDD scenarios into annotated tutorial documents, +SSML narration scripts, and onboarding walkthroughs. 
+ +| Attribute | Value | +|---|---| +| Agent file | `.github/agents/living-doc-bdd-tutorial-copilot.agent.md` | +| Skill | `skills/living-doc-tutorial-creator/SKILL.md` — migrate from `.copilot/skills/` | +| Inbound trigger | Executed `.feature` files + optional screenshots | +| Output | Annotated tutorial `.md`, SSML narration script | +| Step | Separate step (not yet scheduled) | + +--- + +## Step 2 — SDET Cluster + +### Files to create + +| File | Type | +|---|---| +| `skills/tdd-workflow/SKILL.md` | Skill — migrate from `.copilot/skills/` | +| `skills/test-unit-write/SKILL.md` | Skill — migrate | +| `skills/test-unit-review/SKILL.md` | Skill — migrate | +| `skills/test-unit-standards/SKILL.md` | Skill — migrate | +| `skills/test-case-design/SKILL.md` | Skill — migrate | +| `skills/test-data-management/SKILL.md` | Skill — migrate | +| `skills/test-mocking-patterns/SKILL.md` | Skill — migrate | +| `.github/agents/sdet-copilot.agent.md` | Agent — author from spec | + +### Agent outline: `@sdet-copilot` + +**Frontmatter:** + +```yaml +--- +description: > + Daily developer test-engineering companion. Use for: TDD red-green-refactor, writing + unit and integration tests, reviewing existing test files, designing test case tables, + managing test data and fixtures, choosing test doubles. Phase 4 of the engineering + pipeline. Triggers: "write tests", "TDD", "review my tests", "test doubles", + "test data", "red-green-refactor", "sdet copilot", "write unit tests", + "add tests for", "design test cases", "add coverage for". +tools: + - read_file + - replace_string_in_file + - create_file + - grep_search + - file_search + - semantic_search +--- +``` + +**Required body sections:** + +1. 
**Technology-neutral escalation constraint** (4-step): express guidance language-agnostic first; if language-specific tooling required ask *"What is your target technology / language?"*; recommend escalating to `@quality-gate-copilot` with the matching language skill (`qa-python`, `qa-java`, `qa-scala`, `qa-typescript`, `qa-dotnet`); if no match, provide generic guidance and note the gap + +2. **Scope** (6 bullets): TDD workflow; write unit and integration test code; review and audit test files; design test case tables; manage test data; choose test doubles + +3. **Does NOT**: run CI quality gates (→ `@quality-gate-copilot`); write Gherkin/BDD *as standalone BDD pipeline deliverables* (→ `@living-doc-bdd-copilot`; `gherkin-scenario` available optionally at unit level); handle specialised test types — accessibility, security, E2E, API (→ `@test-specialist-copilot`); improve test quality depth — mutation, property-based, flakiness (→ `@test-quality-copilot`) + +4. **Skills** (7): `tdd-workflow`, `test-unit-write`, `test-unit-review`, `test-unit-standards`, `test-case-design`, `test-data-management`, `test-mocking-patterns`; note `gherkin-scenario` as optional 8th when team uses BDD at unit level + +5. **Handoff out**: *"Tests written. Run @quality-gate-copilot to enforce the gate."* + +--- + +### Issue 2.1 — SDET cluster (skills + agent) + +**Title:** `[Step 2] SDET cluster — migrate 7 skills and create @sdet-copilot agent` + +**Body:** + +``` +## Summary + +Migrate 7 SDET skills and author the @sdet-copilot agent definition as a single PR. 
+ +## Files to create + +skills/tdd-workflow/SKILL.md, skills/test-unit-write/SKILL.md, +skills/test-unit-review/SKILL.md, skills/test-unit-standards/SKILL.md, +skills/test-case-design/SKILL.md, skills/test-data-management/SKILL.md, +skills/test-mocking-patterns/SKILL.md, .github/agents/sdet-copilot.agent.md + +## Acceptance criteria — skills + +- [ ] `tdd-workflow` body references SPEC.md-first pattern in the red-green-refactor cycle +- [ ] `test-unit-standards`, `test-unit-write`, and `test-unit-review` cross-reference each other + correctly (rule set vs procedural write vs procedural review distinction) +- [ ] All bodies < 500 lines; all descriptions ≤ 1024 chars +- [ ] All folder names match `name` frontmatter exactly + +## Acceptance criteria — agent + +- [ ] Escalation path says "recommend @quality-gate-copilot", not "load qa-* skill internally" +- [ ] Gherkin Does NOT entry is qualified: "standalone BDD pipeline deliverable"; + optional unit-level exception noted +- [ ] All 7 skills referenced by path `skills/{name}/SKILL.md` +- [ ] Handoff prompt exact: "Tests written. Run @quality-gate-copilot to enforce the gate." + +## Reference + +Spec → Agent Catalog → @sdet-copilot +``` + +--- + +## Step 3 — Code Quality Cluster + +### Files to create + +| File | Type | +|---|---| +| `skills/qa-python/SKILL.md` | Skill — migrate from `.copilot/skills/` | +| `skills/qa-java/SKILL.md` | Skill — migrate | +| `skills/qa-scala/SKILL.md` | Skill — migrate | +| `skills/qa-typescript/SKILL.md` | Skill — migrate | +| `skills/qa-dotnet/SKILL.md` | Skill — migrate | +| `skills/qa-terraform/SKILL.md` | Skill — migrate | +| `skills/test-coverage-gate/SKILL.md` | Skill — migrate | +| `.github/agents/quality-gate-copilot.agent.md` | Agent — author from spec | + +### Agent outline: `@quality-gate-copilot` + +**Frontmatter:** + +```yaml +--- +description: > + Enforce code quality standards — diagnose and fix CI quality gate failures across all + languages and stacks. 
Use for: linting, formatting, static analysis violations, coverage + thresholds, Javadoc, type annotations, and logging standards. Phase 5 of the pipeline. + Triggers: "quality gate", "CI failing", "coverage below", "lint error", "scalafmt", + "pylint", "quality gate copilot", "fix linting", "coverage threshold", "SpotBugs", + "ESLint violation", "dotnet format", "tflint failure". +tools: + - read_file + - grep_search + - file_search + - semantic_search + - run_in_terminal +--- +``` + +**Required body sections:** + +1. **Scope** (5 bullets): run/fix linting and formatting; fix static analysis violations; configure and enforce coverage thresholds; diagnose CI gate failures per language; apply logging/Javadoc/type annotation standards + +2. **Language routing table** — maps language/stack to skill and path: + + | Language | Skill | Path | + |---|---|---| + | Python | `qa-python` | `skills/qa-python/SKILL.md` | + | Java | `qa-java` | `skills/qa-java/SKILL.md` | + | Scala | `qa-scala` | `skills/qa-scala/SKILL.md` | + | TypeScript / JS | `qa-typescript` | `skills/qa-typescript/SKILL.md` | + | C# / .NET | `qa-dotnet` | `skills/qa-dotnet/SKILL.md` | + | HCL / Terraform | `qa-terraform` | `skills/qa-terraform/SKILL.md` | + | All (coverage) | `test-coverage-gate` | `skills/test-coverage-gate/SKILL.md` | + +3. **Does NOT**: write test code (→ `@sdet-copilot`); handle mutation testing strategy (→ `@test-quality-copilot`); author IaC modules (→ `cps-iac` in `cps-agentic-skills`, not this repo) + +4. **Skills** (7): language column + intent label + path + +--- + +### Issue 3.1 — Code Quality cluster (skills + agent) + +**Title:** `[Step 3] Code Quality cluster — migrate 7 skills and create @quality-gate-copilot agent` + +**Body:** + +``` +## Summary + +Migrate 7 code quality skills and author the @quality-gate-copilot agent as a single PR. 
+ +## Files to create + +skills/qa-python/SKILL.md, skills/qa-java/SKILL.md, skills/qa-scala/SKILL.md, +skills/qa-typescript/SKILL.md, skills/qa-dotnet/SKILL.md, skills/qa-terraform/SKILL.md, +skills/test-coverage-gate/SKILL.md, .github/agents/quality-gate-copilot.agent.md + +## Acceptance criteria — skills + +- [ ] `qa-scala` body covers JMF filter requirement for JaCoCo +- [ ] `test-coverage-gate` distinguishes baseline measurement (no CI block) from + new-code gate (hard fail) +- [ ] Each `qa-*` description includes language-specific trigger keywords +- [ ] All folder names match `name` frontmatter exactly; all descriptions ≤ 1024 chars + +## Acceptance criteria — agent + +- [ ] Language routing table covers all 5 languages + HCL + cross-language coverage +- [ ] `run_in_terminal` present in tools (this agent executes commands) +- [ ] IaC redirect points to `cps-iac` in `cps-agentic-skills`, not this plugin +- [ ] All 7 skills referenced by path + +## Reference + +Spec → Agent Catalog → @quality-gate-copilot +``` + +--- + +## Step 4 — Test Specialist Cluster + +### Files to create + +| File | Type | +|---|---| +| `skills/test-accessibility/SKILL.md` | Skill — migrate from `.copilot/skills/` | +| `skills/test-api-standards/SKILL.md` | Skill — migrate | +| `skills/test-e2e-standards/SKILL.md` | Skill — migrate | +| `skills/test-integration-standards/SKILL.md` | Skill — migrate | +| `skills/test-ui-standards/SKILL.md` | Skill — migrate | +| `skills/test-security/SKILL.md` | Skill — migrate | +| `.github/agents/test-specialist-copilot.agent.md` | Agent — author from spec | + +### Agent outline: `@test-specialist-copilot` + +**Frontmatter:** + +```yaml +--- +description: > + Apply specialised testing for specific test types beyond standard unit tests. + Use for: accessibility (axe-core, WCAG 2.1 AA), API and Pact contract tests, + cross-service E2E, Testcontainers integration isolation, Angular/React/Cypress UI + tests, and SAST/DAST security scanning. 
Triggers: "a11y test", "Pact", + "E2E standards", "security scan", "Cypress", "Testcontainers", "accessibility", + "contract test", "test specialist copilot", "UI tests", "integration isolation". +tools: + - read_file + - replace_string_in_file + - create_file + - grep_search + - file_search + - semantic_search + - run_in_terminal +--- +``` + +**Required body sections:** + +1. **Specialisation routing table**: + + | Concern | Skill | Path | + |---|---|---| + | Accessibility / WCAG | `test-accessibility` | `skills/test-accessibility/SKILL.md` | + | REST + contract (Pact) | `test-api-standards` | `skills/test-api-standards/SKILL.md` | + | Cross-service E2E | `test-e2e-standards` | `skills/test-e2e-standards/SKILL.md` | + | Testcontainers / DB isolation | `test-integration-standards` | `skills/test-integration-standards/SKILL.md` | + | Angular / React / Cypress | `test-ui-standards` | `skills/test-ui-standards/SKILL.md` | + | SAST / DAST / dep scanning | `test-security` | `skills/test-security/SKILL.md` | + +2. **Scope** (6 bullets from spec) + +3. **Does NOT**: write standard unit tests (→ `@sdet-copilot`); run language-specific quality gates (→ `@quality-gate-copilot`); write BDD scenarios (→ `@living-doc-bdd-copilot`); improve test quality depth (→ `@test-quality-copilot`) + +4. **Skills** (6): Specialisation column + path + +--- + +### Issue 4.1 — Test Specialist cluster (skills + agent) + +**Title:** `[Step 4] Test Specialist cluster — migrate 6 skills and create @test-specialist-copilot agent` + +**Body:** + +``` +## Summary + +Migrate 6 Test Specialist skills and author the @test-specialist-copilot agent as a single PR. 
+ +## Files to create + +skills/test-accessibility/SKILL.md, skills/test-api-standards/SKILL.md, +skills/test-e2e-standards/SKILL.md, skills/test-integration-standards/SKILL.md, +skills/test-ui-standards/SKILL.md, skills/test-security/SKILL.md, +.github/agents/test-specialist-copilot.agent.md + +## Acceptance criteria — skills + +- [ ] `test-accessibility` covers axe-core, jest-axe, cypress-axe, WCAG 2.1 AA +- [ ] `test-api-standards` covers Pact consumer-driven contracts +- [ ] `test-e2e-standards` is clearly distinguished from `test-ui-standards` + (cross-service boundary vs UI-only) +- [ ] `test-integration-standards` covers Testcontainers and isolation/cleanup rules +- [ ] `test-security` covers SAST (Bandit/Semgrep), DAST (ZAP), dep scanning (Snyk/pip-audit) +- [ ] All folder names match `name` frontmatter; all descriptions ≤ 1024 chars + +## Acceptance criteria — agent + +- [ ] Specialisation routing table covers all 6 concerns +- [ ] Does NOT list distinguishes from all four other test agents +- [ ] All 6 skills referenced by path + +## Reference + +Spec → Agent Catalog → @test-specialist-copilot +``` + +--- + +## Step 4b — Test Quality Cluster + +### Files to create + +| File | Type | +|---|---| +| `skills/test-mutation/SKILL.md` | Skill — migrate from `.copilot/skills/` | +| `skills/test-property-based/SKILL.md` | Skill — migrate | +| `skills/test-flakiness-diagnosis/SKILL.md` | Skill — migrate | +| `skills/test-observability/SKILL.md` | Skill — migrate | +| `.github/agents/test-quality-copilot.agent.md` | Agent — author from spec | + +### Agent outline: `@test-quality-copilot` + +**Frontmatter:** + +```yaml +--- +description: > + Improve depth and reliability of existing tests — the quality improvement layer applied + after baseline tests are in place. 
Use for: mutation score improvement (mutmut, PIT, + Stryker), property-based testing (Hypothesis, ScalaCheck, fast-check), flaky test + diagnosis and repair, and observability test assertions (logs, metrics, OTel traces). + Triggers: "mutation", "Hypothesis", "flaky test", "test logs", "test metrics", + "surviving mutants", "property-based", "test quality copilot", "improve test quality". +tools: + - read_file + - replace_string_in_file + - create_file + - grep_search + - file_search + - semantic_search + - run_in_terminal +--- +``` + +**Required body sections:** + +1. **Prerequisite note**: called after `@sdet-copilot` has baseline coverage; this is a depth-improvement layer, not a test-writing starter + +2. **Scope** (4 bullets): mutation testing; property-based testing; flaky test diagnosis and repair; observability assertions + +3. **Does NOT**: write new test suites from scratch (→ `@sdet-copilot`); enforce CI quality gates (→ `@quality-gate-copilot`); handle specialised test types (→ `@test-specialist-copilot`) + +4. **Skills** (4): `test-mutation`, `test-property-based`, `test-flakiness-diagnosis`, `test-observability` — Specialisation column + path each + +5. **Handoff out**: None — this agent is terminal. Quality improvements are applied in place; no downstream phase requires a handoff. + +--- + +### Issue 4b.1 — Test Quality cluster (skills + agent) + +**Title:** `[Step 4b] Test Quality cluster — migrate 4 skills and create @test-quality-copilot agent` + +**Body:** + +``` +## Summary + +Migrate 4 Test Quality skills and author the @test-quality-copilot agent as a single PR. +This is a depth-improvement layer; call after @sdet-copilot has established baseline coverage. 
+ +## Files to create + +skills/test-mutation/SKILL.md, skills/test-property-based/SKILL.md, +skills/test-flakiness-diagnosis/SKILL.md, skills/test-observability/SKILL.md, +.github/agents/test-quality-copilot.agent.md + +## Acceptance criteria — skills + +- [ ] `test-mutation` covers mutmut (Python), PIT (Java/Scala), Stryker (JS/TS) +- [ ] `test-property-based` covers Hypothesis, ScalaCheck, fast-check +- [ ] `test-flakiness-diagnosis` covers async timing, shared state, CI environment differences +- [ ] `test-observability` covers structured log assertions, prometheus_client fake registry, + InMemorySpanExporter for OTel spans +- [ ] All folder names match `name` frontmatter; all descriptions ≤ 1024 chars + +## Acceptance criteria — agent + +- [ ] Prerequisite note present: depth-improvement layer, not starter +- [ ] Handoff out section present and states this agent is terminal (no downstream phase) +- [ ] All 4 skills referenced by path + +## Reference + +Spec → Agent Catalog → @test-quality-copilot +``` + +--- + +## Step 5 — Standalone Skills + +No agent file — these 3 skills do not form a single cluster (`pr-review` stands alone and the two `contract-*` skills are a pair), so no grouping reaches the 3-skill minimum for a dedicated agent per governance rules. + +### Files to create + +| File | Type | +|---|---| +| `skills/pr-review/SKILL.md` | Skill — migrate from `.copilot/skills/` | +| `skills/contract-openapi/SKILL.md` | Skill — migrate | +| `skills/contract-schema-registry/SKILL.md` | Skill — migrate | + +> **Future note:** `contract-openapi` + `contract-schema-registry` are candidates for a +> `@devops-copilot` agent when IaC skills are consolidated here. Until then, standalone. + +--- + +### Issue 5.1 — Standalone skills + +**Title:** `[Step 5] Migrate standalone skills: pr-review, contract-openapi, contract-schema-registry` + +**Body:** + +``` +## Summary + +Migrate 3 standalone skills — final migration step. No agent file required.
+ +## Files to create + +| Destination | Source | +|---|---| +| `skills/pr-review/SKILL.md` | `.copilot/skills/pr-review/SKILL.md` | +| `skills/contract-openapi/SKILL.md` | `.copilot/skills/contract-openapi/SKILL.md` | +| `skills/contract-schema-registry/SKILL.md` | `.copilot/skills/contract-schema-registry/SKILL.md` | + +## Acceptance criteria + +- [ ] `pr-review` description states it is language-agnostic +- [ ] `contract-openapi` description explicitly distinguishes from `contract-schema-registry` + (REST/OpenAPI vs event schema registry) +- [ ] `contract-schema-registry` description explicitly distinguishes from `test-api-standards` + (schema registry vs Pact consumer-driven contracts) +- [ ] All folder names match `name` frontmatter; all descriptions ≤ 1024 chars +- [ ] No hardcoded internal paths or credentials + +## Reference + +Spec → Standalone Skills section; Governance Rules → Agent scope (Skills < 3 → standalone) +``` + +--- + +## Post-implementation checklist + +After all steps are merged: + +- [ ] Update `docs/README.md` — add rows to the Skill Guides table for each skill that has a companion guide +- [ ] Update `README.md` — add the full skill catalog and agent roster +- [ ] Verify summary totals: 40 unique skill files, 7 agent files +- [ ] Add evals for at least one skill per cluster under `skills/{name}/evals/` (see `docs/testing/skill-testing.md`) +- [ ] Run trigger accuracy test for shared skills (`living-doc-gap-finder`, `gherkin-scenario`) to verify correct agent activates for each intent +- [ ] Confirm `@living-doc-bdd-copilot` MCP Playwright tools are available in the target deployment environment +- [ ] Cross-check all agent handoff prompts match each other: + - `@living-doc-bdd-copilot (explore)` → *"Surfaces mapped. Call @living-doc-copilot to document them."* + - `@living-doc-copilot` → *"US and ACs are ready. Call @living-doc-bdd-copilot to generate scenarios."* + - `@living-doc-bdd-copilot` → *"Feature files and steps generated. 
Call @sdet-copilot for unit tests."* + - `@sdet-copilot` → *"Tests written. Run @quality-gate-copilot to enforce the gate."* + - `@quality-gate-copilot` → *"Gate green. Pipeline complete."* + - `@test-quality-copilot` → terminal (no handoff) diff --git a/skills/gherkin-step/SKILL.md b/skills/gherkin-step/SKILL.md index 2d0c291..d0b085b 100644 --- a/skills/gherkin-step/SKILL.md +++ b/skills/gherkin-step/SKILL.md @@ -4,8 +4,7 @@ description: > Implementing Gherkin step definitions that are clean, reusable, and maintainable. Activate when writing or reviewing step definition code, binding Gherkin text to automation, managing shared state between steps, configuring parameter types, parsing DataTable or DocString arguments, or - setting up Before/After hooks. Covers Python behave, Cucumber for Java and TypeScript, and - Cucumber-Scala idioms. + setting up Before/After hooks. Covers Python behave and Cucumber for TypeScript. Triggers on: "step definitions", "implement Gherkin steps", "Cucumber step", "behave step", "parameter type", "DataTable", "DocString", "Before hook", "After hook", "World object", "step context", "step state sharing", "how to share state between steps", @@ -71,8 +70,6 @@ Use the framework-provided context object, which is instantiated fresh for each |-----------|-------------|---------| | behave (Python) | `context` | Attach attributes: `context.order = ...` | | Cucumber (TypeScript) | `World` class | Extend `World`; access via `this` | -| Cucumber (Java) | Shared class via PicoContainer or Spring | Constructor injection | -| Cucumber (Scala) | Shared class via DI | PicoContainer or manual injection | ```python # ✅ behave — context carries state across steps @@ -85,27 +82,6 @@ def step_assert_discount(context, rate): assert context.customer.discount_rate() == rate ``` -```java -// ✅ Cucumber (Java) — shared state via PicoContainer constructor injection -public class OrderSteps { - private final OrderWorld world; - - public OrderSteps(OrderWorld 
world) { - this.world = world; - } - - @Given("a customer with a {string} membership") - public void givenCustomer(String tier) { - world.customer = new Customer(tier); - } - - @Then("the discount is {int}%") - public void thenDiscount(int rate) { - assertEquals(rate, world.customer.discountRate()); - } -} -``` - --- ## Use typed parameters @@ -153,25 +129,3 @@ def setup_database(context): if "database" in context.tags: context.db = create_test_db() ``` - ---- - -## Scala — Cucumber-Scala idioms - -```scala -// ✅ — Shared state via constructor injection -class OrderSteps(world: OrderWorld) extends StrictScalaDsl { - - Given("""a customer with a {string} membership""") { (tier: String) => - world.customer = Customer(tier = tier) - } - - When("""the customer purchases {int} units of {string}""") { (qty: Int, sku: String) => - world.order = world.customer.placeOrder(sku, qty) - } - - Then("""the order total is £{double}""") { (expected: Double) => - world.order.total shouldBe expected - } -} -``` diff --git a/skills/living-doc-pageobject-scan/SKILL.md b/skills/living-doc-pageobject-scan/SKILL.md index a86832e..b4e7ee0 100644 --- a/skills/living-doc-pageobject-scan/SKILL.md +++ b/skills/living-doc-pageobject-scan/SKILL.md @@ -75,9 +75,8 @@ One PageObject class per distinct screen. Naming: `Page`. ```python # ✅ Generated skeleton — Python / Playwright +# living-doc: FEAT-003 | /checkout class CheckoutPage: - """Checkout screen: /checkout — FEAT-003""" - ORDER_SUMMARY = '[data-testid="order-summary"]' CONFIRM_BUTTON = '[data-testid="confirm-order-btn"]' PROMO_INPUT = '[data-testid="promo-code-input"]' @@ -96,8 +95,40 @@ class CheckoutPage: expect(self.page.locator(self.ERROR_BANNER)).to_contain_text(message) ``` -Include the Feature ID (`FEAT-`) in the class docstring to maintain traceability to the -living doc catalog. 
+```typescript +// ✅ Generated skeleton — TypeScript / Playwright +// living-doc: FEAT-003 | /checkout +import { type Page, type Locator, expect } from '@playwright/test'; + +export class CheckoutPage { + readonly orderSummary: Locator; + readonly confirmButton: Locator; + readonly promoInput: Locator; + readonly errorBanner: Locator; + + constructor(readonly page: Page) { + this.orderSummary = page.getByTestId('order-summary'); + this.confirmButton = page.getByTestId('confirm-order-btn'); + this.promoInput = page.getByTestId('promo-code-input'); + this.errorBanner = page.getByTestId('error-banner'); + } + + async enterPromoCode(code: string): Promise<void> { + await this.promoInput.fill(code); + } + + async confirmOrder(): Promise<void> { + await this.confirmButton.click(); + } + + async assertErrorVisible(message: string): Promise<void> { + await expect(this.errorBanner).toContainText(message); + } +} +``` + +The Living Doc Feature link (`FEAT-`) is recorded in a file-level header comment (see +examples above) — not in the class docstring. Flag fragile selectors: @@ -108,7 +139,7 @@ Flag fragile selectors: One PageObject ≈ one `UI` Feature. For each generated PageObject: - If a matching Feature (`FEAT-`) exists in the catalog: link them in the manifest -- If no Feature exists: generate a draft Feature stub (JSON) for `living-doc-create-feature` +- If no Feature exists: invoke `living-doc-create-feature` to produce a draft Feature entity in the project's Storage Profile format **5. Generate Functionality stubs from discovered elements** @@ -119,7 +150,8 @@ the glossary pattern ``: - Form → `"Login Page – Submit Credentials"` - Table → `"Order History Page – Display Order List"` -Output as draft JSON for review — not auto-committed. +Output as draft Functionality entities in the project's Storage Profile format for review — not +auto-committed. Use `living-doc-create-functionality` to produce the canonical output.
**Dynamic list elements:** @@ -164,14 +196,16 @@ files before running a full rescan. ## Output artifacts -| Artifact | Location | +| Artifact | Example location | |---|---| | PageObject files | `tests/pages/Page.py` | -| Draft Feature stubs | `docs/living-doc/features/draft/FEAT-.json` | -| Draft Functionality stubs | `docs/living-doc/functionalities/draft/FUNC-.json` | +| Draft Feature entities | `docs/living-doc/features/draft/FEAT-.` | +| Draft Functionality entities | `docs/living-doc/functionalities/draft/FUNC-.` | | Breaking change report | stdout / PR comment | | Exploration manifest | Path discovered by agent on session start (search for `manifest.json` with `pageobject_path` entries); created at `.copilot/bdd/manifest.json` only if no existing manifest is found | +> **Note:** Locations above are illustrative defaults. Actual paths and file formats depend on the project's repository structure and Storage Profile configuration. + --- ## Out-of-scope redirects diff --git a/skills/living-doc-scenario-creator/SKILL.md b/skills/living-doc-scenario-creator/SKILL.md index d487886..726fa7f 100644 --- a/skills/living-doc-scenario-creator/SKILL.md +++ b/skills/living-doc-scenario-creator/SKILL.md @@ -26,7 +26,7 @@ description: > **AC traceability tag** (mandatory — placed above every `Scenario:` line): ```gherkin -# AC: US-001-01 (v1.0.0 – Active) — Happy path: customer places an order +# AC: US-001-01 (v1.0.0 – Active) — customer places an order Scenario: Customer successfully places an order ``` @@ -43,8 +43,8 @@ ACs with state `Planned` or `Deprecated` are excluded from generation; note them | Available PageObjects | `tests/pages/` directory | Recommended | | Existing step definitions | `tests/steps/` directory | Recommended | -If PageObjects or step files are not available, generate scenarios with placeholder step text -and flag all steps as `[STEP: MISSING — implement with PageObject method]`. 
+If PageObjects or step files are not available, generate scenarios with stub step implementations +(see Step 3 for the two-case protocol: PageObject method found vs. not found). --- @@ -64,8 +64,7 @@ For each active AC, select the scenario pattern by AC type: - `error` → `Scenario: ` - `alternative` → `Scenario: ` -Generate a scenario for **every** active AC regardless of priority. Tag low-priority AC -scenarios with `@low-priority` so they can be excluded from smoke runs without losing traceability. +Generate a scenario for **every** active AC. Map Given-When-Then from the AC to existing step definitions — reuse exact step text where found. @@ -79,15 +78,20 @@ Scenario: Customer successfully places an order And the cart is emptied ``` -### Step 3 — Identify missing steps +### Step 3 — Implement missing step stubs -For each step not found in existing step files: +For each step not found in existing step files, generate a named stub function in the +appropriate step file. Apply the following two-case protocol: + +**Case A — A PageObject method can implement the step:** + +Generate the full stub using the available method: ``` MISSING STEP: "Given the customer has items in their cart and a saved payment method" → PageObject candidate: CheckoutPage (FEAT-003) → Suggested step file: tests/steps/checkout_steps.py - → Suggested implementation: + → Generated stub: @given('the customer has items in their cart and a saved payment method') def step_customer_has_cart_with_payment(context): context.checkout_page = CheckoutPage(context.browser) @@ -95,6 +99,26 @@ MISSING STEP: "Given the customer has items in their cart and a saved payment me context.checkout_page.set_saved_payment_method() ``` +**Case B — No matching PageObject method exists for the step:** + +Generate a stub with a `NotImplementedError` failure guard and flag the gap to +`living-doc-pageobject-scan` (Maintain mode) so it can extend the PageObject: + +``` +MISSING STEP + MISSING PAGEOBJECT METHOD: + "When the 
customer applies a promo code" + → No matching method found in CheckoutPage (FEAT-003) + → Generated stub (with failure guard): + @when('the customer applies a promo code') + def step_apply_promo_code(context): + raise NotImplementedError( + "Step not implemented: 'the customer applies a promo code'. " + "CheckoutPage (FEAT-003) is missing an 'apply_promo_code' method. " + "Run living-doc-pageobject-scan (Maintain mode) on FEAT-003 to add it." + ) + → Action: invoke living-doc-pageobject-scan (Maintain mode) for the missing element +``` + ### Step 4 — Validate AC coverage Every active AC must map to at least one scenario. @@ -102,10 +126,10 @@ Run `scripts/coverage_report.py ` for a full cata ``` AC COVERAGE REPORT — US-001 - AC:US-001-01 (Active, critical): ✅ covered by "Customer successfully places an order" - AC:US-001-02 (Active, critical): ✅ covered by "Order rejected when payment card is declined" - AC:US-001-03 (Active, high): ❌ NOT COVERED — added to gap list - AC:US-001-04 (Deprecated): ⏭ skipped — deprecated AC + AC:US-001-01 (Active): ✅ covered by "Customer successfully places an order" + AC:US-001-02 (Active): ✅ covered by "Order rejected when payment card is declined" + AC:US-001-03 (Active): ❌ NOT COVERED — added to gap list + AC:US-001-04 (Deprecated): ⏭ skipped — deprecated AC ``` Use `scripts/coverage_report.py` to generate this report across the full catalog. @@ -120,7 +144,7 @@ Feature: Place an online order I can place an order for in-stock items So that the items are delivered to my address - # AC: US-001-01 (v1.0.0 – Active) — Happy path: customer places an order + # AC: US-001-01 (v1.0.0 – Active) — customer places an order Scenario: Customer successfully places an order ... @@ -129,7 +153,7 @@ Feature: Place an online order ... ``` -**Missing step report** — list of step functions to implement, grouped by step file. 
+**Missing step report** — generated stub implementations grouped by step file; Case B stubs include `NotImplementedError` failure guards and flag missing PageObject methods for extension (see Step 3). **Coverage table** — ACs with coverage status (use `scripts/coverage_report.py`). @@ -145,13 +169,15 @@ Feature: Place an online order ## File placement -| Step domain | Step file | +| Step domain | Example step file | |---|---| | Authentication | `tests/steps/auth_steps.py` | | Checkout / order | `tests/steps/checkout_steps.py` | | Common / shared | `tests/steps/common_steps.py` | | Domain-specific | `tests/steps/_steps.py` | +> **Note:** Paths above are illustrative examples. Actual file locations depend on the project's repository structure. + --- ## Out-of-scope redirects From c63f2dd7cc883ca3f771599597400715578fc898 Mon Sep 17 00:00:00 2001 From: miroslavpojer Date: Fri, 22 May 2026 20:19:32 +0200 Subject: [PATCH 13/35] feat: update Gherkin step definitions and enhance living documentation skills with new trigger phrases and improved descriptions --- skills/gherkin-step/SKILL.md | 3 ++- .../evals/trigger-eval.json | 4 ++- skills/living-doc-create-user-story/SKILL.md | 3 ++- .../evals/evals.json | 3 +-- skills/living-doc-gap-finder/evals/evals.json | 25 +++++++++++-------- .../evals/trigger-eval.json | 3 ++- .../evals/evals.json | 8 +++--- skills/living-doc-update/evals/evals.json | 3 +-- .../living-doc-update/evals/trigger-eval.json | 2 +- 9 files changed, 31 insertions(+), 23 deletions(-) diff --git a/skills/gherkin-step/SKILL.md b/skills/gherkin-step/SKILL.md index d0b085b..09b3766 100644 --- a/skills/gherkin-step/SKILL.md +++ b/skills/gherkin-step/SKILL.md @@ -4,7 +4,8 @@ description: > Implementing Gherkin step definitions that are clean, reusable, and maintainable. 
Activate when writing or reviewing step definition code, binding Gherkin text to automation, managing shared state between steps, configuring parameter types, parsing DataTable or DocString arguments, or - setting up Before/After hooks. Covers Python behave and Cucumber for TypeScript. + setting up Before/After hooks. Covers Python behave, Cucumber for Java and TypeScript, and + Cucumber-Scala idioms. Triggers on: "step definitions", "implement Gherkin steps", "Cucumber step", "behave step", "parameter type", "DataTable", "DocString", "Before hook", "After hook", "World object", "step context", "step state sharing", "how to share state between steps", diff --git a/skills/living-doc-create-functionality/evals/trigger-eval.json b/skills/living-doc-create-functionality/evals/trigger-eval.json index e7a6c73..619f568 100644 --- a/skills/living-doc-create-functionality/evals/trigger-eval.json +++ b/skills/living-doc-create-functionality/evals/trigger-eval.json @@ -12,5 +12,7 @@ {"id": 11, "query": "Create a user story for the checkout capability", "should_trigger": false, "reason": "User Story — routes to living-doc-create-user-story"}, {"id": 12, "query": "Document the checkout page as a Feature", "should_trigger": false, "reason": "Feature entity — routes to living-doc-create-feature"}, {"id": 13, "query": "Generate BDD scenarios for US-001", "should_trigger": false, "reason": "Scenario generation — routes to living-doc-scenario-creator"}, - {"id": 14, "query": "Run a gap analysis on the living documentation", "should_trigger": false, "reason": "Gap analysis — routes to living-doc-gap-finder"} + {"id": 14, "query": "Run a gap analysis on the living documentation", "should_trigger": false, "reason": "Gap analysis — routes to living-doc-gap-finder"}, + {"id": 15, "query": "How should I define the component behavior for the payment validator?", "should_trigger": true, "reason": "'define component behavior' trigger phrase"}, + {"id": 16, "query": "Write atomic acceptance 
criteria for the session expiry logic", "should_trigger": true, "reason": "'atomic acceptance criteria' trigger phrase"} ] diff --git a/skills/living-doc-create-user-story/SKILL.md b/skills/living-doc-create-user-story/SKILL.md index 9a91833..396c7ca 100644 --- a/skills/living-doc-create-user-story/SKILL.md +++ b/skills/living-doc-create-user-story/SKILL.md @@ -7,7 +7,8 @@ description: > Criteria, or validating User Story completeness before handing off to scenario creation. Triggers on: "create a user story", "new user story for", "write acceptance criteria for", "document a business requirement", "define US AC", "user story template", "as a user I want", - "elicit requirements", "AC for user story", "US acceptance criteria". + "elicit requirements", "AC for user story", "US acceptance criteria", + "review this user story", "is my narrative well-formed". Does NOT trigger for: atomic component behaviors (use living-doc-create-functionality), documenting system surfaces (use living-doc-create-feature), generating BDD scenarios (use living-doc-scenario-creator). diff --git a/skills/living-doc-create-user-story/evals/evals.json b/skills/living-doc-create-user-story/evals/evals.json index 401c846..8492b15 100644 --- a/skills/living-doc-create-user-story/evals/evals.json +++ b/skills/living-doc-create-user-story/evals/evals.json @@ -5,7 +5,7 @@ "id": 1, "category": "happy-path", "prompt": "I want to create a new User Story for the password reset capability.", - "expected_output": "Agent asks three elicitation questions in sequence: (1) Who is the user? (2) What do they want to do? (3) Why / what business outcome? After answers, forms the As-a/I-can/so-that narrative. Then asks for domain context (which Feature). Then elicits ACs in Given-When-Then. Checks for error and alternative paths (unregistered email, expired token, already-used token). Assigns priorities. 
Outputs canonical User Story JSON with at least 3 ACs (happy path + at least 2 error/alternative).", + "expected_output": "Agent asks three elicitation questions in sequence: (1) Who is the user? (2) What do they want to do? (3) Why / what business outcome? After answers, forms the As-a/I-can/so-that narrative. Then asks for domain context (which Feature). Then elicits ACs in Given-When-Then. Checks for error and alternative paths (unregistered email, expired token, already-used token). Outputs canonical User Story JSON with at least 3 ACs (happy path + at least 2 error/alternative).", "files": [], "expectations": [ "Asks actor, capability, and business value as distinct questions before writing narrative", @@ -13,7 +13,6 @@ "Asks which Feature(s) this story touches", "Elicits at least one error-path AC (e.g. unregistered email, expired token)", "Warns if only happy-path AC is provided", - "Assigns independent priority to each AC", "Outputs valid canonical UserStory JSON" ] }, diff --git a/skills/living-doc-gap-finder/evals/evals.json b/skills/living-doc-gap-finder/evals/evals.json index 8fb49d3..62548a0 100644 --- a/skills/living-doc-gap-finder/evals/evals.json +++ b/skills/living-doc-gap-finder/evals/evals.json @@ -5,17 +5,21 @@ "id": 1, "category": "happy-path", "prompt": "Run a gap analysis on our living documentation. File: evals/files/catalog-snapshot.json", - "expected_output": "Agent analyzes the snapshot and produces a gap report with: Blocker — US-001-AC-2 and US-001-AC-3 (critical ACs) have no linked tests; Blocker — all 4 ACs of US-007 have no linked tests; Important — /account/preferences screen discovered in webapp with no Feature entity; Important — FEAT-orphan (Legacy Report Screen) has no User Stories and no Functionalities; Important — test_order_history.py and test_login_flow.feature have no linked ACs (orphan tests); Nit — FUNC-apply-discount has 5 ACs with no linked tests. 
Documentation coverage = 1/9 ACs covered = 11%.", + "expected_output": "Agent analyzes the snapshot and produces a gap report with: Blocker — US-001-AC-2 and US-001-AC-3 have no linked tests; Blocker — US-002-AC-1 and US-002-AC-2 have no linked tests; Blocker — all 4 ACs of US-007 have no linked tests; Blocker — FUNC-apply-discount has 5 ACs with no linked tests (Gap type 1 covers Functionality ACs, not just User Story ACs); Important — /account/preferences screen discovered in webapp with no Feature entity; Important — FEAT-promo and FEAT-orphan each have no linked User Stories (orphan Features); Important — test_order_history.py, test_login_flow.feature, and the 'View paginated order history' BDD scenario have no linked ACs (orphan tests); Nit — FEAT-orphan has no Functionalities (empty Feature). Documentation coverage reported for both US ACs and Functionality ACs separately.", "files": [ "evals/files/catalog-snapshot.json" ], "expectations": [ - "Identifies US-001-AC-2 and US-001-AC-3 as untested critical ACs (Blockers)", + "Identifies US-001-AC-2 and US-001-AC-3 as untested (Blockers)", + "Identifies US-002-AC-1 and US-002-AC-2 as untested (Blockers)", "Identifies all 4 US-007 ACs as untested (Blockers)", + "Identifies FUNC-apply-discount ACs as untested (Blocker, not Nit — Gap type 1 applies to Functionality ACs)", "Identifies /account/preferences as undocumented surface (Important)", + "Identifies FEAT-promo as orphan Feature (Important)", "Identifies FEAT-orphan as orphan Feature (Important)", "Identifies test_order_history.py and test_login_flow.feature as orphan tests (Important)", - "Identifies FUNC-apply-discount ACs as untested (Nit)", + "Identifies 'View paginated order history' BDD scenario as orphan test (Important)", + "Identifies FEAT-orphan as empty Feature (Nit — no Functionalities)", "Calculates documentation coverage percentage" ] }, @@ -35,10 +39,10 @@ "id": 3, "category": "happy-path", "prompt": "A test file exists with no linked AC. 
What gap type is this and what should I do?", - "expected_output": "This is an orphan test (Gap type 4 — Important). Resolution options: (1) find an existing AC in the living doc that this test covers and add the link; (2) if no AC exists, create a Functionality entity for the behavior being tested using living-doc-create-functionality, then link the test to the new Functionality's AC. Never delete a test to resolve an orphan — that would remove coverage.", + "expected_output": "This is an orphan test (Gap type 6 — Important). Resolution options: (1) find an existing AC in the living doc that this test covers and add the link; (2) if no AC exists, create a Functionality entity for the behavior being tested using living-doc-create-functionality, then link the test to the new Functionality's AC. Never delete a test to resolve an orphan — that would remove coverage.", "files": [], "expectations": [ - "Classifies as Gap type 4: ORPHAN_TEST", + "Classifies as Gap type 6: ORPHAN_TEST", "Provides two resolution options: link to existing AC, or create new Functionality", "Explicitly warns against deleting the test" ] @@ -97,13 +101,14 @@ "id": 8, "category": "output-format", "prompt": "Run a gap analysis and show me exactly what format the output report uses.", - "expected_output": "The gap report has two clearly labelled sections: 'Missing Scenarios' (ACs with no linked Gherkin scenario) and 'Missing ACs' (scenarios with no corresponding AC). Each missing item is a bulleted entry with the AC or scenario ID and its description. The report ends with a count summary line: 'X ACs missing scenarios, Y scenarios missing ACs'. No implementation changes are suggested — the report is diagnostic only.", + "expected_output": "The gap report is emitted as structured JSON (or a formatted rendering of it) with a top-level `documentation_coverage` section (coverage_percentage, user_stories_with_full_coverage, user_stories_with_gaps) and a `gaps[]` array. 
Each gap item includes: id (GAP-NNN), type (one of UNTESTED_AC, UNDOCUMENTED_SURFACE, ORPHAN_FEATURE, ORPHAN_USER_STORY, ORPHAN_FUNCTIONALITY, ORPHAN_TEST, STALE_REFERENCE, UNDOCUMENTED_FUNCTIONALITY, EMPTY_FEATURE), severity (Blocker/Important/Nit), entity (the affected entity ID or path), description, and proposed_action. Gaps are ordered by severity (Blocker first, then Important, then Nit). The report is diagnostic only — no entity creation or modification is made.", "files": [], "expectations": [ - "Two sections: Missing Scenarios and Missing ACs", - "Each item is a bulleted entry with ID and description", - "Count summary line at the end", - "Diagnostic only — no implementation suggestions" + "Report includes top-level documentation_coverage section with coverage_percentage", + "gaps[] array present; each item has id, type, severity, entity, description, proposed_action", + "Gap type codes are canonical: UNTESTED_AC, ORPHAN_TEST, ORPHAN_FEATURE, etc.", + "Gaps ordered by severity (Blocker before Important before Nit)", + "Diagnostic only — no entity creation or modification" ] } ] diff --git a/skills/living-doc-gap-finder/evals/trigger-eval.json b/skills/living-doc-gap-finder/evals/trigger-eval.json index d558fb7..b2b0fa0 100644 --- a/skills/living-doc-gap-finder/evals/trigger-eval.json +++ b/skills/living-doc-gap-finder/evals/trigger-eval.json @@ -12,5 +12,6 @@ {"id": 11, "query": "Find what's not documented in our test suite", "should_trigger": true, "reason": "'find what's not documented' trigger phrase"}, {"id": 12, "query": "Create a user story for the preferences screen gap", "should_trigger": false, "reason": "Creating a User Story — routes to living-doc-create-user-story"}, {"id": 13, "query": "Create a Feature entity for the account preferences screen", "should_trigger": false, "reason": "Creating a Feature — routes to living-doc-create-feature"}, - {"id": 14, "query": "Generate a tutorial from the checkout .feature file", "should_trigger": false, 
"reason": "Tutorial generation — routes to living-doc-tutorial-creator"} + {"id": 14, "query": "Generate a tutorial from the checkout .feature file", "should_trigger": false, "reason": "Tutorial generation — routes to living-doc-tutorial-creator"}, + {"id": 15, "query": "Do a documentation audit to check for missing tests before the go-live", "should_trigger": true, "reason": "'documentation audit' trigger phrase"} ] diff --git a/skills/living-doc-impact-analysis/evals/evals.json b/skills/living-doc-impact-analysis/evals/evals.json index 00b966c..a184185 100644 --- a/skills/living-doc-impact-analysis/evals/evals.json +++ b/skills/living-doc-impact-analysis/evals/evals.json @@ -9,7 +9,7 @@ "expected": { "skill_triggered": true, "key_guidance": [ - "Map changed file to Feature via FEATURE_REGISTRY.md", + "Map changed file to Feature via the feature_registry section in catalog.json (or run trace_impact.py --catalog catalog.json)", "Trace Feature → Functionality → User Stories → ACs", "Classify impact level (High for changed business logic)", "Output structured impact map" @@ -48,8 +48,8 @@ { "id": 4, "type": "regression", - "description": "Changed module has no entry in FEATURE_REGISTRY.md — should flag the gap.", - "prompt": "The ShippingCalculator.java was changed in the PR but it doesn't appear in FEATURE_REGISTRY.md.", + "description": "Changed module has no entry in the feature_registry section of catalog.json — should flag the gap.", + "prompt": "The ShippingCalculator.java was changed in the PR but it has no entry in the feature_registry section of catalog.json.", "expected": { "skill_triggered": true, "key_guidance": [ @@ -101,7 +101,7 @@ "expected": { "skill_triggered": true, "key_guidance": [ - "Map changed code to Feature via FEATURE_REGISTRY.md", + "Map changed code to Feature via the feature_registry section in catalog.json", "Trace Feature → Functionality → User Stories → ACs", "List all linked Gherkin scenarios that need re-running", "Output structured 
re-test checklist" diff --git a/skills/living-doc-update/evals/evals.json b/skills/living-doc-update/evals/evals.json index 8a5958f..d32ea68 100644 --- a/skills/living-doc-update/evals/evals.json +++ b/skills/living-doc-update/evals/evals.json @@ -53,8 +53,7 @@ "expected": { "skill_triggered": true, "key_guidance": [ - "Update owners array in Feature JSON", - "Update FEATURE_REGISTRY.md row", + "Update owners array in Feature entity JSON", "Add owner_changed_at and owner_change_reason fields", "Notify new owner if open User Stories exist" ] diff --git a/skills/living-doc-update/evals/trigger-eval.json b/skills/living-doc-update/evals/trigger-eval.json index e3e6294..3291af0 100644 --- a/skills/living-doc-update/evals/trigger-eval.json +++ b/skills/living-doc-update/evals/trigger-eval.json @@ -51,7 +51,7 @@ "id": "t09-update-feature-registry", "query": "How do I update the Feature Registry after the team restructure?", "should_trigger": true, - "reason": "'update feature registry' is a listed trigger phrase." + "reason": "Updating Feature ownership is a core living-doc-update task covered by 'change feature owner'; Feature Registry updates are part of Feature ownership changes handled in the skill body." }, { "id": "t10-not-create-us", From b983ef6a64fb58155ceb92797c7a683b1ff2837f Mon Sep 17 00:00:00 2001 From: miroslavpojer Date: Fri, 22 May 2026 20:30:28 +0200 Subject: [PATCH 14/35] Add evaluation configurations for Gherkin and PageObject skills - Introduced trigger evaluations for Gherkin living doc sync to identify sync-related queries. - Added evaluation scenarios for Gherkin scenario creation, focusing on writing and reviewing BDD scenarios. - Implemented evaluations for Gherkin step definitions, emphasizing the distinction between Gherkin text and step binding code. - Created evaluations for living doc page object scanning, including bootstrap and maintain modes for PageObject generation. 
- Established evaluations for living doc scenario creation, generating BDD scenarios from user stories and handling coverage reports. --- .../gherkin-living-doc-sync/evals/evals.json | 112 +++++++++++++++++ .../evals/trigger-eval.json | 17 +++ skills/gherkin-scenario/evals/evals.json | 112 +++++++++++++++++ .../gherkin-scenario/evals/trigger-eval.json | 18 +++ skills/gherkin-step/evals/evals.json | 108 ++++++++++++++++ skills/gherkin-step/evals/trigger-eval.json | 19 +++ .../evals/evals.json | 115 ++++++++++++++++++ .../evals/trigger-eval.json | 16 +++ .../evals/evals.json | 112 +++++++++++++++++ .../evals/trigger-eval.json | 16 +++ 10 files changed, 645 insertions(+) create mode 100644 skills/gherkin-living-doc-sync/evals/evals.json create mode 100644 skills/gherkin-living-doc-sync/evals/trigger-eval.json create mode 100644 skills/gherkin-scenario/evals/evals.json create mode 100644 skills/gherkin-scenario/evals/trigger-eval.json create mode 100644 skills/gherkin-step/evals/evals.json create mode 100644 skills/gherkin-step/evals/trigger-eval.json create mode 100644 skills/living-doc-pageobject-scan/evals/evals.json create mode 100644 skills/living-doc-pageobject-scan/evals/trigger-eval.json create mode 100644 skills/living-doc-scenario-creator/evals/evals.json create mode 100644 skills/living-doc-scenario-creator/evals/trigger-eval.json diff --git a/skills/gherkin-living-doc-sync/evals/evals.json b/skills/gherkin-living-doc-sync/evals/evals.json new file mode 100644 index 0000000..5cc8fbe --- /dev/null +++ b/skills/gherkin-living-doc-sync/evals/evals.json @@ -0,0 +1,112 @@ +{ + "skill_name": "gherkin-living-doc-sync", + "evals": [ + { + "id": 1, + "category": "happy-path", + "prompt": "checkout.feature has 3 scenarios but none of them have a # AC: comment above them. What should I do?", + "expected_output": "Agent identifies all three scenarios as missing AC link headers (sync direction: feature file → living doc). 
For each scenario, it proposes a matching AC from the living doc catalog and outputs a SYNC ACTION block per scenario showing the proposed # AC: line to insert. Format: 'SYNC ACTION: checkout.feature: — Missing AC link header — Proposed link: # AC: () — '. Asks the developer to confirm each mapping before applying.", + "files": [], + "expectations": [ + "Identifies all scenarios missing # AC: comments", + "Proposes a matching AC from the catalog for each scenario", + "Outputs a SYNC ACTION block for each affected scenario", + "Does not apply changes without developer confirmation", + "Does not delete any scenario" + ] + }, + { + "id": 2, + "category": "happy-path", + "prompt": "The product owner changed the description of AC:US-001-01 in the living doc. The linked scenario in checkout.feature still has the old # AC: description text. How do I fix this?", + "expected_output": "Sync direction is living doc → feature file. Agent updates the # AC: comment text above the linked scenario to reflect the new AC description. The AC ID itself is never changed — only the description text in the comment. Outputs: OLD: '# AC: US-001-01 (v1.0.0 – Active) — ' and NEW: '# AC: US-001-01 (v1.1.0 – Active) — '. Flags any step text that may also need updating to match the revised AC intent.", + "files": [], + "expectations": [ + "Sync direction: living doc → feature file", + "Updates comment description text only — AC ID remains stable", + "Shows old and new comment text clearly labelled", + "Flags step text for review if the AC intent changed significantly", + "Does not change the scenario structure" + ] + }, + { + "id": 3, + "category": "happy-path", + "prompt": "After renaming a button's data-testid, the step 'When the customer clicks the Confirm Purchase button' no longer matches any step definition. The old step was 'When the customer confirms the order'. How do I fix this?", + "expected_output": "Sync direction: step text → PageObject. 
Agent outputs a DRIFT DETECTED block: identifies the broken step text, the previous matching step definition, the PageObject method it delegates to (CheckoutPage.confirm_order()), and proposes two fix options: (1) update the .feature file step text to match the existing step definition, or (2) update the step definition regex to match the new wording. Recommends option 1 (update feature file) as the lower-risk change since the PageObject method and step definition are already working.", + "files": [], + "expectations": [ + "Outputs DRIFT DETECTED block with affected file and line number", + "Identifies the broken step text and the expected step definition", + "Links the step to its PageObject method", + "Provides two fix options", + "Recommends updating the feature file as the lower-risk fix" + ] + }, + { + "id": 4, + "category": "regression", + "prompt": "User Story US-042 has been deprecated in the living doc. There are 3 scenarios in promo.feature linked to ACs of US-042. What happens to these scenarios?", + "expected_output": "When a US is deprecated, the sync direction is living doc → feature file. Agent proposes: add @deprecated tag to each affected scenario; add a comment above each with the reason (e.g. '# Deprecated: US-042 was deprecated on 2026-05-15 — reason: feature removed'). Never deletes scenarios — marks with @review-needed for developer decision. Lists all 3 affected scenarios with their line numbers.", + "files": [], + "expectations": [ + "Adds @deprecated tag to each scenario linked to the deprecated US", + "Adds a comment explaining the reason for deprecation", + "Never deletes scenarios — marks with @review-needed", + "Lists all affected scenarios with file and line number" + ] + }, + { + "id": 5, + "category": "negative", + "prompt": "I need a new Gherkin scenario for the case where a promo code has expired.", + "expected_output": "Writing new scenarios is out of scope for this skill — routes to gherkin-scenario. 
gherkin-living-doc-sync corrects existing links and syncs existing scenarios; it does not write new scenarios from scratch.", + "files": [], + "expectations": [ + "Does not write a new scenario", + "Routes to gherkin-scenario", + "Explains the distinction: sync vs. write new" + ] + }, + { + "id": 6, + "category": "paraphrase", + "prompt": "After the UI redesign our feature files are a mess — step text is broken and half the AC links point to the wrong things. Where do I start?", + "expected_output": "Agent identifies this as a multi-direction sync problem. Recommends running scan_ac_links.py first to get a full audit before applying changes. Then: (1) run step text drift detection for all .feature files to find broken step bindings; (2) audit # AC: headers for correctness; (3) apply the minimum change per sync action. Outputs a prioritised repair plan: broken steps (risk: tests fail) before stale AC links (risk: traceability gaps).", + "files": [], + "expectations": [ + "Recommends scan_ac_links.py as first action", + "Detects both step text drift and stale AC links", + "Prioritises broken steps (test failures) over stale links (traceability)", + "Applies minimum necessary change per action" + ] + }, + { + "id": 7, + "category": "edge-case", + "prompt": "A scenario has '# AC: US-099-01' in its header but US-099 does not exist in the living doc catalog. What should I do?", + "expected_output": "This is a broken AC reference. Agent outputs a SYNC ACTION flagging the scenario with the broken link. Resolution options: (1) find the correct AC ID in the catalog that matches this scenario's intent and update the # AC: comment; (2) if the behavior is new and has no AC, invoke living-doc-create-user-story or living-doc-create-functionality to create the missing entity, then link the scenario. 
Never silently removes the # AC: comment — that would destroy the traceability intent.", + "files": [], + "expectations": [ + "Detects the broken AC reference (AC ID not in catalog)", + "Provides two resolution options: find correct AC or create missing entity", + "Does not silently remove the # AC: comment", + "Routes entity creation to the appropriate living-doc-create-* skill" + ] + }, + { + "id": 8, + "category": "output-format", + "prompt": "Show me what the output of a sync run looks like when there are 2 missing AC links and 1 step text drift.", + "expected_output": "Output contains: (1) a SYNC ACTION block for each missing AC link — showing the feature file path, line number, scenario title, and proposed # AC: header; (2) a DRIFT DETECTED block for the broken step — showing the old step text, the expected step text, the PageObject method, and the suggested fix. Each block has a clear header (SYNC ACTION / DRIFT DETECTED). A summary line at the end: '2 missing AC links, 1 step text drift detected — apply changes? 
(y/n per action)'.", + "files": [], + "expectations": [ + "SYNC ACTION blocks for each missing AC link with file, line, scenario, and proposed fix", + "DRIFT DETECTED block with old step, expected step, PageObject method, and fix options", + "Each block has a distinct labelled header", + "Summary line with counts at the end", + "Asks for confirmation before applying" + ] + } + ] +} diff --git a/skills/gherkin-living-doc-sync/evals/trigger-eval.json b/skills/gherkin-living-doc-sync/evals/trigger-eval.json new file mode 100644 index 0000000..0be41ff --- /dev/null +++ b/skills/gherkin-living-doc-sync/evals/trigger-eval.json @@ -0,0 +1,17 @@ +[ + {"id": 1, "query": "Sync the checkout feature file to the living doc", "should_trigger": true, "reason": "'sync gherkin to living doc' trigger phrase"}, + {"id": 2, "query": "My feature file is out of sync with the living doc catalog", "should_trigger": true, "reason": "'feature file out of sync' trigger phrase"}, + {"id": 3, "query": "This scenario has no # AC: comment linking it to the living doc", "should_trigger": true, "reason": "'scenario not linked to AC' trigger phrase"}, + {"id": 4, "query": "The step text changed after the UI refactor — what needs updating?", "should_trigger": true, "reason": "'step text changed' trigger phrase"}, + {"id": 5, "query": "There is Gherkin drift between the feature files and the living doc", "should_trigger": true, "reason": "'gherkin drift' trigger phrase"}, + {"id": 6, "query": "I updated an AC in the living doc — how do I propagate that to the BDD scenario?", "should_trigger": true, "reason": "'update living doc after BDD change' and living-doc → feature file sync direction"}, + {"id": 7, "query": "Run a BDD sync between the feature files and living documentation", "should_trigger": true, "reason": "'BDD sync' trigger phrase"}, + {"id": 8, "query": "The AC link header is missing from several scenarios in checkout.feature", "should_trigger": true, "reason": "'AC link missing in feature 
file' trigger phrase"}, + {"id": 9, "query": "Sync all scenarios in the payments feature file", "should_trigger": true, "reason": "'sync scenarios' trigger phrase"}, + {"id": 10, "query": "The Gherkin scenarios are out of sync with the living doc", "should_trigger": true, "reason": "'gherkin out of sync with living doc' trigger phrase"}, + {"id": 11, "query": "Traceability is broken between the feature files and the AC catalog", "should_trigger": true, "reason": "'traceability broken' trigger phrase"}, + {"id": 12, "query": "Write a new scenario for the expired promo AC", "should_trigger": false, "reason": "Writing new scenarios — routes to gherkin-scenario"}, + {"id": 13, "query": "Implement the step definition for 'When the customer confirms the order'", "should_trigger": false, "reason": "Step definition implementation — routes to gherkin-step"}, + {"id": 14, "query": "Find which User Stories have no Gherkin scenarios", "should_trigger": false, "reason": "Finding living doc gaps — routes to living-doc-gap-finder"}, + {"id": 15, "query": "Create a new User Story for the checkout capability", "should_trigger": false, "reason": "Creating new entities — routes to living-doc-create-user-story"} +] diff --git a/skills/gherkin-scenario/evals/evals.json b/skills/gherkin-scenario/evals/evals.json new file mode 100644 index 0000000..ba27df0 --- /dev/null +++ b/skills/gherkin-scenario/evals/evals.json @@ -0,0 +1,112 @@ +{ + "skill_name": "gherkin-scenario", + "evals": [ + { + "id": 1, + "category": "happy-path", + "prompt": "Write a Gherkin scenario for AC:US-003-01 — 'Customer with a gold membership receives a 20% discount on orders over £100'. Use business language.", + "expected_output": "Outputs a fenced gherkin block starting with Feature:. The Scenario: is preceded by a '# AC: US-003-01 (v1.0.0 – Active) — Customer with gold membership receives 20% discount on orders over £100' comment. Given describes the customer's membership tier (not a database row). 
When describes the customer placing an order (not a POST request). Then asserts the discounted total in business terms (e.g. 'Then the order total is £160.00'). No CSS selectors, endpoint URLs, or implementation details appear in the steps.", + "files": [], + "expectations": [ + "Output is a single fenced gherkin block", + "Feature: keyword appears at the top of the block", + "# AC: US-003-01 comment appears on the line immediately above Scenario:", + "Scenario uses business language — no database rows, HTTP requests, or CSS selectors", + "Given describes state only, When describes exactly one action, Then describes outcome" + ] + }, + { + "id": 2, + "category": "happy-path", + "prompt": "Should I use a Scenario Outline for testing the discount calculation across gold (20%), silver (10%), and bronze (5%) membership tiers?", + "expected_output": "Yes — this is a classic data-driven case with a single behaviour repeated across input variations. Outputs a Scenario Outline with a parameterised step body and an Examples: table containing the three tiers (gold/80.00, silver/90.00, bronze/95.00 for £100 order). The # AC: comment appears above the Scenario Outline:. Explains that Scenario Outline is correct here because all rows exercise the same observable behaviour — the discount calculation — with different inputs.", + "files": [], + "expectations": [ + "Recommends Scenario Outline with Examples: table for the three tiers", + "# AC: comment appears above Scenario Outline:", + "Parameters are used in the step text: and ", + "Examples table includes all three membership tiers", + "Explains why Scenario Outline is appropriate here" + ] + }, + { + "id": 3, + "category": "happy-path", + "prompt": "I have a feature file where every scenario starts with the same two Given steps: 'Given the user is logged in as a gold member' and 'Given the cart has items'. 
Should I use a Background block?", + "expected_output": "Yes — Background is appropriate when every scenario in the file shares the same preconditions and the shared block is 3 steps or fewer. The Background: block replaces the repeated Given steps in each scenario. Provides guidance: if only 2–3 scenarios share the precondition, prefer duplicating the step for clarity. Warns to keep Background to 3 steps or fewer.", + "files": [], + "expectations": [ + "Recommends Background block since all scenarios share the same preconditions", + "Correctly states the no-more-than-3-steps guideline", + "Notes the alternative (duplicate Given steps) when only a subset of scenarios share the precondition", + "Does not put When or Then steps in the Background" + ] + }, + { + "id": 4, + "category": "regression", + "prompt": "Review this scenario for anti-patterns:\n\nScenario: Checkout flow\n When the customer clicks the 'Submit Order' button\n And the customer enters their credit card number '4111111111111111'\n And the customer clicks the 'Confirm Payment' button\n Then the response status is 201", + "expected_output": "Flags multiple anti-patterns: (1) UI selectors in steps — 'clicks the Submit Order button' and 'clicks the Confirm Payment button' expose implementation details; rewrite using domain actions ('the customer submits the order'). (2) Multiple When actions — the scenario has three When-equivalent steps describing a multi-step flow; should be collapsed into one declarative step or split. (3) Implementation detail in Then — 'response status is 201' is technical; rewrite as 'the order is confirmed'. (4) Missing # AC: comment. 
Provides a corrected version using domain language.", + "files": [], + "expectations": [ + "Flags UI selectors in step text (button clicks) as anti-pattern", + "Flags multiple When-equivalent actions", + "Flags technical assertion ('status 201') in Then", + "Flags missing # AC: comment", + "Provides a corrected version using domain language" + ] + }, + { + "id": 5, + "category": "negative", + "prompt": "Write the step definition code for 'When the customer confirms the order' in Python behave.", + "expected_output": "Implementing step definitions is out of scope for this skill — routes to gherkin-step. This skill writes the Gherkin text; gherkin-step handles the binding code.", + "files": [], + "expectations": [ + "Does not write step definition code", + "Routes to gherkin-step", + "Explains the distinction: Gherkin text vs. step binding code" + ] + }, + { + "id": 6, + "category": "paraphrase", + "prompt": "Convert these acceptance criteria into Gherkin:\nAC1: When the promo code is valid, the cart total decreases by 10%.\nAC2: When the promo code is expired, an error message is shown.", + "expected_output": "Outputs a fenced gherkin block with two Scenarios. Each Scenario is preceded by a # AC: comment. Scenario 1 (AC1): Given/When covers the valid promo path resulting in a 10% reduction. Scenario 2 (AC2): covers the expired promo error path. Business language throughout — no HTTP calls, selectors, or raw data.", + "files": [], + "expectations": [ + "Two scenarios, one per AC", + "# AC: comment above each Scenario", + "AC1 scenario covers the valid promo path", + "AC2 scenario covers the expired promo error path", + "Business language used throughout — no implementation details" + ] + }, + { + "id": 7, + "category": "edge-case", + "prompt": "I'm writing exploratory scenarios for a spike that doesn't have a User Story yet. 
What should I use for the # AC: comment?", + "expected_output": "For scenarios without a User Story context, use '# AC: STANDALONE' as a placeholder. Standalone scenarios are permitted when they live outside the project's dedicated living doc feature directory. Tutorial walkthroughs, exploratory probes, and developer-authored scenarios without a User Story AC all qualify. gherkin-living-doc-sync will note STANDALONE-tagged scenarios but will not flag them as traceability gaps.", + "files": [], + "expectations": [ + "Recommends '# AC: STANDALONE' as the placeholder", + "Explains that standalone is permitted for exploratory probes", + "Notes that gherkin-living-doc-sync will report but not flag STANDALONE scenarios as gaps" + ] + }, + { + "id": 8, + "category": "output-format", + "prompt": "Write a Gherkin scenario for AC:US-005-02 — 'Order is rejected when the payment card is declined'. Show the expected output format.", + "expected_output": "Output is a single fenced gherkin code block. The block starts with 'Feature:' followed by a feature title. The # AC: comment appears on the line immediately above the Scenario: keyword. The scenario has Given/When/Then steps. 
The entire output is inside one gherkin block — no extra prose inside the code block.", + "files": [], + "expectations": [ + "Entire output is a single fenced gherkin code block", + "Block starts with Feature:", + "# AC: US-005-02 comment immediately precedes Scenario:", + "Scenario has Given, When, and Then steps", + "No implementation details (no HTTP, selectors, DB) in steps" + ] + } + ] +} diff --git a/skills/gherkin-scenario/evals/trigger-eval.json b/skills/gherkin-scenario/evals/trigger-eval.json new file mode 100644 index 0000000..c2d39ef --- /dev/null +++ b/skills/gherkin-scenario/evals/trigger-eval.json @@ -0,0 +1,18 @@ +[ + {"id": 1, "query": "Write a Gherkin scenario for when a customer applies a promo code", "should_trigger": true, "reason": "'write a Gherkin scenario' trigger phrase"}, + {"id": 2, "query": "Help me write a BDD scenario for the payment failure case", "should_trigger": true, "reason": "'BDD scenario' trigger phrase"}, + {"id": 3, "query": "Review my feature file for anti-patterns", "should_trigger": true, "reason": "'feature file' and 'review my feature file' trigger phrases"}, + {"id": 4, "query": "I'm not sure how to write Given When Then for the order flow", "should_trigger": true, "reason": "'Given When Then' trigger phrase"}, + {"id": 5, "query": "Should I use a Scenario Outline here?", "should_trigger": true, "reason": "'Scenario Outline' trigger phrase"}, + {"id": 6, "query": "How do I write a Cucumber scenario for logging in?", "should_trigger": true, "reason": "'Cucumber scenario' trigger phrase"}, + {"id": 7, "query": "Write a behave scenario for the discount calculation", "should_trigger": true, "reason": "'behave scenario' trigger phrase"}, + {"id": 8, "query": "Can you help me write acceptance tests in Gherkin?", "should_trigger": true, "reason": "'acceptance test in Gherkin' trigger phrase"}, + {"id": 9, "query": "Should I use Background for shared login steps?", "should_trigger": true, "reason": "'should I use Background' 
trigger phrase"}, + {"id": 10, "query": "What are BDD anti-patterns I should avoid in feature files?", "should_trigger": true, "reason": "'BDD anti-patterns' trigger phrase"}, + {"id": 11, "query": "Review my feature file and give feedback", "should_trigger": true, "reason": "'review my feature file' trigger phrase"}, + {"id": 12, "query": "Can you write BDD scenarios for the checkout flow?", "should_trigger": true, "reason": "'BDD scenarios for' trigger phrase"}, + {"id": 13, "query": "Convert these acceptance criteria to Gherkin", "should_trigger": true, "reason": "'convert acceptance criteria to Gherkin' trigger phrase"}, + {"id": 14, "query": "Implement the step definition for 'When the customer confirms the order'", "should_trigger": false, "reason": "Step definition implementation — routes to gherkin-step"}, + {"id": 15, "query": "Write a unit test for the discount calculation function", "should_trigger": false, "reason": "Unit test request — routes to test-unit-write"}, + {"id": 16, "query": "Design a test case table for the promo code feature", "should_trigger": false, "reason": "Test case table design — routes to test-case-design"} +] diff --git a/skills/gherkin-step/evals/evals.json b/skills/gherkin-step/evals/evals.json new file mode 100644 index 0000000..afcaaff --- /dev/null +++ b/skills/gherkin-step/evals/evals.json @@ -0,0 +1,108 @@ +{ + "skill_name": "gherkin-step", + "evals": [ + { + "id": 1, + "category": "happy-path", + "prompt": "Write a Python behave step definition for 'When the customer confirms the order'. The CheckoutPage PageObject has a confirm_order() method.", + "expected_output": "Outputs a thin step definition that delegates entirely to the PageObject: @when('the customer confirms the order') / def step_confirm_order(context): / context.checkout_page.confirm_order(). No CSS selectors, no business logic, and no assertions inside the step. 
The method call is the only line in the step body (plus any state retrieval from context).", + "files": [], + "expectations": [ + "Step delegates to CheckoutPage.confirm_order() — no selector or business logic in step body", + "Uses @when decorator", + "Accesses checkout_page via context object", + "No assertions in a When step" + ] + }, + { + "id": 2, + "category": "happy-path", + "prompt": "In behave, I have a Given step that creates a customer and a Then step that checks the discount. How do I pass the customer object between them without using global variables?", + "expected_output": "Use the context object — behave instantiates it fresh per scenario so there is no contamination. In the Given step, attach the object: context.customer = Customer(tier=tier). In the Then step, read it back: context.customer.discount_rate(). Never store state in module-level or global variables. Provides a code example showing both steps using context.customer.", + "files": [], + "expectations": [ + "Uses context object to pass state between steps", + "Explicitly warns against global or module-level variables", + "Shows both Given and Then steps using context.customer", + "Notes that context is fresh per scenario" + ] + }, + { + "id": 3, + "category": "happy-path", + "prompt": "I have a step 'When the customer adds the following items' that takes a DataTable with columns 'sku' and 'quantity'. How do I parse the table in behave?", + "expected_output": "Access the table via context.table and iterate with a for loop. Each row is a dict-like object: for row in context.table: context.cart.add_item(row['sku'], int(row['quantity'])). Notes that column values are strings by default — cast quantity to int explicitly. 
Provides a complete step definition example.", + "files": [], + "expectations": [ + "Uses context.table to access the DataTable", + "Iterates rows with a for loop", + "Casts quantity to int (DataTable values are strings)", + "Delegates the actual add_item call to a domain object" + ] + }, + { + "id": 4, + "category": "regression", + "prompt": "Review this step definition:\n\n@when('the customer confirms the order')\ndef step_confirm_order(context):\n context.cart.total *= (1 - context.discount / 100)\n context.order_status = 'placed'\n context.db.save(context.order)", + "expected_output": "Flags the step as violating the 'keep step definitions thin' rule. Business logic (discount calculation, status assignment, DB write) must not live in step definitions — these belong in domain objects or PageObjects. The step should only call a method: context.checkout_page.confirm_order() or context.order_service.confirm_order(). Provides a corrected version that delegates to a domain object.", + "files": [], + "expectations": [ + "Identifies business logic in the step body as the violation", + "Flags discount calculation, status assignment, and DB write as out-of-place", + "Provides a corrected thin version that delegates to a domain/service object", + "Explains that step definitions are bindings only" + ] + }, + { + "id": 5, + "category": "negative", + "prompt": "Write a Gherkin scenario for when the promo code is expired.", + "expected_output": "Writing Gherkin scenarios is out of scope for this skill — routes to gherkin-scenario. gherkin-step handles step definition code; gherkin-scenario handles Gherkin text.", + "files": [], + "expectations": [ + "Does not write a Gherkin scenario", + "Routes to gherkin-scenario", + "Explains the distinction: step binding code vs. Gherkin text" + ] + }, + { + "id": 6, + "category": "paraphrase", + "prompt": "How do I pass data between step files in behave? 
For example, I create an order in a Given step and need to check it in a Then step in a different file.", + "expected_output": "Use the context object — behave's built-in mechanism for sharing state across steps from different files. In the Given step (any file), assign context.order = .... In the Then step (another file), read context.order. The context is scoped to the scenario and reset between scenarios. Provides a concrete code example showing a Given in one file and Then in another, both using context.order.", + "files": [], + "expectations": [ + "Recommends context object for cross-file state sharing", + "Shows assignment in Given and access in Then", + "Notes that context is scenario-scoped", + "Does not recommend global variables" + ] + }, + { + "id": 7, + "category": "edge-case", + "prompt": "I want a Before hook in behave that only runs for scenarios tagged @database. How do I scope it?", + "expected_output": "Use the context.tags check inside the before_scenario hook defined in environment.py: def before_scenario(context, scenario): / if 'database' in context.tags: / context.db = create_test_db(). This scopes setup to only the tagged scenarios. Notes that before_scenario runs before every scenario by default — the tag check prevents unnecessary setup. Advises pairing with an after_scenario hook to clean up: if 'database' in context.tags: context.db.teardown().", + "files": [], + "expectations": [ + "Uses context.tags check to scope the hook", + "Shows before_scenario hook with 'database' tag check", + "Advises pairing with a cleanup after_scenario hook", + "Notes the hook runs before every scenario by default without the tag check" + ] + }, + { + "id": 8, + "category": "output-format", + "prompt": "Show me the correct structure for a Cucumber TypeScript step definition that reads the order ID from the World object and submits the order using a PageObject.", + "expected_output": "Output is a TypeScript code block.
The step is registered with the When function and accesses this (typed as OrderWorld). Calls this.checkoutPage.submitOrder(this.orderId). The World interface/class includes orderId and checkoutPage properties. No CSS selectors appear in the step body — they are encapsulated in CheckoutPage. Example follows the pattern: When('the customer submits the order', async function (this: OrderWorld) { await this.checkoutPage.submitOrder(this.orderId); }).", + "files": [], + "expectations": [ + "Output is a TypeScript code block", + "Step uses async function with this typed as a World class", + "Delegates to PageObject method — no selectors in step body", + "World object holds page and state properties" + ] + } + ] +} diff --git a/skills/gherkin-step/evals/trigger-eval.json b/skills/gherkin-step/evals/trigger-eval.json new file mode 100644 index 0000000..869dc62 --- /dev/null +++ b/skills/gherkin-step/evals/trigger-eval.json @@ -0,0 +1,19 @@ +[ + {"id": 1, "query": "Write step definitions for the checkout feature file", "should_trigger": true, "reason": "'step definitions' trigger phrase"}, + {"id": 2, "query": "Implement Gherkin steps for the login scenarios", "should_trigger": true, "reason": "'implement Gherkin steps' trigger phrase"}, + {"id": 3, "query": "How do I write a Cucumber step for 'When the customer submits the order'?", "should_trigger": true, "reason": "'Cucumber step' trigger phrase"}, + {"id": 4, "query": "How do I write a behave step for 'Given a gold tier customer'?", "should_trigger": true, "reason": "'behave step' trigger phrase"}, + {"id": 5, "query": "How do I configure a parameter type so a number is cast to int?", "should_trigger": true, "reason": "'parameter type' trigger phrase"}, + {"id": 6, "query": "How do I parse a DataTable in a step definition?", "should_trigger": true, "reason": "'DataTable' trigger phrase"}, + {"id": 7, "query": "How do I access a DocString payload inside a step?", "should_trigger": true, "reason": "'DocString' trigger phrase"}, + {"id": 8,
"query": "How do I set up a Before hook in Cucumber TypeScript?", "should_trigger": true, "reason": "'Before hook' trigger phrase"}, + {"id": 9, "query": "How do I clean up after each scenario using an After hook?", "should_trigger": true, "reason": "'After hook' trigger phrase"}, + {"id": 10, "query": "How do I use the World object to share page instances between steps?", "should_trigger": true, "reason": "'World object' trigger phrase"}, + {"id": 11, "query": "How do I manage step context across multiple step files?", "should_trigger": true, "reason": "'step context' trigger phrase"}, + {"id": 12, "query": "How do I share data between two step definitions in behave?", "should_trigger": true, "reason": "'step state sharing' trigger phrase"}, + {"id": 13, "query": "How do I share state between steps in a Cucumber scenario?", "should_trigger": true, "reason": "'how to share state between steps' trigger phrase"}, + {"id": 14, "query": "How do I register a step definition pattern for a new step text?", "should_trigger": true, "reason": "'register step definition' trigger phrase"}, + {"id": 15, "query": "How do I set up hooks for my Cucumber test suite?", "should_trigger": true, "reason": "'hook setup' trigger phrase"}, + {"id": 16, "query": "Write a Gherkin scenario for the promo code feature", "should_trigger": false, "reason": "Writing Gherkin scenarios — routes to gherkin-scenario"}, + {"id": 17, "query": "Write a unit test for the discount calculation function", "should_trigger": false, "reason": "Unit test request — routes to test-unit-write"} +] diff --git a/skills/living-doc-pageobject-scan/evals/evals.json b/skills/living-doc-pageobject-scan/evals/evals.json new file mode 100644 index 0000000..d57985d --- /dev/null +++ b/skills/living-doc-pageobject-scan/evals/evals.json @@ -0,0 +1,115 @@ +{ + "skill_name": "living-doc-pageobject-scan", + "evals": [ + { + "id": 1, + "category": "happy-path", + "prompt": "We have a new checkout screen at 
https://app.example.com/checkout with no PageObjects yet. Please bootstrap a PageObject for it.", + "expected_output": "Operates in Create mode. Crawls /checkout and discovers interactive elements: a promo code input, a confirm order button, and an error banner. Generates a CheckoutPage skeleton with one selector constant per element, following data-testid selector preference. Outputs Python or TypeScript class with: class-level selector constants, __init__ or constructor that takes a page, and method stubs for each interactive element (enter_promo_code, confirm_order, assert_error_visible). Includes a file-level '# living-doc: FEAT- | /checkout' comment. If no matching Feature exists in the catalog, proposes drafting FEAT- via living-doc-create-feature.", + "files": [], + "expectations": [ + "Operates in Create mode", + "Generates a PageObject named CheckoutPage", + "Selector constants use data-testid preference over CSS class or positional selectors", + "Includes file-level living-doc: FEAT-nnn | /route comment", + "Method stubs for each interactive element", + "Proposes creating Feature entity if none exists in catalog" + ] + }, + { + "id": 2, + "category": "happy-path", + "prompt": "The checkout screen has an element that only has a positional CSS selector like 'button:nth-child(2)'. How should this be handled in the PageObject?", + "expected_output": "Flags the element as fragile with a recommendation: 'Element has a positional CSS selector. Please add: data-testid=\"confirm-order-btn\"'. The selector is still included in the generated PageObject (to avoid blocking test authoring) but is annotated with a comment like '# FRAGILE: positional selector — add data-testid=\"confirm-order-btn\" to the element'. 
The flag is also added to the breaking change report / scan output.", + "files": [], + "expectations": [ + "Flags the positional selector as fragile", + "Provides the specific data-testid recommendation in kebab-case", + "Still includes the selector in the PageObject to avoid blocking development", + "Annotates the selector constant with a FRAGILE comment" + ] + }, + { + "id": 3, + "category": "happy-path", + "prompt": "After a UI update the confirm order button changed its data-testid from 'confirm-order-btn' to 'submit-order-btn'. CheckoutPage still uses the old selector. What does the Maintain mode output look like?", + "expected_output": "Operates in Maintain mode. Diffs the existing CheckoutPage selector against the current DOM. CONFIRM_BUTTON resolves to the old data-testid which is no longer present. Outputs a BREAKING CHANGE report: 'CheckoutPage.CONFIRM_BUTTON: [data-testid=\"confirm-order-btn\"] not found in DOM → Linked step: \"When the customer confirms the order\" (checkout.feature:14) → Action required: update selector to [data-testid=\"submit-order-btn\"]'. Updates the selector constant in CheckoutPage. Never auto-deletes the method.", + "files": [], + "expectations": [ + "Operates in Maintain mode", + "Detects the broken selector as a BREAKING CHANGE", + "Links the broken selector to affected step definitions and feature file lines", + "Proposes the updated selector", + "Updates selector constants only — preserves method logic", + "Never auto-deletes a method" + ] + }, + { + "id": 4, + "category": "regression", + "prompt": "The order summary table element was removed from the checkout page in the latest release, but CheckoutPage still has get_order_summary() referencing '[data-testid=\"order-summary\"]'. What happens?", + "expected_output": "Maintain mode detects the missing element. ORDER_SUMMARY selector resolves to nothing in the current DOM. Flags as BREAKING CHANGE with the linked step definition.
Does NOT auto-delete the get_order_summary() method — flags it for developer review with the comment '# BREAKING: [data-testid=\"order-summary\"] not found in DOM — verify if element was removed or renamed'. The breaking change report includes the linked step text so the developer knows which test steps will fail.", + "files": [], + "expectations": [ + "Detects missing element as BREAKING CHANGE", + "Does not auto-delete the method", + "Adds a BREAKING comment to the selector constant", + "Lists linked step definitions that will be affected", + "Flags for developer review rather than auto-removing" + ] + }, + { + "id": 5, + "category": "negative", + "prompt": "Generate BDD scenarios for US-007 — Place an Online Order.", + "expected_output": "Generating scenarios from User Stories is out of scope for this skill — routes to living-doc-scenario-creator. living-doc-pageobject-scan generates PageObjects from a live webapp or test suite; scenario generation from User Story ACs is handled by living-doc-scenario-creator.", + "files": [], + "expectations": [ + "Does not generate BDD scenarios", + "Routes to living-doc-scenario-creator", + "Explains the distinction: PageObject discovery vs. scenario generation" + ] + }, + { + "id": 6, + "category": "paraphrase", + "prompt": "Our login page was redesigned and the Playwright tests are all failing because the selectors are wrong. Can you help?", + "expected_output": "This is a Maintain mode request. Agent rescans the login page, diffs LoginPage selectors against the current DOM, and identifies which selector constants are broken. Outputs BREAKING CHANGE entries for each missing selector and proposes the updated data-testid values. Updates the selector constants in LoginPage while preserving existing method logic. 
Flags any remaining positional selectors for the dev team to add proper data-testid attributes.", + "files": [], + "expectations": [ + "Identifies this as a Maintain mode operation", + "Diffs existing LoginPage selectors against the current DOM", + "Outputs BREAKING CHANGE entries for each broken selector", + "Updates selector constants — does not touch method bodies", + "Flags any remaining positional selectors" + ] + }, + { + "id": 7, + "category": "edge-case", + "prompt": "Should I create a PageObject for the /api/orders REST endpoint?", + "expected_output": "No — this skill generates PageObjects only for UI Features (web pages, modals, screens). API Features use annotated endpoint methods as their living contract anchor, not PageObjects. For API surfaces, the appropriate living doc anchor is a Functionality entity documenting the endpoint behaviour. Routes this request to living-doc-create-functionality for the endpoint behavior documentation.", + "files": [], + "expectations": [ + "Explains that PageObjects are for UI Features only — not API endpoints", + "Explains the alternative: Functionality entities for API surfaces", + "Routes to living-doc-create-functionality for API documentation" + ] + }, + { + "id": 8, + "category": "output-format", + "prompt": "Show me the expected structure of a generated CheckoutPage PageObject in Python for a screen at /checkout linked to FEAT-003.", + "expected_output": "Output is a Python code block. Line 1: '# living-doc: FEAT-003 | /checkout'. The class is named CheckoutPage. Selector constants are ALL_CAPS class attributes with data-testid values. The __init__ takes a page parameter and assigns self.page. Methods are named in snake_case, each wrapping a single page action or assertion. No inline selectors in method bodies — selectors are referenced only via the class constants. 
Methods include at least one action method and one assertion method (assert_*).", + "files": [], + "expectations": [ + "Output is a Python code block", + "File-level comment: # living-doc: FEAT-003 | /checkout", + "Class named CheckoutPage", + "Selector constants are ALL_CAPS class attributes", + "Methods reference class-level selector constants — no inline selectors", + "At least one action method and one assertion method" + ] + } + ] +} diff --git a/skills/living-doc-pageobject-scan/evals/trigger-eval.json b/skills/living-doc-pageobject-scan/evals/trigger-eval.json new file mode 100644 index 0000000..03867b5 --- /dev/null +++ b/skills/living-doc-pageobject-scan/evals/trigger-eval.json @@ -0,0 +1,16 @@ +[ + {"id": 1, "query": "Scan this webapp and generate PageObjects for each screen", "should_trigger": true, "reason": "'scan this webapp' trigger phrase"}, + {"id": 2, "query": "Generate PageObjects for our checkout and login screens", "should_trigger": true, "reason": "'generate pageobjects' trigger phrase"}, + {"id": 3, "query": "Update the PageObjects after the UI redesign", "should_trigger": true, "reason": "'update pageobjects' trigger phrase"}, + {"id": 4, "query": "Create a PageObject for the order history screen", "should_trigger": true, "reason": "'pageobject for this screen' trigger phrase"}, + {"id": 5, "query": "Crawl the UI to discover all available screens", "should_trigger": true, "reason": "'crawl the UI' trigger phrase"}, + {"id": 6, "query": "Discover all UI elements on the checkout page", "should_trigger": true, "reason": "'discover UI elements' trigger phrase"}, + {"id": 7, "query": "Create page objects for the admin portal", "should_trigger": true, "reason": "'create page objects' trigger phrase"}, + {"id": 8, "query": "Scan the test suite to find existing PageObjects", "should_trigger": true, "reason": "'scan test suite for pageobjects' trigger phrase"}, + {"id": 9, "query": "Do a living doc bottom-up scan of the web app", "should_trigger": 
true, "reason": "'living doc bottom-up' trigger phrase"}, + {"id": 10, "query": "Bootstrap page objects for a new test suite", "should_trigger": true, "reason": "'bootstrap page objects' trigger phrase"}, + {"id": 11, "query": "There is pageobject drift after the latest UI change — what do I do?", "should_trigger": true, "reason": "'pageobject drift' trigger phrase"}, + {"id": 12, "query": "Sync the PageObjects with the current app", "should_trigger": true, "reason": "'sync pageobjects' trigger phrase"}, + {"id": 13, "query": "Create a User Story for the checkout screen", "should_trigger": false, "reason": "Creating User Stories — routes to living-doc-create-user-story"}, + {"id": 14, "query": "Generate BDD scenarios for User Story US-007", "should_trigger": false, "reason": "Generating BDD scenarios from a User Story — routes to living-doc-scenario-creator"} +] diff --git a/skills/living-doc-scenario-creator/evals/evals.json b/skills/living-doc-scenario-creator/evals/evals.json new file mode 100644 index 0000000..d9656f1 --- /dev/null +++ b/skills/living-doc-scenario-creator/evals/evals.json @@ -0,0 +1,112 @@ +{ + "skill_name": "living-doc-scenario-creator", + "evals": [ + { + "id": 1, + "category": "happy-path", + "prompt": "Generate BDD scenarios for US-007 — Place an Online Order.\n\nACs:\n- AC:US-007-01 (v1.0.0 – Active) — happy_path: Customer places an order with a saved payment method\n- AC:US-007-02 (v1.0.0 – Active) — error: Order is rejected when the payment card is declined\n- AC:US-007-03 (v1.0.0 – Active) — alternative: Customer places an order using a guest checkout", + "expected_output": "Outputs a .feature file named 'us-007-place-an-online-order.feature'. Feature header uses the As-a/I-can/so-that narrative. Three Scenarios generated — one per active AC. Each Scenario: is preceded immediately by a '# AC: US-007-0n (v1.0.0 – Active) — ...' comment. 
Scenario names follow the AC type conventions: US-007-01 is a Scenario: (happy_path), US-007-02 is titled 'Order rejected when payment card is declined' (error), US-007-03 is titled 'Place an Online Order — guest checkout' (alternative). All step text is domain language with no implementation details.", + "files": [], + "expectations": [ + "Feature file named us-007-place-an-online-order.feature", + "Feature header with As-a/I-can/so-that narrative", + "Three scenarios — one per active AC", + "Each Scenario: immediately preceded by a # AC: comment", + "AC type conventions: happy_path → Scenario:, error → 'X — error', alternative → 'X — alternative path'", + "Business language steps — no HTTP, selectors, or DB" + ] + }, + { + "id": 2, + "category": "happy-path", + "prompt": "Generate scenarios for US-007 which has these ACs:\n- AC:US-007-01 (Active) — happy_path\n- AC:US-007-04 (Deprecated) — error: old payment timeout path\n- AC:US-007-05 (Planned) — alternative: loyalty points checkout\n\nWhat scenarios get generated and what appears in the coverage report?", + "expected_output": "Only AC:US-007-01 is Active — one Scenario is generated for it. AC:US-007-04 (Deprecated) and AC:US-007-05 (Planned) are excluded from generation. Coverage report shows: AC:US-007-01 (Active): generated. AC:US-007-04 (Deprecated): skipped — deprecated AC. AC:US-007-05 (Planned): skipped — not yet active. No scenarios are generated for Deprecated or Planned ACs.", + "files": [], + "expectations": [ + "Only Active ACs drive scenario generation", + "Deprecated ACs are excluded with 'skipped — deprecated AC' notation", + "Planned ACs are excluded with 'skipped — not yet active' notation", + "Coverage report lists all ACs with their skip reason" + ] + }, + { + "id": 3, + "category": "happy-path", + "prompt": "Generating scenarios for US-007-01 — 'Customer confirms the order'. The CheckoutPage PageObject has a confirm_order() method. 
Generate the step stub for 'When the customer confirms the order'.", + "expected_output": "Case A — PageObject method exists. Generates a full step stub using the available method. Output includes: the missing step text, the PageObject candidate (CheckoutPage, FEAT-003), the suggested step file (tests/steps/checkout_steps.py), and the generated stub: @when('the customer confirms the order') / def step_confirm_order(context): / context.checkout_page = CheckoutPage(context.browser) or context.checkout_page.confirm_order(). No NotImplementedError — the PageObject method exists.", + "files": [], + "expectations": [ + "Identifies as Case A (PageObject method exists)", + "Generates a full stub — no NotImplementedError", + "Names the PageObject candidate and Feature ID", + "Suggests the correct step file based on the domain" + ] + }, + { + "id": 4, + "category": "regression", + "prompt": "Generating scenarios for US-007-02. The step 'When the customer applies a promo code' has no matching method in CheckoutPage. Generate the step stub.", + "expected_output": "Case B — No matching PageObject method. Generates a stub with a NotImplementedError failure guard. The error message names the step text, the PageObject that is missing the method (CheckoutPage, FEAT-003), and the expected method name (apply_promo_code). Also outputs an action: invoke living-doc-pageobject-scan in Maintain mode for FEAT-003 to add the missing element. 
The stub is clearly marked as incomplete and will fail at runtime until the PageObject is extended.", + "files": [], + "expectations": [ + "Identifies as Case B (missing PageObject method)", + "Generates stub with raise NotImplementedError", + "Error message includes step text, PageObject name, Feature ID, and missing method name", + "Instructs to run living-doc-pageobject-scan (Maintain mode) for the missing method", + "Stub will fail at runtime as intended" + ] + }, + { + "id": 5, + "category": "negative", + "prompt": "Write a standalone Gherkin scenario for testing login without a specific User Story.", + "expected_output": "Standalone Gherkin without a User Story is out of scope — routes to gherkin-scenario. This skill generates scenarios from User Story ACs; gherkin-scenario handles standalone or exploratory scenarios.", + "files": [], + "expectations": [ + "Does not generate a standalone scenario", + "Routes to gherkin-scenario", + "Explains the distinction: US-driven vs. standalone Gherkin" + ] + }, + { + "id": 6, + "category": "paraphrase", + "prompt": "Write feature tests for US-007 — I need full coverage of all the acceptance criteria.", + "expected_output": "Agent recognises this as a scenario generation request for a User Story. Loads US-007, filters to Active and Implemented ACs, and generates one scenario per AC. Outputs the .feature file and a coverage table. If any ACs are Planned or Deprecated, notes them as skipped in the coverage report.", + "files": [], + "expectations": [ + "Recognises 'write feature tests for US-nnn' as a scenario generation request", + "Filters to Active/Implemented ACs only", + "Generates one scenario per AC", + "Outputs coverage table with skip reasons for excluded ACs" + ] + }, + { + "id": 7, + "category": "edge-case", + "prompt": "I want to generate scenarios for US-020, but all its ACs are in Planned state. 
What happens?", + "expected_output": "No scenarios are generated — there are no Active or Implemented ACs to drive generation. Coverage report shows all ACs with 'skipped — not yet active'. The skill advises: when ACs are promoted to Active, re-run the scenario creator to generate the feature file. Does not create empty or stub scenarios for Planned ACs.", + "files": [], + "expectations": [ + "Generates zero scenarios when all ACs are Planned", + "Coverage report lists all ACs as 'skipped — not yet active'", + "Advises to re-run when ACs become Active", + "Does not create empty or stub scenarios for Planned ACs" + ] + }, + { + "id": 8, + "category": "output-format", + "prompt": "Show the expected structure of a .feature file generated for US-007 with two active ACs: one happy_path and one error AC.", + "expected_output": "Output is a gherkin code block. Filename in a comment or header: us-007-place-an-online-order.feature. Feature: block with As-a/I-can/so-that narrative from the User Story. First scenario: '# AC: US-007-01 (v1.0.0 – Active) — ...' immediately above 'Scenario: '. Second scenario: '# AC: US-007-02 (v1.0.0 – Active) — ...' immediately above 'Scenario: '. All steps use domain language. 
A coverage table appears after the feature file block.",
+ "files": [],
+ "expectations": [
+ "Output is a gherkin code block with Feature: header",
+ "Feature header uses As-a/I-can/so-that narrative",
+ "# AC: comment immediately precedes each Scenario:",
+ "Error scenario title follows convention: ''",
+ "Coverage table appended after the feature file"
+ ]
+ }
+ ]
+}
diff --git a/skills/living-doc-scenario-creator/evals/trigger-eval.json b/skills/living-doc-scenario-creator/evals/trigger-eval.json
new file mode 100644
index 0000000..f235cec
--- /dev/null
+++ b/skills/living-doc-scenario-creator/evals/trigger-eval.json
@@ -0,0 +1,16 @@
+[
+ {"id": 1, "query": "Create BDD scenarios for user story US-007 — Place an Online Order", "should_trigger": true, "reason": "'create BDD scenarios for user story' trigger phrase"},
+ {"id": 2, "query": "Generate scenarios for US-003 — Apply Promo Code", "should_trigger": true, "reason": "'generate scenarios for US' trigger phrase"},
+ {"id": 3, "query": "Cover all ACs of US-011 with BDD scenarios", "should_trigger": true, "reason": "'cover AC with scenarios' trigger phrase"},
+ {"id": 4, "query": "Generate a feature file from user story US-015", "should_trigger": true, "reason": "'generate feature file from user story' trigger phrase"},
+ {"id": 5, "query": "Create BDD scenarios from requirements for the login flow", "should_trigger": true, "reason": "'BDD from requirements' trigger phrase"},
+ {"id": 6, "query": "What is the scenario coverage for US-007?", "should_trigger": true, "reason": "'scenario coverage for US' trigger phrase"},
+ {"id": 7, "query": "Map the ACs of US-009 to Gherkin scenarios", "should_trigger": true, "reason": "'map AC to scenarios' trigger phrase"},
+ {"id": 8, "query": "Generate Gherkin from user story US-012", "should_trigger": true, "reason": "'gherkin from user story' trigger phrase"},
+ {"id": 9, "query": "Create scenarios for US-007", "should_trigger": true, "reason": "'scenarios for US-' trigger phrase — explicitly mentions a US ID"},
+ {"id": 10, "query": "Generate a .feature file for the checkout flow", "should_trigger": true, "reason": "'generate .feature file' trigger phrase"},
+ {"id": 11, "query": "Write standalone Gherkin scenarios for an exploratory test", "should_trigger": false, "reason": "Standalone Gherkin without a User Story — routes to gherkin-scenario"},
+ {"id": 12, "query": "Implement the step definition for 'When the customer confirms the order'", "should_trigger": false, "reason": "Step definition implementation — routes to gherkin-step"},
+ {"id": 13, "query": "Write a unit test for the promo code calculation", "should_trigger": false, "reason": "Unit test request — routes to test-unit-write"},
+ {"id": 14, "query": "Find which User Stories have no Gherkin coverage at all", "should_trigger": false, "reason": "Finding doc gaps — routes to living-doc-gap-finder"}
+]

From 21503cdc8408051fe61c0794285f34f36bf554a1 Mon Sep 17 00:00:00 2001
From: miroslavpojer
Date: Fri, 22 May 2026 20:47:25 +0200
Subject: [PATCH 15/35] feat: enhance Gherkin-Living Doc sync skill with improved descriptions and new trigger phrases

---
 skills/gherkin-living-doc-sync/SKILL.md | 7 ++++---
 skills/gherkin-living-doc-sync/evals/trigger-eval.json | 3 ++-
 skills/living-doc-create-feature/SKILL.md | 10 +++++-----
 skills/living-doc-create-functionality/SKILL.md | 7 ++++---
 .../evals/trigger-eval.json | 4 +++-
 5 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/skills/gherkin-living-doc-sync/SKILL.md b/skills/gherkin-living-doc-sync/SKILL.md
index 96d4080..2a867a0 100644
--- a/skills/gherkin-living-doc-sync/SKILL.md
+++ b/skills/gherkin-living-doc-sync/SKILL.md
@@ -3,12 +3,13 @@ name: gherkin-living-doc-sync
 description: >
 Synchronise Gherkin feature files and BDD scenarios with the living documentation catalog.
 Activate when scenarios diverge from User Story ACs, when step text drifts after a UI
- refactor, or when AC link headers are missing or stale. Distinct from gap-finder (which
- detects missing coverage) — corrects existing links.
+ refactor, when AC link headers or # AC: comment annotations are missing or stale, or when
+ propagating AC changes from the living doc back to feature files. Distinct from gap-finder
+ (which detects missing coverage) — corrects existing links.
 Triggers on: "sync gherkin to living doc", "feature file out of sync", "scenario not linked to AC",
 "step text changed", "gherkin drift", "update living doc after BDD change", "BDD sync",
 "AC link missing in feature file", "sync scenarios",
- "gherkin out of sync with living doc", "traceability broken".
+ "gherkin out of sync with living doc", "traceability broken", "propagate AC changes".
 Does NOT trigger for: writing new scenarios (use gherkin-scenario), implementing step
 definitions (use gherkin-step), finding living doc gaps (use living-doc-gap-finder),
 creating new US/Feature entities (use living-doc-create-user-story).
diff --git a/skills/gherkin-living-doc-sync/evals/trigger-eval.json b/skills/gherkin-living-doc-sync/evals/trigger-eval.json
index 0be41ff..28ac771 100644
--- a/skills/gherkin-living-doc-sync/evals/trigger-eval.json
+++ b/skills/gherkin-living-doc-sync/evals/trigger-eval.json
@@ -13,5 +13,6 @@
 {"id": 12, "query": "Write a new scenario for the expired promo AC", "should_trigger": false, "reason": "Writing new scenarios — routes to gherkin-scenario"},
 {"id": 13, "query": "Implement the step definition for 'When the customer confirms the order'", "should_trigger": false, "reason": "Step definition implementation — routes to gherkin-step"},
 {"id": 14, "query": "Find which User Stories have no Gherkin scenarios", "should_trigger": false, "reason": "Finding living doc gaps — routes to living-doc-gap-finder"},
- {"id": 15, "query": "Create a new User Story for the checkout capability", "should_trigger": false, "reason": "Creating new entities — routes to living-doc-create-user-story"}
+ {"id": 15, "query": "Create a new User Story for the checkout capability", "should_trigger": false, "reason": "Creating new entities — routes to living-doc-create-user-story"},
+ {"id": 16, "query": "Propagate AC changes from the living doc back to the feature files", "should_trigger": true, "reason": "'propagate AC changes' trigger phrase"}
 ]
diff --git a/skills/living-doc-create-feature/SKILL.md b/skills/living-doc-create-feature/SKILL.md
index dff91c7..92267e3 100644
--- a/skills/living-doc-create-feature/SKILL.md
+++ b/skills/living-doc-create-feature/SKILL.md
@@ -1,11 +1,11 @@
 ---
 name: living-doc-create-feature
 description: >
- Define a system surface (UI screen or API endpoint group) as a Feature entity, enabling impact
- analysis and change-management traceability in the living documentation. Activate when
- documenting a new screen or API endpoint, mapping system surfaces to User Stories, enumerating
- which Functionalities a surface owns, or bootstrapping the structural layer between User Stories
- and atomic behaviors.
+ Define a system surface (UI screen, API endpoint, service, or module) as a Feature entity,
+ enabling impact analysis and change-management traceability in the living documentation.
+ Activate when documenting a new screen, API endpoint, service, or module; maintaining a
+ Feature Registry; mapping system surfaces to User Stories; enumerating which Functionalities
+ a surface owns; or bootstrapping the structural layer between User Stories and atomic behaviors.
 Triggers on: "document a new feature", "create a feature entity", "new screen documentation",
 "document an API endpoint", "feature registry", "what feature owns this",
 "map user story to feature", "create feature entity", "system surface documentation", "feature owners",
diff --git a/skills/living-doc-create-functionality/SKILL.md b/skills/living-doc-create-functionality/SKILL.md
index 91289a5..caa96d2 100644
--- a/skills/living-doc-create-functionality/SKILL.md
+++ b/skills/living-doc-create-functionality/SKILL.md
@@ -4,14 +4,15 @@ description: >
 Define an atomic, testable behavior (Functionality) with Functionality-level Acceptance Criteria
 designed to be validated by fast unit or integration tests. Activate when documenting an atomic
 behavior, component function, or business rule; writing Functionality-level AC; creating the
- granular test anchor for a Feature; or identifying reuse candidates across User Stories.
+ granular test anchor for a Feature; identifying reuse candidates across User Stories; or
+ reviewing a Functionality for completeness.
 Triggers on: "create a functionality", "document an atomic behavior", "functionality AC",
 "unit-testable behavior", "define component behavior", "atomic acceptance criteria",
- "document a business rule", "create a functionality entity", "functionality acceptance criteria".
+ "document a business rule", "create a functionality entity", "functionality acceptance criteria",
+ "test_type", "unit vs integration test", "choose test type".
 Does NOT trigger for: end-to-end User Stories (use living-doc-create-user-story), system surface
 documentation (use living-doc-create-feature), BDD scenario generation
 (use living-doc-scenario-creator).
- Pairs with living-doc-create-feature and living-doc-scenario-creator.
 license: Apache-2.0
 compatibility: GitHub Copilot
 ---
diff --git a/skills/living-doc-create-functionality/evals/trigger-eval.json b/skills/living-doc-create-functionality/evals/trigger-eval.json
index 619f568..3d9012e 100644
--- a/skills/living-doc-create-functionality/evals/trigger-eval.json
+++ b/skills/living-doc-create-functionality/evals/trigger-eval.json
@@ -14,5 +14,7 @@
 {"id": 13, "query": "Generate BDD scenarios for US-001", "should_trigger": false, "reason": "Scenario generation — routes to living-doc-scenario-creator"},
 {"id": 14, "query": "Run a gap analysis on the living documentation", "should_trigger": false, "reason": "Gap analysis — routes to living-doc-gap-finder"},
 {"id": 15, "query": "How should I define the component behavior for the payment validator?", "should_trigger": true, "reason": "'define component behavior' trigger phrase"},
- {"id": 16, "query": "Write atomic acceptance criteria for the session expiry logic", "should_trigger": true, "reason": "'atomic acceptance criteria' trigger phrase"}
+ {"id": 16, "query": "Write atomic acceptance criteria for the session expiry logic", "should_trigger": true, "reason": "'atomic acceptance criteria' trigger phrase"},
+ {"id": 17, "query": "Should this behavior be tested with a unit test or an integration test?", "should_trigger": true, "reason": "'unit vs integration test' trigger phrase"},
+ {"id": 18, "query": "Help me choose test type for the loyalty points calculation — it calls no external services", "should_trigger": true, "reason": "'choose test type' trigger phrase"}
 ]

From ea9b7c8992da268d71d982b9de1367838ca57bfe Mon Sep 17 00:00:00 2001
From: miroslavpojer
Date: Fri, 22 May 2026 21:11:36 +0200
Subject: [PATCH 16/35] Refactor living documentation skills for improved clarity and functionality

- Updated SKILL.md for living-doc-create-functionality to clarify Functionality naming and acceptance criteria elicitation.
- Enhanced living-doc-create-user-story to streamline narrative elicitation and improve AC generation process.
- Revised living-doc-gap-finder to normalize script output and report gaps more effectively.
- Improved living-doc-impact-analysis to flag missing coverage and provide a re-test checklist.
- Enhanced living-doc-pageobject-scan to better handle fragile selectors and update PageObjects.
- Updated living-doc-scenario-creator to ensure accurate scenario generation and coverage reporting.
- Refined living-doc-update to maintain AC ID stability and improve documentation practices.
- Added gap-report.json to track documentation coverage and identify gaps in User Stories and Functionalities.
---
 skills/gherkin-living-doc-sync/SKILL.md | 44 +++--
 skills/gherkin-scenario/SKILL.md | 23 +++
 skills/gherkin-step/SKILL.md | 19 +-
 skills/living-doc-create-feature/SKILL.md | 38 ++--
 .../living-doc-create-functionality/SKILL.md | 134 +++++++------
 skills/living-doc-create-user-story/SKILL.md | 65 ++++---
 skills/living-doc-gap-finder/SKILL.md | 7 +-
 .../evals/files/gap-report.json | 176 ++++++++++++++++++
 skills/living-doc-impact-analysis/SKILL.md | 3 +
 skills/living-doc-pageobject-scan/SKILL.md | 12 +-
 skills/living-doc-scenario-creator/SKILL.md | 28 ++-
 skills/living-doc-update/SKILL.md | 19 +-
 12 files changed, 439 insertions(+), 129 deletions(-)
 create mode 100644 skills/living-doc-gap-finder/evals/files/gap-report.json

diff --git a/skills/gherkin-living-doc-sync/SKILL.md b/skills/gherkin-living-doc-sync/SKILL.md
index 2a867a0..9b84402 100644
--- a/skills/gherkin-living-doc-sync/SKILL.md
+++ b/skills/gherkin-living-doc-sync/SKILL.md
@@ -34,7 +34,7 @@ sync run.
 | New `.feature` file added | Feature file → living doc | Link each scenario to an AC; create AC if missing |
 | User Story AC modified or added | Living doc → feature file | Update or add the corresponding scenario |
 | UI refactored (selector / method renamed) | Step text → PageObject | Update step text; re-link to PageObject method |
-| US deprecated | Living doc → feature file | Mark linked scenarios as `@deprecated` or remove |
+| US deprecated | Living doc → feature file | Emit one sync action per linked scenario; add `@deprecated`, record the reason, and flag `@review-needed` |
 | Scenario added without an AC comment | Feature file → living doc | Propose an AC and add the `# AC:` header |

 ---
@@ -90,10 +90,11 @@ DRIFT DETECTED: checkout.feature:17
 Apply the minimum necessary change per action:

 - **Add missing AC link** → insert `# AC: (v) — ` above `Scenario:`
-- **Update stale AC description** → update comment text; do not change the AC ID
+- **Update stale AC description** → update comment text only; do not change the AC ID. Show the exact change as `OLD:` and `NEW:` lines. If the revised AC intent changed materially, flag the linked step text for review instead of restructuring the scenario in the same sync action.
 - **Update scenario to match revised AC** → update step text; keep the `# AC:` link unchanged
-- **Fix broken step text** → update the `.feature` file to match the step definition
-- **Mark deprecated scenarios** → add `@deprecated` tag and a comment with the reason
+- **Fix broken step text** → prefer updating the `.feature` file to match the existing step definition and PageObject method; only update the step definition regex when the business wording genuinely changed
+- **Mark deprecated scenarios** → add `@deprecated` and `@review-needed`, plus a comment with the date and reason. Emit one action per affected scenario with file and line number.
+- **Broken AC reference** → never silently remove the `# AC:` comment. Either relink it to the correct AC ID, or create the missing living doc entity with `living-doc-create-user-story` / `living-doc-create-functionality`, then update the link.
 - **AC split into multiple ACs** → update the existing scenario's `# AC:` link to the primary AC; create new scenarios for additional ACs

 Never delete a scenario during sync — flag it with `@review-needed` for developer decision.
@@ -102,16 +103,31 @@ Never delete a scenario during sync — flag it with `@review-needed` for develo

 ## Step 5 — Output sync report

+Do **not** apply sync changes automatically. Report `DRIFT DETECTED` blocks first (tests fail), then `SYNC ACTION` blocks (traceability), and ask the developer to confirm each action before editing files.
+
 ```
-SYNC REPORT — 2026-05-22
- Applied automatically (3):
- checkout.feature:14 — added AC link header AC:US-001-01
- checkout.feature:28 — updated AC description (AC:US-001-02)
- login.feature:7 — fixed step text drift → "When the user submits valid credentials"
-
- Requires manual review (1):
- checkout.feature:45 — Scenario "Apply promo and checkout" has no matching AC
- → Either create a new AC in US-001, or remove this scenario if it is obsolete
+DRIFT DETECTED: checkout.feature:17
+ Step: "When the customer clicks the Confirm Purchase button"
+ → No matching step definition found
+ → Previous match: "When the customer confirms the order" (checkout_steps.py:34)
+ → PageObject method: CheckoutPage.confirm_order()
+ → Recommended fix: update the feature file step text to match the existing step definition
+ OR update the step definition regex to match the new wording
+ → Apply change? (y/n)
+
+SYNC ACTION: checkout.feature:14
+ Scenario: "Customer successfully places an order"
+ → Missing AC link header
+ → Proposed link: # AC: US-001-01 (v1.0.0 – Active) — Customer places an order
+ → Apply change? (y/n)
+
+SYNC ACTION: checkout.feature:32
+ Scenario: "Customer reviews order totals before payment"
+ → Missing AC link header
+ → Proposed link: # AC: US-001-02 (v1.0.0 – Active) — Customer reviews the order summary before confirming payment
+ → Apply change? (y/n)
+
+Summary: 2 missing AC links, 1 step text drift detected — apply changes? (y/n per action)
 ```

 ---
@@ -134,4 +150,4 @@ SYNC REPORT — 2026-05-22
 | Writing new Gherkin scenarios from scratch | `gherkin-scenario` |
 | Implementing step definition code | `gherkin-step` |
 | Finding ACs with no scenario coverage | `living-doc-gap-finder` |
-| Creating new User Story or Feature entities | `living-doc-create-user-story` |
+| Creating new User Story, Feature, or Functionality entities | `living-doc-create-user-story` / `living-doc-create-functionality` |
diff --git a/skills/gherkin-scenario/SKILL.md b/skills/gherkin-scenario/SKILL.md
index a383b1b..be336b5 100644
--- a/skills/gherkin-scenario/SKILL.md
+++ b/skills/gherkin-scenario/SKILL.md
@@ -30,7 +30,14 @@ Scenario: Customer successfully places an order
 ...
 ```

+If the prompt already gives an AC ID and AC wording, copy that ID and wording into the comment;
+when no lifecycle marker is supplied, use `(v1.0.0 – Active)` as the default status text.
+
 If writing standalone scenarios (no User Story context), use `# AC: STANDALONE` as a placeholder.
+When asked what comment to use for exploratory work, answer with the placeholder, say that tutorial
+walkthroughs, exploratory probes, and other developer-authored scenarios without a User Story AC
+all qualify, and note that `gherkin-living-doc-sync` will report `STANDALONE` scenarios without
+flagging them as traceability gaps.
 Standalone scenarios are permitted when they live outside the project's dedicated living doc feature
 directory. Tutorial walkthroughs, exploratory probes, and any other developer-authored scenarios that
 don't map to a User Story AC all qualify — the decision is the developer's.
@@ -104,6 +111,11 @@ Scenario Outline: Discount is applied correctly for each membership tier
 | bronze | 95.00 |
 ```

+When illustrating discount calculations, show the resulting order total in the `Then` step or
+`Examples:` table rather than the raw discount percentage. If the prompt does not give an amount,
+default to a £100.00 order for comparison tables and to £200.00 for single-scenario threshold cases
+such as "orders over £100" so the discounted outcome is concrete.
+
 ---

 ## Use Background for shared preconditions
@@ -111,6 +123,10 @@ Scenario Outline: Discount is applied correctly for each membership tier
 Use `Background` when **every** scenario in the file shares the same precondition.
 Keep Background to 3 steps or fewer. If only 2–3 scenarios share a precondition,
 duplicate the `Given` step — prefer clarity over abstraction.
+When answering whether `Background` is appropriate, explicitly mention all three checks:
+shared-by-every-scenario, 3-steps-or-fewer, and duplicate the `Given` steps instead when only a
+subset of scenarios needs them. Keep `Background` to shared `Given` preconditions, not `When` or
+`Then` steps.

 ---

@@ -124,6 +140,9 @@ duplicate the `Given` step — prefer clarity over abstraction.
 | Assertions in Given/When | Violates keyword semantics | Move all assertions to `Then` |
 | Scenario depends on a previous scenario's state | Hidden ordering dependency | Each scenario must be fully self-contained |

+When reviewing an existing scenario, explicitly check for a missing `# AC:` comment immediately
+above each `Scenario:` or `Scenario Outline:` and call that out as a traceability defect.
+
 ---

 ## Output format for generated scenarios
@@ -141,3 +160,7 @@ and `Examples:` inside the block.
 | Implementing step definitions | **gherkin-step** |
 | Writing unit tests | **test-unit-write** |
 | Designing a test case table | **test-case-design** |
+
+If asked for step definition code, do not write it here. Redirect to **gherkin-step** and explain
+that this skill writes or reviews Gherkin scenario text, while **gherkin-step** implements the step
+binding code.
diff --git a/skills/gherkin-step/SKILL.md b/skills/gherkin-step/SKILL.md
index 09b3766..6ef4301 100644
--- a/skills/gherkin-step/SKILL.md
+++ b/skills/gherkin-step/SKILL.md
@@ -18,6 +18,14 @@ description: >

 > **Glossary:** Feature, PageObject, Functionality — see [living-doc-glossary](../references/living-doc-glossary.md) ([remote](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md)).

+## Respect the boundary with Gherkin text
+
+If the user asks to write or review a **Gherkin scenario / feature file**, do not draft the
+scenario here. Explain that this skill covers **step definition code** only, then route the user to
+`gherkin-scenario` for the Gherkin text itself.
+
+---
+
 ## Keep step definitions thin

 Step definitions are bindings — they translate Gherkin text into calls to PageObjects, domain
@@ -122,11 +130,20 @@ def step_receive_payload(context):
 | `before_all` / `BeforeAll` | Expensive one-time setup (start containers) | Per-test state |
 | `after_all` / `AfterAll` | Stop containers, close connections | Per-test cleanup |

-Tag hooks to scope them to specific scenarios:
+`before_scenario` runs before **every** scenario by default, so add a tag check when setup
+should only apply to a subset. When explaining this pattern, say explicitly that the hook still
+fires for every scenario; the `if "database" in context.tags` check only gates the expensive setup.
+
+Tag hooks to scope them to specific scenarios, and pair setup with matching cleanup:

 ```python
 @before_scenario
 def setup_database(context):
     if "database" in context.tags:
         context.db = create_test_db()
+
+@after_scenario
+def teardown_database(context):
+    if "database" in context.tags:
+        context.db.teardown()
 ```
diff --git a/skills/living-doc-create-feature/SKILL.md b/skills/living-doc-create-feature/SKILL.md
index 92267e3..164094a 100644
--- a/skills/living-doc-create-feature/SKILL.md
+++ b/skills/living-doc-create-feature/SKILL.md
@@ -23,7 +23,7 @@ compatibility: GitHub Copilot

 ## Step 1 — Identify the system surface

-Before asking, **scan the conversation context** for a surface name, surface type, and owning team already stated by the user. If all three are present, propose the Feature directly and ask for confirmation rather than re-asking the questions.
+Before asking, **scan the conversation context** for a surface name, surface type, and owning team already stated by the user. If the prompt already gives enough information to draft the entity, infer the obvious details and propose the Feature directly instead of blocking on follow-up questions. Ask only for what is still missing or ambiguous.

 Ask only for what is missing: *What system surface does this Feature represent?*

@@ -33,6 +33,12 @@ Select the surface type:
 |---|---|
 | `UI` | A web page, modal, or named screen (e.g. Checkout Page, Login Screen) |
 | `API` | A REST/GraphQL endpoint or endpoint group, including a backend service's public API contract (e.g. Orders API, Payment Gateway API) |
+| `Service` | A named backend/service surface with its own contract (e.g. Customer Profile Service) |
+| `Worker` | An asynchronous/background processor (e.g. Notification Worker) |
+| `Module` | A distinct internal module with a stable contract or bounded responsibility |
+| `Library` | A substantial shared internal library that is intentionally tracked as its own surface |
+
+Feature names should be **noun phrases** that name the surface. If it could plausibly be a PageObject or service/module class name (for example `PaymentPage`), it is usually a good Feature name.

 **One surface test abstraction ≈ one Feature** — a UI screen has a PageObject, an API endpoint group has an annotated endpoint method. See [living-doc-glossary](../references/living-doc-glossary.md) ([remote](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md)) for details.

@@ -74,33 +80,35 @@ FUNC entries), leave the array as `[]` and add a warning:

 ## Step 6 — Output canonical Feature entity

-> **ID assignment:** before assigning a `FEAT-nnn` ID, run
-> `python scripts/next_id.py --type FEAT --catalog catalog.json`
-> to get the next available ID and avoid collisions.
+Use a readable slug ID based on the business surface name: `FEAT-` (for example `FEAT-checkout`, `FEAT-orders-api`, `FEAT-notifications-centre`). For UI names ending in generic words like `Page`, `Screen`, or `Modal`, you may omit that trailing UI noun in the ID when the shorter slug stays unambiguous.
+
+Output the entity as a **single fenced `json` code block** whenever you have enough information to draft it. Keep any warnings or follow-up questions **outside** the code block. If the user gives a named surface but not all metadata, ask the missing questions and still include a starter draft in the same reply, using inferred purpose/surface type, `status: "planned"`, and `[]` for relationships that are still unknown. If the request explicitly asks to create the entity from the given details, emit the draft immediately.

-Output using the project's Storage Profile format. Canonical fields:
+Canonical JSON fields:

 | Field | Required | Value |
 |---|---|---|
-| entity type | Yes | `Feature` |
-| `id` | Yes | `FEAT-` (e.g. `FEAT-001`) |
+| `type` | Yes | `Feature` |
+| `id` | Yes | `FEAT-` |
 | `name` | Yes | Noun phrase (e.g. "Login Page") |
-| `surface_type` | Yes | `UI` \| `API` |
+| `surface_type` | Yes | `UI` \| `API` \| `Service` \| `Worker` \| `Module` \| `Library` |
 | `purpose` | Yes | One-to-two sentence description in business language |
-| `status` | Yes | `planned` \| `active` \| `deprecated` |
-| `user_stories` | Yes | List of `US-` IDs (can be `[]` for new Features) |
-| `functionalities` | Yes | List of `FUNC-` IDs (can be `[]` initially) |
+| `status` | Yes | `planned` \| `active` \| `candidate` \| `deprecated` |
+| `user_stories` | Yes | List of `US-<...>` IDs (use `[]` if unknown) |
+| `functionalities` | Yes | List of `FUNC-<...>` IDs (use `[]` if unknown or still only candidates) |
 | `owners` | Yes | Team name(s) |
-| `external_dependencies` | No | Names of services or systems this Feature calls |
+| `external_dependencies` | Yes | Names of services or systems this Feature calls |
+
+If `user_stories` is `[]`, repeat the orphan warning from Step 3 outside the JSON. If `functionalities` is `[]` because they are still just candidate notes, repeat the formal-definition warning from Step 4 outside the JSON.

 ## Anti-patterns to flag

 | Anti-pattern | Warning |
 |---|---|
 | Feature covers multiple unrelated screens | Split into one Feature per distinct screen |
-| Feature name is a verb (e.g. "Process Payment") | Feature names should be nouns — name the surface. Verb phrases describe *what the surface does*, which belongs in a Functionality entity (use **living-doc-create-functionality**) |
-| Feature has no User Stories and no Functionalities | Orphan Feature — link or delete |
-| Shared utility library documented as a Feature | A third-party dependency is not a system surface — document it as `external_dependency` in the Features that consume it. Internal module-level behaviors belong in Functionality entities under the API Feature that owns the service contract. |
+| Feature name is a verb (e.g. "Process Payment") | Feature names should be nouns — name the surface. Verb phrases describe *what the surface does*, which belongs in a Functionality entity (use **living-doc-create-functionality**). If it could be a PageObject or service/module class name, it is usually a better Feature name. |
+| Feature has no User Stories and no Functionalities | Orphan Feature — it contributes no traceable business value. Link at least one User Story, mark it as `candidate` if it is still exploratory, or delete it if it is no longer relevant. Orphan Features will be surfaced as gaps in living-doc-gap-finder reports. |
+| Shared utility library documented as a Feature | By default, a shared utility library is not a Feature — document it as an `external_dependency` on the consumer Features. Only create a standalone Feature when the library is substantial enough to be treated as a distinct shared surface; in that case use `surface_type: "Library"` and mark it as a shared internal dependency. Features should map 1:1 to distinct/deployable surfaces. |
 | Feature name encodes implementation technology (e.g. "React Login Component", "Spring Payment Controller") | Feature names describe the business surface, not the stack. Use "Login Screen" (UI) or "Payment API" (API) — technology choice is an implementation detail that changes without the surface changing. |
 | `surface_type` is `UI` for a backend REST controller or service | A REST endpoint group is an `API` surface. `UI` is reserved for screens a human interacts with directly. Misclassification breaks impact analysis routing between frontend and backend changes. |
 | Feature shares a name with an existing Feature | Check for duplicates before creating. Identical names indicate a merge candidate or a scope overlap — clarify the boundary before proceeding. |
diff --git a/skills/living-doc-create-functionality/SKILL.md b/skills/living-doc-create-functionality/SKILL.md
index caa96d2..d9df979 100644
--- a/skills/living-doc-create-functionality/SKILL.md
+++ b/skills/living-doc-create-functionality/SKILL.md
@@ -23,89 +23,107 @@ compatibility: GitHub Copilot

 ## Step 1 — Elicit the behavior

-Before asking, **scan the conversation context** for a behavior phrase and parent Feature already stated by the user. If both are present, form the Functionality name directly and ask for confirmation rather than re-asking the questions.
+Before asking, **scan the conversation context** for a behavior phrase and parent Feature already stated by the user. If the behavior is already clear, do not re-ask for it.

 Ask only for what is missing: *What is the atomic behavior to document?*

-Express as a **verb phrase** — a single, focused responsibility. The Functionality name follows
-the pattern: `` (e.g. "Login Page – Validate Password Strength").
+Express the Functionality `name` as a **verb phrase only** — one atomic responsibility, with no Feature prefix. Keep the owning Feature separate in `feature_id`.
 ```
-✅ "Calculate discount for a cart item given the customer's membership tier"
-✅ "Validate that an order quantity is within the allowed range"
-✅ "Raise a CartEmptyError when checkout is attempted on an empty cart"
+✅ "Validate cart contains at least one in-stock item"
+✅ "Apply gold member discount on qualifying orders"
+✅ "Deduct voucher discount before tax is calculated"
-❌ "Handle the checkout process" (too broad — split into multiple Functionalities)
-❌ "The payment page" (that is a Feature, not a Functionality)
+❌ "Handle checkout" (too broad — split into multiple Functionalities)
+❌ "The payment page" (that is a Feature, not a Functionality)
 ```

 ## Step 2 — Identify the parent Feature

-Ask: *Which Feature (system surface) owns this behavior?*
-
-A Functionality must belong to at least one Feature. If the Feature does not yet exist, suggest
-creating it with `living-doc-create-feature` first.
-
+Ask: *Which Feature (system surface) owns this behavior?* only if it is not already obvious from the prompt.
+A Functionality must belong to at least one Feature. If the user clearly names the surface or domain (for example checkout, basket, login, pricing), infer a provisional `feature_id` such as `FEAT-checkout` and proceed. If the Feature truly does not yet exist, suggest creating it with `living-doc-create-feature` first.
 ## Step 3 — Elicit Functionality-level Acceptance Criteria

 Functionality ACs describe atomic inputs → outputs. They are:
 - **Atomic**: one input condition, one output or side effect per AC
-- **Fast-testable**: designed for verification by unit or integration test. E2E tests *can* exercise the same behavior, but they are slow and expensive — they belong in a separate system-test tier, not the fast or regression suite.
-- **Unambiguous**: exact error codes, exact output values where relevant
+- **Fast-testable**: designed for verification by unit or integration test
+- **Unambiguous**: exact error codes, exact output values, exact rule outcomes where relevant

-Use the canonical AC format (see [living-doc-glossary](../references/living-doc-glossary.md) ([remote](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md))):
+Write **3-7 ACs** for one coherent behavior. If a Functionality needs around **12 ACs**, treat that as a strong sign it is not atomic and split it into 2-3 focused Functionalities.

-```
-AC:FUNC-- (v – Planned)
- –
- – : value1, value2, ... ← only when two or more values vary
- – Rationale: ← optional
-```
-
-**Completeness checklist — prompt for each:**
+**Completeness checklist — adapt each prompt to the domain before finalizing:**

 | Category | Prompt |
 |---|---|
-| Empty / null input | "What happens when the input is null or empty?" |
-| Boundary values | "What happens at the minimum and maximum allowed values?" |
-| Invalid type / format | "What error is raised for invalid format, and what is the error code?" |
-| Concurrent access | "Is there a race condition? Should this behavior be idempotent?" |
-| All error codes | "Are all error codes documented (not just the generic 'error occurred')?" |
+| Empty / null input | "What happens when the input is null, empty, or missing entirely?" |
+| Invalid members / states | "What happens when every item is invalid, only some items are valid, an item has an invalid state such as zero quantity or out-of-stock, or the actor is not eligible (for example a non-gold member)?" |
+| Boundary values | "What happens below the threshold, exactly on the threshold, and above it?" |
+| Rule interactions | "Does this combine with other rules, discounts, promo codes, or validations? If so, what is the stacking or precedence rule?" |
+| External dependency | "Does proving this behavior require a real DB / service read or write, or can it be verified as a pure function?" |
+| All error codes | "Are all error codes documented explicitly, not just 'error' or 'invalid'?" |
+
+Warn if only happy-path ACs are present.
+
+### Choosing `test_type`

-Warn if only happy-path ACs are present — same as for User Stories.
+- Use **`unit`** when the behavior can be verified in isolation as a pure calculation, validation rule, or deterministic transformation.
+- Use **`integration`** when correctness depends on a real database, uniqueness check, external service, persistence side effect, or cross-component interaction.
+- If the behavior could be refactored into a pure function that accepts all required inputs directly, prefer that design and then use **`unit`**.
+
+### When reviewing an existing Functionality
+
+Classify findings as **Blocker**, **Important**, or **Nit**.
+- **Blocker**: not atomic, vague ACs, or non-testable wording such as "works correctly".
+- **Important**: missing error codes, missing boundary conditions, or missing interaction rules.
+- **Nit**: wording cleanup that does not change the contract.
+
+For Blocker or Important findings, propose a split into smaller Functionalities where needed and show rewritten AC examples with exact `When` / `Then` outcomes and explicit error codes.

 ## Step 4 — Flag reuse candidates

-Before creating, check whether an identical behavior already exists under any Feature. **Compare ACs, not names** — the same verb phrase in a different Feature context often produces a legitimately different contract (e.g. "Validate Amount" on a Payment Feature vs. a Transfer Feature may enforce different limits and error codes and must remain separate).
+Before creating, check whether an identical behavior already exists under any Feature. **Compare ACs, not names** — the same verb phrase in a different Feature context often produces a legitimately different contract.
-If the ACs are identical or near-identical across two Features: +If the ACs are identical or near-identical across Features or User Stories, prefer **one shared Functionality**. Link every consuming User Story in the `user_stories` array instead of duplicating the ACs. -> "This behavior has the same contract as [FUNC-nnn] under [parent Feature]. Consider whether -> both are genuinely the same behavior in different contexts, or whether one can be reused. -> If the contracts are truly identical, consolidating avoids a maintenance burden — a contract -> change must otherwise be applied in every copy, increasing the risk of divergence." +> "This is a reuse candidate. If the contract is truly identical, keep one Functionality and link both User Stories to it. Duplicating the same AC in multiple places creates maintenance burden and raises the risk of divergence when the behavior changes." If contextually distinct despite similar names, create a new Functionality and note the related one for future reviewers. ## Step 5 — Output canonical Functionality entity -> **ID assignment:** before assigning a `FUNC-nnn` ID, run -> `python scripts/next_id.py --type FUNC --catalog catalog.json` -> to get the next available ID and avoid collisions. -> For AC IDs, use `--type AC --parent FUNC-` to get the next sequential AC number. - -Output using the project's Storage Profile format (defined per project — see `../../docs/guides/living-doc-copilot.md`). Canonical fields (see [living-doc-glossary](../references/living-doc-glossary.md) ([remote](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md)) for AC format details): +When creating a Functionality, output **one fenced `json` code block** and no extra prose inside the block. 
+ +Use this canonical shape: + +```json +{ + "type": "Functionality", + "id": "FUNC-", + "name": "", + "description": "", + "feature_id": "FEAT-", + "user_stories": ["US-"], + "acceptance_criteria": [ + "When , ", + "When , validation returns INVALID with code ", + "When , " + ], + "test_coverage": [ + {"ac": "AC-1", "test_type": "unit", "justification": "Pure validation rule"}, + {"ac": "AC-2", "test_type": "unit", "justification": "Pure validation rule"} + ], + "status": "planned" +} +``` -| Field | Required | Value | -|---|---|---| -| entity type | Yes | `Functionality` | -| `id` | Yes | `FUNC-` (e.g. `FUNC-001`) | -| `name` | Yes | `` (e.g. "Login Page – Validate Password Strength") | -| `parent_feature` | Yes | `FEAT-` ID of the owning Feature | -| `status` | Yes | `planned` \| `active` \| `deprecated` | -| `acceptance_criteria` | Yes | List of ACs in the format defined in [living-doc-glossary](../references/living-doc-glossary.md) ([remote](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md)) | +Rules: +- `id` uses the stable draft convention `FUNC-` when no catalog allocator is available in-session. +- `name` stays a verb phrase only. +- `description` and `acceptance_criteria` must stay in plain business language with **no implementation details**. +- Every acceptance criterion must state an exact outcome; error cases must include the explicit error code. +- `test_coverage` must cover every AC and record `unit` or `integration` consistently with Step 3. ## Distinguishing Functionality ACs from User Story ACs @@ -122,13 +140,15 @@ redirect to `living-doc-create-user-story`. | Anti-pattern | Warning | |---|---| -| Functionality name is a noun (e.g. "Password Validation") | Names must be verb phrases expressing the atomic behavior — e.g. "Validate Password Strength". A noun names a concept; a verb phrase names what the code does. | -| Functionality AC describes a full user journey (e.g. 
"User logs in and sees their dashboard") | That is a User Story AC — redirect to **living-doc-create-user-story**. Functionality ACs describe a single function's input → output or side effect. | -| Functionality has only happy-path ACs | Edge cases (null input, boundary values, error codes) are missing. Run through the completeness checklist in Step 3 before confirming. Untested error paths are the most common source of production incidents. | -| AC says "returns error" without specifying the type or code | Specify the error code using the canonical AC format: `– Raises {error code} when …` with `– Error code: CODE_VALUE`. Without a named code, the AC cannot be verified against a specific error contract. | -| AC uses `{placeholder}` for a single fixed value | Write the value inline. `{placeholder}` is only justified when two or more values vary across AC variants. | -| Two Functionalities have identical or near-identical ACs | Duplicate ACs create a maintenance burden. Consolidate into one shared Functionality owned by the appropriate parent Feature. | -| Functionality has no parent Feature | A Functionality without a parent Feature is untraceable — it cannot appear in impact analysis. Create or identify the parent Feature first. | +| Functionality name is a noun (e.g. "Password Validation") | Names must be verb phrases expressing the atomic behavior — e.g. "Validate Password Strength". | +| Functionality name is broad (e.g. "Handle checkout") | That is not atomic. Split it into smaller behaviors such as validation, pricing, payment authorization, or order submission. | +| Functionality AC describes a full user journey (e.g. "User logs in and sees their dashboard") | That is a User Story AC — redirect to **living-doc-create-user-story**. Functionality ACs describe a single behavior's input → output or side effect. | +| Functionality has only happy-path ACs | Edge cases (null input, boundary values, partial validity, error codes) are missing. 
Run through the completeness checklist in Step 3 before confirming. | +| AC says "returns error" without specifying the type or code | Specify the exact error code. Without a named code, the AC is not testable. | +| AC wording is vague (e.g. "works correctly", "handles it appropriately") | Rewrite with exact `When` / `Then` behavior and explicit outputs or error codes. | +| Functionality has more than 7 ACs | Review for non-atomic scope. Around 12 ACs is almost certainly too broad and should be split into 2-3 Functionalities. | +| Two Functionalities have identical or near-identical ACs | Duplicate ACs create a maintenance burden. Consolidate into one shared Functionality and link all related `user_stories`. | +| Functionality has no parent Feature | A Functionality without a parent Feature is untraceable — create or identify the parent Feature first. | ## Out-of-scope redirects diff --git a/skills/living-doc-create-user-story/SKILL.md b/skills/living-doc-create-user-story/SKILL.md index 396c7ca..3b6389a 100644 --- a/skills/living-doc-create-user-story/SKILL.md +++ b/skills/living-doc-create-user-story/SKILL.md @@ -23,9 +23,9 @@ compatibility: GitHub Copilot ## Step 1 — Elicit the narrative -Before asking, **scan the conversation context** for an actor, capability, or business outcome already stated by the user. If all three are present, form the narrative directly and ask for confirmation rather than re-asking the questions. +Before asking, **scan the conversation context** for an actor, capability, or business outcome already stated by the user. If the user clearly provides all three parts and asks for the final artifact now, form the narrative directly and proceed to output. Otherwise, walk through all three questions in order. When a detail is already implied, restate it as a proposed answer and ask the user to confirm or refine it rather than silently skipping the question. -Ask only for what is missing: +Ask these three questions explicitly: 1. 
**Who is the user?** — The actor using the system (a specific role, not "the user") 2. **What do they want to do?** — The capability or action in business terms @@ -64,15 +64,11 @@ Each AC must be: - **Binary** — clear pass/fail; no "should usually" or "typically" - **Single placeholder** — at most ONE `{placeholder}` per AC statement. If two aspects vary independently, write a separate AC for each. -Use `{placeholder}` syntax when a value varies, and list the concrete values immediately below: +Use `{placeholder}` syntax when a value varies, and list the concrete values immediately below. During elicitation, capture ACs in **Given / When / Then** language; in the final JSON, convert each accepted AC into a plain-language description with no Gherkin keywords. -``` -AC:US-- (v – Planned) - – - – : value1, value2, ... -``` +When reviewing an existing User Story, classify **only happy-path ACs present** as an **Important** gap. Name the missing cases in domain language and propose 2-3 extra Given / When / Then ACs. For password-reset stories, explicitly check for: unregistered email or phone, expired token or code, already-used token or code, wrong code, and retry limits. -See full AC format and examples in [living-doc-glossary](../references/living-doc-glossary.md) ([remote](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md)). +If the request is really for a single atomic rule or technical behavior rather than an end-to-end user outcome, say so explicitly: this is a **Functionality-level behavior**, not a User Story. Stop and redirect to `living-doc-create-functionality`. **Completeness check — always ask:** 1. What happens on the happy path? (at least one AC required) @@ -94,7 +90,6 @@ Warn if only happy-path ACs are present: > **ID assignment:** before assigning a `US-nnn` ID, run > `python scripts/next_id.py --type US --catalog catalog.json` > to get the next available ID and avoid collisions. 
-> For AC IDs, use `--type AC --parent US-` to get the next sequential AC number. Invariants that must hold before outputting: - At least one AC exists @@ -102,19 +97,40 @@ Invariants that must hold before outputting: - Status defaults to `planned` - No open `[TODO]` markers -Output the User Story using the project's Storage Profile format. Canonical fields: +When creating a new User Story, output **one fenced `json` code block** using this canonical shape: + +```json +{ + "type": "UserStory", + "id": "US-001", + "title": "Reset password via SMS", + "status": "planned", + "as_a": "registered customer", + "i_want": "reset my password via SMS", + "so_that": "I can regain access even when I cannot use email", + "features": ["FEAT-login"], + "acceptance_criteria": [ + { + "id": "US-001-AC-1", + "description": "A registered customer with a phone number on file can request a password reset code by SMS and sees confirmation that the code was sent." + }, + { + "id": "US-001-AC-2", + "description": "A customer who enters an unregistered phone number is told that the reset request cannot be completed." + }, + { + "id": "US-001-AC-3", + "description": "A customer who submits an expired or already-used reset code is told to request a new code." + } + ] +} +``` -| Field | Required | Value | -|---|---|---| -| entity type | Yes | `UserStory` | -| `id` | Yes | `US-` (e.g. `US-001`) | -| `name` | Yes | Short imperative title (e.g. 
"Customer Login") | -| `status` | Yes | `planned` — default for new entities | -| `as_a` | Yes | Named actor | -| `i_can` | Yes | The capability | -| `so_that` | Yes | Business outcome | -| `features` | Yes | List of `FEAT-` IDs | -| `acceptance_criteria` | Yes | List of ACs in the format defined in [living-doc-glossary](../references/living-doc-glossary.md) ([remote](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md)) | +Rules: +- Use `title` rather than `name` +- Use `as_a`, `i_want`, and `so_that` +- Every AC object must have `id` in `US--AC-` format and a plain-language `description` +- Keep Gherkin keywords out of JSON values ## Anti-patterns to flag @@ -123,8 +139,9 @@ Output the User Story using the project's Storage Profile format. Canonical fiel | AC says "the system saves to the database" | Technical implementation — restate as user outcome. Provide a rewritten AC: e.g. "When the customer confirms the order, then the order is acknowledged and the customer sees a confirmation message." | | AC says "unit test passes" | Test is not an AC — describe the behavior, not how it's verified | | Narrative says "As a system..." | System is not a user — name the human role | -| Same capability described for two different actors | Two actors = two separate User Stories. Different actors have different permissions, audit requirements, and AC sets. Mixing two actor perspectives in one User Story produces ambiguous ACs. Shared Functionalities (e.g. OTP generation, email delivery) can be linked to both User Stories. || User Story "I can" clause contains "and" | Multiple capabilities in one User Story — split at each “and”. Each capability has its own failure paths and may touch different Features; bundling them makes ACs ambiguous and traceability impossible. | +| Same capability described for two different actors | Two actors = two separate User Stories. 
Different actors have different permissions, audit requirements, and AC sets. Mixing two actor perspectives in one User Story produces ambiguous ACs. Shared Functionalities (e.g. OTP generation, email delivery) can be linked to both User Stories. | +| User Story "I want" clause contains "and" | Multiple capabilities in one User Story — split at each “and”. Each capability has its own failure paths and may touch different Features; bundling them makes ACs ambiguous and traceability impossible. | | AC uses `{placeholder}` for a single value | Placeholder syntax is only justified when two or more values vary. If only one value applies, write it inline. Example: instead of `{error type}: inline validation message`, write `an inline validation message is shown`. | | AC describes a non-observable outcome | e.g. “a background job processes the record” — the user cannot observe this. Restate as the observable signal (e.g. “the confirmation email arrives within 60 seconds”), or redirect the behavior to a Functionality entity if it is purely technical. | -| AC identifier is missing the version or state | AC format requires `AC:- (v)`. An AC without version or state cannot be traced across releases or marked as deprecated without rewriting its ID. | +| AC identifier does not follow `US--AC-` | Every acceptance criterion in the JSON output needs a stable `US--AC-` id so it can be referenced unambiguously. | | AC behavior already documented in another User Story | Duplicate ACs create a maintenance burden — any change must be applied in every copy. Extract the shared behavior into a Functionality entity and link both User Stories to it. 
| \ No newline at end of file diff --git a/skills/living-doc-gap-finder/SKILL.md b/skills/living-doc-gap-finder/SKILL.md index 8a9aaf4..fb5c826 100644 --- a/skills/living-doc-gap-finder/SKILL.md +++ b/skills/living-doc-gap-finder/SKILL.md @@ -40,6 +40,11 @@ Run the script first, then use its output to drive the Prioritise and Propose st The Workflow section describes the logic the script encodes — read it for understanding, but delegate the computation to the script rather than reproducing it through reasoning. +Before presenting the final report, normalise the script output against the taxonomy in this skill: +- Gap type 1 applies to **both User Story ACs and Functionality ACs**. If a Functionality has ACs and no linked tests, report those ACs as `UNTESTED_AC` **Blockers** (you may summarise as `FUNC-xyz has N ACs with no linked tests`) and do **not** leave the same root cause only as `UNDOCUMENTED_FUNCTIONALITY`. +- Report documentation coverage **separately** for User Story ACs and Functionality ACs, even if the raw script output gives a combined number. +- For Gap type 2, treat a discovered screen/API as already documented when an existing Feature clearly owns the same surface by path, name, or domain meaning (for example `/account/orders` ↔ `Account Dashboard`, `/reports/legacy` ↔ `Legacy Report Screen`). Only raise `UNDOCUMENTED_SURFACE` when no plausible owning Feature exists. + --- ## Usage modes @@ -253,7 +258,7 @@ have at least one covered AC. Once the baseline is met, continue by tackling the biggest remaining gaps first: 1. **Rank gap clusters by count** — group all remaining gaps by type and sort descending by number of affected entities. -2. **Work the largest cluster first** — a cluster of 20 UNTESTED_AC gaps in one domain has more impact than 5 scattered ORPHAN_TEST gaps. +2. 
**Start with the highest-risk domain first** — payment, auth, security, or other release-critical areas take priority over lower-risk domains, even before broad legacy clean-up. 3. **Batch by domain** — within a cluster, process one Feature or service at a time. 4. **Iterate** — after each batch, re-run gap-finder on that domain before moving to the next. diff --git a/skills/living-doc-gap-finder/evals/files/gap-report.json b/skills/living-doc-gap-finder/evals/files/gap-report.json new file mode 100644 index 0000000..c7df25f --- /dev/null +++ b/skills/living-doc-gap-finder/evals/files/gap-report.json @@ -0,0 +1,176 @@ +{ + "generated_at": "2026-05-22T19:03:52.630632+00:00", + "documentation_coverage": { + "total_acs": 9, + "covered_acs": 1, + "coverage_percentage": 11.1 + }, + "summary": { + "total_gaps": 20, + "blockers": 8, + "important": 9, + "nits": 3 + }, + "gaps": [ + { + "id": "GAP-001", + "type": "UNTESTED_AC", + "severity": "Blocker", + "entity": "US-001-AC-2", + "description": "AC 'US-001-AC-2' has no linked test", + "proposed_action": "Generate a BDD scenario using living-doc-scenario-creator, or add a unit/integration test and link it to this AC" + }, + { + "id": "GAP-002", + "type": "UNTESTED_AC", + "severity": "Blocker", + "entity": "US-001-AC-3", + "description": "AC 'US-001-AC-3' has no linked test", + "proposed_action": "Generate a BDD scenario using living-doc-scenario-creator, or add a unit/integration test and link it to this AC" + }, + { + "id": "GAP-003", + "type": "UNTESTED_AC", + "severity": "Blocker", + "entity": "US-002-AC-1", + "description": "AC 'US-002-AC-1' has no linked test", + "proposed_action": "Generate a BDD scenario using living-doc-scenario-creator, or add a unit/integration test and link it to this AC" + }, + { + "id": "GAP-004", + "type": "UNTESTED_AC", + "severity": "Blocker", + "entity": "US-002-AC-2", + "description": "AC 'US-002-AC-2' has no linked test", + "proposed_action": "Generate a BDD scenario using 
living-doc-scenario-creator, or add a unit/integration test and link it to this AC" + }, + { + "id": "GAP-005", + "type": "UNTESTED_AC", + "severity": "Blocker", + "entity": "US-007-AC-1", + "description": "AC 'US-007-AC-1' has no linked test", + "proposed_action": "Generate a BDD scenario using living-doc-scenario-creator, or add a unit/integration test and link it to this AC" + }, + { + "id": "GAP-006", + "type": "UNTESTED_AC", + "severity": "Blocker", + "entity": "US-007-AC-2", + "description": "AC 'US-007-AC-2' has no linked test", + "proposed_action": "Generate a BDD scenario using living-doc-scenario-creator, or add a unit/integration test and link it to this AC" + }, + { + "id": "GAP-007", + "type": "UNTESTED_AC", + "severity": "Blocker", + "entity": "US-007-AC-3", + "description": "AC 'US-007-AC-3' has no linked test", + "proposed_action": "Generate a BDD scenario using living-doc-scenario-creator, or add a unit/integration test and link it to this AC" + }, + { + "id": "GAP-008", + "type": "UNTESTED_AC", + "severity": "Blocker", + "entity": "US-007-AC-4", + "description": "AC 'US-007-AC-4' has no linked test", + "proposed_action": "Generate a BDD scenario using living-doc-scenario-creator, or add a unit/integration test and link it to this AC" + }, + { + "id": "GAP-009", + "type": "UNDOCUMENTED_SURFACE", + "severity": "Important", + "entity": "/account/orders", + "description": "Surface '/account/orders' exists in the application with no Feature entity", + "proposed_action": "Create a Feature entity using living-doc-create-feature" + }, + { + "id": "GAP-010", + "type": "UNDOCUMENTED_SURFACE", + "severity": "Important", + "entity": "/account/preferences", + "description": "Surface '/account/preferences' exists in the application with no Feature entity", + "proposed_action": "Create a Feature entity using living-doc-create-feature" + }, + { + "id": "GAP-011", + "type": "UNDOCUMENTED_SURFACE", + "severity": "Important", + "entity": "/reports/legacy", + 
"description": "Surface '/reports/legacy' exists in the application with no Feature entity", + "proposed_action": "Create a Feature entity using living-doc-create-feature" + }, + { + "id": "GAP-013", + "type": "ORPHAN_FEATURE", + "severity": "Important", + "entity": "FEAT-orphan", + "description": "Feature 'FEAT-orphan' (Legacy Report Screen) has no linked User Stories", + "proposed_action": "Link to an existing User Story or confirm with the product owner whether to deprecate" + }, + { + "id": "GAP-012", + "type": "ORPHAN_FEATURE", + "severity": "Important", + "entity": "FEAT-promo", + "description": "Feature 'FEAT-promo' (Promotions Module) has no linked User Stories", + "proposed_action": "Link to an existing User Story or confirm with the product owner whether to deprecate" + }, + { + "id": "GAP-014", + "type": "ORPHAN_USER_STORY", + "severity": "Important", + "entity": "US-007", + "description": "User Story 'US-007' (Apply a promotional discount) has no linked Feature", + "proposed_action": "Link to an existing Feature or create the missing Feature using living-doc-create-feature" + }, + { + "id": "GAP-017", + "type": "ORPHAN_TEST", + "severity": "Important", + "entity": "View paginated order history", + "description": "Test 'View paginated order history' has no linked AC", + "proposed_action": "Link to an existing AC, or create a Functionality for the behavior using living-doc-create-functionality. Never delete the test to resolve the gap." + }, + { + "id": "GAP-016", + "type": "ORPHAN_TEST", + "severity": "Important", + "entity": "test_login_flow.feature", + "description": "Test 'test_login_flow.feature' has no linked AC", + "proposed_action": "Link to an existing AC, or create a Functionality for the behavior using living-doc-create-functionality. Never delete the test to resolve the gap." 
+ }, + { + "id": "GAP-015", + "type": "ORPHAN_TEST", + "severity": "Important", + "entity": "test_order_history.py", + "description": "Test 'test_order_history.py' has no linked AC", + "proposed_action": "Link to an existing AC, or create a Functionality for the behavior using living-doc-create-functionality. Never delete the test to resolve the gap." + }, + { + "id": "GAP-018", + "type": "UNDOCUMENTED_FUNCTIONALITY", + "severity": "Nit", + "entity": "FUNC-apply-discount", + "description": "Functionality 'FUNC-apply-discount' has 5 AC(s) with no linked tests", + "proposed_action": "Create unit or integration tests for this Functionality's ACs and link them" + }, + { + "id": "GAP-019", + "type": "EMPTY_FEATURE", + "severity": "Nit", + "entity": "FEAT-account", + "description": "Feature 'FEAT-account' (Account Dashboard) has no Functionalities defined", + "proposed_action": "Create Functionalities for known behaviors using living-doc-create-functionality" + }, + { + "id": "GAP-020", + "type": "EMPTY_FEATURE", + "severity": "Nit", + "entity": "FEAT-orphan", + "description": "Feature 'FEAT-orphan' (Legacy Report Screen) has no Functionalities defined", + "proposed_action": "Create Functionalities for known behaviors using living-doc-create-functionality" + } + ] +} \ No newline at end of file diff --git a/skills/living-doc-impact-analysis/SKILL.md b/skills/living-doc-impact-analysis/SKILL.md index d6c595a..220fcc8 100644 --- a/skills/living-doc-impact-analysis/SKILL.md +++ b/skills/living-doc-impact-analysis/SKILL.md @@ -64,6 +64,7 @@ Start from the code change (PR diff, renamed module, deleted endpoint): 3. For each changed module, identify the corresponding Feature by traversing entity relationships: - Which Feature owns this module? (check the Feature's `functionalities` links or ask the owning team) - Which Functionalities does this module implement? 
+ - If the module has no matching `feature_registry` entry, treat it as missing living doc coverage for impact-analysis purposes: flag a **High-impact gap**, recommend `living-doc-create-functionality`, and note that the registry mapping must be added. ## Step 2 — Trace to living doc entities @@ -128,6 +129,8 @@ IMPACT MAP — PR #217: "Refactor promo validation to support stacked discounts" → Invoke test-e2e-standards ``` +If the request is framed as **"what needs re-testing"**, present Step 4 as a compact **re-test checklist**: group by Feature / Functionality / User Story, list the affected ACs, and then list every linked Gherkin scenario that must be re-run. + ## Step 5 — Release sign-off checklist Before a release, confirm that all High-impact entities have been addressed: diff --git a/skills/living-doc-pageobject-scan/SKILL.md b/skills/living-doc-pageobject-scan/SKILL.md index b4e7ee0..5ac41a7 100644 --- a/skills/living-doc-pageobject-scan/SKILL.md +++ b/skills/living-doc-pageobject-scan/SKILL.md @@ -135,6 +135,10 @@ Flag fragile selectors: > "Element `` has a positional CSS selector. Please add: > `data-testid=''` — e.g. `data-testid='confirm-order-btn'`" +Still include the current selector in the generated PageObject so test authoring is not blocked, but +annotate that selector constant with a `FRAGILE` comment and repeat the warning in the scan / breaking +change report. + **4. Map PageObjects to Feature entities** One PageObject ≈ one `UI` Feature. 
For each generated PageObject: @@ -172,13 +176,16 @@ def get_cart_item_by_sku(self, sku: str): For each selector in the existing PageObject, check if it still resolves: - **Present and unchanged**: no action -- **Present but changed**: update selector; log as `UPDATED` +- **Present but changed**: update selector; log as `UPDATED`; if the replacement selector is evident + (for example a renamed `data-testid`), report the exact new selector in the action required line - **Missing**: flag as `BREAKING CHANGE` — linked test steps may fail **2. Detect new elements** → propose additions. **3. Update PageObject files** — modify selector constants only. Preserve existing action and -assertion method logic. Never auto-delete methods — flag removals for developer review. +assertion method logic. Never auto-delete methods — flag removals for developer review. For missing +selectors, keep the selector constant and annotate it with a `BREAKING` comment so developers can +review whether the element was removed or renamed. **4. Breaking change report:** @@ -214,3 +221,4 @@ files before running a full rescan. |---|---| | Generate BDD scenarios for a User Story | `living-doc-scenario-creator` | | Create a User Story for this screen | `living-doc-create-user-story` | +| Document an API endpoint or REST surface | `living-doc-create-functionality` | diff --git a/skills/living-doc-scenario-creator/SKILL.md b/skills/living-doc-scenario-creator/SKILL.md index 726fa7f..ad9d7c7 100644 --- a/skills/living-doc-scenario-creator/SKILL.md +++ b/skills/living-doc-scenario-creator/SKILL.md @@ -54,19 +54,26 @@ If PageObjects or step files are not available, generate scenarios with stub ste Load the User Story. 
Confirm: - ID follows `US-` format -- At least one AC exists with state `Active` or `Implemented` +- Which ACs are eligible for generation (`Active` or `Implemented`) - ACs are atomic — each has one input condition and one observable outcome +Treat requests such as “write feature tests for US-007” as requests to generate BDD scenarios plus a coverage table for that User Story. + +If no ACs are `Active` or `Implemented`, do **not** generate empty or stub scenarios. Instead, +output a coverage report that lists every AC with its state-specific skip reason (`Planned` → +`skipped — not yet active`, `Deprecated` → `skipped — deprecated AC`) and advise the user to +re-run the scenario creator when an AC becomes `Active` or `Implemented`. + ### Step 2 — Map each AC to a scenario For each active AC, select the scenario pattern by AC type: - `happy_path` → `Scenario:` or `Scenario Outline:` (if data-driven) -- `error` → `Scenario: ` +- `error` → `Scenario: `. If the AC text already gives a crisp business-facing failure title (for example, `Order rejected when payment card is declined`), prefer that exact title instead of mechanically prefixing the User Story title. - `alternative` → `Scenario: ` Generate a scenario for **every** active AC. -Map Given-When-Then from the AC to existing step definitions — reuse exact step text where found. +Map Given-When-Then from the AC to existing step definitions — reuse exact step text where found. Keep all step text in domain/business language only; never mention HTTP, APIs, selectors, DOM details, databases, or other implementation mechanics. ```gherkin # AC: US-001-01 (v1.0.0 – Active) — Customer places an order with a saved payment method @@ -121,7 +128,12 @@ MISSING STEP + MISSING PAGEOBJECT METHOD: ### Step 4 — Validate AC coverage -Every active AC must map to at least one scenario. +Every `Active` or `Implemented` AC must map to at least one scenario. 
+The coverage report must list **every** AC on the User Story, including skipped ones. +Use these skip reasons verbatim so the output is predictable and auditable: +- `Planned` → `skipped — not yet active` +- `Deprecated` → `skipped — deprecated AC` + Run `scripts/coverage_report.py ` for a full catalog report. ``` @@ -129,16 +141,18 @@ AC COVERAGE REPORT — US-001 AC:US-001-01 (Active): ✅ covered by "Customer successfully places an order" AC:US-001-02 (Active): ✅ covered by "Order rejected when payment card is declined" AC:US-001-03 (Active): ❌ NOT COVERED — added to gap list - AC:US-001-04 (Deprecated): ⏭ skipped — deprecated AC + AC:US-001-04 (Planned): ⏭ skipped — not yet active + AC:US-001-05 (Deprecated): ⏭ skipped — deprecated AC ``` Use `scripts/coverage_report.py` to generate this report across the full catalog. ### Step 5 — Output artifacts -**`.feature` file** — one per User Story, named `-.feature`: +**`.feature` file** — one per User Story, named `us--.feature` in lowercase. When showing the generated output, include the filename in a comment or header inside the gherkin block: ```gherkin +# us-001-place-an-online-order.feature Feature: Place an online order As a registered customer I can place an order for in-stock items @@ -155,7 +169,7 @@ Feature: Place an online order **Missing step report** — generated stub implementations grouped by step file; Case B stubs include `NotImplementedError` failure guards and flag missing PageObject methods for extension (see Step 3). -**Coverage table** — ACs with coverage status (use `scripts/coverage_report.py`). +**Coverage table** — ACs with coverage status (use `scripts/coverage_report.py`). Append it immediately after the `.feature` code block in the response. 
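The naming and skip-reason rules above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the behavior of `scripts/coverage_report.py`: `feature_filename` and `coverage_rows` are hypothetical helpers, and the row layout is simplified from the report shown in Step 4.

```python
import re

# Skip reasons copied verbatim from Step 4 so the output stays predictable.
SKIP_REASONS = {
    "Planned": "skipped — not yet active",
    "Deprecated": "skipped — deprecated AC",
}


def feature_filename(story_id: str, title: str) -> str:
    """Build a lowercase file name from the User Story ID and title."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return f"{story_id.lower()}-{slug}.feature"


def coverage_rows(acs: list[tuple[str, str]], scenarios_by_ac: dict[str, str]) -> list[str]:
    """Return one row per AC, including skipped ones.

    acs is a list of (ac_id, state); scenarios_by_ac maps an AC ID to the
    title of the scenario that covers it (both shapes are assumptions).
    """
    rows = []
    for ac_id, state in acs:
        if state in SKIP_REASONS:
            status = SKIP_REASONS[state]
        elif ac_id in scenarios_by_ac:
            status = f'covered by "{scenarios_by_ac[ac_id]}"'
        else:
            status = "NOT COVERED"
        rows.append(f"{ac_id} ({state}): {status}")
    return rows


rows = coverage_rows(
    [
        ("AC:US-001-01", "Active"),
        ("AC:US-001-03", "Active"),
        ("AC:US-001-04", "Planned"),
        ("AC:US-001-05", "Deprecated"),
    ],
    {"AC:US-001-01": "Customer successfully places an order"},
)
```

Keeping the skip reasons in one mapping means a `Planned` or `Deprecated` AC can never be reported as a coverage gap by accident.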
--- diff --git a/skills/living-doc-update/SKILL.md b/skills/living-doc-update/SKILL.md index 3c50a22..164dda6 100644 --- a/skills/living-doc-update/SKILL.md +++ b/skills/living-doc-update/SKILL.md @@ -40,8 +40,10 @@ Ask: *Which entity is being updated, and what kind of change is this?* When adding a new AC to an existing User Story: 1. Load the existing User Story entity -2. Assign the next sequential AC ID: `AC:US--` -3. Elicit the new AC using the same completeness checklist as `living-doc-create-user-story`: +2. Assign the next sequential AC ID (for example `US-042-AC-4`; preserve an existing project + prefix such as `AC:US-042-04` if the catalog already uses it) +3. Elicit the new AC using the same completeness checklist as `living-doc-create-user-story` and + capture it in `description`, `given`, `when`, `then` form: - Happy path covered? - Error paths covered? - Alternative flows covered? @@ -49,9 +51,9 @@ When adding a new AC to an existing User Story: `gherkin-living-doc-sync` if so When modifying an existing AC **keep the AC ID stable** — changing the ID breaks traceability -to linked tests and Gherkin scenarios. Only change the description text or state. If the changed -AC text affects the wording of linked Gherkin steps, flag the linked scenarios for -`gherkin-living-doc-sync`. +to linked tests and Gherkin scenarios. Only update the `description`, `given`, `when`, `then`, or +state fields. If the changed AC text affects the wording of linked Gherkin steps, flag the linked +scenarios for `gherkin-living-doc-sync`. 
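A minimal sketch of the "next sequential AC ID" rule, assuming AC IDs are available as plain strings. `next_ac_id` is a hypothetical helper written for illustration; it is not the repository's `scripts/next_id.py`, and the two ID patterns are inferred only from the examples `US-042-AC-4` and `AC:US-042-04`.

```python
import re


def next_ac_id(story_id: str, existing_ids: list[str]) -> str:
    """Return the next AC ID for a User Story without renumbering existing ACs.

    If the catalog already uses the prefixed style (AC:US-042-04), keep that
    style and its zero padding; otherwise use the US-042-AC-4 style.
    """
    prefixed = re.compile(rf"^AC:{re.escape(story_id)}-(\d+)$")
    plain = re.compile(rf"^{re.escape(story_id)}-AC-(\d+)$")

    prefixed_hits = [m.group(1) for ac in existing_ids if (m := prefixed.match(ac))]
    plain_hits = [m.group(1) for ac in existing_ids if (m := plain.match(ac))]

    if prefixed_hits:
        width = max(len(n) for n in prefixed_hits)  # preserve zero padding
        nxt = max(int(n) for n in prefixed_hits) + 1
        return f"AC:{story_id}-{nxt:0{width}d}"

    nxt = max((int(n) for n in plain_hits), default=0) + 1
    return f"{story_id}-AC-{nxt}"
```

The helper only appends a new number. Existing IDs are never rewritten, which matches the rule that a modified AC keeps its ID so linked tests and Gherkin scenarios stay traceable.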
## Promote a User Story from planned to active @@ -78,10 +80,12 @@ Set the relevant fields in the project's Storage Profile format: | `status` | `deprecated` | | `deprecated_at` | Date of deprecation | | `deprecation_reason` | Why it was deprecated | +| `deprecated_code_commit` | Commit SHA or URL that removed the backing code (if applicable) | | `superseded_by` | ID of the replacement entity (if applicable) | Rules: - Always deprecate — never delete entities (preserves audit trail) +- Add `deprecated_code_commit` when the code was removed in a commit - Add `superseded_by` when a replacement entity exists - Flag any Gherkin scenarios linked to the deprecated entity for `gherkin-living-doc-sync` @@ -94,13 +98,12 @@ When a team changes ownership of a Feature, update the `owners` field and set `o When an AC is moved out of the current sprint but not permanently removed: -- Add `descoped_at` (date) and `descoped_reason` fields — **do not delete the AC** (preserves audit trail) -- The AC's official lifecycle state remains `Planned` (still required, just deferred) +- Set `status: descoped` and add `descoped_at` (date) and `descoped_reason` fields — **do not delete the AC** (preserves audit trail) - Add `future_release` field if the work is planned for a later sprint - Flag any linked Gherkin scenarios for `@wip` or `@pending` tagging via `gherkin-living-doc-sync` ``` -AC:US-042-03 (v1.2.0 – Planned) +AC:US-042-03 (v1.2.0 – descoped) – Promo codes can be stacked and applied in defined priority order. 
– descoped_at: 2026-05-15 – descoped_reason: Promo stacking rule deferred — too complex for current sprint From d13a22f988ab774f41425b6e083c3aee8053485a Mon Sep 17 00:00:00 2001 From: miroslavpojer Date: Fri, 22 May 2026 21:34:21 +0200 Subject: [PATCH 17/35] feat: add evals and trigger evaluations for living-doc and living-doc-bdd-copilot agents --- .../evals/living-doc-bdd-copilot/evals.json | 119 ++++++++++++++++++ .../living-doc-bdd-copilot/trigger-eval.json | 22 ++++ .../evals/living-doc-copilot/evals.json | 113 +++++++++++++++++ .../living-doc-copilot/trigger-eval.json | 18 +++ .../agents/living-doc-bdd-copilot.agent.md | 25 ++-- .github/agents/living-doc-copilot.agent.md | 28 +++-- docs/testing/agent-testing.md | 25 ++-- 7 files changed, 315 insertions(+), 35 deletions(-) create mode 100644 .github/agents/evals/living-doc-bdd-copilot/evals.json create mode 100644 .github/agents/evals/living-doc-bdd-copilot/trigger-eval.json create mode 100644 .github/agents/evals/living-doc-copilot/evals.json create mode 100644 .github/agents/evals/living-doc-copilot/trigger-eval.json diff --git a/.github/agents/evals/living-doc-bdd-copilot/evals.json b/.github/agents/evals/living-doc-bdd-copilot/evals.json new file mode 100644 index 0000000..bb0d50c --- /dev/null +++ b/.github/agents/evals/living-doc-bdd-copilot/evals.json @@ -0,0 +1,119 @@ +{ + "agent_name": "living-doc-bdd-copilot", + "evals": [ + { + "id": 1, + "category": "happy-path", + "prompt": "Start a BDD session for a new project. The living doc catalog is in docs/living-doc/. The app runs at https://app.example.com. Login is required.", + "expected_output": "Agent assembles the Business Seed file at the discovered or default location. Sources: (A) loads Feature names and AC texts from docs/living-doc/; (B) looks for route config (Angular router, React Router, sitemap.xml); (D) checks for an existing manifest.json. 
Creates seed.yaml with: base_url, credentials using env: references (never literal values), known_routes from catalog Features, and an empty guided_steps list. Proposes adding BDD artifact paths to .github/copilot-instructions.md for future sessions.", + "files": [], + "expectations": [ + "Creates seed.yaml with base_url, credentials as env: references, and known_routes", + "Never stores literal credentials — always env:VAR_NAME", + "Loads Feature names and routes from the living doc catalog (Source A)", + "Checks for existing manifest.json before starting (Source D)", + "Proposes adding artifact paths to .github/copilot-instructions.md" + ] + }, + { + "id": 2, + "category": "happy-path", + "prompt": "Crawl the webapp and generate a PageObject for the checkout screen at /checkout.", + "expected_output": "Agent navigates to /checkout via MCP Playwright. Takes a snapshot and identifies interactive elements: promo code input, confirm order button, error banner. Generates CheckoutPage with: file-level living-doc: FEAT- | /checkout comment, ALL_CAPS selector constants using data-testid preference, __init__ or constructor taking a page parameter, and method stubs for each interactive element. Adds the surface to manifest.json. If no matching Feature entity exists, hands off to @living-doc-copilot to create FEAT-. Flags any element using positional CSS selectors as fragile.", + "files": [], + "expectations": [ + "Uses MCP Playwright to navigate and snapshot the page", + "Generates CheckoutPage with data-testid selector preference", + "File-level living-doc: FEAT-nnn | /checkout comment", + "Adds entry to manifest.json", + "Hands off to @living-doc-copilot for missing Feature entities", + "Flags positional CSS selectors as fragile" + ] + }, + { + "id": 3, + "category": "happy-path", + "prompt": "Generate Gherkin scenarios for US-007 — Place an Online Order. ACs: (1) Active — happy path: customer places order with saved payment. 
(2) Active — error: order rejected when card declined.", + "expected_output": "Agent generates a .feature file named us-007-place-an-online-order.feature. Feature header uses the As-a/I-can/so-that narrative from US-007. Two scenarios generated — one per Active AC. Each Scenario: is immediately preceded by a '# AC: US-007-0n (v1.0.0 – Active) — ...' traceability tag. Step text uses domain language (no HTTP calls, selectors, or DB). For steps without a matching step definition, generates stubs: Case A (PageObject method exists) = full stub; Case B (no PageObject method) = stub with NotImplementedError and flag to extend the PageObject.", + "files": [], + "expectations": [ + "Feature file named us-007-place-an-online-order.feature", + "Each Scenario: immediately preceded by a # AC: traceability comment", + "Skips Planned and Deprecated ACs — only Active ACs drive generation", + "Step text in domain language — no implementation details", + "Case A stubs delegate to PageObject methods", + "Case B stubs raise NotImplementedError and flag the missing PageObject method" + ] + }, + { + "id": 4, + "category": "happy-path", + "prompt": "RE-SCAN mode — the checkout screen had a UI update. Re-validate all manifest entries and discover any new routes.", + "expected_output": "Agent reloads seed.yaml and manifest.json. For every existing manifest entry: navigates to the URL, snapshots the DOM, and checks each recorded component_id selector. Selectors that no longer resolve are flagged as BREAKING CHANGE. On each visited page, agent also actively discovers new routes — follows links, clicks navigation-suggesting buttons, and checks tab panels and side-nav items not yet in the manifest. New surfaces are added to manifest.json. Removed surfaces are marked as deprecated. 
Stale PageObject selectors are updated.", + "files": [], + "expectations": [ + "Reloads both seed.yaml and manifest.json before starting", + "Validates every manifest entry's component_id selectors against the current DOM", + "Flags broken selectors as BREAKING CHANGE with linked step definition details", + "Actively discovers new routes beyond existing manifest entries", + "Adds new surfaces to manifest.json; marks removed ones as deprecated", + "Updates stale selector constants in PageObjects" + ] + }, + { + "id": 5, + "category": "regression", + "prompt": "HEALING mode — the checkout test 'Scenario: Customer successfully places an order' is failing. The confirm button selector is broken.", + "expected_output": "Agent enters HEALING mode — scope is limited to the failing test only. Traces the failure to CheckoutPage.CONFIRM_BUTTON and the step 'When the customer confirms the order'. Navigates to /checkout via MCP Playwright, snapshots the DOM, and finds the updated selector. Updates CONFIRM_BUTTON in CheckoutPage only. Verifies the linked step definition binding still resolves. Re-runs only the previously failing test to confirm healing. Does not touch passing tests or unrelated PageObjects.", + "files": [], + "expectations": [ + "Scope limited to the failing test — does not touch passing tests", + "Navigates to the affected page via MCP Playwright and snapshots DOM", + "Updates the broken selector in CheckoutPage only", + "Verifies the step definition binding is intact", + "Re-runs only the failing test to confirm healing" + ] + }, + { + "id": 6, + "category": "negative", + "prompt": "Create a User Story for the guest checkout capability.", + "expected_output": "Creating living doc catalog entities is out of scope for this agent — hands off to @living-doc-copilot. 
@living-doc-bdd-copilot owns the automation layer (PageObjects, scenarios, step definitions); @living-doc-copilot owns the catalog layer (User Stories, Features, Functionalities).", + "files": [], + "expectations": [ + "Does not create a User Story", + "Routes to @living-doc-copilot", + "Explains the catalog vs automation layer boundary" + ] + }, + { + "id": 7, + "category": "paraphrase", + "prompt": "Our BDD tests are all failing after the UI redesign — buttons and inputs have new IDs.", + "expected_output": "Agent recognises this as a HEALING mode request. Asks for the list of failing test names or scenario titles. Scopes repair to only those failing scenarios. For each failing scenario: traces to the affected PageObject, navigates to the screen via MCP Playwright, snapshots DOM to find updated selectors, and updates affected PageObject constants. Verifies step definition bindings. Re-runs only the previously failing tests. Does not touch unrelated PageObjects.", + "files": [], + "expectations": [ + "Identifies this as HEALING mode", + "Asks for the failing test list to scope the repair", + "Traces each failure to its PageObject and step definition", + "Uses MCP Playwright to discover updated selectors", + "Re-runs only previously failing tests after healing" + ] + }, + { + "id": 8, + "category": "edge-case", + "prompt": "REMOVE mode — the legacy promo code feature has been removed from the product. Clean up the BDD artifacts.", + "expected_output": "Agent enters REMOVE mode. Identifies all .feature files whose scenarios carry # AC: tags matching the removed promo code Feature/US IDs. Finds PageObjects referenced only by those scenarios. Finds step definitions used only by those scenarios. Presents the complete deletion list to the user for confirmation before touching any file. After confirmation: removes confirmed files, updates manifest.json to remove the deprecated entries. 
Flags the linked US/AC entities in the catalog as deprecation candidates and hands off to @living-doc-copilot.", + "files": [], + "expectations": [ + "Identifies all .feature file scenarios linked to the removed Feature via # AC: tags", + "Identifies PageObjects and step definitions used exclusively by those scenarios", + "Presents the full deletion list for user confirmation before any file is touched", + "Removes only confirmed files — does not auto-delete", + "Updates manifest.json to remove deprecated entries", + "Flags catalog entities for deprecation and hands off to @living-doc-copilot" + ] + } + ] +} diff --git a/.github/agents/evals/living-doc-bdd-copilot/trigger-eval.json b/.github/agents/evals/living-doc-bdd-copilot/trigger-eval.json new file mode 100644 index 0000000..8369a23 --- /dev/null +++ b/.github/agents/evals/living-doc-bdd-copilot/trigger-eval.json @@ -0,0 +1,22 @@ +[ + {"id": 1, "query": "Scan the webapp at https://app.example.com and generate PageObjects", "should_trigger": true, "reason": "'scan webapp' trigger phrase"}, + {"id": 2, "query": "Generate PageObjects for the checkout and login screens", "should_trigger": true, "reason": "'generate pageobjects' trigger phrase"}, + {"id": 3, "query": "Heal the PageObjects after the UI redesign — selectors are broken", "should_trigger": true, "reason": "'heal pageobjects' trigger phrase"}, + {"id": 4, "query": "Generate BDD scenarios for the active User Stories", "should_trigger": true, "reason": "'generate scenarios' trigger phrase"}, + {"id": 5, "query": "Sync the Gherkin feature files with the living doc AC catalog", "should_trigger": true, "reason": "'sync gherkin' trigger phrase"}, + {"id": 6, "query": "Use Playwright to crawl the application and discover all screens", "should_trigger": true, "reason": "'playwright crawl' trigger phrase"}, + {"id": 7, "query": "Explore the app and map all the UI surfaces", "should_trigger": true, "reason": "'explore the app' trigger phrase"}, + {"id": 8, 
"query": "@bdd-copilot scan the dashboard and generate scenarios", "should_trigger": true, "reason": "'bdd copilot' trigger phrase — explicit agent invocation"}, + {"id": 9, "query": "@living-doc-bdd-copilot set up the BDD suite for our new module", "should_trigger": true, "reason": "'living doc bdd copilot' trigger phrase — explicit agent invocation"}, + {"id": 10, "query": "Run the full BDD pipeline — crawl, generate PageObjects, and produce feature files", "should_trigger": true, "reason": "'BDD pipeline' trigger phrase"}, + {"id": 11, "query": "Crawl the UI to discover all reachable pages", "should_trigger": true, "reason": "'crawl the UI' trigger phrase"}, + {"id": 12, "query": "Create page objects for the admin portal", "should_trigger": true, "reason": "'create page objects' trigger phrase"}, + {"id": 13, "query": "Generate a feature file for US-007 — Place an Online Order", "should_trigger": true, "reason": "'generate feature file' trigger phrase"}, + {"id": 14, "query": "What is the scenario coverage for US-007?", "should_trigger": true, "reason": "'scenario coverage' trigger phrase"}, + {"id": 15, "query": "Write the step definitions for the checkout scenarios", "should_trigger": true, "reason": "'step definitions' trigger phrase"}, + {"id": 16, "query": "Generate Gherkin from user story US-003", "should_trigger": true, "reason": "'gherkin from user story' trigger phrase"}, + {"id": 17, "query": "Create a User Story for the loyalty points redemption feature", "should_trigger": false, "reason": "Living doc catalog entity creation — routes to @living-doc-copilot"}, + {"id": 18, "query": "Write a unit test for the discount calculation function", "should_trigger": false, "reason": "Unit test authoring — routes to @sdet-copilot"}, + {"id": 19, "query": "Update the AC state on US-007-02 to DEPRECATED", "should_trigger": false, "reason": "Catalog entity state update — routes to @living-doc-copilot"}, + {"id": 20, "query": "Run the TypeScript quality gate for the 
frontend", "should_trigger": false, "reason": "Quality gate execution — routes to @quality-gate-copilot"} +] diff --git a/.github/agents/evals/living-doc-copilot/evals.json b/.github/agents/evals/living-doc-copilot/evals.json new file mode 100644 index 0000000..16ba395 --- /dev/null +++ b/.github/agents/evals/living-doc-copilot/evals.json @@ -0,0 +1,113 @@ +{ + "agent_name": "living-doc-copilot", + "evals": [ + { + "id": 1, + "category": "happy-path", + "prompt": "I want to start documenting our living doc catalog.", + "expected_output": "Before performing any create or update operation, agent asks for the Storage Profile: 'Which storage format does your living doc use? Describe the entity structure, field names, and where entities are stored (e.g. YAML files in docs/living-doc/, ADO work items, Confluence pages).' Waits for the answer before proceeding. Does not assume a format or create any entity until the storage profile is provided.", + "files": [], + "expectations": [ + "Asks for Storage Profile before any create or update operation", + "Asks about storage location, entity templates, AC block structure, and field name mappings", + "Does not create any entity until the Storage Profile is provided", + "Waits for the user's answer — does not assume a format" + ] + }, + { + "id": 2, + "category": "happy-path", + "prompt": "Create a User Story for the promo code feature. ACs: (1) Valid promo reduces cart total by 10%. (2) Expired promo shows an error message.", + "expected_output": "Agent creates a User Story entity with the As-a/I-can/so-that narrative. Each AC carries all required metadata: state (ACTIVE or PLANNED), version, pre-conditions, and not_in_scope. ACs are atomic — one input condition and one observable outcome each. AC IDs follow the AC:- format. 
The entity is written in the project's confirmed Storage Profile format.", + "files": [], + "expectations": [ + "User Story has an As-a/I-can/so-that narrative", + "Each AC has state, version, pre-conditions, and not_in_scope metadata fields", + "AC IDs follow AC:- format", + "ACs are atomic — one condition and one outcome each", + "Entity written in the confirmed Storage Profile format" + ] + }, + { + "id": 3, + "category": "happy-path", + "prompt": "PLAN mode — the PO wants a loyalty points redemption feature. No code exists yet. Draft ACs from this description: 'Customers with at least 500 points can redeem them at checkout. Each point is worth 1 penny.'", + "expected_output": "Agent operates in PLAN mode. Drafts ACs in plain language from the PO description. Presents the draft to the user for confirmation before creating any entity. After confirmation, creates the User Story and ACs in PLANNED state only — not ACTIVE. ACs cover: successful redemption (>=500 points), insufficient points (< 500), point conversion to currency (1pt = £0.01), and boundary condition (exactly 500 points).", + "files": [], + "expectations": [ + "Operates in PLAN mode", + "Drafts ACs and presents them for confirmation before creating", + "Creates entities in PLANNED state only", + "Covers happy path, error path, and boundary condition", + "Does not create in ACTIVE state without explicit confirmation that code exists" + ] + }, + { + "id": 4, + "category": "happy-path", + "prompt": "We refactored PaymentService — it no longer calls the legacy card tokeniser. Run an impact analysis.", + "expected_output": "Agent traces which living doc entities reference PaymentService: identifies linked Features (e.g. FEAT-payments), User Stories whose ACs describe payment tokenisation, and Functionalities that reference the tokeniser. 
Outputs an impact map: entities that must be reviewed (ACs that reference the old tokeniser behaviour), entities that need state update (ACs whose code no longer exists), and entities unaffected. Recommends: deprecate ACs that reference the deleted tokeniser, update version fields on affected entities.", + "files": [], + "expectations": [ + "Identifies all Features, User Stories, and Functionalities referencing PaymentService", + "Produces an impact map with affected and unaffected entities", + "Recommends deprecating ACs whose behaviour was removed", + "Recommends version bump on changed entities", + "Does not auto-change state — presents for confirmation" + ] + }, + { + "id": 5, + "category": "regression", + "prompt": "HEALING mode — the promo code module was deleted three sprints ago. Its ACs are still in ACTIVE state in the catalog.", + "expected_output": "Agent enters HEALING mode. Verifies that the module no longer exists (via file search or user confirmation). Identifies stale entities: all ACs under the promo-code Feature that reference deleted code. Sets state to DEPRECATED on confirmed stale ACs. Fixes traceability links that reference the deleted module. Does NOT touch PageObjects or step definitions — flags those for @living-doc-bdd-copilot. Notes any remaining pre-conditions that reference the deleted flow.", + "files": [], + "expectations": [ + "Enters HEALING mode — catalog layer only", + "Identifies all ACs with ACTIVE state that reference the deleted module", + "Sets state to DEPRECATED on confirmed stale ACs", + "Does not touch PageObjects or step definitions — defers to @living-doc-bdd-copilot", + "Removes or flags pre-conditions that reference the deleted flow" + ] + }, + { + "id": 6, + "category": "negative", + "prompt": "Write Gherkin scenarios for US-007 — Place an Online Order.", + "expected_output": "Scenario generation is out of scope for this agent — hands off to @living-doc-bdd-copilot. 
living-doc-copilot owns the catalog layer (entities, ACs, traceability); @living-doc-bdd-copilot generates Gherkin from US ACs and manages the automation layer.", + "files": [], + "expectations": [ + "Does not write Gherkin scenarios", + "Routes to @living-doc-bdd-copilot", + "Explains the catalog vs automation layer boundary" + ] + }, + { + "id": 7, + "category": "paraphrase", + "prompt": "I need to capture this business rule in the living doc: orders from repeat customers get a 5% loyalty discount automatically.", + "expected_output": "Agent identifies this as a Functionality entity request (atomic business rule). Invokes the living-doc-create-functionality skill. Forms a verb-phrase name (e.g. 'Apply repeat-customer loyalty discount'). Runs the completeness checklist: asks about the threshold for 'repeat customer', boundary cases (exactly N orders), interaction with other discounts. Drafts ACs and presents for confirmation. Creates the Functionality in the confirmed Storage Profile format with all required AC metadata fields.", + "files": [], + "expectations": [ + "Identifies as a Functionality entity — atomic business rule", + "Invokes living-doc-create-functionality skill", + "Verb-phrase name for the Functionality", + "Runs completeness checklist — asks about threshold and boundary conditions", + "Drafts ACs for confirmation before creating" + ] + }, + { + "id": 8, + "category": "edge-case", + "prompt": "AC:US-001-02 is currently ACTIVE at v1.0.0. The business changed the rule: the discount threshold is now £75 instead of £50. How do I update it?", + "expected_output": "Updating an ACTIVE AC requires a version bump. Agent updates the AC description and increments the version (v1.0.0 → v1.1.0). Reminds that any Gherkin scenarios linked to this AC via '# AC: US-001-02' may now have stale step text — flags for @living-doc-bdd-copilot to sync. Does NOT change the AC ID. 
Shows old and new AC side by side for confirmation before writing.", + "files": [], + "expectations": [ + "Bumps version on the updated AC (v1.0.0 → v1.1.0)", + "Does not change the AC ID", + "Flags linked Gherkin scenarios as potentially stale — defers to @living-doc-bdd-copilot", + "Shows old and new AC side by side for confirmation before writing" + ] + } + ] +} diff --git a/.github/agents/evals/living-doc-copilot/trigger-eval.json b/.github/agents/evals/living-doc-copilot/trigger-eval.json new file mode 100644 index 0000000..434c08f --- /dev/null +++ b/.github/agents/evals/living-doc-copilot/trigger-eval.json @@ -0,0 +1,18 @@ +[ + {"id": 1, "query": "Create a user story for the checkout capability", "should_trigger": true, "reason": "'create user story' trigger phrase"}, + {"id": 2, "query": "Document the Orders API as a Feature in the living doc", "should_trigger": true, "reason": "'document feature' trigger phrase"}, + {"id": 3, "query": "Update the AC on US-007 — the payment timeout is now 30 seconds, not 60", "should_trigger": true, "reason": "'update AC' trigger phrase"}, + {"id": 4, "query": "Run an impact analysis on the PaymentService refactor", "should_trigger": true, "reason": "'impact analysis' trigger phrase"}, + {"id": 5, "query": "Find gaps in the living documentation catalog", "should_trigger": true, "reason": "'living doc gaps' trigger phrase"}, + {"id": 6, "query": "Enter PLAN mode — the PO has a new checkout initiative that hasn't been built yet", "should_trigger": true, "reason": "'PLAN mode' trigger phrase"}, + {"id": 7, "query": "Run HEALING mode on the living doc catalog — we've deleted several old flows", "should_trigger": true, "reason": "'HEALING mode' trigger phrase"}, + {"id": 8, "query": "Deprecate the legacy payment flow entities in the living doc", "should_trigger": true, "reason": "'deprecate entity' trigger phrase"}, + {"id": 9, "query": "@living-doc-copilot help me update the requirements catalog", "should_trigger": true, 
"reason": "'living doc copilot' trigger phrase — explicit agent invocation"}, + {"id": 10, "query": "Add an AC to user story US-003 for the expired promo code case", "should_trigger": true, "reason": "'add AC to user story' trigger phrase"}, + {"id": 11, "query": "Trace which features are affected by the change to the notification service", "should_trigger": true, "reason": "'trace affected features' trigger phrase"}, + {"id": 12, "query": "Update the feature registry to include the new reporting module", "should_trigger": true, "reason": "'update feature registry' trigger phrase"}, + {"id": 13, "query": "Scan the webapp and generate PageObjects for the checkout screen", "should_trigger": false, "reason": "UI crawl and PageObject generation — routes to @living-doc-bdd-copilot"}, + {"id": 14, "query": "Generate Gherkin scenarios for US-007", "should_trigger": false, "reason": "Scenario generation — routes to @living-doc-bdd-copilot"}, + {"id": 15, "query": "Write a unit test for the discount calculation function", "should_trigger": false, "reason": "Test code authoring — routes to @sdet-copilot"}, + {"id": 16, "query": "Fix the failing BDD tests after the UI redesign", "should_trigger": false, "reason": "PageObject and step definition repair — routes to @living-doc-bdd-copilot"} +] diff --git a/.github/agents/living-doc-bdd-copilot.agent.md b/.github/agents/living-doc-bdd-copilot.agent.md index 0764ee2..9898a45 100644 --- a/.github/agents/living-doc-bdd-copilot.agent.md +++ b/.github/agents/living-doc-bdd-copilot.agent.md @@ -102,6 +102,8 @@ guided_steps: [] # populated during Source E traversal 5. Repeat until coverage plateau — no new surfaces found in the last full iteration. 6. Report any unreachable areas — auth walls, dead links, CAPTCHA gates, or forms that cannot be progressed due to missing business knowledge (unknown valid input values, business-specific field formats, required lookup codes, conditional field logic). 
Offer to enrich `seed.yaml` with missing routes, credentials, or form values, then loop. +**PageObject generation rule:** For every new or changed UI surface, load `living-doc-pageobject-scan` — `Create` mode for first-time generation and `Maintain` mode for selector drift. Generated PageObjects must use a file-level `living-doc: FEAT- | /route` header comment, prefer `data-testid` selectors, keep selector constants in `ALL_CAPS`, accept `page` in `__init__` / `constructor`, and expose method stubs for each interactive element. Flag any positional CSS selector as `FRAGILE`. If no matching Feature exists in the catalog, hand the surface to `@living-doc-copilot`; do not create catalog entities here. + **Output artifact:** `.copilot/bdd/manifest.json` ```json @@ -148,14 +150,17 @@ guided_steps: After exploration completes (manifest is up to date): 1. Use the `living-doc-gap-finder` skill (bottom-up mode) to identify User Stories with `ACTIVE` ACs that have no linked Gherkin scenario. -2. For each gap: load the `living-doc-scenario-creator` skill and generate Gherkin scenario skeletons — one scenario per AC, with the mandatory `# AC:` traceability tag. -3. Write `.feature` files under the project's feature directory. -4. For each generated scenario, resolve step definitions: +2. For each gap: load the `living-doc-scenario-creator` skill and generate Gherkin scenario skeletons — one scenario per `Active` or `Implemented` AC, with the mandatory `# AC:` traceability tag. Skip `Planned` and `Deprecated` ACs. +3. Write `.feature` files under the project's feature directory using `-.feature` naming, e.g. `us-007-place-an-online-order.feature`. +4. The `Feature:` header must restate the User Story narrative in `As a / I can / so that` form. +5. Scenario step text must stay in business/domain language only — never mention selectors, HTTP calls, DOM details, or database operations. +6. For each generated scenario, resolve step definitions: a. 
**Narrow the search scope to the page first** — identify which PageObject the scenario's steps will interact with. Look in step definition files that already import or reference that PageObject; these are the most likely candidates for reuse. b. **Match by purpose, not just pattern** — read the step's implementation body to confirm it performs the same business action (e.g. a `fill` on `username-input` vs a `fill` on `search-input` look identical in text but serve different purposes). Only reuse if purpose matches. c. If a purpose-matching step exists, reuse it as-is; note which library file it lives in. - d. Only if no match exists: write a new stub using the `gherkin-step` skill; extend the relevant PageObject where a new UI interaction is needed. -5. Update `manifest.json` to record any new PageObject paths created. + d. If no reusable step exists but the needed PageObject method already exists, generate a full step stub via `gherkin-step` that delegates directly to that PageObject method. + e. If neither the step nor the PageObject method exists, generate a stub that raises `NotImplementedError` (or the language-equivalent pending marker) and explicitly flag that the PageObject must be extended with the missing interaction. +7. Update `manifest.json` to record any new PageObject paths created. **Gap detection logic:** An AC is considered uncovered if no scenario in any `.feature` file carries the AC's traceability tag (`# AC: `). @@ -170,14 +175,14 @@ After exploration completes (manifest is up to date): **Scope:** Full re-run of every path recorded in `manifest.json`, plus active discovery of new routes not yet in the manifest. 1. Reload `seed.yaml` and `manifest.json`. -2. For every existing manifest entry: navigate to its URL, snapshot the DOM, and validate that every recorded `component_id` locator still resolves. Flag any locator that no longer matches as `stale`. +2. 
For every existing manifest entry: navigate to its URL, snapshot the DOM, and validate that every recorded `component_id` locator still resolves. Flag any locator that no longer matches as `BREAKING CHANGE`, including the linked step definition / scenario details that may fail. 3. **Actively discover new routes from each visited page** — do not limit discovery to routes already in `seed.yaml`. On each page snapshot: - Find all `` links that resolve to new paths not yet in the manifest. - Find all buttons and interactive components whose purpose suggests navigation to a new screen (e.g. "Create order", "View details", "Go to settings") — click them and record the resulting URL. - Find tab panels, side-nav items, and wizard steps that expose sub-routes. - Any new URL discovered this way is a candidate manifest entry; add it and crawl it recursively. -4. Add new surfaces to `manifest.json`; mark removed or stale-locator surfaces as `deprecated`. -5. Update PageObjects for any locators flagged as stale in step 2. +4. Add new surfaces to `manifest.json`; mark removed surfaces as `deprecated`. +5. Update stale selector constants in PageObjects for any locators flagged in step 2. 6. Generate new scenarios for newly discovered ACs (Scenario Generation logic). ### HEALING mode @@ -186,10 +191,10 @@ After exploration completes (manifest is up to date): **Scope:** Failing tests only — do not touch passing tests or unrelated PageObjects. -1. Receive or discover the list of failing test names / scenario titles. +1. Receive or discover the list of failing test names / scenario titles. If the request only says tests are failing but does not include the failing list, ask for it before making changes so scope stays limited to the failing scenarios. 2. Trace each failure back to its PageObject and step definition. 3. Navigate to the affected page via MCP Playwright; snapshot the current DOM. -4. Find updated element IDs or selectors; update the affected PageObject(s) accordingly. 
+4. Find updated element IDs or selectors; update only the affected PageObject(s) accordingly. 5. Verify the step definition binding still resolves; fix if broken. 6. Re-run only the previously failing tests to confirm healing. Do not re-run the full suite. diff --git a/.github/agents/living-doc-copilot.agent.md b/.github/agents/living-doc-copilot.agent.md index 8bd1013..3566d21 100644 --- a/.github/agents/living-doc-copilot.agent.md +++ b/.github/agents/living-doc-copilot.agent.md @@ -22,17 +22,17 @@ Requirements layer agent. Owns the living documentation catalog — creates, upd ## Initialisation -On every session start, ask: +When the user is starting the living doc catalog or explicitly asks to define storage setup, ask: > "Which storage format does your living doc use? Describe the entity structure, field names, and where entities are stored (e.g. YAML files in `docs/living-doc/`, ADO work items, Confluence pages)." -Wait for the answer before any create or update operation. Extract from the response: +Wait for the answer before the first persisted create or update in that session. Extract from the response: - **Storage location** — where entity files live (path pattern or external system) - **Entity templates** — expected fields and their names per entity type (US, Feature, Functionality) - **AC block structure** — how ACs are represented (inline fields, nested list, table) - **Field name mappings** — e.g. what the project calls `state`, `version`, `id` -Never assume a format. If the answer is incomplete, ask one targeted follow-up before proceeding. +Never invent a format. If the answer is incomplete, ask one targeted follow-up before proceeding. If a later request omits storage details, assume the session's confirmed Storage Profile still applies. 
## Scope @@ -70,7 +70,7 @@ Every AC must carry these fields: - Fix broken traceability links: US ↔ Feature ↔ Functionality - Update `version` fields where incremented - Remove `pre-conditions` that reference deleted flows -- Does NOT repair PageObject selectors or step definition bindings → `@bdd-copilot` +- Does NOT repair PageObject selectors or step definition bindings → `@living-doc-bdd-copilot` **PLAN** — triggered by PO descriptions without existing code: - Draft ACs from plain-language descriptions @@ -80,10 +80,10 @@ Every AC must carry these fields: ## Cross-agent HEALING boundary This agent heals the **catalog layer** (entities, ACs, traceability links). -`@bdd-copilot` heals the **automation layer** (PageObjects, step definitions, feature files). +`@living-doc-bdd-copilot` heals the **automation layer** (PageObjects, step definitions, feature files). Do not cross this boundary. -> `@bdd-copilot` is the expected cooperating agent for the automation layer. It is deployed separately — if it is not yet available in this repository, hand-off notes should be left as TODO comments for a future BDD session. +> `@living-doc-bdd-copilot` is the expected cooperating agent for the automation layer. It is deployed separately — if it is not yet available in this repository, hand-off notes should be left as TODO comments for a future BDD session. ## Skills @@ -96,10 +96,22 @@ Do not cross this boundary. | `living-doc-impact-analysis` | Trace which entities a code change affects | `skills/living-doc-impact-analysis/SKILL.md` | | `living-doc-gap-finder` | Find undocumented behaviours and orphan tests | `skills/living-doc-gap-finder/SKILL.md` | +## Operating rules + +- Confirm and cache the Storage Profile before the first persisted create or update only when the session is establishing storage setup; once confirmed, write every entity in that format, reuse it for later requests in the same session, and never invent missing field names. 
+- Route by request type: User Story or business journey → `living-doc-create-user-story`; atomic business rule or component behaviour → `living-doc-create-functionality`; impact or change trace → `living-doc-impact-analysis`; update or deprecate an existing entity or AC → `living-doc-update`; catalog drift or stale coverage → `living-doc-gap-finder`. +- If a User Story request includes capability and ACs but omits actor or business value, draft the most likely `As a / I can / so that` narrative from the business context and ask for confirmation only when the role or value is genuinely ambiguous. +- Use atomic ACs only: one triggering condition plus one observable outcome per AC. Every AC must include `id`, `state`, `version`, `pre-conditions`, and `not_in_scope`. Unless the confirmed Storage Profile already defines a different convention, use `AC:-` and keep AC IDs stable across updates. +- PLAN mode: draft ACs first, cover happy path, error path, boundary conditions, and threshold or conversion rules where relevant, then create only after confirmation and only in `PLANNED` state. +- HEALING mode: verify deleted or superseded code via repository search or explicit user confirmation before deprecating; then set stale ACs or entities to `DEPRECATED`, repair traceability links, remove or flag stale `pre-conditions`, and leave PageObjects, step definitions, and Gherkin sync to `@living-doc-bdd-copilot`. +- Impact analysis: produce an explicit impact map covering affected and unaffected Features, Functionalities, User Stories, ACs, and linked scenarios; recommend version bumps on changed entities and deprecation for removed behaviours, but do not change state without user confirmation. +- Updating an `ACTIVE` AC: show OLD vs NEW side by side before writing, keep the AC ID unchanged, and bump the semantic version for business-rule changes (for example `v1.0.0` → `v1.1.0` for a threshold change). 
Flag linked `# AC: ...` Gherkin or scenario text as potentially stale for `@living-doc-bdd-copilot`. +- For Functionality requests, use a verb-phrase name, draft ACs and present them for confirmation before creating, and run a completeness checklist for thresholds, below/exactly/above-boundary behaviour, invalid or missing input, and interactions with other rules. + ## Handoff -**Inbound:** `@bdd-copilot` hands a surface list after Phase 1 exploration. Load it, then create the corresponding Feature and User Story entities. +**Inbound:** `@living-doc-bdd-copilot` hands a surface list after Phase 1 exploration. Load it, then create the corresponding Feature and User Story entities. **Outbound:** When US and ACs are confirmed and in `ACTIVE` (or `PLANNED`) state, complete with: -> "US and ACs are ready. Call @bdd-copilot to generate scenarios." +> "US and ACs are ready. Call @living-doc-bdd-copilot to generate scenarios." diff --git a/docs/testing/agent-testing.md b/docs/testing/agent-testing.md index 928992f..e3ea770 100644 --- a/docs/testing/agent-testing.md +++ b/docs/testing/agent-testing.md @@ -1,6 +1,6 @@ # Agent Testing Guide -This document describes how to test, evaluate, and tune `.agent.md` files — specifically how to use `agent-customization` (for structural edits) together with `skill-creator`'s eval methodology (for description trigger accuracy). This is the practical equivalent of [skill-testing.md](./skill-testing.md) applied to agents. +This document describes how to test, evaluate, and tune `.agent.md` files — specifically how to use `skill-creator`'s eval methodology (for description trigger accuracy). This is the practical equivalent of [skill-testing.md](./skill-testing.md) applied to agents. --- @@ -11,7 +11,6 @@ This document describes how to test, evaluate, and tune `.agent.md` files — sp | Trigger mechanism | `description:` field in SKILL.md YAML | `description:` field in `.agent.md` YAML | | Body loaded when? 
| When skill is activated by description match | When user addresses `@agent-name` or description matches | | What to tune | Description trigger keywords + body instructions | Description trigger keywords + body sections (scope, handoff, maintenance modes) | -| Tool for structural edits | `skill-creator` | `agent-customization` | | Tool for eval loop | `skill-creator` (fully supported) | `skill-creator` (description eval loop applies directly) | The key insight: an agent's `description:` block is read by the same matching mechanism as a skill's `description:`. Everything `skill-creator` does to optimize skill descriptions applies 1-for-1 to agent descriptions. @@ -25,7 +24,7 @@ The key insight: an agent's `description:` block is read by the same matching me 3. Start a Copilot Chat session from the repository root 4. Ask Copilot to use the `skill-creator` skill, pointing it at the agent's eval files 5. Review trigger accuracy and output quality -6. Use `agent-customization` to edit structural sections (tools list, scope, handoff, modes) +6. Edit structural sections directly in the `.agent.md` file (tools list, scope, handoff, modes) 7. Re-run evals; repeat until stable --- @@ -138,21 +137,13 @@ Use the same with-skill / baseline comparison flow described in [skill-testing.m --- -## 6. Structural edits — use `agent-customization` +## 6. Structural edits -When body evals reveal a section is wrong (wrong scope, missing tool, bad handoff), use the `agent-customization` skill to fix the structural parts: +When body evals reveal a section is wrong (wrong scope, missing tool, bad handoff), edit the `.agent.md` file directly: -``` -Use the agent-customization skill to add `mcp_microsoft_pla_browser_wait_for` to the tools list -in .github/agents/my-agent.agent.md. -``` - -``` -Use the agent-customization skill to update the HEALING mode scope in -.github/agents/my-agent.agent.md — it should be scoped to failing tests only. 
-``` - -`agent-customization` understands `.agent.md` YAML frontmatter and section structure, so it handles these edits safely without breaking the file format. +- **Missing tool** — add the tool name to the `tools:` list in the YAML frontmatter +- **Wrong scope boundary** — update the relevant section (`## Scope`, `## Does NOT`, or the specific mode block) +- **Broken handoff** — update the `## Handoff` section with the correct target agent and conditions --- @@ -221,7 +212,7 @@ gh copilot → "Use the skill-creator skill to test the agent at .github/agents/my-agent.agent.md using the evals at .github/agents/evals/my-agent/" → inspect trigger accuracy and body output diffs -→ use agent-customization to fix structural issues +→ edit the `.agent.md` file directly to fix structural issues → "Use the skill-creator skill to optimize the description using the trigger-eval.json" → re-run evals until stable ``` From 4d94d2879c2176d138d06191900c4091845f7ccc Mon Sep 17 00:00:00 2001 From: miroslavpojer Date: Fri, 22 May 2026 21:35:44 +0200 Subject: [PATCH 18/35] chore: remove roadmap.md as part of project restructuring --- roadmap.md | 729 ----------------------------------------------------- 1 file changed, 729 deletions(-) delete mode 100644 roadmap.md diff --git a/roadmap.md b/roadmap.md deleted file mode 100644 index cc6b51f..0000000 --- a/roadmap.md +++ /dev/null @@ -1,729 +0,0 @@ -# Implementation Roadmap — Agentic Engineering Toolkit - -> **Authored from:** `plugin-spec.md` (last reviewed 2026-05-21). -> The spec file has been removed from the repo; this document is the canonical delivery reference. 
-> **Last updated:** 2026-05-22 - ---- - -## Progress overview - -| Step | Cluster | Agent(s) | Skills | Done | Remaining | -|---|---|---|---|---|---| -| 1 | Living Doc + BDD | `@living-doc-copilot` ✅ `@living-doc-bdd-copilot` ✅ | 11 / 11 ✅ | 13 files | 0 | -| 1b | Tutorial | `@living-doc-bdd-tutorial-copilot` ❌ | 0 / 1 | — | 2 files | -| 2 | SDET | `@sdet-copilot` ❌ | 0 / 7 | — | 8 files | -| 3 | Code Quality | `@quality-gate-copilot` ❌ | 0 / 7 | — | 8 files | -| 4 | Test Specialist | `@test-specialist-copilot` ❌ | 0 / 6 | — | 7 files | -| 4b | Test Quality | `@test-quality-copilot` ❌ | 0 / 4 | — | 5 files | -| 5 | Standalone | — | 0 / 3 | — | 3 files | -| **Total** | | **7 agents** | **40 skills** | **13** | **33** | - -> **Constraint:** never merge a cluster without both the skill files AND the agent definition in the same PR. - ---- - -## File layout - -``` -skills/ -└── {skill-name}/ - ├── SKILL.md ← required - ├── scripts/ ← optional: executable logic - ├── references/ ← optional: overflow docs (when body approaches 500 lines) - ├── assets/ ← optional: templates, example files - └── evals/ ← optional: trigger + assertion test prompts - -.github/ -└── agents/ - └── {agent-name}.agent.md -``` - -Skill source to migrate from: `/Users/ab024ll/.copilot/skills/{skill-name}/SKILL.md` -Agent files: authored from scratch using the spec definitions below. 
- -### Agent file format - -```yaml ---- -description: > - -tools: - - read_file - - replace_string_in_file - - create_file - - grep_search - - file_search - - semantic_search - # + run_in_terminal for agents with execute capability - # + mcp_microsoft_pla_browser_* for @living-doc-bdd-copilot only ---- - - -``` - -### Validation checklist (every skill before merge) - -- [ ] Folder name matches `name` frontmatter exactly — lowercase kebab-case, ≤ 64 chars -- [ ] `description` ≤ 1024 chars; covers *what* and *when*; includes trigger keywords -- [ ] Body < 500 lines; use `references/` with a pointer in body if needed -- [ ] No hardcoded secrets, credentials, or absolute machine-local paths -- [ ] Scripts in `scripts/` are referenced from `SKILL.md` with usage instructions - ---- - -## Step 1 — Living Doc + BDD Cluster - -### Completed ✅ - -| File | Status | -|---|---| -| `.github/agents/living-doc-copilot.agent.md` | ✅ | -| `.github/agents/living-doc-bdd-copilot.agent.md` | ✅ | -| `skills/living-doc-create-feature/SKILL.md` | ✅ | -| `skills/living-doc-create-functionality/SKILL.md` | ✅ | -| `skills/living-doc-create-user-story/SKILL.md` | ✅ | -| `skills/living-doc-gap-finder/SKILL.md` | ✅ | -| `skills/living-doc-impact-analysis/SKILL.md` | ✅ | -| `skills/living-doc-update/SKILL.md` | ✅ | -| `skills/living-doc-pageobject-scan/SKILL.md` | ✅ | -| `skills/living-doc-scenario-creator/SKILL.md` | ✅ | -| `skills/gherkin-scenario/SKILL.md` | ✅ | -| `skills/gherkin-step/SKILL.md` | ✅ | -| `skills/gherkin-living-doc-sync/SKILL.md` | ✅ | - -### Agent outline: `@living-doc-bdd-copilot` - -**Frontmatter:** - -```yaml ---- -description: > - Bridge living documentation to executable tests. Explore web apps via MCP Playwright, - generate and maintain PageObjects, Gherkin scenarios, and step definitions. - Handles Phase 0+1 (Business Seed + exploration), Phase 3 (scenario generation), - Phase 6 maintenance (RE-SCAN, HEALING, REMOVE). 
- Triggers: "scan webapp", "generate pageobjects", "heal pageobjects", "generate scenarios", - "sync gherkin", "playwright crawl", "explore the app", "bdd copilot", "BDD pipeline". -tools: - - read_file - - replace_string_in_file - - create_file - - grep_search - - file_search - - semantic_search - - run_in_terminal - - mcp_microsoft_pla_browser_navigate - - mcp_microsoft_pla_browser_snapshot - - mcp_microsoft_pla_browser_click - - mcp_microsoft_pla_browser_fill_form - - mcp_microsoft_pla_browser_take_screenshot - - mcp_microsoft_pla_browser_type - - mcp_microsoft_pla_browser_wait_for ---- -``` - -**Required body sections:** - -1. **Phase 0 — Business Seed assembly** - - Sources A–E with behaviour per source - - Credential rule: `env:VAR_NAME` in `seed.yaml` always, never literal values - - Output artifact: `.copilot/bdd/seed.yaml` - -2. **Phase 1 — Iterative exploration** - - Load `seed.yaml` + `manifest.json` (if present from prior run); absent manifest = first iteration - - Crawl loop until coverage plateau (no new surfaces in last iteration) - - Report unreachable areas → enrich seed → loop - - Output artifact: `.copilot/bdd/manifest.json` (Feature name, URL, component IDs, PageObject path) - -3. **Source E — Guided traversal protocol** - - Pause at unknown decision points, take screenshot, ask user - - Immediately append to `guided_steps:` in `seed.yaml`: `url`, `action`, `field`, `value` (`env:VAR` if sensitive), `note` - - CAPTCHA rule: pause, wait for human to solve in browser, continue; still record the step - -4. **Phase 3 — Scenario generation**: gap detection vs existing scenarios; generate via `living-doc-scenario-creator`; write step definitions; extend PageObjects - -5. **Phase 6 — Maintenance**: RE-SCAN (new feature/refactor), HEALING (test failures/selector drift), REMOVE (deprecated feature) — triggers and behaviour per mode - -6. 
**Scope** (10 bullets from spec): - - Load Business Seed + Exploration Manifest before crawling - - Crawl web app via MCP Playwright using manifest-guided navigation - - Fill forms and traverse wizards using business-supplied test values - - Identify Features from discovered UI surfaces - - Detect scenario gaps (existing scenarios vs US ACs) - - Generate Gherkin scenarios from User Story ACs - - Write and extend step definitions - - Heal PageObjects after UI changes (MCP Playwright drift detection) - - Challenge US/AC validity when app behaviour has changed - - Sync Gherkin feature files with living doc - -7. **Does NOT**: create living doc entities (→ `@living-doc-copilot`); write unit/integration tests (→ `@sdet-copilot`); run quality gates (→ `@quality-gate-copilot`) - -8. **Shared skill note**: `living-doc-gap-finder` is used bottom-up here (scenario coverage for known ACs) vs top-down in `@living-doc-copilot` (missing documentation) - -9. **Skills** (6): `living-doc-pageobject-scan`, `living-doc-scenario-creator`, `living-doc-gap-finder`, `gherkin-scenario`, `gherkin-step`, `gherkin-living-doc-sync` — each with path `skills/{name}/SKILL.md` - -10. **Handoff out** (two paths): - - Feature list → `@living-doc-copilot` to document - - After Phase 3: *"Feature files and steps generated. Call @sdet-copilot for unit tests."* - ---- - -### Issue 1.A — Complete remaining Step 1 skills - -**Title:** `[Step 1] Migrate remaining living-doc BDD + gherkin skills` - -**Body:** - -``` -## Summary - -Six skills remain from the Living Doc + BDD cluster. Must ship in the same PR as Issue 1.B -(spec rule: never transfer skills without the agent definition). 
- -## Files to create - -| Destination | Source | -|---|---| -| `skills/living-doc-pageobject-scan/SKILL.md` | `.copilot/skills/living-doc-pageobject-scan/SKILL.md` | -| `skills/living-doc-scenario-creator/SKILL.md` | `.copilot/skills/living-doc-scenario-creator/SKILL.md` | -| `skills/gherkin-scenario/SKILL.md` | `.copilot/skills/gherkin-scenario/SKILL.md` | -| `skills/gherkin-step/SKILL.md` | `.copilot/skills/gherkin-step/SKILL.md` | -| `skills/gherkin-living-doc-sync/SKILL.md` | `.copilot/skills/gherkin-living-doc-sync/SKILL.md` | - -## Acceptance criteria - -- [ ] All 6 folder names match their `name` frontmatter fields exactly -- [ ] All `description` fields ≤ 1024 chars and include trigger keywords -- [ ] `living-doc-gap-finder` description (already migrated) notes dual shared-skill usage - — verify it covers both top-down (@living-doc-copilot) and bottom-up (@living-doc-bdd-copilot) -- [ ] `gherkin-scenario` description notes optional @sdet-copilot usage at unit level -- [ ] All bodies < 500 lines (use `references/` with pointer if needed) -- [ ] No hardcoded credentials or absolute local paths -- [ ] Closed in same PR as Issue 1.B - -## Reference - -Spec → Agent Catalog → @living-doc-bdd-copilot skills table -``` - ---- - -### Issue 1.B — Create @living-doc-bdd-copilot agent - -**Title:** `[Step 1] Create @living-doc-bdd-copilot agent definition` - -**Body:** - -``` -## Summary - -Author `.github/agents/living-doc-bdd-copilot.agent.md` — automation-layer agent for web app -exploration (Phases 0+1), BDD scenario generation (Phase 3), and maintenance (Phase 6). - -## File to create - -`.github/agents/living-doc-bdd-copilot.agent.md` - -## Required frontmatter - -See roadmap.md → Step 1 → Agent outline: @living-doc-bdd-copilot for full frontmatter block. -Key requirement: all mcp_microsoft_pla_browser_* tools must be listed. - -## Required body sections - -1. Phase 0 — Business Seed assembly (Sources A–E; credential safety; seed.yaml output) -2. 
Phase 1 — Iterative exploration (load seed + manifest; plateau detection; manifest.json output) -3. Partial state — seed.yaml present but manifest.json absent = treat as first Phase 1 run -4. Source E — Guided traversal (pause/screenshot/ask/execute/write guided_steps; CAPTCHA rule) -5. Phase 3 — Scenario generation with gap detection -6. Phase 6 — Maintenance (RE-SCAN / HEALING / REMOVE) -7. Scope — 10 bullets -8. Does NOT — with redirect targets -9. Shared skill note for living-doc-gap-finder -10. Skills table — 7 entries with paths -11. Handoff out — two paths; prompts verbatim from spec - -## Acceptance criteria - -- [ ] All MCP Playwright tools listed in frontmatter -- [ ] Sources A–E documented with exact behaviour per source -- [ ] Credential safety rule present (env:VAR_NAME, never literal) -- [ ] Partial state handling documented -- [ ] Guided traversal protocol includes CAPTCHA pause-and-wait -- [ ] Phase 6 all three maintenance modes documented -- [ ] All 7 skills referenced as `skills/{name}/SKILL.md` -- [ ] Handoff prompt exact: "Feature files and steps generated. Call @sdet-copilot for unit tests." -- [ ] Closed in same PR as Issue 1.A -``` - ---- - -### Planned agent: `@living-doc-bdd-tutorial-copilot` - -The tutorial generation capability (previously `living-doc-tutorial-creator` skill) will ship as -a dedicated agent rather than as part of `@living-doc-bdd-copilot`. It will own the full -tutorial authoring pipeline: transform executed BDD scenarios into annotated tutorial documents, -SSML narration scripts, and onboarding walkthroughs. 
- -| Attribute | Value | -|---|---| -| Agent file | `.github/agents/living-doc-bdd-tutorial-copilot.agent.md` | -| Skill | `skills/living-doc-tutorial-creator/SKILL.md` — migrate from `.copilot/skills/` | -| Inbound trigger | Executed `.feature` files + optional screenshots | -| Output | Annotated tutorial `.md`, SSML narration script | -| Step | Separate step (not yet scheduled) | - ---- - -## Step 2 — SDET Cluster - -### Files to create - -| File | Type | -|---|---| -| `skills/tdd-workflow/SKILL.md` | Skill — migrate from `.copilot/skills/` | -| `skills/test-unit-write/SKILL.md` | Skill — migrate | -| `skills/test-unit-review/SKILL.md` | Skill — migrate | -| `skills/test-unit-standards/SKILL.md` | Skill — migrate | -| `skills/test-case-design/SKILL.md` | Skill — migrate | -| `skills/test-data-management/SKILL.md` | Skill — migrate | -| `skills/test-mocking-patterns/SKILL.md` | Skill — migrate | -| `.github/agents/sdet-copilot.agent.md` | Agent — author from spec | - -### Agent outline: `@sdet-copilot` - -**Frontmatter:** - -```yaml ---- -description: > - Daily developer test-engineering companion. Use for: TDD red-green-refactor, writing - unit and integration tests, reviewing existing test files, designing test case tables, - managing test data and fixtures, choosing test doubles. Phase 4 of the engineering - pipeline. Triggers: "write tests", "TDD", "review my tests", "test doubles", - "test data", "red-green-refactor", "sdet copilot", "write unit tests", - "add tests for", "design test cases", "add coverage for". -tools: - - read_file - - replace_string_in_file - - create_file - - grep_search - - file_search - - semantic_search ---- -``` - -**Required body sections:** - -1. 
**Technology-neutral escalation constraint** (4-step): express guidance language-agnostic first; if language-specific tooling required ask *"What is your target technology / language?"*; recommend escalating to `@quality-gate-copilot` with the matching language skill (`qa-python`, `qa-java`, `qa-scala`, `qa-typescript`, `qa-dotnet`); if no match, provide generic guidance and note the gap - -2. **Scope** (6 bullets): TDD workflow; write unit and integration test code; review and audit test files; design test case tables; manage test data; choose test doubles - -3. **Does NOT**: run CI quality gates (→ `@quality-gate-copilot`); write Gherkin/BDD *as standalone BDD pipeline deliverables* (→ `@living-doc-bdd-copilot`; `gherkin-scenario` available optionally at unit level); handle specialised test types — accessibility, security, E2E, API (→ `@test-specialist-copilot`); improve test quality depth — mutation, property-based, flakiness (→ `@test-quality-copilot`) - -4. **Skills** (7): `tdd-workflow`, `test-unit-write`, `test-unit-review`, `test-unit-standards`, `test-case-design`, `test-data-management`, `test-mocking-patterns`; note `gherkin-scenario` as optional 8th when team uses BDD at unit level - -5. **Handoff out**: *"Tests written. Run @quality-gate-copilot to enforce the gate."* - ---- - -### Issue 2.1 — SDET cluster (skills + agent) - -**Title:** `[Step 2] SDET cluster — migrate 7 skills and create @sdet-copilot agent` - -**Body:** - -``` -## Summary - -Migrate 7 SDET skills and author the @sdet-copilot agent definition as a single PR. 
- -## Files to create - -skills/tdd-workflow/SKILL.md, skills/test-unit-write/SKILL.md, -skills/test-unit-review/SKILL.md, skills/test-unit-standards/SKILL.md, -skills/test-case-design/SKILL.md, skills/test-data-management/SKILL.md, -skills/test-mocking-patterns/SKILL.md, .github/agents/sdet-copilot.agent.md - -## Acceptance criteria — skills - -- [ ] `tdd-workflow` body references SPEC.md-first pattern in the red-green-refactor cycle -- [ ] `test-unit-standards`, `test-unit-write`, and `test-unit-review` cross-reference each other - correctly (rule set vs procedural write vs procedural review distinction) -- [ ] All bodies < 500 lines; all descriptions ≤ 1024 chars -- [ ] All folder names match `name` frontmatter exactly - -## Acceptance criteria — agent - -- [ ] Escalation path says "recommend @quality-gate-copilot", not "load qa-* skill internally" -- [ ] Gherkin Does NOT entry is qualified: "standalone BDD pipeline deliverable"; - optional unit-level exception noted -- [ ] All 7 skills referenced by path `skills/{name}/SKILL.md` -- [ ] Handoff prompt exact: "Tests written. Run @quality-gate-copilot to enforce the gate." - -## Reference - -Spec → Agent Catalog → @sdet-copilot -``` - ---- - -## Step 3 — Code Quality Cluster - -### Files to create - -| File | Type | -|---|---| -| `skills/qa-python/SKILL.md` | Skill — migrate from `.copilot/skills/` | -| `skills/qa-java/SKILL.md` | Skill — migrate | -| `skills/qa-scala/SKILL.md` | Skill — migrate | -| `skills/qa-typescript/SKILL.md` | Skill — migrate | -| `skills/qa-dotnet/SKILL.md` | Skill — migrate | -| `skills/qa-terraform/SKILL.md` | Skill — migrate | -| `skills/test-coverage-gate/SKILL.md` | Skill — migrate | -| `.github/agents/quality-gate-copilot.agent.md` | Agent — author from spec | - -### Agent outline: `@quality-gate-copilot` - -**Frontmatter:** - -```yaml ---- -description: > - Enforce code quality standards — diagnose and fix CI quality gate failures across all - languages and stacks. 
Use for: linting, formatting, static analysis violations, coverage - thresholds, Javadoc, type annotations, and logging standards. Phase 5 of the pipeline. - Triggers: "quality gate", "CI failing", "coverage below", "lint error", "scalafmt", - "pylint", "quality gate copilot", "fix linting", "coverage threshold", "SpotBugs", - "ESLint violation", "dotnet format", "tflint failure". -tools: - - read_file - - grep_search - - file_search - - semantic_search - - run_in_terminal ---- -``` - -**Required body sections:** - -1. **Scope** (5 bullets): run/fix linting and formatting; fix static analysis violations; configure and enforce coverage thresholds; diagnose CI gate failures per language; apply logging/Javadoc/type annotation standards - -2. **Language routing table** — maps language/stack to skill and path: - - | Language | Skill | Path | - |---|---|---| - | Python | `qa-python` | `skills/qa-python/SKILL.md` | - | Java | `qa-java` | `skills/qa-java/SKILL.md` | - | Scala | `qa-scala` | `skills/qa-scala/SKILL.md` | - | TypeScript / JS | `qa-typescript` | `skills/qa-typescript/SKILL.md` | - | C# / .NET | `qa-dotnet` | `skills/qa-dotnet/SKILL.md` | - | HCL / Terraform | `qa-terraform` | `skills/qa-terraform/SKILL.md` | - | All (coverage) | `test-coverage-gate` | `skills/test-coverage-gate/SKILL.md` | - -3. **Does NOT**: write test code (→ `@sdet-copilot`); handle mutation testing strategy (→ `@test-quality-copilot`); author IaC modules (→ `cps-iac` in `cps-agentic-skills`, not this repo) - -4. **Skills** (7): language column + intent label + path - ---- - -### Issue 3.1 — Code Quality cluster (skills + agent) - -**Title:** `[Step 3] Code Quality cluster — migrate 7 skills and create @quality-gate-copilot agent` - -**Body:** - -``` -## Summary - -Migrate 7 code quality skills and author the @quality-gate-copilot agent as a single PR. 
- -## Files to create - -skills/qa-python/SKILL.md, skills/qa-java/SKILL.md, skills/qa-scala/SKILL.md, -skills/qa-typescript/SKILL.md, skills/qa-dotnet/SKILL.md, skills/qa-terraform/SKILL.md, -skills/test-coverage-gate/SKILL.md, .github/agents/quality-gate-copilot.agent.md - -## Acceptance criteria — skills - -- [ ] `qa-scala` body covers JMF filter requirement for JaCoCo -- [ ] `test-coverage-gate` distinguishes baseline measurement (no CI block) from - new-code gate (hard fail) -- [ ] Each `qa-*` description includes language-specific trigger keywords -- [ ] All folder names match `name` frontmatter exactly; all descriptions ≤ 1024 chars - -## Acceptance criteria — agent - -- [ ] Language routing table covers all 5 languages + HCL + cross-language coverage -- [ ] `run_in_terminal` present in tools (this agent executes commands) -- [ ] IaC redirect points to `cps-iac` in `cps-agentic-skills`, not this plugin -- [ ] All 7 skills referenced by path - -## Reference - -Spec → Agent Catalog → @quality-gate-copilot -``` - ---- - -## Step 4 — Test Specialist Cluster - -### Files to create - -| File | Type | -|---|---| -| `skills/test-accessibility/SKILL.md` | Skill — migrate from `.copilot/skills/` | -| `skills/test-api-standards/SKILL.md` | Skill — migrate | -| `skills/test-e2e-standards/SKILL.md` | Skill — migrate | -| `skills/test-integration-standards/SKILL.md` | Skill — migrate | -| `skills/test-ui-standards/SKILL.md` | Skill — migrate | -| `skills/test-security/SKILL.md` | Skill — migrate | -| `.github/agents/test-specialist-copilot.agent.md` | Agent — author from spec | - -### Agent outline: `@test-specialist-copilot` - -**Frontmatter:** - -```yaml ---- -description: > - Apply specialised testing for specific test types beyond standard unit tests. - Use for: accessibility (axe-core, WCAG 2.1 AA), API and Pact contract tests, - cross-service E2E, Testcontainers integration isolation, Angular/React/Cypress UI - tests, and SAST/DAST security scanning. 
Triggers: "a11y test", "Pact", - "E2E standards", "security scan", "Cypress", "Testcontainers", "accessibility", - "contract test", "test specialist copilot", "UI tests", "integration isolation". -tools: - - read_file - - replace_string_in_file - - create_file - - grep_search - - file_search - - semantic_search - - run_in_terminal ---- -``` - -**Required body sections:** - -1. **Specialisation routing table**: - - | Concern | Skill | Path | - |---|---|---| - | Accessibility / WCAG | `test-accessibility` | `skills/test-accessibility/SKILL.md` | - | REST + contract (Pact) | `test-api-standards` | `skills/test-api-standards/SKILL.md` | - | Cross-service E2E | `test-e2e-standards` | `skills/test-e2e-standards/SKILL.md` | - | Testcontainers / DB isolation | `test-integration-standards` | `skills/test-integration-standards/SKILL.md` | - | Angular / React / Cypress | `test-ui-standards` | `skills/test-ui-standards/SKILL.md` | - | SAST / DAST / dep scanning | `test-security` | `skills/test-security/SKILL.md` | - -2. **Scope** (6 bullets from spec) - -3. **Does NOT**: write standard unit tests (→ `@sdet-copilot`); run language-specific quality gates (→ `@quality-gate-copilot`); write BDD scenarios (→ `@living-doc-bdd-copilot`); improve test quality depth (→ `@test-quality-copilot`) - -4. **Skills** (6): Specialisation column + path - ---- - -### Issue 4.1 — Test Specialist cluster (skills + agent) - -**Title:** `[Step 4] Test Specialist cluster — migrate 6 skills and create @test-specialist-copilot agent` - -**Body:** - -``` -## Summary - -Migrate 6 Test Specialist skills and author the @test-specialist-copilot agent as a single PR. 
- -## Files to create - -skills/test-accessibility/SKILL.md, skills/test-api-standards/SKILL.md, -skills/test-e2e-standards/SKILL.md, skills/test-integration-standards/SKILL.md, -skills/test-ui-standards/SKILL.md, skills/test-security/SKILL.md, -.github/agents/test-specialist-copilot.agent.md - -## Acceptance criteria — skills - -- [ ] `test-accessibility` covers axe-core, jest-axe, cypress-axe, WCAG 2.1 AA -- [ ] `test-api-standards` covers Pact consumer-driven contracts -- [ ] `test-e2e-standards` is clearly distinguished from `test-ui-standards` - (cross-service boundary vs UI-only) -- [ ] `test-integration-standards` covers Testcontainers and isolation/cleanup rules -- [ ] `test-security` covers SAST (Bandit/Semgrep), DAST (ZAP), dep scanning (Snyk/pip-audit) -- [ ] All folder names match `name` frontmatter; all descriptions ≤ 1024 chars - -## Acceptance criteria — agent - -- [ ] Specialisation routing table covers all 6 concerns -- [ ] Does NOT list distinguishes from all four other test agents -- [ ] All 6 skills referenced by path - -## Reference - -Spec → Agent Catalog → @test-specialist-copilot -``` - ---- - -## Step 4b — Test Quality Cluster - -### Files to create - -| File | Type | -|---|---| -| `skills/test-mutation/SKILL.md` | Skill — migrate from `.copilot/skills/` | -| `skills/test-property-based/SKILL.md` | Skill — migrate | -| `skills/test-flakiness-diagnosis/SKILL.md` | Skill — migrate | -| `skills/test-observability/SKILL.md` | Skill — migrate | -| `.github/agents/test-quality-copilot.agent.md` | Agent — author from spec | - -### Agent outline: `@test-quality-copilot` - -**Frontmatter:** - -```yaml ---- -description: > - Improve depth and reliability of existing tests — the quality improvement layer applied - after baseline tests are in place. 
Use for: mutation score improvement (mutmut, PIT, - Stryker), property-based testing (Hypothesis, ScalaCheck, fast-check), flaky test - diagnosis and repair, and observability test assertions (logs, metrics, OTel traces). - Triggers: "mutation", "Hypothesis", "flaky test", "test logs", "test metrics", - "surviving mutants", "property-based", "test quality copilot", "improve test quality". -tools: - - read_file - - replace_string_in_file - - create_file - - grep_search - - file_search - - semantic_search - - run_in_terminal ---- -``` - -**Required body sections:** - -1. **Prerequisite note**: called after `@sdet-copilot` has baseline coverage; this is a depth-improvement layer, not a test-writing starter - -2. **Scope** (4 bullets): mutation testing; property-based testing; flaky test diagnosis and repair; observability assertions - -3. **Does NOT**: write new test suites from scratch (→ `@sdet-copilot`); enforce CI quality gates (→ `@quality-gate-copilot`); handle specialised test types (→ `@test-specialist-copilot`) - -4. **Skills** (4): `test-mutation`, `test-property-based`, `test-flakiness-diagnosis`, `test-observability` — Specialisation column + path each - -5. **Handoff out**: None — this agent is terminal. Quality improvements are applied in place; no downstream phase requires a handoff. - ---- - -### Issue 4b.1 — Test Quality cluster (skills + agent) - -**Title:** `[Step 4b] Test Quality cluster — migrate 4 skills and create @test-quality-copilot agent` - -**Body:** - -``` -## Summary - -Migrate 4 Test Quality skills and author the @test-quality-copilot agent as a single PR. -This is a depth-improvement layer; call after @sdet-copilot has established baseline coverage. 
- -## Files to create - -skills/test-mutation/SKILL.md, skills/test-property-based/SKILL.md, -skills/test-flakiness-diagnosis/SKILL.md, skills/test-observability/SKILL.md, -.github/agents/test-quality-copilot.agent.md - -## Acceptance criteria — skills - -- [ ] `test-mutation` covers mutmut (Python), PIT (Java/Scala), Stryker (JS/TS) -- [ ] `test-property-based` covers Hypothesis, ScalaCheck, fast-check -- [ ] `test-flakiness-diagnosis` covers async timing, shared state, CI environment differences -- [ ] `test-observability` covers structured log assertions, prometheus_client fake registry, - InMemorySpanExporter for OTel spans -- [ ] All folder names match `name` frontmatter; all descriptions ≤ 1024 chars - -## Acceptance criteria — agent - -- [ ] Prerequisite note present: depth-improvement layer, not starter -- [ ] Handoff out section present and states this agent is terminal (no downstream phase) -- [ ] All 4 skills referenced by path - -## Reference - -Spec → Agent Catalog → @test-quality-copilot -``` - ---- - -## Step 5 — Standalone Skills - -No agent file — these 3 skills are below the 3-skill minimum for a dedicated agent per governance rules. - -### Files to create - -| File | Type | -|---|---| -| `skills/pr-review/SKILL.md` | Skill — migrate from `.copilot/skills/` | -| `skills/contract-openapi/SKILL.md` | Skill — migrate | -| `skills/contract-schema-registry/SKILL.md` | Skill — migrate | - -> **Future note:** `contract-openapi` + `contract-schema-registry` are candidates for a -> `@devops-copilot` agent when IaC skills are consolidated here. Until then, standalone. - ---- - -### Issue 5.1 — Standalone skills - -**Title:** `[Step 5] Migrate standalone skills: pr-review, contract-openapi, contract-schema-registry` - -**Body:** - -``` -## Summary - -Migrate 3 standalone skills — final migration step. No agent file required. 
- -## Files to create - -| Destination | Source | -|---|---| -| `skills/pr-review/SKILL.md` | `.copilot/skills/pr-review/SKILL.md` | -| `skills/contract-openapi/SKILL.md` | `.copilot/skills/contract-openapi/SKILL.md` | -| `skills/contract-schema-registry/SKILL.md` | `.copilot/skills/contract-schema-registry/SKILL.md` | - -## Acceptance criteria - -- [ ] `pr-review` description states it is language-agnostic -- [ ] `contract-openapi` description explicitly distinguishes from `contract-schema-registry` - (REST/OpenAPI vs event schema registry) -- [ ] `contract-schema-registry` description explicitly distinguishes from `test-api-standards` - (schema registry vs Pact consumer-driven contracts) -- [ ] All folder names match `name` frontmatter; all descriptions ≤ 1024 chars -- [ ] No hardcoded internal paths or credentials - -## Reference - -Spec → Standalone Skills section; Governance Rules → Agent scope (Skills < 3 → standalone) -``` - ---- - -## Post-implementation checklist - -After all steps are merged: - -- [ ] Update `docs/README.md` — add rows to the Skill Guides table for each skill that has a companion guide -- [ ] Update `README.md` — add the full skill catalog and agent roster -- [ ] Verify summary totals: 40 unique skill files, 7 agent files -- [ ] Add evals for at least one skill per cluster under `skills/{name}/evals/` (see `docs/testing/skill-testing.md`) -- [ ] Run trigger accuracy test for shared skills (`living-doc-gap-finder`, `gherkin-scenario`) to verify correct agent activates for each intent -- [ ] Confirm `@living-doc-bdd-copilot` MCP Playwright tools are available in the target deployment environment -- [ ] Cross-check all agent handoff prompts match each other: - - `@living-doc-bdd-copilot (explore)` → *"Surfaces mapped. Call @living-doc-copilot to document them."* - - `@living-doc-copilot` → *"US and ACs are ready. Call @living-doc-bdd-copilot to generate scenarios."* - - `@living-doc-bdd-copilot` → *"Feature files and steps generated. 
Call @sdet-copilot for unit tests."* - - `@sdet-copilot` → *"Tests written. Run @quality-gate-copilot to enforce the gate."* - - `@quality-gate-copilot` → *"Gate green. Pipeline complete."* - - `@test-quality-copilot` → terminal (no handoff) From 8ad9e593bf5571194685a626e75b4a74ea8ec531 Mon Sep 17 00:00:00 2001 From: miroslavpojer Date: Mon, 25 May 2026 15:28:30 +0200 Subject: [PATCH 19/35] tmp --- .../token-saving/tmp/.github/dependabot.yml | 41 +++++ skills/token-saving/tmp/.yamllint.yml | 31 ++++ skills/token-saving/tmp/Makefile | 122 +++++++++++++++ skills/token-saving/tmp/pyproject.toml | 148 ++++++++++++++++++ 4 files changed, 342 insertions(+) create mode 100644 skills/token-saving/tmp/.github/dependabot.yml create mode 100644 skills/token-saving/tmp/.yamllint.yml create mode 100644 skills/token-saving/tmp/Makefile create mode 100644 skills/token-saving/tmp/pyproject.toml diff --git a/skills/token-saving/tmp/.github/dependabot.yml b/skills/token-saving/tmp/.github/dependabot.yml new file mode 100644 index 0000000..82d68f5 --- /dev/null +++ b/skills/token-saving/tmp/.github/dependabot.yml @@ -0,0 +1,41 @@ +version: 2 +updates: + - package-ecosystem: "github-actions" + directory: "/" + target-branch: "master" + schedule: + interval: "weekly" + day: "sunday" + labels: + - "infrastructure" + - "no RN" + open-pull-requests-limit: 3 + commit-message: + prefix: "chore" + include: "scope" + groups: + github-actions: + patterns: + - "*" + + - package-ecosystem: "uv" + directory: "/" + target-branch: "master" + schedule: + interval: "weekly" + day: "sunday" + cooldown: + default-days: 7 + labels: + - "infrastructure" + - "no RN" + open-pull-requests-limit: 3 + commit-message: + prefix: "chore" + include: "scope" + allow: + - dependency-type: "direct" + groups: + python-dependencies: + patterns: + - "*" diff --git a/skills/token-saving/tmp/.yamllint.yml b/skills/token-saving/tmp/.yamllint.yml new file mode 100644 index 0000000..7477a84 --- /dev/null +++ 
b/skills/token-saving/tmp/.yamllint.yml @@ -0,0 +1,31 @@ +# Yamllint Configuration + +--- +ignore-from-file: .gitignore + +extends: default + +rules: + line-length: + max: 120 + allow-non-breakable-inline-mappings: true + indentation: + spaces: 2 + indent-sequences: true + truthy: + # "on" is also required by GitHub Actions for workflow trigger keyword. + allowed-values: ["true", "false", "on"] + check-keys: false + comments: + min-spaces-from-content: 1 + comments-indentation: disable + document-start: disable + braces: + max-spaces-inside: 1 + brackets: + max-spaces-inside: 1 + octal-values: + forbid-implicit-octal: true + forbid-explicit-octal: true + key-duplicates: + forbid-duplicated-merge-keys: true diff --git a/skills/token-saving/tmp/Makefile b/skills/token-saving/tmp/Makefile new file mode 100644 index 0000000..296f2ab --- /dev/null +++ b/skills/token-saving/tmp/Makefile @@ -0,0 +1,122 @@ +# ══════════════════════════════════════════════════════════════ +# Makefile — Single command centre for all quality gates +# +# Mirrors the CI pipeline so developers get identical feedback +# locally. Run `make all` to execute every gate, or pick one. +# +# Typical developer workflow: +# make install — first-time setup (creates venv, installs all deps) +# make all — run every CI gate locally, or e.g. `make all-core` for a single project (in this case 'core') +# make fix — auto-fix Python lint, format issues and Terraform lint issues across all projects +# make lock — regenerate lockfile after editing pyproject.toml +# make clean — wipe all generated artifacts (.venv, caches, etc.) +# +# ══════════════════════════════════════════════════════════════ + +# Auto-discover Python projects: any directory that contains a pyproject.toml at depth 1 is a project. +# This avoids drift between the filesystem and a hand-maintained list. +PROJECTS := $(patsubst %/pyproject.toml,%,$(wildcard */pyproject.toml)) + +# .PHONY declares targets that don't correspond to files on disk. 
Without this, Make would skip a target +# if a file/dir with the same name exists (e.g. `core/` directory vs `unit-test-core` target). +.PHONY: all help install lock clean lint format format-fix lint-fix typecheck unit-test pip-audit lock-check tflint tflint-fix fix yamllint \ + $(foreach p,$(PROJECTS),all-$(p) clean-$(p) lint-$(p) format-$(p) format-fix-$(p) lint-fix-$(p) typecheck-$(p) unit-test-$(p) pip-audit-$(p)) + +# Default goal when running bare `make`. +.DEFAULT_GOAL := help + +help: + @echo "Targets:" + @grep -E '^[a-zA-Z_-]+.*##' $(MAKEFILE_LIST) | sed 's/:.*##/ —/' | sort + @echo "" + @echo "Discovered projects: $(PROJECTS)" + +# ────────────────────────────────────────────── +# Workspace-level targets +# ────────────────────────────────────────────── + +all: lint format typecheck unit-test pip-audit lock-check yamllint tflint ## Run every CI gate locally, for all projects + +fix: lint-fix format-fix tflint-fix ## Auto-fix Python lint, format and Terraform lint issues + +install: ## Create venv and install all deps (first-time setup) + uv sync --all-packages + +lock: ## Regenerate the workspace uv.lock after pyproject.toml changes + # Clear UV_INDEX so the lockfile is always resolved against public PyPI. + # This guarantees CI (which has no access to the corporate Artifactory) can + # always install from the committed lockfile. + UV_INDEX="" uv lock + +lock-check: ## Verify workspace uv.lock is in sync and free of corporate URLs + UV_INDEX="" uv lock --check + @if grep -q "artifacts.bcp" uv.lock; then \ + echo "::error::uv.lock contains Absa Artifactory URLs — regenerate with 'make lock'"; \ + exit 1; \ + fi + +clean: ## Remove .venv, caches, and build artifacts + rm -rf .venv .mypy_cache .ruff_cache .pytest_cache + $(foreach p,$(PROJECTS),rm -rf $(p)/.mypy_cache $(p)/.ruff_cache $(p)/.pytest_cache $(p)/*.egg-info;) + find . 
-type d -name __pycache__ -exec rm -rf {} + 2>/dev/null || true + +tflint: ## TFLint static analysis on Terraform files in iac/ directory + @test -d iac || (echo "ERROR: iac/ directory not found"; exit 1) + tflint --init + tflint --recursive --chdir=iac + +tflint-fix: ## TFLint auto-fix on Terraform files in iac/ directory + @test -d iac || (echo "ERROR: iac/ directory not found"; exit 1) + tflint --init + tflint --recursive --fix --chdir=iac + +yamllint: ## Yamllint check on all YAML files + uv run yamllint . + +# ────────────────────────────────────────────── +# Aggregate per-project targets — fan out to every project +# ────────────────────────────────────────────── + +lint: $(foreach p,$(PROJECTS),lint-$(p)) ## Ruff lint +format: $(foreach p,$(PROJECTS),format-$(p)) ## Ruff format check +format-fix: $(foreach p,$(PROJECTS),format-fix-$(p)) ## Ruff format — auto-fix in-place (not a CI gate) +lint-fix: $(foreach p,$(PROJECTS),lint-fix-$(p)) ## Ruff lint — auto-fix (not a CI gate) +typecheck: $(foreach p,$(PROJECTS),typecheck-$(p)) ## mypy strict +unit-test: $(foreach p,$(PROJECTS),unit-test-$(p)) ## Unit tests +pip-audit: $(foreach p,$(PROJECTS),pip-audit-$(p)) ## pip-audit + +# ────────────────────────────────────────────── +# Per-project targets (generated via template) +# +# `define` + `foreach` + `eval` creates identical targets for every project without copy-paste. +# ────────────────────────────────────────────── + +define PROJECT_RULES + +# Run every CI gate for this project only (e.g. make all-core). +all-$(1): lint-$(1) format-$(1) typecheck-$(1) unit-test-$(1) pip-audit-$(1) + +lint-$(1): + cd $(1) && uv run ruff check . + +format-$(1): + cd $(1) && uv run ruff format --check . + +format-fix-$(1): + cd $(1) && uv run ruff format . + +lint-fix-$(1): + cd $(1) && uv run ruff check --fix . 
+ +typecheck-$(1): + cd $(1) && uv run mypy $(1) + +unit-test-$(1): + cd $(1) && uv run pytest -m unit --cov=$(1) --cov-report=term-missing + +pip-audit-$(1): + cd $(1) && uv run pip-audit --skip-editable # Ignore local editable packages, audit only real published dependencies + +endef + +$(foreach p,$(PROJECTS),$(eval $(call PROJECT_RULES,$(p)))) diff --git a/skills/token-saving/tmp/pyproject.toml b/skills/token-saving/tmp/pyproject.toml new file mode 100644 index 0000000..ecfe88d --- /dev/null +++ b/skills/token-saving/tmp/pyproject.toml @@ -0,0 +1,148 @@ +[project] +name = "TODO" +version = "0.0.0" +description = "TODO" +requires-python = ">=3.14" + +# Workspace-wide dev tools (PEP 735) — declared once, available in the shared .venv +# for every workspace member. Install with `uv sync --all-extras --all-packages`. +[dependency-groups] +dev = [ + "mypy>=1.20", + "ruff>=0.15", + "pytest>=9.0", + "pytest-asyncio>=1.3", + "pytest-cov>=7.1", + "pip-audit>=2.10", + "yamllint>=1.38", +] + +[tool.uv] +# This is a virtual workspace root — it has no installable package of its own. +# All real packages live in the member directories below. +managed = true +default-groups = ["dev"] + +[tool.uv.workspace] +members = ["core", "business_catalog"] + +# ────────────────────────────────────────────── +# Shared Ruff configuration for all workspace members. +# Each project extends this file via `extend = "../pyproject.toml"` +# and adds only project-specific overrides (e.g. known-first-party). 
+# ────────────────────────────────────────────── + +[tool.ruff] +target-version = "py314" +line-length = 120 + +[tool.ruff.lint] +select = [ + "E", # pycodestyle errors + "W", # pycodestyle warnings + "F", # pyflakes — unused imports, undefined names + "I", # isort — import ordering + "N", # pep8-naming + "UP", # pyupgrade — modern syntax + "S", # bandit — security + "B", # bugbear — common Python gotchas + "A", # flake8-builtins — don't shadow builtins + "C4", # comprehensions — unnecessary/inefficient comprehensions + "C90", # mccabe — cyclomatic complexity + "D", # pydocstyle — docstring conventions + "ANN", # flake8-annotations — enforce type annotations + "DTZ", # flake8-datetimez — timezone-aware datetimes + "T20", # flake8-print — no print() in production (ADR §5) + "SIM", # flake8-simplify — simplifiable constructs + # "TCH" intentionally excluded — TYPE_CHECKING guards add complexity, https://pypi.org/project/flake8-type-checking/ + # with negligible benefit for service startup; + # they also conflict with runtime annotation consumers (FastAPI, Pydantic). + "ARG", # flake8-unused-arguments + "PTH", # flake8-use-pathlib — prefer pathlib + "ERA", # eradicate — commented-out code + "PIE", # flake8-pie — misc. lints + "PT", # flake8-pytest-style + "RET", # flake8-return — return/else simplifications + "RSE", # flake8-raise — clean raise statements + "FLY", # flynt — prefer f-strings + "PERF", # perflint — performance anti-patterns + "LOG", # flake8-logging — logging best practices (ADR §5) + "RUF", # ruff-specific rules +] +ignore = [ + "D100", # missing docstring in public module (too noisy early on) + "D104", # missing docstring in public package +] + +[tool.ruff.lint.pydocstyle] +convention = "google" + +[tool.ruff.lint.mccabe] +max-complexity = 10 + +[tool.ruff.lint.isort] +lines-after-imports = 2 + +[tool.ruff.lint.pep8-naming] +# Pydantic v2 validators behave like classmethods. 
+classmethod-decorators = ["pydantic.field_validator", "pydantic.model_validator"] + +# ────────────────────────────────────────────── +# Shared Mypy configuration. +# Inherited by every workspace member because no member redefines [tool.mypy]. +# `mypy_path = ["core"]` is relative to this file, so members can import `core` +# without needing per-project overrides. +# ────────────────────────────────────────────── + +[tool.mypy] +strict = true +python_version = "3.14" +plugins = ["pydantic.mypy"] +mypy_path = ["$MYPY_CONFIG_FILE_DIR/core"] +warn_unused_configs = true +warn_unreachable = true +show_error_codes = true # required so ADR's "every # type: ignore needs a code" is enforceable +enable_error_code = [ + "ignore-without-code", # ADR §1 — every `# type: ignore` must have a justification + "redundant-cast", + "truthy-bool", + "truthy-iterable", + "unused-awaitable", +] + +# ────────────────────────────────────────────── +# Shared Pytest configuration. +# All markers are declared once here; individual projects use a subset. +# ────────────────────────────────────────────── + +[tool.pytest.ini_options] +asyncio_mode = "auto" +testpaths = ["tests"] +python_files = ["test_*.py"] +python_functions = ["test_*"] +addopts = [ + "--strict-markers", # unknown markers fail the run + "--strict-config", # bad config fails the run + "-ra", # summary of all non-passing tests + "--tb=short", # concise tracebacks +] +markers = [ + "unit: Pure unit tests — no I/O, no AWS, no vendor calls", +] + +# ────────────────────────────────────────────── +# Shared Coverage configuration — workspace-wide default. +# `fail_under` here is the safety-net default; projects override it +# in their own pyproject.toml (e.g. core = 90%, services = 70%). 
+# ────────────────────────────────────────────── + +[tool.coverage.report] +fail_under = 70 +show_missing = true +exclude_lines = [ + "pragma: no cover", + "if TYPE_CHECKING:", + "@overload", + "raise NotImplementedError", + "\\.\\.\\.", # Protocol method stubs +] From 6ac9ff3f6e10877cda2b196c7ef68f5f66cec6fc Mon Sep 17 00:00:00 2001 From: miroslavpojer Date: Tue, 26 May 2026 09:26:31 +0200 Subject: [PATCH 20/35] Remove tmp data. --- .../token-saving/tmp/.github/dependabot.yml | 41 ----- skills/token-saving/tmp/.yamllint.yml | 31 ---- skills/token-saving/tmp/Makefile | 122 --------------- skills/token-saving/tmp/pyproject.toml | 148 ------------------ 4 files changed, 342 deletions(-) delete mode 100644 skills/token-saving/tmp/.github/dependabot.yml delete mode 100644 skills/token-saving/tmp/.yamllint.yml delete mode 100644 skills/token-saving/tmp/Makefile delete mode 100644 skills/token-saving/tmp/pyproject.toml diff --git a/skills/token-saving/tmp/.github/dependabot.yml b/skills/token-saving/tmp/.github/dependabot.yml deleted file mode 100644 index 82d68f5..0000000 --- a/skills/token-saving/tmp/.github/dependabot.yml +++ /dev/null @@ -1,41 +0,0 @@ -version: 2 -updates: - - package-ecosystem: "github-actions" - directory: "/" - target-branch: "master" - schedule: - interval: "weekly" - day: "sunday" - labels: - - "infrastructure" - - "no RN" - open-pull-requests-limit: 3 - commit-message: - prefix: "chore" - include: "scope" - groups: - github-actions: - patterns: - - "*" - - - package-ecosystem: "uv" - directory: "/" - target-branch: "master" - schedule: - interval: "weekly" - day: "sunday" - cooldown: - default-days: 7 - labels: - - "infrastructure" - - "no RN" - open-pull-requests-limit: 3 - commit-message: - prefix: "chore" - include: "scope" - allow: - - dependency-type: "direct" - groups: - python-dependencies: - patterns: - - "*" diff --git a/skills/token-saving/tmp/.yamllint.yml b/skills/token-saving/tmp/.yamllint.yml deleted file mode 100644 index 
7477a84..0000000 --- a/skills/token-saving/tmp/.yamllint.yml +++ /dev/null @@ -1,31 +0,0 @@ -# Yamllint Configuration - ---- -ignore-from-file: .gitignore - -extends: default - -rules: - line-length: - max: 120 - allow-non-breakable-inline-mappings: true - indentation: - spaces: 2 - indent-sequences: true - truthy: - # "on" is also required by GitHub Actions for workflow trigger keyword. - allowed-values: ["true", "false", "on"] - check-keys: false - comments: - min-spaces-from-content: 1 - comments-indentation: disable - document-start: disable - braces: - max-spaces-inside: 1 - brackets: - max-spaces-inside: 1 - octal-values: - forbid-implicit-octal: true - forbid-explicit-octal: true - key-duplicates: - forbid-duplicated-merge-keys: true diff --git a/skills/token-saving/tmp/Makefile b/skills/token-saving/tmp/Makefile deleted file mode 100644 index 296f2ab..0000000 --- a/skills/token-saving/tmp/Makefile +++ /dev/null @@ -1,122 +0,0 @@ -# ══════════════════════════════════════════════════════════════ -# Makefile — Single command centre for all quality gates -# -# Mirrors the CI pipeline so developers get identical feedback -# locally. Run `make all` to execute every gate, or pick one. -# -# Typical developer workflow: -# make install — first-time setup (creates venv, installs all deps) -# make all — run every CI gate locally, or e.g. `make all-core` for a single project (in this case 'core') -# make fix — auto-fix Python lint, format issues and Terraform lint issues across all projects -# make lock — regenerate lockfile after editing pyproject.toml -# make clean — wipe all generated artifacts (.venv, caches, etc.) -# -# ══════════════════════════════════════════════════════════════ - -# Auto-discover Python projects: any directory that contains a pyproject.toml at depth 1 is a project. -# This avoids drift between the filesystem and a hand-maintained list. 
-PROJECTS := $(patsubst %/pyproject.toml,%,$(wildcard */pyproject.toml)) - -# .PHONY declares targets that don't correspond to files on disk. Without this, Make would skip a target -# if a file/dir with the same name exists (e.g. `core/` directory vs `unit-test-core` target). -.PHONY: all help install lock clean lint format format-fix lint-fix typecheck unit-test pip-audit lock-check tflint tflint-fix fix yamllint \ - $(foreach p,$(PROJECTS),all-$(p) clean-$(p) lint-$(p) format-$(p) format-fix-$(p) lint-fix-$(p) typecheck-$(p) unit-test-$(p) pip-audit-$(p)) - -# Default goal when running bare `make`. -.DEFAULT_GOAL := help - -help: - @echo "Targets:" - @grep -E '^[a-zA-Z_-]+.*##' $(MAKEFILE_LIST) | sed 's/:.*##/ —/' | sort - @echo "" - @echo "Discovered projects: $(PROJECTS)" - -# ────────────────────────────────────────────── -# Workspace-level targets -# ────────────────────────────────────────────── - -all: lint format typecheck unit-test pip-audit lock-check yamllint tflint ## Run every CI gate locally, for all projects - -fix: lint-fix format-fix tflint-fix ## Auto-fix Python lint, format and Terraform lint issues - -install: ## Create venv and install all deps (first-time setup) - uv sync --all-packages - -lock: ## Regenerate the workspace uv.lock after pyproject.toml changes - # Clear UV_INDEX so the lockfile is always resolved against public PyPI. - # This guarantees CI (which has no access to the corporate Artifactory) can - # always install from the committed lockfile. 
- UV_INDEX="" uv lock - -lock-check: ## Verify workspace uv.lock is in sync and free of corporate URLs - UV_INDEX="" uv lock --check - @if grep -q "artifacts.bcp" uv.lock; then \ - echo "::error::uv.lock contains Absa Artifactory URLs — regenerate with 'make lock'"; \ - exit 1; \ - fi - -clean: ## Remove .venv, caches, and build artifacts - rm -rf .venv .mypy_cache .ruff_cache .pytest_cache - $(foreach p,$(PROJECTS),rm -rf $(p)/.mypy_cache $(p)/.ruff_cache $(p)/.pytest_cache $(p)/*.egg-info;) - find . -type d -name __pycache__ -exec rm -rf {} + 2>/dev/null || true - -tflint: ## TFLint static analysis on Terraform files in iac/ directory - @test -d iac || (echo "ERROR: iac/ directory not found"; exit 1) - tflint --init - tflint --recursive --chdir=iac - -tflint-fix: ## TFLint auto-fix on Terraform files in iac/ directory - @test -d iac || (echo "ERROR: iac/ directory not found"; exit 1) - tflint --init - tflint --recursive --fix --chdir=iac - -yamllint: ## Yamllint check on all YAML files - uv run yamllint . - -# ────────────────────────────────────────────── -# Aggregate per-project targets — fan out to every project -# ────────────────────────────────────────────── - -lint: $(foreach p,$(PROJECTS),lint-$(p)) ## Ruff lint -format: $(foreach p,$(PROJECTS),format-$(p)) ## Ruff format check -format-fix: $(foreach p,$(PROJECTS),format-fix-$(p)) ## Ruff format — auto-fix in-place (not a CI gate) -lint-fix: $(foreach p,$(PROJECTS),lint-fix-$(p)) ## Ruff lint — auto-fix (not a CI gate) -typecheck: $(foreach p,$(PROJECTS),typecheck-$(p)) ## mypy strict -unit-test: $(foreach p,$(PROJECTS),unit-test-$(p)) ## Unit tests -pip-audit: $(foreach p,$(PROJECTS),pip-audit-$(p)) ## pip-audit - -# ────────────────────────────────────────────── -# Per-project targets (generated via template) -# -# `define` + `foreach` + `eval` creates identical targets for every project without copy-paste. 
-# ────────────────────────────────────────────── - -define PROJECT_RULES - -# Run every CI gate for this project only (e.g. make all-core). -all-$(1): lint-$(1) format-$(1) typecheck-$(1) unit-test-$(1) pip-audit-$(1) - -lint-$(1): - cd $(1) && uv run ruff check . - -format-$(1): - cd $(1) && uv run ruff format --check . - -format-fix-$(1): - cd $(1) && uv run ruff format . - -lint-fix-$(1): - cd $(1) && uv run ruff check --fix . - -typecheck-$(1): - cd $(1) && uv run mypy $(1) - -unit-test-$(1): - cd $(1) && uv run pytest -m unit --cov=$(1) --cov-report=term-missing - -pip-audit-$(1): - cd $(1) && uv run pip-audit --skip-editable # Ignore local editable packages, audit only real published dependencies - -endef - -$(foreach p,$(PROJECTS),$(eval $(call PROJECT_RULES,$(p)))) diff --git a/skills/token-saving/tmp/pyproject.toml b/skills/token-saving/tmp/pyproject.toml deleted file mode 100644 index ecfe88d..0000000 --- a/skills/token-saving/tmp/pyproject.toml +++ /dev/null @@ -1,148 +0,0 @@ -[project] -name = "TODO" -version = "0.0.0" -description = "TODO" -requires-python = ">=3.14" - -# Workspace-wide dev tools (PEP 735) — declared once, available in the shared .venv -# for every workspace member. Install with `uv sync --all-extras --all-packages`. -[dependency-groups] -dev = [ - "mypy>=1.20", - "ruff>=0.15", - "pytest>=9.0", - "pytest-asyncio>=1.3", - "pytest-cov>=7.1", - "pip-audit>=2.10", - "yamllint>=1.38", -] - -[tool.uv] -# This is a virtual workspace root — it has no installable package of its own. -# All real packages live in the member directories below. -managed = true -default-groups = ["dev"] - -[tool.uv.workspace] -members = ["core", "business_catalog"] - -# ────────────────────────────────────────────── -# Shared Ruff configuration for all workspace members. -# Each project extends this file via `extend = "../pyproject.toml"` -# and adds only project-specific overrides (e.g. known-first-party). 
-# ────────────────────────────────────────────── - -[tool.ruff] -target-version = "py314" -line-length = 120 - -[tool.ruff.lint] -select = [ - "E", # pycodestyle errors - "W", # pycodestyle warnings - "F", # pyflakes — unused imports, undefined names - "I", # isort — import ordering - "N", # pep8-naming - "UP", # pyupgrade — modern syntax - "S", # bandit — security - "B", # bugbear — common Python gotchas - "A", # flake8-builtins — don't shadow builtins - "C4", # comprehensions — unnecessary/inefficient comprehensions - "C90", # mccabe — cyclomatic complexity - "D", # pydocstyle — docstring conventions - "ANN", # flake8-annotations — enforce type annotations - "DTZ", # flake8-datetimez — timezone-aware datetimes - "T20", # flake8-print — no print() in production (ADR §5) - "SIM", # flake8-simplify — simplifiable constructs - # "TCH" intentionally excluded — TYPE_CHECKING guards add complexity, https://pypi.org/project/flake8-type-checking/ - # with negligible benefit for service startup; - # they also conflict with runtime annotation consumers (FastAPI, Pydantic). - "ARG", # flake8-unused-arguments - "PTH", # flake8-use-pathlib — prefer pathlib - "ERA", # eradicate — commented-out code - "PIE", # flake8-pie — misc. lints - "PT", # flake8-pytest-style - "RET", # flake8-return — return/else simplifications - "RSE", # flake8-raise — clean raise statements - "FLY", # flynt — prefer f-strings - "PERF", # perflint — performance anti-patterns - "LOG", # flake8-logging — logging best practices (ADR §5) - "RUF", # ruff-specific rules -] -ignore = [ - "D100", # missing docstring in public module (too noisy early on) - "D104", # missing docstring in public package -] - -[tool.ruff.lint.pydocstyle] -convention = "google" - -[tool.ruff.lint.mccabe] -max-complexity = 10 - -[tool.ruff.lint.isort] -lines-after-imports = 2 - -[tool.ruff.lint.pep8-naming] -# Pydantic v2 validators behave like classmethods. 
-classmethod-decorators = ["pydantic.field_validator", "pydantic.model_validator"] - -# ────────────────────────────────────────────── -# Shared Mypy configuration. -# Inherited by every workspace member because no member redefines [tool.mypy]. -# `mypy_path = ["core"]` is relative to this file, so members can import `core` -# without needing per-project overrides. -# ────────────────────────────────────────────── - -[tool.mypy] -strict = true -python_version = "3.14" -plugins = ["pydantic.mypy"] -mypy_path = ["$MYPY_CONFIG_FILE_DIR/core"] -warn_unused_configs = true -warn_unreachable = true -show_error_codes = true # required so ADR's "every # type: ignore needs a code" is enforceable -enable_error_code = [ - "ignore-without-code", # ADR §1 — every `# type: ignore` must have a justification - "redundant-cast", - "truthy-bool", - "truthy-iterable", - "unused-awaitable", -] - -# ────────────────────────────────────────────── -# Shared Pytest configuration. -# All markers are declared once here; individual projects use a subset. -# ────────────────────────────────────────────── - -[tool.pytest.ini_options] -asyncio_mode = "auto" -testpaths = ["tests"] -python_files = ["test_*.py"] -python_functions = ["test_*"] -addopts = [ - "--strict-markers", # unknown markers fail the run - "--strict-config", # bad config fails the run - "-ra", # summary of all non-passing tests - "--tb=short", # concise tracebacks -] -markers = [ - "unit: Pure unit tests — no I/O, no AWS, no vendor calls", -] - -# ────────────────────────────────────────────── -# Shared Coverage configuration — workspace-wide default. -# `fail_under` here is the safety-net default; projects override it -# in their own pyproject.toml (e.g. core = 90%, services = 70%). 
-# ────────────────────────────────────────────── - -[tool.coverage.report] -fail_under = 70 -show_missing = true -exclude_lines = [ - "pragma: no cover", - "if TYPE_CHECKING:", - "@overload", - "raise NotImplementedError", - "\\.\\.\\.", # Protocol method stubs -] From 949234b7c90f53480fd2e0f0f4864ecd7c25e730 Mon Sep 17 00:00:00 2001 From: miroslavpojer Date: Wed, 27 May 2026 12:42:38 +0200 Subject: [PATCH 21/35] Updated form Unify project integration. --- .../agents/living-doc-bdd-copilot.agent.md | 95 ++++++-- .github/agents/living-doc-copilot.agent.md | 16 +- skills/gherkin-living-doc-sync/SKILL.md | 115 ++++++---- .../scripts/scan_ac_links.py | 165 ++++++++++---- skills/gherkin-scenario/SKILL.md | 69 ++++-- skills/living-doc-create-feature/SKILL.md | 4 +- .../scripts/next_id.py | 2 +- .../living-doc-create-functionality/SKILL.md | 12 +- .../scripts/next_id.py | 2 +- skills/living-doc-create-user-story/SKILL.md | 9 +- .../scripts/next_id.py | 2 +- skills/living-doc-gap-finder/SKILL.md | 79 +++---- .../living-doc-gap-finder/scripts/.DS_Store | Bin 0 -> 6148 bytes skills/living-doc-impact-analysis/SKILL.md | 38 ++-- .../scripts/.DS_Store | Bin 0 -> 6148 bytes skills/living-doc-pageobject-scan/SKILL.md | 133 ++++++++--- skills/living-doc-scenario-creator/SKILL.md | 129 +++++++++-- .../scripts/coverage_report.py | 42 ++-- skills/living-doc-update/SKILL.md | 25 +- skills/references/living-doc-glossary.md | 214 +++++++++++++++--- 20 files changed, 813 insertions(+), 338 deletions(-) create mode 100644 skills/living-doc-gap-finder/scripts/.DS_Store create mode 100644 skills/living-doc-impact-analysis/scripts/.DS_Store diff --git a/.github/agents/living-doc-bdd-copilot.agent.md b/.github/agents/living-doc-bdd-copilot.agent.md index 9898a45..4a38b89 100644 --- a/.github/agents/living-doc-bdd-copilot.agent.md +++ b/.github/agents/living-doc-bdd-copilot.agent.md @@ -40,7 +40,7 @@ Sources A–E — collect from whichever are available: | Source | Behaviour | |---|---| -| **A — 
Living doc catalog** | Extract Feature names, US titles, and AC texts. Map each Feature to its primary URL/route if known. | +| **A — Living documentation** | Extract Feature names, US titles, and AC texts. Map each Feature to its primary URL/route if known. | | **B — Sitemap or route config** | Parse route definitions (Angular router, React Router, `sitemap.xml`) to enumerate URL paths. | | **C — OpenAPI / Swagger spec** | Extract endpoint paths; map REST resources to UI screens where obvious. | | **D — Existing PageObjects** | Load current `.copilot/bdd/manifest.json` if present — treat known surfaces as already discovered. | @@ -59,7 +59,7 @@ credentials: 1. Search for `seed.yaml` containing a `base_url:` key. 2. Search for `manifest.json` containing an array with `pageobject_path` entries. 3. If found, load both files and record their paths for this session. -4. If NOT found, create them at a sensible location (e.g. alongside the existing living doc catalog directory if one exists, otherwise `.copilot/bdd/`). +4. If NOT found, create them at a sensible location (e.g. alongside the existing living documentation directory if one exists, otherwise `.copilot/bdd/`). 5. **On first discovery:** propose adding their locations to `.github/copilot-instructions.md` so every future agent session can load them without searching: ```markdown @@ -102,19 +102,36 @@ guided_steps: [] # populated during Source E traversal 5. Repeat until coverage plateau — no new surfaces found in the last full iteration. 6. Report any unreachable areas — auth walls, dead links, CAPTCHA gates, or forms that cannot be progressed due to missing business knowledge (unknown valid input values, business-specific field formats, required lookup codes, conditional field logic). Offer to enrich `seed.yaml` with missing routes, credentials, or form values, then loop. 
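The "repeat until coverage plateau" step can be read as a fixed-point loop: keep exploring newly found surfaces until a round adds nothing. The sketch below is only an illustration of that stopping rule. `explore_until_plateau`, the `discover` callable and `max_rounds` are hypothetical names, not part of the agent or of MCP Playwright.

```python
from typing import Callable, Iterable, Set


def explore_until_plateau(
    seed_routes: Iterable[str],
    discover: Callable[[str], Iterable[str]],
    max_rounds: int = 20,
) -> Set[str]:
    """Visit routes round by round and stop once a round finds no new surface.

    `discover` is a hypothetical callback that returns the routes reachable
    from one route (for example, links collected by a crawler).
    """
    known = set(seed_routes)
    frontier = set(known)
    for _ in range(max_rounds):  # safety cap, an assumption of this sketch
        found: Set[str] = set()
        for route in frontier:
            found.update(discover(route))
        new_routes = found - known
        if not new_routes:  # plateau: the last round discovered nothing new
            break
        known |= new_routes
        frontier = new_routes
    return known
```

Routes that `discover` can never reach (auth walls, dead links) simply never enter `known`, which is why the unreachable areas still need to be reported separately.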
-**PageObject generation rule:** For every new or changed UI surface, load `living-doc-pageobject-scan` — `Create` mode for first-time generation and `Maintain` mode for selector drift. Generated PageObjects must use a file-level `living-doc: FEAT- | /route` header comment, prefer `data-testid` selectors, keep selector constants in `ALL_CAPS`, accept `page` in `__init__` / `constructor`, and expose method stubs for each interactive element. Flag any positional CSS selector as `FRAGILE`. If no matching Feature exists in the catalog, hand the surface to `@living-doc-copilot`; do not create catalog entities here. +**PageObject generation rule:** For every new or changed UI surface, load `living-doc-pageobject-scan` — `Create` mode for first-time generation and `Maintain` mode for selector drift. Generated PageObjects must use a file-level `living-doc: FEAT- | /route` header comment, prefer `data-testid` selectors, keep selector constants in `ALL_CAPS`, accept `page` in `__init__` / `constructor`, and expose method stubs for each interactive element. Flag any positional CSS selector as `FRAGILE`. If no matching Feature exists in the living documentation, hand the surface to `@living-doc-copilot`; do not create entities here. **Output artifact:** `.copilot/bdd/manifest.json` +The manifest records per-route exploration state. 
Schema matches the `living-doc-pageobject-scan` skill definition: + ```json -[ - { - "feature": "Authentication", - "url": "/login", - "component_ids": ["login-form", "username-input", "password-input", "submit-btn"], - "pageobject_path": "tests/pageobjects/LoginPage.ts" +{ + "version": "1.0", + "routes": { + "/login": { + "pageobject_path": "aul-ui/playwright/pages/LoginPage.ts", + "feature_id": "FEAT-001", + "last_scanned": "2026-05-26T10:30:00Z", + "elements": [ + { "data_cy": "username-input", "tag": "input" }, + { "data_cy": "password-input", "tag": "input" }, + { "data_cy": "login-btn", "tag": "cps-button" } + ], + "coverage_gaps": [], + "navigation_context": { + "prerequisites": null, + "navigation_steps": "Navigate directly to /login.", + "data_requirements": null, + "auth_role": "unauthenticated", + "notes": null + } + } } -] +} ``` --- @@ -150,8 +167,8 @@ guided_steps: After exploration completes (manifest is up to date): 1. Use the `living-doc-gap-finder` skill (bottom-up mode) to identify User Stories with `ACTIVE` ACs that have no linked Gherkin scenario. -2. For each gap: load the `living-doc-scenario-creator` skill and generate Gherkin scenario skeletons — one scenario per `Active` or `Implemented` AC, with the mandatory `# AC:` traceability tag. Skip `Planned` and `Deprecated` ACs. -3. Write `.feature` files under the project's feature directory using `-.feature` naming, e.g. `us-007-place-an-online-order.feature`. +2. For each gap: load the `living-doc-scenario-creator` skill and generate Gherkin scenario skeletons — one scenario per `Active` or `Implemented` AC, with the mandatory `@AC:` traceability tag. Skip `Planned` and `Deprecated` ACs. +3. Write `.feature` files under `features/us/` using `us--.feature` naming, e.g. `features/us/us-007-place-an-online-order.feature`. 4. The `Feature:` header must restate the User Story narrative in `As a / I can / so that` form. 5. 
Scenario step text must stay in business/domain language only — never mention selectors, HTTP calls, DOM details, or database operations. 6. For each generated scenario, resolve step definitions: @@ -162,7 +179,7 @@ After exploration completes (manifest is up to date): e. If neither the step nor the PageObject method exists, generate a stub that raises `NotImplementedError` (or the language-equivalent pending marker) and explicitly flag that the PageObject must be extended with the missing interaction. 7. Update `manifest.json` to record any new PageObject paths created. -**Gap detection logic:** An AC is considered uncovered if no scenario in any `.feature` file carries the AC's traceability tag (`# AC: `). +**Gap detection logic:** An AC is considered uncovered if no scenario in any `.feature` file carries the `@AC:` traceability tag. --- @@ -205,11 +222,11 @@ After exploration completes (manifest is up to date): **Scope:** Only files linked to the removed entity — do not touch other Features, PageObjects, or step definitions. 1. Identify the specific Feature/US/AC being removed. -2. Find all `.feature` files whose scenarios carry a `# AC:` tag matching the removed entity's IDs. +2. Find all `.feature` files whose scenarios carry an `@AC:` tag matching the removed entity's IDs. 3. Find PageObjects referenced only by those scenarios; find step definitions used only by those scenarios. 4. Confirm the full deletion list with the user before touching any file. 5. Remove confirmed files; update `manifest.json` to remove the deprecated entry. -6. Flag linked US/AC entities in the living doc catalog as candidates for deprecation — hand off to `@living-doc-copilot`. +6. Flag linked US/AC entities in the living documentation as candidates for deprecation — hand off to `@living-doc-copilot`. 
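As a rough sketch of step 2 in the removal flow, the helper below lists the scenarios in one feature file whose tag block names a removed AC ID. The function name and the idea of passing the file text as a string are assumptions made for illustration. The tag pattern follows the `@AC:` examples in this document.

```python
import re
from typing import List, Set, Tuple

# Assumed tag shape, taken from the examples in this document: @AC:US-1-01[/param:value...]
AC_TAG = re.compile(r"@AC:((?:US|FEAT|FUNC)-\d+-\d{2})", re.IGNORECASE)
SCENARIO = re.compile(r"^\s*Scenario(?: Outline)?:\s*(.+)")


def scenarios_linked_to(feature_text: str, removed_ids: Set[str]) -> List[Tuple[int, str]]:
    """Return (line number, title) for scenarios tagged with any removed AC ID."""
    wanted = {ac_id.upper() for ac_id in removed_ids}
    hits: List[Tuple[int, str]] = []
    pending: Set[str] = set()  # AC IDs seen on tag lines since the last scenario
    for lineno, line in enumerate(feature_text.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("@"):
            pending.update(ac_id.upper() for ac_id in AC_TAG.findall(stripped))
            continue
        match = SCENARIO.match(line)
        if match and pending & wanted:
            hits.append((lineno, match.group(1).strip()))
        if stripped and not stripped.startswith("#"):
            pending = set()  # any non-comment line ends the current tag block
    return hits
```

The result is only a candidate list. The deletion list still has to be confirmed with the user before any file is touched.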
--- @@ -218,7 +235,7 @@ After exploration completes (manifest is up to date): - Load Business Seed (`seed.yaml`) and Exploration Manifest (`manifest.json`) before crawling - Crawl web app via MCP Playwright using manifest-guided navigation - Fill forms and traverse wizards using business-supplied test values from `seed.yaml` -- Identify Features from discovered UI surfaces and map them to the living doc catalog +- Identify Features from discovered UI surfaces and map them to the living documentation - Detect scenario gaps — existing Gherkin scenarios vs User Story ACs - Generate Gherkin scenarios from User Story ACs - Write and extend step definitions @@ -230,10 +247,10 @@ After exploration completes (manifest is up to date): ## Does NOT -- Create living doc catalog entities (User Stories, Features, Functionalities) → `@living-doc-copilot` -- Write unit or integration tests → `@sdet-copilot` -- Run language-specific quality gates → `@quality-gate-copilot` -- Heal the catalog layer (AC states, traceability links, entity deprecation) → `@living-doc-copilot` +- Create living documentation entities (User Stories, Features, Functionalities): `@living-doc-copilot` +- Write unit or integration tests: `@sdet-copilot` +- Run language-specific quality gates: `@quality-gate-copilot` +- Heal the catalog layer (AC states, traceability links, entity deprecation): `@living-doc-copilot` --- @@ -273,14 +290,44 @@ State values: `Planned | Implemented | Active | Deprecated` ### Gherkin traceability tag -Every `Scenario:` or `Scenario Outline:` must carry a link comment on the line immediately above it: +Every `Scenario:` or `Scenario Outline:` in a **living-doc feature file** (`features/us/` and +`features/functionalities/`) must carry two complementary annotations: + +1. A `# AC:` comment — human-readable context (ID, version, state, description, optional aspect). +2. An `@AC:` Cucumber tag — machine-readable link: `@AC:[/param:value...]`. 
```gherkin -# AC: US-001-01 (v1.0.0 – Active) — Customer places an order with a saved payment method +# AC:US-1-01 (v1.0.0 - Active) — customer places an order with a saved payment method +@AC:US-1-01 Scenario: Customer successfully places an order ``` -The AC link is the single source of traceability between a scenario and the living doc catalog. Never delete or rewrite the `# AC:` comment without updating the catalog entity. +When the scenario covers only **one aspect** of a multi-aspect AC, encode it as a `/param:value` +segment on the tag and mirror it in the comment: + +```gherkin +# AC:US-1-01 (v1.0.0 - Active) — displays {required field} on login screen | aspect: username input +@AC:US-1-01/aspect:username-input +Scenario: Login form shows the username input field +``` + +Multiple ACs — one comment + tag pair per AC: + +```gherkin +# AC:US-1-01 (v1.0.0 - Active) — invalid credentials show an error message +# AC:US-1-02 (v1.0.0 - Active) — account lockout after 3 failed attempts +@AC:US-1-01 +@AC:US-1-02 +@Regression +Scenario: User is locked out after repeated failed logins +``` + +The `/param:value` format is extensible — additional params can be added as needed. +The `@AC:` tag is the single source of machine traceability. Never delete or rename an `@AC:` tag +without updating the corresponding entity. + +Feature files outside `features/us/` and `features/functionalities/` (smoke tests, regression +suites, exploratory probes) do not require these annotations. ### Feature surface types @@ -314,7 +361,7 @@ This agent generates PageObjects only for `UI` Features. 
API Feature coverage be | `living-doc-gap-finder` | Find ACs with no linked Gherkin scenario (bottom-up usage) | `skills/living-doc-gap-finder/SKILL.md` | | `gherkin-scenario` | Write BDD Gherkin scenarios in plain business language | `skills/gherkin-scenario/SKILL.md` | | `gherkin-step` | Implement Gherkin step definitions — clean, reusable, maintainable | `skills/gherkin-step/SKILL.md` | -| `gherkin-living-doc-sync` | Synchronise feature files and scenarios with the living doc catalog | `skills/gherkin-living-doc-sync/SKILL.md` | +| `gherkin-living-doc-sync` | Synchronise feature files and scenarios with the living documentation | `skills/gherkin-living-doc-sync/SKILL.md` | --- diff --git a/.github/agents/living-doc-copilot.agent.md b/.github/agents/living-doc-copilot.agent.md index 3566d21..935fff5 100644 --- a/.github/agents/living-doc-copilot.agent.md +++ b/.github/agents/living-doc-copilot.agent.md @@ -22,7 +22,7 @@ Requirements layer agent. Owns the living documentation catalog — creates, upd ## Initialisation -When the user is starting the living doc catalog or explicitly asks to define storage setup, ask: +When the user is starting the living documentation or explicitly asks to define storage setup, ask: > "Which storage format does your living doc use? Describe the entity structure, field names, and where entities are stored (e.g. YAML files in `docs/living-doc/`, ADO work items, Confluence pages)." @@ -46,10 +46,10 @@ Never invent a format. 
If the answer is incomplete, ask one targeted follow-up b ## Does NOT -- Write Gherkin scenarios or feature files → hand off to `@living-doc-bdd-copilot` -- Explore or crawl web apps → hand off to `@living-doc-bdd-copilot` -- Write any test code → hand off to `@sdet-copilot` -- Repair PageObject selectors or step definitions → hand off to `@living-doc-bdd-copilot` +- Write Gherkin scenarios or feature files: hand off to `@living-doc-bdd-copilot` +- Explore or crawl web apps: hand off to `@living-doc-bdd-copilot` +- Write any test code: hand off to `@sdet-copilot` +- Repair PageObject selectors or step definitions: hand off to `@living-doc-bdd-copilot` ## AC Metadata @@ -70,7 +70,7 @@ Every AC must carry these fields: - Fix broken traceability links: US ↔ Feature ↔ Functionality - Update `version` fields where incremented - Remove `pre-conditions` that reference deleted flows -- Does NOT repair PageObject selectors or step definition bindings → `@living-doc-bdd-copilot` +- Does NOT repair PageObject selectors or step definition bindings: `@living-doc-bdd-copilot` **PLAN** — triggered by PO descriptions without existing code: - Draft ACs from plain-language descriptions @@ -99,13 +99,13 @@ Do not cross this boundary. ## Operating rules - Confirm and cache the Storage Profile before the first persisted create or update only when the session is establishing storage setup; once confirmed, write every entity in that format, reuse it for later requests in the same session, and never invent missing field names. -- Route by request type: User Story or business journey → `living-doc-create-user-story`; atomic business rule or component behaviour → `living-doc-create-functionality`; impact or change trace → `living-doc-impact-analysis`; update or deprecate an existing entity or AC → `living-doc-update`; catalog drift or stale coverage → `living-doc-gap-finder`. 
+- Route by request type: User Story or business journey, use `living-doc-create-user-story`; atomic business rule or component behaviour, use `living-doc-create-functionality`; impact or change trace, use `living-doc-impact-analysis`; update or deprecate an existing entity or AC, use `living-doc-update`; catalog drift or stale coverage, use `living-doc-gap-finder`. - If a User Story request includes capability and ACs but omits actor or business value, draft the most likely `As a / I can / so that` narrative from the business context and ask for confirmation only when the role or value is genuinely ambiguous. - Use atomic ACs only: one triggering condition plus one observable outcome per AC. Every AC must include `id`, `state`, `version`, `pre-conditions`, and `not_in_scope`. Unless the confirmed Storage Profile already defines a different convention, use `AC:-` and keep AC IDs stable across updates. - PLAN mode: draft ACs first, cover happy path, error path, boundary conditions, and threshold or conversion rules where relevant, then create only after confirmation and only in `PLANNED` state. - HEALING mode: verify deleted or superseded code via repository search or explicit user confirmation before deprecating; then set stale ACs or entities to `DEPRECATED`, repair traceability links, remove or flag stale `pre-conditions`, and leave PageObjects, step definitions, and Gherkin sync to `@living-doc-bdd-copilot`. - Impact analysis: produce an explicit impact map covering affected and unaffected Features, Functionalities, User Stories, ACs, and linked scenarios; recommend version bumps on changed entities and deprecation for removed behaviours, but do not change state without user confirmation. -- Updating an `ACTIVE` AC: show OLD vs NEW side by side before writing, keep the AC ID unchanged, and bump the semantic version for business-rule changes (for example `v1.0.0` → `v1.1.0` for a threshold change). 
Flag linked `# AC: ...` Gherkin or scenario text as potentially stale for `@living-doc-bdd-copilot`. +- Updating an `ACTIVE` AC: show OLD vs NEW side by side before writing, keep the AC ID unchanged, and bump the semantic version for business-rule changes (for example `v1.0.0` to `v1.1.0` for a threshold change). Flag any linked `@AC:` tag annotations in feature files as potentially stale for `@living-doc-bdd-copilot`. - For Functionality requests, use a verb-phrase name, draft ACs and present them for confirmation before creating, and run a completeness checklist for thresholds, below/exactly/above-boundary behaviour, invalid or missing input, and interactions with other rules. ## Handoff diff --git a/skills/gherkin-living-doc-sync/SKILL.md b/skills/gherkin-living-doc-sync/SKILL.md index 9b84402..abd739b 100644 --- a/skills/gherkin-living-doc-sync/SKILL.md +++ b/skills/gherkin-living-doc-sync/SKILL.md @@ -1,10 +1,10 @@ --- name: gherkin-living-doc-sync description: > - Synchronise Gherkin feature files and BDD scenarios with the living documentation catalog. + Synchronise Gherkin feature files and BDD scenarios with the living documentation. Activate when scenarios diverge from User Story ACs, when step text drifts after a UI - refactor, when AC link headers or # AC: comment annotations are missing or stale, or when - propagating AC changes from the living doc back to feature files. Distinct from gap-finder + refactor, when `@AC:` tag annotations are missing or stale, or when propagating AC + changes from the living documentation back to feature files. Distinct from gap-finder (which detects missing coverage) — corrects existing links. 
Triggers on: "sync gherkin to living doc", "feature file out of sync", "scenario not linked to AC", "step text changed", "gherkin drift", "update living doc after BDD change", @@ -19,56 +19,71 @@ description: > > **Glossary:** Feature, Functionality, User Story, AC — see [living-doc-glossary](../references/living-doc-glossary.md) ([remote](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md)). -Sync runs in three directions: (1) feature file → living doc, (2) living doc AC → feature file, -(3) step text → PageObject method signature. +Sync runs in three directions: (1) feature file to living doc, (2) living doc AC to feature file, +(3) step text to PageObject method signature. -Use `scripts/scan_ac_links.py` to detect missing or malformed `# AC:` headers before a full -sync run. +Use `scripts/scan_ac_links.py` to detect missing or malformed `@AC:` tags and missing `# AC:` +comments before a full sync run. The script only checks living-doc feature files (`features/us/` +and `features/functionalities/`) — other feature files are skipped. 
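To make the tag grammar concrete, here is a minimal sketch that splits one `@AC:` tag into its AC ID and its optional `/param:value` segments. `parse_ac_tag` is a hypothetical helper written for illustration. It is not a function exported by `scan_ac_links.py`, and the pattern is an assumption modelled on the tag examples in this document.

```python
import re
from typing import Dict, Optional, Tuple

# Assumed grammar: @AC:<parent>-<nn> followed by zero or more /param:value segments
AC_TAG = re.compile(
    r"@AC:((?:US|FEAT|FUNC)-\d+-\d{2})((?:/[a-z][\w-]*:[^\s/@]+)*)",
    re.IGNORECASE,
)


def parse_ac_tag(tag: str) -> Optional[Tuple[str, Dict[str, str]]]:
    """Return (AC ID, params) for a well-formed @AC: tag, or None otherwise."""
    match = AC_TAG.fullmatch(tag.strip())
    if match is None:
        return None
    params: Dict[str, str] = {}
    for segment in match.group(2).split("/"):
        if segment:  # the first split element is empty because group 2 starts with "/"
            key, _, value = segment.partition(":")
            params[key.lower()] = value
    return match.group(1).upper(), params
```

Using `fullmatch` rather than `match` is a deliberate choice in this sketch, so that trailing junk after a valid prefix is treated as malformed.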
--- ## Step 1 — Detect the sync direction +**Upstream dependencies:** Directions that flow from the living documentation into feature files are initiated by catalog-layer operations from `@living-doc-copilot`: +- `living-doc-update` modified, added, or deprecated an AC → triggers rows 2 and 4 of the table below +- `living-doc-impact-analysis` identified High-impact AC changes that require resync → may trigger rows 2 and 3 of the table below + | Change event | Sync direction | Action | |---|---|---| -| New `.feature` file added | Feature file → living doc | Link each scenario to an AC; create AC if missing | -| User Story AC modified or added | Living doc → feature file | Update or add the corresponding scenario | -| UI refactored (selector / method renamed) | Step text → PageObject | Update step text; re-link to PageObject method | -| US deprecated | Living doc → feature file | Emit one sync action per linked scenario; add `@deprecated`, record the reason, and flag `@review-needed` | -| Scenario added without an AC comment | Feature file → living doc | Propose an AC and add the `# AC:` header | +| New `.feature` file added | Feature file to living doc | Link each scenario to an AC; create AC if missing | +| User Story AC modified or added | Living doc to feature file | Update or add the corresponding scenario | +| UI refactored (selector / method renamed) | Step text to PageObject | Update step text; re-link to PageObject method | +| US deprecated | Living doc to feature file | Emit one sync action per linked scenario; add `@deprecated`, record the reason, and flag `@review-needed` | +| Scenario added without an `@AC:` tag | Feature file to living doc | Propose an AC and add the `@AC:` tag | --- -## Step 2 — Audit AC link headers +## Step 2 — Audit `@AC:` traceability tags -**Required AC link format** (from the glossary): +**Required traceability format** for living-doc feature files (from the glossary): ```gherkin -# AC: US-001-01 (v1.0.0 – Active) — Customer places an order +#
AC:US-1-01 (v1.0.0 - Active) — customer places an order with a saved payment method +@AC:US-1-01 Scenario: Customer successfully places an order ``` -- AC ID format: `AC:-` — e.g. `AC:US-001-01`, `AC:FUNC-001-02` -- The `# AC:` comment(s) must appear on the lines immediately above `Scenario:` or `Scenario Outline:`. Multiple `# AC:` lines are allowed — a scenario may cover more than one AC, and annotation comments (e.g. `# @tag`, free-text notes) may also appear in the block. +With aspect param — when the scenario covers only one aspect of a multi-aspect AC: + +```gherkin +# AC:US-1-01 (v1.0.0 - Active) — displays {required field} on login screen | aspect: username input +@AC:US-1-01/aspect:username-input +Scenario: Login form shows the username input field +``` + +- `# AC:` comment: human-readable context — ID, version, state, description, optional aspect. +- `@AC:` Cucumber tag: `@AC:[/param:value...]` — machine-readable link. The `/param:value` format is extensible. +- The `@AC:` tag(s) must appear on the lines immediately above `Scenario:` or `Scenario Outline:`. Additional tags (e.g. `@Regression`, `@skip`) may appear in the same block. +- Full AC details (version, state, description) live in the file's `# Acceptance Criteria:` header block. **Audit checklist:** -1. Does every `Scenario:` / `Scenario Outline:` have a `# AC:` comment? -2. Does the referenced AC ID exist in the living doc catalog? -3. Does the AC state match (`Active` or `Implemented` — not `Deprecated` or `Planned`)? -4. Does the AC description match the scenario intent? +1. Does every `Scenario:` / `Scenario Outline:` in living-doc files have at least one `@AC:` tag? +2. Is the corresponding `# AC:` comment present and matching the tag's AC ID? +3. Does the referenced AC ID exist in the living documentation? +4. Does the AC state match (`Active` or `Implemented` — not `Deprecated` or `Planned`)? +5. Does the AC description (in the file header) match the scenario intent? 
-For each missing or mismatched link: +For each missing or mismatched tag: ``` SYNC ACTION: checkout.feature:14 Scenario: "Customer successfully places an order" - → Missing AC link header - → Proposed link: # AC: US-001-01 (v1.0.0 – Active) — Customer places an order - → Confirm or select a different AC + Missing @AC: tag + Proposed tag: @AC:US-001-01 + Confirm or select a different AC ``` ---- - ## Step 3 — Detect step text drift When step text changes after a UI refactor, the step definition binding breaks: @@ -76,10 +91,10 @@ When step text changes after a UI refactor, the step definition binding breaks: ``` DRIFT DETECTED: checkout.feature:17 Step: "When the customer clicks the Confirm Purchase button" - → No matching step definition found - → Previous match: "When the customer confirms the order" (checkout_steps.py:34) - → PageObject method: CheckoutPage.confirm_order() - → Suggested fix: update step text to "When the customer confirms the order" + No matching step definition found + Previous match: "When the customer confirms the order" (checkout_steps.py:34) + PageObject method: CheckoutPage.confirm_order() + Suggested fix: update step text to "When the customer confirms the order" OR update the step definition regex to match the new wording ``` @@ -89,13 +104,13 @@ DRIFT DETECTED: checkout.feature:17 Apply the minimum necessary change per action: -- **Add missing AC link** → insert `# AC: (v) — ` above `Scenario:` -- **Update stale AC description** → update comment text only; do not change the AC ID. Show the exact change as `OLD:` and `NEW:` lines. If the revised AC intent changed materially, flag the linked step text for review instead of restructuring the scenario in the same sync action. 
-- **Update scenario to match revised AC** → update step text; keep the `# AC:` link unchanged -- **Fix broken step text** → prefer updating the `.feature` file to match the existing step definition and PageObject method; only update the step definition regex when the business wording genuinely changed -- **Mark deprecated scenarios** → add `@deprecated` and `@review-needed`, plus a comment with the date and reason. Emit one action per affected scenario with file and line number. -- **Broken AC reference** → never silently remove the `# AC:` comment. Either relink it to the correct AC ID, or create the missing living doc entity with `living-doc-create-user-story` / `living-doc-create-functionality`, then update the link. -- **AC split into multiple ACs** → update the existing scenario's `# AC:` link to the primary AC; create new scenarios for additional ACs +- **Add missing `@AC:` tag**: insert `@AC:` above `Scenario:` +- **Update stale AC reference**: update the file header's `# Acceptance Criteria:` block entry; the `@AC:` tag on the scenario stays unchanged. Show the exact change as `OLD:` and `NEW:` lines. If the revised AC intent changed materially, flag the linked step text for review instead of restructuring the scenario in the same sync action. +- **Update scenario to match revised AC**: update step text; keep the `@AC:` tag unchanged +- **Fix broken step text**: prefer updating the `.feature` file to match the existing step definition and PageObject method; only update the step definition regex when the business wording genuinely changed +- **Mark deprecated scenarios**: add `@deprecated` and `@review-needed`, plus a comment with the date and reason. Emit one action per affected scenario with file and line number. +- **Broken AC reference**: never silently remove the `@AC:` tag. Either relink it to the correct AC ID, or create the missing living doc entity with `living-doc-create-user-story` / `living-doc-create-functionality`, then update the tag. 
+- **AC split into multiple ACs**: update the existing scenario's `@AC:` tag to the primary AC; create new scenarios for additional ACs Never delete a scenario during sync — flag it with `@review-needed` for developer decision. @@ -108,24 +123,24 @@ Do **not** apply sync changes automatically. Report `DRIFT DETECTED` blocks firs ``` DRIFT DETECTED: checkout.feature:17 Step: "When the customer clicks the Confirm Purchase button" - → No matching step definition found - → Previous match: "When the customer confirms the order" (checkout_steps.py:34) - → PageObject method: CheckoutPage.confirm_order() - → Recommended fix: update the feature file step text to match the existing step definition + No matching step definition found + Previous match: "When the customer confirms the order" (checkout_steps.py:34) + PageObject method: CheckoutPage.confirm_order() + Recommended fix: update the feature file step text to match the existing step definition OR update the step definition regex to match the new wording - → Apply change? (y/n) + Apply change? (y/n) SYNC ACTION: checkout.feature:14 Scenario: "Customer successfully places an order" - → Missing AC link header - → Proposed link: # AC: US-001-01 (v1.0.0 – Active) — Customer places an order - → Apply change? (y/n) + Missing @AC: tag + Proposed tag: @AC:US-001-01 + Apply change? (y/n) SYNC ACTION: checkout.feature:32 Scenario: "Customer reviews order totals before payment" - → Missing AC link header - → Proposed link: # AC: US-001-02 (v1.0.0 – Active) — Customer reviews the order summary before confirming payment - → Apply change? (y/n) + Missing @AC: tag + Proposed tag: @AC:US-001-02 + Apply change? (y/n) Summary: 2 missing AC links, 1 step text drift detected — apply changes? (y/n per action) ``` @@ -136,7 +151,7 @@ Summary: 2 missing AC links, 1 step text drift detected — apply changes? 
(y/n | Anti-pattern | Flag | |---|---| -| Scenario with no AC link | Missing traceability — add link or create AC | +| Scenario with no `@AC:` tag | Missing traceability — add tag or create AC | | Two scenarios linked to the same AC | Usually a duplicate — review | | AC linked from a scenario in a different User Story's feature file | Passive cross-US coverage — permitted but note it in the sync report. Only flag if the scenario's primary intent belongs to a different User Story (misplaced scenario) | | Step text describes implementation (selector, endpoint) | Gherkin business-language violation — refer to `gherkin-scenario` | diff --git a/skills/gherkin-living-doc-sync/scripts/scan_ac_links.py b/skills/gherkin-living-doc-sync/scripts/scan_ac_links.py index 25a5c6f..cfd1f98 100644 --- a/skills/gherkin-living-doc-sync/scripts/scan_ac_links.py +++ b/skills/gherkin-living-doc-sync/scripts/scan_ac_links.py @@ -1,17 +1,21 @@ #!/usr/bin/env python3 """ -scan_ac_links.py — scan .feature files for missing or malformed AC link headers. +scan_ac_links.py — scan living-doc .feature files for missing or malformed @AC: traceability. Usage: python scan_ac_links.py -For every Scenario: / Scenario Outline: line found in .feature files, checks that: - - A '# AC: ' comment appears on the line immediately above it +Only scans files under 'features/us/' and 'features/functionalities/' — living-doc paths. +Other feature files (smoke tests, regression suites, exploratory probes) are skipped. + +For every Scenario: / Scenario Outline: line found in living-doc files, checks that: + - At least one '@AC:' Cucumber tag appears on a tag line immediately above it (error) + - A matching '# AC:' human-readable comment is also present (warning) - The AC ID follows the canonical format: AC:- - e.g. AC:US-001-01, AC:FEAT-003-02, AC:FUNC-001-03 + e.g. 
AC:US-001-01, AC:US-1-01, AC:FEAT-003-02, AC:FUNC-001-03 - No two scenarios in the same file reference the same AC ID (duplicate check) -Exit code: 0 if all checks pass, 1 if any issues are found. +Exit code: 0 if all checks pass, 1 if any errors are found (warnings do not fail). Glossary reference: skills/references/living-doc-glossary.md """ @@ -20,13 +24,51 @@ import sys from pathlib import Path -# Matches: # AC: US-001-01 (v1.0.0 – Active) — description -# or simpler: # AC: US-001-01 -AC_COMMENT = re.compile(r"^\s*#\s*AC:\s*(\S+)", re.IGNORECASE) -# Canonical AC ID: AC:- where parent is US-nnn, FEAT-nnn, or FUNC-nnn -AC_ID_FORMAT = re.compile(r"^AC:(US|FEAT|FUNC)-\d{3}-\d{2}$", re.IGNORECASE) +# Matches a @AC: Cucumber tag with optional /param:value segments: +# @AC:US-1-01 or @AC:US-001-01/aspect:username-input/coverage:partial +AC_TAG = re.compile( + r"@AC:((?:US|FEAT|FUNC)-\d+-\d{2})((?:/[a-z][\w-]*:[^\s/@]+)*)", + re.IGNORECASE, +) +# Matches a # AC: human-readable comment: # AC:US-1-01 or # AC:US-1-01 (...) 
+AC_COMMENT_LINE = re.compile(r"^\s*#\s*AC:((?:US|FEAT|FUNC)-\d+-\d{2})", re.IGNORECASE) +# Canonical AC ID only (no params): AC:- +AC_ID_FORMAT = re.compile(r"^AC:(US|FEAT|FUNC)-\d+-\d{2}$", re.IGNORECASE) +TAG_LINE = re.compile(r"^\s*@\S+") +COMMENT_LINE = re.compile(r"^\s*#") SCENARIO_LINE = re.compile(r"^\s*(Scenario:|Scenario Outline:)\s*(.+)", re.IGNORECASE) +# Living-doc path components — only files under these directories are scanned +LIVING_DOC_PATHS = ("features/us/", "features/functionalities/") + + +def is_living_doc_file(path: Path) -> bool: + """Return True if the file is in a living-doc feature directory.""" + normalised = str(path).replace("\\", "/") + return any(segment in normalised for segment in LIVING_DOC_PATHS) + + +def get_tags_above(lines: list[str], scenario_index: int) -> list[str]: + """Return all Cucumber tag tokens from consecutive tag lines immediately above a scenario.""" + tags: list[str] = [] + i = scenario_index - 1 + while i >= 0 and TAG_LINE.match(lines[i]): + tags.extend(re.findall(r"@\S+", lines[i])) + i -= 1 + return tags + + +def get_ac_comments_above(lines: list[str], scenario_index: int) -> set[str]: + """Return AC IDs mentioned in # AC: comments in the tag/comment block above a scenario.""" + ac_ids: set[str] = set() + i = scenario_index - 1 + while i >= 0 and (TAG_LINE.match(lines[i]) or COMMENT_LINE.match(lines[i])): + m = AC_COMMENT_LINE.match(lines[i]) + if m: + ac_ids.add(m.group(1).upper()) + i -= 1 + return ac_ids + + def scan_file(path: Path) -> list[dict]: issues = [] @@ -39,35 +81,67 @@ def scan_file(path: Path) -> list[dict]: lineno = i + 1 scenario_title = SCENARIO_LINE.match(line).group(2).strip() - prev = lines[i - 1].strip() if i > 0 else "" - ac_match = AC_COMMENT.match(prev) + tags_above = get_tags_above(lines, i) + ac_tags = [t for t in tags_above if t.upper().startswith("@AC:")] - if not ac_match: + if not ac_tags: issues.append({ "file": str(path), "line": lineno, "scenario": scenario_title, "issue":
"missing_ac_link", - "detail": "No '# AC: ' comment on the line immediately above this scenario.", + "issue": "missing_ac_tag", + "severity": "error", + "detail": "No '@AC:' tag on the tag line(s) immediately above this scenario.", }) continue - ac_ref = ac_match.group(1).rstrip(",;") - - if not AC_ID_FORMAT.match(ac_ref): - issues.append({ - "file": str(path), - "line": lineno, - "scenario": scenario_title, - "issue": "malformed_ac_id", - "detail": ( - f"'{ac_ref}' does not match AC:- format " - "(e.g. AC:US-001-01, AC:FEAT-003-02)." - ), - }) - continue - - seen.setdefault(ac_ref.upper(), []).append(lineno) + ac_comments = get_ac_comments_above(lines, i) + + for tag in ac_tags: + # Extract AC ID and optional /param:value segments; AC_TAG expects the leading '@' + m = AC_TAG.match(tag) + if not m: + issues.append({ + "file": str(path), + "line": lineno, + "scenario": scenario_title, + "issue": "malformed_ac_id", + "severity": "error", + "detail": ( + f"'{tag}' does not match @AC:-[/param:value] format " + "(e.g. @AC:US-1-01, @AC:US-001-01/aspect:username-input)." + ), + }) + continue + ac_id_raw = "AC:" + m.group(1) # reconstruct full AC ID + if not AC_ID_FORMAT.match(ac_id_raw): + issues.append({ + "file": str(path), + "line": lineno, + "scenario": scenario_title, + "issue": "malformed_ac_id", + "severity": "error", + "detail": ( + f"'{ac_id_raw}' does not match AC:- format " + "(e.g. AC:US-001-01, AC:US-1-01)." + ), + }) + continue + plain_id = m.group(1).upper() # e.g. US-1-01 + seen.setdefault(ac_id_raw.upper(), []).append(lineno) + # Warn if the human-readable # AC: comment is missing for this tag + if plain_id not in {c.upper() for c in ac_comments}: + issues.append({ + "file": str(path), + "line": lineno, + "scenario": scenario_title, + "issue": "missing_ac_comment", + "severity": "warning", + "detail": ( + f"@AC:{plain_id} tag is present but no matching '# AC:{plain_id} ...' " + "human-readable comment was found above the scenario."
+ ), + }) for ac_id, lines_found in seen.items(): if len(lines_found) > 1: @@ -91,9 +165,16 @@ def main(features_dir: str) -> None: print(f"Error: directory not found: {features_dir}") sys.exit(1) - feature_files = sorted(root.rglob("*.feature")) + all_files = sorted(root.rglob("*.feature")) + feature_files = [f for f in all_files if is_living_doc_file(f)] + skipped = len(all_files) - len(feature_files) + + if skipped: + print(f"Skipped {skipped} non-living-doc feature file(s) (smoke, regression, exploratory).") + if not feature_files: - print(f"No .feature files found under {features_dir}") + print(f"No living-doc .feature files found under {features_dir}") + print(f"Expected files under 'features/us/' or 'features/functionalities/'") return all_issues: list[dict] = [] @@ -101,19 +182,23 @@ def main(features_dir: str) -> None: all_issues.extend(scan_file(f)) if not all_issues: - print(f"✅ All {len(feature_files)} feature file(s) pass AC link checks.") + print(f"\u2705 All {len(feature_files)} living-doc feature file(s) pass AC link checks.") return + errors = [i for i in all_issues if i.get("severity") == "error"] + warnings = [i for i in all_issues if i.get("severity") == "warning"] + by_type: dict[str, list] = {} for issue in all_issues: by_type.setdefault(issue["issue"], []).append(issue) - print(f"Found {len(all_issues)} issue(s) in {len(feature_files)} feature file(s):\n") + print(f"Found {len(errors)} error(s) and {len(warnings)} warning(s) in {len(feature_files)} living-doc feature file(s):\n") labels = { - "missing_ac_link": "MISSING AC LINK", - "malformed_ac_id": "MALFORMED AC ID", - "duplicate_ac_link": "DUPLICATE AC LINK", + "missing_ac_tag": "[ERROR] MISSING @AC: TAG", + "malformed_ac_id": "[ERROR] MALFORMED AC ID", + "duplicate_ac_link": "[ERROR] DUPLICATE AC LINK", + "missing_ac_comment": "[WARN] MISSING # AC: COMMENT", } for issue_type, items in sorted(by_type.items()): @@ -125,10 +210,10 @@ def main(features_dir: str) -> None: print(f" 
{item['file']}:{loc}") if item.get("scenario"): print(f" Scenario: {item['scenario']}") - print(f" → {item['detail']}") + print(f" {item['detail']}") print() - sys.exit(1) + sys.exit(1 if errors else 0) if __name__ == "__main__": diff --git a/skills/gherkin-scenario/SKILL.md b/skills/gherkin-scenario/SKILL.md index be336b5..9383399 100644 --- a/skills/gherkin-scenario/SKILL.md +++ b/skills/gherkin-scenario/SKILL.md @@ -21,28 +21,65 @@ description: > ## Traceability requirement -Every `Scenario:` or `Scenario Outline:` generated or reviewed by this skill must carry an -AC link comment on the line immediately above it, following the glossary AC ID format: +Living-doc feature files (`features/us/` and `features/functionalities/`) require two +complementary annotations above each `Scenario:` or `Scenario Outline:`: + +1. **`# AC:` comment** — human-readable context: AC ID, version, state, description, and + optionally the specific aspect this scenario covers. +2. **`@AC:` tag** — machine-readable Cucumber tag consumed by scripts and coverage reports. ```gherkin -# AC: US-001-01 (v1.0.0 – Active) — Happy path: customer places order +# AC:US-1-01 (v1.0.0 - Active) — customer places an order with a saved payment method +@AC:US-1-01 Scenario: Customer successfully places an order ... ``` -If the prompt already gives an AC ID and AC wording, copy that ID and wording into the comment; -when no lifecycle marker is supplied, use `(v1.0.0 – Active)` as the default status text. +When a scenario covers only **one aspect** of a multi-aspect AC, encode the aspect as a +`/param:value` segment on the tag and mirror it in the comment: + +```gherkin +# AC:US-1-01 (v1.0.0 - Active) — displays {required field} on login screen | aspect: username input +@AC:US-1-01/aspect:username-input +Scenario: Login form shows the username input field + ... +``` + +The `/param:value` format is extensible — future params (e.g. `/coverage:partial`) can be +appended. 
Multiple ACs — one comment + tag pair per AC: + +```gherkin +# AC:US-1-01 (v1.0.0 - Active) — invalid credentials show an error message +# AC:US-1-02 (v1.0.0 - Active) — account lockout after 3 failed attempts +@AC:US-1-01 +@AC:US-1-02 +@Regression +Scenario: User is locked out after repeated failed logins + ... +``` + +The AC tag prefix matches the parent entity: `@AC:US--` for User Story scenarios, +`@AC:FUNC--` for Functionality scenarios. + +**Scope:** These annotations are only required in living-doc feature files. Other feature files +(smoke tests, regression suites, exploratory probes) do not require `@AC:` tags and may use +`@AC:STANDALONE` as an optional placeholder to signal intent. `gherkin-living-doc-sync` reports +`STANDALONE`-tagged scenarios but does not flag them as traceability gaps. + +--- + +## Feature file types + +Two categories of `.feature` files exist — they have different locations, headers, and scopes: + +| Type | Location | File header | Feature block | Scope | +|---|---|---|---|---| +| User Story (E2E) | `features/us/us--.feature` | `# Source:`, `# Business Value:`, `# Acceptance Criteria:` block + `@US_ID:US-` feature tag | `Feature: ` with As-a/I-can/so-that narrative | End-to-end, user perspective | +| Functionality (system test) | `features/functionalities//func--.feature` | Similar to US — format TBD; `@FUNC_ID:FUNC-` feature tag | `Feature: ` | One atomic behavior, input to output | + +Both types use the `@AC:` + `# AC:` traceability annotations described above. Both must be written in business domain language — no implementation details, selectors, or code references. -If writing standalone scenarios (no User Story context), use `# AC: STANDALONE` as a placeholder. 
-When asked what comment to use for exploratory work, answer with the placeholder, say that tutorial -walkthroughs, exploratory probes, and other developer-authored scenarios without a User Story AC -all qualify, and note that `gherkin-living-doc-sync` will report `STANDALONE` scenarios without -flagging them as traceability gaps. -Standalone scenarios are permitted when they live outside the project's dedicated living doc -feature directory. Tutorial walkthroughs, exploratory probes, and any other developer-authored -scenarios that don't map to a User Story AC all qualify — the decision is the developer's. -`gherkin-living-doc-sync` will note `STANDALONE`-tagged scenarios in its sync report but will -not flag them as traceability gaps. +For non-living-doc scenarios (exploratory probes, tutorial walkthroughs, regression suites not tied to a User Story AC), `@AC:` annotations are not required. Use `@AC:STANDALONE` as an optional placeholder when explicitly signalling that a scenario is intentionally unlinked — `gherkin-living-doc-sync` will note it but not flag it as a traceability gap. --- @@ -140,7 +177,7 @@ subset of scenarios needs them. Keep `Background` to shared `Given` precondition | Assertions in Given/When | Violates keyword semantics | Move all assertions to `Then` | | Scenario depends on a previous scenario's state | Hidden ordering dependency | Each scenario must be fully self-contained | -When reviewing an existing scenario, explicitly check for a missing `# AC:` comment immediately +When reviewing an existing scenario, explicitly check for a missing `@AC:` tag immediately above each `Scenario:` or `Scenario Outline:` and call that out as a traceability defect. 
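The `@AC:` tag format described in this hunk (an AC ID followed by optional `/param:value` segments) can be sketched as a small parser. This is only a minimal illustration, not the skill's implementation: the regular expression mirrors the `AC_TAG` pattern added to `scan_ac_links.py` in this patch, while `ParsedAcTag` and `parse_ac_tag` are hypothetical names introduced for the example.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# Mirrors the AC_TAG pattern from scan_ac_links.py in this patch.
AC_TAG = re.compile(
    r"@AC:((?:US|FEAT|FUNC)-\d+-\d{2})((?:/[a-z][\w-]*:[^\s/@]+)*)",
    re.IGNORECASE,
)


@dataclass
class ParsedAcTag:
    """Hypothetical container for one parsed @AC: tag."""
    ac_id: str
    params: dict = field(default_factory=dict)


def parse_ac_tag(tag: str) -> Optional[ParsedAcTag]:
    """Return the parsed tag, or None when it is not a well-formed @AC: tag."""
    # fullmatch rejects trailing characters that are not /param:value segments.
    m = AC_TAG.fullmatch(tag.strip())
    if m is None:
        return None
    params = {}
    for segment in m.group(2).split("/"):
        if segment:  # the first split element is empty because group 2 starts with "/"
            key, _, value = segment.partition(":")
            params[key.lower()] = value
    return ParsedAcTag(ac_id=m.group(1).upper(), params=params)
```

Under this sketch `@AC:STANDALONE` parses to `None`, so a caller that wants to report standalone scenarios without flagging them would have to check for that placeholder separately.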
--- diff --git a/skills/living-doc-create-feature/SKILL.md b/skills/living-doc-create-feature/SKILL.md index 164094a..39a707f 100644 --- a/skills/living-doc-create-feature/SKILL.md +++ b/skills/living-doc-create-feature/SKILL.md @@ -11,8 +11,7 @@ description: > feature", "create feature entity", "system surface documentation", "feature owners", "feature dependencies". Does NOT trigger for: creating User Stories (use living-doc-create-user-story), defining atomic - behaviors (use living-doc-create-functionality), scanning a webapp for PageObjects - (use living-doc-pageobject-scan), generating scenarios (use living-doc-scenario-creator). + behaviors (use living-doc-create-functionality). license: Apache-2.0 compatibility: GitHub Copilot --- @@ -120,4 +119,3 @@ If `user_stories` is `[]`, repeat the orphan warning from Step 3 outside the JSO |---|---| | Creating a User Story | **living-doc-create-user-story** | | Defining an atomic behavior (Functionality) | **living-doc-create-functionality** | -| Scanning a webapp for PageObjects | **living-doc-pageobject-scan** | diff --git a/skills/living-doc-create-feature/scripts/next_id.py b/skills/living-doc-create-feature/scripts/next_id.py index e79777b..77942e0 100644 --- a/skills/living-doc-create-feature/scripts/next_id.py +++ b/skills/living-doc-create-feature/scripts/next_id.py @@ -2,7 +2,7 @@ """ next_id.py — Living Doc ID Auto-Assigner -Scans a living doc catalog and returns the next available ID for a given entity type. +Scans the living documentation and returns the next available ID for a given entity type. Use this before creating a new entity to avoid ID collisions. 
Usage: diff --git a/skills/living-doc-create-functionality/SKILL.md b/skills/living-doc-create-functionality/SKILL.md index d9df979..bd882ef 100644 --- a/skills/living-doc-create-functionality/SKILL.md +++ b/skills/living-doc-create-functionality/SKILL.md @@ -11,8 +11,7 @@ description: > "document a business rule", "create a functionality entity", "functionality acceptance criteria", "test_type", "unit vs integration test", "choose test type". Does NOT trigger for: end-to-end User Stories (use living-doc-create-user-story), system - surface documentation (use living-doc-create-feature), BDD scenario generation - (use living-doc-scenario-creator). + surface documentation (use living-doc-create-feature). license: Apache-2.0 compatibility: GitHub Copilot --- @@ -46,7 +45,7 @@ A Functionality must belong to at least one Feature. If the user clearly names t ## Step 3 — Elicit Functionality-level Acceptance Criteria -Functionality ACs describe atomic inputs → outputs. They are: +Functionality ACs describe atomic inputs to outputs. They are: - **Atomic**: one input condition, one output or side effect per AC - **Fast-testable**: designed for verification by unit or integration test - **Unambiguous**: exact error codes, exact output values, exact rule outcomes where relevant @@ -95,6 +94,10 @@ If contextually distinct despite similar names, create a new Functionality and n When creating a Functionality, output **one fenced `json` code block** and no extra prose inside the block. +> **ID assignment:** before assigning a `FUNC-nnn` ID, run +> `python scripts/next_id.py --type FUNC --catalog catalog.json` +> to get the next available ID and avoid collisions. + Use this canonical shape: ```json @@ -142,7 +145,7 @@ redirect to `living-doc-create-user-story`. |---|---| | Functionality name is a noun (e.g. "Password Validation") | Names must be verb phrases expressing the atomic behavior — e.g. "Validate Password Strength". | | Functionality name is broad (e.g. 
"Handle checkout") | That is not atomic. Split it into smaller behaviors such as validation, pricing, payment authorization, or order submission. | -| Functionality AC describes a full user journey (e.g. "User logs in and sees their dashboard") | That is a User Story AC — redirect to **living-doc-create-user-story**. Functionality ACs describe a single behavior's input → output or side effect. | +| Functionality AC describes a full user journey (e.g. "User logs in and sees their dashboard") | That is a User Story AC — redirect to **living-doc-create-user-story**. Functionality ACs describe a single behavior's input to output or side effect. | | Functionality has only happy-path ACs | Edge cases (null input, boundary values, partial validity, error codes) are missing. Run through the completeness checklist in Step 3 before confirming. | | AC says "returns error" without specifying the type or code | Specify the exact error code. Without a named code, the AC is not testable. | | AC wording is vague (e.g. "works correctly", "handles it appropriately") | Rewrite with exact `When` / `Then` behavior and explicit outputs or error codes. | @@ -156,4 +159,3 @@ redirect to `living-doc-create-user-story`. 
|---|---| | "Create a User Story" | `living-doc-create-user-story` — this skill documents atomic behaviors, not end-to-end User Stories | | "Create a Feature entity" | `living-doc-create-feature` — a Feature is a system surface, not an atomic behavior | -| "Generate BDD scenarios" | `living-doc-scenario-creator` — scenario generation requires a User Story with ACs | diff --git a/skills/living-doc-create-functionality/scripts/next_id.py b/skills/living-doc-create-functionality/scripts/next_id.py index e79777b..77942e0 100644 --- a/skills/living-doc-create-functionality/scripts/next_id.py +++ b/skills/living-doc-create-functionality/scripts/next_id.py @@ -2,7 +2,7 @@ """ next_id.py — Living Doc ID Auto-Assigner -Scans a living doc catalog and returns the next available ID for a given entity type. +Scans the living documentation and returns the next available ID for a given entity type. Use this before creating a new entity to avoid ID collisions. Usage: diff --git a/skills/living-doc-create-user-story/SKILL.md b/skills/living-doc-create-user-story/SKILL.md index 3b6389a..dccff9f 100644 --- a/skills/living-doc-create-user-story/SKILL.md +++ b/skills/living-doc-create-user-story/SKILL.md @@ -10,9 +10,8 @@ description: > "elicit requirements", "AC for user story", "US acceptance criteria", "review this user story", "is my narrative well-formed". Does NOT trigger for: atomic component behaviors (use living-doc-create-functionality), - documenting system surfaces (use living-doc-create-feature), generating BDD scenarios - (use living-doc-scenario-creator). - Pairs with living-doc-create-functionality and living-doc-scenario-creator. + documenting system surfaces (use living-doc-create-feature). + Pairs with living-doc-create-functionality. license: Apache-2.0 compatibility: GitHub Copilot --- @@ -64,7 +63,7 @@ Each AC must be: - **Binary** — clear pass/fail; no "should usually" or "typically" - **Single placeholder** — at most ONE `{placeholder}` per AC statement. 
If two aspects vary independently, write a separate AC for each. -Use `{placeholder}` syntax when a value varies, and list the concrete values immediately below. During elicitation, capture ACs in **Given / When / Then** language; in the final JSON, convert each accepted AC into a plain-language description with no Gherkin keywords. +Use `{placeholder}` syntax when a value varies, and list the concrete values immediately below. During elicitation, capture ACs using structured condition / action / outcome language; in the final JSON, convert each accepted AC into a plain-language description. When reviewing an existing User Story, classify **only happy-path ACs present** as an **Important** gap. Name the missing cases in domain language and propose 2-3 extra Given / When / Then ACs. For password-reset stories, explicitly check for: unregistered email or phone, expired token or code, already-used token or code, wrong code, and retry limits. @@ -130,7 +129,7 @@ Rules: - Use `title` rather than `name` - Use `as_a`, `i_want`, and `so_that` - Every AC object must have `id` in `US--AC-` format and a plain-language `description` -- Keep Gherkin keywords out of JSON values +- Write AC descriptions in plain language — no structured language keywords in JSON values ## Anti-patterns to flag diff --git a/skills/living-doc-create-user-story/scripts/next_id.py b/skills/living-doc-create-user-story/scripts/next_id.py index e79777b..77942e0 100644 --- a/skills/living-doc-create-user-story/scripts/next_id.py +++ b/skills/living-doc-create-user-story/scripts/next_id.py @@ -2,7 +2,7 @@ """ next_id.py — Living Doc ID Auto-Assigner -Scans a living doc catalog and returns the next available ID for a given entity type. +Scans the living documentation and returns the next available ID for a given entity type. Use this before creating a new entity to avoid ID collisions. 
Usage: diff --git a/skills/living-doc-gap-finder/SKILL.md b/skills/living-doc-gap-finder/SKILL.md index fb5c826..0fcbea3 100644 --- a/skills/living-doc-gap-finder/SKILL.md +++ b/skills/living-doc-gap-finder/SKILL.md @@ -5,20 +5,19 @@ description: > top-down requirement checking. Activate when auditing living doc completeness, finding undocumented behaviors, discovering orphan tests with no AC link, detecting untested ACs, producing a documentation coverage gap report, or proposing new living doc entities to fill - identified gaps. Orchestrates living-doc-pageobject-scan, living-doc-scenario-creator (read-only), - and living-doc-create-* skills. + identified gaps. Orchestrates living-doc-create-* skills. Triggers on: "find what's not documented", "living doc gaps", "what's missing in living doc", "find undocumented features", "orphan tests", "untested AC", "documentation coverage", "gap report", "what's not covered", "living doc audit", "documentation audit". Does NOT trigger for: creating new living doc objects (use living-doc-create-* skills). - Orchestrates: living-doc-pageobject-scan, living-doc-scenario-creator, and all create-* skills. + Orchestrates: all create-* skills. license: Apache-2.0 compatibility: GitHub Copilot --- # Living Doc — Gap Finder -> **Key concepts:** Feature, Functionality, User Story, AC, PageObject — see [living-doc-glossary](../references/living-doc-glossary.md) ([remote](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md)). +> **Key concepts:** Feature, Functionality, User Story, AC — see [living-doc-glossary](../references/living-doc-glossary.md) ([remote](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md)). 
## Script — `scripts/compute_gaps.py` @@ -47,43 +46,29 @@ Before presenting the final report, normalise the script output against the taxo --- -## Usage modes - -This skill is used in two directions depending on which agent calls it: - -| Mode | Caller | What it finds | -|---|---|---| -| **Full gap audit** (default) | `@living-doc-copilot` | All 9 gap types — missing documentation entities, orphan tests, stale references, empty Features | -| **Bottom-up scenario coverage** | `@living-doc-bdd-copilot` | Gap type 1 only, scoped to Gherkin: ACs that exist in the catalog but have no linked `.feature` scenario carrying a `# AC: ` traceability tag | - -When called by `@living-doc-bdd-copilot`, skip Steps 1 and 2 of the Workflow. Go directly to Gap type 1 with the Gherkin-scoped definition below, then output only those results. - ---- - ## Gap taxonomy Nine types of gaps are detected, in order of risk: | Priority | Gap type | Description | |---|---|---| -| 1 — Blocker | **Untested AC** | An Active or Implemented AC in a User Story or Functionality has no linked test. In bottom-up scenario coverage mode (called from `@living-doc-bdd-copilot`): no `.feature` file carries a `# AC: ` traceability tag for this AC. | +| 1 — Blocker | **Untested AC** | An Active or Implemented AC in a User Story or Functionality has no linked test. 
| | 2 — Important | **Undocumented UI surface** | A screen or API endpoint exists in the app with no Feature entity | | 3 — Important | **Orphan Feature** | A Feature entity exists with no linked User Story | | 4 — Important | **Orphan User Story** | A User Story exists with no linked Feature | | 5 — Important | **Orphan Functionality** | A Functionality exists with no parent Feature | -| 6 — Important | **Orphan test** | A test or BDD scenario exists with no linked AC | -| 7 — Important | **Stale reference** | An active test or BDD scenario references a Deprecated AC | +| 6 — Important | **Orphan test** | A test exists with no linked AC | +| 7 — Important | **Stale reference** | An active test references a Deprecated AC | | 8 — Nit | **Undocumented Functionality** | A Functionality entity exists with no associated tests | | 9 — Nit | **Empty Feature** | A Feature entity exists with no Functionalities defined | ## Workflow -### Step 1 — Bottom-up scan (apply living-doc-pageobject-scan) +### Step 1 — Bottom-up scan -Load and follow the `living-doc-pageobject-scan` skill to build an **inventory** of: +Build an **inventory** of: - All discoverable UI screens and API endpoints -- All existing test files and BDD scenarios -- All existing PageObjects and their method coverage +- All existing test files Output: `inventory.json` — a flat list of discovered artifacts. 
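The inventory from this step feeds the gap checks listed in the taxonomy above. As a rough sketch of gap types 1 (Untested AC), 6 (Orphan test), and 7 (Stale reference), the example below works on plain in-memory records. The record shapes (`AcceptanceCriterion`, `LinkedTest`) and the function name `compute_gaps` are assumptions made for illustration; the patch does not show the layout of `inventory.json` or the internals of `scripts/compute_gaps.py`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AcceptanceCriterion:
    """Hypothetical AC record; field names are assumptions for this sketch."""
    ac_id: str
    status: str  # e.g. "Active", "Implemented", "Deprecated"


@dataclass(frozen=True)
class LinkedTest:
    """Hypothetical inventory entry: one test and the AC IDs it links to."""
    name: str
    linked_acs: tuple = ()


def compute_gaps(acs, tests):
    """Return (untested_ac_ids, orphan_test_names, stale_test_names)."""
    status_by_id = {ac.ac_id: ac.status for ac in acs}
    covered = {ac_id for t in tests for ac_id in t.linked_acs}

    # Gap type 1: Active or Implemented ACs that no test links to.
    untested = sorted(
        ac.ac_id for ac in acs
        if ac.status in ("Active", "Implemented") and ac.ac_id not in covered
    )
    # Gap type 6: tests with no link to any known AC (including tests with no links).
    orphans = sorted(
        t.name for t in tests
        if not any(ac_id in status_by_id for ac_id in t.linked_acs)
    )
    # Gap type 7: tests that link to a Deprecated AC.
    stale = sorted(
        t.name for t in tests
        if any(status_by_id.get(ac_id) == "Deprecated" for ac_id in t.linked_acs)
    )
    return untested, orphans, stale
```

The remaining gap types depend on Feature and Functionality relationships and would need the entity graph rather than a flat inventory, so they are left out of this sketch.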
@@ -94,7 +79,7 @@ Traverse the entity graph top-down, starting from User Stories as roots: - **User Stories** (root) — load all entities with their ACs and status - **Features** — for each User Story, follow its `features` list to reach linked Features - **Functionalities** — for each Feature, follow its `functionalities` list to reach owned Functionalities -- **Test links** — collect all test file → AC mappings for cross-referencing in Step 3 +- **Test links** — collect all test file to AC mappings for cross-referencing in Step 3 ### Step 3 — Compute gaps @@ -105,63 +90,63 @@ For each gap type: For each AC in (UserStory.ACs + Functionality.ACs) where status IN (Active, Implemented) where no linked test exists: - → GAP: UNTESTED_AC + GAP: UNTESTED_AC ``` **Gap type 2 — Undocumented UI surface:** ``` For each item in inventory (screens, API endpoints) where no Feature entity exists for this surface: - → GAP: UNDOCUMENTED_SURFACE + GAP: UNDOCUMENTED_SURFACE ``` **Gap type 3 — Orphan Feature:** ``` For each Feature reachable via entity relationships where user_stories == []: - → GAP: ORPHAN_FEATURE + GAP: ORPHAN_FEATURE ``` **Gap type 4 — Orphan User Story:** ``` For each User Story in entity graph where user_story.features == []: - → GAP: ORPHAN_USER_STORY + GAP: ORPHAN_USER_STORY ``` **Gap type 5 — Orphan Functionality:** ``` For each Functionality in entity graph where functionality.parent_feature == null: - → GAP: ORPHAN_FUNCTIONALITY + GAP: ORPHAN_FUNCTIONALITY ``` **Gap type 6 — Orphan test:** ``` For each test in inventory where no linked AC exists in any UserStory or Functionality: - → GAP: ORPHAN_TEST + GAP: ORPHAN_TEST ``` **Gap type 7 — Stale reference:** ``` For each test in inventory where linked_ac.status == Deprecated: - → GAP: STALE_REFERENCE + GAP: STALE_REFERENCE ``` **Gap type 8 — Undocumented Functionality:** ``` For each Functionality reachable via Feature `functionalities` links where no test references this Functionality's ACs: - → GAP: 
UNDOCUMENTED_FUNCTIONALITY + GAP: UNDOCUMENTED_FUNCTIONALITY ``` **Gap type 9 — Empty Feature:** ``` For each Feature reachable via entity relationships where functionalities == []: - → GAP: EMPTY_FEATURE + GAP: EMPTY_FEATURE ``` ### Step 4 — Prioritise by risk @@ -177,15 +162,15 @@ For each gap, propose the living doc action: | Gap type | Proposed action | |---|---| -| UNTESTED_AC | Create BDD scenario → `living-doc-scenario-creator` | -| UNDOCUMENTED_SURFACE | Create Feature entity → `living-doc-create-feature` | +| UNTESTED_AC | Create a test for the uncovered AC — use `living-doc-create-functionality` to define the behavior if not yet documented | +| UNDOCUMENTED_SURFACE | Create Feature entity — `living-doc-create-feature` | | ORPHAN_FEATURE | (1) Confirm the Feature entity actually exists in the storage profile — a broken reference may mean the Feature was renamed or deleted without updating the link. (2) If the Feature exists: link it to an existing User Story or propose creating one. (3) If deletion is the right action: **always confirm with the user before deleting** — state the Feature ID, name, and any Functionalities it owns, and ask explicitly: *"No User Story references FEAT-nnn. Delete this Feature and its N Functionalities?"* | -| ORPHAN_USER_STORY | Link to an existing Feature, or create the missing Feature → `living-doc-create-feature` | +| ORPHAN_USER_STORY | Link to an existing Feature, or create the missing Feature — `living-doc-create-feature` | | ORPHAN_FUNCTIONALITY | Link to an existing Feature, or delete if the behavior has no owning surface. Do not delete if tests reference this Functionality's ACs — resolve those first (see ORPHAN_TEST). | -| ORPHAN_TEST | Link test to an existing AC, or create a Functionality → `living-doc-create-functionality`. 
**Never delete a test to resolve an orphan — that would silently remove coverage.** If the linked AC ID no longer exists (broken link), choose from: (1) recreate the AC/Functionality if the behavior is still required; (2) update the link to the merged AC ID if the entity was merged; (3) delete the test only after product owner confirmation that the behavior has been intentionally removed. | +| ORPHAN_TEST | Link test to an existing AC, or create a Functionality — `living-doc-create-functionality`. **Never delete a test to resolve an orphan — that would silently remove coverage.** If the linked AC ID no longer exists (broken link), choose from: (1) recreate the AC/Functionality if the behavior is still required; (2) update the link to the merged AC ID if the entity was merged; (3) delete the test only after product owner confirmation that the behavior has been intentionally removed. | | STALE_REFERENCE | Update the test to reference the active replacement AC. If the deprecated behavior was intentionally removed, delete the test after product owner confirmation. If removed in error, reinstate the AC using `living-doc-update`. | | UNDOCUMENTED_FUNCTIONALITY | Create unit/integration tests for the Functionality's ACs | -| EMPTY_FEATURE | Create Functionalities for the Feature's known behaviors → `living-doc-create-functionality` | +| EMPTY_FEATURE | Create Functionalities for the Feature's known behaviors — `living-doc-create-functionality` | > **Out-of-scope actions:** living-doc-gap-finder identifies and proposes new entities — it does > not create them. Direct creation requests (e.g. 
"create a User Story", "create a Feature") must @@ -209,7 +194,7 @@ For each gap, propose the living doc action: "severity": "Blocker", "entity": "AC:US-007-02", "description": "Active AC 'Payment declined' has no linked E2E test", - "proposed_action": "Generate BDD scenario using living-doc-scenario-creator for US-007" + "proposed_action": "Create a test to cover AC:US-007-02 for US-007" }, { "id": "GAP-002", @@ -247,7 +232,7 @@ Before addressing any other gap type, guarantee minimum traceability across all 1. List all User Stories where **zero ACs** have a linked test. 2. For each, identify the highest-priority AC (first Active AC, or the first AC if none is Active). -3. Create one test or BDD scenario for that AC → `living-doc-scenario-creator`. +3. Create one test for that AC using the appropriate testing workflow. 4. Repeat until every User Story has at least one covered AC. This phase establishes a baseline coverage floor. Do not skip to Phase 2 until all User Stories @@ -265,18 +250,18 @@ Once the baseline is met, continue by tackling the biggest remaining gaps first: Processing everything at once is discouraged because the resulting gap list is too large to action without clear prioritisation. 
-## Lightweight scenario-coverage report format +## Lightweight coverage report format -When the focus is specifically on scenario-to-AC coverage (rather than the full gap taxonomy), +When the focus is specifically on test-to-AC coverage (rather than the full gap taxonomy), or when asked to demonstrate or describe the gap report output format, use this simplified two-section format: -**Missing Scenarios** (ACs with no linked Gherkin scenario): +**Missing Tests** (ACs with no linked test): - `` — -**Missing ACs** (Gherkin scenarios with no corresponding AC): -- `` — +**Orphan Tests** (tests with no corresponding AC): +- `` — -End with a summary line: `X ACs missing scenarios, Y scenarios missing ACs.` +End with a summary line: `X ACs missing tests, Y tests missing ACs.` This format is diagnostic only — it does not suggest implementation changes. diff --git a/skills/living-doc-gap-finder/scripts/.DS_Store b/skills/living-doc-gap-finder/scripts/.DS_Store new file mode 100644 index 0000000000000000000000000000000000000000..5008ddfcf53c02e82d7eee2e57c38e5672ef89f6 GIT binary patch literal 6148 Analyse the impact of a code change on the living documentation. Given a PR diff, - modified module, or changed API contract, trace which Features, Functionalities, User Stories, - and Gherkin scenarios are affected. Output an impact map that identifies what must be reviewed, + modified module, or changed API contract, trace which Features, Functionalities, and User Stories + are affected. Output an impact map that identifies what must be reviewed, updated, or re-tested. Activate when a PR touches business logic and you need to know what living doc entities are affected, when a service module is refactored, or when breaking API changes need living doc coverage traced.
Triggers on: "living doc impact", "what does this change affect", "impact of PR on living doc", "trace affected user stories", "affected features", "impact analysis", "living doc sign-off", - "what user stories are affected", "which scenarios need re-running", "PR impact on docs". + "what user stories are affected", "PR impact on docs". Does NOT trigger for: updating living doc (use living-doc-update), finding coverage gaps (use living-doc-gap-finder), creating new entities (use living-doc-create-* skills). @@ -68,15 +68,14 @@ Start from the code change (PR diff, renamed module, deleted endpoint): ## Step 2 — Trace to living doc entities -Walk the entity hierarchy from Feature → Functionality → User Story: +Walk the entity hierarchy from Feature, Functionality, to User Story: ``` Changed module: src/payments/checkout/PromoService.java - → Feature: FEAT-promotions - → Functionalities: FUNC-promo-validate, FUNC-promo-apply - → User Stories: US-042 (apply promo), US-067 (expired promo error) - → ACs affected: AC:US-042-01, AC:US-042-03, AC:US-067-02 - → Linked scenarios: checkout/promo_apply.feature (Scenarios 1, 3), checkout/promo_error.feature (Scenario 2) + Feature: FEAT-promotions + Functionalities: FUNC-promo-validate, FUNC-promo-apply + User Stories: US-042 (apply promo), US-067 (expired promo error) + ACs affected: AC:US-042-01, AC:US-042-03, AC:US-067-02 ``` Repeat for every changed module. Consolidate entities that appear more than once — they are @@ -92,8 +91,8 @@ that covers all affected Feature areas. 
| Impact level | Criteria | Action required | |---|---|---| -| **High** | AC or business rule directly changed or deleted | Must update living doc and re-run linked scenarios | -| **Medium** | Module changed but business rule unchanged (refactor/rename) | Update living doc if method names referenced; confirm scenarios still pass | +| **High** | AC or business rule directly changed or deleted | Must update living doc and re-run linked tests | +| **Medium** | Module changed but business rule unchanged (refactor/rename) | Update living doc if method names referenced; confirm tests still pass | | **Low** | Config / infra change that alters a business flow | Update living doc if the flow change is documented; note in PR | | **None** | Pure infrastructure change (resource limits, scaling, deployment config) with no business flow impact; or test files, mocks, build scripts only | No living doc update needed | @@ -116,20 +115,14 @@ IMPACT MAP — PR #217: "Refactor promo validation to support stacked discounts" AC:US-042-03 — Stacked promos applied in priority order ← NEW BEHAVIOUR AC:US-067-02 — Expired promo returns 422 - Scenarios requiring re-run: - checkout/promo_apply.feature — Scenarios: 1, 3 - checkout/promo_error.feature — Scenario: 2 - Recommended actions: 1. Update living-doc: add AC for stacked discount priority order (AC:US-042-03 is new) - → Invoke living-doc-update - 2. Sync Gherkin: promo_apply.feature Scenario 3 needs updating for stacked discount - → Invoke gherkin-living-doc-sync - 3. Re-run E2E journeys: US-042 and US-067 critical path scenarios - → Invoke test-e2e-standards + Invoke living-doc-update + 2. Re-run E2E journeys: US-042 and US-067 critical path scenarios + Invoke test-e2e-standards ``` -If the request is framed as **"what needs re-testing"**, present Step 4 as a compact **re-test checklist**: group by Feature / Functionality / User Story, list the affected ACs, and then list every linked Gherkin scenario that must be re-run. 
+If the request is framed as **"what needs re-testing"**, present Step 4 as a compact **re-test checklist**: group by Feature / Functionality / User Story and list the affected ACs. ## Step 5 — Release sign-off checklist @@ -138,9 +131,7 @@ Before a release, confirm that all High-impact entities have been addressed: | Check | Status | |---|---| | All High-impact ACs reviewed and updated if needed | ☐ | -| All linked Gherkin scenarios re-run and passing | ☐ | | living-doc-update applied for any changed business rules | ☐ | -| gherkin-living-doc-sync run for any drifted step text | ☐ | Produce this checklist as a PR comment or documentation artefact if requested. @@ -167,7 +158,6 @@ Do not include speculative changes beyond the described scope. | Anti-pattern | Flag | |---|---| | Changed domain logic with no Feature entity defined in the living doc | Missing living doc coverage — flag as a **High-impact gap** and recommend creating documentation with `living-doc-create-functionality` | -| AC not linked to any Gherkin scenario after a High-impact change | Coverage gap — flag for gherkin-living-doc-sync | | Impact analysis only covers unit/integration tests, not E2E scenarios | Incomplete impact — flag for test-e2e-standards review | ## Out-of-scope redirects diff --git a/skills/living-doc-impact-analysis/scripts/.DS_Store b/skills/living-doc-impact-analysis/scripts/.DS_Store new file mode 100644 index 0000000000000000000000000000000000000000..5008ddfcf53c02e82d7eee2e57c38e5672ef89f6 GIT binary patch literal 6148 zcmeH~Jr2S!425mzP>H1@V-^m;4Wg<&0T*E43hX&L&p$$qDprKhvt+--jT7}7np#A3 zem<@ulZcFPQ@L2!n>{z**++&mCkOWA81W14cNZlEfg7;MkzE(HCqgga^y>{tEnwC%0;vJ&^%eQ zLs35+`xjp>T0`) is recorded in a file-level header comment (see -examples above) — not in the class docstring. +examples above) — not in the class docstring. The exact multi-field header format for +PageObject files is TBD and will follow similar conventions to the US/FUNC feature file header. 
Flag fragile selectors: @@ -141,21 +142,28 @@ change report. **4. Map PageObjects to Feature entities** -One PageObject ≈ one `UI` Feature. For each generated PageObject: -- If a matching Feature (`FEAT-`) exists in the catalog: link them in the manifest -- If no Feature exists: invoke `living-doc-create-feature` to produce a draft Feature entity in the project's Storage Profile format +One PageObject ≈ one `UI` Feature. Write the Feature ID as a header comment in the generated PageObject file (the `// living-doc: FEAT- | ` line shown in the templates above). Also record `feature_id` in the manifest entry for the route. -**5. Generate Functionality stubs from discovered elements** +- If a matching Feature (`FEAT-`) exists in the living documentation: add the header comment and manifest entry. +- If no Feature exists: write `// living-doc: FEAT-UNKNOWN | ` as a placeholder and flag the route in the scan report as **"needs Feature entity"**. Do not auto-create a Feature file — raise it for the team to create via `living-doc-create-feature`. -For each interactive element, propose a Functionality stub (`FUNC-`) with a name following +**5. Generate Functionality stubs from discovered behaviors** + +For each **behavior** identified on the screen — an interaction pattern, business operation, or +component capability — propose a Functionality stub (`FUNC-`) with a name following the glossary pattern ``: -- Button → `"Checkout Page – Confirm Order"` -- Form → `"Login Page – Submit Credentials"` -- Table → `"Order History Page – Display Order List"` +- Button: `"Checkout Page – Confirm Order"` +- Form: `"Login Page – Submit Credentials"` +- Table: `"Order History Page – Display Order List"` + +Note: a Functionality represents a business behavior, not an individual UI element. One interactive +element may map to one Functionality, or a group of elements may represent a single behavior. +The team decides the appropriate granularity when promoting stubs. 
-Output as draft Functionality entities in the project's Storage Profile format for review — not -auto-committed. Use `living-doc-create-functionality` to produce the canonical output. +Output Functionality feature file stubs to `features/functionalities//func-.feature` +with `@FUNC_ID:FUNC-UNKNOWN` placeholder tags for team review. When the Functionality is confirmed +and an ID is assigned, use `living-doc-create-functionality` to populate the canonical entity file. **Dynamic list elements:** @@ -172,29 +180,55 @@ def get_cart_item_by_sku(self, sku: str): ## Maintain mode — rescan and update +**0. Load manifest and prioritise routes** + +Read `.copilot/bdd/manifest.json`. Sort routes by `last_scanned` ascending (oldest first). For focused healing (triggered by failing tests or a PR), filter to the routes linked to the failing test files or the changed UI paths provided by the caller. + **1. Diff existing PageObjects against current DOM** -For each selector in the existing PageObject, check if it still resolves: +For each route to scan, navigate using `navigation_context.navigation_steps` if present — this avoids rediscovering hard-to-reach routes. For each selector in the existing PageObject, check if it still resolves: - **Present and unchanged**: no action - **Present but changed**: update selector; log as `UPDATED`; if the replacement selector is evident (for example a renamed `data-testid`), report the exact new selector in the action required line - **Missing**: flag as `BREAKING CHANGE` — linked test steps may fail -**2. Detect new elements** → propose additions. +**2. Detect new elements**: propose additions. **3. Update PageObject files** — modify selector constants only. Preserve existing action and assertion method logic. Never auto-delete methods — flag removals for developer review. For missing selectors, keep the selector constant and annotate it with a `BREAKING` comment so developers can review whether the element was removed or renamed. -**4. 
Breaking change report:** +**4. Breaking change report** +Write results to `.copilot/bdd/breaking-changes.md`. The file has a fixed structure and is overwritten on each scan: + +```markdown +# Breaking Changes Report + +Generated: +Scan scope: + +## + +| Selector | Status | Linked test | Action | +|---|---|---|---| +| `PageObject.locatorName` | REMOVED | `feature-file.feature:` | Verify if element was removed or renamed | +| `PageObject.otherLocator` | CHANGED | — | Update selector constant | + +## Routes needing a Feature entity + +| Route | PageObject | Reason | +|---|---|---| +| `/auth/settings` | `SettingsPage.ts` | No matching FEAT-xxx found in the living documentation | ``` -BREAKING CHANGES DETECTED: - CheckoutPage.CONFIRM_BUTTON: '[data-testid="confirm-order-btn"]' not found in DOM - → Linked step: "When the customer confirms the order" (checkout.feature:14) - → Action required: verify selector and update, or remove step if element is gone -``` + +**5. Update manifest** + +After confirmation of all changes, update the manifest entry for each scanned route: +- Set `last_scanned` to the current ISO 8601 timestamp. +- Update `elements` and `coverage_gaps` to reflect the current DOM state. +- Populate or update `navigation_context` if new information was gathered about how to reach the route. Use `scripts/manifest_diff.py` to detect stale manifest entries and undocumented PageObject files before running a full rescan. @@ -203,15 +237,62 @@ files before running a full rescan. 
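The route prioritisation in Maintain mode step 0 (sort by `last_scanned`, oldest first) could be sketched roughly as follows. This is an illustration only: `routes_oldest_first` is a hypothetical helper, not something shipped in `scripts/`, and treating a route without a `last_scanned` value as the stalest entry is an assumption rather than a rule stated by the skill. It also assumes every timestamp carries a timezone, as in the `Z`-suffixed manifest example.

```python
from datetime import datetime, timezone


def routes_oldest_first(manifest: dict) -> list[str]:
    """Return route keys ordered by last_scanned, oldest first.

    Hypothetical helper. Routes with no last_scanned value sort first,
    on the assumption that a never-scanned route is the stalest one.
    """
    routes = manifest.get("routes", {})

    def scanned_at(route: str) -> datetime:
        stamp = routes[route].get("last_scanned")
        if not stamp:
            # Assumed policy: missing timestamp counts as "oldest possible".
            return datetime.min.replace(tzinfo=timezone.utc)
        # Rewrite a trailing "Z" so fromisoformat also accepts it on
        # Python versions older than 3.11.
        return datetime.fromisoformat(stamp.replace("Z", "+00:00"))

    return sorted(routes, key=scanned_at)
```

In practice the `manifest` argument would be the parsed content of `.copilot/bdd/manifest.json`; filtering to the routes linked to failing tests or changed UI paths would happen before or after this ordering, depending on the caller.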
## Output artifacts -| Artifact | Example location | +| Artifact | Location | |---|---| -| PageObject files | `tests/pages/Page.py` | -| Draft Feature entities | `docs/living-doc/features/draft/FEAT-.` | -| Draft Functionality entities | `docs/living-doc/functionalities/draft/FUNC-.` | -| Breaking change report | stdout / PR comment | -| Exploration manifest | Path discovered by agent on session start (search for `manifest.json` with `pageobject_path` entries); created at `.copilot/bdd/manifest.json` only if no existing manifest is found | +| PageObject files | `tests/pages/Page.py` (or `.ts`) | +| Feature link | `// living-doc: FEAT- \| ` header comment in the PageObject file. If no Feature exists: `FEAT-UNKNOWN` placeholder and a note in the scan report. Header format TBD — will follow similar conventions to the US/FUNC feature file header. | +| Functionality feature file stubs | `features/functionalities//func-.feature` — one file per discovered Functionality behavior, `@FUNC_ID:FUNC-UNKNOWN` tag until ID is assigned | +| Breaking change report | `.copilot/bdd/breaking-changes.md` | +| Exploration manifest | `.copilot/bdd/manifest.json` | + +> **Note:** Locations above are illustrative defaults. Actual paths depend on the project's repository structure and Storage Profile configuration. + +--- -> **Note:** Locations above are illustrative defaults. Actual paths and file formats depend on the project's repository structure and Storage Profile configuration. +## Manifest schema + +The manifest records per-route exploration state. Agents and tools read it to drive healing sessions without re-discovering routes. 
+ +```json +{ + "version": "1.0", + "routes": { + "/auth/all-domains": { + "pageobject_path": "aul-ui/playwright/pages/AllDomainsPage.ts", + "feature_id": "FEAT-001", + "last_scanned": "2026-05-26T10:30:00Z", + "elements": [ + { "data_cy": "create-domain-btn", "tag": "cps-button" }, + { "data_cy": "domains-table", "tag": "table" } + ], + "coverage_gaps": [ + { "tag": "input", "placeholder": "Search domains", "suggested_data_cy": "domains-search-input" } + ], + "navigation_context": { + "prerequisites": "User must be logged in.", + "navigation_steps": "Click sidebar item \u2018All Domains\u2019.", + "data_requirements": null, + "auth_role": "standard user", + "notes": null + } + } + } +} +``` + +| Field | Type | Purpose | +|---|---|---| +| `last_scanned` | ISO 8601 string | Timestamp of the last successful scan for this route. Used during healing to surface stale entries and prioritise rescans. | +| `elements` | array | All `data-cy` elements found on the route at last scan. | +| `coverage_gaps` | array | Interactive elements lacking `data-cy` at time of scan, with suggested names. | +| `pageobject_path` | string | Relative path to the linked PageObject file. | +| `feature_id` | string | Living doc Feature entity ID linked to this route. | +| `navigation_context` | object | **How to reach hard-to-access routes.** Populated on first discovery; reused in all subsequent healing sessions so the agent can navigate directly without re-discovering the path. | +| `navigation_context.prerequisites` | string | State that must exist before navigating (e.g. "a domain must have been visited at least once"). | +| `navigation_context.navigation_steps` | string | Step-by-step path to the route from the app root or login page. | +| `navigation_context.data_requirements` | string/null | Test data that must exist (e.g. "at least one published domain"). | +| `navigation_context.auth_role` | string | Minimum role required to reach this route. 
| +| `navigation_context.notes` | string/null | Any additional context for the agent (e.g. quirks, timing, overlay triggers). | --- diff --git a/skills/living-doc-scenario-creator/SKILL.md b/skills/living-doc-scenario-creator/SKILL.md index ad9d7c7..1bab175 100644 --- a/skills/living-doc-scenario-creator/SKILL.md +++ b/skills/living-doc-scenario-creator/SKILL.md @@ -24,12 +24,26 @@ description: > **AC ID format:** `AC:-` — e.g. `AC:US-001-01`, `AC:US-001-02` -**AC traceability tag** (mandatory — placed above every `Scenario:` line): +**AC traceability** (required for living-doc feature files — placed above every `Scenario:` in `features/us/` and `features/functionalities/`): + ```gherkin -# AC: US-001-01 (v1.0.0 – Active) — customer places an order +# AC:US-1-01 (v1.0.0 - Active) — customer places an order with a saved payment method +@AC:US-1-01 Scenario: Customer successfully places an order ``` +When a scenario covers only **one aspect** of a multi-aspect AC, encode it as a `/aspect:value` +param on the tag and mirror it in the comment: + +```gherkin +# AC:US-1-01 (v1.0.0 - Active) — displays {required field} on login screen | aspect: username input +@AC:US-1-01/aspect:username-input +Scenario: Login form shows the username input field +``` + +The `/param:value` format is extensible — additional params can be added after `/aspect:value`. +The `# AC:` comment provides human context (version, state, description, aspect). The `@AC:` Cucumber tag provides machine traceability for scripts and coverage reports. + Only ACs with state `Active` or `Implemented` drive scenario generation. ACs with state `Planned` or `Deprecated` are excluded from generation; note them in the coverage report. 
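As a minimal sketch of how a script might read the `@AC:` tag format described above, assuming Python 3.9+: `parse_ac_tag` is a hypothetical helper that is not part of this patch. Its pattern mirrors the `AC_TAG` expression in `scripts/coverage_report.py`, but additionally captures the `/param:value` segments instead of ignoring them.

```python
import re

# Same ID shape as AC_TAG in coverage_report.py; group 2 additionally
# captures the optional /param:value segments (illustrative extension).
AC_TAG_WITH_PARAMS = re.compile(
    r"@AC:((?:US|FEAT|FUNC)-\d+-\d{2})((?:/[a-z][\w-]*:[^\s/@]+)*)",
    re.IGNORECASE,
)


def parse_ac_tag(tag: str) -> tuple[str, dict[str, str]]:
    """Split one @AC: tag into (AC ID, params). Hypothetical helper."""
    match = AC_TAG_WITH_PARAMS.fullmatch(tag.strip())
    if match is None:
        raise ValueError(f"not a valid @AC: tag: {tag!r}")
    ac_id = match.group(1).upper()
    params: dict[str, str] = {}
    # Group 2 looks like "/aspect:username-input/other:value" or "".
    for segment in match.group(2).split("/"):
        if segment:
            name, _, value = segment.partition(":")
            params[name.lower()] = value
    return ac_id, params
```

For coverage purposes only the AC ID matters, which is why the shipped script discards the params; a helper like this would only be needed if a report had to distinguish which aspect of a multi-aspect AC each scenario covers.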
@@ -39,7 +53,7 @@ ACs with state `Planned` or `Deprecated` are excluded from generation; note them | Input | Source | Required | |---|---|---| -| User Story (with ACs) | Living doc catalog or inline JSON | Yes | +| User Story (with ACs) | User Story entity file (or inline JSON) | Yes | | Available PageObjects | `tests/pages/` directory | Recommended | | Existing step definitions | `tests/steps/` directory | Recommended | @@ -60,23 +74,24 @@ Load the User Story. Confirm: Treat requests such as “write feature tests for US-007” as requests to generate BDD scenarios plus a coverage table for that User Story. If no ACs are `Active` or `Implemented`, do **not** generate empty or stub scenarios. Instead, -output a coverage report that lists every AC with its state-specific skip reason (`Planned` → -`skipped — not yet active`, `Deprecated` → `skipped — deprecated AC`) and advise the user to +output a coverage report that lists every AC with its state-specific skip reason (`Planned`: +`skipped — not yet active`, `Deprecated`: `skipped — deprecated AC`) and advise the user to re-run the scenario creator when an AC becomes `Active` or `Implemented`. ### Step 2 — Map each AC to a scenario For each active AC, select the scenario pattern by AC type: -- `happy_path` → `Scenario:` or `Scenario Outline:` (if data-driven) -- `error` → `Scenario: `. If the AC text already gives a crisp business-facing failure title (for example, `Order rejected when payment card is declined`), prefer that exact title instead of mechanically prefixing the User Story title. -- `alternative` → `Scenario: ` +- `happy_path`: `Scenario:` or `Scenario Outline:` (if data-driven) +- `error`: `Scenario: `. If the AC text already gives a crisp business-facing failure title (for example, `Order rejected when payment card is declined`), prefer that exact title instead of mechanically prefixing the User Story title. +- `alternative`: `Scenario: ` Generate a scenario for **every** active AC. 
Map Given-When-Then from the AC to existing step definitions — reuse exact step text where found. Keep all step text in domain/business language only; never mention HTTP, APIs, selectors, DOM details, databases, or other implementation mechanics. ```gherkin -# AC: US-001-01 (v1.0.0 – Active) — Customer places an order with a saved payment method +# AC:US-1-01 (v1.0.0 - Active) — customer places an order with a saved payment method +@AC:US-1-01 Scenario: Customer successfully places an order Given the customer has items in their cart and a saved payment method When the customer confirms the order @@ -96,9 +111,9 @@ Generate the full stub using the available method: ``` MISSING STEP: "Given the customer has items in their cart and a saved payment method" - → PageObject candidate: CheckoutPage (FEAT-003) - → Suggested step file: tests/steps/checkout_steps.py - → Generated stub: + PageObject candidate: CheckoutPage (FEAT-003) + Suggested step file: tests/steps/checkout_steps.py + Generated stub: @given('the customer has items in their cart and a saved payment method') def step_customer_has_cart_with_payment(context): context.checkout_page = CheckoutPage(context.browser) @@ -114,8 +129,8 @@ Generate a stub with a `NotImplementedError` failure guard and flag the gap to ``` MISSING STEP + MISSING PAGEOBJECT METHOD: "When the customer applies a promo code" - → No matching method found in CheckoutPage (FEAT-003) - → Generated stub (with failure guard): + No matching method found in CheckoutPage (FEAT-003) + Generated stub (with failure guard): @when('the customer applies a promo code') def step_apply_promo_code(context): raise NotImplementedError( @@ -123,7 +138,7 @@ MISSING STEP + MISSING PAGEOBJECT METHOD: "CheckoutPage (FEAT-003) is missing an 'apply_promo_code' method. " "Run living-doc-pageobject-scan (Maintain mode) on FEAT-003 to add it." 
) - → Action: invoke living-doc-pageobject-scan (Maintain mode) for the missing element + Action: invoke living-doc-pageobject-scan (Maintain mode) for the missing element ``` ### Step 4 — Validate AC coverage @@ -131,10 +146,10 @@ MISSING STEP + MISSING PAGEOBJECT METHOD: Every `Active` or `Implemented` AC must map to at least one scenario. The coverage report must list **every** AC on the User Story, including skipped ones. Use these skip reasons verbatim so the output is predictable and auditable: -- `Planned` → `skipped — not yet active` -- `Deprecated` → `skipped — deprecated AC` +- `Planned`: `skipped — not yet active` +- `Deprecated`: `skipped — deprecated AC` -Run `scripts/coverage_report.py ` for a full catalog report. +Run `scripts/coverage_report.py ` for a full coverage report. ``` AC COVERAGE REPORT — US-001 @@ -145,24 +160,41 @@ AC COVERAGE REPORT — US-001 AC:US-001-05 (Deprecated): ⏭ skipped — deprecated AC ``` -Use `scripts/coverage_report.py` to generate this report across the full catalog. +Use `scripts/coverage_report.py` to generate this report across all entities. ### Step 5 — Output artifacts -**`.feature` file** — one per User Story, named `us--.feature` in lowercase. When showing the generated output, include the filename in a comment or header inside the gherkin block: +**`.feature` file** — one per User Story, named `us--.feature` in lowercase. The file starts with a header block (matching the project's US feature file convention) and uses `@AC:` traceability tags above each scenario. When showing the generated output, include the filename as a comment: ```gherkin # us-001-place-an-online-order.feature + +# Source: https://github.com///issues/ + +# Business Value: +# - + +# Acceptance Criteria: +# +# AC:US-001-01 (v1.0.0 - Active) +# - Customer places an order with a saved payment method. +# +# AC:US-001-02 (v1.0.0 - Active) +# - Order is rejected when the payment card is declined. 
+ +@US_ID:US-001 Feature: Place an online order As a registered customer I can place an order for in-stock items So that the items are delivered to my address - # AC: US-001-01 (v1.0.0 – Active) — customer places an order + # AC:US-001-01 (v1.0.0 - Active) — customer places an order with a saved payment method + @AC:US-001-01 Scenario: Customer successfully places an order ... - # AC: US-001-02 (v1.0.0 – Active) — Payment failure path + # AC:US-001-02 (v1.0.0 - Active) — order is rejected when the payment card is declined + @AC:US-001-02 Scenario: Order rejected when payment card is declined ... ``` @@ -194,6 +226,59 @@ Feature: Place an online order --- +## Functionality scenarios + +When the source is a Functionality (`FUNC-`) rather than a User Story, apply the same workflow but with these differences: + +| Aspect | User Story (E2E) | Functionality (system test) | +|---|---|---| +| AC ID format | `AC:US--` | `AC:FUNC--` | +| File location | `features/us/us--.feature` | `features/functionalities//func--.feature` | +| File header | `# Source:` (optional), `# Business Value:`, `# Acceptance Criteria:` block + `@US_ID:US-` tag | `# Source:` (optional), `# Rationale:` (optional), `# Acceptance Criteria:` block + `@FUNC_ID:FUNC-` tag | +| Feature block | `Feature: ` with As-a/I-can/so-that | `Feature: ` (no narrative) | +| Scope | End-to-end, from user's perspective | One atomic behavior, input to output contract | +| Language | Business domain language | Business domain language — same rule; no code calls, no selector references | + +**Functionality feature file example:** + +```gherkin +# func-001-validate-password-strength.feature + +# Source: https://github.com///issues/ ← optional + +# Rationale: +# - ← optional + +# Acceptance Criteria: +# +# AC:FUNC-001-01 (v1.0.0 - Active) +# - Returns valid=true when the password satisfies all complexity rules. +# +# AC:FUNC-001-02 (v1.0.0 - Active) +# - Raises INVALID_PASSWORD when the password is shorter than 8 characters. 
+ +@FUNC_ID:FUNC-001 +Feature: Login Page — Validate Password Strength + + # AC:FUNC-001-01 (v1.0.0 - Active) — returns valid=true when password satisfies all complexity rules + @AC:FUNC-001-01 + Scenario: Password meets all complexity rules + Given a password with at least 8 characters, one uppercase, one lowercase, and one number + When password strength is validated + Then the result is valid + + # AC:FUNC-001-02 (v1.0.0 - Active) — raises INVALID_PASSWORD when password is shorter than 8 characters + @AC:FUNC-001-02 + Scenario: Password too short + Given a password with 7 or fewer characters + When password strength is validated + Then the result is invalid with code INVALID_PASSWORD +``` + +Functionality scenarios are **not** unit tests written in Gherkin. Steps must still describe observable business-facing input/output — never internal method calls, DB queries, or selector names. + +--- + ## Out-of-scope redirects | Request | Correct skill | diff --git a/skills/living-doc-scenario-creator/scripts/coverage_report.py b/skills/living-doc-scenario-creator/scripts/coverage_report.py index d0b328c..c9bcf36 100644 --- a/skills/living-doc-scenario-creator/scripts/coverage_report.py +++ b/skills/living-doc-scenario-creator/scripts/coverage_report.py @@ -5,9 +5,9 @@ Usage: python coverage_report.py -Scans recursively for '# AC: US--' traceability comments above -Scenario lines. Loads User Story JSON files from and produces a coverage -table showing which ACs are covered and which are gaps. +Scans recursively for '@AC:' Cucumber traceability tags on tag lines +above Scenario lines. Loads User Story JSON files from and produces a +coverage table showing which ACs are covered and which are gaps. Expected User Story JSON structure: { @@ -23,8 +23,8 @@ ] } -AC link comment format (written by living-doc-scenario-creator): - # AC: US-001-01 (v1.0.0 – Active) — description +@AC: tag format (written by living-doc-scenario-creator): + @AC:US-001-01 Scenario: ... 
Only ACs with state Active or Implemented are included in the coverage check. @@ -40,26 +40,40 @@ import sys from pathlib import Path -# Matches the AC ID in a traceability comment: # AC: US-001-01 ... -AC_TAG = re.compile(r"#\s*AC:\s*((?:US|FEAT|FUNC)-\d{3}-\d{2})", re.IGNORECASE) +# Matches an @AC: Cucumber tag with optional /param:value segments: +# @AC:US-1-01 or @AC:US-001-01/aspect:username-input +# Group 1 captures the AC ID only (params are ignored for coverage purposes). +AC_TAG = re.compile( + r"@AC:((?:US|FEAT|FUNC)-\d+-\d{2})(?:/[a-z][\w-]*:[^\s/@]+)*", + re.IGNORECASE, +) +TAG_LINE = re.compile(r"^\s*@\S+") SCENARIO_LINE = re.compile(r"^\s*(Scenario:|Scenario Outline:)\s*", re.IGNORECASE) ACTIVE_STATES = {"active", "implemented"} SKIP_STATES = {"deprecated", "planned"} +def get_ac_tags_above(lines: list[str], scenario_index: int) -> list[str]: + """Return all @AC: tag values from consecutive tag lines immediately above a scenario.""" + ac_ids: list[str] = [] + i = scenario_index - 1 + while i >= 0 and TAG_LINE.match(lines[i]): + for m in AC_TAG.finditer(lines[i]): + ac_ids.append(m.group(1).upper()) + i -= 1 + return ac_ids + + def collect_covered_ac_ids(features_dir: Path) -> dict[str, list[str]]: - """Return {ac_id_upper: [feature_filename, ...]} for every AC tag above a Scenario.""" + """Return {ac_id_upper: [feature_filename, ...]} for every @AC: tag above a Scenario.""" covered: dict[str, list[str]] = {} for feature_file in sorted(features_dir.rglob("*.feature")): lines = feature_file.read_text(encoding="utf-8").splitlines() for i, line in enumerate(lines): if not SCENARIO_LINE.match(line): continue - prev = lines[i - 1].strip() if i > 0 else "" - m = AC_TAG.search(prev) - if m: - ac_id = m.group(1).upper() + for ac_id in get_ac_tags_above(lines, i): covered.setdefault(ac_id, []).append(feature_file.name) return covered @@ -86,9 +100,9 @@ def load_user_stories(living_doc_dir: Path) -> list[dict]: def normalise_ac_id(us_id: str, raw_id: str) -> 
str: - """Normalise AC IDs that may be stored as '01' or 'US-001-01'.""" + """Normalise AC IDs that may be stored as '01' or 'US-001-01' or 'US-1-01'.""" raw = raw_id.strip().upper() - if re.match(r"^(US|FEAT|FUNC)-\d{3}-\d{2}$", raw): + if re.match(r"^(US|FEAT|FUNC)-\d+-\d{2}$", raw): return raw # Stored as just the suffix: '01' → 'US-001-01' if re.match(r"^\d{2}$", raw): diff --git a/skills/living-doc-update/SKILL.md b/skills/living-doc-update/SKILL.md index 164dda6..2b889dd 100644 --- a/skills/living-doc-update/SKILL.md +++ b/skills/living-doc-update/SKILL.md @@ -11,7 +11,7 @@ description: > "change status of user story". Does NOT trigger for: creating new entities from scratch (use living-doc-create-user-story, living-doc-create-feature, or living-doc-create-functionality), finding gaps - (use living-doc-gap-finder), generating scenarios (use living-doc-scenario-creator). + (use living-doc-gap-finder). license: Apache-2.0 compatibility: GitHub Copilot @@ -47,13 +47,12 @@ When adding a new AC to an existing User Story: - Happy path covered? - Error paths covered? - Alternative flows covered? -4. Check whether the new AC affects any existing Gherkin scenarios — flag for - `gherkin-living-doc-sync` if so +4. Confirm whether the new AC requires new or updated tests — flag for the appropriate testing + workflow if so When modifying an existing AC **keep the AC ID stable** — changing the ID breaks traceability -to linked tests and Gherkin scenarios. Only update the `description`, `given`, `when`, `then`, or -state fields. If the changed AC text affects the wording of linked Gherkin steps, flag the linked -scenarios for `gherkin-living-doc-sync`. +to linked tests. Only update the `description`, `given`, `when`, `then`, or +state fields. If the changed AC text affects linked tests, flag them for update. 
## Promote a User Story from planned to active @@ -87,7 +86,7 @@ Rules: - Always deprecate — never delete entities (preserves audit trail) - Add `deprecated_code_commit` when the code was removed in a commit - Add `superseded_by` when a replacement entity exists -- Flag any Gherkin scenarios linked to the deprecated entity for `gherkin-living-doc-sync` +- Flag any tests linked to the deprecated entity for update or removal ## Update Feature ownership or dependencies @@ -100,7 +99,7 @@ When an AC is moved out of the current sprint but not permanently removed: - Set `status: descoped` and add `descoped_at` (date) and `descoped_reason` fields — **do not delete the AC** (preserves audit trail) - Add `future_release` field if the work is planned for a later sprint -- Flag any linked Gherkin scenarios for `@wip` or `@pending` tagging via `gherkin-living-doc-sync` +- Flag any linked tests for `@skip` or `@pending` tagging ``` AC:US-042-03 (v1.2.0 – descoped) @@ -118,7 +117,6 @@ AC:US-042-03 (v1.2.0 – descoped) | Create a new Feature | `living-doc-create-feature` | | Create a new Functionality | `living-doc-create-functionality` | | Find gaps in living documentation | `living-doc-gap-finder` | -| Generate Gherkin scenarios from a User Story | `living-doc-scenario-creator` | ## Script — `scripts/validate_entity.py` @@ -145,7 +143,7 @@ an ID format is wrong, or a status value is invalid. ## Output change summary After every update, emit a structured change record. 
For **modified AC text**, show the old and -new values clearly labelled, and list any linked Gherkin scenarios that need re-syncing: +new values clearly labelled, and list any linked tests that need updating: ``` LIVING DOC UPDATE — 2026-05-15 @@ -155,9 +153,8 @@ LIVING DOC UPDATE — 2026-05-15 ~ Modified AC AC:US-042-01: OLD: "Payment must complete within 3 seconds under normal load (p99 SLA)" NEW: "Payment must complete within 2 seconds under normal load (p99 SLA)" - Linked Gherkin scenarios requiring re-sync: - → checkout.feature:41 — Scenario: Payment completes within SLA + Linked tests requiring update: + checkout.feature:41 — Scenario: Payment completes within SLA Downstream flags: - → Run gherkin-living-doc-sync: changed AC text affects linked scenario wording - → Run living-doc-gap-finder to confirm coverage after update + Run living-doc-gap-finder to confirm coverage after update ``` diff --git a/skills/references/living-doc-glossary.md b/skills/references/living-doc-glossary.md index da8131a..ac6d68b 100644 --- a/skills/references/living-doc-glossary.md +++ b/skills/references/living-doc-glossary.md @@ -27,6 +27,55 @@ so that . - `deprecation_reason` — why it was deprecated - `superseded_by` — ID of the replacement entity (optional) +**US feature file header format** (as used in `features/us/us--.feature`): + +The header comment block at the top of a US feature file holds all US metadata. This data is +collected during the living documentation output generation process (data mining). + +```gherkin +# Source: https://github.com///issues/ + +# Business Value: +# - + +# Not in scope: ← optional +# - + +# Preconditions: ← optional +# - + +# Acceptance Criteria: +# +# AC:US--01 (v - ) +# - +# - : , ← optional; used for {placeholder} ACs +# +# AC:US--02 (v - ) +# - + +@US_ID:US- +Feature: + As a , I can , so that . + + Background: ← optional + Given + + # AC:US--01 (v - ) — + @AC:US--01 + Scenario: + ... 
+``` + +**Header sections:** +| Section | Required | Purpose | +|---|---|---| +| `# Source:` | Optional | Link to the original issue tracker entry or the pre-BDD living doc location — primarily useful during migration from a legacy format | +| `# Business Value:` | Yes | Why this User Story exists (bullets) | +| `# Not in scope:` | Optional | Explicit exclusions | +| `# Preconditions:` | Optional | System-level state required before test execution | +| `# Acceptance Criteria:` | Yes | Full AC listing with IDs, versions, and states | +| `@US_ID:US-` tag | Yes | Machine-parseable User Story ID (feature-level tag) | + ### Feature A named system surface — the structural layer between User Stories and atomic behaviors. @@ -59,7 +108,10 @@ An atomic, fast-testable behavior — a single verb phrase describing one respon - ID format: `FUNC-` (e.g. `FUNC-001`) - Name: `` (e.g. "Login Page – Validate Password Strength") - Belongs to: one parent **Feature** -- Owns: **Functionality-level Acceptance Criteria** (atomic input → output statements) +- Owns: **Functionality-level Acceptance Criteria** (atomic input to output statements) +- Test anchor: a **Functionality feature file** (`func-.feature`) under + `features/functionalities//` — one file per Functionality, containing + all AC-linked system-test scenarios once implemented. - Status: `planned | active | deprecated` - Deprecation metadata (set when `status: deprecated`): - `deprecated_at` — date the entity was deprecated @@ -69,6 +121,40 @@ An atomic, fast-testable behavior — a single verb phrase describing one respon Functionalities differ from User Story ACs: they are atomic and fast-testable, not end-to-end. A single User Story may trigger multiple Functionalities. 
+**Functionality feature file header format** (draft — exact spec TBD, follows US conventions): + +```gherkin +# Source: https://github.com///issues/ + +# Rationale: ← optional (replaces Business Value for atomic behaviors) +# - + +# Not in scope: ← optional +# - + +# Acceptance Criteria: +# +# AC:FUNC--01 (v - ) +# - +# +# AC:FUNC--02 (v - ) +# - + +@FUNC_ID:FUNC- +Feature: + # No scenarios yet — uncovered ACs flagged by coverage_report.py. + # When adding scenarios: include both # AC: comment and @AC:[/param:value] tag above each Scenario. +``` + +**Header sections:** +| Section | Required | Purpose | +|---|---|---| +| `# Source:` | Optional | Link to the original issue tracker entry or the pre-BDD living doc location — primarily useful during migration from a legacy format | +| `# Rationale:` | Optional | Why this atomic behavior exists | +| `# Not in scope:` | Optional | Explicit exclusions | +| `# Acceptance Criteria:` | Yes | Full AC listing with IDs, versions, and states | +| `@FUNC_ID:FUNC-` tag | Yes | Machine-parseable Functionality ID (feature-level tag) | + ### Acceptance Criterion (AC) A binary pass/fail statement that defines a verifiable condition. @@ -78,17 +164,65 @@ Each AC is: - **Binary** — clear pass/fail; no "usually" or "typically" - **Single placeholder** — at most ONE `{placeholder}` per AC statement. If two aspects vary independently, write a separate AC for each. -**AC identifier and state format:** +**AC identifier and state format** (in file header and entity files): ``` -AC:- (v) - – - – : value1, value2, ... - – Rationale: ← optional +AC:- (v - ) + - + - : value1, value2, ... 
+ - Rationale: ← optional ``` State values: `Planned | Implemented | Active | Deprecated` +**Scenario traceability:** living-doc scenarios (US and Functionality feature files) carry two +complementary annotations — a human-readable `# AC:` comment and a machine-readable `@AC:` tag: + +```gherkin +# AC:US-1-01 (v1.0.0 - Active) — customer places an order with a saved payment method +@AC:US-1-01 +Scenario: Customer successfully places an order + ... +``` + +When a scenario covers only **one aspect** of a multi-aspect AC, encode the aspect directly in +the `@AC:` tag using the `/param:value` param syntax, and mirror it in the comment: + +```gherkin +# AC:US-1-01 (v1.0.0 - Active) — displays {required field} on login screen | aspect: username input +@AC:US-1-01/aspect:username-input +Scenario: Login form shows the username input field + ... +``` + +Multiple ACs — one comment + tag pair per AC: + +```gherkin +# AC:US-1-01 (v1.0.0 - Active) — invalid credentials show an error message +# AC:US-1-02 (v1.0.0 - Active) — account lockout after 3 failed attempts +@AC:US-1-01 +@AC:US-1-02 +@Regression +Scenario: User is locked out after repeated failed logins + ... +``` + +**Tag format:** `@AC:[/param:value...]` + +| Param | Purpose | Example | +|---|---|---| +| `/aspect:` | Names the specific aspect of the AC this scenario covers | `@AC:US-1-01/aspect:username-input` | + +Additional `/param:value` segments can be appended as needed — the format is open for extension. + +- The `# AC:` comment is human-readable context: AC ID, version, state, description, optional aspect. +- The `@AC:` Cucumber tag is machine-readable: drives script scanning, coverage reports, and sync checks. +- US scenarios: `@AC:US--` (e.g. `@AC:US-1-01`) +- Functionality scenarios: `@AC:FUNC--` (e.g. `@AC:FUNC-001-01`) +- Both annotations are required for living-doc feature files (`features/us/` and `features/functionalities/`). 
+- Feature files outside the living-doc directories (smoke tests, regression suites, exploratory probes) + do not require `@AC:` tags. + Deprecated ACs include a removal note: ``` @@ -105,35 +239,35 @@ AC:- (v – Planned) – future_release: ← optional; target sprint or release ``` -**User Story AC examples — end-to-end, written from the user's perspective:** +**User Story AC examples** (in the `# Acceptance Criteria:` file header block): ``` -AC:US-001-01 (v1.0.0 – Active) - – The login screen displays {required field}. - – Required field: username input, password input, login button - – Rationale: Accessibility standard — all interactive controls must be visible on load. +AC:US-001-01 (v1.0.0 - Active) + - The login screen displays {required field}. + - Required field: username input, password input, login button + - Rationale: Accessibility standard — all interactive controls must be visible on load. -AC:US-001-02 (v1.1.0 – Active) - – An inline field validation message is shown when invalid credentials are submitted. +AC:US-001-02 (v1.1.0 - Active) + - An inline field validation message is shown when invalid credentials are submitted. -AC:US-001-03 (v2.1.0 – Deprecated – removal planned v3.0.0) - – A "Remember me" checkbox retains the session across browser restarts. - – Rationale: Deprecated due to security policy change in v2.0 — persistent sessions no longer permitted. +AC:US-001-03 (v2.1.0 - Deprecated - removal planned v3.0.0) + - A "Remember me" checkbox retains the session across browser restarts. + - Rationale: Deprecated due to security policy change in v2.0 — persistent sessions no longer permitted. ``` -**Functionality AC examples — atomic input → output:** +**Functionality AC examples** (in the `# Acceptance Criteria:` file header block): ``` -AC:FUNC-001-01 (v1.0.0 – Active) - – Returns valid=true when the password satisfies all complexity rules. +AC:FUNC-001-01 (v1.0.0 - Active) + - Returns valid=true when the password satisfies all complexity rules. 
-AC:FUNC-001-02 (v1.0.0 – Active) - – Raises {error code} when the credential check fails. - – Error code: INVALID_PASSWORD, USER_NOT_FOUND, ACCOUNT_LOCKED - – Rationale: Distinct error codes per failure reason, required by the global auth error contract. +AC:FUNC-001-02 (v1.0.0 - Active) + - Raises {error code} when the credential check fails. + - Error code: INVALID_PASSWORD, USER_NOT_FOUND, ACCOUNT_LOCKED + - Rationale: Distinct error codes per failure reason, required by the global auth error contract. -AC:FUNC-001-03 (v1.0.0 – Active) - – Rejects passwords shorter than 8 characters. +AC:FUNC-001-03 (v1.0.0 - Active) + - Rejects passwords shorter than 8 characters. ``` --- @@ -142,15 +276,21 @@ AC:FUNC-001-03 (v1.0.0 – Active) ``` User Story (US) - └── links to → Feature (FEAT) - └── owns → Functionality (FUNC) - └── owns → Functionality ACs - └── can map to → unit/integration tests - └── owns → User Story ACs - └── can map to → BDD Scenarios (.feature files) - └── implemented by → Step Definitions - └── delegates to → Feature test abstractions - └── can map to → API coverage / contract tests + └── links to: Feature (FEAT) + └── owns: Functionality (FUNC) + └── owns: Functionality ACs + └── maps to: Functionality feature file (system test) + | features/functionalities//.feature + | @FUNC_ID tag + @AC:FUNC-nnn-nn tagged scenarios + | └── implemented by: Step Definitions + └── can map to: unit/integration tests + └── owns: User Story ACs (in # Acceptance Criteria: header block) + └── linked via: @AC:US-n-nn tags on Scenarios + └── can map to: E2E BDD Scenarios (features/us/*.feature) + @US_ID tag + @AC:US-n-nn tagged scenarios + └── implemented by: Step Definitions + └── delegates to: PageObjects + └── can map to: API coverage / contract tests ``` --- @@ -161,8 +301,8 @@ User Story (US) |---|---|---| | `living-doc-create-user-story` | User Story entity | Feature entities | | `living-doc-create-feature` | Feature entity | User Story entities | -| 
`living-doc-create-functionality` | Functionality entity | Feature entity | -| `living-doc-pageobject-scan` | Surface wrapper classes + Feature stubs | App URL or test suite | -| `living-doc-scenario-creator` | BDD scenario files (.feature) | User Story entities, Feature test abstractions | +| `living-doc-create-functionality` | Functionality entity + Functionality feature file stub | Feature entity | +| `living-doc-pageobject-scan` | PageObject files + Functionality feature file stubs | App URL or test suite | +| `living-doc-scenario-creator` | E2E BDD scenario files (US) + Functionality feature files (FUNC) | US / FUNC entities, PageObjects | | `living-doc-tutorial-creator` | Tutorial documents | BDD scenario files, User Story entities | | `living-doc-gap-finder` | Gap report | All of the above | From 337bcfdad0016d242e059695705e7aa265f7535a Mon Sep 17 00:00:00 2001 From: miroslavpojer Date: Wed, 27 May 2026 15:22:56 +0200 Subject: [PATCH 22/35] Updated evals and finished round 1 of trigger evals testing. 
--- .../evals/living-doc-bdd-copilot/evals.json | 68 ++++++ .../living-doc-bdd-copilot/fixture-map.md | 33 +++ .../living-doc-bdd-copilot/trigger-eval.json | 8 +- .../evals/living-doc-copilot/evals.json | 67 +++++- .../evals/living-doc-copilot/fixture-map.md | 33 +++ .../living-doc-copilot/trigger-eval.json | 8 +- skills/gherkin-living-doc-sync/SKILL.md | 17 +- .../gherkin-living-doc-sync/evals/evals.json | 66 ++++++ .../evals/fixture-map.md | 33 +++ .../evals/trigger-eval.json | 6 +- skills/gherkin-scenario/SKILL.md | 10 +- .../gherkin-scenario/evals/trigger-eval.json | 5 +- skills/gherkin-step/evals/trigger-eval.json | 2 +- skills/living-doc-create-feature/SKILL.md | 17 +- .../evals/evals.json | 68 +++++- .../evals/fixture-map.md | 17 +- .../evals/trigger-eval.json | 6 +- .../living-doc-create-functionality/SKILL.md | 7 +- .../evals/evals.json | 54 +++++ .../evals/fixture-map.md | 11 +- .../evals/trigger-eval.json | 6 +- skills/living-doc-create-user-story/SKILL.md | 21 +- .../evals/evals.json | 56 ++++- .../evals/files/incomplete-user-story.json | 2 +- .../evals/fixture-map.md | 11 +- .../evals/trigger-eval.json | 6 +- skills/living-doc-gap-finder/SKILL.md | 14 +- skills/living-doc-gap-finder/evals/evals.json | 63 ++++- .../evals/fixture-map.md | 21 +- .../evals/trigger-eval.json | 8 +- skills/living-doc-impact-analysis/SKILL.md | 3 +- .../evals/evals.json | 194 ++++++++------- .../evals/fixture-map.md | 26 +- .../evals/trigger-eval.json | 24 ++ skills/living-doc-pageobject-scan/SKILL.md | 10 +- .../evals/evals.json | 45 ++++ .../evals/fixture-map.md | 32 +++ .../evals/trigger-eval.json | 6 +- skills/living-doc-scenario-creator/SKILL.md | 11 +- .../evals/evals.json | 43 +++- .../evals/fixture-map.md | 32 +++ .../evals/trigger-eval.json | 8 +- skills/living-doc-update/SKILL.md | 18 +- skills/living-doc-update/evals/evals.json | 223 +++++++++++------- skills/living-doc-update/evals/fixture-map.md | 23 +- .../living-doc-update/evals/trigger-eval.json | 24 ++ 46 
files changed, 1182 insertions(+), 284 deletions(-) create mode 100644 .github/agents/evals/living-doc-bdd-copilot/fixture-map.md create mode 100644 .github/agents/evals/living-doc-copilot/fixture-map.md create mode 100644 skills/gherkin-living-doc-sync/evals/fixture-map.md create mode 100644 skills/living-doc-pageobject-scan/evals/fixture-map.md create mode 100644 skills/living-doc-scenario-creator/evals/fixture-map.md diff --git a/.github/agents/evals/living-doc-bdd-copilot/evals.json b/.github/agents/evals/living-doc-bdd-copilot/evals.json index bb0d50c..cada06e 100644 --- a/.github/agents/evals/living-doc-bdd-copilot/evals.json +++ b/.github/agents/evals/living-doc-bdd-copilot/evals.json @@ -114,6 +114,74 @@ "Updates manifest.json to remove deprecated entries", "Flags catalog entities for deprecation and hands off to @living-doc-copilot" ] + }, + { + "id": 9, + "category": "regression", + "prompt": "The seed.yaml is present at .copilot/bdd/seed.yaml but manifest.json is missing. What should the agent do?", + "expected_output": "This is a first exploration run — partial state rule applies. seed.yaml present but manifest.json absent means the session starts fresh: treat this as the first run with a clean slate. Begin the crawl from base_url in seed.yaml; do not assume any surfaces have been discovered. Create manifest.json as crawl progresses. Do not attempt to 'resume' from a non-existent manifest.", + "files": [], + "expectations": [ + "Identifies partial state: seed.yaml present, manifest.json absent", + "Treats this as a first exploration run — clean slate", + "Starts crawl from base_url in seed.yaml", + "Creates manifest.json during crawl — does not assume prior discovery", + "Does not attempt to resume from a non-existent manifest" + ] + }, + { + "id": 10, + "category": "regression", + "prompt": "The seed.yaml contains: 'credentials: {username: admin, password: secret123}'. 
Is this correct?", + "expected_output": "No — literal credentials must never be stored in seed.yaml. This is a security violation. Replace the literal values with environment variable references: 'credentials: {username: env:BDD_USERNAME, password: env:BDD_PASSWORD}'. Literal credentials in a file committed to source control are exposed to anyone with repository access. The agent must flag this immediately and refuse to proceed with the current seed.yaml until the credentials are replaced with env:VAR_NAME references.", + "files": [], + "expectations": [ + "Flags literal credentials as a security violation", + "Refuses to proceed with literal credentials in seed.yaml", + "Provides the corrected env:VAR_NAME format", + "Explains the risk: credentials exposed to anyone with repository access" + ] + }, + { + "id": 11, + "category": "edge-case", + "prompt": "During crawling, the agent reaches a multi-step checkout wizard (/checkout/step-2) but doesn't know what values to enter in the required 'Delivery Zone Code' field. How should this be handled?", + "expected_output": "The agent enters the Source E guided traversal protocol. It takes a screenshot and shows the user what it sees. It asks: 'I've reached a decision point at /checkout/step-2. What should I do next? Please provide the valid Delivery Zone Code value (or another way to progress past this step).' 
After the user provides the value, the agent executes the action via MCP Playwright and immediately appends the action to guided_steps in seed.yaml so the route can be re-navigated in future sessions without prompting.", + "files": [], + "expectations": [ + "Takes a screenshot and shows the user the current state", + "Asks the user for the missing business-specific input value", + "Executes the guided action via MCP Playwright after receiving the answer", + "Appends the action to guided_steps in seed.yaml for future sessions", + "Does not invent or guess business-specific field values" + ] + }, + { + "id": 12, + "category": "output-format", + "prompt": "After scanning /login and generating a LoginPage, show me what the manifest.json entry for /login looks like.", + "expected_output": "The manifest.json entry for /login includes: pageobject_path (path to the generated LoginPage file), feature_id (FEAT- or FEAT-UNKNOWN if unlinked), last_scanned (ISO timestamp), elements (list of discovered elements with data_cy and tag), coverage_gaps (empty list initially), and navigation_context with prerequisites, navigation_steps, data_requirements, auth_role, and notes. The feature_id is FEAT-UNKNOWN if no matching Feature entity exists in the living doc — flag this route as 'needs Feature entity' and hand off to @living-doc-copilot.", + "files": [], + "expectations": [ + "manifest.json entry has: pageobject_path, feature_id, last_scanned, elements, coverage_gaps, navigation_context", + "feature_id is FEAT-UNKNOWN if no matching Feature entity exists", + "last_scanned is an ISO timestamp", + "navigation_context includes prerequisites, navigation_steps, auth_role", + "Missing Feature entity is flagged for @living-doc-copilot handoff" + ] + }, + { + "id": 13, + "category": "regression", + "prompt": "During seed assembly, the living doc catalog at docs/living-doc/ has FEAT-checkout mapped to route /checkout and FEAT-account mapped to route /account/orders. No sitemap.xml exists. 
How should known_routes in seed.yaml be populated?", + "expected_output": "Agent loads Source A (living documentation). Extracts Feature-to-route mappings: FEAT-checkout \u2192 /checkout and FEAT-account \u2192 /account/orders. Adds both to known_routes in seed.yaml. Notes that Source B (sitemap.xml) is absent \u2014 no error is raised. Routes not listed in the living doc will be discovered dynamically during the crawl.", + "files": [], + "expectations": [ + "Source A: Feature-to-route mappings extracted from the living doc catalog", + "Both routes added to known_routes in seed.yaml", + "Source B (sitemap) noted as absent \u2014 no error raised", + "Notes that unlisted routes will be discovered dynamically during crawl" + ] } ] } diff --git a/.github/agents/evals/living-doc-bdd-copilot/fixture-map.md b/.github/agents/evals/living-doc-bdd-copilot/fixture-map.md new file mode 100644 index 0000000..9573e21 --- /dev/null +++ b/.github/agents/evals/living-doc-bdd-copilot/fixture-map.md @@ -0,0 +1,33 @@ +# Fixture Map — living-doc-bdd-copilot agent evals + +## Eval coverage summary + +| Eval ID | Category | Description | Fixture files | +|---------|----------|-------------|---------------| +| 1 | happy-path | Business Seed assembly — seed.yaml structure | — | +| 2 | happy-path | Create mode: PageObject generation from crawled surface | — | +| 3 | happy-path | Scenario generation from US ACs | — | +| 4 | regression | RE-SCAN mode — selector drift detection and repair | — | +| 5 | regression | HEALING mode — broken step definitions | — | +| 6 | negative | User Story creation request → route to @living-doc-copilot | — | +| 7 | paraphrase | "fix failing tests" → HEALING mode trigger | — | +| 8 | regression | REMOVE mode — full feature removal with pre-deletion checklist | — | +| 9 | regression | Partial state rule: seed.yaml present, manifest.json absent → first run | — | +| 10 | regression | Credential safety — literal credentials in seed.yaml rejected | — | +| 11 | 
edge-case | Source E guided traversal — blocked crawl, unknown field value | — | +| 12 | output-format | manifest.json entry structure for a scanned route | — | + +## Trigger eval summary + +| Count | Triggers (should_trigger=true) | Non-triggers (should_trigger=false) | +|-------|-------------------------------|--------------------------------------| +| 24 total | 19 true | 5 false | + +False cases: +- `create a User Story` → @living-doc-copilot +- `write a unit test` → out of scope (no @sdet-copilot agent defined) +- `update AC state` → @living-doc-copilot +- `TypeScript quality gate` → @quality-gate-copilot (out of scope) +- `update AC on US-007` → @living-doc-copilot + +> No fixture files — all evals use inline prompt/expected_output; agent behavior is assessed against the agent.md operating rules and skill definitions. diff --git a/.github/agents/evals/living-doc-bdd-copilot/trigger-eval.json b/.github/agents/evals/living-doc-bdd-copilot/trigger-eval.json index 8369a23..7b54d2b 100644 --- a/.github/agents/evals/living-doc-bdd-copilot/trigger-eval.json +++ b/.github/agents/evals/living-doc-bdd-copilot/trigger-eval.json @@ -16,7 +16,11 @@ {"id": 15, "query": "Write the step definitions for the checkout scenarios", "should_trigger": true, "reason": "'step definitions' trigger phrase"}, {"id": 16, "query": "Generate Gherkin from user story US-003", "should_trigger": true, "reason": "'gherkin from user story' trigger phrase"}, {"id": 17, "query": "Create a User Story for the loyalty points redemption feature", "should_trigger": false, "reason": "Living doc catalog entity creation — routes to @living-doc-copilot"}, - {"id": 18, "query": "Write a unit test for the discount calculation function", "should_trigger": false, "reason": "Unit test authoring — routes to @sdet-copilot"}, + {"id": 18, "query": "Write a unit test for the discount calculation function", "should_trigger": false, "reason": "Unit test authoring — out of scope for this toolkit (no @sdet-copilot agent defined)"}, {"id": 19,
"query": "Update the AC state on US-007-02 to DEPRECATED", "should_trigger": false, "reason": "Catalog entity state update — routes to @living-doc-copilot"}, - {"id": 20, "query": "Run the TypeScript quality gate for the frontend", "should_trigger": false, "reason": "Quality gate execution — routes to @quality-gate-copilot"} + {"id": 20, "query": "Run the TypeScript quality gate for the frontend", "should_trigger": false, "reason": "Quality gate execution — out of scope for this agent"}, + {"id": 21, "query": "The manifest.json is missing — start a first exploration run from the seed file", "should_trigger": true, "reason": "Partial state: seed present, manifest absent → first exploration run — 'scan webapp' pattern"}, + {"id": 22, "query": "The seed.yaml has literal credentials — is that correct?", "should_trigger": true, "reason": "Credential safety rule enforcement during seed assembly — BDD session setup task"}, + {"id": 23, "query": "I've hit a guided traversal point — the checkout wizard needs a delivery zone code", "should_trigger": true, "reason": "Source E guided traversal protocol — blocked crawl point during exploration"}, + {"id": 24, "query": "Update the AC on US-007 to change the payment timeout to 30 seconds", "should_trigger": false, "reason": "AC update is a catalog layer operation — routes to @living-doc-copilot"} ] diff --git a/.github/agents/evals/living-doc-copilot/evals.json b/.github/agents/evals/living-doc-copilot/evals.json index 16ba395..bc0fc62 100644 --- a/.github/agents/evals/living-doc-copilot/evals.json +++ b/.github/agents/evals/living-doc-copilot/evals.json @@ -18,7 +18,7 @@ "id": 2, "category": "happy-path", "prompt": "Create a User Story for the promo code feature. ACs: (1) Valid promo reduces cart total by 10%. (2) Expired promo shows an error message.", - "expected_output": "Agent creates a User Story entity with the As-a/I-can/so-that narrative. 
Each AC carries all required metadata: state (ACTIVE or PLANNED), version, pre-conditions, and not_in_scope. ACs are atomic — one input condition and one observable outcome each. AC IDs follow the AC:- format. The entity is written in the project's confirmed Storage Profile format.", + "expected_output": "Agent creates a User Story entity with the As-a/I-can/so-that narrative. Each AC carries all required metadata: state (PLANNED / ACTIVE / DEPRECATED / IN_REVIEW), version, pre-conditions, and not_in_scope. ACs are atomic — one input condition and one observable outcome each. AC IDs follow the AC:- format. The entity is written in the project's confirmed Storage Profile format.", "files": [], "expectations": [ "User Story has an As-a/I-can/so-that narrative", @@ -108,6 +108,71 @@ "Flags linked Gherkin scenarios as potentially stale — defers to @living-doc-bdd-copilot", "Shows old and new AC side by side for confirmation before writing" ] + }, + { + "id": 9, + "category": "regression", + "prompt": "We confirmed the storage profile 5 messages ago — the living doc uses YAML files in docs/living-doc/. Now create a Feature entity for the Payment API without asking about storage again.", + "expected_output": "Agent does NOT re-ask for the storage profile — the confirmed format from earlier in the session is reused. Creates the Feature entity in the already-confirmed YAML format without prompting for storage details. Proceeds directly to eliciting the Feature metadata (surface type, purpose, owners, dependencies).", + "files": [], + "expectations": [ + "Does not re-ask for the storage profile within the same session", + "Reuses the confirmed Storage Profile from earlier in the session", + "Creates the entity in the correct confirmed format", + "Proceeds directly to eliciting the missing Feature metadata" + ] + }, + { + "id": 10, + "category": "regression", + "prompt": "I'm creating AC:US-007-02 for the promo code user story. 
It just says: 'Expired promo codes are rejected.' Is that enough?", + "expected_output": "No. The AC is missing required metadata fields. Every AC must include: state (PLANNED / ACTIVE / DEPRECATED / IN_REVIEW), version (e.g. v1.0.0), pre-conditions (what must hold before this AC can be tested), and not_in_scope (explicit exclusions). Also, the AC text is incomplete — it does not specify the observable outcome (e.g. what error message the customer sees). Complete the AC before creating: state=PLANNED, version=v1.0.0, pre-conditions (customer is on checkout page with items in cart), not_in_scope (does not cover code-reuse attacks).", + "files": [], + "expectations": [ + "Flags missing required AC metadata: state, version, pre-conditions, not_in_scope", + "Flags the AC text as incomplete — no observable outcome specified", + "Provides a complete AC example with all required fields", + "Does not create the entity until all required fields are present" + ] + }, + { + "id": 11, + "category": "negative", + "prompt": "Scan the checkout page to discover its UI elements and generate a PageObject.", + "expected_output": "UI exploration and PageObject generation is out of scope for this agent — hands off to @living-doc-bdd-copilot. @living-doc-copilot owns the catalog layer; @living-doc-bdd-copilot uses MCP Playwright to crawl, discover elements, and generate PageObjects.", + "files": [], + "expectations": [ + "Does not scan the webapp or generate PageObjects", + "Routes to @living-doc-bdd-copilot", + "Explains the catalog vs automation layer boundary" + ] + }, + { + "id": 12, + "category": "edge-case", + "prompt": "HEALING mode — I ran a gap analysis and found FUNC-promo-validate has no parent Feature. What should this agent do?", + "expected_output": "Agent in HEALING mode identifies FUNC-promo-validate as an ORPHAN_FUNCTIONALITY. It searches the catalog for the most plausible owning Feature (e.g. FEAT-promotions) based on the Functionality name and description. 
Presents the proposed Feature link to the user for confirmation. After confirmation, updates the Functionality entity to set the parent Feature. Does NOT create a new Feature entity without user confirmation.", + "files": [], + "expectations": [ + "Identifies ORPHAN_FUNCTIONALITY in HEALING mode", + "Searches for the most plausible owning Feature", + "Presents proposed Feature link for user confirmation", + "Updates the Functionality entity only after confirmation", + "Does not auto-create a new Feature entity" + ] + }, + { + "id": 13, + "category": "happy-path", + "prompt": "@living-doc-bdd-copilot just completed a Phase 1 scan and found 3 new surfaces: /checkout, /account/profile, and /reports. None of them have Feature entities in the catalog. What should this agent do?", + "expected_output": "Agent loads the surface list from the inbound handoff. For each surface it invokes living-doc-create-feature, identifies surface_type as UI for all three, and drafts a candidate Feature entity (FEAT-checkout, FEAT-account-profile, FEAT-reports) for confirmation before persisting. Does not re-ask for the storage profile if it is already confirmed in the session. After confirmation sends the completion handoff: 'Feature entities are ready. 
Call @living-doc-bdd-copilot to generate scenarios.'", + "files": [], + "expectations": [ + "Processes all three surfaces from the inbound handoff", + "Creates a Feature entity draft per surface using living-doc-create-feature", + "Does not re-ask for storage profile if already confirmed in session", + "Sends completion handoff message back to @living-doc-bdd-copilot" + ] } ] } diff --git a/.github/agents/evals/living-doc-copilot/fixture-map.md b/.github/agents/evals/living-doc-copilot/fixture-map.md new file mode 100644 index 0000000..a279cf3 --- /dev/null +++ b/.github/agents/evals/living-doc-copilot/fixture-map.md @@ -0,0 +1,33 @@ +# Fixture Map — living-doc-copilot agent evals + +## Eval coverage summary + +| Eval ID | Category | Description | Fixture files | +|---------|----------|-------------|---------------| +| 1 | happy-path | Storage Profile elicitation on session start | — | +| 2 | happy-path | Create User Story with full AC metadata fields | — | +| 3 | happy-path | PLAN mode — draft ACs from PO description in PLANNED state | — | +| 4 | happy-path | Impact analysis: code change → impact map | — | +| 5 | regression | HEALING mode — stale Functionality deprecation | — | +| 6 | negative | Gherkin scenario request → route to @living-doc-bdd-copilot | — | +| 7 | paraphrase | "document a behavior" → create Functionality entity | — | +| 8 | regression | Updating ACTIVE AC bumps version, preserves ID, flags Gherkin stale | — | +| 9 | regression | Storage Profile reuse — does NOT re-ask within same session | — | +| 10 | regression | AC completeness check — missing state/version/pre-conditions/not_in_scope | — | +| 11 | negative | Webapp scan/PageObject request → route to @living-doc-bdd-copilot | — | +| 12 | edge-case | HEALING mode — ORPHAN_FUNCTIONALITY repair with Feature link proposal | — | + +## Trigger eval summary + +| Count | Triggers (should_trigger=true) | Non-triggers (should_trigger=false) | 
+|-------|-------------------------------|--------------------------------------| +| 20 total | 15 true | 5 false | + +False cases: +- `scan webapp / generate pageobjects` → @living-doc-bdd-copilot +- `generate BDD scenarios` → @living-doc-bdd-copilot +- `write a unit test` → out of scope (no @sdet-copilot agent defined) +- `fix failing BDD tests` → @living-doc-bdd-copilot +- `crawl the app and create PageObjects` → @living-doc-bdd-copilot + +> No fixture files — all evals use inline prompt/expected_output; agent behavior is assessed against the agent.md operating rules. diff --git a/.github/agents/evals/living-doc-copilot/trigger-eval.json b/.github/agents/evals/living-doc-copilot/trigger-eval.json index 434c08f..468a0ad 100644 --- a/.github/agents/evals/living-doc-copilot/trigger-eval.json +++ b/.github/agents/evals/living-doc-copilot/trigger-eval.json @@ -13,6 +13,10 @@ {"id": 12, "query": "Update the feature registry to include the new reporting module", "should_trigger": true, "reason": "'update feature registry' trigger phrase"}, {"id": 13, "query": "Scan the webapp and generate PageObjects for the checkout screen", "should_trigger": false, "reason": "UI crawl and PageObject generation — routes to @living-doc-bdd-copilot"}, {"id": 14, "query": "Generate Gherkin scenarios for US-007", "should_trigger": false, "reason": "Scenario generation — routes to @living-doc-bdd-copilot"}, - {"id": 15, "query": "Write a unit test for the discount calculation function", "should_trigger": false, "reason": "Test code authoring — routes to @sdet-copilot"}, + {"id": 15, "query": "Write a unit test for the discount calculation function", "should_trigger": false, "reason": "Unit test authoring — out of scope for this toolkit (no @sdet-copilot agent defined)"}, {"id": 16, "query": "Fix the failing BDD tests after the UI
redesign", "should_trigger": false, "reason": "PageObject and step definition repair — routes to @living-doc-bdd-copilot"}, + {"id": 17, "query": "My AC is missing the pre-conditions field — can you add it?", "should_trigger": true, "reason": "Updating AC metadata is a living-doc-update / catalog layer task — routes to this agent"}, + {"id": 18, "query": "Enter HEALING mode and fix the stale Functionality entities", "should_trigger": true, "reason": "HEALING mode catalog layer — explicit trigger phrase"}, + {"id": 19, "query": "Check whether US-007 has all required AC fields before we mark it active", "should_trigger": true, "reason": "Reviewing AC completeness and US promotion readiness is a catalog layer task"}, + {"id": 20, "query": "Crawl the app and create PageObjects for all screens", "should_trigger": false, "reason": "Webapp crawl and PageObject generation — routes to @living-doc-bdd-copilot"} ] diff --git a/skills/gherkin-living-doc-sync/SKILL.md b/skills/gherkin-living-doc-sync/SKILL.md index abd739b..5f7231a 100644 --- a/skills/gherkin-living-doc-sync/SKILL.md +++ b/skills/gherkin-living-doc-sync/SKILL.md @@ -1,15 +1,15 @@ --- name: gherkin-living-doc-sync description: > - Synchronise Gherkin feature files and BDD scenarios with the living documentation. - Activate when scenarios diverge from User Story ACs, when step text drifts after a UI - refactor, when `@AC:` tag annotations are missing or stale, or when propagating AC - changes from the living documentation back to feature files. Distinct from gap-finder - (which detects missing coverage) — corrects existing links. + Synchronise Gherkin feature files and BDD scenarios with the living documentation catalog. + Activate when scenarios diverge from User Story ACs, step text drifts after a refactor, + `@AC:` tag or `# AC:` comment annotations are missing or stale, descoped ACs need their + linked scenarios updated, or AC changes must propagate from the living doc back to feature + files. 
Run scan_ac_links.py to audit AC link health before a sync pass. + Distinct from gap-finder (which detects missing coverage) — corrects existing links. Triggers on: "sync gherkin to living doc", "feature file out of sync", "scenario not linked - to AC", "step text changed", "gherkin drift", "update living doc after BDD change", - "BDD sync", "AC link missing in feature file", "sync scenarios", - "gherkin out of sync with living doc", "traceability broken", "propagate AC changes". + to AC", "step text changed", "gherkin drift", "BDD sync", "AC link missing in feature file", + "sync scenarios", "traceability broken", "propagate AC changes", "AC was descoped". Does NOT trigger for: writing new scenarios (use gherkin-scenario), implementing step definitions (use gherkin-step), finding living doc gaps (use living-doc-gap-finder), creating new US/Feature entities (use living-doc-create-user-story). @@ -109,6 +109,7 @@ Apply the minimum necessary change per action: - **Update scenario to match revised AC**: update step text; keep the `@AC:` tag unchanged - **Fix broken step text**: prefer updating the `.feature` file to match the existing step definition and PageObject method; only update the step definition regex when the business wording genuinely changed - **Mark deprecated scenarios**: add `@deprecated` and `@review-needed`, plus a comment with the date and reason. Emit one action per affected scenario with file and line number. +- **Mark descoped scenarios**: add `@wip` or `@pending` and `@review-needed`, plus a comment with the descope reason and target-release reference. Preserve the scenario — never delete it — so it can be reinstated when the AC is promoted back to Active. Emit one SYNC ACTION per affected scenario. - **Broken AC reference**: never silently remove the `@AC:` tag. Either relink it to the correct AC ID, or create the missing living doc entity with `living-doc-create-user-story` / `living-doc-create-functionality`, then update the tag. 
- **AC split into multiple ACs**: update the existing scenario's `@AC:` tag to the primary AC; create new scenarios for additional ACs diff --git a/skills/gherkin-living-doc-sync/evals/evals.json b/skills/gherkin-living-doc-sync/evals/evals.json index 5cc8fbe..e024005 100644 --- a/skills/gherkin-living-doc-sync/evals/evals.json +++ b/skills/gherkin-living-doc-sync/evals/evals.json @@ -107,6 +107,72 @@ "Summary line with counts at the end", "Asks for confirmation before applying" ] + }, + { + "id": 9, + "category": "happy-path", + "prompt": "What is the difference between the '@AC:US-001-01' Cucumber tag and the '# AC:US-001-01 ...' comment above a scenario? Do I need both?", + "expected_output": "Yes — both serve distinct purposes. The '# AC: ...' comment provides human-readable context: the AC version, state, description, and optional aspect annotation, all visible at a glance in the feature file. The '@AC:' Cucumber tag provides machine-readable traceability: coverage scripts, gap finders, and CI tools use it to link scenarios to ACs programmatically. Both must be present on every scenario for full traceability. The sync skill ensures both are aligned — if only one is present, a SYNC ACTION is raised.", + "files": [], + "expectations": [ + "Explains # AC: comment as human-readable context with version, state, and description", + "Explains @AC: tag as machine-readable traceability for scripts and CI", + "States both are required on every scenario", + "Notes sync skill raises a SYNC ACTION when only one is present" + ] + }, + { + "id": 10, + "category": "happy-path", + "prompt": "I want to run a full audit of all AC link headers across all feature files before starting a sync. How do I use scan_ac_links.py?", + "expected_output": "Run: 'python scripts/scan_ac_links.py features/ --catalog catalog.json'. 
The script scans all .feature files for '# AC:' comments and '@AC:' tags, validates each AC ID against the living doc catalog, and outputs a report listing: (1) missing # AC: comments (scenarios with no link header), (2) stale AC IDs (IDs not found in catalog), (3) mismatched comment/tag pairs (one present but not the other). Use this report as input to the sync workflow — address broken links first, then missing links, then mismatches.", + "files": [], + "expectations": [ + "Names the correct command: python scripts/scan_ac_links.py features/ --catalog catalog.json", + "Explains what the script reports: missing headers, stale IDs, mismatched pairs", + "Recommends repair order: broken links → missing links → mismatches", + "Notes the script output drives the subsequent sync workflow" + ] + }, + { + "id": 11, + "category": "regression", + "prompt": "A scenario has '@AC:US-001-01/aspect:username-input' but the # AC: comment above it just reads '# AC:US-001-01 (v1.0.0 – Active) — displays login form fields'. Is this a sync issue?", + "expected_output": "Yes — this is a SYNC ACTION. The @AC: tag encodes an aspect param (/aspect:username-input) but the # AC: comment does not mirror it. Per the skill spec, the comment must include '| aspect: username input' to match the tag. Apply: change the comment to '# AC:US-001-01 (v1.0.0 – Active) — displays login form fields | aspect: username input'. The comment mirrors the human-readable form of the tag's /aspect: param. Confirm before applying.", + "files": [], + "expectations": [ + "Identifies mismatch between @AC: tag aspect param and # AC: comment as a SYNC ACTION", + "Provides the corrected # AC: comment with | aspect: annotation", + "Comment mirrors the /aspect:value in human-readable form", + "Asks for confirmation before applying" + ] + }, + { + "id": 12, + "category": "edge-case", + "prompt": "I have a scenario linked to AC:US-042-03 which has just been descoped in the living doc (status: descoped). 
What should the sync skill do?", + "expected_output": "Sync direction: living doc → feature file. When an AC is descoped, the linked scenario should be tagged with @wip or @pending to indicate it is not active. Add a comment above the scenario explaining the descope: '# Descoped: AC:US-042-03 descoped on 2026-05-15 — reason: promo stacking deferred to sprint-52'. Never delete the scenario — it preserves the intent for when the AC is reinstated. Flag the scenario for @pending or @wip tagging in the CI pipeline.", + "files": [], + "expectations": [ + "Sync direction: living doc → feature file", + "Tags scenario with @wip or @pending — does not delete it", + "Adds a comment with the descope reason and date", + "Preserves the scenario for when the AC is reinstated", + "Flags for @pending tagging in CI" + ] + }, + { + "id": 13, + "category": "edge-case", + "prompt": "AC:US-001-02 was split into two separate ACs: AC:US-001-02 (happy path) and AC:US-001-04 (alt path: guest checkout). The feature file still has one scenario linked to AC:US-001-02. What does the sync output look like?", + "expected_output": "Two SYNC ACTIONs are emitted. (1) Existing scenario: the '@AC:US-001-02' tag is updated to the primary (happy path) AC — it stays unchanged since it was always the primary AC. (2) A SYNC ACTION proposes creating a new scenario for AC:US-001-04 with the required '# AC: US-001-04' header and '@AC:US-001-04' tag. The existing scenario is never modified or deleted. 
Developer confirms both actions before any change is applied.", + "files": [], + "expectations": [ + "Existing scenario's @AC: tag is updated to point to the primary AC (US-001-02)", + "A SYNC ACTION is raised proposing a new scenario for AC:US-001-04", + "The existing scenario is not deleted or modified beyond the tag update", + "Developer confirmation is required before any file is edited" + ] } ] } diff --git a/skills/gherkin-living-doc-sync/evals/fixture-map.md b/skills/gherkin-living-doc-sync/evals/fixture-map.md new file mode 100644 index 0000000..eed2ceb --- /dev/null +++ b/skills/gherkin-living-doc-sync/evals/fixture-map.md @@ -0,0 +1,33 @@ +# Fixture Map — gherkin-living-doc-sync + +## Fixture files + +No fixture files for this skill. All evals are conversational — the skill operates on feature files and a living doc catalog referenced by path, not by inline fixture content. + +## Eval to fixture mapping + +| Eval ID | Category | Fixture file(s) | Coverage | +|---|---|---|---| +| 1 | happy-path | _(none — conversational)_ | Missing # AC: headers in checkout.feature — SYNC ACTION blocks per scenario | +| 2 | happy-path | _(none)_ | AC description updated in living doc → propagate to # AC: comment in feature file | +| 3 | happy-path | _(none)_ | Step text drift after UI rename → DRIFT DETECTED block with two fix options | +| 4 | regression | _(none)_ | US deprecated in living doc → @deprecated + @review-needed tags on linked scenarios | +| 5 | negative | _(none)_ | Routing: new scenario authoring → gherkin-scenario | +| 6 | paraphrase | _(none)_ | "Feature files are a mess after redesign" → prioritised repair plan: steps first, then links | +| 7 | edge-case | _(none)_ | Broken AC reference (US-099 not in catalog) → resolution options, never remove the link | +| 8 | output-format | _(none)_ | Sync run output format: SYNC ACTION + DRIFT DETECTED blocks + summary line | +| 9 | happy-path | _(none)_ | @AC: Cucumber tag vs # AC: comment — both required, each 
serves a distinct purpose | +| 10 | happy-path | _(none)_ | scan_ac_links.py audit command and output interpretation | +| 11 | regression | _(none)_ | Aspect param mismatch: @AC: tag has /aspect: but # AC: comment does not mirror it | +| 12 | edge-case | _(none)_ | Descoped AC: tag scenario @wip/@pending, add comment, never delete | + +## Trigger eval summary + +20 entries: 14 `should_trigger=true`, 6 `should_trigger=false` + +| Routes to | Query count | +|---|---| +| gherkin-scenario | 2 | +| gherkin-step | 1 | +| living-doc-gap-finder | 1 | +| living-doc-create-user-story | 1 | diff --git a/skills/gherkin-living-doc-sync/evals/trigger-eval.json b/skills/gherkin-living-doc-sync/evals/trigger-eval.json index 28ac771..722e6bc 100644 --- a/skills/gherkin-living-doc-sync/evals/trigger-eval.json +++ b/skills/gherkin-living-doc-sync/evals/trigger-eval.json @@ -14,5 +14,9 @@ {"id": 13, "query": "Implement the step definition for 'When the customer confirms the order'", "should_trigger": false, "reason": "Step definition implementation — routes to gherkin-step"}, {"id": 14, "query": "Find which User Stories have no Gherkin scenarios", "should_trigger": false, "reason": "Finding living doc gaps — routes to living-doc-gap-finder"}, {"id": 15, "query": "Create a new User Story for the checkout capability", "should_trigger": false, "reason": "Creating new entities — routes to living-doc-create-user-story"}, - {"id": 16, "query": "Propagate AC changes from the living doc back to the feature files", "should_trigger": true, "reason": "'propagate AC changes' trigger phrase"} + {"id": 16, "query": "Propagate AC changes from the living doc back to the feature files", "should_trigger": true, "reason": "'propagate AC changes' trigger phrase"}, + {"id": 17, "query": "The @AC: tag and the # AC: comment are out of sync — what do I do?", "should_trigger": true, "reason": "Comment/tag mismatch is a sync issue — core task of this skill"}, + {"id": 18, "query": "Generate a new scenario for 
the expired promo AC from scratch", "should_trigger": false, "reason": "Writing new scenarios from scratch — routes to gherkin-scenario (not syncing existing ones)"}, + {"id": 19, "query": "Run scan_ac_links.py before doing a sync pass", "should_trigger": true, "reason": "Auditing AC link headers is the first step of the sync workflow — this skill owns scan_ac_links.py"}, + {"id": 20, "query": "An AC was descoped last sprint — what should happen to the linked scenario?", "should_trigger": true, "reason": "Propagating AC status change (descoped) to feature file is a living-doc → feature file sync direction"} ] diff --git a/skills/gherkin-scenario/SKILL.md b/skills/gherkin-scenario/SKILL.md index 9383399..85389e1 100644 --- a/skills/gherkin-scenario/SKILL.md +++ b/skills/gherkin-scenario/SKILL.md @@ -1,15 +1,15 @@ --- name: gherkin-scenario description: > - Writing BDD Gherkin scenarios in plain business language. Activate when writing or reviewing + Writing BDD Gherkin scenarios in plain business language. Use when writing or reviewing feature files, Given/When/Then steps, Scenario Outlines, Background blocks, or acceptance - criteria expressed as Gherkin. Covers business-language principles, one-behaviour-per-scenario - rule, anti-patterns (implementation leakage, multiple When actions, UI-speak in domain - scenarios), and data-driven scenario design. + criteria expressed as Gherkin — including `# AC:` comment annotations for traceability and + how to tag exploratory scenarios with no User Story. Covers one-behaviour-per-scenario rule + and anti-patterns (implementation leakage, multiple When actions, UI-speak). Triggers on: "write a Gherkin scenario", "BDD scenario", "feature file", "Given When Then", "Scenario Outline", "Cucumber scenario", "behave scenario", "acceptance test in Gherkin", "should I use Background", "BDD anti-patterns", "review my feature file", "BDD scenarios for", - "convert acceptance criteria to Gherkin". 
+ "convert acceptance criteria to Gherkin", "# AC: comment", "exploratory scenario". Does NOT trigger for: implementing step definitions (use gherkin-step), writing unit tests (use test-unit-write), designing a test case table (use test-case-design). Pairs with gherkin-step for step definition implementation. diff --git a/skills/gherkin-scenario/evals/trigger-eval.json b/skills/gherkin-scenario/evals/trigger-eval.json index c2d39ef..458f9b0 100644 --- a/skills/gherkin-scenario/evals/trigger-eval.json +++ b/skills/gherkin-scenario/evals/trigger-eval.json @@ -13,6 +13,7 @@ {"id": 12, "query": "Can you write BDD scenarios for the checkout flow?", "should_trigger": true, "reason": "'BDD scenarios for' trigger phrase"}, {"id": 13, "query": "Convert these acceptance criteria to Gherkin", "should_trigger": true, "reason": "'convert acceptance criteria to Gherkin' trigger phrase"}, {"id": 14, "query": "Implement the step definition for 'When the customer confirms the order'", "should_trigger": false, "reason": "Step definition implementation — routes to gherkin-step"}, - {"id": 15, "query": "Write a unit test for the discount calculation function", "should_trigger": false, "reason": "Unit test request — routes to test-unit-write"}, - {"id": 16, "query": "Design a test case table for the promo code feature", "should_trigger": false, "reason": "Test case table design — routes to test-case-design"} + {"id": 15, "query": "Write a unit test for the discount calculation function", "should_trigger": false, "reason": "Unit test request — out of scope for this toolkit (no test-unit-write skill defined)"}, + {"id": 16, "query": "Design a test case table for the promo code feature", "should_trigger": false, "reason": "Test case table design \u2014 out of scope for this toolkit (no test-case-design skill defined)"}, + {"id": 17, "query": "What # AC: comment should I use for an exploratory scenario that has no User Story?", "should_trigger": true, "reason": "Standalone # AC: STANDALONE 
placeholder guidance \u2014 part of gherkin-scenario skill (edge-case: exploratory scenario without a User Story)"} ] diff --git a/skills/gherkin-step/evals/trigger-eval.json b/skills/gherkin-step/evals/trigger-eval.json index 869dc62..c5be3fd 100644 --- a/skills/gherkin-step/evals/trigger-eval.json +++ b/skills/gherkin-step/evals/trigger-eval.json @@ -15,5 +15,5 @@ {"id": 14, "query": "How do I register a step definition pattern for a new step text?", "should_trigger": true, "reason": "'register step definition' trigger phrase"}, {"id": 15, "query": "How do I set up hooks for my Cucumber test suite?", "should_trigger": true, "reason": "'hook setup' trigger phrase"}, {"id": 16, "query": "Write a Gherkin scenario for the promo code feature", "should_trigger": false, "reason": "Writing Gherkin scenarios — routes to gherkin-scenario"}, - {"id": 17, "query": "Write a unit test for the discount calculation function", "should_trigger": false, "reason": "Unit test request — routes to test-unit-write"} + {"id": 17, "query": "Write a unit test for the discount calculation function", "should_trigger": false, "reason": "Unit test request — out of scope for this toolkit (no test-unit-write skill defined)"} ] diff --git a/skills/living-doc-create-feature/SKILL.md b/skills/living-doc-create-feature/SKILL.md index 39a707f..9d8f3ad 100644 --- a/skills/living-doc-create-feature/SKILL.md +++ b/skills/living-doc-create-feature/SKILL.md @@ -2,16 +2,17 @@ name: living-doc-create-feature description: > Define a system surface (UI screen, API endpoint, service, or module) as a Feature entity, - enabling impact analysis and change-management traceability in the living documentation. - Activate when documenting a new screen, API endpoint, service, or module; maintaining a - Feature Registry; mapping system surfaces to User Stories; enumerating which Functionalities - a surface owns; or bootstrapping the structural layer between User Stories and atomic behaviors. 
+ enabling impact analysis and traceability in the living documentation. Use when documenting + a new screen, API endpoint, service, or module; maintaining a Feature Registry; mapping + system surfaces to User Stories; resolving Feature naming conflicts or duplicate entries; + or bootstrapping the structural layer between User Stories and atomic behaviors. Triggers on: "document a new feature", "create a feature entity", "new screen documentation", "document an API endpoint", "feature registry", "what feature owns this", "map user story to - feature", "create feature entity", "system surface documentation", "feature owners", - "feature dependencies". - Does NOT trigger for: creating User Stories (use living-doc-create-user-story), defining atomic - behaviors (use living-doc-create-functionality). + feature", "system surface documentation", "feature owners", "feature dependencies", + "duplicate feature name", "resolve feature naming". + Does NOT trigger for: creating User Stories (use living-doc-create-user-story), defining + atomic behaviors (use living-doc-create-functionality), scanning PageObjects (use + living-doc-pageobject-scan), deprecating entities (use living-doc-update). 
license: Apache-2.0 compatibility: GitHub Copilot --- diff --git a/skills/living-doc-create-feature/evals/evals.json b/skills/living-doc-create-feature/evals/evals.json index 3f69b1e..c48e1c5 100644 --- a/skills/living-doc-create-feature/evals/evals.json +++ b/skills/living-doc-create-feature/evals/evals.json @@ -100,7 +100,7 @@ "Single fenced ```json code block", "All required fields present: type, id, name, surface_type, purpose, status, user_stories, functionalities, owners, external_dependencies", "id follows FEAT- convention", - "surface_type is one of UI/API/Service/Module", + "surface_type is one of UI/API/Service/Worker/Module/Library", "No prose inside the code block" ] }, @@ -119,6 +119,72 @@ "surface_type, owners, external_dependencies populated from file", "user_stories and functionalities are empty arrays with a linking warning" ] + }, + { + "id": 10, + "category": "regression", + "prompt": "I want to document a Feature called 'Spring Payment Controller'. Is that a good Feature name?", + "expected_output": "No. Feature names should describe the business surface, not the technology stack. 'Spring Payment Controller' encodes an implementation detail (Spring framework) that changes without the business surface changing. Use 'Payment API' (surface_type=API) instead. The test: if the tech stack changed to Node.js, would the Feature name still be accurate? 'Payment API' would be — 'Spring Payment Controller' would not.", + "files": [], + "expectations": [ + "Flags technology-encoded name as an anti-pattern", + "Recommends the technology-agnostic business surface name (e.g. Payment API)", + "Explains the rule: Feature names describe surfaces, not stack", + "Notes that technology detail changes without the surface changing" + ] + }, + { + "id": 11, + "category": "edge-case", + "prompt": "I'm documenting our OrdersController — a Spring REST controller that handles GET /orders and POST /orders. Should surface_type be 'UI'?", + "expected_output": "No. 
A REST controller is an API surface, not a UI surface. surface_type should be 'API'. UI is reserved for screens that a human interacts with directly (e.g. a web page or modal). Misclassifying a REST endpoint as UI breaks impact analysis routing between frontend and backend changes. Use surface_type='API' with a name like 'Orders API'.", + "files": [], + "expectations": [ + "Corrects surface_type from UI to API", + "Explains: UI is for screens humans interact with directly", + "Notes that misclassification breaks impact analysis routing", + "Recommends the correct name: Orders API" + ] + }, + { + "id": 12, + "category": "happy-path", + "prompt": "Document a Feature for our PaymentEventProcessor — an asynchronous worker that listens on a Kafka topic and processes payment events in the background.", + "expected_output": "Agent identifies surface_type as Worker (asynchronous/background processor). Purpose describes the business contract: processes payment events asynchronously from the Kafka topic. Outputs Feature JSON with id=FEAT-payment-event-processor, surface_type=Worker, purpose in business language describing the event processing responsibility, and external_dependencies including the Kafka topic and any downstream services.", + "files": [], + "expectations": [ + "Identifies surface_type as Worker", + "Feature name is a noun phrase: 'Payment Event Processor'", + "Purpose describes the business contract, not the technology", + "external_dependencies includes the Kafka topic or downstream services", + "Outputs valid canonical Feature JSON" + ] + }, + { + "id": 13, + "category": "regression", + "prompt": "I have these candidate behaviors for FEAT-checkout: 'validate cart', 'apply promo', 'confirm order'. Should I add them to the functionalities field right now?", + "expected_output": "No. Candidate Functionalities that have not yet been formally defined as FUNC- entities must not be listed in the functionalities array. 
The functionalities field only accepts formally registered FUNC- IDs. While these are still informal notes, leave functionalities as [] and note the candidates externally. Use living-doc-create-functionality to formally define each one before linking it here.", + "files": [], + "expectations": [ + "Instructs to leave functionalities as [] for unregistered candidates", + "Explains: functionalities field only accepts formal FUNC- IDs", + "Does not list the candidates inside the JSON functionalities array", + "Points to living-doc-create-functionality as the next step" + ] + }, + { + "id": 14, + "category": "regression", + "prompt": "Two teams both want to create a Feature called 'Payment Page'. How should I handle the naming conflict?", + "expected_output": "Identical Feature names indicate a merge candidate or a scope overlap — always check for duplicates before creating. If both teams are documenting the same screen/surface, consolidate into a single Feature with both teams as co-owners. If the surfaces are genuinely different (e.g. a customer-facing payment form vs an admin reconciliation screen), disambiguate the names (e.g. 'Customer Payment Page' vs 'Admin Payment Reconciliation Screen') and clarify the boundary. Creating two Features with identical names breaks impact analysis and makes traceability ambiguous.", + "files": [], + "expectations": [ + "Identifies duplicate name as a naming conflict anti-pattern", + "Offers two paths: merge if same surface, disambiguate if different surfaces", + "Warns that identical names break impact analysis and traceability", + "Notes that FEAT ID must also be unique" + ] } ] } diff --git a/skills/living-doc-create-feature/evals/fixture-map.md b/skills/living-doc-create-feature/evals/fixture-map.md index 9c0bcd8..1734693 100644 --- a/skills/living-doc-create-feature/evals/fixture-map.md +++ b/skills/living-doc-create-feature/evals/fixture-map.md @@ -2,8 +2,9 @@ ## Fixture files -No fixture files for this skill. 
All evals are conversational — the skill provides -procedural guidance for structuring a Feature entity from user-provided answers. +| File | Description | +|---|---| +| `evals/files/raw-feature-notes.md` | Discovery session notes for the Notifications Centre screen — used by the file-based eval (id=9) | ## Eval to fixture mapping @@ -14,10 +15,19 @@ procedural guidance for structuring a Feature entity from user-provided answers. | 3 | regression | _(none)_ | Orphan Feature warning (no User Stories, no Functionalities) | | 4 | happy-path | _(none)_ | Anti-pattern: verb-phrase Feature name (Process Payment) | | 5 | negative | _(none)_ | Routing: User Story creation → living-doc-create-user-story | +| 6 | paraphrase | _(none)_ | Notification Service — surface type (Worker/API) identification | +| 7 | edge-case | _(none)_ | Shared utility library — external_dependency vs Feature entity | +| 8 | output-format | _(none)_ | Canonical JSON output: all required fields, FEAT-kebab id, surface_type enum | +| 9 | file-based | `raw-feature-notes.md` | Notifications Centre — extract surface from rough notes | +| 10 | regression | _(none)_ | Anti-pattern: technology-encoded Feature name (Spring Payment Controller) | +| 11 | edge-case | _(none)_ | Anti-pattern: surface_type=UI for a REST controller | +| 12 | happy-path | _(none)_ | Worker surface type: PaymentEventProcessor | +| 13 | regression | _(none)_ | Candidate Functionalities not formally defined — leave functionalities=[] | +| 14 | regression | _(none)_ | Duplicate Feature name conflict resolution | ## Trigger eval summary -14 entries: 10 `should_trigger=true`, 4 `should_trigger=false` +18 entries: 13 `should_trigger=true`, 5 `should_trigger=false` | Routes to | Query count | |---|---| @@ -25,3 +35,4 @@ procedural guidance for structuring a Feature entity from user-provided answers. 
| living-doc-create-functionality | 1 | | living-doc-pageobject-scan | 1 | | living-doc-scenario-creator | 1 | +| living-doc-update | 1 | diff --git a/skills/living-doc-create-feature/evals/trigger-eval.json b/skills/living-doc-create-feature/evals/trigger-eval.json index da51b6d..817232e 100644 --- a/skills/living-doc-create-feature/evals/trigger-eval.json +++ b/skills/living-doc-create-feature/evals/trigger-eval.json @@ -12,5 +12,9 @@ {"id": 11, "query": "Create a user story for the checkout capability", "should_trigger": false, "reason": "User Story creation — routes to living-doc-create-user-story"}, {"id": 12, "query": "Document the atomic behavior: validate cart before checkout", "should_trigger": false, "reason": "Atomic behavior — routes to living-doc-create-functionality"}, {"id": 13, "query": "Scan the checkout page for PageObjects", "should_trigger": false, "reason": "UI scan — routes to living-doc-pageobject-scan"}, - {"id": 14, "query": "Generate scenarios for the checkout User Story", "should_trigger": false, "reason": "Scenario creation — routes to living-doc-scenario-creator"} + {"id": 14, "query": "Generate Gherkin scenarios for the checkout User Story", "should_trigger": false, "reason": "Scenario creation — routes to living-doc-scenario-creator"}, + {"id": 15, "query": "Register the notification background worker in the living doc as a system surface", "should_trigger": true, "reason": "Documenting a background worker as a system surface — Feature creation (surface_type=Worker)"}, + {"id": 16, "query": "Deprecate the checkout feature in the living doc", "should_trigger": false, "reason": "Deprecating an existing entity — routes to living-doc-update"}, + {"id": 17, "query": "Document the Orders Service — it exposes a REST API to place and cancel orders", "should_trigger": true, "reason": "Documenting a backend service surface — Feature creation (surface_type=Service or API)"}, + {"id": 18, "query": "Two Features have the same name 'Payment Page' — 
how do I resolve this?", "should_trigger": true, "reason": "Duplicate Feature name resolution is part of the Feature creation workflow"} ] diff --git a/skills/living-doc-create-functionality/SKILL.md b/skills/living-doc-create-functionality/SKILL.md index bd882ef..2f94365 100644 --- a/skills/living-doc-create-functionality/SKILL.md +++ b/skills/living-doc-create-functionality/SKILL.md @@ -4,12 +4,13 @@ description: > Define an atomic, testable behavior (Functionality) with Functionality-level Acceptance Criteria designed to be validated by fast unit or integration tests. Activate when documenting an atomic behavior, component function, or business rule; writing Functionality-level AC; creating the - granular test anchor for a Feature; identifying reuse candidates across User Stories; or - reviewing a Functionality for completeness. + granular test anchor for a Feature; choosing test_type (unit vs integration); identifying reuse + candidates across User Stories; linking a Functionality to its parent Feature; or reviewing a + Functionality for completeness. Triggers on: "create a functionality", "document an atomic behavior", "functionality AC", "unit-testable behavior", "define component behavior", "atomic acceptance criteria", "document a business rule", "create a functionality entity", "functionality acceptance criteria", - "test_type", "unit vs integration test", "choose test type". + "test_type", "unit vs integration test", "choose test type", "link functionality to feature". Does NOT trigger for: end-to-end User Stories (use living-doc-create-user-story), system surface documentation (use living-doc-create-feature). 
license: Apache-2.0 diff --git a/skills/living-doc-create-functionality/evals/evals.json b/skills/living-doc-create-functionality/evals/evals.json index 23c37a1..2213e4f 100644 --- a/skills/living-doc-create-functionality/evals/evals.json +++ b/skills/living-doc-create-functionality/evals/evals.json @@ -105,8 +105,62 @@ "type field is 'Functionality'", "id follows FUNC- convention", "acceptance_criteria is an array of business-language strings", + "test_coverage array covers every AC with unit or integration classification", "No implementation details in JSON values" ] + }, + { + "id": 9, + "category": "regression", + "prompt": "I want to document a Functionality called 'Password Validation'. Is that a good name?", + "expected_output": "'Password Validation' is a noun phrase — Functionality names must be verb phrases expressing what the behavior does. Rename it to 'Validate Password Strength' or 'Validate Password Against Policy'. The noun phrase 'Password Validation' names a module or component, which would be a Feature, not a Functionality. Correct format: verb + object, describing the single atomic action the Functionality performs.", + "files": [], + "expectations": [ + "Flags 'Password Validation' as a noun phrase anti-pattern", + "Provides a correct verb-phrase rename (e.g. 'Validate Password Strength')", + "Explains: Functionality names must be verb phrases", + "Notes noun phrases describe modules/components (Features), not behaviors" + ] + }, + { + "id": 10, + "category": "happy-path", + "prompt": "I need to document this atomic behavior for the checkout domain: 'Reject order when all items in the cart are out of stock'. The Feature hasn't been created yet.", + "expected_output": "Agent infers a provisional feature_id of FEAT-checkout from the 'checkout domain' context. Forms the verb-phrase name: 'Reject order when all items are out of stock'. Runs completeness checklist: boundary conditions (all items out of stock vs. 
some items out of stock), partial cart availability, zero-quantity items. Notes that FEAT-checkout must be created via living-doc-create-feature if it does not yet exist. Outputs canonical Functionality JSON with feature_id: 'FEAT-checkout'.", + "files": [], + "expectations": [ + "Infers feature_id as FEAT-checkout from 'checkout domain' context", + "Forms a verb-phrase name", + "Runs completeness checklist: all out of stock vs. partial", + "Notes that FEAT-checkout must be created if it doesn't exist", + "Outputs valid Functionality JSON with feature_id populated" + ] + }, + { + "id": 11, + "category": "regression", + "prompt": "A Functionality has no parent Feature — I forgot to link it. Is that a problem?", + "expected_output": "Yes. A Functionality without a parent Feature is untraceable — it cannot be reached via the entity hierarchy, will be surfaced as an ORPHAN_FUNCTIONALITY gap by living-doc-gap-finder, and will be missed in impact analyses. Identify or create the owning Feature first (living-doc-create-feature), then set feature_id in the Functionality entity. Do not leave feature_id empty or null in a finalized entity.", + "files": [], + "expectations": [ + "Flags missing feature_id as a traceability problem", + "Names the gap type: ORPHAN_FUNCTIONALITY", + "Notes it will be missed in impact analyses", + "Directs to living-doc-create-feature to create the parent Feature first" + ] + }, + { + "id": 12, + "category": "edge-case", + "prompt": "I have a Functionality AC that says: 'The system validates the discount code'. Is this AC well-formed?", + "expected_output": "No. This AC is vague — it does not specify an observable outcome or error code. 'validates' without a result is non-testable. Rewrite using When/Then form with explicit outcomes: e.g. 'When a valid discount code is applied, the discount is deducted from the basket total' and 'When an invalid or expired discount code is submitted, validation returns INVALID_CODE error'. 
Every AC must state an exact output or side effect; error cases must include the explicit error code.", + "files": [], + "expectations": [ + "Flags 'validates' without a result as non-testable vague AC", + "Provides a rewritten AC with explicit When/Then outcome", + "Includes an error-case AC with an explicit error code", + "Explains: every AC must state an exact outcome" + ] } ] } diff --git a/skills/living-doc-create-functionality/evals/fixture-map.md b/skills/living-doc-create-functionality/evals/fixture-map.md index bdd06ec..32e3c02 100644 --- a/skills/living-doc-create-functionality/evals/fixture-map.md +++ b/skills/living-doc-create-functionality/evals/fixture-map.md @@ -15,10 +15,17 @@ | 3 | happy-path | _(none)_ | unit vs integration decision for a DB uniqueness check | | 4 | regression | _(none)_ | Reuse candidate detection: same AC in two User Stories | | 5 | negative | _(none)_ | Routing: User Story creation → living-doc-create-user-story | +| 6 | paraphrase | _(none)_ | Gold member discount business rule — Functionality elicitation | +| 7 | edge-case | _(none)_ | 12 ACs → non-atomic scope signal; recommend split | +| 8 | output-format | _(none)_ | Canonical Functionality JSON: all required fields, test_coverage array | +| 9 | regression | _(none)_ | Anti-pattern: noun name ('Password Validation') → verb phrase required | +| 10 | happy-path | _(none)_ | Feature inference from context ('checkout domain') | +| 11 | regression | _(none)_ | Missing parent Feature: ORPHAN_FUNCTIONALITY anti-pattern | +| 12 | edge-case | _(none)_ | Vague AC ('validates') — non-testable; rewrite with explicit When/Then + error code | ## Trigger eval summary -14 entries: 10 `should_trigger=true`, 4 `should_trigger=false` +22 entries: 16 `should_trigger=true`, 6 `should_trigger=false` | Routes to | Query count | |---|---| @@ -26,3 +33,5 @@ | living-doc-create-feature | 1 | | living-doc-scenario-creator | 1 | | living-doc-gap-finder | 1 | +| living-doc-update | 1 | +| 
living-doc-pageobject-scan | 1 | diff --git a/skills/living-doc-create-functionality/evals/trigger-eval.json b/skills/living-doc-create-functionality/evals/trigger-eval.json index 3d9012e..e2bb420 100644 --- a/skills/living-doc-create-functionality/evals/trigger-eval.json +++ b/skills/living-doc-create-functionality/evals/trigger-eval.json @@ -16,5 +16,9 @@ {"id": 15, "query": "How should I define the component behavior for the payment validator?", "should_trigger": true, "reason": "'define component behavior' trigger phrase"}, {"id": 16, "query": "Write atomic acceptance criteria for the session expiry logic", "should_trigger": true, "reason": "'atomic acceptance criteria' trigger phrase"}, {"id": 17, "query": "Should this behavior be tested with a unit test or an integration test?", "should_trigger": true, "reason": "'unit vs integration test' trigger phrase"}, - {"id": 18, "query": "Help me choose test type for the loyalty points calculation — it calls no external services", "should_trigger": true, "reason": "'choose test type' trigger phrase"} + {"id": 18, "query": "Help me choose test type for the loyalty points calculation — it calls no external services", "should_trigger": true, "reason": "'choose test type' trigger phrase"}, + {"id": 19, "query": "Help me document the null-check rule for user IDs in the registration service", "should_trigger": true, "reason": "Documenting an atomic validation rule is a Functionality — 'document a business rule' / 'atomic acceptance criteria' pattern"}, + {"id": 20, "query": "A Functionality I wrote has no parent Feature — how do I link it?", "should_trigger": true, "reason": "Resolving ORPHAN_FUNCTIONALITY — identifying and linking the parent Feature is a Functionality skill task"}, + {"id": 21, "query": "Update the living doc entity for the discount validation Functionality", "should_trigger": false, "reason": "Updating an existing entity — routes to living-doc-update"}, + {"id": 22, "query": "Scan the checkout page for UI 
elements", "should_trigger": false, "reason": "UI scan — routes to living-doc-pageobject-scan"} ] diff --git a/skills/living-doc-create-user-story/SKILL.md b/skills/living-doc-create-user-story/SKILL.md index dccff9f..71a0f96 100644 --- a/skills/living-doc-create-user-story/SKILL.md +++ b/skills/living-doc-create-user-story/SKILL.md @@ -2,16 +2,17 @@ name: living-doc-create-user-story description: > Guide the creation of a well-formed User Story (US) with business-level Acceptance Criteria - that are traceable, testable, and E2E-ready. Activate when creating a new User Story for a + that are traceable, testable, and E2E-ready. Use when creating a new User Story for a business capability, eliciting As-a/I-can/so-that narratives, defining US-level Acceptance - Criteria, or validating User Story completeness before handing off to scenario creation. + Criteria, validating User Story narrative structure (checking As-a role, I-want clause, or + AC wording quality), or reviewing US completeness before scenario creation. Triggers on: "create a user story", "new user story for", "write acceptance criteria for", "document a business requirement", "define US AC", "user story template", "as a user I want", "elicit requirements", "AC for user story", "US acceptance criteria", - "review this user story", "is my narrative well-formed". + "review this user story", "is my narrative well-formed", "I-want clause". Does NOT trigger for: atomic component behaviors (use living-doc-create-functionality), - documenting system surfaces (use living-doc-create-feature). - Pairs with living-doc-create-functionality. + documenting system surfaces (use living-doc-create-feature), generating BDD scenarios + (use living-doc-scenario-creator). Pairs with living-doc-create-functionality. 
license: Apache-2.0 compatibility: GitHub Copilot --- @@ -110,15 +111,15 @@ When creating a new User Story, output **one fenced `json` code block** using th "features": ["FEAT-login"], "acceptance_criteria": [ { - "id": "US-001-AC-1", + "id": "AC:US-001-01", "description": "A registered customer with a phone number on file can request a password reset code by SMS and sees confirmation that the code was sent." }, { - "id": "US-001-AC-2", + "id": "AC:US-001-02", "description": "A customer who enters an unregistered phone number is told that the reset request cannot be completed." }, { - "id": "US-001-AC-3", + "id": "AC:US-001-03", "description": "A customer who submits an expired or already-used reset code is told to request a new code." } ] @@ -128,7 +129,7 @@ When creating a new User Story, output **one fenced `json` code block** using th Rules: - Use `title` rather than `name` - Use `as_a`, `i_want`, and `so_that` -- Every AC object must have `id` in `US--AC-` format and a plain-language `description` +- Every AC object must have `id` in `AC:US--` format and a plain-language `description` - Write AC descriptions in plain language — no structured language keywords in JSON values ## Anti-patterns to flag @@ -142,5 +143,5 @@ Rules: | User Story "I want" clause contains "and" | Multiple capabilities in one User Story — split at each “and”. Each capability has its own failure paths and may touch different Features; bundling them makes ACs ambiguous and traceability impossible. | | AC uses `{placeholder}` for a single value | Placeholder syntax is only justified when two or more values vary. If only one value applies, write it inline. Example: instead of `{error type}: inline validation message`, write `an inline validation message is shown`. | | AC describes a non-observable outcome | e.g. “a background job processes the record” — the user cannot observe this. Restate as the observable signal (e.g. 
“the confirmation email arrives within 60 seconds”), or redirect the behavior to a Functionality entity if it is purely technical. | -| AC identifier does not follow `US--AC-` | Every acceptance criterion in the JSON output needs a stable `US--AC-` id so it can be referenced unambiguously. | +| AC identifier does not follow `AC:US--` | Every acceptance criterion in the JSON output needs a stable `AC:US--` id so it can be referenced unambiguously. | | AC behavior already documented in another User Story | Duplicate ACs create a maintenance burden — any change must be applied in every copy. Extract the shared behavior into a Functionality entity and link both User Stories to it. | \ No newline at end of file diff --git a/skills/living-doc-create-user-story/evals/evals.json b/skills/living-doc-create-user-story/evals/evals.json index 8492b15..48ae98b 100644 --- a/skills/living-doc-create-user-story/evals/evals.json +++ b/skills/living-doc-create-user-story/evals/evals.json @@ -101,16 +101,68 @@ "id": 8, "category": "output-format", "prompt": "Create a user story: 'A customer service agent needs to view the full order history for any customer to resolve disputes.'", - "expected_output": "The output contains a fenced ```json code block with a UserStory entity. Required fields: type ('UserStory'), id (US-), title, as_a, i_want, so_that, acceptance_criteria (array of objects with id and description). The as_a/i_want/so_that fields follow the standard user story template. Each AC has a unique id in the format US--AC-. No Gherkin syntax appears inside the JSON values.", + "expected_output": "The output contains a fenced ```json code block with a UserStory entity. Required fields: type ('UserStory'), id (US-), title, as_a, i_want, so_that, acceptance_criteria (array of objects with id and description). The as_a/i_want/so_that fields follow the standard user story template. Each AC has a unique id in the format AC:US--. 
No Gherkin syntax appears inside the JSON values.", "files": [], "expectations": [ "Single fenced ```json code block", "type field is 'UserStory'", "id follows US- convention", "as_a/i_want/so_that follow standard template", - "Each AC has id in US--AC- format", + "Each AC has id in AC:US-- format", "No Gherkin syntax in JSON values" ] + }, + { + "id": 9, + "category": "regression", + "prompt": "Here is my User Story: 'As a customer, I want to browse products and add items to my cart so that I can purchase what I want.' Is this well-formed?", + "expected_output": "No. The 'I want' clause contains 'and' — it bundles two distinct capabilities: browsing products and adding items to cart. These should be two separate User Stories. Each has its own failure paths: browsing may fail due to search/filter issues; adding to cart may fail due to stock availability or login state. Split the story at each 'and'. Shared Functionalities (e.g. stock availability check) can be linked to both User Stories.", + "files": [], + "expectations": [ + "Detects the 'and' in the 'I want' clause", + "Recommends splitting into two separate User Stories", + "Explains that each capability has distinct failure paths", + "Notes shared Functionalities can be linked to both stories" + ] + }, + { + "id": 10, + "category": "regression", + "prompt": "Is this AC correct? 'When the customer submits an invalid {error type}: inline validation message is shown.'", + "expected_output": "No. The {placeholder} syntax is only justified when two or more values vary independently. Here only one concrete feedback type applies — the placeholder is unnecessary. Rewrite without the placeholder: 'When the customer submits invalid data, an inline validation message is shown.' 
If multiple distinct error types each require different handling, write a separate AC per error type rather than bundling them with a placeholder.", + "files": [], + "expectations": [ + "Flags single-value placeholder as an anti-pattern", + "Provides a rewritten AC without placeholder", + "Explains: placeholder only for independently varying values", + "Notes: write separate ACs for distinct error types" + ] + }, + { + "id": 11, + "category": "regression", + "prompt": "Is this AC valid for a User Story? 'When the customer places an order, a background job processes the payment record asynchronously.'", + "expected_output": "No. This AC describes a non-observable outcome — the customer cannot observe a background job. User Story ACs must be observable from the user's perspective. Restate as an observable signal: e.g. 'When the customer places an order, they see an order confirmation and receive a confirmation email.' If the behavior is purely technical, it belongs in a Functionality entity — redirect to living-doc-create-functionality.", + "files": [], + "expectations": [ + "Flags the AC as a non-observable outcome", + "Provides a rewritten AC with an observable user signal", + "Notes that purely technical behaviors belong in Functionality entities", + "Routes to living-doc-create-functionality for technical AC" + ] + }, + { + "id": 12, + "category": "edge-case", + "prompt": "The same AC appears in both US-007 and US-011: 'A verification email is sent to the customer within 30 seconds.' Should I duplicate it?", + "expected_output": "No. Duplicating ACs across User Stories creates a maintenance burden — any change must be applied in every copy, and they can drift. Extract the shared behavior into a Functionality entity (e.g. 'Send verification email within SLA') and link both US-007 and US-011 to it via the Functionality's user_stories field. 
Use living-doc-create-functionality to create the shared Functionality.", + "files": [], + "expectations": [ + "Flags duplicate AC as a maintenance burden anti-pattern", + "Recommends creating a shared Functionality entity", + "Both User Stories link to the shared Functionality", + "Points to living-doc-create-functionality for the extraction" + ] } ] } diff --git a/skills/living-doc-create-user-story/evals/files/incomplete-user-story.json b/skills/living-doc-create-user-story/evals/files/incomplete-user-story.json index 93f6e15..fe43c97 100644 --- a/skills/living-doc-create-user-story/evals/files/incomplete-user-story.json +++ b/skills/living-doc-create-user-story/evals/files/incomplete-user-story.json @@ -11,7 +11,7 @@ "features": ["FEAT-login"], "acceptance_criteria": [ { - "id": "US-042-AC-1", + "id": "AC:US-042-01", "description": "Happy path: password reset email is sent", "given": "the customer is on the forgot password page", "when": "the customer enters their registered email address and submits", diff --git a/skills/living-doc-create-user-story/evals/fixture-map.md b/skills/living-doc-create-user-story/evals/fixture-map.md index 431652e..30cd1a8 100644 --- a/skills/living-doc-create-user-story/evals/fixture-map.md +++ b/skills/living-doc-create-user-story/evals/fixture-map.md @@ -15,14 +15,21 @@ | 3 | happy-path | _(none)_ | Anti-pattern: invalid actor ("the system") | | 4 | regression | _(none)_ | Anti-pattern: technical AC (DB implementation detail) | | 5 | negative | _(none)_ | Routing: atomic behavior → living-doc-create-functionality | +| 6 | paraphrase | _(none)_ | SMS password reset — full elicitation with happy path + error paths | +| 7 | edge-case | _(none)_ | Two actors for same capability → two separate User Stories | +| 8 | output-format | _(none)_ | Canonical UserStory JSON: as_a/i_want/so_that, AC:US-- format | +| 9 | regression | _(none)_ | Anti-pattern: 'I want' clause with 'and' — two capabilities bundled | +| 10 | regression | _(none)_ |
Anti-pattern: single-value placeholder {error type} | +| 11 | regression | _(none)_ | Anti-pattern: non-observable outcome (background job) | +| 12 | edge-case | _(none)_ | Duplicate AC across two User Stories → shared Functionality | ## Trigger eval summary -14 entries: 10 `should_trigger=true`, 4 `should_trigger=false` +18 entries: 13 `should_trigger=true`, 5 `should_trigger=false` | Routes to | Query count | |---|---| | living-doc-create-feature | 1 | | living-doc-create-functionality | 1 | -| living-doc-scenario-creator | 1 | +| living-doc-scenario-creator | 2 | | living-doc-gap-finder | 1 | diff --git a/skills/living-doc-create-user-story/evals/trigger-eval.json b/skills/living-doc-create-user-story/evals/trigger-eval.json index 7f9d3d2..4a75a21 100644 --- a/skills/living-doc-create-user-story/evals/trigger-eval.json +++ b/skills/living-doc-create-user-story/evals/trigger-eval.json @@ -12,5 +12,9 @@ {"id": 11, "query": "Document the checkout page as a Feature entity", "should_trigger": false, "reason": "Feature entity creation — routes to living-doc-create-feature"}, {"id": 12, "query": "Document the atomic behavior: validate cart is not empty", "should_trigger": false, "reason": "Atomic behavior is a Functionality — routes to living-doc-create-functionality"}, {"id": 13, "query": "Generate BDD scenarios for US-001", "should_trigger": false, "reason": "Scenario generation — routes to living-doc-scenario-creator"}, - {"id": 14, "query": "What test gaps exist in our living documentation?", "should_trigger": false, "reason": "Gap analysis — routes to living-doc-gap-finder"} + {"id": 14, "query": "What test gaps exist in our living documentation?", "should_trigger": false, "reason": "Gap analysis — routes to living-doc-gap-finder"}, + {"id": 15, "query": "I want to write requirements for the loyalty points feature", "should_trigger": true, "reason": "'document a business requirement' pattern — User Story elicitation"}, + {"id": 16, "query": "Is my user story 
well-formed? Here it is: 'As a system, process the payment'", "should_trigger": true, "reason": "Validating a User Story narrative is a core skill task"}, + {"id": 17, "query": "My I-want clause contains 'and' — is that OK for a User Story?", "should_trigger": true, "reason": "Reviewing User Story narrative correctness is a core task of this skill"}, + {"id": 18, "query": "Create a BDD scenario for the checkout User Story", "should_trigger": false, "reason": "Scenario creation from a User Story — routes to living-doc-scenario-creator"} ] diff --git a/skills/living-doc-gap-finder/SKILL.md b/skills/living-doc-gap-finder/SKILL.md index 0fcbea3..9075165 100644 --- a/skills/living-doc-gap-finder/SKILL.md +++ b/skills/living-doc-gap-finder/SKILL.md @@ -3,14 +3,16 @@ name: living-doc-gap-finder description: > Identify gaps in the living documentation by combining bottom-up UI/code exploration with top-down requirement checking. Activate when auditing living doc completeness, finding - undocumented behaviors, discovering orphan tests with no AC link, detecting untested ACs, - producing a documentation coverage gap report, or proposing new living doc entities to fill - identified gaps. Orchestrates living-doc-create-* skills. + undocumented behaviors, discovering orphan tests with no AC link, orphan Functionalities with + no parent Feature, detecting untested ACs, producing a documentation coverage gap report + (including batch runs for large suites), or proposing new living doc entities to fill + identified gaps. Orchestrates living-doc-pageobject-scan and living-doc-create-* skills. Triggers on: "find what's not documented", "living doc gaps", "what's missing in living doc", - "find undocumented features", "orphan tests", "untested AC", "documentation coverage", - "gap report", "what's not covered", "living doc audit", "documentation audit". 
+ "find undocumented features", "orphan tests", "orphan functionalities", "untested AC", + "documentation coverage", "gap report", "what's not covered", "living doc audit", + "documentation audit". Does NOT trigger for: creating new living doc objects (use living-doc-create-* skills). - Orchestrates: all create-* skills. + Orchestrates: living-doc-pageobject-scan, living-doc-scenario-creator, and all create-* skills. license: Apache-2.0 compatibility: GitHub Copilot --- diff --git a/skills/living-doc-gap-finder/evals/evals.json b/skills/living-doc-gap-finder/evals/evals.json index 62548a0..274af84 100644 --- a/skills/living-doc-gap-finder/evals/evals.json +++ b/skills/living-doc-gap-finder/evals/evals.json @@ -5,7 +5,7 @@ "id": 1, "category": "happy-path", "prompt": "Run a gap analysis on our living documentation. File: evals/files/catalog-snapshot.json", - "expected_output": "Agent analyzes the snapshot and produces a gap report with: Blocker — US-001-AC-2 and US-001-AC-3 have no linked tests; Blocker — US-002-AC-1 and US-002-AC-2 have no linked tests; Blocker — all 4 ACs of US-007 have no linked tests; Blocker — FUNC-apply-discount has 5 ACs with no linked tests (Gap type 1 covers Functionality ACs, not just User Story ACs); Important — /account/preferences screen discovered in webapp with no Feature entity; Important — FEAT-promo and FEAT-orphan each have no linked User Stories (orphan Features); Important — test_order_history.py, test_login_flow.feature, and the 'View paginated order history' BDD scenario have no linked ACs (orphan tests); Nit — FEAT-orphan has no Functionalities (empty Feature). 
Documentation coverage reported for both US ACs and Functionality ACs separately.", + "expected_output": "Agent analyzes the snapshot and produces a gap report with: Blocker — US-001-AC-2 and US-001-AC-3 have no linked tests; Blocker — US-002-AC-1 and US-002-AC-2 have no linked tests; Blocker — all 4 ACs of US-007 have no linked tests; Blocker — FUNC-apply-discount has 5 ACs with no linked tests (Gap type 1 applies to Functionality ACs — report as UNTESTED_AC Blocker, not UNDOCUMENTED_FUNCTIONALITY); Important — /account/preferences screen discovered in webapp with no Feature entity (after normalisation: /account/orders ↔ FEAT-account, /reports/legacy ↔ FEAT-orphan are already documented); Important — FEAT-promo and FEAT-orphan each have no linked User Stories (orphan Features); Important — US-007 has no linked Feature (orphan User Story); Important — test_order_history.py, test_login_flow.feature, and the 'View paginated order history' BDD scenario have no linked ACs (orphan tests); Nit — FEAT-account and FEAT-orphan have no Functionalities defined (empty Features). 
Documentation coverage reported separately for US ACs and Functionality ACs.", "files": [ "evals/files/catalog-snapshot.json" ], @@ -19,8 +19,10 @@ "Identifies FEAT-orphan as orphan Feature (Important)", "Identifies test_order_history.py and test_login_flow.feature as orphan tests (Important)", "Identifies 'View paginated order history' BDD scenario as orphan test (Important)", - "Identifies FEAT-orphan as empty Feature (Nit — no Functionalities)", - "Calculates documentation coverage percentage" + "Identifies FEAT-account and FEAT-orphan as empty Features (Nit — no Functionalities)", + "Identifies US-007 as orphan User Story (Important — no linked Feature)", + "Normalises undocumented surfaces: only /account/preferences is truly undocumented after matching existing Features", + "Calculates documentation coverage percentage separately for US ACs and Functionality ACs" ] }, { @@ -88,7 +90,7 @@ "id": 7, "category": "edge-case", "prompt": "A test is linked to an AC, but the AC was deleted from the living doc. How is this classified and what should I do?", - "expected_output": "This is a broken-link gap (variant of Gap type 4: ORPHAN_TEST). The test references an AC ID that no longer exists. Resolution options: (1) If the behavior the test covers is still required, recreate the Functionality/AC entity and relink. (2) If the behavior has been removed, the test should be deleted after confirming with the product owner. (3) If the AC was merged into another entity, update the test's link comment to the new AC ID. Never delete a test without product owner confirmation.", + "expected_output": "This is a broken-link gap (variant of Gap type 6: ORPHAN_TEST). The test references an AC ID that no longer exists. Resolution options: (1) If the behavior the test covers is still required, recreate the Functionality/AC entity and relink. (2) If the behavior has been removed, the test should be deleted after confirming with the product owner. 
(3) If the AC was merged into another entity, update the test's link comment to the new AC ID. Never delete a test without product owner confirmation.", "files": [], "expectations": [ "Classifies as broken-link orphan test", @@ -110,6 +112,59 @@ "Gaps ordered by severity (Blocker before Important before Nit)", "Diagnostic only — no entity creation or modification" ] + }, + { + "id": 9, + "category": "regression", + "prompt": "A test references AC:US-042-01 but that AC was deprecated last sprint. What gap type is this and how do I resolve it?", + "expected_output": "This is a stale reference (Gap type 7 — Important). The active test references a Deprecated AC. Resolution options: (1) update the test's link to the active replacement AC if the behavior was superseded; (2) reinstate the AC using living-doc-update if it was deprecated in error; (3) if the behavior was intentionally removed, delete the test after product owner confirmation. The test must not be deleted without product owner confirmation.", + "files": [], + "expectations": [ + "Classifies as Gap type 7: STALE_REFERENCE", + "Classifies severity as Important", + "Provides three resolution options: relink to new AC, reinstate AC, or delete after PO confirmation", + "Warns against deleting test without product owner confirmation", + "Notes reinstatement path via living-doc-update" + ] + }, + { + "id": 10, + "category": "edge-case", + "prompt": "We have 50 orphan tests and 30 untested ACs across the entire platform. Should I run a single all-domain gap report and work through everything at once?", + "expected_output": "No — use the two-phase strategy. Phase 1: ensure every User Story has at least one covered AC. List all User Stories with zero covered ACs, cover the first AC of each before moving on. This establishes a minimum traceability baseline. 
Phase 2: once every US has at least one covered AC, rank gap clusters by count, prioritise the highest-risk domains first (payment, auth, security), batch by Feature or domain, and iterate. Processing all 80 gaps at once produces an unmanageable report and obscures progress.", + "files": [], + "expectations": [ + "Recommends two-phase strategy over single full-pass", + "Phase 1: ensure every US has at least one covered AC before proceeding", + "Phase 2: rank by count, prioritise high-risk domains, batch by domain", + "Explains why a single full-platform pass is discouraged" + ] + }, + { + "id": 11, + "category": "happy-path", + "prompt": "A Functionality entity FUNC-promo-validate exists in the catalog but has no parent Feature linked. What gap type is this and what should I do?", + "expected_output": "This is an orphan Functionality (Gap type 5 — Important). A Functionality with no parent Feature is untraceable — it cannot be reached via the entity hierarchy and is missed in impact analyses. Resolution: identify or create the owning Feature and add FUNC-promo-validate to its functionalities list. If tests reference this Functionality's ACs, resolve those first (ORPHAN_TEST takes priority) before removing the Functionality.", + "files": [], + "expectations": [ + "Classifies as Gap type 5: ORPHAN_FUNCTIONALITY", + "Classifies severity as Important", + "Advises linking to an existing Feature or creating one", + "Warns: do not remove if tests reference this Functionality's ACs" + ] + }, + { + "id": 12, + "category": "regression", + "prompt": "The gap-finder script reports /reports/legacy as an UNDOCUMENTED_SURFACE but there is already a Feature entity 'Legacy Report Screen' (FEAT-orphan) in the catalog. Should this be reported as a gap?", + "expected_output": "No. After normalisation, /reports/legacy is already documented — FEAT-orphan (Legacy Report Screen) clearly owns that surface by name and domain meaning. 
The skill instructs to treat a discovered screen as already documented when an existing Feature clearly owns the same surface by path, name, or domain meaning. Remove this item from the gap report. FEAT-orphan still has other gaps (orphan Feature, empty Feature) but UNDOCUMENTED_SURFACE is not one of them.", + "files": [], + "expectations": [ + "Removes /reports/legacy from UNDOCUMENTED_SURFACE gaps after normalisation", + "Explains normalisation rule: surface matches by name/domain meaning", + "Notes FEAT-orphan still has ORPHAN_FEATURE and EMPTY_FEATURE gaps", + "Distinguishes raw script output from normalised report" + ] } ] } diff --git a/skills/living-doc-gap-finder/evals/fixture-map.md b/skills/living-doc-gap-finder/evals/fixture-map.md index cb5d19e..4c7ff49 100644 --- a/skills/living-doc-gap-finder/evals/fixture-map.md +++ b/skills/living-doc-gap-finder/evals/fixture-map.md @@ -4,24 +4,33 @@ | File | Description | |---|---| -| `evals/files/catalog-snapshot.json` | Snapshot of the living doc catalog + webapp inventory showing: 8 uncovered ACs (including 5 critical), 1 undocumented screen, 1 orphan Feature, 2 orphan tests, 5 Functionality ACs with no linked tests | +| `evals/files/catalog-snapshot.json` | Snapshot of the living doc catalog + webapp inventory showing: 8 uncovered US ACs, 5 uncovered Functionality ACs, 1 undocumented screen after normalisation, 2 orphan Features, 1 orphan User Story, 3 orphan tests, and 2 empty Features | +| `evals/files/gap-report.json` | Expected gap report output produced by compute_gaps.py before normalisation — used as reference for output-format eval | ## Eval to fixture mapping | Eval ID | Category | Fixture file(s) | Coverage | |---|---|---|---| -| 1 | happy-path | `catalog-snapshot.json` | Full gap analysis: all 5 gap types detected, gap report with severity levels, coverage % calculation | -| 2 | happy-path | _(none)_ | Coverage metric explanation and calculation | -| 3 | happy-path | _(none)_ | Orphan test 
resolution: link or create Functionality, never delete | +| 1 | happy-path | `catalog-snapshot.json` | Full gap analysis: all 9 gap types, severity levels, normalisation rules, coverage % | +| 2 | happy-path | _(none)_ | Coverage metric formula and separate US vs Functionality reporting | +| 3 | happy-path | _(none)_ | ORPHAN_TEST resolution: link or create Functionality, never delete | | 4 | regression | _(none)_ | Batch processing advice: domain-by-domain, prioritise by business risk | | 5 | negative | _(none)_ | Routing: creating a User Story → living-doc-create-user-story | +| 6 | paraphrase | _(none)_ | "Holes in living doc" → gap analysis framing | +| 7 | edge-case | _(none)_ | Broken-link orphan test: AC ID deleted from catalog | +| 8 | output-format | `gap-report.json` | Canonical gap report JSON: coverage section + gaps[] array structure | +| 9 | regression | _(none)_ | STALE_REFERENCE (Gap type 7): active test linked to deprecated AC | +| 10 | edge-case | _(none)_ | Two-phase strategy for 50+ orphan tests and untested ACs | +| 11 | happy-path | _(none)_ | ORPHAN_FUNCTIONALITY (Gap type 5): Functionality with no parent Feature | +| 12 | regression | `catalog-snapshot.json` | Normalisation: /reports/legacy ↔ FEAT-orphan — not an UNDOCUMENTED_SURFACE | ## Trigger eval summary -14 entries: 11 `should_trigger=true`, 3 `should_trigger=false` +19 entries: 13 `should_trigger=true`, 6 `should_trigger=false` | Routes to | Query count | |---|---| | living-doc-create-user-story | 1 | | living-doc-create-feature | 1 | -| living-doc-tutorial-creator | 1 | +| living-doc-update | 1 | +| gherkin-step | 1 | diff --git a/skills/living-doc-gap-finder/evals/trigger-eval.json b/skills/living-doc-gap-finder/evals/trigger-eval.json index b2b0fa0..8206c4b 100644 --- a/skills/living-doc-gap-finder/evals/trigger-eval.json +++ b/skills/living-doc-gap-finder/evals/trigger-eval.json @@ -12,6 +12,10 @@ {"id": 11, "query": "Find what's not documented in our test suite", 
"should_trigger": true, "reason": "'find what's not documented' trigger phrase"}, {"id": 12, "query": "Create a user story for the preferences screen gap", "should_trigger": false, "reason": "Creating a User Story — routes to living-doc-create-user-story"}, {"id": 13, "query": "Create a Feature entity for the account preferences screen", "should_trigger": false, "reason": "Creating a Feature — routes to living-doc-create-feature"}, - {"id": 14, "query": "Generate a tutorial from the checkout .feature file", "should_trigger": false, "reason": "Tutorial generation — routes to living-doc-tutorial-creator"}, - {"id": 15, "query": "Do a documentation audit to check for missing tests before the go-live", "should_trigger": true, "reason": "'documentation audit' trigger phrase"} + {"id": 14, "query": "Implement step definitions for the gap report scenario", "should_trigger": false, "reason": "Step definition implementation — routes to gherkin-step"}, + {"id": 15, "query": "Do a documentation audit to check for missing tests before the go-live", "should_trigger": true, "reason": "'documentation audit' trigger phrase"}, + {"id": 16, "query": "Which Functionalities in our living doc have no parent Feature?", "should_trigger": true, "reason": "Detecting ORPHAN_FUNCTIONALITY gaps — core gap-finder task"}, + {"id": 17, "query": "A test is pointing to a deprecated AC — what kind of gap is that?", "should_trigger": true, "reason": "Stale reference detection (Gap type 7) — core gap-finder task"}, + {"id": 18, "query": "We have 100 orphan tests — how should we batch the gap-finder run?", "should_trigger": true, "reason": "Batching strategy for large-scale gap analysis — gap-finder guidance task"}, + {"id": 19, "query": "Update US-042 to add a new AC", "should_trigger": false, "reason": "Updating an existing entity — routes to living-doc-update"} ] diff --git a/skills/living-doc-impact-analysis/SKILL.md b/skills/living-doc-impact-analysis/SKILL.md index cf70fa6..914c436 100644 --- 
a/skills/living-doc-impact-analysis/SKILL.md +++ b/skills/living-doc-impact-analysis/SKILL.md @@ -9,7 +9,8 @@ description: > changes need living doc coverage traced. Triggers on: "living doc impact", "what does this change affect", "impact of PR on living doc", "trace affected user stories", "affected features", "impact analysis", "living doc sign-off", - "what user stories are affected", "PR impact on docs". + "what user stories are affected", "which scenarios need re-running", "what needs re-testing", + "PR impact on docs". Does NOT trigger for: updating living doc (use living-doc-update), finding coverage gaps (use living-doc-gap-finder), creating new entities (use living-doc-create-* skills). diff --git a/skills/living-doc-impact-analysis/evals/evals.json b/skills/living-doc-impact-analysis/evals/evals.json index a184185..2e196a3 100644 --- a/skills/living-doc-impact-analysis/evals/evals.json +++ b/skills/living-doc-impact-analysis/evals/evals.json @@ -3,125 +3,117 @@ "evals": [ { "id": 1, - "type": "happy-path", - "description": "Developer opens a PR modifying a domain service and wants to know what living doc entities are affected.", + "category": "happy-path", "prompt": "PR #217 modifies PromoService.java to support stacked discounts. What living doc entities does this change affect?", - "expected": { - "skill_triggered": true, - "key_guidance": [ - "Map changed file to Feature via the feature_registry section in catalog.json (or run trace_impact.py --catalog catalog.json)", - "Trace Feature → Functionality → User Stories → ACs", - "Classify impact level (High for changed business logic)", - "Output structured impact map" - ] - } + "expected_output": "Agent maps PromoService.java to its Feature via the feature_registry in catalog.json (or runs trace_impact.py). Traces Feature \u2192 Functionality \u2192 User Stories \u2192 ACs. Classifies impact as High (changed business logic). 
Outputs a structured impact map listing affected Features, Functionalities, User Stories, and ACs that require review.", + "files": [], + "expectations": [ + "Maps changed file to Feature via the feature_registry section in catalog.json (or runs trace_impact.py --catalog catalog.json)", + "Traces Feature \u2192 Functionality \u2192 User Stories \u2192 ACs", + "Classifies impact level as High for changed business logic", + "Outputs a structured impact map" + ] }, { "id": 2, - "type": "regression", - "description": "Developer asks which scenarios need re-running after an API contract change.", + "category": "regression", "prompt": "We changed the /v2/orders endpoint to add a required 'currency' field. Which Gherkin scenarios need to be re-run?", - "expected": { - "skill_triggered": true, - "key_guidance": [ - "Identify changed API contract surface area", - "Map endpoint to Feature and Functionalities", - "List linked User Stories and ACs", - "List linked Gherkin scenarios requiring re-run" - ] - } + "expected_output": "Agent identifies the changed API contract surface area (/v2/orders endpoint). Maps the endpoint to its Feature and Functionalities. Lists the linked User Stories and ACs. Lists the linked Gherkin scenarios that reference this endpoint's behavior and must be re-run.", + "files": [], + "expectations": [ + "Identifies changed API contract surface area", + "Maps endpoint to Feature and Functionalities", + "Lists linked User Stories and ACs", + "Lists linked Gherkin scenarios requiring re-run" + ] }, { "id": 3, - "type": "regression", - "description": "Developer needs a living doc impact sign-off for a release.", + "category": "regression", "prompt": "We're about to release the checkout refactor. 
Can you produce a living doc impact sign-off checklist for the release?", - "expected": { - "skill_triggered": true, - "key_guidance": [ - "Identify all High-impact entities", - "Produce release sign-off checklist", - "Include: ACs reviewed, scenarios re-run, living-doc-update applied, gherkin-living-doc-sync run" - ] - } + "expected_output": "Agent identifies all High-impact entities in the checkout release scope. Produces a release sign-off checklist covering: ACs reviewed, linked scenarios re-run, living-doc-update applied for changed business rules, and gherkin-living-doc-sync run for affected feature files.", + "files": [], + "expectations": [ + "Identifies all High-impact entities", + "Produces a release sign-off checklist", + "Checklist includes: ACs reviewed, scenarios re-run, living-doc-update applied, gherkin-living-doc-sync run" + ] }, { "id": 4, - "type": "regression", - "description": "Changed module has no entry in the feature_registry section of catalog.json — should flag the gap.", - "prompt": "The ShippingCalculator.java was changed in the PR but it has no entry in the feature_registry section of catalog.json.", - "expected": { - "skill_triggered": true, - "key_guidance": [ - "Flag missing living doc coverage", - "Recommend invoking living-doc-create-functionality for the missing module", - "Note this as a High-impact gap" - ] - } + "category": "regression", + "prompt": "The ShippingCalculator.java was changed in the PR but it has no entry in the feature_registry section of catalog.json.", + "expected_output": "Agent flags missing living doc coverage as a High-impact gap. The changed module has no feature_registry entry and cannot be traced to a Feature. 
Recommends invoking living-doc-create-functionality to document the missing behavior, and notes that the feature_registry mapping must be added.", + "files": [], + "expectations": [ + "Flags missing living doc coverage as a High-impact gap", + "Notes the module has no feature_registry entry", + "Recommends invoking living-doc-create-functionality for the missing module", + "Notes that the feature_registry mapping must be added" + ] }, { "id": 5, - "type": "regression", - "description": "Infra-only change with no business logic — should return 'None' impact level.", + "category": "regression", "prompt": "PR #300 updates the Kubernetes resource limits for the order-service deployment. What is the living doc impact?", - "expected": { - "skill_triggered": true, - "key_guidance": [ - "Classify as None impact level", - "Config/infra changes do not require living doc updates", - "Note in PR that no living doc update is needed" - ] - } + "expected_output": "Agent classifies this as a None impact level. Config and infrastructure changes that do not affect any business flow do not require living doc updates. Notes in the PR that no living doc update is needed.", + "files": [], + "expectations": [ + "Classifies as None impact level", + "Config/infra changes do not require living doc updates", + "Notes in PR that no living doc update is needed" + ] }, { "id": 6, - "type": "negative", - "description": "Developer asks to update the living doc — should trigger living-doc-update, not living-doc-impact-analysis.", + "category": "negative", "prompt": "Update US-042 to add a new AC for the expired promo path.", - "expected": { - "skill_triggered": false, - "reason": "Updating entities is handled by living-doc-update, not living-doc-impact-analysis" - } + "expected_output": "Updating living doc entities is out of scope for this skill \u2014 routes to living-doc-update. 
living-doc-impact-analysis traces which entities are affected by a code change; it does not create or modify entity content.", + "files": [], + "expectations": [ + "Does not update the User Story", + "Routes to living-doc-update", + "Explains the distinction: impact tracing vs. entity modification" + ] }, { "id": 7, - "type": "negative", - "description": "Developer asks to find living doc gaps — should trigger living-doc-gap-finder.", + "category": "negative", "prompt": "Which Functionalities don't have any User Stories?", - "expected": { - "skill_triggered": false, - "reason": "Finding coverage gaps is handled by living-doc-gap-finder, not living-doc-impact-analysis" - } + "expected_output": "Finding coverage gaps is out of scope for this skill \u2014 routes to living-doc-gap-finder. living-doc-impact-analysis traces the impact of code changes; coverage gap detection is handled by living-doc-gap-finder.", + "files": [], + "expectations": [ + "Does not search for orphan Functionalities", + "Routes to living-doc-gap-finder", + "Explains the distinction: impact analysis vs. gap detection" + ] }, { "id": 8, - "type": "paraphrase", - "description": "Same impact analysis intent — phrased as 'what needs re-testing' rather than 'what is affected'.", + "category": "paraphrase", "prompt": "We're about to merge a PR that changes the cart validation logic. What do we need to re-test in the living doc?", - "expected": { - "skill_triggered": true, - "key_guidance": [ - "Map changed code to Feature via the feature_registry section in catalog.json", - "Trace Feature → Functionality → User Stories → ACs", - "List all linked Gherkin scenarios that need re-running", - "Output structured re-test checklist" - ] - } + "expected_output": "Agent identifies this as an impact analysis request despite 're-test' phrasing. Maps the changed cart validation code to its Feature via the feature_registry in catalog.json. Traces Feature \u2192 Functionality \u2192 User Stories \u2192 ACs. 
Lists all linked Gherkin scenarios that need re-running. Outputs a structured re-test checklist.", + "files": [], + "expectations": [ + "Identifies this as an impact analysis request despite 're-test' phrasing", + "Maps changed code to Feature via the feature_registry section in catalog.json", + "Traces Feature \u2192 Functionality \u2192 User Stories \u2192 ACs", + "Lists all linked Gherkin scenarios that need re-running", + "Outputs a structured re-test checklist" + ] }, { "id": 9, - "type": "edge-case", - "description": "PR touches a shared utility class used by multiple Features — impact should fan out to all.", + "category": "edge-case", "prompt": "PR #410 modifies MoneyUtils.java, a shared utility class used by checkout, refunds, and the promotions engine. What is the living doc impact?", - "expected": { - "skill_triggered": true, - "key_guidance": [ - "Fan out the impact analysis to all Features that reference MoneyUtils", - "List all Functionalities in each Feature that call MoneyUtils", - "Classify all as High impact — shared utility changes affect all consumers", - "Produce a consolidated impact map across all three Feature areas" - ] - } + "expected_output": "Agent fans out the impact analysis to all Features that import or reference MoneyUtils. Lists all Functionalities within each Feature area that call MoneyUtils. Classifies all as High impact \u2014 shared utility changes propagate to all consumers. 
Produces a consolidated impact map across all three Feature areas (checkout, refunds, promotions).", + "files": [], + "expectations": [ + "Fans out impact analysis to all Features that reference MoneyUtils", + "Lists all Functionalities in each Feature that call MoneyUtils", + "Classifies all as High impact \u2014 shared utility changes affect all consumers", + "Produces a consolidated impact map across all three Feature areas" + ] }, { "id": 10, @@ -151,6 +143,34 @@ "Test coverage section lists tests to add or update", "Old and new signatures in fenced code blocks" ] + }, + { + "id": 12, + "category": "edge-case", + "prompt": "PR #501 modifies only OrderServiceTest.java and adds a new mock in MockNotificationClient.java. What is the living doc impact?", + "expected_output": "Agent classifies all changed files as test files and mocks \u2014 not domain logic or API contract. Impact level is None. Test-only changes do not affect business logic or living doc entities. Notes in the PR that no living doc update is needed.", + "files": [], + "expectations": [ + "Classifies all changed files as test files and mocks \u2014 not domain logic or API contract", + "Impact level: None \u2014 test-only changes do not affect living doc", + "No living doc update required", + "Notes in PR that no living doc update is needed" + ] + }, + { + "id": 13, + "category": "happy-path", + "prompt": "PR #222 modifies both DiscountService.java (domain logic) and DiscountController.java (REST controller). What living doc entities are affected?", + "expected_output": "Agent classifies DiscountService.java as domain logic (High impact) and DiscountController.java as API contract (High impact). Traces both files to the owning Feature via the feature_registry in catalog.json. Traces Feature \u2192 Functionalities \u2192 User Stories \u2192 ACs for each changed file. Consolidates entities appearing more than once as higher-risk. 
Outputs a single consolidated impact map covering both changed files.", + "files": [], + "expectations": [ + "Classifies DiscountService.java as domain logic \u2014 High impact", + "Classifies DiscountController.java as API contract \u2014 High impact", + "Traces both files to the owning Feature (e.g. FEAT-promotions) via feature_registry", + "Traces Feature \u2192 Functionalities \u2192 User Stories \u2192 ACs for both changed files", + "Consolidates entities appearing more than once \u2014 higher risk", + "Outputs a single consolidated impact map covering both changed files" + ] } ] } diff --git a/skills/living-doc-impact-analysis/evals/fixture-map.md b/skills/living-doc-impact-analysis/evals/fixture-map.md index ae432a4..fe71159 100644 --- a/skills/living-doc-impact-analysis/evals/fixture-map.md +++ b/skills/living-doc-impact-analysis/evals/fixture-map.md @@ -5,25 +5,41 @@ | 1 | happy-path | *(no file — PR domain logic impact map scenario)* | | 2 | regression | *(no file — API contract change scenario re-run list)* | | 3 | regression | *(no file — release sign-off checklist scenario)* | -| 4 | regression | *(no file — changed module missing from FEATURE_REGISTRY)* | +| 4 | regression | *(no file — changed module missing from feature_registry in catalog.json)* | | 5 | regression | *(no file — infra-only change None impact level)* | | 6 | negative | *(no file — update entity redirect to living-doc-update)* | | 7 | negative | *(no file — gap-finding redirect to living-doc-gap-finder)* | +| 8 | paraphrase | *(no file — "what needs re-testing" re-test checklist framing)* | +| 9 | edge-case | *(no file — shared utility MoneyUtils fan-out to all consumers)* | +| 10 | output-format | *(no file — code-level impact report: method signature change format)* | +| 11 | file-based | `changed-notification-service.py`: impact of NotificationClient.send() signature change | +| 12 | edge-case | *(no file — test-only PR: None impact level)* | +| 13 | happy-path | *(no file — PR 
with domain service + REST controller: fan-out trace)* | +| 14 | regression | *(no file — shared utility rounding change fan-out across three Features)* | ## Coverage summary -- happy-path: 1 (PR domain logic full impact trace) -- regression: 4 (API contract, release sign-off, missing registry entry, infra-only) +- happy-path: 2 (domain logic impact trace, multi-file fan-out) +- regression: 5 (API contract, release sign-off, missing registry entry, infra-only, shared utility) - negative: 2 (update entity redirect, gap-finder redirect) +- paraphrase: 1 (re-test checklist framing) +- edge-case: 2 (shared utility fan-out, test-only None impact) +- output-format: 1 (method signature change format) +- file-based: 1 (NotificationClient signature change) ## Rules exercised | Rule | Eval ID | |---|---| -| Map changed file → Feature → US → scenarios | 1 | +| Map changed file → Feature → US → scenarios | 1, 13 | | API contract change impact trace | 2 | | Release sign-off checklist | 3 | -| Flag missing FEATURE_REGISTRY coverage | 4 | +| Flag missing feature_registry coverage | 4 | | Classify infra change as None impact | 5 | | Out-of-scope: update entity → living-doc-update | 6 | | Out-of-scope: find gaps → living-doc-gap-finder | 7 | +| Re-test checklist framing | 8 | +| Shared utility fan-out to all consumers | 9, 14 | +| Method signature change code-level format | 10 | +| File-based method signature analysis | 11 | +| Test-only change → None impact | 12 | diff --git a/skills/living-doc-impact-analysis/evals/trigger-eval.json b/skills/living-doc-impact-analysis/evals/trigger-eval.json index f0baa5b..92b6271 100644 --- a/skills/living-doc-impact-analysis/evals/trigger-eval.json +++ b/skills/living-doc-impact-analysis/evals/trigger-eval.json @@ -64,5 +64,29 @@ "query": "Create a new User Story for the stacked discount feature.", "should_trigger": false, "reason": "Creating new entities is handled by living-doc-create-user-story." 
+ }, + { + "id": "t12-what-needs-retesting", + "query": "What do we need to re-test after this refactor?", + "should_trigger": true, + "reason": "'which scenarios need re-running' pattern — living doc impact re-test checklist." + }, + { + "id": "t13-shared-utility-impact", + "query": "MoneyUtils was changed — how far does the impact spread in the living doc?", + "should_trigger": true, + "reason": "Shared utility impact fan-out analysis — 'impact of PR on living doc' trigger." + }, + { + "id": "t14-test-only-change", + "query": "PR #501 only changes test files. Does it need a living doc update?", + "should_trigger": true, + "reason": "Still an impact analysis request — expected answer is 'None' impact level; skill should confirm no update needed." + }, + { + "id": "t15-not-gap-finder-orphan", + "query": "Find all Functionalities with no linked User Stories.", + "should_trigger": false, + "reason": "Finding coverage gaps is handled by living-doc-gap-finder." } ] diff --git a/skills/living-doc-pageobject-scan/SKILL.md b/skills/living-doc-pageobject-scan/SKILL.md index 27cac98..9de6d30 100644 --- a/skills/living-doc-pageobject-scan/SKILL.md +++ b/skills/living-doc-pageobject-scan/SKILL.md @@ -2,14 +2,14 @@ name: living-doc-pageobject-scan description: > Explore an existing web application or test codebase to discover, create, and maintain PageObject - classes — the bottom-up entry point for BDD-driven UI testing. Activate when generating - PageObjects from a live webapp URL or test directory, updating PageObjects after UI changes, - bootstrapping a test suite for a new screen, linking discovered UI surfaces to Feature entities - in the living doc, or detecting PageObject drift after a UI refactor. + classes — the bottom-up entry point for BDD-driven UI testing. 
Use when generating PageObjects + from a live webapp URL or test directory, updating PageObjects after UI changes, bootstrapping + a test suite for a new screen, generating Functionality stubs from discovered UI elements, + updating the PageObject manifest after a redesign, or detecting PageObject drift. Triggers on: "scan this webapp", "generate pageobjects", "update pageobjects", "pageobject for this screen", "crawl the UI", "discover UI elements", "create page objects", "scan test suite for pageobjects", "living doc bottom-up", "bootstrap page objects", - "pageobject drift", "sync pageobjects". + "pageobject drift", "sync pageobjects", "update manifest", "functionality stubs from UI". Does NOT trigger for: creating User Stories (use living-doc-create-user-story), writing BDD scenarios (use living-doc-scenario-creator). Pairs with living-doc-create-functionality and living-doc-gap-finder. diff --git a/skills/living-doc-pageobject-scan/evals/evals.json b/skills/living-doc-pageobject-scan/evals/evals.json index d57985d..5e4ee74 100644 --- a/skills/living-doc-pageobject-scan/evals/evals.json +++ b/skills/living-doc-pageobject-scan/evals/evals.json @@ -110,6 +110,51 @@ "Methods reference class-level selector constants — no inline selectors", "At least one action method and one assertion method" ] + }, + { + "id": 9, + "category": "happy-path", + "prompt": "During a Create mode scan of the checkout page, the agent discovers a 'Confirm Order' button and an 'Apply Promo Code' form input. What Functionality stubs should be generated?", + "expected_output": "Agent generates two Functionality stubs: (1) 'Checkout Page – Confirm Order' (for the button) and (2) 'Checkout Page – Apply Promo Code' (for the form input). Both are output to features/functionalities/feat-checkout/ with @FUNC_ID:FUNC-UNKNOWN placeholder tags for team review. 
The agent notes these are candidates requiring formal definition via living-doc-create-functionality before FUNC- IDs are assigned.", + "files": [], + "expectations": [ + "Generates one Functionality stub per distinct behavior (button → action, form input → action)", + "Stub names follow: ''", + "Stubs output to features/functionalities// directory", + "Stubs use @FUNC_ID:FUNC-UNKNOWN placeholder tags", + "Notes formal definition via living-doc-create-functionality is the next step" + ] + }, + { + "id": 10, + "category": "output-format", + "prompt": "Show me the expected structure of a generated CheckoutPage PageObject in TypeScript for a screen at /checkout linked to FEAT-003.", + "expected_output": "Output is a TypeScript code block. First line: '// living-doc: FEAT-003 | /checkout'. Import line: 'import { type Page, type Locator, expect } from \"@playwright/test\"'. Class is declared 'export class CheckoutPage'. Locator fields are declared as 'readonly'. The constructor takes 'readonly page: Page' and initialises each Locator via page.getByTestId(). Methods are async, named in camelCase, each wrapping a single interaction or assertion. No inline selectors in method bodies — only the class Locator fields are used.", + "files": [], + "expectations": [ + "Output is a TypeScript code block", + "File-level comment: // living-doc: FEAT-003 | /checkout", + "Import from @playwright/test (Page, Locator, expect)", + "Class exported: export class CheckoutPage", + "Locator fields declared as readonly", + "Constructor takes readonly page: Page", + "Methods are async camelCase", + "No inline selectors — Locators used from class fields" + ] + }, + { + "id": 11, + "category": "edge-case", + "prompt": "During a Maintain mode rescan, a route /admin/orders requires a multi-step authentication flow to reach. How should this be handled?", + "expected_output": "Check the manifest entry for /admin/orders for a navigation_context.navigation_steps field. 
If navigation steps are recorded (e.g. log in, navigate to admin panel, open orders tab), follow those steps instead of rediscovering the route from scratch. If no navigation context exists, prompt the team to add it to the manifest so future rescans can navigate to authenticated/multi-step routes reliably. Never skip a route just because it requires authentication — use the authentication strategy appropriate to the auth type (cookie/session, OAuth token injection, or TOTP).", + "files": [], + "expectations": [ + "Checks manifest for navigation_context.navigation_steps", + "Uses recorded navigation steps if available", + "Prompts to add navigation context if missing", + "Does not skip the route due to authentication requirement", + "Names appropriate auth strategy for the auth type" + ] } ] } diff --git a/skills/living-doc-pageobject-scan/evals/fixture-map.md b/skills/living-doc-pageobject-scan/evals/fixture-map.md new file mode 100644 index 0000000..6b15a76 --- /dev/null +++ b/skills/living-doc-pageobject-scan/evals/fixture-map.md @@ -0,0 +1,32 @@ +# Fixture Map — living-doc-pageobject-scan + +## Fixture files + +No fixture files for this skill. All evals are conversational or reference live webapp URLs. 
+ +## Eval to fixture mapping + +| Eval ID | Category | Fixture file(s) | Coverage | +|---|---|---|---| +| 1 | happy-path | _(none — URL-based Create mode)_ | Bootstrap CheckoutPage from /checkout: elements, selectors, Feature link | +| 2 | happy-path | _(none)_ | Fragile positional selector: FRAGILE comment + data-testid recommendation | +| 3 | happy-path | _(none)_ | Maintain mode: renamed data-testid selector → BREAKING CHANGE report | +| 4 | regression | _(none)_ | Maintain mode: removed element → BREAKING comment, never auto-delete method | +| 5 | negative | _(none)_ | Routing: BDD scenario generation → living-doc-scenario-creator | +| 6 | paraphrase | _(none)_ | "Playwright tests failing after redesign" → Maintain mode | +| 7 | edge-case | _(none)_ | API endpoint: PageObjects are for UI only → living-doc-create-functionality | +| 8 | output-format | _(none)_ | Python CheckoutPage skeleton: ALL_CAPS constants, method stubs, living-doc header | +| 9 | happy-path | _(none)_ | Create mode Step 5: Functionality stubs from discovered behaviors | +| 10 | output-format | _(none)_ | TypeScript CheckoutPage: readonly Locators, async methods, living-doc header | +| 11 | edge-case | _(none)_ | Maintain mode: multi-step auth route — navigation_context.navigation_steps | + +## Trigger eval summary + +18 entries: 14 `should_trigger=true`, 4 `should_trigger=false` + +| Routes to | Query count | +|---|---| +| living-doc-create-user-story | 1 | +| living-doc-scenario-creator | 1 | +| living-doc-update | 1 | +| living-doc-create-functionality | 1 (API endpoint redirect) | diff --git a/skills/living-doc-pageobject-scan/evals/trigger-eval.json b/skills/living-doc-pageobject-scan/evals/trigger-eval.json index 03867b5..3caf5a4 100644 --- a/skills/living-doc-pageobject-scan/evals/trigger-eval.json +++ b/skills/living-doc-pageobject-scan/evals/trigger-eval.json @@ -12,5 +12,9 @@ {"id": 11, "query": "There is pageobject drift after the latest UI change — what do I do?", 
"should_trigger": true, "reason": "'pageobject drift' trigger phrase"}, {"id": 12, "query": "Sync the PageObjects with the current app", "should_trigger": true, "reason": "'sync pageobjects' trigger phrase"}, {"id": 13, "query": "Create a User Story for the checkout screen", "should_trigger": false, "reason": "Creating User Stories — routes to living-doc-create-user-story"}, - {"id": 14, "query": "Generate BDD scenarios for User Story US-007", "should_trigger": false, "reason": "Generating BDD scenarios from a User Story — routes to living-doc-scenario-creator"} + {"id": 14, "query": "Generate BDD scenarios for User Story US-007", "should_trigger": false, "reason": "Generating BDD scenarios from a User Story — routes to living-doc-scenario-creator"}, + {"id": 15, "query": "Detect whether the login PageObject has drifted after the redesign", "should_trigger": true, "reason": "'pageobject drift' / 'update pageobjects' pattern — Maintain mode trigger"}, + {"id": 16, "query": "Deprecate the checkout feature", "should_trigger": false, "reason": "Deprecating a living doc entity — routes to living-doc-update"}, + {"id": 17, "query": "Generate Functionality stubs from the elements discovered on the checkout screen", "should_trigger": true, "reason": "Functionality stub generation from discovered UI behaviors is part of Create mode Step 5"}, + {"id": 18, "query": "Update the manifest for the admin portal after the UI redesign", "should_trigger": true, "reason": "Maintain mode manifest update — 'update pageobjects' / 'sync pageobjects' trigger pattern"} ] diff --git a/skills/living-doc-scenario-creator/SKILL.md b/skills/living-doc-scenario-creator/SKILL.md index 1bab175..670e180 100644 --- a/skills/living-doc-scenario-creator/SKILL.md +++ b/skills/living-doc-scenario-creator/SKILL.md @@ -3,16 +3,15 @@ name: living-doc-scenario-creator description: > From User Stories and Acceptance Criteria, generate BDD Gherkin scenario skeletons in .feature files and identify step 
implementations needed using available PageObjects. - Activate when generating Gherkin scenarios from a User Story, covering US AC with BDD - scenarios, mapping Given-When-Then to PageObject actions, identifying missing step - definitions, or auditing scenario-to-AC coverage. + Use when generating Gherkin scenarios from a User Story (e.g. US-007), covering US ACs with + BDD scenarios, mapping Given-When-Then to PageObject actions, auditing scenario-to-AC coverage, + or tagging partial AC coverage with aspect notation. Triggers on: "create BDD scenarios for user story", "generate scenarios for US", "cover AC with scenarios", "generate feature file from user story", "BDD from requirements", "scenario coverage for US", "map AC to scenarios", "gherkin from user story", "scenarios for US-", - "generate .feature file". + "generate .feature file", "AC coverage for US", "partial AC coverage". Does NOT trigger for: standalone Gherkin without a User Story (use gherkin-scenario), - implementing step definitions (use gherkin-step), writing unit tests (use test-unit-write), - doc gaps or undocumented behaviors (use living-doc-gap-finder). + implementing step definitions (use gherkin-step), doc gaps (use living-doc-gap-finder). Pairs with living-doc-create-user-story, gherkin-scenario, and living-doc-pageobject-scan. --- diff --git a/skills/living-doc-scenario-creator/evals/evals.json b/skills/living-doc-scenario-creator/evals/evals.json index d9656f1..ab0425d 100644 --- a/skills/living-doc-scenario-creator/evals/evals.json +++ b/skills/living-doc-scenario-creator/evals/evals.json @@ -33,7 +33,7 @@ "id": 3, "category": "happy-path", "prompt": "Generating scenarios for US-007-01 — 'Customer confirms the order'. The CheckoutPage PageObject has a confirm_order() method. Generate the step stub for 'When the customer confirms the order'.", - "expected_output": "Case A — PageObject method exists. Generates a full step stub using the available method. 
Output includes: the missing step text, the PageObject candidate (CheckoutPage, FEAT-003), the suggested step file (tests/steps/checkout_steps.py), and the generated stub: @when('the customer confirms the order') / def step_confirm_order(context): / context.checkout_page = CheckoutPage(context.browser) or context.checkout_page.confirm_order(). No NotImplementedError — the PageObject method exists.", + "expected_output": "Case A — PageObject method exists. Generates a full step stub using the available method. Output includes: the missing step text, the PageObject candidate (CheckoutPage, FEAT-003), the suggested step file (tests/steps/checkout_steps.py), and the generated stub: @when('the customer confirms the order') / def step_confirm_order(context): / context.checkout_page.confirm_order(). No NotImplementedError — the PageObject method exists. PageObject setup belongs in before_scenario, not in the step body.", "files": [], "expectations": [ "Identifies as Case A (PageObject method exists)", @@ -107,6 +107,47 @@ "Error scenario title follows convention: ''", "Coverage table appended after the feature file" ] + }, + { + "id": 9, + "category": "edge-case", + "prompt": "AC:US-010-01 is: 'The login form displays the username input field and the password input field.' I need to write two scenarios — one for each field. How should the @AC: tags look?", + "expected_output": "Since each scenario covers only one aspect of the multi-aspect AC, encode the aspect using the /aspect:value param on the tag: first scenario uses '@AC:US-010-01/aspect:username-input' with comment '# AC:US-010-01 (v1.0.0 - Active) — ... | aspect: username input'. Second scenario uses '@AC:US-010-01/aspect:password-input' with comment mirroring the aspect. 
The /aspect:value format allows coverage scripts to map each scenario to its specific aspect while the base AC ID keeps overall traceability intact.", + "files": [], + "expectations": [ + "Uses @AC:US-010-01/aspect:username-input for the first scenario", + "Uses @AC:US-010-01/aspect:password-input for the second scenario", + "# AC: comment includes '| aspect: ' annotation", + "Comment annotation mirrors the /aspect: tag value", + "Both scenarios map back to AC:US-010-01 via the base AC ID" + ] + }, + { + "id": 10, + "category": "output-format", + "prompt": "What tags and header blocks should appear at the Feature level (not the Scenario level) in a generated .feature file for US-007?", + "expected_output": "The feature file starts with: (1) a file-level comment block showing the source URL, business value statement, and the full AC list with versions and states; (2) a '@US_ID:US-007' tag immediately before the 'Feature:' keyword; (3) the 'Feature: ' block with the As-a/I-can/so-that narrative. The @US_ID: tag provides machine-readable file-level traceability distinct from the per-scenario @AC: tags. The # AC: scenario-level comments appear only immediately above each individual Scenario: block.", + "files": [], + "expectations": [ + "File-level comment block with source URL, business value, and AC list", + "@US_ID:US-007 tag immediately before Feature: keyword", + "Feature: block with As-a/I-can/so-that narrative", + "@US_ID: tag is distinct from per-scenario @AC: tags", + "Per-scenario @AC: and # AC: appear only directly above individual Scenario: blocks" + ] + }, + { + "id": 11, + "category": "regression", + "prompt": "I ran coverage_report.py and it shows AC:US-007-03 is not covered. But I have a scenario for that AC. 
What might be wrong?", + "expected_output": "The most likely causes: (1) the @AC:US-007-03 Cucumber tag is missing or misspelled on the scenario — the script matches on the machine-readable @AC: tag, not the # AC: comment; (2) the tag uses the wrong ID format (e.g. @AC:US-7-03 instead of @AC:US-007-03); (3) the feature file is not in the directory the script was pointed at. Check the @AC: tag on the scenario exactly matches the AC ID in the living doc catalog. Run: scripts/coverage_report.py <living_doc_dir> <features_dir>.", + "files": [], + "expectations": [ + "Identifies @AC: tag mismatch/missing as the most likely cause", + "Distinguishes machine-readable @AC: tag from human-readable # AC: comment", + "Notes ID format must match exactly (zero-padded)", + "Provides the coverage_report.py command for diagnosis" + ] } ] } diff --git a/skills/living-doc-scenario-creator/evals/fixture-map.md b/skills/living-doc-scenario-creator/evals/fixture-map.md new file mode 100644 index 0000000..742ec1f --- /dev/null +++ b/skills/living-doc-scenario-creator/evals/fixture-map.md @@ -0,0 +1,32 @@ +# Fixture Map — living-doc-scenario-creator + +## Fixture files + +No fixture files for this skill. All evals use inline User Story/AC definitions within the prompt. 
+ +## Eval to fixture mapping + +| Eval ID | Category | Fixture file(s) | Coverage | +|---|---|---|---| +| 1 | happy-path | _(none — inline US JSON in prompt)_ | Three active ACs → three scenarios; # AC: comments + @AC: tags; naming conventions | +| 2 | happy-path | _(none — inline AC list in prompt)_ | AC state filtering: Active → generated, Deprecated → skipped, Planned → skipped | +| 3 | happy-path | _(none)_ | Case A step stub: PageObject method exists — full stub, no NotImplementedError | +| 4 | regression | _(none)_ | Case B step stub: missing PageObject method — NotImplementedError + maintenance flag | +| 5 | negative | _(none)_ | Routing: standalone Gherkin without a US → gherkin-scenario | +| 6 | paraphrase | _(none)_ | "Write feature tests for US-nnn" → scenario generation request | +| 7 | edge-case | _(none)_ | All ACs Planned → zero scenarios generated; coverage report with skip reasons | +| 8 | output-format | _(none)_ | .feature file structure: @US_ID:, Feature: header, # AC: + @AC: per scenario | +| 9 | edge-case | _(none)_ | /aspect:value param encoding for multi-aspect ACs | +| 10 | output-format | _(none)_ | Feature-level @US_ID: tag vs. 
per-scenario @AC: tags | +| 11 | regression | _(none)_ | coverage_report.py: @AC: tag mismatch causing false "not covered" result | + +## Trigger eval summary + +18 entries: 13 `should_trigger=true`, 5 `should_trigger=false` + +| Routes to | Query count | +|---|---| +| gherkin-scenario | 1 | +| gherkin-step | 1 | +| living-doc-gap-finder | 1 | +| gherkin-living-doc-sync | 1 | diff --git a/skills/living-doc-scenario-creator/evals/trigger-eval.json b/skills/living-doc-scenario-creator/evals/trigger-eval.json index f235cec..795c6d8 100644 --- a/skills/living-doc-scenario-creator/evals/trigger-eval.json +++ b/skills/living-doc-scenario-creator/evals/trigger-eval.json @@ -11,6 +11,10 @@ {"id": 10, "query": "Generate a .feature file for the checkout flow", "should_trigger": true, "reason": "'generate .feature file' trigger phrase"}, {"id": 11, "query": "Write standalone Gherkin scenarios for an exploratory test", "should_trigger": false, "reason": "Standalone Gherkin without a User Story — routes to gherkin-scenario"}, {"id": 12, "query": "Implement the step definition for 'When the customer confirms the order'", "should_trigger": false, "reason": "Step definition implementation — routes to gherkin-step"}, - {"id": 13, "query": "Write a unit test for the promo code calculation", "should_trigger": false, "reason": "Unit test request — routes to test-unit-write"}, - {"id": 14, "query": "Find which User Stories have no Gherkin coverage at all", "should_trigger": false, "reason": "Finding doc gaps — routes to living-doc-gap-finder"} + {"id": 13, "query": "Write a unit test for the promo code calculation", "should_trigger": false, "reason": "Unit test request — out of scope for this toolkit (no test-unit-write skill defined)"}, + {"id": 14, "query": "Find which User Stories have no Gherkin coverage at all", "should_trigger": false, "reason": "Finding doc gaps — routes to living-doc-gap-finder"}, + {"id": 15, "query": "What is the AC coverage for US-007 — are all ACs covered 
by a scenario?", "should_trigger": true, "reason": "'scenario coverage for US' trigger phrase — auditing AC-to-scenario coverage"}, + {"id": 16, "query": "Generate BDD scenarios for all active ACs on US-003", "should_trigger": true, "reason": "'generate scenarios for US' / 'cover AC with scenarios' trigger phrase"}, + {"id": 17, "query": "My scenario for AC:US-010-01 covers only the username field — how should I tag it?", "should_trigger": true, "reason": "Aspect:value tag encoding is part of scenario generation for multi-aspect ACs"}, + {"id": 18, "query": "Sync feature files with living doc after AC changes", "should_trigger": false, "reason": "Syncing existing scenarios to living doc — routes to gherkin-living-doc-sync"} ] diff --git a/skills/living-doc-update/SKILL.md b/skills/living-doc-update/SKILL.md index 2b889dd..4d537c1 100644 --- a/skills/living-doc-update/SKILL.md +++ b/skills/living-doc-update/SKILL.md @@ -2,16 +2,16 @@ name: living-doc-update description: > Update, amend, or deprecate existing living documentation entities (User Stories, Features, - Functionalities). Activate when adding new ACs to an existing User Story, changing a Feature's - ownership or status, deprecating a Functionality whose code has been deleted, or promoting a - User Story from draft to ready. - Triggers on: "update user story", "add AC to user story", "deprecate feature", "mark US ready", - "change feature owner", "update functionality", "deprecate functionality", + Functionalities). Use when adding new ACs to an existing User Story, descoping or removing + an AC, changing a Feature's ownership or status, updating the Feature Registry after a team + restructure, deprecating a Functionality whose code has been deleted, or promoting a User + Story from draft to ready. 
+ Triggers on: "update user story", "add AC to user story", "descope AC", "deprecate feature", + "mark US ready", "change feature owner", "update functionality", "deprecate functionality", "living doc update", "update living doc entity", "mark feature deprecated", "update AC", - "change status of user story". - Does NOT trigger for: creating new entities from scratch (use living-doc-create-user-story, - living-doc-create-feature, or living-doc-create-functionality), finding gaps - (use living-doc-gap-finder). + "change status of user story", "update feature registry". + Does NOT trigger for: creating new entities (use living-doc-create-*), finding gaps + (use living-doc-gap-finder), generating scenarios (use living-doc-scenario-creator). license: Apache-2.0 compatibility: GitHub Copilot diff --git a/skills/living-doc-update/evals/evals.json b/skills/living-doc-update/evals/evals.json index d32ea68..b86815f 100644 --- a/skills/living-doc-update/evals/evals.json +++ b/skills/living-doc-update/evals/evals.json @@ -3,125 +3,116 @@ "evals": [ { "id": 1, - "type": "happy-path", - "description": "Developer adds a new AC to an existing User Story.", + "category": "happy-path", "prompt": "I need to add a new acceptance criterion to US-042 covering the case where a promo code has expired. How do I update the living doc?", - "expected": { - "skill_triggered": true, - "key_guidance": [ - "Assign next sequential AC ID: US-042-AC-<n+1>", - "Use AC completeness checklist", - "Flag for gherkin-living-doc-sync if linked scenarios need updating", - "Output change summary" - ] - } + "expected_output": "Agent assigns the next sequential AC ID. Runs the AC completeness checklist. Flags any linked Gherkin scenarios for review via gherkin-living-doc-sync if they need updating. 
Outputs a change summary.", + "files": [], + "expectations": [ + "Assigns next sequential AC ID", + "Runs AC completeness checklist", + "Flags for gherkin-living-doc-sync if linked scenarios need updating", + "Outputs change summary" + ] }, { "id": 2, - "type": "regression", - "description": "Developer promotes a User Story from draft to ready but some invariants are unmet.", + "category": "regression", "prompt": "I want to mark US-089 as ready for development, but I'm not sure it's complete.", - "expected": { - "skill_triggered": true, - "key_guidance": [ - "Check all promotion invariants: narrative complete, Feature linked, AC exists, error-path AC exists, no TODO markers", - "Warn for each failing invariant", - "Block promotion until all invariants pass" - ] - } + "expected_output": "Agent checks all promotion invariants: narrative complete, Feature linked, at least one AC exists, at least one error-path AC exists, and no TODO markers remain. Warns for each failing invariant. Blocks promotion until all invariants pass.", + "files": [], + "expectations": [ + "Checks all promotion invariants: narrative complete, Feature linked, AC exists, error-path AC exists, no TODO markers", + "Warns for each failing invariant", + "Blocks promotion until all invariants pass" + ] }, { "id": 3, - "type": "regression", - "description": "Developer deprecates a Functionality whose code has been deleted.", + "category": "regression", "prompt": "The `LegacyPaymentGatewayService` has been deleted from the codebase. How do I handle this in the living doc?", - "expected": { - "skill_triggered": true, - "key_guidance": [ - "Set status: deprecated — never delete the entity file", - "Add deprecated_at and deprecation_reason fields", - "Link to commit that deleted the code if possible", - "Flag linked Gherkin scenarios for gherkin-living-doc-sync" - ] - } + "expected_output": "Agent sets status to deprecated on the entity \u2014 never deletes the entity file. 
Adds deprecated_at and deprecation_reason fields. Links to the commit that deleted the code if possible. Flags any linked Gherkin scenarios for gherkin-living-doc-sync.", + "files": [], + "expectations": [ + "Sets status: deprecated \u2014 never deletes the entity file", + "Adds deprecated_at and deprecation_reason fields", + "Links to the commit that deleted the code if possible", + "Flags linked Gherkin scenarios for gherkin-living-doc-sync" + ] }, { "id": 4, - "type": "regression", - "description": "Developer changes Feature ownership after team restructure.", + "category": "regression", "prompt": "The checkout feature is now owned by team-payments-v2 instead of team-checkout. How do I update this in the living doc?", - "expected": { - "skill_triggered": true, - "key_guidance": [ - "Update owners array in Feature entity JSON", - "Add owner_changed_at and owner_change_reason fields", - "Notify new owner if open User Stories exist" - ] - } + "expected_output": "Agent updates the owners array in the Feature entity JSON to team-payments-v2. Adds owner_changed_at and owner_change_reason fields. Notifies the new owner if open User Stories are linked to the Feature.", + "files": [], + "expectations": [ + "Updates owners array in Feature entity JSON", + "Adds owner_changed_at and owner_change_reason fields", + "Notifies new owner if open User Stories exist" + ] }, { "id": 5, - "type": "regression", - "description": "Developer modifies an AC description after sprint review.", + "category": "regression", "prompt": "After the sprint review, the product owner clarified the wording of US-042-AC-1. How do I update it without breaking traceability?", - "expected": { - "skill_triggered": true, - "key_guidance": [ - "Keep AC ID stable — never change the ID", - "Only update description, given, when, then fields", - "Flag for gherkin-living-doc-sync if linked scenario text needs updating" - ] - } + "expected_output": "Agent keeps the AC ID stable \u2014 never changes the ID. 
Only updates the description, given, when, and then fields. Flags for gherkin-living-doc-sync if the linked scenario step text needs updating to match the revised wording.", + "files": [], + "expectations": [ + "Keeps AC ID stable \u2014 never changes the ID", + "Updates only description, given, when, then fields", + "Flags for gherkin-living-doc-sync if linked scenario text needs updating" + ] }, { "id": 6, - "type": "negative", - "description": "Developer asks to create a new User Story — should trigger living-doc-create-user-story.", + "category": "negative", "prompt": "Create a new User Story for the express checkout flow.", - "expected": { - "skill_triggered": false, - "reason": "Creating new US is handled by living-doc-create-user-story, not living-doc-update" - } + "expected_output": "Creating a new User Story is out of scope for this skill \u2014 routes to living-doc-create-user-story. living-doc-update amends or deprecates existing entities; it does not create new ones.", + "files": [], + "expectations": [ + "Does not create a new User Story", + "Routes to living-doc-create-user-story", + "Explains the distinction: update existing vs. create new" + ] }, { "id": 7, - "type": "negative", - "description": "Developer asks to find living doc gaps — should trigger living-doc-gap-finder.", + "category": "negative", "prompt": "Which User Stories don't have any linked Gherkin scenarios?", - "expected": { - "skill_triggered": false, - "reason": "Finding gaps is handled by living-doc-gap-finder, not living-doc-update" - } + "expected_output": "Finding coverage gaps is out of scope for this skill \u2014 routes to living-doc-gap-finder. living-doc-update modifies existing entities; gap detection is handled by living-doc-gap-finder.", + "files": [], + "expectations": [ + "Does not search for User Stories without scenarios", + "Routes to living-doc-gap-finder", + "Explains the distinction: entity update vs. 
gap detection" + ] }, { "id": 8, - "type": "paraphrase", - "description": "Same 'add AC' intent phrased as 'update the story' rather than 'add AC'.", - "prompt": "US-089 needs updating — we discovered a new edge case during testing: when the delivery address is outside our shipping zone, the order should be blocked with a clear message. Can you update the story?", - "expected": { - "skill_triggered": true, - "key_guidance": [ - "Assign next sequential AC ID", - "AC format: Given customer with out-of-zone address / When order is placed / Then order is blocked with SHIPPING_ZONE_EXCLUDED error", - "Flag for gherkin-living-doc-sync if linked scenarios need updating", - "Output change summary with the new AC" - ] - } + "category": "paraphrase", + "prompt": "US-089 needs updating \u2014 we discovered a new edge case during testing: when the delivery address is outside our shipping zone, the order should be blocked with a clear message. Can you update the story?", + "expected_output": "Agent identifies this as an add-AC request despite 'update the story' phrasing. Assigns the next sequential AC ID. Forms the AC: Given customer with out-of-zone address / When order is placed / Then order is blocked with SHIPPING_ZONE_EXCLUDED error. Flags for gherkin-living-doc-sync if linked scenarios need updating. 
Outputs a change summary with the new AC.", + "files": [], + "expectations": [ + "Identifies this as an add-AC request despite 'update the story' phrasing", + "Assigns next sequential AC ID", + "AC format: Given out-of-zone address / When order placed / Then blocked with SHIPPING_ZONE_EXCLUDED error", + "Flags for gherkin-living-doc-sync if linked scenarios need updating", + "Outputs change summary with the new AC" + ] }, { "id": 9, - "type": "edge-case", - "description": "AC status changed to 'descoped' mid-sprint — should not be deleted, only status updated.", - "prompt": "We decided during the sprint to descope US-042-AC-3 — the promo stacking rule is moving to a future release. How do I handle this in the living doc without losing the work?", - "expected": { - "skill_triggered": true, - "key_guidance": [ - "Set the AC status to 'descoped' — do not delete the AC", - "Add descoped_at and descoped_reason fields", - "Add a future_release reference if the work is planned for a later sprint", - "Flag any linked Gherkin scenarios for @wip or @pending tagging via gherkin-living-doc-sync" - ] - } + "category": "edge-case", + "prompt": "We decided during the sprint to descope US-042-AC-3 \u2014 the promo stacking rule is moving to a future release. How do I handle this in the living doc without losing the work?", + "expected_output": "Agent sets the AC status to 'descoped' \u2014 does not delete the AC. Adds descoped_at and descoped_reason fields. Adds a future_release reference if the work is planned for a later sprint. 
Flags any linked Gherkin scenarios for @wip or @pending tagging via gherkin-living-doc-sync.", + "files": [], + "expectations": [ + "Sets AC status to 'descoped' \u2014 does not delete the AC", + "Adds descoped_at and descoped_reason fields", + "Adds a future_release reference if planned for a later sprint", + "Flags linked Gherkin scenarios for @wip or @pending tagging via gherkin-living-doc-sync" + ] }, { "id": 10, @@ -151,6 +142,60 @@ "Linked Gherkin scenario identified for re-sync", "No other ACs or sections modified" ] - } + }, + { + "id": 12, + "type": "happy-path", + "description": "Developer deprecates a Feature that has been superseded by a new Feature.", + "prompt": "The 'Legacy Payment Widget' (FEAT-legacy-payment-widget) has been replaced by the new 'Payment Page' (FEAT-payment-page). How do I deprecate the old Feature in the living doc?", + "expected_output": "Set status: deprecated on FEAT-legacy-payment-widget. Add deprecated_at (today's date), deprecation_reason (e.g. 'Replaced by FEAT-payment-page'), and superseded_by: 'FEAT-payment-page'. Never delete the entity — deprecation preserves the audit trail. Flag any Functionalities owned by FEAT-legacy-payment-widget for deprecation review. Flag any tests linked to those Functionalities for update or removal.", + "files": [], + "expectations": [ + "Sets status: deprecated — never deletes the entity", + "Adds deprecated_at, deprecation_reason, and superseded_by fields", + "superseded_by points to FEAT-payment-page", + "Flags owned Functionalities for deprecation review", + "Flags linked tests for update or removal" + ] + }, + { + "id": 13, + "type": "regression", + "description": "Developer asks whether to validate the entity after making changes.", + "prompt": "I just added a new AC to US-042. Should I run any validation before committing?", + "expected_output": "Yes — run scripts/validate_entity.py against the updated entity file. 
This checks required fields, ID format (US-<nnn>-AC-<n> pattern), status values, and AC structure. With --catalog flag it also checks referential integrity against the full catalog. If the script exits 0, the entity is valid (warnings are non-blocking). If it exits 1, fix the reported errors before committing.", + "files": [], + "expectations": [ + "Recommends running validate_entity.py after any update", + "Notes the script checks field requirements, ID format, and status values", + "Notes --catalog flag for referential integrity checks", + "Explains exit 0 (valid with warnings) vs exit 1 (errors to fix)" + ] + }, + { + "id": 14, + "type": "regression", + "description": "Developer asks how to handle promoting a User Story that fails an invariant check.", + "prompt": "I want to set US-089 status to 'active', but it only has one AC and it's a happy-path AC. Is that OK?", + "expected_output": "No. US-089 cannot be promoted to 'active' because it is missing at least one error or alternative-path AC. The promotion invariant requires at least one error/alternative AC to prevent User Stories from shipping with untested failure paths. Add at least one AC for a failure scenario before promoting. Example: 'When the delivery address is outside the shipping zone, the order is rejected with a clear reason.'", + "files": [], + "expectations": [ + "Blocks promotion — invariant not met", + "Identifies the failing invariant: missing error/alternative-path AC", + "Proposes an example error-path AC", + "Does not set status to active until the invariant passes" + ] }, + { + "id": 15, + "category": "regression", + "prompt": "AC:US-042-01 is currently ACTIVE at v1.0.0. The discount threshold rule changed \u2014 the minimum order value is now \u00a375 instead of \u00a350. I need to update the AC.", + "expected_output": "Agent shows OLD and NEW AC side by side for confirmation. Updates the AC description to reflect the new threshold. 
Bumps the version from v1.0.0 to v1.1.0 \u2014 required for all business-rule changes to ACTIVE ACs. The AC ID stays unchanged. Any linked Gherkin scenarios annotated with '# AC: US-042-01' are flagged as potentially stale and handed to gherkin-living-doc-sync.", + "files": [], + "expectations": [ + "Version bumped from v1.0.0 to v1.1.0", + "AC ID stays unchanged", + "OLD and NEW AC shown side by side before writing", + "Linked Gherkin scenarios flagged for re-sync via gherkin-living-doc-sync" + ] } ] } diff --git a/skills/living-doc-update/evals/fixture-map.md b/skills/living-doc-update/evals/fixture-map.md index 81e414b..cedf2c4 100644 --- a/skills/living-doc-update/evals/fixture-map.md +++ b/skills/living-doc-update/evals/fixture-map.md @@ -9,21 +9,36 @@ | 5 | regression | *(no file — modify AC description without breaking traceability)* | | 6 | negative | *(no file — create US redirect to living-doc-create-user-story)* | | 7 | negative | *(no file — gap-finding redirect to living-doc-gap-finder)* | +| 8 | paraphrase | *(no file — add AC phrased as "update the story")* | +| 9 | edge-case | *(no file — descope AC mid-sprint: status=descoped, do not delete)* | +| 10 | output-format | *(no file — AC text change: OLD/NEW diff + linked scenario list)* | +| 11 | file-based | `payment-living-doc.md` | AC-2 SLA change from 3 s to 1 s (p99) | +| 12 | happy-path | *(no file — Feature deprecation with superseded_by field)* | +| 13 | regression | *(no file — validate_entity.py post-update validation)* | +| 14 | regression | *(no file — US promotion blocked by missing error-path AC)* | ## Coverage summary -- happy-path: 1 (add AC to User Story) -- regression: 4 (US promotion check, deprecate Functionality, Feature ownership change, modify AC) +- happy-path: 2 (add AC to User Story, Feature deprecation with superseded_by) +- regression: 5 (US promotion check, deprecate Functionality, Feature ownership, AC ID stability, validate after update) - negative: 2 (create US redirect, 
gap-finder redirect) +- paraphrase: 1 (add AC phrased as "update the story") +- edge-case: 1 (descope AC mid-sprint) +- output-format: 1 (AC diff format) +- file-based: 1 (payment living doc SLA update) ## Rules exercised | Rule | Eval ID | |---|---| | Add AC to existing User Story | 1 | -| US promotion invariants check | 2 | -| Deprecate entity — never delete | 3 | +| US promotion invariants check | 2, 14 | +| Deprecate entity — never delete | 3, 12 | | Feature ownership update in JSON + registry | 4 | | AC ID stability when modifying description | 5 | | Out-of-scope: create US → living-doc-create-user-story | 6 | | Out-of-scope: find gaps → living-doc-gap-finder | 7 | +| Descope AC mid-sprint | 9 | +| Change summary format with OLD/NEW diff | 10, 11 | +| superseded_by field on Feature deprecation | 12 | +| validate_entity.py post-update check | 13 | diff --git a/skills/living-doc-update/evals/trigger-eval.json b/skills/living-doc-update/evals/trigger-eval.json index 3291af0..faab5eb 100644 --- a/skills/living-doc-update/evals/trigger-eval.json +++ b/skills/living-doc-update/evals/trigger-eval.json @@ -70,5 +70,29 @@ "query": "Generate Gherkin scenarios for US-042.", "should_trigger": false, "reason": "Generating scenarios is handled by living-doc-scenario-creator." + }, + { + "id": "t13-change-status", + "query": "Change the status of User Story US-089 to active.", + "should_trigger": true, + "reason": "'change status of user story' is a listed trigger phrase." + }, + { + "id": "t14-not-impact-analysis", + "query": "What does this PR affect in the living doc?", + "should_trigger": false, + "reason": "PR impact analysis is handled by living-doc-impact-analysis, not living-doc-update." + }, + { + "id": "t15-descope-ac", + "query": "Descope AC-3 on US-042 — it's moving to the next sprint.", + "should_trigger": true, + "reason": "Descoping an AC mid-sprint is an update operation covered by the 'update AC' / 'change status of user story' trigger patterns." 
+ }, + { + "id": "t16-superseded-by", + "query": "Mark FEAT-legacy-payment-widget as deprecated and point to FEAT-payment-page as the replacement.", + "should_trigger": true, + "reason": "'mark feature deprecated' and 'deprecate feature' trigger phrases; superseded_by field is set as part of deprecation." } ] From e4cef548fcbadcfa3100bbd1dd0176888e5d6146 Mon Sep 17 00:00:00 2001 From: miroslavpojer <miroslav.pojer@absa.africa> Date: Wed, 27 May 2026 15:31:39 +0200 Subject: [PATCH 23/35] Update living documentation agent description and triggers for clarity and completeness --- .github/agents/living-doc-bdd-copilot.agent.md | 14 +++++++------- .github/agents/living-doc-copilot.agent.md | 12 +++++++----- 2 files changed, 14 insertions(+), 12 deletions(-) diff --git a/.github/agents/living-doc-bdd-copilot.agent.md b/.github/agents/living-doc-bdd-copilot.agent.md index 4a38b89..c0db2ed 100644 --- a/.github/agents/living-doc-bdd-copilot.agent.md +++ b/.github/agents/living-doc-bdd-copilot.agent.md @@ -2,13 +2,13 @@ description: > Bridge living documentation to executable tests. Explore web apps via MCP Playwright, generate and maintain PageObjects, Gherkin scenarios, and step definitions. - Covers webapp exploration with Business Seed assembly, iterative UI crawling, scenario - generation from User Story ACs, and BDD suite maintenance (RE-SCAN, HEALING, REMOVE). - Triggers: "scan webapp", - "generate pageobjects", "heal pageobjects", "generate scenarios", "sync gherkin", - "playwright crawl", "explore the app", "bdd copilot", "living doc bdd copilot", - "BDD pipeline", "crawl the UI", "create page objects", "generate feature file", - "scenario coverage", "step definitions", "gherkin from user story". + Covers webapp exploration with Business Seed assembly (seed.yaml, manifest.json), + iterative UI crawling with guided traversal support, scenario generation from User + Story ACs, and BDD suite maintenance (RE-SCAN, HEALING, REMOVE). 
Triggers: "scan + webapp", "generate pageobjects", "heal pageobjects", "generate scenarios", "sync + gherkin", "playwright crawl", "explore the app", "bdd copilot", "living doc bdd + copilot", "BDD pipeline", "crawl the UI", "create page objects", "generate feature + file", "scenario coverage", "step definitions", "gherkin from user story". tools: - read_file - replace_string_in_file diff --git a/.github/agents/living-doc-copilot.agent.md b/.github/agents/living-doc-copilot.agent.md index 935fff5..7793e52 100644 --- a/.github/agents/living-doc-copilot.agent.md +++ b/.github/agents/living-doc-copilot.agent.md @@ -2,11 +2,13 @@ description: > Maintain the living documentation catalog — single source of truth for requirements, behaviours, and traceability. Use for: creating Feature / Functionality / User Story - entities, updating or deprecating entities, analysing code change impact on docs, - finding documentation gaps, and PO planning in PLANNED state. - Triggers: "create user story", "document feature", "update AC", "impact analysis", - "living doc gaps", "PLAN mode", "HEALING mode", "deprecate entity", "living doc copilot", - "add AC to user story", "trace affected features", "update feature registry". + entities, updating or deprecating entities, checking AC completeness and promoting + User Stories to active, analysing code change impact on docs, finding documentation + gaps, and PO planning in PLANNED state. Triggers: "create user story", + "document feature", "update AC", "impact analysis", "living doc gaps", "PLAN mode", + "HEALING mode", "deprecate entity", "living doc copilot", "add AC to user story", + "trace affected features", "update feature registry", "mark US ready", + "check AC completeness". 
tools: - read_file - replace_string_in_file From 9edf2eb2b34e78c1cc226d42511e7501d56470a4 Mon Sep 17 00:00:00 2001 From: miroslavpojer <miroslav.pojer@absa.africa> Date: Wed, 27 May 2026 16:07:05 +0200 Subject: [PATCH 24/35] Update .gitignore and enhance living documentation skills with new context and function naming conventions --- .gitignore | 1 + skills/gherkin-step/SKILL.md | 30 ++++++++++++++++++++++ skills/living-doc-gap-finder/SKILL.md | 23 +++++++++-------- skills/living-doc-impact-analysis/SKILL.md | 29 +++++++++++++++++++++ skills/living-doc-pageobject-scan/SKILL.md | 12 ++++++++- 5 files changed, 83 insertions(+), 12 deletions(-) diff --git a/.gitignore b/.gitignore index 2cceedb..d96c070 100644 --- a/.gitignore +++ b/.gitignore @@ -1 +1,2 @@ **/__pycache__/* +skills-eval-workspace/* \ No newline at end of file diff --git a/skills/gherkin-step/SKILL.md b/skills/gherkin-step/SKILL.md index 6ef4301..2353147 100644 --- a/skills/gherkin-step/SKILL.md +++ b/skills/gherkin-step/SKILL.md @@ -26,6 +26,36 @@ scenario here. Explain that this skill covers **step definition code** only, the --- +## Context initialization — how PageObjects reach steps + +Step definitions receive a fresh `context` object each scenario. PageObjects must be attached to +`context` in a `before_scenario` hook (or a preceding `Given` step), not inside the step itself. 
+ +```python +# ✅ — Before hook initialises the PageObject once per scenario +# environment.py hook (assumes behave: hooks are plain functions, not decorators) +def before_scenario(context, scenario): + context.checkout_page = CheckoutPage(context.browser.new_page()) +``` + +The `When` step then delegates without creating or managing the PageObject: + +```python +@when('the customer confirms the order') +def step_confirm_order(context): + context.checkout_page.confirm_order() # relies on before_scenario having run +``` + +--- + +## Function naming convention + +Name step functions after the business action, not the full step text: +- `step_confirm_order` ✅ — concise, action-based +- `step_customer_confirms_the_order` ❌ — verbatim transcription of the step + +--- + ## Keep step definitions thin Step definitions are bindings — they translate Gherkin text into calls to PageObjects, domain diff --git a/skills/living-doc-gap-finder/SKILL.md b/skills/living-doc-gap-finder/SKILL.md index 9075165..3a888ce 100644 --- a/skills/living-doc-gap-finder/SKILL.md +++ b/skills/living-doc-gap-finder/SKILL.md @@ -42,9 +42,10 @@ The Workflow section describes the logic the script encodes — read it for unde delegate the computation to the script rather than reproducing it through reasoning. Before presenting the final report, normalise the script output against the taxonomy in this skill: -- Gap type 1 applies to **both User Story ACs and Functionality ACs**. If a Functionality has ACs and no linked tests, report those ACs as `UNTESTED_AC` **Blockers** (you may summarise as `FUNC-xyz has N ACs with no linked tests`) and do **not** leave the same root cause only as `UNDOCUMENTED_FUNCTIONALITY`. +- The first gap type (`UNTESTED_AC`) applies to **both User Story ACs and Functionality ACs**. If a Functionality has ACs and no linked tests, report those ACs as `UNTESTED_AC` **Blockers** (you may summarise as `FUNC-xyz has N ACs with no linked tests`) and do **not** leave the same root cause only as `UNDOCUMENTED_FUNCTIONALITY`.
- Report documentation coverage **separately** for User Story ACs and Functionality ACs, even if the raw script output gives a combined number. -- For Gap type 2, treat a discovered screen/API as already documented when an existing Feature clearly owns the same surface by path, name, or domain meaning (for example `/account/orders` ↔ `Account Dashboard`, `/reports/legacy` ↔ `Legacy Report Screen`). Only raise `UNDOCUMENTED_SURFACE` when no plausible owning Feature exists. +- For `UNDOCUMENTED_SURFACE`, treat a discovered screen/API as already documented when an existing Feature clearly owns the same surface by path, name, or domain meaning (for example `/account/orders` ↔ `Account Dashboard`, `/reports/legacy` ↔ `Legacy Report Screen`). Only raise `UNDOCUMENTED_SURFACE` when no plausible owning Feature exists. +- **Always refer to gap types by their name** (e.g. `ORPHAN_TEST`, `UNTESTED_AC`) — never by an ordinal number (e.g. "Gap type 6"). The priority order below is for triage, not for labelling gaps in the report. 
--- @@ -87,7 +88,7 @@ Traverse the entity graph top-down, starting from User Stories as roots: For each gap type: -**Gap type 1 — Untested AC:** +**UNTESTED_AC:** ``` For each AC in (UserStory.ACs + Functionality.ACs) where status IN (Active, Implemented) @@ -95,56 +96,56 @@ For each AC in (UserStory.ACs + Functionality.ACs) GAP: UNTESTED_AC ``` -**Gap type 2 — Undocumented UI surface:** +**UNDOCUMENTED_SURFACE:** ``` For each item in inventory (screens, API endpoints) where no Feature entity exists for this surface: GAP: UNDOCUMENTED_SURFACE ``` -**Gap type 3 — Orphan Feature:** +**ORPHAN_FEATURE:** ``` For each Feature reachable via entity relationships where user_stories == []: GAP: ORPHAN_FEATURE ``` -**Gap type 4 — Orphan User Story:** +**ORPHAN_USER_STORY:** ``` For each User Story in entity graph where user_story.features == []: GAP: ORPHAN_USER_STORY ``` -**Gap type 5 — Orphan Functionality:** +**ORPHAN_FUNCTIONALITY:** ``` For each Functionality in entity graph where functionality.parent_feature == null: GAP: ORPHAN_FUNCTIONALITY ``` -**Gap type 6 — Orphan test:** +**ORPHAN_TEST:** ``` For each test in inventory where no linked AC exists in any UserStory or Functionality: GAP: ORPHAN_TEST ``` -**Gap type 7 — Stale reference:** +**STALE_REFERENCE:** ``` For each test in inventory where linked_ac.status == Deprecated: GAP: STALE_REFERENCE ``` -**Gap type 8 — Undocumented Functionality:** +**UNDOCUMENTED_FUNCTIONALITY:** ``` For each Functionality reachable via Feature `functionalities` links where no test references this Functionality's ACs: GAP: UNDOCUMENTED_FUNCTIONALITY ``` -**Gap type 9 — Empty Feature:** +**EMPTY_FEATURE:** ``` For each Feature reachable via entity relationships where functionalities == []: diff --git a/skills/living-doc-impact-analysis/SKILL.md b/skills/living-doc-impact-analysis/SKILL.md index 914c436..4cd3275 100644 --- a/skills/living-doc-impact-analysis/SKILL.md +++ b/skills/living-doc-impact-analysis/SKILL.md @@ -49,6 +49,35 @@ to 
drive Steps 3–5 (impact classification, impact map narrative, and sign-off --- +## Fast path — infra/config-only and test-only PRs + +Before running the full workflow, check whether the PR scope is entirely out of living-doc reach. +If **all** changed files fall into one or more of these categories, issue a concise no-impact +verdict and stop — do not generate a full Impact Map: + +| Scope | Examples | Verdict | +|---|---|---| +| Pure infrastructure | Kubernetes manifests, Helm charts, Terraform, Docker resource limits | **No living doc impact** | +| Build / CI config | `Dockerfile`, GitHub Actions, `pom.xml` dependency bumps | **No living doc impact** | +| Test-only | `*Test.java`, `*Spec.ts`, mock/stub files, test fixtures | **No living doc impact** (unless a test references an AC that no longer exists — flag that separately) | +| Documentation / comments | `*.md`, `*.adoc`, Javadoc-only changes | **No living doc impact** | + +**Concise no-impact verdict format:** + +``` +Impact level: None. + +<PR description> is a <category> change. It does not modify business logic, API contracts, +event contracts, or UI behaviour, so no living doc entities require updating. + +Recommended action: note "no living doc update required" in the PR and proceed. +``` + +Skip Steps 2–5 for these PRs. Only escalate to the full workflow if at least one changed file +touches domain logic, an API contract, an event contract, or a UI component. + +--- + ## Step 1 — Identify the changed surface area Start from the code change (PR diff, renamed module, deleted endpoint): diff --git a/skills/living-doc-pageobject-scan/SKILL.md b/skills/living-doc-pageobject-scan/SKILL.md index 9de6d30..d2ff017 100644 --- a/skills/living-doc-pageobject-scan/SKILL.md +++ b/skills/living-doc-pageobject-scan/SKILL.md @@ -77,13 +77,23 @@ One PageObject class per distinct screen. Naming: `<ScreenName>Page`. 
# ✅ Generated skeleton — Python / Playwright # living-doc: FEAT-003 | /checkout class CheckoutPage: + ROUTE = '/checkout' ORDER_SUMMARY = '[data-testid="order-summary"]' CONFIRM_BUTTON = '[data-testid="confirm-order-btn"]' PROMO_INPUT = '[data-testid="promo-code-input"]' ERROR_BANNER = '[data-testid="error-banner"]' - def __init__(self, page): + def __init__(self, page: Page, base_url: str = '') -> None: self.page = page + self.base_url = base_url + + def open(self) -> 'CheckoutPage': + self.page.goto(f'{self.base_url}{self.ROUTE}') + self.wait_until_loaded() + return self + + def wait_until_loaded(self) -> None: + expect(self.page.locator(self.ORDER_SUMMARY)).to_be_visible() def enter_promo_code(self, code: str) -> None: self.page.fill(self.PROMO_INPUT, code) From 8e7ba762ec91f8bbd1282578a4c8890041c6719d Mon Sep 17 00:00:00 2001 From: miroslavpojer <miroslav.pojer@absa.africa> Date: Wed, 27 May 2026 16:49:28 +0200 Subject: [PATCH 25/35] Enhance evals with improved output formatting and additional queries for living documentation analysis --- .gitignore | 3 +- skills/gherkin-step/evals/evals.json | 32 +++-- skills/gherkin-step/evals/trigger-eval.json | 129 +++++++++++++++--- skills/living-doc-gap-finder/evals/evals.json | 44 +++--- .../evals/evals.json | 8 +- 5 files changed, 170 insertions(+), 46 deletions(-) diff --git a/.gitignore b/.gitignore index d96c070..632a17a 100644 --- a/.gitignore +++ b/.gitignore @@ -1,2 +1,3 @@ **/__pycache__/* -skills-eval-workspace/* \ No newline at end of file +skills-eval-workspace/* +agents-eval-workspace/* \ No newline at end of file diff --git a/skills/gherkin-step/evals/evals.json b/skills/gherkin-step/evals/evals.json index afcaaff..f3d33a1 100644 --- a/skills/gherkin-step/evals/evals.json +++ b/skills/gherkin-step/evals/evals.json @@ -8,7 +8,7 @@ "expected_output": "Outputs a thin step definition that delegates entirely to the PageObject: @when('the customer confirms the order') / def step_confirm_order(context): / 
context.checkout_page.confirm_order(). No CSS selectors, no business logic, and no assertions inside the step. The method call is the only line in the step body (plus any state retrieval from context).", "files": [], "expectations": [ - "Step delegates to CheckoutPage.confirm_order() — no selector or business logic in step body", + "Step delegates to CheckoutPage.confirm_order() - no selector or business logic in step body", "Uses @when decorator", "Accesses checkout_page via context object", "No assertions in a When step" @@ -18,7 +18,7 @@ "id": 2, "category": "happy-path", "prompt": "In behave, I have a Given step that creates a customer and a Then step that checks the discount. How do I pass the customer object between them without using global variables?", - "expected_output": "Use the context object — behave instantiates it fresh per scenario so there is no contamination. In the Given step, attach the object: context.customer = Customer(tier=tier). In the Then step, read it back: context.customer.discount_rate(). Never store state in module-level or global variables. Provides a code example showing both steps using context.customer.", + "expected_output": "Use the context object - behave instantiates it fresh per scenario so there is no contamination. In the Given step, attach the object: context.customer = Customer(tier=tier). In the Then step, read it back: context.customer.discount_rate(). Never store state in module-level or global variables. Provides a code example showing both steps using context.customer.", "files": [], "expectations": [ "Uses context object to pass state between steps", @@ -31,7 +31,7 @@ "id": 3, "category": "happy-path", "prompt": "I have a step 'When the customer adds the following items' that takes a DataTable with columns 'sku' and 'quantity'. How do I parse the table in behave?", - "expected_output": "Access the table via context.table and iterate with a for loop. 
Each row is a dict-like object: for row in context.table: context.cart.add_item(row['sku'], int(row['quantity'])). Notes that column values are strings by default — cast quantity to int explicitly. Provides a complete step definition example.", + "expected_output": "Access the table via context.table and iterate with a for loop. Each row is a dict-like object: for row in context.table: context.cart.add_item(row['sku'], int(row['quantity'])). Notes that column values are strings by default - cast quantity to int explicitly. Provides a complete step definition example.", "files": [], "expectations": [ "Uses context.table to access the DataTable", @@ -44,7 +44,7 @@ "id": 4, "category": "regression", "prompt": "Review this step definition:\n\n@when('the customer confirms the order')\ndef step_confirm_order(context):\n context.cart.total *= (1 - context.discount / 100)\n context.order_status = 'placed'\n context.db.save(context.order)", - "expected_output": "Flags the step as violating the 'keep step definitions thin' rule. Business logic (discount calculation, status assignment, DB write) must not live in step definitions — these belong in domain objects or PageObjects. The step should only call a method: context.checkout_page.confirm_order() or context.order_service.confirm_order(). Provides a corrected version that delegates to a domain object.", + "expected_output": "Flags the step as violating the 'keep step definitions thin' rule. Business logic (discount calculation, status assignment, DB write) must not live in step definitions - these belong in domain objects or PageObjects. The step should only call a method: context.checkout_page.confirm_order() or context.order_service.confirm_order(). 
Provides a corrected version that delegates to a domain object.", "files": [], "expectations": [ "Identifies business logic in the step body as the violation", @@ -57,7 +57,7 @@ "id": 5, "category": "negative", "prompt": "Write a Gherkin scenario for when the promo code is expired.", - "expected_output": "Writing Gherkin scenarios is out of scope for this skill — routes to gherkin-scenario. gherkin-step handles step definition code; gherkin-scenario handles Gherkin text.", + "expected_output": "Writing Gherkin scenarios is out of scope for this skill - routes to gherkin-scenario. gherkin-step handles step definition code; gherkin-scenario handles Gherkin text.", "files": [], "expectations": [ "Does not write a Gherkin scenario", @@ -69,7 +69,7 @@ "id": 6, "category": "paraphrase", "prompt": "How do I pass data between step files in behave? For example, I create an order in a Given step and need to check it in a Then step in a different file.", - "expected_output": "Use the context object — behave's built-in mechanism for sharing state across steps from different files. In the Given step (any file), assign context.order = .... In the Then step (another file), read context.order. The context is scoped to the scenario and reset between scenarios. Provides a concrete code example showing a Given in one file and Then in another, both using context.order.", + "expected_output": "Use the context object - behave's built-in mechanism for sharing state across steps from different files. In the Given step (any file), assign context.order = .... In the Then step (another file), read context.order. The context is scoped to the scenario and reset between scenarios. 
Provides a concrete code example showing a Given in one file and Then in another, both using context.order.", "files": [], "expectations": [ "Recommends context object for cross-file state sharing", @@ -82,7 +82,7 @@ "id": 7, "category": "edge-case", "prompt": "I want a Before hook in behave that only runs for scenarios tagged @database. How do I scope it?", - "expected_output": "Use the context.tags check inside the before_scenario hook: @before_scenario / def setup_database(context): / if 'database' in context.tags: / context.db = create_test_db(). This scopes setup to only the tagged scenarios. Notes that before_scenario runs before every scenario by default — the tag check prevents unnecessary setup. Advises pairing with an after_scenario hook to clean up: if 'database' in context.tags: context.db.teardown().", + "expected_output": "Use the context.tags check inside the before_scenario hook: @before_scenario / def setup_database(context): / if 'database' in context.tags: / context.db = create_test_db(). This scopes setup to only the tagged scenarios. Notes that before_scenario runs before every scenario by default - the tag check prevents unnecessary setup. Advises pairing with an after_scenario hook to clean up: if 'database' in context.tags: context.db.teardown().", "files": [], "expectations": [ "Uses context.tags check to scope the hook", @@ -95,14 +95,26 @@ "id": 8, "category": "output-format", "prompt": "Show me the correct structure for a Cucumber TypeScript step definition that reads the order ID from the World object and submits the order using a PageObject.", - "expected_output": "Output is a TypeScript code block. The step uses the When decorator and accesses this (typed as OrderWorld). Calls this.checkoutPage.submitOrder(this.orderId). The World interface/class includes orderId and checkoutPage properties. No CSS selectors appear in the step body — they are encapsulated in CheckoutPage. 
Example follows the pattern: When('the customer submits the order', async function (this: OrderWorld) { await this.checkoutPage.submitOrder(); }).", + "expected_output": "Output is a TypeScript code block. The step uses the When decorator and accesses this (typed as OrderWorld). Calls this.checkoutPage.submitOrder(this.orderId). The World interface/class includes orderId and checkoutPage properties. No CSS selectors appear in the step body - they are encapsulated in CheckoutPage. Example follows the pattern: When('the customer submits the order', async function (this: OrderWorld) { await this.checkoutPage.submitOrder(); }).", "files": [], "expectations": [ "Output is a TypeScript code block", "Step uses async function with this typed as a World class", - "Delegates to PageObject method — no selectors in step body", + "Delegates to PageObject method - no selectors in step body", "World object holds page and state properties" ] + }, + { + "id": 9, + "prompt": "My step is throwing AttributeError: \"Context\" object has no attribute \"checkout_page\". How do I fix this? The CheckoutPage class is in pages/checkout_page.py.", + "expected_output": "Explanation that context.checkout_page must be initialized before the step runs, using a before_scenario hook in environment.py; shows the correct before_scenario pattern attaching a CheckoutPage instance to context.", + "files": [] + }, + { + "id": 10, + "prompt": "Should I name my behave step function step_when_the_customer_clicks_the_confirm_order_button or step_confirm_order?", + "expected_output": "Recommends step_confirm_order - concise action-based name. 
Explains why verbose full-phrase names are discouraged: they duplicate the Gherkin text and make step files harder to scan.", + "files": [] } ] -} +} \ No newline at end of file diff --git a/skills/gherkin-step/evals/trigger-eval.json b/skills/gherkin-step/evals/trigger-eval.json index c5be3fd..474dbef 100644 --- a/skills/gherkin-step/evals/trigger-eval.json +++ b/skills/gherkin-step/evals/trigger-eval.json @@ -1,19 +1,112 @@ [ - {"id": 1, "query": "Write step definitions for the checkout feature file", "should_trigger": true, "reason": "'step definitions' trigger phrase"}, - {"id": 2, "query": "Implement Gherkin steps for the login scenarios", "should_trigger": true, "reason": "'implement Gherkin steps' trigger phrase"}, - {"id": 3, "query": "How do I write a Cucumber step for 'When the customer submits the order'?", "should_trigger": true, "reason": "'Cucumber step' trigger phrase"}, - {"id": 4, "query": "How do I write a behave step for 'Given a gold tier customer'?", "should_trigger": true, "reason": "'behave step' trigger phrase"}, - {"id": 5, "query": "How do I configure a parameter type so a number is cast to int?", "should_trigger": true, "reason": "'parameter type' trigger phrase"}, - {"id": 6, "query": "How do I parse a DataTable in a step definition?", "should_trigger": true, "reason": "'DataTable' trigger phrase"}, - {"id": 7, "query": "How do I access a DocString payload inside a step?", "should_trigger": true, "reason": "'DocString' trigger phrase"}, - {"id": 8, "query": "How do I set up a Before hook in Cucumber TypeScript?", "should_trigger": true, "reason": "'Before hook' trigger phrase"}, - {"id": 9, "query": "How do I clean up after each scenario using an After hook?", "should_trigger": true, "reason": "'After hook' trigger phrase"}, - {"id": 10, "query": "How do I use the World object to share page instances between steps?", "should_trigger": true, "reason": "'World object' trigger phrase"}, - {"id": 11, "query": "How do I manage step context 
across multiple step files?", "should_trigger": true, "reason": "'step context' trigger phrase"}, - {"id": 12, "query": "How do I share data between two step definitions in behave?", "should_trigger": true, "reason": "'step state sharing' trigger phrase"}, - {"id": 13, "query": "How do I share state between steps in a Cucumber scenario?", "should_trigger": true, "reason": "'how to share state between steps' trigger phrase"}, - {"id": 14, "query": "How do I register a step definition pattern for a new step text?", "should_trigger": true, "reason": "'register step definition' trigger phrase"}, - {"id": 15, "query": "How do I set up hooks for my Cucumber test suite?", "should_trigger": true, "reason": "'hook setup' trigger phrase"}, - {"id": 16, "query": "Write a Gherkin scenario for the promo code feature", "should_trigger": false, "reason": "Writing Gherkin scenarios — routes to gherkin-scenario"}, - {"id": 17, "query": "Write a unit test for the discount calculation function", "should_trigger": false, "reason": "Unit test request — out of scope for this toolkit (no test-unit-write skill defined)"} -] + { + "id": 1, + "query": "Write step definitions for the checkout feature file", + "should_trigger": true, + "reason": "'step definitions' trigger phrase" + }, + { + "id": 2, + "query": "Implement Gherkin steps for the login scenarios", + "should_trigger": true, + "reason": "'implement Gherkin steps' trigger phrase" + }, + { + "id": 3, + "query": "How do I write a Cucumber step for 'When the customer submits the order'?", + "should_trigger": true, + "reason": "'Cucumber step' trigger phrase" + }, + { + "id": 4, + "query": "How do I write a behave step for 'Given a gold tier customer'?", + "should_trigger": true, + "reason": "'behave step' trigger phrase" + }, + { + "id": 5, + "query": "How do I configure a parameter type so a number is cast to int?", + "should_trigger": true, + "reason": "'parameter type' trigger phrase" + }, + { + "id": 6, + "query": "How do I parse 
a DataTable in a step definition?", + "should_trigger": true, + "reason": "'DataTable' trigger phrase" + }, + { + "id": 7, + "query": "How do I access a DocString payload inside a step?", + "should_trigger": true, + "reason": "'DocString' trigger phrase" + }, + { + "id": 8, + "query": "How do I set up a Before hook in Cucumber TypeScript?", + "should_trigger": true, + "reason": "'Before hook' trigger phrase" + }, + { + "id": 9, + "query": "How do I clean up after each scenario using an After hook?", + "should_trigger": true, + "reason": "'After hook' trigger phrase" + }, + { + "id": 10, + "query": "How do I use the World object to share page instances between steps?", + "should_trigger": true, + "reason": "'World object' trigger phrase" + }, + { + "id": 11, + "query": "How do I manage step context across multiple step files?", + "should_trigger": true, + "reason": "'step context' trigger phrase" + }, + { + "id": 12, + "query": "How do I share data between two step definitions in behave?", + "should_trigger": true, + "reason": "'step state sharing' trigger phrase" + }, + { + "id": 13, + "query": "How do I share state between steps in a Cucumber scenario?", + "should_trigger": true, + "reason": "'how to share state between steps' trigger phrase" + }, + { + "id": 14, + "query": "How do I register a step definition pattern for a new step text?", + "should_trigger": true, + "reason": "'register step definition' trigger phrase" + }, + { + "id": 15, + "query": "How do I set up hooks for my Cucumber test suite?", + "should_trigger": true, + "reason": "'hook setup' trigger phrase" + }, + { + "id": 16, + "query": "Write a Gherkin scenario for the promo code feature", + "should_trigger": false, + "reason": "Writing Gherkin scenarios \u2014 routes to gherkin-scenario" + }, + { + "id": 17, + "query": "Write a unit test for the discount calculation function", + "should_trigger": false, + "reason": "Unit test request \u2014 out of scope for this toolkit (no test-unit-write skill 
defined)" + }, + { + "query": "How do I initialize the CheckoutPage in behave so that context.checkout_page is available in my When and Then step definitions?", + "should_trigger": true + }, + { + "query": "My step function is called step_when_the_customer_clicks_the_submit_order_button \u2014 is that the right naming convention for behave?", + "should_trigger": true + } +] \ No newline at end of file diff --git a/skills/living-doc-gap-finder/evals/evals.json b/skills/living-doc-gap-finder/evals/evals.json index 274af84..0a1d649 100644 --- a/skills/living-doc-gap-finder/evals/evals.json +++ b/skills/living-doc-gap-finder/evals/evals.json @@ -5,7 +5,7 @@ "id": 1, "category": "happy-path", "prompt": "Run a gap analysis on our living documentation. File: evals/files/catalog-snapshot.json", - "expected_output": "Agent analyzes the snapshot and produces a gap report with: Blocker — US-001-AC-2 and US-001-AC-3 have no linked tests; Blocker — US-002-AC-1 and US-002-AC-2 have no linked tests; Blocker — all 4 ACs of US-007 have no linked tests; Blocker — FUNC-apply-discount has 5 ACs with no linked tests (Gap type 1 applies to Functionality ACs — report as UNTESTED_AC Blocker, not UNDOCUMENTED_FUNCTIONALITY); Important — /account/preferences screen discovered in webapp with no Feature entity (after normalisation: /account/orders ↔ FEAT-account, /reports/legacy ↔ FEAT-orphan are already documented); Important — FEAT-promo and FEAT-orphan each have no linked User Stories (orphan Features); Important — US-007 has no linked Feature (orphan User Story); Important — test_order_history.py, test_login_flow.feature, and the 'View paginated order history' BDD scenario have no linked ACs (orphan tests); Nit — FEAT-account and FEAT-orphan have no Functionalities defined (empty Features). 
Documentation coverage reported separately for US ACs and Functionality ACs.", + "expected_output": "Agent analyzes the snapshot and produces a gap report with: Blocker \u2014 US-001-AC-2 and US-001-AC-3 have no linked tests; Blocker \u2014 US-002-AC-1 and US-002-AC-2 have no linked tests; Blocker \u2014 all 4 ACs of US-007 have no linked tests; Blocker \u2014 FUNC-apply-discount has 5 ACs with no linked tests (Gap type 1 applies to Functionality ACs \u2014 report as UNTESTED_AC Blocker, not UNDOCUMENTED_FUNCTIONALITY); Important \u2014 /account/preferences screen discovered in webapp with no Feature entity (after normalisation: /account/orders \u2194 FEAT-account, /reports/legacy \u2194 FEAT-orphan are already documented); Important \u2014 FEAT-promo and FEAT-orphan each have no linked User Stories (orphan Features); Important \u2014 US-007 has no linked Feature (orphan User Story); Important \u2014 test_order_history.py, test_login_flow.feature, and the 'View paginated order history' BDD scenario have no linked ACs (orphan tests); Nit \u2014 FEAT-account and FEAT-orphan have no Functionalities defined (empty Features). 
Documentation coverage reported separately for US ACs and Functionality ACs.", "files": [ "evals/files/catalog-snapshot.json" ], @@ -13,14 +13,14 @@ "Identifies US-001-AC-2 and US-001-AC-3 as untested (Blockers)", "Identifies US-002-AC-1 and US-002-AC-2 as untested (Blockers)", "Identifies all 4 US-007 ACs as untested (Blockers)", - "Identifies FUNC-apply-discount ACs as untested (Blocker, not Nit — Gap type 1 applies to Functionality ACs)", + "Identifies FUNC-apply-discount ACs as untested (Blocker, not Nit \u2014 Gap type 1 applies to Functionality ACs)", "Identifies /account/preferences as undocumented surface (Important)", "Identifies FEAT-promo as orphan Feature (Important)", "Identifies FEAT-orphan as orphan Feature (Important)", "Identifies test_order_history.py and test_login_flow.feature as orphan tests (Important)", "Identifies 'View paginated order history' BDD scenario as orphan test (Important)", - "Identifies FEAT-account and FEAT-orphan as empty Features (Nit — no Functionalities)", - "Identifies US-007 as orphan User Story (Important — no linked Feature)", + "Identifies FEAT-account and FEAT-orphan as empty Features (Nit \u2014 no Functionalities)", + "Identifies US-007 as orphan User Story (Important \u2014 no linked Feature)", "Normalises undocumented surfaces: only /account/preferences is truly undocumented after matching existing Features", "Calculates documentation coverage percentage separately for US ACs and Functionality ACs" ] @@ -29,10 +29,10 @@ "id": 2, "category": "happy-path", "prompt": "What is documentation coverage and how is it calculated?", - "expected_output": "Documentation coverage = (ACs with at least one linked test) / (total ACs) × 100%. Reported separately for User Story ACs (E2E coverage) and Functionality ACs (unit/integration coverage). A project with 100% documentation coverage has every AC backed by at least one test. 
The metric drives the gap-finder workflow toward zero gaps.", + "expected_output": "Documentation coverage = (ACs with at least one linked test) / (total ACs) * 100%. Reported separately for User Story ACs (E2E coverage) and Functionality ACs (unit/integration coverage). A project with 100% documentation coverage has every AC backed by at least one test. The metric drives the gap-finder workflow toward zero gaps.", "files": [], "expectations": [ - "Correct formula: covered ACs / total ACs × 100", + "Correct formula: covered ACs / total ACs * 100", "Reported separately for US ACs and Functionality ACs", "Notes 100% means every AC has at least one test" ] @@ -41,7 +41,7 @@ "id": 3, "category": "happy-path", "prompt": "A test file exists with no linked AC. What gap type is this and what should I do?", - "expected_output": "This is an orphan test (Gap type 6 — Important). Resolution options: (1) find an existing AC in the living doc that this test covers and add the link; (2) if no AC exists, create a Functionality entity for the behavior being tested using living-doc-create-functionality, then link the test to the new Functionality's AC. Never delete a test to resolve an orphan — that would remove coverage.", + "expected_output": "This is an orphan test (Gap type 6 \u2014 Important). Resolution options: (1) find an existing AC in the living doc that this test covers and add the link; (2) if no AC exists, create a Functionality entity for the behavior being tested using living-doc-create-functionality, then link the test to the new Functionality's AC. Never delete a test to resolve an orphan \u2014 that would remove coverage.", "files": [], "expectations": [ "Classifies as Gap type 6: ORPHAN_TEST", @@ -53,7 +53,7 @@ "id": 4, "category": "regression", "prompt": "We have 200 orphan tests from a legacy codebase. 
Should I run gap-finder on all of them at once?", - "expected_output": "Batch the gap-finder run by domain or Feature area rather than running across the entire codebase at once. Process the highest-risk areas first (payment, auth, security). For each batch: identify which Functionalities or User Stories the tests correspond to, create missing living doc entities, and link tests. Processing all 200 at once produces an unmanageable gap report — prioritise by business risk.", + "expected_output": "Batch the gap-finder run by domain or Feature area rather than running across the entire codebase at once. Process the highest-risk areas first (payment, auth, security). For each batch: identify which Functionalities or User Stories the tests correspond to, create missing living doc entities, and link tests. Processing all 200 at once produces an unmanageable gap report \u2014 prioritise by business risk.", "files": [], "expectations": [ "Recommends batching by domain or Feature area", @@ -65,7 +65,7 @@ "id": 5, "category": "negative", "prompt": "Create a new User Story for the account preferences screen.", - "expected_output": "Creating a User Story is not a gap-finder action — routes to living-doc-create-user-story. living-doc-gap-finder identifies and proposes new entities; the creation itself is delegated to the appropriate create-* skill.", + "expected_output": "Creating a User Story is not a gap-finder action \u2014 routes to living-doc-create-user-story. 
living-doc-gap-finder identifies and proposes new entities; the creation itself is delegated to the appropriate create-* skill.", "files": [], "expectations": [ "Does not create the User Story", @@ -103,21 +103,21 @@ "id": 8, "category": "output-format", "prompt": "Run a gap analysis and show me exactly what format the output report uses.", - "expected_output": "The gap report is emitted as structured JSON (or a formatted rendering of it) with a top-level `documentation_coverage` section (coverage_percentage, user_stories_with_full_coverage, user_stories_with_gaps) and a `gaps[]` array. Each gap item includes: id (GAP-NNN), type (one of UNTESTED_AC, UNDOCUMENTED_SURFACE, ORPHAN_FEATURE, ORPHAN_USER_STORY, ORPHAN_FUNCTIONALITY, ORPHAN_TEST, STALE_REFERENCE, UNDOCUMENTED_FUNCTIONALITY, EMPTY_FEATURE), severity (Blocker/Important/Nit), entity (the affected entity ID or path), description, and proposed_action. Gaps are ordered by severity (Blocker first, then Important, then Nit). The report is diagnostic only — no entity creation or modification is made.", + "expected_output": "The gap report is emitted as structured JSON (or a formatted rendering of it) with a top-level `documentation_coverage` section (coverage_percentage, user_stories_with_full_coverage, user_stories_with_gaps) and a `gaps[]` array. Each gap item includes: id (GAP-NNN), type (one of UNTESTED_AC, UNDOCUMENTED_SURFACE, ORPHAN_FEATURE, ORPHAN_USER_STORY, ORPHAN_FUNCTIONALITY, ORPHAN_TEST, STALE_REFERENCE, UNDOCUMENTED_FUNCTIONALITY, EMPTY_FEATURE), severity (Blocker/Important/Nit), entity (the affected entity ID or path), description, and proposed_action. Gaps are ordered by severity (Blocker first, then Important, then Nit). 
The report is diagnostic only \u2014 no entity creation or modification is made.", "files": [], "expectations": [ "Report includes top-level documentation_coverage section with coverage_percentage", "gaps[] array present; each item has id, type, severity, entity, description, proposed_action", "Gap type codes are canonical: UNTESTED_AC, ORPHAN_TEST, ORPHAN_FEATURE, etc.", "Gaps ordered by severity (Blocker before Important before Nit)", - "Diagnostic only — no entity creation or modification" + "Diagnostic only \u2014 no entity creation or modification" ] }, { "id": 9, "category": "regression", "prompt": "A test references AC:US-042-01 but that AC was deprecated last sprint. What gap type is this and how do I resolve it?", - "expected_output": "This is a stale reference (Gap type 7 — Important). The active test references a Deprecated AC. Resolution options: (1) update the test's link to the active replacement AC if the behavior was superseded; (2) reinstate the AC using living-doc-update if it was deprecated in error; (3) if the behavior was intentionally removed, delete the test after product owner confirmation. The test must not be deleted without product owner confirmation.", + "expected_output": "This is a stale reference (Gap type 7 \u2014 Important). The active test references a Deprecated AC. Resolution options: (1) update the test's link to the active replacement AC if the behavior was superseded; (2) reinstate the AC using living-doc-update if it was deprecated in error; (3) if the behavior was intentionally removed, delete the test after product owner confirmation. The test must not be deleted without product owner confirmation.", "files": [], "expectations": [ "Classifies as Gap type 7: STALE_REFERENCE", @@ -131,7 +131,7 @@ "id": 10, "category": "edge-case", "prompt": "We have 50 orphan tests and 30 untested ACs across the entire platform. 
Should I run a single all-domain gap report and work through everything at once?", - "expected_output": "No — use the two-phase strategy. Phase 1: ensure every User Story has at least one covered AC. List all User Stories with zero covered ACs, cover the first AC of each before moving on. This establishes a minimum traceability baseline. Phase 2: once every US has at least one covered AC, rank gap clusters by count, prioritise the highest-risk domains first (payment, auth, security), batch by Feature or domain, and iterate. Processing all 80 gaps at once produces an unmanageable report and obscures progress.", + "expected_output": "No \u2014 use the two-phase strategy. Phase 1: ensure every User Story has at least one covered AC. List all User Stories with zero covered ACs, cover the first AC of each before moving on. This establishes a minimum traceability baseline. Phase 2: once every US has at least one covered AC, rank gap clusters by count, prioritise the highest-risk domains first (payment, auth, security), batch by Feature or domain, and iterate. Processing all 80 gaps at once produces an unmanageable report and obscures progress.", "files": [], "expectations": [ "Recommends two-phase strategy over single full-pass", @@ -144,7 +144,7 @@ "id": 11, "category": "happy-path", "prompt": "A Functionality entity FUNC-promo-validate exists in the catalog but has no parent Feature linked. What gap type is this and what should I do?", - "expected_output": "This is an orphan Functionality (Gap type 5 — Important). A Functionality with no parent Feature is untraceable — it cannot be reached via the entity hierarchy and is missed in impact analyses. Resolution: identify or create the owning Feature and add FUNC-promo-validate to its functionalities list. If tests reference this Functionality's ACs, resolve those first (ORPHAN_TEST takes priority) before removing the Functionality.", + "expected_output": "This is an orphan Functionality (Gap type 5 \u2014 Important). 
A Functionality with no parent Feature is untraceable \u2014 it cannot be reached via the entity hierarchy and is missed in impact analyses. Resolution: identify or create the owning Feature and add FUNC-promo-validate to its functionalities list. If tests reference this Functionality's ACs, resolve those first (ORPHAN_TEST takes priority) before removing the Functionality.", "files": [], "expectations": [ "Classifies as Gap type 5: ORPHAN_FUNCTIONALITY", @@ -157,7 +157,7 @@ "id": 12, "category": "regression", "prompt": "The gap-finder script reports /reports/legacy as an UNDOCUMENTED_SURFACE but there is already a Feature entity 'Legacy Report Screen' (FEAT-orphan) in the catalog. Should this be reported as a gap?", - "expected_output": "No. After normalisation, /reports/legacy is already documented — FEAT-orphan (Legacy Report Screen) clearly owns that surface by name and domain meaning. The skill instructs to treat a discovered screen as already documented when an existing Feature clearly owns the same surface by path, name, or domain meaning. Remove this item from the gap report. FEAT-orphan still has other gaps (orphan Feature, empty Feature) but UNDOCUMENTED_SURFACE is not one of them.", + "expected_output": "No. After normalisation, /reports/legacy is already documented \u2014 FEAT-orphan (Legacy Report Screen) clearly owns that surface by name and domain meaning. The skill instructs to treat a discovered screen as already documented when an existing Feature clearly owns the same surface by path, name, or domain meaning. Remove this item from the gap report. 
FEAT-orphan still has other gaps (orphan Feature, empty Feature) but UNDOCUMENTED_SURFACE is not one of them.", "files": [], "expectations": [ "Removes /reports/legacy from UNDOCUMENTED_SURFACE gaps after normalisation", @@ -165,6 +165,18 @@ "Notes FEAT-orphan still has ORPHAN_FEATURE and EMPTY_FEATURE gaps", "Distinguishes raw script output from normalised report" ] + }, + { + "id": 13, + "prompt": "A Feature entity FEAT-checkout exists in the living doc but its functionalities list is empty \u2014 no Functionality entities are linked to it. What gap type is this and what is the priority?", + "expected_output": "Gap type EMPTY_FEATURE (not \"Gap type 9\"). Priority: Nit. Guidance: define Functionality entities for the behaviors this Feature owns using living-doc-create-functionality.", + "files": [] + }, + { + "id": 14, + "prompt": "FEAT-checkout is linked to no User Stories at all. What gap type is this?", + "expected_output": "Gap type ORPHAN_FEATURE (not \"Gap type 3\"). Priority: Important. Guidance: link at least one User Story to give the Feature traceable business value.", + "files": [] } ] -} +} \ No newline at end of file diff --git a/skills/living-doc-impact-analysis/evals/evals.json b/skills/living-doc-impact-analysis/evals/evals.json index 2e196a3..93166b6 100644 --- a/skills/living-doc-impact-analysis/evals/evals.json +++ b/skills/living-doc-impact-analysis/evals/evals.json @@ -171,6 +171,12 @@ "Consolidates entities appearing more than once \u2014 higher risk", "Outputs a single consolidated impact map covering both changed files" ] + }, + { + "id": 14, + "prompt": "PR #600 only updates README.md, adds inline code comments, and reformats a YAML config file with no value changes. What is the living doc impact?", + "expected_output": "No living doc impact. Docs-only and formatting-only PRs fall into the fast-path no-impact category. 
No entities to review or update.", + "files": [] } ] -} +} \ No newline at end of file From 17baf4effcb650ea7eb2f978f097a8759c78c14f Mon Sep 17 00:00:00 2001 From: miroslavpojer <miroslav.pojer@absa.africa> Date: Sat, 30 May 2026 16:08:37 +0200 Subject: [PATCH 26/35] Backup from integration --- .../agents/living-doc-bdd-copilot.agent.md | 371 +++++----------- .github/agents/living-doc-copilot.agent.md | 36 +- skills/bdd-explore/SKILL.md | 167 +++++++ skills/bdd-maintain/SKILL.md | 124 ++++++ .../scripts/find_unused_po_components.py | 136 ++++++ .../scripts/find_unused_po_methods.py | 174 ++++++++ .../bdd-maintain/scripts/find_unused_steps.py | 151 +++++++ skills/bdd-scenario-gen/SKILL.md | 87 ++++ skills/data-cy-instrument/SKILL.md | 240 ++++++++++ skills/living-doc-pageobject-scan/SKILL.md | 48 +- skills/references/living-doc-glossary.md | 417 ++++++++++++++++-- 11 files changed, 1645 insertions(+), 306 deletions(-) create mode 100644 skills/bdd-explore/SKILL.md create mode 100644 skills/bdd-maintain/SKILL.md create mode 100644 skills/bdd-maintain/scripts/find_unused_po_components.py create mode 100644 skills/bdd-maintain/scripts/find_unused_po_methods.py create mode 100644 skills/bdd-maintain/scripts/find_unused_steps.py create mode 100644 skills/bdd-scenario-gen/SKILL.md create mode 100644 skills/data-cy-instrument/SKILL.md diff --git a/.github/agents/living-doc-bdd-copilot.agent.md b/.github/agents/living-doc-bdd-copilot.agent.md index c0db2ed..0884a8f 100644 --- a/.github/agents/living-doc-bdd-copilot.agent.md +++ b/.github/agents/living-doc-bdd-copilot.agent.md @@ -8,22 +8,10 @@ description: > webapp", "generate pageobjects", "heal pageobjects", "generate scenarios", "sync gherkin", "playwright crawl", "explore the app", "bdd copilot", "living doc bdd copilot", "BDD pipeline", "crawl the UI", "create page objects", "generate feature - file", "scenario coverage", "step definitions", "gherkin from user story". 
-tools: - - read_file - - replace_string_in_file - - create_file - - grep_search - - file_search - - semantic_search - - run_in_terminal - - mcp_microsoft_pla_browser_navigate - - mcp_microsoft_pla_browser_snapshot - - mcp_microsoft_pla_browser_click - - mcp_microsoft_pla_browser_fill_form - - mcp_microsoft_pla_browser_take_screenshot - - mcp_microsoft_pla_browser_type - - mcp_microsoft_pla_browser_wait_for + file", "scenario coverage", "step definitions", "gherkin from user story", + "add missing data-cy", "instrument templates", "fix data-cy gaps", "add testids", + "fix playwright selectors". +tools: [vscode, execute, read, agent, browser, edit, search, web, 'playwright/*', todo] --- # @living-doc-bdd-copilot @@ -32,201 +20,99 @@ Automation layer agent. Explores web apps, generates PageObjects, produces Gherk --- -## Business Seed Assembly +## Session State Protocol -Before crawling, assemble the Business Seed file at `.copilot/bdd/seed.yaml`. +**On every session start**, create or load `.copilot/bdd/.session-state.md` (dot-prefix — add to `.gitignore`). -Sources A–E — collect from whichever are available: +This file is the agent's working memory. It keeps the context window small during long sessions: instead of holding the full manifest and all skill content in context, the agent writes progress to disk and loads only what it needs next. -| Source | Behaviour | -|---|---| -| **A — Living documentation** | Extract Feature names, US titles, and AC texts. Map each Feature to its primary URL/route if known. | -| **B — Sitemap or route config** | Parse route definitions (Angular router, React Router, `sitemap.xml`) to enumerate URL paths. | -| **C — OpenAPI / Swagger spec** | Extract endpoint paths; map REST resources to UI screens where obvious. | -| **D — Existing PageObjects** | Load current `.copilot/bdd/manifest.json` if present — treat known surfaces as already discovered. | -| **E — Guided traversal** | See Source E protocol below. 
| - -**Credential safety rule:** Never store literal credentials in `seed.yaml`. Always use `env:VAR_NAME` as the value, e.g.: - -```yaml -credentials: - username: env:BDD_USERNAME - password: env:BDD_PASSWORD -``` - -**Artifact location:** BDD artifacts can live anywhere in the repository. On session start, discover them: - -1. Search for `seed.yaml` containing a `base_url:` key. -2. Search for `manifest.json` containing an array with `pageobject_path` entries. -3. If found, load both files and record their paths for this session. -4. If NOT found, create them at a sensible location (e.g. alongside the existing living documentation directory if one exists, otherwise `.copilot/bdd/`). -5. **On first discovery:** propose adding their locations to `.github/copilot-instructions.md` so every future agent session can load them without searching: +**Schema:** ```markdown -## BDD Artifacts -- **Business Seed:** `<relative-path>/seed.yaml` — webapp routes, credentials (env refs), guided traversal steps -- **Exploration Manifest:** `<relative-path>/manifest.json` — discovered UI surfaces, component IDs, PageObject paths -``` +# BDD Session State +_Auto-managed by @living-doc-bdd-copilot. Delete when session complete._ -Committing both files means every subsequent session resumes from the last known state — no re-crawl required. - -**Output artifact:** `seed.yaml` (path discovered or chosen above) - -```yaml -base_url: https://... -credentials: - username: env:BDD_USERNAME - password: env:BDD_PASSWORD -known_routes: - - path: /login - feature: Authentication - - path: /dashboard - feature: Dashboard -guided_steps: [] # populated during Source E traversal -``` +## Mode +<!-- EXPLORE | SCENARIO-GEN | HEAL | RE-SCAN | REMOVE --> ---- +## Goal +<!-- One sentence: what this session must accomplish --> -## Iterative Exploration - -**On session start:** Load `seed.yaml`. 
If `.copilot/bdd/manifest.json` is present, load it — treat all listed surfaces as already discovered and resume from there. If manifest is absent, treat this as the first run (clean slate). - -**Partial state rule:** `seed.yaml` present but `manifest.json` absent = first exploration run. Begin crawl from `base_url`; do not assume any surfaces have been discovered. - -**Crawl loop:** - -1. Navigate to each known route from `seed.yaml` using MCP Playwright. -2. Snapshot the page; identify interactive elements, forms, navigation links, and significant UI surfaces. -3. Follow links and expand navigation to discover new routes not in the manifest. -4. For each new surface discovered: add an entry to `manifest.json` (Feature name, URL, component IDs, PageObject path). -5. Repeat until coverage plateau — no new surfaces found in the last full iteration. -6. Report any unreachable areas — auth walls, dead links, CAPTCHA gates, or forms that cannot be progressed due to missing business knowledge (unknown valid input values, business-specific field formats, required lookup codes, conditional field logic). Offer to enrich `seed.yaml` with missing routes, credentials, or form values, then loop. - -**PageObject generation rule:** For every new or changed UI surface, load `living-doc-pageobject-scan` — `Create` mode for first-time generation and `Maintain` mode for selector drift. Generated PageObjects must use a file-level `living-doc: FEAT-<nnn> | /route` header comment, prefer `data-testid` selectors, keep selector constants in `ALL_CAPS`, accept `page` in `__init__` / `constructor`, and expose method stubs for each interactive element. Flag any positional CSS selector as `FRAGILE`. If no matching Feature exists in the living documentation, hand the surface to `@living-doc-copilot`; do not create entities here. - -**Output artifact:** `.copilot/bdd/manifest.json` - -The manifest records per-route exploration state. 
Schema matches the `living-doc-pageobject-scan` skill definition: - -```json -{ - "version": "1.0", - "routes": { - "/login": { - "pageobject_path": "aul-ui/playwright/pages/LoginPage.ts", - "feature_id": "FEAT-001", - "last_scanned": "2026-05-26T10:30:00Z", - "elements": [ - { "data_cy": "username-input", "tag": "input" }, - { "data_cy": "password-input", "tag": "input" }, - { "data_cy": "login-btn", "tag": "cps-button" } - ], - "coverage_gaps": [], - "navigation_context": { - "prerequisites": null, - "navigation_steps": "Navigate directly to /login.", - "data_requirements": null, - "auth_role": "unauthenticated", - "notes": null - } - } - } -} -``` +## Artifacts +- seed.yaml: <path> +- manifest.json: <path> ---- - -## Source E — Guided Traversal Protocol +## Route Progress +<!-- Per-route status. Only routes relevant to this session. --> +- [ ] /route-a — pending +- [-] /route-b — IN PROGRESS (note current sub-step or blocker) +- [x] /route-c — done -Use when automated crawling cannot proceed — unknown decision points, multi-step wizards, auth flows, role-gated screens, or forms blocked by missing business knowledge (required field values, valid lookup codes, business-specific input formats). +## Current Position +<!-- What is the agent doing RIGHT NOW — route, wizard step, form field, etc. --> -**Protocol:** +## Pending Actions +<!-- Ordered. Remove items as they complete. --> +1. <next action> +2. <action after that> -1. Take a screenshot; show the user what the agent sees. -2. Ask: *"I've reached a decision point at [URL]. What should I do next? (e.g. click X, fill field Y with Z, log in as role R, provide the valid value for field F)"* -3. Wait for the user's answer. Execute the described action via MCP Playwright. -4. 
Immediately append to `guided_steps:` in `seed.yaml`: - -```yaml -guided_steps: - - url: /checkout/payment - action: fill - field: card-number - value: env:TEST_CARD_NUMBER - note: "Test Visa card for payment flow" +## Decisions & Findings +<!-- Notes that would be expensive to re-discover: dead ends, field constraints, + role requirements, entity IDs resolved this session, CAPTCHA steps taken. --> ``` -5. Continue crawl from the new state. +**Update rules:** +- Update `Current Position` and `Route Progress` after every route completes. +- Append to `Decisions & Findings` whenever you discover something non-obvious. +- Never store full element arrays here — those belong in `manifest.json`. +- Delete the file when the session goal is fully achieved. -**CAPTCHA rule:** If a CAPTCHA is encountered, pause and ask the user to solve it manually in the browser. Do not attempt automated bypass. Once the user confirms it is solved, continue and record the step with `action: captcha_solved`. +**On resume** (session-state file already exists): read it first, then load only the skill and manifest entries relevant to `Current Position` and `Pending Actions`. Do not reload completed routes. --- -## Scenario Generation - -After exploration completes (manifest is up to date): +## Mode Dispatch -1. Use the `living-doc-gap-finder` skill (bottom-up mode) to identify User Stories with `ACTIVE` ACs that have no linked Gherkin scenario. -2. For each gap: load the `living-doc-scenario-creator` skill and generate Gherkin scenario skeletons — one scenario per `Active` or `Implemented` AC, with the mandatory `@AC:` traceability tag. Skip `Planned` and `Deprecated` ACs. -3. Write `.feature` files under `features/us/` using `us-<nnn>-<kebab-title>.feature` naming, e.g. `features/us/us-007-place-an-online-order.feature`. -4. The `Feature:` header must restate the User Story narrative in `As a / I can / so that` form. -5. 
Scenario step text must stay in business/domain language only — never mention selectors, HTTP calls, DOM details, or database operations. -6. For each generated scenario, resolve step definitions: - a. **Narrow the search scope to the page first** — identify which PageObject the scenario's steps will interact with. Look in step definition files that already import or reference that PageObject; these are the most likely candidates for reuse. - b. **Match by purpose, not just pattern** — read the step's implementation body to confirm it performs the same business action (e.g. a `fill` on `username-input` vs a `fill` on `search-input` look identical in text but serve different purposes). Only reuse if purpose matches. - c. If a purpose-matching step exists, reuse it as-is; note which library file it lives in. - d. If no reusable step exists but the needed PageObject method already exists, generate a full step stub via `gherkin-step` that delegates directly to that PageObject method. - e. If neither the step nor the PageObject method exists, generate a stub that raises `NotImplementedError` (or the language-equivalent pending marker) and explicitly flag that the PageObject must be extended with the missing interaction. -7. Update `manifest.json` to record any new PageObject paths created. - -**Gap detection logic:** An AC is considered uncovered if no scenario in any `.feature` file carries the `@AC:<id>` traceability tag. - ---- +Identify intent from the user's request. Load **one** skill per session — do not pre-load skills for other modes. 
-## Maintenance +| User intent | Load skill | Manifest loading scope | +|---|---|---| +| Scan / crawl / explore the app | `bdd-explore` | Load only routes being crawled this session | +| Add / fix missing data-cy attributes | `data-cy-instrument` | Load only the routes with coverage gaps | +| Generate scenarios from ACs | `bdd-scenario-gen` | Load only the target US's route entry | +| Fix failing tests / selector drift | `bdd-maintain` (HEALING) | Load only the failing routes | +| Full re-scan after UI change | `bdd-maintain` (RE-SCAN) | Load full manifest | +| Remove a deprecated feature | `bdd-maintain` (REMOVE) | Load only the deprecated route entry | -### RE-SCAN mode +**Manifest loading rule:** Read `manifest.json` with targeted line ranges for the route(s) in scope. Load the full file only for RE-SCAN. This keeps context lean as the manifest grows. -**Trigger:** New feature shipped, UI refactored, or significant route changes. +**seed.yaml:** Always load in full — it is small and stable. -**Scope:** Full re-run of every path recorded in `manifest.json`, plus active discovery of new routes not yet in the manifest. +**living-doc-glossary:** Do NOT load the full glossary. Essential definitions are inlined below in [Living Doc Conventions](#living-doc-conventions). -1. Reload `seed.yaml` and `manifest.json`. -2. For every existing manifest entry: navigate to its URL, snapshot the DOM, and validate that every recorded `component_id` locator still resolves. Flag any locator that no longer matches as `BREAKING CHANGE`, including the linked step definition / scenario details that may fail. -3. **Actively discover new routes from each visited page** — do not limit discovery to routes already in `seed.yaml`. On each page snapshot: - - Find all `<a href>` links that resolve to new paths not yet in the manifest. - - Find all buttons and interactive components whose purpose suggests navigation to a new screen (e.g. 
"Create order", "View details", "Go to settings") — click them and record the resulting URL. - - Find tab panels, side-nav items, and wizard steps that expose sub-routes. - - Any new URL discovered this way is a candidate manifest entry; add it and crawl it recursively. -4. Add new surfaces to `manifest.json`; mark removed surfaces as `deprecated`. -5. Update stale selector constants in PageObjects for any locators flagged in step 2. -6. Generate new scenarios for newly discovered ACs (Scenario Generation logic). +--- -### HEALING mode +## Shared Skill Note — `living-doc-gap-finder` -**Trigger:** Test suite failures due to selector drift, broken step definitions, or PageObject mismatches. +`living-doc-gap-finder` is a shared skill used differently by each agent: -**Scope:** Failing tests only — do not touch passing tests or unrelated PageObjects. +- **`@living-doc-copilot`** uses it **top-down**: discovering missing documentation entities (Features, US, Functionalities not yet in the catalog). +- **`@living-doc-bdd-copilot`** uses it **bottom-up**: detecting scenario coverage gaps — ACs that exist in the catalog but have no linked Gherkin scenario. -1. Receive or discover the list of failing test names / scenario titles. If the request only says tests are failing but does not include the failing list, ask for it before making changes so scope stays limited to the failing scenarios. -2. Trace each failure back to its PageObject and step definition. -3. Navigate to the affected page via MCP Playwright; snapshot the current DOM. -4. Find updated element IDs or selectors; update only the affected PageObject(s) accordingly. -5. Verify the step definition binding still resolves; fix if broken. -6. Re-run only the previously failing tests to confirm healing. Do not re-run the full suite. +Load the skill with this distinction in mind. The bottom-up usage is the default context for this agent. 
-### REMOVE mode +--- -**Trigger:** Feature deprecated or deleted from the product. +## Workflow Detail -**Scope:** Only files linked to the removed entity — do not touch other Features, PageObjects, or step definitions. +Full protocols for each mode live in the corresponding skill — loaded on demand by Mode Dispatch above. -1. Identify the specific Feature/US/AC being removed. -2. Find all `.feature` files whose scenarios carry an `@AC:` tag matching the removed entity's IDs. -3. Find PageObjects referenced only by those scenarios; find step definitions used only by those scenarios. -4. Confirm the full deletion list with the user before touching any file. -5. Remove confirmed files; update `manifest.json` to remove the deprecated entry. -6. Flag linked US/AC entities in the living documentation as candidates for deprecation — hand off to `@living-doc-copilot`. +| Skill | What it contains | +|---|---| +| `bdd-explore` | Business Seed Assembly (Sources A–E), crawl loop, entity harvesting, ExplorationFixture cascade, component interaction rules, parameterised route resolution, Source E guided traversal, manifest.json schema | +| `data-cy-instrument` | Gap audit from manifest.json, route→component resolution, naming validation, template instrumentation, PageObject sync, Functionality promotion, WORK_LOG update | +| `bdd-scenario-gen` | Gap detection logic, feature file naming, `@AC:` traceability tagging, step definition resolution rules | +| `bdd-maintain` | RE-SCAN mode, HEALING mode, REMOVE mode | --- @@ -265,103 +151,48 @@ Load the skill with this distinction in mind. The bottom-up usage is the default --- -## Living Doc Compatibility +## Living Doc Conventions -This agent adheres to the canonical living doc entity model. Full definitions are in [living-doc-glossary](../../skills/references/living-doc-glossary.md) ([remote](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md)). 
+Full model: [living-doc-glossary](../../skills/references/living-doc-glossary.md) — load only if creating or validating entities. -### Entity IDs - -| Entity | Format | Example | -|---|---|---| -| User Story | `US-<nnn>` | `US-001` | -| Feature | `FEAT-<nnn>` | `FEAT-001` | -| Functionality | `FUNC-<nnn>` | `FUNC-001` | - -### AC format - -Every Acceptance Criterion reference must follow: +**Entity IDs:** `US-<nnn>` · `FEAT-<nnn>` · `FUNC-<nnn>` +**AC reference format:** ``` AC:<parent-id>-<nn> (v<version> – <State>) – <atomic description; at most one {placeholder}> ``` - State values: `Planned | Implemented | Active | Deprecated` -### Gherkin traceability tag - -Every `Scenario:` or `Scenario Outline:` in a **living-doc feature file** (`features/us/` and -`features/functionalities/`) must carry two complementary annotations: - -1. A `# AC:` comment — human-readable context (ID, version, state, description, optional aspect). -2. An `@AC:` Cucumber tag — machine-readable link: `@AC:<id>[/param:value...]`. - +**Gherkin traceability** — every scenario in `features/us/` and `features/functionalities/` requires: ```gherkin -# AC:US-1-01 (v1.0.0 - Active) — customer places an order with a saved payment method +# AC:US-1-01 (v1.0.0 - Active) — <description> @AC:US-1-01 -Scenario: Customer successfully places an order -``` - -When the scenario covers only **one aspect** of a multi-aspect AC, encode it as a `/param:value` -segment on the tag and mirror it in the comment: - -```gherkin -# AC:US-1-01 (v1.0.0 - Active) — displays {required field} on login screen | aspect: username input -@AC:US-1-01/aspect:username-input -Scenario: Login form shows the username input field +Scenario: ... ``` +One `# AC:` + `@AC:` pair per AC. Aspect variant: `@AC:US-1-01/aspect:username-input`. The `@AC:` tag is the single source of machine traceability — never delete or rename without updating the entity. 
-Multiple ACs — one comment + tag pair per AC: - -```gherkin -# AC:US-1-01 (v1.0.0 - Active) — invalid credentials show an error message -# AC:US-1-02 (v1.0.0 - Active) — account lockout after 3 failed attempts -@AC:US-1-01 -@AC:US-1-02 -@Regression -Scenario: User is locked out after repeated failed logins -``` - -The `/param:value` format is extensible — additional params can be added as needed. -The `@AC:` tag is the single source of machine traceability. Never delete or rename an `@AC:` tag -without updating the corresponding entity. - -Feature files outside `features/us/` and `features/functionalities/` (smoke tests, regression -suites, exploratory probes) do not require these annotations. - -### Feature surface types - -The glossary defines two surface types that determine the test abstraction: - -| Surface | Test abstraction | Selector preference | -|---|---|---| -| `UI` — web page, modal, screen | **PageObject** class — one class per screen | `data-testid` > `aria-label`/role > CSS class (last resort) | -| `API` — REST/GraphQL endpoint | Annotated endpoint method — OpenAPI/JSDoc header as living contract anchor | N/A | +**Surface types:** `UI` → PageObject class (prefer `data-testid`). `API` → contract test layer only. -This agent generates PageObjects only for `UI` Features. API Feature coverage belongs in the contract test layer. +**AC rules:** atomic (one condition + one outcome) · binary (clear pass/fail) · single placeholder per statement. -### AC rules - -- **Atomic** — one input condition, one observable outcome per AC -- **Binary** — clear pass/fail; no "usually" or "typically" -- **Single placeholder** — at most ONE `{placeholder}` per AC statement; if two aspects vary independently, write a separate AC for each - -### Entity status - -`planned | active | deprecated` — only ACs with `active` or `implemented` state should drive scenario generation. Deprecated ACs require `deprecated_at`, `deprecation_reason`, and optionally `superseded_by`. 
+**Active/Implemented ACs** drive scenario generation. Deprecated ACs require `deprecated_at`, `deprecation_reason`, and optionally `superseded_by`. --- ## Skills -| Skill | Intent | Path | -|---|---|---| -| `living-doc-pageobject-scan` | Discover, create, and maintain PageObject classes from a live webapp | `skills/living-doc-pageobject-scan/SKILL.md` | -| `living-doc-scenario-creator` | Generate Gherkin scenario skeletons from User Story ACs | `skills/living-doc-scenario-creator/SKILL.md` | -| `living-doc-gap-finder` | Find ACs with no linked Gherkin scenario (bottom-up usage) | `skills/living-doc-gap-finder/SKILL.md` | -| `gherkin-scenario` | Write BDD Gherkin scenarios in plain business language | `skills/gherkin-scenario/SKILL.md` | -| `gherkin-step` | Implement Gherkin step definitions — clean, reusable, maintainable | `skills/gherkin-step/SKILL.md` | -| `gherkin-living-doc-sync` | Synchronise feature files and scenarios with the living documentation | `skills/gherkin-living-doc-sync/SKILL.md` | +| Skill | Intent | Path | When to load | +|---|---|---|---| +| `bdd-explore` | Business Seed assembly, crawl loop, component rules, manifest schema | `skills/bdd-explore/SKILL.md` | EXPLORE mode | +| `bdd-scenario-gen` | Generate Gherkin from ACs, step resolution, traceability tagging | `skills/bdd-scenario-gen/SKILL.md` | SCENARIO-GEN mode | +| `bdd-maintain` | RE-SCAN, HEALING, REMOVE protocols | `skills/bdd-maintain/SKILL.md` | RE-SCAN / HEAL / REMOVE mode | +| `living-doc-pageobject-scan` | Discover, create, and maintain PageObject classes from a live webapp | `skills/living-doc-pageobject-scan/SKILL.md` | When generating or healing PageObjects | +| `living-doc-scenario-creator` | Generate Gherkin scenario skeletons from User Story ACs | `skills/living-doc-scenario-creator/SKILL.md` | Called from bdd-scenario-gen | +| `living-doc-gap-finder` | Find ACs with no linked Gherkin scenario (bottom-up usage) | `skills/living-doc-gap-finder/SKILL.md` | Called from 
bdd-scenario-gen | +| `gherkin-scenario` | Write BDD Gherkin scenarios in plain business language | `skills/gherkin-scenario/SKILL.md` | Called from bdd-scenario-gen | +| `gherkin-step` | Implement Gherkin step definitions — clean, reusable, maintainable | `skills/gherkin-step/SKILL.md` | Called from bdd-scenario-gen | +| `gherkin-living-doc-sync` | Synchronise feature files and scenarios with the living documentation | `skills/gherkin-living-doc-sync/SKILL.md` | When syncing traceability tags | --- @@ -378,3 +209,31 @@ When the manifest is complete and new surfaces have been identified, hand the Fe **Outbound — after scenario generation:** > "Feature files and steps generated. Call @sdet-copilot for unit tests." + +## File editing protocol (CLI context) + +When this agent runs via the GitHub Copilot CLI task tool, only `view` (read) and `create` (new files) are available — `str_replace`/`edit` tools are not provisioned regardless of the `tools:` frontmatter. This is a CLI constraint, not a configuration problem. + +**When a task requires modifying an existing file** (e.g. updating a PageObject locator, healing a step definition, patching a feature file): + +1. Read the file with `view`. +2. Produce a structured edit specification — do NOT generate shell commands or workarounds. Use this exact format for each file change: + +``` +FILE: <relative/path/to/file> +FIND (exact, unique string): +<<< +<old content> +>>> +REPLACE WITH: +<<< +<new content> +>>> +``` + +3. After all edit specs, add: + > ⚙️ **Caller action required:** Apply the edit specs above using the `edit` tool, then confirm completion. + +The calling agent (GitHub Copilot CLI main session) will apply the edits using its own `edit` tool and report back. + +**When a task requires creating a new file** (new PageObject, new feature file, new step definition): use `create` directly — this works without restriction. 
diff --git a/.github/agents/living-doc-copilot.agent.md b/.github/agents/living-doc-copilot.agent.md index 7793e52..8bd1e0a 100644 --- a/.github/agents/living-doc-copilot.agent.md +++ b/.github/agents/living-doc-copilot.agent.md @@ -9,13 +9,7 @@ description: > "HEALING mode", "deprecate entity", "living doc copilot", "add AC to user story", "trace affected features", "update feature registry", "mark US ready", "check AC completeness". -tools: - - read_file - - replace_string_in_file - - create_file - - grep_search - - file_search - - semantic_search +tools: [vscode/extensions, vscode/installExtension, vscode/memory, vscode/newWorkspace, vscode/resolveMemoryFileUri, vscode/runCommand, vscode/vscodeAPI, vscode/askQuestions, vscode/toolSearch, execute/getTerminalOutput, execute/killTerminal, execute/sendToTerminal, execute/runTask, execute/createAndRunTask, execute/runNotebookCell, execute/runTests, execute/testFailure, execute/runInTerminal, read/terminalSelection, read/terminalLastCommand, read/getTaskOutput, read/getNotebookSummary, read/problems, read/readFile, read/viewImage, read/readNotebookCellOutput, agent/runSubagent, browser/openBrowserPage, browser/readPage, browser/screenshotPage, browser/navigatePage, browser/clickElement, browser/dragElement, browser/hoverElement, browser/typeInPage, browser/runPlaywrightCode, browser/handleDialog, edit/createDirectory, edit/createFile, edit/createJupyterNotebook, edit/editFiles, edit/editNotebook, edit/rename, search/codebase, search/fileSearch, search/listDirectory, search/textSearch, search/usages, web/fetch, web/githubRepo, web/githubTextSearch, todo] --- # @living-doc-copilot @@ -110,6 +104,34 @@ Do not cross this boundary. - Updating an `ACTIVE` AC: show OLD vs NEW side by side before writing, keep the AC ID unchanged, and bump the semantic version for business-rule changes (for example `v1.0.0` to `v1.1.0` for a threshold change). 
Flag any linked `@AC:` tag annotations in feature files as potentially stale for `@living-doc-bdd-copilot`. - For Functionality requests, use a verb-phrase name, draft ACs and present them for confirmation before creating, and run a completeness checklist for thresholds, below/exactly/above-boundary behaviour, invalid or missing input, and interactions with other rules. +## File editing protocol (CLI context) + +When this agent runs via the GitHub Copilot CLI task tool, only `view` (read) and `create` (new files) are available — `str_replace`/`edit` tools are not provisioned regardless of the `tools:` frontmatter. This is a CLI constraint, not a configuration problem. + +**When a task requires modifying an existing file:** + +1. Read the file with `view`. +2. Produce a structured edit specification — do NOT generate shell commands or workarounds. Use this exact format for each file change: + +``` +FILE: <relative/path/to/file> +FIND (exact, unique string): +<<< +<old content> +>>> +REPLACE WITH: +<<< +<new content> +>>> +``` + +3. After all edit specs, add: + > ⚙️ **Caller action required:** Apply the edit specs above using the `edit` tool, then confirm completion. + +The calling agent (GitHub Copilot CLI main session) will apply the edits using its own `edit` tool and report back. + +**When a task requires creating a new file:** use `create` directly — this works without restriction. + ## Handoff **Inbound:** `@living-doc-bdd-copilot` hands a surface list after Phase 1 exploration. Load it, then create the corresponding Feature and User Story entities. diff --git a/skills/bdd-explore/SKILL.md b/skills/bdd-explore/SKILL.md new file mode 100644 index 0000000..db4c4fa --- /dev/null +++ b/skills/bdd-explore/SKILL.md @@ -0,0 +1,167 @@ +--- +name: bdd-explore +description: > + Business Seed assembly, iterative UI crawl, PageObject generation, and guided traversal + for the @living-doc-bdd-copilot agent. Activate for any webapp exploration or first-time + scan session. 
Covers seed.yaml assembly (Sources A–E), MCP Playwright crawl loop, entity + harvesting, ExplorationFixture sourcing cascade, custom component interaction rules, + parameterised route resolution, Source E guided traversal, and manifest.json output. + Triggers on: "scan webapp", "crawl UI", "explore the app", "discover routes", + "business seed", "seed.yaml", "manifest.json", "build pageobjects", "first scan", + "assemble seed", "guided traversal", "explore routes", "bdd explore". +--- + +# BDD Explore — Business Seed Assembly & Iterative Crawl + +--- + +## Business Seed Assembly + +Before crawling, assemble the Business Seed file at `.copilot/bdd/seed.yaml`. + +Sources A–E — collect from whichever are available: + +| Source | Behaviour | +|---|---| +| **A — Living documentation** | Extract Feature names, US titles, and AC texts. Map each Feature to its primary URL/route if known. | +| **B — Sitemap or route config** | Parse route definitions (Angular router, React Router, `sitemap.xml`) to enumerate URL paths. | +| **C — OpenAPI / Swagger spec** | Extract endpoint paths; map REST resources to UI screens where obvious. | +| **D — Existing PageObjects** | Load current `.copilot/bdd/manifest.json` if present — treat known surfaces as already discovered. | +| **E — Guided traversal** | See Source E protocol below. | + +**Credential safety rule:** Never store literal credentials in `seed.yaml`. Always use `env:VAR_NAME` as the value, e.g.: + +```yaml +credentials: + username: env:BDD_USERNAME + password: env:BDD_PASSWORD +``` + +**Artifact location:** BDD artifacts can live anywhere in the repository. On session start, discover them: + +1. Search for `seed.yaml` containing a `base_url:` key. +2. Search for `manifest.json` containing an array with `pageobject_path` entries. +3. If found, load both files and record their paths for this session. +4. If NOT found, create them at a sensible location (e.g. 
alongside the existing living documentation directory if one exists, otherwise `.copilot/bdd/`). +5. **On first discovery:** propose adding their locations to `.github/copilot-instructions.md` so every future agent session can load them without searching: + +```markdown +## BDD Artifacts +- **Business Seed:** `<relative-path>/seed.yaml` — webapp routes, credentials (env refs), guided traversal steps +- **Exploration Manifest:** `<relative-path>/manifest.json` — discovered UI surfaces, component IDs, PageObject paths +``` + +Committing both files means every subsequent session resumes from the last known state — no re-crawl required. + +**Output artifact:** `seed.yaml` (path discovered or chosen above) + +```yaml +base_url: https://... +credentials: + username: env:BDD_USERNAME + password: env:BDD_PASSWORD +known_routes: + - path: /login + feature: Authentication + - path: /dashboard + feature: Dashboard +guided_steps: [] # populated during Source E traversal +form_fixtures: {} # keyed by route path; populated during form traversal (ExplorationFixture schema) +``` + +--- + +## Iterative Exploration + +**On session start:** Load `seed.yaml`. If `.copilot/bdd/manifest.json` is present, load it — treat all listed surfaces as already discovered and resume from there. If manifest is absent, treat this as the first run (clean slate). + +**Partial state rule:** `seed.yaml` present but `manifest.json` absent = first exploration run. Begin crawl from `base_url`; do not assume any surfaces have been discovered. + +**Crawl loop:** + +1. Navigate to each known route from `seed.yaml` using MCP Playwright. +2. Snapshot the page; identify interactive elements, forms, navigation links, and significant UI surfaces. +3. Follow links and expand navigation to discover new routes not in the manifest. +4. For each new surface discovered: add an entry to `manifest.json` (Feature name, URL, component IDs, PageObject path). +5. 
Repeat until coverage plateau — no new surfaces found in the last full iteration. +5a. **Entity harvesting** — whenever a domain ID, version, feed ID, or other parameterised entity is read from the DOM (URLs, card text, table rows), record it under `known_entities` in `seed.yaml` if not already present. Fields: `id`, `version`, `name`, `status`, `owner`, `note`. These values feed the sourcing cascade for parameterised routes in subsequent sessions. +6. For each form, wizard, or dialog on a visited page, attempt to fill and progress using the **ExplorationFixture sourcing cascade** (see glossary): (1) pre-declared values in `seed.yaml form_fixtures` — use the `default`-labelled value for the happy path and explore alternate `values[]` branches to reach different form sections or sub-routes; (2) values read from an existing entity in the app — copy verbatim (`copyable`) or append a suffix to avoid duplicate rejection (`derived`); (3) inferred `fake` values from label + placeholder + tooltip text; (4) user-assist pause for `real-world` fields with no resolvable value. Skip `condition`-gated fields until the controlling field holds the required value. After a successful submission, probe each text input for: special characters (`<>'"&\`), oversized input (200+ chars), wrong type, and duplicate values — run the core scan after each probe to capture `data-cy` validation elements visible only in error state. Record findings as `field_constraints` in the manifest `navigation_context`. Report any still-unreachable flows (auth walls, CAPTCHA, deep data dependencies) and offer to enrich `seed.yaml`. **Dismiss rule — after scanning any modal dialog or overlay, always close it (Cancel button → × close button → Escape key, in that order) before navigating to the next route or triggering the next action. 
Never leave a dialog open while scanning a subsequent page.** + +**Component interaction rules — use these instead of `fill()` for custom components:** + +| Component | Correct interaction | +|---|---| +| `cps-radio-group` | `browser_click` the inner `<label>` or `<span>` whose text matches the desired option. Do NOT use `fill()`. | +| `cps-select` | `browser_click` the component to open the dropdown portal, then `browser_click` the matching `<li>` option by text. | +| `cps-autocomplete` | Type into the inner `<input>` using `browser_type`, wait for the dropdown to appear, then `browser_click` the matching option. | +| `cps-switch` / `cps-checkbox` | `browser_click` the component wrapper. | +| `app-text-editor` (rich text) | `browser_click` the `contenteditable` child, then `browser_type` the value. | +| `cps-button` | `browser_click` the inner `<button>` (e.g. via `evaluate`: `el.querySelector('button').click()`). | +| `input[type=file]` | Use `mcp_browser_file_upload` or `page.setInputFiles()` with a fixture file path from `seed.yaml form_fixtures`. | + +After interacting with a required field (especially `cps-radio-group`), re-check whether a gated button (e.g. Continue, Save) has become enabled before proceeding. + +**Parameterised route resolution — use `known_entities` before prompting the user:** + +Before navigating to any parameterised route (e.g. `/auth/all-domains/{domainId}/{version}/...`), first check `seed.yaml known_entities` for a matching entity with `owner` equal to the current test user. Substitute the `id` and `version` values directly. Only fall back to the user-assist pause if no matching entity exists. + +For domain detail tab scans (Schema, Run history, Access, Version management): always navigate using the first `known_entities` domain owned by the current test user, then click each tab in turn and run the core scan. These tabs are reachable by tab click alone — no additional data state is required to open them. 
+ +**PageObject generation rule:** For every new or changed UI surface, load `living-doc-pageobject-scan` — `Create` mode for first-time generation and `Maintain` mode for selector drift. Generated PageObjects must use a file-level `living-doc: FEAT-<nnn> | /route` header comment, prefer `data-testid` selectors, keep selector constants in `ALL_CAPS`, accept `page` in `__init__` / `constructor`, and expose method stubs for each interactive element. Flag any positional CSS selector as `FRAGILE`. If no matching Feature exists in the living documentation, hand the surface to `@living-doc-copilot`; do not create entities here. + +**Output artifact:** `.copilot/bdd/manifest.json` + +The manifest records per-route exploration state. Schema matches the `living-doc-pageobject-scan` skill definition: + +```json +{ + "version": "1.0", + "routes": { + "/login": { + "pageobject_path": "aul-ui/playwright/pages/LoginPage.ts", + "feature_id": "FEAT-001", + "last_scanned": "2026-05-26T10:30:00Z", + "elements": [ + { "data_cy": "username-input", "tag": "input" }, + { "data_cy": "password-input", "tag": "input" }, + { "data_cy": "login-btn", "tag": "cps-button" } + ], + "coverage_gaps": [], + "navigation_context": { + "prerequisites": null, + "navigation_steps": "Navigate directly to /login.", + "data_requirements": null, + "auth_role": "unauthenticated", + "notes": null, + "field_constraints": [] + } + } + } +} +``` + +--- + +## Source E — Guided Traversal Protocol + +Use when automated crawling cannot proceed — unknown decision points, multi-step wizards, auth flows, role-gated screens, or forms blocked by missing business knowledge (required field values, valid lookup codes, business-specific input formats). + +**Protocol:** + +1. Take a screenshot; show the user what the agent sees. +2. Ask: *"I've reached a decision point at [URL]. What should I do next? (e.g. click X, fill field Y with Z, log in as role R, provide the valid value for field F)"* +3. Wait for the user's answer. 
Execute the described action via MCP Playwright. +4. Immediately append to `guided_steps:` in `seed.yaml`: + +```yaml +guided_steps: + - url: /checkout/payment + action: fill + field: card-number + value: env:TEST_CARD_NUMBER + note: "Test Visa card for payment flow" +``` + +5. Continue crawl from the new state. + +**CAPTCHA rule:** If a CAPTCHA is encountered, pause and ask the user to solve it manually in the browser. Do not attempt automated bypass. Once the user confirms it is solved, continue and record the step with `action: captcha_solved`. diff --git a/skills/bdd-maintain/SKILL.md b/skills/bdd-maintain/SKILL.md new file mode 100644 index 0000000..9d63590 --- /dev/null +++ b/skills/bdd-maintain/SKILL.md @@ -0,0 +1,124 @@ +--- +name: bdd-maintain +description: > + Maintenance modes for the @living-doc-bdd-copilot agent: RE-SCAN (full manifest refresh + after UI changes), HEALING (fix selector drift in failing tests only), and REMOVE + (delete files linked to a deprecated feature). Activate when the UI has changed and the + manifest needs refreshing, when tests are failing due to selector drift, or when a feature + has been removed from the product. + Triggers on: "re-scan", "refresh manifest", "heal pageobjects", "fix failing tests", + "selector drift", "tests are failing", "remove feature", "deprecate bdd", "bdd maintain", + "update selectors", "pageobject broken", "scenario failing". +--- + +# BDD Maintenance + +Four modes — activate the one that matches the trigger. + +--- + +## RE-SCAN mode + +**Trigger:** New feature shipped, UI refactored, or significant route changes. + +**Scope:** Full re-run of every path recorded in `manifest.json`, plus active discovery of new routes not yet in the manifest. + +1. Reload `seed.yaml` and `manifest.json`. +2. For every existing manifest entry: navigate to its URL, snapshot the DOM, and validate that every recorded `component_id` locator still resolves.
Flag any locator that no longer matches as `BREAKING CHANGE`, including the linked step definition / scenario details that may fail. +3. **Actively discover new routes from each visited page** — do not limit discovery to routes already in `seed.yaml`. On each page snapshot: + - Find all `<a href>` links that resolve to new paths not yet in the manifest. + - Find all buttons and interactive components whose purpose suggests navigation to a new screen (e.g. "Create order", "View details", "Go to settings") — click them and record the resulting URL. + - Find tab panels, side-nav items, and wizard steps that expose sub-routes. + - Any new URL discovered this way is a candidate manifest entry; add it and crawl it recursively. +4. Add new surfaces to `manifest.json`; mark removed surfaces as `deprecated`. +5. Update stale selector constants in PageObjects for any locators flagged in step 2. +6. Generate new scenarios for newly discovered ACs (load `bdd-scenario-gen` skill). + +--- + +## HEALING mode + +**Trigger:** Test suite failures due to selector drift, broken step definitions, or PageObject mismatches. + +**Scope:** Failing tests only — do not touch passing tests or unrelated PageObjects. + +1. Receive or discover the list of failing test names / scenario titles. If the request only says tests are failing but does not include the failing list, ask for it before making changes so scope stays limited to the failing scenarios. +2. Trace each failure back to its PageObject and step definition. +3. Navigate to the affected page via MCP Playwright; snapshot the current DOM. +4. Find updated element IDs or selectors; update only the affected PageObject(s) accordingly. +5. Verify the step definition binding still resolves; fix if broken. +6. Re-run only the previously failing tests to confirm healing. Do not re-run the full suite. + +--- + +## REMOVE mode + +**Trigger:** Feature deprecated or deleted from the product. 
+ +**Scope:** Only files linked to the removed entity — do not touch other Features, PageObjects, or step definitions. + +1. Identify the specific Feature/US/AC being removed. +2. Find all `.feature` files whose scenarios carry an `@AC:` tag matching the removed entity's IDs. +3. Find PageObjects referenced only by those scenarios; find step definitions used only by those scenarios. +4. Confirm the full deletion list with the user before touching any file. +5. Remove confirmed files; update `manifest.json` to remove the deprecated entry. +6. Flag linked US/AC entities in the living documentation as candidates for deprecation — hand off to `@living-doc-copilot`. + +--- + +## DEAD CODE AUDIT mode + +**Trigger:** Step definitions added but scenarios removed, PageObject redesigned, new PO classes created but not yet wired into steps. + +**Scope:** Full audit of `playwright/steps/`, `playwright/pages/`, and `playwright/features/` for dead code. + +Three standalone Python scripts live in `scripts`: + +### 1 · `find_unused_steps.py` — step definitions with no feature coverage + +Parses all `*.steps.ts` files for `Given(…)`, `When(…)`, `Then(…)` pattern strings, then scans every `.feature` file for matching step usages (Cucumber expression placeholders resolved to regex wildcards). Reports any step definition that is never exercised. + +```bash +# Run from aul-ui/ +python playwright/scripts/find_unused_steps.py \ + --steps-dir playwright/steps \ + --features-dir playwright/features +``` + +### 2 · `find_unused_po_methods.py` — PageObject methods never called from step files + +Parses every `playwright/pages/*.ts` for public method declarations (`async name(` / `name(`), then scans all step files for `.name(` call sites. Reports methods that are defined but never invoked from any step. 
+ +```bash +python playwright/scripts/find_unused_po_methods.py \ + --pages-dir playwright/pages \ + --steps-dir playwright/steps +``` + +### 3 · `find_unused_po_components.py` — PageObject classes not imported anywhere + +Scans all exported `class` names from `playwright/pages/*.ts`, then checks every `*.steps.ts` and `fixtures.ts` for import statements. Reports classes that are defined but never imported. + +```bash +python playwright/scripts/find_unused_po_components.py \ + --pages-dir playwright/pages \ + --steps-dir playwright/steps +``` + +### When to run + +| Trigger | Script(s) to run | +|---------|-----------------| +| Step definition added or removed | `find_unused_steps.py` | +| PageObject method added, renamed, or deleted | `find_unused_po_methods.py` | +| New PageObject class created | `find_unused_po_components.py` | +| Before any REMOVE operation | All three | +| CI / pre-merge gate | All three (each exits 1 on findings) | + +### Handling findings + +- **Unused step def**: either add a scenario that exercises it, or delete the step definition. +- **Unused PO method**: either write a step that calls it, or remove the method from the PageObject. +- **Unused PO class**: either add an import and fixture entry, or remove the `.ts` file — after confirming nothing references it outside the test suite. + +All three scripts exit `0` on clean, `1` on findings, `2` on bad arguments — safe for CI gating. diff --git a/skills/bdd-maintain/scripts/find_unused_po_components.py b/skills/bdd-maintain/scripts/find_unused_po_components.py new file mode 100644 index 0000000..973a241 --- /dev/null +++ b/skills/bdd-maintain/scripts/find_unused_po_components.py @@ -0,0 +1,136 @@ +#!/usr/bin/env python3 +""" +find_unused_po_components.py — Dead-code detector: PageObject classes never imported in step files. 
+ +Usage: + python playwright/scripts/find_unused_po_components.py [--pages-dir DIR] [--steps-dir DIR] + +Exits with code 1 if any unused PageObject classes are found (useful in CI). +""" +from __future__ import annotations + +import argparse +import re +import sys +from pathlib import Path +from dataclasses import dataclass + + +# --------------------------------------------------------------------------- +# Patterns +# --------------------------------------------------------------------------- + +# Match class declarations in TypeScript: `export class FooPage {` +CLASS_DEF_RE = re.compile(r"export\s+class\s+([A-Z][a-zA-Z0-9_$]*)") + +# Match import statements: `import { Foo, Bar } from './...'` +IMPORT_BRACE_RE = re.compile(r"import\s*\{([^}]+)\}\s*from") +# Match default imports: `import Foo from './...'` +IMPORT_DEFAULT_RE = re.compile(r"import\s+([A-Z][a-zA-Z0-9_$]*)\s+from") + +# Match TypeScript type usage: e.g. param: FooPage, variable: FooPage, extends FooPage +TYPE_USE_RE = re.compile(r"\b([A-Z][a-zA-Z0-9_$]*)\b") + + +@dataclass +class PageObjectClass: + name: str + file: Path + + +def collect_po_classes(pages_dir: Path) -> list[PageObjectClass]: + """Extract exported class names from all PageObject .ts files.""" + classes: list[PageObjectClass] = [] + for ts_file in sorted(pages_dir.glob("*.ts")): + text = ts_file.read_text(encoding="utf-8") + for m in CLASS_DEF_RE.finditer(text): + classes.append(PageObjectClass(m.group(1), ts_file)) + return classes + + +def collect_imported_names(steps_dir: Path) -> set[str]: + """Collect all identifiers imported or used in step files and fixtures.""" + names: set[str] = set() + # Also scan fixtures.ts at the parent of steps_dir or sibling file + scan_dirs = [steps_dir] + parent = steps_dir.parent + fixtures_file = parent / "fixtures.ts" + extra_files: list[Path] = [] + if fixtures_file.exists(): + extra_files.append(fixtures_file) + + for ts_file in list(sorted(steps_dir.rglob("*.ts"))) + extra_files: + text = 
ts_file.read_text(encoding="utf-8") + # Named imports: import { Foo, Bar } from ... + for m in IMPORT_BRACE_RE.finditer(text): + for identifier in m.group(1).split(","): + stripped = identifier.strip() + # Handle `Foo as F` aliasing + actual = stripped.split(" as ")[0].strip() + names.add(actual) + # Default imports + for m in IMPORT_DEFAULT_RE.finditer(text): + names.add(m.group(1)) + + return names + + +def main() -> int: + parser = argparse.ArgumentParser( + description="Find PageObject classes never imported in step files or fixtures." + ) + parser.add_argument( + "--pages-dir", + default="playwright/pages", + help="Directory containing PageObject *.ts files (default: playwright/pages)", + ) + parser.add_argument( + "--steps-dir", + default="playwright/steps", + help="Directory containing *.steps.ts files (default: playwright/steps)", + ) + args = parser.parse_args() + + pages_dir = Path(args.pages_dir) + steps_dir = Path(args.steps_dir) + + if not pages_dir.is_dir(): + print(f"ERROR: pages-dir not found: {pages_dir}", file=sys.stderr) + return 2 + if not steps_dir.is_dir(): + print(f"ERROR: steps-dir not found: {steps_dir}", file=sys.stderr) + return 2 + + print(f"Scanning PageObject classes in: {pages_dir}") + print(f"Scanning imports in: {steps_dir}") + + # Also look for fixtures.ts + fixtures_path = steps_dir.parent / "fixtures.ts" + if fixtures_path.exists(): + print(f"Also scanning: {fixtures_path}") + print() + + po_classes = collect_po_classes(pages_dir) + imported_names = collect_imported_names(steps_dir) + + print(f"Found {len(po_classes)} PageObject class(es) in {pages_dir}") + print(f"Found {len(imported_names)} imported name(s) across step files") + print() + + unused = [c for c in po_classes if c.name not in imported_names] + + if not unused: + print("✅ All PageObject classes are imported/used.") + return 0 + + print(f"⚠️ {len(unused)} UNUSED PageObject class(es) found:\n") + for cls in sorted(unused, key=lambda c: c.name): + print(f" 
{cls.name:<40} {cls.file}") + print() + print("Action: either import and use these classes in step files, or remove the PageObject files.") + print() + return 1 + + +if __name__ == "__main__": + sys.exit(main()) diff --git a/skills/bdd-maintain/scripts/find_unused_po_methods.py b/skills/bdd-maintain/scripts/find_unused_po_methods.py new file mode 100644 index 0000000..2aa792f --- /dev/null +++ b/skills/bdd-maintain/scripts/find_unused_po_methods.py @@ -0,0 +1,174 @@ +#!/usr/bin/env python3 +""" +find_unused_po_methods.py — Dead-code detector: PageObject methods never called from step files. + +Usage: + python playwright/scripts/find_unused_po_methods.py [--pages-dir DIR] [--steps-dir DIR] + +Exits with code 1 if any unused methods are found (useful in CI). +""" +from __future__ import annotations + +import argparse +import re +import sys +from pathlib import Path +from dataclasses import dataclass + + +# --------------------------------------------------------------------------- +# Patterns +# --------------------------------------------------------------------------- + +# Public method definitions in TypeScript classes: +# async methodName( methodName( +# Exclude: constructor, private/protected (prefixed with modifier word) +# Match only on lines that look like method declarations (not calls) +METHOD_DEF_RE = re.compile( + r""" + ^[ \t]* # leading indent + (?!private|protected|readonly|static|get |set ) # not a modifier-only line + (?:async\s+)? 
# optional async + ([a-zA-Z_$][a-zA-Z0-9_$]*) # method name + \s*\( # opening paren + (?!.*:\s*Promise|\s*\{) # exclude constructor-like and block-only lines + """, + re.VERBOSE | re.MULTILINE, +) + +# Simpler fallback: any `async name(` or `name(` at line start with content +# We'll collect both and deduplicate +SIMPLE_METHOD_RE = re.compile( + r"""^[ \t]+(?:async\s+)([a-zA-Z_$][a-zA-Z0-9_$]*)\s*\(""", + re.MULTILINE, +) + +# TypeScript getter shorthand — these are locators, not callable methods +GETTER_RE = re.compile(r"^[ \t]+get\s+([a-zA-Z_$][a-zA-Z0-9_$]*)\s*\(", re.MULTILINE) + +# Method calls from step files: `.methodName(` or `fixture.methodName(` +# Capture any `.identifier(` occurrence +CALL_RE = re.compile(r"\.([a-zA-Z_$][a-zA-Z0-9_$]*)\s*\(") + +# Excluded names — too generic or framework-level +EXCLUDED_NAMES = frozenset({ + "constructor", "toString", "valueOf", "then", "catch", "finally", + "nth", "first", "last", "locator", "getByTestId", "getByRole", + "getByText", "fill", "click", "isVisible", "isDisabled", "isEnabled", + "waitFor", "expectToBeVisible", "expect", "goto", "reload", + "setTimeout", "setInputFiles", "hover", "press", "type", "check", + "uncheck", "selectOption", "evaluate", "dispatchEvent", "focus", + "blur", "screenshot", "textContent", "innerText", "inputValue", + "getAttribute", "scrollIntoViewIfNeeded", + # common Angular/Playwright helpers + "map", "filter", "forEach", "push", "join", "split", "trim", + "toLowerCase", "toUpperCase", "replace", "includes", "startsWith", + "endsWith", "slice", "substring", "indexOf", + # TypeScript keywords that look like declarations to the regex below (e.g. `if (`) + "if", "for", "while", "switch", "return", "super", +}) + + +@dataclass +class MethodDef: + name: str + file: Path + line: int + + +def collect_po_methods(pages_dir: Path) -> list[MethodDef]: + """Extract public method names from all PageObject .ts files.""" + methods: list[MethodDef] = [] + seen: set[tuple[Path, str]] = set() + + for ts_file in sorted(pages_dir.glob("*.ts")): + text = ts_file.read_text(encoding="utf-8") + lines = text.splitlines() + + # Look for `async
methodName(` or method-like declarations + for m in re.finditer(r"^[ \t]+(?:async\s+)?([a-zA-Z_$][a-zA-Z0-9_$]*)\s*\(", text, re.MULTILINE): + name = m.group(1) + if name in EXCLUDED_NAMES: + continue + if name == "constructor": + continue + line_num = text[: m.start()].count("\n") + 1 + # skip getter declarations + line_text = lines[line_num - 1] if line_num <= len(lines) else "" + if re.match(r"^\s*get\s+", line_text): + continue + + key = (ts_file, name) + if key not in seen: + seen.add(key) + methods.append(MethodDef(name, ts_file, line_num)) + + return methods + + +def collect_called_methods(steps_dir: Path) -> set[str]: + """Collect all method names called (via `.name(`) in step files and fixtures.""" + called: set[str] = set() + for ts_file in sorted(steps_dir.rglob("*.ts")): + text = ts_file.read_text(encoding="utf-8") + for m in CALL_RE.finditer(text): + called.add(m.group(1)) + return called + + +def main() -> int: + parser = argparse.ArgumentParser( + description="Find PageObject methods never called from step files." 
+ ) + parser.add_argument( + "--pages-dir", + default="playwright/pages", + help="Directory containing PageObject *.ts files (default: playwright/pages)", + ) + parser.add_argument( + "--steps-dir", + default="playwright/steps", + help="Directory containing *.steps.ts files (default: playwright/steps)", + ) + args = parser.parse_args() + + pages_dir = Path(args.pages_dir) + steps_dir = Path(args.steps_dir) + + if not pages_dir.is_dir(): + print(f"ERROR: pages-dir not found: {pages_dir}", file=sys.stderr) + return 2 + if not steps_dir.is_dir(): + print(f"ERROR: steps-dir not found: {steps_dir}", file=sys.stderr) + return 2 + + print(f"Scanning PageObject methods in: {pages_dir}") + print(f"Scanning method calls in: {steps_dir}") + print() + + po_methods = collect_po_methods(pages_dir) + called_methods = collect_called_methods(steps_dir) + + print(f"Found {len(po_methods)} public method(s) across {pages_dir}") + print(f"Found {len(called_methods)} distinct method call(s) across {steps_dir}") + print() + + unused = [m for m in po_methods if m.name not in called_methods] + + if not unused: + print("✅ All PageObject methods are used.") + return 0 + + print(f"⚠️ {len(unused)} UNUSED PageObject method(s) found:\n") + by_file: dict[Path, list[MethodDef]] = {} + for m in unused: + by_file.setdefault(m.file, []).append(m) + + for file, methods in sorted(by_file.items()): + print(f" {file}") + for meth in sorted(methods, key=lambda x: x.line): + print(f" line {meth.line:4d} {meth.name}()") + print() + return 1 + + +if __name__ == "__main__": + sys.exit(main()) diff --git a/skills/bdd-maintain/scripts/find_unused_steps.py b/skills/bdd-maintain/scripts/find_unused_steps.py new file mode 100644 index 0000000..b5df559 --- /dev/null +++ b/skills/bdd-maintain/scripts/find_unused_steps.py @@ -0,0 +1,151 @@ +#!/usr/bin/env python3 +""" +find_unused_steps.py — Dead-code detector: step definitions unused in any .feature file. 
+ +Usage: + python playwright/scripts/find_unused_steps.py [--steps-dir DIR] [--features-dir DIR] + +Exits with code 1 if any unused steps are found (useful in CI). +""" +from __future__ import annotations + +import argparse +import re +import sys +from pathlib import Path +from dataclasses import dataclass, field + + +# --------------------------------------------------------------------------- +# Patterns +# --------------------------------------------------------------------------- + +# Matches: Given('pattern', ...) / When(`pattern`, ...) / Then("pattern", ...) +# playwright-bdd uses Given/When/Then imported from cucumber or createBdd +STEP_DEF_RE = re.compile( + r"""(?:Given|When|Then|And|But)\s*\(\s*(['"`])(.*?)\1""", + re.DOTALL, +) + +# Playwright-bdd also supports: @Given('pattern') @When('pattern') @Then('pattern') +DECORATOR_RE = re.compile( + r"""@(?:Given|When|Then|And|But)\s*\(\s*(['"`])(.*?)\1""", + re.DOTALL, +) + +# Convert a Cucumber expression / simple regex pattern to a Python regex +# Cucumber expression placeholders: {string}, {int}, {word}, {float} +CUCUMBER_PLACEHOLDER_RE = re.compile(r"\{(?:string|int|word|float|[^}]+)\}") + + +@dataclass +class StepDefinition: + pattern: str # raw pattern string from source + file: Path + line: int + regex: re.Pattern # compiled regex for matching + + +def cucumber_to_regex(pattern: str) -> re.Pattern: + """Convert a Cucumber expression pattern to a Python compiled regex.""" + escaped = re.escape(pattern) + # Restore the placeholder wildcards after escaping + # {string} → matches "..." or '...' 
→ use non-greedy wildcard for simplicity + escaped = re.sub(r"\\{(?:string|int|word|float|[^}]+)\\}", r".+?", escaped) + return re.compile(rf"^\s*{escaped}\s*$", re.IGNORECASE) + + +def collect_step_definitions(steps_dir: Path) -> list[StepDefinition]: + defs: list[StepDefinition] = [] + for ts_file in sorted(steps_dir.rglob("*.steps.ts")): + text = ts_file.read_text(encoding="utf-8") + for pattern_re in (STEP_DEF_RE, DECORATOR_RE): + for m in pattern_re.finditer(text): + raw_pattern = m.group(2) + line = text[: m.start()].count("\n") + 1 + try: + compiled = cucumber_to_regex(raw_pattern) + defs.append(StepDefinition(raw_pattern, ts_file, line, compiled)) + except re.error: + print(f" WARN: could not compile pattern at {ts_file}:{line}: {raw_pattern!r}", + file=sys.stderr) + return defs + + +def collect_feature_steps(features_dir: Path) -> list[str]: + """Return every step line from every .feature file (stripped of keyword).""" + step_line_re = re.compile( + r"^\s*(?:Given|When|Then|And|But)\s+(.+)$", re.IGNORECASE + ) + steps: list[str] = [] + for feat_file in sorted(features_dir.rglob("*.feature")): + for line in feat_file.read_text(encoding="utf-8").splitlines(): + m = step_line_re.match(line) + if m: + steps.append(m.group(1).strip()) + return steps + + +def main() -> int: + parser = argparse.ArgumentParser(description="Find unused Playwright-BDD step definitions.") + parser.add_argument( + "--steps-dir", + default="playwright/steps", + help="Directory containing *.steps.ts files (default: playwright/steps)", + ) + parser.add_argument( + "--features-dir", + default="playwright/features", + help="Directory containing *.feature files (default: playwright/features)", + ) + args = parser.parse_args() + + steps_dir = Path(args.steps_dir) + features_dir = Path(args.features_dir) + + if not steps_dir.is_dir(): + 
print(f"ERROR: steps-dir not found: {steps_dir}", file=sys.stderr) + return 2 + if not features_dir.is_dir(): + print(f"ERROR: features-dir not found: {features_dir}", file=sys.stderr) + return 2 + + print(f"Scanning step definitions in: {steps_dir}") + print(f"Scanning feature steps in: {features_dir}") + print() + + step_defs = collect_step_definitions(steps_dir) + feature_steps = collect_feature_steps(features_dir) + + print(f"Found {len(step_defs)} step definition(s) across {steps_dir}") + print(f"Found {len(feature_steps)} step usage(s) across {features_dir}") + print() + + unused: list[StepDefinition] = [] + for sd in step_defs: + matched = any(sd.regex.match(fs) for fs in feature_steps) + if not matched: + unused.append(sd) + + if not unused: + print("✅ All step definitions are used.") + return 0 + + print(f"⚠️ {len(unused)} UNUSED step definition(s) found:\n") + by_file: dict[Path, list[StepDefinition]] = {} + for sd in unused: + by_file.setdefault(sd.file, []).append(sd) + + for file, sds in sorted(by_file.items()): + print(f" {file}") + for sd in sorted(sds, key=lambda s: s.line): + print(f" line {sd.line:4d} {sd.pattern!r}") + print() + return 1 + + +if __name__ == "__main__": + sys.exit(main()) diff --git a/skills/bdd-scenario-gen/SKILL.md b/skills/bdd-scenario-gen/SKILL.md new file mode 100644 index 0000000..90fd0a1 --- /dev/null +++ b/skills/bdd-scenario-gen/SKILL.md @@ -0,0 +1,87 @@ +--- +name: bdd-scenario-gen +description: > + Generate Gherkin scenario skeletons from User Story Acceptance Criteria and resolve + step definitions for the @living-doc-bdd-copilot agent. Activate after exploration + completes (manifest up to date) or when a specific US needs BDD coverage. + Covers gap detection logic, scenario skeleton generation, step reuse/stub rules, + feature file naming and header conventions, and @AC: traceability tagging. 
+ Triggers on: "generate scenarios", "cover AC with scenarios", "generate feature file", + "gherkin from user story", "scenario coverage", "map AC to scenarios", + "AC coverage for US", "scenarios for US-", "bdd scenario gen". +--- + +# BDD Scenario Generation + +Use after exploration completes (manifest is up to date), or targeting a specific User Story. + +--- + +## Gap Detection + +An AC is considered uncovered if no scenario in any `.feature` file carries the `@AC:<id>` traceability tag. + +1. Use the `living-doc-gap-finder` skill (bottom-up mode) to identify User Stories with `ACTIVE` ACs that have no linked Gherkin scenario. +2. For each gap: generate Gherkin scenario skeletons — one scenario per `Active` or `Implemented` AC, with the mandatory `@AC:` traceability tag. Skip `Planned` and `Deprecated` ACs. + +--- + +## Feature File Conventions + +- Write `.feature` files under `features/us/` using `us-<nnn>-<kebab-title>.feature` naming, e.g. `features/us/us-007-place-an-online-order.feature`. +- The `Feature:` header must restate the User Story narrative in `As a / I can / so that` form. +- Scenario step text must stay in business/domain language only — never mention selectors, HTTP calls, DOM details, or database operations. + +--- + +## Traceability Annotations + +Every `Scenario:` or `Scenario Outline:` in a living-doc feature file must carry two complementary annotations: + +1. A `# AC:` comment — human-readable context (ID, version, state, description, optional aspect). +2. An `@AC:` Cucumber tag — machine-readable link: `@AC:<id>[/param:value...]`. 
+ +```gherkin +# AC:US-1-01 (v1.0.0 - Active) — customer places an order with a saved payment method +@AC:US-1-01 +Scenario: Customer successfully places an order +``` + +When the scenario covers only **one aspect** of a multi-aspect AC, encode it as a `/param:value` segment: + +```gherkin +# AC:US-1-01 (v1.0.0 - Active) — displays {required field} on login screen | aspect: username input +@AC:US-1-01/aspect:username-input +Scenario: Login form shows the username input field +``` + +Multiple ACs — one comment + tag pair per AC: + +```gherkin +# AC:US-1-01 (v1.0.0 - Active) — invalid credentials show an error message +# AC:US-1-02 (v1.0.0 - Active) — account lockout after 3 failed attempts +@AC:US-1-01 +@AC:US-1-02 +@Regression +Scenario: User is locked out after repeated failed logins +``` + +Feature files outside `features/us/` and `features/functionalities/` (smoke tests, regression suites, exploratory probes) do not require these annotations. + +--- + +## Step Definition Resolution + +For each generated scenario: + +a. **Narrow the search scope to the page first** — identify which PageObject the scenario's steps will interact with. Look in step definition files that already import or reference that PageObject; these are the most likely candidates for reuse. + +b. **Match by purpose, not just pattern** — read the step's implementation body to confirm it performs the same business action (e.g. a `fill` on `username-input` vs a `fill` on `search-input` look identical in text but serve different purposes). Only reuse if purpose matches. + +c. If a purpose-matching step exists, reuse it as-is; note which library file it lives in. + +d. If no reusable step exists but the needed PageObject method already exists, generate a full step stub via `gherkin-step` that delegates directly to that PageObject method. + +e. 
If neither the step nor the PageObject method exists, generate a stub that raises `NotImplementedError` (or the language-equivalent pending marker) and explicitly flag that the PageObject must be extended with the missing interaction. + +After resolution, update `manifest.json` to record any new PageObject paths created. diff --git a/skills/data-cy-instrument/SKILL.md b/skills/data-cy-instrument/SKILL.md new file mode 100644 index 0000000..54d3e55 --- /dev/null +++ b/skills/data-cy-instrument/SKILL.md @@ -0,0 +1,240 @@ +--- +name: data-cy-instrument +description: > + Automatically resolve missing `data-cy` attributes in Angular component templates + and sync the corresponding Playwright PageObjects to use `getByTestId()`. Activate + whenever coverage gaps exist in `manifest.json`, when PageObject stubs carry + "⚠️ PROPOSED" locator comments, when Functionality entities have `status: planned` + due to missing test IDs, or when a dev explicitly asks to instrument templates. + Fires automatically at the end of a `bdd-explore` or `bdd-maintain` RE-SCAN session + when `coverage_gaps` arrays are non-empty. + Triggers on: "add missing data-cy", "instrument templates", "fix data-cy gaps", + "add testids", "data-cy audit", "instrument angular templates", "fix locators", + "add data-cy attributes", "add test ids to templates", "fix playwright selectors", + "data-cy-instrument". +--- + +# data-cy-instrument + +Resolves missing `data-cy` attributes end-to-end: from gap discovery in `manifest.json` +through Angular template edits, PageObject sync, Functionality promotion, and WORK_LOG +status update. All steps are in sequence — do not skip steps or re-order them. 
+ +--- + +## When this skill activates + +- `manifest.json` has one or more surfaces with a non-empty `coverage_gaps` array +- A PageObject file contains a locator comment marked `⚠️ PROPOSED` or `⚠️ NOT YET IN TEMPLATE` +- A Functionality `.feature` file has `status: planned` with a comment indicating the reason is missing `data-cy` +- WORK_LOG.md §4 has rows marked 🔴 or ⚠️ +- User asks to add/fix data-cy or instrument an Angular template + +--- + +## Phase 1 · Gap Audit + +Build a prioritised gap list before touching any file. + +1. Load `.copilot/bdd/manifest.json`. For each surface entry, extract `coverage_gaps` items. +2. Load `.copilot/bdd/WORK_LOG.md` §4 — identify rows with status 🔴 (pending) or ⚠️ (lib-limited). +3. Cross-reference with `issue-missing-data-cy.md` if present at `.copilot/bdd/`. +4. For each gap, record: + + ``` + route: /auth/domain-access-control + element_desc: Status filter toggle (Pending / Approved / Rejected) + suggested_data_cy: filter-access-status + component_hint: domain-access-approvals-new.component.html + priority: P1 | P2 | P3 + ``` + +5. Sort by priority P1 → P3. Process in that order. + +**Skip list — do not attempt to instrument these:** +- Elements inside third-party library internals where the host attribute is confirmed not to be propagated (e.g. `cps-table` inner paginator buttons, `cps-tab` inner `<li role="tab">` when the lib does not forward host attributes). Mark these ⚠️ "needs lib support" and surface them as a library issue to the dev team, not a template change. +- Elements that require authenticated roles to render — flag as needing an integration test fixture, not a data-cy change. + +--- + +## Phase 2 · Route → Component Resolution + +For each gap, resolve which Angular component owns the element. + +1. Open `aul-ui/src/app/pages/authenticated/authenticated-routing.module.ts` (or the relevant routing module). +2. Find the route path matching the gap's route. +3. 
Check feature-flag conditionals: + - `environment.useBoundedCtxApi` (runtime flag `BD_CTX_API`) — if present, there are two component variants: `-new` (flag on) and the legacy component (flag off). **Instrument both.** + - `SHOW_EXPERIMENTAL_FEATURES` — instrument only if the element is inside that guard. +4. If the component is a wrapper that delegates to a child (`<app-*>` sub-component), follow the child selector to its `.html` file. Repeat until the element is found. +5. Record the resolved template path(s) before making any edits. + +--- + +## Phase 3 · Name Validation + +Before writing any `data-cy` value, validate the candidate name. + +**Naming prefix rules:** + +| Prefix | Use for | Example | +|---|---|---| +| `btn-` | Any CTA button (`<cps-button>`, `<button>`) | `btn-request-access-rights` | +| `tab-` | Tab host element (`<cps-tab>`, `<li role="tab">`) | `tab-version-management` | +| `filter-` | Filter control (toggle group, dropdown used to filter) | `filter-access-status` | +| `toggle-` | Boolean toggle (checkbox, switch) | `toggle-primary-ownership` | +| `input-` | Text / number input field | `input-domain-name` | +| `row-` | Clickable / selectable table row | `row-run-history` | +| `metric-` | Read-only display card / KPI tile | `metric-coverage` | +| `pagination-` | Pagination control | `pagination-page` | +| `dialog-` | Modal / side-nav container | `dialog-import-domain` | +| `select-` | Dropdown / select used to choose a value | `select-country-code` | + +**Format rules:** +- Kebab-case only — no underscores, no camelCase. +- Must be unique in the workspace: run `grep_search` for the candidate value across `aul-ui/src/**/*.html` before writing. If already present, append a disambiguating suffix (e.g. `-header`, `-row`, `-footer`). +- Must be descriptive enough to understand the element's purpose without context. 
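
The prefix and format rules above can be approximated with a small checker. This is a minimal sketch, not part of the skill: the function name `validate_data_cy` and the `existing` set (standing in for the values found by the workspace search) are assumptions made for illustration.

```python
from __future__ import annotations

import re

# Prefixes copied from the naming prefix table above.
ALLOWED_PREFIXES = (
    "btn-", "tab-", "filter-", "toggle-", "input-",
    "row-", "metric-", "pagination-", "dialog-", "select-",
)

# Lowercase kebab-case: alphanumeric segments joined by single hyphens.
KEBAB_CASE_RE = re.compile(r"^[a-z][a-z0-9]*(?:-[a-z0-9]+)+$")


def validate_data_cy(candidate: str, existing: set[str]) -> list[str]:
    """Hypothetical helper: return rule violations; an empty list means the name passes."""
    problems: list[str] = []
    if not KEBAB_CASE_RE.match(candidate):
        problems.append("not kebab-case")
    if not candidate.startswith(ALLOWED_PREFIXES):
        problems.append("unknown prefix")
    if candidate in existing:
        # `existing` is assumed to hold every data-cy value already in the templates.
        problems.append("already used; append a disambiguating suffix")
    return problems
```

Calling `validate_data_cy("btn-request-access-rights", set())` returns an empty list. The descriptiveness rule is not checked here and still needs human judgment.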
+ +--- + +## Phase 4 · Apply to Angular Template + +Add `data-cy` to the **host** Angular component element, not the inner native element. + +**The rule:** +```html +<!-- ✅ Correct — data-cy on the CPS host component --> +<cps-button data-cy="btn-create-draft-version" label="Create new draft version" ...> +</cps-button> + +<!-- ❌ Wrong — data-cy on the inner native button rendered by the library --> +<cps-button ...> + <button data-cy="btn-create-draft-version">Create new draft version</button> +</cps-button> +``` + +**Placement:** Add `data-cy` as the second attribute after the component tag name (or after any structural directive like `*ngIf`, `@if`, `[ngClass]`). Preserve all existing attributes and indentation exactly. + +**Multi-line component elements:** +```html +<!-- Before --> +<cps-button + class="go-to-all-access-requests-btn" + type="borderless" + label="Apply for access" + (clicked)="goToAllAccessRequests()"> + +<!-- After — data-cy on the second line, before class --> +<cps-button + data-cy="btn-apply-access-other" + class="go-to-all-access-requests-btn" + type="borderless" + label="Apply for access" + (clicked)="goToAllAccessRequests()"> +``` + +**Inline component elements:** +```html +<!-- Before --> +<cps-button (clicked)="openViewAccessReqSidenav(item)" color="prepared" label="View" type="borderless"> + +<!-- After --> +<cps-button (clicked)="openViewAccessReqSidenav(item)" color="prepared" data-cy="btn-view-access-request" label="View" type="borderless"> +``` + +When a gap covers multiple instances of the same component in a loop (e.g. one "View" button per table row), add the `data-cy` once on the template element — the PageObject will use `.nth(index)` to distinguish instances. + +--- + +## Phase 5 · PageObject Sync + +After every template change, update the matching PageObject in `aul-ui/playwright/pages/`. 
+ +**Replace proposed/fallback locators with `getByTestId()`:** + +```typescript +// Before — text fallback or proposed comment +// ⚠️ PROPOSED data-cy: btn-request-access-rights +readonly requestAccessButton: Locator = page.locator('cps-button', { hasText: 'Request access' }); + +// After +readonly requestAccessButton: Locator = page.getByTestId('btn-request-access-rights').locator('button'); +``` + +**Inner element resolution:** `getByTestId()` resolves the host Angular component element. For Playwright interactions (`click`, `fill`), chain `.locator('button')` or `.locator('input')` on the result if the interaction target is the native element inside the host. + +**Remove stub markers:** Delete any comment lines containing `⚠️ PROPOSED`, `⚠️ NOT YET IN TEMPLATE`, or `will resolve once template is updated` that relate to the now-instrumented elements. + +**Update PageObject header comments:** +- Change `status: candidate` → `status: active` if all locators for the page are now resolved. +- Remove `stub-reason:` line if no un-instrumented elements remain. + +--- + +## Phase 6 · Living Doc Promotion + +For each Functionality whose `status: planned` was solely due to missing `data-cy`: + +1. Open `aul-ui/playwright/features/liv_doc_func/func-{NNN}-*.feature`. +2. Change `# status: planned` → `# status: active` in the comment header. +3. Remove the `# planned-reason: no data-cy attributes` comment line if present. +4. Do **not** change any other header fields (AC text, func_type, feature, etc.). + +Only promote if the data-cy attributes required by that Functionality's ACs have all been added in Phase 4. If a Functionality depends on multiple elements and only some were instrumented, leave it as `planned` and add a comment listing the remaining blockers. + +--- + +## Phase 7 · WORK_LOG Update + +Update `.copilot/bdd/WORK_LOG.md` §4 and §8 to reflect completed work. + +**§4 row updates:** +- Change 🔴 → ✅ for each element that was instrumented. 
+- Change `Suggested data-cy` column to the `data-cy` column for confirmed values. +- Add a "Files updated:" note under the section header listing the template file(s) changed. + +**§8 open items:** +- Close OI items that are now fully resolved: change status column to `✅ closed` or remove the row. +- If a gap was partially resolved (e.g. some elements done, some need lib support), update the item description to reflect remaining scope. + +--- + +## Output after completing all phases + +Report the following at the end of the run: + +``` +## data-cy-instrument run summary + +### Templates updated +- <file-path>: <list of data-cy values added> + +### PageObjects synced +- <PageObject.ts>: <locators updated> + +### Functionalities promoted +- <func-NNN-*.feature>: planned → active + +### Remaining gaps (lib-limited or deferred) +- <element description>: <reason> + +### WORK_LOG §4 rows closed: N +### WORK_LOG OI items closed: N +``` + +--- + +## Interaction with other skills + +| Skill | Relationship | +|---|---| +| `bdd-explore` | Upstream — produces `manifest.json` with `coverage_gaps`. This skill consumes that output. | +| `bdd-maintain` RE-SCAN | Upstream — re-generates `coverage_gaps` after a UI change. Trigger this skill after RE-SCAN if new gaps appear. | +| `bdd-scenario-gen` | Downstream — after Functionalities are promoted from `planned` to `active`, generate Gherkin scenarios for them. | +| `living-doc-update` | Downstream — if PageObject header `status` changes, the corresponding Feature entity in the living doc may also need a status update. 
| + +**Pipeline position:** +``` +bdd-explore (scan) → data-cy-instrument → bdd-scenario-gen +bdd-maintain RE-SCAN → data-cy-instrument → bdd-scenario-gen +``` diff --git a/skills/living-doc-pageobject-scan/SKILL.md b/skills/living-doc-pageobject-scan/SKILL.md index d2ff017..793d253 100644 --- a/skills/living-doc-pageobject-scan/SKILL.md +++ b/skills/living-doc-pageobject-scan/SKILL.md @@ -69,7 +69,43 @@ For each distinct screen/route, extract: - Display elements: tables, lists, notifications, modals - Page-level: title, heading (h1), primary URL pattern -**3. Generate PageObject skeleton** +**3. Form traversal (deep exploration)** + +For each form, wizard, or dialog discovered on the route, attempt to fill and progress: + +a. **Resolve field values** using the sourcing cascade (see `ExplorationFixture` in the glossary): + 1. Check `seed.yaml form_fixtures` for a pre-declared value for this route + field. + 2. If absent: navigate to the entity list for this surface type; read an actual field value + from an existing entity. Replay it as `copyable`, or append a suffix (e.g. `-copy`) to + name fields to avoid duplicate rejection (`derived`). + 3. If no existing entities: infer a `fake` value from label + placeholder + tooltip + + adjacent validation hint text. + 4. If a `real-world` field has no resolvable value: user-assist pause → ask user → record + to `form_fixtures` with `source: user_provided`, then continue. + +b. **Fill and progress** — fill all resolved fields; click Submit or Next. Scan the resulting + page or confirmation state. Record the fill sequence as `navigation_steps` and the required + values as `data_requirements` in the manifest `navigation_context`. + +c. **Probe validation behaviour** — after a successful fill-and-submit, return to the form and + probe each text input: + - **Special characters** (`<>'"&\`) — observe inline error, silent strip, or truncation. 
+ - **Oversized input** (200+ random characters) — observe character counter, truncation at + max length, or rejection message. + - **Wrong type** (alphabetic text in a numeric or date field) — observe inline validation + message. + - **Duplicate detection** (value identical to a known existing entity name) — observe + duplicate-rejection error and capture its `data-cy`. + +d. **Scan validation state** — after each probe, run the core scan and elements-without-data-cy + scripts to capture `data-cy` error messages, character counters, and validation banners that + are only visible during invalid input. These become source material for `field_validation` + Functionality stubs. + +e. **Record findings** in the manifest `navigation_context.field_constraints` for this route: + `{ field_data_cy, max_length, special_chars, duplicate, duplicate_error_data_cy, real_world_required }` + +**4. Generate PageObject skeleton** One PageObject class per distinct screen. Naming: `<ScreenName>Page`. @@ -150,14 +186,14 @@ Still include the current selector in the generated PageObject so test authoring annotate that selector constant with a `FRAGILE` comment and repeat the warning in the scan / breaking change report. -**4. Map PageObjects to Feature entities** +**5. Map PageObjects to Feature entities** One PageObject ≈ one `UI` Feature. Write the Feature ID as a header comment in the generated PageObject file (the `// living-doc: FEAT-<nnn> | <route>` line shown in the templates above). Also record `feature_id` in the manifest entry for the route. - If a matching Feature (`FEAT-<nnn>`) exists in the living documentation: add the header comment and manifest entry. - If no Feature exists: write `// living-doc: FEAT-UNKNOWN | <route>` as a placeholder and flag the route in the scan report as **"needs Feature entity"**. Do not auto-create a Feature file — raise it for the team to create via `living-doc-create-feature`. -**5. Generate Functionality stubs from discovered behaviors** +**6. 
Generate Functionality stubs from discovered behaviors** For each **behavior** identified on the screen — an interaction pattern, business operation, or component capability — propose a Functionality stub (`FUNC-<nnn>`) with a name following @@ -253,6 +289,8 @@ files before running a full rescan. | Feature link | `// living-doc: FEAT-<nnn> \| <route>` header comment in the PageObject file. If no Feature exists: `FEAT-UNKNOWN` placeholder and a note in the scan report. Header format TBD — will follow similar conventions to the US/FUNC feature file header. | | Functionality feature file stubs | `features/functionalities/<feature-kebab>/func-<kebab>.feature` — one file per discovered Functionality behavior, `@FUNC_ID:FUNC-UNKNOWN` tag until ID is assigned | | Breaking change report | `.copilot/bdd/breaking-changes.md` | +| Inaccessible routes (PHASE 5) | `.copilot/bdd/scan-phase5-inaccessible.md` | +| Final scan report (PHASE 6) | `.copilot/bdd/scan-phase6-report.md` | | Exploration manifest | `.copilot/bdd/manifest.json` | > **Note:** Locations above are illustrative defaults. Actual paths depend on the project's repository structure and Storage Profile configuration. @@ -283,7 +321,8 @@ The manifest records per-route exploration state. Agents and tools read it to dr "navigation_steps": "Click sidebar item \u2018All Domains\u2019.", "data_requirements": null, "auth_role": "standard user", - "notes": null + "notes": null, + "field_constraints": [] } } } @@ -301,6 +340,7 @@ The manifest records per-route exploration state. Agents and tools read it to dr | `navigation_context.prerequisites` | string | State that must exist before navigating (e.g. "a domain must have been visited at least once"). | | `navigation_context.navigation_steps` | string | Step-by-step path to the route from the app root or login page. | | `navigation_context.data_requirements` | string/null | Test data that must exist (e.g. "at least one published domain"). 
| +| `navigation_context.field_constraints` | array | Per-field validation findings from form traversal probing. Schema: `{ field_data_cy, max_length, special_chars, duplicate, duplicate_error_data_cy, real_world_required }`. Empty array until probed. | | `navigation_context.auth_role` | string | Minimum role required to reach this route. | | `navigation_context.notes` | string/null | Any additional context for the agent (e.g. quirks, timing, overlay triggers). | diff --git a/skills/references/living-doc-glossary.md b/skills/references/living-doc-glossary.md index ac6d68b..5f7cd74 100644 --- a/skills/references/living-doc-glossary.md +++ b/skills/references/living-doc-glossary.md @@ -29,29 +29,31 @@ so that <business outcome>. **US feature file header format** (as used in `features/us/us-<nnn>-<kebab>.feature`): -The header comment block at the top of a US feature file holds all US metadata. This data is -collected during the living documentation output generation process (data mining). +The header comment block at the top of a US feature file holds all US metadata and is mined +during living documentation output generation. 
```gherkin -# Source: https://github.com/<org>/<repo>/issues/<n> - -# Business Value: +# ============================================================================= +# LIVING DOC — US-<n> · <US Title> +# ============================================================================= +# source: https://github.com/<org>/<repo>/issues/<n> ← optional +# status: planned | active | deprecated +# business_value: # - <bullet describing the business outcome> - -# Not in scope: ← optional +# not_in_scope: ← optional # - <item excluded from this US> - -# Preconditions: ← optional +# preconditions: ← optional # - <system state required before test> - -# Acceptance Criteria: +# +# acceptance_criteria: # # AC:US-<n>-01 (v<version> - <State>) # - <description of the AC> -# - <Aspect>: <value1>, <value2> ← optional; used for {placeholder} ACs +# - <Aspect>: <value1>, <value2> ← optional; used for {placeholder} ACs # # AC:US-<n>-02 (v<version> - <State>) # - <description of the AC> +# ============================================================================= @US_ID:US-<n> Feature: <US Title> @@ -66,14 +68,15 @@ Feature: <US Title> ... 
``` -**Header sections:** -| Section | Required | Purpose | +**Header fields:** +| Field | Required | Purpose | |---|---|---| -| `# Source:` | Optional | Link to the original issue tracker entry or the pre-BDD living doc location — primarily useful during migration from a legacy format | -| `# Business Value:` | Yes | Why this User Story exists (bullets) | -| `# Not in scope:` | Optional | Explicit exclusions | -| `# Preconditions:` | Optional | System-level state required before test execution | -| `# Acceptance Criteria:` | Yes | Full AC listing with IDs, versions, and states | +| `# source:` | Optional | Link to the original issue tracker entry or the pre-BDD living doc location | +| `# status:` | Yes | `planned` · `active` · `deprecated` | +| `# business_value:` | Yes | Why this User Story exists (bullets) | +| `# not_in_scope:` | Optional | Explicit exclusions | +| `# preconditions:` | Optional | System-level state required before test execution | +| `# acceptance_criteria:` | Yes | Full AC listing with IDs, versions, and states | | `@US_ID:US-<n>` tag | Yes | Machine-parseable User Story ID (feature-level tag) | ### Feature @@ -101,6 +104,176 @@ A named system surface — the structural layer between User Stories and atomic - `owner_changed_at` — date of ownership transfer - `owner_change_reason` — reason for the transfer +#### UI surface — PageObject file header + +Every PageObject file opens with a living-doc header block that embeds the canonical Feature fields. Use this format so each file is self-describing and traceable without opening a separate registry. 
+ +**Required fields:** + +| Field | Canonical values | +|---|---| +| `surface_type` | `UI` · `API` · `Service` · `Worker` · `Module` · `Library` | +| `route` | URL path — use `{param}` for dynamic segments | +| `owners` | Team name(s), comma-separated | +| `status` | `active` · `planned` · `candidate` · `deprecated` | +| `purpose` | One-to-two sentence description in business language | +| `user_stories` | `US-N` IDs, comma-separated — or `none` (triggers orphan warning in gap reports) | +| `functionalities` | `FUNC-N` IDs, comma-separated — or `none` (triggers a reminder to define FUNCs) | +| `external_dependencies` | Service or API names this surface calls — or `none` | +| `page-object` | Filename of this PageObject | + +**Optional fields (for specific surface types):** + +| Field | When | +|---|---| +| `wizard-steps` | Multi-step wizard UI — list the named steps in order | +| `stub-reason` | `status: candidate` — one-to-two sentence statement of **why** the surface is not yet fully instrumented; treated as tech-debt resolvable by instrumenting the template and re-scanning | + +--- + +#### Two header formats: Full header vs Cross-reference header + +A PageObject file uses one of two formats depending on whether it is the **primary surface owner** or a **secondary file** that implements part of a surface already owned elsewhere. + +**When to use each:** + +| Situation | Format | +|---|---| +| One PageObject = one distinct navigable surface (URL or modal) | **Full header** | +| Multiple PageObjects share one URL (e.g. wizard steps, sub-pages, dialogs) — one file is the primary owner, the others are implementation helpers | **Cross-reference header** — secondary files only | + +**Rule:** exactly one file per Feature carries the full header. Every other file that contributes to the same Feature carries a cross-reference header with `parent-feat` pointing to the Feature ID. 
This keeps traceability fields (`user_stories`, `functionalities`, `external_dependencies`) in a single authoritative location. + +**Wizard example:** FEAT-042 (Account Setup Wizard) lives at one URL. `AccountSetupWizardPage.ts` is the primary file and carries the full header. `AccountSetupWizardProfilePage.ts`, `AccountSetupWizardPreferencesPage.ts`, and the other step files each carry a cross-reference header pointing `parent-feat: FEAT-042`. Adding a wizard step never requires editing the Feature registry or duplicating traceability data. + +--- + +**Full header required fields:** + +| Field | Canonical values | +|---|---| +| `surface_type` | `UI` · `API` · `Service` · `Worker` · `Module` · `Library` | +| `route` | URL path — use `{param}` for dynamic segments | +| `owners` | Team name(s), comma-separated | +| `status` | `active` · `planned` · `candidate` · `deprecated` | +| `purpose` | One-to-two sentence description in business language | +| `user_stories` | `US-N` IDs, comma-separated — or `none` (triggers orphan warning in gap reports) | +| `functionalities` | `FUNC-N` IDs, comma-separated — or `none` (triggers a reminder to define FUNCs) | +| `external_dependencies` | Service or API names this surface calls — or `none` | +| `page-object` | Filename of this PageObject | + +**Full header example:** + +```typescript +/* ============================================================================= + * LIVING DOC — FEAT-042 · Account Setup Wizard + * ============================================================================= + * surface_type: UI + * route: /app/accounts/setup + * owners: Platform Team + * status: active + * wizard-steps: Profile · Preferences · Review · Confirm + * purpose: Multi-step wizard for creating and configuring a new account. 
+ * user_stories: US-10, US-12 + * functionalities: FUNC-005, FUNC-006 + * external_dependencies: accounts-api + * page-object: AccountSetupWizardPage.ts + * ============================================================================= */ +``` + +--- + +**Cross-reference header required fields:** + +| Field | Canonical values | +|---|---| +| `parent-feat` | `FEAT-<nnn>` — ID of the primary Feature that owns this surface. **Required.** | +| `route` | URL path of this specific sub-surface — use `{param}` for dynamic segments | +| `owners` | Team name(s), comma-separated | +| `status` | `active` · `planned` · `candidate` · `deprecated` | +| `purpose` | One sentence: what this step or sub-surface does, in business language — no FEAT IDs, no internal references | +| `page-object` | Filename of this PageObject | + +The following fields are **intentionally omitted** from the cross-reference header — they belong only on the primary Feature file: `surface_type`, `user_stories`, `functionalities`, `external_dependencies`. + +**Cross-reference header example:** + +```typescript +/* ============================================================================= + * LIVING DOC — FEAT-042 · Account Setup Wizard [cross-reference] + * ============================================================================= + * This file implements Step 1 (Profile) of the Account Setup Wizard. + * The authoritative Feature header is in AccountSetupWizardPage.ts. + * + * parent-feat: FEAT-042 + * route: /app/accounts/setup (wizard stays on this URL) + * owners: Platform Team + * status: active + * purpose: Step 1 (Profile) — user profile fields: display name, email address, + * and role selection. 
+ * page-object: AccountSetupWizardProfilePage.ts + * ============================================================================= */ +``` + +--- + +#### Where operational notes belong + +The PageObject **header block and class JSDoc are living-doc contracts** — they encode identity, traceability, and status. They are not a changelog, scan diary, or issue tracker. + +| Information type | Correct location | NOT in | +|---|---|---| +| Missing `data-cy` attributes discovered during a scan | `manifest.json` → `coverage_gaps[]` | Header or class JSDoc | +| Reason a surface is not yet fully instrumented | Header field `stub-reason:` (one or two lines) | Free-text NOTE block | +| Proposed `data-cy` names for missing elements | `manifest.json` → `coverage_gaps[].suggestedDataCy` | Header or class body | +| Open issue reference (e.g. OI-08, P1) | `manifest.json` → `coverage_gaps[].note` | Header or class JSDoc | +| Scan date or scan session tag | `manifest.json` → `last_scanned` | Header or class JSDoc | +| `@stub` / `@pending` JSDoc tags on the class | — (use `status: candidate` + `stub-reason:`) | Class JSDoc | +| Implementation note explaining a locator strategy | Inline code comment on the locator or method | Header block | + +**`status: candidate` and `stub-reason:` as resolvable tech-debt** + +A `status: candidate` surface is **not a permanent state** — it is a living-doc tech-debt item. The surface is known, documented, and linked to User Stories; what is missing is template instrumentation (`data-cy` attributes) that would allow full PageObject locators to be written. The resolution path is always: + +1. Instrument the component template with the `data-cy` values listed in `manifest.json` `coverage_gaps[]` (use the `data-cy-instrument` skill). +2. Re-scan — the scan session updates the PageObject locators. +3. Promote `status: candidate` → `status: active` and remove `stub-reason:`. + +`stub-reason:` records the factual state at time of discovery (≤ two lines). 
The value must be free of: +- internal tool or file references (e.g. `issue-missing-data-cy.md`) +- data-cy attribute names or implementation detail +- action items ("raise with dev team", "will resolve once…") +- scan session tags except as a factual date anchor (e.g. `discovered [scan: 2026-05-28-b]`) + +**`@pending` JSDoc on an individual locator property** is acceptable — it explains why that specific locator uses a fallback strategy and what resolves it. It is implementation-level, not an operational note on the surface as a whole. + +--- + +**Common mistakes:** + +| Anti-pattern | Correct | +|---|---| +| `type: screen` | `surface_type: UI` | +| `owner: Team` | `owners: Team` (plural key) | +| `status: ACTIVE` | `status: active` (lowercase) | +| `status: STUB` | `status: candidate` + `stub-reason:` field | +| `functionalities:` omitted | `functionalities: none` | +| `user_stories:` omitted | `user_stories: none` | +| `external_dependencies:` omitted | `external_dependencies: none` | +| `parent-feat:` omitted from cross-reference file | Every secondary file for a shared Feature must declare `parent-feat` | +| `page-object:` omitted from cross-reference file | `page-object:` is required in both formats — it names the file being read | +| `user_stories:` duplicated in cross-reference file | These fields live only on the primary Feature file; omit from cross-references | +| Multiple files claiming the same Feature without `[cross-reference]` tag | Only one file carries the full header; all others must use `[cross-reference]` format | +| NOTE block in header about missing `data-cy` or open issues | Move to `manifest.json` `coverage_gaps[]`; keep only `stub-reason:` in the header | +| `@stub` or `@pending` on the class JSDoc | Use `status: candidate` + `stub-reason:` in the header instead | +| `purpose: Step 1 of FEAT-006 — ...` | `purpose` must not contain FEAT IDs — use `Step 1 (About) — ...` instead; the ID is already in the title line and `parent-feat` | +| 
`purpose:` contains "NOT a …" or "Accessed via …" | Purpose describes what the surface does; exclude defensive statements and navigation instructions | +| `route:` contains a `data-cy` attribute name (e.g. `btn-import-domain`) | `route:` is a URL path or "modal overlay — no dedicated URL"; locator IDs belong in the PageObject body | +| `wizard-steps:` contains `[scan: …]` tag | `wizard-steps:` is a clean ordered list; scan provenance belongs in `manifest.json` | +| Non-spec field added to header (e.g. `query_params:`) | Only use fields defined in the Required or Optional tables; extra fields are ignored by miners and silently dropped | +| Cross-reference prose mentions FUNC IDs or file names | Cross-reference prose is mined as-is — keep it to one human-readable sentence: which step/sub-surface this file implements and where the authoritative header lives | +| `stub-reason:` contains action items, internal tool refs, or data-cy names | `stub-reason:` states only the factual reason (≤ two lines); action items go in `manifest.json` `coverage_gaps[]` | + ### Functionality (FUNC) An atomic, fast-testable behavior — a single verb phrase describing one responsibility. @@ -109,9 +282,10 @@ An atomic, fast-testable behavior — a single verb phrase describing one respon - Name: `<parent Feature name> – <behavior phrase>` (e.g. "Login Page – Validate Password Strength") - Belongs to: one parent **Feature** - Owns: **Functionality-level Acceptance Criteria** (atomic input to output statements) -- Test anchor: a **Functionality feature file** (`func-<kebab>.feature`) under - `features/functionalities/<feature-kebab-name>/` — one file per Functionality, containing - all AC-linked system-test scenarios once implemented. +- Test anchor: a **Functionality feature file** under `features/liv_doc_func/` — one file per + Functionality, containing all AC-linked system-test scenarios once implemented. + File name pattern: `func-<nnn>-<feature-name-kebab>-<behavior-kebab>.feature` + e.g. 
`func-001-authentication-screen-credential-based-login.feature` - Status: `planned | active | deprecated` - Deprecation metadata (set when `status: deprecated`): - `deprecated_at` — date the entity was deprecated @@ -121,39 +295,77 @@ An atomic, fast-testable behavior — a single verb phrase describing one respon Functionalities differ from User Story ACs: they are atomic and fast-testable, not end-to-end. A single User Story may trigger multiple Functionalities. -**Functionality feature file header format** (draft — exact spec TBD, follows US conventions): +**Functionality feature file header format:** ```gherkin -# Source: https://github.com/<org>/<repo>/issues/<n> - -# Rationale: ← optional (replaces Business Value for atomic behaviors) -# - <why this behavior exists> - -# Not in scope: ← optional +# ============================================================================= +# LIVING DOC — FUNC-<nnn> · <Feature Name> — <Functionality Name> +# ============================================================================= +# source: https://github.com/<org>/<repo>/issues/<n> ← optional +# status: planned | active | deprecated +# parent: FEAT-<nnn> +# func_type: component_state | component_action | button_action | +# field_validation | calculation | visibility | navigation_rule +# rationale: ← optional +# - <why this FUNC is scoped this way — business or design decision context> +# not_in_scope: ← optional # - <exclusion> - -# Acceptance Criteria: +# +# acceptance_criteria: # # AC:FUNC-<nnn>-01 (v<version> - <State>) -# - <description> +# - <description in business language — no data-cy IDs in AC text> # # AC:FUNC-<nnn>-02 (v<version> - <State>) # - <description> +# ============================================================================= @FUNC_ID:FUNC-<nnn> Feature: <Feature Name> — <Functionality Name> + <Purpose: one-to-two sentences describing what this FUNC covers, in business + language. 
Present only when purpose adds context beyond the title.> ← optional + # No scenarios yet — uncovered ACs flagged by coverage_report.py. # When adding scenarios: include both # AC:<id> comment and @AC:<id>[/param:value] tag above each Scenario. ``` -**Header sections:** -| Section | Required | Purpose | +**Header fields:** +| Field | Required | Purpose | |---|---|---| -| `# Source:` | Optional | Link to the original issue tracker entry or the pre-BDD living doc location — primarily useful during migration from a legacy format | -| `# Rationale:` | Optional | Why this atomic behavior exists | -| `# Not in scope:` | Optional | Explicit exclusions | -| `# Acceptance Criteria:` | Yes | Full AC listing with IDs, versions, and states | +| `# source:` | Optional | Link to the original issue tracker entry or the pre-BDD living doc location | +| `# status:` | Yes | `planned` · `active` · `deprecated` | +| `# parent:` | Yes | Parent Feature ID (`FEAT-<nnn>`) | +| `# func_type:` | Yes | Category of behavior this Functionality represents (see table below) | +| `# rationale:` | Optional | **Why** this FUNC is scoped the way it is — business context, a deliberate design decision, or a constraint that explains the boundary. Not for implementation notes (how something works internally). | +| `# not_in_scope:` | Optional | Explicit exclusions | +| `# acceptance_criteria:` | Yes | Full AC listing in business language — do not include `data-cy` IDs or implementation names in AC text | | `@FUNC_ID:FUNC-<nnn>` tag | Yes | Machine-parseable Functionality ID (feature-level tag) | +| Feature description (below `Feature:`) | Optional | One-to-two sentence purpose in business language. Use when the title alone is not self-explanatory. Replaces `# purpose:` — not a header field. 
| + +**`func_type` values:** + +| Value | What it documents | PageObject anchor | +|---|---|---| +| `component_state` | Visible state of elements on load (presence, enabled/disabled, default text) AND what a data-bound component renders per data state (populated, empty, error) | `constructor` locators, data-bearing locators | +| `component_action` | Observable response within a self-contained component to an internal interaction — no discrete button, no system-level side effect (e.g. live search, autocomplete, accordion, carousel, tab content) | Component input/state locators | +| `button_action` | Observable outcome(s) after a specific discrete control is triggered — may span multiple resulting steps (e.g. redirect, entity created, dialog opened) | `btn-*` locators | +| `field_validation` | Rule enforced on a single field's value — inline error, enabled state, accepted/rejected input | `input-*` locators | +| `calculation` | Value computed and displayed from one or more inputs, independent of form submission | Display-only locators | +| `visibility` | Element presence, content, or enabled state conditional on a runtime state — condition is optional context and may be role, prior action, data presence, or config (e.g. owner sees action buttons, section appears after step complete) | Any conditional locator | +| `navigation_rule` | When and where the app routes, driven by action or system state — only when routing has a distinct precondition or business rule | Route assertion | + +**Scoping rules:** + +- **One FUNC, one cause.** If two behaviors share a trigger, they are one FUNC with two ACs. If two behaviors have different triggers, they are two FUNCs. +- **`component_state`** — scope to a logical group, not individual elements. "Login form controls on load" is one FUNC. Do not write one FUNC per locator. For data-bound components, each distinct data state (populated / empty / error) is an AC on the same FUNC, not a separate FUNC. 
+- **`component_action`** — one FUNC per distinct component behavior. If the same component has multiple independent internal behaviors (live search AND column sort), they are separate FUNCs. +- **`button_action`** — one FUNC per distinct button. A button that produces multiple observable steps is still one FUNC; the steps become multiple ACs. Two buttons = two FUNCs. Form submission is `button_action` — the trigger is the submit control. +- **`field_validation`** — one FUNC per distinct validation rule, not one per field. The same rule applied to multiple fields = one FUNC with a `{field}` placeholder AC. +- **`calculation`** — only when the derived value is observable independently of a submission. If the result only appears after a form submit, it is an AC on the `button_action` FUNC. +- **`visibility`** — use when an element's presence or state depends on a condition. The condition is descriptive context in the AC, not a required field. Distinct from `component_state` (always-true on load) and `component_action` (response to interaction). +- **`navigation_rule`** — only for routing behaviors with a distinct precondition or business rule. A redirect that is always the result of a button action is an AC on that `button_action` FUNC, not a separate `navigation_rule`. + +> `test_type` (unit vs integration vs system) is NOT a FUNC header field — it belongs at scenario level as a tag (e.g. `@test_type:system`). ### Acceptance Criterion (AC) @@ -272,6 +484,133 @@ AC:FUNC-001-03 (v1.0.0 - Active) --- +## ExplorationFixture + +An **ExplorationFixture** is a named set of field→value declarations attached to a specific route in `seed.yaml`. It tells the exploration agent how to fill forms so it can enter wizards, open dialogs, and discover UI surfaces that are otherwise unreachable by passive observation. 
+ +### value_class taxonomy + +| Class | Meaning | How the agent sources it | +|---|---|---| +| `copyable` | Value can be reused verbatim across runs — taken from an existing entity in the app | Navigate to the entity list; read an actual field value; replay it | +| `derived` | Must be transformed from an existing entity — e.g. append `-copy` to a domain name to avoid duplicate rejection | Read existing value; apply a known transformation rule | +| `fake` | Any syntactically valid value — real-world existence not required (e.g. a description, an email address) | Generate locally from label + placeholder + field type | +| `real-world` | Must exist in the real environment for submission to succeed (e.g. a Glue table name, a tenant ID, an AWS account ID) | Sourced from `seed.yaml form_fixtures` or user-provided via Source E pause | + +### Sourcing cascade (applied in priority order) + +1. **`seed.yaml form_fixtures`** — pre-declared by user or written by the agent in a prior session. +2. **Existing app entities** — navigate to the entity list for this surface type; read a sample entity's actual field values; copy or derive. +3. **Field context inference** — read label + placeholder + tooltip + adjacent validation hint text → infer a plausible `fake` value (`"Domain name"` → `"E2E Test Domain"`, `email` field → `"test@example.com"`). +4. **User-assist pause** — none of the above is sufficient for a `real-world` field → show user the form, request the value, record it back to `form_fixtures` with `source: user_provided`. 
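The four-step sourcing cascade above can be sketched as a single resolver that tries each source in priority order. This is a minimal TypeScript sketch, not the toolkit's implementation: `resolveFieldValue`, `CascadeInputs`, `FieldContext`, and the callback names are hypothetical, each callback is assumed to return `undefined` when its source is insufficient, and falling through to the user-assist pause for any unresolved field (not only `real-world` ones) is an assumption.

```typescript
// Hypothetical names throughout: illustrative only, not part of the toolkit.
type ValueClass = "copyable" | "derived" | "fake" | "real-world";
type Source = "inferred" | "user_provided" | "env_var" | "existing_entity";

interface FieldContext {
  fieldDataCy: string;   // data-cy attribute of the target input
  valueClass: ValueClass;
  label?: string;        // visible label, if any, used for inference
}

interface Resolved {
  value: string;
  source: Source;
}

interface CascadeInputs {
  // Step 1: entries already declared under form_fixtures for this route.
  declared: { [fieldDataCy: string]: Resolved | undefined };
  // Step 2: copy or derive a value from an existing app entity.
  fromExistingEntity: (field: FieldContext) => string | undefined;
  // Step 3: infer a plausible fake value from label, placeholder, and hints.
  inferFake: (field: FieldContext) => string | undefined;
  // Step 4: pause and ask the user for the value.
  askUser: (field: FieldContext) => string;
}

function resolveFieldValue(field: FieldContext, inputs: CascadeInputs): Resolved {
  // 1. A pre-declared fixture always wins.
  const declared = inputs.declared[field.fieldDataCy];
  if (declared !== undefined) return declared;

  // 2. Existing app entities (copy or derive).
  const existing = inputs.fromExistingEntity(field);
  if (existing !== undefined) return { value: existing, source: "existing_entity" };

  // 3. Field context inference.
  const inferred = inputs.inferFake(field);
  if (inferred !== undefined) return { value: inferred, source: "inferred" };

  // 4. User-assist pause; the answer is recorded back so that the next
  //    session resolves it at step 1 without asking again.
  const provided: Resolved = { value: inputs.askUser(field), source: "user_provided" };
  inputs.declared[field.fieldDataCy] = provided;
  return provided;
}
```

Writing the user-provided value back into `declared` mirrors the "record it back to `form_fixtures`" step; persisting that map to `seed.yaml` is left out of the sketch.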
+ +### Input validation probing + +After a successful form fill and submission, the agent probes validation behaviour on each text input: + +| Probe | Input | What to observe | +|---|---|---| +| Special characters | `<>'"&\` | Inline error, silent strip, or truncation | +| Oversized input | 200+ random characters | Character counter, truncation at max length, or rejection message | +| Wrong type | Alphabetic text in a numeric or date field | Inline validation message or field rejection | +| Duplicate detection | Identical value to a known existing entity name | Duplicate-rejected error message and its `data-cy` | + +Scan the form after each probe — run the core scan and elements-without-data-cy scripts to capture `data-cy` error messages, character counters, and validation banners that are only visible during invalid input. These become source material for `field_validation` Functionality stubs. + +### seed.yaml schema + +A fixture entry uses either a single `value` shorthand (simple fields) or a `values[]` array +(multi-branch fields). A `condition` restricts the field to a specific traversal context. + +```yaml +form_fixtures: + /auth/all-domains/create-domain/about: + + # Simple single value + - field_data_cy: domain-name + value_class: fake + value: "E2E Exploration Domain" + source: inferred # inferred | user_provided | env_var | existing_entity + + # Multiple values — agent treats each as a separate traversal branch. + # The first (label: default) is used for the happy path; labelled alternates + # are explored afterwards and may open different form sections or sub-routes. + - field_data_cy: domain-type + value_class: copyable + values: + - label: default + value: "BATCH" + source: existing_entity + - label: streaming-path # explores different form section + value: "STREAMING" + source: existing_entity + + # Conditional field — only filled when another field holds a specific value. 
+ # Useful for fields that appear or become mandatory based on a prior selection. + - field_data_cy: stream-endpoint + value_class: real-world + value: env:TEST_STREAM_ENDPOINT + source: env_var + condition: + when_field: domain-type + when_value: STREAMING + + # Real-world field resolved via user-assist pause + - field_data_cy: tenant-id + value_class: real-world + value: env:TEST_TENANT_ID + source: env_var +``` + +**Field reference:** + +| Key | Required | Purpose | +|---|---|---| +| `field_data_cy` | Yes | `data-cy` attribute of the target input element | +| `value_class` | Yes | `copyable` / `derived` / `fake` / `real-world` | +| `value` | One of `value` or `values` | Shorthand for a single fill value | +| `values[]` | One of `value` or `values` | Array of labelled values; agent explores each as a separate traversal branch | +| `values[].label` | Yes (when `values` used) | Branch identifier; `default` marks the happy-path value | +| `values[].value` | Yes | The actual fill value | +| `source` | Yes | `inferred` \| `user_provided` \| `env_var` \| `existing_entity` | +| `condition` | No | Restricts the fill to a specific context | +| `condition.when_field` | Yes (when `condition` used) | `data-cy` of the controlling field | +| `condition.when_value` | Yes (when `condition` used) | Value the controlling field must hold | + +### manifest field_constraints schema + +Validation findings are stored per-field in the manifest `navigation_context.field_constraints` for the route: + +```json +"field_constraints": [ + { + "field_data_cy": "domain-name", + "max_length": 100, + "special_chars": "rejected", + "duplicate": "rejected-with-error", + "duplicate_error_data_cy": "domain-name-duplicate-error" + }, + { + "field_data_cy": "tenant-id", + "allowed_format": "alphanumeric", + "real_world_required": true + } +] +``` + +### Lifecycle + +| Event | What happens | +|---|---| +| First form encountered | Agent applies sourcing cascade; fills form using `default` values; 
explores labelled alternate branches for multi-value fields; probes validation | +| `condition` field not yet visible | Agent skips the field until the controlling field holds the required `when_value` | +| `real-world` field has no resolvable value | User-assist pause → user provides value → saved to `form_fixtures` with `source: user_provided` | +| Validation probe discovers new `data-cy` | Added to manifest `elements`; flagged as candidate for `field_validation` Functionality | +| Next scan session | Agent reads `form_fixtures` from `seed.yaml`; skips sourcing cascade for pre-declared fields | +| Constraint changes (e.g. max length increased) | Agent detects mismatch on re-probe; updates `field_constraints`; flags in `breaking-changes.md` | + +--- + ## Relationship diagram ``` @@ -302,7 +641,7 @@ User Story (US) | `living-doc-create-user-story` | User Story entity | Feature entities | | `living-doc-create-feature` | Feature entity | User Story entities | | `living-doc-create-functionality` | Functionality entity + Functionality feature file stub | Feature entity | -| `living-doc-pageobject-scan` | PageObject files + Functionality feature file stubs | App URL or test suite | +| `living-doc-pageobject-scan` | PageObject files + Functionality feature file stubs + `ExplorationFixture` entries in `seed.yaml` | App URL or test suite; `seed.yaml form_fixtures` | | `living-doc-scenario-creator` | E2E BDD scenario files (US) + Functionality feature files (FUNC) | US / FUNC entities, PageObjects | | `living-doc-tutorial-creator` | Tutorial documents | BDD scenario files, User Story entities | | `living-doc-gap-finder` | Gap report | All of the above | From 3bdc5c2757798203cd845743b75ed0794d7df53b Mon Sep 17 00:00:00 2001 From: miroslavpojer <miroslav.pojer@absa.africa> Date: Sat, 30 May 2026 20:12:10 +0200 Subject: [PATCH 27/35] partial tune status backup --- .../agents/living-doc-bdd-copilot.agent.md | 173 +++++--- .github/agents/living-doc-copilot.agent.md | 163 +++++-- 
README.md | 5 +- docs/README.md | 1 + docs/guides/agent-design.md | 251 +++++++++++ docs/guides/living-doc-bdd-copilot.md | 2 +- skills/bdd-explore/SKILL.md | 10 +- skills/bdd-maintain/SKILL.md | 20 +- skills/bdd-scenario-gen/SKILL.md | 190 ++++++-- skills/data-cy-instrument/SKILL.md | 20 +- skills/gherkin-living-doc-sync/SKILL.md | 21 +- .../gherkin-living-doc-sync/evals/evals.json | 4 +- .../evals/fixture-map.md | 4 +- .../evals/trigger-eval.json | 4 +- skills/gherkin-scenario/SKILL.md | 203 --------- skills/gherkin-scenario/evals/evals.json | 112 ----- .../gherkin-scenario/evals/trigger-eval.json | 19 - skills/gherkin-step/SKILL.md | 10 +- skills/gherkin-step/evals/evals.json | 4 +- skills/gherkin-step/evals/trigger-eval.json | 2 +- skills/living-doc-create-feature/SKILL.md | 8 + .../living-doc-create-functionality/SKILL.md | 6 +- skills/living-doc-create-user-story/SKILL.md | 3 +- skills/living-doc-gap-finder/SKILL.md | 9 +- skills/living-doc-impact-analysis/SKILL.md | 9 +- skills/living-doc-pageobject-scan/SKILL.md | 39 +- skills/living-doc-scenario-creator/SKILL.md | 178 +++----- .../evals/evals.json | 4 +- .../evals/fixture-map.md | 4 +- .../evals/trigger-eval.json | 2 +- skills/living-doc-update/SKILL.md | 10 +- skills/references/living-doc-bdd-schemas.md | 408 ++++++++++++++++++ skills/references/living-doc-glossary.md | 379 ++-------------- 33 files changed, 1302 insertions(+), 975 deletions(-) create mode 100644 docs/guides/agent-design.md delete mode 100644 skills/gherkin-scenario/SKILL.md delete mode 100644 skills/gherkin-scenario/evals/evals.json delete mode 100644 skills/gherkin-scenario/evals/trigger-eval.json create mode 100644 skills/references/living-doc-bdd-schemas.md diff --git a/.github/agents/living-doc-bdd-copilot.agent.md b/.github/agents/living-doc-bdd-copilot.agent.md index 0884a8f..d3aa07c 100644 --- a/.github/agents/living-doc-bdd-copilot.agent.md +++ b/.github/agents/living-doc-bdd-copilot.agent.md @@ -11,12 +11,14 @@ description: > 
file", "scenario coverage", "step definitions", "gherkin from user story", "add missing data-cy", "instrument templates", "fix data-cy gaps", "add testids", "fix playwright selectors". -tools: [vscode, execute, read, agent, browser, edit, search, web, 'playwright/*', todo] +tools: [vscode/askQuestions, vscode/toolSearch, vscode/memory, execute/runInTerminal, execute/getTerminalOutput, execute/sendToTerminal, execute/killTerminal, read/readFile, read/problems, agent/runSubagent, browser/openBrowserPage, browser/readPage, browser/screenshotPage, browser/navigatePage, browser/clickElement, browser/dragElement, browser/hoverElement, browser/typeInPage, browser/runPlaywrightCode, browser/handleDialog, edit/createDirectory, edit/createFile, edit/editFiles, edit/rename, search/codebase, search/fileSearch, search/listDirectory, search/textSearch, search/usages, web/fetch, web/githubRepo, web/githubTextSearch, todo] --- # @living-doc-bdd-copilot -Automation layer agent. Explores web apps, generates PageObjects, produces Gherkin scenarios and step definitions, and maintains the BDD automation suite. Does not create living documentation catalog entities — that belongs to `@living-doc-copilot`. +BDD extension of `@living-doc-copilot`. Bridges the catalog to executable tests: explores web apps, generates PageObjects, produces Gherkin scenarios and step definitions, and maintains the BDD automation suite. Works as the automation layer partner to `@living-doc-copilot`, which owns the catalog. Does not create or modify living documentation catalog entities. + +**Before executing any multi-step task:** State your plan in one sentence — name the mode you are entering, the skill you will load, and your first concrete action. Then proceed. --- @@ -67,6 +69,13 @@ _Auto-managed by @living-doc-bdd-copilot. Delete when session complete._ - Never store full element arrays here — those belong in `manifest.json`. - Delete the file when the session goal is fully achieved. 
+**Stopping conditions — escalate to user when:** +- A route has failed 3 consecutive navigation attempts (auth wall, 5xx, redirect loop). +- A CAPTCHA or MFA prompt is detected — do NOT attempt to solve it; record in `Decisions & Findings` and skip the route. +- Context window is nearing capacity: write a compaction note to `Decisions & Findings` summarising all unresolved actions, then ask the user to start a new session and resume from the state file. +- The session goal requires a catalog entity that doesn't exist — hand off to `@living-doc-copilot` rather than blocking. +- More than 50 tool calls have been made without completing the session goal — pause, summarise current progress and all pending actions to the user, and ask how to proceed. + **On resume** (session-state file already exists): read it first, then load only the skill and manifest entries relevant to `Current Position` and `Pending Actions`. Do not reload completed routes. --- @@ -83,6 +92,8 @@ Identify intent from the user's request. Load **one** skill per session — do n | Fix failing tests / selector drift | `bdd-maintain` (HEALING) | Load only the failing routes | | Full re-scan after UI change | `bdd-maintain` (RE-SCAN) | Load full manifest | | Remove a deprecated feature | `bdd-maintain` (REMOVE) | Load only the deprecated route entry | +| Sync feature files / fix traceability tags | `gherkin-living-doc-sync` | No manifest loading needed | +| Implement step definitions | `gherkin-step` | No manifest loading needed | **Manifest loading rule:** Read `manifest.json` with targeted line ranges for the route(s) in scope. Load the full file only for RE-SCAN. This keeps context lean as the manifest grows. @@ -90,29 +101,7 @@ Identify intent from the user's request. Load **one** skill per session — do n **living-doc-glossary:** Do NOT load the full glossary. Essential definitions are inlined below in [Living Doc Conventions](#living-doc-conventions). 
---- - -## Shared Skill Note — `living-doc-gap-finder` - -`living-doc-gap-finder` is a shared skill used differently by each agent: - -- **`@living-doc-copilot`** uses it **top-down**: discovering missing documentation entities (Features, US, Functionalities not yet in the catalog). -- **`@living-doc-bdd-copilot`** uses it **bottom-up**: detecting scenario coverage gaps — ACs that exist in the catalog but have no linked Gherkin scenario. - -Load the skill with this distinction in mind. The bottom-up usage is the default context for this agent. - ---- - -## Workflow Detail - -Full protocols for each mode live in the corresponding skill — loaded on demand by Mode Dispatch above. - -| Skill | What it contains | -|---|---| -| `bdd-explore` | Business Seed Assembly (Sources A–E), crawl loop, entity harvesting, ExplorationFixture cascade, component interaction rules, parameterised route resolution, Source E guided traversal, manifest.json schema | -| `data-cy-instrument` | Gap audit from manifest.json, route→component resolution, naming validation, template instrumentation, PageObject sync, Functionality promotion, WORK_LOG update | -| `bdd-scenario-gen` | Gap detection logic, feature file naming, `@AC:` traceability tagging, step definition resolution rules | -| `bdd-maintain` | RE-SCAN mode, HEALING mode, REMOVE mode | +**living-doc-bdd-schemas:** Load [living-doc-bdd-schemas](https://raw.githubusercontent.com/AbsaOSS/agentic-toolkit/master/skills/references/living-doc-bdd-schemas.md) only when generating or validating feature file headers, PageObject file headers, ExplorationFixture entries, or seed.yaml form_fixtures. Do not load for entity creation or AC queries. 
--- @@ -133,27 +122,74 @@ Full protocols for each mode live in the corresponding skill — loaded on deman ## Does NOT -- Create living documentation entities (User Stories, Features, Functionalities): `@living-doc-copilot` -- Write unit or integration tests: `@sdet-copilot` -- Run language-specific quality gates: `@quality-gate-copilot` -- Heal the catalog layer (AC states, traceability links, entity deprecation): `@living-doc-copilot` +- Create living documentation entities (User Stories, Features, Functionalities): hand off to `@living-doc-copilot` +- Write unit or integration tests: `@sdet-copilot` _(not yet deployed — leave a `TODO: @sdet-copilot` comment in the step stub)_ +- Run language-specific quality gates: `@quality-gate-copilot` _(not yet deployed — leave a TODO note)_ +- Heal the catalog layer (AC states, traceability links, entity deprecation): hand off to `@living-doc-copilot` --- -## Shared skill note — `living-doc-gap-finder` +> **`living-doc-gap-finder` usage note:** This agent uses the skill **bottom-up** — detecting scenario coverage gaps (ACs that exist in the catalog but have no linked Gherkin scenario). `@living-doc-copilot` uses it top-down (missing catalog entities). Load with this distinction in mind; bottom-up is the default context here. + +--- + +## Tool Guidance + +| Tool | When to use | Key guidance | +|---|---|---| +| `browser/runPlaywrightCode` | Navigate, snapshot, and interact with the app during EXPLORE/HEAL modes | Always take a snapshot before harvesting elements. Navigate via manifest-known routes — avoid clicking blindly. Never attempt to solve CAPTCHAs; record and skip the route. | +| `read/readFile` | Load skills, manifest, seed, session state | Load `manifest.json` with targeted line ranges (current route only). Load `seed.yaml` in full. Load skills on demand — never pre-load for modes not yet triggered. 
| +| `edit/createFile` | Create new PageObjects, feature files, step stubs | Run `search/fileSearch` first — never overwrite an existing file without reading it. | +| `edit/editFiles` | Patch existing PageObjects, step definitions, feature files | Read the full target block before writing. Use the CLI edit-spec protocol when running in CLI context. | +| `search/fileSearch` | Check whether a PageObject or feature file already exists | Run before every `createFile` call to prevent duplicates. | +| `search/textSearch` | Find `@AC:` annotations affected by a step or AC change | Run before patching step definitions or syncing traceability tags. | +| `agent/runSubagent` | Delegate surface documentation to `@living-doc-copilot` | Pass the exact structured handoff payload from [Handoff](#handoff) — do not summarise loosely. | + +--- + +## Examples + +**Example 1 — EXPLORE mode, new project** + +> User: Scan the webapp at https://app.example.com and generate PageObjects. + +Agent plan: Entering EXPLORE mode. Loading `bdd-explore` skill. First action: check for existing `seed.yaml` and `manifest.json` at the configured paths. + +_(Agent assembles Business Seed from Sources A–D, then begins the crawl loop from the root route. New surfaces are added to `manifest.json`. Once crawl is complete, agent hands candidate Features to `@living-doc-copilot` using the structured payload.)_ + +--- + +**Example 2 — SCENARIO-GEN mode, generate feature file** + +> User: Generate Gherkin scenarios for US-007 — Place an Online Order. -`living-doc-gap-finder` is a shared skill used differently by each agent: +Agent plan: Entering SCENARIO-GEN mode. Loading `bdd-scenario-gen` skill for US-007. First action: read US-007 ACs from the catalog, then load the manifest entry for the checkout route. -- **`@living-doc-copilot`** uses it **top-down**: discovering missing documentation entities (Features, US, Functionalities not yet in the catalog). 
-- **`@living-doc-bdd-copilot`** uses it **bottom-up**: detecting scenario coverage gaps — ACs that exist in the catalog but have no linked Gherkin scenario. +Expected feature file structure (one block per ACTIVE AC): -Load the skill with this distinction in mind. The bottom-up usage is the default context for this agent. +```gherkin +# AC:US-007-01 (v1.0.0 - ACTIVE) — Customer places order with saved payment +@AC:US-007-01 +Scenario: Customer completes order with saved payment method + Given the customer has items in their cart + When they confirm the order with their saved payment method + Then the order confirmation is displayed + +# AC:US-007-02 (v1.0.0 - ACTIVE) — Order rejected when card is declined +@AC:US-007-02 +Scenario: Order is rejected when payment card is declined + Given the customer has items in their cart + When they attempt to pay with a declined card + Then an error message is shown and the order is not placed +``` + +Step text uses domain language only — no CSS selectors, HTTP references, or database calls. --- ## Living Doc Conventions -Full model: [living-doc-glossary](../../skills/references/living-doc-glossary.md) — load only if creating or validating entities. +Full model: [living-doc-glossary](https://raw.githubusercontent.com/AbsaOSS/agentic-toolkit/master/skills/references/living-doc-glossary.md) — load only if creating or validating entities. For BDD file templates and schemas (feature file headers, PageObject headers, ExplorationFixture, seed.yaml), load [living-doc-bdd-schemas](https://raw.githubusercontent.com/AbsaOSS/agentic-toolkit/master/skills/references/living-doc-bdd-schemas.md). 
**Entity IDs:** `US-<nnn>` · `FEAT-<nnn>` · `FUNC-<nnn>` @@ -162,11 +198,11 @@ Full model: [living-doc-glossary](../../skills/references/living-doc-glossary.md AC:<parent-id>-<nn> (v<version> – <State>) – <atomic description; at most one {placeholder}> ``` -State values: `Planned | Implemented | Active | Deprecated` +State values: `PLANNED | IN_REVIEW | ACTIVE | DEPRECATED` **Gherkin traceability** — every scenario in `features/us/` and `features/functionalities/` requires: ```gherkin -# AC:US-1-01 (v1.0.0 - Active) — <description> +# AC:US-1-01 (v1.0.0 - ACTIVE) — <description> @AC:US-1-01 Scenario: ... ``` @@ -176,7 +212,7 @@ One `# AC:` + `@AC:` pair per AC. Aspect variant: `@AC:US-1-01/aspect:username-i **AC rules:** atomic (one condition + one outcome) · binary (clear pass/fail) · single placeholder per statement. -**Active/Implemented ACs** drive scenario generation. Deprecated ACs require `deprecated_at`, `deprecation_reason`, and optionally `superseded_by`. +**ACTIVE ACs** drive scenario generation. DEPRECATED ACs require `deprecated_at`, `deprecation_reason`, and optionally `superseded_by`. --- @@ -185,30 +221,26 @@ One `# AC:` + `@AC:` pair per AC. 
Aspect variant: `@AC:US-1-01/aspect:username-i | Skill | Intent | Path | When to load | |---|---|---|---| | `bdd-explore` | Business Seed assembly, crawl loop, component rules, manifest schema | `skills/bdd-explore/SKILL.md` | EXPLORE mode | -| `bdd-scenario-gen` | Generate Gherkin from ACs, step resolution, traceability tagging | `skills/bdd-scenario-gen/SKILL.md` | SCENARIO-GEN mode | +| `data-cy-instrument` | Audit, name, and add missing `data-cy` attributes; sync PageObjects | `skills/data-cy-instrument/SKILL.md` | DATA-CY mode | +| `bdd-scenario-gen` | Gherkin writing quality, GWT rules, anti-patterns, traceability annotations, gap detection, step resolution | `skills/bdd-scenario-gen/SKILL.md` | SCENARIO-GEN mode | | `bdd-maintain` | RE-SCAN, HEALING, REMOVE protocols | `skills/bdd-maintain/SKILL.md` | RE-SCAN / HEAL / REMOVE mode | | `living-doc-pageobject-scan` | Discover, create, and maintain PageObject classes from a live webapp | `skills/living-doc-pageobject-scan/SKILL.md` | When generating or healing PageObjects | -| `living-doc-scenario-creator` | Generate Gherkin scenario skeletons from User Story ACs | `skills/living-doc-scenario-creator/SKILL.md` | Called from bdd-scenario-gen | | `living-doc-gap-finder` | Find ACs with no linked Gherkin scenario (bottom-up usage) | `skills/living-doc-gap-finder/SKILL.md` | Called from bdd-scenario-gen | -| `gherkin-scenario` | Write BDD Gherkin scenarios in plain business language | `skills/gherkin-scenario/SKILL.md` | Called from bdd-scenario-gen | | `gherkin-step` | Implement Gherkin step definitions — clean, reusable, maintainable | `skills/gherkin-step/SKILL.md` | Called from bdd-scenario-gen | | `gherkin-living-doc-sync` | Synchronise feature files and scenarios with the living documentation | `skills/gherkin-living-doc-sync/SKILL.md` | When syncing traceability tags | ---- +### What each skill contains -## Handoff +Full protocols live in the skill file. 
Key contents: -**Inbound — from `@living-doc-copilot`:** -Receives a confirmed list of User Stories with `ACTIVE` ACs. Use this as the input for scenario generation. - -**Inbound — from exploration (manifest complete):** -When the manifest is complete and new surfaces have been identified, hand the Feature list to `@living-doc-copilot`: - -> "Surfaces mapped. Call @living-doc-copilot to document them." - -**Outbound — after scenario generation:** +| Skill | What it contains | +|---|---| +| `bdd-explore` | Business Seed Assembly (Sources A–E), crawl loop, entity harvesting, ExplorationFixture cascade, component interaction rules, parameterised route resolution, Source E guided traversal, manifest.json schema | +| `data-cy-instrument` | Gap audit from manifest.json, route→component resolution, naming validation, template instrumentation, PageObject sync, Functionality promotion, WORK_LOG update | +| `bdd-scenario-gen` | Gherkin writing quality rules, feature file types, Given/When/Then semantics, anti-patterns, `@AC:` traceability format (authoritative), gap detection, step definition resolution | +| `bdd-maintain` | RE-SCAN mode, HEALING mode, REMOVE mode | -> "Feature files and steps generated. Call @sdet-copilot for unit tests." +--- ## File editing protocol (CLI context) @@ -237,3 +269,38 @@ REPLACE WITH: The calling agent (GitHub Copilot CLI main session) will apply the edits using its own `edit` tool and report back. **When a task requires creating a new file** (new PageObject, new feature file, new step definition): use `create` directly — this works without restriction. + +--- + +## Handoff + +**Inbound — from `@living-doc-copilot`:** +Receives a confirmed User Story package. Expected payload: + +``` +US: <US-id> — <title> +ACs: [<AC-id> (v<version> – ACTIVE), ...] +Feature: <FEAT-id> — <title> +PageObjects: <path/to/PageObject or 'none — needs exploration'> +``` + +Use this as the input for SCENARIO-GEN mode. 
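A minimal sketch of how a receiving side might read the inbound User Story package shown above, assuming the payload arrives as plain text with one `Key: value` pair per line. The names `parse_us_package` and `UserStoryPackage` are hypothetical and not part of this patch. Accepting a plain hyphen in place of the en dash inside an AC entry is also an assumption.

```python
import re
from dataclasses import dataclass, field

EM_DASH = "\u2014"  # separates id and title in the US and Feature lines
EN_DASH = "\u2013"  # separates version and state inside an AC entry

# Matches one AC entry such as "AC:US-007-01 (v1.0.0 <en dash> ACTIVE)".
# Assumption: a plain hyphen is tolerated in place of the en dash.
AC_ENTRY = re.compile(
    r"(AC:[A-Z]+-\d+-\d+)\s*\((v\d+\.\d+\.\d+)\s*[" + EN_DASH + r"-]\s*([A-Z_]+)\)"
)


@dataclass
class UserStoryPackage:  # hypothetical container, for illustration only
    us_id: str
    us_title: str
    feature_id: str
    feature_title: str
    page_objects: str
    acs: list = field(default_factory=list)  # (ac_id, version, state) tuples


def _split_id_title(value):
    """Split '<id> <em dash> <title>' on the first em dash."""
    ident, _, title = value.partition(EM_DASH)
    return ident.strip(), title.strip()


def parse_us_package(text):
    """Parse the four-line handoff payload into a UserStoryPackage."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            fields[key.strip()] = value.strip()
    missing = {"US", "ACs", "Feature", "PageObjects"} - fields.keys()
    if missing:
        raise ValueError(f"payload is missing fields: {sorted(missing)}")
    us_id, us_title = _split_id_title(fields["US"])
    feat_id, feat_title = _split_id_title(fields["Feature"])
    acs = AC_ENTRY.findall(fields["ACs"])  # list of 3-tuples, one per AC
    return UserStoryPackage(
        us_id, us_title, feat_id, feat_title, fields["PageObjects"], acs
    )
```

Raising on a missing field rather than guessing a default mirrors the "never invent" stance taken elsewhere in these agent files. A real consumer might prefer to ask the sending agent for the missing line instead.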
+ +**Inbound — from exploration (manifest complete):** +When the manifest is complete and new surfaces have been identified, hand off to `@living-doc-copilot` with: + +``` +Surfaces mapped. Candidate Features: +- FEAT candidate: <route> → <surface name> (no existing FEAT-id) +- ... +Call @living-doc-copilot to create catalog entities. +``` + +**Outbound — after scenario generation:** + +``` +Scenarios generated: +- <feature-file-path>: <n> scenarios covering [<AC-ids>] +- Step stubs: <step-file-path> (<m> stubs flagged NotImplementedError) +Note: @sdet-copilot is not yet deployed — unit test authoring is a manual next step. +``` diff --git a/.github/agents/living-doc-copilot.agent.md b/.github/agents/living-doc-copilot.agent.md index 8bd1e0a..2adcc1b 100644 --- a/.github/agents/living-doc-copilot.agent.md +++ b/.github/agents/living-doc-copilot.agent.md @@ -9,12 +9,14 @@ description: > "HEALING mode", "deprecate entity", "living doc copilot", "add AC to user story", "trace affected features", "update feature registry", "mark US ready", "check AC completeness". 
-tools: [vscode/extensions, vscode/installExtension, vscode/memory, vscode/newWorkspace, vscode/resolveMemoryFileUri, vscode/runCommand, vscode/vscodeAPI, vscode/askQuestions, vscode/toolSearch, execute/getTerminalOutput, execute/killTerminal, execute/sendToTerminal, execute/runTask, execute/createAndRunTask, execute/runNotebookCell, execute/runTests, execute/testFailure, execute/runInTerminal, read/terminalSelection, read/terminalLastCommand, read/getTaskOutput, read/getNotebookSummary, read/problems, read/readFile, read/viewImage, read/readNotebookCellOutput, agent/runSubagent, browser/openBrowserPage, browser/readPage, browser/screenshotPage, browser/navigatePage, browser/clickElement, browser/dragElement, browser/hoverElement, browser/typeInPage, browser/runPlaywrightCode, browser/handleDialog, edit/createDirectory, edit/createFile, edit/createJupyterNotebook, edit/editFiles, edit/editNotebook, edit/rename, search/codebase, search/fileSearch, search/listDirectory, search/textSearch, search/usages, web/fetch, web/githubRepo, web/githubTextSearch, todo] +tools: [vscode/extensions, vscode/installExtension, vscode/memory, vscode/newWorkspace, vscode/resolveMemoryFileUri, vscode/runCommand, vscode/vscodeAPI, vscode/askQuestions, vscode/toolSearch, execute/getTerminalOutput, execute/killTerminal, execute/sendToTerminal, execute/runTask, execute/createAndRunTask, execute/runInTerminal, read/terminalSelection, read/terminalLastCommand, read/getTaskOutput, read/problems, read/readFile, read/viewImage, agent/runSubagent, browser/openBrowserPage, browser/readPage, browser/screenshotPage, edit/createDirectory, edit/createFile, edit/editFiles, edit/rename, search/codebase, search/fileSearch, search/listDirectory, search/textSearch, search/usages, web/fetch, web/githubRepo, web/githubTextSearch, todo] --- # @living-doc-copilot -Requirements layer agent. Owns the living documentation catalog — creates, updates, heals, and plans entities. Does not write code or test files. 
+Requirements layer agent. Owns the living documentation catalog — creates, updates, heals, and plans entities. Does not write code or test files. `@living-doc-bdd-copilot` is the BDD extension of this agent: it bridges the catalog to executable tests and owns the automation layer. Handoffs between the two agents use the structured payloads defined in the Handoff section. + +**Before executing any multi-step task:** State your plan in one sentence — name the skill you will load, the entity type you will operate on, and your first concrete action. Then proceed. ## Initialisation @@ -28,7 +30,33 @@ Wait for the answer before the first persisted create or update in that session. - **AC block structure** — how ACs are represented (inline fields, nested list, table) - **Field name mappings** — e.g. what the project calls `state`, `version`, `id` -Never invent a format. If the answer is incomplete, ask one targeted follow-up before proceeding. If a later request omits storage details, assume the session's confirmed Storage Profile still applies. +Never invent a format. If the answer is incomplete, ask one targeted follow-up before proceeding. Once confirmed, write the Storage Profile to `.copilot/living-doc/.storage-profile.md` so future sessions can load it without re-asking. If that file already exists at session start, load it and skip the initialisation prompt. If a later request omits storage details, assume the confirmed Storage Profile still applies. + +## Session State + +For multi-entity HEALING or PLAN sessions, maintain a lightweight state file at `.copilot/living-doc/.session-state.md` to prevent re-processing already-handled entities. + +```markdown +# Living Doc Session +_Auto-managed by @living-doc-copilot. 
Delete when session complete._ + +## Goal +<!-- One sentence: what this healing session must fix --> + +## Entities Processed +- [x] US-001 — verified, no change +- [-] US-002 — IN PROGRESS +- [ ] US-003 — pending + +## Decisions & Findings +<!-- Non-obvious discoveries: deleted code confirmed, superseded_by, external confirmation obtained --> +``` + +**Update rules:** Mark an entity `[-]` when you begin processing it. Append to `Decisions & Findings` when code-deletion is confirmed or a traceability issue is found. Mark `[x]` once the deprecation or update is written. Delete the file when the session goal is fully achieved. + +**Stopping conditions:** Escalate to user when (a) code-deletion cannot be confirmed via repository search; (b) a traceability link references a non-existent entity; (c) context is nearing capacity — write a compaction summary of all pending entities to `Decisions & Findings`, then ask the user to resume in a new session; or (d) more than 50 tool calls have been made without completing the session goal — pause, summarise progress, and ask how to proceed. + +**PLAN mode note-taking:** For multi-AC PLAN sessions (more than 3 ACs being drafted), use the same state file at `.copilot/living-doc/.session-state.md` to track which ACs have been drafted, presented for confirmation, and created. Delete the file when all ACs are confirmed and written. ## Scope @@ -44,66 +72,117 @@ Never invent a format. 
If the answer is incomplete, ask one targeted follow-up b - Write Gherkin scenarios or feature files: hand off to `@living-doc-bdd-copilot` - Explore or crawl web apps: hand off to `@living-doc-bdd-copilot` -- Write any test code: hand off to `@sdet-copilot` +- Write any test code: `@sdet-copilot` _(not yet deployed — leave a `TODO: @sdet-copilot` note)_ - Repair PageObject selectors or step definitions: hand off to `@living-doc-bdd-copilot` +## Tool Guidance + +| Tool | When to use | Key guidance | +|---|---|---| +| `read/readFile` | Read existing entity files before any update | Always read before writing — never assume current field values or ID sequences. | +| `execute/runInTerminal` | Run `scripts/next_id.py` to get the next entity ID | Run from the `skills/<entity-type>/` directory. Verify output before using the ID. | +| `search/codebase` | Confirm code deletion before deprecating an entity | Require a negative result for at least two plausible identifiers (class name, function name) before assuming code is deleted. | +| `search/textSearch` | Find `@AC:` tag annotations affected by an AC update | Run before writing any updated AC to surface stale Gherkin links for `@living-doc-bdd-copilot`. | +| `edit/createFile` | Write new entity files | Confirm Storage Profile is loaded first. Use confirmed field names only — never invent. | +| `edit/editFiles` | Update existing entity files | Show OLD vs NEW diff to user before writing when updating `ACTIVE` ACs. | +| `agent/runSubagent` | Delegate BDD work to `@living-doc-bdd-copilot` | Pass the exact structured handoff payload from [Handoff](#handoff). | + +--- + +## Examples + +**Example 1 — Creating a User Story with correct AC metadata** + +> User: Create a User Story for the promo code feature. ACs: valid promo reduces cart by 10%; expired promo shows error. + +Agent plan: Creating a User Story. Loading `living-doc-create-user-story` skill. 
First action: confirm Storage Profile is loaded, then draft the narrative and ACs for user confirmation. + +Expected AC output (one per observable outcome, all metadata fields present): + +```yaml +id: AC:US-010-01 +state: PLANNED +version: v1.0.0 +description: "When a valid promo code is applied, the cart total is reduced by the stated discount percentage." +pre-conditions: + - Cart contains at least one item + - Promo code is within its validity period +not_in_scope: Stacking multiple promo codes in a single transaction +``` + +--- + +**Example 2 — HEALING mode, deprecating a stale entity** + +> User: Run HEALING mode — we deleted the legacy payment flow last sprint. + +Agent plan: Entering HEALING mode. Loading `living-doc-gap-finder` skill. First action: create session state file at `.copilot/living-doc/.session-state.md`, then search codebase for `LegacyPaymentService` to confirm deletion. + +_(Never deprecate without a confirmed negative code search. Show OLD vs NEW before writing any state change to an entity.)_ + +--- + ## AC Metadata Every AC must carry these fields: | Field | Values | |---|---| -| `state` | `PLANNED` / `ACTIVE` / `DEPRECATED` / `IN_REVIEW` | +| `state` | `PLANNED` / `IN_REVIEW` / `ACTIVE` / `DEPRECATED` | | `version` | Semantic version string | | `pre-conditions` | List of conditions that must hold before the AC can be tested | | `not_in_scope` | Explicit statement of what is excluded from this AC | ## Gap Finder modes -**HEALING** — triggered when living doc has drifted from the codebase: -- Detect stale entities (code deleted, AC never implemented) -- Set `DEPRECATED` state on confirmed stale entities -- Fix broken traceability links: US ↔ Feature ↔ Functionality -- Update `version` fields where incremented -- Remove `pre-conditions` that reference deleted flows -- Does NOT repair PageObject selectors or step definition bindings: `@living-doc-bdd-copilot` +Load the `living-doc-gap-finder` skill for HEALING and gap-audit requests. 
Full mode protocols live in that skill — do not duplicate them here. -**PLAN** — triggered by PO descriptions without existing code: -- Draft ACs from plain-language descriptions -- Present draft for confirmation before creating -- Create confirmed entities in `PLANNED` state only +This agent uses `living-doc-gap-finder` **top-down**: discovering missing documentation entities (Features, US, Functionalities not yet in the catalog). `@living-doc-bdd-copilot` uses it bottom-up (scenario coverage gaps) — do not apply that logic here. ## Cross-agent HEALING boundary This agent heals the **catalog layer** (entities, ACs, traceability links). `@living-doc-bdd-copilot` heals the **automation layer** (PageObjects, step definitions, feature files). -Do not cross this boundary. - -> `@living-doc-bdd-copilot` is the expected cooperating agent for the automation layer. It is deployed separately — if it is not yet available in this repository, hand-off notes should be left as TODO comments for a future BDD session. +Do not cross this boundary. If a HEALING task touches both layers, complete the catalog changes here and hand off to `@living-doc-bdd-copilot` for the automation layer using the structured payload in [Handoff](#handoff). 
## Skills -| Skill | Intent | Path | -|---|---|---| -| `living-doc-create-user-story` | Create a new User Story with business-level ACs | `skills/living-doc-create-user-story/SKILL.md` | -| `living-doc-create-feature` | Document a system surface (screen, API, service) | `skills/living-doc-create-feature/SKILL.md` | -| `living-doc-create-functionality` | Define an atomic, testable behaviour | `skills/living-doc-create-functionality/SKILL.md` | -| `living-doc-update` | Amend or deprecate existing entities | `skills/living-doc-update/SKILL.md` | -| `living-doc-impact-analysis` | Trace which entities a code change affects | `skills/living-doc-impact-analysis/SKILL.md` | -| `living-doc-gap-finder` | Find undocumented behaviours and orphan tests | `skills/living-doc-gap-finder/SKILL.md` | +| Skill | Intent | Path | When to load | +|---|---|---|---| +| `living-doc-create-user-story` | Create a new User Story with business-level ACs | `skills/living-doc-create-user-story/SKILL.md` | New US or narrative request | +| `living-doc-create-feature` | Document a system surface (screen, API, service) | `skills/living-doc-create-feature/SKILL.md` | New Feature or inbound surface from `@living-doc-bdd-copilot` | +| `living-doc-create-functionality` | Define an atomic, testable behaviour | `skills/living-doc-create-functionality/SKILL.md` | New Functionality or atomic-behaviour AC request | +| `living-doc-update` | Amend or deprecate existing entities | `skills/living-doc-update/SKILL.md` | Updating, promoting, or deprecating an entity or AC | +| `living-doc-impact-analysis` | Trace which entities a code change affects | `skills/living-doc-impact-analysis/SKILL.md` | PR review or change-trace request | +| `living-doc-gap-finder` | Find undocumented behaviours and orphan tests (top-down usage) | `skills/living-doc-gap-finder/SKILL.md` | HEALING mode or documentation gap audit | +| `living-doc-scenario-creator` | Generate living-doc feature file header and scenario skeletons from a US 
entity | `skills/living-doc-scenario-creator/SKILL.md` | When a User Story is ready for feature file bootstrapping | ## Operating rules +### Storage - Confirm and cache the Storage Profile before the first persisted create or update only when the session is establishing storage setup; once confirmed, write every entity in that format, reuse it for later requests in the same session, and never invent missing field names. -- Route by request type: User Story or business journey, use `living-doc-create-user-story`; atomic business rule or component behaviour, use `living-doc-create-functionality`; impact or change trace, use `living-doc-impact-analysis`; update or deprecate an existing entity or AC, use `living-doc-update`; catalog drift or stale coverage, use `living-doc-gap-finder`. + +### Routing +- Route by request type: User Story or business journey → `living-doc-create-user-story`; atomic business rule or component behaviour → `living-doc-create-functionality`; impact or change trace → `living-doc-impact-analysis`; update or deprecate an existing entity or AC → `living-doc-update`; catalog drift or stale coverage → `living-doc-gap-finder`; feature file bootstrap for a ready User Story → `living-doc-scenario-creator`. - If a User Story request includes capability and ACs but omits actor or business value, draft the most likely `As a / I can / so that` narrative from the business context and ask for confirmation only when the role or value is genuinely ambiguous. + +### Entity creation - Use atomic ACs only: one triggering condition plus one observable outcome per AC. Every AC must include `id`, `state`, `version`, `pre-conditions`, and `not_in_scope`. Unless the confirmed Storage Profile already defines a different convention, use `AC:<parent-id>-<nn>` and keep AC IDs stable across updates. 
-- PLAN mode: draft ACs first, cover happy path, error path, boundary conditions, and threshold or conversion rules where relevant, then create only after confirmation and only in `PLANNED` state. -- HEALING mode: verify deleted or superseded code via repository search or explicit user confirmation before deprecating; then set stale ACs or entities to `DEPRECATED`, repair traceability links, remove or flag stale `pre-conditions`, and leave PageObjects, step definitions, and Gherkin sync to `@living-doc-bdd-copilot`. -- Impact analysis: produce an explicit impact map covering affected and unaffected Features, Functionalities, User Stories, ACs, and linked scenarios; recommend version bumps on changed entities and deprecation for removed behaviours, but do not change state without user confirmation. -- Updating an `ACTIVE` AC: show OLD vs NEW side by side before writing, keep the AC ID unchanged, and bump the semantic version for business-rule changes (for example `v1.0.0` to `v1.1.0` for a threshold change). Flag any linked `@AC:` tag annotations in feature files as potentially stale for `@living-doc-bdd-copilot`. - For Functionality requests, use a verb-phrase name, draft ACs and present them for confirmation before creating, and run a completeness checklist for thresholds, below/exactly/above-boundary behaviour, invalid or missing input, and interactions with other rules. +### PLAN mode +- Draft ACs first, cover happy path, error path, boundary conditions, and threshold or conversion rules where relevant, then create only after confirmation and only in `PLANNED` state. + +### Updates and promotion +- Updating an `ACTIVE` AC: show OLD vs NEW side by side before writing, keep the AC ID unchanged, and bump the semantic version for business-rule changes (for example `v1.0.0` to `v1.1.0` for a threshold change). Flag any linked `@AC:` tag annotations in feature files as potentially stale for `@living-doc-bdd-copilot`. 
+- **Promoting a US to `ACTIVE`:** confirm with the user that all ACs are implemented and tested (or at minimum `IN_REVIEW`); verify no AC remains in `PLANNED` state; update the US state to `ACTIVE`; notify `@living-doc-bdd-copilot` to sync `@AC:` traceability tags in feature files. + +### HEALING mode +- Verify deleted or superseded code via repository search or explicit user confirmation before deprecating; then set stale ACs or entities to `DEPRECATED`, repair traceability links, remove or flag stale `pre-conditions`, and leave PageObjects, step definitions, and Gherkin sync to `@living-doc-bdd-copilot`. + +### Impact analysis +- Produce an explicit impact map covering affected and unaffected Features, Functionalities, User Stories, ACs, and linked scenarios; recommend version bumps on changed entities and deprecation for removed behaviours, but do not change state without user confirmation. + ## File editing protocol (CLI context) When this agent runs via the GitHub Copilot CLI task tool, only `view` (read) and `create` (new files) are available — `str_replace`/`edit` tools are not provisioned regardless of the `tools:` frontmatter. This is a CLI constraint, not a configuration problem. @@ -134,8 +213,22 @@ The calling agent (GitHub Copilot CLI main session) will apply the edits using i ## Handoff -**Inbound:** `@living-doc-bdd-copilot` hands a surface list after Phase 1 exploration. Load it, then create the corresponding Feature and User Story entities. +**Inbound from `@living-doc-bdd-copilot`:** Receives a surface list after Phase 1 exploration. Expected payload: -**Outbound:** When US and ACs are confirmed and in `ACTIVE` (or `PLANNED`) state, complete with: +``` +Surfaces mapped. Candidate Features: +- FEAT candidate: <route> → <surface name> +- ... +``` + +Load this list and create the corresponding Feature and User Story entities. 
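The surface-list payload above could be read with something like the following sketch. It assumes the list arrives as plain text, that each candidate line follows the `- FEAT candidate: <route> → <surface name>` shape, and that a route contains no whitespace. The name `parse_surface_list` is hypothetical. The optional `(no existing FEAT-id)` suffix is tolerated because the sending agent's template includes it.

```python
import re

ARROW = "\u2192"  # the arrow between route and surface name in the payload

# One candidate line: "- FEAT candidate: <route> <arrow> <surface name>",
# optionally followed by "(no existing FEAT-id)".
CANDIDATE = re.compile(
    r"^-\s*FEAT candidate:\s*(?P<route>\S+)\s*"
    + ARROW
    + r"\s*(?P<name>.+?)\s*(?:\(no existing FEAT-id\))?\s*$"
)


def parse_surface_list(text):
    """Return (route, surface_name) pairs from a surface-list payload."""
    lines = text.strip().splitlines()
    if not lines or not lines[0].startswith("Surfaces mapped."):
        raise ValueError("payload does not start with 'Surfaces mapped.'")
    candidates = []
    for line in lines[1:]:
        match = CANDIDATE.match(line.strip())
        if match:  # placeholder or trailer lines are skipped silently
            candidates.append((match.group("route"), match.group("name")))
    return candidates
```

Lines that do not match the candidate shape, such as a closing instruction line, are ignored here. A stricter variant could report them so that a malformed handoff is noticed early.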
+ +**Outbound to `@living-doc-bdd-copilot`:** When US and ACs are confirmed and in `ACTIVE` (or `PLANNED`) state, send a structured package: -> "US and ACs are ready. Call @living-doc-bdd-copilot to generate scenarios." +``` +US: <US-id> — <title> +ACs: [<AC-id> (v<version> – ACTIVE), ...] +Feature: <FEAT-id> — <title> +PageObjects: <path/to/PageObject or 'none — needs exploration'> +Call @living-doc-bdd-copilot to generate scenarios. +``` diff --git a/README.md b/README.md index 7eb55c7..a8963b7 100644 --- a/README.md +++ b/README.md @@ -84,8 +84,11 @@ its purpose, trigger phrases, and full instructions. | **[living-doc-impact-analysis](./skills/living-doc-impact-analysis/)** | Trace which Features, Functionalities, User Stories, and Gherkin scenarios are affected by a code change or PR. | | **[living-doc-gap-finder](./skills/living-doc-gap-finder/)** | Identify undocumented behaviours, orphan tests, and untested ACs. Shared by `@living-doc-copilot` and `@living-doc-bdd-copilot`. | | **[living-doc-pageobject-scan](./skills/living-doc-pageobject-scan/)** | Discover, create, and maintain PageObject classes from a live web application — bootstrapping from scratch and detecting selector drift after UI changes. | +| **[bdd-explore](./skills/bdd-explore/)** | Assemble the Business Seed (`seed.yaml`) and iteratively crawl a web application via MCP Playwright — the first-time scan entry point for `@living-doc-bdd-copilot`. | +| **[bdd-maintain](./skills/bdd-maintain/)** | RE-SCAN, HEALING, REMOVE, and DEAD CODE AUDIT modes for `@living-doc-bdd-copilot` — refresh the manifest after UI changes, fix selector drift, remove deprecated features, and audit unused steps or PageObject methods. | +| **[data-cy-instrument](./skills/data-cy-instrument/)** | Resolve missing `data-cy` attributes in Angular component templates and sync PageObjects to use `getByTestId()` — run after a crawl when `coverage_gaps` are non-empty. 
| | **[living-doc-scenario-creator](./skills/living-doc-scenario-creator/)** | Generate Gherkin scenario skeletons from User Story ACs — one scenario per AC, coverage report, and missing step identification. | -| **[gherkin-scenario](./skills/gherkin-scenario/)** | Write BDD Gherkin scenarios in plain business language — Given/When/Then rules, anti-patterns, Scenario Outlines, and Background. | +| **[bdd-scenario-gen](./skills/bdd-scenario-gen/)** | Write BDD Gherkin scenarios in plain business language — Given/When/Then rules, anti-patterns, Scenario Outlines, Background, @AC: traceability annotations, gap detection, and step definition resolution. | | **[gherkin-step](./skills/gherkin-step/)** | Implement clean, reusable step definitions — behave (Python), Cucumber (Java, TypeScript, Scala), parameter types, DataTable, DocString, and hooks. | | **[gherkin-living-doc-sync](./skills/gherkin-living-doc-sync/)** | Synchronise Gherkin feature files with the living documentation catalog — fix missing AC traceability headers, step text drift, and stale scenario links. | | **[token-saving](./skills/token-saving/)** | Always-active response discipline — enforces brevity, no filler openers or closers, structured output, and a What/Why/How footer on code responses. Suspends on explicit "full detail" requests. | diff --git a/docs/README.md b/docs/README.md index 85a459b..2c2a6ce 100644 --- a/docs/README.md +++ b/docs/README.md @@ -14,6 +14,7 @@ Navigation hub for all guides in this repository. Browse by category below. 
|-----------------------------------------|-------------------------------------------------------------------------------------| | [Getting Started](./getting-started.md) | What skills are, how to install them, Copilot CLI usage | | [Contributing](../CONTRIBUTING.md) | Skill folder layout, frontmatter, description writing, body guidelines, PR process | +| [Agent Design Best Practices](./guides/agent-design.md) | Core principles, file structure, context management, tool guidance, examples, and stopping conditions for `.agent.md` files | | [Skill Testing](./testing/skill-testing.md) | Eval creation, fixtures, regression loops, trigger and description optimization | | [Agent Testing](./testing/agent-testing.md) | Eval creation, trigger accuracy tuning, and body quality testing for `.agent.md` files | | [Troubleshooting](./troubleshooting.md) | Setup fixes for install, activation, and proxy issues | diff --git a/docs/guides/agent-design.md b/docs/guides/agent-design.md new file mode 100644 index 0000000..f6af3dc --- /dev/null +++ b/docs/guides/agent-design.md @@ -0,0 +1,251 @@ +# Agent Design Best Practices + +This guide distils Anthropic's engineering articles — [Building Effective Agents](https://www.anthropic.com/engineering/building-effective-agents) and [Effective Context Engineering for AI Agents](https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents) — into actionable rules for designing `.agent.md` files in this repository. + +--- + +## Table of Contents + +1. [Three core principles](#1-three-core-principles) +2. [Recommended file structure](#2-recommended-file-structure) +3. [Planning transparency](#3-planning-transparency) +4. [Tool list and tool guidance](#4-tool-list-and-tool-guidance) +5. [Inline examples](#5-inline-examples) +6. [Context management — just-in-time loading](#6-context-management--just-in-time-loading) +7. [Session state and note-taking](#7-session-state-and-note-taking) +8. 
[Stopping conditions](#8-stopping-conditions) +9. [Handoff contracts](#9-handoff-contracts) +10. [Repo conventions every agent must follow](#10-repo-conventions-every-agent-must-follow) + +--- + +## 1. Three core principles + +Anthropic's three design principles for agents, translated to this repo: + +| Principle | What it means here | +|---|---| +| **Simplicity** | One agent = one clear concern. Never give an agent scope that belongs to a cooperating agent. Mode dispatch loads one skill at a time. | +| **Transparency** | The agent must narrate its plan before executing a multi-step task. See [§3](#3-planning-transparency). | +| **ACI — Agent-Computer Interface** | Every tool the agent can call must be understood from the agent body alone. See [§4](#4-tool-list-and-tool-guidance). | + +--- + +## 2. Recommended file structure + +Organise every `.agent.md` body in this order. Use `##` Markdown headers for each section. + +``` +description: (YAML frontmatter) +tools: (YAML frontmatter) + +# @agent-name ← one-line purpose + relationship to cooperating agents + +## Initialisation ← storage/seed setup; runs only when starting fresh +## Session State ← note-taking schema; required if the agent runs multi-step tasks +## Mode Dispatch ← routing table: intent → skill + scope (if agent is multi-modal) +## Scope ← what this agent does +## Does NOT ← explicit out-of-scope items with named agent responsible +## Tool Guidance ← per-tool notes (usage, edge cases, common mistakes) +## Examples ← 1–2 canonical inline few-shot examples +## [Domain conventions] ← reference data kept inline (small, stable, always needed) +## Skills ← table with path and "When to load" column +## Operating rules ← decision rules; use sub-headers, not a flat bullet list +## File editing protocol ← CLI constraint protocol (if agent runs in CLI context) +## Handoff ← inbound and outbound structured payloads +``` + +**Altitude rule:** Instructions should sit in the Goldilocks zone — specific enough to guide 
behaviour, flexible enough to avoid brittle if-else hardcoding. Avoid encoding single-valued rules (e.g. `always use field X = "Y"`) in the agent body when they belong in the skill or the Storage Profile. + +--- + +## 3. Planning transparency + +Every agent **must** instruct itself to narrate its plan before executing any multi-step task: + +```markdown +**Before executing any multi-step task:** State your plan in one sentence — name the +mode or skill you will use and your first concrete action. Then proceed. +``` + +This satisfies Anthropic's second principle ("prioritise transparency by explicitly showing planning steps") and helps users understand and correct the agent's interpretation before it acts. + +--- + +## 4. Tool list and tool guidance + +### `tools:` frontmatter + +- List **individual tools** — not group aliases like `vscode` or `browser`. +- Only include tools the agent actually needs for its stated scope. +- An agent that explicitly states "Does NOT crawl web apps" must not list `browser/clickElement`, `browser/typeInPage`, etc. + +### `## Tool Guidance` body section + +Add a table with one row per key tool: + +```markdown +| Tool | When to use | Key guidance | +|---|---|---| +| `read/readFile` | Load entity files before updating | Always read before writing — never assume current values. | +| `edit/editFiles` | Patch existing files | Read the full target block first. Show OLD vs NEW for ACTIVE entity changes. | +| `search/codebase` | Confirm code deletion before deprecating | Require negative result for at least two identifiers before assuming deleted. | +``` + +**Rule:** If a human engineer on your team couldn't immediately tell which tool to use in a given situation, the agent can't either. Add guidance until the choice is unambiguous. + +--- + +## 5. Inline examples + +Include **1–2 canonical few-shot examples** directly in the agent body. Examples are the most token-efficient way to convey expected output format and planning behaviour. 
+ +Format: + +```markdown +## Examples + +**Example 1 — <mode/scenario name>** + +> User: <short prompt> + +Agent plan: <one-sentence narration> + +_(Brief description of what happens next.)_ + +Expected output: +\```<language> +<representative output snippet> +\``` +``` + +**Rules:** +- Cover the most common trigger case and one error/edge case. +- Examples must use real entity ID patterns (`US-007`, `FEAT-003`, `AC:US-007-01`). +- Step text in Gherkin examples must use domain language — no selectors, HTTP references, or database terms. +- Do not stuff every edge case in — 2 canonical examples beat 10 exhaustive ones. + +--- + +## 6. Context management — just-in-time loading + +**Skills:** Load one skill per session, only when the mode is confirmed. Never pre-load skills for modes that haven't been triggered. The `Mode Dispatch` table must show which skill maps to each intent. + +**Manifests and large files:** Always load with targeted line ranges for the route(s) in scope. Load the full file only for full re-scan operations. + +**Reference docs:** Do not inline content that is available in a referenced file, unless it is small (< 30 lines) and needed across all modes. Use `[Load only if …]` annotations in the Skills table. + +**Gap Finder modes:** Mode detail (what HEALING does, what PLAN does) belongs in the `living-doc-gap-finder` skill — not duplicated in the agent body. The agent body should name the mode and point to the skill. + +--- + +## 7. Session state and note-taking + +Any agent that runs multi-step tasks spanning many tool calls **must** define a session state file. This prevents context rot and enables resuming interrupted sessions. + +**Minimum schema:** + +```markdown +# <Agent> Session State +_Auto-managed. 
Delete when session complete._ + +## Goal +<!-- One sentence --> + +## Progress +<!-- Per-item status: [ ] pending | [-] in progress | [x] done --> + +## Decisions & Findings +<!-- Non-obvious discoveries — expensive to re-derive --> +``` + +**Rules:** +- Store at `.copilot/<domain>/.session-state.md` (dot-prefix; add to `.gitignore`). +- Update after every item completes. +- Append to `Decisions & Findings` for non-obvious discoveries only. +- Never store large data objects here — those belong in the artifact file (e.g. `manifest.json`). +- Delete the file when the session goal is fully achieved. + +**Compaction trigger:** When context is nearing capacity, write a compaction summary (all unresolved items + key findings) to `Decisions & Findings`, then ask the user to start a new session and resume from the state file. + +--- + +## 8. Stopping conditions + +Every agent must define explicit escalation rules. At minimum include: + +```markdown +**Stopping conditions — escalate to user when:** +- <domain-specific failure condition 1> +- <domain-specific failure condition 2> +- Context is nearing capacity — write compaction summary to session state, then ask the user to resume in a new session. +- More than 50 tool calls have been made without completing the session goal — pause, summarise progress, and ask how to proceed. +``` + +**Why the 50-call limit matters:** Anthropic recommends max iteration caps for autonomous agents. Without a limit, compounding errors can cause an agent to execute dozens of irreversible actions before the user can intervene. + +--- + +## 9. Handoff contracts + +Agent-to-agent handoffs must use **structured payloads**, not free-form prose. Both sides (outbound and inbound) must match. + +```markdown +## Handoff + +**Outbound to @other-agent:** +\``` +Key: value +Key: value +\``` + +**Inbound from @other-agent:** +\``` +Key: value +\``` +``` + +**Rules:** +- Payloads must include entity IDs, state, version, and file paths where relevant. 
+- Never summarise loosely — use the exact payload format. +- If the target agent is not yet deployed, document with a `TODO: @agent-name` comment rather than omitting the handoff. + +--- + +## 10. Repo conventions every agent must follow + +### AC state vocabulary + +All agents in this repository use the same four AC states. Never introduce alternative spellings. + +| State | Meaning | +|---|---| +| `PLANNED` | Drafted; no implementation yet | +| `IN_REVIEW` | Implementation underway or in PR | +| `ACTIVE` | Implemented and verified | +| `DEPRECATED` | Superseded or deleted; requires `deprecated_at` and `deprecation_reason` | + +### Entity ID format + +`US-<nnn>` · `FEAT-<nnn>` · `FUNC-<nnn>` · `AC:<parent-id>-<nn>` + +IDs are stable — never change an ID after creation. Bump the `version` field for changes. + +### Gherkin traceability tag format + +```gherkin +# AC:US-007-01 (v1.0.0 - ACTIVE) — <description> +@AC:US-007-01 +Scenario: ... +``` + +One `# AC:` + `@AC:` pair per AC. The `@AC:` tag is the machine-readable traceability anchor — never delete or rename it without syncing the catalog entity. + +### Cooperating agent boundary + +| Layer | Owner | +|---|---| +| Catalog (entities, ACs, traceability links) | `@living-doc-copilot` | +| Automation (PageObjects, step definitions, feature files) | `@living-doc-bdd-copilot` | + +Never cross this boundary. When a task belongs to the other agent, hand off using the structured payload — do not attempt the task yourself. 
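The AC state vocabulary and entity ID format in §10 lend themselves to a mechanical check. The following is a minimal sketch, not part of this repository: `is_valid_id`, `check_ac`, and the dict-shaped AC record are hypothetical names chosen for illustration, and the regexes assume the numeric parts are plain digits of any width and that an AC parent may be a `US`, `FEAT`, or `FUNC` entity.

```python
import re

# The four AC states from the vocabulary table; any other spelling is rejected.
AC_STATES = {"PLANNED", "IN_REVIEW", "ACTIVE", "DEPRECATED"}

# Assumption: numeric parts are one or more digits (the doc shows zero-padded IDs such as US-007).
ENTITY_ID = re.compile(r"(US|FEAT|FUNC)-\d+")
# Assumption: an AC parent may be any of the three entity kinds.
AC_ID = re.compile(r"AC:(US|FEAT|FUNC)-\d+-\d+")


def is_valid_id(value: str) -> bool:
    """Return True if value looks like an entity ID or an AC ID."""
    return bool(ENTITY_ID.fullmatch(value) or AC_ID.fullmatch(value))


def check_ac(ac: dict) -> list:
    """Return a list of problems found in a hypothetical AC record (a plain dict)."""
    problems = []
    if not AC_ID.fullmatch(ac.get("id", "")):
        problems.append("bad id")
    state = ac.get("state")
    if state not in AC_STATES:
        problems.append(f"unknown state: {state}")
    # DEPRECATED requires both deprecation fields, per the state table.
    if state == "DEPRECATED":
        for field in ("deprecated_at", "deprecation_reason"):
            if not ac.get(field):
                problems.append(f"missing {field}")
    return problems
```

A check like this could run before a handoff payload is emitted, so that alternative spellings such as `Active` are caught early rather than propagated to the cooperating agent.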
diff --git a/docs/guides/living-doc-bdd-copilot.md b/docs/guides/living-doc-bdd-copilot.md index 1dc19ce..2990c2c 100644 --- a/docs/guides/living-doc-bdd-copilot.md +++ b/docs/guides/living-doc-bdd-copilot.md @@ -142,7 +142,7 @@ When this agent loads `living-doc-gap-finder`, it uses the **bottom-up** (scenar | `living-doc-pageobject-scan` | Discover, create, and maintain PageObject classes | | `living-doc-scenario-creator` | Generate Gherkin scenario skeletons from User Story ACs | | `living-doc-gap-finder` | Find ACs with no linked scenario (bottom-up, scenario coverage) | -| `gherkin-scenario` | Write BDD Gherkin scenarios in plain business language | +| `bdd-scenario-gen` | Write BDD Gherkin scenarios, detect coverage gaps, resolve step stubs | | `gherkin-step` | Implement step definitions — clean, reusable, maintainable | | `gherkin-living-doc-sync` | Sync feature files and scenarios with living doc traceability links | diff --git a/skills/bdd-explore/SKILL.md b/skills/bdd-explore/SKILL.md index db4c4fa..eaca791 100644 --- a/skills/bdd-explore/SKILL.md +++ b/skills/bdd-explore/SKILL.md @@ -9,10 +9,18 @@ description: > Triggers on: "scan webapp", "crawl UI", "explore the app", "discover routes", "business seed", "seed.yaml", "manifest.json", "build pageobjects", "first scan", "assemble seed", "guided traversal", "explore routes", "bdd explore". + Does NOT trigger for: standalone PageObject generation from a pre-built manifest without a + live webapp crawl (use living-doc-pageobject-scan); BDD maintenance after UI changes or + test failures (use bdd-maintain). +license: Apache-2.0 +compatibility: GitHub Copilot --- # BDD Explore — Business Seed Assembly & Iterative Crawl +> **Glossary:** Feature, Functionality, User Story — see [living-doc-glossary](../references/living-doc-glossary.md) ([remote](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md)). 
+> **BDD schemas:** ExplorationFixture taxonomy, seed.yaml schema, manifest field_constraints — see [living-doc-bdd-schemas](../references/living-doc-bdd-schemas.md) ([remote](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-bdd-schemas.md)). + --- ## Business Seed Assembly @@ -85,7 +93,7 @@ form_fixtures: {} # keyed by route path; populated during form traversal (Expl 4. For each new surface discovered: add an entry to `manifest.json` (Feature name, URL, component IDs, PageObject path). 5. Repeat until coverage plateau — no new surfaces found in the last full iteration. 5a. **Entity harvesting** — whenever a domain ID, version, feed ID, or other parameterised entity is read from the DOM (URLs, card text, table rows), record it under `known_entities` in `seed.yaml` if not already present. Fields: `id`, `version`, `name`, `status`, `owner`, `note`. These values feed the sourcing cascade for parameterised routes in subsequent sessions. -6. For each form, wizard, or dialog on a visited page, attempt to fill and progress using the **ExplorationFixture sourcing cascade** (see glossary): (1) pre-declared values in `seed.yaml form_fixtures` — use the `default`-labelled value for the happy path and explore alternate `values[]` branches to reach different form sections or sub-routes; (2) values read from an existing entity in the app — copy verbatim (`copyable`) or append a suffix to avoid duplicate rejection (`derived`); (3) inferred `fake` values from label + placeholder + tooltip text; (4) user-assist pause for `real-world` fields with no resolvable value. Skip `condition`-gated fields until the controlling field holds the required value. After a successful submission, probe each text input for: special characters (`<>'"&\`), oversized input (200+ chars), wrong type, and duplicate values — run the core scan after each probe to capture `data-cy` validation elements visible only in error state. 
Record findings as `field_constraints` in the manifest `navigation_context`. Report any still-unreachable flows (auth walls, CAPTCHA, deep data dependencies) and offer to enrich `seed.yaml`. **Dismiss rule — after scanning any modal dialog or overlay, always close it (Cancel button → × close button → Escape key, in that order) before navigating to the next route or triggering the next action. Never leave a dialog open while scanning a subsequent page.** +6. For each form, wizard, or dialog on a visited page, attempt to fill and progress using the **ExplorationFixture sourcing cascade** (see [living-doc-bdd-schemas](../references/living-doc-bdd-schemas.md#explorationfixture)): (1) pre-declared values in `seed.yaml form_fixtures` — use the `default`-labelled value for the happy path and explore alternate `values[]` branches to reach different form sections or sub-routes; (2) values read from an existing entity in the app — copy verbatim (`copyable`) or append a suffix to avoid duplicate rejection (`derived`); (3) inferred `fake` values from label + placeholder + tooltip text; (4) user-assist pause for `real-world` fields with no resolvable value. Skip `condition`-gated fields until the controlling field holds the required value. After a successful submission, probe each text input for: special characters (`<>'"&\`), oversized input (200+ chars), wrong type, and duplicate values — run the core scan after each probe to capture `data-cy` validation elements visible only in error state. Record findings as `field_constraints` in the manifest `navigation_context`. Report any still-unreachable flows (auth walls, CAPTCHA, deep data dependencies) and offer to enrich `seed.yaml`. **Dismiss rule — after scanning any modal dialog or overlay, always close it (Cancel button → × close button → Escape key, in that order) before navigating to the next route or triggering the next action. 
Never leave a dialog open while scanning a subsequent page.** **Component interaction rules — use these instead of `fill()` for custom components:** diff --git a/skills/bdd-maintain/SKILL.md b/skills/bdd-maintain/SKILL.md index 9d63590..ff24903 100644 --- a/skills/bdd-maintain/SKILL.md +++ b/skills/bdd-maintain/SKILL.md @@ -2,17 +2,27 @@ name: bdd-maintain description: > Maintenance modes for the @living-doc-bdd-copilot agent: RE-SCAN (full manifest refresh - after UI changes), HEALING (fix selector drift in failing tests only), and REMOVE - (delete files linked to a deprecated feature). Activate when the UI has changed and the - manifest needs refreshing, when tests are failing due to selector drift, or when a feature - has been removed from the product. + after UI changes), HEALING (fix selector drift in failing tests only), REMOVE + (delete files linked to a deprecated feature), and DEAD CODE AUDIT (find unused step + definitions, PageObject methods, and PO components). Activate when the UI has changed and + the manifest needs refreshing, when tests are failing due to selector drift, when a feature + has been removed from the product, or when dead BDD code needs to be identified. Triggers on: "re-scan", "refresh manifest", "heal pageobjects", "fix failing tests", "selector drift", "tests are failing", "remove feature", "deprecate bdd", "bdd maintain", - "update selectors", "pageobject broken", "scenario failing". + "update selectors", "pageobject broken", "scenario failing", "unused steps", + "dead pageobject methods", "find unused steps", "dead code audit", "unused po methods". + Does NOT trigger for: first-time webapp exploration and seed assembly (use bdd-explore); + standalone (non-agent) PageObject maintenance outside @living-doc-bdd-copilot + (use living-doc-pageobject-scan). 
+license: Apache-2.0 +compatibility: GitHub Copilot --- # BDD Maintenance +> **Glossary:** Feature, Functionality, User Story — see [living-doc-glossary](../references/living-doc-glossary.md) ([remote](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md)). +> **BDD schemas:** manifest.json schema (routes, elements, coverage_gaps, navigation_context) — see [living-doc-bdd-schemas](../references/living-doc-bdd-schemas.md) ([remote](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-bdd-schemas.md)). + Three modes — activate the one that matches the trigger. --- diff --git a/skills/bdd-scenario-gen/SKILL.md b/skills/bdd-scenario-gen/SKILL.md index 90fd0a1..be43635 100644 --- a/skills/bdd-scenario-gen/SKILL.md +++ b/skills/bdd-scenario-gen/SKILL.md @@ -1,87 +1,225 @@ --- name: bdd-scenario-gen description: > - Generate Gherkin scenario skeletons from User Story Acceptance Criteria and resolve - step definitions for the @living-doc-bdd-copilot agent. Activate after exploration - completes (manifest up to date) or when a specific US needs BDD coverage. - Covers gap detection logic, scenario skeleton generation, step reuse/stub rules, - feature file naming and header conventions, and @AC: traceability tagging. - Triggers on: "generate scenarios", "cover AC with scenarios", "generate feature file", - "gherkin from user story", "scenario coverage", "map AC to scenarios", - "AC coverage for US", "scenarios for US-", "bdd scenario gen". + BDD scenario writing quality and agent-level scenario generation for @living-doc-bdd-copilot. + Covers: writing Gherkin in plain business language, Given/When/Then correctness, + one-behaviour-per-scenario rule, Scenario Outline, Background, anti-patterns, feature file + types (US vs Functionality), @AC: traceability annotations (authoritative format), gap + detection via living-doc-gap-finder, and step definition resolution against PageObjects. 
+ Triggers on: "write a Gherkin scenario", "BDD scenario", "standalone feature file", + "Given When Then", "Scenario Outline", "Cucumber scenario", "behave scenario", + "acceptance test in Gherkin", "should I use Background", "BDD anti-patterns", + "review my feature file", "BDD scenarios for", + "convert acceptance criteria to Gherkin", "# AC: comment", "exploratory scenario". + Does NOT trigger for: implementing step definitions (use gherkin-step), writing unit tests, + designing a test case table, generating the living-doc feature file header block or skeleton + scenarios (use living-doc-scenario-creator). +license: Apache-2.0 +compatibility: GitHub Copilot --- # BDD Scenario Generation -Use after exploration completes (manifest is up to date), or targeting a specific User Story. +> **Glossary:** User Story, AC, Feature, PageObject — see [living-doc-glossary](../references/living-doc-glossary.md) ([remote](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md)). +> **BDD schemas:** US and Functionality feature file templates — see [living-doc-bdd-schemas](../references/living-doc-bdd-schemas.md) ([remote](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-bdd-schemas.md)). + +Use for: writing or reviewing Gherkin scenarios, generating feature files from ACs, detecting uncovered ACs, resolving step stubs against PageObjects. --- -## Gap Detection +## Gap Detection (agent mode) An AC is considered uncovered if no scenario in any `.feature` file carries the `@AC:<id>` traceability tag. 1. Use the `living-doc-gap-finder` skill (bottom-up mode) to identify User Stories with `ACTIVE` ACs that have no linked Gherkin scenario. -2. For each gap: generate Gherkin scenario skeletons — one scenario per `Active` or `Implemented` AC, with the mandatory `@AC:` traceability tag. Skip `Planned` and `Deprecated` ACs. +2. 
For each gap: generate scenario skeletons — one scenario per `ACTIVE` AC with the mandatory `@AC:` traceability tag. Skip `PLANNED` and `DEPRECATED` ACs. + +--- + +## Feature File Types + +Two categories of `.feature` files exist — they have different locations, headers, and scopes: + +| Type | Location | Feature block | Scope | +|---|---|---|---| +| User Story (E2E) | `features/us/us-<nnn>-<kebab>.feature` | `Feature: <US title>` with As-a/I-can/so-that narrative + `@US_ID:US-<n>` tag | End-to-end, user perspective | +| Functionality (system test) | `features/functionalities/<feat-kebab>/func-<nnn>-<kebab>.feature` | `Feature: <Feature name> — <Functionality name>` + `@FUNC_ID:FUNC-<nnn>` tag | One atomic behavior, input to output | + +For non-living-doc scenarios (exploratory probes, regression suites not tied to a US AC), `@AC:` annotations are not required. Use `@AC:STANDALONE` as an optional placeholder when explicitly signalling that a scenario is intentionally unlinked — `gherkin-living-doc-sync` will note it but not flag it as a traceability gap. --- ## Feature File Conventions -- Write `.feature` files under `features/us/` using `us-<nnn>-<kebab-title>.feature` naming, e.g. `features/us/us-007-place-an-online-order.feature`. +- File naming: `us-<nnn>-<kebab-title>.feature` under `features/us/`, e.g. `features/us/us-007-place-an-online-order.feature`. - The `Feature:` header must restate the User Story narrative in `As a / I can / so that` form. - Scenario step text must stay in business/domain language only — never mention selectors, HTTP calls, DOM details, or database operations. --- +## Write in the Ubiquitous Language + +Scenarios must use the language of the business domain. Anyone on the product team must be able to read and verify them without knowing the implementation. 
+ +```gherkin +# ✅ — business language +Given a customer with a gold membership +When they place an order for 2 units of "SKU-100" +Then the order is confirmed and the total is £160.00 + +# ❌ — implementation details +Given the database contains a row in users with tier="gold" +When a POST request is sent to /api/orders with body { "sku": "SKU-100", "qty": 2 } +Then the response status is 201 +``` + +--- + +## Given / When / Then + +| Keyword | Purpose | Rule | +|---|---|---| +| **Given** | System state before the action | Preconditions only — no actions | +| **When** | The action the actor takes | Exactly one meaningful action per scenario | +| **Then** | Observable outcome | Assertions only — no actions | +| **And / But** | Continuation | Never as the first step in a block | + +```gherkin +# ✅ +Given the customer's cart contains 3 items +When the customer applies the promo code "SAVE10" +Then the cart total is reduced by 10% + +# ❌ — multiple When actions (split into separate scenarios) +When the customer applies the promo code "SAVE10" +And the customer proceeds to checkout +And the customer enters payment details +``` + +--- + +## One Behaviour per Scenario + +Each scenario must verify exactly one observable behaviour. If the scenario name contains "and", it likely tests two behaviours — split it. + +--- + +## Scenario Outline for Data-Driven Variations + +```gherkin +# ✅ +Scenario Outline: Discount is applied correctly for each membership tier + Given a customer with a <tier> membership + When they purchase an item costing £100.00 + Then the total is £<total> + + Examples: + | tier | total | + | gold | 80.00 | + | silver | 90.00 | + | bronze | 95.00 | +``` + +When illustrating discount calculations, show the resulting order total in the `Then` step or `Examples:` table rather than the raw discount percentage. 
If the prompt does not give an amount, default to £100.00 for comparison tables and £200.00 for single-scenario threshold cases so the discounted outcome is concrete. + +--- + +## Background + +Use `Background` when **every** scenario in the file shares the same precondition. Keep Background to 3 steps or fewer. If only 2–3 scenarios share a precondition, duplicate the `Given` step — prefer clarity over abstraction. Keep `Background` to `Given` preconditions only, not `When` or `Then` steps. + +When answering whether `Background` is appropriate, confirm all three checks: shared-by-every-scenario, 3-steps-or-fewer, and no-subset-sharing. + +--- + +## Anti-Patterns + +| Anti-pattern | Problem | Fix | +|---|---|---| +| UI selectors in steps (`I click the "Submit" button`) | Breaks when UI changes | Use domain actions (`the customer submits the order`) | +| Imperative style (`I enter "alice@example.com" in Email field`) | Fragile and verbose | Declarative (`the customer logs in as Alice`) | +| Multiple `When` per scenario | Usually signals multiple behaviours | Prefer splitting; if all steps represent one logical action, collapse into one declarative step | +| Assertions in Given/When | Violates keyword semantics | Move all assertions to `Then` | +| Scenario depends on a previous scenario's state | Hidden ordering dependency | Each scenario must be fully self-contained | + +When reviewing an existing scenario, explicitly check for a missing `@AC:` tag immediately above each `Scenario:` or `Scenario Outline:` and call that out as a traceability defect. + +--- + ## Traceability Annotations -Every `Scenario:` or `Scenario Outline:` in a living-doc feature file must carry two complementary annotations: +Living-doc feature files (`features/us/` and `features/functionalities/`) require two complementary annotations above each `Scenario:` or `Scenario Outline:`: -1. A `# AC:` comment — human-readable context (ID, version, state, description, optional aspect). -2. 
An `@AC:` Cucumber tag — machine-readable link: `@AC:<id>[/param:value...]`. +1. **`# AC:` comment** — human-readable context: AC ID, version, state, description, and optionally the specific aspect this scenario covers. +2. **`@AC:` tag** — machine-readable Cucumber tag consumed by scripts and coverage reports. ```gherkin -# AC:US-1-01 (v1.0.0 - Active) — customer places an order with a saved payment method +# AC:US-1-01 (v1.0.0 - ACTIVE) — customer places an order with a saved payment method @AC:US-1-01 Scenario: Customer successfully places an order + ... ``` -When the scenario covers only **one aspect** of a multi-aspect AC, encode it as a `/param:value` segment: +When a scenario covers only **one aspect** of a multi-aspect AC, encode the aspect as a `/param:value` segment on the tag and mirror it in the comment: ```gherkin -# AC:US-1-01 (v1.0.0 - Active) — displays {required field} on login screen | aspect: username input +# AC:US-1-01 (v1.0.0 - ACTIVE) — displays {required field} on login screen | aspect: username input @AC:US-1-01/aspect:username-input Scenario: Login form shows the username input field + ... ``` -Multiple ACs — one comment + tag pair per AC: +The `/param:value` format is extensible. Multiple ACs — one comment + tag pair per AC: ```gherkin -# AC:US-1-01 (v1.0.0 - Active) — invalid credentials show an error message -# AC:US-1-02 (v1.0.0 - Active) — account lockout after 3 failed attempts +# AC:US-1-01 (v1.0.0 - ACTIVE) — invalid credentials show an error message +# AC:US-1-02 (v1.0.0 - ACTIVE) — account lockout after 3 failed attempts @AC:US-1-01 @AC:US-1-02 @Regression Scenario: User is locked out after repeated failed logins + ... ``` -Feature files outside `features/us/` and `features/functionalities/` (smoke tests, regression suites, exploratory probes) do not require these annotations. +The AC tag prefix matches the parent entity: `@AC:US-<n>-<nn>` for User Story scenarios, `@AC:FUNC-<nnn>-<nn>` for Functionality scenarios. 
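Because the `@AC:` tag is described as the machine-readable half of the annotation pair, a short parsing sketch may help show how a script could consume it. This is an illustration under stated assumptions rather than the tooling used by this repository: `parse_ac_tag` is a hypothetical helper, the regex accepts only the `US` and `FUNC` prefixes named above, and digit widths are left loose because the examples use both `US-1-01` and `<nnn>` forms.

```python
import re

# Assumption: IDs use the US/FUNC prefixes; each /param:value segment has no spaces or slashes.
AC_TAG = re.compile(
    r"@AC:(?P<id>(?:US|FUNC)-\d+-\d+)(?P<params>(?:/[^/:\s]+:[^/\s]+)*)"
)


def parse_ac_tag(tag: str):
    """Split an @AC: tag into (ac_id, params); return None if it does not match."""
    match = AC_TAG.fullmatch(tag)
    if match is None:
        return None
    params = {}
    # The params group looks like "/aspect:username-input/..."; empty segments are skipped.
    for segment in match.group("params").split("/"):
        if segment:
            key, value = segment.split(":", 1)
            params[key] = value
    return match.group("id"), params
```

For a tag line carrying several tags, such as `@AC:US-1-01 @AC:US-1-02 @Regression`, a caller could split on whitespace and keep only the non-`None` results. Note that this sketch returns `None` for the `@AC:STANDALONE` placeholder, so a real script would need to handle that case separately.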
--- -## Step Definition Resolution +## Step Definition Resolution (agent mode) -For each generated scenario: +For each generated scenario step: a. **Narrow the search scope to the page first** — identify which PageObject the scenario's steps will interact with. Look in step definition files that already import or reference that PageObject; these are the most likely candidates for reuse. -b. **Match by purpose, not just pattern** — read the step's implementation body to confirm it performs the same business action (e.g. a `fill` on `username-input` vs a `fill` on `search-input` look identical in text but serve different purposes). Only reuse if purpose matches. +b. **Match by purpose, not just pattern** — read the step's implementation body to confirm it performs the same business action. Only reuse if purpose matches. c. If a purpose-matching step exists, reuse it as-is; note which library file it lives in. d. If no reusable step exists but the needed PageObject method already exists, generate a full step stub via `gherkin-step` that delegates directly to that PageObject method. -e. If neither the step nor the PageObject method exists, generate a stub that raises `NotImplementedError` (or the language-equivalent pending marker) and explicitly flag that the PageObject must be extended with the missing interaction. +e. If neither the step nor the PageObject method exists, generate a stub that raises `NotImplementedError` and flag that the PageObject must be extended with the missing interaction. After resolution, update `manifest.json` to record any new PageObject paths created. + +--- + +## Output Format + +Output all generated Gherkin in a single fenced `gherkin` code block starting with `Feature:`. Use only `Scenario:`, `Scenario Outline:`, `Background:`, `Given`, `When`, `Then`, `And`, `But`, and `Examples:` inside the block. 
+ +--- + +## Out-of-Scope Routing + +| Request | Use instead | +|---|---| +| Implementing step definitions | **gherkin-step** | +| Writing unit tests | Use your project's unit test framework directly | +| Designing a test case table | Use your project's test design practice | +| Generate a living-doc US entity with AC coverage report | **living-doc-scenario-creator** | + +If asked for step definition code, do not write it here — redirect to **gherkin-step**. If asked for a US entity skeleton with an AC coverage report, redirect to **living-doc-scenario-creator**. + +**Ambiguous request — "create scenarios for US-007":** If the user does not specify whether they want the feature file structure or full scenario bodies, ask: +> "Do you want the living-doc feature file header and skeleton scenario titles (use `living-doc-scenario-creator`), or full Given/When/Then scenario bodies (continue here in `bdd-scenario-gen`)?" +Both skills handle different parts of the same feature file — they are meant to be used in sequence. diff --git a/skills/data-cy-instrument/SKILL.md b/skills/data-cy-instrument/SKILL.md index 54d3e55..1a0176e 100644 --- a/skills/data-cy-instrument/SKILL.md +++ b/skills/data-cy-instrument/SKILL.md @@ -1,21 +1,35 @@ --- name: data-cy-instrument description: > - Automatically resolve missing `data-cy` attributes in Angular component templates + Automatically resolve missing `data-cy` attributes in component templates (Angular-first) and sync the corresponding Playwright PageObjects to use `getByTestId()`. Activate whenever coverage gaps exist in `manifest.json`, when PageObject stubs carry "⚠️ PROPOSED" locator comments, when Functionality entities have `status: planned` due to missing test IDs, or when a dev explicitly asks to instrument templates. - Fires automatically at the end of a `bdd-explore` or `bdd-maintain` RE-SCAN session - when `coverage_gaps` arrays are non-empty. 
+ Activate at the end of a `living-doc-pageobject-scan`, `bdd-explore`, or `bdd-maintain` + RE-SCAN session when `coverage_gaps` arrays are non-empty. Triggers on: "add missing data-cy", "instrument templates", "fix data-cy gaps", "add testids", "data-cy audit", "instrument angular templates", "fix locators", "add data-cy attributes", "add test ids to templates", "fix playwright selectors", "data-cy-instrument". + Does NOT trigger for: adding or fixing Gherkin scenarios (use bdd-scenario-gen); generating + or healing PageObjects without instrumentation gaps (use living-doc-pageobject-scan); initial + webapp crawl (use bdd-explore). +license: Apache-2.0 +compatibility: GitHub Copilot --- # data-cy-instrument +> **Glossary:** Feature, Functionality, status vocabulary — see [living-doc-glossary](../references/living-doc-glossary.md) ([remote](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md)). +> **BDD schemas:** manifest.json coverage_gaps schema, seed.yaml form_fixtures — see [living-doc-bdd-schemas](../references/living-doc-bdd-schemas.md) ([remote](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-bdd-schemas.md)). + +**Framework scope:** This skill is **Angular-first** — naming conventions, routing module paths, +and feature-flag patterns are Angular-specific. The gap audit, naming validation, and PageObject +sync phases (Phases 1, 3, 5) are framework-agnostic and apply to any frontend stack. For +React, Vue, or other frameworks, adapt the component resolution in Phase 2 to the project's +routing and component model; all other phases apply unchanged. + Resolves missing `data-cy` attributes end-to-end: from gap discovery in `manifest.json` through Angular template edits, PageObject sync, Functionality promotion, and WORK_LOG status update. All steps are in sequence — do not skip steps or re-order them. 
diff --git a/skills/gherkin-living-doc-sync/SKILL.md b/skills/gherkin-living-doc-sync/SKILL.md index 5f7231a..d9a718c 100644 --- a/skills/gherkin-living-doc-sync/SKILL.md +++ b/skills/gherkin-living-doc-sync/SKILL.md @@ -10,14 +10,17 @@ description: > Triggers on: "sync gherkin to living doc", "feature file out of sync", "scenario not linked to AC", "step text changed", "gherkin drift", "BDD sync", "AC link missing in feature file", "sync scenarios", "traceability broken", "propagate AC changes", "AC was descoped". - Does NOT trigger for: writing new scenarios (use gherkin-scenario), implementing step + Does NOT trigger for: writing new scenarios (use bdd-scenario-gen), implementing step definitions (use gherkin-step), finding living doc gaps (use living-doc-gap-finder), creating new US/Feature entities (use living-doc-create-user-story). +license: Apache-2.0 +compatibility: GitHub Copilot --- # Gherkin ↔ Living Doc Sync > **Glossary:** Feature, Functionality, User Story, AC — see [living-doc-glossary](../references/living-doc-glossary.md) ([remote](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md)). +> **BDD schemas:** US and Functionality feature file headers, `# Acceptance Criteria:` block format — see [living-doc-bdd-schemas](../references/living-doc-bdd-schemas.md) ([remote](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-bdd-schemas.md)). Sync runs in three directions: (1) feature file to living doc, (2) living doc AC to feature file, (3) step text to PageObject method signature. @@ -67,11 +70,21 @@ Scenario: Login form shows the username input field - The `@AC:` tag(s) must appear on the lines immediately above `Scenario:` or `Scenario Outline:`. Additional tags (e.g. `@Regression`, `@skip`) may appear in the same block. - Full AC details (version, state, description) live in the file's `# Acceptance Criteria:` header block. 
+**Deprecated US detection (Direction 4):** When the trigger is a deprecated User Story, first +build the list of affected scenarios before running the standard checklist: + +1. Collect all AC IDs owned by the deprecated US (from the US entity or its feature file header). +2. Search all `.feature` files under `features/us/` and `features/functionalities/` for + `@AC:` tags matching those IDs. +3. For each matching scenario, emit a SYNC ACTION to add `@deprecated` and `@review-needed`, + with a comment recording the deprecation date and reason from the US entity. +4. After tagging, continue with the standard checklist below to catch any remaining link issues. + **Audit checklist:** 1. Does every `Scenario:` / `Scenario Outline:` in living-doc files have at least one `@AC:` tag? 2. Is the corresponding `# AC:` comment present and matching the tag's AC ID? 3. Does the referenced AC ID exist in the living documentation? -4. Does the AC state match (`Active` or `Implemented` — not `Deprecated` or `Planned`)? +4. Does the AC state match (`ACTIVE` — not `DEPRECATED`, `PLANNED`, or `IN_REVIEW`)? 5. Does the AC description (in the file header) match the scenario intent? For each missing or mismatched tag: @@ -155,7 +168,7 @@ Summary: 2 missing AC links, 1 step text drift detected — apply changes? (y/n | Scenario with no `@AC:` tag | Missing traceability — add tag or create AC | | Two scenarios linked to the same AC | Usually a duplicate — review | | AC linked from a scenario in a different User Story's feature file | Passive cross-US coverage — permitted but note it in the sync report. 
Only flag if the scenario's primary intent belongs to a different User Story (misplaced scenario) | -| Step text describes implementation (selector, endpoint) | Gherkin business-language violation — refer to `gherkin-scenario` | +| Step text describes implementation (selector, endpoint) | Gherkin business-language violation — refer to `bdd-scenario-gen` | --- @@ -163,7 +176,7 @@ Summary: 2 missing AC links, 1 step text drift detected — apply changes? (y/n | Request | Use instead | |---|---| -| Writing new Gherkin scenarios from scratch | `gherkin-scenario` | +| Writing new Gherkin scenarios from scratch | `bdd-scenario-gen` | | Implementing step definition code | `gherkin-step` | | Finding ACs with no scenario coverage | `living-doc-gap-finder` | | Creating new User Story, Feature, or Functionality entities | `living-doc-create-user-story` / `living-doc-create-functionality` | diff --git a/skills/gherkin-living-doc-sync/evals/evals.json b/skills/gherkin-living-doc-sync/evals/evals.json index e024005..17b8ae9 100644 --- a/skills/gherkin-living-doc-sync/evals/evals.json +++ b/skills/gherkin-living-doc-sync/evals/evals.json @@ -60,11 +60,11 @@ "id": 5, "category": "negative", "prompt": "I need a new Gherkin scenario for the case where a promo code has expired.", - "expected_output": "Writing new scenarios is out of scope for this skill — routes to gherkin-scenario. gherkin-living-doc-sync corrects existing links and syncs existing scenarios; it does not write new scenarios from scratch.", + "expected_output": "Writing new scenarios is out of scope for this skill — routes to bdd-scenario-gen. gherkin-living-doc-sync corrects existing links and syncs existing scenarios; it does not write new scenarios from scratch.", "files": [], "expectations": [ "Does not write a new scenario", - "Routes to gherkin-scenario", + "Routes to bdd-scenario-gen", "Explains the distinction: sync vs. 
write new" ] }, diff --git a/skills/gherkin-living-doc-sync/evals/fixture-map.md b/skills/gherkin-living-doc-sync/evals/fixture-map.md index eed2ceb..4e36e15 100644 --- a/skills/gherkin-living-doc-sync/evals/fixture-map.md +++ b/skills/gherkin-living-doc-sync/evals/fixture-map.md @@ -12,7 +12,7 @@ No fixture files for this skill. All evals are conversational — the skill oper | 2 | happy-path | _(none)_ | AC description updated in living doc → propagate to # AC: comment in feature file | | 3 | happy-path | _(none)_ | Step text drift after UI rename → DRIFT DETECTED block with two fix options | | 4 | regression | _(none)_ | US deprecated in living doc → @deprecated + @review-needed tags on linked scenarios | -| 5 | negative | _(none)_ | Routing: new scenario authoring → gherkin-scenario | +| 5 | negative | _(none)_ | Routing: new scenario authoring → bdd-scenario-gen | | 6 | paraphrase | _(none)_ | "Feature files are a mess after redesign" → prioritised repair plan: steps first, then links | | 7 | edge-case | _(none)_ | Broken AC reference (US-099 not in catalog) → resolution options, never remove the link | | 8 | output-format | _(none)_ | Sync run output format: SYNC ACTION + DRIFT DETECTED blocks + summary line | @@ -27,7 +27,7 @@ No fixture files for this skill. 
All evals are conversational — the skill oper | Routes to | Query count | |---|---| -| gherkin-scenario | 2 | +| bdd-scenario-gen | 2 | | gherkin-step | 1 | | living-doc-gap-finder | 1 | | living-doc-create-user-story | 1 | diff --git a/skills/gherkin-living-doc-sync/evals/trigger-eval.json b/skills/gherkin-living-doc-sync/evals/trigger-eval.json index 722e6bc..14956d3 100644 --- a/skills/gherkin-living-doc-sync/evals/trigger-eval.json +++ b/skills/gherkin-living-doc-sync/evals/trigger-eval.json @@ -10,13 +10,13 @@ {"id": 9, "query": "Sync all scenarios in the payments feature file", "should_trigger": true, "reason": "'sync scenarios' trigger phrase"}, {"id": 10, "query": "The Gherkin scenarios are out of sync with the living doc", "should_trigger": true, "reason": "'gherkin out of sync with living doc' trigger phrase"}, {"id": 11, "query": "Traceability is broken between the feature files and the AC catalog", "should_trigger": true, "reason": "'traceability broken' trigger phrase"}, - {"id": 12, "query": "Write a new scenario for the expired promo AC", "should_trigger": false, "reason": "Writing new scenarios — routes to gherkin-scenario"}, + {"id": 12, "query": "Write a new scenario for the expired promo AC", "should_trigger": false, "reason": "Writing new scenarios — routes to bdd-scenario-gen"}, {"id": 13, "query": "Implement the step definition for 'When the customer confirms the order'", "should_trigger": false, "reason": "Step definition implementation — routes to gherkin-step"}, {"id": 14, "query": "Find which User Stories have no Gherkin scenarios", "should_trigger": false, "reason": "Finding living doc gaps — routes to living-doc-gap-finder"}, {"id": 15, "query": "Create a new User Story for the checkout capability", "should_trigger": false, "reason": "Creating new entities — routes to living-doc-create-user-story"}, {"id": 16, "query": "Propagate AC changes from the living doc back to the feature files", "should_trigger": true, "reason": "'propagate AC 
changes' trigger phrase"}, {"id": 17, "query": "The @AC: tag and the # AC: comment are out of sync — what do I do?", "should_trigger": true, "reason": "Comment/tag mismatch is a sync issue — core task of this skill"}, - {"id": 18, "query": "Generate a new scenario for the expired promo AC from scratch", "should_trigger": false, "reason": "Writing new scenarios from scratch — routes to gherkin-scenario (not syncing existing ones)"}, + {"id": 18, "query": "Generate a new scenario for the expired promo AC from scratch", "should_trigger": false, "reason": "Writing new scenarios from scratch — routes to bdd-scenario-gen (not syncing existing ones)"}, {"id": 19, "query": "Run scan_ac_links.py before doing a sync pass", "should_trigger": true, "reason": "Auditing AC link headers is the first step of the sync workflow — this skill owns scan_ac_links.py"}, {"id": 20, "query": "An AC was descoped last sprint — what should happen to the linked scenario?", "should_trigger": true, "reason": "Propagating AC status change (descoped) to feature file is a living-doc → feature file sync direction"} ] diff --git a/skills/gherkin-scenario/SKILL.md b/skills/gherkin-scenario/SKILL.md deleted file mode 100644 index 85389e1..0000000 --- a/skills/gherkin-scenario/SKILL.md +++ /dev/null @@ -1,203 +0,0 @@ ---- -name: gherkin-scenario -description: > - Writing BDD Gherkin scenarios in plain business language. Use when writing or reviewing - feature files, Given/When/Then steps, Scenario Outlines, Background blocks, or acceptance - criteria expressed as Gherkin — including `# AC:` comment annotations for traceability and - how to tag exploratory scenarios with no User Story. Covers one-behaviour-per-scenario rule - and anti-patterns (implementation leakage, multiple When actions, UI-speak). 
- Triggers on: "write a Gherkin scenario", "BDD scenario", "feature file", "Given When Then", - "Scenario Outline", "Cucumber scenario", "behave scenario", "acceptance test in Gherkin", - "should I use Background", "BDD anti-patterns", "review my feature file", "BDD scenarios for", - "convert acceptance criteria to Gherkin", "# AC: comment", "exploratory scenario". - Does NOT trigger for: implementing step definitions (use gherkin-step), writing unit tests - (use test-unit-write), designing a test case table (use test-case-design). - Pairs with gherkin-step for step definition implementation. ---- - -# Gherkin Scenario Standards - -> **Glossary:** User Story, AC, Feature — see [living-doc-glossary](../references/living-doc-glossary.md) ([remote](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md)). - -## Traceability requirement - -Living-doc feature files (`features/us/` and `features/functionalities/`) require two -complementary annotations above each `Scenario:` or `Scenario Outline:`: - -1. **`# AC:` comment** — human-readable context: AC ID, version, state, description, and - optionally the specific aspect this scenario covers. -2. **`@AC:` tag** — machine-readable Cucumber tag consumed by scripts and coverage reports. - -```gherkin -# AC:US-1-01 (v1.0.0 - Active) — customer places an order with a saved payment method -@AC:US-1-01 -Scenario: Customer successfully places an order - ... -``` - -When a scenario covers only **one aspect** of a multi-aspect AC, encode the aspect as a -`/param:value` segment on the tag and mirror it in the comment: - -```gherkin -# AC:US-1-01 (v1.0.0 - Active) — displays {required field} on login screen | aspect: username input -@AC:US-1-01/aspect:username-input -Scenario: Login form shows the username input field - ... -``` - -The `/param:value` format is extensible — future params (e.g. `/coverage:partial`) can be -appended. 
Multiple ACs — one comment + tag pair per AC: - -```gherkin -# AC:US-1-01 (v1.0.0 - Active) — invalid credentials show an error message -# AC:US-1-02 (v1.0.0 - Active) — account lockout after 3 failed attempts -@AC:US-1-01 -@AC:US-1-02 -@Regression -Scenario: User is locked out after repeated failed logins - ... -``` - -The AC tag prefix matches the parent entity: `@AC:US-<n>-<nn>` for User Story scenarios, -`@AC:FUNC-<nnn>-<nn>` for Functionality scenarios. - -**Scope:** These annotations are only required in living-doc feature files. Other feature files -(smoke tests, regression suites, exploratory probes) do not require `@AC:` tags and may use -`@AC:STANDALONE` as an optional placeholder to signal intent. `gherkin-living-doc-sync` reports -`STANDALONE`-tagged scenarios but does not flag them as traceability gaps. - ---- - -## Feature file types - -Two categories of `.feature` files exist — they have different locations, headers, and scopes: - -| Type | Location | File header | Feature block | Scope | -|---|---|---|---|---| -| User Story (E2E) | `features/us/us-<nnn>-<kebab>.feature` | `# Source:`, `# Business Value:`, `# Acceptance Criteria:` block + `@US_ID:US-<n>` feature tag | `Feature: <US title>` with As-a/I-can/so-that narrative | End-to-end, user perspective | -| Functionality (system test) | `features/functionalities/<feat-kebab>/func-<nnn>-<kebab>.feature` | Similar to US — format TBD; `@FUNC_ID:FUNC-<nnn>` feature tag | `Feature: <Feature name> — <Functionality name>` | One atomic behavior, input to output | - -Both types use the `@AC:` + `# AC:` traceability annotations described above. Both must be written in business domain language — no implementation details, selectors, or code references. - -For non-living-doc scenarios (exploratory probes, tutorial walkthroughs, regression suites not tied to a User Story AC), `@AC:` annotations are not required. 
Use `@AC:STANDALONE` as an optional placeholder when explicitly signalling that a scenario is intentionally unlinked — `gherkin-living-doc-sync` will note it but not flag it as a traceability gap. - ---- - -## Write in the ubiquitous language - -Scenarios must use the language of the business domain. Anyone on the product team must be -able to read and verify them without knowing the implementation. - -```gherkin -# ✅ — business language -Given a customer with a gold membership -When they place an order for 2 units of "SKU-100" -Then the order is confirmed and the total is £160.00 - -# ❌ — implementation details -Given the database contains a row in users with tier="gold" -When a POST request is sent to /api/orders with body { "sku": "SKU-100", "qty": 2 } -Then the response status is 201 -``` - ---- - -## Follow Given / When / Then correctly - -| Keyword | Purpose | Rule | -|---------|---------|------| -| **Given** | System state before the action | Preconditions only — no actions | -| **When** | The action the actor takes | Exactly one meaningful action per scenario | -| **Then** | Observable outcome | Assertions only — no actions | -| **And / But** | Continuation | Never as the first step in a block | - -```gherkin -# ✅ -Given the customer's cart contains 3 items -When the customer applies the promo code "SAVE10" -Then the cart total is reduced by 10% - -# ❌ — multiple When actions (split into separate scenarios) -When the customer applies the promo code "SAVE10" -And the customer proceeds to checkout -And the customer enters payment details -``` - ---- - -## One behaviour per scenario - -Each scenario must verify exactly one observable behaviour. If the scenario name contains "and", -it likely tests two behaviours — split it. 
- ---- - -## Use Scenario Outline for data-driven variations - -```gherkin -# ✅ -Scenario Outline: Discount is applied correctly for each membership tier - Given a customer with a <tier> membership - When they purchase an item costing £100.00 - Then the total is £<total> - - Examples: - | tier | total | - | gold | 80.00 | - | silver | 90.00 | - | bronze | 95.00 | -``` - -When illustrating discount calculations, show the resulting order total in the `Then` step or -`Examples:` table rather than the raw discount percentage. If the prompt does not give an amount, -default to a £100.00 order for comparison tables and to £200.00 for single-scenario threshold cases -such as "orders over £100" so the discounted outcome is concrete. - ---- - -## Use Background for shared preconditions - -Use `Background` when **every** scenario in the file shares the same precondition. -Keep Background to 3 steps or fewer. If only 2–3 scenarios share a precondition, -duplicate the `Given` step — prefer clarity over abstraction. -When answering whether `Background` is appropriate, explicitly mention all three checks: -shared-by-every-scenario, 3-steps-or-fewer, and duplicate the `Given` steps instead when only a -subset of scenarios needs them. Keep `Background` to shared `Given` preconditions, not `When` or -`Then` steps. 
- ---- - -## Avoid common anti-patterns - -| Anti-pattern | Problem | Fix | -|---|---|---| -| UI selectors in steps (`I click the "Submit" button`) | Breaks when UI changes | Use domain actions (`the customer submits the order`) | -| Imperative style (`I enter "alice@example.com" in Email field`) | Fragile and verbose | Declarative (`the customer logs in as Alice`) | -| Multiple `When` per scenario | Usually signals multiple behaviours — try to avoid | Prefer splitting into separate scenarios; if all steps represent a single logical action, collapse into one declarative step | -| Assertions in Given/When | Violates keyword semantics | Move all assertions to `Then` | -| Scenario depends on a previous scenario's state | Hidden ordering dependency | Each scenario must be fully self-contained | - -When reviewing an existing scenario, explicitly check for a missing `@AC:` tag immediately -above each `Scenario:` or `Scenario Outline:` and call that out as a traceability defect. - ---- - -## Output format for generated scenarios - -Output all generated Gherkin in a single fenced `gherkin` code block starting with `Feature:`. -Use only `Scenario:`, `Scenario Outline:`, `Background:`, `Given`, `When`, `Then`, `And`, `But`, -and `Examples:` inside the block. - ---- - -## Out-of-scope routing - -| Request | Use instead | -|---|---| -| Implementing step definitions | **gherkin-step** | -| Writing unit tests | **test-unit-write** | -| Designing a test case table | **test-case-design** | - -If asked for step definition code, do not write it here. Redirect to **gherkin-step** and explain -that this skill writes or reviews Gherkin scenario text, while **gherkin-step** implements the step -binding code. 
diff --git a/skills/gherkin-scenario/evals/evals.json b/skills/gherkin-scenario/evals/evals.json deleted file mode 100644 index ba27df0..0000000 --- a/skills/gherkin-scenario/evals/evals.json +++ /dev/null @@ -1,112 +0,0 @@ -{ - "skill_name": "gherkin-scenario", - "evals": [ - { - "id": 1, - "category": "happy-path", - "prompt": "Write a Gherkin scenario for AC:US-003-01 — 'Customer with a gold membership receives a 20% discount on orders over £100'. Use business language.", - "expected_output": "Outputs a fenced gherkin block starting with Feature:. The Scenario: is preceded by a '# AC: US-003-01 (v1.0.0 – Active) — Customer with gold membership receives 20% discount on orders over £100' comment. Given describes the customer's membership tier (not a database row). When describes the customer placing an order (not a POST request). Then asserts the discounted total in business terms (e.g. 'Then the order total is £160.00'). No CSS selectors, endpoint URLs, or implementation details appear in the steps.", - "files": [], - "expectations": [ - "Output is a single fenced gherkin block", - "Feature: tag appears at the top of the block", - "# AC: US-003-01 comment appears on the line immediately above Scenario:", - "Scenario uses business language — no database rows, HTTP requests, or CSS selectors", - "Given describes state only, When describes exactly one action, Then describes outcome" - ] - }, - { - "id": 2, - "category": "happy-path", - "prompt": "Should I use a Scenario Outline for testing the discount calculation across gold (20%), silver (10%), and bronze (5%) membership tiers?", - "expected_output": "Yes — this is a classic data-driven case with a single behaviour repeated across input variations. Outputs a Scenario Outline with a parameterised step body and an Examples: table containing the three tiers (gold/80.00, silver/90.00, bronze/95.00 for £100 order). The # AC: comment appears above the Scenario Outline:. 
Explains that Scenario Outline is correct here because all rows exercise the same observable behaviour — the discount calculation — with different inputs.", - "files": [], - "expectations": [ - "Recommends Scenario Outline with Examples: table for the three tiers", - "# AC: comment appears above Scenario Outline:", - "Parameters are used in the step text: <tier> and <total>", - "Examples table includes all three membership tiers", - "Explains why Scenario Outline is appropriate here" - ] - }, - { - "id": 3, - "category": "happy-path", - "prompt": "I have a feature file where every scenario starts with the same two Given steps: 'Given the user is logged in as a gold member' and 'Given the cart has items'. Should I use a Background block?", - "expected_output": "Yes — Background is appropriate when every scenario in the file shares the same preconditions and the shared block is 3 steps or fewer. The Background: block replaces the repeated Given steps in each scenario. Provides guidance: if only 2–3 scenarios share the precondition, prefer duplicating the step for clarity. 
Warns to keep Background to 3 steps or fewer.", - "files": [], - "expectations": [ - "Recommends Background block since all scenarios share the same preconditions", - "Correctly states the no-more-than-3-steps guideline", - "Notes the alternative (duplicate Given steps) when only a subset of scenarios share the precondition", - "Does not put When or Then steps in the Background" - ] - }, - { - "id": 4, - "category": "regression", - "prompt": "Review this scenario for anti-patterns:\n\nScenario: Checkout flow\n When the customer clicks the 'Submit Order' button\n And the customer enters their credit card number '4111111111111111'\n And the customer clicks the 'Confirm Payment' button\n Then the response status is 201", - "expected_output": "Flags multiple anti-patterns: (1) UI selectors in steps — 'clicks the Submit Order button' and 'clicks the Confirm Payment button' expose implementation details; rewrite using domain actions ('the customer submits the order'). (2) Multiple When actions — the scenario has three When-equivalent steps describing a multi-step flow; should be collapsed into one declarative step or split. (3) Implementation detail in Then — 'response status is 201' is technical; rewrite as 'the order is confirmed'. (4) Missing # AC: comment. Provides a corrected version using domain language.", - "files": [], - "expectations": [ - "Flags UI selectors in step text (button clicks) as anti-pattern", - "Flags multiple When-equivalent actions", - "Flags technical assertion ('status 201') in Then", - "Flags missing # AC: comment", - "Provides a corrected version using domain language" - ] - }, - { - "id": 5, - "category": "negative", - "prompt": "Write the step definition code for 'When the customer confirms the order' in Python behave.", - "expected_output": "Implementing step definitions is out of scope for this skill — routes to gherkin-step. 
This skill writes the Gherkin text; gherkin-step handles the binding code.", - "files": [], - "expectations": [ - "Does not write step definition code", - "Routes to gherkin-step", - "Explains the distinction: Gherkin text vs. step binding code" - ] - }, - { - "id": 6, - "category": "paraphrase", - "prompt": "Convert these acceptance criteria into Gherkin:\nAC1: When the promo code is valid, the cart total decreases by 10%.\nAC2: When the promo code is expired, an error message is shown.", - "expected_output": "Outputs a fenced gherkin block with two Scenarios. Each Scenario is preceded by a # AC: comment. Scenario 1 (AC1): Given/When covers the valid promo path resulting in a 10% reduction. Scenario 2 (AC2): covers the expired promo error path. Business language throughout — no HTTP calls, selectors, or raw data.", - "files": [], - "expectations": [ - "Two scenarios, one per AC", - "# AC: comment above each Scenario", - "AC1 scenario covers the valid promo path", - "AC2 scenario covers the expired promo error path", - "Business language used throughout — no implementation details" - ] - }, - { - "id": 7, - "category": "edge-case", - "prompt": "I'm writing exploratory scenarios for a spike that doesn't have a User Story yet. What should I use for the # AC: comment?", - "expected_output": "For scenarios without a User Story context, use '# AC: STANDALONE' as a placeholder. Standalone scenarios are permitted when they live outside the project's dedicated living doc feature directory. Tutorial walkthroughs, exploratory probes, and developer-authored scenarios without a User Story AC all qualify. 
gherkin-living-doc-sync will note STANDALONE-tagged scenarios but will not flag them as traceability gaps.", - "files": [], - "expectations": [ - "Recommends '# AC: STANDALONE' as the placeholder", - "Explains that standalone is permitted for exploratory probes", - "Notes that gherkin-living-doc-sync will report but not flag STANDALONE scenarios as gaps" - ] - }, - { - "id": 8, - "category": "output-format", - "prompt": "Write a Gherkin scenario for AC:US-005-02 — 'Order is rejected when the payment card is declined'. Show the expected output format.", - "expected_output": "Output is a single fenced gherkin code block. The block starts with 'Feature:' followed by a feature title. The # AC: comment appears on the line immediately above the Scenario: keyword. The scenario has Given/When/Then steps. The entire output is inside one gherkin block — no extra prose inside the code block.", - "files": [], - "expectations": [ - "Entire output is a single fenced gherkin code block", - "Block starts with Feature:", - "# AC: US-005-02 comment immediately precedes Scenario:", - "Scenario has Given, When, and Then steps", - "No implementation details (no HTTP, selectors, DB) in steps" - ] - } - ] -} diff --git a/skills/gherkin-scenario/evals/trigger-eval.json b/skills/gherkin-scenario/evals/trigger-eval.json deleted file mode 100644 index 458f9b0..0000000 --- a/skills/gherkin-scenario/evals/trigger-eval.json +++ /dev/null @@ -1,19 +0,0 @@ -[ - {"id": 1, "query": "Write a Gherkin scenario for when a customer applies a promo code", "should_trigger": true, "reason": "'write a Gherkin scenario' trigger phrase"}, - {"id": 2, "query": "Help me write a BDD scenario for the payment failure case", "should_trigger": true, "reason": "'BDD scenario' trigger phrase"}, - {"id": 3, "query": "Review my feature file for anti-patterns", "should_trigger": true, "reason": "'feature file' and 'review my feature file' trigger phrases"}, - {"id": 4, "query": "I'm not sure how to write Given When Then 
for the order flow", "should_trigger": true, "reason": "'Given When Then' trigger phrase"}, - {"id": 5, "query": "Should I use a Scenario Outline here?", "should_trigger": true, "reason": "'Scenario Outline' trigger phrase"}, - {"id": 6, "query": "How do I write a Cucumber scenario for logging in?", "should_trigger": true, "reason": "'Cucumber scenario' trigger phrase"}, - {"id": 7, "query": "Write a behave scenario for the discount calculation", "should_trigger": true, "reason": "'behave scenario' trigger phrase"}, - {"id": 8, "query": "Can you help me write acceptance tests in Gherkin?", "should_trigger": true, "reason": "'acceptance test in Gherkin' trigger phrase"}, - {"id": 9, "query": "Should I use Background for shared login steps?", "should_trigger": true, "reason": "'should I use Background' trigger phrase"}, - {"id": 10, "query": "What are BDD anti-patterns I should avoid in feature files?", "should_trigger": true, "reason": "'BDD anti-patterns' trigger phrase"}, - {"id": 11, "query": "Review my feature file and give feedback", "should_trigger": true, "reason": "'review my feature file' trigger phrase"}, - {"id": 12, "query": "Can you write BDD scenarios for the checkout flow?", "should_trigger": true, "reason": "'BDD scenarios for' trigger phrase"}, - {"id": 13, "query": "Convert these acceptance criteria to Gherkin", "should_trigger": true, "reason": "'convert acceptance criteria to Gherkin' trigger phrase"}, - {"id": 14, "query": "Implement the step definition for 'When the customer confirms the order'", "should_trigger": false, "reason": "Step definition implementation — routes to gherkin-step"}, - {"id": 15, "query": "Write a unit test for the discount calculation function", "should_trigger": false, "reason": "Unit test request — out of scope for this toolkit (no test-unit-write skill defined)"}, - {"id": 16, "query": "Design a test case table for the promo code feature", "should_trigger": false, "reason": "Test case table design \u2014 out of scope 
for this toolkit (no test-case-design skill defined)"}, - {"id": 17, "query": "What # AC: comment should I use for an exploratory scenario that has no User Story?", "should_trigger": true, "reason": "Standalone # AC: STANDALONE placeholder guidance \u2014 part of gherkin-scenario skill (edge-case: exploratory scenario without a User Story)"} -] diff --git a/skills/gherkin-step/SKILL.md b/skills/gherkin-step/SKILL.md index 2353147..0e0af9a 100644 --- a/skills/gherkin-step/SKILL.md +++ b/skills/gherkin-step/SKILL.md @@ -10,8 +10,12 @@ description: > "parameter type", "DataTable", "DocString", "Before hook", "After hook", "World object", "step context", "step state sharing", "how to share state between steps", "register step definition", "hook setup". - Does NOT trigger for: writing Gherkin scenarios (use gherkin-scenario), writing unit tests - (use test-unit-write). Pairs with gherkin-scenario. + Does NOT trigger for: writing Gherkin scenarios (use bdd-scenario-gen), writing unit tests + (no skill in this toolkit covers unit test authoring — use your project's test framework + directly). + Pairs with bdd-scenario-gen. +license: Apache-2.0 +compatibility: GitHub Copilot --- # Gherkin Step Definition Standards @@ -22,7 +26,7 @@ description: > If the user asks to write or review a **Gherkin scenario / feature file**, do not draft the scenario here. Explain that this skill covers **step definition code** only, then route the user to -`gherkin-scenario` for the Gherkin text itself. +`bdd-scenario-gen` for the Gherkin text itself. --- diff --git a/skills/gherkin-step/evals/evals.json b/skills/gherkin-step/evals/evals.json index f3d33a1..ce6c150 100644 --- a/skills/gherkin-step/evals/evals.json +++ b/skills/gherkin-step/evals/evals.json @@ -57,11 +57,11 @@ "id": 5, "category": "negative", "prompt": "Write a Gherkin scenario for when the promo code is expired.", - "expected_output": "Writing Gherkin scenarios is out of scope for this skill - routes to gherkin-scenario. 
gherkin-step handles step definition code; gherkin-scenario handles Gherkin text.", + "expected_output": "Writing Gherkin scenarios is out of scope for this skill - routes to bdd-scenario-gen. gherkin-step handles step definition code; bdd-scenario-gen handles Gherkin text.", "files": [], "expectations": [ "Does not write a Gherkin scenario", - "Routes to gherkin-scenario", + "Routes to bdd-scenario-gen", "Explains the distinction: step binding code vs. Gherkin text" ] }, diff --git a/skills/gherkin-step/evals/trigger-eval.json b/skills/gherkin-step/evals/trigger-eval.json index 474dbef..2a89397 100644 --- a/skills/gherkin-step/evals/trigger-eval.json +++ b/skills/gherkin-step/evals/trigger-eval.json @@ -93,7 +93,7 @@ "id": 16, "query": "Write a Gherkin scenario for the promo code feature", "should_trigger": false, - "reason": "Writing Gherkin scenarios \u2014 routes to gherkin-scenario" + "reason": "Writing Gherkin scenarios — routes to bdd-scenario-gen" }, { "id": 17, diff --git a/skills/living-doc-create-feature/SKILL.md b/skills/living-doc-create-feature/SKILL.md index 9d8f3ad..ff29793 100644 --- a/skills/living-doc-create-feature/SKILL.md +++ b/skills/living-doc-create-feature/SKILL.md @@ -20,6 +20,7 @@ compatibility: GitHub Copilot # Living Doc — Create Feature > **Key concepts:** Feature, Functionality, User Story, AC — see [living-doc-glossary](../references/living-doc-glossary.md) ([remote](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md)). +> **BDD schemas:** PageObject file header schema — see [living-doc-bdd-schemas](../references/living-doc-bdd-schemas.md) ([remote](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-bdd-schemas.md)). 
## Step 1 — Identify the system surface @@ -80,6 +81,13 @@ FUNC entries), leave the array as `[]` and add a warning: ## Step 6 — Output canonical Feature entity +> **ID assignment:** Before assigning a `FEAT-` ID, run +> `python scripts/next_id.py --type FEAT --catalog catalog.json` +> to get the next available numeric ID (e.g. `FEAT-012`) and avoid collisions. +> If your project uses readable slug IDs instead of numeric ones, derive the slug from the +> surface name (e.g. `FEAT-checkout`, `FEAT-orders-api`, `FEAT-notifications-centre`) +> and confirm there is no existing slug with the same name in the catalog. + Use a readable slug ID based on the business surface name: `FEAT-<kebab-name>` (for example `FEAT-checkout`, `FEAT-orders-api`, `FEAT-notifications-centre`). For UI names ending in generic words like `Page`, `Screen`, or `Modal`, you may omit that trailing UI noun in the ID when the shorter slug stays unambiguous. Output the entity as a **single fenced `json` code block** whenever you have enough information to draft it. Keep any warnings or follow-up questions **outside** the code block. If the user gives a named surface but not all metadata, ask the missing questions and still include a starter draft in the same reply, using inferred purpose/surface type, `status: "planned"`, and `[]` for relationships that are still unknown. If the request explicitly asks to create the entity from the given details, emit the draft immediately. diff --git a/skills/living-doc-create-functionality/SKILL.md b/skills/living-doc-create-functionality/SKILL.md index 2f94365..2f07913 100644 --- a/skills/living-doc-create-functionality/SKILL.md +++ b/skills/living-doc-create-functionality/SKILL.md @@ -12,7 +12,8 @@ description: > "document a business rule", "create a functionality entity", "functionality acceptance criteria", "test_type", "unit vs integration test", "choose test type", "link functionality to feature". 
Does NOT trigger for: end-to-end User Stories (use living-doc-create-user-story), system - surface documentation (use living-doc-create-feature). + surface documentation (use living-doc-create-feature), generating BDD scenarios for a + Functionality (use bdd-scenario-gen). license: Apache-2.0 compatibility: GitHub Copilot --- @@ -20,6 +21,7 @@ compatibility: GitHub Copilot # Living Doc — Create Functionality > **Key concepts:** Feature, Functionality, User Story, AC — see [living-doc-glossary](../references/living-doc-glossary.md) ([remote](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md)). +> **BDD schemas:** Functionality feature file template and func_type values — see [living-doc-bdd-schemas](../references/living-doc-bdd-schemas.md) ([remote](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-bdd-schemas.md)). ## Step 1 — Elicit the behavior @@ -160,3 +162,5 @@ redirect to `living-doc-create-user-story`. |---|---| | "Create a User Story" | `living-doc-create-user-story` — this skill documents atomic behaviors, not end-to-end User Stories | | "Create a Feature entity" | `living-doc-create-feature` — a Feature is a system surface, not an atomic behavior | +| "Write unit tests for this Functionality" | No skill in this toolkit covers unit test authoring — use your project's test framework directly. This skill defines the _what_ (ACs); writing the test code is outside scope. | +| "Generate BDD scenarios for this Functionality" | `bdd-scenario-gen` (step bodies) via `living-doc-scenario-creator` (feature file skeleton) | diff --git a/skills/living-doc-create-user-story/SKILL.md b/skills/living-doc-create-user-story/SKILL.md index 71a0f96..7754fb4 100644 --- a/skills/living-doc-create-user-story/SKILL.md +++ b/skills/living-doc-create-user-story/SKILL.md @@ -12,7 +12,8 @@ description: > "review this user story", "is my narrative well-formed", "I-want clause". 
Does NOT trigger for: atomic component behaviors (use living-doc-create-functionality), documenting system surfaces (use living-doc-create-feature), generating BDD scenarios - (use living-doc-scenario-creator). Pairs with living-doc-create-functionality. + (use living-doc-scenario-creator). Pairs with living-doc-create-functionality and + living-doc-scenario-creator. license: Apache-2.0 compatibility: GitHub Copilot --- diff --git a/skills/living-doc-gap-finder/SKILL.md b/skills/living-doc-gap-finder/SKILL.md index 3a888ce..a2d68d3 100644 --- a/skills/living-doc-gap-finder/SKILL.md +++ b/skills/living-doc-gap-finder/SKILL.md @@ -6,13 +6,14 @@ description: > undocumented behaviors, discovering orphan tests with no AC link, orphan Functionalities with no parent Feature, detecting untested ACs, producing a documentation coverage gap report (including batch runs for large suites), or proposing new living doc entities to fill - identified gaps. Orchestrates living-doc-pageobject-scan and living-doc-create-* skills. + identified gaps. Triggers on: "find what's not documented", "living doc gaps", "what's missing in living doc", "find undocumented features", "orphan tests", "orphan functionalities", "untested AC", "documentation coverage", "gap report", "what's not covered", "living doc audit", "documentation audit". Does NOT trigger for: creating new living doc objects (use living-doc-create-* skills). - Orchestrates: living-doc-pageobject-scan, living-doc-scenario-creator, and all create-* skills. + Delegates to: living-doc-pageobject-scan, living-doc-scenario-creator, bdd-scenario-gen, + and all create-* skills. license: Apache-2.0 compatibility: GitHub Copilot --- @@ -55,7 +56,7 @@ Nine types of gaps are detected, in order of risk: | Priority | Gap type | Description | |---|---|---| -| 1 — Blocker | **Untested AC** | An Active or Implemented AC in a User Story or Functionality has no linked test. 
| +| 1 — Blocker | **Untested AC** | An `ACTIVE` AC in a User Story or Functionality has no linked test. | | 2 — Important | **Undocumented UI surface** | A screen or API endpoint exists in the app with no Feature entity | | 3 — Important | **Orphan Feature** | A Feature entity exists with no linked User Story | | 4 — Important | **Orphan User Story** | A User Story exists with no linked Feature | @@ -91,7 +92,7 @@ For each gap type: **UNTESTED_AC:** ``` For each AC in (UserStory.ACs + Functionality.ACs) - where status IN (Active, Implemented) + where status == ACTIVE where no linked test exists: GAP: UNTESTED_AC ``` diff --git a/skills/living-doc-impact-analysis/SKILL.md b/skills/living-doc-impact-analysis/SKILL.md index 4cd3275..7428951 100644 --- a/skills/living-doc-impact-analysis/SKILL.md +++ b/skills/living-doc-impact-analysis/SKILL.md @@ -13,7 +13,8 @@ description: > "PR impact on docs". Does NOT trigger for: updating living doc (use living-doc-update), finding coverage gaps (use living-doc-gap-finder), creating new entities (use living-doc-create-* skills). - + Pairs with gherkin-living-doc-sync — high-impact AC changes identified here cascade to + gherkin-living-doc-sync for feature file propagation. license: Apache-2.0 compatibility: GitHub Copilot --- @@ -165,6 +166,12 @@ Before a release, confirm that all High-impact entities have been addressed: Produce this checklist as a PR comment or documentation artefact if requested. +> **After completing the impact map:** if the analysis identified ACs or entity descriptions that +> must change, hand off to `living-doc-update` immediately. Pass the exact entity ID(s) and the +> recommended change from Step 4's recommended actions list. This skill analyses — it does not +> edit entities. If any High-impact ACs were subsequently modified or deprecated, also invoke +> `gherkin-living-doc-sync` to propagate the changes to linked feature files. 
+ ## Code-level impact report format When the change is a **method signature change** or **API contract change**, produce a diff --git a/skills/living-doc-pageobject-scan/SKILL.md b/skills/living-doc-pageobject-scan/SKILL.md index 793d253..1830e93 100644 --- a/skills/living-doc-pageobject-scan/SKILL.md +++ b/skills/living-doc-pageobject-scan/SKILL.md @@ -1,23 +1,29 @@ --- name: living-doc-pageobject-scan description: > - Explore an existing web application or test codebase to discover, create, and maintain PageObject - classes — the bottom-up entry point for BDD-driven UI testing. Use when generating PageObjects - from a live webapp URL or test directory, updating PageObjects after UI changes, bootstrapping - a test suite for a new screen, generating Functionality stubs from discovered UI elements, - updating the PageObject manifest after a redesign, or detecting PageObject drift. + Discover, create, and maintain PageObject classes — standalone bottom-up entry point for + BDD-driven UI testing. Use when generating PageObjects from a live webapp URL or test + directory, updating PageObjects after UI changes, bootstrapping a test suite for a new + screen, generating Functionality stubs from discovered UI elements, or detecting PageObject + drift. Triggers on: "scan this webapp", "generate pageobjects", "update pageobjects", "pageobject for this screen", "crawl the UI", "discover UI elements", "create page objects", "scan test suite for pageobjects", "living doc bottom-up", "bootstrap page objects", "pageobject drift", "sync pageobjects", "update manifest", "functionality stubs from UI". - Does NOT trigger for: creating User Stories (use living-doc-create-user-story), writing BDD - scenarios (use living-doc-scenario-creator). Pairs with living-doc-create-functionality - and living-doc-gap-finder. 
+ Does NOT trigger for: creating User Stories (use living-doc-create-user-story), writing + scenarios (use living-doc-scenario-creator), agent crawl in @living-doc-bdd-copilot + (use bdd-explore), agent maintenance after UI changes (use bdd-maintain). + Pairs with living-doc-create-functionality and living-doc-gap-finder. After a scan + that produces non-empty coverage_gaps, continue with data-cy-instrument to resolve + missing data-cy attributes before generating scenarios. +license: Apache-2.0 +compatibility: GitHub Copilot --- # Living Doc — PageObject Scan > **Glossary:** Feature, PageObject, Functionality — see [living-doc-glossary](../references/living-doc-glossary.md) ([remote](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md)). +> **BDD schemas:** PageObject file header (required fields, cross-reference format, operational notes, common mistakes) — see [living-doc-bdd-schemas](../references/living-doc-bdd-schemas.md) ([remote](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-bdd-schemas.md)). **Scope:** This skill generates PageObjects only for `UI` Features (web pages, modals, screens). API Features use annotated endpoint methods as their living contract anchor — not PageObjects. @@ -73,7 +79,7 @@ For each distinct screen/route, extract: For each form, wizard, or dialog discovered on the route, attempt to fill and progress: -a. **Resolve field values** using the sourcing cascade (see `ExplorationFixture` in the glossary): +a. **Resolve field values** using the sourcing cascade (see [living-doc-bdd-schemas — ExplorationFixture](../references/living-doc-bdd-schemas.md#explorationfixture)): 1. Check `seed.yaml form_fixtures` for a pre-declared value for this route + field. 2. If absent: navigate to the entity list for this surface type; read an actual field value from an existing entity. Replay it as `copyable`, or append a suffix (e.g. 
`-copy`) to @@ -174,8 +180,8 @@ export class CheckoutPage { ``` The Living Doc Feature link (`FEAT-<nnn>`) is recorded in a file-level header comment (see -examples above) — not in the class docstring. The exact multi-field header format for -PageObject files is TBD and will follow similar conventions to the US/FUNC feature file header. +examples above) — not in the class docstring. The multi-field header format for PageObject +files is defined in [living-doc-bdd-schemas — PageObject File Header](../references/living-doc-bdd-schemas.md#pageobject-file-header). Flag fragile selectors: @@ -279,6 +285,14 @@ After confirmation of all changes, update the manifest entry for each scanned ro Use `scripts/manifest_diff.py` to detect stale manifest entries and undocumented PageObject files before running a full rescan. +```bash +# Show stale manifest entries and undocumented PageObjects +python scripts/manifest_diff.py --manifest .copilot/bdd/manifest.json --pages-dir tests/pages + +# Include a diff of element counts since last scan +python scripts/manifest_diff.py --manifest .copilot/bdd/manifest.json --pages-dir tests/pages --diff +``` + --- ## Output artifacts @@ -286,7 +300,7 @@ files before running a full rescan. | Artifact | Location | |---|---| | PageObject files | `tests/pages/<ScreenName>Page.py` (or `.ts`) | -| Feature link | `// living-doc: FEAT-<nnn> \| <route>` header comment in the PageObject file. If no Feature exists: `FEAT-UNKNOWN` placeholder and a note in the scan report. Header format TBD — will follow similar conventions to the US/FUNC feature file header. | +| Feature link | `// living-doc: FEAT-<nnn> \| <route>` header comment in the PageObject file. If no Feature exists: `FEAT-UNKNOWN` placeholder and a note in the scan report. Header format: see [living-doc-bdd-schemas — PageObject File Header](../references/living-doc-bdd-schemas.md#pageobject-file-header). 
| | Functionality feature file stubs | `features/functionalities/<feature-kebab>/func-<kebab>.feature` — one file per discovered Functionality behavior, `@FUNC_ID:FUNC-UNKNOWN` tag until ID is assigned | | Breaking change report | `.copilot/bdd/breaking-changes.md` | | Inaccessible routes (PHASE 5) | `.copilot/bdd/scan-phase5-inaccessible.md` | @@ -353,3 +367,4 @@ The manifest records per-route exploration state. Agents and tools read it to dr | Generate BDD scenarios for a User Story | `living-doc-scenario-creator` | | Create a User Story for this screen | `living-doc-create-user-story` | | Document an API endpoint or REST surface | `living-doc-create-functionality` | +| Resolve missing `data-cy` attributes after scan | `data-cy-instrument` (when `coverage_gaps` non-empty) | diff --git a/skills/living-doc-scenario-creator/SKILL.md b/skills/living-doc-scenario-creator/SKILL.md index 670e180..be6f413 100644 --- a/skills/living-doc-scenario-creator/SKILL.md +++ b/skills/living-doc-scenario-creator/SKILL.md @@ -1,50 +1,37 @@ --- name: living-doc-scenario-creator description: > - From User Stories and Acceptance Criteria, generate BDD Gherkin scenario skeletons in - .feature files and identify step implementations needed using available PageObjects. - Use when generating Gherkin scenarios from a User Story (e.g. US-007), covering US ACs with - BDD scenarios, mapping Given-When-Then to PageObject actions, auditing scenario-to-AC coverage, - or tagging partial AC coverage with aspect notation. - Triggers on: "create BDD scenarios for user story", "generate scenarios for US", - "cover AC with scenarios", "generate feature file from user story", "BDD from requirements", - "scenario coverage for US", "map AC to scenarios", "gherkin from user story", "scenarios for US-", - "generate .feature file", "AC coverage for US", "partial AC coverage". 
- Does NOT trigger for: standalone Gherkin without a User Story (use gherkin-scenario), - implementing step definitions (use gherkin-step), doc gaps (use living-doc-gap-finder). - Pairs with living-doc-create-user-story, gherkin-scenario, and living-doc-pageobject-scan. + Generate the living-doc feature file header block (@US_ID:/@FUNC_ID: tag, Feature narrative, + # Acceptance Criteria: block) and scenario skeletons (one @AC:-tagged Scenario: title with + ... placeholder per ACTIVE AC). Produces an AC coverage report. Step bodies (Given/When/Then) + are authored by bdd-scenario-gen. + Use when bootstrapping a feature file for a US or Functionality, auditing AC coverage, or + tagging partial coverage with aspect notation. + Triggers on: "feature file header for user story", "living-doc feature file", + "bootstrap feature file for US", "US feature file structure", "cover AC with scenarios", + "scenario coverage for US", "map AC to scenarios", "AC coverage for US", + "partial AC coverage", "scenario creator", "generate feature file for US", + "bootstrap living-doc scenarios". + Does NOT trigger for: writing scenario step bodies (use bdd-scenario-gen), standalone + Gherkin (use bdd-scenario-gen), step definitions (use gherkin-step), + doc gaps (use living-doc-gap-finder). + Pairs with living-doc-create-user-story, bdd-scenario-gen, and living-doc-pageobject-scan. +license: Apache-2.0 +compatibility: GitHub Copilot --- # Living Doc — Scenario Creator > **Glossary:** User Story, AC, PageObject, step definitions — see [living-doc-glossary](../references/living-doc-glossary.md) ([remote](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md)). +> **BDD schemas:** US and Functionality feature file templates — see [living-doc-bdd-schemas](../references/living-doc-bdd-schemas.md) ([remote](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-bdd-schemas.md)). 
-## Glossary alignment +## AC state vocabulary -**AC ID format:** `AC:<parent-id>-<nn>` — e.g. `AC:US-001-01`, `AC:US-001-02` +**AC states used in this skill:** `PLANNED` · `IN_REVIEW` · `ACTIVE` · `DEPRECATED` -**AC traceability** (required for living-doc feature files — placed above every `Scenario:` in `features/us/` and `features/functionalities/`): +Only `ACTIVE` ACs drive scenario generation. `PLANNED` and `DEPRECATED` ACs are skipped. -```gherkin -# AC:US-1-01 (v1.0.0 - Active) — customer places an order with a saved payment method -@AC:US-1-01 -Scenario: Customer successfully places an order -``` - -When a scenario covers only **one aspect** of a multi-aspect AC, encode it as a `/aspect:value` -param on the tag and mirror it in the comment: - -```gherkin -# AC:US-1-01 (v1.0.0 - Active) — displays {required field} on login screen | aspect: username input -@AC:US-1-01/aspect:username-input -Scenario: Login form shows the username input field -``` - -The `/param:value` format is extensible — additional params can be added after `/aspect:value`. -The `# AC:` comment provides human context (version, state, description, aspect). The `@AC:` Cucumber tag provides machine traceability for scripts and coverage reports. - -Only ACs with state `Active` or `Implemented` drive scenario generation. -ACs with state `Planned` or `Deprecated` are excluded from generation; note them in the coverage report. +**AC traceability format** — for the authoritative `# AC:` and `@AC:` annotation format, load `bdd-scenario-gen`. --- @@ -67,96 +54,60 @@ If PageObjects or step files are not available, generate scenarios with stub ste Load the User Story. 
Confirm: - ID follows `US-<nnn>` format -- Which ACs are eligible for generation (`Active` or `Implemented`) +- Which ACs are eligible for generation (`ACTIVE`) - ACs are atomic — each has one input condition and one observable outcome -Treat requests such as “write feature tests for US-007” as requests to generate BDD scenarios plus a coverage table for that User Story. - -If no ACs are `Active` or `Implemented`, do **not** generate empty or stub scenarios. Instead, -output a coverage report that lists every AC with its state-specific skip reason (`Planned`: -`skipped — not yet active`, `Deprecated`: `skipped — deprecated AC`) and advise the user to -re-run the scenario creator when an AC becomes `Active` or `Implemented`. +Treat requests such as "write feature tests for US-007" as requests to generate BDD scenarios plus a coverage table for that User Story. -### Step 2 — Map each AC to a scenario +If no ACs are `ACTIVE`, do **not** generate empty or stub scenarios. Instead, +output a coverage report that lists every AC with its state-specific skip reason (`PLANNED`: +`skipped — not yet active`, `DEPRECATED`: `skipped — deprecated AC`) and advise the user to +re-run the scenario creator when an AC becomes `ACTIVE`. -For each active AC, select the scenario pattern by AC type: -- `happy_path`: `Scenario:` or `Scenario Outline:` (if data-driven) -- `error`: `Scenario: <US title> — <error condition>`. If the AC text already gives a crisp business-facing failure title (for example, `Order rejected when payment card is declined`), prefer that exact title instead of mechanically prefixing the User Story title. -- `alternative`: `Scenario: <US title> — <alternative path>` +### Step 2 — Generate scenario skeletons -Generate a scenario for **every** active AC. +For each `ACTIVE` AC, generate the `# AC:` comment, `@AC:` tag, and `Scenario:` title with `...` as the step placeholder. Step bodies (Given/When/Then) are authored by `bdd-scenario-gen`. 
-Map Given-When-Then from the AC to existing step definitions — reuse exact step text where found. Keep all step text in domain/business language only; never mention HTTP, APIs, selectors, DOM details, databases, or other implementation mechanics. +Select the title by AC type: +- `happy_path`: `Scenario: <positive outcome>` +- `error`: `Scenario: <US title> — <error condition>` (prefer the crisp business-facing failure title from the AC if available) +- `alternative`: `Scenario: <US title> — <alternative path>` ```gherkin # AC:US-1-01 (v1.0.0 - Active) — customer places an order with a saved payment method @AC:US-1-01 Scenario: Customer successfully places an order - Given the customer has items in their cart and a saved payment method - When the customer confirms the order - Then the order is confirmed - And a confirmation email is sent to the customer - And the cart is emptied -``` - -### Step 3 — Implement missing step stubs + ... -For each step not found in existing step files, generate a named stub function in the -appropriate step file. Apply the following two-case protocol: - -**Case A — A PageObject method can implement the step:** - -Generate the full stub using the available method: - -``` -MISSING STEP: "Given the customer has items in their cart and a saved payment method" - PageObject candidate: CheckoutPage (FEAT-003) - Suggested step file: tests/steps/checkout_steps.py - Generated stub: - @given('the customer has items in their cart and a saved payment method') - def step_customer_has_cart_with_payment(context): - context.checkout_page = CheckoutPage(context.browser) - context.checkout_page.add_item_to_cart("SKU-100", quantity=1) - context.checkout_page.set_saved_payment_method() +# AC:US-1-02 (v1.0.0 - Active) — order is rejected when the payment card is declined +@AC:US-1-02 +Scenario: Order rejected when payment card is declined + ... 
``` -**Case B — No matching PageObject method exists for the step:** +### Step 3 — Hand off step bodies to bdd-scenario-gen -Generate a stub with a `NotImplementedError` failure guard and flag the gap to -`living-doc-pageobject-scan` (Maintain mode) so it can extend the PageObject: +The skeletons from Step 2 use `...` placeholders. To produce full Given/When/Then implementations, pass the generated feature file to `bdd-scenario-gen`. For step definition code, load `gherkin-step`. -``` -MISSING STEP + MISSING PAGEOBJECT METHOD: - "When the customer applies a promo code" - No matching method found in CheckoutPage (FEAT-003) - Generated stub (with failure guard): - @when('the customer applies a promo code') - def step_apply_promo_code(context): - raise NotImplementedError( - "Step not implemented: 'the customer applies a promo code'. " - "CheckoutPage (FEAT-003) is missing an 'apply_promo_code' method. " - "Run living-doc-pageobject-scan (Maintain mode) on FEAT-003 to add it." - ) - Action: invoke living-doc-pageobject-scan (Maintain mode) for the missing element -``` +Do not author step bodies in this skill. ### Step 4 — Validate AC coverage -Every `Active` or `Implemented` AC must map to at least one scenario. +Every `ACTIVE` AC must map to at least one scenario. The coverage report must list **every** AC on the User Story, including skipped ones. Use these skip reasons verbatim so the output is predictable and auditable: -- `Planned`: `skipped — not yet active` -- `Deprecated`: `skipped — deprecated AC` +- `PLANNED`: `skipped — not yet active` +- `DEPRECATED`: `skipped — deprecated AC` Run `scripts/coverage_report.py <living_doc_dir> <features_dir>` for a full coverage report. 
``` AC COVERAGE REPORT — US-001 - AC:US-001-01 (Active): ✅ covered by "Customer successfully places an order" - AC:US-001-02 (Active): ✅ covered by "Order rejected when payment card is declined" - AC:US-001-03 (Active): ❌ NOT COVERED — added to gap list - AC:US-001-04 (Planned): ⏭ skipped — not yet active - AC:US-001-05 (Deprecated): ⏭ skipped — deprecated AC + AC:US-001-01 (ACTIVE): ✅ covered by "Customer successfully places an order" + AC:US-001-02 (ACTIVE): ✅ covered by "Order rejected when payment card is declined" + AC:US-001-03 (ACTIVE): ❌ NOT COVERED — added to gap list + AC:US-001-04 (PLANNED): ⏭ skipped — not yet active + AC:US-001-05 (DEPRECATED): ⏭ skipped — deprecated AC ``` Use `scripts/coverage_report.py` to generate this report across all entities. @@ -198,33 +149,10 @@ Feature: Place an online order ... ``` -**Missing step report** — generated stub implementations grouped by step file; Case B stubs include `NotImplementedError` failure guards and flag missing PageObject methods for extension (see Step 3). - **Coverage table** — ACs with coverage status (use `scripts/coverage_report.py`). Append it immediately after the `.feature` code block in the response. --- -## Step reuse rules - -1. **Narrow to page scope first** — identify which PageObject the scenario's steps interact with. Only look in step definition files that already import or reference that PageObject; those are the most likely reuse candidates. -2. **Match by purpose, not just text** — read the step implementation body to confirm it performs the same business action. Two steps may have identical text but operate on different elements (e.g. a `fill` on `username-input` vs `search-input`). Only reuse if the purpose matches. -3. If a purpose-matching step exists, reuse it as-is; note the file it lives in. -4. Only if no match exists: write a new stub using the `gherkin-step` skill. If an existing step is close but not identical, suggest a parameter to generalise it rather than duplicating. 
-5. Never create duplicate step definitions — search before creating. - -## File placement - -| Step domain | Example step file | -|---|---| -| Authentication | `tests/steps/auth_steps.py` | -| Checkout / order | `tests/steps/checkout_steps.py` | -| Common / shared | `tests/steps/common_steps.py` | -| Domain-specific | `tests/steps/<domain>_steps.py` | - -> **Note:** Paths above are illustrative examples. Actual file locations depend on the project's repository structure. - ---- - ## Functionality scenarios When the source is a Functionality (`FUNC-<nnn>`) rather than a User Story, apply the same workflow but with these differences: @@ -282,5 +210,9 @@ Functionality scenarios are **not** unit tests written in Gherkin. Steps must st | Request | Correct skill | |---|---| -| Standalone Gherkin without a User Story | `gherkin-scenario` | +| Standalone Gherkin without a User Story | `bdd-scenario-gen` | | Writing step definition code | `gherkin-step` | + +**Ambiguous request — "create scenarios for US-007":** If the user does not specify whether they want skeleton structure or full step bodies, ask: +> "Do you want the living-doc feature file header and skeleton scenario titles (continue here in `living-doc-scenario-creator`), or full Given/When/Then scenario bodies (use `bdd-scenario-gen`)?" +This skill produces the reusable structural skeleton (header block + AC-tagged scenario titles); `bdd-scenario-gen` fills in the step bodies. diff --git a/skills/living-doc-scenario-creator/evals/evals.json b/skills/living-doc-scenario-creator/evals/evals.json index ab0425d..f198af4 100644 --- a/skills/living-doc-scenario-creator/evals/evals.json +++ b/skills/living-doc-scenario-creator/evals/evals.json @@ -60,11 +60,11 @@ "id": 5, "category": "negative", "prompt": "Write a standalone Gherkin scenario for testing login without a specific User Story.", - "expected_output": "Standalone Gherkin without a User Story is out of scope — routes to gherkin-scenario. 
This skill generates scenarios from User Story ACs; gherkin-scenario handles standalone or exploratory scenarios.", + "expected_output": "Standalone Gherkin without a User Story is out of scope — routes to bdd-scenario-gen. This skill generates scenarios from User Story ACs; bdd-scenario-gen handles standalone or exploratory scenarios.", "files": [], "expectations": [ "Does not generate a standalone scenario", - "Routes to gherkin-scenario", + "Routes to bdd-scenario-gen", "Explains the distinction: US-driven vs. standalone Gherkin" ] }, diff --git a/skills/living-doc-scenario-creator/evals/fixture-map.md b/skills/living-doc-scenario-creator/evals/fixture-map.md index 742ec1f..e8f963a 100644 --- a/skills/living-doc-scenario-creator/evals/fixture-map.md +++ b/skills/living-doc-scenario-creator/evals/fixture-map.md @@ -12,7 +12,7 @@ No fixture files for this skill. All evals use inline User Story/AC definitions | 2 | happy-path | _(none — inline AC list in prompt)_ | AC state filtering: Active → generated, Deprecated → skipped, Planned → skipped | | 3 | happy-path | _(none)_ | Case A step stub: PageObject method exists — full stub, no NotImplementedError | | 4 | regression | _(none)_ | Case B step stub: missing PageObject method — NotImplementedError + maintenance flag | -| 5 | negative | _(none)_ | Routing: standalone Gherkin without a US → gherkin-scenario | +| 5 | negative | _(none)_ | Routing: standalone Gherkin without a US → bdd-scenario-gen | | 6 | paraphrase | _(none)_ | "Write feature tests for US-nnn" → scenario generation request | | 7 | edge-case | _(none)_ | All ACs Planned → zero scenarios generated; coverage report with skip reasons | | 8 | output-format | _(none)_ | .feature file structure: @US_ID:, Feature: header, # AC: + @AC: per scenario | @@ -26,7 +26,7 @@ No fixture files for this skill. 
All evals use inline User Story/AC definitions | Routes to | Query count | |---|---| -| gherkin-scenario | 1 | +| bdd-scenario-gen | 1 | | gherkin-step | 1 | | living-doc-gap-finder | 1 | | gherkin-living-doc-sync | 1 | diff --git a/skills/living-doc-scenario-creator/evals/trigger-eval.json b/skills/living-doc-scenario-creator/evals/trigger-eval.json index 795c6d8..75684a8 100644 --- a/skills/living-doc-scenario-creator/evals/trigger-eval.json +++ b/skills/living-doc-scenario-creator/evals/trigger-eval.json @@ -9,7 +9,7 @@ {"id": 8, "query": "Generate Gherkin from user story US-012", "should_trigger": true, "reason": "'gherkin from user story' trigger phrase"}, {"id": 9, "query": "Create scenarios for US-007", "should_trigger": true, "reason": "'scenarios for US-' trigger phrase — explicitly mentions a US ID"}, {"id": 10, "query": "Generate a .feature file for the checkout flow", "should_trigger": true, "reason": "'generate .feature file' trigger phrase"}, - {"id": 11, "query": "Write standalone Gherkin scenarios for an exploratory test", "should_trigger": false, "reason": "Standalone Gherkin without a User Story — routes to gherkin-scenario"}, + {"id": 11, "query": "Write standalone Gherkin scenarios for an exploratory test", "should_trigger": false, "reason": "Standalone Gherkin without a User Story — routes to bdd-scenario-gen"}, {"id": 12, "query": "Implement the step definition for 'When the customer confirms the order'", "should_trigger": false, "reason": "Step definition implementation — routes to gherkin-step"}, {"id": 13, "query": "Write a unit test for the promo code calculation", "should_trigger": false, "reason": "Unit test request — out of scope for this toolkit (no test-unit-write skill defined)"}, {"id": 14, "query": "Find which User Stories have no Gherkin coverage at all", "should_trigger": false, "reason": "Finding doc gaps — routes to living-doc-gap-finder"}, diff --git a/skills/living-doc-update/SKILL.md b/skills/living-doc-update/SKILL.md index 
4d537c1..ac58b0f 100644 --- a/skills/living-doc-update/SKILL.md +++ b/skills/living-doc-update/SKILL.md @@ -12,7 +12,6 @@ description: > "change status of user story", "update feature registry". Does NOT trigger for: creating new entities (use living-doc-create-*), finding gaps (use living-doc-gap-finder), generating scenarios (use living-doc-scenario-creator). - license: Apache-2.0 compatibility: GitHub Copilot --- @@ -87,6 +86,8 @@ Rules: - Add `deprecated_code_commit` when the code was removed in a commit - Add `superseded_by` when a replacement entity exists - Flag any tests linked to the deprecated entity for update or removal +- If the deprecated entity has `ACTIVE` ACs with linked Gherkin scenarios, trigger + `gherkin-living-doc-sync` to propagate `@deprecated` and `@review-needed` tags to those scenarios ## Update Feature ownership or dependencies @@ -97,12 +98,13 @@ When a team changes ownership of a Feature, update the `owners` field and set `o When an AC is moved out of the current sprint but not permanently removed: -- Set `status: descoped` and add `descoped_at` (date) and `descoped_reason` fields — **do not delete the AC** (preserves audit trail) +- Keep `status: PLANNED` — state does not change for a deferral +- Add `descoped_at` (date) and `descoped_reason` fields — **do not delete the AC** (preserves audit trail) - Add `future_release` field if the work is planned for a later sprint - Flag any linked tests for `@skip` or `@pending` tagging ``` -AC:US-042-03 (v1.2.0 – descoped) +AC:US-042-03 (v1.2.0 – PLANNED) – Promo codes can be stacked and applied in defined priority order. 
– descoped_at: 2026-05-15 – descoped_reason: Promo stacking rule deferred — too complex for current sprint @@ -117,6 +119,8 @@ AC:US-042-03 (v1.2.0 – descoped) | Create a new Feature | `living-doc-create-feature` | | Create a new Functionality | `living-doc-create-functionality` | | Find gaps in living documentation | `living-doc-gap-finder` | +| AC modified, deprecated, or descoped — sync linked scenarios | `gherkin-living-doc-sync` | +| Assess impact of an AC change on Features and User Stories | `living-doc-impact-analysis` | ## Script — `scripts/validate_entity.py` diff --git a/skills/references/living-doc-bdd-schemas.md b/skills/references/living-doc-bdd-schemas.md new file mode 100644 index 0000000..121a448 --- /dev/null +++ b/skills/references/living-doc-bdd-schemas.md @@ -0,0 +1,408 @@ +# Living Documentation — BDD Schemas + +Templates and schemas for BDD automation files. Load this file when writing or validating: +- US or Functionality **feature file headers** (`features/us/`, `features/functionalities/`) +- **PageObject file headers** (full header or cross-reference header) +- **ExplorationFixture** entries in `seed.yaml` +- **manifest.json** `field_constraints` entries + +For entity definitions (IDs, status vocabulary, AC format, relationship diagram) see [living-doc-glossary](./living-doc-glossary.md). + +--- + +## US Feature File Header + +Header comment block at the top of every `features/us/us-<nnn>-<kebab>.feature` file. +Holds all US metadata and is mined during living documentation output generation. 
+ +```gherkin +# ============================================================================= +# LIVING DOC — US-<n> · <US Title> +# ============================================================================= +# source: https://github.com/<org>/<repo>/issues/<n> ← optional +# status: PLANNED | IN_REVIEW | ACTIVE | DEPRECATED +# business_value: +# - <bullet describing the business outcome> +# not_in_scope: ← optional +# - <item excluded from this US> +# preconditions: ← optional +# - <system state required before test> +# +# acceptance_criteria: +# +# AC:US-<n>-01 (v<version> - <State>) +# - <description of the AC> +# - <Aspect>: <value1>, <value2> ← optional; used for {placeholder} ACs +# +# AC:US-<n>-02 (v<version> - <State>) +# - <description of the AC> +# ============================================================================= + +@US_ID:US-<n> +Feature: <US Title> + As a <actor>, I can <capability>, so that <business outcome>. + + Background: ← optional + Given <shared precondition> + + # AC:US-<n>-01 (v<version> - <State>) — <AC description> + @AC:US-<n>-01 + Scenario: <scenario title> + ... +``` + +**Header fields:** + +| Field | Required | Purpose | +|---|---|---| +| `# source:` | Optional | Link to the original issue tracker entry or the pre-BDD living doc location | +| `# status:` | Yes | `PLANNED` · `IN_REVIEW` · `ACTIVE` · `DEPRECATED` | +| `# business_value:` | Yes | Why this User Story exists (bullets) | +| `# not_in_scope:` | Optional | Explicit exclusions | +| `# preconditions:` | Optional | System-level state required before test execution | +| `# acceptance_criteria:` | Yes | Full AC listing with IDs, versions, and states | +| `@US_ID:US-<n>` tag | Yes | Machine-parseable User Story ID (feature-level tag) | + +--- + +## Functionality Feature File Header + +Header comment block at the top of every `features/functionalities/<feat-kebab>/func-<nnn>-<kebab>.feature` file. 
+ +```gherkin +# ============================================================================= +# LIVING DOC — FUNC-<nnn> · <Feature Name> — <Functionality Name> +# ============================================================================= +# source: https://github.com/<org>/<repo>/issues/<n> ← optional +# status: PLANNED | IN_REVIEW | ACTIVE | DEPRECATED +# parent: FEAT-<nnn> +# func_type: component_state | component_action | button_action | +# field_validation | calculation | visibility | navigation_rule +# rationale: ← optional +# - <why this FUNC is scoped this way — business or design decision context> +# not_in_scope: ← optional +# - <exclusion> +# +# acceptance_criteria: +# +# AC:FUNC-<nnn>-01 (v<version> - <State>) +# - <description in business language — no data-cy IDs in AC text> +# +# AC:FUNC-<nnn>-02 (v<version> - <State>) +# - <description> +# ============================================================================= + +@FUNC_ID:FUNC-<nnn> +Feature: <Feature Name> — <Functionality Name> + <Purpose: one-to-two sentences describing what this FUNC covers, in business + language. Present only when purpose adds context beyond the title.> ← optional + + # No scenarios yet — uncovered ACs flagged by coverage_report.py. + # When adding scenarios: include both # AC:<id> comment and @AC:<id>[/param:value] tag above each Scenario. +``` + +**Header fields:** + +| Field | Required | Purpose | +|---|---|---| +| `# source:` | Optional | Link to the original issue tracker entry or the pre-BDD living doc location | +| `# status:` | Yes | `PLANNED` · `IN_REVIEW` · `ACTIVE` · `DEPRECATED` | +| `# parent:` | Yes | Parent Feature ID (`FEAT-<nnn>`) | +| `# func_type:` | Yes | Category of behavior this Functionality represents (see table below) | +| `# rationale:` | Optional | **Why** this FUNC is scoped the way it is — business context, a deliberate design decision, or a constraint that explains the boundary. Not for implementation notes. 
| +| `# not_in_scope:` | Optional | Explicit exclusions | +| `# acceptance_criteria:` | Yes | Full AC listing in business language — do not include `data-cy` IDs or implementation names in AC text | +| `@FUNC_ID:FUNC-<nnn>` tag | Yes | Machine-parseable Functionality ID (feature-level tag) | +| Feature description (below `Feature:`) | Optional | One-to-two sentence purpose in business language. Use when the title alone is not self-explanatory. | + +**`func_type` values:** + +| Value | What it documents | PageObject anchor | +|---|---|---| +| `component_state` | Visible state of elements on load (presence, enabled/disabled, default text) AND what a data-bound component renders per data state (populated, empty, error) | `constructor` locators, data-bearing locators | +| `component_action` | Observable response within a self-contained component to an internal interaction — no discrete button, no system-level side effect (e.g. live search, autocomplete, accordion, carousel, tab content) | Component input/state locators | +| `button_action` | Observable outcome(s) after a specific discrete control is triggered — may span multiple resulting steps (e.g. redirect, entity created, dialog opened) | `btn-*` locators | +| `field_validation` | Rule enforced on a single field's value — inline error, enabled state, accepted/rejected input | `input-*` locators | +| `calculation` | Value computed and displayed from one or more inputs, independent of form submission | Display-only locators | +| `visibility` | Element presence, content, or enabled state conditional on a runtime state — condition is optional context and may be role, prior action, data presence, or config (e.g. 
owner sees action buttons, section appears after step complete) | Any conditional locator | +| `navigation_rule` | When and where the app routes, driven by action or system state — only when routing has a distinct precondition or business rule | Route assertion | + +**Scoping rules:** + +- **One FUNC, one cause.** If two behaviors share a trigger, they are one FUNC with two ACs. If two behaviors have different triggers, they are two FUNCs. +- **`component_state`** — scope to a logical group, not individual elements. "Login form controls on load" is one FUNC. Do not write one FUNC per locator. For data-bound components, each distinct data state (populated / empty / error) is an AC on the same FUNC, not a separate FUNC. +- **`component_action`** — one FUNC per distinct component behavior. If the same component has multiple independent internal behaviors (live search AND column sort), they are separate FUNCs. +- **`button_action`** — one FUNC per distinct button. A button that produces multiple observable steps is still one FUNC; the steps become multiple ACs. Two buttons = two FUNCs. Form submission is `button_action` — the trigger is the submit control. +- **`field_validation`** — one FUNC per distinct validation rule, not one per field. The same rule applied to multiple fields = one FUNC with a `{field}` placeholder AC. +- **`calculation`** — only when the derived value is observable independently of a submission. If the result only appears after a form submit, it is an AC on the `button_action` FUNC. +- **`visibility`** — use when an element's presence or state depends on a condition. The condition is descriptive context in the AC, not a required field. Distinct from `component_state` (always-true on load) and `component_action` (response to interaction). +- **`navigation_rule`** — only for routing behaviors with a distinct precondition or business rule. 
A redirect that is always the result of a button action is an AC on that `button_action` FUNC, not a separate `navigation_rule`. + +> `test_type` (unit vs integration vs system) is NOT a FUNC header field — it belongs at scenario level as a tag (e.g. `@test_type:system`). + +--- + +## PageObject File Header + +Every PageObject file opens with a living-doc header block. Use this format so each file is self-describing and traceable without opening a separate registry. + +### Required fields + +| Field | Canonical values | +|---|---| +| `surface_type` | `UI` · `API` · `Service` · `Worker` · `Module` · `Library` | +| `route` | URL path — use `{param}` for dynamic segments | +| `owners` | Team name(s), comma-separated | +| `status` | `active` · `planned` · `candidate` · `deprecated` | +| `purpose` | One-to-two sentence description in business language | +| `user_stories` | `US-N` IDs, comma-separated — or `none` (triggers orphan warning in gap reports) | +| `functionalities` | `FUNC-N` IDs, comma-separated — or `none` (triggers a reminder to define FUNCs) | +| `external_dependencies` | Service or API names this surface calls — or `none` | +| `page-object` | Filename of this PageObject | + +**Optional fields:** + +| Field | When | +|---|---| +| `wizard-steps` | Multi-step wizard UI — list the named steps in order | +| `stub-reason` | `status: candidate` — one-to-two sentence statement of **why** the surface is not yet fully instrumented; treated as tech-debt resolvable by instrumenting the template and re-scanning | + +### Two header formats: Full vs Cross-reference + +A PageObject file uses one of two formats depending on whether it is the **primary surface owner** or a **secondary file** that implements part of a surface already owned elsewhere. + +| Situation | Format | +|---|---| +| One PageObject = one distinct navigable surface (URL or modal) | **Full header** | +| Multiple PageObjects share one URL (e.g. 
wizard steps, sub-pages, dialogs) — one file is the primary owner, the others are implementation helpers | **Cross-reference header** — secondary files only | + +**Rule:** exactly one file per Feature carries the full header. Every other file that contributes to the same Feature carries a cross-reference header with `parent-feat` pointing to the Feature ID. This keeps traceability fields (`user_stories`, `functionalities`, `external_dependencies`) in a single authoritative location. + +**Wizard example:** FEAT-042 (Account Setup Wizard) lives at one URL. `AccountSetupWizardPage.ts` is the primary file and carries the full header. `AccountSetupWizardProfilePage.ts`, `AccountSetupWizardPreferencesPage.ts`, and the other step files each carry a cross-reference header pointing `parent-feat: FEAT-042`. Adding a wizard step never requires editing the Feature registry or duplicating traceability data. + +--- + +**Full header example:** + +```typescript +/* ============================================================================= + * LIVING DOC — FEAT-042 · Account Setup Wizard + * ============================================================================= + * surface_type: UI + * route: /app/accounts/setup + * owners: Platform Team + * status: active + * wizard-steps: Profile · Preferences · Review · Confirm + * purpose: Multi-step wizard for creating and configuring a new account. + * user_stories: US-10, US-12 + * functionalities: FUNC-005, FUNC-006 + * external_dependencies: accounts-api + * page-object: AccountSetupWizardPage.ts + * ============================================================================= */ +``` + +--- + +**Cross-reference header required fields:** + +| Field | Canonical values | +|---|---| +| `parent-feat` | `FEAT-<nnn>` — ID of the primary Feature that owns this surface. 
**Required.** | +| `route` | URL path of this specific sub-surface — use `{param}` for dynamic segments | +| `owners` | Team name(s), comma-separated | +| `status` | `active` · `planned` · `candidate` · `deprecated` | +| `purpose` | One sentence: what this step or sub-surface does, in business language — no FEAT IDs | +| `page-object` | Filename of this PageObject | + +The following fields are **intentionally omitted** from the cross-reference header — they belong only on the primary Feature file: `surface_type`, `user_stories`, `functionalities`, `external_dependencies`. + +**Cross-reference header example:** + +```typescript +/* ============================================================================= + * LIVING DOC — FEAT-042 · Account Setup Wizard [cross-reference] + * ============================================================================= + * This file implements Step 1 (Profile) of the Account Setup Wizard. + * The authoritative Feature header is in AccountSetupWizardPage.ts. + * + * parent-feat: FEAT-042 + * route: /app/accounts/setup (wizard stays on this URL) + * owners: Platform Team + * status: active + * purpose: Step 1 (Profile) — user profile fields: display name, email address, + * and role selection. + * page-object: AccountSetupWizardProfilePage.ts + * ============================================================================= */ +``` + +### Where operational notes belong + +The PageObject **header block and class JSDoc are living-doc contracts** — they encode identity, traceability, and status. They are not a changelog, scan diary, or issue tracker. 
+ +| Information type | Correct location | NOT in | +|---|---|---| +| Missing `data-cy` attributes discovered during a scan | `manifest.json` → `coverage_gaps[]` | Header or class JSDoc | +| Reason a surface is not yet fully instrumented | Header field `stub-reason:` (one or two lines) | Free-text NOTE block | +| Proposed `data-cy` names for missing elements | `manifest.json` → `coverage_gaps[].suggestedDataCy` | Header or class body | +| Open issue reference (e.g. OI-08, P1) | `manifest.json` → `coverage_gaps[].note` | Header or class JSDoc | +| Scan date or scan session tag | `manifest.json` → `last_scanned` | Header or class JSDoc | +| `@stub` / `@pending` JSDoc tags on the class | — (use `status: candidate` + `stub-reason:`) | Class JSDoc | +| Implementation note explaining a locator strategy | Inline code comment on the locator or method | Header block | + +**`status: candidate` and `stub-reason:` as resolvable tech-debt** + +A `status: candidate` surface is **not a permanent state**. The surface is known, documented, and linked to User Stories; what is missing is template instrumentation (`data-cy` attributes). Resolution path: + +1. Instrument the component template with the `data-cy` values listed in `manifest.json` `coverage_gaps[]` (use the `data-cy-instrument` skill). +2. Re-scan — the scan session updates the PageObject locators. +3. Promote `status: candidate` → `status: active` and remove `stub-reason:`. + +`stub-reason:` records the factual state at time of discovery (≤ two lines). Must not contain: internal tool or file references, data-cy attribute names, action items, or scan session tags except as a factual date anchor. 
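
The `status: candidate` and `stub-reason:` rules above could be checked mechanically. The following is a minimal sketch, not part of the repository: the function name `check_stub_reason` and the idea of passing the header as a plain dict are assumptions made for illustration, and treating a leftover `stub-reason:` on a non-candidate surface as a problem is inferred from the "promote and remove" step rather than stated as a hard rule.

```python
from __future__ import annotations

# Lowercase status values listed for PageObject headers.
ALLOWED_STATUS = {"active", "planned", "candidate", "deprecated"}


def check_stub_reason(header: dict[str, str]) -> list[str]:
    """Return a list of problems found in a parsed PageObject header (hypothetical helper)."""
    problems: list[str] = []
    status = header.get("status", "")
    stub_reason = header.get("stub-reason")

    if status not in ALLOWED_STATUS:
        problems.append(f"status must be one of {sorted(ALLOWED_STATUS)}, got {status!r}")

    if status == "candidate":
        if not stub_reason:
            # Assumption: a candidate surface should always explain why it is a stub.
            problems.append("status: candidate should carry a stub-reason field")
        elif len(stub_reason.strip().splitlines()) > 2:
            problems.append("stub-reason should be at most two lines")
    elif stub_reason is not None:
        # Assumption: stub-reason is removed when the surface is promoted.
        problems.append("stub-reason present although status is not candidate")

    return problems
```

A check like this only covers length and presence; the content restrictions (no tool references, no action items) would still need human review.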
+ +### Common mistakes + +| Anti-pattern | Correct | +|---|---| +| `type: screen` | `surface_type: UI` | +| `owner: Team` | `owners: Team` (plural key) | +| `status: ACTIVE` | `status: active` (lowercase) | +| `status: STUB` | `status: candidate` + `stub-reason:` field | +| `functionalities:` omitted | `functionalities: none` | +| `user_stories:` omitted | `user_stories: none` | +| `external_dependencies:` omitted | `external_dependencies: none` | +| `parent-feat:` omitted from cross-reference file | Every secondary file for a shared Feature must declare `parent-feat` | +| `page-object:` omitted from cross-reference file | `page-object:` is required in both formats | +| `user_stories:` duplicated in cross-reference file | These fields live only on the primary Feature file | +| Multiple files claiming the same Feature without `[cross-reference]` tag | Only one file carries the full header | +| NOTE block in header about missing `data-cy` or open issues | Move to `manifest.json` `coverage_gaps[]`; keep only `stub-reason:` in the header | +| `@stub` or `@pending` on the class JSDoc | Use `status: candidate` + `stub-reason:` in the header instead | +| `purpose:` contains FEAT IDs | `purpose` must not contain FEAT IDs — ID is already in the title line and `parent-feat` | +| `purpose:` contains "NOT a …" or "Accessed via …" | Purpose describes what the surface does; exclude defensive statements and navigation instructions | +| `route:` contains a `data-cy` attribute name | `route:` is a URL path; locator IDs belong in the PageObject body | +| `wizard-steps:` contains `[scan: …]` tag | `wizard-steps:` is a clean ordered list; scan provenance belongs in `manifest.json` | +| Non-spec field added to header | Only use fields defined in the Required or Optional tables | +| Cross-reference prose mentions FUNC IDs or file names | Keep to one human-readable sentence: which step/sub-surface this file implements and where the authoritative header lives | +| `stub-reason:` contains 
action items or internal tool refs | `stub-reason:` states only the factual reason (≤ two lines) | + +--- + +## ExplorationFixture + +An **ExplorationFixture** is a named set of field→value declarations attached to a specific route in `seed.yaml`. It tells the exploration agent how to fill forms so it can enter wizards, open dialogs, and discover UI surfaces that are otherwise unreachable by passive observation. + +### value_class taxonomy + +| Class | Meaning | How the agent sources it | +|---|---|---| +| `copyable` | Value can be reused verbatim across runs — taken from an existing entity in the app | Navigate to the entity list; read an actual field value; replay it | +| `derived` | Must be transformed from an existing entity — e.g. append `-copy` to a domain name to avoid duplicate rejection | Read existing value; apply a known transformation rule | +| `fake` | Any syntactically valid value — real-world existence not required (e.g. a description, an email address) | Generate locally from label + placeholder + field type | +| `real-world` | Must exist in the real environment for submission to succeed (e.g. a Glue table name, a tenant ID, an AWS account ID) | Sourced from `seed.yaml form_fixtures` or user-provided via the user-assist pause (step 4 of the sourcing cascade) | + +### Sourcing cascade (applied in priority order) + +1. **`seed.yaml form_fixtures`** — pre-declared by user or written by the agent in a prior session. +2. **Existing app entities** — navigate to the entity list for this surface type; read a sample entity's actual field values; copy or derive. +3. **Field context inference** — read label + placeholder + tooltip + adjacent validation hint text → infer a plausible `fake` value (`"Domain name"` → `"E2E Test Domain"`, `email` field → `"test@example.com"`). +4. **User-assist pause** — if none of the above is sufficient for a `real-world` field → show the user the form, request the value, and record it back to `form_fixtures` with `source: user_provided`.
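
The cascade above can be sketched as a single resolution function. This is an illustrative sketch under stated assumptions, not the agent's actual implementation: `Resolution`, `resolve_field`, the `infer` and `ask_user` callbacks, and the `-copy` suffix as the only derivation rule are all hypothetical names and simplifications introduced here.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Resolution:
    value: str
    source: str  # inferred | user_provided | env_var | existing_entity


def resolve_field(
    field_data_cy: str,
    value_class: str,
    fixtures: dict[str, Resolution],
    existing_entities: dict[str, str],
    infer: Callable[[str], Optional[str]],
    ask_user: Callable[[str], str],
) -> Resolution:
    # 1. A pre-declared fixture always wins.
    if field_data_cy in fixtures:
        return fixtures[field_data_cy]

    # 2. Copy or derive from an existing app entity.
    if value_class in ("copyable", "derived") and field_data_cy in existing_entities:
        value = existing_entities[field_data_cy]
        if value_class == "derived":
            value = f"{value}-copy"  # assumed transformation rule
        return Resolution(value, "existing_entity")

    # 3. Infer a plausible value from field context (fake values only).
    if value_class == "fake":
        guess = infer(field_data_cy)
        if guess is not None:
            return Resolution(guess, "inferred")

    # 4. User-assist pause: ask, then record the answer for later sessions.
    resolved = Resolution(ask_user(field_data_cy), "user_provided")
    fixtures[field_data_cy] = resolved
    return resolved
```

Writing the user's answer back into `fixtures` mirrors the rule that a value obtained in a pause is saved to `form_fixtures`, so a later scan resolves it at step 1 without pausing again. In this sketch any field that falls through steps 1 to 3 ends at the pause, which is broader than the `real-world` case the text describes.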
+ +### Input validation probing + +After a successful form fill and submission, the agent probes validation behaviour on each text input: + +| Probe | Input | What to observe | +|---|---|---| +| Special characters | `<>'"&\` | Inline error, silent strip, or truncation | +| Oversized input | 200+ random characters | Character counter, truncation at max length, or rejection message | +| Wrong type | Alphabetic text in a numeric or date field | Inline validation message or field rejection | +| Duplicate detection | Identical value to a known existing entity name | Duplicate-rejected error message and its `data-cy` | + +Scan the form after each probe to capture `data-cy` error messages, character counters, and validation banners that are only visible during invalid input. These become source material for `field_validation` Functionality stubs. + +### seed.yaml schema + +```yaml +form_fixtures: + /auth/all-domains/create-domain/about: + + # Simple single value + - field_data_cy: domain-name + value_class: fake + value: "E2E Exploration Domain" + source: inferred # inferred | user_provided | env_var | existing_entity + + # Multiple values — agent treats each as a separate traversal branch. + # The first (label: default) is used for the happy path; labelled alternates + # are explored afterwards and may open different form sections or sub-routes. + - field_data_cy: domain-type + value_class: copyable + values: + - label: default + value: "BATCH" + source: existing_entity + - label: streaming-path # explores different form section + value: "STREAMING" + source: existing_entity + + # Conditional field — only filled when another field holds a specific value. 
+ - field_data_cy: stream-endpoint + value_class: real-world + value: env:TEST_STREAM_ENDPOINT + source: env_var + condition: + when_field: domain-type + when_value: STREAMING + + # Real-world field resolved via user-assist pause + - field_data_cy: tenant-id + value_class: real-world + value: env:TEST_TENANT_ID + source: env_var +``` + +**Field reference:** + +| Key | Required | Purpose | +|---|---|---| +| `field_data_cy` | Yes | `data-cy` attribute of the target input element | +| `value_class` | Yes | `copyable` / `derived` / `fake` / `real-world` | +| `value` | One of `value` or `values` | Shorthand for a single fill value | +| `values[]` | One of `value` or `values` | Array of labelled values; agent explores each as a separate traversal branch | +| `values[].label` | Yes (when `values` used) | Branch identifier; `default` marks the happy-path value | +| `values[].value` | Yes | The actual fill value | +| `source` | Yes | `inferred` \| `user_provided` \| `env_var` \| `existing_entity` | +| `condition` | No | Restricts the fill to a specific context | +| `condition.when_field` | Yes (when `condition` used) | `data-cy` of the controlling field | +| `condition.when_value` | Yes (when `condition` used) | Value the controlling field must hold | + +### manifest field_constraints schema + +Validation findings are stored per-field in the manifest `navigation_context.field_constraints` for the route: + +```json +"field_constraints": [ + { + "field_data_cy": "domain-name", + "max_length": 100, + "special_chars": "rejected", + "duplicate": "rejected-with-error", + "duplicate_error_data_cy": "domain-name-duplicate-error" + }, + { + "field_data_cy": "tenant-id", + "allowed_format": "alphanumeric", + "real_world_required": true + } +] +``` + +### Lifecycle + +| Event | What happens | +|---|---| +| First form encountered | Agent applies sourcing cascade; fills form using `default` values; explores labelled alternate branches for multi-value fields; probes validation | +| 
`condition` field not yet visible | Agent skips the field until the controlling field holds the required `when_value` | +| `real-world` field has no resolvable value | User-assist pause → user provides value → saved to `form_fixtures` with `source: user_provided` | +| Validation probe discovers new `data-cy` | Added to manifest `elements`; flagged as candidate for `field_validation` Functionality | +| Next scan session | Agent reads `form_fixtures` from `seed.yaml`; skips sourcing cascade for pre-declared fields | +| Constraint changes (e.g. max length increased) | Agent detects mismatch on re-probe; updates `field_constraints`; flags in `breaking-changes.md` | diff --git a/skills/references/living-doc-glossary.md b/skills/references/living-doc-glossary.md index 5f7cd74..efb3ad3 100644 --- a/skills/references/living-doc-glossary.md +++ b/skills/references/living-doc-glossary.md @@ -1,7 +1,9 @@ # Living Documentation — Shared Glossary -All living-doc-* skills operate on the same canonical entity model. -Use these definitions consistently across all skill invocations. +Core entity contracts: IDs, status vocabulary, relationships, and AC format. +All living-doc-* skills operate on this canonical entity model. + +For BDD file templates and schemas (feature file headers, PageObject headers, ExplorationFixture, seed.yaml), load [living-doc-bdd-schemas](./living-doc-bdd-schemas.md). --- @@ -21,63 +23,13 @@ so that <business outcome>. - Name: short imperative title (e.g. 
"Customer Login") - Owns: end-to-end **Acceptance Criteria (AC)** - Links to: one or more **Features** (system surfaces the User Story touches) -- Status: `planned | active | deprecated` -- Deprecation metadata (set when `status: deprecated`): +- Status: `PLANNED | IN_REVIEW | ACTIVE | DEPRECATED` +- Deprecation metadata (set when `status: DEPRECATED`): - `deprecated_at` — date the entity was deprecated - `deprecation_reason` — why it was deprecated - `superseded_by` — ID of the replacement entity (optional) -**US feature file header format** (as used in `features/us/us-<nnn>-<kebab>.feature`): - -The header comment block at the top of a US feature file holds all US metadata and is mined -during living documentation output generation. - -```gherkin -# ============================================================================= -# LIVING DOC — US-<n> · <US Title> -# ============================================================================= -# source: https://github.com/<org>/<repo>/issues/<n> ← optional -# status: planned | active | deprecated -# business_value: -# - <bullet describing the business outcome> -# not_in_scope: ← optional -# - <item excluded from this US> -# preconditions: ← optional -# - <system state required before test> -# -# acceptance_criteria: -# -# AC:US-<n>-01 (v<version> - <State>) -# - <description of the AC> -# - <Aspect>: <value1>, <value2> ← optional; used for {placeholder} ACs -# -# AC:US-<n>-02 (v<version> - <State>) -# - <description of the AC> -# ============================================================================= - -@US_ID:US-<n> -Feature: <US Title> - As a <actor>, I can <capability>, so that <business outcome>. - - Background: ← optional - Given <shared precondition> - - # AC:US-<n>-01 (v<version> - <State>) — <AC description> - @AC:US-<n>-01 - Scenario: <scenario title> - ... 
-``` - -**Header fields:** -| Field | Required | Purpose | -|---|---|---| -| `# source:` | Optional | Link to the original issue tracker entry or the pre-BDD living doc location | -| `# status:` | Yes | `planned` · `active` · `deprecated` | -| `# business_value:` | Yes | Why this User Story exists (bullets) | -| `# not_in_scope:` | Optional | Explicit exclusions | -| `# preconditions:` | Optional | System-level state required before test execution | -| `# acceptance_criteria:` | Yes | Full AC listing with IDs, versions, and states | -| `@US_ID:US-<n>` tag | Yes | Machine-parseable User Story ID (feature-level tag) | +> Feature file template: see [living-doc-bdd-schemas — US Feature File Header](./living-doc-bdd-schemas.md#us-feature-file-header). ### Feature @@ -95,8 +47,8 @@ A named system surface — the structural layer between User Stories and atomic - Owns: one or more **Functionalities** - Links to: one or more **User Stories** - `owners`: team or person responsible for this Feature -- Status: `planned | active | deprecated` -- Deprecation metadata (set when `status: deprecated`): +- Status: `PLANNED | IN_REVIEW | ACTIVE | DEPRECATED` +- Deprecation metadata (set when `status: DEPRECATED`): - `deprecated_at` — date the entity was deprecated - `deprecation_reason` — why it was deprecated - `superseded_by` — ID of the replacement entity (optional) @@ -104,89 +56,9 @@ A named system surface — the structural layer between User Stories and atomic - `owner_changed_at` — date of ownership transfer - `owner_change_reason` — reason for the transfer -#### UI surface — PageObject file header - -Every PageObject file opens with a living-doc header block that embeds the canonical Feature fields. Use this format so each file is self-describing and traceable without opening a separate registry. 
- -**Required fields:** - -| Field | Canonical values | -|---|---| -| `surface_type` | `UI` · `API` · `Service` · `Worker` · `Module` · `Library` | -| `route` | URL path — use `{param}` for dynamic segments | -| `owners` | Team name(s), comma-separated | -| `status` | `active` · `planned` · `candidate` · `deprecated` | -| `purpose` | One-to-two sentence description in business language | -| `user_stories` | `US-N` IDs, comma-separated — or `none` (triggers orphan warning in gap reports) | -| `functionalities` | `FUNC-N` IDs, comma-separated — or `none` (triggers a reminder to define FUNCs) | -| `external_dependencies` | Service or API names this surface calls — or `none` | -| `page-object` | Filename of this PageObject | - -**Optional fields (for specific surface types):** - -| Field | When | -|---|---| -| `wizard-steps` | Multi-step wizard UI — list the named steps in order | -| `stub-reason` | `status: candidate` — one-to-two sentence statement of **why** the surface is not yet fully instrumented; treated as tech-debt resolvable by instrumenting the template and re-scanning | - ---- - -#### Two header formats: Full header vs Cross-reference header - -A PageObject file uses one of two formats depending on whether it is the **primary surface owner** or a **secondary file** that implements part of a surface already owned elsewhere. - -**When to use each:** - -| Situation | Format | -|---|---| -| One PageObject = one distinct navigable surface (URL or modal) | **Full header** | -| Multiple PageObjects share one URL (e.g. wizard steps, sub-pages, dialogs) — one file is the primary owner, the others are implementation helpers | **Cross-reference header** — secondary files only | - -**Rule:** exactly one file per Feature carries the full header. Every other file that contributes to the same Feature carries a cross-reference header with `parent-feat` pointing to the Feature ID. 
This keeps traceability fields (`user_stories`, `functionalities`, `external_dependencies`) in a single authoritative location. - -**Wizard example:** FEAT-042 (Account Setup Wizard) lives at one URL. `AccountSetupWizardPage.ts` is the primary file and carries the full header. `AccountSetupWizardProfilePage.ts`, `AccountSetupWizardPreferencesPage.ts`, and the other step files each carry a cross-reference header pointing `parent-feat: FEAT-042`. Adding a wizard step never requires editing the Feature registry or duplicating traceability data. - ---- - -**Full header required fields:** - -| Field | Canonical values | -|---|---| -| `surface_type` | `UI` · `API` · `Service` · `Worker` · `Module` · `Library` | -| `route` | URL path — use `{param}` for dynamic segments | -| `owners` | Team name(s), comma-separated | -| `status` | `active` · `planned` · `candidate` · `deprecated` | -| `purpose` | One-to-two sentence description in business language | -| `user_stories` | `US-N` IDs, comma-separated — or `none` (triggers orphan warning in gap reports) | -| `functionalities` | `FUNC-N` IDs, comma-separated — or `none` (triggers a reminder to define FUNCs) | -| `external_dependencies` | Service or API names this surface calls — or `none` | -| `page-object` | Filename of this PageObject | - -**Full header example:** +> PageObject file header schemas (full header, cross-reference, operational notes, common mistakes): see [living-doc-bdd-schemas — PageObject File Header](./living-doc-bdd-schemas.md#pageobject-file-header). -```typescript -/* ============================================================================= - * LIVING DOC — FEAT-042 · Account Setup Wizard - * ============================================================================= - * surface_type: UI - * route: /app/accounts/setup - * owners: Platform Team - * status: active - * wizard-steps: Profile · Preferences · Review · Confirm - * purpose: Multi-step wizard for creating and configuring a new account. 
- * user_stories: US-10, US-12 - * functionalities: FUNC-005, FUNC-006 - * external_dependencies: accounts-api - * page-object: AccountSetupWizardPage.ts - * ============================================================================= */ -``` - ---- - -**Cross-reference header required fields:** - -| Field | Canonical values | -|---|---| +### Functionality (FUNC) | `parent-feat` | `FEAT-<nnn>` — ID of the primary Feature that owns this surface. **Required.** | | `route` | URL path of this specific sub-surface — use `{param}` for dynamic segments | | `owners` | Team name(s), comma-separated | @@ -286,8 +158,8 @@ An atomic, fast-testable behavior — a single verb phrase describing one respon Functionality, containing all AC-linked system-test scenarios once implemented. File name pattern: `func-<nnn>-<feature-name-kebab>-<behavior-kebab>.feature` e.g. `func-001-authentication-screen-credential-based-login.feature` -- Status: `planned | active | deprecated` -- Deprecation metadata (set when `status: deprecated`): +- Status: `PLANNED | IN_REVIEW | ACTIVE | DEPRECATED` +- Deprecation metadata (set when `status: DEPRECATED`): - `deprecated_at` — date the entity was deprecated - `deprecation_reason` — why it was deprecated - `superseded_by` — ID of the replacement entity (optional) @@ -295,77 +167,7 @@ An atomic, fast-testable behavior — a single verb phrase describing one respon Functionalities differ from User Story ACs: they are atomic and fast-testable, not end-to-end. A single User Story may trigger multiple Functionalities. 
-**Functionality feature file header format:** - -```gherkin -# ============================================================================= -# LIVING DOC — FUNC-<nnn> · <Feature Name> — <Functionality Name> -# ============================================================================= -# source: https://github.com/<org>/<repo>/issues/<n> ← optional -# status: planned | active | deprecated -# parent: FEAT-<nnn> -# func_type: component_state | component_action | button_action | -# field_validation | calculation | visibility | navigation_rule -# rationale: ← optional -# - <why this FUNC is scoped this way — business or design decision context> -# not_in_scope: ← optional -# - <exclusion> -# -# acceptance_criteria: -# -# AC:FUNC-<nnn>-01 (v<version> - <State>) -# - <description in business language — no data-cy IDs in AC text> -# -# AC:FUNC-<nnn>-02 (v<version> - <State>) -# - <description> -# ============================================================================= - -@FUNC_ID:FUNC-<nnn> -Feature: <Feature Name> — <Functionality Name> - <Purpose: one-to-two sentences describing what this FUNC covers, in business - language. Present only when purpose adds context beyond the title.> ← optional - - # No scenarios yet — uncovered ACs flagged by coverage_report.py. - # When adding scenarios: include both # AC:<id> comment and @AC:<id>[/param:value] tag above each Scenario. -``` - -**Header fields:** -| Field | Required | Purpose | -|---|---|---| -| `# source:` | Optional | Link to the original issue tracker entry or the pre-BDD living doc location | -| `# status:` | Yes | `planned` · `active` · `deprecated` | -| `# parent:` | Yes | Parent Feature ID (`FEAT-<nnn>`) | -| `# func_type:` | Yes | Category of behavior this Functionality represents (see table below) | -| `# rationale:` | Optional | **Why** this FUNC is scoped the way it is — business context, a deliberate design decision, or a constraint that explains the boundary. 
Not for implementation notes (how something works internally). | -| `# not_in_scope:` | Optional | Explicit exclusions | -| `# acceptance_criteria:` | Yes | Full AC listing in business language — do not include `data-cy` IDs or implementation names in AC text | -| `@FUNC_ID:FUNC-<nnn>` tag | Yes | Machine-parseable Functionality ID (feature-level tag) | -| Feature description (below `Feature:`) | Optional | One-to-two sentence purpose in business language. Use when the title alone is not self-explanatory. Replaces `# purpose:` — not a header field. | - -**`func_type` values:** - -| Value | What it documents | PageObject anchor | -|---|---|---| -| `component_state` | Visible state of elements on load (presence, enabled/disabled, default text) AND what a data-bound component renders per data state (populated, empty, error) | `constructor` locators, data-bearing locators | -| `component_action` | Observable response within a self-contained component to an internal interaction — no discrete button, no system-level side effect (e.g. live search, autocomplete, accordion, carousel, tab content) | Component input/state locators | -| `button_action` | Observable outcome(s) after a specific discrete control is triggered — may span multiple resulting steps (e.g. redirect, entity created, dialog opened) | `btn-*` locators | -| `field_validation` | Rule enforced on a single field's value — inline error, enabled state, accepted/rejected input | `input-*` locators | -| `calculation` | Value computed and displayed from one or more inputs, independent of form submission | Display-only locators | -| `visibility` | Element presence, content, or enabled state conditional on a runtime state — condition is optional context and may be role, prior action, data presence, or config (e.g. 
owner sees action buttons, section appears after step complete) | Any conditional locator | -| `navigation_rule` | When and where the app routes, driven by action or system state — only when routing has a distinct precondition or business rule | Route assertion | - -**Scoping rules:** - -- **One FUNC, one cause.** If two behaviors share a trigger, they are one FUNC with two ACs. If two behaviors have different triggers, they are two FUNCs. -- **`component_state`** — scope to a logical group, not individual elements. "Login form controls on load" is one FUNC. Do not write one FUNC per locator. For data-bound components, each distinct data state (populated / empty / error) is an AC on the same FUNC, not a separate FUNC. -- **`component_action`** — one FUNC per distinct component behavior. If the same component has multiple independent internal behaviors (live search AND column sort), they are separate FUNCs. -- **`button_action`** — one FUNC per distinct button. A button that produces multiple observable steps is still one FUNC; the steps become multiple ACs. Two buttons = two FUNCs. Form submission is `button_action` — the trigger is the submit control. -- **`field_validation`** — one FUNC per distinct validation rule, not one per field. The same rule applied to multiple fields = one FUNC with a `{field}` placeholder AC. -- **`calculation`** — only when the derived value is observable independently of a submission. If the result only appears after a form submit, it is an AC on the `button_action` FUNC. -- **`visibility`** — use when an element's presence or state depends on a condition. The condition is descriptive context in the AC, not a required field. Distinct from `component_state` (always-true on load) and `component_action` (response to interaction). -- **`navigation_rule`** — only for routing behaviors with a distinct precondition or business rule. 
A redirect that is always the result of a button action is an AC on that `button_action` FUNC, not a separate `navigation_rule`. - -> `test_type` (unit vs integration vs system) is NOT a FUNC header field — it belongs at scenario level as a tag (e.g. `@test_type:system`). +> Feature file template and `func_type` values: see [living-doc-bdd-schemas — Functionality Feature File Header](./living-doc-bdd-schemas.md#functionality-feature-file-header). ### Acceptance Criterion (AC) @@ -385,13 +187,13 @@ AC:<parent-id>-<nn> (v<version> - <State>) - Rationale: <business context, policy reference, or design decision> ← optional ``` -State values: `Planned | Implemented | Active | Deprecated` +State values: `PLANNED | IN_REVIEW | ACTIVE | DEPRECATED` **Scenario traceability:** living-doc scenarios (US and Functionality feature files) carry two complementary annotations — a human-readable `# AC:` comment and a machine-readable `@AC:` tag: ```gherkin -# AC:US-1-01 (v1.0.0 - Active) — customer places an order with a saved payment method +# AC:US-1-01 (v1.0.0 - ACTIVE) — customer places an order with a saved payment method @AC:US-1-01 Scenario: Customer successfully places an order ... @@ -401,7 +203,7 @@ When a scenario covers only **one aspect** of a multi-aspect AC, encode the aspe the `@AC:` tag using the `/param:value` param syntax, and mirror it in the comment: ```gherkin -# AC:US-1-01 (v1.0.0 - Active) — displays {required field} on login screen | aspect: username input +# AC:US-1-01 (v1.0.0 - ACTIVE) — displays {required field} on login screen | aspect: username input @AC:US-1-01/aspect:username-input Scenario: Login form shows the username input field ... 
@@ -410,8 +212,8 @@ Scenario: Login form shows the username input field Multiple ACs — one comment + tag pair per AC: ```gherkin -# AC:US-1-01 (v1.0.0 - Active) — invalid credentials show an error message -# AC:US-1-02 (v1.0.0 - Active) — account lockout after 3 failed attempts +# AC:US-1-01 (v1.0.0 - ACTIVE) — invalid credentials show an error message +# AC:US-1-02 (v1.0.0 - ACTIVE) — account lockout after 3 failed attempts @AC:US-1-01 @AC:US-1-02 @Regression @@ -438,13 +240,13 @@ Additional `/param:value` segments can be appended as needed — the format is o Deprecated ACs include a removal note: ``` -AC:<parent-id>-<nn> (v<version> – Deprecated – removal planned v<version>) +AC:<parent-id>-<nn> (v<version> – DEPRECATED – removal planned v<version>) ``` -**Descoped ACs** (deferred mid-sprint — state stays `Planned`): +**Descoped ACs** (deferred mid-sprint — state stays `PLANNED`): ``` -AC:<parent-id>-<nn> (v<version> – Planned) +AC:<parent-id>-<nn> (v<version> – PLANNED) – <description> – descoped_at: <date> ← date AC was deferred out of the current sprint – descoped_reason: <text> @@ -454,15 +256,15 @@ AC:<parent-id>-<nn> (v<version> – Planned) **User Story AC examples** (in the `# Acceptance Criteria:` file header block): ``` -AC:US-001-01 (v1.0.0 - Active) +AC:US-001-01 (v1.0.0 - ACTIVE) - The login screen displays {required field}. - Required field: username input, password input, login button - Rationale: Accessibility standard — all interactive controls must be visible on load. -AC:US-001-02 (v1.1.0 - Active) +AC:US-001-02 (v1.1.0 - ACTIVE) - An inline field validation message is shown when invalid credentials are submitted. -AC:US-001-03 (v2.1.0 - Deprecated - removal planned v3.0.0) +AC:US-001-03 (v2.1.0 - DEPRECATED - removal planned v3.0.0) - A "Remember me" checkbox retains the session across browser restarts. - Rationale: Deprecated due to security policy change in v2.0 — persistent sessions no longer permitted. 
``` @@ -470,144 +272,21 @@ AC:US-001-03 (v2.1.0 - Deprecated - removal planned v3.0.0) **Functionality AC examples** (in the `# Acceptance Criteria:` file header block): ``` -AC:FUNC-001-01 (v1.0.0 - Active) +AC:FUNC-001-01 (v1.0.0 - ACTIVE) - Returns valid=true when the password satisfies all complexity rules. -AC:FUNC-001-02 (v1.0.0 - Active) +AC:FUNC-001-02 (v1.0.0 - ACTIVE) - Raises {error code} when the credential check fails. - Error code: INVALID_PASSWORD, USER_NOT_FOUND, ACCOUNT_LOCKED - Rationale: Distinct error codes per failure reason, required by the global auth error contract. -AC:FUNC-001-03 (v1.0.0 - Active) +AC:FUNC-001-03 (v1.0.0 - ACTIVE) - Rejects passwords shorter than 8 characters. ``` --- -## ExplorationFixture - -An **ExplorationFixture** is a named set of field→value declarations attached to a specific route in `seed.yaml`. It tells the exploration agent how to fill forms so it can enter wizards, open dialogs, and discover UI surfaces that are otherwise unreachable by passive observation. - -### value_class taxonomy - -| Class | Meaning | How the agent sources it | -|---|---|---| -| `copyable` | Value can be reused verbatim across runs — taken from an existing entity in the app | Navigate to the entity list; read an actual field value; replay it | -| `derived` | Must be transformed from an existing entity — e.g. append `-copy` to a domain name to avoid duplicate rejection | Read existing value; apply a known transformation rule | -| `fake` | Any syntactically valid value — real-world existence not required (e.g. a description, an email address) | Generate locally from label + placeholder + field type | -| `real-world` | Must exist in the real environment for submission to succeed (e.g. a Glue table name, a tenant ID, an AWS account ID) | Sourced from `seed.yaml form_fixtures` or user-provided via Source E pause | - -### Sourcing cascade (applied in priority order) - -1. 
**`seed.yaml form_fixtures`** — pre-declared by user or written by the agent in a prior session. -2. **Existing app entities** — navigate to the entity list for this surface type; read a sample entity's actual field values; copy or derive. -3. **Field context inference** — read label + placeholder + tooltip + adjacent validation hint text → infer a plausible `fake` value (`"Domain name"` → `"E2E Test Domain"`, `email` field → `"test@example.com"`). -4. **User-assist pause** — none of the above is sufficient for a `real-world` field → show user the form, request the value, record it back to `form_fixtures` with `source: user_provided`. - -### Input validation probing - -After a successful form fill and submission, the agent probes validation behaviour on each text input: - -| Probe | Input | What to observe | -|---|---|---| -| Special characters | `<>'"&\` | Inline error, silent strip, or truncation | -| Oversized input | 200+ random characters | Character counter, truncation at max length, or rejection message | -| Wrong type | Alphabetic text in a numeric or date field | Inline validation message or field rejection | -| Duplicate detection | Identical value to a known existing entity name | Duplicate-rejected error message and its `data-cy` | - -Scan the form after each probe — run the core scan and elements-without-data-cy scripts to capture `data-cy` error messages, character counters, and validation banners that are only visible during invalid input. These become source material for `field_validation` Functionality stubs. - -### seed.yaml schema - -A fixture entry uses either a single `value` shorthand (simple fields) or a `values[]` array -(multi-branch fields). A `condition` restricts the field to a specific traversal context. 
- -```yaml -form_fixtures: - /auth/all-domains/create-domain/about: - - # Simple single value - - field_data_cy: domain-name - value_class: fake - value: "E2E Exploration Domain" - source: inferred # inferred | user_provided | env_var | existing_entity - - # Multiple values — agent treats each as a separate traversal branch. - # The first (label: default) is used for the happy path; labelled alternates - # are explored afterwards and may open different form sections or sub-routes. - - field_data_cy: domain-type - value_class: copyable - values: - - label: default - value: "BATCH" - source: existing_entity - - label: streaming-path # explores different form section - value: "STREAMING" - source: existing_entity - - # Conditional field — only filled when another field holds a specific value. - # Useful for fields that appear or become mandatory based on a prior selection. - - field_data_cy: stream-endpoint - value_class: real-world - value: env:TEST_STREAM_ENDPOINT - source: env_var - condition: - when_field: domain-type - when_value: STREAMING - - # Real-world field resolved via user-assist pause - - field_data_cy: tenant-id - value_class: real-world - value: env:TEST_TENANT_ID - source: env_var -``` - -**Field reference:** - -| Key | Required | Purpose | -|---|---|---| -| `field_data_cy` | Yes | `data-cy` attribute of the target input element | -| `value_class` | Yes | `copyable` / `derived` / `fake` / `real-world` | -| `value` | One of `value` or `values` | Shorthand for a single fill value | -| `values[]` | One of `value` or `values` | Array of labelled values; agent explores each as a separate traversal branch | -| `values[].label` | Yes (when `values` used) | Branch identifier; `default` marks the happy-path value | -| `values[].value` | Yes | The actual fill value | -| `source` | Yes | `inferred` \| `user_provided` \| `env_var` \| `existing_entity` | -| `condition` | No | Restricts the fill to a specific context | -| `condition.when_field` | Yes (when 
`condition` used) | `data-cy` of the controlling field | -| `condition.when_value` | Yes (when `condition` used) | Value the controlling field must hold | - -### manifest field_constraints schema - -Validation findings are stored per-field in the manifest `navigation_context.field_constraints` for the route: - -```json -"field_constraints": [ - { - "field_data_cy": "domain-name", - "max_length": 100, - "special_chars": "rejected", - "duplicate": "rejected-with-error", - "duplicate_error_data_cy": "domain-name-duplicate-error" - }, - { - "field_data_cy": "tenant-id", - "allowed_format": "alphanumeric", - "real_world_required": true - } -] -``` - -### Lifecycle - -| Event | What happens | -|---|---| -| First form encountered | Agent applies sourcing cascade; fills form using `default` values; explores labelled alternate branches for multi-value fields; probes validation | -| `condition` field not yet visible | Agent skips the field until the controlling field holds the required `when_value` | -| `real-world` field has no resolvable value | User-assist pause → user provides value → saved to `form_fixtures` with `source: user_provided` | -| Validation probe discovers new `data-cy` | Added to manifest `elements`; flagged as candidate for `field_validation` Functionality | -| Next scan session | Agent reads `form_fixtures` from `seed.yaml`; skips sourcing cascade for pre-declared fields | -| Constraint changes (e.g. max length increased) | Agent detects mismatch on re-probe; updates `field_constraints`; flags in `breaking-changes.md` | +> ExplorationFixture taxonomy, seed.yaml schema, and manifest.field_constraints schema: see [living-doc-bdd-schemas — ExplorationFixture](./living-doc-bdd-schemas.md#explorationfixture). --- From 3fd7e501ae86ba4710d462891b5114bc2f93a1bd Mon Sep 17 00:00:00 2001 From: miroslavpojer <miroslav.pojer@absa.africa> Date: Sat, 30 May 2026 21:15:10 +0200 Subject: [PATCH 28/35] Skill gap reduction. 
--- .../evals/living-doc-bdd-copilot/evals.json | 20 +- .../living-doc-bdd-copilot/fixture-map.md | 5 +- .../living-doc-bdd-copilot/trigger-eval.json | 6 +- .../evals/living-doc-copilot/evals.json | 178 ------- .../evals/living-doc-copilot/fixture-map.md | 33 -- .../living-doc-copilot/trigger-eval.json | 22 - .../agents/living-doc-bdd-copilot.agent.md | 358 +++++++------- .github/agents/living-doc-copilot.agent.md | 234 --------- README.md | 8 +- docs/README.md | 5 +- docs/getting-started.md | 10 +- docs/guides/agent-design.md | 2 +- docs/guides/living-doc-bdd-copilot.md | 43 +- docs/guides/living-doc-copilot.md | 125 +---- docs/testing/agent-testing.md | 2 +- skills/bdd-explore/SKILL.md | 175 ------- skills/bdd-maintain/SKILL.md | 64 +-- skills/bdd-scenario-gen/SKILL.md | 225 --------- skills/data-cy-instrument/SKILL.md | 19 +- skills/gherkin-living-doc-sync/SKILL.md | 8 +- .../gherkin-living-doc-sync/evals/evals.json | 4 +- .../evals/fixture-map.md | 4 +- .../evals/trigger-eval.json | 4 +- skills/gherkin-step/SKILL.md | 6 +- skills/gherkin-step/evals/evals.json | 4 +- skills/gherkin-step/evals/trigger-eval.json | 2 +- .../living-doc-create-functionality/SKILL.md | 4 +- skills/living-doc-gap-finder/SKILL.md | 3 +- skills/living-doc-pageobject-scan/SKILL.md | 463 +++++++++--------- skills/living-doc-scenario-creator/SKILL.md | 306 +++++++----- .../evals/evals.json | 4 +- .../evals/fixture-map.md | 3 +- .../evals/trigger-eval.json | 2 +- 33 files changed, 658 insertions(+), 1693 deletions(-) delete mode 100644 .github/agents/evals/living-doc-copilot/evals.json delete mode 100644 .github/agents/evals/living-doc-copilot/fixture-map.md delete mode 100644 .github/agents/evals/living-doc-copilot/trigger-eval.json delete mode 100644 .github/agents/living-doc-copilot.agent.md delete mode 100644 skills/bdd-explore/SKILL.md delete mode 100644 skills/bdd-scenario-gen/SKILL.md diff --git a/.github/agents/evals/living-doc-bdd-copilot/evals.json 
b/.github/agents/evals/living-doc-bdd-copilot/evals.json index cada06e..38edb9e 100644 --- a/.github/agents/evals/living-doc-bdd-copilot/evals.json +++ b/.github/agents/evals/living-doc-bdd-copilot/evals.json @@ -19,14 +19,14 @@ "id": 2, "category": "happy-path", "prompt": "Crawl the webapp and generate a PageObject for the checkout screen at /checkout.", - "expected_output": "Agent navigates to /checkout via MCP Playwright. Takes a snapshot and identifies interactive elements: promo code input, confirm order button, error banner. Generates CheckoutPage with: file-level living-doc: FEAT-<nnn> | /checkout comment, ALL_CAPS selector constants using data-testid preference, __init__ or constructor taking a page parameter, and method stubs for each interactive element. Adds the surface to manifest.json. If no matching Feature entity exists, hands off to @living-doc-copilot to create FEAT-<nnn>. Flags any element using positional CSS selectors as fragile.", + "expected_output": "Agent navigates to /checkout via MCP Playwright. Takes a snapshot and identifies interactive elements: promo code input, confirm order button, error banner. Generates CheckoutPage with: file-level living-doc: FEAT-<nnn> | /checkout comment, ALL_CAPS selector constants using data-testid preference, __init__ or constructor taking a page parameter, and method stubs for each interactive element. Adds the surface to manifest.json. If no matching Feature entity exists, loads `living-doc-create-feature` skill to create FEAT-<nnn> before continuing. 
Flags any element using positional CSS selectors as fragile.", "files": [], "expectations": [ "Uses MCP Playwright to navigate and snapshot the page", "Generates CheckoutPage with data-testid selector preference", "File-level living-doc: FEAT-nnn | /checkout comment", "Adds entry to manifest.json", - "Hands off to @living-doc-copilot for missing Feature entities", + "Loads living-doc-create-feature for missing Feature entities", "Flags positional CSS selectors as fragile" ] }, @@ -78,12 +78,12 @@ "id": 6, "category": "negative", "prompt": "Create a User Story for the guest checkout capability.", - "expected_output": "Creating living doc catalog entities is out of scope for this agent — hands off to @living-doc-copilot. @living-doc-bdd-copilot owns the automation layer (PageObjects, scenarios, step definitions); @living-doc-copilot owns the catalog layer (User Stories, Features, Functionalities).", + "expected_output": "Agent switches to catalog-operations mode and loads `living-doc-create-user-story` skill to create a User Story for guest checkout. This agent handles both catalog management and BDD automation.", "files": [], "expectations": [ - "Does not create a User Story", - "Routes to @living-doc-copilot", - "Explains the catalog vs automation layer boundary" + "Loads living-doc-create-user-story skill", + "Creates User Story in catalog-operations mode", + "Does not hand off to another agent" ] }, { @@ -104,7 +104,7 @@ "id": 8, "category": "edge-case", "prompt": "REMOVE mode — the legacy promo code feature has been removed from the product. Clean up the BDD artifacts.", - "expected_output": "Agent enters REMOVE mode. Identifies all .feature files whose scenarios carry # AC: tags matching the removed promo code Feature/US IDs. Finds PageObjects referenced only by those scenarios. Finds step definitions used only by those scenarios. Presents the complete deletion list to the user for confirmation before touching any file. 
After confirmation: removes confirmed files, updates manifest.json to remove the deprecated entries. Flags the linked US/AC entities in the catalog as deprecation candidates and hands off to @living-doc-copilot.", + "expected_output": "Agent enters REMOVE mode. Identifies all .feature files whose scenarios carry # AC: tags matching the removed promo code Feature/US IDs. Finds PageObjects referenced only by those scenarios. Finds step definitions used only by those scenarios. Presents the complete deletion list to the user for confirmation before touching any file. After confirmation: removes confirmed files, updates manifest.json to remove the deprecated entries. Loads `living-doc-update` skill to deprecate the linked US/AC entities in the catalog.", "files": [], "expectations": [ "Identifies all .feature file scenarios linked to the removed Feature via # AC: tags", @@ -112,7 +112,7 @@ "Presents the full deletion list for user confirmation before any file is touched", "Removes only confirmed files — does not auto-delete", "Updates manifest.json to remove deprecated entries", - "Flags catalog entities for deprecation and hands off to @living-doc-copilot" + "Loads living-doc-update skill to deprecate catalog entities" ] }, { @@ -160,14 +160,14 @@ "id": 12, "category": "output-format", "prompt": "After scanning /login and generating a LoginPage, show me what the manifest.json entry for /login looks like.", - "expected_output": "The manifest.json entry for /login includes: pageobject_path (path to the generated LoginPage file), feature_id (FEAT-<nnn> or FEAT-UNKNOWN if unlinked), last_scanned (ISO timestamp), elements (list of discovered elements with data_cy and tag), coverage_gaps (empty list initially), and navigation_context with prerequisites, navigation_steps, data_requirements, auth_role, and notes. 
The feature_id is FEAT-UNKNOWN if no matching Feature entity exists in the living doc — flag this route as 'needs Feature entity' and hand off to @living-doc-copilot.", + "expected_output": "The manifest.json entry for /login includes: pageobject_path (path to the generated LoginPage file), feature_id (FEAT-<nnn> or FEAT-UNKNOWN if unlinked), last_scanned (ISO timestamp), elements (list of discovered elements with data_cy and tag), coverage_gaps (empty list initially), and navigation_context with prerequisites, navigation_steps, data_requirements, auth_role, and notes. The feature_id is FEAT-UNKNOWN if no matching Feature entity exists in the living doc — flag this route as 'needs Feature entity' and load `living-doc-create-feature` skill to create it.", "files": [], "expectations": [ "manifest.json entry has: pageobject_path, feature_id, last_scanned, elements, coverage_gaps, navigation_context", "feature_id is FEAT-UNKNOWN if no matching Feature entity exists", "last_scanned is an ISO timestamp", "navigation_context includes prerequisites, navigation_steps, auth_role", - "Missing Feature entity is flagged for @living-doc-copilot handoff" + "Missing Feature entity triggers living-doc-create-feature skill load" ] }, { diff --git a/.github/agents/evals/living-doc-bdd-copilot/fixture-map.md b/.github/agents/evals/living-doc-bdd-copilot/fixture-map.md index 9573e21..54cd1b1 100644 --- a/.github/agents/evals/living-doc-bdd-copilot/fixture-map.md +++ b/.github/agents/evals/living-doc-bdd-copilot/fixture-map.md @@ -9,7 +9,7 @@ | 3 | happy-path | Scenario generation from US ACs | — | | 4 | regression | RE-SCAN mode — selector drift detection and repair | — | | 5 | regression | HEALING mode — broken step definitions | — | -| 6 | negative | User Story creation request → route to @living-doc-copilot | — | +| 6 | negative | Unit test request → @sdet-copilot | — | | 7 | paraphrase | "fix failing tests" → HEALING mode trigger | — | | 8 | regression | REMOVE mode — full feature 
removal with pre-deletion checklist | — | | 9 | regression | Partial state rule: seed.yaml present, manifest.json absent → first run | — | @@ -24,10 +24,7 @@ | 24 total | 20 true | 4 false | False cases: -- `create a User Story` → @living-doc-copilot - `write a unit test` → @sdet-copilot -- `update AC state` → @living-doc-copilot - `TypeScript quality gate` → @quality-gate-copilot (out of scope) -- `update AC on US-007` → @living-doc-copilot > No fixture files — all evals use inline prompt/expected_output; agent behavior is assessed against the agent.md operating rules and skill definitions. diff --git a/.github/agents/evals/living-doc-bdd-copilot/trigger-eval.json b/.github/agents/evals/living-doc-bdd-copilot/trigger-eval.json index 7b54d2b..94beb6f 100644 --- a/.github/agents/evals/living-doc-bdd-copilot/trigger-eval.json +++ b/.github/agents/evals/living-doc-bdd-copilot/trigger-eval.json @@ -15,12 +15,12 @@ {"id": 14, "query": "What is the scenario coverage for US-007?", "should_trigger": true, "reason": "'scenario coverage' trigger phrase"}, {"id": 15, "query": "Write the step definitions for the checkout scenarios", "should_trigger": true, "reason": "'step definitions' trigger phrase"}, {"id": 16, "query": "Generate Gherkin from user story US-003", "should_trigger": true, "reason": "'gherkin from user story' trigger phrase"}, - {"id": 17, "query": "Create a User Story for the loyalty points redemption feature", "should_trigger": false, "reason": "Living doc catalog entity creation — routes to @living-doc-copilot"}, + {"id": 17, "query": "Create a User Story for the loyalty points redemption feature", "should_trigger": true, "reason": "Catalog entity creation is handled by this agent in catalog-operations mode"}, {"id": 18, "query": "Write a unit test for the discount calculation function", "should_trigger": false, "reason": "Unit test authoring — out of scope for this toolkit (no @sdet-copilot agent defined)"}, - {"id": 19, "query": "Update the AC state on 
US-007-02 to DEPRECATED", "should_trigger": false, "reason": "Catalog entity state update — routes to @living-doc-copilot"}, + {"id": 19, "query": "Update the AC state on US-007-02 to DEPRECATED", "should_trigger": true, "reason": "Catalog entity state update is handled by this agent in catalog-operations mode"}, {"id": 20, "query": "Run the TypeScript quality gate for the frontend", "should_trigger": false, "reason": "Quality gate execution — out of scope for this agent"}, {"id": 21, "query": "The manifest.json is missing — start a first exploration run from the seed file", "should_trigger": true, "reason": "Partial state: seed present, manifest absent → first exploration run — 'scan webapp' pattern"}, {"id": 22, "query": "The seed.yaml has literal credentials — is that correct?", "should_trigger": true, "reason": "Credential safety rule enforcement during seed assembly — BDD session setup task"}, {"id": 23, "query": "I've hit a guided traversal point — the checkout wizard needs a delivery zone code", "should_trigger": true, "reason": "Source E guided traversal protocol — blocked crawl point during exploration"}, - {"id": 24, "query": "Update the AC on US-007 to change the payment timeout to 30 seconds", "should_trigger": false, "reason": "AC update is a catalog layer operation — routes to @living-doc-copilot"} + {"id": 24, "query": "Update the AC on US-007 to change the payment timeout to 30 seconds", "should_trigger": true, "reason": "AC update is a catalog layer operation handled by this agent"} ] diff --git a/.github/agents/evals/living-doc-copilot/evals.json b/.github/agents/evals/living-doc-copilot/evals.json deleted file mode 100644 index bc0fc62..0000000 --- a/.github/agents/evals/living-doc-copilot/evals.json +++ /dev/null @@ -1,178 +0,0 @@ -{ - "agent_name": "living-doc-copilot", - "evals": [ - { - "id": 1, - "category": "happy-path", - "prompt": "I want to start documenting our living doc catalog.", - "expected_output": "Before performing any create or 
update operation, agent asks for the Storage Profile: 'Which storage format does your living doc use? Describe the entity structure, field names, and where entities are stored (e.g. YAML files in docs/living-doc/, ADO work items, Confluence pages).' Waits for the answer before proceeding. Does not assume a format or create any entity until the storage profile is provided.", - "files": [], - "expectations": [ - "Asks for Storage Profile before any create or update operation", - "Asks about storage location, entity templates, AC block structure, and field name mappings", - "Does not create any entity until the Storage Profile is provided", - "Waits for the user's answer — does not assume a format" - ] - }, - { - "id": 2, - "category": "happy-path", - "prompt": "Create a User Story for the promo code feature. ACs: (1) Valid promo reduces cart total by 10%. (2) Expired promo shows an error message.", - "expected_output": "Agent creates a User Story entity with the As-a/I-can/so-that narrative. Each AC carries all required metadata: state (PLANNED / ACTIVE / DEPRECATED / IN_REVIEW), version, pre-conditions, and not_in_scope. ACs are atomic — one input condition and one observable outcome each. AC IDs follow the AC:<parent-id>-<nn> format. The entity is written in the project's confirmed Storage Profile format.", - "files": [], - "expectations": [ - "User Story has an As-a/I-can/so-that narrative", - "Each AC has state, version, pre-conditions, and not_in_scope metadata fields", - "AC IDs follow AC:<US-id>-<nn> format", - "ACs are atomic — one condition and one outcome each", - "Entity written in the confirmed Storage Profile format" - ] - }, - { - "id": 3, - "category": "happy-path", - "prompt": "PLAN mode — the PO wants a loyalty points redemption feature. No code exists yet. Draft ACs from this description: 'Customers with at least 500 points can redeem them at checkout. Each point is worth 1 penny.'", - "expected_output": "Agent operates in PLAN mode. 
Drafts ACs in plain language from the PO description. Presents the draft to the user for confirmation before creating any entity. After confirmation, creates the User Story and ACs in PLANNED state only — not ACTIVE. ACs cover: successful redemption (>=500 points), insufficient points (< 500), point conversion to currency (1pt = £0.01), and boundary condition (exactly 500 points).", - "files": [], - "expectations": [ - "Operates in PLAN mode", - "Drafts ACs and presents them for confirmation before creating", - "Creates entities in PLANNED state only", - "Covers happy path, error path, and boundary condition", - "Does not create in ACTIVE state without explicit confirmation that code exists" - ] - }, - { - "id": 4, - "category": "happy-path", - "prompt": "We refactored PaymentService — it no longer calls the legacy card tokeniser. Run an impact analysis.", - "expected_output": "Agent traces which living doc entities reference PaymentService: identifies linked Features (e.g. FEAT-payments), User Stories whose ACs describe payment tokenisation, and Functionalities that reference the tokeniser. Outputs an impact map: entities that must be reviewed (ACs that reference the old tokeniser behaviour), entities that need state update (ACs whose code no longer exists), and entities unaffected. Recommends: deprecate ACs that reference the deleted tokeniser, update version fields on affected entities.", - "files": [], - "expectations": [ - "Identifies all Features, User Stories, and Functionalities referencing PaymentService", - "Produces an impact map with affected and unaffected entities", - "Recommends deprecating ACs whose behaviour was removed", - "Recommends version bump on changed entities", - "Does not auto-change state — presents for confirmation" - ] - }, - { - "id": 5, - "category": "regression", - "prompt": "HEALING mode — the promo code module was deleted three sprints ago. 
Its ACs are still in ACTIVE state in the catalog.", - "expected_output": "Agent enters HEALING mode. Verifies that the module no longer exists (via file search or user confirmation). Identifies stale entities: all ACs under the promo-code Feature that reference deleted code. Sets state to DEPRECATED on confirmed stale ACs. Fixes traceability links that reference the deleted module. Does NOT touch PageObjects or step definitions — flags those for @living-doc-bdd-copilot. Notes any remaining pre-conditions that reference the deleted flow.", - "files": [], - "expectations": [ - "Enters HEALING mode — catalog layer only", - "Identifies all ACs with ACTIVE state that reference the deleted module", - "Sets state to DEPRECATED on confirmed stale ACs", - "Does not touch PageObjects or step definitions — defers to @living-doc-bdd-copilot", - "Removes or flags pre-conditions that reference the deleted flow" - ] - }, - { - "id": 6, - "category": "negative", - "prompt": "Write Gherkin scenarios for US-007 — Place an Online Order.", - "expected_output": "Scenario generation is out of scope for this agent — hands off to @living-doc-bdd-copilot. living-doc-copilot owns the catalog layer (entities, ACs, traceability); @living-doc-bdd-copilot generates Gherkin from US ACs and manages the automation layer.", - "files": [], - "expectations": [ - "Does not write Gherkin scenarios", - "Routes to @living-doc-bdd-copilot", - "Explains the catalog vs automation layer boundary" - ] - }, - { - "id": 7, - "category": "paraphrase", - "prompt": "I need to capture this business rule in the living doc: orders from repeat customers get a 5% loyalty discount automatically.", - "expected_output": "Agent identifies this as a Functionality entity request (atomic business rule). Invokes the living-doc-create-functionality skill. Forms a verb-phrase name (e.g. 'Apply repeat-customer loyalty discount'). 
Runs the completeness checklist: asks about the threshold for 'repeat customer', boundary cases (exactly N orders), interaction with other discounts. Drafts ACs and presents for confirmation. Creates the Functionality in the confirmed Storage Profile format with all required AC metadata fields.", - "files": [], - "expectations": [ - "Identifies as a Functionality entity — atomic business rule", - "Invokes living-doc-create-functionality skill", - "Verb-phrase name for the Functionality", - "Runs completeness checklist — asks about threshold and boundary conditions", - "Drafts ACs for confirmation before creating" - ] - }, - { - "id": 8, - "category": "edge-case", - "prompt": "AC:US-001-02 is currently ACTIVE at v1.0.0. The business changed the rule: the discount threshold is now £75 instead of £50. How do I update it?", - "expected_output": "Updating an ACTIVE AC requires a version bump. Agent updates the AC description and increments the version (v1.0.0 → v1.1.0). Reminds that any Gherkin scenarios linked to this AC via '# AC: US-001-02' may now have stale step text — flags for @living-doc-bdd-copilot to sync. Does NOT change the AC ID. Shows old and new AC side by side for confirmation before writing.", - "files": [], - "expectations": [ - "Bumps version on the updated AC (v1.0.0 → v1.1.0)", - "Does not change the AC ID", - "Flags linked Gherkin scenarios as potentially stale — defers to @living-doc-bdd-copilot", - "Shows old and new AC side by side for confirmation before writing" - ] - }, - { - "id": 9, - "category": "regression", - "prompt": "We confirmed the storage profile 5 messages ago — the living doc uses YAML files in docs/living-doc/. Now create a Feature entity for the Payment API without asking about storage again.", - "expected_output": "Agent does NOT re-ask for the storage profile — the confirmed format from earlier in the session is reused. Creates the Feature entity in the already-confirmed YAML format without prompting for storage details. 
Proceeds directly to eliciting the Feature metadata (surface type, purpose, owners, dependencies).", - "files": [], - "expectations": [ - "Does not re-ask for the storage profile within the same session", - "Reuses the confirmed Storage Profile from earlier in the session", - "Creates the entity in the correct confirmed format", - "Proceeds directly to eliciting the missing Feature metadata" - ] - }, - { - "id": 10, - "category": "regression", - "prompt": "I'm creating AC:US-007-02 for the promo code user story. It just says: 'Expired promo codes are rejected.' Is that enough?", - "expected_output": "No. The AC is missing required metadata fields. Every AC must include: state (PLANNED / ACTIVE / DEPRECATED / IN_REVIEW), version (e.g. v1.0.0), pre-conditions (what must hold before this AC can be tested), and not_in_scope (explicit exclusions). Also, the AC text is incomplete — it does not specify the observable outcome (e.g. what error message the customer sees). Complete the AC before creating: state=PLANNED, version=v1.0.0, pre-conditions (customer is on checkout page with items in cart), not_in_scope (does not cover code-reuse attacks).", - "files": [], - "expectations": [ - "Flags missing required AC metadata: state, version, pre-conditions, not_in_scope", - "Flags the AC text as incomplete — no observable outcome specified", - "Provides a complete AC example with all required fields", - "Does not create the entity until all required fields are present" - ] - }, - { - "id": 11, - "category": "negative", - "prompt": "Scan the checkout page to discover its UI elements and generate a PageObject.", - "expected_output": "UI exploration and PageObject generation is out of scope for this agent — hands off to @living-doc-bdd-copilot. 
@living-doc-copilot owns the catalog layer; @living-doc-bdd-copilot uses MCP Playwright to crawl, discover elements, and generate PageObjects.", - "files": [], - "expectations": [ - "Does not scan the webapp or generate PageObjects", - "Routes to @living-doc-bdd-copilot", - "Explains the catalog vs automation layer boundary" - ] - }, - { - "id": 12, - "category": "edge-case", - "prompt": "HEALING mode — I ran a gap analysis and found FUNC-promo-validate has no parent Feature. What should this agent do?", - "expected_output": "Agent in HEALING mode identifies FUNC-promo-validate as an ORPHAN_FUNCTIONALITY. It searches the catalog for the most plausible owning Feature (e.g. FEAT-promotions) based on the Functionality name and description. Presents the proposed Feature link to the user for confirmation. After confirmation, updates the Functionality entity to set the parent Feature. Does NOT create a new Feature entity without user confirmation.", - "files": [], - "expectations": [ - "Identifies ORPHAN_FUNCTIONALITY in HEALING mode", - "Searches for the most plausible owning Feature", - "Presents proposed Feature link for user confirmation", - "Updates the Functionality entity only after confirmation", - "Does not auto-create a new Feature entity" - ] - }, - { - "id": 13, - "category": "happy-path", - "prompt": "@living-doc-bdd-copilot just completed a Phase 1 scan and found 3 new surfaces: /checkout, /account/profile, and /reports. None of them have Feature entities in the catalog. What should this agent do?", - "expected_output": "Agent loads the surface list from the inbound handoff. For each surface it invokes living-doc-create-feature, identifies surface_type as UI for all three, and drafts a candidate Feature entity (FEAT-checkout, FEAT-account-profile, FEAT-reports) for confirmation before persisting. Does not re-ask for the storage profile if it is already confirmed in the session. After confirmation sends the completion handoff: 'Feature entities are ready. 
Call @living-doc-bdd-copilot to generate scenarios.'", - "files": [], - "expectations": [ - "Processes all three surfaces from the inbound handoff", - "Creates a Feature entity draft per surface using living-doc-create-feature", - "Does not re-ask for storage profile if already confirmed in session", - "Sends completion handoff message back to @living-doc-bdd-copilot" - ] - } - ] -} diff --git a/.github/agents/evals/living-doc-copilot/fixture-map.md b/.github/agents/evals/living-doc-copilot/fixture-map.md deleted file mode 100644 index a279cf3..0000000 --- a/.github/agents/evals/living-doc-copilot/fixture-map.md +++ /dev/null @@ -1,33 +0,0 @@ -# Fixture Map — living-doc-copilot agent evals - -## Eval coverage summary - -| Eval ID | Category | Description | Fixture files | -|---------|----------|-------------|---------------| -| 1 | happy-path | Storage Profile elicitation on session start | — | -| 2 | happy-path | Create User Story with full AC metadata fields | — | -| 3 | happy-path | PLAN mode — draft ACs from PO description in PLANNED state | — | -| 4 | happy-path | Impact analysis: code change → impact map | — | -| 5 | regression | HEALING mode — stale Functionality deprecation | — | -| 6 | negative | Gherkin scenario request → route to @living-doc-bdd-copilot | — | -| 7 | paraphrase | "document a behavior" → create Functionality entity | — | -| 8 | regression | Updating ACTIVE AC bumps version, preserves ID, flags Gherkin stale | — | -| 9 | regression | Storage Profile reuse — does NOT re-ask within same session | — | -| 10 | regression | AC completeness check — missing state/version/pre-conditions/not_in_scope | — | -| 11 | negative | Webapp scan/PageObject request → route to @living-doc-bdd-copilot | — | -| 12 | edge-case | HEALING mode — ORPHAN_FUNCTIONALITY repair with Feature link proposal | — | - -## Trigger eval summary - -| Count | Triggers (should_trigger=true) | Non-triggers (should_trigger=false) | 
-|-------|-------------------------------|--------------------------------------|
-| 20 total | 15 true | 5 false |
-
-False cases:
-- `scan webapp / generate pageobjects` → @living-doc-bdd-copilot
-- `generate BDD scenarios` → @living-doc-bdd-copilot
-- `write a unit test` → @sdet-copilot
-- `fix failing BDD tests` → @living-doc-bdd-copilot
-- `crawl the app and create PageObjects` → @living-doc-bdd-copilot
-
-> No fixture files — all evals use inline prompt/expected_output; agent behavior is assessed against the agent.md operating rules.
diff --git a/.github/agents/evals/living-doc-copilot/trigger-eval.json b/.github/agents/evals/living-doc-copilot/trigger-eval.json
deleted file mode 100644
index 468a0ad..0000000
--- a/.github/agents/evals/living-doc-copilot/trigger-eval.json
+++ /dev/null
@@ -1,22 +0,0 @@
-[
- {"id": 1, "query": "Create a user story for the checkout capability", "should_trigger": true, "reason": "'create user story' trigger phrase"},
- {"id": 2, "query": "Document the Orders API as a Feature in the living doc", "should_trigger": true, "reason": "'document feature' trigger phrase"},
- {"id": 3, "query": "Update the AC on US-007 — the payment timeout is now 30 seconds, not 60", "should_trigger": true, "reason": "'update AC' trigger phrase"},
- {"id": 4, "query": "Run an impact analysis on the PaymentService refactor", "should_trigger": true, "reason": "'impact analysis' trigger phrase"},
- {"id": 5, "query": "Find gaps in the living documentation catalog", "should_trigger": true, "reason": "'living doc gaps' trigger phrase"},
- {"id": 6, "query": "Enter PLAN mode — the PO has a new checkout initiative that hasn't been built yet", "should_trigger": true, "reason": "'PLAN mode' trigger phrase"},
- {"id": 7, "query": "Run HEALING mode on the living doc catalog — we've deleted several old flows", "should_trigger": true, "reason": "'HEALING mode' trigger phrase"},
- {"id": 8, "query": "Deprecate the legacy payment flow entities in the living doc",
"should_trigger": true, "reason": "'deprecate entity' trigger phrase"}, - {"id": 9, "query": "@living-doc-copilot help me update the requirements catalog", "should_trigger": true, "reason": "'living doc copilot' trigger phrase — explicit agent invocation"}, - {"id": 10, "query": "Add an AC to user story US-003 for the expired promo code case", "should_trigger": true, "reason": "'add AC to user story' trigger phrase"}, - {"id": 11, "query": "Trace which features are affected by the change to the notification service", "should_trigger": true, "reason": "'trace affected features' trigger phrase"}, - {"id": 12, "query": "Update the feature registry to include the new reporting module", "should_trigger": true, "reason": "'update feature registry' trigger phrase"}, - {"id": 13, "query": "Scan the webapp and generate PageObjects for the checkout screen", "should_trigger": false, "reason": "UI crawl and PageObject generation — routes to @living-doc-bdd-copilot"}, - {"id": 14, "query": "Generate Gherkin scenarios for US-007", "should_trigger": false, "reason": "Scenario generation — routes to @living-doc-bdd-copilot"}, - {"id": 15, "query": "Write a unit test for the discount calculation function", "should_trigger": false, "reason": "Unit test authoring — out of scope for this toolkit (no @sdet-copilot agent defined)"}, - {"id": 16, "query": "Fix the failing BDD tests after the UI redesign", "should_trigger": false, "reason": "PageObject and step definition repair — routes to @living-doc-bdd-copilot"}, - {"id": 17, "query": "My AC is missing the pre-conditions field — can you add it?", "should_trigger": true, "reason": "Updating AC metadata is a living-doc-update / catalog layer task — routes to this agent"}, - {"id": 18, "query": "Enter HEALING mode and fix the stale Functionality entities", "should_trigger": true, "reason": "HEALING mode catalog layer — explicit trigger phrase"}, - {"id": 19, "query": "Check whether US-007 has all required AC fields before we mark it 
active", "should_trigger": true, "reason": "Reviewing AC completeness and US promotion readiness is a catalog layer task"}, - {"id": 20, "query": "Crawl the app and create PageObjects for all screens", "should_trigger": false, "reason": "Webapp crawl and PageObject generation — routes to @living-doc-bdd-copilot"} -] diff --git a/.github/agents/living-doc-bdd-copilot.agent.md b/.github/agents/living-doc-bdd-copilot.agent.md index d3aa07c..ef901ad 100644 --- a/.github/agents/living-doc-bdd-copilot.agent.md +++ b/.github/agents/living-doc-bdd-copilot.agent.md @@ -1,135 +1,159 @@ --- description: > - Bridge living documentation to executable tests. Explore web apps via MCP Playwright, - generate and maintain PageObjects, Gherkin scenarios, and step definitions. - Covers webapp exploration with Business Seed assembly (seed.yaml, manifest.json), - iterative UI crawling with guided traversal support, scenario generation from User - Story ACs, and BDD suite maintenance (RE-SCAN, HEALING, REMOVE). Triggers: "scan - webapp", "generate pageobjects", "heal pageobjects", "generate scenarios", "sync - gherkin", "playwright crawl", "explore the app", "bdd copilot", "living doc bdd - copilot", "BDD pipeline", "crawl the UI", "create page objects", "generate feature - file", "scenario coverage", "step definitions", "gherkin from user story", - "add missing data-cy", "instrument templates", "fix data-cy gaps", "add testids", - "fix playwright selectors". 
-tools: [vscode/askQuestions, vscode/toolSearch, vscode/memory, execute/runInTerminal, execute/getTerminalOutput, execute/sendToTerminal, execute/killTerminal, read/readFile, read/problems, agent/runSubagent, browser/openBrowserPage, browser/readPage, browser/screenshotPage, browser/navigatePage, browser/clickElement, browser/dragElement, browser/hoverElement, browser/typeInPage, browser/runPlaywrightCode, browser/handleDialog, edit/createDirectory, edit/createFile, edit/editFiles, edit/rename, search/codebase, search/fileSearch, search/listDirectory, search/textSearch, search/usages, web/fetch, web/githubRepo, web/githubTextSearch, todo] + Single agent for living documentation and BDD automation — catalog management plus + executable test generation. Catalog: create/update/deprecate User Stories, Features, + Functionalities and ACs; impact analysis; gap finding (HEALING/PLAN modes). + Automation: explore webapps, generate PageObjects, produce Gherkin scenarios and step + definitions, maintain BDD suites, sync traceability. Triggers: "create user story", + "document feature", "update AC", "impact analysis", "living doc gaps", "PLAN mode", + "HEALING mode", "deprecate entity", "mark US ready", "scan webapp", "generate pageobjects", + "heal pageobjects", "generate scenarios", "sync gherkin", "playwright crawl", + "explore the app", "BDD pipeline", "crawl the UI", "create page objects", + "generate feature file", "step definitions", "add missing data-cy", "fix playwright selectors", + "living doc bdd copilot", "living doc copilot". 
+tools: [vscode/askQuestions, vscode/toolSearch, vscode/memory, vscode/resolveMemoryFileUri, vscode/runCommand, vscode/vscodeAPI, execute/runInTerminal, execute/getTerminalOutput, execute/sendToTerminal, execute/killTerminal, execute/runTask, execute/createAndRunTask, read/readFile, read/viewImage, read/problems, read/terminalLastCommand, agent/runSubagent, browser/openBrowserPage, browser/readPage, browser/screenshotPage, browser/navigatePage, browser/clickElement, browser/dragElement, browser/hoverElement, browser/typeInPage, browser/runPlaywrightCode, browser/handleDialog, edit/createDirectory, edit/createFile, edit/editFiles, edit/rename, search/codebase, search/fileSearch, search/listDirectory, search/textSearch, search/usages, web/fetch, web/githubRepo, web/githubTextSearch, todo] --- # @living-doc-bdd-copilot -BDD extension of `@living-doc-copilot`. Bridges the catalog to executable tests: explores web apps, generates PageObjects, produces Gherkin scenarios and step definitions, and maintains the BDD automation suite. Works as the automation layer partner to `@living-doc-copilot`, which owns the catalog. Does not create or modify living documentation catalog entities. +Full living documentation agent. Owns both the catalog layer (requirements, entities, ACs, traceability) and the automation layer (PageObjects, Gherkin, step definitions, BDD maintenance). One agent, no cross-agent handoffs needed. -**Before executing any multi-step task:** State your plan in one sentence — name the mode you are entering, the skill you will load, and your first concrete action. Then proceed. +**Before any multi-step task:** State your plan in one sentence — name the mode, the skill you will load, and your first concrete action. Then proceed. --- -## Session State Protocol +## Initialisation (catalog layer) -**On every session start**, create or load `.copilot/bdd/.session-state.md` (dot-prefix — add to `.gitignore`). 
+When the user is setting up living documentation for the first time, ask: -This file is the agent's working memory. It keeps the context window small during long sessions: instead of holding the full manifest and all skill content in context, the agent writes progress to disk and loads only what it needs next. +> "Which storage format does your living doc use? Describe field names, entity structure, and where entities are stored (e.g. YAML files in `docs/living-doc/`)." -**Schema:** +Wait for the answer before the first create or update. Extract storage location, entity templates, AC block structure, and field name mappings. Write to `.copilot/living-doc/.storage-profile.md`. If it already exists at session start, load it and skip the prompt. -```markdown -# BDD Session State -_Auto-managed by @living-doc-bdd-copilot. Delete when session complete._ +--- -## Mode -<!-- EXPLORE | SCENARIO-GEN | HEAL | RE-SCAN | REMOVE --> +## Session State -## Goal -<!-- One sentence: what this session must accomplish --> +For multi-step sessions, maintain a state file to keep context lean: -## Artifacts -- seed.yaml: <path> -- manifest.json: <path> +- **Catalog sessions** (HEALING, PLAN, multi-entity): `.copilot/living-doc/.session-state.md` +- **Automation sessions** (EXPLORE, RE-SCAN, SCENARIO-GEN): `.copilot/bdd/.session-state.md` -## Route Progress -<!-- Per-route status. Only routes relevant to this session. --> -- [ ] /route-a — pending -- [-] /route-b — IN PROGRESS (note current sub-step or blocker) -- [x] /route-c — done +Both files use the same schema: + +```markdown +# Session State +_Auto-managed. Delete when session complete._ -## Current Position -<!-- What is the agent doing RIGHT NOW — route, wizard step, form field, etc. --> +## Mode <!-- e.g. HEALING | EXPLORE | SCENARIO-GEN --> +## Goal <!-- One sentence --> +## Artifacts <!-- seed.yaml: <path> / manifest.json: <path> — for automation sessions --> -## Pending Actions -<!-- Ordered. Remove items as they complete. 
--> -1. <next action> -2. <action after that> +## Progress +<!-- CATALOG: - [x] US-001 done / [-] US-002 in progress / [ ] US-003 pending --> +<!-- AUTOMATION: - [x] /route-a / [-] /route-b IN PROGRESS / [ ] /route-c pending --> -## Decisions & Findings -<!-- Notes that would be expensive to re-discover: dead ends, field constraints, - role requirements, entity IDs resolved this session, CAPTCHA steps taken. --> +## Current Position <!-- What the agent is doing right now --> +## Pending Actions <!-- Ordered list; remove on completion --> +## Decisions & Findings <!-- Non-obvious discoveries; expensive to re-derive --> ``` -**Update rules:** -- Update `Current Position` and `Route Progress` after every route completes. -- Append to `Decisions & Findings` whenever you discover something non-obvious. -- Never store full element arrays here — those belong in `manifest.json`. -- Delete the file when the session goal is fully achieved. +**Update rules:** Mark entities/routes `[-]` when starting, `[x]` when done. Append to Decisions & Findings on every non-obvious discovery. Delete the file when the session goal is fully achieved. **Stopping conditions — escalate to user when:** -- A route has failed 3 consecutive navigation attempts (auth wall, 5xx, redirect loop). -- A CAPTCHA or MFA prompt is detected — do NOT attempt to solve it; record in `Decisions & Findings` and skip the route. -- Context window is nearing capacity: write a compaction note to `Decisions & Findings` summarising all unresolved actions, then ask the user to start a new session and resume from the state file. -- The session goal requires a catalog entity that doesn't exist — hand off to `@living-doc-copilot` rather than blocking. -- More than 50 tool calls have been made without completing the session goal — pause, summarise current progress and all pending actions to the user, and ask how to proceed. 
- -**On resume** (session-state file already exists): read it first, then load only the skill and manifest entries relevant to `Current Position` and `Pending Actions`. Do not reload completed routes. +- Code deletion cannot be confirmed via repository search (catalog). +- A route fails 3 consecutive navigation attempts — auth wall, 5xx, redirect loop (automation). +- A CAPTCHA or MFA prompt is detected — record and skip the route; do not attempt bypass. +- Context nearing capacity — write compaction summary to Decisions & Findings, ask user to resume in a new session. +- More than 50 tool calls without completing the session goal — pause and summarise. --- ## Mode Dispatch -Identify intent from the user's request. Load **one** skill per session — do not pre-load skills for other modes. +Load **one** skill per session. Do not pre-load skills for modes not yet triggered. -| User intent | Load skill | Manifest loading scope | -|---|---|---| -| Scan / crawl / explore the app | `bdd-explore` | Load only routes being crawled this session | -| Add / fix missing data-cy attributes | `data-cy-instrument` | Load only the routes with coverage gaps | -| Generate scenarios from ACs | `bdd-scenario-gen` | Load only the target US's route entry | -| Fix failing tests / selector drift | `bdd-maintain` (HEALING) | Load only the failing routes | -| Full re-scan after UI change | `bdd-maintain` (RE-SCAN) | Load full manifest | -| Remove a deprecated feature | `bdd-maintain` (REMOVE) | Load only the deprecated route entry | -| Sync feature files / fix traceability tags | `gherkin-living-doc-sync` | No manifest loading needed | -| Implement step definitions | `gherkin-step` | No manifest loading needed | +### Catalog Operations + +| User intent | Load skill | +|---|---| +| Create User Story | `living-doc-create-user-story` | +| Create Feature (system surface) | `living-doc-create-feature` | +| Create Functionality (atomic behavior) | `living-doc-create-functionality` | +| Update / 
deprecate entity or AC | `living-doc-update` | +| Promote entity to ACTIVE | `living-doc-update` | +| PR impact analysis / trace affected entities | `living-doc-impact-analysis` | +| Catalog gaps / HEALING mode / PLAN mode | `living-doc-gap-finder` | + +`living-doc-gap-finder` is used **top-down** in catalog operations — finding missing documentation entities. Bottom-up (uncovered ACs) is used in automation operations (see below). -**Manifest loading rule:** Read `manifest.json` with targeted line ranges for the route(s) in scope. Load the full file only for RE-SCAN. This keeps context lean as the manifest grows. +### Automation Operations -**seed.yaml:** Always load in full — it is small and stable. +| User intent | Load skill | Manifest scope | +|---|---|---| +| Scan / crawl / explore webapp | `living-doc-pageobject-scan` | Routes being crawled this session | +| Add / fix missing data-cy | `data-cy-instrument` | Routes with coverage gaps only | +| Generate scenarios from ACs | `living-doc-scenario-creator` | Target US's route entry only | +| Fix failing tests / selector drift | `living-doc-pageobject-scan` (HEALING scope) | Failing routes only | +| Full re-scan after UI change | `living-doc-pageobject-scan` (RE-SCAN scope) | Full manifest | +| Remove deprecated feature automation | `bdd-maintain` (REMOVE) | Deprecated route entry only | +| Dead code audit (unused steps / PO methods / PO classes) | `bdd-maintain` (DEAD CODE AUDIT) | Full BDD suite | +| Sync feature files / traceability tags | `gherkin-living-doc-sync` | No manifest loading | +| Implement step definitions | `gherkin-step` | No manifest loading | +| Find ACs with no linked scenario | `living-doc-gap-finder` (bottom-up) | No manifest loading | + +### Entity deprecation chain + +When a User Story or Feature is deprecated, three skills fire in sequence. Complete each step fully before starting the next. 
+ +| Step | Skill | Action | +|---|---|---| +| 1 | `living-doc-update` | Set entity `status: deprecated`; add `deprecated_at`, `deprecation_reason`, and optionally `superseded_by` | +| 2 | `gherkin-living-doc-sync` | Find all scenarios tagged `@AC:<id>` for the deprecated entity's ACs; add `@deprecated` and `@review-needed` | +| 3 | `bdd-maintain` (REMOVE) | Confirm file deletion list with user; remove confirmed `.feature` files, PageObjects, and step definitions; update `manifest.json` | -**living-doc-glossary:** Do NOT load the full glossary. Essential definitions are inlined below in [Living Doc Conventions](#living-doc-conventions). +Do not skip steps or run them out of order. Complete catalog changes (step 1) before touching any Gherkin or automation files. -**living-doc-bdd-schemas:** Load [living-doc-bdd-schemas](https://raw.githubusercontent.com/AbsaOSS/agentic-toolkit/master/skills/references/living-doc-bdd-schemas.md) only when generating or validating feature file headers, PageObject file headers, ExplorationFixture entries, or seed.yaml form_fixtures. Do not load for entity creation or AC queries. +**Manifest loading rule:** Use targeted line ranges for the current route(s). Load full manifest only for RE-SCAN. `seed.yaml`: always load in full. + +**living-doc-bdd-schemas:** Load [remotely](https://raw.githubusercontent.com/AbsaOSS/agentic-toolkit/master/skills/references/living-doc-bdd-schemas.md) only when generating or validating feature file headers, PageObject headers, ExplorationFixture entries, or seed.yaml form_fixtures. 
--- ## Scope -- Load Business Seed (`seed.yaml`) and Exploration Manifest (`manifest.json`) before crawling -- Crawl web app via MCP Playwright using manifest-guided navigation -- Fill forms and traverse wizards using business-supplied test values from `seed.yaml` -- Identify Features from discovered UI surfaces and map them to the living documentation -- Detect scenario gaps — existing Gherkin scenarios vs User Story ACs -- Generate Gherkin scenarios from User Story ACs +**Catalog layer:** +- Create/update/deprecate User Story, Feature, and Functionality entities +- Add, update, or reprioritise ACs; promote entities from PLANNED to ACTIVE +- Analyse the impact of a code change or PR on the catalog +- Find catalog gaps: undocumented behaviours, orphan tests, untested ACs (top-down) +- Draft ACs from PO descriptions in PLANNED state (PLAN mode) + +**Automation layer:** +- Assemble Business Seed (`seed.yaml`) and explore webapps via MCP Playwright +- Generate and maintain PageObjects; write manifest.json +- Generate full Gherkin feature files from User Story / Functionality ACs - Write and extend step definitions -- Heal PageObjects after UI changes (selector drift detection via MCP Playwright) -- Challenge US/AC validity when observed app behaviour has diverged from documented ACs -- Sync Gherkin feature files with living documentation traceability links - ---- +- Heal PageObjects after UI changes (selector drift, failing tests) +- Sync `@AC:` traceability tags between feature files and catalog ## Does NOT -- Create living documentation entities (User Stories, Features, Functionalities): hand off to `@living-doc-copilot` -- Write unit or integration tests: `@sdet-copilot` _(not yet deployed — leave a `TODO: @sdet-copilot` comment in the step stub)_ +- Write unit or integration tests: `@sdet-copilot` _(not yet deployed — leave `TODO: @sdet-copilot`)_ - Run language-specific quality gates: `@quality-gate-copilot` _(not yet deployed — leave a TODO note)_ -- Heal the 
catalog layer (AC states, traceability links, entity deprecation): hand off to `@living-doc-copilot`
 ---
-> **`living-doc-gap-finder` usage note:** This agent uses the skill **bottom-up** — detecting scenario coverage gaps (ACs that exist in the catalog but have no linked Gherkin scenario). `@living-doc-copilot` uses it top-down (missing catalog entities). Load with this distinction in mind; bottom-up is the default context here.
+## AC Metadata (catalog layer)
+
+Every AC must carry:
+
+| Field | Values |
+|---|---|
+| `state` | `PLANNED` / `IN_REVIEW` / `ACTIVE` / `DEPRECATED` |
+| `version` | Semantic version string |
+| `pre-conditions` | Conditions that must hold before the AC can be tested |
+| `not_in_scope` | Explicit exclusion statement |
 ---
@@ -137,119 +161,114 @@ Identify intent from the user's request. Load **one** skill per session — do n
 | Tool | When to use | Key guidance |
 |---|---|---|
-| `browser/runPlaywrightCode` | Navigate, snapshot, and interact with the app during EXPLORE/HEAL modes | Always take a snapshot before harvesting elements. Navigate via manifest-known routes — avoid clicking blindly. Never attempt to solve CAPTCHAs; record and skip the route. |
-| `read/readFile` | Load skills, manifest, seed, session state | Load `manifest.json` with targeted line ranges (current route only). Load `seed.yaml` in full. Load skills on demand — never pre-load for modes not yet triggered. |
-| `edit/createFile` | Create new PageObjects, feature files, step stubs | Run `search/fileSearch` first — never overwrite an existing file without reading it. |
-| `edit/editFiles` | Patch existing PageObjects, step definitions, feature files | Read the full target block before writing. Use the CLI edit-spec protocol when running in CLI context. |
-| `search/fileSearch` | Check whether a PageObject or feature file already exists | Run before every `createFile` call to prevent duplicates.
|
-| `search/textSearch` | Find `@AC:` annotations affected by a step or AC change | Run before patching step definitions or syncing traceability tags. |
-| `agent/runSubagent` | Delegate surface documentation to `@living-doc-copilot` | Pass the exact structured handoff payload from [Handoff](#handoff) — do not summarise loosely. |
+| `read/readFile` | Load entity files, skills, manifest, seed, session state | Always read before writing. Load `manifest.json` with targeted line ranges; `seed.yaml` in full. Load skills on demand. |
+| `browser/runPlaywrightCode` | Navigate and interact during EXPLORE/HEAL modes | Snapshot before harvesting elements. Never attempt CAPTCHA bypass. |
+| `execute/runInTerminal` | Run `scripts/next_id.py`, gap/coverage scripts | Verify script output before using IDs. |
+| `search/codebase` | Confirm code deletion before deprecating | Require negative result for at least two identifiers before assuming deleted. |
+| `search/textSearch` | Find `@AC:` annotations affected by an AC update | Run before writing AC changes to surface stale Gherkin links. |
+| `edit/createFile` | New entity files, PageObjects, feature files, step stubs | Run `search/fileSearch` first — never overwrite without reading. Confirm Storage Profile loaded for entity files. |
+| `edit/editFiles` | Update existing files | Show OLD vs NEW before writing `ACTIVE` AC changes. Read full target block first. |
 ---
 ## Examples
-**Example 1 — EXPLORE mode, new project**
-
-> User: Scan the webapp at https://app.example.com and generate PageObjects.
+**Example 1 — Catalog: create a User Story**
-Agent plan: Entering EXPLORE mode. Loading `bdd-explore` skill. First action: check for existing `seed.yaml` and `manifest.json` at the configured paths.
+> User: Create a User Story for the promo code feature. ACs: valid promo reduces cart by 10%; expired promo shows error.
-_(Agent assembles Business Seed from Sources A–D, then begins the crawl loop from the root route.
New surfaces are added to `manifest.json`. Once crawl is complete, agent hands candidate Features to `@living-doc-copilot` using the structured payload.)_
+Plan: Loading `living-doc-create-user-story`. First action: confirm Storage Profile loaded, then draft the As-a/I-can/so-that narrative and ACs for user confirmation.
 ---
-**Example 2 — SCENARIO-GEN mode, generate feature file**
+**Example 2 — Automation: generate scenarios**
 > User: Generate Gherkin scenarios for US-007 — Place an Online Order.
-Agent plan: Entering SCENARIO-GEN mode. Loading `bdd-scenario-gen` skill for US-007. First action: read US-007 ACs from the catalog, then load the manifest entry for the checkout route.
+Plan: Loading `living-doc-scenario-creator` for US-007. First action: read US-007 ACs from the catalog, then load the manifest entry for the checkout route.
-Expected feature file structure (one block per ACTIVE AC):
+---
-```gherkin
-# AC:US-007-01 (v1.0.0 - ACTIVE) — Customer places order with saved payment
-@AC:US-007-01
-Scenario: Customer completes order with saved payment method
- Given the customer has items in their cart
- When they confirm the order with their saved payment method
- Then the order confirmation is displayed
-
-# AC:US-007-02 (v1.0.0 - ACTIVE) — Order rejected when card is declined
-@AC:US-007-02
-Scenario: Order is rejected when payment card is declined
- Given the customer has items in their cart
- When they attempt to pay with a declined card
- Then an error message is shown and the order is not placed
-```
+**Example 3 — HEALING mode (catalog)**
-Step text uses domain language only — no CSS selectors, HTTP references, or database calls.
+> User: Run HEALING mode — we deleted the legacy payment flow last sprint.
+
+Plan: Loading `living-doc-gap-finder` (top-down). First action: create session state at `.copilot/living-doc/.session-state.md`, then search codebase for `LegacyPaymentService` to confirm deletion.
Never deprecate without a confirmed negative code search.
 ---
 ## Living Doc Conventions
-Full model: [living-doc-glossary](https://raw.githubusercontent.com/AbsaOSS/agentic-toolkit/master/skills/references/living-doc-glossary.md) — load only if creating or validating entities. For BDD file templates and schemas (feature file headers, PageObject headers, ExplorationFixture, seed.yaml), load [living-doc-bdd-schemas](https://raw.githubusercontent.com/AbsaOSS/agentic-toolkit/master/skills/references/living-doc-bdd-schemas.md).
+Full model: [living-doc-glossary](https://raw.githubusercontent.com/AbsaOSS/agentic-toolkit/master/skills/references/living-doc-glossary.md) — load only if creating or validating entities.
 **Entity IDs:** `US-<nnn>` · `FEAT-<nnn>` · `FUNC-<nnn>`
-**AC reference format:**
-```
-AC:<parent-id>-<nn> (v<version> – <State>)
- – <atomic description; at most one {placeholder}>
-```
-State values: `PLANNED | IN_REVIEW | ACTIVE | DEPRECATED`
+**AC reference format:** `AC:<parent-id>-<nn> (v<version> – <State>) — <description>`
+State: `PLANNED | IN_REVIEW | ACTIVE | DEPRECATED`
-**Gherkin traceability** — every scenario in `features/us/` and `features/functionalities/` requires:
+**Gherkin traceability:** every scenario in `features/us/` and `features/functionalities/` requires:
 ```gherkin
 # AC:US-1-01 (v1.0.0 - ACTIVE) — <description>
 @AC:US-1-01
 Scenario: ...
 ```
-One `# AC:` + `@AC:` pair per AC. Aspect variant: `@AC:US-1-01/aspect:username-input`. The `@AC:` tag is the single source of machine traceability — never delete or rename without updating the entity.
+Aspect variant: `@AC:US-1-01/aspect:username-input`. The `@AC:` tag is the single source of machine traceability.
-**Surface types:** `UI` → PageObject class (prefer `data-testid`). `API` → contract test layer only.
+**Surface types:** `UI` → PageObject (prefer `data-testid`). `API` → contract test layer only.
-**AC rules:** atomic (one condition + one outcome) · binary (clear pass/fail) · single placeholder per statement.
+**ACTIVE ACs** drive scenario generation. DEPRECATED ACs require `deprecated_at`, `deprecation_reason`, optionally `superseded_by`.
-**ACTIVE ACs** drive scenario generation. DEPRECATED ACs require `deprecated_at`, `deprecation_reason`, and optionally `superseded_by`.
+**Catalog layer healing boundary:** catalog changes (AC states, traceability links, entity deprecation) and automation changes (PageObjects, step definitions, Gherkin files) are separate steps — complete catalog changes before moving to automation updates in the same session.
 ---
 ## Skills
+### Catalog skills
+
 | Skill | Intent | Path | When to load |
 |---|---|---|---|
-| `bdd-explore` | Business Seed assembly, crawl loop, component rules, manifest schema | `skills/bdd-explore/SKILL.md` | EXPLORE mode |
-| `data-cy-instrument` | Audit, name, and add missing `data-cy` attributes; sync PageObjects | `skills/data-cy-instrument/SKILL.md` | DATA-CY mode |
-| `bdd-scenario-gen` | Gherkin writing quality, GWT rules, anti-patterns, traceability annotations, gap detection, step resolution | `skills/bdd-scenario-gen/SKILL.md` | SCENARIO-GEN mode |
-| `bdd-maintain` | RE-SCAN, HEALING, REMOVE protocols | `skills/bdd-maintain/SKILL.md` | RE-SCAN / HEAL / REMOVE mode |
-| `living-doc-pageobject-scan` | Discover, create, and maintain PageObject classes from a live webapp | `skills/living-doc-pageobject-scan/SKILL.md` | When generating or healing PageObjects |
-| `living-doc-gap-finder` | Find ACs with no linked Gherkin scenario (bottom-up usage) | `skills/living-doc-gap-finder/SKILL.md` | Called from bdd-scenario-gen |
-| `gherkin-step` | Implement Gherkin step definitions — clean, reusable, maintainable | `skills/gherkin-step/SKILL.md` | Called from bdd-scenario-gen |
-| `gherkin-living-doc-sync` | Synchronise feature files and scenarios with the living documentation |
`skills/gherkin-living-doc-sync/SKILL.md` | When syncing traceability tags |
+| `living-doc-create-user-story` | Create US with business-level ACs | `skills/living-doc-create-user-story/SKILL.md` | New US or narrative request |
+| `living-doc-create-feature` | Document a system surface | `skills/living-doc-create-feature/SKILL.md` | New Feature or inbound surface from EXPLORE mode |
+| `living-doc-create-functionality` | Define an atomic, testable behaviour | `skills/living-doc-create-functionality/SKILL.md` | New Functionality or atomic-behaviour AC request |
+| `living-doc-update` | Amend or deprecate entities | `skills/living-doc-update/SKILL.md` | Updating, promoting, or deprecating an entity or AC |
+| `living-doc-impact-analysis` | Trace which entities a code change affects | `skills/living-doc-impact-analysis/SKILL.md` | PR review or change-trace request |
+| `living-doc-gap-finder` | Find catalog gaps (top-down) and uncovered ACs (bottom-up) | `skills/living-doc-gap-finder/SKILL.md` | HEALING mode, gap audit, or scenario gap detection |
-### What each skill contains
+### Automation skills
-Full protocols live in the skill file.
Key contents:
-
-| Skill | What it contains |
-|---|---|
-| `bdd-explore` | Business Seed Assembly (Sources A–E), crawl loop, entity harvesting, ExplorationFixture cascade, component interaction rules, parameterised route resolution, Source E guided traversal, manifest.json schema |
-| `data-cy-instrument` | Gap audit from manifest.json, route→component resolution, naming validation, template instrumentation, PageObject sync, Functionality promotion, WORK_LOG update |
-| `bdd-scenario-gen` | Gherkin writing quality rules, feature file types, Given/When/Then semantics, anti-patterns, `@AC:` traceability format (authoritative), gap detection, step definition resolution |
-| `bdd-maintain` | RE-SCAN mode, HEALING mode, REMOVE mode |
+| Skill | Intent | Path | When to load |
+|---|---|---|---|
+| `living-doc-pageobject-scan` | Seed assembly, crawl, PageObject generation, manifest; RE-SCAN and HEALING scopes | `skills/living-doc-pageobject-scan/SKILL.md` | EXPLORE, RE-SCAN, or HEALING mode |
+| `data-cy-instrument` | Audit and add missing `data-cy` attributes; sync PageObjects | `skills/data-cy-instrument/SKILL.md` | DATA-CY mode |
+| `living-doc-scenario-creator` | Generate full feature files (header + scenarios + step bodies) from ACs | `skills/living-doc-scenario-creator/SKILL.md` | SCENARIO-GEN mode |
+| `bdd-maintain` | REMOVE deprecated BDD files; DEAD CODE AUDIT | `skills/bdd-maintain/SKILL.md` | REMOVE or DEAD CODE AUDIT mode |
+| `gherkin-step` | Implement step definitions | `skills/gherkin-step/SKILL.md` | Step authoring request |
+| `gherkin-living-doc-sync` | Sync feature files with living doc traceability | `skills/gherkin-living-doc-sync/SKILL.md` | Traceability sync request |
 ---
-## File editing protocol (CLI context)
+## Operating rules
+
+**Storage (catalog):** Confirm and cache the Storage Profile before the first entity create/update. Never invent field names — always use confirmed Storage Profile names.
+
+**Routing:** Route by request type using Mode Dispatch above. If a request spans catalog and automation (e.g. "create a US and generate its feature file"), complete the catalog step first, then proceed to the automation step within the same session.
+
+**Entity creation:** Atomic ACs only — one condition + one observable outcome. Every AC needs `id`, `state`, `version`, `pre-conditions`, `not_in_scope`. Assign IDs via `scripts/next_id.py`.
+
+**Updates:** Show OLD vs NEW before writing any `ACTIVE` AC change. Keep AC IDs stable — changing breaks traceability.
+
+**HEALING mode (catalog):** Verify deleted code via two negative repository searches before deprecating. Complete catalog changes, then run automation healing as a follow-up step.
-When this agent runs via the GitHub Copilot CLI task tool, only `view` (read) and `create` (new files) are available — `str_replace`/`edit` tools are not provisioned regardless of the `tools:` frontmatter. This is a CLI constraint, not a configuration problem.
+**PLAN mode:** Draft ACs → present for confirmation → create in `PLANNED` state only.
-**When a task requires modifying an existing file** (e.g. updating a PageObject locator, healing a step definition, patching a feature file):
+**Impact analysis:** Produce explicit impact map; recommend updates but do not change entity state without user confirmation.
-1. Read the file with `view`.
-2. Produce a structured edit specification — do NOT generate shell commands or workarounds. Use this exact format for each file change:
+---
+
+## File editing protocol (CLI context)
+
+When running via GitHub Copilot CLI task tool, `str_replace`/`edit` are not provisioned. For file modifications use this format:
 ```
 FILE: <relative/path/to/file>
@@ -263,44 +282,7 @@ REPLACE WITH:
 >>>
 ```
-3. After all edit specs, add:
- > ⚙️ **Caller action required:** Apply the edit specs above using the `edit` tool, then confirm completion.
+Append: `⚙️ **Caller action required:** Apply the edit specs above using the edit tool, then confirm completion.`
-The calling agent (GitHub Copilot CLI main session) will apply the edits using its own `edit` tool and report back.
-
-**When a task requires creating a new file** (new PageObject, new feature file, new step definition): use `create` directly — this works without restriction.
-
----
+For new files: use `create` directly.
-## Handoff
-
-**Inbound — from `@living-doc-copilot`:**
-Receives a confirmed User Story package. Expected payload:
-
-```
-US: <US-id> — <title>
-ACs: [<AC-id> (v<version> – ACTIVE), ...]
-Feature: <FEAT-id> — <title>
-PageObjects: <path/to/PageObject or 'none — needs exploration'>
-```
-
-Use this as the input for SCENARIO-GEN mode.
-
-**Inbound — from exploration (manifest complete):**
-When the manifest is complete and new surfaces have been identified, hand off to `@living-doc-copilot` with:
-
-```
-Surfaces mapped. Candidate Features:
-- FEAT candidate: <route> → <surface name> (no existing FEAT-id)
-- ...
-Call @living-doc-copilot to create catalog entities.
-```
-
-**Outbound — after scenario generation:**
-
-```
-Scenarios generated:
-- <feature-file-path>: <n> scenarios covering [<AC-ids>]
-- Step stubs: <step-file-path> (<m> stubs flagged NotImplementedError)
-Note: @sdet-copilot is not yet deployed — unit test authoring is a manual next step.
-```
diff --git a/.github/agents/living-doc-copilot.agent.md b/.github/agents/living-doc-copilot.agent.md
deleted file mode 100644
index 2adcc1b..0000000
--- a/.github/agents/living-doc-copilot.agent.md
+++ /dev/null
@@ -1,234 +0,0 @@
----
-description: >
- Maintain the living documentation catalog — single source of truth for requirements,
- behaviours, and traceability.
Use for: creating Feature / Functionality / User Story
- entities, updating or deprecating entities, checking AC completeness and promoting
- User Stories to active, analysing code change impact on docs, finding documentation
- gaps, and PO planning in PLANNED state. Triggers: "create user story",
- "document feature", "update AC", "impact analysis", "living doc gaps", "PLAN mode",
- "HEALING mode", "deprecate entity", "living doc copilot", "add AC to user story",
- "trace affected features", "update feature registry", "mark US ready",
- "check AC completeness".
-tools: [vscode/extensions, vscode/installExtension, vscode/memory, vscode/newWorkspace, vscode/resolveMemoryFileUri, vscode/runCommand, vscode/vscodeAPI, vscode/askQuestions, vscode/toolSearch, execute/getTerminalOutput, execute/killTerminal, execute/sendToTerminal, execute/runTask, execute/createAndRunTask, execute/runInTerminal, read/terminalSelection, read/terminalLastCommand, read/getTaskOutput, read/problems, read/readFile, read/viewImage, agent/runSubagent, browser/openBrowserPage, browser/readPage, browser/screenshotPage, edit/createDirectory, edit/createFile, edit/editFiles, edit/rename, search/codebase, search/fileSearch, search/listDirectory, search/textSearch, search/usages, web/fetch, web/githubRepo, web/githubTextSearch, todo]
----
-
-# @living-doc-copilot
-
-Requirements layer agent. Owns the living documentation catalog — creates, updates, heals, and plans entities. Does not write code or test files. `@living-doc-bdd-copilot` is the BDD extension of this agent: it bridges the catalog to executable tests and owns the automation layer. Handoffs between the two agents use the structured payloads defined in the Handoff section.
-
-**Before executing any multi-step task:** State your plan in one sentence — name the skill you will load, the entity type you will operate on, and your first concrete action. Then proceed.
-
-## Initialisation
-
-When the user is starting the living documentation or explicitly asks to define storage setup, ask:
-
-> "Which storage format does your living doc use? Describe the entity structure, field names, and where entities are stored (e.g. YAML files in `docs/living-doc/`, ADO work items, Confluence pages)."
-
-Wait for the answer before the first persisted create or update in that session. Extract from the response:
-- **Storage location** — where entity files live (path pattern or external system)
-- **Entity templates** — expected fields and their names per entity type (US, Feature, Functionality)
-- **AC block structure** — how ACs are represented (inline fields, nested list, table)
-- **Field name mappings** — e.g. what the project calls `state`, `version`, `id`
-
-Never invent a format. If the answer is incomplete, ask one targeted follow-up before proceeding. Once confirmed, write the Storage Profile to `.copilot/living-doc/.storage-profile.md` so future sessions can load it without re-asking. If that file already exists at session start, load it and skip the initialisation prompt. If a later request omits storage details, assume the confirmed Storage Profile still applies.
-
-## Session State
-
-For multi-entity HEALING or PLAN sessions, maintain a lightweight state file at `.copilot/living-doc/.session-state.md` to prevent re-processing already-handled entities.
-
-```markdown
-# Living Doc Session
-_Auto-managed by @living-doc-copilot. Delete when session complete._
-
-## Goal
-<!-- One sentence: what this healing session must fix -->
-
-## Entities Processed
-- [x] US-001 — verified, no change
-- [-] US-002 — IN PROGRESS
-- [ ] US-003 — pending
-
-## Decisions & Findings
-<!-- Non-obvious discoveries: deleted code confirmed, superseded_by, external confirmation obtained -->
-```
-
-**Update rules:** Mark an entity `[-]` when you begin processing it.
Append to `Decisions & Findings` when code-deletion is confirmed or a traceability issue is found. Mark `[x]` once the deprecation or update is written. Delete the file when the session goal is fully achieved.
-
-**Stopping conditions:** Escalate to user when (a) code-deletion cannot be confirmed via repository search; (b) a traceability link references a non-existent entity; (c) context is nearing capacity — write a compaction summary of all pending entities to `Decisions & Findings`, then ask the user to resume in a new session; or (d) more than 50 tool calls have been made without completing the session goal — pause, summarise progress, and ask how to proceed.
-
-**PLAN mode note-taking:** For multi-AC PLAN sessions (more than 3 ACs being drafted), use the same state file at `.copilot/living-doc/.session-state.md` to track which ACs have been drafted, presented for confirmation, and created. Delete the file when all ACs are confirmed and written.
-
-## Scope
-
-- Create User Story, Feature, and Functionality entities from business requirements or PO descriptions
-- Add, update, or reprioritise Acceptance Criteria on existing entities
-- Deprecate entities whose corresponding code has been deleted or superseded
-- Promote entities from `PLANNED` to `ACTIVE` state after implementation is confirmed
-- Analyse the impact of a code change or PR on the catalog (which entities are affected)
-- Find gaps in the catalog: undocumented behaviours, orphan tests, untested ACs (HEALING mode)
-- Draft ACs from PO descriptions without existing code, in `PLANNED` state (PLAN mode)
-
-## Does NOT
-
-- Write Gherkin scenarios or feature files: hand off to `@living-doc-bdd-copilot`
-- Explore or crawl web apps: hand off to `@living-doc-bdd-copilot`
-- Write any test code: `@sdet-copilot` _(not yet deployed — leave a `TODO: @sdet-copilot` note)_
-- Repair PageObject selectors or step definitions: hand off to `@living-doc-bdd-copilot`
-
-## Tool Guidance
-
-| Tool | When to use | Key
guidance |
-|---|---|---|
-| `read/readFile` | Read existing entity files before any update | Always read before writing — never assume current field values or ID sequences. |
-| `execute/runInTerminal` | Run `scripts/next_id.py` to get the next entity ID | Run from the `skills/<entity-type>/` directory. Verify output before using the ID. |
-| `search/codebase` | Confirm code deletion before deprecating an entity | Require a negative result for at least two plausible identifiers (class name, function name) before assuming code is deleted. |
-| `search/textSearch` | Find `@AC:` tag annotations affected by an AC update | Run before writing any updated AC to surface stale Gherkin links for `@living-doc-bdd-copilot`. |
-| `edit/createFile` | Write new entity files | Confirm Storage Profile is loaded first. Use confirmed field names only — never invent. |
-| `edit/editFiles` | Update existing entity files | Show OLD vs NEW diff to user before writing when updating `ACTIVE` ACs. |
-| `agent/runSubagent` | Delegate BDD work to `@living-doc-bdd-copilot` | Pass the exact structured handoff payload from [Handoff](#handoff). |
-
----
-
-## Examples
-
-**Example 1 — Creating a User Story with correct AC metadata**
-
-> User: Create a User Story for the promo code feature. ACs: valid promo reduces cart by 10%; expired promo shows error.
-
-Agent plan: Creating a User Story. Loading `living-doc-create-user-story` skill. First action: confirm Storage Profile is loaded, then draft the narrative and ACs for user confirmation.
-
-Expected AC output (one per observable outcome, all metadata fields present):
-
-```yaml
-id: AC:US-010-01
-state: PLANNED
-version: v1.0.0
-description: "When a valid promo code is applied, the cart total is reduced by the stated discount percentage."
-pre-conditions:
- - Cart contains at least one item
- - Promo code is within its validity period
-not_in_scope: Stacking multiple promo codes in a single transaction
-```
-
----
-
-**Example 2 — HEALING mode, deprecating a stale entity**
-
-> User: Run HEALING mode — we deleted the legacy payment flow last sprint.
-
-Agent plan: Entering HEALING mode. Loading `living-doc-gap-finder` skill. First action: create session state file at `.copilot/living-doc/.session-state.md`, then search codebase for `LegacyPaymentService` to confirm deletion.
-
-_(Never deprecate without a confirmed negative code search. Show OLD vs NEW before writing any state change to an entity.)_
-
----
-
-## AC Metadata
-
-Every AC must carry these fields:
-
-| Field | Values |
-|---|---|
-| `state` | `PLANNED` / `IN_REVIEW` / `ACTIVE` / `DEPRECATED` |
-| `version` | Semantic version string |
-| `pre-conditions` | List of conditions that must hold before the AC can be tested |
-| `not_in_scope` | Explicit statement of what is excluded from this AC |
-
-## Gap Finder modes
-
-Load the `living-doc-gap-finder` skill for HEALING and gap-audit requests. Full mode protocols live in that skill — do not duplicate them here.
-
-This agent uses `living-doc-gap-finder` **top-down**: discovering missing documentation entities (Features, US, Functionalities not yet in the catalog). `@living-doc-bdd-copilot` uses it bottom-up (scenario coverage gaps) — do not apply that logic here.
-
-## Cross-agent HEALING boundary
-
-This agent heals the **catalog layer** (entities, ACs, traceability links).
-`@living-doc-bdd-copilot` heals the **automation layer** (PageObjects, step definitions, feature files).
-Do not cross this boundary. If a HEALING task touches both layers, complete the catalog changes here and hand off to `@living-doc-bdd-copilot` for the automation layer using the structured payload in [Handoff](#handoff).
-
-## Skills
-
-| Skill | Intent | Path | When to load |
-|---|---|---|---|
-| `living-doc-create-user-story` | Create a new User Story with business-level ACs | `skills/living-doc-create-user-story/SKILL.md` | New US or narrative request |
-| `living-doc-create-feature` | Document a system surface (screen, API, service) | `skills/living-doc-create-feature/SKILL.md` | New Feature or inbound surface from `@living-doc-bdd-copilot` |
-| `living-doc-create-functionality` | Define an atomic, testable behaviour | `skills/living-doc-create-functionality/SKILL.md` | New Functionality or atomic-behaviour AC request |
-| `living-doc-update` | Amend or deprecate existing entities | `skills/living-doc-update/SKILL.md` | Updating, promoting, or deprecating an entity or AC |
-| `living-doc-impact-analysis` | Trace which entities a code change affects | `skills/living-doc-impact-analysis/SKILL.md` | PR review or change-trace request |
-| `living-doc-gap-finder` | Find undocumented behaviours and orphan tests (top-down usage) | `skills/living-doc-gap-finder/SKILL.md` | HEALING mode or documentation gap audit |
-| `living-doc-scenario-creator` | Generate living-doc feature file header and scenario skeletons from a US entity | `skills/living-doc-scenario-creator/SKILL.md` | When a User Story is ready for feature file bootstrapping |
-
-## Operating rules
-
-### Storage
-- Confirm and cache the Storage Profile before the first persisted create or update only when the session is establishing storage setup; once confirmed, write every entity in that format, reuse it for later requests in the same session, and never invent missing field names.
-
-### Routing
-- Route by request type: User Story or business journey → `living-doc-create-user-story`; atomic business rule or component behaviour → `living-doc-create-functionality`; impact or change trace → `living-doc-impact-analysis`; update or deprecate an existing entity or AC → `living-doc-update`; catalog drift or stale coverage → `living-doc-gap-finder`; feature file bootstrap for a ready User Story → `living-doc-scenario-creator`.
-- If a User Story request includes capability and ACs but omits actor or business value, draft the most likely `As a / I can / so that` narrative from the business context and ask for confirmation only when the role or value is genuinely ambiguous.
-
-### Entity creation
-- Use atomic ACs only: one triggering condition plus one observable outcome per AC. Every AC must include `id`, `state`, `version`, `pre-conditions`, and `not_in_scope`. Unless the confirmed Storage Profile already defines a different convention, use `AC:<parent-id>-<nn>` and keep AC IDs stable across updates.
-- For Functionality requests, use a verb-phrase name, draft ACs and present them for confirmation before creating, and run a completeness checklist for thresholds, below/exactly/above-boundary behaviour, invalid or missing input, and interactions with other rules.
-
-### PLAN mode
-- Draft ACs first, cover happy path, error path, boundary conditions, and threshold or conversion rules where relevant, then create only after confirmation and only in `PLANNED` state.
-
-### Updates and promotion
-- Updating an `ACTIVE` AC: show OLD vs NEW side by side before writing, keep the AC ID unchanged, and bump the semantic version for business-rule changes (for example `v1.0.0` to `v1.1.0` for a threshold change). Flag any linked `@AC:` tag annotations in feature files as potentially stale for `@living-doc-bdd-copilot`.
-- **Promoting a US to `ACTIVE`:** confirm with the user that all ACs are implemented and tested (or at minimum `IN_REVIEW`); verify no AC remains in `PLANNED` state; update the US state to `ACTIVE`; notify `@living-doc-bdd-copilot` to sync `@AC:` traceability tags in feature files.
-
-### HEALING mode
-- Verify deleted or superseded code via repository search or explicit user confirmation before deprecating; then set stale ACs or entities to `DEPRECATED`, repair traceability links, remove or flag stale `pre-conditions`, and leave PageObjects, step definitions, and Gherkin sync to `@living-doc-bdd-copilot`.
-
-### Impact analysis
-- Produce an explicit impact map covering affected and unaffected Features, Functionalities, User Stories, ACs, and linked scenarios; recommend version bumps on changed entities and deprecation for removed behaviours, but do not change state without user confirmation.
-
-## File editing protocol (CLI context)
-
-When this agent runs via the GitHub Copilot CLI task tool, only `view` (read) and `create` (new files) are available — `str_replace`/`edit` tools are not provisioned regardless of the `tools:` frontmatter. This is a CLI constraint, not a configuration problem.
-
-**When a task requires modifying an existing file:**
-
-1. Read the file with `view`.
-2. Produce a structured edit specification — do NOT generate shell commands or workarounds. Use this exact format for each file change:
-
-```
-FILE: <relative/path/to/file>
-FIND (exact, unique string):
-<<<
-<old content>
->>>
-REPLACE WITH:
-<<<
-<new content>
->>>
-```
-
-3. After all edit specs, add:
- > ⚙️ **Caller action required:** Apply the edit specs above using the `edit` tool, then confirm completion.
-
-The calling agent (GitHub Copilot CLI main session) will apply the edits using its own `edit` tool and report back.
-
-**When a task requires creating a new file:** use `create` directly — this works without restriction.
-
-## Handoff
-
-**Inbound from `@living-doc-bdd-copilot`:** Receives a surface list after Phase 1 exploration. Expected payload:
-
-```
-Surfaces mapped. Candidate Features:
-- FEAT candidate: <route> → <surface name>
-- ...
-```
-
-Load this list and create the corresponding Feature and User Story entities.
-
-**Outbound to `@living-doc-bdd-copilot`:** When US and ACs are confirmed and in `ACTIVE` (or `PLANNED`) state, send a structured package:
-
-```
-US: <US-id> — <title>
-ACs: [<AC-id> (v<version> – ACTIVE), ...]
-Feature: <FEAT-id> — <title>
-PageObjects: <path/to/PageObject or 'none — needs exploration'>
-Call @living-doc-bdd-copilot to generate scenarios.
-```
diff --git a/README.md b/README.md
index a8963b7..e4dc71b 100644
--- a/README.md
+++ b/README.md
@@ -82,13 +82,12 @@ its purpose, trigger phrases, and full instructions.
 | **[living-doc-create-functionality](./skills/living-doc-create-functionality/)** | Define an atomic, testable behaviour (Functionality) with AC designed for fast unit or integration tests. |
 | **[living-doc-update](./skills/living-doc-update/)** | Amend or deprecate existing User Story, Feature, or Functionality entities — add ACs, change status, update ownership. |
 | **[living-doc-impact-analysis](./skills/living-doc-impact-analysis/)** | Trace which Features, Functionalities, User Stories, and Gherkin scenarios are affected by a code change or PR. |
-| **[living-doc-gap-finder](./skills/living-doc-gap-finder/)** | Identify undocumented behaviours, orphan tests, and untested ACs. Shared by `@living-doc-copilot` and `@living-doc-bdd-copilot`. |
+| **[living-doc-gap-finder](./skills/living-doc-gap-finder/)** | Identify undocumented behaviours, orphan tests, and untested ACs. Used by `@living-doc-bdd-copilot` top-down (catalog gaps) and bottom-up (scenario coverage).
| | **[living-doc-pageobject-scan](./skills/living-doc-pageobject-scan/)** | Discover, create, and maintain PageObject classes from a live web application — bootstrapping from scratch and detecting selector drift after UI changes. | | **[bdd-explore](./skills/bdd-explore/)** | Assemble the Business Seed (`seed.yaml`) and iteratively crawl a web application via MCP Playwright — the first-time scan entry point for `@living-doc-bdd-copilot`. | | **[bdd-maintain](./skills/bdd-maintain/)** | RE-SCAN, HEALING, REMOVE, and DEAD CODE AUDIT modes for `@living-doc-bdd-copilot` — refresh the manifest after UI changes, fix selector drift, remove deprecated features, and audit unused steps or PageObject methods. | | **[data-cy-instrument](./skills/data-cy-instrument/)** | Resolve missing `data-cy` attributes in Angular component templates and sync PageObjects to use `getByTestId()` — run after a crawl when `coverage_gaps` are non-empty. | -| **[living-doc-scenario-creator](./skills/living-doc-scenario-creator/)** | Generate Gherkin scenario skeletons from User Story ACs — one scenario per AC, coverage report, and missing step identification. | -| **[bdd-scenario-gen](./skills/bdd-scenario-gen/)** | Write BDD Gherkin scenarios in plain business language — Given/When/Then rules, anti-patterns, Scenario Outlines, Background, @AC: traceability annotations, gap detection, and step definition resolution. | +| **[living-doc-scenario-creator](./skills/living-doc-scenario-creator/)** | Generate full Gherkin feature files from User Story and Functionality ACs — feature file header, @AC:-tagged scenarios, complete Given/When/Then step bodies, coverage report, and step definition resolution. | | **[gherkin-step](./skills/gherkin-step/)** | Implement clean, reusable step definitions — behave (Python), Cucumber (Java, TypeScript, Scala), parameter types, DataTable, DocString, and hooks. 
| | **[gherkin-living-doc-sync](./skills/gherkin-living-doc-sync/)** | Synchronise Gherkin feature files with the living documentation catalog — fix missing AC traceability headers, step text drift, and stale scenario links. | | **[token-saving](./skills/token-saving/)** | Always-active response discipline — enforces brevity, no filler openers or closers, structured output, and a What/Why/How footer on code responses. Suspends on explicit "full detail" requests. | @@ -99,8 +98,7 @@ Agents are pre-configured AI personas that orchestrate multiple skills for a spe | Agent | Description | |---|---| -| **[@living-doc-copilot](./.github/agents/living-doc-copilot.agent.md)** | Creates and maintains the living documentation catalog: User Stories, Features, Functionalities, AC updates, impact analysis, gap finding. | -| **[@living-doc-bdd-copilot](./.github/agents/living-doc-bdd-copilot.agent.md)** | Automation layer: explores web apps via MCP Playwright, generates PageObjects and Gherkin scenarios, writes step definitions, and maintains the BDD suite across RE-SCAN, HEALING, and REMOVE phases. | +| **[@living-doc-bdd-copilot](./.github/agents/living-doc-bdd-copilot.agent.md)** | Full living documentation agent — catalog management (User Stories, Features, Functionalities, AC updates, impact analysis, gap finding) plus BDD automation (webapp exploration, PageObjects, Gherkin, step definitions, BDD suite maintenance). | ## Finding More Skills diff --git a/docs/README.md b/docs/README.md index 2c2a6ce..23e3313 100644 --- a/docs/README.md +++ b/docs/README.md @@ -28,9 +28,8 @@ Navigation hub for all guides in this repository. Browse by category below. 
## Agent Guides | Guide | Description | -|-----------------------------------------------|------------------------------------------------------------------------------------------| -| [Living Doc Copilot](./guides/living-doc-copilot.md) | How the living-doc-copilot agent works, its scope, modes, and how to trigger it | -| [Living Doc BDD Copilot](./guides/living-doc-bdd-copilot.md) | How the living-doc-bdd-copilot agent works: web app exploration, PageObjects, Gherkin generation, and BDD maintenance phases | +|-----------------------------------------------|-------------------------------------------------------------------------| +| [Living Doc BDD Copilot](./guides/living-doc-bdd-copilot.md) | The unified living documentation agent: catalog management (User Stories, Features, Functionalities, AC updates, impact analysis, gap finding) plus BDD automation (webapp exploration, PageObjects, Gherkin, step definitions, maintenance) | > **Keep this index up to date.** When you add a new guide, add a row to the appropriate table above. diff --git a/docs/getting-started.md b/docs/getting-started.md index a9cd7e4..d17b36a 100644 --- a/docs/getting-started.md +++ b/docs/getting-started.md @@ -162,8 +162,8 @@ Copy the agent file into your project's `.github/agents/` directory: # One-time setup mkdir -p .github/agents -# Copy a specific agent -cp path/to/agentic-toolkit/.github/agents/living-doc-copilot.agent.md .github/agents/ +# Copy the agent +cp path/to/agentic-toolkit/.github/agents/living-doc-bdd-copilot.agent.md .github/agents/ ``` Or clone the toolkit and copy all agents: @@ -180,9 +180,9 @@ Commit the `.github/agents/` directory to share the agents with your team. 
Open Copilot Chat in VS Code and type `@` followed by the agent name: ``` -@living-doc-copilot create user story for the login feature -@living-doc-copilot living doc gaps -@living-doc-copilot HEALING mode +@living-doc-bdd-copilot create user story for the login feature +@living-doc-bdd-copilot living doc gaps +@living-doc-bdd-copilot HEALING mode ``` The agent loads its skills on demand and follows its defined scope. See the [Agent Roster](../README.md#agent-roster) for the full list of available agents and their guides. diff --git a/docs/guides/agent-design.md b/docs/guides/agent-design.md index f6af3dc..3392741 100644 --- a/docs/guides/agent-design.md +++ b/docs/guides/agent-design.md @@ -245,7 +245,7 @@ One `# AC:` + `@AC:` pair per AC. The `@AC:` tag is the machine-readable traceab | Layer | Owner | |---|---| -| Catalog (entities, ACs, traceability links) | `@living-doc-copilot` | +| Catalog (entities, ACs, traceability links) | `@living-doc-bdd-copilot` | | Automation (PageObjects, step definitions, feature files) | `@living-doc-bdd-copilot` | Never cross this boundary. When a task belongs to the other agent, hand off using the structured payload — do not attempt the task yourself. 
diff --git a/docs/guides/living-doc-bdd-copilot.md b/docs/guides/living-doc-bdd-copilot.md index 2990c2c..ca16e9c 100644 --- a/docs/guides/living-doc-bdd-copilot.md +++ b/docs/guides/living-doc-bdd-copilot.md @@ -118,20 +118,18 @@ After exploration: |---|---|---| | **RE-SCAN** | New feature shipped or UI refactored | Full re-crawl of every manifest path plus active discovery of new routes (links, buttons, tabs, wizard steps); updates manifest; generates new scenarios for new ACs | | **HEALING** | Tests failing due to selector drift | Scoped to failing tests only — navigates affected pages; identifies updated selectors; repairs PageObjects and step bindings; re-runs only the previously failing tests to confirm | -| **REMOVE** | Feature deprecated or deleted | Identifies linked `.feature` files, steps, and PageObjects; confirms before deleting; hands catalog deprecation to `@living-doc-copilot` | +| **REMOVE** | Feature deprecated or deleted | Identifies linked `.feature` files, steps, and PageObjects; confirms before deleting; loads `living-doc-update` to complete catalog deprecation | --- ## Shared skill — `living-doc-gap-finder` -`living-doc-gap-finder` is shared between two agents but used in opposite directions: +`living-doc-gap-finder` is used in two directions within the same agent: -| Agent | Direction | What it finds | -|---|---|---| -| `@living-doc-copilot` | **Top-down** | Missing documentation entities (Features, User Stories, Functionalities not yet in the catalog) | -| `@living-doc-bdd-copilot` | **Bottom-up** | ACs that exist in the catalog but have no linked Gherkin scenario | - -When this agent loads `living-doc-gap-finder`, it uses the **bottom-up** (scenario coverage) mode. 
+| Direction | What it finds | +|---|---| +| **Top-down** (catalog operations) | Missing documentation entities (Features, User Stories, Functionalities not yet in the catalog) | +| **Bottom-up** (automation operations) | ACs that exist in the catalog but have no linked Gherkin scenario | --- @@ -139,37 +137,24 @@ When this agent loads `living-doc-gap-finder`, it uses the **bottom-up** (scenar | Skill | Purpose | |---|---| -| `living-doc-pageobject-scan` | Discover, create, and maintain PageObject classes | -| `living-doc-scenario-creator` | Generate Gherkin scenario skeletons from User Story ACs | -| `living-doc-gap-finder` | Find ACs with no linked scenario (bottom-up, scenario coverage) | -| `bdd-scenario-gen` | Write BDD Gherkin scenarios, detect coverage gaps, resolve step stubs | -| `gherkin-step` | Implement step definitions — clean, reusable, maintainable | +| `living-doc-pageobject-scan` | Discover, create, and maintain PageObject classes; Business Seed assembly and webapp crawl | +| `living-doc-scenario-creator` | Generate full Gherkin feature files (header + scenarios + step bodies) from ACs | +| `living-doc-gap-finder` | Find catalog gaps (top-down) and ACs with no linked scenario (bottom-up) | +| `gherkin-step` | Implement step definitions | | `gherkin-living-doc-sync` | Sync feature files and scenarios with living doc traceability links | +| `data-cy-instrument` | Resolve missing `data-cy` attributes end-to-end | +| `bdd-maintain` | RE-SCAN, HEALING, REMOVE modes | --- ## Handoff -**Inbound — from `@living-doc-copilot`:** -Receives confirmed User Stories with `ACTIVE` ACs. Generates scenarios and steps. - -**Outbound — after exploration (surfaces mapped):** - -> "Surfaces mapped. Call @living-doc-copilot to document them." - -**Outbound — after scenario generation (feature files generated):** - -> "Feature files and steps generated. Call @sdet-copilot for unit tests." - ---- +No cross-agent handoffs needed. 
This agent owns both catalog and automation layers. -## Agent boundaries +For concerns outside this agent's scope: | Concern | Owner | |---|---| -| Living doc catalog entities (US, Feature, Functionality) | `@living-doc-copilot` | -| AC states, traceability links, entity deprecation | `@living-doc-copilot` | -| Web app exploration, PageObjects, Gherkin, step definitions | `@living-doc-bdd-copilot` (this agent) | | Unit and integration tests | `@sdet-copilot` | | CI quality gates and linting | `@quality-gate-copilot` | diff --git a/docs/guides/living-doc-copilot.md b/docs/guides/living-doc-copilot.md index 435b5eb..d0a0841 100644 --- a/docs/guides/living-doc-copilot.md +++ b/docs/guides/living-doc-copilot.md @@ -1,125 +1,8 @@ # Living Doc Copilot Agent -`@living-doc-copilot` is the requirements layer agent. It owns the living documentation catalog — creating, updating, healing, and planning entities. It does not write code or test files. +> **This agent has been merged into `@living-doc-bdd-copilot`.** +> See [living-doc-bdd-copilot.md](./living-doc-bdd-copilot.md) for the unified agent guide. ---- +Catalog management (User Stories, Features, Functionalities, AC updates, impact analysis, gap finding) and BDD automation are now owned by a single agent: **`@living-doc-bdd-copilot`**. 
-## What it does - -| Task | When to use | -|---|---| -| Create User Story / Feature / Functionality | Documenting new business requirements or system surfaces | -| Add or update Acceptance Criteria | After a sprint review, new requirement, or AC priority change | -| Deprecate entities | Code deleted, feature removed, or superseded by a new entity | -| Promote `PLANNED` → `ACTIVE` | After implementation is confirmed | -| Impact analysis | Before merging a PR that touches business logic | -| Gap finding — HEALING mode | Catalog has drifted: orphan tests, stale ACs, broken traceability | -| Gap finding — PLAN mode | PO has descriptions but no code exists yet | - ---- - -## How to trigger it - -``` -create user story for X -document feature — login screen -update AC on US-42 -deprecate the payment-gateway functionality -mark US-17 as ready -what does this change affect? -living doc gaps -HEALING mode -PLAN mode -living doc copilot -``` - ---- - -## Before you start — project setup - -On first use in a project, tell the agent how your living documentation is structured. The agent calls this a **Storage Profile** and uses it to apply the correct field names, AC block layout, and entity templates for your project. - -Examples of what to describe: - -| What to tell the agent | Example | -|---|---| -| Where entities are stored | `docs/living-doc/` as YAML files, or ADO work items, or Confluence pages | -| Entity fields | `id`, `title`, `state`, `acs` — and what each is called in your project | -| AC block structure | Inline fields under each AC, nested list, or table | -| State vocabulary | `PLANNED` / `ACTIVE` / `DEPRECATED` or custom terms your project uses | - -The agent will ask this question automatically at session start. You can also state it upfront before any command: - -``` -Our living doc is stored as YAML files in docs/living-doc/. -User Stories have: id, title, state, acs (list). -Each AC has: id, text, state, version, pre-conditions, not_in_scope. 
-``` - -> If the Storage Profile is incomplete, the agent will ask one targeted follow-up before creating or updating anything. - ---- - -## Modes - -### HEALING mode - -Repairs catalog drift. Triggers when the living doc has fallen behind the codebase: -- Sets `DEPRECATED` state on entities whose code no longer exists -- Fixes broken traceability links (US ↔ Feature ↔ Functionality) -- Updates `version` fields and removes stale `pre-conditions` -- Does **not** repair PageObject selectors or step definitions → `@living-doc-bdd-copilot` - -> `@living-doc-bdd-copilot` is the expected cooperating agent for automation-layer healing. It is deployed separately from this agent — if it is not yet available in your repo, record the automation-layer items as TODO notes for a future BDD session. - -### PLAN mode - -Drafts new ACs from PO descriptions before any code exists: -- Presents draft for confirmation before creating -- Creates in `PLANNED` state only — never `ACTIVE` - ---- - -## AC Metadata - -Every AC created or updated by this agent carries: - -| Field | Values | -|---|---| -| `state` | `PLANNED` / `ACTIVE` / `DEPRECATED` / `IN_REVIEW` | -| `version` | Semantic version string | -| `pre-conditions` | Conditions that must hold before the AC can be tested | -| `not_in_scope` | Explicit statement of what is excluded | - ---- - -## Skills used - -| Skill | Purpose | -|---|---| -| `living-doc-create-user-story` | New User Story with business-level ACs | -| `living-doc-create-feature` | New Feature entity (system surface) | -| `living-doc-create-functionality` | New atomic, testable behaviour | -| `living-doc-update` | Amend or deprecate existing entities | -| `living-doc-impact-analysis` | Trace entities affected by a code change | -| `living-doc-gap-finder` | Find undocumented behaviours and orphan tests. **Shared skill** — used top-down here (missing doc entities) and bottom-up by `@living-doc-bdd-copilot` (scenario coverage gaps against known ACs). 
| - ---- - -## Handoff - -**Inbound:** `@living-doc-bdd-copilot` hands a surface list after webapp exploration. Load it and create the corresponding Feature and User Story entities. - -**Outbound:** When entities are confirmed and ready: - -> "US and ACs are ready. Call @bdd-copilot to generate scenarios." - ---- - -## Installation - -```bash -npx skills add https://github.com/AbsaOSS/agentic-toolkit -g -``` - -See [Getting Started](../getting-started.md) for the full install guide. +Use `@living-doc-bdd-copilot` for all requests previously directed to `@living-doc-copilot`. diff --git a/docs/testing/agent-testing.md b/docs/testing/agent-testing.md index e3ea770..87f2e06 100644 --- a/docs/testing/agent-testing.md +++ b/docs/testing/agent-testing.md @@ -66,7 +66,7 @@ Mirrors the skill trigger-eval format exactly. Store at `.github/agents/evals/<a "id": "should-not-trigger-1", "prompt": "create a user story for the login feature", "should_trigger": false, - "expected_agent": "living-doc-copilot" + "expected_agent": "living-doc-bdd-copilot" }, { "id": "should-not-trigger-2", diff --git a/skills/bdd-explore/SKILL.md b/skills/bdd-explore/SKILL.md deleted file mode 100644 index eaca791..0000000 --- a/skills/bdd-explore/SKILL.md +++ /dev/null @@ -1,175 +0,0 @@ ---- -name: bdd-explore -description: > - Business Seed assembly, iterative UI crawl, PageObject generation, and guided traversal - for the @living-doc-bdd-copilot agent. Activate for any webapp exploration or first-time - scan session. Covers seed.yaml assembly (Sources A–E), MCP Playwright crawl loop, entity - harvesting, ExplorationFixture sourcing cascade, custom component interaction rules, - parameterised route resolution, Source E guided traversal, and manifest.json output. - Triggers on: "scan webapp", "crawl UI", "explore the app", "discover routes", - "business seed", "seed.yaml", "manifest.json", "build pageobjects", "first scan", - "assemble seed", "guided traversal", "explore routes", "bdd explore". 
- Does NOT trigger for: standalone PageObject generation from a pre-built manifest without a - live webapp crawl (use living-doc-pageobject-scan); BDD maintenance after UI changes or - test failures (use bdd-maintain). -license: Apache-2.0 -compatibility: GitHub Copilot ---- - -# BDD Explore — Business Seed Assembly & Iterative Crawl - -> **Glossary:** Feature, Functionality, User Story — see [living-doc-glossary](../references/living-doc-glossary.md) ([remote](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md)). -> **BDD schemas:** ExplorationFixture taxonomy, seed.yaml schema, manifest field_constraints — see [living-doc-bdd-schemas](../references/living-doc-bdd-schemas.md) ([remote](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-bdd-schemas.md)). - ---- - -## Business Seed Assembly - -Before crawling, assemble the Business Seed file at `.copilot/bdd/seed.yaml`. - -Sources A–E — collect from whichever are available: - -| Source | Behaviour | -|---|---| -| **A — Living documentation** | Extract Feature names, US titles, and AC texts. Map each Feature to its primary URL/route if known. | -| **B — Sitemap or route config** | Parse route definitions (Angular router, React Router, `sitemap.xml`) to enumerate URL paths. | -| **C — OpenAPI / Swagger spec** | Extract endpoint paths; map REST resources to UI screens where obvious. | -| **D — Existing PageObjects** | Load current `.copilot/bdd/manifest.json` if present — treat known surfaces as already discovered. | -| **E — Guided traversal** | See Source E protocol below. | - -**Credential safety rule:** Never store literal credentials in `seed.yaml`. Always use `env:VAR_NAME` as the value, e.g.: - -```yaml -credentials: - username: env:BDD_USERNAME - password: env:BDD_PASSWORD -``` - -**Artifact location:** BDD artifacts can live anywhere in the repository. On session start, discover them: - -1. 
Search for `seed.yaml` containing a `base_url:` key. -2. Search for `manifest.json` containing an array with `pageobject_path` entries. -3. If found, load both files and record their paths for this session. -4. If NOT found, create them at a sensible location (e.g. alongside the existing living documentation directory if one exists, otherwise `.copilot/bdd/`). -5. **On first discovery:** propose adding their locations to `.github/copilot-instructions.md` so every future agent session can load them without searching: - -```markdown -## BDD Artifacts -- **Business Seed:** `<relative-path>/seed.yaml` — webapp routes, credentials (env refs), guided traversal steps -- **Exploration Manifest:** `<relative-path>/manifest.json` — discovered UI surfaces, component IDs, PageObject paths -``` - -Committing both files means every subsequent session resumes from the last known state — no re-crawl required. - -**Output artifact:** `seed.yaml` (path discovered or chosen above) - -```yaml -base_url: https://... -credentials: - username: env:BDD_USERNAME - password: env:BDD_PASSWORD -known_routes: - - path: /login - feature: Authentication - - path: /dashboard - feature: Dashboard -guided_steps: [] # populated during Source E traversal -form_fixtures: {} # keyed by route path; populated during form traversal (ExplorationFixture schema) -``` - ---- - -## Iterative Exploration - -**On session start:** Load `seed.yaml`. If `.copilot/bdd/manifest.json` is present, load it — treat all listed surfaces as already discovered and resume from there. If manifest is absent, treat this as the first run (clean slate). - -**Partial state rule:** `seed.yaml` present but `manifest.json` absent = first exploration run. Begin crawl from `base_url`; do not assume any surfaces have been discovered. - -**Crawl loop:** - -1. Navigate to each known route from `seed.yaml` using MCP Playwright. -2. Snapshot the page; identify interactive elements, forms, navigation links, and significant UI surfaces. -3. 
Follow links and expand navigation to discover new routes not in the manifest. -4. For each new surface discovered: add an entry to `manifest.json` (Feature name, URL, component IDs, PageObject path). -5. Repeat until coverage plateau — no new surfaces found in the last full iteration. -5a. **Entity harvesting** — whenever a domain ID, version, feed ID, or other parameterised entity is read from the DOM (URLs, card text, table rows), record it under `known_entities` in `seed.yaml` if not already present. Fields: `id`, `version`, `name`, `status`, `owner`, `note`. These values feed the sourcing cascade for parameterised routes in subsequent sessions. -6. For each form, wizard, or dialog on a visited page, attempt to fill and progress using the **ExplorationFixture sourcing cascade** (see [living-doc-bdd-schemas](../references/living-doc-bdd-schemas.md#explorationfixture)): (1) pre-declared values in `seed.yaml form_fixtures` — use the `default`-labelled value for the happy path and explore alternate `values[]` branches to reach different form sections or sub-routes; (2) values read from an existing entity in the app — copy verbatim (`copyable`) or append a suffix to avoid duplicate rejection (`derived`); (3) inferred `fake` values from label + placeholder + tooltip text; (4) user-assist pause for `real-world` fields with no resolvable value. Skip `condition`-gated fields until the controlling field holds the required value. After a successful submission, probe each text input for: special characters (`<>'"&\`), oversized input (200+ chars), wrong type, and duplicate values — run the core scan after each probe to capture `data-cy` validation elements visible only in error state. Record findings as `field_constraints` in the manifest `navigation_context`. Report any still-unreachable flows (auth walls, CAPTCHA, deep data dependencies) and offer to enrich `seed.yaml`. 
**Dismiss rule — after scanning any modal dialog or overlay, always close it (Cancel button → × close button → Escape key, in that order) before navigating to the next route or triggering the next action. Never leave a dialog open while scanning a subsequent page.** - -**Component interaction rules — use these instead of `fill()` for custom components:** - -| Component | Correct interaction | -|---|---| -| `cps-radio-group` | `browser_click` the inner `<label>` or `<span>` whose text matches the desired option. Do NOT use `fill()`. | -| `cps-select` | `browser_click` the component to open the dropdown portal, then `browser_click` the matching `<li>` option by text. | -| `cps-autocomplete` | Type into the inner `<input>` using `browser_type`, wait for the dropdown to appear, then `browser_click` the matching option. | -| `cps-switch` / `cps-checkbox` | `browser_click` the component wrapper. | -| `app-text-editor` (rich text) | `browser_click` the `contenteditable` child, then `browser_type` the value. | -| `cps-button` | `browser_click` the inner `<button>` (e.g. via `evaluate`: `el.querySelector('button').click()`). | -| `input[type=file]` | Use `mcp_browser_file_upload` or `page.setInputFiles()` with a fixture file path from `seed.yaml form_fixtures`. | - -After interacting with a required field (especially `cps-radio-group`), re-check whether a gated button (e.g. Continue, Save) has become enabled before proceeding. - -**Parameterised route resolution — use `known_entities` before prompting the user:** - -Before navigating to any parameterised route (e.g. `/auth/all-domains/{domainId}/{version}/...`), first check `seed.yaml known_entities` for a matching entity with `owner` equal to the current test user. Substitute the `id` and `version` values directly. Only fall back to the user-assist pause if no matching entity exists. 
- -For domain detail tab scans (Schema, Run history, Access, Version management): always navigate using the first `known_entities` domain owned by the current test user, then click each tab in turn and run the core scan. These tabs are reachable by tab click alone — no additional data state is required to open them. - -**PageObject generation rule:** For every new or changed UI surface, load `living-doc-pageobject-scan` — `Create` mode for first-time generation and `Maintain` mode for selector drift. Generated PageObjects must use a file-level `living-doc: FEAT-<nnn> | /route` header comment, prefer `data-testid` selectors, keep selector constants in `ALL_CAPS`, accept `page` in `__init__` / `constructor`, and expose method stubs for each interactive element. Flag any positional CSS selector as `FRAGILE`. If no matching Feature exists in the living documentation, hand the surface to `@living-doc-copilot`; do not create entities here. - -**Output artifact:** `.copilot/bdd/manifest.json` - -The manifest records per-route exploration state. 
Schema matches the `living-doc-pageobject-scan` skill definition: - -```json -{ - "version": "1.0", - "routes": { - "/login": { - "pageobject_path": "aul-ui/playwright/pages/LoginPage.ts", - "feature_id": "FEAT-001", - "last_scanned": "2026-05-26T10:30:00Z", - "elements": [ - { "data_cy": "username-input", "tag": "input" }, - { "data_cy": "password-input", "tag": "input" }, - { "data_cy": "login-btn", "tag": "cps-button" } - ], - "coverage_gaps": [], - "navigation_context": { - "prerequisites": null, - "navigation_steps": "Navigate directly to /login.", - "data_requirements": null, - "auth_role": "unauthenticated", - "notes": null, - "field_constraints": [] - } - } - } -} -``` - ---- - -## Source E — Guided Traversal Protocol - -Use when automated crawling cannot proceed — unknown decision points, multi-step wizards, auth flows, role-gated screens, or forms blocked by missing business knowledge (required field values, valid lookup codes, business-specific input formats). - -**Protocol:** - -1. Take a screenshot; show the user what the agent sees. -2. Ask: *"I've reached a decision point at [URL]. What should I do next? (e.g. click X, fill field Y with Z, log in as role R, provide the valid value for field F)"* -3. Wait for the user's answer. Execute the described action via MCP Playwright. -4. Immediately append to `guided_steps:` in `seed.yaml`: - -```yaml -guided_steps: - - url: /checkout/payment - action: fill - field: card-number - value: env:TEST_CARD_NUMBER - note: "Test Visa card for payment flow" -``` - -5. Continue crawl from the new state. - -**CAPTCHA rule:** If a CAPTCHA is encountered, pause and ask the user to solve it manually in the browser. Do not attempt automated bypass. Once the user confirms it is solved, continue and record the step with `action: captcha_solved`. 
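
Note on the `env:VAR_NAME` convention used in the removed `seed.yaml` examples (credentials and `guided_steps` values): the skill text states the rule but does not show how a consumer would resolve such references. The following is a minimal sketch only, assuming the seed has already been parsed into plain Python dicts and lists. The function name `resolve_env_refs` and the `ENV_PREFIX` constant are hypothetical and are not part of the toolkit.

```python
import os

# Hypothetical marker, mirroring the "env:VAR_NAME" convention from seed.yaml.
ENV_PREFIX = "env:"


def resolve_env_refs(value, environ=None):
    """Return a copy of `value` with every 'env:VAR_NAME' string replaced
    by the matching entry from `environ` (defaults to os.environ).

    Dicts and lists are walked recursively; other values pass through.
    A missing variable raises KeyError instead of falling back silently,
    so a literal secret never has to be written into the seed file.
    """
    environ = os.environ if environ is None else environ
    if isinstance(value, dict):
        return {key: resolve_env_refs(item, environ) for key, item in value.items()}
    if isinstance(value, list):
        return [resolve_env_refs(item, environ) for item in value]
    if isinstance(value, str) and value.startswith(ENV_PREFIX):
        name = value[len(ENV_PREFIX):]
        if name not in environ:
            raise KeyError(f"environment variable {name!r} is not set")
        return environ[name]
    return value
```

Passing the environment as a parameter keeps the sketch testable without touching the real process environment; a caller would normally omit it and let `os.environ` be used.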
diff --git a/skills/bdd-maintain/SKILL.md b/skills/bdd-maintain/SKILL.md index ff24903..55fdc1b 100644 --- a/skills/bdd-maintain/SKILL.md +++ b/skills/bdd-maintain/SKILL.md @@ -1,19 +1,19 @@ --- name: bdd-maintain description: > - Maintenance modes for the @living-doc-bdd-copilot agent: RE-SCAN (full manifest refresh - after UI changes), HEALING (fix selector drift in failing tests only), REMOVE - (delete files linked to a deprecated feature), and DEAD CODE AUDIT (find unused step - definitions, PageObject methods, and PO components). Activate when the UI has changed and - the manifest needs refreshing, when tests are failing due to selector drift, when a feature - has been removed from the product, or when dead BDD code needs to be identified. - Triggers on: "re-scan", "refresh manifest", "heal pageobjects", "fix failing tests", - "selector drift", "tests are failing", "remove feature", "deprecate bdd", "bdd maintain", - "update selectors", "pageobject broken", "scenario failing", "unused steps", - "dead pageobject methods", "find unused steps", "dead code audit", "unused po methods". - Does NOT trigger for: first-time webapp exploration and seed assembly (use bdd-explore); - standalone (non-agent) PageObject maintenance outside @living-doc-bdd-copilot - (use living-doc-pageobject-scan). + Lifecycle cleanup for BDD automation artifacts: REMOVE (delete feature files, step + definitions, and PageObjects linked to a deprecated entity) and DEAD CODE AUDIT (find + unused step definitions, PageObject methods, and PO components via three Python scripts). + Activate when a feature has been removed from the product and its linked BDD files must be + deleted, or when dead BDD code needs to be identified. Third step in the entity-deprecation + chain — runs after living-doc-update deprecates the entity and gherkin-living-doc-sync + marks linked scenarios. 
+ Triggers on: "remove feature", "deprecate bdd", "delete feature files", "bdd cleanup", + "remove pageobject", "unused steps", "dead pageobject methods", "find unused steps", + "dead code audit", "unused po methods", "dead po components", "bdd-maintain". + Does NOT trigger for: re-scanning the manifest after UI changes (use living-doc-pageobject-scan + RE-SCAN scope); healing failing tests after selector drift (use living-doc-pageobject-scan + HEALING scope); syncing @AC: traceability tags (use gherkin-living-doc-sync). license: Apache-2.0 compatibility: GitHub Copilot --- @@ -23,41 +23,7 @@ compatibility: GitHub Copilot > **Glossary:** Feature, Functionality, User Story — see [living-doc-glossary](../references/living-doc-glossary.md) ([remote](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md)). > **BDD schemas:** manifest.json schema (routes, elements, coverage_gaps, navigation_context) — see [living-doc-bdd-schemas](../references/living-doc-bdd-schemas.md) ([remote](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-bdd-schemas.md)). -Three modes — activate the one that matches the trigger. - ---- - -## RE-SCAN mode - -**Trigger:** New feature shipped, UI refactored, or significant route changes. - -**Scope:** Full re-run of every path recorded in `manifest.json`, plus active discovery of new routes not yet in the manifest. - -1. Reload `seed.yaml` and `manifest.json`. -2. For every existing manifest entry: navigate to its URL, snapshot the DOM, and validate that every recorded `component_id` locator still resolves. Flag any locator that no longer matches as `BREAKING CHANGE`, including the linked step definition / scenario details that may fail. -3. **Actively discover new routes from each visited page** — do not limit discovery to routes already in `seed.yaml`. On each page snapshot: - - Find all `<a href>` links that resolve to new paths not yet in the manifest. 
- - Find all buttons and interactive components whose purpose suggests navigation to a new screen (e.g. "Create order", "View details", "Go to settings") — click them and record the resulting URL. - - Find tab panels, side-nav items, and wizard steps that expose sub-routes. - - Any new URL discovered this way is a candidate manifest entry; add it and crawl it recursively. -4. Add new surfaces to `manifest.json`; mark removed surfaces as `deprecated`. -5. Update stale selector constants in PageObjects for any locators flagged in step 2. -6. Generate new scenarios for newly discovered ACs (load `bdd-scenario-gen` skill). - ---- - -## HEALING mode - -**Trigger:** Test suite failures due to selector drift, broken step definitions, or PageObject mismatches. - -**Scope:** Failing tests only — do not touch passing tests or unrelated PageObjects. - -1. Receive or discover the list of failing test names / scenario titles. If the request only says tests are failing but does not include the failing list, ask for it before making changes so scope stays limited to the failing scenarios. -2. Trace each failure back to its PageObject and step definition. -3. Navigate to the affected page via MCP Playwright; snapshot the current DOM. -4. Find updated element IDs or selectors; update only the affected PageObject(s) accordingly. -5. Verify the step definition binding still resolves; fix if broken. -6. Re-run only the previously failing tests to confirm healing. Do not re-run the full suite. +Two modes — activate the one that matches the trigger. --- @@ -72,7 +38,7 @@ Three modes — activate the one that matches the trigger. 3. Find PageObjects referenced only by those scenarios; find step definitions used only by those scenarios. 4. Confirm the full deletion list with the user before touching any file. 5. Remove confirmed files; update `manifest.json` to remove the deprecated entry. -6. 
Flag linked US/AC entities in the living documentation as candidates for deprecation — hand off to `@living-doc-copilot`. +6. Flag linked US/AC entities in the living documentation as candidates for deprecation — load `living-doc-update` skill. --- diff --git a/skills/bdd-scenario-gen/SKILL.md b/skills/bdd-scenario-gen/SKILL.md deleted file mode 100644 index be43635..0000000 --- a/skills/bdd-scenario-gen/SKILL.md +++ /dev/null @@ -1,225 +0,0 @@ ---- -name: bdd-scenario-gen -description: > - BDD scenario writing quality and agent-level scenario generation for @living-doc-bdd-copilot. - Covers: writing Gherkin in plain business language, Given/When/Then correctness, - one-behaviour-per-scenario rule, Scenario Outline, Background, anti-patterns, feature file - types (US vs Functionality), @AC: traceability annotations (authoritative format), gap - detection via living-doc-gap-finder, and step definition resolution against PageObjects. - Triggers on: "write a Gherkin scenario", "BDD scenario", "standalone feature file", - "Given When Then", "Scenario Outline", "Cucumber scenario", "behave scenario", - "acceptance test in Gherkin", "should I use Background", "BDD anti-patterns", - "review my feature file", "BDD scenarios for", - "convert acceptance criteria to Gherkin", "# AC: comment", "exploratory scenario". - Does NOT trigger for: implementing step definitions (use gherkin-step), writing unit tests, - designing a test case table, generating the living-doc feature file header block or skeleton - scenarios (use living-doc-scenario-creator). -license: Apache-2.0 -compatibility: GitHub Copilot ---- - -# BDD Scenario Generation - -> **Glossary:** User Story, AC, Feature, PageObject — see [living-doc-glossary](../references/living-doc-glossary.md) ([remote](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md)). 
-> **BDD schemas:** US and Functionality feature file templates — see [living-doc-bdd-schemas](../references/living-doc-bdd-schemas.md) ([remote](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-bdd-schemas.md)). - -Use for: writing or reviewing Gherkin scenarios, generating feature files from ACs, detecting uncovered ACs, resolving step stubs against PageObjects. - ---- - -## Gap Detection (agent mode) - -An AC is considered uncovered if no scenario in any `.feature` file carries the `@AC:<id>` traceability tag. - -1. Use the `living-doc-gap-finder` skill (bottom-up mode) to identify User Stories with `ACTIVE` ACs that have no linked Gherkin scenario. -2. For each gap: generate scenario skeletons — one scenario per `ACTIVE` AC with the mandatory `@AC:` traceability tag. Skip `PLANNED` and `DEPRECATED` ACs. - ---- - -## Feature File Types - -Two categories of `.feature` files exist — they have different locations, headers, and scopes: - -| Type | Location | Feature block | Scope | -|---|---|---|---| -| User Story (E2E) | `features/us/us-<nnn>-<kebab>.feature` | `Feature: <US title>` with As-a/I-can/so-that narrative + `@US_ID:US-<n>` tag | End-to-end, user perspective | -| Functionality (system test) | `features/functionalities/<feat-kebab>/func-<nnn>-<kebab>.feature` | `Feature: <Feature name> — <Functionality name>` + `@FUNC_ID:FUNC-<nnn>` tag | One atomic behavior, input to output | - -For non-living-doc scenarios (exploratory probes, regression suites not tied to a US AC), `@AC:` annotations are not required. Use `@AC:STANDALONE` as an optional placeholder when explicitly signalling that a scenario is intentionally unlinked — `gherkin-living-doc-sync` will note it but not flag it as a traceability gap. - ---- - -## Feature File Conventions - -- File naming: `us-<nnn>-<kebab-title>.feature` under `features/us/`, e.g. `features/us/us-007-place-an-online-order.feature`. 
-- The `Feature:` header must restate the User Story narrative in `As a / I can / so that` form. -- Scenario step text must stay in business/domain language only — never mention selectors, HTTP calls, DOM details, or database operations. - ---- - -## Write in the Ubiquitous Language - -Scenarios must use the language of the business domain. Anyone on the product team must be able to read and verify them without knowing the implementation. - -```gherkin -# ✅ — business language -Given a customer with a gold membership -When they place an order for 2 units of "SKU-100" -Then the order is confirmed and the total is £160.00 - -# ❌ — implementation details -Given the database contains a row in users with tier="gold" -When a POST request is sent to /api/orders with body { "sku": "SKU-100", "qty": 2 } -Then the response status is 201 -``` - ---- - -## Given / When / Then - -| Keyword | Purpose | Rule | -|---|---|---| -| **Given** | System state before the action | Preconditions only — no actions | -| **When** | The action the actor takes | Exactly one meaningful action per scenario | -| **Then** | Observable outcome | Assertions only — no actions | -| **And / But** | Continuation | Never as the first step in a block | - -```gherkin -# ✅ -Given the customer's cart contains 3 items -When the customer applies the promo code "SAVE10" -Then the cart total is reduced by 10% - -# ❌ — multiple When actions (split into separate scenarios) -When the customer applies the promo code "SAVE10" -And the customer proceeds to checkout -And the customer enters payment details -``` - ---- - -## One Behaviour per Scenario - -Each scenario must verify exactly one observable behaviour. If the scenario name contains "and", it likely tests two behaviours — split it. 
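The "and" heuristic in the one-behaviour rule can be approximated mechanically. The following is a minimal sketch, assuming plain-text Gherkin input; `flag_multi_behaviour` and `SCENARIO_RE` are hypothetical names introduced for illustration and are not part of any skill in this patch. It only inspects scenario titles, so it is a hint for review rather than a verdict.

```python
import re
from typing import List

# Matches "Scenario:" and "Scenario Outline:" headers and captures the title.
SCENARIO_RE = re.compile(r"^\s*Scenario(?: Outline)?:\s*(.+)$")


def flag_multi_behaviour(feature_text: str) -> List[str]:
    """Return scenario titles that contain the standalone word "and".

    Hypothetical helper: a title with "and" likely covers two behaviours,
    but the match is only a heuristic and needs a human decision.
    """
    flagged = []
    for line in feature_text.splitlines():
        match = SCENARIO_RE.match(line)
        # \b keeps words such as "Brand" or "standard" from matching.
        if match and re.search(r"\band\b", match.group(1), re.IGNORECASE):
            flagged.append(match.group(1).strip())
    return flagged


sample = """
Feature: Checkout
  Scenario: Customer logs in and places an order
  Scenario Outline: Discount is applied for each membership tier
  Scenario: Brand logo is shown on the receipt
"""
candidates = flag_multi_behaviour(sample)
```

Each flagged title is a candidate for splitting into separate scenarios; titles where "and" joins parts of a single behaviour can be left as they are.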
- ---- - -## Scenario Outline for Data-Driven Variations - -```gherkin -# ✅ -Scenario Outline: Discount is applied correctly for each membership tier - Given a customer with a <tier> membership - When they purchase an item costing £100.00 - Then the total is £<total> - - Examples: - | tier | total | - | gold | 80.00 | - | silver | 90.00 | - | bronze | 95.00 | -``` - -When illustrating discount calculations, show the resulting order total in the `Then` step or `Examples:` table rather than the raw discount percentage. If the prompt does not give an amount, default to £100.00 for comparison tables and £200.00 for single-scenario threshold cases so the discounted outcome is concrete. - ---- - -## Background - -Use `Background` when **every** scenario in the file shares the same precondition. Keep Background to 3 steps or fewer. If only 2–3 scenarios share a precondition, duplicate the `Given` step — prefer clarity over abstraction. Keep `Background` to `Given` preconditions only, not `When` or `Then` steps. - -When answering whether `Background` is appropriate, confirm all three checks: shared-by-every-scenario, 3-steps-or-fewer, and no-subset-sharing. 
- ---- - -## Anti-Patterns - -| Anti-pattern | Problem | Fix | -|---|---|---| -| UI selectors in steps (`I click the "Submit" button`) | Breaks when UI changes | Use domain actions (`the customer submits the order`) | -| Imperative style (`I enter "alice@example.com" in Email field`) | Fragile and verbose | Declarative (`the customer logs in as Alice`) | -| Multiple `When` per scenario | Usually signals multiple behaviours | Prefer splitting; if all steps represent one logical action, collapse into one declarative step | -| Assertions in Given/When | Violates keyword semantics | Move all assertions to `Then` | -| Scenario depends on a previous scenario's state | Hidden ordering dependency | Each scenario must be fully self-contained | - -When reviewing an existing scenario, explicitly check for a missing `@AC:` tag immediately above each `Scenario:` or `Scenario Outline:` and call that out as a traceability defect. - ---- - -## Traceability Annotations - -Living-doc feature files (`features/us/` and `features/functionalities/`) require two complementary annotations above each `Scenario:` or `Scenario Outline:`: - -1. **`# AC:` comment** — human-readable context: AC ID, version, state, description, and optionally the specific aspect this scenario covers. -2. **`@AC:` tag** — machine-readable Cucumber tag consumed by scripts and coverage reports. - -```gherkin -# AC:US-1-01 (v1.0.0 - ACTIVE) — customer places an order with a saved payment method -@AC:US-1-01 -Scenario: Customer successfully places an order - ... -``` - -When a scenario covers only **one aspect** of a multi-aspect AC, encode the aspect as a `/param:value` segment on the tag and mirror it in the comment: - -```gherkin -# AC:US-1-01 (v1.0.0 - ACTIVE) — displays {required field} on login screen | aspect: username input -@AC:US-1-01/aspect:username-input -Scenario: Login form shows the username input field - ... -``` - -The `/param:value` format is extensible. 
Multiple ACs — one comment + tag pair per AC: - -```gherkin -# AC:US-1-01 (v1.0.0 - ACTIVE) — invalid credentials show an error message -# AC:US-1-02 (v1.0.0 - ACTIVE) — account lockout after 3 failed attempts -@AC:US-1-01 -@AC:US-1-02 -@Regression -Scenario: User is locked out after repeated failed logins - ... -``` - -The AC tag prefix matches the parent entity: `@AC:US-<n>-<nn>` for User Story scenarios, `@AC:FUNC-<nnn>-<nn>` for Functionality scenarios. - ---- - -## Step Definition Resolution (agent mode) - -For each generated scenario step: - -a. **Narrow the search scope to the page first** — identify which PageObject the scenario's steps will interact with. Look in step definition files that already import or reference that PageObject; these are the most likely candidates for reuse. - -b. **Match by purpose, not just pattern** — read the step's implementation body to confirm it performs the same business action. Only reuse if purpose matches. - -c. If a purpose-matching step exists, reuse it as-is; note which library file it lives in. - -d. If no reusable step exists but the needed PageObject method already exists, generate a full step stub via `gherkin-step` that delegates directly to that PageObject method. - -e. If neither the step nor the PageObject method exists, generate a stub that raises `NotImplementedError` and flag that the PageObject must be extended with the missing interaction. - -After resolution, update `manifest.json` to record any new PageObject paths created. - ---- - -## Output Format - -Output all generated Gherkin in a single fenced `gherkin` code block starting with `Feature:`. Use only `Scenario:`, `Scenario Outline:`, `Background:`, `Given`, `When`, `Then`, `And`, `But`, and `Examples:` inside the block. 
- ---- - -## Out-of-Scope Routing - -| Request | Use instead | -|---|---| -| Implementing step definitions | **gherkin-step** | -| Writing unit tests | Use your project's unit test framework directly | -| Designing a test case table | Use your project's test design practice | -| Generate a living-doc US entity with AC coverage report | **living-doc-scenario-creator** | - -If asked for step definition code, do not write it here — redirect to **gherkin-step**. If asked for a US entity skeleton with an AC coverage report, redirect to **living-doc-scenario-creator**. - -**Ambiguous request — "create scenarios for US-007":** If the user does not specify whether they want the feature file structure or full scenario bodies, ask: -> "Do you want the living-doc feature file header and skeleton scenario titles (use `living-doc-scenario-creator`), or full Given/When/Then scenario bodies (continue here in `bdd-scenario-gen`)?" -Both skills handle different parts of the same feature file — they are meant to be used in sequence. diff --git a/skills/data-cy-instrument/SKILL.md b/skills/data-cy-instrument/SKILL.md index 1a0176e..f1250ce 100644 --- a/skills/data-cy-instrument/SKILL.md +++ b/skills/data-cy-instrument/SKILL.md @@ -6,15 +6,14 @@ description: > whenever coverage gaps exist in `manifest.json`, when PageObject stubs carry "⚠️ PROPOSED" locator comments, when Functionality entities have `status: planned` due to missing test IDs, or when a dev explicitly asks to instrument templates. 
- Activate at the end of a `living-doc-pageobject-scan`, `bdd-explore`, or `bdd-maintain` - RE-SCAN session when `coverage_gaps` arrays are non-empty; + Activate at the end of a `living-doc-pageobject-scan` session (Create or RE-SCAN scope) + when `coverage_gaps` arrays are non-empty; Triggers on: "add missing data-cy", "instrument templates", "fix data-cy gaps", "add testids", "data-cy audit", "instrument angular templates", "fix locators", "add data-cy attributes", "add test ids to templates", "fix playwright selectors", "data-cy-instrument". - Does NOT trigger for: adding or fixing Gherkin scenarios (use bdd-scenario-gen); generating - or healing PageObjects without instrumentation gaps (use living-doc-pageobject-scan); initial - webapp crawl (use bdd-explore). + Does NOT trigger for: adding or fixing Gherkin scenarios (use living-doc-scenario-creator); generating + or healing PageObjects without instrumentation gaps (use living-doc-pageobject-scan). license: Apache-2.0 compatibility: GitHub Copilot --- @@ -242,13 +241,13 @@ Report the following at the end of the run: | Skill | Relationship | |---|---| -| `bdd-explore` | Upstream — produces `manifest.json` with `coverage_gaps`. This skill consumes that output. | -| `bdd-maintain` RE-SCAN | Upstream — re-generates `coverage_gaps` after a UI change. Trigger this skill after RE-SCAN if new gaps appear. | -| `bdd-scenario-gen` | Downstream — after Functionalities are promoted from `planned` to `active`, generate Gherkin scenarios for them. | +| `living-doc-pageobject-scan` | Upstream — produces `manifest.json` with `coverage_gaps`. This skill consumes that output. | +| `living-doc-pageobject-scan` RE-SCAN scope | Upstream — re-generates `coverage_gaps` after a UI change. Trigger this skill after RE-SCAN if new gaps appear. | +| `living-doc-scenario-creator` | Downstream — after Functionalities are promoted from `planned` to `active`, generate Gherkin scenarios for them. 
| | `living-doc-update` | Downstream — if PageObject header `status` changes, the corresponding Feature entity in the living doc may also need a status update. | **Pipeline position:** ``` -bdd-explore (scan) → data-cy-instrument → bdd-scenario-gen -bdd-maintain RE-SCAN → data-cy-instrument → bdd-scenario-gen +living-doc-pageobject-scan → data-cy-instrument → living-doc-scenario-creator +living-doc-pageobject-scan (RE-SCAN) → data-cy-instrument → living-doc-scenario-creator ``` diff --git a/skills/gherkin-living-doc-sync/SKILL.md b/skills/gherkin-living-doc-sync/SKILL.md index d9a718c..ded348a 100644 --- a/skills/gherkin-living-doc-sync/SKILL.md +++ b/skills/gherkin-living-doc-sync/SKILL.md @@ -10,7 +10,7 @@ description: > Triggers on: "sync gherkin to living doc", "feature file out of sync", "scenario not linked to AC", "step text changed", "gherkin drift", "BDD sync", "AC link missing in feature file", "sync scenarios", "traceability broken", "propagate AC changes", "AC was descoped". - Does NOT trigger for: writing new scenarios (use bdd-scenario-gen), implementing step + Does NOT trigger for: writing new scenarios (use living-doc-scenario-creator), implementing step definitions (use gherkin-step), finding living doc gaps (use living-doc-gap-finder), creating new US/Feature entities (use living-doc-create-user-story). license: Apache-2.0 @@ -33,7 +33,7 @@ and `features/functionalities/`) — other feature files are skipped. 
## Step 1 — Detect the sync direction -**Upstream dependencies:** Directions that flow from the living documentation into feature files are initiated by catalog-layer operations from `@living-doc-copilot`: +**Upstream dependencies:** Directions that flow from the living documentation into feature files are initiated by catalog-layer operations from `@living-doc-bdd-copilot`: - `living-doc-update` modified, added, or deprecated an AC → triggers directions 2 and 4 below - `living-doc-impact-analysis` identified High-impact AC changes that require resync → may trigger directions 2 and 3 @@ -168,7 +168,7 @@ Summary: 2 missing AC links, 1 step text drift detected — apply changes? (y/n | Scenario with no `@AC:` tag | Missing traceability — add tag or create AC | | Two scenarios linked to the same AC | Usually a duplicate — review | | AC linked from a scenario in a different User Story's feature file | Passive cross-US coverage — permitted but note it in the sync report. Only flag if the scenario's primary intent belongs to a different User Story (misplaced scenario) | -| Step text describes implementation (selector, endpoint) | Gherkin business-language violation — refer to `bdd-scenario-gen` | +| Step text describes implementation (selector, endpoint) | Gherkin business-language violation — refer to `living-doc-scenario-creator` | --- @@ -176,7 +176,7 @@ Summary: 2 missing AC links, 1 step text drift detected — apply changes? 
(y/n | Request | Use instead | |---|---| -| Writing new Gherkin scenarios from scratch | `bdd-scenario-gen` | +| Writing new Gherkin scenarios from scratch | `living-doc-scenario-creator` | | Implementing step definition code | `gherkin-step` | | Finding ACs with no scenario coverage | `living-doc-gap-finder` | | Creating new User Story, Feature, or Functionality entities | `living-doc-create-user-story` / `living-doc-create-functionality` | diff --git a/skills/gherkin-living-doc-sync/evals/evals.json b/skills/gherkin-living-doc-sync/evals/evals.json index 17b8ae9..0cc1f82 100644 --- a/skills/gherkin-living-doc-sync/evals/evals.json +++ b/skills/gherkin-living-doc-sync/evals/evals.json @@ -60,11 +60,11 @@ "id": 5, "category": "negative", "prompt": "I need a new Gherkin scenario for the case where a promo code has expired.", - "expected_output": "Writing new scenarios is out of scope for this skill — routes to bdd-scenario-gen. gherkin-living-doc-sync corrects existing links and syncs existing scenarios; it does not write new scenarios from scratch.", + "expected_output": "Writing new scenarios is out of scope for this skill — routes to living-doc-scenario-creator. gherkin-living-doc-sync corrects existing links and syncs existing scenarios; it does not write new scenarios from scratch.", "files": [], "expectations": [ "Does not write a new scenario", - "Routes to bdd-scenario-gen", + "Routes to living-doc-scenario-creator", "Explains the distinction: sync vs. write new" ] }, diff --git a/skills/gherkin-living-doc-sync/evals/fixture-map.md b/skills/gherkin-living-doc-sync/evals/fixture-map.md index 4e36e15..bb7c892 100644 --- a/skills/gherkin-living-doc-sync/evals/fixture-map.md +++ b/skills/gherkin-living-doc-sync/evals/fixture-map.md @@ -12,7 +12,7 @@ No fixture files for this skill. 
All evals are conversational — the skill oper | 2 | happy-path | _(none)_ | AC description updated in living doc → propagate to # AC: comment in feature file | | 3 | happy-path | _(none)_ | Step text drift after UI rename → DRIFT DETECTED block with two fix options | | 4 | regression | _(none)_ | US deprecated in living doc → @deprecated + @review-needed tags on linked scenarios | -| 5 | negative | _(none)_ | Routing: new scenario authoring → bdd-scenario-gen | +| 5 | negative | _(none)_ | Routing: new scenario authoring → living-doc-scenario-creator | | 6 | paraphrase | _(none)_ | "Feature files are a mess after redesign" → prioritised repair plan: steps first, then links | | 7 | edge-case | _(none)_ | Broken AC reference (US-099 not in catalog) → resolution options, never remove the link | | 8 | output-format | _(none)_ | Sync run output format: SYNC ACTION + DRIFT DETECTED blocks + summary line | @@ -27,7 +27,7 @@ No fixture files for this skill. All evals are conversational — the skill oper | Routes to | Query count | |---|---| -| bdd-scenario-gen | 2 | +| living-doc-scenario-creator | 2 | | gherkin-step | 1 | | living-doc-gap-finder | 1 | | living-doc-create-user-story | 1 | diff --git a/skills/gherkin-living-doc-sync/evals/trigger-eval.json b/skills/gherkin-living-doc-sync/evals/trigger-eval.json index 14956d3..f0a2faf 100644 --- a/skills/gherkin-living-doc-sync/evals/trigger-eval.json +++ b/skills/gherkin-living-doc-sync/evals/trigger-eval.json @@ -10,13 +10,13 @@ {"id": 9, "query": "Sync all scenarios in the payments feature file", "should_trigger": true, "reason": "'sync scenarios' trigger phrase"}, {"id": 10, "query": "The Gherkin scenarios are out of sync with the living doc", "should_trigger": true, "reason": "'gherkin out of sync with living doc' trigger phrase"}, {"id": 11, "query": "Traceability is broken between the feature files and the AC catalog", "should_trigger": true, "reason": "'traceability broken' trigger phrase"}, - {"id": 12, "query": 
"Write a new scenario for the expired promo AC", "should_trigger": false, "reason": "Writing new scenarios — routes to bdd-scenario-gen"}, + {"id": 12, "query": "Write a new scenario for the expired promo AC", "should_trigger": false, "reason": "Writing new scenarios — routes to living-doc-scenario-creator"}, {"id": 13, "query": "Implement the step definition for 'When the customer confirms the order'", "should_trigger": false, "reason": "Step definition implementation — routes to gherkin-step"}, {"id": 14, "query": "Find which User Stories have no Gherkin scenarios", "should_trigger": false, "reason": "Finding living doc gaps — routes to living-doc-gap-finder"}, {"id": 15, "query": "Create a new User Story for the checkout capability", "should_trigger": false, "reason": "Creating new entities — routes to living-doc-create-user-story"}, {"id": 16, "query": "Propagate AC changes from the living doc back to the feature files", "should_trigger": true, "reason": "'propagate AC changes' trigger phrase"}, {"id": 17, "query": "The @AC: tag and the # AC: comment are out of sync — what do I do?", "should_trigger": true, "reason": "Comment/tag mismatch is a sync issue — core task of this skill"}, - {"id": 18, "query": "Generate a new scenario for the expired promo AC from scratch", "should_trigger": false, "reason": "Writing new scenarios from scratch — routes to bdd-scenario-gen (not syncing existing ones)"}, + {"id": 18, "query": "Generate a new scenario for the expired promo AC from scratch", "should_trigger": false, "reason": "Writing new scenarios from scratch — routes to living-doc-scenario-creator (not syncing existing ones)"}, {"id": 19, "query": "Run scan_ac_links.py before doing a sync pass", "should_trigger": true, "reason": "Auditing AC link headers is the first step of the sync workflow — this skill owns scan_ac_links.py"}, {"id": 20, "query": "An AC was descoped last sprint — what should happen to the linked scenario?", "should_trigger": true, "reason": 
"Propagating AC status change (descoped) to feature file is a living-doc → feature file sync direction"} ] diff --git a/skills/gherkin-step/SKILL.md b/skills/gherkin-step/SKILL.md index 0e0af9a..931c5ef 100644 --- a/skills/gherkin-step/SKILL.md +++ b/skills/gherkin-step/SKILL.md @@ -10,10 +10,10 @@ description: > "parameter type", "DataTable", "DocString", "Before hook", "After hook", "World object", "step context", "step state sharing", "how to share state between steps", "register step definition", "hook setup". - Does NOT trigger for: writing Gherkin scenarios (use bdd-scenario-gen), writing unit tests + Does NOT trigger for: writing Gherkin scenarios (use living-doc-scenario-creator), writing unit tests (no skill in this toolkit covers unit test authoring — use your project's test framework directly). - Pairs with bdd-scenario-gen. + Pairs with living-doc-scenario-creator. license: Apache-2.0 compatibility: GitHub Copilot --- @@ -26,7 +26,7 @@ compatibility: GitHub Copilot If the user asks to write or review a **Gherkin scenario / feature file**, do not draft the scenario here. Explain that this skill covers **step definition code** only, then route the user to -`bdd-scenario-gen` for the Gherkin text itself. +`living-doc-scenario-creator` for the Gherkin text itself. --- diff --git a/skills/gherkin-step/evals/evals.json b/skills/gherkin-step/evals/evals.json index ce6c150..0d5080a 100644 --- a/skills/gherkin-step/evals/evals.json +++ b/skills/gherkin-step/evals/evals.json @@ -57,11 +57,11 @@ "id": 5, "category": "negative", "prompt": "Write a Gherkin scenario for when the promo code is expired.", - "expected_output": "Writing Gherkin scenarios is out of scope for this skill - routes to bdd-scenario-gen. gherkin-step handles step definition code; bdd-scenario-gen handles Gherkin text.", + "expected_output": "Writing Gherkin scenarios is out of scope for this skill - routes to living-doc-scenario-creator. 
gherkin-step handles step definition code; living-doc-scenario-creator handles Gherkin text.", "files": [], "expectations": [ "Does not write a Gherkin scenario", - "Routes to bdd-scenario-gen", + "Routes to living-doc-scenario-creator", "Explains the distinction: step binding code vs. Gherkin text" ] }, diff --git a/skills/gherkin-step/evals/trigger-eval.json b/skills/gherkin-step/evals/trigger-eval.json index 2a89397..4217756 100644 --- a/skills/gherkin-step/evals/trigger-eval.json +++ b/skills/gherkin-step/evals/trigger-eval.json @@ -93,7 +93,7 @@ "id": 16, "query": "Write a Gherkin scenario for the promo code feature", "should_trigger": false, - "reason": "Writing Gherkin scenarios — routes to bdd-scenario-gen" + "reason": "Writing Gherkin scenarios — routes to living-doc-scenario-creator" }, { "id": 17, diff --git a/skills/living-doc-create-functionality/SKILL.md b/skills/living-doc-create-functionality/SKILL.md index 2f07913..6012088 100644 --- a/skills/living-doc-create-functionality/SKILL.md +++ b/skills/living-doc-create-functionality/SKILL.md @@ -13,7 +13,7 @@ description: > "test_type", "unit vs integration test", "choose test type", "link functionality to feature". Does NOT trigger for: end-to-end User Stories (use living-doc-create-user-story), system surface documentation (use living-doc-create-feature), generating BDD scenarios for a - Functionality (use bdd-scenario-gen). + Functionality (use living-doc-scenario-creator). license: Apache-2.0 compatibility: GitHub Copilot --- @@ -163,4 +163,4 @@ redirect to `living-doc-create-user-story`. | "Create a User Story" | `living-doc-create-user-story` — this skill documents atomic behaviors, not end-to-end User Stories | | "Create a Feature entity" | `living-doc-create-feature` — a Feature is a system surface, not an atomic behavior | | "Write unit tests for this Functionality" | No skill in this toolkit covers unit test authoring — use your project's test framework directly. 
This skill defines the _what_ (ACs); writing the test code is outside scope. | -| "Generate BDD scenarios for this Functionality" | `bdd-scenario-gen` (step bodies) via `living-doc-scenario-creator` (feature file skeleton) | +| "Generate BDD scenarios for this Functionality" | `living-doc-scenario-creator` | diff --git a/skills/living-doc-gap-finder/SKILL.md b/skills/living-doc-gap-finder/SKILL.md index a2d68d3..f028664 100644 --- a/skills/living-doc-gap-finder/SKILL.md +++ b/skills/living-doc-gap-finder/SKILL.md @@ -12,8 +12,7 @@ description: > "documentation coverage", "gap report", "what's not covered", "living doc audit", "documentation audit". Does NOT trigger for: creating new living doc objects (use living-doc-create-* skills). - Delegates to: living-doc-pageobject-scan, living-doc-scenario-creator, bdd-scenario-gen, - and all create-* skills. + Delegates to: living-doc-pageobject-scan, living-doc-scenario-creator, and all create-* skills. license: Apache-2.0 compatibility: GitHub Copilot --- diff --git a/skills/living-doc-pageobject-scan/SKILL.md b/skills/living-doc-pageobject-scan/SKILL.md index 1830e93..5bdfe62 100644 --- a/skills/living-doc-pageobject-scan/SKILL.md +++ b/skills/living-doc-pageobject-scan/SKILL.md @@ -1,36 +1,32 @@ --- name: living-doc-pageobject-scan description: > - Discover, create, and maintain PageObject classes — standalone bottom-up entry point for - BDD-driven UI testing. Use when generating PageObjects from a live webapp URL or test - directory, updating PageObjects after UI changes, bootstrapping a test suite for a new - screen, generating Functionality stubs from discovered UI elements, or detecting PageObject - drift. 
- Triggers on: "scan this webapp", "generate pageobjects", "update pageobjects", - "pageobject for this screen", "crawl the UI", "discover UI elements", "create page objects", - "scan test suite for pageobjects", "living doc bottom-up", "bootstrap page objects", - "pageobject drift", "sync pageobjects", "update manifest", "functionality stubs from UI". - Does NOT trigger for: creating User Stories (use living-doc-create-user-story), writing - scenarios (use living-doc-scenario-creator), agent crawl in @living-doc-bdd-copilot - (use bdd-explore), agent maintenance after UI changes (use bdd-maintain). - Pairs with living-doc-create-functionality and living-doc-gap-finder. After a scan - that produces non-empty coverage_gaps, continue with data-cy-instrument to resolve - missing data-cy attributes before generating scenarios. + Discover, create, and maintain PageObject classes — entry point for all webapp exploration + and BDD-driven UI testing. Covers seed.yaml assembly (Sources A–E), iterative MCP Playwright + crawl, entity harvesting, ExplorationFixture sourcing, PageObject generation, Functionality + stubs, and manifest.json output. Two Maintain scopes: RE-SCAN (full manifest refresh after + UI changes, with active new-route discovery) and HEALING (fix selector drift in failing + tests only). Use for first-time scans, re-scanning after UI changes, or healing failing tests. + Triggers on: "scan this webapp", "generate pageobjects", "update pageobjects", "crawl the UI", + "explore the app", "discover routes", "seed.yaml", "manifest.json", "first scan", + "create page objects", "pageobject drift", "bootstrap page objects", "re-scan", + "refresh manifest", "heal pageobjects", "fix failing tests", "selector drift", + "tests are failing". + Does NOT trigger for: adding/fixing Gherkin (use living-doc-scenario-creator); resolving + missing data-cy attributes (use data-cy-instrument); deleting deprecated BDD files + (use bdd-maintain). 
license: Apache-2.0 compatibility: GitHub Copilot --- -# Living Doc — PageObject Scan +# Living Doc — PageObject Scan & Webapp Exploration > **Glossary:** Feature, PageObject, Functionality — see [living-doc-glossary](../references/living-doc-glossary.md) ([remote](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md)). -> **BDD schemas:** PageObject file header (required fields, cross-reference format, operational notes, common mistakes) — see [living-doc-bdd-schemas](../references/living-doc-bdd-schemas.md) ([remote](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-bdd-schemas.md)). +> **BDD schemas:** ExplorationFixture, seed.yaml, manifest field_constraints, PageObject file header — see [living-doc-bdd-schemas](../references/living-doc-bdd-schemas.md) ([remote](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-bdd-schemas.md)). -**Scope:** This skill generates PageObjects only for `UI` Features (web pages, modals, screens). -API Features use annotated endpoint methods as their living contract anchor — not PageObjects. +**Scope:** UI Features only (web pages, modals, screens). API Features use annotated endpoint methods — not PageObjects. -**Selector preference (from glossary):** `data-testid` > `aria-label`/role > CSS class. -Flag any element that only has positional CSS selectors (`nth-child`, `first-of-type`) as fragile -and recommend the development team add a `data-testid` attribute. +**Selector preference:** `data-testid` > `aria-label`/role > CSS class. Flag positional selectors (`nth-child`, `first-of-type`) as `FRAGILE`. --- @@ -38,301 +34,278 @@ and recommend the development team add a `data-testid` attribute. 
| Mode | Input | Use when | |---|---|---| -| **Create** (initial scan) | App URL or test suite root | No PageObjects exist yet — bootstrapping from scratch | -| **Maintain** (rescan/update) | Existing PageObject files + current app | UI has changed; detect drift and update | +| **Create** (initial scan) | App URL or test suite root | No PageObjects exist — bootstrapping or first session on a new app | +| **Maintain — RE-SCAN** | Existing PageObject files + current app | UI refactored, new feature shipped, or significant route changes — full manifest refresh | +| **Maintain — HEALING** | Failing test names / scenario titles | Test suite failures due to selector drift — failing tests only, do not touch passing tests | --- -## Create mode — initial scan +## Create mode -### Inputs +### Step 0 — Business Seed assembly -- `url`: root URL of the web application (authenticated access if needed) -- `dir`: path to an existing test suite with step files or PageObject skeletons +Before crawling, locate or create `seed.yaml` and `manifest.json`: -### Workflow +1. Search for `seed.yaml` containing `base_url:`; search for `manifest.json` containing `pageobject_path` entries. +2. If found, load both and resume — all manifest entries are already discovered. +3. If not found, create at the living-doc directory or `.copilot/bdd/`. +4. On first discovery, propose adding both paths to `.github/copilot-instructions.md`: -**1. Crawl the web application** +```markdown +## BDD Artifacts +- **Business Seed:** `<path>/seed.yaml` +- **Exploration Manifest:** `<path>/manifest.json` +``` + +Collect seed content from whichever sources are available: + +| Source | Behaviour | +|---|---| +| **A — Living documentation** | Extract Feature names, US titles, AC texts, and primary routes. | +| **B — Sitemap / route config** | Parse Angular router, React Router, or `sitemap.xml` for URL paths. | +| **C — OpenAPI / Swagger** | Extract endpoint paths; map REST resources to UI screens where obvious. 
| +| **D — Existing PageObjects** | Load current `manifest.json` — treat known surfaces as already discovered. | +| **E — Guided traversal** | See [Guided Traversal Protocol](#guided-traversal-protocol-source-e) below. | + +**Credential rule:** Never store literals in `seed.yaml`. Always use `env:VAR_NAME`: + +```yaml +base_url: https://... +credentials: + username: env:BDD_USERNAME + password: env:BDD_PASSWORD +known_routes: + - path: /login + feature: Authentication +guided_steps: [] +form_fixtures: {} +``` + +**Partial state rule:** `seed.yaml` present, `manifest.json` absent = first run. Begin crawl from `base_url`; do not assume any surfaces are discovered. + +### Step 1 — Crawl + +Navigate each route in `seed.yaml` via MCP Playwright. Snapshot DOM; identify interactive elements, forms, navigation links, significant UI surfaces. Follow links to find new routes not yet in manifest. + +**Entity harvesting:** when a domain ID, version, feed ID, or other parameterised value is read from the DOM, record it under `known_entities` in `seed.yaml` (fields: `id`, `version`, `name`, `status`, `owner`, `note`). Use before prompting the user for parameterised route values. + +**Parameterised routes:** check `seed.yaml known_entities` for a match owned by the current test user before navigating `/path/{id}/{version}`. Only fall back to user-assist pause if none exists. -Traverse all reachable routes from the root URL: -- Enumerate all distinct routes (paths and query patterns) -- On each route: capture the rendered DOM -- For SPAs: trigger navigation events to reach client-side routes +**Dismiss rule:** close any modal/overlay (Cancel → × → Escape) before moving to the next route. -**Handling authenticated routes:** +Repeat until coverage plateau — no new surfaces in the last full iteration. 
+ +### Step 2 — Auth handling | Auth type | Strategy | |---|---| -| Cookie/session | Log in once via Playwright `storageState` and reuse across routes | -| OAuth / OIDC | Inject a pre-issued test token via `localStorage` or `Authorization` header | -| MFA-protected | Use a dedicated test account with MFA disabled, or a TOTP library with a known seed | -| Multi-step wizard | Parse existing step definitions to reconstruct the navigation sequence | - -**2. Discover elements per screen** - -For each distinct screen/route, extract: -- Interactive elements: buttons, links, form inputs, dropdowns, checkboxes -- Display elements: tables, lists, notifications, modals -- Page-level: title, heading (h1), primary URL pattern - -**3. Form traversal (deep exploration)** - -For each form, wizard, or dialog discovered on the route, attempt to fill and progress: - -a. **Resolve field values** using the sourcing cascade (see [living-doc-bdd-schemas — ExplorationFixture](../references/living-doc-bdd-schemas.md#explorationfixture)): - 1. Check `seed.yaml form_fixtures` for a pre-declared value for this route + field. - 2. If absent: navigate to the entity list for this surface type; read an actual field value - from an existing entity. Replay it as `copyable`, or append a suffix (e.g. `-copy`) to - name fields to avoid duplicate rejection (`derived`). - 3. If no existing entities: infer a `fake` value from label + placeholder + tooltip + - adjacent validation hint text. -4. If a `real-world` field has no resolvable value: user-assist pause → ask user → record - to `form_fixtures` with `source: user_provided`, then continue. - -b. **Fill and progress** — fill all resolved fields; click Submit or Next. Scan the resulting - page or confirmation state. Record the fill sequence as `navigation_steps` and the required - values as `data_requirements` in the manifest `navigation_context`. - -c. 
**Probe validation behaviour** — after a successful fill-and-submit, return to the form and - probe each text input: - - **Special characters** (`<>'"&\`) — observe inline error, silent strip, or truncation. - - **Oversized input** (200+ random characters) — observe character counter, truncation at - max length, or rejection message. - - **Wrong type** (alphabetic text in a numeric or date field) — observe inline validation - message. - - **Duplicate detection** (value identical to a known existing entity name) — observe - duplicate-rejection error and capture its `data-cy`. - -d. **Scan validation state** — after each probe, run the core scan and elements-without-data-cy - scripts to capture `data-cy` error messages, character counters, and validation banners that - are only visible during invalid input. These become source material for `field_validation` - Functionality stubs. - -e. **Record findings** in the manifest `navigation_context.field_constraints` for this route: - `{ field_data_cy, max_length, special_chars, duplicate, duplicate_error_data_cy, real_world_required }` - -**4. Generate PageObject skeleton** - -One PageObject class per distinct screen. Naming: `<ScreenName>Page`. +| Cookie/session | Log in once via Playwright `storageState`, reuse across routes. | +| OAuth/OIDC | Inject pre-issued test token via `localStorage` or `Authorization` header. | +| MFA-protected | Use test account with MFA disabled, or TOTP library with known seed. | +| Multi-step wizard | Parse existing step definitions to reconstruct navigation sequence. 
| -```python -# ✅ Generated skeleton — Python / Playwright -# living-doc: FEAT-003 | /checkout -class CheckoutPage: - ROUTE = '/checkout' - ORDER_SUMMARY = '[data-testid="order-summary"]' - CONFIRM_BUTTON = '[data-testid="confirm-order-btn"]' - PROMO_INPUT = '[data-testid="promo-code-input"]' - ERROR_BANNER = '[data-testid="error-banner"]' +### Step 3 — Form traversal (deep exploration) - def __init__(self, page: Page, base_url: str = '') -> None: - self.page = page - self.base_url = base_url +Resolve field values using the **ExplorationFixture sourcing cascade** (see [living-doc-bdd-schemas](../references/living-doc-bdd-schemas.md#explorationfixture)): - def open(self) -> 'CheckoutPage': - self.page.goto(f'{self.base_url}{self.ROUTE}') - self.wait_until_loaded() - return self +1. `seed.yaml form_fixtures` pre-declared value for this route + field. +2. Value copied from an existing entity (`copyable`), or suffixed to avoid duplicate rejection (`derived`). +3. Inferred `fake` value from label + placeholder + tooltip. +4. User-assist pause for `real-world` fields — record to `form_fixtures` as `source: user_provided`. - def wait_until_loaded(self) -> None: - expect(self.page.locator(self.ORDER_SUMMARY)).to_be_visible() +Skip `condition`-gated fields until the controlling field holds the required value. - def enter_promo_code(self, code: str) -> None: - self.page.fill(self.PROMO_INPUT, code) +After successful submit, probe each text input: special characters (`<>'"&\`), oversized input (200+ chars), wrong type, duplicate value. Run core scan after each probe to capture `data-cy` error elements visible only in error state. Record in `navigation_context.field_constraints`. 
- def confirm_order(self) -> None: - self.page.click(self.CONFIRM_BUTTON) +#### Angular CPS component interactions - def assert_error_visible(self, message: str) -> None: - expect(self.page.locator(self.ERROR_BANNER)).to_contain_text(message) -``` +> **Angular-specific.** For React/Vue, adapt component resolution; all other steps apply unchanged. + +| Component | Correct interaction | +|---|---| +| `cps-radio-group` | `browser_click` inner `<label>` or `<span>` matching option text. Do NOT use `fill()`. | +| `cps-select` | `browser_click` to open portal, then `browser_click` `<li>` by text. | +| `cps-autocomplete` | `browser_type` into inner `<input>`, wait for dropdown, `browser_click` option. | +| `cps-switch` / `cps-checkbox` | `browser_click` the wrapper. | +| `app-text-editor` (rich text) | `browser_click` `contenteditable` child, then `browser_type`. | +| `cps-button` | `browser_click` inner `<button>` (or `evaluate`: `el.querySelector('button').click()`). | +| `input[type=file]` | `mcp_browser_file_upload` or `page.setInputFiles()` with fixture path from `seed.yaml`. | + +After interacting with a required field (e.g. `cps-radio-group`), re-check whether gated buttons (Continue, Save) have become enabled. + +### Step 4 — Generate PageObject skeleton + +One class per distinct screen. Naming: `<ScreenName>Page`. 
```typescript -// ✅ Generated skeleton — TypeScript / Playwright // living-doc: FEAT-003 | /checkout import { type Page, type Locator, expect } from '@playwright/test'; export class CheckoutPage { - readonly orderSummary: Locator; readonly confirmButton: Locator; readonly promoInput: Locator; readonly errorBanner: Locator; constructor(readonly page: Page) { - this.orderSummary = page.getByTestId('order-summary'); this.confirmButton = page.getByTestId('confirm-order-btn'); this.promoInput = page.getByTestId('promo-code-input'); this.errorBanner = page.getByTestId('error-banner'); } - async enterPromoCode(code: string): Promise<void> { - await this.promoInput.fill(code); - } - - async confirmOrder(): Promise<void> { - await this.confirmButton.click(); - } - - async assertErrorVisible(message: string): Promise<void> { - await expect(this.errorBanner).toContainText(message); - } + async confirmOrder(): Promise<void> { await this.confirmButton.click(); } + async enterPromoCode(code: string): Promise<void> { await this.promoInput.fill(code); } + async assertErrorVisible(msg: string): Promise<void> { await expect(this.errorBanner).toContainText(msg); } } ``` -The Living Doc Feature link (`FEAT-<nnn>`) is recorded in a file-level header comment (see -examples above) — not in the class docstring. The multi-field header format for PageObject -files is defined in [living-doc-bdd-schemas — PageObject File Header](../references/living-doc-bdd-schemas.md#pageobject-file-header). +```python +# living-doc: FEAT-003 | /checkout +class CheckoutPage: + ROUTE = '/checkout' + CONFIRM_BUTTON = '[data-testid="confirm-order-btn"]' + PROMO_INPUT = '[data-testid="promo-code-input"]' + ERROR_BANNER = '[data-testid="error-banner"]' -Flag fragile selectors: + def __init__(self, page, base_url=''): + self.page = page -> "Element `<description>` has a positional CSS selector. Please add: -> `data-testid='<descriptive-kebab-name>'` — e.g. 
`data-testid='confirm-order-btn'`" + def confirm_order(self): self.page.click(self.CONFIRM_BUTTON) + def enter_promo_code(self, code): self.page.fill(self.PROMO_INPUT, code) + def assert_error_visible(self, msg): expect(self.page.locator(self.ERROR_BANNER)).to_contain_text(msg) +``` -Still include the current selector in the generated PageObject so test authoring is not blocked, but -annotate that selector constant with a `FRAGILE` comment and repeat the warning in the scan / breaking -change report. +Flag fragile selectors: annotate `# FRAGILE`, recommend `data-testid='<descriptive-name>'`. Keep the current selector so authoring is not blocked. -**5. Map PageObjects to Feature entities** +### Step 5 — Map PageObjects to Feature entities -One PageObject ≈ one `UI` Feature. Write the Feature ID as a header comment in the generated PageObject file (the `// living-doc: FEAT-<nnn> | <route>` line shown in the templates above). Also record `feature_id` in the manifest entry for the route. +One PageObject ≈ one `UI` Feature. Write `// living-doc: FEAT-<nnn> | <route>` as a file-level header and record `feature_id` in the manifest. -- If a matching Feature (`FEAT-<nnn>`) exists in the living documentation: add the header comment and manifest entry. -- If no Feature exists: write `// living-doc: FEAT-UNKNOWN | <route>` as a placeholder and flag the route in the scan report as **"needs Feature entity"**. Do not auto-create a Feature file — raise it for the team to create via `living-doc-create-feature`. +- Feature exists → add header and manifest entry. +- No Feature → write `FEAT-UNKNOWN` placeholder, flag as **"needs Feature entity"** in the scan report. Do not auto-create — raise via `living-doc-create-feature`. -**6. 
Generate Functionality stubs from discovered behaviors** +### Step 6 — Generate Functionality stubs -For each **behavior** identified on the screen — an interaction pattern, business operation, or -component capability — propose a Functionality stub (`FUNC-<nnn>`) with a name following -the glossary pattern `<Feature name> – <behavior phrase>`: +For each discovered behavior, propose a stub named `<Feature name> – <behavior phrase>`: -- Button: `"Checkout Page – Confirm Order"` -- Form: `"Login Page – Submit Credentials"` -- Table: `"Order History Page – Display Order List"` +- Button → `"Checkout Page – Confirm Order"` +- Form → `"Login Page – Submit Credentials"` +- Table → `"Order History Page – Display Order List"` -Note: a Functionality represents a business behavior, not an individual UI element. One interactive -element may map to one Functionality, or a group of elements may represent a single behavior. -The team decides the appropriate granularity when promoting stubs. +Output to `features/functionalities/<feat-kebab>/func-<kebab>.feature` with `@FUNC_ID:FUNC-UNKNOWN`. Promote via `living-doc-create-functionality` when IDs are assigned. -Output Functionality feature file stubs to `features/functionalities/<feat-kebab>/func-<kebab>.feature` -with `@FUNC_ID:FUNC-UNKNOWN` placeholder tags for team review. When the Functionality is confirmed -and an ID is assigned, use `living-doc-create-functionality` to populate the canonical entity file. +--- -**Dynamic list elements:** +## Guided Traversal Protocol (Source E) -```python -# ✅ — dynamic lists: use locator methods, not positional selectors -def get_cart_items(self): - return self.page.locator('[data-testid="cart-item"]').all() +Use when automated crawling cannot proceed — multi-step wizards, auth flows, role-gated screens, forms with missing business knowledge. -def get_cart_item_by_sku(self, sku: str): - return self.page.locator(f'[data-testid="cart-item"][data-sku="{sku}"]') -``` +1. 
Screenshot; show user what the agent sees. +2. Ask: *"I've reached a decision point at [URL]. What should I do next?"* +3. Execute the action via MCP Playwright. +4. Append to `seed.yaml guided_steps`: ---- +```yaml +guided_steps: + - url: /checkout/payment + action: fill + field: card-number + value: env:TEST_CARD_NUMBER + note: "Test Visa card" +``` -## Maintain mode — rescan and update +5. Continue crawl. -**0. Load manifest and prioritise routes** +**CAPTCHA:** pause, ask user to solve manually, continue after confirmation, record `action: captcha_solved`. -Read `.copilot/bdd/manifest.json`. Sort routes by `last_scanned` ascending (oldest first). For focused healing (triggered by failing tests or a PR), filter to the routes linked to the failing test files or the changed UI paths provided by the caller. +--- -**1. Diff existing PageObjects against current DOM** +## Maintain mode -For each route to scan, navigate using `navigation_context.navigation_steps` if present — this avoids rediscovering hard-to-reach routes. For each selector in the existing PageObject, check if it still resolves: -- **Present and unchanged**: no action -- **Present but changed**: update selector; log as `UPDATED`; if the replacement selector is evident - (for example a renamed `data-testid`), report the exact new selector in the action required line -- **Missing**: flag as `BREAKING CHANGE` — linked test steps may fail +Two scopes — activate the one that matches the trigger. -**2. Detect new elements**: propose additions. +| Scope | Trigger | Breadth | +|---|---|---| +| **RE-SCAN** | New feature shipped, UI refactored, or significant route changes | Full manifest — all routes re-visited, new routes actively discovered | +| **HEALING** | Test suite failures due to selector drift or PageObject mismatch | Failing tests only — do not touch passing tests or unrelated PageObjects | -**3. Update PageObject files** — modify selector constants only. 
Preserve existing action and -assertion method logic. Never auto-delete methods — flag removals for developer review. For missing -selectors, keep the selector constant and annotate it with a `BREAKING` comment so developers can -review whether the element was removed or renamed. +--- -**4. Breaking change report** +### RE-SCAN scope -Write results to `.copilot/bdd/breaking-changes.md`. The file has a fixed structure and is overwritten on each scan: +#### Step 0 — Load and prioritise -```markdown -# Breaking Changes Report +Read `manifest.json`. Sort by `last_scanned` ascending. -Generated: <ISO timestamp> -Scan scope: <full | healing | scoped> +```bash +python scripts/manifest_diff.py --manifest .copilot/bdd/manifest.json --pages-dir tests/pages +python scripts/manifest_diff.py --manifest .copilot/bdd/manifest.json --pages-dir tests/pages --diff +``` -## <route-path> +#### Steps 1–3 — Diff, detect, update -| Selector | Status | Linked test | Action | -|---|---|---|---| -| `PageObject.locatorName` | REMOVED | `feature-file.feature:<line>` | Verify if element was removed or renamed | -| `PageObject.otherLocator` | CHANGED | — | Update selector constant | +Navigate each route using `navigation_context.navigation_steps` if present. For each selector: -## Routes needing a Feature entity +| State | Action | +|---|---| +| Present, unchanged | No action | +| Present, changed | Update constant; log `UPDATED` with new value | +| Missing | Flag `BREAKING CHANGE`; annotate constant `# BREAKING`; never auto-delete | -| Route | PageObject | Reason | -|---|---|---| -| `/auth/settings` | `SettingsPage.ts` | No matching FEAT-xxx found in the living documentation | -``` +Propose additions for new elements. Update selector constants only; never auto-delete methods. -**5. Update manifest** +**Actively discover new routes** — do not limit discovery to routes already in `manifest.json`. 
On each page snapshot: +- Find all `<a href>` links that resolve to new paths not yet in the manifest. +- Find buttons whose purpose suggests navigation (e.g. "Create order", "View details") — click them and record the resulting URL. +- Find tab panels, side-nav items, and wizard steps that expose sub-routes. +- Any new URL discovered this way is a candidate manifest entry; add it and crawl it recursively. -After confirmation of all changes, update the manifest entry for each scanned route: -- Set `last_scanned` to the current ISO 8601 timestamp. -- Update `elements` and `coverage_gaps` to reflect the current DOM state. -- Populate or update `navigation_context` if new information was gathered about how to reach the route. +#### Step 4 — Breaking change report -Use `scripts/manifest_diff.py` to detect stale manifest entries and undocumented PageObject -files before running a full rescan. +Overwrite `.copilot/bdd/breaking-changes.md`: -```bash -# Show stale manifest entries and undocumented PageObjects -python scripts/manifest_diff.py --manifest .copilot/bdd/manifest.json --pages-dir tests/pages +```markdown +# Breaking Changes Report +Generated: <ISO> | Scope: <full|healing|scoped> -# Include a diff of element counts since last scan -python scripts/manifest_diff.py --manifest .copilot/bdd/manifest.json --pages-dir tests/pages --diff +## <route> +| Selector | Status | Linked test | Action | +|---|---|---|---| +| `Page.locatorName` | REMOVED | `file.feature:<line>` | Verify removed or renamed | ``` +#### Step 5 — Update manifest and register new routes + +After confirming changes: set `last_scanned`, update `elements` and `coverage_gaps`, update `navigation_context`. Add new surfaces; mark removed surfaces as `deprecated`. Generate new scenarios for newly discovered ACs (load `living-doc-scenario-creator`). 
+ --- -## Output artifacts +### HEALING scope -| Artifact | Location | -|---|---| -| PageObject files | `tests/pages/<ScreenName>Page.py` (or `.ts`) | -| Feature link | `// living-doc: FEAT-<nnn> \| <route>` header comment in the PageObject file. If no Feature exists: `FEAT-UNKNOWN` placeholder and a note in the scan report. Header format: see [living-doc-bdd-schemas — PageObject File Header](../references/living-doc-bdd-schemas.md#pageobject-file-header). | -| Functionality feature file stubs | `features/functionalities/<feature-kebab>/func-<kebab>.feature` — one file per discovered Functionality behavior, `@FUNC_ID:FUNC-UNKNOWN` tag until ID is assigned | -| Breaking change report | `.copilot/bdd/breaking-changes.md` | -| Inaccessible routes (PHASE 5) | `.copilot/bdd/scan-phase5-inaccessible.md` | -| Final scan report (PHASE 6) | `.copilot/bdd/scan-phase6-report.md` | -| Exploration manifest | `.copilot/bdd/manifest.json` | +**Before starting:** ask for the list of failing scenario titles if not provided — do not proceed without a confirmed scope. -> **Note:** Locations above are illustrative defaults. Actual paths depend on the project's repository structure and Storage Profile configuration. +1. Trace each failing scenario to its PageObject and step definition. +2. Navigate to the affected page via MCP Playwright; snapshot the current DOM. +3. Find updated element IDs or selectors; update only the affected PageObject(s). +4. Verify the step definition binding still resolves; fix if broken. +5. Re-run only the previously failing tests to confirm healing. Do not re-run the full suite. --- ## Manifest schema -The manifest records per-route exploration state. Agents and tools read it to drive healing sessions without re-discovering routes. 
- ```json { "version": "1.0", "routes": { "/auth/all-domains": { - "pageobject_path": "aul-ui/playwright/pages/AllDomainsPage.ts", + "pageobject_path": "playwright/pages/AllDomainsPage.ts", "feature_id": "FEAT-001", "last_scanned": "2026-05-26T10:30:00Z", - "elements": [ - { "data_cy": "create-domain-btn", "tag": "cps-button" }, - { "data_cy": "domains-table", "tag": "table" } - ], - "coverage_gaps": [ - { "tag": "input", "placeholder": "Search domains", "suggested_data_cy": "domains-search-input" } - ], + "elements": [{ "data_cy": "create-domain-btn", "tag": "cps-button" }], + "coverage_gaps": [{ "tag": "input", "placeholder": "Search", "suggested_data_cy": "domains-search-input" }], "navigation_context": { "prerequisites": "User must be logged in.", - "navigation_steps": "Click sidebar item \u2018All Domains\u2019.", + "navigation_steps": "Click sidebar 'All Domains'.", "data_requirements": null, "auth_role": "standard user", "notes": null, @@ -343,20 +316,29 @@ The manifest records per-route exploration state. Agents and tools read it to dr } ``` -| Field | Type | Purpose | -|---|---|---| -| `last_scanned` | ISO 8601 string | Timestamp of the last successful scan for this route. Used during healing to surface stale entries and prioritise rescans. | -| `elements` | array | All `data-cy` elements found on the route at last scan. | -| `coverage_gaps` | array | Interactive elements lacking `data-cy` at time of scan, with suggested names. | -| `pageobject_path` | string | Relative path to the linked PageObject file. | -| `feature_id` | string | Living doc Feature entity ID linked to this route. | -| `navigation_context` | object | **How to reach hard-to-access routes.** Populated on first discovery; reused in all subsequent healing sessions so the agent can navigate directly without re-discovering the path. | -| `navigation_context.prerequisites` | string | State that must exist before navigating (e.g. "a domain must have been visited at least once"). 
| -| `navigation_context.navigation_steps` | string | Step-by-step path to the route from the app root or login page. | -| `navigation_context.data_requirements` | string/null | Test data that must exist (e.g. "at least one published domain"). | -| `navigation_context.field_constraints` | array | Per-field validation findings from form traversal probing. Schema: `{ field_data_cy, max_length, special_chars, duplicate, duplicate_error_data_cy, real_world_required }`. Empty array until probed. | -| `navigation_context.auth_role` | string | Minimum role required to reach this route. | -| `navigation_context.notes` | string/null | Any additional context for the agent (e.g. quirks, timing, overlay triggers). | +| Field | Purpose | +|---|---| +| `last_scanned` | ISO 8601 timestamp; surfaces stale entries. | +| `elements` | All `data-cy` elements at last scan. | +| `coverage_gaps` | Interactive elements lacking `data-cy`; with suggested names. | +| `pageobject_path` | Relative path to PageObject file. | +| `feature_id` | Living doc Feature entity ID. | +| `navigation_context` | How to reach hard-to-access routes; reused in all subsequent sessions. | +| `navigation_context.field_constraints` | Per-field validation findings. Schema: `{ field_data_cy, max_length, special_chars, duplicate, duplicate_error_data_cy, real_world_required }`. | + +--- + +## Output artifacts + +| Artifact | Location | +|---|---| +| PageObject files | `tests/pages/<ScreenName>Page.py` / `.ts` | +| Feature link | `// living-doc: FEAT-<nnn> | <route>` header comment | +| Functionality stubs | `features/functionalities/<feat-kebab>/func-<kebab>.feature` | +| Breaking change report | `.copilot/bdd/breaking-changes.md` | +| Exploration manifest | `.copilot/bdd/manifest.json` | + +> Paths are defaults — actual locations depend on the project's Storage Profile. --- @@ -366,5 +348,6 @@ The manifest records per-route exploration state. 
Agents and tools read it to dr |---|---| | Generate BDD scenarios for a User Story | `living-doc-scenario-creator` | | Create a User Story for this screen | `living-doc-create-user-story` | -| Document an API endpoint or REST surface | `living-doc-create-functionality` | -| Resolve missing `data-cy` attributes after scan | `data-cy-instrument` (when `coverage_gaps` non-empty) | +| Resolve missing `data-cy` attributes | `data-cy-instrument` | +| Delete deprecated BDD files | `bdd-maintain` | + diff --git a/skills/living-doc-scenario-creator/SKILL.md b/skills/living-doc-scenario-creator/SKILL.md index be6f413..2fb0672 100644 --- a/skills/living-doc-scenario-creator/SKILL.md +++ b/skills/living-doc-scenario-creator/SKILL.md @@ -1,136 +1,115 @@ --- name: living-doc-scenario-creator description: > - Generate the living-doc feature file header block (@US_ID:/@FUNC_ID: tag, Feature narrative, - # Acceptance Criteria: block) and scenario skeletons (one @AC:-tagged Scenario: title with - ... placeholder per ACTIVE AC). Produces an AC coverage report. Step bodies (Given/When/Then) - are authored by bdd-scenario-gen. - Use when bootstrapping a feature file for a US or Functionality, auditing AC coverage, or - tagging partial coverage with aspect notation. - Triggers on: "feature file header for user story", "living-doc feature file", - "bootstrap feature file for US", "US feature file structure", "cover AC with scenarios", - "scenario coverage for US", "map AC to scenarios", "AC coverage for US", - "partial AC coverage", "scenario creator", "generate feature file for US", - "bootstrap living-doc scenarios". - Does NOT trigger for: writing scenario step bodies (use bdd-scenario-gen), standalone - Gherkin (use bdd-scenario-gen), step definitions (use gherkin-step), - doc gaps (use living-doc-gap-finder). - Pairs with living-doc-create-user-story, bdd-scenario-gen, and living-doc-pageobject-scan. 
+ Generate Gherkin scenarios and living-doc feature files from User Story and Functionality ACs. + Covers: full feature file output (header block, @AC:-tagged scenarios, complete Given/When/Then + bodies), standalone Gherkin without an entity, GWT correctness, ubiquitous language rules, + one-behaviour-per-scenario, Scenario Outline, Background, anti-patterns, @AC: traceability + annotations (authoritative format), AC coverage report, gap detection via living-doc-gap-finder, + and step definition resolution against PageObjects. + Triggers on: "write a Gherkin scenario", "BDD scenario", "standalone feature file", + "Given When Then", "Scenario Outline", "BDD anti-patterns", "review my feature file", + "BDD scenarios for", "convert acceptance criteria to Gherkin", "exploratory scenario", + "feature file header for user story", "living-doc feature file", "bootstrap feature file for US", + "cover AC with scenarios", "scenario coverage for US", "map AC to scenarios", "scenario creator". + Does NOT trigger for: implementing step definitions (use gherkin-step), writing unit tests. + Pairs with living-doc-create-user-story and living-doc-pageobject-scan. license: Apache-2.0 compatibility: GitHub Copilot --- # Living Doc — Scenario Creator -> **Glossary:** User Story, AC, PageObject, step definitions — see [living-doc-glossary](../references/living-doc-glossary.md) ([remote](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md)). +> **Glossary:** User Story, AC, Feature, PageObject — see [living-doc-glossary](../references/living-doc-glossary.md) ([remote](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md)). > **BDD schemas:** US and Functionality feature file templates — see [living-doc-bdd-schemas](../references/living-doc-bdd-schemas.md) ([remote](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-bdd-schemas.md)). 
-## AC state vocabulary +**AC states:** `PLANNED` · `IN_REVIEW` · `ACTIVE` · `DEPRECATED`. Only `ACTIVE` ACs drive scenario generation. -**AC states used in this skill:** `PLANNED` · `IN_REVIEW` · `ACTIVE` · `DEPRECATED` +--- -Only `ACTIVE` ACs drive scenario generation. `PLANNED` and `DEPRECATED` ACs are skipped. +## Two modes -**AC traceability format** — for the authoritative `# AC:` and `@AC:` annotation format, load `bdd-scenario-gen`. +| Mode | When to use | +|---|---| +| **Entity mode** | A User Story or Functionality entity exists — generate full feature file with header, `@AC:` tags, and step bodies. | +| **Standalone mode** | No US/FUNC entity — write Gherkin directly from business descriptions. Use `@AC:STANDALONE` as tag; `gherkin-living-doc-sync` will note it but not flag a traceability gap. | --- -## Inputs required +## Entity mode workflow -| Input | Source | Required | -|---|---|---| -| User Story (with ACs) | User Story entity file (or inline JSON) | Yes | -| Available PageObjects | `tests/pages/` directory | Recommended | -| Existing step definitions | `tests/steps/` directory | Recommended | +### Step 1 — Read the entity -If PageObjects or step files are not available, generate scenarios with stub step implementations -(see Step 3 for the two-case protocol: PageObject method found vs. not found). - ---- +Load the User Story or Functionality. Confirm: +- ID follows `US-<nnn>` or `FUNC-<nnn>` format. +- Which ACs are `ACTIVE` (eligible for generation). +- ACs are atomic — one input condition, one observable outcome. -## Workflow +If no ACs are `ACTIVE`, do not generate empty scenarios. Output a coverage report with state-specific skip reasons (`PLANNED`: `skipped — not yet active`, `DEPRECATED`: `skipped — deprecated AC`) and advise the user to re-run when an AC becomes `ACTIVE`. -### Step 1 — Read the User Story +### Step 2 — Gap detection -Load the User Story. 
Confirm: -- ID follows `US-<nnn>` format -- Which ACs are eligible for generation (`ACTIVE`) -- ACs are atomic — each has one input condition and one observable outcome +An AC is uncovered if no `.feature` file carries `@AC:<id>`. Use `living-doc-gap-finder` (bottom-up mode) to identify `ACTIVE` ACs with no linked scenario before writing new files. -Treat requests such as "write feature tests for US-007" as requests to generate BDD scenarios plus a coverage table for that User Story. +### Step 3 — Generate feature file -If no ACs are `ACTIVE`, do **not** generate empty or stub scenarios. Instead, -output a coverage report that lists every AC with its state-specific skip reason (`PLANNED`: -`skipped — not yet active`, `DEPRECATED`: `skipped — deprecated AC`) and advise the user to -re-run the scenario creator when an AC becomes `ACTIVE`. +For each `ACTIVE` AC, output `# AC:` comment, `@AC:` tag, `Scenario:` title, and full Given/When/Then step bodies. -### Step 2 — Generate scenario skeletons +**Scenario title by AC type:** +- `happy_path` → `Scenario: <positive outcome>` +- `error` → `Scenario: <US title> — <error condition>` +- `alternative` → `Scenario: <US title> — <alternative path>` -For each `ACTIVE` AC, generate the `# AC:` comment, `@AC:` tag, and `Scenario:` title with `...` as the step placeholder. Step bodies (Given/When/Then) are authored by `bdd-scenario-gen`. - -Select the title by AC type: -- `happy_path`: `Scenario: <positive outcome>` -- `error`: `Scenario: <US title> — <error condition>` (prefer the crisp business-facing failure title from the AC if available) -- `alternative`: `Scenario: <US title> — <alternative path>` +**Traceability format** (authoritative): ```gherkin -# AC:US-1-01 (v1.0.0 - Active) — customer places an order with a saved payment method +# AC:US-1-01 (v1.0.0 - ACTIVE) — customer places an order with a saved payment method @AC:US-1-01 Scenario: Customer successfully places an order - ... 
- -# AC:US-1-02 (v1.0.0 - Active) — order is rejected when the payment card is declined -@AC:US-1-02 -Scenario: Order rejected when payment card is declined - ... + Given the customer has items in their cart + When they confirm the order with their saved payment method + Then the order confirmation is displayed ``` -### Step 3 — Hand off step bodies to bdd-scenario-gen - -The skeletons from Step 2 use `...` placeholders. To produce full Given/When/Then implementations, pass the generated feature file to `bdd-scenario-gen`. For step definition code, load `gherkin-step`. +Aspect variant (when one scenario covers only one aspect of a multi-aspect AC): -Do not author step bodies in this skill. - -### Step 4 — Validate AC coverage - -Every `ACTIVE` AC must map to at least one scenario. -The coverage report must list **every** AC on the User Story, including skipped ones. -Use these skip reasons verbatim so the output is predictable and auditable: -- `PLANNED`: `skipped — not yet active` -- `DEPRECATED`: `skipped — deprecated AC` +```gherkin +# AC:US-1-01 (v1.0.0 - ACTIVE) — displays {required field} on login screen | aspect: username input +@AC:US-1-01/aspect:username-input +Scenario: Login form shows the username input field +``` -Run `scripts/coverage_report.py <living_doc_dir> <features_dir>` for a full coverage report. 
+Multiple ACs per scenario — one comment + tag pair per AC: +```gherkin +# AC:US-1-01 (v1.0.0 - ACTIVE) — invalid credentials show an error message +# AC:US-1-02 (v1.0.0 - ACTIVE) — account lockout after 3 failed attempts +@AC:US-1-01 +@AC:US-1-02 +@Regression +Scenario: User is locked out after repeated failed logins ``` -AC COVERAGE REPORT — US-001 - AC:US-001-01 (ACTIVE): ✅ covered by "Customer successfully places an order" - AC:US-001-02 (ACTIVE): ✅ covered by "Order rejected when payment card is declined" - AC:US-001-03 (ACTIVE): ❌ NOT COVERED — added to gap list - AC:US-001-04 (PLANNED): ⏭ skipped — not yet active - AC:US-001-05 (DEPRECATED): ⏭ skipped — deprecated AC -``` -Use `scripts/coverage_report.py` to generate this report across all entities. +AC tag prefix matches the parent entity: `@AC:US-<n>-<nn>` for User Story, `@AC:FUNC-<nnn>-<nn>` for Functionality. + +**Feature file types:** -### Step 5 — Output artifacts +| Type | Location | Feature block | +|---|---|---| +| User Story (E2E) | `features/us/us-<nnn>-<kebab>.feature` | `Feature: <US title>` with As-a/I-can/so-that + `@US_ID:US-<n>` | +| Functionality | `features/functionalities/<feat-kebab>/func-<nnn>-<kebab>.feature` | `Feature: <Feature name> — <Functionality name>` + `@FUNC_ID:FUNC-<nnn>` | -**`.feature` file** — one per User Story, named `us-<nnn>-<kebab-title>.feature` in lowercase. The file starts with a header block (matching the project's US feature file convention) and uses `@AC:` traceability tags above each scenario. When showing the generated output, include the filename as a comment: +**US feature file example:** ```gherkin # us-001-place-an-online-order.feature -# Source: https://github.com/<org>/<repo>/issues/<n> - # Business Value: -# - <concise value statement> +# - Customers can complete an order without calling support. # Acceptance Criteria: -# -# AC:US-001-01 (v1.0.0 - Active) -# - Customer places an order with a saved payment method. 
-# -# AC:US-001-02 (v1.0.0 - Active) -# - Order is rejected when the payment card is declined. +# AC:US-001-01 (v1.0.0 - Active) — customer places an order with a saved payment method. +# AC:US-001-02 (v1.0.0 - Active) — order is rejected when the payment card is declined. @US_ID:US-001 Feature: Place an online order @@ -141,68 +120,133 @@ Feature: Place an online order # AC:US-001-01 (v1.0.0 - Active) — customer places an order with a saved payment method @AC:US-001-01 Scenario: Customer successfully places an order - ... + Given the customer has items in their cart + When they confirm the order with their saved payment method + Then the order confirmation is displayed # AC:US-001-02 (v1.0.0 - Active) — order is rejected when the payment card is declined @AC:US-001-02 Scenario: Order rejected when payment card is declined - ... + Given the customer has items in their cart + When they attempt to pay with a declined card + Then an error message is shown and the order is not placed +``` + +**Functionality feature file example:** + +```gherkin +@FUNC_ID:FUNC-001 +Feature: Login Page — Validate Password Strength + + # AC:FUNC-001-01 (v1.0.0 - Active) — returns valid=true when password satisfies all rules + @AC:FUNC-001-01 + Scenario: Password meets all complexity rules + Given a password with at least 8 characters, one uppercase, one lowercase, and one number + When password strength is validated + Then the result is valid +``` + +### Step 4 — AC coverage report + +Run `scripts/coverage_report.py <living_doc_dir> <features_dir>` for a full report. Append after the `.feature` code block: + ``` +AC COVERAGE REPORT — US-001 + AC:US-001-01 (ACTIVE): ✅ covered + AC:US-001-02 (ACTIVE): ✅ covered + AC:US-001-03 (ACTIVE): ❌ NOT COVERED + AC:US-001-04 (PLANNED): ⏭ skipped — not yet active +``` + +### Step 5 — Step definition resolution -**Coverage table** — ACs with coverage status (use `scripts/coverage_report.py`). 
Append it immediately after the `.feature` code block in the response. +For each generated scenario step: + +1. Narrow scope to the relevant PageObject first — check step files that import it for reuse candidates. +2. Match by purpose, not just pattern — confirm the implementation performs the same business action. +3. If a purpose-matching step exists, reuse it; note the source file. +4. If no reuse candidate but the PageObject method exists, generate a thin step stub via `gherkin-step`. +5. If neither exists, generate a stub that raises `NotImplementedError` and flag the PageObject extension needed. --- -## Functionality scenarios +## Gherkin quality rules + +### Write in the ubiquitous language -When the source is a Functionality (`FUNC-<nnn>`) rather than a User Story, apply the same workflow but with these differences: +Scenarios must use business domain language. Anyone on the product team must be able to read and verify them without implementation knowledge. -| Aspect | User Story (E2E) | Functionality (system test) | +```gherkin +# ✅ +Given a customer with a gold membership +When they place an order for 2 units of "SKU-100" +Then the order is confirmed and the total is £160.00 + +# ❌ — implementation details +Given the database contains a row in users with tier="gold" +When a POST request is sent to /api/orders +Then the response status is 201 +``` + +### GWT keyword rules + +| Keyword | Purpose | Rule | |---|---|---| -| AC ID format | `AC:US-<nnn>-<nn>` | `AC:FUNC-<nnn>-<nn>` | -| File location | `features/us/us-<nnn>-<kebab>.feature` | `features/functionalities/<feat-kebab>/func-<nnn>-<kebab>.feature` | -| File header | `# Source:` (optional), `# Business Value:`, `# Acceptance Criteria:` block + `@US_ID:US-<n>` tag | `# Source:` (optional), `# Rationale:` (optional), `# Acceptance Criteria:` block + `@FUNC_ID:FUNC-<nnn>` tag | -| Feature block | `Feature: <US title>` with As-a/I-can/so-that | `Feature: <Feature name> — <Functionality name>` (no narrative)
| -| Scope | End-to-end, from user's perspective | One atomic behavior, input to output contract | -| Language | Business domain language | Business domain language — same rule; no code calls, no selector references | +| **Given** | System state before the action | Preconditions only — no actions, no assertions | +| **When** | The action the actor takes | Exactly one meaningful action per scenario | +| **Then** | Observable outcome | Assertions only — no actions | +| **And / But** | Continuation | Never as the first step in a block | -**Functionality feature file example:** +One behaviour per scenario. If the scenario name contains "and", it likely tests two behaviours — split it. + +### Scenario Outline + +Use for data-driven variations. Show concrete outcome values in `Examples:`, not raw percentages: ```gherkin -# func-001-validate-password-strength.feature +Scenario Outline: Discount applied correctly for each membership tier + Given a customer with a <tier> membership + When they purchase an item costing £100.00 + Then the total is £<total> + + Examples: + | tier | total | + | gold | 80.00 | + | silver | 90.00 | +``` -# Source: https://github.com/<org>/<repo>/issues/<n> ← optional +### Background -# Rationale: -# - <why this atomic behavior exists> ← optional +Use only when every scenario in the file shares the precondition. Keep to 3 steps or fewer. If only 2–3 scenarios share a precondition, duplicate the `Given` step — prefer clarity over abstraction. `Background` must use only `Given` steps. -# Acceptance Criteria: -# -# AC:FUNC-001-01 (v1.0.0 - Active) -# - Returns valid=true when the password satisfies all complexity rules. -# -# AC:FUNC-001-02 (v1.0.0 - Active) -# - Raises INVALID_PASSWORD when the password is shorter than 8 characters. 
+### Anti-patterns -@FUNC_ID:FUNC-001 -Feature: Login Page — Validate Password Strength +| Anti-pattern | Fix | +|---|---| +| UI selectors in steps (`I click the "Submit" button`) | Domain action (`the customer submits the order`) | +| Imperative style (`I enter "alice@example.com" in Email field`) | Declarative (`the customer logs in as Alice`) | +| Multiple `When` per scenario | Split into separate scenarios | +| Assertions in Given/When | Move all assertions to `Then` | +| Scenario depends on prior scenario state | Make every scenario fully self-contained | - # AC:FUNC-001-01 (v1.0.0 - Active) — returns valid=true when password satisfies all complexity rules - @AC:FUNC-001-01 - Scenario: Password meets all complexity rules - Given a password with at least 8 characters, one uppercase, one lowercase, and one number - When password strength is validated - Then the result is valid +When reviewing an existing scenario, check for a missing `@AC:` tag above each `Scenario:` — call that out as a traceability defect. - # AC:FUNC-001-02 (v1.0.0 - Active) — raises INVALID_PASSWORD when password is shorter than 8 characters - @AC:FUNC-001-02 - Scenario: Password too short - Given a password with 7 or fewer characters - When password strength is validated - Then the result is invalid with code INVALID_PASSWORD -``` +--- + +## Standalone mode -Functionality scenarios are **not** unit tests written in Gherkin. Steps must still describe observable business-facing input/output — never internal method calls, DB queries, or selector names. +When no User Story or Functionality entity exists, generate scenarios directly from business descriptions: + +- Apply all GWT rules and ubiquitous language rules above. +- Use `@AC:STANDALONE` as an optional tag to signal intentionally unlinked scenarios. +- Omit the header block (`# Business Value:`, `# Acceptance Criteria:`, `@US_ID:`) — start directly with `Feature:`. 
+- File location is at the user's discretion; `gherkin-living-doc-sync` will note `@AC:STANDALONE` but not flag a traceability gap. + +--- + +## Output format + +Output all generated Gherkin in a single fenced `gherkin` code block starting with `Feature:`. Use only `Scenario:`, `Scenario Outline:`, `Background:`, `Given`, `When`, `Then`, `And`, `But`, `Examples:` inside the block. --- @@ -210,9 +254,7 @@ Functionality scenarios are **not** unit tests written in Gherkin. Steps must st | Request | Correct skill | |---|---| -| Standalone Gherkin without a User Story | `bdd-scenario-gen` | -| Writing step definition code | `gherkin-step` | +| Implementing step definition code | `gherkin-step` | +| Writing unit tests | Use your project's test framework directly | + -**Ambiguous request — "create scenarios for US-007":** If the user does not specify whether they want skeleton structure or full step bodies, ask: -> "Do you want the living-doc feature file header and skeleton scenario titles (continue here in `living-doc-scenario-creator`), or full Given/When/Then scenario bodies (use `bdd-scenario-gen`)?" -This skill produces the reusable structural skeleton (header block + AC-tagged scenario titles); `bdd-scenario-gen` fills in the step bodies. diff --git a/skills/living-doc-scenario-creator/evals/evals.json b/skills/living-doc-scenario-creator/evals/evals.json index f198af4..45551d3 100644 --- a/skills/living-doc-scenario-creator/evals/evals.json +++ b/skills/living-doc-scenario-creator/evals/evals.json @@ -60,11 +60,11 @@ "id": 5, "category": "negative", "prompt": "Write a standalone Gherkin scenario for testing login without a specific User Story.", - "expected_output": "Standalone Gherkin without a User Story is out of scope — routes to bdd-scenario-gen. 
This skill generates scenarios from User Story ACs; bdd-scenario-gen handles standalone or exploratory scenarios.", + "expected_output": "Standalone Gherkin without a User Story uses living-doc-scenario-creator Standalone mode. Use @AC:STANDALONE tag; scenario is not tied to a catalog entity.", "files": [], "expectations": [ "Does not generate a standalone scenario", - "Routes to bdd-scenario-gen", + "Uses Standalone mode with @AC:STANDALONE tag", "Explains the distinction: US-driven vs. standalone Gherkin" ] }, diff --git a/skills/living-doc-scenario-creator/evals/fixture-map.md b/skills/living-doc-scenario-creator/evals/fixture-map.md index e8f963a..2cdf967 100644 --- a/skills/living-doc-scenario-creator/evals/fixture-map.md +++ b/skills/living-doc-scenario-creator/evals/fixture-map.md @@ -12,7 +12,7 @@ No fixture files for this skill. All evals use inline User Story/AC definitions | 2 | happy-path | _(none — inline AC list in prompt)_ | AC state filtering: Active → generated, Deprecated → skipped, Planned → skipped | | 3 | happy-path | _(none)_ | Case A step stub: PageObject method exists — full stub, no NotImplementedError | | 4 | regression | _(none)_ | Case B step stub: missing PageObject method — NotImplementedError + maintenance flag | -| 5 | negative | _(none)_ | Routing: standalone Gherkin without a US → bdd-scenario-gen | +| 5 | negative | _(none)_ | Routing: standalone Gherkin without a US → living-doc-scenario-creator (standalone mode) | | 6 | paraphrase | _(none)_ | "Write feature tests for US-nnn" → scenario generation request | | 7 | edge-case | _(none)_ | All ACs Planned → zero scenarios generated; coverage report with skip reasons | | 8 | output-format | _(none)_ | .feature file structure: @US_ID:, Feature: header, # AC: + @AC: per scenario | @@ -26,7 +26,6 @@ No fixture files for this skill. 
All evals use inline User Story/AC definitions | Routes to | Query count | |---|---| -| bdd-scenario-gen | 1 | | gherkin-step | 1 | | living-doc-gap-finder | 1 | | gherkin-living-doc-sync | 1 | diff --git a/skills/living-doc-scenario-creator/evals/trigger-eval.json b/skills/living-doc-scenario-creator/evals/trigger-eval.json index 75684a8..0e82d67 100644 --- a/skills/living-doc-scenario-creator/evals/trigger-eval.json +++ b/skills/living-doc-scenario-creator/evals/trigger-eval.json @@ -9,7 +9,7 @@ {"id": 8, "query": "Generate Gherkin from user story US-012", "should_trigger": true, "reason": "'gherkin from user story' trigger phrase"}, {"id": 9, "query": "Create scenarios for US-007", "should_trigger": true, "reason": "'scenarios for US-' trigger phrase — explicitly mentions a US ID"}, {"id": 10, "query": "Generate a .feature file for the checkout flow", "should_trigger": true, "reason": "'generate .feature file' trigger phrase"}, - {"id": 11, "query": "Write standalone Gherkin scenarios for an exploratory test", "should_trigger": false, "reason": "Standalone Gherkin without a User Story — routes to bdd-scenario-gen"}, + {"id": 11, "query": "Write standalone Gherkin scenarios for an exploratory test", "should_trigger": true, "reason": "Standalone Gherkin without a User Story — uses living-doc-scenario-creator Standalone mode"}, {"id": 12, "query": "Implement the step definition for 'When the customer confirms the order'", "should_trigger": false, "reason": "Step definition implementation — routes to gherkin-step"}, {"id": 13, "query": "Write a unit test for the promo code calculation", "should_trigger": false, "reason": "Unit test request — out of scope for this toolkit (no test-unit-write skill defined)"}, {"id": 14, "query": "Find which User Stories have no Gherkin coverage at all", "should_trigger": false, "reason": "Finding doc gaps — routes to living-doc-gap-finder"}, From b45c00a07da9aff8682c6dc6b533cb9fa44137a8 Mon Sep 17 00:00:00 2001 From: miroslavpojer 
<miroslav.pojer@absa.africa> Date: Sat, 30 May 2026 22:25:19 +0200 Subject: [PATCH 29/35] Final fix of gaps and issues. --- .../agents/living-doc-bdd-copilot.agent.md | 6 +- skills/bdd-maintain/SKILL.md | 39 +++++++---- skills/data-cy-instrument/SKILL.md | 44 +++++++----- skills/gherkin-living-doc-sync/SKILL.md | 24 ++++--- skills/gherkin-step/SKILL.md | 70 +++++++++++++++---- skills/living-doc-create-feature/SKILL.md | 24 +++++-- .../living-doc-create-functionality/SKILL.md | 26 ++++--- skills/living-doc-create-user-story/SKILL.md | 32 ++++++--- skills/living-doc-gap-finder/SKILL.md | 43 ++++++++---- skills/living-doc-impact-analysis/SKILL.md | 36 ++++++---- skills/living-doc-pageobject-scan/SKILL.md | 40 +++++++---- skills/living-doc-scenario-creator/SKILL.md | 28 +++++--- skills/living-doc-update/SKILL.md | 32 ++++++++- 13 files changed, 315 insertions(+), 129 deletions(-) diff --git a/.github/agents/living-doc-bdd-copilot.agent.md b/.github/agents/living-doc-bdd-copilot.agent.md index ef901ad..580e1cf 100644 --- a/.github/agents/living-doc-bdd-copilot.agent.md +++ b/.github/agents/living-doc-bdd-copilot.agent.md @@ -2,11 +2,11 @@ description: > Single agent for living documentation and BDD automation — catalog management plus executable test generation. Catalog: create/update/deprecate User Stories, Features, - Functionalities and ACs; impact analysis; gap finding (HEALING/PLAN modes). + Functionalities and ACs; impact analysis; gap finding (AUDIT/PLAN modes). Automation: explore webapps, generate PageObjects, produce Gherkin scenarios and step definitions, maintain BDD suites, sync traceability. 
Triggers: "create user story", "document feature", "update AC", "impact analysis", "living doc gaps", "PLAN mode", - "HEALING mode", "deprecate entity", "mark US ready", "scan webapp", "generate pageobjects", + "AUDIT mode", "deprecate entity", "mark US ready", "scan webapp", "generate pageobjects", "heal pageobjects", "generate scenarios", "sync gherkin", "playwright crawl", "explore the app", "BDD pipeline", "crawl the UI", "create page objects", "generate feature file", "step definitions", "add missing data-cy", "fix playwright selectors", @@ -83,7 +83,7 @@ Load **one** skill per session. Do not pre-load skills for modes not yet trigger | Update / deprecate entity or AC | `living-doc-update` | | Promote entity to ACTIVE | `living-doc-update` | | PR impact analysis / trace affected entities | `living-doc-impact-analysis` | -| Catalog gaps / HEALING mode / PLAN mode | `living-doc-gap-finder` | +| Catalog gaps / AUDIT mode / PLAN mode | `living-doc-gap-finder` | `living-doc-gap-finder` is used **top-down** in catalog operations — finding missing documentation entities. Bottom-up (uncovered ACs) is used in automation operations (see below). diff --git a/skills/bdd-maintain/SKILL.md b/skills/bdd-maintain/SKILL.md index 55fdc1b..0b07cf9 100644 --- a/skills/bdd-maintain/SKILL.md +++ b/skills/bdd-maintain/SKILL.md @@ -1,19 +1,18 @@ --- name: bdd-maintain description: > - Lifecycle cleanup for BDD automation artifacts: REMOVE (delete feature files, step - definitions, and PageObjects linked to a deprecated entity) and DEAD CODE AUDIT (find - unused step definitions, PageObject methods, and PO components via three Python scripts). - Activate when a feature has been removed from the product and its linked BDD files must be - deleted, or when dead BDD code needs to be identified. Third step in the entity-deprecation - chain — runs after living-doc-update deprecates the entity and gherkin-living-doc-sync - marks linked scenarios. 
+ Lifecycle cleanup for BDD automation artifacts. REMOVE: delete feature files, step + definitions, and PageObjects linked to a deprecated entity. DEAD CODE AUDIT: find + unused step definitions, PageObject methods, and PO components via three Python scripts. + Third step in the entity-deprecation chain — after living-doc-update and gherkin-living-doc-sync. Triggers on: "remove feature", "deprecate bdd", "delete feature files", "bdd cleanup", "remove pageobject", "unused steps", "dead pageobject methods", "find unused steps", "dead code audit", "unused po methods", "dead po components", "bdd-maintain". - Does NOT trigger for: re-scanning the manifest after UI changes (use living-doc-pageobject-scan - RE-SCAN scope); healing failing tests after selector drift (use living-doc-pageobject-scan - HEALING scope); syncing @AC: traceability tags (use gherkin-living-doc-sync). + Does NOT trigger for: re-scanning manifest after UI changes (use living-doc-pageobject-scan + RE-SCAN); healing selector drift (use living-doc-pageobject-scan HEALING); syncing @AC: + traceability tags (use gherkin-living-doc-sync). + Pairs with living-doc-update (upstream — deprecate entity first) and + gherkin-living-doc-sync (upstream — tag scenarios first). license: Apache-2.0 compatibility: GitHub Copilot --- @@ -31,14 +30,16 @@ Two modes — activate the one that matches the trigger. **Trigger:** Feature deprecated or deleted from the product. +**Prerequisite:** `living-doc-update` must have already deprecated the entity and `gherkin-living-doc-sync` must have already tagged linked scenarios with `@deprecated` and `@review-needed`. Run those two skills first if they have not yet run — removing files before scenarios are tagged silently breaks traceability. + **Scope:** Only files linked to the removed entity — do not touch other Features, PageObjects, or step definitions. 1. Identify the specific Feature/US/AC being removed. 2. 
Find all `.feature` files whose scenarios carry an `@AC:` tag matching the removed entity's IDs. -3. Find PageObjects referenced only by those scenarios; find step definitions used only by those scenarios. +3. Find PageObjects referenced only by those scenarios; find step definitions used only by those scenarios. Also check `playwright/fixtures.ts` (or the project's fixture file) for fixture registrations that import the PageObjects being removed — those imports and constructor parameters must be removed too. 4. Confirm the full deletion list with the user before touching any file. -5. Remove confirmed files; update `manifest.json` to remove the deprecated entry. -6. Flag linked US/AC entities in the living documentation as candidates for deprecation — load `living-doc-update` skill. +5. Remove confirmed files; remove the deprecated entry from `manifest.json`. Do not restructure or regenerate the manifest — `living-doc-pageobject-scan` owns the manifest for all active entries. +6. If any child entities (linked User Stories, Functionalities) were not yet deprecated in the catalog, flag them and load `living-doc-update` to deprecate them now. --- @@ -98,3 +99,15 @@ python playwright/scripts/find_unused_po_components.py \ - **Unused PO class**: either add an import and fixture entry, or remove the `.ts` file — after confirming nothing references it outside the test suite. All three scripts exit `0` on clean, `1` on findings, `2` on bad arguments — safe for CI gating. 
+ +--- + +## Out-of-scope routing + +| Request | Correct skill | +|---|---| +| Re-scan manifest after UI changes | `living-doc-pageobject-scan` RE-SCAN scope | +| Fix failing tests due to selector drift | `living-doc-pageobject-scan` HEALING scope | +| Sync `@AC:` traceability tags | `gherkin-living-doc-sync` | +| Deprecate an entity in the catalog | `living-doc-update` | +| Tag deprecated scenarios before deletion | `gherkin-living-doc-sync` | diff --git a/skills/data-cy-instrument/SKILL.md b/skills/data-cy-instrument/SKILL.md index f1250ce..776e739 100644 --- a/skills/data-cy-instrument/SKILL.md +++ b/skills/data-cy-instrument/SKILL.md @@ -1,19 +1,17 @@ --- name: data-cy-instrument description: > - Automatically resolve missing `data-cy` attributes in component templates (Angular-first) - and sync the corresponding Playwright PageObjects to use `getByTestId()`. Activate - whenever coverage gaps exist in `manifest.json`, when PageObject stubs carry - "⚠️ PROPOSED" locator comments, when Functionality entities have `status: planned` - due to missing test IDs, or when a dev explicitly asks to instrument templates. - Activate at the end of a `living-doc-pageobject-scan` session (Create or RE-SCAN scope) - when `coverage_gaps` arrays are non-empty; - Triggers on: "add missing data-cy", "instrument templates", "fix data-cy gaps", - "add testids", "data-cy audit", "instrument angular templates", "fix locators", - "add data-cy attributes", "add test ids to templates", "fix playwright selectors", - "data-cy-instrument". - Does NOT trigger for: adding or fixing Gherkin scenarios (use living-doc-scenario-creator); generating - or healing PageObjects without instrumentation gaps (use living-doc-pageobject-scan). + Automatically resolve missing `data-cy` attributes in Angular templates and sync PageObjects + to use `getByTestId()`. Angular-first but phases 1, 3, and 5 are framework-agnostic. 
Activates + when coverage_gaps are non-empty, PageObjects carry "⚠️ PROPOSED" locator comments, or + Functionalities have `status: planned` due to missing test IDs. + Triggers on: "add missing data-cy", "instrument templates", "fix data-cy gaps", "add testids", + "data-cy audit", "instrument angular templates", "fix locators", "add data-cy attributes", + "add test ids to templates", "fix playwright selectors due to missing data-cy", "data-cy-instrument". + Does NOT trigger for: adding Gherkin (use living-doc-scenario-creator); PageObject + healing without data-cy gaps (use living-doc-pageobject-scan HEALING). + Pairs with living-doc-pageobject-scan (upstream) and living-doc-scenario-creator (downstream); + invokes living-doc-update for Functionality promotion. license: Apache-2.0 compatibility: GitHub Copilot --- @@ -65,7 +63,7 @@ Build a prioritised gap list before touching any file. 5. Sort by priority P1 → P3. Process in that order. **Skip list — do not attempt to instrument these:** -- Elements inside third-party library internals where the host attribute is confirmed not to be propagated (e.g. `cps-table` inner paginator buttons, `cps-tab` inner `<li role="tab">` when the lib does not forward host attributes). Mark these ⚠️ "needs lib support" and surface them as a library issue to the dev team, not a template change. +- Elements inside third-party library internals where the host attribute is confirmed not to be propagated (e.g. `cps-table` inner paginator buttons, `cps-tab` inner `<li role="tab">` when the lib does not forward host attributes). Mark these ⚠️ "needs lib support" — add a WORK_LOG.md §4 row with status ⚠️, element description, library name and version, and a link to the library's issue tracker if one exists. Do not leave these as silent skips. - Elements that require authenticated roles to render — flag as needing an integration test fixture, not a data-cy change. 
--- @@ -195,6 +193,8 @@ For each Functionality whose `status: planned` was solely due to missing `data-c Only promote if the data-cy attributes required by that Functionality's ACs have all been added in Phase 4. If a Functionality depends on multiple elements and only some were instrumented, leave it as `planned` and add a comment listing the remaining blockers. +After updating the BDD feature file header, also invoke `living-doc-update` to change the matching catalog entity's `status` from `planned` to `active`. The BDD file header and the catalog entity must stay in sync. + --- ## Phase 7 · WORK_LOG Update @@ -248,6 +248,18 @@ Report the following at the end of the run: **Pipeline position:** ``` -living-doc-pageobject-scan → data-cy-instrument → living-doc-scenario-creator -living-doc-pageobject-scan (RE-SCAN) → data-cy-instrument → living-doc-scenario-creator +living-doc-pageobject-scan (or RE-SCAN) → data-cy-instrument + → living-doc-update (promote Functionalities: planned → active) + → living-doc-scenario-creator ``` + +--- + +## Out-of-scope routing + +| Request | Correct skill | +|---|---| +| Add or fix Gherkin scenarios | `living-doc-scenario-creator` | +| Generate or heal PageObjects (no missing data-cy) | `living-doc-pageobject-scan` | +| Fix selector drift from DOM structure changes (no missing data-cy) | `living-doc-pageobject-scan` HEALING scope | +| Deprecate a Functionality entity | `living-doc-update` | diff --git a/skills/gherkin-living-doc-sync/SKILL.md b/skills/gherkin-living-doc-sync/SKILL.md index ded348a..1e93ec8 100644 --- a/skills/gherkin-living-doc-sync/SKILL.md +++ b/skills/gherkin-living-doc-sync/SKILL.md @@ -2,17 +2,17 @@ name: gherkin-living-doc-sync description: > Synchronise Gherkin feature files and BDD scenarios with the living documentation catalog. 
- Activate when scenarios diverge from User Story ACs, step text drifts after a refactor, - `@AC:` tag or `# AC:` comment annotations are missing or stale, descoped ACs need their - linked scenarios updated, or AC changes must propagate from the living doc back to feature - files. Run scan_ac_links.py to audit AC link health before a sync pass. - Distinct from gap-finder (which detects missing coverage) — corrects existing links. + Corrects existing links — distinct from living-doc-gap-finder (which detects missing coverage). + Activate when `@AC:` tags or `# AC:` comments are missing or stale, step text drifts after + a refactor, ACs are descoped, or AC changes must propagate from the living doc to feature files. + Run scan_ac_links.py to audit AC link health before a sync pass. Triggers on: "sync gherkin to living doc", "feature file out of sync", "scenario not linked to AC", "step text changed", "gherkin drift", "BDD sync", "AC link missing in feature file", "sync scenarios", "traceability broken", "propagate AC changes", "AC was descoped". - Does NOT trigger for: writing new scenarios (use living-doc-scenario-creator), implementing step - definitions (use gherkin-step), finding living doc gaps (use living-doc-gap-finder), - creating new US/Feature entities (use living-doc-create-user-story). + Does NOT trigger for: writing new scenarios (use living-doc-scenario-creator); implementing + step definitions (use gherkin-step); finding gaps (use living-doc-gap-finder); + creating entities (use living-doc-create-*). + Pairs with living-doc-update (upstream) and gherkin-step (downstream). license: Apache-2.0 compatibility: GitHub Copilot --- @@ -41,7 +41,7 @@ and `features/functionalities/`) — other feature files are skipped. 
|---|---|---| | New `.feature` file added | Feature file to living doc | Link each scenario to an AC; create AC if missing | | User Story AC modified or added | Living doc to feature file | Update or add the corresponding scenario | -| UI refactored (selector / method renamed) | Step text to PageObject | Update step text; re-link to PageObject method | +| UI refactored (selector / method renamed) | Step text to PageObject | Update step text and `@AC:` tag if scenario intent changed; for the PageObject side of the rename (method signature or locator), load `living-doc-pageobject-scan` HEALING scope — this skill owns only the Gherkin step text, not the PageObject code | | US deprecated | Living doc to feature file | Emit one sync action per linked scenario; add `@deprecated`, record the reason, and flag `@review-needed` | | Scenario added without an `@AC:` tag | Feature file to living doc | Propose an AC and add the `@AC:` tag | @@ -49,6 +49,8 @@ and `features/functionalities/`) — other feature files are skipped. ## Step 2 — Audit `@AC:` traceability tags +> **Authoritative source:** The `@AC:` format is defined in `living-doc-scenario-creator`. The spec below is a reference copy for sync validation — load `living-doc-scenario-creator` for the canonical definition. + **Required traceability format** for living-doc feature files (from the glossary): ```gherkin @@ -111,6 +113,10 @@ DRIFT DETECTED: checkout.feature:17 OR update the step definition regex to match the new wording ``` +> **Scope boundary with `living-doc-pageobject-scan` HEALING:** This step corrects step text in `.feature` files and step definition pattern strings. If the underlying PageObject selector or method signature drifted (renamed in the DOM or PageObject class), use `living-doc-pageobject-scan` HEALING mode to fix the PageObject class first, then re-run this sync to align feature files. 
+> +> **Step definition code changes:** When a step definition regex pattern must be updated (not just the feature file wording), load `gherkin-step` to apply the code change correctly. + --- ## Step 4 — Apply sync changes diff --git a/skills/gherkin-step/SKILL.md b/skills/gherkin-step/SKILL.md index 931c5ef..48d2e88 100644 --- a/skills/gherkin-step/SKILL.md +++ b/skills/gherkin-step/SKILL.md @@ -1,19 +1,18 @@ --- name: gherkin-step description: > - Implementing Gherkin step definitions that are clean, reusable, and maintainable. Activate when + Implement Gherkin step definitions that are clean, reusable, and maintainable. Activate when writing or reviewing step definition code, binding Gherkin text to automation, managing shared state between steps, configuring parameter types, parsing DataTable or DocString arguments, or - setting up Before/After hooks. Covers Python behave, Cucumber for Java and TypeScript, and - Cucumber-Scala idioms. + setting up Before/After hooks. Covers Python behave, Cucumber TypeScript/Java, and Cucumber-Scala. Triggers on: "step definitions", "implement Gherkin steps", "Cucumber step", "behave step", "parameter type", "DataTable", "DocString", "Before hook", "After hook", "World object", "step context", "step state sharing", "how to share state between steps", "register step definition", "hook setup". - Does NOT trigger for: writing Gherkin scenarios (use living-doc-scenario-creator), writing unit tests - (no skill in this toolkit covers unit test authoring — use your project's test framework - directly). - Pairs with living-doc-scenario-creator. + Does NOT trigger for: writing Gherkin scenarios (use living-doc-scenario-creator); writing + unit tests (use your project's test framework). + Pairs with living-doc-scenario-creator and living-doc-pageobject-scan (PageObjects must + exist before step definitions reference them). 
license: Apache-2.0 compatibility: GitHub Copilot --- @@ -22,6 +21,8 @@ compatibility: GitHub Copilot > **Glossary:** Feature, PageObject, Functionality — see [living-doc-glossary](../references/living-doc-glossary.md) ([remote](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md)). +> **Framework scope:** This skill covers step definition idioms for **Python behave**, **Cucumber TypeScript**, **Cucumber Java**, and **Cucumber-Scala**. The PageObject ecosystem in this toolkit uses **Playwright + TypeScript** — Python or Java projects must adapt PageObject patterns to their own test framework. All BDD principles (thin steps, no selectors in steps, context object) apply regardless of language. + ## Respect the boundary with Gherkin text If the user asks to write or review a **Gherkin scenario / feature file**, do not draft the @@ -32,8 +33,9 @@ scenario here. Explain that this skill covers **step definition code** only, the ## Context initialization — how PageObjects reach steps -Step definitions receive a fresh `context` object each scenario. PageObjects must be attached to -`context` in a `before_scenario` hook (or a preceding `Given` step), not inside the step itself. +> **Prerequisite:** PageObject classes must exist before step definitions can reference them. If PageObjects have not yet been generated for the screens under test, use `living-doc-pageobject-scan` first to produce them. + +**Python behave:** Step definitions receive a fresh `context` object each scenario. Attach PageObjects in a `before_scenario` hook. ```python # ✅ — Before hook initialises the PageObject once per scenario @@ -42,14 +44,43 @@ def setup_pages(context): context.checkout_page = CheckoutPage(context.browser.new_page()) ``` -The `When` step then delegates without creating or managing the PageObject: +**Cucumber TypeScript (Playwright):** Use a typed `World` class registered with `setWorldConstructor`. 
-```python -@when('the customer confirms the order') -def step_confirm_order(context): - context.checkout_page.confirm_order() # relies on before_scenario having run +```typescript +// world.ts +import { setWorldConstructor, World, IWorldOptions } from '@cucumber/cucumber'; +import { Browser, Page } from '@playwright/test'; +import { CheckoutPage } from './pages/checkout.page'; + +export interface AppWorld extends World { + browser: Browser; + page: Page; + checkoutPage: CheckoutPage; +} + +class AppWorldImpl extends World implements AppWorld { + browser!: Browser; + page!: Page; + checkoutPage!: CheckoutPage; + constructor(options: IWorldOptions) { super(options); } +} +setWorldConstructor(AppWorldImpl); ``` +```typescript +// hooks.ts +Before(async function (this: AppWorld) { + this.browser = await chromium.launch(); + this.page = await this.browser.newPage(); + this.checkoutPage = new CheckoutPage(this.page); +}); +``` + +**Step definition file naming:** +- One file per domain area: `checkout.steps.ts` / `checkout_steps.py` +- Place under `playwright/steps/` (TS) or `features/steps/` (Python) +- Never name a file `steps.ts` or `steps.py` — the name must identify the domain + --- ## Function naming convention @@ -181,3 +212,14 @@ def teardown_database(context): if "database" in context.tags: context.db.teardown() ``` + +--- + +## Out-of-scope routing + +| Request | Correct skill | +|---|---| +| Write or review Gherkin scenarios / feature files | `living-doc-scenario-creator` | +| Generate or update PageObject classes | `living-doc-pageobject-scan` | +| Sync `@AC:` traceability tags in feature files | `gherkin-living-doc-sync` | +| Write unit tests | Use your project's test framework directly | diff --git a/skills/living-doc-create-feature/SKILL.md b/skills/living-doc-create-feature/SKILL.md index ff29793..3e2c2b4 100644 --- a/skills/living-doc-create-feature/SKILL.md +++ b/skills/living-doc-create-feature/SKILL.md @@ -3,16 +3,17 @@ name: living-doc-create-feature 
description: > Define a system surface (UI screen, API endpoint, service, or module) as a Feature entity, enabling impact analysis and traceability in the living documentation. Use when documenting - a new screen, API endpoint, service, or module; maintaining a Feature Registry; mapping - system surfaces to User Stories; resolving Feature naming conflicts or duplicate entries; - or bootstrapping the structural layer between User Stories and atomic behaviors. + a new screen, API, service, or module; mapping surfaces to User Stories; or resolving + Feature naming conflicts. Triggers on: "document a new feature", "create a feature entity", "new screen documentation", "document an API endpoint", "feature registry", "what feature owns this", "map user story to feature", "system surface documentation", "feature owners", "feature dependencies", "duplicate feature name", "resolve feature naming". - Does NOT trigger for: creating User Stories (use living-doc-create-user-story), defining - atomic behaviors (use living-doc-create-functionality), scanning PageObjects (use - living-doc-pageobject-scan), deprecating entities (use living-doc-update). + Does NOT trigger for: creating User Stories (use living-doc-create-user-story); defining + behaviors (use living-doc-create-functionality); scanning PageObjects (use + living-doc-pageobject-scan); deprecating (use living-doc-update). + Pairs with living-doc-create-functionality and living-doc-create-user-story. + After creating, add a feature_registry entry for living-doc-impact-analysis. 
license: Apache-2.0 compatibility: GitHub Copilot --- @@ -128,3 +129,14 @@ If `user_stories` is `[]`, repeat the orphan warning from Step 3 outside the JSO |---|---| | Creating a User Story | **living-doc-create-user-story** | | Defining an atomic behavior (Functionality) | **living-doc-create-functionality** | + +## Next steps after creation + +| Action | Skill | +|---|---| +| Define atomic behaviors for this Feature | **living-doc-create-functionality** | +| Link to an existing User Story | **living-doc-update** (add Feature to the User Story's `features` list) | +| Generate BDD PageObjects for a UI Feature | **living-doc-pageobject-scan** | +| Update feature_registry for impact traceability | **living-doc-impact-analysis** (see Feature registry format in that skill) | + +> **Renaming a Feature:** Changing a Feature's `id` or `name` requires cascading updates. Load `living-doc-update` and follow the "Rename a Feature" workflow there, which covers: Functionality `feature_id` fields, `feature_registry` entry, `manifest.json`, `seed.yaml`, PageObject file headers, and Gherkin feature file `# Feature:` headers. diff --git a/skills/living-doc-create-functionality/SKILL.md b/skills/living-doc-create-functionality/SKILL.md index 6012088..5c411c3 100644 --- a/skills/living-doc-create-functionality/SKILL.md +++ b/skills/living-doc-create-functionality/SKILL.md @@ -1,19 +1,19 @@ --- name: living-doc-create-functionality description: > - Define an atomic, testable behavior (Functionality) with Functionality-level Acceptance Criteria - designed to be validated by fast unit or integration tests. Activate when documenting an atomic - behavior, component function, or business rule; writing Functionality-level AC; creating the - granular test anchor for a Feature; choosing test_type (unit vs integration); identifying reuse - candidates across User Stories; linking a Functionality to its parent Feature; or reviewing a - Functionality for completeness. 
+ Define an atomic, testable behavior (Functionality) with Acceptance Criteria for unit or + integration tests. Use when documenting an atomic behavior, writing Functionality-level ACs, + choosing test_type, identifying reuse candidates, or reviewing a Functionality. Triggers on: "create a functionality", "document an atomic behavior", "functionality AC", "unit-testable behavior", "define component behavior", "atomic acceptance criteria", "document a business rule", "create a functionality entity", "functionality acceptance criteria", "test_type", "unit vs integration test", "choose test type", "link functionality to feature". - Does NOT trigger for: end-to-end User Stories (use living-doc-create-user-story), system - surface documentation (use living-doc-create-feature), generating BDD scenarios for a - Functionality (use living-doc-scenario-creator). + Does NOT trigger for: E2E User Stories (use living-doc-create-user-story); system + surfaces (use living-doc-create-feature); generating BDD scenarios (use + living-doc-scenario-creator). + Pairs with living-doc-create-feature (parent surface first) and living-doc-scenario-creator + (BDD after). After creating, update the parent Feature's functionalities[] array + (else ORPHAN_FUNCTIONALITY gap). license: Apache-2.0 compatibility: GitHub Copilot --- @@ -87,6 +87,8 @@ For Blocker or Important findings, propose a split into smaller Functionalities Before creating, check whether an identical behavior already exists under any Feature. **Compare ACs, not names** — the same verb phrase in a different Feature context often produces a legitimately different contract. +> **Scope note:** This step is a lightweight in-session check during creation. For a full cross-catalog duplicate and coverage audit across all existing Functionalities, use `living-doc-gap-finder` instead. + If the ACs are identical or near-identical across Features or User Stories, prefer **one shared Functionality**. 
Link every consuming User Story in the `user_stories` array instead of duplicating the ACs. > "This is a reuse candidate. If the contract is truly identical, keep one Functionality and link both User Stories to it. Duplicating the same AC in multiple places creates maintenance burden and raises the risk of divergence when the behavior changes." @@ -131,6 +133,10 @@ Rules: - Every acceptance criterion must state an exact outcome; error cases must include the explicit error code. - `test_coverage` must cover every AC and record `unit` or `integration` consistently with Step 3. +> **Promoting `planned` → `active`:** A Functionality is created with `status: "planned"`. Once the tests backing all its ACs are written and passing, use `living-doc-update` to change the status to `active`. Do not mark a Functionality `active` until its test coverage is in place. + +> **Parent Feature sync:** After saving this entity, load `living-doc-update` and append this `FUNC-<id>` to the parent Feature's `"functionalities"` array. An unlinked Functionality will be flagged as `ORPHAN_FUNCTIONALITY` by `living-doc-gap-finder`. + ## Distinguishing Functionality ACs from User Story ACs | Dimension | User Story AC | Functionality AC | @@ -156,7 +162,7 @@ redirect to `living-doc-create-user-story`. | Two Functionalities have identical or near-identical ACs | Duplicate ACs create a maintenance burden. Consolidate into one shared Functionality and link all related `user_stories`. | | Functionality has no parent Feature | A Functionality without a parent Feature is untraceable — create or identify the parent Feature first. 
| -## Out-of-scope redirects +## Out-of-scope routing | Request type | Correct skill | |---|---| diff --git a/skills/living-doc-create-user-story/SKILL.md b/skills/living-doc-create-user-story/SKILL.md index 7754fb4..3c22e1a 100644 --- a/skills/living-doc-create-user-story/SKILL.md +++ b/skills/living-doc-create-user-story/SKILL.md @@ -2,18 +2,17 @@ name: living-doc-create-user-story description: > Guide the creation of a well-formed User Story (US) with business-level Acceptance Criteria - that are traceable, testable, and E2E-ready. Use when creating a new User Story for a - business capability, eliciting As-a/I-can/so-that narratives, defining US-level Acceptance - Criteria, validating User Story narrative structure (checking As-a role, I-want clause, or - AC wording quality), or reviewing US completeness before scenario creation. + that are traceable, testable, and E2E-ready. Use when creating a new User Story, eliciting + As-a/I-can/so-that narratives, defining US-level ACs, validating US narrative structure, + or reviewing US completeness before scenario creation. Triggers on: "create a user story", "new user story for", "write acceptance criteria for", "document a business requirement", "define US AC", "user story template", "as a user I want", "elicit requirements", "AC for user story", "US acceptance criteria", "review this user story", "is my narrative well-formed", "I-want clause". - Does NOT trigger for: atomic component behaviors (use living-doc-create-functionality), - documenting system surfaces (use living-doc-create-feature), generating BDD scenarios - (use living-doc-scenario-creator). Pairs with living-doc-create-functionality and - living-doc-scenario-creator. + Does NOT trigger for: atomic behaviors (use living-doc-create-functionality); system surfaces + (use living-doc-create-feature); generating BDD scenarios (use living-doc-scenario-creator). 
+ Pairs with living-doc-create-feature, living-doc-create-functionality, and + living-doc-scenario-creator (generate scenarios after the US is active). license: Apache-2.0 compatibility: GitHub Copilot --- @@ -57,6 +56,8 @@ A Feature is a named system surface (UI screen or API endpoint group). If the Fe has not yet been created as a living doc entity, note it as `[NEW: <name>]` and suggest creating it with `living-doc-create-feature` after completing the User Story. +Also ask: *Are there existing Functionalities this User Story relies on?* If yes, link them in the `functionalities` array. This prevents `ORPHAN_FUNCTIONALITY` gaps and makes the entity graph traversable from US down to test coverage. + ## Step 3 — Elicit Acceptance Criteria Each AC must be: @@ -133,6 +134,8 @@ Rules: - Every AC object must have `id` in `AC:US-<nnn>-<nn>` format and a plain-language `description` - Write AC descriptions in plain language — no structured language keywords in JSON values +> **Next steps after creation:** The User Story is created with `status: "planned"`. When all ACs are finalised and at least one Feature is linked, use `living-doc-update` to promote it to `active`. After promotion, use `living-doc-scenario-creator` to generate BDD feature files for each `ACTIVE` AC. + ## Anti-patterns to flag | Anti-pattern | Warning | @@ -145,4 +148,15 @@ Rules: | AC uses `{placeholder}` for a single value | Placeholder syntax is only justified when two or more values vary. If only one value applies, write it inline. Example: instead of `{error type}: inline validation message`, write `an inline validation message is shown`. | | AC describes a non-observable outcome | e.g. “a background job processes the record” — the user cannot observe this. Restate as the observable signal (e.g. “the confirmation email arrives within 60 seconds”), or redirect the behavior to a Functionality entity if it is purely technical. 
| | AC identifier does not follow `AC:US-<nnn>-<nn>` | Every acceptance criterion in the JSON output needs a stable `AC:US-<nnn>-<nn>` id so it can be referenced unambiguously. | -| AC behavior already documented in another User Story | Duplicate ACs create a maintenance burden — any change must be applied in every copy. Extract the shared behavior into a Functionality entity and link both User Stories to it. | \ No newline at end of file +| AC behavior already documented in another User Story | Duplicate ACs create a maintenance burden — any change must be applied in every copy. Extract the shared behavior into a Functionality entity and link both User Stories to it. | + +--- + +## Out-of-scope routing + +| Request | Correct skill | +|---|---| +| Document an atomic behavior or business rule | `living-doc-create-functionality` | +| Document a system surface (screen, API) | `living-doc-create-feature` | +| Generate BDD scenarios for User Story ACs | `living-doc-scenario-creator` | +| Update or deprecate an existing User Story | `living-doc-update` | \ No newline at end of file diff --git a/skills/living-doc-gap-finder/SKILL.md b/skills/living-doc-gap-finder/SKILL.md index f028664..82129b4 100644 --- a/skills/living-doc-gap-finder/SKILL.md +++ b/skills/living-doc-gap-finder/SKILL.md @@ -1,18 +1,17 @@ --- name: living-doc-gap-finder description: > - Identify gaps in the living documentation by combining bottom-up UI/code exploration with - top-down requirement checking. Activate when auditing living doc completeness, finding - undocumented behaviors, discovering orphan tests with no AC link, orphan Functionalities with - no parent Feature, detecting untested ACs, producing a documentation coverage gap report - (including batch runs for large suites), or proposing new living doc entities to fill - identified gaps. + Identify gaps in the living documentation by combining bottom-up and top-down analysis. 
+ Use when auditing living doc completeness, finding undocumented behaviors, orphan tests, + orphan Functionalities, untested ACs, or producing a documentation coverage gap report. + Proposes actions executed by living-doc-create-*, living-doc-scenario-creator, and + living-doc-update. Re-run after entity creation or status changes to confirm gaps are closed. Triggers on: "find what's not documented", "living doc gaps", "what's missing in living doc", "find undocumented features", "orphan tests", "orphan functionalities", "untested AC", "documentation coverage", "gap report", "what's not covered", "living doc audit", "documentation audit". Does NOT trigger for: creating new living doc objects (use living-doc-create-* skills). - Delegates to: living-doc-pageobject-scan, living-doc-scenario-creator, and all create-* skills. + Pairs with living-doc-update (stale references) and living-doc-create-* skills (gap resolution). license: Apache-2.0 compatibility: GitHub Copilot --- @@ -49,6 +48,17 @@ Before presenting the final report, normalise the script output against the taxo --- +## Mode names + +| Mode | When to use | +|---|---| +| **AUDIT mode** | Full catalog audit — runs the 9-type taxonomy top-down across all entities. Use after a sprint with entity changes or when the living doc hasn’t been reviewed recently. | +| **PLAN mode** | Bootstrap new coverage — draft ACs from PageObject descriptions or discovered UI surfaces (bottom-up). Produces `PLANNED`-state AC drafts for user confirmation before creating entities. | + +Both modes use `compute_gaps.py` and the same gap taxonomy. AUDIT mode spans the full catalog; PLAN mode is scoped to the surfaces being bootstrapped. 
+ +--- + ## Gap taxonomy Nine types of gaps are detected, in order of risk: @@ -65,6 +75,8 @@ Nine types of gaps are detected, in order of risk: | 8 — Nit | **Undocumented Functionality** | A Functionality entity exists with no associated tests | | 9 — Nit | **Empty Feature** | A Feature entity exists with no Functionalities defined | +> **Resolution routing:** `UNTESTED_AC` → `living-doc-scenario-creator`; `UNDOCUMENTED_SURFACE` / `ORPHAN_FUNCTIONALITY` / `EMPTY_FEATURE` → `living-doc-create-*`; `ORPHAN_FEATURE` / `ORPHAN_USER_STORY` → `living-doc-update` (add missing link); `ORPHAN_TEST` → `gherkin-living-doc-sync`; **`STALE_REFERENCE`** → `living-doc-update` (deprecate the AC or update the test `@AC:` tag); `UNDOCUMENTED_FUNCTIONALITY` → `living-doc-scenario-creator`. + ## Workflow ### Step 1 — Bottom-up scan @@ -171,14 +183,21 @@ For each gap, propose the living doc action: | ORPHAN_USER_STORY | Link to an existing Feature, or create the missing Feature — `living-doc-create-feature` | | ORPHAN_FUNCTIONALITY | Link to an existing Feature, or delete if the behavior has no owning surface. Do not delete if tests reference this Functionality's ACs — resolve those first (see ORPHAN_TEST). | | ORPHAN_TEST | Link test to an existing AC, or create a Functionality — `living-doc-create-functionality`. **Never delete a test to resolve an orphan — that would silently remove coverage.** If the linked AC ID no longer exists (broken link), choose from: (1) recreate the AC/Functionality if the behavior is still required; (2) update the link to the merged AC ID if the entity was merged; (3) delete the test only after product owner confirmation that the behavior has been intentionally removed. | -| STALE_REFERENCE | Update the test to reference the active replacement AC. If the deprecated behavior was intentionally removed, delete the test after product owner confirmation. If removed in error, reinstate the AC using `living-doc-update`. 
| +| STALE_REFERENCE | Use `living-doc-update` to manage the AC state first: reinstate the AC if the deprecation was in error, or confirm the deprecation is intentional. Then update the test to reference the active replacement AC, or delete the test after product owner confirmation if the behavior has been intentionally retired. | | UNDOCUMENTED_FUNCTIONALITY | Create unit/integration tests for the Functionality's ACs | | EMPTY_FEATURE | Create Functionalities for the Feature's known behaviors — `living-doc-create-functionality` | -> **Out-of-scope actions:** living-doc-gap-finder identifies and proposes new entities — it does -> not create them. Direct creation requests (e.g. "create a User Story", "create a Feature") must -> be delegated to the appropriate skill: `living-doc-create-user-story`, `living-doc-create-feature`, -> or `living-doc-create-functionality`. +## Out-of-scope routing + +| Request | Correct skill | +|---|---| +| Create a User Story | `living-doc-create-user-story` | +| Create a Feature | `living-doc-create-feature` | +| Create a Functionality | `living-doc-create-functionality` | +| Update or deprecate an entity / AC | `living-doc-update` | +| Generate BDD scenarios | `living-doc-scenario-creator` | + +Living-doc-gap-finder identifies and proposes — it does not create or edit entities. ### Step 6 — Output gap report diff --git a/skills/living-doc-impact-analysis/SKILL.md b/skills/living-doc-impact-analysis/SKILL.md index 7428951..6157478 100644 --- a/skills/living-doc-impact-analysis/SKILL.md +++ b/skills/living-doc-impact-analysis/SKILL.md @@ -1,20 +1,19 @@ --- name: living-doc-impact-analysis description: > - Analyse the impact of a code change on the living documentation. Given a PR diff, - modified module, or changed API contract, trace which Features, Functionalities, and User Stories - are affected. Output an impact map that identifies what must be reviewed, - updated, or re-tested. 
Activate when a PR touches business logic and you need to know what - living doc entities are affected, when a service module is refactored, or when breaking API - changes need living doc coverage traced. + Analyse the impact of a code change on the living documentation. Given a PR diff, modified + module, or changed API contract, trace affected Features, Functionalities, and User Stories. + Output an impact map identifying what must be reviewed, updated, or re-tested. Activate when + a PR touches business logic, a service module is refactored, or breaking API changes need + living doc coverage traced. Triggers on: "living doc impact", "what does this change affect", "impact of PR on living doc", "trace affected user stories", "affected features", "impact analysis", "living doc sign-off", "what user stories are affected", "which scenarios need re-running", "what needs re-testing", "PR impact on docs". - Does NOT trigger for: updating living doc (use living-doc-update), finding coverage gaps - (use living-doc-gap-finder), creating new entities (use living-doc-create-* skills). - Pairs with gherkin-living-doc-sync — high-impact AC changes identified here cascade to - gherkin-living-doc-sync for feature file propagation. + Does NOT trigger for: updating living doc (use living-doc-update); finding coverage gaps + (use living-doc-gap-finder); creating new entities (use living-doc-create-*). + Pairs with living-doc-update (apply changes), gherkin-living-doc-sync (propagate AC changes), + and bdd-maintain (cleanup for deprecated entities). license: Apache-2.0 compatibility: GitHub Copilot --- @@ -45,6 +44,16 @@ Feature registry format (add to your catalog JSON): } ``` +**Bootstrapping `feature_registry`:** If no registry exists, follow these steps: +1. Run `living-doc-gap-finder` to list all Feature entities and their IDs. +2. 
For each Feature, manually map its canonical source directory to its ID: + - Angular: `"paths": ["src/app/pages/checkout/**"]` mirrors the module directory under `src/app/`. + - Java/Spring: `"paths": ["src/main/java/com/example/checkout/**"]` uses the package path. +3. Add each mapping as `{ "feature_id": "FEAT-<id>", "paths": ["<glob>"] }` under `"feature_registry"` in `catalog.json`. +4. Re-run `trace_impact.py` to verify mappings resolve correctly against a known changed file. + +Maintain the registry whenever a Feature is created, renamed, or its source directory moves. The `living-doc-create-feature` and `living-doc-update` "Rename a Feature" workflows include a reminder for this step. + The script handles Steps 1–2 (file classification and entity traversal). Use its output JSON to drive Steps 3–5 (impact classification, impact map narrative, and sign-off checklist). @@ -170,7 +179,9 @@ Produce this checklist as a PR comment or documentation artefact if requested. > must change, hand off to `living-doc-update` immediately. Pass the exact entity ID(s) and the > recommended change from Step 4's recommended actions list. This skill analyses — it does not > edit entities. If any High-impact ACs were subsequently modified or deprecated, also invoke -> `gherkin-living-doc-sync` to propagate the changes to linked feature files. +> `gherkin-living-doc-sync` to propagate the changes to linked feature files. If the change +> revealed that a Feature or Functionality has been fully deprecated with active BDD coverage, +> also invoke `bdd-maintain` REMOVE mode to clean up the associated automation files. ## Code-level impact report format @@ -197,9 +208,10 @@ Do not include speculative changes beyond the described scope. 
| Changed domain logic with no Feature entity defined in the living doc | Missing living doc coverage — flag as a **High-impact gap** and recommend creating documentation with `living-doc-create-functionality` | | Impact analysis only covers unit/integration tests, not E2E scenarios | Incomplete impact — flag for test-e2e-standards review | -## Out-of-scope redirects +## Out-of-scope routing | Request type | Correct skill | |---|---| | "Update a living doc entity / add a new AC" | `living-doc-update` — this skill analyses impact, it does not edit entities | | "Which Functionalities have no User Stories / find coverage gaps" | `living-doc-gap-finder` — gap discovery is a separate concern | +| "Clean up BDD files for a deprecated feature" | `bdd-maintain` — deletes automation artifacts for removed entities | diff --git a/skills/living-doc-pageobject-scan/SKILL.md b/skills/living-doc-pageobject-scan/SKILL.md index 5bdfe62..f4b28e3 100644 --- a/skills/living-doc-pageobject-scan/SKILL.md +++ b/skills/living-doc-pageobject-scan/SKILL.md @@ -1,20 +1,19 @@ --- name: living-doc-pageobject-scan description: > - Discover, create, and maintain PageObject classes — entry point for all webapp exploration - and BDD-driven UI testing. Covers seed.yaml assembly (Sources A–E), iterative MCP Playwright - crawl, entity harvesting, ExplorationFixture sourcing, PageObject generation, Functionality - stubs, and manifest.json output. Two Maintain scopes: RE-SCAN (full manifest refresh after - UI changes, with active new-route discovery) and HEALING (fix selector drift in failing - tests only). Use for first-time scans, re-scanning after UI changes, or healing failing tests. 
- Triggers on: "scan this webapp", "generate pageobjects", "update pageobjects", "crawl the UI", - "explore the app", "discover routes", "seed.yaml", "manifest.json", "first scan", - "create page objects", "pageobject drift", "bootstrap page objects", "re-scan", - "refresh manifest", "heal pageobjects", "fix failing tests", "selector drift", - "tests are failing". + Discover, create, and maintain PageObject classes for webapp exploration. + Covers seed.yaml assembly, MCP Playwright crawl, entity harvesting, PageObject generation, + Functionality stubs, and manifest.json output. + Three scopes: CREATE (first scan), RE-SCAN (full manifest refresh after UI changes), + HEALING (fix selector drift in failing tests only). + Triggers on: "scan this webapp", "generate pageobjects", "crawl the UI", "explore the app", + "discover routes", "seed.yaml", "manifest.json", "first scan", "create page objects", + "pageobject drift", "re-scan", "refresh manifest", "heal pageobjects", "fix failing tests", + "selector drift", "tests are failing". Does NOT trigger for: adding/fixing Gherkin (use living-doc-scenario-creator); resolving - missing data-cy attributes (use data-cy-instrument); deleting deprecated BDD files - (use bdd-maintain). + missing data-cy (use data-cy-instrument); deleting deprecated BDD files (use bdd-maintain). + Pairs with data-cy-instrument, living-doc-create-feature, living-doc-scenario-creator, + and gherkin-living-doc-sync. license: Apache-2.0 compatibility: GitHub Copilot --- @@ -193,6 +192,12 @@ For each discovered behavior, propose a stub named `<Feature name> – <behavior Output to `features/functionalities/<feat-kebab>/func-<kebab>.feature` with `@FUNC_ID:FUNC-UNKNOWN`. Promote via `living-doc-create-functionality` when IDs are assigned. +**Post-Create pipeline:** +- Non-empty `coverage_gaps` in the manifest → trigger `data-cy-instrument` to add missing `data-cy` attributes. 
+- PageObjects with `FEAT-UNKNOWN` placeholders → create Feature entities using `living-doc-create-feature`. +- Functionality stubs with `FUNC-UNKNOWN` → register Functionalities using `living-doc-create-functionality`. +- New surfaces with no Gherkin coverage → use `living-doc-scenario-creator` to generate scenarios. + --- ## Guided Traversal Protocol (Source E) @@ -277,6 +282,11 @@ Generated: <ISO> | Scope: <full|healing|scoped> After confirming changes: set `last_scanned`, update `elements` and `coverage_gaps`, update `navigation_context`. Add new surfaces; mark removed surfaces as `deprecated`. Generate new scenarios for newly discovered ACs (load `living-doc-scenario-creator`). +**Post-RE-SCAN pipeline:** +- Non-empty `coverage_gaps` → trigger `data-cy-instrument` to add missing `data-cy` attributes. +- New routes without PageObjects → continue in Create mode for those surfaces. +- Deprecated surfaces → trigger `bdd-maintain` REMOVE mode to clean up associated automation files. + --- ### HEALING scope @@ -289,6 +299,8 @@ After confirming changes: set `last_scanned`, update `elements` and `coverage_ga 4. Verify the step definition binding still resolves; fix if broken. 5. Re-run only the previously failing tests to confirm healing. Do not re-run the full suite. +> **Scope boundary with `gherkin-living-doc-sync`:** HEALING mode fixes selector drift in PageObject classes and step definition bindings. It does not resync `@AC:` traceability tags or correct scenario wording in `.feature` files. If healing reveals that feature file step text also drifted, trigger `gherkin-living-doc-sync` to realign the feature files. 
+ --- ## Manifest schema @@ -342,7 +354,7 @@ After confirming changes: set `last_scanned`, update `elements` and `coverage_ga --- -## Out-of-scope redirects +## Out-of-scope routing | Request | Correct skill | |---|---| diff --git a/skills/living-doc-scenario-creator/SKILL.md b/skills/living-doc-scenario-creator/SKILL.md index 2fb0672..e0a2b22 100644 --- a/skills/living-doc-scenario-creator/SKILL.md +++ b/skills/living-doc-scenario-creator/SKILL.md @@ -2,18 +2,16 @@ name: living-doc-scenario-creator description: > Generate Gherkin scenarios and living-doc feature files from User Story and Functionality ACs. - Covers: full feature file output (header block, @AC:-tagged scenarios, complete Given/When/Then - bodies), standalone Gherkin without an entity, GWT correctness, ubiquitous language rules, - one-behaviour-per-scenario, Scenario Outline, Background, anti-patterns, @AC: traceability - annotations (authoritative format), AC coverage report, gap detection via living-doc-gap-finder, - and step definition resolution against PageObjects. + Covers full feature file output (@AC:-tagged scenarios, GWT bodies), Scenario Outline, + Background, AC coverage report, anti-pattern detection, and step definition resolution. + Two modes: entity (from US/FUNC) and standalone. Triggers on: "write a Gherkin scenario", "BDD scenario", "standalone feature file", "Given When Then", "Scenario Outline", "BDD anti-patterns", "review my feature file", "BDD scenarios for", "convert acceptance criteria to Gherkin", "exploratory scenario", "feature file header for user story", "living-doc feature file", "bootstrap feature file for US", "cover AC with scenarios", "scenario coverage for US", "map AC to scenarios", "scenario creator". - Does NOT trigger for: implementing step definitions (use gherkin-step), writing unit tests. - Pairs with living-doc-create-user-story and living-doc-pageobject-scan. + Does NOT trigger for: implementing step definitions (use gherkin-step); writing unit tests. 
+ Pairs with living-doc-create-user-story, living-doc-pageobject-scan, and gherkin-step. license: Apache-2.0 compatibility: GitHub Copilot --- @@ -47,10 +45,19 @@ Load the User Story or Functionality. Confirm: If no ACs are `ACTIVE`, do not generate empty scenarios. Output a coverage report with state-specific skip reasons (`PLANNED`: `skipped — not yet active`, `DEPRECATED`: `skipped — deprecated AC`) and advise the user to re-run when an AC becomes `ACTIVE`. -### Step 2 — Gap detection +### Step 2 — Gap detection and merge policy An AC is uncovered if no `.feature` file carries `@AC:<id>`. Use `living-doc-gap-finder` (bottom-up mode) to identify `ACTIVE` ACs with no linked scenario before writing new files. +**If a scenario already exists for an AC**, apply this policy: + +| Existing scenario state | Action | +|---|---| +| Matches AC intent; GWT correct | **Skip** — record `already covered` in the coverage report | +| Step text stale or AC description changed | **Update** — rewrite GWT in-place; keep `@AC:` tag and title stable | +| Tagged `@deprecated` or `@review-needed` | **Propose replacement** — draft new scenario; confirm with user before overwriting | +| Multiple scenarios for the same AC | **Flag** — list them; ask user: valid aspect split or consolidate? | + ### Step 3 — Generate feature file For each `ACTIVE` AC, output `# AC:` comment, `@AC:` tag, `Scenario:` title, and full Given/When/Then step bodies. 
@@ -60,7 +67,7 @@ For each `ACTIVE` AC, output `# AC:` comment, `@AC:` tag, `Scenario:` title, and - `error` → `Scenario: <US title> — <error condition>` - `alternative` → `Scenario: <US title> — <alternative path>` -**Traceability format** (authoritative): +**Traceability format** (authoritative — `gherkin-living-doc-sync` validates against this definition): ```gherkin # AC:US-1-01 (v1.0.0 - ACTIVE) — customer places an order with a saved payment method @@ -250,11 +257,12 @@ Output all generated Gherkin in a single fenced `gherkin` code block starting wi --- -## Out-of-scope redirects +## Out-of-scope routing | Request | Correct skill | |---|---| | Implementing step definition code | `gherkin-step` | | Writing unit tests | Use your project's test framework directly | +| Syncing `@AC:` tags and traceability in existing feature files | `gherkin-living-doc-sync` | diff --git a/skills/living-doc-update/SKILL.md b/skills/living-doc-update/SKILL.md index ac58b0f..6d65323 100644 --- a/skills/living-doc-update/SKILL.md +++ b/skills/living-doc-update/SKILL.md @@ -12,6 +12,7 @@ description: > "change status of user story", "update feature registry". Does NOT trigger for: creating new entities (use living-doc-create-*), finding gaps (use living-doc-gap-finder), generating scenarios (use living-doc-scenario-creator). + Pairs with gherkin-living-doc-sync (propagate AC changes) and bdd-maintain (cleanup after deprecation). license: Apache-2.0 compatibility: GitHub Copilot --- @@ -53,6 +54,18 @@ When modifying an existing AC **keep the AC ID stable** — changing the ID brea to linked tests. Only update the `description`, `given`, `when`, `then`, or state fields. If the changed AC text affects linked tests, flag them for update. +## Promote a Functionality from planned to active + +A Functionality is ready to move from `planned` to `active` when all its ACs have passing tests. 
+ +| Check | Requirement | +|---|---| +| `test_coverage` entries present | Every AC has a `test_type` and `justification` | +| Tests passing | All referenced unit/integration tests pass in CI | +| No `FUNC-UNKNOWN` placeholder | Functionality has a stable registered ID | + +After promoting a Functionality to `active`, run `living-doc-gap-finder` to confirm no `UNDOCUMENTED_FUNCTIONALITY` gaps remain. + ## Promote a User Story from planned to active Invariants that must hold before setting `status: active`: @@ -68,6 +81,8 @@ Warn if any invariant fails: > "User Story US-042 cannot be promoted from 'planned' to 'active': no error-path AC exists. Add at least one > AC for a failure or edge case before promoting." +After promoting a User Story to `active`, trigger `living-doc-scenario-creator` to generate BDD feature files for each `ACTIVE` AC if they do not yet exist. + ## Deprecate a Feature or Functionality Use this workflow when code backing an entity is deleted or a business capability is retired. @@ -88,6 +103,20 @@ Rules: - Flag any tests linked to the deprecated entity for update or removal - If the deprecated entity has `ACTIVE` ACs with linked Gherkin scenarios, trigger `gherkin-living-doc-sync` to propagate `@deprecated` and `@review-needed` tags to those scenarios +- After `gherkin-living-doc-sync` has tagged the deprecated scenarios, trigger `bdd-maintain` + REMOVE mode if the automation files for this entity should be deleted from the repository + +## Rename a Feature + +Changing a Feature's `id` or `name` requires these cascading updates: + +1. Update the Feature entity (`id`, `name`, and any self-referencing fields). +2. Update `feature_id` in every Functionality linked to this Feature. +3. Update the `feature_registry` entry in `catalog.json` (change the `feature_id` key and any path comments). +4. Search `manifest.json` and `seed.yaml` for the old name or ID and update. +5. Search PageObject file headers for the old Feature reference and update. +6. 
If Gherkin feature files have a `# Feature:` header with the old name, update those headers. +7. Run `living-doc-gap-finder` to confirm no `ORPHAN_FUNCTIONALITY` gaps remain after the rename. ## Update Feature ownership or dependencies @@ -111,7 +140,7 @@ AC:US-042-03 (v1.2.0 – PLANNED) – future_release: sprint-52 ``` -## Routing +## Out-of-scope routing | Request | Correct skill | |---|---| @@ -120,6 +149,7 @@ AC:US-042-03 (v1.2.0 – PLANNED) | Create a new Functionality | `living-doc-create-functionality` | | Find gaps in living documentation | `living-doc-gap-finder` | | AC modified, deprecated, or descoped — sync linked scenarios | `gherkin-living-doc-sync` | +| Deprecated entity — remove associated automation files | `bdd-maintain` | | Assess impact of an AC change on Features and User Stories | `living-doc-impact-analysis` | ## Script — `scripts/validate_entity.py` From 4f825bcc83889667a1e651db2da74196cf8c1e00 Mon Sep 17 00:00:00 2001 From: miroslavpojer <miroslav.pojer@absa.africa> Date: Sun, 31 May 2026 08:07:40 +0200 Subject: [PATCH 30/35] Test case review. 
--- .../evals/living-doc-bdd-copilot/evals.json | 161 +++++++++- .../living-doc-bdd-copilot/trigger-eval.json | 290 ++++++++++++++++-- skills/bdd-maintain/evals/evals.json | 128 ++++++++ skills/bdd-maintain/evals/trigger-eval.json | 122 ++++++++ skills/data-cy-instrument/evals/evals.json | 130 ++++++++ .../evals/trigger-eval.json | 134 ++++++++ .../gherkin-living-doc-sync/evals/evals.json | 14 +- .../evals/trigger-eval.json | 202 ++++++++++-- skills/gherkin-step/evals/evals.json | 68 +++- skills/gherkin-step/evals/trigger-eval.json | 126 +++++++- .../evals/evals.json | 15 +- .../evals/trigger-eval.json | 182 +++++++++-- .../evals/evals.json | 14 +- .../evals/trigger-eval.json | 222 ++++++++++++-- .../evals/evals.json | 14 +- .../evals/trigger-eval.json | 182 +++++++++-- skills/living-doc-gap-finder/evals/evals.json | 81 +++-- .../evals/trigger-eval.json | 213 +++++++++++-- .../evals/evals.json | 61 ++-- .../evals/trigger-eval.json | 50 ++- .../evals/trigger-eval.json | 200 ++++++++++-- .../evals/evals.json | 50 ++- .../evals/trigger-eval.json | 194 ++++++++++-- skills/living-doc-update/evals/evals.json | 60 ++-- .../living-doc-update/evals/trigger-eval.json | 56 +++- 25 files changed, 2729 insertions(+), 240 deletions(-) create mode 100644 skills/bdd-maintain/evals/evals.json create mode 100644 skills/bdd-maintain/evals/trigger-eval.json create mode 100644 skills/data-cy-instrument/evals/evals.json create mode 100644 skills/data-cy-instrument/evals/trigger-eval.json diff --git a/.github/agents/evals/living-doc-bdd-copilot/evals.json b/.github/agents/evals/living-doc-bdd-copilot/evals.json index 38edb9e..0f7c560 100644 --- a/.github/agents/evals/living-doc-bdd-copilot/evals.json +++ b/.github/agents/evals/living-doc-bdd-copilot/evals.json @@ -174,14 +174,169 @@ "id": 13, "category": "regression", "prompt": "During seed assembly, the living doc catalog at docs/living-doc/ has FEAT-checkout mapped to route /checkout and FEAT-account mapped to route /account/orders. 
No sitemap.xml exists. How should known_routes in seed.yaml be populated?", - "expected_output": "Agent loads Source A (living documentation). Extracts Feature-to-route mappings: FEAT-checkout \u2192 /checkout and FEAT-account \u2192 /account/orders. Adds both to known_routes in seed.yaml. Notes that Source B (sitemap.xml) is absent \u2014 no error is raised. Routes not listed in the living doc will be discovered dynamically during the crawl.", + "expected_output": "Agent loads Source A (living documentation). Extracts Feature-to-route mappings: FEAT-checkout → /checkout and FEAT-account → /account/orders. Adds both to known_routes in seed.yaml. Notes that Source B (sitemap.xml) is absent — no error is raised. Routes not listed in the living doc will be discovered dynamically during the crawl.", "files": [], "expectations": [ "Source A: Feature-to-route mappings extracted from the living doc catalog", "Both routes added to known_routes in seed.yaml", - "Source B (sitemap) noted as absent \u2014 no error raised", + "Source B (sitemap) noted as absent — no error raised", "Notes that unlisted routes will be discovered dynamically during crawl" ] + }, + { + "id": 14, + "category": "skill-dispatch", + "prompt": "Run a gap analysis on our living documentation before the v3.0 release. The catalog is at docs/living-doc/.", + "expected_output": "Agent switches to catalog-operations mode and loads the `living-doc-gap-finder` skill. Runs AUDIT mode: processes all entities and test files in docs/living-doc/, produces a gap report listing ORPHAN_TEST, STALE_REFERENCE, ORPHAN_FUNCTIONALITY, ORPHAN_FEATURE, EMPTY_FEATURE, and UNDOCUMENTED_SURFACE gaps. 
Outputs a documentation_coverage percentage and a prioritised gap list (Blocker → Critical → Important → Nit).", + "files": [], + "expectations": [ + "Loads living-doc-gap-finder skill", + "Runs AUDIT mode — full catalog audit", + "Outputs gap report with documentation_coverage percentage", + "Prioritised gap list: Blocker → Critical → Important → Nit", + "Does not modify any entities — gap-finder is read-only" + ] + }, + { + "id": 15, + "category": "skill-dispatch", + "prompt": "What living doc entities does PR #217 affect? It modifies PromoService.java and DiscountController.java.", + "expected_output": "Agent loads the `living-doc-impact-analysis` skill. Traces PromoService.java and DiscountController.java through the feature_registry. PromoService maps to FEAT-promo (domain logic — High impact). DiscountController maps to FEAT-discount (API contract — High impact). Output lists: affected Features, affected Functionalities, ACs requiring re-test, and Gherkin scenarios needing re-run. Produces a release sign-off checklist for both impacted Features.", + "files": [], + "expectations": [ + "Loads living-doc-impact-analysis skill", + "Traces changed files through feature_registry", + "Classifies each file as domain logic or API contract", + "Lists affected Features, Functionalities, ACs, and scenarios", + "Produces release sign-off checklist" + ] + }, + { + "id": 16, + "category": "skill-dispatch", + "prompt": "The @AC: traceability tags in checkout.feature are out of sync with the living doc. Sync them.", + "expected_output": "Agent loads the `gherkin-living-doc-sync` skill. Runs scan_ac_links.py to audit the @AC: tags and # AC: comments across checkout.feature. For each scenario: (1) verifies the @AC: tag matches a live AC in the catalog; (2) checks the # AC: comment format is canonical; (3) flags stale or missing links. Produces a sync diff showing what changed. 
Does not generate new scenarios — routes new-scenario requests to living-doc-scenario-creator.", + "files": [], + "expectations": [ + "Loads gherkin-living-doc-sync skill", + "Runs scan_ac_links.py to audit @AC: tags", + "Flags stale, missing, or malformed AC links", + "Produces a sync diff — does not auto-generate new scenarios", + "Routes new-scenario generation to living-doc-scenario-creator" + ] + }, + { + "id": 17, + "category": "skill-dispatch", + "prompt": "Write step definitions for the checkout scenarios — I need behave steps for the 'When the customer confirms the order' step.", + "expected_output": "Agent loads the `gherkin-step` skill. Generates a Python behave step definition: `@when('the customer confirms the order')` decorated function. The step body accesses `context.checkout_page` (initialised in a Before hook). Uses the CheckoutPage PageObject method `checkout_page.click_confirm_button()`. Follows the naming convention: function named `step_confirm_order`, not the verbose `step_when_the_customer_confirms_the_order`.", + "files": [], + "expectations": [ + "Loads gherkin-step skill", + "Generates @when decorated behave step function", + "Step body delegates to CheckoutPage PageObject method", + "Function name follows concise convention (step_confirm_order)", + "Does not write the Gherkin scenario itself — routes scenario creation to living-doc-scenario-creator" + ] + }, + { + "id": 18, + "category": "skill-dispatch", + "prompt": "Add missing data-cy attributes to the checkout Angular template — the PageObjects have PROPOSED locator comments.", + "expected_output": "Agent loads the `data-cy-instrument` skill. Phase 1: scans the checkout Angular template for elements without data-cy attributes; adds data-cy='<element-name>' to each native HTML element. For third-party components that cannot forward attributes, adds a WORK_LOG.md §4 row with library name, version, and issue tracker link. 
Phase 3: updates CheckoutPage PageObject locators from CSS selectors to getByTestId() calls. Removes PROPOSED comments after update.", + "files": [], + "expectations": [ + "Loads data-cy-instrument skill", + "Adds data-cy attributes to native HTML elements in the Angular template", + "Escalates lib components to WORK_LOG.md §4 — does not silently skip", + "Updates PageObject locators to getByTestId()", + "Removes PROPOSED comments after locator update" + ] + }, + { + "id": 19, + "category": "skill-dispatch", + "prompt": "Delete all BDD artifacts linked to the deprecated FEAT-legacy-promo feature — clean up feature files, step definitions, and PageObjects.", + "expected_output": "Agent loads the `bdd-maintain` skill in REMOVE mode. Identifies all .feature file scenarios whose # AC: tags reference FEAT-legacy-promo User Stories. Runs find_unused_steps.py to find step definitions only used by those scenarios. Runs find_unused_po_methods.py and find_unused_po_components.py to identify exclusively-used PageObject artifacts. Presents the full deletion list to the user for confirmation before touching any file. Also checks fixtures.ts for PageObject imports to remove. After confirmation, removes the identified files.", + "files": [], + "expectations": [ + "Loads bdd-maintain skill", + "REMOVE mode — identifies scenarios via # AC: tags linked to deprecated Feature", + "Runs all three audit scripts to identify exclusively-used artifacts", + "Checks fixtures.ts for PageObject imports", + "Presents deletion list for confirmation before touching any file" + ] + }, + { + "id": 20, + "category": "skill-dispatch", + "prompt": "Document the Notifications Service as a Feature entity in the living doc — it exposes a REST API at /api/notifications.", + "expected_output": "Agent loads the `living-doc-create-feature` skill. Assigns the next sequential FEAT-nnn ID using next_id.py. 
Creates a Feature entity JSON with: id, name ('Notifications Service'), type ('api'), route ('/api/notifications'), status ('active'), owners (asks user), and empty functionalities and user_stories arrays. Adds the Feature to feature_registry.json. Prompts: 'Do you want to create Functionality entities for specific behaviors of this service?'", + "files": [], + "expectations": [ + "Loads living-doc-create-feature skill", + "Assigns next FEAT-nnn ID using next_id.py", + "Creates Feature entity JSON with required fields", + "Adds entry to feature_registry.json", + "Prompts for Functionality creation as next step" + ] + }, + { + "id": 21, + "category": "skill-dispatch", + "prompt": "Document the atomic behavior: apply a 20% discount to all cart items for Gold tier customers. Parent Feature is FEAT-discount.", + "expected_output": "Agent loads the `living-doc-create-functionality` skill. Assigns the next FUNC-nnn ID via next_id.py. Creates a Functionality entity with: id, name ('Apply Gold Tier Discount'), parent feature_id ('FEAT-discount'), status ('planned'), and at least two ACs — a happy-path AC and an error-path AC. After saving, prompts to load living-doc-update to append FUNC-<id> to FEAT-discount's functionalities array — otherwise the entity will be flagged as ORPHAN_FUNCTIONALITY.", + "files": [], + "expectations": [ + "Loads living-doc-create-functionality skill", + "Assigns next FUNC-nnn ID via next_id.py", + "Creates Functionality with happy-path and error-path ACs", + "Prompts to update parent FEAT-discount.functionalities array via living-doc-update", + "Warns that skipping the parent link causes ORPHAN_FUNCTIONALITY gap" + ] + }, + { + "id": 22, + "category": "skill-dispatch", + "prompt": "Update the AC wording on US-042-AC-1 — the product owner changed the discount threshold from $50 to $75.", + "expected_output": "Agent loads the `living-doc-update` skill. Shows the OLD AC-1 text and proposes the NEW text with $75 threshold. 
Keeps the AC ID (US-042-AC-1) stable — never changes IDs. Bumps the AC version from v1.0.0 to v1.1.0. Flags the linked Gherkin scenario for review — the step 'When the cart total exceeds $50' will need updating. Runs validate_entity.py after the edit to confirm no invariants are broken.", + "files": [], + "expectations": [ + "Loads living-doc-update skill", + "Shows OLD and NEW AC text side by side before committing", + "AC ID stays stable — only description/GWT fields updated", + "AC version bumped from v1.0.0 to v1.1.0", + "Flags linked Gherkin scenario as stale and needing update", + "Runs validate_entity.py post-edit" + ] + }, + { + "id": 23, + "category": "skill-dispatch", + "prompt": "Generate BDD scenarios for US-007 — Place an Online Order. It has 3 Active ACs and 1 Planned AC.", + "expected_output": "Agent loads the `living-doc-scenario-creator` skill. Generates a .feature file for US-007 with 3 scenarios — one per Active AC. The Planned AC is skipped (not generated until Active). Each scenario is preceded by a '# AC: US-007-0n (v1.0.0 – Active)' traceability comment. Merge policy: if a scenario already exists for an AC, applies the 4-row decision table (skip if intent matches, update if GWT is stale, propose replacement if deprecated, flag if multiple scenarios exist per AC).", + "files": [], + "expectations": [ + "Loads living-doc-scenario-creator skill", + "Generates scenarios only for Active ACs — skips Planned", + "Each scenario preceded by # AC: traceability comment", + "Applies merge policy decision table for existing scenarios", + "Output is a named .feature file: us-007-place-an-online-order.feature" + ] + }, + { + "id": 24, + "category": "skill-dispatch", + "prompt": "Create a User Story for the guest checkout feature — a guest should be able to place an order without registering.", + "expected_output": "Agent loads the `living-doc-create-user-story` skill. Assigns the next US-nnn ID via next_id.py. 
Elicits the full narrative: As a [guest customer] / I can [place an order without registering] / so that [I can complete a purchase without commitment]. Guides through AC creation: at least a happy-path AC and one error-path AC. Checks whether existing Functionalities (e.g. FUNC-guest-cart, FUNC-guest-checkout) should be linked in the functionalities array. Routes to living-doc-create-functionality if new Functionalities are needed.", + "files": [], + "expectations": [ + "Loads living-doc-create-user-story skill", + "Assigns next US-nnn ID via next_id.py", + "Elicits complete narrative (As a / I can / so that)", + "Requires at least one happy-path and one error-path AC", + "Asks about existing Functionalities to link — prevents ORPHAN_FUNCTIONALITY gaps" + ] } ] -} +} \ No newline at end of file diff --git a/.github/agents/evals/living-doc-bdd-copilot/trigger-eval.json b/.github/agents/evals/living-doc-bdd-copilot/trigger-eval.json index 94beb6f..c14fa01 100644 --- a/.github/agents/evals/living-doc-bdd-copilot/trigger-eval.json +++ b/.github/agents/evals/living-doc-bdd-copilot/trigger-eval.json @@ -1,26 +1,266 @@ [ - {"id": 1, "query": "Scan the webapp at https://app.example.com and generate PageObjects", "should_trigger": true, "reason": "'scan webapp' trigger phrase"}, - {"id": 2, "query": "Generate PageObjects for the checkout and login screens", "should_trigger": true, "reason": "'generate pageobjects' trigger phrase"}, - {"id": 3, "query": "Heal the PageObjects after the UI redesign — selectors are broken", "should_trigger": true, "reason": "'heal pageobjects' trigger phrase"}, - {"id": 4, "query": "Generate BDD scenarios for the active User Stories", "should_trigger": true, "reason": "'generate scenarios' trigger phrase"}, - {"id": 5, "query": "Sync the Gherkin feature files with the living doc AC catalog", "should_trigger": true, "reason": "'sync gherkin' trigger phrase"}, - {"id": 6, "query": "Use Playwright to crawl the application and discover all 
screens", "should_trigger": true, "reason": "'playwright crawl' trigger phrase"}, - {"id": 7, "query": "Explore the app and map all the UI surfaces", "should_trigger": true, "reason": "'explore the app' trigger phrase"}, - {"id": 8, "query": "@bdd-copilot scan the dashboard and generate scenarios", "should_trigger": true, "reason": "'bdd copilot' trigger phrase — explicit agent invocation"}, - {"id": 9, "query": "@living-doc-bdd-copilot set up the BDD suite for our new module", "should_trigger": true, "reason": "'living doc bdd copilot' trigger phrase — explicit agent invocation"}, - {"id": 10, "query": "Run the full BDD pipeline — crawl, generate PageObjects, and produce feature files", "should_trigger": true, "reason": "'BDD pipeline' trigger phrase"}, - {"id": 11, "query": "Crawl the UI to discover all reachable pages", "should_trigger": true, "reason": "'crawl the UI' trigger phrase"}, - {"id": 12, "query": "Create page objects for the admin portal", "should_trigger": true, "reason": "'create page objects' trigger phrase"}, - {"id": 13, "query": "Generate a feature file for US-007 — Place an Online Order", "should_trigger": true, "reason": "'generate feature file' trigger phrase"}, - {"id": 14, "query": "What is the scenario coverage for US-007?", "should_trigger": true, "reason": "'scenario coverage' trigger phrase"}, - {"id": 15, "query": "Write the step definitions for the checkout scenarios", "should_trigger": true, "reason": "'step definitions' trigger phrase"}, - {"id": 16, "query": "Generate Gherkin from user story US-003", "should_trigger": true, "reason": "'gherkin from user story' trigger phrase"}, - {"id": 17, "query": "Create a User Story for the loyalty points redemption feature", "should_trigger": true, "reason": "Catalog entity creation is handled by this agent in catalog-operations mode"}, - {"id": 18, "query": "Write a unit test for the discount calculation function", "should_trigger": false, "reason": "Unit test authoring — out of scope for 
this toolkit (no @sdet-copilot agent defined)"}, - {"id": 19, "query": "Update the AC state on US-007-02 to DEPRECATED", "should_trigger": true, "reason": "Catalog entity state update is handled by this agent in catalog-operations mode"}, - {"id": 20, "query": "Run the TypeScript quality gate for the frontend", "should_trigger": false, "reason": "Quality gate execution — out of scope for this agent"}, - {"id": 21, "query": "The manifest.json is missing — start a first exploration run from the seed file", "should_trigger": true, "reason": "Partial state: seed present, manifest absent → first exploration run — 'scan webapp' pattern"}, - {"id": 22, "query": "The seed.yaml has literal credentials — is that correct?", "should_trigger": true, "reason": "Credential safety rule enforcement during seed assembly — BDD session setup task"}, - {"id": 23, "query": "I've hit a guided traversal point — the checkout wizard needs a delivery zone code", "should_trigger": true, "reason": "Source E guided traversal protocol — blocked crawl point during exploration"}, - {"id": 24, "query": "Update the AC on US-007 to change the payment timeout to 30 seconds", "should_trigger": true, "reason": "AC update is a catalog layer operation handled by this agent"} -] + { + "id": 1, + "query": "Scan the webapp at https://app.example.com and generate PageObjects", + "should_trigger": true, + "reason": "'scan webapp' trigger phrase" + }, + { + "id": 2, + "query": "Generate PageObjects for the checkout and login screens", + "should_trigger": true, + "reason": "'generate pageobjects' trigger phrase" + }, + { + "id": 3, + "query": "Heal the PageObjects after the UI redesign — selectors are broken", + "should_trigger": true, + "reason": "'heal pageobjects' trigger phrase" + }, + { + "id": 4, + "query": "Generate BDD scenarios for the active User Stories", + "should_trigger": true, + "reason": "'generate scenarios' trigger phrase" + }, + { + "id": 5, + "query": "Sync the Gherkin feature files with the 
living doc AC catalog", + "should_trigger": true, + "reason": "'sync gherkin' trigger phrase" + }, + { + "id": 6, + "query": "Use Playwright to crawl the application and discover all screens", + "should_trigger": true, + "reason": "'playwright crawl' trigger phrase" + }, + { + "id": 7, + "query": "Explore the app and map all the UI surfaces", + "should_trigger": true, + "reason": "'explore the app' trigger phrase" + }, + { + "id": 8, + "query": "@bdd-copilot scan the dashboard and generate scenarios", + "should_trigger": true, + "reason": "'bdd copilot' trigger phrase — explicit agent invocation" + }, + { + "id": 9, + "query": "@living-doc-bdd-copilot set up the BDD suite for our new module", + "should_trigger": true, + "reason": "'living doc bdd copilot' trigger phrase — explicit agent invocation" + }, + { + "id": 10, + "query": "Run the full BDD pipeline — crawl, generate PageObjects, and produce feature files", + "should_trigger": true, + "reason": "'BDD pipeline' trigger phrase" + }, + { + "id": 11, + "query": "Crawl the UI to discover all reachable pages", + "should_trigger": true, + "reason": "'crawl the UI' trigger phrase" + }, + { + "id": 12, + "query": "Create page objects for the admin portal", + "should_trigger": true, + "reason": "'create page objects' trigger phrase" + }, + { + "id": 13, + "query": "Generate a feature file for US-007 — Place an Online Order", + "should_trigger": true, + "reason": "'generate feature file' trigger phrase" + }, + { + "id": 14, + "query": "What is the scenario coverage for US-007?", + "should_trigger": true, + "reason": "'scenario coverage' trigger phrase" + }, + { + "id": 15, + "query": "Write the step definitions for the checkout scenarios", + "should_trigger": true, + "reason": "'step definitions' trigger phrase" + }, + { + "id": 16, + "query": "Generate Gherkin from user story US-003", + "should_trigger": true, + "reason": "'gherkin from user story' trigger phrase" + }, + { + "id": 17, + "query": "Create a User Story 
for the loyalty points redemption feature", + "should_trigger": true, + "reason": "Catalog entity creation is handled by this agent in catalog-operations mode" + }, + { + "id": 18, + "query": "Write a unit test for the discount calculation function", + "should_trigger": false, + "reason": "Unit test authoring — out of scope for this toolkit (no @sdet-copilot agent defined)" + }, + { + "id": 19, + "query": "Update the AC state on US-007-02 to DEPRECATED", + "should_trigger": true, + "reason": "Catalog entity state update is handled by this agent in catalog-operations mode" + }, + { + "id": 20, + "query": "Run the TypeScript quality gate for the frontend", + "should_trigger": false, + "reason": "Quality gate execution — out of scope for this agent" + }, + { + "id": 21, + "query": "The manifest.json is missing — start a first exploration run from the seed file", + "should_trigger": true, + "reason": "Partial state: seed present, manifest absent → first exploration run — 'scan webapp' pattern" + }, + { + "id": 22, + "query": "The seed.yaml has literal credentials — is that correct?", + "should_trigger": true, + "reason": "Credential safety rule enforcement during seed assembly — BDD session setup task" + }, + { + "id": 23, + "query": "I've hit a guided traversal point — the checkout wizard needs a delivery zone code", + "should_trigger": true, + "reason": "Source E guided traversal protocol — blocked crawl point during exploration" + }, + { + "id": 24, + "query": "Update the AC on US-007 to change the payment timeout to 30 seconds", + "should_trigger": true, + "reason": "AC update is a catalog layer operation handled by this agent" + }, + { + "id": 25, + "query": "Debug the null pointer exception in PaymentService.processOrder()", + "should_trigger": false, + "reason": "Application debugging — outside the living doc / BDD scope" + }, + { + "id": 26, + "query": "Add error handling to the checkout API endpoint", + "should_trigger": false, + "reason": "Production code 
change — outside the living doc / BDD scope" + }, + { + "id": 27, + "query": "Write an OpenAPI spec for the orders REST endpoint", + "should_trigger": false, + "reason": "API schema documentation — not living doc entity creation" + }, + { + "id": 28, + "query": "Configure the Kubernetes resource limits for the order service", + "should_trigger": false, + "reason": "Infrastructure configuration — outside scope" + }, + { + "id": 29, + "query": "Set up a CI pipeline for the frontend build", + "should_trigger": false, + "reason": "CI/CD configuration — outside scope" + }, + { + "id": 30, + "query": "Fix the failing unit tests in CartCalculatorTest", + "should_trigger": false, + "reason": "Unit test fix — outside the living doc / BDD scope" + }, + { + "id": 31, + "query": "Refactor the PaymentService to use the repository pattern", + "should_trigger": false, + "reason": "Code refactoring — outside the living doc / BDD scope" + }, + { + "id": 32, + "query": "Add structured logging to the checkout service", + "should_trigger": false, + "reason": "Application logging — outside scope" + }, + { + "id": 33, + "query": "Review this pull request for code quality issues", + "should_trigger": false, + "reason": "Code review — outside scope" + }, + { + "id": 34, + "query": "Configure ESLint rules for the frontend project", + "should_trigger": false, + "reason": "Dev tooling configuration — outside scope" + }, + { + "id": 35, + "query": "Write a database migration script to add the promo_code column", + "should_trigger": false, + "reason": "DB schema change — outside scope" + }, + { + "id": 36, + "query": "Optimize the SQL query in OrderRepository.findByCustomer()", + "should_trigger": false, + "reason": "Query optimization — outside scope" + }, + { + "id": 37, + "query": "Set up monitoring alerts for the payment service", + "should_trigger": false, + "reason": "Ops / monitoring — outside scope" + }, + { + "id": 38, + "query": "Write technical documentation for the REST API", + 
"should_trigger": false, + "reason": "Generic tech docs — not living doc entity creation" + }, + { + "id": 39, + "query": "How do I set up a multi-stage Docker build for the backend?", + "should_trigger": false, + "reason": "Container/infra question — outside scope" + }, + { + "id": 40, + "query": "Run the security vulnerability scan on the checkout service", + "should_trigger": false, + "reason": "Security tooling — outside scope" + }, + { + "id": 41, + "query": "Generate a performance report for the checkout flow", + "should_trigger": false, + "reason": "Performance testing — outside scope" + }, + { + "id": 42, + "query": "Write a changelog for the v2.1.0 release", + "should_trigger": false, + "reason": "Release management — outside scope" + }, + { + "id": 43, + "query": "Configure feature flags for the new checkout flow", + "should_trigger": false, + "reason": "Feature flag setup — outside scope" + }, + { + "id": 44, + "query": "Fix the TypeScript compilation error in CheckoutComponent", + "should_trigger": false, + "reason": "Compile error fix — outside scope" + } +] \ No newline at end of file diff --git a/skills/bdd-maintain/evals/evals.json b/skills/bdd-maintain/evals/evals.json new file mode 100644 index 0000000..1069806 --- /dev/null +++ b/skills/bdd-maintain/evals/evals.json @@ -0,0 +1,128 @@ +{ + "skill_name": "bdd-maintain", + "evals": [ + { + "id": 1, + "category": "happy-path", + "prompt": "US-042 was deprecated last sprint. Which BDD files need to be removed and in what order?", + "expected_output": "REMOVE mode: (1) Identify all .feature file scenarios tagged @AC:US-042-xx. (2) Remove those scenarios (or the entire feature file if all scenarios belong to US-042). (3) Remove all step definitions only used by those scenarios (run find_unused_steps.py to confirm). (4) Remove the PageObject if no other feature uses it (run find_unused_po_methods.py). 
(5) Remove fixture registrations in fixtures.ts that import the removed PageObject.", + "files": [], + "expectations": [ + "Lists removal order: scenarios → step defs → PageObject → fixtures.ts", + "Uses find_unused_steps.py to identify orphaned step defs", + "Checks fixtures.ts for PageObject imports", + "Does not remove shared artifacts still used by other features" + ] + }, + { + "id": 2, + "category": "happy-path", + "prompt": "Run find_unused_steps.py — it reported 12 unused step definitions. What should I do with them?", + "expected_output": "Review each unused step against the living doc: (1) If the step belongs to a deprecated entity — delete it. (2) If the step belongs to an active entity but has no scenario — it may be a stale draft; flag for team review before deleting. (3) Never delete without checking whether the step is imported or aliased from another step file.", + "files": [], + "expectations": [ + "Does not auto-delete without review", + "Distinguishes deprecated entity steps from active-entity orphans", + "Warns about aliased/imported steps" + ] + }, + { + "id": 3, + "category": "regression", + "prompt": "I removed the CheckoutPage PageObject but forgot to remove it from fixtures.ts. What are the consequences and how do I fix it?", + "expected_output": "fixtures.ts now has a broken import pointing to the deleted file, which causes a compile error in the test suite. Fix: (1) Open fixtures.ts. (2) Remove the import statement for CheckoutPage. (3) Remove the checkout parameter from the test fixture function signature and the constructor call. 
(4) Re-run the test suite to confirm no lingering references.", + "files": [], + "expectations": [ + "Identifies compile error as the consequence", + "Removes the import statement", + "Removes constructor parameter in fixture function", + "Verifies by re-running the test suite" + ] + }, + { + "id": 4, + "category": "happy-path", + "prompt": "What is the difference between find_unused_po_methods.py and find_unused_po_components.py?", + "expected_output": "find_unused_po_methods.py finds PageObject class methods (e.g. clickConfirmButton()) that are defined but never called by any step definition. find_unused_po_components.py finds entire PageObject classes or component groupings that have no references anywhere in the step files. Use methods script first for fine-grained cleanup; use components script to find entire dead PO files.", + "files": [], + "expectations": [ + "Correctly distinguishes method-level vs class/component-level scope", + "Recommends methods script for fine-grained, components for file-level", + "Both scripts referenced by name" + ] + }, + { + "id": 5, + "category": "regression", + "prompt": "find_unused_po_methods.py reports that CheckoutPage.clickConfirmButton() is unused, but I can see it's called in checkout.steps.ts. Why?", + "expected_output": "The script likely scans with a string match — if the method is called via a variable reference (e.g. `page.clickConfirmButton()` where `page` is typed as `BasePage`) or through an alias, the static scan may miss it. Review the call site in checkout.steps.ts and verify the import. If the call exists, the script has a false positive — do not delete the method. 
Report the false positive to the team.", + "files": [], + "expectations": [ + "Identifies dynamic dispatch / base-class alias as likely cause", + "Does not delete the method on a false positive", + "Recommends verifying the actual call site" + ] + }, + { + "id": 6, + "category": "negative", + "prompt": "The selectors in CheckoutPage are broken after the UI redesign. Should I use bdd-maintain to fix them?", + "expected_output": "No — broken selectors are a HEALING scope concern, not a BDD maintenance task. Load living-doc-pageobject-scan and use HEALING mode to re-scan the page and update selectors. bdd-maintain handles artifact lifecycle (deletion and dead code), not selector repair.", + "files": [], + "expectations": [ + "Routes to living-doc-pageobject-scan HEALING", + "Explains bdd-maintain does not own selector repair", + "Does not attempt to fix selectors" + ] + }, + { + "id": 7, + "category": "edge-case", + "prompt": "A step definition in checkout.steps.ts is called by two scenarios — one belongs to deprecated US-042 and one belongs to active US-007. Can I delete the step?", + "expected_output": "No — the step is still needed by the active US-007 scenario. Only the scenario linked to US-042 should be removed. The step definition itself must be retained. After removing the US-042 scenario, re-run find_unused_steps.py to confirm the step is still referenced.", + "files": [], + "expectations": [ + "Does not delete a step shared with an active scenario", + "Only removes the deprecated scenario", + "Re-runs script after cleanup to verify" + ] + }, + { + "id": 8, + "category": "happy-path", + "prompt": "In what order should I run the three audit scripts for a full dead code audit?", + "expected_output": "Recommended order: (1) find_unused_steps.py — identifies step definitions with no scenario callers. (2) find_unused_po_methods.py — identifies PO methods with no step callers. 
(3) find_unused_po_components.py — identifies entire PO classes/components with no references. Run in this order because deleting steps may free up PO methods, and deleting PO methods may free up entire PO components.", + "files": [], + "expectations": [ + "Lists the correct order: steps → PO methods → PO components", + "Explains the cascading dependency rationale", + "All three script names referenced correctly" + ] + }, + { + "id": 9, + "category": "regression", + "prompt": "The product owner wants to re-activate a User Story that was deprecated two sprints ago. Some BDD artifacts were already removed. How do I restore them?", + "expected_output": "bdd-maintain does not handle restoration — it handles removal and dead code auditing only. To restore: (1) Update the entity status in living-doc-update (set back to active). (2) Regenerate scenarios with living-doc-scenario-creator. (3) Regenerate PageObjects with living-doc-pageobject-scan if needed. (4) Re-run gherkin-living-doc-sync to re-link @AC: tags.", + "files": [], + "expectations": [ + "Routes to living-doc-update to change entity status", + "Routes to living-doc-scenario-creator for scenario regeneration", + "Routes to living-doc-pageobject-scan for PO regeneration", + "Routes to gherkin-living-doc-sync for @AC: re-linking" + ] + }, + { + "id": 10, + "category": "output-format", + "prompt": "Show me the expected output of find_unused_steps.py for a suite with 3 unused step definitions.", + "expected_output": "Output lists each unused step definition file path and function name, e.g.:\n UNUSED: checkout/checkout.steps.py::step_when_customer_applies_promo\n UNUSED: checkout/checkout.steps.py::step_then_total_unchanged\n UNUSED: login/login.steps.py::step_given_expired_session\nSummary line: '3 unused step definition(s) found.'", + "files": [], + "expectations": [ + "Shows file path + function name format", + "Shows summary count line", + "Does not show false positives for steps used by active scenarios" + ] + 
} + ] +} \ No newline at end of file diff --git a/skills/bdd-maintain/evals/trigger-eval.json b/skills/bdd-maintain/evals/trigger-eval.json new file mode 100644 index 0000000..295ec4f --- /dev/null +++ b/skills/bdd-maintain/evals/trigger-eval.json @@ -0,0 +1,122 @@ +[ + { + "id": 1, + "query": "Remove all BDD artifacts for the deprecated checkout feature", + "should_trigger": true, + "reason": "REMOVE mode — deleting BDD artifacts for a deprecated entity" + }, + { + "id": 2, + "query": "Delete the feature files and step definitions for US-007 which was deprecated", + "should_trigger": true, + "reason": "REMOVE mode — explicit BDD artifact deletion" + }, + { + "id": 3, + "query": "Run a dead code audit to find unused step definitions", + "should_trigger": true, + "reason": "DEAD CODE AUDIT mode — finding unused step definitions" + }, + { + "id": 4, + "query": "Find unused PageObject methods across the test suite", + "should_trigger": true, + "reason": "DEAD CODE AUDIT mode — finding unused PO methods" + }, + { + "id": 5, + "query": "Which PageObject components are never referenced by any step definition?", + "should_trigger": true, + "reason": "DEAD CODE AUDIT mode — finding dead PO components" + }, + { + "id": 6, + "query": "BDD cleanup after deprecating FEAT-legacy-payment-widget", + "should_trigger": true, + "reason": "REMOVE mode — bdd cleanup keyword" + }, + { + "id": 7, + "query": "Remove the PageObject for the old checkout wizard screen", + "should_trigger": true, + "reason": "REMOVE mode — removing a specific PageObject" + }, + { + "id": 8, + "query": "Run find_unused_steps.py to find orphaned step definitions", + "should_trigger": true, + "reason": "DEAD CODE AUDIT — direct script invocation" + }, + { + "id": 9, + "query": "Are there any dead PO components that nothing calls anymore?", + "should_trigger": true, + "reason": "DEAD CODE AUDIT — dead PO components query" + }, + { + "id": 10, + "query": "After deprecating US-042 in the living doc, what BDD files 
need to go?", + "should_trigger": true, + "reason": "REMOVE mode — downstream of entity deprecation" + }, + { + "id": 11, + "query": "Scan the webapp at https://app.example.com and update the PageObjects", + "should_trigger": false, + "reason": "PageObject re-scan after UI change — routes to living-doc-pageobject-scan RE-SCAN" + }, + { + "id": 12, + "query": "Heal the PageObjects — selectors are broken after the UI redesign", + "should_trigger": false, + "reason": "Selector drift healing — routes to living-doc-pageobject-scan HEALING" + }, + { + "id": 13, + "query": "Sync the @AC: tags in checkout.feature with the living doc", + "should_trigger": false, + "reason": "@AC: traceability sync — routes to gherkin-living-doc-sync" + }, + { + "id": 14, + "query": "Create a User Story for the checkout capability", + "should_trigger": false, + "reason": "Creating a User Story — routes to living-doc-create-user-story" + }, + { + "id": 15, + "query": "Generate BDD scenarios for US-007", + "should_trigger": false, + "reason": "Generating scenarios — routes to living-doc-scenario-creator" + }, + { + "id": 16, + "query": "Write step definitions for the checkout scenarios", + "should_trigger": false, + "reason": "Writing step definitions — routes to gherkin-step" + }, + { + "id": 17, + "query": "Run a gap analysis on the living documentation", + "should_trigger": false, + "reason": "Gap analysis — routes to living-doc-gap-finder" + }, + { + "id": 18, + "query": "What does PR #217 affect in the living doc?", + "should_trigger": false, + "reason": "Impact analysis — routes to living-doc-impact-analysis" + }, + { + "id": 19, + "query": "Add data-cy attributes to the checkout template", + "should_trigger": false, + "reason": "Instrumenting templates — routes to data-cy-instrument" + }, + { + "id": 20, + "query": "Update US-042 to add a new AC for the expired promo path", + "should_trigger": false, + "reason": "Updating an entity — routes to living-doc-update" + } +] \ No newline at 
end of file diff --git a/skills/data-cy-instrument/evals/evals.json b/skills/data-cy-instrument/evals/evals.json new file mode 100644 index 0000000..6f6e042 --- /dev/null +++ b/skills/data-cy-instrument/evals/evals.json @@ -0,0 +1,130 @@ +{ + "skill_name": "data-cy-instrument", + "evals": [ + { + "id": 1, + "category": "happy-path", + "prompt": "Phase 1 scan of the checkout module found 5 elements without data-cy attributes. What are the next steps?", + "expected_output": "For each element: (1) Determine if it is a native HTML element — if yes, add a data-cy attribute directly in the Angular template (e.g. data-cy='confirm-order-btn'). (2) If it is a third-party library component that does not forward attributes, mark it ⚠️ 'needs lib support' and add a WORK_LOG.md §4 row with status ⚠️, element description, library name and version, and a link to the library's issue tracker. (3) After all native elements are instrumented, proceed to Phase 3 to update PageObjects to use getByTestId().", + "files": [], + "expectations": [ + "Handles native elements by adding data-cy to the template", + "Handles lib components with WORK_LOG.md §4 row entry", + "WORK_LOG row includes library name, version, and issue tracker link", + "Does not silently skip lib components" + ] + }, + { + "id": 2, + "category": "happy-path", + "prompt": "A PageObject has 3 locators marked ⚠️ PROPOSED. What does that mean and what do I do?", + "expected_output": "PROPOSED locators indicate that Phase 1 (data-cy instrumentation) has not yet been done for those elements — the PageObject was generated with a best-guess selector pending data-cy confirmation. Steps: (1) Run Phase 1 to add the missing data-cy attributes to the template. (2) Once data-cy is in the template, replace the PROPOSED locator with getByTestId('your-data-cy-value') in the PageObject. 
(3) Remove the ⚠️ PROPOSED comment after updating.", + "files": [], + "expectations": [ + "Explains PROPOSED means data-cy not yet added to template", + "Phase 1 first, then update locator to getByTestId()", + "Remove PROPOSED comment after fix" + ] + }, + { + "id": 3, + "category": "happy-path", + "prompt": "How do I update a PageObject locator from a CSS selector to getByTestId() after adding the data-cy attribute?", + "expected_output": "Replace the old locator: `this.page.locator('button.confirm-order')` → `this.page.getByTestId('confirm-order-btn')` where 'confirm-order-btn' is the value of the data-cy attribute you added to the template. Update the method comment to remove any ⚠️ PROPOSED flag.", + "files": [], + "expectations": [ + "Shows the getByTestId() replacement syntax", + "data-cy value matches what was added to the template", + "Removes PROPOSED comment" + ] + }, + { + "id": 4, + "category": "regression", + "prompt": "A third-party PrimeNG Button component does not forward the data-cy attribute. I silently skipped it in Phase 1. Is that OK?", + "expected_output": "No — silent skips are not allowed. For any element where the library does not support attribute forwarding, add a WORK_LOG.md §4 row with: status ⚠️, element description, library name (PrimeNG) and version, and a link to the PrimeNG issue tracker or relevant GitHub issue. This ensures the gap is tracked and actionable.", + "files": [], + "expectations": [ + "Identifies silent skip as incorrect", + "Requires WORK_LOG.md §4 row", + "Row must include library name, version, issue tracker link" + ] + }, + { + "id": 5, + "category": "happy-path", + "prompt": "FUNC-promo-validate has status 'planned' because the data-cy attributes are missing. After Phase 1 and Phase 3 are complete, what do I do to promote it?", + "expected_output": "After data-cy attributes are added and PageObjects updated: load living-doc-update and change FUNC-promo-validate status from 'planned' to 'active'. 
This promotion is the downstream step — data-cy-instrument invokes living-doc-update for the status change.", + "files": [], + "expectations": [ + "Routes to living-doc-update for status promotion", + "Status changes from planned to active", + "Only after Phase 1 and Phase 3 are complete" + ] + }, + { + "id": 6, + "category": "negative", + "prompt": "The checkout PageObject selectors are broken because the CSS classes changed after the redesign — there are no data-cy attributes involved. Should I run data-cy-instrument?", + "expected_output": "No — selector drift caused by CSS class changes without missing data-cy attributes is a HEALING scope concern. Load living-doc-pageobject-scan and use HEALING mode to re-scan the page and update the broken selectors. data-cy-instrument only applies when coverage_gaps are non-empty or PageObjects have PROPOSED locators due to missing test IDs.", + "files": [], + "expectations": [ + "Routes to living-doc-pageobject-scan HEALING", + "Explains data-cy-instrument does not apply when coverage_gaps are empty", + "Does not attempt to re-instrument templates" + ] + }, + { + "id": 7, + "category": "edge-case", + "prompt": "Phases 1 and 3 are Angular-specific. What do I do for a React or Vue app?", + "expected_output": "Phases 1, 3, and 5 are framework-agnostic in principle. For React: add data-testid attributes (data-cy is only a convention; getByTestId() matches whichever test ID attribute the project configures). For Vue: same approach. Phase 2 (ng-add schematics) and Phase 4 (Angular-specific wiring) are Angular-only and should be skipped for other frameworks. 
Apply Phase 1 (audit), Phase 3 (PO update), and Phase 5 (coverage gap check) to any framework.", + "files": [], + "expectations": [ + "States phases 1, 3, 5 are framework-agnostic", + "States phases 2 and 4 are Angular-only", + "Mentions data-testid equivalence for React/Vue" + ] + }, + { + "id": 8, + "category": "output-format", + "prompt": "Show me the expected WORK_LOG.md §4 row for a Material Design button that does not support data-cy forwarding.", + "expected_output": "| ⚠️ | Checkout confirm button | MatButton (Angular Material v17.3.0) | Cannot forward data-cy; tracked at https://github.com/angular/components/issues/XXXX |", + "files": [], + "expectations": [ + "Status column is ⚠️", + "Element description present", + "Library name and version present", + "Issue tracker link present" + ] + }, + { + "id": 9, + "category": "regression", + "prompt": "After running Phase 3, coverage_gaps is still non-empty for the promo-apply button. What should I check?", + "expected_output": "Check: (1) Did Phase 1 actually add data-cy='promo-apply-btn' to the template? (2) Did Phase 3 update the PageObject locator to getByTestId('promo-apply-btn')? (3) Is the element conditionally rendered (e.g. *ngIf) and not visible during the scan? (4) Is the element inside a Shadow DOM that prevents standard attribute access? If all four checks pass and the gap persists, add a WORK_LOG.md §4 row.", + "files": [], + "expectations": [ + "Checks Phase 1 template add", + "Checks Phase 3 PO update", + "Checks conditional rendering", + "Checks Shadow DOM edge case", + "Falls back to WORK_LOG if gap persists" + ] + }, + { + "id": 10, + "category": "happy-path", + "prompt": "What is the relationship between data-cy-instrument and living-doc-pageobject-scan?", + "expected_output": "living-doc-pageobject-scan is upstream — it creates or heals PageObjects and may produce PROPOSED locator comments when data-cy attributes are missing. 
data-cy-instrument runs downstream to resolve those PROPOSED locators by instrumenting the templates and updating the PageObjects to use getByTestId(). After data-cy-instrument completes, living-doc-scenario-creator is downstream to generate BDD scenarios using the now-stable locators.", + "files": [], + "expectations": [ + "living-doc-pageobject-scan is upstream (produces PROPOSED)", + "data-cy-instrument resolves PROPOSED locators", + "living-doc-scenario-creator is downstream", + "Correct pipeline order stated" + ] + } + ] +} \ No newline at end of file diff --git a/skills/data-cy-instrument/evals/trigger-eval.json b/skills/data-cy-instrument/evals/trigger-eval.json new file mode 100644 index 0000000..5aab361 --- /dev/null +++ b/skills/data-cy-instrument/evals/trigger-eval.json @@ -0,0 +1,134 @@ +[ + { + "id": 1, + "query": "Add missing data-cy attributes to the checkout Angular template", + "should_trigger": true, + "reason": "data-cy-instrument trigger — adding missing data-cy attributes" + }, + { + "id": 2, + "query": "The PageObjects have ⚠️ PROPOSED locator comments — what do I do?", + "should_trigger": true, + "reason": "data-cy-instrument trigger — PROPOSED locator resolution" + }, + { + "id": 3, + "query": "Instrument the Angular templates to add data-cy test IDs", + "should_trigger": true, + "reason": "data-cy-instrument trigger — instrument angular templates keyword" + }, + { + "id": 4, + "query": "Fix data-cy gaps in the login component template", + "should_trigger": true, + "reason": "data-cy-instrument trigger — fix data-cy gaps keyword" + }, + { + "id": 5, + "query": "Run a data-cy audit on the checkout module", + "should_trigger": true, + "reason": "data-cy-instrument trigger — data-cy audit keyword" + }, + { + "id": 6, + "query": "Add testids to the checkout form inputs so Playwright can select them", + "should_trigger": true, + "reason": "data-cy-instrument trigger — add testids keyword" + }, + { + "id": 7, + "query": "Our Playwright selectors 
are failing because there are no data-cy attributes on the buttons", + "should_trigger": true, + "reason": "data-cy-instrument trigger — fix playwright selectors due to missing data-cy" + }, + { + "id": 8, + "query": "Update the PageObjects to use getByTestId() instead of CSS selectors", + "should_trigger": true, + "reason": "data-cy-instrument trigger — syncing PageObjects to use getByTestId()" + }, + { + "id": 9, + "query": "The coverage_gaps list is non-empty after the PageObject scan — how do I resolve it?", + "should_trigger": true, + "reason": "data-cy-instrument trigger — coverage_gaps resolution workflow" + }, + { + "id": 10, + "query": "FUNC-promo-validate has status planned because there are no data-cy attributes — fix it", + "should_trigger": true, + "reason": "data-cy-instrument trigger — Functionality.status planned due to missing test IDs" + }, + { + "id": 11, + "query": "Add data-cy to the third-party UI library button in the checkout form", + "should_trigger": true, + "reason": "data-cy-instrument trigger — even lib buttons need an audit decision (WORK_LOG if unsupported)" + }, + { + "id": 12, + "query": "Generate BDD scenarios for US-007", + "should_trigger": false, + "reason": "Adding Gherkin — routes to living-doc-scenario-creator" + }, + { + "id": 13, + "query": "Write step definitions for the checkout scenarios", + "should_trigger": false, + "reason": "Writing step definitions — routes to gherkin-step" + }, + { + "id": 14, + "query": "The PageObject selectors are broken after the UI redesign — heal them", + "should_trigger": false, + "reason": "PageObject healing without data-cy gaps — routes to living-doc-pageobject-scan HEALING" + }, + { + "id": 15, + "query": "Scan the webapp and generate PageObjects for the admin portal", + "should_trigger": false, + "reason": "PageObject creation — routes to living-doc-pageobject-scan CREATE" + }, + { + "id": 16, + "query": "Create a User Story for the checkout capability", + "should_trigger": false, + 
"reason": "Creating a User Story — routes to living-doc-create-user-story" + }, + { + "id": 17, + "query": "Run a gap analysis on the living doc", + "should_trigger": false, + "reason": "Gap analysis — routes to living-doc-gap-finder" + }, + { + "id": 18, + "query": "What does PR #217 affect in the living doc?", + "should_trigger": false, + "reason": "Impact analysis — routes to living-doc-impact-analysis" + }, + { + "id": 19, + "query": "Sync the @AC: tags in the feature files with the living doc", + "should_trigger": false, + "reason": "Feature file sync — routes to gherkin-living-doc-sync" + }, + { + "id": 20, + "query": "Delete all BDD artifacts linked to the deprecated checkout feature", + "should_trigger": false, + "reason": "BDD artifact cleanup — routes to bdd-maintain" + }, + { + "id": 21, + "query": "Update the wording of AC-1 on US-042", + "should_trigger": false, + "reason": "Updating an entity — routes to living-doc-update" + }, + { + "id": 22, + "query": "Create a Feature entity for the checkout module", + "should_trigger": false, + "reason": "Creating a Feature — routes to living-doc-create-feature" + } +] \ No newline at end of file diff --git a/skills/gherkin-living-doc-sync/evals/evals.json b/skills/gherkin-living-doc-sync/evals/evals.json index 0cc1f82..2aa367b 100644 --- a/skills/gherkin-living-doc-sync/evals/evals.json +++ b/skills/gherkin-living-doc-sync/evals/evals.json @@ -173,6 +173,18 @@ "The existing scenario is not deleted or modified beyond the tag update", "Developer confirmation is required before any file is edited" ] + }, + { + "id": 14, + "category": "happy-path", + "prompt": "During a sync pass I discover that the step text for AC:US-007-03 changed AND the PageObject method that backs it was renamed. What should I do for each part?", + "expected_output": "Split the work: (1) gherkin-living-doc-sync updates the step text and the @AC: tag in the .feature file — this skill handles Gherkin text sync. 
(2) The PageObject method rename (signature and locator) is owned by living-doc-pageobject-scan HEALING scope — load that skill for the PO side. Do not attempt to rename PageObject methods inside this sync skill.", + "files": [], + "expectations": [ + "Correctly splits Gherkin-side vs PageObject-side work", + "Routes PageObject method rename to living-doc-pageobject-scan HEALING", + "Does not attempt to rename PO methods within gherkin-living-doc-sync" + ] } ] -} +} \ No newline at end of file diff --git a/skills/gherkin-living-doc-sync/evals/trigger-eval.json b/skills/gherkin-living-doc-sync/evals/trigger-eval.json index f0a2faf..f765e4a 100644 --- a/skills/gherkin-living-doc-sync/evals/trigger-eval.json +++ b/skills/gherkin-living-doc-sync/evals/trigger-eval.json @@ -1,22 +1,182 @@ [ - {"id": 1, "query": "Sync the checkout feature file to the living doc", "should_trigger": true, "reason": "'sync gherkin to living doc' trigger phrase"}, - {"id": 2, "query": "My feature file is out of sync with the living doc catalog", "should_trigger": true, "reason": "'feature file out of sync' trigger phrase"}, - {"id": 3, "query": "This scenario has no # AC: comment linking it to the living doc", "should_trigger": true, "reason": "'scenario not linked to AC' trigger phrase"}, - {"id": 4, "query": "The step text changed after the UI refactor — what needs updating?", "should_trigger": true, "reason": "'step text changed' trigger phrase"}, - {"id": 5, "query": "There is Gherkin drift between the feature files and the living doc", "should_trigger": true, "reason": "'gherkin drift' trigger phrase"}, - {"id": 6, "query": "I updated an AC in the living doc — how do I propagate that to the BDD scenario?", "should_trigger": true, "reason": "'update living doc after BDD change' and living-doc → feature file sync direction"}, - {"id": 7, "query": "Run a BDD sync between the feature files and living documentation", "should_trigger": true, "reason": "'BDD sync' trigger phrase"}, - {"id": 8, 
"query": "The AC link header is missing from several scenarios in checkout.feature", "should_trigger": true, "reason": "'AC link missing in feature file' trigger phrase"}, - {"id": 9, "query": "Sync all scenarios in the payments feature file", "should_trigger": true, "reason": "'sync scenarios' trigger phrase"}, - {"id": 10, "query": "The Gherkin scenarios are out of sync with the living doc", "should_trigger": true, "reason": "'gherkin out of sync with living doc' trigger phrase"}, - {"id": 11, "query": "Traceability is broken between the feature files and the AC catalog", "should_trigger": true, "reason": "'traceability broken' trigger phrase"}, - {"id": 12, "query": "Write a new scenario for the expired promo AC", "should_trigger": false, "reason": "Writing new scenarios — routes to living-doc-scenario-creator"}, - {"id": 13, "query": "Implement the step definition for 'When the customer confirms the order'", "should_trigger": false, "reason": "Step definition implementation — routes to gherkin-step"}, - {"id": 14, "query": "Find which User Stories have no Gherkin scenarios", "should_trigger": false, "reason": "Finding living doc gaps — routes to living-doc-gap-finder"}, - {"id": 15, "query": "Create a new User Story for the checkout capability", "should_trigger": false, "reason": "Creating new entities — routes to living-doc-create-user-story"}, - {"id": 16, "query": "Propagate AC changes from the living doc back to the feature files", "should_trigger": true, "reason": "'propagate AC changes' trigger phrase"}, - {"id": 17, "query": "The @AC: tag and the # AC: comment are out of sync — what do I do?", "should_trigger": true, "reason": "Comment/tag mismatch is a sync issue — core task of this skill"}, - {"id": 18, "query": "Generate a new scenario for the expired promo AC from scratch", "should_trigger": false, "reason": "Writing new scenarios from scratch — routes to living-doc-scenario-creator (not syncing existing ones)"}, - {"id": 19, "query": "Run 
scan_ac_links.py before doing a sync pass", "should_trigger": true, "reason": "Auditing AC link headers is the first step of the sync workflow — this skill owns scan_ac_links.py"}, - {"id": 20, "query": "An AC was descoped last sprint — what should happen to the linked scenario?", "should_trigger": true, "reason": "Propagating AC status change (descoped) to feature file is a living-doc → feature file sync direction"} -] + { + "id": 1, + "query": "Sync the checkout feature file to the living doc", + "should_trigger": true, + "reason": "'sync gherkin to living doc' trigger phrase" + }, + { + "id": 2, + "query": "My feature file is out of sync with the living doc catalog", + "should_trigger": true, + "reason": "'feature file out of sync' trigger phrase" + }, + { + "id": 3, + "query": "This scenario has no # AC: comment linking it to the living doc", + "should_trigger": true, + "reason": "'scenario not linked to AC' trigger phrase" + }, + { + "id": 4, + "query": "The step text changed after the UI refactor — what needs updating?", + "should_trigger": true, + "reason": "'step text changed' trigger phrase" + }, + { + "id": 5, + "query": "There is Gherkin drift between the feature files and the living doc", + "should_trigger": true, + "reason": "'gherkin drift' trigger phrase" + }, + { + "id": 6, + "query": "I updated an AC in the living doc — how do I propagate that to the BDD scenario?", + "should_trigger": true, + "reason": "'update living doc after BDD change' and living-doc → feature file sync direction" + }, + { + "id": 7, + "query": "Run a BDD sync between the feature files and living documentation", + "should_trigger": true, + "reason": "'BDD sync' trigger phrase" + }, + { + "id": 8, + "query": "The AC link header is missing from several scenarios in checkout.feature", + "should_trigger": true, + "reason": "'AC link missing in feature file' trigger phrase" + }, + { + "id": 9, + "query": "Sync all scenarios in the payments feature file", + "should_trigger": true, + 
"reason": "'sync scenarios' trigger phrase" + }, + { + "id": 10, + "query": "The Gherkin scenarios are out of sync with the living doc", + "should_trigger": true, + "reason": "'gherkin out of sync with living doc' trigger phrase" + }, + { + "id": 11, + "query": "Traceability is broken between the feature files and the AC catalog", + "should_trigger": true, + "reason": "'traceability broken' trigger phrase" + }, + { + "id": 12, + "query": "Write a new scenario for the expired promo AC", + "should_trigger": false, + "reason": "Writing new scenarios — routes to living-doc-scenario-creator" + }, + { + "id": 13, + "query": "Implement the step definition for 'When the customer confirms the order'", + "should_trigger": false, + "reason": "Step definition implementation — routes to gherkin-step" + }, + { + "id": 14, + "query": "Find which User Stories have no Gherkin scenarios", + "should_trigger": false, + "reason": "Finding living doc gaps — routes to living-doc-gap-finder" + }, + { + "id": 15, + "query": "Create a new User Story for the checkout capability", + "should_trigger": false, + "reason": "Creating new entities — routes to living-doc-create-user-story" + }, + { + "id": 16, + "query": "Propagate AC changes from the living doc back to the feature files", + "should_trigger": true, + "reason": "'propagate AC changes' trigger phrase" + }, + { + "id": 17, + "query": "The @AC: tag and the # AC: comment are out of sync — what do I do?", + "should_trigger": true, + "reason": "Comment/tag mismatch is a sync issue — core task of this skill" + }, + { + "id": 18, + "query": "Generate a new scenario for the expired promo AC from scratch", + "should_trigger": false, + "reason": "Writing new scenarios from scratch — routes to living-doc-scenario-creator (not syncing existing ones)" + }, + { + "id": 19, + "query": "Run scan_ac_links.py before doing a sync pass", + "should_trigger": true, + "reason": "Auditing AC link headers is the first step of the sync workflow — this skill 
owns scan_ac_links.py" + }, + { + "id": 20, + "query": "An AC was descoped last sprint — what should happen to the linked scenario?", + "should_trigger": true, + "reason": "Propagating AC status change (descoped) to feature file is a living-doc → feature file sync direction" + }, + { + "id": 21, + "query": "Write behave step definitions for the checkout scenarios", + "should_trigger": false, + "reason": "Writing step definitions — routes to gherkin-step" + }, + { + "id": 22, + "query": "Create a new User Story for the express checkout journey", + "should_trigger": false, + "reason": "Creating a User Story — routes to living-doc-create-user-story" + }, + { + "id": 23, + "query": "Run a full gap analysis to find undocumented behaviors", + "should_trigger": false, + "reason": "Gap analysis — routes to living-doc-gap-finder" + }, + { + "id": 24, + "query": "Scan the webapp and generate PageObjects for the checkout screen", + "should_trigger": false, + "reason": "PageObject scanning — routes to living-doc-pageobject-scan" + }, + { + "id": 25, + "query": "Create a new Functionality entity for promo stacking validation", + "should_trigger": false, + "reason": "Creating a Functionality — routes to living-doc-create-functionality" + }, + { + "id": 26, + "query": "Update the acceptance criterion wording on US-007-01", + "should_trigger": false, + "reason": "Updating an existing entity — routes to living-doc-update" + }, + { + "id": 27, + "query": "Create a Feature entity for the notifications service", + "should_trigger": false, + "reason": "Creating a Feature — routes to living-doc-create-feature" + }, + { + "id": 28, + "query": "What does PR #217 affect in the living doc?", + "should_trigger": false, + "reason": "Impact analysis — routes to living-doc-impact-analysis" + }, + { + "id": 29, + "query": "Generate BDD scenarios for all active ACs on US-009", + "should_trigger": false, + "reason": "Generating scenarios — routes to living-doc-scenario-creator" + }, + { + "id": 
30, + "query": "Add data-cy attributes to the checkout confirm button", + "should_trigger": false, + "reason": "Instrumenting templates with data-cy — routes to data-cy-instrument" + } +] \ No newline at end of file diff --git a/skills/gherkin-step/evals/evals.json b/skills/gherkin-step/evals/evals.json index 0d5080a..c431f19 100644 --- a/skills/gherkin-step/evals/evals.json +++ b/skills/gherkin-step/evals/evals.json @@ -108,13 +108,77 @@ "id": 9, "prompt": "My step is throwing AttributeError: \"Context\" object has no attribute \"checkout_page\". How do I fix this? The CheckoutPage class is in pages/checkout_page.py.", "expected_output": "Explanation that context.checkout_page must be initialized before the step runs, using a before_scenario hook in environment.py; shows the correct before_scenario pattern attaching a CheckoutPage instance to context.", - "files": [] + "files": [], + "category": "edge-case", + "expectations": [ + "Explains context.checkout_page must be assigned before the step runs", + "Points to a Before hook (environment.py or Cucumber Before) as the fix location", + "Shows the correct assignment: context.checkout_page = CheckoutPage(context.browser)", + "Does not suggest modifying the step function itself to work around the missing attribute" + ] }, { "id": 10, "prompt": "Should I name my behave step function step_when_the_customer_clicks_the_confirm_order_button or step_confirm_order?", "expected_output": "Recommends step_confirm_order - concise action-based name. 
Explains why verbose full-phrase names are discouraged: they duplicate the Gherkin text and make step files harder to scan.", - "files": [] + "files": [], + "category": "happy-path", + "expectations": [ + "Recommends the concise action-based name: step_confirm_order", + "Flags the verbose full-phrase name as an anti-pattern", + "Explains the reason: long names are hard to read and cause truncation in test output", + "Consistent with the step file naming convention: one file per domain" + ] + }, + { + "id": 11, + "category": "happy-path", + "prompt": "How do I set up AppWorld in Cucumber TypeScript so that each scenario gets a fresh Playwright browser context?", + "expected_output": "Define an AppWorld class implementing the World interface with a `page` property and a `browser` property. In the constructor, record `this.browser = browser`. In a Before hook, call `this.page = await this.browser.newPage()`. Register with `setWorldConstructor(AppWorld)`. Each scenario gets a fresh context automatically.", + "files": [], + "expectations": [ + "Shows AppWorld class with World interface implementation", + "Includes setWorldConstructor(AppWorld) registration", + "Before hook creates new page and assigns to this.page", + "After hook closes page" + ] + }, + { + "id": 12, + "category": "happy-path", + "prompt": "Show me a complete Cucumber TypeScript World setup for a Playwright test suite — I need the AppWorld interface, the class, and the Before/After hooks.", + "expected_output": "Provide: (1) AppWorld interface with page and browser properties, (2) AppWorld class implementing it with browser injection in constructor, (3) setWorldConstructor(AppWorld), (4) Before hook: this.page = await this.browser.newPage(), (5) After hook: await this.page?.close().", + "files": [], + "expectations": [ + "AppWorld interface shown with correct property types", + "setWorldConstructor called with the class", + "Before and After hooks shown", + "No hardcoded browser creation inside the 
class" + ] + }, + { + "id": 13, + "category": "regression", + "prompt": "My Cucumber TypeScript World setup creates a new browser in the constructor with `playwright.chromium.launch()`. What is wrong with this?", + "expected_output": "The browser should be injected via the World constructor parameter `{ browser }`, not created inside the class. Creating a browser in the constructor means each scenario launches a separate browser process, bypassing Cucumber's browser management. Use `this.browser = browser` and create only a new page (`this.browser.newPage()`) in the Before hook.", + "files": [], + "expectations": [ + "Identifies the anti-pattern: launching browser inside constructor", + "Explains the correct pattern: inject browser via constructor parameter", + "Shows Before hook creating a new page instead" + ] + }, + { + "id": 14, + "category": "happy-path", + "prompt": "What is the correct file naming convention for Cucumber TypeScript step definition files? I have one large steps.ts file right now.", + "expected_output": "Use one step file per domain, named after the domain: e.g. `checkout.steps.ts`, `login.steps.ts`. 
Never use a generic `steps.ts` filename — it makes it hard to locate step definitions during debugging and conflicts when multiple domains are merged.", + "files": [], + "expectations": [ + "Recommends per-domain file naming", + "Shows example: checkout.steps.ts, login.steps.ts", + "Flags generic steps.ts as an anti-pattern" + ] } ] } \ No newline at end of file diff --git a/skills/gherkin-step/evals/trigger-eval.json b/skills/gherkin-step/evals/trigger-eval.json index 4217756..8a821b8 100644 --- a/skills/gherkin-step/evals/trigger-eval.json +++ b/skills/gherkin-step/evals/trigger-eval.json @@ -99,14 +99,132 @@ "id": 17, "query": "Write a unit test for the discount calculation function", "should_trigger": false, - "reason": "Unit test request \u2014 out of scope for this toolkit (no test-unit-write skill defined)" + "reason": "Unit test request — out of scope for this toolkit (no test-unit-write skill defined)" }, { "query": "How do I initialize the CheckoutPage in behave so that context.checkout_page is available in my When and Then step definitions?", - "should_trigger": true + "should_trigger": true, + "id": 18, + "reason": "Initialising World context in behave — routes to gherkin-step" + }, + { + "query": "My step function is called step_when_the_customer_clicks_the_submit_order_button — is that the right naming convention for behave?", + "should_trigger": true, + "id": 19, + "reason": "Step function naming convention for behave — routes to gherkin-step" + }, + { + "id": 20, + "query": "How do I set up AppWorld in Cucumber TypeScript for Playwright integration?", + "should_trigger": true, + "reason": "World/AppWorld setup for Cucumber TypeScript — gherkin-step owns this" + }, + { + "id": 21, + "query": "How do I use setWorldConstructor to register a custom World with a Playwright browser?", + "should_trigger": true, + "reason": "Registering World constructor in Cucumber TypeScript — gherkin-step" + }, + { + "id": 22, + "query": "Write a Gherkin scenario for the 
checkout flow", + "should_trigger": false, + "reason": "Writing scenarios — routes to living-doc-scenario-creator" + }, + { + "id": 23, + "query": "Create a User Story for the payment capability", + "should_trigger": false, + "reason": "Creating a User Story — routes to living-doc-create-user-story" + }, + { + "id": 24, + "query": "Sync the feature files with the living doc after AC changes", + "should_trigger": false, + "reason": "Feature file / AC sync — routes to gherkin-living-doc-sync" }, { - "query": "My step function is called step_when_the_customer_clicks_the_submit_order_button \u2014 is that the right naming convention for behave?", - "should_trigger": true + "id": 25, + "query": "Run a gap analysis on our test coverage", + "should_trigger": false, + "reason": "Gap analysis — routes to living-doc-gap-finder" + }, + { + "id": 26, + "query": "Scan the webapp and generate PageObjects for all screens", + "should_trigger": false, + "reason": "PageObject scanning — routes to living-doc-pageobject-scan" + }, + { + "id": 27, + "query": "Document the atomic behavior for cart validation in the living doc", + "should_trigger": false, + "reason": "Creating a Functionality — routes to living-doc-create-functionality" + }, + { + "id": 28, + "query": "Update the wording of AC-1 on US-042", + "should_trigger": false, + "reason": "Updating a living doc entity — routes to living-doc-update" + }, + { + "id": 29, + "query": "What does PR #217 affect in the living doc?", + "should_trigger": false, + "reason": "Impact analysis — routes to living-doc-impact-analysis" + }, + { + "id": 30, + "query": "Create a Feature entity for the orders service", + "should_trigger": false, + "reason": "Creating a Feature — routes to living-doc-create-feature" + }, + { + "id": 31, + "query": "Generate a feature file for all active ACs in US-007", + "should_trigger": false, + "reason": "Generating scenarios from User Story — routes to living-doc-scenario-creator" + }, + { + "id": 32, + "query": 
"Add data-cy attributes to the confirm button template", + "should_trigger": false, + "reason": "Instrumenting templates — routes to data-cy-instrument" + }, + { + "id": 33, + "query": "Find all User Stories that have no BDD scenarios", + "should_trigger": false, + "reason": "Gap analysis — routes to living-doc-gap-finder" + }, + { + "id": 34, + "query": "Delete BDD artifacts linked to the deprecated checkout feature", + "should_trigger": false, + "reason": "BDD artifact cleanup — routes to bdd-maintain" + }, + { + "id": 35, + "query": "Create a new User Story for the express checkout journey", + "should_trigger": false, + "reason": "Creating a User Story — routes to living-doc-create-user-story" + }, + { + "id": 36, + "query": "Fix the @AC: traceability tags in the checkout feature file", + "should_trigger": false, + "reason": "@AC: tag sync — routes to gherkin-living-doc-sync" + }, + { + "id": 37, + "query": "Deprecate the checkout feature in the living doc", + "should_trigger": false, + "reason": "Deprecating an entity — routes to living-doc-update" + }, + { + "id": 38, + "query": "Crawl the UI to discover all screens and generate page objects", + "should_trigger": false, + "reason": "UI crawl and PageObject creation — routes to living-doc-pageobject-scan" } ] \ No newline at end of file diff --git a/skills/living-doc-create-feature/evals/evals.json b/skills/living-doc-create-feature/evals/evals.json index c48e1c5..c3e9bda 100644 --- a/skills/living-doc-create-feature/evals/evals.json +++ b/skills/living-doc-create-feature/evals/evals.json @@ -185,6 +185,19 @@ "Warns that identical names break impact analysis and traceability", "Notes that FEAT ID must also be unique" ] + }, + { + "id": 15, + "category": "regression", + "prompt": "I need to rename the Feature FEAT-checkout to FEAT-checkout-v2 because we split the checkout domain. 
What steps do I need to follow?", + "expected_output": "Renaming a Feature requires a cascade: (1) Update the entity file (id and name fields). (2) Update feature_id in all linked Functionality entities. (3) Update feature_registry. (4) Update manifest.json / seed.yaml. (5) Update PageObject file headers. (6) Update Gherkin # Feature: headers. (7) Run living-doc-gap-finder to confirm no orphan references remain. Load living-doc-update and follow the 'Rename a Feature' workflow.", + "files": [], + "expectations": [ + "Lists all 7 cascade steps for a Feature rename", + "Mentions living-doc-update as the skill that owns the rename workflow", + "Notes feature_registry and Functionality.feature_id must be updated", + "Recommends gap-finder run at the end to confirm clean state" + ] } ] -} +} \ No newline at end of file diff --git a/skills/living-doc-create-feature/evals/trigger-eval.json b/skills/living-doc-create-feature/evals/trigger-eval.json index 817232e..faedd5f 100644 --- a/skills/living-doc-create-feature/evals/trigger-eval.json +++ b/skills/living-doc-create-feature/evals/trigger-eval.json @@ -1,20 +1,164 @@ [ - {"id": 1, "query": "Document the checkout page as a Feature entity", "should_trigger": true, "reason": "Explicit 'document feature entity' trigger phrase"}, - {"id": 2, "query": "Create a Feature entity for the Orders API", "should_trigger": true, "reason": "Explicit 'create a feature entity' trigger keyword"}, - {"id": 3, "query": "New screen documentation for the account preferences page", "should_trigger": true, "reason": "'new screen documentation' trigger phrase"}, - {"id": 4, "query": "Document a new API endpoint — the payment initiation endpoint", "should_trigger": true, "reason": "'document an API endpoint' trigger phrase"}, - {"id": 5, "query": "Update the feature registry with the new notification service", "should_trigger": true, "reason": "'feature registry' trigger keyword"}, - {"id": 6, "query": "What feature owns the checkout screen?", 
"should_trigger": true, "reason": "'what feature owns this' trigger phrase"}, - {"id": 7, "query": "Map User Story US-007 to its Feature", "should_trigger": true, "reason": "'map user story to feature' trigger phrase"}, - {"id": 8, "query": "I need to document the discount engine as a system surface", "should_trigger": true, "reason": "Documenting a system surface — Feature creation workflow"}, - {"id": 9, "query": "Create a feature entity for the authentication module", "should_trigger": true, "reason": "Explicit 'create feature entity' trigger"}, - {"id": 10, "query": "What are the owners and dependencies for the checkout feature?", "should_trigger": true, "reason": "Asking about Feature properties — skill can populate or validate"}, - {"id": 11, "query": "Create a user story for the checkout capability", "should_trigger": false, "reason": "User Story creation — routes to living-doc-create-user-story"}, - {"id": 12, "query": "Document the atomic behavior: validate cart before checkout", "should_trigger": false, "reason": "Atomic behavior — routes to living-doc-create-functionality"}, - {"id": 13, "query": "Scan the checkout page for PageObjects", "should_trigger": false, "reason": "UI scan — routes to living-doc-pageobject-scan"}, - {"id": 14, "query": "Generate Gherkin scenarios for the checkout User Story", "should_trigger": false, "reason": "Scenario creation — routes to living-doc-scenario-creator"}, - {"id": 15, "query": "Register the notification background worker in the living doc as a system surface", "should_trigger": true, "reason": "Documenting a background worker as a system surface — Feature creation (surface_type=Worker)"}, - {"id": 16, "query": "Deprecate the checkout feature in the living doc", "should_trigger": false, "reason": "Deprecating an existing entity — routes to living-doc-update"}, - {"id": 17, "query": "Document the Orders Service — it exposes a REST API to place and cancel orders", "should_trigger": true, "reason": "Documenting a 
backend service surface — Feature creation (surface_type=Service or API)"}, - {"id": 18, "query": "Two Features have the same name 'Payment Page' — how do I resolve this?", "should_trigger": true, "reason": "Duplicate Feature name resolution is part of the Feature creation workflow"} -] + { + "id": 1, + "query": "Document the checkout page as a Feature entity", + "should_trigger": true, + "reason": "Explicit 'document feature entity' trigger phrase" + }, + { + "id": 2, + "query": "Create a Feature entity for the Orders API", + "should_trigger": true, + "reason": "Explicit 'create a feature entity' trigger keyword" + }, + { + "id": 3, + "query": "New screen documentation for the account preferences page", + "should_trigger": true, + "reason": "'new screen documentation' trigger phrase" + }, + { + "id": 4, + "query": "Document a new API endpoint — the payment initiation endpoint", + "should_trigger": true, + "reason": "'document an API endpoint' trigger phrase" + }, + { + "id": 5, + "query": "Update the feature registry with the new notification service", + "should_trigger": true, + "reason": "'feature registry' trigger keyword" + }, + { + "id": 6, + "query": "What feature owns the checkout screen?", + "should_trigger": true, + "reason": "'what feature owns this' trigger phrase" + }, + { + "id": 7, + "query": "Map User Story US-007 to its Feature", + "should_trigger": true, + "reason": "'map user story to feature' trigger phrase" + }, + { + "id": 8, + "query": "I need to document the discount engine as a system surface", + "should_trigger": true, + "reason": "Documenting a system surface — Feature creation workflow" + }, + { + "id": 9, + "query": "Create a feature entity for the authentication module", + "should_trigger": true, + "reason": "Explicit 'create feature entity' trigger" + }, + { + "id": 10, + "query": "What are the owners and dependencies for the checkout feature?", + "should_trigger": true, + "reason": "Asking about Feature properties — skill can 
populate or validate" + }, + { + "id": 11, + "query": "Create a user story for the checkout capability", + "should_trigger": false, + "reason": "User Story creation — routes to living-doc-create-user-story" + }, + { + "id": 12, + "query": "Document the atomic behavior: validate cart before checkout", + "should_trigger": false, + "reason": "Atomic behavior — routes to living-doc-create-functionality" + }, + { + "id": 13, + "query": "Scan the checkout page for PageObjects", + "should_trigger": false, + "reason": "UI scan — routes to living-doc-pageobject-scan" + }, + { + "id": 14, + "query": "Generate Gherkin scenarios for the checkout User Story", + "should_trigger": false, + "reason": "Scenario creation — routes to living-doc-scenario-creator" + }, + { + "id": 15, + "query": "Register the notification background worker in the living doc as a system surface", + "should_trigger": true, + "reason": "Documenting a background worker as a system surface — Feature creation (surface_type=Worker)" + }, + { + "id": 16, + "query": "Deprecate the checkout feature in the living doc", + "should_trigger": false, + "reason": "Deprecating an existing entity — routes to living-doc-update" + }, + { + "id": 17, + "query": "Document the Orders Service — it exposes a REST API to place and cancel orders", + "should_trigger": true, + "reason": "Documenting a backend service surface — Feature creation (surface_type=Service or API)" + }, + { + "id": 18, + "query": "Two Features have the same name 'Payment Page' — how do I resolve this?", + "should_trigger": true, + "reason": "Duplicate Feature name resolution is part of the Feature creation workflow" + }, + { + "id": 19, + "query": "Write step definitions for the checkout scenarios", + "should_trigger": false, + "reason": "Writing step definitions — routes to gherkin-step" + }, + { + "id": 20, + "query": "Run a dead code audit to find unused step definitions", + "should_trigger": false, + "reason": "BDD artifact cleanup — routes to 
bdd-maintain" + }, + { + "id": 21, + "query": "Run a gap analysis on the living doc", + "should_trigger": false, + "reason": "Gap analysis — routes to living-doc-gap-finder" + }, + { + "id": 22, + "query": "Sync the feature files with the updated AC catalog", + "should_trigger": false, + "reason": "Feature file sync — routes to gherkin-living-doc-sync" + }, + { + "id": 23, + "query": "What does PR #217 affect in the living doc?", + "should_trigger": false, + "reason": "Impact analysis — routes to living-doc-impact-analysis" + }, + { + "id": 24, + "query": "Add data-cy attributes to the login form", + "should_trigger": false, + "reason": "Instrumenting templates — routes to data-cy-instrument" + }, + { + "id": 25, + "query": "Generate BDD scenarios for all ACs on US-007", + "should_trigger": false, + "reason": "Generating scenarios — routes to living-doc-scenario-creator" + }, + { + "id": 26, + "query": "Create a new Functionality for the promo validation logic", + "should_trigger": false, + "reason": "Creating a Functionality — routes to living-doc-create-functionality" + }, + { + "id": 27, + "query": "I need to rename FEAT-checkout to FEAT-checkout-v2 — what are all the cascade steps?", + "should_trigger": true, + "reason": "Renaming a Feature — create-feature owns the Feature schema and triggers rename guidance via living-doc-update" + } +] \ No newline at end of file diff --git a/skills/living-doc-create-functionality/evals/evals.json b/skills/living-doc-create-functionality/evals/evals.json index 2213e4f..e7a3348 100644 --- a/skills/living-doc-create-functionality/evals/evals.json +++ b/skills/living-doc-create-functionality/evals/evals.json @@ -161,6 +161,18 @@ "Includes an error-case AC with an explicit error code", "Explains: every AC must state an exact outcome" ] + }, + { + "id": 13, + "category": "regression", + "prompt": "I just saved FUNC-promo-validate to the living doc. 
What else do I need to do to make it fully linked?", + "expected_output": "After saving the Functionality entity, load living-doc-update and append 'FUNC-promo-validate' to the parent Feature's 'functionalities' array. An unlinked Functionality will be flagged as ORPHAN_FUNCTIONALITY by living-doc-gap-finder. The link must be added to the Feature entity — the Functionality alone is not sufficient.", + "files": [], + "expectations": [ + "Identifies the required parent Feature update step", + "Routes to living-doc-update to perform the append", + "Mentions ORPHAN_FUNCTIONALITY gap type as the consequence of skipping" + ] } ] -} +} \ No newline at end of file diff --git a/skills/living-doc-create-functionality/evals/trigger-eval.json b/skills/living-doc-create-functionality/evals/trigger-eval.json index e2bb420..f50c42e 100644 --- a/skills/living-doc-create-functionality/evals/trigger-eval.json +++ b/skills/living-doc-create-functionality/evals/trigger-eval.json @@ -1,24 +1,200 @@ [ - {"id": 1, "query": "Create a functionality for validating cart contents before checkout", "should_trigger": true, "reason": "Explicit 'create a functionality' trigger keyword"}, - {"id": 2, "query": "Document the atomic behavior: apply discount to a cart item", "should_trigger": true, "reason": "'document an atomic behavior' trigger phrase"}, - {"id": 3, "query": "Write Functionality ACs for the discount engine", "should_trigger": true, "reason": "'functionality AC' trigger phrase"}, - {"id": 4, "query": "Define a unit-testable behavior for the coupon validation module", "should_trigger": true, "reason": "'unit-testable behavior' and 'define component behavior' trigger phrases"}, - {"id": 5, "query": "Document the business rule: orders over $100 get free shipping", "should_trigger": true, "reason": "'document a business rule' trigger phrase"}, - {"id": 6, "query": "Create a functionality entity for the payment retry logic", "should_trigger": true, "reason": "'create a functionality 
entity' trigger phrase"}, - {"id": 7, "query": "What ACs should I write for the email validator function?", "should_trigger": true, "reason": "Asking for atomic AC writing — core functionality skill task"}, - {"id": 8, "query": "What test_type should I use for checking DB uniqueness constraints?", "should_trigger": true, "reason": "Deciding unit vs integration — functionality skill task"}, - {"id": 9, "query": "Review this functionality for completeness — it only has a happy path", "should_trigger": true, "reason": "Completeness check of Functionality ACs is a core task"}, - {"id": 10, "query": "I see this AC in both US-001 and US-007 — should I split it out?", "should_trigger": true, "reason": "Reuse candidate identification — a core functionality skill task"}, - {"id": 11, "query": "Create a user story for the checkout capability", "should_trigger": false, "reason": "User Story — routes to living-doc-create-user-story"}, - {"id": 12, "query": "Document the checkout page as a Feature", "should_trigger": false, "reason": "Feature entity — routes to living-doc-create-feature"}, - {"id": 13, "query": "Generate BDD scenarios for US-001", "should_trigger": false, "reason": "Scenario generation — routes to living-doc-scenario-creator"}, - {"id": 14, "query": "Run a gap analysis on the living documentation", "should_trigger": false, "reason": "Gap analysis — routes to living-doc-gap-finder"}, - {"id": 15, "query": "How should I define the component behavior for the payment validator?", "should_trigger": true, "reason": "'define component behavior' trigger phrase"}, - {"id": 16, "query": "Write atomic acceptance criteria for the session expiry logic", "should_trigger": true, "reason": "'atomic acceptance criteria' trigger phrase"}, - {"id": 17, "query": "Should this behavior be tested with a unit test or an integration test?", "should_trigger": true, "reason": "'unit vs integration test' trigger phrase"}, - {"id": 18, "query": "Help me choose test type for the loyalty 
points calculation — it calls no external services", "should_trigger": true, "reason": "'choose test type' trigger phrase"}, - {"id": 19, "query": "Help me document the null-check rule for user IDs in the registration service", "should_trigger": true, "reason": "Documenting an atomic validation rule is a Functionality — 'document a business rule' / 'atomic acceptance criteria' pattern"}, - {"id": 20, "query": "A Functionality I wrote has no parent Feature — how do I link it?", "should_trigger": true, "reason": "Resolving ORPHAN_FUNCTIONALITY — identifying and linking the parent Feature is a Functionality skill task"}, - {"id": 21, "query": "Update the living doc entity for the discount validation Functionality", "should_trigger": false, "reason": "Updating an existing entity — routes to living-doc-update"}, - {"id": 22, "query": "Scan the checkout page for UI elements", "should_trigger": false, "reason": "UI scan — routes to living-doc-pageobject-scan"} -] + { + "id": 1, + "query": "Create a functionality for validating cart contents before checkout", + "should_trigger": true, + "reason": "Explicit 'create a functionality' trigger keyword" + }, + { + "id": 2, + "query": "Document the atomic behavior: apply discount to a cart item", + "should_trigger": true, + "reason": "'document an atomic behavior' trigger phrase" + }, + { + "id": 3, + "query": "Write Functionality ACs for the discount engine", + "should_trigger": true, + "reason": "'functionality AC' trigger phrase" + }, + { + "id": 4, + "query": "Define a unit-testable behavior for the coupon validation module", + "should_trigger": true, + "reason": "'unit-testable behavior' and 'define component behavior' trigger phrases" + }, + { + "id": 5, + "query": "Document the business rule: orders over $100 get free shipping", + "should_trigger": true, + "reason": "'document a business rule' trigger phrase" + }, + { + "id": 6, + "query": "Create a functionality entity for the payment retry logic", + "should_trigger": 
true, + "reason": "'create a functionality entity' trigger phrase" + }, + { + "id": 7, + "query": "What ACs should I write for the email validator function?", + "should_trigger": true, + "reason": "Asking for atomic AC writing — core functionality skill task" + }, + { + "id": 8, + "query": "What test_type should I use for checking DB uniqueness constraints?", + "should_trigger": true, + "reason": "Deciding unit vs integration — functionality skill task" + }, + { + "id": 9, + "query": "Review this functionality for completeness — it only has a happy path", + "should_trigger": true, + "reason": "Completeness check of Functionality ACs is a core task" + }, + { + "id": 10, + "query": "I see this AC in both US-001 and US-007 — should I split it out?", + "should_trigger": true, + "reason": "Reuse candidate identification — a core functionality skill task" + }, + { + "id": 11, + "query": "Create a user story for the checkout capability", + "should_trigger": false, + "reason": "User Story — routes to living-doc-create-user-story" + }, + { + "id": 12, + "query": "Document the checkout page as a Feature", + "should_trigger": false, + "reason": "Feature entity — routes to living-doc-create-feature" + }, + { + "id": 13, + "query": "Generate BDD scenarios for US-001", + "should_trigger": false, + "reason": "Scenario generation — routes to living-doc-scenario-creator" + }, + { + "id": 14, + "query": "Run a gap analysis on the living documentation", + "should_trigger": false, + "reason": "Gap analysis — routes to living-doc-gap-finder" + }, + { + "id": 15, + "query": "How should I define the component behavior for the payment validator?", + "should_trigger": true, + "reason": "'define component behavior' trigger phrase" + }, + { + "id": 16, + "query": "Write atomic acceptance criteria for the session expiry logic", + "should_trigger": true, + "reason": "'atomic acceptance criteria' trigger phrase" + }, + { + "id": 17, + "query": "Should this behavior be tested with a unit test or 
an integration test?", + "should_trigger": true, + "reason": "'unit vs integration test' trigger phrase" + }, + { + "id": 18, + "query": "Help me choose test type for the loyalty points calculation — it calls no external services", + "should_trigger": true, + "reason": "'choose test type' trigger phrase" + }, + { + "id": 19, + "query": "Help me document the null-check rule for user IDs in the registration service", + "should_trigger": true, + "reason": "Documenting an atomic validation rule is a Functionality — 'document a business rule' / 'atomic acceptance criteria' pattern" + }, + { + "id": 20, + "query": "A Functionality I wrote has no parent Feature — how do I link it?", + "should_trigger": true, + "reason": "Resolving ORPHAN_FUNCTIONALITY — identifying and linking the parent Feature is a Functionality skill task" + }, + { + "id": 21, + "query": "Update the living doc entity for the discount validation Functionality", + "should_trigger": false, + "reason": "Updating an existing entity — routes to living-doc-update" + }, + { + "id": 22, + "query": "Scan the checkout page for UI elements", + "should_trigger": false, + "reason": "UI scan — routes to living-doc-pageobject-scan" + }, + { + "id": 23, + "query": "Write step definitions for the cart validation behavior", + "should_trigger": false, + "reason": "Writing step definitions — routes to gherkin-step" + }, + { + "id": 24, + "query": "Run a dead code audit to find unused PageObject methods", + "should_trigger": false, + "reason": "BDD artifact cleanup — routes to bdd-maintain" + }, + { + "id": 25, + "query": "Sync the feature files with the updated AC catalog", + "should_trigger": false, + "reason": "Feature file sync — routes to gherkin-living-doc-sync" + }, + { + "id": 26, + "query": "What does changing the cart validator affect in the living doc?", + "should_trigger": false, + "reason": "Impact analysis — routes to living-doc-impact-analysis" + }, + { + "id": 27, + "query": "Scan the checkout page for UI 
elements and generate PageObjects", + "should_trigger": false, + "reason": "PageObject scanning — routes to living-doc-pageobject-scan" + }, + { + "id": 28, + "query": "Add data-cy attributes to the checkout form inputs", + "should_trigger": false, + "reason": "Instrumenting templates — routes to data-cy-instrument" + }, + { + "id": 29, + "query": "Generate BDD scenarios for all active ACs on US-007", + "should_trigger": false, + "reason": "Generating scenarios — routes to living-doc-scenario-creator" + }, + { + "id": 30, + "query": "Create a User Story for the cart validation capability", + "should_trigger": false, + "reason": "Creating a User Story — routes to living-doc-create-user-story" + }, + { + "id": 31, + "query": "Create a Feature entity for the checkout module", + "should_trigger": false, + "reason": "Creating a Feature — routes to living-doc-create-feature" + }, + { + "id": 32, + "query": "Run a gap analysis on the living documentation", + "should_trigger": false, + "reason": "Gap analysis — routes to living-doc-gap-finder" + }, + { + "id": 33, + "query": "I just created FUNC-promo-validate — how do I add it to the parent Feature's functionalities array?", + "should_trigger": true, + "reason": "Parent Feature sync after creating a Functionality — living-doc-create-functionality owns this step" + } +] \ No newline at end of file diff --git a/skills/living-doc-create-user-story/evals/evals.json b/skills/living-doc-create-user-story/evals/evals.json index 48ae98b..356d689 100644 --- a/skills/living-doc-create-user-story/evals/evals.json +++ b/skills/living-doc-create-user-story/evals/evals.json @@ -163,6 +163,18 @@ "Both User Stories link to the shared Functionality", "Points to living-doc-create-functionality for the extraction" ] + }, + { + "id": 13, + "category": "happy-path", + "prompt": "I'm creating US-015 for the promo stacking feature. FUNC-promo-validate and FUNC-promo-stack already exist in the living doc. 
Should I link them?", + "expected_output": "Yes — link both Functionalities in the User Story's 'functionalities' array. Asking about existing Functionalities during US creation prevents ORPHAN_FUNCTIONALITY gaps and makes the entity graph traversable from US down to test coverage. If the Functionalities are relevant, they must be linked.", + "files": [], + "expectations": [ + "Confirms the Functionalities should be linked", + "Explains the ORPHAN_FUNCTIONALITY consequence if skipped", + "Shows how to add them to the functionalities array in the entity" + ] } ] -} +} \ No newline at end of file diff --git a/skills/living-doc-create-user-story/evals/trigger-eval.json b/skills/living-doc-create-user-story/evals/trigger-eval.json index 4a75a21..e1ae24a 100644 --- a/skills/living-doc-create-user-story/evals/trigger-eval.json +++ b/skills/living-doc-create-user-story/evals/trigger-eval.json @@ -1,20 +1,164 @@ [ - {"id": 1, "query": "Create a user story for the password reset feature", "should_trigger": true, "reason": "Explicit 'create a user story' trigger keyword"}, - {"id": 2, "query": "Write acceptance criteria for the login capability", "should_trigger": true, "reason": "Explicit 'write acceptance criteria' trigger keyword"}, - {"id": 3, "query": "I need a new user story — a customer wants to track their delivery", "should_trigger": true, "reason": "Explicit 'new user story' trigger keyword"}, - {"id": 4, "query": "As a customer I want to view my order history", "should_trigger": true, "reason": "As-a narrative format triggers US elicitation"}, - {"id": 5, "query": "Help me document a business requirement for promo codes", "should_trigger": true, "reason": "'document a business requirement' trigger phrase"}, - {"id": 6, "query": "I need to define US AC for the checkout flow", "should_trigger": true, "reason": "'define US AC' trigger phrase"}, - {"id": 7, "query": "User story template for a SaaS onboarding feature", "should_trigger": true, "reason": "'user story 
template' trigger phrase"}, - {"id": 8, "query": "Elicit requirements for the notifications feature", "should_trigger": true, "reason": "'elicit requirements' trigger phrase"}, - {"id": 9, "query": "Review this user story and tell me what ACs are missing", "should_trigger": true, "reason": "Reviewing US ACs is part of this skill's completeness check"}, - {"id": 10, "query": "Is my narrative well formed? 'As a system I can process payments'", "should_trigger": true, "reason": "Validating a narrative is a core skill task"}, - {"id": 11, "query": "Document the checkout page as a Feature entity", "should_trigger": false, "reason": "Feature entity creation — routes to living-doc-create-feature"}, - {"id": 12, "query": "Document the atomic behavior: validate cart is not empty", "should_trigger": false, "reason": "Atomic behavior is a Functionality — routes to living-doc-create-functionality"}, - {"id": 13, "query": "Generate BDD scenarios for US-001", "should_trigger": false, "reason": "Scenario generation — routes to living-doc-scenario-creator"}, - {"id": 14, "query": "What test gaps exist in our living documentation?", "should_trigger": false, "reason": "Gap analysis — routes to living-doc-gap-finder"}, - {"id": 15, "query": "I want to write requirements for the loyalty points feature", "should_trigger": true, "reason": "'document a business requirement' pattern — User Story elicitation"}, - {"id": 16, "query": "Is my user story well-formed? 
Here it is: 'As a system, process the payment'", "should_trigger": true, "reason": "Validating a User Story narrative is a core skill task"}, - {"id": 17, "query": "My I-want clause contains 'and' — is that OK for a User Story?", "should_trigger": true, "reason": "Reviewing User Story narrative correctness is a core task of this skill"}, - {"id": 18, "query": "Create a BDD scenario for the checkout User Story", "should_trigger": false, "reason": "Scenario creation from a User Story — routes to living-doc-scenario-creator"} -] + { + "id": 1, + "query": "Create a user story for the password reset feature", + "should_trigger": true, + "reason": "Explicit 'create a user story' trigger keyword" + }, + { + "id": 2, + "query": "Write acceptance criteria for the login capability", + "should_trigger": true, + "reason": "Explicit 'write acceptance criteria' trigger keyword" + }, + { + "id": 3, + "query": "I need a new user story — a customer wants to track their delivery", + "should_trigger": true, + "reason": "Explicit 'new user story' trigger keyword" + }, + { + "id": 4, + "query": "As a customer I want to view my order history", + "should_trigger": true, + "reason": "As-a narrative format triggers US elicitation" + }, + { + "id": 5, + "query": "Help me document a business requirement for promo codes", + "should_trigger": true, + "reason": "'document a business requirement' trigger phrase" + }, + { + "id": 6, + "query": "I need to define US AC for the checkout flow", + "should_trigger": true, + "reason": "'define US AC' trigger phrase" + }, + { + "id": 7, + "query": "User story template for a SaaS onboarding feature", + "should_trigger": true, + "reason": "'user story template' trigger phrase" + }, + { + "id": 8, + "query": "Elicit requirements for the notifications feature", + "should_trigger": true, + "reason": "'elicit requirements' trigger phrase" + }, + { + "id": 9, + "query": "Review this user story and tell me what ACs are missing", + "should_trigger": true, + 
"reason": "Reviewing US ACs is part of this skill's completeness check" + }, + { + "id": 10, + "query": "Is my narrative well formed? 'As a system I can process payments'", + "should_trigger": true, + "reason": "Validating a narrative is a core skill task" + }, + { + "id": 11, + "query": "Document the checkout page as a Feature entity", + "should_trigger": false, + "reason": "Feature entity creation — routes to living-doc-create-feature" + }, + { + "id": 12, + "query": "Document the atomic behavior: validate cart is not empty", + "should_trigger": false, + "reason": "Atomic behavior is a Functionality — routes to living-doc-create-functionality" + }, + { + "id": 13, + "query": "Generate BDD scenarios for US-001", + "should_trigger": false, + "reason": "Scenario generation — routes to living-doc-scenario-creator" + }, + { + "id": 14, + "query": "What test gaps exist in our living documentation?", + "should_trigger": false, + "reason": "Gap analysis — routes to living-doc-gap-finder" + }, + { + "id": 15, + "query": "I want to write requirements for the loyalty points feature", + "should_trigger": true, + "reason": "'document a business requirement' pattern — User Story elicitation" + }, + { + "id": 16, + "query": "Is my user story well-formed? 
Here it is: 'As a system, process the payment'", + "should_trigger": true, + "reason": "Validating a User Story narrative is a core skill task" + }, + { + "id": 17, + "query": "My I-want clause contains 'and' — is that OK for a User Story?", + "should_trigger": true, + "reason": "Reviewing User Story narrative correctness is a core task of this skill" + }, + { + "id": 18, + "query": "Create a BDD scenario for the checkout User Story", + "should_trigger": false, + "reason": "Scenario creation from a User Story — routes to living-doc-scenario-creator" + }, + { + "id": 19, + "query": "Write step definitions for the login AC scenarios", + "should_trigger": false, + "reason": "Writing step definitions — routes to gherkin-step" + }, + { + "id": 20, + "query": "Delete all BDD artifacts linked to the deprecated US-007", + "should_trigger": false, + "reason": "BDD artifact cleanup — routes to bdd-maintain" + }, + { + "id": 21, + "query": "Sync the feature files with the updated living doc", + "should_trigger": false, + "reason": "Feature file sync — routes to gherkin-living-doc-sync" + }, + { + "id": 22, + "query": "What does PR #217 affect in the living doc?", + "should_trigger": false, + "reason": "Impact analysis — routes to living-doc-impact-analysis" + }, + { + "id": 23, + "query": "Scan the webapp and generate PageObjects for the login screen", + "should_trigger": false, + "reason": "PageObject scanning — routes to living-doc-pageobject-scan" + }, + { + "id": 24, + "query": "Add data-cy attributes to the login form inputs", + "should_trigger": false, + "reason": "Instrumenting templates — routes to data-cy-instrument" + }, + { + "id": 25, + "query": "Run a gap analysis to find User Stories without scenarios", + "should_trigger": false, + "reason": "Gap analysis — routes to living-doc-gap-finder" + }, + { + "id": 26, + "query": "Create a Feature entity for the authentication module", + "should_trigger": false, + "reason": "Creating a Feature — routes to 
living-doc-create-feature" + }, + { + "id": 27, + "query": "When creating this User Story, should I link the existing Functionality FUNC-promo-validate?", + "should_trigger": true, + "reason": "Linking existing Functionalities during US creation — living-doc-create-user-story owns this" + } +] \ No newline at end of file diff --git a/skills/living-doc-gap-finder/evals/evals.json b/skills/living-doc-gap-finder/evals/evals.json index 0a1d649..d703e84 100644 --- a/skills/living-doc-gap-finder/evals/evals.json +++ b/skills/living-doc-gap-finder/evals/evals.json @@ -5,7 +5,7 @@ "id": 1, "category": "happy-path", "prompt": "Run a gap analysis on our living documentation. File: evals/files/catalog-snapshot.json", - "expected_output": "Agent analyzes the snapshot and produces a gap report with: Blocker \u2014 US-001-AC-2 and US-001-AC-3 have no linked tests; Blocker \u2014 US-002-AC-1 and US-002-AC-2 have no linked tests; Blocker \u2014 all 4 ACs of US-007 have no linked tests; Blocker \u2014 FUNC-apply-discount has 5 ACs with no linked tests (Gap type 1 applies to Functionality ACs \u2014 report as UNTESTED_AC Blocker, not UNDOCUMENTED_FUNCTIONALITY); Important \u2014 /account/preferences screen discovered in webapp with no Feature entity (after normalisation: /account/orders \u2194 FEAT-account, /reports/legacy \u2194 FEAT-orphan are already documented); Important \u2014 FEAT-promo and FEAT-orphan each have no linked User Stories (orphan Features); Important \u2014 US-007 has no linked Feature (orphan User Story); Important \u2014 test_order_history.py, test_login_flow.feature, and the 'View paginated order history' BDD scenario have no linked ACs (orphan tests); Nit \u2014 FEAT-account and FEAT-orphan have no Functionalities defined (empty Features). 
Documentation coverage reported separately for US ACs and Functionality ACs.", + "expected_output": "Agent analyzes the snapshot and produces a gap report with: Blocker — US-001-AC-2 and US-001-AC-3 have no linked tests; Blocker — US-002-AC-1 and US-002-AC-2 have no linked tests; Blocker — all 4 ACs of US-007 have no linked tests; Blocker — FUNC-apply-discount has 5 ACs with no linked tests (Gap type 1 applies to Functionality ACs — report as UNTESTED_AC Blocker, not UNDOCUMENTED_FUNCTIONALITY); Important — /account/preferences screen discovered in webapp with no Feature entity (after normalisation: /account/orders ↔ FEAT-account, /reports/legacy ↔ FEAT-orphan are already documented); Important — FEAT-promo and FEAT-orphan each have no linked User Stories (orphan Features); Important — US-007 has no linked Feature (orphan User Story); Important — test_order_history.py, test_login_flow.feature, and the 'View paginated order history' BDD scenario have no linked ACs (orphan tests); Nit — FEAT-account and FEAT-orphan have no Functionalities defined (empty Features). 
Documentation coverage reported separately for US ACs and Functionality ACs.", "files": [ "evals/files/catalog-snapshot.json" ], @@ -13,14 +13,14 @@ "Identifies US-001-AC-2 and US-001-AC-3 as untested (Blockers)", "Identifies US-002-AC-1 and US-002-AC-2 as untested (Blockers)", "Identifies all 4 US-007 ACs as untested (Blockers)", - "Identifies FUNC-apply-discount ACs as untested (Blocker, not Nit \u2014 Gap type 1 applies to Functionality ACs)", + "Identifies FUNC-apply-discount ACs as untested (Blocker, not Nit — Gap type 1 applies to Functionality ACs)", "Identifies /account/preferences as undocumented surface (Important)", "Identifies FEAT-promo as orphan Feature (Important)", "Identifies FEAT-orphan as orphan Feature (Important)", "Identifies test_order_history.py and test_login_flow.feature as orphan tests (Important)", "Identifies 'View paginated order history' BDD scenario as orphan test (Important)", - "Identifies FEAT-account and FEAT-orphan as empty Features (Nit \u2014 no Functionalities)", - "Identifies US-007 as orphan User Story (Important \u2014 no linked Feature)", + "Identifies FEAT-account and FEAT-orphan as empty Features (Nit — no Functionalities)", + "Identifies US-007 as orphan User Story (Important — no linked Feature)", "Normalises undocumented surfaces: only /account/preferences is truly undocumented after matching existing Features", "Calculates documentation coverage percentage separately for US ACs and Functionality ACs" ] @@ -41,10 +41,10 @@ "id": 3, "category": "happy-path", "prompt": "A test file exists with no linked AC. What gap type is this and what should I do?", - "expected_output": "This is an orphan test (Gap type 6 \u2014 Important). Resolution options: (1) find an existing AC in the living doc that this test covers and add the link; (2) if no AC exists, create a Functionality entity for the behavior being tested using living-doc-create-functionality, then link the test to the new Functionality's AC. 
Never delete a test to resolve an orphan \u2014 that would remove coverage.", + "expected_output": "This is an orphan test (ORPHAN_TEST, Important). Resolution options: (1) find an existing AC in the living doc that this test covers and add the link; (2) if no AC exists, create a Functionality entity for the behavior being tested using living-doc-create-functionality, then link the test to the new Functionality's AC. Never delete a test to resolve an orphan — that would remove coverage.", "files": [], "expectations": [ - "Classifies as Gap type 6: ORPHAN_TEST", + "Classifies as ORPHAN_TEST", "Provides two resolution options: link to existing AC, or create new Functionality", "Explicitly warns against deleting the test" ] @@ -53,7 +53,7 @@ "id": 4, "category": "regression", "prompt": "We have 200 orphan tests from a legacy codebase. Should I run gap-finder on all of them at once?", - "expected_output": "Batch the gap-finder run by domain or Feature area rather than running across the entire codebase at once. Process the highest-risk areas first (payment, auth, security). For each batch: identify which Functionalities or User Stories the tests correspond to, create missing living doc entities, and link tests. Processing all 200 at once produces an unmanageable gap report \u2014 prioritise by business risk.", + "expected_output": "Batch the gap-finder run by domain or Feature area rather than running across the entire codebase at once. Process the highest-risk areas first (payment, auth, security). For each batch: identify which Functionalities or User Stories the tests correspond to, create missing living doc entities, and link tests. 
Processing all 200 at once produces an unmanageable gap report — prioritise by business risk.", "files": [], "expectations": [ "Recommends batching by domain or Feature area", @@ -65,7 +65,7 @@ "id": 5, "category": "negative", "prompt": "Create a new User Story for the account preferences screen.", - "expected_output": "Creating a User Story is not a gap-finder action \u2014 routes to living-doc-create-user-story. living-doc-gap-finder identifies and proposes new entities; the creation itself is delegated to the appropriate create-* skill.", + "expected_output": "Creating a User Story is not a gap-finder action — routes to living-doc-create-user-story. living-doc-gap-finder identifies and proposes new entities; the creation itself is delegated to the appropriate create-* skill.", "files": [], "expectations": [ "Does not create the User Story", @@ -90,10 +90,10 @@ "id": 7, "category": "edge-case", "prompt": "A test is linked to an AC, but the AC was deleted from the living doc. How is this classified and what should I do?", - "expected_output": "This is a broken-link gap (variant of Gap type 6: ORPHAN_TEST). The test references an AC ID that no longer exists. Resolution options: (1) If the behavior the test covers is still required, recreate the Functionality/AC entity and relink. (2) If the behavior has been removed, the test should be deleted after confirming with the product owner. (3) If the AC was merged into another entity, update the test's link comment to the new AC ID. Never delete a test without product owner confirmation.", + "expected_output": "This is a broken-link gap (ORPHAN_TEST variant — broken AC link). The test references an AC ID that no longer exists. Resolution options: (1) If the behavior the test covers is still required, recreate the Functionality/AC entity and relink. (2) If the behavior has been removed, the test should be deleted after confirming with the product owner. 
(3) If the AC was merged into another entity, update the test's link comment to the new AC ID. Never delete a test without product owner confirmation.", "files": [], "expectations": [ - "Classifies as broken-link orphan test", + "Classifies as ORPHAN_TEST (broken-link variant)", "Provides three resolution options", "Warns against deleting the test without product owner confirmation", "Notes the possibility of AC merge as a resolution path" @@ -103,24 +103,24 @@ "id": 8, "category": "output-format", "prompt": "Run a gap analysis and show me exactly what format the output report uses.", - "expected_output": "The gap report is emitted as structured JSON (or a formatted rendering of it) with a top-level `documentation_coverage` section (coverage_percentage, user_stories_with_full_coverage, user_stories_with_gaps) and a `gaps[]` array. Each gap item includes: id (GAP-NNN), type (one of UNTESTED_AC, UNDOCUMENTED_SURFACE, ORPHAN_FEATURE, ORPHAN_USER_STORY, ORPHAN_FUNCTIONALITY, ORPHAN_TEST, STALE_REFERENCE, UNDOCUMENTED_FUNCTIONALITY, EMPTY_FEATURE), severity (Blocker/Important/Nit), entity (the affected entity ID or path), description, and proposed_action. Gaps are ordered by severity (Blocker first, then Important, then Nit). The report is diagnostic only \u2014 no entity creation or modification is made.", + "expected_output": "The gap report is emitted as structured JSON (or a formatted rendering of it) with a top-level `documentation_coverage` section (coverage_percentage, user_stories_with_full_coverage, user_stories_with_gaps) and a `gaps[]` array. Each gap item includes: id (GAP-NNN), type (one of UNTESTED_AC, UNDOCUMENTED_SURFACE, ORPHAN_FEATURE, ORPHAN_USER_STORY, ORPHAN_FUNCTIONALITY, ORPHAN_TEST, STALE_REFERENCE, UNDOCUMENTED_FUNCTIONALITY, EMPTY_FEATURE), severity (Blocker/Important/Nit), entity (the affected entity ID or path), description, and proposed_action. Gaps are ordered by severity (Blocker first, then Important, then Nit). 
The report is diagnostic only — no entity creation or modification is made.", "files": [], "expectations": [ "Report includes top-level documentation_coverage section with coverage_percentage", "gaps[] array present; each item has id, type, severity, entity, description, proposed_action", "Gap type codes are canonical: UNTESTED_AC, ORPHAN_TEST, ORPHAN_FEATURE, etc.", "Gaps ordered by severity (Blocker before Important before Nit)", - "Diagnostic only \u2014 no entity creation or modification" + "Diagnostic only — no entity creation or modification" ] }, { "id": 9, "category": "regression", "prompt": "A test references AC:US-042-01 but that AC was deprecated last sprint. What gap type is this and how do I resolve it?", - "expected_output": "This is a stale reference (Gap type 7 \u2014 Important). The active test references a Deprecated AC. Resolution options: (1) update the test's link to the active replacement AC if the behavior was superseded; (2) reinstate the AC using living-doc-update if it was deprecated in error; (3) if the behavior was intentionally removed, delete the test after product owner confirmation. The test must not be deleted without product owner confirmation.", + "expected_output": "This is a stale reference (STALE_REFERENCE, Important). The active test references a Deprecated AC. Resolution options: (1) update the test's link to the active replacement AC if the behavior was superseded; (2) reinstate the AC using living-doc-update if it was deprecated in error; (3) if the behavior was intentionally removed, delete the test after product owner confirmation. 
The test must not be deleted without product owner confirmation.", "files": [], "expectations": [ - "Classifies as Gap type 7: STALE_REFERENCE", + "Classifies as STALE_REFERENCE", "Classifies severity as Important", "Provides three resolution options: relink to new AC, reinstate AC, or delete after PO confirmation", "Warns against deleting test without product owner confirmation", @@ -131,7 +131,7 @@ "id": 10, "category": "edge-case", "prompt": "We have 50 orphan tests and 30 untested ACs across the entire platform. Should I run a single all-domain gap report and work through everything at once?", - "expected_output": "No \u2014 use the two-phase strategy. Phase 1: ensure every User Story has at least one covered AC. List all User Stories with zero covered ACs, cover the first AC of each before moving on. This establishes a minimum traceability baseline. Phase 2: once every US has at least one covered AC, rank gap clusters by count, prioritise the highest-risk domains first (payment, auth, security), batch by Feature or domain, and iterate. Processing all 80 gaps at once produces an unmanageable report and obscures progress.", + "expected_output": "No — use the two-phase strategy. Phase 1: ensure every User Story has at least one covered AC. List all User Stories with zero covered ACs, cover the first AC of each before moving on. This establishes a minimum traceability baseline. Phase 2: once every US has at least one covered AC, rank gap clusters by count, prioritise the highest-risk domains first (payment, auth, security), batch by Feature or domain, and iterate. Processing all 80 gaps at once produces an unmanageable report and obscures progress.", "files": [], "expectations": [ "Recommends two-phase strategy over single full-pass", @@ -144,10 +144,10 @@ "id": 11, "category": "happy-path", "prompt": "A Functionality entity FUNC-promo-validate exists in the catalog but has no parent Feature linked. 
What gap type is this and what should I do?", - "expected_output": "This is an orphan Functionality (Gap type 5 \u2014 Important). A Functionality with no parent Feature is untraceable \u2014 it cannot be reached via the entity hierarchy and is missed in impact analyses. Resolution: identify or create the owning Feature and add FUNC-promo-validate to its functionalities list. If tests reference this Functionality's ACs, resolve those first (ORPHAN_TEST takes priority) before removing the Functionality.", + "expected_output": "This is an orphan Functionality (ORPHAN_FUNCTIONALITY, Important). A Functionality with no parent Feature is untraceable — it cannot be reached via the entity hierarchy and is missed in impact analyses. Resolution: identify or create the owning Feature and add FUNC-promo-validate to its functionalities list. If tests reference this Functionality's ACs, resolve those first (ORPHAN_TEST takes priority) before removing the Functionality.", "files": [], "expectations": [ - "Classifies as Gap type 5: ORPHAN_FUNCTIONALITY", + "Classifies as ORPHAN_FUNCTIONALITY", "Classifies severity as Important", "Advises linking to an existing Feature or creating one", "Warns: do not remove if tests reference this Functionality's ACs" @@ -157,7 +157,7 @@ "id": 12, "category": "regression", "prompt": "The gap-finder script reports /reports/legacy as an UNDOCUMENTED_SURFACE but there is already a Feature entity 'Legacy Report Screen' (FEAT-orphan) in the catalog. Should this be reported as a gap?", - "expected_output": "No. After normalisation, /reports/legacy is already documented \u2014 FEAT-orphan (Legacy Report Screen) clearly owns that surface by name and domain meaning. The skill instructs to treat a discovered screen as already documented when an existing Feature clearly owns the same surface by path, name, or domain meaning. Remove this item from the gap report. 
FEAT-orphan still has other gaps (orphan Feature, empty Feature) but UNDOCUMENTED_SURFACE is not one of them.", + "expected_output": "No. After normalisation, /reports/legacy is already documented — FEAT-orphan (Legacy Report Screen) clearly owns that surface by name and domain meaning. The skill instructs to treat a discovered screen as already documented when an existing Feature clearly owns the same surface by path, name, or domain meaning. Remove this item from the gap report. FEAT-orphan still has other gaps (orphan Feature, empty Feature) but UNDOCUMENTED_SURFACE is not one of them.", "files": [], "expectations": [ "Removes /reports/legacy from UNDOCUMENTED_SURFACE gaps after normalisation", @@ -168,15 +168,54 @@ }, { "id": 13, - "prompt": "A Feature entity FEAT-checkout exists in the living doc but its functionalities list is empty \u2014 no Functionality entities are linked to it. What gap type is this and what is the priority?", + "prompt": "A Feature entity FEAT-checkout exists in the living doc but its functionalities list is empty — no Functionality entities are linked to it. What gap type is this and what is the priority?", "expected_output": "Gap type EMPTY_FEATURE (not \"Gap type 9\"). Priority: Nit. Guidance: define Functionality entities for the behaviors this Feature owns using living-doc-create-functionality.", - "files": [] + "files": [], + "category": "edge-case", + "expectations": [ + "Classifies as EMPTY_FEATURE gap type", + "Priority is Nit (lowest severity — feature may still be valid but incomplete)", + "Recommends creating Functionality entities for the behaviors the Feature owns", + "Routes to living-doc-create-functionality for the fix" + ] }, { "id": 14, "prompt": "FEAT-checkout is linked to no User Stories at all. What gap type is this?", "expected_output": "Gap type ORPHAN_FEATURE (not \"Gap type 3\"). Priority: Important. 
Guidance: link at least one User Story to give the Feature traceable business value.", - "files": [] + "files": [], + "category": "edge-case", + "expectations": [ + "Classifies as ORPHAN_FEATURE gap type", + "Priority is Important (Feature is unreachable from any User Story)", + "Recommends linking at least one User Story to the Feature", + "Routes to living-doc-create-user-story or living-doc-update to create or link a US" + ] + }, + { + "id": 15, + "category": "happy-path", + "prompt": "What is the difference between AUDIT mode and PLAN mode in the gap finder?", + "expected_output": "AUDIT mode performs a full catalog audit — it processes all entities and test files to produce a gap report covering all 9 gap types. Use AUDIT before a release or when you want a comprehensive view. PLAN mode is bottom-up: it reads PageObject descriptions for a set of User Stories and drafts missing ACs directly from the PageObject element names. Use PLAN when you have PageObjects but no ACs yet.", + "files": [], + "expectations": [ + "Correctly defines AUDIT mode as full catalog audit", + "Correctly defines PLAN mode as bottom-up AC drafting from PageObject descriptions", + "Names compute_gaps.py for AUDIT and references PageObject descriptions for PLAN", + "Explains when to use each mode" + ] + }, + { + "id": 16, + "category": "happy-path", + "prompt": "I have a new User Story US-021 with no ACs yet. The PageObject for the relevant screen exists. Which gap-finder mode should I use and what happens?", + "expected_output": "Use PLAN mode. PLAN mode reads the PageObject element descriptions for the linked screen and drafts candidate ACs from the element names and interaction patterns. 
The output is a list of proposed ACs for review — you then accept, modify, or discard each one before adding them to US-021.", + "files": [], + "expectations": [ + "Recommends PLAN mode", + "Explains that PLAN mode derives ACs from PageObject descriptions", + "Output is draft/proposed ACs, not finalized ones" + ] } ] } \ No newline at end of file diff --git a/skills/living-doc-gap-finder/evals/trigger-eval.json b/skills/living-doc-gap-finder/evals/trigger-eval.json index 8206c4b..29a1325 100644 --- a/skills/living-doc-gap-finder/evals/trigger-eval.json +++ b/skills/living-doc-gap-finder/evals/trigger-eval.json @@ -1,21 +1,194 @@ [ - {"id": 1, "query": "Run a living doc gap analysis", "should_trigger": true, "reason": "'living doc gaps' trigger phrase"}, - {"id": 2, "query": "What's missing in our living documentation?", "should_trigger": true, "reason": "'what's missing in living doc' trigger phrase"}, - {"id": 3, "query": "Find undocumented features in the codebase", "should_trigger": true, "reason": "'find undocumented features' trigger phrase"}, - {"id": 4, "query": "Which tests have no linked acceptance criteria (orphan tests)?", "should_trigger": true, "reason": "'orphan tests' trigger keyword"}, - {"id": 5, "query": "Which ACs have no tests? 
(untested ACs)", "should_trigger": true, "reason": "'untested AC' trigger keyword"}, - {"id": 6, "query": "What is our documentation coverage percentage?", "should_trigger": true, "reason": "'documentation coverage' trigger keyword"}, - {"id": 7, "query": "Generate a gap report for the payments domain", "should_trigger": true, "reason": "'gap report' trigger keyword"}, - {"id": 8, "query": "What behaviors are not covered in the living doc?", "should_trigger": true, "reason": "'what's not covered' trigger phrase"}, - {"id": 9, "query": "Do a living doc audit before the release", "should_trigger": true, "reason": "'living doc audit' trigger phrase"}, - {"id": 10, "query": "Which User Story ACs are critical but have no BDD scenario?", "should_trigger": true, "reason": "Finding untested critical ACs — core gap-finder task"}, - {"id": 11, "query": "Find what's not documented in our test suite", "should_trigger": true, "reason": "'find what's not documented' trigger phrase"}, - {"id": 12, "query": "Create a user story for the preferences screen gap", "should_trigger": false, "reason": "Creating a User Story — routes to living-doc-create-user-story"}, - {"id": 13, "query": "Create a Feature entity for the account preferences screen", "should_trigger": false, "reason": "Creating a Feature — routes to living-doc-create-feature"}, - {"id": 14, "query": "Implement step definitions for the gap report scenario", "should_trigger": false, "reason": "Step definition implementation — routes to gherkin-step"}, - {"id": 15, "query": "Do a documentation audit to check for missing tests before the go-live", "should_trigger": true, "reason": "'documentation audit' trigger phrase"}, - {"id": 16, "query": "Which Functionalities in our living doc have no parent Feature?", "should_trigger": true, "reason": "Detecting ORPHAN_FUNCTIONALITY gaps — core gap-finder task"}, - {"id": 17, "query": "A test is pointing to a deprecated AC — what kind of gap is that?", "should_trigger": true, "reason": 
"Stale reference detection (Gap type 7) — core gap-finder task"}, - {"id": 18, "query": "We have 100 orphan tests — how should we batch the gap-finder run?", "should_trigger": true, "reason": "Batching strategy for large-scale gap analysis — gap-finder guidance task"}, - {"id": 19, "query": "Update US-042 to add a new AC", "should_trigger": false, "reason": "Updating an existing entity — routes to living-doc-update"} -] + { + "id": 1, + "query": "Run a living doc gap analysis", + "should_trigger": true, + "reason": "'living doc gaps' trigger phrase" + }, + { + "id": 2, + "query": "What's missing in our living documentation?", + "should_trigger": true, + "reason": "'what's missing in living doc' trigger phrase" + }, + { + "id": 3, + "query": "Find undocumented features in the codebase", + "should_trigger": true, + "reason": "'find undocumented features' trigger phrase" + }, + { + "id": 4, + "query": "Which tests have no linked acceptance criteria (orphan tests)?", + "should_trigger": true, + "reason": "'orphan tests' trigger keyword" + }, + { + "id": 5, + "query": "Which ACs have no tests? 
(untested ACs)", + "should_trigger": true, + "reason": "'untested AC' trigger keyword" + }, + { + "id": 6, + "query": "What is our documentation coverage percentage?", + "should_trigger": true, + "reason": "'documentation coverage' trigger keyword" + }, + { + "id": 7, + "query": "Generate a gap report for the payments domain", + "should_trigger": true, + "reason": "'gap report' trigger keyword" + }, + { + "id": 8, + "query": "What behaviors are not covered in the living doc?", + "should_trigger": true, + "reason": "'what's not covered' trigger phrase" + }, + { + "id": 9, + "query": "Do a living doc audit before the release", + "should_trigger": true, + "reason": "'living doc audit' trigger phrase" + }, + { + "id": 10, + "query": "Which User Story ACs are critical but have no BDD scenario?", + "should_trigger": true, + "reason": "Finding untested critical ACs — core gap-finder task" + }, + { + "id": 11, + "query": "Find what's not documented in our test suite", + "should_trigger": true, + "reason": "'find what's not documented' trigger phrase" + }, + { + "id": 12, + "query": "Create a user story for the preferences screen gap", + "should_trigger": false, + "reason": "Creating a User Story — routes to living-doc-create-user-story" + }, + { + "id": 13, + "query": "Create a Feature entity for the account preferences screen", + "should_trigger": false, + "reason": "Creating a Feature — routes to living-doc-create-feature" + }, + { + "id": 14, + "query": "Implement step definitions for the gap report scenario", + "should_trigger": false, + "reason": "Step definition implementation — routes to gherkin-step" + }, + { + "id": 15, + "query": "Do a documentation audit to check for missing tests before the go-live", + "should_trigger": true, + "reason": "'documentation audit' trigger phrase" + }, + { + "id": 16, + "query": "Which Functionalities in our living doc have no parent Feature?", + "should_trigger": true, + "reason": "Detecting ORPHAN_FUNCTIONALITY gaps — core 
gap-finder task" + }, + { + "id": 17, + "query": "A test is pointing to a deprecated AC — what kind of gap is that?", + "should_trigger": true, + "reason": "Stale reference detection (Gap type 7) — core gap-finder task" + }, + { + "id": 18, + "query": "We have 100 orphan tests — how should we batch the gap-finder run?", + "should_trigger": true, + "reason": "Batching strategy for large-scale gap analysis — gap-finder guidance task" + }, + { + "id": 19, + "query": "Update US-042 to add a new AC", + "should_trigger": false, + "reason": "Updating an existing entity — routes to living-doc-update" + }, + { + "id": 20, + "query": "Write step definitions for the checkout scenarios", + "should_trigger": false, + "reason": "Writing step definitions — routes to gherkin-step" + }, + { + "id": 21, + "query": "Delete all BDD artifacts linked to the deprecated feature", + "should_trigger": false, + "reason": "BDD artifact cleanup — routes to bdd-maintain" + }, + { + "id": 22, + "query": "Sync the @AC: tags in the feature files with the AC catalog", + "should_trigger": false, + "reason": "Feature file sync — routes to gherkin-living-doc-sync" + }, + { + "id": 23, + "query": "Scan the checkout page and generate PageObjects", + "should_trigger": false, + "reason": "PageObject scanning — routes to living-doc-pageobject-scan" + }, + { + "id": 24, + "query": "Add data-cy attributes to the checkout template", + "should_trigger": false, + "reason": "Instrumenting templates — routes to data-cy-instrument" + }, + { + "id": 25, + "query": "Generate BDD scenarios for all active ACs on US-007", + "should_trigger": false, + "reason": "Generating scenarios — routes to living-doc-scenario-creator" + }, + { + "id": 26, + "query": "Create a new Functionality for the discount calculation logic", + "should_trigger": false, + "reason": "Creating a Functionality — routes to living-doc-create-functionality" + }, + { + "id": 27, + "query": "Update the wording of AC-1 on US-042", + "should_trigger": 
false, + "reason": "Updating an entity — routes to living-doc-update" + }, + { + "id": 28, + "query": "What does this PR affect in the living doc?", + "should_trigger": false, + "reason": "Impact analysis — routes to living-doc-impact-analysis" + }, + { + "id": 29, + "query": "Create a User Story for the account management screen", + "should_trigger": false, + "reason": "Creating a User Story — routes to living-doc-create-user-story" + }, + { + "id": 30, + "query": "Write behave step definitions for the gap report scenario", + "should_trigger": false, + "reason": "Writing step definitions — routes to gherkin-step" + }, + { + "id": 31, + "query": "Run an AUDIT mode gap analysis on the entire catalog before the release", + "should_trigger": true, + "reason": "AUDIT mode — gap-finder AUDIT mode keyword" + }, + { + "id": 32, + "query": "Use PLAN mode to draft ACs from the PageObject descriptions for the checkout screen", + "should_trigger": true, + "reason": "PLAN mode — gap-finder PLAN mode keyword" + } +] \ No newline at end of file diff --git a/skills/living-doc-impact-analysis/evals/evals.json b/skills/living-doc-impact-analysis/evals/evals.json index 93166b6..5a37f6b 100644 --- a/skills/living-doc-impact-analysis/evals/evals.json +++ b/skills/living-doc-impact-analysis/evals/evals.json @@ -5,11 +5,11 @@ "id": 1, "category": "happy-path", "prompt": "PR #217 modifies PromoService.java to support stacked discounts. What living doc entities does this change affect?", - "expected_output": "Agent maps PromoService.java to its Feature via the feature_registry in catalog.json (or runs trace_impact.py). Traces Feature \u2192 Functionality \u2192 User Stories \u2192 ACs. Classifies impact as High (changed business logic). Outputs a structured impact map listing affected Features, Functionalities, User Stories, and ACs that require review.", + "expected_output": "Agent maps PromoService.java to its Feature via the feature_registry in catalog.json (or runs trace_impact.py). 
Traces Feature → Functionality → User Stories → ACs. Classifies impact as High (changed business logic). Outputs a structured impact map listing affected Features, Functionalities, User Stories, and ACs that require review.", "files": [], "expectations": [ "Maps changed file to Feature via the feature_registry section in catalog.json (or runs trace_impact.py --catalog catalog.json)", - "Traces Feature \u2192 Functionality \u2192 User Stories \u2192 ACs", + "Traces Feature → Functionality → User Stories → ACs", "Classifies impact level as High for changed business logic", "Outputs a structured impact map" ] @@ -68,7 +68,7 @@ "id": 6, "category": "negative", "prompt": "Update US-042 to add a new AC for the expired promo path.", - "expected_output": "Updating living doc entities is out of scope for this skill \u2014 routes to living-doc-update. living-doc-impact-analysis traces which entities are affected by a code change; it does not create or modify entity content.", + "expected_output": "Updating living doc entities is out of scope for this skill — routes to living-doc-update. living-doc-impact-analysis traces which entities are affected by a code change; it does not create or modify entity content.", "files": [], "expectations": [ "Does not update the User Story", @@ -80,7 +80,7 @@ "id": 7, "category": "negative", "prompt": "Which Functionalities don't have any User Stories?", - "expected_output": "Finding coverage gaps is out of scope for this skill \u2014 routes to living-doc-gap-finder. living-doc-impact-analysis traces the impact of code changes; coverage gap detection is handled by living-doc-gap-finder.", + "expected_output": "Finding coverage gaps is out of scope for this skill — routes to living-doc-gap-finder. 
living-doc-impact-analysis traces the impact of code changes; coverage gap detection is handled by living-doc-gap-finder.", "files": [], "expectations": [ "Does not search for orphan Functionalities", @@ -92,12 +92,12 @@ "id": 8, "category": "paraphrase", "prompt": "We're about to merge a PR that changes the cart validation logic. What do we need to re-test in the living doc?", - "expected_output": "Agent identifies this as an impact analysis request despite 're-test' phrasing. Maps the changed cart validation code to its Feature via the feature_registry in catalog.json. Traces Feature \u2192 Functionality \u2192 User Stories \u2192 ACs. Lists all linked Gherkin scenarios that need re-running. Outputs a structured re-test checklist.", + "expected_output": "Agent identifies this as an impact analysis request despite 're-test' phrasing. Maps the changed cart validation code to its Feature via the feature_registry in catalog.json. Traces Feature → Functionality → User Stories → ACs. Lists all linked Gherkin scenarios that need re-running. Outputs a structured re-test checklist.", "files": [], "expectations": [ "Identifies this as an impact analysis request despite 're-test' phrasing", "Maps changed code to Feature via the feature_registry section in catalog.json", - "Traces Feature \u2192 Functionality \u2192 User Stories \u2192 ACs", + "Traces Feature → Functionality → User Stories → ACs", "Lists all linked Gherkin scenarios that need re-running", "Outputs a structured re-test checklist" ] @@ -106,12 +106,12 @@ "id": 9, "category": "edge-case", "prompt": "PR #410 modifies MoneyUtils.java, a shared utility class used by checkout, refunds, and the promotions engine. What is the living doc impact?", - "expected_output": "Agent fans out the impact analysis to all Features that import or reference MoneyUtils. Lists all Functionalities within each Feature area that call MoneyUtils. Classifies all as High impact \u2014 shared utility changes propagate to all consumers. 
Produces a consolidated impact map across all three Feature areas (checkout, refunds, promotions).", + "expected_output": "Agent fans out the impact analysis to all Features that import or reference MoneyUtils. Lists all Functionalities within each Feature area that call MoneyUtils. Classifies all as High impact — shared utility changes propagate to all consumers. Produces a consolidated impact map across all three Feature areas (checkout, refunds, promotions).", "files": [], "expectations": [ "Fans out impact analysis to all Features that reference MoneyUtils", "Lists all Functionalities in each Feature that call MoneyUtils", - "Classifies all as High impact \u2014 shared utility changes affect all consumers", + "Classifies all as High impact — shared utility changes affect all consumers", "Produces a consolidated impact map across all three Feature areas" ] }, @@ -126,7 +126,8 @@ "Each section uses a markdown list", "Method signature change in a fenced code block", "No speculative changes beyond the described scope" - ] + ], + "category": "happy-path" }, { "id": 11, @@ -142,17 +143,18 @@ "Required changes section lists specific call site updates", "Test coverage section lists tests to add or update", "Old and new signatures in fenced code blocks" - ] + ], + "category": "happy-path" }, { "id": 12, "category": "edge-case", "prompt": "PR #501 modifies only OrderServiceTest.java and adds a new mock in MockNotificationClient.java. What is the living doc impact?", - "expected_output": "Agent classifies all changed files as test files and mocks \u2014 not domain logic or API contract. Impact level is None. Test-only changes do not affect business logic or living doc entities. Notes in the PR that no living doc update is needed.", + "expected_output": "Agent classifies all changed files as test files and mocks — not domain logic or API contract. Impact level is None. Test-only changes do not affect business logic or living doc entities. 
Notes in the PR that no living doc update is needed.", "files": [], "expectations": [ - "Classifies all changed files as test files and mocks \u2014 not domain logic or API contract", - "Impact level: None \u2014 test-only changes do not affect living doc", + "Classifies all changed files as test files and mocks — not domain logic or API contract", + "Impact level: None — test-only changes do not affect living doc", "No living doc update required", "Notes in PR that no living doc update is needed" ] @@ -161,14 +163,14 @@ "id": 13, "category": "happy-path", "prompt": "PR #222 modifies both DiscountService.java (domain logic) and DiscountController.java (REST controller). What living doc entities are affected?", - "expected_output": "Agent classifies DiscountService.java as domain logic (High impact) and DiscountController.java as API contract (High impact). Traces both files to the owning Feature via the feature_registry in catalog.json. Traces Feature \u2192 Functionalities \u2192 User Stories \u2192 ACs for each changed file. Consolidates entities appearing more than once as higher-risk. Outputs a single consolidated impact map covering both changed files.", + "expected_output": "Agent classifies DiscountService.java as domain logic (High impact) and DiscountController.java as API contract (High impact). Traces both files to the owning Feature via the feature_registry in catalog.json. Traces Feature → Functionalities → User Stories → ACs for each changed file. Consolidates entities appearing more than once as higher-risk. Outputs a single consolidated impact map covering both changed files.", "files": [], "expectations": [ - "Classifies DiscountService.java as domain logic \u2014 High impact", - "Classifies DiscountController.java as API contract \u2014 High impact", + "Classifies DiscountService.java as domain logic — High impact", + "Classifies DiscountController.java as API contract — High impact", "Traces both files to the owning Feature (e.g. 
FEAT-promotions) via feature_registry", - "Traces Feature \u2192 Functionalities \u2192 User Stories \u2192 ACs for both changed files", - "Consolidates entities appearing more than once \u2014 higher risk", + "Traces Feature → Functionalities → User Stories → ACs for both changed files", + "Consolidates entities appearing more than once — higher risk", "Outputs a single consolidated impact map covering both changed files" ] }, @@ -176,7 +178,28 @@ "id": 14, "prompt": "PR #600 only updates README.md, adds inline code comments, and reformats a YAML config file with no value changes. What is the living doc impact?", "expected_output": "No living doc impact. Docs-only and formatting-only PRs fall into the fast-path no-impact category. No entities to review or update.", - "files": [] + "files": [], + "category": "happy-path", + "expectations": [ + "Classifies as None impact level — no living doc entities to review", + "Identifies the fast-path no-impact category: docs-only and formatting-only PRs", + "Does not flag README updates, comment additions, or whitespace-only YAML as impactful", + "Produces no sign-off checklist items — empty impact report is the correct output" + ] + }, + { + "id": 15, + "category": "happy-path", + "prompt": "We have never run an impact analysis before and the feature_registry doesn't exist. How do I bootstrap it?", + "expected_output": "Bootstrap the feature_registry in 4 steps: (1) Find all Feature entity files in the living doc catalog directory. (2) For each Feature, identify the corresponding source paths — for Angular, look for the component folder (e.g. src/app/checkout/); for Java/Spring, look for the controller/service package (e.g. com.example.checkout). (3) Build the registry as a map of feature_id → [source_paths]. (4) Save it as feature_registry.json. After saving, re-run the impact analysis. 
Use living-doc-create-feature for creating new Feature entities and living-doc-update for the 'Rename a Feature' workflow.", + "files": [], + "expectations": [ + "Lists all 4 bootstrap steps in order", + "Mentions Angular path pattern (src/app/<feature>/)", + "Mentions Java/Spring path pattern (package name)", + "Instructs saving as feature_registry.json", + "Routes to living-doc-create-feature for new entities" + ] } ] } \ No newline at end of file diff --git a/skills/living-doc-impact-analysis/evals/trigger-eval.json b/skills/living-doc-impact-analysis/evals/trigger-eval.json index 92b6271..d26b411 100644 --- a/skills/living-doc-impact-analysis/evals/trigger-eval.json +++ b/skills/living-doc-impact-analysis/evals/trigger-eval.json @@ -88,5 +88,53 @@ "query": "Find all Functionalities with no linked User Stories.", "should_trigger": false, "reason": "Finding coverage gaps is handled by living-doc-gap-finder." + }, + { + "id": "t16-not-create-func", + "query": "Create a new Functionality for the stacked discount rule", + "should_trigger": false, + "reason": "Creating a Functionality — routes to living-doc-create-functionality" + }, + { + "id": "t17-not-scenario-creator", + "query": "Generate BDD scenarios for US-007", + "should_trigger": false, + "reason": "Generating scenarios — routes to living-doc-scenario-creator" + }, + { + "id": "t18-not-sync", + "query": "Sync the @AC: tags in the feature files with the AC catalog", + "should_trigger": false, + "reason": "Feature file sync — routes to gherkin-living-doc-sync" + }, + { + "id": "t19-not-step", + "query": "Write step definitions for the order placement scenarios", + "should_trigger": false, + "reason": "Writing step definitions — routes to gherkin-step" + }, + { + "id": "t20-not-pageobject", + "query": "Scan the webapp and generate PageObjects for the admin portal", + "should_trigger": false, + "reason": "PageObject scanning — routes to living-doc-pageobject-scan" + }, + { + "id": "t21-not-data-cy", + "query": 
"Add data-cy attributes to the checkout form inputs", + "should_trigger": false, + "reason": "Instrumenting templates — routes to data-cy-instrument" + }, + { + "id": "t22-not-bdd-maintain", + "query": "Find all unused step definitions in the BDD test suite", + "should_trigger": false, + "reason": "Dead code audit — routes to bdd-maintain" + }, + { + "id": "t23-bootstrap-registry", + "query": "The feature_registry is missing — how do I bootstrap it from the codebase?", + "should_trigger": true, + "reason": "Bootstrapping feature_registry — impact-analysis owns this 4-step procedure" } -] +] \ No newline at end of file diff --git a/skills/living-doc-pageobject-scan/evals/trigger-eval.json b/skills/living-doc-pageobject-scan/evals/trigger-eval.json index 3caf5a4..9c70a80 100644 --- a/skills/living-doc-pageobject-scan/evals/trigger-eval.json +++ b/skills/living-doc-pageobject-scan/evals/trigger-eval.json @@ -1,20 +1,182 @@ [ - {"id": 1, "query": "Scan this webapp and generate PageObjects for each screen", "should_trigger": true, "reason": "'scan this webapp' trigger phrase"}, - {"id": 2, "query": "Generate PageObjects for our checkout and login screens", "should_trigger": true, "reason": "'generate pageobjects' trigger phrase"}, - {"id": 3, "query": "Update the PageObjects after the UI redesign", "should_trigger": true, "reason": "'update pageobjects' trigger phrase"}, - {"id": 4, "query": "Create a PageObject for the order history screen", "should_trigger": true, "reason": "'pageobject for this screen' trigger phrase"}, - {"id": 5, "query": "Crawl the UI to discover all available screens", "should_trigger": true, "reason": "'crawl the UI' trigger phrase"}, - {"id": 6, "query": "Discover all UI elements on the checkout page", "should_trigger": true, "reason": "'discover UI elements' trigger phrase"}, - {"id": 7, "query": "Create page objects for the admin portal", "should_trigger": true, "reason": "'create page objects' trigger phrase"}, - {"id": 8, "query": "Scan the 
test suite to find existing PageObjects", "should_trigger": true, "reason": "'scan test suite for pageobjects' trigger phrase"}, - {"id": 9, "query": "Do a living doc bottom-up scan of the web app", "should_trigger": true, "reason": "'living doc bottom-up' trigger phrase"}, - {"id": 10, "query": "Bootstrap page objects for a new test suite", "should_trigger": true, "reason": "'bootstrap page objects' trigger phrase"}, - {"id": 11, "query": "There is pageobject drift after the latest UI change — what do I do?", "should_trigger": true, "reason": "'pageobject drift' trigger phrase"}, - {"id": 12, "query": "Sync the PageObjects with the current app", "should_trigger": true, "reason": "'sync pageobjects' trigger phrase"}, - {"id": 13, "query": "Create a User Story for the checkout screen", "should_trigger": false, "reason": "Creating User Stories — routes to living-doc-create-user-story"}, - {"id": 14, "query": "Generate BDD scenarios for User Story US-007", "should_trigger": false, "reason": "Generating BDD scenarios from a User Story — routes to living-doc-scenario-creator"}, - {"id": 15, "query": "Detect whether the login PageObject has drifted after the redesign", "should_trigger": true, "reason": "'pageobject drift' / 'update pageobjects' pattern — Maintain mode trigger"}, - {"id": 16, "query": "Deprecate the checkout feature", "should_trigger": false, "reason": "Deprecating a living doc entity — routes to living-doc-update"}, - {"id": 17, "query": "Generate Functionality stubs from the elements discovered on the checkout screen", "should_trigger": true, "reason": "Functionality stub generation from discovered UI behaviors is part of Create mode Step 5"}, - {"id": 18, "query": "Update the manifest for the admin portal after the UI redesign", "should_trigger": true, "reason": "Maintain mode manifest update — 'update pageobjects' / 'sync pageobjects' trigger pattern"} -] + { + "id": 1, + "query": "Scan this webapp and generate PageObjects for each screen", + 
"should_trigger": true, + "reason": "'scan this webapp' trigger phrase" + }, + { + "id": 2, + "query": "Generate PageObjects for our checkout and login screens", + "should_trigger": true, + "reason": "'generate pageobjects' trigger phrase" + }, + { + "id": 3, + "query": "Update the PageObjects after the UI redesign", + "should_trigger": true, + "reason": "'update pageobjects' trigger phrase" + }, + { + "id": 4, + "query": "Create a PageObject for the order history screen", + "should_trigger": true, + "reason": "'pageobject for this screen' trigger phrase" + }, + { + "id": 5, + "query": "Crawl the UI to discover all available screens", + "should_trigger": true, + "reason": "'crawl the UI' trigger phrase" + }, + { + "id": 6, + "query": "Discover all UI elements on the checkout page", + "should_trigger": true, + "reason": "'discover UI elements' trigger phrase" + }, + { + "id": 7, + "query": "Create page objects for the admin portal", + "should_trigger": true, + "reason": "'create page objects' trigger phrase" + }, + { + "id": 8, + "query": "Scan the test suite to find existing PageObjects", + "should_trigger": true, + "reason": "'scan test suite for pageobjects' trigger phrase" + }, + { + "id": 9, + "query": "Do a living doc bottom-up scan of the web app", + "should_trigger": true, + "reason": "'living doc bottom-up' trigger phrase" + }, + { + "id": 10, + "query": "Bootstrap page objects for a new test suite", + "should_trigger": true, + "reason": "'bootstrap page objects' trigger phrase" + }, + { + "id": 11, + "query": "There is pageobject drift after the latest UI change — what do I do?", + "should_trigger": true, + "reason": "'pageobject drift' trigger phrase" + }, + { + "id": 12, + "query": "Sync the PageObjects with the current app", + "should_trigger": true, + "reason": "'sync pageobjects' trigger phrase" + }, + { + "id": 13, + "query": "Create a User Story for the checkout screen", + "should_trigger": false, + "reason": "Creating User Stories — routes to 
living-doc-create-user-story" + }, + { + "id": 14, + "query": "Generate BDD scenarios for User Story US-007", + "should_trigger": false, + "reason": "Generating BDD scenarios from a User Story — routes to living-doc-scenario-creator" + }, + { + "id": 15, + "query": "Detect whether the login PageObject has drifted after the redesign", + "should_trigger": true, + "reason": "'pageobject drift' / 'update pageobjects' pattern — Maintain mode trigger" + }, + { + "id": 16, + "query": "Deprecate the checkout feature", + "should_trigger": false, + "reason": "Deprecating a living doc entity — routes to living-doc-update" + }, + { + "id": 17, + "query": "Generate Functionality stubs from the elements discovered on the checkout screen", + "should_trigger": true, + "reason": "Functionality stub generation from discovered UI behaviors is part of Create mode Step 5" + }, + { + "id": 18, + "query": "Update the manifest for the admin portal after the UI redesign", + "should_trigger": true, + "reason": "Maintain mode manifest update — 'update pageobjects' / 'sync pageobjects' trigger pattern" + }, + { + "id": 19, + "query": "Write step definitions for the checkout scenarios", + "should_trigger": false, + "reason": "Writing step definitions — routes to gherkin-step" + }, + { + "id": 20, + "query": "Run a dead code audit to find unused PageObject methods", + "should_trigger": false, + "reason": "Dead code audit — routes to bdd-maintain" + }, + { + "id": 21, + "query": "Sync the @AC: tags in the feature files with the AC catalog", + "should_trigger": false, + "reason": "Feature file sync — routes to gherkin-living-doc-sync" + }, + { + "id": 22, + "query": "Add data-cy attributes to the checkout template", + "should_trigger": false, + "reason": "Instrumenting templates — routes to data-cy-instrument" + }, + { + "id": 23, + "query": "Generate BDD scenarios for all active ACs on US-007", + "should_trigger": false, + "reason": "Generating scenarios — routes to living-doc-scenario-creator" 
+ }, + { + "id": 24, + "query": "Create a Feature entity for the orders screen", + "should_trigger": false, + "reason": "Creating a Feature — routes to living-doc-create-feature" + }, + { + "id": 25, + "query": "Create a Functionality for the cart validation behavior", + "should_trigger": false, + "reason": "Creating a Functionality — routes to living-doc-create-functionality" + }, + { + "id": 26, + "query": "Run a gap analysis on the living documentation", + "should_trigger": false, + "reason": "Gap analysis — routes to living-doc-gap-finder" + }, + { + "id": 27, + "query": "Update the AC wording on US-042-01", + "should_trigger": false, + "reason": "Updating an entity — routes to living-doc-update" + }, + { + "id": 28, + "query": "What does PR #217 affect in the living doc?", + "should_trigger": false, + "reason": "Impact analysis — routes to living-doc-impact-analysis" + }, + { + "id": 29, + "query": "Create a User Story for the new checkout screen", + "should_trigger": false, + "reason": "Creating a User Story — routes to living-doc-create-user-story" + }, + { + "id": 30, + "query": "Delete all BDD artifacts linked to the deprecated checkout feature", + "should_trigger": false, + "reason": "BDD artifact cleanup — routes to bdd-maintain" + } +] \ No newline at end of file diff --git a/skills/living-doc-scenario-creator/evals/evals.json b/skills/living-doc-scenario-creator/evals/evals.json index 45551d3..8dc8d50 100644 --- a/skills/living-doc-scenario-creator/evals/evals.json +++ b/skills/living-doc-scenario-creator/evals/evals.json @@ -148,6 +148,54 @@ "Notes ID format must match exactly (zero-padded)", "Provides the coverage_report.py command for diagnosis" ] + }, + { + "id": 12, + "category": "happy-path", + "prompt": "I'm generating scenarios for US-007 and AC:US-007-01 already has an existing scenario that matches the intent exactly. What should I do?", + "expected_output": "Skip — the existing scenario already covers the AC intent. 
Do not create a duplicate. Mark it as covered in the coverage report.", + "files": [], + "expectations": [ + "Decision: Skip (do not create duplicate)", + "Marks AC as covered" + ] + }, + { + "id": 13, + "category": "regression", + "prompt": "The existing scenario for AC:US-007-02 has stale Given/When/Then steps — the step text no longer matches the current AC wording. What is the merge policy?", + "expected_output": "Update — the scenario intent still matches but the GWT is stale. Rewrite the Given/When/Then steps to match the current AC wording. Do not create a new scenario; update the existing one in place. Load gherkin-living-doc-sync if the @AC: tag or # AC: comment also needs updating.", + "files": [], + "expectations": [ + "Decision: Update (rewrite stale GWT)", + "Does not create a new scenario", + "Routes @AC: tag sync to gherkin-living-doc-sync if needed" + ] + }, + { + "id": 14, + "category": "edge-case", + "prompt": "The existing scenario for AC:US-007-03 has status deprecated/review-needed. What is the correct merge-policy action?", + "expected_output": "Propose replacement — the existing scenario is deprecated or flagged for review. Generate a new candidate scenario for the AC and mark the old one as superseded. Present both for human review before deleting the old scenario.", + "files": [], + "expectations": [ + "Decision: Propose replacement (not silent delete)", + "Generates a new candidate scenario", + "Marks old scenario as superseded", + "Does not auto-delete without human review" + ] + }, + { + "id": 15, + "category": "edge-case", + "prompt": "I find that AC:US-007-04 already has three different scenarios linked to it. What is the merge-policy flag?", + "expected_output": "Flag for review — multiple scenarios for a single AC indicates potential duplication or scenario sprawl. Do not generate another scenario. 
Present the duplicates to the user and ask which one should be the canonical scenario for this AC.", + "files": [], + "expectations": [ + "Decision: Flag (do not add another scenario)", + "Identifies multiple-scenario-per-AC as a design smell", + "Asks user to designate the canonical scenario" + ] } ] -} +} \ No newline at end of file diff --git a/skills/living-doc-scenario-creator/evals/trigger-eval.json b/skills/living-doc-scenario-creator/evals/trigger-eval.json index 0e82d67..15551af 100644 --- a/skills/living-doc-scenario-creator/evals/trigger-eval.json +++ b/skills/living-doc-scenario-creator/evals/trigger-eval.json @@ -1,20 +1,176 @@ [ - {"id": 1, "query": "Create BDD scenarios for user story US-007 — Place an Online Order", "should_trigger": true, "reason": "'create BDD scenarios for user story' trigger phrase"}, - {"id": 2, "query": "Generate scenarios for US-003 — Apply Promo Code", "should_trigger": true, "reason": "'generate scenarios for US' trigger phrase"}, - {"id": 3, "query": "Cover all ACs of US-011 with BDD scenarios", "should_trigger": true, "reason": "'cover AC with scenarios' trigger phrase"}, - {"id": 4, "query": "Generate a feature file from user story US-015", "should_trigger": true, "reason": "'generate feature file from user story' trigger phrase"}, - {"id": 5, "query": "Create BDD scenarios from requirements for the login flow", "should_trigger": true, "reason": "'BDD from requirements' trigger phrase"}, - {"id": 6, "query": "What is the scenario coverage for US-007?", "should_trigger": true, "reason": "'scenario coverage for US' trigger phrase"}, - {"id": 7, "query": "Map the ACs of US-009 to Gherkin scenarios", "should_trigger": true, "reason": "'map AC to scenarios' trigger phrase"}, - {"id": 8, "query": "Generate Gherkin from user story US-012", "should_trigger": true, "reason": "'gherkin from user story' trigger phrase"}, - {"id": 9, "query": "Create scenarios for US-007", "should_trigger": true, "reason": "'scenarios for US-' 
trigger phrase — explicitly mentions a US ID"}, - {"id": 10, "query": "Generate a .feature file for the checkout flow", "should_trigger": true, "reason": "'generate .feature file' trigger phrase"}, - {"id": 11, "query": "Write standalone Gherkin scenarios for an exploratory test", "should_trigger": true, "reason": "Standalone Gherkin without a User Story — uses living-doc-scenario-creator Standalone mode"}, - {"id": 12, "query": "Implement the step definition for 'When the customer confirms the order'", "should_trigger": false, "reason": "Step definition implementation — routes to gherkin-step"}, - {"id": 13, "query": "Write a unit test for the promo code calculation", "should_trigger": false, "reason": "Unit test request — out of scope for this toolkit (no test-unit-write skill defined)"}, - {"id": 14, "query": "Find which User Stories have no Gherkin coverage at all", "should_trigger": false, "reason": "Finding doc gaps — routes to living-doc-gap-finder"}, - {"id": 15, "query": "What is the AC coverage for US-007 — are all ACs covered by a scenario?", "should_trigger": true, "reason": "'scenario coverage for US' trigger phrase — auditing AC-to-scenario coverage"}, - {"id": 16, "query": "Generate BDD scenarios for all active ACs on US-003", "should_trigger": true, "reason": "'generate scenarios for US' / 'cover AC with scenarios' trigger phrase"}, - {"id": 17, "query": "My scenario for AC:US-010-01 covers only the username field — how should I tag it?", "should_trigger": true, "reason": "Aspect:value tag encoding is part of scenario generation for multi-aspect ACs"}, - {"id": 18, "query": "Sync feature files with living doc after AC changes", "should_trigger": false, "reason": "Syncing existing scenarios to living doc — routes to gherkin-living-doc-sync"} -] + { + "id": 1, + "query": "Create BDD scenarios for user story US-007 — Place an Online Order", + "should_trigger": true, + "reason": "'create BDD scenarios for user story' trigger phrase" + }, + { + "id": 2, 
+ "query": "Generate scenarios for US-003 — Apply Promo Code", + "should_trigger": true, + "reason": "'generate scenarios for US' trigger phrase" + }, + { + "id": 3, + "query": "Cover all ACs of US-011 with BDD scenarios", + "should_trigger": true, + "reason": "'cover AC with scenarios' trigger phrase" + }, + { + "id": 4, + "query": "Generate a feature file from user story US-015", + "should_trigger": true, + "reason": "'generate feature file from user story' trigger phrase" + }, + { + "id": 5, + "query": "Create BDD scenarios from requirements for the login flow", + "should_trigger": true, + "reason": "'BDD from requirements' trigger phrase" + }, + { + "id": 6, + "query": "What is the scenario coverage for US-007?", + "should_trigger": true, + "reason": "'scenario coverage for US' trigger phrase" + }, + { + "id": 7, + "query": "Map the ACs of US-009 to Gherkin scenarios", + "should_trigger": true, + "reason": "'map AC to scenarios' trigger phrase" + }, + { + "id": 8, + "query": "Generate Gherkin from user story US-012", + "should_trigger": true, + "reason": "'gherkin from user story' trigger phrase" + }, + { + "id": 9, + "query": "Create scenarios for US-007", + "should_trigger": true, + "reason": "'scenarios for US-' trigger phrase — explicitly mentions a US ID" + }, + { + "id": 10, + "query": "Generate a .feature file for the checkout flow", + "should_trigger": true, + "reason": "'generate .feature file' trigger phrase" + }, + { + "id": 11, + "query": "Write standalone Gherkin scenarios for an exploratory test", + "should_trigger": true, + "reason": "Standalone Gherkin without a User Story — uses living-doc-scenario-creator Standalone mode" + }, + { + "id": 12, + "query": "Implement the step definition for 'When the customer confirms the order'", + "should_trigger": false, + "reason": "Step definition implementation — routes to gherkin-step" + }, + { + "id": 13, + "query": "Write a unit test for the promo code calculation", + "should_trigger": false, + "reason": 
"Unit test request — out of scope for this toolkit (no test-unit-write skill defined)" + }, + { + "id": 14, + "query": "Find which User Stories have no Gherkin coverage at all", + "should_trigger": false, + "reason": "Finding doc gaps — routes to living-doc-gap-finder" + }, + { + "id": 15, + "query": "What is the AC coverage for US-007 — are all ACs covered by a scenario?", + "should_trigger": true, + "reason": "'scenario coverage for US' trigger phrase — auditing AC-to-scenario coverage" + }, + { + "id": 16, + "query": "Generate BDD scenarios for all active ACs on US-003", + "should_trigger": true, + "reason": "'generate scenarios for US' / 'cover AC with scenarios' trigger phrase" + }, + { + "id": 17, + "query": "My scenario for AC:US-010-01 covers only the username field — how should I tag it?", + "should_trigger": true, + "reason": "Aspect:value tag encoding is part of scenario generation for multi-aspect ACs" + }, + { + "id": 18, + "query": "Sync feature files with living doc after AC changes", + "should_trigger": false, + "reason": "Syncing existing scenarios to living doc — routes to gherkin-living-doc-sync" + }, + { + "id": 19, + "query": "Write step definitions for the checkout scenarios", + "should_trigger": false, + "reason": "Writing step definitions — routes to gherkin-step" + }, + { + "id": 20, + "query": "Run a dead code audit to find unused step definitions", + "should_trigger": false, + "reason": "Dead code audit — routes to bdd-maintain" + }, + { + "id": 21, + "query": "Sync the @AC: tags in the feature files with the AC catalog", + "should_trigger": false, + "reason": "Feature file sync — routes to gherkin-living-doc-sync" + }, + { + "id": 22, + "query": "Scan the checkout page and generate PageObjects", + "should_trigger": false, + "reason": "PageObject scanning — routes to living-doc-pageobject-scan" + }, + { + "id": 23, + "query": "Add data-cy attributes to the checkout template", + "should_trigger": false, + "reason": "Instrumenting templates 
— routes to data-cy-instrument" + }, + { + "id": 24, + "query": "Create a Feature entity for the checkout module", + "should_trigger": false, + "reason": "Creating a Feature — routes to living-doc-create-feature" + }, + { + "id": 25, + "query": "Create a Functionality for the promo stacking rule", + "should_trigger": false, + "reason": "Creating a Functionality — routes to living-doc-create-functionality" + }, + { + "id": 26, + "query": "Run a gap analysis to find unlinked test files", + "should_trigger": false, + "reason": "Gap analysis — routes to living-doc-gap-finder" + }, + { + "id": 27, + "query": "Update the acceptance criterion wording on US-007-01", + "should_trigger": false, + "reason": "Updating an entity — routes to living-doc-update" + }, + { + "id": 28, + "query": "What does the checkout refactor affect in the living doc?", + "should_trigger": false, + "reason": "Impact analysis — routes to living-doc-impact-analysis" + }, + { + "id": 29, + "query": "I already have a scenario for AC:US-007-01 — should I skip or regenerate it?", + "should_trigger": true, + "reason": "Merge policy decision for existing scenario — scenario-creator owns this" + } +] \ No newline at end of file diff --git a/skills/living-doc-update/evals/evals.json b/skills/living-doc-update/evals/evals.json index b86815f..28edb38 100644 --- a/skills/living-doc-update/evals/evals.json +++ b/skills/living-doc-update/evals/evals.json @@ -30,10 +30,10 @@ "id": 3, "category": "regression", "prompt": "The `LegacyPaymentGatewayService` has been deleted from the codebase. How do I handle this in the living doc?", - "expected_output": "Agent sets status to deprecated on the entity \u2014 never deletes the entity file. Adds deprecated_at and deprecation_reason fields. Links to the commit that deleted the code if possible. Flags any linked Gherkin scenarios for gherkin-living-doc-sync.", + "expected_output": "Agent sets status to deprecated on the entity — never deletes the entity file. 
Adds deprecated_at and deprecation_reason fields. Links to the commit that deleted the code if possible. Flags any linked Gherkin scenarios for gherkin-living-doc-sync.", "files": [], "expectations": [ - "Sets status: deprecated \u2014 never deletes the entity file", + "Sets status: deprecated — never deletes the entity file", "Adds deprecated_at and deprecation_reason fields", "Links to the commit that deleted the code if possible", "Flags linked Gherkin scenarios for gherkin-living-doc-sync" @@ -55,10 +55,10 @@ "id": 5, "category": "regression", "prompt": "After the sprint review, the product owner clarified the wording of US-042-AC-1. How do I update it without breaking traceability?", - "expected_output": "Agent keeps the AC ID stable \u2014 never changes the ID. Only updates the description, given, when, and then fields. Flags for gherkin-living-doc-sync if the linked scenario step text needs updating to match the revised wording.", + "expected_output": "Agent keeps the AC ID stable — never changes the ID. Only updates the description, given, when, and then fields. Flags for gherkin-living-doc-sync if the linked scenario step text needs updating to match the revised wording.", "files": [], "expectations": [ - "Keeps AC ID stable \u2014 never changes the ID", + "Keeps AC ID stable — never changes the ID", "Updates only description, given, when, then fields", "Flags for gherkin-living-doc-sync if linked scenario text needs updating" ] @@ -67,7 +67,7 @@ "id": 6, "category": "negative", "prompt": "Create a new User Story for the express checkout flow.", - "expected_output": "Creating a new User Story is out of scope for this skill \u2014 routes to living-doc-create-user-story. living-doc-update amends or deprecates existing entities; it does not create new ones.", + "expected_output": "Creating a new User Story is out of scope for this skill — routes to living-doc-create-user-story. 
living-doc-update amends or deprecates existing entities; it does not create new ones.", "files": [], "expectations": [ "Does not create a new User Story", @@ -79,7 +79,7 @@ "id": 7, "category": "negative", "prompt": "Which User Stories don't have any linked Gherkin scenarios?", - "expected_output": "Finding coverage gaps is out of scope for this skill \u2014 routes to living-doc-gap-finder. living-doc-update modifies existing entities; gap detection is handled by living-doc-gap-finder.", + "expected_output": "Finding coverage gaps is out of scope for this skill — routes to living-doc-gap-finder. living-doc-update modifies existing entities; gap detection is handled by living-doc-gap-finder.", "files": [], "expectations": [ "Does not search for User Stories without scenarios", @@ -90,7 +90,7 @@ { "id": 8, "category": "paraphrase", - "prompt": "US-089 needs updating \u2014 we discovered a new edge case during testing: when the delivery address is outside our shipping zone, the order should be blocked with a clear message. Can you update the story?", + "prompt": "US-089 needs updating — we discovered a new edge case during testing: when the delivery address is outside our shipping zone, the order should be blocked with a clear message. Can you update the story?", "expected_output": "Agent identifies this as an add-AC request despite 'update the story' phrasing. Assigns the next sequential AC ID. Forms the AC: Given customer with out-of-zone address / When order is placed / Then order is blocked with SHIPPING_ZONE_EXCLUDED error. Flags for gherkin-living-doc-sync if linked scenarios need updating. Outputs a change summary with the new AC.", "files": [], "expectations": [ @@ -104,11 +104,11 @@ { "id": 9, "category": "edge-case", - "prompt": "We decided during the sprint to descope US-042-AC-3 \u2014 the promo stacking rule is moving to a future release. 
How do I handle this in the living doc without losing the work?", - "expected_output": "Agent sets the AC status to 'descoped' \u2014 does not delete the AC. Adds descoped_at and descoped_reason fields. Adds a future_release reference if the work is planned for a later sprint. Flags any linked Gherkin scenarios for @wip or @pending tagging via gherkin-living-doc-sync.", + "prompt": "We decided during the sprint to descope US-042-AC-3 — the promo stacking rule is moving to a future release. How do I handle this in the living doc without losing the work?", + "expected_output": "Agent sets the AC status to 'descoped' — does not delete the AC. Adds descoped_at and descoped_reason fields. Adds a future_release reference if the work is planned for a later sprint. Flags any linked Gherkin scenarios for @wip or @pending tagging via gherkin-living-doc-sync.", "files": [], "expectations": [ - "Sets AC status to 'descoped' \u2014 does not delete the AC", + "Sets AC status to 'descoped' — does not delete the AC", "Adds descoped_at and descoped_reason fields", "Adds a future_release reference if planned for a later sprint", "Flags linked Gherkin scenarios for @wip or @pending tagging via gherkin-living-doc-sync" @@ -125,7 +125,8 @@ "New text clearly labelled (NEW)", "Linked Gherkin scenarios listed with filename and line number", "No content beyond changed AC and linked scenarios is modified" - ] + ], + "category": "happy-path" }, { "id": 11, @@ -141,7 +142,8 @@ "New AC-2 text shown (labelled NEW)", "Linked Gherkin scenario identified for re-sync", "No other ACs or sections modified" - ] + ], + "category": "happy-path" }, { "id": 12, @@ -156,7 +158,8 @@ "superseded_by points to FEAT-payment-page", "Flags owned Functionalities for deprecation review", "Flags linked tests for update or removal" - ] + ], + "category": "regression" }, { "id": 13, @@ -170,7 +173,8 @@ "Notes the script checks field requirements, ID format, and status values", "Notes --catalog flag for referential 
integrity checks", "Explains exit 0 (valid with warnings) vs exit 1 (errors to fix)" - ] + ], + "category": "regression" }, { "id": 14, @@ -184,18 +188,36 @@ "Identifies the failing invariant: missing error/alternative-path AC", "Proposes an example error-path AC", "Does not set status to active until the invariant passes" - ] }, + ], + "category": "edge-case" + }, { "id": 15, "category": "regression", - "prompt": "AC:US-042-01 is currently ACTIVE at v1.0.0. The discount threshold rule changed \u2014 the minimum order value is now \u00a375 instead of \u00a350. I need to update the AC.", - "expected_output": "Agent shows OLD and NEW AC side by side for confirmation. Updates the AC description to reflect the new threshold. Bumps the version from v1.0.0 to v1.1.0 \u2014 required for all business-rule changes to ACTIVE ACs. The AC ID stays unchanged. Any linked Gherkin scenarios annotated with '# AC: US-042-01' are flagged as potentially stale and handed to gherkin-living-doc-sync.", + "prompt": "AC:US-042-01 is currently ACTIVE at v1.0.0. The discount threshold rule changed — the minimum order value is now £75 instead of £50. I need to update the AC.", + "expected_output": "Agent shows OLD and NEW AC side by side for confirmation. Updates the AC description to reflect the new threshold. Bumps the version from v1.0.0 to v1.1.0 — required for all business-rule changes to ACTIVE ACs. The AC ID stays unchanged. Any linked Gherkin scenarios annotated with '# AC: US-042-01' are flagged as potentially stale and handed to gherkin-living-doc-sync.", "files": [], "expectations": [ "Version bumped from v1.0.0 to v1.1.0", "AC ID stays unchanged", "OLD and NEW AC shown side by side before writing", "Linked Gherkin scenarios flagged for re-sync via gherkin-living-doc-sync" - ] } + ] + }, + { + "id": 16, + "category": "regression", + "prompt": "I need to rename FEAT-checkout to FEAT-checkout-v2. 
Walk me through every change required.", + "expected_output": "Feature rename cascade — 7 steps: (1) Update the Feature entity file: change 'id' and 'name' fields. (2) Update feature_id in all linked Functionality entities. (3) Update feature_registry.json: replace the old key with the new feature id. (4) Update manifest.json / seed.yaml: update the Feature name/id entry. (5) Update PageObject file headers: change '# Feature: FEAT-checkout' to '# Feature: FEAT-checkout-v2'. (6) Update Gherkin feature file headers: change '# Feature: checkout' references. (7) Run living-doc-gap-finder to confirm no orphan references remain.", + "files": [], + "expectations": [ + "Lists all 7 cascade steps", + "Step 2: update feature_id in Functionality entities", + "Step 3: update feature_registry.json", + "Step 5: update PageObject file headers", + "Step 6: update Gherkin # Feature: headers", + "Step 7: run gap-finder to confirm clean state" + ] + } ] -} +} \ No newline at end of file diff --git a/skills/living-doc-update/evals/trigger-eval.json b/skills/living-doc-update/evals/trigger-eval.json index faab5eb..77145b7 100644 --- a/skills/living-doc-update/evals/trigger-eval.json +++ b/skills/living-doc-update/evals/trigger-eval.json @@ -94,5 +94,59 @@ "query": "Mark FEAT-legacy-payment-widget as deprecated and point to FEAT-payment-page as the replacement.", "should_trigger": true, "reason": "'mark feature deprecated' and 'deprecate feature' trigger phrases; superseded_by field is set as part of deprecation." 
+ }, + { + "id": "t17-not-scenario", + "query": "Generate BDD scenarios for all active ACs on US-007", + "should_trigger": false, + "reason": "Generating scenarios — routes to living-doc-scenario-creator" + }, + { + "id": "t18-not-step", + "query": "Write step definitions for the checkout scenarios", + "should_trigger": false, + "reason": "Writing step definitions — routes to gherkin-step" + }, + { + "id": "t19-not-bdd-maintain", + "query": "Run the BDD cleanup for the deprecated checkout feature", + "should_trigger": false, + "reason": "BDD artifact cleanup — routes to bdd-maintain" + }, + { + "id": "t20-not-sync", + "query": "Sync the @AC: header comments in the feature files", + "should_trigger": false, + "reason": "Feature file sync — routes to gherkin-living-doc-sync" + }, + { + "id": "t21-not-pageobject", + "query": "Scan the webapp and regenerate PageObjects after the redesign", + "should_trigger": false, + "reason": "PageObject scanning — routes to living-doc-pageobject-scan" + }, + { + "id": "t22-not-data-cy", + "query": "Add data-cy attributes to the confirm button", + "should_trigger": false, + "reason": "Instrumenting templates — routes to data-cy-instrument" + }, + { + "id": "t23-not-gap-finder", + "query": "Find all User Stories with no AC coverage", + "should_trigger": false, + "reason": "Gap analysis — routes to living-doc-gap-finder" + }, + { + "id": "t24-not-impact", + "query": "What does this PR affect in the living doc?", + "should_trigger": false, + "reason": "Impact analysis — routes to living-doc-impact-analysis" + }, + { + "id": "t25-rename-feature", + "query": "I need to rename FEAT-checkout to FEAT-checkout-v2 — walk me through all the cascade steps", + "should_trigger": true, + "reason": "Feature rename cascade — living-doc-update owns the 7-step rename workflow" } -] +] \ No newline at end of file From fafcd64788d9fc899461e6636f2852c07b6c13e0 Mon Sep 17 00:00:00 2001 From: miroslavpojer <miroslav.pojer@absa.africa> Date: Sun, 31 May 2026 
09:34:57 +0200 Subject: [PATCH 31/35] Tested agent and improved testing doc. --- .../evals/living-doc-bdd-copilot/evals.json | 13 +++ .../living-doc-bdd-copilot/fixture-map.md | 1 + .../agents/living-doc-bdd-copilot.agent.md | 58 ++++++++--- docs/testing/agent-testing.md | 97 ++++++++++++------- docs/testing/skill-testing.md | 24 +++-- 5 files changed, 134 insertions(+), 59 deletions(-) diff --git a/.github/agents/evals/living-doc-bdd-copilot/evals.json b/.github/agents/evals/living-doc-bdd-copilot/evals.json index 0f7c560..a2d26bd 100644 --- a/.github/agents/evals/living-doc-bdd-copilot/evals.json +++ b/.github/agents/evals/living-doc-bdd-copilot/evals.json @@ -337,6 +337,19 @@ "Requires at least one happy-path and one error-path AC", "Asks about existing Functionalities to link — prevents ORPHAN_FUNCTIONALITY gaps" ] + }, + { + "id": 25, + "category": "negative", + "prompt": "Write a unit test for the applyDiscount() service method.", + "expected_output": "Agent declines the request. Explains that writing unit or integration tests is outside its scope. Directs the user to @sdet-copilot (noting it is not yet deployed). 
Does not write or stub any test code, and does not leave a TODO comment in any file.", + "files": [], + "expectations": [ + "Declines the request — does not write any test code", + "Directs user to @sdet-copilot", + "Does not add a TODO comment to any file", + "Does not partially implement or stub the unit test" + ] } ] } \ No newline at end of file diff --git a/.github/agents/evals/living-doc-bdd-copilot/fixture-map.md b/.github/agents/evals/living-doc-bdd-copilot/fixture-map.md index 54cd1b1..eeeab14 100644 --- a/.github/agents/evals/living-doc-bdd-copilot/fixture-map.md +++ b/.github/agents/evals/living-doc-bdd-copilot/fixture-map.md @@ -16,6 +16,7 @@ | 10 | regression | Credential safety — literal credentials in seed.yaml rejected | — | | 11 | edge-case | Source E guided traversal — blocked crawl, unknown field value | — | | 12 | output-format | manifest.json entry structure for a scanned route | — | +| 25 | negative | Unit test request → decline + direct to @sdet-copilot | — | ## Trigger eval summary diff --git a/.github/agents/living-doc-bdd-copilot.agent.md b/.github/agents/living-doc-bdd-copilot.agent.md index 580e1cf..9f8518f 100644 --- a/.github/agents/living-doc-bdd-copilot.agent.md +++ b/.github/agents/living-doc-bdd-copilot.agent.md @@ -1,17 +1,13 @@ --- description: > - Single agent for living documentation and BDD automation — catalog management plus - executable test generation. Catalog: create/update/deprecate User Stories, Features, - Functionalities and ACs; impact analysis; gap finding (AUDIT/PLAN modes). - Automation: explore webapps, generate PageObjects, produce Gherkin scenarios and step - definitions, maintain BDD suites, sync traceability. 
Triggers: "create user story", - "document feature", "update AC", "impact analysis", "living doc gaps", "PLAN mode", - "AUDIT mode", "deprecate entity", "mark US ready", "scan webapp", "generate pageobjects", - "heal pageobjects", "generate scenarios", "sync gherkin", "playwright crawl", - "explore the app", "BDD pipeline", "crawl the UI", "create page objects", - "generate feature file", "step definitions", "add missing data-cy", "fix playwright selectors", - "living doc bdd copilot", "living doc copilot". -tools: [vscode/askQuestions, vscode/toolSearch, vscode/memory, vscode/resolveMemoryFileUri, vscode/runCommand, vscode/vscodeAPI, execute/runInTerminal, execute/getTerminalOutput, execute/sendToTerminal, execute/killTerminal, execute/runTask, execute/createAndRunTask, read/readFile, read/viewImage, read/problems, read/terminalLastCommand, agent/runSubagent, browser/openBrowserPage, browser/readPage, browser/screenshotPage, browser/navigatePage, browser/clickElement, browser/dragElement, browser/hoverElement, browser/typeInPage, browser/runPlaywrightCode, browser/handleDialog, edit/createDirectory, edit/createFile, edit/editFiles, edit/rename, search/codebase, search/fileSearch, search/listDirectory, search/textSearch, search/usages, web/fetch, web/githubRepo, web/githubTextSearch, todo] + Living documentation catalog (User Story/Feature/Functionality entities, ACs, + living-doc traceability analysis, gap finding) and BDD automation (Playwright + crawl/explore/scan, PageObject create/heal, Gherkin scenarios/feature files/step + definitions, living-doc sync, scenario coverage). Catalog entity creation, + update, deprecation; PR trace for living-doc entity impact; credential + validation in seed.yaml. NOT for: unit tests, production code, API or generic + tech docs, CI/CD, debugging, performance, security, code review. 
+tools: [vscode/askQuestions, vscode/toolSearch, vscode/memory, vscode/resolveMemoryFileUri, execute/runInTerminal, execute/getTerminalOutput, execute/sendToTerminal, execute/killTerminal, read/readFile, read/viewImage, read/problems, read/terminalLastCommand, agent/runSubagent, browser/openBrowserPage, browser/readPage, browser/screenshotPage, browser/navigatePage, browser/clickElement, browser/dragElement, browser/hoverElement, browser/typeInPage, browser/runPlaywrightCode, browser/handleDialog, edit/createDirectory, edit/createFile, edit/editFiles, edit/rename, search/codebase, search/fileSearch, search/listDirectory, search/textSearch, search/usages, web/fetch, web/githubRepo, web/githubTextSearch, todo] --- # @living-doc-bdd-copilot @@ -102,6 +98,36 @@ Load **one** skill per session. Do not pre-load skills for modes not yet trigger | Implement step definitions | `gherkin-step` | No manifest loading | | Find ACs with no linked scenario | `living-doc-gap-finder` (bottom-up) | No manifest loading | +### Automation session setup + +**Seed assembly** — build `seed.yaml` from these sources (load what is available; note absent sources, do not error): + +| Source | What to load | +|---|---| +| A | Feature-to-route mappings from the living doc catalog | +| B | Route config: Angular router, React Router, or `sitemap.xml` | +| D | Existing `manifest.json` — if absent, this is a first-run | + +After creating `seed.yaml`, propose adding BDD artifact paths (seed, manifest, PageObjects, feature files) to `.github/copilot-instructions.md` so future sessions have them in context automatically. 
+ +**Partial state detection:** + +| State | Rule | +|---|---| +| seed.yaml present, manifest.json absent | First exploration run — start from `base_url`, create manifest during crawl, do not assume prior discovery | +| Both present | Resume session from manifest state | +| Neither present | Collect seed inputs from user before proceeding | + +**Credential security:** `seed.yaml` credentials must always use `env:VAR_NAME` references. If literal credential values are present, flag as a **security violation** and refuse to proceed until they are replaced with environment variable references. Explain that literal credentials in a committed file are exposed to anyone with repository access. + +**Guided traversal (Source E):** When the crawl reaches a page requiring a business-specific value the agent cannot determine (unknown form field, decision point): + +1. Take a screenshot and show the user the current state. +2. Ask: "I've reached a decision point at `<url>`. What should I do next? Please provide the value for `<field>`." +3. Execute the action via MCP Playwright after receiving the answer. +4. Immediately append the action to `guided_steps` in `seed.yaml` so the route can be re-navigated without prompting in future sessions. +5. Do not invent or guess business-specific field values. + ### Entity deprecation chain When a User Story or Feature is deprecated, three skills fire in sequence. Complete each step fully before starting the next. @@ -114,9 +140,9 @@ When a User Story or Feature is deprecated, three skills fire in sequence. Compl Do not skip steps or run them out of order. Complete catalog changes (step 1) before touching any Gherkin or automation files. -**Manifest loading rule:** Use targeted line ranges for the current route(s). Load full manifest only for RE-SCAN. `seed.yaml`: always load in full. +**Manifest loading rule:** Use targeted line ranges for the current route(s). Load full manifest only for RE-SCAN. `seed.yaml`: always load in full. 
When PageObject generation discovers a route with no linked Feature entity, set `feature_id: FEAT-UNKNOWN`, flag the route as needing a Feature entity, and cross-load `living-doc-create-feature` to create it before continuing. -**living-doc-bdd-schemas:** Load [remotely](https://raw.githubusercontent.com/AbsaOSS/agentic-toolkit/master/skills/references/living-doc-bdd-schemas.md) only when generating or validating feature file headers, PageObject headers, ExplorationFixture entries, or seed.yaml form_fixtures. +**living-doc-bdd-schemas:** Load [remotely](https://raw.githubusercontent.com/AbsaOSS/agentic-toolkit/master/skills/references/living-doc-bdd-schemas.md) only when generating or validating feature file headers, PageObject headers, ExplorationFixture entries, seed.yaml form_fixtures, or manifest.json route entries. --- @@ -139,8 +165,8 @@ Do not skip steps or run them out of order. Complete catalog changes (step 1) be ## Does NOT -- Write unit or integration tests: `@sdet-copilot` _(not yet deployed — leave `TODO: @sdet-copilot`)_ -- Run language-specific quality gates: `@quality-gate-copilot` _(not yet deployed — leave a TODO note)_ +- **Write unit or integration tests** — decline and direct the user to `@sdet-copilot` (not yet deployed). Do not write or modify any test code. +- **Run language-specific quality gates** — decline and direct the user to `@quality-gate-copilot` (not yet deployed). Do not execute linters, type-checkers, or build pipelines. --- diff --git a/docs/testing/agent-testing.md b/docs/testing/agent-testing.md index 87f2e06..92bb95c 100644 --- a/docs/testing/agent-testing.md +++ b/docs/testing/agent-testing.md @@ -46,37 +46,45 @@ The key insight: an agent's `description:` block is read by the same matching me ## 3. Trigger eval format -Mirrors the skill trigger-eval format exactly. 
Store at `.github/agents/evals/<agent-name>/trigger-eval.json`: +Store at `.github/agents/evals/<agent-name>/trigger-eval.json` as a **flat JSON array** (no wrapper object): ```json -{ - "agent_name": "my-agent", - "evals": [ - { - "id": "should-trigger-1", - "prompt": "scan this webapp and generate pageobjects", - "should_trigger": true - }, - { - "id": "should-trigger-2", - "prompt": "explore the app and create page objects for the login screen", - "should_trigger": true - }, - { - "id": "should-not-trigger-1", - "prompt": "create a user story for the login feature", - "should_trigger": false, - "expected_agent": "living-doc-bdd-copilot" - }, - { - "id": "should-not-trigger-2", - "prompt": "write a unit test for the login validator", - "should_trigger": false - } - ] -} +[ + { + "id": 1, + "query": "Scan the webapp at https://app.example.com and generate PageObjects", + "should_trigger": true, + "reason": "'scan webapp' + 'generate pageobjects' core phrase" + }, + { + "id": 2, + "query": "Explore the app and map all the UI surfaces", + "should_trigger": true, + "reason": "'explore the app' maps to crawl/explore mode" + }, + { + "id": 3, + "query": "Create a User Story for the loyalty points redemption feature", + "should_trigger": true, + "reason": "Catalog entity creation — living-doc layer" + }, + { + "id": 4, + "query": "Write a unit test for the login validator", + "should_trigger": false, + "reason": "Unit test authoring — out of scope" + }, + { + "id": 5, + "query": "Debug the null pointer exception in PaymentService.processOrder()", + "should_trigger": false, + "reason": "Application debugging — outside scope" + } +] ``` +Note: the field is `query` (not `prompt`). The `reason` field is for human documentation only — it is not used by the eval runner. + Write at least **5 should-trigger** and **5 should-not-trigger** cases. Should-not-trigger cases are as important as the positive ones — they catch over-broad descriptions that shadow other agents. 
--- @@ -122,6 +130,8 @@ Point `skill-creator` at the agent files — it treats the `description:` block ``` Use the skill-creator skill to optimize the description for .github/agents/my-agent.agent.md using the trigger evals at .github/agents/evals/my-agent/trigger-eval.json. +Constraints: ≤ 1024 chars; structured domain nouns/verbs; include a NOT for: boundary clause. +Report precision and recall scores for each candidate. Repeat until all trigger evals pass. ``` `skill-creator` will propose candidate descriptions, score them against the eval set, and iterate. @@ -131,6 +141,11 @@ using the trigger evals at .github/agents/evals/my-agent/trigger-eval.json. ``` Use the skill-creator skill to run the body evals for .github/agents/my-agent.agent.md using .github/agents/evals/my-agent/evals.json. +Verify: (1) all body-referenced tools are present in the frontmatter tools: list, +(2) mode dispatch routes to the correct skill for each intent, +(3) scope boundaries match ## Scope and ## Does NOT, (4) handoff targets are correct. +Only fix scope, tool, or handoff issues — do not rewrite unless fundamentally mis-scoped. +Repeat until all evals pass. ``` Use the same with-skill / baseline comparison flow described in [skill-testing.md](./skill-testing.md). @@ -180,14 +195,19 @@ description: > Generates BDD tests. ``` -**Good pattern** — explicit Triggers list with concrete phrases: +**Good pattern** — minimalist semantic description with a `NOT for:` boundary: ```yaml description: > - Bridge living documentation to executable tests. ... - Triggers: "scan webapp", "generate pageobjects", "heal pageobjects", - "playwright crawl", "BDD pipeline", "crawl the UI". + Living documentation catalog (User Stories, Features, Functionalities, ACs, impact + analysis, gap finding) and BDD automation (Playwright crawl/explore/scan, PageObjects + create/heal, Gherkin scenarios/feature files/step definitions, living-doc sync, + scenario coverage). 
Setup: seed.yaml → manifest.json, credential checks, guided + traversal. NOT for: unit tests, production code, API specs, CI/CD, debugging, + performance, security. ``` +The `NOT for:` clause is as important as the positive terms — it prevents the agent from firing on adjacent-but-out-of-scope requests. An explicit `Triggers:` keyword list is not required; structured domain nouns and verbs are sufficient for the matching mechanism to work. + --- ## 9. Regression-first loop @@ -208,13 +228,18 @@ Same as skill testing — run the full trigger-eval set, fix the largest failure ## 10. Minimal session ``` -gh copilot +VS Code Copilot Chat (or gh copilot): → "Use the skill-creator skill to test the agent at .github/agents/my-agent.agent.md - using the evals at .github/agents/evals/my-agent/" -→ inspect trigger accuracy and body output diffs + using the evals at .github/agents/evals/my-agent/. + Report trigger precision/recall and body eval pass rate." +→ inspect trigger accuracy report and body output diffs; + classify each change as improvement, regression, or neutral → edit the `.agent.md` file directly to fix structural issues -→ "Use the skill-creator skill to optimize the description using the trigger-eval.json" -→ re-run evals until stable + (scope, tools: list, mode dispatch, handoff) +→ "Use the skill-creator skill to optimize the description for .github/agents/my-agent.agent.md + using .github/agents/evals/my-agent/trigger-eval.json. + Keep ≤ 1024 chars; include a NOT for: boundary clause. Repeat until all evals pass." +→ re-run full eval suite; keep or revert each change; repeat until stable ``` --- diff --git a/docs/testing/skill-testing.md b/docs/testing/skill-testing.md index a4942c9..a6d76f6 100644 --- a/docs/testing/skill-testing.md +++ b/docs/testing/skill-testing.md @@ -45,6 +45,8 @@ Ask for a side-by-side comparison in the Copilot CLI session: ``` Use the skill-creator skill to compare outputs for skills/my-skill with and without the skill enabled. 
+Compare on: correctness, structure adherence, completeness, and output verbosity. +Only fix the smallest part of the skill that explains the largest failure cluster. Repeat until all evals pass. ``` Compare correctness, completeness, structure, latency, verbosity, and formatting stability. @@ -78,9 +80,14 @@ When an eval fails, update the smallest possible part of the skill and re-run th Ask Copilot to optimize the description against your trigger eval set: ``` -Use the skill-creator skill to optimize the description for skills/my-skill using skills/my-skill/evals/trigger-eval.json. +Use the skill-creator skill to optimize the description for skills/my-skill +using skills/my-skill/evals/trigger-eval.json. +Keep minimalist (≤ 1024 chars); structured domain nouns/verbs preferred over explicit keyword lists. +Include a NOT for: boundary clause. Report precision and recall per candidate. Repeat until all evals pass. ``` +A good description uses structured domain nouns and a `NOT for:` boundary. An explicit keyword list is not required. + ## 10. What “good enough” looks like - All smoke tests and known regressions passing @@ -101,12 +108,15 @@ Use the skill-creator skill to optimize the description for skills/my-skill usin ## 12. Minimal CLI Loop ``` -gh copilot -→ "Use the skill-creator skill to test my skill at skills/my-skill" -→ inspect results and diffs -→ edit SKILL.md or fixtures -→ "Use the skill-creator skill to rerun the evals for skills/my-skill" -→ optimize description if needed +VS Code Copilot Chat (or gh copilot): +→ "Use the skill-creator skill to test my skill at skills/my-skill. + Run all evals and report pass rate and baseline delta." +→ inspect results and diffs — classify each change as improvement, regression, or neutral +→ edit SKILL.md or fixtures (smallest change that fixes the largest failure cluster) +→ "Use the skill-creator skill to rerun the evals for skills/my-skill." 
+→ "Use the skill-creator skill to optimize the description for skills/my-skill + using skills/my-skill/evals/trigger-eval.json. + Keep ≤ 1024 chars; include a NOT for: boundary clause. Repeat until all evals pass." → repeat until stable ``` From d4462a723c75b7c690dbdeb01816269931b1bb82 Mon Sep 17 00:00:00 2001 From: miroslavpojer <miroslav.pojer@absa.africa> Date: Sun, 31 May 2026 10:05:49 +0200 Subject: [PATCH 32/35] Skill test - round 1. --- skills/bdd-maintain/SKILL.md | 18 ++++++++++++++++++ skills/data-cy-instrument/SKILL.md | 3 ++- .../living-doc-create-functionality/SKILL.md | 3 ++- skills/living-doc-gap-finder/SKILL.md | 11 ++++++++++- skills/living-doc-pageobject-scan/SKILL.md | 3 ++- .../evals/evals.json | 4 ++-- skills/living-doc-update/SKILL.md | 2 +- 7 files changed, 37 insertions(+), 7 deletions(-) diff --git a/skills/bdd-maintain/SKILL.md b/skills/bdd-maintain/SKILL.md index 0b07cf9..437a8d8 100644 --- a/skills/bdd-maintain/SKILL.md +++ b/skills/bdd-maintain/SKILL.md @@ -100,6 +100,24 @@ python playwright/scripts/find_unused_po_components.py \ All three scripts exit `0` on clean, `1` on findings, `2` on bad arguments — safe for CI gating. +### Handling findings — edge cases + +**Recommended script order for a full audit:** run in the sequence steps → PO methods → PO components. Deleting unused steps can expose unused PO methods; deleting unused PO methods can then expose unused PO classes. Running in this order ensures each pass builds on the previous one rather than missing transitively dead code. + +**Unused step def — distinguish before deleting:** +- If the step belongs to a **deprecated entity** (the US/Feature has `status: deprecated` in the catalog), delete it — the coverage it provided is no longer needed. +- If the step belongs to an **active entity** but has no exercising scenario, it is a stale draft or an orphan; flag it for team review before deleting. Someone may be about to add a scenario for it. 
+- Never delete without first verifying the step is not imported or re-exported by another step file — grep for the step file name as an import target as well. + +**Script false positives — `find_unused_po_methods.py`:** +The script uses static string matching. It can report a false positive when: +- A method is called on a variable typed as a **base class or interface** rather than the concrete PageObject (e.g. `page.confirm_order()` where `page: BasePage`) +- A method is invoked via a dynamic alias or through a test helper that re-exports it +If the reported method visibly exists in a step file call site, trust the call site over the script output and do not delete the method. File the false positive with the team so the script can be improved. + +**Shared steps between deprecated and active scenarios:** +Before deleting a step definition, always check whether it is exercised by any scenario outside the deprecated entity. Run `find_unused_steps.py` after removing the deprecated scenarios — not before. If the script still reports the step as unused after the deprecated scenarios are removed, it is safe to delete. If it now shows as used (by a surviving scenario), keep it. + --- ## Out-of-scope routing diff --git a/skills/data-cy-instrument/SKILL.md b/skills/data-cy-instrument/SKILL.md index 776e739..02ed648 100644 --- a/skills/data-cy-instrument/SKILL.md +++ b/skills/data-cy-instrument/SKILL.md @@ -7,7 +7,8 @@ description: > Functionalities have `status: planned` due to missing test IDs. Triggers on: "add missing data-cy", "instrument templates", "fix data-cy gaps", "add testids", "data-cy audit", "instrument angular templates", "fix locators", "add data-cy attributes", - "add test ids to templates", "fix playwright selectors due to missing data-cy", "data-cy-instrument". + "add test ids to templates", "fix playwright selectors due to missing data-cy", "data-cy-instrument", + "coverage_gaps", "Functionality status planned". 
Does NOT trigger for: adding Gherkin (use living-doc-scenario-creator); PageObject healing without data-cy gaps (use living-doc-pageobject-scan HEALING). Pairs with living-doc-pageobject-scan (upstream) and living-doc-scenario-creator (downstream); diff --git a/skills/living-doc-create-functionality/SKILL.md b/skills/living-doc-create-functionality/SKILL.md index 5c411c3..13a72c0 100644 --- a/skills/living-doc-create-functionality/SKILL.md +++ b/skills/living-doc-create-functionality/SKILL.md @@ -7,7 +7,8 @@ description: > Triggers on: "create a functionality", "document an atomic behavior", "functionality AC", "unit-testable behavior", "define component behavior", "atomic acceptance criteria", "document a business rule", "create a functionality entity", "functionality acceptance criteria", - "test_type", "unit vs integration test", "choose test type", "link functionality to feature". + "test_type", "unit vs integration test", "choose test type", "link functionality to feature", + "review this functionality", "reuse candidate", "what ACs should I write for". Does NOT trigger for: E2E User Stories (use living-doc-create-user-story); system surfaces (use living-doc-create-feature); generating BDD scenarios (use living-doc-scenario-creator). diff --git a/skills/living-doc-gap-finder/SKILL.md b/skills/living-doc-gap-finder/SKILL.md index 82129b4..5d0d529 100644 --- a/skills/living-doc-gap-finder/SKILL.md +++ b/skills/living-doc-gap-finder/SKILL.md @@ -9,7 +9,7 @@ description: > Triggers on: "find what's not documented", "living doc gaps", "what's missing in living doc", "find undocumented features", "orphan tests", "orphan functionalities", "untested AC", "documentation coverage", "gap report", "what's not covered", "living doc audit", - "documentation audit". + "documentation audit", "stale reference", "broken AC link", "test points to deprecated AC". Does NOT trigger for: creating new living doc objects (use living-doc-create-* skills). 
Pairs with living-doc-update (stale references) and living-doc-create-* skills (gap resolution). license: Apache-2.0 @@ -77,6 +77,12 @@ Nine types of gaps are detected, in order of risk: > **Resolution routing:** `UNTESTED_AC` → `living-doc-scenario-creator`; `UNDOCUMENTED_SURFACE` / `ORPHAN_FUNCTIONALITY` / `EMPTY_FEATURE` → `living-doc-create-*`; `ORPHAN_FEATURE` / `ORPHAN_USER_STORY` → `living-doc-update` (add missing link); `ORPHAN_TEST` → `gherkin-living-doc-sync`; **`STALE_REFERENCE`** → `living-doc-update` (deprecate the AC or update the test `@AC:` tag); `UNDOCUMENTED_FUNCTIONALITY` → `living-doc-scenario-creator`. +> **ORPHAN_TEST — never delete a test to resolve the gap.** Deleting a test removes coverage; it does not close the gap — it masks it. Instead: (1) find an existing AC that matches the test's intent and add the `@AC:` link, or (2) if no AC exists, create a Functionality with `living-doc-create-functionality` and link the test to the new AC. Only delete a test after explicit product owner confirmation that the behavior is no longer required. + +> **ORPHAN_TEST — broken-link variant:** A test may reference an AC that was deleted from the catalog entirely (not merely deprecated). Classify this as `ORPHAN_TEST` (broken-link variant) — not `STALE_REFERENCE`. Resolution options: (1) recreate the entity if the behavior is still required and relink; (2) update the test link to the AC that superseded it; (3) delete the test after product owner confirmation. Never delete without confirmation. + +> **Large-scale ORPHAN_TEST remediation:** When a codebase has dozens or hundreds of orphan tests, do not attempt a single full-codebase pass. Batch by domain or Feature area (for example payment, auth, reporting) and process the highest-business-risk areas first. For each batch, identify which Functionalities or User Stories the tests correspond to, create missing entities, and link tests. 
A single unmanageable gap report leads to paralysis — smaller focused batches produce actionable outcomes. + ## Workflow ### Step 1 — Bottom-up scan @@ -150,6 +156,9 @@ For each test in inventory GAP: STALE_REFERENCE ``` +**ORPHAN_TEST — broken-link variant:** +Also report `ORPHAN_TEST` when a test references an AC ID that **no longer exists** in the catalog (deleted, not merely deprecated). Distinguishing the two: a deprecated AC still has a living entity and can be reinstated; a deleted AC has no catalog entry at all. Resolution options are the same as standard `ORPHAN_TEST` — see the resolution routing note above. + **UNDOCUMENTED_FUNCTIONALITY:** ``` For each Functionality reachable via Feature `functionalities` links diff --git a/skills/living-doc-pageobject-scan/SKILL.md b/skills/living-doc-pageobject-scan/SKILL.md index f4b28e3..2cb4e3d 100644 --- a/skills/living-doc-pageobject-scan/SKILL.md +++ b/skills/living-doc-pageobject-scan/SKILL.md @@ -9,7 +9,8 @@ description: > Triggers on: "scan this webapp", "generate pageobjects", "crawl the UI", "explore the app", "discover routes", "seed.yaml", "manifest.json", "first scan", "create page objects", "pageobject drift", "re-scan", "refresh manifest", "heal pageobjects", "fix failing tests", - "selector drift", "tests are failing". + "selector drift", "tests are failing", "generate functionality stubs", + "bootstrap pageobjects", "bootstrap page objects". Does NOT trigger for: adding/fixing Gherkin (use living-doc-scenario-creator); resolving missing data-cy (use data-cy-instrument); deleting deprecated BDD files (use bdd-maintain). 
Pairs with data-cy-instrument, living-doc-create-feature, living-doc-scenario-creator, diff --git a/skills/living-doc-scenario-creator/evals/evals.json b/skills/living-doc-scenario-creator/evals/evals.json index 8dc8d50..878e48b 100644 --- a/skills/living-doc-scenario-creator/evals/evals.json +++ b/skills/living-doc-scenario-creator/evals/evals.json @@ -63,8 +63,8 @@ "expected_output": "Standalone Gherkin without a User Story uses living-doc-scenario-creator Standalone mode. Use @AC:STANDALONE tag; scenario is not tied to a catalog entity.", "files": [], "expectations": [ - "Does not generate a standalone scenario", - "Uses Standalone mode with @AC:STANDALONE tag", + "Generates the scenario using Standalone mode — does not refuse the request", + "Uses @AC:STANDALONE tag on the generated scenario", "Explains the distinction: US-driven vs. standalone Gherkin" ] }, diff --git a/skills/living-doc-update/SKILL.md b/skills/living-doc-update/SKILL.md index 6d65323..a8c6135 100644 --- a/skills/living-doc-update/SKILL.md +++ b/skills/living-doc-update/SKILL.md @@ -30,7 +30,7 @@ Ask: *Which entity is being updated, and what kind of change is this?* | Add a new AC | User Story / Functionality | Append a new AC entry with the next sequential AC ID | | Modify AC description | User Story / Functionality | Edit the description; keep the AC ID stable | | Change status | Any entity | Update `status` field; record the transition event | -| Change owner | Feature | Update `owners` field | +| Change owner | Feature | Update `owners` field; add `owner_changed_at` (ISO date) and `owner_change_reason` fields; notify the new owner if open User Stories are linked to the Feature | | Add a linked User Story | Feature | Append to `user_stories` | | Deprecate an entity | Any entity | Set `status: deprecated`; add `deprecated_at`, `deprecation_reason`, and optionally `superseded_by` | | Delete a Functionality | Functionality | Do not delete — deprecate it and link to the commit that removed the 
code | From c615596891ac585d7212cbb8497d0d8dfef3f04a Mon Sep 17 00:00:00 2001 From: miroslavpojer <miroslav.pojer@absa.africa> Date: Sun, 31 May 2026 10:34:46 +0200 Subject: [PATCH 33/35] Update SKILL.md files to enhance trigger phrases and improve descriptions across various living doc skills --- skills/living-doc-create-feature/SKILL.md | 2 +- skills/living-doc-create-functionality/SKILL.md | 7 +++---- skills/living-doc-gap-finder/SKILL.md | 5 +++-- skills/living-doc-impact-analysis/SKILL.md | 5 ++--- skills/living-doc-pageobject-scan/SKILL.md | 3 +-- skills/token-saving/SKILL.md | 6 +++--- 6 files changed, 13 insertions(+), 15 deletions(-) diff --git a/skills/living-doc-create-feature/SKILL.md b/skills/living-doc-create-feature/SKILL.md index 3e2c2b4..14631b8 100644 --- a/skills/living-doc-create-feature/SKILL.md +++ b/skills/living-doc-create-feature/SKILL.md @@ -8,7 +8,7 @@ description: > Triggers on: "document a new feature", "create a feature entity", "new screen documentation", "document an API endpoint", "feature registry", "what feature owns this", "map user story to feature", "system surface documentation", "feature owners", "feature dependencies", - "duplicate feature name", "resolve feature naming". + "duplicate feature name", "resolve feature naming", "rename feature". Does NOT trigger for: creating User Stories (use living-doc-create-user-story); defining behaviors (use living-doc-create-functionality); scanning PageObjects (use living-doc-pageobject-scan); deprecating (use living-doc-update). diff --git a/skills/living-doc-create-functionality/SKILL.md b/skills/living-doc-create-functionality/SKILL.md index 13a72c0..b11f089 100644 --- a/skills/living-doc-create-functionality/SKILL.md +++ b/skills/living-doc-create-functionality/SKILL.md @@ -2,7 +2,7 @@ name: living-doc-create-functionality description: > Define an atomic, testable behavior (Functionality) with Acceptance Criteria for unit or - integration tests. 
Use when documenting an atomic behavior, writing Functionality-level ACs, + integration tests. Use when writing Functionality-level ACs, choosing test_type, identifying reuse candidates, or reviewing a Functionality. Triggers on: "create a functionality", "document an atomic behavior", "functionality AC", "unit-testable behavior", "define component behavior", "atomic acceptance criteria", @@ -12,9 +12,8 @@ description: > Does NOT trigger for: E2E User Stories (use living-doc-create-user-story); system surfaces (use living-doc-create-feature); generating BDD scenarios (use living-doc-scenario-creator). - Pairs with living-doc-create-feature (parent surface first) and living-doc-scenario-creator - (BDD after). After creating, update the parent Feature's functionalities[] array - (else ORPHAN_FUNCTIONALITY gap). + Pairs with living-doc-create-feature and living-doc-scenario-creator. After creating, + update the parent Feature's functionalities[] array. license: Apache-2.0 compatibility: GitHub Copilot --- diff --git a/skills/living-doc-gap-finder/SKILL.md b/skills/living-doc-gap-finder/SKILL.md index 5d0d529..413995c 100644 --- a/skills/living-doc-gap-finder/SKILL.md +++ b/skills/living-doc-gap-finder/SKILL.md @@ -9,9 +9,10 @@ description: > Triggers on: "find what's not documented", "living doc gaps", "what's missing in living doc", "find undocumented features", "orphan tests", "orphan functionalities", "untested AC", "documentation coverage", "gap report", "what's not covered", "living doc audit", - "documentation audit", "stale reference", "broken AC link", "test points to deprecated AC". + "documentation audit", "stale reference", "broken AC link", "test points to deprecated AC", + "PLAN mode", "AUDIT mode", "draft ACs from PageObject descriptions". Does NOT trigger for: creating new living doc objects (use living-doc-create-* skills). - Pairs with living-doc-update (stale references) and living-doc-create-* skills (gap resolution). 
+ Pairs with living-doc-update and living-doc-create-* skills. license: Apache-2.0 compatibility: GitHub Copilot --- diff --git a/skills/living-doc-impact-analysis/SKILL.md b/skills/living-doc-impact-analysis/SKILL.md index 6157478..dea7b2c 100644 --- a/skills/living-doc-impact-analysis/SKILL.md +++ b/skills/living-doc-impact-analysis/SKILL.md @@ -9,11 +9,10 @@ description: > Triggers on: "living doc impact", "what does this change affect", "impact of PR on living doc", "trace affected user stories", "affected features", "impact analysis", "living doc sign-off", "what user stories are affected", "which scenarios need re-running", "what needs re-testing", - "PR impact on docs". + "PR impact on docs", "bootstrap feature_registry". Does NOT trigger for: updating living doc (use living-doc-update); finding coverage gaps (use living-doc-gap-finder); creating new entities (use living-doc-create-*). - Pairs with living-doc-update (apply changes), gherkin-living-doc-sync (propagate AC changes), - and bdd-maintain (cleanup for deprecated entities). + Pairs with living-doc-update, gherkin-living-doc-sync, and bdd-maintain. license: Apache-2.0 compatibility: GitHub Copilot --- diff --git a/skills/living-doc-pageobject-scan/SKILL.md b/skills/living-doc-pageobject-scan/SKILL.md index 2cb4e3d..a655a97 100644 --- a/skills/living-doc-pageobject-scan/SKILL.md +++ b/skills/living-doc-pageobject-scan/SKILL.md @@ -13,8 +13,7 @@ description: > "bootstrap pageobjects", "bootstrap page objects". Does NOT trigger for: adding/fixing Gherkin (use living-doc-scenario-creator); resolving missing data-cy (use data-cy-instrument); deleting deprecated BDD files (use bdd-maintain). - Pairs with data-cy-instrument, living-doc-create-feature, living-doc-scenario-creator, - and gherkin-living-doc-sync. + Pairs with data-cy-instrument, living-doc-create-feature, and living-doc-scenario-creator. 
license: Apache-2.0 compatibility: GitHub Copilot --- diff --git a/skills/token-saving/SKILL.md b/skills/token-saving/SKILL.md index 473b75d..98c6c1f 100644 --- a/skills/token-saving/SKILL.md +++ b/skills/token-saving/SKILL.md @@ -9,9 +9,9 @@ description: > Great question!, Happy to help!); no closing platitudes (Let me know if you have questions!); concise within line limits; skip restating prior context; prefer tables/bullets over prose; append What changed / Why / How to verify footer only for code-output responses, not Q&A, - reviews, or planning. Boundary: when user explicitly requests full detail, deep dive, complete - explanation, or says "don't hold back", length rules suspend — respond fully. Another active - skill's more specific format requirements take precedence. + reviews, or planning. NOT for (full-detail override): when user explicitly requests full detail, + deep dive, complete explanation, or says "don't hold back", length rules suspend — respond + fully. Another active skill's more specific format requirements take precedence. --- # Token-Saving From c70f8ca7ec61f85b65e781d7e2b2d159240a337d Mon Sep 17 00:00:00 2001 From: miroslavpojer <miroslav.pojer@absa.africa> Date: Sun, 31 May 2026 11:33:22 +0200 Subject: [PATCH 34/35] Worked in changes from weakness review.
--- skills/gherkin-step/SKILL.md | 140 ++++++++++++++++++ skills/living-doc-create-feature/SKILL.md | 14 ++ .../evals/evals.json | 27 ++++ .../living-doc-create-functionality/SKILL.md | 14 ++ .../evals/evals.json | 27 ++++ skills/living-doc-create-user-story/SKILL.md | 14 ++ .../evals/evals.json | 27 ++++ skills/living-doc-pageobject-scan/SKILL.md | 18 +++ .../evals/evals.json | 28 ++++ .../evals/evals.json | 69 +++++++++ skills/references/living-doc-glossary.md | 16 ++ 11 files changed, 394 insertions(+) diff --git a/skills/gherkin-step/SKILL.md b/skills/gherkin-step/SKILL.md index 48d2e88..c6ddddc 100644 --- a/skills/gherkin-step/SKILL.md +++ b/skills/gherkin-step/SKILL.md @@ -81,6 +81,21 @@ Before(async function (this: AppWorld) { - Place under `playwright/steps/` (TS) or `features/steps/` (Python) - Never name a file `steps.ts` or `steps.py` — the name must identify the domain +**Given precondition state — OGP-01:** `Given` preconditions that navigate to an arbitrary element using `.first()` (or any positional selector) without asserting the domain-specific state required by the scenario create false positives. If the scenario distinguishes between, for example, a domain the user owns versus one they do not own, supply fixture-provided IDs via the env fixture (`ownedDomainId`, `nonOwnedDomainId`) rather than picking the first element from a list. + +```typescript +// ✅ — uses fixture-provided ID to guarantee correct ownership state +Given('I am on the Domain Detail page for a domain I own', async ({ page, env }) => { + await page.goto(`/auth/domain/${env.ownedDomainId}`); +}); + +// ❌ — both "own" and "do not own" variants resolve to the same arbitrary domain +Given('I am on the Domain Detail page for a domain I own', async ({ page }) => { + await page.goto('/auth/all-domains'); + await page.getByTestId('domain-name-link').first().click(); +}); +``` + --- ## Function naming convention @@ -100,6 +115,21 @@ objects, or service clients. 
Business logic must not live in step definitions. - `Given` steps must not contain assertions — they set up preconditions only - `When` steps must not contain assertions — they perform actions only - Assertions belong exclusively in `Then` steps +- A step body consisting only of comments is a no-op and is not permitted as a final implementation — NOP-01. If the system pre-establishes state externally, the step must assert that state is actually present rather than silently pass. + +```typescript +// ✅ — pre-populated state is explicitly asserted +When('I select a domain', async ({ page }) => { + // Domain is pre-populated from context; assert selector shows a value + await expect(page.getByTestId('domain-selector')).not.toBeEmpty(); +}); + +// ❌ — comment-only body; regression goes undetected +When('I select a domain', async ({ page }) => { + // Domain is pre-selected when navigated from within a domain context + // No additional action needed +}); +``` ```python # ✅ — thin; delegates to PageObject @@ -133,6 +163,18 @@ When("the customer submits the order", async function (this: OrderWorld) { }); ``` +**Pending data-cy rule — SS-01:** Do not write CSS-class-OR-data-cy fallback combos (e.g. `'.modal, [data-cy="x"]'`) in step files or PageObjects. A fallback combo either always passes (the CSS class matches when the data-cy does not exist) or always fails (neither exists), both masking real failures. If the confirmed `data-cy` attribute does not yet exist in the template: +1. Use the most stable interim selector available and mark it with `// @pending data-cy: <candidate-name>`. +2. Raise it as a gap in WORK_LOG.md §4 so it is tracked for instrumentation via `data-cy-instrument`. 
+ +```typescript +// ✅ — interim selector clearly flagged +await expect(page.locator('[role="dialog"]')).toBeVisible(); // @pending data-cy: dialog-access-request + +// ❌ — fallback combo hides whether the real selector ever lands +await expect(page.locator('[role="dialog"], .access-request-form')).toBeVisible(); +``` + --- ## Share state using the context / World object @@ -156,10 +198,42 @@ def step_assert_discount(context, rate): assert context.customer.discount_rate() == rate ``` +**Hardcoded assertion rule — HTA-01:** `Then` assertions must not contain string literals that were set in a preceding `When` step (magic constants). Pass the value through the World context or as a `{string}` Cucumber parameter, or assert a structural property instead. + +```typescript +// ✅ — domain name flows through World context +When('I import a domain named {string}', async function (this: AppWorld, name: string) { + this.importedDomainName = name; + await this.importDomainPage.importDomain(name); +}); +Then('the imported domain is visible in the domain list', async function (this: AppWorld) { + await expect(this.page.getByTestId('domain-name-link').getByText(this.importedDomainName)).toBeVisible(); +}); + +// ❌ — hardcoded constant couples assertion to the When step's implementation detail +Then('the imported domain is visible in the domain list', async ({ page }) => { + await expect(page.getByTestId('domain-name-link').getByText('E2E Import Test')).toBeVisible(); +}); +``` + --- ## Use typed parameters +**PTM-01 — `{string}` over `{word}` for UI labels:** Use `{string}` (quoted) for any step parameter that could contain spaces — tab names, button labels, section headings, status values. `{word}` matches only a single token without spaces and will silently fail to match multi-word values, and having both `{word}` and `{string}` variants in the same file causes Cucumber ambiguity errors. Remove all `{word}` variants and consolidate on `{string}`. 
+ +```typescript +// ✅ — {string} matches "Version management", "Run history", "About" +When('I click the {string} tab', async ({ domainDetailPage }, tab: string) => { + await domainDetailPage.gotoTab(tab); +}); + +// ❌ — {word} silently fails for "Version management" and "Run history" +When('I click the {word} tab', async ({ domainDetailPage }, tab: string) => { + await domainDetailPage.gotoTab(tab); +}); +``` + ```python # ✅ — :d casts to int automatically @when("the customer purchases {quantity:d} units") @@ -215,6 +289,72 @@ def teardown_database(context): --- +## Wizard navigation rules + +Apply these rules when implementing step definitions for multi-step wizards. They detect +"cheat steps" — steps that appear to navigate a wizard but exercise no real behaviour. + +### CS-01 — Assert arrival at each wizard step + +Every wizard step navigation must verify arrival at the next step via a step-specific element +assertion before the step completes. Blind `continueButton.click()` chains without an arrival +assertion are forbidden: if the Continue button is disabled (validation failure), the click +silently does nothing and the test continues with a false pass. 
+ +```typescript +// ✅ — arrival at the Owner step is explicitly verified +When('I complete the About step', async ({ createDomainAboutPage, createDomainOwnerPage }) => { + await createDomainAboutPage.fillDomainName('E2E Test Domain'); + await createDomainAboutPage.fillCostCenter('1234'); + await createDomainAboutPage.continueButton.click(); + await expect(createDomainOwnerPage.ownersTable).toBeVisible(); // arrival assertion +}); + +// ❌ — two blind clicks; no assertion that either step was actually reached +Given('I am on the Target dataset step', async ({ createDomainPage }) => { + await createDomainPage.continueButton.click(); + await createDomainPage.continueButton.click(); +}); +``` + +### CS-02 — Do not use `toHaveURL()` to detect wizard step progress in a scrolling stepper + +In a single-URL scrolling stepper the URL does not change between wizard steps. A +`toHaveURL(/step-name/)` assertion always passes regardless of which step is active, +giving false confidence. Assert the step-specific landmark element is visible instead. + +```typescript +// ✅ — asserts the Owner step's landmark element is in view +await expect(createDomainOwnerPage.ownersTable).toBeVisible(); + +// ❌ — URL never changes; assertion always passes +await expect(page).toHaveURL(/owner/i); +``` + +Once `data-cy` attributes are added to wizard step headers, prefer: +```typescript +await expect(page.getByTestId('step-owner')).toBeVisible(); +``` + +### CS-03 — Do not use `page.goBack()` inside an SPA wizard + +`page.goBack()` navigates the browser's URL history, not the wizard's internal state. Inside +an Angular (or other SPA) wizard, this takes the user back to the *previous page* (e.g. All +Domains), not to the previous wizard step. Use the wizard's own Back button or click the +stepper step header to navigate backward. 
+
+```typescript
+// ✅ — uses the wizard's own back navigation
+await createDomainWizardPage.backButton.click();
+await expect(createDomainAboutPage.domainNameInput).not.toBeEmpty();
+
+// ❌ — navigates away from the wizard entirely
+await page.goBack();
+await expect(createDomainPage.domainNameInput).not.toBeEmpty();
+```
+
+---
+
 ## Out-of-scope routing
 
 | Request | Correct skill |
diff --git a/skills/living-doc-create-feature/SKILL.md b/skills/living-doc-create-feature/SKILL.md
index 14631b8..4e3cecf 100644
--- a/skills/living-doc-create-feature/SKILL.md
+++ b/skills/living-doc-create-feature/SKILL.md
@@ -140,3 +140,17 @@ If `user_stories` is `[]`, repeat the orphan warning from Step 3 outside the JSO
 | Update feature_registry for impact traceability | **living-doc-impact-analysis** (see Feature registry format in that skill) |
 
 > **Renaming a Feature:** Changing a Feature's `id` or `name` requires cascading updates. Load `living-doc-update` and follow the "Rename a Feature" workflow there, which covers: Functionality `feature_id` fields, `feature_registry` entry, `manifest.json`, `seed.yaml`, PageObject file headers, and Gherkin feature file `# Feature:` headers.
+
+## Script — `validate_entity.py`
+
+After outputting the entity, validate it against the canonical schema before saving to the catalog. Do not save the entity if the script exits with code 1.
+
+```bash
+# Validate the output (run from the toolkit root)
+python skills/living-doc-update/scripts/validate_entity.py entity.json
+
+# With referential integrity checks against the full catalog
+python skills/living-doc-update/scripts/validate_entity.py entity.json --catalog catalog.json
+```
+
+Exits 0 if valid (warnings are non-blocking). Exits 1 if any required field is missing, the ID format is wrong, or the status or `surface_type` value is invalid.
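Reviewer note (not part of the patch): the exit-code contract added above ("do not save the entity if the script exits with code 1") could be enforced by a small wrapper. The sketch below is only an illustration under stated assumptions: `validate_then_save` and the `save` callback are hypothetical names that do not appear in the repository, and treating every non-zero exit code as blocking is a conservative choice that goes slightly beyond the documented "exits 1" case.

```python
import subprocess
import sys
from typing import Callable, Sequence


def validate_then_save(command: Sequence[str], save: Callable[[], None]) -> bool:
    """Run a validator command and call save() only when it exits with code 0.

    Hypothetical helper: returns True if save() ran, False if validation blocked it.
    """
    result = subprocess.run(command, capture_output=True, text=True)
    # Warnings are described as non-blocking, so surface stdout even on success.
    if result.stdout:
        print(result.stdout, end="")
    if result.returncode != 0:
        # Assumption: any non-zero exit code blocks the save, not only code 1.
        if result.stderr:
            print(result.stderr, end="", file=sys.stderr)
        return False
    save()
    return True
```

A caller might pass the command from the bash example above, for instance `[sys.executable, "skills/living-doc-update/scripts/validate_entity.py", "entity.json", "--catalog", "catalog.json"]`, together with whatever function actually writes the entity to the catalog.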
diff --git a/skills/living-doc-create-feature/evals/evals.json b/skills/living-doc-create-feature/evals/evals.json index c3e9bda..fc43da6 100644 --- a/skills/living-doc-create-feature/evals/evals.json +++ b/skills/living-doc-create-feature/evals/evals.json @@ -186,6 +186,33 @@ "Notes that FEAT ID must also be unique" ] }, + { + "id": 16, + "category": "regression", + "prompt": "Create a new Feature entity for the 'Notifications Centre' screen. The catalog already contains FEAT-001 through FEAT-011.", + "expected_output": "Before assigning an ID, the agent runs: python scripts/next_id.py --type FEAT --catalog catalog.json. The script returns FEAT-012. The agent assigns id='FEAT-012' (or the slug 'FEAT-notifications-centre' if the project uses slug IDs). The agent does NOT invent an ID, reuse an existing ID, or leave the id field as a placeholder such as FEAT-XXX or FEAT-<nnn>. The final JSON contains a fully populated id field before being presented to the user.", + "files": [], + "expectations": [ + "Runs next_id.py --type FEAT before assigning the ID", + "Assigns the ID returned by the script (e.g. FEAT-012)", + "Does not invent, guess, or reuse an ID", + "Does not leave a placeholder (FEAT-XXX, FEAT-<nnn>, FEAT-unknown)", + "Final JSON has a fully populated id field" + ] + }, + { + "id": 17, + "category": "regression", + "prompt": "Create a Feature entity for the 'Payment Page'. The catalog file is not present — next_id.py cannot be run. What should the agent do?", + "expected_output": "Agent cannot auto-assign a numeric ID. It uses the slug convention instead: id='FEAT-payment-page' derived from the surface name in kebab-case. The agent explicitly states: 'No catalog available — using slug ID FEAT-payment-page. Verify this ID does not conflict with existing entities before saving.' 
It does NOT invent a numeric ID such as FEAT-001 or FEAT-999 without catalog evidence.", + "files": [], + "expectations": [ + "Falls back to slug ID when catalog is unavailable", + "Slug is derived from the surface name in kebab-case", + "Warns the user to verify no collision before saving", + "Does not invent a numeric ID without catalog evidence" + ] + }, { "id": 15, "category": "regression", diff --git a/skills/living-doc-create-functionality/SKILL.md b/skills/living-doc-create-functionality/SKILL.md index b11f089..8cfcbaa 100644 --- a/skills/living-doc-create-functionality/SKILL.md +++ b/skills/living-doc-create-functionality/SKILL.md @@ -137,6 +137,20 @@ Rules: > **Parent Feature sync:** After saving this entity, load `living-doc-update` and append this `FUNC-<id>` to the parent Feature's `"functionalities"` array. An unlinked Functionality will be flagged as `ORPHAN_FUNCTIONALITY` by `living-doc-gap-finder`. +## Script — `validate_entity.py` + +After outputting the entity, validate it against the canonical schema before saving to the catalog. Do not save the entity if the script exits with code 1. + +```bash +# Validate the output (run from the toolkit root) +python skills/living-doc-update/scripts/validate_entity.py entity.json + +# With referential integrity checks against the full catalog +python skills/living-doc-update/scripts/validate_entity.py entity.json --catalog catalog.json +``` + +Exits 0 if valid (warnings are non-blocking). Exits 1 if any required field is missing, the ID format is wrong, `parent_feature` does not match `FEAT-*`, or the status value is invalid. 
+ ## Distinguishing Functionality ACs from User Story ACs | Dimension | User Story AC | Functionality AC | diff --git a/skills/living-doc-create-functionality/evals/evals.json b/skills/living-doc-create-functionality/evals/evals.json index e7a3348..2844041 100644 --- a/skills/living-doc-create-functionality/evals/evals.json +++ b/skills/living-doc-create-functionality/evals/evals.json @@ -162,6 +162,33 @@ "Explains: every AC must state an exact outcome" ] }, + { + "id": 14, + "category": "regression", + "prompt": "Create a new Functionality entity 'Validate discount code expiry'. The catalog already contains FUNC-001 through FUNC-007.", + "expected_output": "Before assigning an ID, the agent runs: python scripts/next_id.py --type FUNC --catalog catalog.json. The script returns FUNC-008. The agent assigns id='FUNC-008'. It does NOT invent an ID, reuse an existing one, or leave a placeholder such as FUNC-XXX or FUNC-<nnn>. The final JSON has a fully populated id field before being presented to the user.", + "files": [], + "expectations": [ + "Runs next_id.py --type FUNC before assigning the ID", + "Assigns the ID returned by the script (e.g. FUNC-008)", + "Does not invent, guess, or reuse an ID", + "Does not leave a placeholder (FUNC-XXX, FUNC-<nnn>, FUNC-unknown)", + "Final JSON has a fully populated id field" + ] + }, + { + "id": 15, + "category": "regression", + "prompt": "Create a Functionality entity but the catalog file is missing — next_id.py cannot be run. What should the agent do?", + "expected_output": "Agent cannot auto-assign an ID. It outputs the entity with id='FUNC-PENDING' and explicitly states: 'Catalog not available — ID could not be assigned. Run next_id.py --type FUNC once the catalog is present and update this field before saving.' It does NOT invent a numeric ID such as FUNC-001 without catalog evidence. 
The placeholder makes the gap visible rather than hiding it with a guessed value.", + "files": [], + "expectations": [ + "Uses FUNC-PENDING placeholder when catalog is unavailable", + "Explicitly warns the user that the ID must be assigned before saving", + "Does not invent a numeric ID without catalog evidence", + "Placeholder is visibly distinct — not a real FUNC-nnn value" + ] + }, { "id": 13, "category": "regression", diff --git a/skills/living-doc-create-user-story/SKILL.md b/skills/living-doc-create-user-story/SKILL.md index 3c22e1a..3eaa333 100644 --- a/skills/living-doc-create-user-story/SKILL.md +++ b/skills/living-doc-create-user-story/SKILL.md @@ -136,6 +136,20 @@ Rules: > **Next steps after creation:** The User Story is created with `status: "planned"`. When all ACs are finalised and at least one Feature is linked, use `living-doc-update` to promote it to `active`. After promotion, use `living-doc-scenario-creator` to generate BDD feature files for each `ACTIVE` AC. +## Script — `validate_entity.py` + +After outputting the entity, validate it against the canonical schema before saving to the catalog. Do not save the entity if the script exits with code 1. + +```bash +# Validate the output (run from the toolkit root) +python skills/living-doc-update/scripts/validate_entity.py entity.json + +# With referential integrity checks against the full catalog +python skills/living-doc-update/scripts/validate_entity.py entity.json --catalog catalog.json +``` + +Exits 0 if valid (warnings are non-blocking). Exits 1 if any required field is missing, the ID format is wrong, no AC is present, or the status value is invalid. 
+ ## Anti-patterns to flag | Anti-pattern | Warning | diff --git a/skills/living-doc-create-user-story/evals/evals.json b/skills/living-doc-create-user-story/evals/evals.json index 356d689..872843e 100644 --- a/skills/living-doc-create-user-story/evals/evals.json +++ b/skills/living-doc-create-user-story/evals/evals.json @@ -164,6 +164,33 @@ "Points to living-doc-create-functionality for the extraction" ] }, + { + "id": 14, + "category": "regression", + "prompt": "Create a new User Story for the password reset capability. The catalog already contains US-001 through US-013.", + "expected_output": "Before assigning an ID, the agent runs: python scripts/next_id.py --type US --catalog catalog.json. The script returns US-014. The agent assigns id='US-014'. It does NOT invent an ID, reuse an existing one, or leave a placeholder such as US-XXX or US-<nnn>. The final JSON has a fully populated id field before being presented to the user.", + "files": [], + "expectations": [ + "Runs next_id.py --type US before assigning the ID", + "Assigns the ID returned by the script (e.g. US-014)", + "Does not invent, guess, or reuse an ID", + "Does not leave a placeholder (US-XXX, US-<nnn>, US-unknown)", + "Final JSON has a fully populated id field" + ] + }, + { + "id": 15, + "category": "regression", + "prompt": "Create a User Story but the catalog file is missing — next_id.py cannot be run. What should the agent do?", + "expected_output": "Agent cannot auto-assign an ID. It outputs the entity with id='US-PENDING' and explicitly states: 'Catalog not available — ID could not be assigned. Run next_id.py --type US once the catalog is present and update this field before saving.' 
It does NOT invent a numeric ID such as US-001 without catalog evidence.", + "files": [], + "expectations": [ + "Uses US-PENDING placeholder when catalog is unavailable", + "Explicitly warns the user that the ID must be assigned before saving", + "Does not invent a numeric ID without catalog evidence", + "Placeholder is visibly distinct — not a real US-nnn value" + ] + }, { "id": 13, "category": "happy-path", diff --git a/skills/living-doc-pageobject-scan/SKILL.md b/skills/living-doc-pageobject-scan/SKILL.md index a655a97..55cdbda 100644 --- a/skills/living-doc-pageobject-scan/SKILL.md +++ b/skills/living-doc-pageobject-scan/SKILL.md @@ -39,6 +39,22 @@ compatibility: GitHub Copilot --- +## Pre-flight: MCP Playwright availability check + +**This skill requires the MCP Playwright server. Perform this check before any other step, in every mode.** + +1. Attempt to call `mcp_microsoft_pla_browser_snapshot` (or any `mcp_microsoft_pla_browser_*` tool) with a no-op argument. +2. If the call **succeeds** — continue to the relevant mode below. +3. If the call **fails or the tool is unavailable** — **stop immediately.** Do not fall back to static sources, route configs, or guided traversal as a substitute. Output exactly: + + > **MCP Playwright server is not available.** + > This skill requires the `@playwright/mcp` (or equivalent) MCP server to be running and connected. + > Please enable it in your VS Code MCP configuration (`.vscode/mcp.json` or user settings) and restart the agent session, then retry. + + Do not attempt any crawl, seed assembly, or DOM interaction until the user confirms the server is available. + +--- + ## Create mode ### Step 0 — Business Seed assembly @@ -226,6 +242,8 @@ guided_steps: ## Maintain mode +> **Pre-flight:** Confirm MCP Playwright is available before proceeding (see [Pre-flight check](#pre-flight-mcp-playwright-availability-check) above). Stop and ask if it is not. + Two scopes — activate the one that matches the trigger. 
| Scope | Trigger | Breadth | diff --git a/skills/living-doc-pageobject-scan/evals/evals.json b/skills/living-doc-pageobject-scan/evals/evals.json index 5e4ee74..aacd701 100644 --- a/skills/living-doc-pageobject-scan/evals/evals.json +++ b/skills/living-doc-pageobject-scan/evals/evals.json @@ -155,6 +155,34 @@ "Does not skip the route due to authentication requirement", "Names appropriate auth strategy for the auth type" ] + }, + { + "id": 12, + "category": "negative", + "prompt": "Scan this webapp and generate PageObjects. The MCP Playwright server is not running.", + "expected_output": "Agent performs the MCP pre-flight check by attempting to call mcp_microsoft_pla_browser_snapshot. The call fails. Agent stops immediately and outputs: 'MCP Playwright server is not available. This skill requires the @playwright/mcp (or equivalent) MCP server to be running and connected. Please enable it in your VS Code MCP configuration (.vscode/mcp.json or user settings) and restart the agent session, then retry.' Agent does NOT fall back to static sources (route configs, OpenAPI, guided traversal) as a substitute for the crawl. No seed assembly, DOM inspection, or PageObject generation is attempted.", + "files": [], + "expectations": [ + "Performs MCP pre-flight check before any other step", + "Detects that mcp_microsoft_pla_browser_* tools are unavailable", + "Stops immediately — does not proceed to seed assembly or crawl", + "Outputs the exact stop message naming the missing MCP server", + "Does NOT fall back to static sources (route config, OpenAPI, guided traversal) as a crawl substitute", + "Instructs user to enable the MCP server and retry" + ] + }, + { + "id": 13, + "category": "negative", + "prompt": "Re-scan the manifest after the latest UI update. The MCP Playwright server is unavailable.", + "expected_output": "Agent performs the MCP pre-flight check at the start of Maintain mode. The call to mcp_microsoft_pla_browser_snapshot fails. 
Agent stops and outputs: 'MCP Playwright server is not available. Please enable it in your VS Code MCP configuration and restart the agent session, then retry.' Agent does not attempt to diff the manifest, update selectors, or infer DOM changes from static files.", + "files": [], + "expectations": [ + "Pre-flight check runs before entering RE-SCAN or HEALING scope", + "Stops on MCP unavailability — does not attempt static diff or inference", + "Outputs the stop message naming the missing MCP server", + "Does not touch manifest.json or PageObject files without a live DOM" + ] } ] } diff --git a/skills/living-doc-scenario-creator/evals/evals.json b/skills/living-doc-scenario-creator/evals/evals.json index 878e48b..8511452 100644 --- a/skills/living-doc-scenario-creator/evals/evals.json +++ b/skills/living-doc-scenario-creator/evals/evals.json @@ -196,6 +196,75 @@ "Identifies multiple-scenario-per-AC as a design smell", "Asks user to designate the canonical scenario" ] + }, + { + "id": 16, + "category": "regression", + "prompt": "Run the coverage report for US-007 which has three active ACs. The features directory contains one scenario with '@AC:US-007-01' and one scenario that has a '# AC: US-007-02' comment but no @AC: tag. What does the report show?", + "expected_output": "Coverage report shows: AC:US-007-01 ✅ covered (matched by @AC: tag). AC:US-007-02 ❌ NOT COVERED — the # AC: comment is human-readable only; coverage is determined solely by the machine-readable @AC: tag. The comment alone does not count as coverage. AC:US-007-03 ❌ NOT COVERED. Summary: 1 covered, 2 gaps. 
The report does NOT claim AC:US-007-02 is covered just because a comment references it.", + "files": [], + "expectations": [ + "Coverage is based exclusively on @AC: tags — not on # AC: comments", + "AC:US-007-01 is marked ✅ covered", + "AC:US-007-02 is marked ❌ NOT COVERED despite the # AC: comment", + "Summary correctly shows 1 covered, 2 gaps", + "Report does not invent or infer coverage from comments or scenario text" + ] + }, + { + "id": 17, + "category": "regression", + "prompt": "The coverage report is run against a features directory that contains no .feature files at all. US-005 has two active ACs. What does the report output?", + "expected_output": "Report shows both ACs as ❌ NOT COVERED. Summary: 2 active ACs, 0 covered, 2 gaps. The report does not hallucinate coverage, does not generate placeholder scenarios, and does not mark ACs as covered. It exits with code 1 to signal gaps. The output clearly identifies the gaps as the scenario generation queue input: 'NOT COVERED — add to scenario generation queue'.", + "files": [], + "expectations": [ + "Both ACs reported as ❌ NOT COVERED", + "Summary: 0 covered, 2 gaps", + "Exits with code 1", + "Does not hallucinate coverage or generate placeholder content", + "Gap entries explicitly marked as scenario generation queue input" + ] + }, + { + "id": 18, + "category": "regression", + "prompt": "The coverage report is run for US-010. AC:US-010-02 has a scenario tagged '@AC:US-10-02' (missing leading zero in the US number). Is it counted as covered?", + "expected_output": "No — AC:US-010-02 is NOT covered. The @AC: tag '@AC:US-10-02' does not match the canonical ID 'US-010-02' because the ID format must match exactly (zero-padded). The report shows AC:US-010-02 ❌ NOT COVERED. A warning or note is shown: '@AC:US-10-02 found in <filename> — does not match any known AC ID; check zero-padding'. 
The report does not guess or fuzzy-match AC IDs.", + "files": [], + "expectations": [ + "AC:US-010-02 is marked ❌ NOT COVERED", + "Malformed tag @AC:US-10-02 does not count as coverage", + "Report emits a warning about the unrecognised tag", + "No fuzzy matching or inference — IDs must match exactly" + ] + }, + { + "id": 19, + "category": "happy-path", + "prompt": "Run coverage_report.py for US-007. All three active ACs have matching @AC: tags in feature files. What does the summary look like?", + "expected_output": "Report shows all three ACs as ✅ covered with the feature file name(s) listed under each. Summary: Active/Implemented ACs: 3. Covered by scenarios: 3 (100%). Gaps: 0. Exits with code 0. The report lists only real file names from the features directory — it does not invent file names or pad coverage.", + "files": [], + "expectations": [ + "All three ACs marked ✅ covered", + "Each covered entry lists the actual feature file name", + "Summary shows 3/3 covered, 100%, 0 gaps", + "Exits with code 0", + "No invented file names — only real files from the scanned directory" + ] + }, + { + "id": 20, + "category": "happy-path", + "prompt": "After running the coverage report I see 4 gaps for US-012. What should happen next?", + "expected_output": "The 4 gap ACs become the scenario generation queue. The agent invokes living-doc-scenario-creator to generate scenarios for each uncovered Active AC — one scenario per AC, in order. The coverage report output (the ❌ rows) is the direct input to scenario generation; no manual re-specification is needed. After generation, re-running the report should show 0 gaps. 
The agent does not skip gaps or reorder them without explicit instruction.", + "files": [], + "expectations": [ + "Gap ACs from the report become the scenario generation queue", + "Invokes living-doc-scenario-creator for each uncovered Active AC", + "Processes gaps in report order — no reordering", + "Does not skip gaps silently", + "After generation, re-running the report is expected to show 0 gaps" + ] } ] } \ No newline at end of file diff --git a/skills/references/living-doc-glossary.md b/skills/references/living-doc-glossary.md index efb3ad3..ed3dafd 100644 --- a/skills/references/living-doc-glossary.md +++ b/skills/references/living-doc-glossary.md @@ -167,6 +167,22 @@ An atomic, fast-testable behavior — a single verb phrase describing one respon Functionalities differ from User Story ACs: they are atomic and fast-testable, not end-to-end. A single User Story may trigger multiple Functionalities. +#### User Story vs Functionality — decision boundary + +| Dimension | User Story | Functionality | +|---|---|---| +| Perspective | End user observing a business outcome | Developer / component behavior | +| Scope | Full E2E flow across one or more surfaces | Single function, method, or UI behavior | +| AC example | "Order is confirmed and confirmation email is sent" | "Returns discounted total when a valid membership tier is applied" | +| Test type | E2E / integration scenario | Unit or fast system test | +| Trigger question | *"Would a product owner write this as a business requirement?"* → **User Story** | *"Would a developer write this as a function contract?"* → **Functionality** | + +**When in doubt:** if the behavior is observable only by looking at the code or component output (not by a user clicking through the UI), it is a Functionality. If it describes what a user can do or see across one or more screens, it is a User Story. 
+
+If an AC belongs to the wrong entity type, redirect:
+- AC too atomic / technical inside a US → move to a **Functionality** (`living-doc-create-functionality`)
+- AC describes a full user journey inside a FUNC → move to a **User Story** (`living-doc-create-user-story`)
+
 > Feature file template and `func_type` values: see [living-doc-bdd-schemas — Functionality Feature File Header](./living-doc-bdd-schemas.md#functionality-feature-file-header).
 
 ### Acceptance Criterion (AC)

From f3c88e37db9effd9bda7a0070e577daf3acf52d1 Mon Sep 17 00:00:00 2001
From: miroslavpojer <miroslav.pojer@absa.africa>
Date: Sun, 31 May 2026 11:52:48 +0200
Subject: [PATCH 35/35] Testing round 2.

---
 skills/living-doc-update/SKILL.md | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/skills/living-doc-update/SKILL.md b/skills/living-doc-update/SKILL.md
index a8c6135..81b3eff 100644
--- a/skills/living-doc-update/SKILL.md
+++ b/skills/living-doc-update/SKILL.md
@@ -54,6 +54,11 @@ When modifying an existing AC **keep the AC ID stable** — changing the ID brea
 to linked tests. Only update the `description`, `given`, `when`, `then`, or state fields.
 If the changed AC text affects linked tests, flag them for update.
 
+**AC versioning:** ACs carry a `(vMAJOR.MINOR.PATCH – state)` annotation.
+- Bump the **minor** version for any business-rule change to an `ACTIVE` AC (e.g. `v1.0.0 → v1.1.0`).
+- Bump the **patch** version for a wording clarification that does not change the rule (e.g. `v1.0.0 → v1.0.1`).
+- The version must appear in the `# AC:` comment in linked Gherkin feature files — trigger `gherkin-living-doc-sync` to propagate the new version into those comments.
+
 ## Promote a Functionality from planned to active
 
 A Functionality is ready to move from `planned` to `active` when all its ACs have passing tests.
@@ -127,13 +132,13 @@ When a team changes ownership of a Feature, update the `owners` field and set `o
 
 When an AC is moved out of the current sprint but not permanently removed:
 
-- Keep `status: PLANNED` — state does not change for a deferral
-- Add `descoped_at` (date) and `descoped_reason` fields — **do not delete the AC** (preserves audit trail)
+- Set `status: descoped` — do not delete the AC (preserves audit trail and reinstating intent)
+- Add `descoped_at` (date) and `descoped_reason` fields
 - Add `future_release` field if the work is planned for a later sprint
-- Flag any linked tests for `@skip` or `@pending` tagging
+- Flag any linked Gherkin scenarios for `@wip` or `@pending` tagging via `gherkin-living-doc-sync`
 
 ```
-AC:US-042-03 (v1.2.0 – PLANNED)
+AC:US-042-03 (v1.2.0 – descoped)
 – Promo codes can be stacked and applied in defined priority order.
 – descoped_at: 2026-05-15
 – descoped_reason: Promo stacking rule deferred — too complex for current sprint