diff --git a/.DS_Store b/.DS_Store new file mode 100644 index 0000000..5008ddf Binary files /dev/null and b/.DS_Store differ diff --git a/.github/agents/evals/living-doc-bdd-copilot/evals.json b/.github/agents/evals/living-doc-bdd-copilot/evals.json new file mode 100644 index 0000000..a2d26bd --- /dev/null +++ b/.github/agents/evals/living-doc-bdd-copilot/evals.json @@ -0,0 +1,355 @@ +{ + "agent_name": "living-doc-bdd-copilot", + "evals": [ + { + "id": 1, + "category": "happy-path", + "prompt": "Start a BDD session for a new project. The living doc catalog is in docs/living-doc/. The app runs at https://app.example.com. Login is required.", + "expected_output": "Agent assembles the Business Seed file at the discovered or default location. Sources: (A) loads Feature names and AC texts from docs/living-doc/; (B) looks for route config (Angular router, React Router, sitemap.xml); (D) checks for an existing manifest.json. Creates seed.yaml with: base_url, credentials using env: references (never literal values), known_routes from catalog Features, and an empty guided_steps list. Proposes adding BDD artifact paths to .github/copilot-instructions.md for future sessions.", + "files": [], + "expectations": [ + "Creates seed.yaml with base_url, credentials as env: references, and known_routes", + "Never stores literal credentials — always env:VAR_NAME", + "Loads Feature names and routes from the living doc catalog (Source A)", + "Checks for existing manifest.json before starting (Source D)", + "Proposes adding artifact paths to .github/copilot-instructions.md" + ] + }, + { + "id": 2, + "category": "happy-path", + "prompt": "Crawl the webapp and generate a PageObject for the checkout screen at /checkout.", + "expected_output": "Agent navigates to /checkout via MCP Playwright. Takes a snapshot and identifies interactive elements: promo code input, confirm order button, error banner. 
Generates CheckoutPage with: file-level living-doc: FEAT- | /checkout comment, ALL_CAPS selector constants using data-testid preference, __init__ or constructor taking a page parameter, and method stubs for each interactive element. Adds the surface to manifest.json. If no matching Feature entity exists, loads `living-doc-create-feature` skill to create FEAT- before continuing. Flags any element using positional CSS selectors as fragile.", + "files": [], + "expectations": [ + "Uses MCP Playwright to navigate and snapshot the page", + "Generates CheckoutPage with data-testid selector preference", + "File-level living-doc: FEAT-nnn | /checkout comment", + "Adds entry to manifest.json", + "Loads living-doc-create-feature for missing Feature entities", + "Flags positional CSS selectors as fragile" + ] + }, + { + "id": 3, + "category": "happy-path", + "prompt": "Generate Gherkin scenarios for US-007 — Place an Online Order. ACs: (1) Active — happy path: customer places order with saved payment. (2) Active — error: order rejected when card declined.", + "expected_output": "Agent generates a .feature file named us-007-place-an-online-order.feature. Feature header uses the As-a/I-can/so-that narrative from US-007. Two scenarios generated — one per Active AC. Each Scenario: is immediately preceded by a '# AC: US-007-0n (v1.0.0 – Active) — ...' traceability tag. Step text uses domain language (no HTTP calls, selectors, or DB). 
For steps without a matching step definition, generates stubs: Case A (PageObject method exists) = full stub; Case B (no PageObject method) = stub with NotImplementedError and flag to extend the PageObject.", + "files": [], + "expectations": [ + "Feature file named us-007-place-an-online-order.feature", + "Each Scenario: immediately preceded by a # AC: traceability comment", + "Skips Planned and Deprecated ACs — only Active ACs drive generation", + "Step text in domain language — no implementation details", + "Case A stubs delegate to PageObject methods", + "Case B stubs raise NotImplementedError and flag the missing PageObject method" + ] + }, + { + "id": 4, + "category": "happy-path", + "prompt": "RE-SCAN mode — the checkout screen had a UI update. Re-validate all manifest entries and discover any new routes.", + "expected_output": "Agent reloads seed.yaml and manifest.json. For every existing manifest entry: navigates to the URL, snapshots the DOM, and checks each recorded component_id selector. Selectors that no longer resolve are flagged as BREAKING CHANGE. On each visited page, agent also actively discovers new routes — follows links, clicks navigation-suggesting buttons, and checks tab panels and side-nav items not yet in the manifest. New surfaces are added to manifest.json. Removed surfaces are marked as deprecated. 
Stale PageObject selectors are updated.", + "files": [], + "expectations": [ + "Reloads both seed.yaml and manifest.json before starting", + "Validates every manifest entry's component_id selectors against the current DOM", + "Flags broken selectors as BREAKING CHANGE with linked step definition details", + "Actively discovers new routes beyond existing manifest entries", + "Adds new surfaces to manifest.json; marks removed ones as deprecated", + "Updates stale selector constants in PageObjects" + ] + }, + { + "id": 5, + "category": "regression", + "prompt": "HEALING mode — the checkout test 'Scenario: Customer successfully places an order' is failing. The confirm button selector is broken.", + "expected_output": "Agent enters HEALING mode — scope is limited to the failing test only. Traces the failure to CheckoutPage.CONFIRM_BUTTON and the step 'When the customer confirms the order'. Navigates to /checkout via MCP Playwright, snapshots the DOM, and finds the updated selector. Updates CONFIRM_BUTTON in CheckoutPage only. Verifies the linked step definition binding still resolves. Re-runs only the previously failing test to confirm healing. Does not touch passing tests or unrelated PageObjects.", + "files": [], + "expectations": [ + "Scope limited to the failing test — does not touch passing tests", + "Navigates to the affected page via MCP Playwright and snapshots DOM", + "Updates the broken selector in CheckoutPage only", + "Verifies the step definition binding is intact", + "Re-runs only the failing test to confirm healing" + ] + }, + { + "id": 6, + "category": "negative", + "prompt": "Create a User Story for the guest checkout capability.", + "expected_output": "Agent switches to catalog-operations mode and loads `living-doc-create-user-story` skill to create a User Story for guest checkout. 
This agent handles both catalog management and BDD automation.", + "files": [], + "expectations": [ + "Loads living-doc-create-user-story skill", + "Creates User Story in catalog-operations mode", + "Does not hand off to another agent" + ] + }, + { + "id": 7, + "category": "paraphrase", + "prompt": "Our BDD tests are all failing after the UI redesign — buttons and inputs have new IDs.", + "expected_output": "Agent recognises this as a HEALING mode request. Asks for the list of failing test names or scenario titles. Scopes repair to only those failing scenarios. For each failing scenario: traces to the affected PageObject, navigates to the screen via MCP Playwright, snapshots DOM to find updated selectors, and updates affected PageObject constants. Verifies step definition bindings. Re-runs only the previously failing tests. Does not touch unrelated PageObjects.", + "files": [], + "expectations": [ + "Identifies this as HEALING mode", + "Asks for the failing test list to scope the repair", + "Traces each failure to its PageObject and step definition", + "Uses MCP Playwright to discover updated selectors", + "Re-runs only previously failing tests after healing" + ] + }, + { + "id": 8, + "category": "edge-case", + "prompt": "REMOVE mode — the legacy promo code feature has been removed from the product. Clean up the BDD artifacts.", + "expected_output": "Agent enters REMOVE mode. Identifies all .feature files whose scenarios carry # AC: tags matching the removed promo code Feature/US IDs. Finds PageObjects referenced only by those scenarios. Finds step definitions used only by those scenarios. Presents the complete deletion list to the user for confirmation before touching any file. After confirmation: removes confirmed files, updates manifest.json to remove the deprecated entries. 
Loads `living-doc-update` skill to deprecate the linked US/AC entities in the catalog.", + "files": [], + "expectations": [ + "Identifies all .feature file scenarios linked to the removed Feature via # AC: tags", + "Identifies PageObjects and step definitions used exclusively by those scenarios", + "Presents the full deletion list for user confirmation before any file is touched", + "Removes only confirmed files — does not auto-delete", + "Updates manifest.json to remove deprecated entries", + "Loads living-doc-update skill to deprecate catalog entities" + ] + }, + { + "id": 9, + "category": "regression", + "prompt": "The seed.yaml is present at .copilot/bdd/seed.yaml but manifest.json is missing. What should the agent do?", + "expected_output": "This is a first exploration run — partial state rule applies. seed.yaml present but manifest.json absent means the session starts fresh: treat this as the first run with a clean slate. Begin the crawl from base_url in seed.yaml; do not assume any surfaces have been discovered. Create manifest.json as crawl progresses. Do not attempt to 'resume' from a non-existent manifest.", + "files": [], + "expectations": [ + "Identifies partial state: seed.yaml present, manifest.json absent", + "Treats this as a first exploration run — clean slate", + "Starts crawl from base_url in seed.yaml", + "Creates manifest.json during crawl — does not assume prior discovery", + "Does not attempt to resume from a non-existent manifest" + ] + }, + { + "id": 10, + "category": "regression", + "prompt": "The seed.yaml contains: 'credentials: {username: admin, password: secret123}'. Is this correct?", + "expected_output": "No — literal credentials must never be stored in seed.yaml. This is a security violation. Replace the literal values with environment variable references: 'credentials: {username: env:BDD_USERNAME, password: env:BDD_PASSWORD}'. Literal credentials in a file committed to source control are exposed to anyone with repository access. 
The agent must flag this immediately and refuse to proceed with the current seed.yaml until the credentials are replaced with env:VAR_NAME references.", + "files": [], + "expectations": [ + "Flags literal credentials as a security violation", + "Refuses to proceed with literal credentials in seed.yaml", + "Provides the corrected env:VAR_NAME format", + "Explains the risk: credentials exposed to anyone with repository access" + ] + }, + { + "id": 11, + "category": "edge-case", + "prompt": "During crawling, the agent reaches a multi-step checkout wizard (/checkout/step-2) but doesn't know what values to enter in the required 'Delivery Zone Code' field. How should this be handled?", + "expected_output": "The agent enters the Source E guided traversal protocol. It takes a screenshot and shows the user what it sees. It asks: 'I've reached a decision point at /checkout/step-2. What should I do next? Please provide the valid Delivery Zone Code value (or another way to progress past this step).' 
After the user provides the value, the agent executes the action via MCP Playwright and immediately appends the action to guided_steps in seed.yaml so the route can be re-navigated in future sessions without prompting.", + "files": [], + "expectations": [ + "Takes a screenshot and shows the user the current state", + "Asks the user for the missing business-specific input value", + "Executes the guided action via MCP Playwright after receiving the answer", + "Appends the action to guided_steps in seed.yaml for future sessions", + "Does not invent or guess business-specific field values" + ] + }, + { + "id": 12, + "category": "output-format", + "prompt": "After scanning /login and generating a LoginPage, show me what the manifest.json entry for /login looks like.", + "expected_output": "The manifest.json entry for /login includes: pageobject_path (path to the generated LoginPage file), feature_id (FEAT- or FEAT-UNKNOWN if unlinked), last_scanned (ISO timestamp), elements (list of discovered elements with data_cy and tag), coverage_gaps (empty list initially), and navigation_context with prerequisites, navigation_steps, data_requirements, auth_role, and notes. The feature_id is FEAT-UNKNOWN if no matching Feature entity exists in the living doc — flag this route as 'needs Feature entity' and load `living-doc-create-feature` skill to create it.", + "files": [], + "expectations": [ + "manifest.json entry has: pageobject_path, feature_id, last_scanned, elements, coverage_gaps, navigation_context", + "feature_id is FEAT-UNKNOWN if no matching Feature entity exists", + "last_scanned is an ISO timestamp", + "navigation_context includes prerequisites, navigation_steps, auth_role", + "Missing Feature entity triggers living-doc-create-feature skill load" + ] + }, + { + "id": 13, + "category": "regression", + "prompt": "During seed assembly, the living doc catalog at docs/living-doc/ has FEAT-checkout mapped to route /checkout and FEAT-account mapped to route /account/orders. 
No sitemap.xml exists. How should known_routes in seed.yaml be populated?", + "expected_output": "Agent loads Source A (living documentation). Extracts Feature-to-route mappings: FEAT-checkout → /checkout and FEAT-account → /account/orders. Adds both to known_routes in seed.yaml. Notes that Source B (sitemap.xml) is absent — no error is raised. Routes not listed in the living doc will be discovered dynamically during the crawl.", + "files": [], + "expectations": [ + "Source A: Feature-to-route mappings extracted from the living doc catalog", + "Both routes added to known_routes in seed.yaml", + "Source B (sitemap) noted as absent — no error raised", + "Notes that unlisted routes will be discovered dynamically during crawl" + ] + }, + { + "id": 14, + "category": "skill-dispatch", + "prompt": "Run a gap analysis on our living documentation before the v3.0 release. The catalog is at docs/living-doc/.", + "expected_output": "Agent switches to catalog-operations mode and loads the `living-doc-gap-finder` skill. Runs AUDIT mode: processes all entities and test files in docs/living-doc/, produces a gap report listing ORPHAN_TEST, STALE_REFERENCE, ORPHAN_FUNCTIONALITY, ORPHAN_FEATURE, EMPTY_FEATURE, and UNDOCUMENTED_SURFACE gaps. Outputs a documentation_coverage percentage and a prioritised gap list (Blocker → Critical → Important → Nit).", + "files": [], + "expectations": [ + "Loads living-doc-gap-finder skill", + "Runs AUDIT mode — full catalog audit", + "Outputs gap report with documentation_coverage percentage", + "Prioritised gap list: Blocker → Critical → Important → Nit", + "Does not modify any entities — gap-finder is read-only" + ] + }, + { + "id": 15, + "category": "skill-dispatch", + "prompt": "What living doc entities does PR #217 affect? It modifies PromoService.java and DiscountController.java.", + "expected_output": "Agent loads the `living-doc-impact-analysis` skill. Traces PromoService.java and DiscountController.java through the feature_registry. 
PromoService maps to FEAT-promo (domain logic — High impact). DiscountController maps to FEAT-discount (API contract — High impact). Output lists: affected Features, affected Functionalities, ACs requiring re-test, and Gherkin scenarios needing re-run. Produces a release sign-off checklist for both impacted Features.", + "files": [], + "expectations": [ + "Loads living-doc-impact-analysis skill", + "Traces changed files through feature_registry", + "Classifies each file as domain logic or API contract", + "Lists affected Features, Functionalities, ACs, and scenarios", + "Produces release sign-off checklist" + ] + }, + { + "id": 16, + "category": "skill-dispatch", + "prompt": "The @AC: traceability tags in checkout.feature are out of sync with the living doc. Sync them.", + "expected_output": "Agent loads the `gherkin-living-doc-sync` skill. Runs scan_ac_links.py to audit the @AC: tags and # AC: comments across checkout.feature. For each scenario: (1) verifies the @AC: tag matches a live AC in the catalog; (2) checks the # AC: comment format is canonical; (3) flags stale or missing links. Produces a sync diff showing what changed. Does not generate new scenarios — routes new-scenario requests to living-doc-scenario-creator.", + "files": [], + "expectations": [ + "Loads gherkin-living-doc-sync skill", + "Runs scan_ac_links.py to audit @AC: tags", + "Flags stale, missing, or malformed AC links", + "Produces a sync diff — does not auto-generate new scenarios", + "Routes new-scenario generation to living-doc-scenario-creator" + ] + }, + { + "id": 17, + "category": "skill-dispatch", + "prompt": "Write step definitions for the checkout scenarios — I need behave steps for the 'When the customer confirms the order' step.", + "expected_output": "Agent loads the `gherkin-step` skill. Generates a Python behave step definition: `@when('the customer confirms the order')` decorated function. The step body accesses `context.checkout_page` (initialised in a Before hook). 
Uses the CheckoutPage PageObject method `checkout_page.click_confirm_button()`. Follows the naming convention: function named `step_confirm_order`, not the verbose `step_when_the_customer_confirms_the_order`.", + "files": [], + "expectations": [ + "Loads gherkin-step skill", + "Generates @when decorated behave step function", + "Step body delegates to CheckoutPage PageObject method", + "Function name follows concise convention (step_confirm_order)", + "Does not write the Gherkin scenario itself — routes scenario creation to living-doc-scenario-creator" + ] + }, + { + "id": 18, + "category": "skill-dispatch", + "prompt": "Add missing data-cy attributes to the checkout Angular template — the PageObjects have PROPOSED locator comments.", + "expected_output": "Agent loads the `data-cy-instrument` skill. Phase 1: scans the checkout Angular template for elements without data-cy attributes; adds data-cy='' to each native HTML element. For third-party components that cannot forward attributes, adds a WORK_LOG.md §4 row with library name, version, and issue tracker link. Phase 3: updates CheckoutPage PageObject locators from CSS selectors to getByTestId() calls. Removes PROPOSED comments after update.", + "files": [], + "expectations": [ + "Loads data-cy-instrument skill", + "Adds data-cy attributes to native HTML elements in the Angular template", + "Escalates lib components to WORK_LOG.md §4 — does not silently skip", + "Updates PageObject locators to getByTestId()", + "Removes PROPOSED comments after locator update" + ] + }, + { + "id": 19, + "category": "skill-dispatch", + "prompt": "Delete all BDD artifacts linked to the deprecated FEAT-legacy-promo feature — clean up feature files, step definitions, and PageObjects.", + "expected_output": "Agent loads the `bdd-maintain` skill in REMOVE mode. Identifies all .feature file scenarios whose # AC: tags reference FEAT-legacy-promo User Stories. Runs find_unused_steps.py to find step definitions only used by those scenarios. 
Runs find_unused_po_methods.py and find_unused_po_components.py to identify exclusively-used PageObject artifacts. Presents the full deletion list to the user for confirmation before touching any file. Also checks fixtures.ts for PageObject imports to remove. After confirmation, removes the identified files.", + "files": [], + "expectations": [ + "Loads bdd-maintain skill", + "REMOVE mode — identifies scenarios via # AC: tags linked to deprecated Feature", + "Runs all three audit scripts to identify exclusively-used artifacts", + "Checks fixtures.ts for PageObject imports", + "Presents deletion list for confirmation before touching any file" + ] + }, + { + "id": 20, + "category": "skill-dispatch", + "prompt": "Document the Notifications Service as a Feature entity in the living doc — it exposes a REST API at /api/notifications.", + "expected_output": "Agent loads the `living-doc-create-feature` skill. Assigns the next sequential FEAT-nnn ID using next_id.py. Creates a Feature entity JSON with: id, name ('Notifications Service'), type ('api'), route ('/api/notifications'), status ('active'), owners (asks user), and empty functionalities and user_stories arrays. Adds the Feature to feature_registry.json. Prompts: 'Do you want to create Functionality entities for specific behaviors of this service?'", + "files": [], + "expectations": [ + "Loads living-doc-create-feature skill", + "Assigns next FEAT-nnn ID using next_id.py", + "Creates Feature entity JSON with required fields", + "Adds entry to feature_registry.json", + "Prompts for Functionality creation as next step" + ] + }, + { + "id": 21, + "category": "skill-dispatch", + "prompt": "Document the atomic behavior: apply a 20% discount to all cart items for Gold tier customers. Parent Feature is FEAT-discount.", + "expected_output": "Agent loads the `living-doc-create-functionality` skill. Assigns the next FUNC-nnn ID via next_id.py. 
Creates a Functionality entity with: id, name ('Apply Gold Tier Discount'), parent feature_id ('FEAT-discount'), status ('planned'), and at least two ACs — a happy-path AC and an error-path AC. After saving, prompts to load living-doc-update to append FUNC- to FEAT-discount's functionalities array — otherwise the entity will be flagged as ORPHAN_FUNCTIONALITY.", + "files": [], + "expectations": [ + "Loads living-doc-create-functionality skill", + "Assigns next FUNC-nnn ID via next_id.py", + "Creates Functionality with happy-path and error-path ACs", + "Prompts to update parent FEAT-discount.functionalities array via living-doc-update", + "Warns that skipping the parent link causes ORPHAN_FUNCTIONALITY gap" + ] + }, + { + "id": 22, + "category": "skill-dispatch", + "prompt": "Update the AC wording on US-042-AC-1 — the product owner changed the discount threshold from $50 to $75.", + "expected_output": "Agent loads the `living-doc-update` skill. Shows the OLD AC-1 text and proposes the NEW text with $75 threshold. Keeps the AC ID (US-042-AC-1) stable — never changes IDs. Bumps the AC version from v1.0.0 to v1.1.0. Flags the linked Gherkin scenario for review — the step 'When the cart total exceeds $50' will need updating. Runs validate_entity.py after the edit to confirm no invariants are broken.", + "files": [], + "expectations": [ + "Loads living-doc-update skill", + "Shows OLD and NEW AC text side by side before committing", + "AC ID stays stable — only description/GWT fields updated", + "AC version bumped from v1.0.0 to v1.1.0", + "Flags linked Gherkin scenario as stale and needing update", + "Runs validate_entity.py post-edit" + ] + }, + { + "id": 23, + "category": "skill-dispatch", + "prompt": "Generate BDD scenarios for US-007 — Place an Online Order. It has 3 Active ACs and 1 Planned AC.", + "expected_output": "Agent loads the `living-doc-scenario-creator` skill. Generates a .feature file for US-007 with 3 scenarios — one per Active AC. 
The Planned AC is skipped (not generated until Active). Each scenario is preceded by a '# AC: US-007-0n (v1.0.0 – Active)' traceability comment. Merge policy: if a scenario already exists for an AC, applies the 4-row decision table (skip if intent matches, update if GWT is stale, propose replacement if deprecated, flag if multiple scenarios exist per AC).", + "files": [], + "expectations": [ + "Loads living-doc-scenario-creator skill", + "Generates scenarios only for Active ACs — skips Planned", + "Each scenario preceded by # AC: traceability comment", + "Applies merge policy decision table for existing scenarios", + "Output is a named .feature file: us-007-place-an-online-order.feature" + ] + }, + { + "id": 24, + "category": "skill-dispatch", + "prompt": "Create a User Story for the guest checkout feature — a guest should be able to place an order without registering.", + "expected_output": "Agent loads the `living-doc-create-user-story` skill. Assigns the next US-nnn ID via next_id.py. Elicits the full narrative: As a [guest customer] / I can [place an order without registering] / so that [I can complete a purchase without commitment]. Guides through AC creation: at least a happy-path AC and one error-path AC. Checks whether existing Functionalities (e.g. FUNC-guest-cart, FUNC-guest-checkout) should be linked in the functionalities array. Routes to living-doc-create-functionality if new Functionalities are needed.", + "files": [], + "expectations": [ + "Loads living-doc-create-user-story skill", + "Assigns next US-nnn ID via next_id.py", + "Elicits complete narrative (As a / I can / so that)", + "Requires at least one happy-path and one error-path AC", + "Asks about existing Functionalities to link — prevents ORPHAN_FUNCTIONALITY gaps" + ] + }, + { + "id": 25, + "category": "negative", + "prompt": "Write a unit test for the applyDiscount() service method.", + "expected_output": "Agent declines the request. 
Explains that writing unit or integration tests is outside its scope. Directs the user to @sdet-copilot (noting it is not yet deployed). Does not write or stub any test code, and does not leave a TODO comment in any file.", + "files": [], + "expectations": [ + "Declines the request — does not write any test code", + "Directs user to @sdet-copilot", + "Does not add a TODO comment to any file", + "Does not partially implement or stub the unit test" + ] + } + ] +} \ No newline at end of file diff --git a/.github/agents/evals/living-doc-bdd-copilot/fixture-map.md b/.github/agents/evals/living-doc-bdd-copilot/fixture-map.md new file mode 100644 index 0000000..eeeab14 --- /dev/null +++ b/.github/agents/evals/living-doc-bdd-copilot/fixture-map.md @@ -0,0 +1,31 @@ +# Fixture Map — living-doc-bdd-copilot agent evals + +## Eval coverage summary + +| Eval ID | Category | Description | Fixture files | +|---------|----------|-------------|---------------| +| 1 | happy-path | Business Seed assembly — seed.yaml structure | — | +| 2 | happy-path | Create mode: PageObject generation from crawled surface | — | +| 3 | happy-path | Scenario generation from US ACs | — | +| 4 | happy-path | RE-SCAN mode — selector drift detection and repair | — | +| 5 | regression | HEALING mode — broken selector in a failing test | — | +| 6 | negative | User Story creation request → handled in catalog-operations mode, no hand-off | — | +| 7 | paraphrase | "fix failing tests" → HEALING mode trigger | — | +| 8 | edge-case | REMOVE mode — full feature removal with pre-deletion checklist | — | +| 9 | regression | Partial state rule: seed.yaml present, manifest.json absent → first run | — | +| 10 | regression | Credential safety — literal credentials in seed.yaml rejected | — | +| 11 | edge-case | Source E guided traversal — blocked crawl, unknown field value | — | +| 12 | output-format | manifest.json entry structure for a scanned route | — | +| 25 | negative | Unit test request → decline + direct to @sdet-copilot | — | + +## Trigger eval summary + +| Count 
| Triggers (should_trigger=true) | Non-triggers (should_trigger=false) | +|-------|-------------------------------|--------------------------------------| +| 24 total | 20 true | 4 false | + +False cases: +- `write a unit test` → @sdet-copilot +- `TypeScript quality gate` → @quality-gate-copilot (out of scope) + +> No fixture files — all evals use inline prompt/expected_output; agent behavior is assessed against the agent.md operating rules and skill definitions. diff --git a/.github/agents/evals/living-doc-bdd-copilot/trigger-eval.json b/.github/agents/evals/living-doc-bdd-copilot/trigger-eval.json new file mode 100644 index 0000000..c14fa01 --- /dev/null +++ b/.github/agents/evals/living-doc-bdd-copilot/trigger-eval.json @@ -0,0 +1,266 @@ +[ + { + "id": 1, + "query": "Scan the webapp at https://app.example.com and generate PageObjects", + "should_trigger": true, + "reason": "'scan webapp' trigger phrase" + }, + { + "id": 2, + "query": "Generate PageObjects for the checkout and login screens", + "should_trigger": true, + "reason": "'generate pageobjects' trigger phrase" + }, + { + "id": 3, + "query": "Heal the PageObjects after the UI redesign — selectors are broken", + "should_trigger": true, + "reason": "'heal pageobjects' trigger phrase" + }, + { + "id": 4, + "query": "Generate BDD scenarios for the active User Stories", + "should_trigger": true, + "reason": "'generate scenarios' trigger phrase" + }, + { + "id": 5, + "query": "Sync the Gherkin feature files with the living doc AC catalog", + "should_trigger": true, + "reason": "'sync gherkin' trigger phrase" + }, + { + "id": 6, + "query": "Use Playwright to crawl the application and discover all screens", + "should_trigger": true, + "reason": "'playwright crawl' trigger phrase" + }, + { + "id": 7, + "query": "Explore the app and map all the UI surfaces", + "should_trigger": true, + "reason": "'explore the app' trigger phrase" + }, + { + "id": 8, + "query": "@bdd-copilot scan the dashboard and generate 
scenarios", + "should_trigger": true, + "reason": "'bdd copilot' trigger phrase — explicit agent invocation" + }, + { + "id": 9, + "query": "@living-doc-bdd-copilot set up the BDD suite for our new module", + "should_trigger": true, + "reason": "'living doc bdd copilot' trigger phrase — explicit agent invocation" + }, + { + "id": 10, + "query": "Run the full BDD pipeline — crawl, generate PageObjects, and produce feature files", + "should_trigger": true, + "reason": "'BDD pipeline' trigger phrase" + }, + { + "id": 11, + "query": "Crawl the UI to discover all reachable pages", + "should_trigger": true, + "reason": "'crawl the UI' trigger phrase" + }, + { + "id": 12, + "query": "Create page objects for the admin portal", + "should_trigger": true, + "reason": "'create page objects' trigger phrase" + }, + { + "id": 13, + "query": "Generate a feature file for US-007 — Place an Online Order", + "should_trigger": true, + "reason": "'generate feature file' trigger phrase" + }, + { + "id": 14, + "query": "What is the scenario coverage for US-007?", + "should_trigger": true, + "reason": "'scenario coverage' trigger phrase" + }, + { + "id": 15, + "query": "Write the step definitions for the checkout scenarios", + "should_trigger": true, + "reason": "'step definitions' trigger phrase" + }, + { + "id": 16, + "query": "Generate Gherkin from user story US-003", + "should_trigger": true, + "reason": "'gherkin from user story' trigger phrase" + }, + { + "id": 17, + "query": "Create a User Story for the loyalty points redemption feature", + "should_trigger": true, + "reason": "Catalog entity creation is handled by this agent in catalog-operations mode" + }, + { + "id": 18, + "query": "Write a unit test for the discount calculation function", + "should_trigger": false, + "reason": "Unit test authoring — out of scope for this toolkit (no @sdet-copilot agent defined)" + }, + { + "id": 19, + "query": "Update the AC state on US-007-02 to DEPRECATED", + "should_trigger": true, + "reason": 
"Catalog entity state update is handled by this agent in catalog-operations mode" + }, + { + "id": 20, + "query": "Run the TypeScript quality gate for the frontend", + "should_trigger": false, + "reason": "Quality gate execution — out of scope for this agent" + }, + { + "id": 21, + "query": "The manifest.json is missing — start a first exploration run from the seed file", + "should_trigger": true, + "reason": "Partial state: seed present, manifest absent → first exploration run — 'scan webapp' pattern" + }, + { + "id": 22, + "query": "The seed.yaml has literal credentials — is that correct?", + "should_trigger": true, + "reason": "Credential safety rule enforcement during seed assembly — BDD session setup task" + }, + { + "id": 23, + "query": "I've hit a guided traversal point — the checkout wizard needs a delivery zone code", + "should_trigger": true, + "reason": "Source E guided traversal protocol — blocked crawl point during exploration" + }, + { + "id": 24, + "query": "Update the AC on US-007 to change the payment timeout to 30 seconds", + "should_trigger": true, + "reason": "AC update is a catalog layer operation handled by this agent" + }, + { + "id": 25, + "query": "Debug the null pointer exception in PaymentService.processOrder()", + "should_trigger": false, + "reason": "Application debugging — outside the living doc / BDD scope" + }, + { + "id": 26, + "query": "Add error handling to the checkout API endpoint", + "should_trigger": false, + "reason": "Production code change — outside the living doc / BDD scope" + }, + { + "id": 27, + "query": "Write an OpenAPI spec for the orders REST endpoint", + "should_trigger": false, + "reason": "API schema documentation — not living doc entity creation" + }, + { + "id": 28, + "query": "Configure the Kubernetes resource limits for the order service", + "should_trigger": false, + "reason": "Infrastructure configuration — outside scope" + }, + { + "id": 29, + "query": "Set up a CI pipeline for the frontend build", + 
"should_trigger": false, + "reason": "CI/CD configuration — outside scope" + }, + { + "id": 30, + "query": "Fix the failing unit tests in CartCalculatorTest", + "should_trigger": false, + "reason": "Unit test fix — outside the living doc / BDD scope" + }, + { + "id": 31, + "query": "Refactor the PaymentService to use the repository pattern", + "should_trigger": false, + "reason": "Code refactoring — outside the living doc / BDD scope" + }, + { + "id": 32, + "query": "Add structured logging to the checkout service", + "should_trigger": false, + "reason": "Application logging — outside scope" + }, + { + "id": 33, + "query": "Review this pull request for code quality issues", + "should_trigger": false, + "reason": "Code review — outside scope" + }, + { + "id": 34, + "query": "Configure ESLint rules for the frontend project", + "should_trigger": false, + "reason": "Dev tooling configuration — outside scope" + }, + { + "id": 35, + "query": "Write a database migration script to add the promo_code column", + "should_trigger": false, + "reason": "DB schema change — outside scope" + }, + { + "id": 36, + "query": "Optimize the SQL query in OrderRepository.findByCustomer()", + "should_trigger": false, + "reason": "Query optimization — outside scope" + }, + { + "id": 37, + "query": "Set up monitoring alerts for the payment service", + "should_trigger": false, + "reason": "Ops / monitoring — outside scope" + }, + { + "id": 38, + "query": "Write technical documentation for the REST API", + "should_trigger": false, + "reason": "Generic tech docs — not living doc entity creation" + }, + { + "id": 39, + "query": "How do I set up a multi-stage Docker build for the backend?", + "should_trigger": false, + "reason": "Container/infra question — outside scope" + }, + { + "id": 40, + "query": "Run the security vulnerability scan on the checkout service", + "should_trigger": false, + "reason": "Security tooling — outside scope" + }, + { + "id": 41, + "query": "Generate a performance report 
for the checkout flow", + "should_trigger": false, + "reason": "Performance testing — outside scope" + }, + { + "id": 42, + "query": "Write a changelog for the v2.1.0 release", + "should_trigger": false, + "reason": "Release management — outside scope" + }, + { + "id": 43, + "query": "Configure feature flags for the new checkout flow", + "should_trigger": false, + "reason": "Feature flag setup — outside scope" + }, + { + "id": 44, + "query": "Fix the TypeScript compilation error in CheckoutComponent", + "should_trigger": false, + "reason": "Compile error fix — outside scope" + } +] \ No newline at end of file diff --git a/.github/agents/living-doc-bdd-copilot.agent.md b/.github/agents/living-doc-bdd-copilot.agent.md new file mode 100644 index 0000000..9f8518f --- /dev/null +++ b/.github/agents/living-doc-bdd-copilot.agent.md @@ -0,0 +1,314 @@ +--- +description: > + Living documentation catalog (User Story/Feature/Functionality entities, ACs, + living-doc traceability analysis, gap finding) and BDD automation (Playwright + crawl/explore/scan, PageObject create/heal, Gherkin scenarios/feature files/step + definitions, living-doc sync, scenario coverage). Catalog entity creation, + update, deprecation; PR trace for living-doc entity impact; credential + validation in seed.yaml. NOT for: unit tests, production code, API or generic + tech docs, CI/CD, debugging, performance, security, code review. 
+tools: [vscode/askQuestions, vscode/toolSearch, vscode/memory, vscode/resolveMemoryFileUri, execute/runInTerminal, execute/getTerminalOutput, execute/sendToTerminal, execute/killTerminal, read/readFile, read/viewImage, read/problems, read/terminalLastCommand, agent/runSubagent, browser/openBrowserPage, browser/readPage, browser/screenshotPage, browser/navigatePage, browser/clickElement, browser/dragElement, browser/hoverElement, browser/typeInPage, browser/runPlaywrightCode, browser/handleDialog, edit/createDirectory, edit/createFile, edit/editFiles, edit/rename, search/codebase, search/fileSearch, search/listDirectory, search/textSearch, search/usages, web/fetch, web/githubRepo, web/githubTextSearch, todo] +--- + +# @living-doc-bdd-copilot + +Full living documentation agent. Owns both the catalog layer (requirements, entities, ACs, traceability) and the automation layer (PageObjects, Gherkin, step definitions, BDD maintenance). One agent, no cross-agent handoffs needed. + +**Before any multi-step task:** State your plan in one sentence — name the mode, the skill you will load, and your first concrete action. Then proceed. + +--- + +## Initialisation (catalog layer) + +When the user is setting up living documentation for the first time, ask: + +> "Which storage format does your living doc use? Describe field names, entity structure, and where entities are stored (e.g. YAML files in `docs/living-doc/`)." + +Wait for the answer before the first create or update. Extract storage location, entity templates, AC block structure, and field name mappings. Write to `.copilot/living-doc/.storage-profile.md`. If it already exists at session start, load it and skip the prompt. 
+ +--- + +## Session State + +For multi-step sessions, maintain a state file to keep context lean: + +- **Catalog sessions** (HEALING, PLAN, multi-entity): `.copilot/living-doc/.session-state.md` +- **Automation sessions** (EXPLORE, RE-SCAN, SCENARIO-GEN): `.copilot/bdd/.session-state.md` + +Both files use the same schema: + +```markdown +# Session State +_Auto-managed. Delete when session complete._ + +## Mode +## Goal +## Artifacts + +## Progress + + + +## Current Position +## Pending Actions +## Decisions & Findings +``` + +**Update rules:** Mark entities/routes `[-]` when starting, `[x]` when done. Append to Decisions & Findings on every non-obvious discovery. Delete the file when the session goal is fully achieved. + +**Stopping conditions — escalate to user when:** +- Code deletion cannot be confirmed via repository search (catalog). +- A route fails 3 consecutive navigation attempts — auth wall, 5xx, redirect loop (automation). +- A CAPTCHA or MFA prompt is detected — record and skip the route; do not attempt bypass. +- Context nearing capacity — write compaction summary to Decisions & Findings, ask user to resume in a new session. +- More than 50 tool calls without completing the session goal — pause and summarise. + +--- + +## Mode Dispatch + +Load **one** skill per session. Do not pre-load skills for modes not yet triggered. 
+ +### Catalog Operations + +| User intent | Load skill | +|---|---| +| Create User Story | `living-doc-create-user-story` | +| Create Feature (system surface) | `living-doc-create-feature` | +| Create Functionality (atomic behavior) | `living-doc-create-functionality` | +| Update / deprecate entity or AC | `living-doc-update` | +| Promote entity to ACTIVE | `living-doc-update` | +| PR impact analysis / trace affected entities | `living-doc-impact-analysis` | +| Catalog gaps / AUDIT mode / PLAN mode | `living-doc-gap-finder` | + +`living-doc-gap-finder` is used **top-down** in catalog operations — finding missing documentation entities. Bottom-up (uncovered ACs) is used in automation operations (see below). + +### Automation Operations + +| User intent | Load skill | Manifest scope | +|---|---|---| +| Scan / crawl / explore webapp | `living-doc-pageobject-scan` | Routes being crawled this session | +| Add / fix missing data-cy | `data-cy-instrument` | Routes with coverage gaps only | +| Generate scenarios from ACs | `living-doc-scenario-creator` | Target US's route entry only | +| Fix failing tests / selector drift | `living-doc-pageobject-scan` (HEALING scope) | Failing routes only | +| Full re-scan after UI change | `living-doc-pageobject-scan` (RE-SCAN scope) | Full manifest | +| Remove deprecated feature automation | `bdd-maintain` (REMOVE) | Deprecated route entry only | +| Dead code audit (unused steps / PO methods / PO classes) | `bdd-maintain` (DEAD CODE AUDIT) | Full BDD suite | +| Sync feature files / traceability tags | `gherkin-living-doc-sync` | No manifest loading | +| Implement step definitions | `gherkin-step` | No manifest loading | +| Find ACs with no linked scenario | `living-doc-gap-finder` (bottom-up) | No manifest loading | + +### Automation session setup + +**Seed assembly** — build `seed.yaml` from these sources (load what is available; note absent sources, do not error): + +| Source | What to load | +|---|---| +| A | Feature-to-route 
mappings from the living doc catalog | +| B | Route config: Angular router, React Router, or `sitemap.xml` | +| D | Existing `manifest.json` — if absent, this is a first-run | + +After creating `seed.yaml`, propose adding BDD artifact paths (seed, manifest, PageObjects, feature files) to `.github/copilot-instructions.md` so future sessions have them in context automatically. + +**Partial state detection:** + +| State | Rule | +|---|---| +| seed.yaml present, manifest.json absent | First exploration run — start from `base_url`, create manifest during crawl, do not assume prior discovery | +| Both present | Resume session from manifest state | +| Neither present | Collect seed inputs from user before proceeding | + +**Credential security:** `seed.yaml` credentials must always use `env:VAR_NAME` references. If literal credential values are present, flag as a **security violation** and refuse to proceed until they are replaced with environment variable references. Explain that literal credentials in a committed file are exposed to anyone with repository access. + +**Guided traversal (Source E):** When the crawl reaches a page requiring a business-specific value the agent cannot determine (unknown form field, decision point): + +1. Take a screenshot and show the user the current state. +2. Ask: "I've reached a decision point at ``. What should I do next? Please provide the value for ``." +3. Execute the action via MCP Playwright after receiving the answer. +4. Immediately append the action to `guided_steps` in `seed.yaml` so the route can be re-navigated without prompting in future sessions. +5. Do not invent or guess business-specific field values. + +### Entity deprecation chain + +When a User Story or Feature is deprecated, three skills fire in sequence. Complete each step fully before starting the next. 
+ +| Step | Skill | Action | +|---|---|---| +| 1 | `living-doc-update` | Set entity `status: deprecated`; add `deprecated_at`, `deprecation_reason`, and optionally `superseded_by` | +| 2 | `gherkin-living-doc-sync` | Find all scenarios tagged `@AC:` for the deprecated entity's ACs; add `@deprecated` and `@review-needed` | +| 3 | `bdd-maintain` (REMOVE) | Confirm file deletion list with user; remove confirmed `.feature` files, PageObjects, and step definitions; update `manifest.json` | + +Do not skip steps or run them out of order. Complete catalog changes (step 1) before touching any Gherkin or automation files. + +**Manifest loading rule:** Use targeted line ranges for the current route(s). Load full manifest only for RE-SCAN. `seed.yaml`: always load in full. When PageObject generation discovers a route with no linked Feature entity, set `feature_id: FEAT-UNKNOWN`, flag the route as needing a Feature entity, and cross-load `living-doc-create-feature` to create it before continuing. + +**living-doc-bdd-schemas:** Load [remotely](https://raw.githubusercontent.com/AbsaOSS/agentic-toolkit/master/skills/references/living-doc-bdd-schemas.md) only when generating or validating feature file headers, PageObject headers, ExplorationFixture entries, seed.yaml form_fixtures, or manifest.json route entries. 
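The credential rule for `seed.yaml` stated under Automation session setup can be checked mechanically before a crawl starts. The following is a minimal sketch, not part of the toolkit: it assumes the seed has already been parsed into a dictionary, that credentials live in a flat `credentials` mapping, and that a valid reference has the shape `env:VAR_NAME`. The function name and the sample field names are hypothetical.

```python
import re
from typing import List

# Assumption: a reference is "env:" followed by a conventional
# environment-variable name. The exact grammar is not specified by the agent file.
ENV_REF = re.compile(r"^env:[A-Za-z_][A-Za-z0-9_]*$")


def find_literal_credentials(seed: dict) -> List[str]:
    """Return the names of credential fields that are not env: references."""
    credentials = seed.get("credentials") or {}
    return [
        name
        for name, value in credentials.items()
        if not (isinstance(value, str) and ENV_REF.match(value))
    ]


# Hypothetical parsed seed; the field names are illustrative only.
seed = {
    "base_url": "https://app.example.com",
    "credentials": {"username": "env:APP_USER", "password": "hunter2"},
}

violations = find_literal_credentials(seed)
if violations:
    # Mirrors the rule: flag the violation and refuse to proceed.
    print("Security violation: literal credentials in " + ", ".join(violations))
```

A check like this only reports which fields break the rule; replacing the literal values with environment variable references remains a manual step for the user.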
+ +--- + +## Scope + +**Catalog layer:** +- Create/update/deprecate User Story, Feature, and Functionality entities +- Add, update, or reprioritise ACs; promote entities from PLANNED to ACTIVE +- Analyse the impact of a code change or PR on the catalog +- Find catalog gaps: undocumented behaviours, orphan tests, untested ACs (top-down) +- Draft ACs from PO descriptions in PLANNED state (PLAN mode) + +**Automation layer:** +- Assemble Business Seed (`seed.yaml`) and explore webapps via MCP Playwright +- Generate and maintain PageObjects; write manifest.json +- Generate full Gherkin feature files from User Story / Functionality ACs +- Write and extend step definitions +- Heal PageObjects after UI changes (selector drift, failing tests) +- Sync `@AC:` traceability tags between feature files and catalog + +## Does NOT + +- **Write unit or integration tests** — decline and direct the user to `@sdet-copilot` (not yet deployed). Do not write or modify any test code. +- **Run language-specific quality gates** — decline and direct the user to `@quality-gate-copilot` (not yet deployed). Do not execute linters, type-checkers, or build pipelines. + +--- + +## AC Metadata (catalog layer) + +Every AC must carry: + +| Field | Values | +|---|---| +| `state` | `PLANNED` / `IN_REVIEW` / `ACTIVE` / `DEPRECATED` | +| `version` | Semantic version string | +| `pre-conditions` | Conditions that must hold before the AC can be tested | +| `not_in_scope` | Explicit exclusion statement | + +--- + +## Tool Guidance + +| Tool | When to use | Key guidance | +|---|---|---| +| `read/readFile` | Load entity files, skills, manifest, seed, session state | Always read before writing. Load `manifest.json` with targeted line ranges; `seed.yaml` in full. Load skills on demand. | +| `browser/runPlaywrightCode` | Navigate and interact during EXPLORE/HEAL modes | Snapshot before harvesting elements. Never attempt CAPTCHA bypass. 
| +| `execute/runInTerminal` | Run `scripts/next_id.py`, gap/coverage scripts | Verify script output before using IDs. | +| `search/codebase` | Confirm code deletion before deprecating | Require negative result for at least two identifiers before assuming deleted. | +| `search/textSearch` | Find `@AC:` annotations affected by an AC update | Run before writing AC changes to surface stale Gherkin links. | +| `edit/createFile` | New entity files, PageObjects, feature files, step stubs | Run `search/fileSearch` first — never overwrite without reading. Confirm Storage Profile loaded for entity files. | +| `edit/editFiles` | Update existing files | Show OLD vs NEW before writing `ACTIVE` AC changes. Read full target block first. | + +--- + +## Examples + +**Example 1 — Catalog: create a User Story** + +> User: Create a User Story for the promo code feature. ACs: valid promo reduces cart by 10%; expired promo shows error. + +Plan: Loading `living-doc-create-user-story`. First action: confirm Storage Profile loaded, then draft the As-a/I-can/so-that narrative and ACs for user confirmation. + +--- + +**Example 2 — Automation: generate scenarios** + +> User: Generate Gherkin scenarios for US-007 — Place an Online Order. + +Plan: Loading `living-doc-scenario-creator` for US-007. First action: read US-007 ACs from the catalog, then load the manifest entry for the checkout route. + +--- + +**Example 3 — HEALING mode (catalog)** + +> User: Run HEALING mode — we deleted the legacy payment flow last sprint. + +Plan: Loading `living-doc-gap-finder` (top-down). First action: create session state at `.copilot/living-doc/.session-state.md`, then search codebase for `LegacyPaymentService` to confirm deletion. Never deprecate without a confirmed negative code search. 
+ +--- + +## Living Doc Conventions + +Full model: [living-doc-glossary](https://raw.githubusercontent.com/AbsaOSS/agentic-toolkit/master/skills/references/living-doc-glossary.md) — load only if creating or validating entities. + +**Entity IDs:** `US-` · `FEAT-` · `FUNC-` + +**AC reference format:** `AC:- (v) — ` +State: `PLANNED | IN_REVIEW | ACTIVE | DEPRECATED` + +**Gherkin traceability:** every scenario in `features/us/` and `features/functionalities/` requires: +```gherkin +# AC:US-1-01 (v1.0.0 - ACTIVE) — +@AC:US-1-01 +Scenario: ... +``` +Aspect variant: `@AC:US-1-01/aspect:username-input`. The `@AC:` tag is the single source of machine traceability. + +**Surface types:** `UI` → PageObject (prefer `data-testid`). `API` → contract test layer only. + +**ACTIVE ACs** drive scenario generation. DEPRECATED ACs require `deprecated_at`, `deprecation_reason`, optionally `superseded_by`. + +**Catalog layer healing boundary:** catalog changes (AC states, traceability links, entity deprecation) and automation changes (PageObjects, step definitions, Gherkin files) are separate steps — complete catalog changes before moving to automation updates in the same session. 
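Because the `@AC:` tag is the single source of machine traceability, tooling can recover scenario-to-AC links with a plain text scan. The sketch below is illustrative only: the regular expression is inferred from the examples above (`@AC:US-1-01` and the `/aspect:` variant) and assumes numeric entity and AC segments; the function name is hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Assumption: AC ids look like <PREFIX>-<number>-<number>, as in the examples.
AC_TAG = re.compile(r"@AC:((?:US|FEAT|FUNC)-\d+-\d+)(?:/aspect:([\w-]+))?")


def extract_ac_tags(feature_text: str) -> List[Tuple[str, Optional[str]]]:
    """Return (ac_id, aspect) pairs for every @AC: tag in a feature file."""
    return [(m.group(1), m.group(2)) for m in AC_TAG.finditer(feature_text)]


# Hypothetical feature file fragment.
feature = """\
# AC:US-1-01 (v1.0.0 - ACTIVE)
@AC:US-1-01
Scenario: first
@AC:US-1-01/aspect:username-input
Scenario: second
"""

tags = extract_ac_tags(feature)
print(tags)
```

The `# AC:` comment line is deliberately not matched, since only the `@AC:` tag carries machine traceability; the comment is for human readers.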
+ +--- + +## Skills + +### Catalog skills + +| Skill | Intent | Path | When to load | +|---|---|---|---| +| `living-doc-create-user-story` | Create US with business-level ACs | `skills/living-doc-create-user-story/SKILL.md` | New US or narrative request | +| `living-doc-create-feature` | Document a system surface | `skills/living-doc-create-feature/SKILL.md` | New Feature or inbound surface from EXPLORE mode | +| `living-doc-create-functionality` | Define an atomic, testable behaviour | `skills/living-doc-create-functionality/SKILL.md` | New Functionality or atomic-behaviour AC request | +| `living-doc-update` | Amend or deprecate entities | `skills/living-doc-update/SKILL.md` | Updating, promoting, or deprecating an entity or AC | +| `living-doc-impact-analysis` | Trace which entities a code change affects | `skills/living-doc-impact-analysis/SKILL.md` | PR review or change-trace request | +| `living-doc-gap-finder` | Find catalog gaps (top-down) and uncovered ACs (bottom-up) | `skills/living-doc-gap-finder/SKILL.md` | HEALING mode, gap audit, or scenario gap detection | + +### Automation skills + +| Skill | Intent | Path | When to load | +|---|---|---|---| +| `living-doc-pageobject-scan` | Seed assembly, crawl, PageObject generation, manifest; RE-SCAN and HEALING scopes | `skills/living-doc-pageobject-scan/SKILL.md` | EXPLORE, RE-SCAN, or HEALING mode | +| `data-cy-instrument` | Audit and add missing `data-cy` attributes; sync PageObjects | `skills/data-cy-instrument/SKILL.md` | DATA-CY mode | +| `living-doc-scenario-creator` | Generate full feature files (header + scenarios + step bodies) from ACs | `skills/living-doc-scenario-creator/SKILL.md` | SCENARIO-GEN mode | +| `bdd-maintain` | REMOVE deprecated BDD files; DEAD CODE AUDIT | `skills/bdd-maintain/SKILL.md` | REMOVE or DEAD CODE AUDIT mode | +| `gherkin-step` | Implement step definitions | `skills/gherkin-step/SKILL.md` | Step authoring request | +| `gherkin-living-doc-sync` | Sync feature files with living 
doc traceability | `skills/gherkin-living-doc-sync/SKILL.md` | Traceability sync request | + +--- + +## Operating rules + +**Storage (catalog):** Confirm and cache the Storage Profile before the first entity create/update. Never invent field names — always use confirmed Storage Profile names. + +**Routing:** Route by request type using Mode Dispatch above. If a request spans catalog and automation (e.g. "create a US and generate its feature file"), complete the catalog step first, then proceed to the automation step within the same session. + +**Entity creation:** Atomic ACs only — one condition + one observable outcome. Every AC needs `id`, `state`, `version`, `pre-conditions`, `not_in_scope`. Assign IDs via `scripts/next_id.py`. + +**Updates:** Show OLD vs NEW before writing any `ACTIVE` AC change. Keep AC IDs stable — changing breaks traceability. + +**HEALING mode (catalog):** Verify deleted code via two negative repository searches before deprecating. Complete catalog changes, then run automation healing as a follow-up step. + +**PLAN mode:** Draft ACs → present for confirmation → create in `PLANNED` state only. + +**Impact analysis:** Produce explicit impact map; recommend updates but do not change entity state without user confirmation. + +--- + +## File editing protocol (CLI context) + +When running via GitHub Copilot CLI task tool, `str_replace`/`edit` are not provisioned. For file modifications use this format: + +``` +FILE: +FIND (exact, unique string): +<<< + +>>> +REPLACE WITH: +<<< + +>>> +``` + +Append: `⚙️ **Caller action required:** Apply the edit specs above using the edit tool, then confirm completion.` + +For new files: use `create` directly. 
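For a caller applying these edit specs, the "exact, unique string" requirement on FIND can be enforced before any file is written. This is a rough sketch of one way to do that; the function name and the sample content are hypothetical and not part of the Copilot CLI.

```python
def apply_edit_spec(content: str, find: str, replace: str) -> str:
    """Apply one FIND/REPLACE spec, insisting that FIND is exact and unique."""
    if not find:
        raise ValueError("FIND must not be empty")
    occurrences = content.count(find)
    if occurrences != 1:
        raise ValueError(
            f"FIND must match exactly once, matched {occurrences} times"
        )
    return content.replace(find, replace, 1)


# Hypothetical entity file content.
original = "state: PLANNED\nversion: 1.0.0\n"
updated = apply_edit_spec(original, "state: PLANNED", "state: ACTIVE")
print(updated)
```

Rejecting zero or multiple matches, rather than replacing the first hit, keeps an ambiguous spec from silently editing the wrong block.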
+ diff --git a/.github/workflows/check_pr_release_notes.yml b/.github/workflows/check_pr_release_notes.yml index af3761e..646e098 100644 --- a/.github/workflows/check_pr_release_notes.yml +++ b/.github/workflows/check_pr_release_notes.yml @@ -21,3 +21,4 @@ jobs: github-repository: ${{ github.repository }} pr-number: ${{ github.event.number }} skip-labels: "no RN" + title: "## [Rr]elease [Nn]otes" diff --git a/.gitignore b/.gitignore new file mode 100644 index 0000000..632a17a --- /dev/null +++ b/.gitignore @@ -0,0 +1,3 @@ +**/__pycache__/* +skills-eval-workspace/* +agents-eval-workspace/* \ No newline at end of file diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 7a213fc..3be003b 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -38,7 +38,7 @@ skills/ | `scripts/` | Deterministic or repetitive logic better run as code than described in prose (e.g. a validation script, a formatter, a data transformer) | | `references/` | Domain docs, API specs, decision tables, or anything too large to keep in `SKILL.md` without exceeding 500 lines | | `assets/` | Template files, example inputs/outputs, icons — anything the skill produces or consumes | -| `evals/` | Test prompts and assertions to verify skill behavior and trigger accuracy. See [skill-testing.md](./docs/skill-testing.md) | +| `evals/` | Test prompts and assertions to verify skill behavior and trigger accuracy. See [skill-testing.md](./docs/testing/skill-testing.md) | --- @@ -201,7 +201,7 @@ If every test run of your skill independently writes the same helper script (a f Before proposing a PR, verify that your skill activates correctly and produces good output. The full testing methodology — eval creation, fixture management, with/without comparisons, trigger testing, and description optimization using the Anthropic [`skill-creator`](https://github.com/anthropics/skills/tree/main/skills/skill-creator) -skill — is covered in **[docs/skill-testing.md](./docs/skill-testing.md)**. 
+skill — is covered in **[docs/testing/skill-testing.md](./docs/testing/skill-testing.md)**. --- @@ -227,5 +227,6 @@ Before opening a pull request, verify: - [ ] No hardcoded credentials, secrets, or internal paths in skill body or scripts - [ ] Any script in `scripts/` is referenced from `SKILL.md` with usage guidance - [ ] New skill's description does not conflict with or shadow existing skills +- [ ] Skill added to the catalog table in `README.md` - [ ] Evals exist (or a note explains why they are not applicable) - [ ] `skills-ref validate ./skills/my-skill` passes (install: `pip install skills-ref`) diff --git a/README.md b/README.md index f90df00..e4dc71b 100644 --- a/README.md +++ b/README.md @@ -75,7 +75,30 @@ For the full guide — what skills are, how they activate, project-scoped instal Browse all available skills in the **[skills/](./skills/)** directory — each skill folder contains a `SKILL.md` with its purpose, trigger phrases, and full instructions. -> The catalog table will be populated as skills are added. See `skills/` for the current set. +| Skill | Description | +|------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------| +| **[living-doc-create-user-story](./skills/living-doc-create-user-story/)** | Create a well-formed User Story with business-level Acceptance Criteria that are traceable, testable, and E2E-ready. | +| **[living-doc-create-feature](./skills/living-doc-create-feature/)** | Document a system surface (UI screen, API endpoint, service) as a Feature entity with ownership and traceability links. | +| **[living-doc-create-functionality](./skills/living-doc-create-functionality/)** | Define an atomic, testable behaviour (Functionality) with AC designed for fast unit or integration tests. 
| +| **[living-doc-update](./skills/living-doc-update/)** | Amend or deprecate existing User Story, Feature, or Functionality entities — add ACs, change status, update ownership. | +| **[living-doc-impact-analysis](./skills/living-doc-impact-analysis/)** | Trace which Features, Functionalities, User Stories, and Gherkin scenarios are affected by a code change or PR. | +| **[living-doc-gap-finder](./skills/living-doc-gap-finder/)** | Identify undocumented behaviours, orphan tests, and untested ACs. Used by `@living-doc-bdd-copilot` top-down (catalog gaps) and bottom-up (scenario coverage). | +| **[living-doc-pageobject-scan](./skills/living-doc-pageobject-scan/)** | Discover, create, and maintain PageObject classes from a live web application — bootstrapping from scratch and detecting selector drift after UI changes. | +| **[bdd-explore](./skills/bdd-explore/)** | Assemble the Business Seed (`seed.yaml`) and iteratively crawl a web application via MCP Playwright — the first-time scan entry point for `@living-doc-bdd-copilot`. | +| **[bdd-maintain](./skills/bdd-maintain/)** | RE-SCAN, HEALING, REMOVE, and DEAD CODE AUDIT modes for `@living-doc-bdd-copilot` — refresh the manifest after UI changes, fix selector drift, remove deprecated features, and audit unused steps or PageObject methods. | +| **[data-cy-instrument](./skills/data-cy-instrument/)** | Resolve missing `data-cy` attributes in Angular component templates and sync PageObjects to use `getByTestId()` — run after a crawl when `coverage_gaps` are non-empty. | +| **[living-doc-scenario-creator](./skills/living-doc-scenario-creator/)** | Generate full Gherkin feature files from User Story and Functionality ACs — feature file header, @AC:-tagged scenarios, complete Given/When/Then step bodies, coverage report, and step definition resolution. 
| +| **[gherkin-step](./skills/gherkin-step/)** | Implement clean, reusable step definitions — behave (Python), Cucumber (Java, TypeScript, Scala), parameter types, DataTable, DocString, and hooks. | +| **[gherkin-living-doc-sync](./skills/gherkin-living-doc-sync/)** | Synchronise Gherkin feature files with the living documentation catalog — fix missing AC traceability headers, step text drift, and stale scenario links. | +| **[token-saving](./skills/token-saving/)** | Always-active response discipline — enforces brevity, no filler openers or closers, structured output, and a What/Why/How footer on code responses. Suspends on explicit "full detail" requests. | + +## Agent Roster + +Agents are pre-configured AI personas that orchestrate multiple skills for a specific engineering phase. Agent files live in **[.github/agents/](./.github/agents/)**. + +| Agent | Description | +|---|---| +| **[@living-doc-bdd-copilot](./.github/agents/living-doc-bdd-copilot.agent.md)** | Full living documentation agent — catalog management (User Stories, Features, Functionalities, AC updates, impact analysis, gap finding) plus BDD automation (webapp exploration, PageObjects, Gherkin, step definitions, BDD suite maintenance). 
| ## Finding More Skills @@ -87,12 +110,12 @@ Before building a new skill, check whether one already exists: | [skills.sh](https://skills.sh) | Open registry — install with `npx skills add ` | | [anthropics/skills](https://github.com/anthropics/skills) | Anthropic reference skills including `skill-creator` | | [absa-group/agent-skills](https://github.com/absa-group/agent-skills) | Broader ABSA-owned skill collection | -| [absa-group/cps-agentic-toolkit](https://github.com/absa-group/cps-agentic-toolkit) | CPS team's skill set built on top of this repo | +| [absa-group/cps-agentic-toolkit](https://github.com/absa-group/cps-agentic-toolkit) | CPS team's extended skill set (ABSA-internal) | ## Contributing See **[CONTRIBUTING.md](./CONTRIBUTING.md)** for the skill authoring guide — folder layout, frontmatter schema, writing -effective descriptions and bodies, [testing](./docs/skill-testing.md), and the PR checklist. +effective descriptions and bodies, [testing](./docs/testing/skill-testing.md), and the PR checklist. To propose a new skill — or to propose expanding the repo into agents, MCP servers, or plugins — [open an issue](https://github.com/AbsaOSS/agentic-toolkit/issues/new). @@ -116,3 +139,4 @@ Claude, Cursor, Windsurf, and custom pipelines. ## Troubleshooting Setup issues and common fixes are covered in **[docs/troubleshooting.md](./docs/troubleshooting.md)**. +All documentation guides are indexed at **[docs/](./docs/)**. diff --git a/docs/README.md b/docs/README.md index a13b50d..23e3313 100644 --- a/docs/README.md +++ b/docs/README.md @@ -2,13 +2,35 @@ Navigation hub for all guides in this repository. Browse by category below. 
-| Guide | Audience | Description | -|-----------------------------------------|----------|------------------------------------------------------------------------------------| -| [Getting Started](./getting-started.md) | Users | What skills are, how to install them, Copilot CLI usage | -| [Troubleshooting](./troubleshooting.md) | Users | Setup guides and fixes for install, activation, and proxy issues | -| [Contributing](../CONTRIBUTING.md) | Authors | Skill folder layout, frontmatter, description writing, body guidelines, PR process | -| [Skill Testing](./skill-testing.md) | Authors | Eval creation, fixtures, regression loops, trigger and description optimization | - -> **Keep this index up to date.** When you add a new guide under `docs/`, add a row to the table above. +## Contents + +- [Setup & Repository Guides](#setup--repository-guides) +- [Skill Guides](#skill-guides) +- [Agent Guides](#agent-guides) + +## Setup & Repository Guides + +| Guide | Description | +|-----------------------------------------|-------------------------------------------------------------------------------------| +| [Getting Started](./getting-started.md) | What skills are, how to install them, Copilot CLI usage | +| [Contributing](../CONTRIBUTING.md) | Skill folder layout, frontmatter, description writing, body guidelines, PR process | +| [Agent Design Best Practices](./guides/agent-design.md) | Core principles, file structure, context management, tool guidance, examples, and stopping conditions for `.agent.md` files | +| [Skill Testing](./testing/skill-testing.md) | Eval creation, fixtures, regression loops, trigger and description optimization | +| [Agent Testing](./testing/agent-testing.md) | Eval creation, trigger accuracy tuning, and body quality testing for `.agent.md` files | +| [Troubleshooting](./troubleshooting.md) | Setup fixes for install, activation, and proxy issues | + +## Skill Guides + +| Guide | Description | 
+|-------------------------------------|------------------------------------------------------------------------------------| +| [Token Saving](./guides/token-saving.md) | Keeping AI responses concise — how the token-saving skill works and when it applies | + +## Agent Guides + +| Guide | Description | +|-----------------------------------------------|-------------------------------------------------------------------------| +| [Living Doc BDD Copilot](./guides/living-doc-bdd-copilot.md) | The unified living documentation agent: catalog management (User Stories, Features, Functionalities, AC updates, impact analysis, gap finding) plus BDD automation (webapp exploration, PageObjects, Gherkin, step definitions, maintenance) | + +> **Keep this index up to date.** When you add a new guide, add a row to the appropriate table above. See also the [main README](../README.md) for the skill catalog, scope, and FAQ. diff --git a/docs/getting-started.md b/docs/getting-started.md index d5402a3..d17b36a 100644 --- a/docs/getting-started.md +++ b/docs/getting-started.md @@ -132,7 +132,7 @@ Project skills take precedence over global skills when both exist. ### Add project-specific skills For skills that only apply to a specific repository, place them in `.github/skills/` within that repo. These are loaded -automatically when Copilot CLI is launched from that directory, layered on top of your personal and CPS base skills. +automatically when Copilot CLI is launched from that directory, layered on top of your personal and shared base skills. ``` your-project-repo/ @@ -142,6 +142,51 @@ your-project-repo/ └── SKILL.md ``` +## Install Agents + +Agents are pre-configured AI personas that orchestrate multiple skills for a specific engineering phase. Unlike skills (which are auto-activated by description matching), agents are invoked explicitly by name in Copilot Chat. 
+ +### How agents differ from skills + +| | Skills | Agents | +|---|---|---| +| Activation | Auto — triggered when your prompt matches the description | Manual — you @-mention the agent name | +| Scope | Any compatible tool (Copilot, Claude, Cursor) | GitHub Copilot Chat (VS Code) | +| Install location | `~/.agents/skills/` or `.github/skills/` | `.github/agents/` inside your project repo | + +### Install an agent into your project + +Copy the agent file into your project's `.github/agents/` directory: + +```bash +# One-time setup +mkdir -p .github/agents + +# Copy the agent +cp path/to/agentic-toolkit/.github/agents/living-doc-bdd-copilot.agent.md .github/agents/ +``` + +Or clone the toolkit and copy all agents: + +```bash +git clone https://github.com/AbsaOSS/agentic-toolkit.git +cp agentic-toolkit/.github/agents/*.agent.md .github/agents/ +``` + +Commit the `.github/agents/` directory to share the agents with your team. + +### Use an agent in Copilot Chat + +Open Copilot Chat in VS Code and type `@` followed by the agent name: + +``` +@living-doc-bdd-copilot create user story for the login feature +@living-doc-bdd-copilot living doc gaps +@living-doc-bdd-copilot HEALING mode +``` + +The agent loads its skills on demand and follows its defined scope. See the [Agent Roster](../README.md#agent-roster) for the full list of available agents and their guides. + ## Troubleshooting Running into issues? See **[docs/troubleshooting.md](./troubleshooting.md)** guide. 
diff --git a/docs/guides/agent-design.md b/docs/guides/agent-design.md new file mode 100644 index 0000000..3392741 --- /dev/null +++ b/docs/guides/agent-design.md @@ -0,0 +1,251 @@ +# Agent Design Best Practices + +This guide distils Anthropic's engineering articles — [Building Effective Agents](https://www.anthropic.com/engineering/building-effective-agents) and [Effective Context Engineering for AI Agents](https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents) — into actionable rules for designing `.agent.md` files in this repository. + +--- + +## Table of Contents + +1. [Three core principles](#1-three-core-principles) +2. [Recommended file structure](#2-recommended-file-structure) +3. [Planning transparency](#3-planning-transparency) +4. [Tool list and tool guidance](#4-tool-list-and-tool-guidance) +5. [Inline examples](#5-inline-examples) +6. [Context management — just-in-time loading](#6-context-management--just-in-time-loading) +7. [Session state and note-taking](#7-session-state-and-note-taking) +8. [Stopping conditions](#8-stopping-conditions) +9. [Handoff contracts](#9-handoff-contracts) +10. [Repo conventions every agent must follow](#10-repo-conventions-every-agent-must-follow) + +--- + +## 1. Three core principles + +Anthropic's three design principles for agents, translated to this repo: + +| Principle | What it means here | +|---|---| +| **Simplicity** | One agent = one clear concern. Never give an agent scope that belongs to a cooperating agent. Mode dispatch loads one skill at a time. | +| **Transparency** | The agent must narrate its plan before executing a multi-step task. See [§3](#3-planning-transparency). | +| **ACI — Agent-Computer Interface** | Every tool the agent can call must be understood from the agent body alone. See [§4](#4-tool-list-and-tool-guidance). | + +--- + +## 2. Recommended file structure + +Organise every `.agent.md` body in this order. Use `##` Markdown headers for each section. 
+ +``` +description: (YAML frontmatter) +tools: (YAML frontmatter) + +# @agent-name ← one-line purpose + relationship to cooperating agents + +## Initialisation ← storage/seed setup; runs only when starting fresh +## Session State ← note-taking schema; required if the agent runs multi-step tasks +## Mode Dispatch ← routing table: intent → skill + scope (if agent is multi-modal) +## Scope ← what this agent does +## Does NOT ← explicit out-of-scope items with named agent responsible +## Tool Guidance ← per-tool notes (usage, edge cases, common mistakes) +## Examples ← 1–2 canonical inline few-shot examples +## [Domain conventions] ← reference data kept inline (small, stable, always needed) +## Skills ← table with path and "When to load" column +## Operating rules ← decision rules; use sub-headers, not a flat bullet list +## File editing protocol ← CLI constraint protocol (if agent runs in CLI context) +## Handoff ← inbound and outbound structured payloads +``` + +**Altitude rule:** Instructions should sit in the Goldilocks zone — specific enough to guide behaviour, flexible enough to avoid brittle if-else hardcoding. Avoid encoding single-valued rules (e.g. `always use field X = "Y"`) in the agent body when they belong in the skill or the Storage Profile. + +--- + +## 3. Planning transparency + +Every agent **must** instruct itself to narrate its plan before executing any multi-step task: + +```markdown +**Before executing any multi-step task:** State your plan in one sentence — name the +mode or skill you will use and your first concrete action. Then proceed. +``` + +This satisfies Anthropic's second principle ("prioritise transparency by explicitly showing planning steps") and helps users understand and correct the agent's interpretation before it acts. + +--- + +## 4. Tool list and tool guidance + +### `tools:` frontmatter + +- List **individual tools** — not group aliases like `vscode` or `browser`. 
+- Only include tools the agent actually needs for its stated scope. +- An agent that explicitly states "Does NOT crawl web apps" must not list `browser/clickElement`, `browser/typeInPage`, etc. + +### `## Tool Guidance` body section + +Add a table with one row per key tool: + +```markdown +| Tool | When to use | Key guidance | +|---|---|---| +| `read/readFile` | Load entity files before updating | Always read before writing — never assume current values. | +| `edit/editFiles` | Patch existing files | Read the full target block first. Show OLD vs NEW for ACTIVE entity changes. | +| `search/codebase` | Confirm code deletion before deprecating | Require negative result for at least two identifiers before assuming deleted. | +``` + +**Rule:** If a human engineer on your team couldn't immediately tell which tool to use in a given situation, the agent can't either. Add guidance until the choice is unambiguous. + +--- + +## 5. Inline examples + +Include **1–2 canonical few-shot examples** directly in the agent body. Examples are the most token-efficient way to convey expected output format and planning behaviour. + +Format: + +```markdown +## Examples + +**Example 1 — ** + +> User: + +Agent plan: + +_(Brief description of what happens next.)_ + +Expected output: +\``` + +\``` +``` + +**Rules:** +- Cover the most common trigger case and one error/edge case. +- Examples must use real entity ID patterns (`US-007`, `FEAT-003`, `AC:US-007-01`). +- Step text in Gherkin examples must use domain language — no selectors, HTTP references, or database terms. +- Do not stuff every edge case in — 2 canonical examples beat 10 exhaustive ones. + +--- + +## 6. Context management — just-in-time loading + +**Skills:** Load one skill per session, only when the mode is confirmed. Never pre-load skills for modes that haven't been triggered. The `Mode Dispatch` table must show which skill maps to each intent. 
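
The just-in-time rule for skills can be pictured as a small lazy loader. This is only a sketch under stated assumptions: the mode labels `GAPS` and `SCENARIOS`, the `SkillLoader` class, and the `<skills_root>/<skill>/SKILL.md` layout are hypothetical illustrations, not part of the agent runtime. Only the skill names are taken from this repository.

```python
from pathlib import Path

# Hypothetical dispatch table: confirmed mode -> skill folder name.
# Skill names appear in this repo's docs; the mode labels are assumptions.
MODE_DISPATCH = {
    "REMOVE": "bdd-maintain",
    "GAPS": "living-doc-gap-finder",
    "SCENARIOS": "living-doc-scenario-creator",
}


class SkillLoader:
    """Reads a SKILL.md body only after a mode has been confirmed."""

    def __init__(self, skills_root: Path) -> None:
        self.skills_root = skills_root
        self.loaded: dict[str, str] = {}  # skill name -> body text

    def load_for_mode(self, mode: str) -> str:
        skill = MODE_DISPATCH.get(mode)
        if skill is None:
            raise KeyError(f"no skill mapped to mode {mode!r}")
        if skill not in self.loaded:
            # First use of this skill: read it from disk now.
            # Nothing is pre-loaded for modes that were never triggered.
            path = self.skills_root / skill / "SKILL.md"
            self.loaded[skill] = path.read_text(encoding="utf-8")
        return self.loaded[skill]
```

The point of the sketch is that the loader starts empty and grows only with modes that were actually requested, which is the behaviour the rule asks an agent body to describe in prose.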
+ +**Manifests and large files:** Always load with targeted line ranges for the route(s) in scope. Load the full file only for full re-scan operations. + +**Reference docs:** Do not inline content that is available in a referenced file, unless it is small (< 30 lines) and needed across all modes. Use `[Load only if …]` annotations in the Skills table. + +**Gap Finder modes:** Mode detail (what HEALING does, what PLAN does) belongs in the `living-doc-gap-finder` skill — not duplicated in the agent body. The agent body should name the mode and point to the skill. + +--- + +## 7. Session state and note-taking + +Any agent that runs multi-step tasks spanning many tool calls **must** define a session state file. This prevents context rot and enables resuming interrupted sessions. + +**Minimum schema:** + +```markdown +# Session State +_Auto-managed. Delete when session complete._ + +## Goal + + +## Progress + + +## Decisions & Findings + +``` + +**Rules:** +- Store at `.copilot//.session-state.md` (dot-prefix; add to `.gitignore`). +- Update after every item completes. +- Append to `Decisions & Findings` for non-obvious discoveries only. +- Never store large data objects here — those belong in the artifact file (e.g. `manifest.json`). +- Delete the file when the session goal is fully achieved. + +**Compaction trigger:** When context is nearing capacity, write a compaction summary (all unresolved items + key findings) to `Decisions & Findings`, then ask the user to start a new session and resume from the state file. + +--- + +## 8. Stopping conditions + +Every agent must define explicit escalation rules. At minimum include: + +```markdown +**Stopping conditions — escalate to user when:** +- +- +- Context is nearing capacity — write compaction summary to session state, then ask the user to resume in a new session. +- More than 50 tool calls have been made without completing the session goal — pause, summarise progress, and ask how to proceed. 
+``` + +**Why the 50-call limit matters:** Anthropic recommends max iteration caps for autonomous agents. Without a limit, compounding errors can cause an agent to execute dozens of irreversible actions before the user can intervene. + +--- + +## 9. Handoff contracts + +Agent-to-agent handoffs must use **structured payloads**, not free-form prose. Both sides (outbound and inbound) must match. + +```markdown +## Handoff + +**Outbound to @other-agent:** +\``` +Key: value +Key: value +\``` + +**Inbound from @other-agent:** +\``` +Key: value +\``` +``` + +**Rules:** +- Payloads must include entity IDs, state, version, and file paths where relevant. +- Never summarise loosely — use the exact payload format. +- If the target agent is not yet deployed, document with a `TODO: @agent-name` comment rather than omitting the handoff. + +--- + +## 10. Repo conventions every agent must follow + +### AC state vocabulary + +All agents in this repository use the same four AC states. Never introduce alternative spellings. + +| State | Meaning | +|---|---| +| `PLANNED` | Drafted; no implementation yet | +| `IN_REVIEW` | Implementation underway or in PR | +| `ACTIVE` | Implemented and verified | +| `DEPRECATED` | Superseded or deleted; requires `deprecated_at` and `deprecation_reason` | + +### Entity ID format + +`US-` · `FEAT-` · `FUNC-` · `AC:-` + +IDs are stable — never change an ID after creation. Bump the `version` field for changes. + +### Gherkin traceability tag format + +```gherkin +# AC:US-007-01 (v1.0.0 - ACTIVE) — +@AC:US-007-01 +Scenario: ... +``` + +One `# AC:` + `@AC:` pair per AC. The `@AC:` tag is the machine-readable traceability anchor — never delete or rename it without syncing the catalog entity. + +### Cooperating agent boundary + +| Layer | Owner | +|---|---| +| Catalog (entities, ACs, traceability links) | `@living-doc-bdd-copilot` | +| Automation (PageObjects, step definitions, feature files) | `@living-doc-bdd-copilot` | + +Never cross this boundary. 
When a task belongs to the other agent, hand off using the structured payload — do not attempt the task yourself. diff --git a/docs/guides/living-doc-bdd-copilot.md b/docs/guides/living-doc-bdd-copilot.md new file mode 100644 index 0000000..ca16e9c --- /dev/null +++ b/docs/guides/living-doc-bdd-copilot.md @@ -0,0 +1,169 @@ +# Living Doc BDD Copilot Agent + +`@living-doc-bdd-copilot` is the automation layer agent. It explores web applications, generates PageObjects, produces Gherkin scenarios and step definitions, and maintains the BDD automation suite across the full engineering pipeline. + +--- + +## What it does + +| Task | When to use | +|---|---| +| Explore a web app | Crawl and map UI surfaces; discover Features from the live application | +| Generate PageObjects | Create or update PageObject classes from discovered UI surfaces | +| Generate Gherkin scenarios | Cover User Story ACs with `.feature` files and linked step definitions | +| Sync Gherkin with living doc | Ensure traceability tags in feature files match catalog ACs | +| Heal automation after UI changes | Fix broken selectors, step definitions, and PageObjects (failing tests only) | +| Re-scan after refactor | Full re-crawl of all manifest paths plus active discovery of new routes; update scenarios | +| Remove deprecated feature automation | Clean up `.feature` files, steps, and PageObjects for removed features | +| Generate tutorial documents | Transform executed BDD scenarios into annotated walkthrough documents | + +--- + +## How to trigger it + +``` +scan webapp +generate pageobjects for the login screen +explore the app at https://... 
+generate scenarios for US-42 +heal pageobjects +sync gherkin to living doc +crawl the UI +living doc bdd copilot +BDD pipeline +create page objects +generate feature file from user story +``` + +--- + +## Before you start — setup files + +The agent uses two persistent files: + +| File | Purpose | +|---|---| +| `seed.yaml` | Business Seed — base URL, credentials (env refs), known routes, guided traversal steps | +| `manifest.json` | Exploration Manifest — all discovered surfaces with Feature name, URL, component IDs, and PageObject path | + +These files can live anywhere in the repository. On each session start, the agent searches for them automatically: + +1. Searches for `seed.yaml` containing a `base_url:` key. +2. Searches for `manifest.json` containing an array with `pageobject_path` entries. +3. If found, loads both files and resumes from the last known state — no re-crawl needed. +4. If not found, creates them at a sensible location (alongside your existing living doc catalog directory, or `.copilot/bdd/` if no catalog is present). + +**On first discovery**, the agent will propose adding the file paths to `.github/copilot-instructions.md` so every future session can load them without searching: + +```markdown +## BDD Artifacts +- **Business Seed:** `/seed.yaml` +- **Exploration Manifest:** `/manifest.json` +``` + +**Credential safety:** Credentials in `seed.yaml` must always use `env:VAR_NAME` — never literal values. 
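
A consumer of `seed.yaml` could enforce the credential rule along these lines. This is a minimal sketch, not the agent's actual implementation: `resolve_credentials` and `ENV_PREFIX` are hypothetical names, and the input is assumed to be the already-parsed `credentials:` mapping from the seed file.

```python
import os

ENV_PREFIX = "env:"  # reference form required by this guide


def resolve_credentials(credentials: dict[str, str], environ=None) -> dict[str, str]:
    """Resolve env:VAR_NAME references and reject anything that is not one."""
    environ = os.environ if environ is None else environ
    resolved = {}
    for key, ref in credentials.items():
        if not ref.startswith(ENV_PREFIX):
            # Do not echo the offending value: it may be a real secret.
            raise ValueError(f"credential {key!r} must use the env:VAR_NAME form")
        var_name = ref[len(ENV_PREFIX):]
        if var_name not in environ:
            raise KeyError(f"environment variable {var_name} is not set")
        resolved[key] = environ[var_name]
    return resolved
```

Failing fast on a value without the `env:` prefix, and keeping that value out of the error message, means a literal password committed by mistake is flagged without being copied into logs.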
+ +```yaml +base_url: https://your-app.example.com +credentials: + username: env:BDD_USERNAME + password: env:BDD_PASSWORD +known_routes: + - path: /login + feature: Authentication + - path: /dashboard + feature: Dashboard +guided_steps: [] # populated during guided traversal +``` + +--- + +## Pipeline + +### Business Seed assembly + +Collects sources A–E to build `seed.yaml`: + +| Source | What the agent collects | +|---|---| +| A — Living doc catalog | Feature names, US titles, AC texts, route mappings | +| B — Sitemap / route config | URL paths from Angular router, React Router, or `sitemap.xml` | +| C — OpenAPI / Swagger spec | REST endpoint paths, mapped to UI screens where obvious | +| D — Existing PageObjects | Already-discovered surfaces from a previous manifest run | +| E — Guided traversal | Steps recorded live as the agent pauses to ask the user at decision points | + +### Iterative exploration + +The agent navigates the live application via MCP Playwright, snapshots pages, identifies UI surfaces, and builds `manifest.json`. Exploration continues until a coverage plateau — no new surfaces in the last full iteration. + +If the agent hits an auth wall, multi-step wizard, CAPTCHA, or a form it cannot progress due to missing business knowledge (unknown valid input values, required lookup codes, business-specific field formats): + +- It takes a screenshot and describes what it sees. +- It asks you what to do next. +- CAPTCHA: it pauses and waits for you to solve it manually in the browser. +- All guided steps are recorded in `seed.yaml` under `guided_steps:` for future re-runs. + +### Scenario generation + +After exploration: + +1. Uses `living-doc-gap-finder` (bottom-up mode) to find `ACTIVE` ACs with no linked Gherkin scenario. +2. Generates `.feature` files with `Given/When/Then` scenarios — one scenario per AC, each with a `# AC:` traceability tag. +3. 
For each new step, checks for an existing reusable definition: first narrows scope to the relevant PageObject, then confirms the step's purpose matches (not just its text pattern). Reuses if it matches; writes a new stub only if no match exists. +4. Extends the relevant PageObject with any new UI interactions required by the new stubs. + +### Maintenance + +| Mode | When | What the agent does | +|---|---|---| +| **RE-SCAN** | New feature shipped or UI refactored | Full re-crawl of every manifest path plus active discovery of new routes (links, buttons, tabs, wizard steps); updates manifest; generates new scenarios for new ACs | +| **HEALING** | Tests failing due to selector drift | Scoped to failing tests only — navigates affected pages; identifies updated selectors; repairs PageObjects and step bindings; re-runs only the previously failing tests to confirm | +| **REMOVE** | Feature deprecated or deleted | Identifies linked `.feature` files, steps, and PageObjects; confirms before deleting; loads `living-doc-update` to complete catalog deprecation | + +--- + +## Shared skill — `living-doc-gap-finder` + +`living-doc-gap-finder` is used in two directions within the same agent: + +| Direction | What it finds | +|---|---| +| **Top-down** (catalog operations) | Missing documentation entities (Features, User Stories, Functionalities not yet in the catalog) | +| **Bottom-up** (automation operations) | ACs that exist in the catalog but have no linked Gherkin scenario | + +--- + +## Skills used + +| Skill | Purpose | +|---|---| +| `living-doc-pageobject-scan` | Discover, create, and maintain PageObject classes; Business Seed assembly and webapp crawl | +| `living-doc-scenario-creator` | Generate full Gherkin feature files (header + scenarios + step bodies) from ACs | +| `living-doc-gap-finder` | Find catalog gaps (top-down) and ACs with no linked scenario (bottom-up) | +| `gherkin-step` | Implement step definitions | +| `gherkin-living-doc-sync` | Sync feature files and 
scenarios with living doc traceability links | +| `data-cy-instrument` | Resolve missing `data-cy` attributes end-to-end | +| `bdd-maintain` | RE-SCAN, HEALING, REMOVE modes | + +--- + +## Handoff + +No cross-agent handoffs needed. This agent owns both catalog and automation layers. + +For concerns outside this agent's scope: + +| Concern | Owner | +|---|---| +| Unit and integration tests | `@sdet-copilot` | +| CI quality gates and linting | `@quality-gate-copilot` | + +--- + +## Installation + +```bash +npx skills add https://github.com/AbsaOSS/agentic-toolkit -g +``` + +See [Getting Started](../getting-started.md) for the full install guide. diff --git a/docs/guides/living-doc-copilot.md b/docs/guides/living-doc-copilot.md new file mode 100644 index 0000000..d0a0841 --- /dev/null +++ b/docs/guides/living-doc-copilot.md @@ -0,0 +1,8 @@ +# Living Doc Copilot Agent + +> **This agent has been merged into `@living-doc-bdd-copilot`.** +> See [living-doc-bdd-copilot.md](./living-doc-bdd-copilot.md) for the unified agent guide. + +Catalog management (User Stories, Features, Functionalities, AC updates, impact analysis, gap finding) and BDD automation are now owned by a single agent: **`@living-doc-bdd-copilot`**. + +Use `@living-doc-bdd-copilot` for all requests previously directed to `@living-doc-copilot`. diff --git a/docs/guides/token-saving.md b/docs/guides/token-saving.md new file mode 100644 index 0000000..2617fe0 --- /dev/null +++ b/docs/guides/token-saving.md @@ -0,0 +1,71 @@ +# Token-Saving Skill + +The `token-saving` skill enforces concise, low-noise AI responses. It is **always active** — it applies to every reply without needing to be triggered by a specific phrase. + +--- + +## What it changes + +Without the skill, AI assistants commonly pad responses with filler openers, closing platitudes, and repeated context. This skill removes that noise. 
+ +| Behaviour | Without skill | With skill | +|-----------|--------------|------------| +| Response opener | "Great question! Certainly, I'd be happy to help..." | Directly answers | +| Closing | "Let me know if you have any questions!" | Stops when done | +| Repeated context | Restates what you just said | Skips it | +| Factual answers | Unlimited prose | ≤ 5 lines | +| Action lists | Unlimited bullets | Capped at 4 | +| Full file dumps | Common | Only when you ask | + +--- + +## Code output footer + +Every response that writes or changes code ends with exactly this footer — no more, no less: + +``` +**What changed:** +**Why:** +**How to verify:** +``` + +This applies to: new functions, patches, inline diffs, config snippets, or any code block representing a change. + +It does **not** apply to: Q&A, reviews, planning, comparisons, or conceptual explanations. + +--- + +## Overriding the skill + +The brevity rules suspend for a single response when you explicitly ask for depth: + +> "Give me a full explanation." +> "Deep dive into this." +> "Don't hold back." +> "Complete explanation." + +The next response is fully unrestricted. Rules resume after that. + +--- + +## Precedence + +If another active skill specifies its own output format (e.g. a review skill with a Blocker / Important / Nit structure), that format takes precedence over this skill's rules. + +--- + +## Installation + +The skill is installed along with the rest of the toolkit: + +```bash +npx skills add https://github.com/AbsaOSS/agentic-toolkit -g +``` + +To install only this skill: + +```bash +npx skills add https://github.com/AbsaOSS/agentic-toolkit -g --skill token-saving +``` + +See [Getting Started](../getting-started.md) for the full install guide. 
diff --git a/docs/testing/agent-testing.md b/docs/testing/agent-testing.md new file mode 100644 index 0000000..92bb95c --- /dev/null +++ b/docs/testing/agent-testing.md @@ -0,0 +1,247 @@ +# Agent Testing Guide + +This document describes how to test, evaluate, and tune `.agent.md` files — specifically how to use `skill-creator`'s eval methodology (for description trigger accuracy). This is the practical equivalent of [skill-testing.md](./skill-testing.md) applied to agents. + +--- + +## Why agent testing is different from skill testing + +| Dimension | Skill | Agent | +|---|---|---| +| Trigger mechanism | `description:` field in SKILL.md YAML | `description:` field in `.agent.md` YAML | +| Body loaded when? | When skill is activated by description match | When user addresses `@agent-name` or description matches | +| What to tune | Description trigger keywords + body instructions | Description trigger keywords + body sections (scope, handoff, maintenance modes) | +| Tool for eval loop | `skill-creator` (fully supported) | `skill-creator` (description eval loop applies directly) | + +The key insight: an agent's `description:` block is read by the same matching mechanism as a skill's `description:`. Everything `skill-creator` does to optimize skill descriptions applies 1-for-1 to agent descriptions. + +--- + +## 1. Recommended workflow + +1. Create trigger eval cases in `.github/agents/evals//trigger-eval.json` +2. Create body eval cases in `.github/agents/evals//evals.json` +3. Start a Copilot Chat session from the repository root +4. Ask Copilot to use the `skill-creator` skill, pointing it at the agent's eval files +5. Review trigger accuracy and output quality +6. Edit structural sections directly in the `.agent.md` file (tools list, scope, handoff, modes) +7. Re-run evals; repeat until stable + +--- + +## 2. 
File layout + +``` +.github/ + agents/ + my-agent.agent.md ← agent definition + evals/ + my-agent/ + trigger-eval.json ← which prompts should (and should not) invoke the agent + evals.json ← body behavior tests + files/ ← fixture files referenced by evals +``` + +--- + +## 3. Trigger eval format + +Store at `.github/agents/evals//trigger-eval.json` as a **flat JSON array** (no wrapper object): + +```json +[ + { + "id": 1, + "query": "Scan the webapp at https://app.example.com and generate PageObjects", + "should_trigger": true, + "reason": "'scan webapp' + 'generate pageobjects' core phrase" + }, + { + "id": 2, + "query": "Explore the app and map all the UI surfaces", + "should_trigger": true, + "reason": "'explore the app' maps to crawl/explore mode" + }, + { + "id": 3, + "query": "Create a User Story for the loyalty points redemption feature", + "should_trigger": true, + "reason": "Catalog entity creation — living-doc layer" + }, + { + "id": 4, + "query": "Write a unit test for the login validator", + "should_trigger": false, + "reason": "Unit test authoring — out of scope" + }, + { + "id": 5, + "query": "Debug the null pointer exception in PaymentService.processOrder()", + "should_trigger": false, + "reason": "Application debugging — outside scope" + } +] +``` + +Note: the field is `query` (not `prompt`). The `reason` field is for human documentation only — it is not used by the eval runner. + +Write at least **5 should-trigger** and **5 should-not-trigger** cases. Should-not-trigger cases are as important as the positive ones — they catch over-broad descriptions that shadow other agents. + +--- + +## 4. Body eval format + +Store at `.github/agents/evals//evals.json`. Same schema as skill evals: + +```json +{ + "agent_name": "my-agent", + "evals": [ + { + "id": "business-seed-assembly", + "prompt": "I want to set up BDD automation for our app at https://app.example.com. 
The Angular router is at src/app/app-routing.module.ts.", + "expected_output": "Agent assembles seed.yaml from the router file, proposes base_url, lists known_routes, confirms credential env var names before crawling.", + "files": ["src/app/app-routing.module.ts"] + }, + { + "id": "re-scan-stale-locator", + "prompt": "RE-SCAN — the checkout page was redesigned.", + "expected_output": "Agent loads manifest.json, navigates to /checkout, validates component_id locators, flags stale ones, updates PageObject selectors. Does NOT touch unrelated pages.", + "files": ["manifest.json"] + }, + { + "id": "healing-scope", + "prompt": "HEALING — these 3 scenarios are failing: LoginPage submit, CheckoutPage confirm, DashboardPage filter.", + "expected_output": "Agent scopes work to those 3 failing tests only. Does not re-run or touch passing tests.", + "files": [] + } + ] +} +``` + +--- + +## 5. Running the eval loop + +Point `skill-creator` at the agent files — it treats the `description:` block the same way it treats a skill description. + +### Trigger accuracy + +``` +Use the skill-creator skill to optimize the description for .github/agents/my-agent.agent.md +using the trigger evals at .github/agents/evals/my-agent/trigger-eval.json. +Constraints: ≤ 1024 chars; structured domain nouns/verbs; include a NOT for: boundary clause. +Report precision and recall scores for each candidate. Repeat until all trigger evals pass. +``` + +`skill-creator` will propose candidate descriptions, score them against the eval set, and iterate. + +### Body quality + +``` +Use the skill-creator skill to run the body evals for .github/agents/my-agent.agent.md +using .github/agents/evals/my-agent/evals.json. +Verify: (1) all body-referenced tools are present in the frontmatter tools: list, +(2) mode dispatch routes to the correct skill for each intent, +(3) scope boundaries match ## Scope and ## Does NOT, (4) handoff targets are correct. 
+Only fix scope, tool, or handoff issues — do not rewrite unless fundamentally mis-scoped. +Repeat until all evals pass. +``` + +Use the same with-skill / baseline comparison flow described in [skill-testing.md](./skill-testing.md). + +--- + +## 6. Structural edits + +When body evals reveal a section is wrong (wrong scope, missing tool, bad handoff), edit the `.agent.md` file directly: + +- **Missing tool** — add the tool name to the `tools:` list in the YAML frontmatter +- **Wrong scope boundary** — update the relevant section (`## Scope`, `## Does NOT`, or the specific mode block) +- **Broken handoff** — update the `## Handoff` section with the correct target agent and conditions + +--- + +## 7. What to tune — agent-specific checklist + +Beyond the standard skill tuning checklist, also verify: + +| Check | Good signal | Bad signal | +|---|---|---| +| **Trigger precision** | Agent fires only for its domain | Fires for requests that belong to another agent | +| **Trigger recall** | All domain phrases trigger it | Mis-fires to default agent for known phrases | +| **Scope boundaries** | Refuses work outside its Does-NOT list | Silently attempts work outside its scope | +| **Mode activation** | RE-SCAN / HEALING / REMOVE activate on correct triggers | Wrong mode fires, or modes don't activate | +| **Handoff clarity** | Outputs correct hand-off message to the right agent | Hands off to wrong agent or swallows the work | +| **Tool completeness** | All tools needed by the body are in the frontmatter `tools:` list | Body references a tool not in `tools:` — it will be unavailable | + +--- + +## 8. Description anti-patterns + +These are the most common description problems observed in agent files: + +**Over-broad description** — causes the agent to shadow other agents: +```yaml +# BAD — fires on almost everything +description: > + Helps with testing, documentation, and web apps. 
+``` + +**Under-specified triggers** — causes the agent to miss its domain: +```yaml +# BAD — won't fire on "crawl the UI" or "playwright scan" +description: > + Generates BDD tests. +``` + +**Good pattern** — minimalist semantic description with a `NOT for:` boundary: +```yaml +description: > + Living documentation catalog (User Stories, Features, Functionalities, ACs, impact + analysis, gap finding) and BDD automation (Playwright crawl/explore/scan, PageObjects + create/heal, Gherkin scenarios/feature files/step definitions, living-doc sync, + scenario coverage). Setup: seed.yaml → manifest.json, credential checks, guided + traversal. NOT for: unit tests, production code, API specs, CI/CD, debugging, + performance, security. +``` + +The `NOT for:` clause is as important as the positive terms — it prevents the agent from firing on adjacent-but-out-of-scope requests. An explicit `Triggers:` keyword list is not required; structured domain nouns and verbs are sufficient for the matching mechanism to work. + +--- + +## 9. Regression-first loop + +Same as skill testing — run the full trigger-eval set, fix the largest failure cluster, re-run: + +1. Run full trigger-eval and body-eval sets; save baseline scores +2. Identify largest failure cluster (e.g. 4 should-not-trigger cases fire) +3. Make one description change +4. Re-run trigger-eval only +5. Review delta +6. Run full suite +7. Keep or revert +8. Repeat until all trigger evals pass and body eval delta is positive or neutral + +--- + +## 10. Minimal session + +``` +VS Code Copilot Chat (or gh copilot): +→ "Use the skill-creator skill to test the agent at .github/agents/my-agent.agent.md + using the evals at .github/agents/evals/my-agent/. + Report trigger precision/recall and body eval pass rate." 
+→ inspect trigger accuracy report and body output diffs; + classify each change as improvement, regression, or neutral +→ edit the `.agent.md` file directly to fix structural issues + (scope, tools: list, mode dispatch, handoff) +→ "Use the skill-creator skill to optimize the description for .github/agents/my-agent.agent.md + using .github/agents/evals/my-agent/trigger-eval.json. + Keep ≤ 1024 chars; include a NOT for: boundary clause. Repeat until all evals pass." +→ re-run full eval suite; keep or revert each change; repeat until stable +``` + +--- + +For the full eval methodology (subagent spawning, benchmark aggregation, the viewer), see [skill-testing.md](./skill-testing.md) — the process is identical once the eval files are in place. diff --git a/docs/skill-testing.md b/docs/testing/skill-testing.md similarity index 76% rename from docs/skill-testing.md rename to docs/testing/skill-testing.md index 49aee05..a6d76f6 100644 --- a/docs/skill-testing.md +++ b/docs/testing/skill-testing.md @@ -7,7 +7,7 @@ This document provides a comprehensive methodology for testing, evaluating, and ## 1. Recommended workflow 1. Create eval cases in `skills//evals/evals.json` -2. Add fixture files under `skills//evals/fixtures/` when prompts depend on local files +2. Add fixture files under `skills//evals/files/` when prompts depend on local files 3. Start a Copilot CLI session from the repository root 4. Ask Copilot to use the `skill-creator` skill to test the target skill 5. Review outputs and diffs @@ -45,6 +45,8 @@ Ask for a side-by-side comparison in the Copilot CLI session: ``` Use the skill-creator skill to compare outputs for skills/my-skill with and without the skill enabled. +Compare on: correctness, structure adherence, completeness, and output verbosity. +Only fix the smallest part of the skill that explains the largest failure cluster. Repeat until all evals pass. ``` Compare correctness, completeness, structure, latency, verbosity, and formatting stability. 
@@ -78,9 +80,14 @@ When an eval fails, update the smallest possible part of the skill and re-run th Ask Copilot to optimize the description against your trigger eval set: ``` -Use the skill-creator skill to optimize the description for skills/my-skill using skills/my-skill/evals/trigger-eval.json. +Use the skill-creator skill to optimize the description for skills/my-skill +using skills/my-skill/evals/trigger-eval.json. +Keep minimalist (≤ 1024 chars); structured domain nouns/verbs preferred over explicit keyword lists. +Include a NOT for: boundary clause. Report precision and recall per candidate. Repeat until all evals pass. ``` +A good description uses structured domain nouns and a `NOT for:` boundary. An explicit keyword list is not required. + ## 10. What “good enough” looks like - All smoke tests and known regressions passing @@ -101,12 +108,15 @@ Use the skill-creator skill to optimize the description for skills/my-skill usin ## 12. Minimal CLI Loop ``` -gh copilot -→ "Use the skill-creator skill to test my skill at skills/my-skill" -→ inspect results and diffs -→ edit SKILL.md or fixtures -→ "Use the skill-creator skill to rerun the evals for skills/my-skill" -→ optimize description if needed +VS Code Copilot Chat (or gh copilot): +→ "Use the skill-creator skill to test my skill at skills/my-skill. + Run all evals and report pass rate and baseline delta." +→ inspect results and diffs — classify each change as improvement, regression, or neutral +→ edit SKILL.md or fixtures (smallest change that fixes the largest failure cluster) +→ "Use the skill-creator skill to rerun the evals for skills/my-skill." +→ "Use the skill-creator skill to optimize the description for skills/my-skill + using skills/my-skill/evals/trigger-eval.json. + Keep ≤ 1024 chars; include a NOT for: boundary clause. Repeat until all evals pass." 
→ repeat until stable ``` diff --git a/skills/bdd-maintain/SKILL.md b/skills/bdd-maintain/SKILL.md new file mode 100644 index 0000000..437a8d8 --- /dev/null +++ b/skills/bdd-maintain/SKILL.md @@ -0,0 +1,131 @@ +--- +name: bdd-maintain +description: > + Lifecycle cleanup for BDD automation artifacts. REMOVE: delete feature files, step + definitions, and PageObjects linked to a deprecated entity. DEAD CODE AUDIT: find + unused step definitions, PageObject methods, and PO components via three Python scripts. + Third step in the entity-deprecation chain — after living-doc-update and gherkin-living-doc-sync. + Triggers on: "remove feature", "deprecate bdd", "delete feature files", "bdd cleanup", + "remove pageobject", "unused steps", "dead pageobject methods", "find unused steps", + "dead code audit", "unused po methods", "dead po components", "bdd-maintain". + Does NOT trigger for: re-scanning manifest after UI changes (use living-doc-pageobject-scan + RE-SCAN); healing selector drift (use living-doc-pageobject-scan HEALING); syncing @AC: + traceability tags (use gherkin-living-doc-sync). + Pairs with living-doc-update (upstream — deprecate entity first) and + gherkin-living-doc-sync (upstream — tag scenarios first). +license: Apache-2.0 +compatibility: GitHub Copilot +--- + +# BDD Maintenance + +> **Glossary:** Feature, Functionality, User Story — see [living-doc-glossary](../references/living-doc-glossary.md) ([remote](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md)). +> **BDD schemas:** manifest.json schema (routes, elements, coverage_gaps, navigation_context) — see [living-doc-bdd-schemas](../references/living-doc-bdd-schemas.md) ([remote](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-bdd-schemas.md)). + +Two modes — activate the one that matches the trigger. + +--- + +## REMOVE mode + +**Trigger:** Feature deprecated or deleted from the product. 
+ +**Prerequisite:** `living-doc-update` must have already deprecated the entity and `gherkin-living-doc-sync` must have already tagged linked scenarios with `@deprecated` and `@review-needed`. Run those two skills first if they have not yet run — removing files before scenarios are tagged silently breaks traceability. + +**Scope:** Only files linked to the removed entity — do not touch other Features, PageObjects, or step definitions. + +1. Identify the specific Feature/US/AC being removed. +2. Find all `.feature` files whose scenarios carry an `@AC:` tag matching the removed entity's IDs. +3. Find PageObjects referenced only by those scenarios; find step definitions used only by those scenarios. Also check `playwright/fixtures.ts` (or the project's fixture file) for fixture registrations that import the PageObjects being removed — those imports and constructor parameters must be removed too. +4. Confirm the full deletion list with the user before touching any file. +5. Remove confirmed files; remove the deprecated entry from `manifest.json`. Do not restructure or regenerate the manifest — `living-doc-pageobject-scan` owns the manifest for all active entries. +6. If any child entities (linked User Stories, Functionalities) were not yet deprecated in the catalog, flag them and load `living-doc-update` to deprecate them now. + +--- + +## DEAD CODE AUDIT mode + +**Trigger:** Step definitions added but scenarios removed, PageObject redesigned, new PO classes created but not yet wired into steps. + +**Scope:** Full audit of `playwright/steps/`, `playwright/pages/`, and `playwright/features/` for dead code. + +Three standalone Python scripts live in `scripts`: + +### 1 · `find_unused_steps.py` — step definitions with no feature coverage + +Parses all `*.steps.ts` files for `Given(…)`, `When(…)`, `Then(…)` pattern strings, then scans every `.feature` file for matching step usages (Cucumber expression placeholders resolved to regex wildcards). 
Reports any step definition that is never exercised.
+
+```bash
+# Run from aul-ui/
+python playwright/scripts/find_unused_steps.py \
+  --steps-dir playwright/steps \
+  --features-dir playwright/features
+```
+
+### 2 · `find_unused_po_methods.py` — PageObject methods never called from step files
+
+Parses every `playwright/pages/*.ts` for public method declarations (`async name(` / `name(`), then scans all step files for `.name(` call sites. Reports methods that are defined but never invoked from any step.
+
+```bash
+python playwright/scripts/find_unused_po_methods.py \
+  --pages-dir playwright/pages \
+  --steps-dir playwright/steps
+```
+
+### 3 · `find_unused_po_components.py` — PageObject classes not imported anywhere
+
+Scans all exported `class` names from `playwright/pages/*.ts`, then checks every `*.steps.ts` and `fixtures.ts` for import statements. Reports classes that are defined but never imported.
+
+```bash
+python playwright/scripts/find_unused_po_components.py \
+  --pages-dir playwright/pages \
+  --steps-dir playwright/steps
+```
+
+### When to run
+
+| Trigger | Script(s) to run |
+|---------|-----------------|
+| Step definition added or removed | `find_unused_steps.py` |
+| PageObject method added, renamed, or deleted | `find_unused_po_methods.py` |
+| New PageObject class created | `find_unused_po_components.py` |
+| Before any REMOVE operation | All three |
+| CI / pre-merge gate | All three (each exits 1 on findings) |
+
+### Handling findings
+
+- **Unused step def**: either add a scenario that exercises it, or delete the step definition.
+- **Unused PO method**: either write a step that calls it, or remove the method from the PageObject.
+- **Unused PO class**: either add an import and fixture entry, or remove the `.ts` file — after confirming nothing references it outside the test suite.
+
+All three scripts exit `0` on clean, `1` on findings, `2` on bad arguments — safe for CI gating.
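The exit-code contract above lends itself to a single pre-merge gate. What follows is a minimal sketch, not part of the skill itself: the wrapper names `run_script` and `run_audit` are hypothetical, and the script paths and flags are simply copied from the commands shown earlier. It runs the three scripts in order and returns the worst exit code it saw.

```python
from __future__ import annotations

import subprocess
import sys
from typing import Callable, Sequence

# Script paths and flags copied from the commands documented above.
AUDIT_COMMANDS: list[list[str]] = [
    ["playwright/scripts/find_unused_steps.py",
     "--steps-dir", "playwright/steps", "--features-dir", "playwright/features"],
    ["playwright/scripts/find_unused_po_methods.py",
     "--pages-dir", "playwright/pages", "--steps-dir", "playwright/steps"],
    ["playwright/scripts/find_unused_po_components.py",
     "--pages-dir", "playwright/pages", "--steps-dir", "playwright/steps"],
]


def run_script(command: Sequence[str]) -> int:
    """Run one audit script with the current interpreter and return its exit code."""
    return subprocess.run([sys.executable, *command], check=False).returncode


def run_audit(
    commands: Sequence[Sequence[str]] = AUDIT_COMMANDS,
    runner: Callable[[Sequence[str]], int] = run_script,  # hypothetical hook, eases testing
) -> int:
    """Run every script and return the worst code: 0 clean, 1 findings, 2 bad arguments."""
    worst = 0
    for command in commands:
        worst = max(worst, runner(command))
    return worst
```

A CI entry point could then call `sys.exit(run_audit())`. Every script runs even after an earlier one reports findings, so a single pipeline run surfaces all three reports.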
+
+### Handling findings — edge cases
+
+**Recommended script order for a full audit:** run in the sequence steps → PO methods → PO components. Deleting unused steps can expose unused PO methods; deleting unused PO methods can then expose unused PO classes. Running in this order ensures each pass builds on the previous one rather than missing transitively dead code.
+
+**Unused step def — distinguish before deleting:**
+- If the step belongs to a **deprecated entity** (the US/Feature has `status: deprecated` in the catalog), delete it — the coverage it provided is no longer needed.
+- If the step belongs to an **active entity** but has no exercising scenario, it is a stale draft or an orphan; flag it for team review before deleting. Someone may be about to add a scenario for it.
+- Never delete without first verifying the step is not imported or re-exported by another step file — grep for the step file name as an import target as well.
+
+**Script false positives — `find_unused_po_methods.py`:**
+The script uses static string matching. It can report a false positive when:
+- A method is called on a variable typed as a **base class or interface** rather than the concrete PageObject (e.g. `page.confirm_order()` where `page: BasePage`).
+- A method is invoked via a dynamic alias or through a test helper that re-exports it.
+If the reported method visibly exists in a step file call site, trust the call site over the script output and do not delete the method. File the false positive with the team so the script can be improved.
+
+**Shared steps between deprecated and active scenarios:**
+Before deleting a step definition, always check whether it is exercised by any scenario outside the deprecated entity. Run `find_unused_steps.py` after removing the deprecated scenarios — not before. If the script still reports the step as unused after the deprecated scenarios are removed, it is safe to delete. If it now shows as used (by a surviving scenario), keep it.
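The shared-step check can also be done by hand before anything is deleted. This is a minimal sketch under stated assumptions: `@AC:` tags sit on the line(s) directly above each `Scenario:`, and step text is compared literally, whereas the real `find_unused_steps.py` resolves Cucumber placeholders. The names `ac_ids_using_step` and `safe_to_delete` are hypothetical and exist only for illustration.

```python
from __future__ import annotations

import re

STEP_LINE_RE = re.compile(r"^\s*(?:Given|When|Then|And|But)\s+(.+)$")
AC_TAG_RE = re.compile(r"@AC:(\S+)")
UNTAGGED = "<untagged>"  # sentinel for steps in scenarios (or Background) without an @AC: tag


def ac_ids_using_step(feature_text: str, step_text: str) -> set[str]:
    """Return the @AC: IDs of every scenario that contains the exact step text."""
    users: set[str] = set()
    pending: list[str] = []  # tags seen since the last Feature/Scenario header
    current: list[str] = []  # tags of the scenario currently being read
    for line in feature_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("@"):
            pending.extend(AC_TAG_RE.findall(stripped))
        elif stripped.startswith("Feature:"):
            pending = []  # feature-level tags are ignored in this sketch
        elif stripped.startswith(("Scenario:", "Scenario Outline:")):
            current, pending = pending, []
        else:
            match = STEP_LINE_RE.match(line)
            if match and match.group(1).strip() == step_text:
                users.update(current or [UNTAGGED])
    return users


def safe_to_delete(feature_text: str, step_text: str, deprecated_prefix: str) -> bool:
    """True only if every scenario using the step belongs to the deprecated entity."""
    ids = ac_ids_using_step(feature_text, step_text)
    return all(ac_id.startswith(deprecated_prefix) for ac_id in ids)
```

Passing a prefix with a trailing hyphen (for example `US-042-`) avoids matching a longer ID such as `US-0421-01`. Untagged usages count as active, so the check errs on the side of keeping the step.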
+ +--- + +## Out-of-scope routing + +| Request | Correct skill | +|---|---| +| Re-scan manifest after UI changes | `living-doc-pageobject-scan` RE-SCAN scope | +| Fix failing tests due to selector drift | `living-doc-pageobject-scan` HEALING scope | +| Sync `@AC:` traceability tags | `gherkin-living-doc-sync` | +| Deprecate an entity in the catalog | `living-doc-update` | +| Tag deprecated scenarios before deletion | `gherkin-living-doc-sync` | diff --git a/skills/bdd-maintain/evals/evals.json b/skills/bdd-maintain/evals/evals.json new file mode 100644 index 0000000..1069806 --- /dev/null +++ b/skills/bdd-maintain/evals/evals.json @@ -0,0 +1,128 @@ +{ + "skill_name": "bdd-maintain", + "evals": [ + { + "id": 1, + "category": "happy-path", + "prompt": "US-042 was deprecated last sprint. Which BDD files need to be removed and in what order?", + "expected_output": "REMOVE mode: (1) Identify all .feature file scenarios tagged @AC:US-042-xx. (2) Remove those scenarios (or the entire feature file if all scenarios belong to US-042). (3) Remove all step definitions only used by those scenarios (run find_unused_steps.py to confirm). (4) Remove the PageObject if no other feature uses it (run find_unused_po_methods.py). (5) Remove fixture registrations in fixtures.ts that import the removed PageObject.", + "files": [], + "expectations": [ + "Lists removal order: scenarios → step defs → PageObject → fixtures.ts", + "Uses find_unused_steps.py to identify orphaned step defs", + "Checks fixtures.ts for PageObject imports", + "Does not remove shared artifacts still used by other features" + ] + }, + { + "id": 2, + "category": "happy-path", + "prompt": "Run find_unused_steps.py — it reported 12 unused step definitions. What should I do with them?", + "expected_output": "Review each unused step against the living doc: (1) If the step belongs to a deprecated entity — delete it. 
(2) If the step belongs to an active entity but has no scenario — it may be a stale draft; flag for team review before deleting. (3) Never delete without checking whether the step is imported or aliased from another step file.", + "files": [], + "expectations": [ + "Does not auto-delete without review", + "Distinguishes deprecated entity steps from active-entity orphans", + "Warns about aliased/imported steps" + ] + }, + { + "id": 3, + "category": "regression", + "prompt": "I removed the CheckoutPage PageObject but forgot to remove it from fixtures.ts. What are the consequences and how do I fix it?", + "expected_output": "fixtures.ts now has a broken import pointing to the deleted file, which causes a compile error in the test suite. Fix: (1) Open fixtures.ts. (2) Remove the import statement for CheckoutPage. (3) Remove the checkout parameter from the test fixture function signature and the constructor call. (4) Re-run the test suite to confirm no lingering references.", + "files": [], + "expectations": [ + "Identifies compile error as the consequence", + "Removes the import statement", + "Removes constructor parameter in fixture function", + "Verifies by re-running the test suite" + ] + }, + { + "id": 4, + "category": "happy-path", + "prompt": "What is the difference between find_unused_po_methods.py and find_unused_po_components.py?", + "expected_output": "find_unused_po_methods.py finds PageObject class methods (e.g. clickConfirmButton()) that are defined but never called by any step definition. find_unused_po_components.py finds entire PageObject classes or component groupings that have no references anywhere in the step files. 
Use methods script first for fine-grained cleanup; use components script to find entire dead PO files.", + "files": [], + "expectations": [ + "Correctly distinguishes method-level vs class/component-level scope", + "Recommends methods script for fine-grained, components for file-level", + "Both scripts referenced by name" + ] + }, + { + "id": 5, + "category": "regression", + "prompt": "find_unused_po_methods.py reports that CheckoutPage.clickConfirmButton() is unused, but I can see it's called in checkout.steps.ts. Why?", + "expected_output": "The script likely scans with a string match — if the method is called via a variable reference (e.g. `page.clickConfirmButton()` where `page` is typed as `BasePage`) or through an alias, the static scan may miss it. Review the call site in checkout.steps.ts and verify the import. If the call exists, the script has a false positive — do not delete the method. Report the false positive to the team.", + "files": [], + "expectations": [ + "Identifies dynamic dispatch / base-class alias as likely cause", + "Does not delete the method on a false positive", + "Recommends verifying the actual call site" + ] + }, + { + "id": 6, + "category": "negative", + "prompt": "The selectors in CheckoutPage are broken after the UI redesign. Should I use bdd-maintain to fix them?", + "expected_output": "No — broken selectors are a HEALING scope concern, not a BDD maintenance task. Load living-doc-pageobject-scan and use HEALING mode to re-scan the page and update selectors. bdd-maintain handles artifact lifecycle (deletion and dead code), not selector repair.", + "files": [], + "expectations": [ + "Routes to living-doc-pageobject-scan HEALING", + "Explains bdd-maintain does not own selector repair", + "Does not attempt to fix selectors" + ] + }, + { + "id": 7, + "category": "edge-case", + "prompt": "A step definition in checkout.steps.ts is called by two scenarios — one belongs to deprecated US-042 and one belongs to active US-007. 
Can I delete the step?", + "expected_output": "No — the step is still needed by the active US-007 scenario. Only the scenario linked to US-042 should be removed. The step definition itself must be retained. After removing the US-042 scenario, re-run find_unused_steps.py to confirm the step is still referenced.", + "files": [], + "expectations": [ + "Does not delete a step shared with an active scenario", + "Only removes the deprecated scenario", + "Re-runs script after cleanup to verify" + ] + }, + { + "id": 8, + "category": "happy-path", + "prompt": "In what order should I run the three audit scripts for a full dead code audit?", + "expected_output": "Recommended order: (1) find_unused_steps.py — identifies step definitions with no scenario callers. (2) find_unused_po_methods.py — identifies PO methods with no step callers. (3) find_unused_po_components.py — identifies entire PO classes/components with no references. Run in this order because deleting steps may free up PO methods, and deleting PO methods may free up entire PO components.", + "files": [], + "expectations": [ + "Lists the correct order: steps → PO methods → PO components", + "Explains the cascading dependency rationale", + "All three script names referenced correctly" + ] + }, + { + "id": 9, + "category": "regression", + "prompt": "The product owner wants to re-activate a User Story that was deprecated two sprints ago. Some BDD artifacts were already removed. How do I restore them?", + "expected_output": "bdd-maintain does not handle restoration — it handles removal and dead code auditing only. To restore: (1) Update the entity status in living-doc-update (set back to active). (2) Regenerate scenarios with living-doc-scenario-creator. (3) Regenerate PageObjects with living-doc-pageobject-scan if needed. 
(4) Re-run gherkin-living-doc-sync to re-link @AC: tags.", + "files": [], + "expectations": [ + "Routes to living-doc-update to change entity status", + "Routes to living-doc-scenario-creator for scenario regeneration", + "Routes to living-doc-pageobject-scan for PO regeneration", + "Routes to gherkin-living-doc-sync for @AC: re-linking" + ] + }, + { + "id": 10, + "category": "output-format", + "prompt": "Show me the expected output of find_unused_steps.py for a suite with 3 unused step definitions.", + "expected_output": "Output lists each unused step definition file path and function name, e.g.:\n UNUSED: checkout/checkout.steps.py::step_when_customer_applies_promo\n UNUSED: checkout/checkout.steps.py::step_then_total_unchanged\n UNUSED: login/login.steps.py::step_given_expired_session\nSummary line: '3 unused step definition(s) found.'", + "files": [], + "expectations": [ + "Shows file path + function name format", + "Shows summary count line", + "Does not show false positives for steps used by active scenarios" + ] + } + ] +} \ No newline at end of file diff --git a/skills/bdd-maintain/evals/trigger-eval.json b/skills/bdd-maintain/evals/trigger-eval.json new file mode 100644 index 0000000..295ec4f --- /dev/null +++ b/skills/bdd-maintain/evals/trigger-eval.json @@ -0,0 +1,122 @@ +[ + { + "id": 1, + "query": "Remove all BDD artifacts for the deprecated checkout feature", + "should_trigger": true, + "reason": "REMOVE mode — deleting BDD artifacts for a deprecated entity" + }, + { + "id": 2, + "query": "Delete the feature files and step definitions for US-007 which was deprecated", + "should_trigger": true, + "reason": "REMOVE mode — explicit BDD artifact deletion" + }, + { + "id": 3, + "query": "Run a dead code audit to find unused step definitions", + "should_trigger": true, + "reason": "DEAD CODE AUDIT mode — finding unused step definitions" + }, + { + "id": 4, + "query": "Find unused PageObject methods across the test suite", + "should_trigger": true, + 
"reason": "DEAD CODE AUDIT mode — finding unused PO methods" + }, + { + "id": 5, + "query": "Which PageObject components are never referenced by any step definition?", + "should_trigger": true, + "reason": "DEAD CODE AUDIT mode — finding dead PO components" + }, + { + "id": 6, + "query": "BDD cleanup after deprecating FEAT-legacy-payment-widget", + "should_trigger": true, + "reason": "REMOVE mode — bdd cleanup keyword" + }, + { + "id": 7, + "query": "Remove the PageObject for the old checkout wizard screen", + "should_trigger": true, + "reason": "REMOVE mode — removing a specific PageObject" + }, + { + "id": 8, + "query": "Run find_unused_steps.py to find orphaned step definitions", + "should_trigger": true, + "reason": "DEAD CODE AUDIT — direct script invocation" + }, + { + "id": 9, + "query": "Are there any dead PO components that nothing calls anymore?", + "should_trigger": true, + "reason": "DEAD CODE AUDIT — dead PO components query" + }, + { + "id": 10, + "query": "After deprecating US-042 in the living doc, what BDD files need to go?", + "should_trigger": true, + "reason": "REMOVE mode — downstream of entity deprecation" + }, + { + "id": 11, + "query": "Scan the webapp at https://app.example.com and update the PageObjects", + "should_trigger": false, + "reason": "PageObject re-scan after UI change — routes to living-doc-pageobject-scan RE-SCAN" + }, + { + "id": 12, + "query": "Heal the PageObjects — selectors are broken after the UI redesign", + "should_trigger": false, + "reason": "Selector drift healing — routes to living-doc-pageobject-scan HEALING" + }, + { + "id": 13, + "query": "Sync the @AC: tags in checkout.feature with the living doc", + "should_trigger": false, + "reason": "@AC: traceability sync — routes to gherkin-living-doc-sync" + }, + { + "id": 14, + "query": "Create a User Story for the checkout capability", + "should_trigger": false, + "reason": "Creating a User Story — routes to living-doc-create-user-story" + }, + { + "id": 15, + "query": 
"Generate BDD scenarios for US-007", + "should_trigger": false, + "reason": "Generating scenarios — routes to living-doc-scenario-creator" + }, + { + "id": 16, + "query": "Write step definitions for the checkout scenarios", + "should_trigger": false, + "reason": "Writing step definitions — routes to gherkin-step" + }, + { + "id": 17, + "query": "Run a gap analysis on the living documentation", + "should_trigger": false, + "reason": "Gap analysis — routes to living-doc-gap-finder" + }, + { + "id": 18, + "query": "What does PR #217 affect in the living doc?", + "should_trigger": false, + "reason": "Impact analysis — routes to living-doc-impact-analysis" + }, + { + "id": 19, + "query": "Add data-cy attributes to the checkout template", + "should_trigger": false, + "reason": "Instrumenting templates — routes to data-cy-instrument" + }, + { + "id": 20, + "query": "Update US-042 to add a new AC for the expired promo path", + "should_trigger": false, + "reason": "Updating an entity — routes to living-doc-update" + } +] \ No newline at end of file diff --git a/skills/bdd-maintain/scripts/find_unused_po_components.py b/skills/bdd-maintain/scripts/find_unused_po_components.py new file mode 100644 index 0000000..973a241 --- /dev/null +++ b/skills/bdd-maintain/scripts/find_unused_po_components.py @@ -0,0 +1,136 @@ +#!/usr/bin/env python3 +""" +find_unused_po_components.py — Dead-code detector: PageObject classes never imported in step files. + +Usage: + python playwright/scripts/find_unused_po_components.py [--pages-dir DIR] [--steps-dir DIR] + +Exits with code 1 if any unused PageObject classes are found (useful in CI). 
+""" +from __future__ import annotations + +import argparse +import re +import sys +from pathlib import Path +from dataclasses import dataclass + + +# --------------------------------------------------------------------------- +# Patterns +# --------------------------------------------------------------------------- + +# Match class declarations in TypeScript: `export class FooPage {` +CLASS_DEF_RE = re.compile(r"export\s+class\s+([A-Z][a-zA-Z0-9_$]*)") + +# Match import statements: `import { Foo, Bar } from './...'` +IMPORT_BRACE_RE = re.compile(r"import\s*\{([^}]+)\}\s*from") +# Match default imports: `import Foo from './...'` +IMPORT_DEFAULT_RE = re.compile(r"import\s+([A-Z][a-zA-Z0-9_$]*)\s+from") + +# Match TypeScript type usage: e.g. param: FooPage, variable: FooPage, extends FooPage +TYPE_USE_RE = re.compile(r"\b([A-Z][a-zA-Z0-9_$]*)\b") + + +@dataclass +class PageObjectClass: + name: str + file: Path + + +def collect_po_classes(pages_dir: Path) -> list[PageObjectClass]: + """Extract exported class names from all PageObject .ts files.""" + classes: list[PageObjectClass] = [] + for ts_file in sorted(pages_dir.glob("*.ts")): + text = ts_file.read_text(encoding="utf-8") + for m in CLASS_DEF_RE.finditer(text): + classes.append(PageObjectClass(m.group(1), ts_file)) + return classes + + +def collect_imported_names(steps_dir: Path) -> set[str]: + """Collect all identifiers imported or used in step files and fixtures.""" + names: set[str] = set() + # Also scan fixtures.ts at the parent of steps_dir or sibling file + scan_dirs = [steps_dir] + parent = steps_dir.parent + fixtures_file = parent / "fixtures.ts" + extra_files: list[Path] = [] + if fixtures_file.exists(): + extra_files.append(fixtures_file) + + for ts_file in list(sorted(steps_dir.rglob("*.ts"))) + extra_files: + text = ts_file.read_text(encoding="utf-8") + # Named imports: import { Foo, Bar } from ... 
+ for m in IMPORT_BRACE_RE.finditer(text): + for identifier in m.group(1).split(","): + stripped = identifier.strip() + # Handle `Foo as F` aliasing + actual = stripped.split(" as ")[0].strip() + names.add(actual) + # Default imports + for m in IMPORT_DEFAULT_RE.finditer(text): + names.add(m.group(1)) + + return names + + +def main() -> int: + parser = argparse.ArgumentParser( + description="Find PageObject classes never imported in step files or fixtures." + ) + parser.add_argument( + "--pages-dir", + default="playwright/pages", + help="Directory containing PageObject *.ts files (default: playwright/pages)", + ) + parser.add_argument( + "--steps-dir", + default="playwright/steps", + help="Directory containing *.steps.ts files (default: playwright/steps)", + ) + args = parser.parse_args() + + pages_dir = Path(args.pages_dir) + steps_dir = Path(args.steps_dir) + + if not pages_dir.is_dir(): + print(f"ERROR: pages-dir not found: {pages_dir}", file=sys.stderr) + return 2 + if not steps_dir.is_dir(): + print(f"ERROR: steps-dir not found: {steps_dir}", file=sys.stderr) + return 2 + + print(f"Scanning PageObject classes in: {pages_dir}") + print(f"Scanning imports in: {steps_dir}") + + # Also look for fixtures.ts + fixtures_path = steps_dir.parent / "fixtures.ts" + if fixtures_path.exists(): + print(f"Also scanning: {fixtures_path}") + print() + + po_classes = collect_po_classes(pages_dir) + imported_names = collect_imported_names(steps_dir) + + print(f"Found {len(po_classes)} PageObject class(es) in {pages_dir}") + print(f"Found {len(imported_names)} imported name(s) across step files") + print() + + unused = [c for c in po_classes if c.name not in imported_names] + + if not unused: + print("✅ All PageObject classes are imported/used.") + return 0 + + print(f"⚠️ {len(unused)} UNUSED PageObject class(es) found:\n") + for cls in sorted(unused, key=lambda c: c.name): + print(f" {cls.name:<40} {cls.file}") + print() + print("Action: either import and use these classes in 
step files, or remove the PageObject files.") + print() + return 1 + + +if __name__ == "__main__": + sys.exit(main()) diff --git a/skills/bdd-maintain/scripts/find_unused_po_methods.py b/skills/bdd-maintain/scripts/find_unused_po_methods.py new file mode 100644 index 0000000..2aa792f --- /dev/null +++ b/skills/bdd-maintain/scripts/find_unused_po_methods.py @@ -0,0 +1,174 @@ +#!/usr/bin/env python3 +""" +find_unused_po_methods.py — Dead-code detector: PageObject methods never called from step files. + +Usage: + python playwright/scripts/find_unused_po_methods.py [--pages-dir DIR] [--steps-dir DIR] + +Exits with code 1 if any unused methods are found (useful in CI). +""" +from __future__ import annotations + +import argparse +import re +import sys +from pathlib import Path +from dataclasses import dataclass + + +# --------------------------------------------------------------------------- +# Patterns +# --------------------------------------------------------------------------- + +# Public method definitions in TypeScript classes: +# async methodName( methodName( +# Exclude: constructor, private/protected (prefixed with modifier word) +# Match only on lines that look like method declarations (not calls) +METHOD_DEF_RE = re.compile( + r""" + ^[ \t]* # leading indent + (?!private|protected|readonly|static|get |set ) # not a modifier-only line + (?:async\s+)? 
# optional async + ([a-zA-Z_$][a-zA-Z0-9_$]*) # method name + \s*\( # opening paren + (?!.*:\s*Promise|\s*\{) # exclude constructor-like and block-only lines + """, + re.VERBOSE | re.MULTILINE, +) + +# Simpler fallback: any `async name(` or `name(` at line start with content +# We'll collect both and deduplicate +SIMPLE_METHOD_RE = re.compile( + r"""^[ \t]+(?:async\s+)([a-zA-Z_$][a-zA-Z0-9_$]*)\s*\(""", + re.MULTILINE, +) + +# TypeScript getter shorthand — these are locators, not callable methods +GETTER_RE = re.compile(r"^[ \t]+get\s+([a-zA-Z_$][a-zA-Z0-9_$]*)\s*\(", re.MULTILINE) + +# Method calls from step files: `.methodName(` or `fixture.methodName(` +# Capture any `.identifier(` occurrence +CALL_RE = re.compile(r"\.([a-zA-Z_$][a-zA-Z0-9_$]*)\s*\(") + +# Excluded names — too generic or framework-level +EXCLUDED_NAMES = frozenset({ + "constructor", "toString", "valueOf", "then", "catch", "finally", + "nth", "first", "last", "locator", "getByTestId", "getByRole", + "getByText", "fill", "click", "isVisible", "isDisabled", "isEnabled", + "waitFor", "expectToBeVisible", "expect", "goto", "reload", + "setTimeout", "setInputFiles", "hover", "press", "type", "check", + "uncheck", "selectOption", "evaluate", "dispatchEvent", "focus", + "blur", "screenshot", "textContent", "innerText", "inputValue", + "getAttribute", "scrollIntoViewIfNeeded", + # common Angular/Playwright helpers + "map", "filter", "forEach", "push", "join", "split", "trim", + "toLowerCase", "toUpperCase", "replace", "includes", "startsWith", + "endsWith", "slice", "substring", "indexOf", +}) + + +@dataclass +class MethodDef: + name: str + file: Path + line: int + + +def collect_po_methods(pages_dir: Path) -> list[MethodDef]: + """Extract public method names from all PageObject .ts files.""" + methods: list[MethodDef] = [] + seen: set[tuple[Path, str]] = set() + + for ts_file in sorted(pages_dir.glob("*.ts")): + text = ts_file.read_text(encoding="utf-8") + lines = text.splitlines() + + # Look for `async 
methodName(` or method-like declarations + for m in re.finditer(r"^[ \t]+(?:async\s+)?([a-zA-Z_$][a-zA-Z0-9_$]*)\s*\(", text, re.MULTILINE): + name = m.group(1) + if name in EXCLUDED_NAMES: + continue + if name == "constructor": + continue + line_num = text[: m.start()].count("\n") + 1 + # skip getter declarations + line_text = lines[line_num - 1] if line_num <= len(lines) else "" + if re.match(r"^\s*get\s+", line_text): + continue + + key = (ts_file, name) + if key not in seen: + seen.add(key) + methods.append(MethodDef(name, ts_file, line_num)) + + return methods + + +def collect_called_methods(steps_dir: Path) -> set[str]: + """Collect all method names called (via `.name(`) in step files and fixtures.""" + called: set[str] = set() + for ts_file in sorted(steps_dir.rglob("*.ts")): + text = ts_file.read_text(encoding="utf-8") + for m in CALL_RE.finditer(text): + called.add(m.group(1)) + return called + + +def main() -> int: + parser = argparse.ArgumentParser( + description="Find PageObject methods never called from step files." 
+ ) + parser.add_argument( + "--pages-dir", + default="playwright/pages", + help="Directory containing PageObject *.ts files (default: playwright/pages)", + ) + parser.add_argument( + "--steps-dir", + default="playwright/steps", + help="Directory containing *.steps.ts files (default: playwright/steps)", + ) + args = parser.parse_args() + + pages_dir = Path(args.pages_dir) + steps_dir = Path(args.steps_dir) + + if not pages_dir.is_dir(): + print(f"ERROR: pages-dir not found: {pages_dir}", file=sys.stderr) + return 2 + if not steps_dir.is_dir(): + print(f"ERROR: steps-dir not found: {steps_dir}", file=sys.stderr) + return 2 + + print(f"Scanning PageObject methods in: {pages_dir}") + print(f"Scanning method calls in: {steps_dir}") + print() + + po_methods = collect_po_methods(pages_dir) + called_methods = collect_called_methods(steps_dir) + + print(f"Found {len(po_methods)} public method(s) across {pages_dir}") + print(f"Found {len(called_methods)} distinct method call(s) across {steps_dir}") + print() + + unused = [m for m in po_methods if m.name not in called_methods] + + if not unused: + print("✅ All PageObject methods are used.") + return 0 + + print(f"⚠️ {len(unused)} UNUSED PageObject method(s) found:\n") + by_file: dict[Path, list[MethodDef]] = {} + for m in unused: + by_file.setdefault(m.file, []).append(m) + + for file, methods in sorted(by_file.items()): + print(f" {file}") + for meth in sorted(methods, key=lambda x: x.line): + print(f" line {meth.line:4d} {meth.name}()") + print() + return 1 + + +if __name__ == "__main__": + sys.exit(main()) diff --git a/skills/bdd-maintain/scripts/find_unused_steps.py b/skills/bdd-maintain/scripts/find_unused_steps.py new file mode 100644 index 0000000..b5df559 --- /dev/null +++ b/skills/bdd-maintain/scripts/find_unused_steps.py @@ -0,0 +1,151 @@ +#!/usr/bin/env python3 +""" +find_unused_steps.py — Dead-code detector: step definitions unused in any .feature file. 
+ +Usage: + python playwright/scripts/find_unused_steps.py [--steps-dir DIR] [--features-dir DIR] + +Exits with code 1 if any unused steps are found (useful in CI). +""" +from __future__ import annotations + +import argparse +import re +import sys +from pathlib import Path +from dataclasses import dataclass, field + + +# --------------------------------------------------------------------------- +# Patterns +# --------------------------------------------------------------------------- + +# Matches: Given('pattern', ...) / When(`pattern`, ...) / Then("pattern", ...) +# playwright-bdd uses Given/When/Then imported from cucumber or createBdd +STEP_DEF_RE = re.compile( + r"""(?:Given|When|Then|And|But)\s*\(\s*(['"`])(.*?)\1""", + re.DOTALL, +) + +# Playwright-bdd also supports: @Given('pattern') @When('pattern') @Then('pattern') +DECORATOR_RE = re.compile( + r"""@(?:Given|When|Then|And|But)\s*\(\s*(['"`])(.*?)\1""", + re.DOTALL, +) + +# Convert a Cucumber expression / simple regex pattern to a Python regex +# Cucumber expression placeholders: {string}, {int}, {word}, {float} +CUCUMBER_PLACEHOLDER_RE = re.compile(r"\{(?:string|int|word|float|[^}]+)\}") + + +@dataclass +class StepDefinition: + pattern: str # raw pattern string from source + file: Path + line: int + regex: re.Pattern # compiled regex for matching + + +def cucumber_to_regex(pattern: str) -> re.Pattern: + """Convert a Cucumber expression pattern to a Python compiled regex.""" + escaped = re.escape(pattern) + # Restore the placeholder wildcards after escaping + # {string} → matches "..." or '...' 
→ use non-greedy wildcard for simplicity + # `pattern` was escaped above, so match the escaped placeholders (\{string\} etc.) + escaped = re.sub(r"\\{(?:string|int|word|float|[^}]+)\\}", r".+?", escaped) + return re.compile(rf"^\s*{escaped}\s*$", re.IGNORECASE) + + +def collect_step_definitions(steps_dir: Path) -> list[StepDefinition]: + defs: list[StepDefinition] = [] + for ts_file in sorted(steps_dir.rglob("*.steps.ts")): + text = ts_file.read_text(encoding="utf-8") + # (line, pattern) pairs already recorded for this file + seen: set[tuple[int, str]] = set() + for pattern_re in (STEP_DEF_RE, DECORATOR_RE): + for m in pattern_re.finditer(text): + raw_pattern = m.group(2) + line = text[: m.start()].count("\n") + 1 + # STEP_DEF_RE also matches the `Given(` inside `@Given(`, so skip repeats + if (line, raw_pattern) in seen: + continue + seen.add((line, raw_pattern)) + try: + compiled = cucumber_to_regex(raw_pattern) + defs.append(StepDefinition(raw_pattern, ts_file, line, compiled)) + except re.error: + print(f" WARN: could not compile pattern at {ts_file}:{line}: {raw_pattern!r}", + file=sys.stderr) + return defs + + +def collect_feature_steps(features_dir: Path) -> list[str]: + """Return every step line from every .feature file (stripped of keyword).""" + step_line_re = re.compile( + r"^\s*(?:Given|When|Then|And|But)\s+(.+)$", re.IGNORECASE + ) + steps: list[str] = [] + for feat_file in sorted(features_dir.rglob("*.feature")): + for line in feat_file.read_text(encoding="utf-8").splitlines(): + m = step_line_re.match(line) + if m: + steps.append(m.group(1).strip()) + return steps + + +def main() -> int: + parser = argparse.ArgumentParser(description="Find unused Playwright-BDD step definitions.") + parser.add_argument( + "--steps-dir", + default="playwright/steps", + help="Directory containing *.steps.ts files (default: playwright/steps)", + ) + parser.add_argument( + "--features-dir", + default="playwright/features", + help="Directory containing *.feature files (default: playwright/features)", + ) + args = parser.parse_args() + + steps_dir = Path(args.steps_dir) + features_dir = Path(args.features_dir) + + if not steps_dir.is_dir(): + 
print(f"ERROR: steps-dir not found: {steps_dir}", file=sys.stderr) + return 2 + if not features_dir.is_dir(): + print(f"ERROR: features-dir not found: {features_dir}", file=sys.stderr) + return 2 + + print(f"Scanning step definitions in: {steps_dir}") + print(f"Scanning feature steps in: {features_dir}") + print() + + step_defs = collect_step_definitions(steps_dir) + feature_steps = collect_feature_steps(features_dir) + + print(f"Found {len(step_defs)} step definition(s) across {steps_dir}") + print(f"Found {len(feature_steps)} step usage(s) across {features_dir}") + print() + + unused: list[StepDefinition] = [] + for sd in step_defs: + matched = any(sd.regex.match(fs) for fs in feature_steps) + if not matched: + unused.append(sd) + + if not unused: + print("✅ All step definitions are used.") + return 0 + + print(f"⚠️ {len(unused)} UNUSED step definition(s) found:\n") + by_file: dict[Path, list[StepDefinition]] = {} + for sd in unused: + by_file.setdefault(sd.file, []).append(sd) + + for file, sds in sorted(by_file.items()): + print(f" {file}") + for sd in sorted(sds, key=lambda s: s.line): + print(f" line {sd.line:4d} {sd.pattern!r}") + print() + return 1 + + +if __name__ == "__main__": + sys.exit(main()) diff --git a/skills/data-cy-instrument/SKILL.md b/skills/data-cy-instrument/SKILL.md new file mode 100644 index 0000000..02ed648 --- /dev/null +++ b/skills/data-cy-instrument/SKILL.md @@ -0,0 +1,266 @@ +--- +name: data-cy-instrument +description: > + Automatically resolve missing `data-cy` attributes in Angular templates and sync PageObjects + to use `getByTestId()`. Angular-first but phases 1, 3, and 5 are framework-agnostic. Activates + when coverage_gaps are non-empty, PageObjects carry "⚠️ PROPOSED" locator comments, or + Functionalities have `status: planned` due to missing test IDs. 
+ Triggers on: "add missing data-cy", "instrument templates", "fix data-cy gaps", "add testids", + "data-cy audit", "instrument angular templates", "fix locators", "add data-cy attributes", + "add test ids to templates", "fix playwright selectors due to missing data-cy", "data-cy-instrument", + "coverage_gaps", "Functionality status planned". + Does NOT trigger for: adding Gherkin (use living-doc-scenario-creator); PageObject + healing without data-cy gaps (use living-doc-pageobject-scan HEALING). + Pairs with living-doc-pageobject-scan (upstream) and living-doc-scenario-creator (downstream); + invokes living-doc-update for Functionality promotion. +license: Apache-2.0 +compatibility: GitHub Copilot +--- + +# data-cy-instrument + +> **Glossary:** Feature, Functionality, status vocabulary — see [living-doc-glossary](../references/living-doc-glossary.md) ([remote](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md)). +> **BDD schemas:** manifest.json coverage_gaps schema, seed.yaml form_fixtures — see [living-doc-bdd-schemas](../references/living-doc-bdd-schemas.md) ([remote](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-bdd-schemas.md)). + +**Framework scope:** This skill is **Angular-first** — naming conventions, routing module paths, +and feature-flag patterns are Angular-specific. The gap audit, naming validation, and PageObject +sync phases (Phases 1, 3, 5) are framework-agnostic and apply to any frontend stack. For +React, Vue, or other frameworks, adapt the component resolution in Phase 2 to the project's +routing and component model; all other phases apply unchanged. + +Resolves missing `data-cy` attributes end-to-end: from gap discovery in `manifest.json` +through Angular template edits, PageObject sync, Functionality promotion, and WORK_LOG +status update. All steps are in sequence — do not skip steps or re-order them. 
+ +--- + +## When this skill activates + +- `manifest.json` has one or more surfaces with a non-empty `coverage_gaps` array +- A PageObject file contains a locator comment marked `⚠️ PROPOSED` or `⚠️ NOT YET IN TEMPLATE` +- A Functionality `.feature` file has `status: planned` with a comment indicating the reason is missing `data-cy` +- WORK_LOG.md §4 has rows marked 🔴 or ⚠️ +- User asks to add/fix data-cy or instrument an Angular template + +--- + +## Phase 1 · Gap Audit + +Build a prioritised gap list before touching any file. + +1. Load `.copilot/bdd/manifest.json`. For each surface entry, extract `coverage_gaps` items. +2. Load `.copilot/bdd/WORK_LOG.md` §4 — identify rows with status 🔴 (pending) or ⚠️ (lib-limited). +3. Cross-reference with `issue-missing-data-cy.md` if present at `.copilot/bdd/`. +4. For each gap, record: + + ``` + route: /auth/domain-access-control + element_desc: Status filter toggle (Pending / Approved / Rejected) + suggested_data_cy: filter-access-status + component_hint: domain-access-approvals-new.component.html + priority: P1 | P2 | P3 + ``` + +5. Sort by priority P1 → P3. Process in that order. + +**Skip list — do not attempt to instrument these:** +- Elements inside third-party library internals where the host attribute is confirmed not to be propagated (e.g. `cps-table` inner paginator buttons, `cps-tab` inner `
<li>` when the lib does not forward host attributes). Mark these ⚠️ "needs lib support" — add a WORK_LOG.md §4 row with status ⚠️, element description, library name and version, and a link to the library's issue tracker if one exists. Do not leave these as silent skips. +- Elements that require authenticated roles to render — flag as needing an integration test fixture, not a data-cy change. + +--- + +## Phase 2 · Route → Component Resolution + +For each gap, resolve which Angular component owns the element. + +1. Open `aul-ui/src/app/pages/authenticated/authenticated-routing.module.ts` (or the relevant routing module). +2. Find the route path matching the gap's route. +3. Check feature-flag conditionals: + - `environment.useBoundedCtxApi` (runtime flag `BD_CTX_API`) — if present, there are two component variants: `-new` (flag on) and the legacy component (flag off). **Instrument both.** + - `SHOW_EXPERIMENTAL_FEATURES` — instrument only if the element is inside that guard. +4. If the component is a wrapper that delegates to a child (`` sub-component), follow the child selector to its `.html` file. Repeat until the element is found. +5. Record the resolved template path(s) before making any edits. + +--- + +## Phase 3 · Name Validation + +Before writing any `data-cy` value, validate the candidate name. + +**Naming prefix rules:** + +| Prefix | Use for | Example | +|---|---|---| +| `btn-` | Any CTA button (``, ` + +``` + +**Placement:** Add `data-cy` as the second attribute after the component tag name (or after any structural directive like `*ngIf`, `@if`, `[ngClass]`). Preserve all existing attributes and indentation exactly. + +**Multi-line component elements:** +```html + + + + + +``` + +**Inline component elements:** +```html + + + + + +``` + +When a gap covers multiple instances of the same component in a loop (e.g. one "View" button per table row), add the `data-cy` once on the template element — the PageObject will use `.nth(index)` to distinguish instances. 
+ +--- + +## Phase 5 · PageObject Sync + +After every template change, update the matching PageObject in `aul-ui/playwright/pages/`. + +**Replace proposed/fallback locators with `getByTestId()`:** + +```typescript +// Before — text fallback or proposed comment +// ⚠️ PROPOSED data-cy: btn-request-access-rights +readonly requestAccessButton: Locator = page.locator('cps-button', { hasText: 'Request access' }); + +// After +readonly requestAccessButton: Locator = page.getByTestId('btn-request-access-rights').locator('button'); +``` + +**Inner element resolution:** `getByTestId()` resolves the host Angular component element. For Playwright interactions (`click`, `fill`), chain `.locator('button')` or `.locator('input')` on the result if the interaction target is the native element inside the host. + +**Remove stub markers:** Delete any comment lines containing `⚠️ PROPOSED`, `⚠️ NOT YET IN TEMPLATE`, or `will resolve once template is updated` that relate to the now-instrumented elements. + +**Update PageObject header comments:** +- Change `status: candidate` → `status: active` if all locators for the page are now resolved. +- Remove `stub-reason:` line if no un-instrumented elements remain. + +--- + +## Phase 6 · Living Doc Promotion + +For each Functionality whose `status: planned` was solely due to missing `data-cy`: + +1. Open `aul-ui/playwright/features/liv_doc_func/func-{NNN}-*.feature`. +2. Change `# status: planned` → `# status: active` in the comment header. +3. Remove the `# planned-reason: no data-cy attributes` comment line if present. +4. Do **not** change any other header fields (AC text, func_type, feature, etc.). + +Only promote if the data-cy attributes required by that Functionality's ACs have all been added in Phase 4. If a Functionality depends on multiple elements and only some were instrumented, leave it as `planned` and add a comment listing the remaining blockers. 
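The header edits in steps 2 and 3 above can be sketched in Python. This is a minimal illustration under stated assumptions, not part of the skill: the function name `promote_header` is hypothetical, and it assumes the header uses exactly the `# status: planned` and `# planned-reason: ...` comment lines described in this phase. It does not check whether every required `data-cy` attribute was added in Phase 4; that precondition still has to be verified first.

```python
import re


def promote_header(feature_text: str) -> str:
    """Return feature_text with a planned header promoted to active.

    Hypothetical helper: it only touches the `# status:` and
    `# planned-reason:` comment lines; every other line is kept as-is.
    """
    out = []
    for line in feature_text.splitlines():
        # Step 3: drop the planned-reason comment line if present.
        if re.match(r"^\s*#\s*planned-reason:", line):
            continue
        # Step 2: flip the status comment from planned to active.
        out.append(re.sub(r"^(\s*#\s*status:\s*)planned\s*$", r"\1active", line))
    result = "\n".join(out)
    # Keep the trailing newline if the input had one.
    return (result + "\n") if feature_text.endswith("\n") else result
```

A caller would read the `func-{NNN}-*.feature` file, pass its text through the helper, and write the result back. The catalog entity is not touched by this sketch.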
+ +After updating the BDD feature file header, also invoke `living-doc-update` to change the matching catalog entity's `status` from `planned` to `active`. The BDD file header and the catalog entity must stay in sync. + +--- + +## Phase 7 · WORK_LOG Update + +Update `.copilot/bdd/WORK_LOG.md` §4 and §8 to reflect completed work. + +**§4 row updates:** +- Change 🔴 → ✅ for each element that was instrumented. +- Change `Suggested data-cy` column to the `data-cy` column for confirmed values. +- Add a "Files updated:" note under the section header listing the template file(s) changed. + +**§8 open items:** +- Close OI items that are now fully resolved: change status column to `✅ closed` or remove the row. +- If a gap was partially resolved (e.g. some elements done, some need lib support), update the item description to reflect remaining scope. + +--- + +## Output after completing all phases + +Report the following at the end of the run: + +``` +## data-cy-instrument run summary + +### Templates updated +- : + +### PageObjects synced +- : + +### Functionalities promoted +- : planned → active + +### Remaining gaps (lib-limited or deferred) +- : + +### WORK_LOG §4 rows closed: N +### WORK_LOG OI items closed: N +``` + +--- + +## Interaction with other skills + +| Skill | Relationship | +|---|---| +| `living-doc-pageobject-scan` | Upstream — produces `manifest.json` with `coverage_gaps`. This skill consumes that output. | +| `living-doc-pageobject-scan` RE-SCAN scope | Upstream — re-generates `coverage_gaps` after a UI change. Trigger this skill after RE-SCAN if new gaps appear. | +| `living-doc-scenario-creator` | Downstream — after Functionalities are promoted from `planned` to `active`, generate Gherkin scenarios for them. | +| `living-doc-update` | Downstream — if PageObject header `status` changes, the corresponding Feature entity in the living doc may also need a status update. 
| + +**Pipeline position:** +``` +living-doc-pageobject-scan (or RE-SCAN) → data-cy-instrument + → living-doc-update (promote Functionalities: planned → active) + → living-doc-scenario-creator +``` + +--- + +## Out-of-scope routing + +| Request | Correct skill | +|---|---| +| Add or fix Gherkin scenarios | `living-doc-scenario-creator` | +| Generate or heal PageObjects (no missing data-cy) | `living-doc-pageobject-scan` | +| Fix selector drift from DOM structure changes (no missing data-cy) | `living-doc-pageobject-scan` HEALING scope | +| Deprecate a Functionality entity | `living-doc-update` | diff --git a/skills/data-cy-instrument/evals/evals.json b/skills/data-cy-instrument/evals/evals.json new file mode 100644 index 0000000..6f6e042 --- /dev/null +++ b/skills/data-cy-instrument/evals/evals.json @@ -0,0 +1,130 @@ +{ + "skill_name": "data-cy-instrument", + "evals": [ + { + "id": 1, + "category": "happy-path", + "prompt": "Phase 1 scan of the checkout module found 5 elements without data-cy attributes. What are the next steps?", + "expected_output": "For each element: (1) Determine if it is a native HTML element — if yes, add a data-cy attribute directly in the Angular template (e.g. data-cy='confirm-order-btn'). (2) If it is a third-party library component that does not forward attributes, mark it ⚠️ 'needs lib support' and add a WORK_LOG.md §4 row with status ⚠️, element description, library name and version, and a link to the library's issue tracker. (3) After all native elements are instrumented, proceed to Phase 3 to update PageObjects to use getByTestId().", + "files": [], + "expectations": [ + "Handles native elements by adding data-cy to the template", + "Handles lib components with WORK_LOG.md §4 row entry", + "WORK_LOG row includes library name, version, and issue tracker link", + "Does not silently skip lib components" + ] + }, + { + "id": 2, + "category": "happy-path", + "prompt": "A PageObject has 3 locators marked ⚠️ PROPOSED. 
What does that mean and what do I do?", + "expected_output": "PROPOSED locators indicate that Phase 1 (data-cy instrumentation) has not yet been done for those elements — the PageObject was generated with a best-guess selector pending data-cy confirmation. Steps: (1) Run Phase 1 to add the missing data-cy attributes to the template. (2) Once data-cy is in the template, replace the PROPOSED locator with getByTestId('your-data-cy-value') in the PageObject. (3) Remove the ⚠️ PROPOSED comment after updating.", + "files": [], + "expectations": [ + "Explains PROPOSED means data-cy not yet added to template", + "Phase 1 first, then update locator to getByTestId()", + "Remove PROPOSED comment after fix" + ] + }, + { + "id": 3, + "category": "happy-path", + "prompt": "How do I update a PageObject locator from a CSS selector to getByTestId() after adding the data-cy attribute?", + "expected_output": "Replace the old locator: `this.page.locator('button.confirm-order')` → `this.page.getByTestId('confirm-order-btn')` where 'confirm-order-btn' is the value of the data-cy attribute you added to the template. Update the method comment to remove any ⚠️ PROPOSED flag.", + "files": [], + "expectations": [ + "Shows the getByTestId() replacement syntax", + "data-cy value matches what was added to the template", + "Removes PROPOSED comment" + ] + }, + { + "id": 4, + "category": "regression", + "prompt": "A third-party PrimeNG Button component does not forward the data-cy attribute. I silently skipped it in Phase 1. Is that OK?", + "expected_output": "No — silent skips are not allowed. For any element where the library does not support attribute forwarding, add a WORK_LOG.md §4 row with: status ⚠️, element description, library name (PrimeNG) and version, and a link to the PrimeNG issue tracker or relevant GitHub issue. 
This ensures the gap is tracked and actionable.", + "files": [], + "expectations": [ + "Identifies silent skip as incorrect", + "Requires WORK_LOG.md §4 row", + "Row must include library name, version, issue tracker link" + ] + }, + { + "id": 5, + "category": "happy-path", + "prompt": "FUNC-promo-validate has status 'planned' because the data-cy attributes are missing. After Phase 1 and Phase 3 are complete, what do I do to promote it?", + "expected_output": "After data-cy attributes are added and PageObjects updated: load living-doc-update and change FUNC-promo-validate status from 'planned' to 'active'. This promotion is the downstream step — data-cy-instrument invokes living-doc-update for the status change.", + "files": [], + "expectations": [ + "Routes to living-doc-update for status promotion", + "Status changes from planned to active", + "Only after Phase 1 and Phase 3 are complete" + ] + }, + { + "id": 6, + "category": "negative", + "prompt": "The checkout PageObject selectors are broken because the CSS classes changed after the redesign — there are no data-cy attributes involved. Should I run data-cy-instrument?", + "expected_output": "No — selector drift caused by CSS class changes without missing data-cy attributes is a HEALING scope concern. Load living-doc-pageobject-scan and use HEALING mode to re-scan the page and update the broken selectors. data-cy-instrument only applies when coverage_gaps are non-empty or PageObjects have PROPOSED locators due to missing test IDs.", + "files": [], + "expectations": [ + "Routes to living-doc-pageobject-scan HEALING", + "Explains data-cy-instrument does not apply when coverage_gaps are empty", + "Does not attempt to re-instrument templates" + ] + }, + { + "id": 7, + "category": "edge-case", + "prompt": "Phases 1 and 3 are Angular-specific. What do I do for a React or Vue app?", + "expected_output": "Phases 1, 3, and 5 are framework-agnostic in principle. 
For React: add data-testid attributes (data-cy is by convention but getByTestId() accepts any value). For Vue: same approach. Phase 2 (ng-add schematics) and Phase 4 (Angular-specific wiring) are Angular-only and should be skipped for other frameworks. Apply Phase 1 (audit), Phase 3 (PO update), and Phase 5 (coverage gap check) to any framework.", + "files": [], + "expectations": [ + "States phases 1, 3, 5 are framework-agnostic", + "States phases 2 and 4 are Angular-only", + "Mentions data-testid equivalence for React/Vue" + ] + }, + { + "id": 8, + "category": "output-format", + "prompt": "Show me the expected WORK_LOG.md §4 row for a Material Design button that does not support data-cy forwarding.", + "expected_output": "| ⚠️ | Checkout confirm button | MatButton (Angular Material v17.3.0) | Cannot forward data-cy; tracked at https://github.com/angular/components/issues/XXXX |", + "files": [], + "expectations": [ + "Status column is ⚠️", + "Element description present", + "Library name and version present", + "Issue tracker link present" + ] + }, + { + "id": 9, + "category": "regression", + "prompt": "After running Phase 3, coverage_gaps is still non-empty for the promo-apply button. What should I check?", + "expected_output": "Check: (1) Did Phase 1 actually add data-cy='promo-apply-btn' to the template? (2) Did Phase 3 update the PageObject locator to getByTestId('promo-apply-btn')? (3) Is the element conditionally rendered (e.g. *ngIf) and not visible during the scan? (4) Is the element inside a Shadow DOM that prevents standard attribute access? 
If all four checks pass and the gap persists, add a WORK_LOG.md §4 row.", + "files": [], + "expectations": [ + "Checks Phase 1 template add", + "Checks Phase 3 PO update", + "Checks conditional rendering", + "Checks Shadow DOM edge case", + "Falls back to WORK_LOG if gap persists" + ] + }, + { + "id": 10, + "category": "happy-path", + "prompt": "What is the relationship between data-cy-instrument and living-doc-pageobject-scan?", + "expected_output": "living-doc-pageobject-scan is upstream — it creates or heals PageObjects and may produce PROPOSED locator comments when data-cy attributes are missing. data-cy-instrument runs downstream to resolve those PROPOSED locators by instrumenting the templates and updating the PageObjects to use getByTestId(). After data-cy-instrument completes, living-doc-scenario-creator is downstream to generate BDD scenarios using the now-stable locators.", + "files": [], + "expectations": [ + "living-doc-pageobject-scan is upstream (produces PROPOSED)", + "data-cy-instrument resolves PROPOSED locators", + "living-doc-scenario-creator is downstream", + "Correct pipeline order stated" + ] + } + ] +} \ No newline at end of file diff --git a/skills/data-cy-instrument/evals/trigger-eval.json b/skills/data-cy-instrument/evals/trigger-eval.json new file mode 100644 index 0000000..5aab361 --- /dev/null +++ b/skills/data-cy-instrument/evals/trigger-eval.json @@ -0,0 +1,134 @@ +[ + { + "id": 1, + "query": "Add missing data-cy attributes to the checkout Angular template", + "should_trigger": true, + "reason": "data-cy-instrument trigger — adding missing data-cy attributes" + }, + { + "id": 2, + "query": "The PageObjects have ⚠️ PROPOSED locator comments — what do I do?", + "should_trigger": true, + "reason": "data-cy-instrument trigger — PROPOSED locator resolution" + }, + { + "id": 3, + "query": "Instrument the Angular templates to add data-cy test IDs", + "should_trigger": true, + "reason": "data-cy-instrument trigger — instrument angular 
templates keyword" + }, + { + "id": 4, + "query": "Fix data-cy gaps in the login component template", + "should_trigger": true, + "reason": "data-cy-instrument trigger — fix data-cy gaps keyword" + }, + { + "id": 5, + "query": "Run a data-cy audit on the checkout module", + "should_trigger": true, + "reason": "data-cy-instrument trigger — data-cy audit keyword" + }, + { + "id": 6, + "query": "Add testids to the checkout form inputs so Playwright can select them", + "should_trigger": true, + "reason": "data-cy-instrument trigger — add testids keyword" + }, + { + "id": 7, + "query": "Our Playwright selectors are failing because there are no data-cy attributes on the buttons", + "should_trigger": true, + "reason": "data-cy-instrument trigger — fix playwright selectors due to missing data-cy" + }, + { + "id": 8, + "query": "Update the PageObjects to use getByTestId() instead of CSS selectors", + "should_trigger": true, + "reason": "data-cy-instrument trigger — syncing PageObjects to use getByTestId()" + }, + { + "id": 9, + "query": "The coverage_gaps list is non-empty after the PageObject scan — how do I resolve it?", + "should_trigger": true, + "reason": "data-cy-instrument trigger — coverage_gaps resolution workflow" + }, + { + "id": 10, + "query": "FUNC-promo-validate has status planned because there are no data-cy attributes — fix it", + "should_trigger": true, + "reason": "data-cy-instrument trigger — Functionality.status planned due to missing test IDs" + }, + { + "id": 11, + "query": "Add data-cy to the third-party UI library button in the checkout form", + "should_trigger": true, + "reason": "data-cy-instrument trigger — even lib buttons need an audit decision (WORK_LOG if unsupported)" + }, + { + "id": 12, + "query": "Generate BDD scenarios for US-007", + "should_trigger": false, + "reason": "Adding Gherkin — routes to living-doc-scenario-creator" + }, + { + "id": 13, + "query": "Write step definitions for the checkout scenarios", + "should_trigger": false, + 
"reason": "Writing step definitions — routes to gherkin-step" + }, + { + "id": 14, + "query": "The PageObject selectors are broken after the UI redesign — heal them", + "should_trigger": false, + "reason": "PageObject healing without data-cy gaps — routes to living-doc-pageobject-scan HEALING" + }, + { + "id": 15, + "query": "Scan the webapp and generate PageObjects for the admin portal", + "should_trigger": false, + "reason": "PageObject creation — routes to living-doc-pageobject-scan CREATE" + }, + { + "id": 16, + "query": "Create a User Story for the checkout capability", + "should_trigger": false, + "reason": "Creating a User Story — routes to living-doc-create-user-story" + }, + { + "id": 17, + "query": "Run a gap analysis on the living doc", + "should_trigger": false, + "reason": "Gap analysis — routes to living-doc-gap-finder" + }, + { + "id": 18, + "query": "What does PR #217 affect in the living doc?", + "should_trigger": false, + "reason": "Impact analysis — routes to living-doc-impact-analysis" + }, + { + "id": 19, + "query": "Sync the @AC: tags in the feature files with the living doc", + "should_trigger": false, + "reason": "Feature file sync — routes to gherkin-living-doc-sync" + }, + { + "id": 20, + "query": "Delete all BDD artifacts linked to the deprecated checkout feature", + "should_trigger": false, + "reason": "BDD artifact cleanup — routes to bdd-maintain" + }, + { + "id": 21, + "query": "Update the wording of AC-1 on US-042", + "should_trigger": false, + "reason": "Updating an entity — routes to living-doc-update" + }, + { + "id": 22, + "query": "Create a Feature entity for the checkout module", + "should_trigger": false, + "reason": "Creating a Feature — routes to living-doc-create-feature" + } +] \ No newline at end of file diff --git a/skills/gherkin-living-doc-sync/SKILL.md b/skills/gherkin-living-doc-sync/SKILL.md new file mode 100644 index 0000000..1e93ec8 --- /dev/null +++ b/skills/gherkin-living-doc-sync/SKILL.md @@ -0,0 +1,188 @@ +--- 
+name: gherkin-living-doc-sync +description: > + Synchronise Gherkin feature files and BDD scenarios with the living documentation catalog. + Corrects existing links — distinct from living-doc-gap-finder (which detects missing coverage). + Activate when `@AC:` tags or `# AC:` comments are missing or stale, step text drifts after + a refactor, ACs are descoped, or AC changes must propagate from the living doc to feature files. + Run scan_ac_links.py to audit AC link health before a sync pass. + Triggers on: "sync gherkin to living doc", "feature file out of sync", "scenario not linked + to AC", "step text changed", "gherkin drift", "BDD sync", "AC link missing in feature file", + "sync scenarios", "traceability broken", "propagate AC changes", "AC was descoped". + Does NOT trigger for: writing new scenarios (use living-doc-scenario-creator); implementing + step definitions (use gherkin-step); finding gaps (use living-doc-gap-finder); + creating entities (use living-doc-create-*). + Pairs with living-doc-update (upstream) and gherkin-step (downstream). +license: Apache-2.0 +compatibility: GitHub Copilot +--- + +# Gherkin ↔ Living Doc Sync + +> **Glossary:** Feature, Functionality, User Story, AC — see [living-doc-glossary](../references/living-doc-glossary.md) ([remote](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md)). +> **BDD schemas:** US and Functionality feature file headers, `# Acceptance Criteria:` block format — see [living-doc-bdd-schemas](../references/living-doc-bdd-schemas.md) ([remote](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-bdd-schemas.md)). + +Sync runs in three directions: (1) feature file to living doc, (2) living doc AC to feature file, +(3) step text to PageObject method signature. + +Use `scripts/scan_ac_links.py` to detect missing or malformed `@AC:` tags and missing `# AC:` +comments before a full sync run. 
The script only checks living-doc feature files (`features/us/` +and `features/functionalities/`) — other feature files are skipped. + +--- + +## Step 1 — Detect the sync direction + +**Upstream dependencies:** Directions that flow from the living documentation into feature files are initiated by catalog-layer operations from `@living-doc-bdd-copilot`: +- `living-doc-update` modified, added, or deprecated an AC → triggers rows 2 and 4 of the table below +- `living-doc-impact-analysis` identified High-impact AC changes that require resync → may trigger rows 2 and 3 of the table below + +| Change event | Sync direction | Action | +|---|---|---| +| New `.feature` file added | Feature file to living doc | Link each scenario to an AC; create AC if missing | +| User Story AC modified or added | Living doc to feature file | Update or add the corresponding scenario | +| UI refactored (selector / method renamed) | Step text to PageObject | Update step text and `@AC:` tag if scenario intent changed; for the PageObject side of the rename (method signature or locator), load `living-doc-pageobject-scan` HEALING scope — this skill owns only the Gherkin step text, not the PageObject code | +| US deprecated | Living doc to feature file | Emit one sync action per linked scenario; add `@deprecated`, record the reason, and flag `@review-needed` | +| Scenario added without an `@AC:` tag | Feature file to living doc | Propose an AC and add the `@AC:` tag | + +--- + +## Step 2 — Audit `@AC:` traceability tags + +> **Authoritative source:** The `@AC:` format is defined in `living-doc-scenario-creator`. The spec below is a reference copy for sync validation — load `living-doc-scenario-creator` for the canonical definition. 
+ +**Required traceability format** for living-doc feature files (from the glossary): + +```gherkin +# AC:US-1-01 (v1.0.0 - Active) — customer places an order with a saved payment method +@AC:US-1-01 +Scenario: Customer successfully places an order +``` + +With aspect param — when the scenario covers only one aspect of a multi-aspect AC: + +```gherkin +# AC:US-1-01 (v1.0.0 - Active) — displays {required field} on login screen | aspect: username input +@AC:US-1-01/aspect:username-input +Scenario: Login form shows the username input field +``` + +- `# AC:` comment: human-readable context — ID, version, state, description, optional aspect. +- `@AC:` Cucumber tag: `@AC:[/param:value...]` — machine-readable link. The `/param:value` format is extensible. +- The `@AC:` tag(s) must appear on the lines immediately above `Scenario:` or `Scenario Outline:`. Additional tags (e.g. `@Regression`, `@skip`) may appear in the same block. +- Full AC details (version, state, description) live in the file's `# Acceptance Criteria:` header block. + +**Deprecated US detection (Direction 4):** When the trigger is a deprecated User Story, first +build the list of affected scenarios before running the standard checklist: + +1. Collect all AC IDs owned by the deprecated US (from the US entity or its feature file header). +2. Search all `.feature` files under `features/us/` and `features/functionalities/` for + `@AC:` tags matching those IDs. +3. For each matching scenario, emit a SYNC ACTION to add `@deprecated` and `@review-needed`, + with a comment recording the deprecation date and reason from the US entity. +4. After tagging, continue with the standard checklist below to catch any remaining link issues. + +**Audit checklist:** +1. Does every `Scenario:` / `Scenario Outline:` in living-doc files have at least one `@AC:` tag? +2. Is the corresponding `# AC:` comment present and matching the tag's AC ID? +3. Does the referenced AC ID exist in the living documentation? +4. 
Does the AC state match (`ACTIVE` — not `DEPRECATED`, `PLANNED`, or `IN_REVIEW`)? +5. Does the AC description (in the file header) match the scenario intent? + +For each missing or mismatched tag: + +``` +SYNC ACTION: checkout.feature:14 + Scenario: "Customer successfully places an order" + Missing @AC: tag + Proposed tag: @AC:US-001-01 + Confirm or select a different AC +``` + +## Step 3 — Detect step text drift + +When step text changes after a UI refactor, the step definition binding breaks: + +``` +DRIFT DETECTED: checkout.feature:17 + Step: "When the customer clicks the Confirm Purchase button" + No matching step definition found + Previous match: "When the customer confirms the order" (checkout_steps.py:34) + PageObject method: CheckoutPage.confirm_order() + Suggested fix: update step text to "When the customer confirms the order" + OR update the step definition regex to match the new wording +``` + +> **Scope boundary with `living-doc-pageobject-scan` HEALING:** This step corrects step text in `.feature` files and step definition pattern strings. If the underlying PageObject selector or method signature drifted (renamed in the DOM or PageObject class), use `living-doc-pageobject-scan` HEALING mode to fix the PageObject class first, then re-run this sync to align feature files. +> +> **Step definition code changes:** When a step definition regex pattern must be updated (not just the feature file wording), load `gherkin-step` to apply the code change correctly. + +--- + +## Step 4 — Apply sync changes + +Apply the minimum necessary change per action: + +- **Add missing `@AC:` tag**: insert `@AC:` above `Scenario:` +- **Update stale AC reference**: update the file header's `# Acceptance Criteria:` block entry; the `@AC:` tag on the scenario stays unchanged. Show the exact change as `OLD:` and `NEW:` lines. If the revised AC intent changed materially, flag the linked step text for review instead of restructuring the scenario in the same sync action. 
+- **Update scenario to match revised AC**: update step text; keep the `@AC:` tag unchanged +- **Fix broken step text**: prefer updating the `.feature` file to match the existing step definition and PageObject method; only update the step definition regex when the business wording genuinely changed +- **Mark deprecated scenarios**: add `@deprecated` and `@review-needed`, plus a comment with the date and reason. Emit one action per affected scenario with file and line number. +- **Mark descoped scenarios**: add `@wip` or `@pending` and `@review-needed`, plus a comment with the descope reason and target-release reference. Preserve the scenario — never delete it — so it can be reinstated when the AC is promoted back to Active. Emit one SYNC ACTION per affected scenario. +- **Broken AC reference**: never silently remove the `@AC:` tag. Either relink it to the correct AC ID, or create the missing living doc entity with `living-doc-create-user-story` / `living-doc-create-functionality`, then update the tag. +- **AC split into multiple ACs**: update the existing scenario's `@AC:` tag to the primary AC; create new scenarios for additional ACs + +Never delete a scenario during sync — flag it with `@review-needed` for developer decision. + +--- + +## Step 5 — Output sync report + +Do **not** apply sync changes automatically. Report `DRIFT DETECTED` blocks first (tests fail), then `SYNC ACTION` blocks (traceability), and ask the developer to confirm each action before editing files. + +``` +DRIFT DETECTED: checkout.feature:17 + Step: "When the customer clicks the Confirm Purchase button" + No matching step definition found + Previous match: "When the customer confirms the order" (checkout_steps.py:34) + PageObject method: CheckoutPage.confirm_order() + Recommended fix: update the feature file step text to match the existing step definition + OR update the step definition regex to match the new wording + Apply change? 
(y/n) + +SYNC ACTION: checkout.feature:14 + Scenario: "Customer successfully places an order" + Missing @AC: tag + Proposed tag: @AC:US-001-01 + Apply change? (y/n) + +SYNC ACTION: checkout.feature:32 + Scenario: "Customer reviews order totals before payment" + Missing @AC: tag + Proposed tag: @AC:US-001-02 + Apply change? (y/n) + +Summary: 2 missing AC links, 1 step text drift detected — apply changes? (y/n per action) +``` + +--- + +## Anti-patterns to flag + +| Anti-pattern | Flag | +|---|---| +| Scenario with no `@AC:` tag | Missing traceability — add tag or create AC | +| Two scenarios linked to the same AC | Usually a duplicate — review | +| AC linked from a scenario in a different User Story's feature file | Passive cross-US coverage — permitted but note it in the sync report. Only flag if the scenario's primary intent belongs to a different User Story (misplaced scenario) | +| Step text describes implementation (selector, endpoint) | Gherkin business-language violation — refer to `living-doc-scenario-creator` | + +--- + +## Out-of-scope routing + +| Request | Use instead | +|---|---| +| Writing new Gherkin scenarios from scratch | `living-doc-scenario-creator` | +| Implementing step definition code | `gherkin-step` | +| Finding ACs with no scenario coverage | `living-doc-gap-finder` | +| Creating new User Story, Feature, or Functionality entities | `living-doc-create-user-story` / `living-doc-create-functionality` | diff --git a/skills/gherkin-living-doc-sync/evals/evals.json b/skills/gherkin-living-doc-sync/evals/evals.json new file mode 100644 index 0000000..2aa367b --- /dev/null +++ b/skills/gherkin-living-doc-sync/evals/evals.json @@ -0,0 +1,190 @@ +{ + "skill_name": "gherkin-living-doc-sync", + "evals": [ + { + "id": 1, + "category": "happy-path", + "prompt": "checkout.feature has 3 scenarios but none of them have a # AC: comment above them. 
What should I do?", + "expected_output": "Agent identifies all three scenarios as missing AC link headers (sync direction: feature file → living doc). For each scenario, it proposes a matching AC from the living doc catalog and outputs a SYNC ACTION block per scenario showing the proposed # AC: line to insert. Format: 'SYNC ACTION: checkout.feature: — Missing AC link header — Proposed link: # AC: () — '. Asks the developer to confirm each mapping before applying.", + "files": [], + "expectations": [ + "Identifies all scenarios missing # AC: comments", + "Proposes a matching AC from the catalog for each scenario", + "Outputs a SYNC ACTION block for each affected scenario", + "Does not apply changes without developer confirmation", + "Does not delete any scenario" + ] + }, + { + "id": 2, + "category": "happy-path", + "prompt": "The product owner changed the description of AC:US-001-01 in the living doc. The linked scenario in checkout.feature still has the old # AC: description text. How do I fix this?", + "expected_output": "Sync direction is living doc → feature file. Agent updates the # AC: comment text above the linked scenario to reflect the new AC description. The AC ID itself is never changed — only the description text in the comment. Outputs: OLD: '# AC: US-001-01 (v1.0.0 – Active) — ' and NEW: '# AC: US-001-01 (v1.1.0 – Active) — '. Flags any step text that may also need updating to match the revised AC intent.", + "files": [], + "expectations": [ + "Sync direction: living doc → feature file", + "Updates comment description text only — AC ID remains stable", + "Shows old and new comment text clearly labelled", + "Flags step text for review if the AC intent changed significantly", + "Does not change the scenario structure" + ] + }, + { + "id": 3, + "category": "happy-path", + "prompt": "After renaming a button's data-testid, the step 'When the customer clicks the Confirm Purchase button' no longer matches any step definition. 
The old step was 'When the customer confirms the order'. How do I fix this?", + "expected_output": "Sync direction: step text → PageObject. Agent outputs a DRIFT DETECTED block: identifies the broken step text, the previous matching step definition, the PageObject method it delegates to (CheckoutPage.confirm_order()), and proposes two fix options: (1) update the .feature file step text to match the existing step definition, or (2) update the step definition regex to match the new wording. Recommends option 1 (update feature file) as the lower-risk change since the PageObject method and step definition are already working.", + "files": [], + "expectations": [ + "Outputs DRIFT DETECTED block with affected file and line number", + "Identifies the broken step text and the expected step definition", + "Links the step to its PageObject method", + "Provides two fix options", + "Recommends updating the feature file as the lower-risk fix" + ] + }, + { + "id": 4, + "category": "regression", + "prompt": "User Story US-042 has been deprecated in the living doc. There are 3 scenarios in promo.feature linked to ACs of US-042. What happens to these scenarios?", + "expected_output": "When a US is deprecated, the sync direction is living doc → feature file. Agent proposes: add @deprecated tag to each affected scenario; add a comment above each with the reason (e.g. '# Deprecated: US-042 was deprecated on 2026-05-15 — reason: feature removed'). Never deletes scenarios — marks with @review-needed for developer decision. 
Lists all 3 affected scenarios with their line numbers.", + "files": [], + "expectations": [ + "Adds @deprecated tag to each scenario linked to the deprecated US", + "Adds a comment explaining the reason for deprecation", + "Never deletes scenarios — marks with @review-needed", + "Lists all affected scenarios with file and line number" + ] + }, + { + "id": 5, + "category": "negative", + "prompt": "I need a new Gherkin scenario for the case where a promo code has expired.", + "expected_output": "Writing new scenarios is out of scope for this skill — routes to living-doc-scenario-creator. gherkin-living-doc-sync corrects existing links and syncs existing scenarios; it does not write new scenarios from scratch.", + "files": [], + "expectations": [ + "Does not write a new scenario", + "Routes to living-doc-scenario-creator", + "Explains the distinction: sync vs. write new" + ] + }, + { + "id": 6, + "category": "paraphrase", + "prompt": "After the UI redesign our feature files are a mess — step text is broken and half the AC links point to the wrong things. Where do I start?", + "expected_output": "Agent identifies this as a multi-direction sync problem. Recommends running scan_ac_links.py first to get a full audit before applying changes. Then: (1) run step text drift detection for all .feature files to find broken step bindings; (2) audit # AC: headers for correctness; (3) apply the minimum change per sync action. Outputs a prioritised repair plan: broken steps (risk: tests fail) before stale AC links (risk: traceability gaps).", + "files": [], + "expectations": [ + "Recommends scan_ac_links.py as first action", + "Detects both step text drift and stale AC links", + "Prioritises broken steps (test failures) over stale links (traceability)", + "Applies minimum necessary change per action" + ] + }, + { + "id": 7, + "category": "edge-case", + "prompt": "A scenario has '# AC: US-099-01' in its header but US-099 does not exist in the living doc catalog. 
What should I do?", + "expected_output": "This is a broken AC reference. Agent outputs a SYNC ACTION flagging the scenario with the broken link. Resolution options: (1) find the correct AC ID in the catalog that matches this scenario's intent and update the # AC: comment; (2) if the behavior is new and has no AC, invoke living-doc-create-user-story or living-doc-create-functionality to create the missing entity, then link the scenario. Never silently removes the # AC: comment — that would destroy the traceability intent.", + "files": [], + "expectations": [ + "Detects the broken AC reference (AC ID not in catalog)", + "Provides two resolution options: find correct AC or create missing entity", + "Does not silently remove the # AC: comment", + "Routes entity creation to the appropriate living-doc-create-* skill" + ] + }, + { + "id": 8, + "category": "output-format", + "prompt": "Show me what the output of a sync run looks like when there are 2 missing AC links and 1 step text drift.", + "expected_output": "Output contains: (1) a SYNC ACTION block for each missing AC link — showing the feature file path, line number, scenario title, and proposed # AC: header; (2) a DRIFT DETECTED block for the broken step — showing the old step text, the expected step text, the PageObject method, and the suggested fix. Each block has a clear header (SYNC ACTION / DRIFT DETECTED). A summary line at the end: '2 missing AC links, 1 step text drift detected — apply changes? 
(y/n per action)'.", + "files": [], + "expectations": [ + "SYNC ACTION blocks for each missing AC link with file, line, scenario, and proposed fix", + "DRIFT DETECTED block with old step, expected step, PageObject method, and fix options", + "Each block has a distinct labelled header", + "Summary line with counts at the end", + "Asks for confirmation before applying" + ] + }, + { + "id": 9, + "category": "happy-path", + "prompt": "What is the difference between the '@AC:US-001-01' Cucumber tag and the '# AC:US-001-01 ...' comment above a scenario? Do I need both?", + "expected_output": "Yes — both serve distinct purposes. The '# AC: ...' comment provides human-readable context: the AC version, state, description, and optional aspect annotation, all visible at a glance in the feature file. The '@AC:' Cucumber tag provides machine-readable traceability: coverage scripts, gap finders, and CI tools use it to link scenarios to ACs programmatically. Both must be present on every scenario for full traceability. The sync skill ensures both are aligned — if only one is present, a SYNC ACTION is raised.", + "files": [], + "expectations": [ + "Explains # AC: comment as human-readable context with version, state, and description", + "Explains @AC: tag as machine-readable traceability for scripts and CI", + "States both are required on every scenario", + "Notes sync skill raises a SYNC ACTION when only one is present" + ] + }, + { + "id": 10, + "category": "happy-path", + "prompt": "I want to run a full audit of all AC link headers across all feature files before starting a sync. How do I use scan_ac_links.py?", + "expected_output": "Run: 'python scripts/scan_ac_links.py features/ --catalog catalog.json'. 
The script scans all .feature files for '# AC:' comments and '@AC:' tags, validates each AC ID against the living doc catalog, and outputs a report listing: (1) missing # AC: comments (scenarios with no link header), (2) stale AC IDs (IDs not found in catalog), (3) mismatched comment/tag pairs (one present but not the other). Use this report as input to the sync workflow — address broken links first, then missing links, then mismatches.", + "files": [], + "expectations": [ + "Names the correct command: python scripts/scan_ac_links.py features/ --catalog catalog.json", + "Explains what the script reports: missing headers, stale IDs, mismatched pairs", + "Recommends repair order: broken links → missing links → mismatches", + "Notes the script output drives the subsequent sync workflow" + ] + }, + { + "id": 11, + "category": "regression", + "prompt": "A scenario has '@AC:US-001-01/aspect:username-input' but the # AC: comment above it just reads '# AC:US-001-01 (v1.0.0 – Active) — displays login form fields'. Is this a sync issue?", + "expected_output": "Yes — this is a SYNC ACTION. The @AC: tag encodes an aspect param (/aspect:username-input) but the # AC: comment does not mirror it. Per the skill spec, the comment must include '| aspect: username input' to match the tag. Apply: change the comment to '# AC:US-001-01 (v1.0.0 – Active) — displays login form fields | aspect: username input'. The comment mirrors the human-readable form of the tag's /aspect: param. Confirm before applying.", + "files": [], + "expectations": [ + "Identifies mismatch between @AC: tag aspect param and # AC: comment as a SYNC ACTION", + "Provides the corrected # AC: comment with | aspect: annotation", + "Comment mirrors the /aspect:value in human-readable form", + "Asks for confirmation before applying" + ] + }, + { + "id": 12, + "category": "edge-case", + "prompt": "I have a scenario linked to AC:US-042-03 which has just been descoped in the living doc (status: descoped). 
What should the sync skill do?", + "expected_output": "Sync direction: living doc → feature file. When an AC is descoped, the linked scenario should be tagged with @wip or @pending to indicate it is not active. Add a comment above the scenario explaining the descope: '# Descoped: AC:US-042-03 descoped on 2026-05-15 — reason: promo stacking deferred to sprint-52'. Never delete the scenario — it preserves the intent for when the AC is reinstated. Flag the scenario for @pending or @wip tagging in the CI pipeline.", + "files": [], + "expectations": [ + "Sync direction: living doc → feature file", + "Tags scenario with @wip or @pending — does not delete it", + "Adds a comment with the descope reason and date", + "Preserves the scenario for when the AC is reinstated", + "Flags for @pending tagging in CI" + ] + }, + { + "id": 13, + "category": "edge-case", + "prompt": "AC:US-001-02 was split into two separate ACs: AC:US-001-02 (happy path) and AC:US-001-04 (alt path: guest checkout). The feature file still has one scenario linked to AC:US-001-02. What does the sync output look like?", + "expected_output": "Two SYNC ACTIONs are emitted. (1) Existing scenario: the '@AC:US-001-02' tag is updated to the primary (happy path) AC — it stays unchanged since it was always the primary AC. (2) A SYNC ACTION proposes creating a new scenario for AC:US-001-04 with the required '# AC: US-001-04' header and '@AC:US-001-04' tag. The existing scenario is never modified or deleted. 
Developer confirms both actions before any change is applied.", + "files": [], + "expectations": [ + "Existing scenario's @AC: tag is updated to point to the primary AC (US-001-02)", + "A SYNC ACTION is raised proposing a new scenario for AC:US-001-04", + "The existing scenario is not deleted or modified beyond the tag update", + "Developer confirmation is required before any file is edited" + ] + }, + { + "id": 14, + "category": "happy-path", + "prompt": "During a sync pass I discover that the step text for AC:US-007-03 changed AND the PageObject method that backs it was renamed. What should I do for each part?", + "expected_output": "Split the work: (1) gherkin-living-doc-sync updates the step text and the @AC: tag in the .feature file — this skill handles Gherkin text sync. (2) The PageObject method rename (signature and locator) is owned by living-doc-pageobject-scan HEALING scope — load that skill for the PO side. Do not attempt to rename PageObject methods inside this sync skill.", + "files": [], + "expectations": [ + "Correctly splits Gherkin-side vs PageObject-side work", + "Routes PageObject method rename to living-doc-pageobject-scan HEALING", + "Does not attempt to rename PO methods within gherkin-living-doc-sync" + ] + } + ] +} \ No newline at end of file diff --git a/skills/gherkin-living-doc-sync/evals/fixture-map.md b/skills/gherkin-living-doc-sync/evals/fixture-map.md new file mode 100644 index 0000000..bb7c892 --- /dev/null +++ b/skills/gherkin-living-doc-sync/evals/fixture-map.md @@ -0,0 +1,33 @@ +# Fixture Map — gherkin-living-doc-sync + +## Fixture files + +No fixture files for this skill. All evals are conversational — the skill operates on feature files and a living doc catalog referenced by path, not by inline fixture content. 
+ +## Eval to fixture mapping + +| Eval ID | Category | Fixture file(s) | Coverage | +|---|---|---|---| +| 1 | happy-path | _(none — conversational)_ | Missing # AC: headers in checkout.feature — SYNC ACTION blocks per scenario | +| 2 | happy-path | _(none)_ | AC description updated in living doc → propagate to # AC: comment in feature file | +| 3 | happy-path | _(none)_ | Step text drift after UI rename → DRIFT DETECTED block with two fix options | +| 4 | regression | _(none)_ | US deprecated in living doc → @deprecated + @review-needed tags on linked scenarios | +| 5 | negative | _(none)_ | Routing: new scenario authoring → living-doc-scenario-creator | +| 6 | paraphrase | _(none)_ | "Feature files are a mess after redesign" → prioritised repair plan: steps first, then links | +| 7 | edge-case | _(none)_ | Broken AC reference (US-099 not in catalog) → resolution options, never remove the link | +| 8 | output-format | _(none)_ | Sync run output format: SYNC ACTION + DRIFT DETECTED blocks + summary line | +| 9 | happy-path | _(none)_ | @AC: Cucumber tag vs # AC: comment — both required, each serves a distinct purpose | +| 10 | happy-path | _(none)_ | scan_ac_links.py audit command and output interpretation | +| 11 | regression | _(none)_ | Aspect param mismatch: @AC: tag has /aspect: but # AC: comment does not mirror it | +| 12 | edge-case | _(none)_ | Descoped AC: tag scenario @wip/@pending, add comment, never delete | +| 13 | edge-case | _(none)_ | AC split into two ACs → existing scenario stays on the primary AC; new scenario proposed for the additional AC | +| 14 | happy-path | _(none)_ | Step text and PageObject method both changed → Gherkin side handled here; PageObject rename routed to living-doc-pageobject-scan HEALING | + +## Trigger eval summary + +30 entries: 15 `should_trigger=true`, 15 `should_trigger=false` + +| Routes to | Query count | +|---|---| +| living-doc-scenario-creator | 3 | +| gherkin-step | 2 | +| living-doc-gap-finder | 2 | +| living-doc-create-user-story | 2 | +| living-doc-pageobject-scan | 1 | +| living-doc-create-functionality | 1 | +| living-doc-update | 1 | +| living-doc-create-feature | 1 | +| living-doc-impact-analysis | 1 | +| data-cy-instrument | 1 | diff --git a/skills/gherkin-living-doc-sync/evals/trigger-eval.json b/skills/gherkin-living-doc-sync/evals/trigger-eval.json new file mode 100644 index 0000000..f765e4a --- /dev/null +++ b/skills/gherkin-living-doc-sync/evals/trigger-eval.json @@ -0,0 +1,182 @@ +[ + { + "id": 1, + "query": 
"Sync the checkout feature file to the living doc", + "should_trigger": true, + "reason": "'sync gherkin to living doc' trigger phrase" + }, + { + "id": 2, + "query": "My feature file is out of sync with the living doc catalog", + "should_trigger": true, + "reason": "'feature file out of sync' trigger phrase" + }, + { + "id": 3, + "query": "This scenario has no # AC: comment linking it to the living doc", + "should_trigger": true, + "reason": "'scenario not linked to AC' trigger phrase" + }, + { + "id": 4, + "query": "The step text changed after the UI refactor — what needs updating?", + "should_trigger": true, + "reason": "'step text changed' trigger phrase" + }, + { + "id": 5, + "query": "There is Gherkin drift between the feature files and the living doc", + "should_trigger": true, + "reason": "'gherkin drift' trigger phrase" + }, + { + "id": 6, + "query": "I updated an AC in the living doc — how do I propagate that to the BDD scenario?", + "should_trigger": true, + "reason": "'update living doc after BDD change' and living-doc → feature file sync direction" + }, + { + "id": 7, + "query": "Run a BDD sync between the feature files and living documentation", + "should_trigger": true, + "reason": "'BDD sync' trigger phrase" + }, + { + "id": 8, + "query": "The AC link header is missing from several scenarios in checkout.feature", + "should_trigger": true, + "reason": "'AC link missing in feature file' trigger phrase" + }, + { + "id": 9, + "query": "Sync all scenarios in the payments feature file", + "should_trigger": true, + "reason": "'sync scenarios' trigger phrase" + }, + { + "id": 10, + "query": "The Gherkin scenarios are out of sync with the living doc", + "should_trigger": true, + "reason": "'gherkin out of sync with living doc' trigger phrase" + }, + { + "id": 11, + "query": "Traceability is broken between the feature files and the AC catalog", + "should_trigger": true, + "reason": "'traceability broken' trigger phrase" + }, + { + "id": 12, + "query": "Write 
a new scenario for the expired promo AC", + "should_trigger": false, + "reason": "Writing new scenarios — routes to living-doc-scenario-creator" + }, + { + "id": 13, + "query": "Implement the step definition for 'When the customer confirms the order'", + "should_trigger": false, + "reason": "Step definition implementation — routes to gherkin-step" + }, + { + "id": 14, + "query": "Find which User Stories have no Gherkin scenarios", + "should_trigger": false, + "reason": "Finding living doc gaps — routes to living-doc-gap-finder" + }, + { + "id": 15, + "query": "Create a new User Story for the checkout capability", + "should_trigger": false, + "reason": "Creating new entities — routes to living-doc-create-user-story" + }, + { + "id": 16, + "query": "Propagate AC changes from the living doc back to the feature files", + "should_trigger": true, + "reason": "'propagate AC changes' trigger phrase" + }, + { + "id": 17, + "query": "The @AC: tag and the # AC: comment are out of sync — what do I do?", + "should_trigger": true, + "reason": "Comment/tag mismatch is a sync issue — core task of this skill" + }, + { + "id": 18, + "query": "Generate a new scenario for the expired promo AC from scratch", + "should_trigger": false, + "reason": "Writing new scenarios from scratch — routes to living-doc-scenario-creator (not syncing existing ones)" + }, + { + "id": 19, + "query": "Run scan_ac_links.py before doing a sync pass", + "should_trigger": true, + "reason": "Auditing AC link headers is the first step of the sync workflow — this skill owns scan_ac_links.py" + }, + { + "id": 20, + "query": "An AC was descoped last sprint — what should happen to the linked scenario?", + "should_trigger": true, + "reason": "Propagating AC status change (descoped) to feature file is a living-doc → feature file sync direction" + }, + { + "id": 21, + "query": "Write behave step definitions for the checkout scenarios", + "should_trigger": false, + "reason": "Writing step definitions — routes to 
gherkin-step" + }, + { + "id": 22, + "query": "Create a new User Story for the express checkout journey", + "should_trigger": false, + "reason": "Creating a User Story — routes to living-doc-create-user-story" + }, + { + "id": 23, + "query": "Run a full gap analysis to find undocumented behaviors", + "should_trigger": false, + "reason": "Gap analysis — routes to living-doc-gap-finder" + }, + { + "id": 24, + "query": "Scan the webapp and generate PageObjects for the checkout screen", + "should_trigger": false, + "reason": "PageObject scanning — routes to living-doc-pageobject-scan" + }, + { + "id": 25, + "query": "Create a new Functionality entity for promo stacking validation", + "should_trigger": false, + "reason": "Creating a Functionality — routes to living-doc-create-functionality" + }, + { + "id": 26, + "query": "Update the acceptance criterion wording on US-007-01", + "should_trigger": false, + "reason": "Updating an existing entity — routes to living-doc-update" + }, + { + "id": 27, + "query": "Create a Feature entity for the notifications service", + "should_trigger": false, + "reason": "Creating a Feature — routes to living-doc-create-feature" + }, + { + "id": 28, + "query": "What does PR #217 affect in the living doc?", + "should_trigger": false, + "reason": "Impact analysis — routes to living-doc-impact-analysis" + }, + { + "id": 29, + "query": "Generate BDD scenarios for all active ACs on US-009", + "should_trigger": false, + "reason": "Generating scenarios — routes to living-doc-scenario-creator" + }, + { + "id": 30, + "query": "Add data-cy attributes to the checkout confirm button", + "should_trigger": false, + "reason": "Instrumenting templates with data-cy — routes to data-cy-instrument" + } +] \ No newline at end of file diff --git a/skills/gherkin-living-doc-sync/scripts/scan_ac_links.py b/skills/gherkin-living-doc-sync/scripts/scan_ac_links.py new file mode 100644 index 0000000..cfd1f98 --- /dev/null +++ 
b/skills/gherkin-living-doc-sync/scripts/scan_ac_links.py @@ -0,0 +1,223 @@ +#!/usr/bin/env python3 +""" +scan_ac_links.py — scan living-doc .feature files for missing or malformed @AC: traceability. + +Usage: + python scan_ac_links.py + +Only scans files under 'features/us/' and 'features/functionalities/' — living-doc paths. +Other feature files (smoke tests, regression suites, exploratory probes) are skipped. + +For every Scenario: / Scenario Outline: line found in living-doc files, checks that: + - At least one '@AC:' Cucumber tag appears on a tag line immediately above it (error) + - A matching '# AC:' human-readable comment is also present (warning) + - The AC ID follows the canonical format: AC:- + e.g. AC:US-001-01, AC:US-1-01, AC:FEAT-003-02, AC:FUNC-001-03 + - No two scenarios in the same file reference the same AC ID (duplicate check) + +Exit code: 0 if all checks pass, 1 if any errors are found (warnings do not fail). + +Glossary reference: skills/references/living-doc-glossary.md +""" + +import re +import sys +from pathlib import Path + +# Matches a @AC: Cucumber tag with optional /param:value segments: +# @AC:US-1-01 or @AC:US-001-01/aspect:username-input/coverage:partial +AC_TAG = re.compile( + r"@AC:((?:US|FEAT|FUNC)-\d+-\d{2})((?:/[a-z][\w-]*:[^\s/@]+)*)", + re.IGNORECASE, +) +# Matches a # AC: human-readable comment: # AC:US-1-01 or # AC:US-1-01 (...) 
+AC_COMMENT_LINE = re.compile(r"^\s*#\s*AC:((?:US|FEAT|FUNC)-\d+-\d{2})", re.IGNORECASE) +# Canonical AC ID only (no params): AC:- +AC_ID_FORMAT = re.compile(r"^AC:(US|FEAT|FUNC)-\d+-\d{2}$", re.IGNORECASE) +TAG_LINE = re.compile(r"^\s*@\S+") +COMMENT_LINE = re.compile(r"^\s*#") +SCENARIO_LINE = re.compile(r"^\s*(Scenario:|Scenario Outline:)\s*(.+)", re.IGNORECASE) + +# Living-doc path components — only files under these directories are scanned +LIVING_DOC_PATHS = ("features/us/", "features/functionalities/") + + +def is_living_doc_file(path: Path) -> bool: + """Return True if the file is in a living-doc feature directory.""" + normalised = str(path).replace("\\", "/") + return any(segment in normalised for segment in LIVING_DOC_PATHS) + + +def get_tags_above(lines: list[str], scenario_index: int) -> list[str]: + """Return all Cucumber tag tokens from consecutive tag lines immediately above a scenario.""" + tags: list[str] = [] + i = scenario_index - 1 + while i >= 0 and TAG_LINE.match(lines[i]): + tags.extend(re.findall(r"@\S+", lines[i])) + i -= 1 + return tags + + +def get_ac_comments_above(lines: list[str], scenario_index: int) -> set[str]: + """Return AC IDs mentioned in # AC: comments in the tag/comment block above a scenario.""" + ac_ids: set[str] = set() + i = scenario_index - 1 + while i >= 0 and (TAG_LINE.match(lines[i]) or COMMENT_LINE.match(lines[i])): + m = AC_COMMENT_LINE.match(lines[i]) + if m: + ac_ids.add(m.group(1).upper()) + i -= 1 + return ac_ids + + +def scan_file(path: Path) -> list[dict]: + issues = [] + lines = path.read_text(encoding="utf-8").splitlines() + seen: dict[str, list[int]] = {} + + for i, line in enumerate(lines): + if not SCENARIO_LINE.match(line): + continue + + lineno = i + 1 + scenario_title = SCENARIO_LINE.match(line).group(2).strip() + tags_above = get_tags_above(lines, i) + ac_tags = [t for t in tags_above if t.upper().startswith("@AC:")] + + if not ac_tags: + issues.append({ + "file": str(path), + "line": lineno, + 
"scenario": scenario_title, + "issue": "missing_ac_tag", + "severity": "error", + "detail": "No '@AC:' tag on the tag line(s) immediately above this scenario.", + }) + continue + + ac_comments = get_ac_comments_above(lines, i) + + for tag in ac_tags: + # Extract AC ID and optional /param:value segments (AC_TAG expects the leading '@') + m = AC_TAG.match(tag) + if not m: + issues.append({ + "file": str(path), + "line": lineno, + "scenario": scenario_title, + "issue": "malformed_ac_id", + "severity": "error", + "detail": ( + f"'{tag}' does not match @AC:-[/param:value] format " + "(e.g. @AC:US-1-01, @AC:US-001-01/aspect:username-input)." + ), + }) + continue + ac_id_raw = "AC:" + m.group(1) # reconstruct full AC ID + if not AC_ID_FORMAT.match(ac_id_raw): + issues.append({ + "file": str(path), + "line": lineno, + "scenario": scenario_title, + "issue": "malformed_ac_id", + "severity": "error", + "detail": ( + f"'{ac_id_raw}' does not match AC:- format " + "(e.g. AC:US-001-01, AC:US-1-01)." + ), + }) + continue + plain_id = m.group(1).upper() # e.g. US-1-01 + seen.setdefault(ac_id_raw.upper(), []).append(lineno) + # Warn if the human-readable # AC: comment is missing for this tag + if plain_id not in {c.upper() for c in ac_comments}: + issues.append({ + "file": str(path), + "line": lineno, + "scenario": scenario_title, + "issue": "missing_ac_comment", + "severity": "warning", + "detail": ( + f"@AC:{plain_id} tag is present but no matching '# AC:{plain_id} ...' " + "human-readable comment was found above the scenario." + ), + }) + + for ac_id, lines_found in seen.items(): + if len(lines_found) > 1: + issues.append({ + "file": str(path), + "line": lines_found, + "scenario": None, + "issue": "duplicate_ac_link", + "detail": ( + f"AC '{ac_id}' is linked from {len(lines_found)} scenarios " + f"at lines {lines_found}. Each AC should map to at most one scenario." 
+ ), + "severity": "error", + }) + + return issues + + +def main(features_dir: str) -> None: + root = Path(features_dir) + if not root.exists(): + print(f"Error: directory not found: {features_dir}") + sys.exit(1) + + all_files = sorted(root.rglob("*.feature")) + feature_files = [f for f in all_files if is_living_doc_file(f)] + skipped = len(all_files) - len(feature_files) + + if skipped: + print(f"Skipped {skipped} non-living-doc feature file(s) (smoke, regression, exploratory).") + + if not feature_files: + print(f"No living-doc .feature files found under {features_dir}") + print("Expected files under 'features/us/' or 'features/functionalities/'") + return + + all_issues: list[dict] = [] + for f in feature_files: + all_issues.extend(scan_file(f)) + + if not all_issues: + print(f"\u2705 All {len(feature_files)} living-doc feature file(s) pass AC link checks.") + return + + errors = [i for i in all_issues if i.get("severity") == "error"] + warnings = [i for i in all_issues if i.get("severity") == "warning"] + + by_type: dict[str, list] = {} + for issue in all_issues: + by_type.setdefault(issue["issue"], []).append(issue) + + print(f"Found {len(errors)} error(s) and {len(warnings)} warning(s) in {len(feature_files)} living-doc feature file(s):\n") + + labels = { + "missing_ac_tag": "[ERROR] MISSING @AC: TAG", + "malformed_ac_id": "[ERROR] MALFORMED AC ID", + "duplicate_ac_link": "[ERROR] DUPLICATE AC LINK", + "missing_ac_comment": "[WARN] MISSING # AC: COMMENT", + } + + for issue_type, items in sorted(by_type.items()): + print(f"{'=' * 60}") + print(f" {labels.get(issue_type, issue_type)} ({len(items)})") + print(f"{'=' * 60}") + for item in items: + loc = item["line"] if isinstance(item["line"], int) else item["line"][0] + print(f" {item['file']}:{loc}") + if item.get("scenario"): + print(f" Scenario: {item['scenario']}") + print(f" {item['detail']}") + print() + + sys.exit(1 if errors else 0) + + +if __name__ == "__main__": + if len(sys.argv) != 2: + print("Usage: python 
scan_ac_links.py ") + sys.exit(1) + main(sys.argv[1]) diff --git a/skills/gherkin-step/SKILL.md b/skills/gherkin-step/SKILL.md new file mode 100644 index 0000000..c6ddddc --- /dev/null +++ b/skills/gherkin-step/SKILL.md @@ -0,0 +1,365 @@ +--- +name: gherkin-step +description: > + Implement Gherkin step definitions that are clean, reusable, and maintainable. Activate when + writing or reviewing step definition code, binding Gherkin text to automation, managing shared + state between steps, configuring parameter types, parsing DataTable or DocString arguments, or + setting up Before/After hooks. Covers Python behave, Cucumber TypeScript/Java, and Cucumber-Scala. + Triggers on: "step definitions", "implement Gherkin steps", "Cucumber step", "behave step", + "parameter type", "DataTable", "DocString", "Before hook", "After hook", "World object", + "step context", "step state sharing", "how to share state between steps", + "register step definition", "hook setup". + Does NOT trigger for: writing Gherkin scenarios (use living-doc-scenario-creator); writing + unit tests (use your project's test framework). + Pairs with living-doc-scenario-creator and living-doc-pageobject-scan (PageObjects must + exist before step definitions reference them). +license: Apache-2.0 +compatibility: GitHub Copilot +--- + +# Gherkin Step Definition Standards + +> **Glossary:** Feature, PageObject, Functionality — see [living-doc-glossary](../references/living-doc-glossary.md) ([remote](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md)). + +> **Framework scope:** This skill covers step definition idioms for **Python behave**, **Cucumber TypeScript**, **Cucumber Java**, and **Cucumber-Scala**. The PageObject ecosystem in this toolkit uses **Playwright + TypeScript** — Python or Java projects must adapt PageObject patterns to their own test framework. All BDD principles (thin steps, no selectors in steps, context object) apply regardless of language. 
+ +## Respect the boundary with Gherkin text + +If the user asks to write or review a **Gherkin scenario / feature file**, do not draft the +scenario here. Explain that this skill covers **step definition code** only, then route the user to +`living-doc-scenario-creator` for the Gherkin text itself. + +--- + +## Context initialization — how PageObjects reach steps + +> **Prerequisite:** PageObject classes must exist before step definitions can reference them. If PageObjects have not yet been generated for the screens under test, use `living-doc-pageobject-scan` first to produce them. + +**Python behave:** Attributes set on `context` during a scenario are discarded when it ends, so each scenario starts clean. Attach PageObjects in a `before_scenario` hook, a plain function in `features/environment.py` that behave looks up by name (it is not a decorator). + +```python +# features/environment.py +# ✅ — behave calls before_scenario by name; it initialises the PageObject once per scenario +def before_scenario(context, scenario): + context.checkout_page = CheckoutPage(context.browser.new_page()) +``` + +**Cucumber TypeScript (Playwright):** Use a typed `World` class registered with `setWorldConstructor`. 
+ +```typescript +// world.ts +import { setWorldConstructor, World, IWorldOptions } from '@cucumber/cucumber'; +import { Browser, Page } from '@playwright/test'; +import { CheckoutPage } from './pages/checkout.page'; + +export interface AppWorld extends World { + browser: Browser; + page: Page; + checkoutPage: CheckoutPage; +} + +class AppWorldImpl extends World implements AppWorld { + browser!: Browser; + page!: Page; + checkoutPage!: CheckoutPage; + constructor(options: IWorldOptions) { super(options); } +} +setWorldConstructor(AppWorldImpl); +``` + +```typescript +// hooks.ts +import { Before } from '@cucumber/cucumber'; +import { chromium } from '@playwright/test'; +import { AppWorld } from './world'; +import { CheckoutPage } from './pages/checkout.page'; + +Before(async function (this: AppWorld) { + this.browser = await chromium.launch(); + this.page = await this.browser.newPage(); + this.checkoutPage = new CheckoutPage(this.page); +}); +``` + +**Step definition file naming:** +- One file per domain area: `checkout.steps.ts` / `checkout_steps.py` +- Place under `playwright/steps/` (TS) or `features/steps/` (Python) +- Never name a file `steps.ts` or `steps.py` — the name must identify the domain + +**Given precondition state — OGP-01:** `Given` preconditions that navigate to an arbitrary element using `.first()` (or any positional selector) without asserting the domain-specific state required by the scenario create false positives. If the scenario distinguishes between, for example, a domain the user owns versus one they do not own, supply fixture-provided IDs via the env fixture (`ownedDomainId`, `nonOwnedDomainId`) rather than picking the first element from a list. 
+ +```typescript +// ✅ — uses fixture-provided ID to guarantee correct ownership state +Given('I am on the Domain Detail page for a domain I own', async ({ page, env }) => { + await page.goto(`/auth/domain/${env.ownedDomainId}`); +}); + +// ❌ — both "own" and "do not own" variants resolve to the same arbitrary domain +Given('I am on the Domain Detail page for a domain I own', async ({ page }) => { + await page.goto('/auth/all-domains'); + await page.getByTestId('domain-name-link').first().click(); +}); +``` + +--- + +## Function naming convention + +Name step functions after the business action, not the full step text: +- `step_confirm_order` ✅ — concise, action-based +- `step_customer_confirms_the_order` ❌ — verbatim transcription of the step + +--- + +## Keep step definitions thin + +Step definitions are bindings — they translate Gherkin text into calls to PageObjects, domain +objects, or service clients. Business logic must not live in step definitions. + +**Keyword rules:** +- `Given` steps must not contain assertions — they set up preconditions only +- `When` steps must not contain assertions — they perform actions only +- Assertions belong exclusively in `Then` steps +- A step body consisting only of comments is a no-op and is not permitted as a final implementation — NOP-01. If the system pre-establishes state externally, the step must assert that state is actually present rather than silently pass. 
+ +```typescript +// ✅ — pre-populated state is explicitly asserted +When('I select a domain', async ({ page }) => { + // Domain is pre-populated from context; assert selector shows a value + await expect(page.getByTestId('domain-selector')).not.toBeEmpty(); +}); + +// ❌ — comment-only body; regression goes undetected +When('I select a domain', async ({ page }) => { + // Domain is pre-selected when navigated from within a domain context + // No additional action needed +}); +``` + +```python +# ✅ — thin; delegates to PageObject +@when('the customer confirms the order') +def step_confirm_order(context): + context.checkout_page.confirm_order() + +# ❌ — business logic embedded in the step +@when('the customer confirms the order') +def step_confirm_order(context): + context.cart.total *= (1 - context.discount / 100) + context.order_status = "placed" +``` + +--- + +## Encapsulate selectors in PageObjects + +Step definitions for domain-level scenarios must not contain CSS selectors, element IDs, or XPath. +Encapsulate all selector logic in PageObjects (selector preference: `data-testid` > `aria-label`/role > CSS class). + +```typescript +// ✅ — PageObject hides selector details +When("the customer submits the order", async function (this: OrderWorld) { + await this.checkoutPage.submitOrder(); // CheckoutPage owns the selector +}); + +// ❌ — selector leaks into the step definition +When("the customer submits the order", async function (this: OrderWorld) { + await this.page.click('[data-testid="submit-order-btn"]'); +}); +``` + +**Pending data-cy rule — SS-01:** Do not write CSS-class-OR-data-cy fallback combos (e.g. `'.modal, [data-cy="x"]'`) in step files or PageObjects. A fallback combo either always passes (the CSS class matches when the data-cy does not exist) or always fails (neither exists), both masking real failures. If the confirmed `data-cy` attribute does not yet exist in the template: +1. 
Use the most stable interim selector available and mark it with `// @pending data-cy: `. +2. Raise it as a gap in WORK_LOG.md §4 so it is tracked for instrumentation via `data-cy-instrument`. + +```typescript +// ✅ — interim selector clearly flagged +await expect(page.locator('[role="dialog"]')).toBeVisible(); // @pending data-cy: dialog-access-request + +// ❌ — fallback combo hides whether the real selector ever lands +await expect(page.locator('[role="dialog"], .access-request-form')).toBeVisible(); +``` + +--- + +## Share state using the context / World object + +Never use global or module-level variables — they cause test contamination across scenarios. +Use the framework-provided context object, which is instantiated fresh for each scenario. + +| Framework | State object | Pattern | +|-----------|-------------|---------| +| behave (Python) | `context` | Attach attributes: `context.order = ...` | +| Cucumber (TypeScript) | `World` class | Extend `World`; access via `this` | + +```python +# ✅ behave — context carries state across steps +@given('a customer with a "{tier}" membership') +def step_given_customer(context, tier): + context.customer = Customer(tier=tier) + +@then("the discount is {rate:d}%") +def step_assert_discount(context, rate): + assert context.customer.discount_rate() == rate +``` + +**Hardcoded assertion rule — HTA-01:** `Then` assertions must not contain string literals that were set in a preceding `When` step (magic constants). Pass the value through the World context or as a `{string}` Cucumber parameter, or assert a structural property instead. 
+ +```typescript +// ✅ — domain name flows through World context +When('I import a domain named {string}', async function (this: AppWorld, name: string) { + this.importedDomainName = name; + await this.importDomainPage.importDomain(name); +}); +Then('the imported domain is visible in the domain list', async function (this: AppWorld) { + await expect(this.page.getByTestId('domain-name-link').getByText(this.importedDomainName)).toBeVisible(); +}); + +// ❌ — hardcoded constant couples assertion to the When step's implementation detail +Then('the imported domain is visible in the domain list', async ({ page }) => { + await expect(page.getByTestId('domain-name-link').getByText('E2E Import Test')).toBeVisible(); +}); +``` + +--- + +## Use typed parameters + +**PTM-01 — `{string}` over `{word}` for UI labels:** Use `{string}` (quoted) for any step parameter that could contain spaces — tab names, button labels, section headings, status values. `{word}` matches only a single token without spaces and will silently fail to match multi-word values, and having both `{word}` and `{string}` variants in the same file causes Cucumber ambiguity errors. Remove all `{word}` variants and consolidate on `{string}`. 
+ +```typescript +// ✅ — {string} matches "Version management", "Run history", "About" +When('I click the {string} tab', async ({ domainDetailPage }, tab: string) => { + await domainDetailPage.gotoTab(tab); +}); + +// ❌ — {word} silently fails for "Version management" and "Run history" +When('I click the {word} tab', async ({ domainDetailPage }, tab: string) => { + await domainDetailPage.gotoTab(tab); +}); +``` + +```python +# ✅ — :d casts to int automatically +@when("the customer purchases {quantity:d} units") +def step_purchase(context, quantity: int): + context.cart.add_item(context.sku, quantity) +``` + +--- + +## Parse DataTable and DocString arguments + +```python +# ✅ — DataTable as list of dicts +@when("the customer adds the following items") +def step_add_items(context): + for row in context.table: + context.cart.add_item(row["sku"], int(row["quantity"])) + +# ✅ — DocString as raw text +@when("the system receives the following payload") +def step_receive_payload(context): + context.payload = json.loads(context.text) +``` + +--- + +## Configure hooks correctly + +| Hook | Use for | Must not use for | +|------|---------|-----------------| +| `before_scenario` / `Before` | Set up context state, seed data | Asserting behaviour | +| `after_scenario` / `After` | Cleanup: rollback DB, close browser | Seeding data | +| `before_all` / `BeforeAll` | Expensive one-time setup (start containers) | Per-test state | +| `after_all` / `AfterAll` | Stop containers, close connections | Per-test cleanup | + +`before_scenario` runs before **every** scenario by default, so add a tag check when setup +should only apply to a subset. When explaining this pattern, say explicitly that the hook still +fires for every scenario; the `if "database" in context.tags` check only gates the expensive setup. 
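As a rough illustration of the point above, the sketch below simulates behave invoking the hooks around two scenarios. It assumes behave's convention that hooks are plain functions named `before_scenario` and `after_scenario` in `features/environment.py`. `create_test_db`, `run_hooks`, and `setup_calls` are hypothetical helpers introduced only for this example, and the `SimpleNamespace` objects stand in for behave's real `context` and `scenario`.

```python
from types import SimpleNamespace

setup_calls = []  # names of scenarios that actually paid for the expensive setup


def create_test_db():
    """Hypothetical stand-in for an expensive database fixture."""
    return SimpleNamespace(teardown=lambda: None)


def before_scenario(context, scenario):
    # The hook body runs for every scenario; only tagged ones get a database.
    if "database" in context.tags:
        context.db = create_test_db()
        setup_calls.append(scenario.name)


def after_scenario(context, scenario):
    # Mirror the gate so untagged scenarios never touch context.db.
    if "database" in context.tags:
        context.db.teardown()


def run_hooks(name, tags):
    """Simulate the runner calling both hooks around one scenario."""
    context = SimpleNamespace(tags=set(tags))  # fake context, not behave's own
    scenario = SimpleNamespace(name=name)
    before_scenario(context, scenario)
    after_scenario(context, scenario)
    return hasattr(context, "db")
```

Calling `run_hooks` once with `{"database"}` and once with `{"smoke"}` passes both scenarios through the hooks, but only the tagged one ends up with `context.db` and an entry in `setup_calls`.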
+ +Tag hooks to scope them to specific scenarios, and pair setup with matching cleanup: + +```python +# features/environment.py: behave calls these hooks by name +def before_scenario(context, scenario): + if "database" in context.tags: + context.db = create_test_db() + +# Mirror the tag check so cleanup only runs where setup ran +def after_scenario(context, scenario): + if "database" in context.tags: + context.db.teardown() +``` + +--- + +## Wizard navigation rules + +Apply these rules when implementing step definitions for multi-step wizards. They detect +"cheat steps" — steps that appear to navigate a wizard but exercise no real behaviour. + +### CS-01 — Assert arrival at each wizard step + +Every wizard step navigation must verify arrival at the next step via a step-specific element +assertion before the step completes. Blind `continueButton.click()` chains without an arrival +assertion are forbidden: if the Continue button is disabled (validation failure), the click +silently does nothing and the test continues with a false pass. + +```typescript +// ✅ — arrival at the Owner step is explicitly verified +When('I complete the About step', async ({ createDomainAboutPage, createDomainOwnerPage }) => { + await createDomainAboutPage.fillDomainName('E2E Test Domain'); + await createDomainAboutPage.fillCostCenter('1234'); + await createDomainAboutPage.continueButton.click(); + await expect(createDomainOwnerPage.ownersTable).toBeVisible(); // arrival assertion +}); + +// ❌ — two blind clicks; no assertion that either step was actually reached +Given('I am on the Target dataset step', async ({ createDomainPage }) => { + await createDomainPage.continueButton.click(); + await createDomainPage.continueButton.click(); +}); +``` + +### CS-02 — Do not use `toHaveURL()` to detect wizard step progress in a scrolling stepper + +In a single-URL scrolling stepper the URL does not change between wizard steps. 
A +`toHaveURL(/step-name/)` assertion always passes regardless of which step is active, +giving false confidence. Assert the step-specific landmark element is visible instead. 
+ +```typescript +// ✅ — asserts the Owner step's landmark element is in view +await expect(createDomainOwnerPage.ownersTable).toBeVisible(); + +// ❌ — URL never changes; assertion always passes +await expect(page).toHaveURL(/owner/i); +``` + +Once `data-cy` attributes are added to wizard step headers, prefer: +```typescript +await expect(page.getByTestId('step-owner')).toBeVisible(); +``` + +### CS-03 — Do not use `page.goBack()` inside an SPA wizard + +`page.goBack()` navigates the browser's URL history, not the wizard's internal state. Inside +an Angular (or other SPA) wizard, this takes the user back to the *previous page* (e.g. All +Domains), not to the previous wizard step. Use the wizard's own Back button or click the +stepper step header to navigate backward. + +```typescript +// ✅ — uses the wizard's own back navigation +await createDomainWizardPage.backButton.click(); +await expect(createDomainAboutPage.domainNameInput).not.toBeEmpty(); + +// ❌ — navigates away from the wizard entirely +await page.goBack(); +await expect(createDomainPage.domainNameInput).not.toBeEmpty(); +``` + +--- + +## Out-of-scope routing + +| Request | Correct skill | +|---|---| +| Write or review Gherkin scenarios / feature files | `living-doc-scenario-creator` | +| Generate or update PageObject classes | `living-doc-pageobject-scan` | +| Sync `@AC:` traceability tags in feature files | `gherkin-living-doc-sync` | +| Write unit tests | Use your project's test framework directly | diff --git a/skills/gherkin-step/evals/evals.json b/skills/gherkin-step/evals/evals.json new file mode 100644 index 0000000..c431f19 --- /dev/null +++ b/skills/gherkin-step/evals/evals.json @@ -0,0 +1,184 @@ +{ + "skill_name": "gherkin-step", + "evals": [ + { + "id": 1, + "category": "happy-path", + "prompt": "Write a Python behave step definition for 'When the customer confirms the order'. 
The CheckoutPage PageObject has a confirm_order() method.", + "expected_output": "Outputs a thin step definition that delegates entirely to the PageObject: @when('the customer confirms the order') / def step_confirm_order(context): / context.checkout_page.confirm_order(). No CSS selectors, no business logic, and no assertions inside the step. The method call is the only line in the step body (plus any state retrieval from context).", + "files": [], + "expectations": [ + "Step delegates to CheckoutPage.confirm_order() - no selector or business logic in step body", + "Uses @when decorator", + "Accesses checkout_page via context object", + "No assertions in a When step" + ] + }, + { + "id": 2, + "category": "happy-path", + "prompt": "In behave, I have a Given step that creates a customer and a Then step that checks the discount. How do I pass the customer object between them without using global variables?", + "expected_output": "Use the context object - behave instantiates it fresh per scenario so there is no contamination. In the Given step, attach the object: context.customer = Customer(tier=tier). In the Then step, read it back: context.customer.discount_rate(). Never store state in module-level or global variables. Provides a code example showing both steps using context.customer.", + "files": [], + "expectations": [ + "Uses context object to pass state between steps", + "Explicitly warns against global or module-level variables", + "Shows both Given and Then steps using context.customer", + "Notes that context is fresh per scenario" + ] + }, + { + "id": 3, + "category": "happy-path", + "prompt": "I have a step 'When the customer adds the following items' that takes a DataTable with columns 'sku' and 'quantity'. How do I parse the table in behave?", + "expected_output": "Access the table via context.table and iterate with a for loop. Each row is a dict-like object: for row in context.table: context.cart.add_item(row['sku'], int(row['quantity'])). 
Notes that column values are strings by default - cast quantity to int explicitly. Provides a complete step definition example.", + "files": [], + "expectations": [ + "Uses context.table to access the DataTable", + "Iterates rows with a for loop", + "Casts quantity to int (DataTable values are strings)", + "Delegates the actual add_item call to a domain object" + ] + }, + { + "id": 4, + "category": "regression", + "prompt": "Review this step definition:\n\n@when('the customer confirms the order')\ndef step_confirm_order(context):\n context.cart.total *= (1 - context.discount / 100)\n context.order_status = 'placed'\n context.db.save(context.order)", + "expected_output": "Flags the step as violating the 'keep step definitions thin' rule. Business logic (discount calculation, status assignment, DB write) must not live in step definitions - these belong in domain objects or PageObjects. The step should only call a method: context.checkout_page.confirm_order() or context.order_service.confirm_order(). Provides a corrected version that delegates to a domain object.", + "files": [], + "expectations": [ + "Identifies business logic in the step body as the violation", + "Flags discount calculation, status assignment, and DB write as out-of-place", + "Provides a corrected thin version that delegates to a domain/service object", + "Explains that step definitions are bindings only" + ] + }, + { + "id": 5, + "category": "negative", + "prompt": "Write a Gherkin scenario for when the promo code is expired.", + "expected_output": "Writing Gherkin scenarios is out of scope for this skill - routes to living-doc-scenario-creator. gherkin-step handles step definition code; living-doc-scenario-creator handles Gherkin text.", + "files": [], + "expectations": [ + "Does not write a Gherkin scenario", + "Routes to living-doc-scenario-creator", + "Explains the distinction: step binding code vs. 
 Gherkin text" + ] + }, + { + "id": 6, + "category": "paraphrase", + "prompt": "How do I pass data between step files in behave? For example, I create an order in a Given step and need to check it in a Then step in a different file.", + "expected_output": "Use the context object - behave's built-in mechanism for sharing state across steps from different files. In the Given step (any file), assign context.order = .... In the Then step (another file), read context.order. The context is scoped to the scenario and reset between scenarios. Provides a concrete code example showing a Given in one file and Then in another, both using context.order.", + "files": [], + "expectations": [ + "Recommends context object for cross-file state sharing", + "Shows assignment in Given and access in Then", + "Notes that context is scenario-scoped", + "Does not recommend global variables" + ] + }, + { + "id": 7, + "category": "edge-case", + "prompt": "I want a Before hook in behave that only runs for scenarios tagged @database. How do I scope it?", + "expected_output": "Use the context.tags check inside the before_scenario hook in environment.py: def before_scenario(context, scenario): / if 'database' in context.tags: / context.db = create_test_db(). This scopes setup to only the tagged scenarios. Notes that before_scenario runs before every scenario by default - the tag check prevents unnecessary setup. 
Advises pairing with an after_scenario hook to clean up: if 'database' in context.tags: context.db.teardown().", + "files": [], + "expectations": [ + "Uses context.tags check to scope the hook", + "Shows before_scenario hook with 'database' tag check", + "Advises pairing with a cleanup after_scenario hook", + "Notes the hook runs before every scenario by default without the tag check" + ] + }, + { + "id": 8, + "category": "output-format", + "prompt": "Show me the correct structure for a Cucumber TypeScript step definition that reads the order ID from the World object and submits the order using a PageObject.", + "expected_output": "Output is a TypeScript code block. The step uses the When decorator and accesses this (typed as OrderWorld). Calls this.checkoutPage.submitOrder(this.orderId). The World interface/class includes orderId and checkoutPage properties. No CSS selectors appear in the step body - they are encapsulated in CheckoutPage. Example follows the pattern: When('the customer submits the order', async function (this: OrderWorld) { await this.checkoutPage.submitOrder(); }).", + "files": [], + "expectations": [ + "Output is a TypeScript code block", + "Step uses async function with this typed as a World class", + "Delegates to PageObject method - no selectors in step body", + "World object holds page and state properties" + ] + }, + { + "id": 9, + "prompt": "My step is throwing AttributeError: \"Context\" object has no attribute \"checkout_page\". How do I fix this? 
The CheckoutPage class is in pages/checkout_page.py.", + "expected_output": "Explanation that context.checkout_page must be initialized before the step runs, using a before_scenario hook in environment.py; shows the correct before_scenario pattern attaching a CheckoutPage instance to context.", + "files": [], + "category": "edge-case", + "expectations": [ + "Explains context.checkout_page must be assigned before the step runs", + "Points to a Before hook (environment.py or Cucumber Before) as the fix location", + "Shows the correct assignment: context.checkout_page = CheckoutPage(context.browser)", + "Does not suggest modifying the step function itself to work around the missing attribute" + ] + }, + { + "id": 10, + "prompt": "Should I name my behave step function step_when_the_customer_clicks_the_confirm_order_button or step_confirm_order?", + "expected_output": "Recommends step_confirm_order - concise action-based name. Explains why verbose full-phrase names are discouraged: they duplicate the Gherkin text and make step files harder to scan.", + "files": [], + "category": "happy-path", + "expectations": [ + "Recommends the concise action-based name: step_confirm_order", + "Flags the verbose full-phrase name as an anti-pattern", + "Explains the reason: long names are hard to read and cause truncation in test output", + "Consistent with the step file naming convention: one file per domain" + ] + }, + { + "id": 11, + "category": "happy-path", + "prompt": "How do I set up AppWorld in Cucumber TypeScript so that each scenario gets a fresh Playwright browser context?", + "expected_output": "Define an AppWorld class implementing the World interface with a `page` property and a `browser` property. In the constructor, record `this.browser = browser`. In a Before hook, call `this.page = await this.browser.newPage()`. Register with `setWorldConstructor(AppWorld)`. 
Each scenario gets a fresh context automatically.", + "files": [], + "expectations": [ + "Shows AppWorld class with World interface implementation", + "Includes setWorldConstructor(AppWorld) registration", + "Before hook creates new page and assigns to this.page", + "After hook closes page" + ] + }, + { + "id": 12, + "category": "happy-path", + "prompt": "Show me a complete Cucumber TypeScript World setup for a Playwright test suite — I need the AppWorld interface, the class, and the Before/After hooks.", + "expected_output": "Provide: (1) AppWorld interface with page and browser properties, (2) AppWorld class implementing it with browser injection in constructor, (3) setWorldConstructor(AppWorld), (4) Before hook: this.page = await this.browser.newPage(), (5) After hook: await this.page?.close().", + "files": [], + "expectations": [ + "AppWorld interface shown with correct property types", + "setWorldConstructor called with the class", + "Before and After hooks shown", + "No hardcoded browser creation inside the class" + ] + }, + { + "id": 13, + "category": "regression", + "prompt": "My Cucumber TypeScript World setup creates a new browser in the constructor with `playwright.chromium.launch()`. What is wrong with this?", + "expected_output": "The browser should be injected via the World constructor parameter `{ browser }`, not created inside the class. Creating a browser in the constructor means each scenario launches a separate browser process, bypassing Cucumber's browser management. 
Use `this.browser = browser` and create only a new page (`this.browser.newPage()`) in the Before hook.", + "files": [], + "expectations": [ + "Identifies the anti-pattern: launching browser inside constructor", + "Explains the correct pattern: inject browser via constructor parameter", + "Shows Before hook creating a new page instead" + ] + }, + { + "id": 14, + "category": "happy-path", + "prompt": "What is the correct file naming convention for Cucumber TypeScript step definition files? I have one large steps.ts file right now.", + "expected_output": "Use one step file per domain, named after the domain: e.g. `checkout.steps.ts`, `login.steps.ts`. Never use a generic `steps.ts` filename — it makes it hard to locate step definitions during debugging and conflicts when multiple domains are merged.", + "files": [], + "expectations": [ + "Recommends per-domain file naming", + "Shows example: checkout.steps.ts, login.steps.ts", + "Flags generic steps.ts as an anti-pattern" + ] + } + ] +} \ No newline at end of file diff --git a/skills/gherkin-step/evals/trigger-eval.json b/skills/gherkin-step/evals/trigger-eval.json new file mode 100644 index 0000000..8a821b8 --- /dev/null +++ b/skills/gherkin-step/evals/trigger-eval.json @@ -0,0 +1,230 @@ +[ + { + "id": 1, + "query": "Write step definitions for the checkout feature file", + "should_trigger": true, + "reason": "'step definitions' trigger phrase" + }, + { + "id": 2, + "query": "Implement Gherkin steps for the login scenarios", + "should_trigger": true, + "reason": "'implement Gherkin steps' trigger phrase" + }, + { + "id": 3, + "query": "How do I write a Cucumber step for 'When the customer submits the order'?", + "should_trigger": true, + "reason": "'Cucumber step' trigger phrase" + }, + { + "id": 4, + "query": "How do I write a behave step for 'Given a gold tier customer'?", + "should_trigger": true, + "reason": "'behave step' trigger phrase" + }, + { + "id": 5, + "query": "How do I configure a parameter type so a 
number is cast to int?", + "should_trigger": true, + "reason": "'parameter type' trigger phrase" + }, + { + "id": 6, + "query": "How do I parse a DataTable in a step definition?", + "should_trigger": true, + "reason": "'DataTable' trigger phrase" + }, + { + "id": 7, + "query": "How do I access a DocString payload inside a step?", + "should_trigger": true, + "reason": "'DocString' trigger phrase" + }, + { + "id": 8, + "query": "How do I set up a Before hook in Cucumber TypeScript?", + "should_trigger": true, + "reason": "'Before hook' trigger phrase" + }, + { + "id": 9, + "query": "How do I clean up after each scenario using an After hook?", + "should_trigger": true, + "reason": "'After hook' trigger phrase" + }, + { + "id": 10, + "query": "How do I use the World object to share page instances between steps?", + "should_trigger": true, + "reason": "'World object' trigger phrase" + }, + { + "id": 11, + "query": "How do I manage step context across multiple step files?", + "should_trigger": true, + "reason": "'step context' trigger phrase" + }, + { + "id": 12, + "query": "How do I share data between two step definitions in behave?", + "should_trigger": true, + "reason": "'step state sharing' trigger phrase" + }, + { + "id": 13, + "query": "How do I share state between steps in a Cucumber scenario?", + "should_trigger": true, + "reason": "'how to share state between steps' trigger phrase" + }, + { + "id": 14, + "query": "How do I register a step definition pattern for a new step text?", + "should_trigger": true, + "reason": "'register step definition' trigger phrase" + }, + { + "id": 15, + "query": "How do I set up hooks for my Cucumber test suite?", + "should_trigger": true, + "reason": "'hook setup' trigger phrase" + }, + { + "id": 16, + "query": "Write a Gherkin scenario for the promo code feature", + "should_trigger": false, + "reason": "Writing Gherkin scenarios — routes to living-doc-scenario-creator" + }, + { + "id": 17, + "query": "Write a unit test for the 
discount calculation function", + "should_trigger": false, + "reason": "Unit test request — out of scope for this toolkit (no test-unit-write skill defined)" + }, + { + "query": "How do I initialize the CheckoutPage in behave so that context.checkout_page is available in my When and Then step definitions?", + "should_trigger": true, + "id": 18, + "reason": "Initialising World context in behave — routes to gherkin-step" + }, + { + "query": "My step function is called step_when_the_customer_clicks_the_submit_order_button — is that the right naming convention for behave?", + "should_trigger": true, + "id": 19, + "reason": "Step function naming convention for behave — routes to gherkin-step" + }, + { + "id": 20, + "query": "How do I set up AppWorld in Cucumber TypeScript for Playwright integration?", + "should_trigger": true, + "reason": "World/AppWorld setup for Cucumber TypeScript — gherkin-step owns this" + }, + { + "id": 21, + "query": "How do I use setWorldConstructor to register a custom World with a Playwright browser?", + "should_trigger": true, + "reason": "Registering World constructor in Cucumber TypeScript — gherkin-step" + }, + { + "id": 22, + "query": "Write a Gherkin scenario for the checkout flow", + "should_trigger": false, + "reason": "Writing scenarios — routes to living-doc-scenario-creator" + }, + { + "id": 23, + "query": "Create a User Story for the payment capability", + "should_trigger": false, + "reason": "Creating a User Story — routes to living-doc-create-user-story" + }, + { + "id": 24, + "query": "Sync the feature files with the living doc after AC changes", + "should_trigger": false, + "reason": "Feature file / AC sync — routes to gherkin-living-doc-sync" + }, + { + "id": 25, + "query": "Run a gap analysis on our test coverage", + "should_trigger": false, + "reason": "Gap analysis — routes to living-doc-gap-finder" + }, + { + "id": 26, + "query": "Scan the webapp and generate PageObjects for all screens", + "should_trigger": false, + 
"reason": "PageObject scanning — routes to living-doc-pageobject-scan" + }, + { + "id": 27, + "query": "Document the atomic behavior for cart validation in the living doc", + "should_trigger": false, + "reason": "Creating a Functionality — routes to living-doc-create-functionality" + }, + { + "id": 28, + "query": "Update the wording of AC-1 on US-042", + "should_trigger": false, + "reason": "Updating a living doc entity — routes to living-doc-update" + }, + { + "id": 29, + "query": "What does PR #217 affect in the living doc?", + "should_trigger": false, + "reason": "Impact analysis — routes to living-doc-impact-analysis" + }, + { + "id": 30, + "query": "Create a Feature entity for the orders service", + "should_trigger": false, + "reason": "Creating a Feature — routes to living-doc-create-feature" + }, + { + "id": 31, + "query": "Generate a feature file for all active ACs in US-007", + "should_trigger": false, + "reason": "Generating scenarios from User Story — routes to living-doc-scenario-creator" + }, + { + "id": 32, + "query": "Add data-cy attributes to the confirm button template", + "should_trigger": false, + "reason": "Instrumenting templates — routes to data-cy-instrument" + }, + { + "id": 33, + "query": "Find all User Stories that have no BDD scenarios", + "should_trigger": false, + "reason": "Gap analysis — routes to living-doc-gap-finder" + }, + { + "id": 34, + "query": "Delete BDD artifacts linked to the deprecated checkout feature", + "should_trigger": false, + "reason": "BDD artifact cleanup — routes to bdd-maintain" + }, + { + "id": 35, + "query": "Create a new User Story for the express checkout journey", + "should_trigger": false, + "reason": "Creating a User Story — routes to living-doc-create-user-story" + }, + { + "id": 36, + "query": "Fix the @AC: traceability tags in the checkout feature file", + "should_trigger": false, + "reason": "@AC: tag sync — routes to gherkin-living-doc-sync" + }, + { + "id": 37, + "query": "Deprecate the checkout 
feature in the living doc", + "should_trigger": false, + "reason": "Deprecating an entity — routes to living-doc-update" + }, + { + "id": 38, + "query": "Crawl the UI to discover all screens and generate page objects", + "should_trigger": false, + "reason": "UI crawl and PageObject creation — routes to living-doc-pageobject-scan" + } +] \ No newline at end of file diff --git a/skills/living-doc-create-feature/SKILL.md b/skills/living-doc-create-feature/SKILL.md new file mode 100644 index 0000000..4e3cecf --- /dev/null +++ b/skills/living-doc-create-feature/SKILL.md @@ -0,0 +1,156 @@ +--- +name: living-doc-create-feature +description: > + Define a system surface (UI screen, API endpoint, service, or module) as a Feature entity, + enabling impact analysis and traceability in the living documentation. Use when documenting + a new screen, API, service, or module; mapping surfaces to User Stories; or resolving + Feature naming conflicts. + Triggers on: "document a new feature", "create a feature entity", "new screen documentation", + "document an API endpoint", "feature registry", "what feature owns this", "map user story to + feature", "system surface documentation", "feature owners", "feature dependencies", + "duplicate feature name", "resolve feature naming", "rename feature". + Does NOT trigger for: creating User Stories (use living-doc-create-user-story); defining + behaviors (use living-doc-create-functionality); scanning PageObjects (use + living-doc-pageobject-scan); deprecating (use living-doc-update). + Pairs with living-doc-create-functionality and living-doc-create-user-story. + After creating, add a feature_registry entry for living-doc-impact-analysis. 
+license: Apache-2.0 +compatibility: GitHub Copilot +--- + +# Living Doc — Create Feature + +> **Key concepts:** Feature, Functionality, User Story, AC — see [living-doc-glossary](../references/living-doc-glossary.md) ([remote](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md)). +> **BDD schemas:** PageObject file header schema — see [living-doc-bdd-schemas](../references/living-doc-bdd-schemas.md) ([remote](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-bdd-schemas.md)). + +## Step 1 — Identify the system surface + +Before asking, **scan the conversation context** for a surface name, surface type, and owning team already stated by the user. If the prompt already gives enough information to draft the entity, infer the obvious details and propose the Feature directly instead of blocking on follow-up questions. Ask only for what is still missing or ambiguous. + +If the surface itself has not been named yet, ask: *What system surface does this Feature represent?* + +Select the surface type: + +| Type | Examples | +|---|---| +| `UI` | A web page, modal, or named screen (e.g. Checkout Page, Login Screen) | +| `API` | A REST/GraphQL endpoint or endpoint group, including a backend service's public API contract (e.g. Orders API, Payment Gateway API) | +| `Service` | A named backend/service surface with its own contract (e.g. Customer Profile Service) | +| `Worker` | An asynchronous/background processor (e.g. Notification Worker) | +| `Module` | A distinct internal module with a stable contract or bounded responsibility | +| `Library` | A substantial shared internal library that is intentionally tracked as its own surface | + +Feature names should be **noun phrases** that name the surface. If it could plausibly be a PageObject or service/module class name (for example `PaymentPage`), it is usually a good Feature name. 
+ +**One surface test abstraction ≈ one Feature** — a UI screen has a PageObject, an API endpoint group has an annotated endpoint method. See [living-doc-glossary](../references/living-doc-glossary.md) ([remote](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md)) for details. + +## Step 2 — Describe purpose and scope + +Ask: +- *What user interactions or system calls does this Feature own?* +- *What does it NOT own?* (helps define boundaries) + +Write a one-to-two sentence purpose statement using business language — not implementation detail. + +## Step 3 — Link to User Stories + +Ask: *Which User Stories rely on this Feature?* + +If unknown at creation time, leave empty `[]` but warn: + +> "An orphaned Feature (not linked to any User Story) contributes no traceable business value. +> Link at least one User Story or mark this as exploratory with status: 'candidate'. +> Orphaned Features are surfaced as gaps in living-doc-gap-finder reports." + +## Step 4 — Enumerate Functionalities + +Ask: *What atomic behaviors (Functionalities) does this Feature implement?* + +Functionalities can be empty at creation time — they are built out as development proceeds. +Add as `"functionalities": ["FUNC-"]` references only when the Functionality has been +formally defined. If they are described as informal notes or candidates (not yet registered as +FUNC entries), leave the array as `[]` and add a warning: +> "Candidate Functionalities must be formally defined using **living-doc-create-functionality** +> before being linked here." + +## Step 5 — Identify owners and dependencies + +| Field | What to capture | +|---|---| +| `owners` | Team name(s) or individual(s) responsible for this surface | +| `external_dependencies` | Services or systems this Feature calls (e.g. 
payment-gateway, order-service) | + +## Step 6 — Output canonical Feature entity + +> **ID assignment:** Before assigning a `FEAT-` ID, run +> `python scripts/next_id.py --type FEAT --catalog catalog.json` +> to get the next available numeric ID (e.g. `FEAT-012`) and avoid collisions. +> If your project uses readable slug IDs instead of numeric ones, derive the slug from the +> surface name (e.g. `FEAT-checkout`, `FEAT-orders-api`, `FEAT-notifications-centre`) +> and confirm there is no existing slug with the same name in the catalog. + +When the project uses slug IDs, or no catalog is available to run the script against, base the slug on the business surface name: `FEAT-` (for example `FEAT-checkout`, `FEAT-orders-api`, `FEAT-notifications-centre`). For UI names ending in generic words like `Page`, `Screen`, or `Modal`, you may omit that trailing UI noun in the ID when the shorter slug stays unambiguous. + +Output the entity as a **single fenced `json` code block** whenever you have enough information to draft it. Keep any warnings or follow-up questions **outside** the code block. If the user gives a named surface but not all metadata, ask the missing questions and still include a starter draft in the same reply, using inferred purpose/surface type, `status: "planned"`, and `[]` for relationships that are still unknown. If the request explicitly asks to create the entity from the given details, emit the draft immediately. + +Canonical JSON fields: + +| Field | Required | Value | +|---|---|---| +| `type` | Yes | `Feature` | +| `id` | Yes | `FEAT-` | +| `name` | Yes | Noun phrase (e.g. 
"Login Page") | +| `surface_type` | Yes | `UI` \| `API` \| `Service` \| `Worker` \| `Module` \| `Library` | +| `purpose` | Yes | One-to-two sentence description in business language | +| `status` | Yes | `planned` \| `active` \| `candidate` \| `deprecated` | +| `user_stories` | Yes | List of `US-<...>` IDs (use `[]` if unknown) | +| `functionalities` | Yes | List of `FUNC-<...>` IDs (use `[]` if unknown or still only candidates) | +| `owners` | Yes | Team name(s) | +| `external_dependencies` | Yes | Names of services or systems this Feature calls | + +If `user_stories` is `[]`, repeat the orphan warning from Step 3 outside the JSON. If `functionalities` is `[]` because they are still just candidate notes, repeat the formal-definition warning from Step 4 outside the JSON. + +## Anti-patterns to flag + +| Anti-pattern | Warning | +|---|---| +| Feature covers multiple unrelated screens | Split into one Feature per distinct screen | +| Feature name is a verb (e.g. "Process Payment") | Feature names should be nouns — name the surface. Verb phrases describe *what the surface does*, which belongs in a Functionality entity (use **living-doc-create-functionality**). If it could be a PageObject or service/module class name, it is usually a better Feature name. | +| Feature has no User Stories and no Functionalities | Orphan Feature — it contributes no traceable business value. Link at least one User Story, mark it as `candidate` if it is still exploratory, or delete it if it is no longer relevant. Orphan Features will be surfaced as gaps in living-doc-gap-finder reports. | +| Shared utility library documented as a Feature | By default, a shared utility library is not a Feature — document it as an `external_dependency` on the consumer Features. Only create a standalone Feature when the library is substantial enough to be treated as a distinct shared surface; in that case use `surface_type: "Library"` and mark it as a shared internal dependency. 
Features should map 1:1 to distinct/deployable surfaces. | +| Feature name encodes implementation technology (e.g. "React Login Component", "Spring Payment Controller") | Feature names describe the business surface, not the stack. Use "Login Screen" (UI) or "Payment API" (API) — technology choice is an implementation detail that changes without the surface changing. | +| `surface_type` is `UI` for a backend REST controller or service | A REST endpoint group is an `API` surface. `UI` is reserved for screens a human interacts with directly. Misclassification breaks impact analysis routing between frontend and backend changes. | +| Feature shares a name with an existing Feature | Check for duplicates before creating. Identical names indicate a merge candidate or a scope overlap — clarify the boundary before proceeding. | +| `functionalities` field contains User Story IDs (US-nnn) | `functionalities` takes `FUNC-` IDs. User Stories are linked under `user_stories`, not here. | + +## Out-of-scope routing + +| Request type | Use instead | +|---|---| +| Creating a User Story | **living-doc-create-user-story** | +| Defining an atomic behavior (Functionality) | **living-doc-create-functionality** | + +## Next steps after creation + +| Action | Skill | +|---|---| +| Define atomic behaviors for this Feature | **living-doc-create-functionality** | +| Link to an existing User Story | **living-doc-update** (add Feature to the User Story's `features` list) | +| Generate BDD PageObjects for a UI Feature | **living-doc-pageobject-scan** | +| Update feature_registry for impact traceability | **living-doc-impact-analysis** (see Feature registry format in that skill) | + +> **Renaming a Feature:** Changing a Feature's `id` or `name` requires cascading updates. 
Load `living-doc-update` and follow the "Rename a Feature" workflow there, which covers: Functionality `feature_id` fields, `feature_registry` entry, `manifest.json`, `seed.yaml`, PageObject file headers, and Gherkin feature file `# Feature:` headers. + +## Script — `validate_entity.py` + +After outputting the entity, validate it against the canonical schema before saving to the catalog. Do not save the entity if the script exits with code 1. + +```bash +# Validate the output (run from the toolkit root) +python skills/living-doc-update/scripts/validate_entity.py entity.json + +# With referential integrity checks against the full catalog +python skills/living-doc-update/scripts/validate_entity.py entity.json --catalog catalog.json +``` + +Exits 0 if valid (warnings are non-blocking). Exits 1 if any required field is missing, the ID format is wrong, or the status or `surface_type` value is invalid. diff --git a/skills/living-doc-create-feature/evals/evals.json b/skills/living-doc-create-feature/evals/evals.json new file mode 100644 index 0000000..fc43da6 --- /dev/null +++ b/skills/living-doc-create-feature/evals/evals.json @@ -0,0 +1,230 @@ +{ + "skill_name": "living-doc-create-feature", + "evals": [ + { + "id": 1, + "category": "happy-path", + "prompt": "I want to document the Checkout Page as a Feature entity.", + "expected_output": "Agent identifies surface type as UI. Asks for: purpose and scope (what user interactions it owns), linked User Stories, known Functionalities, owners, external dependencies. 
Outputs a canonical Feature JSON with id=FEAT-checkout, surface_type=UI, at least one US link, at least two Functionality references, owners, and external_dependencies including payment-gateway and order-service.", + "files": [], + "expectations": [ + "Identifies surface_type as UI for a web page", + "Asks what user interactions the screen owns", + "Asks for User Story links", + "Asks for Functionalities owned by this Feature", + "Asks for owners and external dependencies", + "Outputs valid canonical Feature JSON with all fields populated", + "Notes: 1 Feature ≈ 1 PageObject" + ] + }, + { + "id": 2, + "category": "happy-path", + "prompt": "I want to document the Orders API as a Feature.", + "expected_output": "Agent identifies surface_type as API. Asks for endpoint group description, linked User Stories, owned Functionalities, team owners, and external dependencies (e.g. order-db, notification-service). Outputs Feature JSON with id=FEAT-orders-api, surface_type=API.", + "files": [], + "expectations": [ + "surface_type is API", + "Purpose describes the API endpoint group in business terms", + "Outputs valid canonical Feature JSON" + ] + }, + { + "id": 3, + "category": "regression", + "prompt": "My Feature entity has no User Stories linked and no Functionalities. Is that OK?", + "expected_output": "Agent warns: an orphaned Feature with no User Stories and no Functionalities contributes no traceable business value. Either link to at least one User Story (or flag as 'candidate' status if it is exploratory), or delete it if it is no longer relevant. 
Notes that orphan Features will appear in living-doc-gap-finder reports as a gap.", + "files": [], + "expectations": [ + "Warns about orphaned Feature", + "Suggests linking to at least one User Story or setting status to 'candidate'", + "Notes orphan Features appear in gap reports" + ] + }, + { + "id": 4, + "category": "happy-path", + "prompt": "Should I name my Feature 'Process Payment' or 'Payment Page'?", + "expected_output": "Feature names should be nouns — they identify a system surface. 'Payment Page' is correct. 'Process Payment' is a verb phrase that describes a Functionality or an action, not a surface. The naming rule: if it can be a PageObject class name, it's a good Feature name.", + "files": [], + "expectations": [ + "Recommends noun name 'Payment Page'", + "Explains that verb phrases (Process Payment) belong to Functionalities", + "Mentions PageObject class naming as an alignment check" + ] + }, + { + "id": 5, + "category": "negative", + "prompt": "Create a User Story for the checkout capability.", + "expected_output": "User Story creation — routes to living-doc-create-user-story. This skill creates Feature entities (system surfaces), not User Stories.", + "files": [], + "expectations": [ + "Does not create a User Story", + "Routes to living-doc-create-user-story" + ] + }, + { + "id": 6, + "category": "paraphrase", + "prompt": "I want to add the Notification Service to the living doc as a system component. Where do I start?", + "expected_output": "Agent identifies this as a Feature entity creation (system surface). Asks: what type of surface is it (API, Worker, UI)? What User Stories does it enable? What Functionalities does it own? Who are the owners? What are the external dependencies (SMTP relay, template store)? 
Outputs a canonical Feature JSON for FEAT-notification-service with surface_type=Worker or API, at least one User Story link, owners, and external_dependencies.", + "files": [], + "expectations": [ + "Identifies this as a Feature creation request despite 'system component' phrasing", + "Asks for surface_type (Worker/API for a notification service)", + "Asks for User Story links, owners, and external dependencies", + "Outputs valid canonical Feature JSON" + ] + }, + { + "id": 7, + "category": "edge-case", + "prompt": "We have a shared utility library used by three different services. Should it be documented as a Feature entity?", + "expected_output": "A shared utility library is not a Feature — it is not a system surface with its own UI, API, or event stream. Document it as an external_dependency in the Feature entities of the services that consume it. If the library is substantial enough to warrant its own living doc entry, create it as a Feature with surface_type=Library and explicitly mark it as a shared internal dependency. Note: Features map 1:1 to deployable/distinct surfaces — shared libraries are infrastructure, not surfaces.", + "files": [], + "expectations": [ + "Advises against creating a Feature for a shared utility library by default", + "Recommends listing it as external_dependency in consumer Feature entities", + "Notes the surface_type=Library option for substantial libraries", + "Explains the 1:1 Feature-to-surface mapping rule" + ] + }, + { + "id": 8, + "category": "output-format", + "prompt": "Create a Feature entity for the 'User Profile' screen in our banking app. It is owned by team-identity and depends on the customer-service API.", + "expected_output": "The output contains a single fenced ```json code block with a valid Feature entity. The JSON object includes all required fields: type, id, name, surface_type, purpose, status, user_stories, functionalities, owners, external_dependencies. The id field follows the FEAT- convention. 
The surface_type value is one of: UI, API, Service, Worker, Module, Library. No prose appears inside the JSON code block.", + "files": [], + "expectations": [ + "Single fenced ```json code block", + "All required fields present: type, id, name, surface_type, purpose, status, user_stories, functionalities, owners, external_dependencies", + "id follows FEAT- convention", + "surface_type is one of UI/API/Service/Worker/Module/Library", + "No prose inside the code block" + ] + }, + { + "id": 9, + "category": "file-based", + "description": "Create a Feature entity from rough notes provided as a markdown file.", + "prompt": "Read evals/files/raw-feature-notes.md and produce a canonical Feature entity JSON from the notes.", + "expected_output": "Agent extracts the surface type, purpose, owners, and dependencies from the notes and produces a Feature entity JSON in a fenced ```json block. The id follows FEAT- convention derived from the feature name in the notes. The surface_type, owners, and external_dependencies fields are populated from the file content. user_stories and functionalities are empty arrays with a warning that they must be linked.", + "files": [ + "evals/files/raw-feature-notes.md" + ], + "expectations": [ + "Feature entity JSON in fenced ```json block", + "id follows FEAT- from the notes", + "surface_type, owners, external_dependencies populated from file", + "user_stories and functionalities are empty arrays with a linking warning" + ] + }, + { + "id": 10, + "category": "regression", + "prompt": "I want to document a Feature called 'Spring Payment Controller'. Is that a good Feature name?", + "expected_output": "No. Feature names should describe the business surface, not the technology stack. 'Spring Payment Controller' encodes an implementation detail (Spring framework) that changes without the business surface changing. Use 'Payment API' (surface_type=API) instead. The test: if the tech stack changed to Node.js, would the Feature name still be accurate? 
'Payment API' would be — 'Spring Payment Controller' would not.", + "files": [], + "expectations": [ + "Flags technology-encoded name as an anti-pattern", + "Recommends the technology-agnostic business surface name (e.g. Payment API)", + "Explains the rule: Feature names describe surfaces, not stack", + "Notes that technology detail changes without the surface changing" + ] + }, + { + "id": 11, + "category": "edge-case", + "prompt": "I'm documenting our OrdersController — a Spring REST controller that handles GET /orders and POST /orders. Should surface_type be 'UI'?", + "expected_output": "No. A REST controller is an API surface, not a UI surface. surface_type should be 'API'. UI is reserved for screens that a human interacts with directly (e.g. a web page or modal). Misclassifying a REST endpoint as UI breaks impact analysis routing between frontend and backend changes. Use surface_type='API' with a name like 'Orders API'.", + "files": [], + "expectations": [ + "Corrects surface_type from UI to API", + "Explains: UI is for screens humans interact with directly", + "Notes that misclassification breaks impact analysis routing", + "Recommends the correct name: Orders API" + ] + }, + { + "id": 12, + "category": "happy-path", + "prompt": "Document a Feature for our PaymentEventProcessor — an asynchronous worker that listens on a Kafka topic and processes payment events in the background.", + "expected_output": "Agent identifies surface_type as Worker (asynchronous/background processor). Purpose describes the business contract: processes payment events asynchronously from the Kafka topic. 
Outputs Feature JSON with id=FEAT-payment-event-processor, surface_type=Worker, purpose in business language describing the event processing responsibility, and external_dependencies including the Kafka topic and any downstream services.", + "files": [], + "expectations": [ + "Identifies surface_type as Worker", + "Feature name is a noun phrase: 'Payment Event Processor'", + "Purpose describes the business contract, not the technology", + "external_dependencies includes the Kafka topic or downstream services", + "Outputs valid canonical Feature JSON" + ] + }, + { + "id": 13, + "category": "regression", + "prompt": "I have these candidate behaviors for FEAT-checkout: 'validate cart', 'apply promo', 'confirm order'. Should I add them to the functionalities field right now?", + "expected_output": "No. Candidate Functionalities that have not yet been formally defined as FUNC- entities must not be listed in the functionalities array. The functionalities field only accepts formally registered FUNC- IDs. While these are still informal notes, leave functionalities as [] and note the candidates externally. Use living-doc-create-functionality to formally define each one before linking it here.", + "files": [], + "expectations": [ + "Instructs to leave functionalities as [] for unregistered candidates", + "Explains: functionalities field only accepts formal FUNC- IDs", + "Does not list the candidates inside the JSON functionalities array", + "Points to living-doc-create-functionality as the next step" + ] + }, + { + "id": 14, + "category": "regression", + "prompt": "Two teams both want to create a Feature called 'Payment Page'. How should I handle the naming conflict?", + "expected_output": "Identical Feature names indicate a merge candidate or a scope overlap — always check for duplicates before creating. If both teams are documenting the same screen/surface, consolidate into a single Feature with both teams as co-owners. If the surfaces are genuinely different (e.g. 
a customer-facing payment form vs an admin reconciliation screen), disambiguate the names (e.g. 'Customer Payment Page' vs 'Admin Payment Reconciliation Screen') and clarify the boundary. Creating two Features with identical names breaks impact analysis and makes traceability ambiguous.", + "files": [], + "expectations": [ + "Identifies duplicate name as a naming conflict anti-pattern", + "Offers two paths: merge if same surface, disambiguate if different surfaces", + "Warns that identical names break impact analysis and traceability", + "Notes that FEAT ID must also be unique" + ] + }, + { + "id": 16, + "category": "regression", + "prompt": "Create a new Feature entity for the 'Notifications Centre' screen. The catalog already contains FEAT-001 through FEAT-011.", + "expected_output": "Before assigning an ID, the agent runs: python scripts/next_id.py --type FEAT --catalog catalog.json. The script returns FEAT-012. The agent assigns id='FEAT-012' (or the slug 'FEAT-notifications-centre' if the project uses slug IDs). The agent does NOT invent an ID, reuse an existing ID, or leave the id field as a placeholder such as FEAT-XXX or FEAT-. The final JSON contains a fully populated id field before being presented to the user.", + "files": [], + "expectations": [ + "Runs next_id.py --type FEAT before assigning the ID", + "Assigns the ID returned by the script (e.g. FEAT-012)", + "Does not invent, guess, or reuse an ID", + "Does not leave a placeholder (FEAT-XXX, FEAT-, FEAT-unknown)", + "Final JSON has a fully populated id field" + ] + }, + { + "id": 17, + "category": "regression", + "prompt": "Create a Feature entity for the 'Payment Page'. The catalog file is not present — next_id.py cannot be run. What should the agent do?", + "expected_output": "Agent cannot auto-assign a numeric ID. It uses the slug convention instead: id='FEAT-payment-page' derived from the surface name in kebab-case. 
The agent explicitly states: 'No catalog available — using slug ID FEAT-payment-page. Verify this ID does not conflict with existing entities before saving.' It does NOT invent a numeric ID such as FEAT-001 or FEAT-999 without catalog evidence.", + "files": [], + "expectations": [ + "Falls back to slug ID when catalog is unavailable", + "Slug is derived from the surface name in kebab-case", + "Warns the user to verify no collision before saving", + "Does not invent a numeric ID without catalog evidence" + ] + }, + { + "id": 15, + "category": "regression", + "prompt": "I need to rename the Feature FEAT-checkout to FEAT-checkout-v2 because we split the checkout domain. What steps do I need to follow?", + "expected_output": "Renaming a Feature requires a cascade: (1) Update the entity file (id and name fields). (2) Update feature_id in all linked Functionality entities. (3) Update feature_registry. (4) Update manifest.json / seed.yaml. (5) Update PageObject file headers. (6) Update Gherkin # Feature: headers. (7) Run living-doc-gap-finder to confirm no orphan references remain. Load living-doc-update and follow the 'Rename a Feature' workflow.", + "files": [], + "expectations": [ + "Lists all 7 cascade steps for a Feature rename", + "Mentions living-doc-update as the skill that owns the rename workflow", + "Notes feature_registry and Functionality.feature_id must be updated", + "Recommends gap-finder run at the end to confirm clean state" + ] + } + ] +} \ No newline at end of file diff --git a/skills/living-doc-create-feature/evals/files/raw-feature-notes.md b/skills/living-doc-create-feature/evals/files/raw-feature-notes.md new file mode 100644 index 0000000..9341b3e --- /dev/null +++ b/skills/living-doc-create-feature/evals/files/raw-feature-notes.md @@ -0,0 +1,33 @@ +# Raw Feature Notes — Notifications Centre +# Used by: living-doc-create-feature file-based eval +# +# These are rough notes from a discovery session. 
The agent must convert +# them into a canonical Feature entity JSON. + +## What is it? +A screen inside the mobile banking app where customers can see all their +recent alerts (balance updates, payment confirmations, security notices). +The screen is called the "Notifications Centre". + +## Who owns it? +team-notifications (primary owner) +team-security also contributes for security alert types + +## What does it depend on? +- notification-service (backend API that stores and delivers alerts) +- customer-profile-service (to fetch customer preferences for notification types) + +## Surface type +UI — it's a screen in the mobile app + +## Status +In development — expected to go live in Q3 + +## Linked user stories +Not known yet — to be linked during sprint planning + +## Atomic behaviors (functionalities) +- Mark a notification as read +- Filter notifications by type (payments, security, promotions) +- Delete a notification +(These are candidates — not formally defined yet) diff --git a/skills/living-doc-create-feature/evals/fixture-map.md b/skills/living-doc-create-feature/evals/fixture-map.md new file mode 100644 index 0000000..1734693 --- /dev/null +++ b/skills/living-doc-create-feature/evals/fixture-map.md @@ -0,0 +1,38 @@ +# Fixture Map — living-doc-create-feature + +## Fixture files + +| File | Description | +|---|---| +| `evals/files/raw-feature-notes.md` | Discovery session notes for the Notifications Centre screen — used by the file-based eval (id=9) | + +## Eval to fixture mapping + +| Eval ID | Category | Fixture file(s) | Coverage | +|---|---|---|---| +| 1 | happy-path | _(none — conversational)_ | UI surface: Checkout Page — full elicitation workflow | +| 2 | happy-path | _(none)_ | API surface: Orders API — surface type identification | +| 3 | regression | _(none)_ | Orphan Feature warning (no User Stories, no Functionalities) | +| 4 | happy-path | _(none)_ | Anti-pattern: verb-phrase Feature name (Process Payment) | +| 5 | negative | _(none)_ | 
Routing: User Story creation → living-doc-create-user-story | +| 6 | paraphrase | _(none)_ | Notification Service — surface type (Worker/API) identification | +| 7 | edge-case | _(none)_ | Shared utility library — external_dependency vs Feature entity | +| 8 | output-format | _(none)_ | Canonical JSON output: all required fields, FEAT-kebab id, surface_type enum | +| 9 | file-based | `raw-feature-notes.md` | Notifications Centre — extract surface from rough notes | +| 10 | regression | _(none)_ | Anti-pattern: technology-encoded Feature name (Spring Payment Controller) | +| 11 | edge-case | _(none)_ | Anti-pattern: surface_type=UI for a REST controller | +| 12 | happy-path | _(none)_ | Worker surface type: PaymentEventProcessor | +| 13 | regression | _(none)_ | Candidate Functionalities not formally defined — leave functionalities=[] | +| 14 | regression | _(none)_ | Duplicate Feature name conflict resolution | + +## Trigger eval summary + +27 entries: 14 `should_trigger=true`, 13 `should_trigger=false` + +| Routes to | Query count | +|---|---| +| living-doc-create-user-story | 1 | +| living-doc-create-functionality | 2 | +| living-doc-pageobject-scan | 1 | +| living-doc-scenario-creator | 2 | +| living-doc-update | 1 | +| gherkin-step | 1 | +| bdd-maintain | 1 | +| living-doc-gap-finder | 1 | +| gherkin-living-doc-sync | 1 | +| living-doc-impact-analysis | 1 | +| data-cy-instrument | 1 | diff --git a/skills/living-doc-create-feature/evals/trigger-eval.json b/skills/living-doc-create-feature/evals/trigger-eval.json new file mode 100644 index 0000000..faedd5f --- /dev/null +++ b/skills/living-doc-create-feature/evals/trigger-eval.json @@ -0,0 +1,164 @@ +[ + { + "id": 1, + "query": "Document the checkout page as a Feature entity", + "should_trigger": true, + "reason": "Explicit 'document feature entity' trigger phrase" + }, + { + "id": 2, + "query": "Create a Feature entity for the Orders API", + "should_trigger": true, + "reason": "Explicit 'create a feature entity' trigger keyword" + }, + { + "id": 3, + "query": "New screen documentation for the account preferences page", + "should_trigger": true, + "reason": "'new screen 
documentation' trigger phrase" + }, + { + "id": 4, + "query": "Document a new API endpoint — the payment initiation endpoint", + "should_trigger": true, + "reason": "'document an API endpoint' trigger phrase" + }, + { + "id": 5, + "query": "Update the feature registry with the new notification service", + "should_trigger": true, + "reason": "'feature registry' trigger keyword" + }, + { + "id": 6, + "query": "What feature owns the checkout screen?", + "should_trigger": true, + "reason": "'what feature owns this' trigger phrase" + }, + { + "id": 7, + "query": "Map User Story US-007 to its Feature", + "should_trigger": true, + "reason": "'map user story to feature' trigger phrase" + }, + { + "id": 8, + "query": "I need to document the discount engine as a system surface", + "should_trigger": true, + "reason": "Documenting a system surface — Feature creation workflow" + }, + { + "id": 9, + "query": "Create a feature entity for the authentication module", + "should_trigger": true, + "reason": "Explicit 'create feature entity' trigger" + }, + { + "id": 10, + "query": "What are the owners and dependencies for the checkout feature?", + "should_trigger": true, + "reason": "Asking about Feature properties — skill can populate or validate" + }, + { + "id": 11, + "query": "Create a user story for the checkout capability", + "should_trigger": false, + "reason": "User Story creation — routes to living-doc-create-user-story" + }, + { + "id": 12, + "query": "Document the atomic behavior: validate cart before checkout", + "should_trigger": false, + "reason": "Atomic behavior — routes to living-doc-create-functionality" + }, + { + "id": 13, + "query": "Scan the checkout page for PageObjects", + "should_trigger": false, + "reason": "UI scan — routes to living-doc-pageobject-scan" + }, + { + "id": 14, + "query": "Generate Gherkin scenarios for the checkout User Story", + "should_trigger": false, + "reason": "Scenario creation — routes to living-doc-scenario-creator" + }, + { + "id": 
15, + "query": "Register the notification background worker in the living doc as a system surface", + "should_trigger": true, + "reason": "Documenting a background worker as a system surface — Feature creation (surface_type=Worker)" + }, + { + "id": 16, + "query": "Deprecate the checkout feature in the living doc", + "should_trigger": false, + "reason": "Deprecating an existing entity — routes to living-doc-update" + }, + { + "id": 17, + "query": "Document the Orders Service — it exposes a REST API to place and cancel orders", + "should_trigger": true, + "reason": "Documenting a backend service surface — Feature creation (surface_type=Service or API)" + }, + { + "id": 18, + "query": "Two Features have the same name 'Payment Page' — how do I resolve this?", + "should_trigger": true, + "reason": "Duplicate Feature name resolution is part of the Feature creation workflow" + }, + { + "id": 19, + "query": "Write step definitions for the checkout scenarios", + "should_trigger": false, + "reason": "Writing step definitions — routes to gherkin-step" + }, + { + "id": 20, + "query": "Run a dead code audit to find unused step definitions", + "should_trigger": false, + "reason": "BDD artifact cleanup — routes to bdd-maintain" + }, + { + "id": 21, + "query": "Run a gap analysis on the living doc", + "should_trigger": false, + "reason": "Gap analysis — routes to living-doc-gap-finder" + }, + { + "id": 22, + "query": "Sync the feature files with the updated AC catalog", + "should_trigger": false, + "reason": "Feature file sync — routes to gherkin-living-doc-sync" + }, + { + "id": 23, + "query": "What does PR #217 affect in the living doc?", + "should_trigger": false, + "reason": "Impact analysis — routes to living-doc-impact-analysis" + }, + { + "id": 24, + "query": "Add data-cy attributes to the login form", + "should_trigger": false, + "reason": "Instrumenting templates — routes to data-cy-instrument" + }, + { + "id": 25, + "query": "Generate BDD scenarios for all ACs on 
US-007", + "should_trigger": false, + "reason": "Generating scenarios — routes to living-doc-scenario-creator" + }, + { + "id": 26, + "query": "Create a new Functionality for the promo validation logic", + "should_trigger": false, + "reason": "Creating a Functionality — routes to living-doc-create-functionality" + }, + { + "id": 27, + "query": "I need to rename FEAT-checkout to FEAT-checkout-v2 — what are all the cascade steps?", + "should_trigger": true, + "reason": "Renaming a Feature — create-feature owns the Feature schema and triggers rename guidance via living-doc-update" + } +] \ No newline at end of file diff --git a/skills/living-doc-create-feature/scripts/next_id.py b/skills/living-doc-create-feature/scripts/next_id.py new file mode 100644 index 0000000..77942e0 --- /dev/null +++ b/skills/living-doc-create-feature/scripts/next_id.py @@ -0,0 +1,142 @@ +#!/usr/bin/env python3 +""" +next_id.py — Living Doc ID Auto-Assigner + +Scans the living documentation and returns the next available ID for a given entity type. +Use this before creating a new entity to avoid ID collisions. + +Usage: + python next_id.py --type US --catalog catalog.json → US-005 + python next_id.py --type FEAT --catalog catalog.json → FEAT-012 + python next_id.py --type FUNC --catalog catalog.json → FUNC-003 + python next_id.py --type AC --parent US-007 --catalog catalog.json → AC:US-007-05 + python next_id.py --type AC --parent FUNC-002 --catalog catalog.json → AC:FUNC-002-03 + +Exits with code 0 and prints the next ID on stdout. +Exits with code 1 and prints an error on stderr if the catalog cannot be read, +the entity type is unknown, or --parent is missing when --type AC is used. 
+ +Catalog JSON must contain one of: + - Top-level keys: "user_stories", "features", "functionalities" + - Or nested under a "catalog" key: {"catalog": {"user_stories": [...], ...}} +""" + +import argparse +import json +import re +import sys + +# Maps entity type token → (catalog collection key, ID regex with capture group for the number) +ENTITY_TYPE_MAP: dict[str, tuple[str, re.Pattern]] = { + "US": ("user_stories", re.compile(r"^US-(\d+)$")), + "FEAT": ("features", re.compile(r"^FEAT-(\d+)$")), + "FUNC": ("functionalities", re.compile(r"^FUNC-(\d+)$")), +} + +# Width of the numeric suffix (zero-padded) +ID_WIDTH = 3 + + +def load_catalog(path: str) -> dict: + with open(path) as f: + raw = json.load(f) + # Support both {"catalog": {...}} and flat {"user_stories": [...]} formats + return raw.get("catalog", raw) + + +def next_entity_id(catalog: dict, entity_type: str) -> str: + """ + Return the next sequential ID for US, FEAT, or FUNC entities. + Scans the matching collection for the highest existing numeric suffix. + """ + if entity_type not in ENTITY_TYPE_MAP: + raise ValueError( + f"Unknown entity type '{entity_type}'. " + f"Must be one of: {sorted(ENTITY_TYPE_MAP)}" + ) + collection_key, pattern = ENTITY_TYPE_MAP[entity_type] + entities: list[dict] = catalog.get(collection_key, []) + + max_num = 0 + for entity in entities: + m = pattern.match(entity.get("id", "")) + if m: + max_num = max(max_num, int(m.group(1))) + + return f"{entity_type}-{max_num + 1:0{ID_WIDTH}d}" + + +def next_ac_id(catalog: dict, parent_id: str) -> str: + """ + Return the next sequential AC ID for a given parent entity (User Story or Functionality). + AC format: AC:- (two-digit zero-padded suffix) + + Scans the parent entity's acceptance_criteria list for the highest existing number. 
+ """ + prefix = parent_id.split("-")[0] + collection_map = {"US": "user_stories", "FUNC": "functionalities"} + collection_key = collection_map.get(prefix) + if not collection_key: + raise ValueError( + f"Cannot determine entity collection for parent '{parent_id}'. " + f"Prefix must be 'US' or 'FUNC'." + ) + + entities: list[dict] = catalog.get(collection_key, []) + parent = next((e for e in entities if e.get("id") == parent_id), None) + if parent is None: + raise ValueError(f"Entity '{parent_id}' not found in catalog") + + ac_pattern = re.compile(rf"^AC:{re.escape(parent_id)}-(\d+)$") + max_num = 0 + for ac in parent.get("acceptance_criteria", []): + m = ac_pattern.match(ac.get("id", "")) + if m: + max_num = max(max_num, int(m.group(1))) + + return f"AC:{parent_id}-{max_num + 1:02d}" + + +def main() -> None: + parser = argparse.ArgumentParser( + description="Return the next available living doc entity ID." + ) + parser.add_argument( + "--type", "-t", + required=True, + choices=["US", "FEAT", "FUNC", "AC"], + help="Entity type to generate an ID for", + ) + parser.add_argument( + "--parent", "-p", + help="Parent entity ID — required when --type is AC (e.g. 
US-007 or FUNC-002)", + ) + parser.add_argument( + "--catalog", "-c", + required=True, + help="Path to the catalog JSON file", + ) + args = parser.parse_args() + + if args.type == "AC" and not args.parent: + print("Error: --parent is required when --type is AC", file=sys.stderr) + sys.exit(1) + + try: + catalog = load_catalog(args.catalog) + if args.type == "AC": + result = next_ac_id(catalog, args.parent) + else: + result = next_entity_id(catalog, args.type) + except (OSError, json.JSONDecodeError) as exc: + print(f"Error reading catalog: {exc}", file=sys.stderr) + sys.exit(1) + except ValueError as exc: + print(f"Error: {exc}", file=sys.stderr) + sys.exit(1) + + print(result) + + +if __name__ == "__main__": + main() diff --git a/skills/living-doc-create-functionality/SKILL.md b/skills/living-doc-create-functionality/SKILL.md new file mode 100644 index 0000000..8cfcbaa --- /dev/null +++ b/skills/living-doc-create-functionality/SKILL.md @@ -0,0 +1,186 @@ +--- +name: living-doc-create-functionality +description: > + Define an atomic, testable behavior (Functionality) with Acceptance Criteria for unit or + integration tests. Use when writing Functionality-level ACs, + choosing test_type, identifying reuse candidates, or reviewing a Functionality. + Triggers on: "create a functionality", "document an atomic behavior", "functionality AC", + "unit-testable behavior", "define component behavior", "atomic acceptance criteria", + "document a business rule", "create a functionality entity", "functionality acceptance criteria", + "test_type", "unit vs integration test", "choose test type", "link functionality to feature", + "review this functionality", "reuse candidate", "what ACs should I write for". + Does NOT trigger for: E2E User Stories (use living-doc-create-user-story); system + surfaces (use living-doc-create-feature); generating BDD scenarios (use + living-doc-scenario-creator). + Pairs with living-doc-create-feature and living-doc-scenario-creator. 
After creating, + update the parent Feature's functionalities[] array. +license: Apache-2.0 +compatibility: GitHub Copilot +--- + +# Living Doc — Create Functionality + +> **Key concepts:** Feature, Functionality, User Story, AC — see [living-doc-glossary](../references/living-doc-glossary.md) ([remote](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md)). +> **BDD schemas:** Functionality feature file template and func_type values — see [living-doc-bdd-schemas](../references/living-doc-bdd-schemas.md) ([remote](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-bdd-schemas.md)). + +## Step 1 — Elicit the behavior + +Before asking, **scan the conversation context** for a behavior phrase and parent Feature already stated by the user. If the behavior is already clear, do not re-ask for it. + +Ask only for what is missing: *What is the atomic behavior to document?* + +Express the Functionality `name` as a **verb phrase only** — one atomic responsibility, with no Feature prefix. Keep the owning Feature separate in `feature_id`. + +``` +✅ "Validate cart contains at least one in-stock item" +✅ "Apply gold member discount on qualifying orders" +✅ "Deduct voucher discount before tax is calculated" + +❌ "Handle checkout" (too broad — split into multiple Functionalities) +❌ "The payment page" (that is a Feature, not a Functionality) +``` + +## Step 2 — Identify the parent Feature + +Ask: *Which Feature (system surface) owns this behavior?* only if it is not already obvious from the prompt. + +A Functionality must belong to at least one Feature. If the user clearly names the surface or domain (for example checkout, basket, login, pricing), infer a provisional `feature_id` such as `FEAT-checkout` and proceed. If the Feature truly does not yet exist, suggest creating it with `living-doc-create-feature` first. 
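The provisional `feature_id` inference described in Step 2 could be sketched roughly as follows. This is a minimal illustration, not part of the toolkit: the helper name `provisional_feature_id` is hypothetical, and the kebab-case rule is an assumption based on the `FEAT-checkout` example above.

```python
import re


def provisional_feature_id(surface_name: str) -> str:
    """Build a provisional kebab-case Feature ID from a surface or domain name.

    Hypothetical helper for illustration only; it does not check the catalog.
    """
    # Lowercase, then collapse every run of non-alphanumeric characters into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", surface_name.lower()).strip("-")
    if not slug:
        raise ValueError("surface name must contain at least one letter or digit")
    return f"FEAT-{slug}"


print(provisional_feature_id("checkout"))      # FEAT-checkout
print(provisional_feature_id("Payment Page"))  # FEAT-payment-page
```

An ID produced this way is only a draft. It still needs to be checked against the catalog for collisions before the entity is saved.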
+ +## Step 3 — Elicit Functionality-level Acceptance Criteria + +Functionality ACs describe atomic inputs to outputs. They are: +- **Atomic**: one input condition, one output or side effect per AC +- **Fast-testable**: designed for verification by unit or integration test +- **Unambiguous**: exact error codes, exact output values, exact rule outcomes where relevant + +Write **3-7 ACs** for one coherent behavior. If a Functionality needs around **12 ACs**, treat that as a strong sign it is not atomic and split it into 2-3 focused Functionalities. + +**Completeness checklist — adapt each prompt to the domain before finalizing:** + +| Category | Prompt | +|---|---| +| Empty / null input | "What happens when the input is null, empty, or missing entirely?" | +| Invalid members / states | "What happens when every item is invalid, only some items are valid, an item has an invalid state such as zero quantity or out-of-stock, or the actor is not eligible (for example a non-gold member)?" | +| Boundary values | "What happens below the threshold, exactly on the threshold, and above it?" | +| Rule interactions | "Does this combine with other rules, discounts, promo codes, or validations? If so, what is the stacking or precedence rule?" | +| External dependency | "Does proving this behavior require a real DB / service read or write, or can it be verified as a pure function?" | +| All error codes | "Are all error codes documented explicitly, not just 'error' or 'invalid'?" | + +Warn if only happy-path ACs are present. + +### Choosing `test_type` + +- Use **`unit`** when the behavior can be verified in isolation as a pure calculation, validation rule, or deterministic transformation. +- Use **`integration`** when correctness depends on a real database, uniqueness check, external service, persistence side effect, or cross-component interaction. 
+- If the behavior could be refactored into a pure function that accepts all required inputs directly, prefer that design and then use **`unit`**. + +### When reviewing an existing Functionality + +Classify findings as **Blocker**, **Important**, or **Nit**. +- **Blocker**: not atomic, vague ACs, or non-testable wording such as "works correctly". +- **Important**: missing error codes, missing boundary conditions, or missing interaction rules. +- **Nit**: wording cleanup that does not change the contract. + +For Blocker or Important findings, propose a split into smaller Functionalities where needed and show rewritten AC examples with exact `When` / `Then` outcomes and explicit error codes. + +## Step 4 — Flag reuse candidates + +Before creating, check whether an identical behavior already exists under any Feature. **Compare ACs, not names** — the same verb phrase in a different Feature context often produces a legitimately different contract. + +> **Scope note:** This step is a lightweight in-session check during creation. For a full cross-catalog duplicate and coverage audit across all existing Functionalities, use `living-doc-gap-finder` instead. + +If the ACs are identical or near-identical across Features or User Stories, prefer **one shared Functionality**. Link every consuming User Story in the `user_stories` array instead of duplicating the ACs. + +> "This is a reuse candidate. If the contract is truly identical, keep one Functionality and link both User Stories to it. Duplicating the same AC in multiple places creates maintenance burden and raises the risk of divergence when the behavior changes." + +If contextually distinct despite similar names, create a new Functionality and note the related one for future reviewers. + +## Step 5 — Output canonical Functionality entity + +When creating a Functionality, output **one fenced `json` code block** and no extra prose inside the block. 
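A reviewer or eval harness might check the "one fenced `json` block" rule with something like the sketch below. It is an assumption-laden illustration: `extract_entity` is a hypothetical name, and the check only covers the block count and JSON validity, not the Functionality schema.

```python
import json
import re

# The fence marker is built at runtime so this example can itself sit inside a fenced block.
FENCE = "`" * 3
JSON_BLOCK = re.compile(rf"{FENCE}json[ \t]*\n(.*?)\n{FENCE}", re.DOTALL)


def extract_entity(reply: str) -> dict:
    """Return the entity from a reply that holds exactly one fenced json block.

    Hypothetical helper: raises ValueError when zero or several blocks are found.
    json.loads also raises if the block contains prose instead of pure JSON.
    """
    blocks = JSON_BLOCK.findall(reply)
    if len(blocks) != 1:
        raise ValueError(f"expected exactly one json block, found {len(blocks)}")
    return json.loads(blocks[0])


reply = f'Here is the entity:\n{FENCE}json\n{{"type": "Functionality", "status": "planned"}}\n{FENCE}\n'
entity = extract_entity(reply)
print(entity["type"], entity["status"])  # Functionality planned
```

Parsing the block with `json.loads` doubles as a check that no extra prose slipped inside the fence.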
+ +> **ID assignment:** before assigning a `FUNC-nnn` ID, run +> `python scripts/next_id.py --type FUNC --catalog catalog.json` +> to get the next available ID and avoid collisions. + +Use this canonical shape: + +```json +{ + "type": "Functionality", + "id": "FUNC-", + "name": "", + "description": "", + "feature_id": "FEAT-", + "user_stories": ["US-"], + "acceptance_criteria": [ + "When , ", + "When , validation returns INVALID with code ", + "When , " + ], + "test_coverage": [ + {"ac": "AC-1", "test_type": "unit", "justification": "Pure validation rule"}, + {"ac": "AC-2", "test_type": "unit", "justification": "Pure validation rule"} + ], + "status": "planned" +} +``` + +Rules: +- `id` uses the stable draft convention `FUNC-` when no catalog allocator is available in-session. +- `name` stays a verb phrase only. +- `description` and `acceptance_criteria` must stay in plain business language with **no implementation details**. +- Every acceptance criterion must state an exact outcome; error cases must include the explicit error code. +- `test_coverage` must cover every AC and record `unit` or `integration` consistently with Step 3. + +> **Promoting `planned` → `active`:** A Functionality is created with `status: "planned"`. Once the tests backing all its ACs are written and passing, use `living-doc-update` to change the status to `active`. Do not mark a Functionality `active` until its test coverage is in place. + +> **Parent Feature sync:** After saving this entity, load `living-doc-update` and append this `FUNC-` to the parent Feature's `"functionalities"` array. An unlinked Functionality will be flagged as `ORPHAN_FUNCTIONALITY` by `living-doc-gap-finder`. + +## Script — `validate_entity.py` + +After outputting the entity, validate it against the canonical schema before saving to the catalog. Do not save the entity if the script exits with code 1. 
+ +```bash +# Validate the output (run from the toolkit root) +python skills/living-doc-update/scripts/validate_entity.py entity.json + +# With referential integrity checks against the full catalog +python skills/living-doc-update/scripts/validate_entity.py entity.json --catalog catalog.json +``` + +Exits 0 if valid (warnings are non-blocking). Exits 1 if any required field is missing, the ID format is wrong, `parent_feature` does not match `FEAT-*`, or the status value is invalid. + +## Distinguishing Functionality ACs from User Story ACs + +| Dimension | User Story AC | Functionality AC | +|---|---|---| +| Perspective | End user observing outcomes | Developer / component behaviour | +| Scope | Full E2E flow | Single function or method | +| Example | "Order is confirmed and email is sent" | "Returns the discounted total when a valid membership tier is applied" | + +If an AC written here is outcome-based from a user's perspective, it belongs in the User Story — +redirect to `living-doc-create-user-story`. + +## Anti-patterns to flag + +| Anti-pattern | Warning | +|---|---| +| Functionality name is a noun (e.g. "Password Validation") | Names must be verb phrases expressing the atomic behavior — e.g. "Validate Password Strength". | +| Functionality name is broad (e.g. "Handle checkout") | That is not atomic. Split it into smaller behaviors such as validation, pricing, payment authorization, or order submission. | +| Functionality AC describes a full user journey (e.g. "User logs in and sees their dashboard") | That is a User Story AC — redirect to **living-doc-create-user-story**. Functionality ACs describe a single behavior's input to output or side effect. | +| Functionality has only happy-path ACs | Edge cases (null input, boundary values, partial validity, error codes) are missing. Run through the completeness checklist in Step 3 before confirming. | +| AC says "returns error" without specifying the type or code | Specify the exact error code. 
Without a named code, the AC is not testable. | +| AC wording is vague (e.g. "works correctly", "handles it appropriately") | Rewrite with exact `When` / `Then` behavior and explicit outputs or error codes. | +| Functionality has more than 7 ACs | Review for non-atomic scope. Around 12 ACs is almost certainly too broad and should be split into 2-3 Functionalities. | +| Two Functionalities have identical or near-identical ACs | Duplicate ACs create a maintenance burden. Consolidate into one shared Functionality and link all related `user_stories`. | +| Functionality has no parent Feature | A Functionality without a parent Feature is untraceable — create or identify the parent Feature first. | + +## Out-of-scope routing + +| Request type | Correct skill | +|---|---| +| "Create a User Story" | `living-doc-create-user-story` — this skill documents atomic behaviors, not end-to-end User Stories | +| "Create a Feature entity" | `living-doc-create-feature` — a Feature is a system surface, not an atomic behavior | +| "Write unit tests for this Functionality" | No skill in this toolkit covers unit test authoring — use your project's test framework directly. This skill defines the _what_ (ACs); writing the test code is outside scope. | +| "Generate BDD scenarios for this Functionality" | `living-doc-scenario-creator` | diff --git a/skills/living-doc-create-functionality/evals/evals.json b/skills/living-doc-create-functionality/evals/evals.json new file mode 100644 index 0000000..2844041 --- /dev/null +++ b/skills/living-doc-create-functionality/evals/evals.json @@ -0,0 +1,205 @@ +{ + "skill_name": "living-doc-create-functionality", + "evals": [ + { + "id": 1, + "category": "happy-path", + "prompt": "I want to document the atomic behavior: validate that a cart contains at least one in-stock item before checkout.", + "expected_output": "Agent forms verb-phrase name: 'Validate cart contains at least one in-stock item'. Links to FEAT-checkout. 
Runs completeness checklist: asks about null/empty cart, out-of-stock items, partially in-stock carts, zero-quantity items. Produces at least 4 ACs: empty cart → CART_EMPTY error, all items out of stock → OUT_OF_STOCK error, valid cart → VALID, zero-quantity item → INVALID_QUANTITY. All test_type=unit. Outputs canonical Functionality JSON.", + "files": [], + "expectations": [ + "Name is a verb phrase (not a noun)", + "Asks about empty cart, out-of-stock, and boundary conditions", + "Runs the completeness checklist", + "Produces at least 3 ACs covering error codes explicitly", + "All ACs have test_type=unit", + "Outputs valid canonical Functionality JSON" + ] + }, + { + "id": 2, + "category": "happy-path", + "prompt": "Review this Functionality and tell me what's wrong with it. File: evals/files/broad-functionality.json", + "expected_output": "Agent identifies: Blocker — name 'Handle checkout' is too broad (verb phrase should identify a single atomic behavior); Blocker — both ACs are vague ('works correctly', 'handles it appropriately') — not testable; Important — no error codes specified in the error AC; Important — no boundary value conditions. Proposes splitting into multiple focused Functionalities and rewriting ACs with exact When/Then conditions and error codes.", + "files": [ + "evals/files/broad-functionality.json" + ], + "expectations": [ + "Flags 'Handle checkout' as too broad — not atomic", + "Flags 'works correctly' as non-testable — no exact output", + "Flags 'handles it appropriately' as vague — no error code", + "Proposes splitting into at least 2 focused Functionalities", + "Provides rewritten AC examples with exact outputs and error codes" + ] + }, + { + "id": 3, + "category": "happy-path", + "prompt": "Should this AC have test_type=unit or test_type=integration? 
The behavior is: validate that the customer's email is not already registered.", + "expected_output": "This requires a database read to check existing registrations — it cannot be tested in pure isolation. test_type=integration. Explains: unit tests mock the DB; an integration test uses a real or test DB to verify the uniqueness constraint actually works. Notes that if the validation logic is a pure function that takes a list of existing emails as input, it could be unit-testable — in that case unit is preferred.", + "files": [], + "expectations": [ + "Recommends integration because it requires a real DB read", + "Explains the unit vs integration distinction", + "Notes the pure-function alternative that would allow unit testing" + ] + }, + { + "id": 4, + "category": "regression", + "prompt": "I noticed this AC appears in both US-001 and US-007: 'When the cart is empty, validation returns INVALID with code CART_EMPTY'. Should I duplicate it?", + "expected_output": "No. This is a reuse candidate. Create a single Functionality with this AC and link both US-001 and US-007 to it via the user_stories array. Duplicating ACs across User Stories creates maintenance burden — a change to the behavior must be updated in multiple places.", + "files": [], + "expectations": [ + "Flags as a reuse candidate — not a duplication", + "Advises creating one Functionality and linking both USs via user_stories array", + "Explains the maintenance risk of duplication" + ] + }, + { + "id": 5, + "category": "negative", + "prompt": "Create a User Story for the order placement capability.", + "expected_output": "User Story creation — routes to living-doc-create-user-story. 
This skill documents atomic behaviors (Functionalities), not end-to-end User Stories.", + "files": [], + "expectations": [ + "Does not create a User Story", + "Routes to living-doc-create-user-story" + ] + }, + { + "id": 6, + "category": "paraphrase", + "prompt": "I need to capture the business rule: a gold member gets a 20% discount on all orders over £50. How do I document this in the living doc?", + "expected_output": "Agent identifies this as a Functionality entity. Forms verb-phrase name: 'Apply gold member discount on qualifying orders'. Runs completeness checklist: asks about non-gold members, orders exactly £50 (boundary), orders under £50, combination with promo codes. Produces ACs: order>£50 and gold member → 20% discount applied; order<=£50 → no discount; non-gold member → no discount; combined with promo → define stacking rule. All ACs test_type=unit. Outputs canonical Functionality JSON.", + "files": [], + "expectations": [ + "Identifies as Functionality (atomic business rule), not User Story", + "Verb-phrase name captures the business rule precisely", + "Boundary value at £50 is included in the completeness checklist", + "At least 3 ACs with explicit expected outcomes", + "Outputs valid canonical Functionality JSON" + ] + }, + { + "id": 7, + "category": "edge-case", + "prompt": "A Functionality has 12 ACs. Is that a sign of a design problem?", + "expected_output": "Yes. A Functionality with 12 ACs is almost certainly not atomic — it is covering multiple distinct behaviors. An atomic Functionality should have 3-7 ACs covering the happy path, key error codes, and important boundary values. With 12, review whether the ACs can be grouped into 2-3 distinct behaviors that each warrant their own Functionality entity. Split the Functionality by behavioral concern. 
The goal is one coherent unit-testable behavior per entity.", + "files": [], + "expectations": [ + "Flags 12 ACs as a sign of non-atomic scope", + "Recommends 3-7 ACs as the target range", + "Advises splitting into 2-3 focused Functionalities", + "Explains the atomic unit-testable behavior criterion" + ] + }, + { + "id": 8, + "category": "output-format", + "prompt": "Document the atomic behavior: 'The system deducts the applied voucher discount from the basket total before tax is calculated.'", + "expected_output": "The output contains a fenced ```json code block with a Functionality entity. Required fields: type ('Functionality'), id (FUNC-), name, description, feature_id, acceptance_criteria (array), test_coverage (array), status. Acceptance criteria items are phrased in plain business English. No implementation details appear in the JSON values.", + "files": [], + "expectations": [ + "Single fenced ```json code block", + "type field is 'Functionality'", + "id follows FUNC- convention", + "acceptance_criteria is an array of business-language strings", + "test_coverage array covers every AC with unit or integration classification", + "No implementation details in JSON values" + ] + }, + { + "id": 9, + "category": "regression", + "prompt": "I want to document a Functionality called 'Password Validation'. Is that a good name?", + "expected_output": "'Password Validation' is a noun phrase — Functionality names must be verb phrases expressing what the behavior does. Rename it to 'Validate Password Strength' or 'Validate Password Against Policy'. The noun phrase 'Password Validation' names a module or component, which would be a Feature, not a Functionality. Correct format: verb + object, describing the single atomic action the Functionality performs.", + "files": [], + "expectations": [ + "Flags 'Password Validation' as a noun phrase anti-pattern", + "Provides a correct verb-phrase rename (e.g. 
'Validate Password Strength')", + "Explains: Functionality names must be verb phrases", + "Notes noun phrases describe modules/components (Features), not behaviors" + ] + }, + { + "id": 10, + "category": "happy-path", + "prompt": "I need to document this atomic behavior for the checkout domain: 'Reject order when all items in the cart are out of stock'. The Feature hasn't been created yet.", + "expected_output": "Agent infers a provisional feature_id of FEAT-checkout from the 'checkout domain' context. Forms the verb-phrase name: 'Reject order when all items are out of stock'. Runs completeness checklist: boundary conditions (all items out of stock vs. some items out of stock), partial cart availability, zero-quantity items. Notes that FEAT-checkout must be created via living-doc-create-feature if it does not yet exist. Outputs canonical Functionality JSON with feature_id: 'FEAT-checkout'.", + "files": [], + "expectations": [ + "Infers feature_id as FEAT-checkout from 'checkout domain' context", + "Forms a verb-phrase name", + "Runs completeness checklist: all out of stock vs. partial", + "Notes that FEAT-checkout must be created if it doesn't exist", + "Outputs valid Functionality JSON with feature_id populated" + ] + }, + { + "id": 11, + "category": "regression", + "prompt": "A Functionality has no parent Feature — I forgot to link it. Is that a problem?", + "expected_output": "Yes. A Functionality without a parent Feature is untraceable — it cannot be reached via the entity hierarchy, will be surfaced as an ORPHAN_FUNCTIONALITY gap by living-doc-gap-finder, and will be missed in impact analyses. Identify or create the owning Feature first (living-doc-create-feature), then set feature_id in the Functionality entity. 
Do not leave feature_id empty or null in a finalized entity.", + "files": [], + "expectations": [ + "Flags missing feature_id as a traceability problem", + "Names the gap type: ORPHAN_FUNCTIONALITY", + "Notes it will be missed in impact analyses", + "Directs to living-doc-create-feature to create the parent Feature first" + ] + }, + { + "id": 12, + "category": "edge-case", + "prompt": "I have a Functionality AC that says: 'The system validates the discount code'. Is this AC well-formed?", + "expected_output": "No. This AC is vague — it does not specify an observable outcome or error code. 'validates' without a result is non-testable. Rewrite using When/Then form with explicit outcomes: e.g. 'When a valid discount code is applied, the discount is deducted from the basket total' and 'When an invalid or expired discount code is submitted, validation returns INVALID_CODE error'. Every AC must state an exact output or side effect; error cases must include the explicit error code.", + "files": [], + "expectations": [ + "Flags 'validates' without a result as non-testable vague AC", + "Provides a rewritten AC with explicit When/Then outcome", + "Includes an error-case AC with an explicit error code", + "Explains: every AC must state an exact outcome" + ] + }, + { + "id": 14, + "category": "regression", + "prompt": "Create a new Functionality entity 'Validate discount code expiry'. The catalog already contains FUNC-001 through FUNC-007.", + "expected_output": "Before assigning an ID, the agent runs: python scripts/next_id.py --type FUNC --catalog catalog.json. The script returns FUNC-008. The agent assigns id='FUNC-008'. It does NOT invent an ID, reuse an existing one, or leave a placeholder such as FUNC-XXX or FUNC-. The final JSON has a fully populated id field before being presented to the user.", + "files": [], + "expectations": [ + "Runs next_id.py --type FUNC before assigning the ID", + "Assigns the ID returned by the script (e.g. 
FUNC-008)", + "Does not invent, guess, or reuse an ID", + "Does not leave a placeholder (FUNC-XXX, FUNC-, FUNC-unknown)", + "Final JSON has a fully populated id field" + ] + }, + { + "id": 15, + "category": "regression", + "prompt": "Create a Functionality entity but the catalog file is missing — next_id.py cannot be run. What should the agent do?", + "expected_output": "Agent cannot auto-assign an ID. It outputs the entity with id='FUNC-PENDING' and explicitly states: 'Catalog not available — ID could not be assigned. Run next_id.py --type FUNC once the catalog is present and update this field before saving.' It does NOT invent a numeric ID such as FUNC-001 without catalog evidence. The placeholder makes the gap visible rather than hiding it with a guessed value.", + "files": [], + "expectations": [ + "Uses FUNC-PENDING placeholder when catalog is unavailable", + "Explicitly warns the user that the ID must be assigned before saving", + "Does not invent a numeric ID without catalog evidence", + "Placeholder is visibly distinct — not a real FUNC-nnn value" + ] + }, + { + "id": 13, + "category": "regression", + "prompt": "I just saved FUNC-promo-validate to the living doc. What else do I need to do to make it fully linked?", + "expected_output": "After saving the Functionality entity, load living-doc-update and append 'FUNC-promo-validate' to the parent Feature's 'functionalities' array. An unlinked Functionality will be flagged as ORPHAN_FUNCTIONALITY by living-doc-gap-finder. 
The link must be added to the Feature entity — the Functionality alone is not sufficient.", + "files": [], + "expectations": [ + "Identifies the required parent Feature update step", + "Routes to living-doc-update to perform the append", + "Mentions ORPHAN_FUNCTIONALITY gap type as the consequence of skipping" + ] + } + ] +} \ No newline at end of file diff --git a/skills/living-doc-create-functionality/evals/files/broad-functionality.json b/skills/living-doc-create-functionality/evals/files/broad-functionality.json new file mode 100644 index 0000000..c61971c --- /dev/null +++ b/skills/living-doc-create-functionality/evals/files/broad-functionality.json @@ -0,0 +1,25 @@ +{ + "type": "Functionality", + "id": "FUNC-checkout-draft", + "name": "Handle checkout", + "parent_feature": "FEAT-checkout", + "user_stories": ["US-001"], + "acceptance_criteria": [ + { + "id": "FUNC-checkout-draft-AC-1", + "description": "Checkout works correctly", + "when": "the customer goes through the checkout process", + "then": "the order is placed and everything works", + "priority": "critical", + "test_type": "unit" + }, + { + "id": "FUNC-checkout-draft-AC-2", + "description": "Error handling works", + "when": "something goes wrong during checkout", + "then": "the system handles it appropriately", + "priority": "high", + "test_type": "unit" + } + ] +} diff --git a/skills/living-doc-create-functionality/evals/fixture-map.md b/skills/living-doc-create-functionality/evals/fixture-map.md new file mode 100644 index 0000000..32e3c02 --- /dev/null +++ b/skills/living-doc-create-functionality/evals/fixture-map.md @@ -0,0 +1,37 @@ +# Fixture Map — living-doc-create-functionality + +## Fixture files + +| File | Description | +|---|---| +| `evals/files/broad-functionality.json` | Draft Functionality with over-broad name ("Handle checkout") and vague ACs — tests completeness enforcement | + +## Eval to fixture mapping + +| Eval ID | Category | Fixture file(s) | Coverage | +|---|---|---|---| +| 1 | 
happy-path | _(none — conversational)_ | Full elicitation: cart validation behavior, completeness checklist, atomic ACs, error codes | +| 2 | happy-path | `broad-functionality.json` | Blocker detection: broad name, vague ACs, no error codes | +| 3 | happy-path | _(none)_ | unit vs integration decision for a DB uniqueness check | +| 4 | regression | _(none)_ | Reuse candidate detection: same AC in two User Stories | +| 5 | negative | _(none)_ | Routing: User Story creation → living-doc-create-user-story | +| 6 | paraphrase | _(none)_ | Gold member discount business rule — Functionality elicitation | +| 7 | edge-case | _(none)_ | 12 ACs → non-atomic scope signal; recommend split | +| 8 | output-format | _(none)_ | Canonical Functionality JSON: all required fields, test_coverage array | +| 9 | regression | _(none)_ | Anti-pattern: noun name ('Password Validation') → verb phrase required | +| 10 | happy-path | _(none)_ | Feature inference from context ('checkout domain') | +| 11 | regression | _(none)_ | Missing parent Feature: ORPHAN_FUNCTIONALITY anti-pattern | +| 12 | edge-case | _(none)_ | Vague AC ('validates') — non-testable; rewrite with explicit When/Then + error code | + +## Trigger eval summary + +22 entries: 16 `should_trigger=true`, 6 `should_trigger=false` + +| Routes to | Query count | +|---|---| +| living-doc-create-user-story | 1 | +| living-doc-create-feature | 1 | +| living-doc-scenario-creator | 1 | +| living-doc-gap-finder | 1 | +| living-doc-update | 1 | +| living-doc-pageobject-scan | 1 | diff --git a/skills/living-doc-create-functionality/evals/trigger-eval.json b/skills/living-doc-create-functionality/evals/trigger-eval.json new file mode 100644 index 0000000..f50c42e --- /dev/null +++ b/skills/living-doc-create-functionality/evals/trigger-eval.json @@ -0,0 +1,200 @@ +[ + { + "id": 1, + "query": "Create a functionality for validating cart contents before checkout", + "should_trigger": true, + "reason": "Explicit 'create a functionality' trigger 
keyword" + }, + { + "id": 2, + "query": "Document the atomic behavior: apply discount to a cart item", + "should_trigger": true, + "reason": "'document an atomic behavior' trigger phrase" + }, + { + "id": 3, + "query": "Write Functionality ACs for the discount engine", + "should_trigger": true, + "reason": "'functionality AC' trigger phrase" + }, + { + "id": 4, + "query": "Define a unit-testable behavior for the coupon validation module", + "should_trigger": true, + "reason": "'unit-testable behavior' and 'define component behavior' trigger phrases" + }, + { + "id": 5, + "query": "Document the business rule: orders over $100 get free shipping", + "should_trigger": true, + "reason": "'document a business rule' trigger phrase" + }, + { + "id": 6, + "query": "Create a functionality entity for the payment retry logic", + "should_trigger": true, + "reason": "'create a functionality entity' trigger phrase" + }, + { + "id": 7, + "query": "What ACs should I write for the email validator function?", + "should_trigger": true, + "reason": "Asking for atomic AC writing — core functionality skill task" + }, + { + "id": 8, + "query": "What test_type should I use for checking DB uniqueness constraints?", + "should_trigger": true, + "reason": "Deciding unit vs integration — functionality skill task" + }, + { + "id": 9, + "query": "Review this functionality for completeness — it only has a happy path", + "should_trigger": true, + "reason": "Completeness check of Functionality ACs is a core task" + }, + { + "id": 10, + "query": "I see this AC in both US-001 and US-007 — should I split it out?", + "should_trigger": true, + "reason": "Reuse candidate identification — a core functionality skill task" + }, + { + "id": 11, + "query": "Create a user story for the checkout capability", + "should_trigger": false, + "reason": "User Story — routes to living-doc-create-user-story" + }, + { + "id": 12, + "query": "Document the checkout page as a Feature", + "should_trigger": false, + "reason": 
"Feature entity — routes to living-doc-create-feature" + }, + { + "id": 13, + "query": "Generate BDD scenarios for US-001", + "should_trigger": false, + "reason": "Scenario generation — routes to living-doc-scenario-creator" + }, + { + "id": 14, + "query": "Run a gap analysis on the living documentation", + "should_trigger": false, + "reason": "Gap analysis — routes to living-doc-gap-finder" + }, + { + "id": 15, + "query": "How should I define the component behavior for the payment validator?", + "should_trigger": true, + "reason": "'define component behavior' trigger phrase" + }, + { + "id": 16, + "query": "Write atomic acceptance criteria for the session expiry logic", + "should_trigger": true, + "reason": "'atomic acceptance criteria' trigger phrase" + }, + { + "id": 17, + "query": "Should this behavior be tested with a unit test or an integration test?", + "should_trigger": true, + "reason": "'unit vs integration test' trigger phrase" + }, + { + "id": 18, + "query": "Help me choose test type for the loyalty points calculation — it calls no external services", + "should_trigger": true, + "reason": "'choose test type' trigger phrase" + }, + { + "id": 19, + "query": "Help me document the null-check rule for user IDs in the registration service", + "should_trigger": true, + "reason": "Documenting an atomic validation rule is a Functionality — 'document a business rule' / 'atomic acceptance criteria' pattern" + }, + { + "id": 20, + "query": "A Functionality I wrote has no parent Feature — how do I link it?", + "should_trigger": true, + "reason": "Resolving ORPHAN_FUNCTIONALITY — identifying and linking the parent Feature is a Functionality skill task" + }, + { + "id": 21, + "query": "Update the living doc entity for the discount validation Functionality", + "should_trigger": false, + "reason": "Updating an existing entity — routes to living-doc-update" + }, + { + "id": 22, + "query": "Scan the checkout page for UI elements", + "should_trigger": false, + "reason": 
"UI scan — routes to living-doc-pageobject-scan" + }, + { + "id": 23, + "query": "Write step definitions for the cart validation behavior", + "should_trigger": false, + "reason": "Writing step definitions — routes to gherkin-step" + }, + { + "id": 24, + "query": "Run a dead code audit to find unused PageObject methods", + "should_trigger": false, + "reason": "BDD artifact cleanup — routes to bdd-maintain" + }, + { + "id": 25, + "query": "Sync the feature files with the updated AC catalog", + "should_trigger": false, + "reason": "Feature file sync — routes to gherkin-living-doc-sync" + }, + { + "id": 26, + "query": "What does changing the cart validator affect in the living doc?", + "should_trigger": false, + "reason": "Impact analysis — routes to living-doc-impact-analysis" + }, + { + "id": 27, + "query": "Scan the checkout page for UI elements and generate PageObjects", + "should_trigger": false, + "reason": "PageObject scanning — routes to living-doc-pageobject-scan" + }, + { + "id": 28, + "query": "Add data-cy attributes to the checkout form inputs", + "should_trigger": false, + "reason": "Instrumenting templates — routes to data-cy-instrument" + }, + { + "id": 29, + "query": "Generate BDD scenarios for all active ACs on US-007", + "should_trigger": false, + "reason": "Generating scenarios — routes to living-doc-scenario-creator" + }, + { + "id": 30, + "query": "Create a User Story for the cart validation capability", + "should_trigger": false, + "reason": "Creating a User Story — routes to living-doc-create-user-story" + }, + { + "id": 31, + "query": "Create a Feature entity for the checkout module", + "should_trigger": false, + "reason": "Creating a Feature — routes to living-doc-create-feature" + }, + { + "id": 32, + "query": "Run a gap analysis on the living documentation", + "should_trigger": false, + "reason": "Gap analysis — routes to living-doc-gap-finder" + }, + { + "id": 33, + "query": "I just created FUNC-promo-validate — how do I add it to the parent 
Feature's functionalities array?",
+    "should_trigger": true,
+    "reason": "Parent Feature sync after creating a Functionality — living-doc-create-functionality owns this step"
+  }
+]
\ No newline at end of file
diff --git a/skills/living-doc-create-functionality/scripts/next_id.py b/skills/living-doc-create-functionality/scripts/next_id.py
new file mode 100644
index 0000000..77942e0
--- /dev/null
+++ b/skills/living-doc-create-functionality/scripts/next_id.py
@@ -0,0 +1,142 @@
+#!/usr/bin/env python3
+"""
+next_id.py — Living Doc ID Auto-Assigner
+
+Scans the living documentation and returns the next available ID for a given entity type.
+Use this before creating a new entity to avoid ID collisions.
+
+Usage:
+    python next_id.py --type US --catalog catalog.json  → US-005
+    python next_id.py --type FEAT --catalog catalog.json  → FEAT-012
+    python next_id.py --type FUNC --catalog catalog.json  → FUNC-003
+    python next_id.py --type AC --parent US-007 --catalog catalog.json  → AC:US-007-05
+    python next_id.py --type AC --parent FUNC-002 --catalog catalog.json  → AC:FUNC-002-03
+
+Exits with code 0 and prints the next ID on stdout.
+Exits with code 1 and prints an error on stderr if the catalog cannot be read,
+the entity type is unknown, or --parent is missing when --type AC is used.
+
+Catalog JSON must contain one of:
+  - Top-level keys: "user_stories", "features", "functionalities"
+  - Or nested under a "catalog" key: {"catalog": {"user_stories": [...], ...}}
+"""
+
+import argparse
+import json
+import re
+import sys
+
+# Maps entity type token → (catalog collection key, ID regex with capture group for the number)
+ENTITY_TYPE_MAP: dict[str, tuple[str, re.Pattern]] = {
+    "US": ("user_stories", re.compile(r"^US-(\d+)$")),
+    "FEAT": ("features", re.compile(r"^FEAT-(\d+)$")),
+    "FUNC": ("functionalities", re.compile(r"^FUNC-(\d+)$")),
+}
+
+# Width of the numeric suffix (zero-padded)
+ID_WIDTH = 3
+
+
+def load_catalog(path: str) -> dict:
+    with open(path) as f:
+        raw = json.load(f)
+    # Support both {"catalog": {...}} and flat {"user_stories": [...]} formats
+    return raw.get("catalog", raw)
+
+
+def next_entity_id(catalog: dict, entity_type: str) -> str:
+    """
+    Return the next sequential ID for US, FEAT, or FUNC entities.
+    Scans the matching collection for the highest existing numeric suffix.
+    """
+    if entity_type not in ENTITY_TYPE_MAP:
+        raise ValueError(
+            f"Unknown entity type '{entity_type}'. "
+            f"Must be one of: {sorted(ENTITY_TYPE_MAP)}"
+        )
+    collection_key, pattern = ENTITY_TYPE_MAP[entity_type]
+    entities: list[dict] = catalog.get(collection_key, [])
+
+    max_num = 0
+    for entity in entities:
+        m = pattern.match(entity.get("id", ""))
+        if m:
+            max_num = max(max_num, int(m.group(1)))
+
+    return f"{entity_type}-{max_num + 1:0{ID_WIDTH}d}"
+
+
+def next_ac_id(catalog: dict, parent_id: str) -> str:
+    """
+    Return the next sequential AC ID for a given parent entity (User Story or Functionality).
+    AC format: AC:- (two-digit zero-padded suffix)
+
+    Scans the parent entity's acceptance_criteria list for the highest existing number.
+    """
+    prefix = parent_id.split("-")[0]
+    collection_map = {"US": "user_stories", "FUNC": "functionalities"}
+    collection_key = collection_map.get(prefix)
+    if not collection_key:
+        raise ValueError(
+            f"Cannot determine entity collection for parent '{parent_id}'. "
+            f"Prefix must be 'US' or 'FUNC'."
+        )
+
+    entities: list[dict] = catalog.get(collection_key, [])
+    parent = next((e for e in entities if e.get("id") == parent_id), None)
+    if parent is None:
+        raise ValueError(f"Entity '{parent_id}' not found in catalog")
+
+    ac_pattern = re.compile(rf"^AC:{re.escape(parent_id)}-(\d+)$")
+    max_num = 0
+    for ac in parent.get("acceptance_criteria", []):
+        m = ac_pattern.match(ac.get("id", ""))
+        if m:
+            max_num = max(max_num, int(m.group(1)))
+
+    return f"AC:{parent_id}-{max_num + 1:02d}"
+
+
+def main() -> None:
+    parser = argparse.ArgumentParser(
+        description="Return the next available living doc entity ID."
+    )
+    parser.add_argument(
+        "--type", "-t",
+        required=True,
+        choices=["US", "FEAT", "FUNC", "AC"],
+        help="Entity type to generate an ID for",
+    )
+    parser.add_argument(
+        "--parent", "-p",
+        help="Parent entity ID — required when --type is AC (e.g. US-007 or FUNC-002)",
+    )
+    parser.add_argument(
+        "--catalog", "-c",
+        required=True,
+        help="Path to the catalog JSON file",
+    )
+    args = parser.parse_args()
+
+    if args.type == "AC" and not args.parent:
+        print("Error: --parent is required when --type is AC", file=sys.stderr)
+        sys.exit(1)
+
+    try:
+        catalog = load_catalog(args.catalog)
+        if args.type == "AC":
+            result = next_ac_id(catalog, args.parent)
+        else:
+            result = next_entity_id(catalog, args.type)
+    except (FileNotFoundError, json.JSONDecodeError) as exc:
+        print(f"Error reading catalog: {exc}", file=sys.stderr)
+        sys.exit(1)
+    except ValueError as exc:
+        print(f"Error: {exc}", file=sys.stderr)
+        sys.exit(1)
+
+    print(result)
+
+
+if __name__ == "__main__":
+    main()
diff --git a/skills/living-doc-create-user-story/SKILL.md b/skills/living-doc-create-user-story/SKILL.md
new file mode 100644
index 0000000..3eaa333
--- /dev/null
+++ b/skills/living-doc-create-user-story/SKILL.md
@@ -0,0 +1,176 @@
+---
+name: living-doc-create-user-story
+description: >
+  Guide the creation of a well-formed User Story (US) with business-level Acceptance Criteria
+  that are traceable, testable, and E2E-ready. Use when creating a new User Story, eliciting
+  As-a/I-can/so-that narratives, defining US-level ACs, validating US narrative structure,
+  or reviewing US completeness before scenario creation.
+  Triggers on: "create a user story", "new user story for", "write acceptance criteria for",
+  "document a business requirement", "define US AC", "user story template", "as a user I want",
+  "elicit requirements", "AC for user story", "US acceptance criteria",
+  "review this user story", "is my narrative well-formed", "I-want clause".
+  Does NOT trigger for: atomic behaviors (use living-doc-create-functionality); system surfaces
+  (use living-doc-create-feature); generating BDD scenarios (use living-doc-scenario-creator).
+ Pairs with living-doc-create-feature, living-doc-create-functionality, and + living-doc-scenario-creator (generate scenarios after the US is active). +license: Apache-2.0 +compatibility: GitHub Copilot +--- + +# Living Doc — Create User Story + +> **Key concepts:** Feature, Functionality, User Story, AC — see [living-doc-glossary](../references/living-doc-glossary.md) ([remote](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md)). + +## Step 1 — Elicit the narrative + +Before asking, **scan the conversation context** for an actor, capability, or business outcome already stated by the user. If the user clearly provides all three parts and asks for the final artifact now, form the narrative directly and proceed to output. Otherwise, walk through all three questions in order. When a detail is already implied, restate it as a proposed answer and ask the user to confirm or refine it rather than silently skipping the question. + +Ask these three questions explicitly: + +1. **Who is the user?** — The actor using the system (a specific role, not "the user") +2. **What do they want to do?** — The capability or action in business terms +3. **Why?** — The business outcome or value delivered + +Form the canonical narrative: + +``` +As a , +I can , +so that . +``` + +**Validation:** +- Actor must be a named role — not "system", "admin", or "the app" +- If the actor given is "the system" or similar, reject it: *"The system is not a valid actor. + Ask: who triggers this action? Who benefits from it? Name that human role."* + System-initiated or background flows do not belong in a User Story — they belong in a + Functionality. Redirect to `living-doc-create-functionality` for system-driven behaviors. 
+- Capability must be an action the user performs — not a technical implementation +- Outcome must describe business value — not system state + +## Step 2 — Establish domain context + +Ask: *Which Feature(s) does this User Story touch?* + +A Feature is a named system surface (UI screen or API endpoint group). If the Feature +has not yet been created as a living doc entity, note it as `[NEW: ]` and suggest creating it +with `living-doc-create-feature` after completing the User Story. + +Also ask: *Are there existing Functionalities this User Story relies on?* If yes, link them in the `functionalities` array. This prevents `ORPHAN_FUNCTIONALITY` gaps and makes the entity graph traversable from US down to test coverage. + +## Step 3 — Elicit Acceptance Criteria + +Each AC must be: +- **End-to-end** — written from the user's perspective, not the database's +- **Outcome-focused** — "order is confirmed" not "DB record is inserted" +- **Binary** — clear pass/fail; no "should usually" or "typically" +- **Single placeholder** — at most ONE `{placeholder}` per AC statement. If two aspects vary independently, write a separate AC for each. + +Use `{placeholder}` syntax when a value varies, and list the concrete values immediately below. During elicitation, capture ACs using structured condition / action / outcome language; in the final JSON, convert each accepted AC into a plain-language description. + +When reviewing an existing User Story, classify **only happy-path ACs present** as an **Important** gap. Name the missing cases in domain language and propose 2-3 extra Given / When / Then ACs. For password-reset stories, explicitly check for: unregistered email or phone, expired token or code, already-used token or code, wrong code, and retry limits. + +If the request is really for a single atomic rule or technical behavior rather than an end-to-end user outcome, say so explicitly: this is a **Functionality-level behavior**, not a User Story. 
Stop and redirect to `living-doc-create-functionality`. + +**Completeness check — always ask:** +1. What happens on the happy path? (at least one AC required) +2. What happens when the input is invalid or missing? +3. What happens when a downstream dependency fails? +4. Are there alternative flows (e.g. user is not logged in, item is out of stock)? + +Warn if only happy-path ACs are present: +> "No error or alternative-path ACs were provided. Real systems fail — add at least one +> AC for a failure or edge case before marking this US ready." + +**Warn if an AC reads like a Functionality AC** (too atomic/technical): +> "This AC describes a technical behavior rather than an end-to-end user outcome. +> Consider creating a Functionality entity for this behavior with +> living-doc-create-functionality." + +## Step 4 — Validate and output + +> **ID assignment:** before assigning a `US-nnn` ID, run +> `python scripts/next_id.py --type US --catalog catalog.json` +> to get the next available ID and avoid collisions. + +Invariants that must hold before outputting: +- At least one AC exists +- At least one Feature is linked (or flagged as `[NEW]`) +- Status defaults to `planned` +- No open `[TODO]` markers + +When creating a new User Story, output **one fenced `json` code block** using this canonical shape: + +```json +{ + "type": "UserStory", + "id": "US-001", + "title": "Reset password via SMS", + "status": "planned", + "as_a": "registered customer", + "i_want": "reset my password via SMS", + "so_that": "I can regain access even when I cannot use email", + "features": ["FEAT-login"], + "acceptance_criteria": [ + { + "id": "AC:US-001-01", + "description": "A registered customer with a phone number on file can request a password reset code by SMS and sees confirmation that the code was sent." + }, + { + "id": "AC:US-001-02", + "description": "A customer who enters an unregistered phone number is told that the reset request cannot be completed." 
+ }, + { + "id": "AC:US-001-03", + "description": "A customer who submits an expired or already-used reset code is told to request a new code." + } + ] +} +``` + +Rules: +- Use `title` rather than `name` +- Use `as_a`, `i_want`, and `so_that` +- Every AC object must have `id` in `AC:US--` format and a plain-language `description` +- Write AC descriptions in plain language — no structured language keywords in JSON values + +> **Next steps after creation:** The User Story is created with `status: "planned"`. When all ACs are finalised and at least one Feature is linked, use `living-doc-update` to promote it to `active`. After promotion, use `living-doc-scenario-creator` to generate BDD feature files for each `ACTIVE` AC. + +## Script — `validate_entity.py` + +After outputting the entity, validate it against the canonical schema before saving to the catalog. Do not save the entity if the script exits with code 1. + +```bash +# Validate the output (run from the toolkit root) +python skills/living-doc-update/scripts/validate_entity.py entity.json + +# With referential integrity checks against the full catalog +python skills/living-doc-update/scripts/validate_entity.py entity.json --catalog catalog.json +``` + +Exits 0 if valid (warnings are non-blocking). Exits 1 if any required field is missing, the ID format is wrong, no AC is present, or the status value is invalid. + +## Anti-patterns to flag + +| Anti-pattern | Warning | +|---|---| +| AC says "the system saves to the database" | Technical implementation — restate as user outcome. Provide a rewritten AC: e.g. "When the customer confirms the order, then the order is acknowledged and the customer sees a confirmation message." | +| AC says "unit test passes" | Test is not an AC — describe the behavior, not how it's verified | +| Narrative says "As a system..." | System is not a user — name the human role | +| Same capability described for two different actors | Two actors = two separate User Stories. 
Different actors have different permissions, audit requirements, and AC sets. Mixing two actor perspectives in one User Story produces ambiguous ACs. Shared Functionalities (e.g. OTP generation, email delivery) can be linked to both User Stories. | +| User Story "I want" clause contains "and" | Multiple capabilities in one User Story — split at each “and”. Each capability has its own failure paths and may touch different Features; bundling them makes ACs ambiguous and traceability impossible. | +| AC uses `{placeholder}` for a single value | Placeholder syntax is only justified when two or more values vary. If only one value applies, write it inline. Example: instead of `{error type}: inline validation message`, write `an inline validation message is shown`. | +| AC describes a non-observable outcome | e.g. “a background job processes the record” — the user cannot observe this. Restate as the observable signal (e.g. “the confirmation email arrives within 60 seconds”), or redirect the behavior to a Functionality entity if it is purely technical. | +| AC identifier does not follow `AC:US--` | Every acceptance criterion in the JSON output needs a stable `AC:US--` id so it can be referenced unambiguously. | +| AC behavior already documented in another User Story | Duplicate ACs create a maintenance burden — any change must be applied in every copy. Extract the shared behavior into a Functionality entity and link both User Stories to it. 
| + +--- + +## Out-of-scope routing + +| Request | Correct skill | +|---|---| +| Document an atomic behavior or business rule | `living-doc-create-functionality` | +| Document a system surface (screen, API) | `living-doc-create-feature` | +| Generate BDD scenarios for User Story ACs | `living-doc-scenario-creator` | +| Update or deprecate an existing User Story | `living-doc-update` | \ No newline at end of file diff --git a/skills/living-doc-create-user-story/evals/evals.json b/skills/living-doc-create-user-story/evals/evals.json new file mode 100644 index 0000000..872843e --- /dev/null +++ b/skills/living-doc-create-user-story/evals/evals.json @@ -0,0 +1,207 @@ +{ + "skill_name": "living-doc-create-user-story", + "evals": [ + { + "id": 1, + "category": "happy-path", + "prompt": "I want to create a new User Story for the password reset capability.", + "expected_output": "Agent asks three elicitation questions in sequence: (1) Who is the user? (2) What do they want to do? (3) Why / what business outcome? After answers, forms the As-a/I-can/so-that narrative. Then asks for domain context (which Feature). Then elicits ACs in Given-When-Then. Checks for error and alternative paths (unregistered email, expired token, already-used token). Outputs canonical User Story JSON with at least 3 ACs (happy path + at least 2 error/alternative).", + "files": [], + "expectations": [ + "Asks actor, capability, and business value as distinct questions before writing narrative", + "Forms As-a/I-can/so-that narrative correctly from answers", + "Asks which Feature(s) this story touches", + "Elicits at least one error-path AC (e.g. unregistered email, expired token)", + "Warns if only happy-path AC is provided", + "Outputs valid canonical UserStory JSON" + ] + }, + { + "id": 2, + "category": "happy-path", + "prompt": "Review this User Story and tell me what's missing. 
File: evals/files/incomplete-user-story.json", + "expected_output": "Agent identifies: (1) Important — only one happy-path AC provided; no error or alternative path ACs. Prompts for: what happens with an unregistered email? what happens with an expired reset link? what happens if the link is used twice? Proposes three additional ACs covering these gaps. Does not flag the narrative or Feature link as issues.", + "files": [ + "evals/files/incomplete-user-story.json" + ], + "expectations": [ + "Flags that only a happy-path AC exists", + "Prompts for error path: unregistered email address", + "Prompts for error path: expired reset token", + "Prompts for alternative: link already used", + "Does not reject the existing happy-path AC", + "Proposes at least 2 additional ACs with Given-When-Then" + ] + }, + { + "id": 3, + "category": "happy-path", + "prompt": "The actor in my narrative is 'the system'. Is that OK?", + "expected_output": "No. 'The system' is not a valid actor — it is not a human user with a goal. An actor must be a named human role (e.g. 'registered customer', 'support agent', 'finance manager'). System-initiated flows belong in a Functionality or a background process, not a User Story. Ask: who triggers this? Who benefits?", + "files": [], + "expectations": [ + "Rejects 'the system' as an actor", + "Explains that actors must be human roles", + "Suggests asking 'who triggers this?' and 'who benefits?'", + "Notes system-initiated flows belong in a Functionality" + ] + }, + { + "id": 4, + "category": "regression", + "prompt": "My AC says: 'When the customer submits the form, the database saves a record to the orders table with status=PENDING.' Is this OK?", + "expected_output": "No. This AC describes a technical implementation detail (database table, status field). It should be restated as a user-observable outcome. Example fix: 'When the customer confirms the order, then the order is acknowledged and the customer sees a confirmation message.' 
The internal DB state is an implementation concern — not an E2E AC.", + "files": [], + "expectations": [ + "Flags the AC as describing implementation detail, not user outcome", + "Provides a rewritten AC in outcome-focused language", + "Notes that DB state is an implementation concern", + "References the rule: AC must be outcome-focused from the user's perspective" + ] + }, + { + "id": 5, + "category": "negative", + "prompt": "Document the behavior: when discount code is applied to a zero-price item, the discount is silently ignored.", + "expected_output": "This is a Functionality-level behavior (atomic, technical, unit-testable) — routes to living-doc-create-functionality. User Story ACs describe E2E user outcomes, not component-level behaviors.", + "files": [], + "expectations": [ + "Identifies this as a Functionality AC, not a User Story AC", + "Routes to living-doc-create-functionality", + "Explains the distinction: atomic technical behavior vs. E2E user outcome" + ] + }, + { + "id": 6, + "category": "paraphrase", + "prompt": "Help me write a story for a feature where customers can reset their password via SMS instead of email.", + "expected_output": "Agent asks three elicitation questions: (1) Who is the actor? (registered customer who has a phone number on file). (2) What do they want to do? (receive a reset code via SMS). (3) Why? (some customers prefer SMS or don't have email access). Forms narrative: As a registered customer, I can reset my password via SMS, so that I can regain access even without email access. 
Elicits ACs: happy path (valid phone, code sent), unregistered phone, expired code, wrong code, maximum retries.", + "files": [], + "expectations": [ + "Forms As-a/I-can/so-that narrative", + "Elicits at least 3 ACs covering happy path and error paths", + "Asks actor, capability, and business value as distinct questions", + "Outputs valid canonical UserStory JSON" + ] + }, + { + "id": 7, + "category": "edge-case", + "prompt": "The same capability is needed by two different actors: a registered customer can reset their password, and a support agent can trigger a password reset on behalf of a customer. Should these be one User Story or two?", + "expected_output": "These should be two separate User Stories — different actors have different permissions, audit requirements, and AC sets. A customer-initiated reset has no audit trail requirement; an agent-initiated reset must be logged with the agent ID, customer consent, and reason. Shared Functionalities (OTP generation, email delivery) can be reused by linking both User Stories to the same Functionality entities. Mixing two actor perspectives in one User Story produces ambiguous ACs.", + "files": [], + "expectations": [ + "Recommends two separate User Stories for two distinct actors", + "Explains that different actors have different AC sets and audit requirements", + "Notes shared Functionalities can be linked to both User Stories", + "Warns that mixing actors produces ambiguous ACs" + ] + }, + { + "id": 8, + "category": "output-format", + "prompt": "Create a user story: 'A customer service agent needs to view the full order history for any customer to resolve disputes.'", + "expected_output": "The output contains a fenced ```json code block with a UserStory entity. Required fields: type ('UserStory'), id (US-), title, as_a, i_want, so_that, acceptance_criteria (array of objects with id and description). The as_a/i_want/so_that fields follow the standard user story template. 
Each AC has a unique id in the format AC:US--. No Gherkin syntax appears inside the JSON values.", + "files": [], + "expectations": [ + "Single fenced ```json code block", + "type field is 'UserStory'", + "id follows US- convention", + "as_a/i_want/so_that follow standard template", + "Each AC has id in AC:US-- format", + "No Gherkin syntax in JSON values" + ] + }, + { + "id": 9, + "category": "regression", + "prompt": "Here is my User Story: 'As a customer, I want to browse products and add items to my cart so that I can purchase what I want.' Is this well-formed?", + "expected_output": "No. The 'I want' clause contains 'and' — it bundles two distinct capabilities: browsing products and adding items to cart. These should be two separate User Stories. Each has its own failure paths: browsing may fail due to search/filter issues; adding to cart may fail due to stock availability or login state. Split the story at each 'and'. Shared Functionalities (e.g. stock availability check) can be linked to both User Stories.", + "files": [], + "expectations": [ + "Detects the 'and' in the 'I want' clause", + "Recommends splitting into two separate User Stories", + "Explains that each capability has distinct failure paths", + "Notes shared Functionalities can be linked to both stories" + ] + }, + { + "id": 10, + "category": "regression", + "prompt": "Is this AC correct? 'When the customer submits an invalid {error type}: inline validation message is shown.'", + "expected_output": "No. The {placeholder} syntax is only justified when two or more values vary independently. Here only one concrete feedback type applies — the placeholder is unnecessary. Rewrite without the placeholder: 'When the customer submits invalid data, an inline validation message is shown.' 
If multiple distinct error types each require different handling, write a separate AC per error type rather than bundling them with a placeholder.", + "files": [], + "expectations": [ + "Flags single-value placeholder as an anti-pattern", + "Provides a rewritten AC without placeholder", + "Explains: placeholder only for independently varying values", + "Notes: write separate ACs for distinct error types" + ] + }, + { + "id": 11, + "category": "regression", + "prompt": "Is this AC valid for a User Story? 'When the customer places an order, a background job processes the payment record asynchronously.'", + "expected_output": "No. This AC describes a non-observable outcome — the customer cannot observe a background job. User Story ACs must be observable from the user's perspective. Restate as an observable signal: e.g. 'When the customer places an order, they see an order confirmation and receive a confirmation email.' If the behavior is purely technical, it belongs in a Functionality entity — redirect to living-doc-create-functionality.", + "files": [], + "expectations": [ + "Flags the AC as a non-observable outcome", + "Provides a rewritten AC with an observable user signal", + "Notes that purely technical behaviors belong in Functionality entities", + "Routes to living-doc-create-functionality for technical AC" + ] + }, + { + "id": 12, + "category": "edge-case", + "prompt": "The same AC appears in both US-007 and US-011: 'A verification email is sent to the customer within 30 seconds.' Should I duplicate it?", + "expected_output": "No. Duplicating ACs across User Stories creates a maintenance burden — any change must be applied in every copy, and they can drift. Extract the shared behavior into a Functionality entity (e.g. 'Send verification email within SLA') and link both US-007 and US-011 to it via the Functionality's user_stories field. 
Use living-doc-create-functionality to create the shared Functionality.", + "files": [], + "expectations": [ + "Flags duplicate AC as a maintenance burden anti-pattern", + "Recommends creating a shared Functionality entity", + "Both User Stories link to the shared Functionality", + "Points to living-doc-create-functionality for the extraction" + ] + }, + { + "id": 14, + "category": "regression", + "prompt": "Create a new User Story for the password reset capability. The catalog already contains US-001 through US-013.", + "expected_output": "Before assigning an ID, the agent runs: python scripts/next_id.py --type US --catalog catalog.json. The script returns US-014. The agent assigns id='US-014'. It does NOT invent an ID, reuse an existing one, or leave a placeholder such as US-XXX or US-. The final JSON has a fully populated id field before being presented to the user.", + "files": [], + "expectations": [ + "Runs next_id.py --type US before assigning the ID", + "Assigns the ID returned by the script (e.g. US-014)", + "Does not invent, guess, or reuse an ID", + "Does not leave a placeholder (US-XXX, US-, US-unknown)", + "Final JSON has a fully populated id field" + ] + }, + { + "id": 15, + "category": "regression", + "prompt": "Create a User Story but the catalog file is missing — next_id.py cannot be run. What should the agent do?", + "expected_output": "Agent cannot auto-assign an ID. It outputs the entity with id='US-PENDING' and explicitly states: 'Catalog not available — ID could not be assigned. Run next_id.py --type US once the catalog is present and update this field before saving.' 
It does NOT invent a numeric ID such as US-001 without catalog evidence.", + "files": [], + "expectations": [ + "Uses US-PENDING placeholder when catalog is unavailable", + "Explicitly warns the user that the ID must be assigned before saving", + "Does not invent a numeric ID without catalog evidence", + "Placeholder is visibly distinct — not a real US-nnn value" + ] + }, + { + "id": 13, + "category": "happy-path", + "prompt": "I'm creating US-015 for the promo stacking feature. FUNC-promo-validate and FUNC-promo-stack already exist in the living doc. Should I link them?", + "expected_output": "Yes — link both Functionalities in the User Story's 'functionalities' array. Asking about existing Functionalities during US creation prevents ORPHAN_FUNCTIONALITY gaps and makes the entity graph traversable from US down to test coverage. If the Functionalities are relevant, they must be linked.", + "files": [], + "expectations": [ + "Confirms the Functionalities should be linked", + "Explains the ORPHAN_FUNCTIONALITY consequence if skipped", + "Shows how to add them to the functionalities array in the entity" + ] + } + ] +} \ No newline at end of file diff --git a/skills/living-doc-create-user-story/evals/files/incomplete-user-story.json b/skills/living-doc-create-user-story/evals/files/incomplete-user-story.json new file mode 100644 index 0000000..fe43c97 --- /dev/null +++ b/skills/living-doc-create-user-story/evals/files/incomplete-user-story.json @@ -0,0 +1,23 @@ +{ + "type": "UserStory", + "id": "US-042", + "title": "Reset account password", + "status": "draft", + "narrative": { + "as_a": "registered customer", + "i_can": "reset my password using my email address", + "so_that": "I can regain access to my account if I forget my password" + }, + "features": ["FEAT-login"], + "acceptance_criteria": [ + { + "id": "AC:US-042-01", + "description": "Happy path: password reset email is sent", + "given": "the customer is on the forgot password page", + "when": "the customer 
enters their registered email address and submits", + "then": "a password reset email is sent and the customer sees a confirmation message", + "priority": "critical", + "type": "happy_path" + } + ] +} diff --git a/skills/living-doc-create-user-story/evals/fixture-map.md b/skills/living-doc-create-user-story/evals/fixture-map.md new file mode 100644 index 0000000..30cd1a8 --- /dev/null +++ b/skills/living-doc-create-user-story/evals/fixture-map.md @@ -0,0 +1,35 @@ +# Fixture Map — living-doc-create-user-story + +## Fixture files + +| File | Description | +|---|---| +| `evals/files/incomplete-user-story.json` | User Story US-042 (password reset) with only a happy-path AC — missing error and alternative paths | + +## Eval to fixture mapping + +| Eval ID | Category | Fixture file(s) | Coverage | +|---|---|---|---| +| 1 | happy-path | _(none — conversational elicitation)_ | Full elicitation workflow: actor → narrative → Feature → ACs → completeness check → output | +| 2 | happy-path | `incomplete-user-story.json` | Completeness check: detects missing error + alternative ACs | +| 3 | happy-path | _(none)_ | Anti-pattern: invalid actor ("the system") | +| 4 | regression | _(none)_ | Anti-pattern: technical AC (DB implementation detail) | +| 5 | negative | _(none)_ | Routing: atomic behavior → living-doc-create-functionality | +| 6 | paraphrase | _(none)_ | SMS password reset — full elicitation with happy path + error paths | +| 7 | edge-case | _(none)_ | Two actors for same capability → two separate User Stories | +| 8 | output-format | _(none)_ | Canonical UserStory JSON: as_a/i_want/so_that, US--AC- format | +| 9 | regression | _(none)_ | Anti-pattern: 'I want' clause with 'and' — two capabilities bundled | +| 10 | regression | _(none)_ | Anti-pattern: single-value placeholder {error type} | +| 11 | regression | _(none)_ | Anti-pattern: non-observable outcome (background job) | +| 12 | edge-case | _(none)_ | Duplicate AC across two User Stories → shared Functionality | 
+ +## Trigger eval summary + +18 entries: 13 `should_trigger=true`, 5 `should_trigger=false` + +| Routes to | Query count | +|---|---| +| living-doc-create-feature | 1 | +| living-doc-create-functionality | 1 | +| living-doc-scenario-creator | 2 | +| living-doc-gap-finder | 1 | diff --git a/skills/living-doc-create-user-story/evals/trigger-eval.json b/skills/living-doc-create-user-story/evals/trigger-eval.json new file mode 100644 index 0000000..e1ae24a --- /dev/null +++ b/skills/living-doc-create-user-story/evals/trigger-eval.json @@ -0,0 +1,164 @@ +[ + { + "id": 1, + "query": "Create a user story for the password reset feature", + "should_trigger": true, + "reason": "Explicit 'create a user story' trigger keyword" + }, + { + "id": 2, + "query": "Write acceptance criteria for the login capability", + "should_trigger": true, + "reason": "Explicit 'write acceptance criteria' trigger keyword" + }, + { + "id": 3, + "query": "I need a new user story — a customer wants to track their delivery", + "should_trigger": true, + "reason": "Explicit 'new user story' trigger keyword" + }, + { + "id": 4, + "query": "As a customer I want to view my order history", + "should_trigger": true, + "reason": "As-a narrative format triggers US elicitation" + }, + { + "id": 5, + "query": "Help me document a business requirement for promo codes", + "should_trigger": true, + "reason": "'document a business requirement' trigger phrase" + }, + { + "id": 6, + "query": "I need to define US AC for the checkout flow", + "should_trigger": true, + "reason": "'define US AC' trigger phrase" + }, + { + "id": 7, + "query": "User story template for a SaaS onboarding feature", + "should_trigger": true, + "reason": "'user story template' trigger phrase" + }, + { + "id": 8, + "query": "Elicit requirements for the notifications feature", + "should_trigger": true, + "reason": "'elicit requirements' trigger phrase" + }, + { + "id": 9, + "query": "Review this user story and tell me what ACs are missing", + 
"should_trigger": true, + "reason": "Reviewing US ACs is part of this skill's completeness check" + }, + { + "id": 10, + "query": "Is my narrative well formed? 'As a system I can process payments'", + "should_trigger": true, + "reason": "Validating a narrative is a core skill task" + }, + { + "id": 11, + "query": "Document the checkout page as a Feature entity", + "should_trigger": false, + "reason": "Feature entity creation — routes to living-doc-create-feature" + }, + { + "id": 12, + "query": "Document the atomic behavior: validate cart is not empty", + "should_trigger": false, + "reason": "Atomic behavior is a Functionality — routes to living-doc-create-functionality" + }, + { + "id": 13, + "query": "Generate BDD scenarios for US-001", + "should_trigger": false, + "reason": "Scenario generation — routes to living-doc-scenario-creator" + }, + { + "id": 14, + "query": "What test gaps exist in our living documentation?", + "should_trigger": false, + "reason": "Gap analysis — routes to living-doc-gap-finder" + }, + { + "id": 15, + "query": "I want to write requirements for the loyalty points feature", + "should_trigger": true, + "reason": "'document a business requirement' pattern — User Story elicitation" + }, + { + "id": 16, + "query": "Is my user story well-formed? 
Here it is: 'As a system, process the payment'", + "should_trigger": true, + "reason": "Validating a User Story narrative is a core skill task" + }, + { + "id": 17, + "query": "My I-want clause contains 'and' — is that OK for a User Story?", + "should_trigger": true, + "reason": "Reviewing User Story narrative correctness is a core task of this skill" + }, + { + "id": 18, + "query": "Create a BDD scenario for the checkout User Story", + "should_trigger": false, + "reason": "Scenario creation from a User Story — routes to living-doc-scenario-creator" + }, + { + "id": 19, + "query": "Write step definitions for the login AC scenarios", + "should_trigger": false, + "reason": "Writing step definitions — routes to gherkin-step" + }, + { + "id": 20, + "query": "Delete all BDD artifacts linked to the deprecated US-007", + "should_trigger": false, + "reason": "BDD artifact cleanup — routes to bdd-maintain" + }, + { + "id": 21, + "query": "Sync the feature files with the updated living doc", + "should_trigger": false, + "reason": "Feature file sync — routes to gherkin-living-doc-sync" + }, + { + "id": 22, + "query": "What does PR #217 affect in the living doc?", + "should_trigger": false, + "reason": "Impact analysis — routes to living-doc-impact-analysis" + }, + { + "id": 23, + "query": "Scan the webapp and generate PageObjects for the login screen", + "should_trigger": false, + "reason": "PageObject scanning — routes to living-doc-pageobject-scan" + }, + { + "id": 24, + "query": "Add data-cy attributes to the login form inputs", + "should_trigger": false, + "reason": "Instrumenting templates — routes to data-cy-instrument" + }, + { + "id": 25, + "query": "Run a gap analysis to find User Stories without scenarios", + "should_trigger": false, + "reason": "Gap analysis — routes to living-doc-gap-finder" + }, + { + "id": 26, + "query": "Create a Feature entity for the authentication module", + "should_trigger": false, + "reason": "Creating a Feature — routes to 
living-doc-create-feature" + }, + { + "id": 27, + "query": "When creating this User Story, should I link the existing Functionality FUNC-promo-validate?", + "should_trigger": true, + "reason": "Linking existing Functionalities during US creation — living-doc-create-user-story owns this" + } +] \ No newline at end of file diff --git a/skills/living-doc-create-user-story/scripts/next_id.py b/skills/living-doc-create-user-story/scripts/next_id.py new file mode 100644 index 0000000..77942e0 --- /dev/null +++ b/skills/living-doc-create-user-story/scripts/next_id.py @@ -0,0 +1,142 @@ +#!/usr/bin/env python3 +""" +next_id.py — Living Doc ID Auto-Assigner + +Scans the living documentation and returns the next available ID for a given entity type. +Use this before creating a new entity to avoid ID collisions. + +Usage: + python next_id.py --type US --catalog catalog.json → US-005 + python next_id.py --type FEAT --catalog catalog.json → FEAT-012 + python next_id.py --type FUNC --catalog catalog.json → FUNC-003 + python next_id.py --type AC --parent US-007 --catalog catalog.json → AC:US-007-05 + python next_id.py --type AC --parent FUNC-002 --catalog catalog.json → AC:FUNC-002-03 + +Exits with code 0 and prints the next ID on stdout. +Exits with code 1 and prints an error on stderr if the catalog cannot be read, +the entity type is unknown, or --parent is missing when --type AC is used. 
+ +Catalog JSON must contain one of: + - Top-level keys: "user_stories", "features", "functionalities" + - Or nested under a "catalog" key: {"catalog": {"user_stories": [...], ...}} +""" + +import argparse +import json +import re +import sys + +# Maps entity type token → (catalog collection key, ID regex with capture group for the number) +ENTITY_TYPE_MAP: dict[str, tuple[str, re.Pattern]] = { + "US": ("user_stories", re.compile(r"^US-(\d+)$")), + "FEAT": ("features", re.compile(r"^FEAT-(\d+)$")), + "FUNC": ("functionalities", re.compile(r"^FUNC-(\d+)$")), +} + +# Width of the numeric suffix (zero-padded) +ID_WIDTH = 3 + + +def load_catalog(path: str) -> dict: + with open(path) as f: + raw = json.load(f) + # Support both {"catalog": {...}} and flat {"user_stories": [...]} formats + return raw.get("catalog", raw) + + +def next_entity_id(catalog: dict, entity_type: str) -> str: + """ + Return the next sequential ID for US, FEAT, or FUNC entities. + Scans the matching collection for the highest existing numeric suffix. + """ + if entity_type not in ENTITY_TYPE_MAP: + raise ValueError( + f"Unknown entity type '{entity_type}'. " + f"Must be one of: {sorted(ENTITY_TYPE_MAP)}" + ) + collection_key, pattern = ENTITY_TYPE_MAP[entity_type] + entities: list[dict] = catalog.get(collection_key, []) + + max_num = 0 + for entity in entities: + m = pattern.match(entity.get("id", "")) + if m: + max_num = max(max_num, int(m.group(1))) + + return f"{entity_type}-{max_num + 1:0{ID_WIDTH}d}" + + +def next_ac_id(catalog: dict, parent_id: str) -> str: + """ + Return the next sequential AC ID for a given parent entity (User Story or Functionality). + AC format: AC:- (two-digit zero-padded suffix) + + Scans the parent entity's acceptance_criteria list for the highest existing number. 
+ """ + prefix = parent_id.split("-")[0] + collection_map = {"US": "user_stories", "FUNC": "functionalities"} + collection_key = collection_map.get(prefix) + if not collection_key: + raise ValueError( + f"Cannot determine entity collection for parent '{parent_id}'. " + f"Prefix must be 'US' or 'FUNC'." + ) + + entities: list[dict] = catalog.get(collection_key, []) + parent = next((e for e in entities if e.get("id") == parent_id), None) + if parent is None: + raise ValueError(f"Entity '{parent_id}' not found in catalog") + + ac_pattern = re.compile(rf"^AC:{re.escape(parent_id)}-(\d+)$") + max_num = 0 + for ac in parent.get("acceptance_criteria", []): + m = ac_pattern.match(ac.get("id", "")) + if m: + max_num = max(max_num, int(m.group(1))) + + return f"AC:{parent_id}-{max_num + 1:02d}" + + +def main() -> None: + parser = argparse.ArgumentParser( + description="Return the next available living doc entity ID." + ) + parser.add_argument( + "--type", "-t", + required=True, + choices=["US", "FEAT", "FUNC", "AC"], + help="Entity type to generate an ID for", + ) + parser.add_argument( + "--parent", "-p", + help="Parent entity ID — required when --type is AC (e.g. 
US-007 or FUNC-002)", + ) + parser.add_argument( + "--catalog", "-c", + required=True, + help="Path to the catalog JSON file", + ) + args = parser.parse_args() + + if args.type == "AC" and not args.parent: + print("Error: --parent is required when --type is AC", file=sys.stderr) + sys.exit(1) + + try: + catalog = load_catalog(args.catalog) + if args.type == "AC": + result = next_ac_id(catalog, args.parent) + else: + result = next_entity_id(catalog, args.type) + except (FileNotFoundError, json.JSONDecodeError) as exc: + print(f"Error reading catalog: {exc}", file=sys.stderr) + sys.exit(1) + except ValueError as exc: + print(f"Error: {exc}", file=sys.stderr) + sys.exit(1) + + print(result) + + +if __name__ == "__main__": + main() diff --git a/skills/living-doc-gap-finder/SKILL.md b/skills/living-doc-gap-finder/SKILL.md new file mode 100644 index 0000000..413995c --- /dev/null +++ b/skills/living-doc-gap-finder/SKILL.md @@ -0,0 +1,299 @@ +--- +name: living-doc-gap-finder +description: > + Identify gaps in the living documentation by combining bottom-up and top-down analysis. + Use when auditing living doc completeness, finding undocumented behaviors, orphan tests, + orphan Functionalities, untested ACs, or producing a documentation coverage gap report. + Proposes actions executed by living-doc-create-*, living-doc-scenario-creator, and + living-doc-update. Re-run after entity creation or status changes to confirm gaps are closed. + Triggers on: "find what's not documented", "living doc gaps", "what's missing in living doc", + "find undocumented features", "orphan tests", "orphan functionalities", "untested AC", + "documentation coverage", "gap report", "what's not covered", "living doc audit", + "documentation audit", "stale reference", "broken AC link", "test points to deprecated AC", + "PLAN mode", "AUDIT mode", "draft ACs from PageObject descriptions". + Does NOT trigger for: creating new living doc objects (use living-doc-create-* skills). 
+ Pairs with living-doc-update and living-doc-create-* skills. +license: Apache-2.0 +compatibility: GitHub Copilot +--- + +# Living Doc — Gap Finder + +> **Key concepts:** Feature, Functionality, User Story, AC — see [living-doc-glossary](../references/living-doc-glossary.md) ([remote](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md)). + +## Script — `scripts/compute_gaps.py` + +Run this script to compute all 9 gap types deterministically before producing the gap report. +It takes a catalog snapshot JSON as input and outputs the `gaps[]` array and coverage stats. + +```bash +# Human-readable summary +python scripts/compute_gaps.py catalog-snapshot.json --summary + +# Machine-readable report +python scripts/compute_gaps.py catalog-snapshot.json --output gap-report.json +``` + +The catalog must contain `catalog`, `inventory`, and `known_test_links` sections — +see `evals/files/catalog-snapshot.json` for a worked example. + +Run the script first, then use its output to drive the Prioritise and Propose steps below. +The Workflow section describes the logic the script encodes — read it for understanding, but +delegate the computation to the script rather than reproducing it through reasoning. + +Before presenting the final report, normalise the script output against the taxonomy in this skill: +- The first gap type (`UNTESTED_AC`) applies to **both User Story ACs and Functionality ACs**. If a Functionality has ACs and no linked tests, report those ACs as `UNTESTED_AC` **Blockers** (you may summarise as `FUNC-xyz has N ACs with no linked tests`) and do **not** leave the same root cause only as `UNDOCUMENTED_FUNCTIONALITY`. +- Report documentation coverage **separately** for User Story ACs and Functionality ACs, even if the raw script output gives a combined number. 
+- For `UNDOCUMENTED_SURFACE`, treat a discovered screen/API as already documented when an existing Feature clearly owns the same surface by path, name, or domain meaning (for example `/account/orders` ↔ `Account Dashboard`, `/reports/legacy` ↔ `Legacy Report Screen`). Only raise `UNDOCUMENTED_SURFACE` when no plausible owning Feature exists. +- **Always refer to gap types by their name** (e.g. `ORPHAN_TEST`, `UNTESTED_AC`) — never by an ordinal number (e.g. "Gap type 6"). The priority order below is for triage, not for labelling gaps in the report. + +--- + +## Mode names + +| Mode | When to use | +|---|---| +| **AUDIT mode** | Full catalog audit — runs the 9-type taxonomy top-down across all entities. Use after a sprint with entity changes or when the living doc hasn’t been reviewed recently. | +| **PLAN mode** | Bootstrap new coverage — draft ACs from PageObject descriptions or discovered UI surfaces (bottom-up). Produces `PLANNED`-state AC drafts for user confirmation before creating entities. | + +Both modes use `compute_gaps.py` and the same gap taxonomy. AUDIT mode spans the full catalog; PLAN mode is scoped to the surfaces being bootstrapped. + +--- + +## Gap taxonomy + +Nine types of gaps are detected, in order of risk: + +| Priority | Gap type | Description | +|---|---|---| +| 1 — Blocker | **Untested AC** | An `ACTIVE` AC in a User Story or Functionality has no linked test. 
| +| 2 — Important | **Undocumented UI surface** | A screen or API endpoint exists in the app with no Feature entity | +| 3 — Important | **Orphan Feature** | A Feature entity exists with no linked User Story | +| 4 — Important | **Orphan User Story** | A User Story exists with no linked Feature | +| 5 — Important | **Orphan Functionality** | A Functionality exists with no parent Feature | +| 6 — Important | **Orphan test** | A test exists with no linked AC | +| 7 — Important | **Stale reference** | An active test references a Deprecated AC | +| 8 — Nit | **Undocumented Functionality** | A Functionality entity exists with no associated tests | +| 9 — Nit | **Empty Feature** | A Feature entity exists with no Functionalities defined | + +> **Resolution routing:** `UNTESTED_AC` → `living-doc-scenario-creator`; `UNDOCUMENTED_SURFACE` / `ORPHAN_FUNCTIONALITY` / `EMPTY_FEATURE` → `living-doc-create-*`; `ORPHAN_FEATURE` / `ORPHAN_USER_STORY` → `living-doc-update` (add missing link); `ORPHAN_TEST` → `gherkin-living-doc-sync`; **`STALE_REFERENCE`** → `living-doc-update` (deprecate the AC or update the test `@AC:` tag); `UNDOCUMENTED_FUNCTIONALITY` → `living-doc-scenario-creator`. + +> **ORPHAN_TEST — never delete a test to resolve the gap.** Deleting a test removes coverage; it does not close the gap — it masks it. Instead: (1) find an existing AC that matches the test's intent and add the `@AC:` link, or (2) if no AC exists, create a Functionality with `living-doc-create-functionality` and link the test to the new AC. Only delete a test after explicit product owner confirmation that the behavior is no longer required. + +> **ORPHAN_TEST — broken-link variant:** A test may reference an AC that was deleted from the catalog entirely (not merely deprecated). Classify this as `ORPHAN_TEST` (broken-link variant) — not `STALE_REFERENCE`. 
Resolution options: (1) recreate the entity if the behavior is still required and relink; (2) update the test link to the AC that superseded it; (3) delete the test after product owner confirmation. Never delete without confirmation. + +> **Large-scale ORPHAN_TEST remediation:** When a codebase has dozens or hundreds of orphan tests, do not attempt a single full-codebase pass. Batch by domain or Feature area (for example payment, auth, reporting) and process the highest-business-risk areas first. For each batch, identify which Functionalities or User Stories the tests correspond to, create missing entities, and link tests. A single unmanageable gap report leads to paralysis — smaller focused batches produce actionable outcomes. + +## Workflow + +### Step 1 — Bottom-up scan + +Build an **inventory** of: +- All discoverable UI screens and API endpoints +- All existing test files + +Output: `inventory.json` — a flat list of discovered artifacts. + +### Step 2 — Top-down entity traversal + +Traverse the entity graph top-down, starting from User Stories as roots: + +- **User Stories** (root) — load all entities with their ACs and status +- **Features** — for each User Story, follow its `features` list to reach linked Features +- **Functionalities** — for each Feature, follow its `functionalities` list to reach owned Functionalities +- **Test links** — collect all test file to AC mappings for cross-referencing in Step 3 + +### Step 3 — Compute gaps + +For each gap type: + +**UNTESTED_AC:** +``` +For each AC in (UserStory.ACs + Functionality.ACs) + where status == ACTIVE + where no linked test exists: + GAP: UNTESTED_AC +``` + +**UNDOCUMENTED_SURFACE:** +``` +For each item in inventory (screens, API endpoints) + where no Feature entity exists for this surface: + GAP: UNDOCUMENTED_SURFACE +``` + +**ORPHAN_FEATURE:** +``` +For each Feature reachable via entity relationships + where user_stories == []: + GAP: ORPHAN_FEATURE +``` + +**ORPHAN_USER_STORY:** +``` +For each User 
Story in entity graph + where user_story.features == []: + GAP: ORPHAN_USER_STORY +``` + +**ORPHAN_FUNCTIONALITY:** +``` +For each Functionality in entity graph + where functionality.parent_feature == null: + GAP: ORPHAN_FUNCTIONALITY +``` + +**ORPHAN_TEST:** +``` +For each test in inventory + where no linked AC exists in any UserStory or Functionality: + GAP: ORPHAN_TEST +``` + +**STALE_REFERENCE:** +``` +For each test in inventory + where linked_ac.status == Deprecated: + GAP: STALE_REFERENCE +``` + +**ORPHAN_TEST — broken-link variant:** +Also report `ORPHAN_TEST` when a test references an AC ID that **no longer exists** in the catalog (deleted, not merely deprecated). Distinguishing the two: a deprecated AC still has a living entity and can be reinstated; a deleted AC has no catalog entry at all. Resolution options are the same as standard `ORPHAN_TEST` — see the resolution routing note above. + +**UNDOCUMENTED_FUNCTIONALITY:** +``` +For each Functionality reachable via Feature `functionalities` links + where no test references this Functionality's ACs: + GAP: UNDOCUMENTED_FUNCTIONALITY +``` + +**EMPTY_FEATURE:** +``` +For each Feature reachable via entity relationships + where functionalities == []: + GAP: EMPTY_FEATURE +``` + +### Step 4 — Prioritise by risk + +Sort all gaps by: +1. Priority (Blocker before Important before Nit) +2. Within priority: by the number of dependent entities (higher impact first) +3. 
Within that: alphabetically by entity ID + +### Step 5 — Propose new entities + +For each gap, propose the living doc action: + +| Gap type | Proposed action | +|---|---| +| UNTESTED_AC | Create a test for the uncovered AC — use `living-doc-create-functionality` to define the behavior if not yet documented | +| UNDOCUMENTED_SURFACE | Create Feature entity — `living-doc-create-feature` | +| ORPHAN_FEATURE | (1) Confirm the Feature entity actually exists in the storage profile — a broken reference may mean the Feature was renamed or deleted without updating the link. (2) If the Feature exists: link it to an existing User Story or propose creating one. (3) If deletion is the right action: **always confirm with the user before deleting** — state the Feature ID, name, and any Functionalities it owns, and ask explicitly: *"No User Story references FEAT-nnn. Delete this Feature and its N Functionalities?"* | +| ORPHAN_USER_STORY | Link to an existing Feature, or create the missing Feature — `living-doc-create-feature` | +| ORPHAN_FUNCTIONALITY | Link to an existing Feature, or delete if the behavior has no owning surface. Do not delete if tests reference this Functionality's ACs — resolve those first (see ORPHAN_TEST). | +| ORPHAN_TEST | Link test to an existing AC, or create a Functionality — `living-doc-create-functionality`. **Never delete a test to resolve an orphan — that would silently remove coverage.** If the linked AC ID no longer exists (broken link), choose from: (1) recreate the AC/Functionality if the behavior is still required; (2) update the link to the merged AC ID if the entity was merged; (3) delete the test only after product owner confirmation that the behavior has been intentionally removed. | +| STALE_REFERENCE | Use `living-doc-update` to manage the AC state first: reinstate the AC if the deprecation was in error, or confirm the deprecation is intentional. 
Then update the test to reference the active replacement AC, or delete the test after product owner confirmation if the behavior has been intentionally retired. | +| UNDOCUMENTED_FUNCTIONALITY | Create unit/integration tests for the Functionality's ACs | +| EMPTY_FEATURE | Create Functionalities for the Feature's known behaviors — `living-doc-create-functionality` | + +## Out-of-scope routing + +| Request | Correct skill | +|---|---| +| Create a User Story | `living-doc-create-user-story` | +| Create a Feature | `living-doc-create-feature` | +| Create a Functionality | `living-doc-create-functionality` | +| Update or deprecate an entity / AC | `living-doc-update` | +| Generate BDD scenarios | `living-doc-scenario-creator` | + +Living-doc-gap-finder identifies and proposes — it does not create or edit entities. + +### Step 6 — Output gap report + +```json +{ + "generated_at": "2026-05-15T10:00:00Z", + "documentation_coverage": { + "user_stories_with_full_coverage": 12, + "user_stories_with_gaps": 3, + "coverage_percentage": 80 + }, + "gaps": [ + { + "id": "GAP-001", + "type": "UNTESTED_AC", + "severity": "Blocker", + "entity": "AC:US-007-02", + "description": "Active AC 'Payment declined' has no linked E2E test", + "proposed_action": "Create a test to cover AC:US-007-02 for US-007" + }, + { + "id": "GAP-002", + "type": "UNDOCUMENTED_SURFACE", + "severity": "Important", + "entity": "/account/preferences", + "description": "Screen /account/preferences discovered in webapp scan — no Feature entity", + "proposed_action": "Create Feature entity using living-doc-create-feature" + } + ] +} +``` + +## Documentation coverage metric + +``` +Coverage % = (ACs with at least one linked test) / (total ACs) × 100 +``` + +Report separately for: +- User Story ACs (E2E coverage) +- Functionality ACs (unit/integration coverage) + +A project with 100% documentation coverage has every AC backed by at least one test. 
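The coverage formula above can be sketched in a few lines of Python. This is only an illustration, not the skill's actual implementation: the `coverage_percentage` helper, the AC-to-test mapping shape (modelled loosely on `known_test_links` in the fixture snapshot), the one-decimal rounding, and the choice to report an empty AC set as 0% are all assumptions.

```python
from typing import Mapping, Optional


def coverage_percentage(test_links: Mapping[str, Optional[str]]) -> float:
    """Return the share of ACs with at least one linked test, as a percentage."""
    if not test_links:
        # Assumption: an empty AC set is reported as 0%, not 100%.
        return 0.0
    covered = sum(1 for link in test_links.values() if link)
    return round(covered / len(test_links) * 100, 1)


# Hypothetical AC-to-test mappings, kept separate so that User Story ACs
# and Functionality ACs are reported independently.
user_story_acs = {
    "US-001-AC-1": "tests/features/checkout.feature::Customer successfully places an order",
    "US-001-AC-2": None,
    "US-001-AC-3": None,
}
functionality_acs = {
    "FUNC-validate-cart-AC-1": "test_validate_cart.py::test_empty_cart",
    "FUNC-validate-cart-AC-2": None,
}

report = {
    "user_story_coverage": coverage_percentage(user_story_acs),
    "functionality_coverage": coverage_percentage(functionality_acs),
}
print(report)
```

Keeping the two mappings separate mirrors the rule that E2E coverage and unit/integration coverage are reported as distinct figures rather than blended into one number.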
+ +## Large-scale analysis: batching guidance + +When the gap inventory is large (e.g. 100+ orphan tests or undocumented features from a legacy +codebase), running a single full-codebase gap-finder pass produces an unmanageable report. +Use the following two-phase strategy: + +### Phase 1 — Baseline: ensure every User Story has at least one covered AC + +Before addressing any other gap type, guarantee minimum traceability across all User Stories: + +1. List all User Stories where **zero ACs** have a linked test. +2. For each, identify the highest-priority AC (first Active AC, or the first AC if none is Active). +3. Create one test for that AC using the appropriate testing workflow. +4. Repeat until every User Story has at least one covered AC. + +This phase establishes a baseline coverage floor. Do not skip to Phase 2 until all User Stories +have at least one covered AC. + +### Phase 2 — Depth: address gaps in order of size + +Once the baseline is met, continue by tackling the biggest remaining gaps first: + +1. **Rank gap clusters by count** — group all remaining gaps by type and sort descending by number of affected entities. +2. **Start with the highest-risk domain first** — payment, auth, security, or other release-critical areas take priority over lower-risk domains, even before broad legacy clean-up. +3. **Batch by domain** — within a cluster, process one Feature or service at a time. +4. **Iterate** — after each batch, re-run gap-finder on that domain before moving to the next. + +Processing everything at once is discouraged because the resulting gap list is too large to action +without clear prioritisation. 
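The two-phase strategy above could be approximated as follows. This is a minimal sketch under stated assumptions: the dictionary shapes, the `domain` field on each gap, the `Draft` status value, and the function names `phase_one_targets` and `phase_two_batches` are hypothetical. The Phase 2 ordering (high-risk domains first, then larger clusters first) is one possible reading of the guidance, not a definitive rule.

```python
from collections import Counter

# Domains named as release-critical in the guidance above.
HIGH_RISK_DOMAINS = {"payment", "auth", "security"}


def phase_one_targets(user_stories):
    """Pick one AC to cover for each User Story that has zero covered ACs."""
    targets = []
    for story in user_stories:
        acs = story["acs"]
        if not acs or any(ac["linked_test"] for ac in acs):
            continue  # nothing to cover, or the baseline is already met
        # First Active AC, or the first AC if none is Active.
        chosen = next((ac for ac in acs if ac["status"] == "Active"), acs[0])
        targets.append((story["id"], chosen["id"]))
    return targets


def phase_two_batches(gaps):
    """Order (domain, gap type) clusters: high-risk domains first, then by size."""
    counts = Counter((gap["domain"], gap["type"]) for gap in gaps)
    return sorted(
        counts.items(),
        key=lambda item: (item[0][0] not in HIGH_RISK_DOMAINS, -item[1], item[0]),
    )


# Hypothetical sample data for illustration only.
stories = [
    {"id": "US-001", "acs": [
        {"id": "US-001-AC-1", "status": "Active", "linked_test": "checkout.feature::ok"},
        {"id": "US-001-AC-2", "status": "Active", "linked_test": None},
    ]},
    {"id": "US-007", "acs": [
        {"id": "US-007-AC-1", "status": "Draft", "linked_test": None},
        {"id": "US-007-AC-2", "status": "Active", "linked_test": None},
    ]},
]
gaps = (
    [{"domain": "reporting", "type": "ORPHAN_TEST"}] * 3
    + [{"domain": "payment", "type": "ORPHAN_TEST"}] * 2
    + [{"domain": "payment", "type": "UNTESTED_AC"}]
)

print(phase_one_targets(stories))
print(phase_two_batches(gaps))
```

In this sketch the smaller payment clusters are scheduled ahead of the larger reporting cluster, reflecting the instruction that release-critical areas take priority over broad legacy clean-up.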
+ +## Lightweight coverage report format + +When the focus is specifically on test-to-AC coverage (rather than the full gap taxonomy), +or when asked to demonstrate or describe the gap report output format, +use this simplified two-section format: + +**Missing Tests** (ACs with no linked test): +- `` — + +**Orphan Tests** (tests with no corresponding AC): +- `` — + +End with a summary line: `X ACs missing tests, Y tests missing ACs.` + +This format is diagnostic only — it does not suggest implementation changes. diff --git a/skills/living-doc-gap-finder/evals/evals.json b/skills/living-doc-gap-finder/evals/evals.json new file mode 100644 index 0000000..d703e84 --- /dev/null +++ b/skills/living-doc-gap-finder/evals/evals.json @@ -0,0 +1,221 @@ +{ + "skill_name": "living-doc-gap-finder", + "evals": [ + { + "id": 1, + "category": "happy-path", + "prompt": "Run a gap analysis on our living documentation. File: evals/files/catalog-snapshot.json", + "expected_output": "Agent analyzes the snapshot and produces a gap report with: Blocker — US-001-AC-2 and US-001-AC-3 have no linked tests; Blocker — US-002-AC-1 and US-002-AC-2 have no linked tests; Blocker — all 4 ACs of US-007 have no linked tests; Blocker — FUNC-apply-discount has 5 ACs with no linked tests (Gap type 1 applies to Functionality ACs — report as UNTESTED_AC Blocker, not UNDOCUMENTED_FUNCTIONALITY); Important — /account/preferences screen discovered in webapp with no Feature entity (after normalisation: /account/orders ↔ FEAT-account, /reports/legacy ↔ FEAT-orphan are already documented); Important — FEAT-promo and FEAT-orphan each have no linked User Stories (orphan Features); Important — US-007 has no linked Feature (orphan User Story); Important — test_order_history.py, test_login_flow.feature, and the 'View paginated order history' BDD scenario have no linked ACs (orphan tests); Nit — FEAT-account and FEAT-orphan have no Functionalities defined (empty Features). 
Documentation coverage reported separately for US ACs and Functionality ACs.", + "files": [ + "evals/files/catalog-snapshot.json" + ], + "expectations": [ + "Identifies US-001-AC-2 and US-001-AC-3 as untested (Blockers)", + "Identifies US-002-AC-1 and US-002-AC-2 as untested (Blockers)", + "Identifies all 4 US-007 ACs as untested (Blockers)", + "Identifies FUNC-apply-discount ACs as untested (Blocker, not Nit — Gap type 1 applies to Functionality ACs)", + "Identifies /account/preferences as undocumented surface (Important)", + "Identifies FEAT-promo as orphan Feature (Important)", + "Identifies FEAT-orphan as orphan Feature (Important)", + "Identifies test_order_history.py and test_login_flow.feature as orphan tests (Important)", + "Identifies 'View paginated order history' BDD scenario as orphan test (Important)", + "Identifies FEAT-account and FEAT-orphan as empty Features (Nit — no Functionalities)", + "Identifies US-007 as orphan User Story (Important — no linked Feature)", + "Normalises undocumented surfaces: only /account/preferences is truly undocumented after matching existing Features", + "Calculates documentation coverage percentage separately for US ACs and Functionality ACs" + ] + }, + { + "id": 2, + "category": "happy-path", + "prompt": "What is documentation coverage and how is it calculated?", + "expected_output": "Documentation coverage = (ACs with at least one linked test) / (total ACs) * 100%. Reported separately for User Story ACs (E2E coverage) and Functionality ACs (unit/integration coverage). A project with 100% documentation coverage has every AC backed by at least one test. The metric drives the gap-finder workflow toward zero gaps.", + "files": [], + "expectations": [ + "Correct formula: covered ACs / total ACs * 100", + "Reported separately for US ACs and Functionality ACs", + "Notes 100% means every AC has at least one test" + ] + }, + { + "id": 3, + "category": "happy-path", + "prompt": "A test file exists with no linked AC. 
What gap type is this and what should I do?", + "expected_output": "This is an orphan test (ORPHAN_TEST (Important)). Resolution options: (1) find an existing AC in the living doc that this test covers and add the link; (2) if no AC exists, create a Functionality entity for the behavior being tested using living-doc-create-functionality, then link the test to the new Functionality's AC. Never delete a test to resolve an orphan — that would remove coverage.", + "files": [], + "expectations": [ + "Classifies as ORPHAN_TEST", + "Provides two resolution options: link to existing AC, or create new Functionality", + "Explicitly warns against deleting the test" + ] + }, + { + "id": 4, + "category": "regression", + "prompt": "We have 200 orphan tests from a legacy codebase. Should I run gap-finder on all of them at once?", + "expected_output": "Batch the gap-finder run by domain or Feature area rather than running across the entire codebase at once. Process the highest-risk areas first (payment, auth, security). For each batch: identify which Functionalities or User Stories the tests correspond to, create missing living doc entities, and link tests. Processing all 200 at once produces an unmanageable gap report — prioritise by business risk.", + "files": [], + "expectations": [ + "Recommends batching by domain or Feature area", + "Prioritises highest-risk areas first (payment, auth, security)", + "Advises against running full-codebase analysis in one pass" + ] + }, + { + "id": 5, + "category": "negative", + "prompt": "Create a new User Story for the account preferences screen.", + "expected_output": "Creating a User Story is not a gap-finder action — routes to living-doc-create-user-story. 
living-doc-gap-finder identifies and proposes new entities; the creation itself is delegated to the appropriate create-* skill.", + "files": [], + "expectations": [ + "Does not create the User Story", + "Routes to living-doc-create-user-story", + "Notes the gap-finder proposes, the create-* skills execute" + ] + }, + { + "id": 6, + "category": "paraphrase", + "prompt": "Where are the holes in our living documentation? I want to make sure everything is covered before the release.", + "expected_output": "Agent runs the full gap analysis workflow. Reports: untested ACs (by priority), undocumented surfaces (features visible in code/UI with no Feature entity), orphan tests (tests with no linked AC), orphan Features (no User Stories or Functionalities). Produces documentation coverage percentage. Highlights Blockers first (untested critical ACs), then Important (undocumented surfaces, orphan tests), then Nit (low-priority untested ACs).", + "files": [], + "expectations": [ + "Identifies this as a gap analysis request despite 'holes' phrasing", + "Reports gap types by severity (Blocker/Important/Nit)", + "Produces documentation coverage percentage", + "Covers all four gap types: untested ACs, undocumented surfaces, orphan tests, orphan features" + ] + }, + { + "id": 7, + "category": "edge-case", + "prompt": "A test is linked to an AC, but the AC was deleted from the living doc. How is this classified and what should I do?", + "expected_output": "This is a broken-link gap (ORPHAN_TEST variant — broken AC link). The test references an AC ID that no longer exists. Resolution options: (1) If the behavior the test covers is still required, recreate the Functionality/AC entity and relink. (2) If the behavior has been removed, the test should be deleted after confirming with the product owner. (3) If the AC was merged into another entity, update the test's link comment to the new AC ID. 
Never delete a test without product owner confirmation.", + "files": [], + "expectations": [ + "Classifies as ORPHAN_TEST (broken-link variant)", + "Provides three resolution options", + "Warns against deleting the test without product owner confirmation", + "Notes the possibility of AC merge as a resolution path" + ] + }, + { + "id": 8, + "category": "output-format", + "prompt": "Run a gap analysis and show me exactly what format the output report uses.", + "expected_output": "The gap report is emitted as structured JSON (or a formatted rendering of it) with a top-level `documentation_coverage` section (coverage_percentage, user_stories_with_full_coverage, user_stories_with_gaps) and a `gaps[]` array. Each gap item includes: id (GAP-NNN), type (one of UNTESTED_AC, UNDOCUMENTED_SURFACE, ORPHAN_FEATURE, ORPHAN_USER_STORY, ORPHAN_FUNCTIONALITY, ORPHAN_TEST, STALE_REFERENCE, UNDOCUMENTED_FUNCTIONALITY, EMPTY_FEATURE), severity (Blocker/Important/Nit), entity (the affected entity ID or path), description, and proposed_action. Gaps are ordered by severity (Blocker first, then Important, then Nit). The report is diagnostic only — no entity creation or modification is made.", + "files": [], + "expectations": [ + "Report includes top-level documentation_coverage section with coverage_percentage", + "gaps[] array present; each item has id, type, severity, entity, description, proposed_action", + "Gap type codes are canonical: UNTESTED_AC, ORPHAN_TEST, ORPHAN_FEATURE, etc.", + "Gaps ordered by severity (Blocker before Important before Nit)", + "Diagnostic only — no entity creation or modification" + ] + }, + { + "id": 9, + "category": "regression", + "prompt": "A test references AC:US-042-01 but that AC was deprecated last sprint. What gap type is this and how do I resolve it?", + "expected_output": "This is a stale reference (STALE_REFERENCE (Important)). The active test references a Deprecated AC. 
Resolution options: (1) update the test's link to the active replacement AC if the behavior was superseded; (2) reinstate the AC using living-doc-update if it was deprecated in error; (3) if the behavior was intentionally removed, delete the test after product owner confirmation. The test must not be deleted without product owner confirmation.", + "files": [], + "expectations": [ + "Classifies as STALE_REFERENCE", + "Classifies severity as Important", + "Provides three resolution options: relink to new AC, reinstate AC, or delete after PO confirmation", + "Warns against deleting test without product owner confirmation", + "Notes reinstatement path via living-doc-update" + ] + }, + { + "id": 10, + "category": "edge-case", + "prompt": "We have 50 orphan tests and 30 untested ACs across the entire platform. Should I run a single all-domain gap report and work through everything at once?", + "expected_output": "No — use the two-phase strategy. Phase 1: ensure every User Story has at least one covered AC. List all User Stories with zero covered ACs, cover the first AC of each before moving on. This establishes a minimum traceability baseline. Phase 2: once every US has at least one covered AC, rank gap clusters by count, prioritise the highest-risk domains first (payment, auth, security), batch by Feature or domain, and iterate. Processing all 80 gaps at once produces an unmanageable report and obscures progress.", + "files": [], + "expectations": [ + "Recommends two-phase strategy over single full-pass", + "Phase 1: ensure every US has at least one covered AC before proceeding", + "Phase 2: rank by count, prioritise high-risk domains, batch by domain", + "Explains why a single full-platform pass is discouraged" + ] + }, + { + "id": 11, + "category": "happy-path", + "prompt": "A Functionality entity FUNC-promo-validate exists in the catalog but has no parent Feature linked. 
What gap type is this and what should I do?", + "expected_output": "This is an orphan Functionality (ORPHAN_FUNCTIONALITY (Important)). A Functionality with no parent Feature is untraceable — it cannot be reached via the entity hierarchy and is missed in impact analyses. Resolution: identify or create the owning Feature and add FUNC-promo-validate to its functionalities list. If tests reference this Functionality's ACs, resolve those first (ORPHAN_TEST takes priority) before removing the Functionality.", + "files": [], + "expectations": [ + "Classifies as ORPHAN_FUNCTIONALITY", + "Classifies severity as Important", + "Advises linking to an existing Feature or creating one", + "Warns: do not remove if tests reference this Functionality's ACs" + ] + }, + { + "id": 12, + "category": "regression", + "prompt": "The gap-finder script reports /reports/legacy as an UNDOCUMENTED_SURFACE but there is already a Feature entity 'Legacy Report Screen' (FEAT-orphan) in the catalog. Should this be reported as a gap?", + "expected_output": "No. After normalisation, /reports/legacy is already documented — FEAT-orphan (Legacy Report Screen) clearly owns that surface by name and domain meaning. The skill instructs to treat a discovered screen as already documented when an existing Feature clearly owns the same surface by path, name, or domain meaning. Remove this item from the gap report. FEAT-orphan still has other gaps (orphan Feature, empty Feature) but UNDOCUMENTED_SURFACE is not one of them.", + "files": [], + "expectations": [ + "Removes /reports/legacy from UNDOCUMENTED_SURFACE gaps after normalisation", + "Explains normalisation rule: surface matches by name/domain meaning", + "Notes FEAT-orphan still has ORPHAN_FEATURE and EMPTY_FEATURE gaps", + "Distinguishes raw script output from normalised report" + ] + }, + { + "id": 13, + "prompt": "A Feature entity FEAT-checkout exists in the living doc but its functionalities list is empty — no Functionality entities are linked to it. 
What gap type is this and what is the priority?", + "expected_output": "Gap type EMPTY_FEATURE (not \"Gap type 9\"). Priority: Nit. Guidance: define Functionality entities for the behaviors this Feature owns using living-doc-create-functionality.", + "files": [], + "category": "edge-case", + "expectations": [ + "Classifies as EMPTY_FEATURE gap type", + "Priority is Nit (lowest severity — feature may still be valid but incomplete)", + "Recommends creating Functionality entities for the behaviors the Feature owns", + "Routes to living-doc-create-functionality for the fix" + ] + }, + { + "id": 14, + "prompt": "FEAT-checkout is linked to no User Stories at all. What gap type is this?", + "expected_output": "Gap type ORPHAN_FEATURE (not \"Gap type 3\"). Priority: Important. Guidance: link at least one User Story to give the Feature traceable business value.", + "files": [], + "category": "edge-case", + "expectations": [ + "Classifies as ORPHAN_FEATURE gap type", + "Priority is Important (Feature is unreachable from any User Story)", + "Recommends linking at least one User Story to the Feature", + "Routes to living-doc-create-user-story or living-doc-update to create or link a US" + ] + }, + { + "id": 15, + "category": "happy-path", + "prompt": "What is the difference between AUDIT mode and PLAN mode in the gap finder?", + "expected_output": "AUDIT mode performs a full catalog audit — it processes all entities and test files to produce a gap report covering all 9 gap types. Use AUDIT before a release or when you want a comprehensive view. PLAN mode is bottom-up: it reads PageObject descriptions for a set of User Stories and drafts missing ACs directly from the PO element names. 
Use PLAN when you have PageObjects but no ACs yet.", + "files": [], + "expectations": [ + "Correctly defines AUDIT mode as full catalog audit", + "Correctly defines PLAN mode as bottom-up AC drafting from PO descriptions", + "Names compute_gaps.py for AUDIT and references PO descriptions for PLAN", + "Explains when to use each mode" + ] + }, + { + "id": 16, + "category": "happy-path", + "prompt": "I have a new User Story US-021 with no ACs yet. The PageObject for the relevant screen exists. Which gap-finder mode should I use and what happens?", + "expected_output": "Use PLAN mode. PLAN mode reads the PageObject element descriptions for the linked screen and drafts candidate ACs from the element names and interaction patterns. The output is a list of proposed ACs for review — you then accept, modify, or discard each one before adding them to US-021.", + "files": [], + "expectations": [ + "Recommends PLAN mode", + "Explains that PLAN mode derives ACs from PageObject descriptions", + "Output is draft/proposed ACs, not finalized ones" + ] + } + ] +} \ No newline at end of file diff --git a/skills/living-doc-gap-finder/evals/files/catalog-snapshot.json b/skills/living-doc-gap-finder/evals/files/catalog-snapshot.json new file mode 100644 index 0000000..1c28db1 --- /dev/null +++ b/skills/living-doc-gap-finder/evals/files/catalog-snapshot.json @@ -0,0 +1,59 @@ +{ + "generated_at": "2026-05-15T08:00:00Z", + "catalog": { + "user_stories": [ + {"id": "US-001", "title": "Place an online order", "status": "ready", "ac_count": 3}, + {"id": "US-002", "title": "View order history", "status": "ready", "ac_count": 2}, + {"id": "US-007", "title": "Apply a promotional discount", "status": "ready", "ac_count": 4} + ], + "features": [ + {"id": "FEAT-checkout", "name": "Checkout Page", "user_stories": ["US-001"]}, + {"id": "FEAT-account", "name": "Account Dashboard", "user_stories": ["US-002"]}, + {"id": "FEAT-promo", "name": "Promotions Module", "user_stories": []}, + {"id": 
"FEAT-orphan", "name": "Legacy Report Screen", "user_stories": [], "functionalities": []} + ], + "functionalities": [ + { + "id": "FUNC-validate-cart", + "parent_feature": "FEAT-checkout", + "ac_count": 3, + "linked_tests": ["test_validate_cart.py::test_empty_cart"] + }, + { + "id": "FUNC-apply-discount", + "parent_feature": "FEAT-promo", + "ac_count": 5, + "linked_tests": [] + } + ] + }, + "inventory": { + "ui_screens": [ + "/checkout", + "/account/orders", + "/account/preferences", + "/promotions", + "/reports/legacy" + ], + "test_files": [ + {"file": "test_validate_cart.py", "linked_ac": "FUNC-validate-cart-AC-1"}, + {"file": "test_order_history.py", "linked_ac": null}, + {"file": "test_login_flow.feature", "linked_ac": null} + ], + "bdd_scenarios": [ + {"scenario": "Customer successfully places an order", "linked_ac": "US-001-AC-1"}, + {"scenario": "View paginated order history", "linked_ac": null} + ] + }, + "known_test_links": { + "US-001-AC-1": "tests/features/checkout.feature::Customer successfully places an order", + "US-001-AC-2": null, + "US-001-AC-3": null, + "US-002-AC-1": null, + "US-002-AC-2": null, + "US-007-AC-1": null, + "US-007-AC-2": null, + "US-007-AC-3": null, + "US-007-AC-4": null + } +} diff --git a/skills/living-doc-gap-finder/evals/files/gap-report.json b/skills/living-doc-gap-finder/evals/files/gap-report.json new file mode 100644 index 0000000..c7df25f --- /dev/null +++ b/skills/living-doc-gap-finder/evals/files/gap-report.json @@ -0,0 +1,176 @@ +{ + "generated_at": "2026-05-22T19:03:52.630632+00:00", + "documentation_coverage": { + "total_acs": 9, + "covered_acs": 1, + "coverage_percentage": 11.1 + }, + "summary": { + "total_gaps": 20, + "blockers": 8, + "important": 9, + "nits": 3 + }, + "gaps": [ + { + "id": "GAP-001", + "type": "UNTESTED_AC", + "severity": "Blocker", + "entity": "US-001-AC-2", + "description": "AC 'US-001-AC-2' has no linked test", + "proposed_action": "Generate a BDD scenario using living-doc-scenario-creator, or 
add a unit/integration test and link it to this AC" + }, + { + "id": "GAP-002", + "type": "UNTESTED_AC", + "severity": "Blocker", + "entity": "US-001-AC-3", + "description": "AC 'US-001-AC-3' has no linked test", + "proposed_action": "Generate a BDD scenario using living-doc-scenario-creator, or add a unit/integration test and link it to this AC" + }, + { + "id": "GAP-003", + "type": "UNTESTED_AC", + "severity": "Blocker", + "entity": "US-002-AC-1", + "description": "AC 'US-002-AC-1' has no linked test", + "proposed_action": "Generate a BDD scenario using living-doc-scenario-creator, or add a unit/integration test and link it to this AC" + }, + { + "id": "GAP-004", + "type": "UNTESTED_AC", + "severity": "Blocker", + "entity": "US-002-AC-2", + "description": "AC 'US-002-AC-2' has no linked test", + "proposed_action": "Generate a BDD scenario using living-doc-scenario-creator, or add a unit/integration test and link it to this AC" + }, + { + "id": "GAP-005", + "type": "UNTESTED_AC", + "severity": "Blocker", + "entity": "US-007-AC-1", + "description": "AC 'US-007-AC-1' has no linked test", + "proposed_action": "Generate a BDD scenario using living-doc-scenario-creator, or add a unit/integration test and link it to this AC" + }, + { + "id": "GAP-006", + "type": "UNTESTED_AC", + "severity": "Blocker", + "entity": "US-007-AC-2", + "description": "AC 'US-007-AC-2' has no linked test", + "proposed_action": "Generate a BDD scenario using living-doc-scenario-creator, or add a unit/integration test and link it to this AC" + }, + { + "id": "GAP-007", + "type": "UNTESTED_AC", + "severity": "Blocker", + "entity": "US-007-AC-3", + "description": "AC 'US-007-AC-3' has no linked test", + "proposed_action": "Generate a BDD scenario using living-doc-scenario-creator, or add a unit/integration test and link it to this AC" + }, + { + "id": "GAP-008", + "type": "UNTESTED_AC", + "severity": "Blocker", + "entity": "US-007-AC-4", + "description": "AC 'US-007-AC-4' has no linked test", + 
"proposed_action": "Generate a BDD scenario using living-doc-scenario-creator, or add a unit/integration test and link it to this AC" + }, + { + "id": "GAP-009", + "type": "UNDOCUMENTED_SURFACE", + "severity": "Important", + "entity": "/account/orders", + "description": "Surface '/account/orders' exists in the application with no Feature entity", + "proposed_action": "Create a Feature entity using living-doc-create-feature" + }, + { + "id": "GAP-010", + "type": "UNDOCUMENTED_SURFACE", + "severity": "Important", + "entity": "/account/preferences", + "description": "Surface '/account/preferences' exists in the application with no Feature entity", + "proposed_action": "Create a Feature entity using living-doc-create-feature" + }, + { + "id": "GAP-011", + "type": "UNDOCUMENTED_SURFACE", + "severity": "Important", + "entity": "/reports/legacy", + "description": "Surface '/reports/legacy' exists in the application with no Feature entity", + "proposed_action": "Create a Feature entity using living-doc-create-feature" + }, + { + "id": "GAP-013", + "type": "ORPHAN_FEATURE", + "severity": "Important", + "entity": "FEAT-orphan", + "description": "Feature 'FEAT-orphan' (Legacy Report Screen) has no linked User Stories", + "proposed_action": "Link to an existing User Story or confirm with the product owner whether to deprecate" + }, + { + "id": "GAP-012", + "type": "ORPHAN_FEATURE", + "severity": "Important", + "entity": "FEAT-promo", + "description": "Feature 'FEAT-promo' (Promotions Module) has no linked User Stories", + "proposed_action": "Link to an existing User Story or confirm with the product owner whether to deprecate" + }, + { + "id": "GAP-014", + "type": "ORPHAN_USER_STORY", + "severity": "Important", + "entity": "US-007", + "description": "User Story 'US-007' (Apply a promotional discount) has no linked Feature", + "proposed_action": "Link to an existing Feature or create the missing Feature using living-doc-create-feature" + }, + { + "id": "GAP-017", + "type": 
"ORPHAN_TEST", + "severity": "Important", + "entity": "View paginated order history", + "description": "Test 'View paginated order history' has no linked AC", + "proposed_action": "Link to an existing AC, or create a Functionality for the behavior using living-doc-create-functionality. Never delete the test to resolve the gap." + }, + { + "id": "GAP-016", + "type": "ORPHAN_TEST", + "severity": "Important", + "entity": "test_login_flow.feature", + "description": "Test 'test_login_flow.feature' has no linked AC", + "proposed_action": "Link to an existing AC, or create a Functionality for the behavior using living-doc-create-functionality. Never delete the test to resolve the gap." + }, + { + "id": "GAP-015", + "type": "ORPHAN_TEST", + "severity": "Important", + "entity": "test_order_history.py", + "description": "Test 'test_order_history.py' has no linked AC", + "proposed_action": "Link to an existing AC, or create a Functionality for the behavior using living-doc-create-functionality. Never delete the test to resolve the gap." 
+ }, + { + "id": "GAP-018", + "type": "UNDOCUMENTED_FUNCTIONALITY", + "severity": "Nit", + "entity": "FUNC-apply-discount", + "description": "Functionality 'FUNC-apply-discount' has 5 AC(s) with no linked tests", + "proposed_action": "Create unit or integration tests for this Functionality's ACs and link them" + }, + { + "id": "GAP-019", + "type": "EMPTY_FEATURE", + "severity": "Nit", + "entity": "FEAT-account", + "description": "Feature 'FEAT-account' (Account Dashboard) has no Functionalities defined", + "proposed_action": "Create Functionalities for known behaviors using living-doc-create-functionality" + }, + { + "id": "GAP-020", + "type": "EMPTY_FEATURE", + "severity": "Nit", + "entity": "FEAT-orphan", + "description": "Feature 'FEAT-orphan' (Legacy Report Screen) has no Functionalities defined", + "proposed_action": "Create Functionalities for known behaviors using living-doc-create-functionality" + } + ] +} \ No newline at end of file diff --git a/skills/living-doc-gap-finder/evals/fixture-map.md b/skills/living-doc-gap-finder/evals/fixture-map.md new file mode 100644 index 0000000..4c7ff49 --- /dev/null +++ b/skills/living-doc-gap-finder/evals/fixture-map.md @@ -0,0 +1,36 @@ +# Fixture Map — living-doc-gap-finder + +## Fixture files + +| File | Description | +|---|---| +| `evals/files/catalog-snapshot.json` | Snapshot of the living doc catalog + webapp inventory showing: 8 uncovered US ACs, 5 uncovered Functionality ACs, 1 undocumented screen after normalisation, 2 orphan Features, 1 orphan User Story, 3 orphan tests, and 2 empty Features | +| `evals/files/gap-report.json` | Expected gap report output produced by compute_gaps.py before normalisation — used as reference for output-format eval | + +## Eval to fixture mapping + +| Eval ID | Category | Fixture file(s) | Coverage | +|---|---|---|---| +| 1 | happy-path | `catalog-snapshot.json` | Full gap analysis: all 9 gap types, severity levels, normalisation rules, coverage % | +| 2 | happy-path | _(none)_ | 
Coverage metric formula and separate US vs Functionality reporting | +| 3 | happy-path | _(none)_ | ORPHAN_TEST resolution: link or create Functionality, never delete | +| 4 | regression | _(none)_ | Batch processing advice: domain-by-domain, prioritise by business risk | +| 5 | negative | _(none)_ | Routing: creating a User Story → living-doc-create-user-story | +| 6 | paraphrase | _(none)_ | "Holes in living doc" → gap analysis framing | +| 7 | edge-case | _(none)_ | Broken-link orphan test: AC ID deleted from catalog | +| 8 | output-format | `gap-report.json` | Canonical gap report JSON: coverage section + gaps[] array structure | +| 9 | regression | _(none)_ | STALE_REFERENCE (Gap type 7): active test linked to deprecated AC | +| 10 | edge-case | _(none)_ | Two-phase strategy for 50+ orphan tests and untested ACs | +| 11 | happy-path | _(none)_ | ORPHAN_FUNCTIONALITY (Gap type 5): Functionality with no parent Feature | +| 12 | regression | `catalog-snapshot.json` | Normalisation: /reports/legacy ↔ FEAT-orphan — not an UNDOCUMENTED_SURFACE | + +## Trigger eval summary + +19 entries: 13 `should_trigger=true`, 6 `should_trigger=false` + +| Routes to | Query count | +|---|---| +| living-doc-create-user-story | 1 | +| living-doc-create-feature | 1 | +| living-doc-update | 1 | +| gherkin-step | 1 | diff --git a/skills/living-doc-gap-finder/evals/trigger-eval.json b/skills/living-doc-gap-finder/evals/trigger-eval.json new file mode 100644 index 0000000..29a1325 --- /dev/null +++ b/skills/living-doc-gap-finder/evals/trigger-eval.json @@ -0,0 +1,194 @@ +[ + { + "id": 1, + "query": "Run a living doc gap analysis", + "should_trigger": true, + "reason": "'living doc gaps' trigger phrase" + }, + { + "id": 2, + "query": "What's missing in our living documentation?", + "should_trigger": true, + "reason": "'what's missing in living doc' trigger phrase" + }, + { + "id": 3, + "query": "Find undocumented features in the codebase", + "should_trigger": true, + "reason": "'find 
undocumented features' trigger phrase" + }, + { + "id": 4, + "query": "Which tests have no linked acceptance criteria (orphan tests)?", + "should_trigger": true, + "reason": "'orphan tests' trigger keyword" + }, + { + "id": 5, + "query": "Which ACs have no tests? (untested ACs)", + "should_trigger": true, + "reason": "'untested AC' trigger keyword" + }, + { + "id": 6, + "query": "What is our documentation coverage percentage?", + "should_trigger": true, + "reason": "'documentation coverage' trigger keyword" + }, + { + "id": 7, + "query": "Generate a gap report for the payments domain", + "should_trigger": true, + "reason": "'gap report' trigger keyword" + }, + { + "id": 8, + "query": "What behaviors are not covered in the living doc?", + "should_trigger": true, + "reason": "'what's not covered' trigger phrase" + }, + { + "id": 9, + "query": "Do a living doc audit before the release", + "should_trigger": true, + "reason": "'living doc audit' trigger phrase" + }, + { + "id": 10, + "query": "Which User Story ACs are critical but have no BDD scenario?", + "should_trigger": true, + "reason": "Finding untested critical ACs — core gap-finder task" + }, + { + "id": 11, + "query": "Find what's not documented in our test suite", + "should_trigger": true, + "reason": "'find what's not documented' trigger phrase" + }, + { + "id": 12, + "query": "Create a user story for the preferences screen gap", + "should_trigger": false, + "reason": "Creating a User Story — routes to living-doc-create-user-story" + }, + { + "id": 13, + "query": "Create a Feature entity for the account preferences screen", + "should_trigger": false, + "reason": "Creating a Feature — routes to living-doc-create-feature" + }, + { + "id": 14, + "query": "Implement step definitions for the gap report scenario", + "should_trigger": false, + "reason": "Step definition implementation — routes to gherkin-step" + }, + { + "id": 15, + "query": "Do a documentation audit to check for missing tests before the go-live", + 
"should_trigger": true, + "reason": "'documentation audit' trigger phrase" + }, + { + "id": 16, + "query": "Which Functionalities in our living doc have no parent Feature?", + "should_trigger": true, + "reason": "Detecting ORPHAN_FUNCTIONALITY gaps — core gap-finder task" + }, + { + "id": 17, + "query": "A test is pointing to a deprecated AC — what kind of gap is that?", + "should_trigger": true, + "reason": "Stale reference detection (Gap type 7) — core gap-finder task" + }, + { + "id": 18, + "query": "We have 100 orphan tests — how should we batch the gap-finder run?", + "should_trigger": true, + "reason": "Batching strategy for large-scale gap analysis — gap-finder guidance task" + }, + { + "id": 19, + "query": "Update US-042 to add a new AC", + "should_trigger": false, + "reason": "Updating an existing entity — routes to living-doc-update" + }, + { + "id": 20, + "query": "Write step definitions for the checkout scenarios", + "should_trigger": false, + "reason": "Writing step definitions — routes to gherkin-step" + }, + { + "id": 21, + "query": "Delete all BDD artifacts linked to the deprecated feature", + "should_trigger": false, + "reason": "BDD artifact cleanup — routes to bdd-maintain" + }, + { + "id": 22, + "query": "Sync the @AC: tags in the feature files with the AC catalog", + "should_trigger": false, + "reason": "Feature file sync — routes to gherkin-living-doc-sync" + }, + { + "id": 23, + "query": "Scan the checkout page and generate PageObjects", + "should_trigger": false, + "reason": "PageObject scanning — routes to living-doc-pageobject-scan" + }, + { + "id": 24, + "query": "Add data-cy attributes to the checkout template", + "should_trigger": false, + "reason": "Instrumenting templates — routes to data-cy-instrument" + }, + { + "id": 25, + "query": "Generate BDD scenarios for all active ACs on US-007", + "should_trigger": false, + "reason": "Generating scenarios — routes to living-doc-scenario-creator" + }, + { + "id": 26, + "query": "Create a new 
Functionality for the discount calculation logic", + "should_trigger": false, + "reason": "Creating a Functionality — routes to living-doc-create-functionality" + }, + { + "id": 27, + "query": "Update the wording of AC-1 on US-042", + "should_trigger": false, + "reason": "Updating an entity — routes to living-doc-update" + }, + { + "id": 28, + "query": "What does this PR affect in the living doc?", + "should_trigger": false, + "reason": "Impact analysis — routes to living-doc-impact-analysis" + }, + { + "id": 29, + "query": "Create a User Story for the account management screen", + "should_trigger": false, + "reason": "Creating a User Story — routes to living-doc-create-user-story" + }, + { + "id": 30, + "query": "Write behave step definitions for the gap report scenario", + "should_trigger": false, + "reason": "Writing step definitions — routes to gherkin-step" + }, + { + "id": 31, + "query": "Run an AUDIT mode gap analysis on the entire catalog before the release", + "should_trigger": true, + "reason": "AUDIT mode — gap-finder AUDIT mode keyword" + }, + { + "id": 32, + "query": "Use PLAN mode to draft ACs from the PageObject descriptions for the checkout screen", + "should_trigger": true, + "reason": "PLAN mode — gap-finder PLAN mode keyword" + } +] \ No newline at end of file diff --git a/skills/living-doc-gap-finder/scripts/.DS_Store b/skills/living-doc-gap-finder/scripts/.DS_Store new file mode 100644 index 0000000..5008ddf Binary files /dev/null and b/skills/living-doc-gap-finder/scripts/.DS_Store differ diff --git a/skills/living-doc-gap-finder/scripts/compute_gaps.py b/skills/living-doc-gap-finder/scripts/compute_gaps.py new file mode 100644 index 0000000..1f634f2 --- /dev/null +++ b/skills/living-doc-gap-finder/scripts/compute_gaps.py @@ -0,0 +1,320 @@ +#!/usr/bin/env python3 +""" +compute_gaps.py — Living Doc Gap Finder + +Runs all 9 gap-detection algorithms over a catalog snapshot JSON and outputs a +structured gap report. 
The model uses the report to propose remediation actions +rather than computing the traversal itself — ensuring deterministic, token-efficient results. + +Usage: + python compute_gaps.py catalog-snapshot.json + python compute_gaps.py catalog-snapshot.json --output report.json + python compute_gaps.py catalog-snapshot.json --summary + +Input format: see evals/files/catalog-snapshot.json for a worked example. +The catalog JSON must contain: + catalog.user_stories — list of User Story entities + catalog.features — list of Feature entities + catalog.functionalities — list of Functionality entities + inventory.ui_screens — list of discovered UI screen paths (optional) + inventory.api_endpoints — list of discovered API paths (optional) + inventory.test_files — list of {file, linked_ac} test entries (optional) + inventory.bdd_scenarios — list of {scenario, linked_ac} entries (optional) + known_test_links — dict of AC_ID → test path | null + deprecated_acs — list of deprecated AC IDs (optional) +""" + +import argparse +import json +import sys +from datetime import datetime, timezone + +# Gap type → (sort priority, severity label) +PRIORITIES = { + "UNTESTED_AC": (1, "Blocker"), + "UNDOCUMENTED_SURFACE": (2, "Important"), + "ORPHAN_FEATURE": (3, "Important"), + "ORPHAN_USER_STORY": (4, "Important"), + "ORPHAN_FUNCTIONALITY": (5, "Important"), + "ORPHAN_TEST": (6, "Important"), + "STALE_REFERENCE": (7, "Important"), + "UNDOCUMENTED_FUNCTIONALITY": (8, "Nit"), + "EMPTY_FEATURE": (9, "Nit"), +} + + +def load_snapshot(path: str) -> dict: + with open(path) as f: + return json.load(f) + + +def compute_gaps(snapshot: dict) -> list[dict]: + catalog = snapshot.get("catalog", {}) + inventory = snapshot.get("inventory", {}) + known_test_links: dict = snapshot.get("known_test_links", {}) + + user_stories: list[dict] = catalog.get("user_stories", []) + features: list[dict] = catalog.get("features", []) + functionalities: list[dict] = catalog.get("functionalities", []) + + ui_screens: list[str] = inventory.get("ui_screens", []) + api_endpoints: 
list[str] = inventory.get("api_endpoints", []) + test_files: list[dict] = inventory.get("test_files", []) + bdd_scenarios: list[dict] = inventory.get("bdd_scenarios", []) + + gaps: list[dict] = [] + gap_counter = 0 + + def add_gap(gap_type: str, entity: str, description: str, proposed_action: str) -> None: + nonlocal gap_counter + gap_counter += 1 + priority_order, severity = PRIORITIES[gap_type] + gaps.append( + { + "id": f"GAP-{gap_counter:03d}", + "type": gap_type, + "_priority": priority_order, + "severity": severity, + "entity": entity, + "description": description, + "proposed_action": proposed_action, + } + ) + + # ── Build lookup indices ─────────────────────────────────────────────────── + feature_index = {f["id"]: f for f in features} + us_ids = {us["id"] for us in user_stories} + + # Features referenced by at least one User Story back-link + features_linked_from_us: set[str] = set() + for feat in features: + for us_id in feat.get("user_stories", []): + if us_id in us_ids: + features_linked_from_us.add(feat["id"]) + # Also honour forward links on the User Story entity itself + for us in user_stories: + for feat_id in us.get("features", []): + if feat_id in feature_index: + features_linked_from_us.add(feat_id) + + # Effective Functionality count per Feature (union of forward + back links) + feature_func_counts: dict[str, int] = {f["id"]: 0 for f in features} + for fn in functionalities: + parent = fn.get("parent_feature") + if parent and parent in feature_func_counts: + feature_func_counts[parent] += 1 + for feat in features: + declared = feat.get("functionalities", []) + if declared: + feature_func_counts[feat["id"]] = max(feature_func_counts[feat["id"]], len(declared)) + + # Documented surface name tokens for Gap-2 matching (case-insensitive) + documented_surface_tokens: set[str] = set() + for feat in features: + name = feat.get("name", "").lower() + if name: + documented_surface_tokens.add(name) + for path in feat.get("paths", []): + 
documented_surface_tokens.add(path.lower()) + + # Deprecated ACs from explicit list + link metadata + deprecated_ac_ids: set[str] = set(snapshot.get("deprecated_acs", [])) + for ac_id, link in known_test_links.items(): + if isinstance(link, dict) and link.get("ac_status") == "deprecated": + deprecated_ac_ids.add(ac_id) + + # ── Gap 1 — UNTESTED_AC ──────────────────────────────────────────────────── + for ac_id, test_link in known_test_links.items(): + if test_link is None: + add_gap( + "UNTESTED_AC", + ac_id, + f"AC '{ac_id}' has no linked test", + "Generate a BDD scenario using living-doc-scenario-creator, " + "or add a unit/integration test and link it to this AC", + ) + + # ── Gap 2 — UNDOCUMENTED_SURFACE ────────────────────────────────────────── + for surface in ui_screens + api_endpoints: + surface_lower = surface.lower().lstrip("/") + matched = any( + surface_lower in token or token in surface_lower + for token in documented_surface_tokens + ) + if not matched: + add_gap( + "UNDOCUMENTED_SURFACE", + surface, + f"Surface '{surface}' exists in the application with no Feature entity", + "Create a Feature entity using living-doc-create-feature", + ) + + # ── Gap 3 — ORPHAN_FEATURE ──────────────────────────────────────────────── + for feat in features: + if not feat.get("user_stories") and feat["id"] not in features_linked_from_us: + add_gap( + "ORPHAN_FEATURE", + feat["id"], + f"Feature '{feat['id']}' ({feat.get('name', '')}) has no linked User Stories", + "Link to an existing User Story or confirm with the product owner whether to deprecate", + ) + + # ── Gap 4 — ORPHAN_USER_STORY ───────────────────────────────────────────── + us_linked_to_any_feature: set[str] = set() + for feat in features: + for us_id in feat.get("user_stories", []): + us_linked_to_any_feature.add(us_id) + for us in user_stories: + has_forward_link = bool(us.get("features")) + has_back_link = us["id"] in us_linked_to_any_feature + if not has_forward_link and not has_back_link: + add_gap( + "ORPHAN_USER_STORY", + us["id"], + f"User 
Story '{us['id']}' ({us.get('title', us.get('name', ''))}) has no linked Feature", + "Link to an existing Feature or create the missing Feature using living-doc-create-feature", + ) + + # ── Gap 5 — ORPHAN_FUNCTIONALITY ────────────────────────────────────────── + for fn in functionalities: + if not fn.get("parent_feature"): + add_gap( + "ORPHAN_FUNCTIONALITY", + fn["id"], + f"Functionality '{fn['id']}' has no parent Feature", + "Link to an existing Feature; if no owning surface exists, deprecate this Functionality", + ) + + # ── Gap 6 — ORPHAN_TEST ─────────────────────────────────────────────────── + all_tests = test_files + bdd_scenarios + for test in all_tests: + label = test.get("file") or test.get("scenario", "unknown") + if not test.get("linked_ac"): + add_gap( + "ORPHAN_TEST", + label, + f"Test '{label}' has no linked AC", + "Link to an existing AC, or create a Functionality for the behavior using " + "living-doc-create-functionality. Never delete the test to resolve the gap.", + ) + + # ── Gap 7 — STALE_REFERENCE ─────────────────────────────────────────────── + for test in all_tests: + label = test.get("file") or test.get("scenario", "unknown") + linked_ac = test.get("linked_ac") + if linked_ac and linked_ac in deprecated_ac_ids: + add_gap( + "STALE_REFERENCE", + label, + f"Test '{label}' references deprecated AC '{linked_ac}'", + "Update the test to reference the active replacement AC via gherkin-living-doc-sync; " + "if the behavior was intentionally removed, delete the test after product owner confirmation", + ) + + # ── Gap 8 — UNDOCUMENTED_FUNCTIONALITY ──────────────────────────────────── + for fn in functionalities: + if not fn.get("linked_tests"): + ac_count = fn.get("ac_count", 0) + add_gap( + "UNDOCUMENTED_FUNCTIONALITY", + fn["id"], + f"Functionality '{fn['id']}' has {ac_count} AC(s) with no linked tests", + "Create unit or integration tests for this Functionality's ACs and link them", + ) + + # ── Gap 9 — EMPTY_FEATURE 
───────────────────────────────────────────────── + for feat in features: + if feature_func_counts.get(feat["id"], 0) == 0 and not feat.get("functionalities"): + add_gap( + "EMPTY_FEATURE", + feat["id"], + f"Feature '{feat['id']}' ({feat.get('name', '')}) has no Functionalities defined", + "Create Functionalities for known behaviors using living-doc-create-functionality", + ) + + # Sort: priority ascending, then entity ID ascending + gaps.sort(key=lambda g: (g["_priority"], g["entity"])) + for g in gaps: + del g["_priority"] + + return gaps + + +def coverage_stats(snapshot: dict, gaps: list[dict]) -> dict: + known_test_links: dict = snapshot.get("known_test_links", {}) + total = len(known_test_links) + covered = sum(1 for v in known_test_links.values() if v is not None) + return { + "total_acs": total, + "covered_acs": covered, + "coverage_percentage": round(covered / total * 100, 1) if total else 0.0, + } + + +def build_report(snapshot: dict, gaps: list[dict]) -> dict: + stats = coverage_stats(snapshot, gaps) + severity_counts = {"Blocker": 0, "Important": 0, "Nit": 0} + for g in gaps: + severity_counts[g["severity"]] += 1 + return { + "generated_at": datetime.now(timezone.utc).isoformat(), + "documentation_coverage": stats, + "summary": { + "total_gaps": len(gaps), + "blockers": severity_counts["Blocker"], + "important": severity_counts["Important"], + "nits": severity_counts["Nit"], + }, + "gaps": gaps, + } + + +def print_summary(report: dict) -> None: + cov = report["documentation_coverage"] + summ = report["summary"] + print(f"\n=== Living Doc Gap Report — {report['generated_at']} ===") + print( + f"Coverage: {cov['covered_acs']}/{cov['total_acs']} ACs covered " + f"({cov['coverage_percentage']}%)" + ) + print( + f"Gaps: {summ['total_gaps']} total | " + f"{summ['blockers']} Blocker | " + f"{summ['important']} Important | " + f"{summ['nits']} Nit\n" + ) + for gap in report["gaps"]: + print(f" [{gap['severity']:9s}] {gap['id']} {gap['type']}") + print(f" Entity: 
{gap['entity']}") + print(f" Action: {gap['proposed_action']}\n") + + +def main() -> None: + parser = argparse.ArgumentParser( + description="Compute living doc gaps from a catalog snapshot." + ) + parser.add_argument("snapshot", help="Path to catalog-snapshot.json") + parser.add_argument( + "--output", "-o", help="Write JSON gap report to this file (default: stdout)" + ) + parser.add_argument( + "--summary", "-s", action="store_true", help="Print human-readable summary" + ) + args = parser.parse_args() + + snapshot = load_snapshot(args.snapshot) + gaps = compute_gaps(snapshot) + report = build_report(snapshot, gaps) + + if args.summary: + print_summary(report) + elif args.output: + with open(args.output, "w") as f: + json.dump(report, f, indent=2) + print(f"Gap report written to {args.output}", file=sys.stderr) + else: + print(json.dumps(report, indent=2)) + + +if __name__ == "__main__": + main() diff --git a/skills/living-doc-impact-analysis/SKILL.md b/skills/living-doc-impact-analysis/SKILL.md new file mode 100644 index 0000000..dea7b2c --- /dev/null +++ b/skills/living-doc-impact-analysis/SKILL.md @@ -0,0 +1,216 @@ +--- +name: living-doc-impact-analysis +description: > + Analyse the impact of a code change on the living documentation. Given a PR diff, modified + module, or changed API contract, trace affected Features, Functionalities, and User Stories. + Output an impact map identifying what must be reviewed, updated, or re-tested. Activate when + a PR touches business logic, a service module is refactored, or breaking API changes need + living doc coverage traced. + Triggers on: "living doc impact", "what does this change affect", "impact of PR on living doc", + "trace affected user stories", "affected features", "impact analysis", "living doc sign-off", + "what user stories are affected", "which scenarios need re-running", "what needs re-testing", + "PR impact on docs", "bootstrap feature_registry". 
+ Does NOT trigger for: updating living doc (use living-doc-update); finding coverage gaps + (use living-doc-gap-finder); creating new entities (use living-doc-create-*). + Pairs with living-doc-update, gherkin-living-doc-sync, and bdd-maintain. +license: Apache-2.0 +compatibility: GitHub Copilot +--- + +# Living Doc — Impact Analysis + +> **Key concepts:** Feature, Functionality, User Story, AC — see [living-doc-glossary](../references/living-doc-glossary.md) ([remote](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md)). + +## Script — `scripts/trace_impact.py` + +Run this script to trace changed files to living doc entities before producing the impact map. +The catalog JSON must include a `feature_registry` section mapping path patterns to Feature IDs. + +```bash +# Trace from an explicit file list +python scripts/trace_impact.py --files src/payments/PromoService.java --catalog catalog.json --summary + +# Trace from a unified git diff +python scripts/trace_impact.py --diff changes.diff --catalog catalog.json --output impact.json +``` + +Feature registry format (add to your catalog JSON): +```json +{ + "feature_registry": [ + { "feature_id": "FEAT-001", "paths": ["src/auth/**", "src/security/login*"] } + ] +} +``` + +**Bootstrapping `feature_registry`:** If no registry exists, follow these steps: +1. Run `living-doc-gap-finder` to list all Feature entities and their IDs. +2. For each Feature, manually map its canonical source directory to its ID: + - Angular: `"paths": ["src/app/pages/checkout/**"]` mirrors the module directory under `src/app/`. + - Java/Spring: `"paths": ["src/main/java/com/example/checkout/**"]` uses the package path. +3. Add each mapping as `{ "feature_id": "FEAT-", "paths": [""] }` under `"feature_registry"` in `catalog.json`. +4. Re-run `trace_impact.py` to verify mappings resolve correctly against a known changed file. 
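A quick way to sanity-check a new mapping before re-running the script is to test one known changed file against the registry by hand. The sketch below is only an illustration and is not the actual `trace_impact.py` logic: `features_for_file` is a hypothetical helper, and it assumes `fnmatch`-style matching, where `*` also matches across `/` so that `**` behaves like a recursive wildcard.

```python
from fnmatch import fnmatchcase


def features_for_file(changed_file: str, feature_registry: list[dict]) -> list[str]:
    """Return the Feature IDs whose path patterns match the changed file.

    Hypothetical helper: the real matching rules live in trace_impact.py.
    """
    matched = []
    for entry in feature_registry:
        patterns = entry.get("paths", [])
        # fnmatchcase is case-sensitive on every platform, so results are stable.
        if any(fnmatchcase(changed_file, pattern) for pattern in patterns):
            matched.append(entry["feature_id"])
    return matched


# Registry entry copied from the format shown above.
registry = [
    {"feature_id": "FEAT-001", "paths": ["src/auth/**", "src/security/login*"]},
]

print(features_for_file("src/auth/login/LoginService.java", registry))
print(features_for_file("src/payments/PromoService.java", registry))
```

An empty result for a file that contains business logic is the signal that a registry mapping is still missing.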
+ +Maintain the registry whenever a Feature is created, renamed, or its source directory moves. The `living-doc-create-feature` and `living-doc-update` "Rename a Feature" workflows include a reminder for this step. + +The script handles Steps 1–2 (file classification and entity traversal). Use its output JSON +to drive Steps 3–5 (impact classification, impact map narrative, and sign-off checklist). + +--- + +## Fast path — infra/config-only and test-only PRs + +Before running the full workflow, check whether the PR scope is entirely out of living-doc reach. +If **all** changed files fall into one or more of these categories, issue a concise no-impact +verdict and stop — do not generate a full Impact Map: + +| Scope | Examples | Verdict | +|---|---|---| +| Pure infrastructure | Kubernetes manifests, Helm charts, Terraform, Docker resource limits | **No living doc impact** | +| Build / CI config | `Dockerfile`, GitHub Actions, `pom.xml` dependency bumps | **No living doc impact** | +| Test-only | `*Test.java`, `*Spec.ts`, mock/stub files, test fixtures | **No living doc impact** (unless a test references an AC that no longer exists — flag that separately) | +| Documentation / comments | `*.md`, `*.adoc`, Javadoc-only changes | **No living doc impact** | + +**Concise no-impact verdict format:** + +``` +Impact level: None. + + is a change. It does not modify business logic, API contracts, +event contracts, or UI behaviour, so no living doc entities require updating. + +Recommended action: note "no living doc update required" in the PR and proceed. +``` + +Skip Steps 2–5 for these PRs. Only escalate to the full workflow if at least one changed file +touches domain logic, an API contract, an event contract, or a UI component. + +--- + +## Step 1 — Identify the changed surface area + +Start from the code change (PR diff, renamed module, deleted endpoint): + +1. 
List the changed files and classify each: + - **Domain logic** (service, repository, domain model) + - **API contract** (controller, route, OpenAPI spec) + - **Event contract** (schema, Avro, Protobuf) + - **UI component** (page, form, component) + - **Configuration / infrastructure** (no living doc impact unless it changes a business flow) + +2. Map changed files to modules/services using the project structure. + +3. For each changed module, identify the corresponding Feature by traversing entity relationships: + - Which Feature owns this module? (check the Feature's `functionalities` links or ask the owning team) + - Which Functionalities does this module implement? + - If the module has no matching `feature_registry` entry, treat it as missing living doc coverage for impact-analysis purposes: flag a **High-impact gap**, recommend `living-doc-create-functionality`, and note that the registry mapping must be added. + +## Step 2 — Trace to living doc entities + +Walk the entity hierarchy from Feature through Functionality to User Story, then list the affected ACs: + +``` +Changed module: src/payments/checkout/PromoService.java + Feature: FEAT-promotions + Functionalities: FUNC-promo-validate, FUNC-promo-apply + User Stories: US-042 (apply promo), US-067 (expired promo error) + ACs affected: AC:US-042-01, AC:US-042-03, AC:US-067-02 +``` + +Repeat for every changed module. Consolidate entities that appear more than once — they are +higher-risk and need priority review. + +**Shared utility classes:** If the changed file is a shared utility used by multiple modules +(e.g. `MoneyUtils`, `DateHelper`), fan out the trace to **every** Feature that imports or +depends on that utility. Classify each as **High impact** — a shared utility change propagates +to all consumers and each consumer's ACs must be reviewed. Produce a consolidated impact map +that covers all affected Feature areas. 
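The consolidation rule in Step 2 (entities reached from more than one changed module get priority review) can be sketched as a simple count. This is an illustration under stated assumptions: the `traces` mapping and the `consolidate` helper are hypothetical, and the module-to-story links for `PromoController.java` are made up for the example.

```python
from collections import Counter

# Hypothetical trace results: changed module -> User Stories reached by the trace.
traces = {
    "src/payments/checkout/PromoService.java": ["US-042", "US-067"],
    "src/payments/checkout/PromoController.java": ["US-042", "US-089"],
}


def consolidate(traces: dict[str, list[str]]) -> list[tuple[str, int]]:
    """Count how many changed modules reach each entity, most-hit first."""
    hits = Counter()
    for entities in traces.values():
        hits.update(set(entities))  # count each entity once per module
    # Sort by hit count (descending), then by ID for a stable order.
    return sorted(hits.items(), key=lambda item: (-item[1], item[0]))


# Entities reached from more than one module are flagged for priority review.
priority_review = [entity for entity, count in consolidate(traces) if count > 1]
print(priority_review)
```

The same counting works for Features or Functionalities; only the values stored in `traces` change.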
+ +## Step 3 — Classify the impact level + +| Impact level | Criteria | Action required | +|---|---|---| +| **High** | AC or business rule directly changed or deleted | Must update living doc and re-run linked tests | +| **Medium** | Module changed but business rule unchanged (refactor/rename) | Update living doc if method names referenced; confirm tests still pass | +| **Low** | Config / infra change that alters a business flow | Update living doc if the flow change is documented; note in PR | +| **None** | Pure infrastructure change (resource limits, scaling, deployment config) with no business flow impact; or test files, mocks, build scripts only | No living doc update needed | + +## Step 4 — Output the impact map + +Emit a structured impact map for the PR or change set: + +``` +IMPACT MAP — PR #217: "Refactor promo validation to support stacked discounts" + Surface area: src/payments/checkout/PromoService.java (domain logic — High) + src/payments/checkout/PromoController.java (API contract — High) + + Affected entities: + Feature: FEAT-promotions (owner: team-payments) + Functionalities: FUNC-promo-validate, FUNC-promo-apply + User Stories: US-042 (high impact), US-067 (high impact), US-089 (medium impact) + + ACs requiring review: + AC:US-042-01 — Happy path: single promo applied correctly + AC:US-042-03 — Stacked promos applied in priority order ← NEW BEHAVIOUR + AC:US-067-02 — Expired promo returns 422 + + Recommended actions: + 1. Update living-doc: add AC for stacked discount priority order (AC:US-042-03 is new) + Invoke living-doc-update + 2. Re-run E2E journeys: US-042 and US-067 critical path scenarios + Invoke test-e2e-standards +``` + +If the request is framed as **"what needs re-testing"**, present Step 4 as a compact **re-test checklist**: group by Feature / Functionality / User Story and list the affected ACs. 
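For the compact re-test checklist, one possible rendering groups the affected ACs under their User Story. The sketch below is a simplification: `retest_checklist` is a hypothetical helper, it groups by User Story only (not by Feature or Functionality), and it assumes AC IDs follow the `AC:US-042-01` pattern used in the examples above.

```python
from collections import defaultdict


def retest_checklist(ac_ids: list[str]) -> str:
    """Render affected ACs as a checklist grouped by User Story (hypothetical helper)."""
    grouped = defaultdict(list)
    for ac_id in ac_ids:
        # "AC:US-042-01" -> "US-042"; assumes the AC:<story>-<nn> naming pattern.
        story = ac_id.removeprefix("AC:").rsplit("-", 1)[0]
        grouped[story].append(ac_id)

    lines = []
    for story in sorted(grouped):
        lines.append(f"{story}:")
        lines.extend(f"  [ ] {ac_id}" for ac_id in sorted(grouped[story]))
    return "\n".join(lines)


print(retest_checklist(["AC:US-042-01", "AC:US-067-02", "AC:US-042-03"]))
```

`str.removeprefix` needs Python 3.9 or later, which matches the built-in generic annotations already used in `compute_gaps.py`.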
+ +## Step 5 — Release sign-off checklist + +Before a release, confirm that all High-impact entities have been addressed: + +| Check | Status | +|---|---| +| All High-impact ACs reviewed and updated if needed | ☐ | +| living-doc-update applied for any changed business rules | ☐ | + +Produce this checklist as a PR comment or documentation artefact if requested. + +> **After completing the impact map:** if the analysis identified ACs or entity descriptions that +> must change, hand off to `living-doc-update` immediately. Pass the exact entity ID(s) and the +> recommended change from Step 4's recommended actions list. This skill analyses — it does not +> edit entities. If any High-impact ACs were subsequently modified or deprecated, also invoke +> `gherkin-living-doc-sync` to propagate the changes to linked feature files. If the change +> revealed that a Feature or Functionality has been fully deprecated with active BDD coverage, +> also invoke `bdd-maintain` REMOVE mode to clean up the associated automation files. + +## Code-level impact report format + +When the change is a **method signature change** or **API contract change**, produce a +code-level impact report with four sections: + +**Direct callers** — classes or methods that call the changed method directly (markdown list). + +**Downstream dependents** — components that use the return value or depend on the changed +contract (markdown list). + +**Required changes** — concrete call-site updates needed (markdown list; include the old and +new signatures in fenced code blocks). + +**Test coverage required** — tests that must be added or updated to cover the new contract +(markdown list). + +Do not include speculative changes beyond the described scope. 
+ +## Anti-patterns to flag + +| Anti-pattern | Flag | +|---|---| +| Changed domain logic with no Feature entity defined in the living doc | Missing living doc coverage — flag as a **High-impact gap** and recommend creating documentation with `living-doc-create-functionality` | +| Impact analysis only covers unit/integration tests, not E2E scenarios | Incomplete impact — flag for test-e2e-standards review | + +## Out-of-scope routing + +| Request type | Correct skill | +|---|---| +| "Update a living doc entity / add a new AC" | `living-doc-update` — this skill analyses impact, it does not edit entities | +| "Which Functionalities have no User Stories / find coverage gaps" | `living-doc-gap-finder` — gap discovery is a separate concern | +| "Clean up BDD files for a deprecated feature" | `bdd-maintain` — deletes automation artifacts for removed entities | diff --git a/skills/living-doc-impact-analysis/evals/evals.json b/skills/living-doc-impact-analysis/evals/evals.json new file mode 100644 index 0000000..5a37f6b --- /dev/null +++ b/skills/living-doc-impact-analysis/evals/evals.json @@ -0,0 +1,205 @@ +{ + "skill_name": "living-doc-impact-analysis", + "evals": [ + { + "id": 1, + "category": "happy-path", + "prompt": "PR #217 modifies PromoService.java to support stacked discounts. What living doc entities does this change affect?", + "expected_output": "Agent maps PromoService.java to its Feature via the feature_registry in catalog.json (or runs trace_impact.py). Traces Feature → Functionality → User Stories → ACs. Classifies impact as High (changed business logic). 
Outputs a structured impact map listing affected Features, Functionalities, User Stories, and ACs that require review.", + "files": [], + "expectations": [ + "Maps changed file to Feature via the feature_registry section in catalog.json (or runs trace_impact.py --catalog catalog.json)", + "Traces Feature → Functionality → User Stories → ACs", + "Classifies impact level as High for changed business logic", + "Outputs a structured impact map" + ] + }, + { + "id": 2, + "category": "regression", + "prompt": "We changed the /v2/orders endpoint to add a required 'currency' field. Which Gherkin scenarios need to be re-run?", + "expected_output": "Agent identifies the changed API contract surface area (/v2/orders endpoint). Maps the endpoint to its Feature and Functionalities. Lists the linked User Stories and ACs. Lists the linked Gherkin scenarios that reference this endpoint's behavior and must be re-run.", + "files": [], + "expectations": [ + "Identifies changed API contract surface area", + "Maps endpoint to Feature and Functionalities", + "Lists linked User Stories and ACs", + "Lists linked Gherkin scenarios requiring re-run" + ] + }, + { + "id": 3, + "category": "regression", + "prompt": "We're about to release the checkout refactor. Can you produce a living doc impact sign-off checklist for the release?", + "expected_output": "Agent identifies all High-impact entities in the checkout release scope. 
Produces a release sign-off checklist covering: ACs reviewed, linked scenarios re-run, living-doc-update applied for changed business rules, and gherkin-living-doc-sync run for affected feature files.", + "files": [], + "expectations": [ + "Identifies all High-impact entities", + "Produces a release sign-off checklist", + "Checklist includes: ACs reviewed, scenarios re-run, living-doc-update applied, gherkin-living-doc-sync run" + ] + }, + { + "id": 4, + "category": "regression", + "prompt": "The ShippingCalculator.java was changed in the PR but it has no entry in the feature_registry section of catalog.json.", + "expected_output": "Agent flags missing living doc coverage as a High-impact gap. The changed module has no feature_registry entry and cannot be traced to a Feature. Recommends invoking living-doc-create-functionality to document the missing behavior, and notes that the feature_registry mapping must be added.", + "files": [], + "expectations": [ + "Flags missing living doc coverage as a High-impact gap", + "Notes the module has no feature_registry entry", + "Recommends invoking living-doc-create-functionality for the missing module", + "Notes that the feature_registry mapping must be added" + ] + }, + { + "id": 5, + "category": "regression", + "prompt": "PR #300 updates the Kubernetes resource limits for the order-service deployment. What is the living doc impact?", + "expected_output": "Agent classifies this as a None impact level. Config and infrastructure changes that do not affect any business flow do not require living doc updates. 
Notes in the PR that no living doc update is needed.", + "files": [], + "expectations": [ + "Classifies as None impact level", + "Config/infra changes do not require living doc updates", + "Notes in PR that no living doc update is needed" + ] + }, + { + "id": 6, + "category": "negative", + "prompt": "Update US-042 to add a new AC for the expired promo path.", + "expected_output": "Updating living doc entities is out of scope for this skill — routes to living-doc-update. living-doc-impact-analysis traces which entities are affected by a code change; it does not create or modify entity content.", + "files": [], + "expectations": [ + "Does not update the User Story", + "Routes to living-doc-update", + "Explains the distinction: impact tracing vs. entity modification" + ] + }, + { + "id": 7, + "category": "negative", + "prompt": "Which Functionalities don't have any User Stories?", + "expected_output": "Finding coverage gaps is out of scope for this skill — routes to living-doc-gap-finder. living-doc-impact-analysis traces the impact of code changes; coverage gap detection is handled by living-doc-gap-finder.", + "files": [], + "expectations": [ + "Does not search for orphan Functionalities", + "Routes to living-doc-gap-finder", + "Explains the distinction: impact analysis vs. gap detection" + ] + }, + { + "id": 8, + "category": "paraphrase", + "prompt": "We're about to merge a PR that changes the cart validation logic. What do we need to re-test in the living doc?", + "expected_output": "Agent identifies this as an impact analysis request despite 're-test' phrasing. Maps the changed cart validation code to its Feature via the feature_registry in catalog.json. Traces Feature → Functionality → User Stories → ACs. Lists all linked Gherkin scenarios that need re-running. 
Outputs a structured re-test checklist.", + "files": [], + "expectations": [ + "Identifies this as an impact analysis request despite 're-test' phrasing", + "Maps changed code to Feature via the feature_registry section in catalog.json", + "Traces Feature → Functionality → User Stories → ACs", + "Lists all linked Gherkin scenarios that need re-running", + "Outputs a structured re-test checklist" + ] + }, + { + "id": 9, + "category": "edge-case", + "prompt": "PR #410 modifies MoneyUtils.java, a shared utility class used by checkout, refunds, and the promotions engine. What is the living doc impact?", + "expected_output": "Agent fans out the impact analysis to all Features that import or reference MoneyUtils. Lists all Functionalities within each Feature area that call MoneyUtils. Classifies all as High impact — shared utility changes propagate to all consumers. Produces a consolidated impact map across all three Feature areas (checkout, refunds, promotions).", + "files": [], + "expectations": [ + "Fans out impact analysis to all Features that reference MoneyUtils", + "Lists all Functionalities in each Feature that call MoneyUtils", + "Classifies all as High impact — shared utility changes affect all consumers", + "Produces a consolidated impact map across all three Feature areas" + ] + }, + { + "id": 10, + "type": "output-format", + "prompt": "PaymentGatewayClient.charge() now returns Result instead of ChargeResponse. What does the impact analysis report look like?", + "expected_output": "The impact analysis report has four sections: 'Direct callers' (services or classes that call the changed method), 'Downstream dependents' (components using the return value), 'Required changes' (concrete refactoring steps), and 'Test coverage required' (which tests need updating). Each section uses a markdown list. The method signature change appears in a fenced code block. 
No speculative changes beyond the described scope are included.", + "files": [], + "expectations": [ + "Four sections: Direct callers, Downstream dependents, Required changes, Test coverage required", + "Each section uses a markdown list", + "Method signature change in a fenced code block", + "No speculative changes beyond the described scope" + ], + "category": "happy-path" + }, + { + "id": 11, + "type": "file-based", + "description": "Analyse the impact of a method signature change in a Python service class.", + "prompt": "The file evals/files/changed-notification-service.py shows NotificationClient.send() with its old and new signature. Produce the impact analysis for this change.", + "expected_output": "Agent produces an impact analysis with: Direct callers (callers of NotificationClient.send() visible in the file or implied by the class structure), Required changes (update each call site to pass the new parameter), and Test coverage required (add tests for the new parameter in NotificationClientTest). The old and new signatures are shown in fenced code blocks.", + "files": [ + "evals/files/changed-notification-service.py" + ], + "expectations": [ + "Direct callers section identifies the caller", + "Required changes section lists specific call site updates", + "Test coverage section lists tests to add or update", + "Old and new signatures in fenced code blocks" + ], + "category": "happy-path" + }, + { + "id": 12, + "category": "edge-case", + "prompt": "PR #501 modifies only OrderServiceTest.java and adds a new mock in MockNotificationClient.java. What is the living doc impact?", + "expected_output": "Agent classifies all changed files as test files and mocks — not domain logic or API contract. Impact level is None. Test-only changes do not affect business logic or living doc entities. 
Notes in the PR that no living doc update is needed.", + "files": [], + "expectations": [ + "Classifies all changed files as test files and mocks — not domain logic or API contract", + "Impact level: None — test-only changes do not affect living doc", + "No living doc update required", + "Notes in PR that no living doc update is needed" + ] + }, + { + "id": 13, + "category": "happy-path", + "prompt": "PR #222 modifies both DiscountService.java (domain logic) and DiscountController.java (REST controller). What living doc entities are affected?", + "expected_output": "Agent classifies DiscountService.java as domain logic (High impact) and DiscountController.java as API contract (High impact). Traces both files to the owning Feature via the feature_registry in catalog.json. Traces Feature → Functionalities → User Stories → ACs for each changed file. Consolidates entities appearing more than once as higher-risk. Outputs a single consolidated impact map covering both changed files.", + "files": [], + "expectations": [ + "Classifies DiscountService.java as domain logic — High impact", + "Classifies DiscountController.java as API contract — High impact", + "Traces both files to the owning Feature (e.g. FEAT-promotions) via feature_registry", + "Traces Feature → Functionalities → User Stories → ACs for both changed files", + "Consolidates entities appearing more than once — higher risk", + "Outputs a single consolidated impact map covering both changed files" + ] + }, + { + "id": 14, + "prompt": "PR #600 only updates README.md, adds inline code comments, and reformats a YAML config file with no value changes. What is the living doc impact?", + "expected_output": "No living doc impact. Docs-only and formatting-only PRs fall into the fast-path no-impact category. 
No entities to review or update.", + "files": [], + "category": "happy-path", + "expectations": [ + "Classifies as None impact level — no living doc entities to review", + "Identifies the fast-path no-impact category: docs-only and formatting-only PRs", + "Does not flag README updates, comment additions, or whitespace-only YAML as impactful", + "Produces no sign-off checklist items — empty impact report is the correct output" + ] + }, + { + "id": 15, + "category": "happy-path", + "prompt": "We have never run an impact analysis before and the feature_registry doesn't exist. How do I bootstrap it?", + "expected_output": "Bootstrap the feature_registry in 4 steps: (1) Find all Feature entity files in the living doc catalog directory. (2) For each Feature, identify the corresponding source paths — for Angular, look for the component folder (e.g. src/app/checkout/); for Java/Spring, look for the controller/service package (e.g. com.example.checkout). (3) Build the registry as a map of feature_id → [source_paths]. (4) Save it as feature_registry.json. After saving, re-run the impact analysis. Use living-doc-create-feature for creating new Feature entities and living-doc-update for the 'Rename a Feature' workflow.", + "files": [], + "expectations": [ + "Lists all 4 bootstrap steps in order", + "Mentions Angular path pattern (src/app//)", + "Mentions Java/Spring path pattern (package name)", + "Instructs saving as feature_registry.json", + "Routes to living-doc-create-feature for new entities" + ] + } + ] +} \ No newline at end of file diff --git a/skills/living-doc-impact-analysis/evals/files/changed-notification-service.py b/skills/living-doc-impact-analysis/evals/files/changed-notification-service.py new file mode 100644 index 0000000..68d452e --- /dev/null +++ b/skills/living-doc-impact-analysis/evals/files/changed-notification-service.py @@ -0,0 +1,54 @@ +""" +NotificationClient — changed method signature. 
+ +Used by: living-doc-impact-analysis file-based eval + +The send() method previously accepted (user_id, message). +It now accepts (user_id, message, channel) where channel is +one of: 'email', 'sms', 'push'. The channel parameter is required +(not optional) to force explicit intent at every call site. + +This file shows: +- The OLD signature (commented out) +- The NEW signature +- An example caller (OrderService) that uses the old signature +""" + + +class NotificationClient: + # OLD signature — no longer valid: + # def send(self, user_id: str, message: str) -> None: + + def send(self, user_id: str, message: str, channel: str) -> None: + """Send a notification to the user via the specified channel. + + Args: + user_id: The unique identifier of the recipient. + message: The notification message text. + channel: Delivery channel — one of: 'email', 'sms', 'push'. + + Raises: + ValueError: If channel is not one of the allowed values. + """ + allowed = {"email", "sms", "push"} + if channel not in allowed: + raise ValueError(f"channel must be one of {allowed}, got '{channel}'") + # ... 
actual delivery logic omitted + + +class OrderService: + """Uses the OLD NotificationClient.send() signature — needs updating.""" + + def __init__(self, notification_client: NotificationClient): + self._notifications = notification_client + + def place_order(self, user_id: str, sku: str, quantity: int) -> dict: + order = {"user_id": user_id, "sku": sku, "quantity": quantity, "status": "placed"} + # BUG: old signature — missing 'channel' argument + self._notifications.send(user_id, f"Order placed for {quantity}x {sku}") + return order + + def cancel_order(self, order_id: str, user_id: str) -> bool: + # BUG: old signature — missing 'channel' argument + self._notifications.send(user_id, f"Order {order_id} has been cancelled") + return True diff --git a/skills/living-doc-impact-analysis/evals/fixture-map.md b/skills/living-doc-impact-analysis/evals/fixture-map.md new file mode 100644 index 0000000..fe71159 --- /dev/null +++ b/skills/living-doc-impact-analysis/evals/fixture-map.md @@ -0,0 +1,45 @@ +# Living Doc Impact Analysis — Evals Fixture Map + +| Test ID | Category | Fixture | +|---|---|---| +| 1 | happy-path | *(no file — PR domain logic impact map scenario)* | +| 2 | regression | *(no file — API contract change scenario re-run list)* | +| 3 | regression | *(no file — release sign-off checklist scenario)* | +| 4 | regression | *(no file — changed module missing from feature_registry in catalog.json)* | +| 5 | regression | *(no file — infra-only change None impact level)* | +| 6 | negative | *(no file — update entity redirect to living-doc-update)* | +| 7 | negative | *(no file — gap-finding redirect to living-doc-gap-finder)* | +| 8 | paraphrase | *(no file — "what needs re-testing" re-test checklist framing)* | +| 9 | edge-case | *(no file — shared utility MoneyUtils fan-out to all consumers)* | +| 10 | output-format | *(no file — code-level impact report: method signature change format)* | +| 11 | file-based | `changed-notification-service.py` | Impact of 
NotificationClient.send() signature change | +| 12 | edge-case | *(no file — test-only PR: None impact level)* | +| 13 | happy-path | *(no file — PR with domain service + REST controller: fan-out trace)* | +| 14 | regression | *(no file — shared utility rounding change fan-out across three Features)* | + +## Coverage summary + +- happy-path: 2 (domain logic impact trace, multi-file fan-out) +- regression: 5 (API contract, release sign-off, missing registry entry, infra-only, shared utility) +- negative: 2 (update entity redirect, gap-finder redirect) +- paraphrase: 1 (re-test checklist framing) +- edge-case: 2 (shared utility fan-out, test-only None impact) +- output-format: 1 (method signature change format) +- file-based: 1 (NotificationClient signature change) + +## Rules exercised + +| Rule | Eval ID | +|---|---| +| Map changed file → Feature → US → scenarios | 1, 13 | +| API contract change impact trace | 2 | +| Release sign-off checklist | 3 | +| Flag missing feature_registry coverage | 4 | +| Classify infra change as None impact | 5 | +| Out-of-scope: update entity → living-doc-update | 6 | +| Out-of-scope: find gaps → living-doc-gap-finder | 7 | +| Re-test checklist framing | 8 | +| Shared utility fan-out to all consumers | 9, 14 | +| Method signature change code-level format | 10 | +| File-based method signature analysis | 11 | +| Test-only change → None impact | 12 | diff --git a/skills/living-doc-impact-analysis/evals/trigger-eval.json b/skills/living-doc-impact-analysis/evals/trigger-eval.json new file mode 100644 index 0000000..d26b411 --- /dev/null +++ b/skills/living-doc-impact-analysis/evals/trigger-eval.json @@ -0,0 +1,140 @@ +[ + { + "id": "t01-impact-explicit", + "query": "What living doc entities does PR #217 affect?", + "should_trigger": true, + "reason": "'impact of PR on living doc' is a listed trigger phrase." 
+ }, + { + "id": "t02-trace-affected-us", + "query": "Which User Stories are affected by the PromoService refactor?", + "should_trigger": true, + "reason": "'trace affected user stories' is a listed trigger phrase." + }, + { + "id": "t03-affected-features", + "query": "Which Features are affected by the checkout module changes?", + "should_trigger": true, + "reason": "'affected features' is a listed trigger phrase." + }, + { + "id": "t04-impact-analysis", + "query": "Run a living doc impact analysis for the payment gateway refactor.", + "should_trigger": true, + "reason": "'impact analysis' is a listed trigger phrase." + }, + { + "id": "t05-living-doc-sign-off", + "query": "I need a living doc sign-off before releasing the checkout changes.", + "should_trigger": true, + "reason": "'living doc sign-off' is a listed trigger phrase." + }, + { + "id": "t06-what-does-change-affect", + "query": "What does the change to ShippingCalculator.java affect in the living doc?", + "should_trigger": true, + "reason": "'what does this change affect' is a listed trigger phrase." + }, + { + "id": "t07-which-scenarios-rerun", + "query": "Which scenarios need re-running after we changed the orders endpoint?", + "should_trigger": true, + "reason": "'which scenarios need re-running' is a listed trigger phrase." + }, + { + "id": "t08-pr-impact-docs", + "query": "Does PR #300 have any impact on the living doc?", + "should_trigger": true, + "reason": "'PR impact on docs' is a listed trigger phrase." + }, + { + "id": "t09-not-update-entity", + "query": "Update US-042 to add a new AC for the expired promo path.", + "should_trigger": false, + "reason": "Updating entities is handled by living-doc-update, not living-doc-impact-analysis." + }, + { + "id": "t10-not-gap-finder", + "query": "Which Functionalities don't have any User Stories?", + "should_trigger": false, + "reason": "Finding coverage gaps is handled by living-doc-gap-finder." 
+ }, + { + "id": "t11-not-create", + "query": "Create a new User Story for the stacked discount feature.", + "should_trigger": false, + "reason": "Creating new entities is handled by living-doc-create-user-story." + }, + { + "id": "t12-what-needs-retesting", + "query": "What do we need to re-test after this refactor?", + "should_trigger": true, + "reason": "'which scenarios need re-running' pattern — living doc impact re-test checklist." + }, + { + "id": "t13-shared-utility-impact", + "query": "MoneyUtils was changed — how far does the impact spread in the living doc?", + "should_trigger": true, + "reason": "Shared utility impact fan-out analysis — 'impact of PR on living doc' trigger." + }, + { + "id": "t14-test-only-change", + "query": "PR #501 only changes test files. Does it need a living doc update?", + "should_trigger": true, + "reason": "Still an impact analysis request — expected answer is 'None' impact level; skill should confirm no update needed." + }, + { + "id": "t15-not-gap-finder-orphan", + "query": "Find all Functionalities with no linked User Stories.", + "should_trigger": false, + "reason": "Finding coverage gaps is handled by living-doc-gap-finder." 
+ }, + { + "id": "t16-not-create-func", + "query": "Create a new Functionality for the stacked discount rule", + "should_trigger": false, + "reason": "Creating a Functionality — routes to living-doc-create-functionality" + }, + { + "id": "t17-not-scenario-creator", + "query": "Generate BDD scenarios for US-007", + "should_trigger": false, + "reason": "Generating scenarios — routes to living-doc-scenario-creator" + }, + { + "id": "t18-not-sync", + "query": "Sync the @AC: tags in the feature files with the AC catalog", + "should_trigger": false, + "reason": "Feature file sync — routes to gherkin-living-doc-sync" + }, + { + "id": "t19-not-step", + "query": "Write step definitions for the order placement scenarios", + "should_trigger": false, + "reason": "Writing step definitions — routes to gherkin-step" + }, + { + "id": "t20-not-pageobject", + "query": "Scan the webapp and generate PageObjects for the admin portal", + "should_trigger": false, + "reason": "PageObject scanning — routes to living-doc-pageobject-scan" + }, + { + "id": "t21-not-data-cy", + "query": "Add data-cy attributes to the checkout form inputs", + "should_trigger": false, + "reason": "Instrumenting templates — routes to data-cy-instrument" + }, + { + "id": "t22-not-bdd-maintain", + "query": "Find all unused step definitions in the BDD test suite", + "should_trigger": false, + "reason": "Dead code audit — routes to bdd-maintain" + }, + { + "id": "t23-bootstrap-registry", + "query": "The feature_registry is missing — how do I bootstrap it from the codebase?", + "should_trigger": true, + "reason": "Bootstrapping feature_registry — impact-analysis owns this 4-step procedure" + } +] \ No newline at end of file diff --git a/skills/living-doc-impact-analysis/scripts/.DS_Store b/skills/living-doc-impact-analysis/scripts/.DS_Store new file mode 100644 index 0000000..5008ddf Binary files /dev/null and b/skills/living-doc-impact-analysis/scripts/.DS_Store differ diff --git 
a/skills/living-doc-impact-analysis/scripts/trace_impact.py b/skills/living-doc-impact-analysis/scripts/trace_impact.py new file mode 100644 index 0000000..e8199c5 --- /dev/null +++ b/skills/living-doc-impact-analysis/scripts/trace_impact.py @@ -0,0 +1,366 @@ +#!/usr/bin/env python3 +""" +trace_impact.py — Living Doc Impact Tracer + +Given a list of changed files (or a unified diff) and a catalog JSON that includes a +feature_registry, traces which Features, Functionalities, and User Stories are affected +and at what impact level. The model uses the output to produce the narrative impact map +rather than performing the entity traversal through reasoning. + +Usage: + python trace_impact.py --files src/payments/PromoService.java --catalog catalog.json + python trace_impact.py --diff changes.diff --catalog catalog.json + python trace_impact.py --files src/auth/LoginController.java --catalog catalog.json --summary + python trace_impact.py --files src/payments/PromoService.java --catalog catalog.json \ + --output impact.json + +The catalog JSON must include a "feature_registry" section: + { + "feature_registry": [ + { + "feature_id": "FEAT-001", + "paths": ["src/auth/**", "src/security/login*"] + } + ], + "catalog": { ... }, + "known_test_links": { ... } + } + +Path patterns in feature_registry use Unix shell-style wildcards (fnmatch). 
+""" + +import argparse +import json +import re +import sys +from datetime import datetime, timezone +from fnmatch import fnmatch +from pathlib import Path + + +# ── File classification ──────────────────────────────────────────────────────── + +_DOMAIN_PATTERNS = [ + r".*[Ss]ervice\.(java|py|ts|scala)$", + r".*[Rr]epository\.(java|py|ts|scala)$", + r".*[Dd]omain.*\.(java|py|ts|scala)$", + r".*[Mm]odel\.(java|py|ts|scala)$", + r".*[Uu]se[Cc]ase\.(java|py|ts|scala)$", + r".*[Hh]andler\.(java|py|ts|scala)$", + r".*[Pp]rocessor\.(java|py|ts|scala)$", +] +_API_PATTERNS = [ + r".*[Cc]ontroller\.(java|ts|scala)$", + r".*[Rr]outer?\.(java|py|ts|scala)$", + r".*(openapi|swagger).*\.(yaml|yml|json)$", + r".*[Ee]ndpoint\.(java|py|ts|scala)$", + r".*[Rr]oute\.(java|py|ts|scala)$", + r".*[Rr]esource\.(java|ts|scala)$", +] +_EVENT_PATTERNS = [ + r".*\.(avsc|proto)$", + r".*[Ss]chema\.(json)$", + r".*[Ee]vent.*\.(java|py|ts|scala)$", +] +_UI_PATTERNS = [ + r".*\.(tsx|jsx)$", + r".*[Cc]omponent\.(ts|html)$", + r".*[Pp]age\.(ts|html)$", + r".*[Ff]orm\.(ts|html)$", +] +_CONFIG_PATTERNS = [ + r".*(application|bootstrap)\.(yaml|yml|properties)$", + r".*[Dd]ocker.*", + r".*\.(tf|hcl)$", + r".*(build|Build)\.(gradle|maven|sbt)$", + r".*[Cc]onfig\.(java|py|ts|yaml|yml|json)$", +] +_TEST_PATTERN = re.compile(r"(test|spec|mock|fixture|stub)", re.IGNORECASE) + + +def classify_file(path: str) -> str: + """Return a surface category for the given file path.""" + if _TEST_PATTERN.search(path): + return "test_or_mock" + name = Path(path).name + for pattern in _API_PATTERNS: + if re.match(pattern, name) or re.match(pattern, path): + return "api_contract" + for pattern in _EVENT_PATTERNS: + if re.match(pattern, name) or re.match(pattern, path): + return "event_contract" + for pattern in _UI_PATTERNS: + if re.match(pattern, name) or re.match(pattern, path): + return "ui_component" + for pattern in _CONFIG_PATTERNS: + if re.match(pattern, name) or re.match(pattern, path): + return "configuration" + 
for pattern in _DOMAIN_PATTERNS: + if re.match(pattern, name) or re.match(pattern, path): + return "domain_logic" + return "domain_logic" # safest default for unrecognised source files + + +_CATEGORY_IMPACT = { + "domain_logic": "High", + "api_contract": "High", + "event_contract": "High", + "ui_component": "Medium", + "configuration": "Low", + "test_or_mock": "None", +} + + +def default_impact_level(category: str) -> str: + return _CATEGORY_IMPACT.get(category, "Medium") + + +# ── Catalog helpers ──────────────────────────────────────────────────────────── + +def load_catalog(path: str) -> dict: + with open(path) as f: + return json.load(f) + + +def parse_diff_files(diff_path: str) -> list[str]: + """Extract changed file paths from a unified diff (git diff format).""" + seen: set[str] = set() + changed: list[str] = [] + with open(diff_path) as f: + for line in f: + if line.startswith("+++ ") or line.startswith("--- "): + raw = line[4:].strip() + if raw == "/dev/null": + continue + # Strip git a/ b/ prefixes + clean = re.sub(r"^[ab]/", "", raw) + if clean not in seen: + seen.add(clean) + changed.append(clean) + return changed + + +# ── Core logic ───────────────────────────────────────────────────────────────── + +def match_features( + changed_files: list[str], feature_registry: list[dict] +) -> dict[str, list[str]]: + """Map each changed file to the Feature IDs whose path patterns it matches.""" + result: dict[str, list[str]] = {} + for file_path in changed_files: + matched: list[str] = [] + for entry in feature_registry: + feat_id = entry.get("feature_id", "") + for pattern in entry.get("paths", []): + if fnmatch(file_path, pattern) or fnmatch(Path(file_path).name, pattern): + if feat_id not in matched: + matched.append(feat_id) + break + result[file_path] = matched + return result + + +def trace_entities(feature_ids: list[str], catalog_data: dict) -> dict: + """ + Walk Feature → Functionality → User Story → AC chains for the given Feature IDs. 
+ Returns lists of affected entities and linked test artefacts. + """ + inner = catalog_data.get("catalog", catalog_data) + features = {f["id"]: f for f in inner.get("features", [])} + functionalities = {fn["id"]: fn for fn in inner.get("functionalities", [])} + user_stories = {us["id"]: us for us in inner.get("user_stories", [])} + known_test_links: dict = catalog_data.get("known_test_links", {}) + + result: dict = { + "features": [], + "functionalities": [], + "user_stories": [], + "acs_requiring_review": [], + "scenarios_requiring_rerun": [], + } + + visited_features: set[str] = set() + visited_funcs: set[str] = set() + visited_us: set[str] = set() + + for feat_id in feature_ids: + if feat_id in visited_features: + continue + visited_features.add(feat_id) + feat = features.get(feat_id) + if not feat: + continue + result["features"].append({"id": feat_id, "name": feat.get("name", "")}) + + # Functionalities + for func_id in feat.get("functionalities", []): + if func_id in visited_funcs: + continue + visited_funcs.add(func_id) + fn = functionalities.get(func_id) + label = fn.get("name", func_id) if fn else func_id + result["functionalities"].append({"id": func_id, "name": label}) + + # User Stories + for us_id in feat.get("user_stories", []): + if us_id in visited_us: + continue + visited_us.add(us_id) + us = user_stories.get(us_id) + if us: + result["user_stories"].append( + {"id": us_id, "title": us.get("title", us.get("name", ""))} + ) + + # Collect ACs and linked scenarios from the matched entities + reviewed_acs: set[str] = set() + rerun_scenarios: set[str] = set() + affected_ids = {e["id"] for e in result["user_stories"] + result["functionalities"]} + for ac_id, test_link in known_test_links.items(): + # Match ACs belonging to affected User Stories or Functionalities + owner = ac_id.split("-AC-")[0] if "-AC-" in ac_id else None + if owner not in affected_ids: + # Owner is missing or unaffected: fall back to a delimiter-aware prefix match on US-nnn or FUNC-nnn + owner = next( + (eid for eid in affected_ids if ac_id.startswith(f"{eid}-")), None + )
+ if owner: + if ac_id not in reviewed_acs: + reviewed_acs.add(ac_id) + result["acs_requiring_review"].append(ac_id) + if test_link and isinstance(test_link, str) and test_link not in rerun_scenarios: + rerun_scenarios.add(test_link) + result["scenarios_requiring_rerun"].append(test_link) + + return result + + +def build_impact_map( + changed_files: list[str], + file_to_features: dict[str, list[str]], + catalog: dict, +) -> dict: + unmatched = [f for f, feats in file_to_features.items() if not feats] + matched = {f: feats for f, feats in file_to_features.items() if feats} + + all_feature_ids: list[str] = [] + for feats in matched.values(): + for fid in feats: + if fid not in all_feature_ids: + all_feature_ids.append(fid) + + entities = trace_entities(all_feature_ids, catalog) + + surface_area = [] + for file_path in changed_files: + category = classify_file(file_path) + impact_level = default_impact_level(category) + surface_area.append( + { + "file": file_path, + "category": category, + "impact_level": impact_level, + "matched_features": file_to_features.get(file_path, []), + } + ) + + recommended_actions: list[str] = [] + if entities["acs_requiring_review"]: + recommended_actions.append( + "Review and update affected ACs → invoke living-doc-update" + ) + if entities["scenarios_requiring_rerun"]: + recommended_actions.append( + "Re-run linked Gherkin scenarios → invoke test-e2e-standards" + ) + if any(e["category"] in ("domain_logic", "api_contract") for e in surface_area): + recommended_actions.append( + "Sync drifted Gherkin step text → invoke gherkin-living-doc-sync" + ) + if unmatched: + recommended_actions.append( + f"{len(unmatched)} file(s) have no feature_registry entry — flag as High-impact gap " + "and document missing coverage using living-doc-create-functionality" + ) + + return { + "generated_at": datetime.now(timezone.utc).isoformat(), + "surface_area": surface_area, + "affected_entities": entities, + "unmatched_files": unmatched, 
+ "recommended_actions": recommended_actions, + } + + +# ── Output helpers ───────────────────────────────────────────────────────────── + +def print_summary(impact_map: dict) -> None: + print(f"\n=== Living Doc Impact Map — {impact_map['generated_at']} ===") + print("\nSurface area:") + for entry in impact_map["surface_area"]: + feats = ", ".join(entry["matched_features"]) or "UNMATCHED" + print( + f" [{entry['impact_level']:6s}] {entry['file']}" + f" ({entry['category']}) → {feats}" + ) + ent = impact_map["affected_entities"] + if ent["features"]: + print(f"\nAffected Features: {', '.join(f['id'] for f in ent['features'])}") + if ent["functionalities"]: + print(f"Affected Functionalities: {', '.join(f['id'] for f in ent['functionalities'])}") + if ent["user_stories"]: + print(f"Affected User Stories: {', '.join(u['id'] for u in ent['user_stories'])}") + if ent["acs_requiring_review"]: + print(f"\nACs requiring review ({len(ent['acs_requiring_review'])}):") + for ac in ent["acs_requiring_review"]: + print(f" {ac}") + if ent["scenarios_requiring_rerun"]: + print(f"\nScenarios requiring re-run ({len(ent['scenarios_requiring_rerun'])}):") + for s in ent["scenarios_requiring_rerun"]: + print(f" {s}") + if impact_map["unmatched_files"]: + print("\nFiles NOT in feature registry (documentation gaps):") + for f in impact_map["unmatched_files"]: + print(f" ! {f}") + if impact_map["recommended_actions"]: + print("\nRecommended actions:") + for action in impact_map["recommended_actions"]: + print(f" → {action}") + + +# ── Entry point ──────────────────────────────────────────────────────────────── + +def main() -> None: + parser = argparse.ArgumentParser( + description="Trace living doc impact from a set of changed files." 
+ ) + source = parser.add_mutually_exclusive_group(required=True) + source.add_argument("--files", "-f", nargs="+", help="Explicit list of changed file paths") + source.add_argument("--diff", "-d", help="Path to a unified diff file") + parser.add_argument( + "--catalog", "-c", required=True, + help="Path to catalog JSON (must include 'feature_registry')" + ) + parser.add_argument("--output", "-o", help="Write JSON impact map to this file") + parser.add_argument("--summary", "-s", action="store_true", help="Print human-readable summary") + args = parser.parse_args() + + catalog = load_catalog(args.catalog) + changed_files = args.files if args.files else parse_diff_files(args.diff) + feature_registry: list[dict] = catalog.get("feature_registry", []) + + file_to_features = match_features(changed_files, feature_registry) + impact_map = build_impact_map(changed_files, file_to_features, catalog) + + if args.summary: + print_summary(impact_map) + elif args.output: + with open(args.output, "w") as f: + json.dump(impact_map, f, indent=2) + print(f"Impact map written to {args.output}", file=sys.stderr) + else: + print(json.dumps(impact_map, indent=2)) + + +if __name__ == "__main__": + main() diff --git a/skills/living-doc-pageobject-scan/SKILL.md b/skills/living-doc-pageobject-scan/SKILL.md new file mode 100644 index 0000000..55cdbda --- /dev/null +++ b/skills/living-doc-pageobject-scan/SKILL.md @@ -0,0 +1,383 @@ +--- +name: living-doc-pageobject-scan +description: > + Discover, create, and maintain PageObject classes for webapp exploration. + Covers seed.yaml assembly, MCP Playwright crawl, entity harvesting, PageObject generation, + Functionality stubs, and manifest.json output. + Three scopes: CREATE (first scan), RE-SCAN (full manifest refresh after UI changes), + HEALING (fix selector drift in failing tests only). 
+ Triggers on: "scan this webapp", "generate pageobjects", "crawl the UI", "explore the app", + "discover routes", "seed.yaml", "manifest.json", "first scan", "create page objects", + "pageobject drift", "re-scan", "refresh manifest", "heal pageobjects", "fix failing tests", + "selector drift", "tests are failing", "generate functionality stubs", + "bootstrap pageobjects", "bootstrap page objects". + Does NOT trigger for: adding/fixing Gherkin (use living-doc-scenario-creator); resolving + missing data-cy (use data-cy-instrument); deleting deprecated BDD files (use bdd-maintain). + Pairs with data-cy-instrument, living-doc-create-feature, and living-doc-scenario-creator. +license: Apache-2.0 +compatibility: GitHub Copilot +--- + +# Living Doc — PageObject Scan & Webapp Exploration + +> **Glossary:** Feature, PageObject, Functionality — see [living-doc-glossary](../references/living-doc-glossary.md) ([remote](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-glossary.md)). +> **BDD schemas:** ExplorationFixture, seed.yaml, manifest field_constraints, PageObject file header — see [living-doc-bdd-schemas](../references/living-doc-bdd-schemas.md) ([remote](https://github.com/AbsaOSS/agentic-toolkit/blob/master/skills/references/living-doc-bdd-schemas.md)). + +**Scope:** UI Features only (web pages, modals, screens). API Features use annotated endpoint methods — not PageObjects. + +**Selector preference:** `data-testid` > `aria-label`/role > CSS class. Flag positional selectors (`nth-child`, `first-of-type`) as `FRAGILE`. 
+ +--- + +## Two modes + +| Mode | Input | Use when | +|---|---|---| +| **Create** (initial scan) | App URL or test suite root | No PageObjects exist — bootstrapping or first session on a new app | +| **Maintain — RE-SCAN** | Existing PageObject files + current app | UI refactored, new feature shipped, or significant route changes — full manifest refresh | +| **Maintain — HEALING** | Failing test names / scenario titles | Test suite failures due to selector drift — failing tests only, do not touch passing tests | + +--- + +## Pre-flight: MCP Playwright availability check + +**This skill requires the MCP Playwright server. Perform this check before any other step, in every mode.** + +1. Attempt to call `mcp_microsoft_pla_browser_snapshot` (or any `mcp_microsoft_pla_browser_*` tool) with a no-op argument. +2. If the call **succeeds** — continue to the relevant mode below. +3. If the call **fails or the tool is unavailable** — **stop immediately.** Do not fall back to static sources, route configs, or guided traversal as a substitute. Output exactly: + + > **MCP Playwright server is not available.** + > This skill requires the `@playwright/mcp` (or equivalent) MCP server to be running and connected. + > Please enable it in your VS Code MCP configuration (`.vscode/mcp.json` or user settings) and restart the agent session, then retry. + + Do not attempt any crawl, seed assembly, or DOM interaction until the user confirms the server is available. + +--- + +## Create mode + +### Step 0 — Business Seed assembly + +Before crawling, locate or create `seed.yaml` and `manifest.json`: + +1. Search for `seed.yaml` containing `base_url:`; search for `manifest.json` containing `pageobject_path` entries. +2. If found, load both and resume — all manifest entries are already discovered. +3. If not found, create at the living-doc directory or `.copilot/bdd/`. +4. 
On first discovery, propose adding both paths to `.github/copilot-instructions.md`: + +```markdown +## BDD Artifacts +- **Business Seed:** `/seed.yaml` +- **Exploration Manifest:** `/manifest.json` +``` + +Collect seed content from whichever sources are available: + +| Source | Behaviour | +|---|---| +| **A — Living documentation** | Extract Feature names, US titles, AC texts, and primary routes. | +| **B — Sitemap / route config** | Parse Angular router, React Router, or `sitemap.xml` for URL paths. | +| **C — OpenAPI / Swagger** | Extract endpoint paths; map REST resources to UI screens where obvious. | +| **D — Existing PageObjects** | Load current `manifest.json` — treat known surfaces as already discovered. | +| **E — Guided traversal** | See [Guided Traversal Protocol](#guided-traversal-protocol-source-e) below. | + +**Credential rule:** Never store literals in `seed.yaml`. Always use `env:VAR_NAME`: + +```yaml +base_url: https://... +credentials: + username: env:BDD_USERNAME + password: env:BDD_PASSWORD +known_routes: + - path: /login + feature: Authentication +guided_steps: [] +form_fixtures: {} +``` + +**Partial state rule:** `seed.yaml` present, `manifest.json` absent = first run. Begin crawl from `base_url`; do not assume any surfaces are discovered. + +### Step 1 — Crawl + +Navigate each route in `seed.yaml` via MCP Playwright. Snapshot DOM; identify interactive elements, forms, navigation links, significant UI surfaces. Follow links to find new routes not yet in manifest. + +**Entity harvesting:** when a domain ID, version, feed ID, or other parameterised value is read from the DOM, record it under `known_entities` in `seed.yaml` (fields: `id`, `version`, `name`, `status`, `owner`, `note`). Use before prompting the user for parameterised route values. + +**Parameterised routes:** check `seed.yaml known_entities` for a match owned by the current test user before navigating `/path/{id}/{version}`. Only fall back to user-assist pause if none exists. 
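The parameterised-route lookup described above might look like the following sketch. It assumes `known_entities` has already been loaded from `seed.yaml` as a list of dictionaries, that ownership is a plain string comparison on the `owner` field, and that route templates use named placeholders such as `{id}` and `{version}`. The function name `resolve_route` is hypothetical.

```python
from typing import Optional


def resolve_route(
    template: str, known_entities: list[dict], current_user: str
) -> Optional[str]:
    """Fill a parameterised route from the first entity owned by current_user.

    Returns None when no owned entity can fill the template, which is the
    signal to fall back to a user-assist pause.
    """
    for entity in known_entities:
        if entity.get("owner") != current_user:
            continue  # only reuse entities owned by the current test user
        try:
            return template.format(**entity)
        except KeyError:
            continue  # entity lacks a field the template needs
    return None
```

Returning `None` rather than raising keeps the fallback decision with the caller, which can then pause and ask the user for the missing values.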
+ +**Dismiss rule:** close any modal/overlay (Cancel → × → Escape) before moving to the next route. + +Repeat until coverage plateau — no new surfaces in the last full iteration. + +### Step 2 — Auth handling + +| Auth type | Strategy | +|---|---| +| Cookie/session | Log in once via Playwright `storageState`, reuse across routes. | +| OAuth/OIDC | Inject pre-issued test token via `localStorage` or `Authorization` header. | +| MFA-protected | Use test account with MFA disabled, or TOTP library with known seed. | +| Multi-step wizard | Parse existing step definitions to reconstruct navigation sequence. | + +### Step 3 — Form traversal (deep exploration) + +Resolve field values using the **ExplorationFixture sourcing cascade** (see [living-doc-bdd-schemas](../references/living-doc-bdd-schemas.md#explorationfixture)): + +1. `seed.yaml form_fixtures` pre-declared value for this route + field. +2. Value copied from an existing entity (`copyable`), or suffixed to avoid duplicate rejection (`derived`). +3. Inferred `fake` value from label + placeholder + tooltip. +4. User-assist pause for `real-world` fields — record to `form_fixtures` as `source: user_provided`. + +Skip `condition`-gated fields until the controlling field holds the required value. + +After successful submit, probe each text input: special characters (`<>'"&\`), oversized input (200+ chars), wrong type, duplicate value. Run core scan after each probe to capture `data-cy` error elements visible only in error state. Record in `navigation_context.field_constraints`. + +#### Angular CPS component interactions + +> **Angular-specific.** For React/Vue, adapt component resolution; all other steps apply unchanged. + +| Component | Correct interaction | +|---|---| +| `cps-radio-group` | `browser_click` inner `<li>` by text. | +| `cps-autocomplete` | `browser_type` into inner ``, wait for dropdown, `browser_click` option. | +| `cps-switch` / `cps-checkbox` | `browser_click` the wrapper. | +| `app-text-editor` (rich text) | `browser_click` `contenteditable` child, then `browser_type`. | +| `cps-button` | `browser_click` inner `