From cd83c6a9469472db5214ee7f2712114ff26bd550 Mon Sep 17 00:00:00 2001
From: Michael Dailey <michael.dailey@provartesting.com>
Date: Fri, 5 Jun 2026 10:11:26 -0500
Subject: [PATCH] PDX-0: chore(release): bump version to 1.6.0 and refresh
 release docs

Req: cut the 1.6.0 release bundling PDX-501..511 (validator overhaul: structural + backend-only rules ported local, validity bridge, tri-state status, quality threshold 80->90, project-aware test_case_id, two new MCP resources) and sweep the user-facing docs so the release ships with accurate documentation.

Fix: bump package.json + server.json 1.5.4 -> 1.6.0; correct docs/mcp.md tool count 38 -> 42 and document the provar://docs/tool-guide resource; drop two internal ticket refs from mcp.md prose; expand mcp-pilot-guide Scenario 2 with status / quality_threshold / validity-bridge; add the missing provar_org_describe tool to the mcp-start message; refresh README counts and replace the stale dotted-name tool list with a pointer to docs/mcp.md; add the v1.6.0 CHANGELOG entry. Gates: compile clean, 1479 unit tests, smoke 60/60, lint clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 CHANGELOG.md                    | 92 +++++++++++++++++++++++++++++++++
 README.md                       | 42 ++-------------
 docs/mcp-pilot-guide.md         |  3 ++
 docs/mcp.md                     | 18 +++++--
 messages/sf.provar.mcp.start.md |  1 +
 package.json                    |  2 +-
 server.json                     |  4 +-
 7 files changed, 119 insertions(+), 43 deletions(-)
 create mode 100644 CHANGELOG.md
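
Reviewer note (illustrative only, not part of the change): the v1.6.0 CHANGELOG entry in this patch describes a threshold precedence (per-call arg → env → 90) and a tri-state `status`. The TypeScript sketch below is one way to read those two rules. The function names `resolveQualityThreshold` and `deriveStatus` are hypothetical and not taken from the plugin source, and falling back to 90 when the env value is not numeric is an assumption rather than documented behaviour.

```typescript
// Hypothetical sketch of the rules described in the v1.6.0 notes.
// None of these names come from the plugin's actual code.
type Status = "valid" | "needs_improvement" | "invalid";

const DEFAULT_QUALITY_THRESHOLD = 90;

// Precedence: per-call argument, then the PROVAR_MCP_QUALITY_THRESHOLD
// value (passed in as a string), then the default of 90.
function resolveQualityThreshold(perCall?: number, envValue?: string): number {
  if (perCall !== undefined) return perCall;
  const parsed =
    envValue === undefined || envValue.trim() === "" ? NaN : Number(envValue);
  // Assumption: a non-numeric env value is ignored and the default applies.
  return Number.isFinite(parsed) ? parsed : DEFAULT_QUALITY_THRESHOLD;
}

// invalid: a critical issue gates validity.
// needs_improvement: structurally valid, but the score is below the threshold.
// valid: structurally valid and the score meets the threshold.
function deriveStatus(
  isValid: boolean,
  qualityScore: number,
  threshold: number
): Status {
  if (!isValid) return "invalid";
  return qualityScore >= threshold ? "valid" : "needs_improvement";
}
```

Under this reading, a test case scoring 85 with no critical issues is `needs_improvement` at the default threshold and `valid` once the caller passes `quality_threshold: 80`.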

diff --git a/CHANGELOG.md b/CHANGELOG.md
new file mode 100644
index 00000000..58c7ff5e
--- /dev/null
+++ b/CHANGELOG.md
@@ -0,0 +1,92 @@
+# Changelog
+
+## v1.6.0
+
+A major step-up for the local validator: it now mirrors what actually loads and runs in Provar, surfaces severity through validity, and ships a higher default quality bar — plus two new MCP resources that expose the validator's contract to AI clients.
+
+### Highlights
+
+- **The local validator now mirrors Provar's own load/runtime behaviour.** Dozens of structural and best-practice checks that previously lived only in the Quality Hub backend now run locally, so `provar_testcase_validate` catches load-blocking and runtime defects with no API key required.
+- **Validity now reflects severity.** A `critical` best-practice violation (e.g. a hallucinated `apiId`, a non-integer `testItemId`) now gates `is_valid` instead of quietly passing — an AI agent can trust `is_valid: true` again.
+- **A single, tri-state verdict.** Every validate tool returns `status: valid | needs_improvement | invalid`, alongside the effective `quality_threshold` and `meets_quality_threshold`, so agents have one unambiguous signal to gate on.
+- **A higher default quality bar.** The default quality threshold is raised **80 → 90**, tunable per call (`quality_threshold`) or globally (`PROVAR_MCP_QUALITY_THRESHOLD`).
+- **Two new MCP resources** expose the validator's contract: the structured Provar test-step schema and a canonical Validation Rule Registry.
+
+### New
+
+- **MCP resource `provar://schema/test-step`** — the structured JSON contract for the Provar test-case XML (root, generic `apiCall`, every step type with required/optional args and value classes).
+- **MCP resource `provar://docs/validation-rules`** — the canonical registry of every validation rule across both layers (id, severity, weight, what it checks, and whether it gates `is_valid`).
+- **`status`, `quality_threshold`, `meets_quality_threshold`** output fields on `provar_testcase_validate` and the suite/plan/project validate tools.
+- **`quality_threshold` input** plus the **`PROVAR_MCP_QUALITY_THRESHOLD`** env var (precedence: per-call arg → env → 90).
+- **Context-aware `comparisonType` validation** — the valid comparison set is scoped by step type (AssertValues, UI Assert, …) instead of one flat list.
+- **Project-aware `test_case_id` allocation** — generated test cases take the next id in the surrounding Provar project rather than a hard-coded `id="1"`; the chosen id is surfaced on the response.
+- **`PROVAR_PLUGIN_NOT_FOUND`** error code when the Provar Automation plugin is missing.
+
+### Changed
+
+- **Validity bridge:** critical best-practice violations are surfaced as `is_valid`-gating issues, deduplicated against the Layer-1 rule that already owns the same concept.
+- Severity alignment with Quality Hub: `UI-BINDING-ORDER-001` and `VAR-NAMING-001` reclassified critical → major (they hard-fail at runtime but do not block loading).
+- Test-case generator fidelity: `UiDoAction` is serialised as `uiInteraction`, and `UiAssert` field assertions are nested for correct Provar IDE rendering.
+
+### Fixed
+
+- Best-practices engine no longer crashes on numeric tag values.
+- `RENDER-CASE-001` scoped to the six real `valueClass` values, removing false positives.
+- `TC_010` accepts any integer test-case id and treats id as optional (the `guid` is the real identifier).
+- Windows: the `sf` executable and its arguments are quoted so project paths containing spaces work.
+
+### Upgrade notes
+
+- **Mostly non-breaking.** New inputs, env vars, and output fields are additive.
+- **Behaviour change to note:** with the validity bridge and the higher default threshold (90), a test case that previously returned `is_valid: true` may now report `status: "needs_improvement"` (score below 90) or `"invalid"` (a critical violation now gates validity). Set `quality_threshold` per call or `PROVAR_MCP_QUALITY_THRESHOLD` globally to restore the previous 80 bar if needed.
+
+## v1.5.1
+
+### Highlights
+
+- **Smaller, faster MCP handshake** — opt into compact tool schemas and load only the tool groups you need. **~36% fewer handshake tokens with compact mode alone, up to ~57% when combined with group filtering.**
+- **Smarter validation loops** — agents get tunable response detail, run-over-run diffs, and a single completeness signal that's safe to gate on.
+- **Single-call test authoring** — test-case generation is now a true one-shot construction, with a runtime guard so agents stop iterating in the wrong direction.
+- **Reliable connection + environment resolution** in `.testproject` files.
+
+### Tool-catalog footprint
+
+Tokens sent to the LLM on `tools/list` (≈4 chars/token):
+
+| Configuration                              | Tools | ~Tokens | Savings vs default |
+| ------------------------------------------ | ----: | ------: | -----------------: |
+| Standard (all groups, full descriptions)   |    41 |  18,355 |                  — |
+| Compact (all groups, compact descriptions) |    41 |  11,758 |           **−36%** |
+| Authoring profile (compact + 4 groups)     |    21 |   7,906 |           **−57%** |
+
+Per-tool savings are largest where they matter most — `testcase_generate` alone drops from ~2,070 tokens of description to a fraction of that in compact mode.
+
+### New
+
+- Compact schema mode and tool-group filtering for trimmed startup payloads.
+- `detail`, `baseline_run_id`, `run_id`, and `completeness_score` on validation tools.
+- `fields` parameter on inspect / list tools to scope responses to only what's needed.
+- Depth guard and token-attribution middleware across all tools.
+- Construct-vs-amend contract carried into test-case tool titles, descriptions, and a runtime check on empty steps.
+
+### Guidance & prompt improvements
+
+- Test-case authoring rewritten as a **single-call construction** contract — agents now produce a complete test case in one call instead of looping through construct → amend → re-amend cycles. End-to-end authoring of a multi-step Salesforce flow drops from typically **3–5 tool calls to 1**.
+- Construct-vs-amend semantics surfaced at three layers (tool title, description, runtime guard) so agents that skim only the title still get the contract.
+- Validation tools now return a single `completeness_score` (0–100) so agents have one number to gate on, instead of inferring stop/continue from violation arrays.
+- Compact tool descriptions are tuned to keep the _contract_ (when to call, prerequisites, common failure modes) while dropping prose — the signal agents actually use stays intact.
+
+### Fixed
+
+- Validation stop decisions now account for all violation levels (plan metadata, suite, best-practices) instead of stopping while issues remain.
+- Read-only validation diffs work without writing new results.
+- Validation baselines are now scoped to their original project context, so a baseline from one project can't silently diff against another.
+- Unknown tool-group names now warn instead of silently disabling everything.
+- Release builds now reliably fetch the latest NitroX schemas instead of falling back to a bundled copy.
+- Connection + environment resolution in `.testproject` files.
+- Various agent-loop and review-pass hardening for the test-case authoring path.
+
+### Upgrade notes
+
+- **Non-breaking.** All new parameters and env vars are opt-in.
+- Existing callers see no behaviour change.
diff --git a/README.md b/README.md
index d6012540..43469cb9 100644
--- a/README.md
+++ b/README.md
@@ -100,7 +100,7 @@ sf provar auth login
 claude mcp add provar -s user -- sf provar mcp start --allowed-paths /path/to/your/provar/project
 ```
 
-📖 **[docs/mcp.md](https://github.com/ProvarTesting/provardx-cli/blob/main/docs/mcp.md) — full setup, all 35+ tools, 11 MCP prompts, troubleshooting.**
+📖 **[docs/mcp.md](https://github.com/ProvarTesting/provardx-cli/blob/main/docs/mcp.md) — full setup, all 42 tools, 6 resources, 11 MCP prompts, troubleshooting.**
 
 ---
 
@@ -251,42 +251,10 @@ DESCRIPTION
   Note: --json is not available on this command — stdout is reserved for MCP traffic.
 
 TOOLS EXPOSED
-  provar.project.inspect               — inspect project folder inventory
-  provar.pageobject.generate           — generate Java Page Object skeleton
-  provar.pageobject.validate           — validate Page Object quality (30+ rules)
-  provar.testcase.generate             — generate XML test case skeleton
-  provar.testcase.validate             — validate test case XML (validity + best-practices scores)
-  provar.testsuite.validate            — validate test suite hierarchy
-  provar.testplan.validate             — validate test plan with metadata completeness checks
-  provar.project.validate              — validate full project: cross-cutting rules, connections, environments
-  provar.properties.generate           — generate provardx-properties.json from the standard template
-  provar.properties.read               — read and parse a provardx-properties.json file
-  provar.properties.set                — update fields in a provardx-properties.json file
-  provar.properties.validate           — validate a provardx-properties.json file against the schema
-  provar.ant.generate                  — generate an ANT build.xml for CI/CD pipeline execution
-  provar.ant.validate                  — validate an ANT build.xml for structural correctness
-  provar.qualityhub.connect            — connect to a Quality Hub org
-  provar.qualityhub.display            — display connected Quality Hub org info
-  provar.qualityhub.testrun            — trigger a Quality Hub test run
-  provar.qualityhub.testrun.report     — poll test run status
-  provar.qualityhub.testrun.abort      — abort an in-progress test run
-  provar.qualityhub.testcase.retrieve  — retrieve test cases by user story / component
-  provar.automation.setup              — detect or download/install Provar Automation binaries
-  provar.automation.testrun            — trigger a Provar Automation test run (LOCAL)
-  provar.automation.compile            — compile Page Objects after changes
-  provar.automation.config.load        — register a provardx-properties.json as the active config (required before compile/testrun)
-  provar.automation.metadata.download  — download Salesforce metadata into the project
-  provar.qualityhub.defect.create      — create Quality Hub defects from failed test executions
-  provar.testrun.report.locate         — resolve artifact paths (JUnit.xml, HTML reports) for a completed test run
-  provar.testrun.rca                   — analyse a completed test run: classify failures, extract page objects, detect pre-existing issues
-  provar.testplan.add-instance         — wire a test case into a plan suite by writing a .testinstance file
-  provar.testplan.create-suite         — create a new test suite directory with .planitem inside a plan
-  provar.testplan.remove-instance      — remove a .testinstance file from a plan suite
-  provar.nitrox.discover               — discover projects containing NitroX (Hybrid Model) page objects
-  provar.nitrox.read                   — read NitroX .po.json files and return parsed content
-  provar.nitrox.validate               — validate a NitroX .po.json against schema rules
-  provar.nitrox.generate               — generate a new NitroX .po.json from a component description
-  provar.nitrox.patch                  — apply a JSON merge-patch to an existing NitroX .po.json file
+  42 tools across: project inspection & org describe, Page Object and test-case
+  authoring/validation, test-suite/plan validation, properties files, Quality Hub
+  (test runs, defects, corpus examples), Provar Automation, ANT build, and NitroX
+  components. See docs/mcp.md for the full catalogue with schemas and examples.
 
 EXAMPLES
   Start MCP server (accepts stdio connections from Claude Desktop / Cursor):
diff --git a/docs/mcp-pilot-guide.md b/docs/mcp-pilot-guide.md
index f2ce6b6f..06f5e6bc 100644
--- a/docs/mcp-pilot-guide.md
+++ b/docs/mcp-pilot-guide.md
@@ -238,6 +238,9 @@ Prompt your AI assistant:
 **What to look for:**
 
 - `validity_score` and `quality_score` both returned (0–100)
+- A tri-state `status` — `valid`, `needs_improvement` (structurally valid but `quality_score` below the threshold), or `invalid` (a critical issue gates validity)
+- `quality_threshold` (the effective threshold, default **90**) and `meets_quality_threshold` returned alongside the score
+- Critical best-practice violations surface as issues that set `is_valid: false` (the validity bridge), e.g. a hallucinated `apiId` or a non-integer `testItemId`
 - Specific rule violations called out (e.g. TC_010 missing test case ID, TC_001 missing XML declaration)
 - Best-practices suggestions (e.g. hardcoded credentials, missing step descriptions)
 - `validation_source: "local"` if no API key is configured, `"quality_hub"` if authenticated
diff --git a/docs/mcp.md b/docs/mcp.md
index e3df2a0d..826898ad 100644
--- a/docs/mcp.md
+++ b/docs/mcp.md
@@ -77,6 +77,7 @@ The Provar DX CLI ships with a built-in **Model Context Protocol (MCP) server**
   - [provar://docs/step-reference](#provardocsstep-reference)
   - [provar://schema/test-step](#provarschematest-step)
   - [provar://docs/validation-rules](#provardocsvalidation-rules)
+  - [provar://docs/tool-guide](#provardocstool-guide)
   - [provar://nitrox/component-catalog](#provarnitroxcomponent-catalog)
   - [provar://nitrox/catalog-source](#provarnitroxcatalog-source)
 - [AI loop pattern](#ai-loop-pattern)
@@ -564,7 +565,7 @@ Paste the [standard config](#the-standard-config-recommended) into either file u
 }
 ```
 
-> **Tool limit:** Agentforce Vibes loads approximately 20 tools per MCP server at runtime. The Provar MCP server exposes 38 tools — you may need to restart or re-enable the server between tasks if the active tool list gets out of date. Salesforce is tracking this limit; consult the [Agentforce Vibes MCP documentation](https://developer.salesforce.com/docs/platform/einstein-for-devs/guide/devagent-mcp.html) for the latest guidance.
+> **Tool limit:** Agentforce Vibes loads approximately 20 tools per MCP server at runtime. The Provar MCP server exposes 42 tools — you may need to restart or re-enable the server between tasks if the active tool list gets out of date. Salesforce is tracking this limit; consult the [Agentforce Vibes MCP documentation](https://developer.salesforce.com/docs/platform/einstein-for-devs/guide/devagent-mcp.html) for the latest guidance.
 
 </details>
 
@@ -957,7 +958,7 @@ AssertValues uses **flat** argument structure (`expectedValue`, `actualValue`, `
 | Mode             | Behaviour                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
 | ---------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
 | `auto` (default) | When a `UiWithScreen` is followed by UI action siblings (any of `UiDoAction`, `UiAssert`, `UiRead`, `UiFill`, `UiNavigate`, `UiWithRow`, `UiHandleAlert`), those siblings are absorbed into the screen's `<clause name="substeps">` block. The grouping run stops at the next `UiWithScreen`, any non-UI step (`SetValues`, `ApexConnect`, …), or end of list. `UiWithRow` plays a dual role: when it follows a `UiWithScreen` it is pulled in as a child container and absorbs its own following UI actions. When the payload contains screen containers but no `UiWithScreen` at root (e.g. starts with `UiWithRow`), the generator synthesizes a root `UiWithScreen` wrapper (`target` = `target_uri` or `sf:ui:target`) so the output still satisfies `UI-NEST-STRUCT-001` — without that wrapper, the root `UiWithRow` itself would fail validation. `testItemId`s are assigned depth-first: parent screen, then its substeps slot, then its children. Numbering remains sequential and gap-free. |
-| `flat`           | Legacy behaviour: every step is emitted as a root sibling, no `<clauses>` block is generated. Use this for payloads that are already structured correctly by the caller, or when debugging the pre-PDX-495 shape.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |
+| `flat`           | Legacy behaviour: every step is emitted as a root sibling, no `<clauses>` block is generated. Use this for payloads that are already structured correctly by the caller, or when debugging the legacy flat shape.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |
 | `single-screen`  | Wraps every step in one synthetic `UiWithScreen` whose `target` is `sf:ui:target` (or the URI passed via `target_uri`). Matches the existing `ui:pageobject:target` semantics. Use for tests that all live on a single screen.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |
 
 If `target_uri` is `ui:pageobject:target?pageId=…` the single-screen wrap takes precedence regardless of `grouping_mode` — this is the pre-existing non-SF nesting behaviour.
@@ -2006,7 +2007,7 @@ This produces silent-pass behaviour that is hard to spot from a log: the run exi
 The plan-mode resolver consults the properties file registered by [`provar_automation_config_load`](#provar_automation_config_load) (`PROVARDX_PROPERTIES_FILE_PATH` in `~/.sf/config.json`), reads `projectPath`, then:
 
 1. Walks `<projectPath>/plans/**/*.testinstance` for any `testCasePath="..."` referencing the test under validation. If found → `plan` mode → DATA-001 suppressed.
-2. Otherwise checks `testCase` / `testCases` for a direct reference. If found → `direct` mode → DATA-001 with the PDX-489 advisory.
+2. Otherwise checks `testCase` / `testCases` for a direct reference. If found → `direct` mode → DATA-001 with the direct-mode advisory.
 3. Falls back to `unknown` mode when no project context is resolvable — DATA-001 still fires (structural fallback) so authors editing a test case in isolation are still warned.
 
 **Recommended workaround**
@@ -2577,6 +2578,17 @@ The resource content is `docs/VALIDATION_RULE_REGISTRY.md`, generated from the r
 
 ---
 
+### `provar://docs/tool-guide`
+
+Tool-selection guide for the Provar MCP server, organised by what you want to accomplish (run tests, author tests, debug failures, manage config, …) rather than by tool name. Read this to choose the right tool and understand correct sequencing — e.g. which prerequisite tool must run before another — before making calls.
+
+**URI:** `provar://docs/tool-guide`  
+**MIME type:** `text/markdown`
+
+The resource content is the same as `docs/PROVAR_TOOL_GUIDE.md` in this repository, compiled into the package at build time. If the file is missing, the resource returns a short placeholder telling the client to reinstall or upgrade the plugin.
+
+---
+
 ### `provar://nitrox/component-catalog`
 
 Catalog of all shipped NitroX (Hybrid Model) base component packages. Lists every package with its components, types, tagNames, interactions, and attributes. Read this before calling `provar_nitrox_generate` to understand available component patterns and naming conventions.
diff --git a/messages/sf.provar.mcp.start.md b/messages/sf.provar.mcp.start.md
index 4cf9c79d..8aae7c53 100644
--- a/messages/sf.provar.mcp.start.md
+++ b/messages/sf.provar.mcp.start.md
@@ -16,6 +16,7 @@ Project & inspection:
 - provar_project_inspect — inspect project folder inventory
 - provar_project_validate — validate full project from disk: coverage, quality scores
 - provar_connection_list — list connections and named environments from the project
+- provar_org_describe — describe Salesforce objects from the Provar workspace .metadata cache
 
 Page Object:
 
diff --git a/package.json b/package.json
index 6c555865..aa4b5cf4 100644
--- a/package.json
+++ b/package.json
@@ -1,7 +1,7 @@
 {
   "name": "@provartesting/provardx-cli",
   "description": "A plugin for the Salesforce CLI to orchestrate testing activities and report quality metrics to Provar Quality Hub",
-  "version": "1.5.4",
+  "version": "1.6.0",
   "mcpName": "io.github.ProvarTesting/provar",
   "license": "BSD-3-Clause",
   "plugins": [
diff --git a/server.json b/server.json
index 0ebe3b05..88e771f5 100644
--- a/server.json
+++ b/server.json
@@ -14,12 +14,12 @@
     "url": "https://github.com/ProvarTesting/provardx-cli",
     "source": "github"
   },
-  "version": "1.5.4",
+  "version": "1.6.0",
   "packages": [
     {
       "registryType": "npm",
       "identifier": "@provartesting/provardx-cli",
-      "version": "1.5.4",
+      "version": "1.6.0",
       "transport": {
         "type": "stdio"
       },