PDX-509: validity bridge, tri-state status + quality threshold#211
Conversation
RCA: The local MCP validator runs a two-layer model the Quality Hub API does not have. Critical best-practice violations (e.g. unknown apiId) only moved quality_score and never gated is_valid, so a test that will not load in Provar could still be reported as valid. The quality threshold also defaulted to 80 with no env override and collapsed sub-threshold cases to a bare "invalid". Fix: Bridge critical best-practice violations into Layer-1 issues so they gate is_valid (deduped against hand-coded rules; score parity preserved). Add a shared resolveQualityThreshold (arg -> PROVAR_MCP_QUALITY_THRESHOLD -> 90) and a tri-state status (invalid / needs_improvement / valid) with meets_quality_threshold across the testcase and hierarchy validators.
…dit severities
RCA: With the validity bridge live, critical best-practice violations now gate
is_valid. This surfaced two latent, previously score-only defects: (1) the
generator emitted valueClass="date"/"datetime" with raw ISO strings, but the
real Provar corpus (651 cases) stores date/dateTime exclusively as epoch
milliseconds with camelCase dateTime — an ISO string fails to load (RENDER-DATE-
VALUECLASS-001), and the casing rule wrongly lowercase-enforced dateTime; (2)
the varStringLiteral back-end exemption for UI-target args is wrong (a bare
{Var} there lands in the URL as %7B...%7D and hard-fails).
Fix: Suppress Layer-1-owned concepts from the bridge so the authoritative schema
rule reports them once (empty <steps> still loads). Convert ISO date/datetime to
epoch ms (UTC) with canonical dateTime casing in the generator and accept that
casing in RENDER-CASE-001. Drop the varStringLiteral UI-target exemption. Add
PDX-509 bridge/threshold/tri-state tests and a resolveQualityThreshold suite;
update docs/mcp.md. Severity audit: all implemented criticals are load-blocking
except VAR-NAMING-001 (its own text says "RUNTIME FAILURES" = major) — flagged
for review, not changed, to preserve QH score parity.
Quality Orchestrator🟢 LOW · 🧪 Tests to Run · Running 9 of 55 tests
▶ Run commandnpx vitest run \
unit/mcp/bestPracticesEngine.test.ts \
unit/mcp/hierarchyValidate.test.ts \
unit/mcp/projectValidateFromPath.test.ts \
unit/mcp/testCaseGenerate.test.ts \
unit/mcp/testCaseValidate.test.ts \
unit/mcp/testPlanValidate.test.ts \
unit/mcp/testSuiteValidate.test.ts \
unit/mcp/qualityThreshold.test.ts \
unit/services/projectValidation.test.ts
|
There was a problem hiding this comment.
Pull request overview
This PR re-aligns local Provar MCP validation with Provar’s load-validity model by bridging load-blocking (critical) best-practice violations into issues[] as ERRORs (so they gate is_valid), and by introducing a tri-state per-test verdict (invalid / needs_improvement / valid) driven by an explicit quality threshold that now defaults to 90 (with env override support).
Changes:
- Bridge unsuppressed critical best-practice violations into
issues[]to gateis_valid, while preserving Quality Hub score parity. - Introduce tri-state
status,meets_quality_threshold, and effectivequality_thresholdacross validation tools; add sharedresolveQualityThreshold(arg → env → 90). - Fix generator emission for
valueClass="date"/"dateTime"by converting ISO inputs to epoch milliseconds; update valueClass casing handling and VAR string-literal rule behavior/tests/docs.
Reviewed changes
Copilot reviewed 19 out of 19 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| test/unit/mcp/testSuiteValidate.test.ts | Updates fixtures and adds suite-level tri-state/threshold behavior assertions. |
| test/unit/mcp/testPlanValidate.test.ts | Updates apiId fixtures to fully-qualified forms. |
| test/unit/mcp/testCaseValidate.test.ts | Adds validity-bridge and tri-state/threshold handler tests. |
| test/unit/mcp/testCaseGenerate.test.ts | Updates generator expectations for epoch-ms date/dateTime emission and escaping coverage. |
| test/unit/mcp/qualityThreshold.test.ts | Adds dedicated tests for threshold precedence and bounds. |
| test/unit/mcp/bestPracticesEngine.test.ts | Updates VAR-STRING-LITERAL behavior tests and clarifies severity expectations. |
| src/services/projectValidation.ts | Uses shared threshold resolution instead of hardcoded default. |
| src/mcp/utils/qualityThreshold.ts | Introduces shared threshold resolution logic and default=90. |
| src/mcp/tools/testSuiteValidate.ts | Applies shared threshold resolution and updates schema description text. |
| src/mcp/tools/testPlanValidate.ts | Applies shared threshold resolution and updates schema description text. |
| src/mcp/tools/testCaseValidate.ts | Adds tri-state verdict fields; implements critical-BP bridging and baseline diff de-dupe logic. |
| src/mcp/tools/testCaseGenerate.ts | Converts ISO date/datetime inputs to epoch ms and emits canonical dateTime casing. |
| src/mcp/tools/projectValidateFromPath.ts | Removes hardcoded default and documents new threshold precedence/default. |
| src/mcp/tools/hierarchyValidate.ts | Adds needs_improvement status for test cases and updates summary counts. |
| src/mcp/tools/bestPracticesEngine.ts | Accepts canonical dateTime casing and removes UI-target arg exemption for VAR literals. |
| src/mcp/rules/provar_best_practices_rules.json | Updates VAR-STRING-LITERAL rule description to reflect runtime failure behavior. |
| src/commands/provar/automation/project/validate.ts | Updates CLI default threshold to 90 and minor formatting adjustments. |
| messages/sf.provar.automation.project.validate.md | Updates CLI flag help text to reflect default threshold 90. |
| docs/mcp.md | Updates MCP docs for new fields, threshold precedence, and bridge semantics. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if (inferred === 'date' || inferred === 'datetime') { | ||
| // PDX-509: Provar stores date/dateTime as epoch MILLISECONDS, never an ISO string — | ||
| // an ISO string fails to load in the IDE (RENDER-DATE-VALUECLASS-001), and the real | ||
| // corpus uses epoch ms exclusively with camelCase `dateTime`. Convert here. | ||
| const ms = isoToEpochMs(val, inferred); | ||
| const valueClass = inferred === 'datetime' ? 'dateTime' : 'date'; | ||
| const emitted = ms !== null ? String(ms) : escapeXmlContent(val); | ||
| return `${indent}<value class="value" valueClass="${valueClass}">${emitted}</value>`; | ||
| } |
| // A critical BP violation bridged into issues[] shares its rule_id with the | ||
| // surfaced issue (the issue/BP rule_id namespaces are disjoint), so drop it | ||
| // from the BP list here to avoid double-counting it in the baseline diff. |
…ment RCA: Copilot flagged two issues. (1) When a value matches the ISO date/datetime shape but is not a real date (e.g. "2026-99-99"), isoToEpochMs returns null and the generator still emitted valueClass="date"/"dateTime" with the raw text — a load-breaking value. (2) The diff-dedup comment was internally inconsistent: it said the issue/BP rule_id namespaces are disjoint yet that a bridged critical shares its rule_id with the surfaced issue. Fix: Fall back to valueClass="string" when a date/datetime value cannot be converted to epoch ms, with a regression test. Reword the diff-dedup comment to state plainly that hand-coded issue rule_ids never collide with BP rule_ids, so a BP rule_id in issues[] can only be a bridged critical.
RCA: VAR-NAMING-001 was tagged critical/weight 8, but a variable name with a space is a RUNTIME failure (the test still loads and fails at execution), which the severity taxonomy classifies as major, not load-blocking critical. As a critical it was being surfaced by the PDX-509 bridge and wrongly flipping is_valid for a runtime defect. The Quality Hub backend is making the same change (major / weight 5) so score parity is preserved. Fix: Set VAR-NAMING-001 severity=major, weight=5 in the local rules JSON, and note in the description that it is a runtime defect, not a load-blocking one. It is no longer bridged, so a bad variable name no longer flips is_valid (it docks quality_score and surfaces as a major best-practice violation).
PDX-509 — Validity-surfacing bridge + quality threshold
Re-aligns the local MCP validator with Provar's test-case-validity model. Previously the validator ran a two-layer split the Quality Hub API does not have:
criticalbest-practice violations only movedquality_scoreand never gatedis_valid, so a test that won't load in Provar (e.g. an unknownapiId) could still come backis_valid: true.What changed
criticalbest-practice violation is surfaced intoissues[]as anERRORand now gatesis_valid.quality_scoreis computed from the untouched BP list, so QH Lambda score parity is preserved (a bridged critical appears in bothissues[]andbest_practices_violations[]).TC_*) are authoritative for the concepts they own (root, identifier, steps presence,testItemId, comparisonType); the matching BP criticals are never double-reported. An empty-but-present<steps>still loads.status(invalid/needs_improvement/valid) +meets_quality_threshold+ effectivequality_threshold, onprovar_testcase_validateand the hierarchy tools (testsuite/testplan/project).PROVAR_MCP_QUALITY_THRESHOLDenv var, sharedresolveQualityThreshold(precedence: per-call arg → env → 90).sfUiTargetObjectId/sfUiTargetResultNameexemption (a bare{Var}there lands in the URL as%7B…%7Dand hard-fails). Staysmajor.valueClass="date"/"datetime"with raw ISO strings. The real Provar corpus (AllPOCProjects, 651 cases) stores date/dateTime exclusively as epoch milliseconds with camelCasedateTime, andRENDER-DATE-VALUECLASS-001confirms an ISO string fails to load. The generator now converts ISO → epoch ms (bare date = UTC midnight, tz-less datetime = UTC) with canonical casing, andRENDER-CASE-001(validateValueClassCasing) was corrected to accept camelCasedateTime.Severity audit
All 29 implemented
criticalrules are genuinely load-blocking exceptVAR-NAMING-001, whose own description says "RUNTIME FAILURES" (taxonomy →major). Flagged, not changed, to preserve QH score parity — needs a product decision.Quality Hub backend follow-ups (to coordinate separately)
VarStringLiteral: drop thesfUiTargetObjectId/sfUiTargetResultNameexemption.ValueClassCasing: treatdateTimeas camelCase-canonical (don't flagvalueClass="dateTime").VAR-NAMING-001: confirm severitycritical→major.Docs
docs/mcp.mdupdated (new input/output fields, the bridge note, threshold-90,PROVAR_MCP_QUALITY_THRESHOLD). Customer-facing docs (provar-mcp-public-docs.md, university course) reference the validate contract — they need the newstatus/threshold-90 added by the Provar team (not touched here).Tests
Adds PDX-509 bridge / tri-state / threshold tests + a
resolveQualityThresholdsuite, and updates fixtures to use fully-qualified apiIds (matching real Provar files). 1427 passing, smoke 60/60, lint + compile clean.🤖 Generated with Claude Code