Skip to content

feat(fingerprint): Add handwritten v1 hashing algorithm [2/3]#247

Draft
dmcilvaney wants to merge 7 commits into
microsoft:mainfrom
dmcilvaney:damcilva/schema_version_parts/projection-golden-vectors
Draft

feat(fingerprint): Add handwritten v1 hashing algorithm [2/3]#247
dmcilvaney wants to merge 7 commits into
microsoft:mainfrom
dmcilvaney:damcilva/schema_version_parts/projection-golden-vectors

Conversation

@dmcilvaney

Copy link
Copy Markdown
Contributor

No description provided.

Copilot AI review requested due to automatic review settings June 20, 2026 02:16
@dmcilvaney dmcilvaney changed the title feat(fingerprint): Add handwritten v1 hashing algorithm feat(fingerprint): Add handwritten v1 hashing algorithm [2/3] Jun 20, 2026

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR advances the “schema-version-parts” workstream by introducing the v1 projection-based fingerprinting substrate (encoder + tag parser + v1 projector + golden vectors) alongside the existing hashstructure path, and adds accompanying documentation/plans/reports for the cutover phases.

Changes:

  • Add internal/fingerprint/ projection primitives (canonical encoder, version-set tag parser, projection combiner) plus a hand-written projectV1 and golden-vector freeze tests.
  • Add fingerprint:"v1..*"/"-" tags across fingerprinted internal/projectconfig/ structs and tighten TestAllFingerprintedFieldsHaveDecision to require explicit decisions validated via fingerprint.ValidateFieldTag.
  • Add schema-migration docs plus workstream plan/report artifacts for phases 1–3.

Reviewed changes

Copilot reviewed 32 out of 33 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
report/schema-version-parts/report-phase2.md Phase 2 completion report for the v1 projector + golden vectors work.
report/schema-version-parts/report-phase1.md Phase 1 completion report for encoder/tag-parser/combiner primitives.
report/schema-version-parts/.gitkeep Documents expected phase report outputs in this directory.
plan/schema-version-parts/phase3-reset-cutover.md Planned Phase 3 cutover steps (switch-over + token format).
plan/schema-version-parts/phase2-projection-golden-vectors.md Phase 2 implementation plan and acceptance criteria.
plan/schema-version-parts/phase1-encoder-tag-parser.md Phase 1 implementation plan and acceptance criteria.
plan/schema-version-parts/overview.md Workstream overview tying phases together and describing validation.
plan/schema-version-parts/handoff-prompt.md Session handoff prompt for implementing phases in order.
internal/projectconfig/specsource.go Adds fingerprint version-set tags to spec source fields.
internal/projectconfig/render.go Adds fingerprint tag to render config field.
internal/projectconfig/package.go Adds hazard comment for a fingerprint:"-" pruned subtree.
internal/projectconfig/overlay.go Adds fingerprint tags to overlay fields (and keeps description excluded).
internal/projectconfig/fingerprint_test.go Enforces mandatory fingerprint tag decisions via ValidateFieldTag.
internal/projectconfig/distro.go Adds fingerprint tags to distro reference fields.
internal/projectconfig/component.go Adds fingerprint tags + hazard comments; keeps Packages measured.
internal/projectconfig/build.go Adds fingerprint tags to build config leaves; hazard comments for pruned composites.
internal/fingerprint/versiontag.go Implements parsing/validation for fingerprint version-set tags.
internal/fingerprint/versiontag_internal_test.go Unit tests for version-set tag parsing and emit-key resolution.
internal/fingerprint/testdata/golden_v1.json Frozen v1 projection digest table (append-only contract).
internal/fingerprint/project.go Hand-written projectV1 projector + ValidateFieldTag and helpers.
internal/fingerprint/project_internal_test.go Projector tests, emission probe, and placeholder/completeness gates.
internal/fingerprint/golden_internal_test.go Golden vector management contract + append-only enforcement test.
internal/fingerprint/combine.go sha256 folding of projection bytes + non-config inputs.
internal/fingerprint/combine_internal_test.go Unit tests for combiner determinism and sensitivity.
internal/fingerprint/canonical.go Canonical <len>:<key>=<len>:<value> encoder and omit/encode rules.
internal/fingerprint/canonical_internal_test.go Unit tests for canonical encoding, omit rules, and failure cases.
docs/developer/schema-migration/README.md Executive summary of the RFC and reset/lazy-migration split.
docs/developer/schema-migration/problem-and-motivation.md Plain-language problem statement and “substrate problem” rationale.
docs/developer/schema-migration/part-2-lazy-migration.md Summary of deferred/provisional Part 2 lazy migration approach.
docs/developer/schema-migration/part-1-the-reset.md Summary of the reset’s goals, token format, and constraints.
docs/developer/schema-migration/delivery-plan.md PR sequencing and cutover bundling plan.
.github/instructions/go.instructions.md Updates contributor guidance for fingerprint tags and projectV1 workflow.

Comment thread internal/fingerprint/canonical.go Outdated
Comment on lines +27 to +29
// - Scalar leaves (string, bool, sized ints, and slices of those) use plain
// [reflect.Value.IsZero]: emit omits a zero, emitAlways emits it anyway (the
// '!' always-emit case for a build-meaningful zero).
Comment on lines +60 to +63
order. **Empty map / nil-slice handling.** `emit` applies plain `IsZero`, so a
nil scalar slice is omitted while a non-nil empty slice emits an empty value;
the nil-vs-`[]` collapse is `canonicalizeForFingerprint`'s job at the hash
boundary (Phase 2/3), not the primitive's.
Comment on lines +18 to +21
1. A scalar-slice **canonicalizer** (`canonicalizeForFingerprint`) - a generic
reflective normalizer that collapses every nil-or-empty scalar slice to nil at
the hash boundary, with no hand-maintained field inventory. Its canonical-form
test was written **first**, before any golden vector (the ordering gate).
Comment on lines +36 to +39
- `project.go` - `currentContentVersion = 1`; `projectV1` + nested sub-projectors;
the `proj` accumulator (defers the first emit error, bufio-style);
`canonicalizeForFingerprint` / `canonicalizeInto` / `canonicalizeSlice`;
`ValidateFieldTag` (mandatory-decision + composite-`!` gate) and `tomlKeyOf`.
Comment thread plan/schema-version-parts/overview.md Outdated
Comment on lines +106 to +107
| `internal/fingerprint/` | New canonical encoder, tag parser, `projectV1`, `canonicalizeForFingerprint`, sha256 combiner; `ComputeIdentity` switch-over; remove `hashstructure`. (`fingerprint.go`) |
| `internal/projectconfig/` | `fingerprint:"vN..*"` tags on `ComponentConfig` and nested structs (`component.go`, `build.go`); extend `TestAllFingerprintedFieldsHaveDecision` (`fingerprint_test.go`); `Packages` mandatory-tag correction; resolver scalar-slice canonicalization (`component.go`). |
Comment on lines +64 to +67
Replace the `hashstructure.Hash(component, ...)` config-hash step with
`canonicalizeForFingerprint(cfg)` immediately followed by `projectV1` + `sha256`, invoked
**inside the hash boundary** so every path into the hasher is canonicalized regardless of how
the caller obtained the config. Retire the `uint64` `ConfigHash` field on `ComponentInputs`;
@dmcilvaney dmcilvaney force-pushed the damcilva/schema_version_parts/projection-golden-vectors branch from 6f44065 to 4499493 Compare June 24, 2026 23:06
Copilot AI review requested due to automatic review settings June 24, 2026 23:09
@dmcilvaney dmcilvaney force-pushed the damcilva/schema_version_parts/projection-golden-vectors branch from 4499493 to 7ad6237 Compare June 24, 2026 23:09

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 30 out of 32 changed files in this pull request and generated 1 comment.

Comment on lines +37 to +43
func (b *treeBuilder) set(key string, value any) {
if b.out == nil {
b.out = map[string]any{}
}

b.out[key] = value
}
@dmcilvaney dmcilvaney force-pushed the damcilva/schema_version_parts/projection-golden-vectors branch from 7ad6237 to b814e33 Compare June 25, 2026 01:03
…256 combiner

Phase 1 (PR A1) of the schema-version-parts cutover. Adds the pure projection-substrate primitives in internal/fingerprint, beside the existing hashstructure path: the canonicalBuf length-prefixed encoder with the split omit-predicate, the fingerprint version-set tag parser, and the sha256 combiner step. Nothing is wired into ComputeIdentity and hashstructure is untouched, so no lock byte or scenario snapshot changes. Includes in-package unit tests and the phase 1 report; updates plan status.
@dmcilvaney dmcilvaney force-pushed the damcilva/schema_version_parts/projection-golden-vectors branch from b814e33 to 272e74c Compare June 25, 2026 15:33
Copilot AI review requested due to automatic review settings June 25, 2026 15:33

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 22 out of 23 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

internal/projectconfig/fingerprint_test.go:35

  • fingerprintedStructs is duplicated here even though fingerprint.FingerprintedStructTypes() is intended to be the single source of truth (see internal/fingerprint/project.go). Using the exported list avoids the two reflective walks drifting if a struct is added/removed in only one place.
	// All struct types whose fields participate in component fingerprinting.
	// When adding a new struct that feeds into the fingerprint, add it here.
	fingerprintedStructs := []reflect.Type{
		reflect.TypeFor[projectconfig.ComponentConfig](),
		reflect.TypeFor[projectconfig.ComponentBuildConfig](),

## Component Fingerprinting: version-set tags + `projectV1`

Structs in `internal/projectconfig/` are hashed by `hashstructure.Hash()` to detect component changes. Fields **included by default** (safe: false positive > false negative).
Structs in `internal/projectconfig/` feed the component fingerprint, and **every field must carry an explicit `fingerprint` tag** - an absent tag fails `TestAllFingerprintedFieldsHaveDecision`. (The live hash is still `hashstructure.Hash()`; the hand-written `projectV1` in `internal/fingerprint/project.go` is wired in parallel and replaces it at the reset.)
@dmcilvaney dmcilvaney force-pushed the damcilva/schema_version_parts/projection-golden-vectors branch from 272e74c to 973f011 Compare June 25, 2026 15:53
@dmcilvaney dmcilvaney force-pushed the damcilva/schema_version_parts/projection-golden-vectors branch from 973f011 to 3599227 Compare June 25, 2026 16:11
Copilot AI review requested due to automatic review settings June 25, 2026 16:11

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 22 out of 23 changed files in this pull request and generated 2 comments.

Comment on lines +105 to +107
When adding a field to a fingerprinted struct, ask: **"Does changing this field change the build output?"**
- **No** (human docs, scheduling hints, CI policy, metadata, back-references) → tag it `fingerprint:"-"` and register the exclusion in `expectedExclusions` in `internal/projectconfig/fingerprint_test.go`.
- **Yes** (build flags, spec source, defines, etc.) → tag it `fingerprint:"v1..*"` (omit-if-zero) and wire it into `projectV1`: add its `emit`/sub-projector line, set it in `emissionProbeConfig`, then append a `<toml-key>-set` golden vector with `go test ./internal/fingerprint -run TestGoldenVectors -update-golden`. The golden table is append-only - never edit an existing digest. The full contract lives in `goldenConfigs`'s doc comment.
Comment on lines +19 to +24
// FingerprintedStructTypes returns every struct type whose fields participate in
// the component fingerprint. It is the single source of truth that both the
// mandatory-tag decision test (`projectconfig.TestAllFingerprintedFieldsHaveDecision`)
// and the emission probe (`measuredLeafEmitKeys`) walk - keeping it in one
// exported place is what stops those two walks from drifting silently (a struct
// dropped from one but not the other would weaken coverage with no failure).
Add the v1 projection layer beside the existing hashstructure path, additive
and not yet wired into ComputeIdentity:

- projectV1 + frozen nested sub-projectors, emitting measured fields by literal
  Go path under their frozen TOML emit-key
- canonicalizeForFingerprint: reflective nil-or-empty scalar-slice normalizer at
  the hash boundary, pruning at fingerprint:"-" edges (cycle-safe, no field
  inventory)
- fingerprint:"v1..*" tags on every measured field across the 10 fingerprinted
  structs; hazard comments on each '-'-pruned composite; Packages kept measured
- golden vectors with an append-only -update-golden guard; emission probe;
  composite-'!' placeholder gate; mandatory-tag decision test
- frozen maximalConfig vs growing emissionProbeConfig split + documented
  additive-field workflow and naming convention

No lock byte or scenario snapshot moves; hashstructure untouched.
@dmcilvaney dmcilvaney force-pushed the damcilva/schema_version_parts/projection-golden-vectors branch from 3599227 to 7ffefaa Compare June 25, 2026 18:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants