
chore: cull ghost-code (-29 KLOC, post-audit scope reset)#9

Merged
Dashtid merged 11 commits into main from chore/cull-ghost-code on Jun 14, 2026

Conversation

@Dashtid Dashtid commented Jun 14, 2026

Owner

Summary

The 2026-06-14 audit (web research + git archaeology) found ~13 KLOC of production scripts and ~8 KLOC of tests defending behavior with no operational consumer on a single-user laptop. This PR culls them.

Diff: 65 files changed, 197 insertions, 29,176 deletions.

Why

  • 174 commits over 14 months; only ~3 looked like "ran it, broke, fixed" — the rest was test backfill and refactor churn.
  • Most categories duplicated either native Windows tools (Task Manager, Event Viewer, `wsl.exe`, Settings) or the lab-server stack on q-lab (Prometheus/Grafana for monitoring, Velero for backup, k9s for Kubernetes).
  • The pre-cull 0.65 test:prod ratio was a symptom: most tests mocked OS calls and asserted on the mocks, which proves wiring, not behavior.

What survived

| Category | Scripts |
|---|---|
| Windows first-time-setup | fresh-windows-setup, export/install-from-exported-packages, Compare-SoftwareInventory |
| Windows maintenance | system-updates + Install-SystemUpdatesTask |
| Windows backup | Backup-DeveloperEnvironment (snapshot before rebuild) |
| Windows development | Manage-Docker, remote-development-setup |
| Windows network | Set-StaticIP |
| Windows troubleshooting | Repair-CommonIssues |
| Linux | nvidia-gpu-exporter, disk-cleanup, headless-server-setup |

What was killed

| Category | Reason |
|---|---|
| All 5 Windows `monitoring/*` | Task Manager + Sysinternals + Reliability Monitor + Event Viewer cover it; no scheduling, no alerting on this single-user box |
| 5 of 6 Windows `backup/*` | OneDrive + browser sync + Velero (on q-lab) cover it; restore paths were untested in disaster |
| Windows `Test-DevEnvironment.ps1`, `Manage-WSL.ps1` | `wsl.exe` wins; dev-env checker had no clear trigger |
| Windows `reporting/`, `security/Get-UserAccountAudit`, `network/Manage-VPN` | Single-user laptop doesn't need AD audit; OpenVPN GUI handles VPN |
| 3 of 4 Linux `maintenance/*` | `apt` / `journalctl --vacuum-time` cover it; the rollback script was speculative |
| Linux `docker-cleanup`, `pod-health-monitor`, `service-health-monitor` | q-lab Prometheus/k9s cover these |
| Linux `security-hardening.sh` | Lives in `defensive-toolkit` |
| 5 umbrella Pester test files (Backup, Monitoring, Tier2, Tier3, DeveloperEnvironment) | Pre-dated per-script behavioral tests; redundant or referenced deleted scripts |
| 6 dir READMEs | Their directories are now empty |

Docs rewritten

  • `README.md` — script catalog reflects post-cull scope
  • `QUICKSTART.md` — examples use surviving scripts
  • `BACKLOG.md` — Current State block rewritten; Sprint 7 cancelled; closeouts retained as history
  • `docs/ROADMAP.md` — strategic doc rewritten to reflect deliberate-narrow-scope philosophy
  • `CHANGELOG.md` — new 3.0.0 entry documenting the cull
  • Subdirectory READMEs for surviving categories (Windows/backup, Windows/network, Windows/development, Linux/maintenance, Linux/monitoring)
  • `tests/Linux/maintenance.bats` trimmed to disk-cleanup only

Policy going forward

  • Any script that goes 6 months without a `fix:` commit triggered by real failure is a candidate for archival.
  • No more "behavioral coverage" sprints — favor smoke tests for the surviving setup scripts; don't mock-pad scripts that aren't invoked.

Note on commit count

This PR includes 8 commits: the 1 cull commit + 7 pre-existing local-main commits that hadn't been pushed to origin (CI fix, kubernetes-cli choco pin, Sprint 4.x/5.x test backfill, Sprint 5.2 refactor, Sprint 6.1 BATS unify, Sprint 6.3 retry/backoff). Merging this PR will catch origin/main up to local main + apply the cull.

Test plan

  • Surviving Pester suite passes: 607 tests, 0 failures (was 1320 — the delta went with the deleted scripts, no orphan refs)
  • `tests/Linux/maintenance.bats` BATS-preprocessor-syntax sanity check
  • CI runs green on this PR

Dashtid added 8 commits June 11, 2026 21:08
…4.3)

Rename Main to Invoke-BrowserProfileBackup with a mirrored param() block
so the testability guard forwards every script param explicitly (same
pattern as Sprint 4.2). Replace inner exit N with return N so the function
returns an exit code cleanly. Add explicit -Path to Get-BackupDirectory
and -BackupDir to Get-BackupList so they are independently testable
without relying on dynamic scope reads of script-level $OutputPath.

34 Pester tests cover the 11 helpers (Get-BackupDirectory custom-path
and fallback, Test-BrowserInstalled per-browser path probe, Get-FirefoxProfiles
INI parsing for both IsRelative=1 and IsRelative=0, Get-BrowserExtensions
Chrome/Edge/Brave manifest parsing including __MSG_ locale-placeholder
fallback and corrupt-JSON fallback plus Firefox extensions enumeration,
Export-BookmarksToHtml writing valid Netscape-bookmark-file-1 HTML for
Chrome and the Firefox info-line placeholder, Backup-BrowserProfile
happy path / IncludeCookies+IncludeHistory switches / error catch /
Firefox multi-profile iteration, Compress-BackupFolder zip+remove
original and failure fallback, Remove-OldBackups RetentionDays=0
short-circuit and age-cutoff filtering, Get-BackupList filename pattern
parsing and skip of unmatched filenames, Restore-BrowserProfile
missing-archive / no-Firefox-profile / temp-dir cleanup in the finally
block, Export-HtmlReport file output) plus the top-level
Invoke-BrowserProfileBackup paths (ListBackups empty / with-rows,
Restore-without-target returns 1, no-browser-installed returns 2,
IncludePasswords writes PASSWORD_REMINDER.txt).

New Pester gotcha documented: Get-Content's -Path and -LiteralPath are
distinct parameters. A Mock ParameterFilter on $Path does not see
-LiteralPath values, so when tests verify written files via
Get-Content -LiteralPath, the mock filter still matches (with $Path
null) and returns mocked content instead of the real file. Workaround
is to read verification files via [System.IO.File]::ReadAllText() so
the test bypasses the mock entirely.

Coverage 33.54% -> 37.27% (+3.73 pp). Tests 1214 -> 1248.

Wrap the top-level try/catch in Invoke-SystemStateExport with a mirrored
param() block so the testability guard forwards every script param
explicitly (same pattern as Sprint 4.2/4.3). Replace inner exit N with
return N. Helpers that read $DryRun via dynamic scope are left as-is;
tests exercise the production path and use Invoke-SystemStateExport's
own -DryRun param for the dry-run code path.

18 Pester tests cover Get-ExportComponents ('All' expansion / explicit
list passthrough), New-ExportFolder timestamped folder + subdir
structure, Export-Drivers success + Get-PnpDevice throw, Export-Services
JSON+CSV output, Export-WindowsFeatures windows-features.json,
Export-NetworkConfig (adapters / ip-config / routes / dns / firewall),
Export-ScheduledTasks (verifying the \Microsoft\* path filter excludes
system tasks), Export-EventLogs, New-ExportManifest writing valid JSON
with components/results, Compress-ExportFolder zip+remove and
failure-fallback, Export-HTMLReport file output with stats/results
table, Export-JSONReport ComputerName/Statistics/Results structure,
and Invoke-SystemStateExport top-level (-DryRun returns 0, fatal error
returns 1, dispatcher only invokes listed components).

Coverage 37.27% -> 40.83% (+3.56 pp, crossed the 40% threshold).
Tests 1248 -> 1266.

Wrap the top-level try/catch/finally in Invoke-BackupIntegrityTest with
a mirrored param() block so the testability guard forwards every script
param explicitly (same pattern as Sprint 4.2-4.4). Replace inner exit N
with return N. The finally-block temp-folder cleanup stays in place
inside the function, so it still runs whether the function returns
normally or throws.

27 Pester tests cover the 10 helper functions + Invoke top-level.
Tests build a real ZIP archive in $TestDrive (containing a real
backup_metadata.json with SHA256 hashes) so the archive helpers run
end-to-end against real bytes -- no Mock for Compress-Archive,
Expand-Archive, or [System.IO.Compression.ZipFile]::OpenRead. This
gives much higher signal than a fully mocked test pyramid because the
actual zip-parsing branches execute.
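The "real bytes instead of mocks" idea can be illustrated with a reduced shell sketch. This is an analogy only: the tests described above are Pester tests working on a ZIP archive in `$TestDrive`, whereas this sketch hashes a single temp file. The helper names `hash_of` and `verify` are hypothetical, and `sha256sum` (GNU coreutils) is assumed to be installed.

```shell
# Build a real file, record its SHA-256, then verify it. Nothing is mocked.
workdir=$(mktemp -d)
printf 'hello backup\n' > "$workdir/settings.json"

# sha256sum prints "<hash>  <path>"; keep only the hash field.
hash_of() { sha256sum "$1" | awk '{ print $1 }'; }

# Hypothetical check: prints "match" or "mismatch" for a file and a recorded hash.
verify() {
  [ "$(hash_of "$1")" = "$2" ] && echo match || echo mismatch
}

recorded=$(hash_of "$workdir/settings.json")

verify "$workdir/settings.json" "$recorded"      # file unchanged since recording
printf 'tampered\n' >> "$workdir/settings.json"
verify "$workdir/settings.json" "$recorded"      # contents changed after recording
rm -rf "$workdir"
```

Because the hash is computed from the file on disk, both the matching and the mismatching branch run against actual content rather than against a stubbed return value.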

Coverage:
- Format-FileSize bytes/KB/MB/GB boundaries.
- Get-BackupInfo archive vs folder vs corrupted-archive paths.
- Test-ArchiveStructure valid + corrupt.
- Get-BackupMetadata archive-internal, folder-resident, missing-file
  warning path.
- Expand-BackupToTemp success path (real extraction + $script:TempFolder
  bookkeeping) and failure path.
- Test-FileHashes Skipped=true when no FileHashes in metadata,
  HashesMatched++ on a real SHA256 match, mismatched-hash recording in
  Stats.FailedFiles.
- Test-FileExtraction readable vs corrupted-archive error.
- Restore-ToTarget archive+folder paths and Expand-Archive failure.
- Remove-TempFolder existing folder + missing folder no-throw.
- Export-HTMLReport / Export-JSONReport file output.
- Invoke-BackupIntegrityTest Restore-without-target returns 1, Quick
  happy path returns 0, fatal-error path returns 1.

Coverage 40.83% -> 44.12% (+3.29 pp). Tests 1266 -> 1293.

Sprint 4 complete: 132 tests added (+13.96 pp from 30.16% to 44.12%),
1 production bug fixed, 5 scripts in the backup category fully covered.
Beat the +8-10 pp estimate.

…nt 5.1)

Refactor the straight-line main try/catch into Invoke-SystemPerformance
with a mirrored param() block (same wrap-and-forward pattern as Sprint
4.2-4.5). Inner exit 1 becomes return 1; the testability guard at the
bottom forwards every script param explicitly.

Fix one real production bug: four script-level parameters carried over
from a "merged from Watch-DiskSpace.ps1" refactor (-IncludeDiskAnalysis,
-AutoCleanup, -TopFilesCount, -TopFoldersCount) were declared on the
script's param block but never actually wired into the main flow.
Get-DiskAnalysis existed as a complete helper -- it just was never
called. Users who passed -IncludeDiskAnalysis or -AutoCleanup saw the
parameter accepted without a parser error and walked away thinking they
had a disk-analysis report when nothing had run. Reconnected the
wire-up: when -IncludeDiskAnalysis is set in single-run mode, main now
calls Get-DiskAnalysis -DiskVolumes $metrics.DiskVolumes
-EnableAutoCleanup:$AutoCleanup after metrics collection. The
continuous-monitor loop deliberately skips this (repeated heavy disk
scans would not make sense for a real-time monitor).

23 Pester tests cover the major helpers: Get-ThresholdAlerts
Critical/Warning bands and multi-alert combinations, Get-TopProcesses
sort+top-N and PID 0 filter and Get-Process-throws fallback,
Get-SystemInfo CIM aggregation, Get-LargestFiles >100MB filter,
Get-CleanupSuggestions Temp / Windows-Update branches,
Invoke-DiskAutoCleanup with Remove-Item and Clear-RecycleBin
destructive operations Mock'd so no real deletion can happen,
Get-DiskAnalysis dispatcher (Warning threshold gate,
EnableAutoCleanup+Critical-only auto-clean trigger), Export-JSONReport
and Export-CSVReport file output, and Invoke-SystemPerformance happy
path / fatal-error / AlertOnly / IncludeDiskAnalysis wire-up
verification.

Updated the literal-'exit 1' meta-check in Monitoring.Tests.ps1 to
also accept 'return 1' / 'exit $exitCode' (the new patterns introduced
by this refactor).

Coverage 44.12% -> 46.76% (+2.64 pp). Tests 1293 -> 1316.

…ironmentTest (Sprint 5.2)

Renames Main -> Invoke-DevEnvironmentTest with a mirrored param() block
(Profile, RequirementsFile, AutoInstall, CheckSSH, CheckExtensions,
OutputFormat, OutputPath). Replaces inner exit 1 / final exit $exitCode
with return so the function returns an exit code cleanly. Adds a
testability guard that splats script params into the function only when
not dot-sourced -- same pattern used in Sprints 4.x and 5.1.

Behavioral tests are deferred. The script depends on 17 external CLI
tools (git, node, npm, python, pip, docker, kubectl, ssh, code, gh, az,
terraform, ...). An initial attempt at stub-based tests caused Pester to
exit with code 4 and zero output, and a separate run hung when the SSH
probe bypassed the stub. BACKLOG already flagged this script as the
"most expensive mocking surface in the repo"; the refactor pays for
itself by uncovering bugs the same way Sprint 5.1 did.

Adds a -Linux switch and auto-detection so the runner covers both
suites in one command. With no flags, runs whichever runner is
available (Pester for Windows, bats for Linux) and skips the other
with a warning. With an explicit -Windows or -Linux flag, errors out
if that runner isn't installed.

BATS results are parsed from TAP output: the plan line (1..N) gives
the test count; ok/not-ok lines give pass/fail counts. Each .bats
file under tests/Linux is invoked individually, mirroring the CI
loop.
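The TAP parsing described above can be sketched in shell. This is a minimal illustration under assumptions, not the runner's actual code: the function name `summarize_tap` and the sample test names are hypothetical.

```shell
# Hypothetical helper: summarize TAP output read from stdin.
# Prints "total=N passed=P failed=F".
summarize_tap() {
  awk '
    /^1\.\.[0-9]+/ { sub(/^1\.\./, ""); total = $1 + 0 }  # plan line gives the test count
    /^not ok /     { failed++; next }                     # test "not ok" before "ok"
    /^ok /         { passed++ }
    END { printf "total=%d passed=%d failed=%d\n", total, passed, failed }
  '
}

# Sample TAP stream (contents are made up for the demo).
sample_tap='1..3
ok 1 disk-cleanup.sh exists
not ok 2 disk-cleanup.sh has shebang
ok 3 disk-cleanup.sh is executable'

printf '%s\n' "$sample_tap" | summarize_tap
# prints: total=3 passed=2 failed=1
```

Taking the total from the plan line rather than from the count of result lines means a run that aborts partway still reports how many tests were expected.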

Closes Sprint 6.1.

…(Sprint 6.3)

Restore-VsCodeExtension now retries a failed `code --install-extension`
call once with a 5-second backoff before giving up. Previous behavior
logged "Failed to install: <ext>" on any non-zero exit code and moved
on, so a transient marketplace blip or rate-limit caused permanent
extension loss across a restore. Retry is bounded (2 attempts max) so
a real failure still completes the restore quickly.
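The bounded retry described above can be sketched in shell. This is an analogy rather than the actual change, which lives in the PowerShell function Restore-VsCodeExtension; the names `retry_with_backoff` and `flaky_install` are hypothetical, and the demo uses a 0-second backoff so it finishes instantly.

```shell
# Hypothetical helper: run a command up to max_attempts times, sleeping
# backoff_seconds between attempts. Returns 0 on the first success, 1 otherwise.
retry_with_backoff() {
  max_attempts=$1
  backoff_seconds=$2
  shift 2
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Sleep only when another attempt is still coming.
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep "$backoff_seconds"
    fi
    attempt=$((attempt + 1))
  done
  return 1
}

# Stand-in for a flaky installer: fails on the first call, succeeds on the second.
calls=0
flaky_install() {
  calls=$((calls + 1))
  [ "$calls" -ge 2 ]
}

# Two attempts max; the commit uses a 5-second backoff, the demo uses 0.
if retry_with_backoff 2 0 flaky_install; then
  echo "installed after $calls attempts"
else
  echo "failed after $calls attempts"
fi
# prints: installed after 2 attempts
```

Capping the attempt count keeps the worst case for a genuinely broken extension at one extra call plus one backoff interval.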

The old single "Continues iterating past failures" test is replaced by
two contexts covering the new behavior: retry-success (extension fails
attempt 1, succeeds attempt 2: count reflects success, code invoked
Total+1 times, Start-Sleep invoked once) and retry-failure (extension
fails both attempts: count reflects failure, code still invoked
Total+1 times). Start-Sleep is mocked so tests stay fast.

Pester 1316 -> 1320 (24 tests in this file, all passing).

The 2026-06-14 audit (web research + git archaeology) found ~13 KLOC of
production scripts and ~8 KLOC of tests defending behavior with no
operational consumer on a single-user laptop. 174 commits over 14
months, only ~3 looked like "ran it, broke, fixed" — the rest was test
backfill and refactor churn. Most categories duplicated either native
Windows tools (Task Manager, Event Viewer, wsl.exe, Settings) or the
lab-server stack on q-lab (Prometheus/Grafana for monitoring, Velero
for backup, k9s for Kubernetes).

What survived: 4 first-time-setup scripts, system-updates +
Install-SystemUpdatesTask, Backup-DeveloperEnvironment (snapshot before
rebuild), Manage-Docker, remote-development-setup, Set-StaticIP,
Repair-CommonIssues. Linux: nvidia-gpu-exporter, disk-cleanup,
headless-server-setup.

What was killed:
- All 5 Windows monitoring scripts + their tests
- 5 of 6 Windows backup scripts + tests (Backup-DeveloperEnvironment survives)
- Windows Test-DevEnvironment.ps1, Manage-WSL.ps1
- Windows Get-SystemReport, Get-UserAccountAudit, Manage-VPN
- 3 of 4 Linux maintenance scripts (disk-cleanup survives)
- Linux service-health-monitor, docker-cleanup, pod-health-monitor,
  security-hardening (security work lives in defensive-toolkit)
- 5 umbrella Pester test files that pre-dated the per-script
  *.Behavioral.Tests.ps1 files
- 6 dir READMEs for now-empty categories

Docs rewritten: README, QUICKSTART, BACKLOG, docs/ROADMAP, CHANGELOG,
subdirectory READMEs (Windows/backup, Windows/network,
Windows/development, Linux/maintenance, Linux/monitoring).
tests/Linux/maintenance.bats trimmed to disk-cleanup only.

New policy: any script that goes 6 months without a fix: commit
triggered by real failure is a candidate for archival. Cancelled
Sprint 7 (Linux coverage gaps) — would have produced more ghost code.

Surviving Pester suite: 607 tests passing (was 1320 — the delta went
with the deleted scripts, no orphan refs).
@github-actions

github-actions Bot commented Jun 14, 2026


Linux Test Results

10 tests   - 148   10 ✅  - 89   0s ⏱️ -1s
 3 suites  -  47    0 💤 ± 0 
 1 files   ±  0    0 ❌  - 59 

Results for commit e3c7cab. ± Comparison against base commit ebd7283.

This pull request removes 148 tests.
Docker Cleanup Script - File Structure.docker-cleanup.sh exists
Docker Cleanup Script - File Structure.docker-cleanup.sh has shebang
Kubernetes Monitoring Scripts - File Structure.File Permissions (Executable).pod-health-monitor.sh has shebang
Kubernetes Monitoring Scripts - File Structure.File Permissions (Executable).pvc-monitor.sh has shebang
Kubernetes Monitoring Scripts - File Structure.Required Files.pod-health-monitor.sh exists
Kubernetes Monitoring Scripts - File Structure.Required Files.pvc-monitor.sh exists
Linux Maintenance Scripts - File Structure.File Permissions (Executable).restore-previous-state.sh has shebang
Linux Maintenance Scripts - File Structure.File Permissions (Executable).system-updates.sh has shebang
Linux Maintenance Scripts - File Structure.Required Files.README.md exists
Linux Maintenance Scripts - File Structure.Required Files.config.example.json exists
…

♻️ This comment has been updated with latest results.

@github-actions

github-actions Bot commented Jun 14, 2026


Windows Test Results

607 tests   - 607   607 ✅  - 607   35s ⏱️ -12s
208 suites  - 172     0 💤 ±  0 
  1 files   ±  0     0 ❌ ±  0 

Results for commit e3c7cab. ± Comparison against base commit ebd7283.

This pull request removes 607 tests.
Backup Scripts - File Existence.Compare-SoftwareInventory.ps1 should exist
Backup Scripts - File Existence.Export-SystemState.ps1 should exist
Backup Scripts - File Existence.Test-BackupIntegrity.ps1 should exist
Backup-BrowserProfiles.ps1.Feature Implementation.Should define browser profile paths
Backup-BrowserProfiles.ps1.Feature Implementation.Should have backup compression functionality
Backup-BrowserProfiles.ps1.Feature Implementation.Should have bookmark export to HTML function
Backup-BrowserProfiles.ps1.Feature Implementation.Should have browser extension detection
Backup-BrowserProfiles.ps1.Feature Implementation.Should have restore functionality
Backup-BrowserProfiles.ps1.Feature Implementation.Should have retention policy cleanup
Backup-BrowserProfiles.ps1.Help Documentation.Should have description
…

♻️ This comment has been updated with latest results.

The pin `fb74f38f7d00949e1ddd4e49e59ba5dd17f2bb46` annotated as v3.88.1
was never an actual git object in trufflesecurity/trufflehog — same
broken-SHA-pin pattern as the EnricoMi fix in commit 71d41b9. The
real v3.88.1 SHA is d73edfb..., but bumping to v3.95.5 (latest stable,
2026-06-02) since we're already disturbing the pin.

New SHA `d411fff7b8879a62509f3fa98c07f247ac089a51` verified as a
commit-type ref via the GitHub git/refs/tags API.

Updates two call sites:
- .github/workflows/pr-checks.yml: TruffleHog PR scan
- .github/workflows/security-scan.yml: TruffleHog incremental scan
@github-actions github-actions Bot added the ci/cd label Jun 14, 2026
Dashtid added 2 commits June 14, 2026 19:11
`pr-checks.yml`'s "Scan for common secrets patterns" step was excluding
only `pr-checks.yml` itself, so the grep matched the literal
"BEGIN RSA PRIVATE KEY" / "BEGIN OPENSSH PRIVATE KEY" pattern strings
inside `security-scan.yml:49-50` and treated them as real secrets.

This bug was always present but never surfaced because the trufflehog
step ran first with a broken SHA pin (see commit 67301aa) and failed
the workflow before this step was reached. Fixing the SHA pin exposed
the dormant grep self-match.

Aligning with the pattern that `security-scan.yml` already uses
(`--exclude="*.yml"`). Trufflehog scans the YAML files for real
secrets so coverage isn't lost; the naive grep is for non-workflow
code anyway.
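The self-match and the exclusion fix can be reproduced in a throwaway directory. This is a sketch under assumptions: the file names and contents are made up, and the real workflow step may pass different grep options.

```shell
# Throwaway tree: one workflow file that mentions the pattern, one clean script.
workdir=$(mktemp -d)
mkdir -p "$workdir/.github/workflows" "$workdir/src"

cat > "$workdir/.github/workflows/security-scan.yml" <<'EOF'
run: grep -r "BEGIN RSA PRIVATE KEY" .
EOF
echo 'echo "no secrets here"' > "$workdir/src/tool.sh"

pattern='BEGIN RSA PRIVATE KEY'

# Naive scan: the workflow file matches its own search string.
naive_hits=$({ grep -rl "$pattern" "$workdir" || true; } | wc -l)

# Excluding workflow YAML removes the false positive.
filtered_hits=$({ grep -rl --exclude="*.yml" "$pattern" "$workdir" || true; } | wc -l)

echo "naive=$((naive_hits)) filtered=$((filtered_hits))"
# prints: naive=1 filtered=0
rm -rf "$workdir"
```

The `|| true` guards are only there because grep exits non-zero when nothing matches, which would otherwise abort a script running under `set -e` with `pipefail`.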

Two pre-existing assertions in `tests/Linux/GPUMonitoring.Tests.ps1`
were stricter than the script they cover; both failed once Linux Test
Results actually started running (after the trufflehog SHA pin fix
unblocked the workflow).

- `command -v nvidia-smi` -> accept either that or `check_command
  nvidia-smi` (the helper from `Linux/lib/bash/common-functions.sh`
  the script now uses).
- `METRICS_DIR="/var/lib/prometheus/node-exporter"` -> `METRICS_DIR=.*?...`
  so the test passes against the current env-var-overridable form
  `METRICS_DIR="${METRICS_DIR:-/var/lib/prometheus/node-exporter}"`.
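The effect of loosening the second assertion can be shown with grep. This is an analogy in shell, since the real assertions are Pester tests; the `loose` pattern below is a hypothetical stand-in, not the pattern from the commit, and the helper names are made up.

```shell
# The two forms of the assignment mentioned in the commit message.
old_form='METRICS_DIR="/var/lib/prometheus/node-exporter"'
new_form='METRICS_DIR="${METRICS_DIR:-/var/lib/prometheus/node-exporter}"'

# Strict check: fixed-string match on the old literal assignment.
strict='METRICS_DIR="/var/lib/prometheus/node-exporter"'
# Looser check (hypothetical pattern): anything between "=" and the default path.
loose='METRICS_DIR=.*/var/lib/prometheus/node-exporter'

matches_fixed() { printf '%s\n' "$2" | grep -qF -- "$1"; }
matches_regex() { printf '%s\n' "$2" | grep -qE -- "$1"; }

for form in "$old_form" "$new_form"; do
  matches_fixed "$strict" "$form" && s=pass || s=fail
  matches_regex "$loose" "$form" && l=pass || l=fail
  echo "strict=$s loose=$l  <- $form"
done
# prints:
# strict=pass loose=pass  <- METRICS_DIR="/var/lib/prometheus/node-exporter"
# strict=fail loose=pass  <- METRICS_DIR="${METRICS_DIR:-/var/lib/prometheus/node-exporter}"
```

The looser pattern still pins the default path, so the test keeps guarding the value that matters while tolerating the env-var override syntax.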

The script itself is unchanged. Local Pester: 10/10 pass.
@Dashtid Dashtid merged commit dfae1cb into main Jun 14, 2026
14 checks passed
@Dashtid Dashtid deleted the chore/cull-ghost-code branch June 14, 2026 17:46
Dashtid added a commit that referenced this pull request Jun 14, 2026
`Trivy Filesystem Scan` has been failing 54 consecutive runs on main
since at least 2025-10-15. The currently-pinned `trivy-action@v0.29.0`
transitively references `aquasecurity/setup-trivy@v0.2.2`, which
GitHub Actions can no longer resolve — the job aborts during "Set up
job", before checkout, before the scan even starts. This was masked
on PRs because the broken SHA pins in #10 made everyone assume CI
churn was pin-related; it wasn't.

Changes:
- Bump `aquasecurity/trivy-action` from v0.29.0 to v0.36.0
  (SHA `ed142fd...`). v0.36.0 is the current release post-March-2026
  trivy-action supply-chain incident — earlier 0.0.1-0.34.2 had their
  tags re-pointed during the compromise and should be avoided.
- Add `TRIVY_DB_REPOSITORY` / `TRIVY_JAVA_DB_REPOSITORY` env vars to
  point at the public.ecr.aws mirrors. GHCR rate limiting of
  trivy-action runs is the dominant CI failure mode in 2025-2026
  (trivy-action#389); the AWS ECR mirror sidesteps it.
- Set `ignore-unfixed: true`, `exit-code: 0`, `continue-on-error: true`.
  This makes Trivy advisory: findings still appear in the GitHub
  Security tab via the existing SARIF upload, but they don't gate
  merges. For a single-developer toolkit with no production blast
  radius, that's the right altitude.

Adding to PR #10 because both fixes touch security-scan.yml and the
broader theme is the same: stale CI plumbing exposed once the
trufflehog pin fix from PR #9 unblocked the workflow chain.