chore: cull ghost-code (-29 KLOC, post-audit scope reset) by Dashtid · Pull Request #9 · Dashtid/sysadmin-toolkit

Dashtid · 2026-06-14T15:59:28Z

Summary

The 2026-06-14 audit (web research + git archaeology) found ~13 KLOC of production scripts and ~8 KLOC of tests defending behavior with no operational consumer on a single-user laptop. This PR culls them.

Diff: 65 files changed, 197 insertions, 29,176 deletions.

Why

174 commits over 14 months; only ~3 looked like "ran it, broke, fixed" — the rest was test backfill and refactor churn.
Most categories duplicated either native Windows tools (Task Manager, Event Viewer, `wsl.exe`, Settings) or the lab-server stack on q-lab (Prometheus/Grafana for monitoring, Velero for backup, k9s for Kubernetes).
The pre-cull 0.65 test:prod ratio was a symptom — most tests mocked OS calls and asserted on the mocks, which proves wiring not behavior.

What survived

| Category | Scripts |
|---|---|
| Windows first-time-setup | fresh-windows-setup, export/install-from-exported-packages, Compare-SoftwareInventory |
| Windows maintenance | system-updates + Install-SystemUpdatesTask |
| Windows backup | Backup-DeveloperEnvironment (snapshot before rebuild) |
| Windows development | Manage-Docker, remote-development-setup |
| Windows network | Set-StaticIP |
| Windows troubleshooting | Repair-CommonIssues |
| Linux | nvidia-gpu-exporter, disk-cleanup, headless-server-setup |

What was killed

| Category | Reason |
|---|---|
| All 5 Windows `monitoring/*` | Task Manager, Sysinternals, Reliability Monitor, and Event Viewer cover it; no scheduling and no alerting on this single-user box |
| 5 of 6 Windows `backup/*` | OneDrive, browser sync, and Velero (on q-lab) cover it; the restore paths were never tested in a real disaster |
| Windows `Test-DevEnvironment.ps1`, `Manage-WSL.ps1` | `wsl.exe` already covers WSL management; the dev-env checker had no clear trigger |
| Windows `reporting/`, `security/Get-UserAccountAudit`, `network/Manage-VPN` | A single-user laptop doesn't need an AD audit; the OpenVPN GUI handles VPN |
| 3 of 4 Linux `maintenance/*` | `apt` and `journalctl --vacuum-time` cover it; the rollback script was speculative |
| Linux `docker-cleanup`, `pod-health-monitor`, `service-health-monitor` | q-lab Prometheus/k9s cover these |
| Linux `security-hardening.sh` | Lives in `defensive-toolkit` |
| 5 umbrella Pester test files (Backup, Monitoring, Tier2, Tier3, DeveloperEnvironment) | Pre-dated the per-script behavioral tests; redundant or referenced deleted scripts |
| 6 dir READMEs | Their directories are now empty |

Docs rewritten

`README.md` — script catalog reflects post-cull scope
`QUICKSTART.md` — examples use surviving scripts
`BACKLOG.md` — Current State block rewritten; Sprint 7 cancelled; closeouts retained as history
`docs/ROADMAP.md` — strategic doc rewritten to reflect deliberate-narrow-scope philosophy
`CHANGELOG.md` — new 3.0.0 entry documenting the cull
Subdirectory READMEs for surviving categories (Windows/backup, Windows/network, Windows/development, Linux/maintenance, Linux/monitoring)
`tests/Linux/maintenance.bats` trimmed to disk-cleanup only

Policy going forward

Any script that goes 6 months without a `fix:` commit triggered by real failure is a candidate for archival.
No more "behavioral coverage" sprints — favor smoke tests for the surviving setup scripts; don't mock-pad scripts that aren't invoked.
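The archival rule is mechanical enough to script. Below is a minimal bash sketch that is not part of the toolkit: `archival_candidates` is a hypothetical helper, and it only approximates "a `fix:` commit triggered by real failure" by matching Conventional Commits subjects, since git cannot know what actually triggered a fix.

```bash
#!/usr/bin/env bash
# Sketch: list files that have no "fix:" commit within a time window.
# Assumption: "a fix triggered by real failure" is approximated by a
# Conventional Commits subject starting with fix: / fix(scope): / fix!:.
# Whether a fix came from a real failure stays a human judgment.
archival_candidates() {
  local since="$1"
  shift
  local f subjects
  for f in "$@"; do
    # One subject line per commit in the window that touched $f.
    subjects=$(git log --since="$since" --format=%s -- "$f")
    if ! grep -qE '^fix(\([^)]*\))?!?:' <<<"$subjects"; then
      echo "$f"
    fi
  done
}
```

For example, `mapfile -t files < <(git ls-files '*.ps1' '*.sh'); archival_candidates "6 months ago" "${files[@]}"` prints the files with no qualifying fix in the window; deciding whether to archive them is still a manual call.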

Note on commit count

This PR includes 8 commits: the 1 cull commit + 7 pre-existing local-main commits that hadn't been pushed to origin (CI fix, kubernetes-cli choco pin, Sprint 4.x/5.x test backfill, Sprint 5.2 refactor, Sprint 6.1 BATS unify, Sprint 6.3 retry/backoff). Merging this PR will catch origin/main up to local main + apply the cull.

Test plan

Surviving Pester suite passes: 607 tests, 0 failures (was 1320 — the delta went with the deleted scripts, no orphan refs)
`tests/Linux/maintenance.bats` BATS-preprocessor-syntax sanity check
CI runs green on this PR

…4.3) Rename Main to Invoke-BrowserProfileBackup with a mirrored param() block so the testability guard forwards every script param explicitly (same pattern as Sprint 4.2). Replace inner exit N with return N so the function returns an exit code cleanly. Add explicit -Path to Get-BackupDirectory and -BackupDir to Get-BackupList so they are independently testable without relying on dynamic-scope reads of the script-level $OutputPath.

34 Pester tests cover the 11 helpers:
- Get-BackupDirectory: custom path and fallback
- Test-BrowserInstalled: per-browser path probe
- Get-FirefoxProfiles: INI parsing for both IsRelative=1 and IsRelative=0
- Get-BrowserExtensions: Chrome/Edge/Brave manifest parsing, including the __MSG_ locale-placeholder fallback and the corrupt-JSON fallback, plus Firefox extension enumeration
- Export-BookmarksToHtml: valid Netscape-bookmark-file-1 HTML for Chrome and the Firefox info-line placeholder
- Backup-BrowserProfile: happy path, IncludeCookies+IncludeHistory switches, error catch, Firefox multi-profile iteration
- Compress-BackupFolder: zip + remove original, and failure fallback
- Remove-OldBackups: RetentionDays=0 short-circuit and age-cutoff filtering
- Get-BackupList: filename-pattern parsing and skipping of unmatched filenames
- Restore-BrowserProfile: missing archive, no Firefox profile, temp-dir cleanup in the finally block
- Export-HtmlReport: file output

plus the top-level Invoke-BrowserProfileBackup paths (ListBackups empty / with rows, Restore without target returns 1, no browser installed returns 2, IncludePasswords writes PASSWORD_REMINDER.txt).

New Pester gotcha documented: Get-Content's -Path and -LiteralPath are distinct parameters. A Mock ParameterFilter on $Path does not see -LiteralPath values, so when tests verify written files via Get-Content -LiteralPath, the mock filter still matches (with $Path null) and returns mocked content instead of the real file. The workaround is to read verification files via [System.IO.File]::ReadAllText() so the test bypasses the mock entirely.

Coverage 33.54% -> 37.27% (+3.73 pp). Tests 1214 -> 1248.

Wrap the top-level try/catch in Invoke-SystemStateExport with a mirrored param() block so the testability guard forwards every script param explicitly (same pattern as Sprint 4.2/4.3). Replace inner exit N with return N. Helpers that read $DryRun via dynamic scope are left as-is; tests exercise the production path and use Invoke-SystemStateExport's own -DryRun param for the dry-run code path.

18 Pester tests cover:
- Get-ExportComponents: 'All' expansion and explicit-list passthrough
- New-ExportFolder: timestamped folder + subdir structure
- Export-Drivers: success and Get-PnpDevice throw
- Export-Services: JSON + CSV output
- Export-WindowsFeatures: windows-features.json
- Export-NetworkConfig: adapters / ip-config / routes / dns / firewall
- Export-ScheduledTasks: verifying that the \Microsoft\* path filter excludes system tasks
- Export-EventLogs
- New-ExportManifest: valid JSON with components/results
- Compress-ExportFolder: zip + remove, and failure fallback
- Export-HTMLReport: file output with stats/results table
- Export-JSONReport: ComputerName/Statistics/Results structure
- Invoke-SystemStateExport top level: -DryRun returns 0, fatal error returns 1, dispatcher only invokes listed components

Coverage 37.27% -> 40.83% (+3.56 pp, crossing the 40% threshold). Tests 1248 -> 1266.
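The 'All' expansion and the "dispatcher only invokes listed components" behavior can be illustrated in bash. This is a hypothetical sketch of the logic, not the PowerShell script: the component names mirror the Export-* helpers listed above, while `get_export_components`, `invoke_components`, and the `run_*` stand-ins are invented for illustration.

```bash
#!/usr/bin/env bash
# Sketch: "All" expansion plus a dispatcher that runs only the requested
# components. Function names are hypothetical stand-ins.
ALL_COMPONENTS=(Drivers Services WindowsFeatures NetworkConfig ScheduledTasks EventLogs)

get_export_components() {
  local c
  for c in "$@"; do
    if [[ "$c" == "All" ]]; then
      printf '%s\n' "${ALL_COMPONENTS[@]}"   # 'All' expands to every component
      return 0
    fi
  done
  printf '%s\n' "$@"                          # an explicit list passes through
}

invoke_components() {
  local comp
  while IFS= read -r comp; do
    [[ -n "$comp" ]] || continue
    if declare -F "run_${comp}" >/dev/null; then
      "run_${comp}"                           # only listed components run
    else
      echo "unknown component: $comp" >&2
    fi
  done < <(get_export_components "$@")
}

# Stand-in exporters: each just reports that it ran.
run_Drivers()         { echo "Drivers"; }
run_Services()        { echo "Services"; }
run_WindowsFeatures() { echo "WindowsFeatures"; }
run_NetworkConfig()   { echo "NetworkConfig"; }
run_ScheduledTasks()  { echo "ScheduledTasks"; }
run_EventLogs()       { echo "EventLogs"; }
```

Keeping expansion separate from dispatch means each half can be asserted on independently, which is roughly how the tests above split 'All' expansion from the dispatcher check.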

Wrap the top-level try/catch/finally in Invoke-BackupIntegrityTest with a mirrored param() block so the testability guard forwards every script param explicitly (same pattern as Sprint 4.2-4.4). Replace inner exit N with return N. The finally-block temp-folder cleanup stays inside the function, so it still runs whether the function returns normally or throws.

27 Pester tests cover the 10 helper functions + Invoke top-level. Tests build a real ZIP archive in $TestDrive (containing a real backup_metadata.json with SHA256 hashes), so the archive helpers run end-to-end against real bytes: no Mock for Compress-Archive, Expand-Archive, or [System.IO.Compression.ZipFile]::OpenRead. This gives much higher signal than a fully mocked test pyramid because the actual zip-parsing branches execute. Coverage:
- Format-FileSize bytes/KB/MB/GB boundaries.
- Get-BackupInfo archive vs folder vs corrupted-archive paths.
- Test-ArchiveStructure valid + corrupt.
- Get-BackupMetadata archive-internal, folder-resident, missing-file warning path.
- Expand-BackupToTemp success path (real extraction + $script:TempFolder bookkeeping) and failure path.
- Test-FileHashes Skipped=true when no FileHashes in metadata, HashesMatched++ on a real SHA256 match, mismatched-hash recording in Stats.FailedFiles.
- Test-FileExtraction readable vs corrupted-archive error.
- Restore-ToTarget archive+folder paths and Expand-Archive failure.
- Remove-TempFolder existing folder + missing folder no-throw.
- Export-HTMLReport / Export-JSONReport file output.
- Invoke-BackupIntegrityTest Restore-without-target returns 1, Quick happy path returns 0, fatal-error path returns 1.

Coverage 40.83% -> 44.12% (+3.29 pp). Tests 1266 -> 1293.

Sprint 4 complete: 132 tests added (+13.96 pp, from 30.16% to 44.12%), 1 production bug fixed, 5 scripts in the backup category fully covered. Beat the +8-10 pp estimate.
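A bash analog of the hash check shows the matched/failed bookkeeping against real files rather than mocks. Two assumptions, also noted in the code: expected hashes come from a `sha256sum`-format manifest instead of the JSON `backup_metadata.json` the script reads, and `verify_hashes` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch of a Test-FileHashes-style check: count hash matches and record
# mismatches. Assumption: expected hashes come from a sha256sum-format
# manifest ("<hex>  <relative path>"), not the JSON backup_metadata.json.
verify_hashes() {
  local root="$1" manifest="$2"
  local matched=0 expected path actual
  local -a failed=()
  if [[ ! -s "$manifest" ]]; then
    echo "skipped"              # no hashes recorded: nothing to compare
    return 0
  fi
  while read -r expected path; do
    [[ -n "$expected" ]] || continue
    actual=$(sha256sum "$root/$path" 2>/dev/null | cut -d' ' -f1)
    if [[ "$actual" == "$expected" ]]; then
      matched=$((matched + 1))
    else
      failed+=("$path")         # missing or altered file
    fi
  done < "$manifest"
  echo "matched=$matched failed=${#failed[@]}"
  if (( ${#failed[@]} > 0 )); then
    printf 'FAILED %s\n' "${failed[@]}"
    return 1
  fi
  return 0
}
```

A test in this style builds real files, records their hashes, tampers with one, and checks that exactly that file is reported, with no mocking of the hashing tool.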

…nt 5.1) Refactor the straight-line main try/catch into Invoke-SystemPerformance with a mirrored param() block (same wrap-and-forward pattern as Sprint 4.2-4.5). Inner exit 1 becomes return 1; the testability guard at the bottom forwards every script param explicitly.

Fix one real production bug: four script-level parameters carried over from a "merged from Watch-DiskSpace.ps1" refactor (-IncludeDiskAnalysis, -AutoCleanup, -TopFilesCount, -TopFoldersCount) were declared in the script's param block but never wired into the main flow. Get-DiskAnalysis existed as a complete helper; it was simply never called. Users who passed -IncludeDiskAnalysis or -AutoCleanup saw the parameter accepted without a parser error and walked away thinking they had a disk-analysis report when nothing had run. Reconnected the wire-up: when -IncludeDiskAnalysis is set in single-run mode, main now calls Get-DiskAnalysis -DiskVolumes $metrics.DiskVolumes -EnableAutoCleanup:$AutoCleanup after metrics collection. The continuous-monitor loop deliberately skips this (repeated heavy disk scans would not make sense for a real-time monitor).

23 Pester tests cover the major helpers:
- Get-ThresholdAlerts: Critical/Warning bands and multi-alert combinations
- Get-TopProcesses: sort + top-N, PID 0 filter, and Get-Process-throws fallback
- Get-SystemInfo: CIM aggregation
- Get-LargestFiles: >100MB filter
- Get-CleanupSuggestions: Temp / Windows-Update branches
- Invoke-DiskAutoCleanup: with the destructive Remove-Item and Clear-RecycleBin operations mocked so no real deletion can happen
- Get-DiskAnalysis dispatcher: Warning threshold gate, EnableAutoCleanup + Critical-only auto-clean trigger
- Export-JSONReport and Export-CSVReport: file output
- Invoke-SystemPerformance: happy path / fatal error / AlertOnly / IncludeDiskAnalysis wire-up verification

Updated the literal-'exit 1' meta-check in Monitoring.Tests.ps1 to also accept 'return 1' / 'exit $exitCode' (the new patterns introduced by this refactor).

Coverage 44.12% -> 46.76% (+2.64 pp). Tests 1293 -> 1316.
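The Warning/Critical banding that the Get-ThresholdAlerts tests cover can be sketched in bash. The function names and thresholds are hypothetical, and values are assumed to be integer percentages; this illustrates banding in general, not the deleted script's exact rules.

```bash
#!/usr/bin/env bash
# Sketch of Warning/Critical banding in the spirit of Get-ThresholdAlerts.
# Assumptions: integer percentages; thresholds supplied by the caller are
# illustrative, not the script's real defaults.
classify() {
  local value="$1" warn="$2" crit="$3"
  if (( value >= crit )); then
    echo "Critical"
  elif (( value >= warn )); then
    echo "Warning"
  else
    echo "OK"
  fi
}

# One alert line per metric that crosses a band; metrics are name=value pairs,
# so several alerts can fire together (the multi-alert case).
threshold_alerts() {
  local warn="$1" crit="$2"
  shift 2
  local pair name value level
  for pair in "$@"; do
    name="${pair%%=*}"
    value="${pair#*=}"
    level=$(classify "$value" "$warn" "$crit")
    if [[ "$level" != "OK" ]]; then
      echo "$level: $name at ${value}%"
    fi
  done
}
```

For example, `threshold_alerts 80 90 CPU=95 Memory=85 Disk=40` prints a Critical line for CPU and a Warning line for Memory, and nothing for Disk. Checking the Critical band first means a value above both thresholds is reported only once, at the higher severity.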

…ironmentTest (Sprint 5.2) Renames Main -> Invoke-DevEnvironmentTest with a mirrored param() block (Profile, RequirementsFile, AutoInstall, CheckSSH, CheckExtensions, OutputFormat, OutputPath). Replaces inner exit 1 / final exit $exitCode with return so the function returns an exit code cleanly. Adds a testability guard that splats script params into the function only when the script is not dot-sourced, the same pattern used in Sprints 4.x and 5.1.

Behavioral tests are deferred. The script depends on 17 external CLI tools (git, node, npm, python, pip, docker, kubectl, ssh, code, gh, az, terraform, ...). An initial attempt at stub-based tests caused Pester to exit with code 4 and zero output, and a separate run hung when the SSH probe bypassed the stub. BACKLOG already flagged this script as the "most expensive mocking surface in the repo"; the refactor pays for itself by uncovering bugs the same way Sprint 5.1 did.
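The refactor pattern in these commits (logic moved into a function that returns an exit code, plus a guard that only runs it when the script is not dot-sourced) has a common bash analog. The sketch below is illustrative only: the PR's scripts are PowerShell, and `invoke_export` with its `--dry-run`/`--target` flags is hypothetical.

```bash
#!/usr/bin/env bash
# Rough bash analog of the wrap-and-forward pattern (hypothetical names).

# All logic lives in a function that *returns* a status instead of calling
# exit, so a test can source the file and call it without killing its shell.
invoke_export() {
  local dry_run=0 target="./export"
  while (( $# > 0 )); do
    case "$1" in
      --dry-run) dry_run=1 ;;
      --target)
        if (( $# < 2 )); then
          printf '%s\n' "--target needs a value" >&2
          return 2
        fi
        target="$2"
        shift
        ;;
      *) printf 'unknown option: %s\n' "$1" >&2; return 2 ;;  # was: exit 2
    esac
    shift
  done
  if (( dry_run )); then
    echo "would export to $target"
    return 0
  fi
  echo "exporting to $target"
}

# Testability guard: run only when executed directly, not when sourced,
# forwarding every CLI argument. As the script's last command, its status
# becomes the script's exit code.
if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
  invoke_export "$@"
fi
```

A test can then `source` the file, which skips the guard, and call `invoke_export --bogus` to assert on the returned status 2 without the test shell itself exiting.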

Adds a -Linux switch and auto-detection so the runner covers both suites in one command. With no flags, runs whichever runner is available (Pester for Windows, bats for Linux) and skips the other with a warning. With an explicit -Windows or -Linux flag, errors out if that runner isn't installed. BATS results are parsed from TAP output: the plan line (1..N) gives the test count; ok/not-ok lines give pass/fail counts. Each .bats file under tests/Linux is invoked individually, mirroring the CI loop. Closes Sprint 6.1.
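The TAP counting rule described above (plan line for the total, `ok`/`not ok` lines for pass/fail) fits in a small bash function. This is a sketch assuming `bats --tap` output is piped in; the actual runner may parse differently, and `parse_tap` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch: turn TAP output into counts. The plan line "1..N" gives the total;
# "ok" / "not ok" lines give the pass/fail counts.
parse_tap() {
  local line total=0 passed=0 failed=0
  while IFS= read -r line; do
    case "$line" in
      1..*)
        total="${line#1..}"
        total="${total%% *}"         # drop any trailing "# skip ..." directive
        ;;
      "not ok "*) failed=$((failed + 1)) ;;
      "ok "*)     passed=$((passed + 1)) ;;
    esac                              # diagnostics ("# ...") are ignored
  done
  echo "total=$total passed=$passed failed=$failed"
}

# Usage, one .bats file at a time as in the CI loop (assumes bats-core):
#   for f in tests/Linux/*.bats; do bats --tap "$f" | parse_tap; done
```

Reading the total from the plan line rather than counting result lines means a suite that dies partway through shows up as a mismatch between `total` and `passed + failed`.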

…(Sprint 6.3) Restore-VsCodeExtension now retries a failed `code --install-extension` call once with a 5-second backoff before giving up. Previous behavior logged "Failed to install: <ext>" on any non-zero exit code and moved on, so a transient marketplace blip or rate-limit caused permanent extension loss across a restore. Retry is bounded (2 attempts max) so a real failure still completes the restore quickly. The old single "Continues iterating past failures" test is replaced by two contexts covering the new behavior: retry-success (extension fails attempt 1, succeeds attempt 2: count reflects success, code invoked Total+1 times, Start-Sleep invoked once) and retry-failure (extension fails both attempts: count reflects failure, code still invoked Total+1 times). Start-Sleep is mocked so tests stay fast. Pester 1316 -> 1320 (24 tests in this file, all passing).
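The bounded retry maps naturally onto a small bash wrapper. This is a hypothetical sketch, not `Restore-VsCodeExtension` itself: `retry_once` and the `RETRY_DELAY` override are assumptions, with the delay defaulting to the 5 seconds the commit describes.

```bash
#!/usr/bin/env bash
# Sketch of a bounded retry: try a command, and on failure wait RETRY_DELAY
# seconds and try exactly once more (2 attempts max).
# RETRY_DELAY is a hypothetical knob so tests can skip the real wait,
# much as the Pester tests mock Start-Sleep.
RETRY_DELAY="${RETRY_DELAY:-5}"

retry_once() {
  local attempt max_attempts=2
  for (( attempt = 1; attempt <= max_attempts; attempt++ )); do
    if "$@"; then
      return 0
    fi
    if (( attempt < max_attempts )); then
      sleep "$RETRY_DELAY"            # backoff only between attempts
    fi
  done
  echo "Failed after $max_attempts attempts: $*" >&2
  return 1
}

# Usage (the extension ID is a placeholder):
#   retry_once code --install-extension publisher.extension-name
```

Capping at two attempts keeps the worst case per extension at one extra call plus one delay, so a genuinely broken extension cannot stall the whole restore.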

The 2026-06-14 audit (web research + git archaeology) found ~13 KLOC of production scripts and ~8 KLOC of tests defending behavior with no operational consumer on a single-user laptop. 174 commits over 14 months; only ~3 looked like "ran it, broke, fixed", and the rest was test backfill and refactor churn. Most categories duplicated either native Windows tools (Task Manager, Event Viewer, wsl.exe, Settings) or the lab-server stack on q-lab (Prometheus/Grafana for monitoring, Velero for backup, k9s for Kubernetes).

What survived: 4 first-time-setup scripts, system-updates + Install-SystemUpdatesTask, Backup-DeveloperEnvironment (snapshot before rebuild), Manage-Docker, remote-development-setup, Set-StaticIP, Repair-CommonIssues. Linux: nvidia-gpu-exporter, disk-cleanup, headless-server-setup.

What was killed:
- All 5 Windows monitoring scripts + their tests
- 5 of 6 Windows backup scripts + tests (Backup-DeveloperEnvironment survives)
- Windows Test-DevEnvironment.ps1, Manage-WSL.ps1
- Windows Get-SystemReport, Get-UserAccountAudit, Manage-VPN
- 3 of 4 Linux maintenance scripts (disk-cleanup survives)
- Linux service-health-monitor, docker-cleanup, pod-health-monitor, security-hardening (security work lives in defensive-toolkit)
- 5 umbrella Pester test files that pre-dated the per-script *.Behavioral.Tests.ps1 files
- 6 dir READMEs for now-empty categories

Docs rewritten: README, QUICKSTART, BACKLOG, docs/ROADMAP, CHANGELOG, subdirectory READMEs (Windows/backup, Windows/network, Windows/development, Linux/maintenance, Linux/monitoring). tests/Linux/maintenance.bats trimmed to disk-cleanup only.

New policy: any script that goes 6 months without a fix: commit triggered by real failure is a candidate for archival. Cancelled Sprint 7 (Linux coverage gaps), which would have produced more ghost code.

Surviving Pester suite: 607 tests passing (was 1320; the delta went with the deleted scripts, no orphan refs).

github-actions · 2026-06-14T16:00:45Z

Linux Test Results

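| Tests | Suites | Files | Passed ✅ | Skipped 💤 | Failed ❌ | Duration ⏱️ |
|---|---|---|---|---|---|---|
| 10 (-148) | 3 (-47) | 1 (±0) | 10 (-89) | 0 (±0) | 0 (-59) | 0s (-1s) |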

Results for commit e3c7cab. ± Comparison against base commit ebd7283.

This pull request removes 148 tests.

Docker Cleanup Script - File Structure.docker-cleanup.sh exists
Docker Cleanup Script - File Structure.docker-cleanup.sh has shebang
Kubernetes Monitoring Scripts - File Structure.File Permissions (Executable).pod-health-monitor.sh has shebang
Kubernetes Monitoring Scripts - File Structure.File Permissions (Executable).pvc-monitor.sh has shebang
Kubernetes Monitoring Scripts - File Structure.Required Files.pod-health-monitor.sh exists
Kubernetes Monitoring Scripts - File Structure.Required Files.pvc-monitor.sh exists
Linux Maintenance Scripts - File Structure.File Permissions (Executable).restore-previous-state.sh has shebang
Linux Maintenance Scripts - File Structure.File Permissions (Executable).system-updates.sh has shebang
Linux Maintenance Scripts - File Structure.Required Files.README.md exists
Linux Maintenance Scripts - File Structure.Required Files.config.example.json exists
…

♻️ This comment has been updated with latest results.

github-actions · 2026-06-14T16:03:26Z

Windows Test Results

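| Tests | Suites | Files | Passed ✅ | Skipped 💤 | Failed ❌ | Duration ⏱️ |
|---|---|---|---|---|---|---|
| 607 (-607) | 208 (-172) | 1 (±0) | 607 (-607) | 0 (±0) | 0 (±0) | 35s (-12s) |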

Results for commit e3c7cab. ± Comparison against base commit ebd7283.

This pull request removes 607 tests.

Backup Scripts - File Existence.Compare-SoftwareInventory.ps1 should exist
Backup Scripts - File Existence.Export-SystemState.ps1 should exist
Backup Scripts - File Existence.Test-BackupIntegrity.ps1 should exist
Backup-BrowserProfiles.ps1.Feature Implementation.Should define browser profile paths
Backup-BrowserProfiles.ps1.Feature Implementation.Should have backup compression functionality
Backup-BrowserProfiles.ps1.Feature Implementation.Should have bookmark export to HTML function
Backup-BrowserProfiles.ps1.Feature Implementation.Should have browser extension detection
Backup-BrowserProfiles.ps1.Feature Implementation.Should have restore functionality
Backup-BrowserProfiles.ps1.Feature Implementation.Should have retention policy cleanup
Backup-BrowserProfiles.ps1.Help Documentation.Should have description
…


The pin `fb74f38f7d00949e1ddd4e49e59ba5dd17f2bb46` annotated as v3.88.1 was never an actual git object in trufflesecurity/trufflehog; this is the same broken-SHA-pin pattern as the EnricoMi fix in commit 71d41b9. The real v3.88.1 SHA is d73edfb..., but since we're already disturbing the pin, this bumps to v3.95.5 (latest stable, 2026-06-02). The new SHA `d411fff7b8879a62509f3fa98c07f247ac089a51` was verified as a commit-type ref via the GitHub git/refs/tags API. Updates two call sites:
- .github/workflows/pr-checks.yml: TruffleHog PR scan
- .github/workflows/security-scan.yml: TruffleHog incremental scan
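The PR verified the new SHA through the GitHub API. A local alternative is sketched below in bash; `pin_is_commit` is a hypothetical helper, and it assumes a full clone of the action's repository. It relies on `git cat-file -t`, which fails for unknown objects and otherwise prints the object type, so a tree, blob, or annotated-tag SHA is rejected too.

```bash
#!/usr/bin/env bash
# Sketch: check that a pinned SHA names a real *commit* object in a local
# clone of the action's repository (assumption: history is fully fetched).
pin_is_commit() {
  local repo="$1" sha="$2" obj_type
  # cat-file -t prints the object type, or fails if the object doesn't exist.
  obj_type=$(git -C "$repo" cat-file -t "$sha" 2>/dev/null) || return 1
  [[ "$obj_type" == "commit" ]]
}

# Usage (a full clone; shallow clones may lack older objects):
#   git clone --quiet https://github.com/trufflesecurity/trufflehog.git /tmp/th
#   pin_is_commit /tmp/th d411fff7b8879a62509f3fa98c07f247ac089a51 && echo "pin OK"
```

Rejecting non-commit objects matters because a workflow pin should name the commit itself, not an annotated tag object that merely points at one.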

`pr-checks.yml`'s "Scan for common secrets patterns" step was excluding only `pr-checks.yml` itself, so the grep matched the literal "BEGIN RSA PRIVATE KEY" / "BEGIN OPENSSH PRIVATE KEY" pattern strings inside `security-scan.yml:49-50` and treated them as real secrets. This bug was always present but never surfaced because the trufflehog step ran first with a broken SHA pin (see commit 67301aa) and failed the workflow before this step was reached. Fixing the SHA pin exposed the dormant grep self-match. Aligning with the pattern that `security-scan.yml` already uses (`--exclude="*.yml"`). Trufflehog scans the YAML files for real secrets so coverage isn't lost; the naive grep is for non-workflow code anyway.
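The fix amounts to a recursive grep that skips workflow YAML. A minimal bash sketch, assuming GNU grep; `scan_for_key_headers` is a hypothetical name, and the real step in `pr-checks.yml` may phrase the command differently.

```bash
#!/usr/bin/env bash
# Sketch of the naive key-header grep with workflow YAML excluded, so the
# pattern strings inside the workflows can't self-match. TruffleHog still
# covers the YAML. The two patterns are the ones named in the commit.
scan_for_key_headers() {
  local root="${1:-.}"
  # Exit status 0 means "something matched", i.e. a possible secret.
  grep -rnE --exclude='*.yml' 'BEGIN (RSA|OPENSSH) PRIVATE KEY' "$root"
}

# Usage in a CI step:
#   if scan_for_key_headers .; then echo "Possible private key found" >&2; exit 1; fi
```

`--exclude` matches on the file's base name, so one glob covers every workflow file regardless of directory depth.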

Two pre-existing assertions in `tests/Linux/GPUMonitoring.Tests.ps1` were stricter than the script they cover; both failed once Linux Test Results actually started running (after the trufflehog SHA pin fix unblocked the workflow).
- `command -v nvidia-smi` -> accept either that or `check_command nvidia-smi` (the helper from `Linux/lib/bash/common-functions.sh` that the script now uses).
- `METRICS_DIR="/var/lib/prometheus/node-exporter"` -> `METRICS_DIR=.*?...` so the test passes against the current env-var-overridable form `METRICS_DIR="${METRICS_DIR:-/var/lib/prometheus/node-exporter}"`.

The script itself is unchanged. Local Pester: 10/10 pass.
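The env-var-overridable form relies on bash's `${VAR:-default}` expansion, and the relaxed assertion only needs to accept either assignment. A small bash sketch; `assigns_metrics_dir` and its regex are illustrative, not the Pester assertion itself.

```bash
#!/usr/bin/env bash
# Sketch of the env-var-overridable default: a METRICS_DIR set by the caller
# wins; otherwise the default path is used.
METRICS_DIR="${METRICS_DIR:-/var/lib/prometheus/node-exporter}"

# A tolerant text check that accepts both the old hard-coded assignment and
# the new ${METRICS_DIR:-...} form. Illustrative only; the real assertion
# lives in tests/Linux/GPUMonitoring.Tests.ps1 and is written in Pester.
assigns_metrics_dir() {
  grep -qE 'METRICS_DIR=.*/var/lib/prometheus/node-exporter' "$1"
}
```

Matching on the path rather than the exact assignment syntax keeps the check stable if the script later changes how it quotes or defaults the variable.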

`Trivy Filesystem Scan` has been failing for 54 consecutive runs on main since at least 2025-10-15. The currently pinned `trivy-action@v0.29.0` transitively references `aquasecurity/setup-trivy@v0.2.2`, which GitHub Actions can no longer resolve: the job aborts during "Set up job", before checkout and before the scan even starts. This was masked on PRs because the broken SHA pins in #10 made everyone assume CI churn was pin-related; it wasn't.

Changes:
- Bump `aquasecurity/trivy-action` from v0.29.0 to v0.36.0 (SHA `ed142fd...`). v0.36.0 is the current release after the March 2026 trivy-action supply-chain incident; releases 0.0.1-0.34.2 had their tags re-pointed during the compromise and should be avoided.
- Add `TRIVY_DB_REPOSITORY` / `TRIVY_JAVA_DB_REPOSITORY` env vars pointing at the public.ecr.aws mirrors. GHCR rate limiting of trivy-action runs is the dominant CI failure mode in 2025-2026 (trivy-action#389); the AWS ECR mirror sidesteps it.
- Set `ignore-unfixed: true`, `exit-code: 0`, `continue-on-error: true`. This makes Trivy advisory: findings still appear in the GitHub Security tab via the existing SARIF upload, but they don't gate merges. For a single-developer toolkit with no production blast radius, that's the right altitude.

Adding to PR #10 because both fixes touch security-scan.yml and the broader theme is the same: stale CI plumbing exposed once the trufflehog pin fix from PR #9 unblocked the workflow chain.

Dashtid added 8 commits June 11, 2026 21:08

github-actions Bot added the docker, documentation, kubernetes, linux, maintenance, monitoring, security, and windows labels Jun 14, 2026

github-actions Bot added the ci/cd label Jun 14, 2026

Dashtid added 2 commits June 14, 2026 19:11

Dashtid merged commit dfae1cb into main Jun 14, 2026
14 checks passed

Dashtid deleted the chore/cull-ghost-code branch June 14, 2026 17:46

Dashtid mentioned this pull request Jun 14, 2026 in fix(ci): repoint three more broken SHA pins in security-scan.yml #10 (merged, 2 tasks)

Dashtid mentioned this pull request Jun 14, 2026 in docs(backlog): empty Active work; move shrinks to Cancelled #11 (merged, 1 task)

Dashtid merged 11 commits into main from chore/cull-ghost-code
