feat(scheduler): K8s scheduler backend + forge package manifests + docs sweep (closes #162) #172
Merged
… CRUD (#162 part 2b)

Second half of part 2 of the #162 stack. Builds on the ScheduleBackend interface + FileBackend refactor + manifest helpers shipped in part 2 (PR #169). Adds the real K8s runtime backend, dependency on k8s.io/client-go, and the forge.yaml-driven backend selection in the runner.

forge-cli/runtime/scheduler_k8s_backend.go

KubernetesBackend implements scheduler.Backend by delegating persistence + timing to the cluster's CronJob controller:
- Start/Stop/Reload are no-ops (cluster owns timing)
- Sync reconciles cluster CronJobs against declared yaml entries: create, update on drift, prune dropped yaml entries, PRESERVE LLM-sourced entries unconditionally
- Set / Delete are gated by AllowDynamic (default false); yaml-sourced CronJobs cannot be deleted via direct Delete (Sync is the only removal path for declarative entries)
- List filters by forge.agent.id label so unrelated CronJobs in the namespace don't appear in schedule_list output
- History returns empty + warns once; the audit stream's schedule_fire/complete events are the canonical source
- CronJobs are constructed in-memory to match the YAML the forge-core scheduler.CronJobYAML emits byte-for-byte, so runtime reconcile doesn't churn against forge package manifests (#162 part 3)
- Round-trips Schedule <-> CronJob via labels (agent.id, schedule.id, schedule.source) and annotations (task, skill, channel, channel_target, run_count, last_status)
- LastRun read from CronJob.Status.LastScheduleTime
- NewKubernetesBackendWithClient testing seam accepts an explicit kubernetes.Interface (fake.Clientset in tests)

forge-cli/runtime/runner.go

Backend selection wired off forge.yaml scheduler.backend:
- "kubernetes": always K8s; errors at startup when not in-cluster (FORGE_IN_CLUSTER=true overrides for tests)
- "file": always FileBackend
- "auto"/"": K8s when in-cluster, file otherwise

Drops the old syncYAMLSchedules helper now that the runner calls Backend.Sync(declaredSchedules()) for both modes.
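
The backend selection rules for runner.go can be sketched as a small pure function. This is a minimal illustration under assumptions, not the code in this PR: `selectBackend`, `BackendKind`, and `ErrNotInCluster` are hypothetical names, the `inCluster` flag stands in for whatever in-cluster detection the runner performs, and rejecting unknown values is a guess about behavior the commit message does not describe.

```go
package main

import (
	"errors"
	"fmt"
)

// BackendKind names which backend the runner would construct (hypothetical type).
type BackendKind string

const (
	KindFile       BackendKind = "file"
	KindKubernetes BackendKind = "kubernetes"
)

// ErrNotInCluster is a hypothetical sentinel for the startup error described above.
var ErrNotInCluster = errors.New(`scheduler.backend "kubernetes" requires running in-cluster`)

// selectBackend maps the forge.yaml scheduler.backend value to a backend kind.
// inCluster stands in for the runner's in-cluster detection.
func selectBackend(configured string, inCluster bool) (BackendKind, error) {
	switch configured {
	case "kubernetes":
		// Explicit K8s mode: refuse to start outside a cluster.
		if !inCluster {
			return "", ErrNotInCluster
		}
		return KindKubernetes, nil
	case "file":
		return KindFile, nil
	case "", "auto":
		// Auto mode: K8s when in-cluster, file otherwise.
		if inCluster {
			return KindKubernetes, nil
		}
		return KindFile, nil
	default:
		// Assumption: unknown values are rejected rather than silently defaulted.
		return "", fmt.Errorf("unknown scheduler.backend %q", configured)
	}
}

func main() {
	for _, mode := range []string{"kubernetes", "file", "auto", ""} {
		kind, err := selectBackend(mode, false)
		fmt.Printf("backend=%q inCluster=false kind=%q err=%v\n", mode, kind, err)
	}
}
```

Keeping the decision in a pure function like this would let the "kubernetes outside a cluster" failure be unit-tested without touching a real cluster.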
go.mod

k8s.io/api, k8s.io/apimachinery, k8s.io/client-go @ v0.36.2.

Tests (9 cases against fake.Clientset):
- Sync creates CronJobs for declared entries with the expected labels + ConcurrencyPolicy=Forbid
- Sync is idempotent (no churn on no-op re-run)
- Sync updates on cron drift
- Sync prunes yaml entries removed from the manifest
- Sync preserves LLM-sourced entries on yaml-only re-Sync
- Dynamic Set is gated by AllowDynamic with an actionable error referencing the config flag
- Dynamic Delete of a yaml-sourced schedule is refused with an error pointing operators at the manifest
- List filters by forge.agent.id (unrelated CronJobs in the namespace are not returned)
- History returns empty + does not error

Docs:
- docs/deployment/scheduler-kubernetes.md gains the RBAC table, the annotation round-trip reference, and the "what's not in the K8s backend" section flagging schedule_history deferral + cross-namespace out-of-scope + token rotation as a follow-up.

Refs #162
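
The Sync semantics exercised by the tests above (create, update on drift, prune dropped yaml entries, preserve LLM-sourced entries) can be sketched as a planning step over plain structs. This is a simplified model, not the KubernetesBackend code: `Entry`, `Plan`, and `planSync` are hypothetical, the source values `"yaml"` and `"llm"` are assumed spellings, and drift is reduced to a cron-string comparison, whereas the real backend compares whole CronJob objects through client-go.

```go
package main

import (
	"fmt"
	"sort"
)

// Entry is a simplified stand-in for a schedule (hypothetical type).
type Entry struct {
	ID     string
	Cron   string
	Source string // assumed values: "yaml" or "llm"
}

// Plan lists the schedule IDs a reconcile pass would act on.
type Plan struct {
	Create []string
	Update []string
	Prune  []string
}

// planSync compares declared yaml entries with what the cluster holds.
func planSync(declared, cluster []Entry) Plan {
	want := make(map[string]Entry, len(declared))
	for _, d := range declared {
		want[d.ID] = d
	}

	var p Plan
	seen := make(map[string]bool, len(cluster))
	for _, c := range cluster {
		seen[c.ID] = true
		d, stillDeclared := want[c.ID]
		switch {
		case stillDeclared && d.Cron != c.Cron:
			// Drift: the declared cron expression changed.
			p.Update = append(p.Update, c.ID)
		case !stillDeclared && c.Source != "llm":
			// Dropped from the manifest; LLM-sourced entries are never pruned.
			p.Prune = append(p.Prune, c.ID)
		}
	}
	for _, d := range declared {
		if !seen[d.ID] {
			p.Create = append(p.Create, d.ID)
		}
	}

	sort.Strings(p.Create)
	sort.Strings(p.Update)
	sort.Strings(p.Prune)
	return p
}

func main() {
	declared := []Entry{
		{ID: "daily-report", Cron: "0 9 * * *", Source: "yaml"},
		{ID: "hourly-sync", Cron: "0 * * * *", Source: "yaml"},
	}
	cluster := []Entry{
		{ID: "hourly-sync", Cron: "30 * * * *", Source: "yaml"},
		{ID: "old-job", Cron: "0 0 * * *", Source: "yaml"},
		{ID: "llm-reminder", Cron: "0 12 * * 1", Source: "llm"},
	}
	fmt.Printf("%+v\n", planSync(declared, cluster))
}
```

Running the planner a second time against a cluster state that already matches the declared entries yields an empty plan, which is the idempotency property the second test case checks.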
…ts for schedules (#162 part 3)

Final piece of the #162 stack. Adds ScheduleManifestStage to the build pipeline so `forge package` materializes the K8s manifests needed to deploy yaml-declared schedules as native CronJobs.

forge-cli/build/schedule_manifest_stage.go

New ScheduleManifestStage runs after K8sStage. For each forge.yaml `schedules[]` entry it emits:
- k8s/cronjob-<id>.yaml: one CronJob per schedule, body built from scheduler.CronJobYAML so it's byte-equivalent to what the KubernetesBackend (#162 part 2b) writes via client-go
- k8s/internal-token-secret.yaml: Secret WITHOUT a data field; operator populates out-of-band via `forge auth secret-yaml` (#162 part 1), ExternalSecrets / Sealed Secrets / SOPS / Vault. Manifest carries an inline runbook comment so an operator inspecting the file knows the populate paths without reading the docs separately
- k8s/scheduler-role.yaml: Role scoped to the agent's namespace. Verbs gated by allow_dynamic: get/list/watch always; create/update/patch/delete only when allow_dynamic=true
- k8s/scheduler-rolebinding.yaml: binds the Role to the agent's ServiceAccount (named after the agent_id, matching the existing deployment template convention)

No-ops when forge.yaml has no schedules[]. Opts out when scheduler.backend=file is explicit.

Default service URL is the in-cluster Service DNS for the agent on Runtime.Port (or 8080 when Spec.Runtime is unset); operators override with scheduler.kubernetes.service_url for non-default port/namespace or Ingress/Gateway-fronted deploys.

forge-cli/cmd/build.go

Register ScheduleManifestStage between K8sStage and ValidateStage so validation catches manifest issues before signing.
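
The default service URL rule described above (explicit override first, then the in-cluster Service DNS on Runtime.Port, falling back to 8080) might look roughly like the sketch below. The function name, the `http` scheme, the use of the agent ID as the Service name, and the `<service>.<namespace>.svc.cluster.local` form are all assumptions for illustration; the commit message does not spell out the exact URL the stage generates.

```go
package main

import "fmt"

// defaultServiceURL is a hypothetical helper illustrating the override and
// port-fallback rules. override models scheduler.kubernetes.service_url;
// port models Runtime.Port, with 0 meaning "Spec.Runtime is unset".
func defaultServiceURL(agentID, namespace string, port int, override string) string {
	if override != "" {
		// Operators fronting the agent with Ingress/Gateway set this explicitly.
		return override
	}
	if port == 0 {
		port = 8080
	}
	// Assumed form: standard Kubernetes Service DNS name over plain HTTP.
	return fmt.Sprintf("http://%s.%s.svc.cluster.local:%d", agentID, namespace, port)
}

func main() {
	fmt.Println(defaultServiceURL("my-agent", "prod", 0, ""))
	fmt.Println(defaultServiceURL("my-agent", "prod", 9090, ""))
	fmt.Println(defaultServiceURL("my-agent", "prod", 9090, "https://agents.example.com/my-agent"))
}
```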
Tests (7 cases):
- No-op when no schedules
- No-op when scheduler.backend=file (operator opted out)
- Emits one CronJob per schedule with the right labels and concurrencyPolicy=Forbid; default service URL points at the in-cluster Service DNS on Runtime.Port
- Secret template has NO uncommented `data:` line (security invariant: generated manifest must not carry a credential), documents the forge-auth-secret-yaml populate path inline
- Role verbs gated by allow_dynamic: parses the `verbs:` line specifically (the prose comment legitimately mentions create/update/delete) and asserts get/list/watch only when off, plus create/update/patch/delete when on
- RoleBinding subject is a ServiceAccount named after agent_id and roleRef points at the matching Role
- Explicit scheduler.kubernetes.service_url overrides the in-cluster DNS default (important for Ingress/Gateway deploys)

The stack is now complete: part 1 ships the operator-facing token primitives (forge auth subcommand), part 2 ships the ScheduleBackend interface + FileBackend refactor + manifest helpers, part 2b ships the KubernetesBackend with client-go for runtime CronJob CRUD, and this PR ships the build-pipeline integration so operators get an end-to-end K8s-native scheduled-agent deploy via `forge package && kubectl apply -k ./k8s`.

Closes #162
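
The allow_dynamic gating of Role verbs described in this commit reduces to a few lines. The sketch below is illustrative only; `schedulerRoleVerbs` is a hypothetical name, and the real stage renders the verbs into k8s/scheduler-role.yaml rather than returning a slice.

```go
package main

import (
	"fmt"
	"strings"
)

// schedulerRoleVerbs returns the verbs granted on CronJobs.
// Read verbs are always present; write verbs only when allowDynamic is true.
func schedulerRoleVerbs(allowDynamic bool) []string {
	verbs := []string{"get", "list", "watch"}
	if allowDynamic {
		verbs = append(verbs, "create", "update", "patch", "delete")
	}
	return verbs
}

func main() {
	fmt.Println(strings.Join(schedulerRoleVerbs(false), ",")) // get,list,watch
	fmt.Println(strings.Join(schedulerRoleVerbs(true), ","))  // get,list,watch,create,update,patch,delete
}
```

Defaulting to the read-only set keeps the generated Role least-privilege unless the operator opts in to dynamic schedules.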
…knowledge skill

Sweeps the doc areas mapped to the code changes on this branch (parts 2/2b/3 of #162) plus the recently merged work that landed on main since the last sync-docs run.

docs/core-concepts/scheduling.md
Replace single-paragraph "Execution Details" with a backend comparison table + the scheduler.yaml block + the cross-link to the new scheduler-kubernetes.md reference. Notes that K8s mode defers history to the audit stream instead of writing SCHEDULES.md.

docs/reference/forge-yaml-schema.md
Add the `scheduler:` block under `schedules:` documenting the three backend modes and the `kubernetes:` tuning fields (namespace, service_url, allow_dynamic, trigger_image, auth_secret_name).

docs/reference/cli-reference.md
Add the `forge auth` subcommand reference (PR #168 / #162 part 1) with the show-token / mint-token / secret-yaml flows and the label-tracks-forge.yaml invariant note.

README.md
Link scheduler-kubernetes.md from the documentation index.

.claude/skills/forge.md
- Section 9 (Scheduling): hybrid Backend interface, FileBackend / KubernetesBackend table, InCluster() detection, AllowDynamic rule, cross-link to scheduler-kubernetes.md.
- Section 11 (Build pipeline): new ScheduleManifestStage row (6a) documenting the cronjob/secret-template/role triplet.
- Section 13 (CLI surface): `forge auth` row.
- Section 14 (forge.yaml): scheduler block in the example yaml.
- Section 18 (Workstream recap): three new rows: Guardrails audit (#155/#159/#161), Tenancy + entity stamping (#157/#164), K8s scheduler (#162).
- Section 19 (Docs map): add tenancy.md + scheduler-kubernetes.md tree entries; annotate guardrails.md and audit-logging.md with the latest issue refs.

No code changes; pure docs sweep.
k8s.io/client-go v0.36 requires Go 1.26, which bumped forge-cli/go.mod to `go 1.26.0` and broke CI on Go 1.25.x with:

    compile: version "go1.26.0" does not match go tool version "go1.25.11"

Downgrade to k8s.io/api / apimachinery / client-go @ v0.34.1, the most recent series that still supports Go 1.25. KubernetesBackend API surface is identical between v0.34 and v0.36 (BatchV1 CronJobs, fake.Clientset, in-cluster config); no code changes required. All 16 new tests across the two scheduler files + 7 build-stage tests pass on v0.34.1.

Restore `go 1.25.0` directive in forge-cli/go.mod to match forge-core / forge-plugins / forge-skills / forge-ui.
…idy)

go.work's `go 1.26.0` directive was overriding the individual go.mod files (all at 1.25.0) and forcing the Go toolchain to download 1.26, which then mismatched CI's installed Go 1.25.x and broke compilation of any package the workspace touched, not just the k8s.io-dependent forge-cli code.

Symptom (in CI):

    compile: version "go1.26.0" does not match go tool version "go1.25.11"

…repeated for every internal Go runtime package across forge-core/channels, forge-core/compiler, forge-core/pipeline, forge-core/plugins, forge-cli/config, forge-cli/cmd/forge, forge-cli/internal/tui*, forge-cli/skills, forge-cli/templates.

The go.work bump was an unintended side-effect of the earlier `go get k8s.io/client-go@latest` invocation: go mod tidy propagates the highest go directive across the workspace. Reset go.work to `go 1.25.0` to match the rest of the modules. The previous k8s.io pin to v0.34.1 (which supports Go 1.25) stands.
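
A drift like the one fixed in the last two commits (one `go` directive in the workspace moving ahead of the rest) could be caught by a small consistency check. The sketch below is a hypothetical guard, not something this PR adds: `goDirective` and `mismatched` are illustrative names, and the parsing is a plain line scan rather than the `golang.org/x/mod/modfile` parser, so it only handles the simple `go X.Y.Z` form.

```go
package main

import (
	"bufio"
	"fmt"
	"sort"
	"strings"
)

// goDirective returns the version from the first `go X.Y.Z` line of a
// go.mod or go.work body, or "" when no such line exists.
func goDirective(content string) string {
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) >= 2 && fields[0] == "go" {
			return fields[1]
		}
	}
	return ""
}

// mismatched returns the names of files whose go directive differs from want.
func mismatched(files map[string]string, want string) []string {
	var bad []string
	for name, content := range files {
		if goDirective(content) != want {
			bad = append(bad, name)
		}
	}
	sort.Strings(bad)
	return bad
}

func main() {
	// File contents are inlined here; a real check would read them from disk.
	files := map[string]string{
		"go.work":          "go 1.26.0\n\nuse (\n\t./forge-cli\n\t./forge-core\n)\n",
		"forge-cli/go.mod": "module example.com/forge-cli\n\ngo 1.25.0\n",
	}
	fmt.Println(mismatched(files, "1.25.0"))
}
```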
Summary
Brings to main the work from the stacked PRs that merged against their part-2 base instead of main: KubernetesBackend with client-go (originally #170), `forge package` manifest generation (originally #171), and the sync-docs sweep covering the full #162 stack plus other recently merged work (#155 #159 #161 #164 #168).
Part 2 (#169: `ScheduleBackend` interface + `FileBackend` + CronJob manifest helpers) is already in main from a separate merge; this PR is rebased onto it, so the diff shows only what's new.
Stack reconciliation
`forge auth` subcommand
`forge package` CronJob/Secret/Role
Three commits on top of main
- `feat(scheduler): KubernetesBackend with client-go for runtime CronJob CRUD`: runtime backend; reconciles via labels (`forge.agent.id`, `forge.schedule.id`, `forge.schedule.source`); preserves LLM-sourced CronJobs through Sync; `AllowDynamic` gates Set/Delete (default false); History defers to the audit stream. 9 tests against `fake.Clientset`.
- `feat(build): forge package emits CronJob/Secret-template/Role manifests`: `ScheduleManifestStage` between `K8sStage` and `ValidateStage`. Secret template has no `data` field (operator populates out-of-band via `forge auth secret-yaml`). Role verbs gated by `allow_dynamic`. 7 tests, including the credential-safety invariant.
- `docs(sync-docs): K8s scheduler stack + recent merges`: `docs/core-concepts/scheduling.md` (backend table), `docs/reference/forge-yaml-schema.md` (`scheduler:` block), `docs/reference/cli-reference.md` (`forge auth`), `README.md` (link), `.claude/skills/forge.md` (sections 9, 11, 13, 14, 18, 19).
End-to-end operator workflow now possible
Cluster takes over scheduling. CronJob ticks → curl with `Authorization: Bearer <token>` and `X-Forge-Schedule-Id: <id>` to the agent's A2A endpoint → normal A2A dispatch → audit chain (`auth_verify` source=internal → `session_start` → `llm_call` → `invocation_complete`).
Dependency footprint
`k8s.io/api`, `k8s.io/apimachinery`, `k8s.io/client-go` added to forge-cli only, pinned at v0.34.1 (the initial v0.36.2 pin required Go 1.26 and broke CI on Go 1.25.x). forge-core stays client-go-free so library consumers (Initializ platform importing `forge-core/runtime`) don't pay the ~10 MB transitive cost.
Test plan
- `go test -count=1 ./...` clean in forge-core and forge-cli (16 new tests across the two scheduler files; 7 new tests for the build stage)
- `golangci-lint run ./...` → 0 issues
- `gofmt -w` applied
- `docs/` and `.claude/skills/`: no real broken links (only regex false-positives from adjacent markdown links)
What this closes
Closes #162 — the full hybrid scheduler backend is now mergeable to main as one focused PR.
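
For reference, the trigger call from the end-to-end workflow (a CronJob tick issuing an HTTP request with `Authorization: Bearer <token>` and `X-Forge-Schedule-Id: <id>`) can be sketched with the standard library. The two header names come from the description above; everything else is an assumption for illustration, including the `newTriggerRequest` name, the POST method, the empty body, and the example URL. The actual CronJob uses curl, and the A2A payload is not described here.

```go
package main

import (
	"fmt"
	"net/http"
)

// newTriggerRequest builds the request a schedule tick would send to the
// agent's A2A endpoint. Method and body are assumptions; only the two
// headers are taken from the workflow description.
func newTriggerRequest(endpoint, token, scheduleID string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPost, endpoint, nil)
	if err != nil {
		return nil, err
	}
	// Internal bearer token, populated out-of-band into the Secret.
	req.Header.Set("Authorization", "Bearer "+token)
	// Identifies which schedule fired.
	req.Header.Set("X-Forge-Schedule-Id", scheduleID)
	return req, nil
}

func main() {
	// Hypothetical endpoint and values, for demonstration only.
	req, err := newTriggerRequest("http://my-agent.prod.svc.cluster.local:8080/", "example-token", "daily-report")
	if err != nil {
		fmt.Println("build request:", err)
		return
	}
	fmt.Println(req.Method, req.URL.String())
	fmt.Println("X-Forge-Schedule-Id:", req.Header.Get("X-Forge-Schedule-Id"))
}
```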