
feat(scheduler): K8s scheduler backend + forge package manifests + docs sweep (closes #162)#172

Merged
initializ-mk merged 5 commits into main from feat/issue-162-package-manifests
Jun 15, 2026

Summary

Brings to main the work from the stacked PRs that were merged into their stacked base branches (part-2, part-2b) instead of main: KubernetesBackend with client-go (originally #170), forge package manifest generation (originally #171), and the sync-docs sweep covering the full #162 stack plus other recently merged work (#155 #159 #161 #164 #168).

Part 2 (#169: ScheduleBackend interface + FileBackend + CronJob manifest helpers) is already in main from a separate merge; this PR is rebased onto it, so the diff shows only what's new.

Stack reconciliation

| Part | Original PR | Now |
| --- | --- | --- |
| 1: forge auth subcommand | #168 ✅ merged to main | included via main |
| 2: Backend interface + FileBackend + manifest helpers | #169 ✅ merged to main | included via main |
| 2b: KubernetesBackend (client-go) | #170 ✅ merged to part-2 branch | rolled into this PR |
| 3: forge package CronJob/Secret/Role | #171 ✅ merged to part-2b branch | rolled into this PR |
| Docs sweep | landed on part-3 branch | rolled into this PR |

Three commits on top of main

  1. feat(scheduler): KubernetesBackend with client-go for runtime CronJob CRUD. Runtime backend; reconciles via labels (forge.agent.id, forge.schedule.id, forge.schedule.source); preserves LLM-sourced CronJobs through Sync; AllowDynamic gates Set/Delete (default false); History defers to the audit stream. 9 tests against fake.Clientset.
  2. feat(build): forge package emits CronJob/Secret-template/Role manifests. Adds ScheduleManifestStage between K8sStage and ValidateStage. The Secret template has no data field (the operator populates it out-of-band via forge auth secret-yaml). Role verbs are gated by allow_dynamic. 7 tests, including the credential-safety invariant.
  3. docs(sync-docs): K8s scheduler stack + recent merges. Updates docs/core-concepts/scheduling.md (backend table), docs/reference/forge-yaml-schema.md (scheduler: block), docs/reference/cli-reference.md (forge auth), README.md (link), .claude/skills/forge.md (sections 9, 11, 13, 14, 18, 19).

End-to-end operator workflow now possible

forge build && forge package                        # emits k8s/cronjob-*.yaml + secret template + RBAC
forge auth mint-token > /dev/null                   # first deploy from clean checkout
forge auth secret-yaml | kubectl apply -f -         # populates the Secret out-of-band
kubectl apply -k .forge-output/k8s/                 # brings up the agent + CronJobs

The cluster takes over scheduling. Each CronJob tick → curl with `Authorization: Bearer <token>` and `X-Forge-Schedule-Id: <id>` to the agent's A2A endpoint → normal A2A dispatch → audit chain (auth_verify source=internal → session_start → llm_call → invocation_complete).

Dependency footprint

k8s.io/api, k8s.io/apimachinery and k8s.io/client-go are added to forge-cli only, pinned at v0.34.1 (the v0.36.2 initially pulled in requires Go 1.26; see the last two commits). forge-core stays client-go-free so library consumers (Initializ platform importing forge-core/runtime) don't pay the ~10 MB transitive cost.

Test plan

  • go test -count=1 ./... clean in forge-core and forge-cli (16 new tests across the two scheduler files; 7 new tests for the build stage)
  • golangci-lint run ./... → 0 issues
  • gofmt -w applied
  • Rebased on top of current main; no merge conflicts; the part-2 commit is correctly recognized as already-applied
  • Broken-link grep over docs/ and .claude/skills/ — no real broken links (only regex false-positives from adjacent markdown links)

What this closes

Closes #162 — the full hybrid scheduler backend is now mergeable to main as one focused PR.

… CRUD (#162 part 2b)

Second half of part 2 of the #162 stack. Builds on the
ScheduleBackend interface + FileBackend refactor + manifest
helpers shipped in part 2 (PR #169). Adds the real K8s
runtime backend, dependency on k8s.io/client-go, and the
forge.yaml-driven backend selection in the runner.

forge-cli/runtime/scheduler_k8s_backend.go
  KubernetesBackend implements scheduler.Backend by delegating
  persistence + timing to the cluster's CronJob controller:

  - Start/Stop/Reload are no-ops (cluster owns timing)
  - Sync reconciles cluster CronJobs against declared yaml
    entries: create, update on drift, prune dropped yaml
    entries, PRESERVE LLM-sourced entries unconditionally
  - Set / Delete are gated by AllowDynamic (default false);
    yaml-sourced CronJobs cannot be deleted via direct Delete
    (Sync is the only removal path for declarative entries)
  - List filters by forge.agent.id label so unrelated CronJobs
    in the namespace don't appear in schedule_list output
  - History returns empty + warns once; the audit stream's
    schedule_fire/complete events are the canonical source
  - CronJobs are constructed in-memory to match the YAML the
    forge-core scheduler.CronJobYAML emits byte-for-byte, so
    runtime reconcile doesn't churn against forge package
    manifests (#162 part 3)
  - Round-trips Schedule <-> CronJob via labels (agent.id,
    schedule.id, schedule.source) and annotations (task, skill,
    channel, channel_target, run_count, last_status)
  - LastRun read from CronJob.Status.LastScheduleTime
  - NewKubernetesBackendWithClient testing seam accepts an
    explicit kubernetes.Interface (fake.Clientset in tests)
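The Sync rules listed above (create, update on drift, prune dropped yaml entries, preserve LLM-sourced entries, filter by agent label) can be sketched as a pure planning function. This is a hypothetical illustration using plain structs instead of client-go types: `cronJob`, `declared`, `plan`, `reconcile` and `job` are made-up names, and the `yaml`/`llm` values of `forge.schedule.source` are assumed. Only the label keys come from the PR.

```go
package main

import (
	"fmt"
	"sort"
)

// Label keys from the PR description.
const (
	labelAgentID = "forge.agent.id"
	labelSchedID = "forge.schedule.id"
	labelSource  = "forge.schedule.source"
)

// cronJob is a minimal, hypothetical stand-in for batch/v1 CronJob:
// only the labels and the cron expression matter for this sketch.
type cronJob struct {
	Labels   map[string]string
	Schedule string
}

// declared is a schedule from forge.yaml (hypothetical shape).
type declared struct {
	ID, Cron string
}

// plan lists the schedule IDs a Sync would create, update or delete.
type plan struct {
	Create, Update, Delete []string
}

// reconcile plans a Sync: create missing entries, update on cron drift, and
// prune only yaml-sourced entries that are no longer declared. LLM-sourced
// entries are never pruned, and other agents' CronJobs are ignored.
func reconcile(agentID string, existing []cronJob, want []declared) plan {
	var p plan
	have := map[string]cronJob{}
	for _, cj := range existing {
		if cj.Labels[labelAgentID] != agentID {
			continue // unrelated CronJob in the same namespace
		}
		have[cj.Labels[labelSchedID]] = cj
	}
	wanted := map[string]bool{}
	for _, d := range want {
		wanted[d.ID] = true
		cj, ok := have[d.ID]
		switch {
		case !ok:
			p.Create = append(p.Create, d.ID)
		case cj.Schedule != d.Cron:
			p.Update = append(p.Update, d.ID)
		}
	}
	for id, cj := range have {
		if !wanted[id] && cj.Labels[labelSource] == "yaml" {
			p.Delete = append(p.Delete, id)
		}
	}
	sort.Strings(p.Delete) // map iteration order is random
	return p
}

// job is a small constructor to keep the example readable.
func job(agent, id, source, cron string) cronJob {
	return cronJob{
		Labels:   map[string]string{labelAgentID: agent, labelSchedID: id, labelSource: source},
		Schedule: cron,
	}
}

func main() {
	existing := []cronJob{
		job("a1", "daily", "yaml", "0 9 * * *"),  // cron drifted
		job("a1", "old", "yaml", "0 1 * * *"),    // dropped from forge.yaml
		job("a1", "llm-1", "llm", "*/5 * * * *"), // created at runtime
		job("other", "x", "yaml", "0 0 * * *"),   // another agent's job
	}
	want := []declared{{ID: "daily", Cron: "0 8 * * *"}, {ID: "weekly", Cron: "0 0 * * 0"}}
	fmt.Printf("%+v\n", reconcile("a1", existing, want))
}
```

Keeping the plan separate from the client calls is what makes the "no churn on no-op re-run" property easy to assert: an in-sync cluster yields an empty plan.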

forge-cli/runtime/runner.go
  Backend selection wired off forge.yaml scheduler.backend:
  - "kubernetes" — always K8s; errors at startup when not
    in-cluster (FORGE_IN_CLUSTER=true overrides for tests)
  - "file"       — always FileBackend
  - "auto"/""    — K8s when in-cluster, file otherwise
  Drops the old syncYAMLSchedules helper now that the runner
  calls Backend.Sync(declaredSchedules()) for both modes.

go.mod
  k8s.io/api, k8s.io/apimachinery, k8s.io/client-go @ v0.36.2.

Tests (9 cases against fake.Clientset):
  - Sync creates CronJobs for declared entries with the
    expected labels + ConcurrencyPolicy=Forbid
  - Sync is idempotent (no churn on no-op re-run)
  - Sync updates on cron drift
  - Sync prunes yaml entries removed from the manifest
  - Sync preserves LLM-sourced entries on yaml-only re-Sync
  - Dynamic Set is gated by AllowDynamic with an actionable
    error referencing the config flag
  - Dynamic Delete of a yaml-sourced schedule is refused with
    an error pointing operators at the manifest
  - List filters by forge.agent.id (unrelated CronJobs in the
    namespace are not returned)
  - History returns empty + does not error

Docs:
  - docs/deployment/scheduler-kubernetes.md gains the RBAC
    table, the annotation round-trip reference, and the
    "what's not in the K8s backend" section flagging
    schedule_history deferral + cross-namespace out-of-scope
    + token rotation as a follow-up.

Refs #162
…ts for schedules (#162 part 3)

Final piece of the #162 stack. Adds ScheduleManifestStage to the
build pipeline so `forge package` materializes the K8s manifests
needed to deploy yaml-declared schedules as native CronJobs.

forge-cli/build/schedule_manifest_stage.go
  New ScheduleManifestStage runs after K8sStage. For each
  forge.yaml `schedules[]` entry it emits:

  - k8s/cronjob-<id>.yaml           one CronJob per schedule, body
                                    built from scheduler.CronJobYAML
                                    so it's byte-equivalent to what
                                    the KubernetesBackend (#162
                                    part 2b) writes via client-go
  - k8s/internal-token-secret.yaml  Secret WITHOUT a data field —
                                    operator populates out-of-band
                                    via `forge auth secret-yaml`
                                    (#162 part 1), ExternalSecrets /
                                    Sealed Secrets / SOPS / Vault.
                                    Manifest carries an inline
                                    runbook comment so an operator
                                    inspecting the file knows the
                                    populate paths without reading
                                    the docs separately
  - k8s/scheduler-role.yaml         Role scoped to the agent's
                                    namespace. Verbs gated by
                                    allow_dynamic: get/list/watch
                                    always; create/update/patch/
                                    delete only when allow_dynamic=
                                    true
  - k8s/scheduler-rolebinding.yaml  Binds the Role to the agent's
                                    ServiceAccount (named after the
                                    agent_id, matching the existing
                                    deployment template convention)

  No-ops when forge.yaml has no schedules[]. Opts out when
  scheduler.backend=file is explicit. Default service URL is the
  in-cluster Service DNS for the agent on Runtime.Port (or 8080
  when Spec.Runtime is unset); operators override with
  scheduler.kubernetes.service_url for non-default port/namespace
  or Ingress/Gateway-fronted deploys.
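The emit/opt-out decision and the default service URL described above might look like this. Everything here is an assumption for illustration: the `stageInput` shape, the Service being named after agent_id, the `http` scheme, and the `cluster.local` cluster domain (the common default, but configurable per cluster).

```go
package main

import "fmt"

// runtimeSpec stands in for Spec.Runtime; nil means "unset".
type runtimeSpec struct {
	Port int
}

// stageInput gathers the forge.yaml fields the stage consults
// (hypothetical shape for illustration).
type stageInput struct {
	AgentID    string
	Namespace  string
	Backend    string // scheduler.backend
	ServiceURL string // scheduler.kubernetes.service_url
	Runtime    *runtimeSpec
	Schedules  int // len(schedules[])
}

// shouldEmit: no-op without schedules; opt out on an explicit file backend.
func shouldEmit(in stageInput) bool {
	return in.Schedules > 0 && in.Backend != "file"
}

// serviceURL prefers the explicit override, else the in-cluster Service DNS
// on Runtime.Port, falling back to 8080 when Runtime is unset. Treating a
// zero Port like "unset" is an extra assumption.
func serviceURL(in stageInput) string {
	if in.ServiceURL != "" {
		return in.ServiceURL
	}
	port := 8080
	if in.Runtime != nil && in.Runtime.Port != 0 {
		port = in.Runtime.Port
	}
	return fmt.Sprintf("http://%s.%s.svc.cluster.local:%d", in.AgentID, in.Namespace, port)
}

func main() {
	in := stageInput{AgentID: "my-agent", Namespace: "agents", Schedules: 2}
	fmt.Println(shouldEmit(in), serviceURL(in))
}
```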

forge-cli/cmd/build.go
  Register ScheduleManifestStage between K8sStage and ValidateStage
  so validation catches manifest issues before signing.

Tests (7 cases):
  - No-op when no schedules
  - No-op when scheduler.backend=file (operator opted out)
  - Emits one CronJob per schedule with the right labels and
    concurrencyPolicy=Forbid; default service URL points at the
    in-cluster Service DNS on Runtime.Port
  - Secret template has NO uncommented `data:` line (security
    invariant — generated manifest must not carry a credential),
    documents the forge-auth-secret-yaml populate path inline
  - Role verbs gated by allow_dynamic: parses the `verbs:` line
    specifically (the prose comment legitimately mentions
    create/update/delete) and asserts get/list/watch only when
    off, plus create/update/patch/delete when on
  - RoleBinding subject is a ServiceAccount named after agent_id
    and roleRef points at the matching Role
  - Explicit scheduler.kubernetes.service_url overrides the
    in-cluster DNS default — important for Ingress/Gateway deploys
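The Secret-template security invariant can be checked with a line scanner that skips comment lines, in the spirit of the test described above. `hasUncommentedData` and `secretTemplate` are hypothetical; the extra `stringData:` check is an addition not mentioned in the PR (a Secret can carry credentials through either field).

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// secretTemplate is an illustrative data-less Secret with a runbook comment.
// The resource name is hypothetical.
const secretTemplate = `# Populate out-of-band, for example:
#   forge auth secret-yaml | kubectl apply -f -
# data:
#   token: <base64>
apiVersion: v1
kind: Secret
metadata:
  name: my-agent-internal-token
type: Opaque
`

// hasUncommentedData reports whether any non-comment line starts a
// data: or stringData: key, i.e. whether the manifest could carry a credential.
func hasUncommentedData(manifest string) bool {
	sc := bufio.NewScanner(strings.NewReader(manifest))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if strings.HasPrefix(line, "#") {
			continue // runbook comments may legitimately mention data:
		}
		if strings.HasPrefix(line, "data:") || strings.HasPrefix(line, "stringData:") {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(hasUncommentedData(secretTemplate))
}
```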

The stack is now complete: part 1 ships the operator-facing token
primitives (forge auth subcommand), part 2 ships the
ScheduleBackend interface + FileBackend refactor + manifest helpers,
part 2b ships the KubernetesBackend with client-go for runtime
CronJob CRUD, and this PR ships the build-pipeline integration so
operators get an end-to-end K8s-native scheduled-agent deploy via
`forge package && kubectl apply -k ./k8s`.

Closes #162
…knowledge skill

Sweeps the doc areas mapped to the code changes on this branch
(parts 2/2b/3 of #162) plus the recently merged work that landed
on main since the last sync-docs run.

docs/core-concepts/scheduling.md
  Replace single-paragraph "Execution Details" with a backend
  comparison table + the scheduler.yaml block + the cross-link to
  the new scheduler-kubernetes.md reference. Notes that K8s mode
  defers history to the audit stream instead of writing
  SCHEDULES.md.

docs/reference/forge-yaml-schema.md
  Add the `scheduler:` block under `schedules:` documenting the
  three backend modes and the `kubernetes:` tuning fields
  (namespace, service_url, allow_dynamic, trigger_image,
  auth_secret_name).

docs/reference/cli-reference.md
  Add the `forge auth` subcommand reference (PR #168 / #162 part 1)
  with the show-token / mint-token / secret-yaml flows and the
  label-tracks-forge.yaml invariant note.

README.md
  Link scheduler-kubernetes.md from the documentation index.

.claude/skills/forge.md
  Section 9 (Scheduling): hybrid Backend interface, FileBackend /
  KubernetesBackend table, InCluster() detection, AllowDynamic
  rule, cross-link to scheduler-kubernetes.md.
  Section 11 (Build pipeline): new ScheduleManifestStage row (6a)
  documenting the cronjob/secret-template/role triplet.
  Section 13 (CLI surface): `forge auth` row.
  Section 14 (forge.yaml): scheduler block in the example yaml.
  Section 18 (Workstream recap): three new rows — Guardrails audit
  (#155/#159/#161), Tenancy + entity stamping (#157/#164), K8s
  scheduler (#162).
  Section 19 (Docs map): add tenancy.md + scheduler-kubernetes.md
  tree entries; annotate guardrails.md and audit-logging.md with
  the latest issue refs.

No code changes; pure docs sweep.
k8s.io/client-go v0.36 requires Go 1.26, which bumped forge-cli/go.mod
to `go 1.26.0` and broke CI on Go 1.25.x with:

  compile: version "go1.26.0" does not match go tool version "go1.25.11"

Downgrade to k8s.io/api / apimachinery / client-go @ v0.34.1 — the
most recent series that still supports Go 1.25. KubernetesBackend
API surface is identical between v0.34 and v0.36 (BatchV1 CronJobs,
fake.Clientset, in-cluster config); no code changes required.

All 16 new tests across the two scheduler files + 7 build-stage tests
pass on v0.34.1.

Restore `go 1.25.0` directive in forge-cli/go.mod to match forge-core
/ forge-plugins / forge-skills / forge-ui.
…idy)

go.work's `go 1.26.0` directive was overriding the individual
go.mod files (all at 1.25.0) and forcing the Go toolchain to
download 1.26 — which then mismatched CI's installed Go 1.25.x and
broke compilation of any package the workspace touched, not just
the k8s.io-dependent forge-cli code.

Symptom (in CI):

  compile: version "go1.26.0" does not match go tool version "go1.25.11"

…repeated for every internal Go runtime package across forge-core/
channels, forge-core/compiler, forge-core/pipeline, forge-core/plugins,
forge-cli/config, forge-cli/cmd/forge, forge-cli/internal/tui*,
forge-cli/skills, forge-cli/templates.

The go.work bump was an unintended side-effect of the earlier
`go get k8s.io/client-go@latest` invocation — go mod tidy
propagates the highest go directive across the workspace.

Reset go.work to `go 1.25.0` to match the rest of the modules.
The previous k8s.io pin to v0.34.1 (which supports Go 1.25) stands.
initializ-mk merged commit dd460bb into main on Jun 15, 2026
18 of 19 checks passed

Linked issue: Kubernetes-native scheduler backend + manifest generation in forge package