fix(scheduler): derive K8s service_url default at runtime (closes #179) #181

Open: initializ-mk wants to merge 1 commit into main from fix/issue-179-k8s-scheduler-default-service-url

@initializ-mk
Contributor

Summary

Closes #179. The runtime K8s scheduler backend was hard-erroring when `scheduler.kubernetes.service_url` was unset, even though the build-time `schedule_manifest_stage` already knew how to default the same field. This PR mirrors the build-time default in the runtime so an in-cluster agent without an explicit `service_url` comes up cleanly.

```

Before (in-cluster, no service_url in forge.yaml)

Error: kubernetes scheduler backend: scheduler.kubernetes.service_url is required

After

ServiceURL auto-derived to http://<agent_id>.<namespace>.svc:<port>/
```

Changes

Code

  • `forge-cli/runtime/scheduler_k8s_backend.go`
    • `K8sBackendConfig` gains a `Port int` field (defaults to 8080 to match the runner's listen-port default).
    • New helper `defaultK8sServiceURL(agentID, namespace, port)` mirrors `forge-cli/build/schedule_manifest_stage.go:70-82`.
    • `NewKubernetesBackend` and `NewKubernetesBackendWithClient` both derive the default after namespace resolution; explicit `ServiceURL` still wins.
    • The `service_url is required` hard-error path is gone.
  • `forge-cli/runtime/runner.go` — `selectScheduleBackend` plumbs `r.cfg.Port` into the new `K8sBackendConfig.Port` field.
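
As a rough illustration of the defaulting described above, here is a minimal sketch of what a helper like `defaultK8sServiceURL` could look like. The function name and parameter list come from this PR; the body, the `fallbackPort` constant, and the choice to apply the port fallback inside the helper (rather than in the constructor) are assumptions, not the actual implementation.

```go
package main

import "fmt"

// fallbackPort is an assumed constant name; the PR only states that
// Port=0 falls back to 8080 to match the runner's listen-port default.
const fallbackPort = 8080

// defaultK8sServiceURL builds the in-cluster URL for an agent's Service.
// Name and parameters follow the PR; the body is an illustrative guess.
func defaultK8sServiceURL(agentID, namespace string, port int) string {
	if port == 0 {
		port = fallbackPort
	}
	return fmt.Sprintf("http://%s.%s.svc:%d/", agentID, namespace, port)
}

func main() {
	fmt.Println(defaultK8sServiceURL("my-agent", "ns-a", 9090)) // http://my-agent.ns-a.svc:9090/
	fmt.Println(defaultK8sServiceURL("my-agent", "ns-a", 0))    // http://my-agent.ns-a.svc:8080/
}
```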

Tests

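`forge-cli/runtime/scheduler_k8s_backend_test.go`:

  • `TestKubernetesBackend_ServiceURLDefaultDerivation`: the #179 pin; empty `ServiceURL` with `Port=9090` yields `http://my-agent.ns-a.svc:9090/`.
  • `TestKubernetesBackend_ServiceURLDefaultPortFallback`: `Port=0` falls back to 8080.
  • `TestKubernetesBackend_ServiceURLExplicitOverride`: an explicit `ServiceURL` passes through untouched.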

Docs

  • `docs/deployment/scheduler-kubernetes.md` — new "service_url defaulting" subsection; YAML comment updated.
  • `docs/core-concepts/scheduling.md` — YAML comment updated with cross-link.
  • `CHANGELOG.md` — Unreleased / Fixed entry.

Test plan

  • `go test ./forge-cli/runtime/ -count=1` — full forge-cli/runtime suite passes (15.9s).
  • `go test ./forge-cli/build/ -run TestSchedule -count=1` — build-stage schedule tests still pass.
  • `go vet ./forge-cli/runtime/ ./forge-cli/build/` — clean.
  • `gofmt -l` on touched files — clean.
  • `golangci-lint run` on touched files — 0 issues.
  • Manual: deploy a sidecar agent to a cluster with no `scheduler.kubernetes.service_url` set — confirm it boots and `forge.<agent_id>.<namespace>.svc:<port>/` is logged.

Risks

Low. The change only adds a fallback for a previously fatal path. Operators with an explicit `service_url` (the supported configuration today) see no change in behavior, which is pinned by `TestKubernetesBackend_ServiceURLExplicitOverride`. Field-backed defaults match the build stage's behavior 1:1, so newly derived URLs are identical to what `forge package` would have written.

Pre-fix, the K8s scheduler backend's runtime constructor hard-errored
when scheduler.kubernetes.service_url was empty:

    Error: kubernetes scheduler backend: scheduler.kubernetes.service_url is required

…even though the build-time schedule-manifest stage at
forge-cli/build/schedule_manifest_stage.go already knew how to default
the same field to http://<agent_id>.<namespace>.svc:<port>/ when
unset. Two adjacent code paths reached opposite conclusions for the
same missing field, so operators who deployed in-cluster without an
explicit service_url couldn't start the agent.

Fix: mirror the build-time default in the runtime constructor.

  - K8sBackendConfig gains a Port int field; selectScheduleBackend
    plumbs r.cfg.Port into it.
  - NewKubernetesBackend (and the -WithClient test seam) derives
    http://<agent_id>.<namespace>.svc:<port>/ when cfg.ServiceURL is
    empty. defaultK8sServiceURL is the shared helper.
  - Port=0 falls back to 8080 (matches the runner's listen-port
    default at forge-cli/runtime/runner.go:152-153).
  - The hard-error branch is gone — there's no scenario in-cluster
    where we can't derive a sensible default.
  - Operator override semantics unchanged: an explicit service_url
    always wins. Pinned by TestKubernetesBackend_ServiceURLExplicitOverride.

Tests added:
  - TestKubernetesBackend_ServiceURLDefaultDerivation — the #179 pin:
    empty ServiceURL + Port=9090 → http://my-agent.ns-a.svc:9090/
  - TestKubernetesBackend_ServiceURLDefaultPortFallback — Port=0
    falls back to 8080
  - TestKubernetesBackend_ServiceURLExplicitOverride — explicit
    ServiceURL (e.g. https://gateway.example.com/...) passes through
    untouched
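
The three pinned cases above can be sketched as a small table-driven check. This is only an approximation under stated assumptions: `backendConfig` is a hypothetical stand-in for `K8sBackendConfig` reduced to the fields discussed here, `resolveServiceURL` is a hypothetical name for the resolution step inside the constructor, and the gateway URL is made up for illustration.

```go
package main

import "fmt"

// backendConfig is a hypothetical, reduced stand-in for K8sBackendConfig.
type backendConfig struct {
	AgentID    string
	Namespace  string
	Port       int
	ServiceURL string
}

// resolveServiceURL (hypothetical name) applies the override semantics:
// an explicit ServiceURL always wins; otherwise the URL is derived from
// agent ID, namespace, and port, with Port=0 falling back to 8080.
func resolveServiceURL(cfg backendConfig) string {
	if cfg.ServiceURL != "" {
		return cfg.ServiceURL
	}
	port := cfg.Port
	if port == 0 {
		port = 8080
	}
	return fmt.Sprintf("http://%s.%s.svc:%d/", cfg.AgentID, cfg.Namespace, port)
}

func main() {
	cases := []struct {
		name string
		cfg  backendConfig
		want string
	}{
		{"default derivation", backendConfig{AgentID: "my-agent", Namespace: "ns-a", Port: 9090}, "http://my-agent.ns-a.svc:9090/"},
		{"port fallback", backendConfig{AgentID: "my-agent", Namespace: "ns-a"}, "http://my-agent.ns-a.svc:8080/"},
		// The explicit URL below is illustrative only.
		{"explicit override", backendConfig{AgentID: "my-agent", Namespace: "ns-a", Port: 9090, ServiceURL: "https://gateway.example.com/agent"}, "https://gateway.example.com/agent"},
	}
	for _, tc := range cases {
		got := resolveServiceURL(tc.cfg)
		fmt.Printf("%s: match=%t url=%s\n", tc.name, got == tc.want, got)
	}
}
```

Keeping the override check ahead of the derivation is what leaves existing deployments with an explicit `service_url` unaffected.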

Docs:
  - docs/deployment/scheduler-kubernetes.md — new "service_url
    defaulting" subsection; YAML comment updated to note the
    auto-derivation
  - docs/core-concepts/scheduling.md — YAML comment updated with
    cross-link
  - CHANGELOG.md — Unreleased / Fixed entry

Development

Successfully merging this pull request may close these issues.

Runtime K8s scheduler backend hard-errors when scheduler.kubernetes.service_url is unset (build stage auto-derives the same value)