
feat: Release Tracks — floating Docker tags (latest/standard/trailing) promotion engine (#36160) #36161

Draft
sfreudenthaler wants to merge 18 commits into main from feat/36160-evergreen-release-tracks

Conversation


@sfreudenthaler sfreudenthaler commented Jun 14, 2026

Member

Resolves #36160 · Epic #35693

Proposed Changes

  • Add a small Python promotion engine at cicd/evergreen-tracks/ (managed with uv) that advances three floating Docker tags (latest / standard / trailing) across the linear GA CalVer stream by release age (newest GA, ~14d, ~28d; thresholds configurable).
  • State is registry-only via marker tags: <version>_tainted (forward-only block — a bad release can't propagate to more conservative tracks) and <track>_hold (sticky manual freeze). No separate datastore; the audit trail is the Actions run logs.
  • Tags are re-pointed by digest with docker buildx imagetools create (no layer re-push). Age is read from the CalVer date in the version, not build/publish date, so emergency backports of older releases can't be swept into a future promotion.
  • Two workflows: cicd_evergreen-tracks-promote.yml (daily cron + dispatch) and cicd_evergreen-tracks-admin.yml (manual taint/hold).
  • Document Release Tracks (usage + the "why") in the root README.md.
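
A minimal sketch of the age-based planning described in the bullets above, not the engine's actual code. It assumes GA versions are tagged as YY.MM.DD CalVer strings (the description does not spell out the format), and the names `plan`, `calver_date`, and `repoint_command` are hypothetical. The marker tag names (`<version>_tainted`, `<track>_hold`), the default thresholds, and the `docker buildx imagetools create` re-point come from the description.

```python
import re
from datetime import date
from typing import Dict, List, Optional, Set

# Default minimum ages in days per track (newest GA, ~14d, ~28d).
THRESHOLDS: Dict[str, int] = {"latest": 0, "standard": 14, "trailing": 28}

# ASSUMPTION: GA versions look like "26.06.10" (YY.MM.DD).
VERSION_RE = re.compile(r"\d{2}\.\d{2}\.\d{2}")


def calver_date(version: str) -> date:
    """Read the release date from the version itself, not from publish time."""
    yy, mm, dd = (int(part) for part in version.split("."))
    return date(2000 + yy, mm, dd)


def plan(tags: Set[str], today: date) -> Dict[str, Optional[str]]:
    """Pick the version each track should point at; None means leave it alone."""
    versions = sorted(
        (t for t in tags if VERSION_RE.fullmatch(t)),
        key=calver_date,
        reverse=True,  # newest first
    )
    result: Dict[str, Optional[str]] = {}
    for track, min_age in THRESHOLDS.items():
        if f"{track}_hold" in tags:
            result[track] = None  # sticky manual freeze
            continue
        result[track] = next(
            (
                v
                for v in versions
                if (today - calver_date(v)).days >= min_age
                and f"{v}_tainted" not in tags  # tainted releases are skipped
            ),
            None,
        )
    return result


def repoint_command(repo: str, track: str, digest: str) -> List[str]:
    """Build the argv that re-points a tag by digest (no layer re-push)."""
    return [
        "docker", "buildx", "imagetools", "create",
        "--tag", f"{repo}:{track}",
        f"{repo}@{digest}",
    ]
```

Because age comes from the CalVer date, a backport published today under an older version number still counts as old, which matches the rationale given above.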

Checklist

  • Tests — 60 unit tests (pure planner, calver, markers, registry parser, CLI); run with cd cicd/evergreen-tracks && uv run pytest.
  • Translations — N/A (CI tooling, no UI strings).
  • Security Implications Contemplated — see notes below.

Additional Info

Tag control: latest is moved on-demand by the release pipeline (promote-latest job in cicd_6-release.yml, --tracks latest) the moment a GA's images publish, for dotcms/dotcms and dotcms/dotcms-dev — the old deploy-docker latest: true path is unwired (latest: false). The daily cron ages standard/trailing forward and always applies (no separate enable gate). To pause promotion, disable the scheduled workflow or hold the track; to block a bad release, taint it. All registry mutations (release-driven latest, cron, admin) serialize under one concurrency group evergreen-tracks-registry.

Credential scope (important): promotion needs only write, but untaint and release-hold call the Hub delete API — the DOCKER_USERNAME/DOCKER_TOKEN used here must have Read/Write/Delete scope, or those two admin actions fail. (Verified both ways: write-only 403s on delete; RWD succeeds.)

Validation: the full lifecycle (promote-by-age, taint→skip, hold→freeze, release-hold→resume, untaint→restore, teardown) was exercised end-to-end against the dotcms/dotcms-test sandbox and verified by digest. The live smoke can't run in core-workflow-test CI (it intentionally carries no live Docker secrets), so it should run here in core CI / on dispatch.

Security notes: free-text workflow inputs are passed via env: (not interpolated into run:) to avoid expression injection; no tokens are logged; least-privilege permissions: contents: read on the promote workflow.

Out of scope (per epic): LTS-line tracks, Java-variant track tags, the Cloud control-plane UI, and update cadence changes.

🤖 Generated with Claude Code

Steve Freudenthaler and others added 15 commits June 14, 2026 14:19
(cherry picked from commit db22adbc8975bee8e43e2a7143fb978d45f2a1d0)
(cherry picked from commit 50591149121ec21388c99f1534ac9bcdcb8b8822)
(cherry picked from commit 295eab6f1e3b6a1b7adb3111a88e3e083cbaf0c9)
(cherry picked from commit 8d3884d56d6745585a716b1b46682634edf63db4)
(cherry picked from commit b1936fc5c88ee4d66d416cb37ec7055ac78672aa)
…tags

The previous test named 'test_list_tags_paginates_and_returns_name_digest'
had next=null and did not exercise the pagination loop. Replace it with a
test that serves two mocked responses (page 1 with next pointing to page 2,
page 2 with next=null) and asserts tags from both pages appear in the result.
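
A rough illustration of the pagination loop this test exercises, under assumptions: each page is taken to be a JSON object with a `next` URL (or null) and a `results` list of entries carrying `name` and `digest`. The `fetch` callable is a hypothetical stand-in for the HTTP call so the loop can be driven by mocked pages.

```python
from typing import Callable, Dict, Optional


def list_tags(first_url: str, fetch: Callable[[str], dict]) -> Dict[str, Optional[str]]:
    """Follow `next` links until exhausted and collect name -> digest."""
    tags: Dict[str, Optional[str]] = {}
    url: Optional[str] = first_url
    while url:
        page = fetch(url)  # ASSUMPTION: returns the decoded JSON page
        for item in page.get("results", []):
            tags[item["name"]] = item.get("digest")
        url = page.get("next")  # None on the last page ends the loop
    return tags


# Two mocked pages: page 1 points at page 2, page 2 has next=None.
pages = {
    "page1": {"next": "page2", "results": [{"name": "latest", "digest": "sha256:aaa"}]},
    "page2": {"next": None, "results": [{"name": "standard", "digest": "sha256:bbb"}]},
}
all_tags = list_tags("page1", pages.__getitem__)
```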

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
(cherry picked from commit acbe5a4e62db88e6bcb36f1b6b3b210315c509cf)
(cherry picked from commit 7c8da73bfbbfc840004c865a5d982fabbe8ed441)
(cherry picked from commit f5c72e96613b362946dfe5da57cfd2f4acfc86e1)
(cherry picked from commit 073f0b99109d1d61d9b0e653eb1e37ce5334a61c)
…uard env vars

- Move --apply flag from top-level parser onto each subparser (promote and admin)
  so `evergreen-tracks promote --repo foo/bar --apply` works as expected
- Add tests/test_cli.py with 25 tests covering parser behaviour, cmd_promote,
  and all cmd_admin paths (taint/untaint/hold/release-hold, guards, return codes)
- Replace bare os.environ[] lookups for DOCKER_USERNAME/DOCKER_TOKEN with
  os.environ.get() guards that return code 2 with a clear error message instead
  of crashing with an unhandled KeyError
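
One possible shape of the credential guard from the last bullet, as a sketch only. The function names and the error wording are hypothetical; the environment variable names and the return code 2 come from this commit message.

```python
import os
import sys
from typing import Mapping, Optional, Tuple


def read_credentials(env: Mapping[str, str]) -> Optional[Tuple[str, str]]:
    """Return (username, token), or None when either variable is missing or empty."""
    username = env.get("DOCKER_USERNAME")
    token = env.get("DOCKER_TOKEN")
    if not username or not token:
        print("error: DOCKER_USERNAME and DOCKER_TOKEN must be set", file=sys.stderr)
        return None
    return username, token


def run_admin(env: Mapping[str, str] = os.environ) -> int:
    """Hypothetical entry point: exit code 2 instead of an unhandled KeyError."""
    creds = read_credentials(env)
    if creds is None:
        return 2
    # ... the registry calls would go here ...
    return 0
```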

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
(cherry picked from commit 41543da383b66b3402e828d2d93d329fd61ce659)
(cherry picked from commit 2ca500f882b9dc0fd36c5d078217c02c3d073d14)
(cherry picked from commit 4dd9ea1c72b6234b595c87bcea5d78da6ef8cc2f)
…cile held tracks

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
(cherry picked from commit 8ed72a269078cc8686ff8f5854d8d91d7f1c7afb)
…nt no-op

A promote run with a held track and no other moves previously produced no
output. Log "held at <marker>, skipping promotion" for consistent held
tracks and always report when idle. Adds a caplog test.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
(cherry picked from commit 046a73ba1d6f7b789d831e136117f9f84b2bf0be)
…36160)

- Promote/admin workflows default to dotcms/dotcms.
- Daily cron promotion is gated behind the EVERGREEN_TRACKS_APPLY repo
  variable (dry-run until explicitly enabled) so merging does not start
  mutating production tags.
- Workflow inputs moved into env to avoid expression injection in run steps.
- Document Release Tracks in the root README, including the CalVer-date
  age rationale and the taint/hold model.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@chatgpt-codex-connector


You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

@github-actions github-actions Bot added Area : CI/CD PR changes GitHub Actions/workflows Area : Documentation PR changes documentation files labels Jun 14, 2026
@claude

claude Bot commented Jun 14, 2026

Contributor

Claude finished @sfreudenthaler's task in 1m 3s


Rollback Safety Analysis

  • Gather context and understand request
  • Read rollback-unsafe categories reference
  • Get full PR diff
  • Analyze changes against each unsafe category
  • Apply label

Result: ✅ Safe To Rollback

All 20 changed files are pure additions (+1659 / -0) in three buckets:

| Bucket | Files | Rollback Impact |
| --- | --- | --- |
| CI/CD workflows | .github/workflows/cicd_evergreen-tracks-admin.yml, cicd_evergreen-tracks-promote.yml | No database, ES, or API changes. Workflows only run on explicit trigger or cron after EVERGREEN_TRACKS_APPLY=true is set. |
| Python promotion engine | cicd/evergreen-tracks/ (source, tests, pyproject.toml, uv.lock) | Self-contained tooling in a subdirectory; no Java, no DB schema, no ES mapping, no REST/GraphQL surface. |
| Documentation | README.md, cicd/evergreen-tracks/README.md | Docs only. |

Category-by-category verdict:

| Category | Match? |
| --- | --- |
| C-1 Structural Data Model Change | ❌ No DB schema changes |
| C-2 Elasticsearch Mapping Change | ❌ No ES mapping touched |
| C-3 Content JSON Model Version Bump | ❌ No contentlet_as_json changes |
| C-4 DROP TABLE / DROP COLUMN | ❌ No DDL drops |
| H-1 One-Way Data Migration | ❌ No data transformation tasks |
| H-2 RENAME TABLE / RENAME COLUMN | ❌ None |
| H-3 PK / Unique Constraint Restructuring | ❌ None |
| H-4 New Field Type | ❌ None |
| H-5 Storage Provider Change | ❌ None |
| H-6 DROP PROCEDURE / FUNCTION | ❌ None |
| H-7 NOT NULL Column Without Default | ❌ None |
| H-8 VTL Viewtool Contract Change | ❌ None |
| M-1 Column Type Change | ❌ None |
| M-2 Push Publishing Bundle Format | ❌ None |
| M-3 REST / GraphQL API Contract | ❌ None |
| M-4 OSGi Interface Change | ❌ None |

Rolling back dotCMS from a release that included this PR to the previous release is fully safe: the promotion engine lives entirely in CI/CD and leaves no persistent state in the database, Elasticsearch, or any other shared datastore. The only external side-effect is Docker Hub tag mutations, which are controlled by the separate EVERGREEN_TRACKS_APPLY gate and have their own marker-tag-based audit trail in the registry.

Label added: AI: Safe To Rollback

@github-actions

github-actions Bot commented Jun 14, 2026

Contributor

🤖 Codex Review — openai.gpt-5.5

[🟠 High] .github/workflows/cicd_evergreen-tracks-admin.yml:34 - DOCKER_USERNAME / DOCKER_TOKEN are set at job scope, so every step/action in the job receives Docker Hub credentials, including third-party setup actions that don’t need them. Keep these secrets scoped only to docker/login-action and the final admin run step.

[🟠 High] .github/workflows/cicd_evergreen-tracks-promote.yml:34 - the concurrency group is scoped to ${{ github.workflow }} and ${{ github.ref }}, while the admin workflow has no shared concurrency group. Promote runs from different refs, and promote/admin runs, can still overlap while reading and mutating the same registry tags, which can overwrite holds/taints or move tracks from stale state.

[🟡 Medium] cicd/evergreen-tracks/src/evergreen_tracks/cli.py:89 - untaint calls hub_login() before delete_tag(..., apply=args.apply), so even dry-runs require Docker credentials and make a live Docker Hub login request. This breaks the documented dry-run behavior and prevents credential-free validation.

[🟡 Medium] cicd/evergreen-tracks/src/evergreen_tracks/cli.py:114 - release-hold has the same dry-run issue: it logs into Docker Hub before checking/applying the delete, so --apply false still requires credentials and external network access.
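
To illustrate the two Medium findings, here is one way a delete helper could defer the login until a mutation is actually going to happen. This is a sketch under assumptions, not the fix applied in the PR: `delete_marker` and its injected `login` / `delete` callables are hypothetical and do not reflect the real signatures of hub_login() or delete_tag().

```python
from typing import Callable, List


def delete_marker(
    repo: str,
    tag: str,
    apply: bool,
    login: Callable[[], str],
    delete: Callable[[str, str, str], None],
    log: List[str],
) -> None:
    """Delete a marker tag; a dry run never logs in or touches the network."""
    if not apply:
        log.append(f"[dry-run] would delete {repo}:{tag}")
        return
    token = login()  # only reached when apply=True
    delete(repo, tag, token)
    log.append(f"deleted {repo}:{tag}")


def _no_login() -> str:
    # Stand-in that fails loudly if a dry run ever tries to authenticate.
    raise AssertionError("dry run must not log in")


dry_log: List[str] = []
delete_marker("dotcms/dotcms-test", "standard_hold", False, _no_login, lambda r, t, k: None, dry_log)
```

With the login inside the apply branch, a dry run can be validated without DOCKER_USERNAME / DOCKER_TOKEN being present.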


Run: #27556377640 · tokens: in: 26182 · out: 7564 (reasoning: 7230) · total: 33746

revision = 3
requires-python = ">=3.12"

[[package]]

Contributor


Legal Risk

certifi 2026.5.20 was released under the MPL-2.0 license, a license that
has been flagged by your organization for consideration.

Recommendation

While merging is not directly blocked, it's best to pause and consider what it means to use this license before continuing. If you are unsure, reach out to your security team or Semgrep admin to address this issue.

The promote workflow had no concurrency control, so a scheduled run and a
manual promote/admin dispatch could overlap. Both read live registry state
and then apply tag moves, so concurrent runs risk acting on stale state and
overwriting a hold/taint or moving a track on outdated data.

Add a workflow-level concurrency group keyed by workflow + ref with
cancel-in-progress: false so runs queue rather than abort mid-promote.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@sfreudenthaler

Member Author

🟠 High — .github/workflows/cicd_evergreen-tracks-promote.yml:27: the promote workflow has no concurrency group, so a scheduled run and a manual promote/admin run can overlap, both read registry state, then apply stale tag moves.

Addressed in 15f171a. Added a workflow-level concurrency group keyed by github.workflow + github.ref with cancel-in-progress: false, so a scheduled run and a manual promote/admin dispatch are serialized — concurrent runs queue and wait their turn rather than aborting mid-promote and leaving tags half-moved. Queueing (not cancelling) is the safer choice for a tag-promotion engine.

@wezell wezell left a comment

Member


Wait, so :latest is going to become 2 weeks old? Should we maintain latest as is and add another tag called... :current? latest is a well known convention.

@sfreudenthaler

Member Author

Wait, so :latest is going to become 2 weeks old? Should we maintain latest as is and add another tag called... :current? latest is a well known convention.

After a live discussion, the decision is to hook our releaser into this engine as an on-demand invocation, so that:

  • latest moves right away
  • only one tool asserts control moving evergreen track tags

@sfreudenthaler sfreudenthaler marked this pull request as draft June 15, 2026 19:57
…line

Addresses @wezell's review: `latest` is a well-known convention and must
not lag behind a GA. Per the live decision, hook the releaser into the
evergreen-tracks engine so latest moves immediately, and make that engine
the single controller of latest/standard/trailing.

- cli: add `--tracks` subset filter to `promote` so a caller can scope to
  one track (e.g. `--tracks latest`).
- cicd_6-release.yml: stop moving latest via deploy-docker (`latest: false`)
  and add a `promote-latest` job that invokes the engine on-demand once the
  release images are published, for dotcms/dotcms and dotcms/dotcms-dev.
  Applies unconditionally for a real latest release from main on dotcms/core
  (restores always-on behavior; NOT behind EVERGREEN_TRACKS_APPLY, which
  still gates only the aged standard/trailing tracks during rollout).
- Serialize all registry mutations under one static concurrency group
  (`evergreen-tracks-registry`) shared by promote, admin, and the release
  promote-latest job, closing the remaining cross-workflow race.
- Tests + README updated.
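
A small argparse sketch of the `--tracks` subset filter, with `--apply` living on the subparser as in the earlier CLI commit. The comma-separated value format, the default of all three tracks, and `--apply` being a plain store_true flag are assumptions made for illustration; only the flag names and the `promote` subcommand come from the commit messages.

```python
import argparse
from typing import List

TRACKS = ("latest", "standard", "trailing")


def parse_tracks(value: str) -> List[str]:
    """Parse a comma-separated track list and reject unknown names."""
    names = [n.strip() for n in value.split(",") if n.strip()]
    unknown = [n for n in names if n not in TRACKS]
    if unknown:
        raise argparse.ArgumentTypeError(f"unknown track(s): {', '.join(unknown)}")
    return names


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="evergreen-tracks")
    sub = parser.add_subparsers(dest="command", required=True)
    promote = sub.add_parser("promote")
    promote.add_argument("--repo", required=True)
    # ASSUMPTION: omitting --tracks means "consider every track".
    promote.add_argument("--tracks", type=parse_tracks, default=list(TRACKS))
    # ASSUMPTION: dry-run unless --apply is given.
    promote.add_argument("--apply", action="store_true")
    return parser


# The release pipeline's scoped call would then parse like this:
args = build_parser().parse_args(
    ["promote", "--repo", "dotcms/dotcms", "--tracks", "latest", "--apply"]
)
```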

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@sfreudenthaler

Member Author

Wait, so :latest is going to become 2 weeks old? Should we maintain latest as is and add another tag called... :current? latest is a well known convention.

Good catch — latest stays exactly as you'd expect (newest GA), and we keep the convention. Per our live discussion, I wired the releaser into this engine so there's one controller for all three tags instead of two. Pushed in 3539720:

  • latest moves on-demand, immediately. A new promote-latest job in cicd_6-release.yml invokes the evergreen-tracks engine right after the release images publish (--tracks latest --apply), for both dotcms/dotcms and dotcms/dotcms-dev. No 24h cron lag.
  • Unwired the old path. cicd_6-release.yml no longer asks deploy-docker to move latest (latest: false). The engine is now the single mover of latest/standard/trailing.
  • Rollout safety preserved. The release-triggered latest promote applies unconditionally (it replaces always-on behavior); it is not behind EVERGREEN_TRACKS_APPLY, which still gates only the new aged standard/trailing tracks until we flip it on.
  • Race closed. All registry mutations (daily cron, admin, and this release-triggered promote) now serialize under one static concurrency group evergreen-tracks-registry — this also addresses the remaining cross-workflow concurrency point from the Codex review.

So standard ≈ 2 weeks and trailing ≈ 4 weeks are the aged tracks; latest is unchanged.

The gate's only durable value was a global pause switch, which is already
covered by `hold` (per-track freeze), `taint` (block a release), and
disabling the scheduled workflow. `latest` no longer depends on it (it's
release-driven), and standard/trailing have no consumers yet so applying
the daily cron is low-risk. Cron now always applies; manual dispatch keeps
its dry-run default.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Labels

AI: Safe To Rollback Area : CI/CD PR changes GitHub Actions/workflows Area : Documentation PR changes documentation files

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

Floating Docker tags for Release Tracks: latest/standard/trailing promotion engine

2 participants