Add SLO workflow for the Java SDK #644

Open: polRk wants to merge 6 commits into master from slo-tests
polRk commented Apr 26, 2026

Summary

Adds two GitHub Actions workflows that drive ydb-platform/ydb-slo-action against the Java SDK. The actual workload sources live in ydb-java-examples (companion PR: ydb-platform/ydb-java-examples#61).

Workflows

.github/workflows/slo.yml

  • Builds Docker images for the current PR and the baseline (auto-detected as merge-base with master, overridable via workflow_dispatch).
  • Hands both images to ydb-slo-action/init@v2 with KV read/write RPS flags.
  • Triggers: PRs labelled SLO, push to master, manual workflow_dispatch.
  • Concurrency-grouped per ref + matrix entry, with cancel-in-progress.
  • Currently a single matrix entry (java-query-kv); easy to extend to other workloads later.
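
The baseline selection described above could look roughly like the sketch below. The function name, its arguments, and the `origin/master` default are assumptions made for illustration; the actual logic lives in slo.yml and may differ.

```shell
# Hypothetical helper: choose the baseline ref for the SLO comparison.
#   $1  optional explicit override (for example a workflow_dispatch input)
#   $2  branch to compare against; origin/master is an assumed default
resolve_baseline_ref() {
    local override="${1:-}"
    local base="${2:-origin/master}"
    if [ -n "$override" ]; then
        # An explicit override always wins over auto-detection.
        printf '%s\n' "$override"
        return 0
    fi
    # git merge-base prints the best common ancestor of the two commits.
    git merge-base HEAD "$base"
}
```

Calling `resolve_baseline_ref ""` inside a PR checkout would print the merge-base commit, while `resolve_baseline_ref some-tag` simply echoes the override.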

.github/workflows/slo-report.yml

  • Listens via workflow_run for the SLO workflow to finish.
  • On success, calls ydb-slo-action/report@v2 to post the comparison report to the PR.
  • Removes the SLO label from the PR after publishing.

Build script: .github/scripts/build-slo-image.sh

The Java SDK under test cannot be assumed to be published to a remote Maven repository, so the image is built from source:

  1. The script assembles a temporary build context with ydb-java-sdk/ and ydb-java-examples/ checkouts side by side.
  2. It feeds that context to the Dockerfile shipped in ydb-java-examples/slo/.
  3. The multi-stage Dockerfile then builds the SDK from source, installs it into an in-image local Maven repo, and pins ydb.sdk.version in the examples parent pom to that version before building the workload.
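
A minimal sketch of the first two steps, assuming both repositories are already checked out locally. The function name is hypothetical, and the Dockerfile file name in the usage note is an assumption (the description only says the Dockerfile is shipped in ydb-java-examples/slo/).

```shell
# Hypothetical sketch: stage both checkouts side by side in a temporary
# directory and print the path of the resulting build context.
#   $1  path to the ydb-java-sdk checkout
#   $2  path to the ydb-java-examples checkout
assemble_build_context() {
    local sdk_dir="$1" examples_dir="$2" ctx
    ctx="$(mktemp -d)" || return 1
    mkdir -p "$ctx/ydb-java-sdk" "$ctx/ydb-java-examples"
    # "src/." copies the contents of src (dotfiles included) into dst,
    # rather than creating dst/src.
    cp -a "$sdk_dir/." "$ctx/ydb-java-sdk/" || return 1
    cp -a "$examples_dir/." "$ctx/ydb-java-examples/" || return 1
    printf '%s\n' "$ctx"
}
```

The context could then be passed to Docker along the lines of `docker build -f "$ctx/ydb-java-examples/slo/Dockerfile" -t <tag> "$ctx"`, where the tag is whatever the workflow expects.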

The script accepts a --fallback-image flag for the baseline build (mirroring the equivalent script in ydb-go-sdk) so a historical SDK commit that no longer compiles doesn't break the SLO run.
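
The fallback behaviour can be sketched as follows. This is an illustration, not the script itself: the function name and positional arguments are hypothetical, whereas the real script exposes the fallback through its --fallback-image flag.

```shell
# Hypothetical sketch of the baseline fallback.
#   $1  image tag to produce
#   $2  build context directory
#   $3  optional image to reuse when the build fails
build_or_fallback() {
    local tag="$1" context="$2" fallback="${3:-}"
    if docker build -t "$tag" "$context"; then
        return 0
    fi
    if [ -z "$fallback" ]; then
        echo "build of $tag failed and no fallback image was given" >&2
        return 1
    fi
    echo "build of $tag failed; reusing $fallback" >&2
    # docker tag SOURCE TARGET gives the fallback image the expected name,
    # so later steps can refer to $tag either way.
    docker tag "$fallback" "$tag"
}
```

With a fallback supplied, a baseline commit that no longer compiles yields a run that compares the fallback image against the current one instead of failing outright.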

Triggering convention

Same as in ydb-go-sdk and ydb-js-sdk: a PR opts into SLO testing when it is given the SLO label. The label is removed automatically once the report is posted.

Local verification

The workload image was built from the same script and end-to-end tested against ydb-slo-action/deploy/compose.yml (1 storage + 5 database + Prometheus + Grafana + chaos-monkey). Results, including chaos-induced ydb/transport_unavailable errors and a clean p99 latency measurement, are documented in the companion PR.

Companion PR

Workload sources: ydb-platform/ydb-java-examples#61.

Add two GitHub Actions workflows that drive ydb-platform/ydb-slo-action
against the Java SDK:

- slo.yml: builds Docker images for the current PR and the merge-base
  baseline, then hands them to ydb-slo-action/init@v2 with KV
  read/write RPS flags. Triggered on PRs with the `SLO` label, on
  push to master and via workflow_dispatch.
- slo-report.yml: waits on the SLO workflow via workflow_run,
  publishes the comparison report through ydb-slo-action/report@v2,
  and removes the `SLO` label from the PR.

The workload sources live in ydb-platform/ydb-java-examples, in a
new `slo` Maven module. The build script in this commit
(.github/scripts/build-slo-image.sh) assembles a temporary build
context with both checkouts side by side and feeds it to the
Dockerfile shipped with the workload, so the Java SDK under test is
built from source and pinned into the workload build without ever
needing to publish snapshots.

The workflow checks out ydb-java-examples at `master` by default;
`workflow_dispatch` exposes an `examples_ref` input for testing
against unmerged workload changes.
Copilot AI review requested due to automatic review settings April 26, 2026 10:48

codecov-commenter commented Apr 26, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 71.01%. Comparing base (8556216) to head (99bdb11).
⚠️ Report is 15 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff              @@
##             master     #644      +/-   ##
============================================
+ Coverage     70.87%   71.01%   +0.14%     
- Complexity     3314     3351      +37     
============================================
  Files           374      379       +5     
  Lines         15699    15862     +163     
  Branches       1650     1664      +14     
============================================
+ Hits          11126    11264     +138     
- Misses         3931     3951      +20     
- Partials        642      647       +5     

polRk added the SLO label May 4, 2026
polRk added 5 commits May 4, 2026 16:52
The report publishing and SLO label removal are now handled
directly in the SLO workflow after tests complete. The workflow
trigger is simplified to only run on labeled pull requests or
manual dispatch.

Remove manual workflow_dispatch trigger and associated inputs.
Trigger on pull request open, reopen, synchronize, and label
events. Use a larger runner for performance tests.

…nner

copy_tree fell back to `cp -a src dst` after a partial `cp -al`, nesting the
SDK inside itself and leaving /src/ydb-java-sdk without a pom.xml at build
time. Copy contents explicitly via `src/. dst/` and verify the expected
layout before invoking docker build.
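
A sketch of the corrected copy, under stated assumptions: the body of `copy_tree` and the `verify_layout` helper below are illustrative, and the only layout requirement taken from this commit is that ydb-java-sdk/ must contain a pom.xml.

```shell
# Copy the contents of src into dst, preferring hard links.
copy_tree() {
    local src="$1" dst="$2"
    mkdir -p "$dst"
    # Fast path: hard-link files instead of copying them. This can fail,
    # for example when src and dst are on different filesystems.
    if cp -al "$src/." "$dst/" 2>/dev/null; then
        return 0
    fi
    # A failed hard-link pass may leave partial content behind. Start from
    # an empty dst and copy contents via "src/." so that the fallback can
    # never create dst/<basename of src>.
    rm -rf "${dst:?}"
    mkdir -p "$dst"
    cp -a "$src/." "$dst/"
}

# Hypothetical layout check run before docker build.
verify_layout() {
    local ctx="$1"
    if [ ! -f "$ctx/ydb-java-sdk/pom.xml" ]; then
        echo "expected $ctx/ydb-java-sdk/pom.xml" >&2
        return 1
    fi
}
```

The original failure came from `cp -a src dst` being run when dst already existed, which places the tree at dst/src rather than at dst.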

The Remove SLO label step used `gh`, which is not installed on the
self-hosted runner; call the REST API directly with curl instead.
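
For illustration, the curl-based removal might look like the sketch below. The function names are hypothetical; the endpoint is GitHub's REST route for removing a label from an issue, which also applies to pull requests.

```shell
# Build the REST URL for DELETE /repos/{owner}/{repo}/issues/{number}/labels/{name}.
#   $1  repository in owner/name form
#   $2  pull request number
#   $3  label name (defaults to SLO)
slo_label_url() {
    local repo="$1" pr="$2" label="${3:-SLO}"
    printf 'https://api.github.com/repos/%s/issues/%s/labels/%s\n' \
        "$repo" "$pr" "$label"
}

# Remove the label with curl. GITHUB_TOKEN must be set in the environment.
remove_slo_label() {
    local repo="$1" pr="$2"
    curl --fail --silent --show-error -X DELETE \
        -H "Authorization: Bearer ${GITHUB_TOKEN:?GITHUB_TOKEN is required}" \
        -H "Accept: application/vnd.github+json" \
        "$(slo_label_url "$repo" "$pr")"
}
```

Because of `--fail`, the call returns a non-zero status if the API responds with an error, for instance when the label is no longer on the PR.
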
init@v2 uploads its workload artifacts in a post-hook, so an inline
report@v2 step in the same job runs before the upload and sees zero
artifacts. Move report publication into a separate workflow triggered on
workflow_run: completed so it runs after init's post-hook has finished.

Pair the label removal with it. Trigger the removal only on success or
failure conclusions — cancelled and skipped runs leave the label in place
so a concurrent re-run can proceed with the label still set. Look up the
PR by head SHA when workflow_run.pull_requests is empty (fork PRs).

Remove the label on any SLO run completion — success, failure, cancelled,
or skipped — so the PR never gets stuck with the label. Drop the fork
fallback that looked up the PR by head SHA; same-repo PRs are the only
supported path. Use gh directly instead of curl since slo-report runs on
the shared ubuntu-latest runner where gh is preinstalled.
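
A hedged sketch of the gh-based variant: the function name is hypothetical, and it assumes the PR number is taken from the workflow_run payload, which is empty for fork PRs now that the head-SHA lookup has been dropped.

```shell
# Remove the label with the GitHub CLI.
#   $1  repository in owner/name form
#   $2  pull request number; may be empty when the run has no attached PR
#   $3  label name (defaults to SLO)
remove_slo_label_with_gh() {
    local repo="$1" pr="${2:-}" label="${3:-SLO}"
    if [ -z "$pr" ]; then
        # Nothing to clean up, so treat this as success rather than failing.
        echo "no pull request attached to this run; nothing to do" >&2
        return 0
    fi
    # gh picks up its token from GH_TOKEN or GITHUB_TOKEN in CI.
    gh pr edit "$pr" --repo "$repo" --remove-label "$label"
}
```
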
polRk added a commit to ydb-platform/ydb-cpp-sdk that referenced this pull request May 5, 2026
Rewrite the SLO workflows following the pattern used by ydb-java-sdk
(ydb-platform/ydb-java-sdk#644).

slo.yml:
- Drop the hand-rolled docker run orchestration that spun up YDB and
  invoked the workload with --dont-push / explicit create/run phases.
  The v2 action owns that lifecycle via deploy/compose.yml — we just
  hand it two prebuilt images.
- Gate on the `SLO` PR label.
- Build both images with a single `docker build` per ref; if the
  baseline commit can't be built (missing Dockerfile or compile error
  on a historical SHA), fall back to the current image so the run is
  comparable against itself rather than silently failing.
- Drop `--build-arg REF=…` — ref is now read from WORKLOAD_REF env at
  runtime.
- Rename matrix entry to `cpp-key-value` to match the built binary and
  collapse the per-compiler matrix to a single clang entry (gcc variant
  can be added back when needed).

slo_report.yml:
- Pin to @v2.
- Add a second job that removes the `SLO` label from the PR after the
  report is published, matching js-sdk/java-sdk/go-sdk.