[SPARK-57142][INFRA] Share SBT precompile artifact with tpcds-1g CI job #56200

Open
zhengruifeng wants to merge 1 commit into apache:master from zhengruifeng:precompile-tpcds-ci-share-dev5

@zhengruifeng
Contributor

What changes were proposed in this pull request?

This PR wires the tpcds-1g job in .github/workflows/build_and_test.yml to consume the shared precompile artifact, extending the pattern already applied to docker-integration-tests and k8s-integration-tests (SPARK-57069; parent SPARK-56830).

Concretely:

  • The precompile job's if: gate is extended to also fire when tpcds-1g == 'true' in the precondition output, so the artifact is available whenever the job runs.
  • tpcds-1g:
    • needs: precondition -> needs: [precondition, precompile]
    • if: extended with (!cancelled()) && so the job still runs if precompile is cancelled.
    • Adds "Download precompiled artifact" + "Extract precompiled artifact" steps after Java install, with graceful fallback (continue-on-error: true).
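The job-level wiring described above can be sketched roughly as follows. This is a minimal fragment, not the actual diff: it assumes the precondition job exposes a JSON output named `required` with a `tpcds-1g` key, and it omits the other consumers that the real precompile gate also lists.

```yaml
jobs:
  precompile:
    needs: precondition
    # Assumption: the precondition output is a JSON map named `required`.
    # The real gate also ORs in the other consumers (docker, k8s, and so on).
    if: fromJson(needs.precondition.outputs.required)['tpcds-1g'] == 'true'
    # A failed precompile must not fail the workflow.
    continue-on-error: true

  tpcds-1g:
    # Before this PR: `needs: precondition`
    needs: [precondition, precompile]
    # The parentheses keep YAML from parsing the leading `!` as a tag.
    if: >-
      (!cancelled()) &&
      fromJson(needs.precondition.outputs.required)['tpcds-1g'] == 'true'
```

Adding precompile to `needs:` makes `needs.precompile.result` available to the steps of tpcds-1g, which the fallback gating relies on.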

The tpcds-1g job drives SBT directly via build/sbt "sql/testOnly ..." (and build/sbt "sql/Test/runMain org.apache.spark.sql.GenTPCDSData ..." on a TPC-DS data cache miss), so it does not go through dev/run-tests.py and needs no SKIP_SCALA_BUILD flag -- the same situation as k8s-integration-tests. The first SBT invocation otherwise compiles sql/core (main + test) from scratch. The precompile job already runs Test/package, which compiles the sql/core test classes this job depends on (TPCDSQueryTestSuite, TPCDSCollationQueryTestSuite, GenTPCDSData, TPCDSSchema). Extracting the precompiled target/ lets SBT skip that compile and run the test phase directly.

Optional: graceful fallback if precompile fails

Same pattern as the prior consumers:

  • precompile keeps continue-on-error: true.
  • The "Download precompiled artifact" step is gated on needs.precompile.result == 'success' and has continue-on-error: true.
  • "Extract precompiled artifact" is gated on the download succeeding and has continue-on-error: true.
  • If extraction fails or the artifact is missing, SBT compiles from scratch exactly as before.

In the worst case, the job degrades to the pre-PR behavior rather than failing the workflow.
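A sketch of how the two gated steps might look, under stated assumptions: the step id `download-precompiled`, the artifact name `spark-precompiled`, the archive file name `precompiled.tar.gz`, and the action version are all hypothetical placeholders, not taken from the actual workflow.

```yaml
steps:
  - name: Download precompiled artifact
    id: download-precompiled            # hypothetical step id
    # Skip the download entirely unless precompile succeeded.
    if: needs.precompile.result == 'success'
    continue-on-error: true
    uses: actions/download-artifact@v4  # version is an assumption
    with:
      name: spark-precompiled           # hypothetical artifact name

  - name: Extract precompiled artifact
    # `outcome` is the step result before continue-on-error is applied,
    # so a failed or skipped download does not trigger extraction.
    if: steps.download-precompiled.outcome == 'success'
    continue-on-error: true
    run: |
      # Hypothetical archive name; log its size, then unpack target/ dirs.
      ls -lh precompiled.tar.gz
      tar -xzf precompiled.tar.gz
```

Gating on `outcome` rather than `conclusion` matters here: with `continue-on-error: true`, a failed step still reports a `conclusion` of `success`, which would let extraction run against a missing archive.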

Note: the existing comment "# Any TPC-DS related updates on this job need to be applied to tpcds-1g-gen job of benchmark.yml as well" refers to the TPC-DS data-generation parameters (scale factor, tpcds-kit ref, GenTPCDSData args). This PR changes none of those. It only adds build-artifact reuse, and benchmark.yml is a standalone workflow with no shared precompile job, so no corresponding change is needed there.

Why are the changes needed?

Today, every run of build_and_test.yml that requires tpcds-1g repeats the sql/core SBT compile that the precompile job has already performed for pyspark / sparkr / build / docker / k8s. Wiring tpcds-1g to the existing artifact removes that duplicate compile at no extra cost, since precompile is already running.

Does this PR introduce any user-facing change?

No. CI infrastructure change only.

How was this patch tested?

The change is exercised by the CI run of this PR itself. The Download/Extract steps log the artifact size; if the precompile job is forced to fail (or its artifact is missing), the job falls back to the original local SBT build.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Opus 4.7)

Wire the tpcds-1g job to consume the shared precompile artifact, extending the
pattern already used by docker-integration-tests and k8s-integration-tests
(SPARK-57069).

The tpcds-1g job drives SBT directly via 'build/sbt "sql/testOnly ..."', so the
first SBT invocation otherwise compiles sql/core (main + test) from scratch.
The precompile job already runs 'Test/package', which compiles the sql/core
test classes (TPCDSQueryTestSuite, TPCDSCollationQueryTestSuite, GenTPCDSData,
TPCDSSchema). Extracting the precompiled target/ lets SBT skip that compile and
run the test phase directly, the same way the k8s job reuses the artifact (no
SKIP_SCALA_BUILD needed since the job does not go through dev/run-tests.py).

- precompile 'if:' gate fires on tpcds-1g == 'true'.
- tpcds-1g: 'needs: precondition' -> 'needs: [precondition, precompile]', plus
  '(!cancelled()) &&' so it still runs if precompile is cancelled.
- Download/Extract steps after Java install, with graceful fallback
  (continue-on-error). If the artifact is missing, SBT compiles from scratch as
  before.

Generated-by: Claude Code (Opus 4.7)
@zhengruifeng changed the title from "[INFRA] Share SBT precompile artifact with tpcds-1g CI job" to "[SPARK-57142][INFRA] Share SBT precompile artifact with tpcds-1g CI job" on May 29, 2026
@zhengruifeng marked this pull request as ready for review on May 29, 2026 11:01