[SPARK-57142][INFRA] Share SBT precompile artifact with tpcds-1g CI job #56200
Open
zhengruifeng wants to merge 1 commit into
Wire the tpcds-1g job to consume the shared precompile artifact, extending the pattern already used by docker-integration-tests and k8s-integration-tests (SPARK-57069).

The tpcds-1g job drives SBT directly via 'build/sbt "sql/testOnly ..."', so the first SBT invocation otherwise compiles sql/core (main + test) from scratch. The precompile job already runs 'Test/package', which compiles the sql/core test classes (TPCDSQueryTestSuite, TPCDSCollationQueryTestSuite, GenTPCDSData, TPCDSSchema). Extracting the precompiled target/ lets SBT skip that compile and run the test phase directly, the same way the k8s job reuses the artifact (no SKIP_SCALA_BUILD needed since the job does not go through dev/run-tests.py).

- precompile 'if:' gate fires on tpcds-1g == 'true'.
- tpcds-1g: 'needs: precondition' -> 'needs: [precondition, precompile]', plus '(!cancelled()) &&' so it still runs if precompile is cancelled.
- Download/Extract steps after Java install, with graceful fallback (continue-on-error). If the artifact is missing, SBT compiles from scratch as before.

Generated-by: Claude Code (Opus 4.7)
What changes were proposed in this pull request?
This PR wires the `tpcds-1g` job in `.github/workflows/build_and_test.yml` to consume the shared `precompile` artifact, extending the pattern already applied to `docker-integration-tests` and `k8s-integration-tests` (SPARK-57069; parent SPARK-56830).

Concretely:

- The `precompile` job's `if:` gate is extended to also fire when `tpcds-1g == 'true'` in the precondition output, so the artifact is available whenever the job runs.
- `tpcds-1g`:
  - `needs: precondition` -> `needs: [precondition, precompile]`
  - `if:` extended with `(!cancelled()) &&` so the job still runs if precompile is cancelled.
  - Download/Extract steps added after the Java install, with graceful fallback (`continue-on-error: true`).

The `tpcds-1g` job drives SBT directly via `build/sbt "sql/testOnly ..."` (and `build/sbt "sql/Test/runMain org.apache.spark.sql.GenTPCDSData ..."` on a TPC-DS data cache miss), so it does not go through `dev/run-tests.py` and needs no `SKIP_SCALA_BUILD` flag, the same situation as `k8s-integration-tests`. Without the artifact, the first SBT invocation compiles `sql/core` (main + test) from scratch. The `precompile` job already runs `Test/package`, which compiles the `sql/core` test classes this job depends on (`TPCDSQueryTestSuite`, `TPCDSCollationQueryTestSuite`, `GenTPCDSData`, `TPCDSSchema`). Extracting the precompiled `target/` lets SBT skip that compile and run the test phase directly.

Optional: graceful fallback if precompile fails
Same pattern as the prior consumers:

- `precompile` keeps `continue-on-error: true`.
- The Download step is gated on `needs.precompile.result == 'success'` and has `continue-on-error: true`.
- The Extract step has `continue-on-error: true`.

The worst case degrades to the pre-PR behavior, not a workflow failure.
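For reviewers who want the shape of the wiring at a glance, here is a minimal sketch of the gate, the `needs:`/`if:` change, and the fallback steps. It is not copied from `build_and_test.yml`: the `fromJson(needs.precondition.outputs.required)` expression, the `actions/download-artifact@v4` reference, the artifact and archive names, the runner label, and the placement of the `needs.precompile.result == 'success'` check on the Download step are all assumptions made for illustration.

```yaml
# Sketch only. Names marked "hypothetical" are placeholders, not the real ones.
jobs:
  precompile:
    needs: precondition
    # Gate extended so the artifact is also built when tpcds-1g is required.
    # The conditions for the other consumers are omitted here.
    if: fromJson(needs.precondition.outputs.required).tpcds-1g == 'true'
    # A precompile failure must not fail the workflow.
    continue-on-error: true
    runs-on: ubuntu-latest   # assumed runner label
    # (build and upload steps omitted)

  tpcds-1g:
    needs: [precondition, precompile]
    # Per the PR: '(!cancelled()) &&' keeps this job running if precompile
    # is cancelled.
    if: (!cancelled()) && fromJson(needs.precondition.outputs.required).tpcds-1g == 'true'
    runs-on: ubuntu-latest   # assumed runner label
    steps:
      # (checkout and Java install steps come first; omitted)
      - name: Download precompiled artifact
        if: needs.precompile.result == 'success'
        continue-on-error: true
        uses: actions/download-artifact@v4
        with:
          name: spark-precompiled          # hypothetical artifact name
      - name: Extract precompiled artifact
        continue-on-error: true
        run: |
          # Hypothetical archive name; log its size, then unpack target/.
          if [ -f precompiled.tar.gz ]; then
            ls -lh precompiled.tar.gz
            tar -xzf precompiled.tar.gz
          else
            echo "No precompiled artifact found; SBT will compile from scratch."
          fi
      - name: Run TPC-DS queries
        # Arguments elided exactly as in the PR description.
        run: build/sbt "sql/testOnly ..."
```

If either step is skipped or fails, the final `build/sbt` step still runs and compiles `sql/core` locally, which is the pre-PR behavior.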
Note: the existing `# Any TPC-DS related updates on this job need to be applied to tpcds-1g-gen job of benchmark.yml as well` comment refers to TPC-DS data-generation parameters (scale factor, `tpcds-kit` ref, `GenTPCDSData` args). This PR changes none of those: it only adds build-artifact reuse, and `benchmark.yml` is a standalone workflow with no shared `precompile` job, so no corresponding change is needed there.

Why are the changes needed?
Today every run of `build_and_test.yml` that requires `tpcds-1g` re-runs the same `sql/core` SBT compile that the `precompile` job already produced for `pyspark` / `sparkr` / `build` / docker / k8s. Wiring `tpcds-1g` to the existing artifact removes that duplicate compile at no extra cost (precompile is already running).

Does this PR introduce any user-facing change?
No. CI infrastructure change only.
How was this patch tested?
The change is exercised by the CI run of this PR itself. The Download/Extract steps log the artifact size; if the precompile job is forced to fail (or its artifact is missing), the job falls back to the original local SBT build.
Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Code (Opus 4.7)