[AURON #2291] Make auron-flink-assembly the structural deployment jar#2292
[AURON #2291] Make auron-flink-assembly the structural deployment jar#2292weiqingy wants to merge 7 commits into
Conversation
…embly jar The auron-flink-assembly shaded jar bundles both Auron's StreamExecCalc override (from auron-flink-planner) and Flink's stock copy (transitive from flink-table-planner_2.12). Add a maven-shade-plugin filter that excludes the stock class so only Auron's override remains, making activation a structural guarantee instead of depending on shade's class-overlap ordering.
…y jar The auron-flink-assembly jar bundles a modified copy of Apache Flink's flink-table-planner (Auron's StreamExecCalc substituted in) plus a large dependency closure. Add ASF-compliant license documentation: - META-INF/NOTICE: Auron product header (via ApacheNoticeResourceTransformer), the bundled-modified-Flink statement with an ALv2 section 4(b) note naming StreamExecCalc, and kafka-clients' required jar-root attribution folded in. - Root LICENSE: Apache License 2.0 plus blocks for every bundled non-ALv2 dependency (icu4j Unicode/ICU; protobuf-java, kryo, minlog, janino, commons-compiler BSD-3; zstd-jni BSD-2; jts-core EPL-2.0; checker-qual MIT). - Shade transformers and filters to aggregate dependency notices and drop the colliding jar-root NOTICE/LICENSE files of kafka-clients and icu4j.
AssemblyJarStructureIT (maven-failsafe, verify phase) opens the built auron-flink-assembly jar and asserts it is structurally correct: exactly one StreamExecCalc class, that it is Auron's override (byte-level check for the FlinkAuronCalcOperator marker, which stock Flink's class lacks), that the full Flink planner content is bundled, and that META-INF/NOTICE attributes both Apache Auron and Apache Flink. Guards against the stock StreamExecCalc winning the shade.
Add the auron-flink-assembly module to the Flink CI job and run AssemblyJarStructureIT in the verify phase. A single `mvn verify -pl auron-flink-extension/auron-flink-assembly -am` builds the native library and all upstream Flink modules (so auron-core's test-jar is produced for auron-flink-runtime), runs their tests, then verifies the shaded assembly jar.
|
Hi @Tartarus0zm, could you please help review this PR when you get a chance? Thank you! |
There was a problem hiding this comment.
@weiqingy Thanks for your contribution, I think this is a very rigorous strategy, I completely agree.
But I haven't seen any targeted tests for StreamExecCalc. In the future, when we rewrite other Exec nodes, developers will need to follow this convention, but we don't have a validation mechanism in place. It looks like we'll have to rely solely on a handful of committers doing code reviews to ensure it works, and that's what worries me.
- Can we define an interface, something like
FlinkAuronStreamExec, that our rewritten FlinkStreamExecCalcwould need to implement? Then, going forward, any support we add for other Exec nodes would also have to comply with this constraint? That way we can write dedicated tests inAssemblyJarStructureIT. - Can we add a check in the auron-flink-planner module that scans all rewritten classes under
org/apache/flink/table/planner/plan/nodes/execand verifies they all implement theFlinkAuronStreamExecinterface? This would ensure no developer forgets to do so. - We can add a reminder in the
FlinkAuronStreamExecdocs, developers also need to configure the filter strategy inauron-flink-assembly/pom.xml
What do you think, does this work for you?
…ker interface Adding a new Auron override of a Flink ExecNode requires shipping a class at Flink's fully-qualified class name and excluding the stock copy from the assembly shade filter. Nothing failed the build if a step was missed, so a forgotten exclude could let Flink's stock class win the shade and the override go silently inactive. Introduce FlinkAuronExecNode, a marker interface every override implements, and a registry resource (META-INF/auron/shadowed-flink-execnodes.txt) listing the override class names. ShadowedExecNodeRegistryTest scans the planner's exec-node package and fails the build when an override does not implement the marker or is absent from the registry, making the contract self-enforcing. StreamExecCalc is the first registered override.
…y shade AssemblyJarStructureIT previously hard-coded a single StreamExecCalc check. As more Flink ExecNode overrides are added, each needs its own shade <exclude>; a forgotten one lets Flink's stock class win the shade and the override silently fall back, with no failing test. Drive the structural test off the override registry instead: for every class named in META-INF/auron/shadowed-flink-execnodes.txt, assert the assembly jar holds exactly one copy and it carries the FlinkAuronExecNode marker. A missing <exclude> drops the marker from the surviving class and fails the test. The assembly pom filter comment now documents that per-override step.
@Tartarus0zm Thanks for the careful read and review. I've implemented all three of your suggestions, and backed the last one with a test rather than a doc so the part your note worried about isn't left to committer vigilance. Pushed in the two new commits. 1. Marker interface. Added 2. Scan check in 3. The Net effect: each of the three "forgot a step" mistakes now fails a specific test instead of resting on review. Does this match what you had in mind? And let me know your preference on the marker name. |
|
The 1 failing check is unrelated to this PR — a transient CI-runner network failure in the Spark 4.1 test track, which this PR doesn't touch.
This PR only changes Flink modules ( |
Tartarus0zm
left a comment
There was a problem hiding this comment.
Thanks for your contribution again! LGTM
Which issue does this PR close?
Closes #2291
Rationale for this change
auron-flink-assemblyis the single Flink deployment jar. It already bundles both Auron'sStreamExecCalcoverride and Flink's stockStreamExecCalc(transitive fromflink-table-planner_2.12), but with no class-level filter the surviving copy was decided by maven-shade's class-overlap ordering — not a guarantee. This makes Auron's override win structurally, so activation no longer depends on incidental shade/classpath ordering.What changes are included in this PR?
StreamExecCalcfrom the shaded jar so only Auron's override remains.AssemblyJarStructureIT, a build-time test asserting the jar contains exactly oneStreamExecCalcand it is Auron's.Are there any user-facing changes?
The recommended deployment becomes a single structural jar swap (remove
flink-table-planner-loader-*.jarfrom$FLINK_HOME/lib/, addauron-flink-assembly.jar). The user-facing deployment documentation is tracked separately in #1892.How was this patch tested?
AssemblyJarStructureIT(build-time, runs in CI). Manually verified the loader to non-loader jar swap on a Flink 1.18 home running a Calc SQL with native conversion active.