Commit 7712cf3
feat(ml): add stateless bundle-local size-aware batching and benchmark (#37532)
* feat(ml): add stateless bundle-local size-aware batching and benchmark
* fix(ml): improve test coverage for SortAndBatchElements
- Exclude *_benchmark.py from codecov (standalone scripts, not production code)
- Remove redundant validation from internal DoFn classes (already validated by PTransform)
- Add direct in-process unit tests for DoFn internals to capture coverage
  (FnApiRunner runs DoFns in a separate process, invisible to coverage tools)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Address PR review: clarify benchmark comment and warn on len() fallback
Reframe benchmark docstring to clarify that sorting combined with
weight-based splitting drives the improvement. Move default element
size fallback into DoFn instances with a one-time warning when len()
is unsupported, so users know to provide a custom element_size_fn.
* Migrate JupyterLab sidepanel extension to prebuilt package distribution
Replace deprecated jupyter labextension install/link workflow with
pip-installable prebuilt extension for JupyterLab 4+ compatibility.
- Add install.json for prebuilt extension discovery metadata
- Add style/index.js CSS entry point and styleModule field in package.json
- Include js in package.json files glob so style/index.js is published
- Add Extensions and Extensions :: Prebuilt classifiers to pyproject.toml
- Add missing src/yaml/* to tsconfig.json includes
- Remove deprecated labextension install/link/build instructions from READMEs
- Replace ipywidgets labextension install with pip install in Interactive README
* Use real Beam pipelines in sort-and-batch benchmark
* Address all review comments
- Reuse _WindowAwareBatchingDoFn._MAX_LIVE_WINDOWS instead of keeping a separate hard-coded limit in SortAndBatchElements.
- Drop the padding-efficiency unit test that compared incongruent batching strategies and keep the transform tests focused on deterministic behavior.
- Align benchmark typing with modern Python style by using collections.abc imports and native built-in generics.
- Make the sorted-order test clearer by naming the expected batch contents explicitly.
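
For readers without the diff at hand, the idea named in the title (sort elements by size, then split on cumulative weight) can be sketched in plain Python. This is only an illustration under stated assumptions, not the code of `SortAndBatchElements`: the function name `sort_and_batch` and the parameter `max_batch_weight` are hypothetical, and a single in-memory list stands in for one bundle. Only `element_size_fn` is a name taken from the commit message.

```python
from collections.abc import Callable, Iterable
from typing import Any


def sort_and_batch(
    elements: Iterable[Any],
    max_batch_weight: int,  # hypothetical parameter name
    element_size_fn: Callable[[Any], int] = len,
) -> list[list[Any]]:
    """Sort elements by size, then split greedily on cumulative weight."""
    # Sorting first places similarly sized elements next to each other.
    ordered = sorted(elements, key=element_size_fn)
    batches: list[list[Any]] = []
    current: list[Any] = []
    current_weight = 0
    for element in ordered:
        weight = element_size_fn(element)
        # Close the current batch if this element would push it over the cap.
        if current and current_weight + weight > max_batch_weight:
            batches.append(current)
            current, current_weight = [], 0
        current.append(element)
        current_weight += weight
    if current:
        batches.append(current)
    return batches


batches = sort_and_batch(["aaaa", "a", "aaa", "aa", "aaaaa"], max_batch_weight=6)
print(batches)
```

In this sketch an element heavier than the cap still gets a batch of its own rather than being dropped. Whether the real transform behaves the same way is not stated in the commit message.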
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 2f300cc commit 7712cf3
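
The review item about the `len()` fallback describes a default size function that warns once when an element does not support `len()`. A minimal sketch of that pattern follows. The class name `DefaultSizeFallback`, the fallback size of 1, and the warning text are all assumptions made for illustration; the commit message only states that the fallback lives in the DoFn instances and that the warning is issued once.

```python
import logging
from typing import Any

_LOGGER = logging.getLogger(__name__)


class DefaultSizeFallback:
    """Hypothetical helper: len() when supported, else a constant size."""

    def __init__(self, default_size: int = 1) -> None:
        # default_size=1 is an assumption, not a value from the commit.
        self._default_size = default_size
        self._warned = False

    def __call__(self, element: Any) -> int:
        try:
            return len(element)
        except TypeError:
            # Warn only the first time so large inputs do not flood the logs.
            if not self._warned:
                _LOGGER.warning(
                    "len() is unsupported for %s; using size %d. "
                    "Consider providing a custom element_size_fn.",
                    type(element).__name__,
                    self._default_size,
                )
                self._warned = True
            return self._default_size


size_fn = DefaultSizeFallback()
sizes = [size_fn("abc"), size_fn(42), size_fn(3.14)]
print(sizes)
```

Keeping the warned flag on the instance rather than at module level means each instance warns independently, which matches the commit's description of moving the fallback into DoFn instances.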
11 files changed
Lines changed: 1308 additions & 54 deletions
File tree
- .github
- sdks/python/apache_beam
- runners/interactive
- extensions/apache-beam-jupyterlab-sidepanel
- style
- testing/benchmarks
- transforms
Lines changed: 13 additions & 39 deletions
Lines changed: 5 additions & 0 deletions
Lines changed: 2 additions & 1 deletion
Lines changed: 2 additions & 0 deletions
Lines changed: 13 additions & 0 deletions
Lines changed: 1 addition & 0 deletions