Commit 7712cf3

Eliaaazzz and claude authored
feat(ml): add stateless bundle-local size-aware batching and benchmark (#37532)
* feat(ml): add stateless bundle-local size-aware batching and benchmark

* fix(ml): improve test coverage for SortAndBatchElements

  - Exclude *_benchmark.py from codecov (standalone scripts, not production code)
  - Remove redundant validation from internal DoFn classes (already validated by PTransform)
  - Add direct in-process unit tests for DoFn internals to capture coverage (FnApiRunner runs DoFns in separate process, invisible to coverage tools)

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Address PR review: clarify benchmark comment and warn on len() fallback

  Reframe benchmark docstring to clarify that sorting combined with weight-based splitting drives the improvement. Move default element size fallback into DoFn instances with a one-time warning when len() is unsupported, so users know to provide a custom element_size_fn.

* Migrate JupyterLab sidepanel extension to prebuilt package distribution

  Replace deprecated jupyter labextension install/link workflow with pip-installable prebuilt extension for JupyterLab 4+ compatibility.

  - Add install.json for prebuilt extension discovery metadata
  - Add style/index.js CSS entry point and styleModule field in package.json
  - Include js in package.json files glob so style/index.js is published
  - Add Extensions and Extensions :: Prebuilt classifiers to pyproject.toml
  - Add missing src/yaml/* to tsconfig.json includes
  - Remove deprecated labextension install/link/build instructions from READMEs
  - Replace ipywidgets labextension install with pip install in Interactive README

* Use real Beam pipelines in sort-and-batch benchmark

* Address all review comments

  - Reuse _WindowAwareBatchingDoFn._MAX_LIVE_WINDOWS instead of keeping a separate hard-coded limit in SortAndBatchElements.
  - Drop the padding-efficiency unit test that compared incongruent batching strategies and keep the transform tests focused on deterministic behavior.
  - Align benchmark typing with modern Python style by using collections.abc imports and native built-in generics.
  - Make the sorted-order test clearer by naming the expected batch contents explicitly.

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 2f300cc commit 7712cf3
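
The commit message describes sorting the elements of a bundle by size, splitting them into batches by total weight, and falling back to a default size with a one-time warning when `len()` is unsupported. The following is a minimal pure-Python sketch of that idea, not the code in this commit: the names `sort_and_batch`, `make_default_size_fn`, and `max_batch_weight`, as well as the fallback size of 1, are hypothetical and chosen only for illustration.

```python
import warnings
from collections.abc import Callable, Iterable
from typing import Any, Optional


def make_default_size_fn() -> Callable[[Any], int]:
    """Build a size function based on len() that warns once on fallback.

    Hypothetical helper: the fallback size of 1 is an assumption.
    """
    warned = False

    def size_fn(element: Any) -> int:
        nonlocal warned
        try:
            return len(element)
        except TypeError:
            if not warned:
                warnings.warn(
                    "len() is unsupported for some elements; using size 1. "
                    "Pass a custom element_size_fn for meaningful weights.")
                warned = True
            return 1

    return size_fn


def sort_and_batch(
    bundle: Iterable[Any],
    max_batch_weight: int,
    element_size_fn: Optional[Callable[[Any], int]] = None,
) -> list[list[Any]]:
    """Sort one bundle by element size, then split it by total weight."""
    size_fn = element_size_fn or make_default_size_fn()
    # Compute each weight once, then sort so similar sizes end up adjacent.
    weighted = sorted(((size_fn(e), e) for e in bundle), key=lambda p: p[0])

    batches: list[list[Any]] = []
    current: list[Any] = []
    current_weight = 0
    for weight, element in weighted:
        # Close the current batch when adding this element would exceed the cap.
        if current and current_weight + weight > max_batch_weight:
            batches.append(current)
            current, current_weight = [], 0
        current.append(element)
        current_weight += weight
    if current:
        batches.append(current)
    return batches
```

Because the sketch keeps no state between calls, each bundle is batched independently. An element heavier than `max_batch_weight` still gets a batch of its own rather than being dropped.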

11 files changed

Lines changed: 1308 additions & 54 deletions

.github/codecov.yml

Lines changed: 1 addition & 0 deletions
@@ -73,6 +73,7 @@ ignore:
  - "**/*_microbenchmark.py"
  - "sdks/go/pkg/beam/register/register.go"
  - "sdks/python/apache_beam/testing/benchmarks/nexmark/**"
+ - "**/*_benchmark.py"
  - "sdks/python/apache_beam/examples/**"

  # See https://docs.codecov.com/docs/flags for options.

sdks/python/apache_beam/runners/interactive/README.md

Lines changed: 1 addition & 14 deletions
@@ -244,23 +244,10 @@ a quick reference). For a more general and complete getting started guide, see
  jupyter kernelspec list
  ```

- * Extend JupyterLab through labextension. **Note**: labextension is different from nbextension
- from pre-lab jupyter notebooks.
-
- All jupyter labextensions need nodejs
-
- ```bash
- # Homebrew users do
- brew install node
- # Or Conda users do
- conda install -c conda-forge nodejs
- ```
-
- Enable ipywidgets
+ * Install ipywidgets (includes the JupyterLab widget manager as a prebuilt extension):

  ```bash
  pip install ipywidgets
- jupyter labextension install @jupyter-widgets/jupyterlab-manager
  ```

  ### Start the notebook

sdks/python/apache_beam/runners/interactive/extensions/apache-beam-jupyterlab-sidepanel/README.md

Lines changed: 13 additions & 39 deletions
@@ -31,41 +31,22 @@ Includes two different side panels:

  ## Installation

- There are two ways to install the extension:
-
- ### 1. Via pip (recommended)
-
- The extension is now available as a Python package on PyPI. You can install it with:
+ This extension is distributed as a prebuilt Python package. Install it with pip:

  ```bash
  pip install apache-beam-jupyterlab-sidepanel
  ```

- After installation, rebuild JupyterLab to activate the extension:
-
- ```bash
- jupyter lab clean
- jupyter lab build
- ```
-
- Then restart JupyterLab. The side panels will be available automatically.
+ Then restart JupyterLab. The side panels will be available automatically — no
+ `jupyter lab build` step is needed.

-
- ### 2. Via JupyterLab Extension Manager (legacy, will be deprecated soon)
+ You can verify the extension is installed:

  ```bash
- jupyter labextension install apache-beam-jupyterlab-sidepanel
+ jupyter labextension list
  ```

- This installs the extension using JupyterLab's legacy extension system.
-
- ---
-
- ## Notes
-
- - Pip installation is now the preferred method as it handles Python packaging and JupyterLab extension registration seamlessly.
- - After any upgrade or reinstallation, always rebuild JupyterLab to ensure the extension is activated.
- - For detailed usage and development, refer to the source code and issues on [GitHub](https://github.com/apache/beam).
+ The extension should appear under the **prebuilt extensions** section.

  ---

@@ -90,15 +71,12 @@ The `jlpm` command is JupyterLab's pinned version of

  # Install dependencies
  jlpm
- # Build Typescript source
- jlpm build
- # Link your development version of the extension with JupyterLab
- jupyter labextension link .

- # Rebuild Typescript source after making changes
- jlpm build
- # Rebuild JupyterLab after making any changes
- jupyter lab build
+ # Install the extension in editable mode (runs an initial JS build)
+ pip install -e .
+
+ # Verify installation
+ jupyter labextension list
  ```

  You can watch the source directory and run JupyterLab in watch mode to watch for changes in the extension's source and automatically rebuild the extension and application.
@@ -110,7 +88,7 @@ jlpm watch
  jupyter lab --watch
  ```

- Now every change will be built locally and bundled into JupyterLab. Be sure to refresh your browser page after saving file changes to reload the extension (note: you'll need to wait for webpack to finish, which can take 10s+ at times).
+ Now every change will be built locally and bundled into JupyterLab. Be sure to refresh your browser page after saving file changes to reload the extension (note: you'll need to wait for the build to finish, which can take 10s+ at times).

  ### Test

@@ -214,9 +192,5 @@ $PREFIX/share/jupyter/labextensions/apache-beam-jupyterlab-sidepanel/
  ### Uninstall

  ```bash
- jupyter labextension uninstall apache-beam-jupyterlab-sidepanel
- ```
- or
- ```bash
- pip uninstall apache-beam-jupyterlab-sidepanel
+ pip uninstall apache_beam_jupyterlab_sidepanel
  ```
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+ {
+ "packageManager": "python",
+ "packageName": "apache_beam_jupyterlab_sidepanel",
+ "uninstallInstructions": "Use your Python package manager (pip, conda, etc.) to uninstall the package apache_beam_jupyterlab_sidepanel"
+ }
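
The five-line JSON file above appears to be the `install.json` discovery metadata mentioned in the commit message. As a rough sketch of how one might inspect it after installation, the snippet below looks for the file under the `share/jupyter/labextensions/apache-beam-jupyterlab-sidepanel/` directory that the README hunk references. That `install.json` sits directly in that directory is an assumption, and `read_install_metadata` is a hypothetical helper rather than part of this commit or of JupyterLab.

```python
import json
import sys
from pathlib import Path
from typing import Optional

# Directory name taken from the README path shown in the diff above.
EXTENSION_DIR = "apache-beam-jupyterlab-sidepanel"


def read_install_metadata(
    prefix: str, extension_dir: str = EXTENSION_DIR
) -> Optional[dict]:
    """Return the parsed install.json for the extension, or None if absent.

    Assumes the file is at
    <prefix>/share/jupyter/labextensions/<extension_dir>/install.json.
    """
    path = (
        Path(prefix) / "share" / "jupyter" / "labextensions"
        / extension_dir / "install.json"
    )
    if not path.is_file():
        return None
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)


if __name__ == "__main__":
    metadata = read_install_metadata(sys.prefix)
    if metadata is None:
        print("install.json not found under", sys.prefix)
    else:
        print("package manager:", metadata.get("packageManager"))
        print("package name:", metadata.get("packageName"))
```

Running `jupyter labextension list`, as the README suggests, remains the simpler check; this sketch only shows where the metadata would be read from under the stated assumption.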

sdks/python/apache_beam/runners/interactive/extensions/apache-beam-jupyterlab-sidepanel/package.json

Lines changed: 2 additions & 1 deletion
@@ -15,7 +15,7 @@
  "author": "apache-beam",
  "files": [
  "lib/**/*.{d.ts,eot,gif,html,jpg,js,js.map,json,png,svg,woff2,ttf}",
- "style/**/*.{css,eot,gif,html,jpg,json,png,svg,woff2,ttf}"
+ "style/**/*.{css,js,eot,gif,html,jpg,json,png,svg,woff2,ttf}"
  ],
  "main": "lib/index.js",
  "types": "lib/index.d.ts",
@@ -100,6 +100,7 @@
  "style/*.css",
  "style/index.js"
  ],
+ "styleModule": "style/index.js",
  "jupyterlab": {
  "extension": true,
  "outputDir": "apache_beam_jupyterlab_sidepanel/labextension"

sdks/python/apache_beam/runners/interactive/extensions/apache-beam-jupyterlab-sidepanel/pyproject.toml

Lines changed: 2 additions & 0 deletions
@@ -33,6 +33,8 @@ classifiers = [
  "Framework :: Jupyter",
  "Framework :: Jupyter :: JupyterLab",
  "Framework :: Jupyter :: JupyterLab :: 4",
+ "Framework :: Jupyter :: JupyterLab :: Extensions",
+ "Framework :: Jupyter :: JupyterLab :: Extensions :: Prebuilt",
  "License :: OSI Approved :: Apache Software License",
  "Programming Language :: Python :: 3",
  ]
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+ // Licensed under the Apache License, Version 2.0 (the 'License'); you may not
+ // use this file except in compliance with the License. You may obtain a copy of
+ // the License at
+ //
+ // http://www.apache.org/licenses/LICENSE-2.0
+ //
+ // Unless required by applicable law or agreed to in writing, software
+ // distributed under the License is distributed on an 'AS IS' BASIS, WITHOUT
+ // WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ // License for the specific language governing permissions and limitations under
+ // the License.
+
+ import './index.css';

sdks/python/apache_beam/runners/interactive/extensions/apache-beam-jupyterlab-sidepanel/tsconfig.json

Lines changed: 1 addition & 0 deletions
@@ -29,6 +29,7 @@
  "src/common/*",
  "src/kernel/*",
  "src/inspector/*",
+ "src/yaml/*",
  "src/__tests__/**/*"
  ]
  }
