Add bucket+mount transport for Jobs script upload #4025
davanstrien wants to merge 27 commits into main from
Conversation
When `HF_JOBS_USE_BUCKET_TRANSPORT=1` is set, `run_uv_job` and
`create_scheduled_uv_job` upload local scripts to a
`{namespace}/jobs-artifacts` bucket instead of base64-encoding them
into an environment variable. The bucket is mounted at `/artifacts`
and the job runs scripts directly from disk.
Falls back to base64 transport if bucket creation fails, hf_xet is
unavailable, or the `/artifacts` mount path is already taken by a
user-provided volume. The existing base64 path is unchanged.
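The opt-in-with-fallback selection described above can be sketched as follows; `build_script_transport` and its return shape are illustrative stand-ins, not the PR's actual helpers:

```python
import base64
import os

def build_script_transport(script_bytes: bytes, filename: str) -> dict:
    """Illustrative sketch: bucket transport when the opt-in flag is set,
    base64-in-env otherwise (and as the fallback if the bucket path fails)."""
    if os.environ.get("HF_JOBS_USE_BUCKET_TRANSPORT") == "1":
        try:
            # Stand-in for bucket creation + upload; on failure we would
            # fall through to the base64 branch below.
            return {"mode": "bucket", "path": f"/artifacts/{filename}"}
        except Exception:
            pass
    encoded = base64.b64encode(script_bytes).decode("ascii")
    return {"mode": "base64", "env": {"SCRIPT_B64": encoded}}
```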
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
Follows lhoestq's review suggestion in #4025. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Wauplin
left a comment
Thanks for working on that @davanstrien ! Pretty excited by what it will unlock so let's double-down on this direction 😃
Follows Wauplin's review suggestion in #4025.
Follows Wauplin's review suggestion in #4025. Makes it easier for users to browse the artifacts bucket and find a specific Job by submission time. Format: `YYYYMMDDTHHMMSS-{6 hex digits}` (UTC). Side effect: `uuid` is no longer used in this code path, so the import added in the previous commit is removed.
…onally

Follows Wauplin's review in #4025: "get rid of the base64-encoding branch [...] let's move to buckets and maintain only one transport mode". The `HF_JOBS_USE_BUCKET_TRANSPORT` opt-in is removed and bucket transport is now the only path for local scripts. The three failure modes that previously silently fell back to base64 now raise:
- Missing `hf_xet`: `ImportError` with an install hint.
- Reserved mount path already in use: `ValueError`.
- Bucket creation/upload failure: the underlying exception propagates.

Also removes the regression test for bash shell quoting — the `bash -c` code path no longer exists, so `uv run` receives args as argv and shell metacharacters in dependency specifiers are no longer interpreted.
Follows Wauplin's review in #4025: mount with the per-job prefix so the Jobs container sees only the artifacts folder created for that job. Previously the whole `{ns}/jobs-artifacts` bucket was mounted at `/artifacts` and scripts lived under `/artifacts/scripts/{timestamp}-{hex}/`. Now the volume is constructed with `path="scripts/{timestamp}-{hex}"`, so the job sees its files directly at `/artifacts/`. This also drops the `scripts_prefix` from `_upload_scripts_to_bucket`'s return type (it's no longer needed by the caller) and shortens the generated `uv run` command.
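A sketch of the mount shape this commit describes; `mount_spec` and the dict layout are hypothetical, chosen only to show which part of the bucket the container sees:

```python
def mount_spec(namespace: str, prefix: str) -> dict:
    """Mount only the per-job subfolder, so the container sees its files
    at the mount root instead of under a nested scripts/ path."""
    return {
        "bucket": f"{namespace}/jobs-artifacts",
        "path": f"scripts/{prefix}",   # scope the volume to this subfolder
        "mount_path": "/artifacts",    # job reads /artifacts/<filename>
    }
```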
Follows Wauplin's review in #4025: prefer `/data` since it mirrors the historical Spaces persistent-storage path and is shorter to remember. The bucket name stays `{ns}/jobs-artifacts`.
Follows Wauplin's review in #4025 ("should we make it read-write to encourage users to write their output in it?"). The mount is already read-write in practice — `Volume.read_only` defaults to `False` for buckets and the Hub normalises omitted and explicit-false to the same downstream payload — so this commit just makes the intent explicit at the call site, documents it in the docstring, and pins the invariant with a test assertion.
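The "already read-write in practice" point can be illustrated with a minimal stand-in for the real `Volume` class, assuming only that `read_only` defaults to `False`:

```python
from dataclasses import dataclass

@dataclass
class Volume:
    """Minimal stand-in for the real Volume class."""
    bucket: str
    mount_path: str
    read_only: bool = False  # buckets default to read-write

implicit = Volume("user/jobs-artifacts", "/data")
explicit = Volume("user/jobs-artifacts", "/data", read_only=False)
# Omitted and explicit-False normalise to the same payload; passing
# read_only=False just makes the read-write intent visible at the call site.
assert implicit == explicit
```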
…ict param

Cursor Bugbot flagged that the new top-level `import secrets` shadows the `secrets: dict` parameter in `_create_uv_command_env_and_secrets`, `run_uv_job`, and `create_scheduled_uv_job`. The current call site is safe (it lives in `_upload_scripts_to_bucket`, which has no such parameter), but a future refactor that inlines or moves it would silently break. Importing `token_hex` by name eliminates the hazard and matches the existing convention in this file (e.g. `from itertools import islice` alongside `import itertools`).
(converted to draft, @davanstrien could you reset to "ready for review" when ready? Just trying to filter my notifications here 😄)
yes sorry for the noise!
Thanks both! Pushed follow-up commits working through the feedback. TL;DR: took @Wauplin's stronger suggestion and dropped the base64 path entirely, which enabled some other simplifications. @lhoestq — the
Smoke-tested end-to-end against production.
Ran:
Job logs:
Bucket state after:
`hf_xet` is a default `install_requires` dependency on all common platforms (setup.py line 20), so the previous `pip install huggingface_hub[hf_xet]` hint was misleading — the extra is redundant with the default install. The actual failure modes (unusual platform, `HF_HUB_DISABLE_XET` set, broken/stale install) are varied enough that linking to the installation guide is cleaner than prescribing a specific command.
Wauplin
left a comment
Thanks! Some minor comments but logic looks good to me. Haven't checked the tests much yet
Co-authored-by: Lucain <lucain@huggingface.co>
Per review feedback: the volume is read-write so jobs can save
artifacts back to the same prefix, which makes scoping uploads under
`scripts/` misleading. Files now land at `{timestamp}-{hex}/<name>`
at the bucket root, and the volume path is the per-job subfolder.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The `hf_xet` availability check was removed from `_upload_scripts_to_bucket`, so `huggingface_hub.hf_api.is_xet_available` no longer exists as a patch target. Removes the now-dead `patch(...)` calls and the `test_raises_when_xet_unavailable` test that asserted the (removed) ImportError. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Wauplin
left a comment
Approved once comments are addressed 😃
Tested it locally on a dummy script and it worked fine https://huggingface.co/buckets/Wauplin/jobs-artifacts/tree/20260415T092902-5ca267
I also took the liberty to update the PR description since the PR scope changed a bit since you created it
Co-authored-by: Lucain <lucain@huggingface.co>
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 0fb46cc.
Per Wauplin's review comment: users had no way to know where the silently-created `jobs-artifacts` bucket lives. `_upload_scripts_to_bucket` now captures the `BucketUrl` returned by `create_bucket` and prints it after the upload succeeds. `print` over `logger.info` because the hub logger defaults to WARNING, which would make `logger.info` silent for anyone who hasn't opted into verbose logging — defeating the point of the message. Also fixes a stale test assertion: the earlier accepted `private=True` suggestion didn't update `test_bucket_transport_uploads_and_returns_volume`, so the test was broken on the branch tip. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
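The print-vs-logger rationale is easy to verify: an unconfigured library logger inherits the root logger's level, whose stdlib default is WARNING, so INFO messages are dropped. A minimal demonstration:

```python
import logging

# Make the stdlib default explicit so the check is deterministic.
logging.getLogger().setLevel(logging.WARNING)

# An unconfigured library logger inherits the root WARNING threshold,
# so logger.info("bucket URL: ...") would be silent for most users.
hub_logger = logging.getLogger("huggingface_hub.sketch")
info_visible = hub_logger.isEnabledFor(logging.INFO)
# print() bypasses the logging threshold and always reaches the user.
```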

Use bucket transport for Jobs script upload
When `hf jobs uv run` is given local files (scripts, configs, etc.), we now upload them to a `{namespace}/jobs-artifacts` bucket and mount it into the job container at `/data` instead of base64-encoding them into an environment variable.

This replaces the old `bash -c` + `xargs` + `base64 -d` pipeline, which was fragile and required manual shell quoting. The bucket approach is simpler, easier to debug, and importantly lets jobs write output artifacts back to `/data/` since the mount is read-write.

How it works
- Local files are uploaded to `{namespace}/jobs-artifacts` under a `{YYYYMMDDTHHMMSS}-{6 hex digits}` subfolder (makes it easy to find artifacts for a specific job by timestamp)
- A `Volume` scoped to that subfolder is mounted at `/data`, so the job container sees its files directly at the mount root
- The generated `uv run` command references `/data/<filename>` — no more shell wrappers
- If a user-provided volume already uses `/data`, we raise a clear `ValueError`

What changed vs the original base64 approach
The base64 transport path is fully removed — there's no opt-in flag, no fallback. Bucket transport is the only mode now. This was the simplest thing to do since base64 was experimental/hacky and maintaining two transport modes wasn't worth it.
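The difference in the generated command can be sketched as follows (illustrative helpers, not the PR's code):

```python
def base64_command(encoded: str, filename: str) -> list[str]:
    """The removed transport: decode the script inside the container via a
    shell pipeline, which required careful quoting of every argument."""
    inner = f"echo {encoded} | base64 -d > /tmp/{filename} && uv run /tmp/{filename}"
    return ["bash", "-c", inner]

def bucket_command(filename: str) -> list[str]:
    """Bucket transport: the file is already on disk at the mount root, so
    uv receives plain argv and shell metacharacters are never interpreted."""
    return ["uv", "run", f"/data/{filename}"]
```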
Constants
Two new constants in `constants.py`:
- `HF_JOBS_ARTIFACTS_BUCKET_NAME = "jobs-artifacts"`
- `HF_JOBS_ARTIFACTS_MOUNT_PATH = "/data"`

Files changed
- `src/huggingface_hub/constants.py` — new bucket/mount constants
- `src/huggingface_hub/hf_api.py` — `_create_uv_command_env_and_secrets` rewritten to use bucket transport, new `_upload_scripts_to_bucket` helper
- `tests/test_cli.py` — new `TestBucketTransport` class replacing the old shell-quoting tests (happy path, failure propagation, mount collision, multi-file upload)

Note
Medium Risk
Changes the transport mechanism for local files in Jobs from env-var base64 injection to Hub bucket uploads, which affects job execution and storage semantics. Risk is mainly around bucket permissions/mount conflicts and ensuring volumes are composed correctly for existing users.
Overview
Jobs `uv run` local file transport is rewritten to use a Hub bucket mount instead of base64-in-env. When `run_uv_job` / `create_scheduled_uv_job` is given local scripts/configs, files are uploaded to a per-job subfolder in `{namespace}/jobs-artifacts` and mounted read-write at `/data`, and the generated command now runs as plain `uv run` referencing `/data/...` paths.

Adds `HF_JOBS_ARTIFACTS_BUCKET_NAME` and `HF_JOBS_ARTIFACTS_MOUNT_PATH` constants, introduces `_upload_scripts_to_bucket`, and propagates an `extra_volumes` return value so callers automatically append the artifacts `Volume`. The old `bash -c` + `xargs` + `base64` transport is removed, failures now propagate directly, and the code explicitly errors if a user-provided volume already uses the reserved `/data` mount; tests are updated to cover the new bucket transport behavior and edge cases.

Reviewed by Cursor Bugbot for commit 44af196.