
Add bucket+mount transport for Jobs script upload#4025

Open
davanstrien wants to merge 27 commits into main from feat/bucket-transport-jobs

Conversation


@davanstrien davanstrien commented Apr 2, 2026

Use bucket transport for Jobs script upload

When hf jobs uv run is given local files (scripts, configs, etc.), we now upload them to a {namespace}/jobs-artifacts bucket and mount it into the job container at /data instead of base64-encoding them into an environment variable.

This replaces the old bash -c + xargs + base64 -d pipeline, which was fragile and required manual shell quoting. The bucket approach is simpler, easier to debug, and, importantly, lets jobs write output artifacts back to /data/ since the mount is read-write.

How it works

  • Local files are uploaded to {namespace}/jobs-artifacts under a {YYYYMMDDTHHMMSS}-{6 hex digits} subfolder (makes it easy to find artifacts for a specific job by timestamp)
  • A Volume scoped to that subfolder is mounted at /data, so the job container sees its files directly at the mount root
  • The uv run command references /data/<filename> — no more shell wrappers
  • Failures propagate directly (no silent fallback)
  • If a user already has a volume mounted at /data, we raise a clear ValueError
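The bullets above can be sketched roughly as follows. The helper name and the file-detection heuristic are illustrative only, not the actual hf_api.py implementation:

```python
# Rough sketch of the command rewrite described above: local file arguments
# are swapped for their /data/<filename> mount paths before `uv run`.
from pathlib import Path

MOUNT_PATH = "/data"  # mirrors HF_JOBS_ARTIFACTS_MOUNT_PATH

def build_uv_command(args: list[str]) -> list[str]:
    """Replace local file arguments with /data/<filename> references."""
    rewritten = []
    for arg in args:
        # Illustrative heuristic: treat known script/config suffixes as files.
        if Path(arg).suffix in {".py", ".toml", ".json", ".yaml"}:
            rewritten.append(f"{MOUNT_PATH}/{Path(arg).name}")
        else:
            rewritten.append(arg)  # flags and values pass through untouched
    return ["uv", "run", *rewritten]
```

Since the args are passed as argv rather than through a shell, no quoting layer is needed.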

What changed vs the original base64 approach

The base64 transport path is fully removed — there's no opt-in flag, no fallback. Bucket transport is the only mode now. This was the simplest thing to do since base64 was experimental/hacky and maintaining two transport modes wasn't worth it.

Constants

Two new constants in constants.py:

  • HF_JOBS_ARTIFACTS_BUCKET_NAME = "jobs-artifacts"
  • HF_JOBS_ARTIFACTS_MOUNT_PATH = "/data"
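For illustration, the two constants compose into the per-namespace bucket id like this (helper name hypothetical):

```python
# The bucket is per-namespace; the mount path is fixed inside the container.
HF_JOBS_ARTIFACTS_BUCKET_NAME = "jobs-artifacts"
HF_JOBS_ARTIFACTS_MOUNT_PATH = "/data"

def artifacts_bucket_id(namespace: str) -> str:
    """Compose the full bucket repo id for a user or org namespace."""
    return f"{namespace}/{HF_JOBS_ARTIFACTS_BUCKET_NAME}"
```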

Files changed

  • src/huggingface_hub/constants.py — new bucket/mount constants
  • src/huggingface_hub/hf_api.py — _create_uv_command_env_and_secrets rewritten to use bucket transport, plus a new _upload_scripts_to_bucket helper
  • tests/test_cli.py — new TestBucketTransport class replacing the old shell-quoting tests (happy path, failure propagation, mount collision, multi-file upload)

Note

Medium Risk
Changes the transport mechanism for local files in Jobs from env-var base64 injection to Hub bucket uploads, which affects job execution and storage semantics. Risk is mainly around bucket permissions/mount conflicts and ensuring volumes are composed correctly for existing users.

Overview
Jobs uv run local file transport is rewritten to use a Hub bucket mount instead of base64-in-env. When run_uv_job/create_scheduled_uv_job is given local scripts/configs, files are uploaded to a per-job subfolder in {namespace}/jobs-artifacts and mounted read-write at /data, and the generated command now runs as plain uv run referencing /data/... paths.

Adds HF_JOBS_ARTIFACTS_BUCKET_NAME and HF_JOBS_ARTIFACTS_MOUNT_PATH constants, introduces _upload_scripts_to_bucket, and propagates an extra_volumes return value so callers automatically append the artifacts Volume. The old bash -c + xargs + base64 transport is removed, failures now propagate directly, and the code explicitly errors if a user-provided volume already uses the reserved /data mount; tests are updated to cover the new bucket transport behavior and edge cases.
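The reserved-mount error described above can be sketched like this (the Volume stand-in and function name are hypothetical, not the huggingface_hub API):

```python
# Sketch of the /data collision check: if a user-provided volume already
# claims the reserved mount path, raise a clear ValueError up front.
from dataclasses import dataclass

@dataclass
class Volume:  # illustrative stand-in for the real Volume class
    mount_path: str

RESERVED = "/data"  # mirrors HF_JOBS_ARTIFACTS_MOUNT_PATH

def check_mount_collision(user_volumes: list[Volume]) -> None:
    for vol in user_volumes:
        if vol.mount_path == RESERVED:
            raise ValueError(
                f"Mount path {RESERVED!r} is reserved for the Jobs artifacts "
                "volume; please mount your volume at a different path."
            )
```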

Reviewed by Cursor Bugbot for commit 44af196. Bugbot is set up for automated code reviews on this repo.

davanstrien and others added 2 commits April 2, 2026 08:40
When `HF_JOBS_USE_BUCKET_TRANSPORT=1` is set, `run_uv_job` and
`create_scheduled_uv_job` upload local scripts to a
`{namespace}/jobs-artifacts` bucket instead of base64-encoding them
into an environment variable. The bucket is mounted at `/artifacts`
and the job runs scripts directly from disk.

Falls back to base64 transport if bucket creation fails, hf_xet is
unavailable, or the `/artifacts` mount path is already taken by a
user-provided volume. The existing base64 path is unchanged.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@davanstrien
Member Author

cc @Wauplin @lhoestq

bot-ci-comment bot commented Apr 2, 2026

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.


@lhoestq lhoestq left a comment


Awesome :)

Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
Follows lhoestq's review suggestion in #4025.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@Wauplin Wauplin left a comment


Thanks for working on that @davanstrien ! Pretty excited by what it will unlock so let's double-down on this direction 😃

Follows Wauplin's review suggestion in #4025.
Follows Wauplin's review suggestion in #4025.
Follows Wauplin's review suggestion in #4025. Makes it easier for users
to browse the artifacts bucket and find a specific Job by submission time.

Format: `YYYYMMDDTHHMMSS-{6 hex digits}` (UTC).

Side effect: `uuid` is no longer used in this code path, so the import
added in the previous commit is removed.
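The naming scheme above can be sketched in a few lines (helper name is illustrative):

```python
# Per-job subfolder name: UTC timestamp plus 6 hex digits,
# e.g. "20260413T133239-8bd453".
from datetime import datetime, timezone
from secrets import token_hex

def job_subfolder() -> str:
    """Generate a sortable, collision-resistant per-job subfolder name."""
    ts = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S")
    return f"{ts}-{token_hex(3)}"  # token_hex(3) -> 6 hex digits
```

Sorting the bucket listing lexicographically therefore orders jobs by submission time.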
…onally

Follows Wauplin's review in #4025: "get rid of the base64-encoding
branch [...] let's move to buckets and maintain only one transport mode".

The `HF_JOBS_USE_BUCKET_TRANSPORT` opt-in is removed and bucket transport
is now the only path for local scripts. The three failure modes that
previously silently fell back to base64 now raise:

- Missing `hf_xet`: `ImportError` with an install hint.
- Reserved mount path already in use: `ValueError`.
- Bucket creation/upload failure: the underlying exception propagates.

Also removes the regression test for bash shell quoting — the bash -c
code path no longer exists, so `uv run` receives args as argv and shell
metacharacters in dependency specifiers are no longer interpreted.
Follows Wauplin's review in #4025: mount with the per-job prefix so the
Jobs container sees only the artifacts folder created for that job.

Previously the whole `{ns}/jobs-artifacts` bucket was mounted at
`/artifacts` and scripts lived under
`/artifacts/scripts/{timestamp}-{hex}/`. Now the volume is constructed
with `path="scripts/{timestamp}-{hex}"`, so the job sees its files
directly at `/artifacts/`.

This also drops the `scripts_prefix` from `_upload_scripts_to_bucket`'s
return type (it's no longer needed by the caller) and shortens the
generated `uv run` command.
Follows Wauplin's review in #4025: prefer `/data` since it mirrors the
historical Spaces persistent-storage path and is shorter to remember.
The bucket name stays `{ns}/jobs-artifacts`.
Follows Wauplin's review in #4025 ("should we make it read-write to
encourage users to write their output in it?"). The mount is already
read-write in practice — `Volume.read_only` defaults to `False` for
buckets and the Hub normalises omitted and explicit-false to the same
downstream payload — so this commit just makes the intent explicit at
the call site, documents it in the docstring, and pins the invariant
with a test assertion.
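What "explicit at the call site" means can be sketched with a stand-in class (field names and defaults are illustrative, not the real Volume):

```python
# Illustrative stand-in: buckets default to read-write, and the call site
# now states read_only=False explicitly to pin that intent.
from dataclasses import dataclass

@dataclass
class Volume:  # not the real huggingface_hub class
    bucket: str
    mount_path: str
    path: str = ""
    read_only: bool = False  # read-write by default for buckets

vol = Volume(
    bucket="ns/jobs-artifacts",
    mount_path="/data",
    path="20260413T133239-8bd453",
    read_only=False,  # explicit, even though it is already the default
)
```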
@davanstrien davanstrien changed the title from "Add experimental bucket+mount transport for Jobs script upload" to "Add bucket+mount transport for Jobs script upload" Apr 13, 2026
…ict param

Cursor Bugbot flagged that the new top-level `import secrets` shadows
the `secrets: dict` parameter in `_create_uv_command_env_and_secrets`,
`run_uv_job`, and `create_scheduled_uv_job`. The current call site is
safe (it lives in `_upload_scripts_to_bucket`, which has no such
parameter), but a future refactor that inlines or moves it would
silently break.

Importing `token_hex` by name eliminates the hazard and matches the
existing convention in this file (e.g. `from itertools import islice`
alongside `import itertools`).
@Wauplin Wauplin marked this pull request as draft April 13, 2026 14:23

Wauplin commented Apr 13, 2026

(converted to draft, @davanstrien could you reset to "ready for review" when ready? -just trying to filter my notifications here 😄 )

@davanstrien
Member Author

> (converted to draft, @davanstrien could you reset to "ready for review" when ready? -just trying to filter my notifications here 😄 )

yes sorry for the noise!


davanstrien commented Apr 13, 2026

Thanks both! Pushed follow-up commits working through the feedback. TL;DR: took @Wauplin's stronger suggestion and dropped the base64 path entirely, which enabled some other simplifications.

@lhoestq — the scripts/ rename landed earlier in 93e00708, and your "default to True" suggestion got subsumed by the base64 removal below (the flag is gone entirely).

@Wauplin:

  • import uuid at top level (5808552e). Then actually removed from the top-level imports in cf7e0193 once uuid was replaced (see next item).

  • Move bucket constants to constants.py (1a78fe92). Added HF_JOBS_ARTIFACTS_BUCKET_NAME and HF_JOBS_ARTIFACTS_MOUNT_PATH.

  • YYYYMMDDTHHMMSS-{6 hex} subfolder naming (cf7e0193). Uses datetime.now(timezone.utc).strftime('%Y%m%dT%H%M%S') + secrets.token_hex(3).

  • Mount with path=scripts_prefix (9c85c1ea). The bucket volume is now scoped to the per-job subfolder via Volume.path, so the container sees only its own files at the mount root. _upload_scripts_to_bucket no longer returns a prefix.

  • Mount at /data (554f5f64). Bucket name stays jobs-artifacts.

  • Drop base64 entirely (91219b22). The three failure modes that previously fell back to base64 now raise:

    • /data mount collision with a user volume → ValueError
    • hf_xet missing → ImportError with install hint
    • bucket create/upload failure → propagates

    HF_JOBS_USE_BUCKET_TRANSPORT is gone; ~50 lines of shell-quoting + xargs plumbing and 2 tests removed.

  • Read-write mount (df347c72). Already RW by default, so this just makes the intent explicit at the call site (read_only=False), documents it in the docstring, and pins the invariant with a test assertion. Double-checked on the Hub side that omitted readOnly and explicit-false normalise to the same downstream payload — zero wire-level change.

Smoke-tested end-to-end against production. Ran hf jobs uv run hf_jobs_smoke.py, where the script prints its argv, lists /data, and writes /data/smoke_output.txt.

Job logs:

sys.argv = ['/data/hf_jobs_smoke.py']
/data contents: ['hf_jobs_smoke.py']    # only its own file
Wrote /data/smoke_output.txt - size 19 bytes

Bucket state after:

bucket://davanstrien/jobs-artifacts/scripts/20260413T133239-8bd453/
├── hf_jobs_smoke.py     ← uploaded by client
└── smoke_output.txt     ← written by job at runtime
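Reconstructed for illustration, a smoke script matching the logs above might look like this (the data directory is parameterized so it can also run locally; the real script may differ):

```python
# Minimal smoke script: print argv, list the artifacts mount, and write an
# output file back through the read-write mount.
import os
import sys

def smoke(data_dir: str = "/data") -> str:
    """Exercise the bucket mount and return the path of the written artifact."""
    print("sys.argv =", sys.argv)
    print(f"{data_dir} contents:", sorted(os.listdir(data_dir)))
    out_path = os.path.join(data_dir, "smoke_output.txt")
    with open(out_path, "w") as f:
        f.write("bucket transport ok\n")
    return out_path
```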

`hf_xet` is a default `install_requires` dependency on all common
platforms (setup.py line 20), so the previous `pip install
huggingface_hub[hf_xet]` hint was misleading — the extra is redundant
with the default install. The actual failure modes (unusual platform,
`HF_HUB_DISABLE_XET` set, broken/stale install) are varied enough that
linking to the installation guide is cleaner than prescribing a
specific command.
@davanstrien davanstrien marked this pull request as ready for review April 13, 2026 15:44

@Wauplin Wauplin left a comment


Thanks! Some minor comments but logic looks good to me. Haven't checked the tests much yet

davanstrien and others added 8 commits April 14, 2026 15:36
Co-authored-by: Lucain <lucain@huggingface.co>
Co-authored-by: Lucain <lucain@huggingface.co>
Co-authored-by: Lucain <lucain@huggingface.co>
Co-authored-by: Lucain <lucain@huggingface.co>
Co-authored-by: Lucain <lucain@huggingface.co>
Per review feedback: the volume is read-write so jobs can save
artifacts back to the same prefix, which makes scoping uploads under
`scripts/` misleading. Files now land at `{timestamp}-{hex}/<name>`
at the bucket root, and the volume path is the per-job subfolder.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The `hf_xet` availability check was removed from `_upload_scripts_to_bucket`,
so `huggingface_hub.hf_api.is_xet_available` no longer exists as a patch
target. Removes the now-dead `patch(...)` calls and the
`test_raises_when_xet_unavailable` test that asserted the (removed) ImportError.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@lhoestq lhoestq left a comment


lgtm !


@Wauplin Wauplin left a comment


Approved once comments are addressed 😃

Tested it locally on a dummy script and it worked fine https://huggingface.co/buckets/Wauplin/jobs-artifacts/tree/20260415T092902-5ca267

I also took the liberty to update the PR description since the PR scope changed a bit since you created it

Co-authored-by: Lucain <lucain@huggingface.co>

@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


Reviewed by Cursor Bugbot for commit 0fb46cc.

Per Wauplin's review comment: users had no way to know where the
silently-created `jobs-artifacts` bucket lives. `_upload_scripts_to_bucket`
now captures the `BucketUrl` returned by `create_bucket` and prints it
after the upload succeeds. `print` over `logger.info` because the hub
logger defaults to WARNING, which would make `logger.info` silent for
anyone who hasn't opted into verbose logging — defeating the point of
the message.

Also fixes a stale test assertion: the earlier accepted `private=True`
suggestion didn't update `test_bucket_transport_uploads_and_returns_volume`,
so the test was broken on the branch tip.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
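The print-over-logger reasoning above is easy to check: the `huggingface_hub` logger inherits the default WARNING level, so `logger.info` output is dropped unless the user opts into verbose logging. A quick illustration (the message text is a placeholder):

```python
import logging

logger = logging.getLogger("huggingface_hub")

# At the default effective level (WARNING, inherited from the root logger),
# INFO records are dropped silently:
logger.info("Artifacts uploaded to bucket://ns/jobs-artifacts/...")  # not shown

# print always reaches the user, regardless of logging configuration:
print("Artifacts uploaded to bucket://ns/jobs-artifacts/...")
```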