feat(update-weight): add mooncake transfer engine transport #2159

Open
Vincent-Bo-ali wants to merge 1 commit into THUDM:main from Vincent-Bo-ali:vin/mooncake-te
@Vincent-Bo-ali

Summary

This PR adds an experimental Mooncake TransferEngine transport for full-weight sync to slime-managed SGLang engines.

Instead of forming a trainer-engine NCCL update group or writing full checkpoints to a shared filesystem, slime can now:

  1. Ask each SGLang engine to initialize a local Mooncake receiver buffer.
  2. Write each full-weight bucket into that receiver buffer through Mooncake TransferEngine.
  3. Send a manifest to SGLang so the engine loads tensors from its local receiver buffer.
  4. Destroy the receiver cleanly when the rollout engine disconnects.
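
The four steps above can be illustrated with a small in-process simulation. This is a minimal sketch, not the code in this PR: `ReceiverBuffer`, `ManifestEntry`, `pack_bucket`, and `sync_bucket` are hypothetical names, a `bytearray` stands in for the Mooncake-registered receiver memory, and a plain slice assignment stands in for the TransferEngine write.

```python
from dataclasses import dataclass


@dataclass
class ManifestEntry:
    name: str    # tensor name
    offset: int  # byte offset inside the receiver buffer
    nbytes: int  # payload length in bytes


class ReceiverBuffer:
    """Hypothetical stand-in for an engine-side receiver buffer."""

    def __init__(self, size: int):
        # Step 1: the engine allocates its local receiver buffer.
        self.data = bytearray(size)
        self.alive = True

    def write(self, offset: int, payload: bytes) -> None:
        # Step 2: the trainer writes a bucket (simulated by a slice copy).
        if offset + len(payload) > len(self.data):
            raise ValueError("bucket does not fit in receiver buffer")
        self.data[offset:offset + len(payload)] = payload

    def load(self, manifest: list) -> dict:
        # Step 3: the engine reads tensors back using the manifest.
        return {
            e.name: bytes(self.data[e.offset:e.offset + e.nbytes])
            for e in manifest
        }

    def destroy(self) -> None:
        # Step 4: release the buffer when the engine disconnects.
        self.data = bytearray()
        self.alive = False


def pack_bucket(tensors: dict):
    """Concatenate tensor payloads and record where each one lands."""
    manifest, chunks, offset = [], [], 0
    for name, payload in tensors.items():
        manifest.append(ManifestEntry(name, offset, len(payload)))
        chunks.append(payload)
        offset += len(payload)
    return b"".join(chunks), manifest


def sync_bucket(receiver: ReceiverBuffer, tensors: dict) -> dict:
    """Write one bucket, then load it on the receiver side via the manifest."""
    blob, manifest = pack_bucket(tensors)
    receiver.write(0, blob)
    return receiver.load(manifest)
```

The point of the manifest is that the engine never needs the trainer's tensor objects: names, offsets, and sizes are enough to reconstruct each tensor from its own local memory.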

This path is intentionally scoped to --update-weight-mode full and slime-managed SGLang engines.

User-Facing Interface

New transport option:

--update-weight-mode full
--update-weight-transport mooncake
--mooncake-metadata-server P2PHANDSHAKE
--mooncake-protocol tcp
--mooncake-buffer-size $((8 * 1024 * 1024 * 1024))
--mooncake-buffer-count 1
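
For reference, the `--mooncake-buffer-size` expression above evaluates to 8 GiB in bytes. The sketch below only checks that arithmetic and estimates how many buffer-sized buckets a given payload would need. `buckets_needed` is a hypothetical helper, and the assumption that one bucket is capped at one buffer's size is ours, not something this PR states.

```python
GIB = 1024 ** 3

# Same value as the shell expression $((8 * 1024 * 1024 * 1024)).
BUFFER_SIZE = 8 * GIB


def buckets_needed(total_nbytes: int, buffer_size: int) -> int:
    """Ceiling division: buffer-sized buckets needed to cover the payload.

    Assumes (hypothetically) that a single bucket cannot exceed buffer_size.
    """
    if buffer_size <= 0:
        raise ValueError("buffer_size must be positive")
    return -(-total_nbytes // buffer_size)


if __name__ == "__main__":
    print(BUFFER_SIZE)
    # Example: 7e9 parameters at 2 bytes each (bf16) is 14e9 bytes.
    print(buckets_needed(14_000_000_000, BUFFER_SIZE))
```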
