
Shard GMM weights on the expert dimension, mask out unprocessed tokens, and conditionally remat gating GMMs in the bwd pass. #3998

Merged: copybara-service[bot] merged 1 commit into main from test_922371173 on May 30, 2026

Conversation

@copybara-service
Contributor

Shard GMM weights on the expert dimension, mask out unprocessed tokens, and conditionally remat gating GMMs in the bwd pass.
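
As a rough illustration of two of the ideas named above (masking unprocessed tokens and conditionally rematerializing the gating step), here is a minimal JAX sketch. It is not the code from this PR: `gating`, `masked_mlp`, `num_valid`, and `remat_gating` are hypothetical names, "gating GMMs" is assumed to mean the gate and up projections, plain dense matmuls stand in for grouped matmuls, and expert-dimension sharding is left out entirely.

```python
import jax
import jax.numpy as jnp


def gating(x, w_gate, w_up):
    # Stand-in for the gating GMMs (assumption): dense matmuls, not grouped ones.
    return jax.nn.silu(x @ w_gate) * (x @ w_up)


def masked_mlp(x, w_gate, w_up, w_down, num_valid, remat_gating):
    # When remat_gating is True, jax.checkpoint makes the backward pass
    # recompute the gating activations instead of keeping them in memory.
    gate_fn = jax.checkpoint(gating) if remat_gating else gating
    h = gate_fn(x, w_gate, w_up)
    out = h @ w_down
    # Rows at index >= num_valid are treated as unprocessed tokens and zeroed.
    row_ids = jnp.arange(x.shape[0])[:, None]
    return jnp.where(row_ids < num_valid, out, 0.0)


# Hypothetical shapes: 4 token slots, of which only the first 2 are valid.
x = jnp.ones((4, 2))
w_gate = jnp.ones((2, 5))
w_up = jnp.ones((2, 5))
w_down = jnp.ones((5, 3))
out = masked_mlp(x, w_gate, w_up, w_down, num_valid=2, remat_gating=True)
```

The rematerialization flag changes only what is stored for the backward pass, so the outputs and gradients should match with it on or off.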

@codecov

codecov Bot commented May 27, 2026

Codecov Report

❌ Patch coverage is 19.23077% with 42 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/maxtext/models/deepseek_batchsplit.py | 19.23% | 42 Missing ⚠️ |


copybara-service[bot] force-pushed the test_922371173 branch 6 times, most recently from a11193f to 18449b3, on May 30, 2026 04:35
…s, and conditionally remat gating GMMs in the bwd pass.

PiperOrigin-RevId: 923759264
copybara-service[bot] merged commit c3bc78a into main on May 30, 2026
copybara-service[bot] deleted the test_922371173 branch on May 30, 2026 05:08
