[Qwen3.5] Onboard model to Checkpointing Util & Verify Correctness by Rohan-Bierneni · Pull Request #3839 · AI-Hypercomputer/maxtext

Rohan-Bierneni · 2026-05-07T17:58:20Z

Description

This PR onboards the Qwen3.5 model to the checkpointing util. It also uses the converted HF -> MaxText checkpoint to run forward_pass_logit_checker and verify model correctness.
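For readers unfamiliar with logit-level checks, here is a minimal sketch of the kind of comparison involved. It compares one row of reference logits against logits from the converted model using the maximum absolute difference and the KL divergence between their softmax distributions. The function names and tolerance values are hypothetical and chosen for illustration; they are not forward_pass_logit_checker's actual interface, flags, or defaults.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as equal-length lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def compare_logits(ref_logits, test_logits, atol=1e-2, max_kl=1e-3):
    """Hypothetical check of reference vs. converted-model logits for one position.

    Returns (passed, max_abs_diff, kl), where kl is KL(ref || test) computed on
    the softmax distributions. Tolerances are illustrative, not MaxText defaults.
    """
    max_abs_diff = max(abs(a - b) for a, b in zip(ref_logits, test_logits))
    kl = kl_divergence(softmax(ref_logits), softmax(test_logits))
    passed = max_abs_diff <= atol and kl <= max_kl
    return passed, max_abs_diff, kl
```

The two metrics are complementary: adding a constant to every logit leaves the softmax (and so the KL term) unchanged but shows up in the absolute difference, while the KL term summarizes how far the predicted distribution moved.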

Tests

Results of forward_pass_logit_checker on the 2x2 cases (scanned/unscanned, to_maxtext/to_huggingface):

HF -> MaxText Tests:

Scanned:
- Command: https://paste.googleplex.com/5199078444105728
- Output: https://paste.googleplex.com/5416843419451392
Unscanned:
- Command: https://paste.googleplex.com/4965438162337792
- Output: https://paste.googleplex.com/4676840485683200

MaxText -> HF Tests:
Note: I took the MaxText checkpoint produced above, ran to_huggingface and then to_maxtext on it, and finally ran forward_pass_logit_checker on the result against the checkpoint taken directly from the HF repo.

Scanned:
- Command:
  - to_huggingface: https://paste.googleplex.com/4880765002317824
  - to_maxtext: https://paste.googleplex.com/4838089334849536
  - forward_pass_logit_checker: https://paste.googleplex.com/5068151130816512
- Output: https://paste.googleplex.com/6109510277136384
Unscanned:
- Command:
  - to_huggingface: https://paste.googleplex.com/4934408399355904
  - to_maxtext: https://paste.googleplex.com/6257034256318464
  - forward_pass_logit_checker: https://paste.googleplex.com/5068151130816512
- Output: https://paste.googleplex.com/6321468832088064
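The MaxText -> HF tests amount to a round-trip check. A minimal sketch of that idea, assuming a toy scanned layout in which MaxText stacks per-layer weights along a leading layer axis and stores each matrix transposed relative to the HF [out, in] Linear layout: converting HF -> MaxText -> HF should reproduce the original parameters exactly. The key names and helper functions are hypothetical and do not reflect the actual keys in param_mapping.py.

```python
# Hypothetical key names, for illustration only.
HF_TEMPLATE = "model.layers.{i}.mlp.down_proj.weight"
MT_KEY = "decoder.layers.mlp.wo"


def transpose(matrix):
    """Swap the two axes of a 2-D list-of-lists tensor."""
    return [list(row) for row in zip(*matrix)]


def hf_to_mt_scanned(hf_params, num_layers):
    """Transpose each layer's HF [out, in] weight and stack along a leading layer axis."""
    stacked = [transpose(hf_params[HF_TEMPLATE.format(i=i)]) for i in range(num_layers)]
    return {MT_KEY: stacked}


def mt_to_hf_scanned(mt_params):
    """Unstack the layer axis and transpose each slice back to the HF layout."""
    return {HF_TEMPLATE.format(i=i): transpose(layer) for i, layer in enumerate(mt_params[MT_KEY])}


def round_trip_ok(hf_params, num_layers):
    """HF -> MaxText -> HF must reproduce the original parameters exactly."""
    restored = mt_to_hf_scanned(hf_to_mt_scanned(hf_params, num_layers))
    return restored == hf_params
```

The tests in this PR compare model logits, which also exercises the model definition; comparing parameters directly, as in this sketch, is a cheaper check that isolates the conversion code.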

Checklist

Before submitting this PR, please make sure (put X in square brackets):

I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
I have necessary comments in my code, particularly in hard-to-understand areas.
I have run end-to-end tests and provided workload links above if applicable.
I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

codecov · 2026-05-07T18:07:29Z

Codecov Report

❌ Patch coverage is 3.46535% with 195 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
...xtext/checkpoint_conversion/utils/param_mapping.py	1.57%	125 Missing ⚠️
...rc/maxtext/checkpoint_conversion/utils/hf_shape.py	0.00%	35 Missing ⚠️
src/maxtext/checkpoint_conversion/utils/utils.py	0.00%	23 Missing ⚠️
src/maxtext/checkpoint_conversion/to_maxtext.py	0.00%	9 Missing ⚠️
...xt/checkpoint_conversion/utils/hf_model_configs.py	62.50%	3 Missing ⚠️
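The per-file numbers are consistent with the overall figure. Assuming Codecov counts each patch line as either hit or missed (no partial lines), each file's total follows from its missing count and percentage, and the patch as a whole works out to 7 covered lines out of 202. The covered counts below are a back-of-the-envelope reconstruction, not data stated in the report.

```python
# Per-file (missing, covered) line counts. Missing counts come from the table;
# covered counts are inferred, assuming no partially covered lines.
FILES = {
    "param_mapping.py": (125, 2),
    "hf_shape.py": (35, 0),
    "utils.py": (23, 0),
    "to_maxtext.py": (9, 0),
    "hf_model_configs.py": (3, 5),
}


def coverage_pct(missing, covered):
    """Percentage of patch lines hit by tests."""
    return 100.0 * covered / (missing + covered)


total_missing = sum(m for m, _ in FILES.values())
total_covered = sum(c for _, c in FILES.values())
patch_pct = coverage_pct(total_missing, total_covered)

for name, (m, c) in FILES.items():
    print(f"{name}: {coverage_pct(m, c):.2f}% ({m} missing)")
print(f"patch: {patch_pct:.5f}% ({total_missing} missing of {total_missing + total_covered})")
```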

shuningjin

Thanks for adding the Qwen 3.5 conversion! Great work figuring out the logic to handle this complex structure, along with comprehensive testing.

I see this adds support for composite_hf_key (multiple HF keys to one MT key). For context, we previously only had composite_mt_key (multiple MT keys to one HF key), so it's great to have this functionality. cc @hengtaoguo
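To illustrate the composite_hf_key direction, here is a hedged sketch in which one MaxText tensor is built by concatenating several HF tensors along the first axis, together with the inverse split needed when saving back to HF. The mapping, key names, and helpers are hypothetical; the sketch only shows the many-to-one shape of the problem, not how Qwen3.5 tensors are actually fused.

```python
# Hypothetical composite_hf_key-style mapping: one MaxText key -> tuple of HF keys.
COMPOSITE_HF_MAPPING = {
    "decoder.layers_0.mlp.wi_fused": (
        "model.layers.0.mlp.gate_proj.weight",
        "model.layers.0.mlp.up_proj.weight",
    ),
}


def concat_rows(tensors):
    """Concatenate 2-D list-of-lists tensors along axis 0."""
    out = []
    for t in tensors:
        out.extend(list(row) for row in t)
    return out


def split_rows(tensor, sizes):
    """Inverse of concat_rows: split along axis 0 into chunks of the given sizes."""
    parts, start = [], 0
    for size in sizes:
        parts.append([list(row) for row in tensor[start:start + size]])
        start += size
    if start != len(tensor):
        raise ValueError("sizes do not cover the fused tensor")
    return parts


def hf_to_mt(hf_params, mapping):
    """Many HF tensors -> one MaxText tensor per mapping entry."""
    return {mt_key: concat_rows([hf_params[k] for k in hf_keys]) for mt_key, hf_keys in mapping.items()}


def mt_to_hf(mt_params, mapping, row_sizes):
    """One MaxText tensor -> many HF tensors; row_sizes would come from expected HF shapes."""
    hf_params = {}
    for mt_key, hf_keys in mapping.items():
        parts = split_rows(mt_params[mt_key], [row_sizes[k] for k in hf_keys])
        hf_params.update(zip(hf_keys, parts))
    return hf_params
```

The reverse direction needs the original per-key sizes, which is why a many-to-one mapping generally has to be paired with shape information for the target format.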

To help maintain this going forward, I have two suggestions:

- Centralize the logic: Currently, the scanned (list of HF key tuples) vs. unscanned (single HF key tuple) logic is handled via an if/else block in the model hooks. Would it be possible to incorporate this directly into the central framework (to_huggingface via utils.utils._process, and to_maxtext)? Here is an example of what that might look like. This would help simplify the logic for future models.
- Add inline documentation: Could we add more comments clarifying the usage of composite_hf_key, as well as composite_mt_key?
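One way to read the "centralize the logic" suggestion: if a scanned mapping value is a list of HF key tuples (one tuple per layer) and an unscanned value is a single tuple, a central helper can normalize both into a list so model hooks never branch on the layout. The helper names and value shapes below are assumptions drawn only from the review comment, not from the code that was merged.

```python
from typing import Callable, List, Sequence, Tuple, Union

HfKeys = Tuple[str, ...]
MappingValue = Union[HfKeys, List[HfKeys]]


def normalize_hf_keys(value: MappingValue) -> List[HfKeys]:
    """Return a list of HF key tuples for both scanned and unscanned entries."""
    if isinstance(value, tuple):
        # Unscanned: a single tuple of HF keys for one MaxText tensor.
        return [value]
    if isinstance(value, list) and all(isinstance(v, tuple) for v in value):
        # Scanned: one tuple of HF keys per layer.
        return list(value)
    raise TypeError(f"unexpected mapping value: {value!r}")


def convert_entry(value: MappingValue, hf_params: dict, hook: Callable[[Sequence], object]):
    """Central conversion step: the hook only ever sees one layer's tensors."""
    groups = normalize_hf_keys(value)
    results = [hook([hf_params[k] for k in keys]) for keys in groups]
    # Scanned entries keep one result per layer; unscanned entries return a single result.
    return results if isinstance(value, list) else results[0]
```

With this split, a model-specific hook is written once for a single layer, and the scanned/unscanned distinction lives in one place.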

Finally, I left a few minor comments. Please also run a quick formatting pass on the files to address some inconsistent indentation.

hengtaoguo

Thanks for the great work! Approve to unblock.

- Add config file for 397B model
- update attentions.py with new decoder block type
- Update other files with new model to ensure model initialization is correct
- Update decoder block type
- Train Compile test is passing
- resolve nits in config file formatting
- resolve formatting errors
- Fix conflict in maxtext_utils
- Fix linter errors
- Fix linter errors
- Fix linter errors
- Ran pyink locally for formatting
- Fix naming for config file
- Add code for param_mapping for qwen3.5
- Add hook fn function for Qwen3.5
- Update hook logic for 1 to n mt -> hf hook fns
- Add other components to onboard model to checkpoint util
- Add support for mini qwen3.5 model to pyconfig
- Add qwen3.5-35b to checkpoint conversion for testing
- Able to do checkpoint conversion but forward pass mismatch
- Bug in concatenated tensors fixed. Multiple HF to 1 MT
- Add flags for lazy loading
- Logic for unscanned direct moe conversion
- Add logic for deinterleaving fused tensor when save to hf
- Add models to hf_shape dict
- Fix hf shape config structure
- Fix logic for converting tuples in to_huggingface
- Resolve comments in config file and forward pass file
- Resolve some comments in param_mapping & remove commented out code
- Move logic from param_mapping to central util framework
- Resolve comments
- Ran pyink to format files
- Linter passes now
- Add newline at EOF
- Fix package version failure
- Run linter
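The commit "Add logic for deinterleaving fused tensor when save to hf" refers to undoing an interleaved layout. As a hedged illustration only, assume a fused tensor whose rows alternate between components in fixed-size groups (for example [a0, a1, b0, b1, a2, a3, b2, b3] with a group size of 2); deinterleaving recovers each component and interleaving is its inverse. The group size, the layout, and the function names are assumptions; the actual Qwen3.5 fused layout is not described in this PR conversation.

```python
def interleave(parts, group):
    """Fuse equal-length row lists by taking `group` rows from each part in turn."""
    n = len(parts[0])
    if any(len(p) != n for p in parts) or n % group:
        raise ValueError("parts must have equal length divisible by group")
    fused = []
    for start in range(0, n, group):
        for p in parts:
            fused.extend(p[start:start + group])
    return fused


def deinterleave(fused, num_parts, group):
    """Inverse of interleave: split a fused row list back into num_parts lists."""
    block = num_parts * group
    if len(fused) % block:
        raise ValueError("fused length must be a multiple of num_parts * group")
    parts = [[] for _ in range(num_parts)]
    for start in range(0, len(fused), block):
        for j in range(num_parts):
            offset = start + j * group
            parts[j].extend(fused[offset:offset + group])
    return parts
```

Unlike a plain concatenation, an interleaved layout cannot be undone by splitting at a single offset, which is why saving to HF needs a dedicated deinterleave step.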

Rohan-Bierneni force-pushed the rbierneni-qwen35-checkpoint branch from 02bf76d to 5132b09 May 11, 2026 17:44

Rohan-Bierneni self-assigned this May 14, 2026

Rohan-Bierneni requested a review from darisoy as a code owner May 18, 2026 10:49

Rohan-Bierneni force-pushed the rbierneni-qwen35-checkpoint branch from 230c444 to af96f80 May 18, 2026 16:18

shuningjin reviewed May 20, 2026

hengtaoguo approved these changes May 20, 2026

Comment thread src/maxtext/checkpoint_conversion/to_maxtext.py Outdated

Rohan-Bierneni force-pushed the rbierneni-qwen35-checkpoint branch from af96f80 to d89d8aa May 27, 2026 17:28

Rohan-Bierneni force-pushed the rbierneni-qwen35-checkpoint branch from f7f2ca7 to 5976ca2 May 27, 2026 18:27

shralex approved these changes May 28, 2026

shuningjin approved these changes May 29, 2026

Rohan-Bierneni added the pull ready label May 29, 2026

Rohan-Bierneni force-pushed the rbierneni-qwen35-checkpoint branch 3 times, most recently from 1cc2571 to b4da49b May 30, 2026 00:24

Rohan-Bierneni force-pushed the rbierneni-qwen35-checkpoint branch from b4da49b to 999c339 May 30, 2026 00:28

copybara-service Bot merged commit 493fba6 into main May 31, 2026
32 of 33 checks passed

copybara-service Bot deleted the rbierneni-qwen35-checkpoint branch May 31, 2026 00:30

copybara-service[bot] merged 1 commit into main from rbierneni-qwen35-checkpoint

4 participants

Conversation

Rohan-Bierneni commented May 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Tests

Checklist

Uh oh!

codecov Bot commented May 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

shuningjin left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

hengtaoguo left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Rohan-Bierneni commented May 7, 2026 •

edited

Loading

codecov Bot commented May 7, 2026 •

edited

Loading

shuningjin left a comment •

edited

Loading