[Qwen3.5] Onboard model to Checkpointing Util & Verify Correctness#3839

Merged
copybara-service[bot] merged 1 commit into main from rbierneni-qwen35-checkpoint on May 31, 2026

Collaborator

@Rohan-Bierneni Rohan-Bierneni commented May 7, 2026

Description

This PR onboards the Qwen3.5 model to the checkpointing util. It also uses the converted HF -> MaxText checkpoint to run forward_pass_logit_checker and verify model correctness.

Tests

Results of forward_pass_logit_checker on the 2x2 matrix of cases, (scanned, unscanned) x (to_maxtext, to_huggingface):

HF -> MaxText Tests:

MaxText -> HF Tests:
Note: The MaxText checkpoint from the tests above was converted with to_huggingface and then back with to_maxtext. forward_pass_logit_checker was then run on the result against the checkpoint taken directly from the HF repo.
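For readers unfamiliar with this kind of check, the idea behind comparing logits from two checkpoints can be sketched as below. This is a minimal illustration and not the actual forward_pass_logit_checker implementation: the function names `log_softmax` and `compare_logits`, the thresholds `atol` and `max_kl`, and the choice of metrics (maximum absolute difference plus per-position KL divergence) are all assumptions made for the example.

```python
import math


def log_softmax(row):
    """Numerically stable log-softmax over one vocabulary row."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def compare_logits(reference, candidate, atol=1e-2, max_kl=1e-3):
    """Compare two [positions][vocab] logit tables.

    Hypothetical helper: thresholds and metrics are illustrative only.
    """
    if len(reference) != len(candidate):
        raise ValueError("logit tables differ in number of positions")

    max_abs_diff = 0.0
    worst_kl = 0.0
    for ref_row, cand_row in zip(reference, candidate):
        if len(ref_row) != len(cand_row):
            raise ValueError("logit rows differ in vocabulary size")
        # Largest elementwise gap between the two sets of raw logits.
        row_diff = max(abs(r - c) for r, c in zip(ref_row, cand_row))
        max_abs_diff = max(max_abs_diff, row_diff)
        # KL(reference || candidate) over the softmax distributions.
        ref_logp = log_softmax(ref_row)
        cand_logp = log_softmax(cand_row)
        kl = sum(math.exp(p) * (p - q) for p, q in zip(ref_logp, cand_logp))
        worst_kl = max(worst_kl, kl)

    return {
        "max_abs_diff": max_abs_diff,
        "max_kl": worst_kl,
        "passed": max_abs_diff <= atol and worst_kl <= max_kl,
    }


if __name__ == "__main__":
    golden = [[1.0, 2.0, 3.0], [0.5, 0.1, -1.0]]
    converted = [[1.0, 2.0, 3.001], [0.5, 0.1, -1.0]]
    print(compare_logits(golden, converted))
```

In a round-trip test like the one described in the note, `reference` would hold logits from the checkpoint taken directly from the HF repo and `candidate` the logits from the checkpoint that went through to_huggingface and to_maxtext.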

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

@codecov

codecov Bot commented May 7, 2026

@Rohan-Bierneni Rohan-Bierneni force-pushed the rbierneni-qwen35-checkpoint branch from 02bf76d to 5132b09 Compare May 11, 2026 17:44
@Rohan-Bierneni Rohan-Bierneni self-assigned this May 14, 2026
@Rohan-Bierneni Rohan-Bierneni requested a review from darisoy as a code owner May 18, 2026 10:49
@Rohan-Bierneni Rohan-Bierneni force-pushed the rbierneni-qwen35-checkpoint branch from 230c444 to af96f80 Compare May 18, 2026 16:18
Collaborator

@shuningjin shuningjin left a comment

Thanks for adding the Qwen 3.5 conversion! Great work figuring out the logic to handle this complex structure, along with comprehensive testing.

I see this adds support for composite_hf_key (multiple HF keys to one MT key). For context, we previously only had composite_mt_key (multiple MT keys to one HF key), so it's great to have this functionality. cc @hengtaoguo

To help maintain this going forward, I have two suggestions:

  1. Centralize the logic: Currently, the scanned (list of HF key tuples) vs. unscanned (single HF key tuple) logic is handled via an if/else block in the model hooks. Would it be possible to incorporate this directly into the central framework (to_huggingface via utils.utils._process and to_maxtext)? Here is an example of what that might look like. This will help simplify the logic for future models.
  2. Add inline documentation: Could we add more comments clarifying the usage of composite_hf_key, as well as composite_mt_key?
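The first suggestion can be illustrated with a small sketch. It is not MaxText code: `normalize_hf_keys`, `gather_for_maxtext`, the `combine` hook, and the use of plain Python lists as stand-in tensors are hypothetical. It only assumes what the comment states, namely that an unscanned composite HF key is a single tuple of HF keys and a scanned one is a list of such tuples (one per layer).

```python
def normalize_hf_keys(hf_keys):
    """Return (groups, is_scanned), where groups is a list of HF-key tuples.

    Hypothetical helper: a str is a plain one-to-one key, a tuple is an
    unscanned composite key, and a list holds one entry per scanned layer.
    """
    if isinstance(hf_keys, str):
        return [(hf_keys,)], False
    if isinstance(hf_keys, tuple):
        return [hf_keys], False
    if isinstance(hf_keys, list):
        groups = [k if isinstance(k, tuple) else (k,) for k in hf_keys]
        return groups, True
    raise TypeError(f"unsupported HF key spec: {type(hf_keys).__name__}")


def gather_for_maxtext(hf_keys, hf_state, combine):
    """Build one MaxText-side value from one or more HF tensors.

    `combine` is the model hook that merges the tensors of a single layer.
    The scanned/unscanned branching lives here, not inside the hook.
    """
    groups, is_scanned = normalize_hf_keys(hf_keys)
    per_layer = [combine([hf_state[key] for key in group]) for group in groups]
    # Scanned parameters keep a leading layer axis; unscanned ones do not.
    return per_layer if is_scanned else per_layer[0]


if __name__ == "__main__":
    # Stand-in "tensors" are flat lists; the hook concatenates them.
    state = {"l0.gate": [1, 2], "l0.up": [3, 4], "l1.gate": [5, 6], "l1.up": [7, 8]}
    concat = lambda tensors: [x for t in tensors for x in t]
    print(gather_for_maxtext(("l0.gate", "l0.up"), state, concat))
    print(gather_for_maxtext([("l0.gate", "l0.up"), ("l1.gate", "l1.up")], state, concat))
```

With the normalization done once in the shared path, a model hook only needs to describe how the tensors of one layer are merged, which is the simplification the suggestion is after.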

Finally, I left a few minor comments. Please also run a quick formatting pass on the files to address some inconsistent indentation.

Comment thread src/maxtext/configs/models/qwen3.5-35b-a3b.yml Outdated
Comment thread tests/utils/forward_pass_logit_checker.py Outdated
Comment thread tests/utils/forward_pass_logit_checker.py Outdated
Comment thread src/maxtext/checkpoint_conversion/utils/param_mapping.py
Comment thread src/maxtext/checkpoint_conversion/utils/param_mapping.py
Comment thread src/maxtext/checkpoint_conversion/utils/param_mapping.py
Comment thread src/maxtext/checkpoint_conversion/utils/param_mapping.py Outdated
Comment thread src/maxtext/checkpoint_conversion/utils/param_mapping.py Outdated
Comment thread src/maxtext/checkpoint_conversion/utils/param_mapping.py Outdated
Comment thread src/maxtext/checkpoint_conversion/utils/param_mapping.py Outdated
Collaborator

@hengtaoguo hengtaoguo left a comment

Thanks for the great work! Approving to unblock.

Comment thread src/maxtext/checkpoint_conversion/to_maxtext.py Outdated
@Rohan-Bierneni Rohan-Bierneni force-pushed the rbierneni-qwen35-checkpoint branch from af96f80 to d89d8aa Compare May 27, 2026 17:28
@Rohan-Bierneni Rohan-Bierneni force-pushed the rbierneni-qwen35-checkpoint branch from f7f2ca7 to 5976ca2 Compare May 27, 2026 18:27
@Rohan-Bierneni Rohan-Bierneni force-pushed the rbierneni-qwen35-checkpoint branch 3 times, most recently from 1cc2571 to b4da49b Compare May 30, 2026 00:24
Add config file for 397B model

update attentions.py with new decoder block type

Update other files with new model to ensure model initialization is correct

Update decoder block type

Train Compile test is passing

resolve nits in config file formatting

resolve formatting errors

Fix conflict in maxtext_utils

Fix linter errors

Fix linter errors

Fix linter errors

Ran pyink locally for formatting

Fix naming for config file

Add code for param_mapping for qwen3.5

Add hook fn function for Qwen3.5

Update hook logic for 1 to n mt -> hf hook fns

Add other components to onboard model to checkpoint util

Add support for mini qwen3.5 model to pyconfig

Add qwen3.5-35b to checkpoint conversion for testing

Able to do checkpoint conversion but forward pass mismatch

Bug in concatenated tensors fixed. Multiple HF to 1 MT

Add flags for lazy loading

Logic for unscanned direct moe conversion

Add logic for deinterleaving fuzed tensor when save to hf

Add models to hf_shape dict

Fix hf shape config structure

Fix logic for converting tuples in to_huggingface

Resolve comments in config file and forward pass file

Resolve some comments in param_mapping & remove commented out code

Move logic from param_mapping to central util framework

Resolve comments

Ran pyink to format files

Linter passes now

Add newline at EOF

Fix package version failure

Run linter
@Rohan-Bierneni Rohan-Bierneni force-pushed the rbierneni-qwen35-checkpoint branch from b4da49b to 999c339 Compare May 30, 2026 00:28
@copybara-service copybara-service Bot merged commit 493fba6 into main May 31, 2026
32 of 33 checks passed
@copybara-service copybara-service Bot deleted the rbierneni-qwen35-checkpoint branch May 31, 2026 00:30