[Qwen3.5] Onboard model to Checkpointing Util & Verify Correctness #3839
Thanks for adding the Qwen 3.5 conversion! Great work figuring out the logic to handle this complex structure, along with comprehensive testing.
I see this adds support for composite_hf_key (multiple HF keys to one MT key). For context, we previously only had composite_mt_key (multiple MT keys to one HF key), so it's great to have this functionality. cc @hengtaoguo
To help maintain this going forward, I have two suggestions:
- Centralize the logic: Currently, the scanned (list of HF key tuples) vs. unscanned (single HF key tuple) logic is handled via an if/else block in the model hooks. Would it be possible to incorporate this directly into the central framework (to_huggingface via utils.utils._process, and to_maxtext)? Here is an example of what that might look like. This will help simplify the logic for future models.
- Add inline documentation: Could we add more comments clarifying the usage of composite_hf_key, as well as composite_mt_key?
Finally, I left a few minor comments. Please also run a quick formatting pass on the files to address some inconsistent indentation.
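The example linked in the review is not reproduced here. As an illustration only, the following is a minimal sketch of how one central helper could absorb the scanned vs. unscanned branching so that model hooks no longer need their own if/else. The names normalize_hf_keys, gather_for_maxtext, and combine are hypothetical and are not part of the MaxText checkpoint utility; a plain Python list stands in for stacking along the layer axis.

```python
from typing import Callable, Dict, List, Tuple, Union

# Hypothetical spec type: one HF key, one tuple of HF keys (unscanned),
# or a list of such tuples, one per layer (scanned).
HFKeySpec = Union[str, Tuple[str, ...], List[Tuple[str, ...]]]


def normalize_hf_keys(hf_keys: HFKeySpec) -> Tuple[List[Tuple[str, ...]], bool]:
  """Returns (list of per-layer key tuples, is_scanned)."""
  if isinstance(hf_keys, str):
    return [(hf_keys,)], False
  if isinstance(hf_keys, tuple):
    return [hf_keys], False
  if isinstance(hf_keys, list):
    # Allow a scanned list of plain keys as well as a list of tuples.
    return [k if isinstance(k, tuple) else (k,) for k in hf_keys], True
  raise TypeError(f"Unsupported HF key spec: {type(hf_keys).__name__}")


def gather_for_maxtext(
    hf_keys: HFKeySpec,
    hf_state: Dict[str, list],
    combine: Callable[[List[list]], list],
):
  """Combines several HF tensors into one MaxText tensor (composite HF key).

  `combine` is the model-specific hook (for example, a concatenation).
  For scanned specs the per-layer results are returned as a list, which
  stands in for stacking along the layer axis in this sketch.
  """
  per_layer, is_scanned = normalize_hf_keys(hf_keys)
  combined = [combine([hf_state[k] for k in keys]) for keys in per_layer]
  return combined if is_scanned else combined[0]
```

With this shape, a model hook only supplies combine; whether the mapping is scanned or unscanned is decided once, from the type of the key spec.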
hengtaoguo left a comment:
Thanks for the great work! Approve to unblock.
- Add config file for 397B model
- update attentions.py with new decoder block type
- Update other files with new model to ensure model initialization is correct
- Update decoder block type
- Train Compile test is passing
- resolve nits in config file formatting
- resolve formatting errors
- Fix conflict in maxtext_utils
- Fix linter errors
- Fix linter errors
- Fix linter errors
- Ran pyink locally for formatting
- Fix naming for config file
- Add code for param_mapping for qwen3.5
- Add hook fn function for Qwen3.5
- Update hook logic for 1 to n mt -> hf hook fns
- Add other components to onboard model to checkpoint util
- Add support for mini qwen3.5 model to pyconfig
- Add qwen3.5-35b to checkpoint conversion for testing
- Able to do checkpoint conversion but forward pass mismatch
- Bug in concatenated tensors fixed. Multiple HF to 1 MT
- Add flags for lazy loading
- Logic for unscanned direct moe conversion
- Add logic for deinterleaving fused tensor when save to hf
- Add models to hf_shape dict
- Fix hf shape config structure
- Fix logic for converting tuples in to_huggingface
- Resolve comments in config file and forward pass file
- Resolve some comments in param_mapping & remove commented out code
- Move logic from param_mapping to central util framework
- Resolve comments
- Ran pyink to format files
- Linter passes now
- Add newline at EOF
- Fix package version failure
- Run linter
Description
This PR onboards the Qwen3.5 model to the checkpointing util. It also uses the converted HF -> MaxText checkpoint to run forward_pass_logit_checker and verify model correctness.
Tests
Results of forward_pass_logit_checker on the 2x2 matrix of cases (scanned/unscanned) x (to_maxtext/to_huggingface):
HF -> MaxText Tests:
MaxText -> HF Tests:
Note: The MaxText checkpoint above was converted with to_huggingface and then back with to_maxtext; forward_pass_logit_checker was then run on the result against the checkpoint taken directly from the HF repo.
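For readers unfamiliar with this kind of check, the sketch below shows a generic tolerance-based logit comparison of the sort a round-trip test relies on. It is not the forward_pass_logit_checker implementation: the function names and the atol and max_kl thresholds are assumptions chosen for illustration, and plain Python lists stand in for logit tensors at a single token position.

```python
import math
from typing import List


def log_softmax(logits: List[float]) -> List[float]:
  """Numerically stable log-softmax over one vocabulary-sized vector."""
  m = max(logits)
  log_norm = m + math.log(sum(math.exp(x - m) for x in logits))
  return [x - log_norm for x in logits]


def kl_divergence(ref_logits: List[float], test_logits: List[float]) -> float:
  """KL(ref || test) between the two softmax distributions."""
  log_p = log_softmax(ref_logits)
  log_q = log_softmax(test_logits)
  return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def logits_match(
    ref_logits: List[float],
    test_logits: List[float],
    atol: float = 1e-2,  # hypothetical absolute tolerance
    max_kl: float = 1e-3,  # hypothetical KL threshold
) -> bool:
  """True if the logits agree elementwise and as distributions."""
  max_abs = max(abs(a - b) for a, b in zip(ref_logits, test_logits))
  return max_abs <= atol and kl_divergence(ref_logits, test_logits) <= max_kl
```

In a round-trip setting, ref_logits would come from the checkpoint taken directly from the HF repo and test_logits from the checkpoint that went through to_huggingface and back through to_maxtext.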