[Fix] Match fp32-keep patterns against post-mapping HF keys #1897
Merged
The fp32-keep mechanism (params kept in fp32 and excluded from FSDP sharding, e.g. Qwen3.5 dense `linear_attn.norm.weight` / `A_log`) classified params inconsistently: `_fully_shard` matched `fp32_keys_pattern` against the pre-mapping `to_hf_key_list` output (`model.layers...`), while the save path matched the post-mapping hf_keys (`model.language_model.layers...`).

For models whose `hf_key_mapping` rewrites the namespace, the patterns never matched at shard time, so the fp32 params were not excluded from FSDP.

This PR aligns `_fully_shard` with the post-mapping hf_keys, obtained via `load_spec_mapping`, so that all consumers (sharding, `_split_ignored_params`, `_get_save_dtype`) classify against the same keys that land in the saved checkpoint.
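The mismatch can be reproduced in isolation. The snippet below is only a minimal sketch, not xtuner's code: `map_hf_key` and `is_fp32_keep` are hypothetical stand-ins, the pattern and key names are illustrative, and it assumes patterns are applied with `re.search`, which may differ from the real matching logic.

```python
import re

# Hypothetical pattern, written against the saved-checkpoint (post-mapping) namespace.
FP32_KEYS_PATTERN = [
    r"model\.language_model\.layers\.\d+\.linear_attn\.(norm\.weight|A_log)$",
]


def map_hf_key(key: str) -> str:
    """Stand-in for a namespace-rewriting hf_key_mapping (assumed behaviour)."""
    return re.sub(r"^model\.layers\.", "model.language_model.layers.", key)


def is_fp32_keep(hf_key: str) -> bool:
    """Return True if any fp32-keep pattern matches the given key."""
    return any(re.search(pattern, hf_key) for pattern in FP32_KEYS_PATTERN)


# Illustrative keys in the pre-mapping namespace.
pre_mapping_keys = [
    "model.layers.0.linear_attn.norm.weight",
    "model.layers.0.linear_attn.A_log",
    "model.layers.0.mlp.up_proj.weight",
]
post_mapping_keys = [map_hf_key(key) for key in pre_mapping_keys]

# Old shard-time behaviour: classify against pre-mapping keys.
matched_before_fix = [k for k in pre_mapping_keys if is_fp32_keep(k)]
# Behaviour after the fix: classify against the keys that reach the checkpoint.
matched_after_fix = [k for k in post_mapping_keys if is_fp32_keep(k)]

print(matched_before_fix)
print(matched_after_fix)
```

Under these assumptions nothing matches against the pre-mapping keys, while both `linear_attn` params match once the mapping is applied first, which is the inconsistency described above.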
49b024b to c2167a7
braisedpork1964 pushed a commit to braisedpork1964/xtuner that referenced this pull request Jun 11, 2026