[Fix] Match fp32-keep patterns against post-mapping HF keys#1897

Merged
HAOCHENYE merged 1 commit into
InternLM:mainfrom
HAOCHENYE:fix/qwen3-5-dense-fp32-keep
Jun 10, 2026
Conversation

@HAOCHENYE

Collaborator

The fp32-keep mechanism (params kept in fp32 and excluded from FSDP sharding,
e.g. Qwen3.5 dense `linear_attn.norm.weight` / `A_log`) classified params
inconsistently: `_fully_shard` matched `fp32_keys_pattern` against the
pre-mapping `to_hf_key_list` output (`model.layers...`), while the save
path matched the post-mapping `hf_keys` (`model.language_model.layers...`).
For models whose `hf_key_mapping` rewrites the namespace, the patterns
never matched at shard time, so the fp32 params were not excluded from
FSDP.

Align `_fully_shard` to the post-mapping `hf_keys` via `load_spec_mapping`
so all consumers (sharding, `_split_ignored_params`, `_get_save_dtype`)
classify against the same keys that land in the saved checkpoint.
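
For illustration only, here is a minimal standalone sketch of the mismatch described above. It does not use xtuner's actual code: the helper names (`apply_key_mapping`, `is_fp32_keep`), the regex-based matching, the concrete pattern strings, and the single-rule mapping are all assumptions chosen to mirror the `model.layers...` vs `model.language_model.layers...` namespaces mentioned in the description.

```python
import re

# Hypothetical fp32-keep patterns, written against the checkpoint
# (post-mapping) namespace. The real patterns in xtuner may differ.
FP32_KEYS_PATTERN = [
    r"model\.language_model\.layers\.\d+\.linear_attn\.norm\.weight",
    r"model\.language_model\.layers\.\d+\.linear_attn\.A_log",
]

# Hypothetical stand-in for hf_key_mapping: one regex rewrite rule that
# moves keys into the "language_model" namespace.
HF_KEY_MAPPING = {r"^model\.": "model.language_model."}


def apply_key_mapping(key: str, mapping: dict[str, str]) -> str:
    """Apply the first rewrite rule that matches; return the key unchanged otherwise."""
    for pattern, replacement in mapping.items():
        new_key, count = re.subn(pattern, replacement, key, count=1)
        if count:
            return new_key
    return key


def is_fp32_keep(key: str, patterns: list[str]) -> bool:
    """True if any fp32-keep pattern is found in the key."""
    return any(re.search(p, key) for p in patterns)


# Keys as they look before the namespace rewrite.
pre_mapping_keys = [
    "model.layers.0.linear_attn.norm.weight",
    "model.layers.0.linear_attn.A_log",
    "model.layers.0.mlp.gate_proj.weight",
]
# Keys as they land in the saved checkpoint.
post_mapping_keys = [apply_key_mapping(k, HF_KEY_MAPPING) for k in pre_mapping_keys]

# Matching pre-mapping keys (the old shard-time behaviour) finds nothing ...
matched_pre = [k for k in pre_mapping_keys if is_fp32_keep(k, FP32_KEYS_PATTERN)]
# ... while matching post-mapping keys (the save path) finds both fp32 params.
matched_post = [k for k in post_mapping_keys if is_fp32_keep(k, FP32_KEYS_PATTERN)]
```

Under these assumptions, `matched_pre` is empty while `matched_post` contains the `norm.weight` and `A_log` keys, which is the inconsistency this PR removes by classifying everywhere against the post-mapping keys.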
@HAOCHENYE HAOCHENYE force-pushed the fix/qwen3-5-dense-fp32-keep branch from 49b024b to c2167a7 Compare June 9, 2026 16:03
@HAOCHENYE HAOCHENYE merged commit d6a1a7a into InternLM:main Jun 10, 2026
5 of 6 checks passed
braisedpork1964 pushed a commit to braisedpork1964/xtuner that referenced this pull request Jun 11, 2026
…#1897)
