
[Models] Update SWA RoPE theta for MLA/GQA attention #8077

Merged
Jiang-Jia-Jun merged 3 commits into
PaddlePaddle:develop from
chang-wenbin:mla_gqa_swa_rope_theta
Jun 26, 2026


@chang-wenbin chang-wenbin commented Jun 25, 2026

Collaborator

Motivation

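Use a separate RoPE base for MLA/GQA sliding-window attention (SWA) layers that are configured with swa_rope_theta, so that SWA layers and full-attention layers do not share rope_theta.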
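
To illustrate why the base matters, here is a minimal sketch of how the standard RoPE rotation angles follow from theta. The helper names and both theta values are illustrative assumptions, not FastDeploy code or values taken from this PR.

```python
def rope_inv_freq(head_dim: int, theta: float) -> list[float]:
    """Standard RoPE inverse frequencies: theta ** (-2i / head_dim)."""
    return [theta ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]


def rope_angles(position: int, head_dim: int, theta: float) -> list[float]:
    """Rotation angle applied to each (even, odd) channel pair at a position."""
    return [position * freq for freq in rope_inv_freq(head_dim, theta)]


# Hypothetical bases, chosen only for illustration.
full_theta = 1_000_000.0  # stand-in for rope_theta (full-attention layers)
swa_theta = 10_000.0      # stand-in for swa_rope_theta (SWA layers)

full = rope_angles(position=128, head_dim=8, theta=full_theta)
swa = rope_angles(position=128, head_dim=8, theta=swa_theta)

# The first pair always rotates by the raw position; every other pair
# rotates faster under the smaller base.
print(full)
print(swa)
```

With a shared base, both layer types would receive identical angles; a separate base gives the SWA layers their own frequency spectrum.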

Modifications

  • ForwardMeta gains swa_rotary_embs, which gpu_model_runner passes in from share_inputs["swa_rope_emb"].
  • InputBatch and ProposerInputBatch additionally build swa_rope_emb when swa_rope_theta is configured.
  • AppendAttentionBackend uses swa_rotary_embs when window_attn_skip_freq[layer_id] == 1 and swa_rope_theta is configured.
  • DeepseekV3MLAAttention initializes RoPE with swa_rope_theta for SWA layers and caches window_attn_skip_freq.
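
The per-layer selection described in the list above can be sketched roughly as follows. `ForwardMetaSketch` and `select_rotary_embs` are hypothetical stand-ins written for this example; the actual FastDeploy classes and call sites may look different.

```python
from dataclasses import dataclass
from typing import Any, Optional, Sequence


@dataclass
class ForwardMetaSketch:
    """Hypothetical stand-in for ForwardMeta with only the two RoPE fields."""
    rotary_embs: Any
    swa_rotary_embs: Optional[Any] = None


def select_rotary_embs(
    meta: ForwardMetaSketch,
    layer_id: int,
    window_attn_skip_freq: Optional[Sequence[int]],
    swa_rope_theta: Optional[float],
) -> Any:
    """Pick the SWA table only for SWA layers with swa_rope_theta configured."""
    is_swa_layer = (
        window_attn_skip_freq is not None
        and window_attn_skip_freq[layer_id] == 1
    )
    if is_swa_layer and swa_rope_theta is not None:
        return meta.swa_rotary_embs
    return meta.rotary_embs


meta = ForwardMetaSketch(rotary_embs="full_table", swa_rotary_embs="swa_table")
skip_freq = [0, 1, 1, 0]  # assumed layout: 1 marks a sliding-window layer

print(select_rotary_embs(meta, 1, skip_freq, swa_rope_theta=10_000.0))
print(select_rotary_embs(meta, 0, skip_freq, swa_rope_theta=10_000.0))
```

Falling back to the shared table whenever either condition fails keeps models without swa_rope_theta on their existing path.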

Usage or Command

N/A

Accuracy Tests

N/A

Checklist

  • Add at least a tag in the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code, run pre-commit before commit.
  • Add unit tests. Please write the reason in this PR if no unit tests.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

zhoutianzi666
zhoutianzi666 previously approved these changes Jun 25, 2026
PaddlePaddle-bot

This comment was marked as outdated.

@PaddlePaddle-bot PaddlePaddle-bot left a comment


🤖 Paddle-CI-Agent | pr_review | 2026-06-25 19:30:12

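📋 Review summary

PR overview: wires an independent swa_rope_theta / swa_rope_emb into the SWA attention path for MLA/GQA.
Scope of changes: ForwardMeta, InputBatch / ProposerInputBatch, GPU runner, append attention backend, DeepSeek V3 MLA attention.
Impact tags: [Models] [OP]

Issues

No new blocking issues found. PR convention issues are reported in the section below and are not repeated here.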

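Status of historical findings

| Finding | Issue | Status |
| --- | --- | --- |
| F1 | Non-SWA layers of glm_moe_dsa lose rope_parameters["rope_theta"]. | ⚠️ Still present |
| F2 | The PaddleFormers fallback for rope_already_applied applies RoPE again on SWA layers. | ⚠️ Still present |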
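
As a general illustration of why a second RoPE application (finding F2) matters, the sketch below rotates a single channel pair twice. It is plain 2-D rotation math, not FastDeploy or PaddleFormers code, and the angle is an arbitrary example value.

```python
import math


def rotate_pair(x: float, y: float, angle: float) -> tuple[float, float]:
    """Rotate one (even, odd) channel pair by the given angle, as RoPE does."""
    c, s = math.cos(angle), math.sin(angle)
    return (x * c - y * s, x * s + y * c)


angle = 0.3  # arbitrary example: position * inverse frequency for one pair

once = rotate_pair(1.0, 0.0, angle)
twice = rotate_pair(*once, angle)            # RoPE applied a second time
doubled = rotate_pair(1.0, 0.0, 2 * angle)   # same as rotating by 2 * angle

print(once, twice, doubled)
```

Applying the rotation twice is equivalent to encoding twice the angle, so the query or key no longer carries the intended position.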

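📝 PR convention check

The title lacks an official tag, and the description is still the empty template with no accuracy/alignment results. A title and description that can be used as drop-in replacements are given below.

Suggested title (ready to copy):

  • [Models] Update SWA RoPE theta for MLA/GQA attention
Suggested PR description (ready to copy):
## Motivation
Use a separate RoPE base for MLA/GQA sliding-window attention layers that are configured with `swa_rope_theta`, so that SWA layers and full-attention layers do not share `rope_theta`.

## Modifications
- `ForwardMeta` gains `swa_rotary_embs`, which `gpu_model_runner` passes in from `share_inputs["swa_rope_emb"]`.
- `InputBatch` and `ProposerInputBatch` additionally build `swa_rope_emb` when `swa_rope_theta` is configured.
- `AppendAttentionBackend` uses `swa_rotary_embs` when `window_attn_skip_freq[layer_id] == 1` and `swa_rope_theta` is configured.
- `DeepseekV3MLAAttention` initializes RoPE with `swa_rope_theta` for SWA layers and caches `window_attn_skip_freq`.

## Usage or Command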
N/A

## Accuracy Tests
N/A

## Checklist

- [x] Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [ ] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

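Overall assessment

This round reviewed, in risk-priority order, the main RoPE/SWA path across the 5 changed files, and cross-checked the append attention backend, DeepSeek MLA, GPU ForwardMeta initialization, and input batch construction. Apart from the unresolved historical findings F1/F2, no new diff-level blocking defects were found. Fixing those two issues and adding accuracy/alignment notes is still recommended before merging.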

@chang-wenbin chang-wenbin changed the title update mla_gqa_swa_rope_theta [Models] Update SWA RoPE theta for MLA/GQA attention Jun 25, 2026

codecov-commenter commented Jun 25, 2026


Codecov Report

❌ Patch coverage is 31.81818% with 15 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@e8ae0f9). Learn more about missing BASE report.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| fastdeploy/worker/input_batch.py | 33.33% | 4 Missing and 4 partials ⚠️ |
| fastdeploy/model_executor/models/deepseek_v3.py | 0.00% | 5 Missing ⚠️ |
| ...l_executor/layers/attention/append_attn_backend.py | 50.00% | 2 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #8077   +/-   ##
==========================================
  Coverage           ?   67.62%           
==========================================
  Files              ?      475           
  Lines              ?    66909           
  Branches           ?    10321           
==========================================
  Hits               ?    45249           
  Misses             ?    18813           
  Partials           ?     2847           
| Flag | Coverage Δ |
| --- | --- |
| GPU | 77.53% <31.81%> (?) |
| XPU | 16.03% <4.54%> (?) |


@PaddlePaddle-bot


🤖 Paddle-CI-Agent | ci_status_monitor | 2026-06-26 11:48:07 UTC+08:00

CI report generated from the following code (refreshed every 30 minutes):
PR commit: a85a97a | Merge base: e8ae0f9 (branch: develop)


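1 Required jobs: 9/10 passed

| Total runs (reruns) | Total jobs | ✅ Passed | ❌ Failed | ⏳ Running | ⏸️ Pending | Skipped |
| --- | --- | --- | --- | --- | --- | --- |
| 68(26) | 42 | 38 | 4 | 0 | 0 | 0 |

| Job | Error type | Confidence | Log |
| --- | --- | --- | --- |
| Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage | PR issue: diff coverage 50% is below 80% | High | Job |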

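2 Failure details

🔴 Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage: PR issue (confidence: high)

Analyzer: generic analysis (fallback)

Failed case: coverage threshold check

| Case | Error summary |
| --- | --- |
| Verify Code Coverage Threshold (80%) | Diff coverage of the newly added code is 50%, below the 80% threshold; the process failed with exit code 9 |

Key logs: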

GPU Patch Coverage Details:
"fastdeploy/model_executor/models/deepseek_v3.py": percent_covered=0.0, violation_lines=[302, 304, 311, 316, 536]
"fastdeploy/worker/input_batch.py": percent_covered=66.67, violation_lines=[252, 742, 854, 1092]
"fastdeploy/model_executor/layers/attention/append_attn_backend.py": percent_covered=50.0, violation_lines=[327, 328]
"total_num_lines": 22, "total_num_violations": 11, "total_percent_covered": 50
Process completed with exit code 9.
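  • Root cause summary: the new SWA RoPE logic lacks test coverage
    The PR adds swa_rope_theta, swa_rope_emb, and logic that switches SWA layers to swa_rotary_embs, but the corresponding branches are not exercised by the current unit tests. The unit tests themselves pass; the failure occurs at the coverage gate, so tests covering these new branches need to be added, or existing tests adjusted to trigger the relevant configuration.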
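
The arithmetic behind the gate can be reproduced from the totals in the log above. The sketch below is only an assumption about how such a check might be computed; the function names are hypothetical and this is not the CI script itself.

```python
def patch_coverage_percent(total_lines: int, violations: int) -> float:
    """Share of changed lines that were executed, as a percentage."""
    if total_lines == 0:
        return 100.0  # nothing to cover
    return 100.0 * (total_lines - violations) / total_lines


def passes_gate(percent: float, threshold: float = 80.0) -> bool:
    """True when the diff coverage meets the required threshold."""
    return percent >= threshold


# Totals taken from the log: total_num_lines=22, total_num_violations=11.
percent = patch_coverage_percent(total_lines=22, violations=11)
print(percent, passes_gate(percent))
```

With 11 of 22 changed lines uncovered, the result is 50%, which falls short of the 80% threshold reported by the job.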

Suggested fixes:

  1. In deepseek_v3.py, add tests for swa_rope_theta overriding self.rope_theta and for the SWA branch check, focusing on lines 302, 304, 311, 316, and 536.
  2. In input_batch.py, add tests for the initialization and reset paths of swa_rope_emb = get_rope(...), covering lines 252, 742, 854, and 1092.
  3. In append_attn_backend.py, add tests for the branch where SWA layers select forward_meta.swa_rotary_embs, covering lines 327 and 328.

Related changes: fastdeploy/model_executor/models/deepseek_v3.py, fastdeploy/worker/input_batch.py, fastdeploy/model_executor/layers/attention/append_attn_backend.py
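
A test along the lines of suggestion 2 might look roughly like the sketch below. `build_rope_embs` is a hypothetical stand-in for the batch initialization logic, and `fake_get_rope` replaces the real `get_rope`, whose actual signature is not shown in this PR; a real test would need to target the FastDeploy classes directly.

```python
import unittest


def build_rope_embs(rope_theta, swa_rope_theta, get_rope):
    """Hypothetical stand-in: build swa_rope_emb only when it is configured."""
    embs = {"rope_emb": get_rope(rope_theta)}
    if swa_rope_theta is not None:
        embs["swa_rope_emb"] = get_rope(swa_rope_theta)
    return embs


class BuildRopeEmbsTest(unittest.TestCase):
    def setUp(self):
        self.calls = []

    def fake_get_rope(self, theta):
        # Assumed one-argument signature, used only by this sketch.
        self.calls.append(theta)
        return ("emb", theta)

    def test_without_swa_rope_theta(self):
        embs = build_rope_embs(1_000_000.0, None, self.fake_get_rope)
        self.assertNotIn("swa_rope_emb", embs)
        self.assertEqual(self.calls, [1_000_000.0])

    def test_with_swa_rope_theta(self):
        embs = build_rope_embs(1_000_000.0, 10_000.0, self.fake_get_rope)
        self.assertEqual(embs["swa_rope_emb"], ("emb", 10_000.0))
        self.assertEqual(self.calls, [1_000_000.0, 10_000.0])
```

Saved as a module, this can be run with `python -m unittest`. Covering both the configured and unconfigured cases is what turns the new branch from a partial into a full hit.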

@Jiang-Jia-Jun Jiang-Jia-Jun merged commit f4eda5a into PaddlePaddle:develop Jun 26, 2026
59 of 69 checks passed

5 participants