
[Cherry-pick][PD Disaggregation] Limit prefill fetch num with FD_MAX_INFLIGHT_PREFILL (#7981) #8081

Open
Sunny-bot1 wants to merge 3 commits into PaddlePaddle:release/2.6 from Sunny-bot1:fd_26_fix_prefill_num

Conversation

@Sunny-bot1 Sunny-bot1 commented Jun 26, 2026

Motivation

CP : #7981

Modifications

Usage or Command

Accuracy Tests

Checklist

  • Add at least a tag in the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code, run pre-commit before commit.
  • Add unit tests. Please write the reason in this PR if no unit tests.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

kevincheng2 and others added 3 commits June 10, 2026 17:16
In PD separation mode, different ranks may receive cache_info at different
times. When consume_signals gets a layer0 signal, some ranks find the
engine_idx already in idx_cache_task_dict (ready) while others don't (pending).
This causes different ranks to put different batch_engine_signals into the
queue, leading to mismatched finish_send_cache_barrier.wait() calls and
deadlock.

Fix: route all layer0 signals through pending_layer0_signals uniformly,
then immediately recover any that already have cache_info registered.
Each recovered signal is put into the queue individually (single-request
batch) to ensure all ranks have identical batch granularity regardless
of recovery timing.
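
The routing described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the FastDeploy implementation: `recover_ready` and `simulate` are hypothetical helpers, and only the names `pending_layer0_signals` and `idx_cache_task_dict` are taken from the message above. It shows that two ranks receiving cache_info at different times still enqueue identical single-request batches.

```python
def recover_ready(pending_layer0_signals, idx_cache_task_dict, signal_queue):
    """Move every pending signal whose cache_info is registered into the queue."""
    still_pending = []
    for engine_idx in pending_layer0_signals:
        if engine_idx in idx_cache_task_dict:
            # One request per batch, so batch granularity never depends on timing.
            signal_queue.append([engine_idx])
        else:
            still_pending.append(engine_idx)
    pending_layer0_signals[:] = still_pending


def simulate(events):
    """Replay (kind, engine_idx) events for one rank and return its queue (hypothetical)."""
    pending_layer0_signals, idx_cache_task_dict, signal_queue = [], {}, []
    for kind, engine_idx in events:
        if kind == "cache_info":
            idx_cache_task_dict[engine_idx] = object()  # placeholder cache task
        else:
            # Every layer0 signal is parked first, ready or not.
            pending_layer0_signals.append(engine_idx)
        recover_ready(pending_layer0_signals, idx_cache_task_dict, signal_queue)
    return signal_queue


# Rank A sees cache_info before the signals; rank B sees the signals first.
rank_a = simulate([("cache_info", 0), ("cache_info", 1), ("signal", 0), ("signal", 1)])
rank_b = simulate([("signal", 0), ("signal", 1), ("cache_info", 0), ("cache_info", 1)])
```

In this sketch both ranks end up with the same sequence of single-request batches, so a barrier keyed on batches would be entered the same number of times on each rank.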

@PaddlePaddle-bot PaddlePaddle-bot left a comment


🤖 Paddle-CI-Agent | pr_review | 2026-06-26 13:41:58

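📋 Review Summary

PR overview: Adds an FD_MAX_INFLIGHT_PREFILL cap on the number of requests a PD prefill instance fetches.
Scope of changes: fastdeploy/engine/common_engine_prepare_mixin.py, fastdeploy/envs.py
Affected tags: [Engine] [PD Disaggregation]

Issues

| Severity | File | Summary |
|---|---|---|
| 🔴 Bug | fastdeploy/engine/common_engine_prepare_mixin.py:93 | Multiple prefill fetch threads dequeue against the same remaining quota at the same time, so FD_MAX_INFLIGHT_PREFILL can be exceeded in practice |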

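📝 PR Convention Check

The capitalization of the Cherry-Pick tag in the current PR title does not match the template, and the PR description still contains the template's placeholder comments while its core sections are empty. Consider replacing them with the copy-ready content below.

Suggested title (ready to copy):

  • [Cherry-Pick][PD Disaggregation] Limit prefill fetch num with FD_MAX_INFLIGHT_PREFILL(#7981)
Suggested PR description (ready to copy):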
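## Motivation
Limit the number of in-flight requests that a PD Disaggregation prefill instance pulls at once, so that fetching too many prefill requests does not amplify resource pressure.

## Modifications
- In `fastdeploy/engine/common_engine_prepare_mixin.py`, read `FD_MAX_INFLIGHT_PREFILL` and, in `_fetch_request_prefill()`, cap the batch passed to `scheduler.get_requests()` for the current round based on the number of currently running prefill requests.
- In `fastdeploy/envs.py`, add the environment variable `FD_MAX_INFLIGHT_PREFILL`, with a default value of `20`.

## Usage or Command
Set the environment variable to control the cap on in-flight prefill requests, for example:
`FD_MAX_INFLIGHT_PREFILL=20`

## Accuracy Tests
N/A. This PR does not touch model numerical logic. The current diff adds no unit tests; before ticking the unit-test item, add a regression test verifying that multiple fetch threads calling `_fetch_request_prefill()` concurrently do not exceed `FD_MAX_INFLIGHT_PREFILL`.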

## Checklist

- [x] Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [ ] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. Please write the reason in this PR if no unit tests.
- [x] Provide accuracy results.
- [x] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

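Overall assessment

The new env variable and the fetch batch cap are in line with the PR's goal, but the current implementation does not count requests pending in multi-threaded fetch toward the cap, so the limit is ineffective under the default configuration. Fix this concurrent accounting problem first, and add a regression test verifying that multiple fetch threads pulling at the same time do not exceed the cap.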

def _fetch_request_prefill(self) -> bool:
    """Fetch and prepare requests for a prefill instance. Returns True if tasks were fetched."""
    max_inflight_prefill = envs.FD_MAX_INFLIGHT_PREFILL
    inflight_prefill = len(self.resource_manager.running)


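🔴 Bug: Computing the remaining quota from len(self.resource_manager.running) cannot enforce the limit in the default multi-fetch-thread setup.

In the prefill role, _prepare_request_v1() starts several _fetch_loop threads according to FD_PREFILL_PREPARE_REQ_THREAD_NUM (3 threads by default). Each thread reads the same running length and calls scheduler.get_requests(batch=available_for_new) before its requests have reached add_request_in_p(); requests that have already been dequeued and are still acquiring D-side resources or undergoing asynchronous preprocessing are not yet in running. With FD_MAX_INFLIGHT_PREFILL=20, the 3 threads can therefore each pull 20 requests at the same time, bringing the actual in-flight count to 60. The limit fails and keeps amplifying resource pressure on the PD side.

Suggested fix:
Put "check the remaining quota" and "record the occupied quota" in the same shared critical section. For example, maintain a lock-protected pending counter at the prefill fetch entry point, reserve available_for_new before dequeuing, and release it after the request fails or completes. Alternatively, move the FD_MAX_INFLIGHT_PREFILL check into ResourceManagerV1 so that it shares a lock with preallocate_resource_in_p() / add_request_in_p(), and count pending requests that have been dequeued but have not yet entered running toward the cap.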
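
One way to picture the first suggestion is a small reservation counter guarded by a lock. The sketch below is an assumption-laden illustration, not FastDeploy code: `InflightBudget`, `reserve`, `release`, and `demo` are hypothetical names, and the counter is assumed to cover every request from dequeue until it completes or fails. Because the check and the bookkeeping happen under the same lock, three threads asking for 20 slots each cannot collectively receive more than the limit.

```python
import threading


class InflightBudget:
    """Hypothetical helper: check and reserve prefill slots under one lock."""

    def __init__(self, limit: int) -> None:
        self._limit = limit
        self._reserved = 0  # requests dequeued or running, not yet released
        self._lock = threading.Lock()

    def reserve(self, wanted: int) -> int:
        """Atomically reserve up to `wanted` slots and return how many were granted."""
        with self._lock:
            granted = max(0, min(wanted, self._limit - self._reserved))
            self._reserved += granted
            return granted

    def release(self, count: int) -> None:
        """Give slots back once requests finish or fail."""
        with self._lock:
            self._reserved = max(0, self._reserved - count)


def demo(limit: int = 20, threads: int = 3) -> int:
    """Let several fetch threads race for the full quota; return the total granted."""
    budget = InflightBudget(limit)
    granted = []

    def fetch_once() -> None:
        # Each thread asks for the whole quota, as in the race described above.
        granted.append(budget.reserve(limit))

    workers = [threading.Thread(target=fetch_once) for _ in range(threads)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return sum(granted)
```

A fetch loop built on this idea would pass the granted count as the batch size and call `release()` for any request that fails before or after entering running.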

@codecov-commenter


Codecov Report

❌ Patch coverage is 40.00000% with 3 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (release/2.6@82c7c7a). Learn more about missing BASE report.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| fastdeploy/engine/common_engine_prepare_mixin.py | 40.00% | 3 Missing ⚠️ |
Additional details and impacted files
@@              Coverage Diff               @@
##             release/2.6    #8081   +/-   ##
==============================================
  Coverage               ?   71.48%           
==============================================
  Files                  ?      386           
  Lines                  ?    55795           
  Branches               ?     8765           
==============================================
  Hits                   ?    39885           
  Misses                 ?    13104           
  Partials               ?     2806           
| Flag | Coverage Δ |
|---|---|
| GPU | 71.48% <40.00%> (?) |

Flags with carried forward coverage won't be shown. Click here to find out more.

@PaddlePaddle-bot


🤖 Paddle-CI-Agent | ci_status_monitor | 2026-06-27 12:06:58 UTC+08:00

This CI report is based on the following code (refreshed every 30 minutes):
PR commit: cbbb9c6 | Merge base: 82c7c7a (branch: release/2.6)


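1 Required jobs: 8/10 passed

| Total runs (reruns) | Total jobs | ✅ Passed | ❌ Failed | ⏳ Running | ⏸️ Pending | Skipped |
|---|---|---|---|---|---|---|
| 36(0) | 36 | 31 | 5 | 0 | 0 | 0 |

| Job | Error type | Confidence | Log |
|---|---|---|---|
| Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage | PR issue | High | Job |
| Approval | Approval required | High | Job |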

2 Failure details

🔴 Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage: PR issue (confidence: high)

Error type: PR issue | Confidence: high
Analyzer: generic analysis (fallback)
Failed cases:

| Case | Error summary |
|---|---|
| tests/engine/test_common_engine.py::TestCommonEngineAdditionalCoverage::test_schedule_request_to_worker_v1_prefill_continuous_wait_async_none | _fetch_request_prefill now reads resource_manager.running; the test's DummyRM lacks that attribute, which raises AttributeError |

Key log lines:

fastdeploy/engine/common_engine_prepare_mixin.py:93
inflight_prefill = len(self.resource_manager.running)
AttributeError: 'DummyRM' object has no attribute 'running'
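  • Root cause summary: after the PR added the concurrency limit, the test stub lacks running

This PR adds FD_MAX_INFLIGHT_PREFILL throttling logic to _fetch_request_prefill() at fastdeploy/engine/common_engine_prepare_mixin.py:92-100 and reads len(self.resource_manager.running) directly. The DummyRM returned by _make_v1_prefill_continuous_rm() at tests/engine/test_common_engine.py:336-364, which the failing case uses, defines only fields such as waiting, real_bsz, add_request_in_p, and pre_recycle_resource. The running attribute was not added alongside the change, so AttributeError is raised before the new logic can run. The real ResourceManagerV1 initializes self.running at fastdeploy/engine/sched/resource_manager_v1.py:181, so the current CI failure looks like a test stub that was not updated after this PR introduced the new behavior.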

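Suggested fixes:

  1. In _make_v1_prefill_continuous_rm() / DummyRM.__init__ in tests/engine/test_common_engine.py, add self.running = [], and where a scenario needs it, populate it with the desired number of in-flight requests.
  2. Add an assertion covering the case len(resource_manager.running) >= FD_MAX_INFLIGHT_PREFILL, in which _fetch_request_prefill() returns False and pulls no further requests from the scheduler.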
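
The two suggestions could look roughly like the sketch below. It is only an illustration under stated assumptions: `fetch_request_prefill` is a simplified, hypothetical stand-in for `_fetch_request_prefill()`, the module-level `FD_MAX_INFLIGHT_PREFILL` constant stands in for `envs.FD_MAX_INFLIGHT_PREFILL`, and this `DummyRM` carries only the fields needed here rather than the full stub from tests/engine/test_common_engine.py.

```python
from unittest import mock

FD_MAX_INFLIGHT_PREFILL = 20  # stand-in for envs.FD_MAX_INFLIGHT_PREFILL


class DummyRM:
    """Minimal resource-manager stub that now carries a `running` list."""

    def __init__(self, inflight: int = 0) -> None:
        self.waiting = []
        # One placeholder object per in-flight prefill request.
        self.running = [object() for _ in range(inflight)]


def fetch_request_prefill(resource_manager, scheduler, max_inflight=FD_MAX_INFLIGHT_PREFILL):
    """Hypothetical, simplified stand-in for _fetch_request_prefill()."""
    available_for_new = max_inflight - len(resource_manager.running)
    if available_for_new <= 0:
        return False  # quota exhausted, do not touch the scheduler
    tasks = scheduler.get_requests(batch=available_for_new)
    return bool(tasks)


def test_limit_reached_skips_scheduler():
    scheduler = mock.Mock()
    rm = DummyRM(inflight=FD_MAX_INFLIGHT_PREFILL)
    assert fetch_request_prefill(rm, scheduler) is False
    scheduler.get_requests.assert_not_called()
```

In the real test the assertion would target the engine's own `_fetch_request_prefill()` with its scheduler mocked, rather than this stand-in function.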

Related changes: fastdeploy/engine/common_engine_prepare_mixin.py:92-100, fastdeploy/envs.py:199; related test stub tests/engine/test_common_engine.py:336-364

🔴 Approval: Approval required (confidence: high)

This job requires manual approval; CI continues only after the approval is completed.
