
Commit 89e6416

ssjia, SS-JIA authored and committed
[ET-VK] Lower reduce_peak_memory threshold from 500 MB to 10 MB
During prepack, staging buffers accumulate in buffers_to_clear_ until flush() is called. Previously, the reduce_peak_memory path (which calls submit_and_wait + flush to free staging buffers incrementally) only triggered when total constant data exceeded 500 MB. As a result, models with moderate weight sizes (e.g. 42 MB) never benefited from incremental cleanup, and all of their staging buffers coexisted in memory until the final flush.

Lowering the threshold to 10 MB enables incremental staging buffer cleanup for most models. On SceneX V9 FP16 (42 MB weights, Samsung S24 with Adreno 750), this reduces the transient VMA peak during prepack from 89.6 MB to 57.3 MB (-36%), at a cost of ~15 ms additional load latency (+4.4%). Steady-state memory and inference performance are unaffected.

Authored with Claude.

Differential Revision: [D100332227](https://our.internmc.facebook.com/intern/diff/D100332227/)
ghstack-source-id: 365456277
Pull Request resolved: #18816
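The accumulation problem described in this message can be illustrated with a small simulation. This is a hypothetical model, not ExecuTorch code: StagingSim, peak_staging_nbytes, and check_example are invented for illustration, each prepack node is assumed to allocate exactly one staging buffer, and incremental cleanup is assumed to fire whenever pending staging bytes reach a fixed budget (the real trigger policy inside ComputeGraph::prepack() is not part of this commit).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of prepack staging memory; this is NOT the ExecuTorch API.
// Each prepack node is assumed to allocate one staging buffer that, like an
// entry in buffers_to_clear_, stays alive until the next flush.
struct StagingSim {
  std::size_t live = 0;  // bytes currently held by pending staging buffers
  std::size_t peak = 0;  // highest value `live` has reached so far

  void allocate(std::size_t nbytes) {
    live += nbytes;
    peak = std::max(peak, live);
  }

  // Stand-in for submit_and_wait() + flush(): frees every pending buffer.
  void flush() { live = 0; }
};

// Returns the peak number of live staging bytes over a whole prepack pass.
// flush_budget_nbytes == 0 models the old behaviour below the threshold:
// buffers are released only by the final flush. Otherwise a flush is issued
// as soon as pending bytes reach the (hypothetical) budget.
std::size_t peak_staging_nbytes(const std::vector<std::size_t>& node_nbytes,
                                std::size_t flush_budget_nbytes) {
  StagingSim sim;
  for (std::size_t nbytes : node_nbytes) {
    sim.allocate(nbytes);
    if (flush_budget_nbytes != 0 && sim.live >= flush_budget_nbytes) {
      sim.flush();  // incremental cleanup
    }
  }
  sim.flush();  // final flush at the end of prepack
  return sim.peak;
}

// Usage: 42 staging buffers of 1 MiB each, a made-up workload.
inline void check_example() {
  constexpr std::size_t kMiB = 1024 * 1024;
  const std::vector<std::size_t> nodes(42, kMiB);
  assert(peak_staging_nbytes(nodes, 0) == 42 * kMiB);        // final flush only
  assert(peak_staging_nbytes(nodes, 8 * kMiB) == 8 * kMiB);  // incremental flushes
}
```

With only a final flush, every buffer is live at once, so the peak equals the total staging size. With incremental flushing, the peak stays below the budget plus the largest single buffer, which is the effect the lower threshold extends to more models.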
1 parent 8d1ff1a commit 89e6416

1 file changed

Lines changed: 2 additions & 1 deletion


backends/vulkan/runtime/graph/ComputeGraph.cpp

```diff
@@ -1134,8 +1134,9 @@ void ComputeGraph::clear_deferred_cmds() {
 void ComputeGraph::prepack() {
   int i = 0;
   bool submitted = false;
-  const bool reduce_peak_memory = total_constant_nbytes_ > 500 * MB;
+  const bool reduce_peak_memory = total_constant_nbytes_ > 10 * MB;
   // int count = 0;
+
   context_->set_cmd();
   for (std::unique_ptr<PrepackNode>& node : prepack_nodes_) {
     // Do not trigger on the first or last prepack node.
```
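The functional change is a single strict comparison with a lower cut-off. A minimal sketch of that predicate makes its effect on the 42 MB model from the commit message explicit. It assumes MB means 1024 * 1024 bytes (kMB below is a hypothetical stand-in, since the real constant is not shown in the diff), and should_reduce_peak_memory is an invented name, not a function in ComputeGraph.cpp.

```cpp
#include <cstddef>

// Assumption: MB in ComputeGraph.cpp is 1024 * 1024 bytes. kMB is a
// hypothetical stand-in; the real constant is not shown in the diff.
constexpr std::size_t kMB = 1024 * 1024;

// Hypothetical helper mirroring the changed line: incremental staging-buffer
// cleanup is enabled only when constant data strictly exceeds the threshold.
constexpr bool should_reduce_peak_memory(std::size_t total_constant_nbytes,
                                         std::size_t threshold_nbytes) {
  return total_constant_nbytes > threshold_nbytes;
}

// The 42 MB model from the commit message was excluded under the old
// threshold and is included under the new one.
static_assert(!should_reduce_peak_memory(42 * kMB, 500 * kMB), "old: skipped");
static_assert(should_reduce_peak_memory(42 * kMB, 10 * kMB), "new: enabled");

// Strict comparison: exactly 10 MB of constant data does not trigger it.
static_assert(!should_reduce_peak_memory(10 * kMB, 10 * kMB), "boundary");
```

Because the comparison is strict, a model with exactly 10 MB of constant data still skips the incremental path; only models above that size pay the extra submit-and-wait latency.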
