Commit 89e6416
[ET-VK] Lower reduce_peak_memory threshold from 500 MB to 10 MB
During prepack, staging buffers accumulate in buffers_to_clear_ until
flush() is called. Previously, the reduce_peak_memory path (which calls
submit_and_wait + flush to free staging buffers incrementally) only
triggered when total constant data exceeded 500 MB. This meant models
with moderate weight sizes (e.g. 42 MB) never benefited from incremental
cleanup, causing all staging buffers to coexist in memory until the
final flush.
Lowering the threshold to 10 MB enables incremental staging buffer
cleanup for most models. On SceneX V9 FP16 (42 MB weights, Samsung S24
Adreno 750), this reduces transient VMA peak during prepack from 89.6 MB
to 57.3 MB (-36%) at a cost of ~15 ms additional load latency (+4.4%).
Steady-state memory and inference performance are unaffected.
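
The effect of the threshold can be illustrated with a toy model. This is a minimal sketch, not the ExecuTorch implementation: `PrepackSim`, `simulate_peak`, and the `flush_interval` parameter are hypothetical names invented for illustration, and the per-weight sizes used with it are made up rather than taken from the measured model. It assumes that incremental cleanup, once enabled, releases all pending staging buffers whenever the bytes staged since the last flush reach some interval.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kMB = 1024 * 1024;

// Hypothetical stand-in for prepack bookkeeping (not the real ComputeGraph API).
struct PrepackSim {
  std::size_t live_staging_bytes = 0;  // staging buffers not yet freed
  std::size_t peak_staging_bytes = 0;  // high-water mark of live_staging_bytes
  std::size_t bytes_since_flush = 0;   // bytes staged since the last flush

  // Stage one weight tensor: its staging buffer stays alive until flush().
  void stage(std::size_t nbytes) {
    live_staging_bytes += nbytes;
    bytes_since_flush += nbytes;
    peak_staging_bytes = std::max(peak_staging_bytes, live_staging_bytes);
  }

  // Models "submit_and_wait + flush": every pending staging buffer is freed.
  void flush() {
    live_staging_bytes = 0;
    bytes_since_flush = 0;
  }
};

// Returns the peak staging footprint for a list of weight sizes.
// total_threshold: incremental cleanup is enabled only when the total
//                  constant data exceeds this value (500 MB before, 10 MB after).
// flush_interval:  assumed (hypothetical) number of staged bytes that
//                  triggers an incremental flush once cleanup is enabled.
std::size_t simulate_peak(const std::vector<std::size_t>& weights,
                          std::size_t total_threshold,
                          std::size_t flush_interval) {
  assert(flush_interval > 0);

  std::size_t total = 0;
  for (std::size_t w : weights) {
    total += w;
  }
  const bool reduce_peak_memory = total > total_threshold;

  PrepackSim sim;
  for (std::size_t w : weights) {
    sim.stage(w);
    if (reduce_peak_memory && sim.bytes_since_flush >= flush_interval) {
      sim.flush();  // incremental cleanup
    }
  }
  sim.flush();  // the final flush always happens
  return sim.peak_staging_bytes;
}
```

For six made-up 7 MB weights (42 MB total) and an assumed 10 MB flush interval, `simulate_peak` returns 42 MB with the 500 MB threshold, because nothing is freed before the final flush, and 14 MB with the 10 MB threshold. These toy figures only show the direction of the effect; they are not the measured VMA peaks quoted above.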
Authored with Claude.
Differential Revision: [D100332227](https://our.internmc.facebook.com/intern/diff/D100332227/)
ghstack-source-id: 365456277
Pull Request resolved: #188161
Parent: 8d1ff1a
1 file changed
Lines changed: 2 additions & 1 deletion
Diff content was not captured. Per the line-number columns, the hunk spans original lines 1134-1141: original line 1137 was replaced, and one line was added at new line 1139.