Perf: parallelize count_pw_st with OpenMP collapse(2) by MiniYuanBot · Pull Request #7438 · deepmodeling/abacus-develop

MiniYuanBot · 2026-06-05T13:46:35Z

What's changed?

source/source_basis/module_pw/pw_distributeg.cpp (count_pw_st):
- Added OpenMP parallel for collapse(2) to the (ix, iy) double loop for plane-wave stick enumeration
- Added reduction(+: npwtot_local, nstot_local) for accumulation of total plane-wave and stick counts
- Added reduction(min/max: ...) for boundary coordinate tracking (lix, rix, liy, riy)
- This change accelerates the PW initialization stage, which becomes a bottleneck for large FFT grids
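
The reduction clauses listed above can be sketched as follows. This is a minimal illustration, not the code from pw_distributeg.cpp: the function `count_sticks`, the struct `StickCount`, and the spherical cutoff test are assumptions made for the example; only the clause structure (collapse(2), `+` reduction for the totals, min/max reductions for the bounds, serial fallback when _OPENMP is undefined) follows the description.

```cpp
#include <cassert>
#include <climits>

// Hypothetical result type; field names mirror the variables named in the PR text.
struct StickCount {
    long long npwtot;  // total plane waves inside the cutoff
    int nstot;         // number of (ix, iy) sticks holding at least one plane wave
    int lix, rix;      // smallest / largest ix that has a stick
    int liy, riy;      // smallest / largest iy that has a stick
};

// Hypothetical stand-in for count_pw_st: a grid point counts as a plane wave
// when ix^2 + iy^2 + iz^2 <= r2 (assumed cutoff test, not the ABACUS one).
StickCount count_sticks(int n, int r2) {
    assert(n >= 0 && r2 >= 0);
    long long npwtot = 0;
    int nstot = 0;
    int lix = INT_MAX, rix = INT_MIN;
    int liy = INT_MAX, riy = INT_MIN;

#ifdef _OPENMP
    // Without -fopenmp this pragma is skipped and the loop runs serially.
    #pragma omp parallel for collapse(2) \
        reduction(+: npwtot, nstot) \
        reduction(min: lix, liy) reduction(max: rix, riy)
#endif
    for (int ix = -n; ix <= n; ++ix) {
        for (int iy = -n; iy <= n; ++iy) {
            int npw = 0;  // plane waves on this stick, private to the iteration
            for (int iz = -n; iz <= n; ++iz) {
                if (ix * ix + iy * iy + iz * iz <= r2) ++npw;
            }
            if (npw > 0) {
                npwtot += npw;
                ++nstot;
                if (ix < lix) lix = ix;
                if (ix > rix) rix = ix;
                if (iy < liy) liy = iy;
                if (iy > riy) riy = iy;
            }
        }
    }
    return {npwtot, nstot, lix, rix, liy, riy};
}
```

Each thread works on private copies of the reduction variables, which OpenMP combines at the end of the loop, so no atomics or critical sections are needed inside the loop body.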

Performance impact (tested on Intel Core i7, GCC 13.3.0, -O3 -fopenmp, grid=256×256×256, repeats=10):
- The benchmark covers only the modified count_pw_st function, which is the hotspot in PW initialization for large grids.

Threads	Total (ms)	Avg (ms)	Speedup	Efficiency
1	12363.61	1236.36	1.00	100.0%
2	6111.27	611.13	2.02	101.2%
4	3234.23	323.42	3.82	95.6%
8	2105.93	210.59	5.87	73.4%
12	1851.56	185.16	6.68	55.6%

- Near-linear scaling up to 4 threads (efficiency >95%)
- 8-thread efficiency drops to ~73% due to memory bandwidth saturation
- The marginal gain from 8 to 12 threads is small, consistent with SMT overhead on consumer-grade platforms

Behavior changes: None. The serial code path is preserved when _OPENMP is undefined. All existing MODULE_PW_* unit tests (12/12) continue to pass.
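
For reference, the Speedup and Efficiency columns in the table above follow from the average timings: speedup is the 1-thread time divided by the N-thread time, and efficiency is speedup divided by N. A small sketch, where the names `Scaling` and `scaling` are made up for this example:

```cpp
#include <cassert>

// Hypothetical helper type for this example.
struct Scaling {
    double speedup;     // t1 / tn
    double efficiency;  // speedup / threads, as a fraction (1.0 == 100%)
};

// t1_ms: time with one thread; tn_ms: time with `threads` threads.
Scaling scaling(double t1_ms, double tn_ms, int threads) {
    assert(t1_ms > 0.0 && tn_ms > 0.0 && threads > 0);
    const double speedup = t1_ms / tn_ms;
    return {speedup, speedup / threads};
}
```

With the 4-thread row, `scaling(1236.36, 323.42, 4)` gives a speedup of about 3.82 and an efficiency of about 0.956, matching the table.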

MiniYuanBot · 2026-06-05T14:00:41Z

\label project_learning
This is Problem 1 of assignment01 on the plane wave module.
Thanks for the review :)

Qianruipku · 2026-06-06T09:04:31Z

+    int liy_local = 0, riy_local = 0;
+
+#ifdef _OPENMP
+    #pragma omp parallel for collapse(2) \


Did you compare the performance with collapse(1)? In this kind of loop nest, collapse(1) is often faster than collapse(2) when using the same level of parallelism.

Besides, could you compare the single-thread performance with and without OpenMP? I think collapse(2) might still be much slower even with one thread.

MiniYuanBot · 2026-06-06

Here is the benchmark on a $1024^3$ grid, averaged over 10 runs (GCC -O3, OMP_PROC_BIND=close):

Threads	collapse=1 (ms)	collapse=2 (ms)	No Pragma (ms)	Diff (2 vs 1)
1	1605.65	1613.87	1606.42	+0.5%
4	404.674	406.729	1606.07	+0.5%
8	256.21	255.527	1606.35	−0.3%
12	232.543	233.994	1607.89	+0.6%

collapse(2) shows no measurable advantage over collapse(1), so I've switched to collapse(1). Thanks for the suggestion!
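
A comparison like the one above could be set up roughly as follows. This is only a sketch under assumptions: the kernel (counting grid points inside a sphere) and the names `count_collapse1`, `count_collapse2`, and `average_ms` are hypothetical and are not the benchmark used in this PR.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical kernel: grid points with ix^2 + iy^2 + iz^2 <= r2.
// Only the outer ix loop is divided among threads.
long long count_collapse1(int n, int r2) {
    long long total = 0;
#ifdef _OPENMP
    #pragma omp parallel for collapse(1) reduction(+: total)
#endif
    for (int ix = -n; ix <= n; ++ix)
        for (int iy = -n; iy <= n; ++iy)
            for (int iz = -n; iz <= n; ++iz)
                if (ix * ix + iy * iy + iz * iz <= r2) ++total;
    return total;
}

// Same kernel; the (ix, iy) loops are fused into one iteration space.
long long count_collapse2(int n, int r2) {
    long long total = 0;
#ifdef _OPENMP
    #pragma omp parallel for collapse(2) reduction(+: total)
#endif
    for (int ix = -n; ix <= n; ++ix)
        for (int iy = -n; iy <= n; ++iy)
            for (int iz = -n; iz <= n; ++iz)
                if (ix * ix + iy * iy + iz * iz <= r2) ++total;
    return total;
}

// Average wall-clock time of `kernel` over `repeats` calls, in milliseconds.
template <typename F>
double average_ms(F&& kernel, int repeats) {
    assert(repeats > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < repeats; ++i) kernel();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count() / repeats;
}
```

To time a variant, wrap it in a lambda that stores the result in a `volatile` variable so the compiler cannot discard the call, pass it to `average_ms` with the desired repeat count, and set the thread count through OMP_NUM_THREADS. Both variants must return the same count, which is a cheap correctness check to run before comparing timings.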

add for collapse(2)

536816f

mohanchen added the project_learning label Jun 5, 2026

mohanchen requested a review from Qianruipku June 5, 2026 22:03

mohanchen assigned Qianruipku Jun 5, 2026

Qianruipku reviewed Jun 6, 2026

MiniYuanBot added 2 commits June 6, 2026 19:19

switch to collapse(1)

4ee64b1

Merge branch 'develop' into feat/openmp-collapse-for-loop

bd4c1cd

MiniYuanBot wants to merge 3 commits into deepmodeling:develop from mystic-qaq:feat/openmp-collapse-for-loop


