Refactor/merge openmp #7446
Open
lijianing-sudo wants to merge 13 commits into
…icle_thermo velocity scaling
…T, NHC, FIRE, LJ)

Cover 6 remaining hot-path per-atom loops that were not parallelized in the prior merge-openmp branch:
- md_func.cpp: rescale_vel(), applies the velocity rescaling factor
- msst.cpp: vel_sum(), norm2 reduction; propagate_vel(), exp-based velocity propagation (highest compute density among the uncovered loops)
- nhchain.cpp: vel_baro(), NPT per-atom velocity scaling
- fire.cpp: check_fire(), triple reduction + velocity mixing + zero
- esolver_lj.cpp: runner(), N² neighbor pair computation with schedule(dynamic) for load balancing and per-thread virial accumulation

All optimizations use schedule(static) with a nat >= 256 threshold (LJ uses dynamic,32 for neighbor-count imbalance). No data dependencies changed: all loops are per-atom independent. No conflict with the prior merge-openmp branch.
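As a rough illustration of the LJ pattern named in this commit message (dynamic scheduling plus per-thread accumulation that is merged once per thread), here is a minimal sketch. It is not the code in esolver_lj.cpp: the names Vec3, LJResult and lj_all_pairs are hypothetical, and it assumes reduced units (epsilon = sigma = 1), a brute-force all-pairs loop with a cutoff instead of a neighbor list, and one possible sign convention for the virial.

```cpp
#include <array>
#include <vector>

using Vec3 = std::array<double, 3>;

// Hypothetical result container, not an ABACUS type.
struct LJResult
{
    std::vector<Vec3> force;
    double energy = 0.0;
    std::array<double, 9> virial{};
};

// Assumption: reduced LJ units (epsilon = sigma = 1), no periodic images.
LJResult lj_all_pairs(const std::vector<Vec3>& pos, const double rcut)
{
    const int nat = static_cast<int>(pos.size());
    const double rcut2 = rcut * rcut;
    std::vector<Vec3> force(pos.size(), Vec3{});
    double energy = 0.0;
    std::array<double, 9> virial{};

#pragma omp parallel if(nat >= 256)
    {
        // Thread-private accumulators: no locking inside the pair loop.
        double e_local = 0.0;
        std::array<double, 9> w_local{};

#pragma omp for schedule(dynamic, 32)
        for (int i = 0; i < nat; ++i)
        {
            Vec3 fi{};
            for (int j = 0; j < nat; ++j)
            {
                if (j == i) { continue; }
                const Vec3 d = {pos[i][0] - pos[j][0],
                                pos[i][1] - pos[j][1],
                                pos[i][2] - pos[j][2]};
                const double r2 = d[0] * d[0] + d[1] * d[1] + d[2] * d[2];
                if (r2 > rcut2) { continue; }
                const double inv_r2 = 1.0 / r2;
                const double inv_r6 = inv_r2 * inv_r2 * inv_r2;
                // Each pair is visited twice (i,j and j,i), hence the 0.5.
                e_local += 0.5 * 4.0 * (inv_r6 * inv_r6 - inv_r6);
                const double coef = 24.0 * (2.0 * inv_r6 * inv_r6 - inv_r6) * inv_r2;
                for (int a = 0; a < 3; ++a)
                {
                    fi[a] += coef * d[a];
                    for (int b = 0; b < 3; ++b)
                    {
                        w_local[3 * a + b] += 0.5 * coef * d[a] * d[b];
                    }
                }
            }
            force[i] = fi; // only iteration i writes force[i], so no race
        }

        // Merge once per thread, after the worksharing loop.
#pragma omp atomic
        energy += e_local;

#pragma omp critical
        {
            for (int k = 0; k < 9; ++k) { virial[k] += w_local[k]; }
        }
    }

    LJResult out;
    out.force = force;
    out.energy = energy;
    out.virial = virial;
    return out;
}
```

Because each iteration writes only its own force entry, the only shared updates are the two merges at the end of the region, so the synchronization cost scales with the thread count rather than with the number of pairs. Built without OpenMP, the pragmas are ignored and the function runs serially with the same result.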
The 'if' clause is only valid on '#pragma omp parallel', not on '#pragma omp for' when used inside an explicit parallel region. This caused a compile error: 'if' is not valid for '#pragma omp for'.
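The distinction in this commit message can be sketched with two equivalent variants of a velocity-norm reduction. This is an illustrative example only: the function names vel_norm2_combined and vel_norm2_region are hypothetical and the flat std::vector<double> layout is an assumption, not the ABACUS data structure.

```cpp
#include <vector>

// Variant A: combined construct. The if clause is accepted here because it
// applies to the parallel part of `parallel for`.
double vel_norm2_combined(const std::vector<double>& v)
{
    const int n = static_cast<int>(v.size());
    double sum = 0.0;
#pragma omp parallel for schedule(static) reduction(+:sum) if(n >= 256)
    for (int i = 0; i < n; ++i)
    {
        sum += v[i] * v[i];
    }
    return sum;
}

// Variant B: explicit parallel region. The if clause goes on `parallel`;
// the inner worksharing `for` carries only schedule and reduction.
double vel_norm2_region(const std::vector<double>& v)
{
    const int n = static_cast<int>(v.size());
    double sum = 0.0;
#pragma omp parallel if(n >= 256)
    {
#pragma omp for schedule(static) reduction(+:sum)
        for (int i = 0; i < n; ++i)
        {
            sum += v[i] * v[i];
        }
    }
    return sum;
}
```

Writing `#pragma omp for if(n >= 256)` inside the region in variant B is what triggers the reported compile error; moving the condition to the enclosing `parallel` directive keeps the small-system shortcut.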
PR: OpenMP Parallel Optimization for ABACUS MD Module and ML Potential Interfaces (NEP/DPMD/LJ)
Reminder
Linked Issue
Fix #...
Unit Tests and/or Case Tests for my changes
Existing Unit Tests Pass:
- MODULE_MD_LJ_pot (6 tests)
- MODULE_MD_func (7 tests)
- MODULE_MD_fire
- MODULE_MD_verlet
- MODULE_MD_nhc
- MODULE_MD_msst
- MODULE_MD_lgv

Test Infrastructure:
- … source/source_md/test/md_test_fixture.h) to eliminate duplicated SetUp/TearDown across 6 test files.

Microbenchmark Verification:
- … Test/openmp_nep_basic_benchmark.cpp and companion scripts).
- … max_abs_diff = 0).
- … 1e-10 to 1e-8 level due to summation order changes; expected and acceptable for MD trajectories.

What's changed?
This PR integrates OpenMP parallelization from three feature branches (refactor/md-factory, refactor/parallel-optimize, refactor/md-openmp-remainder) into the ABACUS MD module and ML potential interfaces. 22 parallel loops or worksharing regions are added across 12 source files (+3934/−342 lines total).

1. MD Base Loops (source/source_md/)

| Loop | File | OpenMP construct |
| --- | --- | --- |
| MD_base::update_pos() | md_base.cpp | #pragma omp parallel for schedule(static) |
| MD_base::update_vel() | md_base.cpp | #pragma omp parallel for schedule(static) |
| kinetic_energy() | md_func.cpp | reduction(+:ke) |
| force_virial() force copy | md_func.cpp | |
| temp_vector() | md_func.cpp | |
| rescale_vel() | md_func.cpp | schedule(static) |

All loops use if (natom >= 256) to skip parallel overhead for small systems.

2. NEP Interface (source/source_esolver/esolver_nep.cpp/.h)

- atom_type_index / atom_local_index index caches for flat iat-based parallel loops.
- nep.compute() external library call remains serial.

3. DPMD Interface (source/source_esolver/esolver_dp.cpp/.h)

- iat → (it, ia) index caches.
- … dp_cell, dp_coord, dp_model_force, dp_model_virial) to avoid repeated allocations.
- dp.compute() external library call and 3×3 virial copy-back remain serial.

4. Thermostat and Barostat (source/source_md/)

| Integrator | Function | Parallelized operation | File |
| --- | --- | --- | --- |
| Verlet | thermalize() | velocity rescaling | verlet.cpp |
| MSST | rescale() | shock-direction velocity scaling | msst.cpp |
| MSST | vel_sum() | velocity norm reduction | msst.cpp |
| MSST | propagate_vel() | per-atom velocity propagation | msst.cpp |
| NoseHoover | particle_thermo() | final velocity scaling | nhchain.cpp |
| NoseHoover | vel_baro() | barostat velocity update | nhchain.cpp |

Thermostat chain recurrence integration and cell dilation remain serial.

5. FIRE Algorithm (source/source_md/fire.cpp)

FIRE::check_fire() parallelized in three phases:
- reduction of P, sumforce, normvel
- velocity mixing
- velocity zeroing (P <= 0 branch)

Scalar state updates (alpha, negative_count, dt) remain serial.

6. LJ Interface (source/source_esolver/esolver_lj.cpp/.h)

- iat-based loop.
- schedule(dynamic, 32) to handle neighbor-count imbalance.
- atomic (energy) and critical (virial) reduction at thread exit; no per-neighbor locks.

7. Code Quality Refactors
- calc_kinetic_state() / calc_stress_state() (md_func.h, md_statistics.h).
- new/delete → std::unique_ptr (run_md.cpp).

Performance Summary (Microbenchmark, 8 threads, 2M atoms, Xeon Platinum 8163)
Kernels: update_pos, update_vel, kinetic_energy, temp_vector, coord_fill, energy_sum, force_fill, virial_sum, coord_fill, force_copy, thermalize, rescale, propagate_vel, particle_thermo, check_fire (mix), runner core loop

* NEP virial 14.24× includes loop reorganization benefits beyond pure 8-thread scaling.
Known Limitations & Future Work
- … __NEP, deepmd).
- nat >= 256 is an empirical uniform threshold; per-kernel tuning (64/128/256/512) is recommended.
- schedule(dynamic, 32) vs static and optimal chunk size have not been systematically benchmarked across different neighbor distributions.

Any changes of core modules? (ignore if not applicable)
The MD ESolver interface layer (esolver_nep.cpp, esolver_dp.cpp, esolver_lj.cpp) is modified to add index caches and parallel worksharing constructs. No changes to the ESolver base class virtual function signatures. All external library calls (nep.compute(), dp.compute()) remain serial and their calling convention is unchanged.
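The index caches mentioned for the NEP and DPMD interfaces can be pictured roughly as follows. This is a sketch under stated assumptions, not the code in the PR: AtomType, IndexCache, build_index_cache and pack_coords are hypothetical names, the per-type coordinate layout is assumed, and only the field names atom_type_index and atom_local_index are taken from the description. The idea is to build the flat iat → (it, ia) map once, serially, and then fill the buffer handed to the external library in a flat parallel loop.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for per-type atom storage (coordinates grouped by type).
struct AtomType
{
    std::vector<std::array<double, 3>> tau;
};

// Flat lookup: for global atom index iat, which type and which local index.
struct IndexCache
{
    std::vector<int> atom_type_index;
    std::vector<int> atom_local_index;
};

// Built serially once; the nested (it, ia) traversal is order-dependent.
IndexCache build_index_cache(const std::vector<AtomType>& types)
{
    IndexCache cache;
    for (std::size_t it = 0; it < types.size(); ++it)
    {
        for (std::size_t ia = 0; ia < types[it].tau.size(); ++ia)
        {
            cache.atom_type_index.push_back(static_cast<int>(it));
            cache.atom_local_index.push_back(static_cast<int>(ia));
        }
    }
    return cache;
}

// Flat loop over iat: each iteration writes its own three slots, so the
// fill is race-free and suits schedule(static).
std::vector<double> pack_coords(const std::vector<AtomType>& types,
                                const IndexCache& cache)
{
    const int nat = static_cast<int>(cache.atom_type_index.size());
    std::vector<double> coord(3 * cache.atom_type_index.size());
#pragma omp parallel for schedule(static) if(nat >= 256)
    for (int iat = 0; iat < nat; ++iat)
    {
        const std::array<double, 3>& p =
            types[cache.atom_type_index[iat]].tau[cache.atom_local_index[iat]];
        coord[3 * iat + 0] = p[0];
        coord[3 * iat + 1] = p[1];
        coord[3 * iat + 2] = p[2];
    }
    // The external call (nep.compute() or dp.compute() in the PR) would
    // follow here, outside any parallel region.
    return coord;
}
```

Keeping the external call outside the parallel region matches the statement above that the library calling convention is unchanged; only the packing and unpacking around it are parallelized.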