Skip to content

Share the global-parameters core across executables; generate MPI broadcasts; build hygiene#1552

Open
sbryngelson wants to merge 27 commits into
MFlowCode:masterfrom
sbryngelson:executable-dedup
Open

Share the global-parameters core across executables; generate MPI broadcasts; build hygiene#1552
sbryngelson wants to merge 27 commits into
MFlowCode:masterfrom
sbryngelson:executable-dedup

Conversation

@sbryngelson

Copy link
Copy Markdown
Member

Stacked on #1551 — this branch includes those commits; the diff below them is this PR's own content (14 commits). Review after (or alongside) #1551; it will rebase cleanly once #1551 merges.

Summary

Third installment of the parameter/declaration single-source-of-truth work (#1550, #1551). Two thrusts:

  1. Executable dedup. The three per-executable m_global_parameters.fpp shared 50–90% of their content by copy-paste. A new src/common/m_global_parameters_common.fpp now owns the generated declaration includes, the shared state (eqn_idx, sys_size, b_size, tensor_size, shear bookkeeping), the equation-index setup (s_initialize_eqn_idx), parallel-IO init, the finalize core, and the common default assignments. The per-target modules keep only what genuinely differs (QBMM moment encodings, GPU update blocks, post's output state — each leave-behind verified divergent-by-need). The three m_mpi_proxy.fpp broadcast lists are now generated from the parameter registry (generated_bcast.fpp per target): a registered namelist parameter is broadcast by construction, with only computed variables (m_glb, cfl_dt, bc_io, …) remaining hand-listed. Net src/ change: −477 LOC.

  2. Build hygiene. CMakeLists.txt (1,135 lines) splits into a ~460-line root plus cmake/{GPU,Fypp,ParamsCodegen,MFCTargets}.cmake — pure motion, verified by byte-identical configure graphs. The generated includes (now 15: namelist, decls, constants, case-opt, bcast × 3 targets) regenerate via a single ninja-tracked custom command; the configure-time bootstrap and its mtime stamp are deleted (editing params/*.py + rebuild now always yields a correct binary, no reconfigure). Build variant directories get readable names: build/install/gpu-acc-chem-f93aa400b9 instead of de835ec46c (hash inputs unchanged — variant identity preserved).

Bug fix (deliberate behavior change #1)

post_process never broadcast chem_wrt_Y despite consuming it on all ranks in s_save_data — silent wrong/missing species output for multi-rank chemistry post-processing. The registry-driven generator fixes this by construction. (Broadcast tuple-set deltas vs. before, verified per target: simulation identical; pre_process adds only the dead-but-registered n_start_old; post_process adds only chem_wrt_Y.)

Deliberate behavior change #2

post_process's beta_idx (Lagrangian bubble void fraction) is now appended after the MHD/elasticity/surface-tension blocks, matching simulation's writer layout — the previous post-only ordering misread restart data for bubbles_lagrange combined with those features. Covered by the full golden suite.

Verification

  • Full golden suite (sharded) green on the dedup stack and again on the final rebased tip (post-Add Herschel-Bulkley non-Newtonian viscosity model #1545), including the chemistry tests that exercise per-case rebuilds.
  • Emitted-.f90 declaration-set and GPU-directive-set equivalence per target for every hoist step; broadcast tuple-set equivalence per target; configure-graph byte-equivalence for the CMake split; cold-start + incremental-regeneration proofs for the generation rework.

Notes for reviewers

  • src/common continues to compile once per executable with MFC_<TARGET> defines — deliberate and now documented in contributing.md ("How the Build Fits Together"); it is what enables per-target generated includes and the shared module pattern.
  • A follow-up PR (broadcast correctness) fixes three pre-existing multi-rank broadcast bugs found while auditing the generated lists; it is stacked on this branch.
  • Old hash-named build directories orphan until ./mfc.sh clean.

Sweeps bubble_model, avg_state, wave_speeds, recon_type, muscl_order, muscl_lim, int_comp, format, and precision (99 sites). Includes renaming legacy WENO_TYPE/MUSCL_TYPE comparisons to recon_type_weno/recon_type_muscl (same values) and two select-case labels in m_qbmm; WENO_TYPE/MUSCL_TYPE retirement is a follow-up.
Fypp resolves #:include at parse time, so generated_case_opt_decls.fpp is now emitted for every target (header-only stub outside simulation) and included unconditionally. Restores the dropped shear GPU_DECLARE (consumed in device kernels), restores the original nmom guard conditions in pre/post, and drops unused imports. The post-process beta_idx ordering change from the hoist is retained deliberately: it aligns post with simulation's writer layout (the parent ordering was inconsistent for bubbles_lagrange combined with mhd/elasticity/etc.).
Simulation declares nb in its own case-optimization block (excluded from generated decls), so the hoisted routine cannot reference it directly; pass it as an argument like nmom.
Drive s_mpi_bcast_user_inputs in all three m_mpi_proxy.fpp files from
generate_bcast_fpp(target) in fortran_gen.py. The generator emits case_dir,
class-(a) scalars (INT/LOG/REAL), FORTRAN_ARRAY_DIMS arrays, and the
fluid_pp / bub_pp / lag_params / chem_params struct-array loops. Manual
residues (bc_x/y/z members, domain bounds, m/n/p_glb, patch loops, etc.)
stay in each file. get_generated_files() grows from 12 to 15 entries.

Latent bug fixed: chem_wrt_Y (post, FORTRAN_ARRAY_DIMS dim=num_species)
was namelist-bound but never broadcast; consumed by s_save_data on all
ranks. Registry-driven generation closes it by construction.

Tuple-set equivalence: pre +n_start_old (dead param, harmless), post
+chem_wrt_Y only, sim exact identity.
bc_x/y/z are per-target declarations (the documented multi-variable-line exclusion), so their default assignments cannot live in m_global_parameters_common; the hoist had moved them and broke compilation. Restores the three BC default blocks to each executable after the s_assign_common_defaults call.
Verbatim motion only — no renames, reflows, or logic changes.

Section map (original line ranges → files):
  lines 129-331  compiler flags / GPU logic → cmake/GPU.cmake
  lines 107-461  FYPP_EXE discovery + HANDLE_SOURCES → cmake/Fypp.cmake
  lines 464-541  params codegen stamp + gen-file lists + custom command/target → cmake/ParamsCodegen.cmake
  lines 549-821  MFC_SETUP_TARGET function → cmake/MFCTargets.cmake

include() order in root CMakeLists.txt:
  1. cmake/GPU.cmake         (sets FYPP_GCOV_OPTS, NVHPC_USE_TWO_PASS_IPO, MFC_CUDA_CC)
  2. cmake/Fypp.cmake        (finds FYPP_EXE, defines HANDLE_SOURCES — needs FYPP_GCOV_OPTS)
  3. cmake/ParamsCodegen.cmake (sets _mfc_gen_files_* lists — must precede HANDLE_SOURCES calls)
  4. HANDLE_SOURCES calls    (in root — consume _mfc_gen_files_* and FYPP_EXE)
  5. cmake/MFCTargets.cmake  (defines MFC_SETUP_TARGET — needs NVHPC_USE_TWO_PASS_IPO)
  6. MFC_SETUP_TARGET calls  (in root — instantiate targets)
  7. docs section            (in root — unchanged)

Equivalence gate: cmake configure of -DMFC_PRE_PROCESS=ON -DMFC_MPI=OFF
-DCMAKE_BUILD_TYPE=Release BEFORE and AFTER, path-normalized diff of
CMakeCache.txt, Makefile, CMakeFiles/pre_process.dir/flags.make, and
CMakeFiles/pre_process.dir/build.make — all empty (identical).

A clean 3-target build rides the next scheduled gate.
…ludes

Delete the configure-time execute_process + stamp block from
cmake/ParamsCodegen.cmake (29 lines removed). The build-time
add_custom_command is now the sole mechanism that writes the 15
generated_*.fpp includes.

Cold-start proof (scratch build dir, no pre-existing includes):
- cmake configure succeeds with include/ absent
- make -n shows cmake_gen.py scheduled before all fypp steps
- make mfc_params_gen produces all 15 files across 3 target dirs
Incremental proof: touch toolchain/mfc/params/definitions.py ->
make -n schedules regeneration without reconfigure.

cmake_gen.py already calls path.parent.mkdir(parents=True,exist_ok=True)
so no file(MAKE_DIRECTORY) guards were added. find_package(Python3) is
kept in ParamsCodegen.cmake (the CMakeLists.txt one is in a docs block
that runs later). Full build+test rides the next gate.
Copilot AI review requested due to automatic review settings June 11, 2026 01:04
@github-actions

Copy link
Copy Markdown

PR #1552 Review

HEAD SHA: 525e0bd
Scope: CMakeLists refactor, params code-generation, shared m_global_parameters_common, WENO/MUSCL constant naming.


Summary

A large structural refactor with clear benefits: CMakeLists.txt is split into four focused cmake/ includes, three per-target m_global_parameters.fpp files lose duplicated boilerplate, and the hand-written MPI broadcast loops across all three executables are replaced with generated code. The architecture and intent are sound. One MPI correctness regression is present in the generated broadcast code that must be fixed before merge.


Critical — MPI Correctness Regression

Dropped fluid_pp member broadcasts for non-Newtonian fluids

File: toolchain/mfc/params/generators/fortran_gen.py, function _emit_fluid_pp

The generated fluid_pp broadcast loop in all three targets now broadcasts only six members:

fp_real_members = ["gamma", "pi_inf", "G", "cv", "qv", "qvp"]

The previous hand-written loops in all three m_mpi_proxy.fpp files explicitly broadcast eight additional members per fluid:

Member Type Purpose
K real Herschel-Bulkley consistency index
nn real power-law exponent
tau0 real yield stress
hb_m real Herschel-Bulkley smoothing exponent
mu_min real lower viscosity bound
mu_max real upper viscosity bound
mu_bulk real bulk viscosity
non_newtonian logical non-Newtonian feature flag

These members are read directly from the namelist on rank 0 only; they must be broadcast to reach all other MPI ranks. Since fluid_pp is in TYPED_DECLS, _classify_scalar_vars skips it entirely — the only path for these members is through _emit_fluid_pp. None of the three targets emit a broadcast for them.

Effect: In any multi-rank run with non_newtonian = .true., non-zero ranks have non_newtonian = .false. and all Herschel-Bulkley parameters at their defaults. The simulation produces wrong answers with no error or warning.

Fix: Add the missing members to fp_real_members (and emit non_newtonian separately as MPI_LOGICAL):

fp_real_members = ["gamma", "pi_inf", "G", "cv", "qv", "qvp",
                   "K", "nn", "tau0", "hb_m", "mu_min", "mu_max", "mu_bulk"]

Then emit the logical member after the real loop:

lines.append(f"    call MPI_BCAST(fluid_pp(i)%non_newtonian, 1, MPI_LOGICAL, 0, MPI_COMM_WORLD, ierr)")

The companion unit test in test_fortran_gen.py should be extended to assert these members appear for all three targets.


Minor Observations

Comment formatting in m_constants.fpp

Line 53 reads:

! Reconstruction Types Interface Compression

This appears to be two section headings merged without a separator, dropping the blank line that formerly separated the "Reconstruction Types" constants from the "Interface Compression" constants. Not a correctness issue, but a blank comment line between them restores the intent.


What Looks Good

  • CMakeLists.txt split into cmake/GPU.cmake, cmake/Fypp.cmake, cmake/MFCTargets.cmake, cmake/ParamsCodegen.cmake is clean. Build-time codegen via ninja custom commands replaces configure-time execute_process, which is the correct pattern.
  • m_global_parameters_common.fpp correctly guards all case-optimization declarations with #:if not MFC_CASE_OPTIMIZATION, and the stub generated for pre/post confirms no double-declaration risk.
  • Named constants (recon_type_weno, recon_type_muscl, etc.) replacing magic literals WENO_TYPE = 1/MUSCL_TYPE = 2 are an unambiguous improvement, properly sourced from generated_constants.fpp.
  • s_assign_common_defaults and the partial-initialization pattern (common then per-target) correctly replicate the original default assignment order.
  • build.py slug now includes a human-readable prefix; no correctness impact.
  • The manual MPI residue kept in the three m_mpi_proxy.fpp files (grid dimensions, BC structs, cfl_dt, bc_io, shear/body forces, lag/bub/chem guards) matches what is genuinely not auto-generatable.

Recommendation

Request changes. The dropped fluid_pp non-Newtonian broadcasts are a silent correctness regression for multi-rank non-Newtonian runs. Fix _emit_fluid_pp, extend the unit test to cover the missing members for all three targets, then this is ready to merge.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

The hardcoded fluid_pp emitter predated the MFlowCode#1545 merge and silently dropped K/nn/tau0/hb_m/mu_min/mu_max/mu_bulk and non_newtonian from the generated lists - a multi-rank regression for non-Newtonian runs relative to the manual lists it replaced. Emitted sets now verified member-identical to master's manual lists per target. Caught by automated PR review.
@sbryngelson

Copy link
Copy Markdown
Member Author

Confirmed and fixed in e4bb56a: the hardcoded fluid_pp emitter predated the #1545 merge and dropped the Herschel-Bulkley members (K, nn, tau0, hb_m, mu_min, mu_max, mu_bulk, non_newtonian) from the generated broadcast lists. The emitted sets are now verified member-identical to master's manual lists for all three targets, and the test suite pins the HB members so this class of escape fails loudly. The stacked follow-up PR (#1553) additionally converts this emitter to a registry walk, making new fluid_pp members broadcast by construction. Good catch.

The named-constant retirement deleted the WENO/MUSCL block between two section headings and merged them into one nonsensical comment. Flagged by automated PR review on MFlowCode#1552.
@sbryngelson

Copy link
Copy Markdown
Member Author

Both review items now addressed on this branch: the fluid_pp broadcast regression (e4bb56a, member-identical to master's manual lists, test-pinned) and the merged m_constants section heading (dac3d1f). Thanks for the thorough review.

…h dedup docs

NIB read num_ib_patches_max (2050000) but the patch_ib namelist array is dimensioned num_ib_patches_max_namelist (54000), so validation accepted indices that overflow the array. Also refreshes three doc passages made stale by the dedup. Both flagged by review.
@sbryngelson

Copy link
Copy Markdown
Member Author

Code review

Found 1 issue (beyond the two from the earlier automated review, both already addressed):

  1. NIB reads the wrong Fortran constant: num_ib_patches_max (2,050,000) instead of num_ib_patches_max_namelist (54,000), while the patch_ib namelist array is dimensioned by the latter — so case validation accepted patch_ib indices that overflow the Fortran array. The adjacent comment even claims it enforces the namelist limit. Fixed in f42e532 along with three doc passages made stale by the dedup.

NUM_PATCHES_MAX = _fc("num_patches_max", 10) # patch_icpp (Fortran array bound)
NIB = _fc("num_ib_patches_max", 54000) # patch_ib (Fortran array bound)
NAF = _fc("num_ib_airfoils_max", 5) # ib_airfoil (Fortran array bound)

🤖 Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.

Cray ftn rejects declare-target on use-associated names (ftn-1448 on Frontier gpu-omp): the dedup hoisted these declarations into the common module but left their GPU_DECLARE lines in simulation. Declares move to the declaring module; mixed lines split so locally-declared variables keep their declares in place. CPU-preprocessed output verified identical; declare-target scoping verified clean for both files under OpenMP emission.
@codecov

codecov Bot commented Jun 11, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 55.63218% with 193 lines in your changes missing coverage. Please review.
✅ Project coverage is 60.99%. Comparing base (ac30c32) to head (eaf67d2).

Files with missing lines Patch % Lines
src/common/m_global_parameters_common.fpp 71.00% 31 Missing and 18 partials ⚠️
src/post_process/m_global_parameters.fpp 44.82% 23 Missing and 9 partials ⚠️
src/pre_process/m_global_parameters.fpp 53.70% 22 Missing and 3 partials ⚠️
src/post_process/m_data_output.fpp 5.00% 7 Missing and 12 partials ⚠️
src/simulation/m_global_parameters.fpp 63.26% 12 Missing and 6 partials ⚠️
src/simulation/m_riemann_solvers.fpp 25.00% 4 Missing and 5 partials ⚠️
src/common/m_mpi_common.fpp 0.00% 0 Missing and 7 partials ⚠️
src/simulation/m_qbmm.fpp 50.00% 0 Missing and 5 partials ⚠️
src/simulation/m_bubbles_EL.fpp 0.00% 0 Missing and 4 partials ⚠️
src/simulation/m_muscl.fpp 0.00% 0 Missing and 4 partials ⚠️
... and 9 more
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1552      +/-   ##
==========================================
- Coverage   61.17%   60.99%   -0.19%     
==========================================
  Files          74       75       +1     
  Lines       20313    19965     -348     
  Branches     2961     2930      -31     
==========================================
- Hits        12427    12177     -250     
+ Misses       5870     5806      -64     
+ Partials     2016     1982      -34     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

2 participants