Fix three pre-existing multi-rank broadcast defects; deregister dead parameters #1553
Open
sbryngelson wants to merge 31 commits into
Sweeps bubble_model, avg_state, wave_speeds, recon_type, muscl_order, muscl_lim, int_comp, format, and precision (99 sites). Includes renaming legacy WENO_TYPE/MUSCL_TYPE comparisons to recon_type_weno/recon_type_muscl (same values) and two select-case labels in m_qbmm; WENO_TYPE/MUSCL_TYPE retirement is a follow-up.
Fypp resolves #:include at parse time, so generated_case_opt_decls.fpp is now emitted for every target (header-only stub outside simulation) and included unconditionally. Restores the dropped shear GPU_DECLARE (consumed in device kernels), restores the original nmom guard conditions in pre/post, and drops unused imports. The post-process beta_idx ordering change from the hoist is retained deliberately: it aligns post with simulation's writer layout (the parent ordering was inconsistent for bubbles_lagrange combined with mhd/elasticity/etc.).
Simulation declares nb in its own case-optimization block (excluded from generated decls), so the hoisted routine cannot reference it directly; pass it as an argument like nmom.
Drive s_mpi_bcast_user_inputs in all three m_mpi_proxy.fpp files from generate_bcast_fpp(target) in fortran_gen.py. The generator emits case_dir, class-(a) scalars (INT/LOG/REAL), FORTRAN_ARRAY_DIMS arrays, and the fluid_pp / bub_pp / lag_params / chem_params struct-array loops. Manual residues (bc_x/y/z members, domain bounds, m/n/p_glb, patch loops, etc.) stay in each file. get_generated_files() grows from 12 to 15 entries. Latent bug fixed: chem_wrt_Y (post, FORTRAN_ARRAY_DIMS dim=num_species) was namelist-bound but never broadcast; consumed by s_save_data on all ranks. Registry-driven generation closes it by construction. Tuple-set equivalence: pre +n_start_old (dead param, harmless), post +chem_wrt_Y only, sim exact identity.
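The tuple-set equivalence check described above can be sketched roughly as follows. This is a minimal illustration, not the real generate_bcast_fpp: the Param record, the toy registry, the kind-to-MPI-type mapping, and the shape of the emitted MPI_BCAST line are all assumptions made for the example.

```python
from dataclasses import dataclass


# Hypothetical registry record; the real toolchain registry may look different.
@dataclass(frozen=True)
class Param:
    name: str
    kind: str                                # "INT", "LOG", or "REAL"
    count: str = "1"                         # element count as Fortran source text
    targets: tuple = ("pre", "sim", "post")


# Assumed mapping from registry kind to the MPI datatype token in the Fortran source.
MPI_TYPE = {"INT": "MPI_INTEGER", "LOG": "MPI_LOGICAL", "REAL": "mpi_p"}


def bcast_tuples(registry, target):
    """Set of (var, mpi_type, count) tuples that would be broadcast for one target."""
    return {(p.name, MPI_TYPE[p.kind], p.count) for p in registry if target in p.targets}


def emit_bcast(registry, target):
    """One MPI_BCAST call per tuple, sorted so the generated text is stable."""
    return [
        f"call MPI_BCAST({var}, {count}, {mpi_type}, 0, MPI_COMM_WORLD, ierr)"
        for var, mpi_type, count in sorted(bcast_tuples(registry, target))
    ]


def tuple_set_delta(manual, generated):
    """Entries the generator adds to, and drops from, a hand-written list."""
    return sorted(generated - manual), sorted(manual - generated)


# Toy data: the manual post list lacks chem_wrt_Y, mirroring the latent bug above.
# Treating chem_wrt_Y as a LOG parameter is an assumption for this example.
registry = [
    Param("precision", "INT"),
    Param("chem_wrt_Y", "LOG", count="num_species", targets=("post",)),
]
manual_post = {("precision", "MPI_INTEGER", "1")}

added, dropped = tuple_set_delta(manual_post, bcast_tuples(registry, "post"))
post_lines = emit_bcast(registry, "post")
```

Comparing sets of tuples rather than generated text keeps the check independent of ordering and whitespace, so a non-empty delta points at an actual change in what is broadcast.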
bc_x/y/z are per-target declarations (the documented multi-variable-line exclusion), so their default assignments cannot live in m_global_parameters_common; the hoist had moved them there, which broke compilation. Restores the three BC default blocks to each executable after the s_assign_common_defaults call.
Verbatim motion only: no renames, reflows, or logic changes.

Section map (original line ranges → files):
- lines 129-331 compiler flags / GPU logic → cmake/GPU.cmake
- lines 107-461 FYPP_EXE discovery + HANDLE_SOURCES → cmake/Fypp.cmake
- lines 464-541 params codegen stamp + gen-file lists + custom command/target → cmake/ParamsCodegen.cmake
- lines 549-821 MFC_SETUP_TARGET function → cmake/MFCTargets.cmake

include() order in root CMakeLists.txt:
1. cmake/GPU.cmake (sets FYPP_GCOV_OPTS, NVHPC_USE_TWO_PASS_IPO, MFC_CUDA_CC)
2. cmake/Fypp.cmake (finds FYPP_EXE, defines HANDLE_SOURCES; needs FYPP_GCOV_OPTS)
3. cmake/ParamsCodegen.cmake (sets _mfc_gen_files_* lists; must precede HANDLE_SOURCES calls)
4. HANDLE_SOURCES calls (in root; consume _mfc_gen_files_* and FYPP_EXE)
5. cmake/MFCTargets.cmake (defines MFC_SETUP_TARGET; needs NVHPC_USE_TWO_PASS_IPO)
6. MFC_SETUP_TARGET calls (in root; instantiate targets)
7. docs section (in root; unchanged)

Equivalence gate: cmake configure of -DMFC_PRE_PROCESS=ON -DMFC_MPI=OFF -DCMAKE_BUILD_TYPE=Release BEFORE and AFTER, path-normalized diff of CMakeCache.txt, Makefile, CMakeFiles/pre_process.dir/flags.make, and CMakeFiles/pre_process.dir/build.make: all empty (identical). A clean 3-target build rides the next scheduled gate.
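A path-normalized diff of the kind used for the equivalence gate could look like the sketch below. The helper names, the placeholder token, and the sample flags.make contents are hypothetical; the commit does not say which tool produced the actual diff.

```python
import difflib


def normalize(text, build_dir, placeholder="<BUILD_DIR>"):
    """Replace the absolute build directory so two build trees can be compared."""
    return text.replace(build_dir, placeholder)


def normalized_diff(before, after, before_dir, after_dir):
    """Unified diff of two generated files after path normalization."""
    a = normalize(before, before_dir).splitlines()
    b = normalize(after, after_dir).splitlines()
    return list(difflib.unified_diff(a, b, "before", "after", lineterm=""))


# Hypothetical flags.make contents from two scratch build directories.
before = "Fortran_FLAGS = -O3 -J/tmp/build_a/modules\n"
after = "Fortran_FLAGS = -O3 -J/tmp/build_b/modules\n"

# Only the build path differs, so the normalized diff is empty.
same = normalized_diff(before, after, "/tmp/build_a", "/tmp/build_b")

# A real flag change survives normalization and shows up in the diff.
changed = normalized_diff(before, after.replace("-O3", "-O2"), "/tmp/build_a", "/tmp/build_b")
```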
…ludes

Delete the configure-time execute_process + stamp block from cmake/ParamsCodegen.cmake (29 lines removed). The build-time add_custom_command is now the sole mechanism that writes the 15 generated_*.fpp includes.

Cold-start proof (scratch build dir, no pre-existing includes):
- cmake configure succeeds with include/ absent
- make -n shows cmake_gen.py scheduled before all fypp steps
- make mfc_params_gen produces all 15 files across 3 target dirs

Incremental proof: touch toolchain/mfc/params/definitions.py -> make -n schedules regeneration without reconfigure.

cmake_gen.py already calls path.parent.mkdir(parents=True,exist_ok=True) so no file(MAKE_DIRECTORY) guards were added. find_package(Python3) is kept in ParamsCodegen.cmake (the CMakeLists.txt one is in a docs block that runs later). Full build+test rides the next gate.
The hardcoded fluid_pp emitter predated the MFlowCode#1545 merge and silently dropped K/nn/tau0/hb_m/mu_min/mu_max/mu_bulk and non_newtonian from the generated lists - a multi-rank regression for non-Newtonian runs relative to the manual lists it replaced. Emitted sets now verified member-identical to master's manual lists per target. Caught by automated PR review.
c6a802c to 7209e80
The named-constant retirement deleted the WENO/MUSCL block between two section headings and merged them into one nonsensical comment. Flagged by automated PR review on MFlowCode#1552.
7209e80 to 740a29f
…h dedup docs

NIB read num_ib_patches_max (2050000) but the patch_ib namelist array is dimensioned num_ib_patches_max_namelist (54000), so validation accepted indices that overflow the array. Also refreshes three doc passages made stale by the dedup. Both flagged by review.
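The effect of validating against the wrong bound can be illustrated with a small sketch. The two constants are the values quoted above; the function names and the 1-based index convention are assumptions, not the toolchain's actual validator.

```python
# Values quoted in the commit message above.
NUM_IB_PATCHES_MAX = 2050000
NUM_IB_PATCHES_MAX_NAMELIST = 54000


def check_patch_ib_index(index, limit=NUM_IB_PATCHES_MAX_NAMELIST):
    """Reject patch_ib indices outside the namelist array (1-based indexing assumed)."""
    if not 1 <= index <= limit:
        raise ValueError(f"patch_ib({index}) is outside 1..{limit}")
    return index


def accepts(index, limit):
    """True when the index passes validation against the given bound."""
    try:
        check_patch_ib_index(index, limit)
        return True
    except ValueError:
        return False


# Index 60000 passes the old bound but does not fit the namelist array.
old_bound_accepts = accepts(60000, NUM_IB_PATCHES_MAX)
new_bound_accepts = accepts(60000, NUM_IB_PATCHES_MAX_NAMELIST)
```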
740a29f to ea0f69b
Code review
Found 1 issue:
MFC/src/pre_process/m_mpi_proxy.fpp Lines 44 to 47 in 740a29f
🤖 Generated with Claude Code
ea0f69b to f9a08e3
Cray ftn rejects declare-target on use-associated names (ftn-1448 on Frontier gpu-omp): the dedup hoisted these declarations into the common module but left their GPU_DECLARE lines in simulation. Declares move to the declaring module; mixed lines split so locally-declared variables keep their declares in place. CPU-preprocessed output verified identical; declare-target scoping verified clean for both files under OpenMP emission.
fluid_pp%{mul0,ss,pv,gamma_v,M_v,mu_v,k_v,cp_v,D_v} and
lag_params%{T0,Thost,c0,rho0,x0} were removed from the Fortran derived
types by upstream MFlowCode#1085/MFlowCode#1093 but remained registered in the toolchain.
Setting any of them causes namelist read to abort with a misleading
'datatype mismatch' error.
Verified against src/common/m_derived_types.fpp:
- physical_parameters has: gamma, pi_inf, Re, cv, qv, qvp, G
- bubbles_lagrange_parameters has: solver_approach, cluster_type,
pressure_corrector, smooth_type, heatTransfer_model, massTransfer_model,
write_bubbles, write_bubbles_stats, nBubs_glb, epsilonb, charwidth, valmaxvoid
Also remove the now-dead PATTERNS entries in descriptions.py and update
the stale comments in fortran_gen.py (_emit_fluid_pp and _emit_lag_params).
Both emitters now walk the registry rather than hardcoding member lists.
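A registry walk and a by-name rejection of the removed members might be sketched as below. The member lists are copied from this commit message; the dictionary layout, the function names, and the error text are hypothetical and do not reproduce _emit_fluid_pp, _emit_lag_params, or the real case validator.

```python
# Member lists copied from the commit message above; the registry shape is hypothetical.
REGISTRY = {
    "fluid_pp": ("gamma", "pi_inf", "Re", "cv", "qv", "qvp", "G"),
    "lag_params": (
        "solver_approach", "cluster_type", "pressure_corrector", "smooth_type",
        "heatTransfer_model", "massTransfer_model", "write_bubbles",
        "write_bubbles_stats", "nBubs_glb", "epsilonb", "charwidth", "valmaxvoid",
    ),
}

# Members removed from the Fortran derived types, now deregistered.
DEAD = {
    "fluid_pp": ("mul0", "ss", "pv", "gamma_v", "M_v", "mu_v", "k_v", "cp_v", "D_v"),
    "lag_params": ("T0", "Thost", "c0", "rho0", "x0"),
}


def walk_members(struct, exclude=()):
    """Members to emit for one struct: everything registered minus exclusions."""
    return [m for m in REGISTRY[struct] if m not in exclude]


def validate(struct, member):
    """Reject an unregistered member by name instead of failing in the namelist read."""
    if member not in REGISTRY[struct]:
        raise ValueError(f"unknown parameter {struct}%{member}")
    return f"{struct}%{member}"


def is_rejected(struct, member):
    try:
        validate(struct, member)
        return False
    except ValueError:
        return True


dead_count = sum(len(members) for members in DEAD.values())
all_dead_rejected = all(is_rejected(s, m) for s, ms in DEAD.items() for m in ms)
```

Because the emitter reads the same registry the validator does, a member cannot be accepted in a case file without also appearing in the walk.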
bc_x/y/z%{vb1,vb2,vb3,ve1,ve2,ve3} are read from the namelist on rank 0
and consumed on all ranks by s_slip_wall/s_no_slip_wall in
src/common/m_boundary_common.fpp (lines ~833-1155). These routines are compiled
into all three targets and are reached from pre_process (m_data_output,
m_perturbation) and post_process (m_start_up) code paths.
Without the broadcast, non-root ranks use uninitialised values for the
wall-velocity components, producing rank-dependent ghost cells for any
multi-rank wall-BC pre-process or post-process run. The sim residue has
always broadcast these; pre and post were missing them.
Add the matching broadcasts to both pre and post using the same nested
Fypp loop idiom as the simulation residue.
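The nested loop expands to 18 member broadcasts (3 directions × 2 prefixes × 3 components). The Python sketch below only enumerates them to make that expansion concrete; the emitted line shape and the use of mpi_p for these members are assumptions, and the actual Fypp source is not reproduced.

```python
def wall_velocity_members():
    """bc_x/y/z%{vb1,vb2,vb3,ve1,ve2,ve3} in nested-loop order."""
    return [
        f"bc_{direction}%{prefix}{component}"
        for direction in ("x", "y", "z")
        for prefix in ("vb", "ve")
        for component in (1, 2, 3)
    ]


def emit_bcasts(members, mpi_type="mpi_p"):
    """One broadcast call per member; the call shape is illustrative only."""
    return [
        f"call MPI_BCAST({member}, 1, {mpi_type}, 0, MPI_COMM_WORLD, ierr)"
        for member in members
    ]


members = wall_velocity_members()
lines = emit_bcasts(members)
```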
muscl_eps was excluded from broadcast generation via _BCAST_EXCLUDE on the incorrect assumption that it is derived post-broadcast. The derivation (in m_weno or m_muscl) only fires under f_is_default(muscl_eps), and default values are assigned on rank 0 only. Every multi-rank MUSCL run therefore had rank-divergent muscl_eps on non-root ranks. Remove it from the exclusion set.

Tuple-set delta (var, mpi_type, count) vs. HEAD~:
- sim: +1 entry: (muscl_eps, mpi_p, 1)
- pre: no change
- post: no change

_emit_fluid_pp and _emit_lag_params now walk the registry instead of maintaining hardcoded member lists. After Commit 1 deregistered the dead members (mul0/ss/pv/gamma_v/M_v/mu_v/k_v/cp_v/D_v for fluid_pp; T0/Thost/c0/rho0/x0 for lag_params), the registry now matches the Fortran types exactly. Re(1) count=2 remains sim-only via an explicit target check with a comment. G is walked as a regular REAL member.

Tests: 3 new tests added:
- muscl_eps now broadcast in sim
- fluid_pp and lag_params registry walks produce exactly the registered members minus documented exclusions
- dead members absent
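The muscl_eps divergence can be reproduced in a toy model. Everything numeric here is hypothetical: the sentinel, the derived value, and the per-rank "uninitialised" values are stand-ins, and the list assignment replaces a real MPI_BCAST. Only the control flow follows the description above.

```python
DFLT_REAL = -1.0e6     # hypothetical "unset" sentinel
DERIVED_EPS = 1.0e-9   # hypothetical value the derivation would choose


def f_is_default(x):
    """Stand-in for the Fortran helper: true while the value is still the sentinel."""
    return x == DFLT_REAL


def startup(n_ranks, broadcast):
    """Return the muscl_eps value each rank ends up with after start-up."""
    # Non-root ranks never run the default assignment; model their
    # uninitialised memory with arbitrary per-rank values.
    values = [float(100 + rank) for rank in range(n_ranks)]
    values[0] = DFLT_REAL                  # rank 0 assigns the default
    if broadcast:
        values = [values[0]] * n_ranks     # stand-in for MPI_BCAST from rank 0
    # Post-read derivation: fires only where the value is still the default.
    return [DERIVED_EPS if f_is_default(v) else v for v in values]


without_bcast = startup(4, broadcast=False)
with_bcast = startup(4, broadcast=True)
```

Without the broadcast only rank 0 sees the sentinel, so only rank 0 derives a value; with it, every rank takes the same branch.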
…idue broadcasts

pre_process sent the integer bc_x/y/z%beg/%end with mpi_p (an 8-byte real transfer over 4-byte integers - undefined behavior that happened to work by adjacency); simulation and post_process already used MPI_INTEGER. Also adds a test pinning the hand-written vb/ve and BC-code residue in pre/post so a merge conflict cannot silently drop them. Both from review.
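One way to read "worked by adjacency" is sketched below using byte counts only. This is not MPI: transfer is a hypothetical stand-in for a one-element broadcast, the BC codes are made up, and placing beg and end next to each other in memory is an assumption.

```python
import struct

INT_BYTES = struct.calcsize("=i")     # 4-byte integer
REAL_BYTES = struct.calcsize("=d")    # 8-byte real


def transfer(src, dst, offset, nbytes):
    """Copy nbytes starting at offset from src into dst; return bytes copied."""
    chunk = src[offset:offset + nbytes]
    dst[offset:offset + len(chunk)] = chunk
    return len(chunk)


# Hypothetical BC codes; beg and end modelled as adjacent 4-byte integers.
root = bytearray(struct.pack("=ii", -3, -16))
other = bytearray(struct.pack("=ii", 0, 0))

# Sending beg as one 8-byte element also carries end along with it.
transfer(root, other, 0, REAL_BYTES)
beg_end_after_wrong_type = struct.unpack("=ii", other)

# The same 8-byte element starting at end would extend past the pair.
overrun_bytes = (INT_BYTES + REAL_BYTES) - len(root)

# With the matching 4-byte type, each member moves on its own.
fixed = bytearray(struct.pack("=ii", 0, 0))
transfer(root, fixed, 0, INT_BYTES)
beg_only = struct.unpack("=ii", fixed)
```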
b11af6c to 1d2f9ba
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1553      +/-   ##
==========================================
- Coverage   61.17%   61.00%   -0.18%
==========================================
  Files          74       75       +1
  Lines       20313    19970     -343
  Branches     2961     2930      -31
==========================================
- Hits        12427    12182     -245
+ Misses       5870     5806      -64
+ Partials     2016     1982      -34
Summary
While converting the MPI broadcast lists to registry-driven generation (parent PR), a member-by-member audit of every namelist parameter's broadcast/consumption pattern turned up three pre-existing defects on master. All three predate this work.

1. bc_x/y/z%{vb1..3, ve1..3} were never broadcast in pre_process/post_process but are consumed on all ranks by s_slip_wall/s_no_slip_wall in m_boundary_common, giving rank-dependent ghost cells for any multi-rank wall-velocity BC run in pre/post. Fixed by adding the member broadcasts (simulation already had them).
2. muscl_eps was never broadcast, and its post-read derivation runs only under f_is_default(muscl_eps) with defaults assigned rank-0-only, so non-root ranks carried divergent values on every multi-rank MUSCL run. Fixed by broadcasting it like every other registered REAL.
3. 14 dead registrations crash the namelist read: lag_params%{T0, Thost, c0, rho0, x0} and fluid_pp%{mul0, ss, pv, gamma_v, M_v, mu_v, k_v, cp_v, D_v} were removed from the derived types (Refactor subgrid bubble models #1085 / fix swapped vapor gas properties #1093) but never deregistered from the toolchain; setting any of them in a case file aborts with a misleading "datatype mismatch". Deregistered; the case validator now rejects them by name instead.

With the dead members gone, the fluid_pp/lag_params broadcast emitters convert from hardcoded member lists to registry walks: a member added to those types is now broadcast by construction, closing the gap that allowed bugs 1–2 to exist.

Verification
muscl_eps everywhere it is registered) and the dead-member removals; everything else identical.