Add Gadi Singularity container build files for UW3#133
lmoresi
left a comment
Thanks @jcgraciosa — multi-stage build is clean and the Gadi-specific environment touches (`OMPI_MCA_io=ompio`, `OPENBLAS_NUM_THREADS=1`, scratch redirect for the Singularity cache) are exactly right. Nice touch on the aarch64 graceful skips for gmsh/vtk-osmesa so the recipe still tests locally on Apple Silicon.
One change required, plus a few optional polish items.
Required: drop `pykdtree` from `underworld3.rhel`
Line ~143 of `underworld3.rhel` lists `pykdtree` in the pip install. UW3 removed this dependency in the Aug 2025 KDTree backend switch — `uw.kdtree` is now backed by ckdtree/nanoflann. Beyond being unused, pykdtree's OpenMP integration causes:
- Fatal crashes on macOS when loaded alongside PETSc/numpy/scipy (double `libomp.dylib` initialisation — C-level abort, not catchable by Python).
- Hangs under MPI during KDTree queries due to thread contention between OpenMP and the MPI processes.
The second one is the concern for this container specifically — Gadi runs are MPI by construction. Please drop `pykdtree` from the pip install list.
(Background and justification are in the planning entry titled "Remove pykdtree dependency", 2026-02-13.)
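For context, the ckdtree backend referred to above is OpenMP-free, which is what sidesteps both failure modes. A minimal sketch of the equivalent query done via SciPy directly (this is plain SciPy usage, not the `uw.kdtree` wrapper API):

```python
# Sketch: SciPy's cKDTree, the OpenMP-free backend family mentioned above.
import numpy as np
from scipy.spatial import cKDTree

pts = np.random.default_rng(0).random((1000, 3))
tree = cKDTree(pts)

# Nearest neighbour of each query point; workers=1 keeps the query
# single-threaded, so there is no thread pool to contend with MPI ranks.
dist, idx = tree.query(pts[:5], k=1, workers=1)
assert np.allclose(dist, 0.0)  # each point's nearest neighbour is itself
```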
Optional polish
These are not blockers — happy to land the must-fix without them, and follow up later.
- Default `PETSC_IMAGE` points at a personal namespace. `underworld3.rhel:24` defaults to `ghcr.io/jcgraciosa/petsc:3.25.0-ompi`. The README correctly tells users to override with `--build-arg`, but a downstream user copy-pasting the build command without reading the README would pull from your personal namespace. Either point the default at an org-level image (e.g. `ghcr.io/underworldcode/petsc:3.25.0-ompi`, once published) or leave the default empty so the build fails loudly instead of silently using the wrong image.
- `--with-cxx-dialect=C++11` for PETSc 3.25. PETSc 3.25 requires C++14 as a minimum. The flag may still work for backwards compatibility, but it understates what PETSc actually needs — easier to drop it entirely (PETSc auto-detects) or set `C++14` explicitly.
- `--download-fblaslapack=1` while the runtime has `openblas`. The runtime layer installs system `openblas`, but PETSc downloads f2c BLAS/LAPACK separately. Using `--with-blaslapack-dir` against the system openblas would shrink the image a bit and speed up the configure stage. Optimisation, not correctness.
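Taken together, the two configure-stage items above would look something like this (a sketch of just the affected flags, not the full configure line; the `/usr` openblas prefix is an assumption about the Rocky Linux package layout):

```
# C++14, or simply omit the dialect flag and let PETSc auto-detect
--with-cxx-dialect=C++14

# use the runtime layer's system openblas instead of --download-fblaslapack=1
--with-blaslapack-dir=/usr
```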
Cosmetic
- Containerfile build-command comments reference `./docs/development/gadi_singularity/` (in two places) while the actual path is `./docs/developer/` — `development` should be `developer`.
- `petsc.rhel` ends without a trailing newline.
Looks good
- Multi-stage `runtime → builder → final` keeps the final image small.
- Patch application has a graceful "already-merged-upstream" fallback.
- petsc4py/slepc4py install failure dumps the build log on exit — good debuggability.
- `mpi4py` forced `--no-binary` against openmpi and `h5py` rebuilt `HDF5_MPI=ON` against PETSc's HDF5 — both essential and easy to get wrong.
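For reference, the rebuild pattern praised in that last point typically looks like the following sketch (the `HDF5_DIR` path is an assumption about where PETSc's downloaded HDF5 lands):

```
# mpi4py built from source against the container's openmpi
python -m pip install --no-binary=mpi4py mpi4py

# h5py rebuilt in parallel mode against PETSc's HDF5
CC=mpicc HDF5_MPI=ON HDF5_DIR=$PETSC_DIR/$PETSC_ARCH \
  python -m pip install --no-binary=h5py h5py
```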
Happy to approve and merge once `pykdtree` is removed. The polish items can roll into this PR or a follow-up — whichever you prefer.
Underworld development team with AI support from Claude Code
Since 2026-04-30 every push to development and every PR has been failing
CI with the runner timing out:
```
Resolving Environment ⧖ Starting                      # 02:51:02
##[error]The runner has received a shutdown signal.   # 03:57:25
```
That's 66 minutes spent in micromamba's conda-forge solve before the
test step even starts, after which GitHub kills the runner. `environment.yaml`
has loose constraints (`python <= 3.11`, an exact pin on `petsc=3.21.5`
that is now a year old in conda-forge, and several unpinned packages
including `pykdtree`, which UW3 no longer uses) — that
combination, plus recent conda-forge package-state shifts, has pushed the
solver into deep backtracking.
The `pixi.lock` committed in the repo already captures the exact
dependency set we use locally for development. Using
`prefix-dev/setup-pixi` with `frozen: true` (refuses to re-solve) gives a
deterministic, fast install matching local dev state. Bonus: the build
step now uses `pixi run -e dev build` (= `pip install . --no-build-isolation`
from `pixi.toml`), avoiding the editable install the previous CI was doing
in violation of project policy (CLAUDE.md: "NEVER use `pip install -e .`").
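A sketch of the workflow shape this implies (the action version tag, environment name, and `test` task name are assumptions beyond what's stated above):

```yaml
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: prefix-dev/setup-pixi@v0.8.1
        with:
          environments: dev
          frozen: true               # refuse to re-solve; install straight from pixi.lock
      - run: pixi run -e dev build   # = pip install . --no-build-isolation
      - run: pixi run -e dev test    # hypothetical test task name
```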
Replaces the workflow file in place; keeping environment.yaml in the
repo for now because PR underworldcode#133 (Gadi Singularity container) consumes it,
and removing it would create a cross-PR ordering hazard.
Test plan: this commit's own CI run is the test plan — if it goes
green in <10 min instead of red in 90 min, we have our answer.
Underworld development team with AI support from Claude Code
Adds two Containerfiles and a README for building and running UW3 as a Singularity container on Gadi (NCI), modeled on the UW2 gadi_singularity setup.
Files added
- `docs/developer/gadi_singularity/petsc.rhel` — builds PETSc 3.25.0 with full AMR support (petsc4py, slepc4py, mmg, parmmg, ptscotch, hypre, etc.) on Rocky Linux 8.10
- `docs/developer/gadi_singularity/underworld3.rhel` — builds UW3 on top of the PETSc image
- `docs/developer/gadi_singularity/README.md` — build and deployment instructions

Tested
- `singularity pull`

Notes
- `COPY petsc-custom/patches/...`)
- `--platform linux/amd64` required when building on Apple Silicon
- `SINGULARITY_CACHEDIR` must point to scratch on Gadi (home quota too small)

Future work
- `--mca btl ^openib`)

Underworld development team with AI support from Claude Code