diff --git a/docs/docs/tutorials/fitting-bayesian.ipynb b/docs/docs/tutorials/fitting-bayesian.ipynb index 965bd570..1ee6da97 100644 --- a/docs/docs/tutorials/fitting-bayesian.ipynb +++ b/docs/docs/tutorials/fitting-bayesian.ipynb @@ -28,7 +28,7 @@ "\n", "where $\\theta$ are the model parameters, $d$ is the observed data, $p(d \\mid \\theta)$ is the likelihood, and $p(\\theta)$ is the prior. In `easyscience`, the `min`/`max` bounds of a `Parameter` are interpreted as a **uniform prior**, and a Gaussian likelihood is constructed from the data and supplied weights.\n", "\n", - "`easyscience` exposes a Bayesian Markov-chain Monte Carlo (MCMC) sampler through the `Fitter.mcmc_sample` method. Under the hood this uses BUMPS' DREAM sampler, so the underlying minimizer must be switched to BUMPS.\n", + "`easyscience` exposes a Bayesian Markov-chain Monte Carlo (MCMC) sampler through the `Sampler` class. Under the hood this uses BUMPS' DREAM sampler, so the underlying minimizer must be switched to BUMPS.\n", "\n", "```{note}\n", "This tutorial focuses on Bayesian analysis with a simple QENS model for illustration. For dedicated QENS fitting with more sophisticated models, consider using [`EasyDynamics`](https://github.com/easyscience/easydynamics).\n", @@ -45,7 +45,7 @@ "\n", "The trade-off is computational cost: MCMC requires thousands of model evaluations, whereas an MLE fit may converge in dozens. For the simple 4-parameter model used here the difference is negligible, but for expensive models it is worth starting with MLE and only switching to MCMC when you need the richer output.\n", "\n", - "In this tutorial we re-use the QENS dataset and the Lorentzian-with-resolution model from the [Fitting QENS](fitting-qens.ipynb) tutorial, but instead of returning a single best-fit value with a symmetric error bar we will draw thousands of samples from the posterior.\n" + "In this tutorial we re-use the QENS dataset and the Lorentzian-with-resolution model from the [Fitting QENS](fitting-qens.ipynb) tutorial, but instead of returning a single best-fit value with a symmetric error bar we will draw thousands of samples from the posterior." ] }, { @@ -250,20 +250,20 @@ "\n", "We now draw samples from the posterior distribution $p(\\theta \\mid d)$ using the BUMPS DREAM (DiffeRential Evolution Adaptive Metropolis) algorithm. DREAM is an ensemble MCMC method that runs multiple chains in parallel and automatically tunes the proposal distribution.\n", "\n", - "DREAM only works with the BUMPS minimizer. We reuse the ``mle_fitter`` created above, switch it to BUMPS, and call `Fitter.mcmc_sample`, which returns a dictionary with the following keys:\n", + "DREAM only works with the BUMPS minimizer. We reuse the ``mle_fitter`` created above, switch it to BUMPS, and create a `Sampler` instance bound to the fitter and data. Calling `sampler.sample()` returns a `SamplingResults` object with the following attributes:\n", "\n", "- `draws`: a `(n_samples, n_parameters)` array of posterior samples: each **row** is one complete draw from the joint posterior (one value for every parameter simultaneously), and each **column** holds all sampled values for a single parameter;\n", "- `param_names`: the unique names of the parameters, in the same column order as `draws`;\n", - "- `internal_bumps_object`: the underlying BUMPS `MCMCDraw` object, useful for advanced diagnostics;\n", + "- `state`: the underlying BUMPS `MCMCDraw` object, useful for advanced diagnostics;\n", "- `logp`: the log-posterior of each retained sample.\n", "\n", "The key sampling parameters are:\n", "\n", - "- `samples` (4000): the total number of posterior draws to generate across all chains;\n", + "- `samples` (10000): the total number of posterior draws to generate across all chains;\n", "- `burn` (500): the number of initial *burn-in* iterations to discard — the sampler needs time to find the typical set of the posterior, and early samples are not representative;\n", "- `thin` (2): the *thinning* interval — only every second sample is kept, which reduces autocorrelation between consecutive draws;\n", "\n", - "First, we switch to the BUMPS minimizer:\n" + "First, we switch to the BUMPS minimizer:" ] }, { @@ -285,17 +285,13 @@ "metadata": {}, "outputs": [], "source": [ - "result = mle_fitter.mcmc_sample(\n", - " x=omega,\n", - " y=intensity_obs,\n", - " weights=1 / intensity_error,\n", - " samples=10000,\n", - " burn=500,\n", - " thin=2,\n", - ")\n", + "from easyscience.fitting import Sampler\n", + "\n", + "sampler = Sampler(mle_fitter, omega, intensity_obs, weights=1 / intensity_error)\n", + "results = sampler.sample(samples=10000, burn=500, thin=2)\n", "\n", - "print(f'Drew {result[\"draws\"].shape[0]} samples for {result[\"draws\"].shape[1]} parameters.')\n", - "print('parameters:', result['param_names'])" + "print(f'Drew {results.draws.shape[0]} samples for {results.draws.shape[1]} parameters.')\n", + "print('parameters:', results.param_names)" ] }, { @@ -320,12 +316,9 @@ "metadata": {}, "outputs": [], "source": [ - "draws = result['draws']\n", - "logp = result['logp']\n", - "if callable(logp):\n", - " _, logp = logp()\n", - " logp = logp.flatten()\n", - "name_to_col = {name: idx for idx, name in enumerate(result['param_names'])}\n", + "draws = results.draws\n", + "logp = results.logp\n", + "name_to_col = {name: idx for idx, name in enumerate(results.param_names)}\n", "\n", "\n", "def column_for(parameter):\n", @@ -367,7 +360,7 @@ "The MLE finds the single point that maximises the likelihood, while the Bayesian median is the central value of the *posterior* — which also accounts for the prior $p(\\theta)$. If a parameter's posterior is skewed (asymmetric), the median and the mode (which approximates the MLE) will not coincide. The table below shows both so you can spot any such differences.\n", "\n", "\n", - "Note that the columns of `result['draws']` are ordered by `result['param_names']` (which use the parameters' `unique_name`), so it is worth building a small helper to look up a column by friendly name.\n" + "Note that the columns of `results.draws` are ordered by `results.param_names` (which use the parameters' `unique_name`), so it is worth building a small helper to look up a column by friendly name." ] }, { @@ -493,85 +486,6 @@ "plt.show()" ] }, - { - "cell_type": "markdown", - "id": "bcadc094", - "metadata": {}, - "source": [ - "## Persist and reload a chain\n", - "\n", - "In a real analysis you may want to **save a running chain to disk** for later inspection, or to resume it across compute sessions.\n", - "\n", - "`easyscience` provides robust I/O via ``save_sampler_state(state, prefix)`` and ``load_sampler_state(prefix)``. Additionally, ``Fitter.mcmc_sample()`` accepts a ``resume_state`` parameter — pass the ``MCMCDraw`` object from a previous run (or one reloaded from disk via ``load_sampler_state``) and DREAM **continues** the saved chain instead of starting cold.\n", - "\n", - "**Ring-buffer contract (important!)** DREAM stores draws in a fixed-size ring buffer sized to *samples*. To add N new draws on top of an existing chain of M draws without losing the old ones, pass ``samples = M + N`` and ``burn = 0``:\n", - "\n", - "```python\n", - "extended = fitter.mcmc_sample(..., samples=M + N, burn=0,\n", - " resume_state=previous_state)\n", - "```\n", - "\n", - "Here we save the chain from the previous exercise to disk, reload it, and verify the round-trip preserved the draws." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "b7ef160c", - "metadata": {}, - "outputs": [], - "source": [ - "import glob\n", - "import os\n", - "\n", - "from easyscience.fitting.minimizers import save_sampler_state\n", - "\n", - "# Save the chain from the previous exercise to disk.\n", - "# save_sampler_state writes several files using 'prefix' as a filename stem.\n", - "prefix = os.path.abspath('dream_chain_example')\n", - "save_sampler_state(result['internal_bumps_object'], prefix)\n", - "\n", - "saved_files = sorted(glob.glob(prefix + '-*'))\n", - "print(f'Saved {len(saved_files)} file(s):')\n", - "for f in saved_files:\n", - " print(f' {os.path.basename(f)}')" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "3acaa5c6", - "metadata": {}, - "outputs": [], - "source": [ - "from easyscience.fitting.minimizers import load_sampler_state\n", - "\n", - "# Reload the saved state from disk.\n", - "reloaded = load_sampler_state(prefix)\n", - "\n", - "# Compare the draws before and after the round-trip. We draw with\n", - "# portion=1.0 on both so the comparison uses the full chain: an in-memory\n", - "# state may carry a convergence-trimmed `portion` (so draw() returns only a\n", - "# tail of the samples), whereas a state reloaded from disk always reports\n", - "# portion=1.0. The underlying chain data is identical either way.\n", - "_draw_orig = result['internal_bumps_object'].draw(portion=1.0)\n", - "_draw_reloaded = reloaded.draw(portion=1.0)\n", - "\n", - "print('Draws match:', np.allclose(_draw_orig.points, _draw_reloaded.points))\n", - "print('logp match:', np.allclose(_draw_orig.logp, _draw_reloaded.logp))\n", - "print('Nvar (params):', reloaded.Nvar)\n", - "print('Npop (pop. size):', reloaded.Npop)\n", - "print('Ncr:', reloaded.Ncr)\n", - "print('Labels:', reloaded.labels)\n", - "\n", - "# The reloaded state is a fully functional MCMCDraw that can be passed\n", - "# as resume_state= to MultiFitter.sample() to continue the chain.\n", - "# (Note: BUMPS' DREAM resume has a known limitation with thin>1 and\n", - "# outlier removal. load_sampler_state also works around a BUMPS >=1.0.4\n", - "# regression where a short chain's single-row buffers are read back as 1-D\n", - "# arrays; prefer it over calling bumps.dream.state.load_state directly.)" - ] - }, { "cell_type": "markdown", "id": "03339658", @@ -579,14 +493,9 @@ "source": [ "## Extend the chain and check convergence\n", "\n", - "The original run used ``samples=10000`` (5000 retained after thinning). Here we **extend the chain by 5000 more raw samples** using ``resume_state`` — DREAM continues from the saved population instead of starting cold.\n", - "\n", - "**Ring-buffer contract:** DREAM stores draws in a fixed-size ring buffer sized to *samples* (raw samples before thinning). To keep everything, pass ``samples = old_raw + new_raw`` and ``burn=0``:\n", + "The original run used ``samples=10000`` (5000 retained after thinning). Here we **extend the chain by 5000 more raw samples** using ``sampler.extend()`` — DREAM continues from the previous in-memory state instead of starting cold.\n", "\n", - "```python\n", - "extended = fitter.mcmc_sample(..., samples=old_raw + new_raw, burn=0,\n", - " resume_state=previous_state)\n", - "```\n", + "``sampler.extend(additional_samples=5000)`` does the ring-buffer arithmetic for you automatically: it reads the number of stored generations from the saved state and sizes the new buffer to ``old_generations + additional_samples`` so no existing draws are lost.\n", "\n", "After the extension we compare the posterior summaries and check convergence with Gelman-Rubin R-hat." ] @@ -599,25 +508,14 @@ "outputs": [], "source": [ "# Extend the existing chain by 5000 more raw samples.\n", - "# samples = original_raw + new_raw = 10000 + 5000 = 15000 raw\n", - "# burn=0 because the chain is already warm.\n", - "total_raw = 10000 + 5000\n", - "\n", - "extended = mle_fitter.mcmc_sample(\n", - " x=omega,\n", - " y=intensity_obs,\n", - " weights=1 / intensity_error,\n", - " samples=total_raw,\n", - " burn=0,\n", - " thin=2,\n", - " resume_state=result['internal_bumps_object'], # continue from where we left off\n", - ")\n", + "# extend() handles the ring-buffer arithmetic automatically.\n", + "extended_results = sampler.extend(additional_samples=5000, thin=2)\n", "\n", - "short_draws = result['draws']\n", - "extended_draws = extended['draws']\n", + "short_draws = results.draws\n", + "extended_draws = extended_results.draws\n", "\n", "print(f'Short chain: {short_draws.shape[0]} retained draws')\n", - "print(f'Extended chain: {extended_draws.shape[0]} retained draws (requested {total_raw} raw)')" + "print(f'Extended chain: {extended_draws.shape[0]} retained draws')" ] }, { @@ -628,16 +526,16 @@ "outputs": [], "source": [ "# Build helpers to look up parameter columns.\n", - "name_to_col_ext = {name: idx for idx, name in enumerate(extended['param_names'])}\n", - "name_to_col_orig = {name: idx for idx, name in enumerate(result['param_names'])}\n", + "name_to_col_ext = {name: idx for idx, name in enumerate(extended_results.param_names)}\n", + "name_to_col_orig = {name: idx for idx, name in enumerate(results.param_names)}\n", "\n", "\n", "def col_orig(par):\n", - " return result['draws'][:, name_to_col_orig[par.unique_name]]\n", + " return results.draws[:, name_to_col_orig[par.unique_name]]\n", "\n", "\n", "def col_ext(par):\n", - " return extended['draws'][:, name_to_col_ext[par.unique_name]]\n", + " return extended_results.draws[:, name_to_col_ext[par.unique_name]]\n", "\n", "\n", "# Side-by-side posterior summary: compare the first 5000 draws (before extension)\n", @@ -696,8 +594,8 @@ "# Convergence diagnostic: Gelman-Rubin R-hat from the extended chain state.\n", "# Values close to 1.0 indicate good convergence.\n", "print('Gelman-Rubin R-hat (extended chain) — values < 1.05 indicate convergence:')\n", - "rhat = extended['internal_bumps_object'].gelman()\n", - "for name, val in zip(extended['param_names'], rhat):\n", + "rhat = extended_results.state.gelman()\n", + "for name, val in zip(extended_results.param_names, rhat):\n", " status = '✓' if val < 1.05 else '?' if val < 1.1 else '✗'\n", " print(f' {name:<20s} {val:.4f} {status}')" ] @@ -705,7 +603,7 @@ ], "metadata": { "kernelspec": { - "display_name": ".venv (3.11.9)", + "display_name": "era", "language": "python", "name": "python3" }, @@ -719,7 +617,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.11.9" + "version": "3.12.11" } }, "nbformat": 4, diff --git a/src/easyscience/fitting/__init__.py b/src/easyscience/fitting/__init__.py index 54bad74a..cb9e691f 100644 --- a/src/easyscience/fitting/__init__.py +++ b/src/easyscience/fitting/__init__.py @@ -4,8 +4,10 @@ from .available_minimizers import AvailableMinimizers from .fitter import Fitter from .minimizers.utils import FitResults +from .sampler import Sampler +from .sampler import SamplingResults # Causes circular import # from .multi_fitter import MultiFitter # noqa: F401, E402 -all = [AvailableMinimizers, Fitter, FitResults] +all = [AvailableMinimizers, Fitter, FitResults, Sampler, SamplingResults] diff --git a/src/easyscience/fitting/fitter.py b/src/easyscience/fitting/fitter.py index 2c909f39..38ef7351 100644 --- a/src/easyscience/fitting/fitter.py +++ b/src/easyscience/fitting/fitter.py @@ -2,7 +2,6 @@ # SPDX-License-Identifier: BSD-3-Clause import functools -from typing import Any from typing import Callable from typing import List from typing import Optional @@ -413,140 +412,3 @@ def _post_compute_reshaping( fit_result.y_calc = np.reshape(fit_result.y_calc, y.shape) fit_result.y_err = np.reshape(fit_result.y_err, y.shape) return fit_result - - def mcmc_sample( - self, - x: Union[np.ndarray, List[np.ndarray]], - y: Union[np.ndarray, List[np.ndarray]], - weights: Union[Optional[np.ndarray], List[Optional[np.ndarray]]], - samples: int = 10000, - burn: int = 2000, - thin: int = 10, - chains: Optional[int] = None, - population: Optional[int] = None, - resume_state: Optional[Any] = None, - vectorized: bool = False, - sampler_kwargs: Optional[dict] = None, - progress_callback: Optional[Callable[[dict], Optional[bool]]] = None, - abort_test: Optional[Callable[[], bool]] = None, - ) -> dict: - """Run Bayesian MCMC sampling using the BUMPS DREAM sampler. - - Works with both a plain ``Fitter`` (single dataset) and a - ``MultiFitter`` (multiple datasets) via polymorphic dispatch: - ``_precompute_reshaping`` and ``_fit_function_wrapper`` are resolved - on the concrete subclass at call time, so multi-dataset flattening - is handled automatically when called on a ``MultiFitter`` instance. - - Parameters - ---------- - x : Union[np.ndarray, List[np.ndarray]] - Independent variable array (or list of arrays for ``MultiFitter``). - y : Union[np.ndarray, List[np.ndarray]] - Dependent variable array (or list of arrays for ``MultiFitter``). - weights : Union[Optional[np.ndarray], List[Optional[np.ndarray]]] - Weight array (or list of arrays for ``MultiFitter``). - samples : int, default=10000 - Number of retained DREAM samples requested from BUMPS. - burn : int, default=2000 - Burn-in steps to discard before collecting samples. - thin : int, default=10 - Thinning interval — only every ``thin``-th sample is kept, - which reduces autocorrelation between consecutive draws. - chains : Optional[int], default=None - User-friendly alias for ``population``. Provide one or the - other, not both. - population : Optional[int], default=None - DREAM population **scale factor** (not an absolute chain count): - BUMPS creates ``ceil(population * n_parameters)`` parallel chains. - resume_state : Optional[Any], default=None - A BUMPS ``MCMCDraw`` state object from a previous - ``mcmc_sample()`` call (the ``'internal_bumps_object'`` value - of the returned dict). When provided, DREAM **continues** the - saved chain instead of starting cold. The population, parameter - count, and parameter names must match the current model. - - **Ring-buffer contract:** DREAM stores draws in a fixed-size - ring buffer sized to *samples*. Resuming with ``samples=N`` - retains only the last N draws. To extend an existing chain of - M draws by N without losing any:: - - fitter.mcmc_sample( - data, samples=M + N, burn=0, resume_state=previous_state - ) - - The ``burn`` parameter controls burn-in for the *new* draws - only; passing ``burn=0`` (strongly recommended on resume) - skips additional burn-in. A non-zero ``burn`` on a - previously-converged chain is usually a mistake. - - Resuming against *different* data is undefined behaviour (the - chain's likelihood changes underneath it). - vectorized : bool, default=False - When ``True``, each x array may be multi-dimensional (e.g. an - ``(N, M, 2)`` grid for a 2D model) and is left as-is. When - ``False`` (default), each x array is expected to be 1-D. - sampler_kwargs : Optional[dict], default=None - Additional keyword arguments forwarded to the BUMPS DREAM sampler. - progress_callback : Optional[Callable[[dict], Optional[bool]]], default=None - Optional callback invoked at each DREAM generation. The payload - dict includes ``iteration`` and ``sampling: True``. - abort_test : Optional[Callable[[], bool]], default=None - Optional callable that returns ``True`` to abort sampling early. - - Returns - ------- - dict - Dictionary with keys ``'draws'``, ``'param_names'``, - ``'internal_bumps_object'``, and ``'logp'``. - - Raises - ------ - ValueError - If ``samples``, ``burn``, or ``thin`` are invalid. - RuntimeError - If the active minimizer is not a BUMPS instance. - """ - if not isinstance(samples, int) or samples <= 0: - raise ValueError('samples must be a positive integer.') - if not isinstance(burn, int) or burn < 0: - raise ValueError('burn must be a non-negative integer.') - if not isinstance(thin, int) or thin < 1: - raise ValueError('thin must be a positive integer.') - - # Resolve chains/population alias - from easyscience.fitting.minimizers.minimizer_bumps import Bumps - - pop = Bumps._resolve_population_alias(chains=chains, population=population) - - x_fit, x_new, y_new, w_new, dims = self._precompute_reshaping(x, y, weights, vectorized) - self._dependent_dims = dims - - original_fit_func = self._fit_function - self.fit_function = self._fit_function_wrapper(x_new, flatten=True) - - try: - minimizer = self.minimizer - if not (hasattr(minimizer, 'package') and minimizer.package == 'bumps'): - raise RuntimeError( - 'Bayesian sampling requires a BUMPS minimizer. ' - 'Use ``fitter.switch_minimizer(AvailableMinimizers.Bumps)`` first.' - ) - - result = minimizer.mcmc_sample( - x=x_fit, - y=y_new, - weights=w_new, - samples=samples, - burn=burn, - thin=thin, - population=pop, - resume_state=resume_state, - sampler_kwargs=sampler_kwargs, - progress_callback=progress_callback, - abort_test=abort_test, - ) - finally: - self.fit_function = original_fit_func - - return result diff --git a/src/easyscience/fitting/minimizers/minimizer_bumps.py b/src/easyscience/fitting/minimizers/minimizer_bumps.py index 58dc3582..735e00ef 100644 --- a/src/easyscience/fitting/minimizers/minimizer_bumps.py +++ b/src/easyscience/fitting/minimizers/minimizer_bumps.py @@ -264,41 +264,6 @@ def _resolve_fitclass(method: str): return fitclass raise FitError(f'Unknown BUMPS fitting method: {method}') - @staticmethod - def _resolve_population_alias( - chains: int | None = None, - population: int | None = None, - ) -> int | None: - """Resolve the ``chains`` / ``population`` alias for DREAM. - - ``chains`` is the user-friendly name; ``population`` is the parameter - name used by BUMPS. If both are provided and differ, an error is - raised. Returns the resolved value (or ``None`` if neither is set). - - Parameters - ---------- - chains : int | None, default=None - User-friendly alias for DREAM population count. - population : int | None, default=None - BUMPS DREAM population count. - - Returns - ------- - int | None - The resolved population value, or ``None`` if neither is set. - - Raises - ------ - ValueError - If both ``chains`` and ``population`` are provided but differ. - """ - if chains is not None and population is not None and chains != population: - raise ValueError( - f'Conflicting population specifications: chains={chains} ' - f'and population={population}. Use only one.' - ) - return chains if chains is not None else population - def _build_progress_payload( self, problem, iteration: int, point: np.ndarray, nllf: float ) -> dict: @@ -425,7 +390,6 @@ def mcmc_sample( samples: int = 10000, burn: int = 2000, thin: int = 10, - chains: int | None = None, population: int | None = None, resume_state: Any | None = None, sampler_kwargs: dict | None = None, @@ -436,7 +400,7 @@ def mcmc_sample( Builds a BUMPS `FitProblem` from the current model and runs the DREAM sampler. This is the public minimizer-level entry point for Bayesian - sampling; the higher-level `Fitter.mcmc_sample` delegates to this + sampling; the higher-level `Sampler` delegates to this method after flattening multi-dataset arrays. Parameters @@ -453,9 +417,6 @@ def mcmc_sample( Burn-in steps. thin : int, default=10 Thinning interval. - chains : int | None, default=None - User-friendly alias for ``population``. Provide one or the - other, not both. population : int | None, default=None DREAM population **scale factor** (not an absolute chain count): BUMPS creates ``ceil(population * n_parameters)`` parallel chains, @@ -473,7 +434,8 @@ def mcmc_sample( ``samples=N`` retains only the **last N** draws. To extend an existing chain of M draws by N without losing any:: - fitter.mcmc_sample(data, samples=M + N, burn=0, resume_state=old_state) + sampler = Sampler(fitter, data) + sampler.extend(additional_samples=N) The ``burn`` parameter controls burn-in for the *new* draws only; passing ``burn=0`` (strongly recommended on resume) @@ -481,7 +443,7 @@ def mcmc_sample( previously-converged chain is usually a mistake and emits a warning. - The ``chains``/``population`` and ``initializer`` parameters + The ``population`` and ``initializer`` parameters have **no effect** when ``resume_state`` is provided — they are determined by the saved state. @@ -507,8 +469,7 @@ def mcmc_sample( Raises ------ ValueError - If the input shapes or weights are invalid, if both ``chains`` - and ``population`` are provided with different values, if + If the input shapes or weights are invalid, if ``progress_callback`` is not callable, or if ``resume_state`` is incompatible with the current model (parameter count, names/order, or population mismatch). @@ -552,9 +513,7 @@ def mcmc_sample( curve = model_func(x, y, weights) problem = FitProblem(curve) - # Resolve population parameter (before resume validation, since - # validation needs the resolved value). - pop = self._resolve_population_alias(chains=chains, population=population) + pop = population # --- Resume-state compatibility validation --- if resume_state is not None: @@ -611,7 +570,7 @@ def mcmc_sample( # On resume we must recover the scale factor from the state. n_params = len(problem._parameters) if pop is not None: - # Caller explicitly set chains/population — validate it + # Caller explicitly set population — validate it # produces the same Npop as the saved state. expected_npop = int(math.ceil(pop * n_params)) if expected_npop != resume_state.Npop: @@ -867,9 +826,14 @@ def _gen_fit_results( def save_sampler_state(state: Any, filename: str) -> None: """Persist a DREAM sampler state to disk. + .. deprecated:: + Not public API anymore — use ``Sampler.save()`` (which also writes a + metadata sidecar). Kept as the internal helper used by + ``easyscience.fitting.sampler``. + Thin wrapper around BUMPS ``save_state``. ``state`` is the ``internal_bumps_object`` returned by :meth:`Bumps.mcmc_sample` (or by - ``MultiFitter.mcmc_sample``). Three gzipped text files are written: + ``Sampler``). Three gzipped text files are written: ``-chain.mc.gz``, ``-point.mc.gz`` and ``-stats.mc.gz``. @@ -885,11 +849,16 @@ def save_sampler_state(state: Any, filename: str) -> None: save_state(state, str(filename)) -def load_sampler_state(filename: str) -> Any: +def load_sampler_state(filename: str, skip: int = 0) -> Any: """Load a DREAM sampler state saved by :func:`save_sampler_state`. + .. deprecated:: + Not public API anymore — use ``Sampler.load_state()`` (which also + restores parameter names from the metadata sidecar). Kept as the + internal helper used by ``easyscience.fitting.sampler``. + The returned object can be passed as ``resume_state`` to - :meth:`Bumps.mcmc_sample` (or ``MultiFitter.mcmc_sample``) to extend a + :meth:`Bumps.mcmc_sample` (or the ``Sampler``) to extend a previously saved chain. Works around a regression introduced in BUMPS 1.0.4: ``load_state`` reads @@ -905,6 +874,9 @@ def load_sampler_state(filename: str) -> Any: ---------- filename : str Path prefix that was passed to :func:`save_sampler_state`. + skip : int, default=0 + Discard the first ``skip`` saved generations on load, forwarded to + ``bumps.dream.state.load_state``. Returns ------- @@ -916,7 +888,7 @@ def load_sampler_state(filename: str) -> Any: loader = getattr(_bumps_state, 'loadtxt_with_fallback', None) if loader is None: # BUMPS < 1.0.4: load_state uses a custom parser that is already 2-D safe. - return _bumps_state.load_state(str(filename)) + return _bumps_state.load_state(str(filename), skip=skip) def _loadtxt_2d(file: Any, report: int = 0) -> np.ndarray: arr = loader(file, report=report) @@ -925,6 +897,6 @@ def _loadtxt_2d(file: Any, report: int = 0) -> np.ndarray: _bumps_state.loadtxt_with_fallback = _loadtxt_2d try: - return _bumps_state.load_state(str(filename)) + return _bumps_state.load_state(str(filename), skip=skip) finally: _bumps_state.loadtxt_with_fallback = loader diff --git a/src/easyscience/fitting/sampler.py b/src/easyscience/fitting/sampler.py new file mode 100644 index 00000000..828e90e6 --- /dev/null +++ b/src/easyscience/fitting/sampler.py @@ -0,0 +1,541 @@ +# SPDX-FileCopyrightText: 2026 EasyScience contributors +# SPDX-License-Identifier: BSD-3-Clause +"""Bayesian MCMC sampling — the ``Sampler`` class and persistence helpers.""" + +from __future__ import annotations + +import hashlib +import json +import warnings +from dataclasses import dataclass +from typing import TYPE_CHECKING +from typing import Any +from typing import Callable +from typing import List +from typing import Optional +from typing import Union + +import numpy as np + +from .minimizers.minimizer_base import MINIMIZER_PARAMETER_PREFIX + +if TYPE_CHECKING: # avoid import cycles; Fitter is only needed for type hints + from .fitter import Fitter + +# Sidecar schema owned by easyscience from version 2 onwards. Version 1 +# sidecars (written by easyreflectometry's save_posterior) carry only +# 'param_names' and are still accepted on load. +_SIDECAR_SCHEMA_VERSION = 2 +_ACCEPTED_SIDECAR_SCHEMA_VERSIONS = (1, 2) + + +def _easyscience_version() -> str: + """Return the installed easyscience version string.""" + try: + from importlib.metadata import version as _v + + return _v('easyscience') + except Exception: + return 'unknown' + + +def _data_fingerprint( + x_list: list, + y_list: list, + w_list: list, +) -> Optional[str]: + """Return a SHA-256 hex digest of concatenated (x|y|weights), or None.""" + try: + h = hashlib.sha256() + for arr in list(x_list) + list(y_list) + list(w_list): + h.update(np.ascontiguousarray(arr, dtype=np.float64).tobytes()) + return h.hexdigest() + except Exception: + return None + + +def save_chain( + state: Any, + param_names: Optional[List[str]], + path: str, + data_fingerprint: Optional[str] = None, +) -> None: + """Persist a DREAM chain state plus a metadata sidecar. + + Writes BUMPS native files (``-chain.mc.gz`` etc. via + ``bumps.dream.state.save_state``) plus a ``.params.json`` sidecar + holding parameter names and metadata. Used by :meth:`Sampler.save` and by + the deprecated ``easyreflectometry`` ``save_posterior`` shim. + + Parameters + ---------- + state : Any + The BUMPS ``MCMCDraw`` object to persist. + param_names : Optional[List[str]] + Parameter names (one per chain column), stored in the sidecar. + path : str + File path prefix. BUMPS appends its own suffixes. + data_fingerprint : Optional[str], default=None + SHA-256 fingerprint of the data the chain was sampled against, + verified (with a warning) on reload. + """ + from .minimizers.minimizer_bumps import save_sampler_state + + save_sampler_state(state, str(path)) + + sidecar = { + 'schema_version': _SIDECAR_SCHEMA_VERSION, + 'param_names': param_names, + 'easyscience_version': _easyscience_version(), + 'data_fingerprint': data_fingerprint, + } + with open(f'{path}.params.json', 'w') as f: + json.dump(sidecar, f, indent=2) + + +def load_chain(path: str, skip: int = 0) -> tuple[Any, Optional[List[str]], dict]: + """Reload a DREAM chain state saved by :func:`save_chain`. + + Uses the patched loader (``load_sampler_state``), which carries the + BUMPS >= 1.0.4 single-row ``loadtxt`` workaround. Parameter names are + restored from the sidecar when available (schema versions 1 and 2), + falling back to the state's labels with the minimizer prefix stripped. + Used by :meth:`Sampler.load_state` and by the deprecated + ``easyreflectometry`` ``load_posterior`` shim. + + Parameters + ---------- + path : str + File path prefix used when saving. + skip : int, default=0 + Discard the first ``skip`` saved generations on load, forwarded to + ``bumps.dream.state.load_state``. + + Returns + ------- + tuple[Any, Optional[List[str]], dict] + The reloaded BUMPS ``MCMCDraw`` state, the parameter names (or + ``None`` if neither sidecar nor labels yielded them), and the raw + sidecar dict (empty if absent/unreadable). + """ + from .minimizers.minimizer_bumps import load_sampler_state + + state = load_sampler_state(str(path), skip=skip) + + sidecar: dict = {} + param_names: Optional[List[str]] = None + try: + with open(f'{path}.params.json', 'r') as f: + sidecar = json.load(f) + if sidecar.get('schema_version') in _ACCEPTED_SIDECAR_SCHEMA_VERSIONS: + param_names = sidecar.get('param_names') + except (FileNotFoundError, json.JSONDecodeError): + sidecar = {} + + if param_names is None: + # Fallback: strip the minimizer prefix from state.labels. Note BUMPS + # save_state/load_state does not preserve labels, so a reloaded state + # typically carries default labels like ['P0', 'P1', ...]. + param_names = [ + lbl[len(MINIMIZER_PARAMETER_PREFIX) :] + if lbl.startswith(MINIMIZER_PARAMETER_PREFIX) + else lbl + for lbl in state.labels + ] + + return state, param_names, sidecar + + +@dataclass +class SamplingResults: + """Structured result of an MCMC sampling run (analogous to ``FitResults``). + + Attributes + ---------- + draws : np.ndarray + Posterior samples, shape ``(n_samples, n_params)``. + param_names : list[str] + Parameter names (one per column of ``draws``). + logp : np.ndarray + Log-posterior values, shape ``(n_samples,)``. + state : Any + The raw BUMPS ``MCMCDraw`` chain state. + """ + + draws: np.ndarray + param_names: list[str] + logp: np.ndarray + state: Any + + def to_legacy_dict(self) -> dict: + """Return the legacy dict shape produced by the deprecated + ``mcmc_sample()`` APIs.""" + return { + 'draws': self.draws, + 'param_names': self.param_names, + 'internal_bumps_object': self.state, + 'logp': self.logp, + } + + +class Sampler: + """Bayesian MCMC sampler for one dataset, backed by a Fitter's BUMPS minimizer. + + One ``Sampler`` instance represents one chain over one ``(x, y, weights)`` + dataset. The data is bound at construction; :meth:`sample` and + :meth:`extend` take no data arguments, so a chain can never be extended + against different data (undefined behaviour in BUMPS). + + Construct directly with a configured ``Fitter`` (or ``MultiFitter``) whose + minimizer has been switched to ``AvailableMinimizers.Bumps``. + + The sampler is BUMPS/DREAM-specific for now: the BUMPS check in + :meth:`_run` is the seam where another backend would plug in. + + Parameters + ---------- + fitter : Fitter + A configured ``Fitter`` (or ``MultiFitter``) whose minimizer has been + switched to ``AvailableMinimizers.Bumps``. + x : Union[np.ndarray, List[np.ndarray]] + Independent variable array (or list of arrays for ``MultiFitter``). + y : Union[np.ndarray, List[np.ndarray]] + Dependent variable array (or list of arrays for ``MultiFitter``). + weights : Union[Optional[np.ndarray], List[Optional[np.ndarray]]], default=None + Weight array (or list of arrays for ``MultiFitter``). + vectorized : bool, default=False + When ``True``, each x array may be multi-dimensional (e.g. an + ``(N, M, 2)`` grid for a 2D model) and is left as-is. + sampler_kwargs : Optional[dict], default=None + Per-instance default keyword arguments forwarded to the BUMPS DREAM + sampler on every run, merged with (and overridden by) per-call + ``sampler_kwargs``. + """ + + def __init__( + self, + fitter: 'Fitter', + x: Union[np.ndarray, List[np.ndarray]], + y: Union[np.ndarray, List[np.ndarray]], + weights: Union[Optional[np.ndarray], List[Optional[np.ndarray]]] = None, + vectorized: bool = False, + sampler_kwargs: Optional[dict] = None, + ): + self._fitter = fitter + self._x = x + self._y = y + self._weights = weights + self._vectorized = vectorized + self._default_sampler_kwargs = dict(sampler_kwargs or {}) + self._state: Any = None # BUMPS MCMCDraw (current chain state) + self._results: Optional[SamplingResults] = None + + @property + def state(self) -> Any: + """Raw BUMPS MCMCDraw state (None before first sample/load_state).""" + return self._state + + @property + def results(self) -> Optional[SamplingResults]: + """Results of the most recent sample/extend/load_state call.""" + return self._results + + @property + def draws(self) -> Optional[np.ndarray]: + """Posterior draws from the most recent run (or None).""" + return self._results.draws if self._results is not None else None + + @property + def param_names(self) -> Optional[List[str]]: + """Parameter names from the most recent run (or None).""" + return self._results.param_names if self._results is not None else None + + @property + def logp(self) -> Optional[np.ndarray]: + """Log-posterior values from the most recent run (or None).""" + return self._results.logp if self._results is not None else None + + def _fingerprint(self) -> Optional[str]: + """SHA-256 fingerprint of the bound (x, y, weights) data, or None.""" + x_list = list(self._x) if isinstance(self._x, (list, tuple)) else [self._x] + y_list = list(self._y) if isinstance(self._y, (list, tuple)) else [self._y] + if self._weights is None: + w_list = [] + elif isinstance(self._weights, (list, tuple)): + w_list = [w for w in self._weights if w is not None] + else: + w_list = [self._weights] + return _data_fingerprint(x_list, y_list, w_list) + + def _run( + self, + samples: int, + burn: int, + thin: int, + population: Optional[int], + resume_state: Optional[Any], + sampler_kwargs: Optional[dict], + progress_callback: Optional[Callable[[dict], Optional[bool]]], + abort_test: Optional[Callable[[], bool]], + ) -> SamplingResults: + """Shared sampling engine for :meth:`sample` and :meth:`extend`. + + Argument validation for ``samples``/``burn``/``thin`` lives in + ``Bumps.mcmc_sample`` (single source of truth). + """ + # Check the minimizer is BUMPS *before* mutating the fitter — a + # non-BUMPS fitter must not be needlessly rebuilt. + minimizer = self._fitter.minimizer + if not (hasattr(minimizer, 'package') and minimizer.package == 'bumps'): + raise RuntimeError( + 'Bayesian sampling requires a BUMPS minimizer. ' + 'Use ``fitter.switch_minimizer(AvailableMinimizers.Bumps)`` first.' + ) + + x_fit, x_new, y_new, w_new, dims = self._fitter._precompute_reshaping( + self._x, self._y, self._weights, self._vectorized + ) + self._fitter._dependent_dims = dims + wrapped = self._fitter._fit_function_wrapper(x_new, flatten=True) + + merged_kwargs = {**self._default_sampler_kwargs, **(sampler_kwargs or {})} + + original_fit_func = self._fitter.fit_function + # Assigning fit_function triggers _update_minimizer() and *rebuilds* + # the minimizer object — it must be re-fetched after this assignment. + self._fitter.fit_function = wrapped + try: + minimizer = self._fitter.minimizer + result = minimizer.mcmc_sample( + x=x_fit, + y=y_new, + weights=w_new, + samples=samples, + burn=burn, + thin=thin, + population=population, + resume_state=resume_state, + sampler_kwargs=merged_kwargs or None, + progress_callback=progress_callback, + abort_test=abort_test, + ) + finally: + self._fitter.fit_function = original_fit_func + + results = SamplingResults( + draws=result['draws'], + param_names=result['param_names'], + logp=result['logp'], + state=result['internal_bumps_object'], + ) + self._state = results.state + self._results = results + return results + + def sample( + self, + samples: int = 10000, + burn: int = 2000, + thin: int = 10, + population: Optional[int] = None, + sampler_kwargs: Optional[dict] = None, + progress_callback: Optional[Callable[[dict], Optional[bool]]] = None, + abort_test: Optional[Callable[[], bool]] = None, + ) -> SamplingResults: + """Run fresh Bayesian MCMC sampling on the bound data. + + Calling ``sample()`` on a sampler that already holds a chain starts a + **fresh** chain — the previous state and results are replaced. Use + :meth:`extend` to continue an existing chain. + + Parameters + ---------- + samples : int, default=10000 + Number of retained DREAM samples requested from BUMPS. + burn : int, default=2000 + Burn-in steps to discard before collecting samples. + thin : int, default=10 + Thinning interval — only every ``thin``-th sample is kept, + which reduces autocorrelation between consecutive draws. + population : Optional[int], default=None + DREAM population **scale factor** (not an absolute chain count): + BUMPS creates ``ceil(population * n_parameters)`` parallel chains. + sampler_kwargs : Optional[dict], default=None + Additional keyword arguments forwarded to the BUMPS DREAM + sampler (merged over the instance defaults). + progress_callback : Optional[Callable[[dict], Optional[bool]]], default=None + Optional callback invoked at each DREAM generation. The payload + dict includes ``iteration`` and ``sampling: True``. + abort_test : Optional[Callable[[], bool]], default=None + Optional callable that returns ``True`` to abort sampling early. + + Returns + ------- + SamplingResults + Structured sampling results (also stored on :attr:`results`). + + Notes + ----- + Exceptions propagate from the sampling engine: ``ValueError`` if + ``samples``, ``burn``, or ``thin`` are invalid, and ``RuntimeError`` + if the active minimizer is not a BUMPS instance. + """ + return self._run( + samples=samples, + burn=burn, + thin=thin, + population=population, + resume_state=None, + sampler_kwargs=sampler_kwargs, + progress_callback=progress_callback, + abort_test=abort_test, + ) + + def extend( + self, + additional_samples: int = 5000, + thin: int = 10, + total_samples: Optional[int] = None, + sampler_kwargs: Optional[dict] = None, + progress_callback: Optional[Callable[[dict], Optional[bool]]] = None, + abort_test: Optional[Callable[[], bool]] = None, + ) -> SamplingResults: + """Continue the existing chain with additional samples. + + DREAM stores draws in a fixed-size ring buffer sized to its + ``samples`` parameter; this method does the ring-buffer arithmetic + for you (``samples = stored_generations * population + + additional_samples``) so no existing draws are lost, regardless of + the thinning interval. Runs with ``burn=0`` — re-burning a converged + chain is usually a mistake. The DREAM population is recovered from + the saved state and cannot be changed on extend. + + Parameters + ---------- + additional_samples : int, default=5000 + Number of additional DREAM samples to draw, in the same units + as ``samples`` in :meth:`sample`. With thinning, approximately + ``additional_samples / thin`` new draws are retained. + thin : int, default=10 + Thinning interval for the retained draws. + total_samples : Optional[int], default=None + Advanced: total retained samples requested from the ring buffer, + **overriding** the ``additional_samples`` arithmetic. With + ``total_samples=N``, only the last N draws are retained. + sampler_kwargs : Optional[dict], default=None + Additional keyword arguments forwarded to the BUMPS DREAM + sampler (merged over the instance defaults). + progress_callback : Optional[Callable[[dict], Optional[bool]]], default=None + Optional callback invoked at each DREAM generation. + abort_test : Optional[Callable[[], bool]], default=None + Optional callable that returns ``True`` to abort sampling early. + + Returns + ------- + SamplingResults + Structured sampling results for the full (extended) chain. + + Raises + ------ + RuntimeError + If there is no chain to extend (call :meth:`sample` or + :meth:`load_state` first), or the minimizer is not BUMPS. + """ + if self._state is None: + raise RuntimeError('No chain to extend. Call sample() or load_state() first.') + if total_samples is None: + # The DREAM ring buffer holds Ngen generations of Npop points + # each. Sizing the new buffer from the stored generations (rather + # than the retained-draw count, which is divided by the thinning + # interval) guarantees no existing draws are dropped for any + # ``thin`` value. + old_samples = int(self._state.Ngen) * int(self._state.Npop) + total_samples = old_samples + int(additional_samples) + return self._run( + samples=total_samples, + burn=0, + thin=thin, + population=None, + resume_state=self._state, + sampler_kwargs=sampler_kwargs, + progress_callback=progress_callback, + abort_test=abort_test, + ) + + def save(self, path: str) -> None: + """Persist the chain state and metadata to disk. + + Writes BUMPS native files (``-chain.mc.gz`` etc.) plus a + ``.params.json`` sidecar with parameter names, the easyscience + version, and a fingerprint of the bound data (verified with a warning + on :meth:`load_state`). + + Parameters + ---------- + path : str + File path prefix. BUMPS appends its own suffixes. + + Raises + ------ + RuntimeError + If no chain state exists yet (call :meth:`sample` first). + """ + if self._state is None: + raise RuntimeError('No chain state to save. Call sample() first.') + save_chain( + self._state, + self._results.param_names if self._results is not None else None, + path, + data_fingerprint=self._fingerprint(), + ) + + def load_state(self, path: str, skip: int = 0) -> SamplingResults: + """Load a previously saved chain into this sampler. + + The sampler must be constructed with the same fitter and data used to + create the chain — :meth:`extend` then continues the saved chain. If + the sidecar carries a data fingerprint and it does not match this + sampler's bound data, a ``UserWarning`` is emitted (extending a chain + against different data is undefined behaviour). + + Populates :attr:`state` and :attr:`results` (draws, log-posterior and + parameter names) from the saved chain, so summaries and + ``PosteriorResults.from_sampler()`` work without resampling. + + Parameters + ---------- + path : str + File path prefix used in :meth:`save`. + skip : int, default=0 + Discard the first ``skip`` saved generations on load. Useful for + trimming additional burn-in without re-sampling. + + Returns + ------- + SamplingResults + The reloaded chain results (also stored on :attr:`results`). + """ + state, param_names, sidecar = load_chain(path, skip=skip) + + saved_fingerprint = sidecar.get('data_fingerprint') + if saved_fingerprint is not None: + current_fingerprint = self._fingerprint() + if current_fingerprint is not None and current_fingerprint != saved_fingerprint: + warnings.warn( + 'The data bound to this Sampler does not match the data ' + 'fingerprint stored with the saved chain. Extending a ' + 'chain against different data is undefined behaviour.', + UserWarning, + stacklevel=2, + ) + + _draw = state.draw() + results = SamplingResults( + draws=_draw.points, + param_names=param_names, + logp=_draw.logp, + state=state, + ) + self._state = state + self._results = results + return results diff --git a/tests/integration/fitting/test_multi_fitter.py b/tests/integration/fitting/test_multi_fitter.py index 2eb9ef32..52955990 100644 --- a/tests/integration/fitting/test_multi_fitter.py +++ b/tests/integration/fitting/test_multi_fitter.py @@ -288,283 +288,5 @@ def test_multi_fit_1D_2D(fit_engine): assert np.all(result.y_obs == Y[idx]) -# --------------------------------------------------------------------------- -# Tests for MultiFitter.mcmc_sample (Bayesian MCMC via BUMPS DREAM) -# --------------------------------------------------------------------------- - - -class TestMultiFitterMcmcSample: - """Integration tests for ``MultiFitter.mcmc_sample``.""" - - def test_raises_runtime_error_when_not_bumps(self): - """mcmc_sample() must raise RuntimeError if the minimizer is not a BUMPS instance.""" - sp = AbsSin(0.354, 3.05) - f = MultiFitter([sp], [sp]) - - x = np.linspace(0, 5, 50) - y = np.sin(x) - weights = np.ones_like(x) - - with pytest.raises(RuntimeError, match='Bayesian sampling requires a BUMPS minimizer'): - f.mcmc_sample(x=[x], y=[y], weights=[weights], samples=10, burn=5, thin=1) - - @pytest.mark.filterwarnings('ignore::UserWarning') - def test_returns_expected_keys_and_shapes(self): - """mcmc_sample() with BUMPS should return draws, param_names, state, logp.""" - ref_sin = AbsSin(0.2, np.pi) - sp = AbsSin(0.354, 3.05) - - x = np.linspace(0, 5, 50) - y = ref_sin(x) - weights = np.ones_like(x) - - sp.offset.fixed = False - sp.phase.fixed = False - - f = MultiFitter([sp], [sp]) - try: - f.switch_minimizer('Bumps') - except AttributeError: - pytest.skip('BUMPS is not installed') - - result = f.mcmc_sample(x=[x], y=[y], weights=[weights], samples=100, burn=20, thin=2) - - assert isinstance(result, dict) - assert 'draws' in result - assert 'param_names' in result - assert 'internal_bumps_object' in result - assert 'logp' in result - - # draws shape: (retained_samples, n_params) - assert result['draws'].ndim == 2 - assert result['draws'].shape[1] == len(result['param_names']) - - # param_names should match the model's fit parameters - expected_pars = {p.unique_name for p in sp.get_fit_parameters()} - assert set(result['param_names']) == expected_pars - - @pytest.mark.filterwarnings('ignore::UserWarning') - def test_multi_dataset_returns_consistent_param_names(self): - """mcmc_sample() with multiple datasets should have correct param_names across all.""" - ref_sin_1 = AbsSin(0.2, np.pi) - sp_sin_1 = AbsSin(0.354, 3.05) - sp_line = Line(0.43, 6.1) - - # Link a parameter across models - sp_line.m.make_dependent_on( - dependency_expression='sp_sin1', dependency_map={'sp_sin1': sp_sin_1.offset} - ) - - x1 = np.linspace(0, 5, 50) - y1 = ref_sin_1(x1) - x2 = np.copy(x1) - y2 = Line(1, 4.6)(x2) - weights = np.ones_like(x1) - - sp_sin_1.offset.fixed = False - sp_sin_1.phase.fixed = False - sp_line.c.fixed = False - - f = MultiFitter([sp_sin_1, sp_line], [sp_sin_1, sp_line]) - try: - f.switch_minimizer('Bumps') - except AttributeError: - pytest.skip('BUMPS is not installed') - - result = f.mcmc_sample( - x=[x1, x2], y=[y1, y2], weights=[weights, weights], samples=100, burn=20, thin=2 - ) - - # All parameters across both models should appear - all_params = {p.unique_name for p in sp_sin_1.get_fit_parameters()} - all_params |= {p.unique_name for p in sp_line.get_fit_parameters()} - assert set(result['param_names']) == all_params - - def test_population_param_controls_chain_count(self): - """Passing population should succeed and produce valid draws.""" - sp = AbsSin(0.354, 3.05) - sp.offset.fixed = False - sp.phase.fixed = False - f = MultiFitter([sp], [sp]) - try: - f.switch_minimizer('Bumps') - except AttributeError: - pytest.skip('BUMPS is not installed') - - x = np.linspace(0, 5, 50) - y = np.sin(x) - weights = np.ones_like(x) - - result = f.mcmc_sample( - x=[x], y=[y], weights=[weights], samples=100, burn=20, thin=2, population=5 - ) - assert 'draws' in result - - @pytest.mark.filterwarnings('ignore::UserWarning') - def test_vectorized_2d_input_produces_valid_draws(self): - """mcmc_sample() with vectorized=True and 2D input should produce valid draws.""" - sp = AbsSin2D(0.1, 1.75) - - x = np.linspace(0, 5, 50) - X, Y = np.meshgrid(x, x) - x2D = np.stack((X, Y), axis=2) - y2D = np.abs(np.sin(X)) * np.abs(np.sin(Y)) - weights = np.ones_like(y2D) - - sp.offset.fixed = False - sp.phase.fixed = False - - f = MultiFitter([sp], [sp]) - try: - f.switch_minimizer('Bumps') - except AttributeError: - pytest.skip('BUMPS is not installed') - - result = f.mcmc_sample( - x=[x2D], y=[y2D], weights=[weights], samples=100, burn=20, thin=2, vectorized=True - ) - - assert result['draws'].ndim == 2 - assert result['draws'].shape[0] > 0 - assert result['draws'].shape[1] == len(result['param_names']) - - def test_fit_function_restored_after_runtime_error(self): - """fit_function must be restored to its original value even when mcmc_sample() raises.""" - sp = AbsSin(0.354, 3.05) - f = MultiFitter([sp], [sp]) - - x = np.linspace(0, 5, 50) - y = np.sin(x) - weights = np.ones_like(x) - - original_func = f.fit_function - - with pytest.raises(RuntimeError): - f.mcmc_sample(x=[x], y=[y], weights=[weights], samples=10, burn=5, thin=1) - - assert f.fit_function is original_func - - @pytest.mark.filterwarnings('ignore::UserWarning') - def test_fit_function_restored_after_successful_sample(self): - """fit_function must be restored to its original value after a successful mcmc_sample().""" - ref_sin = AbsSin(0.2, np.pi) - sp = AbsSin(0.354, 3.05) - - x = np.linspace(0, 5, 50) - y = ref_sin(x) - weights = np.ones_like(x) - - sp.offset.fixed = False - sp.phase.fixed = False - - f = MultiFitter([sp], [sp]) - try: - f.switch_minimizer('Bumps') - except AttributeError: - pytest.skip('BUMPS is not installed') - - original_func = f.fit_function - f.mcmc_sample(x=[x], y=[y], weights=[weights], samples=100, burn=20, thin=2) - assert f.fit_function is original_func - - @pytest.mark.filterwarnings('ignore::UserWarning') - def test_sampler_kwargs_forwarded(self): - """sampler_kwargs dict is forwarded to the BUMPS DREAM sampler.""" - ref_sin = AbsSin(0.2, np.pi) - sp = AbsSin(0.354, 3.05) - - x = np.linspace(0, 5, 50) - y = ref_sin(x) - weights = np.ones_like(x) - - sp.offset.fixed = False - sp.phase.fixed = False - - f = MultiFitter([sp], [sp]) - try: - f.switch_minimizer('Bumps') - except AttributeError: - pytest.skip('BUMPS is not installed') - - result = f.mcmc_sample( - x=[x], - y=[y], - weights=[weights], - samples=100, - burn=20, - thin=2, - sampler_kwargs={'init': 'random'}, - ) - - assert result['draws'].ndim == 2 - assert result['draws'].shape[0] > 0 - - def _bumps_sampler(self): - """Build a 2-parameter BUMPS sampler over a small QENS-like model.""" - ref_sin = AbsSin(0.2, np.pi) - sp = AbsSin(0.354, 3.05) - sp.offset.fixed = False - sp.phase.fixed = False - x = np.linspace(0, 5, 50) - y = ref_sin(x) - weights = np.ones_like(x) - f = MultiFitter([sp], [sp]) - try: - f.switch_minimizer('Bumps') - except AttributeError: - pytest.skip('BUMPS is not installed') - return f, x, y, weights - - @pytest.mark.filterwarnings('ignore::UserWarning') - def test_resume_nondefault_population(self): - """Resuming a chain that used a non-default population must not fail. - - Regression test: the recovered population scale factor must be used - when configuring the resumed sampler, otherwise BUMPS regenerates the - default population and raises ``Cannot change Nvar, Npop or Ncr on - resize``. - """ - f, x, y, weights = self._bumps_sampler() - first = f.mcmc_sample( - x=[x], y=[y], weights=[weights], samples=100, burn=20, thin=1, population=5 - ) - first_state = first['internal_bumps_object'] - - extended = f.mcmc_sample( - x=[x], y=[y], weights=[weights], samples=200, burn=0, thin=1, resume_state=first_state - ) - - assert extended['draws'].shape[1] == first['draws'].shape[1] - assert extended['draws'].shape[0] > 0 - # Population must be preserved across the resume. - assert extended['internal_bumps_object'].Npop == first_state.Npop - - def test_resume_after_save_load_roundtrip(self, tmp_path): - """A chain saved and reloaded from disk must resume. - - Regression test: BUMPS ``load_state`` does not preserve parameter - labels (they come back as ``['P0', 'P1', ...]``), so name-based - validation must not reject a reloaded state. Resume should proceed - (matching by parameter order) with a warning instead of raising. - - Also guards the BUMPS 1.0.4 single-row buffer regression handled by - ``load_sampler_state`` (a short chain stores one CR-weight row, which - BUMPS' numpy-based reader would otherwise collapse to a 1-D array). - """ - from easyscience.fitting.minimizers import load_sampler_state - from easyscience.fitting.minimizers import save_sampler_state - - f, x, y, weights = self._bumps_sampler() - first = f.mcmc_sample(x=[x], y=[y], weights=[weights], samples=100, burn=20, thin=1) - - prefix = str(tmp_path / 'chain') - save_sampler_state(first['internal_bumps_object'], prefix) - reloaded = load_sampler_state(prefix) - - with pytest.warns(UserWarning, match='does not carry parameter names'): - extended = f.mcmc_sample( - x=[x], y=[y], weights=[weights], samples=200, burn=0, thin=1, resume_state=reloaded - ) - - assert extended['draws'].shape[1] == first['draws'].shape[1] - assert extended['draws'].shape[0] > 0 +# NOTE: The Bayesian MCMC tests formerly here (TestMultiFitterMcmcSample) were +# moved to test_sampler.py and adapted to ``Sampler(f, ...)``. diff --git a/tests/integration/fitting/test_sampler.py b/tests/integration/fitting/test_sampler.py new file mode 100644 index 00000000..c8486b52 --- /dev/null +++ b/tests/integration/fitting/test_sampler.py @@ -0,0 +1,451 @@ +# SPDX-FileCopyrightText: 2026 EasyScience contributors +# SPDX-License-Identifier: BSD-3-Clause +"""Integration tests for the ``Sampler`` class (Bayesian MCMC via BUMPS DREAM). + +The behavioural MCMC tests formerly in +``test_multi_fitter.py::TestMultiFitterMcmcSample`` were moved here and +adapted to ``Sampler(f, ...)`` directly. +""" + +import json + +import numpy as np +import pytest + +from easyscience import ObjBase +from easyscience import Parameter +from easyscience.fitting import Sampler +from easyscience.fitting import SamplingResults +from easyscience.fitting.multi_fitter import MultiFitter + + +class Line(ObjBase): + m: Parameter + c: Parameter + + def __init__(self, m_val: float, c_val: float): + m = Parameter('m', m_val) + c = Parameter('c', c_val) + super(Line, self).__init__('line', m=m, c=c) + + def __call__(self, x): + return self.m.value * x + self.c.value + + +class AbsSin(ObjBase): + phase: Parameter + offset: Parameter + + def __init__(self, offset_val: float, phase_val: float): + offset = Parameter('offset', offset_val) + phase = Parameter('phase', phase_val) + super().__init__('sin', offset=offset, phase=phase) + + def __call__(self, x): + return np.abs(np.sin(self.phase.value * x + self.offset.value)) + + +class AbsSin2D(ObjBase): + phase: Parameter + offset: Parameter + + def __init__(self, offset_val: float, phase_val: float): + offset = Parameter('offset', offset_val) + phase = Parameter('phase', phase_val) + super().__init__('sin2D', offset=offset, phase=phase) + + def __call__(self, x): + X = x[:, :, 0] # x is a 2D array + Y = x[:, :, 1] + return np.abs(np.sin(self.phase.value * X + self.offset.value)) * np.abs( + np.sin(self.phase.value * Y + self.offset.value) + ) + + +def _bumps_fitter_and_data(): + """Build a 2-parameter BUMPS MultiFitter over a small sine model.""" + ref_sin = AbsSin(0.2, np.pi) + sp = AbsSin(0.354, 3.05) + sp.offset.fixed = False + sp.phase.fixed = False + x = np.linspace(0, 5, 50) + y = ref_sin(x) + weights = np.ones_like(x) + f = MultiFitter([sp], [sp]) + try: + f.switch_minimizer('Bumps') + except AttributeError: + pytest.skip('BUMPS is not installed') + return f, sp, x, y, weights + + +class TestSampler: + """Integration tests for ``Sampler(f, ...)`` / ``Sampler``.""" + + def test_sample_requires_bumps(self): + """sample() must raise RuntimeError if the minimizer is not BUMPS — + and must not mutate the fitter (no needless minimizer rebuild).""" + sp = AbsSin(0.354, 3.05) + f = MultiFitter([sp], [sp]) + + x = np.linspace(0, 5, 50) + y = np.sin(x) + weights = np.ones_like(x) + + sampler = Sampler(f, [x], [y], [weights]) + minimizer_before = f.minimizer + with pytest.raises(RuntimeError, match='Bayesian sampling requires a BUMPS minimizer'): + sampler.sample(samples=10, burn=5, thin=1) + assert f.minimizer is minimizer_before + + @pytest.mark.filterwarnings('ignore::UserWarning') + def test_sample_returns_results_object(self): + """sample() returns a populated SamplingResults; to_legacy_dict() has the legacy shape.""" + f, sp, x, y, weights = _bumps_fitter_and_data() + sampler = Sampler(f, [x], [y], [weights]) + + results = sampler.sample(samples=100, burn=20, thin=2) + + assert isinstance(results, SamplingResults) + assert results.draws.ndim == 2 + assert results.draws.shape[1] == len(results.param_names) + assert results.logp.shape[0] == results.draws.shape[0] + assert results.state is not None + + # param_names should match the model's fit parameters + expected_pars = {p.unique_name for p in sp.get_fit_parameters()} + assert set(results.param_names) == expected_pars + + # results cached on the sampler + assert sampler.results is results + assert sampler.state is results.state + assert sampler.draws is results.draws + assert sampler.param_names == results.param_names + + # legacy dict shape + legacy = results.to_legacy_dict() + assert set(legacy.keys()) == {'draws', 'param_names', 'internal_bumps_object', 'logp'} + assert legacy['internal_bumps_object'] is results.state + + @pytest.mark.filterwarnings('ignore::UserWarning') + def test_sample_multi_dataset(self): + """Multi-dataset sampling via Sampler(f, ...) has correct param_names.""" + ref_sin_1 = AbsSin(0.2, np.pi) + sp_sin_1 = AbsSin(0.354, 3.05) + sp_line = Line(0.43, 6.1) + + # Link a parameter across models + sp_line.m.make_dependent_on( + dependency_expression='sp_sin1', dependency_map={'sp_sin1': sp_sin_1.offset} + ) + + x1 = np.linspace(0, 5, 50) + y1 = ref_sin_1(x1) + x2 = np.copy(x1) + y2 = Line(1, 4.6)(x2) + weights = np.ones_like(x1) + + sp_sin_1.offset.fixed = False + sp_sin_1.phase.fixed = False + sp_line.c.fixed = False + + f = MultiFitter([sp_sin_1, sp_line], [sp_sin_1, sp_line]) + try: + f.switch_minimizer('Bumps') + except AttributeError: + pytest.skip('BUMPS is not installed') + + sampler = Sampler(f, [x1, x2], [y1, y2], [weights, weights]) + results = sampler.sample(samples=100, burn=20, thin=2) + + # All parameters across both models should appear + all_params = {p.unique_name for p in sp_sin_1.get_fit_parameters()} + all_params |= {p.unique_name for p in sp_line.get_fit_parameters()} + assert set(results.param_names) == all_params + + def test_sample_population(self): + """Passing population should succeed and produce valid draws.""" + f, _, x, y, weights = _bumps_fitter_and_data() + sampler = Sampler(f, [x], [y], [weights]) + + results = sampler.sample(samples=100, burn=20, thin=2, population=5) + assert results.draws.shape[0] > 0 + + @pytest.mark.filterwarnings('ignore::UserWarning') + def test_sample_vectorized_2d(self): + """sample() with vectorized=True and 2D input should produce valid draws.""" + sp = AbsSin2D(0.1, 1.75) + + x = np.linspace(0, 5, 50) + X, Y = np.meshgrid(x, x) + x2D = np.stack((X, Y), axis=2) + y2D = np.abs(np.sin(X)) * np.abs(np.sin(Y)) + weights = np.ones_like(y2D) + + sp.offset.fixed = False + sp.phase.fixed = False + + f = MultiFitter([sp], [sp]) + try: + f.switch_minimizer('Bumps') + except AttributeError: + pytest.skip('BUMPS is not installed') + + sampler = Sampler(f, [x2D], [y2D], [weights], vectorized=True) + results = sampler.sample(samples=100, burn=20, thin=2) + + assert results.draws.ndim == 2 + assert results.draws.shape[0] > 0 + assert results.draws.shape[1] == len(results.param_names) + + def test_fit_function_restored_on_error(self): + """fit_function must be restored even when the minimizer raises.""" + f, _, x, y, weights = _bumps_fitter_and_data() + sampler = Sampler(f, [x], [y], [weights]) + original_func = f.fit_function + + # Invalid `samples` is rejected by the minimizer (single source of + # validation) *after* the fitter has been mutated for sampling. + with pytest.raises(ValueError, match='samples must be a positive integer'): + sampler.sample(samples=-1, burn=5, thin=1) + + assert f.fit_function is original_func + + @pytest.mark.filterwarnings('ignore::UserWarning') + def test_fit_function_restored_on_success(self): + """fit_function must be restored after a successful sample().""" + f, _, x, y, weights = _bumps_fitter_and_data() + sampler = Sampler(f, [x], [y], [weights]) + original_func = f.fit_function + + sampler.sample(samples=100, burn=20, thin=2) + assert f.fit_function is original_func + + @pytest.mark.filterwarnings('ignore::UserWarning') + def test_sampler_kwargs_forwarded(self): + """Per-call sampler_kwargs dict is forwarded to the BUMPS DREAM sampler.""" + f, _, x, y, weights = _bumps_fitter_and_data() + sampler = Sampler(f, [x], [y], [weights]) + + results = sampler.sample(samples=100, burn=20, thin=2, sampler_kwargs={'init': 'random'}) + + assert results.draws.ndim == 2 + assert results.draws.shape[0] > 0 + + @pytest.mark.filterwarnings('ignore::UserWarning') + def test_default_sampler_kwargs_merged(self): + """Constructor-level sampler_kwargs defaults are used; per-call kwargs win.""" + f, _, x, y, weights = _bumps_fitter_and_data() + sampler = Sampler(f, [x], [y], [weights], sampler_kwargs={'init': 'random'}) + + captured = {} + original_mcmc_sample = type(f.minimizer).mcmc_sample + + def spy(self, **kwargs): + captured.update(kwargs.get('sampler_kwargs') or {}) + return original_mcmc_sample(self, **kwargs) + + try: + type(f.minimizer).mcmc_sample = spy + sampler.sample(samples=100, burn=20, thin=2) + assert captured == {'init': 'random'} + + captured.clear() + sampler.sample(samples=100, burn=20, thin=2, sampler_kwargs={'init': 'lhs'}) + assert captured == {'init': 'lhs'} # per-call overrides default + finally: + type(f.minimizer).mcmc_sample = original_mcmc_sample + + @pytest.mark.filterwarnings('ignore::UserWarning') + def test_extend_chain(self): + """extend(additional_samples=) continues the chain; ring-buffer math is done for the user.""" + f, _, x, y, weights = _bumps_fitter_and_data() + sampler = Sampler(f, [x], [y], [weights]) + + first = sampler.sample(samples=100, burn=20, thin=1) + n_first = first.draws.shape[0] + + extended = sampler.extend(additional_samples=100, thin=1) + + assert extended.draws.shape[1] == first.draws.shape[1] + assert extended.draws.shape[0] >= n_first + assert sampler.results is extended + + @pytest.mark.filterwarnings('ignore::UserWarning') + def test_extend_with_thinning_keeps_existing_draws(self): + """extend() must not drop stored draws when thin > 1. + + Regression test: the ring buffer is sized from the state's stored + generations (``Ngen * Npop``), not from the retained-draw count, + which BUMPS divides by the thinning interval. + """ + f, _, x, y, weights = _bumps_fitter_and_data() + sampler = Sampler(f, [x], [y], [weights]) + + first = sampler.sample(samples=1000, burn=20, thin=10) + n_first = first.draws.shape[0] + + extended = sampler.extend(additional_samples=1000, thin=10) + + # All previously retained draws are kept, plus ~additional/thin new ones. + assert extended.draws.shape[0] > n_first + + @pytest.mark.filterwarnings('ignore::UserWarning') + def test_extend_total_samples_override(self): + """extend(total_samples=) bypasses the additional_samples arithmetic.""" + f, _, x, y, weights = _bumps_fitter_and_data() + sampler = Sampler(f, [x], [y], [weights]) + + sampler.sample(samples=100, burn=20, thin=1) + extended = sampler.extend(total_samples=150, thin=1) + + assert extended.draws.shape[0] > 0 + + def test_extend_requires_existing_state(self): + """extend() before sample()/load_state() raises RuntimeError.""" + f, _, x, y, weights = _bumps_fitter_and_data() + sampler = Sampler(f, [x], [y], [weights]) + + with pytest.raises(RuntimeError, match='No chain to extend'): + sampler.extend(additional_samples=10) + + @pytest.mark.filterwarnings('ignore::UserWarning') + def test_extend_after_save_load_roundtrip(self, tmp_path): + """save() -> new sampler -> load_state() -> extend() continues the chain. + + Regression coverage: BUMPS ``load_state`` does not preserve parameter + labels, so resume validation must proceed by parameter order (with a + warning) instead of raising. + """ + f, _, x, y, weights = _bumps_fitter_and_data() + sampler = Sampler(f, [x], [y], [weights]) + first = sampler.sample(samples=100, burn=20, thin=1) + + prefix = str(tmp_path / 'chain') + sampler.save(prefix) + + sampler2 = Sampler(f, [x], [y], [weights]) + loaded = sampler2.load_state(prefix) + assert loaded.draws.shape[1] == first.draws.shape[1] + + with pytest.warns(UserWarning, match='does not carry parameter names'): + extended = sampler2.extend(additional_samples=100, thin=1) + + assert extended.draws.shape[1] == first.draws.shape[1] + assert extended.draws.shape[0] > 0 + + @pytest.mark.filterwarnings('ignore::UserWarning') + def test_extend_preserves_nondefault_population(self): + """Extending a chain that used a non-default population must not fail. + + Regression test: the population scale factor is recovered from the + saved state on resume, otherwise BUMPS regenerates the default + population and raises ``Cannot change Nvar, Npop or Ncr on resize``. + """ + f, _, x, y, weights = _bumps_fitter_and_data() + sampler = Sampler(f, [x], [y], [weights]) + + first = sampler.sample(samples=100, burn=20, thin=1, population=5) + first_npop = first.state.Npop + + extended = sampler.extend(additional_samples=100, thin=1) + + assert extended.draws.shape[1] == first.draws.shape[1] + assert extended.draws.shape[0] > 0 + # Population must be preserved across the extend. + assert extended.state.Npop == first_npop + + def test_save_raises_without_state(self, tmp_path): + """save() before sample() raises RuntimeError.""" + f, _, x, y, weights = _bumps_fitter_and_data() + sampler = Sampler(f, [x], [y], [weights]) + + with pytest.raises(RuntimeError, match='No chain state to save'): + sampler.save(str(tmp_path / 'chain')) + + @pytest.mark.filterwarnings('ignore::UserWarning') + def test_load_state_populates_results(self, tmp_path): + """A freshly loaded sampler reports draws/logp/param_names without resampling.""" + f, sp, x, y, weights = _bumps_fitter_and_data() + sampler = Sampler(f, [x], [y], [weights]) + first = sampler.sample(samples=100, burn=20, thin=2) + + prefix = str(tmp_path / 'chain') + sampler.save(prefix) + + sampler2 = Sampler(f, [x], [y], [weights]) + assert sampler2.draws is None + loaded = sampler2.load_state(prefix) + + assert sampler2.state is loaded.state + assert loaded.draws.ndim == 2 + assert loaded.draws.shape[1] == first.draws.shape[1] + assert loaded.logp.shape[0] == loaded.draws.shape[0] + # Sidecar restores the original (prefix-stripped) parameter names. + assert loaded.param_names == first.param_names + + @pytest.mark.filterwarnings('ignore::UserWarning') + def test_load_restores_param_names_from_sidecar(self, tmp_path): + """v2 sidecar restores names; a legacy v1 sidecar is still accepted.""" + f, _, x, y, weights = _bumps_fitter_and_data() + sampler = Sampler(f, [x], [y], [weights]) + first = sampler.sample(samples=100, burn=20, thin=2) + + prefix = str(tmp_path / 'chain') + sampler.save(prefix) + + # v2 sidecar written by Sampler.save() + with open(f'{prefix}.params.json') as fh: + sidecar = json.load(fh) + assert sidecar['schema_version'] == 2 + assert sidecar['param_names'] == first.param_names + assert sidecar['data_fingerprint'] is not None + + # Rewrite as a legacy v1 sidecar (as written by old save_posterior) + with open(f'{prefix}.params.json', 'w') as fh: + json.dump( + { + 'schema_version': 1, + 'param_names': first.param_names, + 'easyreflectometry_version': '0.0.0', + }, + fh, + ) + + sampler2 = Sampler(f, [x], [y], [weights]) + loaded = sampler2.load_state(prefix) + assert loaded.param_names == first.param_names + + @pytest.mark.filterwarnings('ignore::UserWarning') + def test_load_short_chain_regression(self, tmp_path): + """Short chain save/load — pins the BUMPS >= 1.0.4 loadtxt workaround. + + A short chain stores a single CR-weight update row; BUMPS' numpy-based + reader collapses it to a 1-D array and ``load_state`` raises + ``IndexError`` without the 2-D coercion workaround. + """ + f, _, x, y, weights = _bumps_fitter_and_data() + sampler = Sampler(f, [x], [y], [weights]) + sampler.sample(samples=20, burn=5, thin=1) + + prefix = str(tmp_path / 'short_chain') + sampler.save(prefix) + + sampler2 = Sampler(f, [x], [y], [weights]) + loaded = sampler2.load_state(prefix) + assert loaded.draws.shape[0] > 0 + + @pytest.mark.filterwarnings('ignore::UserWarning') + def test_load_fingerprint_mismatch_warns(self, tmp_path): + """Loading a chain into a sampler bound to different data warns.""" + f, _, x, y, weights = _bumps_fitter_and_data() + sampler = Sampler(f, [x], [y], [weights]) + sampler.sample(samples=100, burn=20, thin=2) + + prefix = str(tmp_path / 'chain') + sampler.save(prefix) + + other_y = y + 0.5 + sampler2 = Sampler(f, [x], [other_y], [weights]) + with pytest.warns(UserWarning, match='does not match the data fingerprint'): + sampler2.load_state(prefix) diff --git a/tests/unit/fitting/minimizers/test_minimizer_bumps.py b/tests/unit/fitting/minimizers/test_minimizer_bumps.py index 591a5c18..c69359a3 100644 --- a/tests/unit/fitting/minimizers/test_minimizer_bumps.py +++ b/tests/unit/fitting/minimizers/test_minimizer_bumps.py @@ -757,6 +757,31 @@ def _setup_driver_mock( ) return mock_FitDriver, mock_driver + @pytest.mark.parametrize( + 'kwargs, match', + [ + ({'samples': 0}, 'samples must be a positive integer'), + ({'samples': -1}, 'samples must be a positive integer'), + ({'burn': -1}, 'burn must be a non-negative integer'), + ({'thin': 0}, 'thin must be a positive integer'), + ], + ) + def test_sample_invalid_args(self, minimizer: Bumps, kwargs, match) -> None: + """Invalid samples/burn/thin values raise ValueError before any sampling. + + This is the single source of truth for these checks — the higher-level + ``Sampler`` relies on it. + """ + with pytest.raises(ValueError, match=match): + minimizer.mcmc_sample( + x=np.array([1.0]), + y=np.array([0.1]), + weights=np.array([1.0]), + samples=kwargs.get('samples', 10), + burn=kwargs.get('burn', 0), + thin=kwargs.get('thin', 1), + ) + def test_sample_basic(self, minimizer: Bumps, monkeypatch) -> None: """Verify that mcmc_sample() returns a dict with expected keys.""" mock_FitDriver, _ = self._setup_driver_mock(monkeypatch) diff --git a/tests/unit/fitting/test_fitter.py b/tests/unit/fitting/test_fitter.py index 634492c5..702f5e59 100644 --- a/tests/unit/fitting/test_fitter.py +++ b/tests/unit/fitting/test_fitter.py @@ -258,156 +258,3 @@ def test_post_compute_reshaping(self, fitter: Fitter): # TODO # def test_fit_function_wrapper() # def test_precompute_reshaping() - - -# =================================================================== -# Fitter.mcmc_sample() — Bayesian DREAM sampling -# =================================================================== - - -class TestFitterMcmcSample: - @pytest.fixture - def fitter(self, monkeypatch): - monkeypatch.setattr(Fitter, '_update_minimizer', MagicMock()) - return Fitter(MagicMock(), MagicMock()) - - def test_basic(self, fitter: Fitter): - """mcmc_sample() calls minimizer.mcmc_sample() and returns its result.""" - fitter._precompute_reshaping = MagicMock( - return_value=('x_fit', 'x_new', 'y_new', 'w_new', 'dims') - ) - fitter._fit_function_wrapper = MagicMock(return_value='wrapped') - fitter._minimizer = MagicMock() - fitter._minimizer.package = 'bumps' - expected = { - 'draws': np.array([[1.0]]), - 'param_names': ['a'], - 'internal_bumps_object': 'stub', - 'logp': None, - } - fitter._minimizer.mcmc_sample = MagicMock(return_value=expected) - - result = fitter.mcmc_sample( - x=np.array([1.0]), - y=np.array([0.1]), - weights=np.array([1.0]), - samples=100, - burn=20, - thin=2, - population=5, - ) - - assert result == expected - fitter._precompute_reshaping.assert_called_once_with( - np.array([1.0]), np.array([0.1]), np.array([1.0]), False - ) - fitter._fit_function_wrapper.assert_called_once_with('x_new', flatten=True) - fitter._minimizer.mcmc_sample.assert_called_once() - kw = fitter._minimizer.mcmc_sample.call_args.kwargs - assert kw['x'] == 'x_fit' - assert kw['y'] == 'y_new' - assert kw['weights'] == 'w_new' - assert kw['samples'] == 100 - assert kw['burn'] == 20 - assert kw['thin'] == 2 - assert kw['population'] == 5 - assert kw['progress_callback'] is None - assert fitter._dependent_dims == 'dims' - - def test_raises_if_not_bumps(self, fitter: Fitter): - """RuntimeError raised when the active minimizer is not BUMPS.""" - fitter._precompute_reshaping = MagicMock( - return_value=('x_fit', 'x_new', 'y_new', 'w_new', 'dims') - ) - fitter._fit_function_wrapper = MagicMock(return_value='wrapped') - fitter._minimizer = MagicMock() - fitter._minimizer.package = 'lmfit' - - with pytest.raises(RuntimeError, match='Bayesian sampling requires a BUMPS minimizer'): - fitter.mcmc_sample(x=np.array([1.0]), y=np.array([0.1]), weights=np.array([1.0])) - - def test_progress_callback_forwarded(self, fitter: Fitter): - """progress_callback is forwarded to minimizer.mcmc_sample().""" - fitter._precompute_reshaping = MagicMock( - return_value=('x_fit', 'x_new', 'y_new', 'w_new', 'dims') - ) - fitter._fit_function_wrapper = MagicMock(return_value='wrapped') - fitter._minimizer = MagicMock() - fitter._minimizer.package = 'bumps' - fitter._minimizer.mcmc_sample = MagicMock( - return_value={ - 'draws': [], - 'param_names': [], - 'internal_bumps_object': None, - 'logp': None, - } - ) - cb = MagicMock() - - fitter.mcmc_sample( - x=np.array([1.0]), - y=np.array([0.1]), - weights=np.array([1.0]), - progress_callback=cb, - ) - - assert fitter._minimizer.mcmc_sample.call_args.kwargs['progress_callback'] is cb - - def test_fit_function_restored_on_success(self, fitter: Fitter): - """Original fit function is restored after a successful call.""" - fitter._precompute_reshaping = MagicMock( - return_value=('x_fit', 'x_new', 'y_new', 'w_new', 'dims') - ) - fitter._fit_function_wrapper = MagicMock(return_value='wrapped') - fitter._minimizer = MagicMock() - fitter._minimizer.package = 'bumps' - fitter._minimizer.mcmc_sample = MagicMock( - return_value={ - 'draws': [], - 'param_names': [], - 'internal_bumps_object': None, - 'logp': None, - } - ) - original = fitter._fit_function - - fitter.mcmc_sample(x=np.array([1.0]), y=np.array([0.1]), weights=np.array([1.0])) - - assert fitter._fit_function is original - - def test_fit_function_restored_on_error(self, fitter: Fitter): - """Original fit function is restored even when minimizer raises.""" - fitter._precompute_reshaping = MagicMock( - return_value=('x_fit', 'x_new', 'y_new', 'w_new', 'dims') - ) - fitter._fit_function_wrapper = MagicMock(return_value='wrapped') - fitter._minimizer = MagicMock() - fitter._minimizer.package = 'bumps' - fitter._minimizer.mcmc_sample = MagicMock(side_effect=RuntimeError('boom')) - original = fitter._fit_function - - with pytest.raises(RuntimeError): - fitter.mcmc_sample(x=np.array([1.0]), y=np.array([0.1]), weights=np.array([1.0])) - - assert fitter._fit_function is original - - @pytest.mark.parametrize( - 'kwargs, match', - [ - ({'samples': 0}, 'samples must be a positive integer'), - ({'samples': -1}, 'samples must be a positive integer'), - ({'burn': -1}, 'burn must be a non-negative integer'), - ({'thin': 0}, 'thin must be a positive integer'), - ], - ) - def test_invalid_args_raise(self, fitter: Fitter, kwargs, match): - """Invalid samples/burn/thin values raise ValueError before any I/O.""" - with pytest.raises(ValueError, match=match): - fitter.mcmc_sample( - x=np.array([1.0]), - y=np.array([0.1]), - weights=np.array([1.0]), - samples=kwargs.get('samples', 10), - burn=kwargs.get('burn', 0), - thin=kwargs.get('thin', 1), - )