Skip to content

Bayesian extend/resume#257

Open
rozyczko wants to merge 9 commits into
developfrom
bayesian_extend
Open

Bayesian extend/resume#257
rozyczko wants to merge 9 commits into
developfrom
bayesian_extend

Conversation

@rozyczko

@rozyczko rozyczko commented Jun 4, 2026

Copy link
Copy Markdown
Member

Support for resuming and extending Bayesian MCMC chains using the BUMPS DREAM sampler.
Added ability to save and reload sampler state and continue chains across sessions.

  • Resume and extend MCMC chains:

    • Added resume_state parameter to mcmc_sample and sample methods in both fitter.py and minimizer_bumps.py, allowing users to continue sampling from a saved chain state. This includes documentation of the "ring-buffer contract" for extending chains without losing samples.
  • Population/chains argument resolution:

    • Introduced _resolve_population_alias method to handle both chains and population arguments, raising an error if both are set with different values, improving API clarity and preventing silent bugs.

@codecov

codecov Bot commented Jun 4, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 95.23810% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.78%. Comparing base (aadbd48) to head (28e0c53).

Files with missing lines Patch % Lines
.../easyscience/fitting/minimizers/minimizer_bumps.py 94.73% 3 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff             @@
##           develop     #257      +/-   ##
===========================================
+ Coverage    82.65%   82.78%   +0.12%     
===========================================
  Files           62       62              
  Lines         5023     5078      +55     
===========================================
+ Hits          4152     4204      +52     
- Misses         871      874       +3     
Flag Coverage Δ
integration 44.30% <84.12%> (+0.43%) ⬆️
unittests 82.02% <74.60%> (-0.13%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
src/easyscience/fitting/fitter.py 97.33% <100.00%> (+0.05%) ⬆️
src/easyscience/fitting/minimizers/__init__.py 100.00% <100.00%> (ø)
src/easyscience/fitting/multi_fitter.py 98.52% <ø> (-0.03%) ⬇️
.../easyscience/fitting/minimizers/minimizer_bumps.py 95.63% <94.73%> (-0.32%) ⬇️

@rozyczko rozyczko added [scope] enhancement Adds/improves features (major.MINOR.patch) [priority] high Should be prioritized soon labels Jun 4, 2026
@rozyczko rozyczko marked this pull request as ready for review June 8, 2026 07:51

@damskii9992 damskii9992 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just gonna stop my review here as there is already plenty to work on.
As we discussed in-person, I also think we should refactor the bayesian sampling to a seperate "samplers" module. This should help us when we add other Bayesian samplers in the future as well as provide a cleaner user API (the user doesn't create a "Fitter" when doing sampling).
It also more cleanly separates responsibility in the code. A Minimizer minimizes a function to the data, a Sampler samples the parameters to create distributions.
To avoid duplicating code, it might be smart to move shared functionality to new base-classes.

Comment on lines +267 to +300
@staticmethod
def _resolve_population_alias(
chains: int | None = None,
population: int | None = None,
) -> int | None:
"""Resolve the ``chains`` / ``population`` alias for DREAM.

``chains`` is the user-friendly name; ``population`` is the parameter
name used by BUMPS. If both are provided and differ, an error is
raised. Returns the resolved value (or ``None`` if neither is set).

Parameters
----------
chains : int | None, default=None
User-friendly alias for DREAM population count.
population : int | None, default=None
BUMPS DREAM population count.

Returns
-------
int | None
The resolved population value, or ``None`` if neither is set.

Raises
------
ValueError
If both ``chains`` and ``population`` are provided but differ.
"""
if chains is not None and population is not None and chains != population:
raise ValueError(
f'Conflicting population specifications: chains={chains} '
f'and population={population}. Use only one.'
)
return chains if chains is not None else population

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you re-introduce this?

Comment on lines +471 to +474
**Ring-buffer contract (important!):** DREAM stores draws in a
fixed-size ring buffer sized to *samples*. Resuming with
``samples=N`` retains only the **last N** draws. To extend an
existing chain of M draws by N without losing any::

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All you really need to write here is: samples must be the total number of samples, so to extend by M, samples should be N + M.

Comment on lines +478 to +482
The ``burn`` parameter controls burn-in for the *new* draws
only; passing ``burn=0`` (strongly recommended on resume)
skips additional burn-in. A non-zero ``burn`` on a
previously-converged chain is usually a mistake and emits a
warning.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we maybe force it to be 0 when resuming?

Comment on lines -419 to +462
BUMPS DREAM population count (number of parallel chains).
DREAM population **scale factor** (not an absolute chain count):
BUMPS creates ``ceil(population * n_parameters)`` parallel chains,
so the default scale of 10 yields ``10 * n_parameters`` chains.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just write:
"BUMPS DREAM population count per parameter (number of parallel chains).

Comment on lines +484 to +486
The ``chains``/``population`` and ``initializer`` parameters
have **no effect** when ``resume_state`` is provided — they
are determined by the saved state.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might be a cleaner API if we make the resume a new method without these ignored parameters, and then resume_state would be obligatory.

Comment on lines +573 to +595
# BUMPS ``save_state``/``load_state`` does **not** preserve the
# original parameter labels: a reloaded ``MCMCDraw`` comes back
# with default labels like ``['P0', 'P1', ...]``. We can therefore
# only validate names when the state still carries our prefixed
# labels, i.e. an in-memory state from the current session. For a
# reloaded state we fall back to the parameter-count check (done
# above) plus an order-based warning, because BUMPS resumes
# positionally by column order, not by name.
fresh_names = [p.name[len(MINIMIZER_PARAMETER_PREFIX) :] for p in problem._parameters]
state_labels = list(resume_state.labels)
state_prefix = MINIMIZER_PARAMETER_PREFIX
labels_carry_our_names = bool(state_labels) and all(
lbl.startswith(state_prefix) for lbl in state_labels
)
if labels_carry_our_names:
state_stripped = [lbl[len(state_prefix) :] for lbl in state_labels]
if fresh_names != state_stripped:
raise ValueError(
f'Parameter names/order mismatch between the current model '
f'and resume_state.\n'
f' Current model : {fresh_names}\n'
f' resume_state : {state_stripped}'
)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like we should save the state on the BUMPS object, so that when resuming in a notebook, the user doesn't have to supply the resume_state to resume. They would only need to supply it when resuming from a file. That would also clean up this code, as you know that when this keyword is supplied, the state is always resumed from a file. And I think it would provide a nicer user-experience.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

[priority] high Should be prioritized soon [scope] enhancement Adds/improves features (major.MINOR.patch)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants