Bayesian extend/resume#257
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #257 +/- ##
===========================================
+ Coverage 82.65% 82.78% +0.12%
===========================================
Files 62 62
Lines 5023 5078 +55
===========================================
+ Hits 4152 4204 +52
- Misses 871 874 +3
Flags with carried forward coverage won't be shown. Click here to find out more.
|
compatibility for resampling API
damskii9992
left a comment
There was a problem hiding this comment.
Just gonna stop my review here as there is already plenty to work on.
As we discussed in-person, I also think we should refactor the bayesian sampling to a seperate "samplers" module. This should help us when we add other Bayesian samplers in the future as well as provide a cleaner user API (the user doesn't create a "Fitter" when doing sampling).
It also more cleanly separates responsibility in the code. A Minimizer minimizes a function to the data, a Sampler samples the parameters to create distributions.
To avoid duplicating code, it might be smart to move shared functionality to new base-classes.
| @staticmethod | ||
| def _resolve_population_alias( | ||
| chains: int | None = None, | ||
| population: int | None = None, | ||
| ) -> int | None: | ||
| """Resolve the ``chains`` / ``population`` alias for DREAM. | ||
|
|
||
| ``chains`` is the user-friendly name; ``population`` is the parameter | ||
| name used by BUMPS. If both are provided and differ, an error is | ||
| raised. Returns the resolved value (or ``None`` if neither is set). | ||
|
|
||
| Parameters | ||
| ---------- | ||
| chains : int | None, default=None | ||
| User-friendly alias for DREAM population count. | ||
| population : int | None, default=None | ||
| BUMPS DREAM population count. | ||
|
|
||
| Returns | ||
| ------- | ||
| int | None | ||
| The resolved population value, or ``None`` if neither is set. | ||
|
|
||
| Raises | ||
| ------ | ||
| ValueError | ||
| If both ``chains`` and ``population`` are provided but differ. | ||
| """ | ||
| if chains is not None and population is not None and chains != population: | ||
| raise ValueError( | ||
| f'Conflicting population specifications: chains={chains} ' | ||
| f'and population={population}. Use only one.' | ||
| ) | ||
| return chains if chains is not None else population |
There was a problem hiding this comment.
Why did you re-introduce this?
| **Ring-buffer contract (important!):** DREAM stores draws in a | ||
| fixed-size ring buffer sized to *samples*. Resuming with | ||
| ``samples=N`` retains only the **last N** draws. To extend an | ||
| existing chain of M draws by N without losing any:: |
There was a problem hiding this comment.
All you really need to write here is: samples must be the total number of samples, so to extend by M, samples should be N + M.
| The ``burn`` parameter controls burn-in for the *new* draws | ||
| only; passing ``burn=0`` (strongly recommended on resume) | ||
| skips additional burn-in. A non-zero ``burn`` on a | ||
| previously-converged chain is usually a mistake and emits a | ||
| warning. |
There was a problem hiding this comment.
Should we maybe force it to be 0 when resuming?
| BUMPS DREAM population count (number of parallel chains). | ||
| DREAM population **scale factor** (not an absolute chain count): | ||
| BUMPS creates ``ceil(population * n_parameters)`` parallel chains, | ||
| so the default scale of 10 yields ``10 * n_parameters`` chains. |
There was a problem hiding this comment.
Maybe just write:
"BUMPS DREAM population count per parameter (number of parallel chains).
| The ``chains``/``population`` and ``initializer`` parameters | ||
| have **no effect** when ``resume_state`` is provided — they | ||
| are determined by the saved state. |
There was a problem hiding this comment.
I think it might be a cleaner API if we make the resume a new method without these ignored parameters, and then resume_state would be obligatory.
| # BUMPS ``save_state``/``load_state`` does **not** preserve the | ||
| # original parameter labels: a reloaded ``MCMCDraw`` comes back | ||
| # with default labels like ``['P0', 'P1', ...]``. We can therefore | ||
| # only validate names when the state still carries our prefixed | ||
| # labels, i.e. an in-memory state from the current session. For a | ||
| # reloaded state we fall back to the parameter-count check (done | ||
| # above) plus an order-based warning, because BUMPS resumes | ||
| # positionally by column order, not by name. | ||
| fresh_names = [p.name[len(MINIMIZER_PARAMETER_PREFIX) :] for p in problem._parameters] | ||
| state_labels = list(resume_state.labels) | ||
| state_prefix = MINIMIZER_PARAMETER_PREFIX | ||
| labels_carry_our_names = bool(state_labels) and all( | ||
| lbl.startswith(state_prefix) for lbl in state_labels | ||
| ) | ||
| if labels_carry_our_names: | ||
| state_stripped = [lbl[len(state_prefix) :] for lbl in state_labels] | ||
| if fresh_names != state_stripped: | ||
| raise ValueError( | ||
| f'Parameter names/order mismatch between the current model ' | ||
| f'and resume_state.\n' | ||
| f' Current model : {fresh_names}\n' | ||
| f' resume_state : {state_stripped}' | ||
| ) |
There was a problem hiding this comment.
I feel like we should save the state on the BUMPS object, so that when resuming in a notebook, the user doesn't have to supply the resume_state to resume. They would only need to supply it when resuming from a file. That would also clean up this code, as you know that when this keyword is supplied, the state is always resumed from a file. And I think it would provide a nicer user-experience.
Support for resuming and extending Bayesian MCMC chains using the BUMPS DREAM sampler.
Added ability to save and reload sampler state and continue chains across sessions.
Resume and extend MCMC chains:
resume_stateparameter tomcmc_sampleandsamplemethods in bothfitter.pyandminimizer_bumps.py, allowing users to continue sampling from a saved chain state. This includes documentation of the "ring-buffer contract" for extending chains without losing samples.Population/chains argument resolution:
_resolve_population_aliasmethod to handle bothchainsandpopulationarguments, raising an error if both are set with different values, improving API clarity and preventing silent bugs.