
Commit c83b739

Merge pull request #473 from stan-dev/feature/433-notebook-advi-init-sampler
Feature/433 notebook advi init sampler
2 parents f048803 + 496b090 commit c83b739

10 files changed

Lines changed: 1295 additions & 1025 deletions


cmdstanpy/model.py

Lines changed: 20 additions & 10 deletions
@@ -1181,6 +1181,7 @@ def variational(
         :param iter: Maximum number of ADVI iterations.

         :param grad_samples: Number of MC draws for computing the gradient.
+            Default is 10. If problems arise, try doubling current value.

         :param elbo_samples: Number of MC draws for estimate of ELBO.

@@ -1247,14 +1248,10 @@ def variational(

         # treat failure to converge as failure
         transcript_file = runset.stdout_files[dummy_chain_id]
-        valid = True
         pat = re.compile(r'The algorithm may not have converged.', re.M)
         with open(transcript_file, 'r') as transcript:
             contents = transcript.read()
-        errors = re.findall(pat, contents)
-        if len(errors) > 0:
-            valid = False
-        if not valid:
+        if len(re.findall(pat, contents)) > 0:
             if require_converged:
                 raise RuntimeError(
                     'The algorithm may not have converged.\n'
@@ -1268,12 +1265,25 @@ def variational(
                     'Proceeding because require_converged is set to False',
                 )
         if not runset._check_retcodes():
-            msg = 'Error during variational inference:\n{}'.format(
-                runset.get_err_msgs()
-            )
-            msg = '{}Command and output files:\n{}'.format(
-                msg, runset.__repr__()
+            transcript_file = runset.stdout_files[dummy_chain_id]
+            with open(transcript_file, 'r') as transcript:
+                contents = transcript.read()
+            pat = re.compile(
+                r'stan::variational::normal_meanfield::calc_grad:', re.M
             )
+            if len(re.findall(pat, contents)) > 0:
+                if grad_samples is None:
+                    grad_samples = 10
+                msg = (
+                    'Variational algorithm gradient calculation failed. '
+                    'Double the value of argument "grad_samples", '
+                    'current value is {}.'.format(grad_samples)
+                )
+            else:
+                msg = (
+                    'Variational algorithm failed.\n '
+                    'Console output:\n{}'.format(contents)
+                )
             raise RuntimeError(msg)
         # pylint: disable=invalid-name
         vb = CmdStanVB(runset)
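
Note on the new failure path: the change above turns a gradient-calculation failure reported in the CmdStan console output into a RuntimeError that asks the caller to double `grad_samples`. A minimal sketch of how calling code might act on that message; the bernoulli file names and the retry loop are illustrative assumptions, not part of this commit:

from cmdstanpy import CmdStanModel

model = CmdStanModel(stan_file='bernoulli.stan')  # illustrative model, not from this PR
data = 'bernoulli.data.json'                      # illustrative data file
grad_samples = 10                                 # ADVI default, per the docstring change above
for _ in range(3):                                # cap the number of retries
    try:
        vb_fit = model.variational(data=data, grad_samples=grad_samples)
        break
    except RuntimeError as err:
        if 'grad_samples' not in str(err):
            raise                                 # unrelated failure: re-raise
        grad_samples *= 2                         # follow the new error message and retry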

cmdstanpy/stanfit.py

Lines changed: 8 additions & 0 deletions
@@ -2073,6 +2073,7 @@ def _set_variational_attrs(self, sample_csv_0: str) -> None:
         self._metadata = InferenceMetadata(meta)
         # these three assignments don't grant type information
         self._column_names: Tuple[str, ...] = meta['column_names']
+        self._eta: float = meta['eta']
         self._variational_mean: np.ndarray = meta['variational_mean']
         self._variational_sample: np.ndarray = meta['variational_sample']

@@ -2094,6 +2095,13 @@ def column_names(self) -> Tuple[str, ...]:
         """
         return self._column_names

+    @property
+    def eta(self) -> float:
+        """
+        Step size scaling parameter 'eta'
+        """
+        return self._eta
+
     @property
     def variational_params_np(self) -> np.ndarray:
         """

cmdstanpy/utils.py

Lines changed: 2 additions & 0 deletions
@@ -678,6 +678,8 @@ def scan_variational_csv(path: str) -> Dict[str, Any]:
                     lineno, line
                 )
             )
+        _, eta = line.split('=')
+        dict['eta'] = float(eta)
         line = fd.readline().lstrip(' #\t\n')
         lineno += 1
         xs = line.split(',')
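
For context, `scan_variational_csv` is parsing the stepsize-adaptation block of CmdStan's variational output CSV here. The exact header text below is an assumption about that comment block, but it shows what the added `split('=')` does:

# illustrative only -- roughly the stripped adaptation line the parser sees
line = 'eta = 1'
_, eta = line.split('=')
print(float(eta))  # 1.0, stored as dict['eta'] by scan_variational_csv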

docsrc/examples.rst

Lines changed: 2 additions & 0 deletions
@@ -7,5 +7,7 @@ __________________
    examples/MCMC Sampling.ipynb
    examples/Maximum Likelihood Estimation.ipynb
    examples/Variational Inference.ipynb
+   examples/VI as Sampler Inits.ipynb
    examples/Run Generated Quantities.ipynb
    examples/Using External C++.ipynb
+
docsrc/examples/VI as Sampler Inits.ipynb

Lines changed: 209 additions & 0 deletions

@@ -0,0 +1,209 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Using Variational Estimates to Initialize the NUTS-HMC Sampler\n",
    "\n",
    "In this example we show how to use the parameter estimates returned by Stan's variational inference algorithm\n",
    "as the initial parameter values for Stan's NUTS-HMC sampler, using the [earnings-logearn_height model](https://github.com/stan-dev/posteriordb/blob/master/posterior_database/models/stan/logearn_height.stan) and data from the [posteriordb package](https://github.com/stan-dev/posteriordb).\n",
    "\n",
    "The experiments reported in the paper [Pathfinder: Parallel quasi-Newton variational inference](https://arxiv.org/abs/2108.03782) by Zhang et al. show that mean-field ADVI provides a better estimate of the posterior, as measured by the 1-Wasserstein distance to the reference posterior, than 75 iterations of the warmup Phase I algorithm used by the NUTS-HMC sampler. Furthermore, ADVI is more computationally efficient, requiring fewer evaluations of the log density and gradient functions. Therefore, using the estimates from ADVI to initialize the parameter values for the NUTS-HMC sampler will allow the sampler to do a better job of adapting the stepsize and metric during warmup, resulting in better performance and estimation.\n",
    "\n",
    "\n",
    "### Model and data\n",
    "\n",
    "For convenience, we have copied the posteriordb model and data to this directory, in files [logearn_height.stan](logearn_height.stan) and [earnings.json](earnings.json)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "from cmdstanpy import CmdStanModel\n",
    "\n",
    "stan_file = 'logearn_height.stan'\n",
    "data_file = 'earnings.json'\n",
    "\n",
    "# instantiate, compile the logearn_height model\n",
    "model = CmdStanModel(stan_file=stan_file)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The earnings dataset is a set of 1192 observations of annual earnings in USD, height in inches, and an indicator for sex==male."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "with open(data_file, 'r') as fd:\n",
    "    data_dict = json.load(fd)\n",
    "print(data_dict.keys())\n",
    "print(data_dict['N'])\n",
    "for i in range(5):\n",
    "    print(data_dict['earn'][i], data_dict['height'][i])\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The \"logearn_height\" model regresses the log earnings on height."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(model.code())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Run Stan's variational inference algorithm, obtain fitted estimates\n",
    "\n",
    "The `CmdStanModel` method `variational` runs CmdStan's ADVI algorithm.\n",
    "Conditioning the model on the data results in a posterior geometry which is difficult to navigate. Because the ADVI algorithm is unstable and may fail to converge, we run it with argument `require_converged` set to `False`. We also specify a seed, both to avoid instabilities and for reproducibility."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "vb_fit = model.variational(data=data_file, require_converged=False, seed=123)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The ADVI algorithm provides estimates of all model parameters as well as the step size scaling factor `eta`.\n",
    "\n",
    "The `variational` method returns a `CmdStanVB` object, with methods `eta` and `stan_variables`, which\n",
    "return the step size scaling factor and the estimates of all model parameters as a Python dictionary, respectively."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(vb_fit.eta, vb_fit.stan_variables())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Posteriordb provides reference posteriors for all models. For the logearn_height model, conditioned on the dataset `earnings.json`, the posterior variables are:\n",
    "\n",
    "- beta[1]: 5.782\n",
    "- beta[2]: 0.059\n",
    "- sigma: 0.894\n",
    "\n",
    "By default, the sampler algorithm randomly initializes all model parameters in the range uniform[-2, 2]. The ADVI estimates will provide a better starting point, especially with respect to parameter `beta[1]`, than the defaults.\n",
    "In addition, we can use the step size scaling factor to scale the initial step size, which allows us to skip the first phase of warmup (default 75 iterations)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "vb_vars = vb_fit.stan_variables()\n",
    "vb_stepsize = 1.0 / vb_fit.eta\n",
    "mcmc_vb_inits_fit = model.sample(\n",
    "    data=data_file, inits=vb_vars, step_size=vb_stepsize,\n",
    "    adapt_init_phase=0, seed=123\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "mcmc_vb_inits_fit.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The sampler results match the reference posterior (taking into account MCSE).\n",
    "\n",
    "- beta[1]: 5.782\n",
    "- beta[2]: 0.059\n",
    "- sigma: 0.894\n",
    "\n",
    "To see how this is useful, we run the sampler with default initializations, step size, and warmup."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "mcmc_random_inits_fit = model.sample(data=data_file, seed=123)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "mcmc_random_inits_fit.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Using the variational estimates to skip warmup phase I shows improved N_Eff/s (number of effective samples per second) values for all parameters. This is a simple model run on a small dataset. For complex models where the initial parameter values are far from the default initializations, this procedure may allow for faster and better adaptation during warmup."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
