|
7 | 7 | "## Using Variational Estimates to Initialize the NUTS-HMC Sampler\n", |
8 | 8 | "\n", |
9 | 9 | "In this example we show how to use the parameter estimates return by Stan's variational inference algorithm\n", |
10 | | - "as the initial parameter values for Stan's NUTS-HMC sampler, using a the [earnings-logearn_height model](https://github.com/stan-dev/posteriordb/blob/master/posterior_database/models/stan/logearn_height.stan) and data from the [posteriordb package](https://github.com/stan-dev/posteriordb).\n", |
| 10 | + "as the initial parameter values for Stan's NUTS-HMC sampler.\n", |
| 11 | + "The program and data are taken from the [posteriordb package](https://github.com/stan-dev/posteriordb).\n", |
11 | 12 | "\n", |
12 | 13 | "The experiments reported in the paper [Pathfinder: Parallel quasi-Newton variational inference](https://arxiv.org/abs/2108.03782) by Zhang et al. show that mean-field ADVI provides a better estimate of the posterior, as measured by the 1-Wasserstein distance to the reference posterior, than 75 iterations of the warmup Phase I algorithm used by the NUTS-HMC sampler. Furthermore, ADVI is more computationally efficient, requiring fewer evaluations of the log density and gradient functions. Therefore, using the ADVI estimates to initialize the parameter values for the NUTS-HMC sampler allows the sampler to better adapt the step size and metric during warmup, resulting in better performance and estimation.\n", |
13 | 14 | "\n", |
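For intuition about the comparison metric: for two equal-size 1-D samples, the 1-Wasserstein distance reduces to the mean absolute difference of the sorted samples. A minimal sketch (illustrative only, not the evaluation code used in the paper):

```python
# 1-Wasserstein distance between two equal-size 1-D samples:
# the mean absolute difference of the sorted samples.
def wasserstein_1d(xs, ys):
    xs, ys = sorted(xs), sorted(ys)
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)

# shifting a sample by a constant moves it by exactly that distance
a = [0.1, 0.5, 0.9]
b = [x + 0.3 for x in a]
print(wasserstein_1d(a, b))  # ~0.3, up to float rounding
```

A smaller distance to the reference posterior means the approximation's draws sit closer to where the true posterior mass is.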
14 | 15 | "\n", |
15 | 16 | "### Model and data\n", |
16 | 17 | "\n", |
17 | | - "For conveince, we have copied the posteriordb model and data to this directory, in files [logearn_height.stan](logearn_height.stan) and [earnings.json](earnings.json)." |
| 18 | + "We use the [blr model](https://github.com/stan-dev/posteriordb/blob/master/posterior_database/models/stan/blr.stan),\n", |
| 19 | + "a Bayesian standard linear regression model with noninformative priors,\n", |
| 20 | + "and its corresponding simulated dataset [sblri.json](https://github.com/stan-dev/posteriordb/blob/master/posterior_database/data/data/sblri.json.zip),\n", |
| 21 | + "which was simulated via script [sblr.R](https://github.com/stan-dev/posteriordb/blob/master/posterior_database/data/data-raw/sblr/sblr.R).\n", |
 | 22 | + "For convenience, we have copied the posteriordb model and data to this directory, in files [`blr.stan`](blr.stan) and [`sblri.json`](sblri.json)." |
18 | 23 | ] |
19 | 24 | }, |
20 | 25 | { |
|
25 | 30 | "source": [ |
26 | 31 | "import os\n", |
27 | 32 | "from cmdstanpy import CmdStanModel\n", |
28 | | - " \n", |
29 | | - "stan_file = 'logearn_height.stan'\n", |
30 | | - "data_file = 'earnings.json'\n", |
31 | 33 | "\n", |
32 | | - "# instantiate, compile bernoulli model\n", |
33 | | - "model = CmdStanModel(stan_file=stan_file)" |
34 | | - ] |
35 | | - }, |
36 | | - { |
37 | | - "cell_type": "markdown", |
38 | | - "metadata": {}, |
39 | | - "source": [ |
40 | | - "The earnings dataset is a set of 1192 observations of annual earnings in USD, height in inches, and indicator for sex==male." |
41 | | - ] |
42 | | - }, |
43 | | - { |
44 | | - "cell_type": "code", |
45 | | - "execution_count": null, |
46 | | - "metadata": {}, |
47 | | - "outputs": [], |
48 | | - "source": [ |
49 | | - "import json\n", |
50 | | - "with open(data_file, 'r') as fd:\n", |
51 | | - " data_dict = json.load(fd)\n", |
52 | | - "print(data_dict.keys())\n", |
53 | | - "print(data_dict['N'])\n", |
54 | | - "for i in range(5):\n", |
55 | | - " print(data_dict['earn'][i], data_dict['height'][i])\n" |
56 | | - ] |
57 | | - }, |
58 | | - { |
59 | | - "cell_type": "markdown", |
60 | | - "metadata": {}, |
61 | | - "source": [ |
62 | | - "The \"logearn_height\" model regresses the log earnings on height." |
63 | | - ] |
64 | | - }, |
65 | | - { |
66 | | - "cell_type": "code", |
67 | | - "execution_count": null, |
68 | | - "metadata": {}, |
69 | | - "outputs": [], |
70 | | - "source": [ |
| 34 | + "stan_file = 'blr.stan' # basic linear regression\n", |
| 35 | + "data_file = 'sblri.json' # simulated data\n", |
| 36 | + "\n", |
| 37 | + "model = CmdStanModel(stan_file=stan_file)\n", |
| 38 | + "\n", |
71 | 39 | "print(model.code())" |
72 | 40 | ] |
73 | 41 | }, |
|
94 | 62 | "cell_type": "markdown", |
95 | 63 | "metadata": {}, |
96 | 64 | "source": [ |
97 | | - "The ADVI algorithm provides estimates of all model parameters as well as the step size scaling factor `eta`.\n", |
| 65 | + "The ADVI algorithm provides estimates of all model parameters.\n", |
98 | 66 | "\n", |
99 | | - "The `variational` method returns a `CmdStanVB` object, with methods `eta` and `stan_variables`, which\n", |
100 | | - "return the step size scaling factor and estimates of all model parameters as a Python dictionary respectively." |
| 67 | + "The `variational` method returns a `CmdStanVB` object, with method `stan_variables`, which\n", |
| 68 | + "returns the approximate estimates of all model parameters as a Python dictionary." |
101 | 69 | ] |
102 | 70 | }, |
103 | 71 | { |
|
106 | 74 | "metadata": {}, |
107 | 75 | "outputs": [], |
108 | 76 | "source": [ |
109 | | - "print(vb_fit.eta, vb_fit.stan_variables())" |
| 77 | + "print(vb_fit.stan_variables())" |
110 | 78 | ] |
111 | 79 | }, |
112 | 80 | { |
113 | 81 | "cell_type": "markdown", |
114 | 82 | "metadata": {}, |
115 | 83 | "source": [ |
116 | | - "Posteriordb provides reference posteriors for all models. For the logearn_height model, conditioned on the dataset `earnings.json`, the posterior variables are:\n", |
 | 84 | + "Posteriordb provides reference posteriors for all models. For the blr model, conditioned on the dataset `sblri.json`, the reference posteriors are in file [`sblri-blr.json`](https://github.com/stan-dev/posteriordb/blob/master/posterior_database/reference_posteriors/summary_statistics/mean/mean/sblri-blr.json).\n", |
117 | 85 | "\n", |
118 | | - "- beta[1]: 5.782\n", |
119 | | - "- beta[2]: 0.059\n", |
120 | | - "- sigma: 0.894\n", |
 | 86 | + "The reference posterior means for all elements of `beta` and for `sigma` are very close to $1.0$.\n", |
121 | 87 | "\n", |
122 | | - "By default, the sampler algorithm randomly initializes all model parameters in the range uniform[-2, 2]. The ADVI estimates will provide a better starting point, especially w/r/t to parameter `beta[1]`, than the defaults.\n", |
123 | | - "In addition, we can use the step size scaling factor to scale the initial step size, which allows us to skip the first phase of warmup (default 75 iterations)." |
 | 88 | + "By default, the sampler algorithm randomly initializes all model parameters uniformly in the range (-2, 2) on the unconstrained scale. The ADVI estimates provide a better starting point, which allows us to reduce the number of warmup iterations." |
124 | 89 | ] |
125 | 90 | }, |
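A small illustration of the difference between the two starting points. The `advi_inits` values below are made-up stand-ins, not actual ADVI output; recall that for this simulated dataset the reference posterior means are all near 1.0:

```python
import random

random.seed(12345)

# Stan's default: each unconstrained parameter drawn uniform(-2, 2)
default_inits = [random.uniform(-2.0, 2.0) for _ in range(6)]

# hypothetical ADVI estimates, close to the reference values (~1.0)
advi_inits = [1.02, 0.98, 1.01, 0.99, 1.00, 1.03]

def max_dist(xs, truth=1.0):
    # worst-case distance of an initial point from the reference value
    return max(abs(x - truth) for x in xs)

print(max_dist(default_inits), max_dist(advi_inits))
```

Starting in the bulk of the posterior rather than at an arbitrary point in (-2, 2) is what lets the sampler get away with a much shorter warmup.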
126 | 91 | { |
|
130 | 95 | "outputs": [], |
131 | 96 | "source": [ |
132 | 97 | "vb_vars = vb_fit.stan_variables()\n", |
133 | | - "vb_stepsize = 1.0 / vb_fit.eta\n", |
134 | 98 | "mcmc_vb_inits_fit = model.sample(\n", |
135 | | - " data=data_file, inits=vb_vars, step_size=vb_stepsize,\n", |
136 | | - " adapt_init_phase=0, seed=123\n", |
| 99 | + " data=data_file, inits=vb_vars, iter_warmup=75, seed=12345\n", |
137 | 100 | ")" |
138 | 101 | ] |
139 | 102 | }, |
|
150 | 113 | "cell_type": "markdown", |
151 | 114 | "metadata": {}, |
152 | 115 | "source": [ |
153 | | - "The sampler results match the reference posterior, (taking into account MCSE).\n", |
154 | | - "\n", |
155 | | - "- beta[1]: 5.782\n", |
156 | | - "- beta[2]: 0.059\n", |
157 | | - "- sigma: 0.894\n", |
158 | | - "\n", |
159 | | - "To see how this is useful, we run the sampler with default initializations, step size, and warmup." |
 | 116 | + "The sampler estimates match the reference posterior. If instead we run the HMC sampler for only 75 warmup iterations with random inits, it fails to estimate `sigma` and yields fewer effective samples per second (N_Eff/s)." |
160 | 117 | ] |
161 | 118 | }, |
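"Match" here means agreement within Monte Carlo standard error (MCSE), which CmdStan's summary reports per parameter. A rough sketch of such a check on one parameter's draws; the function and the synthetic draws are illustrative, not part of the notebook, and the MCSE formula below assumes roughly independent draws (the real summary scales by N_Eff):

```python
import math

def matches_reference(draws, ref_mean, z=3.0):
    # Compare a posterior mean to a reference value, allowing
    # z Monte Carlo standard errors of slack.
    n = len(draws)
    mean = sum(draws) / n
    var = sum((d - mean) ** 2 for d in draws) / (n - 1)
    mcse = math.sqrt(var / n)  # naive MCSE, assuming independent draws
    return abs(mean - ref_mean) <= z * mcse

# synthetic "draws" centered at 1.0, like the blr reference posterior
draws = [1.0 + 0.01 * ((i % 7) - 3) for i in range(700)]
print(matches_reference(draws, 1.0))  # → True
```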
162 | 119 | { |
|
165 | 122 | "metadata": {}, |
166 | 123 | "outputs": [], |
167 | 124 | "source": [ |
168 | | - "mcmc_random_inits_fit = model.sample(data=data_file, seed=123)" |
| 125 | + "mcmc_random_inits_fit = model.sample(data=data_file, iter_warmup=75, seed=12345)" |
169 | 126 | ] |
170 | 127 | }, |
171 | 128 | { |
|
176 | 133 | "source": [ |
177 | 134 | "mcmc_random_inits_fit.summary()" |
178 | 135 | ] |
179 | | - }, |
180 | | - { |
181 | | - "cell_type": "markdown", |
182 | | - "metadata": {}, |
183 | | - "source": [ |
184 | | - "Using the variational estimates to skip warmup phase I shows improved N_Eff/s (number of effective sampler per second) values for all parameters. This is a simple model run on a small dataset. For complex models where the initial parameter values are far from the default initializations, this procedure may allow for faster and better adaptation during warmup." |
185 | | - ] |
186 | 136 | } |
187 | 137 | ], |
188 | 138 | "metadata": { |
|