Commit 3c478ec

committed
edits to notebook
1 parent 77625e6 commit 3c478ec

1 file changed

Lines changed: 32 additions & 8 deletions

File tree

docsrc/examples/VI as Sampler Inits.ipynb

Original file line number · Diff line number · Diff line change
@@ -8,18 +8,17 @@
88
"\n",
99
"In this example we show how to use the parameter estimates returned by Stan's variational inference algorithm\n",
1010
"as the initial parameter values for Stan's NUTS-HMC sampler.\n",
11-
"The program and data are taken from the [posteriordb package](https://github.com/stan-dev/posteriordb).\n",
12-
"\n",
13-
"The experiments reported in the paper [Pathfinder: Parallel quasi-Newton variational inference](https://arxiv.org/abs/2108.03782) by Zhang et al. show that mean-field ADVI provides a better estimate of the posterior, as measured by the 1-Wasserstein distance to the reference posterior, than 75 iterations of the warmup Phase I algorithm used by the NUTS-HMC sampler; furthermore, ADVI is more computationally efficient, requiring fewer evaluations of the log density and gradient functions. Therefore, using the estimates from ADVI to initialize the parameter values for the NUTS-HMC sampler will allow the sampler to do a better job of adapting the stepsize and metric during warmup, resulting in better performance and estimation.\n",
14-
"\n",
11+
"By default, the sampler algorithm randomly initializes all model parameters in the range uniform\\[-2, 2\\]. When the true parameter value is outside of this range, starting from the ADVI estimates will speed up and improve adaptation.\n",
1512
"\n",
1613
"### Model and data\n",
1714
"\n",
15+
"The Stan model and data are taken from the [posteriordb package](https://github.com/stan-dev/posteriordb).\n",
16+
"\n",
1817
"We use the [blr model](https://github.com/stan-dev/posteriordb/blob/master/posterior_database/models/stan/blr.stan),\n",
1918
"a standard Bayesian linear regression model with noninformative priors,\n",
2019
"and its corresponding simulated dataset [sblri.json](https://github.com/stan-dev/posteriordb/blob/master/posterior_database/data/data/sblri.json.zip),\n",
2120
"which was simulated via script [sblr.R](https://github.com/stan-dev/posteriordb/blob/master/posterior_database/data/data-raw/sblr/sblr.R).\n",
22-
"For convenience, we have copied the posteriordb model and data to this directory, in files [`blr.stan`](blr.stan) and [`sblri.json`](sblri.json)."
21+
"For convenience, we have copied the posteriordb model and data to this directory, in files `blr.stan` and `sblri.json`."
2322
]
2423
},
2524
{
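The initialization workflow this notebook describes can be sketched with CmdStanPy (the interface the notebook's variable names suggest). The helper `vb_estimates_to_inits` and the seed below are illustrative, not part of the notebook; the commented end-to-end usage assumes CmdStan is installed and the posteriordb files are present:

```python
def vb_estimates_to_inits(stan_variables, num_chains):
    """Turn a dict of ADVI posterior-mean estimates (the shape returned by
    CmdStanVB.stan_variables()) into one independent inits dict per chain."""
    inits = {name: est.tolist() if hasattr(est, "tolist") else est
             for name, est in stan_variables.items()}
    return [dict(inits) for _ in range(num_chains)]

# Hypothetical end-to-end usage; requires CmdStan plus the posteriordb
# files blr.stan and sblri.json copied into this directory:
# from cmdstanpy import CmdStanModel
# model = CmdStanModel(stan_file="blr.stan")
# vb_fit = model.variational(data="sblri.json", seed=123)
# mcmc_vb_inits_fit = model.sample(
#     data="sblri.json", chains=4, iter_warmup=75,
#     inits=vb_estimates_to_inits(vb_fit.stan_variables(), 4))
```

CmdStanPy's `sample` also accepts a single inits dict shared by all chains; replicating it per chain, as above, leaves room to perturb each chain's starting point separately.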
@@ -81,11 +80,11 @@
8180
"cell_type": "markdown",
8281
"metadata": {},
8382
"source": [
84-
"Posteriordb provides reference posteriors for all models. For the blr model, conditioned on the dataset `sblri.json`, the reference posteriors are in file [`sblri-blr.json`](https://github.com/stan-dev/posteriordb/blob/master/posterior_database/reference_posteriors/summary_statistics/mean/mean/sblri-blr.json)\n",
83+
"Posteriordb provides reference posteriors for all models. For the blr model, conditioned on the dataset `sblri.json`, the reference posteriors are in the file [sblri-blr.json](https://github.com/stan-dev/posteriordb/blob/master/posterior_database/reference_posteriors/summary_statistics/mean/mean/sblri-blr.json).\n",
8584
"\n",
8685
"The reference posteriors for all elements of `beta` and `sigma` are all very close to $1.0$.\n",
8786
"\n",
88-
"By default, the sampler algorithm randomly initializes all model parameters in the range uniform[-2, 2]. The ADVI estimates will provide a better starting point, and will therefore allow us to shorten the number of warmup iterations."
87+
"The experiments reported in the paper [Pathfinder: Parallel quasi-Newton variational inference](https://arxiv.org/abs/2108.03782) by Zhang et al. show that mean-field ADVI provides a better estimate of the posterior, as measured by the 1-Wasserstein distance to the reference posterior, than 75 iterations of the warmup Phase I algorithm used by the NUTS-HMC sampler; furthermore, ADVI is more computationally efficient, requiring fewer evaluations of the log density and gradient functions. Therefore, using the estimates from ADVI to initialize the parameter values for the NUTS-HMC sampler will allow the sampler to do a better job of adapting the stepsize and metric during warmup, resulting in better performance and estimation.\n"
8988
]
9089
},
9190
{
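The 1-Wasserstein distance used as the quality metric in the Pathfinder comparison is easy to compute between two sets of draws. The sketch below uses synthetic stand-in draws (not posteriordb or ADVI output) for a parameter whose reference posterior concentrates near 1.0, with one approximation close to it and one still far away, and assumes SciPy is available:

```python
import numpy as np
from scipy.stats import wasserstein_distance

# Synthetic stand-ins, for illustration only: a reference posterior near 1.0,
# a close mean-field-ADVI-like approximation, and a poor short-warmup-like one.
rng = np.random.default_rng(2108)
reference = rng.normal(1.0, 0.1, size=10_000)
advi_draws = rng.normal(1.05, 0.12, size=10_000)
phase1_draws = rng.normal(0.5, 0.5, size=10_000)

d_advi = wasserstein_distance(reference, advi_draws)
d_phase1 = wasserstein_distance(reference, phase1_draws)
print(d_advi < d_phase1)  # the better approximation has the smaller distance
```

For one-dimensional samples, `scipy.stats.wasserstein_distance` matches the empirical CDFs directly, so no reference density is needed, only draws.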
@@ -113,7 +112,23 @@
113112
"cell_type": "markdown",
114113
"metadata": {},
115114
"source": [
116-
"The sampler estimates match the reference posterior. If we run the HMC sampler for only 75 warmup iterations with random inits, it fails to estimate `sigma`, and produces fewer effective samples per second (N_Eff/s)."
115+
"The sampler estimates match the reference posterior."
116+
]
117+
},
118+
{
119+
"cell_type": "code",
120+
"execution_count": null,
121+
"metadata": {},
122+
"outputs": [],
123+
"source": [
124+
"print(mcmc_vb_inits_fit.diagnose())"
125+
]
126+
},
127+
{
128+
"cell_type": "markdown",
129+
"metadata": {},
130+
"source": [
131+
"Using the default random parameter initializations, we need to run more warmup iterations. If we run only 75 warmup iterations with random inits, the result fails to estimate `sigma` correctly. It is necessary to run the model with at least 150 warmup iterations to produce a good set of estimates."
117132
]
118133
},
119134
{
@@ -133,6 +148,15 @@
133148
"source": [
134149
"mcmc_random_inits_fit.summary()"
135150
]
151+
},
152+
{
153+
"cell_type": "code",
154+
"execution_count": null,
155+
"metadata": {},
156+
"outputs": [],
157+
"source": [
158+
"print(mcmc_random_inits_fit.diagnose())"
159+
]
136160
}
137161
],
138162
"metadata": {
