better docs

mitzimorris · mitzimorris · commit f99a805e1252 · 2022-06-19T19:38:23.000-04:00
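The notebook text in this diff makes three claims about CmdStan's output:

- Each per-chain Stan CSV file holds one row per draw.
- Comment lines in the same file record the run configuration and the adapted HMC `step_size` and `metric`.
- `CmdStanMCMC` maps CSV columns back to Stan variables.

The toy parser below illustrates that layout using plain Python and NumPy. It is a sketch under stated assumptions, not CmdStanPy's implementation:

- The CSV text is hand-written, not real sampler output.
- The helpers `parse_stan_csv` and `to_variable_dict` are hypothetical.
- Only a diagonal metric and one-dimensional indexed variables (columns such as `theta.1`, `theta.2`) are handled.

In practice, the `CmdStanMCMC` accessors (`draws()`, `draws_pd()`, `stan_variables()`, `step_size`, `metric`) do this work for you.

```python
import re

import numpy as np

# Hand-written toy Stan CSV for one chain (NOT real sampler output).
# Layout follows Stan CSV conventions: '#' comment lines, a header row,
# adaptation info as comments after the header, then one row per draw.
TOY_CSV = """# method = sample (Default)
lp__,accept_stat__,stepsize__,treedepth__,n_leapfrog__,divergent__,energy__,theta.1,theta.2
# Adaptation terminated
# Step size = 0.85
# Diagonal elements of inverse mass matrix:
# 1.2, 0.9
-7.1,0.92,0.85,2,3,0,7.5,0.10,0.40
-6.8,0.88,0.85,2,3,0,7.2,0.20,0.35
-7.4,0.95,0.85,1,3,0,7.9,0.15,0.50
"""


def parse_stan_csv(text):
    """Hypothetical helper: split one Stan CSV into (step_size, metric, header, draws)."""
    step_size, metric, header, rows = None, None, None, []
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("#"):
            m = re.match(r"#\s*Step size\s*=\s*([-+0-9.eE]+)", line)
            if m:
                step_size = float(m.group(1))
            elif "Diagonal elements of inverse mass matrix" in line:
                # Assumes a diagonal metric written on the following comment line.
                metric = np.array([float(x) for x in lines[i + 1].lstrip("#").split(",")])
            continue
        if not line.strip():
            continue
        if header is None:
            header = line.split(",")  # first non-comment line is the column header
        else:
            rows.append([float(x) for x in line.split(",")])  # one row per draw
    return step_size, metric, header, np.array(rows)


def to_variable_dict(header, draws):
    """Hypothetical helper: group columns like theta.1, theta.2 into one array per variable.

    Only handles scalars and 1-d containers; multi-index columns are not reshaped.
    """
    out = {}
    for name in dict.fromkeys(col.split(".")[0] for col in header):
        idx = [j for j, col in enumerate(header) if col.split(".")[0] == name]
        out[name] = draws[:, idx] if len(idx) > 1 else draws[:, idx[0]]
    return out


step_size, metric, header, draws = parse_stan_csv(TOY_CSV)
variables = to_variable_dict(header, draws)
print(step_size)                 # 0.85
print(metric)                    # [1.2 0.9]
print(draws.shape)               # (3, 9)
print(variables["theta"].shape)  # (3, 2)
```

The tabular view (`draws`) and the per-variable view (`variables`) come from the same rows. This mirrors the two extraction styles the notebook lists: tabular versus named, structured variables.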
diff --git a/docsrc/examples/MCMC Sampling.ipynb b/docsrc/examples/MCMC Sampling.ipynb
@@ -4,38 +4,58 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# MCMC Sampling\n",
+    "# MCMC Sampling"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Overview\n",
     "\n",
-    "The [CmdStanModel](https://mc-stan.org/cmdstanpy/api.html#cmdstanmodel) object's\n",
-    "method [sample](https://mc-stan.org/cmdstanpy/api.html#cmdstanpy.CmdStanModel.sample)\n",
-    "invokes Stan's adaptive HMC-NUTS sampler which uses the Hamiltonian Monte Carlo (HMC) algorithm\n",
-    "and its adaptive variant the no-U-turn sampler (NUTS) to produce a set of\n",
-    "draws from the posterior distribution of the model parameters conditioned on the data.\n",
+    "Stan's MCMC sampler implements the Hamiltonian Monte Carlo (HMC) algorithm and its adaptive variant,\n",
+    "the no-U-turn sampler (NUTS).\n",
+    "It creates a set of draws from the posterior distribution of the model conditioned on the data,\n",
+    "allowing for asymptotically exact Bayesian inference of the model parameters.\n",
+    "Each draw consists of the values for all parameter, transformed parameter, and\n",
+    "generated quantities variables, reported on the constrained scale.\n",
     "\n",
-    "The `sample` method returns a [CmdStanMCMC](https://mc-stan.org/cmdstanpy/api.html#cmdstanmcmc) object.\n",
-    "Underlyingly, the sampler run outputs are a set of per-chain Stan CSV files.\n",
-    "The `CmdStanMCMC` object provide multiple accessor functions which allow the user\n",
-    "to access the resulting sample in whatever data format is needed for further analysis.\n",
+    "The [CmdStanModel sample](https://mc-stan.org/cmdstanpy/api.html#cmdstanpy.CmdStanModel.sample) method\n",
+    "wraps the CmdStan [sample](https://mc-stan.org/docs/cmdstan-guide/mcmc-config.html) method.\n",
+    "CmdStan writes its outputs to a set of per-chain Stan CSV files.\n",
+    "In addition to the resulting sample, reported as one row per draw,\n",
+    "the Stan CSV files encode information about the inference engine configuration\n",
+    "and the sampler state.\n",
+    "The adaptive HMC-NUTS sampler also records the per-chain\n",
+    "HMC tuning parameters `step_size` and `metric`.\n",
     "\n",
-    "The sample can be extracted in tabular format, either as\n",
+    "The `sample` method returns a [CmdStanMCMC](https://mc-stan.org/cmdstanpy/api.html#cmdstanmcmc) object,\n",
+    "which provides access to the disparate information from the Stan CSV files.\n",
+    "Its accessor methods return the sample\n",
+    "in whatever data format is needed for further analysis.\n",
     "\n",
-    "- an [numpy.ndarray](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html#numpy.ndarray)\n",
+    "- The sample can be extracted in tabular format, either as\n",
     "\n",
-    "- a [pandas.DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html#pandas.DataFrame)\n",
+    "    + a [numpy.ndarray](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html#numpy.ndarray)\n",
     "\n",
-    "The sample can be treated as a collection of named, structured variables, and extracted as\n",
+    "    + a [pandas.DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html#pandas.DataFrame)\n",
     "\n",
-    "- a Python `dict` mapping names to `numpy.ndarray` objects\n",
+    "- The sample can be treated as a collection of named, structured variables, and extracted as\n",
     "\n",
-    "- an [xarray.Dataset](https://docs.xarray.dev/en/stable/generated/xarray.Dataset.html)\n",
+    "    + a Python `dict` mapping names to `numpy.ndarray` objects\n",
     "\n",
-    "The CmdStanMCMC object also provides access to the per-chain HMC tuning parameters `step_size` and `metric`\n",
-    "and the [InferenceMetadata](https://mc-stan.org/cmdstanpy/internal_api.html#inferencemetadata)\n",
-    "which consists of the CmdStan configuration, the layout of the CSV file data table,\n",
-    "and the mapping between the table columns and the Stan program structured variables.\n",
+    "    + an [xarray.Dataset](https://docs.xarray.dev/en/stable/generated/xarray.Dataset.html)\n",
     "\n",
     "\n",
-    "\n"
+    "In addition, the `CmdStanMCMC` object has accessor methods for:\n",
+    "\n",
+    "- The per-chain HMC tuning parameters `step_size` and `metric` \n",
+    "\n",
+    "- The CmdStan run configuration and console outputs\n",
+    "\n",
+    "- The sampler algorithm diagnostics\n",
+    "\n",
+    "- The mapping between the Stan model variables and the corresponding CSV file columns."
    ]
   },
   {
@@ -185,25 +205,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## Summarizing the sample"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "scrolled": true
-   },
-   "outputs": [],
-   "source": [
-    "fit.summary()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Analyzing the sample"
+    "## Accessing the sampler outputs"
    ]
   },
   {
@@ -322,27 +324,18 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "### Saving the sampler output files"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "The sampler output files are written to a temporary directory which\n",
-    "is deleted upon session exit unless the ``output_dir`` argument is specified.\n",
-    "The ``save_csvfiles`` function moves the CmdStan CSV output files\n",
-    "to a specified directory without having to re-run the sampler.\n",
-    "The console output files are not saved. These files are treated as ephemeral; if the sample is valid, all relevant information is recorded in the CSV files."
+    "## Summarizing the sample"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
-   "metadata": {},
+   "metadata": {
+    "scrolled": true
+   },
    "outputs": [],
    "source": [
-    "# fit.save_csvfiles(dir=\"some_dir\")"
+    "fit.summary()"
    ]
   },
   {
@@ -369,7 +362,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "#### eight_schools.stan"
+    "**eight_schools.stan**"
    ]
   },
   {
@@ -386,7 +379,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "#### eight_schools.data.json"
+    "**eight_schools.data.json**"
    ]
   },
   {
@@ -434,6 +427,19 @@
    "source": [
     "print(eight_schools_fit.diagnose())"
    ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Saving the sampler output files\n",
+    "\n",
+    "The sampler output files are written to a temporary directory which\n",
+    "is deleted upon session exit unless the ``output_dir`` argument is specified.\n",
+    "The ``save_csvfiles`` method moves the CmdStan CSV output files\n",
+    "to a specified directory without having to re-run the sampler.\n",
+    "The console output files are not saved. These files are treated as ephemeral; if the sample is valid, all relevant information is recorded in the CSV files."
+   ]
   }
  ],
  "metadata": {