Add chapter on multiple imputation.#935

Open
abner-hb wants to merge 9 commits into stan-dev:master from abner-hb:master

@abner-hb abner-hb commented Mar 12, 2026

Submission Checklist

  • Builds locally YES
  • New functions marked with <<{ since VERSION }>> YES (no new functions)
  • Declare copyright holder and open-source license: see below

Summary

Add a chapter on multiple imputation to the Stan User's Guide.

Copyright and Licensing

Please list the copyright holder for the work you are submitting (this will be you or your assignee, such as a university or company): Abner Heredia Bustos

By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:

@WardBrian (Member)

Hi @abner-hb, thanks for this! I just made a couple commits to remove the changes to the built docs -- we let our Jenkins jobs build those for us during releases.

I'll ask @bob-carpenter to take a look at the contents when he gets a chance

@WardBrian WardBrian requested a review from bob-carpenter March 12, 2026 15:35
@bob-carpenter (Member)

I can review this.

@bob-carpenter (Member) left a comment

Thanks so much for contributing this.

I am really sorry that I left 73 comments on such a short chapter. It was meant pedagogically and I hope it helps other things you write. It took a couple years of this kind of back-and-forth with Gelman and Vehtari and Goodrich before they stopped marking up everything I wrote this way. Gelman and Vehtari are excellent role models for writing clarity.

If you'd rather not do this, I'm happy to make all the changes I suggested myself.

their precision. So, it is often necessary to account explicitly for
the missing data when fitting a model of interest[^bda].

[^bda]: Chapter 18 in @GelmanEtAl:2013 offers a Bayesian perspective

Member

I just skimmed Chapter 18. Gelman et al. do not provide a fully Bayesian perspective; they instead use multiple imputation. The fully Bayesian perspective is given in the User's Guide chapter on missing data.

I would also put this into the main text. Only use footnotes in doc as a last resort.

Author

It's not fully Bayesian but isn't it partially Bayesian?

Member

I think it's fair to call it "approximately Bayesian," as that's how Gelman talks about anything from maximum likelihood point estimates to VI.

<!-- there is no need to use any special pooling
rules[^rubin] to account for the uncertainty in the imputation. -->

[^rubin]: Chapter 3, page 45 of @carpenter-etal:2023 summarizes one

Member

Remove---this seems to be an unmoored comment. At least it doesn't show up when I render the html.


## Cut models

[**NOTE**: I would greatly appreciate any comments or changes to

Member

Remove this note.

If you're not comfortable writing it, you can reduce this section to something really simple.

The point is just that using ad hoc multiple imputation like this is equivalent to doing cut in BUGS as described by Plummer, because there's no information flow from the second-stage inference back to the multiple imputation as you would get in the fully Bayesian model described in the chapter on missing data earlier in the user's guide.
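
As a hedged sketch of this point in the chapter's notation: the symbol $q$ for the first-stage imputation model and the draw index $m$ are assumptions introduced here for illustration, not taken from the chapter or from Plummer.

```latex
\begin{align*}
\text{fully Bayesian:} \quad
  & p(\theta \mid x^{\text{obs}})
    = \int p(\theta \mid x^{\text{obs}}, x^{\text{mis}}) \,
           p(x^{\text{mis}} \mid x^{\text{obs}}) \, \mathrm{d}x^{\text{mis}}, \\
\text{two-stage (cut):} \quad
  & x^{\text{imp}, (m)} \sim q(x^{\text{mis}} \mid x^{\text{obs}}), \qquad
    \theta^{(m)} \sim p(\theta \mid x^{\text{obs}}, x^{\text{imp}, (m)}),
    \qquad m = 1, \ldots, M.
\end{align*}
```

In the first line, $p(x^{\text{mis}} \mid x^{\text{obs}})$ comes from the same joint model as $\theta$. In the second, $q$ is fixed before the second stage runs, so nothing learned about $\theta$ can change the imputations. The pooled draws $\theta^{(1)}, \ldots, \theta^{(M)}$ target $\int p(\theta \mid x^{\text{obs}}, x^{\text{mis}}) \, q(x^{\text{mis}} \mid x^{\text{obs}}) \, \mathrm{d}x^{\text{mis}}$, which matches the fully Bayesian posterior only when $q$ coincides with the joint model's $p(x^{\text{mis}} \mid x^{\text{obs}})$.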

[**NOTE**: I would greatly appreciate any comments or changes to
improve this subsection.]

A full bayesian probability model includes a feedback flow of

Member

bayesian -> Bayesian

The model doesn't have any feedback or flow per se---it's just how joint distributions work.

influence only some parameters in the model. From @plummer:2015,
p. 37:

> Cut models arise in applications with multiple data sources that

Member

Remove comments.

p. 37:

> Cut models arise in applications with multiple data sources that
provide information about different parameters in the model [...]

Member

OK, rather than filling this all in, I'd just write a one-line summary and point to Plummer's article.

@WardBrian WardBrian requested a review from bob-carpenter April 13, 2026 19:20
@bob-carpenter (Member) left a comment

I went over this again and marked many of the grammatical nitpicky comments as resolved. I think they'd be better the way I was suggesting, but it's more important to actually publish this.

If you don't want to make these changes, @abner-hb, just let me know and I can go and make them myself.

I really appreciate your taking the time to write this despite the huge flurry of comments I've left. I hope they've been more helpful than frustrating, as that was my intention.


@article{plummer:2015,
author = {Plummer, Martyn},
title = {Cuts in Bayesian graphical models},

Member

This needs {B}ayesian in the title, or "Bayesian" will get lower-cased.

We generally don't need the url, doi, or publisher, but they're OK to leave in.

And thanks for citing Martyn's paper.
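
A minimal sketch of the brace-protected title, showing only the fields visible in the diff hunk above. The remaining fields of the entry are left out rather than guessed, so this fragment is not a complete entry on its own.

```bibtex
Only the fields shown in the diff; all other fields stay as they are in the PR.

@article{plummer:2015,
  author = {Plummer, Martyn},
  title  = {Cuts in {B}ayesian graphical models},
}
```

The braces around `B` stop bibliography styles that convert titles to sentence case from lower-casing the proper adjective. The text outside the `@article{...}` entry is ignored by BibTeX and serves as a comment.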


Suppose that we have a matrix $x$ with columns
$x_{\cdot, 1}, \ldots, x_{\cdot, K}$ that we want to use to sample
values from a vector of quantities of interest called $\theta$. With

Member

want to use to sample values -> that make up covariates in a regression with quantities of interest $\theta$.

posterior distribution of $\theta$ as $p(\theta \mid x^\text{comp})$.
But with missing data, our matrix is split into $x^{\text{obs}}$ (the
observed values of $x$) and $x^{\text{mis}}$ (the missing values of
$x$). This can be problematic because, in general, our knowledge may

Member

I don't understand the last sentence. Just remove "This can be problematic ...".

\end{align*}
where $x^{\text{imp}}$ is a data set that includes imputed values of
$x^{\text{mis}}$.

Member

You might want to continue line 52 by noting that this depends on a model of $x$, which you typically don't have in a regression, because the inferences for the parameters are independent of the model of $x$ when $x$ is fully observed.
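
A hedged sketch of why that independence holds and why it breaks with missing values. The outcome $y$, the covariate-model parameters $\phi$, independent priors on $\theta$ and $\phi$, and an ignorable missingness mechanism are all assumptions introduced here for illustration; none of them are in the chapter's notation.

```latex
\begin{align*}
p(\theta, \phi \mid y, x)
  &\propto p(y \mid x, \theta) \, p(\theta) \times p(x \mid \phi) \, p(\phi)
  && \text{($x$ fully observed)}, \\
p(\theta, \phi, x^{\text{mis}} \mid y, x^{\text{obs}})
  &\propto p(y \mid x^{\text{obs}}, x^{\text{mis}}, \theta) \, p(\theta)
     \times p(x^{\text{obs}}, x^{\text{mis}} \mid \phi) \, p(\phi)
  && \text{($x$ partly missing)}.
\end{align*}
```

In the first line the two factors share no unknowns, so $p(\theta \mid y, x) \propto p(y \mid x, \theta) \, p(\theta)$ and the model of $x$ drops out. In the second line $x^{\text{mis}}$ appears in both factors, so integrating it out couples $\theta$ to the model of $x$.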
