Bayesian Statistics HW6

Q1. Consider the following Bayesian hierarchical mixture model. Let \(\mathbf{Y} = \{Y_i\}_{i=1}^n\) denote a set of observations that are grouped into \(J\) clusters following different Gaussian distributions, and let \(\mathbf{Z} = \{Z_i\}_{i=1}^n\) denote a set of latent variables for allocating each observation to a cluster such that \(Z_i = j\) if and only if the \(i\)th observation is allocated to the \(j\)th cluster, for \(j = 1, \dots, J\). The Bayesian hierarchical mixture model is expressed as \[ \begin{aligned} Y_i \mid (\mu, \sigma^2, Z) &\overset{\text{ind}}{\sim} \mathcal N(\mu_{Z_i}, \sigma^2), \quad i = 1, \dots, n, \\ Z_i \mid \mathbf{V} &\overset{\text{iid}}{\sim} \prod_{j=1}^J \pi_j(\mathbf{V})^{\delta_j(Z_i)}, \quad i = 1, \dots, n, \\ V_j \mid \alpha &\overset{\text{iid}}{\sim} \text{Beta}(1, \alpha), \quad j = 1, \dots, J-1, \quad V_J = 1, \\ \mu_j \mid \sigma^2 &\overset{\text{iid}}{\sim} \mathcal N(0, \kappa \sigma^2), \quad j = 1, \dots, J, \\ \sigma^2 &\sim \text{Inv-Gamma}(\gamma, \beta),\end{aligned}\] where \(\mu = (\mu_1, \dots, \mu_J)\) denotes a set of cluster-specific means of a normal distribution, \(\sigma^2\) denotes a common variance across clusters, \(\mathbf{V} = \{V_j\}_{j=1}^{J-1}\) denotes a set of random variables from a so-called stick-breaking process with \(V_J = 1\), so that \[ \pi_j(\mathbf{V}) = V_j \prod_{l<j} (1 - V_l)\] becomes a mixture probability weight, \(\delta_j(Z_i)\) is a delta function such that a probability measure is concentrated at \(Z_i = j\), and \((\alpha, \kappa, \gamma, \beta)\) are known constants. Our goal is to construct the Gibbs sampler that simulates the target posterior distribution of the model, \(p(\mathbf{Z}, \mathbf{V}, \mu, \sigma^2 \mid \mathbf{Y})\).

(*) Obtain the full posterior.

\[\begin{aligned} p(\mathbf{Z}, \mathbf{V}, \mu, \sigma^2 \mid \mathbf{Y}) &\propto p(\mathbf Y|\mathbf{Z}, \mathbf{V}, \mu, \sigma^2)p(\mathbf{Z}, \mathbf{V}, \mu, \sigma^2) \\ &\propto p(\mathbf Y|\mathbf{Z}, \mu, \sigma^2)p(\mathbf Z | \mathbf V)p(\mathbf{V}, \mu, \sigma^2) \\ &\propto p(\mathbf Y|\mathbf{Z}, \mu, \sigma^2)p(\mathbf Z | \mathbf V)p(\mathbf V)p(\mu|\sigma^2)p(\sigma^2) \end{aligned}\]


(a) Construct a sampling step for \(\mathbf{Z}\) in the Gibbs sampler.

1). Conditional posterior of \(Z_i\)

Recall that \[ Z_i \mid \mathbf{V} \;\sim\; \prod_{j=1}^J \pi_j(\mathbf{V})^{\delta_j(Z_i)}, \] where \[ \pi_j(\mathbf{V}) \;=\; V_j \prod_{l<j} (1 - V_l), \] and \[ Y_i \mid (Z_i = j, \mu_j, \sigma^2) \;\sim\; N(\mu_j, \sigma^2). \]

Given all other parameters \(\mu, \sigma^2, \mathbf{V}\) and the observed data \(Y_i\), the posterior probability for \(Z_i\) taking a particular cluster value \(j\) is proportional to the prior \(\pi_j(\mathbf{V})\) times the likelihood of \(Y_i\) under that cluster’s parameters:

\[ p(Z_i = j \mid \mathbf{Y}, \mu, \sigma^2, \mathbf{V}) \;\propto\; \pi_j(\mathbf{V}) \,\cdot\, f\bigl(Y_i \mid \mu_j, \sigma^2\bigr), \] where \(f(\cdot)\) is the density of the normal distribution \(\mathcal N(\mu_j, \sigma^2)\). Explicitly,

\[ f\bigl(Y_i \mid \mu_j, \sigma^2\bigr) \;=\; \frac{1}{\sqrt{2\pi\,\sigma^2}} \exp\!\Bigl(\!-\frac{(Y_i - \mu_j)^2}{2\,\sigma^2}\Bigr). \]

2). Normalizing to obtain the full conditional

To obtain a proper probability, we normalize over \(j = 1, \dots, J\):

\[ p(Z_i = j \mid \mathbf{Y}, \mu, \sigma^2, \mathbf{V}) \;=\; \frac{\pi_j(\mathbf{V})\, f\bigl(Y_i \mid \mu_j, \sigma^2\bigr)} {\sum_{k=1}^J \pi_k(\mathbf{V})\, f\bigl(Y_i \mid \mu_k, \sigma^2\bigr)}. \]

3). Gibbs sampling step

In a Gibbs sampler, we update each \(Z_i\) in turn (or, since the \(Z_i\) are conditionally independent given \((\mu, \sigma^2, \mathbf{V})\), all of them in one block) by:

i). Compute unnormalized weights for \(j = 1, \dots, J\): \[ w_j \;=\; \pi_j(\mathbf{V}) \times f\bigl(Y_i \mid \mu_j, \sigma^2\bigr). \]

ii). Normalize to get the probabilities: \[ p(Z_i = j \mid \dots) \;=\; \frac{w_j}{\sum_{k=1}^J w_k}. \]

iii). Sample \(Z_i\) from the discrete distribution with the above probabilities.
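As a minimal NumPy sketch of steps i)–iii) (function and variable names are my own, not part of the assignment; clusters are 0-indexed), the update for all \(Z_i\) at once might look like:

```python
import numpy as np

def sample_Z(Y, mu, sigma2, V, rng):
    """Draw each Z_i from p(Z_i = j | ...) ∝ pi_j(V) * N(Y_i | mu_j, sigma2).

    Y: (n,) data; mu: (J,) cluster means; V: (J,) stick-breaking
    variables with V[-1] = 1. Works in log space for numerical stability.
    """
    # log pi_j = log V_j + sum_{l<j} log(1 - V_l)
    log_pi = np.log(V) + np.concatenate(([0.0], np.cumsum(np.log1p(-V[:-1]))))
    # Unnormalized log-weights; the constant -0.5*log(2*pi*sigma2) cancels.
    logw = log_pi[None, :] - 0.5 * (Y[:, None] - mu[None, :]) ** 2 / sigma2
    logw -= logw.max(axis=1, keepdims=True)       # guard against underflow
    probs = np.exp(logw)
    probs /= probs.sum(axis=1, keepdims=True)     # normalize over j
    # Inverse-CDF draw of each Z_i from its discrete distribution
    u = rng.random(len(Y))
    return (np.cumsum(probs, axis=1) < u[:, None]).sum(axis=1)
```

Updating all \(Z_i\) jointly is valid here because they are conditionally independent given \((\mu, \sigma^2, \mathbf{V})\).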


(b) Construct a sampling step for \(\mathbf{V}\) in the Gibbs sampler.

1). Prior of \(V_j\)

For \(j = 1, \dots, J-1\):

\[ V_j \mid \alpha \sim \text{Beta}(1, \alpha), \]

and \(V_J = 1\) is fixed. The Beta density is as follows:

\[ p(V_j) = \frac{\Gamma(1 + \alpha)}{\Gamma(1)\Gamma(\alpha)} V_j^{0} (1 - V_j)^{\alpha - 1} = \alpha (1 - V_j)^{\alpha - 1}. \]

2). Likelihood Contribution

The stick-breaking process defines the mixture weight for cluster \(j\):

\[ \pi_j(\mathbf{V}) = V_j \prod_{l<j} (1 - V_l). \]

Given the cluster assignment \(Z_i\), the likelihood contribution related to \(V_j\) depends on how many observations are assigned to cluster \(j\) and how many to clusters after \(j\).

For each \(i\):

- If \(Z_i = j\), the weight \(\pi_j(\mathbf{V}) = V_j \prod_{l<j} (1 - V_l)\) contributes a factor of \(V_j\).
- If \(Z_i = k\) for some \(k > j\), the weight \(\pi_k(\mathbf{V})\) contributes a factor of \((1 - V_j)\).

Thus, the likelihood contribution of \(V_j\) is:

- \(V_j\) raised to the number of observations assigned to cluster \(j\), times
- \((1 - V_j)\) raised to the number of observations assigned to clusters after \(j\).

3). Sufficient Statistics

Define:

- \(n_j = \sum_{i=1}^n \delta_j(Z_i)\): the number of observations assigned to cluster \(j\);
- \(m_j = \sum_{i=1}^n \sum_{k>j} \delta_k(Z_i)\): the number of observations assigned to clusters after \(j\).

4). Posterior Derivation

Combining the prior and likelihood:

\[ \begin{aligned} p(V_j \mid \mathbf{Z}) &\propto p(V_j) \times p(\mathbf{Z} \mid \mathbf{V}) \\ &\propto \left[ \alpha (1 - V_j)^{\alpha - 1} \right] \times V_j^{n_j} (1 - V_j)^{m_j} \\ &\propto V_j^{n_j} (1 - V_j)^{\alpha - 1 + m_j}, \end{aligned} \]

which is the Beta kernel. Thus,

\[ V_j \mid \mathbf{Z} \sim \text{Beta}(1 + n_j, \alpha + m_j). \]

5). Summary: Gibbs Sampling Step

For each \(j = 1, \dots, J-1\):

  1. Compute:
    • \(n_j = \#\{i : Z_i = j\}\)
    • \(m_j = \#\{i : Z_i > j\}\)
  2. Sample:

\[ V_j \sim \text{Beta}(1 + n_j, \alpha + m_j). \]

Note that \(V_J = 1\) is deterministic.
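A minimal NumPy sketch of this step (names are illustrative, assuming 0-indexed cluster labels):

```python
import numpy as np

def sample_V(Z, J, alpha, rng):
    """Draw V_j | Z ~ Beta(1 + n_j, alpha + m_j), with V_J fixed at 1."""
    counts = np.bincount(Z, minlength=J)        # n_j = #{i : Z_i = j}
    # m_j = #{i : Z_i > j}: total count at or beyond j, minus n_j itself
    m = counts[::-1].cumsum()[::-1] - counts
    V = np.empty(J)
    V[:-1] = rng.beta(1.0 + counts[:-1], alpha + m[:-1])
    V[-1] = 1.0                                 # V_J = 1 is deterministic
    return V
```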


(c) Construct a sampling step for \(\mu\) in the Gibbs sampler.

1). Prior of \(\mu_j\)

Each \(\mu_j\) has

\[ \mu_j \mid \sigma^2 \sim N(0, \kappa \sigma^2), \quad j = 1, \dots, J. \] That is, \[p(\mu_j | \sigma^2) \propto \exp\left( -\frac{1}{2\kappa \sigma^2} \mu_j^2 \right).\]

2). Likelihood contribution for \(\mu_j\)

Given the cluster assignment \(Z_i\), the observations in cluster \(j\) are:

\[ Y_i \mid (Z_i = j, \mu_j, \sigma^2) \sim \mathcal N(\mu_j, \sigma^2). \]

For all observations assigned to cluster \(j\) (i.e., \(Z_i = j\)), the joint likelihood is:

\[ \prod_{i: Z_i = j} \frac{1}{\sqrt{2\pi \sigma^2}} \exp\left( -\frac{(Y_i - \mu_j)^2}{2\sigma^2} \right). \]

Equivalently:

\[ \prod_{i: Z_i = j} p(Y_i \mid Z_i = j, \mu_j, \sigma^2) \propto \exp\left( -\frac{1}{2\sigma^2} \sum_{i: Z_i = j} (Y_i - \mu_j)^2 \right). \]

3). Combine prior and likelihood

Multiplying prior and likelihood:

\[\begin{aligned} \log p(\mu_j \mid \mathbf{Y}, \mathbf{Z}, \sigma^2) &\propto -\frac{1}{2\kappa \sigma^2} \mu_j^2 - \frac{1}{2\sigma^2} \sum_{i: Z_i = j} (Y_i - \mu_j)^2 \\ &= -\frac{1}{2\kappa \sigma^2} \mu_j^2 - \frac{1}{2\sigma^2} \left( \sum_{i: Z_i = j} Y_i^2 - 2\mu_j \sum_{i: Z_i = j} Y_i + n_j \mu_j^2 \right) \\ &\propto -\frac{1}{2\sigma^2} \left( \frac{1}{\kappa} + n_j \right) \mu_j^2 + \frac{1}{\sigma^2} \left( \sum_{i: Z_i = j} Y_i \right) \mu_j. \end{aligned}\]

where \(n_j = \sum_{i=1}^n \delta_j(Z_i)\) is the number of observations in cluster \(j\).

4). Posterior distribution

Define:

\[ S_j = \sum_{i: Z_i = j} Y_i. \]

Then:

\[ \log p(\mu_j \mid \mathbf{Y}, \mathbf{Z}, \sigma^2) \propto -\frac{1}{2\sigma^2} \left( \frac{1}{\kappa} + n_j \right) \left( \mu_j^2 - 2 \frac{S_j}{\frac{1}{\kappa} + n_j} \mu_j \right). \]

Completing the square:

\[ \propto -\frac{1}{2\sigma^2} \left( \frac{1}{\kappa} + n_j \right) \left( \mu_j - \frac{S_j}{\frac{1}{\kappa} + n_j} \right)^2. \]

Thus, the conditional posterior is:

\[ \mu_j \mid \mathbf{Y}, \mathbf{Z}, \sigma^2 \sim N\left( \frac{S_j}{\frac{1}{\kappa} + n_j}, \; \frac{\sigma^2}{\frac{1}{\kappa} + n_j} \right). \]

5). Gibbs Sampling Step

For each \(j = 1, \dots, J\):

i). Compute:
  • \(n_j = \#\{i : Z_i = j\}\),
  • \(S_j = \sum_{i: Z_i = j} Y_i\).

ii). Sample:

\[ \mu_j \sim N\left( \frac{S_j}{\frac{1}{\kappa} + n_j}, \; \frac{\sigma^2}{\frac{1}{\kappa} + n_j} \right). \]
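The two sub-steps above can be sketched in NumPy as follows (names are my own; note that empty clusters automatically fall back to the prior \(\mathcal N(0, \kappa\sigma^2)\), since \(n_j = S_j = 0\) there):

```python
import numpy as np

def sample_mu(Y, Z, sigma2, kappa, J, rng):
    """Draw mu_j | Y, Z, sigma2 ~ N(S_j / (1/kappa + n_j), sigma2 / (1/kappa + n_j))."""
    n = np.bincount(Z, minlength=J)                 # n_j = #{i : Z_i = j}
    S = np.bincount(Z, weights=Y, minlength=J)      # S_j = sum of Y_i with Z_i = j
    prec = 1.0 / kappa + n                          # posterior precision, up to 1/sigma2
    return rng.normal(S / prec, np.sqrt(sigma2 / prec))
```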


(d) Construct a sampling step for \(\sigma^2\) in the Gibbs sampler.

1) Prior for \(\sigma^2\)

The prior is an Inverse-Gamma distribution:

\[ \sigma^2 \sim \text{Inv-Gamma}(\gamma, \beta). \]

That is, the density:

\[ p(\sigma^2) \propto (\sigma^2)^{-(\gamma + 1)} \exp\left( -\frac{\beta}{\sigma^2} \right). \]

2) Likelihood contributions involving \(\sigma^2\)

There are two parts in the model where \(\sigma^2\) appears:

i). Likelihood of \(Y_i\):

For all observations:

\[ Y_i \mid (Z_i = j, \mu_j, \sigma^2) \sim N(\mu_j, \sigma^2). \]

This gives the likelihood contribution:

\[ \prod_{i=1}^n \frac{1}{\sqrt{2\pi \sigma^2}} \exp\left( -\frac{(Y_i - \mu_{Z_i})^2}{2\sigma^2} \right). \]

ii). Prior of \(\mu_j\):

Each \(\mu_j\) has:

\[ \mu_j \mid \sigma^2 \sim N(0, \kappa \sigma^2). \]

This gives:

\[ \prod_{j=1}^J \frac{1}{\sqrt{2\pi \kappa \sigma^2}} \exp\left( -\frac{\mu_j^2}{2\kappa \sigma^2} \right). \]

3) Joint likelihood in terms of \(\sigma^2\)

Combining all terms:

  • From the likelihood of \(Y_i\):

The log-likelihood contribution is:

\[ -\frac{n}{2} \log(\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^n (Y_i - \mu_{Z_i})^2. \]

  • From the prior of \(\mu_j\):

There are \(J\) terms:

\[ -\frac{J}{2} \log(\sigma^2) - \frac{1}{2\kappa \sigma^2} \sum_{j=1}^J \mu_j^2. \]

  • From the Inverse-Gamma prior:

\[ -(\gamma + 1) \log(\sigma^2) - \frac{\beta}{\sigma^2}. \]

The total log posterior of \(\sigma^2\) is proportional to:

\[ -\left( \gamma + 1 + \frac{n}{2} + \frac{J}{2} \right) \log(\sigma^2)- \frac{1}{2\sigma^2} \left( \sum_{i=1}^n (Y_i - \mu_{Z_i})^2 + \frac{1}{\kappa} \sum_{j=1}^J \mu_j^2 + 2\beta \right). \]

4) Recognize posterior form

This is the kernel of an Inverse-Gamma distribution. Specifically, the posterior is:

\[ \sigma^2 \mid - \sim \text{Inv-Gamma}(\gamma_{\text{post}}, \beta_{\text{post}}), \]

where:

\[ \gamma_{\text{post}} = \gamma + \frac{n}{2} + \frac{J}{2}, \]

\[ \beta_{\text{post}} = \beta + \frac{1}{2} \sum_{i=1}^n (Y_i - \mu_{Z_i})^2 + \frac{1}{2\kappa} \sum_{j=1}^J \mu_j^2. \]


5) Gibbs Sampling Step

  1. Compute:
  • Residual sum of squares:

\[ S_1 = \sum_{i=1}^n (Y_i - \mu_{Z_i})^2. \]

  • Prior sum:

\[ S_2 = \sum_{j=1}^J \mu_j^2. \]

  2. Update hyperparameters:

\[ \gamma_{\text{post}} = \gamma + \frac{n}{2} + \frac{J}{2}, \]

\[ \beta_{\text{post}} = \beta + \frac{1}{2} S_1 + \frac{1}{2\kappa} S_2. \]

  3. Sample:

\[ \sigma^2 \sim \text{Inv-Gamma}(\gamma_{\text{post}}, \beta_{\text{post}}). \]
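A minimal NumPy sketch of the three steps above (names illustrative; the Inv-Gamma\((a, b)\) draw is taken as the reciprocal of a Gamma\((a, \text{rate}=b)\) draw):

```python
import numpy as np

def sample_sigma2(Y, Z, mu, kappa, gamma, beta, rng):
    """Draw sigma2 | ... ~ Inv-Gamma(gamma + n/2 + J/2, beta + S1/2 + S2/(2*kappa))."""
    n, J = len(Y), len(mu)
    S1 = np.sum((Y - mu[Z]) ** 2)       # residual sum of squares
    S2 = np.sum(mu ** 2)                # sum of squared cluster means
    shape = gamma + 0.5 * n + 0.5 * J
    rate = beta + 0.5 * S1 + S2 / (2.0 * kappa)
    # If X ~ Gamma(shape, scale=1/rate), then 1/X ~ Inv-Gamma(shape, rate).
    return 1.0 / rng.gamma(shape, 1.0 / rate)
```

One full Gibbs sweep then simply applies the four steps (a)–(d) in sequence.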