Q1 Consider analyzing count data, \(y = (y_1, \dots, y_n)^T , y_i \in \mathbb{N}_0 := \{0, 1, 2, \dots \}, i = 1, \dots, n\).

(a) The basic approach is using the Poisson model, \[p(y_i|\theta) = \frac{\theta^{y_i}e^{-\theta}}{y_i!}, \quad y_i \in \mathbb{N}_0, \ i = 1, \dots, n.\] With an improper prior \(p(\theta) \propto 1, \theta > 0\), show that the posterior distribution \(p(\theta|y)\) is a gamma distribution. For the dataset countdata.txt, draw the posterior density of \(\theta\). [Do not use Monte Carlo simulation. You can directly visualize the density curve.]

solution

\[ \begin{aligned} p(\theta|y) \propto p(\theta)p(y|\theta) \propto 1 \cdot \prod_{i=1}^{n} \frac{\theta^{y_i}e^{-\theta}}{y_i!} \propto \theta^{\sum_{i=1}^{n} y_i} \exp\{-n\theta\} \end{aligned} \]

Recall that \(x^{\alpha-1} \exp\{-\beta x\}\) is the kernel of the Gamma distribution \(x \sim \Gamma(\alpha, \beta)\), where \(\alpha\) and \(\beta\) are the shape and rate parameters respectively. This shows that:

\[ \theta|y \sim \Gamma\left(\sum_{i=1}^{n} y_i + 1, n\right). \]

The graph of the posterior density is drawn as follows:

rm(list = ls())
library(ggplot2)
set.seed(2023311161)
countdata = read.table("countdata.txt")
y = countdata[,1]
x <- seq(0, 50, by = 0.01)

pdf <- dgamma(x, shape = sum(y) + 1, rate = length(y))  # theta | y ~ Gamma(sum(y) + 1, n)

plot1 <- ggplot(data.frame(x, pdf), aes(x = x, y = pdf)) +
  geom_line(color = "blue", linewidth = 1) +
  labs(x = "x", y = "Density", title = "Gamma Distribution") +
  xlim(40, 50)

plot1
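Because the posterior has the closed form \(\Gamma(\sum_i y_i + 1, n)\), its summaries can also be read off directly. The helper below is a sketch, not part of the original solution; since countdata.txt is not reproduced here, it is illustrated on toy counts.

```r
# Posterior summaries from the closed form theta | y ~ Gamma(sum(y) + 1, n).
# 'y' is any count vector; the toy counts below are purely illustrative.
post_summary <- function(y) {
  shape <- sum(y) + 1
  rate  <- length(y)
  c(mean  = shape / rate,                       # posterior mean = (sum(y) + 1) / n
    lower = qgamma(0.025, shape, rate = rate),  # 95% credible interval
    upper = qgamma(0.975, shape, rate = rate))
}
post_summary(c(3, 5, 4, 6))
```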


(b) For the dataset countdata.txt, perform a posterior predictive check by generating replicates from the posterior predictive distribution through Monte Carlo simulation. Choose appropriate test quantities. Draw conclusions based on the results.

solution

The posterior predictive distribution is as follows: \[p(\tilde y|y) = \int p(\tilde y|\theta)p(\theta|y)d\theta.\]

We may obtain posterior predictive draws by first drawing \(\theta\) from \(p(\theta|y)\) and then drawing the replicate \(\tilde y\) from \(p(\tilde y|\theta)\) (Monte Carlo composition sampling). The following shows the function ‘random_pois’ and the histogram of the replicated data.

random_pois <- function(N, y){
  yt <- numeric(N)
  for(i in 1:N){
    theta <- rgamma(1, shape = sum(y) + 1, rate = length(y)) # p(theta|y)
    yt[i] <- rpois(1, theta)                                 # p(y tilde|theta)
  }
  return(yt)
}
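Since rgamma and rpois are vectorized in R, the same composition sampling can be written without an explicit loop. The one-liner below is a sketch that is equivalent in distribution to random_pois.

```r
# Vectorized equivalent of random_pois: draw N values of theta from
# p(theta | y), then one Poisson count per theta (composition sampling).
random_pois_vec <- function(N, y) {
  theta <- rgamma(N, shape = sum(y) + 1, rate = length(y))
  rpois(N, theta)
}
```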
yt <- random_pois(1000,y)
plot2 <- ggplot(data.frame(yt), aes(x = yt)) +
  geom_histogram(aes(y = after_stat(density)),fill = "gray50", color = "black", bins = 30) +
  labs(x = "x", y = "Density")
plot2

The purpose of this posterior predictive check is to investigate whether the replicated data cover the overdispersed region far from the mean of the data. Accordingly, the ‘ppcheck_pois’ function counts how often the 3rd quartile of the replicate exceeds that of the observed data \(y\), or the 1st quartile of the replicate falls below that of \(y\).

ppcheck_pois <- function(M, y,N){
  pp <- 0
  for(i in 1:M){
    ytil <- random_pois(N,y)
    if(quantile(ytil,probs=0.75) >= quantile(y,probs=0.75)||
       quantile(ytil,probs=0.25) <= quantile(y,probs=0.25)){
      pp <- pp+1
    }
  }
  return(pp/M)
}
ppcheck_pois(200,y,1000)
## [1] 0

This shows that the replicates rarely cover the region far from the mean; the Poisson model fails to reproduce the dispersion of the observed data.
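A complementary check uses the sample variance as the test quantity, since overdispersion means \(\mathrm{Var}(y) > \bar y\). The sketch below (an alternative to the quartile-based check, not part of the original solution) estimates the posterior predictive p-value \(P(\mathrm{var}(\tilde y) \geq \mathrm{var}(y) \mid y)\); here each replicate matches the observed sample size \(n\), the usual convention for posterior predictive checks.

```r
# Alternative test quantity: sample variance. One theta is drawn per
# replicate, then an n-sized Poisson replicate, mirroring random_pois.
ppcheck_var <- function(M, y) {
  n <- length(y)
  stat <- replicate(M, {
    theta <- rgamma(1, shape = sum(y) + 1, rate = n)  # p(theta | y)
    var(rpois(n, theta))                              # variance of the replicate
  })
  mean(stat >= var(y))  # posterior predictive p-value
}
```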


(c) An unfortunate property of the Poisson model is that it cannot model overdispersed data or data in which the variance is greater than the mean. This is because Poisson model has one parameter. An alternative is considering the Poisson-gamma mixture model, \(y_i|\theta_i \sim Poi(\theta_i)\) and \(\theta_i|(\alpha, \beta) \sim \Gamma(\alpha, \beta), \alpha > 0, \beta > 0\). Hence, the density of \(y_i\) is given by: \[\begin{aligned} p(y_i|\alpha, \beta) &= \int_{0}^{\infty} p(y_i|\theta_i)p(\theta_i|\alpha, \beta)d\theta_i \\ &= \binom{\alpha + y_i - 1}{y_i} \left(\frac{\beta}{\beta + 1}\right)^{\alpha} \left(\frac{1}{\beta + 1}\right)^{y_i}, \quad y_i \in \mathbb{N}_0, \quad i = 1, \dots, n. \end{aligned} \] Note that this is the density of a negative binomial distribution if \(\alpha\) is a positive integer. Despite the flexibility of \(\alpha\) being any positive value in our setting, this model is commonly referred to as the negative binomial model for count data. For the dataset countdata.txt, draw a contour plot of the unnormalized posterior density of \((\alpha, \beta)\) with an improper prior \(p(\alpha, \beta) \propto 1\). [Do not use Monte Carlo simulation. You can directly draw a contour plot, though some iterative trials may be necessary to find a suitable range for visualization.]

solution

The posterior distribution \(p(\alpha, \beta|y)\) can be obtained as follows:

\[ \begin{aligned} p(\alpha, \beta|y) &\propto p(\alpha, \beta)p(y|\alpha, \beta) \\ &\propto 1 \times \prod_{i=1}^{n} \binom{\alpha + y_i - 1}{y_i} \left( \frac{\beta}{\beta + 1} \right)^{\alpha} \left( \frac{1}{\beta + 1} \right)^{y_i} \\ &= \left\{ \prod_{i=1}^{n} \binom{\alpha + y_i - 1}{y_i} \right\} \left( \frac{\beta}{\beta + 1} \right)^{n\alpha} \left( \frac{1}{\beta + 1} \right)^{\sum_i y_i}. \end{aligned} \]

Using this result, an R function can be defined that works as the unnormalized posterior density.

f <- function(a, b, y) {
  n=length(y)
  result = 1
  for(i in 1:n){
    temp = choose(a + y[i] - 1, y[i]) * (b / (b + 1))^a * (1 / (b + 1))^y[i]
    result = result*temp
  }
  return(result)
}
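A product of \(n\) probabilities below one can underflow to exactly zero for larger datasets. An equivalent sketch that accumulates on the log scale (lchoose is the log binomial coefficient) and exponentiates once at the end avoids this:

```r
# Numerically safer version of f: compute the log of the unnormalized
# posterior, then exponentiate once. Equal to the product form of f.
f_log <- function(a, b, y) {
  n <- length(y)
  logpost <- sum(lchoose(a + y - 1, y)) +
    n * a * log(b / (b + 1)) - sum(y) * log(b + 1)
  exp(logpost)  # unnormalized posterior density at (a, b)
}
```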

The range of \(\alpha\) and \(\beta\) is set to \(10 \leq \alpha \leq 35\) and \(0.2 \leq \beta \leq 0.8\), with grid spacings of \(0.1\) and \(0.01\) respectively. The density values are saved in the variable z.

a_values <- seq(10, 35, by = 0.1)    # 251 grid points for alpha
b_values <- seq(0.2, 0.8, by = 0.01) # 61 grid points for beta
z <- outer(a_values, b_values, FUN = Vectorize(function(a, b) f(a, b, y)))

The corresponding contour plot is displayed below.

contour(a_values, b_values, z, nlevels = 15,
        xlab = "a", ylab = "b",
        main = "Contour Plot of f(a, b)")


(d) For the dataset countdata.txt, perform the posterior predictive check by generating replicates from the posterior predictive distribution using Monte Carlo simulation. Use the same test quantities as those used in (b). To generate samples from the posterior predictive distribution, first draw \((\alpha, \beta)\) from the posterior distribution using the “grid sampling”, and then generate a replicate using the data distribution. You will need to find a suitable range and grid points for the grid sampling. Draw conclusions based on the results.

solution

The posterior predictive distribution for mixture model is as follows:

\[ p(\tilde{y}|y) = \int p(\tilde{y}|\alpha, \beta)p(\alpha, \beta|y)d(\alpha, \beta). \]

Given a draw of \((\alpha, \beta)\), the replicate \(\tilde{y}\) follows the negative binomial distribution Neg-Bin\((\alpha, \beta)\).

For the grid approximation (grid sampling), the same grids of \(\alpha\) and \(\beta\) determined in (c) are used again. Each value in z can then be viewed as an unnormalized discrete density value at the corresponding \((\alpha, \beta)\), so dividing z by its total sum gives an approximate discrete posterior over the grid.

z_gridapp <- z / sum(z)

The following random_ab function generates \((\alpha, \beta)\) using the inverse-CDF method and then generates the replicate \(\tilde{y}\). For each iteration (n in the R code), generate u from the uniform distribution \(Unif(0, 1)\) and accumulate the grid probabilities (prob in the R code) until they exceed u. At that point, record the indices of \(\alpha\) and \(\beta\) and look up the corresponding value of \((\alpha, \beta)\). Then generate the nth replicate with the rnbinom function; note that its prob parameter must be \(\frac{\beta}{\beta + 1}\), not \(\beta\) itself. After N iterations (N: the total number of draws), the generated \((\alpha, \beta)\) and \(\tilde{y}\) are returned in a data frame.

random_ab <- function(N, a, b, z) {
  anew   <- numeric(N)
  bnew   <- numeric(N)
  ytilda <- numeric(N)

  for (n in 1:N) {
    prob <- 0
    u <- runif(1)

    # inverse-CDF lookup: accumulate grid masses until they exceed u
    for (i in 1:nrow(z)) {
      for (j in 1:ncol(z)) {
        if (u < prob) { break }
        prob <- prob + z[i, j]
        aidx <- i
        bidx <- j
      }
      if (u < prob) { break }
    }

    anew[n] <- a[aidx]
    bnew[n] <- b[bidx]
    ytilda[n] <- rnbinom(1, size = a[aidx], prob = b[bidx] / (b[bidx] + 1))
  }  # end of n loop

  return(data.frame(a = anew, b = bnew, y = ytilda))
}  # end of function
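The double loop implements the inverse-CDF lookup by hand. Base R's sample.int can draw grid indices directly from the normalized weights, which is shorter and much faster; the sketch below is distributionally equivalent to random_ab (matrices are column-major in R, hence the index arithmetic).

```r
# Equivalent grid sampler via sample.int: draw linear indices of the
# (a, b) grid with probability proportional to z, map them back to
# row/column, and generate the negative binomial replicates.
random_ab_vec <- function(N, a, b, z) {
  idx  <- sample.int(length(z), N, replace = TRUE, prob = as.vector(z))
  aidx <- (idx - 1) %% nrow(z) + 1   # row index (alpha)
  bidx <- (idx - 1) %/% nrow(z) + 1  # column index (beta)
  data.frame(a = a[aidx], b = b[bidx],
             y = rnbinom(N, size = a[aidx], prob = b[bidx] / (b[bidx] + 1)))
}
```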

Again, a function for the posterior predictive check, ppcheck_mix, is defined as follows with the same test quantities as in (b): whether the 3rd quartile of the replicate is greater than or equal to that of the given data, or the 1st quartile of the replicate is less than or equal to that of the given data.

ppcheck_mix <- function(M, y, N, a, b, z) {
  pp <- 0
  for (i in 1:M) {
    ytil <- random_ab(N, a, b, z)$y
    if (quantile(ytil, probs = 0.75) >= quantile(y, probs = 0.75) ||
        quantile(ytil, probs = 0.25) <= quantile(y, probs = 0.25)) {
      pp <- pp + 1
    }
  }
  return(pp / M)
}

ppcheck_mix(200, y, 1000, a_values, b_values, z_gridapp)
## [1] 0.735

The result is much higher than in (b). This implies that the negative binomial model in (d) is far more capable of generating overdispersed data (variance greater than the mean), and therefore fits this dataset better.

Q2 Consider the normal linear model \[y|(\beta,\sigma^2) \sim N_n(X\beta,\sigma^2I)\] with the improper prior \(p(\beta,\sigma^2) \propto \sigma^{-2}\), where \(X\) is an \(n \times p\) design matrix of full column rank.

(a) Derive the conditional posterior distribution of \(\beta\) given \(\sigma^2\), say \(p(\beta|\sigma^2,y)\).

We have \(p(y|\beta,\sigma^2) = (2\pi\sigma^2)^{-n/2}\exp\bigl\{-\frac{1}{2\sigma^2}(y-X\beta)^T(y-X\beta)\bigr\}\), so, writing \(\hat\beta = (X^TX)^{-1}X^Ty\), \[\begin{aligned} p(\beta,\sigma^2|y) &\propto p(y|\beta,\sigma^2)p(\beta,\sigma^2) \\ &\propto (\sigma^2)^{-n/2-1}\exp\bigl\{-\tfrac{1}{2\sigma^2}(y-X\beta)^T(y-X\beta)\bigr\} \\ &= (\sigma^2)^{-n/2-1}\exp\bigl\{-\tfrac{1}{2\sigma^2}(y-X\hat\beta +X\hat\beta -X\beta)^T(y-X\hat\beta +X\hat\beta -X\beta)\bigr\} \\ &= (\sigma^2)^{-n/2-1}\exp\bigl\{-\tfrac{1}{2\sigma^2}\bigl[(y-X\hat\beta)^T(y-X\hat\beta) + \underbrace{2(y-X\hat\beta)^T(X\hat\beta-X\beta)}_{1)}+(\hat\beta-\beta)^TX^TX(\hat\beta-\beta)\bigr]\bigr\} \\ &= (\sigma^2)^{-n/2-1}\exp\bigl\{-\tfrac{1}{2\sigma^2}\bigl[(y-X\hat\beta)^T(y-X\hat\beta) +(\hat\beta-\beta)^TX^TX(\hat\beta-\beta)\bigr]\bigr\} . \end{aligned}\]

The underbraced term 1) vanishes because \(\hat\beta\) solves the normal equations, \(X^T(y-X\hat\beta) = 0\). Then \(\exp\bigl\{-\frac{1}{2\sigma^2}(\hat\beta-\beta)^TX^TX(\hat\beta-\beta)\bigr\}\) is the only factor of \(p(\beta,\sigma^2|y)\) involving \(\beta\). So, \[\begin{aligned} p(\beta|\sigma^2,y) &\propto \exp\bigl\{-\tfrac{1}{2\sigma^2}(\hat\beta-\beta)^TX^TX(\hat\beta-\beta)\bigr\} \\ &\propto (2\pi)^{-p/2}|\sigma^2(X^TX)^{-1}|^{-1/2}\exp\bigl\{-\tfrac{1}{2}(\beta-\hat\beta)^T[\sigma^2(X^TX)^{-1}]^{-1}(\beta-\hat\beta)\bigr\}, \end{aligned}\] and thus \(\beta|(\sigma^2,y) \sim N_p(\hat\beta,\sigma^2(X^TX)^{-1})\).

(b) Let \(\beta_1\) be the first element of \(\beta\). Derive the conditional posterior distribution of \(\beta_1\) given \(\sigma^2\), say \(p(\beta_1|\sigma^2,y)\)

We may use the result of (a): the marginals of a multivariate normal distribution are normal, so \[\beta_1|(\sigma^2,y) \sim N(\hat\beta_1,\sigma^2[(X^TX)^{-1}]_{1,1})\] where \(\hat\beta = (X^TX)^{-1}X^Ty\) and \(\hat\beta_1\) is its first element.

(c) Derive the conditional posterior distribution of \(\sigma^2\) given \(\beta\), say \(p(\sigma^2|\beta,y)\).

Return to \(p(\beta,\sigma^2|y)\): \[p(\beta,\sigma^2|y) \propto (\sigma^2)^{-n/2-1}\exp\bigl\{-\frac{1}{2\sigma^2}(y-X\beta)^T(y-X\beta)\bigr\}.\] Viewed as a function of \(\sigma^2\) with \(\beta\) held fixed, \((y-X\beta)^T(y-X\beta)\) is a constant. The scaled inverse chi-square distribution \(\text{Inv-}\chi^2(\theta|\nu,s^2)\) has the following pdf: \[p(\theta) \propto \theta^{-(\nu/2+1)}\exp\{-\nu s^2/(2\theta)\}.\] Matching kernels, we conclude \[\sigma^2|(\beta,y) \sim \text{Inv-}\chi^2\Bigl(n,\frac{(y-X\beta)^T(y-X\beta)}{n}\Bigr).\] Note that the scale involves \(\beta\), not \(\hat\beta\): given \(\beta\), the term \((\hat\beta-\beta)^TX^TX(\hat\beta-\beta)\) is still divided by \(\sigma^2\) and cannot be dropped, and indeed \((y-X\beta)^T(y-X\beta) = (y-X\hat\beta)^T(y-X\hat\beta) + (\hat\beta-\beta)^TX^TX(\hat\beta-\beta)\).

(d) By marginalizing out \(\sigma^2\) from \(p(\beta,\sigma^2|y)\), derive the marginal posterior distribution of \(\beta\), say \(p(\beta|y)\).

1) Suppose \(z|u\sim N_p(\mu,u\Sigma)\) and \(u\sim \text{Inv-}\chi^2(\nu,s^2)\); then \(z\sim t_{\nu}(\mu,s^2\Sigma)\).

\[\begin{aligned} p(z) &\propto \int p(z|u)p(u)du \\ &\propto \int|u\Sigma|^{-1/2}\exp\bigl\{-\frac{1}{2u}\underbrace{(z-\mu)^T\Sigma^{-1}(z-\mu)}_{:=\Delta_z}\bigr\}u^{-(\nu/2+1)}\exp\Bigl\{\frac{-\nu s^2}{2u}\Bigr\}du \\ &\propto \int \underbrace{u^{-p/2}u^{-(\nu/2+1)}\exp\biggl\{-\frac{\Delta_z+\nu s^2}{2u}\biggr\}}_{2)}du \\ & \Bigl( \text{2) is a kernel of Inv-}\chi^2(\nu+p,\frac{\Delta_z+\nu s^2}{\nu+p}) \Bigr)\\ &= \frac{\Gamma(\frac{\nu+p}{2})}{(\frac{\nu+p}{2})^{\frac{\nu+p}{2}}\Bigl(\sqrt{\frac{\Delta_z+\nu s^2}{\nu+p}}\Bigr)^{\nu+p}} \\ &\propto \biggl(\frac{\Delta_z+\nu s^2}{\nu+p}\biggr)^{-(\nu+p)/2} \\ &\propto \biggl(\frac{\Delta_z+\nu s^2}{\nu}\biggr)^{-(\nu+p)/2} \\ &\propto \biggl(1+\frac{\Delta_z}{\nu s^2}\biggr)^{-(\nu+p)/2} \\ &= \Bigl\{1+\frac{1}{\nu}(z-\mu)^T\frac{\Sigma^{-1}}{s^2}(z-\mu)\Bigr\}^{-(\nu+p)/2}, \end{aligned}\] which is a kernel of \(t_{\nu}(\mu,s^2\Sigma)\).

2) \(z \Rightarrow (\beta|y)\) and \(u\Rightarrow (\sigma^2|y)\)

We have \(\beta|(\sigma^2,y)\sim N_p(\hat\beta,\sigma^2(X^TX)^{-1})\) from (a) and \(\sigma^2|y \sim\text{Inv-}\chi^2(n-p,s^2)\) with \(s^2=\frac{1}{n-p}(y-X\hat\beta)^T(y-X\hat\beta)\), which is derived in (f) below. Applying 1) with \(\nu = n-p\) and \(\Sigma = (X^TX)^{-1}\) gives \(\beta|y\sim t_{n-p}(\hat\beta,s^2(X^TX)^{-1})\).
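The factorization \(p(\beta,\sigma^2|y) = p(\beta|\sigma^2,y)\,p(\sigma^2|y)\) used here also yields a direct Monte Carlo sampler: draw \(\sigma^2\) from the scaled inverse-\(\chi^2\) of (f), then \(\beta\) from the conditional normal of (a). A minimal sketch on simulated data follows; all object names are illustrative, not from the assignment.

```r
# Composition sampler for p(beta, sigma^2 | y) under the prior 1/sigma^2.
set.seed(1)
n <- 50; p <- 2
X <- cbind(1, rnorm(n))                      # design matrix with intercept
y_sim <- X %*% c(1, 2) + rnorm(n, sd = 0.5)  # simulated response

XtX_inv  <- solve(t(X) %*% X)
beta_hat <- XtX_inv %*% t(X) %*% y_sim       # OLS estimate
s2 <- sum((y_sim - X %*% beta_hat)^2) / (n - p)

draw_posterior <- function() {
  sigma2 <- (n - p) * s2 / rchisq(1, df = n - p)  # sigma^2 | y ~ Inv-chi^2(n - p, s^2)
  R <- chol(sigma2 * XtX_inv)                     # upper Cholesky: t(R) %*% R = cov
  beta <- as.vector(beta_hat) + as.vector(t(R) %*% rnorm(p))  # beta | sigma^2, y
  list(beta = beta, sigma2 = sigma2)
}
```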

(e) By marginalizing out \(\sigma^2\) from \(p(\beta_1,\sigma^2|y)\), derive the marginal posterior distribution of \(\beta_1\), say \(p(\beta_1|y)\).

We may directly use the result of (d): the marginals of a multivariate \(t\) distribution are \(t\) with the same degrees of freedom, so \[\beta_1|y\sim t_{n-p}(\hat\beta_1,s^2[(X^TX)^{-1}]_{1,1}).\]

(f) By marginalizing out \(\beta\) from \(p(\beta,\sigma^2|y)\), derive the marginal posterior distribution of \(\sigma^2\), say \(p(\sigma^2|y)\).

Recall that \(p(\beta,\sigma^2|y) \propto (\sigma^2)^{-n/2-1}\exp\bigl\{-\frac{1}{2\sigma^2}\bigl[(y-X\hat\beta)^T(y-X\hat\beta) +(\hat\beta-\beta)^TX^TX(\hat\beta-\beta)\bigr]\bigr\}\). The factor \(\exp\bigl\{-\frac{1}{2\sigma^2}(\hat\beta-\beta)^TX^TX(\hat\beta-\beta)\bigr\}\) is a kernel of \(N_p(\hat\beta,\sigma^2(X^TX)^{-1})\), so \[\int\exp\bigl\{-\tfrac{1}{2\sigma^2}(\hat\beta-\beta)^TX^TX(\hat\beta-\beta)\bigr\}d\beta=(2\pi)^{p/2}|\sigma^2(X^TX)^{-1}|^{1/2} = (2\pi\sigma^2)^{p/2}|X^TX|^{-1/2}.\] Then, \[\begin{aligned} \int p(\beta,\sigma^2|y)d\beta &\propto (\sigma^2)^{-n/2-1}\exp\bigl\{-\frac{1}{2\sigma^2}(y-X\hat\beta)^T(y-X\hat\beta)\bigr\}\cdot(2\pi\sigma^2)^{p/2}|X^TX|^{-1/2} \\ &\propto (\sigma^2)^{-(n-p)/2-1}\exp\Bigl\{-\frac{n-p}{2\sigma^2}\frac{(y-X\hat\beta)^T(y-X\hat\beta)}{n-p}\Bigr\}, \end{aligned}\] which is a kernel of \(\text{Inv-}\chi^2\Bigl(n-p,\frac{(y-X\hat\beta)^T(y-X\hat\beta)}{n-p}\Bigr)\). Thus, \[\sigma^2|y\sim\text{Inv-}\chi^2\Bigl(n-p,\frac{(y-X\hat\beta)^T(y-X\hat\beta)}{n-p}\Bigr).\]