Monte Carlo Methods HW3
Q1. Data augmentation
Write R code for data augmentation and estimate the toxicity probability \(\hat\pi_d = \alpha_d^{\exp(\hat a)}, d =1,\dots,J\), when patient 16 arrives on day 364.
The question asks for the estimate at the arrival of patient no. 16, so we use the data of the patients up to no. 15.
rm(list = ls())
library(dplyr)
set.seed(2023311161)
dose_data <- read.table("dose.txt", header = TRUE)
dose_data <- dose_data[1:15,]
n <- nrow(dose_data)
This question sets the dose levels as 20, 30, 40, and 50, and the corresponding prespecified constants \(\alpha\) are 0.1, 0.15, 0.2, and 0.25. The assessment period \(T\) is 63 days (9 weeks), and I set the number of intervals of the period \(K\) to 9. Also, the constant \(C\) regarding \(\lambda_k\) is given as 2, and I set the initial value of \(a\) to 0. The total number of iterations is 5000.
alpha <- c(0.1, 0.15, 0.2, 0.25) # Prespecified toxicity probabilities for dose levels
K <- 9 # Number of intervals
T <- 63 # Toxicity assessment period in days (9 weeks)
iterations <- 5000 # Number of MCMC iterations
C <- 2
a <- 0 # Starting value for a
The \(\tilde\lambda_k\) is defined as \(\tilde\lambda_k := K/(T(K-k+0.5))\), which corresponds to the ‘tilde_lambda’ vector. Also, \(\lambda_k \sim \Gamma(\tilde\lambda_k/C, 1/C)\), so I set the initial values of each \(\lambda_k\) as the following R code.
tilde_lambda <- sapply(1:K, function(k) K/(T*(K - k + 0.5)) )
lambda <- sapply(1:K, function(k) rgamma(1, shape = tilde_lambda[k]/C, rate = 1/C))
Now, we have to modify and add some values in the dataset. This covers the case where patient 16 arrives on day 364, which becomes the end of the study. So, the ‘DayoffStudy’ values greater than 364 are shifted to 364. Also, some \(u_i\)’s have values greater than \(T\), so those \(u_i\)’s are truncated to \(T\).
If toxicity occurs (DLT = 1), we set \(t_i = u_i\); otherwise, we set \(t_i = \infty\). \(Y_i\) is 1 if DLT = 1, and 0 if DLT = 0 and \(u_i = T\). For the rest of the cases, we set \(Y_i\) to ‘NA’. By definition, we set \(x_i\) as the minimum of \(u_i\) and \(t_i\). The ‘alpha’ column is generated according to the dose level.
dose_data <- dose_data %>%
  mutate(DayoffStudy = ifelse(DayoffStudy >= 364, 364, DayoffStudy)) %>%
  mutate(ui = ifelse(DayoffStudy - DayonStudy < T, DayoffStudy - DayonStudy, T)) %>%
  mutate(ti = ifelse(DLT == 1, ui, Inf)) %>%
  mutate(Yi = case_when(DLT == 1 ~ 1,
                        DLT == 0 & ui == T ~ 0, # T = 63
                        TRUE ~ NA)) %>%
  mutate(xi = pmin(ui, ti)) %>% # pmin: element-wise minimum (min() would collapse the column)
  mutate(alpha = case_when(Dose == 30 ~ 0.15,
                           Dose == 40 ~ 0.2,
                           Dose == 50 ~ 0.25,
                           TRUE ~ 0.1))
dose_data
The following code is associated with \(s_{i,k}\), which is defined in Section 2.3 of the paper. To summarize the definition, \(s_{i,k}\) equals \(T/K\) (7 in my setting) if \(x_i\) lies to the right of the given interval, 0 if it lies to the left of the interval, and \(x_i - h_{k-1}\) if \(x_i\) is in the interval \([h_{k-1}, h_k)\). For example, when the input is \(x=37\), it is in the 6th interval \([35, 42)\), so \(s_{i,k} = 7, 7, 7, 7, 7, 2, 0, 0, 0\).
create_si <- function(x) {
  # Initialize s vector with K = 9 zeros
  s <- numeric(9)
  # Find the interval index k (k ranges from 1 to 9 based on x)
  if (x == 63) { k <- 9 } else { k <- floor(x / 7) + 1 }
  # Set s_j = 7 for j < k
  if (k > 1) {
    s[1:(k - 1)] <- 7
  }
  # Set s_k = x - h_{k-1}
  s[k] <- x - (k - 1) * 7
  return(s)
}
create_si(37)
## [1] 7 7 7 7 7 2 0 0 0
Based on this function, we can now create a new function that obtains all the \(s_{i,k}\) for all \(i,k\). The initial values of all \(s_{i,k}\) are in the ‘sis’ matrix.
create_sis <- function(dose_data){
n <- nrow(dose_data)
sis <- matrix(0, nrow = n, ncol = K)
for(r in 1:n){
sis[r,] <- create_si(dose_data[r,'xi'])
}
return(sis)
}
sis <- create_sis(dose_data)
This code shows how \(\delta_{i,k}\) is defined. \(\delta_{i,k}\) becomes 1 when the toxicity of the \(i\)th patient occurs in the \(k\)th interval. If there is no toxicity or if \(Y_i\) is missing, we leave \(\delta_{i,k}\) as 0.
create_deltas <- function(dose_data){
deltas <- matrix(0, nrow = nrow(dose_data), ncol = K)
for(r in 1:nrow(dose_data)){
if( is.na(dose_data[r,'Yi']) ){next}
else if(dose_data[r,'Yi'] == 1){
k <- ceiling(dose_data[r,'xi']/7) # 7 == T/K
deltas[r,k] <- 1
}
}
return(deltas)
}
deltas <- create_deltas(dose_data)
deltas
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
## [1,] 0 0 0 0 0 0 0 0 0
## [2,] 0 0 0 0 0 0 0 0 0
## [3,] 0 0 0 0 0 0 0 0 0
## [4,] 0 0 0 0 0 0 0 0 0
## [5,] 0 0 0 0 0 0 0 0 0
## [6,] 0 0 0 0 0 0 0 0 0
## [7,] 0 0 0 0 0 0 0 0 0
## [8,] 0 0 0 0 0 0 0 0 0
## [9,] 0 0 0 0 0 0 0 0 0
## [10,] 0 0 0 0 0 0 0 0 0
## [11,] 0 0 1 0 0 0 0 0 0
## [12,] 0 0 1 0 0 0 0 0 0
## [13,] 0 0 0 0 0 0 0 0 0
## [14,] 0 0 0 0 0 0 0 0 0
## [15,] 0 0 1 0 0 0 0 0 0
The missing \(Y_i\)’s are generated according to the following distribution: \[y_i|(\mathcal D_{obs},a,\lambda) \sim \text{Bernoulli}\biggl( \frac{ \alpha_{d_i}^{\exp(a)}\exp\{-\sum_{k=1}^K\lambda_ks_{i,k} \} }{ 1-\alpha_{d_i}^{\exp(a)} + \alpha_{d_i}^{\exp(a)}\exp\{-\sum_{k=1}^K\lambda_ks_{i,k} \} } \biggr)\] Here \(\alpha_{d_i}\) is the prespecified \(\alpha\) of patient \(i\) corresponding to its dose level \(d_i\). The Bernoulli parameter is the ‘prob’ value.
sample_Yi <- function(alpha_d, a, lambda, si) {
  prob <- alpha_d^(exp(a)) * exp(-sum(lambda * si)) /
    (1 - alpha_d^(exp(a)) + alpha_d^(exp(a)) * exp(-sum(lambda * si)))
  return(rbinom(1, 1, prob))
}
The next step is to obtain the function \(f(y|a)\), denoted as the likelihood \(L(y|a)\) in the paper. It computes \((\alpha_{d_i}^{\exp(a)})^{y_i}(1 - \alpha_{d_i}^{\exp(a)} )^{1-y_i}\) for every patient \(i\) and multiplies all of them.
f_y_a <- function(a) {
Yi <- dose_data$Yi
alpha <- dose_data$alpha
prod_val <- prod(sapply(1:length(Yi), function(i) {
alpha[i]^(exp(a) * Yi[i]) * (1 - alpha[i]^(exp(a)))^(1 - Yi[i])
}))
# Check for extreme cases
if (is.infinite(prod_val) || is.nan(prod_val)) prod_val <- 0
return(as.numeric(prod_val))
}
‘dose_na’ holds the rows with missing values, and ‘Y_missings’ stores the imputed samples. ‘a_samples’ and ‘lambda_samples’ store the posterior samples of each parameter.
na_rows <- which(is.na(dose_data$Yi))
dose_na <- dose_data[na_rows,]
Y_missings <- matrix(NA, nrow = length(na_rows), ncol = iterations)
rownames(Y_missings) <- na_rows
a_samples <- rep(NA, iterations)
lambda_samples <- matrix(NA, nrow = K, ncol = iterations)
Now, we delve into the MCMC algorithm. This code directly follows the algorithm given in the paper.
For the Imputation step (I-step), we impute the missing \(Y_i\)’s using the corresponding \(s_{i,k}\) and the sample_Yi function. If 1 is imputed for a missing \(Y_i\), we generate a new \(t_i\) between 1 and its \(u_i\), and set the new \(x_i\) accordingly. After updating dose_data, we update \(\delta_{i,k}\) and \(s_{i,k}\) accordingly.
We update \(a\) and \(\lambda_k\) in the P-step. Since MH is needed for \(a\), we use \(\mathcal N(a^{(t)}, 2^2)\) as the proposal density. The acceptance ratio (MH_R) is then \[\frac{f(y|a')f(a')}{f(y|a^{(t)}) f(a^{(t)})},\] where \(f(a)\) is the \(\mathcal N(0,1)\) prior density; the proposal densities cancel out because the normal proposal is symmetric.
The posterior of \(\lambda_k\) is \[\lambda_k|y \sim \Gamma\Bigl(\tilde\lambda_k/C + \sum_{i=1}^n\delta_{i,k}, 1/C + \sum_{i=1}^ny_is_{i,k} \Bigr).\] So, each shape and rate parameter is updated using the ‘deltas’ and ‘sis’ matrix, and generate new \(\lambda_k\) accordingly.
# MCMC
for (iter in 1:iterations) {#iter <- 1
# I-step
for (i in 1:length(na_rows) ) {
si <- sis[na_rows[i],]
Y_missings[i,iter] <- sample_Yi(dose_na[i,'alpha'], a, lambda, si)
if(Y_missings[i,iter] == 1){
ui_temp <- dose_na[i,'ui']
dose_na[i,'ti'] <- sample(seq_len(ui_temp), 1) # uniform draw from 1,...,ui
dose_na[i,'xi'] <- min(dose_na[i,'ui'], dose_na[i,'ti'])
}
}
dose_na$Yi <- Y_missings[,iter]
dose_data[na_rows,] <- dose_na
deltas <- create_deltas(dose_data)
sis <- create_sis(dose_data)
# P-step
# Update a
a_temp <- rnorm(1, a, 2) # from normal with mean a and sd 2
MH_R <- (f_y_a(a_temp)*dnorm(a_temp,0,1) )/(f_y_a(a)*dnorm(a,0,1) )
if(runif(1) < MH_R){
a_samples[iter] <- a_temp
a <- a_temp
} else {
a_samples[iter] <- a
}
# Update lambda
for (k in 1:K) {
shape <- tilde_lambda[k]/C + sum(deltas[,k])
rate <- 1/C + sum(dose_data$Yi * sis[,k]) # sum over patients, matching the posterior above
lambda[k] <- rgamma(1, shape = shape, rate = rate)
}
lambda_samples[,iter] <- lambda
}
The following shows the trace plot of \(a\) after a burn-in period of 1000 iterations.
burnin <- 1000
ts.plot(a_samples[burnin:iterations], main = "trace plot of a", ylab = 'a')
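With the chain in hand, the estimate the question asks for, \(\hat\pi_d = \alpha_d^{\exp(\hat a)}\), follows by plugging the posterior mean of \(a\) into the dose-toxicity curve. A minimal sketch (the `a_draws` vector here is a placeholder; in the actual analysis it would be `a_samples[burnin:iterations]` from the chain above):

```r
set.seed(2023311161)
alpha <- c(0.1, 0.15, 0.2, 0.25)  # prespecified alpha_d for dose levels 20, 30, 40, 50
a_draws <- rnorm(4000, 0, 0.3)    # placeholder for a_samples[burnin:iterations]
a_hat <- mean(a_draws)            # posterior mean of a
pi_hat <- alpha^exp(a_hat)        # pi_d = alpha_d^exp(a_hat), d = 1,...,4
round(pi_hat, 3)
```

Since \(x \mapsto x^{c}\) is increasing on \((0,1)\) for \(c > 0\), the estimated toxicity probabilities preserve the ordering of the \(\alpha_d\)'s.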
Q2. Variable selection : R code for variable selection in multiple regression using a spike-and-slab prior.
(1) Model Setting
The model setting is as follows: for \(k=1,\dots,K=10\),
\[\begin{aligned} y &\sim \mathcal N_n(X\beta, \sigma_y^2I_n) \\ \beta_k|(\lambda_k, w, \sigma^2) &\sim (1-\lambda_k)\mathcal N(0,\sigma^2) + \lambda_k\mathcal N(0,w^2\sigma^2) \\ \sigma_y^2 &\sim \text{Inverse-Gamma}(0.001, 0.001) \\ \lambda_k &\sim \text{Bernoulli}(1/2) \\ 1/\sigma^2 &\sim \text{Unif}(4,100) \\ w &= 1+Z,\quad Z\sim\Gamma(1,1/100) \end{aligned}\] where \(n\) is the number of observations.
rm(list = ls())
set.seed(2023311161)
library(Rcpp)
library(RcppArmadillo)
library(invgamma)
spikeslab <- read.table("spikeslab.txt", header = TRUE)
head(spikeslab)
y <- spikeslab[,'Y']
X <- as.matrix(spikeslab[,-1])
n <- nrow(spikeslab)
K <- ncol(X)
What we have to generate are \(\beta_k, \sigma_y^2, \lambda_k, \sigma^2\), and \(w\), so we set the initial values as follows. ‘p_lambda’ is for the posterior probability of \(\lambda_k\).
sigma2 <- 1/10 # Initial value for sigma^2
sigma_y2 <- 1 # Initial value for sigma_y^2
w <- 1 # Initial value for w
beta <- rep(0, K) # Initialize beta
lambda <- rbinom(K, 1, 0.5) # Initialize lambda indicators
p_lambda <- 1/2
n_iter <- 5000
burnin <- 1000
Also, we make empty matrix and vectors for the MCMC samples.
beta_samples <- matrix(0, nrow = K, ncol = n_iter)
w_samples <- rep(0,n_iter)
sigma2_samples <- rep(0,n_iter)
sigma_y2_samples <- rep(0,n_iter)
To sum up, we can write the posterior of all parameters as follows: \[\begin{aligned} f(\beta, \lambda, \sigma_y^2, \sigma^2,w|y,X) &\propto f(y|X,\beta,\sigma_y^2)f(\beta, \lambda, \sigma_y^2, \sigma^2,w) \\ &\propto f(y|X,\beta,\sigma_y^2)f(\beta|\lambda,w,\sigma^2)f(\lambda, \sigma_y^2, \sigma^2,w) \\ &\propto f(y|X,\beta,\sigma_y^2)f(\beta|\lambda,w,\sigma^2)f(\lambda)f( \sigma_y^2) f( \sigma^2)f(w) \end{aligned}\]
Before delving into each of the posteriors, our purpose is to simplify the distribution \(\beta, \sigma^2,\) and \(w\).
1) \(\boldsymbol{f(\beta|\lambda,w,\sigma^2)}\)
\[\begin{aligned} f(\beta|\lambda,w,\sigma^2) &=\prod_{k=1}^Kf(\beta_k|\lambda_k,w,\sigma^2) \\ &= \prod_{k=1}^K\biggl( (1-\lambda_k)\frac{1}{\sqrt{2\pi\sigma^2} }\exp\biggl\{ -\frac{\beta_k^2}{2\sigma^2} \biggr\} +\lambda_k\frac{1}{\sqrt{2\pi w^2\sigma^2} } \exp\biggl\{-\frac{\beta_k^2}{2w^2\sigma^2} \biggr\}\biggr) \\ &=\prod_{k=1}^K \biggl[ \biggl(\frac{1}{\sqrt{2\pi\sigma^2} }\exp\biggl\{ -\frac{\beta_k^2}{2\sigma^2} \biggr\} \biggr)^{1-\lambda_k} \biggl( \frac{1}{\sqrt{2\pi w^2\sigma^2} } \exp\biggl\{-\frac{\beta_k^2}{2w^2\sigma^2} \biggr\}\biggr)^{\lambda_k} \biggr] \\ &\because \lambda_k \in \{0,1\} \\ &= \prod_{k=1}^K \biggl( \frac{1}{\sqrt{2\pi \sigma^2} }(w^2)^{-\lambda_k/2} \exp\biggl\{-\frac{(1-\lambda_k)\beta_k^2}{2\sigma^2} -\frac{\lambda_k\beta_k^2}{2w^2\sigma^2} \biggr\} \biggr) \\ &\propto \prod_{k=1}^K(\sigma^2)^{-1/2}w^{-\lambda_k}\exp\biggl\{ -\frac{\bigl( w^2(1-\lambda_k) + \lambda_k \bigr)\beta_k^2}{2w^2\sigma^2} \biggr\} \\ &=(\sigma^2)^{-K/2}w^{-\sum_{k=1}^K\lambda_k}\exp\biggl\{ -\sum_{k=1}^K\frac{\bigl( w^2(1-\lambda_k) + \lambda_k \bigr)}{2w^2\sigma^2} \beta_k^2 \biggr\} \end{aligned}\]
The term in the exponential \(\sum_{k=1}^K\frac{\bigl( w^2(1-\lambda_k) + \lambda_k \bigr)}{2w^2\sigma^2} \beta_k^2\) is a quadratic form, so it can be written as \[\sum_{k=1}^K\frac{\bigl( w^2(1-\lambda_k) + \lambda_k \bigr)}{2w^2\sigma^2} \beta_k^2 = \frac{1}{2}\beta^TD\beta/\sigma^2,\] where \(D\) is a \(K\times K\) diagonal matrix and \(D_{k,k} = \frac{ w^2(1-\lambda_k) + \lambda_k}{w^2}\). So, \[\begin{aligned} f(\beta|\lambda,w,\sigma^2) \propto (\sigma^2)^{-K/2}w^{-\sum_{k=1}^K\lambda_k}\exp\biggl\{-\frac{1}{2}\beta^TD\beta/\sigma^2 \biggr\}. \end{aligned}\]
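The factorization can be sanity-checked numerically; note that the identity requires \(D_{k,k} = (w^2(1-\lambda_k)+\lambda_k)/w^2\) (without an extra factor of 2 in the denominator):

```r
# Check: sum_k (w^2(1-lambda_k)+lambda_k) beta_k^2 / (2 w^2 sigma^2)
#     == (1/2) beta' D beta / sigma^2  with  D_kk = (w^2(1-lambda_k)+lambda_k)/w^2
set.seed(1)
K <- 10; w <- 3; sigma2 <- 0.05
lambda <- rbinom(K, 1, 0.5)
beta <- rnorm(K)
lhs <- sum((w^2 * (1 - lambda) + lambda) * beta^2) / (2 * w^2 * sigma2)
D <- diag((w^2 * (1 - lambda) + lambda) / w^2)
rhs <- 0.5 * drop(t(beta) %*% D %*% beta) / sigma2
all.equal(lhs, rhs)
```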
2) \(\boldsymbol{f(\sigma^2)}\)
Since the inverse of \(\sigma^2\) follows Uniform distribution, we need a transformation of variable to obtain \(f(\sigma^2)\). Letting \(u := 1/\sigma^2\), we have \[\begin{aligned} \frac{du}{d\sigma^2} = -\frac{1}{\sigma^4}. \end{aligned}\] Then, the pdf of \(\sigma^2\) is \[\begin{aligned} f(\sigma^2) = f(u)\biggl| -\frac{1}{\sigma^4}\biggr| \propto \frac{1}{\sigma^4}, \quad \sigma^2 \in (1/100, 1/4) \end{aligned}\] because \(f(u) = 1/96\) for \(u \in (4,100)\) and 0 otherwise.
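A quick simulation confirms the induced support: drawing \(u = 1/\sigma^2 \sim \text{Unif}(4,100)\) and inverting keeps every \(\sigma^2\) inside \((1/100, 1/4)\):

```r
set.seed(1)
u <- runif(1e5, 4, 100)  # u = 1/sigma^2
s2 <- 1 / u              # induced sigma^2 draws
range(s2)                # stays inside (1/100, 1/4)
```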
3) \(f(w)\)
Since the shape parameter of \(Z\) is 1, \(Z\) follows an exponential distribution with rate parameter \(1/100\): \(Z\sim \text{Exp}(1/100)\). Then \(w-1\) follows that exponential distribution; that is, \[f(w) = \frac{1}{100}\exp\Bigl\{-\frac{1}{100}(w-1) \Bigr\},\quad w>1.\]
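Sampling from this prior is just a shifted exponential; with rate \(1/100\), the mean should sit near \(1 + 100 = 101\):

```r
set.seed(1)
w_draws <- 1 + rexp(1e5, rate = 1/100)  # w = 1 + Z, Z ~ Exp(1/100)
c(min(w_draws), mean(w_draws))          # min just above 1, mean near 101
```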
(2) posteriors
1) \(\boldsymbol{(\beta|-)}\)
\[\begin{aligned} f(\beta|-) &\propto f(y|X,\beta,\sigma_y^2)f(\beta|\lambda,w,\sigma^2) \\ &\propto \exp\biggl\{-\frac{1}{2\sigma_y^2}(y - X\beta)^T(y-X\beta) \biggr\}f(\beta|\lambda,w,\sigma^2) \\ &\propto \exp\biggl\{-\frac{1}{2\sigma_y^2}(y - X\beta)^T(y-X\beta) \biggr\}\exp\biggl\{-\frac{1}{2}\beta^TD\beta/\sigma^2 \biggr\} \\ &\propto \exp\biggl\{ -\frac{1}{2\sigma^2}\beta^TD\beta - \frac{1}{2\sigma_y^2}\beta^TX^TX\beta + \frac{2}{2\sigma_y^2}\beta^TX^Ty \biggr\} \\ &= \exp\biggl\{ -\frac{1}{2}\beta^T\underbrace{\Bigl(\frac{D}{\sigma^2} + \frac{X^TX}{\sigma_y^2} \Bigr)}_{:=A}\beta +\frac{2}{2\sigma_y^2}\beta^TX^Ty \biggr\} \\ &\propto \exp\biggl\{-\frac{1}{2}\Bigl(\beta-A^{-1}X^Ty/\sigma_y^2\Bigr)^TA\Bigl(\beta-A^{-1}X^Ty/\sigma_y^2\Bigr) \biggr\} \\ &\propto \mathcal N_K\Bigl(\beta; A^{-1}X^Ty/\sigma_y^2, A^{-1} \Bigr) \end{aligned}\]
When implementing the Gibbs sampling code for the posterior of \(\beta\), we need a matrix-inversion function in Rcpp (to improve speed). Instead of using ‘solve’ in R, we use ‘inv’ from Armadillo as follows (‘invertA’).
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp;
// [[Rcpp::export]]
arma::mat invertA(const arma::mat& D, const arma::mat& X, double sigma2, double sigma_y2) {
arma::mat A = D / sigma2 + X.t() * X / sigma_y2; // Compute A
arma::mat A_inv = arma::inv(A); // Invert A
return A_inv;
}
Based on ‘invertA’, we can write an Rcpp function for generating multivariate normal draws. We use the Cholesky decomposition of the covariance matrix: when \(b \sim \mathcal N_p(\mu, V)\), \(z \sim \mathcal N_p(0,I_p)\), and \(V = LL^T\), then \(u = \mu + Lz\) has the same distribution as \(b\). Both are \(p\)-dimensional normal with mean \(\mu\), and \(\text{Cov}[u] = L\,\text{Cov}[z]\,L^T = V\).
// [[Rcpp::export]]
arma::vec mvrnormRcpp(const arma::vec& mean, const arma::mat& cov) {
int n = mean.size();
arma::vec z = arma::randn(n); // Generate standard normal
arma::mat L = arma::chol(cov, "lower"); // Cholesky decomposition of covariance matrix
return mean + L * z; // Generate sample
}
This Rcpp code is in the file MC_HW3-2_kohanjun.cpp.
sourceCpp("MC_HW3-2_kohanjun.cpp")
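The Cholesky argument can also be verified in plain base R before relying on the Rcpp version (this mirrors what `mvrnormRcpp` does):

```r
set.seed(1)
mu <- c(1, -2)
V  <- matrix(c(2, 0.8, 0.8, 1), 2, 2)
L  <- t(chol(V))                        # chol() gives upper triangular; transpose so V = L L'
z  <- matrix(rnorm(2 * 1e5), nrow = 2)  # columns of standard normal draws
u  <- mu + L %*% z                      # each column ~ N(mu, V)
round(cov(t(u)), 2)                     # empirical covariance, close to V
```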
2) \(\boldsymbol{\lambda_k|-}\)
\[\begin{aligned} f(\lambda_k|-) &\propto f(\beta_k|\lambda_k,w,\sigma^2)f(\lambda_k) \end{aligned}\] That is, this can be written as follows, because \(\lambda_k\in\{0,1\}\). \[\begin{aligned} f(\lambda_k = 1|-) &= \frac{f(\beta_k|\lambda_k=1,w,\sigma^2)f(\lambda_k=1)}{f(\beta_k|\lambda_k=0,w,\sigma^2)f(\lambda_k=0) + f(\beta_k|\lambda_k=1,w,\sigma^2)f(\lambda_k=1)} \end{aligned}\] In the given prior, \(\lambda_k\sim\text{Bernoulli}(1/2)\), so \(f(\lambda_k) = 1/2\) and \[\begin{aligned} f(\lambda_k = 1|-) = \frac{\mathcal N(\beta_k; 0, w^2\sigma^2)}{\mathcal N(\beta_k; 0, \sigma^2) + \mathcal N(\beta_k; 0, w^2\sigma^2)} \end{aligned}\]
So, we can set \(a\) as pdf value of \(\mathcal N(\beta_k; 0, w^2\sigma^2)\), and \(b\) as pdf of \(\mathcal N(\beta_k; 0, \sigma^2)\).
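Written as a small helper using `dnorm` directly (the name `prob_lambda1` is mine, not from the assignment):

```r
# P(lambda_k = 1 | beta_k, w, sigma2) under the Bernoulli(1/2) prior
prob_lambda1 <- function(beta_k, w, sigma2) {
  a <- dnorm(beta_k, 0, sqrt(w^2 * sigma2))  # slab density  N(0, w^2 sigma^2)
  b <- dnorm(beta_k, 0, sqrt(sigma2))        # spike density N(0, sigma^2)
  a / (a + b)
}
prob_lambda1(2,    w = 10, sigma2 = 0.1)  # large |beta_k|: probability near 1
prob_lambda1(0.05, w = 10, sigma2 = 0.1)  # small |beta_k|: probability below 1/2
```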
3) \(\boldsymbol{\sigma_y^2|-}\)
Inverse-Gamma multiplied by Normal returns an Inverse-Gamma (conjugate prior). That is, (\(c := 0.001, d := 0.001\)) \[\begin{aligned} &f(\sigma_y^2|-) \\ &\propto f(y|X,\beta, \sigma_y^2)f(\sigma_y^2) \\ &\propto |\sigma_y^2I_n|^{-1/2} \exp\biggl\{-\frac{1}{2\sigma_y^2}(y-X\beta)^T(y-X\beta)\biggr\}\cdot(\sigma_y^2)^{-c-1}\exp\{-d/\sigma_y^2\} \\ &\propto (\sigma_y^2)^{-c-n/2-1}\exp\biggl\{-\Bigl(\frac{1}{2}(y-X\beta)^T(y-X\beta)+d \Bigr)/\sigma_y^2 \biggr\} \\ &\propto \text{IG}\Bigl(c + n/2, \frac{1}{2}(y-X\beta)^T(y-X\beta) + d \Bigr) \end{aligned}\] So, we can use Gibbs sampler for \(\sigma_y^2\).
4) \(\boldsymbol{\sigma^2|-}\)
\[\begin{aligned} f(\sigma^2|-) &\propto f(\beta|\lambda,w,\sigma^2)f(\sigma^2) \\ &\propto (\sigma^2)^{-K/2}w^{-\sum_{k=1}^K\lambda_k}\exp\biggl\{-\frac{1}{2}\beta^TD\beta/\sigma^2 \biggr\}f(\sigma^2) \\ &\propto (\sigma^2)^{-K/2}\exp\biggl\{-\frac{1}{2}\beta^TD\beta/\sigma^2 \biggr\}(\sigma^2)^{-2} \\ &\propto \text{IG}\Bigl(\frac{K}{2} + 1, \frac{1}{2}\beta^TD\beta \Bigr), \quad \sigma^2 \in (1/100, 1/4) \end{aligned}\] and \(f(\sigma^2|-) = 0\) otherwise. This is a truncated, non-standard Inverse-Gamma distribution, because \(\sigma^2\) must not be generated outside the region \((1/100, 1/4)\), so we need an MH algorithm with a proposal density such as the log-Normal. The logarithm of the target distribution is defined as follows. The normalization constant of the truncation is omitted for simplicity, because it cancels out in the acceptance ratio.
log_target_sigma2_post <- function(sigma2, shape, rate){
  if(sigma2 < 1/100 || sigma2 > 1/4){
    return(-Inf)
  } else {
    return(dinvgamma(sigma2, shape = shape, rate = rate, log = TRUE)) # log density
  }
}
5) \(\boldsymbol{w|-}\)
\[\begin{aligned} f(w|-) &\propto f(\beta | \lambda, w, \sigma^2)f(w) \\ &\propto w^{-\sum_{k=1}^K\lambda_k}\exp\biggl\{-\frac{1}{2}\beta^TD\beta/\sigma^2 \biggr\}\cdot f(w). \end{aligned}\] This does not have a closed form, so we use an MH algorithm with a log-Normal proposal. The logarithm of the target distribution is defined as follows.
log_target_w_post <- function(w, lambda, beta, D, sigma2){ # logarithm
if(w <= 1){
return(-Inf)
} else {
t1 <- dexp(w-1, rate = 1/100, log = TRUE)
t2 <- -sum(lambda)*log(w)
t3 <- -1/2*sum(t(beta)%*%D%*%beta)/sigma2
return(t1+t2+t3)
}
}
(3) MH-within-Gibbs Algorithm
The corresponding algorithm is given below.
# MCMC
for (iter in 1:n_iter) {
# Step 1: Sample beta
D_diag <- (w^2 * (1 - lambda) + lambda) / w^2 # D_kk from the beta prior (no factor of 2)
D <- diag(D_diag) # Create diagonal matrix D
A_inv <- invertA(D, X, sigma2, sigma_y2) # Invert A
mean_beta <- A_inv %*% (t(X) %*% y) / sigma_y2
beta <- as.vector(mvrnormRcpp(mean_beta, A_inv))
# Step 2: Sample lambda_k
for (k in 1:K) {
a <- (1 / sqrt(w^2 * sigma2)) * exp(-beta[k]^2 / (2 * w^2 * sigma2))
b <- (1 / sqrt(sigma2)) * exp(-beta[k]^2 / (2 * sigma2))
p_lambda <- a / (a + b)
lambda[k] <- rbinom(1, 1, p_lambda)
}
# Step 3: Sample sigma_y^2 from Inverse-Gamma posterior
shape_sigma_y2 <- 0.001 + n / 2
rate_sigma_y2 <- 0.001 + sum((y - X %*% beta)^2) / 2
sigma_y2 <- rinvgamma(1, shape = shape_sigma_y2, rate = rate_sigma_y2)
# Step 4: Sample sigma^2 using MH
shape_sigma2 <- K/2 + 1
rate_sigma2 <- sum(beta * D_diag * beta)/2
sigma2_temp <- rlnorm(1, log(sigma2), 2)
log_R_MH <- (log_target_sigma2_post(sigma2_temp,shape_sigma2,rate_sigma2) -
dlnorm(sigma2_temp, log(sigma2), 2, log = TRUE)) -
(log_target_sigma2_post(sigma2, shape_sigma2, rate_sigma2) -
dlnorm(sigma2, log(sigma2_temp), 2, log = TRUE ) )
if(log(runif(1)) < log_R_MH){sigma2 <- sigma2_temp}
# Step 5: Sample w using MH
w_p <- rlnorm(1, log(w), 2) # Log-normal proposal
D_p <- diag((w_p^2 * (1 - lambda) + lambda) / w_p^2)
log_R_MH <- (log_target_w_post(w_p, lambda, beta, D_p, sigma2) -
               dlnorm(w_p, log(w), 2, log = TRUE)) -
            (log_target_w_post(w, lambda, beta, D, sigma2) -
               dlnorm(w, log(w_p), 2, log = TRUE))
if (log(runif(1)) < log_R_MH) {
w <- w_p
}
# Store beta samples
beta_samples[, iter] <- beta
w_samples[iter] <- w
sigma2_samples[iter] <- sigma2
sigma_y2_samples[iter] <- sigma_y2
}
(4) trace plots & conclusion
The following shows the trace plots of all \(\beta_k\)’s. The posterior means show that the 1st, 2nd, and 5th coefficients have the largest absolute values.
par(mfrow = c(5,2))
for(k in 1:K){
ts.plot(beta_samples[k,burnin:n_iter], ylab = bquote(~beta[.(k)] ))
}
rowMeans(beta_samples[,burnin:n_iter])
## [1] 2.07769292 -1.50949146 -0.10025357 0.11732274 3.07678920 -0.03968955
## [7] 0.06054715 0.10215052 -0.03385632 0.11451503
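A common complement to the posterior means in variable selection is the posterior inclusion probability \(P(\lambda_k = 1|y)\). The loop above does not store the \(\lambda\) draws; assuming one adds a K-by-n_iter matrix filled with `lambda_samples[, iter] <- lambda` inside the loop, the probabilities are just row means (mock draws are used below for illustration only):

```r
set.seed(1)
K <- 10; n_iter <- 5000; burnin <- 1000
p_mock <- rep(c(0.95, 0.05), c(3, 7))                  # placeholder inclusion rates
lambda_samples <- matrix(rbinom(K * n_iter, 1, p_mock), nrow = K)
incl_prob <- rowMeans(lambda_samples[, burnin:n_iter]) # estimate of P(lambda_k = 1 | y)
round(incl_prob, 2)
```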
This is the trace plot of \(\sigma_y^2\),
par(mfrow = c(1,1))
ts.plot(sigma_y2_samples[burnin:n_iter],
        ylab = bquote(sigma[y]^2),
        main = bquote("trace plot of " ~ sigma[y]^2))
and this is the trace plot of \(\sigma^2\). The summary shows that the range of \(\sigma^2\) lies within \((1/100, 1/4)\).
ts.plot(sigma2_samples[burnin:n_iter],
ylab = bquote(~sigma^2),
main = bquote("trace plot of "~ sigma^2))
summary(sigma2_samples)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.01001 0.01096 0.01367 0.03801 0.02928 0.24904
The trace plot of \(w\) is as follows.
ts.plot(w_samples[burnin:n_iter],
ylab = "w",
main = "trace plot of w")