Monte Carlo Methods HW3
Q1. Data augmentation
Write R code for data augmentation and estimate the toxicity probability \(\hat\pi_d = \alpha_d^{\exp(\hat a)}, d =1,\dots,J\), when patient 16 arrives on day 364.
The question asks for the estimate at the arrival of patient no. 16, so we use the data of the patients up to no. 15.
rm(list = ls())
library(dplyr)
set.seed(2023311161)
dose_data <- read.table("dose.txt", header = TRUE)
dose_data <- dose_data[1:15,]
n <- nrow(dose_data)
This question sets the dose levels as 20, 30, 40, and 50, and the corresponding prespecified constants \(\alpha\) are 0.1, 0.15, 0.2, and 0.25. The assessment period \(T\) is 63 days (9 weeks), and I set the number of intervals of the period \(K\) to 9. Also, the constant \(C\) regarding \(\lambda_k\) is given as 2, and I set the initial value of \(a\) to 0. The total number of iterations is 5000.
alpha <- c(0.1, 0.15, 0.2, 0.25) # Prespecified toxicity probabilities for dose levels
K <- 9 # Number of intervals
T <- 63 # Toxicity assessment period in days (9 weeks)
iterations <- 5000 # Number of MCMC iterations
C <- 2
a <- 0 # Starting value for a
The \(\tilde\lambda_k\) is defined as \(\tilde\lambda_k := K/(T(K-k+0.5))\), which corresponds to the ‘tilde_lambda’ vector. Also, \(\lambda_k \sim \Gamma(\tilde\lambda_k/C, 1/C)\), so I set the initial values of each \(\lambda_k\) as the following R code.
tilde_lambda <- sapply(1:K, function(k) K/(T*(K - k + 0.5)) )
lambda <- sapply(1:K, function(k) rgamma(1, shape = tilde_lambda[k]/C, rate = 1/C))
Now, we have to modify and add some values in the dataset. This covers the case where patient 16 arrives on day 364, which becomes the end of the study. So, the ‘DayoffStudy’ values greater than 364 are shifted to 364. Also, some \(u_i\)’s have values greater than \(T\), so those \(u_i\)’s are truncated to \(T\).
If toxicity occurs (DLT = 1), we set \(t_i = u_i\); otherwise, we set \(t_i = \infty\). \(Y_i\) is 1 if DLT = 1, and 0 if DLT = 0 and \(u_i = T\). For the rest of the cases, we set \(Y_i\) to ‘NA’. By definition, we set \(x_i\) as the minimum of \(u_i\) and \(t_i\). The ‘alpha’ column is generated according to the dose level.
dose_data <- dose_data %>%
  mutate(DayoffStudy = ifelse(DayoffStudy >= 364, 364, DayoffStudy)) %>%
  mutate(ui = ifelse(DayoffStudy - DayonStudy < T, DayoffStudy - DayonStudy, T)) %>%
  mutate(ti = ifelse(DLT == 1, ui, Inf)) %>%
  mutate(Yi = case_when(DLT == 1 ~ 1,
                        DLT == 0 & ui == T ~ 0, # T = 63
                        TRUE ~ NA)) %>%
  mutate(xi = pmin(ui, ti)) %>% # pmin: element-wise minimum (min() would collapse the column)
  mutate(alpha = case_when(Dose == 30 ~ 0.15,
                           Dose == 40 ~ 0.2,
                           Dose == 50 ~ 0.25,
                           TRUE ~ 0.1))
dose_data
The following code is associated with \(s_{i,k}\), which is defined in Section 2.3 of the paper. To summarize the definition, \(s_{i,k}\) equals \(T/K\) (7 in my setting) if \(x_i\) lies to the right of the given interval, 0 if it lies to the left of the interval, and \(x_i - h_{k-1}\) if \(x_i\) is in the interval \([h_{k-1}, h_k)\). For example, when the input is \(x=37\), it is in the 6th interval \([35, 42)\), so \(s_{i,k} = 7, 7, 7, 7, 7, 2, 0, 0, 0\).
create_si <- function(x) {
  # Initialize s vector with K = 9 zeros
  s <- numeric(9)
  # Find the interval index k (k ranges from 1 to 9 based on x)
  if (x == 63) { k <- 9 } else { k <- floor(x / 7) + 1 }
  # Set s_j = 7 for j < k
  if (k > 1) {
    s[1:(k - 1)] <- 7
  }
  # Set s_k = x - h_{k-1}
  s[k] <- x - (k - 1) * 7
  return(s)
}
create_si(37)
## [1] 7 7 7 7 7 2 0 0 0
Based on this function, we can now create a new function that obtains all the \(s_{i,k}\) for all \(i,k\). The initial values of all \(s_{i,k}\) are in the ‘sis’ matrix.
create_sis <- function(dose_data){
n <- nrow(dose_data)
sis <- matrix(0, nrow = n, ncol = K)
for(r in 1:n){
sis[r,] <- create_si(dose_data[r,'xi'])
}
return(sis)
}
sis <- create_sis(dose_data)
This code shows how \(\delta_{i,k}\) is defined. \(\delta_{i,k}\) becomes 1 when the toxicity of the \(i\)th patient occurs in the \(k\)th interval. If there is no toxicity or if \(Y_i\) is missing, we leave \(\delta_{i,k}\) as 0.
create_deltas <- function(dose_data){
deltas <- matrix(0, nrow = nrow(dose_data), ncol = K)
for(r in 1:nrow(dose_data)){
if( is.na(dose_data[r,'Yi']) ){next}
else if(dose_data[r,'Yi'] == 1){
k <- ceiling(dose_data[r,'xi']/7) # 7 == T/K
deltas[r,k] <- 1
}
}
return(deltas)
}
deltas <- create_deltas(dose_data)
deltas
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
## [1,] 0 0 0 0 0 0 0 0 0
## [2,] 0 0 0 0 0 0 0 0 0
## [3,] 0 0 0 0 0 0 0 0 0
## [4,] 0 0 0 0 0 0 0 0 0
## [5,] 0 0 0 0 0 0 0 0 0
## [6,] 0 0 0 0 0 0 0 0 0
## [7,] 0 0 0 0 0 0 0 0 0
## [8,] 0 0 0 0 0 0 0 0 0
## [9,] 0 0 0 0 0 0 0 0 0
## [10,] 0 0 0 0 0 0 0 0 0
## [11,] 0 0 1 0 0 0 0 0 0
## [12,] 0 0 1 0 0 0 0 0 0
## [13,] 0 0 0 0 0 0 0 0 0
## [14,] 0 0 0 0 0 0 0 0 0
## [15,] 0 0 1 0 0 0 0 0 0
The missing \(Y_i\)’s are generated according to the following distribution: \[y_i|(\mathcal D_{obs},a,\lambda) \sim \text{Bernoulli}\biggl( \frac{ \alpha_{d_i}^{\exp(a)}\exp\{-\sum_{k=1}^K\lambda_ks_{i,k} \} }{ 1-\alpha_{d_i}^{\exp(a)} + \alpha_{d_i}^{\exp(a)}\exp\{-\sum_{k=1}^K\lambda_ks_{i,k} \} } \biggr)\] Here \(\alpha_{d_i}\) is the prespecified \(\alpha\) of patient \(i\) corresponding to its dose level \(d_i\). The Bernoulli parameter is the ‘prob’ value.
sample_Yi <- function(alpha_d, a, lambda, si) {
  prob <- alpha_d^(exp(a)) * exp(-sum(lambda * si)) /
    (1 - alpha_d^(exp(a)) + alpha_d^(exp(a)) * exp(-sum(lambda * si)))
  return(rbinom(1, 1, prob))
}
The next step is to obtain the function \(f(y|a)\), denoted as the likelihood \(L(y|a)\) in the paper. It computes \((\alpha_{d_i}^{\exp(a)})^{y_i}(1 - \alpha_{d_i}^{\exp(a)} )^{1-y_i}\) for every patient \(i\) and multiplies all of them.
f_y_a <- function(a) {
Yi <- dose_data$Yi
alpha <- dose_data$alpha
prod_val <- prod(sapply(1:length(Yi), function(i) {
alpha[i]^(exp(a) * Yi[i]) * (1 - alpha[i]^(exp(a)))^(1 - Yi[i])
}))
# Check for extreme cases
if (is.infinite(prod_val) || is.nan(prod_val)) prod_val <- 0
return(as.numeric(prod_val))
}
‘dose_na’ holds the rows with missing values, and ‘Y_missings’ stores the imputed samples. ‘a_samples’ and ‘lambda_samples’ store the posterior samples of each parameter.
na_rows <- which(is.na(dose_data$Yi))
dose_na <- dose_data[na_rows,]
Y_missings <- matrix(NA, nrow = length(na_rows), ncol = iterations)
rownames(Y_missings) <- na_rows
a_samples <- rep(NA, iterations)
lambda_samples <- matrix(NA, nrow = K, ncol = iterations)
Now, we delve into the MCMC algorithm. This code directly follows the algorithm given in the paper.
For the Imputation step (I-step), we impute the missing \(Y_i\)’s using the corresponding \(s_{i,k}\) and the sample_Yi function. If 1 is imputed for a missing \(Y_i\), we generate a new \(t_i\) between 1 and its \(u_i\), and set the new \(x_i\) accordingly. After updating dose_data, we update \(\delta_{i,k}\) and \(s_{i,k}\) accordingly.
We update \(a\) and \(\lambda_k\) in the P-step. Since MH is needed for \(a\), we use \(\mathcal N(a^{(t)}, 2^2)\) as the proposal density. The acceptance ratio (MH_R) is then \[\frac{f(y|a')f(a')}{f(y|a^{(t)}) f(a^{(t)})},\] where \(f(a)\) is the \(\mathcal N(0,1)\) prior density; the proposal densities cancel out because the normal proposal is symmetric.
The posterior of \(\lambda_k\) is \[\lambda_k|y \sim \Gamma\Bigl(\tilde\lambda_k/C + \sum_{i=1}^n\delta_{i,k}, 1/C + \sum_{i=1}^ny_is_{i,k} \Bigr).\] So, each shape and rate parameter is updated using the ‘deltas’ and ‘sis’ matrix, and generate new \(\lambda_k\) accordingly.
# MCMC
for (iter in 1:iterations) {#iter <- 1
# I-step
for (i in 1:length(na_rows) ) {
si <- sis[na_rows[i],]
Y_missings[i,iter] <- sample_Yi(dose_na[i,'alpha'], a, lambda, si)
if(Y_missings[i,iter] == 1){
ui_temp <- dose_na[i,'ui']
dose_na[i,'ti'] <- sample(seq_len(ui_temp), 1) # uniform draw from 1,...,ui
dose_na[i,'xi'] <- min(dose_na[i,'ui'], dose_na[i,'ti'])
}
}
dose_na$Yi <- Y_missings[,iter]
dose_data[na_rows,] <- dose_na
deltas <- create_deltas(dose_data)
sis <- create_sis(dose_data)
# P-step
# Update a
a_temp <- rnorm(1, a, 2) # from normal with mean a and sd 2
MH_R <- (f_y_a(a_temp)*dnorm(a_temp,0,1) )/(f_y_a(a)*dnorm(a,0,1) )
if(runif(1) < MH_R){
a_samples[iter] <- a_temp
a <- a_temp
} else {
a_samples[iter] <- a
}
# Update lambda
for (k in 1:K) {
shape <- tilde_lambda[k]/C + sum(deltas[,k])
rate <- 1/C + sum(dose_data$Yi * sis[,k]) # sum over patients, matching the posterior above
lambda[k] <- rgamma(1, shape = shape, rate = rate)
}
lambda_samples[,iter] <- lambda
}
The following shows the trace plot of \(a\) after a burn-in period of 1000 iterations.
burnin <- 1000
ts.plot(a_samples[burnin:iterations], main = "trace plot of a", ylab = 'a')
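With the chain in hand, the estimate the question asks for, \(\hat\pi_d = \alpha_d^{\exp(\hat a)}\), follows by plugging the posterior mean of \(a\) into the dose-toxicity curve. A minimal sketch (the `a_draws` vector here is a placeholder; in the actual analysis it would be `a_samples[burnin:iterations]` from the chain above):

```r
set.seed(2023311161)
alpha <- c(0.1, 0.15, 0.2, 0.25)  # prespecified alpha_d for dose levels 20, 30, 40, 50
a_draws <- rnorm(4000, 0, 0.3)    # placeholder for a_samples[burnin:iterations]
a_hat <- mean(a_draws)            # posterior mean of a
pi_hat <- alpha^exp(a_hat)        # pi_d = alpha_d^exp(a_hat), d = 1,...,4
round(pi_hat, 3)
```

Since \(x \mapsto x^{c}\) is increasing on \((0,1)\) for \(c > 0\), the estimated toxicity probabilities preserve the ordering of the \(\alpha_d\)'s.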
Q2. Variable selection : R code for variable selection in multiple regression using a spike-and-slab prior.
(1) Model Setting
The model setting is as follows: for \(k=1,\dots,K=10\),
\[\begin{aligned} y &\sim \mathcal N_n(X\beta, \sigma_y^2I_n) \\ \beta_k|(\lambda_k, w, \sigma^2) &\sim (1-\lambda_k)\mathcal N(0,\sigma^2) + \lambda_k\mathcal N(0,w^2\sigma^2) \\ \sigma_y^2 &\sim \text{Inverse-Gamma}(0.001, 0.001) \\ \lambda_k &\sim \text{Bernoulli}(1/2) \\ 1/\sigma^2 &\sim \text{Unif}(4,100) \\ w &= 1+Z,\quad Z\sim\Gamma(1,1/100) \end{aligned}\] where \(n\) is the number of observations.
rm(list = ls())
set.seed(2023311161)
library(Rcpp)
library(RcppArmadillo)
library(invgamma)
spikeslab <- read.table("spikeslab.txt", header = TRUE)
head(spikeslab)
y <- spikeslab[,'Y']
X <- as.matrix(spikeslab[,-1])
n <- nrow(spikeslab)
K <- ncol(X)
What we have to generate are \(\beta_k, \sigma_y^2, \lambda_k, \sigma^2\), and \(w\), so we set the initial values as follows. ‘p_lambda’ is for the posterior probability of \(\lambda_k\).
sigma2 <- 1/10 # Initial value for sigma^2
sigma_y2 <- 1 # Initial value for sigma_y^2
w <- 1 # Initial value for w
beta <- rep(0, K) # Initialize beta
lambda <- rbinom(K, 1, 0.5) # Initialize lambda indicators
p_lambda <- 1/2
n_iter <- 5000
burnin <- 1000
Also, we make empty matrix and vectors for the MCMC samples.
beta_samples <- matrix(0, nrow = K, ncol = n_iter)
w_samples <- rep(0,n_iter)
sigma2_samples <- rep(0,n_iter)
sigma_y2_samples <- rep(0,n_iter)
To sum up, we can write the posterior of all parameters as follows: \[\begin{aligned} f(\beta, \lambda, \sigma_y^2, \sigma^2,w|y,X) &\propto f(y|X,\beta,\sigma_y^2)f(\beta, \lambda, \sigma_y^2, \sigma^2,w) \\ &\propto f(y|X,\beta,\sigma_y^2)f(\beta|\lambda,w,\sigma^2)f(\lambda, \sigma_y^2, \sigma^2,w) \\ &\propto f(y|X,\beta,\sigma_y^2)f(\beta|\lambda,w,\sigma^2)f(\lambda)f( \sigma_y^2) f( \sigma^2)f(w) \end{aligned}\]
Before delving into each of the posteriors, our purpose is to simplify the distribution \(\beta, \sigma^2,\) and \(w\).
1) \(\boldsymbol{f(\beta|\lambda,w,\sigma^2)}\)
\[\begin{aligned} f(\beta|\lambda,w,\sigma^2) &=\prod_{k=1}^Kf(\beta_k|\lambda_k,w,\sigma^2) \\ &= \prod_{k=1}^K\biggl( (1-\lambda_k)\frac{1}{\sqrt{2\pi\sigma^2} }\exp\biggl\{ -\frac{\beta_k^2}{2\sigma^2} \biggr\} +\lambda_k\frac{1}{\sqrt{2\pi w^2\sigma^2} } \exp\biggl\{-\frac{\beta_k^2}{2w^2\sigma^2} \biggr\}\biggr) \\ &=\prod_{k=1}^K \biggl[ \biggl(\frac{1}{\sqrt{2\pi\sigma^2} }\exp\biggl\{ -\frac{\beta_k^2}{2\sigma^2} \biggr\} \biggr)^{1-\lambda_k} \biggl( \frac{1}{\sqrt{2\pi w^2\sigma^2} } \exp\biggl\{-\frac{\beta_k^2}{2w^2\sigma^2} \biggr\}\biggr)^{\lambda_k} \biggr] \\ &\because \lambda_k \in \{0,1\} \\ &= \prod_{k=1}^K \biggl( \frac{1}{\sqrt{2\pi \sigma^2} }(w^2)^{-\lambda_k/2} \exp\biggl\{-\frac{(1-\lambda_k)\beta_k^2}{2\sigma^2} -\frac{\lambda_k\beta_k^2}{2w^2\sigma^2} \biggr\} \biggr) \\ &\propto \prod_{k=1}^K(\sigma^2)^{-1/2}w^{-\lambda_k}\exp\biggl\{ -\frac{\bigl( w^2(1-\lambda_k) + \lambda_k \bigr)\beta_k^2}{2w^2\sigma^2} \biggr\} \\ &=(\sigma^2)^{-K/2}w^{-\sum_{k=1}^K\lambda_k}\exp\biggl\{ -\sum_{k=1}^K\frac{\bigl( w^2(1-\lambda_k) + \lambda_k \bigr)}{2w^2\sigma^2} \beta_k^2 \biggr\} \end{aligned}\]
The term in the exponential \(\sum_{k=1}^K\frac{\bigl( w^2(1-\lambda_k) + \lambda_k \bigr)}{2w^2\sigma^2} \beta_k^2\) is a quadratic form, so it can be written as \[\sum_{k=1}^K\frac{\bigl( w^2(1-\lambda_k) + \lambda_k \bigr)}{2w^2\sigma^2} \beta_k^2 = \frac{1}{2}\beta^TD\beta/\sigma^2,\] where \(D\) is a \(K\times K\) diagonal matrix and \(D_{k,k} = \frac{ w^2(1-\lambda_k) + \lambda_k}{w^2}\). So, \[\begin{aligned} f(\beta|\lambda,w,\sigma^2) \propto (\sigma^2)^{-K/2}w^{-\sum_{k=1}^K\lambda_k}\exp\biggl\{-\frac{1}{2}\beta^TD\beta/\sigma^2 \biggr\}. \end{aligned}\]
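The factorization can be sanity-checked numerically; note that the identity requires \(D_{k,k} = (w^2(1-\lambda_k)+\lambda_k)/w^2\) (without an extra factor of 2 in the denominator):

```r
# Check: sum_k (w^2(1-lambda_k)+lambda_k) beta_k^2 / (2 w^2 sigma^2)
#     == (1/2) beta' D beta / sigma^2  with  D_kk = (w^2(1-lambda_k)+lambda_k)/w^2
set.seed(1)
K <- 10; w <- 3; sigma2 <- 0.05
lambda <- rbinom(K, 1, 0.5)
beta <- rnorm(K)
lhs <- sum((w^2 * (1 - lambda) + lambda) * beta^2) / (2 * w^2 * sigma2)
D <- diag((w^2 * (1 - lambda) + lambda) / w^2)
rhs <- 0.5 * drop(t(beta) %*% D %*% beta) / sigma2
all.equal(lhs, rhs)
```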
2) \(\boldsymbol{f(\sigma^2)}\)
Since the inverse of \(\sigma^2\) follows Uniform distribution, we need a transformation of variable to obtain \(f(\sigma^2)\). Letting \(u := 1/\sigma^2\), we have \[\begin{aligned} \frac{du}{d\sigma^2} = -\frac{1}{\sigma^4}. \end{aligned}\] Then, the pdf of \(\sigma^2\) is \[\begin{aligned} f(\sigma^2) = f(u)\biggl| -\frac{1}{\sigma^4}\biggr| \propto \frac{1}{\sigma^4}, \quad \sigma^2 \in (1/100, 1/4) \end{aligned}\] because \(f(u) = 1/96\) for \(u \in (4,100)\) and 0 otherwise.
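A quick simulation confirms the induced support: drawing \(u = 1/\sigma^2 \sim \text{Unif}(4,100)\) and inverting keeps every \(\sigma^2\) inside \((1/100, 1/4)\):

```r
set.seed(1)
u <- runif(1e5, 4, 100)  # u = 1/sigma^2
s2 <- 1 / u              # induced sigma^2 draws
range(s2)                # stays inside (1/100, 1/4)
```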
3) \(f(w)\)
Since the shape parameter of \(Z\) is 1, \(Z\) follows an exponential distribution with rate parameter \(1/100\): \(Z\sim \text{Exp}(1/100)\). Then \(w-1\) follows that exponential distribution; that is, \[f(w) = \frac{1}{100}\exp\Bigl\{-\frac{1}{100}(w-1) \Bigr\},\quad w>1.\]
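Sampling from this prior is just a shifted exponential; with rate \(1/100\), the mean should sit near \(1 + 100 = 101\):

```r
set.seed(1)
w_draws <- 1 + rexp(1e5, rate = 1/100)  # w = 1 + Z, Z ~ Exp(1/100)
c(min(w_draws), mean(w_draws))          # min just above 1, mean near 101
```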
(2) posteriors
1) \(\boldsymbol{(\beta|-)}\)
\[\begin{aligned} f(\beta|-) &\propto f(y|X,\beta,\sigma_y^2)f(\beta|\lambda,w,\sigma^2) \\ &\propto \exp\biggl\{-\frac{1}{2\sigma_y^2}(y - X\beta)^T(y-X\beta) \biggr\}f(\beta|\lambda,w,\sigma^2) \\ &\propto \exp\biggl\{-\frac{1}{2\sigma_y^2}(y - X\beta)^T(y-X\beta) \biggr\}\exp\biggl\{-\frac{1}{2}\beta^TD\beta/\sigma^2 \biggr\} \\ &\propto \exp\biggl\{ -\frac{1}{2\sigma^2}\beta^TD\beta - \frac{1}{2\sigma_y^2}\beta^TX^TX\beta + \frac{2}{2\sigma_y^2}\beta^TX^Ty \biggr\} \\ &= \exp\biggl\{ -\frac{1}{2}\beta^T\underbrace{\Bigl(\frac{D}{\sigma^2} + \frac{X^TX}{\sigma_y^2} \Bigr)}_{:=A}\beta +\frac{2}{2\sigma_y^2}\beta^TX^Ty \biggr\} \\ &\propto \exp\biggl\{-\frac{1}{2}\Bigl(\beta-A^{-1}X^Ty/\sigma_y^2\Bigr)^TA\Bigl(\beta-A^{-1}X^Ty/\sigma_y^2\Bigr) \biggr\} \\ &\propto \mathcal N_K\Bigl(\beta; A^{-1}X^Ty/\sigma_y^2, A^{-1} \Bigr) \end{aligned}\]
When implementing the Gibbs sampling code for the posterior of \(\beta\), we need a matrix-inversion function in Rcpp (to improve speed). Instead of using ‘solve’ in R, we use ‘inv’ from Armadillo as follows (‘invertA’).
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp;
// [[Rcpp::export]]
arma::mat invertA(const arma::mat& D, const arma::mat& X, double sigma2, double sigma_y2) {
arma::mat A = D / sigma2 + X.t() * X / sigma_y2; // Compute A
arma::mat A_inv = arma::inv(A); // Invert A
return A_inv;
}
Based on ‘invertA’, we can write an Rcpp function for generating multivariate normal draws. We use the Cholesky decomposition of the covariance matrix: when \(b \sim \mathcal N_p(\mu, V)\), \(z \sim \mathcal N_p(0,I_p)\), and \(V = LL^T\), then \(u = \mu + Lz\) has the same distribution as \(b\). Both are \(p\)-dimensional normal with mean \(\mu\), and \(\text{Cov}[u] = L\,\text{Cov}[z]\,L^T = V\).
// [[Rcpp::export]]
arma::vec mvrnormRcpp(const arma::vec& mean, const arma::mat& cov) {
int n = mean.size();
arma::vec z = arma::randn(n); // Generate standard normal
arma::mat L = arma::chol(cov, "lower"); // Cholesky decomposition of covariance matrix
return mean + L * z; // Generate sample
}
This Rcpp code is in the file MC_HW3-2_kohanjun.cpp.
sourceCpp("MC_HW3-2_kohanjun.cpp")
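The Cholesky argument can also be verified in plain base R before relying on the Rcpp version (this mirrors what `mvrnormRcpp` does):

```r
set.seed(1)
mu <- c(1, -2)
V  <- matrix(c(2, 0.8, 0.8, 1), 2, 2)
L  <- t(chol(V))                        # chol() gives upper triangular; transpose so V = L L'
z  <- matrix(rnorm(2 * 1e5), nrow = 2)  # columns of standard normal draws
u  <- mu + L %*% z                      # each column ~ N(mu, V)
round(cov(t(u)), 2)                     # empirical covariance, close to V
```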
2) \(\boldsymbol{\lambda_k|-}\)
\[\begin{aligned} f(\lambda_k|-) &\propto f(\beta_k|\lambda_k,w,\sigma^2)f(\lambda_k) \end{aligned}\] That is, this can be written as follows, because \(\lambda_k\in\{0,1\}\). \[\begin{aligned} f(\lambda_k = 1|-) &= \frac{f(\beta_k|\lambda_k=1,w,\sigma^2)f(\lambda_k=1)}{f(\beta_k|\lambda_k=0,w,\sigma^2)f(\lambda_k=0) + f(\beta_k|\lambda_k=1,w,\sigma^2)f(\lambda_k=1)} \end{aligned}\] In the given prior, \(\lambda_k\sim\text{Bernoulli}(1/2)\), so \(f(\lambda_k) = 1/2\) and \[\begin{aligned} f(\lambda_k = 1|-) = \frac{\mathcal N(\beta_k; 0, w^2\sigma^2)}{\mathcal N(\beta_k; 0, \sigma^2) + \mathcal N(\beta_k; 0, w^2\sigma^2)} \end{aligned}\]
So, we can set \(a\) as pdf value of \(\mathcal N(\beta_k; 0, w^2\sigma^2)\), and \(b\) as pdf of \(\mathcal N(\beta_k; 0, \sigma^2)\).
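Written as a small helper using `dnorm` directly (the name `prob_lambda1` is mine, not from the assignment):

```r
# P(lambda_k = 1 | beta_k, w, sigma2) under the Bernoulli(1/2) prior
prob_lambda1 <- function(beta_k, w, sigma2) {
  a <- dnorm(beta_k, 0, sqrt(w^2 * sigma2))  # slab density  N(0, w^2 sigma^2)
  b <- dnorm(beta_k, 0, sqrt(sigma2))        # spike density N(0, sigma^2)
  a / (a + b)
}
prob_lambda1(2,    w = 10, sigma2 = 0.1)  # large |beta_k|: probability near 1
prob_lambda1(0.05, w = 10, sigma2 = 0.1)  # small |beta_k|: probability below 1/2
```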
3) \(\boldsymbol{\sigma_y^2|-}\)
Inverse-Gamma multiplied by Normal returns an Inverse-Gamma (conjugate prior). That is, (\(c := 0.001, d := 0.001\)) \[\begin{aligned} &f(\sigma_y^2|-) \\ &\propto f(y|X,\beta, \sigma_y^2)f(\sigma_y^2) \\ &\propto |\sigma_y^2I_n|^{-1/2} \exp\biggl\{-\frac{1}{2\sigma_y^2}(y-X\beta)^T(y-X\beta)\biggr\}\cdot(\sigma_y^2)^{-c-1}\exp\{-d/\sigma_y^2\} \\ &\propto (\sigma_y^2)^{-c-n/2-1}\exp\biggl\{-\Bigl(\frac{1}{2}(y-X\beta)^T(y-X\beta)+d \Bigr)/\sigma_y^2 \biggr\} \\ &\propto \text{IG}\Bigl(c + n/2, \frac{1}{2}(y-X\beta)^T(y-X\beta) + d \Bigr) \end{aligned}\] So, we can use Gibbs sampler for \(\sigma_y^2\).
4) \(\boldsymbol{\sigma^2|-}\)
\[\begin{aligned} f(\sigma^2|-) &\propto f(\beta|\lambda,w,\sigma^2)f(\sigma^2) \\ &\propto (\sigma^2)^{-K/2}w^{-\sum_{k=1}^K\lambda_k}\exp\biggl\{-\frac{1}{2}\beta^TD\beta/\sigma^2 \biggr\}f(\sigma^2) \\ &\propto (\sigma^2)^{-K/2}\exp\biggl\{-\frac{1}{2}\beta^TD\beta/\sigma^2 \biggr\}(\sigma^2)^{-2} \\ &\propto \text{IG}\Bigl(\frac{K}{2} + 1, \frac{1}{2}\beta^TD\beta \Bigr), \quad \sigma^2 \in (1/100, 1/4) \end{aligned}\] and \(f(\sigma^2|-) = 0\) otherwise. This is a truncated, non-standard Inverse-Gamma distribution, because \(\sigma^2\) must not be generated outside the region \((1/100, 1/4)\), so we need an MH algorithm with a proposal density such as the log-Normal. The logarithm of the target distribution is defined as follows. The normalization constant of the truncation is omitted for simplicity, because it cancels out in the acceptance ratio.
log_target_sigma2_post <- function(sigma2, shape, rate){
  if(sigma2 < 1/100 || sigma2 > 1/4){
    return(-Inf)
  } else {
    return(dinvgamma(sigma2, shape = shape, rate = rate, log = TRUE)) # log density
  }
}
5) \(\boldsymbol{w|-}\)
\[\begin{aligned} f(w|-) &\propto f(\beta | \lambda, w, \sigma^2)f(w) \\ &\propto w^{-\sum_{k=1}^K\lambda_k}\exp\biggl\{-\frac{1}{2}\beta^TD\beta/\sigma^2 \biggr\}\cdot f(w). \end{aligned}\] This does not have a closed form, so we use an MH algorithm with a log-Normal proposal. The logarithm of the target distribution is defined as follows.
log_target_w_post <- function(w, lambda, beta, D, sigma2){ # logarithm
if(w <= 1){
return(-Inf)
} else {
t1 <- dexp(w-1, rate = 1/100, log = TRUE)
t2 <- -sum(lambda)*log(w)
t3 <- -1/2*sum(t(beta)%*%D%*%beta)/sigma2
return(t1+t2+t3)
}
}
(3) MH-within-Gibbs Algorithm
The corresponding algorithm is given below.
# MCMC
for (iter in 1:n_iter) {
# Step 1: Sample beta
D_diag <- (w^2 * (1 - lambda) + lambda) / w^2 # D_kk from the beta prior (no factor of 2)
D <- diag(D_diag) # Create diagonal matrix D
A_inv <- invertA(D, X, sigma2, sigma_y2) # Invert A
mean_beta <- A_inv %*% (t(X) %*% y) / sigma_y2
beta <- as.vector(mvrnormRcpp(mean_beta, A_inv))
# Step 2: Sample lambda_k
for (k in 1:K) {
a <- (1 / sqrt(w^2 * sigma2)) * exp(-beta[k]^2 / (2 * w^2 * sigma2))
b <- (1 / sqrt(sigma2)) * exp(-beta[k]^2 / (2 * sigma2))
p_lambda <- a / (a + b)
lambda[k] <- rbinom(1, 1, p_lambda)
}
# Step 3: Sample sigma_y^2 from Inverse-Gamma posterior
shape_sigma_y2 <- 0.001 + n / 2
rate_sigma_y2 <- 0.001 + sum((y - X %*% beta)^2) / 2
sigma_y2 <- rinvgamma(1, shape = shape_sigma_y2, rate = rate_sigma_y2)
# Step 4: Sample sigma^2 using MH
shape_sigma2 <- K/2 + 1
rate_sigma2 <- sum(beta * D_diag * beta)/2
sigma2_temp <- rlnorm(1, log(sigma2), 2)
log_R_MH <- (log_target_sigma2_post(sigma2_temp,shape_sigma2,rate_sigma2) -
dlnorm(sigma2_temp, log(sigma2), 2, log = TRUE)) -
(log_target_sigma2_post(sigma2, shape_sigma2, rate_sigma2) -
dlnorm(sigma2, log(sigma2_temp), 2, log = TRUE ) )
if(log(runif(1)) < log_R_MH){sigma2 <- sigma2_temp}
# Step 5: Sample w using MH
w_p <- rlnorm(1, log(w), 2) # Log-normal proposal
D_p <- diag((w_p^2 * (1 - lambda) + lambda) / w_p^2)
log_R_MH <- (log_target_w_post(w_p, lambda, beta, D_p, sigma2) -
               dlnorm(w_p, log(w), 2, log = TRUE)) -
            (log_target_w_post(w, lambda, beta, D, sigma2) -
               dlnorm(w, log(w_p), 2, log = TRUE))
if (log(runif(1)) < log_R_MH) {
w <- w_p
}
# Store beta samples
beta_samples[, iter] <- beta
w_samples[iter] <- w
sigma2_samples[iter] <- sigma2
sigma_y2_samples[iter] <- sigma_y2
}
(4) trace plots & conclusion
The following shows the trace plots of all \(\beta_k\)’s. The posterior means show that the 1st, 2nd, and 5th coefficients have the largest absolute values.
par(mfrow = c(5,2))
for(k in 1:K){
ts.plot(beta_samples[k,burnin:n_iter], ylab = bquote(~beta[.(k)] ))
}
rowMeans(beta_samples[,burnin:n_iter])
## [1] 2.07769292 -1.50949146 -0.10025357 0.11732274 3.07678920 -0.03968955
## [7] 0.06054715 0.10215052 -0.03385632 0.11451503
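A common complement to the posterior means in variable selection is the posterior inclusion probability \(P(\lambda_k = 1|y)\). The loop above does not store the \(\lambda\) draws; assuming one adds a K-by-n_iter matrix filled with `lambda_samples[, iter] <- lambda` inside the loop, the probabilities are just row means (mock draws are used below for illustration only):

```r
set.seed(1)
K <- 10; n_iter <- 5000; burnin <- 1000
p_mock <- rep(c(0.95, 0.05), c(3, 7))                  # placeholder inclusion rates
lambda_samples <- matrix(rbinom(K * n_iter, 1, p_mock), nrow = K)
incl_prob <- rowMeans(lambda_samples[, burnin:n_iter]) # estimate of P(lambda_k = 1 | y)
round(incl_prob, 2)
```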
This is the trace plot of \(\sigma_y^2\),
par(mfrow = c(1,1))
ts.plot(sigma_y2_samples[burnin:n_iter],
        ylab = bquote(sigma[y]^2),
        main = bquote("trace plot of " ~ sigma[y]^2))
and this is the trace plot of \(\sigma^2\). The summary shows that the range of \(\sigma^2\) lies within \((1/100, 1/4)\).
ts.plot(sigma2_samples[burnin:n_iter],
ylab = bquote(~sigma^2),
main = bquote("trace plot of "~ sigma^2))
summary(sigma2_samples)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.01001 0.01096 0.01367 0.03801 0.02928 0.24904
The trace plot of \(w\) is as follows.
ts.plot(w_samples[burnin:n_iter],
ylab = "w",
main = "trace plot of w")