Monte Carlo Methods HW6
1. SAMC
Consider the gaussian mixture model given by \[f(x) = \frac{1}{3}\mathcal N\biggl(\begin{pmatrix}-4\\-6\end{pmatrix},\begin{pmatrix}1 & 0.9 \\0.9 & 1\end{pmatrix}\biggr) + \frac{1}{3}\mathcal N\biggl(\begin{pmatrix}0 \\ 0\end{pmatrix},\begin{pmatrix}1 & 0 \\ 0 & 1\end{pmatrix}\biggr) + \frac{1}{3}\mathcal N\biggl(\begin{pmatrix}6\\8\end{pmatrix},\begin{pmatrix}1&-0.9\\-0.9&1\end{pmatrix}\biggr).\]
Generate random numbers from the Gaussian mixture model using SAMC.
(1) basic setup
This question is about a Gaussian mixture, so we use the same parameters
as in HW4 (Hamiltonian MC). The initial value is set to (-1,-1), and the
relevant Rcpp functions are in "MC_HW6_kohanjun.cpp".
rm(list = ls())
set.seed(2023311161)
library(Rcpp)
## Warning: package 'Rcpp' was built under R version 4.3.3
library(RcppArmadillo)
library(RcppDist)
## Warning: package 'RcppDist' was built under R version 4.3.3
library(mvtnorm)
## Warning: package 'mvtnorm' was built under R version 4.3.3
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.3.3
mu1 <- c(-4, -6)
mu2 <- c(0, 0)
mu3 <- c(6, 8)
mu <- list(mu1 = mu1, mu2 = mu2, mu3 = mu3)
Sigma1 <- matrix(c(1, 0.9, 0.9, 1), nrow = 2)
Sigma2 <- diag(2)
Sigma3 <- matrix(c(1, -0.9, -0.9, 1), nrow = 2)
Sigma <- list(Sigma1 = Sigma1, Sigma2 = Sigma2, Sigma3 = Sigma3)
Sigma1_inv <- solve(Sigma1)
Sigma2_inv <- solve(Sigma2)
Sigma3_inv <- solve(Sigma3)
x <- matrix(c(-1,-1), nrow = 1)
sourceCpp("MC_HW6_kohanjun.cpp")
In the Rcpp file, there are dmvnorm_rcpp and
rmvnorm_rcpp functions, which return the pdf value of a given
multivariate normal distribution and generate random numbers from a
multivariate normal, respectively.
// [[Rcpp::export]]
vec dmvnorm_rcpp(const mat& x, const vec& mean, const mat& sigma, bool logd = false) {
return dmvnorm(x, mean, sigma, logd);
}
// [[Rcpp::export]]
mat rmvnorm_rcpp(int n, const vec& mean, const mat& sigma) {
return rmvnorm(n, mean, sigma);
}
We also use the same log-density function log_target,
which uses the log-sum-exp trick, as follows. This function returns the same
value as a direct density calculation.
log_target <- function(x) {
K <- 3
log_comp_densities <- numeric(K)
ws <- rep(1/K,K)
for (k in 1:K) {
log_comp_densities[k] <- dmvnorm_rcpp(x,mean = mu[[k]], sigma = Sigma[[k]], logd = TRUE)
}
log_weights <- log(ws)
# Log-sum-exp trick
alpha <- max(log_weights + log_comp_densities)
log_density <- alpha + log(sum(exp(log_weights + log_comp_densities - alpha)))
return(log_density)
}
exp(log_target(x))
## [1] 0.01951661
as.numeric((1/3)*
(dmvnorm_rcpp(x, mean=mu1, sigma=Sigma1)+
dmvnorm_rcpp(x, mean=mu2, sigma=Sigma2) +
dmvnorm_rcpp(x, mean=mu3, sigma=Sigma3)))
## [1] 0.01951661
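The stability the trick buys can be seen in a standalone C++ sketch (plain doubles rather than the Rcpp types above; the function name log_sum_exp is mine):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Stable evaluation of log( sum_k exp(lw[k] + ld[k]) ), given log-weights lw
// and log-densities ld: subtracting the maximum before exponentiating keeps
// at least one term equal to exp(0) = 1, so the sum never underflows to 0.
double log_sum_exp(const double* lw, const double* ld, int K) {
    double alpha = lw[0] + ld[0];
    for (int k = 1; k < K; ++k) alpha = std::max(alpha, lw[k] + ld[k]);
    double s = 0.0;
    for (int k = 0; k < K; ++k) s += std::exp(lw[k] + ld[k] - alpha);
    return alpha + std::log(s);
}
```

With component log-densities around -750 or below, the direct sum of exp() terms underflows to 0 and its logarithm is -Inf, while log_sum_exp stays finite.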
(2) subregions
The purpose of this section is to partition the total sample space
\(\mathcal X\) according to the value
of the density. Comparing the values of the density at the 3 modes,
the maximum of \(f(x)\) is about
\(0.1217088\). After rounding up to the
4th decimal place, we set this as the maximum of the density
f_max. The reason for rounding up the maximum is to
allow for the possibility that the calculated \(0.1217088\) is slightly less than the true
maximum. I set the number of subregions to \(M = 20\), and I made the equally spaced
f_interval from 0 to f_max.
exp(log_target(matrix(c(-4,-6), nrow = 1 ) ) )
## [1] 0.1217088
exp(log_target(matrix(c(0,0), nrow = 1 ) ) )
## [1] 0.05305165
exp(log_target(matrix(c(6,8), nrow = 1 ) ) )
## [1] 0.1217088
f_max <- ceiling(exp(log_target(matrix(c(6,8), nrow = 1 ) ) ) * 10000) / 10000
M <- 20
f_interval <- seq(0, f_max, length.out = M+1)
Let \(f^{(m)}\) denote the \(m\)th boundary (\(m = 1,...,M+1\)), so that \(f^{(1)}=0\) and ‘f_max’ equals
\(f^{(M+1)}\). The following
‘assign_subregion’ function determines in which subregion the input
\(x\) lies. In the
function, the term f >= f_interval[-(M+1)] checks
whether \(f^{(1)},f^{(2)},...,f^{(M)} \le
f\), and f < f_interval[-1] checks whether
\(f <
f^{(2)},f^{(3)},...,f^{(M+1)}\). There is exactly one \(n\) such that \(f^{(n)} \le f < f^{(n+1)}\), so the
function assign_subregion returns that \(n\) (\(n = 1,...,M\)): \(f\) lies in the \(n\)th subregion. The larger the
density value \(f\), the larger the index
of the subregion.
assign_subregion <- function(x){
f <- exp(log_target(x))
which(f >= f_interval[-(M+1)] & f < f_interval[-1])
}
assign_subregion(matrix(c(-4,-6), nrow=1))
## [1] 20
assign_subregion(matrix(c(0,0), nrow=1))
## [1] 9
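Because the boundaries are equally spaced, the which() scan has a closed form; a standalone C++ sketch (assign_subregion_idx is a hypothetical helper, not in the homework code):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// With M equally spaced bins on [0, f_max], the 1-based bin index of f is
// floor(f / f_max * M) + 1, clamped to [1, M] to guard the f == f_max edge.
int assign_subregion_idx(double f, double f_max, int M) {
    int idx = static_cast<int>(std::floor(f / f_max * M)) + 1;
    return std::min(std::max(idx, 1), M);
}
```

At f_max = 0.1218 and M = 20, this reproduces the indices 20 and 9 shown above for the two mode locations.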
(3) other parameters
The \(\pi\) vector is set to give larger values to subregions with lower density (higher energy) and smaller values to subregions with higher density (lower energy). If the weights were uniform (\(1/M\)), fewer samples would be drawn from the \(2\)nd mode (whose joint pdf values are lower than those of the other modes) than from the others.
The initial weight vector \(\theta\)
is set to the zero vector, and the gain factor \(\gamma_t\) is defined as follows: \[\gamma_t = \frac{1}{(\max\{1,
t/100\})^{0.51}},\] which is consistent with the code
gammas <- function(t) {(min(1, 100/t))^(0.51)}. The
notation \(t\) stands for the \(t\)th iteration.
The minimum and maximum allowed for the elements of \(\theta\) are -30 and 30, and the total number of iterations is \(100000\).
pis <- (M:1)/sum(M:1)
thetas <- rep(0, M)
gammas <- function(t) {(min(1, 100/t))^(0.51)}
theta_min <- -30
theta_max <- 30
n_iter <- 100000
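The two written forms of the gain factor agree because \(\min\{1, 100/t\} = 1/\max\{1, t/100\}\) for \(t > 0\); a small standalone C++ check (function names are mine):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Form used in the text: gamma_t = 1 / (max{1, t/100})^0.51
double gain_max_form(double t) { return 1.0 / std::pow(std::max(1.0, t / 100.0), 0.51); }
// Form used in the R code: gamma_t = (min{1, 100/t})^0.51
double gain_min_form(double t) { return std::pow(std::min(1.0, 100.0 / t), 0.51); }
```

The gain factor stays at 1 for the first 100 iterations and then decays polynomially, which is what makes the stochastic approximation step sizes summable in the SAMC sense.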
(4) SAMC algorithm
The samc is the main function of the SAMC algorithm. The
Jxt denotes the subregion of current sample
xt. Letting the proposal density be \(q(\cdot | x) = \mathcal N_2(y; x, 10\cdot
I_2)\), generate \(y\) from the
proposal. The reason for the variance to be \(10\) is that relatively high variance
prevents the generate \(y\) from
getting stuck to specific modes. Using the y, obtain the
subregion of it Jy. The acceptance ratio of the sampling is
\[R_{SAMC} = \exp\Bigl\{ \theta_t^{(J(x_t)) }
- \theta_t^{(J(y))} \Bigr\}\cdot\frac{f(y)\cdot q(x_t|y)}{f(x_t)\cdot
q(y|x_t)}.\] The definition of the ratio uses the unnormalized density
\(\psi\) in both the numerator and the
denominator instead of \(f\), but we
may also use \(f\) because the
normalizing constants cancel. Since the proposal \(q\) is a symmetric Gaussian, \(q(x_t|y)\) and \(q(y|x_t)\) cancel as well. Then the logarithm
of the ratio becomes \[\log(R_{SAMC})
= \theta_t^{(J(x_t)) } - \theta_t^{(J(y))} + \log f(y) - \log
f(x_t),\] which is consistent with log_acc_ratio.
With probability \(\min\{1, R_{SAMC}\}\), accept
\(y\) and set \(x_{t+1} = y\).
For the weight update, we need standard basis vectors. The vector \(e_{x_{t+1}}\) has a single \(1\) at the index equal to the subregion of
\(x_{t+1}\). \(\theta_{t+1/2} := \theta_t +
\gamma_{t+1}(e_{x_{t+1}}-\pi)\) corresponds to
thetas_half <- thetas + gammas(t) * (e_xt - pis). The
question gives no specific condition on the range of the weight
vector space \(\Theta\), so I manually set
the range to \([-30,30]^M\). The code
pmax(thetas_half, theta_min) takes the elementwise maximum
of the thetas_half vector and theta_min,
and pmin(pmax(thetas_half, theta_min), theta_max) then takes the
elementwise minimum of
pmax(thetas_half, theta_min) and theta_max.
This ensures that all elements of thetas_half lie between
\(-30\) and \(30\).
samc <- function(x){
x_samples <- matrix(NA, nrow=n_iter, ncol=2,
dimnames = list(NULL, c('x1', 'x2')) )
thetas_samples <- matrix(NA, nrow=n_iter, ncol = length(thetas))
xt <- x
for(t in 1:n_iter){
# step 1 : sampling
Jxt <- assign_subregion(xt)
y <- rmvnorm_rcpp(1, mean = xt, sigma = 10*diag(2) )
Jy <- assign_subregion(y)
log_acc_ratio <- (thetas[Jxt] - thetas[Jy]) +
log_target(y) - log_target(xt)
if(log(runif(1)) < log_acc_ratio) {xt <- y}
x_samples[t,] <- xt
# step 2 : weight update
e_xt <- rep(0, M)
e_xt[assign_subregion(xt)] <- 1
thetas_half <- thetas + gammas(t) * (e_xt - pis)
thetas <- pmin(pmax(thetas_half, theta_min), theta_max)
thetas_samples[t,] <- thetas
}
result <- list(x_samples = x_samples, thetas_samples = thetas_samples)
return(result)
}
(5) results
The generated samples of \(x=(x_1,x_2)\) are stored in
x_samples_samc. The burn-in period is \(50000\).
burnin <- 50000
result <- samc(x)
x_samples_samc <- result$x_samples
Before showing the result, we’ll define marginal distributions of \(f\). The marginal distributions are the mixtures of the marginals of each component. So, \[\begin{aligned} f_1(x_1) &= \frac{1}{3}\mathcal N(x_1;-4,1) + \frac{1}{3}\mathcal N(x_1;0,1) + \frac{1}{3}\mathcal N(x_1;6,1) \\ f_2(x_2) &= \frac{1}{3}\mathcal N(x_2;-6,1) + \frac{1}{3}\mathcal N(x_2;0,1) + \frac{1}{3}\mathcal N(x_2;8,1) \end{aligned}\]
f1 <- function(x){
(1/3)*(dnorm(x,-4,1) + dnorm(x,0,1) + dnorm(x,6,1))
}
f2 <- function(x){
(1/3)*(dnorm(x,-6,1) + dnorm(x,0,1) + dnorm(x,8,1))
}
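As a quick sanity check that f1 is a proper density, a standalone C++ sketch integrates it with the trapezoid rule (grid limits chosen ad hoc; far enough out that the Gaussian tails are negligible):

```cpp
#include <cassert>
#include <cmath>

const double PI = std::acos(-1.0);

// N(mu, 1) pdf
double phi(double x, double mu) {
    return std::exp(-0.5 * (x - mu) * (x - mu)) / std::sqrt(2.0 * PI);
}
// Marginal f1(x1) = (1/3)(N(-4,1) + N(0,1) + N(6,1))
double f1(double x) { return (phi(x, -4.0) + phi(x, 0.0) + phi(x, 6.0)) / 3.0; }

// Trapezoid-rule integral of f1 over [a, b] with n panels
double trapz_f1(double a, double b, int n) {
    double h = (b - a) / n, s = 0.5 * (f1(a) + f1(b));
    for (int i = 1; i < n; ++i) s += f1(a + i * h);
    return s * h;
}
```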
The histograms of the generated samples after burn-in closely match the respective marginal pdfs.
par(mfrow = c(2,1) )
hist(x_samples_samc[burnin:n_iter,1], freq = FALSE, breaks = 50, ylim = c(0, 0.15),
xlab = 'x1', main = bquote("Histogram of " ~ x[1] ~ "using SAMC"), cex.main = 2 )
curve(f1,col='red', add=TRUE)
hist(x_samples_samc[burnin:n_iter,2], freq = FALSE, breaks = 50, ylim = c(0, 0.15),
xlab = 'x2', main = bquote("Histogram of " ~ x[2] ~ "using SAMC"), cex.main = 2)
curve(f2,col='red', add=TRUE)
The trace plots show that the samples cover the 3 modes.
par(mfrow = c(2,1) )
plot.ts(x_samples_samc[burnin:n_iter,1], ylab = 'x1',
main = "Trace Plot of x1", cex.main = 2)
plot.ts(x_samples_samc[burnin:n_iter,2], ylab = 'x2',
main = "Trace Plot of x2", cex.main = 2)
The scatter plot overlaid with the contour plot clearly shows that the samples cover the 3 modes.
grid <- expand.grid(x = seq(-10,10, by = 0.1), y = seq(-10,15, by = 0.1))
gm_density <- function(x, y) {
density <- 0
for (i in 1:3) {
density <- density + (1/3)*dmvnorm_rcpp(matrix(c(x, y),nrow=1),
mean = mu[[i]],
sigma = Sigma[[i]])
}
return(density)
}
grid$density <- mapply(gm_density, grid$x, grid$y)
ggplot() +
geom_point(data = as.data.frame(x_samples_samc[burnin:n_iter,]),
aes(x = x1, y = x2), color = 'gray', size = 0.3, alpha = 0.3) +
geom_contour(data = grid, aes(x = x, y=y, z=density, color = after_stat(level)) ) +
scale_color_gradient(low = 'blue', high = 'red') +
labs(title = "Contour Plot of f(x) with Scatter Plot using SAMC", x = 'x1', y='x2') +
theme_minimal()
The following trace plot shows that the weight elements start from large values and converge to 0.
par(mfrow = c(1,1))
matplot(result$thetas_samples,
type = "l", lty = 1, col = rainbow(ncol(result$thetas_samples)),
xlab = "Iteration", ylab = "Value",
main = bquote("Trace Plots of all " ~ theta[t] ~"'s"), cex.main = 2.5)
legend("bottomright", legend = paste("Theta", 1:ncol(result$thetas_samples)),
col = rainbow(ncol(result$thetas_samples)), lty = 1, cex = 1.5)
2. DMH (Spatial Autologistic model)
Let \(\boldsymbol x = \{x_i : i\in\mathcal D\}\) denote the observed binary data, where \(x_i\) is called a spin, \(\mathcal D\) is the set of indices of the spins, and \(\mathcal N(i)\) denotes the set of neighbors of spin \(i\). \[\begin{aligned} f(x|\alpha,\beta) &= \frac{1}{\kappa(\alpha,\beta)}\exp\biggl\{ \alpha\sum_{i\in\mathcal D}x_i + \frac{\beta}{2}\sum_{i\in\mathcal D}\Bigl(x_i\cdot\sum_{j\in\mathcal N(i)}x_j \Bigr) \biggr\},\quad (\alpha,\beta) \in \Theta \\ \kappa(\alpha,\beta) &= \sum_{\text{all possible }x}\exp\biggl\{ \alpha\sum_{i\in\mathcal D}x_i + \frac{\beta}{2}\sum_{i\in\mathcal D}\Bigl(x_i\cdot\sum_{j\in\mathcal N(i)}x_j \Bigr) \biggr\} \end{aligned}\] To conduct a Bayesian analysis for the model, we place a uniform prior on \[(\alpha,\beta) \in \Theta = [-1,1]\times[0,1].\] To generate the auxiliary variable \(y\), we update each grid point using the Gibbs sampler, \[\begin{aligned}&\mathbb P(y_i=1 | \alpha,\beta,y_{-i}) \\&= \frac{\exp\Bigl\{\alpha(1 + S_{1,-i}(y)) + \beta\Bigl( \frac{1}{2}\sum_{j\in\mathcal N(i)}y_j + S_{2,-i}(y)\Bigr) \Bigr\}}{\exp\Bigl\{\alpha(1 + S_{1,-i}(y)) + \beta\Bigl( \frac{1}{2}\sum_{j\in\mathcal N(i)}y_j + S_{2,-i}(y)\Bigr) \Bigr\} + \exp\Bigl\{\alpha(-1 + S_{1,-i}(y)) + \beta\Bigl( -\frac{1}{2}\sum_{j\in\mathcal N(i)}y_j + S_{2,-i}(y)\Bigr) \Bigr\}} \end{aligned}\]
Based on the given inner sampler, generate code for a DMH algorithm to estimate the parameters of a spatial autologistic model \((\alpha, \beta)\)
- Using a Gaussian random walk proposal distribution with the covariance matrix \(0.03^2 I_2\).
- Conduct at least 10 sweeps for an inner sampler.
- number of iteration : 10,000, burn-in process : 500
(1) basic setup
The \(S_{1,-i}, S_{2,-i}\) have the following meanings for given data \(x\).
\[\begin{aligned} S_{1,-i}(x) &= \sum_{k\in\mathcal D\backslash\{i\}}x_k, \qquad S_{2,-i}(x) = \frac{1}{2}\sum_{k\in\mathcal D\backslash\{i\}}x_k\Bigl(\sum_{j\in\mathcal N(k)}x_j \Bigr). \end{aligned}\] Similarly, let us define the following notation for simplicity. \[\begin{aligned} S_1(x) := \sum_{k\in\mathcal D}x_k, \qquad S_2(x) := \frac{1}{2}\sum_{k\in\mathcal D}\Bigl(x_k\sum_{l\in\mathcal N(k)}x_l\Bigr). \end{aligned}\]
The raw data UScancer.map is converted so that all the
values are -1, 0, and 1.
rm(list=ls())
set.seed(2023311161)
library(Rcpp)
library(RcppArmadillo)
library(RcppDist)
library(ggplot2)
library(gridExtra)
## Warning: package 'gridExtra' was built under R version 4.3.2
rawfile = file("./UScancer.map", open = "r")
lines = readLines(rawfile)
close(rawfile, type="rw")
data = unlist(strsplit(lines,split=""))
data = matrix(as.numeric(data), nrow=58,byrow=T)
# convert 0 to -1 & 2 to 0
data[data == 0] = -1
data[data == 2] = 0
unique(as.vector(data)) # -1 1 0
## [1] -1 1 0
The total number of iterations is \(10500\), and the burn-in is \(500\). The initial values of \(\alpha\) and \(\beta\) are 0 and 1/2 respectively, which
are stored in the theta vector. The number of sweeps
(sw) is set to \(10\).
n_iter <- 10500
burnin <- 500
theta <- c(0,1/2)
sw <- 10
(2) Rcpp functions
Until now, \(x_k\) indicated the
location of a spin: the \(k\)th spin in
\(\mathcal D\). If the data is denoted
as \(x_{i,j}\), this refers to the
coordinates of the spin: \(i\)th row and
\(j\)th column. The following
neighbor function returns the sum of all neighbors of \(x_{i,j}\), where \(x\) is the data. In general, there are 4
neighbors (last case of the if statement): north, south,
west, and east. However, there are some exceptions. When \(x_{i,j}\) is on an
edge (e.g. the north edge), it has 3 neighbors: west, east, and south. When \(x_{i,j}\) is in a corner (e.g. the southwest
corner), it has only 2 neighbors: north and east. So the function
neighbor divides all the cases into 9 if
statements and returns the sum of the vector nn.
double neighbor(mat& data, int i, int j){
int ei,ej,s=0;
dvec nn(4, fill::zeros); // zero-fill: edge/corner cases set only 2-3 entries
ei = data.n_rows;
ej = data.n_cols;
if((i==0)&((j>0)&(j<(ej-1)))){
// north edge
nn(s) = data(i,j+1);
nn(s+1) = data(i,j-1);
nn(s+2) = data(i+1,j);
}else if((i==(ei-1))&((j>0)&(j<(ej-1)))){
// south edge
nn(s) = data(i,j+1);
nn(s+1) = data(i,j-1);
nn(s+2) = data(i-1,j);
}else if(((i>0)&(i<(ei-1)))&(j==0)){
// west edge
nn(s) = data(i,j+1);
nn(s+1) = data(i+1,j);
nn(s+2) = data(i-1,j);
}else if(((i>0)&(i<(ei-1)))&(j==(ej-1))){
// east edge
nn(s) = data(i,j-1);
nn(s+1) = data(i+1,j);
nn(s+2) = data(i-1,j);
}else if((i==0)&(j==0)){
// northwest corner
nn(s) = data(i+1,j);
nn(s+1) = data(i,j+1);
}else if((i==(ei-1))&(j==(ej-1))){
// southeast corner
nn(s) = data(i,j-1);
nn(s+1) = data(i-1,j);
}else if((i==(ei-1))&(j==0)){
// southwest corner
nn(s) = data(i,j+1);
nn(s+1) = data(i-1,j);
}else if((i==0)&(j==(ej-1))){
// northeast corner
nn(s) = data(i,j-1);
nn(s+1) = data(i+1,j);
}else{ // general case
nn(s) = data(i-1,j);
nn(s+1) = data(i+1,j);
nn(s+2) = data(i,j-1);
nn(s+3) = data(i,j+1);
}
return(accu(nn));
}
The following s1_a function returns the total sum of the input
data \(x\), which corresponds to \(S_1(x)\). The s2_a function
returns the value \(S_2(x)\).
double s1_a(mat& data){
return accu(data);
}
double s2_a(mat& data){
int i,j,r=0,c=0;
double res=0.0;
r=data.n_rows;
c=data.n_cols;
for(i=0;i<r;i++){
for(j=0;j<c;j++){
res += data(i,j)*neighbor(data,i,j);
}
}
return(res/2);
}
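A standalone C++ sketch of \(S_1\) and \(S_2\) on a tiny grid (std::vector instead of Armadillo; a bounds-checked neighbor loop replaces the 9-branch version; function names are mine):

```cpp
#include <cassert>
#include <vector>

// S1(x): plain sum of all spins.
double s1(const std::vector<std::vector<double>>& d) {
    double s = 0.0;
    for (const auto& row : d) for (double v : row) s += v;
    return s;
}

// S2(x) = (1/2) * sum_k x_k * (sum of 4-neighbors of x_k).
// The bounds check handles edges and corners uniformly; the final division
// by 2 removes the double counting of each neighbor pair.
double s2(const std::vector<std::vector<double>>& d) {
    static const int di[4] = {-1, 1, 0, 0}, dj[4] = {0, 0, -1, 1};
    int r = static_cast<int>(d.size()), c = static_cast<int>(d[0].size());
    double s = 0.0;
    for (int i = 0; i < r; ++i)
        for (int j = 0; j < c; ++j)
            for (int k = 0; k < 4; ++k) {
                int ni = i + di[k], nj = j + dj[k];
                if (ni >= 0 && ni < r && nj >= 0 && nj < c)
                    s += d[i][j] * d[ni][nj];
            }
    return s / 2.0;
}
```

On a 1x2 grid of two +1 spins there is exactly one neighbor pair, so S2 should be 1; on a 2x2 grid of +1 spins there are four edges, so S2 should be 4.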
The generate_Y function generates the
auxiliary variable \(y\). When
generating \(y\), we only consider
\(-1\) and \(1\). The input n stands
for the number of sweeps. s1_c and s2_c
stand for \(S_1(y),S_2(y)\) for the current
\(y\), and s1_f,
s2_f stand for \(S_1(y),S_2(y)\) after the flip. For example,
when the \(k\)th spin of the current \(y\) is -1 (\(y_k
= -1\)), s1_c and s2_c calculate
\(S_1\) and \(S_2\) with \(y_k=-1\), while s1_f and
s2_f calculate \(S_1\) and
\(S_2\) with \(y_k = 1\).
Let us consider the case \(-1 \to
+1\) first. For simplicity, denote by \((y|y_k=w)\)
the data \(y\) whose \(k\)th spin has value \(w\). The conditional
probability that \(y_k = 1\) can be
rewritten as \[\begin{aligned}
\mathbb P(y_k=1|\alpha,\beta,y_{-k}) = \frac{ \exp\Bigl\{\alpha
S_1(y|y_k=1) + \beta S_2(y|y_k=1) \Bigr\} }{ \exp\Bigl\{\alpha
S_1(y|y_k=1) + \beta S_2(y|y_k=1) \Bigr\} + \exp\Bigl\{\alpha
S_1(y|y_k=-1) + \beta S_2(y|y_k=-1) \Bigr\} }.
\end{aligned}\] Since \(S_1(y|y_k=1)
- S_1(y|y_k=-1) = 2\) and \[\begin{aligned}S_2(y|y_k=1) - S_2(y|y_k=-1)
&= \frac{1}{2}\Bigl(1\cdot\sum_{l\in\mathcal N(k)}y_l\Bigr) -
\frac{1}{2}\Bigl(-1\cdot\sum_{l\in\mathcal N(k)}y_l\Bigr) \\
&=\sum_{l\in\mathcal N(k)}y_l,\end{aligned}\] which
correspond to s1_f = S1 + 2; and
s2_f = S2 + neighbor(Y,i,j);, the denominator of the
probability is as follows: \[\begin{aligned}
&\exp\Bigl\{\alpha S_1(y|y_k=1) + \beta S_2(y|y_k=1) \Bigr\} +
\exp\Bigl\{\alpha S_1(y|y_k=-1) + \beta S_2(y|y_k=-1) \Bigr\} \\
&= \exp\Bigl\{\alpha S_1(y|y_k=-1) + \beta
S_2(y|y_k=-1) \Bigr\}\Bigl( 1 + \exp\Bigl\{ \alpha\cdot2 +
\beta\cdot\sum_{l\in\mathcal N(k)}y_l\Bigr\}\Bigr).
\end{aligned}\] Then, the logarithm of the numerator and
denominator of the probability is \[\begin{aligned}
\log(\text{numerator}) &= \alpha S_1(y|y_k=1) + \beta S_2(y|y_k=1)
\\
\log(\text{denominator}) &= \log\Bigl( 1 + \exp\Bigl\{ \alpha\cdot2
+ \beta\cdot\sum_{l\in\mathcal N(k)}y_l\Bigr\}\Bigr) + \Bigl( \alpha
S_1(y|y_k=-1) + \beta S_2(y|y_k=-1) \Bigr)
\end{aligned}\] These values correspond to num and
den respectively. This logic applies identically to the
case of flipping to \(-1\).
Then, flip the current value \(y_{i,j}\) with probability \(\exp\)(ratio). After storing
the \(S_1,S_2\) values of the flipped
state into S1 and S2 respectively, move on to the next
cell; one full pass over the grid is a sweep, repeated n times.
// [[Rcpp::export]]
mat generate_Y(mat data, double alpha, double beta, int n=1){
int m,i,j,r=0,c=0;
double u,num,den,ratio=0.0;
double s1_c = 0.0, s2_c = 0.0;
double s1_f = 0.0, s2_f = 0.0;
double S1 = s1_a(data);
double S2 = s2_a(data);
r = data.n_rows;
c = data.n_cols;
mat Y(r,c,fill::zeros);
Y = data;
// gibbs sampling for auxiliary variable
for(m=0; m<n; m++){
for(i=0;i<r;i++){
for(j=0;j<c;j++){
if(Y(i,j) == 0) continue; // masked cells: skip so ratio, S1, S2 never go stale
if(Y(i,j) == -1){ // to +1
s1_c = S1;
s2_c = S2;
s1_f = S1 + 2;
s2_f = S2 + neighbor(Y,i,j);
num = alpha*s1_f + beta*s2_f;
den = (alpha*s1_c + beta*s2_c) +
log(1+exp(alpha*(s1_f-s1_c)+beta*(s2_f-s2_c)));
ratio = num - den;
}else if(Y(i,j) == 1){ // to -1
s1_c = S1;
s2_c = S2;
s1_f = S1 - 2;
s2_f = S2 - neighbor(Y,i,j);
num = alpha*s1_f + beta*s2_f;
den = (alpha*s1_c + beta*s2_c) +
log(1+exp(alpha*(s1_f-s1_c)+beta*(s2_f-s2_c)));
ratio = num - den;
}
u = R::runif(0,1);
if(log(u) < ratio){
Y(i,j) = -Y(i,j);
S1 = s1_f;
S2 = s2_f;
} // else: keep the current spin and running sums
}
}
}
return Y;
}
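The flip log-ratio derived above collapses to a logistic form: \(\mathbb P(y_k = 1 \mid \alpha,\beta,y_{-k}) = \operatorname{sigmoid}\bigl(2\alpha + \beta\sum_{l\in\mathcal N(k)}y_l\bigr)\), since \(z - \log(1+e^z)\) is the log of \(e^z/(1+e^z)\). A standalone C++ sketch (prob_spin_plus is a hypothetical helper, not part of the homework code):

```cpp
#include <cassert>
#include <cmath>

// Gibbs conditional for a +/-1 spin with neighbor sum s_nb:
// P(y_k = +1 | alpha, beta, y_{-k}) = sigmoid(2*alpha + beta * s_nb).
double prob_spin_plus(double alpha, double beta, double s_nb) {
    double z = 2.0 * alpha + beta * s_nb;
    return 1.0 / (1.0 + std::exp(-z));
}
```

The symmetry P(+1 | alpha, s_nb) + P(+1 | -alpha, -s_nb) = 1 and the value 1/2 at z = 0 are both immediate from the sigmoid form.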
The unnormal_logf function returns \(\alpha S_1(x) + \beta
S_2(x)\), the logarithm of the unnormalized density, for input data \(x\). This will be used in the acceptance ratio
of the DMH algorithm.
// [[Rcpp::export]]
double unnormal_logf(mat& data, double alpha, double beta){
double term1 = alpha * s1_a(data);
double term2 = beta * s2_a(data); // s2_a already contains the factor 1/2
return term1 + term2;
}
All these Rcpp functions are in MC_HW6_kohanjun.cpp
file.
sourceCpp("MC_HW6_kohanjun.cpp")
(3) Double MH algorithm
The prior of \(\alpha\) and \(\beta\) are both from uniform distribution.
\[\begin{aligned}
\alpha \sim \text{Unif}(-1,1), \qquad \beta \sim \text{Unif}(0,1).
\end{aligned}\] Since the accpetance ratio requires prior of
\((\alpha,\beta)\), we need to check
whether \((\alpha,\beta)\) lie in the
region \(\Theta\). The following
ab_in_range function returns TRUE if both
\(\alpha\) and \(\beta\) lie in \(\Theta\), and returns FALSE
otherwise.
ab_in_range <- function(a,b){
(a >= -1 && a <= 1) && (b >= 0 && b <= 1)
}
ab_in_range(0,1/2)
## [1] TRUE
The following DMH function is the main function of the Double
Metropolis-Hastings algorithm. We assume a Gaussian random walk for
\(\theta = (\alpha,\beta)\), that is,
the proposal density \(q\) is \[q(\cdot | \theta) = \mathcal N_2(\theta, 0.03^2
I_2).\] Generate theta_temp from the current \(\theta\), and then generate the auxiliary data
Y using generate_Y with the temporary \(\alpha^*,\beta^*\)
(alpha_temp, beta_temp). The acceptance ratio of DMH is \[R_{DMH} = \min\biggl\{1,
\frac{p(\theta^*)h(x|\theta^*)h(y|\theta_t)q(\theta_t|\theta^*)}{p(\theta_t)h(x|\theta_t)h(y|\theta^*)q(\theta^*|\theta_t)}
\biggr\},\] where \(p\) is the
prior of \(\theta\), \(h\) is the unnormalized \(f\), \(\theta^* =
(\alpha^*,\beta^*)\), and \(\theta_t\) is the current \(\theta\) at the \(t\)th iteration. The \(q\) terms cancel because the Gaussian
random walk is symmetric.
Assuming \(\theta_t \in \Theta\),
when \(\theta^* \in \Theta\) as well,
the \(p\) terms cancel too.
Then the logarithm of the acceptance ratio becomes \[\log R_{DMH} = \log h(x|\theta^*) + \log
h(y|\theta_t) - \log h(x|\theta_t) - \log h(y|\theta^*),\] which
corresponds to log_acc_ratio. When \(\theta^*\) is outside the region \(\Theta\), \(p(\theta^*)\) is 0 and so is the acceptance
ratio; for this case, log_acc_ratio
is set to -Inf. Accept \((\alpha^*,\beta^*)\) with probability \(\exp\)(log_acc_ratio), and
repeat this for n_iter iterations.
DMH <- function(data){
theta_samples <- matrix(NA, nrow = n_iter, ncol = 2,
dimnames=list(NULL, c('alpha','beta')))
for(iter in 1:n_iter){
alpha <- theta[1];beta <- theta[2]
theta_temp <- rmvnorm_rcpp(1, mean = theta, sigma = 0.03^2*diag(2))
alpha_temp <- theta_temp[1,1];beta_temp <- theta_temp[1,2]
Y <- generate_Y(data, alpha_temp, beta_temp, n=sw)
if(ab_in_range(alpha_temp, beta_temp)) {
log_acc_ratio <- unnormal_logf(data,alpha_temp,beta_temp) +
unnormal_logf(Y,alpha,beta) -
unnormal_logf(data,alpha,beta) -
unnormal_logf(Y,alpha_temp,beta_temp)
} else {
log_acc_ratio <- -Inf
}
if(log(runif(1)) < log_acc_ratio){
theta <- theta_temp
}
theta_samples[iter,] <- theta
}
return(theta_samples)
}
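With the flat prior inside \(\Theta\) and the symmetric proposal, the four \(h\) terms of \(\log R_{DMH}\) regroup into two differences, \((\alpha^*-\alpha_t)(S_1(x)-S_1(y)) + (\beta^*-\beta_t)(S_2(x)-S_2(y))\); a standalone C++ sketch (log_R_dmh is a hypothetical helper, not part of the homework code):

```cpp
#include <cassert>
#include <cmath>

// log R_DMH for the autologistic model with log h(.|a,b) = a*S1 + b*S2:
// log h(x|th*) + log h(y|th_t) - log h(x|th_t) - log h(y|th*)
// = (a* - a_t)*(S1(x) - S1(y)) + (b* - b_t)*(S2(x) - S2(y)).
double log_R_dmh(double s1x, double s2x, double s1y, double s2y,
                 double a_t, double b_t, double a_s, double b_s) {
    return (a_s - a_t) * (s1x - s1y) + (b_s - b_t) * (s2x - s2y);
}
```

The regrouped form makes the intuition visible: a proposed \(\theta^*\) is favored when it moves the sufficient statistics of the auxiliary data \(y\) away from those of the observed data \(x\) in the direction that \(\theta^*\) itself rewards.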
(4) results
The sampled \(\alpha,\beta\) are
stored in theta_samples. Their medians and standard deviations
are in ma, mb, and sda,
sdb respectively, which are used in the normal pdf
functions fa and fb.
theta_samples <- DMH(data)
ma <- median(theta_samples[burnin:n_iter,1])
mb <- median(theta_samples[burnin:n_iter,2])
ma;mb
## [1] -0.3021663
## [1] 0.2456627
sda <- sd(theta_samples[burnin:n_iter,1])
sdb <- sd(theta_samples[burnin:n_iter,2])
sda;sdb
## [1] 0.03224568
## [1] 0.03559585
fa <- function(x){
dnorm(x,ma,sda)
}
fb <- function(x){
dnorm(x,mb,sdb)
}
The histograms show that the \(\alpha\)
and \(\beta\) samples are close to
normal distributions with means ma, mb and
standard deviations sda, sdb, respectively.
h1 <- ggplot()+
geom_histogram(data = as.data.frame(theta_samples[burnin:n_iter,]),
aes(x = alpha, y = after_stat(density)),
bins = 30, fill = 'gray', color = 'black') +
stat_function(fun = fa, color = 'red') +
labs(title = bquote("Histogram of "~ alpha), x = expression(alpha)) +
theme(plot.title = element_text(size = 20)) +
theme_minimal()
h2 <- ggplot()+
geom_histogram(data = as.data.frame(theta_samples[burnin:n_iter,]),
aes(x = beta, y = after_stat(density)),
bins = 30, fill = 'gray', color = 'black') +
stat_function(fun = fb, color = 'red') +
labs(title = bquote("Histogram of "~ beta), x = expression(beta)) +
theme(plot.title = element_text(size = 20)) +
theme_minimal()
grid.arrange(h1,h2, ncol = 1)
The trace plots show that the sampled \(\alpha\) and \(\beta\) oscillate around ma
and mb respectively.
par(mfrow = c(2,1))
plot.ts(theta_samples[burnin:n_iter,1], ylab = expression(alpha),
main = bquote("Trace Plot of "~ alpha), cex.main = 2)
plot.ts(theta_samples[burnin:n_iter,2],ylab = expression(beta),
main = bquote("Trace Plot of " ~ beta), cex.main = 2)