1. SAMC

Consider the Gaussian mixture model given by \[f(x) = \frac{1}{3}\mathcal N\biggl(\begin{pmatrix}-4\\-6\end{pmatrix},\begin{pmatrix}1 & 0.9 \\0.9 & 1\end{pmatrix}\biggr) + \frac{1}{3}\mathcal N\biggl(\begin{pmatrix}0 \\ 0\end{pmatrix},\begin{pmatrix}1 & 0 \\ 0 & 1\end{pmatrix}\biggr) + \frac{1}{3}\mathcal N\biggl(\begin{pmatrix}6\\8\end{pmatrix},\begin{pmatrix}1&-0.9\\-0.9&1\end{pmatrix}\biggr).\]

Generate random numbers from this Gaussian mixture model using SAMC.

(1) basic setup

This question concerns a Gaussian mixture, so we reuse the same parameters as in HW4 (Hamiltonian MC). The initial value is set to (-1,-1), and the relevant Rcpp functions are in "MC_HW6_kohanjun.cpp".

rm(list = ls())
set.seed(2023311161)
library(Rcpp)
## Warning: package 'Rcpp' was built under R version 4.3.3
library(RcppArmadillo)
library(RcppDist)
## Warning: package 'RcppDist' was built under R version 4.3.3
library(mvtnorm)
## Warning: package 'mvtnorm' was built under R version 4.3.3
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.3.3
mu1 <- c(-4, -6)
mu2 <- c(0, 0)
mu3 <- c(6, 8)
mu <- list(mu1 = mu1, mu2 = mu2, mu3 = mu3)


Sigma1 <- matrix(c(1, 0.9, 0.9, 1), nrow = 2)
Sigma2 <- diag(2)
Sigma3 <- matrix(c(1, -0.9, -0.9, 1), nrow = 2)
Sigma <- list(Sigma1 = Sigma1, Sigma2 = Sigma2, Sigma3 = Sigma3)


Sigma1_inv <- solve(Sigma1)
Sigma2_inv <- solve(Sigma2)
Sigma3_inv <- solve(Sigma3)

x <- matrix(c(-1,-1), nrow = 1)
sourceCpp("MC_HW6_kohanjun.cpp")

In the Rcpp file, the functions dmvnorm_rcpp and rmvnorm_rcpp return the pdf value of a given multivariate normal distribution and generate multivariate normal random numbers, respectively.

// [[Rcpp::export]]
vec dmvnorm_rcpp(const mat& x, const vec& mean, const mat& sigma, bool logd = false) {

  return dmvnorm(x, mean, sigma, logd);
}

// [[Rcpp::export]]
mat rmvnorm_rcpp(int n, const vec& mean, const mat& sigma) {
  return rmvnorm(n, mean, sigma);
}

We also reuse the log-density function log_target, which applies the log-sum-exp trick as follows. This function returns the same value as the direct density calculation.

log_target <- function(x) {

  K <- 3
  log_comp_densities <- numeric(K)
  ws <- rep(1/K,K)
  
  for (k in 1:K) {
    log_comp_densities[k] <- dmvnorm_rcpp(x,mean = mu[[k]], sigma = Sigma[[k]], logd = TRUE)
  }
  
  log_weights <- log(ws)
  
  # Log-sum-exp trick
  alpha <- max(log_weights + log_comp_densities)
  log_density <- alpha + log(sum(exp(log_weights + log_comp_densities - alpha)))
  
  return(log_density)
}
exp(log_target(x))
## [1] 0.01951661
as.numeric((1/3)*
             (dmvnorm_rcpp(x, mean=mu1, sigma=Sigma1)+
                dmvnorm_rcpp(x, mean=mu2, sigma=Sigma2) + 
                dmvnorm_rcpp(x, mean=mu3, sigma=Sigma3)))
## [1] 0.01951661

(2) subregions

The purpose of this section is to partition the sample space \(\mathcal X\) according to the value of the density. Comparing the density values at the 3 modes, the maximum of \(f(x)\) is about \(0.1217088\). After rounding this up at the 4th decimal place, we set it as the maximum of the density f_max. The reason for rounding up is to allow for the possibility that the computed \(0.1217088\) is slightly below the true maximum. I set the number of subregions to \(M = 20\) and made the equally spaced boundaries f_interval from 0 to f_max.

exp(log_target(matrix(c(-4,-6), nrow = 1 ) ) )
## [1] 0.1217088
exp(log_target(matrix(c(0,0), nrow = 1 ) ) )
## [1] 0.05305165
exp(log_target(matrix(c(6,8), nrow = 1 ) ) )
## [1] 0.1217088
f_max <- ceiling(exp(log_target(matrix(c(6,8), nrow = 1 ) ) )  * 10000) / 10000

M <- 20
f_interval <- seq(0, f_max, length.out = M+1)

Let \(f^{(m)}\) denote the \(m\)th boundary (\(m = 1,...,M+1\)), so that f_max equals \(f^{(M+1)}\) and \(f^{(1)}=0\). The following assign_subregion function determines which subregion the input \(x\) lies in. In this function, the term f >= f_interval[-(M+1)] checks whether \(f^{(1)},f^{(2)},...,f^{(M)} \le f\), and f < f_interval[-1] checks whether \(f < f^{(2)},f^{(3)},...,f^{(M+1)}\). There is exactly one \(n\) such that \(f^{(n)} \le f < f^{(n+1)}\), so assign_subregion concludes that \(f\) lies in the \(n\)th subregion (\(n = 1,...,M\)). The higher the density value \(f\), the higher the subregion index.

assign_subregion <- function(x){
  f <- exp(log_target(x))
  which(f >= f_interval[-(M+1)] & f < f_interval[-1])
}
assign_subregion(matrix(c(-4,-6), nrow=1))
## [1] 20
assign_subregion(matrix(c(0,0), nrow=1))
## [1] 9

(3) other parameters

The \(\pi\) vector is set to have high values for low-density (high-energy) subregions and low values for high-density (low-energy) subregions. If the desired sampling distribution were uniform (\(1/M\)), the number of samples from the \(2\)nd mode (whose joint pdf values are lower than those of the other modes) would be smaller than the number of samples from the other modes.

The initial weight vector \(\theta\) is set to zero vector, and the gain factor \(\gamma_t\) is defined as follows: \[\gamma_t = \frac{1}{(\max\{1, t/100\})^{0.51}}\] which is consistent with the code gammas <- function(t) {(min(1, 100/t))^(0.51)}. The notation \(t\) stands for the \(t\)th iteration.

The maximum and minimum allowed for the elements of \(\theta\) are 30 and -30, and the total number of iterations is \(100000\).

pis <- (M:1)/sum(M:1)
thetas <- rep(0, M)
gammas <- function(t) {(min(1, 100/t))^(0.51)}

theta_min <- -30                
theta_max <- 30 
n_iter <- 100000

(4) SAMC algorithm

The samc function implements the main SAMC algorithm. Jxt denotes the subregion of the current sample xt. Letting the proposal density be \(q(\cdot | x) = \mathcal N_2(y; x, 10\cdot I_2)\), generate \(y\) from the proposal. The variance is set to \(10\) because a relatively large variance prevents the generated \(y\) from getting stuck near a single mode. Using y, obtain its subregion Jy. The acceptance ratio of the sampling is \[R_{SAMC} = \exp\Bigl\{ \theta_t^{(J(x_t)) } - \theta_t^{(J(y))} \Bigr\}\cdot\frac{f(y)\cdot q(x_t|y)}{f(x_t)\cdot q(y|x_t)}.\] The definition of the ratio has the unnormalized density \(\psi\) in both numerator and denominator instead of \(f\), but we may also use \(f\) because the normalizing constants cancel. Since the proposal \(q\) is a symmetric Gaussian, \(q(x_t|y)\) and \(q(y|x_t)\) cancel as well. Then the logarithm of the ratio becomes \[\log(R_{SAMC}) = \theta_t^{(J(x_t)) } - \theta_t^{(J(y))} + \log f(y) - \log f(x_t),\] which is consistent with log_acc_ratio. With probability \(\min\{1, R_{SAMC}\}\), accept \(y\) and set \(x_{t+1} = y\).

For the weight update, we need standard basis vectors. The vector \(e_{x_{t+1}}\) has a single \(1\) at the index equal to the subregion of \(x_{t+1}\). The update \(\theta_{t+1/2} := \theta_t + \gamma_{t+1}(e_{x_{t+1}}-\pi)\) corresponds to thetas_half <- thetas + gammas(t) * (e_xt - pis). The question gives no specific condition on the range of the weight space \(\Theta\), so I manually set it to \([-30,30]^M\). The code pmax(thetas_half, theta_min) returns the elementwise maximum of thetas_half and theta_min, while pmin(pmax(thetas_half, theta_min), theta_max) returns the elementwise minimum of that result and theta_max. This ensures that all elements of thetas_half lie between \(-30\) and \(30\).

samc <- function(x){
  x_samples <- matrix(NA, nrow=n_iter, ncol=2, 
                      dimnames = list(NULL, c('x1', 'x2')) )
  thetas_samples <- matrix(NA, nrow=n_iter, ncol = length(thetas))
  xt <- x

  
  for(t in 1:n_iter){
    # step 1 : sampling
    Jxt <- assign_subregion(xt)
    y <- rmvnorm_rcpp(1, mean = xt, sigma = 10*diag(2) )
    Jy <- assign_subregion(y)
    log_acc_ratio <- (thetas[Jxt] - thetas[Jy]) + 
      log_target(y) - log_target(xt)
    if(log(runif(1)) < log_acc_ratio) {xt <- y}
    x_samples[t,] <- xt
    
    # step 2 : weight update
    e_xt <- rep(0, M)
    e_xt[assign_subregion(xt)] <- 1
    thetas_half <- thetas + gammas(t) * (e_xt - pis)
    thetas <- pmin(pmax(thetas_half, theta_min), theta_max)
    thetas_samples[t,] <- thetas
  }
  
  result <- list(x_samples = x_samples, thetas_samples = thetas_samples)
  return(result)
  
}

(5) results

The generated samples of \(x=(x_1,x_2)\) are stored in x_samples_samc. The burn-in period is \(50000\).

burnin <- 50000
result <- samc(x)
x_samples_samc <- result$x_samples

Before showing the result, we’ll define marginal distributions of \(f\). The marginal distributions are the mixtures of the marginals of each component. So, \[\begin{aligned} f_1(x_1) &= \frac{1}{3}\mathcal N(x_1;-4,1) + \frac{1}{3}\mathcal N(x_1;0,1) + \frac{1}{3}\mathcal N(x_1;6,1) \\ f_2(x_2) &= \frac{1}{3}\mathcal N(x_2;-6,1) + \frac{1}{3}\mathcal N(x_2;0,1) + \frac{1}{3}\mathcal N(x_2;8,1) \end{aligned}\]

f1 <- function(x){
  (1/3)*(dnorm(x,-4,1) + dnorm(x,0,1) + dnorm(x,6,1))
}

f2 <- function(x){
  (1/3)*(dnorm(x,-6,1) + dnorm(x,0,1) + dnorm(x,8,1))
}

The histograms of the generated samples after burn-in closely match the respective marginal pdfs.

par(mfrow = c(2,1) )
hist(x_samples_samc[burnin:n_iter,1], freq = FALSE, breaks = 50, ylim = c(0, 0.15),
     xlab = 'x1', main = bquote("Histogram of " ~ x[1] ~ "using SAMC"), cex.main = 2 )
curve(f1,col='red', add=TRUE)
hist(x_samples_samc[burnin:n_iter,2], freq = FALSE, breaks = 50, ylim = c(0, 0.15),
     xlab = 'x2', main = bquote("Histogram of " ~ x[2] ~ "using SAMC"), cex.main = 2)
curve(f2,col='red', add=TRUE)

The trace plots show that the samples cover the 3 modes.

par(mfrow = c(2,1) )
plot.ts(x_samples_samc[burnin:n_iter,1], ylab = 'x1',
        main = "Trace Plot of x1", cex.main = 2)
plot.ts(x_samples_samc[burnin:n_iter,2], ylab = 'x2',
        main = "Trace Plot of x2", cex.main = 2)

The scatter plot overlaid with the contour plot clearly shows that the samples cover the 3 modes.

grid <- expand.grid(x = seq(-10,10, by = 0.1), y = seq(-10,15, by = 0.1))
gm_density <- function(x, y) {
  density <- 0
  for (i in 1:3) {
    density <- density + (1/3)*dmvnorm_rcpp(matrix(c(x, y),nrow=1), 
                                            mean = mu[[i]], 
                                            sigma = Sigma[[i]])
  }
  return(density)
}
grid$density <- mapply(gm_density, grid$x, grid$y)

ggplot() +
  geom_point(data = as.data.frame(x_samples_samc[burnin:n_iter,]),
             aes(x = x1, y = x2), color = 'gray', size = 0.3, alpha = 0.3) +
  geom_contour(data = grid, aes(x = x, y=y, z=density, color = after_stat(level)) ) +
  scale_color_gradient(low = 'blue', high = 'red') +
  labs(title = "Contour Plot of f(x) with Scatter Plot using SAMC", x = 'x1', y='x2') +
  theme_minimal()

The following trace plot shows that the weight elements change rapidly in the early iterations and then converge as the gain factor decays.

par(mfrow = c(1,1))
matplot(result$thetas_samples, 
        type = "l", lty = 1, col = rainbow(ncol(result$thetas_samples)),
        xlab = "Iteration", ylab = "Value", 
        main = bquote("Trace Plots of all " ~ theta[t] ~"'s"), cex.main = 2.5)
legend("bottomright", legend = paste("Theta", 1:ncol(result$thetas_samples)), 
       col = rainbow(ncol(result$thetas_samples)), lty = 1, cex = 1.5)

2. DMH (Spatial Autologistic model)

Let \(\boldsymbol x = \{x_i : i\in\mathcal D\}\) denote the observed binary data, where \(x_i\) is called a spin, \(\mathcal D\) is the set of indices of the spins, and \(\mathcal N(i)\) denotes the set of neighbors of spin \(i\). \[\begin{aligned} f(x|\alpha,\beta) &= \frac{1}{\kappa(\alpha,\beta)}\exp\biggl\{ \alpha\sum_{i\in\mathcal D}x_i + \frac{\beta}{2}\sum_{i\in\mathcal D}\Bigl(x_i\cdot\sum_{j\in\mathcal N(i)}x_j \Bigr) \biggr\},\quad (\alpha,\beta) \in \Theta \\ \kappa(\alpha,\beta) &= \sum_{\text{all possible }x}\exp\biggl\{ \alpha\sum_{i\in\mathcal D}x_i + \frac{\beta}{2}\sum_{i\in\mathcal D}\Bigl(x_i\cdot\sum_{j\in\mathcal N(i)}x_j \Bigr) \biggr\} \end{aligned}\] To conduct a Bayesian analysis for the model, a uniform prior is placed on \[(\alpha,\beta) \in \Theta = [-1,1]\times[0,1].\] To generate the auxiliary variable \(y\), we update each grid point using the Gibbs sampler, \[\begin{aligned}&\mathbb P(y_i=1 | \alpha,\beta,y_{-i}) \\&= \frac{\exp\Bigl\{\alpha(1 + S_{1,-i}(y)) + \beta\Bigl( \frac{1}{2}\sum_{j\in\mathcal N(i)}y_j + S_{2,-i}(y)\Bigr) \Bigr\}}{\exp\Bigl\{\alpha(1 + S_{1,-i}(y)) + \beta\Bigl( \frac{1}{2}\sum_{j\in\mathcal N(i)}y_j + S_{2,-i}(y)\Bigr) \Bigr\} + \exp\Bigl\{\alpha(-1 + S_{1,-i}(y)) + \beta\Bigl( -\frac{1}{2}\sum_{j\in\mathcal N(i)}y_j + S_{2,-i}(y)\Bigr) \Bigr\}} \end{aligned}\]

Based on the given inner sampler, generate code for a DMH algorithm to estimate the parameters \((\alpha, \beta)\) of a spatial autologistic model:

  • Use a Gaussian random walk proposal distribution with covariance matrix \(0.03^2 I_2\).
  • Conduct at least 10 sweeps for the inner sampler.
  • Number of iterations: 10,000; burn-in: 500.

(1) basic setup

The terms \(S_{1,-i}, S_{2,-i}\) have the following meaning for given data \(x\).

\[\begin{aligned} S_{1,-i}(x) &= \sum_{k\in\mathcal D\backslash\{i\}}x_k, \qquad S_{2,-i}(x) = \frac{1}{2}\sum_{k\in\mathcal D\backslash\{i\}}x_k\Bigl(\sum_{j\in\mathcal N(k)}x_j \Bigr). \end{aligned}\] Similarly, let us define following notations for simplicity. \[\begin{aligned} S_1(x) := \sum_{k\in\mathcal D}x_k, \qquad S_2(x) := \frac{1}{2}\sum_{k\in\mathcal D}\Bigl(x_k\sum_{l\in\mathcal N(k)}x_l\Bigr). \end{aligned}\]

The raw data UScancer.map is converted so that all values are -1, 0, and 1.

rm(list=ls())
set.seed(2023311161)
library(Rcpp)
library(RcppArmadillo)
library(RcppDist)
library(ggplot2)
library(gridExtra)
## Warning: package 'gridExtra' was built under R version 4.3.2
rawfile = file("./UScancer.map", open = "r")
lines = readLines(rawfile)
close(rawfile, type="rw")
data = unlist(strsplit(lines,split=""))
data = matrix(as.numeric(data), nrow=58,byrow=T)

# convert 0 to -1 & 2 to 0
data[data == 0] = -1
data[data == 2] = 0

unique(as.vector(data)) #  -1  1  0
## [1] -1  1  0

The total number of iterations is \(10500\), and the burn-in is \(500\). The initial values of \(\alpha\) and \(\beta\) are 0 and 1/2 respectively, stored in the theta vector. The number of sweeps (sw) is set to \(10\).

n_iter <- 10500
burnin <- 500
theta <- c(0,1/2)
sw <- 10

(2) Rcpp functions

Until now, \(x_k\) indicated the location of the spin: the \(k\)th spin in \(\mathcal D\). When the data are denoted \(x_{i,j}\), this refers to the coordinates of the spin: \(i\)th row and \(j\)th column. The following neighbor function returns the sum of all neighbors of \(x_{i,j}\), where \(x\) is the data. In general there are 4 neighbors (the last case of the if statement): north, south, west, and east. However, there are exceptions. When \(x_{i,j}\) is on an edge (e.g. the north edge), it has 3 neighbors: west, east, and south. When \(x_{i,j}\) is in a corner (e.g. the southwest corner), it has only 2 neighbors: north and east. So the neighbor function divides all cases into 9 if statements and returns the sum of the vector nn.

double neighbor(mat& data, int i, int j){
  
  int ei,ej,s=0;
  dvec nn(4);
  
  ei = data.n_rows;
  ej = data.n_cols;
  
  if((i==0)&((j>0)&(j<(ej-1)))){
    // north edge
    nn(s) = data(i,j+1);
    nn(s+1) = data(i,j-1);
    nn(s+2) = data(i+1,j);
  }else if((i==(ei-1))&((j>0)&(j<(ej-1)))){
    // south edge
    nn(s) = data(i,j+1);
    nn(s+1) = data(i,j-1);
    nn(s+2) = data(i-1,j);
  }else if(((i>0)&(i<(ei-1)))&(j==0)){
    // west edge
    nn(s) = data(i,j+1);
    nn(s+1) = data(i+1,j);
    nn(s+2) = data(i-1,j);
  }else if(((i>0)&(i<(ei-1)))&(j==(ej-1))){
    // east edge
    nn(s) = data(i,j-1);
    nn(s+1) = data(i+1,j);
    nn(s+2) = data(i-1,j);
  }else if((i==0)&(j==0)){
    // northwest corner
    nn(s) = data(i+1,j);
    nn(s+1) = data(i,j+1);
  }else if((i==(ei-1))&(j==(ej-1))){
    // southeast corner
    nn(s) = data(i,j-1);
    nn(s+1) = data(i-1,j);
  }else if((i==(ei-1))&(j==0)){
    // southwest corner
    nn(s) = data(i,j+1);
    nn(s+1) = data(i-1,j);
  }else if((i==0)&(j==(ej-1))){
    // northeast corner
    nn(s) = data(i,j-1);
    nn(s+1) = data(i+1,j);
  }else{ // general case
    nn(s) = data(i-1,j);
    nn(s+1) = data(i+1,j);
    nn(s+2) = data(i,j-1);
    nn(s+3) = data(i,j+1);
  }
  
  return(accu(nn));
}

The following s1_a function returns the total sum of the input data \(x\), which corresponds to \(S_1(x)\). The s2_a function returns the value \(S_2(x)\).

double s1_a(mat& data){
  return accu(data);
}

double s2_a(mat& data){
  
  int i,j,r=0,c=0;
  double res=0.0;
  r=data.n_rows;
  c=data.n_cols;
  
  for(i=0;i<r;i++){
    for(j=0;j<c;j++){
      res += data(i,j)*neighbor(data,i,j); 
    }
  }
  
  return(res/2);
}

The generate_Y function generates the auxiliary variable \(y\). When generating \(y\), we only consider the values \(-1\) and \(1\). The input n stands for the number of sweeps. s1_c and s2_c store \(S_1(y),S_2(y)\) for the current \(y\), while s1_f and s2_f store \(S_1(y),S_2(y)\) after a flip. For example, when the \(k\)th spin of the current \(y\) is \(-1\) (\(y_k = -1\)), s1_c and s2_c give \(S_1\) and \(S_2\) with \(y_k=-1\), while s1_f and s2_f give \(S_1\) and \(S_2\) with \(y_k = 1\).

Let’s consider the case \(-1 \to +1\) first. For simplicity, denote by \((y|y_k=w)\) the data \(y\) whose \(k\)th spin has value \(w\). The conditional probability that \(y_k = 1\) can be rewritten as \[\begin{aligned} \mathbb P(y_k=1|\alpha,\beta,y_{-k}) = \frac{ \exp\Bigl\{\alpha S_1(y|y_k=1) + \beta S_2(y|y_k=1) \Bigr\} }{ \exp\Bigl\{\alpha S_1(y|y_k=1) + \beta S_2(y|y_k=1) \Bigr\} + \exp\Bigl\{\alpha S_1(y|y_k=-1) + \beta S_2(y|y_k=-1) \Bigr\} }. \end{aligned}\] Since \(S_1(y|y_k=1) - S_1(y|y_k=-1) = 2\) and \[\begin{aligned}S_2(y|y_k=1) - S_2(y|y_k=-1) &= \frac{1}{2}\Bigl(1\cdot\sum_{l\in\mathcal N(k)}y_l\Bigr) - \frac{1}{2}\Bigl(-1\cdot\sum_{l\in\mathcal N(k)}y_l\Bigr) \\ &=\sum_{l\in\mathcal N(k)}y_l,\end{aligned}\] which correspond to s1_f = S1 + 2; and s2_f = S2 + neighbor(Y,i,j);, the denominator of the probability is as follows: \[\begin{aligned} &\exp\Bigl\{\alpha S_1(y|y_k=1) + \beta S_2(y|y_k=1) \Bigr\} + \exp\Bigl\{\alpha S_1(y|y_k=-1) + \beta S_2(y|y_k=-1) \Bigr\} \\ &= \exp\Bigl\{\alpha S_1(y|y_k=-1) + \beta S_2(y|y_k=-1) \Bigr\}\Bigl( 1 + \exp\Bigl\{ \alpha\cdot2 + \beta\cdot\sum_{l\in\mathcal N(k)}y_l\Bigr\}\Bigr). \end{aligned}\] Then, the logarithms of the numerator and denominator of the probability are \[\begin{aligned} \log(\text{numerator}) &= \alpha S_1(y|y_k=1) + \beta S_2(y|y_k=1) \\ \log(\text{denominator}) &= \log\Bigl( 1 + \exp\Bigl\{ \alpha\cdot2 + \beta\cdot\sum_{l\in\mathcal N(k)}y_l\Bigr\}\Bigr) + \Bigl( \alpha S_1(y|y_k=-1) + \beta S_2(y|y_k=-1) \Bigr) \end{aligned}\] These values correspond to num and den respectively. This logic applies identically to the case of flipping to \(-1\).

Then, flip the current value \(y_{i,j}\) with probability \(\exp\)(ratio). If the flip is accepted, store the flipped-state values s1_f and s2_f into S1 and S2 respectively; then move to the next grid point, and repeat the full sweep n times.

// [[Rcpp::export]]
mat generate_Y(mat data, double alpha, double beta, int n=1){
  int m,i,j,r=0,c=0;
  double u,num,den,ratio=0.0;
  double s1_c = 0.0, s2_c = 0.0; 
  double s1_f = 0.0, s2_f = 0.0; 
  double S1 = s1_a(data);
  double S2 = s2_a(data);
  
  r = data.n_rows;
  c = data.n_cols;
  mat Y(r,c,fill::zeros);
  Y = data;
  
  // gibbs sampling for auxiliary variable
  for(m=0; m<n; m++){
    for(i=0;i<r;i++){
      for(j=0;j<c;j++){
        if(Y(i,j) == -1){ // to +1
          s1_c = S1;
          s2_c = S2;
          s1_f = S1 + 2;
          s2_f = S2 + neighbor(Y,i,j);
          num = alpha*s1_f + beta*s2_f;
          den = (alpha*s1_c + beta*s2_c) + 
            log(1+exp(alpha*(s1_f-s1_c)+beta*(s2_f-s2_c)));
          ratio = num - den;
          
        }else if(Y(i,j) == 1){ // to -1
          s1_c = S1;
          s2_c = S2;
          s1_f = S1 - 2;
          s2_f = S2 - neighbor(Y,i,j);
          num = alpha*s1_f + beta*s2_f;
          den = (alpha*s1_c + beta*s2_c) + 
            log(1+exp(alpha*(s1_f-s1_c)+beta*(s2_f-s2_c)));
          ratio = num - den;
          
        }else{
          continue; // Y(i,j) == 0 (missing spin): leave it fixed,
                    // and do not reuse a stale ratio/s1_f/s2_f
        }
        
        u = R::runif(0,1);
        if(log(u) < ratio){
          Y(i,j) = -Y(i,j); // flip accepted: update the running sums
          S1 = s1_f;
          S2 = s2_f;
        } // otherwise keep the current spin
      } 
    }
  }
  return Y;
} 

The unnormal_logf function returns the value \(\alpha S_1(x) + \beta S_2(x)\), the logarithm of the unnormalized density \(\exp\Bigl\{\alpha S_1(x) + \beta S_2(x) \Bigr\}\) for input data \(x\). This will be used in the acceptance ratio of the DMH algorithm.

// [[Rcpp::export]]
double unnormal_logf(mat& data, double alpha, double beta){
  double term1 = alpha * s1_a(data);
  double term2 = beta * s2_a(data); // s2_a already includes the 1/2 factor
  return term1 + term2;
}

All these Rcpp functions are in MC_HW6_kohanjun.cpp file.

sourceCpp("MC_HW6_kohanjun.cpp")

(3) Double MH algorithm

The priors of \(\alpha\) and \(\beta\) are both uniform. \[\begin{aligned} \alpha \sim \text{Unif}(-1,1), \qquad \beta \sim \text{Unif}(0,1). \end{aligned}\] Since the acceptance ratio involves the prior of \((\alpha,\beta)\), we need to check whether \((\alpha,\beta)\) lies in the region \(\Theta\). The following ab_in_range function returns TRUE if both \(\alpha\) and \(\beta\) lie in \(\Theta\), and FALSE otherwise.

ab_in_range <- function(a,b){
  if(dunif(a,-1,1) == 1/2 & dunif(b,0,1) == 1) return(TRUE)
  else return(FALSE)
}
ab_in_range(0,1/2)
## [1] TRUE

The following DMH function is the main function of the Double Metropolis-Hastings algorithm. We assume a Gaussian random walk for \(\theta = (\alpha,\beta)\); that is, the proposal density \(q\) is \[q(\cdot | \theta) = \mathcal N_2(\theta, 0.03^2 I_2).\] Generate theta_temp using the current \(\theta\), and then generate the auxiliary data Y using generate_Y and the proposed values \(\alpha^*,\beta^*\) (alpha_temp, beta_temp). The acceptance ratio of DMH is \[R_{DMH} = \min\biggl\{1, \frac{p(\theta^*)h(x|\theta^*)h(y|\theta_t)q(\theta_t|\theta^*)}{p(\theta_t)h(x|\theta_t)h(y|\theta^*)q(\theta^*|\theta_t)} \biggr\},\] where \(p\) is the prior of \(\theta\), \(h\) is the unnormalized \(f\), \(\theta^* = (\alpha^*,\beta^*)\), and \(\theta_t\) is the current value of \(\theta\) at the \(t\)th iteration. The \(q\) terms cancel because the Gaussian random walk is symmetric.

Assuming \(\theta_t \in \Theta\), when \(\theta^* \in \Theta\) as well, the \(p\) terms cancel too. Then the logarithm of the acceptance ratio becomes \[\log R_{DMH} = \log h(x|\theta^*) + \log h(y|\theta_t) - \log h(x|\theta_t) - \log h(y|\theta^*),\] which corresponds to log_acc_ratio. When \(\theta^*\) falls outside the region \(\Theta\), \(p(\theta^*)\) becomes 0 and so does the acceptance ratio; for this case, log_acc_ratio is set to -Inf. Accept \((\alpha^*,\beta^*)\) with probability \(\exp\)(log_acc_ratio), and repeat this for n_iter iterations.

DMH <- function(data){
  theta_samples <- matrix(NA, nrow = n_iter, ncol = 2,
                          dimnames=list(NULL, c('alpha','beta')))
  
  
  for(iter in 1:n_iter){
    alpha <- theta[1];beta <- theta[2]
    theta_temp <- rmvnorm_rcpp(1, mean = theta, sigma = 0.03^2*diag(2))
    alpha_temp <- theta_temp[1,1];beta_temp <- theta_temp[1,2]
    Y <- generate_Y(data, alpha_temp, beta_temp, n=sw)
    
    if(ab_in_range(alpha_temp, beta_temp)) {
      log_acc_ratio <- unnormal_logf(data,alpha_temp,beta_temp) + 
        unnormal_logf(Y,alpha,beta) -
        unnormal_logf(data,alpha,beta) - 
        unnormal_logf(Y,alpha_temp,beta_temp)
    } else {
      log_acc_ratio <- -Inf
    }
    
    if(log(runif(1)) < log_acc_ratio){
      theta <- theta_temp
    }
    theta_samples[iter,] <- theta
  }
  
  return(theta_samples)
}

(4) results

The sampled \(\alpha,\beta\) values are stored in theta_samples. Their medians and standard deviations are in ma, mb and sda, sdb respectively, which are used in the normal pdf functions fa and fb.

theta_samples <- DMH(data)

ma <- median(theta_samples[burnin:n_iter,1])
mb <- median(theta_samples[burnin:n_iter,2])
ma;mb
## [1] -0.3021663
## [1] 0.2456627
sda <- sd(theta_samples[burnin:n_iter,1])
sdb <- sd(theta_samples[burnin:n_iter,2])
sda;sdb
## [1] 0.03224568
## [1] 0.03559585
fa <- function(x){
  dnorm(x,ma,sda)
}
fb <- function(x){
  dnorm(x,mb,sdb)
}

The histograms show that the \(\alpha\) and \(\beta\) samples resemble normal distributions with means ma, mb and standard deviations sda, sdb respectively.

h1 <- ggplot()+
  geom_histogram(data = as.data.frame(theta_samples[burnin:n_iter,]),
                 aes(x = alpha, y = after_stat(density)),
                 bins = 30, fill = 'gray', color = 'black') +
  stat_function(fun = fa, color = 'red') +
  labs(title = bquote("Histogram of "~ alpha), x = expression(alpha)) +
  theme(plot.title = element_text(size = 20)) +
  theme_minimal()

h2 <- ggplot()+
  geom_histogram(data = as.data.frame(theta_samples[burnin:n_iter,]),
                 aes(x = beta, y = after_stat(density)),
                 bins = 30, fill = 'gray', color = 'black') +
  stat_function(fun = fb, color = 'red') +
  labs(title = bquote("Histogram of "~ beta), x = expression(beta)) +
  theme(plot.title = element_text(size = 20)) +
  theme_minimal()

grid.arrange(h1,h2, ncol = 1)

The trace plots show that the sampled \(\alpha\) and \(\beta\) oscillate around ma and mb respectively.

par(mfrow = c(2,1))
plot.ts(theta_samples[burnin:n_iter,1], ylab = expression(alpha),
        main = bquote("Trace Plot of "~ alpha), cex.main = 2)
plot.ts(theta_samples[burnin:n_iter,2],ylab = expression(beta),
        main = bquote("Trace Plot of " ~ beta), cex.main = 2)