Q1. A deck of K cards numbered 1, 2, . . . ,K is shuffled and then turned over one card at a time. Say that a “hit” occurs whenever card k is the kth card to be turned over, k = 1, . . . ,K.

i) Show that the expected number of hits is 1 for any K.

Let \(1_k = 1\) if a hit occurs at position \(k\) (i.e., card \(k\) is the \(k\)th card turned over) and \(1_k = 0\) otherwise. Then,

\[E[\sum_{k=1}^K 1_k] = \sum_{k=1}^K E[1_k] = \sum_{k=1}^K P[1_k = 1]\]

\(1_k = 1\) means that card \(k\) occupies the \(k\)th position.

The number of orderings that fix card \(k\) at position \(k\) is \((K-1)!\), since the remaining \(K-1\) cards may appear in any order.

So, \(P[1_k = 1] = \frac{(K-1)!}{K!} = \frac{1}{K}\).

Thus,

\[E[\sum_{k=1}^K 1_k] = \sum_{k=1}^K \frac{1}{K} = 1\]

ii) For K = 10, 20, 30, estimate the expected value and the variance of the number of hits using the Monte Carlo simulation by iterating the procedure sufficiently many times.

A function that generates a random permutation is needed as a prerequisite for the main function. It is named ‘random_permu’ and receives a numeric vector as input. The function consists of one if-else statement; the if-branch handles the trivial case where the vector contains only one number.

The main part is in the else branch. First, save the length of the input vector ‘permu’ to the variable ‘k’, and save the last (kth) value of ‘permu’ to ‘temp’. Generate a random number ‘u’ from Unif(0, 1) and turn it into a random integer ‘i’ between 1 and k using the ‘floor’ function: since \(0 \le u < 1\), we have \(0 \le ku < k\), so \(\lfloor ku \rfloor\) is an integer between 0 and k-1 and \(\lfloor ku \rfloor + 1\) is an integer between 1 and k. Then swap the ith and kth entries, decrease k by 1, and repeat while \(k \ge 2\). This is the backward pass of the Fisher–Yates shuffle.

The following code shows how the ‘random_permu’ function works.

# define random permutation of a vector
random_permu <- function(permu=1){
  if(length(permu) == 1){
    return(permu)
  }
  else{
    k <- length(permu)
    
    while(k >= 2){
      temp <- permu[k]
      
      u <- runif(1)
      i <- floor(k*u)+1 # returns a random integer btw 1 and k
      permu[k] <- permu[i]
      permu[i] <- temp
      
      k <- k-1
    }
    
    return(permu)
    
  }
}

# for example,
random_permu(1:10)
##  [1]  9  5  7  8  4  2  1  6 10  3
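
To see that this shuffling scheme treats all positions symmetrically, the following sketch (added here, with the same backward-swap logic inlined so the block is self-contained, and an assumed seed for reproducibility) tallies where card 1 lands over many shuffles of 1:5; each position should appear about 1/5 of the time:

```r
set.seed(1)                        # seed assumed, not in the original
# tally the final position of card 1 over 10000 shuffles of 1:5
pos_of_1 <- replicate(10000, {
  p <- 1:5
  for (k in 5:2) {                 # same backward pass as in random_permu
    i <- floor(k * runif(1)) + 1
    tmp <- p[k]; p[k] <- p[i]; p[i] <- tmp
  }
  which(p == 1)
})
table(pos_of_1) / 10000            # each position occurs with probability ~ 1/5
```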

After building the random permutation function, the main function that returns the estimated expectation and variance of the number of hits can be implemented; it is named ‘num_hits’. It takes two inputs: ‘K’, the number of cards, and ‘iter’, the number of iterations.

Before getting into the code, how the variance of the number of hits is estimated should be explained. The expectation can be estimated via the Law of Large Numbers (Monte Carlo integration): \(\frac{1}{n} \sum_{i=1}^{n} g(U_i) \to E[g(U)]\), where \(U_i \sim Unif(0,1)\) and \(g(U_i)\) is the corresponding random variable. By the same argument, the mean of the squares, \(\frac{1}{n} \sum_{i=1}^{n} g(U_i)^2\), converges to \(E[g(U)^2]\). Therefore the following expression converges to \(E[g(U)^2] - \{E[g(U)]\}^2 = Var[g(U)]\):

\[\frac{1}{n} \sum_{i=1}^{n} g(U_i)^2 - \{\frac{1}{n} \sum_{i=1}^{n} g(U_i)\}^2\]

In this problem, \(g(U)\) corresponds to the number of hits for each iteration.
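
The moment-based variance estimator above can be cross-checked against R's built-in var() on any sample; the two agree up to the \(n/(n-1)\) bias factor. A small illustration (added here, with an arbitrary normal sample and an assumed seed standing in for the \(g(U_i)\)):

```r
set.seed(42)                            # seed assumed for reproducibility
gu <- rnorm(1000, mean = 2, sd = 3)     # any sample stands in for the g(U_i)
n  <- length(gu)
v_mom <- mean(gu^2) - mean(gu)^2        # the moment-based estimator used in num_hits
v_mom
var(gu) * (n - 1) / n                   # identical: var() only differs by a 1/(n-1) factor
```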

The for-loop in the ‘num_hits’ function follows this idea. On each iteration, count how many hits occur using a logical comparison, ‘sum’, and the ‘random_permu’ function, and save the count to the ‘temp’ variable. Add ‘temp’ to the ‘exp_sum’ variable and the square of ‘temp’ to ‘var_sum’. After the loop finishes, ‘exp_sum’ divided by the iteration number ‘iter’ gives the estimated expectation, and ‘var_sum’ divided by ‘iter’ minus the square of the estimated expectation gives the estimated variance, exactly as in the expression above.

After then, insert the estimated expectation ‘E’ and variance ‘V’ together in the ‘result’ vector, and designate the name of each value as ‘E’ and ‘V’ respectively. Finally, return the ‘result’ vector.

The following code shows how the ‘num_hits’ function works; 1000 iterations will be enough to estimate the values.

num_hits <- function(K, iter){
  # K : the number of cards
  # iter : the number of iterations this function will carry out
  exp_sum <- 0
  var_sum <- 0
  
  for(i in 1:iter){
    temp <- sum(random_permu(1:K) == 1:K) # how many hits occur
    exp_sum <- exp_sum + temp
    var_sum <- var_sum + temp^2
  }
  
  E <- exp_sum/iter
  V <- var_sum/iter - E^2 # using the LLN
  
  result <- c(E, V)
  names(result) <- c('E', 'V')
  
  return(result)
}

The following code shows the estimated expectation and variance for each number of cards, K = 10, 20, and 30, again with 1000 iterations. For each K, the variance \((K-1)/K\) of the Binomial(K, 1/K) approximation used in part iii) is printed for comparison. (Strictly, since the indicators are dependent, the exact variance of the number of hits is 1 for every \(K \ge 2\), and the simulated values are consistent with both.) The iteration number can be raised if 1000 is regarded as insufficient.

num_hits(10,1000)
##        E        V 
## 0.982000 1.001676
K <- 10; (K-1)/K
## [1] 0.9

num_hits(20,1000)
##        E        V 
## 0.988000 1.075856
K <- 20; (K-1)/K
## [1] 0.95

num_hits(30,1000)
##        E        V 
## 1.068000 1.079376
K <- 30; (K-1)/K
## [1] 0.9666667

iii) It is known that the distribution of the number of hits approaches a Poisson distribution as \(K \to \infty\). For K = 10, 20, 30, estimate the probability that the number of hits is zero using the Monte Carlo simulation by iterating the procedure sufficiently many times. Discuss your results.

Each indicator \(1_k\) is Bernoulli with success probability \(p = 1/K\), \(k = 1, 2, \dots, K\). Treating the K indicators as if they were independent, the number of hits \(\sum 1_k\) approximately follows the Binomial distribution \(B(K, \frac{1}{K})\); strictly speaking the indicators are dependent, so this Binomial model is itself an approximation to the exact matching distribution. Let \(X = \sum 1_k\) be the number of hits; the problem then asks for \(P(X = 0)\). So a function generating random numbers from the Binomial distribution is defined first.

The ‘random_binomial’ function takes 3 inputs: capital ‘\(N\)’ is the number of Binomial random numbers to generate, lowercase ‘\(n\)’ is the number of Bernoulli trials, and ‘\(p\)’ is the success probability. For the ith random number, initialize the variables ‘prob’, ‘sum.p’, ‘u’, and ‘r’ to \((1-p)^n\), prob, runif(1), and 0 respectively. While the uniform random number \(u\) is greater than or equal to the cumulative probability ‘sum.p’, update the current probability ‘prob’ to prob*p/(1-p)*(n-r)/(r+1), add ‘prob’ to the cumulative probability, and increase ‘r’ (an integer between 0 and n) by 1. Once ‘u’ is smaller than ‘sum.p’, stop and save ‘r’ as the ith value of the ‘X’ vector; this is the inverse-transform method for a discrete distribution. After repeating this N times, return the X vector.

The following code shows how the ‘random_binomial’ function works.

random_binomial <- function(N, n, p){
  X <- c()
  
  for(i in 1:N){
    prob <- (1-p)^n
    sum.p <- prob
    u <- runif(1)
    r <- 0
    
    while(u >= sum.p){
      prob <- prob*p/(1-p)*(n-r)/(r+1)
      sum.p <- sum.p + prob
      r <- r+1
    }
    X[i] <- r
  }
  
  return(X)
  
}

random_binomial(20, 5, 1/5)
##  [1] 3 1 1 2 0 1 0 1 1 0 0 0 1 1 0 2 1 1 4 0
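
The update prob*p/(1-p)*(n-r)/(r+1) inside the loop is just the ratio of consecutive Binomial probabilities, \(P(X=r+1)/P(X=r)\). This can be verified directly against R's built-in dbinom (a check added here, not part of the original):

```r
n <- 5; p <- 1/5
prob  <- (1 - p)^n                      # P(X = 0)
probs <- prob
for (r in 0:(n - 1)) {                  # build P(X = r+1) from P(X = r)
  prob  <- prob * p / (1 - p) * (n - r) / (r + 1)
  probs <- c(probs, prob)
}
all.equal(probs, dbinom(0:n, n, p))     # TRUE
```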

Making use of this new function, a function that returns the probability of the number of hits equaling a given value (here, 0) can be defined. This function, named ‘prob_hit_binomial’, requires 4 inputs: ‘iter’ is the number of iterations, ‘N’ is the number of Binomial random numbers per iteration, ‘K’ is the number of cards, and ‘hit’ is the target value of \(X = \sum 1_k\), with default 0.

For each iteration, count how many of the ‘random_binomial’ outputs equal the target value and save the count to the ‘temp’ variable. Add it to the ‘count’ variable, whose initial value is 0. Repeat this ‘iter’ times and divide ‘count’ by ‘N*iter’. This is the estimated probability that the number of hits equals the given value (in this problem, 0).

The following code shows how the ‘prob_hit_binomial’ works.

prob_hit_binomial <- function(iter, N, K, hit = 0){
  count <- 0
  
  for(i in 1:iter){
    temp <- sum(random_binomial(N, K, 1/K) == rep(hit, N))
    count <- count + temp
  }
  
  return(count/(N*iter))
    
}


prob_hit_binomial(1000, 10, 10)
## [1] 0.3506
prob_hit_binomial(1000, 10, 20)
## [1] 0.3528
prob_hit_binomial(1000, 10, 30)
## [1] 0.352

exp(-1)
## [1] 0.3678794

As the number of cards ‘K’ increases, the estimated probability that the number of hits is zero approaches \(e^{-1} \approx 0.3678794\).

The probability is close to \(e^{-1}\) because the Binomial distribution approaches the Poisson distribution as \(n\) grows large while \(\lambda = np\) stays fixed and moderate. To be specific, let \(X \sim B(n,p)\). Then,

\[ \begin{aligned} P(X = x) & = \frac{n!}{x!(n-x)!} p^x(1-p)^{n-x} \\  & = \frac{n!}{x!(n-x)!} \Big(\frac{\lambda}{n}\Big)^x\Big(1-\frac{\lambda}{n}\Big)^{n-x}\\   & = \frac{\lambda^x}{x!} \cdot \frac{n(n-1)\cdots(n-x+1)}{n^x}\Big(1-\frac{\lambda}{n}\Big)^{n-x}\\   & \to \frac{\lambda^x}{x!} e^{-\lambda} \quad (n \to \infty), \end{aligned} \]

since \(\frac{n(n-1)\cdots(n-x+1)}{n^x} \to 1\) and \(\big(1-\frac{\lambda}{n}\big)^{n-x} \to e^{-\lambda}\); the limit is the Poisson pmf. In this case, \(\lambda = np = K \cdot \frac{1}{K} = 1\), so the probability that the number of hits is 0 is approximately \(\frac{1^0}{0!} e^{-1} = 0.367879\).
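
The quality of this approximation at the values of K used above can also be checked numerically with R's dbinom (an added illustration):

```r
# exact P(X = 0) under B(K, 1/K), i.e. (1 - 1/K)^K, for the three decks
binom_p0 <- sapply(c(10, 20, 30), function(K) dbinom(0, K, 1/K))
binom_p0
exp(-1)         # the Poisson(1) limit, 0.3678794
```

The three values increase monotonically toward \(e^{-1}\), mirroring the simulated estimates above.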

Q2. We consider the rejection method to draw a sample from the distribution with the density
\[f(x) = \frac{1}{2} (1+x) e^{-x}. \]
An exponential distribution is used to construct an envelope function, i.e.,
\[ g_{\lambda}(x)= \lambda e^{-\lambda x}, x > 0\]

i) Show that
\[c_{\lambda} = \sup_{x > 0} \frac{f(x)}{g_{\lambda} (x)} = \begin{cases} \frac{e^{-\lambda}}{2\lambda(1-\lambda)}, & 0 < \lambda < 1, \\ \infty, & \lambda \ge 1 \end{cases}\]

The ratio \(\frac{f(x)}{g_{\lambda} (x)}\) simplifies as follows.

\[\frac{f(x)}{g_{\lambda} (x)} = \frac{(1 + x)e^{(\lambda - 1)x}}{2\lambda}\]

  1. \(\lambda > 1\)

\(\lim_{x \to \infty} e^{(\lambda - 1)x} = \infty\), so \(\lim_{x \to \infty} \frac{f(x)}{g_{\lambda} (x)} = \infty\). Accordingly, \(c_{\lambda} = \sup_{x > 0}\frac{f(x)}{g_{\lambda} (x)} = \infty\)

  2. \(\lambda = 1\)

When \(\lambda = 1\), \(\frac{f(x)}{g_{\lambda}(x)} = \frac{1+x}{2}\) and \(\lim_{x \to \infty} (1+x) = \infty\), so \(\lim_{x \to \infty} \frac{f(x)}{g_{\lambda} (x)} = \infty\) and \(c_{\lambda} = \sup_{x > 0}\frac{f(x)}{g_{\lambda} (x)} = \infty\)

  3. \(0 < \lambda < 1\)

For given \(\lambda\), set the derivative to zero: \(\begin{aligned} \frac{d}{dx} (1+x)e^{(\lambda - 1)x} & = e^{(\lambda - 1)x} + (1+x)(\lambda - 1)e^{(\lambda - 1)x} \\ & = e^{(\lambda - 1)x}\{1 + (1+x)(\lambda - 1) \} \\ & = 0 \end{aligned}\)

Then, \(1 + (1+x)(\lambda - 1) = 0\).

So, \(x = \frac{\lambda}{1 - \lambda}\).

Since \(\lambda - 1 < 0\), the derivative is positive to the left of this point and negative to the right, so \(\frac{f(x)}{g_{\lambda} (x)}\) attains its maximum \(c_{\lambda}\) at \(x = \frac{\lambda}{1 - \lambda}\). Thus,

\(\begin{aligned} c_{\lambda} &= \frac{(1 + x)e^{(\lambda - 1)x}}{2\lambda}\bigg|_{x = \frac{\lambda}{1 - \lambda}}\\ &= \frac{(1 + \frac{\lambda}{1 - \lambda})e^{(\lambda - 1)\frac{\lambda}{1 - \lambda}}}{2\lambda} \\ &= \frac{(\frac{1}{1-\lambda})e^{-\lambda}}{2\lambda} \\ &= \frac{e^{-\lambda}}{2\lambda(1-\lambda)} \end{aligned}\)
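
This closed form can be double-checked numerically with R's optimize over the ratio \(f/g_\lambda\) for a particular \(\lambda\) (a verification sketch added here; \(\lambda = 0.5\) and the search interval are arbitrary choices):

```r
lam <- 0.5                                    # any fixed lambda in (0, 1)
ratio <- function(x) (1 + x) * exp((lam - 1) * x) / (2 * lam)
opt <- optimize(ratio, interval = c(0, 50), maximum = TRUE)
opt$maximum                                   # should be lam/(1 - lam) = 1
opt$objective                                 # should be exp(-lam)/(2*lam*(1 - lam))
```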

ii) Derive the value of \(\lambda\) that makes the algorithm the most efficient

For every \(\lambda\) in \(0 < \lambda < 1\), \(c_{\lambda} \ge 1\): since both \(f\) and \(g_{\lambda}\) are densities integrating to 1, the supremum of their ratio cannot be below 1. The rejection method gets faster as \(c_{\lambda}\) approaches 1, so the problem is to find the \(\lambda\) that minimizes \(c_{\lambda}\) over the given range. Since \(c_{\lambda} = \frac{e^{-\lambda}}{2\lambda(1-\lambda)}\), this is equivalent to finding the \(\lambda \in (0, 1)\) that maximizes \(\lambda(1-\lambda)e^{\lambda}\).

\(\begin{aligned} \frac{d}{d\lambda}\lambda(1-\lambda)e^{\lambda} &= (1-2\lambda)e^{\lambda} + (\lambda-\lambda^2)e^{\lambda} \\ &= (1 - \lambda - \lambda^2)e^{\lambda} \end{aligned}\)

Setting \(1 - \lambda - \lambda^2 = 0\) leads to \(\lambda = \frac{-1 \pm \sqrt 5}{2}\); the root in \((0, 1)\) is \(\lambda = \frac{-1 + \sqrt 5}{2} \approx 0.618\).
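
The stationary point can likewise be confirmed by minimizing \(c_{\lambda}\) numerically (an added check; the interval endpoints merely avoid the singularities at 0 and 1):

```r
c_lambda <- function(l) exp(-l) / (2 * l * (1 - l))
opt <- optimize(c_lambda, interval = c(0.01, 0.99))  # minimize over (0, 1)
opt$minimum                                  # ~0.618, i.e. (sqrt(5) - 1)/2
opt$objective                                # ~1.1416, the smallest c_lambda
```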

iii) On a single plot with a reasonable range of x, visualize the target density \(f(x)\) and the envelope \(c_{\lambda}g_{\lambda}\) with the best \(\lambda\) to see how close they are.

As previously mentioned, the target density is \(f(x) = \frac{1}{2}(1+x)e^{-x}\). With the best \(\lambda\), the envelope is \(c_{\lambda}g_{\lambda}(x) = \frac{e^{-\lambda}}{2\lambda(1-\lambda)}\lambda e^{-\lambda x}=\frac{e^{-\lambda(x+1)}}{2(1-\lambda)}\). Since \(f(10) = \frac{1}{2}(1+10)e^{-10} \approx 0.00025\) is sufficiently small, the x-axis range is set to \(0 < x < 10\). Strictly, the support is \(x > 0\), but 0 is included in the R code; the value at 0 does not visibly affect the plot and including it keeps the code clean.

Define a vector ‘x’ containing numbers from 0 to 10 with step 0.01. The variable ‘fx’ is the target density \(f(x)\) evaluated on ‘x’, the variable ‘lambda’ is the optimal value \(\lambda = \frac{-1+\sqrt{5}}{2}\), and ‘cg’ is the envelope.

x <- seq(0,10,0.01)
fx <- 1/2*(1 + x)*exp(-x)
lambda <- 1/2*(-1 + sqrt(5))
cg <- exp(-lambda*(x+1))/(2*(1-lambda))
plot(x, cg, main = 'plot of target density and its envelope', ylab = 'density',
     type = 'l', lty = 'dashed')
lines(x, fx, col = 'red')
abline(v = lambda/(1-lambda), col = 'green')
legend('topright', legend = c('c*g(x)', 'f(x)'),
       lty = c('dashed', 'solid'), col = c('black', 'red'), bg = 'gray')

The envelope is depicted with a black dashed line and the target density \(f(x)\) with a solid red line. Because lambda is optimal, the target stays very close to the envelope except near x = 0. The vertical green line at \(x = \frac{\lambda}{1-\lambda} \approx 1.618\) marks where the two curves touch.

iv) Generate sufficiently many random numbers (of size of your choice) from f using the rejection method. Overlay a histogram of the random numbers with the target density curve f.

The proposal \(g_{\lambda}(x)\) is the exponential density with mean \(\frac{1}{\lambda}\), so the rejection method needs random numbers from the exponential distribution, and a function generating them should be defined in advance (the inverse transform method). Let \(U \sim Unif(0,1)\) and let \(X\) be the exponential random number. Setting \(U = F(X) = 1 - e^{-\lambda X}\) gives \(X = -\frac{1}{\lambda}\log(1-U)\), and since \(1-U \sim Unif(0,1)\) as well, the random variable can be written as \(X = -\frac{1}{\lambda}\log U\).

The following code shows this method.

random_exponential <- function(N, lambda=1){
  X <- c()
  u <- runif(N)
  for(i in 1:N){
    X[i] <- -1/lambda*log(u[i])
  }
  return(X)
}
random_exponential(5, 2)
## [1] 0.05970902 0.35416898 0.99631438 0.08563744 0.09551172
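
A quick distributional check (added here, with the inverse transform inlined so the block is self-contained and an assumed seed): the sample mean should be close to \(1/\lambda\), and a Kolmogorov–Smirnov statistic against the exponential CDF should be small.

```r
set.seed(7)                                # seed assumed for reproducibility
lam <- 2
xs  <- -1 / lam * log(runif(1e5))          # same inverse transform as random_exponential
mean(xs)                                   # should be close to 1/lam = 0.5
ks.test(xs, "pexp", rate = lam)$statistic  # small D statistic: consistent with Exp(2)
```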

Define two functions corresponding to \(f(x)\) and \(g_{\lambda}(x)\), denoted ‘f’ and ‘g’ respectively, and assign the optimal value \(c_{\lambda} \approx 1.141627\) to the variable ‘c’. (This shadows the base function name ‘c’; R still resolves calls like c('X', 'acc_rate') to the base function because only functions are searched in call position, but a name such as ‘c_lambda’ would be safer style.)

f <- function(x){return(1/2*(1 + x)*exp(-x))}
g <- function(x){return(lambda*exp(-lambda*x))}
c <- exp(-lambda)/(2*lambda*(1-lambda))

Now a function generating random numbers from \(f(x)\) can be made, named ‘hw2q2’. For each of the N draws, generate a candidate ‘y’ from the exponential distribution and ‘u’ from the uniform distribution. If ‘u’ is bigger than the threshold \(\frac{f(y)}{c_{\lambda}g_{\lambda}(y)}\), regenerate ‘y’ and ‘u’; if ‘u’ is smaller than or equal to it, accept, assign ‘y’ to the ith entry of the ‘X’ vector, and move on to the next draw.

hw2q2 <- function(N){
  X <- c()
  total <- 0
  for(i in 1:N){
    while(1){
      total <- total + 1
      y <- random_exponential(1, lambda)
      u <- runif(1)
      if(u <= f(y)/(c*g(y))){
        X[i] <- y
        break
      }
    }
  }
  acc_rate <- N/total
  result <- list(X, acc_rate)
  names(result) <- c('X', 'acc_rate')
  return(result)
}
result <- hw2q2(10000)

Using the function ‘hw2q2’, 10000 random numbers from \(f(x)\) are generated and saved in the ‘result$X’ variable (why the return value needs ‘$’ is explained in the next question). A histogram of these numbers is drawn below, with \(f(x)\) overlaid as a red line. The histogram is quite consistent with the target density \(f(x)\).

hist(result$X,main = 'Histogram of Q2', xlab = 'random numbers from f(x)',
freq = FALSE)
lines(x,fx, col ='red')
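
Beyond the visual overlay, a formal goodness-of-fit check can be run against the closed-form CDF of \(f\), \(F(x) = 1 - (1 + \frac{x}{2})e^{-x}\) (obtained by integrating \(f\); this check and the CDF are additions, and the sketch below regenerates a fresh sample with the same rejection logic written self-contained, under an assumed seed):

```r
set.seed(11)                                   # seed assumed for reproducibility
lam2  <- (sqrt(5) - 1) / 2                     # the optimal lambda from ii)
c_lam <- exp(-lam2) / (2 * lam2 * (1 - lam2))
fd <- function(x) 1/2 * (1 + x) * exp(-x)      # target density
gd <- function(x) lam2 * exp(-lam2 * x)        # exponential proposal density
draw_one <- function() {
  repeat {                                     # rejection loop
    y <- -1 / lam2 * log(runif(1))             # Exp(lam2) proposal draw
    if (runif(1) <= fd(y) / (c_lam * gd(y))) return(y)
  }
}
smp <- replicate(5000, draw_one())
Fd <- function(x) 1 - (1 + x / 2) * exp(-x)    # closed-form CDF of f
ks.test(smp, Fd)$statistic                     # small D statistic indicates a good fit
```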

v) Compare the theoretical value of the probability of acceptance and its estimate obtained by your simulation.

The estimated probability of acceptance is obtained inside the ‘hw2q2’ code. At the beginning of the function, the ‘total’ variable is initialized to 0; it counts how many proposal trials are needed to accept ‘N’ numbers (N being the number of random numbers the function will generate). Whether or not ‘u’ falls below the threshold, ‘total’ increases by 1 on every trial. Then ‘acc_rate’ is defined as ‘N/total’, the estimated probability of acceptance. Both this value and the ‘N’ random numbers, denoted X, are saved in the ‘result’ variable, which is a list so that objects of different shapes can be stored together.

After running ‘hw2q2’, the random numbers are stored under the ‘X’ name of the list and the estimated probability under ‘acc_rate’. In this case, the estimated probability is 0.8721437. The theoretical probability of acceptance is \(\frac{1}{c_{\lambda}} = 0.875943\), so the estimate is quite close to the ideal value.

acc_rate <- result$acc_rate
acc_rate
## [1] 0.8721437
1/c
## [1] 0.875943