Statistical Computing HW3
Q1. We return to the problem of sampling from the distribution \(F\) with the density considered in Homework 2.\[f(x) = \frac{1}{2}(1+x)e^{-x}, x > 0\]
i) Show that the direct application of the inverse transform method requires the inverse function of \(h(x) = xe^{x}, x < -2\).
The inverse transform method, based on \(U \sim \text{Unif}(0,1)\) and \(X = F^{-1}(U)\), proceeds as follows:
\[\begin{aligned} P[X \le x] & = P[F^{-1}(U) \le x] \\ & = P[U \le F(x)] = F(x) \\ & = \int_0^x \frac{1}{2}(1+t)e^{-t}dt \\ & = \left[ -\frac{1}{2}(1+t)e^{-t} \right]_{t = 0}^{t = x} + \int_0^x \frac{1}{2} e^{-t}dt \\ & = \left[ -\frac{1}{2}(1+t)e^{-t} - \frac{1}{2} e^{-t}\right]_{t = 0}^{t = x} \\ & = \left[ -e^{-t} - \frac{1}{2}te^{-t}\right]_{t = 0}^{t = x} \\ & = 1 - e^{-x} - \frac{1}{2}xe^{-x} \\ & = 1 - \frac{1}{2}(x + 2)e^{-x}. \end{aligned}\]
Substituting \(w = -(x+2)\), so that \(w < -2\) and \(e^{-x} = e^{w+2}\), this becomes \[F(x) = 1 + \frac{1}{2}we^{w+2} = 1 + \frac{e^2}{2}we^w, \quad w < -2.\]
So, if we let \(H(w) = we^w\) and \(U = F(X)\),
\[U-1 = \frac{e^2}{2}H(W),\\ W = H^{-1}\left( \frac{2}{e^2}(U-1) \right) = -X - 2, \text{and}\\ X = -2 - H^{-1}\left(\frac{2}{e^2}(U-1) \right).\]
Thus, the inverse transform method for the given \(f(x)\) needs the inverse of \(H(x) = xe^x, x < -2\).
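Since \(H(x) = xe^x\) has no elementary closed-form inverse (it is related to the Lambert \(W\) function), the inverse transform can only be carried out numerically. A minimal sketch, assuming a numerical root search with `uniroot` over a bounded bracket is acceptable:

```r
# H(w) = w * exp(w) is strictly decreasing on (-Inf, -1), hence invertible
# on (-Inf, -2); invert it numerically with uniroot (the bracket [-50, -2]
# is an assumption that covers all but astronomically extreme u).
H_inv <- function(v) {
  uniroot(function(w) w * exp(w) - v, lower = -50, upper = -2,
          tol = 1e-12)$root
}
F_inv <- function(u) -2 - H_inv(2 / exp(2) * (u - 1))
# a few draws from F via the inverse transform
set.seed(1)
x <- sapply(runif(5), F_inv)
```

Applying \(F(x) = 1 - \frac{1}{2}(x+2)e^{-x}\) to the draws recovers the original uniforms, confirming the inversion.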
ii) Now we consider a different approach. Let \(Y \sim \text{Gamma}(2, 1)\), where \(\text{Gamma}(\alpha, \beta)\) is the gamma distribution with shape parameter \(\alpha > 0\) and rate parameter \(\beta > 0\). Show that \((Y - 1)|(Y > 1) \sim F\).
The pdf of \(Y\) is \(ye^{-y}\) \((y > 0)\). Then, for \(x > 0\),
\[\begin{aligned} \mathbb{P}[Y-1 \leq x \mid Y > 1] &= \frac{\mathbb{P}[1<Y\le x+1]}{\mathbb{P}[Y>1]} = \frac{\int_{1}^{x+1} ye^{-y}dy}{\int_{1}^{\infty} ye^{-y}dy}. \end{aligned}\]
Since \(\int_{1}^{a}y e^{-y}dy = \left[-(y+1)e^{-y}\right]_{y=1}^{y=a} = 2e^{-1} - (a+1)e^{-a}\),
\[\mathbb{P}[Y-1 \leq x \mid Y > 1] = \frac{2e^{-1} - (x+2)e^{-(x+1)}}{2e^{-1}} = 1 - \frac{1}{2}(x+2)e^{-x} = F(x),\] where \(F(x)\) is the cumulative distribution function from i).
Thus, \((Y-1)|(Y>1) \sim F\).
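As a sanity check (an illustrative sketch, not required by the problem), one can draw \(Y \sim \Gamma(2,1)\) as a sum of two independent Exp(1) variables, condition on \(Y > 1\), and compare the empirical CDF of \(Y - 1\) with \(F\):

```r
# Gamma(2,1) as a sum of two independent Exp(1) draws (avoids rgamma)
set.seed(42)
Y <- -log(runif(1e6)) - log(runif(1e6))
Z <- Y[Y > 1] - 1           # conditioned and shifted sample
# F(1) = 1 - (3/2) * exp(-1); compare with the empirical CDF at x = 1
c(empirical = mean(Z <= 1), theoretical = 1 - 1.5 * exp(-1))
```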
iii) Based on ii), construct a rejection algorithm for sampling from \(F\) using two independent uniform random variables, \(U_1, U_2 \stackrel{\text{iid}}{\sim} \text{Unif}(0, 1)\), and generate \(10^5\) random numbers from \(F\) using your algorithm. Do not use the built-in gamma and qgamma functions in R. Overlay a histogram of the generated random numbers with the density \(f\).
This question calls for a rejection algorithm, so we use as \(g(x)\) a shifted exponential with mean 2 supported on \(x > 1\). That is, if \(X \sim \text{Exp}(\lambda/\alpha)\), where \(\alpha, \lambda\) come from \(\Gamma(\alpha, \lambda)\), then for \(t > 0\) \[\mathbb P(X-k>t \mid X > k) = \mathbb P(X > t) = \exp\left(-\frac{\lambda}{\alpha}t\right),\] by the memoryless property of the exponential distribution.
Then, \[ \begin{aligned} \mathbb P(X-k\le t|X > k) &= 1 - \exp(-\frac{\lambda}{\alpha}t) \quad \text{let $x=t+k > k,$} \\ &= 1 - \exp(-\frac{\lambda}{\alpha}(x-k)) \\ &= G(x) \end{aligned} \] So, \[g(x) = \frac{d}{dx}G(x) = \frac{\lambda}{\alpha}\exp\left\{ -\frac{\lambda}{\alpha} (x-k)\right\}\]
In this case, \(\alpha = 2,\lambda = 1\), and \(k = 1\). So, \[g(x) = \frac{1}{2}\exp\left\{ -\frac{1}{2} (x-1)\right\}, \quad x > 1.\]
Let us change the variable of \(f\) so that the supports of \(f\) and \(g\) coincide: with \(x = t + 1\) \((t > 0)\), i.e., \(t = x - 1\), the shifted target density is \[f(x) = \frac{1}{2}(t+1)e^{-t} = \frac{1}{2}xe^{-(x-1)}, \quad x > 1.\] Then, \[ \begin{aligned} c = \sup_{x>1}\frac{f(x)}{g(x)} &= \sup_{x>1} \frac{ \frac{1}{2}xe^{-(x-1)}}{\frac{1}{2}\exp\left\{ -\frac{1}{2} (x-1)\right\}} \\ &= \sup_{x>1} x\exp\left\{ -\frac{1}{2}(x-1)\right\}. \end{aligned}\] Setting the derivative to zero, \[\frac{d}{dx}\,x\exp\left\{ -\frac{1}{2}(x-1)\right\} = \left(1-\frac{1}{2}x \right)\exp\left\{ -\frac{1}{2}(x-1)\right\} = 0 \;\Rightarrow\; x = 2.\] That is, \(\frac{f(x)}{g(x)}\) attains its maximum at \(x = 2\), so \[c = 2e^{-1/2} \approx 1.21306.\] The rejection algorithm then follows these steps: 1) generate a random variable \(Y\) from \(g\); 2) generate \(U \sim \text{Unif}(0,1)\) and, if \(U \le \frac{f(Y)}{c\,g(Y)}\), set \(X = Y-1\); 3) otherwise, return to step 1). The acceptance threshold is \[\frac{f(y)}{c\,g(y)} = \frac{\frac{1}{2}y\exp\left\{-(y-1)\right\}}{2e^{-\frac{1}{2}} \cdot \frac{1}{2}\exp\left\{-\frac{1}{2}(y-1)\right\}} = \frac{y}{2}\exp\left\{-\frac{1}{2}y + 1\right\}.\]
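The stationary-point calculation for \(c\) can be double-checked numerically, e.g. with `optimize` (the search interval \([1, 20]\) is an assumption; the ratio decays to 0 as \(x \to \infty\)):

```r
# maximize f(x)/g(x) = x * exp(-(x - 1)/2) over x > 1
ratio <- function(x) x * exp(-(x - 1) / 2)
opt <- optimize(ratio, interval = c(1, 20), maximum = TRUE)
opt$maximum    # should be near 2
opt$objective  # should be near 2 * exp(-1/2), about 1.21306
```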
We set \(X = Y - 1\) because it is \(Y-1\) (not \(Y\)) that follows the distribution \(F\).
The following code shows how to generate \(10^5\) random numbers from the function.
stu_id <- 2011142127
set.seed(stu_id)
hw3_q1_3 <- function(N){
  X <- c()
  total <- 0
  for(i in 1:N){
    while(1){
      total <- total + 1
      u1 <- runif(1)
      y <- 1 - 2*log(u1)   # draw Y from g by inverse transform
      u2 <- runif(1)
      if(u2 <= 1/2*y*exp(-y/2 + 1)){ # acceptance threshold
        X[i] <- y - 1      # store Y - 1
        break
      }
    }
  }
  acc_rate <- N/total
  result <- list(X = X, acc_rate = acc_rate)
  return(result)
}
The following graph shows a histogram of the generated random numbers overlaid with the density \(f(x)\) in red.
res1 <- hw3_q1_3(100000)
hist(res1$X, freq = FALSE, main = 'Histogram of hw3 Q1-3',
     xlab = 'generated numbers from F')
x <- seq(0, 10, 0.01)
fx <- 1/2*(1 + x)*exp(-x)
lines(x, fx, col = 'red')
iv) Compare the rejection algorithm in iii) with the one considered in Homework 2 in terms of the probability of acceptance. Which one is better?
The theoretical probability of acceptance for this algorithm is \(\frac{1}{c} = \frac{1}{2e^{-1/2}} \approx 0.824361\). In contrast, the probability of acceptance in HW2 was \(\frac{1}{c_{\lambda}} = 2\lambda(1 - \lambda)e^{\lambda}\big|_{\lambda = \frac{-1+\sqrt{5}}{2}} \approx 0.875943\), which is larger than that of this algorithm.
Since \(c_{\lambda}\) was obtained by choosing the optimal value of \(\lambda\) that makes \(c_{\lambda}\) smallest, its probability of acceptance is the largest attainable within that family of proposals; in terms of acceptance probability, the HW2 algorithm is therefore better.
The estimated probability of acceptance is close to the theoretical value.
res1$acc_rate
## [1] 0.8250076
v) Let \(G\) be the distribution function of \(Y \sim \Gamma(2, 1)\). Show that \[F^{-1}(u) = G^{-1}\left( 1 + \frac{2}{e}(u-1)\right)-1,\quad 0 < u < 1.\]
Since \(g(x) = xe^{-x}\), \[ \begin{aligned} G(x) &= \int_0^x te^{-t}dt \\ &= \left[-te^{-t} \right]_{t = 0}^{t = x} + \int_0^x e^{-t}dt \\ &= \left[-te^{-t} -e^{-t}\right]_{t = 0}^{t = x} \\ &= 1-(1+x)e^{-x}. \end{aligned} \]
This question uses the inverse transform method. Let \(X = F^{-1}(U)\) and \(U \sim \text{Unif}(0,1)\). Then, \[ \begin{aligned} \mathbb P(X \le x) &= \mathbb P(F^{-1}(U) \le x) \\ &= \mathbb P(U \le F(x)) \\ &= \mathbb P \left(U \le 1 - \frac{1}{2}(x+2)e^{-x}\right) \\ &= \mathbb P \left( \frac{2}{e}(U-1) \le -(x+2)e^{-(x+1)} \right) \\ &= \mathbb P \left( 1+\frac{2}{e}(U-1) \le G(x+1) \right) \\ &= \mathbb P \left( G^{-1}\left(1+\frac{2}{e}(U-1) \right) \le x+1\right) \\ &= \mathbb P \left( G^{-1}\left(1+\frac{2}{e}(U-1) \right) -1 \le x\right). \end{aligned} \]
Thus, \(F^{-1}(u) = G^{-1}\left( 1 + \frac{2}{e}(u-1)\right)-1,\quad 0 < u < 1\).
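The identity can also be verified numerically with the built-in `qgamma` (used here only as a check; the sampler in iii) did not need it):

```r
# check F^{-1}(u) = qgamma(1 + (2/e)*(u - 1), shape = 2, rate = 1) - 1
u <- c(0.1, 0.5, 0.9)
x <- qgamma(1 + 2 / exp(1) * (u - 1), shape = 2, rate = 1) - 1
# applying F should recover u, where F(x) = 1 - (x + 2) * exp(-x) / 2
cbind(u, F_of_x = 1 - (x + 2) * exp(-x) / 2)
```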
Q2. Suppose we wish to generate random variables \(X \sim\Gamma(3, 1.5)\) and \(Y \sim \Gamma(7.5, 1)\) with some dependence structure on \((X, Y)\) specified by a bivariate Gaussian copula. To compare the correlations of Gaussian copula and of the induced joint distribution of (X, Y), generate \(n = 10^5\) pairs of (X, Y) using the bivariate Gaussian copula with each of the correlation coefficients \(\rho = −0.9,−0.6,−0.3, 0, 0.3, 0.6, 0.9\) (increasing by 0.3), so that you can have 7 sets of \(10^5\) pairs of (X, Y). You are free to use the built-in quantile and probability functions of normal and gamma distributions (qnorm, pnorm, qgamma, and pgamma). You can also use the chol function for the Cholesky decomposition if you want.
i) Let \((U, V) \sim C_{\rho}\), where \(C_{\rho}\) is the bivariate Gaussian copula with \(\rho\). For each \(\rho\), give histograms of the generated \(U\) and \(V\) (to see the marginals) and the scatter plot between the generated \(U\) and \(V\) (to see the joint distribution). Discuss your results.
Before we begin, the 'Chol' function (with a capital 'C') should be explained. The built-in 'chol' function in R returns an upper-triangular matrix (all elements below the diagonal are 0). In this problem, a lower-triangular Cholesky factor (all elements above the diagonal are 0) is needed, so 'Chol' is defined to return the transpose of the 'chol' result.
Chol <- function(mat){
  return(t(chol(mat)))
}
# sample matrix to see how 'Chol' works
samp <- matrix(c(1,0.5,0.5,1), nrow = 2)
chol(samp)
## [,1] [,2]
## [1,] 1 0.5000000
## [2,] 0 0.8660254
Chol(samp)
## [,1] [,2]
## [1,] 1.0 0.0000000
## [2,] 0.5 0.8660254
The idea of generating two random numbers \(U,V\) from a copula \(C_{\tilde X, \tilde Y}\), where \((\tilde X, \tilde Y) \sim \tilde H, \tilde X \sim \tilde F, \tilde Y \sim \tilde G\), is as follows: 1) generate \((\tilde X, \tilde Y) \sim \tilde H\); 2) let \(U = \tilde F(\tilde X)\) and \(V = \tilde G(\tilde Y)\). Then \((U,V) \sim C_{\tilde X, \tilde Y}\). In this problem, \(\tilde H\) is the bivariate normal distribution, so we need a Cholesky decomposition to generate \(\tilde X\) and \(\tilde Y\).
The following procedure generates \(U,V\) from the jointly normal random variables \(W_1\) and \(W_2\): 1) generate independent standard normal \(Z_1\) and \(Z_2\); 2) compute \(W = AZ\), where \(W = [W_1 \hspace{1mm} W_2]^{\top}, Z = [Z_1 \hspace{1mm} Z_2]^{\top}\), and \(R = \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix} = AA^{\top}\); 3) let \(U = \Phi(W_1), V = \Phi(W_2)\).
The following code generates \(U\) and \(V\). Note that 'qnorm' is the standard normal quantile function \(\Phi^{-1}\) (its input is a number between 0 and 1), and 'pnorm' is the standard normal CDF \(\Phi\) (the integral of the density from \(-\infty\) to its input).
set.seed(stu_id)
hw3_q2_1 <- function(N, p){ # 'p' is the correlation coefficient
  U <- c()
  V <- c()
  R <- matrix(c(1, p, p, 1), nrow = 2) # correlation matrix
  A <- Chol(R)                         # lower-triangular Cholesky factor
  for(i in 1:N){
    z1 <- qnorm(runif(1)) # instead of rnorm
    z2 <- qnorm(runif(1))
    W <- A %*% c(z1, z2)  # jointly normal with correlation p
    U[i] <- pnorm(W[1])
    V[i] <- pnorm(W[2])
  }
  result <- data.frame(U = U, V = V)
  return(result)
}
hw3_q2_1(1, 0.3)
## U V
## 1 0.2483785 0.4110868
The following code draws the histograms and scatter plots for each \(\rho\). Since it is inconvenient to type '\(\rho\)' in R, the letter 'p', corresponding to the Greek \(\rho\), is used.
p <- c(-0.9, -0.6, -0.3, 0, 0.3, 0.6, 0.9)
for(i in 1:7){
  res <- hw3_q2_1(100000, p[i])
  U <- res$U
  V <- res$V
  op <- par(mfrow = c(2,2))
  hist(U, freq = FALSE, main = sprintf('Histogram of U when p = %.1f', p[i]),
       xlab = 'U')
  hist(V, freq = FALSE, main = sprintf('Histogram of V when p = %.1f', p[i]),
       xlab = 'V')
  plot(U, V, main = sprintf('scatter plot of U and V when p = %.1f', p[i]),
       xlab = 'U', ylab = 'V')
  par(op)
}
Each histogram of \(U\) and \(V\) is close to the Uniform(0,1) density regardless of the value of \(\rho\). As \(\rho\) approaches \(-1\), the scatter plot shows a strong negative relationship between \(U\) and \(V\), with relatively few points near (0,0) and (1,1). When \(\rho = 0\), the points show no relationship and are scattered uniformly over the unit square. As \(\rho\) approaches 1, the scatter plot shows a strong positive relationship, with few points near (1,0) and (0,1).
ii) What is the joint distribution of \((U, V)\) when \(\rho = 0\)? Support your answer.
When \(\rho = 0\), the normal random variables \(W_1\) and \(W_2\) are independent, so they can be written simply as the independent standard normals \(Z_1\) and \(Z_2\).
Then, the joint distribution of \(U\) and \(V\) is, by definition, \[\begin{aligned} \mathbb P(U \le u, V \le v) &= \mathbb P(\Phi(W_1) \le u, \Phi(W_2) \le v) \\ &= \mathbb P(\Phi(Z_1) \le u)\mathbb P(\Phi(Z_2) \le v) \quad \because \text{independence} \\ &= \mathbb P(Z_1 \le \Phi^{-1}(u) )\mathbb P(Z_2 \le \Phi^{-1}(v) ) \\ &= \int_{-\infty}^{\Phi^{-1}(u)}\frac{1}{\sqrt{2\pi}}\exp\left\{-\frac{x^2}{2}\right\}dx \int_{-\infty}^{\Phi^{-1}(v)}\frac{1}{\sqrt{2\pi}}\exp\left\{-\frac{y^2}{2}\right\}dy \\ &= \Phi\left(\Phi^{-1}(u)\right)\Phi\left(\Phi^{-1}(v)\right) \\ &= uv \end{aligned}\]
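A quick Monte Carlo check of this product form, using the same qnorm/pnorm construction as above (the evaluation point \((0.3, 0.7)\) is an arbitrary choice):

```r
set.seed(42)
n <- 1e5
U <- pnorm(qnorm(runif(n)))  # with rho = 0, W1 = Z1, so U = Phi(Z1)
V <- pnorm(qnorm(runif(n)))
c(empirical = mean(U <= 0.3 & V <= 0.7), product = 0.3 * 0.7)
```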
iii) For each \(\rho\), give histograms of the generated \(X\) and \(Y\) (to see the marginals) and the scatter plot between the generated \(X\) and \(Y\) (to see the joint distribution). Discuss your results.
The idea of generating two random numbers \(X\) and \(Y\) with copula \(C_{\tilde X, \tilde Y}\), where \((\tilde X, \tilde Y) \sim \tilde H, \tilde X \sim \tilde F, \tilde Y \sim \tilde G\), is as follows: 1) generate \((\tilde X, \tilde Y) \sim \tilde H\); 2) let \(X = F^{-1}\left( \tilde F(\tilde X)\right)\) and \(Y = G^{-1}\left( \tilde G(\tilde Y)\right)\). Then \(X \sim F\) and \(Y \sim G\), and the joint distribution \(H\) of \((X,Y)\) has copula \(C_{\tilde X, \tilde Y}\).
The following procedure generates \(X, Y\) from the jointly normal random variables \(W_1\) and \(W_2\): 1) generate independent standard normal \(Z_1\) and \(Z_2\); 2) compute \(W = AZ\), where \(W = [W_1 \hspace{1mm} W_2]^{\top}, Z = [Z_1 \hspace{1mm} Z_2]^{\top}\), and \(R = \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix} = AA^{\top}\); 3) let \(X = F^{-1}(\Phi(W_1))\) and \(Y = G^{-1}(\Phi(W_2))\).
The following code generates \(X\) and \(Y\), and draws histograms of \(X\) and \(Y\) and the scatter plot of \((X,Y)\) for each \(\rho\). Note that 'qgamma' is the gamma quantile (inverse distribution) function. The built-in 'gamma' function is used only for the overlaid density \(g(x)\), to keep the code tidy. For reference, \[\Gamma(7.5) = 6.5 \cdot 5.5 \cdot 4.5 \cdot 3.5 \cdot 2.5 \cdot 1.5 \cdot 0.5 \cdot \sqrt{\pi} \approx 1871.254.\]
set.seed(stu_id)
hw3_q2_3 <- function(N, p){
  X <- c()
  Y <- c()
  R <- matrix(c(1, p, p, 1), nrow = 2)
  A <- Chol(R)
  for(i in 1:N){
    z1 <- qnorm(runif(1))
    z2 <- qnorm(runif(1))
    W <- A %*% c(z1, z2)
    X[i] <- qgamma(pnorm(W[1]), shape = 3, rate = 1.5)
    Y[i] <- qgamma(pnorm(W[2]), shape = 7.5, rate = 1)
  }
  result <- data.frame(X = X, Y = Y)
  return(result)
}
hw3_q2_3(1, -0.3)
## X Y
## 1 1.147455 7.66872
p <- c(-0.9, -0.6, -0.3, 0, 0.3, 0.6, 0.9)
x <- seq(0, 25, 0.01)
f <- 27/16*x^2*exp(-3/2*x)          # Gamma(3, 1.5) density
g <- 1/gamma(7.5)*x^(6.5)*exp(-x)   # Gamma(7.5, 1) density; gamma(7.5) is about 1871.254
for(i in 1:7){
  res2 <- hw3_q2_3(100000, p[i])
  X <- res2$X
  Y <- res2$Y
  op <- par(mfrow = c(2,2))
  hist(X, freq = FALSE, main = sprintf('Histogram of X when p = %.1f', p[i]),
       xlab = 'X')
  lines(x, f, col = 'red')
  hist(Y, freq = FALSE, main = sprintf('Histogram of Y when p = %.1f', p[i]),
       xlab = 'Y')
  lines(x, g, col = 'red')
  plot(X, Y, main = sprintf('scatter plot of X and Y when p = %.1f', p[i]),
       xlab = 'X', ylab = 'Y')
  par(op)
}
iv) What is the joint distribution of (X, Y) when \(\rho\) = 0? Support your answer.
Let \(F(X) = U_1 \sim \text{Unif}(0,1)\) and \(G(Y) = U_2 \sim \text{Unif}(0,1)\). Also, let \(W_1 = \Phi^{-1}(U_1)\) and \(W_2 = \Phi^{-1}(U_2)\). Evaluating the bivariate normal density at \(\rho = 0\), \[\begin{aligned} \mathbb P(X \le x, Y \le y) &= \mathbb P(F(X) \le F(x), G(Y) \le G(y)) \\ &= \mathbb P(U_1 \le F(x), U_2 \le G(y)) \\ &= \mathbb P\left(W_1 \le \Phi^{-1}(F(x)), W_2 \le \Phi^{-1}(G(y)) \right) \\ &= \int_{-\infty}^{\Phi^{-1}(G(y))}\int_{-\infty}^{\Phi^{-1}(F(x))} \frac{1}{2\pi\sqrt{1-\rho^2}} \exp\left\{ -\frac{1}{2(1-\rho^2)}(w_1^2-2\rho w_1 w_2 + w_2^2) \right\} dw_1dw_2 \;\Bigg|_{\rho = 0} \\ &= \int_{-\infty}^{\Phi^{-1}(G(y))}\int_{-\infty}^{\Phi^{-1}(F(x))} \frac{1}{2\pi}\exp\left\{-\frac{1}{2}(w_1^2 + w_2^2) \right\}dw_1dw_2 \\ &= \int_{-\infty}^{\Phi^{-1}(G(y))}\frac{1}{\sqrt{2\pi}} \exp\left\{ -\frac{1}{2}w_2^2\right\}dw_2 \int_{-\infty}^{\Phi^{-1}(F(x))}\frac{1}{\sqrt{2\pi}} \exp\left\{ -\frac{1}{2}w_1^2\right\}dw_1 \\ &= \Phi \left(\Phi^{-1}(G(y)) \right) \Phi \left(\Phi^{-1}(F(x)) \right) \\ &= F(x)G(y). \end{aligned}\]
Thus, when \(\rho = 0\), \(X\) and \(Y\) are independent, and the joint distribution is the product of the gamma marginals.
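The factorization can likewise be checked by simulation (the evaluation point \((2, 7)\) is arbitrary; 'pgamma' is used only for the theoretical value):

```r
set.seed(42)
n <- 1e5
# with rho = 0, X and Y are generated from independent uniforms
X <- qgamma(runif(n), shape = 3, rate = 1.5)
Y <- qgamma(runif(n), shape = 7.5, rate = 1)
emp  <- mean(X <= 2 & Y <= 7)
theo <- pgamma(2, shape = 3, rate = 1.5) * pgamma(7, shape = 7.5, rate = 1)
c(empirical = emp, theoretical = theo)
```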
v) Draw a plot of \(\rho\) versus the estimated Corr(\(X, Y\)). Discuss your results.
To estimate the correlations, we use a simple function: for each \(\rho\) in 'p', it generates 'N' pairs from 'hw3_q2_3', computes the sample correlation of the generated \(X\) and \(Y\) with the 'cor' function, and returns the vector of correlations.
set.seed(stu_id)
hw3_q2_5 <- function(N, p){
  corr <- c()
  for(i in 1:length(p)){
    Data <- hw3_q2_3(N, p[i])
    corr[i] <- cor(Data$X, Data$Y)
  }
  return(corr)
}
p <- c(-0.9,-0.6,-0.3,0,0.3,0.6, 0.9)
corr <- hw3_q2_5(100000, p)
corr
## [1] -0.818377845 -0.552489367 -0.286471903 0.002138141 0.286790980
## [6] 0.587880418 0.891319060
Then, plot \(\rho\) (the copula correlation) against the estimated correlation:
plot(p, corr, main = 'plot of p versus estimated correlation',
xlab = 'p', ylab = 'estimated corr')
The plot shows that the estimated correlations are close to \(\rho\) near 0, but slightly attenuated in magnitude for large \(|\rho|\) (e.g., \(-0.818\) at \(\rho = -0.9\)): the Pearson correlation of \((X,Y)\) after the nonlinear marginal transforms need not equal the copula parameter \(\rho\).
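One way to quantify this (an additional observation, not asked by the problem): rank correlation is invariant under the strictly increasing marginal transforms, so the Spearman correlation of \((X,Y)\) should match that of the Gaussian copula, \(\rho_S = \frac{6}{\pi}\arcsin(\rho/2)\). A sketch for \(\rho = 0.9\):

```r
set.seed(42)
n <- 1e5
rho <- 0.9
A <- t(chol(matrix(c(1, rho, rho, 1), nrow = 2)))  # lower-triangular factor
W <- A %*% rbind(qnorm(runif(n)), qnorm(runif(n)))
X <- qgamma(pnorm(W[1, ]), shape = 3, rate = 1.5)
Y <- qgamma(pnorm(W[2, ]), shape = 7.5, rate = 1)
c(spearman = cor(X, Y, method = "spearman"),
  copula   = 6 / pi * asin(rho / 2))
```

Unlike the Pearson correlation, the Spearman estimate agrees with the copula value even at large \(|\rho|\).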