SC HW4

Q1. Let $C$ be the bivariate Gaussian copula with $\rho$ = 0.5, $F \sim \Gamma(3, 1.5)$, and $G \sim \Gamma(7.5, 1)$, where $\Gamma(\alpha, \beta)$ is the gamma distribution with shape parameter $\alpha> 0$ and rate parameter $\beta > 0$. Consider random variables $(X, Y)$ having the joint distribution constructed by $C$, $F$ and $G$, i.e., $(X, Y) \sim H$, where $H(x, y) = C(F(x),G(y))$. We want to estimate $\mathbb P[−0.5 < \sin(X^3 + Y^2) < 0.5]$ using the following two approaches. / Use successive computation in calculating the sample mean and variance. For each method, report your estimate and the minimum sample size required to achieve the criterion in your simulation. Discuss which one is preferred.

i) Use the Monte Carlo integration with random variables drawn from the bivariate Gaussian copula (Algorithm A) until the (estimated) variance of the estimator is less than $10^{−5}$.

The idea of algorithm A is as follows: Let the indicator function of $X$ and $Y$, $I [|\sin(X^3 + Y^2)| < 0.5]$, be 1 when $|\sin(X^3 + Y^2)|$ is less than 0.5, or 0 otherwise. Then $\mathbb P\left( I [|\sin(X^3 + Y^2)| < 0.5] = 1 \right ) := p$, the probability that should be estimated, is, $p = 1*p + 0*(1-p) = \mathbb E[I [|\sin(X^3 + Y^2)| < 0.5]]$. So, by the Law of Large Numbers (Monte Carlo Integration), the value ‘p’ can be estimated by \[\frac{1}{n}\sum_{i = 1}^n I [ |\sin(X_i{^3} + Y_i{^2}) | < 0.5 ] \to \mathbb E[I [|\sin(X^3 + Y^2)| < 0.5]] = p\\ \text{as } n \to \infty.\]

Each $X_i$ and $Y_i$ can be expressed by $U_1$ and $U_2$ generated from Gaussian copula $C$, whose set $(U_{1,i}, U_{2,i})$ is i.i.d for all $i$.

In short, the process of estimating the expectation is as follows. Generate $(U_{1,i}, U_{2,i})$ from the Gaussian copula $C$ independently for $i$, obtain $(X_i, Y_i) = \left( F^{-1}(U_{1,i}), G^{-1}(U_{2,i}) \right)$that follows $H$, then the expectation can be estimated by \[\frac{1}{n}\sum_{i = 1}^n I [ |\sin(X_i{^3} + Y_i{^2}) | < 0.5 ] \to \mathbb E[I [|\sin(X^3 + Y^2)| < 0.5]].\]

Cholesky function is required to generate $X$ and $Y$ from $C$, which is similar to the HW3. So the ‘Chol’ function, the transpose of the built-in ‘chol’ in R, is defined as follows.

Chol <- function(mat){
  return(t(chol(mat)))
}

An indicator function is defined to present whether the value $\vert \sin(X^3 + Y^2) \vert$ is less than 0.5, called ‘indicator_gau_cop’. This function needs 4 parameters used for the parameters of gamma distributions each. This function generates two random variables from standard normal distributions first (using ‘runif’ function) denoted by z1 and z2, performs Cholesky decomposition to get the correlated random vectors W, get the CDF values of standard normal distribution with respect to each element of W, and obtain the x and y from the inverse of F and G w.r.t the CDF values. For the generated x and y, the reference value (denoted by ‘ref’) is set to the $\vert \sin(X^3 + Y^2) \vert$. If the ‘ref’ is less than 0.5, the indicator variable I is 1, or 0 otherwise, which is same as $I(\vert \sin(X^3 + Y^2) \vert )$. At the end of the function returns the indicator value.

stu_id <- 2011142127
set.seed(stu_id)

indicator_gau_cop <- function(a1,l1, a2,l2){
  
  z1 <- qnorm(runif(1)) # instead of rnorm
  z2 <- qnorm(runif(1))
  
  R <- matrix(c(1,0.5,0.5,1), nrow = 2)
  A <- Chol(R) 
  W <- A %*% c(z1, z2)
  
  u1 <- pnorm(W[1])
  u2 <- pnorm(W[2])  
  
  x <- qgamma(u1, a1, l1)
  y <- qgamma(u2, a2, l2)
  
  ref <- abs(sin(x^3 + y^2))
  if(ref < 0.5){
    I <- 1
  }else{
    I <- 0
  }
  
  return(I)
}

The main function ‘hw4_q1_1’ to calculate the estimated value ‘p’ (sample mean), sample variance, estimated variance of p, and the number of iterations is as follows. The estimated value is written as the sample mean of the indicator value $\bar X$ from now on. Let $X_n$be the nth indicator value, $\bar X_n$be sample mean of successive ‘n’ indicators from the beginning, $S_n^2$ be the sample variance of ‘n’ sequential indicators, and $S_n^2/n$ be the estimated variance of nth sample mean $\bar X_n$ (‘p’). The initial value of sample mean and sample variance is $\bar X = X_1$ and $S_1^2 = 0$. Then, the general form of $\bar X_n$ and $S_n^2$ is as follows (by the successive computation formula): \[\bar X_n = \frac{(n-1)\bar X_{n-1} + X_n}{n},\quad n \ge 2\\ S_n^2 = \left( 1 - \frac{1}{n-1}\right) S_{n-1}^2 + n(\bar X_n - \bar X_{n-1})^2, \quad n \ge 2.\]

The initial values of sample mean and sample variance in the ‘hw4_q1_1’, whose variable names are ‘Xbar’ and ‘Ssq’ respectively, are the output of the function indicator_gau_cop(3, 1.5, 7.5, 1) and 0 each. From the next iteration $(n \ge 2)$, generate the indicator X from indicator_gau_cop function, save the previous $(n-1)$th values of sample mean and sample variance to Xbar_prev and Ssq_prev, and calculate the $n$th value of sample mean and sample variance in accordance with the above formulas. Save the nth sample mean and sample variance to Xbar_prev and Ssq_prev again, and obtain the $(n+1)$th value of sample mean and variance. There is no restriction about the smallest number for the CLT, so this function does not break the iteration until $n = 30$. The variance of the estimator $\bar X_n$ is $S_n^2/n$, so this iteration discontinues when two conditions $n \ge 30$ and $S_n^2/n \le 10^{-5}$ meet. After the halt of the iteration, save the sample mean, sample variance, the variance of the sample mean, and the number of iterations in the form of data frame.

hw4_q1_1 <- function(d = 10^(-5)){
  Xbar <- indicator_gau_cop(3, 1.5, 7.5, 1)
  Ssq <- 0
  n <- 1
  
  while(1){
    n <- n+1
    X <- indicator_gau_cop(3, 1.5, 7.5, 1)
    Xbar_prev <- Xbar
    Ssq_prev <- Ssq
    
    Xbar <- ((n-1)*Xbar_prev + X)/n
    Ssq <- (1 - 1/(n-1))*Ssq_prev + n*(Xbar - Xbar_prev)^2
    
    if(n >= 30 && Ssq/n < d){break}
  }
  
  result <- data.frame(Xbar = Xbar, Ssq = Ssq, est_var = Ssq/n, n = n)
  return(result)
}

According to this function, the estimated probability (sample mean) is about 0.336, the sample variance is 0.223, the variance of estimator is 9.9996*$10^{-6}$, and the number of iterations is 22327.

result1 <- hw4_q1_1(10^(-5))
result1

##        Xbar       Ssq      est_var     n
## 1 0.3364536 0.2232626 9.999668e-06 22327

cf) why algorithm A holds, given $H(x,y) = C(F(x), G(y))$

based on Riemann–Stieltjes integral, \[\begin{aligned} \mathbb E[k(x,y)] &= \int k(x,y)dH(x,y) \\ &= \int k(x,y)dC(F(x), G(y)) \quad (u_1 := F(x), u_2 := G(y)) \\ &= \int k(F^{-1}(u_1), G^{-1}(u_2)dC(u_1, u_2) \\ &= \mathbb E[k(F^{-1}(U_1), G^{-1}(U_2)] \end{aligned}\]

In this problem, $k(X,y) = I[\vert \sin(X^3 + Y^2) \vert < 0.5]$. Thus, after generating $X$ and $Y$, the LLN (Monte Carlo integral) can be applied.

ii) Use the Monte Carlo integration with random variables drawn from the iid uniform distributions (Algorithm B) until the (estimated) variance of the estimator is less than $10^{−5}$.

Seeing that the copula density exists for Gaussian copula, the expectation $\mathbb E[ I[\vert \sin(X^3 + Y^2) \vert < 0.5] ]$ be expressed as follows after generating the i.i.d uniform distributions $U_{1,i}, U_{2,i}$: \[\mathbb E[ I[\vert \sin(X^3 + Y^2) \vert < 0.5] ] \to \frac{1}{n}\sum_{i = 1}^n I \left[ \vert \sin((F^{-1}(U_{1,i}))^3 + (G^{-1}(U_{2,i}))^2 )\vert \right]*c(U_{1,i}, U_{2,i})\] In this expression, c represents the density of the Gaussian copula. Also, $U_{1,i} = F(X_i)$ and $U_{2,i} = G(Y_i)$. So, this is the main idea of algorithm B to estimate the probability $\mathbb P( I[\vert \sin(X^3 + Y^2) \vert < 0.5] = 1) = \mathbb E[ I[\vert \sin(X^3 + Y^2) \vert < 0.5] ]$, in which $U_{1,i}, U_{2,i}$ are independent uniform distribution.

Another indicator function ‘indicator_iid_unif’ is defined as follows. The input is the correlation coefficient $\rho$. The output itself is not actually indicator, but the name is nominated as ‘indicator’ to simply correspond to the previous function. At first, generate 2 independent random numbers $U_1, U_2$ from uniform distributions, and carry out inverse of gamma function to the$U_1, U_2$, denoted by x and y. The reference value is the absolute value of $\sin (X^3 + Y^2)$. Then, Set the indicator variableI is 1 if the reference is less than 0.5, or 0 otherwise.

Given the correlation matrix R and $q = [\Phi^{-1}(U_1), \hspace{1mm} \Phi^{-1}(U_2)]^{\top}$, the density of Gaussian copula is as follows: \[c(U_1, U_2) = \frac{1}{\sqrt{|R|}}\exp\left\{-\frac{1}{2}q^{\top}(R^{-1} - I)q\right\}, \quad I : (2 \times 2) \text{ identity matrix}\]

The matrix R is $R = \begin{bmatrix}1 & \rho \\ \rho & 1 \end{bmatrix}$, and put $\Phi^{-1}(U_1) = a$ and $\Phi^{-1}(U_2) = b$ for simplicity. hen the Gaussian copula density is \[\begin{aligned} c(U_1, U_2) &= \frac{1}{1 - \rho^2}\exp\left\{ -\frac{1}{2}\begin{bmatrix}a&b\end{bmatrix} \begin{bmatrix} \frac{\rho^2}{1-\rho^2} & \frac{-\rho}{1-\rho^2} \\ \frac{-\rho}{1-\rho^2} & \frac{\rho^2}{1-\rho^2}\end{bmatrix} \begin{bmatrix}a\\b\end{bmatrix}\right\} \\ &= \frac{1}{1 - \rho^2} \exp\left\{-\frac{1}{2(1-\rho^2)}(\rho^2(a^2+b^2) - 2\rho ab) \right\} \end{aligned}\]

Using this expression, the output of this function is the indicator I multiplied by the copula density.

set.seed(stu_id)

indicator_iid_unif <- function(p){# p : corr
  u1 <- runif(1)
  u2 <- runif(1)
  
  x <- qgamma(u1, 3, 1.5)
  y <- qgamma(u2, 7.5, 1)

  ref <- abs(sin(x^3 + y^2))
  if(ref < 0.5){
    I <- 1
  }else{
    I <- 0
  }
  
  a <- qnorm(u1)
  b <- qnorm(u2)
  
  c <- 1/sqrt(1-p^2)*exp(-1/(2*(1-p^2))*(p^2*(a^2+b^2) - 2*p*(a*b)))
  
  return(I*c)
}

The main function ‘hw4_q1_2’ is almost similar to the previous one ‘hw4_q1_1’, except for the fact that an indicator (1 or 0) multiplied by a copula density is the sample. Accordingly, the $\bar X$ and $S^2$ are the sample mean and sample variance of the indicator (1 or 0) multiplied by copula density. Then, the initial value of ‘Xbar’ and an indicator ‘X’ for every iteration in the function are the only variables to change. The rest of the code is identical to the previous function, including the successive computation. This function also checks whether the variance of the sample mean (estimated probability) is less than the input $d = 10^{-5}$ after n is greater than or equal to 30.

hw4_q1_2 <- function(d = 10^(-5)){
  Xbar <- indicator_iid_unif(0.5)
  Ssq <- 0
  n <- 1
  
  while(1){
    n <- n+1
    X <- indicator_iid_unif(0.5)
    Xbar_prev <- Xbar
    Ssq_prev <- Ssq
    
    Xbar <- ((n-1)*Xbar_prev + X)/n
    Ssq <- (1 - 1/(n-1))*Ssq_prev +n*(Xbar - Xbar_prev)^2
    
    if(n >= 30 && Ssq/n < d){break}
  }
  
  result <- data.frame(Xbar = Xbar, Ssq = Ssq, est_var = Ssq/n, n = n)
  return(result)
}

The sample mean (estimated probability) is about 0.332, the sample variance is 0.326, the variance of the estimated probability is 9.9997*$10^{-6}$, and the number of iterations is 32649.

result2 <- hw4_q1_2(10^(-5))
result2

##        Xbar       Ssq      est_var     n
## 1 0.3321876 0.3264809 9.999722e-06 32649

cf) why algorithms B hold, given $H(x,y) = C(F(x), G(y))$

based on Riemann integral, \[ h(x,y) = \frac{\partial^2}{\partial x\partial y} = c(F(x), G(y))f(x)g(y). \]

Using this, \[\begin{aligned} \mathbb E[k(X,Y)] &= \int k(x,y)dH(x,y) \\ &= \int \int k(x,y)h(x,y)dxdy \\ &= \int \int k(x,y)c(F(x), G(y))f(x)g(y) \end{aligned}\]

Let $u_1 = F(x)$ and $u_2 = G(y)$, whih are independent. Then $du_1 = f(x)dx$ and $du_2 = g(y)dy$.

Using this, \[\begin{aligned} \mathbb E[k(X,Y)] &= \int \int k\left( F^{-1}(u_1), G^{-1}(u_2) \right)c(u_1, u_2)du_1du_2 \\ &= \int \int k\left( F^{-1}(u_1), G^{-1}(u_2) \right)c(u_1, u_2)(1*1)du_1du_2\\ &= \mathbb E[k\left( F^{-1}(U_1), G^{-1}(U_2) \right)c(U_1, U_2)] \end{aligned}\]

In this problem, $k(X,Y) = I[ \vert \sin(X^3 + Y^2)\vert < 0.5]$. Thus, after generating independent $U_1, U_2$, the LLN (Monte Carlo integral) can be applied.

Q2. An insurance company sells two insurance policies, auto insurance and fire insurance. Consider the following scenario.

$\bullet$ A new customer signs up for auto insurance according to a Poisson process with rate 50 per week.
$\bullet$ A new customer signs up for fire insurance according to a Poisson process with rate 20 per week.
$\bullet$ A new customer signs up for both insurance policies (auto and fire) according to a Poisson process with rate 10 per week.
$\bullet$ A policyholder holding an auto insurance policy makes a claim according to a Poisson process with rate 0.04 per week, and the amount of a claim has an exponential distribution with mean $1,000.
$\bullet$ A policyholder holding a fire insurance policy makes a claim according to a Poisson process with rate 0.01 per week, and the amount of a claim has an exponential distribution with mean $5,000.
$\bullet$ A policyholder holding an auto insurance policy retains his/her auto insurance policy for an exponentially distributed time with mean 50 weeks (rate 0.02 per week). If the policyholder is with both types of insurance, his/her fire insurance policy still remains in effect even after the termination of the auto insurance policy.
$\bullet$ A policyholder holding a fire insurance policy retains his/her fire insurance policy for an exponentially distributed time with mean 20 weeks (rate 0.05 per week). If the policyholder is with both types of insurance, his/her auto insurance policy still remains in effect even after the termination of the fire insurance policy.
$\bullet$ A policyholder holding only an auto insurance policy continuously pays the company $45 per week.
$\bullet$ A policyholder holding only a fire insurance policy continuously pays the company $55 per week.
$\bullet$ A policyholder holding both insurance policies continuously pays the company $95 per week.

Define week variable $t$ (continuous) and state variables $(n_A, n_F , n_{AF} , a)$, where $n_A$ is the number of policyholders with only auto insurance, $n_F$ is the number of policyholders withonly fire insurance, $n_{AF}$ is the number of policyholders with both types of insurance, and $a$ is the firm’s current capital. Starting from $n_A$ = 100, $n_F$ = 100, $n_{AF}$ = 50, $a$ = $50,000, and $t$ = 0, we want to estimate the probability that the firm’s capital is always nonnegative at all times up to 100 weeks.

i) Explain how you can construct your simulation (using e.g., the distribution of the events, the distribution of the time until the next event occurs, etc…).

Let us check the events written in the problem more in detail. Without specific limitation on the events, it is regarded for all the events as independent. If $X_i$ representsa time of i-event with rate $\lambda_i$, it is known that the minimum of $X_i, \min_i X_i$ follows exponential with rate $\sum_i \lambda_i$, and the probability that $ _i X_i = X_k$ is $\lambda_k/\sum_i \lambda_i$ (* proved later).

(1) New sign-up

The number of events that a new customer signs up for auto insurance (A) follows Poisson process with rate $\nu_A = 50$. So, the time of the first new sign-up for A follows the exponential with $\nu_A = 50$.

The number of events that a new customer signs up for fire insurance (F) follows Poisson process with rate $\nu_F = 20$. So, the time of the first new sign-up for F follows the exponential with $\nu_F = 20$.

The number of events that a new customer signs up for both insurance (AF) follows Poisson process with rate $\nu_{AF} = 10$. So, the time of the first new sign-up for AF follows the exponential with $\nu_{AF} = 10$.

(2) Policy holding time

The period of time for which A holder retains the policy follows exponential distribution with rate $\mu_A=2/100$. In other words, the time of the termination of policy A follows exponential distribution with rate $\mu_A=2/100$. Then, the time of first termination of A-policy among $n_A$people (holding only A policy) follows exponential with rate $n_A\mu_A$, and the number $n_A$ decreases by 1.

Also, the time of first termination of A-policy among $n_{AF}$people follows exponential with rate $n_{AF}\mu_{A}$, while the number $n_{AF}$decreases by 1 and $n_F$ increases by 1. This is because the termination of A for AF holders does not affect the F policy, which results in the customer becoming the F holder.

The period of time for which F holder retains the policy follows exponential distribution with rate $\mu_F=5/100$. In other words, the time of the termination of policy A follows exponential distribution with rate $\mu_F=5/100$. Then, the time of first termination of F-policy among $n_F$people (holding only F policy) follows exponential with rate $n_F\mu_F$, and the number $n_F$ decreases by 1.

Also, the time of first termination of A-policy among $n_{AF}$people follows exponential with rate $n_{AF}\mu_{F}$, while the number $n_{AF}$decreases by 1 and $n_A$ increases by 1. This is because the termination of F for AF holders does not affect the A policy, which results in the customer becoming the A holder.

(3) Claim

The event of claims by A holders follows Poisson process with rate $\lambda_A=4/100$. So, the first time when A policy holders make claims follows exponential distribution with $\lambda_A=4/100$. Since there are $n_A + n_{AF}$ people who have the A-policy, the time of the first claim among $n_A + n_{AF}$ people follows exponential with rate $(n_A + n_{AF})\lambda_A$.

The event of claims by F holders follows Poisson process with rate $\lambda_F=1/100$. So, the first time when F policy holders make claims follows exponential distributions with $\lambda_F=1/100$. Since there are $n_F + n_{AF}$ people who have the F-policy, the time of the first claim among $n_F + n_{AF}$ people follows exponential with rate $(n_F + n_{AF})\lambda_F$.

(4) Time of the next event among all kinds of events

To sum up, the type of events are as follows: new sign-up for A, F, and AF, termination of A for $n_A$ people, termination of A for $n_{AF}$ people, termination of F for $n_F$ people, termination of F for $n_{AF}$ people, claims of A insurance among $n_A + n_{AF}$ people, and claims of F insurance among$n_F + n_{AF}$ people. Thus, the first event among those follows exponential with rate; \[\begin{aligned} z &= \nu_A + \nu_F + \nu_{AF}\\ &+ (n_A*\mu_A) + (n_{AF}*\mu_A) \\ &+ (n_F*\mu_F) + (n_{AF}*\mu_F) \\ &+ (n_A + n_{AF})\lambda_A + (n_F + n_{AF})\lambda_F. \end{aligned}\]

(5) Premium $\&$ amount of claim

There is a regular premium for each insurance. The premium of A is F is and AF are as follows: \[ c_A = \$45 \\ c_F = \$55 \\ c_{AF} = \$95. \]

If A holder claims, the amount of the claim follows exponential distribution with rate $y_A=1/1000$. If F holder claims, the amount of the claim follows exponential distribution with rate $y_F=1/5000$.

(6) Indicator function

Given a fixed time (100 weeks in this case), an indicator function is defined so that it can output whether the capital remains non-negative for the period. (Since it requires that the Capital should be always Non-Negative until , the name of the function is chosen as ‘CNN_indicator’). In this function, all of the parameters (ex. $\mu_A, \lambda_F$…) and initial values (ex. $n_{AF}, a$,…) are determined. Before getting into the loop, initial time $t$ is set to 0, the rate $z$ is set as the above expression, and generate the time of the next event based on the rate $z$, nominated as $t_E$.

(7) While loop

When the generated time of next event is bigger than the given time (denoted as ‘T_’), which usually occurs after a significant number of iterations, the output I is set to ‘1’ and discontinue the loop.

The other case, which is the general case, is when the time of next event is less than or equal to ‘T_’. Then, increase the amount of asset a by the number of each policy holder times the premium of each insurance times the time interval: $a + (n_Ac_A + n_Fc_F + n_{AF}c_{AF})(t_E - t)$ After then, the current time $t$ is shifted to the $t_E$

After resetting the asset and time, the next step is to generate the type of the next event. To categorize the events in minimum numbers, there are 9 events: 1) new sign-up for A, 2) new sign-up for F, 3) new sign-up for AF, 4) termination of A for A holders, 5) termination of A for AF holders, 6) termination of F for F holders, 7) termination of F for AF holders, 8) a claim of A, and 9) claim of F. The number inside a bracket in front of the event name corresponds to the event from now on.

Each event follows exponential distribution with discrete rate parameters respectively (the value of parameters are defined in advance), and the probability that the next event (minimum of the time of next event) is the specific event is proportional to that rate parameter (times the number of holders). Then, generating the type of next event (which will be same as generating integers from 1 to 9) with different probability will be the next step. So, the probability vector ‘p’ is set for each iteration and insert this to the integer-generating function, nominated as ‘random_integers’.

The vector ‘p’ has the 9 rate parameters sequentially; $\nu_A, \nu_F, \nu_{AF}, n_A\mu_A, n_{AF}\mu_A, n_F\mu_F, n_{AF}\mu_F, (n_A + n_{AF}\lambda_A), (n_F + n_{AF})\lambda_F$.

If the generated integer J is 1 with prob $\nu_A/z$, the number of A holders increases by 1. If the J is 2 with prob $\nu_F/z$ the number of F holders increases by 1. If the J is 3 with prob $\nu_{AF}/z$ the number of AF holders increases by 1.

The number J between 4 and 7 covers the case when the policy terminates. J = 4 occurs with probability $n_A\mu_A/z$, so the number of A holders decreases by 1. J = 5 occurs with probability $n_{AF}\mu_A/z$, so the number of AF holders decreases by 1 and the number of F holders increases by 1. J = 6 occurs with probability $n_F\mu_F/z$, so the number of F holders decreases by 1. J = 7 occurs with probability $n_{AF}\mu_F/z$, so the number of AF holders decreases by 1 and the number of A holders increases by 1.

J = 8 means a claim A occurs, with probability $(n_A + n_{AF}\lambda_A)/z$. Then generate the amount of claim that follows $Y \sim \text{Exp}(y_A)$. If the Y is bigger than the current asset, make the indicator I be 0 and stop the iteration. Otherwise, decrease the current asset by the Y. J = 9 means a claim F occurs with probability $(n_F + n_{AF})\lambda_F$. Then, generate the amount of claim that follows $Y \sim \text{Exp}(y_F)$. If the Y is bigger than the current asset, let the indicator 𝐼 be 0 and stop the iteration. Otherwise, decrease the current asset by Y.

After the generation of J and the following action, reset the time of next event $t_E$ by adding the current time ‘t’ and the generated new time from $\text{Exp}(z)$, the parameter ‘z’ of which is changed by the increase or decrease of each policy holders. Then, it will go back to the front of the while loop and check whether the t_E is greater than ‘T_’ again. This iteration will stop until the indicator I is determined between 1 and 0. If the I is determined, this function will end by outputting the value.

(8) Estimated probability

The final step is to estimate the probability of interest. This problem requires to repeat the generation of the indicator for 1000 times and to remain the capital non-negative for 100 weeks. So, add all the 1000 indicators with ‘T_’ = 100 and divide the summation by 1000. This result is the estimated probability. (The name of this function is ‘hw4_q2’).

ii) Estimate the probability of interest with 1,000 replications of the above simulation scenario.

As previously mentioned, integers-generating function is needed for choosing an indicator. N means how many numbers it will return, ‘idx’ means the biggest integer it can generate (the smallest is 1 in this function), and ‘prob’ is the probability vector, the sum of which should not be 1 in the input stage. The ith element of the vector means the integer ‘i’ can be generated in proportion to the ith probability. Using the inverse transform method, integers-generating function can be made as follows. After generating U ~ Unif(0,1), it will check in which region the U is included. If the U is between p[i-1] and p[i] (assume that p[0] is 0), it will return ‘i’ as an output. The bottom of this code shows how this function generates random integers from 1 to 4 with identical probability.

set.seed(stu_id)

random_integers <- function(N, idx, prob = rep(1/idx, idx)){
  X <- c()
  prob <- prob/sum(prob)
  
  for(n in 1:N){
    i <- 1
    sump <- prob[i]
    u <- runif(1)
    
    while(u >= sump){
      sump <- sump + prob[i+1]
      i <- i+1
    }
    X[n] <- i
  }
  return(X)
}

random_integers(10, 4)

##  [1] 1 2 4 2 2 3 3 4 2 3

The following code is the indicator function. vA, vF, vAF means the rate parameters for new sign up, lA, lF means the rate parameter for claim events, yA, yF means the rate parameter for the amount of claim, mA, mF means the rate parameter for terminating the policy, cA, cF, cAF is the weekly premium, nA, nF, nAF is the number of policy holders, ‘a’ is the current amount of asset, and ‘t’ is the current time. ‘z’ is defined identically with the ‘z’ expression in question i), and tE is the generated time of next event for the given ‘z’. For every iteration, reset the asset, current time, probability vector, and generate the integer J. Perform different actions according to the value of J, change the ‘z’ by the after the actions, and then an indicator can be defined at certain iteration.

CNN_indicator <- function(T_){

  vA <- 50
  vF <- 20
  vAF <- 10
  
  lA <- 4/100
  lF <- 1/100
  
  yA <- 1/1000
  yF <- 1/5000
  
  mA <- 2/100
  mF <- 5/100
  
  cA <- 45
  cF <- 55
  cAF <- 95
  
  nA <- 100
  nF <- 100
  nAF <- 50
  a <- 50000
  
  t <- 0

  z <- vA+vF+vAF + nA*mA+nAF*mA + nF*mF+nAF*mF + (nA+nAF)*lA + (nF+nAF)*lF
  tE <- -1/z*log(runif(1))
  
  while(1){
    
    if(tE > T_){
      I <- 1
      break
    }
    else{
      a <- a + (nA*cA + nF*cF + nAF*cAF)*(tE - t)
      t <- tE
      
      p <- c(vA, vF, vAF, nA*mA,nAF*mA, nF*mF,nAF*mF, (nA+nAF)*lA, (nF+nAF)*lF)
      J <- random_integers(1, length(p), p)
      
      if(J == 1){# new A holder
        nA <- nA + 1
      }
      else if(J == 2){# new F holder
        nF <- nF + 1
      }
      else if(J == 3){# new AF holder
        nAF <- nAF + 1
      }
      else if(J == 4){# lost A holder
        nA <- nA - 1
      }
      else if(J == 5){# among AF, lost A
        nAF <- nAF - 1
        nF <- nF + 1
      }
      else if(J == 6){# lost F holder
        nF <- nF - 1
      }
      else if(J == 7){# among AF, lost F
        nAF <- nAF - 1
        nA <- nA + 1
      }
      else if(J == 8){# A claim
        Y <- -1/yA*log(runif(1))
        if(Y > a){
          I <- 0
          break
        }
        else{
          a <- a - Y
        }
      }
      else{# J == 9, F claim
        Y <- -1/yF*log(runif(1))
        if(Y > a){
          I <- 0
          break
        }
        else{
          a <- a - Y
        }
      }# end of condition J = 1,2,,,,
    
    z <- vA+vF+vAF + nA*mA+nAF*mA + nF*mF+nAF*mF + (nA+nAF)*lA + (nF+nAF)*lF  
    tE <- t -1/z*log(runif(1)) 
    }# end of condition whether t > T_
      
  }# end of while loop
  
  return(I)
}

At last, add up all the indicator values for 1000 times, with the input value ‘T_’ as 100. When the summation is divided by 1000, it will be the estimated probability. Based on this code, the probability is about 0.816.

hw4_q2 <- function(T_){
  temp <- 0
  for(i in 1:1000){
    temp <- temp + CNN_indicator(T_)
  }
  
  return(temp/1000)
}

prob <- hw4_q2(100)
prob

## [1] 0.816

cf) $\min_{1\le i \le n} X_i \sim \text{Exp}(\sum_{i=1}^n \lambda_i)$

\[\begin{aligned} \mathbb P(\min_{1\le i \le n} X_i > x) &= \mathbb P(X_1 > x, \dots, X_n > x) \\ &= \prod_{i = 1}^n \mathbb P(X_i > x) \quad \text{i.i.d} \\ &= \prod_{i = 1}^n \exp(-\lambda_i x) \\ &= \exp\left\{ -\left(\sum_{i=1}^n \lambda_i \right)x \right\} \end{aligned}\]

Then, the CDF of $\min_{1\le i \le n} X_i$ is \[\begin{aligned} \mathbb P(\min_{1\le i \le n} X_i \le x) &= 1 - \mathbb P(\min_{1\le i \le n} X_i > x) \\ &= 1 - \exp\left\{ -\left(\sum_{i=1}^n \lambda_i \right)x \right\}, \end{aligned}\] which is the CDF of exponential distribution with rate parameter $\sum_{i=1}^n \lambda_i$.

cf) $\mathbb P\Bigl(\min_{1\le i \le n} X_i = X_k \Bigr)= \lambda_k/\sum_{i=1}^n \lambda_i$

\[\begin{aligned} \mathbb P\Bigl(\min_{1\le i \le n} X_i = X_k \Bigr) &= \mathbb E \biggl[P\Bigl(\min_{1\le i \le n} X_i = X_k|X_k \Bigr) \biggr] \\ &= \int P\Bigl(\min_{1\le i \le n} X_i = X_k|x_k \Bigr)f(x_k)dx_k \\ &= \int \mathbb P[X_i \ge x_k, i \neq k]f(x_k)dx_k \\ &= \int \prod_{i \neq k}\mathbb P[X_i \ge x_k]f(x_k)dx_k \\ &= \int \exp\{-\sum_{i \neq k}\lambda_ix_k\}\lambda_k\exp\{-\lambda_kx_k\}dx_k \\ &=\lambda_k \int_0^{\infty}\exp\{-\sum_{i=1}^n \lambda_i x_k\}dx_k \\ &= \lambda_k/\sum_{i=1}^n \lambda_i \end{aligned}\]

Statistical Computing HW4

i) Use the Monte Carlo integration with random variables drawn from the bivariate Gaussian copula (Algorithm A) until the (estimated) variance of the estimator is less than \(10^{−5}\).

ii) Use the Monte Carlo integration with random variables drawn from the iid uniform distributions (Algorithm B) until the (estimated) variance of the estimator is less than \(10^{−5}\).