MCMC Estimation of the Two-Stage Binary Outcome Misclassification Model
COMBO_MCMC_2stage.Rd
Jointly estimate \(\beta\), \(\gamma^{(1)}\), and \(\gamma^{(2)}\) parameters from the true outcome first-stage observation, and second-stage observation mechanisms, respectively, in a two-stage binary outcome misclassification model.
Usage
COMBO_MCMC_2stage(
Ystar1,
Ystar2,
x_matrix,
z1_matrix,
z2_matrix,
prior,
beta_prior_parameters,
gamma1_prior_parameters,
gamma2_prior_parameters,
naive_gamma2_prior_parameters,
number_MCMC_chains = 4,
MCMC_sample = 2000,
burn_in = 1000,
display_progress = TRUE
)
Arguments
- Ystar1
A numeric vector of indicator variables (1, 2) for the observed outcome \(Y^{*(1)}\). The reference category is 2.
- Ystar2
A numeric vector of indicator variables (1, 2) for the second-stage observed outcome \(Y^{*(2)}\). There should be no
NA
terms. The reference category is 2.- x_matrix
A numeric matrix of covariates in the true outcome mechanism.
x_matrix
should not contain an intercept.- z1_matrix
A numeric matrix of covariates in the observation mechanism.
z1_matrix
should not contain an intercept.- z2_matrix
A numeric matrix of covariates in the second-stage observation mechanism.
z2_matrix
should not contain an intercept and no values should beNA
.- prior
A character string specifying the prior distribution for the \(\beta\), \(\gamma^{(1)}\), and \(\gamma^{(2)}\) parameters. Options are
"t"
,"uniform"
,"normal"
, or"dexp"
(double Exponential, or Weibull).- beta_prior_parameters
A numeric list of prior distribution parameters for the \(\beta\) terms. For prior distributions
"t"
,"uniform"
,"normal"
, or"dexp"
, the first element of the list should contain a matrix of location, lower bound, mean, or shape parameters, respectively, for \(\beta\) terms. For prior distributions"t"
,"uniform"
,"normal"
, or"dexp"
, the second element of the list should contain a matrix of shape, upper bound, standard deviation, or scale parameters, respectively, for \(\beta\) terms. For prior distribution"t"
, the third element of the list should contain a matrix of the degrees of freedom for \(\beta\) terms. The third list element should be empty for all other prior distributions. All matrices in the list should have dimensionsn_cat
Xdim_x
, and all elements in then_cat
row should be set toNA
.- gamma1_prior_parameters
A numeric list of prior distribution parameters for the \(\gamma^{(1)}\) terms. For prior distributions
"t"
,"uniform"
,"normal"
, or"dexp"
, the first element of the list should contain an array of location, lower bound, mean, or shape parameters, respectively, for \(\gamma^{(1)}\) terms. For prior distributions"t"
,"uniform"
,"normal"
, or"dexp"
, the second element of the list should contain an array of shape, upper bound, standard deviation, or scale parameters, respectively, for \(\gamma^{(1)}\) terms. For prior distribution"t"
, the third element of the list should contain an array of the degrees of freedom for \(\gamma^{(1)}\) terms. The third list element should be empty for all other prior distributions. All arrays in the list should have dimensionsn_cat
Xn_cat
Xdim_z1
, and all elements in then_cat
row should be set toNA
.- gamma2_prior_parameters
A numeric list of prior distribution parameters for the \(\gamma^{(2)}\) terms. For prior distributions
"t"
,"uniform"
,"normal"
, or"dexp"
, the first element of the list should contain an array of location, lower bound, mean, or shape parameters, respectively, for \(\gamma^{(2)}\) terms. For prior distributions"t"
,"uniform"
,"normal"
, or"dexp"
, the second element of the list should contain an array of shape, upper bound, standard deviation, or scale parameters, respectively, for \(\gamma^{(2)}\) terms. For prior distribution"t"
, the third element of the list should contain an array of the degrees of freedom for \(\gamma^{(2)}\) terms. The third list element should be empty for all other prior distributions. All arrays in the list should have dimensionsn_cat
Xn_cat
Xn_cat
Xdim_z2
, and all elements in then_cat
row should be set toNA
.- naive_gamma2_prior_parameters
A numeric list of prior distribution parameters for the naive model \(\gamma^{(2)}\) terms. For prior distributions
"t"
,"uniform"
,"normal"
, or"dexp"
, the first element of the list should contain an array of location, lower bound, mean, or shape parameters, respectively, for naive \(\gamma^{(2)}\) terms. For prior distributions"t"
,"uniform"
,"normal"
, or"dexp"
, the second element of the list should contain an array of shape, upper bound, standard deviation, or scale parameters, respectively, for naive \(\gamma^{(2)}\) terms. For prior distribution"t"
, the third element of the list should contain an array of the degrees of freedom for naive \(\gamma^{(2)}\) terms. The third list element should be empty for all other prior distributions. All arrays in the list should have dimensionsn_cat
Xn_cat
Xdim_z2
, and all elements in then_cat
row should be set toNA
. Note that prior distributions for the naive \(\beta\) terms are inherted from thebeta_prior_parameters
argument.- number_MCMC_chains
An integer specifying the number of MCMC chains to compute. The default is
4
.- MCMC_sample
An integer specifying the number of MCMC samples to draw. The default is
2000
.- burn_in
An integer specifying the number of MCMC samples to discard for the burn-in period. The default is
1000
.- display_progress
A logical value specifying whether messages should be displayed during model compilation. The default is
TRUE
.
Value
COMBO_MCMC_2stage
returns a list of the posterior samples and posterior
means for both the binary outcome misclassification model and a naive logistic
regression of the observed outcome, Y*
, predicted by the matrix x
.
The list contains the following components:
- posterior_sample_df
A data frame containing three columns. The first column indicates the chain from which a sample is taken, from 1 to
number_MCMC_chains
. The second column specifies the parameter associated with a given row. \(\beta\) terms have dimensionsdim_x
Xn_cat
. The \(\gamma^{(1)}\) terms have dimensionsn_cat
Xn_cat
Xdim_z1
, where the first index specifies the first-stage observed outcome category and the second index specifies the true outcome category. The \(\gamma^{(2)}\) terms have dimensionsn_cat
Xn_cat
Xn_cat
Xdim_z2
, where the first index specifies the second-stage observed outcome category, the second index specifies the first-stage observed outcome category, and the third index specifies the true outcome category. The final column provides the MCMC sample.- posterior_means_df
A data frame containing three columns. The first column specifies the parameter associated with a given row. Parameters are indexed as in the
posterior_sample_df
. The second column provides the posterior mean computed across all chains and all samples. The final column provides the posterior median computed across all chains and all samples.- naive_posterior_sample_df
A data frame containing three columns. The first column indicates the chain from which a sample is taken, from 1 to
number_MCMC_chains
. The second column specifies the parameter associated with a given row. Naive \(\beta\) terms have dimensionsdim_x
Xn_cat
. The final column provides the MCMC sample.- naive_posterior_means_df
A data frame containing three columns. The first column specifies the naive parameter associated with a given row. Parameters are indexed as in the
naive_posterior_sample_df
. The second column provides the posterior mean computed across all chains and all samples. The final column provides the posterior median computed across all chains and all samples.
Examples
# \donttest{
# Helper functions
sum_every_n <- function(x, n){
vector_groups = split(x,
ceiling(seq_along(x) / n))
sum_x = Reduce(`+`, vector_groups)
return(sum_x)
}
sum_every_n1 <- function(x, n){
vector_groups = split(x,
ceiling(seq_along(x) / n))
sum_x = Reduce(`+`, vector_groups) + 1
return(sum_x)
}
# Example
set.seed(123)
n <- 1000
x_mu <- 0
x_sigma <- 1
z1_shape <- 1
z2_shape <- 1
true_beta <- matrix(c(1, -2), ncol = 1)
true_gamma1 <- matrix(c(.5, 1, -.5, -1), nrow = 2, byrow = FALSE)
true_gamma2 <- array(c(1.5, 1, .5, .5, -.5, 0, -1, -1), dim = c(2, 2, 2))
x_matrix = matrix(rnorm(n, x_mu, x_sigma), ncol = 1)
X = matrix(c(rep(1, n), x_matrix[,1]), ncol = 2, byrow = FALSE)
z1_matrix = matrix(rgamma(n, z1_shape), ncol = 1)
Z1 = matrix(c(rep(1, n), z1_matrix[,1]), ncol = 2, byrow = FALSE)
z2_matrix = matrix(rgamma(n, z2_shape), ncol = 1)
Z2 = matrix(c(rep(1, n), z2_matrix[,1]), ncol = 2, byrow = FALSE)
exp_xb = exp(X %*% true_beta)
pi_result = exp_xb[,1] / (exp_xb[,1] + 1)
pi_matrix = matrix(c(pi_result, 1 - pi_result), ncol = 2, byrow = FALSE)
true_Y <- rep(NA, n)
for(i in 1:n){
true_Y[i] = which(stats::rmultinom(1, 1, pi_matrix[i,]) == 1)
}
exp_z1g1 = exp(Z1 %*% true_gamma1)
pistar1_denominator = matrix(c(1 + exp_z1g1[,1], 1 + exp_z1g1[,2]),
ncol = 2, byrow = FALSE)
pistar1_result = exp_z1g1 / pistar1_denominator
pistar1_matrix = matrix(c(pistar1_result[,1], 1 - pistar1_result[,1],
pistar1_result[,2], 1 - pistar1_result[,2]),
ncol = 2, byrow = FALSE)
obs_Y1 <- rep(NA, n)
for(i in 1:n){
true_j = true_Y[i]
obs_Y1[i] = which(rmultinom(1, 1,
pistar1_matrix[c(i, n + i),
true_j]) == 1)
}
Ystar1 <- obs_Y1
exp_z2g2_1 = exp(Z2 %*% true_gamma2[,,1])
exp_z2g2_2 = exp(Z2 %*% true_gamma2[,,2])
pi_denominator1 = apply(exp_z2g2_1, FUN = sum_every_n1, n, MARGIN = 2)
pi_result1 = exp_z2g2_1 / rbind(pi_denominator1)
pi_denominator2 = apply(exp_z2g2_2, FUN = sum_every_n1, n, MARGIN = 2)
pi_result2 = exp_z2g2_2 / rbind(pi_denominator2)
pistar2_matrix1 = rbind(pi_result1,
1 - apply(pi_result1,
FUN = sum_every_n, n = n,
MARGIN = 2))
pistar2_matrix2 = rbind(pi_result2,
1 - apply(pi_result2,
FUN = sum_every_n, n = n,
MARGIN = 2))
pistar2_array = array(c(pistar2_matrix1, pistar2_matrix2),
dim = c(dim(pistar2_matrix1), 2))
obs_Y2 <- rep(NA, n)
for(i in 1:n){
true_j = true_Y[i]
obs_k = Ystar1[i]
obs_Y2[i] = which(rmultinom(1, 1,
pistar2_array[c(i,n+ i),
obs_k, true_j]) == 1)
}
Ystar2 <- obs_Y2
unif_lower_beta <- matrix(c(-5, -5, NA, NA), nrow = 2, byrow = TRUE)
unif_upper_beta <- matrix(c(5, 5, NA, NA), nrow = 2, byrow = TRUE)
unif_lower_gamma1 <- array(data = c(-5, NA, -5, NA, -5, NA, -5, NA),
dim = c(2,2,2))
unif_upper_gamma1 <- array(data = c(5, NA, 5, NA, 5, NA, 5, NA),
dim = c(2,2,2))
unif_upper_gamma2 <- array(rep(c(5, NA), 8), dim = c(2,2,2,2))
unif_lower_gamma2 <- array(rep(c(-5, NA), 8), dim = c(2,2,2,2))
unif_lower_naive_gamma2 <- array(data = c(-5, NA, -5, NA, -5, NA, -5, NA),
dim = c(2,2,2))
unif_upper_naive_gamma2 <- array(data = c(5, NA, 5, NA, 5, NA, 5, NA),
dim = c(2,2,2))
beta_prior_parameters <- list(lower = unif_lower_beta, upper = unif_upper_beta)
gamma1_prior_parameters <- list(lower = unif_lower_gamma1, upper = unif_upper_gamma1)
gamma2_prior_parameters <- list(lower = unif_lower_gamma2, upper = unif_upper_gamma2)
naive_gamma2_prior_parameters <- list(lower = unif_lower_naive_gamma2,
upper = unif_upper_naive_gamma2)
MCMC_results <- COMBO_MCMC_2stage(Ystar1, Ystar2,
x_matrix = x_matrix, z1_matrix = z1_matrix,
z2_matrix = z2_matrix,
prior = "uniform",
beta_prior_parameters = beta_prior_parameters,
gamma1_prior_parameters = gamma1_prior_parameters,
gamma2_prior_parameters = gamma2_prior_parameters,
naive_gamma2_prior_parameters = naive_gamma2_prior_parameters,
number_MCMC_chains = 2,
MCMC_sample = 200, burn_in = 100)
#> Compiling model graph
#> Resolving undeclared variables
#> Allocating nodes
#> Graph information:
#> Observed stochastic nodes: 2000
#> Unobserved stochastic nodes: 14
#> Total graph size: 81063
#>
#> Initializing model
#>
#> Compiling model graph
#> Resolving undeclared variables
#> Allocating nodes
#> Graph information:
#> Observed stochastic nodes: 2000
#> Unobserved stochastic nodes: 6
#> Total graph size: 36030
#>
#> Initializing model
#>
MCMC_results$posterior_means_df# }
#> # A tibble: 14 × 3
#> parameter_name posterior_mean posterior_median
#> <fct> <dbl> <dbl>
#> 1 beta[1,1] 1.59 1.60
#> 2 beta[1,2] -2.18 -2.13
#> 3 gamma1[1,1,1] 0.340 0.329
#> 4 gamma1[1,2,1] -1.17 -1.15
#> 5 gamma1[1,1,2] 1.20 1.21
#> 6 gamma1[1,2,2] -2.15 -2.05
#> 7 gamma2[1,1,1,1] 0.853 0.766
#> 8 gamma2[1,2,1,1] 0.578 0.368
#> 9 gamma2[1,1,2,1] -0.342 -0.383
#> 10 gamma2[1,2,2,1] -0.933 -0.826
#> 11 gamma2[1,1,1,2] 1.18 1.17
#> 12 gamma2[1,2,1,2] 2.65 2.59
#> 13 gamma2[1,1,2,2] -0.656 -0.896
#> 14 gamma2[1,2,2,2] -2.48 -2.41