MCMC Estimation of the Binary Outcome Misclassification Model
COMBO_MCMC.Rd
Jointly estimate \(\beta\) and \(\gamma\) parameters from the true outcome and observation mechanisms, respectively, in a binary outcome misclassification model.
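For orientation, the two mechanisms can be sketched as logistic models. This is read off the data-generating steps in the Examples section, not a formal statement taken from the package. It assumes that \(x_i\) and \(z_i\) carry a leading 1 for the intercept, that category 2 is the reference, and that \(\gamma_j\) denotes the \(\gamma\) coefficients for true outcome category \(j\):

\[
P(Y_i = 1 \mid x_i) = \frac{\exp(x_i^\top \beta)}{1 + \exp(x_i^\top \beta)}, \qquad P(Y_i = 2 \mid x_i) = 1 - P(Y_i = 1 \mid x_i)
\]

\[
P(Y_i^* = 1 \mid Y_i = j, z_i) = \frac{\exp(z_i^\top \gamma_j)}{1 + \exp(z_i^\top \gamma_j)}, \qquad j \in \{1, 2\}
\]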
Usage
COMBO_MCMC(
  Ystar,
  x_matrix,
  z_matrix,
  prior,
  beta_prior_parameters,
  gamma_prior_parameters,
  number_MCMC_chains = 4,
  MCMC_sample = 2000,
  burn_in = 1000,
  display_progress = TRUE
)
Arguments
- Ystar
A numeric vector of indicator variables (1, 2) for the observed outcome Y*. The reference category is 2.
- x_matrix
A numeric matrix of covariates in the true outcome mechanism. x_matrix should not contain an intercept.
- z_matrix
A numeric matrix of covariates in the observation mechanism. z_matrix should not contain an intercept.
- prior
A character string specifying the prior distribution for the \(\beta\) and \(\gamma\) parameters. Options are "t", "uniform", "normal", or "dexp" (double exponential, also known as the Laplace distribution).
- beta_prior_parameters
A numeric list of prior distribution parameters for the \(\beta\) terms. For prior distributions "t", "uniform", "normal", or "dexp", the first element of the list should contain a matrix of location, lower bound, mean, or shape parameters, respectively, for the \(\beta\) terms, and the second element should contain a matrix of shape, upper bound, standard deviation, or scale parameters, respectively. For prior distribution "t", the third element of the list should contain a matrix of the degrees of freedom for the \(\beta\) terms. The third list element should be empty for all other prior distributions. All matrices in the list should have dimensions n_cat X dim_x, and all elements in the n_cat row should be set to NA.
- gamma_prior_parameters
A numeric list of prior distribution parameters for the \(\gamma\) terms. For prior distributions "t", "uniform", "normal", or "dexp", the first element of the list should contain an array of location, lower bound, mean, or shape parameters, respectively, for the \(\gamma\) terms, and the second element should contain an array of shape, upper bound, standard deviation, or scale parameters, respectively. For prior distribution "t", the third element of the list should contain an array of the degrees of freedom for the \(\gamma\) terms. The third list element should be empty for all other prior distributions. All arrays in the list should have dimensions n_cat X n_cat X dim_z, and all elements in the n_cat row should be set to NA.
- number_MCMC_chains
An integer specifying the number of MCMC chains to compute. The default is 4.
- MCMC_sample
An integer specifying the number of MCMC samples to draw. The default is 2000.
- burn_in
An integer specifying the number of MCMC samples to discard for the burn-in period. The default is 1000.
- display_progress
A logical value specifying whether messages should be displayed during model compilation. The default is TRUE.
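The shape rules for the prior parameter lists can be sketched in base R. The snippet below builds three-element lists for the "t" prior. The helper `fill_with_reference_na`, the list element names, and all numeric values are illustrative assumptions, not part of COMBO. The sizes `dim_x = 2` and `dim_z = 2` assume one covariate plus an intercept, which is how the matrices in the Examples section appear to be sized.

```r
# Illustrative sizes (assumptions): binary outcome, one covariate plus an intercept.
n_cat <- 2
dim_x <- 2
dim_z <- 2

# Hypothetical helper (not part of COMBO): fill a matrix or 3-d array with one
# value, then set the reference-category (n_cat) row to NA.
fill_with_reference_na <- function(value, dims) {
  out <- array(value, dim = dims)
  if (length(dims) == 2) {
    out[dims[1], ] <- NA
  } else {
    out[dims[1], , ] <- NA
  }
  out
}

# "t" prior: location, shape, and degrees of freedom, in that order.
# The numeric values are placeholders, not recommendations.
beta_prior_parameters <- list(
  location = fill_with_reference_na(0, c(n_cat, dim_x)),
  shape    = fill_with_reference_na(1, c(n_cat, dim_x)),
  df       = fill_with_reference_na(3, c(n_cat, dim_x))
)

gamma_prior_parameters <- list(
  location = fill_with_reference_na(0, c(n_cat, n_cat, dim_z)),
  shape    = fill_with_reference_na(1, c(n_cat, n_cat, dim_z)),
  df       = fill_with_reference_na(3, c(n_cat, n_cat, dim_z))
)

# Inspect the resulting dimensions and NA placement.
str(beta_prior_parameters)
str(gamma_prior_parameters)
```

For the other priors, the same construction would apply with only the first two list elements.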
Value
COMBO_MCMC returns a list of the posterior samples and posterior means for both the binary outcome misclassification model and a naive logistic regression of the observed outcome, Y*, predicted by the matrix x.
The list contains the following components:
- posterior_sample_df
A data frame containing three columns. The first column indicates the chain from which a sample is taken, from 1 to number_MCMC_chains. The second column specifies the parameter associated with a given row. The \(\beta\) terms have dimensions n_cat X dim_x. The \(\gamma\) terms have dimensions n_cat X n_cat X dim_z, where the first index specifies the observed outcome category and the second index specifies the true outcome category. The final column provides the MCMC sample.
- posterior_means_df
A data frame containing three columns. The first column specifies the parameter associated with a given row. Parameters are indexed as in the posterior_sample_df. The second column provides the posterior mean computed across all chains and all samples. The final column provides the posterior median computed across all chains and all samples.
- naive_posterior_sample_df
A data frame containing three columns. The first column indicates the chain from which a sample is taken, from 1 to number_MCMC_chains. The second column specifies the parameter associated with a given row. Naive \(\beta\) terms have dimensions dim_x X n_cat. The final column provides the MCMC sample.
- naive_posterior_means_df
A data frame containing three columns. The first column specifies the naive parameter associated with a given row. Parameters are indexed as in the naive_posterior_sample_df. The second column provides the posterior mean computed across all chains and all samples. The final column provides the posterior median computed across all chains and all samples.
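Because the sample data frames are in long format (one row per chain, parameter, and draw), further summaries can be computed by pooling draws per parameter. The sketch below uses a mock data frame, not real COMBO output. The column names `chain_number`, `parameter_name`, and `sample` are assumptions and should be checked against the names of the returned object.

```r
# Mock stand-in for posterior_sample_df; column names are assumed, values are simulated.
set.seed(42)
mock_posterior_sample_df <- data.frame(
  chain_number   = rep(rep(1:2, each = 100), times = 2),
  parameter_name = rep(c("beta[1,1]", "beta[1,2]"), each = 200),
  sample         = c(rnorm(200, mean = 1, sd = 0.2),
                     rnorm(200, mean = -2, sd = 0.3))
)

# Summaries pooled across all chains and all samples for one parameter.
summarise_draws <- function(draws) {
  c(posterior_mean   = mean(draws),
    posterior_median = median(draws),
    lower_95         = unname(quantile(draws, 0.025)),
    upper_95         = unname(quantile(draws, 0.975)))
}

# Split the draws by parameter, then summarise each group.
draws_by_parameter <- split(mock_posterior_sample_df$sample,
                            mock_posterior_sample_df$parameter_name)
summary_matrix <- t(sapply(draws_by_parameter, summarise_draws))
summary_matrix
```

The mean and median columns correspond to what the posterior means data frame is described as reporting. The equal-tailed 95% interval is an extra shown only for illustration.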
Examples
# \donttest{
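set.seed(123)

# Simulation settings
n <- 1000
x_mu <- 0
x_sigma <- 1
z_shape <- 1

# True parameter values; each column of true_gamma corresponds to one true outcome category
true_beta <- matrix(c(1, -2), ncol = 1)
true_gamma <- matrix(c(.5, 1, -.5, -1), nrow = 2, byrow = FALSE)

# Covariates without an intercept (x_matrix, z_matrix) and design matrices with one (X, Z)
x_matrix <- matrix(rnorm(n, x_mu, x_sigma), ncol = 1)
X <- matrix(c(rep(1, n), x_matrix[, 1]), ncol = 2, byrow = FALSE)
z_matrix <- matrix(rgamma(n, z_shape), ncol = 1)
Z <- matrix(c(rep(1, n), z_matrix[, 1]), ncol = 2, byrow = FALSE)

# True outcome probabilities: column 1 is P(Y = 1), column 2 is P(Y = 2)
exp_xb <- exp(X %*% true_beta)
pi_result <- exp_xb[, 1] / (exp_xb[, 1] + 1)
pi_matrix <- matrix(c(pi_result, 1 - pi_result), ncol = 2, byrow = FALSE)

# Draw the true outcome (1 or 2) for each observation
true_Y <- rep(NA, n)
for (i in 1:n) {
  true_Y[i] <- which(stats::rmultinom(1, 1, pi_matrix[i, ]) == 1)
}

# Observation probabilities: column j holds P(Y* = 1 | Y = j) in rows 1 to n
# and P(Y* = 2 | Y = j) in rows n + 1 to 2n
exp_zg <- exp(Z %*% true_gamma)
pistar_denominator <- matrix(c(1 + exp_zg[, 1], 1 + exp_zg[, 2]), ncol = 2, byrow = FALSE)
pistar_result <- exp_zg / pistar_denominator
pistar_matrix <- matrix(c(pistar_result[, 1], 1 - pistar_result[, 1],
                          pistar_result[, 2], 1 - pistar_result[, 2]),
                        ncol = 2, byrow = FALSE)

# Draw the observed (possibly misclassified) outcome given the true outcome
obs_Y <- rep(NA, n)
for (i in 1:n) {
  true_j <- true_Y[i]
  obs_Y[i] <- which(stats::rmultinom(1, 1,
                                     pistar_matrix[c(i, n + i), true_j]) == 1)
}

Ystar <- obs_Y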

# Uniform(-5, 5) priors; the n_cat (reference category) row is set to NA
unif_lower_beta <- matrix(c(-5, -5, NA, NA), nrow = 2, byrow = TRUE)
unif_upper_beta <- matrix(c(5, 5, NA, NA), nrow = 2, byrow = TRUE)

unif_lower_gamma <- array(data = c(-5, NA, -5, NA, -5, NA, -5, NA),
                          dim = c(2, 2, 2))
unif_upper_gamma <- array(data = c(5, NA, 5, NA, 5, NA, 5, NA),
                          dim = c(2, 2, 2))

beta_prior_parameters <- list(lower = unif_lower_beta, upper = unif_upper_beta)
gamma_prior_parameters <- list(lower = unif_lower_gamma, upper = unif_upper_gamma)

# Short chains keep the example fast; use more samples in practice
MCMC_results <- COMBO_MCMC(Ystar, x_matrix = x_matrix, z_matrix = z_matrix,
                           prior = "uniform",
                           beta_prior_parameters = beta_prior_parameters,
                           gamma_prior_parameters = gamma_prior_parameters,
                           number_MCMC_chains = 2,
                           MCMC_sample = 200, burn_in = 100)
#> Compiling model graph
#> Resolving undeclared variables
#> Allocating nodes
#> Graph information:
#> Observed stochastic nodes: 1000
#> Unobserved stochastic nodes: 6
#> Total graph size: 35030
#>
#> Initializing model
#>
#> Compiling model graph
#> Resolving undeclared variables
#> Allocating nodes
#> Graph information:
#> Observed stochastic nodes: 1000
#> Unobserved stochastic nodes: 2
#> Total graph size: 12013
#>
#> Initializing model
#>
MCMC_results$posterior_means_df
#> # A tibble: 6 × 3
#>   parameter_name posterior_mean posterior_median
#>   <fct>                   <dbl>            <dbl>
#> 1 beta[1,1]              1.05             1.07
#> 2 beta[1,2]             -2.40            -2.39
#> 3 gamma[1,1,1]           0.538            0.552
#> 4 gamma[1,2,1]          -0.0422          -0.0907
#> 5 gamma[1,1,2]           0.982            0.904
#> 6 gamma[1,2,2]          -1.61            -1.43
# }
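One way to read the \(\gamma\) estimates is to convert them into classification probabilities at a chosen covariate value. The sketch below hard-codes the posterior means printed above and assumes the logistic form used to simulate the data in this example, with the third \(\gamma\) index ordered as intercept then slope. The variable names and the value `z_value = 1` are illustrative.

```r
# Posterior means copied from the printed output above.
gamma_true1 <- c(intercept = 0.538, slope = 0.982)    # gamma[1,1,1], gamma[1,1,2]
gamma_true2 <- c(intercept = -0.0422, slope = -1.61)  # gamma[1,2,1], gamma[1,2,2]

# Covariate value at which to evaluate the observation mechanism (arbitrary choice).
z_value <- 1

# P(Y* = 1 | Y = j, z) under the assumed logistic form; plogis() is the inverse logit.
p_obs1_given_true1 <- plogis(gamma_true1[["intercept"]] + gamma_true1[["slope"]] * z_value)
p_obs1_given_true2 <- plogis(gamma_true2[["intercept"]] + gamma_true2[["slope"]] * z_value)

# Probability that each true category is observed correctly at this z.
p_correct_given_true1 <- p_obs1_given_true1
p_correct_given_true2 <- 1 - p_obs1_given_true2

c(true_1 = p_correct_given_true1, true_2 = p_correct_given_true2)
```

Plugging in posterior means gives only a point summary. Applying the same transformation to each posterior draw would carry the uncertainty through.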