Predictive Value Weighting Estimation of the Binary Mediator Misclassification Model
COMMA_PVW.Rd
Estimate \(\beta\), \(\gamma\), and \(\theta\) parameters from the true mediator, observed mediator, and outcome mechanisms, respectively, in a binary mediator misclassification model using a predictive value weighting approach.
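As a rough illustration of the idea behind predictive value weighting, the sketch below computes, for each observation, the predictive value P(M = 1 | M*, x, c, z) by Bayes' rule. It then fits a weighted logistic outcome model to data stacked once per true mediator category. It is a minimal sketch under stated assumptions, not the COMMA_PVW implementation. Logistic models are assumed for both mediator mechanisms, with category 2 as the reference. The mediator is assumed to enter the outcome model as the indicator m = 1 when M = 1. The beta and gamma parameters are fixed at the example's true values rather than estimated. The names expit, pvw_weights, and c1 are hypothetical.

```r
# Minimal PVW sketch (assumptions, not COMMA internals): logistic models for
# P(M = 1 | x, c) and P(M* = 1 | M = m, z); category 2 is the reference.
expit <- function(eta) 1 / (1 + exp(-eta))

# Hypothetical helper: predictive value P(M = 1 | M*, x, c, z) via Bayes' rule.
pvw_weights <- function(Mstar, x, c1, z, beta, gamma) {
  p_m1 <- expit(beta[1] + beta[2] * x + beta[3] * c1)  # P(M = 1 | x, c)
  p_star_m1 <- expit(gamma[1, 1] + gamma[2, 1] * z)    # P(M* = 1 | M = 1, z)
  p_star_m2 <- expit(gamma[1, 2] + gamma[2, 2] * z)    # P(M* = 1 | M = 2, z)
  # Probability of the observed M* under each true mediator category.
  lik_m1 <- ifelse(Mstar == 1, p_star_m1, 1 - p_star_m1)
  lik_m2 <- ifelse(Mstar == 1, p_star_m2, 1 - p_star_m2)
  lik_m1 * p_m1 / (lik_m1 * p_m1 + lik_m2 * (1 - p_m1))
}

# Simulated data loosely following the example's parameter values.
set.seed(1)
n <- 1000
true_beta <- c(1, -2, 0.5)
true_gamma <- matrix(c(1, 1, -0.5, -1.5), nrow = 2)
x <- rnorm(n)
c1 <- rgamma(n, shape = 1)
z <- rgamma(n, shape = 1)
M <- ifelse(runif(n) < expit(true_beta[1] + true_beta[2] * x + true_beta[3] * c1), 1, 2)
p_star <- ifelse(M == 1, expit(true_gamma[1, 1] + true_gamma[2, 1] * z),
                 expit(true_gamma[1, 2] + true_gamma[2, 2] * z))
Mstar <- ifelse(runif(n) < p_star, 1, 2)
m <- as.numeric(M == 1)  # assumed 0/1 coding of the mediator in the outcome model
y <- rbinom(n, 1, expit(1 + 1.5 * x - 2 * m - 0.2 * c1))

# Weights at the true beta and gamma; in practice these parameters are
# unknown and would have to be estimated as well.
w <- pvw_weights(Mstar, x, c1, z, true_beta, true_gamma)

# Stack each observation once with m = 1 (weight w) and once with m = 0
# (weight 1 - w), then fit a weighted logistic regression for theta.
stacked <- data.frame(y = c(y, y), x = c(x, x), c1 = c(c1, c1),
                      m = rep(c(1, 0), each = n), w = c(w, 1 - w))
# Fractional weights trigger the "non-integer #successes" warning.
fit <- suppressWarnings(
  glm(y ~ x + m + c1, family = binomial(), data = stacked, weights = w))
theta_hat <- coef(fit)
```

The stacking step is one common way to apply predictive value weights: each row contributes to the outcome likelihood under both possible true mediator values, in proportion to how plausible each value is given the observed M*.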
Usage
COMMA_PVW(
Mstar,
outcome,
outcome_distribution,
interaction_indicator,
x_matrix,
z_matrix,
c_matrix,
beta_start,
gamma_start,
theta_start,
tolerance = 1e-07,
max_em_iterations = 1500,
em_method = "squarem"
)
Arguments
- Mstar
A numeric vector of indicator variables (1, 2) for the observed mediator M*. There should be no NA terms. The reference category is 2.
- outcome
A vector containing the outcome variable of interest. There should be no NA terms.
- outcome_distribution
A character string specifying the distribution of the outcome variable. Options are "Bernoulli", "Poisson", or "Normal".
- interaction_indicator
A logical value indicating whether the outcome mechanism for y includes an interaction between x and m.
- x_matrix
A numeric matrix of predictors in the true mediator and outcome mechanisms. x_matrix should not contain an intercept, and no values should be NA.
- z_matrix
A numeric matrix of covariates in the observation mechanism. z_matrix should not contain an intercept, and no values should be NA.
- c_matrix
A numeric matrix of covariates in the true mediator and outcome mechanisms. c_matrix should not contain an intercept, and no values should be NA.
- beta_start
A numeric vector or column matrix of starting values for the \(\beta\) parameters in the true mediator mechanism. The number of elements in beta_start should equal the number of columns of x_matrix plus the number of columns of c_matrix plus 1. Starting values should be provided in the following order: intercept, slope coefficient for the x_matrix term, slope coefficient for the first column of the c_matrix, ..., slope coefficient for the final column of the c_matrix.
- gamma_start
A numeric vector or matrix of starting values for the \(\gamma\) parameters in the observation mechanism. In matrix form, the rows of gamma_start hold the parameters for the observed mediator category M* = 1 (one row per column of z_matrix, plus 1 for the intercept), and the columns correspond to the true mediator categories \(M \in \{1, 2\}\). A numeric vector for gamma_start is obtained by concatenating the columns of the gamma matrix, i.e. gamma_start <- c(gamma_matrix). Starting values should be provided in the following order within each column: intercept, slope coefficient for the first column of the z_matrix, ..., slope coefficient for the final column of the z_matrix.
- theta_start
A numeric vector or column matrix of starting values for the \(\theta\) parameters in the outcome mechanism. The number of elements in theta_start should equal the number of columns of x_matrix plus the number of columns of c_matrix plus 2 (if interaction_indicator is FALSE) or plus 3 (if interaction_indicator is TRUE). Starting values should be provided in the following order: intercept, slope coefficient for the x_matrix term, slope coefficient for the mediator m term, slope coefficient for the first column of the c_matrix, ..., slope coefficient for the final column of the c_matrix, and, optionally, slope coefficient for the xm interaction term.
- tolerance
A numeric value specifying the convergence threshold: estimation stops based on the difference between subsequent log-likelihood estimates. The default is 1e-7.
- max_em_iterations
A numeric value specifying the maximum number of EM iterations to run before estimation stops. The default is 1500.
- em_method
A character string specifying which EM algorithm will be applied. Options are "em", "squarem", or "pem". The default and recommended option is "squarem".
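The length rules for the starting values can be checked before calling COMMA_PVW. The helper check_start_lengths below is hypothetical (it is not part of COMMA) and simply encodes the counts stated in the argument descriptions for beta_start, gamma_start, and theta_start. It also shows that c() flattens a gamma matrix column by column.

```r
# Hypothetical helper (not part of COMMA): compares starting-value lengths
# with the counts given in the Arguments section.
check_start_lengths <- function(x_matrix, z_matrix, c_matrix,
                                beta_start, gamma_start, theta_start,
                                interaction_indicator = FALSE) {
  n_x <- ncol(as.matrix(x_matrix))
  n_z <- ncol(as.matrix(z_matrix))
  n_c <- ncol(as.matrix(c_matrix))
  expected <- c(beta  = n_x + n_c + 1,
                gamma = 2 * (n_z + 1),  # (n_z + 1) rows x 2 true categories
                theta = n_x + n_c + (if (interaction_indicator) 3 else 2))
  supplied <- c(beta  = length(beta_start),
                gamma = length(gamma_start),
                theta = length(theta_start))
  supplied == expected
}

# One column each for x, z, and c, as in the example.
x_matrix <- matrix(rnorm(10), ncol = 1)
z_matrix <- matrix(rgamma(10, shape = 1), ncol = 1)
c_matrix <- matrix(rgamma(10, shape = 1), ncol = 1)

# c() flattens a matrix column by column, so the vector order is
# gamma11, gamma21, gamma12, gamma22 (row = term, column = true M).
gamma_matrix <- matrix(c(1, 1, -.5, -1.5), nrow = 2)
gamma_start  <- c(gamma_matrix)

ok <- check_start_lengths(x_matrix, z_matrix, c_matrix,
                          beta_start = rep(1, 3),
                          gamma_start = gamma_start,
                          theta_start = rep(1, 4))
ok_interaction <- check_start_lengths(x_matrix, z_matrix, c_matrix,
                                      rep(1, 3), gamma_start, rep(1, 5),
                                      interaction_indicator = TRUE)
```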
Value
COMMA_PVW returns a data frame containing four columns. The first column, Parameter, gives the name of the parameter in each row. The second column, Estimates, contains the parameter estimates. The third column, Convergence, reports whether or not the algorithm converged for a given parameter estimate. The final column, Method, reports that the estimates were obtained from the "PVW" procedure.
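A sketch of working with the returned data frame: the Convergence column can be checked in one step, the estimates can be split by their name prefix, and the gamma estimates can be reshaped back into the two-column layout used for gamma_start. So that the snippet runs on its own, the data frame is rebuilt by hand from the printed example output; with a live fit, PVW_results from the example would be used directly. The reshaping uses nrow = 2 only because the example's z_matrix has a single column.

```r
# Rebuilt by hand from the printed example output (for illustration only).
PVW_results <- data.frame(
  Parameter = c("beta_1", "beta_2", "beta_3",
                "gamma11", "gamma21", "gamma12", "gamma22",
                "theta_0", "theta_x1", "theta_m", "theta_c1"),
  Estimates = c(0.8272721, -1.6154039, 0.3586729,
                1.2279060, 1.3535571, -0.4846708, -1.4126826,
                0.6481385, 1.7419080, -1.6710536, -0.1796049),
  Convergence = TRUE,
  Method = "PVW",
  stringsAsFactors = FALSE
)

# Did every parameter converge?
all_converged <- all(PVW_results$Convergence)

# Split the estimates by mechanism using the parameter-name prefix.
is_beta  <- startsWith(PVW_results$Parameter, "beta")
is_gamma <- startsWith(PVW_results$Parameter, "gamma")
is_theta <- startsWith(PVW_results$Parameter, "theta")
beta_hat  <- PVW_results$Estimates[is_beta]
theta_hat <- setNames(PVW_results$Estimates[is_theta],
                      PVW_results$Parameter[is_theta])

# Undo c(gamma_matrix): refill the matrix column by column
# (rows: intercept, z slope; columns: true mediator M = 1, M = 2).
gamma_hat <- matrix(PVW_results$Estimates[is_gamma], nrow = 2)
```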
Examples
set.seed(20240709)
sample_size <- 2000
n_cat <- 2 # Number of categories in the binary mediator
# Data generation settings
x_mu <- 0
x_sigma <- 1
z_shape <- 1
c_shape <- 1
# True parameter values (gamma terms set the misclassification rate)
true_beta <- matrix(c(1, -2, .5), ncol = 1)
true_gamma <- matrix(c(1, 1, -.5, -1.5), nrow = 2, byrow = FALSE)
true_theta <- matrix(c(1, 1.5, -2, -.2), ncol = 1)
example_data <- COMMA_data(sample_size, x_mu, x_sigma, z_shape, c_shape,
interaction_indicator = FALSE,
outcome_distribution = "Bernoulli",
true_beta, true_gamma, true_theta)
beta_start <- matrix(rep(1, 3), ncol = 1)
gamma_start <- matrix(rep(1, 4), nrow = 2, ncol = 2)
theta_start <- matrix(rep(1, 4), ncol = 1)
Mstar <- example_data[["obs_mediator"]]
outcome <- example_data[["outcome"]]
x_matrix <- example_data[["x"]]
z_matrix <- example_data[["z"]]
c_matrix <- example_data[["c"]]
PVW_results <- COMMA_PVW(Mstar, outcome, outcome_distribution = "Bernoulli",
interaction_indicator = FALSE,
x_matrix, z_matrix, c_matrix,
beta_start, gamma_start, theta_start)
#> Warning: non-integer #successes in a binomial glm!
PVW_results
#>    Parameter  Estimates Convergence Method
#> 1     beta_1  0.8272721        TRUE    PVW
#> 2     beta_2 -1.6154039        TRUE    PVW
#> 3     beta_3  0.3586729        TRUE    PVW
#> 4    gamma11  1.2279060        TRUE    PVW
#> 5    gamma21  1.3535571        TRUE    PVW
#> 6    gamma12 -0.4846708        TRUE    PVW
#> 7    gamma22 -1.4126826        TRUE    PVW
#> 8    theta_0  0.6481385        TRUE    PVW
#> 9   theta_x1  1.7419080        TRUE    PVW
#> 10   theta_m -1.6710536        TRUE    PVW
#> 11  theta_c1 -0.1796049        TRUE    PVW
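Because the example simulates data from known parameter values, the estimates can be set beside the truth. In this minimal sketch, the printed estimates are copied in directly so that it runs on its own; in a live session PVW_results$Estimates would be used instead. The true values are flattened in the same order as the output rows: beta, then c(true_gamma) column by column, then theta. This is a single simulated data set, so the differences reflect sampling variability as well as any estimator bias.

```r
# Estimates copied from the printed PVW_results; in a live session use
# PVW_results$Estimates instead.
estimates <- c(0.8272721, -1.6154039, 0.3586729,
               1.2279060, 1.3535571, -0.4846708, -1.4126826,
               0.6481385, 1.7419080, -1.6710536, -0.1796049)

# True values from the example, flattened in the output's row order.
true_beta  <- matrix(c(1, -2, .5), ncol = 1)
true_gamma <- matrix(c(1, 1, -.5, -1.5), nrow = 2, byrow = FALSE)
true_theta <- matrix(c(1, 1.5, -2, -.2), ncol = 1)
truth <- c(c(true_beta), c(true_gamma), c(true_theta))

comparison <- data.frame(
  Parameter = c("beta_1", "beta_2", "beta_3",
                "gamma11", "gamma21", "gamma12", "gamma22",
                "theta_0", "theta_x1", "theta_m", "theta_c1"),
  Truth = truth,
  Estimate = estimates,
  Difference = round(estimates - truth, 4)
)
print(comparison)
```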