Estimate \(\beta\), \(\gamma\), and \(\theta\) parameters from the true mediator, observed mediator, and outcome mechanisms, respectively, in a binary mediator misclassification model using a predictive value weighting approach.


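COMMA_PVW(
  Mstar,
  outcome,
  outcome_distribution,
  interaction_indicator,
  x_matrix,
  z_matrix,
  c_matrix,
  beta_start,
  gamma_start,
  theta_start,
  tolerance = 1e-07,
  max_em_iterations = 1500,
  em_method = "squarem"
)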



A numeric vector of indicator variables (1, 2) for the observed mediator M*. There should be no NA terms. The reference category is 2.
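Data are often stored with a 0/1 mediator rather than the 1/2 coding required here. The following is a minimal sketch of the recoding, assuming a hypothetical 0/1 vector `mstar_01` in which 0 is the level that should become the reference category 2; which level plays that role in real data is an assumption to check.

```r
# Hypothetical observed mediator coded 0/1 (illustrative values only)
mstar_01 <- c(1, 0, 0, 1, 1, 0)

# Keep 1 as category 1 and map the assumed reference level 0 to category 2
Mstar <- ifelse(mstar_01 == 1, 1, 2)

# Checks that mirror the stated requirements: no NA values, only codes 1 and 2
stopifnot(!anyNA(Mstar), all(Mstar %in% c(1, 2)))
```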


A vector containing the outcome variables of interest. There should be no NA terms.


A character string specifying the distribution of the outcome variable. Options are "Bernoulli", "Poisson", or "Normal".


A logical value indicating whether an interaction between x and m should be included in the model for the outcome variable, y.


A numeric matrix of predictors in the true mediator and outcome mechanisms. x_matrix should not contain an intercept and no values should be NA.


A numeric matrix of covariates in the observation mechanism. z_matrix should not contain an intercept and no values should be NA.


A numeric matrix of covariates in the true mediator and outcome mechanisms. c_matrix should not contain an intercept and no values should be NA.


A numeric vector or column matrix of starting values for the \(\beta\) parameters in the true mediator mechanism. The number of elements in beta_start should equal the total number of columns in x_matrix and c_matrix, plus 1 for the intercept. Starting values should be provided in the following order: intercept, slope coefficient for the x_matrix term, slope coefficient for the first column of c_matrix, ..., slope coefficient for the final column of c_matrix.
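The length rule for beta_start can be checked directly from the design matrices. This is a minimal sketch with hypothetical simulated matrices (one predictor, two covariates); the starting value of 1 is an arbitrary choice for illustration.

```r
# Hypothetical design matrices: one predictor column and two covariate columns
x_matrix <- matrix(rnorm(10), ncol = 1)
c_matrix <- matrix(rnorm(20), ncol = 2)

# One intercept plus one slope per column of x_matrix and c_matrix
n_beta <- 1 + ncol(x_matrix) + ncol(c_matrix)

# Order: intercept, x slope, then one slope per c_matrix column
beta_start <- matrix(rep(1, n_beta), ncol = 1)
dim(beta_start)
```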


A numeric vector or matrix of starting values for the \(\gamma\) parameters in the observation mechanism. In matrix form, the rows of gamma_start correspond to the parameters for the M* = 1 observed mediator, so the number of rows equals the number of columns of z_matrix plus 1, and the columns correspond to the true mediator categories \(M \in \{1, 2\}\). A numeric vector for gamma_start is obtained by concatenating the columns of the gamma matrix, i.e. gamma_start <- c(gamma_matrix). Starting values should be provided in the following order within each column: intercept, slope coefficient for the first column of z_matrix, ..., slope coefficient for the final column of z_matrix.
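The matrix and vector forms of gamma_start can be hard to picture. The sketch below assumes a hypothetical z_matrix with two covariates and arbitrary starting values, and shows that c() flattens the matrix column by column, so all parameters for M = 1 come before those for M = 2.

```r
# Hypothetical z_matrix with two covariate columns
z_matrix <- matrix(rexp(20), ncol = 2)

# Rows: intercept plus one slope per z_matrix column
n_rows <- ncol(z_matrix) + 1

gamma_matrix <- matrix(
  c( 1.0,  0.5,  0.2,   # column 1: parameters given true mediator M = 1
    -1.0, -0.5, -0.2),  # column 2: parameters given true mediator M = 2
  nrow = n_rows, ncol = 2, byrow = FALSE
)

# Vector form: c() reads the matrix column by column
gamma_start <- c(gamma_matrix)
gamma_start
```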


A numeric vector or column matrix of starting values for the \(\theta\) parameters in the outcome mechanism. The number of elements in theta_start should equal the total number of columns in x_matrix and c_matrix, plus 2 (if interaction_indicator is FALSE) or plus 3 (if interaction_indicator is TRUE). Starting values should be provided in the following order: intercept, slope coefficient for the x_matrix term, slope coefficient for the mediator m term, slope coefficient for the first column of c_matrix, ..., slope coefficient for the final column of c_matrix, and, optionally, slope coefficient for the xm interaction term.
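Because the length of theta_start depends on interaction_indicator, a small check before fitting can catch mismatches. This sketch uses a hypothetical helper, `n_theta`, which is not part of the package and simply encodes the counting rule stated above.

```r
# Hypothetical helper (not a package function): expected length of theta_start
n_theta <- function(x_matrix, c_matrix, interaction_indicator) {
  # intercept + mediator slope + one slope per x_matrix and c_matrix column
  n <- 2 + ncol(x_matrix) + ncol(c_matrix)
  # one additional slope for the x-by-m interaction, if requested
  if (interaction_indicator) n <- n + 1
  n
}

# Hypothetical matrices with one predictor and one covariate
x_matrix <- matrix(rnorm(5), ncol = 1)
c_matrix <- matrix(rnorm(5), ncol = 1)

theta_start <- matrix(rep(1, n_theta(x_matrix, c_matrix, FALSE)), ncol = 1)
```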


A numeric value specifying when to stop estimation, based on the difference between successive log-likelihood values. The default is 1e-7.
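A tolerance-based stopping rule of this kind can be pictured as follows. This is a sketch only, not the package's internal code: `first_converged` is a hypothetical helper, `loglik_trace` is a made-up sequence of log-likelihood values, and the use of the absolute difference is an assumption.

```r
# Hypothetical helper: first iteration at which the change in log-likelihood
# from the previous iteration falls below the tolerance (NA if it never does)
first_converged <- function(loglik_trace, tolerance = 1e-7) {
  changes <- abs(diff(loglik_trace))   # change between successive iterations
  hit <- which(changes < tolerance)
  if (length(hit) == 0) NA_integer_ else hit[1] + 1L
}

# Made-up log-likelihood values from five successive iterations
loglik_trace <- c(-1500, -1320, -1301.2, -1301.19999999, -1301.19999999)
first_converged(loglik_trace)
```

A smaller tolerance demands more iterations before stopping; a trace that never stabilises returns NA under this helper.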


A character string specifying which EM algorithm will be applied. Options are "em", "squarem", or "pem". The default and recommended option is "squarem".


COMMA_PVW returns a data frame containing four columns. The first column, Parameter, gives a unique parameter name for each row. The second column, Estimates, contains the parameter estimates. The third column, Convergence, reports whether or not the algorithm converged for a given parameter estimate. The final column, Method, reports that the estimates were obtained from the "PVW" procedure.
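A data frame with this layout is easy to filter by parameter family. The sketch below uses a hand-built stand-in, `pvw_results`, with the four documented columns and illustrative values, so it runs without fitting a model; the object name and values are assumptions.

```r
# Stand-in for a COMMA_PVW result: same four columns, illustrative values
pvw_results <- data.frame(
  Parameter   = c("beta_1", "beta_2", "gamma11", "gamma21", "theta_0", "theta_m"),
  Estimates   = c(0.83, -1.62, 1.23, 1.35, 0.65, -1.67),
  Convergence = TRUE,
  Method      = "PVW"
)

# Confirm that every estimate is flagged as converged before using it
all_converged <- all(pvw_results$Convergence)

# Pull out the outcome-mechanism (theta) estimates as a named vector
theta_rows <- pvw_results[startsWith(pvw_results$Parameter, "theta"), ]
theta_est <- setNames(theta_rows$Estimates, theta_rows$Parameter)
theta_est
```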


Note that this method can only be used for binary outcome models, i.e. with outcome_distribution = "Bernoulli".


sample_size <- 2000

n_cat <- 2 # Number of categories in the binary mediator

# Data generation settings
x_mu <- 0
x_sigma <- 1
z_shape <- 1
c_shape <- 1

# True parameter values (gamma terms set the misclassification rate)
true_beta <- matrix(c(1, -2, .5), ncol = 1)
true_gamma <- matrix(c(1, 1, -.5, -1.5), nrow = 2, byrow = FALSE)
true_theta <- matrix(c(1, 1.5, -2, -.2), ncol = 1)

example_data <- COMMA_data(sample_size, x_mu, x_sigma, z_shape, c_shape,
                           interaction_indicator = FALSE,
                           outcome_distribution = "Bernoulli",
                           true_beta, true_gamma, true_theta)
beta_start <- matrix(rep(1, 3), ncol = 1)
gamma_start <- matrix(rep(1, 4), nrow = 2, ncol = 2)
theta_start <- matrix(rep(1, 4), ncol = 1)

Mstar <- example_data[["obs_mediator"]]
outcome <- example_data[["outcome"]]
x_matrix <- example_data[["x"]]
z_matrix <- example_data[["z"]]
c_matrix <- example_data[["c"]]
PVW_results <- COMMA_PVW(Mstar, outcome, outcome_distribution = "Bernoulli",
                         interaction_indicator = FALSE,
                         x_matrix, z_matrix, c_matrix,
                         beta_start, gamma_start, theta_start)
#> Warning: non-integer #successes in a binomial glm!

PVW_results
#>    Parameter  Estimates Convergence Method
#> 1     beta_1  0.8272721        TRUE    PVW
#> 2     beta_2 -1.6154039        TRUE    PVW
#> 3     beta_3  0.3586729        TRUE    PVW
#> 4    gamma11  1.2279060        TRUE    PVW
#> 5    gamma21  1.3535571        TRUE    PVW
#> 6    gamma12 -0.4846708        TRUE    PVW
#> 7    gamma22 -1.4126826        TRUE    PVW
#> 8    theta_0  0.6481385        TRUE    PVW
#> 9   theta_x1  1.7419080        TRUE    PVW
#> 10   theta_m -1.6710536        TRUE    PVW
#> 11  theta_c1 -0.1796049        TRUE    PVW
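To judge how well the starting-value-driven fit recovers the simulation settings, the estimates can be lined up against the true values. The sketch below copies the printed estimates and the true_beta, true_gamma, and true_theta values from the example so that it runs on its own; in practice PVW_results$Estimates would be used directly. It assumes the gamma rows are ordered as c(true_gamma), i.e. column by column.

```r
# Estimates copied from the printed example output
estimates <- c(0.8272721, -1.6154039, 0.3586729,
               1.2279060, 1.3535571, -0.4846708, -1.4126826,
               0.6481385, 1.7419080, -1.6710536, -0.1796049)

# True values from the example, in the same order
true_values <- c(1, -2, .5,         # true_beta
                 1, 1, -.5, -1.5,   # c(true_gamma), column by column (assumed order)
                 1, 1.5, -2, -.2)   # true_theta

comparison <- data.frame(
  Parameter = c("beta_1", "beta_2", "beta_3",
                "gamma11", "gamma21", "gamma12", "gamma22",
                "theta_0", "theta_x1", "theta_m", "theta_c1"),
  Estimate = estimates,
  Truth = true_values,
  Difference = estimates - true_values
)
comparison
```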