EM Algorithm Estimation of the Binary Mediator Misclassification Model
COMMA_EM.RdJointly estimate \(\beta\), \(\gamma\), and \(\theta\) parameters from the true mediator, observed mediator, and outcome mechanisms, respectively, in a binary mediator misclassification model.
Usage
COMMA_EM(
Mstar,
outcome,
outcome_distribution,
interaction_indicator,
x_matrix,
z_matrix,
c_matrix,
beta_start,
gamma_start,
theta_start,
sigma_start = NULL,
tolerance = 1e-07,
max_em_iterations = 1500,
em_method = "squarem"
)Arguments
- Mstar
A numeric vector of indicator variables (1, 2) for the observed mediator
M*. There should be noNAterms. The reference category is 2.- outcome
A vector containing the outcome variables of interest. There should be no
NAterms.- outcome_distribution
A character string specifying the distribution of the outcome variable. Options are
"Bernoulli","Normal", or"Poisson".- interaction_indicator
A logical value indicating if an interaction between
xandmshould be used to generate the outcome variable,y.- x_matrix
A numeric matrix of predictors in the true mediator and outcome mechanisms.
x_matrixshould not contain an intercept and no values should beNA.- z_matrix
A numeric matrix of covariates in the observation mechanism.
z_matrixshould not contain an intercept and no values should beNA.- c_matrix
A numeric matrix of covariates in the true mediator and outcome mechanisms.
c_matrixshould not contain an intercept and no values should beNA.- beta_start
A numeric vector or column matrix of starting values for the \(\beta\) parameters in the true mediator mechanism. The number of elements in
beta_startshould be equal to the number of columns ofx_matrixandc_matrixplus 1. Starting values should be provided in the following order: intercept, slope coefficient for thex_matrixterm, slope coefficient for first column of thec_matrix, ..., slope coefficient for the final column of thec_matrix.- gamma_start
A numeric vector or matrix of starting values for the \(\gamma\) parameters in the observation mechanism. In matrix form, the
gamma_startmatrix rows correspond to parameters for theM* = 1observed mediator, with the dimensions ofz_matrixplus 1, and the gamma parameter matrix columns correspond to the true mediator categories \(M \in \{1, 2\}\). A numeric vector forgamma_startis obtained by concatenating the gamma matrix, i.e.gamma_start <- c(gamma_matrix). Starting values should be provided in the following order within each column: intercept, slope coefficient for first column of thez_matrix, ..., slope coefficient for the final column of thez_matrix.- theta_start
A numeric vector or column matrix of starting values for the \(\theta\) parameters in the outcome mechanism. The number of elements in
theta_startshould be equal to the number of columns ofx_matrixandc_matrixplus 2 (ifinteraction_indicatorisFALSE) or 3 (ifinteraction_indicatorisTRUE). Starting values should be provided in the following order: intercept, slope coefficient for thex_matrixterm, slope coefficient for the mediatormterm, slope coefficient for first column of thec_matrix, ..., slope coefficient for the final column of thec_matrix, and, optionally, slope coefficient forxm).- sigma_start
A numeric value specifying the starting value for the standard deviation. This value is only required if
outcome_distributionis"Normal". Otherwise, this value is set toNULL.- tolerance
A numeric value specifying when to stop estimation, based on the difference of subsequent log-likelihood estimates. The default is
1e-7.- max_em_iterations
A numeric value specifying when to stop estimation, based on the difference of subsequent log-likelihood estimates. The default is
1e-7.- em_method
A character string specifying which EM algorithm will be applied. Options are
"em","squarem", or"pem". The default and recommended option is"squarem".
Value
COMMA_EM returns a data frame containing four columns. The first
column, Parameter, represents a unique parameter value for each row.
The next column contains the parameter Estimates, followed by the standard
error estimates, SE. The final column, Convergence, reports
whether or not the algorithm converged for a given parameter estimate.
Examples
set.seed(20240709)
sample_size <- 2000
n_cat <- 2 # Number of categories in the binary mediator
# Data generation settings
x_mu <- 0
x_sigma <- 1
z_shape <- 1
c_shape <- 1
# True parameter values (gamma terms set the misclassification rate)
true_beta <- matrix(c(1, -2, .5), ncol = 1)
true_gamma <- matrix(c(1, 1, -.5, -1.5), nrow = 2, byrow = FALSE)
true_theta <- matrix(c(1, 1.5, -2, -.2), ncol = 1)
example_data <- COMMA_data(sample_size, x_mu, x_sigma, z_shape, c_shape,
interaction_indicator = FALSE,
outcome_distribution = "Bernoulli",
true_beta, true_gamma, true_theta)
beta_start <- matrix(rep(1, 3), ncol = 1)
gamma_start <- matrix(rep(1, 4), nrow = 2, ncol = 2)
theta_start <- matrix(rep(1, 4), ncol = 1)
Mstar = example_data[["obs_mediator"]]
outcome = example_data[["outcome"]]
x_matrix = example_data[["x"]]
z_matrix = example_data[["z"]]
c_matrix = example_data[["c"]]
EM_results <- COMMA_EM(Mstar, outcome, "Bernoulli", FALSE,
x_matrix, z_matrix, c_matrix,
beta_start, gamma_start, theta_start)
#> Loading required package: doParallel
#> Loading required package: foreach
#> Loading required package: iterators
#> Loading required package: parallel
#> Loading required package: numDeriv
#> Loading required package: quantreg
#> Loading required package: SparseM
#>
#> Attaching package: 'turboEM'
#> The following objects are masked from 'package:numDeriv':
#>
#> grad, hessian
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
EM_results
#> Parameter Estimates Convergence
#> 1 beta_0 0.8355459 TRUE
#> 2 beta_1 -1.6157072 TRUE
#> 3 beta_2 0.3864456 TRUE
#> 4 gamma11 1.0924272 TRUE
#> 5 gamma21 1.4723205 TRUE
#> 6 gamma12 -0.2875945 TRUE
#> 7 gamma22 -1.7222641 TRUE
#> 8 theta_0 0.8914868 TRUE
#> 9 theta_x1 1.7143072 TRUE
#> 10 theta_m -1.9727559 TRUE
#> 11 theta_c1 -0.1759259 TRUE