EM Algorithm Estimation of the Binary Mediator Misclassification Model
COMMA_EM.Rd
Jointly estimate \(\beta\), \(\gamma\), and \(\theta\) parameters from the true mediator, observed mediator, and outcome mechanisms, respectively, in a binary mediator misclassification model.
Usage
COMMA_EM(
Mstar,
outcome,
outcome_distribution,
interaction_indicator,
x_matrix,
z_matrix,
c_matrix,
beta_start,
gamma_start,
theta_start,
sigma_start = NULL,
tolerance = 1e-07,
max_em_iterations = 1500,
em_method = "squarem"
)
Arguments
- Mstar
A numeric vector of indicator variables (1, 2) for the observed mediator
M*
. There should be noNA
terms. The reference category is 2.- outcome
A vector containing the outcome variables of interest. There should be no
NA
terms.- outcome_distribution
A character string specifying the distribution of the outcome variable. Options are
"Bernoulli"
,"Normal"
, or"Poisson"
.- interaction_indicator
A logical value indicating if an interaction between
x
andm
should be used to generate the outcome variable,y
.- x_matrix
A numeric matrix of predictors in the true mediator and outcome mechanisms.
x_matrix
should not contain an intercept and no values should beNA
.- z_matrix
A numeric matrix of covariates in the observation mechanism.
z_matrix
should not contain an intercept and no values should beNA
.- c_matrix
A numeric matrix of covariates in the true mediator and outcome mechanisms.
c_matrix
should not contain an intercept and no values should beNA
.- beta_start
A numeric vector or column matrix of starting values for the \(\beta\) parameters in the true mediator mechanism. The number of elements in
beta_start
should be equal to the number of columns ofx_matrix
andc_matrix
plus 1. Starting values should be provided in the following order: intercept, slope coefficient for thex_matrix
term, slope coefficient for first column of thec_matrix
, ..., slope coefficient for the final column of thec_matrix
.- gamma_start
A numeric vector or matrix of starting values for the \(\gamma\) parameters in the observation mechanism. In matrix form, the
gamma_start
matrix rows correspond to parameters for theM* = 1
observed mediator, with the dimensions ofz_matrix
plus 1, and the gamma parameter matrix columns correspond to the true mediator categories \(M \in \{1, 2\}\). A numeric vector forgamma_start
is obtained by concatenating the gamma matrix, i.e.gamma_start <- c(gamma_matrix)
. Starting values should be provided in the following order within each column: intercept, slope coefficient for first column of thez_matrix
, ..., slope coefficient for the final column of thez_matrix
.- theta_start
A numeric vector or column matrix of starting values for the \(\theta\) parameters in the outcome mechanism. The number of elements in
theta_start
should be equal to the number of columns ofx_matrix
andc_matrix
plus 2 (ifinteraction_indicator
isFALSE
) or 3 (ifinteraction_indicator
isTRUE
). Starting values should be provided in the following order: intercept, slope coefficient for thex_matrix
term, slope coefficient for the mediatorm
term, slope coefficient for first column of thec_matrix
, ..., slope coefficient for the final column of thec_matrix
, and, optionally, slope coefficient forxm
).- sigma_start
A numeric value specifying the starting value for the standard deviation. This value is only required if
outcome_distribution
is"Normal"
. Otherwise, this value is set toNULL
.- tolerance
A numeric value specifying when to stop estimation, based on the difference of subsequent log-likelihood estimates. The default is
1e-7
.- max_em_iterations
A numeric value specifying when to stop estimation, based on the difference of subsequent log-likelihood estimates. The default is
1e-7
.- em_method
A character string specifying which EM algorithm will be applied. Options are
"em"
,"squarem"
, or"pem"
. The default and recommended option is"squarem"
.
Value
COMMA_EM
returns a data frame containing four columns. The first
column, Parameter
, represents a unique parameter value for each row.
The next column contains the parameter Estimates
, followed by the standard
error estimates, SE
. The final column, Convergence
, reports
whether or not the algorithm converged for a given parameter estimate.
Examples
set.seed(20240709)
sample_size <- 2000
n_cat <- 2 # Number of categories in the binary mediator
# Data generation settings
x_mu <- 0
x_sigma <- 1
z_shape <- 1
c_shape <- 1
# True parameter values (gamma terms set the misclassification rate)
true_beta <- matrix(c(1, -2, .5), ncol = 1)
true_gamma <- matrix(c(1, 1, -.5, -1.5), nrow = 2, byrow = FALSE)
true_theta <- matrix(c(1, 1.5, -2, -.2), ncol = 1)
example_data <- COMMA_data(sample_size, x_mu, x_sigma, z_shape, c_shape,
interaction_indicator = FALSE,
outcome_distribution = "Bernoulli",
true_beta, true_gamma, true_theta)
beta_start <- matrix(rep(1, 3), ncol = 1)
gamma_start <- matrix(rep(1, 4), nrow = 2, ncol = 2)
theta_start <- matrix(rep(1, 4), ncol = 1)
Mstar = example_data[["obs_mediator"]]
outcome = example_data[["outcome"]]
x_matrix = example_data[["x"]]
z_matrix = example_data[["z"]]
c_matrix = example_data[["c"]]
EM_results <- COMMA_EM(Mstar, outcome, "Bernoulli", FALSE,
x_matrix, z_matrix, c_matrix,
beta_start, gamma_start, theta_start)
#> Loading required package: doParallel
#> Loading required package: foreach
#> Loading required package: iterators
#> Loading required package: parallel
#> Loading required package: numDeriv
#> Loading required package: quantreg
#> Loading required package: SparseM
#>
#> Attaching package: 'turboEM'
#> The following objects are masked from 'package:numDeriv':
#>
#> grad, hessian
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
#> Warning: non-integer #successes in a binomial glm!
EM_results
#> Parameter Estimates Convergence
#> 1 beta_0 0.8355459 TRUE
#> 2 beta_1 -1.6157072 TRUE
#> 3 beta_2 0.3864456 TRUE
#> 4 gamma11 1.0924272 TRUE
#> 5 gamma21 1.4723205 TRUE
#> 6 gamma12 -0.2875945 TRUE
#> 7 gamma22 -1.7222641 TRUE
#> 8 theta_0 0.8914868 TRUE
#> 9 theta_x1 1.7143072 TRUE
#> 10 theta_m -1.9727559 TRUE
#> 11 theta_c1 -0.1759259 TRUE