
EM-Algorithm Estimation of the Two-Stage Binary Outcome Misclassification Model
Jointly estimate \(\beta\), \(\gamma^{(1)}\), \(\gamma^{(2)}\) parameters from the true outcome, first-stage observation, and second-stage observation mechanisms, respectively, in a two-stage binary outcome misclassification model.
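To make the first two mechanisms concrete, the sketch below evaluates the true outcome and first-stage observation probabilities at a single covariate value. It assumes logistic links with category 2 as the reference category; that functional form is an assumption here, not something spelled out on this page. The values of `x` and `z1` are hypothetical, and the parameter values are the ones used under Examples.

```r
# ASSUMPTION: logistic links with category 2 as the reference category.
# Parameter values are copied from the Examples section of this page.
true_beta <- matrix(c(1, -2), ncol = 1)
true_gamma1 <- matrix(c(.5, 1, -.5, -1), nrow = 2, byrow = FALSE)

# Hypothetical covariate values for one observation.
x <- 0.5
z1 <- 1

# True outcome mechanism: P(Y = 1 | x), intercept first.
p_y1 <- plogis(sum(c(1, x) * true_beta))

# First-stage observation mechanism: P(Y*(1) = 1 | Y = j, z1) for j = 1, 2.
# Column j of true_gamma1 holds the parameters for true outcome category j.
p_ystar1_given_y <- drop(c(1, z1) %*% true_gamma1)
p_ystar1_given_y <- plogis(p_ystar1_given_y)

print(p_y1)
print(p_ystar1_given_y)
```

Under these assumptions, each column of `true_gamma1` plays the role of a separate logistic regression for the first-stage observation, one per true outcome category.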
Usage
COMBO_EM_2stage(
  Ystar1,
  Ystar2,
  x_matrix,
  z1_matrix,
  z2_matrix,
  beta_start,
  gamma1_start,
  gamma2_start,
  tolerance = 1e-07,
  max_em_iterations = 1500,
  em_method = "squarem"
)
Arguments
- Ystar1
A numeric vector of indicator variables (1, 2) for the first-stage observed outcome \(Y^{*(1)}\). There should be no NA terms. The reference category is 2.
- Ystar2
A numeric vector of indicator variables (1, 2) for the second-stage observed outcome \(Y^{*(2)}\). There should be no NA terms. The reference category is 2.
- x_matrix
A numeric matrix of covariates in the true outcome mechanism. x_matrix should not contain an intercept and no values should be NA.
- z1_matrix
A numeric matrix of covariates in the first-stage observation mechanism. z1_matrix should not contain an intercept and no values should be NA.
- z2_matrix
A numeric matrix of covariates in the second-stage observation mechanism. z2_matrix should not contain an intercept and no values should be NA.
- beta_start
A numeric vector or column matrix of starting values for the \(\beta\) parameters in the true outcome mechanism. The number of elements in beta_start should be equal to the number of columns of x_matrix plus 1.
- gamma1_start
A numeric vector or matrix of starting values for the \(\gamma^{(1)}\) parameters in the first-stage observation mechanism. In matrix form, the gamma1_start matrix rows correspond to parameters for the \(Y^{*(1)} = 1\) first-stage observed outcome, with the dimensions of z1_matrix plus 1, and the parameter matrix columns correspond to the true outcome categories \(Y \in \{1, 2\}\). A numeric vector for gamma1_start is obtained by concatenating the matrix, i.e. gamma1_start <- c(gamma1_matrix).
- gamma2_start
A numeric array of starting values for the \(\gamma^{(2)}\) parameters in the second-stage observation mechanism. The first dimension (matrix rows) of gamma2_start corresponds to parameters for the \(Y^{*(2)} = 1\) second-stage observed outcome, with the dimensions of z2_matrix plus 1. The second dimension (matrix columns) corresponds to the first-stage observed outcome categories \(Y^{*(1)} \in \{1, 2\}\). The third dimension of gamma2_start corresponds to the true outcome categories \(Y \in \{1, 2\}\).
- tolerance
A numeric value specifying when to stop estimation, based on the difference of subsequent log-likelihood estimates. The default is 1e-7.
- max_em_iterations
An integer specifying the maximum number of iterations of the EM algorithm. The default is 1500.
- em_method
A character string specifying which EM algorithm will be applied. Options are "em", "squarem", or "pem". The default and recommended option is "squarem".
Value
COMBO_EM_2stage returns a data frame containing four columns. The first
column, Parameter, names the parameter reported in each row.
The second column, Estimates, contains the parameter estimates, followed by the
standard error estimates, SE. The final column, Convergence, reports
whether or not the algorithm converged for a given parameter estimate.
Estimates are provided for the two-stage binary misclassification model.
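As a rough illustration of working with the returned data frame, the sketch below rebuilds a few rows by hand (values copied from the example output on this page, so the code runs without the package), separates the rows whose Parameter begins with `naive_`, and adds approximate 95% Wald intervals. The interval columns `lower` and `upper` are hypothetical additions and rely on a normal approximation; they are not part of the function's output.

```r
# A few rows copied from the example output on this page, rebuilt by hand.
EM_results <- data.frame(
  Parameter = c("beta_1", "beta_2", "gamma1_11", "naive_beta_1"),
  Estimates = c(1.48576087, -2.32657028, 0.32115176, 0.06898792),
  SE = c(0.3238284, 0.4682280, 0.1672771, 5.3173399),
  Convergence = c(TRUE, TRUE, TRUE, TRUE)
)

# Keep the rows that do not carry the "naive_" prefix.
is_naive <- startsWith(EM_results$Parameter, "naive_")
model_rows <- EM_results[!is_naive, ]

# ASSUMPTION: normal approximation, giving approximate 95% Wald intervals.
z <- qnorm(0.975)
model_rows$lower <- model_rows$Estimates - z * model_rows$SE
model_rows$upper <- model_rows$Estimates + z * model_rows$SE

# Check that every retained row reports convergence before using it.
all_converged <- all(model_rows$Convergence)

print(model_rows)
print(all_converged)
```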
Examples
# \donttest{
set.seed(123)
n <- 1000
x_mu <- 0
x_sigma <- 1
z1_shape <- 1
z2_shape <- 1
true_beta <- matrix(c(1, -2), ncol = 1)
true_gamma1 <- matrix(c(.5, 1, -.5, -1), nrow = 2, byrow = FALSE)
true_gamma2 <- array(c(1.5, 1, .5, .5, -.5, 0, -1, -1), dim = c(2, 2, 2))
my_data <- COMBO_data_2stage(sample_size = n,
                             x_mu = x_mu, x_sigma = x_sigma,
                             z1_shape = z1_shape, z2_shape = z2_shape,
                             beta = true_beta, gamma1 = true_gamma1, gamma2 = true_gamma2)
table(my_data[["obs_Ystar2"]], my_data[["obs_Ystar1"]], my_data[["true_Y"]])
#> , ,  = 1
#>
#>
#>       1   2
#>   1 457 113
#>   2  51  38
#>
#> , ,  = 2
#>
#>
#>       1   2
#>   1  30  40
#>   2  39 232
#>
beta_start <- rnorm(length(c(true_beta)))
gamma1_start <- rnorm(length(c(true_gamma1)))
gamma2_start <- rnorm(length(c(true_gamma2)))
EM_results <- COMBO_EM_2stage(Ystar1 = my_data[["obs_Ystar1"]],
                              Ystar2 = my_data[["obs_Ystar2"]],
                              x_matrix = my_data[["x"]],
                              z1_matrix = my_data[["z1"]],
                              z2_matrix = my_data[["z2"]],
                              beta_start = beta_start,
                              gamma1_start = gamma1_start,
                              gamma2_start = gamma2_start)
EM_results
# }
#>          Parameter   Estimates          SE Convergence
#> 1           beta_1  1.48576087   0.3238284        TRUE
#> 2           beta_2 -2.32657028   0.4682280        TRUE
#> 3        gamma1_11  0.32115176   0.1672771        TRUE
#> 4        gamma1_21  1.18406349   0.2141527        TRUE
#> 5        gamma1_12 -1.16368178   0.3454245        TRUE
#> 6        gamma1_22 -0.76542933   0.2742785        TRUE
#> 7      gamma2_1111  1.09495809   0.2177926        TRUE
#> 8      gamma2_2111  1.41645126   0.3075869        TRUE
#> 9      gamma2_1121  0.16752733   0.6064557        TRUE
#> 10     gamma2_2121  1.11817996   0.3890782        TRUE
#> 11     gamma2_1112 -1.08983781   1.5149193        TRUE
#> 12     gamma2_2112  0.13234707   0.9157872        TRUE
#> 13     gamma2_1122 -0.93782348   0.3907942        TRUE
#> 14     gamma2_2122 -1.45453541   0.4430070        TRUE
#> 15    naive_beta_1  0.06898792   5.3173399        TRUE
#> 16    naive_beta_2  0.07839931 -10.4633987        TRUE
#> 17 naive_gamma2_11  0.17418532   5.8189020        TRUE
#> 18 naive_gamma2_21  0.19302573   4.2266253        TRUE
#> 19 naive_gamma2_12  0.14501424  -3.9715920        TRUE
#> 20 naive_gamma2_22  0.10691412   0.0766478        TRUE
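The example sizes its starting values from the known true parameters. When no true parameters are available, the dimension rules given under Arguments can be applied to the covariate matrices directly. The following is a minimal base R sketch of that bookkeeping; the covariate matrices are hypothetical stand-ins, and the random starting values are arbitrary rather than recommended.

```r
# Hypothetical covariate matrices (no intercept column, no NA values).
set.seed(1)
n <- 10
x_matrix <- matrix(rnorm(n), ncol = 1)
z1_matrix <- matrix(rgamma(n, shape = 1), ncol = 1)
z2_matrix <- matrix(rgamma(n, shape = 1), ncol = 1)

# beta_start: one value per column of x_matrix, plus one for the intercept.
n_beta <- ncol(x_matrix) + 1
beta_start <- rnorm(n_beta)

# gamma1_start in matrix form: rows are the intercept plus z1 covariates,
# columns are the true outcome categories Y in {1, 2}.
n_gamma1 <- ncol(z1_matrix) + 1
gamma1_matrix <- matrix(rnorm(n_gamma1 * 2), nrow = n_gamma1, ncol = 2)

# Vector form, obtained by concatenating the matrix column by column.
gamma1_start <- c(gamma1_matrix)

# gamma2_start: rows are the intercept plus z2 covariates, columns are the
# first-stage observed categories, third dimension is the true outcome.
n_gamma2 <- ncol(z2_matrix) + 1
gamma2_start <- array(rnorm(n_gamma2 * 2 * 2), dim = c(n_gamma2, 2, 2))

print(length(beta_start))
print(dim(gamma1_matrix))
print(dim(gamma2_start))
```

With one covariate in each mechanism, this gives 2 starting values for \(\beta\), 4 for \(\gamma^{(1)}\), and 8 for \(\gamma^{(2)}\), matching the number of rows reported for each parameter block in the output above.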