EM-Algorithm Estimation of the Two-Stage Binary Outcome Misclassification Model
Jointly estimate \(\beta\), \(\gamma^{(1)}\), \(\gamma^{(2)}\) parameters from the true outcome, first-stage observation, and second-stage observation mechanisms, respectively, in a two-stage binary outcome misclassification model.
Usage
COMBO_EM_2stage(
  Ystar1,
  Ystar2,
  x_matrix,
  z1_matrix,
  z2_matrix,
  beta_start,
  gamma1_start,
  gamma2_start,
  tolerance = 1e-07,
  max_em_iterations = 1500,
  em_method = "squarem"
)
Arguments
- Ystar1
A numeric vector of indicator variables (1, 2) for the first-stage observed outcome \(Y^{*(1)}\). There should be no NA terms. The reference category is 2.
- Ystar2
A numeric vector of indicator variables (1, 2) for the second-stage observed outcome \(Y^{*(2)}\). There should be no NA terms. The reference category is 2.
- x_matrix
A numeric matrix of covariates in the true outcome mechanism. x_matrix should not contain an intercept and no values should be NA.
- z1_matrix
A numeric matrix of covariates in the first-stage observation mechanism. z1_matrix should not contain an intercept and no values should be NA.
- z2_matrix
A numeric matrix of covariates in the second-stage observation mechanism. z2_matrix should not contain an intercept and no values should be NA.
- beta_start
A numeric vector or column matrix of starting values for the \(\beta\) parameters in the true outcome mechanism. The number of elements in beta_start should be equal to the number of columns of x_matrix plus 1.
- gamma1_start
A numeric vector or matrix of starting values for the \(\gamma^{(1)}\) parameters in the first-stage observation mechanism. In matrix form, the gamma1_start matrix rows correspond to parameters for the \(Y^{*(1)} = 1\) first-stage observed outcome, with the number of rows equal to the number of columns of z1_matrix plus 1, and the parameter matrix columns correspond to the true outcome categories \(Y \in \{1, 2\}\). A numeric vector for gamma1_start is obtained by concatenating the matrix, i.e. gamma1_start <- c(gamma1_matrix).
- gamma2_start
A numeric array of starting values for the \(\gamma^{(2)}\) parameters in the second-stage observation mechanism. The first dimension (matrix rows) of gamma2_start corresponds to parameters for the \(Y^{*(2)} = 1\) second-stage observed outcome, with length equal to the number of columns of z2_matrix plus 1. The second dimension (matrix columns) corresponds to the first-stage observed outcome categories \(Y^{*(1)} \in \{1, 2\}\). The third dimension of gamma2_start corresponds to the true outcome categories \(Y \in \{1, 2\}\).
- tolerance
A numeric value specifying when to stop estimation, based on the difference of subsequent log-likelihood estimates. The default is 1e-7.
- max_em_iterations
An integer specifying the maximum number of iterations of the EM algorithm. The default is 1500.
- em_method
A character string specifying which EM algorithm will be applied. Options are "em", "squarem", or "pem". The default and recommended option is "squarem".
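The shapes described for the starting values can be sketched as follows. This is a minimal illustration, assuming one covariate in each mechanism. The names n_x, n_z1, n_z2, and gamma1_matrix are hypothetical, and random normal draws are used only because the example on this page does the same; they are not a recommended initialization.

```r
set.seed(1)

# Hypothetical covariate counts: columns of x_matrix, z1_matrix, z2_matrix.
n_x <- 1
n_z1 <- 1
n_z2 <- 1

# beta_start: number of columns of x_matrix plus 1
# (the extra element is presumably the intercept).
beta_start <- rnorm(n_x + 1)

# gamma1 in matrix form: rows = columns of z1_matrix plus 1,
# columns = true outcome categories Y in {1, 2}.
gamma1_matrix <- matrix(rnorm((n_z1 + 1) * 2), nrow = n_z1 + 1, ncol = 2)

# Vector form: c() concatenates the matrix column by column.
gamma1_start <- c(gamma1_matrix)

# gamma2 as an array: rows = columns of z2_matrix plus 1,
# columns = first-stage observed categories, third dimension = true outcome categories.
gamma2_start <- array(rnorm((n_z2 + 1) * 2 * 2), dim = c(n_z2 + 1, 2, 2))

dim(gamma2_start)
```

Because c() works in column-major order, the third element of gamma1_start is the first-row, second-column entry of gamma1_matrix.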
Value
COMBO_EM_2stage returns a data frame containing four columns. The first column, Parameter, represents a unique parameter value for each row. The next column contains the parameter Estimates, followed by the standard error estimates, SE. The final column, Convergence, reports whether or not the algorithm converged for a given parameter estimate. Estimates are provided for the two-stage binary misclassification model.
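One way to work with the returned columns is sketched below. The data frame here is a hypothetical stand-in built from two rows of the example output on this page, so the block runs without the package. The Wald-type interval is a generic approximation added for illustration and is not something the function itself reports.

```r
# Hypothetical stand-in for the returned data frame
# (values copied from the beta rows of the example output).
EM_results <- data.frame(
  Parameter = c("beta_1", "beta_2"),
  Estimates = c(1.48576087, -2.32657028),
  SE = c(0.3238284, 0.4682280),
  Convergence = c(TRUE, TRUE)
)

# Keep only rows flagged as converged.
converged <- EM_results[EM_results$Convergence, ]

# Approximate 95% Wald intervals: estimate +/- z * SE.
z <- qnorm(0.975)
converged$Lower <- converged$Estimates - z * converged$SE
converged$Upper <- converged$Estimates + z * converged$SE

converged
```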
Examples
set.seed(123)
n <- 1000
x_mu <- 0
x_sigma <- 1
z1_shape <- 1
z2_shape <- 1
true_beta <- matrix(c(1, -2), ncol = 1)
true_gamma1 <- matrix(c(.5, 1, -.5, -1), nrow = 2, byrow = FALSE)
true_gamma2 <- array(c(1.5, 1, .5, .5, -.5, 0, -1, -1), dim = c(2, 2, 2))
my_data <- COMBO_data_2stage(sample_size = n,
                             x_mu = x_mu, x_sigma = x_sigma,
                             z1_shape = z1_shape, z2_shape = z2_shape,
                             beta = true_beta, gamma1 = true_gamma1, gamma2 = true_gamma2)
table(my_data[["obs_Ystar2"]], my_data[["obs_Ystar1"]], my_data[["true_Y"]])
#> , ,  = 1
#>
#>
#>       1   2
#>   1 457 113
#>   2  51  38
#>
#> , ,  = 2
#>
#>
#>       1   2
#>   1  30  40
#>   2  39 232
#>
beta_start <- rnorm(length(c(true_beta)))
gamma1_start <- rnorm(length(c(true_gamma1)))
gamma2_start <- rnorm(length(c(true_gamma2)))
EM_results <- COMBO_EM_2stage(Ystar1 = my_data[["obs_Ystar1"]],
                              Ystar2 = my_data[["obs_Ystar2"]],
                              x_matrix = my_data[["x"]],
                              z1_matrix = my_data[["z1"]],
                              z2_matrix = my_data[["z2"]],
                              beta_start = beta_start,
                              gamma1_start = gamma1_start,
                              gamma2_start = gamma2_start)
EM_results
#>          Parameter   Estimates          SE Convergence
#> 1           beta_1  1.48576087   0.3238284        TRUE
#> 2           beta_2 -2.32657028   0.4682280        TRUE
#> 3        gamma1_11  0.32115176   0.1672771        TRUE
#> 4        gamma1_21  1.18406349   0.2141527        TRUE
#> 5        gamma1_12 -1.16368178   0.3454245        TRUE
#> 6        gamma1_22 -0.76542933   0.2742785        TRUE
#> 7      gamma2_1111  1.09495809   0.2177926        TRUE
#> 8      gamma2_2111  1.41645126   0.3075869        TRUE
#> 9      gamma2_1121  0.16752733   0.6064557        TRUE
#> 10     gamma2_2121  1.11817996   0.3890782        TRUE
#> 11     gamma2_1112 -1.08983781   1.5149193        TRUE
#> 12     gamma2_2112  0.13234707   0.9157872        TRUE
#> 13     gamma2_1122 -0.93782348   0.3907942        TRUE
#> 14     gamma2_2122 -1.45453541   0.4430070        TRUE
#> 15    naive_beta_1  0.06898792   5.3173399        TRUE
#> 16    naive_beta_2  0.07839931 -10.4633987        TRUE
#> 17 naive_gamma2_11  0.17418532   5.8189020        TRUE
#> 18 naive_gamma2_21  0.19302573   4.2266253        TRUE
#> 19 naive_gamma2_12  0.14501424  -3.9715920        TRUE
#> 20 naive_gamma2_22  0.10691412   0.0766478        TRUE
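To check how close the EM estimates are to the simulation truth, the true values can be lined up against the first 14 rows of the output. The sketch below copies the estimates from the printed output above so that it runs without the package. It assumes the rows follow the column-major order of c(true_beta), c(true_gamma1), c(true_gamma2), which the parameter names suggest but this page does not state outright. The names estimates and comparison are hypothetical.

```r
# True values from the example.
true_beta <- matrix(c(1, -2), ncol = 1)
true_gamma1 <- matrix(c(.5, 1, -.5, -1), nrow = 2, byrow = FALSE)
true_gamma2 <- array(c(1.5, 1, .5, .5, -.5, 0, -1, -1), dim = c(2, 2, 2))

# Estimates copied from rows 1 to 14 of the printed output.
estimates <- c(1.48576087, -2.32657028,
               0.32115176, 1.18406349, -1.16368178, -0.76542933,
               1.09495809, 1.41645126, 0.16752733, 1.11817996,
               -1.08983781, 0.13234707, -0.93782348, -1.45453541)

# Assumption: output rows are ordered like the concatenated true values.
comparison <- data.frame(
  Parameter = c("beta_1", "beta_2",
                "gamma1_11", "gamma1_21", "gamma1_12", "gamma1_22",
                "gamma2_1111", "gamma2_2111", "gamma2_1121", "gamma2_2121",
                "gamma2_1112", "gamma2_2112", "gamma2_1122", "gamma2_2122"),
  True = c(c(true_beta), c(true_gamma1), c(true_gamma2)),
  Estimate = estimates
)
comparison$Difference <- comparison$Estimate - comparison$True

comparison
```

With the package installed, the Estimate column could instead be taken from EM_results$Estimates[1:14].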