Written by: Josh Rosenberg

Primary Source: Joshua M. Rosenberg, September 14, 2017

# Introduction

At present, MPlus is a widely used tool for carrying out Latent Profile Analysis (LPA), and there does not yet seem to be a widely accepted way to carry out LPA in R. This post compares output from MPlus to output from the R package MCLUST, accessed through tidymixmod, a package I started working on to make common LPA models easier to estimate with MCLUST.

### Links to resources:

- MPlus
- MPlusAutomation (this package uses MPlus; MPlus has to be purchased and installed for it to work)
- MCLUST
- tidymixmod (this is the package we’re currently working on; it provides an interface to MCLUST)

# Loading, setting up

First, let's load a few packages and set the path to the directory containing the data and MPlus files.

```
library(tidyverse)
library(MplusAutomation)
library(stringr)
# if tidymixmod is not installed, can use the line below to install
# devtools::install_github("jrosen48/tidymixmod")
library(tidymixmod)
the_dir <- "/Users/joshuarosenberg/Dropbox/5_Professional/homepage/source/static/_media/data/"
```

Next, we'll write a few helper functions to extract summary statistics and parameter estimates from the MPlus output.

```
# Pull the log-likelihood, BIC, and entropy from an MplusAutomation model object
extract_mplus_summary <- function(x) {
  x$summaries[c("LL", "BIC", "Entropy")]
}

# Pull the parameter estimates, dropping the last row of the parameters table
extract_mplus_paramaters <- function(x) {
  y <- x$parameters[[1]]
  y <- y[-nrow(y), ]
  dplyr::select(y, param_name = paramHeader, var_name = param, est, se, class = LatentClass)
}
```

Here we run the MPlus models using MplusAutomation. The specification of each of the three models is available from the post here.

```
# Run the three MPlus input files, then read the resulting output files
runModels(paste0(the_dir, "2-iris-LPA_means.inp"))
runModels(paste0(the_dir, "2-iris-LPA_means_correlated.inp"))
runModels(paste0(the_dir, "2-iris-LPA_means_correlated_free_variances.inp"))

m1_mplus_out <- readModels(paste0(the_dir, "2-iris-LPA_means.out"))
m2_mplus_out <- readModels(paste0(the_dir, "2-iris-LPA_means_correlated.out"))
m3_mplus_out <- readModels(paste0(the_dir, "2-iris-LPA_means_correlated_free_variances.out"))

# Tidy the parameter estimates, with one column of estimates per class
m1_mplus <- m1_mplus_out %>%
  extract_mplus_paramaters() %>%
  mutate(class = paste0("class_", class)) %>%
  select(-se) %>%
  spread(class, est)

m2_mplus <- m2_mplus_out %>%
  extract_mplus_paramaters() %>%
  mutate(class = paste0("class_", class)) %>%
  select(-se) %>%
  spread(class, est)

m3_mplus <- m3_mplus_out %>%
  extract_mplus_paramaters() %>%
  mutate(class = paste0("class_", class)) %>%
  select(-se) %>%
  spread(class, est)
```

Here we run the MCLUST models using the R package tidymixmod, which provides an interface to MCLUST.

```
m1_mclust <- create_profiles_mclust(iris[, -5], n_profiles = 2, model_name = "constrained_variance", to_return = "mclust")
m2_mclust <- create_profiles_mclust(iris[, -5], n_profiles = 2, model_name = "constrained_variance_and_covariance", to_return = "mclust")
m3_mclust <- create_profiles_mclust(iris[, -5], n_profiles = 2, model_name = "freed_variance_and_covariance", to_return = "mclust")
```

# Comparing MPlus and MCLUST summary statistics

*LL is log-likelihood*

```
list(m1_mplus_out, m2_mplus_out, m3_mplus_out) %>%
  map_df(extract_mplus_summary) %>%
  mutate(model = paste0("m", 1:3),
         method = "MPlus") %>%
  select(method, model, everything())
```

```
## method model LL BIC Entropy
## 1 MPlus m1 -488.915 1042.968 0.991
## 2 MPlus m2 -296.448 688.097 1.000
## 3 MPlus m3 -214.355 574.018 1.000
```

```
list(m1_mclust, m2_mclust, m3_mclust) %>%
  map_df(extract_mclust_summary) %>%
  mutate(model = paste0("m", 1:3),
         method = "MCLUST") %>%
  select(method, model, everything())
```

```
## method model LL BIC Entropy
## 1 MCLUST m1 -488.915 1042.968 0.998
## 2 MCLUST m2 -296.448 688.097 1.000
## 3 MCLUST m3 -214.355 574.018 1.000
```

There is a slight difference in the entropy statistic for the first model (varying means, constrained variances), possibly due to rounding in the MPlus output.
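
For reference, the entropy statistic both programs report can be computed directly from the posterior class probabilities. Here is a minimal sketch, assuming the usual relative-entropy normalization; the `relative_entropy` helper and the small floor on probabilities are my own additions, not part of either package (for an MCLUST fit, the posterior matrix is available in `$z`):

```r
# Relative entropy: 1 - sum(-z * log(z)) / (n * log(K)), where z is the
# n x K matrix of posterior class probabilities; values near 1 indicate
# well-separated profiles.
relative_entropy <- function(z) {
  z <- pmax(z, 1e-12)  # floor to avoid log(0) for hard assignments
  1 - sum(-z * log(z)) / (nrow(z) * log(ncol(z)))
}

# e.g., relative_entropy(m1_mclust$z)
```

With perfectly separated classes this returns a value of (essentially) 1, and with completely uncertain assignments it returns 0, which makes small rounding differences between programs easy to diagnose.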

# Comparing MPlus and MCLUST parameter estimates

#### For m1 (varying means, constrained variances)

```
m1_m <- extract_means(m1_mclust) %>%
  mutate(class_1 = round(class_1, 3),
         class_2 = round(class_2, 3))
m1_c1_v <- extract_variance(m1_mclust, 1)
m1_c2_v <- extract_variance(m1_mclust, 2)

# MPlus output
m1_mplus %>%
  mutate(model = "MPlus") %>%
  select(model, everything()) %>%
  arrange(param_name, desc(class_1))
```

```
## model param_name var_name class_1 class_2
## 1 MPlus Means SEPAL_LENG 5.006 6.265
## 2 MPlus Means SEPAL_WIDT 3.423 2.873
## 3 MPlus Means PETAL_LENG 1.471 4.911
## 4 MPlus Means PETAL_WIDT 0.250 1.678
## 5 MPlus Variances PETAL_LENG 0.459 0.459
## 6 MPlus Variances SEPAL_LENG 0.328 0.328
## 7 MPlus Variances PETAL_WIDT 0.123 0.123
## 8 MPlus Variances SEPAL_WIDT 0.121 0.121
```

```
# MCLUST output
bind_rows(m1_c1_v, m1_c2_v) %>%
  spread(class, est) %>%
  bind_rows(m1_m) %>%
  mutate(model = "MCLUST") %>%
  select(model, everything()) %>%
  arrange(param_name, desc(class_1)) %>%
  mutate(var_name = toupper(var_name),
         var_name = str_sub(var_name, end = 10),
         var_name = str_replace(var_name, "\\.", "_"))
```

```
## # A tibble: 8 x 5
## model param_name var_name class_1 class_2
## <chr> <chr> <chr> <dbl> <dbl>
## 1 MCLUST Means SEPAL_LENG 5.006 6.265
## 2 MCLUST Means SEPAL_WIDT 3.423 2.873
## 3 MCLUST Means PETAL_LENG 1.471 4.911
## 4 MCLUST Means PETAL_WIDT 0.250 1.678
## 5 MCLUST Variances PETAL_LENG 0.459 0.459
## 6 MCLUST Variances SEPAL_LENG 0.328 0.328
## 7 MCLUST Variances PETAL_WIDT 0.123 0.123
## 8 MCLUST Variances SEPAL_WIDT 0.121 0.121
```

These appear to be identical.

#### For m2 (varying means, constrained variances and covariances)

```
m2_m <- extract_means(m2_mclust) %>%
  mutate(class_1 = round(class_1, 3),
         class_2 = round(class_2, 3))
m2_c1_v <- extract_variance(m2_mclust, 1)
m2_c2_v <- extract_variance(m2_mclust, 2)
m2_c1_cv <- extract_covariance(m2_mclust, 1) %>%
  semi_join(m2_mplus, by = c("param_name", "var_name")) %>%
  mutate(class = "class_1")
m2_c2_cv <- extract_covariance(m2_mclust, 2) %>%
  semi_join(m2_mplus, by = c("param_name", "var_name")) %>%
  mutate(class = "class_2")
m2_cv <- bind_rows(m2_c1_cv, m2_c2_cv) %>%
  mutate(est = round(est, 3),
         model = "MCLUST") %>%
  spread(class, est)

# MPlus output
m2_mplus %>%
  mutate(model = "MPlus") %>%
  select(model, everything()) %>%
  arrange(param_name, desc(class_1))
```

```
## model param_name var_name class_1 class_2
## 1 MPlus Means SEPAL_LENG 5.006 6.262
## 2 MPlus Means SEPAL_WIDT 3.428 2.872
## 3 MPlus Means PETAL_LENG 1.462 4.906
## 4 MPlus Means PETAL_WIDT 0.246 1.676
## 5 MPlus PETAL_LE.WITH PETAL_WIDT 0.193 0.193
## 6 MPlus SEPAL_LE.WITH PETAL_LENG 0.305 0.305
## 7 MPlus SEPAL_LE.WITH PETAL_WIDT 0.114 0.114
## 8 MPlus SEPAL_LE.WITH SEPAL_WIDT 0.113 0.113
## 9 MPlus SEPAL_WI.WITH PETAL_LENG 0.098 0.098
## 10 MPlus SEPAL_WI.WITH PETAL_WIDT 0.056 0.056
## 11 MPlus Variances PETAL_LENG 0.460 0.460
## 12 MPlus Variances SEPAL_LENG 0.331 0.331
## 13 MPlus Variances PETAL_WIDT 0.123 0.123
## 14 MPlus Variances SEPAL_WIDT 0.120 0.120
```

```
# MCLUST output (note: binds the m2 means, not the m1 means)
bind_rows(m2_c1_v, m2_c2_v) %>%
  spread(class, est) %>%
  bind_rows(m2_m) %>%
  mutate(model = "MCLUST") %>%
  select(model, everything()) %>%
  bind_rows(m2_cv) %>%
  arrange(param_name, desc(class_1)) %>%
  mutate(var_name = toupper(var_name),
         var_name = str_sub(var_name, end = 10),
         var_name = str_replace(var_name, "\\.", "_"))
```

```
## # A tibble: 14 x 5
## model param_name var_name class_1 class_2
## <chr> <chr> <chr> <dbl> <dbl>
## 1 MCLUST Means SEPAL_LENG 5.006 6.265
## 2 MCLUST Means SEPAL_WIDT 3.423 2.873
## 3 MCLUST Means PETAL_LENG 1.471 4.911
## 4 MCLUST Means PETAL_WIDT 0.250 1.678
## 5 MCLUST PETAL_LE.WITH PETAL_WIDT 0.193 0.193
## 6 MCLUST SEPAL_LE.WITH PETAL_LENG 0.305 0.305
## 7 MCLUST SEPAL_LE.WITH PETAL_WIDT 0.114 0.114
## 8 MCLUST SEPAL_LE.WITH SEPAL_WIDT 0.113 0.113
## 9 MCLUST SEPAL_WI.WITH PETAL_LENG 0.098 0.098
## 10 MCLUST SEPAL_WI.WITH PETAL_WIDT 0.056 0.056
## 11 MCLUST Variances PETAL_LENG 0.460 0.460
## 12 MCLUST Variances SEPAL_LENG 0.331 0.331
## 13 MCLUST Variances PETAL_WIDT 0.123 0.123
## 14 MCLUST Variances SEPAL_WIDT 0.120 0.120
```

The variances and covariances are identical; a few of the means seem to differ in the hundredths place.

#### For m3 (varying means, freed variances and covariances)

```
m3_m <- extract_means(m3_mclust) %>%
  mutate(class_1 = round(class_1, 3),
         class_2 = round(class_2, 3))
m3_c1_v <- extract_variance(m3_mclust, 1)
m3_c2_v <- extract_variance(m3_mclust, 2)
m3_c1_cv <- extract_covariance(m3_mclust, 1) %>%
  semi_join(m3_mplus, by = c("param_name", "var_name")) %>%
  mutate(class = "class_1")
m3_c2_cv <- extract_covariance(m3_mclust, 2) %>%
  semi_join(m3_mplus, by = c("param_name", "var_name")) %>%
  mutate(class = "class_2")
m3_cv <- bind_rows(m3_c1_cv, m3_c2_cv) %>%
  mutate(est = round(est, 3),
         model = "MCLUST") %>%
  spread(class, est)

# MPlus output
m3_mplus %>%
  mutate(model = "MPlus") %>%
  select(model, everything()) %>%
  arrange(param_name, desc(class_1))
```

```
## model param_name var_name class_1 class_2
## 1 MPlus Means SEPAL_LENG 5.006 6.262
## 2 MPlus Means SEPAL_WIDT 3.428 2.872
## 3 MPlus Means PETAL_LENG 1.462 4.906
## 4 MPlus Means PETAL_WIDT 0.246 1.676
## 5 MPlus PETAL_LE.WITH PETAL_WIDT 0.006 0.286
## 6 MPlus SEPAL_LE.WITH SEPAL_WIDT 0.097 0.121
## 7 MPlus SEPAL_LE.WITH PETAL_LENG 0.016 0.449
## 8 MPlus SEPAL_LE.WITH PETAL_WIDT 0.010 0.166
## 9 MPlus SEPAL_WI.WITH PETAL_LENG 0.011 0.141
## 10 MPlus SEPAL_WI.WITH PETAL_WIDT 0.009 0.079
## 11 MPlus Variances SEPAL_WIDT 0.141 0.110
## 12 MPlus Variances SEPAL_LENG 0.122 0.435
## 13 MPlus Variances PETAL_LENG 0.030 0.675
## 14 MPlus Variances PETAL_WIDT 0.011 0.179
```

```
# MCLUST output (note: binds the m3 variances and means, not the m1 objects)
bind_rows(m3_c1_v, m3_c2_v) %>%
  spread(class, est) %>%
  bind_rows(m3_m) %>%
  mutate(model = "MCLUST") %>%
  select(model, everything()) %>%
  bind_rows(m3_cv) %>%
  arrange(param_name, desc(class_1)) %>%
  mutate(var_name = toupper(var_name),
         var_name = str_sub(var_name, end = 10),
         var_name = str_replace(var_name, "\\.", "_"))
```

```
## # A tibble: 14 x 5
## model param_name var_name class_1 class_2
## <chr> <chr> <chr> <dbl> <dbl>
## 1 MCLUST Means SEPAL_LENG 5.006 6.265
## 2 MCLUST Means SEPAL_WIDT 3.423 2.873
## 3 MCLUST Means PETAL_LENG 1.471 4.911
## 4 MCLUST Means PETAL_WIDT 0.250 1.678
## 5 MCLUST PETAL_LE.WITH PETAL_WIDT 0.006 0.286
## 6 MCLUST SEPAL_LE.WITH SEPAL_WIDT 0.097 0.121
## 7 MCLUST SEPAL_LE.WITH PETAL_LENG 0.016 0.449
## 8 MCLUST SEPAL_LE.WITH PETAL_WIDT 0.010 0.166
## 9 MCLUST SEPAL_WI.WITH PETAL_LENG 0.011 0.141
## 10 MCLUST SEPAL_WI.WITH PETAL_WIDT 0.009 0.079
## 11 MCLUST Variances PETAL_LENG 0.459 0.459
## 12 MCLUST Variances SEPAL_LENG 0.328 0.328
## 13 MCLUST Variances PETAL_WIDT 0.123 0.123
## 14 MCLUST Variances SEPAL_WIDT 0.121 0.121
```

These seem to be a bit different at the tenths or hundredths place.

# Conclusion

At least for these simple models, the output appears to be nearly identical, with slight differences in the entropy statistic and in some of the parameter estimates for the second and third models.

More complex models **may** be different:

- MPlus uses random starts to initialize maximum likelihood estimation, while MCLUST initializes it with starts from agglomerative hierarchical clustering.
- MPlus seems to use the MLR estimator to obtain robust standard errors, whereas MCLUST uses weighted likelihood bootstrap methods; I'm not sure whether these differ for more complex models, so we should check the standard errors.
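
The first point can be probed directly: mclust's `Mclust` function accepts an `initialization` argument, so the hierarchical-clustering start can be made explicit and varied. A sketch using `mclust` directly rather than tidymixmod; `"EEE"` is my guess at the mclust model name corresponding to the constrained variance-and-covariance specification:

```r
library(mclust)

# By default, Mclust initializes EM from an agglomerative hierarchical
# clustering of the data; passing hcPairs makes that starting solution
# explicit, so sensitivity to the start can be checked by varying it.
fit <- Mclust(iris[, -5], G = 2, modelNames = "EEE",
              initialization = list(hcPairs = hc(iris[, -5])))
fit$loglik
```

If the solution is insensitive to the start, the log-likelihood here should agree with the values reported above; for more complex models, comparing fits from different starts is a cheap robustness check.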

Future directions include:

- Deciding on a uniform interface for prcr and tidymixmod.
- Merging tidymixmod with prcr (our R package for person-oriented analysis).
