The DEmixR package provides tools for fitting and
evaluating two-component mixture models (normal and
lognormal) using robust global optimization via Differential Evolution
(DEoptim
), with local refinement by quasi-Newton
optimization.
It is designed to be more robust than standard EM-based methods, particularly in challenging situations where the mixture components overlap strongly or have complex likelihood surfaces.
You can install the released version of DEmixR
from CRAN
with:
prelim_plots()
– Diagnostic plots (histogram, QQ, PP,
log-QQ) select_best_mixture()
– Compare lognormal vs normal
mixtures using BIC fit_norm2()
– Fit a two-component
normal mixture fit_lognorm2()
– Fit a
two-component lognormal mixture
bootstrap_mix2()
– Bootstrap mixture parameters (parametric
or nonparametric) evaluate_init()
– Refine initial
parameter values with local optimization
Use prelim_plots()
for initial data exploration and
distribution assessment Use select_best_mixture()
when
unsure whether normal or lognormal
distribution is appropriate Use
fit_norm2()
/fit_lognorm2()
when you know the
appropriate distribution family Use bootstrap_mix2()
for
uncertainty quantification and confidence intervals Use
evaluate_init()
for testing specific starting values or
refining solutions
We first generate synthetic data from a two-component
normal mixture and draw the basic plots with
DEmixR
.
# Load the package
library(DEmixR)
# Simulation data
set.seed(123)
x <- c(rnorm(3000, mean = 0, sd = 1),
rnorm(2000, mean = 3, sd = 1.2))
# Diagnostic plots
prelim_plots(x, which = c("hist", "qq"))
# Automatically choose the best mixture family
best_model <- select_best_mixture(x, n_runs = 3, NP = 50, itermax = 2000)
## | | | 0% | |+++++++++++++++++++++++ | 33% | |+++++++++++++++++++++++++++++++++++++++++++++++ | 67% | |++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++| 100%
## [1] "normal"
## normal
## 19547.24
## | | | 0% | |++++++++++++++ | 20% | |++++++++++++++++++++++++++++ | 40% | |++++++++++++++++++++++++++++++++++++++++++ | 60% | |++++++++++++++++++++++++++++++++++++++++++++++++++++++++ | 80% | |++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++| 100%
## p m1 s1 m2 s2
## 0.599738895 0.005086783 0.983210109 2.984878390 1.181022679
## [1] -9752.326
## [1] 19514.65
## [1] 19547.24
# Simulation data
set.seed(123)
y <- c(rlnorm(3000, meanlog = 0, sdlog = 0.5),
rlnorm(2000, meanlog = 1, sdlog = 0.7))
# Fit a two-component lognormal mixture
fit_ln <- fit_lognorm2(y, n_runs = 5, parallelType = 0)
## | | | 0% | |++++++++++++++ | 20% | |++++++++++++++++++++++++++++ | 40% | |++++++++++++++++++++++++++++++++++++++++++ | 60% | |++++++++++++++++++++++++++++++++++++++++++++++++++++++++ | 80% | |++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++| 100%
## p m1 s1 m2 s2
## 0.57503793 -0.01180295 0.48857490 0.95267679 0.69951039
# Bootstrap confidence intervals
boot_res <- bootstrap_mix2(fit_ln, B = 100, parametric = TRUE, parallelType = 0)
boot_res$central
## p m1 s1 m2 s2
## 0.57428385 -0.01681622 0.48844615 0.95834689 0.69416098
## p m1 s1 m2 s2
## 2.5% 0.4115206 -0.06678695 0.4343919 0.7041057 0.6095682
## 97.5% 0.7041991 0.05896932 0.5272649 1.2061600 0.7729422
## $success
## [1] TRUE
##
## $par
## [1] 0.57497232 -0.01185789 0.48853997 0.95258083 0.69950326
##
## $logLik
## [1] -7547.12
##
## $convergence
## [1] 0
##
## $message
## [1] "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH"
# Simulated biomarker data with two populations
set.seed(456)
biomarker <- c(rlnorm(4000, meanlog = 2.5, sdlog = 0.4), # Healthy group
rlnorm(1000, meanlog = 3.8, sdlog = 0.6)) # Disease group
# Initial exploration
prelim_plots(biomarker, which = c("hist", "logqq"))
# Fit lognormal mixture
fit_bio <- try(fit_lognorm2(biomarker, n_runs = 5, parallelType = 0, quiet = 2))
## | | | 0% | |++++++++++++++ | 20% | |++++++++++++++++++++++++++++ | 40% | |++++++++++++++++++++++++++++++++++++++++++ | 60% | |++++++++++++++++++++++++++++++++++++++++++++++++++++++++ | 80% | |++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++| 100%
if (!inherits(fit_bio, "try-error")) {
cat("Biomarker analysis results:\n")
print(fit_bio$par)
# Estimate proportion in each group
cat("\nEstimated proportion in group 1 (healthy):",
round(fit_bio$par[1], 3), "\n")
cat("Estimated proportion in group 2 (disease):",
round(1 - fit_bio$par[1], 3), "\n")
# Bootstrap for confidence intervals
boot_bio <- try(bootstrap_mix2(fit_bio, B = 100, quiet = 2))
if (!inherits(boot_bio, "try-error")) {
cat("\n95% CI for proportion in group 1:\n")
print(boot_bio$ci[, "p"])
}
}
## Biomarker analysis results:
## p m1 s1 m2 s2
## 0.7942352 2.5045852 0.3877220 3.7999468 0.6123135
##
## Estimated proportion in group 1 (healthy): 0.794
## Estimated proportion in group 2 (disease): 0.206
##
## 95% CI for proportion in group 1:
## 2.5% 97.5%
## 0.7543291 0.8244741
This section illustrates some of the optional parameters for fine-tuning performance and diagnostics.
fit_*
# Fit lognormal mixture with custom optimization settings
fit_ln <- fit_lognorm2(
x,
NP = 150, # population size for DEoptim
n_runs = 20, # independent runs to avoid local optima
itermax = 200, # maximum iterations per run
parallelType = 1, # use parallelization across platforms
quiet = 1, # print minimal progress info
par_init = NULL # optionally supply initial values
)
Key options: - NP
: larger = better
exploration, but slower. - n_runs
: increase for more
robustness; with 1000, it is very slow, but have ultra robustness -
parallelType
: 0 = serial
,
1 = socket clusters (Windows/Mac/Linux)
,
2 = forking (Mac/Linux only)
. - quiet
:
0 = verbose
, 1 = minimal
,
2 = silent
.
bootstrap_mix2
boot_res <- bootstrap_mix2(
fit_ln,
B = 500, # number of bootstrap replications
parallelType = 1, # parallel computation
parametric = TRUE, # parametric bootstrap
ci_level = 0.90, # confidence interval level
quiet = 1 # show progress bar
)
boot_res$central # means for normal and medians for lognormal
boot_res$ci # bootstrap confidence intervals
Notes: - Supports both parametric and nonparametric bootstrap. - Uses an internal function to avoid label switching.
prelim_plots
prelim_plots(
x,
which = c("hist", "qq", "pp", "logqq"),
hist_bins = 40,
col_hist = "grey85",
col_density = "darkorange",
col_qq = "grey60",
col_line = "darkorange"
)
Options: - which
: choose one or more of
"hist", "qq", "pp", "logqq"
. - hist_bins
:
number of bins for histogram. - col_*
: customize plot
colors.
select_best_mixture
sel <- select_best_mixture(
x,
n_runs = 5,
NP = 100,
itermax = 150,
quiet = 2
)
sel$best$family
sel$best$par
What it does: - Fits both Normal and Lognormal mixtures. - Compares them via BIC. - Returns the best-performing family.
evaluate_init
init_eval <- evaluate_init(
x,
family = "normal",
n_starts = 10, # number of random initializations
NP = 50, # DEoptim population size
itermax = 100, # maximum iterations
quiet = 2 # suppress output
)
print(init_eval)
Usage: - Helps check the stability of optimization. - Useful to diagnose poor convergence.
The DEmixR package provides robust and flexible tools for mixture model analysis, making it a strong alternative to existing EM-based approaches. Its use of global optimization helps avoid local minima and improve reliability, especially in complex or overlapping mixtures.
Note: Harmless socket connection warnings may appear when using parallelization.
prelim_plots()
to understand your
data distributionselect_best_mixture()
when uncertain about
distribution familyn_runs
for challenging optimization
problemsevaluate_init()
if
convergence is problematic For more information, see the help files for
each function: ?fit_norm2
, ?bootstrap_mix2
,
etc.