Title: | Segmentation Approaches in Chemometrics |
Version: | 0.1.0 |
Maintainer: | Elia Gonzato <elia.gonzato@outlook.it> |
Description: | Evaluation of the prediction performance of smaller spectral regions in chemometrics. Segmentation of spectra, evolving-dimension regions and sliding windows are used as selection methods, and the best model among those computed is selected based on error metrics. Chen et al. (2017) <doi:10.1007/s00216-017-0218-9>. |
License: | MIT + file LICENSE |
URL: | https://github.com/egonzato/windows.pls |
BugReports: | https://github.com/egonzato/windows.pls/issues |
Depends: | R (≥ 2.10) |
Imports: | dplyr, ggplot2, grDevices, mdatools, readr, stringr, tidyr, tidyverse |
Suggests: | testthat (≥ 3.0.0) |
Config/testthat/edition: | 3 |
Encoding: | UTF-8 |
LazyData: | true |
NeedsCompilation: | no |
RoxygenNote: | 7.2.3 |
Packaged: | 2023-08-08 18:07:28 UTC; Elia Gonzato |
Author: | Elia Gonzato [aut, cre, cph] |
Repository: | CRAN |
Date/Publication: | 2023-08-09 17:30:02 UTC |
Beer Dataset from Near Infrared Spectroscopy
Description
The beer dataset contains 60 samples published by Norgaard et al. The spectra were recorded with a 30 mm quartz cell on undiluted, degassed beer, from 1100 to 2250 nm in steps of 2 nm (576 data points). It is a good playground for regression methods based on spectral intensities.
Usage
beer
Format
beer
A data frame with 80 rows and 577 columns:
- y
Original extract concentration
- xtrain
Intensities measured at 576 different data points
Source
https://www.kaggle.com/datasets/robertoschimmenti/beer-nir?resource=download
References
Norgaard, L., Saudland, A., Wagner, J., Nielsen, J. P., Munck, L., & Engelsen, S. B. (2000). Interval partial least-squares regression (iPLS): a comparative chemometric study with an example from near-infrared spectroscopy. Applied Spectroscopy, 54(3), 413–419. Adapted from an R dataset available as part of the OHPL package (https://search.r-project.org/CRAN/refmans/OHPL/html/00Index.html).
Turns wavelengths into variable names
Description
Turns wavelengths into syntactically valid variable names
Usage
convert.names.wl(start = NULL, stop = NULL, step = 2)
Arguments
start |
First wavelength of the spectra. |
stop |
Last wavelength of the spectra. |
step |
Distance between each recorded wavelength. |
Value
Returns a vector of syntactically valid names, one for each wavelength.
Examples
data(beer)
X=beer[,2:ncol(beer)]
head(names(X))
names(X)=convert.names.wl(1100,2250,2)
head(names(X))
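Column names such as 1100 are not syntactically valid in R, which is why a conversion step is needed. A minimal base-R sketch of one way to build such names, assuming the same X prefix that `make.names()` adds; `wavelength_names()` is a hypothetical helper, and the exact format returned by `convert.names.wl()` may differ.

```r
# Hypothetical helper, not the package's implementation:
# build syntactically valid column names from a regular wavelength grid.
wavelength_names <- function(start, stop, step = 2) {
  wl <- seq(start, stop, by = step)  # 1100, 1102, ..., 2250
  make.names(wl)                     # "1100" is not a valid name, so it becomes "X1100"
}

nm <- wavelength_names(1100, 2250, 2)
length(nm)   # 576, one name per data point
head(nm, 3)  # "X1100" "X1102" "X1104"
```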
Cross-validation for segmented spectral regions of the original spectra.
Description
Computes and stores cross-validation metrics for one of three possible modes: ‘wpls’, ‘ewpls’ or ‘swpls’.
Usage
cv.wpls(
xblock = NULL,
yblock = NULL,
windows = 3,
window.size = 30,
increment = 10,
cv = 10,
scale = FALSE,
ncp = 10,
mode = "wpls"
)
Arguments
xblock |
A matrix containing one spectrum for each observation. |
yblock |
A vector containing the concentration associated with each spectrum in the xblock matrix. |
windows |
Used when mode is ‘wpls’ or ‘ewpls’. Number of windows into which the spectra are divided. |
window.size |
Parameter used when ‘swpls’ is chosen. Indicates the width of the window that slides along the spectra. |
increment |
Used when mode is ‘swpls’. Number of steps by which the window moves forward along the spectra before the next model is computed. |
cv |
Number of segments used for cross-validation. |
scale |
Logical; if TRUE, the variables are standardized. |
ncp |
Maximum number of PLS components computed for each model. |
mode |
'wpls','ewpls' or 'swpls', see Details for more. |
Details
NIR and Vis-NIR spectroscopy produce spectra that may contain useful information about the composition of the samples under investigation. Since these techniques have been combined with multivariate statistical methods, researchers have questioned whether the spectra should be used in their entirety or whether dividing them into smaller regions yields better predictive performance. Several methods have been proposed, from selecting single regions to selecting combinations of the best-performing ones. This function provides three options:
- ‘wpls’, which stands for Window PLS, divides the original spectra into several windows, computes a PLS model on each one and stores metrics of interest, such as RMSE and R2, for both calibration and cross-validation.
- ‘ewpls’, which stands for Evolving Window PLS, also divides the original spectra into several windows, but each new window incorporates the previous ones, so that smaller windows are compared with the entire spectra.
- ‘swpls’, which stands for Sliding Window PLS, takes the width of the window used to compute each model and the step by which the window moves forward along the spectra before a new model is computed. In this way the window slides along the spectra, producing several models that are then compared through their metrics.
This function offers a simpler alternative to iPLS, available in the mdatools package, which divides the spectra into smaller segments and searches for the combination with the lowest cross-validation RMSE.
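The three modes differ only in which columns of xblock each model sees. Below is a minimal base-R sketch of those column sets, under two assumptions not stated above: window.size and increment are counted in data points (columns), and ‘wpls’ windows are contiguous and of near-equal size. The helper names are hypothetical and are not part of windows.pls.

```r
# Hypothetical helpers (not part of windows.pls): each returns a list of
# column-index vectors, one per model to be fitted.

# 'wpls': split the p columns into `windows` contiguous, non-overlapping blocks
wpls_regions <- function(p, windows) {
  groups <- ceiling(seq_len(p) * windows / p)  # block label 1..windows for each column
  unname(split(seq_len(p), groups))
}

# 'ewpls': block k is extended back to column 1, so the regions grow
# until the last one covers the entire spectrum
ewpls_regions <- function(p, windows) {
  lapply(wpls_regions(p, windows), function(idx) seq_len(max(idx)))
}

# 'swpls': a window of `window.size` columns moved forward by `increment` columns
swpls_regions <- function(p, window.size, increment) {
  stopifnot(window.size <= p, increment >= 1)
  starts <- seq(1, p - window.size + 1, by = increment)
  lapply(starts, function(s) s:(s + window.size - 1))
}

p <- 576
lengths(wpls_regions(p, 5))       # 115 115 115 115 116
max(ewpls_regions(p, 5)[[5]])     # 576: the last evolving window is the full spectrum
length(swpls_regions(p, 30, 10))  # 55 sliding windows
```

Each element of these lists could then be used to subset xblock before fitting one PLS model per region.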
Value
Returns a list containing:
xblock |
Matrix containing spectra used to train the model. |
yblock |
Vector containing values of the dependent variable. |
cal |
List containing RMSE and R2 of calibration. |
cv |
List containing RMSE and R2 of cross-validation. |
ncp |
Number of components used to compute the model. |
scale |
Contains logical condition used for standardization. |
cv.segment |
Number of segments used for cross-validation. |
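Both cal and cv report RMSE and R2. A minimal sketch of the usual definitions, assuming R2 is computed as one minus the ratio of residual to total sum of squares; mdatools may compute it slightly differently. Calibration metrics use predictions for the training samples from the model fitted on all of them, while cross-validation metrics use, for each sample, a prediction from a model fitted without that sample's segment.

```r
# Sketch of the two metrics, assuming the common textbook definitions.
rmse <- function(y, yhat) sqrt(mean((y - yhat)^2))

r2 <- function(y, yhat) {
  ss_res <- sum((y - yhat)^2)     # residual sum of squares
  ss_tot <- sum((y - mean(y))^2)  # total sum of squares around the mean
  1 - ss_res / ss_tot
}

y    <- c(1, 2, 3, 4)
yhat <- c(1.1, 1.9, 3.2, 3.8)
rmse(y, yhat)  # about 0.158
r2(y, yhat)    # 0.98
```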
References
Chen, J., Yin, Z., Tang, Y. et al. Vis-NIR spectroscopy with moving-window PLS method applied to rapid analysis of whole blood viscosity. Anal Bioanal Chem 409, 2737–2745 (2017).
Y.P. Du, Y.Z. Liang, J.H. Jiang, R.J. Berry, Y. Ozaki, Spectral regions selection to improve prediction ability of PLS models by changeable size moving window partial least squares and searching combination moving window partial least squares, Analytica Chimica Acta, Volume 501, Issue 2, 2004, Pages 183-191.
mdatools package, https://github.com/svkucheryavski/mdatools
Examples
data(beer)
conc=beer[,1]
sp=beer[,2:ncol(beer)]
names(sp)=convert.names.wl(1100,2250,2)
conc=unlist(conc)
mywpls=cv.wpls(sp, conc,mode='wpls', windows = 5)
Plots the R2 of each model in a single figure.
Description
Plots the R2 of each model in a single figure.
Usage
global.r2(
wpls = NULL,
col.cal = "blue",
col.cv = "red",
col.strip.background = "orange",
xlab = NULL,
ylab = NULL,
title = NULL
)
Arguments
wpls |
object obtained from cv.wpls. |
col.cal |
color for the calibration line. |
col.cv |
color for the cross-validation line. |
col.strip.background |
color of the banner for each window. |
xlab |
title of the x axis. |
ylab |
title of the y axis. |
title |
title of the plot. |
Value
Plot of the R2 of each spectral region used to compute a PLS model.
Examples
data(beer)
conc=beer[,1]
sp=beer[,2:ncol(beer)]
names(sp)=convert.names.wl(1100,2250,2)
conc=unlist(conc)
mywpls=cv.wpls(sp, conc,mode='wpls', windows = 5)
global.r2(mywpls,col.cal='navy',
col.cv='red',
col.strip.background='orange',
xlab='Component',
ylab=expression(R^2))
Plots the RMSE of each model in a single figure.
Description
Plots the RMSE of each model in a single figure.
Usage
global.rmse(
wpls = NULL,
col.cal = "blue",
col.cv = "red",
col.strip.background = "steelblue",
xlab = NULL,
ylab = NULL,
title = NULL
)
Arguments
wpls |
object obtained from cv.wpls. |
col.cal |
color for the calibration line. |
col.cv |
color for the cross-validation line. |
col.strip.background |
color of the banner for each window. |
xlab |
title of the x axis. |
ylab |
title of the y axis. |
title |
title of the plot. |
Value
Plot of the RMSE of each spectral region used to compute a PLS model.
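A figure of this kind, one banner per window with calibration and cross-validation curves against the number of components, is typically drawn from a long-format table. The sketch below assumes a hypothetical layout (a named list per metric, one vector per window, element k being the RMSE with k components); the real object returned by cv.wpls may be organised differently, and the numbers are illustrative only.

```r
# Hypothetical layout, not the cv.wpls() object: one RMSE vector per window,
# whose k-th element is the RMSE obtained with k components.
metrics_long <- function(cal, cv) {
  do.call(rbind, lapply(names(cal), function(win) {
    n <- length(cal[[win]])
    data.frame(window = win,
               component = rep(seq_len(n), times = 2),
               type = rep(c("calibration", "cross-validation"), each = n),
               rmse = c(cal[[win]], cv[[win]]))
  }))
}

# Illustrative numbers only
cal <- list(Window1 = c(0.9, 0.6, 0.4), Window2 = c(0.8, 0.5, 0.3))
cv  <- list(Window1 = c(1.0, 0.8, 0.7), Window2 = c(0.9, 0.6, 0.5))
long <- metrics_long(cal, cv)
nrow(long)  # 12: 2 windows x 2 metrics x 3 components

# With ggplot2 installed, one panel per window:
# ggplot2::ggplot(long, ggplot2::aes(component, rmse, colour = type)) +
#   ggplot2::geom_line() + ggplot2::facet_wrap(~ window)
```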
Examples
data(beer)
conc=beer[,1]
sp=beer[,2:ncol(beer)]
names(sp)=convert.names.wl(1100,2250,2)
conc=unlist(conc)
mywpls=cv.wpls(sp, conc,mode='wpls', windows = 5)
global.rmse(mywpls,col.cal='navy',
col.cv='red',
col.strip.background='orange',
xlab='Component',
ylab='RMSE')
Plots spectra highlighting windows with the best performance.
Description
Plots spectra highlighting windows with the best performance.
Usage
map.best.window(
wpls = NULL,
fade = 0.7,
col.window = "steelblue",
xlab = "Wavelengths",
ylab = "Absorbance",
title = NULL,
legend = NULL
)
Arguments
wpls |
object obtained from cv.wpls. |
fade |
opacity of the window. |
col.window |
color of the window that highlights the region. |
xlab |
title of the x axis. |
ylab |
title of the y axis. |
title |
title of the plot. |
legend |
description description |
Value
Plot of the spectra with a window that highlights the region with the lowest cross-validation error.
Examples
data(beer)
conc=beer[,1]
sp=beer[,2:ncol(beer)]
names(sp)=convert.names.wl(1100,2250,2)
conc=unlist(conc)
mywpls=cv.wpls(sp, conc,mode='wpls', windows = 5)
map.best.window(mywpls)
Plots each spectrum, colored according to the associated concentration of the outcome variable
Description
Plots each spectrum, colored according to the associated concentration of the outcome variable.
Usage
map.spectra.gradient(
xblock = NULL,
yblock = NULL,
legend.title = "Gradient",
plot.title = "Spectra with gradient based on Y variable",
xlab = "Wavelength",
ylab = "Absorbance",
grad = 10,
l.width = 0.75,
col.legend = NULL
)
Arguments
xblock |
A matrix containing one spectrum for each observation. |
yblock |
A vector containing the concentration associated with each spectrum in the xblock matrix. |
legend.title |
Title of the legend which displays the gradient. |
plot.title |
Title of the plot. |
xlab |
Title of the x axis. |
ylab |
Title of the y axis. |
grad |
Number of colors for the gradient's palette. |
l.width |
Line width of each spectrum. |
col.legend |
Removes the legend. |
Value
Plot of the spectra of all observations, colored according to the associated concentration.
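The mapping from concentration to color can be sketched by binning the response into grad intervals and picking one color per bin. This is a minimal base-R illustration assuming a two-color blue-to-red ramp; `gradient_colours()` is a hypothetical helper, and the palette used by map.spectra.gradient may differ.

```r
# Hypothetical helper: one colour per spectrum, binned by the response value.
gradient_colours <- function(y, grad = 10, from = "blue", to = "red") {
  palette <- grDevices::colorRampPalette(c(from, to))(grad)  # grad colours, low to high
  bins <- cut(y, breaks = grad, labels = FALSE)              # bin index 1..grad for each y
  palette[bins]
}

g <- gradient_colours(c(0, 5, 10))
g[c(1, 3)]  # "#0000FF" for the lowest value, "#FF0000" for the highest

# Plotting with base graphics, one spectrum per row of X:
# matplot(t(X), type = "l", lty = 1, col = gradient_colours(Y))
```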
Examples
data(beer)
X=beer[,2:ncol(beer)]
names(X)=convert.names.wl(1100,2250,2)
Y=unlist(beer[,1])
map.spectra.gradient(X,Y)
Plots R2 of calibration and cross-validation of a single window.
Description
Plots R2 of calibration and cross-validation of a single window.
Usage
r2.single.window(
wpls = NULL,
condition = "Complete",
shape.cal = 19,
shape.cv = 19,
width = 1,
size = 2,
col.cal = "blue",
col.cv = "red",
xaxis.title = "Component",
yaxis.title = expression(R^2),
title = paste("Plot of R2 for the", condition, "model"),
legend.name = NULL,
x.legend = 0.9,
y.legend = 0.2
)
Arguments
wpls |
object obtained from cv.wpls. |
condition |
name of the window to plot, e.g. 'Window2'; defaults to 'Complete'. |
shape.cal |
shape of the point of the calibration line. |
shape.cv |
shape of the point of the cross-validation line. |
width |
width of the line. |
size |
size of the points of calibration and cross-validation. |
col.cal |
color for the calibration line. |
col.cv |
color for the cross-validation line. |
xaxis.title |
title of the x axis. |
yaxis.title |
title of the y axis. |
title |
title of the plot. |
legend.name |
displays the legend with the given name. |
x.legend |
position of the legend on the x axis, ranges from 0 to 1. |
y.legend |
position of the legend on the y axis, ranges from 0 to 1. |
Value
Plot of R2 of the region requested by the user.
Examples
data(beer)
conc=beer[,1]
sp=beer[,2:ncol(beer)]
names(sp)=convert.names.wl(1100,2250,2)
conc=unlist(conc)
mywpls=cv.wpls(sp, conc,mode='wpls', windows = 5)
r2.single.window(mywpls,'Window2')
Plots RMSE of calibration and cross-validation of a single window.
Description
Plots RMSE of calibration and cross-validation of a single window.
Usage
rmse.single.window(
wpls = NULL,
condition = "Complete",
shape.cal = 19,
shape.cv = 19,
width = 1,
size = 2,
col.cal = "blue",
col.cv = "red",
xaxis.title = "Component",
yaxis.title = "RMSE",
title = paste("Plot of RMSE for the", condition, "model"),
legend.name = NULL,
x.legend = 0.1,
y.legend = 0.2
)
Arguments
wpls |
object obtained from cv.wpls. |
condition |
name of the window to plot, e.g. 'Window2'; defaults to 'Complete'. |
shape.cal |
shape of the point of the calibration line. |
shape.cv |
shape of the point of the cross-validation line. |
width |
width of the line. |
size |
size of the points of calibration and cross-validation. |
col.cal |
color for the calibration line. |
col.cv |
color for the cross-validation line. |
xaxis.title |
title of the x axis. |
yaxis.title |
title of the y axis. |
title |
title of the plot. |
legend.name |
displays the legend with the given name. |
x.legend |
position of the legend on the x axis, ranges from 0 to 1. |
y.legend |
position of the legend on the y axis, ranges from 0 to 1. |
Value
Plot of RMSE of the region requested by the user.
Examples
data(beer)
conc=unlist(beer[,1])
sp=beer[,2:ncol(beer)]
names(sp)=convert.names.wl(1100,2250,2)
mywpls=cv.wpls(sp, conc,mode='wpls', windows = 5)
rmse.single.window(mywpls,'Window2')
Displays how spectra are divided into windows
Description
Displays how spectra are divided into windows
Usage
segment.windows(
xblock = NULL,
yblock = NULL,
windows = 3,
fade = 0.3,
xlab = "Wavelength",
ylab = "Absorbance",
title = paste("Spectra divided in", windows, "segments", sep = " "),
legend = NULL,
grad = 10
)
Arguments
xblock |
A matrix containing one spectrum for each observation. |
yblock |
A vector containing the concentration associated with each spectrum in the xblock matrix. |
windows |
Number of windows into which the spectra are divided. |
fade |
Opacity of the window. |
xlab |
Title of the x axis. |
ylab |
Title of the y axis. |
title |
Title of the plot. |
legend |
Name of the substance whose concentration drives the color gradient of the spectra. |
grad |
Number of colors that are used to build the gradient. |
Value
Plot of the spectra in which each segment has a different background color.
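To shade the segments, each window needs a start and end wavelength. A minimal base-R sketch, assuming contiguous windows of near-equal size on the 1100 to 2250 nm grid used in the examples; `segment_limits()` is a hypothetical helper, and segment.windows may split the spectra differently.

```r
# Hypothetical helper: wavelength limits of each of `windows` contiguous segments.
segment_limits <- function(wl, windows) {
  groups <- ceiling(seq_along(wl) * windows / length(wl))  # segment label per wavelength
  do.call(rbind, lapply(split(wl, groups),
                        function(w) c(from = min(w), to = max(w))))
}

wl <- seq(1100, 2250, by = 2)
lim <- segment_limits(wl, 7)
lim[1, ]  # from = 1100, to = 1262

# Shading behind already-plotted spectra with base graphics:
# rect(lim[, "from"], par("usr")[3], lim[, "to"], par("usr")[4],
#      col = grDevices::adjustcolor(seq_len(nrow(lim)) + 1, alpha.f = 0.25),
#      border = NA)
```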
Examples
data(beer)
conc=unlist(beer[,1])
sp=beer[,2:ncol(beer)]
names(sp)=convert.names.wl(1100,2250,2)
segment.windows(sp,conc,windows=7,fade=0.25)
Selection of the best window computed with cv.wpls
Description
Takes as input the object containing the metrics of the models computed with cv.wpls and selects the best one, based on the lowest RMSE; it then computes PLS on the selected region and returns an object containing the results.
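The selection logic can be sketched end to end in base R. This is a simplified stand-in, not the package's code: PLS1 is fitted with a hand-written NIPALS loop instead of mdatools::pls(), predictors are only centred, CV segments are assigned at random, and `pls1_fit()`, `rmsecv()` and `select_best_window()` are hypothetical names.

```r
# Sketch only: hand-written NIPALS PLS1, used instead of mdatools::pls().
pls1_fit <- function(X, y, ncomp) {
  xm <- colMeans(X); ym <- mean(y)
  E <- sweep(X, 2, xm)   # centred predictors (no scaling)
  f <- y - ym            # centred response
  W <- P <- matrix(0, ncol(X), ncomp)
  q <- numeric(ncomp)
  for (a in seq_len(ncomp)) {
    w <- drop(crossprod(E, f)); w <- w / sqrt(sum(w^2))  # weights
    tt <- drop(E %*% w)                                  # scores
    P[, a] <- drop(crossprod(E, tt)) / sum(tt^2)         # x-loadings
    q[a] <- sum(f * tt) / sum(tt^2)                      # y-loading
    E <- E - outer(tt, P[, a])                           # deflate X
    f <- f - q[a] * tt                                   # deflate y
    W[, a] <- w
  }
  list(xm = xm, ym = ym, B = W %*% solve(crossprod(P, W), q))
}

pls1_predict <- function(fit, X) drop(sweep(X, 2, fit$xm) %*% fit$B) + fit$ym

# Cross-validated RMSE for a fixed number of components and fixed segments
rmsecv <- function(X, y, ncomp, folds) {
  pred <- numeric(length(y))
  for (k in unique(folds)) {
    test <- folds == k
    fit <- pls1_fit(X[!test, , drop = FALSE], y[!test], ncomp)
    pred[test] <- pls1_predict(fit, X[test, , drop = FALSE])
  }
  sqrt(mean((y - pred)^2))
}

# For each region keep the lowest RMSECV over 1..ncomp components,
# then pick the region with the smallest value (assumes enough samples per fold)
select_best_window <- function(X, y, regions, ncomp = 10, cv = 10, seed = 1) {
  set.seed(seed)
  folds <- sample(rep(seq_len(cv), length.out = nrow(X)))  # random CV segments
  err <- vapply(regions, function(idx) {
    Xw <- X[, idx, drop = FALSE]
    min(vapply(seq_len(min(ncomp, ncol(Xw))),
               function(a) rmsecv(Xw, y, a, folds), numeric(1)))
  }, numeric(1))
  list(best = which.min(err), rmsecv = err)
}

# Synthetic data: only columns 41..60 carry signal
set.seed(42)
X <- matrix(rnorm(60 * 100), nrow = 60)
y <- rowSums(X[, 41:60]) + rnorm(60, sd = 0.1)
regions <- split(seq_len(100), ceiling(seq_len(100) * 5 / 100))  # 5 windows of 20
select_best_window(X, y, regions)$best  # expected: window 3 (columns 41..60)
```

Taking the minimum over components mirrors the idea of choosing the lowest available RMSE; mdatools::pls() also handles scaling, validation options and outputs that this sketch omits.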
Usage
sel.best.window(wpls = NULL)
Arguments
wpls |
object obtained from cv.wpls. |
Value
An object containing the results of the best model, with the same content as a model obtained from the function pls of mdatools.
Examples
data(beer)
conc=beer[,1]
sp=beer[,2:ncol(beer)]
names(sp)=convert.names.wl(1100,2250,2)
conc=unlist(conc)
mywpls=cv.wpls(sp, conc,mode='wpls', windows = 5)
best.pls=sel.best.window(mywpls)