| Title: | The COR for Optimal Subset Selection in Distributed Estimation |
| Date: | 2024-12-10 |
| Version: | 0.2.0 |
| Description: | An algorithm for optimal subset selection in distributed estimation, based on Covariance matrices, Observation matrices and Response vectors (COR). The philosophy of the package is described in Guo G. (2024) <doi:10.1007/s11222-024-10471-z>. |
| License: | MIT + file LICENSE |
| Encoding: | UTF-8 |
| RoxygenNote: | 7.3.2 |
| Imports: | stats |
| NeedsCompilation: | no |
| Packaged: | 2024-12-16 08:52:32 UTC; ASUS |
| Author: | Guangbao Guo |
| Maintainer: | Guangbao Guo <ggb11111111@163.com> |
| Depends: | R (≥ 3.5.0) |
| Repository: | CRAN |
| Date/Publication: | 2024-12-16 10:20:02 UTC |
Calculate the optimal subset lengths under the COR criterion
Description
Calculate the optimal subset lengths under the COR criterion
Usage
COR(K = K, nk = nk, alpha = alpha, X = X, y = y)
Arguments
| K | The number of subsets |
| nk | The number of observations in each subset |
| alpha | The significance level |
| X | The observation matrix |
| y | The response vector |
Value
A list containing:
| seqL | The index of the subset with the minimum L value. |
| seqN | The index of the subset with the minimum N value. |
| lWMN | The optimal subset lengths on the COR. |
References
Guo, G., Song, H. & Zhu, L. The COR criterion for optimal subset selection in distributed estimation. Statistics and Computing, 34, 163 (2024). doi:10.1007/s11222-024-10471-z
Examples
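# Simulate a linear model with n observations and p predictors
p <- 6; n <- 1000; K <- 2; nk <- 200; alpha <- 0.05; sigma <- 1
e <- rnorm(n, 0, sigma)                      # error term
beta <- sort(runif(p, 0, 1))                 # true coefficients
X <- matrix(rnorm(n * p, 5, 10), ncol = p)   # observation matrix
y <- X %*% beta + e                          # response vector
COR(K = K, nk = nk, alpha = alpha, X = X, y = y)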
Calculate the LIC estimator for linear regression
Description
This function estimates the coefficients of a linear regression model from a design matrix 'X' and a response vector 'Y'. It applies the A-optimal and D-optimal design criteria to choose optimal subsets of observations.
Usage
LICbeta(X, Y, alpha, K, nk)
Arguments
| X | The observation matrix (n x p) |
| Y | The response vector (n x 1) |
| alpha | The significance level for computing confidence intervals |
| K | The number of subsets |
| nk | The number of observations per subset |
Value
A list containing:
| E5 | The LIC estimator for linear regression. |
References
Guo, G., Song, H. & Zhu, L. The COR criterion for optimal subset selection in distributed estimation. Statistics and Computing, 34, 163 (2024). doi:10.1007/s11222-024-10471-z
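The A-optimal and D-optimal criteria mentioned in the description can be illustrated with base R alone. The following is a minimal sketch, not the internal code of LICbeta: it assumes K randomly drawn subsets of size nk, scores each by the trace of the inverse information matrix (A-optimality, smaller is better) and by the log-determinant of the information matrix (D-optimality, larger is better), and fits least squares on the selected subset. The names subsets, info, a_crit, d_crit and fit_subset are hypothetical.

```r
set.seed(1)
n <- 1000; p <- 6; K <- 5; nk <- 200

X <- matrix(rnorm(n * p, 5, 10), ncol = p)
beta <- sort(runif(p, 0, 1))
Y <- X %*% beta + rnorm(n)

# Assumption: K random index sets of size nk (the package may partition differently)
subsets <- lapply(seq_len(K), function(k) sample(n, nk))

# Information matrix X_k' X_k for each subset
info <- lapply(subsets, function(idx) crossprod(X[idx, , drop = FALSE]))

# A-optimality: smallest trace of the inverse information matrix
a_crit <- vapply(info, function(M) sum(diag(solve(M))), numeric(1))
# D-optimality: largest determinant, compared on the log scale for stability
d_crit <- vapply(info, function(M) {
  as.numeric(determinant(M, logarithm = TRUE)$modulus)
}, numeric(1))

a_best <- which.min(a_crit)
d_best <- which.max(d_crit)

# Least squares fit restricted to the rows of one subset
fit_subset <- function(idx) {
  Xk <- X[idx, , drop = FALSE]
  drop(solve(crossprod(Xk), crossprod(Xk, Y[idx])))
}
beta_A <- fit_subset(subsets[[a_best]])
beta_D <- fit_subset(subsets[[d_best]])
```

The two criteria need not pick the same subset, which is why both estimates are kept separately here.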
Calculate the LIC estimator based on the A-optimal and D-optimal criteria
Description
Calculate the LIC estimator based on the A-optimal and D-optimal criteria
Usage
LICnew(X, Y, alpha, K, nk)
Arguments
| X | A matrix of observations (design matrix) with size n x p |
| Y | A vector of responses with length n |
| alpha | The significance level for confidence intervals |
| K | The number of subsets to consider |
| nk | The size of each subset |
Value
A list containing:
| E5 | The LIC estimator based on the A-optimal and D-optimal criteria. |
References
Guo, G., Song, H. & Zhu, L. The COR criterion for optimal subset selection in distributed estimation. Statistics and Computing, 34, 163 (2024). doi:10.1007/s11222-024-10471-z
Examples
# Simulate a linear model with n observations and p predictors
p <- 6; n <- 1000; K <- 2; nk <- 200; alpha <- 0.05; sigma <- 1
e <- rnorm(n, 0, sigma)                      # error term
beta <- sort(runif(p, 0, 1))                 # true coefficients
X <- matrix(rnorm(n * p, 5, 10), ncol = p)   # observation matrix
Y <- X %*% beta + e                          # response vector
LICnew(X = X, Y = Y, alpha = alpha, K = K, nk = nk)
Calculate MSE values for different beta estimation methods
Description
Calculate MSE values for different beta estimation methods
Usage
MSEbeta(X, Y, alpha, K, nk)
Arguments
| X | The design matrix (observations). |
| Y | The response vector. |
| alpha | The significance level. |
| K | The number of subsets. |
| nk | The length of subsets (number of observations in each subset). |
Value
A list containing:
| MSECOR | The MSE of the COR beta estimator. |
| MSEAopt | The MSE of the A-optimal beta estimator. |
| MSEDopt | The MSE of the D-optimal beta estimator. |
| MSElic | The MSE of the LIC beta estimator. |
References
Guo, G., Song, H. & Zhu, L. The COR criterion for optimal subset selection in distributed estimation. Statistics and Computing, 34, 163 (2024). doi:10.1007/s11222-024-10471-z
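As a rough illustration of what an MSE comparison between coefficient estimators looks like, the sketch below defines a hypothetical helper mse() as the mean squared difference between an estimate and the true coefficient vector. The source does not state whether MSEbeta averages or sums the squared differences, so the averaging here is an assumption, and the two estimators compared (full-data and single random subset) are stand-ins rather than the COR, A-optimal, D-optimal or LIC estimators.

```r
set.seed(2)
n <- 500; p <- 4
X <- matrix(rnorm(n * p), ncol = p)
beta <- c(0.2, 0.4, 0.6, 0.8)
Y <- drop(X %*% beta) + rnorm(n, sd = 0.5)

# Hypothetical helper: mean squared difference between two coefficient vectors
mse <- function(est, truth) mean((est - truth)^2)

# Least squares on all n rows
beta_full <- drop(solve(crossprod(X), crossprod(X, Y)))

# Least squares on one random subset of 100 rows
idx <- sample(n, 100)
Xs <- X[idx, , drop = FALSE]
beta_sub <- drop(solve(crossprod(Xs), crossprod(Xs, Y[idx])))

c(full = mse(beta_full, beta), subset = mse(beta_sub, beta))
```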
Calculate the MSE values of the COR criterion in simulation
Description
Calculate the MSE values of the COR criterion in simulation
Usage
MSEcom(K = K, nk = nk, alpha = alpha, X = X, y = y)
Arguments
| K | The number of subsets |
| nk | The number of observations in each subset |
| alpha | The significance level |
| X | The observation matrix |
| y | The response vector |
Value
A list containing:
| MSEx | The Mean Squared Error between the true beta and the estimate betax based on the COR. |
| MSEA | The Mean Squared Error between the true beta and the estimate betaA based on the least squares estimate for subset A. |
| MSEc | The Mean Squared Error between the true beta and the estimate betac based on the COR-selected subset. |
| MSEm | The Mean Squared Error between the true beta and the median estimator betamm across all subsets. |
| MSEa | The Mean Squared Error between the true beta and the mean estimator betaa across all subsets. |
References
Guo, G., Song, H. & Zhu, L. The COR criterion for optimal subset selection in distributed estimation. Statistics and Computing, 34, 163 (2024). doi:10.1007/s11222-024-10471-z
Examples
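# Simulate a linear model with n observations and p predictors
p <- 6; n <- 1000; K <- 2; nk <- 500; alpha <- 0.05; sigma <- 1
e <- rnorm(n, 0, sigma)                      # error term
beta <- sort(runif(p, 0, 1))                 # true coefficients
X <- matrix(rnorm(n * p, 5, 10), ncol = p)   # observation matrix
y <- X %*% beta + e                          # response vector
MSEcom(K = K, nk = nk, alpha = alpha, X = X, y = y)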
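The mean and median estimators across subsets listed under Value can be sketched in base R. This is an illustration only, under the assumption that the data are split into K disjoint random subsets of size nk and that each subset is fitted by ordinary least squares; the names groups, est, beta_mean and beta_median are hypothetical, and the way MSEcom forms its subsets is not stated in the source.

```r
set.seed(3)
n <- 1000; p <- 6; K <- 5; nk <- 200
X <- matrix(rnorm(n * p, 5, 10), ncol = p)
beta <- sort(runif(p, 0, 1))
y <- drop(X %*% beta) + rnorm(n)

# Assumption: disjoint subsets formed by randomly permuting the row indices
groups <- split(sample(n), rep(seq_len(K), each = nk))

# p x K matrix holding one column of least squares estimates per subset
est <- vapply(groups, function(idx) {
  Xk <- X[idx, , drop = FALSE]
  drop(solve(crossprod(Xk), crossprod(Xk, y[idx])))
}, numeric(p))

beta_mean <- rowMeans(est)             # averaging estimator
beta_median <- apply(est, 1, median)   # componentwise median estimator

# Mean squared error of each combined estimator against the true beta
mean((beta_mean - beta)^2)
mean((beta_median - beta)^2)
```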
Calculate the MSE values of the COR criterion for redundant data in simulation
Description
Calculate the MSE values of the COR criterion for redundant data in simulation
Usage
MSEver(K = K, nk = nk, alpha = alpha, X = X, y = y)
Arguments
| K | The number of subsets |
| nk | The number of observations in each subset |
| alpha | The significance level |
| X | The observation matrix |
| y | The response vector |
Value
A list containing:
| minE | The minimum value of the error variance estimator. |
| Mcor | The MSE of the COR estimator. |
| Mx | The MSE of the estimator based on the subset with the maximum M. |
| MA | The MSE of the estimator based on the subset with the minimum W. |
References
Guo, G., Song, H. & Zhu, L. The COR criterion for optimal subset selection in distributed estimation. Statistics and Computing, 34, 163 (2024). doi:10.1007/s11222-024-10471-z
Examples
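# Simulate a linear model with n observations and p predictors
p <- 6; n <- 1000; K <- 2; nk <- 200; alpha <- 0.05; sigma <- 1
e <- rnorm(n, 0, sigma)                      # error term
beta <- sort(runif(p, 0, 1))                 # true coefficients
X <- matrix(rnorm(n * p, 5, 10), ncol = p)   # observation matrix
y <- X %*% beta + e                          # response vector
MSEver(K = K, nk = nk, alpha = alpha, X = X, y = y)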
Calculate the estimators of beta under the A-optimal and D-optimal criteria
Description
Calculate the estimators of beta under the A-optimal and D-optimal criteria
Usage
beta_AD(K = K, nk = nk, alpha = alpha, X = X, y = y)
Arguments
| K | The number of subsets |
| nk | The number of observations in each subset |
| alpha | The significance level |
| X | The observation matrix |
| y | The response vector |
Value
A list containing:
| betaA | The estimator of beta under the A-optimal criterion. |
| betaD | The estimator of beta under the D-optimal criterion. |
References
Guo, G., Song, H. & Zhu, L. The COR criterion for optimal subset selection in distributed estimation. Statistics and Computing, 34, 163 (2024). doi:10.1007/s11222-024-10471-z
Examples
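# Simulate a linear model with n observations and p predictors
p <- 6; n <- 1000; K <- 2; nk <- 200; alpha <- 0.05; sigma <- 1
e <- rnorm(n, 0, sigma)                      # error term
beta <- sort(runif(p, 0, 1))                 # true coefficients
X <- matrix(rnorm(n * p, 5, 10), ncol = p)   # observation matrix
y <- X %*% beta + e                          # response vector
beta_AD(K = K, nk = nk, alpha = alpha, X = X, y = y)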
Calculate the estimators of beta on the LEV-opt
Description
Calculate the estimators of beta on the LEV-opt
Usage
beta_LW(X, Y, K, nk)
Arguments
| X | The observation matrix |
| Y | The response vector |
| K | The number of subsets |
| nk | The number of observations in each subset |
Value
A list containing:
| betalev | The estimator of beta on the LEV-opt subset. |
| betam | The mean of the beta estimators across all K subsets. |
| AMSE | The Average Mean Squared Error (AMSE) for the estimator. |
| WMSE | The Weighted Mean Squared Error (WMSE) for the estimator. |
| MSElevb | The Mean Squared Error (MSE) of the LEV-opt estimator compared to the true beta. |
| MSEb | The Mean Squared Error (MSE) of the mean estimator (betam) compared to the true beta. |
| MSEyleva | The Mean Squared Error (MSE) of the LEV-opt estimator on the subset with the maximum hat value (Xleva). |
| MSEyleviy | The Mean Squared Error (MSE) of the LEV-opt estimator on the subset with the minimum hat value (Xlevi). |
| MSEW | The Mean Squared Error (MSE) of the weighted estimator (Wbeta) compared to the true beta. |
| MSEw | The Mean Squared Error (MSE) of the weighted estimator (wbeta) compared to the true beta. |
References
Guo, G., Song, H. & Zhu, L. The COR criterion for optimal subset selection in distributed estimation. Statistics and Computing, 34, 163 (2024). doi:10.1007/s11222-024-10471-z
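The hat values referred to under Value are the diagonal entries of the hat matrix X (X'X)^{-1} X'. The sketch below computes them through a QR decomposition and ranks subsets by their total leverage. How beta_LW actually defines the LEV-opt subset is not stated in the source, so the disjoint random split and the "total leverage per subset" score are assumptions made for illustration, and the names lev, groups and score are hypothetical.

```r
set.seed(4)
n <- 1000; p <- 6; K <- 5; nk <- 200
X <- matrix(rnorm(n * p, 5, 10), ncol = p)

# Leverage h_ii = diag(X (X'X)^{-1} X'), obtained as row sums of squares of Q
Q <- qr.Q(qr(X))
lev <- rowSums(Q^2)

# Assumption: disjoint random subsets, each scored by its total leverage
groups <- split(sample(n), rep(seq_len(K), each = nk))
score <- vapply(groups, function(idx) sum(lev[idx]), numeric(1))

which.max(score)  # subset with the largest total leverage
which.min(score)  # subset with the smallest total leverage
```

For a full-rank X the leverages sum to p, which gives a quick sanity check on the computation.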
Calculate the estimator of beta on the COR
Description
Calculate the estimator of beta on the COR
Usage
beta_cor(K = K, nk = nk, alpha = alpha, X = X, y = y)
Arguments
| K | The number of subsets |
| nk | The number of observations in each subset |
| alpha | The significance level |
| X | The observation matrix |
| y | The response vector |
Value
A list containing:
| betaC | The estimator of beta on the COR. |
References
Guo, G., Song, H. & Zhu, L. The COR criterion for optimal subset selection in distributed estimation. Statistics and Computing, 34, 163 (2024). doi:10.1007/s11222-024-10471-z
Examples
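# Simulate a linear model with n observations and p predictors
p <- 6; n <- 1000; K <- 2; nk <- 200; alpha <- 0.05; sigma <- 1
e <- rnorm(n, 0, sigma)                      # error term
beta <- sort(runif(p, 0, 1))                 # true coefficients
X <- matrix(rnorm(n * p, 5, 10), ncol = p)   # observation matrix
y <- X %*% beta + e                          # response vector
beta_cor(K = K, nk = nk, alpha = alpha, X = X, y = y)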
The communities and crime data set
Description
A data set on communities and crime.
Usage
data("communities")
Format
A data frame with 1994 observations on the following 128 variables.
| V1 | a numeric vector |
| V2 | a numeric vector |
| V3 | a numeric vector |
| V4 | a character vector |
| V5 | a numeric vector |
| V6 | a numeric vector |
| V7 | a numeric vector |
| V8 | a numeric vector |
| V9 | a numeric vector |
| V10 | a numeric vector |
| V11 | a numeric vector |
| V12 | a numeric vector |
| V13 | a numeric vector |
| V14 | a numeric vector |
| V15 | a numeric vector |
| V16 | a numeric vector |
| V17 | a numeric vector |
| V18 | a numeric vector |
| V19 | a numeric vector |
| V20 | a numeric vector |
| V21 | a numeric vector |
| V22 | a numeric vector |
| V23 | a numeric vector |
| V24 | a numeric vector |
| V25 | a numeric vector |
| V26 | a numeric vector |
| V27 | a numeric vector |
| V28 | a numeric vector |
| V29 | a numeric vector |
| V30 | a numeric vector |
| V31 | a numeric vector |
| V32 | a numeric vector |
| V33 | a numeric vector |
| V34 | a numeric vector |
| V35 | a numeric vector |
| V36 | a numeric vector |
| V37 | a numeric vector |
| V38 | a numeric vector |
| V39 | a numeric vector |
| V40 | a numeric vector |
| V41 | a numeric vector |
| V42 | a numeric vector |
| V43 | a numeric vector |
| V44 | a numeric vector |
| V45 | a numeric vector |
| V46 | a numeric vector |
| V47 | a numeric vector |
| V48 | a numeric vector |
| V49 | a numeric vector |
| V50 | a numeric vector |
| V51 | a numeric vector |
| V52 | a numeric vector |
| V53 | a numeric vector |
| V54 | a numeric vector |
| V55 | a numeric vector |
| V56 | a numeric vector |
| V57 | a numeric vector |
| V58 | a numeric vector |
| V59 | a numeric vector |
| V60 | a numeric vector |
| V61 | a numeric vector |
| V62 | a numeric vector |
| V63 | a numeric vector |
| V64 | a numeric vector |
| V65 | a numeric vector |
| V66 | a numeric vector |
| V67 | a numeric vector |
| V68 | a numeric vector |
| V69 | a numeric vector |
| V70 | a numeric vector |
| V71 | a numeric vector |
| V72 | a numeric vector |
| V73 | a numeric vector |
| V74 | a numeric vector |
| V75 | a numeric vector |
| V76 | a numeric vector |
| V77 | a numeric vector |
| V78 | a numeric vector |
| V79 | a numeric vector |
| V80 | a numeric vector |
| V81 | a numeric vector |
| V82 | a numeric vector |
| V83 | a numeric vector |
| V84 | a numeric vector |
| V85 | a numeric vector |
| V86 | a numeric vector |
| V87 | a numeric vector |
| V88 | a numeric vector |
| V89 | a numeric vector |
| V90 | a numeric vector |
| V91 | a numeric vector |
| V92 | a numeric vector |
| V93 | a numeric vector |
| V94 | a numeric vector |
| V95 | a numeric vector |
| V96 | a numeric vector |
| V97 | a numeric vector |
| V98 | a numeric vector |
| V99 | a numeric vector |
| V100 | a numeric vector |
| V101 | a numeric vector |
| V102 | a numeric vector |
| V103 | a numeric vector |
| V104 | a numeric vector |
| V105 | a numeric vector |
| V106 | a numeric vector |
| V107 | a numeric vector |
| V108 | a numeric vector |
| V109 | a numeric vector |
| V110 | a numeric vector |
| V111 | a numeric vector |
| V112 | a numeric vector |
| V113 | a numeric vector |
| V114 | a numeric vector |
| V115 | a numeric vector |
| V116 | a numeric vector |
| V117 | a numeric vector |
| V118 | a numeric vector |
| V119 | a numeric vector |
| V120 | a numeric vector |
| V121 | a numeric vector |
| V122 | a numeric vector |
| V123 | a numeric vector |
| V124 | a numeric vector |
| V125 | a numeric vector |
| V126 | a numeric vector |
| V127 | a numeric vector |
| V128 | a numeric vector |
Source
UCI Machine Learning Repository
References
Redmond, M. A. & Baveja, A. A data-driven software tool for enabling cooperative information sharing among police departments. European Journal of Operational Research, 141, 660-678 (2002).
Examples
data(communities)
str(communities[, 1:6])   # inspect the structure of the first few columns
The chemical sensor data set
Description
A data set of chemical sensor readings.
Usage
data("ethylene_CO")
Format
A data frame with 4001 observations on the following 19 variables.
| V1 | a character vector |
| V2 | a character vector |
| V3 | a character vector |
| V4 | a character vector |
| V5 | a character vector |
| V6 | a character vector |
| V7 | a character vector |
| V8 | a character vector |
| V9 | a character vector |
| V10 | a character vector |
| V11 | a character vector |
| V12 | a character vector |
| V13 | a character vector |
| V14 | a character vector |
| V15 | a character vector |
| V16 | a character vector |
| V17 | a character vector |
| V18 | a character vector |
| V19 | a character vector |
Details
The data are the first 4001 rows of the original data set, which has about 1048576 observations on 19 variables.
Source
UCI Machine Learning Repository
References
Wang, H. Y., Zhu, R., and Ma, P. (2018). Optimal subsampling for large sample logistic regression. Journal of the American Statistical Association, 113(522), 829-844.
Examples
data(ethylene_CO)
str(ethylene_CO)   # inspect the structure of the 19 columns
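Because the Format section lists every column as a character vector, a conversion to numeric is usually needed before the data can serve as an observation matrix. The sketch below shows one way to do this on a small made-up data frame named toy, which stands in for the real data; the values and the "n/a" entry are invented for illustration, and whether ethylene_CO contains unparseable entries is not stated in the source.

```r
# Toy stand-in for a data frame whose columns were read as character
toy <- data.frame(V1 = c("0.00", "0.01", "0.02"),
                  V2 = c("10.5", "11.0", "n/a"),
                  stringsAsFactors = FALSE)

# Convert every column; entries that are not numbers become NA
toy_num <- as.data.frame(lapply(toy, function(col) {
  suppressWarnings(as.numeric(col))
}))

str(toy_num)
colSums(is.na(toy_num))  # count of unparseable entries per column
```

Checking the NA counts after conversion makes it visible when a column held header text or other non-numeric entries.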