| Type: | Package |
| Title: | An Amazing Fast Way to Fit Elastic Net |
| Version: | 1.1.2 |
| Date: | 2018-08-01 |
| Description: | Fit Elastic Net, Lasso, and Ridge regression and do cross-validation in a fast way. We build the algorithm based on Least Angle Regression by Bradley Efron, Trevor Hastie, Iain Johnstone, and Robert Tibshirani (2004) (<doi:10.1214/009053604000000067>), together with techniques such as Givens rotation and forward/back substitution. In this way, many of the matrices to be computed are kept triangular, which speeds up the computation. The fitting algorithm for Elastic Net is written in C++ using the Armadillo linear algebra library. |
| Depends: | R (≥ 3.1.0) |
| License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
| Imports: | Rcpp (≥ 0.12.16) |
| LinkingTo: | Rcpp, RcppArmadillo |
| Suggests: | knitr, rmarkdown |
| URL: | https://github.com/CUFESAM/Elastic-Net |
| BugReports: | https://github.com/CUFESAM/Elastic-Net/issues |
| NeedsCompilation: | yes |
| Packaged: | 2018-08-08 13:22:50 UTC; 苏萌 |
| Author: | Jingyi Ma [aut], Qiuhong Lai [ctb], Linyu Zuo [ctb, cre], Yi Yang [ctb], Meng Su [ctb], Zhen Yu [ctb], Gege Gao [ctb], Xiao Liu [ctb], Xueni Ruan [ctb], Xinyuan Yang [ctb], Yu Bai [ctb], Zhijun Liao [ctb] |
| Maintainer: | Linyu Zuo <zuozhe5959@gmail.com> |
| Repository: | CRAN |
| Date/Publication: | 2018-08-11 16:30:10 UTC |
Fitting ElasticNet in a fast way.
Description
fasterElasticNet uses numerical techniques such as Cholesky decomposition and forward substitution to reduce the amount of computation. The algorithm is also implemented with Rcpp and Armadillo, which makes it almost 5 times faster than the pure R version.
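The role of the Cholesky decomposition and the triangular solves can be illustrated in base R. This is only a minimal sketch of the general technique, not the package's C++ code; the standardization of the predictors is an assumption made here to keep the system well conditioned.

```r
# Solve (X'X) b = X'y using only a Cholesky factor and triangular solves.
x <- scale(as.matrix(mtcars[, -1]))   # standardized predictors (assumption)
y <- mtcars[, 1]
XTX <- crossprod(x)                   # same as t(x) %*% x
XTY <- crossprod(x, y)                # same as t(x) %*% y

R <- chol(XTX)                        # upper triangular, XTX = t(R) %*% R
z <- forwardsolve(t(R), XTY)          # forward substitution: t(R) z = XTY
b_chol <- backsolve(R, z)             # back substitution:    R b = z

b_direct <- solve(XTX, XTY)           # reference: general linear solver
max(abs(b_chol - b_direct))           # difference is at rounding-error level
```

Each triangular solve costs on the order of n^2 operations for n predictors, compared with n^3 for a general solve, which is why keeping the factor triangular pays off when the system is solved repeatedly.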
Details
To use fasterElasticNet, pass the predictors x (m x n) and the response y (m x 1) to ElasticNetCV to set up the model. A complete trace of lambda1 and lambda2 can then be computed with Elasticnet_ when lambda1 and lambda2 are not supplied. Calling cv.choosemodel with the number of folds returns the best model, i.e. the one with the smallest MSE after cross-validation. Use output to print the results and predict to obtain predictions for a new data set.
Author(s)
Jingyi Ma
Maintainer: Linyu Zuo <zuozhe5959@gmail.com>
References
Efron, B., Hastie, T., Johnstone, I., and Tibshirani, R. (2004). Least angle regression. The Annals of Statistics, 32(2), 407-499.
See Also
https://github.com/CUFESAM/Elastic-Net
Examples
# Use the built-in mtcars data set for model fitting
x <- mtcars[,-1]
y <- mtcars[, 1]
# Set up the model
model <- ElasticNetCV(x,y)
# Fit an elastic net with lambda2 = 1
model$Elasticnet_(lambda2 = 1)
# Choose the model by cross-validation
model$cv.choosemodel(k = 31) # 31-fold cross-validation (nearly leave-one-out: mtcars has 32 rows)
model$output() # See the output
# Predict on new data
pre <- mtcars[1:3,-1]
model$predict(pre)
Cross validation
Description
Computes k-fold cross-validation for elastic net.
Usage
ElasticNetCV(x, y)
Arguments
x |
A data.frame or matrix of predictors |
y |
A vector of response variables |
Details
This function reads the data into its environment and returns a list of four elements. To fit an elastic net or to cross-validate one, call the corresponding element of the returned list; see the examples below. The penalties on the L1-norm and the L2-norm are denoted by lambda1 and lambda2 respectively.
Value
A list of four outcomes will be returned:
cv.choosemodel |
Given the number of folds k and lambda2 (optional), cv.choosemodel performs cross-validation to select the optimal lambda1 and computes the corresponding coefficient of each variable. If lambda2 is NULL, cv.choosemodel selects the optimal lambda2 from a sequence going from 0 to 1 in steps of 0.1, together with the corresponding optimal lambda1, and then returns the coefficient of each variable. |
Elasticnet_ |
Given lambda1 (optional) and lambda2, Elasticnet_ fits an elastic net-regularized regression and returns the coefficient of each variable. If lambda1 is NULL, Elasticnet_ prints out the trace of lambda1 and the corresponding coefficient of each variable. |
output |
Prints the cross-validation outputs, including the minimum MSE, the coefficient of each variable, lambda1 and lambda2. |
predict |
Reads a data.frame of the testing data set and returns predictions using the trained model. |
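The kind of search described for lambda2 can be sketched in base R. This is a hedged illustration, not the package's implementation: cv_mse is a hypothetical helper, a plain ridge fit stands in for the elastic net (so only lambda2 is selected, not lambda1), and k = 8 folds on standardized mtcars data are arbitrary choices.

```r
# Hypothetical helper (not part of the package): k-fold cross-validated MSE
# of a ridge fit for one value of lambda2, given a vector of fold labels.
cv_mse <- function(x, y, fold, lambda2) {
  k <- max(fold)
  err <- numeric(k)
  for (i in seq_len(k)) {
    test <- fold == i
    xtr <- x[!test, , drop = FALSE]
    # Ridge stand-in: solve (X'X + lambda2 * I) b = X'y on the training folds
    b <- solve(crossprod(xtr) + lambda2 * diag(ncol(x)),
               crossprod(xtr, y[!test]))
    err[i] <- mean((y[test] - x[test, , drop = FALSE] %*% b)^2)
  }
  mean(err)
}

set.seed(1)
x <- scale(as.matrix(mtcars[, -1]))
y <- mtcars[, 1] - mean(mtcars[, 1])
k <- 8
fold <- sample(rep(seq_len(k), length.out = nrow(x)))  # random fold labels
grid <- seq(0, 1, by = 0.1)          # 0 to 1 in steps of 0.1, as in the text
mse <- sapply(grid, function(l2) cv_mse(x, y, fold, l2))
best_lambda2 <- grid[which.min(mse)] # value with the smallest CV error
```

The same fold labels are reused for every value on the grid so that the errors are compared on identical splits.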
Examples
# Use the built-in mtcars data set for model fitting
x <- mtcars[,-1]
y <- mtcars[, 1]
# Set up the model
model <- ElasticNetCV(x,y)
# Fit an elastic net with lambda2 = 1
model$Elasticnet_(lambda2 = 1)
# Choose the model by cross-validation
model$cv.choosemodel(k = 31) # 31-fold cross-validation (nearly leave-one-out: mtcars has 32 rows)
model$output() # See the output
# Predict on new data
pre <- mtcars[1:3,-1]
model$predict(pre)
A fast way to fit elastic net using RcppArmadillo
Description
Elastic net is a regularization and variable selection method that linearly combines the L1 penalty of the lasso and the L2 penalty of ridge regression. Based on this method, elasticnet returns the trace of the search for the best linear regression model. Compared with the existing R version of ElasticNet, our version speeds up the algorithm by using Cholesky decomposition, Givens rotations and RcppArmadillo.
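As an illustration of how the two penalties combine, the criterion can be written as a small R function. This is a sketch under an assumption: it uses the naive elastic net form RSS + lambda2 * sum(b^2) + lambda1 * sum(abs(b)), and the exact scaling of the penalties inside the package may differ. enet_objective is a hypothetical name, not a function of the package.

```r
# Hypothetical helper: naive elastic net criterion for a coefficient vector b,
# assuming ||y - X b||^2 + lambda2 * ||b||_2^2 + lambda1 * ||b||_1.
enet_objective <- function(b, x, y, lambda1, lambda2) {
  rss <- sum((y - x %*% b)^2)          # residual sum of squares
  rss + lambda2 * sum(b^2) + lambda1 * sum(abs(b))
}

x <- scale(as.matrix(mtcars[, -1]))
y <- mtcars[, 1] - mean(mtcars[, 1])
b0 <- rep(0, ncol(x))
b1 <- rep(0.1, ncol(x))

enet_objective(b0, x, y, lambda1 = 1, lambda2 = 1)  # only the RSS term at b = 0
enet_objective(b1, x, y, lambda1 = 1, lambda2 = 0)  # lasso criterion
enet_objective(b1, x, y, lambda1 = 0, lambda2 = 1)  # ridge criterion
```

Setting lambda2 = 0 recovers the lasso and setting lambda1 = 0 recovers ridge regression, which is the sense in which the method combines the two.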
Usage
elasticnet(XTX, XTY, lam2, lam1 = -1)
Arguments
XTX |
The product of the transpose of the predictor matrix X and X itself, i.e. t(X) %*% X. |
XTY |
The product of the transpose of the predictor matrix X and the response variable Y, i.e. t(X) %*% Y. |
lam1 |
Penalty on the L1-norm. With the default lam1 = -1, no L1 penalty is specified and the trace of lambda1 is returned (see Details). |
lam2 |
Penalty on the L2-norm, a hyper-parameter. |
Details
When only lambda2 is given, elasticnet returns the trace of variable selection with lambda1 decreasing from lambda1_0 to zero. Here lambda1_0 is the value of lambda1 at which there is only one predictor in the model (the one most correlated with the response variable).
If lambda1 and lambda2 are both given, a trace is also computed, but in this case it stops as soon as lambda1 reaches the given value for the given lambda2.
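The two ends of such a trace can be computed from XTX and XTY alone. The sketch below is not the package's code and rests on an assumed criterion, RSS + lambda2 * sum(b^2) + lambda1 * sum(abs(b)); under that assumption the first predictor enters at 2 * max(abs(XTY)), and at lambda1 = 0 the solution is the ridge solution. With a different penalty scaling the constants change accordingly.

```r
x <- scale(as.matrix(mtcars[, -1]))
y <- mtcars[, 1] - mean(mtcars[, 1])
XTX <- crossprod(x)
XTY <- crossprod(x, y)
lam2 <- 1

# Start of the trace: the predictor with the largest |x_j' y| enters first.
first <- which.max(abs(XTY))
lam1_0 <- 2 * max(abs(XTY))     # assumption: unscaled RSS plus lambda1 * ||b||_1

# End of the trace (lambda1 = 0): ridge solution of (XTX + lam2 * I) b = XTY
b_end <- solve(XTX + lam2 * diag(ncol(XTX)), XTY)

colnames(x)[first]              # name of the first predictor to enter
```

This also shows why elasticnet only needs the cross-products as input: neither end of the trace requires X or Y themselves.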
To speed up the algorithm, we use some computational tricks:
Because R handles high-dimensional matrices inefficiently, we work with lower triangular matrices during the iterations of the algorithm to avoid large matrix computations. When a predictor is added to the model, we update XTX by recalculating the lower triangular matrix of its Cholesky decomposition. When a predictor is removed from the model, we update the lower triangular matrix with the help of Givens rotations.
Furthermore, because loops are slow in R, we rewrote the entire algorithm with RcppArmadillo, the Rcpp interface to the Armadillo C++ linear algebra library.
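The two factor updates can be sketched in base R. This is an illustrative sketch of the standard techniques, not a transcription of the package's C++ code; chol_add and chol_drop are hypothetical names, and L denotes the lower triangular factor with A = L %*% t(L).

```r
# Add a predictor: extend L by one row using a single forward substitution.
# a = cross-products of the new predictor with the active ones, d = its own.
chol_add <- function(L, a, d) {
  l <- forwardsolve(L, a)
  rbind(cbind(L, 0), c(l, sqrt(d - sum(l^2))))
}

# Remove predictor j: delete row j, then restore the triangular shape with
# Givens rotations applied to neighbouring columns.
chol_drop <- function(L, j) {
  k <- nrow(L)
  H <- L[-j, , drop = FALSE]                 # (k-1) x k, no longer triangular
  if (j < k) for (i in j:(k - 1)) {
    r <- sqrt(H[i, i]^2 + H[i, i + 1]^2)
    cs <- H[i, i] / r
    sn <- H[i, i + 1] / r
    G <- matrix(c(cs, sn, -sn, cs), 2, 2)    # rotation that zeroes H[i, i + 1]
    H[, c(i, i + 1)] <- H[, c(i, i + 1)] %*% G
  }
  H[, -k, drop = FALSE]                      # last column is now all zeros
}

x <- scale(as.matrix(mtcars[, -1]))
A <- crossprod(x)
L4 <- t(chol(A[1:4, 1:4]))
L5 <- chol_add(L4, A[1:4, 5], A[5, 5])       # predictors 1..5 active
L_drop <- chol_drop(L5, 2)                   # remove predictor 2
idx <- c(1, 3, 4, 5)
max(abs(L5 - t(chol(A[1:5, 1:5]))))          # agrees with a fresh factorization
max(abs(L_drop - t(chol(A[idx, idx]))))      # agrees with a fresh factorization
```

Both updates cost on the order of k^2 operations for k active predictors, whereas refactorizing from scratch costs on the order of k^3.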
Value
A list will be returned. When only lambda2 is given, the returned list contains the trace of lambda1 (relamb) and the corresponding coefficients of the predictors (reb). If both lambda1 and lambda2 are given, the corresponding coefficients of the predictors will be returned.
Examples
#Use R built-in datasets mtcars for a model fitting
x <- as.matrix(mtcars[,-1])
y <- as.matrix(mtcars[, 1])
XTX <- t(x) %*% x
XTY <- t(x) %*% y
#Prints the output of elastic net model with lambda2 = 0
res <- elasticnet(XTX,XTY,lam2 = 0)
Housing data from Kaggle
Description
A subset of the data from a Kaggle "Get start" competition.
Usage
data("housing")
Format
A data frame with 10153 observations on the following 140 variables.
floor | For apartments, floor of the building |
area_m | Area, sq.m. |
green_zone_part | Proportion of area of greenery in the total area |
indust_part | Share of industrial zones in area of the total area |
preschool_quota | Number of seats in pre-school organizations |
preschool_education_centers_raion | Number of pre-school institutions |
school_quota | Number of high school seats in area |
school_education_centers_raion | Number of high school institutions |
school_education_centers_top_20_raion | Number of high schools of the top 20 best schools in Moscow |
healthcare_centers_raion | Number of healthcare centers in district |
university_top_20_raion | Number of higher education institutions in the top ten ranking of the Federal rank |
sport_objects_raion | Number of higher education institutions |
additional_education_raion | Number of additional education organizations |
culture_objects_top_25_raion | Number of objects of cultural heritage |
shopping_centers_raion | Number of malls and shopping centres in district |
office_raion | Number of malls and shopping centres in district |
build_count_block | Share of block buildings |
build_count_wood | Share of wood buildings |
build_count_frame | Share of frame buildings |
build_count_brick | Share of brick buildings |
build_count_monolith | Share of monolith buildings |
build_count_panel | Share of panel buildings |
build_count_foam | Share of foam buildings |
build_count_slag | Share of slag buildings |
build_count_before_1920 | Share of before_1920 buildings |
build_count_1921.1945 | Share of 1921-1945 buildings |
build_count_1946.1970 | Share of 1946-1970 buildings |
build_count_1971.1995 | Share of 1971-1995 buildings |
build_count_after_1995 | Share of after_1995 buildings |
kindergarten_km | Distance to kindergarten |
school_km | Distance to high school |
park_km | Distance to park |
green_zone_km | Distance to green zone |
industrial_km | Distance to industrial zone |
water_treatment_km | Distance to water treatment |
cemetery_km | Distance to the cemetery |
incineration_km | Distance to the incineration |
railroad_station_walk_min | Time to the railroad station (walk) |
railroad_station_avto_km | Distance to the railroad station (avto) |
railroad_station_avto_min | Time to the railroad station (avto) |
public_transport_station_min_walk | Time to the public transport station (walk) |
water_km | Distance to the water reservoir / river |
mkad_km | Distance to MKAD (Moscow Circle Auto Road) |
big_road1_km | Distance to nearest major road |
big_road2_km | The distance to next distant major road |
railroad_km | Distance to the railway / Moscow Central Ring / open areas Underground |
bus_terminal_avto_km | Distance to bus terminal (avto) |
oil_chemistry_km | Distance to dirty industries |
nuclear_reactor_km | Distance to nuclear reactor |
radiation_km | Distance to burial of radioactive waste |
power_transmission_line_km | Distance to power transmission line |
thermal_power_plant_km | Distance to thermal power plant |
ts_km | Distance to power station |
big_market_km | Distance to grocery / wholesale markets |
market_shop_km | Distance to markets and department stores |
fitness_km | Distance to fitness |
swim_pool_km | Distance to swimming pool |
ice_rink_km | Distance to ice palace |
stadium_km | Distance to stadium |
basketball_km | Distance to the basketball courts |
hospice_morgue_km | Distance to hospice/morgue |
detention_facility_km | Distance to detention facility |
public_healthcare_km | Distance to public healthcare |
university_km | Distance to universities |
workplaces_km | Distance to workplaces |
shopping_centers_km | Distance to shopping centers |
office_km | Distance to business centers / offices |
additional_education_km | Distance to additional education |
preschool_km | Distance to preschool education organizations |
big_church_km | Distance to large church |
church_synagogue_km | Distance to Christian churches and synagogues |
mosque_km | Distance to mosques |
theater_km | Distance to theater |
museum_km | Distance to museums |
exhibition_km | Distance to exhibition |
catering_km | Distance to catering |
green_part_500 | The share of green zones in 500 meters zone |
prom_part_500 | The share of industrial zones in 500 meters zone |
office_count_500 | The number of office space in 500 meters zone |
office_sqm_500 | The area of office space in 500 meters zone |
trc_count_500 | The number of shopping malls in 500 meters zone |
trc_sqm_500 | The area of shopping malls in 500 meters zone |
cafe_count_500_na_price | Cafes and restaurant bill N/A in 500 meters zone |
cafe_count_500_price_500 | Cafes and restaurant bill, average under 500 in 500 meters zone |
cafe_count_500_price_1000 | Cafes and restaurant bill, average 500-1000 in 500 meters zone |
cafe_count_500_price_1500 | Cafes and restaurant bill, average 1000-1500 in 500 meters zone |
cafe_count_500_price_2500 | Cafes and restaurant bill, average 1500-2500 in 500 meters zone |
cafe_count_500_price_4000 | Cafes and restaurant bill, average 2500-4000 in 500 meters zone |
cafe_count_500_price_high | Cafes and restaurant bill, average over 4000 in 500 meters zone |
big_church_count_500 | The number of big churches in 500 meters zone |
church_count_500 | The number of churches in 500 meters zone |
mosque_count_500 | The number of mosques in 500 meters zone |
leisure_count_500 | The number of leisure facilities in 500 meters zone |
sport_count_500 | The number of sport facilities in 500 meters zone |
market_count_500 | The number of markets in 500 meters zone |
green_part_1000 | The share of green zones in 1000 meters zone |
prom_part_1000 | The share of industrial zones in 1000 meters zone |
office_sqm_1000 | The area of office space in 1000 meters zone |
trc_count_1000 | The number of shopping malls in 1000 meters zone |
trc_sqm_1000 | The area of shopping malls in 1000 meters zone |
cafe_count_1000_na_price | Cafes and restaurant bill N/A in 1000 meters zone |
cafe_count_1000_price_high | Cafes and restaurant bill, average over 4000 in 1000 meters zone |
big_church_count_1000 | The number of big churches in 1000 meters zone |
mosque_count_1000 | The number of mosques in 1000 meters zone |
leisure_count_1000 | The number of leisure facilities in 1000 meters zone |
sport_count_1000 | The number of sport facilities in 1000 meters zone |
market_count_1000 | The number of markets in 1000 meters zone |
green_part_1500 | The share of green zones in 1500 meters zone |
prom_part_1500 | The share of industrial zones in 1500 meters zone |
office_sqm_1500 | The area of office space in 1500 meters zone |
trc_count_1500 | The number of shopping malls in 1500 meters zone |
trc_sqm_1500 | The area of shopping malls in 1500 meters zone |
cafe_count_1500_price_high | Cafes and restaurant bill, average over 4000 in 1500 meters zone |
mosque_count_1500 | The number of mosques in 1500 meters zone |
sport_count_1500 | The number of sport facilities in 1500 meters zone |
market_count_1500 | The number of markets in 1500 meters zone |
green_part_2000 | The share of green zones in 2000 meters zone |
prom_part_2000 | The share of industrial zones in 2000 meters zone |
office_sqm_2000 | The area of office space in 2000 meters zone |
trc_count_2000 | The number of shopping malls in 2000 meters zone |
trc_sqm_2000 | The area of shopping malls in 2000 meters zone |
mosque_count_2000 | The number of mosques in 2000 meters zone |
sport_count_2000 | The number of sport facilities in 2000 meters zone |
market_count_2000 | The number of markets in 2000 meters zone |
green_part_3000 | The share of green zones in 3000 meters zone |
prom_part_3000 | The share of industrial zones in 3000 meters zone |
office_sqm_3000 | The area of office space in 3000 meters zone |
trc_count_3000 | The number of shopping malls in 3000 meters zone |
trc_sqm_3000 | The area of shopping malls in 3000 meters zone |
mosque_count_3000 | The number of mosques in 3000 meters zone |
sport_count_3000 | The number of sport facilities in 3000 meters zone |
market_count_3000 | The number of markets in 3000 meters zone |
green_part_5000 | The share of green zones in 5000 meters zone |
prom_part_5000 | The share of industrial zones in 5000 meters zone |
trc_count_5000 | The number of shopping malls in 5000 meters zone |
trc_sqm_5000 | The area of shopping malls in 5000 meters zone |
mosque_count_5000 | The number of mosques in 5000 meters zone |
sport_count_5000 | The number of sport facilities in 5000 meters zone |
market_count_5000 | The number of markets in 5000 meters zone |
price_doc | Not documented |
Source
www.kaggle.com
Examples
data(housing)