| Type: | Package | 
| Title: | High-Dimensional Metrics | 
| Version: | 0.3.2 | 
| Date: | 2024-02-09 | 
| Depends: | R (≥ 3.0.0) | 
| Description: | Implementation of selected high-dimensional statistical and econometric methods for estimation and inference. Efficient estimators and uniformly valid confidence intervals are provided for various low-dimensional causal/structural parameters that appear in high-dimensional approximately sparse models. The package includes functions for fitting heteroscedastic robust Lasso regressions with non-Gaussian errors and for instrumental variable (IV) and treatment effect estimation in a high-dimensional setting. Moreover, the methods enable valid post-selection inference and rely on a theoretically grounded, data-driven choice of the penalty. Chernozhukov, Hansen, Spindler (2016) <doi:10.48550/arXiv.1603.01700>. | 
| License: | MIT + file LICENSE | 
| LazyData: | TRUE | 
| Imports: | MASS, glmnet, ggplot2, checkmate, Formula, methods | 
| Suggests: | testthat, knitr, rmarkdown, formatR, xtable, mvtnorm, markdown | 
| VignetteBuilder: | knitr | 
| Encoding: | UTF-8 | 
| RoxygenNote: | 7.3.1 | 
| NeedsCompilation: | no | 
| Packaged: | 2024-02-14 15:17:25 UTC; bachp | 
| Author: | Martin Spindler [cre, aut], Victor Chernozhukov [aut], Christian Hansen [aut], Philipp Bach [ctb] | 
| Maintainer: | Martin Spindler <martin.spindler@gmx.de> | 
| Repository: | CRAN | 
| Date/Publication: | 2024-02-14 21:20:02 UTC | 
hdm: High-Dimensional Metrics
Description
This package implements methods for estimation and inference in a high-dimensional setting.
Details
| Package: | hdm | 
| Type: | Package | 
| Version: | 0.3.2 | 
| Date: | 2024-02-09 | 
| License: | MIT + file LICENSE | 
This package provides efficient estimators and uniformly valid confidence intervals for various low-dimensional causal/structural parameters appearing in high-dimensional approximately sparse models. The package includes functions for fitting heteroskedastic robust Lasso regressions with non-Gaussian errors and for instrumental variable (IV) and treatment effect estimation in a high-dimensional setting. The methods enable valid post-selection inference, and a theoretically grounded, data-driven choice of the penalty level is provided.
Author(s)
Maintainer: Martin Spindler martin.spindler@gmx.de
Authors:
- Victor Chernozhukov 
- Christian Hansen 
Other contributors:
- Philipp Bach philipp.bach@uni-hamburg.de [contributor] 
AJR data set
Description
Dataset on settler mortality.
Format
- Mort
- Settler mortality 
- logMort
- logarithm of Mort 
- Latitude
- Latitude 
- Latitude2
- Latitude^2 
- Africa
- Africa 
- Asia
- Asia 
- Namer
- North America 
- Samer
- South America 
- Neo
- Neo-Europes 
- GDP
- GDP 
- Exprop
- Average protection against expropriation risk 
Details
Data set was analysed in Acemoglu et al. (2001). A detailed description of the data can be found at https://economics.mit.edu/people/faculty/daron-acemoglu/data-archive
References
D. Acemoglu, S. Johnson, J. A. Robinson (2001). Colonial origins of comparative development: an empirical investigation. American Economic Review, 91, 1369–1401.
Examples
data(AJR)
BLP data set
Description
Automobile data set from the US.
Format
- model.name
- model name 
- model.id
- model id 
- firm.id
- firm id 
- cdid
- cdid 
- id
- id 
- price
- log price 
- mpg
- miles per gallon 
- mpd
- miles per dollar 
- hpwt
- horse power per weight 
- air
- air conditioning (binary variable) 
- space
- size of the car 
- share
- market share 
- outshr
- share s0 
- y
- outcome variable defined as log(share) - log(outshr) 
- trend
- time trend 
Details
Data set was analysed in Berry, Levinsohn and Pakes (1995). The data stem from annual issues of the Automotive News Market Data Book. 
The data set includes information on all models marketed during the period beginning 1971 and ending in 1990, containing 2217 model/years from 997 distinct models.
A detailed description is given in BLP (1995, 868–871). The internal function constructIV constructs instrumental variables along the lines described and used in BLP (1995).
References
S. Berry, J. Levinsohn, A. Pakes (1995). Automobile Prices in Market Equilibrium. Econometrica, 63(4), 841–890.
Examples
data(BLP)
Eminent Domain data set
Description
Dataset on judicial eminent domain decisions.
Format
- y
- economic outcome variable 
- x
- set of exogenous variables 
- d
- eminent domain decisions 
- z
- set of potential instruments 
Details
Data set was analyzed in Belloni et al. (2012). They estimate the effect of judicial eminent domain decisions on economic outcomes with instrumental variables (IV) in a setting with a large set of potential IVs. A detailed description of the data can be found at https://www.econometricsociety.org/publications/econometrica/2012/11/01/sparse-models-and-methods-optimal-instruments-application
The data set contains four "sub-data sets" which differ mainly in the dependent variables: the repeat-sales FHFA/OFHEO house price index for metro (FHFA) and non-metro (NM) areas, the Case-Shiller home price index (CS), and state-level GDP from the Bureau of Economic Analysis, all transformed with the logarithm. The structure of each sub-data set is given above.
In the data set the following variables and naming conventions are used: "numpanelskx_..." is the number of panels with at least k members with the characteristic following the "_". The probability controls (names start with "F_prob_") follow a similar naming convention and give the probability of observing a panel with the characteristic named after the second "_", given the characteristics of the pool of judges available to be assigned to the case.
Characteristics in the data for the control variables or instruments:
- noreligion
- judge reports no religious affiliation 
- jd_public
- judge's law degree is from a public university 
- dem
- judge reports being a democrat 
- female
- judge is female 
- nonwhite
- judge is nonwhite (and not black) 
- black
- judge is black 
- jewish
- judge is Jewish 
- catholic
- judge is Catholic 
- mainline
- baseline religion 
- protestant
- belongs to a protestant church 
- evangelical
- belongs to an evangelical church 
- instate_ba
- judge's undergraduate degree was obtained within state 
- ba_public
- judge's undergraduate degree was obtained at a public university 
- elev
- judge was elevated from a district court 
- year
- year dummy (reference category is one year before the earliest year in the data set (excluded)) 
- circuit
- dummy for the circuit level (reference category excluded) 
- missing_cy_12
- a dummy for whether there were no cases in that circuit-year 
- numcasecat_12
- the number of takings appellate decisions 
References
A. Belloni, D. Chen, V. Chernozhukov and C. Hansen (2012). Sparse models and methods for optimal instruments with an application to eminent domain. Econometrica 80 (6), 2369–2429.
Examples
data(EminentDomain)
Growth data set
Description
Growth data set compiled by Barro and Lee.
Format
Dataframe with the following variables:
- outcome
- dependent variable: national growth rates in GDP per capita for the periods 1965-1975 and 1975-1985 
- x
- covariates which might influence growth 
Details
The data set contains the growth data of Barro and Lee. The Barro-Lee data consist
of a panel of 138 countries for the period 1960 to 1985. The dependent
variable is the national growth rate in GDP per capita for the periods
1965-1975 and 1975-1985. The growth rate in GDP over a period from t_1 to t_2
is commonly defined as \log(GDP_{t_2}/GDP_{t_1}). The number of covariates is p=62.
The number of complete observations is 90.
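As a small illustration of a log growth rate, the following sketch uses made-up GDP levels (not values from GrowthData) and takes the log of the end-of-period level over the start-of-period level, so that rising GDP gives a positive growth rate. The variable names are hypothetical.

```r
# Hypothetical GDP per capita levels at the start (t1) and end (t2) of a period
gdp_t1 <- c(A = 1000, B = 2500)
gdp_t2 <- c(A = 1500, B = 2400)

# Log growth rate: positive when GDP rises, negative when it falls
growth <- log(gdp_t2 / gdp_t1)

# For small changes the log growth rate is close to the simple relative change
pct_change <- gdp_t2 / gdp_t1 - 1
round(cbind(growth, pct_change), 3)
```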
Source
The full data set and further details can be found at https://www2.nber.org/pub/barro.lee/ and https://www.bristol.ac.uk//Depts//Economics//Growth//barlee.htm.
References
R.J. Barro, J.W. Lee (1994). Data set for a panel of 139 countries. NBER.
R.J. Barro, X. Sala-i-Martin (1995). Economic Growth. McGraw-Hill, New York.
Examples
data(GrowthData)
Shooting Lasso
Description
Implementation of the Shooting Lasso (Fu, 1998) with variable dependent penalization weights.
Usage
LassoShooting.fit(
  x,
  y,
  lambda,
  control = list(maxIter = 1000, optTol = 10^(-5), zeroThreshold = 10^(-6)),
  XX = NULL,
  Xy = NULL,
  beta.start = NULL
)
Arguments
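| x | matrix of regressor variables | 
| y | dependent variable (vector or matrix) | 
| lambda | vector of length ncol(x) with the penalization parameter for each regressor | 
| control | list with control parameters: maxIter, optTol, zeroThreshold (defaults shown in Usage) | 
| XX | optional, precalculated matrix t(x) %*% x | 
| Xy | optional, precalculated matrix t(x) %*% y | 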
| beta.start | start value for beta | 
Details
The function implements the Shooting Lasso (Fu, 1998) with variable dependent
penalization. The arguments XX and Xy are optional and allow precalculated matrices to be supplied, which might improve performance.
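To make the idea of coordinate-wise ("shooting") updates with a separate penalty per variable concrete, here is a minimal base R sketch. It is not the code of LassoShooting.fit. The function name shooting_lasso, its arguments, and the assumed objective sum((y - x b)^2) + sum(lambda_j |b_j|) are illustrative assumptions.

```r
# Minimal shooting (coordinate descent) Lasso sketch; NOT the hdm implementation.
# Assumed objective: sum((y - x %*% b)^2) + sum(lambda * abs(b)).
shooting_lasso <- function(x, y, lambda, max_iter = 1000, tol = 1e-5,
                           XX = crossprod(x), Xy = crossprod(x, y)) {
  p <- ncol(x)
  beta <- rep(0, p)
  for (it in seq_len(max_iter)) {
    beta_old <- beta
    for (j in seq_len(p)) {
      # Gradient of the squared-error part with the j-th coordinate removed
      s0 <- 2 * (sum(XX[j, ] * beta) - XX[j, j] * beta[j]) - 2 * Xy[j]
      # Soft-thresholding with the coordinate-specific penalty lambda[j]
      beta[j] <- if (s0 > lambda[j]) {
        (lambda[j] - s0) / (2 * XX[j, j])
      } else if (s0 < -lambda[j]) {
        (-lambda[j] - s0) / (2 * XX[j, j])
      } else {
        0
      }
    }
    # Stop once a full sweep barely changes the coefficients
    if (sum(abs(beta - beta_old)) < tol) break
  }
  list(coefficients = beta, num.it = it)
}

set.seed(1)
x <- matrix(rnorm(200 * 5), ncol = 5)
y <- as.vector(x %*% c(2, 0, 0, -1, 0) + rnorm(200))
fit <- shooting_lasso(x, y, lambda = rep(40, 5))
fit$coefficients
```

With all penalties set to zero the updates reduce to Gauss-Seidel iterations on the normal equations, so the result should agree with ordinary least squares. Passing XX and Xy explicitly avoids recomputing the cross-products when the same design is reused.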
Value
| coefficients | estimated coefficients by the Shooting Lasso Algorithm | 
| coef.list | matrix of coefficients from each iteration | 
| num.it | number of iterations run | 
References
Fu, W. (1998). Penalized regressions: the bridge vs the lasso. Journal of Computational and Graphical Statistics 7, 397-416.
Coefficients from S3 objects rlassoEffects
Description
Method to extract coefficients from objects of class rlassoEffects
Usage
## S3 method for class 'rlassoEffects'
coef(
  object,
  complete = TRUE,
  selection.matrix = FALSE,
  include.targets = FALSE,
  ...
)
Arguments
| object | an object of class rlassoEffects | 
| complete | general option of the function coef | 
| selection.matrix | if TRUE, a selection matrix is returned that indicates the selected variables from each auxiliary regression. Default is set to FALSE. | 
| include.targets | if FALSE (by default) only the selected control variables are listed in the selection.matrix | 
| ... | further arguments passed to functions coef or print. | 
Details
Printing coefficients and selection matrix for S3 object rlassoEffects. Interpretation of entries in the selection matrix
-  "-" indicates a target variable,
-  "x" indicates that a variable has been selected with rlassoEffects (coefficient is different from zero),
-  "." indicates that a variable has been de-selected with rlassoEffects (coefficient is zero).
Examples
library(hdm)
set.seed(1)
n = 100 #sample size
p = 100 # number of variables
s = 7 # number of non-zero variables
X = matrix(rnorm(n*p), ncol=p)
colnames(X) <- paste("X", 1:p, sep="")
beta = c(rep(3,s), rep(0,p-s))
y = 1 + X%*%beta + rnorm(n)
data = data.frame(cbind(y,X))
colnames(data)[1] <- "y"
lasso.effect = rlassoEffects(X, y, index=c(1,2,3,50), 
                             method = "double selection")
coef(lasso.effect) # standard use of coef() - without selection matrix
# with selection matrix
coef(lasso.effect, selection.matrix = TRUE)
# prettier output with print_coef (identical options as coef())
print_coef(lasso.effect, selection.matrix = TRUE) 
Coefficients from S3 objects rlassoIV
Description
Method to extract coefficients from objects of class rlassoIV.
Usage
## S3 method for class 'rlassoIV'
coef(object, complete = TRUE, selection.matrix = FALSE, ...)
Arguments
| object | an object of class rlassoIV | 
| complete | general option of the function coef | 
| selection.matrix | if TRUE, a selection matrix is returned that indicates the selected variables from each first stage regression. Default is set to FALSE. See section on details for more information. | 
| ... | further arguments passed to function coef. | 
Details
Printing coefficients and selection matrix for S3 object rlassoIV. "x" indicates that a variable has been selected, i.e., the corresponding estimated coefficient is different from zero.
The very last column collects all variables that have been selected in at least one of the lasso regressions represented in the selection.matrix. 
rlassoIV performs three lasso regression steps. A first stage lasso regression of the endogenous treatment variable d on the instruments z and exogenous covariates x,
a lasso regression of y on the exogenous variables x, and a lasso regression of the instrumented treatment variable, i.e., a regression of the predicted values of d, on controls x.
Value
Coefficients obtained from rlassoIV by default. If option selection.matrix is TRUE, a list is returned with final coefficients, a matrix selection.matrix, and a matrix selection.matrixZ: 
selection.matrix contains the selection index for the lasso regression of y on x (first column) and the lasso regression of the predicted values of d on x
together with the union of these indices.
selection.matrixZ contains the selection index from the first-stage lasso regression of d on z and x.
Examples
## Not run: 
data(EminentDomain)
z <- EminentDomain$logGDP$z # instruments
x <- EminentDomain$logGDP$x # exogenous variables
y <- EminentDomain$logGDP$y # outcome variable
d <- EminentDomain$logGDP$d # treatment / endogenous variable
lasso.IV = rlassoIV(x=x, d=d, y=y, z=z, select.X=TRUE, select.Z=TRUE) 
coef(lasso.IV) # default behavior
coef(lasso.IV, selection.matrix = TRUE) # print selection matrix
## End(Not run)
Coefficients from S3 objects rlassoIVselectX
Description
Method to extract coefficients and selection matrix from objects of class rlassoIVselectX.
Usage
## S3 method for class 'rlassoIVselectX'
coef(object, complete = TRUE, selection.matrix = FALSE, ...)
Arguments
| object | an object of class rlassoIVselectX | 
| complete | general option of the function coef | 
| selection.matrix | if TRUE, a selection matrix is returned that indicates the selected variables from each regression. Default is set to FALSE. See section on details for more information. | 
| ... | further arguments passed to functions coef. | 
Details
Printing coefficients and selection matrix for S3 object rlassoIVselectX. The first column of the selection matrix reports the selection index for the lasso regression of y on x in the specified
rlassoIVselectX command. "x" indicates that a variable has been selected, i.e., the corresponding estimated coefficient is different from zero.
The second column contains the selection index for the lasso regression of d on x and the remaining columns
the index of selected variables x for the instruments z. The very last column collects all variables that have been selected in at least one of the lasso regressions.
Examples
## Not run: 
library(hdm)
data(AJR); y = AJR$GDP; d = AJR$Exprop; z = AJR$logMort
x = model.matrix(~ -1 + (Latitude + Latitude2 + Africa + 
                           Asia + Namer + Samer)^2, data=AJR)
AJR.Xselect = rlassoIV(GDP ~ Exprop +  (Latitude + Latitude2 + Africa + Asia + Namer + Samer)^2 |
                         logMort +  (Latitude + Latitude2 + Africa + Asia + Namer + Samer)^2,
                       data=AJR, select.X=TRUE, select.Z=FALSE)
coef(AJR.Xselect) # Default behavior
coef(AJR.Xselect, selection.matrix = TRUE) # print selection matrix
## End(Not run)
Coefficients from S3 objects rlassoIVselectZ
Description
Method to extract coefficients from objects of class rlassoIVselectZ.
Usage
## S3 method for class 'rlassoIVselectZ'
coef(object, complete = TRUE, selection.matrix = FALSE, ...)
Arguments
| object | an object of class rlassoIVselectZ | 
| complete | general option of the function coef | 
| selection.matrix | if TRUE, a selection matrix is returned that indicates the selected variables from each first stage regression. Default is set to FALSE. See section on details for more information. | 
| ... | further arguments passed to functions coef. | 
Details
Printing coefficients and selection matrix for S3 object rlassoIVselectZ. The columns of the selection matrix report the selection index for the first stage lasso regressions as specified in the
rlassoIVselectZ command, i.e., the selected variables for each of the endogenous variables. "x" indicates that a variable has been selected, i.e., the corresponding estimated coefficient is different from zero.
The very last column collects all variables that have been selected in at least one of the lasso regressions.
Examples
## Not run: 
data(EminentDomain)
z <- EminentDomain$logGDP$z # instruments
x <- EminentDomain$logGDP$x # exogenous variables
y <- EminentDomain$logGDP$y # outcome variable
d <- EminentDomain$logGDP$d # treatment / endogenous variable
lasso.IV.Z = rlassoIVselectZ(x=x, d=d, y=y, z=z)
coef(lasso.IV.Z) # Default behavior
coef(lasso.IV.Z, selection.matrix = TRUE)
## End(Not run)
cps2012 data set
Description
Census data from the US for the year 2012.
Format
- lnw
- log of hourly wage (annual earnings / annual hours) 
- female
- female indicator 
- marital status
- five indicators: widowed, divorced, separated, nevermarried, and married (omitted) 
- education attainment
- six indicators: hsd08, hsd911, hsg, cg, ad, and sc (omitted) 
- region indicators
- four indicators: mw, so, we, and ne (omitted) 
- potential experience
- (max[0, age - years of education - 7]): exp1, exp2 (divided by 100), exp3 (divided by 1000), exp4 (divided by 10000) 
- weight
- March Supplement sampling weight 
- year
- CPS year 
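The construction of the potential experience terms can be sketched as follows. The ages and years of education are made up, and reading exp2, exp3 and exp4 as the squared, cubic and quartic terms of exp1 (rescaled as stated above) is an assumption.

```r
# Hypothetical ages and years of education for three workers
age  <- c(25, 40, 22)
educ <- c(16, 12, 18)

exp1 <- pmax(0, age - educ - 7)   # potential experience, floored at zero
exp2 <- exp1^2 / 100              # assumed: squared term divided by 100
exp3 <- exp1^3 / 1000             # assumed: cubic term divided by 1000
exp4 <- exp1^4 / 10000            # assumed: quartic term divided by 10000
data.frame(age, educ, exp1, exp2, exp3, exp4)
```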
Details
The CPS is a monthly U.S. household survey conducted jointly by the U.S. Census Bureau and the Bureau of Labor Statistics. The data cover the year 2012. This data set was used in Mulligan and Rubinstein (2008). The sample comprises white non-Hispanic individuals aged 25-54 who work full time, full year (35+ hours per week for at least 50 weeks). Excluded are individuals living in group quarters, the self-employed, those in the military, agricultural, and private household sectors, and observations with allocated earnings, inconsistent reports on earnings and employment, or missing data.
References
C. B. Mulligan and Y. Rubinstein (2008). Selection, investment, and women's relative wages over time. The Quarterly Journal of Economics, 1061–1110.
Examples
data(cps2012)
Function for Calculation of the penalty parameter
Description
This function implements different methods for calculation of the penalization parameter \lambda. Further details can be found under rlasso.
Usage
lambdaCalculation(
  penalty = list(homoscedastic = FALSE, X.dependent.lambda = FALSE, lambda.start = NULL,
    c = 1.1, gamma = 0.1),
  y = NULL,
  x = NULL
)
Arguments
| penalty | list with options for the calculation of the penalty: homoscedastic, X.dependent.lambda, lambda.start, c and gamma (defaults shown in Usage). | 
| y | residual which is used for calculation of the variance or the data-dependent loadings | 
| x | matrix of regressor variables | 
Value
The function returns a list with the penalty lambda, which is the product of lambda0 and Ups0. Ups0
denotes either the variance (independent case) or the data-dependent loadings for the regressors. method gives the selected method for the calculation.
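The product structure of the penalty can be sketched in base R for the simplest case (homoscedastic errors, penalty not depending on x). The formula for lambda0 used here, 2 * c * sqrt(n) * qnorm(1 - gamma / (2 * p)), and the use of the residual standard deviation as the single loading are assumptions for illustration; the exact rules are documented under rlasso. The values of c and gamma are the defaults from the Usage above.

```r
# Sketch of an X-independent penalty under homoscedasticity.
# The formula for lambda0 and the choice of loading are assumptions (see text).
set.seed(1)
n <- 100
p <- 20
x <- matrix(rnorm(n * p), ncol = p)
y <- rnorm(n)                      # plays the role of the residual

c_const <- 1.1                     # default c from the Usage
gamma <- 0.1                       # default gamma from the Usage
lambda0 <- 2 * c_const * sqrt(n) * qnorm(1 - gamma / (2 * p))
Ups0 <- sqrt(var(y))               # single loading shared by all regressors
lambda <- rep(lambda0 * Ups0, p)   # one penalty value per regressor
```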
Multiple Testing Adjustment of p-values for S3 objects rlassoEffects and lm
Description
Multiple hypotheses testing adjustment of p-values from a high-dimensional linear model.
Usage
p_adjust(x, ...)
## S3 method for class 'rlassoEffects'
p_adjust(x, method = "RW", B = 1000, ...)
## S3 method for class 'lm'
p_adjust(x, method = "RW", B = 1000, test.index = NULL, ...)
Arguments
| x | an object of S3 class rlassoEffects or lm. | 
| ... | further arguments passed on to methods. | 
| method | the method of p-value adjustment for multiple testing: Romano-Wolf stepdown ('RW', the default) or one of the adjustment methods listed in p.adjust.methods. | 
| B | number of bootstrap repetitions (default 1000). | 
| test.index | vector of integers, logicals or variable names: the positions of the coefficients (integer case), a logical vector with one entry per coefficient (TRUE or FALSE), or the coefficient names of x that should be tested simultaneously (only for S3 class lm). | 
Details
Multiple testing adjustment is performed for S3 objects of class
rlassoEffects and lm. Implemented methods for multiple testing
adjustment are Romano-Wolf stepdown 'RW' (default) and the adjustment
methods available in the p.adjust function of the stats package,
including the Bonferroni, Bonferroni-Holm, and Benjamini-Hochberg corrections,
see p.adjust.methods.
Objects of class rlassoEffects are constructed by
rlassoEffects.
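The non-bootstrap adjustments mentioned above come from stats::p.adjust. A short base R sketch with made-up p-values shows how the Bonferroni, Bonferroni-Holm and Benjamini-Hochberg corrections compare. The Romano-Wolf stepdown method is not part of p.adjust and is not shown here.

```r
# Unadjusted p-values from five hypothetical tests
p_raw <- c(0.001, 0.012, 0.03, 0.04, 0.2)

# Apply three of the methods listed in p.adjust.methods
adjusted <- sapply(c("bonferroni", "holm", "BH"),
                   function(m) p.adjust(p_raw, method = m))
round(cbind(raw = p_raw, adjusted), 3)
```

For these inputs Bonferroni gives the largest adjusted p-values, Holm is never larger than Bonferroni, and Benjamini-Hochberg, which controls the false discovery rate rather than the family-wise error rate, is the least conservative of the three.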
Value
A matrix with the estimated coefficients and the p-values that are adjusted according to the specified method.
Methods (by class)
-  p_adjust(rlassoEffects): method for objects of class rlassoEffects.
-  p_adjust(lm): method for objects of class lm.
References
J.P. Romano, M. Wolf (2005). Exact and approximate stepdown methods for multiple hypothesis testing. Journal of the American Statistical Association, 100(469), 94-108.
J.P. Romano, M. Wolf (2016). Efficient computation of adjusted p-values for resampling-based stepdown multiple testing. Statistics and Probability Letters, 113, 38-40.
A. Belloni, V. Chernozhukov, K. Kato (2015). Uniform post-selection inference for least absolute deviation regression and other Z-estimation problems. Biometrika, 102(1), 77-94.
Examples
library(hdm)
set.seed(1)
n = 100 #sample size
p = 25 # number of variables
s = 3 # number of non-zero variables
X = matrix(rnorm(n*p), ncol=p)
colnames(X) <- paste("X", 1:p, sep="")
beta = c(rep(3,s), rep(0,p-s))
y = 1 + X%*%beta + rnorm(n)
data = data.frame(cbind(y,X))
colnames(data)[1] <- "y"
lasso.effect = rlassoEffects(X, y, index=c(1:20))
pvals.lasso.effect = p_adjust(lasso.effect, method = "RW", B = 1000)
ols = lm(y ~ -1 + X, data)
pvals.ols = p_adjust(ols, method = "RW", B = 1000)
pvals.ols = p_adjust(ols, method = "RW", B = 1000, test.index = c(1,2,5))
pvals.ols = p_adjust(ols, method = "RW", B = 1000, test.index = c(rep(TRUE, 5), rep(FALSE, p-5)))
Pension 401(k) data set
Description
Data set on financial wealth and 401(k) plan participation
Format
Dataframe with the following variables (amongst others):
- p401
- participation in 401(k) 
- e401
- eligibility for 401(k) 
- a401
- 401(k) assets 
- tw
- total wealth (in US $) 
- tfa
- financial assets (in US $) 
- net_tfa
- net financial assets (in US $) 
- nifa
- non-401k financial assets (in US $) 
- net_nifa
- net non-401k financial assets 
- net_n401
- net non-401(k) assets (in US $) 
- ira
- individual retirement account (IRA) 
- inc
- income (in US $) 
- age
- age 
- fsize
- family size 
- marr
- married 
- pira
- participation in IRA 
- db
- defined benefit pension 
- hown
- home owner 
- educ
- education (in years) 
- male
- male 
- twoearn
- two earners 
- nohs, hs, smcol, col
- dummies for education: no high-school, high-school, some college, college 
- hmort
- home mortgage (in US $) 
- hequity
- home equity (in US $) 
- hval
- home value (in US $) 
Details
The sample is drawn from the 1991 Survey of Income and Program Participation (SIPP) and consists of 9,915 observations. The observational units are household reference persons aged 25-64 and spouse if present. Households are included in the sample if at least one person is employed and no one is self-employed. The data set was analysed in Chernozhukov and Hansen (2004) and Belloni et al. (2014) where further details can be found. They examine the effects of 401(k) plans on wealth using data from the Survey of Income and Program Participation using 401(k) eligibility as an instrument for 401(k) participation.
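The identification idea of using eligibility as an instrument for participation can be illustrated on simulated data. Everything below is hypothetical (variable names, data-generating process, effect size) and uses the simple Wald/IV ratio in base R instead of the package's estimators or the pension data.

```r
set.seed(1)
n <- 5000
u <- rnorm(n)                            # unobserved confounder
z <- rbinom(n, 1, 0.4)                   # eligibility (instrument)
d <- z * rbinom(n, 1, plogis(0.5 + u))   # participation requires eligibility
y <- 2 * d + u + rnorm(n)                # outcome; true effect of d is 2

ols <- unname(coef(lm(y ~ d))["d"])      # biased: d is correlated with u
iv  <- cov(y, z) / cov(d, z)             # Wald / IV estimate using z
c(ols = ols, iv = iv)
```

Because the simulated instrument is independent of the confounder, the IV ratio should be close to the true effect, while the regression of y on d picks up the confounding.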
References
V. Chernozhukov, C. Hansen (2004). The impact of 401(k) participation on the wealth distribution: An instrumental quantile regression analysis. The Review of Economics and Statistics 86 (3), 735–751.
A. Belloni, V. Chernozhukov, I. Fernandez-Val, and C. Hansen (2014). Program evaluation with high-dimensional data. Working Paper.
Examples
data(pension)
Methods for S3 object rlassologit
Description
Objects of class rlassologit are constructed by rlassologit.
print.rlassologit prints and displays some information about fitted rlassologit objects.
summary.rlassologit summarizes information of a fitted rlassologit object.
predict.rlassologit predicts values based on a rlassologit object.
model.matrix.rlassologit constructs the model matrix of a lasso object.
Usage
## S3 method for class 'rlassologit'
predict(object, newdata = NULL, type = "response", ...)
## S3 method for class 'rlassologit'
model.matrix(object, ...)
## S3 method for class 'rlassologit'
print(x, all = TRUE, digits = max(3L, getOption("digits") - 3L), ...)
## S3 method for class 'rlassologit'
summary(object, all = TRUE, digits = max(3L, getOption("digits") - 3L), ...)
Arguments
| object | an object of class rlassologit | 
| newdata | new data set for prediction | 
| type | type of prediction required. The default ('response') is on the scale of the response variable; the alternative 'link' is on the scale of the linear predictors. | 
| ... | arguments passed to the print function and other methods | 
| x | an object of class rlassologit | 
| all | logical, indicates if coefficients of all variables (TRUE) should be displayed or only the non-zero ones (FALSE) | 
| digits | significant digits in printout | 
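The difference between the two prediction scales can be illustrated with an ordinary logit fit from stats::glm, which offers the same 'link' and 'response' options in its predict method. This is a sketch on simulated data with a plain glm, not an rlassologit fit.

```r
set.seed(1)
n <- 200
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.5 + x))

fit <- glm(y ~ x, family = binomial())   # ordinary logit, not rlassologit
newdata <- data.frame(x = c(-1, 0, 1))

eta  <- predict(fit, newdata = newdata, type = "link")      # linear predictor
prob <- predict(fit, newdata = newdata, type = "response")  # probabilities

# The response scale is the inverse logit of the link scale
all.equal(unname(prob), unname(plogis(eta)))
```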
Methods for S3 object rlasso
Description
Objects of class rlasso are constructed by rlasso.
print.rlasso prints and displays some information about fitted rlasso objects.
summary.rlasso summarizes information of a fitted rlasso object.
predict.rlasso predicts values based on a rlasso object.
model.matrix.rlasso constructs the model matrix of a rlasso object.
Usage
## S3 method for class 'rlasso'
print(x, all = TRUE, digits = max(3L, getOption("digits") - 3L), ...)
## S3 method for class 'rlasso'
summary(object, all = TRUE, digits = max(3L, getOption("digits") - 3L), ...)
## S3 method for class 'rlasso'
model.matrix(object, ...)
## S3 method for class 'rlasso'
predict(object, newdata = NULL, ...)
Arguments
| x | an object of class rlasso | 
| all | logical, indicates if coefficients of all variables (TRUE) should be displayed or only the non-zero ones (FALSE) | 
| digits | significant digits in printout | 
| ... | arguments passed to the print function and other methods | 
| object | an object of class rlasso | 
| newdata | new data set for prediction. An optional data frame in which to look for variables with which to predict. If omitted, the fitted values are returned. | 
Methods for S3 object rlassoEffects
Description
Objects of class rlassoEffects are constructed by  rlassoEffects.
print.rlassoEffects prints and displays some information about fitted rlassoEffects objects.
summary.rlassoEffects summarizes information of a fitted rlassoEffects object and is described at summary.rlassoEffects.
confint.rlassoEffects extracts the confidence intervals.
plot.rlassoEffects plots the estimates with confidence intervals.
Usage
## S3 method for class 'rlassoEffects'
print(x, digits = max(3L, getOption("digits") - 3L), ...)
## S3 method for class 'rlassoEffects'
confint(object, parm, level = 0.95, joint = FALSE, ...)
## S3 method for class 'rlassoEffects'
plot(
  x,
  joint = FALSE,
  level = 0.95,
  main = "",
  xlab = "coef",
  ylab = "",
  xlim = NULL,
  ...
)
Arguments
| x | an object of class rlassoEffects | 
| digits | significant digits in printout | 
| ... | arguments passed to the print function and other methods. | 
| object | an object of class rlassoEffects | 
| parm | a specification of which parameters are to be given confidence intervals among the variables for which inference was done, either a vector of numbers or a vector of names. If missing, all parameters are considered. | 
| level | confidence level required | 
| joint | logical, if TRUE joint confidence intervals are calculated | 
| main | an overall title for the plot | 
| xlab | a title for the x axis | 
| ylab | a title for the y axis | 
| xlim | vector of length two giving lower and upper bound of x axis | 
Methods for S3 object rlassoIV
Description
Objects of class rlassoIV are constructed by rlassoIV. 
print.rlassoIV prints and displays some information about fitted rlassoIV objects.
summary.rlassoIV summarizes information of a fitted rlassoIV object.
confint.rlassoIV extracts the confidence intervals.
Usage
## S3 method for class 'rlassoIV'
print(x, digits = max(3L, getOption("digits") - 3L), ...)
## S3 method for class 'rlassoIV'
summary(object, digits = max(3L, getOption("digits") - 3L), ...)
## S3 method for class 'rlassoIV'
confint(object, parm, level = 0.95, ...)
Arguments
| x | an object of class rlassoIV | 
| digits | significant digits in printout | 
| ... | arguments passed to the print function and other methods | 
| object | an object of class rlassoIV | 
| parm | a specification of which parameters are to be given confidence intervals, either a vector of numbers or a vector of names. If missing, all parameters are considered. | 
| level | confidence level required. | 
Methods for S3 object rlassoIVselectX
Description
Objects of class rlassoIVselectX are constructed by rlassoIVselectX. 
print.rlassoIVselectX prints and displays some information about fitted rlassoIVselectX objects.
summary.rlassoIVselectX summarizes information of a fitted rlassoIVselectX object.
confint.rlassoIVselectX extracts the confidence intervals.
Usage
## S3 method for class 'rlassoIVselectX'
print(x, digits = max(3L, getOption("digits") - 3L), ...)
## S3 method for class 'rlassoIVselectX'
summary(object, digits = max(3L, getOption("digits") - 3L), ...)
## S3 method for class 'rlassoIVselectX'
confint(object, parm, level = 0.95, ...)
Arguments
| x | an object of class rlassoIVselectX | 
| digits | significant digits in printout | 
| ... | arguments passed to the print function and other methods | 
| object | an object of class rlassoIVselectX | 
| parm | a specification of which parameters are to be given confidence intervals, either a vector of numbers or a vector of names. If missing, all parameters are considered. | 
| level | the confidence level required. | 
Methods for S3 object rlassoIVselectZ
Description
Objects of class rlassoIVselectZ are constructed by rlassoIVselectZ. 
print.rlassoIVselectZ prints and displays some information about fitted rlassoIVselectZ objects.
summary.rlassoIVselectZ summarizes information of a fitted rlassoIVselectZ object.
confint.rlassoIVselectZ extracts the confidence intervals.
Usage
## S3 method for class 'rlassoIVselectZ'
print(x, digits = max(3L, getOption("digits") - 3L), ...)
## S3 method for class 'rlassoIVselectZ'
summary(object, digits = max(3L, getOption("digits") - 3L), ...)
## S3 method for class 'rlassoIVselectZ'
confint(object, parm, level = 0.95, ...)
Arguments
| x | an object of class rlassoIVselectZ | 
| digits | significant digits in printout | 
| ... | arguments passed to the print function and other methods | 
| object | an object of class rlassoIVselectZ | 
| parm | a specification of which parameters are to be given confidence intervals, either a vector of numbers or a vector of names. If missing, all parameters are considered. | 
| level | confidence level required. | 
Methods for S3 object rlassoTE
Description
Objects of class rlassoTE are constructed by  rlassoATE,  rlassoATET, rlassoLATE,  rlassoLATET.
print.rlassoTE prints and displays some information about fitted rlassoTE objects.
summary.rlassoTE summarizes information of a fitted rlassoTE object.
confint.rlassoTE extracts the confidence intervals.
Usage
## S3 method for class 'rlassoTE'
print(x, digits = max(3L, getOption("digits") - 3L), ...)
## S3 method for class 'rlassoTE'
summary(object, digits = max(3L, getOption("digits") - 3L), ...)
## S3 method for class 'rlassoTE'
confint(object, parm, level = 0.95, ...)
Arguments
| x | an object of class rlassoTE | 
| digits | number of significant digits in printout | 
| ... | arguments passed to the print function and other methods | 
| object | an object of class rlassoTE | 
| parm | a specification of which parameters are to be given confidence intervals, either a vector of numbers or a vector of names. If missing, all parameters are considered. | 
| level | confidence level required. | 
Methods for S3 object rlassologitEffects
Description
Objects of class rlassologitEffects are constructed by rlassologitEffects or rlassologitEffect. 
print.rlassologitEffects prints and displays some information about fitted rlassologitEffects objects.
summary.rlassologitEffects summarizes information of a fitted rlassologitEffects object.
confint.rlassologitEffects extracts the confidence intervals.
Usage
## S3 method for class 'rlassologitEffects'
print(x, digits = max(3L, getOption("digits") - 3L), ...)
## S3 method for class 'rlassologitEffects'
summary(object, digits = max(3L, getOption("digits") - 3L), ...)
## S3 method for class 'rlassologitEffects'
confint(object, parm, level = 0.95, joint = FALSE, ...)
Arguments
| x | an object of class rlassologitEffects | 
| digits | number of significant digits in printout | 
| ... | arguments passed to the print function and other methods | 
| object | an object of class rlassologitEffects | 
| parm | a specification of which parameters are to be given confidence intervals, either a vector of numbers or a vector of names. If missing, all parameters are considered. | 
| level | confidence level required. | 
| joint | logical, whether joint confidence intervals should be calculated | 
Methods for S3 object tsls
Description
Objects of class tsls are constructed by tsls. 
print.tsls prints and displays some information about fitted tsls objects.
summary.tsls summarizes information of a fitted tsls object.
Usage
## S3 method for class 'tsls'
print(x, digits = max(3L, getOption("digits") - 3L), ...)
## S3 method for class 'tsls'
summary(object, digits = max(3L, getOption("digits") - 3L), ...)
Arguments
| x | an object of class tsls | 
| digits | number of significant digits in printout | 
| ... | arguments passed to the print function and other methods | 
| object | an object of class tsls | 
Printing coefficients from S3 objects rlassoEffects
Description
Printing coefficients for class rlassoEffects
Usage
print_coef(x, ...)
## S3 method for class 'rlassoEffects'
print_coef(
  x,
  complete = TRUE,
  selection.matrix = FALSE,
  include.targets = TRUE,
  ...
)
Arguments
| x | an object of class rlassoEffects | 
| ... | further arguments passed to functions coef or print. | 
| complete | general option of the function  | 
| selection.matrix | if TRUE, a selection matrix is returned that indicates the selected variables from each auxiliary regression. Default is set to FALSE. | 
| include.targets | if FALSE (by default) only the selected control variables are listed in the  | 
Details
Printing coefficients and selection matrix for S3 object rlassoEffects
Examples
library(hdm)
set.seed(1)
n = 100 #sample size
p = 100 # number of variables
s = 7 # number of non-zero variables
X = matrix(rnorm(n*p), ncol=p)
colnames(X) <- paste("X", 1:p, sep="")
beta = c(rep(3,s), rep(0,p-s))
y = 1 + X%*%beta + rnorm(n)
data = data.frame(cbind(y,X))
colnames(data)[1] <- "y"
lasso.effect = rlassoEffects(X, y, index=c(1,2,3,50), 
                             method = "double selection")
# without target coefficient estimates
print_coef(lasso.effect, selection.matrix = TRUE, include.targets = FALSE) 
# with target coefficient estimates
print_coef(lasso.effect, selection.matrix = TRUE, include.targets = TRUE) 
rlasso: Function for Lasso estimation under homoscedastic and heteroscedastic non-Gaussian disturbances
Description
The function estimates the coefficients of a Lasso regression with
data-driven penalty under homoscedasticity and heteroscedasticity with non-Gaussian noise and X-dependent or X-independent design. The
method of the data-driven penalty can be chosen. The object which is
returned is of the S3 class rlasso.
Usage
rlasso(x, ...)
## S3 method for class 'formula'
rlasso(
  formula,
  data = NULL,
  post = TRUE,
  intercept = TRUE,
  model = TRUE,
  penalty = list(homoscedastic = FALSE, X.dependent.lambda = FALSE, lambda.start = NULL,
    c = 1.1, gamma = 0.1/log(n)),
  control = list(numIter = 15, tol = 10^-5, threshold = NULL),
  ...
)
## S3 method for class 'character'
rlasso(
  x,
  data = NULL,
  post = TRUE,
  intercept = TRUE,
  model = TRUE,
  penalty = list(homoscedastic = FALSE, X.dependent.lambda = FALSE, lambda.start = NULL,
    c = 1.1, gamma = 0.1/log(n)),
  control = list(numIter = 15, tol = 10^-5, threshold = NULL),
  ...
)
## Default S3 method:
rlasso(
  x,
  y,
  post = TRUE,
  intercept = TRUE,
  model = TRUE,
  penalty = list(homoscedastic = FALSE, X.dependent.lambda = FALSE, lambda.start = NULL,
    c = 1.1, gamma = 0.1/log(n)),
  control = list(numIter = 15, tol = 10^-5, threshold = NULL),
  ...
)
Arguments
| x | regressors (vector, matrix or object that can be coerced to a matrix) | 
| ... | further arguments (only for consistent definition of methods) | 
| formula | an object of class "formula" (or one that can be coerced to
that class): a symbolic description of the model to be fitted in the form
 | 
| data | an optional data frame, list or environment (or object coercible
by as.data.frame to a data frame) containing the variables in the model. If
not found in data, the variables are taken from environment(formula),
typically the environment from which  | 
| post | logical. If  | 
| intercept | logical. If  | 
| model | logical. If  | 
| penalty | list with options for the calculation of the penalty. 
 | 
| control | list with control values.
 | 
| y | dependent variable (vector, matrix or object that can be coerced to a matrix) | 
Details
The function estimates the coefficients of a Lasso regression with
data-driven penalty under homoscedasticity / heteroscedasticity and non-Gaussian noise. The option homoscedastic is a logical with FALSE by default.
Moreover, for the calculation of the penalty parameter it can be chosen whether the penalization parameter depends on the design matrix (X.dependent.lambda=TRUE) or not (default, X.dependent.lambda=FALSE).
The default value of the constant c is 1.1 in the post-Lasso case and 0.5 in the Lasso case. 
A special option is to set homoscedastic to "none" and to supply a value lambda.start. Then this value is used as penalty parameter with independent design and heteroscedastic errors to weight the regressors.
For details of the
implementation of the algorithm for estimation of the data-driven penalty,
in particular the regressor-independent loadings, we refer to Appendix A in
Belloni et al. (2012). When the option "none" is chosen for homoscedastic (together with
lambda.start), lambda is set to lambda.start and the
regressor-independent loadings under heteroscedasticity are used. The options "X-dependent" and
"X-independent" under homoscedasticity are described in Belloni et al. (2013). 
The option post=TRUE conducts post-lasso estimation, i.e. a refit of
the model with the selected variables.
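To give a feel for the size of the data-driven penalty, the following base R sketch evaluates the X-independent penalty level in the form commonly stated for this setting, lambda0 = 2 * c * sqrt(n) * qnorm(1 - gamma / (2 * p)), with the defaults c = 1.1 and gamma = 0.1/log(n) from the Usage section. That rlasso computes exactly this expression in every configuration is an assumption, and lambda0_sketch is a hypothetical helper that is not part of hdm.

```r
# Sketch of the X-independent penalty level (assumed form, see text above).
# lambda0_sketch is a hypothetical helper, not part of hdm.
lambda0_sketch <- function(n, p, c = 1.1, gamma = 0.1 / log(n)) {
  2 * c * sqrt(n) * qnorm(1 - gamma / (2 * p))
}

n <- 100
p <- 100
lam <- lambda0_sketch(n, p)

# The level grows only slowly with the number of regressors p
lam_more_p <- lambda0_sketch(n, 10 * p)
c(p_100 = lam, p_1000 = lam_more_p)
```

The per-variable penalty listed under Value (lambda) is then this level multiplied by the loading of each regressor.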
Value
rlasso returns an object of class rlasso. An object of
class "rlasso" is a list containing at least the following components:
- coefficients parameter estimates 
- beta parameter estimates (named vector of coefficients without intercept) 
- intercept value of the intercept 
- index index of selected variables (logical vector) 
- lambda data-driven penalty term for each variable, product of lambda0 (the penalization parameter) and the loadings 
- lambda0 penalty term 
- loadings loading for each regressor 
- residuals residuals, response minus fitted values 
- sigma root of the variance of the residuals 
- iter number of iterations 
- call function call 
- options options 
- model model matrix (if model = TRUE in function call)
References
A. Belloni, D. Chen, V. Chernozhukov and C. Hansen (2012). Sparse models and methods for optimal instruments with an application to eminent domain. Econometrica 80 (6), 2369-2429.
A. Belloni, V. Chernozhukov and C. Hansen (2013). Inference for high-dimensional sparse econometric models. In Advances in Economics and Econometrics: 10th World Congress, Vol. 3: Econometrics, Cambridge University Press: Cambridge, 245-295.
Examples
set.seed(1)
n = 100 #sample size
p = 100 # number of variables
s = 3 # number of variables with non-zero coefficients
X = Xnames = matrix(rnorm(n*p), ncol=p)
colnames(Xnames) <- paste("V", 1:p, sep="")
beta = c(rep(5,s), rep(0,p-s))
Y = X%*%beta + rnorm(n)
reg.lasso <- rlasso(Y~Xnames)
Xnew = matrix(rnorm(n*p), ncol=p)  # new X
colnames(Xnew) <- paste("V", 1:p, sep="")
Ynew =  Xnew%*%beta + rnorm(n)  #new Y
yhat = predict(reg.lasso, newdata = Xnew)
Functions for estimation of treatment effects
Description
This class of functions estimates the average treatment effect (ATE), the ATE of the treated (ATET), the local average treatment effect (LATE) and the LATE of the treated (LATET). The estimation methods rely on immunized / orthogonal moment conditions which guarantee valid post-selection inference in a high-dimensional setting. Further details can be found in Belloni et al. (2014).
Usage
rlassoATE(x, ...)
## Default S3 method:
rlassoATE(x, d, y, bootstrap = "none", nRep = 500, ...)
## S3 method for class 'formula'
rlassoATE(formula, data, bootstrap = "none", nRep = 500, ...)
rlassoATET(x, ...)
## Default S3 method:
rlassoATET(x, d, y, bootstrap = "none", nRep = 500, ...)
## S3 method for class 'formula'
rlassoATET(formula, data, bootstrap = "none", nRep = 500, ...)
rlassoLATE(x, ...)
## Default S3 method:
rlassoLATE(
  x,
  d,
  y,
  z,
  bootstrap = "none",
  nRep = 500,
  post = TRUE,
  intercept = TRUE,
  always_takers = TRUE,
  never_takers = TRUE,
  ...
)
## S3 method for class 'formula'
rlassoLATE(
  formula,
  data,
  bootstrap = "none",
  nRep = 500,
  post = TRUE,
  intercept = TRUE,
  always_takers = TRUE,
  never_takers = TRUE,
  ...
)
rlassoLATET(x, ...)
## Default S3 method:
rlassoLATET(
  x,
  d,
  y,
  z,
  bootstrap = "none",
  nRep = 500,
  post = TRUE,
  intercept = TRUE,
  always_takers = TRUE,
  ...
)
## S3 method for class 'formula'
rlassoLATET(
  formula,
  data,
  bootstrap = "none",
  nRep = 500,
  post = TRUE,
  intercept = TRUE,
  always_takers = TRUE,
  ...
)
Arguments
| x | exogenous variables | 
| ... | arguments passed, e.g.  | 
| d | treatment variable (binary) | 
| y | outcome variable / dependent variable | 
| bootstrap | bootstrap method which should be employed: 'none', 'Bayes', 'normal', 'wild' | 
| nRep | number of replications for the bootstrap | 
| formula | An object of class  | 
| data | An optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. 
If not found in data, the variables are taken from environment(formula), typically the environment from which  | 
| z | instrumental variables (binary) | 
| post | logical. If  | 
| intercept | logical. If  | 
| always_takers | option to adapt to cases with (default) and without always-takers. If  | 
| never_takers | option to adapt to cases with (default) and without never-takers. If  | 
Details
Details can be found in Belloni et al. (2014).
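As a rough illustration of what an orthogonal (doubly robust) moment condition for the ATE looks like, the following base R sketch simulates a small, low-dimensional data set and evaluates such a score. Plain glm and lm fits stand in for the Lasso-based nuisance estimates, so this is a sketch of the idea under that assumption and not the implementation used by rlassoATE. All variable names are illustrative.

```r
set.seed(1)
n <- 2000
x <- matrix(rnorm(n * 3), ncol = 3)
ps_true <- plogis(0.5 * x[, 1])            # true propensity score
d <- rbinom(n, size = 1, prob = ps_true)   # binary treatment
y <- 2 * d + x[, 1] + rnorm(n)             # true ATE is 2

# Nuisance estimates; glm/lm stand in for the Lasso fits (assumption)
df <- data.frame(y = y, d = d, x)          # columns of x become X1, X2, X3
ps_hat <- fitted(glm(d ~ X1 + X2 + X3, family = binomial, data = df))
g1 <- predict(lm(y ~ X1 + X2 + X3, data = df[df$d == 1, ]), newdata = df)
g0 <- predict(lm(y ~ X1 + X2 + X3, data = df[df$d == 0, ]), newdata = df)

# Orthogonal (doubly robust) score for the ATE
psi <- g1 - g0 + d * (y - g1) / ps_hat - (1 - d) * (y - g0) / (1 - ps_hat)
ate_hat <- mean(psi)
se_hat <- sd(psi) / sqrt(n)
c(estimate = ate_hat, se = se_hat)
```

The score is insensitive to small errors in the nuisance estimates, which is the property that makes inference after variable selection possible.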
Value
Functions return an object of class rlassoTE with estimated effects, standard errors and
individual effects in the form of a list.
References
A. Belloni, V. Chernozhukov, I. Fernandez-Val, and C. Hansen (2014). Program evaluation with high-dimensional data. Working Paper.
rigorous Lasso for Linear Models: Inference
Description
Estimation and inference of (low-dimensional) target coefficients in a high-dimensional linear model.
Usage
rlassoEffects(x, ...)
## Default S3 method:
rlassoEffects(
  x,
  y,
  index = c(1:ncol(x)),
  method = "partialling out",
  I3 = NULL,
  post = TRUE,
  ...
)
## S3 method for class 'formula'
rlassoEffects(
  formula,
  data,
  I,
  method = "partialling out",
  included = NULL,
  post = TRUE,
  ...
)
rlassoEffect(x, y, d, method = "double selection", I3 = NULL, post = TRUE, ...)
Arguments
| x | matrix of regressor variables serving as controls and potential
treatments. For  | 
| ... | parameters passed to the  | 
| y | outcome variable (vector or matrix) | 
| index | vector of integers, logicals or variables names indicating the position (column) of
variables (integer case), logical vector of length of the variables (TRUE or FALSE) or the variable names of  | 
| method | method for inference, either 'partialling out' (default) or 'double selection'. | 
| I3 | For the 'double selection'-method the logical vector  | 
| post | logical, if post Lasso is conducted with default  | 
| formula | An element of class  | 
| data | an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which the function is called. | 
| I | A one-sided formula specifying the variables for which inference is conducted. | 
| included | One-sided formula of variables which should be included in any case (only for method="double selection"). | 
| d | variable for which inference is conducted (treatment variable) | 
Details
The function estimates (low-dimensional) target coefficients in a high-dimensional linear model.
An application is e.g. estimation of a treatment effect \alpha_0 in a
setting of high-dimensional controls. The user can choose between the so-called post-double-selection method and partialling-out.
The idea of the double selection method is to select variables by Lasso regression of
the outcome variable on the control variables and of the treatment variable on
the control variables. The final estimation is done by a regression of the
outcome on the treatment variable and the union of the variables selected in
the first two steps. In partialling-out, the effect of the regressors on the outcome and on the treatment variable is first taken out by Lasso, and then the outcome residuals are regressed on the treatment residuals. The resulting estimator for \alpha_0 is asymptotically normally
distributed, which allows inference on the treatment effect. rlassoEffects is a wrapper function for rlassoEffect,
which does inference for a single variable.
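The mechanics of partialling-out can be sketched in base R on a small simulated data set. In this sketch OLS stands in for the Lasso in both auxiliary regressions, which is an assumption made so that the result can be checked: with OLS the residual-on-residual regression reproduces the coefficient from the full regression (Frisch-Waugh-Lovell). Variable names are illustrative.

```r
set.seed(1)
n <- 200
x <- matrix(rnorm(n * 5), ncol = 5)       # controls
d <- x[, 1] + rnorm(n)                    # treatment, depends on controls
y <- 1 + 0.5 * d + x[, 1] - x[, 2] + rnorm(n)

# Step 1: take out the effect of the controls (OLS stands in for Lasso)
ry <- resid(lm(y ~ x))
rd <- resid(lm(d ~ x))

# Step 2: regress outcome residuals on treatment residuals
alpha_po <- unname(coef(lm(ry ~ rd - 1)))

# With OLS in both steps this equals the coefficient from the full regression
alpha_full <- unname(coef(lm(y ~ d + x))["d"])
c(partialling_out = alpha_po, full = alpha_full)
```

Replacing the two OLS fits in step 1 by Lasso fits is what makes the approach usable when the number of controls is large relative to the sample size.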
Value
The function returns an object of class rlassoEffects with the following entries: 
| coefficients | vector with estimated values of the coefficients for each selected variable | 
| se | standard error (vector) | 
| t | t-statistic | 
| pval | p-value | 
| samplesize | sample size of the data set | 
| index | index of the variables for which inference is performed | 
References
A. Belloni, V. Chernozhukov, C. Hansen (2014). Inference on treatment effects after selection among high-dimensional controls. The Review of Economic Studies 81(2), 608-650.
Examples
library(hdm); library(ggplot2)
set.seed(1)
n = 100 #sample size
p = 100 # number of variables
s = 3 # number of non-zero variables
X = matrix(rnorm(n*p), ncol=p)
colnames(X) <- paste("X", 1:p, sep="")
beta = c(rep(3,s), rep(0,p-s))
y = 1 + X%*%beta + rnorm(n)
data = data.frame(cbind(y,X))
colnames(data)[1] <- "y"
fm = paste("y ~", paste(colnames(X), collapse="+"))
fm = as.formula(fm)                 
lasso.effect = rlassoEffects(X, y, index=c(1,2,3,50))
lasso.effect = rlassoEffects(fm, I = ~ X1 + X2 + X3 + X50, data=data)
print(lasso.effect)
summary(lasso.effect)
confint(lasso.effect)
plot(lasso.effect)
Post-Selection and Post-Regularization Inference in Linear Models with Many Controls and Instruments
Description
The function estimates a treatment effect in a setting with very many controls and very many instruments (even larger than the sample size).
Usage
rlassoIV(x, ...)
## Default S3 method:
rlassoIV(x, d, y, z, select.Z = TRUE, select.X = TRUE, post = TRUE, ...)
## S3 method for class 'formula'
rlassoIV(formula, data, select.Z = TRUE, select.X = TRUE, post = TRUE, ...)
rlassoIVmult(x, d, y, z, select.Z = TRUE, select.X = TRUE, ...)
Arguments
| x | matrix of exogenous variables | 
| ... | arguments passed to the function  | 
| d | endogenous variable | 
| y | outcome / dependent variable (vector or matrix) | 
| z | matrix of instrumental variables | 
| select.Z | logical, indicating selection on the instruments. | 
| select.X | logical, indicating selection on the exogenous variables. | 
| post | logical, whether post-Lasso should be conducted (default= | 
| formula | An object of class  | 
| data | an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. 
If not found in data, the variables are taken from environment(formula), typically the environment from which  | 
Details
The implementation for selection on x and z follows the procedure described in Chernozhukov et al.
(2015) and is built on 'triple selection' to achieve an orthogonal moment
function. The function returns an object of S3 class rlassoIV.
Moreover, it is a wrapper function for the cases in which selection should be done only on the instruments Z (rlassoIVselectZ), only on 
the control variables X (rlassoIVselectX), or not at all (tsls). Exogenous variables 
x are automatically used as instruments and added to the
instrument set z.
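The routing described above can be summarized in a few lines of R. The helper iv_route_sketch is hypothetical and not part of hdm; it only restates which estimator corresponds to which combination of select.Z and select.X according to this section.

```r
# iv_route_sketch is a hypothetical helper, not part of hdm; it only
# restates the routing described in the Details text.
iv_route_sketch <- function(select.Z = TRUE, select.X = TRUE) {
  if (select.Z && select.X) {
    "selection on x and z (triple selection)"
  } else if (select.Z) {
    "rlassoIVselectZ"
  } else if (select.X) {
    "rlassoIVselectX"
  } else {
    "tsls"
  }
}

# All four combinations of the two flags
routes <- expand.grid(select.Z = c(TRUE, FALSE), select.X = c(TRUE, FALSE))
routes$estimator <- mapply(iv_route_sketch, routes$select.Z, routes$select.X)
routes
```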
Value
an object of class rlassoIV containing at least the following
components: 
| coefficients | estimated parameter value | 
| se | standard error | 
References
V. Chernozhukov, C. Hansen, M. Spindler (2015). Post-selection and post-regularization inference in linear models with many controls and instruments. American Economic Review: Papers & Proceedings 105(5), 486–490.
Examples
## Not run: 
data(EminentDomain)
z <- EminentDomain$logGDP$z # instruments
x <- EminentDomain$logGDP$x # exogenous variables
y <- EminentDomain$logGDP$y # outcome variable
d <- EminentDomain$logGDP$d # treatment / endogenous variable
lasso.IV.Z = rlassoIV(x=x, d=d, y=y, z=z, select.X=FALSE, select.Z=TRUE) 
summary(lasso.IV.Z)
confint(lasso.IV.Z)
## End(Not run)
Instrumental Variable Estimation with Selection on the exogenous Variables by Lasso
Description
This function estimates the coefficient of an endogenous variable by instrumental variables in a setting where the exogenous variables are high-dimensional and hence
selection on the exogenous variables is required.
The function returns an element of class rlassoIVselectX.
Usage
rlassoIVselectX(x, ...)
## Default S3 method:
rlassoIVselectX(x, d, y, z, post = TRUE, ...)
## S3 method for class 'formula'
rlassoIVselectX(formula, data, post = TRUE, ...)
Arguments
| x | exogenous variables in the structural equation (matrix) | 
| ... | arguments passed to the function  | 
| d | endogenous variables in the structural equation (vector or matrix) | 
| y | outcome or dependent variable in the structural equation (vector or matrix) | 
| z | set of potential instruments for the endogenous variables. | 
| post | logical. If  | 
| formula | An object of class  | 
| data | An optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. 
If not found in data, the variables are taken from environment(formula), typically the environment from which  | 
Details
The implementation is a special case of Chernozhukov et al. (2015).
The option post=TRUE conducts post-lasso estimation for the Lasso estimations, i.e. a refit of the
model with the selected variables. Exogenous variables 
x are automatically used as instruments and added to the
instrument set z.
Value
An object of class rlassoIVselectX containing at least the following
components: 
| coefficients | estimated parameter vector | 
| vcov | variance-covariance matrix | 
| residuals | residuals | 
| samplesize | sample size | 
References
Chernozhukov, V., Hansen, C. and M. Spindler (2015). Post-Selection and Post-Regularization Inference in Linear Models with Many Controls and Instruments. American Economic Review, Papers and Proceedings 105(5), 486–490.
Examples
library(hdm)
data(AJR); y = AJR$GDP; d = AJR$Exprop; z = AJR$logMort
x = model.matrix(~ -1 + (Latitude + Latitude2 + Africa + 
                           Asia + Namer + Samer)^2, data=AJR)
dim(x)
# AJR.Xselect = rlassoIV(x=x, d=d, y=y, z=z, select.X=TRUE, select.Z=FALSE)
AJR.Xselect = rlassoIV(GDP ~ Exprop + (Latitude + Latitude2 + Africa + Asia + Namer + Samer)^2 |
                         logMort + (Latitude + Latitude2 + Africa + Asia + Namer + Samer)^2,
                       data=AJR, select.X=TRUE, select.Z=FALSE)
summary(AJR.Xselect)
confint(AJR.Xselect)
Instrumental Variable Estimation with Lasso
Description
This function selects the instrumental variables in the first stage by
Lasso. First stage predictions are then used in the second stage as optimal
instruments to estimate the parameter vector. The function returns an element of class rlassoIVselectZ.
Usage
rlassoIVselectZ(x, ...)
## Default S3 method:
rlassoIVselectZ(x, d, y, z, post = TRUE, intercept = TRUE, ...)
## S3 method for class 'formula'
rlassoIVselectZ(formula, data, post = TRUE, intercept = TRUE, ...)
Arguments
| x | exogenous variables in the structural equation (matrix) | 
| ... | arguments passed to the function  | 
| d | endogenous variables in the structural equation (vector or matrix) | 
| y | outcome or dependent variable in the structural equation (vector or matrix) | 
| z | set of potential instruments for the endogenous variables. Exogenous variables serve as their own instruments. | 
| post | logical. If  | 
| intercept | logical. If  | 
| formula | An object of class  | 
| data | An optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. 
If not found in data, the variables are taken from environment(formula), typically the environment from which  | 
Details
The implementation follows the procedure described in Belloni et al. (2012).
The option post=TRUE conducts post-lasso estimation, i.e. a refit of the
model with the selected variables, to estimate the optimal instruments. The
parameter vector of the structural equation is then fitted by two-stage
least squares (tsls) estimation.
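The two-step logic (first-stage prediction, then use of that prediction as the instrument) can be sketched in base R. In this sketch the set of relevant instruments is hard-coded and the first stage is fitted by OLS, whereas the package selects instruments by Lasso; this is a simplifying assumption, and all names are illustrative.

```r
set.seed(1)
n <- 500
z <- matrix(rnorm(n * 10), ncol = 10)     # candidate instruments
u <- rnorm(n)                             # unobserved confounder
d <- z[, 1] + 0.5 * z[, 2] + u + rnorm(n) # only two instruments are relevant
y <- 1.5 * d + u + rnorm(n)               # true coefficient is 1.5

# First stage on the relevant instruments only; the selection is
# hard-coded here instead of being done by Lasso (assumption)
d_hat <- fitted(lm(d ~ z[, 1:2]))

# Second stage: IV estimate with the first-stage prediction as instrument
beta_iv <- cov(d_hat, y) / cov(d_hat, d)

# Naive OLS for comparison; it is biased by the confounder u
beta_ols <- unname(coef(lm(y ~ d))["d"])
c(iv = beta_iv, ols = beta_ols)
```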
Value
An object of class rlassoIVselectZ containing at least the following
components: 
| coefficients | estimated parameter vector | 
| vcov | variance-covariance matrix | 
| residuals | residuals | 
| samplesize | sample size | 
| selection.matrix | matrix of selected variables in the first stage for each endogenous variable | 
References
A. Belloni, D. Chen, V. Chernozhukov and C. Hansen (2012). Sparse models and methods for optimal instruments with an application to eminent domain. Econometrica 80 (6), 2369–2429.
rlassologit: Function for logistic Lasso estimation
Description
The function estimates the coefficients of a logistic Lasso regression with
data-driven penalty. The method of the data-driven penalty can be chosen.
The object which is returned is of the S3 class rlassologit.
Usage
rlassologit(x, ...)
## S3 method for class 'formula'
rlassologit(
  formula,
  data = NULL,
  post = TRUE,
  intercept = TRUE,
  model = TRUE,
  penalty = list(lambda = NULL, c = 1.1, gamma = 0.1/log(n)),
  control = list(threshold = NULL),
  ...
)
## S3 method for class 'character'
rlassologit(
  x,
  data = NULL,
  post = TRUE,
  intercept = TRUE,
  model = TRUE,
  penalty = list(lambda = NULL, c = 1.1, gamma = 0.1/log(n)),
  control = list(threshold = NULL),
  ...
)
## Default S3 method:
rlassologit(
  x,
  y,
  post = TRUE,
  intercept = TRUE,
  model = TRUE,
  penalty = list(lambda = NULL, c = 1.1, gamma = 0.1/log(n)),
  control = list(threshold = NULL),
  ...
)
Arguments
| x | regressors (matrix) | 
| ... | further parameters passed to glmnet | 
| formula | an object of class 'formula' (or one that can be coerced to
that class): a symbolic description of the model to be fitted in the form
 | 
| data | an optional data frame, list or environment. | 
| post | logical. If  | 
| intercept | logical. If  | 
| model | logical. If  | 
| penalty | list with options for the calculation of the penalty.   | 
| control | list with control values.
 | 
| y | dependent variable (vector or matrix) | 
Details
The function estimates the coefficients of a Logistic Lasso regression with
data-driven penalty. The
option post=TRUE conducts post-lasso estimation, i.e. a refit of the
model with the selected variables.
Value
rlassologit returns an object of class
rlassologit. An object of class rlassologit is a list
containing at least the following components: 
| coefficients | parameter estimates | 
| beta | parameter estimates (without intercept) | 
| intercept | value of intercept | 
| index | index of selected variables (logicals) | 
| lambda | penalty term | 
| residuals | residuals | 
| sigma | root of the variance of the residuals | 
| call | function call | 
| options | options | 
References
Belloni, A., V. Chernozhukov and Y. Wei (2013). Honest confidence regions for logistic regression with a large number of controls. arXiv preprint arXiv:1304.3969.
Examples
## Not run: 
library(hdm)
## DGP
set.seed(2)
n <- 250
p <- 100
px <- 10
X <- matrix(rnorm(n*p), ncol=p)
beta <- c(rep(2,px), rep(0,p-px))
intercept <- 1
P <- exp(intercept + X %*% beta)/(1+exp(intercept + X %*% beta))
y <- rbinom(n, size=1, prob=P)
## fit rlassologit object
rlassologit.reg <- rlassologit(y~X)
## methods
summary(rlassologit.reg, all=FALSE)
print(rlassologit.reg)
predict(rlassologit.reg, type='response')
X3 <- matrix(rnorm(n*p), ncol=p)
predict(rlassologit.reg, newdata=X3)
## End(Not run)
rigorous Lasso for Logistic Models: Inference
Description
The function estimates (low-dimensional) target coefficients in a high-dimensional logistic model.
Usage
rlassologitEffects(x, ...)
## Default S3 method:
rlassologitEffects(x, y, index = c(1:ncol(x)), I3 = NULL, post = TRUE, ...)
## S3 method for class 'formula'
rlassologitEffects(formula, data, I, included = NULL, post = TRUE, ...)
rlassologitEffect(x, y, d, I3 = NULL, post = TRUE)
Arguments
| x | matrix of regressor variables serving as controls and potential
treatments.  For  | 
| ... | additional parameters | 
| y | outcome variable | 
| index | vector of integers, logical or names indicating the position (column) or name of variables of x which should be used as treatment variables. | 
| I3 | logical vector with same length as the number of controls; indicates if variables (TRUE) should be included in any case. | 
| post | logical. If  | 
| formula | An element of class  | 
| data | an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which the function is called. | 
| I | A one-sided formula specifying the variables for which inference is conducted. | 
| included | One-sided formula of variables which should be included in any case. | 
| d | variable for which inference is conducted (treatment variable) | 
Details
The function estimates (low-dimensional) target coefficients in a high-dimensional logistic model.
An application is e.g. estimation of a treatment effect \alpha_0 in a
setting of high-dimensional controls. The function is a wrapper function for rlassologitEffect, which does inference for only one variable (d).
Value
The function returns an object of class rlassologitEffects with the following entries: 
| coefficients | estimated value of the coefficients | 
| se | standard errors | 
| t | t-statistics | 
| pval | p-values | 
| samplesize | sample size of the data set | 
| I | index of variables of the union of the lasso regressions | 
References
A. Belloni, V. Chernozhukov, Y. Wei (2013). Honest confidence regions for a regression parameter in logistic regression with a large number of controls. cemmap working paper CWP67/13.
Examples
## Not run: 
library(hdm)
## DGP
set.seed(2)
n <- 250
p <- 100
px <- 10
X <- matrix(rnorm(n*p), ncol=p)
colnames(X) = paste("V", 1:p, sep="")
beta <- c(rep(2,px), rep(0,p-px))
intercept <- 1
P <- exp(intercept + X %*% beta)/(1+exp(intercept + X %*% beta))
y <- rbinom(n, size=1, prob=P)
xd <- X[,2:50]
d <- X[,1]
logit.effect <- rlassologitEffect(x=xd, d=d, y=y)
logit.effects <- rlassologitEffects(X,y, index=c(1,2,40))
logit.effects.f <- rlassologitEffects(y ~ X, I = ~ V1 + V2)
## End(Not run)
Summarizing rlassoEffects fits
Description
Summary method for class rlassoEffects
Usage
## S3 method for class 'rlassoEffects'
summary(object, ...)
## S3 method for class 'summary.rlassoEffects'
print(x, digits = max(3L, getOption("digits") - 3L), ...)
Arguments
| object | an object of class  | 
| ... | further arguments passed to or from other methods. | 
| x | an object of class  | 
| digits | the number of significant digits to use when printing. | 
Details
Summary of objects of class rlassoEffects
Two-Stage Least Squares Estimation (TSLS)
Description
The function does Two-Stage Least Squares Estimation (TSLS).
Usage
tsls(x, ...)
## Default S3 method:
tsls(x, d, y, z, intercept = TRUE, homoscedastic = TRUE, ...)
## S3 method for class 'formula'
tsls(formula, data, intercept = TRUE, homoscedastic = TRUE, ...)
Arguments
| x | exogenous variables | 
| ... | further arguments (only for consistent definition of methods) | 
| d | endogenous variables | 
| y | outcome variable | 
| z | instruments | 
| intercept | logical, if intercept should be included | 
| homoscedastic | logical, if homoscedastic ( | 
| formula | An object of class  | 
| data | An optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. 
If not found in data, the variables are taken from environment(formula), typically the environment from which  | 
Details
The function computes the tsls estimate (coefficients) and the variance-covariance matrix assuming homoskedasticity
for outcome variable y, where d are the endogenous variables in the structural equation, x are the exogenous variables in
the structural equation and z are the instruments. It returns an object of class tsls for which the methods print and summary 
are provided.
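For orientation, the textbook two-stage least squares computation can be written out in base R as follows. This is a sketch of the standard estimator and its homoscedastic variance-covariance matrix on simulated data, not the package's code; in particular the degrees-of-freedom correction n - ncol(A) is an assumption, and all names are illustrative.

```r
set.seed(1)
n <- 500
x <- rnorm(n)                              # exogenous control
z <- rnorm(n)                              # instrument
u <- rnorm(n)                              # unobserved confounder
d <- 0.8 * z + 0.3 * x + u + rnorm(n)      # endogenous variable
y <- 1 + 2 * d + 0.5 * x + u + rnorm(n)    # true coefficient on d is 2

A <- cbind(intercept = 1, d = d, x = x)    # regressors of the structural equation
Z <- cbind(intercept = 1, z = z, x = x)    # instruments; x serves as its own instrument

# Project the regressors on the instrument space, then solve for beta
A_hat <- Z %*% solve(crossprod(Z), crossprod(Z, A))
beta <- solve(crossprod(A_hat, A), crossprod(A_hat, y))

# Variance-covariance matrix under homoscedasticity
# (degrees-of-freedom correction is an assumption of this sketch)
res <- y - A %*% beta
sigma2 <- sum(res^2) / (n - ncol(A))
vcov_hom <- sigma2 * solve(crossprod(A_hat))
se <- sqrt(diag(vcov_hom))
cbind(estimate = drop(beta), se = se)
```

Note that the residuals are computed with the original regressors A, not with the projected A_hat; using A_hat there would understate or overstate the error variance.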
Value
The function returns a list with the following elements
| coefficients | coefficients | 
| vcov | variance-covariance matrix | 
| residuals | outcome minus predicted values | 
| call | function call | 
| samplesize | sample size | 
| se | standard error |