| Title: | Panel Data Wrangling Tools | 
| Version: | 1.2.13 | 
| BugReports: | https://github.com/JSzitas/panelWranglR/issues | 
| Description: | Leading/lagging a panel, creating dummy variables, taking panel differences, looking for panel autocorrelations, and more. Implemented via a 'data.table' back end. | 
| License: | GPL-3 | 
| Depends: | R (≥ 3.2.0) | 
| Suggests: | testthat (≥ 2.1.0) | 
| Encoding: | UTF-8 | 
| LazyData: | true | 
| URL: | https://github.com/JSzitas/panelWranglR | 
| RoxygenNote: | 6.1.1 | 
| Imports: | data.table, Hmisc, caret | 
| NeedsCompilation: | no | 
| Packaged: | 2019-09-28 18:02:59 UTC; juraj | 
| Author: | Juraj Szitás [aut, cre] | 
| Maintainer: | Juraj Szitás <szitas.juraj13@gmail.com> | 
| Repository: | CRAN | 
| Date/Publication: | 2019-10-03 08:30:02 UTC | 
Wrapper for find correlations
Description
Just a helper function for correl_panel.
Usage
corr_finder(df, corr_cutoff)
Arguments
| df | The dataframe to use. | 
| corr_cutoff | The correlation cutoff to pass to findCorrelations | 
Examples
X_1 <- rnorm(1000)
X_2 <- rnorm(1000) + 0.6 * X_1
X_3 <- rnorm(1000) - 0.4 * X_1
data_fm <- do.call( cbind, list( X_1,
                                 X_2,
                                 X_3 ))
corr_finder( df = data_fm,
             corr_cutoff = 0.3 )
Collect a panel, from wide to long
Description
Transforms cross sectional/time dummies to unified variables
Usage
panel_collect(data, cross.section = NULL, cross.section.columns = NULL,
  time.variable = NULL, time.variable.columns = NULL)
Arguments
| data | The panel to transform | 
| cross.section | The name of the transformed cross sectional variable supply as chracter. | 
| cross.section.columns | The names of the columns indicating cross sections to collect. | 
| time.variable | The name of the transformed time variable supply as character. | 
| time.variable.columns | The names of the columns indicating time variables to collect. | 
Details
For time variables named like "Time_Var_i" with arbitrary i, the program will check that all time variables are named using this convention, and strip this convention
Value
A collected data.table, with new columns constructed by collecting from the wide format.
Examples
x_1 <- rnorm( 10 )
cross_levels <- c( "AT", "DE" )
time <- seq(1:5)
time <- rep(time, 2)
geo_list <- list()
for(i in 1:length(cross_levels))
{
  geo <- rep( cross_levels[i],
                100 )
                  geo_list[[i]] <- geo
                  }
                  geo <- unlist(geo_list)
                  geo <- as.data.frame(geo)
 example_data <- cbind( time,
                       x_1 )
 example_data <- as.data.frame(example_data)
 example_data <- cbind( geo,
                       example_data)
 names(example_data) <- c("geo", "time", "x_1")
# generate dummies using panel_dummify()
 test_dummies <- panel_dummify( data = example_data,
                                cross.section = "geo",
                                time.variable = "time")
panel_collect( data = test_dummies,
               cross.section = "geo",
               cross.section.columns = c( "AT", "DE"))
Panel linear combinations
Description
A function to find highly correlated variables in a panel of data, both by cross sections and by time dummies.
Usage
panel_correl(data, cross.section = NULL, time.variable = NULL,
  corr.threshold = 0.7, autocorr.threshold = 0.5,
  cross.threshold = 0.7, select.cross.sections = NULL,
  select.time.periods = NULL)
Arguments
| data | The data to use, a data.frame or a data.table. | 
| cross.section | The name of the cross sectional variable. | 
| time.variable | The name of the time variable. | 
| corr.threshold | The correlation threshold for finding significant correlations in the base specification, disregarding time or cross sectional dependencies. | 
| autocorr.threshold | The correlation threshold for autocorrelation (splitting the pooled panel into cross sections). | 
| cross.threshold | The correlation threshold for finding significant correlations in the cross sections. | 
| select.cross.sections | An optional subset of cross sectional units. | 
| select.time.periods | An optional subset of time periods | 
Examples
   x_1 <- rnorm( 100 )
   x_2 <- rnorm( 100 ) + 0.5 * x_1
   cross_levels <- c( "AT", "DE")
   time <- seq(1:50)
   time <- rep(time, 2)
   geo_list <- list()
   for(i in 1:length(cross_levels))
   {  geo <- rep( cross_levels[i], 50 )
      geo_list[[i]] <- geo }
   geo <- unlist(geo_list)
   geo <- as.data.frame(geo)
   example_data <-  do.call ( cbind, list( time, x_1, x_2))
   example_data <- as.data.frame(example_data)
   example_data <- cbind( geo,
                         example_data)
                         names(example_data) <- c("geo", "time", "x_1",
                                                 "x_2")
   panel_correl( data = example_data,
                 cross.section = "geo",
                 time.variable = "time",
                 corr.threshold = 0.2,
                 autocorr.threshold = 0.5,
                 cross.threshold = 0.1)
Tidy panel differencing
Description
Efficient, tidy panel differencing
Usage
panel_diff(data, cross.section, time.variable = NULL, diff.order = 1,
  lags = 1, variables.selected = NULL, keep.original = FALSE)
Arguments
| data | The data input, anything coercible to a data.table. | 
| cross.section | The cross section argument, see examples. | 
| time.variable | The variable to indicate time in your panel. Defaults to NULL, though it is recommended to have a time variable. | 
| diff.order | The number of applications of the difference operator to use in panel differencing. Defaults to 1. | 
| lags | The number of lags to use for differences. Defaults to 1. | 
| variables.selected | A variable selection for variables to difference, defaults to NULL and differences ALL variables. | 
| keep.original | Whether to keep the original undifferenced data, defaults to FALSE. | 
Details
Works on a full data.table backend for maximum speed wherever possible.
Value
The differenced data.table which contains either only the differenced variables, or also the original variables.
Examples
X <- matrix(rnorm(4000),800,5)
tim <- seq(1:400)
geo_AT <- rep(c("AT"), length = 400)
geo_NO <- rep(c("NO"), length = 400)
both_vec_1 <- cbind(tim,geo_NO)
both_vec_2 <- cbind(tim,geo_AT)
both <- rbind(both_vec_1,both_vec_2)
names(both[,"geo_NO"]) <- "geo"
X <- cbind(both,X)
panel_diff(data = X,
           cross.section = "geo_NO",
           time.variable = "tim",
           diff.order = 1,
           lags = 1,
           variables.selected = c("V3","V4"),
           keep.original = TRUE)
Tidy time/variable dummies for panel data
Description
A simple function to dummify cross sections or time variables in panel data.
Usage
panel_dummify(data, cross.section = NULL, time.variable = NULL)
Arguments
| data | The panel to dummify | 
| cross.section | The cross section variable in the panel. Defaults to NULL. | 
| time.variable | The variable to indicate time in your panel. Defaults to NULL. | 
Details
The encoding is binary, whether this is more appropriate than using a factor variable is up to the user.
Value
A new data.table, with the original variables to dummify removed, and new dummy columns included.
Examples
x_1 <- rnorm( 10 )
cross_levels <- c( "AT", "DE" )
time <- seq(1:5)
time <- rep(time, 2)
geo_list <- list()
for(i in 1:length(cross_levels))
{
  geo <- rep( cross_levels[i],
                100 )
                  geo_list[[i]] <- geo
                  }
                  geo <- unlist(geo_list)
                  geo <- as.data.frame(geo)
 example_data <- cbind( time,
                        x_1 )
 example_data <- as.data.frame(example_data)
 example_data <- cbind( geo,
                        example_data)
 names(example_data) <- c("geo", "time", "x_1")
 test_dummies <- panel_dummify( data = example_data,
                                cross.section = "geo",
                                time.variable = "time")
Tidy panel lagging
Description
Efficient, tidy panel lagging
Usage
panel_lag(data, cross.section, time.variable = NULL, lags = 1,
  variables.selected = NULL, keep.original = TRUE)
Arguments
| data | The data input, anything coercible to a data.table. | 
| cross.section | The cross section argument, see examples. | 
| time.variable | The variable to indicate time in your panel. Defaults to NULL, though it is recommended to have a time variable. | 
| lags | The lags to use in panel lagging. | 
| variables.selected | A variable selection for variables to lag, defaults to NULL and lags ALL variables. | 
| keep.original | Whether to keep the original unlagged data, defaults to TRUE. | 
Details
Works on a full data.table backend for maximum speed wherever possible.
Value
The lagged data.table which contains either only the lagged variables, or also the original variables.
Examples
X <- matrix(rnorm(4000),800,5)
tim <- seq(1:400)
geo_AT <- rep(c("AT"), length = 400)
geo_NO <- rep(c("NO"), length = 400)
both_vec_1 <- cbind(tim,geo_NO)
both_vec_2 <- cbind(tim,geo_AT)
both <- rbind(both_vec_1,both_vec_2)
names(both[,"geo_NO"]) <- "geo"
X <- cbind(both,X)
panel_lag(data = X,
          cross.section = "geo_NO",
          time.variable = "tim",
          lags = 5,
          variables.selected = c("V5","tim", "V7"),
          keep.original = TRUE)
Tidy panel leading
Description
Efficient, tidy panel leading
Usage
panel_lead(data, cross.section, time.variable = NULL, leads = 1,
  variables.selected = NULL, keep.original = TRUE)
Arguments
| data | The data input, anything coercible to a data.table. | 
| cross.section | The cross section argument, see examples. | 
| time.variable | The variable to indicate time in your panel. Defaults to NULL, though it is recommended to have a time variable. | 
| leads | The leads to use in panel leading. | 
| variables.selected | A variable selection for variables to lead, defaults to NULL and leads ALL variables. | 
| keep.original | Whether to keep the original unleadged data, defaults to TRUE. | 
Details
Works on a full data.table backend for maximum speed wherever possible.
Value
The leading data.table which contains either only the leading variables, or also the original variables.
Examples
X <- matrix(rnorm(4000),800,5)
tim <- seq(1:400)
geo_AT <- rep(c("AT"), length = 400)
geo_NO <- rep(c("NO"), length = 400)
both_vec_1 <- cbind(tim,geo_NO)
both_vec_2 <- cbind(tim,geo_AT)
both <- rbind(both_vec_1,both_vec_2)
names(both[,"geo_NO"]) <- "geo"
X <- cbind(both,X)
panel_lead(data = X,
          cross.section = "geo_NO",
          time.variable = "tim",
          leads = 5,
          variables.selected = c("V5","tim", "V7"),
          keep.original = TRUE)