Title: | The Time Series Modeling Companion to 'healthyR' |
Version: | 0.3.1 |
Description: | Hospital time series data analysis workflow tools, modeling, and automations. This library provides many useful tools to review common administrative time series hospital data, such as average length of stay and readmission rates. The aim is to provide a simple and consistent verb framework that takes the guesswork out of everything. |
License: | MIT + file LICENSE |
Encoding: | UTF-8 |
RoxygenNote: | 7.3.2.9000 |
URL: | https://www.spsanderson.com/healthyR.ts/, https://github.com/spsanderson/healthyR.ts |
BugReports: | https://github.com/spsanderson/healthyR.ts/issues |
Imports: | magrittr, rlang (≥ 0.1.2), tibble, timetk, tidyr, dplyr, purrr, ggplot2, lubridate, plotly, recipes, modeltime, cowplot, graphics, forcats, stringi, parsnip, workflowsets, hardhat |
Suggests: | knitr, rmarkdown, scales, rsample, healthyR.ai, stringr, forecast, tidymodels, glue, xts, zoo, TSA, tune, dials, workflows, tidyselect, glmnet, earth, smooth, kernlab |
VignetteBuilder: | knitr |
Depends: | R (≥ 3.3) |
NeedsCompilation: | no |
Packaged: | 2024-10-11 18:25:41 UTC; steve |
Author: | Steven Sanderson |
Maintainer: | Steven Sanderson <spsanderson@gmail.com> |
Repository: | CRAN |
Date/Publication: | 2024-10-11 23:00:03 UTC |
Pipe operator
Description
See magrittr::%>%
for details.
Usage
lhs %>% rhs
Value
This does not return a value but rather is used to string functions together.
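A minimal usage sketch, assuming magrittr is attached; the piped call and the equivalent nested call produce the same result:

```r
library(magrittr)

# Pipe the left-hand side into the first argument of the right-hand side
piped <- mtcars %>%
  subset(cyl == 4) %>%
  nrow()

# Equivalent nested call without the pipe
nested <- nrow(subset(mtcars, cyl == 4))

identical(piped, nested)  # TRUE
```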
Forecast arima.string
Description
Forecast arima.string
Usage
arima_string(object, padding = FALSE)
Value
A string
Author(s)
Author(s) of forecast
package
Misc for boilerplate
Description
Misc for boilerplate
Usage
assign_value(name, value, cr = TRUE)
Value
No return value, called for side effects
Automatically Stationarize Time Series Data
Description
This function attempts to make a given time series stationary by applying transformations such as differencing or a logarithmic transformation. If the time series is already stationary, it returns the original time series.
Usage
auto_stationarize(.time_series)
Arguments
.time_series |
A time series object to be made stationary. |
Details
If the input time series is non-stationary (determined by the Augmented Dickey-Fuller test), this function will try to make it stationary by applying a series of transformations:
It checks if the time series is already stationary using the Augmented Dickey-Fuller test.
If not stationary, it attempts a logarithmic transformation.
If the logarithmic transformation doesn't work, it applies differencing.
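The cascade above can be sketched in base R. This is an illustrative outline only, not the package's implementation: the stationarity check is passed in as a function argument (a toy stand-in for the Augmented Dickey-Fuller test), and only the log and first-difference branches are shown.

```r
# Illustrative sketch of the transformation cascade (not the package source).
# `is_stationary` stands in for the Augmented Dickey-Fuller check.
make_stationary_sketch <- function(x, is_stationary) {
  if (is_stationary(x)) return(x)          # already stationary: return as-is
  log_x <- log(x)
  if (all(is.finite(log_x)) && is_stationary(log_x)) return(log_x)
  diff(x)                                  # fall back to differencing
}

# Toy predicate: mean roughly stable across the two halves of the series
toy_check <- function(x) {
  h <- length(x) %/% 2
  abs(mean(x[1:h]) - mean(x[-(1:h)])) < sd(x) / 2
}

out <- make_stationary_sketch(as.numeric(AirPassengers), toy_check)
```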
Value
If the time series is already stationary, it returns the original time series. If a transformation is applied to make it stationary, it returns a list with two elements:
stationary_ts: The stationary time series.
ndiffs: The order of differencing applied to make it stationary.
Author(s)
Steven P. Sanderson II, MPH
See Also
Other Utility:
calibrate_and_plot()
,
internal_ts_backward_event_tbl()
,
internal_ts_both_event_tbl()
,
internal_ts_forward_event_tbl()
,
model_extraction_helper()
,
ts_get_date_columns()
,
ts_info_tbl()
,
ts_is_date_class()
,
ts_lag_correlation()
,
ts_model_auto_tune()
,
ts_model_compare()
,
ts_model_rank_tbl()
,
ts_model_spec_tune_template()
,
ts_qq_plot()
,
ts_scedacity_scatter_plot()
,
ts_to_tbl()
,
util_difflog_ts()
,
util_doublediff_ts()
,
util_doubledifflog_ts()
,
util_log_ts()
,
util_singlediff_ts()
Examples
# Example 1: Using the AirPassengers dataset
auto_stationarize(AirPassengers)
# Example 2: Using the BJsales dataset
auto_stationarize(BJsales)
Helper function - Calibrate and Plot
Description
This function is a helper function. It will take in a set of workflows and then
perform the modeltime::modeltime_calibrate()
and modeltime::plot_modeltime_forecast()
.
Usage
calibrate_and_plot(
...,
.type = "testing",
.splits_obj,
.data,
.print_info = TRUE,
.interactive = FALSE
)
Arguments
... |
The workflow(s) you want to add to the function. |
.type |
Either the training(splits) or testing(splits) data. |
.splits_obj |
The splits object. |
.data |
The full data set. |
.print_info |
The default is TRUE and will print out the calibration accuracy tibble and the resulting plotly plot. |
.interactive |
The default is FALSE. This controls whether the forecast plot is interactive via plotly. |
Details
This function expects to take in workflows fitted with training data.
Value
The original time series, the simulated values, and some plots
Author(s)
Steven P. Sanderson II, MPH
See Also
Other Utility:
auto_stationarize()
,
internal_ts_backward_event_tbl()
,
internal_ts_both_event_tbl()
,
internal_ts_forward_event_tbl()
,
model_extraction_helper()
,
ts_get_date_columns()
,
ts_info_tbl()
,
ts_is_date_class()
,
ts_lag_correlation()
,
ts_model_auto_tune()
,
ts_model_compare()
,
ts_model_rank_tbl()
,
ts_model_spec_tune_template()
,
ts_qq_plot()
,
ts_scedacity_scatter_plot()
,
ts_to_tbl()
,
util_difflog_ts()
,
util_doublediff_ts()
,
util_doubledifflog_ts()
,
util_log_ts()
,
util_singlediff_ts()
Examples
## Not run:
suppressPackageStartupMessages(library(timetk))
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(recipes))
suppressPackageStartupMessages(library(rsample))
suppressPackageStartupMessages(library(parsnip))
suppressPackageStartupMessages(library(workflows))
data <- ts_to_tbl(AirPassengers) %>%
select(-index)
splits <- timetk::time_series_split(
data
, date_col
, assess = 12
, skip = 3
, cumulative = TRUE
)
rec_obj <- recipe(value ~ ., data = training(splits))
model_spec <- linear_reg(
mode = "regression"
, penalty = 0.1
, mixture = 0.5
) %>%
set_engine("lm")
wflw <- workflow() %>%
add_recipe(rec_obj) %>%
add_model(model_spec) %>%
fit(training(splits))
output <- calibrate_and_plot(
wflw
, .type = "training"
, .splits_obj = splits
, .data = data
, .print_info = FALSE
, .interactive = FALSE
)
## End(Not run)
Misc for boilerplate
Description
Misc for boilerplate
Usage
chr_assign(name, value, cr = TRUE)
Value
No return value, called for side effects
Confidence Interval Generic
Description
Gets the upper 97.5% quantile of a numeric vector.
Usage
ci_hi(.x, .na_rm = FALSE)
Arguments
.x |
A vector of numeric values |
.na_rm |
A Boolean, defaults to FALSE. Passed to the quantile function. |
Details
Gets the upper 97.5% quantile of a numeric vector.
Value
A numeric value.
Author(s)
Steven P. Sanderson II, MPH
See Also
Other Statistic:
ci_lo()
,
ts_adf_test()
Examples
x <- mtcars$mpg
ci_hi(x)
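Per the Details, ci_hi() and its companion ci_lo() are quantile computations; the exact call shown here is an assumption for illustration, using base R's quantile():

```r
# Upper 97.5% and lower 2.5% quantiles, as described in the Details sections
x <- mtcars$mpg
upper <- unname(quantile(x, probs = 0.975))
lower <- unname(quantile(x, probs = 0.025))
c(lower = lower, upper = upper)
```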
Confidence Interval Generic
Description
Gets the lower 2.5% quantile of a numeric vector.
Usage
ci_lo(.x, .na_rm = FALSE)
Arguments
.x |
A vector of numeric values |
.na_rm |
A Boolean, defaults to FALSE. Passed to the quantile function. |
Details
Gets the lower 2.5% quantile of a numeric vector.
Value
A numeric value.
Author(s)
Steven P. Sanderson II, MPH
See Also
Other Statistic:
ci_hi()
,
ts_adf_test()
Examples
x <- mtcars$mpg
ci_lo(x)
Provide Colorblind Compliant Colors
Description
8 Hex RGB color definitions suitable for charts for colorblind people.
Usage
color_blind()
Details
This function is used in others in order to help render plots for those that are color blind.
Value
A vector of 8 Hex RGB definitions.
Author(s)
Steven P. Sanderson II, MPH
Examples
color_blind()
Misc for boilerplate
Description
Misc for boilerplate
Usage
get_recipe_call(.rec_call)
Value
No return value, called for side effects
Event Analysis
Description
This is a function that sits inside of the ts_time_event_analysis_tbl()
. It
is only meant to be used there. This is an internal function.
Usage
internal_ts_backward_event_tbl(.data, .horizon)
Arguments
.data |
The data.frame/tibble that holds the data. |
.horizon |
How far do you want to look back or ahead. |
Details
This is a helper function for ts_time_event_analysis_tbl()
only.
Value
A tibble.
Author(s)
Steven P. Sanderson II, MPH
See Also
Other Utility:
auto_stationarize()
,
calibrate_and_plot()
,
internal_ts_both_event_tbl()
,
internal_ts_forward_event_tbl()
,
model_extraction_helper()
,
ts_get_date_columns()
,
ts_info_tbl()
,
ts_is_date_class()
,
ts_lag_correlation()
,
ts_model_auto_tune()
,
ts_model_compare()
,
ts_model_rank_tbl()
,
ts_model_spec_tune_template()
,
ts_qq_plot()
,
ts_scedacity_scatter_plot()
,
ts_to_tbl()
,
util_difflog_ts()
,
util_doublediff_ts()
,
util_doubledifflog_ts()
,
util_log_ts()
,
util_singlediff_ts()
Event Analysis
Description
This is a function that sits inside of the ts_time_event_analysis_tbl()
. It
is only meant to be used there. This is an internal function.
Usage
internal_ts_both_event_tbl(.data, .horizon)
Arguments
.data |
The data.frame/tibble that holds the data. |
.horizon |
How far do you want to look back or ahead. |
Details
This is a helper function for ts_time_event_analysis_tbl()
only.
Value
A tibble.
Author(s)
Steven P. Sanderson II, MPH
See Also
Other Utility:
auto_stationarize()
,
calibrate_and_plot()
,
internal_ts_backward_event_tbl()
,
internal_ts_forward_event_tbl()
,
model_extraction_helper()
,
ts_get_date_columns()
,
ts_info_tbl()
,
ts_is_date_class()
,
ts_lag_correlation()
,
ts_model_auto_tune()
,
ts_model_compare()
,
ts_model_rank_tbl()
,
ts_model_spec_tune_template()
,
ts_qq_plot()
,
ts_scedacity_scatter_plot()
,
ts_to_tbl()
,
util_difflog_ts()
,
util_doublediff_ts()
,
util_doubledifflog_ts()
,
util_log_ts()
,
util_singlediff_ts()
Event Analysis
Description
This is a function that sits inside of the ts_time_event_analysis_tbl()
. It
is only meant to be used there. This is an internal function.
Usage
internal_ts_forward_event_tbl(.data, .horizon)
Arguments
.data |
The data.frame/tibble that holds the data. |
.horizon |
How far do you want to look back or ahead. |
Details
This is a helper function for ts_time_event_analysis_tbl()
only.
Value
A tibble.
Author(s)
Steven P. Sanderson II, MPH
See Also
Other Utility:
auto_stationarize()
,
calibrate_and_plot()
,
internal_ts_backward_event_tbl()
,
internal_ts_both_event_tbl()
,
model_extraction_helper()
,
ts_get_date_columns()
,
ts_info_tbl()
,
ts_is_date_class()
,
ts_lag_correlation()
,
ts_model_auto_tune()
,
ts_model_compare()
,
ts_model_rank_tbl()
,
ts_model_spec_tune_template()
,
ts_qq_plot()
,
ts_scedacity_scatter_plot()
,
ts_to_tbl()
,
util_difflog_ts()
,
util_doublediff_ts()
,
util_doubledifflog_ts()
,
util_log_ts()
,
util_singlediff_ts()
Model Method Extraction Helper
Description
This takes in a model fit and returns the method of the fit object.
Usage
model_extraction_helper(.fit_object)
Arguments
.fit_object |
A time-series fitted model |
Details
Currently supports fitted models from the
forecast
package (for example, auto.arima() fits) as well as
workflow
fitted models.
Value
A model description
Author(s)
Steven P. Sanderson II, MPH
See Also
Other Utility:
auto_stationarize()
,
calibrate_and_plot()
,
internal_ts_backward_event_tbl()
,
internal_ts_both_event_tbl()
,
internal_ts_forward_event_tbl()
,
ts_get_date_columns()
,
ts_info_tbl()
,
ts_is_date_class()
,
ts_lag_correlation()
,
ts_model_auto_tune()
,
ts_model_compare()
,
ts_model_rank_tbl()
,
ts_model_spec_tune_template()
,
ts_qq_plot()
,
ts_scedacity_scatter_plot()
,
ts_to_tbl()
,
util_difflog_ts()
,
util_doublediff_ts()
,
util_doubledifflog_ts()
,
util_log_ts()
,
util_singlediff_ts()
Examples
# NOT RUN
## Not run:
suppressPackageStartupMessages(library(forecast))
# Create a model
fit_arima <- auto.arima(AirPassengers)
model_extraction_helper(fit_arima)
## End(Not run)
Required Packages
Description
Required Packages
Usage
required_pkgs.step_ts_acceleration(x, ...)
required_pkgs.step_ts_velocity(x, ...)
Arguments
x |
A recipe step |
Value
A character vector
A character vector
Recipes Time Series Acceleration Generator
Description
step_ts_acceleration
creates a specification of a recipe
step that will convert numeric data from a time series into its
acceleration.
Usage
step_ts_acceleration(
recipe,
...,
role = "predictor",
trained = FALSE,
columns = NULL,
skip = FALSE,
id = rand_id("ts_acceleration")
)
Arguments
recipe |
A recipe object. The step will be added to the sequence of operations for this recipe. |
... |
One or more selector functions to choose which
variables that will be used to create the new variables. The
selected variables should have class |
role |
For model terms created by this step, what analysis role should they be assigned? By default, the function assumes that the new variable columns created by the original variables will be used as predictors in a model. |
trained |
A logical to indicate if the quantities for preprocessing have been estimated. |
columns |
A character string of variables that will be
used as inputs. This field is a placeholder and will be
populated once |
skip |
A logical. Should the step be skipped when the recipe is baked by bake.recipe()? While all operations are baked when prep.recipe() is run, some operations may not be able to be conducted on new data (e.g. processing the outcome variable(s)). Care should be taken when using skip = TRUE as it may affect the computations for subsequent operations. |
id |
A character string that is unique to this step to identify it. |
Details
Numeric Variables
Unlike other steps, step_ts_acceleration
does not
remove the original numeric variables. recipes::step_rm()
can be
used for this purpose.
Value
For step_ts_acceleration
, an updated version of recipe with
the new step added to the sequence of existing steps (if any).
Main Recipe Functions:
-
recipes::recipe()
-
recipes::prep()
-
recipes::bake()
See Also
Other Recipes:
step_ts_velocity()
Examples
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(recipes))
len_out = 10
by_unit = "month"
start_date = as.Date("2021-01-01")
data_tbl <- tibble(
date_col = seq.Date(from = start_date, length.out = len_out, by = by_unit),
a = rnorm(len_out),
b = runif(len_out)
)
# Create a recipe object
rec_obj <- recipe(a ~ ., data = data_tbl) %>%
step_ts_acceleration(b)
# View the recipe object
rec_obj
# Prepare the recipe object
prep(rec_obj)
# Bake the recipe object - Adds the Time Series Signature
bake(prep(rec_obj), data_tbl)
rec_obj %>% prep() %>% juice()
Recipes Time Series velocity Generator
Description
step_ts_velocity
creates a specification of a recipe
step that will convert numeric data from a time series into its
velocity.
Usage
step_ts_velocity(
recipe,
...,
role = "predictor",
trained = FALSE,
columns = NULL,
skip = FALSE,
id = rand_id("ts_velocity")
)
Arguments
recipe |
A recipe object. The step will be added to the sequence of operations for this recipe. |
... |
One or more selector functions to choose which
variables that will be used to create the new variables. The
selected variables should have class |
role |
For model terms created by this step, what analysis role should they be assigned? By default, the function assumes that the new variable columns created by the original variables will be used as predictors in a model. |
trained |
A logical to indicate if the quantities for preprocessing have been estimated. |
columns |
A character string of variables that will be
used as inputs. This field is a placeholder and will be
populated once |
skip |
A logical. Should the step be skipped when the recipe is baked by bake.recipe()? While all operations are baked when prep.recipe() is run, some operations may not be able to be conducted on new data (e.g. processing the outcome variable(s)). Care should be taken when using skip = TRUE as it may affect the computations for subsequent operations. |
id |
A character string that is unique to this step to identify it. |
Details
Numeric Variables
Unlike other steps, step_ts_velocity
does not
remove the original numeric variables. recipes::step_rm()
can be
used for this purpose.
Value
For step_ts_velocity
, an updated version of recipe with
the new step added to the sequence of existing steps (if any).
Main Recipe Functions:
-
recipes::recipe()
-
recipes::prep()
-
recipes::bake()
See Also
Other Recipes:
step_ts_acceleration()
Examples
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(recipes))
len_out = 10
by_unit = "month"
start_date = as.Date("2021-01-01")
data_tbl <- tibble(
date_col = seq.Date(from = start_date, length.out = len_out, by = by_unit),
a = rnorm(len_out),
b = runif(len_out)
)
# Create a recipe object
rec_obj <- recipe(a ~ ., data = data_tbl) %>%
step_ts_velocity(b)
# View the recipe object
rec_obj
# Prepare the recipe object
prep(rec_obj)
# Bake the recipe object - Adds the Time Series Signature
bake(prep(rec_obj), data_tbl)
rec_obj %>% prep() %>% juice()
Tidy Style FFT
Description
Perform an fft using stats::fft()
and return a tidier style output list with plots.
Usage
tidy_fft(
.data,
.date_col,
.value_col,
.frequency = 12L,
.harmonics = 1L,
.upsampling = 10L
)
Arguments
.data |
The data.frame/tibble you will pass for analysis. |
.date_col |
The column that holds the date. |
.value_col |
The column that holds the data to be analyzed. |
.frequency |
The frequency of the data, 12 = monthly for example. |
.harmonics |
The number of harmonic waves to produce. |
.upsampling |
The upsampling of the time series. |
Details
This function will perform a few different things, but primarily it will
compute the Fast Discrete Fourier Transform (FFT) using stats::fft()
. The
formula is given as:
y[h] = sum_{k=1}^n z[k]*exp(-2*pi*1i*(k-1)*(h-1)/n)
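The formula above can be checked directly against stats::fft() for a small vector:

```r
# Verify the DFT formula against stats::fft() for a small input
z <- c(1, 2, 3, 4)
n <- length(z)
manual <- sapply(seq_len(n), function(h) {
  sum(z * exp(-2 * pi * 1i * (seq_len(n) - 1) * (h - 1) / n))
})
max(Mod(manual - fft(z)))  # numerically zero
```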
There are many items returned inside of a list invisibly. There are four primary categories of data returned in the list. Below are the primary categories and the items inside of them.
data:
data
error_data
input_vector
maximum_harmonic_tbl
differenced_value_tbl
dff_tbl
ts_obj
plots:
harmonic_plot
diff_plot
max_har_plot
harmonic_plotly
max_har_plotly
parameters:
harmonics
upsampling
start_date
end_date
freq
model:
m
harmonic_obj
harmonic_model
model_summary
Value
A list object returned invisibly.
Author(s)
Steven P. Sanderson II, MPH
See Also
Other Data Generator:
ts_brownian_motion()
,
ts_brownian_motion_augment()
,
ts_geometric_brownian_motion()
,
ts_geometric_brownian_motion_augment()
,
ts_random_walk()
Examples
suppressPackageStartupMessages(library(dplyr))
data_tbl <- AirPassengers %>%
ts_to_tbl() %>%
select(-index)
a <- tidy_fft(
.data = data_tbl,
.value_col = value,
.date_col = date_col,
.harmonics = 3,
.frequency = 12
)
a$plots$max_har_plot
a$plots$harmonic_plot
Tidy eval helpers
Description
-
sym() creates a symbol from a string and syms() creates a list of symbols from a character vector.
-
enquo() and enquos() delay the execution of one or several function arguments. enquo() returns a single quoted expression, which is like a blueprint for the delayed computation. enquos() returns a list of such quoted expressions.
-
expr() quotes a new expression locally. It is mostly useful to build new expressions around arguments captured with enquo() or enquos(): expr(mean(!!enquo(arg), na.rm = TRUE)).
-
as_name() transforms a quoted variable name into a string. Supplying something else than a quoted variable name is an error.
That's unlike as_label() which also returns a single string but supports any kind of R object as input, including quoted function calls and vectors. Its purpose is to summarise that object into a single label. That label is often suitable as a default name.
If you don't know what a quoted expression contains (for instance expressions captured with enquo() could be a variable name, a call to a function, or an unquoted constant), then use as_label(). If you know you have quoted a simple variable name, or would like to enforce this, use as_name().
To learn more about tidy eval and how to use these tools, visit the Metaprogramming section of Advanced R.
Value
These functions do not return a value but rather are used for side effects.
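A brief illustration of these helpers, assuming rlang is attached; mean_of() is a hypothetical function written here only to show enquo() in use:

```r
library(rlang)

# sym()/as_name() round-trip a column name between string and symbol
s <- sym("mpg")
as_name(s)  # "mpg"

# enquo() captures an argument for delayed (tidy) evaluation
mean_of <- function(data, col) {
  col <- enquo(col)
  mean(eval_tidy(col, data))
}
mean_of(mtcars, mpg)  # same value as mean(mtcars$mpg)
```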
Augment Function Acceleration
Description
Takes a numeric vector and will return the acceleration of that vector.
Usage
ts_acceleration_augment(.data, .value, .names = "auto")
Arguments
.data |
The data being passed that will be augmented by the function. |
.value |
This is passed |
.names |
The default is "auto" |
Details
Takes a numeric vector and will return the acceleration of that vector. The acceleration of a time series is computed by taking the second difference, so
(x_t - x_{t-1}) - (x_{t-1} - x_{t-2})
This function is intended to be used on its own in order to add columns to a tibble.
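The second difference above is what base R's diff() computes with differences = 2; a quick check on a small vector:

```r
# Second difference: (x_t - x_{t-1}) - (x_{t-1} - x_{t-2})
x <- c(1, 4, 9, 16, 25)          # squares: second difference is constant 2
accel_manual <- (x[3:5] - x[2:4]) - (x[2:4] - x[1:3])
accel_diff   <- diff(x, differences = 2)
identical(accel_manual, accel_diff)  # TRUE
```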
Value
An augmented tibble
Author(s)
Steven P. Sanderson II, MPH
See Also
Other Augment Function:
ts_growth_rate_augment()
,
ts_velocity_augment()
Examples
suppressPackageStartupMessages(library(dplyr))
len_out = 10
by_unit = "month"
start_date = as.Date("2021-01-01")
data_tbl <- tibble(
date_col = seq.Date(from = start_date, length.out = len_out, by = by_unit),
a = rnorm(len_out),
b = runif(len_out)
)
ts_acceleration_augment(data_tbl, b)
Vector Function Time Series Acceleration
Description
Takes a numeric vector and will return the acceleration of that vector.
Usage
ts_acceleration_vec(.x)
Arguments
.x |
A numeric vector |
Details
Takes a numeric vector and will return the acceleration of that vector. The acceleration of a time series is computed by taking the second difference, so
(x_t - x_{t-1}) - (x_{t-1} - x_{t-2})
This function can be used on its own. It is also the basis for the function
ts_acceleration_augment()
.
Value
A numeric vector
Author(s)
Steven P. Sanderson II, MPH
See Also
Other Vector Function:
ts_growth_rate_vec()
,
ts_velocity_vec()
Examples
suppressPackageStartupMessages(library(dplyr))
len_out = 25
by_unit = "month"
start_date = as.Date("2021-01-01")
data_tbl <- tibble(
date_col = seq.Date(from = start_date, length.out = len_out, by = by_unit),
a = rnorm(len_out),
b = runif(len_out)
)
vec_1 <- ts_acceleration_vec(data_tbl$b)
plot(data_tbl$b)
lines(data_tbl$b)
lines(vec_1, col = "blue")
Augmented Dickey-Fuller Test for Time Series Stationarity
Description
This function performs the Augmented Dickey-Fuller test to assess the
stationarity of a time series. The Augmented Dickey-Fuller (ADF) test is used
to determine if a given time series is stationary. This function takes a
numeric vector as input, and you can optionally specify the lag order with
the .k
parameter. If .k
is not provided, it is calculated based on the
number of observations using a formula. The test statistic and p-value are
returned.
Usage
ts_adf_test(.x, .k = NULL)
Arguments
.x |
A numeric vector representing the time series to be tested for stationarity. |
.k |
An optional parameter specifying the number of lags to use in the ADF test (default is calculated). |
Value
A list containing the results of the Augmented Dickey-Fuller test:
-
test_stat
: The test statistic from the ADF test. -
p_value
: The p-value of the test.
Author(s)
Steven P. Sanderson II, MPH
See Also
Other Statistic:
ci_hi()
,
ci_lo()
Examples
# Example 1: Using the AirPassengers dataset
ts_adf_test(AirPassengers)
# Example 2: Using a custom time series vector
custom_ts <- rnorm(100, 0, 1)
ts_adf_test(custom_ts)
Simulate ARIMA Model
Description
Returns a list output of any n
simulations of a user specified
ARIMA model. The function returns a list object with two sections:
data
plots
The data section of the output contains the following:
simulation_time_series object (ts format)
simulation_time_series_output (mts format)
simulations_tbl (simulation_time_series_object in a tibble)
simulations_median_value_tbl (contains the
stats::median()
value of the simulated data)
The plots section of the output contains the following:
static_plot The
ggplot2
plotplotly_plot The
plotly
plot
Usage
ts_arima_simulator(
.n = 100,
.num_sims = 25,
.order_p = 0,
.order_d = 0,
.order_q = 0,
.ma = c(),
.ar = c(),
.sim_color = "steelblue",
.alpha = 0.05,
.size = 1,
...
)
Arguments
.n |
The number of points to be simulated. |
.num_sims |
The number of different simulations to be run. |
.order_p |
The p value, the order of the AR term. |
.order_d |
The d value, the degree of differencing applied to make the series stationary. |
.order_q |
The q value, the order of the MA term. |
.ma |
You can list the MA terms respectively if desired. |
.ar |
You can list the AR terms respectively if desired. |
.sim_color |
The color of the lines for the simulated series. |
.alpha |
The alpha component of the |
.size |
The size of the median line for the |
... |
Any other additional arguments for stats::arima.sim |
Details
This function takes in a user-specified ARIMA model. The specification
is passed to stats::arima.sim()
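The underlying simulation call can be sketched directly with stats::arima.sim(); the AR coefficient here is illustrative, not a package default:

```r
set.seed(123)
# Simulate 100 points from an AR(1) model with coefficient 0.5
sim <- arima.sim(model = list(order = c(1, 0, 0), ar = 0.5), n = 100)
length(sim)  # 100
class(sim)   # "ts"
```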
Value
A list object.
Author(s)
Steven P. Sanderson II, MPH
See Also
https://www.machinelearningplus.com/time-series/arima-model-time-series-forecasting-python/
Other Simulator:
ts_forecast_simulator()
Examples
output <- ts_arima_simulator()
output$plots$static_plot
Boilerplate Workflow
Description
This is a boilerplate function to create automatically the following:
recipe
model specification
workflow
tuned model (grid etc.)
calibration tibble and plot
Usage
ts_auto_arima(
.data,
.date_col,
.value_col,
.formula,
.rsamp_obj,
.prefix = "ts_arima",
.tune = TRUE,
.grid_size = 10,
.num_cores = 1,
.cv_assess = 12,
.cv_skip = 3,
.cv_slice_limit = 6,
.best_metric = "rmse",
.bootstrap_final = FALSE
)
Arguments
.data |
The data being passed to the function. The time-series object. |
.date_col |
The column that holds the datetime. |
.value_col |
The column that has the value |
.formula |
The formula that is passed to the recipe like |
.rsamp_obj |
The rsample splits object |
.prefix |
Default is |
.tune |
Defaults to TRUE, this creates a tuning grid and tuned model. |
.grid_size |
If |
.num_cores |
How many cores do you want to use. Default is 1 |
.cv_assess |
How many observations for assess. See |
.cv_skip |
How many observations to skip. See |
.cv_slice_limit |
How many slices to return. See |
.best_metric |
Default is "rmse". See |
.bootstrap_final |
Not yet implemented. |
Details
This uses the modeltime::arima_reg()
with the engine
set to arima
Value
A list
Author(s)
Steven P. Sanderson II, MPH
See Also
https://business-science.github.io/modeltime/reference/arima_reg.html
Other Boiler_Plate:
ts_auto_arima_xgboost()
,
ts_auto_croston()
,
ts_auto_exp_smoothing()
,
ts_auto_glmnet()
,
ts_auto_lm()
,
ts_auto_mars()
,
ts_auto_nnetar()
,
ts_auto_prophet_boost()
,
ts_auto_prophet_reg()
,
ts_auto_smooth_es()
,
ts_auto_svm_poly()
,
ts_auto_svm_rbf()
,
ts_auto_theta()
,
ts_auto_xgboost()
Examples
library(dplyr)
library(timetk)
library(modeltime)
data <- AirPassengers %>%
ts_to_tbl() %>%
select(-index)
splits <- time_series_split(
data
, date_col
, assess = 12
, skip = 3
, cumulative = TRUE
)
ts_aa <- ts_auto_arima(
.data = data,
.num_cores = 2,
.date_col = date_col,
.value_col = value,
.rsamp_obj = splits,
.formula = value ~ .,
.grid_size = 5,
.cv_slice_limit = 2,
.tune = FALSE
)
ts_aa$recipe_info
Boilerplate Workflow
Description
This is a boilerplate function to create automatically the following:
recipe
model specification
workflow
tuned model (grid etc.)
calibration tibble and plot
Usage
ts_auto_arima_xgboost(
.data,
.date_col,
.value_col,
.formula,
.rsamp_obj,
.prefix = "ts_arima_boost",
.tune = TRUE,
.grid_size = 10,
.num_cores = 1,
.cv_assess = 12,
.cv_skip = 3,
.cv_slice_limit = 6,
.best_metric = "rmse",
.bootstrap_final = FALSE
)
Arguments
.data |
The data being passed to the function. The time-series object. |
.date_col |
The column that holds the datetime. |
.value_col |
The column that has the value |
.formula |
The formula that is passed to the recipe like |
.rsamp_obj |
The rsample splits object |
.prefix |
Default is |
.tune |
Defaults to TRUE, this creates a tuning grid and tuned model. |
.grid_size |
If |
.num_cores |
How many cores do you want to use. Default is 1 |
.cv_assess |
How many observations for assess. See |
.cv_skip |
How many observations to skip. See |
.cv_slice_limit |
How many slices to return. See |
.best_metric |
Default is "rmse". See |
.bootstrap_final |
Not yet implemented. |
Details
This uses the modeltime::arima_boost()
with the engine
set to xgboost
Value
A list
Author(s)
Steven P. Sanderson II, MPH
See Also
https://business-science.github.io/modeltime/reference/arima_boost.html
Other Boiler_Plate:
ts_auto_arima()
,
ts_auto_croston()
,
ts_auto_exp_smoothing()
,
ts_auto_glmnet()
,
ts_auto_lm()
,
ts_auto_mars()
,
ts_auto_nnetar()
,
ts_auto_prophet_boost()
,
ts_auto_prophet_reg()
,
ts_auto_smooth_es()
,
ts_auto_svm_poly()
,
ts_auto_svm_rbf()
,
ts_auto_theta()
,
ts_auto_xgboost()
Examples
library(dplyr)
library(timetk)
library(modeltime)
data <- AirPassengers %>%
ts_to_tbl() %>%
select(-index)
splits <- time_series_split(
data
, date_col
, assess = 12
, skip = 3
, cumulative = TRUE
)
ts_auto_arima_xgboost <- ts_auto_arima_xgboost(
.data = data,
.num_cores = 2,
.date_col = date_col,
.value_col = value,
.rsamp_obj = splits,
.formula = value ~ .,
.grid_size = 5,
.cv_slice_limit = 2,
.tune = FALSE
)
ts_auto_arima_xgboost$recipe_info
Boilerplate Workflow
Description
This is a boilerplate function to create automatically the following:
recipe
model specification
workflow
tuned model (grid etc.)
calibration tibble and plot
Usage
ts_auto_croston(
.data,
.date_col,
.value_col,
.formula,
.rsamp_obj,
.prefix = "ts_croston",
.tune = TRUE,
.grid_size = 10,
.num_cores = 1,
.cv_assess = 12,
.cv_skip = 3,
.cv_slice_limit = 6,
.best_metric = "rmse",
.bootstrap_final = FALSE
)
Arguments
.data |
The data being passed to the function. The time-series object. |
.date_col |
The column that holds the datetime. |
.value_col |
The column that has the value |
.formula |
The formula that is passed to the recipe like |
.rsamp_obj |
The rsample splits object |
.prefix |
Default is |
.tune |
Defaults to TRUE, this creates a tuning grid and tuned model. |
.grid_size |
If |
.num_cores |
How many cores do you want to use. Default is 1 |
.cv_assess |
How many observations for assess. See |
.cv_skip |
How many observations to skip. See |
.cv_slice_limit |
How many slices to return. See |
.best_metric |
Default is "rmse". See |
.bootstrap_final |
Not yet implemented. |
Details
This uses the forecast::croston()
for the parsnip
engine. This
model does not use exogenous regressors, so only a univariate model of: value ~ date
will be used from the .date_col
and .value_col
that you provide.
Value
A list
Author(s)
Steven P. Sanderson II, MPH
See Also
https://business-science.github.io/modeltime/reference/exp_smoothing.html#engine-details
https://pkg.robjhyndman.com/forecast/reference/croston.html
Other Boiler_Plate:
ts_auto_arima()
,
ts_auto_arima_xgboost()
,
ts_auto_exp_smoothing()
,
ts_auto_glmnet()
,
ts_auto_lm()
,
ts_auto_mars()
,
ts_auto_nnetar()
,
ts_auto_prophet_boost()
,
ts_auto_prophet_reg()
,
ts_auto_smooth_es()
,
ts_auto_svm_poly()
,
ts_auto_svm_rbf()
,
ts_auto_theta()
,
ts_auto_xgboost()
Other exp_smoothing:
ts_auto_exp_smoothing()
,
ts_auto_smooth_es()
,
ts_auto_theta()
Examples
library(dplyr)
library(timetk)
library(modeltime)
data <- AirPassengers %>%
ts_to_tbl() %>%
select(-index)
splits <- time_series_split(
data
, date_col
, assess = 12
, skip = 3
, cumulative = TRUE
)
ts_exp <- ts_auto_croston(
.data = data,
.num_cores = 2,
.date_col = date_col,
.value_col = value,
.rsamp_obj = splits,
.formula = value ~ .,
.grid_size = 5,
.tune = FALSE
)
ts_exp$recipe_info
Boilerplate Workflow
Description
This is a boilerplate function to create automatically the following:
recipe
model specification
workflow
tuned model (grid etc.)
calibration tibble and plot
Usage
ts_auto_exp_smoothing(
.data,
.date_col,
.value_col,
.formula,
.rsamp_obj,
.prefix = "ts_exp_smooth",
.tune = TRUE,
.grid_size = 20,
.num_cores = 1,
.cv_assess = 12,
.cv_skip = 3,
.cv_slice_limit = 6,
.best_metric = "rmse",
.bootstrap_final = FALSE
)
Arguments
.data |
The data being passed to the function. The time-series object. |
.date_col |
The column that holds the datetime. |
.value_col |
The column that has the value |
.formula |
The formula that is passed to the recipe like value ~ . |
.rsamp_obj |
The rsample splits object |
.prefix |
Default is "ts_exp_smooth" |
.tune |
Defaults to TRUE; this creates a tuning grid and tuned model. |
.grid_size |
If .tune is TRUE, then .grid_size is the size of the tuning grid. |
.num_cores |
How many cores do you want to use. Default is 1. |
.cv_assess |
How many observations for assess. See timetk::time_series_cv() |
.cv_skip |
How many observations to skip. See timetk::time_series_cv() |
.cv_slice_limit |
How many slices to return. See timetk::time_series_cv() |
.best_metric |
Default is "rmse". See modeltime::default_forecast_accuracy_metric_set() |
.bootstrap_final |
Not yet implemented. |
Details
This uses modeltime::exp_smoothing() under the hood with the engine set to ets.
Value
A list
Author(s)
Steven P. Sanderson II, MPH
See Also
https://business-science.github.io/modeltime/reference/exp_smoothing.html#engine-details
https://pkg.robjhyndman.com/forecast/reference/ets.html
Other Boiler_Plate: ts_auto_arima(), ts_auto_arima_xgboost(), ts_auto_croston(), ts_auto_glmnet(), ts_auto_lm(), ts_auto_mars(), ts_auto_nnetar(), ts_auto_prophet_boost(), ts_auto_prophet_reg(), ts_auto_smooth_es(), ts_auto_svm_poly(), ts_auto_svm_rbf(), ts_auto_theta(), ts_auto_xgboost()
Other exp_smoothing: ts_auto_croston(), ts_auto_smooth_es(), ts_auto_theta()
Examples
library(dplyr)
library(timetk)
library(modeltime)
data <- AirPassengers %>%
ts_to_tbl() %>%
select(-index)
splits <- time_series_split(
data
, date_col
, assess = 12
, skip = 3
, cumulative = TRUE
)
ts_exp <- ts_auto_exp_smoothing(
.data = data,
.num_cores = 2,
.date_col = date_col,
.value_col = value,
.rsamp_obj = splits,
.formula = value ~ .,
.grid_size = 20,
.tune = FALSE
)
ts_exp$recipe_info
Boilerplate Workflow
Description
This is a boilerplate function to automatically create the following:
recipe
model specification
workflow
tuned model (grid etc.)
calibration tibble and plot
Usage
ts_auto_glmnet(
.data,
.date_col,
.value_col,
.formula,
.rsamp_obj,
.prefix = "ts_glmnet",
.tune = TRUE,
.grid_size = 10,
.num_cores = 1,
.cv_assess = 12,
.cv_skip = 3,
.cv_slice_limit = 6,
.best_metric = "rmse",
.bootstrap_final = FALSE
)
Arguments
.data |
The data being passed to the function. The time-series object. |
.date_col |
The column that holds the datetime. |
.value_col |
The column that has the value |
.formula |
The formula that is passed to the recipe like value ~ . |
.rsamp_obj |
The rsample splits object |
.prefix |
Default is "ts_glmnet" |
.tune |
Defaults to TRUE; this creates a tuning grid and tuned model. |
.grid_size |
If .tune is TRUE, then .grid_size is the size of the tuning grid. |
.num_cores |
How many cores do you want to use. Default is 1. |
.cv_assess |
How many observations for assess. See timetk::time_series_cv() |
.cv_skip |
How many observations to skip. See timetk::time_series_cv() |
.cv_slice_limit |
How many slices to return. See timetk::time_series_cv() |
.best_metric |
Default is "rmse". See modeltime::default_forecast_accuracy_metric_set() |
.bootstrap_final |
Not yet implemented. |
Details
This uses parsnip::linear_reg() and sets the engine to glmnet.
Value
A list
Author(s)
Steven P. Sanderson II, MPH
See Also
https://parsnip.tidymodels.org/reference/linear_reg.html
Other Boiler_Plate: ts_auto_arima(), ts_auto_arima_xgboost(), ts_auto_croston(), ts_auto_exp_smoothing(), ts_auto_lm(), ts_auto_mars(), ts_auto_nnetar(), ts_auto_prophet_boost(), ts_auto_prophet_reg(), ts_auto_smooth_es(), ts_auto_svm_poly(), ts_auto_svm_rbf(), ts_auto_theta(), ts_auto_xgboost()
Examples
library(dplyr)
library(timetk)
library(modeltime)
library(glmnet)
data <- AirPassengers %>%
ts_to_tbl() %>%
select(-index)
splits <- time_series_split(
data
, date_col
, assess = 12
, skip = 3
, cumulative = TRUE
)
ts_glmnet <- ts_auto_glmnet(
.data = data,
.num_cores = 2,
.date_col = date_col,
.value_col = value,
.rsamp_obj = splits,
.formula = value ~ .,
.grid_size = 5,
.tune = FALSE
)
ts_glmnet$recipe_info
Boilerplate Workflow
Description
This is a boilerplate function to automatically create the following:
recipe
model specification
workflow
calibration tibble and plot
Usage
ts_auto_lm(
.data,
.date_col,
.value_col,
.formula,
.rsamp_obj,
.prefix = "ts_lm",
.bootstrap_final = FALSE
)
Arguments
.data |
The data being passed to the function. The time-series object. |
.date_col |
The column that holds the datetime. |
.value_col |
The column that has the value |
.formula |
The formula that is passed to the recipe like value ~ . |
.rsamp_obj |
The rsample splits object |
.prefix |
Default is "ts_lm" |
.bootstrap_final |
Not yet implemented. |
Details
This uses parsnip::linear_reg() and sets the engine to lm.
Value
A list
Author(s)
Steven P. Sanderson II, MPH
See Also
https://parsnip.tidymodels.org/reference/linear_reg.html
Other Boiler_Plate: ts_auto_arima(), ts_auto_arima_xgboost(), ts_auto_croston(), ts_auto_exp_smoothing(), ts_auto_glmnet(), ts_auto_mars(), ts_auto_nnetar(), ts_auto_prophet_boost(), ts_auto_prophet_reg(), ts_auto_smooth_es(), ts_auto_svm_poly(), ts_auto_svm_rbf(), ts_auto_theta(), ts_auto_xgboost()
Examples
library(dplyr)
library(timetk)
library(modeltime)
data <- AirPassengers %>%
ts_to_tbl() %>%
select(-index)
splits <- time_series_split(
data
, date_col
, assess = 12
, skip = 3
, cumulative = TRUE
)
ts_lm <- ts_auto_lm(
.data = data,
.date_col = date_col,
.value_col = value,
.rsamp_obj = splits,
.formula = value ~ .
)
ts_lm$recipe_info
Boilerplate Workflow
Description
This is a boilerplate function to automatically create the following:
recipe
model specification
workflow
tuned model (grid etc.)
calibration tibble and plot
Usage
ts_auto_mars(
.data,
.date_col,
.value_col,
.formula,
.rsamp_obj,
.prefix = "ts_mars",
.tune = TRUE,
.grid_size = 10,
.num_cores = 1,
.cv_assess = 12,
.cv_skip = 3,
.cv_slice_limit = 6,
.best_metric = "rmse",
.bootstrap_final = FALSE
)
Arguments
.data |
The data being passed to the function. The time-series object. |
.date_col |
The column that holds the datetime. |
.value_col |
The column that has the value |
.formula |
The formula that is passed to the recipe like value ~ . |
.rsamp_obj |
The rsample splits object |
.prefix |
Default is "ts_mars" |
.tune |
Defaults to TRUE; this creates a tuning grid and tuned model. |
.grid_size |
If .tune is TRUE, then .grid_size is the size of the tuning grid. |
.num_cores |
How many cores do you want to use. Default is 1. |
.cv_assess |
How many observations for assess. See timetk::time_series_cv() |
.cv_skip |
How many observations to skip. See timetk::time_series_cv() |
.cv_slice_limit |
How many slices to return. See timetk::time_series_cv() |
.best_metric |
Default is "rmse". See modeltime::default_forecast_accuracy_metric_set() |
.bootstrap_final |
Not yet implemented. |
Details
This uses the parsnip::mars() function with the engine set to earth.
Value
A list
Author(s)
Steven P. Sanderson II, MPH
See Also
https://parsnip.tidymodels.org/reference/mars.html
Other Boiler_Plate: ts_auto_arima(), ts_auto_arima_xgboost(), ts_auto_croston(), ts_auto_exp_smoothing(), ts_auto_glmnet(), ts_auto_lm(), ts_auto_nnetar(), ts_auto_prophet_boost(), ts_auto_prophet_reg(), ts_auto_smooth_es(), ts_auto_svm_poly(), ts_auto_svm_rbf(), ts_auto_theta(), ts_auto_xgboost()
Examples
library(dplyr)
library(timetk)
library(modeltime)
library(earth)
data <- AirPassengers %>%
ts_to_tbl() %>%
select(-index)
splits <- time_series_split(
data
, date_col
, assess = 12
, skip = 3
, cumulative = TRUE
)
ts_auto_mars <- ts_auto_mars(
.data = data,
.num_cores = 2,
.date_col = date_col,
.value_col = value,
.rsamp_obj = splits,
.formula = value ~ .,
.grid_size = 20,
.tune = FALSE
)
ts_auto_mars$recipe_info
Boilerplate Workflow
Description
This is a boilerplate function to automatically create the following:
recipe
model specification
workflow
tuned model (grid etc.)
calibration tibble and plot
Usage
ts_auto_nnetar(
.data,
.date_col,
.value_col,
.formula,
.rsamp_obj,
.prefix = "ts_nnetar",
.tune = TRUE,
.grid_size = 10,
.num_cores = 1,
.cv_assess = 12,
.cv_skip = 3,
.cv_slice_limit = 6,
.best_metric = "rmse",
.bootstrap_final = FALSE
)
Arguments
.data |
The data being passed to the function. The time-series object. |
.date_col |
The column that holds the datetime. |
.value_col |
The column that has the value |
.formula |
The formula that is passed to the recipe like value ~ . |
.rsamp_obj |
The rsample splits object |
.prefix |
Default is "ts_nnetar" |
.tune |
Defaults to TRUE; this creates a tuning grid and tuned model. |
.grid_size |
If .tune is TRUE, then .grid_size is the size of the tuning grid. |
.num_cores |
How many cores do you want to use. Default is 1. |
.cv_assess |
How many observations for assess. See timetk::time_series_cv() |
.cv_skip |
How many observations to skip. See timetk::time_series_cv() |
.cv_slice_limit |
How many slices to return. See timetk::time_series_cv() |
.best_metric |
Default is "rmse". See modeltime::default_forecast_accuracy_metric_set() |
.bootstrap_final |
Not yet implemented. |
Details
This uses the modeltime::nnetar_reg() function with the engine set to nnetar.
Value
A list
Author(s)
Steven P. Sanderson II, MPH
See Also
https://business-science.github.io/modeltime/reference/nnetar_reg.html
Other Boiler_Plate: ts_auto_arima(), ts_auto_arima_xgboost(), ts_auto_croston(), ts_auto_exp_smoothing(), ts_auto_glmnet(), ts_auto_lm(), ts_auto_mars(), ts_auto_prophet_boost(), ts_auto_prophet_reg(), ts_auto_smooth_es(), ts_auto_svm_poly(), ts_auto_svm_rbf(), ts_auto_theta(), ts_auto_xgboost()
Examples
library(dplyr)
library(timetk)
library(modeltime)
data <- AirPassengers %>%
ts_to_tbl() %>%
select(-index)
splits <- time_series_split(
data
, date_col
, assess = 12
, skip = 3
, cumulative = TRUE
)
ts_nnetar <- ts_auto_nnetar(
.data = data,
.num_cores = 2,
.date_col = date_col,
.value_col = value,
.rsamp_obj = splits,
.formula = value ~ .,
.grid_size = 5,
.tune = FALSE
)
ts_nnetar$recipe_info
Boilerplate Workflow
Description
This is a boilerplate function to automatically create the following:
recipe
model specification
workflow
tuned model (grid etc.)
calibration tibble and plot
Usage
ts_auto_prophet_boost(
.data,
.date_col,
.value_col,
.formula,
.rsamp_obj,
.prefix = "ts_prophet_boost",
.tune = TRUE,
.grid_size = 10,
.num_cores = 1,
.cv_assess = 12,
.cv_skip = 3,
.cv_slice_limit = 6,
.best_metric = "rmse",
.bootstrap_final = FALSE
)
Arguments
.data |
The data being passed to the function. The time-series object. |
.date_col |
The column that holds the datetime. |
.value_col |
The column that has the value |
.formula |
The formula that is passed to the recipe like value ~ . |
.rsamp_obj |
The rsample splits object |
.prefix |
Default is "ts_prophet_boost" |
.tune |
Defaults to TRUE; this creates a tuning grid and tuned model. |
.grid_size |
If .tune is TRUE, then .grid_size is the size of the tuning grid. |
.num_cores |
How many cores do you want to use. Default is 1. |
.cv_assess |
How many observations for assess. See timetk::time_series_cv() |
.cv_skip |
How many observations to skip. See timetk::time_series_cv() |
.cv_slice_limit |
How many slices to return. See timetk::time_series_cv() |
.best_metric |
Default is "rmse". See modeltime::default_forecast_accuracy_metric_set() |
.bootstrap_final |
Not yet implemented. |
Details
This uses the modeltime::prophet_boost() function with the engine set to prophet_xgboost.
Value
A list
Author(s)
Steven P. Sanderson II, MPH
See Also
https://business-science.github.io/modeltime/reference/prophet_boost.html
Other Boiler_Plate: ts_auto_arima(), ts_auto_arima_xgboost(), ts_auto_croston(), ts_auto_exp_smoothing(), ts_auto_glmnet(), ts_auto_lm(), ts_auto_mars(), ts_auto_nnetar(), ts_auto_prophet_reg(), ts_auto_smooth_es(), ts_auto_svm_poly(), ts_auto_svm_rbf(), ts_auto_theta(), ts_auto_xgboost()
Other prophet: ts_auto_prophet_reg()
Examples
library(dplyr)
library(timetk)
library(modeltime)
data <- AirPassengers %>%
ts_to_tbl() %>%
select(-index)
splits <- time_series_split(
data
, date_col
, assess = 12
, skip = 3
, cumulative = TRUE
)
ts_prophet_boost <- ts_auto_prophet_boost(
.data = data,
.num_cores = 2,
.date_col = date_col,
.value_col = value,
.rsamp_obj = splits,
.formula = value ~ .,
.grid_size = 5,
.tune = FALSE
)
ts_prophet_boost$recipe_info
Boilerplate Workflow
Description
This is a boilerplate function to automatically create the following:
recipe
model specification
workflow
tuned model (grid etc.)
calibration tibble and plot
Usage
ts_auto_prophet_reg(
.data,
.date_col,
.value_col,
.formula,
.rsamp_obj,
.prefix = "ts_prophet_reg",
.tune = TRUE,
.grid_size = 10,
.num_cores = 1,
.cv_assess = 12,
.cv_skip = 3,
.cv_slice_limit = 6,
.best_metric = "rmse",
.bootstrap_final = FALSE
)
Arguments
.data |
The data being passed to the function. The time-series object. |
.date_col |
The column that holds the datetime. |
.value_col |
The column that has the value |
.formula |
The formula that is passed to the recipe like value ~ . |
.rsamp_obj |
The rsample splits object |
.prefix |
Default is "ts_prophet_reg" |
.tune |
Defaults to TRUE; this creates a tuning grid and tuned model. |
.grid_size |
If .tune is TRUE, then .grid_size is the size of the tuning grid. |
.num_cores |
How many cores do you want to use. Default is 1. |
.cv_assess |
How many observations for assess. See timetk::time_series_cv() |
.cv_skip |
How many observations to skip. See timetk::time_series_cv() |
.cv_slice_limit |
How many slices to return. See timetk::time_series_cv() |
.best_metric |
Default is "rmse". See modeltime::default_forecast_accuracy_metric_set() |
.bootstrap_final |
Not yet implemented. |
Details
This uses the modeltime::prophet_reg() function with the engine set to prophet.
Value
A list
Author(s)
Steven P. Sanderson II, MPH
See Also
https://business-science.github.io/modeltime/reference/prophet_reg.html
Other Boiler_Plate: ts_auto_arima(), ts_auto_arima_xgboost(), ts_auto_croston(), ts_auto_exp_smoothing(), ts_auto_glmnet(), ts_auto_lm(), ts_auto_mars(), ts_auto_nnetar(), ts_auto_prophet_boost(), ts_auto_smooth_es(), ts_auto_svm_poly(), ts_auto_svm_rbf(), ts_auto_theta(), ts_auto_xgboost()
Other prophet: ts_auto_prophet_boost()
Examples
library(dplyr)
library(timetk)
library(modeltime)
data <- AirPassengers %>%
ts_to_tbl() %>%
select(-index)
splits <- time_series_split(
data
, date_col
, assess = 12
, skip = 3
, cumulative = TRUE
)
ts_prophet_reg <- ts_auto_prophet_reg(
.data = data,
.num_cores = 2,
.date_col = date_col,
.value_col = value,
.rsamp_obj = splits,
.formula = value ~ .,
.grid_size = 5,
.tune = FALSE
)
ts_prophet_reg$recipe_info
Build a Time Series Recipe
Description
Automatically builds generic time series recipe objects from a given tibble.
Usage
ts_auto_recipe(
.data,
.date_col,
.pred_col,
.step_ts_sig = TRUE,
.step_ts_rm_misc = TRUE,
.step_ts_dummy = TRUE,
.step_ts_fourier = TRUE,
.step_ts_fourier_period = 365/12,
.K = 1,
.step_ts_yeo = TRUE,
.step_ts_nzv = TRUE
)
Arguments
.data |
The data that is going to be modeled. You must supply a tibble. |
.date_col |
The column that holds the date for the time series. |
.pred_col |
The column that is to be predicted. |
.step_ts_sig |
A Boolean indicating should the timetk::step_timeseries_signature() be added, default is TRUE. |
.step_ts_rm_misc |
A Boolean indicating should miscellaneous items be removed from the time series signature, default is TRUE. |
.step_ts_dummy |
A Boolean indicating if all_nominal_predictors() should be dummied and with one hot encoding. |
.step_ts_fourier |
A Boolean indicating if timetk::step_fourier() should be added, default is TRUE. |
.step_ts_fourier_period |
A number such as 365/12, 365/4 or 365 indicating the period of the Fourier term, i.e. the numeric period for the oscillation frequency. |
.K |
The number of orders to include for each sine/cosine Fourier series. More orders increase the number of Fourier terms and therefore the variance of the fitted model at the expense of bias. See details for examples of K specification. |
.step_ts_yeo |
A Boolean indicating if recipes::step_YeoJohnson() should be added, default is TRUE. |
.step_ts_nzv |
A Boolean indicating if recipes::step_nzv() should be added, default is TRUE. |
Details
This will build out a couple of generic recipe objects and return those items in a list.
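As an illustration of the kinds of steps such a generic recipe bundles together, a comparable pipeline can be written out by hand with timetk and recipes. This is a hedged sketch only: the exact step set and ordering that ts_auto_recipe() generates may differ, and the step choices below are assumptions based on the arguments documented above.

```r
library(healthyR.ts)
library(dplyr)
library(recipes)
library(timetk)

data_tbl <- ts_to_tbl(AirPassengers) %>% select(-index)

# Hand-rolled analogue of a generated recipe: signature features,
# Fourier terms, Yeo-Johnson transform, one-hot dummies, and
# near-zero variance filtering. Illustrative, not the exact output.
rec <- recipe(value ~ ., data = data_tbl) %>%
  step_timeseries_signature(date_col) %>%
  step_fourier(date_col, period = 365/12, K = 1) %>%
  step_YeoJohnson(all_numeric_predictors()) %>%
  step_dummy(all_nominal_predictors(), one_hot = TRUE) %>%
  step_nzv(all_predictors())

rec %>% prep() %>% bake(new_data = NULL) %>% glimpse()
```

Inspecting the baked data this way is a quick check that the generated predictors match what the modeling workflow expects.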
Author(s)
Steven P. Sanderson II, MPH
Examples
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(rsample))
data_tbl <- ts_to_tbl(AirPassengers) %>%
select(-index)
splits <- initial_time_split(
data_tbl
, prop = 0.8
)
ts_auto_recipe(
.data = data_tbl
, .date_col = date_col
, .pred_col = value
)
ts_auto_recipe(
.data = training(splits)
, .date_col = date_col
, .pred_col = value
)
Boilerplate Workflow
Description
This is a boilerplate function to automatically create the following:
recipe
model specification
workflow
tuned model (grid etc.)
calibration tibble and plot
Usage
ts_auto_smooth_es(
.data,
.date_col,
.value_col,
.formula,
.rsamp_obj,
.prefix = "ts_smooth_es",
.tune = TRUE,
.grid_size = 10,
.num_cores = 1,
.cv_assess = 12,
.cv_skip = 3,
.cv_slice_limit = 6,
.best_metric = "rmse",
.bootstrap_final = FALSE
)
Arguments
.data |
The data being passed to the function. The time-series object. |
.date_col |
The column that holds the datetime. |
.value_col |
The column that has the value |
.formula |
The formula that is passed to the recipe like value ~ . |
.rsamp_obj |
The rsample splits object |
.prefix |
Default is "ts_smooth_es" |
.tune |
Defaults to TRUE; this creates a tuning grid and tuned model. |
.grid_size |
If .tune is TRUE, then .grid_size is the size of the tuning grid. |
.num_cores |
How many cores do you want to use. Default is 1. |
.cv_assess |
How many observations for assess. See timetk::time_series_cv() |
.cv_skip |
How many observations to skip. See timetk::time_series_cv() |
.cv_slice_limit |
How many slices to return. See timetk::time_series_cv() |
.best_metric |
Default is "rmse". See modeltime::default_forecast_accuracy_metric_set() |
.bootstrap_final |
Not yet implemented. |
Details
This uses modeltime::exp_smoothing() and sets the parsnip engine to smooth_es.
Value
A list
Author(s)
Steven P. Sanderson II, MPH
See Also
https://business-science.github.io/modeltime/reference/exp_smoothing.html#ref-examples
https://github.com/config-i1/smooth
Other Boiler_Plate: ts_auto_arima(), ts_auto_arima_xgboost(), ts_auto_croston(), ts_auto_exp_smoothing(), ts_auto_glmnet(), ts_auto_lm(), ts_auto_mars(), ts_auto_nnetar(), ts_auto_prophet_boost(), ts_auto_prophet_reg(), ts_auto_svm_poly(), ts_auto_svm_rbf(), ts_auto_theta(), ts_auto_xgboost()
Other exp_smoothing: ts_auto_croston(), ts_auto_exp_smoothing(), ts_auto_theta()
Examples
library(dplyr)
library(timetk)
library(modeltime)
data <- AirPassengers %>%
ts_to_tbl() %>%
select(-index)
splits <- time_series_split(
data
, date_col
, assess = 12
, skip = 3
, cumulative = TRUE
)
ts_smooth_es <- ts_auto_smooth_es(
.data = data,
.num_cores = 2,
.date_col = date_col,
.value_col = value,
.rsamp_obj = splits,
.formula = value ~ .,
.grid_size = 3,
.tune = FALSE
)
ts_smooth_es$recipe_info
Boilerplate Workflow
Description
This is a boilerplate function to automatically create the following:
recipe
model specification
workflow
tuned model (grid etc.)
calibration tibble and plot
Usage
ts_auto_svm_poly(
.data,
.date_col,
.value_col,
.formula,
.rsamp_obj,
.prefix = "ts_svm_poly",
.tune = TRUE,
.grid_size = 10,
.num_cores = 1,
.cv_assess = 12,
.cv_skip = 3,
.cv_slice_limit = 6,
.best_metric = "rmse",
.bootstrap_final = FALSE
)
Arguments
.data |
The data being passed to the function. The time-series object. |
.date_col |
The column that holds the datetime. |
.value_col |
The column that has the value |
.formula |
The formula that is passed to the recipe like value ~ . |
.rsamp_obj |
The rsample splits object |
.prefix |
Default is "ts_svm_poly" |
.tune |
Defaults to TRUE; this creates a tuning grid and tuned model. |
.grid_size |
If .tune is TRUE, then .grid_size is the size of the tuning grid. |
.num_cores |
How many cores do you want to use. Default is 1. |
.cv_assess |
How many observations for assess. See timetk::time_series_cv() |
.cv_skip |
How many observations to skip. See timetk::time_series_cv() |
.cv_slice_limit |
How many slices to return. See timetk::time_series_cv() |
.best_metric |
Default is "rmse". See modeltime::default_forecast_accuracy_metric_set() |
.bootstrap_final |
Not yet implemented. |
Details
This uses parsnip::svm_poly() and sets the parsnip engine to kernlab.
Value
A list
Author(s)
Steven P. Sanderson II, MPH
See Also
https://parsnip.tidymodels.org/reference/svm_poly.html
Other Boiler_Plate: ts_auto_arima(), ts_auto_arima_xgboost(), ts_auto_croston(), ts_auto_exp_smoothing(), ts_auto_glmnet(), ts_auto_lm(), ts_auto_mars(), ts_auto_nnetar(), ts_auto_prophet_boost(), ts_auto_prophet_reg(), ts_auto_smooth_es(), ts_auto_svm_rbf(), ts_auto_theta(), ts_auto_xgboost()
Other SVM: ts_auto_svm_rbf()
Examples
library(dplyr)
library(timetk)
library(modeltime)
data <- AirPassengers %>%
ts_to_tbl() %>%
select(-index)
splits <- time_series_split(
data
, date_col
, assess = 12
, skip = 3
, cumulative = TRUE
)
ts_auto_poly <- ts_auto_svm_poly(
.data = data,
.num_cores = 2,
.date_col = date_col,
.value_col = value,
.rsamp_obj = splits,
.formula = value ~ .,
.grid_size = 3,
.tune = FALSE
)
ts_auto_poly$recipe_info
Boilerplate Workflow
Description
This is a boilerplate function to automatically create the following:
recipe
model specification
workflow
tuned model (grid etc.)
calibration tibble and plot
Usage
ts_auto_svm_rbf(
.data,
.date_col,
.value_col,
.formula,
.rsamp_obj,
.prefix = "ts_svm_rbf",
.tune = TRUE,
.grid_size = 10,
.num_cores = 1,
.cv_assess = 12,
.cv_skip = 3,
.cv_slice_limit = 6,
.best_metric = "rmse",
.bootstrap_final = FALSE
)
Arguments
.data |
The data being passed to the function. The time-series object. |
.date_col |
The column that holds the datetime. |
.value_col |
The column that has the value |
.formula |
The formula that is passed to the recipe like value ~ . |
.rsamp_obj |
The rsample splits object |
.prefix |
Default is "ts_svm_rbf" |
.tune |
Defaults to TRUE; this creates a tuning grid and tuned model. |
.grid_size |
If .tune is TRUE, then .grid_size is the size of the tuning grid. |
.num_cores |
How many cores do you want to use. Default is 1. |
.cv_assess |
How many observations for assess. See timetk::time_series_cv() |
.cv_skip |
How many observations to skip. See timetk::time_series_cv() |
.cv_slice_limit |
How many slices to return. See timetk::time_series_cv() |
.best_metric |
Default is "rmse". See modeltime::default_forecast_accuracy_metric_set() |
.bootstrap_final |
Not yet implemented. |
Details
This uses parsnip::svm_rbf() and sets the parsnip engine to kernlab.
Value
A list
Author(s)
Steven P. Sanderson II, MPH
See Also
https://parsnip.tidymodels.org/reference/svm_rbf.html
Other Boiler_Plate: ts_auto_arima(), ts_auto_arima_xgboost(), ts_auto_croston(), ts_auto_exp_smoothing(), ts_auto_glmnet(), ts_auto_lm(), ts_auto_mars(), ts_auto_nnetar(), ts_auto_prophet_boost(), ts_auto_prophet_reg(), ts_auto_smooth_es(), ts_auto_svm_poly(), ts_auto_theta(), ts_auto_xgboost()
Other SVM: ts_auto_svm_poly()
Examples
library(dplyr)
library(timetk)
library(modeltime)
data <- AirPassengers %>%
ts_to_tbl() %>%
select(-index)
splits <- time_series_split(
data
, date_col
, assess = 12
, skip = 3
, cumulative = TRUE
)
ts_auto_rbf <- ts_auto_svm_rbf(
.data = data,
.num_cores = 2,
.date_col = date_col,
.value_col = value,
.rsamp_obj = splits,
.formula = value ~ .,
.grid_size = 3,
.tune = FALSE
)
ts_auto_rbf$recipe_info
Boilerplate Workflow
Description
This is a boilerplate function to automatically create the following:
recipe
model specification
workflow
calibration tibble and plot
Usage
ts_auto_theta(
.data,
.date_col,
.value_col,
.rsamp_obj,
.prefix = "ts_theta",
.bootstrap_final = FALSE
)
Arguments
.data |
The data being passed to the function. The time-series object. |
.date_col |
The column that holds the datetime. |
.value_col |
The column that has the value |
.rsamp_obj |
The splits object |
.prefix |
Default is "ts_theta" |
.bootstrap_final |
Not yet implemented. |
Details
This uses forecast::thetaf() for the parsnip engine. This model does not use exogenous regressors, so only a univariate model of value ~ date will be used from the .date_col and .value_col that you provide.
Value
A list
Author(s)
Steven P. Sanderson II, MPH
See Also
https://business-science.github.io/modeltime/reference/exp_smoothing.html#engine-details
https://pkg.robjhyndman.com/forecast/reference/thetaf.html
Other Boiler_Plate: ts_auto_arima(), ts_auto_arima_xgboost(), ts_auto_croston(), ts_auto_exp_smoothing(), ts_auto_glmnet(), ts_auto_lm(), ts_auto_mars(), ts_auto_nnetar(), ts_auto_prophet_boost(), ts_auto_prophet_reg(), ts_auto_smooth_es(), ts_auto_svm_poly(), ts_auto_svm_rbf(), ts_auto_xgboost()
Other exp_smoothing: ts_auto_croston(), ts_auto_exp_smoothing(), ts_auto_smooth_es()
Examples
library(dplyr)
library(timetk)
library(modeltime)
data <- AirPassengers %>%
ts_to_tbl() %>%
select(-index)
splits <- time_series_split(
data
, date_col
, assess = 12
, skip = 3
, cumulative = TRUE
)
ts_theta <- ts_auto_theta(
.data = data,
.date_col = date_col,
.value_col = value,
.rsamp_obj = splits
)
ts_theta$recipe_info
Boilerplate Workflow
Description
This is a boilerplate function to automatically create the following:
recipe
model specification
workflow
tuned model (grid etc.)
calibration tibble and plot
Usage
ts_auto_xgboost(
.data,
.date_col,
.value_col,
.formula,
.rsamp_obj,
.prefix = "ts_xgboost",
.tune = TRUE,
.grid_size = 10,
.num_cores = 1,
.cv_assess = 12,
.cv_skip = 3,
.cv_slice_limit = 6,
.best_metric = "rmse",
.bootstrap_final = FALSE
)
Arguments
.data |
The data being passed to the function. The time-series object. |
.date_col |
The column that holds the datetime. |
.value_col |
The column that has the value |
.formula |
The formula that is passed to the recipe like value ~ . |
.rsamp_obj |
The rsample splits object |
.prefix |
Default is "ts_xgboost" |
.tune |
Defaults to TRUE; this creates a tuning grid and tuned model. |
.grid_size |
If .tune is TRUE, then .grid_size is the size of the tuning grid. |
.num_cores |
How many cores do you want to use. Default is 1. |
.cv_assess |
How many observations for assess. See timetk::time_series_cv() |
.cv_skip |
How many observations to skip. See timetk::time_series_cv() |
.cv_slice_limit |
How many slices to return. See timetk::time_series_cv() |
.best_metric |
Default is "rmse". See modeltime::default_forecast_accuracy_metric_set() |
.bootstrap_final |
Not yet implemented. |
Details
This uses parsnip::boost_tree() with the engine set to xgboost.
Value
A list
Author(s)
Steven P. Sanderson II, MPH
See Also
Other Boiler_Plate: ts_auto_arima(), ts_auto_arima_xgboost(), ts_auto_croston(), ts_auto_exp_smoothing(), ts_auto_glmnet(), ts_auto_lm(), ts_auto_mars(), ts_auto_nnetar(), ts_auto_prophet_boost(), ts_auto_prophet_reg(), ts_auto_smooth_es(), ts_auto_svm_poly(), ts_auto_svm_rbf(), ts_auto_theta()
Examples
library(dplyr)
library(timetk)
library(modeltime)
data <- AirPassengers %>%
ts_to_tbl() %>%
select(-index)
splits <- time_series_split(
data
, date_col
, assess = 12
, skip = 3
, cumulative = TRUE
)
ts_xgboost <- ts_auto_xgboost(
.data = data,
.num_cores = 2,
.date_col = date_col,
.value_col = value,
.rsamp_obj = splits,
.formula = value ~ .,
.grid_size = 5,
.tune = FALSE
)
ts_xgboost$recipe_info
Brownian Motion
Description
Create a Brownian Motion Tibble
Usage
ts_brownian_motion(
.time = 100,
.num_sims = 10,
.delta_time = 1,
.initial_value = 0,
.return_tibble = TRUE
)
Arguments
.time |
Total time of the simulation. |
.num_sims |
Total number of simulations. |
.delta_time |
Time step size. |
.initial_value |
Integer representing the initial value. |
.return_tibble |
The default is TRUE. If set to FALSE then an object of class matrix will be returned. |
Details
Brownian Motion, also known as the Wiener process, is a continuous-time random process that describes the random movement of particles suspended in a fluid. It is named after the botanist Robert Brown, who first described the phenomenon in 1827.
The equation for Brownian Motion can be represented as:
W(t) = W(0) + sqrt(t) * Z
Where W(t) is the Brownian motion at time t, W(0) is the initial value of the Brownian motion, sqrt(t) is the square root of time, and Z is a standard normal random variable.
Brownian Motion has numerous applications, including modeling stock prices in financial markets, modeling particle movement in fluids, and modeling random walk processes in general. It is a useful tool in probability theory and statistical analysis.
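The W(t) equation above can be simulated directly. The sketch below (a minimal illustration, not the internals of ts_brownian_motion()) builds a single path by cumulating normal increments scaled by the square root of the time step:

```r
set.seed(123)

t_steps    <- 100  # total time of the simulation
delta_time <- 1    # time step size
initial    <- 0    # W(0), the initial value

# Each increment is sqrt(delta_t) * Z with Z ~ N(0, 1); the
# cumulative sum of increments gives the Brownian path W(t).
increments <- sqrt(delta_time) * rnorm(t_steps)
w <- cumsum(c(initial, increments))

plot(w, type = "l", main = "Simulated Brownian Motion path")
```

Running several such paths side by side is exactly what the .num_sims argument automates.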
Value
A tibble/matrix
Author(s)
Steven P. Sanderson II, MPH
See Also
Other Data Generator: tidy_fft(), ts_brownian_motion_augment(), ts_geometric_brownian_motion(), ts_geometric_brownian_motion_augment(), ts_random_walk()
Examples
ts_brownian_motion()
Brownian Motion
Description
Create a Brownian Motion Tibble
Usage
ts_brownian_motion_augment(
.data,
.date_col,
.value_col,
.time = 100,
.num_sims = 10,
.delta_time = NULL
)
Arguments
.data |
The data.frame/tibble being augmented. |
.date_col |
The column that holds the date. |
.value_col |
The value that is going to get augmented. The last value of this column becomes the initial value internally. |
.time |
How many time steps ahead. |
.num_sims |
How many simulations should be run. |
.delta_time |
Time step size. |
Details
Brownian Motion, also known as the Wiener process, is a continuous-time random process that describes the random movement of particles suspended in a fluid. It is named after the botanist Robert Brown, who first described the phenomenon in 1827.
The equation for Brownian Motion can be represented as:
W(t) = W(0) + sqrt(t) * Z
Where W(t) is the Brownian motion at time t, W(0) is the initial value of the Brownian motion, sqrt(t) is the square root of time, and Z is a standard normal random variable.
Brownian Motion has numerous applications, including modeling stock prices in financial markets, modeling particle movement in fluids, and modeling random walk processes in general. It is a useful tool in probability theory and statistical analysis.
Value
A tibble/matrix
Author(s)
Steven P. Sanderson II, MPH
See Also
Other Data Generator: tidy_fft(), ts_brownian_motion(), ts_geometric_brownian_motion(), ts_geometric_brownian_motion_augment(), ts_random_walk()
Examples
rn <- rnorm(31)
df <- data.frame(
date_col = seq.Date(from = as.Date("2022-01-01"),
to = as.Date("2022-01-31"),
by = "day"),
value = rn
)
ts_brownian_motion_augment(
.data = df,
.date_col = date_col,
.value_col = value
)
Auto-Plot a Geometric/Brownian Motion Augment
Description
Plot an augmented Geometric/Brownian Motion.
Usage
ts_brownian_motion_plot(.data, .date_col, .value_col, .interactive = FALSE)
Arguments
.data |
The data you are going to pass to the function to augment. |
.date_col |
The column that holds the date |
.value_col |
The column that holds the value |
.interactive |
The default is FALSE, TRUE will produce an interactive plotly plot. |
Details
This function will take output from either the ts_brownian_motion_augment() or the ts_geometric_brownian_motion_augment() function and plot it. The legend is set to "none" if the simulation count is higher than 9.
Value
A ggplot2 object or an interactive plotly plot
Author(s)
Steven P. Sanderson II, MPH
See Also
Other Plot: ts_event_analysis_plot(), ts_qq_plot(), ts_scedacity_scatter_plot()
Examples
library(dplyr)
df <- ts_to_tbl(AirPassengers) %>% select(-index)
augmented_data <- df %>%
ts_brownian_motion_augment(
.date_col = date_col,
.value_col = value,
.time = 144
)
augmented_data %>%
ts_brownian_motion_plot(.date_col = date_col, .value_col = value)
Time Series Calendar Heatmap
Description
Takes in data that has been aggregated to the day level and makes a calendar heatmap.
Usage
ts_calendar_heatmap_plot(
.data,
.date_col,
.value_col,
.low = "red",
.high = "green",
.plt_title = "",
.interactive = TRUE
)
Arguments
.data |
The time-series data with a date column and value column. |
.date_col |
The column that has the datetime values |
.value_col |
The column that has the values |
.low |
The color for the low value, must be quoted like "red". The default is "red" |
.high |
The color for the high value, must be quoted like "green". The default is "green" |
.plt_title |
The title of the plot |
.interactive |
Default is TRUE to get an interactive plot using |
Details
The data provided must have been aggregated to the day level; if it is not, the output may be malformed or the function may produce nothing but errors. There must be a date column and a value column; those are the only items required for this function to work.
This function is intentionally inflexible: it complains more and does less in order to force the user to supply a clean data set.
Value
A ggplot2 plot or, if interactive, a plotly plot
Author(s)
Steven P. Sanderson II, MPH
Examples
data_tbl <- data.frame(
date_col = seq.Date(
from = as.Date("2020-01-01"),
to = as.Date("2022-06-01"),
length.out = 365*2 + 180
),
value = rnorm(365*2+180, mean = 100)
)
ts_calendar_heatmap_plot(
.data = data_tbl
, .date_col = date_col
, .value_col = value
, .interactive = FALSE
)
Compare data over time periods
Description
Given a tibble/data.frame, you can get data from two different but comparable date ranges. Let's say you want to compare visits in one year to visits from two years before, without also seeing the intervening year. You can do that with this function.
Usage
ts_compare_data(.data, .date_col, .start_date, .end_date, .periods_back)
Arguments
.data |
The date.frame/tibble that holds the data |
.date_col |
The column with the date value |
.start_date |
The start of the period you want to analyze |
.end_date |
The end of the period you want to analyze |
.periods_back |
How long ago do you want to compare data to. Time units are collapsed using lubridate::floor_date(). Arbitrary unique English abbreviations as in the lubridate::period() constructor are allowed. |
Details
Uses the
timetk::filter_by_time()
function in order to filter the date column.Uses the
timetk::subtract_time()
function to subtract time from the start date.
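The shifted window can be illustrated with base R date arithmetic. This is a hypothetical sketch of the "periods back" idea, not the package internals; ts_compare_data() does this with the timetk helpers named above.

```r
# Shift the analysis window back by the requested period, then filter on
# the shifted window. seq.Date() accepts a negative multiple in `by`.
start_date <- as.Date("1955-01-01")
end_date   <- as.Date("1955-12-31")
shifted_start <- seq(start_date, by = "-2 years", length.out = 2)[2]
shifted_end   <- seq(end_date,   by = "-2 years", length.out = 2)[2]
c(shifted_start, shifted_end)
#> [1] "1953-01-01" "1953-12-31"
```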
Value
A tibble.
Author(s)
Steven P. Sanderson II, MPH
See Also
Other Time_Filtering:
ts_time_event_analysis_tbl()
Examples
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(timetk))
data_tbl <- ts_to_tbl(AirPassengers) %>%
select(-index)
ts_compare_data(
.data = data_tbl
, .date_col = date_col
, .start_date = "1955-01-01"
, .end_date = "1955-12-31"
, .periods_back = "2 years"
) %>%
summarise_by_time(
.date_var = date_col
, .by = "year"
, visits = sum(value)
)
Time Series Event Analysis Plot
Description
Plot out the data from the ts_time_event_analysis_tbl()
function.
Usage
ts_event_analysis_plot(
.data,
.plot_type = "mean",
.plot_ci = TRUE,
.interactive = FALSE
)
Arguments
.data |
The data that comes from the |
.plot_type |
The default is "mean" which will show the mean event change of the output from the analysis tibble. The possible values for this are: mean, median, and individual. |
.plot_ci |
The default is TRUE. This will only work if you choose one of the aggregate plots of either "mean" or "median" |
.interactive |
The default is FALSE. TRUE will return a plotly plot. |
Details
This function will take in data strictly from the ts_time_event_analysis_tbl()
and plot out the data. You can choose what type of plot you want in the parameter
of .plot_type
. This will give you a choice of "mean", "median", and "individual".
You can also plot the upper and lower confidence intervals if you choose one of the aggregate plots ("mean"/"median").
Value
A ggplot2 object
Author(s)
Steven P. Sanderson II, MPH
See Also
Other Plot:
ts_brownian_motion_plot()
,
ts_qq_plot()
,
ts_scedacity_scatter_plot()
Examples
library(dplyr)
df <- ts_to_tbl(AirPassengers) %>% select(-index)
ts_time_event_analysis_tbl(
.data = df,
.horizon = 6,
.date_col = date_col,
.value_col = value,
.direction = "both"
) %>%
ts_event_analysis_plot()
ts_time_event_analysis_tbl(
.data = df,
.horizon = 6,
.date_col = date_col,
.value_col = value,
.direction = "both"
) %>%
ts_event_analysis_plot(.plot_type = "individual")
Extract Boilerplate Items
Description
Extract the fitted workflow from a ts_auto_
function.
Usage
ts_extract_auto_fitted_workflow(.input)
Arguments
.input |
This is the output list object of a |
Details
Extract the fitted workflow from a ts_auto_
function. This will
only work on those functions that are designated as Boilerplate.
Value
A fitted workflow
object.
Author(s)
Steven P. Sanderson II, MPH
Examples
## Not run:
library(dplyr)
data <- AirPassengers %>%
ts_to_tbl() %>%
select(-index)
splits <- time_series_split(
data
, date_col
, assess = 12
, skip = 3
, cumulative = TRUE
)
ts_lm <- ts_auto_lm(
.data = data,
.date_col = date_col,
.value_col = value,
.rsamp_obj = splits,
.formula = value ~ .
)
ts_extract_auto_fitted_workflow(ts_lm)
## End(Not run)
Time Series Feature Clustering
Description
This function returns an output list of data and plots that
come from using the K-Means
clustering algorithm on time series data.
Usage
ts_feature_cluster(
.data,
.date_col,
.value_col,
...,
.features = c("frequency", "entropy", "acf_features"),
.scale = TRUE,
.prefix = "ts_",
.centers = 3
)
Arguments
.data |
The data passed must be a |
.date_col |
The date column. |
.value_col |
The column that holds the value of the time series where you want the features and clustering performed on. |
... |
This is where you can place grouping variables that are passed off
to |
.features |
This is a quoted string vector using c() of features that you
would like to pass. You can pass any feature you make or those from the |
.scale |
If TRUE, time series are scaled to mean 0 and sd 1 before features are computed |
.prefix |
A prefix to prefix the feature columns. Default: "ts_" |
.centers |
An integer of how many different centers you would like to generate. The default is 3. |
Details
This function will return a list object output. The function itself
requires that a time series tibble/data.frame get passed to it, along with
the .date_col
, the .value_col
and a period of data. It uses the underlying
function timetk::tk_tsfeatures()
and takes the output of that and performs
a clustering analysis using the K-Means
algorithm.
The function has a parameter of .features
which can take any of the features
listed in the tsfeatures
package by Rob Hyndman. You can also create custom
functions in the .GlobalEnv
and it will take them as quoted arguments.
So you can make a function as follows
my_mean <- function(x){return(mean(x, na.rm = TRUE))}
You can then call this by using .features = c("my_mean")
.
The output of this function includes the following:
Data Section
ts_feature_tbl
user_item_matrix_tbl
mapped_tbl
scree_data_tbl
input_data_tbl (the original data)
Plots
static_plot
plotly_plot
Value
A list output
Author(s)
Steven P. Sanderson II, MPH
See Also
https://pkg.robjhyndman.com/tsfeatures/index.html
Other Clustering:
ts_feature_cluster_plot()
Examples
library(dplyr)
data_tbl <- ts_to_tbl(AirPassengers) %>%
mutate(group_id = rep(1:12, 12))
ts_feature_cluster(
.data = data_tbl,
.date_col = date_col,
.value_col = value,
group_id,
.features = c("acf_features","entropy"),
.scale = TRUE,
.prefix = "ts_",
.centers = 3
)
Time Series Feature Clustering
Description
This function returns an output list of data and plots that
come from using the K-Means
clustering algorithm on time series data.
Usage
ts_feature_cluster_plot(
.data,
.date_col,
.value_col,
...,
.center = 3,
.facet_ncol = 3,
.smooth = FALSE
)
Arguments
.data |
The data passed must be the output of the |
.date_col |
The date column. |
.value_col |
The column that holds the value of the time series that the features were built from. |
... |
This is where you can place grouping variables that are passed off
to |
.center |
An integer of the chosen amount of centers from the |
.facet_ncol |
This is passed to the |
.smooth |
This is passed to the |
Details
This function will return a list object output. The function itself
requires that the output of ts_feature_cluster()
be passed to it, as it will look for
a specific attribute internally.
The output of this function includes the following:
Data Section
original_data
kmm_data_tbl
user_item_tbl
cluster_tbl
Plots
static_plot
plotly_plot
K-Means Object
k-means object
Value
A list output
Author(s)
Steven P. Sanderson II, MPH
See Also
Other Clustering:
ts_feature_cluster()
Examples
library(dplyr)
data_tbl <- ts_to_tbl(AirPassengers) %>%
mutate(group_id = rep(1:12, 12))
output <- ts_feature_cluster(
.data = data_tbl,
.date_col = date_col,
.value_col = value,
group_id,
.features = c("acf_features","entropy"),
.scale = TRUE,
.prefix = "ts_",
.centers = 3
)
ts_feature_cluster_plot(
.data = output,
.date_col = date_col,
.value_col = value,
.center = 2,
group_id
)
Time-series Forecasting Simulator
Description
Creating different forecast paths for forecast objects (when applicable),
by utilizing the underlying model distribution with the simulate
function.
Usage
ts_forecast_simulator(
.model,
.data,
.ext_reg = NULL,
.frequency = NULL,
.bootstrap = TRUE,
.horizon = 4,
.iterations = 25,
.sim_color = "steelblue",
.alpha = 0.05
)
Arguments
.model |
A forecasting model of one of the following from the
|
.data |
The data that is used for the |
.ext_reg |
A |
.frequency |
This is for the conversion of an internal table and should match the time frequency of the data. |
.bootstrap |
A boolean value of TRUE/FALSE. From |
.horizon |
An integer defining the forecast horizon. |
.iterations |
An integer, set the number of iterations of the simulation. |
.sim_color |
Set the color of the simulation paths lines. |
.alpha |
Set the opacity level of the simulation path lines. |
Details
This function expects to take in a model of either Arima
,
auto.arima
, ets
or nnetar
from the forecast
package. You can supply a
forecasting horizon, iterations and a few other items. You may also specify
an Arima() model using xregs.
Value
The original time series, the simulated values and some plots
Author(s)
Steven P. Sanderson II, MPH
See Also
Other Simulator:
ts_arima_simulator()
Examples
suppressPackageStartupMessages(library(forecast))
suppressPackageStartupMessages(library(dplyr))
# Create a model
fit <- auto.arima(AirPassengers)
data_tbl <- ts_to_tbl(AirPassengers)
# Simulate 50 possible forecast paths, with .horizon of 12 months
output <- ts_forecast_simulator(
.model = fit
, .horizon = 12
, .iterations = 50
, .data = data_tbl
)
output$ggplot
Geometric Brownian Motion
Description
Create a Geometric Brownian Motion.
Usage
ts_geometric_brownian_motion(
.num_sims = 100,
.time = 25,
.mean = 0,
.sigma = 0.1,
.initial_value = 100,
.delta_time = 1/365,
.return_tibble = TRUE
)
Arguments
.num_sims |
Total number of simulations. |
.time |
Total time of the simulation. |
.mean |
Expected return |
.sigma |
Volatility |
.initial_value |
Integer representing the initial value. |
.delta_time |
Time step size. |
.return_tibble |
The default is TRUE. If set to FALSE then an object of class matrix will be returned. |
Details
Geometric Brownian Motion (GBM) is a statistical method for modeling the evolution of a given financial asset over time. It is a type of stochastic process, which means that it is a system that undergoes random changes over time.
GBM is widely used in the field of finance to model the behavior of stock prices, foreign exchange rates, and other financial assets. It is based on the assumption that the asset's price follows a random walk, meaning that it is influenced by a number of unpredictable factors such as market trends, news events, and investor sentiment.
The equation for GBM is:
dS/S = mdt + sdW
where S is the price of the asset, t is time, m is the expected return on the asset, s is the volatility of the asset, and dW is a small random change in the asset's price.
GBM can be used to estimate the likelihood of different outcomes for a given asset, and it is often used in conjunction with other statistical methods to make more accurate predictions about the future performance of an asset.
This function provides the ability to simulate and estimate the parameters of a GBM process. It can be used to analyze the behavior of financial assets and to make informed investment decisions.
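The equation above can be sketched with a short discretized simulation. This is only an illustration of the GBM update under the stated parameters, not the internal implementation of ts_geometric_brownian_motion(); the parameter-to-argument mapping in the comments is an assumption based on the argument descriptions.

```r
# Discretized GBM: each step multiplies the price by a log-normal factor,
# which is the exact solution of dS/S = m*dt + s*dW over one step.
set.seed(123)
m  <- 0        # expected return (.mean)
s  <- 0.1      # volatility (.sigma)
dt <- 1 / 365  # time step size (.delta_time)
n  <- 25       # number of steps (.time)
S  <- numeric(n + 1)
S[1] <- 100    # initial value (.initial_value)
for (i in seq_len(n)) {
  dW <- rnorm(1, mean = 0, sd = sqrt(dt))              # Brownian increment
  S[i + 1] <- S[i] * exp((m - s^2 / 2) * dt + s * dW)  # log-normal step
}
head(S)
```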
Value
A tibble/matrix
Author(s)
Steven P. Sanderson II, MPH
See Also
Other Data Generator:
tidy_fft()
,
ts_brownian_motion()
,
ts_brownian_motion_augment()
,
ts_geometric_brownian_motion_augment()
,
ts_random_walk()
Examples
ts_geometric_brownian_motion()
Geometric Brownian Motion
Description
Create a Geometric Brownian Motion.
Usage
ts_geometric_brownian_motion_augment(
.data,
.date_col,
.value_col,
.num_sims = 10,
.time = 25,
.mean = 0,
.sigma = 0.1,
.delta_time = 1/365
)
Arguments
.data |
The data you are going to pass to the function to augment. |
.date_col |
The column that holds the date |
.value_col |
The column that holds the value |
.num_sims |
Total number of simulations. |
.time |
Total time of the simulation. |
.mean |
Expected return |
.sigma |
Volatility |
.delta_time |
Time step size. |
Details
Geometric Brownian Motion (GBM) is a statistical method for modeling the evolution of a given financial asset over time. It is a type of stochastic process, which means that it is a system that undergoes random changes over time.
GBM is widely used in the field of finance to model the behavior of stock prices, foreign exchange rates, and other financial assets. It is based on the assumption that the asset's price follows a random walk, meaning that it is influenced by a number of unpredictable factors such as market trends, news events, and investor sentiment.
The equation for GBM is:
dS/S = mdt + sdW
where S is the price of the asset, t is time, m is the expected return on the asset, s is the volatility of the asset, and dW is a small random change in the asset's price.
GBM can be used to estimate the likelihood of different outcomes for a given asset, and it is often used in conjunction with other statistical methods to make more accurate predictions about the future performance of an asset.
This function provides the ability to simulate and estimate the parameters of a GBM process. It can be used to analyze the behavior of financial assets and to make informed investment decisions.
Value
A tibble/matrix
Author(s)
Steven P. Sanderson II, MPH
See Also
Other Data Generator:
tidy_fft()
,
ts_brownian_motion()
,
ts_brownian_motion_augment()
,
ts_geometric_brownian_motion()
,
ts_random_walk()
Examples
rn <- rnorm(31)
df <- data.frame(
date_col = seq.Date(from = as.Date("2022-01-01"),
to = as.Date("2022-01-31"),
by = "day"),
value = rn
)
ts_geometric_brownian_motion_augment(
.data = df,
.date_col = date_col,
.value_col = value
)
Get date or datetime variables (column names)
Description
Get date or datetime variables (column names)
Usage
ts_get_date_columns(.data)
Arguments
.data |
An object of class |
Details
ts_get_date_columns
returns the column names of date or datetime variables
in a data frame.
Value
A vector containing the column names that are of date/date-like classes.
Author(s)
Steven P. Sanderson II, MPH
See Also
Other Utility:
auto_stationarize()
,
calibrate_and_plot()
,
internal_ts_backward_event_tbl()
,
internal_ts_both_event_tbl()
,
internal_ts_forward_event_tbl()
,
model_extraction_helper()
,
ts_info_tbl()
,
ts_is_date_class()
,
ts_lag_correlation()
,
ts_model_auto_tune()
,
ts_model_compare()
,
ts_model_rank_tbl()
,
ts_model_spec_tune_template()
,
ts_qq_plot()
,
ts_scedacity_scatter_plot()
,
ts_to_tbl()
,
util_difflog_ts()
,
util_doublediff_ts()
,
util_doubledifflog_ts()
,
util_log_ts()
,
util_singlediff_ts()
Examples
ts_to_tbl(AirPassengers) %>%
ts_get_date_columns()
Augment Data with Time Series Growth Rates
Description
This function is used to augment a data frame or tibble with time series
growth rates of selected columns. You can provide a data frame or tibble as
the first argument, the column(s) for which you want to calculate the growth
rates using the .value
parameter, and optionally specify custom names for
the new columns using the .names
parameter.
Usage
ts_growth_rate_augment(.data, .value, .names = "auto")
Arguments
.data |
A data frame or tibble containing the data to be augmented. |
.value |
A quosure specifying the column(s) for which you want to calculate growth rates. |
.names |
Optional. A character vector specifying the names of the new columns to be created. Use "auto" for automatic naming. |
Value
A tibble that includes the original data and additional columns representing
the growth rates of the selected columns. The column names are either
automatically generated or as specified in the .names
parameter.
Author(s)
Steven P. Sanderson II, MPH
See Also
Other Augment Function:
ts_acceleration_augment()
,
ts_velocity_augment()
Examples
data <- data.frame(
Year = 1:5,
Income = c(100, 120, 150, 180, 200),
Expenses = c(50, 60, 75, 90, 100)
)
ts_growth_rate_augment(data, .value = c(Income, Expenses))
Vector Function Time Series Growth Rate
Description
This function computes the growth rate of a numeric vector, typically representing a time series, with optional transformations like scaling, power, and lag differences.
Usage
ts_growth_rate_vec(.x, .scale = 100, .power = 1, .log_diff = FALSE, .lags = 1)
Arguments
.x |
A numeric vector |
.scale |
A numeric value that is used to scale the output |
.power |
A numeric value that is used to raise the output to a power |
.log_diff |
A logical value that determines whether the output is a log difference |
.lags |
An integer that determines the number of lags to use |
Details
The function calculates growth rates for a time series, allowing for scaling, exponentiation, and lag differences. It can be useful for financial data analysis, among other applications.
The growth rate is computed as follows:
If lags is positive and log_diff is FALSE: growth_rate = (((x / lag(x, lags))^power) - 1) * scale
If lags is positive and log_diff is TRUE: growth_rate = log(x / lag(x, lags)) * scale
If lags is negative and log_diff is FALSE: growth_rate = (((x / lead(x, -lags))^power) - 1) * scale
If lags is negative and log_diff is TRUE: growth_rate = log(x / lead(x, -lags)) * scale
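The first formula above (positive lags, no log difference) can be sketched in base R. The helper below is hypothetical, not an exported function, and exists only to make the formula concrete.

```r
# Sketch of: growth_rate = (((x / lag(x, lags))^power) - 1) * scale
growth_rate_sketch <- function(x, scale = 100, power = 1, lags = 1) {
  lagged <- c(rep(NA_real_, lags), head(x, -lags))  # base-R lag
  (((x / lagged)^power) - 1) * scale
}
# e.g. 110/100 - 1 = 0.10, scaled to a 10 percent growth rate
growth_rate_sketch(c(100, 110, 120, 130))
```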
Value
A numeric vector of growth rates.
Author(s)
Steven P. Sanderson II, MPH
See Also
Other Vector Function:
ts_acceleration_vec()
,
ts_velocity_vec()
Examples
# Calculate the growth rate of a time series without any transformations.
ts_growth_rate_vec(c(100, 110, 120, 130))
# Calculate the growth rate with scaling and a power transformation.
ts_growth_rate_vec(c(100, 110, 120, 130), .scale = 10, .power = 2)
# Calculate the log differences of a time series with lags.
ts_growth_rate_vec(c(100, 110, 120, 130), .log_diff = TRUE, .lags = -1)
# Plot
plot.ts(AirPassengers)
plot.ts(ts_growth_rate_vec(AirPassengers))
Get Time Series Information
Description
This function will take in a data set and return to you a tibble of useful information.
Usage
ts_info_tbl(.data, .date_col)
Arguments
.data |
The data you are passing to the function |
.date_col |
This is only needed if you are passing a tibble. |
Details
This function can accept objects of the following classes:
ts
xts
mts
zoo
tibble/data.frame
The function will return the following pieces of information in a tibble:
name
class
frequency
start
end
var
length
Value
A tibble
Author(s)
Steven P. Sanderson II, MPH
See Also
Other Utility:
auto_stationarize()
,
calibrate_and_plot()
,
internal_ts_backward_event_tbl()
,
internal_ts_both_event_tbl()
,
internal_ts_forward_event_tbl()
,
model_extraction_helper()
,
ts_get_date_columns()
,
ts_is_date_class()
,
ts_lag_correlation()
,
ts_model_auto_tune()
,
ts_model_compare()
,
ts_model_rank_tbl()
,
ts_model_spec_tune_template()
,
ts_qq_plot()
,
ts_scedacity_scatter_plot()
,
ts_to_tbl()
,
util_difflog_ts()
,
util_doublediff_ts()
,
util_doubledifflog_ts()
,
util_log_ts()
,
util_singlediff_ts()
Examples
ts_info_tbl(AirPassengers)
ts_info_tbl(BJsales)
Check if an object is a date class
Description
Check if an object is a date class
Usage
ts_is_date_class(.x)
Arguments
.x |
A vector to check |
Value
Logical (TRUE/FALSE)
See Also
Other Utility:
auto_stationarize()
,
calibrate_and_plot()
,
internal_ts_backward_event_tbl()
,
internal_ts_both_event_tbl()
,
internal_ts_forward_event_tbl()
,
model_extraction_helper()
,
ts_get_date_columns()
,
ts_info_tbl()
,
ts_lag_correlation()
,
ts_model_auto_tune()
,
ts_model_compare()
,
ts_model_rank_tbl()
,
ts_model_spec_tune_template()
,
ts_qq_plot()
,
ts_scedacity_scatter_plot()
,
ts_to_tbl()
,
util_difflog_ts()
,
util_doublediff_ts()
,
util_doubledifflog_ts()
,
util_log_ts()
,
util_singlediff_ts()
Examples
seq.Date(from = as.Date("2022-01-01"), by = "day", length.out = 10) %>%
ts_is_date_class()
letters %>% ts_is_date_class()
Time Series Lag Correlation Analysis
Description
This function outputs a list object of both data and plots.
The data output are the following:
lag_list
lag_tbl
correlation_lag_matrix
correlation_lag_tbl
The plots output are the following:
lag_plot
plotly_lag_plot
correlation_heatmap
plotly_heatmap
Usage
ts_lag_correlation(
.data,
.date_col,
.value_col,
.lags = 1,
.heatmap_color_low = "white",
.heatmap_color_hi = "steelblue"
)
Arguments
.data |
A tibble of time series data |
.date_col |
A date column |
.value_col |
The value column being analyzed |
.lags |
This is a vector of integer lags, ie 1 or c(1,6,12) |
.heatmap_color_low |
What color should the low values of the heatmap of the correlation matrix be, the default is 'white' |
.heatmap_color_hi |
What color should the high values of the heatmap of the correlation matrix be, the default is 'steelblue' |
Details
This function takes in time series data in the form of a tibble and outputs a list object of data and plots. It takes an argument of '.lags', computes those lags of your data, and outputs a correlation matrix, a heatmap, and a lag plot of the input data, among other things.
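The core idea can be sketched in base R: build lagged copies of the series and correlate them with the original. This is only an illustration of the lag-correlation concept, not the internals of ts_lag_correlation().

```r
# Correlate AirPassengers with lagged copies of itself at a few lags.
x <- as.numeric(AirPassengers)
lags <- c(1, 3, 6, 12)
lag_mat <- sapply(lags, function(k) c(rep(NA_real_, k), head(x, -k)))
colnames(lag_mat) <- paste0("lag_", lags)
cor_mat <- cor(cbind(original = x, lag_mat), use = "pairwise.complete.obs")
round(cor_mat, 3)
```

The seasonal structure of the series shows up as a high correlation at lag 12.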
Value
A list object
Author(s)
Steven P. Sanderson II, MPH
See Also
Other Utility:
auto_stationarize()
,
calibrate_and_plot()
,
internal_ts_backward_event_tbl()
,
internal_ts_both_event_tbl()
,
internal_ts_forward_event_tbl()
,
model_extraction_helper()
,
ts_get_date_columns()
,
ts_info_tbl()
,
ts_is_date_class()
,
ts_model_auto_tune()
,
ts_model_compare()
,
ts_model_rank_tbl()
,
ts_model_spec_tune_template()
,
ts_qq_plot()
,
ts_scedacity_scatter_plot()
,
ts_to_tbl()
,
util_difflog_ts()
,
util_doublediff_ts()
,
util_doubledifflog_ts()
,
util_log_ts()
,
util_singlediff_ts()
Examples
library(dplyr)
df <- ts_to_tbl(AirPassengers) %>% select(-index)
lags <- c(1,3,6,12)
output <- ts_lag_correlation(
.data = df,
.date_col = date_col,
.value_col = value,
.lags = lags
)
output$data$correlation_lag_matrix
output$plots$lag_plot
Time Series Moving Average Plot
Description
This function will produce two plots, both of which are moving average plots.
One of the plots comes from xts::plot.xts()
and the other is a ggplot2
plot. This
is done so that the user can choose which type is best for them. The plots are
stacked so each graph is on top of the other.
Usage
ts_ma_plot(
.data,
.date_col,
.value_col,
.ts_frequency = "monthly",
.main_title = NULL,
.secondary_title = NULL,
.tertiary_title = NULL
)
Arguments
.data |
The data you want to visualize. This should be pre-processed and
the aggregation should match the |
.date_col |
The date column from the |
.value_col |
The value column from the |
.ts_frequency |
The frequency of the aggregation, quoted, e.g. "monthly". Anything else will default to weekly, so it is very important that the data passed to this function be in either a weekly or monthly aggregation. |
.main_title |
The title of the main plot. |
.secondary_title |
The title of the second plot. |
.tertiary_title |
The title of the third plot. |
Details
This function expects to take in a data.frame/tibble. It will return a list object, so it is a good idea to save the output to a variable and extract from there.
Value
A few time series data sets and two plots.
Author(s)
Steven P. Sanderson II, MPH
Examples
suppressPackageStartupMessages(library(dplyr))
data_tbl <- ts_to_tbl(AirPassengers) %>%
select(-index)
output <- ts_ma_plot(
.data = data_tbl,
.date_col = date_col,
.value_col = value
)
output$pgrid
output$xts_plt
output$data_summary_tbl %>% head()
output <- ts_ma_plot(
.data = data_tbl,
.date_col = date_col,
.value_col = value,
.ts_frequency = "week"
)
output$pgrid
output$xts_plt
output$data_summary_tbl %>% head()
Time Series Model Tuner
Description
This function will create a tuned model. It uses the ts_model_spec_tune_template()
under the hood to get the generic template that is used in the grid search.
Usage
ts_model_auto_tune(
.modeltime_model_id,
.calibration_tbl,
.splits_obj,
.drop_training_na = TRUE,
.date_col,
.value_col,
.tscv_assess = "12 months",
.tscv_skip = "6 months",
.slice_limit = 6,
.facet_ncol = 2,
.grid_size = 30,
.num_cores = 1,
.best_metric = "rmse"
)
Arguments
.modeltime_model_id |
The .model_id from a calibrated modeltime table. |
.calibration_tbl |
A calibrated modeltime table. |
.splits_obj |
The time_series_split object. |
.drop_training_na |
A boolean that will drop NA values from the training(splits) data |
.date_col |
The column that holds the date values. |
.value_col |
The column that holds the time series values. |
.tscv_assess |
A character expression like "12 months". This gets passed to
|
.tscv_skip |
A character expression like "6 months". This gets passed to
|
.slice_limit |
An integer that gets passed to |
.facet_ncol |
The number of faceted columns to be passed to plot_time_series_cv_plan |
.grid_size |
An integer that gets passed to the |
.num_cores |
The default is 1, you can set this to any integer value as long as it is equal to or less than the available cores on your machine. |
.best_metric |
The default is "rmse" and this can be set to any default dials metric. This must be passed as a character. |
Details
This function can work with the following parsnip/modeltime engines:
"auto_arima"
"auto_arima_xgboost"
"ets"
"croston"
"theta"
"stlm_ets"
"tbats"
"stlm_arima"
"nnetar"
"prophet"
"prophet_xgboost"
"lm"
"glmnet"
"stan"
"spark"
"keras"
"earth"
"xgboost"
"kernlab"
This function returns a list object with several items inside of it. There are three categories of items that are inside of the list.
- data
- model_info
- plots
The data section has the following items:
- calibration_tbl: This is the calibration data passed into the function.
- calibration_tuned_tbl: This is a calibration tibble that has used the tuned workflow.
- tscv_data_tbl: This is the tibble of the time series cross validation.
- tuned_results: This is a tuning results tibble with all slices from the time series cross validation.
- best_tuned_results_tbl: This is a tibble of the parameters for the best test set with the chosen metric.
- tscv_obj: This is the actual time series cross validation object returned from timetk::time_series_cv()
The model_info section has the following items:
- model_spec: This is the original modeltime/parsnip model specification.
- model_spec_engine: This is the engine used for the model specification.
- model_spec_tuner: This is the tuning model template returned from ts_model_spec_tune_template()
- plucked_model: This is the model that we have plucked from the calibration tibble for tuning.
- wflw_tune_spec: This is a new workflow with the model_spec_tuner attached.
- grid_spec: This is the grid search specification for the tuning process.
- tuned_tscv_wflw_spec: This is the final tuned model where the workflow and model have been finalized. This would be the model that you would want to pull out if you are going to work with it further.
The plots section has the following items:
- tune_results_plt: This is a static ggplot of the grid search.
- tscv_pl: This is the time series cross validation plan plot.
Value
A list object with multiple items.
Author(s)
Steven P. Sanderson II, MPH
See Also
Other Model Tuning:
ts_model_spec_tune_template()
Other Utility:
auto_stationarize()
,
calibrate_and_plot()
,
internal_ts_backward_event_tbl()
,
internal_ts_both_event_tbl()
,
internal_ts_forward_event_tbl()
,
model_extraction_helper()
,
ts_get_date_columns()
,
ts_info_tbl()
,
ts_is_date_class()
,
ts_lag_correlation()
,
ts_model_compare()
,
ts_model_rank_tbl()
,
ts_model_spec_tune_template()
,
ts_qq_plot()
,
ts_scedacity_scatter_plot()
,
ts_to_tbl()
,
util_difflog_ts()
,
util_doublediff_ts()
,
util_doubledifflog_ts()
,
util_log_ts()
,
util_singlediff_ts()
Examples
## Not run:
suppressPackageStartupMessages(library(modeltime))
suppressPackageStartupMessages(library(timetk))
suppressPackageStartupMessages(library(dplyr))
data <- ts_to_tbl(AirPassengers) %>%
select(-index)
splits <- time_series_split(
data
, date_col
, assess = 12
, skip = 3
, cumulative = TRUE
)
rec_objs <- ts_auto_recipe(
.data = data
, .date_col = date_col
, .pred_col = value
)
wfsets <- ts_wfs_mars(
.model_type = "earth"
, .recipe_list = rec_objs
)
wf_fits <- wfsets %>%
modeltime_fit_workflowset(
data = training(splits)
, control = control_fit_workflowset(
allow_par = TRUE
, verbose = TRUE
)
)
models_tbl <- wf_fits %>%
filter(.model != "NULL")
calibration_tbl <- models_tbl %>%
modeltime_calibrate(new_data = testing(splits))
output <- ts_model_auto_tune(
.modeltime_model_id = 1,
.calibration_tbl = calibration_tbl,
.splits_obj = splits,
.drop_training_na = TRUE,
.date_col = date_col,
.value_col = value,
.tscv_assess = "12 months",
.tscv_skip = "3 months",
.num_cores = parallel::detectCores() - 1
)
## End(Not run)
Compare Two Time Series Models
Description
This function will expect to take in two models that will be used for comparison.
It is useful to use this after appropriately following the modeltime workflow and
getting two models to compare. This is an extension of the calibrate and plot, but
it only takes two models and is most likely better suited to be used after running
a model through the ts_model_auto_tune()
function to see the difference in performance
after a base model has been tuned.
Usage
ts_model_compare(
.model_1,
.model_2,
.type = "testing",
.splits_obj,
.data,
.print_info = TRUE,
.metric = "rmse"
)
Arguments
.model_1 |
The model being compared to the base, this can also be a hyperparameter tuned model. |
.model_2 |
The base model. |
.type |
The default is the testing tibble, can be set to training as well. |
.splits_obj |
The splits object |
.data |
The original data that was passed to splits |
.print_info |
This is a boolean, the default is TRUE |
.metric |
This should be one of the following character strings:
|
Details
This function expects to take two models. You must tell it whether it will
be assessing the training or testing data; the testing data is the default.
You must therefore supply the splits object to this function along with the original
dataset. You must also tell it which default modeltime accuracy metric should
be printed on the graph itself. You can also tell this function whether or not to print
information to the console. A static ggplot2
plot and an interactive
plotly
plot will be returned inside of the output list.
Value
The function outputs a list invisibly.
Author(s)
Steven P. Sanderson II, MPH
See Also
Other Utility:
auto_stationarize()
,
calibrate_and_plot()
,
internal_ts_backward_event_tbl()
,
internal_ts_both_event_tbl()
,
internal_ts_forward_event_tbl()
,
model_extraction_helper()
,
ts_get_date_columns()
,
ts_info_tbl()
,
ts_is_date_class()
,
ts_lag_correlation()
,
ts_model_auto_tune()
,
ts_model_rank_tbl()
,
ts_model_spec_tune_template()
,
ts_qq_plot()
,
ts_scedacity_scatter_plot()
,
ts_to_tbl()
,
util_difflog_ts()
,
util_doublediff_ts()
,
util_doubledifflog_ts()
,
util_log_ts()
,
util_singlediff_ts()
Examples
## Not run:
suppressPackageStartupMessages(library(modeltime))
suppressPackageStartupMessages(library(timetk))
suppressPackageStartupMessages(library(rsample))
suppressPackageStartupMessages(library(dplyr))
data_tbl <- ts_to_tbl(AirPassengers) %>%
select(-index)
splits <- time_series_split(
data = data_tbl,
date_var = date_col,
assess = "12 months",
cumulative = TRUE
)
rec_obj <- ts_auto_recipe(
.data = data_tbl,
.date_col = date_col,
.pred_col = value
)
wfs_mars <- ts_wfs_mars(.recipe_list = rec_obj)
wf_fits <- wfs_mars %>%
modeltime_fit_workflowset(
data = training(splits)
, control = control_fit_workflowset(
allow_par = FALSE
, verbose = TRUE
)
)
calibration_tbl <- wf_fits %>%
modeltime_calibrate(new_data = testing(splits))
base_mars <- calibration_tbl %>% pluck_modeltime_model(1)
date_mars <- calibration_tbl %>% pluck_modeltime_model(2)
ts_model_compare(
.model_1 = base_mars,
.model_2 = date_mars,
.type = "testing",
.splits_obj = splits,
.data = data_tbl,
.print_info = TRUE,
.metric = "rmse"
)$plots$static_plot
## End(Not run)
Model Rank
Description
This takes in a calibration tibble and computes the ranks of the models inside of it.
Usage
ts_model_rank_tbl(.calibration_tbl)
Arguments
.calibration_tbl |
A calibrated modeltime table. |
Details
This takes in a calibration tibble and computes the ranks of the models inside
of it. For now it computes only the default yardstick
metrics from modeltime.
Models are ranked with the dplyr
min_rank()
function, with desc()
applied to rsq
since a higher r-squared is better. The metrics used are:
"rmse"
"mae"
"mape"
"smape"
"rsq"
Value
A tibble with models ranked by metric performance order
Author(s)
Steven P. Sanderson II, MPH
See Also
Other Utility:
auto_stationarize()
,
calibrate_and_plot()
,
internal_ts_backward_event_tbl()
,
internal_ts_both_event_tbl()
,
internal_ts_forward_event_tbl()
,
model_extraction_helper()
,
ts_get_date_columns()
,
ts_info_tbl()
,
ts_is_date_class()
,
ts_lag_correlation()
,
ts_model_auto_tune()
,
ts_model_compare()
,
ts_model_spec_tune_template()
,
ts_qq_plot()
,
ts_scedacity_scatter_plot()
,
ts_to_tbl()
,
util_difflog_ts()
,
util_doublediff_ts()
,
util_doubledifflog_ts()
,
util_log_ts()
,
util_singlediff_ts()
Examples
# NOT RUN
## Not run:
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(timetk))
suppressPackageStartupMessages(library(modeltime))
suppressPackageStartupMessages(library(rsample))
suppressPackageStartupMessages(library(workflows))
suppressPackageStartupMessages(library(parsnip))
suppressPackageStartupMessages(library(recipes))
data_tbl <- ts_to_tbl(AirPassengers) %>%
select(-index)
splits <- time_series_split(
data_tbl,
date_var = date_col,
assess = "12 months",
cumulative = TRUE
)
rec_obj <- recipe(value ~ ., training(splits))
model_spec_arima <- arima_reg() %>%
set_engine(engine = "auto_arima")
model_spec_mars <- mars(mode = "regression") %>%
set_engine("earth")
wflw_fit_arima <- workflow() %>%
add_recipe(rec_obj) %>%
add_model(model_spec_arima) %>%
fit(training(splits))
wflw_fit_mars <- workflow() %>%
add_recipe(rec_obj) %>%
add_model(model_spec_mars) %>%
fit(training(splits))
model_tbl <- modeltime_table(wflw_fit_arima, wflw_fit_mars)
calibration_tbl <- model_tbl %>%
modeltime_calibrate(new_data = testing(splits))
ts_model_rank_tbl(calibration_tbl)
## End(Not run)
Time Series Model Spec Template
Description
This function creates a generic tuneable model specification. It can be used
by itself and is called internally by ts_model_auto_tune()
.
Usage
ts_model_spec_tune_template(.parsnip_engine = NULL, .model_spec_class = NULL)
Arguments
.parsnip_engine |
The model engine that is used by |
.model_spec_class |
The model spec class that is used by |
Details
This function takes in a single parameter and uses that to output a generic tuneable model specification. This function can work with the following parsnip/modeltime engines:
"auto_arima"
"auto_arima_xgboost"
"ets"
"croston"
"theta"
"smooth_es"
"stlm_ets"
"tbats"
"stlm_arima"
"nnetar"
"prophet"
"prophet_xgboost"
"lm"
"glmnet"
"stan"
"spark"
"keras"
"earth"
"xgboost"
"kernlab"
Value
A tuneable parsnip model specification.
Author(s)
Steven P. Sanderson II, MPH
See Also
Other Model Tuning:
ts_model_auto_tune()
Other Utility:
auto_stationarize()
,
calibrate_and_plot()
,
internal_ts_backward_event_tbl()
,
internal_ts_both_event_tbl()
,
internal_ts_forward_event_tbl()
,
model_extraction_helper()
,
ts_get_date_columns()
,
ts_info_tbl()
,
ts_is_date_class()
,
ts_lag_correlation()
,
ts_model_auto_tune()
,
ts_model_compare()
,
ts_model_rank_tbl()
,
ts_qq_plot()
,
ts_scedacity_scatter_plot()
,
ts_to_tbl()
,
util_difflog_ts()
,
util_doublediff_ts()
,
util_doubledifflog_ts()
,
util_log_ts()
,
util_singlediff_ts()
Examples
ts_model_spec_tune_template("ets")
ts_model_spec_tune_template("prophet")
Quality Control Run Chart
Description
A control chart is a graph that shows data points between upper and lower limits over a period of time. You can use it to understand whether a process is in control. These charts commonly include three types of lines: upper and lower specification limits, upper and lower control limits, and a planned (center) value. With the help of these lines, control charts show the process behavior over time.
Usage
ts_qc_run_chart(
.data,
.date_col,
.value_col,
.interactive = FALSE,
.median = TRUE,
.cl = TRUE,
.mcl = TRUE,
.ucl = TRUE,
.lc = FALSE,
.lmcl = FALSE,
.llcl = FALSE
)
Arguments
.data |
The data.frame/tibble to be passed. |
.date_col |
The column holding the timestamp. |
.value_col |
The column with the values to be analyzed. |
.interactive |
Default is FALSE, TRUE for an interactive plotly plot. |
.median |
Default is TRUE. This will show the median line of the data. |
.cl |
This is the first upper control line |
.mcl |
This is the second positive sigma control line |
.ucl |
This is the third positive sigma control line |
.lc |
This is the first negative control line |
.lmcl |
This is the second negative sigma control line |
.llcl |
This is the third negative sigma control line |
Details
Expects a time-series tibble/data.frame
Expects a date column and a value column
Value
A static ggplot2 graph, or a plotly plot if .interactive is set to TRUE
Author(s)
Steven P. Sanderson II, MPH
Examples
library(dplyr)
data_tbl <- ts_to_tbl(AirPassengers) %>%
select(-index)
data_tbl %>%
ts_qc_run_chart(
.date_col = date_col
, .value_col = value
, .llcl = TRUE
)
Time Series Model QQ Plot
Description
This takes in a calibration tibble and will produce a QQ plot.
Usage
ts_qq_plot(.calibration_tbl, .model_id = NULL, .interactive = FALSE)
Arguments
.calibration_tbl |
A calibrated modeltime table. |
.model_id |
The id of a particular model from a calibration tibble. If
there are multiple models in the tibble and this remains NULL then the
plot will be returned using |
.interactive |
A boolean with a default value of FALSE. TRUE will produce
an interactive |
Details
This takes in a calibration tibble and will create a QQ plot. You can also
pass in a model_id
and a boolean for interactive
which will return a
plotly::ggplotly
interactive plot.
Value
A QQ plot.
Author(s)
Steven P. Sanderson II, MPH
See Also
https://en.wikipedia.org/wiki/Q%E2%80%93Q_plot
Other Plot:
ts_brownian_motion_plot()
,
ts_event_analysis_plot()
,
ts_scedacity_scatter_plot()
Other Utility:
auto_stationarize()
,
calibrate_and_plot()
,
internal_ts_backward_event_tbl()
,
internal_ts_both_event_tbl()
,
internal_ts_forward_event_tbl()
,
model_extraction_helper()
,
ts_get_date_columns()
,
ts_info_tbl()
,
ts_is_date_class()
,
ts_lag_correlation()
,
ts_model_auto_tune()
,
ts_model_compare()
,
ts_model_rank_tbl()
,
ts_model_spec_tune_template()
,
ts_scedacity_scatter_plot()
,
ts_to_tbl()
,
util_difflog_ts()
,
util_doublediff_ts()
,
util_doubledifflog_ts()
,
util_log_ts()
,
util_singlediff_ts()
Examples
## Not run:
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(timetk))
suppressPackageStartupMessages(library(modeltime))
suppressPackageStartupMessages(library(rsample))
suppressPackageStartupMessages(library(workflows))
suppressPackageStartupMessages(library(parsnip))
suppressPackageStartupMessages(library(recipes))
data_tbl <- ts_to_tbl(AirPassengers) %>%
select(-index)
splits <- time_series_split(
data_tbl,
date_var = date_col,
assess = "12 months",
cumulative = TRUE
)
rec_obj <- recipe(value ~ ., training(splits))
model_spec_arima <- arima_reg() %>%
set_engine(engine = "auto_arima")
model_spec_mars <- mars(mode = "regression") %>%
set_engine("earth")
wflw_fit_arima <- workflow() %>%
add_recipe(rec_obj) %>%
add_model(model_spec_arima) %>%
fit(training(splits))
wflw_fit_mars <- workflow() %>%
add_recipe(rec_obj) %>%
add_model(model_spec_mars) %>%
fit(training(splits))
model_tbl <- modeltime_table(wflw_fit_arima, wflw_fit_mars)
calibration_tbl <- model_tbl %>%
modeltime_calibrate(new_data = testing(splits))
ts_qq_plot(calibration_tbl)
## End(Not run)
Random Walk Function
Description
This function takes in four arguments and returns a tibble of random walks.
Usage
ts_random_walk(
.mean = 0,
.sd = 0.1,
.num_walks = 100,
.periods = 100,
.initial_value = 1000
)
Arguments
.mean |
The desired mean of the random walks |
.sd |
The standard deviation of the random walks |
.num_walks |
The number of random walks you want generated |
.periods |
The length of the random walk(s) you want generated |
.initial_value |
The initial value where the random walks should start |
Details
Monte Carlo simulations were first formally designed in the 1940s while developing nuclear weapons, and have since been heavily used in various fields to use randomness to solve problems that are potentially deterministic in nature. In finance, Monte Carlo simulations can be a useful tool to give a sense of how assets with certain characteristics might behave in the future. While there are more complex and sophisticated financial forecasting methods such as ARIMA (Auto-Regressive Integrated Moving Average) and GARCH (Generalized Auto-Regressive Conditional Heteroskedasticity), which attempt to model not only the randomness but also underlying macro factors such as seasonality and volatility clustering, Monte Carlo random walks work surprisingly well in illustrating market volatility as long as the results are not taken too seriously.
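As a conceptual sketch (the internals of ts_random_walk() may differ), a single walk can be built by compounding an initial value with normally distributed period returns, using the same defaults the arguments document:

```r
# One illustrative random walk: an initial value compounded by
# normally distributed period returns (conceptual sketch only).
set.seed(123)
returns <- rnorm(100, mean = 0, sd = 0.1)
walk    <- 1000 * cumprod(1 + returns)

length(walk)  # one value per period
```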
Value
A tibble
See Also
Other Data Generator:
tidy_fft()
,
ts_brownian_motion()
,
ts_brownian_motion_augment()
,
ts_geometric_brownian_motion()
,
ts_geometric_brownian_motion_augment()
Examples
ts_random_walk(
.mean = 0,
.sd = 1,
.num_walks = 25,
.periods = 180,
.initial_value = 6
)
Get Random Walk ggplot2
layers
Description
Get layers to add to a ggplot
graph from the ts_random_walk()
function.
Usage
ts_random_walk_ggplot_layers(.data)
Arguments
.data |
The data passed to the function. |
Details
Set the intercept of the initial value from the random walk
Set the max and min of the cumulative sum of the random walks
Value
A ggplot2
layers object
Author(s)
Steven P. Sanderson II, MPH
Examples
library(ggplot2)
df <- ts_random_walk()
df %>%
ggplot(
mapping = aes(
x = x
, y = cum_y
, color = factor(run)
, group = factor(run)
)
) +
geom_line(alpha = 0.8) +
ts_random_walk_ggplot_layers(df)
Provide Colorblind Compliant Colors
Description
8 Hex RGB color definitions suitable for charts for colorblind people.
Usage
ts_scale_color_colorblind(..., theme = "ts")
Arguments
... |
Data passed in from a |
theme |
Right now this is |
Details
This function is used in others in order to help render plots for those that are color blind.
Value
A ggplot
layer
Author(s)
Steven P. Sanderson II, MPH
Provide Colorblind Compliant Colors
Description
8 Hex RGB color definitions suitable for charts for colorblind people.
Usage
ts_scale_fill_colorblind(..., theme = "ts")
Arguments
... |
Data passed in from a |
theme |
Right now this is |
Details
This function is used in others in order to help render plots for those that are color blind.
Value
A ggplot
layer
Author(s)
Steven P. Sanderson II, MPH
Time Series Model Scedacity Plot
Description
This takes in a calibration tibble and will produce a scedacity plot.
Usage
ts_scedacity_scatter_plot(
.calibration_tbl,
.model_id = NULL,
.interactive = FALSE
)
Arguments
.calibration_tbl |
A calibrated modeltime table. |
.model_id |
The id of a particular model from a calibration tibble. If
there are multiple models in the tibble and this remains NULL then the
plot will be returned using |
.interactive |
A boolean with a default value of FALSE. TRUE will produce
an interactive |
Details
This takes in a calibration tibble and will create a scedacity plot. You can also
pass in a model_id
and a boolean for interactive
which will return a
plotly::ggplotly
interactive plot.
Value
A Scedacity plot.
Author(s)
Steven P. Sanderson II, MPH
See Also
https://en.wikipedia.org/wiki/Homoscedasticity
Other Plot:
ts_brownian_motion_plot()
,
ts_event_analysis_plot()
,
ts_qq_plot()
Other Utility:
auto_stationarize()
,
calibrate_and_plot()
,
internal_ts_backward_event_tbl()
,
internal_ts_both_event_tbl()
,
internal_ts_forward_event_tbl()
,
model_extraction_helper()
,
ts_get_date_columns()
,
ts_info_tbl()
,
ts_is_date_class()
,
ts_lag_correlation()
,
ts_model_auto_tune()
,
ts_model_compare()
,
ts_model_rank_tbl()
,
ts_model_spec_tune_template()
,
ts_qq_plot()
,
ts_to_tbl()
,
util_difflog_ts()
,
util_doublediff_ts()
,
util_doubledifflog_ts()
,
util_log_ts()
,
util_singlediff_ts()
Examples
## Not run:
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(timetk))
suppressPackageStartupMessages(library(modeltime))
suppressPackageStartupMessages(library(rsample))
suppressPackageStartupMessages(library(workflows))
suppressPackageStartupMessages(library(parsnip))
suppressPackageStartupMessages(library(recipes))
data_tbl <- ts_to_tbl(AirPassengers) %>%
select(-index)
splits <- time_series_split(
data_tbl,
date_var = date_col,
assess = "12 months",
cumulative = TRUE
)
rec_obj <- recipe(value ~ ., training(splits))
model_spec_arima <- arima_reg() %>%
set_engine(engine = "auto_arima")
model_spec_mars <- mars(mode = "regression") %>%
set_engine("earth")
wflw_fit_arima <- workflow() %>%
add_recipe(rec_obj) %>%
add_model(model_spec_arima) %>%
fit(training(splits))
wflw_fit_mars <- workflow() %>%
add_recipe(rec_obj) %>%
add_model(model_spec_mars) %>%
fit(training(splits))
model_tbl <- modeltime_table(wflw_fit_arima, wflw_fit_mars)
calibration_tbl <- model_tbl %>%
modeltime_calibrate(new_data = testing(splits))
ts_scedacity_scatter_plot(calibration_tbl)
## End(Not run)
Simple Moving Average Plot
Description
This function will take in a value column and return any number of
moving averages.
Usage
ts_sma_plot(
.data,
.date_col,
.value_col,
.sma_order = 2,
.func = mean,
.align = "center",
.partial = FALSE
)
Arguments
.data |
The data that you are passing, must be a data.frame/tibble. |
.date_col |
The column that holds the date. |
.value_col |
The column that holds the value. |
.sma_order |
This will default to 2. This can be a vector like c(2,4,6,12) |
.func |
The unquoted function you want to pass, mean, median, etc |
.align |
This can be either "left", "center", "right" |
.partial |
This is a bool value of TRUE/FALSE, the default is FALSE |
Details
This function will accept a time series object or a tibble/data.frame. This is a
simple wrapper around timetk::slidify_vec()
. It uses that function to do the underlying
moving average work.
It can only handle a single moving average at a time, so if multiple orders are requested it will loop through them and append the results to a tibble object.
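A minimal base-R sketch of the rolling-average idea (the function itself delegates the real work to timetk::slidify_vec()): a centered window of width n is averaged at each position, with NA where the window does not fit, similar in spirit to .partial = FALSE.

```r
# Centered simple moving average; positions where the window does not
# fit are NA (conceptual stand-in for timetk::slidify_vec()).
sma <- function(x, n) {
  as.numeric(stats::filter(x, rep(1 / n, n), sides = 2))
}

sma(c(1, 2, 3, 4, 5), 3)  # NA at both ends, window means inside
```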
Value
Will return a list object.
Author(s)
Steven P. Sanderson II, MPH
Examples
df <- ts_to_tbl(AirPassengers)
out <- ts_sma_plot(df, date_col, value, .sma_order = c(3,6))
out$data
out$plots$static_plot
Time Series Splits Plot
Description
Sometimes we want to see the training and testing data in a plot. This is a
simple wrapper around a couple of functions from the timetk
package.
Usage
ts_splits_plot(.splits_obj, .date_col, .value_col)
Arguments
.splits_obj |
The predefined splits object. |
.date_col |
The date column for the time series. |
.value_col |
The value column of the time series. |
Details
You should already have a splits object defined. This function takes in three parameters, the splits object, a date column and the value column.
Value
A time series cv plan plot
Author(s)
Steven P. Sanderson II, MPH
See Also
-
https://business-science.github.io/timetk/reference/tk_time_series_cv_plan.html (tk_time_series_cv_plan)
-
https://business-science.github.io/timetk/reference/plot_time_series_cv_plan.html (plot_time_series_cv_plan)
Examples
suppressPackageStartupMessages(library(modeltime))
suppressPackageStartupMessages(library(timetk))
suppressPackageStartupMessages(library(dplyr))
data <- ts_to_tbl(AirPassengers) %>%
select(-index)
splits <- time_series_split(
data
, date_col
, assess = 12
, skip = 3
, cumulative = TRUE
)
ts_splits_plot(
.splits_obj = splits,
.date_col = date_col,
.value_col = value
)
Event Analysis
Description
Given a tibble/data.frame, you can get information on what happens before, after,
or in both directions of some given event, where the event is defined by some
percentage increase/decrease in values from time t
to t+1
Usage
ts_time_event_analysis_tbl(
.data,
.date_col,
.value_col,
.percent_change = 0.05,
.horizon = 12,
.precision = 2,
.direction = "forward",
.filter_non_event_groups = TRUE
)
Arguments
.data |
The date.frame/tibble that holds the data. |
.date_col |
The column with the date value. |
.value_col |
The column with the value you are measuring. |
.percent_change |
This defaults to 0.05 which is a 5% increase in the
|
.horizon |
How far do you want to look back or ahead. |
.precision |
The default is 2, which means it rounds the lagged 1 value percent change to 2 decimal points. You may want more for more finely tuned results; this will result in fewer groupings. |
.direction |
The default is |
.filter_non_event_groups |
The default is TRUE, this drops groupings with no events on the rare occasion it does occur. |
Details
This takes in a data.frame
/tibble
of a time series. It requires a date column,
and a value column. You can convert a ts
/xts
/zoo
/mts
object into a tibble by
using the ts_to_tbl()
function.
You will provide the function with a percentage change in the form of -1 to 1
inclusive, along with the time horizon you want to examine. For example,
you may want to see what happens to AirPassengers
after a 0.1 (10%) increase
in volume.
The next most important thing to supply is the direction. Do you want to see what typically happens after such an event, what leads up to such an event, or both?
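The event definition above can be sketched in base R: the percent change from one period to the next, flagged where its rounded value meets the threshold (the function's default of 0.05 is assumed here).

```r
# AirPassengers ships with base R's datasets package.
x <- as.numeric(AirPassengers)

# Percent change from t-1 to t, padded with NA for the first period.
pct_change <- c(NA, diff(x) / head(x, -1))

# Periods whose rounded change meets the 5% default threshold.
events <- which(round(pct_change, 2) >= 0.05)
head(events)
```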
Value
A tibble.
Author(s)
Steven P. Sanderson II, MPH
See Also
Other Time_Filtering:
ts_compare_data()
Examples
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(ggplot2))
df_tbl <- ts_to_tbl(AirPassengers) %>% select(-index)
tst <- ts_time_event_analysis_tbl(df_tbl, date_col, value, .direction = "both",
.horizon = 6)
glimpse(tst)
tst %>%
ggplot(aes(x = x, y = mean_event_change)) +
geom_line() +
geom_line(aes(y = event_change_ci_high), color = "blue", linetype = "dashed") +
geom_line(aes(y = event_change_ci_low), color = "blue", linetype = "dashed") +
geom_vline(xintercept = 7, color = "red", linetype = "dashed") +
theme_minimal() +
labs(
title = "'AirPassengers' Event Analysis at 5% Increase",
subtitle = "Vertical Red line is normalized event epoch - Direction: Both",
x = "",
y = "Mean Event Change"
)
Coerce a time-series object to a tibble
Description
This function takes in a time-series object and returns it in a
tibble
format.
Usage
ts_to_tbl(.data)
Arguments
.data |
The time-series object you want transformed into a |
Details
This function makes use of timetk::tk_tbl()
under the hood to obtain
the initial tibble
object. After the initial object is obtained, a new column
called date_col
is constructed from the index
column using lubridate
if
an index column is returned.
Value
A tibble
Author(s)
Steven P. Sanderson II, MPH
See Also
Other Utility:
auto_stationarize()
,
calibrate_and_plot()
,
internal_ts_backward_event_tbl()
,
internal_ts_both_event_tbl()
,
internal_ts_forward_event_tbl()
,
model_extraction_helper()
,
ts_get_date_columns()
,
ts_info_tbl()
,
ts_is_date_class()
,
ts_lag_correlation()
,
ts_model_auto_tune()
,
ts_model_compare()
,
ts_model_rank_tbl()
,
ts_model_spec_tune_template()
,
ts_qq_plot()
,
ts_scedacity_scatter_plot()
,
util_difflog_ts()
,
util_doublediff_ts()
,
util_doubledifflog_ts()
,
util_log_ts()
,
util_singlediff_ts()
Examples
ts_to_tbl(BJsales)
ts_to_tbl(AirPassengers)
Augment Function Velocity
Description
Takes a numeric vector and will return the velocity of that vector.
Usage
ts_velocity_augment(.data, .value, .names = "auto")
Arguments
.data |
The data being passed that will be augmented by the function. |
.value |
This is passed |
.names |
The default is "auto" |
Details
Takes a numeric vector and will return the velocity of that vector. The velocity of a time series is computed by taking the first difference, so
x_t - x_(t-1)
This function is intended to be used on its own in order to add columns to a tibble.
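The first difference described above is base R's diff(); padding with a leading NA (a common convention for augment-style functions, assumed here rather than taken from the source) keeps the result aligned with the original column.

```r
# Toy value column (made up for illustration).
b <- c(3, 5, 4, 9)

# Velocity: x_t - x_(t-1); the first period has no predecessor.
velocity <- c(NA, diff(b))
velocity  # NA 2 -1 5
```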
Value
An augmented tibble.
Author(s)
Steven P. Sanderson II, MPH
See Also
Other Augment Function:
ts_acceleration_augment()
,
ts_growth_rate_augment()
Examples
suppressPackageStartupMessages(library(dplyr))
len_out = 10
by_unit = "month"
start_date = as.Date("2021-01-01")
data_tbl <- tibble(
date_col = seq.Date(from = start_date, length.out = len_out, by = by_unit),
a = rnorm(len_out),
b = runif(len_out)
)
ts_velocity_augment(data_tbl, b)
Vector Function Time Series Velocity
Description
Takes a numeric vector and will return the velocity of that vector.
Usage
ts_velocity_vec(.x)
Arguments
.x |
A numeric vector |
Details
Takes a numeric vector and will return the velocity of that vector. The velocity of a time series is computed by taking the first difference, so
x_t - x_(t-1)
This function can be used on its own. It is also the basis for the function
ts_velocity_augment()
.
Value
A numeric vector
Author(s)
Steven P. Sanderson II, MPH
See Also
Other Vector Function:
ts_acceleration_vec()
,
ts_growth_rate_vec()
Examples
suppressPackageStartupMessages(library(dplyr))
len_out = 25
by_unit = "month"
start_date = as.Date("2021-01-01")
data_tbl <- tibble(
date_col = seq.Date(from = start_date, length.out = len_out, by = by_unit),
a = rnorm(len_out),
b = runif(len_out)
)
vec_1 <- ts_velocity_vec(data_tbl$b)
plot(data_tbl$b)
lines(data_tbl$b)
lines(vec_1, col = "blue")
Time Series Value, Velocity and Acceleration Plot
Description
This function will produce three plots faceted on a single graph. The three graphs are the following:
Value Plot (Actual values)
Value Velocity Plot
Value Acceleration Plot
Usage
ts_vva_plot(.data, .date_col, .value_col)
Arguments
.data |
The data you want to visualize. This should be pre-processed and
the aggregation should match the |
.date_col |
The data column from the |
.value_col |
The value column from the |
Details
This function expects to take in a data.frame/tibble. It will return
a list object that contains the augmented data along with a static plot and
an interactive plotly plot. It is important that the data be prepared and have
at minimum a date column and the value column as they need to be supplied to
the function. If your data is a ts, xts, zoo or mts then use ts_to_tbl()
to
convert it to a tibble.
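The three faceted series correspond to the value, its first difference, and its second difference, which can be sketched with base R's diff():

```r
# AirPassengers ships with base R's datasets package.
x <- as.numeric(AirPassengers)

velocity     <- diff(x)                  # x_t - x_(t-1)
acceleration <- diff(x, differences = 2) # change in the velocity

length(x)             # 144 observations
length(velocity)      # one fewer after differencing
length(acceleration)  # two fewer after double differencing
```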
Value
The original time series augmented with the differenced data, a static plot and a plotly plot of the ggplot object. The output is a list that gets returned invisibly.
Author(s)
Steven P. Sanderson II, MPH
Examples
suppressPackageStartupMessages(library(dplyr))
data_tbl <- ts_to_tbl(AirPassengers) %>%
select(-index)
ts_vva_plot(data_tbl, date_col, value)$plots$static_plot
Auto Arima XGBoost Workflowset Function
Description
This function is used to quickly create a workflowsets object.
Usage
ts_wfs_arima_boost(
.model_type = "all_engines",
.recipe_list,
.trees = 10,
.min_node = 2,
.tree_depth = 6,
.learn_rate = 0.015,
.stop_iter = NULL,
.seasonal_period = 0,
.non_seasonal_ar = 0,
.non_seasonal_differences = 0,
.non_seasonal_ma = 0,
.seasonal_ar = 0,
.seasonal_differences = 0,
.seasonal_ma = 0
)
Arguments
.model_type |
This is where you will set your engine. It uses
|
.recipe_list |
You must supply a list of recipes. list(rec_1, rec_2, ...) |
.trees |
An integer for the number of trees contained in the ensemble. |
.min_node |
An integer for the minimum number of data points in a node that is required for the node to be split further. |
.tree_depth |
An integer for the maximum depth of the tree (i.e. number of splits) (specific engines only). |
.learn_rate |
A number for the rate at which the boosting algorithm adapts from iteration-to-iteration (specific engines only). |
.stop_iter |
The number of iterations without improvement before stopping (xgboost only). |
.seasonal_period |
Set to 0, |
.non_seasonal_ar |
Set to 0, |
.non_seasonal_differences |
Set to 0, |
.non_seasonal_ma |
Set to 0, |
.seasonal_ar |
Set to 0, |
.seasonal_differences |
Set to 0, |
.seasonal_ma |
Set to 0, |
Details
This function expects to take in the recipes that you want to use in the modeling process. This is an automated workflow process. There are sensible defaults set for the model specification, but if you choose you can set them yourself if you have a good understanding of what they should be. The mode is set to "regression".
This uses the option set_engine("auto_arima_xgboost")
or set_engine("arima_xgboost")
modeltime::arima_boost()
arima_boost() is a way to generate a specification
of a time series model that uses boosting to improve modeling errors
(residuals) on Exogenous Regressors. It works with both "automated" ARIMA
(auto.arima) and standard ARIMA (arima). The main algorithms are:
Auto ARIMA + XGBoost Errors (engine = auto_arima_xgboost, default)
ARIMA + XGBoost Errors (engine = arima_xgboost)
Value
Returns a workflowsets object.
Author(s)
Steven P. Sanderson II, MPH
See Also
https://workflowsets.tidymodels.org/
https://business-science.github.io/modeltime/reference/arima_boost.html
Other Auto Workflowsets:
ts_wfs_auto_arima()
,
ts_wfs_ets_reg()
,
ts_wfs_lin_reg()
,
ts_wfs_mars()
,
ts_wfs_nnetar_reg()
,
ts_wfs_prophet_reg()
,
ts_wfs_svm_poly()
,
ts_wfs_svm_rbf()
,
ts_wfs_xgboost()
Examples
suppressPackageStartupMessages(library(modeltime))
suppressPackageStartupMessages(library(timetk))
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(rsample))
data <- AirPassengers %>%
ts_to_tbl() %>%
select(-index)
splits <- time_series_split(
data
, date_col
, assess = 12
, skip = 3
, cumulative = TRUE
)
rec_objs <- ts_auto_recipe(
.data = training(splits)
, .date_col = date_col
, .pred_col = value
)
wf_sets <- ts_wfs_arima_boost("all_engines", rec_objs)
wf_sets
Auto Arima (Forecast auto_arima) Workflowset Function
Description
This function is used to quickly create a workflowsets object.
Usage
ts_wfs_auto_arima(.model_type = "auto_arima", .recipe_list)
Arguments
.model_type |
This is where you will set your engine. It uses
|
.recipe_list |
You must supply a list of recipes. list(rec_1, rec_2, ...) |
Details
This function expects to take in the recipes that you want to use in the modeling process. This is an automated workflow process. There are sensible defaults set for the model specification, but if you choose you can set them yourself if you have a good understanding of what they should be. The mode is set to "regression".
This only uses the option set_engine("auto_arima")
and therefore the .model_type
is not needed. The parameter is kept because it is possible in the future that
this could change, and it keeps with the framework of how other functions
are written.
modeltime::arima_reg()
arima_reg() is a way to generate a specification of
an ARIMA model before fitting and allows the model to be created using
different packages. Currently the only package is forecast
.
Value
Returns a workflowsets object.
Author(s)
Steven P. Sanderson II, MPH
See Also
https://workflowsets.tidymodels.org/
https://business-science.github.io/modeltime/reference/arima_reg.html
Other Auto Workflowsets:
ts_wfs_arima_boost()
,
ts_wfs_ets_reg()
,
ts_wfs_lin_reg()
,
ts_wfs_mars()
,
ts_wfs_nnetar_reg()
,
ts_wfs_prophet_reg()
,
ts_wfs_svm_poly()
,
ts_wfs_svm_rbf()
,
ts_wfs_xgboost()
Examples
suppressPackageStartupMessages(library(modeltime))
suppressPackageStartupMessages(library(timetk))
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(rsample))
data <- AirPassengers %>%
ts_to_tbl() %>%
select(-index)
splits <- time_series_split(
data
, date_col
, assess = 12
, skip = 3
, cumulative = TRUE
)
rec_objs <- ts_auto_recipe(
.data = training(splits)
, .date_col = date_col
, .pred_col = value
)
wf_sets <- ts_wfs_auto_arima("auto_arima", rec_objs)
wf_sets
Auto ETS Workflowset Function
Description
This function is used to quickly create a workflowsets object.
Usage
ts_wfs_ets_reg(
.model_type = "all_engines",
.recipe_list,
.seasonal_period = "auto",
.error = "auto",
.trend = "auto",
.season = "auto",
.damping = "auto",
.smooth_level = 0.1,
.smooth_trend = 0.1,
.smooth_seasonal = 0.1
)
Arguments
.model_type |
This is where you will set your engine. It uses
|
.recipe_list |
You must supply a list of recipes. list(rec_1, rec_2, ...) |
.seasonal_period |
A seasonal frequency. Uses "auto" by default. A character phrase of "auto" or time-based phrase of "2 weeks" can be used if a date or date-time variable is provided. See Fit Details below. |
.error |
The form of the error term: "auto", "additive", or "multiplicative". If the error is multiplicative, the data must be non-negative. |
.trend |
The form of the trend term: "auto", "additive", "multiplicative", or "none". |
.season |
The form of the seasonal term: "auto", "additive", "multiplicative" or "none". |
.damping |
Apply damping to a trend: "auto", "damped", or "none". |
.smooth_level |
This is often called the "alpha" parameter used as the base level smoothing factor for exponential smoothing models. |
.smooth_trend |
This is often called the "beta" parameter used as the trend smoothing factor for exponential smoothing models. |
.smooth_seasonal |
This is often called the "gamma" parameter used as the seasonal smoothing factor for exponential smoothing models. |
Details
This function expects to take in the recipes that you want to use in the modeling process. This is an automated workflow process. There are sensible defaults set for the model specification, but if you choose you can set them yourself if you have a good understanding of what they should be. The mode is set to "regression".
This uses the following engines:
modeltime::exp_smoothing()
exp_smoothing() is a way to generate a specification
of an Exponential Smoothing model before fitting and allows the model to be
created using different packages. Currently the only package is forecast.
Several algorithms are implemented:
"ets"
"croston"
"theta"
"smooth_es"
Value
Returns a workflowsets object.
Author(s)
Steven P. Sanderson II, MPH
See Also
https://workflowsets.tidymodels.org/
https://business-science.github.io/modeltime/reference/exp_smoothing.html
Other Auto Workflowsets:
ts_wfs_arima_boost()
,
ts_wfs_auto_arima()
,
ts_wfs_lin_reg()
,
ts_wfs_mars()
,
ts_wfs_nnetar_reg()
,
ts_wfs_prophet_reg()
,
ts_wfs_svm_poly()
,
ts_wfs_svm_rbf()
,
ts_wfs_xgboost()
Examples
suppressPackageStartupMessages(library(modeltime))
suppressPackageStartupMessages(library(timetk))
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(rsample))
data <- AirPassengers %>%
ts_to_tbl() %>%
select(-index)
splits <- time_series_split(
data
, date_col
, assess = 12
, skip = 3
, cumulative = TRUE
)
rec_objs <- ts_auto_recipe(
.data = training(splits)
, .date_col = date_col
, .pred_col = value
)
wf_sets <- ts_wfs_ets_reg("all_engines", rec_objs)
wf_sets
Auto Linear Regression Workflowset Function
Description
This function is used to quickly create a workflowsets object.
Usage
ts_wfs_lin_reg(.model_type, .recipe_list, .penalty = 1, .mixture = 0.5)
Arguments
.model_type |
This is where you will set your engine. It uses
Not yet implemented are:
|
.recipe_list |
You must supply a list of recipes. list(rec_1, rec_2, ...) |
.penalty |
The penalty parameter of the glmnet. The default is 1 |
.mixture |
The mixture parameter of the glmnet. The default is 0.5 |
Details
This function expects to take in the recipes that you want to use in
the modeling process. This is an automated workflow process. There are sensible
defaults set for the glmnet
model specification, but if you choose you can
set them yourself if you have a good understanding of what they should be.
Value
Returns a workflowsets object.
Author(s)
Steven P. Sanderson II, MPH
See Also
https://workflowsets.tidymodels.org/ (workflowsets)
Other Auto Workflowsets:
ts_wfs_arima_boost()
,
ts_wfs_auto_arima()
,
ts_wfs_ets_reg()
,
ts_wfs_mars()
,
ts_wfs_nnetar_reg()
,
ts_wfs_prophet_reg()
,
ts_wfs_svm_poly()
,
ts_wfs_svm_rbf()
,
ts_wfs_xgboost()
Examples
suppressPackageStartupMessages(library(modeltime))
suppressPackageStartupMessages(library(timetk))
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(rsample))
data <- AirPassengers %>%
ts_to_tbl() %>%
select(-index)
splits <- time_series_split(
data
, date_col
, assess = 12
, skip = 3
, cumulative = TRUE
)
rec_objs <- ts_auto_recipe(
.data = training(splits)
, .date_col = date_col
, .pred_col = value
)
wf_sets <- ts_wfs_lin_reg("all_engines", rec_objs)
wf_sets
Auto MARS (Earth) Workflowset Function
Description
This function is used to quickly create a workflowsets object.
Usage
ts_wfs_mars(
.model_type = "earth",
.recipe_list,
.num_terms = 200,
.prod_degree = 1,
.prune_method = "backward"
)
Arguments
.model_type |
This is where you will set your engine. It uses
|
.recipe_list |
You must supply a list of recipes. list(rec_1, rec_2, ...) |
.num_terms |
The number of features that will be retained in the final model, including the intercept. |
.prod_degree |
The highest possible interaction degree. |
.prune_method |
The pruning method. This is a character; the default is "backward". You can choose from one of the following: "backward", "none", "exhaustive", "forward", "seqrep", "cv". |
Details
This function expects to take in the recipes that you want to use in the modeling process. This is an automated workflow process. There are sensible defaults set for the model specification, but if you choose you can set them yourself if you have a good understanding of what they should be. The mode is set to "regression".
This only uses the option set_engine("earth")
and therefore the .model_type
is not needed. The parameter is kept because it is possible in the future that
this could change, and it keeps with the framework of how other functions
are written.
Value
Returns a workflowsets object.
Author(s)
Steven P. Sanderson II, MPH
See Also
https://workflowsets.tidymodels.org/
https://parsnip.tidymodels.org/reference/mars.html
Other Auto Workflowsets:
ts_wfs_arima_boost()
,
ts_wfs_auto_arima()
,
ts_wfs_ets_reg()
,
ts_wfs_lin_reg()
,
ts_wfs_nnetar_reg()
,
ts_wfs_prophet_reg()
,
ts_wfs_svm_poly()
,
ts_wfs_svm_rbf()
,
ts_wfs_xgboost()
Examples
suppressPackageStartupMessages(library(modeltime))
suppressPackageStartupMessages(library(timetk))
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(rsample))
data <- AirPassengers %>%
ts_to_tbl() %>%
select(-index)
splits <- time_series_split(
data
, date_col
, assess = 12
, skip = 3
, cumulative = TRUE
)
rec_objs <- ts_auto_recipe(
.data = training(splits)
, .date_col = date_col
, .pred_col = value
)
wf_sets <- ts_wfs_mars("earth", rec_objs)
wf_sets
Auto NNETAR Workflowset Function
Description
This function is used to quickly create a workflowsets object.
Usage
ts_wfs_nnetar_reg(
.model_type = "nnetar",
.recipe_list,
.non_seasonal_ar = 0,
.seasonal_ar = 0,
.hidden_units = 5,
.num_networks = 10,
.penalty = 0.1,
.epochs = 10
)
Arguments
.model_type |
This is where you will set your engine. It uses modeltime::nnetar_reg() under the hood with the engine set to "nnetar". |
.recipe_list |
You must supply a list of recipes. list(rec_1, rec_2, ...) |
.non_seasonal_ar |
The order of the non-seasonal auto-regressive (AR) terms. Often denoted "p" in pdq-notation. |
.seasonal_ar |
The order of the seasonal auto-regressive (SAR) terms. Often denoted "P" in PDQ-notation. |
.hidden_units |
An integer for the number of units in the hidden model. |
.num_networks |
Number of networks to fit with different random starting weights. These are then averaged when producing forecasts. |
.penalty |
A non-negative numeric value for the amount of weight decay. |
.epochs |
An integer for the number of training iterations. |
Details
This function expects to take in the recipes that you want to use in the modeling process. This is an automated workflow process. There are sensible defaults set for the model specification, but if you choose you can set them yourself if you have a good understanding of what they should be. The mode is set to "regression".
This uses the following engine:
modeltime::nnetar_reg() with set_engine("nnetar")
nnetar_reg() is a way to generate a specification of an NNETAR model before fitting and allows the model to be created using different packages. Currently the only package is forecast.
Value
Returns a workflowsets object.
Author(s)
Steven P. Sanderson II, MPH
See Also
https://workflowsets.tidymodels.org/
https://business-science.github.io/modeltime/reference/nnetar_reg.html
Other Auto Workflowsets:
ts_wfs_arima_boost()
,
ts_wfs_auto_arima()
,
ts_wfs_ets_reg()
,
ts_wfs_lin_reg()
,
ts_wfs_mars()
,
ts_wfs_prophet_reg()
,
ts_wfs_svm_poly()
,
ts_wfs_svm_rbf()
,
ts_wfs_xgboost()
Examples
suppressPackageStartupMessages(library(modeltime))
suppressPackageStartupMessages(library(timetk))
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(rsample))
data <- AirPassengers %>%
ts_to_tbl() %>%
select(-index)
splits <- time_series_split(
data
, date_col
, assess = 12
, skip = 3
, cumulative = TRUE
)
rec_objs <- ts_auto_recipe(
.data = training(splits)
, .date_col = date_col
, .pred_col = value
)
wf_sets <- ts_wfs_nnetar_reg("nnetar", rec_objs)
wf_sets
Auto PROPHET Regression Workflowset Function
Description
This function is used to quickly create a workflowsets object.
Usage
ts_wfs_prophet_reg(
.model_type = "all_engines",
.recipe_list,
.growth = NULL,
.changepoint_num = 25,
.changepoint_range = 0.8,
.seasonality_yearly = "auto",
.seasonality_weekly = "auto",
.seasonality_daily = "auto",
.season = "additive",
.prior_scale_changepoints = 25,
.prior_scale_seasonality = 1,
.prior_scale_holidays = 1,
.logistic_cap = NULL,
.logistic_floor = NULL,
.trees = 50,
.min_n = 10,
.tree_depth = 5,
.learn_rate = 0.01,
.loss_reduction = NULL,
.stop_iter = NULL
)
Arguments
.model_type |
This is where you will set your engine. It uses modeltime::prophet_reg() and modeltime::prophet_boost() under the hood and can take one of the following: "prophet", "prophet_xgboost", or "all_engines". |
.recipe_list |
You must supply a list of recipes. list(rec_1, rec_2, ...) |
.growth |
String 'linear' or 'logistic' to specify a linear or logistic trend. |
.changepoint_num |
Number of potential changepoints to include for modeling trend. |
.changepoint_range |
Adjusts the flexibility of the trend component by limiting to a percentage of data before the end of the time series. 0.80 means that a changepoint cannot exist after the first 80% of the data. |
.seasonality_yearly |
One of "auto", TRUE or FALSE. Toggles on/off a seasonal component that models year-over-year seasonality. |
.seasonality_weekly |
One of "auto", TRUE or FALSE. Toggles on/off a seasonal component that models week-over-week seasonality. |
.seasonality_daily |
One of "auto", TRUE or FALSE. Toggles on/off a seasonal component that models day-over-day seasonality. |
.season |
'additive' (default) or 'multiplicative'. |
.prior_scale_changepoints |
Parameter modulating the flexibility of the automatic changepoint selection. Large values will allow many changepoints, small values will allow few changepoints. |
.prior_scale_seasonality |
Parameter modulating the strength of the seasonality model. Larger values allow the model to fit larger seasonal fluctuations, smaller values dampen the seasonality. |
.prior_scale_holidays |
Parameter modulating the strength of the holiday components model, unless overridden in the holidays input. |
.logistic_cap |
When growth is logistic, the upper-bound for "saturation". |
.logistic_floor |
When growth is logistic, the lower-bound for "saturation". |
.trees |
An integer for the number of trees contained in the ensemble. |
.min_n |
An integer for the minimum number of data points in a node that is required for the node to be split further. |
.tree_depth |
An integer for the maximum depth of the tree (i.e. number of splits) (specific engines only). |
.learn_rate |
A number for the rate at which the boosting algorithm adapts from iteration-to-iteration (specific engines only). |
.loss_reduction |
A number for the reduction in the loss function required to split further (specific engines only). |
.stop_iter |
The number of iterations without improvement before stopping (xgboost only). |
Details
This function expects to take in the recipes that you want to use in
the modeling process. This is an automated workflow process. There are sensible
defaults set for the prophet
and prophet_xgboost
model specification,
but if you choose you can set them yourself if you have a good understanding
of what they should be.
Value
Returns a workflowsets object.
Author(s)
Steven P. Sanderson II, MPH
See Also
https://workflowsets.tidymodels.org/
https://business-science.github.io/modeltime/reference/prophet_reg.html
https://business-science.github.io/modeltime/reference/prophet_boost.html
Other Auto Workflowsets:
ts_wfs_arima_boost()
,
ts_wfs_auto_arima()
,
ts_wfs_ets_reg()
,
ts_wfs_lin_reg()
,
ts_wfs_mars()
,
ts_wfs_nnetar_reg()
,
ts_wfs_svm_poly()
,
ts_wfs_svm_rbf()
,
ts_wfs_xgboost()
Examples
suppressPackageStartupMessages(library(modeltime))
suppressPackageStartupMessages(library(timetk))
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(rsample))
data <- AirPassengers %>%
ts_to_tbl() %>%
select(-index)
splits <- time_series_split(
data
, date_col
, assess = 12
, skip = 3
, cumulative = TRUE
)
rec_objs <- ts_auto_recipe(
.data = training(splits)
, .date_col = date_col
, .pred_col = value
)
wf_sets <- ts_wfs_prophet_reg("all_engines", rec_objs)
wf_sets
Auto SVM Poly (Kernlab) Workflowset Function
Description
This function is used to quickly create a workflowsets object.
Usage
ts_wfs_svm_poly(
.model_type = "kernlab",
.recipe_list,
.cost = 1,
.degree = 1,
.scale_factor = 1,
.margin = 0.1
)
Arguments
.model_type |
This is where you will set your engine. It uses parsnip::svm_poly() under the hood with the engine set to "kernlab". |
.recipe_list |
You must supply a list of recipes. list(rec_1, rec_2, ...) |
.cost |
A positive number for the cost of predicting a sample within or on the wrong side of the margin. |
.degree |
A positive number for polynomial degree. |
.scale_factor |
A positive number for the polynomial scaling factor. |
.margin |
A positive number for the epsilon in the SVM insensitive loss function (regression only). |
Details
This function expects to take in the recipes that you want to use in the modeling process. This is an automated workflow process. There are sensible defaults set for the model specification, but if you choose you can set them yourself if you have a good understanding of what they should be. The mode is set to "regression".
This only uses the option set_engine("kernlab")
and therefore the .model_type
is not needed. The parameter is kept because it is possible in the future that
this could change, and it keeps with the framework of how other functions
are written.
parsnip::svm_poly()
svm_poly() defines a support vector machine model.
For classification, the model tries to maximize the width of the margin
between classes. For regression, the model optimizes a robust loss function
that is only affected by very large model residuals.
This SVM model uses a nonlinear function, specifically a polynomial function, to create the decision boundary or regression line.
Value
Returns a workflowsets object.
Author(s)
Steven P. Sanderson II, MPH
See Also
https://workflowsets.tidymodels.org/
https://parsnip.tidymodels.org/reference/svm_poly.html
Other Auto Workflowsets:
ts_wfs_arima_boost()
,
ts_wfs_auto_arima()
,
ts_wfs_ets_reg()
,
ts_wfs_lin_reg()
,
ts_wfs_mars()
,
ts_wfs_nnetar_reg()
,
ts_wfs_prophet_reg()
,
ts_wfs_svm_rbf()
,
ts_wfs_xgboost()
Examples
suppressPackageStartupMessages(library(modeltime))
suppressPackageStartupMessages(library(timetk))
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(rsample))
data <- AirPassengers %>%
ts_to_tbl() %>%
select(-index)
splits <- time_series_split(
data
, date_col
, assess = 12
, skip = 3
, cumulative = TRUE
)
rec_objs <- ts_auto_recipe(
.data = training(splits)
, .date_col = date_col
, .pred_col = value
)
wf_sets <- ts_wfs_svm_poly("kernlab", rec_objs)
wf_sets
Auto SVM RBF (Kernlab) Workflowset Function
Description
This function is used to quickly create a workflowsets object.
Usage
ts_wfs_svm_rbf(
.model_type = "kernlab",
.recipe_list,
.cost = 1,
.rbf_sigma = 0.01,
.margin = 0.1
)
Arguments
.model_type |
This is where you will set your engine. It uses parsnip::svm_rbf() under the hood with the engine set to "kernlab". |
.recipe_list |
You must supply a list of recipes. list(rec_1, rec_2, ...) |
.cost |
A positive number for the cost of predicting a sample within or on the wrong side of the margin. |
.rbf_sigma |
A positive number for the radial basis function. |
.margin |
A positive number for the epsilon in the SVM insensitive loss function (regression only). |
Details
This function expects to take in the recipes that you want to use in the modeling process. This is an automated workflow process. There are sensible defaults set for the model specification, but if you choose you can set them yourself if you have a good understanding of what they should be. The mode is set to "regression".
This only uses the option set_engine("kernlab")
and therefore the .model_type
is not needed. The parameter is kept because it is possible in the future that
this could change, and it keeps with the framework of how other functions
are written.
parsnip::svm_rbf()
svm_rbf() defines a support vector machine model.
For classification, the model tries to maximize the width of the margin
between classes. For regression, the model optimizes a robust loss function
that is only affected by very large model residuals.
This SVM model uses a nonlinear function, specifically a radial basis function, to create the decision boundary or regression line.
Value
Returns a workflowsets object.
Author(s)
Steven P. Sanderson II, MPH
See Also
https://workflowsets.tidymodels.org/
https://parsnip.tidymodels.org/reference/svm_rbf.html
Other Auto Workflowsets:
ts_wfs_arima_boost()
,
ts_wfs_auto_arima()
,
ts_wfs_ets_reg()
,
ts_wfs_lin_reg()
,
ts_wfs_mars()
,
ts_wfs_nnetar_reg()
,
ts_wfs_prophet_reg()
,
ts_wfs_svm_poly()
,
ts_wfs_xgboost()
Examples
suppressPackageStartupMessages(library(modeltime))
suppressPackageStartupMessages(library(timetk))
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(rsample))
data <- AirPassengers %>%
ts_to_tbl() %>%
select(-index)
splits <- time_series_split(
data
, date_col
, assess = 12
, skip = 3
, cumulative = TRUE
)
rec_objs <- ts_auto_recipe(
.data = training(splits)
, .date_col = date_col
, .pred_col = value
)
wf_sets <- ts_wfs_svm_rbf("kernlab", rec_objs)
wf_sets
Auto XGBoost (XGBoost) Workflowset Function
Description
This function is used to quickly create a workflowsets object.
Usage
ts_wfs_xgboost(
.model_type = "xgboost",
.recipe_list,
.trees = 15L,
.min_n = 1L,
.tree_depth = 6L,
.learn_rate = 0.3,
.loss_reduction = 0,
.sample_size = 1,
.stop_iter = Inf
)
Arguments
.model_type |
This is where you will set your engine. It uses parsnip::boost_tree() under the hood with the engine set to "xgboost".
|
.recipe_list |
You must supply a list of recipes. list(rec_1, rec_2, ...) |
.trees |
The number of trees (type: integer, default: 15L) |
.min_n |
Minimal Node Size (type: integer, default: 1L) |
.tree_depth |
Tree Depth (type: integer, default: 6L) |
.learn_rate |
Learning Rate (type: double, default: 0.3) |
.loss_reduction |
Minimum Loss Reduction (type: double, default: 0.0) |
.sample_size |
Proportion Observations Sampled (type: double, default: 1.0) |
.stop_iter |
The number of iterations before stopping (type: integer, default: Inf) |
Details
This function expects to take in the recipes that you want to use in the modeling process. This is an automated workflow process. There are sensible defaults set for the model specification, but if you choose you can set them yourself if you have a good understanding of what they should be. The mode is set to "regression".
This only uses the option set_engine("xgboost")
and therefore the .model_type
is not needed. The parameter is kept because it is possible in the future that
this could change, and it keeps with the framework of how other functions
are written.
parsnip::boost_tree()
xgboost::xgb.train() creates a series of decision trees
forming an ensemble. Each tree depends on the results of previous trees.
All trees in the ensemble are combined to produce a final prediction.
Value
Returns a workflowsets object.
Author(s)
Steven P. Sanderson II, MPH
See Also
https://workflowsets.tidymodels.org/
https://parsnip.tidymodels.org/reference/details_boost_tree_xgboost.html
https://arxiv.org/abs/1603.02754
Other Auto Workflowsets:
ts_wfs_arima_boost()
,
ts_wfs_auto_arima()
,
ts_wfs_ets_reg()
,
ts_wfs_lin_reg()
,
ts_wfs_mars()
,
ts_wfs_nnetar_reg()
,
ts_wfs_prophet_reg()
,
ts_wfs_svm_poly()
,
ts_wfs_svm_rbf()
Examples
suppressPackageStartupMessages(library(modeltime))
suppressPackageStartupMessages(library(timetk))
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(rsample))
data <- AirPassengers %>%
ts_to_tbl() %>%
select(-index)
splits <- time_series_split(
data
, date_col
, assess = 12
, skip = 3
, cumulative = TRUE
)
rec_objs <- ts_auto_recipe(
.data = training(splits)
, .date_col = date_col
, .pred_col = value
)
wf_sets <- ts_wfs_xgboost("xgboost", rec_objs)
wf_sets
Differencing with Log Transformation to Make Time Series Stationary
Description
This function attempts to make a non-stationary time series stationary by applying differencing with a logarithmic transformation. It iteratively increases the differencing order until stationarity is achieved or informs the user if the transformation is not possible.
Usage
util_difflog_ts(.time_series)
Arguments
.time_series |
A time series object to be made stationary. |
Details
The function calculates the frequency of the input time series using the stats::frequency
function
and checks if the minimum value of the time series is greater than 0. It then applies differencing
with a logarithmic transformation incrementally until the Augmented Dickey-Fuller test indicates
stationarity (p-value < 0.05) or until the differencing order reaches the frequency of the data.
If differencing with a logarithmic transformation successfully makes the time series stationary, it returns the stationary time series and related information as a list with the following elements:
stationary_ts: The stationary time series after the transformation.
ndiffs: The order of differencing applied to make it stationary.
adf_stats: Augmented Dickey-Fuller test statistics on the stationary time series.
trans_type: Transformation type, which is "diff_log" in this case.
ret: TRUE to indicate a successful transformation.
If the data either had a minimum value less than or equal to 0 or requires more differencing than its frequency allows, it informs the user and suggests trying double differencing with a logarithmic transformation.
Value
If the time series is already stationary or the differencing with a logarithmic transformation is successful, it returns a list as described in the details section. If the transformation is not possible, it informs the user and returns a list with ret set to FALSE, suggesting trying double differencing with a logarithmic transformation.
Author(s)
Steven P. Sanderson II, MPH
See Also
Other Utility:
auto_stationarize()
,
calibrate_and_plot()
,
internal_ts_backward_event_tbl()
,
internal_ts_both_event_tbl()
,
internal_ts_forward_event_tbl()
,
model_extraction_helper()
,
ts_get_date_columns()
,
ts_info_tbl()
,
ts_is_date_class()
,
ts_lag_correlation()
,
ts_model_auto_tune()
,
ts_model_compare()
,
ts_model_rank_tbl()
,
ts_model_spec_tune_template()
,
ts_qq_plot()
,
ts_scedacity_scatter_plot()
,
ts_to_tbl()
,
util_doublediff_ts()
,
util_doubledifflog_ts()
,
util_log_ts()
,
util_singlediff_ts()
Examples
# Example 1: Using a time series dataset
util_difflog_ts(AirPassengers)
# Example 2: Using a different time series dataset
util_difflog_ts(BJsales)$ret
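The core transformation that util_difflog_ts() iterates over can be sketched with base R alone (a minimal, single-order illustration, not the package's full search loop):

```r
# Single-order diff-log sketch: log() requires strictly positive data,
# which is why util_difflog_ts() checks the minimum of the series first.
x <- AirPassengers
stopifnot(min(x) > 0)

diff_log <- diff(log(x), differences = 1)

length(diff_log)     # one observation shorter than the input (143 vs 144)
frequency(diff_log)  # seasonal frequency is preserved (12)
```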
Double Differencing to Make Time Series Stationary
Description
This function attempts to make a non-stationary time series stationary by applying double differencing. It iteratively increases the differencing order until stationarity is achieved.
Usage
util_doublediff_ts(.time_series)
Arguments
.time_series |
A time series object to be made stationary. |
Details
The function calculates the frequency of the input time series using the stats::frequency
function.
It then applies double differencing incrementally until the Augmented Dickey-Fuller test indicates
stationarity (p-value < 0.05) or until the differencing order reaches the frequency of the data.
If double differencing successfully makes the time series stationary, it returns the stationary time series and related information as a list with the following elements:
stationary_ts: The stationary time series after double differencing.
ndiffs: The order of differencing applied to make it stationary.
adf_stats: Augmented Dickey-Fuller test statistics on the stationary time series.
trans_type: Transformation type, which is "double_diff" in this case.
ret: TRUE to indicate a successful transformation.
If the data requires more double differencing than its frequency allows, it informs the user and suggests trying differencing with the natural logarithm instead.
Value
If the time series is already stationary or the double differencing is successful, it returns a list as described in the details section. If additional differencing is required, it informs the user and returns a list with ret set to FALSE, suggesting trying differencing with the natural logarithm.
Author(s)
Steven P. Sanderson II, MPH
See Also
Other Utility:
auto_stationarize()
,
calibrate_and_plot()
,
internal_ts_backward_event_tbl()
,
internal_ts_both_event_tbl()
,
internal_ts_forward_event_tbl()
,
model_extraction_helper()
,
ts_get_date_columns()
,
ts_info_tbl()
,
ts_is_date_class()
,
ts_lag_correlation()
,
ts_model_auto_tune()
,
ts_model_compare()
,
ts_model_rank_tbl()
,
ts_model_spec_tune_template()
,
ts_qq_plot()
,
ts_scedacity_scatter_plot()
,
ts_to_tbl()
,
util_difflog_ts()
,
util_doubledifflog_ts()
,
util_log_ts()
,
util_singlediff_ts()
Examples
# Example 1: Using a time series dataset
util_doublediff_ts(AirPassengers)
# Example 2: Using a different time series dataset
util_doublediff_ts(BJsales)$ret
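In base R terms, double differencing is just diff() applied twice; a minimal sketch of the step this function automates:

```r
x <- AirPassengers
dd <- diff(x, differences = 2)

# differences = 2 is equivalent to differencing the differenced series
stopifnot(isTRUE(all.equal(dd, diff(diff(x)))))

length(dd)  # two observations shorter than the input (142 vs 144)
```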
Double Differencing with Log Transformation to Make Time Series Stationary
Description
This function attempts to make a non-stationary time series stationary by applying double differencing with a logarithmic transformation. It iteratively increases the differencing order until stationarity is achieved or informs the user if the transformation is not possible.
Usage
util_doubledifflog_ts(.time_series)
Arguments
.time_series |
A time series object to be made stationary. |
Details
The function calculates the frequency of the input time series using the stats::frequency
function
and checks if the minimum value of the time series is greater than 0. It then applies double differencing
with a logarithmic transformation incrementally until the Augmented Dickey-Fuller test indicates
stationarity (p-value < 0.05) or until the differencing order reaches the frequency of the data.
If double differencing with a logarithmic transformation successfully makes the time series stationary, it returns the stationary time series and related information as a list with the following elements:
stationary_ts: The stationary time series after the transformation.
ndiffs: The order of differencing applied to make it stationary.
adf_stats: Augmented Dickey-Fuller test statistics on the stationary time series.
trans_type: Transformation type, which is "double_diff_log" in this case.
ret: TRUE to indicate a successful transformation.
If the data either had a minimum value less than or equal to 0 or requires more differencing than its frequency allows, it informs the user that the data could not be stationarized.
Value
If the time series is already stationary or the double differencing with a logarithmic transformation is successful, it returns a list as described in the details section. If the transformation is not possible, it informs the user and returns a list with ret set to FALSE, indicating that the data could not be stationarized.
Author(s)
Steven P. Sanderson II, MPH
See Also
Other Utility:
auto_stationarize()
,
calibrate_and_plot()
,
internal_ts_backward_event_tbl()
,
internal_ts_both_event_tbl()
,
internal_ts_forward_event_tbl()
,
model_extraction_helper()
,
ts_get_date_columns()
,
ts_info_tbl()
,
ts_is_date_class()
,
ts_lag_correlation()
,
ts_model_auto_tune()
,
ts_model_compare()
,
ts_model_rank_tbl()
,
ts_model_spec_tune_template()
,
ts_qq_plot()
,
ts_scedacity_scatter_plot()
,
ts_to_tbl()
,
util_difflog_ts()
,
util_doublediff_ts()
,
util_log_ts()
,
util_singlediff_ts()
Examples
# Example 1: Using a time series dataset
util_doubledifflog_ts(AirPassengers)
# Example 2: Using a different time series dataset
util_doubledifflog_ts(BJsales)$ret
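The underlying transformation composes the log and double-differencing utilities: log first, then two orders of differencing. A base-R sketch, assuming strictly positive data:

```r
x <- AirPassengers
stopifnot(min(x) > 0)  # the log step needs strictly positive values

ddl <- diff(log(x), differences = 2)

length(ddl)  # 142: each order of differencing removes one observation
```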
Logarithmic Transformation to Make Time Series Stationary
Description
This function attempts to make a non-stationary time series stationary by applying a logarithmic transformation. If successful, it returns the stationary time series. If the transformation fails, it informs the user.
Usage
util_log_ts(.time_series)
Arguments
.time_series |
A time series object to be made stationary. |
Details
This function checks if the minimum value of the input time series is strictly greater than zero. If yes, it performs the Augmented Dickey-Fuller test on the logarithm of the time series. If the p-value of the test is less than 0.05, it concludes that the logarithmic transformation made the time series stationary and returns the result as a list with the following elements:
stationary_ts: The stationary time series after the logarithmic transformation.
ndiffs: Not applicable in this case, marked as NA.
adf_stats: Augmented Dickey-Fuller test statistics on the stationary time series.
trans_type: Transformation type, which is "log" in this case.
ret: TRUE to indicate a successful transformation.
If the minimum value of the time series is less than or equal to 0 or if the logarithmic transformation doesn't make the time series stationary, it informs the user and returns a list with ret set to FALSE.
Value
If the time series is already stationary or the logarithmic transformation is successful, it returns a list as described in the details section. If the transformation fails, it returns a list with ret set to FALSE.
Author(s)
Steven P. Sanderson II, MPH
See Also
Other Utility:
auto_stationarize()
,
calibrate_and_plot()
,
internal_ts_backward_event_tbl()
,
internal_ts_both_event_tbl()
,
internal_ts_forward_event_tbl()
,
model_extraction_helper()
,
ts_get_date_columns()
,
ts_info_tbl()
,
ts_is_date_class()
,
ts_lag_correlation()
,
ts_model_auto_tune()
,
ts_model_compare()
,
ts_model_rank_tbl()
,
ts_model_spec_tune_template()
,
ts_qq_plot()
,
ts_scedacity_scatter_plot()
,
ts_to_tbl()
,
util_difflog_ts()
,
util_doublediff_ts()
,
util_doubledifflog_ts()
,
util_singlediff_ts()
Examples
# Example 1: Using a time series dataset
util_log_ts(AirPassengers)
# Example 2: Using a different time series dataset
util_log_ts(BJsales.lead)$ret
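The positivity guard described above is easy to verify directly; a base-R sketch of the check util_log_ts() performs before transforming:

```r
# log() only makes sense for strictly positive series,
# so the function guards on the minimum value first
x <- BJsales.lead
min(x) > 0  # TRUE for this dataset, so the log step is allowed

log_x <- log(x)
stopifnot(frequency(log_x) == frequency(x))  # ts attributes are preserved
```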
Single Differencing to Make Time Series Stationary
Description
This function attempts to make a non-stationary time series stationary by applying single differencing. It iteratively increases the differencing order until stationarity is achieved.
Usage
util_singlediff_ts(.time_series)
Arguments
.time_series |
A time series object to be made stationary. |
Details
The function calculates the frequency of the input time series using the stats::frequency
function.
It then applies single differencing incrementally until the Augmented Dickey-Fuller test indicates
stationarity (p-value < 0.05) or until the differencing order reaches the frequency of the data.
If single differencing successfully makes the time series stationary, it returns the stationary time series and related information as a list with the following elements:
stationary_ts: The stationary time series after differencing.
ndiffs: The order of differencing applied to make it stationary.
adf_stats: Augmented Dickey-Fuller test statistics on the stationary time series.
trans_type: Transformation type, which is "diff" in this case.
ret: TRUE to indicate a successful transformation.
If the data requires more single differencing than its frequency allows, it informs the user and returns a list with ret set to FALSE, indicating that double differencing may be needed.
Value
If the time series is already stationary or the single differencing is successful, it returns a list as described in the details section. If additional differencing is required, it informs the user and returns a list with ret set to FALSE.
Author(s)
Steven P. Sanderson II, MPH
See Also
Other Utility:
auto_stationarize()
,
calibrate_and_plot()
,
internal_ts_backward_event_tbl()
,
internal_ts_both_event_tbl()
,
internal_ts_forward_event_tbl()
,
model_extraction_helper()
,
ts_get_date_columns()
,
ts_info_tbl()
,
ts_is_date_class()
,
ts_lag_correlation()
,
ts_model_auto_tune()
,
ts_model_compare()
,
ts_model_rank_tbl()
,
ts_model_spec_tune_template()
,
ts_qq_plot()
,
ts_scedacity_scatter_plot()
,
ts_to_tbl()
,
util_difflog_ts()
,
util_doublediff_ts()
,
util_doubledifflog_ts()
,
util_log_ts()
Examples
# Example 1: Using a time series dataset
util_singlediff_ts(AirPassengers)
# Example 2: Using a different time series dataset
util_singlediff_ts(BJsales)$ret
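The iterative search described in the details can be sketched with base R. In the package the loop exits once an Augmented Dickey-Fuller test (e.g. from the tseries package, an assumption here) reports p < 0.05; this sketch replaces that test with a comment:

```r
x <- AirPassengers
max_order <- frequency(x)  # the search never differences past the frequency (12 here)

for (d in seq_len(max_order)) {
  dx <- diff(x, differences = d)
  # in util_singlediff_ts(), an ADF test runs here and the loop
  # exits early once p < 0.05 indicates stationarity
}

length(diff(x, differences = 1))  # 143: single differencing drops one point
```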