State-space models constitute a unified framework for time series analysis that explicitly separates the underlying signal from observational noise. This methodology implements full Bayesian inference on structural time series components (Bayesian Structural Time Series, BSTS), combining the flexibility of structural decomposition with automatic variable selection through spike-and-slab priors. The approach allows uncertainty quantification at all model levels while maintaining economic interpretability of the components.
A state-space model is defined by two fundamental equations:
Observation Equation: \[y_t = Z_t' \alpha_t + \beta' x_t + \epsilon_t, \quad \epsilon_t \sim N(0, \sigma^2_{\epsilon})\]
State (Transition) Equation: \[\alpha_{t+1} = T_t \alpha_t + R_t \eta_t, \quad \eta_t \sim N(0, Q_t)\]
where:

- \(y_t\) is the observation at time \(t\)
- \(\alpha_t\) is the latent state vector of dimension \(m\)
- \(x_t\) is the vector of exogenous regressors of dimension \(k\)
- \(\beta\) are the regression coefficients
- \(Z_t\) connects the state to observations
- \(T_t\) is the state transition matrix
- \(R_t\) and \(Q_t\) define the variance structure of the state process
- \(\epsilon_t\) is the observation error
The simplest model includes only a stochastic level:
\[\begin{aligned} y_t &= \mu_t + \beta' x_t + \epsilon_t \\ \mu_{t+1} &= \mu_t + \eta_{\mu,t} \end{aligned}\]
where \(\eta_{\mu,t} \sim N(0, \sigma^2_\mu)\) represents innovations in the level. This model is appropriate for series with varying level but no systematic trend.
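As an illustrative sketch only, such a local level model with regressors could be specified with the `bsts` package in R (the interface suggested by the pseudocode later in this section, though the text does not name a library); `y` and `X` are hypothetical placeholders for the target series and the regressor matrix:

```r
library(bsts)

# Hypothetical data: y is the target series, X a numeric matrix of lagged regressors
df <- data.frame(y = y, X)

# Local level: a single stochastic level component
ss_ll <- AddLocalLevel(list(), y = df$y)

# Static regression coefficients enter through the formula interface
fit_ll <- bsts(y ~ ., state.specification = ss_ll, data = df, niter = 2000)
```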
The local linear trend (LLT) model extends the local level (LL) model by adding a stochastic trend:
\[\begin{aligned} y_t &= \mu_t + \beta' x_t + \epsilon_t \\ \mu_{t+1} &= \mu_t + \delta_t + \eta_{\mu,t} \\ \delta_{t+1} &= \delta_t + \eta_{\delta,t} \end{aligned}\]
where:

- \(\mu_t\) is the level at time \(t\)
- \(\delta_t\) is the slope (rate of change) at time \(t\)
- \(\eta_{\mu,t} \sim N(0, \sigma^2_\mu)\) are level innovations
- \(\eta_{\delta,t} \sim N(0, \sigma^2_\delta)\) are slope innovations
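Continuing the hedged `bsts` sketch above, only the state specification changes for the local linear trend:

```r
# Local linear trend: stochastic level plus stochastic slope
ss_llt <- AddLocalLinearTrend(list(), y = df$y)
fit_llt <- bsts(y ~ ., state.specification = ss_llt, data = df, niter = 2000)
```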
For series with seasonal patterns:
\[\gamma_{t+1} = -\sum_{s=1}^{S-1} \gamma_{t-s+1} + \eta_{\gamma,t}\]
where \(S\) is the seasonal period (e.g., 12 for monthly data) and \(\eta_{\gamma,t} \sim N(0, \sigma^2_\gamma)\).
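Under the same assumptions, a monthly seasonal component (\(S = 12\)) could be stacked onto the trend specification:

```r
# Monthly seasonality (S = 12) added on top of the local linear trend
ss_llt_s <- AddSeasonal(ss_llt, y = df$y, nseasons = 12)
fit_llt_s <- bsts(y ~ ., state.specification = ss_llt_s, data = df, niter = 2000)
```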
In high-dimensional contexts with multiple potential lags, variable selection is crucial for:

- Avoiding overfitting
- Maintaining interpretability
- Identifying genuine predictors
For each coefficient \(\beta_j\) we define a hierarchical prior:
\[\beta_j | \gamma_j \sim \begin{cases} N(0, \tau^2 v_j) & \text{if } \gamma_j = 1 \text{ (slab)} \\ \delta_0 & \text{if } \gamma_j = 0 \text{ (spike)} \end{cases}\]
where:

- \(\gamma_j \in \{0,1\}\) is the inclusion indicator
- \(\tau^2 v_j\) is the slab variance (non-zero component)
- \(\delta_0\) is a point mass at zero
The prior on the indicators is:
\[\gamma_j \sim \text{Bernoulli}(\pi_j)\]
with \(\pi_j\) specified according to the expected model size.
Hyperparameters are configured to balance flexibility with parsimony, allowing the data to dominate inference while maintaining smooth regularization.
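A minimal sketch of how such a prior could be configured with the `bsts`/`BoomSpikeSlab` interface, using the expected model size of 5 from the pseudocode; `train_df` is a hypothetical training-fold data frame, and the remaining values are illustrative defaults rather than the study's actual settings:

```r
library(bsts)   # SpikeSlabPrior is provided by BoomSpikeSlab, loaded with bsts

# train_df is a hypothetical training-fold data frame with y and the lagged regressors
X_train <- model.matrix(y ~ ., data = train_df)

prior <- SpikeSlabPrior(
  x = X_train,
  y = train_df$y,
  expected.model.size = 5,          # prior expected number of included lags (per the pseudocode)
  prior.information.weight = 0.01,  # weak slab so the data dominate the posterior
  diagonal.shrinkage = 0.5          # shrink the slab information matrix toward its diagonal
)

ss <- AddLocalLinearTrend(list(), y = train_df$y)
fit_full <- bsts(y ~ ., state.specification = ss, data = train_df,
                 niter = 2000, prior = prior)
```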
Inference is performed through a hybrid Gibbs sampling scheme that alternates forward filtering backward sampling (FFBS) for the latent states with stochastic search variable selection (SSVS) for the regression coefficients.
Although not reported explicitly, for computational efficiency, standard convergence diagnostics include:

- Visual inspection of trace plots
- Chain autocorrelation
- Effective sample size for key parameters
- Stable inclusion probabilities for \(\gamma_j\)
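A hedged sketch of how these diagnostics could be computed from a fitted model such as the hypothetical `fit_full` above, using the draws that `bsts` stores and the `coda` package; the 500-draw burn-in mirrors the pseudocode:

```r
library(coda)

burn <- 500                                          # burn-in, as in the pseudocode
sigma_draws <- fit_full$sigma.obs[-seq_len(burn)]    # observation-sd draws

# Trace, autocorrelation and effective sample size for a key variance parameter
plot(sigma_draws, type = "l")
acf(sigma_draws)
effectiveSize(as.mcmc(sigma_draws))

# Posterior inclusion probabilities: share of retained draws with beta_j != 0
beta_draws <- fit_full$coefficients[-seq_len(burn), , drop = FALSE]
sort(colMeans(beta_draws != 0), decreasing = TRUE)
```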
We implement Leave-Future-Out (LFO) validation with a rolling origin: the initial training window covers 80% of the sample, each fold forecasts a 6-month horizon, and the origin advances in 6-month steps. This more conservative scheme (6-month horizon and step, versus 12 in ECM-MARS) allows greater temporal resolution in the stability evaluation.
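A minimal sketch of the rolling-origin split generation under these settings; `make_lfo_splits` is a hypothetical helper written for illustration, not a function from the original code:

```r
# Rolling-origin LFO splits: initial window = 80% of the sample,
# 6-step horizon, origin advanced by 6 observations per fold
make_lfo_splits <- function(n, init_frac = 0.80, h = 6, step = 6) {
  origins <- seq(from = floor(init_frac * n), to = n - h, by = step)
  lapply(origins, function(o) list(train = 1:o, test = (o + 1):(o + h)))
}

splits <- make_lfo_splits(n = nrow(df))
length(splits)   # number of folds
```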
For each fold \(f\), predictive performance is evaluated on the held-out observations as follows.

For each future observation \(y_t^*\), the expected log predictive density (ELPD) is
\[\text{ELPD}_t = \log \int p(y_t^* | \theta) p(\theta | \mathcal{D}_{\text{train}}) d\theta\]
This integral is approximated from the MCMC samples as
\[\widehat{\text{ELPD}}_t = \log \left( \frac{1}{S} \sum_{s=1}^S p(y_t^* | \theta^{(s)}) \right)\]
where we assume normality: \(p(y_t^* | \theta^{(s)}) = N(y_t^*; \mu_t^{(s)}, \sigma_t^{(s)})\).
Using the posterior predictive mean as the point prediction, the fold-level error is

\[\text{RMSE} = \sqrt{\frac{1}{H} \sum_{t=1}^{H} \left(y_t^* - \bar{y}_t\right)^2}\]

where \(\bar{y}_t = \frac{1}{S} \sum_{s=1}^S y_t^{(s)}\) is the posterior predictive mean and \(H\) is the forecast horizon.
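A hedged sketch of both computations from posterior predictive draws, assuming a `bsts` fit like the hypothetical `fit_full` above and a hypothetical test-fold data frame `test_df`; as a simplification of the formulas, each predictive draw is treated as \(\mu_t^{(s)}\) and the posterior draws of \(\sigma_\epsilon\) as \(\sigma_t^{(s)}\):

```r
burn  <- 500
pred  <- predict(fit_full, newdata = test_df, horizon = nrow(test_df), burn = burn)
draws <- pred$distribution                      # (niter - burn) x H matrix of y_t^* draws
sig   <- fit_full$sigma.obs[-seq_len(burn)]     # sigma_eps^(s), one per retained draw
y_new <- test_df$y

# Pointwise ELPD: log of the Monte Carlo average of normal densities,
# evaluated in log space for numerical stability
elpd_t <- sapply(seq_along(y_new), function(t) {
  lp <- dnorm(y_new[t], mean = draws[, t], sd = sig, log = TRUE)
  max(lp) + log(mean(exp(lp - max(lp))))
})
elpd <- sum(elpd_t)

# Point forecast = posterior predictive mean; RMSE over the fold
y_bar <- colMeans(draws)
rmse  <- sqrt(mean((y_new - y_bar)^2))
```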
We evaluate the quality of the quantified uncertainty through the empirical coverage of the 80% and 95% prediction intervals and through probability integral transform (PIT) values.
If the model is well-calibrated, PIT values follow a uniform distribution on [0,1].
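A short sketch of the coverage and PIT computations, reusing the hypothetical `draws` and `y_new` objects from the previous block:

```r
# Central 80% and 95% interval coverage
q <- apply(draws, 2, quantile, probs = c(0.10, 0.90, 0.025, 0.975))
cover80 <- mean(y_new >= q[1, ] & y_new <= q[2, ])
cover95 <- mean(y_new >= q[3, ] & y_new <= q[4, ])

# PIT: share of predictive draws below each realized value;
# approximately Uniform(0, 1) when the model is well calibrated
pit <- sapply(seq_along(y_new), function(t) mean(draws[, t] <= y_new[t]))
hist(pit, breaks = 10, main = "PIT values", xlab = "PIT")
```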
We systematically evaluate the following state structures:
| Model | Level | Trend | Seasonality | Free Parameters |
|---|---|---|---|---|
| LL | Stochastic | No | Optional | \(\sigma^2_\mu, \sigma^2_\epsilon\) |
| LLT | Stochastic | Stochastic | Optional | \(\sigma^2_\mu, \sigma^2_\delta, \sigma^2_\epsilon\) |
The optimal structure is selected primarily by the mean \(\Delta\text{ELPD}\) across folds, with point-error metrics used only as secondary criteria. This hierarchy prioritizes probabilistic predictive capacity over point fit.
A full model “wins” in fold \(f\) if:
\[\begin{cases} \Delta\text{ELPD}_f > 0 & \text{(better predictive density)} \\ \Delta\text{RMSE}_f > 0 & \text{(lower error, i.e., RMSE}_{\text{base}} - \text{RMSE}_{\text{full}} > 0\text{)} \end{cases}\]
Note: Unlike GLM-AR(1), here \(\Delta\text{RMSE} = \text{RMSE}_{\text{base}} - \text{RMSE}_{\text{full}}\), so positive values indicate improvement.
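A minimal sketch of the per-fold victory rule and support computation with `dplyr`; `fold_results` is a hypothetical tibble with the per-fold ELPD and RMSE columns shown in the output structure below:

```r
library(dplyr)

fold_summary <- fold_results %>%
  mutate(dELPD = ELPD_full - ELPD_base,
         dRMSE = RMSE_base - RMSE_full,     # positive values favour the full model
         win   = dELPD > 0 & dRMSE > 0) %>%
  summarise(folds      = n(),
            wins       = sum(win),
            support    = wins / folds,
            dELPD_mean = mean(dELPD),
            dRMSE_mean = mean(dRMSE))
```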
The code includes robust error handling. Each pair generates the following output structure:
list(
best_summary = tibble( # Best model summary
spec, folds, wins, support,
dELPD_mean, dRMSE_mean, ...
),
best_results = tibble( # Details per fold
fold, n_train, n_test,
ELPD_base, ELPD_full, dELPD,
RMSE_base, RMSE_full, dRMSE,
cover80, cover95, pit, ...
),
all_summaries = tibble() # All tested structures
)
| Aspect | ECM-MARS | GLM-AR(1) | BSTS |
|---|---|---|---|
| Paradigm | Hybrid frequentist | Parametric Bayesian | Structural Bayesian |
| Prerequisites | I(1), cointegration | None | None |
| Non-linearity | MARS (splines) | No | No (but flexible components) |
| Variable selection | Manual (pre-tests) | No | Automatic (spike-and-slab) |
| Temporal components | ECM only | Global AR(1) | Level, trend, seasonality |
| Uncertainty | Implicit bootstrap | Complete posterior | Complete posterior + states |
| Interpretability | High (economic) | Medium | High (decomposition) |
| Computational cost | Medium | High | Very high |
- Interpretable decomposition: separates signal from noise with economically meaningful components
- Automatic selection: spike-and-slab identifies relevant predictors without pre-specification
- Structural uncertainty handling: quantifies uncertainty in unobservable components
- Robustness to specification: does not require assumptions about integration order or cointegration
- Interval forecasts: natural and well-calibrated prediction intervals
- High computational cost: FFBS + SSVS is intensive, especially with many regressors
- Linearity in predictors: does not capture non-linear interactions as MARS does
- Complex diagnostics: convergence is harder to verify than for OLS
- Sensitivity to priors in small samples: especially for the component variances
INPUT: DataFrame with time series, variable specification
OUTPUT: Ranking of pairs by predictive capacity and stability
# PREPARATION
1. LOAD data and clean names
2. IDENTIFY 6 circulation variables and 7 production variables
3. GENERATE 84 pairs (2 × 6 × 7)
# PROCESSING PER PAIR
FOR each pair (Y, X):
4. BUILD matrix with Y, X, lags(X, 1:6)
5. REMOVE observations with NA
# STRUCTURE TUNING
FOR each structure in {LL, LLT}:
# LEAVE-FUTURE-OUT
6. GENERATE LFO splits (init=80%, h=6, step=6)
FOR each fold:
7. SPLIT train/test
8. SCALE regressors with train stats
# MODELS
9. FIT base_model:
- state_spec = structure without regressors
- MCMC with 2000 iter, 500 burn-in
10. FIT full_model:
- state_spec = structure
- prior = spike_slab(X_lags, expected_size=5)
- MCMC with 2000 iter, 500 burn-in
# PREDICTION
11. GENERATE predictive distributions (h=6)
12. EXTRACT mean, sd, quantiles
# EVALUATION
13. CALCULATE:
- ELPD_base, ELPD_full → ΔELPD
- RMSE_base, RMSE_full → ΔRMSE
- Coverage 80%, 95%
- PIT values
14. DETERMINE victory: (ΔELPD > 0) ∧ (ΔRMSE > 0)
15. SUMMARIZE structure:
- support = wins/folds
- metric averages
16. SELECT best structure by ΔELPD
17. SAVE pair results
# RANKING AND EXPORT
18. ORDER pairs by (support DESC, ΔELPD DESC, ΔRMSE DESC)
19. FILTER winners by thresholds (0.70, 0.60)
20. EXPORT CSVs and visualizations
21. GENERATE plots for last fold (without re-fitting)
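A hedged sketch of the ranking and export step (steps 18–20 of the pseudocode); `all_pairs` and the file names are hypothetical:

```r
library(dplyr)
library(readr)

# all_pairs: hypothetical tibble stacking each pair's best_summary row
ranking <- all_pairs %>%
  arrange(desc(support), desc(dELPD_mean), desc(dRMSE_mean))

winners_strict  <- filter(ranking, support >= 0.70)
winners_relaxed <- filter(ranking, support >= 0.60)

write_csv(ranking,         "bsts_pair_ranking.csv")   # illustrative output paths
write_csv(winners_strict,  "bsts_winners_strict.csv")
write_csv(winners_relaxed, "bsts_winners_relaxed.csv")
```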
The BSTS methodology represents the most comprehensive and flexible approach of the three implemented, combining the rigor of Bayesian inference with the interpretability of structural decomposition. Unlike ECM-MARS, which requires prior validation of strict econometric assumptions, and GLM-AR(1), which models only global serial dependence, BSTS decomposes the series into interpretable components while automatically selecting relevant predictors.
Empirical results suggest that BSTS identifies robust predictive relationships with high precision, albeit at substantially higher computational cost. The ability to quantify uncertainty at all levels—from structural components to predictions—makes it particularly valuable for applications where risk assessment is critical.
The implementation with a 6-month horizon (vs 12 in other methodologies) provides more granular evaluation of temporal stability, revealing that many relationships appearing stable at long horizons show variability at shorter scales. This suggests the importance of considering multiple horizons in predictive model evaluation.
The BSTS framework is especially appropriate when:

- Component interpretation is a priority
- There is uncertainty about which predictors to include
- Well-calibrated prediction intervals are required
- Data exhibit multiple sources of variation (trend, seasonality, regressors)
The combination of automatic variable selection with structural modeling positions BSTS as a bridge between classical econometric methods and modern machine learning techniques, maintaining interpretability while embracing the inherent complexity in economic data.
Structural decomposition separates systematic variation (trend, seasonality) from random variation, facilitating economic interpretation and improving forecasts by modeling each component appropriately.

The spike-and-slab prior combines a continuous distribution (slab) for active coefficients with a point mass at zero (spike) for inactive coefficients, enabling exact variable selection.

Calibration of predictive probabilities is crucial for decision-making under uncertainty. A well-calibrated model assigns probabilities that reflect long-run empirical frequencies.

Forward Filtering Backward Sampling (FFBS) is the standard algorithm for sampling states in linear Gaussian models, combining forward Kalman filtering with backward sampling conditioned on complete data.

Stochastic Search Variable Selection (SSVS) is an MCMC method for exploring model space, alternating between updating coefficients given the model and updating the model given coefficients.

The validation scheme with h = 6 and step = 6 avoids overlap between test sets, ensuring independence in evaluation while maximizing data usage.