Methodological Details of Bayesian State-Space Models with Variable Selection

José Mauricio Gómez Julián

October 2025

State-Space Model Framework for Time Series

State-space models constitute a unified framework for time series analysis that explicitly separates the underlying signal from observational noise. This methodology implements full Bayesian inference on structural time series components (Bayesian Structural Time Series, BSTS), combining the flexibility of structural decomposition¹ with automatic variable selection through spike-and-slab² priors. The approach allows uncertainty quantification at all levels of the model while maintaining the economic interpretability of the components.

General State-Space Representation

Mathematical Formulation

A state-space model is defined by two fundamental equations:

Observation Equation: \[y_t = Z_t' \alpha_t + \beta' x_t + \epsilon_t, \quad \epsilon_t \sim N(0, \sigma^2_{\epsilon})\]

State (Transition) Equation: \[\alpha_{t+1} = T_t \alpha_t + R_t \eta_t, \quad \eta_t \sim N(0, Q_t)\]

where:
  • \(y_t\) is the observation at time \(t\)
  • \(\alpha_t\) is the latent state vector of dimension \(m\)
  • \(x_t\) is the vector of exogenous regressors of dimension \(k\)
  • \(\beta\) are the regression coefficients
  • \(Z_t\) connects the state to the observations
  • \(T_t\) is the state transition matrix
  • \(R_t\) and \(Q_t\) define the variance structure of the state process
  • \(\epsilon_t\) is the observation error

Advantages of the State-Space Framework

  1. Structural decomposition: Separates trend, seasonality, and irregular components
  2. Natural handling of missing data: The Kalman filter automatically interpolates
  3. Complete uncertainty: Posterior distributions for states and forecasts
  4. Modular flexibility: Components can be added as needed

Implemented Structural Components

Local Level (LL)

The simplest model includes only a stochastic level:

\[\begin{aligned} y_t &= \mu_t + \beta' x_t + \epsilon_t \\ \mu_{t+1} &= \mu_t + \eta_{\mu,t} \end{aligned}\]

where \(\eta_{\mu,t} \sim N(0, \sigma^2_\mu)\) represents innovations in the level. This model is appropriate for series with varying level but no systematic trend.

Local Linear Trend (LLT)

Extends the LL model by adding a stochastic trend:

\[\begin{aligned} y_t &= \mu_t + \beta' x_t + \epsilon_t \\ \mu_{t+1} &= \mu_t + \delta_t + \eta_{\mu,t} \\ \delta_{t+1} &= \delta_t + \eta_{\delta,t} \end{aligned}\]

where:
  • \(\mu_t\) is the level at time \(t\)
  • \(\delta_t\) is the slope (rate of change) at time \(t\)
  • \(\eta_{\mu,t} \sim N(0, \sigma^2_\mu)\) are level innovations
  • \(\eta_{\delta,t} \sim N(0, \sigma^2_\delta)\) are slope innovations
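In the notation of the general state-space representation, the LLT model corresponds to a two-dimensional state; restating the equations above in matrix form:

\[\alpha_t = \begin{pmatrix} \mu_t \\ \delta_t \end{pmatrix}, \quad Z_t = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad T_t = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}, \quad R_t = I_2, \quad Q_t = \begin{pmatrix} \sigma^2_\mu & 0 \\ 0 & \sigma^2_\delta \end{pmatrix}\]

The Local Level model is the special case with \(\alpha_t = \mu_t\), \(Z_t = T_t = R_t = 1\), and \(Q_t = \sigma^2_\mu\).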

Seasonal Component (Optional)

For series with seasonal patterns:

\[\gamma_{t+1} = -\sum_{s=1}^{S-1} \gamma_{t-s+1} + \eta_{\gamma,t}\]

where \(S\) is the seasonal period (e.g., 12 for monthly data) and \(\eta_{\gamma,t} \sim N(0, \sigma^2_\gamma)\).
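The text does not tie these components to a specific library; as a hedged illustration (one possible implementation, not necessarily the one used here), the bsts R package builds them as an additive state specification. The series y below is a placeholder.

library(bsts)

# y: the observed series (placeholder). Each Add* call appends one
# structural component to the state specification.
state_ll  <- AddLocalLevel(list(), y)         # Local Level: stochastic level only
state_llt <- AddLocalLinearTrend(list(), y)   # Local Linear Trend: level + stochastic slope

# Optional seasonal component, e.g. S = 12 for monthly data
state_llt_seas <- AddSeasonal(state_llt, y, nseasons = 12)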

Bayesian Variable Selection with Spike-and-Slab

Economic Motivation

In high-dimensional contexts with multiple potential lags, variable selection is crucial for:
  • Avoiding overfitting
  • Maintaining interpretability
  • Identifying genuine predictors

Spike-and-Slab Prior

For each coefficient \(\beta_j\) we define a hierarchical prior:

\[\beta_j | \gamma_j \sim \begin{cases} N(0, \tau^2 v_j) & \text{if } \gamma_j = 1 \text{ (slab)} \\ \delta_0 & \text{if } \gamma_j = 0 \text{ (spike)} \end{cases}\]

where:
  • \(\gamma_j \in \{0,1\}\) is the inclusion indicator
  • \(\tau^2 v_j\) is the slab variance (non-zero component)
  • \(\delta_0\) is a point mass at zero

The prior on the indicators is:

\[\gamma_j \sim \text{Bernoulli}(\pi_j)\]

with \(\pi_j\) calibrated³ according to the expected model size.

Hyperparameter Calibration

Hyperparameters are configured via:

  1. Expected model size: \(E[\sum \gamma_j] = \max(1, \min(5, k))\) where \(k\) is the number of predictors
  2. Prior information weight: \(w = 0.01\) (weakly informative prior)
  3. Diagonal shrinkage: \(\kappa = 0.5\) for moderate regularization

This configuration balances flexibility with parsimony, allowing data to dominate inference while maintaining smooth regularization.
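As a sketch of how these settings could map onto a spike-and-slab prior object, one option is the SpikeSlabPrior constructor from the BoomSpikeSlab package; the design matrix X and response y are placeholders, and this is an illustration rather than the authors' exact configuration.

library(BoomSpikeSlab)

# X: T x k matrix of (lagged) regressors, y: response vector (placeholders)
k <- ncol(X)
ss_prior <- SpikeSlabPrior(
  x = cbind(Intercept = 1, X), y = y,
  expected.model.size      = max(1, min(5, k)),  # E[sum(gamma_j)]
  prior.information.weight = 0.01,               # weakly informative prior weight
  diagonal.shrinkage       = 0.5                 # moderate shrinkage toward the diagonal (kappa = 0.5)
)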

Bayesian Inference via MCMC

Sampling Algorithm

Inference is performed through a hybrid Gibbs sampling scheme:

  1. Latent states (\(\alpha_{1:T}\)): Forward Filtering Backward Sampling⁴ (FFBS)
  2. Variances (\(\sigma^2_\epsilon, \sigma^2_\mu, \sigma^2_\delta\)): Gibbs with conjugate inverse-gamma priors
  3. Coefficients and indicators (\(\beta, \gamma\)): Stochastic Search Variable Selection⁵ (SSVS)

MCMC Configuration

  • Total iterations: 2000
  • Burn-in: 500 (25%)
  • Thinning: Not applied (all post-burn samples)
  • Chains: 1 (parallelization at pair level)
  • Seed: Fixed for reproducibility
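A minimal sketch of how the base and full fits could be specified with this configuration, again assuming the bsts package; the object names train_df, state_llt, and ss_prior, and the seed value, are illustrative only.

set.seed(2025)                      # fixed seed for reproducibility (arbitrary value)

# Base model: structural components only, no regressors
fit_base <- bsts(train_df$y,
                 state.specification = state_llt,
                 niter = 2000, ping = 0, seed = 2025)

# Full model: structural components + lagged regressors with spike-and-slab prior
fit_full <- bsts(y ~ ., data = train_df,
                 state.specification = state_llt,
                 prior = ss_prior,
                 niter = 2000, ping = 0, seed = 2025)

# The first 500 draws are discarded as burn-in when summarizing or predicting (burn = 500)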

Convergence Diagnostics

Although convergence diagnostics are not reported explicitly (to limit computational cost), standard checks include:
  • Visual inspection of trace plots
  • Within-chain autocorrelation
  • Effective sample size for key parameters
  • Stability of the inclusion probabilities for \(\gamma_j\)

Temporal Validation with Leave-Future-Out (LFO)⁶

Validation Scheme

We implement Leave-Future-Out (LFO) with rolling origin:

  1. Initial set: 80% of data or minimum 30 observations
  2. Forecast horizon: \(h = 6\) months
  3. Step between origins: 6 months
  4. Window type: Expanding (accumulates history)

This scheme is more granular than the one used for ECM-MARS (6-month horizon and step instead of 12), allowing finer temporal resolution in the evaluation of stability.
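A minimal sketch of how such expanding-origin splits can be generated; the function name and defaults are illustrative, not taken from the source code.

# Expanding-window Leave-Future-Out splits:
# initial training set = 80% of the data (at least 30 obs.),
# forecast horizon h = 6, step between origins = 6.
make_lfo_splits <- function(n, h = 6, step = 6, init_frac = 0.8, min_init = 30) {
  init <- max(min_init, floor(init_frac * n))
  origins <- seq(init, n - h, by = step)
  lapply(origins, function(o) {
    list(train_idx = 1:o,               # expanding window: all history up to the origin
         test_idx  = (o + 1):(o + h))   # next h observations
  })
}

# Example: 120 monthly observations -> 4 folds with training sizes 96, 102, 108, 114
splits <- make_lfo_splits(120)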

Procedure per Fold

For each fold \(f\) (a code sketch of the scaling and prediction steps follows the list):

  1. Data preparation:
    • Split into train/test according to split
    • Scale regressors per fold using train statistics
  2. Model fitting:
    • Base: Only structural components without regressors
    • Full: Structural components + regressors with spike-and-slab
  3. Probabilistic prediction:
    • Generate complete posterior predictive distribution
    • Extract mean, standard deviation, and quantiles
  4. Evaluation:
    • Calculate ELPD, point metrics, and calibration
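Sketch of steps 1 and 3 under the same assumptions as above; X, y, train_idx, test_idx, and fit_full are placeholders, and pred$distribution is the matrix of posterior predictive draws returned by predict in the bsts package.

# Scale regressors with training-set statistics only (no look-ahead)
mu_x <- colMeans(X[train_idx, , drop = FALSE])
sd_x <- apply(X[train_idx, , drop = FALSE], 2, sd)
train_df <- data.frame(y = y[train_idx],
                       scale(X[train_idx, , drop = FALSE], center = mu_x, scale = sd_x))
test_df  <- data.frame(scale(X[test_idx, , drop = FALSE], center = mu_x, scale = sd_x))

# Posterior predictive draws over the h = 6 test periods
pred  <- predict(fit_full, newdata = test_df, burn = 500)
draws <- pred$distribution                     # (niter - burn) x h matrix of draws
y_hat <- colMeans(draws)                       # predictive mean per test period
y_sd  <- apply(draws, 2, sd)                   # predictive standard deviation
qs    <- apply(draws, 2, quantile, probs = c(0.025, 0.10, 0.90, 0.975))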

Predictive Evaluation Metrics

Expected Log Predictive Density (ELPD)

For each future observation \(y_t^*\):

\[\text{ELPD}_t = \log \int p(y_t^* | \theta) p(\theta | \mathcal{D}_{\text{train}}) d\theta\]

Approximated via MCMC samples as:

\[\widehat{\text{ELPD}}_t = \log \left( \frac{1}{S} \sum_{s=1}^S p(y_t^* | \theta^{(s)}) \right)\]

where we assume normality: \(p(y_t^* | \theta^{(s)}) = N(y_t^*; \mu_t^{(s)}, \sigma_t^{2(s)})\), with \(\mu_t^{(s)}\) and \(\sigma_t^{(s)}\) the predictive mean and standard deviation implied by draw \(s\).
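This estimate can be computed directly from the MCMC output; a sketch in base R, assuming per-draw predictive means mu_draws (an S x h matrix) and per-draw observation standard deviations sigma_draws (e.g., the post-burn-in sigma.obs draws in bsts), with a log-sum-exp step for numerical stability.

# Per-draw log predictive densities for each held-out observation (S x h matrix)
log_lik <- sapply(seq_along(y_test), function(t)
  dnorm(y_test[t], mean = mu_draws[, t], sd = sigma_draws, log = TRUE))

# ELPD_t = log( (1/S) * sum_s exp(log_lik[s, t]) ), computed stably
elpd_t <- apply(log_lik, 2, function(ll) max(ll) + log(mean(exp(ll - max(ll)))))
elpd   <- sum(elpd_t)   # total ELPD over the h-step test window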

Point Error Metrics

Using the posterior mean as point prediction:

  • RMSE: \(\sqrt{\frac{1}{h} \sum_{t=1}^h (y_t - \bar{y}_t)^2}\)
  • MAE: \(\frac{1}{h} \sum_{t=1}^h |y_t - \bar{y}_t|\)

where \(\bar{y}_t = \frac{1}{S} \sum_{s=1}^S y_t^{(s)}\) is the posterior mean.

Calibration Metrics

We evaluate the quality of quantified uncertainty:

  • 80% Coverage: Proportion of observations within 80% credible interval
  • 95% Coverage: Proportion of observations within 95% credible interval
  • PIT (Probability Integral Transform): \(u_t = F_{Y_t}(y_t)\) where \(F_{Y_t}\) is the predictive CDF

If the model is well-calibrated, PIT values follow a uniform distribution on [0,1].
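A sketch of these calibration checks from the predictive draws (draws and y_test as in the per-fold step above):

# Empirical coverage of central credible intervals
q80 <- apply(draws, 2, quantile, probs = c(0.10, 0.90))
q95 <- apply(draws, 2, quantile, probs = c(0.025, 0.975))
cover80 <- mean(y_test >= q80[1, ] & y_test <= q80[2, ])
cover95 <- mean(y_test >= q95[1, ] & y_test <= q95[2, ])

# PIT values via the empirical predictive CDF; they should look uniform on [0, 1]
pit <- sapply(seq_along(y_test), function(t) mean(draws[, t] <= y_test[t]))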

Optimal Structure Selection

Structure Grid

We systematically evaluate:

| Model | Level      | Trend      | Seasonality | Free Parameters |
|-------|------------|------------|-------------|-----------------|
| LL    | Stochastic | No         | Optional    | \(\sigma^2_\mu, \sigma^2_\epsilon\) |
| LLT   | Stochastic | Stochastic | Optional    | \(\sigma^2_\mu, \sigma^2_\delta, \sigma^2_\epsilon\) |

Selection Criterion

The optimal structure is selected through:

  1. Primary: Highest average \(\Delta\text{ELPD}\) across folds
  2. Secondary: Highest support (proportion of winning folds)
  3. Tertiary: Highest \(\Delta\text{RMSE}\) (error reduction)

This hierarchy prioritizes probabilistic predictive capacity over point fit.
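With the per-structure summaries collected in a tibble (using the column names of the output structure shown later), this hierarchy reduces to a simple ordering; a sketch with dplyr:

library(dplyr)

best_structure <- all_summaries %>%
  arrange(desc(dELPD_mean),     # primary: average ELPD gain across folds
          desc(support),        # secondary: proportion of winning folds
          desc(dRMSE_mean)) %>% # tertiary: average error reduction
  slice(1)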

Victory and Stability Criterion

Victory per Fold

A full model “wins” in fold \(f\) if:

\[\begin{cases} \Delta\text{ELPD}_f > 0 & \text{(better predictive density)} \\ \Delta\text{RMSE}_f > 0 & \text{(lower error, i.e., RMSE}_{\text{base}} - \text{RMSE}_{\text{full}} > 0\text{)} \end{cases}\]

Note: Unlike GLM-AR(1), here \(\Delta\text{RMSE} = \text{RMSE}_{\text{base}} - \text{RMSE}_{\text{full}}\), so positive values indicate improvement.

Stability Metrics

  • Support: \(\frac{\text{wins}}{\text{folds}}\) = proportion of victorious folds
  • Strict threshold: support \(\geq 0.70\) with minimum 5 folds
  • Moderate threshold: support \(\geq 0.60\) with minimum 5 folds
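Per-fold victories and the resulting support can be computed directly from the fold-level results tibble (column names as in the output structure below); a sketch:

library(dplyr)

summary_tbl <- best_results %>%
  mutate(win = (dELPD > 0) & (dRMSE > 0)) %>%       # victory criterion per fold
  summarise(folds      = n(),
            wins       = sum(win),
            support    = wins / folds,
            dELPD_mean = mean(dELPD),
            dRMSE_mean = mean(dRMSE))

# Stability thresholds
passes_strict   <- summary_tbl$support >= 0.70 & summary_tbl$folds >= 5
passes_moderate <- summary_tbl$support >= 0.60 & summary_tbl$folds >= 5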

Computational Implementation and Optimizations

Processing Architecture

  1. Pair-level processing: The 84 pairs are processed sequentially and independently, so parallelization (when used) occurs at the pair level rather than within a model
  2. MCMC efficiency: Single chain per model (speed vs diagnostics trade-off)
  3. Matrix caching: Reuse of decompositions in Kalman filter

Special Case Management

The code includes robust handling of:

  • Zero variance: Skip fold if \(\text{sd}(y) \approx 0\)
  • Convergence failures: Try-catch with informative messages
  • Singular matrices: Automatic regularization in Kalman filter
  • Degenerate predictions: Fallback to quantile-based intervals

Output Data Structure

Each pair generates:

list(
  best_summary = tibble(     # Best model summary
    spec, folds, wins, support, 
    dELPD_mean, dRMSE_mean, ...
  ),
  best_results = tibble(     # Details per fold
    fold, n_train, n_test,
    ELPD_base, ELPD_full, dELPD,
    RMSE_base, RMSE_full, dRMSE,
    cover80, cover95, pit, ...
  ),
  all_summaries = tibble()   # All tested structures
)

Comparative Analysis with Previous Methodologies

Comparative Approach Table

| Aspect              | ECM-MARS            | GLM-AR(1)            | BSTS                         |
|---------------------|---------------------|----------------------|------------------------------|
| Paradigm            | Hybrid frequentist  | Parametric Bayesian  | Structural Bayesian          |
| Prerequisites       | I(1), cointegration | None                 | None                         |
| Non-linearity       | MARS (splines)      | No                   | No (but flexible components) |
| Variable selection  | Manual (pre-tests)  | No                   | Automatic (spike-and-slab)   |
| Temporal components | ECM only            | Global AR(1)         | Level, trend, seasonality    |
| Uncertainty         | Implicit bootstrap  | Complete posterior   | Complete posterior + states  |
| Interpretability    | High (economic)     | Medium               | High (decomposition)         |
| Computational cost  | Medium              | High                 | Very high                    |

Distinctive Advantages of BSTS

  1. Interpretable decomposition: Separates signal from noise with economically meaningful components

  2. Automatic selection: Spike-and-slab identifies relevant predictors without pre-specification

  3. Structural uncertainty handling: Quantifies uncertainty in unobservable components

  4. Robustness to specification: Does not require assumptions about integration order or cointegration

  5. Interval forecasts: Natural and well-calibrated prediction intervals

Relative Limitations

  1. High computational cost: FFBS + SSVS is intensive, especially with many regressors

  2. Linearity in predictors: Does not capture non-linear interactions like MARS

  3. Complex diagnostics: Convergence harder to verify than OLS

  4. Sensitivity to priors in small samples: Especially for component variances

Extensions and Future Developments

Immediate Improvements

  1. Additional components:
    • Calendar effects: Business days, moving holidays
    • Level shifts: Automatic detection of structural breaks
    • Dynamic regressors: Time-varying coefficients
  2. Informative priors:
    • Incorporate economic information in inclusion priors
    • Hierarchical priors for groups of related variables
  3. Enhanced validation:
    • Approximate cross-validation via Pareto-smoothed importance sampling (PSIS)
    • Backtesting with multiple horizons

Methodological Extensions

  1. Non-linearity:
    • Gaussian processes for non-linear components
    • Bayesian neural networks for interaction capture
  2. Hierarchical models:
    • Partial pooling between related pairs
    • Shared latent factors
  3. Causality:
    • Intervention models with causal inference
    • Dynamic causal graphs

Complete Pipeline Pseudocode

INPUT: DataFrame with time series, variable specification
OUTPUT: Ranking of pairs by predictive capacity and stability

# PREPARATION
1. LOAD data and clean names
2. IDENTIFY 6 circulation variables and 7 production variables
3. GENERATE 84 pairs (2 × 6 × 7)

# PROCESSING PER PAIR
FOR each pair (Y, X):
  
  4. BUILD matrix with Y, X, lags(X, 1:6)
  5. REMOVE observations with NA
  
  # STRUCTURE TUNING
  FOR each structure in {LL, LLT}:
    
    # LEAVE-FUTURE-OUT
    6. GENERATE LFO splits (init=80%, h=6, step=6)
    
    FOR each fold:
      7. SPLIT train/test
      8. SCALE regressors with train stats
      
      # MODELS
      9. FIT base_model:
         - state_spec = structure without regressors
         - MCMC with 2000 iter, 500 burn-in
      
      10. FIT full_model:
          - state_spec = structure
          - prior = spike_slab(X_lags, expected_size=5)
          - MCMC with 2000 iter, 500 burn-in
      
      # PREDICTION
      11. GENERATE predictive distributions (h=6)
      12. EXTRACT mean, sd, quantiles
      
      # EVALUATION
      13. CALCULATE:
          - ELPD_base, ELPD_full → ΔELPD
          - RMSE_base, RMSE_full → ΔRMSE
          - Coverage 80%, 95%
          - PIT values
      
      14. DETERMINE victory: (ΔELPD > 0) ∧ (ΔRMSE > 0)
    
    15. SUMMARIZE structure:
        - support = wins/folds
        - metric averages
  
  16. SELECT best structure by ΔELPD
  17. SAVE pair results

# RANKING AND EXPORT
18. ORDER pairs by (support DESC, ΔELPD DESC, ΔRMSE DESC)
19. FILTER winners by thresholds (0.70, 0.60)
20. EXPORT CSVs and visualizations
21. GENERATE plots for last fold (without re-fitting)

Methodological Conclusions

The BSTS methodology represents the most comprehensive and flexible approach of the three implemented, combining the rigor of Bayesian inference with the interpretability of structural decomposition. Unlike ECM-MARS which requires prior validation of strict econometric assumptions, and GLM-AR(1) which models only global serial dependence, BSTS decomposes the series into interpretable components while automatically selecting relevant predictors.

Empirical results suggest that BSTS identifies robust predictive relationships with high precision, albeit at substantially higher computational cost. The ability to quantify uncertainty at all levels—from structural components to predictions—makes it particularly valuable for applications where risk assessment is critical.

The implementation with a 6-month horizon (vs 12 in other methodologies) provides more granular evaluation of temporal stability, revealing that many relationships appearing stable at long horizons show variability at shorter scales. This suggests the importance of considering multiple horizons in predictive model evaluation.

The BSTS framework is especially appropriate when:
  • Component interpretation is a priority
  • There is uncertainty about which predictors to include
  • Well-calibrated prediction intervals are required
  • Data exhibit multiple sources of variation (trend, seasonality, regressors)

The combination of automatic variable selection with structural modeling positions BSTS as a bridge between classical econometric methods and modern machine learning techniques, maintaining interpretability while embracing the inherent complexity in economic data.



  1. Structural decomposition separates systematic variation (trend, seasonality) from random variation, facilitating economic interpretation and improving forecasts by modeling each component appropriately.

  2. The spike-and-slab prior combines a continuous distribution (slab) for active coefficients with a point mass at zero (spike) for inactive coefficients, enabling exact variable selection.

  3. Calibration of predictive probabilities is crucial for decision-making under uncertainty. A well-calibrated model assigns probabilities that reflect long-run empirical frequencies.

  4. Forward Filtering Backward Sampling (FFBS) is the standard algorithm for sampling states in linear Gaussian models, combining forward Kalman filtering with backward sampling conditioned on complete data.

  5. Stochastic Search Variable Selection (SSVS) is an MCMC method for exploring model space, alternating between updating coefficients given the model and updating the model given coefficients.

  6. The validation scheme with h=6 and step=6 avoids overlap between test sets, ensuring independence in evaluation while maximizing data usage.