The gmwmx2 R package allows to estimate
linear model with correlated residuals in presence of missing data.
More precisely, we assume the following model: \[\begin{equation} \boldsymbol{Y} = \boldsymbol{X} \boldsymbol{\beta} + \boldsymbol{\varepsilon}, \quad \boldsymbol{\varepsilon} \sim \mathcal{F}\left\{\boldsymbol{\Sigma}\left(\boldsymbol{\gamma}\right)\right\}, \end{equation}\]
where \(\boldsymbol{X} \in \mathbb{R}^{n \times p}\) is a design matrix of observed predictors, \(\boldsymbol{\beta} \in \mathbb{R}^p\) is the regression parameter vector and \(\boldsymbol{\varepsilon} = \{\varepsilon_{i}\}_{i=1,\ldots,n}\) is a zero-mean process following an unspecified joint distribution \(\mathcal{F}\) with positive-definite covariance function \(\boldsymbol{\Sigma}(\boldsymbol{\gamma}) \in \mathbb{R}^{n\times n}\) characterizing the second-order dependence structure of the process and parameterized by the vector \(\boldsymbol{\gamma} \in \boldsymbol{\Gamma} \subset \mathbb{R}^q\).
We then define the a random variable \(\boldsymbol{Z} =\{Z_{i}\}_{i=1,\ldots,n}\) which describes the missing observation mechanism. More specifically, the vector \(\boldsymbol{Z} \in \{0, 1\}^n\) is a binary-valued stationary process independent of \(\boldsymbol{Y}\) with expectation \(\mu(\boldsymbol{\vartheta}) = \mathbb{E}[Z_i] \in [0, \, 1)\), \(\forall \, i\), and covariance matrix \(\boldsymbol{\Lambda}(\boldsymbol{\vartheta}) = \mathbb{V}[\boldsymbol{Z}] \in \mathbb{R}^{n\times n}\) whose structure is assumed known up to the parameter vector \(\boldsymbol{\vartheta} \in \boldsymbol{\Upsilon} \subset \mathbb{R}^k\)
We then assume that we only observe the stochastic process \(\tilde{\boldsymbol{Y}} = \boldsymbol{Z} \odot \boldsymbol{Y}\), where \(\odot\) denotes the Hadamard product. Hence, \(\tilde{\boldsymbol{Y}} \in \mathbb{R}^n = \boldsymbol{Y} \odot \boldsymbol{Z}\) represents the observed process vector with null elements in the positions where observations are missing.
Using \(\otimes\) to denote the Kronecker product, we then define \(\tilde{\boldsymbol{X}} = \boldsymbol{Z} \otimes \boldsymbol{1}^T \odot \boldsymbol{X} \in \mathbb{R}^{n \times p}\) as the design matrix \(\boldsymbol{X}\) with zero-valued vectors for the rows where observations are missing in \(\boldsymbol{Y}\) (where \(\boldsymbol{1}\) represents a vector of ones of dimension \(p\)).
We estimate the parameters \(\boldsymbol{\beta}\) with the least square estimator:
\[\begin{equation} \hat{\boldsymbol{\beta}} = \left(\tilde{\boldsymbol{X}}^T \tilde{\boldsymbol{X}}\right)^{-1} \tilde{\boldsymbol{X}}^T \tilde{\boldsymbol{Y}}. \end{equation}\]
We compute the estimated residuals as \(\hat{\boldsymbol{\varepsilon}} = {\tilde{\boldsymbol{Y}}} - \tilde{{\boldsymbol{X}}} \hat{\boldsymbol{\beta}}\).
We then estimate with the Maximum Likelihood Estimator the parameters \(\boldsymbol{\vartheta}\) of the missingness process \(\boldsymbol{Z}\) assuming that \(\boldsymbol{Z}\) is generated from a Markov model with the following transition probabilities:
\[\begin{equation} \label{eq:markov_model_def} \begin{aligned} & P\left( Z_2=1 \mid Z_1=1\right)=1 - p_1 \\ & P\left(Z_2=1 \mid Z_1=0\right) = p_2 \\ & P\left(Z_2=0 \mid Z_1=1\right)=p_1 \\ & P\left(Z_2=0 \mid Z_1=0\right)=1-p_2. \end{aligned} \end{equation}\]
We then estimate the parameters \(\boldsymbol{\gamma}\) using a Generalized method of Wavelet Moments approach (Guerrier et al. 2013) and using the fact that the variance-covariance matrix of \(\hat{\boldsymbol{\varepsilon}}\) is given by:
\[\boldsymbol{\Sigma}_{\hat{\boldsymbol{\varepsilon}}}(\boldsymbol{\gamma}) = [\boldsymbol{\Lambda}(\hat{\boldsymbol{\vartheta}}) + \mu(\hat{\boldsymbol{\vartheta}})^2 \mathbf{1} \mathbf{1}^T ] \odot ( \boldsymbol{I} - \boldsymbol{P})\boldsymbol{\Sigma}(\boldsymbol{\gamma}) (\boldsymbol{I} - \boldsymbol{P})\]
where \(\boldsymbol{P} = \boldsymbol{X}(\boldsymbol{X}^T\boldsymbol{X})^{-1}\boldsymbol{X}^T\) and \(\boldsymbol{I}\) is the identity matrix of dimension \(n\times n\).
More precisely, we rely on the result of Xu et al. (2017) that provide a computationally efficient form of the theoretical Allan variance (equivalent to the Haar wavelet variance up to a constant) for zero-mean stochastic processes such as \(\boldsymbol{\varepsilon}\) to avoid computing these large matrices multiplication in the objective function. Indeed in Xu et al. (2017) they generalize the results in Zhang (2008) to zero-mean non-stationary processes by using averages of the diagonals and super-diagonals of the covariance matrix of \(\boldsymbol{\varepsilon}\). What this implies is that the GMWM, which uses this form, does not require the storage of the \(n \times n\) covariance matrix of \(\boldsymbol{\varepsilon}\), but only of a vector of dimension \(n\) which is then plugged into an explicit formula consisting in a linear combination of the elements of this vector (these elements being averages of the diagonal and super-diagonals of the covariance matrix).
While the GMWMX as described above and in more details in Voirol et al. (2024), is a general method for
estimating large linear model with complex dependence structure in
presence of missing data, the gmwmx2 R package
is currently developed specifically to estimate tectonic velocities from
position times series in graticule distance coordinates system (GD)
provided by the Nevada geodetic Laboratory (Blewitt 2024; Blewitt, Hammond, and Kreemer
2018).
To estimate the trajectory model (see e.g., Bevis and Brown (2014) for more details), we construct the design matrix \(\boldsymbol{X}\) such that \(i\)-th component of the vector \(\mathbf{X} {{\boldsymbol{\beta}}}\) can be described as follows with \(t_i\) representing the \(i^{th}\) ordered time point (epoch) indicated in Modified Julian Date and \(t_0\) representing the reference epoch located exactly in the middle of start and end point of the time series:
\[\begin{split} \mathbb{E}[\mathbf{Y}_i] &= \mathbf{X}_i^T {{\boldsymbol{\beta}}} \\ &= a + b \left(t_{i} - t_0\right) + \sum_{h=1}^{m}\left[c_{h} \sin \left(2 \pi f_{h} t_{i}\right) + d_h \cos \left(2 \pi f_h t_i\right)\right] + \\& \sum_{j=1}^{r}e_j H\left(t_i - t_j\right) + \sum_{k = 1 }^{s} l_k \left[1- \exp\left(\frac{-(t_i-t_k)}{\tau_k}\right)\right]H\left(t_{i}-\tau_k\right) \end{split}\]where \(a\) is the initial position at the reference epoch \(t_0\), \(b\) is the velocity parameter, \(c_h\) and \(d_h\) are the periodic motion parameters (\(h = 1\) and \(h = 2\) represent the annual and semi-annual seasonal terms, respectively with \(f_1 = \frac{1}{365.25}\) and \(f_2 = \frac{2}{365.25}\)). The offset terms models earthquakes, equipment changes or human intervention in which \(e_j\) is the magnitude of the step at epochs \(t_j\), \(r\) is the total number of offsets, \(H\) is the Heaviside step function defined as \(H(x):= \begin{cases}1, & x \geq 0 \\ 0, & x<0\end{cases}\) and the last term allow to model post-seismic deformation (see e.g., Sobrero et al. (2020)) where \(s\) is the number of post seismic relaxation time specified, \(t_k\) is the time when the relaxation \(k\) starts in Modified Julian Date (MJD), \(\tau_k\) is the relaxation period in days for the post-seismic relaxation \(k\) and \(l_k\) is the amplitude of the transient. Note that by default the estimates of the functional parameters are provided in unit/day.
When loading data from a specific station using the function
gmwmx2::download_station_ngl(), we extract from the Nevada
Geodetic Laboratory the position time series in GD coordinates, the time
steps associated with a equipment or software change and the time steps
associated with an earthquake near the station. All these objects are
stored in a object of class gnss_ts_ngl.
When applying the function gmwmx2::gmwmx2() to an object
of class gnss_ts_ngl, we construct the design matrix \(\boldsymbol{X}\) by considering an offset
term for all equipment or software changes steps and all earthquakes
indicated by the NGL. We also specify a post-seismic relaxation term for
all earthquakes indicated by the NGL. If no relaxation time is specified
in the argument vec_earthquakes_relaxation_time, we
consider a default relaxation time of \(365.25\) days.
It is generally recognized that noise in GNSS time series is best described by a combination of colored noise plus white noise (He et al. 2017; Langbein 2008; Williams et al. 2004; Bos et al. 2013) where the white noise generally model noise at high frequencies and the colored noise model the lower frequencies of the spectrum. In a large study on the noise properties of GNSS signals, concluded that the optimal noise models for 80–90% of GNSS time series signals are the power law and white noise model or white noise and flicker/pink noise with model. The package currently support both stochastic model specification.
More precisely, the power spectrum of a power-law noise has the following form: \[ P(f)=P_0\left(f / f_s\right)^\kappa \] where \(f\) is the frequency, \(P_0\) is a constant, \(f_s\) the sampling frequency and the exponent \(\kappa\) is called the spectral index.
Many stochastic noise can be expressed as such, for example, a spectral index \(\kappa=0\) produces a white noise, a spectral index \(\kappa=-2\) produces a red noise or random walk and a spectral index \(\kappa=-1\) produce a flicker noise, also called pink noise.
Granger (1980) and Hosking (1981) showed that power-law noise with a spectral index between \(-1\) and \(1\) can be obtained by using fractional differencing of Gaussian noise:
\[ (1-B)^{-\kappa / 2} \mathbf{v} \]
where \(B\) is the backward-shift operator \(\left(B v_i=v_{i-1}\right)\) and \(\mathbf{v}\) a vector with independent and identically distributed (IID) Gaussian noise.
Following from Hosking’s definition of the fractional differencing, we obtain
\[ \begin{aligned} (1-B)^{-\kappa / 2} & =\sum_{i=0}^{\infty}\binom{-\kappa / 2}{i}(-B)^i \\ & =1-\frac{\kappa}{2} B-\frac{1}{2} \frac{\kappa}{2}\left(1-\frac{\kappa}{2}\right) B^2+\ldots \\ & =\sum_{i=0}^{\infty} h_i \end{aligned} \] with the coefficient \(h_i\) that can be computed using the following recurrence relation (Kasdin and Walter 1992):
\[ \begin{aligned} h_0 & =1 \\ h_i & =\left(i-\frac{\kappa}{2}-1\right) \frac{h_{i-1}}{i} \quad \text { for } i>0 \end{aligned} \]
Assuming an infinite sequence of zero-mean white noise \(\mathbf{v}\), with variance \(\sigma_{P L}^2\), and a spectral index \(kappa > -1\) then the autocovariance \(\gamma(\tau)=\operatorname{Cov}\left(X_t, X_{t+\tau}\right)=\mathbb{E}\left[\left(X_t-\mu\right)\left(X_{t+\tau}-\mu\right)\right]\) is (Bos et al. 2008):
\[
\begin{aligned}
& \gamma(0)=\sigma_{p l}^2
\frac{\Gamma(1-\alpha)}{\left(\Gamma\left(1-\frac{\alpha}{2}\right)\right)^2}
\\
&
\gamma(\tau)=\frac{\frac{\alpha}{2}+\tau-1}{-\frac{\alpha}{\alpha}+\tau}
\gamma(\tau-1) \text { for } \tau>0
\end{aligned}
\] When the argument stochastic_model is set to
"wn + pl", the stochastic model considered includes both
white noise and power-law with the specified above stationary
autocovariance structure. The parameters estimated are: \(\sigma^2_{W N}\), \(\kappa\) (constrained to be greater than
\(-1\)) and \(\sigma^2_{P L}\).
When the argument stochastic_model is set to
"wn + fl", the stochastic model considered includes both
white noise and flicker noise (not stationary power-law noise with a
spectral index \(\kappa=-1\)) where the
variance covariance of the flicker noise \(\omega\) is obtained as follows (see e.g.,
(Bos et al. 2008)):
\[ \operatorname{Cov}(\omega) = \sigma^2_{F L}\mathbf{U}^T \mathbf{U} \]
where \[ \mathbf{U}=\left(\begin{array}{cccc} h_0 & h_1 & \ldots & h_n \\ 0 & h_0 & & h_{n-1} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \ldots & h_0 \end{array}\right) \] with the coefficients \(h_i\) computed considering a spectral index \(\kappa=-1\).
The stochastic parameters estimated are: \(\sigma^2_{W N}\), \(\sigma^2_{F L}\) and \(\kappa\) is fixed to \(-1\).
Let us showcase how to estimate the tectonic velocity in for one specific component (North, East or Vertical) of one station.
Let us first load the gmwmx2 package.
## Summary of Estimated Model
## -------------------------------------------------------------
## Functional parameters
## -------------------------------------------------------------
## Parameter                  Estimate  Std_Deviation  95% CI Lower  95% CI Upper
## -------------------------------------------------------------
## Intercept               -0.70924596   0.00880076  -0.72649514  -0.69199677
## Trend                    0.00007198   0.00000836   0.00005559   0.00008837
## Sin (Annual)             0.00135377   0.00034295   0.00068160   0.00202593
## Cos (Annual)            -0.00062838   0.00036296  -0.00133976   0.00008300
## Sin (Semi-Annual)        0.00017180   0.00023908  -0.00029679   0.00064038
## Cos (Semi-Annual)        0.00030877   0.00024661  -0.00017457   0.00079211
## Jump: MJD 56248          0.01588617   0.00157098   0.01280710   0.01896524
## Jump: MJD 55391          0.00780478   0.00153943   0.00478754   0.01082201
## Jump: MJD 55563          0.00230763   0.00175248  -0.00112717   0.00574242
## Jump: MJD 55603          0.00286880   0.00237840  -0.00179277   0.00753037
## Jump: MJD 55606         -0.00044386   0.00351707  -0.00733719   0.00644947
## Jump: MJD 56011         -0.00008945   0.00166129  -0.00334552   0.00316662
## Jump: MJD 56085          0.00073436   0.00170679  -0.00261088   0.00407960
## Jump: MJD 56748         -0.00038578   0.00140095  -0.00313159   0.00236004
## Jump: MJD 57281         -0.00145742   0.00154361  -0.00448285   0.00156800
## Earthquake: MJD 55391    0.01576163   0.00744415   0.00117137   0.03035190
## Earthquake: MJD 55563    0.00174787   0.02577951  -0.04877903   0.05227478
## Earthquake: MJD 55603   -0.28371260   0.53155129  -1.32553398   0.75810879
## Earthquake: MJD 55606    0.27719703   0.52564189  -0.75304214   1.30743621
## Earthquake: MJD 56011   -0.00480858   0.01328379  -0.03084432   0.02122717
## Earthquake: MJD 56085   -0.00772227   0.01248180  -0.03218615   0.01674162
## Earthquake: MJD 56748   -0.00910404   0.00544289  -0.01977191   0.00156382
## Earthquake: MJD 57281   -0.02236505   0.00647710  -0.03505994  -0.00967016
## -------------------------------------------------------------
## Stochastic parameters
## -------------------------------------------------------------
##  White Noise Variance  :     0.00000122
##  Stationary powerlaw Spectral index:    -0.83882877
##  Stationary powerlaw Variance:     0.00000337
## -------------------------------------------------------------
## Missingness parameters
## -------------------------------------------------------------
##  P(Z_{i+1} = 0 | Z_{i} = 1): 0.00279460
##  P(Z_{i+1} = 1 | Z_{i} = 0): 0.31578947
##  \hat{E[Z]}: 0.99122807
## -------------------------------------------------------------
## Running time: 0.34 seconds
## -------------------------------------------------------------By default, the estimated parameters are provided in m/day, we can
optionally scale the estimated functional parameters so that they are
returned in m/year with the argument scale_parameters.
## Summary of Estimated Model
## -------------------------------------------------------------
## Functional parameters
## -------------------------------------------------------------
## Parameter                  Estimate  Std_Deviation  95% CI Lower  95% CI Upper
## -------------------------------------------------------------
## Intercept              -259.05208570   3.21447935 -265.35234945 -252.75182195
## Trend                    0.02629027   0.00305374   0.02030506   0.03227549
## Sin (Annual)             0.49446304   0.12526192   0.24895419   0.73997189
## Cos (Annual)            -0.22951487   0.13256987  -0.48934705   0.03031731
## Sin (Semi-Annual)        0.06274864   0.08732351  -0.10840228   0.23389957
## Cos (Semi-Annual)        0.11277837   0.09007298  -0.06376143   0.28931817
## Jump: MJD 56248          5.80242306   0.57380175   4.67779229   6.92705382
## Jump: MJD 55391          2.85069445   0.56227851   1.74864881   3.95274009
## Jump: MJD 55563          0.84286159   0.64009281  -0.41169727   2.09742045
## Jump: MJD 55603          1.04782942   0.86870919  -0.65480931   2.75046815
## Jump: MJD 55606         -0.16211875   1.28460983  -2.67990775   2.35567026
## Jump: MJD 56011         -0.03267118   0.60678682  -1.22195150   1.15660914
## Jump: MJD 56085          0.26822620   0.62340406  -0.95362331   1.49007571
## Jump: MJD 56748         -0.14090556   0.51169791  -1.14381504   0.86200392
## Jump: MJD 57281         -0.53232443   0.56380463  -1.63736120   0.57271233
## Earthquake: MJD 55391    5.75693615   2.71897531   0.42784247  11.08602982
## Earthquake: MJD 55563    0.63841028   9.41596471 -17.81654143  19.09336199
## Earthquake: MJD 55603  -103.62602673 194.14910865 -484.15128733 276.89923386
## Earthquake: MJD 55606  101.24621682 191.99070042 -275.04864137 477.54107500
## Earthquake: MJD 56011   -1.75633307   4.85190369 -11.26588956   7.75322342
## Earthquake: MJD 56085   -2.82055781   4.55897918 -11.75599281   6.11487720
## Earthquake: MJD 56748   -3.32525134   1.98801468  -7.22168852   0.57118584
## Earthquake: MJD 57281   -8.16883558   2.36576219 -12.80564427  -3.53202690
## -------------------------------------------------------------
## Stochastic parameters
## -------------------------------------------------------------
##  White Noise Variance  :     0.00000122
##  Stationary powerlaw Spectral index:    -0.83882877
##  Stationary powerlaw Variance:     0.00000337
## -------------------------------------------------------------
## Missingness parameters
## -------------------------------------------------------------
##  P(Z_{i+1} = 0 | Z_{i} = 1): 0.00279460
##  P(Z_{i+1} = 1 | Z_{i} = 0): 0.31578947
##  \hat{E[Z]}: 0.99122807
## -------------------------------------------------------------
## Running time: 0.34 seconds
## -------------------------------------------------------------## Summary of Estimated Model
## -------------------------------------------------------------
## Functional parameters
## -------------------------------------------------------------
## Parameter                  Estimate  Std_Deviation  95% CI Lower  95% CI Upper
## -------------------------------------------------------------
## Intercept               -0.70924596   0.01136945  -0.73152967  -0.68696225
## Trend                    0.00007198   0.00001083   0.00005074   0.00009321
## Sin (Annual)             0.00135377   0.00043414   0.00050287   0.00220467
## Cos (Annual)            -0.00062838   0.00045622  -0.00152254   0.00026579
## Sin (Semi-Annual)        0.00017180   0.00027821  -0.00037348   0.00071707
## Cos (Semi-Annual)        0.00030877   0.00028861  -0.00025689   0.00087443
## Jump: MJD 56248          0.01588617   0.00191916   0.01212469   0.01964764
## Jump: MJD 55391          0.00780478   0.00182956   0.00421891   0.01139065
## Jump: MJD 55563          0.00230763   0.00190189  -0.00142001   0.00603527
## Jump: MJD 55603          0.00286880   0.00235741  -0.00175163   0.00748923
## Jump: MJD 55606         -0.00044386   0.00358112  -0.00746272   0.00657500
## Jump: MJD 56011         -0.00008945   0.00192695  -0.00386620   0.00368730
## Jump: MJD 56085          0.00073436   0.00199984  -0.00318524   0.00465397
## Jump: MJD 56748         -0.00038578   0.00187908  -0.00406870   0.00329714
## Jump: MJD 57281         -0.00145742   0.00187987  -0.00514189   0.00222704
## Earthquake: MJD 55391    0.01576163   0.00930876  -0.00248320   0.03400647
## Earthquake: MJD 55563    0.00174787   0.02766816  -0.05248072   0.05597647
## Earthquake: MJD 55603   -0.28371260   0.52755951  -1.31771024   0.75028504
## Earthquake: MJD 55606    0.27719703   0.52174625  -0.74540682   1.29980089
## Earthquake: MJD 56011   -0.00480858   0.01509158  -0.03438754   0.02477039
## Earthquake: MJD 56085   -0.00772227   0.01454627  -0.03623244   0.02078791
## Earthquake: MJD 56748   -0.00910404   0.00725836  -0.02333016   0.00512207
## Earthquake: MJD 57281   -0.02236505   0.00811536  -0.03827086  -0.00645925
## -------------------------------------------------------------
## Stochastic parameters
## -------------------------------------------------------------
##  White Noise Variance  :     0.00000193
##  Flicker Noise Variance:     0.00000251
## -------------------------------------------------------------
## Missingness parameters
## -------------------------------------------------------------
##  P(Z_{i+1} = 0 | Z_{i} = 1): 0.00279460
##  P(Z_{i+1} = 1 | Z_{i} = 0): 0.31578947
##  \hat{E[Z]}: 0.99122807
## -------------------------------------------------------------
## Running time: 1.01 seconds
## -------------------------------------------------------------