9 – Copulas in Time Series Analysis

Abstract

In previous chapters, we have mainly discussed copula models for bivariate/multivariate random variables. Now we ask two other questions that usually arise in hydrology and water resources engineering. Can we use the stochastic approach to predict streamflow at a downstream location using streamflow at the upstream location? If streamflow is time dependent, then it cannot be considered as a random variable as is done in frequency analysis. Can we model the temporal dependence of an at-site streamflow sequence (e.g., monthly streamflow) more robustly than with the classical time series and Markov modeling approach (e.g., modeling the nonlinearity of time series freely)? This chapter attempts to address these questions and introduces how to model a time series with the use of copula approach.

9 Copulas in Time Series Analysis

9.1 General Concept of Time Series Modeling

In this section, we briefly introduce time series modeling. The reader may refer to Box et al. (2008) for a complete discussion. A time series (more specifically with even time intervals) may be stationary, nonstationary, or long-memory; linear or nonlinear. Following Box et al. (2008), a general form of a linear time series Y_tYt may be written as follows:

ϕ(B)(1 − B)^dY_t = c + θ(B)a_t

ϕB1−BdYt=c+θBat(9.1)

where Y_tYt is the time series; B is the backward operator; d is the differencing operator; d = 0d=0 for stationary; d is a positive integer (usually d = 1 or 2d=1or2) for nonstationary; d ∈ (0, 1)d∈01 for long memory time series; ϕ(B) = 1 − ϕ₁B − ϕ₂B² − … − ϕ_pB^pϕB=1−ϕ1B−ϕ2B2−···−ϕpBp is the autoregressive term; θ(B) = 1 + θ₁B + θ₂B² + … + θ_qB^qθB=1+θ1B+θ2B2+···+θqBq is the moving average term; and a_tat is the innovation (i.e., white noise and more specifically white Gaussian noise).

The classic time series model given in Equation (9.1) may be identified with the following procedures:

1. Graph the sample autocorrelation (ACF) and partial autocorrelation (PACF) function for time series {X_t : t = 1, …, n}Xt:t=1…n.
2. Identify the possible model order from sample ACF and PACF, if the visual evidences are observed:
1. i. If sample ACF falls into the 95% confidence bound quickly, then the time series X_tXt may be considered stationary (shown in Figure 9.1(a)); otherwise, the time series is nonstationary or long memory (Figure 9.1(b)), and differencing is needed to convert a nonstationary time series into the stationary time series (Figure 9.1(c)).
2. ii. With the stationary time series, the model order may then be estimated from the sample ACF and PACF as follows: (a) if the cutoff point in ACF with the PACF falls into the 95% confidence bound, we will have moving average (MA) time series model (Figure 9.2(a)); (b) if the cutoff point in PACF with the ACF falls into the 95% confidence bound, we will have autoregressive (AR) time series model (Figure 9.2(b)); and (c) if both ACF and PACF fall into the 95% confidence bound, we will have an autoregressive and moving average (ARMA) time series model (Figure 9.2(c)).
3. Estimate the model parameters for the stationary time series with the assumption of model residual: at~N0σa2.

Figure 9.1 Sample autocorrelation function illustration plots.

Figure 9.2 Sample ACF and PACF for the simulated stationary time series.

With the preceding initial introduction, we will now further illustrate Equation (9.1) using streamflow as an example. It is supposed that the differencing order, d = 0, occurs most likely for a watershed before experiencing climate change and/or alteration by human activities; d = 1 occurs most likely for the watershed with these impacts; and d ∈ (0, 0.5)d∈00.5 occurs usually for reservoir operations. In other words, the original stationary streamflow series (or the stationary streamflow series after necessary differencing) at time t is dependent on the value at previous p times (i.e., it depends on the streamflow at t − 1, t − 2, …, t − pt−1,t−2,…,t−p). Constant c relates to the long-term average of the stationary series given in Equation (9.2b). θ(B) = 1 + θ₁B + θ₂B² + … + θ_qB^qθB=1+θ1B+θ2B2+…+θqBq represents the moving average term. Replacing w_t = (1 − B)^dY_twt=1−BdYt such that Y_tYt is now written as stationary time series after necessary differencing, Equation (9.1) may be rewritten as follows:

wt=c+φ1wt−1+φ2wt−2+···+φpwt−p+at+θ1at−1+θ2at−2+…+θqat−q=c+∑i=1pφiwt−i+at+∑j=1qθjat−j;at~i.i.d.0σa2

(9.2)

Taking the expectation of Equation (9.2), we have the following:

Ewt=c+∑i=1pϕiEwt−i+Eat+∑j=1qθjEat−j

(9.2a)

Substituting E(w_t) = E(w_t − i), i = 1, …, pEwt=Ewt−i,i=1,…,p, and E(a_t) = E(a_t − j) = 0Eat=Eat−j=0 for the stationary time series into Equation (9.2a), we have the following:

Ewt=c+∑i=1pϕiEwt⇒Ewt=c1−∑i=1pϕi

(9.2b)

To further evaluate if differencing is necessary, two statistical tests can help make a reasonable and formal decision. The Kwiatkowski–Phillips–Schmidt–Shin (KPSS) test (1992) has the null hypothesis of time series being stationary, while the augmented Dickey–Fuller (ADF) test (Dickey and Fuller, 1979) has the null hypothesis of time series as a unit root process (or is simply called nonstationary). The KPSS and ADF tests are complementary to each other (Arya and Zhang, 2015) as follows:

i. Time series being stationary: acceptance by KPSS test while rejection by ADF test.
ii. Time series being nonstationary: rejection by KPSS test while acceptance by ADF test.
iii. Time series belonging to a long memory process: rejection by both KPSS and ADF tests. In this case, the Hurst coefficient (Hurst, 1951) is applied to evaluate the necessary fractional differencing order.
iv. Not enough evidence to decide whether the time series is stationary or nonstationary: acceptance by both KPSS and ADF tests.

Furthermore, if there exists heteroscedasticity (i.e., changing variance) in the time series such that the time series tends to have a large value following a large value and a small value following a small value as a simple illustration. Then, for the time series with heteroscedasticity, the model error of Equation (9.1) needs to be further revised using (Generalized) Autoregressive Conditional Heteroscedastic (G) (ARCH) models. A (G)ARCH model indicates a second-order dependent time series. In other words, the conditional variability depends on the past history of the time series. An ARCH model can be written as follows:

ht≡varatat−1at−2…=Eat2at−1at−2…=w0+∑i=1swiat−i2

(9.3)

and a Generalized ARCH (Bollerslev, 1986) model can be written as follows:

ht≡varatat−1at−2…=Eat2at−1at−2…=w0+∑i=1swiat−i2+∑j=1rqjht−j

(9.4)

In Equations (9.3) and (9.4), h_tht denotes the conditional variance (variability) of a_tat given a_t − 1, a_t − 2, …at−1,at−2,…; w₀>0, w_i ≥ 0, i = 1, …, s, q_j ≥ 0, j = 1, …, rw0>0,wi≥0,i=1,…,s,qj≥0,j=1,…,r; w_iwi denotes the coefficients of the ARCH effects (i.e., for the correlated squared model errors a_tat); and q_iqi denotes the coefficients of the correlated conditional variance h_tht.

In addition, there exists a relation among conditional variance (h_t)ht), innovation (i.e., model residual a_t)at) and standard white Guassian noise (e_t, e_t~N(0, 1))et,et~N01) as follows:

at=htet

(9.5)

The parameters of the time series model may be estimated with the use of the maximum likelihood method.

Example 9.1 Fit an autoregressive time series model with order 1 (i.e., AR(1))Yt=c+ϕ1Yt−1+et,et~N0σe2 to annual streamflow data given in Table 9.1.

Plot the original time series and residual sequence. Is the residual sequence a white Gaussian noise?

Table 9.1. Annual streamflow data.

Year	Flow (cfs)	Year	Flow (cfs)
1960	517.9	1988	280.8
1961	367.8	1989	500.9
1962	252.2	1990	528.9
1963	258.6	1991	602.8
1964	281.3	1992	386
1965	308.4	1993	627.2
1966	317.3	1994	520
1967	372.4	1995	345.3
1968	349.1	1996	575
1969	504.3	1997	663.2
1970	330.9	1998	412.4
1971	413.3	1999	311
1972	461.7	2000	385.3
1973	567	2001	299.3
1974	500.6	2002	417.5
1975	654.9	2003	578.5
1976	550.3	2004	715.8
1977	401	2005	676.3
1978	593.8	2006	507
1979	508	2007	736
1980	543.5	2008	677.4
1981	442.3	2009	508.9
1982	477.3	2010	418.3
1983	473	2011	721.2
1984	548.3	2012	609
1985	467.7	2013	536.3
1986	539.3	2014	608.1
1987	431.7	2015	552.4

Solution: To fit AR(1) to the time series listed in Table 9.1, we can simply use the MATLAB function as follows:

1. Assign the TS as the time series listed in Table 9.1.
2. Set up the AR(1) model, where we need to fit using arima:

model=arima(1,0,0); % ARIMA (P, D, Q): P=1, AR term; D=0, stationary series; Q=0, MA term.

3. Apply the estimate function (through MLE):

[param,Var,LogL]=estimate(model,TS);

% param: estimated parameter for the model defined above.

% Var: variance-covariance matrix for the parameter estimated. Here, we have 3 parameters: constant-C, autoregressive parameter, and variance of model residual.

% LogL: the loglikelihood of the objective function after optimization.

Using the preceding functions, we get the results listed in Table 9.2.

Table 9.2. Parameter estimated for the AR(1) model.

ARIMA (1,0,0) model (AR(1) model)
Conditional probability distribution: Gaussian
Parameter	Value	Standard error	T-statistics
Constant (cfs)	255.13	71.80	3.55
AR{1}	0.475	0.15	3.18
Variance (cfs²)	12751.6	3141.41	4.06
LogL = –434.18

The fitted AR(1) time series model is now written as follows:

Z_t = 255.13 + 0.475Z_t − 1 + e_t

Zt=255.13+0.475Zt−1+et(9.6)

4. Apply the infer function to compute the model residual sequence listed in Table 9.3:
```
res=infer(param,TS);
```

Table 9.3. Fitted model residual.

Year	Residual (cfs)	Year	Residual (cfs)
1960	24.90	1988	–179.29
1961	–133.22	1989	112.45
1962	–177.55	1990	35.95
1963	–116.27	1991	96.56
1964	–96.61	1992	–155.32
1965	–80.29	1993	188.81
1966	–84.25	1994	–32.91
1967	–33.38	1995	–156.71
1968	–82.84	1996	155.93
1969	83.43	1997	135.07
1970	–163.66	1998	–157.60
1971	1.07	1999	–139.93
1972	10.34	2000	–17.49
1973	92.67	2001	–138.76
1974	–23.73	2002	20.27
1975	162.10	2003	125.15
1976	–15.76	2004	186.01
1977	–115.40	2005	81.33
1978	148.28	2006	–69.22
1979	–29.05	2007	240.16
1980	47.18	2008	72.84
1981	–70.87	2009	–67.84
1982	12.18	2010	–78.44
1983	–8.74	2011	267.47
1984	68.60	2012	11.46
1985	–47.75	2013	–7.97
1986	62.12	2014	98.35
1987	–79.48	2015	8.56

Figure 9.3 plots the original time series, fitted model residuals, and the histogram compared to the hypothesized white Gaussian noise. From the histogram plot, it seems that the hypothesized distribution may properly represent the distribution of the fitted model residuals. To formally assess whether the fitted model residuals are a white Gaussian noise, we apply the Kolmogorov–Smirnov (KS) test. The KS test evaluates the maximum distance of empirical and parametric CDFs. Its test statistic D_n can be written as follows:

Dn=supx∣Fnx−Fx∣

(9.7)

The null hypothesis (H₀) is that the fitted model residuals follow N0σe2,i.e.,N012751.6. With this null hypothesis, we can either use the parametric bootstrap method or MATLAB function kstest directly. Here we will simply use the MATLAB function kstest. Table 9.4 lists the empirical and parametric CDFs for the fitted model residuals.

Figure 9.3 Original time series, fitted model residual plots, and histogram.

Table 9.4. Empirical and parametric CDFs of the fitted model residuals.

Residual	Parametric CDF	Empirical CDF	Residual	Parametric CDF	Empirical CDF
24.90	0.59	0.63	–179.29	0.06	0.02
–133.22	0.12	0.16	112.45	0.84	0.82
–177.55	0.06	0.04	35.95	0.62	0.65
–116.27	0.15	0.18	96.56	0.80	0.79
–96.61	0.20	0.21	–155.32	0.08	0.11
–80.29	0.24	0.26	188.81	0.95	0.95
–84.25	0.23	0.23	–32.91	0.39	0.40
–33.38	0.38	0.39	–156.71	0.08	0.09
–82.84	0.23	0.25	155.93	0.92	0.89
83.43	0.77	0.75	135.07	0.88	0.86
–163.66	0.07	0.05	–157.60	0.08	0.07
1.07	0.50	0.53	–139.93	0.11	0.12
10.34	0.54	0.56	–17.49	0.44	0.46
92.67	0.79	0.77	–138.76	0.11	0.14
–23.73	0.42	0.44	20.27	0.57	0.61
162.10	0.92	0.91	125.15	0.87	0.84
–15.76	0.44	0.47	186.01	0.95	0.93
–115.40	0.15	0.19	81.33	0.76	0.74
148.28	0.91	0.88	–69.22	0.27	0.33
–29.05	0.40	0.42	240.16	0.98	0.96
47.18	0.66	0.67	72.84	0.74	0.72
–70.87	0.27	0.32	–67.84	0.27	0.35
12.18	0.54	0.60	–78.44	0.24	0.30
–8.74	0.47	0.49	267.47	0.99	0.98
68.60	0.73	0.70	11.46	0.54	0.58
–47.75	0.34	0.37	–7.97	0.47	0.51
62.12	0.71	0.68	98.35	0.81	0.81
–79.48	0.24	0.28	8.56	0.53	0.54

Applying kstest as follows

[H,Pvalue,stat]=kstest(res,[res,normcdf(res,0,param.Variance^0.5)],0.05)

we have H = 0, Pvalue = 0.803, and test statistic = 0.083. With null hypothesis being accepted and Pvalue > 0.05, we show that the fitted model residual is a white Gaussian noise.

9.2 Spatially Dependent Bivariate or Multivariate Time Series

In stock exchanges, the stock values among major exchanges (e.g., London, Hong Kong, New York, Tokyo) always impact one another. In other words, these major exchanges have a tendency to follow each other. In the field of hydrology and water resources engineering, there also exists a similar tendency (or spatial dependence). For example, streamflow (or flood) at a downstream location is generally positively dependent on that at the upstream location. In this section, we will show how to evaluate spatially dependent time series.

As discussed in the previous chapters, copulas are applied to bivariate/multivariate independent random variables. Thus, to employ the copula theory for a bivariate (multivariate) time-dependent sequence (e.g., spatial dependence of bivariate/multivariate time series), we need to investigate each individual time series first and fit the time series with proper models (e.g., the equations in Section 9.1). Three steps are needed for bivariate/multivariate time series analysis using copulas:

1. Investigate each univariate time series separately, including the assessment of stationarity, time series model identification, and estimation of model parameters;
2. Compute model residuals from the fitted univariate time series model.
3. Apply copulas to the model residuals obtained from step 2.

In what follows, we will use a simple example to illustrate how to model a bivariate time series.

Example 9.2 Perform a dependence study using the daily time series data given in Table 9.5.

Table 9.5. Bivariate time series.

	TS1	TS2		TS1	TS2
1	60.25	476.35	26	45.24	475.13
2	55.87	475.48	27	11.28	475.01
3	76.18	476.02	28	30.96	475.44
4	74.84	476.75	29	78.03	475.49
5	84.79	476.89	30	97.69	475.91
6	68.39	477.27	31	90.31	476.52
7	73.14	477.51	32	60.44	475.51
8	63.01	476.50	33	52.28	474.24
9	63.28	475.87	34	35.00	473.57
10	80.17	475.74	35	38.98	473.33
11	72.42	475.29	36	49.74	473.42
12	57.82	476.75	37	75.43	473.87
13	86.49	476.60	38	81.49	471.65
14	99.88	475.69	39	56.48	470.84
15	75.34	474.94	40	51.48	472.87
16	66.07	473.83	41	50.26	473.17
17	56.61	473.63	42	52.20	474.16
18	69.76	473.84	43	76.67	475.32
19	94.75	475.58	44	83.07	475.63
20	62.69	476.17	45	79.80	475.90
21	64.71	475.11	46	81.26	474.64
22	88.54	476.29	47	73.69	474.19
23	84.75	478.09	48	72.70	476.01
24	94.86	478.04	49	77.68	476.85
25	81.05	476.20	50	89.06	476.79

Solution: Applying the procedure similar to Example 9.1, the time series TS1 and TS2 are fitted with ARIMA (1,0,1) (i.e., ARMA(1,1)) and ARIMA(2,0,0) (i.e., AR(2)), respectively. The fitted parameters for the time series are given in Table 9.6. Figure 9.4 plots the original time series and empirical frequencies (histogram) for the model residuals.

Table 9.6. Fitted time series model and parameter estimated.

TS1-ARIMA(1,0,1) model:
conditional probability distribution: Gaussian
Parameter	Value	Standard error	T statistics
Constant	55.24	13.18	4.19
AR{1}	0.21	0.19	1.09
MA{1}	0.66	0.15	4.34
Variance	185.21	42.53	4.36
TS2-ARIMA(2,0,0) model:
conditional probability distribution: Gaussian
Parameter	Value	Standard error	T statistics
Constant	125.18	38.95	3.16
AR{1}	1.1	0.15	7.15
AR{2}	–0.36	0.16	–2.3
Variance	0.7	0.12	5.63

Figure 9.4 Plots of original time series and histograms of the fitted model residuals.

The acceptance by the KS test for the fitted model residuals indicates that the model residuals belong to the white noise.

We can write the residuals as a function of time series:

TS1:restTS1=−55.24+TS1t−0.21TS1t−1−0.65rest−1TS1

(9.8a)

TS2:restTS2=−123.18+TS2t−1.10TS2t−1+0.36TS2t−2

(9.8b)

Now, we may apply copula to restTS1,restTS2 that are considered as independent random variables. Figure 9.5 shows the scatter plot of the random variables.

Figure 9.5 Scatter plot of the fitted model residuals: restT1 and restT2.

From Figure 9.5, it is seen the fitted model residuals are positively correlated. Using Equations (3.70) and (3.73), the empirical rank-based correlation coefficients, Spearman’s ρ, and Kendall’s τ are computed as ρ_n≈0.28, τ_n≈0.16ρn≈0.28,τn≈0.16. To this end, we apply the Archimedean copulas (i.e., Gumbel–Hougaard, Clayton, and Frank copulas presented in Chapter 4) and meta-elliptical copulas (i.e., the Gaussian and Student t copulas presented in Chapter 7). Similar to the discussion in the previous chapters, we apply the pseudo (i.e., semiparametric) and two-stage maximum likelihood methods for parameter estimation. Table 9.7 lists the empirical and parametric CDFs of the fitted model residuals. Table 9.8 lists the parameters and corresponding estimated likelihood values. The likelihood values in Table 9.8 suggest that the Frank copula attains the best overall performance, followed by Gaussian copula. Figure 9.6 compares the CDF of the model residuals to the simulated random variates from the fitted parametric copulas. Figure 9.7 compares the model residuals to that computed from the two-stage estimation. Comparison indicates a similar performance between the Frank and Gaussian copulas.

Table 9.7. Fitted model residuals, empirical and parametric CDFs computed.

restT1	restT2	Empirical CDF		Parametric CDF
restT1	restT2	restT1	restT2	restT1	restT2
–9.03	0.13	0.25	0.59	0.25	0.56
–5.91	–0.59	0.31	0.20	0.33	0.24
13.25	0.89	0.78	0.86	0.83	0.86
–4.93	0.72	0.39	0.82	0.36	0.80
17.28	0.24	0.92	0.67	0.90	0.61
–15.82	0.73	0.14	0.84	0.12	0.81
14.15	0.60	0.80	0.80	0.85	0.76
–16.72	–0.53	0.12	0.24	0.11	0.26
5.99	0.03	0.67	0.57	0.67	0.51
7.86	0.23	0.69	0.65	0.72	0.61
–4.62	–0.29	0.41	0.35	0.37	0.36
–9.39	1.60	0.24	0.94	0.24	0.97
25.46	–0.31	0.98	0.33	0.97	0.36
9.94	–0.53	0.71	0.22	0.77	0.26
–7.15	–0.33	0.27	0.31	0.30	0.35
–0.08	–0.94	0.51	0.14	0.50	0.13
–12.28	–0.19	0.18	0.37	0.18	0.41
10.88	–0.16	0.73	0.43	0.79	0.42
17.88	1.28	0.94	0.90	0.91	0.94
–23.98	0.02	0.06	0.53	0.04	0.51
12.28	–1.06	0.76	0.10	0.82	0.10
11.79	1.51	0.75	0.92	0.81	0.96
3.38	1.62	0.55	0.96	0.60	0.97
19.83	0.01	0.96	0.51	0.93	0.50
–6.93	–1.12	0.29	0.06	0.31	0.09
–22.24	–0.19	0.08	0.39	0.05	0.41
–38.68	0.22	0.02	0.61	0.00	0.60
–1.12	0.38	0.47	0.73	0.47	0.68
17.11	–0.08	0.90	0.47	0.90	0.46
14.99	0.45	0.86	0.75	0.86	0.70
4.93	0.60	0.59	0.78	0.64	0.76
–16.78	–0.93	0.10	0.16	0.11	0.13
–4.43	–0.85	0.43	0.18	0.37	0.15
–28.16	–0.51	0.04	0.25	0.02	0.27
–4.96	–0.45	0.37	0.27	0.36	0.30
–10.31	–0.35	0.22	0.29	0.22	0.34
16.67	–0.08	0.88	0.49	0.89	0.46
–0.37	–2.76	0.49	0.02	0.49	0.00
–15.41	–0.97	0.16	0.12	0.13	0.12
–5.32	1.16	0.35	0.88	0.35	0.92
–12.14	–1.08	0.20	0.08	0.19	0.10
–5.46	0.32	0.33	0.69	0.34	0.65
14.21	0.49	0.82	0.76	0.85	0.72
2.57	–0.11	0.53	0.45	0.57	0.45
5.65	0.23	0.63	0.63	0.66	0.61
5.76	–1.21	0.65	0.04	0.66	0.07
–2.19	-0.18	0.45	0.41	0.44	0.42
3.63	1.68	0.57	0.98	0.61	0.98
4.98	0.36	0.61	0.71	0.64	0.67
14.43	0.03	0.84	0.55	0.86	0.51

Table 9.8. Estimated copula parameters and estimated log-likelihood values.

Copula	Pseudo	MLE	Semiparametric	MLE
Gumbel–Hougaard	1.15	0.75	1.13	0.61
Clayton	0.31	0.88	0.08	0.15
Frank	1.63	1.79	1.56	1.69
Gaussian	0.26	1.30	0.20	0.99
Student t	[0.26, 1.38 × 10⁷]0.261.38×107	1.30	[0.20, 1.38 × 10⁷]0.201.38×107	0.99

Figure 9.6 Comparison of simulated random variates to pseudo-observations and fitted parametric marginals.

Figure 9.7 Comparison of fitted model residuals to those computed from copula with parameter estimated using two-stage MLE.

From this example, we also note that the rank-based correlation coefficient of the model residuals may be different from that of the original time series. In this example, we have τ_n≈0.16τn≈0.16 for the model residuals, while τ_n≈0.35τn≈0.35 for the original time series. The reduction of the degree of association for the model residuals may be due to the autoregressive component of time series modeling.

From Equation (9.8), we can also reconstruct the time series with the use of random variates from the simulated copula. Here, we will again use copulas with two-stage MLE as an example. Additionally, we will use the last two values of the time series (i.e., Table 9.5) as initial estimates. Figure 9.8 plots the reconstructed time series, which shows that the reconstructed time series reasonably follows the same pattern as does the original time series.

Figure 9.8 Reconstructed time series using the copula estimated with two-stage MLE.

Now, we have explained how to study the spatial dependence for the sequence with time dependence. In the previous example, we studied time-dependent sequences and spatial dependence of the sequences separately. We first built the time series (i.e., the autoregressive and moving average) model for each univariate time-dependent sequence. Then, we built the copula model on the residual (also called innovation) of the time series model, since the residuals are now random variables.

Copula modeling can also be applied to study the serial dependence of univariate time series, in addition to studying the previously discussed bivariate/multivariate spatial-temporal dependent time series (i.e., spatial dependence for the time-dependent sequence). In the following section, we will introduce how to model the serial dependence of univariate time series.

9.3 Copula Modeling for Univariate Time Series with Serial Dependence: General Discussion

Darsow, et al. (1992) introduced the condition (i.e., equivalent to Chapman–Kolmogorov equations) for a copula-based time series to be a Markov process. Joe (1997) introduced a class of parametric stationary Markov models based on parametric copulas and parametric marginal distributions. Similar to the copula application in bivariate or multivariate frequency analysis discussed previously, a copula-based time series model also allows one to consider serial dependence and marginal behavior of the time series investigated separately.

Following Joe (1997) and Chen and Fan (2006), copulas can be applied to the stationary time series of (i) Markov chain models (both discrete and continuous, including autoregressive models); (ii) K-dependent time series models (i.e., moving average models with order k); (iii) convolution-closed infinitely divisible univariate marginal models. For stationary time series {Z_t : t = 1, 2, …}Zt:t=12….

Let {e_t}et be i.i.d. random variables that are independent of {Z_t − 1, Z_t − 2…}Zt−1Zt−2… (i.e., the innovation of the time series {Z_t : t = 1, 2, …Zt:t=1,2,…}). We may express the preceding three cases by using one of the following models (Joe, 1997).

Markov Chain models:

➣ Kth-order autoregressive model:

Z_t = α₁Z_t − 1 + α_tZ_t − 2 + … + α_KZ_t − K + e_t
Zt=α1Zt−1+αtZt−2+···+αKZt−K+et(9.9)
where α₁, α₂, …, α_Kα1,α2,…,αK are the scalars.
➣ Kth-order Markov chain:

Z_t = g(Z_t − 1, Z_t − 2, …, Z_t − k, e_t)
Zt=gZt−1Zt−2…Zt−ket(9.10)
where gg is a real-valued function.
➣ First-order convolution-closed infinitely divisional univariate margin model:

Z_t = S_t(Z_t − 1) + ϵ_t
Zt=StZt−1+et(9.11)
where S_tSt is an independent realization of the stochastic operator.

K-dependent time series models:

➣ Kth-order Moving Average model:

Z_t = e_t + β₁e_t − 1 + β₂e_t − 2 + … + β_Ke_t − K
Zt=et+β1et−1+β2et−2+···+βKet−K(9.12)
where β₁, β₂, …, β_Kβ1,β2,…,βK are the scalars.
➣ K-dependent model:

Z_t = h(e_t, e_t − 1, …, e_t − K)
Zt=hetet−1…et−K(9.13)
where hh is a real-valued function.
➣ One-dependent convolution-closed infinitely divisible univariate marginal model:

Z_t = e_t + S_t(e_t − 1)
Zt=et+Stet−1(9.14)

where S_tSt is the independent realization of the stochastic operator.

Now, with the classified models given in Equations (9.10)–(9.14), we will focus on the continuous Markov chain (also called Markov process) models for the rest of the chapter. We will introduce the simple first-order Markov models first, followed by the Kth-order Markov models.

9.4 First-Order Copula-Based Markov Model

9.4.1 General Concept of the First-Order Copula-Based Continuous Markov Model

For continuous time series {Z_t : t = 1, 2, …}Zt:t=12… modeled with the first-order Markov process, its transition probability can be expressed as follows:

P(Z_t ≤ z_t| Z_t − 1 = z_t − 1, Z_t − 2 = z_t − 2, …, Z₁ = z₁) = P(Z_t ≤ z_t| Z_t − 1 = z_t − 1)

PZt≤ztZt−1=zt−1Zt−2=zt−2…Z1=z1=PZt≤ztZt−1=zt−1(9.15)

Equation (9.15) means that the probabilistic behavior of the time series {Z_t}Zt is fully governed by the joint distribution of {Z_t, Z_t − 1}ZtZt−1. We can apply copula modeling as a robust and powerful representation of Equation (9.15) as follows:

P(Z_t ≤ z_t| Z_t − 1 = z_t − 1, Z_t − 2 = z_t − 2, …, Z₁ = z₁)

PZt≤ztZt−1=zt−1Zt−2=zt−2…Z1=z1

=PZt≤ztZt−1=zt−1=∂CFztFzt−1∂Fzt−1

(9.16a)

and the conditional density of Z_tZt given Z_t − 1Zt−1 can be expressed using copula as follows:

h(z_t| z_t − 1) = f(z_t)c(F(z_t), F(z_t − 1))

hztzt−1=fztcFztFzt−1(9.16b)

where CC and cc represent the copula and its density function of (z_t, z_t − 1(zt,zt−1), and FF and ff represent the marginal distribution and the density function of z_tzt, respectively

9.4.2 Parameter Estimation of the First-Order Copula-Based Continuous Markov Model

Chen and Fan (2006) proposed an estimation method similar to the semiparametric maximum likelihood estimation method discussed in the previous chapters. Following Chen and Fan (2006) and Equations (9.16a) and (9.16b), we see the time series is fully determined by the true unknown marginal distribution F^∗F∗ and a copula function with parameter α^∗α∗ (or simply written as (F^∗, α^∗)F∗α∗). Here we again note the advantage of investigating the marginal distribution and copula separately.

To evaluate the copula parameter, we may first apply the empirical marginal to time series {Z_t}Zt with the Weibull plotting-position formula in the same fashion as Equation (3.103):

Fnz=1m+1∑t=1m1Zt≤z

(9.17)

where: mm is the length of time series (or simply called the sample size).

Replacing Equation (9.17) with the true unknown marginal and its density function, and true copula density function, the log-likelihood function for the first-order Markov model can be expressed as follows:

Lα=1m∑t=1mlogf∗zt+1n∑t=2mlogcF∗ztF∗zt−1α

(9.18a)

Equation (9.18a) can be simplified using the empirical distribution F_nFn as follows:

L̂α=1n∑t=2mlogcFnztFnzt−1α

(9.18b)

Equation (9.18b) is in the same form as Equation (3.104).

9.4.3 Simulation (Realizations) of the Time Series from the First-Order Copula-Based Markov Process

The univariate time series from the first-order copula-based Markov process can be simulated with a similar approach discussed in Section 3.7. With little modification, the simulation procedure is presented as follows:

i. Generate i.i.d. uniformly distributed random variables U = {u_i : i = 1, 2, 3, …, N}U=ui:i=123…N.
ii. Set y₁ = u₁y1=u1.
iii. Set u2=Cy2y1=Cy2u1=∂Cy2u1∂u1⇒y2=h−1u2y1α, in which the hh function is defined as the conditional copula.
iv. Continue until we obtain y_N = h⁻¹(u_n, y_n − 1; α)yN=h−1unyn−1α.

It should be noted that {y₁, y₂, …, y_n}y1y2…yn simulated from steps i to iv are the time series in the frequency domain (i.e., marginals), and we will need to perform the one-to-one transformation to obtain the corresponding time series simulated in the real domain (e.g., through parametric distribution, empirical distribution, or kernel density based on the observed time series).

9.4.4 Forecast and Quantile Estimation of the First-Order Markov Process

As in economics and finance, it is our interest to forecast the future behavior of the time series, or in other words median forecast and conditional quantile estimation. For given quantile, the conditional quantile estimation may also be called value-at-risk (VaR). From the transitional probability of the first-order Markov process (i.e., Equation (9.16a)), we have the median forecast expressed as follows:

EZtZt−1=zt−1=∫zth∗ztzt−1dzt=∫ztf∗ztcF∗ztF∗zt−1αdzt=∫ztcF∗ztF∗zt−1αdF∗zt

(9.19a)

Again, replacing the unknown true marginal distribution by its empirical distribution, and the true copula parameter by its estimated parameter (α̂) from Equation (9.18b), Equation (9.19a) can be rewritten as follows:

EZtZt−1=zt−1=∫ztc(Fnzt,Fnzt−1);α̂dFnzt

(9.19b)

Equation (9.19b) implies that the conditional probability (i.e., conditional copula) of Z_t ∣ Z_t − 1Zt∣Zt−1 equals 0.5 (also called 50% conditional quantile) as follows:

CFnztFnzt−1;α̂=0.5⇒Fnzt=CFnzt∣Fnzt−1−10.5Fnzt−1;α̂

(9.20a)

From Equation (9.20a), we can further forecast the behavior of z_tzt as follows:

ẑt=Fn−1CFnzt∣Fnzt−1−10.5Fnzt−1;α̂

(9.20b)

Similarly, Equations (9.20a) and (9.20b) can be easily reformulated for the estimation of any given conditional quantile q as follows:

Fnzt=CFnzt∣Fnzt−1−1qFnzt−1;α̂⇒ẑtq=Fn−1CFnzt∣Fnzt−1−1qFnzt−1;α̂

(9.21)

Example 9.3 Rework Example 9.1 using the Gumbel–Hougaard and Gaussian copula-based first-order Markov model. Also, compare the one-step ahead forecast (i.e., forecasting the annual flow for water year 2016) with both the classic AR(1) model and copula-based first-order Markov model.

Solution: In Example 9.1, we applied the AR(1) model to investigate the behavior and annual flow listed in Table 9.1. From Example 9.1 we conclude that statistically, we can apply AR(1) model to the annual flow under the assumption that annual flow shows a linear temporal dependence; however, in reality the dependence is usually nonlinear. Without imposing more complex (G)ARCH model, the copula-based Markov model is an excellent alternative approach to solve this issue. In addition, the Gaussian process assumption is also relaxed when applying the copula-based Markov model.

In this example, we will use semiparametric estimation such that the empirical distribution is applied for the marginals. The following steps are needed to model the temporal dependence with copulas:

1. With Equation (9.17), the Weibull plotting position formula and kernel density function are employed to compute the empirical marginal distribution (Table 9.9). It is worth noting that the marginal estimated nonparametrically will not change the structure of the time series dataset, as shown in Figure 9.9 (through the sample autocorrelation function plot).
2. Estimate the copula parameter for the first-order Markov model using Equation (9.17b).
3. Estimate the copula parameter for the first-order Markov model using Equation (9.17b).

Table 9.9. Empirical marginals using the Weibull plotting position formula and kernel density function.

(1)	(2)	(3)	(4)	(1)	(2)	(3)	(4)
1960	517.9	0.58	0.58	1988	280.8	0.05	0.08
1961	367.8	0.21	0.23	1989	500.9	0.49	0.53
1962	252.2	0.02	0.05	1990	528.9	0.61	0.61
1963	258.6	0.04	0.06	1991	602.8	0.81	0.79
1964	281.3	0.07	0.08	1992	386	0.26	0.26
1965	308.4	0.11	0.12	1993	627.2	0.86	0.84
1966	317.3	0.14	0.13	1994	520	0.60	0.59
1967	372.4	0.23	0.23	1995	345.3	0.18	0.18
1968	349.1	0.19	0.19	1996	575	0.75	0.73
1969	504.3	0.51	0.54	1997	663.2	0.89	0.89
1970	330.9	0.16	0.16	1998	412.4	0.30	0.32
1971	413.3	0.32	0.32	1999	311	0.12	0.12
1972	461.7	0.40	0.43	2000	385.3	0.25	0.26
1973	567	0.74	0.71	2001	299.3	0.09	0.11
1974	500.6	0.47	0.53	2002	417.5	0.33	0.33
1975	654.9	0.88	0.88	2003	578.5	0.77	0.74
1976	550.3	0.70	0.67	2004	715.8	0.95	0.95
1977	401	0.28	0.29	2005	676.3	0.91	0.91
1978	593.8	0.79	0.77	2006	507	0.53	0.55
1979	508	0.54	0.55	2007	736	0.98	0.96
1980	543.5	0.67	0.65	2008	677.4	0.93	0.91
1981	442.3	0.39	0.39	2009	508.9	0.56	0.56
1982	477.3	0.46	0.47	2010	418.3	0.35	0.33
1983	473	0.44	0.46	2011	721.2	0.96	0.95
1984	548.3	0.68	0.66	2012	609	0.84	0.80
1985	467.7	0.42	0.45	2013	536.3	0.63	0.63
1986	539.3	0.65	0.64	2014	608.1	0.82	0.80
1987	431.7	0.37	0.36	2015	552.4	0.72	0.67

Note: (1): year; (2): annual flow (cfs); (3): empirical CDF; (4): CDF computed through kernel density.

Figure 9.9 Sample autocorrelation function plots for original Flow series, rank-based and kernel density based marginals.

As discussed in the previous chapters, we can first estimate the rank-based Kendall’s tau for lag-1 temporal dependence. We computed τ≈0.31τ≈0.31. As the autoregressive coefficient estimated in Example 9.1 (φ≈0.47φ≈0.47), the annual flow at the current time tt is positively dependent on that at the previous time of t − 1t−1. Using computed sample τ≈0.31τ≈0.31, we obtain the initial parameter estimate:

Gumbel–Hougaard copula: θGHini=11−0.31≈1.44
Meta-Gaussian copula: θGAUini=ρini=sinπ20.31≈0.463.

Now maximizing Equation (9.18b) for the log-likelihood functions of Gumbel–Hougaard and meta-Gaussian copulas, we have θ_GH = 1.38; θ_GAU = 0.51θGH=1.38;θGAU=0.51.

Comparing to the parameter estimated from the AR(1) model and that estimated from meta-Gaussian copula-based first-order Markov model, there is minimal difference in regard to the parameter estimated. To further compare AR(1) model and copula-based first-order Markov model, we simulate the time series with 100 realizations. We directly simulate the realizations in the real domain for the AR(1) model (with the parameters estimated in Example 9.1). For the copula-based first-order Markov process, we first simulate the marginals, and then we perform the inverse transformation with the kernel density function approach. Figure 9.10 plots the realizations from all three approaches.

Figure 9.10 Realization from the classic AR(1) model and copula-based first-order Markov model.

One-step ahead forecast with the AR(1) model: From Equation (9.6), the corresponding one-step ahead forecast is written using the difference equation as [Z_t + 1] = 255.132 + 0.475[Z_t] + [ϵ_t + 1]Zt+1=255.132+0.475Zt+ϵt+1.

Substituting [ϵ_t + 1] = 0ϵt+1=0 and z₂₀₁₅ = 552.4 cfsz2015=552.4cfs into the forecast equation, we have the following:

[z₂₀₁₆] = 255.132 + 0.475(552.4) = 517.52 cfsz2016=255.132+0.475552.4=517.52cfs.

One-step ahead forecast from copula-based first-order Markov models: First, we can rewrite Equation (9.20b) in a similar fashion as the preceding AR(1) forecast equation:

zt+1=Fn−1(CFnzt+1∣Fnzt−10.5Fnzt;α̂

(9.22)

In Equation (9.22) F_nFn represents the marginal estimated nonparametrically through the kernel density function.

i. Gumbel–Hougaard copula: Substituting α̂=1.38 and F_n(z₂₀₁₅) = 0.67Fnz2015=0.67 into Equation (9.22) for Gumbel–Hougaard copula, we obtain the estimated marginal for 2016 and the corresponding forecasted annual flow as follows:

[F_n(z₂₀₁₆)]^GH = 0.56, and [z₂₀₁₆]^GH = 509.36 cfs
Fnz2016GH=0.56,andz2016GH=509.36cfs
ii. Meta-Gaussian copula: Substituting α̂=0.514 and F_n(z₂₀₁₅) = 0.67Fnz2015=0.67 into Equation (8.22) for the meta-Gaussian copula, we obtain the estimated marginal for 2016 and the corresponding forecasted annual flow as follows:

[F_n(z₂₀₁₆)]^GAU = 0.591, and [z₂₀₁₆]^GAU = 521.62 cfs
Fnz2016GAU=0.591,andz2016GAU=521.62cfs

Comparing the one-step ahead forecast with the AR(1) and meta-Gaussian copula-based first-order Markov model, it is seen that the relative difference of forecast results is less than 1% with the meta-Gaussian copula reaching a slightly better forecasting result.

9.5 Kth-Order Copula-Based Markov Models (K ≥ 2K≥2)

Similar to the discussion in Section 9.4; the continuous time series {Z_t : t = 1, 2, …}Zt:t=12…, modeled by the Kth-order Markov process, is fully governed by the joint distribution of {Z_t, Z_t − 1, …, Z_t − k}ZtZt−1…Zt−k. The transition probability for the first-order Markov process (i.e., Equation (9.15)) can be rewritten as follows:

P(Z_t ≤ z_t| Z_t − 1 = z_t − 1, Z_t − 2 = z_t − 2, …Z_t − K = z_t − K)

PZt≤ztZt−1=zt−1Zt−2=zt−2…Zt−K=zt−K(9.23)

Similar to the application of bivariate copula to study the first-order Markov models, the serial dependence of Kth order copula-based Markov process may be fully assessed using (K+1)-dimensional copulas.

9.5.1 Building Copula Structure for Kth-Order Markov Models

Given the property of serial dependence for a higher-order Markov process, we need to construct a D-vine copula for the modeling purpose.

We will use a third-order Markov process as an example (Figure 9.11 similar to Figure 5.11).

Figure 9.11 D-vine copula for a third-order Markov process.

To comply with the properties of the Markov process, Figure 9.11 shows the following:

a. The same as the first-order Markov model, z_t, z_t − 1, z_t − 2, and z_t − 3zt,zt−1,zt−2,andzt−3 have the same marginal distribution.
b. T₁ directly represents the lag-1 serial dependence, i.e., the same copula applies to {z_t, z_t − 1}, {z_t − 1, z_t − 2}, {z_t − 2, z_t − 3}ztzt−1,zt−1zt−2,zt−2zt−3, i.e., C_{t, t − 1} ≡ C_{t − 1, t − 2} ≡ C_{t − 2, t − 3}Ct,t−1≡Ct−1,t−2≡Ct−2,t−3.
c. For the lag-2 serial dependence, we also have the same copula applying to {z_t, z_t − 1, z_t − 2}, {z_t − 1, z_t − 2, z_t − 3}ztzt−1zt−2,zt−1zt−2zt−3, i.e., C_{t, t − 1, t − 2} ≡ C_{t − 1, t − 2, t − 3}Ct,t−1,t−2≡Ct−1,t−2,t−3.
d. The copulas in b and c are differentiable.

With the same philosophy, we can model the given Kth-order Markov process using copulas.

9.5.2 Order Identification for the Markov Process

We have explained how to build a copula structure for higher-order Markov process. Here we further explain how to identify the order of Markov process for the time series with the following procedure:

i. Study the lag-1 dependence by evaluating the dependence of {F_{z_t}, F_{z_t − 1}}FztFzt−1. If F_{z_t}Fzt and F_{z_t − 1}Fzt−1 are stochastically independent, we say there is no lag-1 dependence (or say z_tzt and z_t − 1zt−1 are independent).
ii. If step i indicates the lag-1 dependence being statistically significant, we evaluate the dependence of {F_{z_t}, F_zt − 2}FztFzt−2 for lag-2 dependence of z_tzt and z_t − 2zt−2 through the evaluation of F_{t ∣ t − 1}(z_t| z_t − 1)Ft∣t−1ztzt−1 and F_{t − 2 ∣ t − 1}(z_t − 2| z_t − 1)Ft−2∣t−1zt−2zt−1 (i.e., the copula form of C_{t ∣ t − 1}(F(z_t)| F(z_t − 1))Ct∣t−1FztFzt−1 and C_{t − 2 ∣ t − 1}(F(z_t − 2)| F(z_t − 1))Ct−2∣t−1Fzt−2Fzt−1.
iii. We move sequentially to higher orders until we identify that F_{t ∣ t − 1, …, t − k}Ft∣t−1,…,t−k and F_{t − (k + 1) ∣ t − 1, …, t − k}Ft−k+1∣t−1,…,t−k (i.e., C_{t ∣ t − 1, …, t − k}Ct∣t−1,…,t−k and C_{t − k − 1 ∣ t − 1, …, t − k}Ct−k−1∣t−1,…,t−k) are stochastically independent.
iv. Until now, we have successfully identified the order of Markov process, i.e., order k.

It is worth noting that Equation (9.24) will be applied to compute the conditional CDF needed. Also, similar to the first-order Markov process, we may apply the empirical marginal to the univariate time series.

9.5.3 Parameter Estimation for Kth-Order Copula-Based Markov Models

The parameters of the D-vine copula may be again estimated semiparametrically. Similar to Equation (9.16a), the transitional probability and its density function can be written as follows:

PZt≤ztZt−1=zt−1…Zt−k=zt−k=CFztFzt−1…Fzt−k=AkBk

(9.24)

where

Ak=∂CkFztFzt−1…Fzt−k∂Fzt−1…∂Fzt−k

(9.24a)

Bk=∂CkFzt−1…Fzt−k∂Fzt−1…∂Fzt−k=cFzt−1…Fzt−k

(9.24b)

h(z_t| Z_t − 1 = z_t − 1, …, Z_t − k = z_t − k) = f(z_t)c(F(z_t), …, F(z_t − k))

hztZt−1=zt−1…Zt−k=zt−k=fztcFzt…Fzt−k(9.25)

In Equations (9.24) and (9.25), C(.| .)C.. represents the conditional copula; cc represents the copula density function; and F and fFandf represent the marginal distribution and marginal density function, respectively.

Similar to the first-order Markov process (i.e., Equation (9.18b)), the semiparametric log-likelihood function for the (k+1)-dimensional D-vine copula can be written as follows:

Lα=1n∑t=k+1mlncFnztFnzt−1…Fnzt−kα

(9.26)

Looking at Equation (9.26), it is shown that the algebra of the copula density function may be getting complicated when the order of the Markov model needed is high. Thus, to estimate the parameters, we can proceed with two approaches: (i) sequential estimation or (ii) simultaneous estimation.

i. Sequential Estimation Approach

1. Choose and estimate the parameters of the copula candidate for the first level.
2. Compute the conditional copulas using the fitted parametric copula for the first level.
3. Choose and estimate the parameters of the copula candidates for the second level with the use of conditional copulas computed in step 2.
4. Continue these steps sequentially, until we reach the top level of the copula structure.

ii. Simultaneous Estimation Approach

Unlike the sequential approach, where we estimate the copula parameters for each level separately using the fitted copulas from the previous level, we may estimate the copula parameters of all levels simultaneously using the full semiparametric log-likelihood function of the entire vine structure as the objective function.

9.5.4 Simulation (Realizations) of the Time Series from Kth-Order Copula-Based Markov Models

Similar to the simulation for first-order copula-based Markov models discussed in Section 9.4.3, the D-vine copula simulation algorithm (i.e., algorithm 4 in Aas et al., 2009) may be modified and applied as follows:

i. Generate i.i.d. uniformly distributed random variables U = {u_i : i = 1, 2, …N}U=ui:i=12…N, where N is the length of time series that needs to be simulated.
ii. Set y₁ = u₁y1=u1.
iii. Set u2=Cy2y1=Cy2u1=∂Cy2u1∂u1⇒y2=h−1u2u1α12.
iv. Based on the general formula (i.e., Equation 5.24), set

u3=Cy3y1y2=∂C13∣2Cy3y2α12Cy1y2α12∂Cy1y2α12
v. Continue until we obtain y_NyN by setting u_N = C(y_N| y_N − 1, y_N − 2, …y_N − K; α)uN=CyNyN−1yN−2…yN−Kα.

Now, we have simulated the desired Kth-order Markov process in the frequency domain. Again, similar to the simulation of the first-order Markov model, the sequence simulated in the frequency domain needs to be transformed into the real-domain through either parametric marginal distribution or nonparametric marginal distribution (e.g., empirical distribution using plotting-position formulas and kernel density) with proper interpolation.

9.5.5 Forecast and Quantile Estimation of Kth-order Copula-Based Markov Models

For the Kth-order copula-based Markov process, the first-order median forecast formula (i.e., Equation (9.19)) can be rewritten as follows:

EZtZt−1=zt−1…Zt−K=zt−K=∫ztf∗ztcF∗zt…F∗zt−Kαdzt=∫ztcF∗zt…F∗zt−KdF∗zt

(9.27)

Replacing the unknown true marginal distribution (F^∗F∗) by its nonparametric marginal distribution (F_nFn), and true copula parameter vector αα by the estimated parameter α̂, Equation (9.27) can be rewritten as follows:

EZtZt−1=zt−1…Zt−K=zt−K=∫ztcFnzt…Fnzt−K;α̂dFnzt

(9.27a)

Similar to Equation (9.19a), Equation (9.27a) of the conditional probability (also called the conditional copula) of Z_t ∣ Z_t − 1 = z_t − 1, …, Z_t − K = z_t − KZt∣Zt−1=zt−1,…,Zt−K=zt−K is equal to 0.5 (also called the 50% conditional quantile). The median forecast can be computed using the following:

ẑt=Fn−1(CFnzt∣Fnzt−1…zt−K0.5Fnzt−1)…Fn(zt−K;α̂

(9.28)

Furthermore, for any given conditional quantile q, its associated time series value may be computed using the following:

ẑtq=Fn−1(CFnzt∣Fnzt−1,…,Fnzt−K−1qFnzt−1…Fnzt−K;α̂

(9.29)

Example 9.4 Rework TS2 series in Example 9.2 using (i) meta-Gaussian and (ii) Frank copulas. Also, compare the results with those from AR(2) model in Example 9.2.

Solution: According to Example 9.2, the time series TS2 is fitted with the classic AR(2) model. We will proceed with the following procedure:

i. Identify the Markov order for the time series:
- Lag-1 dependence: To assess the lag-1 dependence (i.e., TS2_t and TS2_t − 1)TS2tandTS2t−1), we will simply compute the rank-based Kendall correlation coefficient and assess its significance (using the critical value of α = 0.05)α=0.05).
  - Using Equation (3.68) (or simply using the MATLAB function corr), we compute the following:
  τnLag1=0.5765,Pvalue≤10−4
  Results show that the lag-1 dependence is significant.
- Lag-2 dependence: To assess the lag-2 dependence (i.e., TS2_t and TS2_t − 2TS2tandTS2t−2), we need to evaluate the dependence of F_{t ∣ t − 1}(z_t| z_t − 1)Ft∣t−1ztzt−1 and F_{t − 2 ∣ t − 1}(z_t − 2| z_t − 1)Ft−2∣t−1zt−2zt−1. Thus, we first need to estimate the conditional distribution (or simply conditional copula) with the understanding that {z_t, z_t − 1}ztzt−1 and {z_t − 1, z_t − 2zt−1,zt−2} have the same copula. Using meta-Gaussian copula, we can simply estimate the copula parameter through Kendall’s tau, previously computed as follows:
  
  ρ=sinπ2τnlag1=0.7868
  We now build the bivariate Gaussian copula for the lag-1 sequence and compute the corresponding conditional copulas. The conditional copula results are listed in Table 9.9. The Kendall correlation coefficient is computed as follows:
  
  τ_n[F(z_t| z_t − 1), F(z_t − 2| z_t − 1)] = − 0.2748, Pvalue≈0.006
  τnFztzt−1Fzt−2zt−1=−0.2748,Pvalue≈0.006
  Now, we conclude that lag-2 dependence is still significant. We will need to move on to the evaluation of lag-3 dependence.
- Lag-3 dependence: Similar to the lag-2 dependence assessment, we will need to evaluate the rank-based conditional dependence of the following: {F(z_t| z_t − 1, z_t − 2), F(z_t − 3| z_t − 1, z_t − 2)}Fztzt−1zt−2Fzt−3zt−1zt−2
  - In the preceding formulation, we can further write the two components using Equation (5.24) as follows:
  Fztzt−1zt−2=∂Czt,zt−2∣zt−1Czt∣zt−1Czt−2∣zt−1∂Czt−2∣zt−1(9.30a)
  
  Fzt−3zt−1zt−2=∂Czt−3,zt−1∣zt−2Czt−3∣zt−2Czt−1∣zt−2∂Czt−1∣zt−2(9.30b)
  - To compute the conditional probability for Equations (9.30a) and (9.30b), we apply the meta-Gaussian copula first using the Gaussian copula fitted to the lag-2 dependence assessment. The conditional distribution for C_{z_t ∣ z_t − 1}, C_{z_t − 2 ∣ z_t − 1}, C_{z_t − 1 ∣ z_t − 2}, C_{z_t − 3 ∣ z_t − 2}Czt∣zt−1,Czt−2∣zt−1,Czt−1∣zt−2,Czt−3∣zt−2 is also listed in Table 9.10.
  - With these conditional probabilities computed, we estimate the copula parameters for C_{z_t, z_t − 2 ∣ z_t − 1}, C_{z_t, z_t − 2 ∣ z_t − 1}Czt,zt−2∣zt−1,Czt3,t1jt2 that are –0.45 and –0.42 respectively. Now, we can compute F(z_t| z_t − 1, z_t − 2), F(z_t − 3| z_t − 1, z_t − 2)Fztzt−1zt−2,Fzt−3zt−1zt−2, which are again listed in Table 9.10. Now Kendall’s correlation coefficient is computed as follows:
  τ_n[F(z_t| z_t − 1, z_t − 2), F(z_t − 3| z_t − 1, z_t − 2)] = 0.1674, Pvalue≈0.099>0.05
  τnFztzt−1zt−2Fzt−3zt−1zt−2=0.1674,Pvalue≈0.099>0.05
  - It is then concluded that it is reasonable to apply the second-order copula-based Markov process for the dataset.
ii. Estimate the parameters for the second-order copula-based Markov model.
- From the results obtained in step i, we know that a trivariate copula will be needed to model the second-order Markov process for time series TS2, for which the schematic for the trivariate D-vine copula is shown in Figure 9.12. As given in the problem statement, the parameters of the meta-Gaussian, the Student t, and the Frank-vine copula structures will be estimated with the use of sequential parameter estimation.
- Meta Gaussian vine copula structure:
- In this case, we apply the meta-Gaussian copulas (Equation (7.40)) for T₁ and T₂. We will use the parameters estimated nonparametrically for lag-dependence evaluation as initial parameters.
- T₁: using ρⁱⁿ = 0.7868ρin=0.7868 for {z_t, z_t − 1}ztzt−1 & {z_t − 1, z_t − 2}zt−1zt−2 and applying the semiparametric MLE, we have the parameter estimated for the lag-1 serial dependence of T₁ as ρ_T₁ = 0.8265ρT1=0.8265.
- T_2. Fix the parameter estimated for T₁ to compute the conditional copula of C_{t ∣ t − 1}, C_{t − 1 ∣ t − 2}Ct∣t−1,Ct−1∣t−2 (listed in Table 9.11). Finally, using the computed conditional copula of bivariate variables, we estimate the parameter for T₂ as ρ_T₂ = − 0.3879ρT2=−0.3961.
- Frank vine copula structure:
- Using the same procedure as that for the meta-Gaussian copula, we again estimate the parameters for the Frank copula (Copula No. 5, Table 3.1) using the semiparametric MLE as follows:
- α_T₁ = 7.9422; α_T₂ = − 2.4461αT1=7.9422;αT2=−2.4461.
- Again, the conditional copula needed for the parameter estimation for T₂ is listed in Table 9.11.
iii. Simulate the univariate time series.

Compared to the simulation for the first-order copula-based Markov process, we will need to simulate variate y₃y3 through Equation (9.30b). Following the simulation discussed in Section 9.5.4, Figure 9.13 shows simulations from the copula-based model as well as the simple AR(2) model. To further compare the classic AR(2) model to the second-order copula-based Markov model, we perform the one-step ahead forecast using exactly the same rationale as in Example 9.3.

One-step ahead forecast from AR(2): TS2̂51≈476.40.

One-step ahead forecast from the Gaussian copula-based second-order Markov model:

F̂TS251=0.7512;TS2̂51Gaussian=476.472

One-step ahead forecast from the Frank copula-based second-order Markov model:

F̂TS251=0.7719;TS2̂51Gaussian=476.562

From the forecast results, it is seen that there is minimal difference between the classic AR(2) model and copula-based models. Given the time series data applied in Examples 9.2 and 9.4 as the synthetic time series generated from the AR(2) model, it is no surprise that overall the Gaussian copula-based model performs more similarly to the AR(2) model. Figure 9.14 plots the scatter plots for the lag-1 and lag-2 dependences. Figure 9.14 shows that the copula-based Markov model captures the serial dependence well.

Figure 9.12 Vine-copula structure for the second-order copula-based Markov process.

Figure 9.13 Simulations from AR(2) and copula-based second-order Markov models.

Table 9.10. Markov order identification results table.

Time series		Lag-2		Lag-3
TS2	CDF	(1)	(2)	(3)	(4)	(5)	(6)	(7)	(8)
		τ≈ − 0.27τ≈−0.27 P≈0.0056P≈0.0056						τ≈0.1674τ≈0.1674 P≈0.099P≈0.099
476.35	0.72
475.48	0.49
476.02	0.64	0.72	0.84
476.75	0.81	0.84	0.32	0.84	0.32	0.72	0.84	0.81	0.91
476.89	0.84	0.68	0.29	0.68	0.29	0.84	0.32	0.60	0.47
477.27	0.90	0.79	0.57	0.79	0.57	0.68	0.29	0.83	0.34
477.51	0.93	0.76	0.50	0.76	0.50	0.79	0.57	0.79	0.71
476.50	0.76	0.24	0.59	0.24	0.59	0.76	0.50	0.25	0.63
475.87	0.60	0.31	0.93	0.31	0.93	0.24	0.59	0.57	0.47
475.74	0.56	0.47	0.80	0.47	0.80	0.31	0.93	0.63	0.91
475.29	0.45	0.34	0.58	0.34	0.58	0.47	0.80	0.36	0.81
476.75	0.81	0.95	0.66	0.95	0.66	0.34	0.58	0.98	0.51
476.60	0.78	0.55	0.09	0.55	0.09	0.95	0.66	0.30	0.89
475.69	0.55	0.21	0.67	0.21	0.67	0.55	0.09	0.25	0.08
474.94	0.37	0.24	0.86	0.24	0.86	0.21	0.67	0.41	0.55
473.83	0.19	0.15	0.73	0.15	0.73	0.24	0.86	0.20	0.81
473.63	0.16	0.31	0.72	0.31	0.72	0.15	0.73	0.40	0.59
473.84	0.19	0.43	0.43	0.43	0.43	0.31	0.72	0.39	0.67
475.58	0.52	0.89	0.31	0.89	0.31	0.43	0.43	0.87	0.39
476.17	0.68	0.75	0.07	0.75	0.07	0.89	0.31	0.50	0.51
475.11	0.40	0.16	0.31	0.16	0.31	0.75	0.07	0.09	0.09
476.29	0.71	0.88	0.85	0.88	0.85	0.16	0.31	0.97	0.16
478.09	0.97	0.99	0.14	0.99	0.14	0.88	0.85	0.98	0.96
478.04	0.97	0.72	0.07	0.72	0.07	0.99	0.14	0.46	0.45
476.20	0.68	0.06	0.76	0.06	0.76	0.72	0.07	0.08	0.08
475.13	0.41	0.16	0.99	0.16	0.99	0.06	0.76	0.53	0.52
475.01	0.38	0.43	0.86	0.43	0.86	0.16	0.99	0.63	0.98
475.44	0.48	0.62	0.50	0.62	0.50	0.43	0.86	0.63	0.86
475.49	0.49	0.51	0.34	0.51	0.34	0.62	0.50	0.43	0.56
475.91	0.61	0.68	0.48	0.68	0.48	0.51	0.34	0.69	0.33
476.52	0.76	0.79	0.35	0.79	0.35	0.68	0.48	0.76	0.56
475.51	0.50	0.18	0.32	0.18	0.32	0.79	0.35	0.10	0.48
474.24	0.25	0.13	0.88	0.13	0.88	0.18	0.32	0.25	0.18
473.57	0.15	0.21	0.81	0.21	0.81	0.13	0.88	0.32	0.78
473.33	0.12	0.28	0.58	0.28	0.58	0.21	0.81	0.30	0.72
473.42	0.13	0.37	0.42	0.37	0.42	0.28	0.58	0.32	0.49
473.87	0.19	0.50	0.32	0.50	0.32	0.37	0.42	0.41	0.36
471.65	0.03	0.03	0.24	0.03	0.24	0.50	0.32	0.01	0.31
470.84	0.01	0.11	0.84	0.11	0.84	0.03	0.24	0.19	0.05
472.87	0.08	0.71	0.42	0.71	0.42	0.11	0.84	0.70	0.70
473.17	0.10	0.41	0.03	0.41	0.03	0.71	0.42	0.12	0.51
474.16	0.23	0.66	0.24	0.66	0.24	0.41	0.03	0.55	0.02
475.32	0.45	0.77	0.13	0.77	0.13	0.66	0.24	0.60	0.28
475.63	0.53	0.61	0.15	0.61	0.15	0.77	0.13	0.42	0.19
475.90	0.60	0.63	0.38	0.63	0.38	0.61	0.15	0.58	0.16
474.64	0.31	0.13	0.42	0.13	0.42	0.63	0.38	0.08	0.43
474.19	0.24	0.30	0.85	0.30	0.85	0.13	0.42	0.48	0.23
476.01	0.63	0.93	0.54	0.93	0.54	0.30	0.85	0.95	0.82
476.85	0.83	0.87	0.06	0.87	0.06	0.93	0.54	0.68	0.79
476.79	0.82	0.60	0.25	0.60	0.25	0.87	0.06	0.48	0.11

Note: Lag-2: (1) F(z_t| z_t − 1)Fztzt−1; (2) F(z_t − 1| z_t − 2)Fzt−1zt−2

Lag-3: (3) F(z_t| z_t − 1)Fztzt−1; (4) F(z_t − 2| z_t − 1)Fzt−2zt−1; (5) F(z_t − 1| z_t − 2)Fzt−1zt−2; (6) F(z_t − 3| z_t − 2)Fzt−3zt−2

(7) F(z_t| z_t − 1, z_t − 2)Fztzt−1zt−2; (8) F(z_t − 3| z_t − 1, z_t − 2)Fzt−3zt−1zt−2

Table 9.11. Parameter estimation results.

Time series			Meta-Gaussian		Frank
Lag-2	Lag-1	TS2	C_{t ∣ t − 1}Ct∣t−1	C_{t − 2 ∣ t − 1}Ct−2∣t−1	C_{t ∣ t − 1}Ct∣t−1	C_{t − 2 ∣ t − 1}Ct−2∣t−1
(t−2)	(t−1)	t	ρ^lag1 = 0.8265ρlag1=0.8265		θ^lag1 = 7.9422θlag1=7.9422
NaN	NaN	0.72
NaN	0.72	0.49
0.72	0.49	0.64	0.74	0.86	0.77	0.87
0.49	0.64	0.81	0.86	0.29	0.84	0.24
0.64	0.81	0.84	0.68	0.25	0.63	0.21
0.81	0.84	0.90	0.79	0.55	0.74	0.51
0.84	0.90	0.93	0.76	0.46	0.74	0.47
0.90	0.93	0.76	0.19	0.55	0.24	0.59
0.93	0.76	0.60	0.28	0.94	0.22	0.89
0.76	0.60	0.56	0.46	0.81	0.43	0.81
0.60	0.56	0.45	0.32	0.58	0.28	0.58
0.56	0.45	0.81	0.96	0.68	0.96	0.72
0.45	0.81	0.78	0.53	0.06	0.49	0.05
0.81	0.78	0.55	0.18	0.67	0.14	0.62
0.78	0.55	0.37	0.22	0.88	0.19	0.88
0.55	0.37	0.19	0.14	0.76	0.15	0.81
0.37	0.19	0.16	0.32	0.76	0.37	0.80
0.19	0.16	0.19	0.45	0.45	0.49	0.49
0.16	0.19	0.52	0.92	0.32	0.93	0.36
0.19	0.52	0.68	0.77	0.05	0.79	0.05
0.52	0.68	0.40	0.14	0.28	0.10	0.22
0.68	0.40	0.71	0.91	0.88	0.93	0.90
0.40	0.71	0.97	0.99	0.11	0.97	0.08
0.71	0.97	0.97	0.69	0.04	0.81	0.12
0.97	0.97	0.68	0.03	0.74	0.10	0.83
0.97	0.68	0.41	0.13	0.99	0.10	0.98
0.68	0.41	0.38	0.43	0.88	0.44	0.91
0.41	0.38	0.48	0.64	0.51	0.69	0.54
0.38	0.48	0.49	0.52	0.32	0.52	0.30
0.48	0.49	0.61	0.69	0.48	0.72	0.47
0.49	0.61	0.76	0.81	0.33	0.80	0.29
0.61	0.76	0.50	0.15	0.29	0.11	0.23
0.76	0.50	0.25	0.11	0.90	0.10	0.90
0.50	0.25	0.15	0.20	0.84	0.24	0.88
0.25	0.15	0.12	0.29	0.62	0.33	0.65
0.15	0.12	0.13	0.39	0.45	0.41	0.47
0.12	0.13	0.19	0.53	0.33	0.55	0.36
0.13	0.19	0.03	0.02	0.24	0.05	0.29
0.19	0.03	0.01	0.11	0.89	0.08	0.74
0.03	0.01	0.08	0.78	0.47	0.44	0.19
0.01	0.08	0.10	0.44	0.03	0.41	0.05
0.08	0.10	0.23	0.71	0.25	0.70	0.27
0.10	0.23	0.45	0.80	0.12	0.85	0.17
0.23	0.45	0.53	0.63	0.13	0.66	0.13
0.45	0.53	0.60	0.64	0.37	0.65	0.34
0.53	0.60	0.31	0.10	0.40	0.08	0.36
0.60	0.31	0.24	0.29	0.88	0.32	0.91
0.31	0.24	0.63	0.95	0.57	0.96	0.62
0.24	0.63	0.83	0.89	0.04	0.87	0.04
0.63	0.83	0.82	0.59	0.21	0.55	0.18

Figure 9.14 Lag-1 and lag-2 dependence comparison of simulated time series to the orginal time series TS2.

9.6 Summary

This chapter further reveals the advantages of the copula theory not only in traditional frequency analysis but also in time series analysis:

i. It allows the investigation of spatial and temporal dependences separately from their marginals and their effect.
ii. It is more robust for to modeling any type of temporal (serial dependence) and avoids the Gaussian process assumption of the time series modeling approach.
iii. It provides a better approach to identify the necessary order for the Markov process.
iv. Vine copula may be easily applied to model a higher-order Markov process.
v. These advantages are very important for the hydrological analysis under the impact of climate change and land use/land cover (LULC) when the univariate hydrological variables may no longer be considered as independent random variables.

References

Aas, K., Czado, C., Frigessi, A., and Bakken, H. (2009). Pair-copula constructions of multiple dependence. Insurance: Mathematics and Economics, 44, 182–198. doi:10.1016/j.insmatheco.2007.02.001.

Arya, F. K. and Zhang, L. (2015). Time series analysis of water quality parameters at Stillaguamish River using order series method. Stochastic Environmental Research and Risk Assessment, 29, 227. doi:10.1007/s00477–014–0907–2.

Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics, 31, 307–327.

Box, G. E. P., Jenkins, G. M., and Reinsel, G. C. (2008). Time Series Analysis: Forecasting and Control. John Wiley & Sons, Inc., Hoboken.

Chen, X. and Fan, Y. (2006). Estimation and model selection of semiparametric copula-based multivariate dynamic models under copula misspecification. Journal of Econometrics, 135, 125–154.

Darsow, W., Nguyen, B., and Olsen, E. (1992). Copulas and Markov processes. Illinois Journal of Mathematics, 36, 600–642.

Dickey, D. A. and Fuller, W. A. (1979). Distribution of the estimates for autoregressive time series with a unit root. Journal of American Statistical Association, 74, 427–431.

Hurst, H. E. (1951). Long-term storage capacity of reservoirs. Transactions of American Society of Civil Engineers, 116, 770.

Kwiatkowski, D., Phillips, P. C. B., Schmidt, P., and Shin, Y. (1992). Testing the null hypothesis of stationarity against the alternative of a unit root. Journal of Econometrics, 54, 159–178.

Joe, H. (1997). Multivariate Models and Multivariate Dependence Concepts. Chapman & Hall/CRC, New York.

Tags: Copulas and their Applications in Water Resources Engineering

Oct 12, 2020 | Posted by drzezo in Water and Sewage | Comments Off

Civil Engineer Key

Fastest Civil Engineer Engine