Abstract
In this chapter, copula modeling is applied to flood analysis with the use of real-world flood data. The chapter is structured in the following sections: (i) an introduction; (ii) at-site flood frequency analysis; (iii) spatial dependence for flood variables; and (iv) concluding remarks.
11.1 Introduction
Univariate flood frequency analysis has long been done for design of hydraulic structures, such as levees, flood walls, spillways, dams, culverts, drainage structures, and reservoirs, as well as for risk and uncertainty analysis. In the past decade, hydrologists have employed the copula theory for bivariate/multivariate flood frequency analyses. The advantages of applying the copula theory are that (i) it allows for separate consideration of marginal distributions and the joint distribution (i.e., copulas); (ii) it allows one to investigate both linear and nonlinear dependence structures; (iii) the tail dependence may be better captured; and (iv) it is easier to extend to higher dimensions through the vine copula or meta-elliptical copulas. The copula methodology has been applied to model the bivariate and multivariate flood frequency analysis (Chowdhary et al., 2011; Chen et al., 2012, 2013; Bezak et al., 2014; Sraj et al., 2015; Durocher et al., 2016; Requena et al., 2016; among others).
11.2 At-Site Flood Frequency Analysis
Univariate flood frequency analysis (e.g., using annual peak discharge) has long been a standard hydrological design method. In the United States, the log-Pearson type III distribution is still the standard distribution for flood frequency analysis, even though it is known that annual peak discharge by itself is not sufficient to account for flood risk.
A given flood event may be characterized by three important characteristics, i.e., peak discharge, volume, and duration. These three characteristics interact with one another when assessing flood risk or flood damage. As an example, a flood event with a longer duration may breach a levee due to long inundation time and possibly a large flood volume, while the peak discharge in this case may not be high. Another example is when a flood event with a higher peak discharge and a shorter duration may overtop a flood wall, causing flood damage. To further explain how to do flood frequency analysis considering all three characteristics, we will use the flood data listed in Table 11.1 (Yue, 1999) as an illustrative example.
Year | Q (cms) | V (day.cms) | D (days) | Year | Q (cms) | V (day.cms) | D (days) |
---|---|---|---|---|---|---|---|
1963 | 968 | 58,538 | 111 | 1980 | 949 | 33,010 | 69 |
1964 | 1780 | 68,828 | 98 | 1981 | 1,500 | 64,631 | 114 |
1965 | 1330 | 38,682 | 73 | 1982 | 1,920 | 50,525 | 77 |
1966 | 1650 | 54,139 | 78 | 1983 | 1,590 | 67,223 | 80 |
1967 | 934 | 39,744 | 75 | 1984 | 1,460 | 57,769 | 96 |
1968 | 1100 | 37,213 | 84 | 1985 | 1,210 | 47,627 | 80 |
1969 | 1380 | 50,895 | 80 | 1986 | 1,690 | 46,735 | 74 |
1970 | 1780 | 66,879 | 96 | 1987 | 610 | 35,600 | 96 |
1971 | 1420 | 38,634 | 66 | 1988 | 993 | 36,882 | 80 |
1972 | 1160 | 42,497 | 79 | 1989 | 1,490 | 41,943 | 63 |
1973 | 1470 | 55,766 | 78 | 1990 | 1,570 | 38,568 | 59 |
1974 | 2400 | 84,198 | 80 | 1991 | 1,130 | 49,226 | 93 |
1975 | 1260 | 48,790 | 83 | 1992 | 1,820 | 51,752 | 77 |
1976 | 1490 | 60,767 | 84 | 1993 | 1,360 | 45,263 | 83 |
1977 | 1370 | 60,824 | 92 | 1994 | 1,170 | 74,840 | 126 |
1978 | 1530 | 63,663 | 102 | 1995 | 1,550 | 51,853 | 80 |
1979 | 2040 | 59,254 | 76 |
Note: a In this dataset, discharge (Q), flood volume (V), and flood duration (D) are considered independent identically distributed (i.i.d.) random variables.
The at-site trivariate flood frequency analysis in this chapter follows this procedure:
1. Collect the streamflow sequence and separate the streamflow sequence into peak discharge, flood duration, and flood volume variable.
2. Assess the pairwise overall dependence nonparametrically with the use of the Kendall rank-based correlation coefficient.
3. Apply the vine copula approach to study the dependence structure. The bivariate copula (building block) candidates are selected based on the nonparametric tail dependence coefficient and Kendall correlation coefficient.
4. Perform the risk analysis through the joint and conditional return period.
11.2.1 Brief Discussion of Dataset
As stated in Yue (1999), the flood dataset was collected for Asuapmushuan River basin in Quebec, Canada. Due to the constraints of the dataset, Yue (1999) applied the maximum annual daily discharge (Q [m3/s]), the corresponding flood volume (V [day m3/s]), and duration (D [day]) for frequency analysis. According to Yue (1999), the values of flood volume and duration were determined from the schematic (Figure 11.1) and Equation (11.1) as follows:
In Equation (11.1), SD and ED represent the starting time and the ending time of the flood event, respectively; D represents the duration of the flood event; qiqi represents the discharge of day-i during the flood event; and V represents the flood volume.
Figure 11.1 Schematic for a given flood event.
11.2.2 Dependence Measure of Flood Variables: Nonparametric Assessment
Before we apply copulas to at-site flood frequency analysis, we compute the sample Kendall’s correlation coefficient using Equation (3.73), as listed in Table 11.2. Using Equations (3.76)–(3.79) and Equation (3.80) presented in Sections 3.4.3 and 3.4.4, Figure 11.2 graphs the chi- and K-plots to assess the dependence among the flood random variables. Table 11.2 and Figure 11.2 clearly indicate the positive dependence between Q and V as well as between V and D, while the negative dependence is detected between Q and D. Physically, the dependence structure implies that (i) high flow tends to result in high flood volume; (ii) long flood duration (or long inundation time) also tends to lead to high flood volume (e.g., flood events due to slow moving storms); (iii) a high flow event may lead to a short-duration flood event (e.g., flash flooding caused by short-duration, high-intensity storms). Thus, it is more advantageous to take all three flood characteristics into consideration than assuming peak flow and flood duration being independent, as is usually done in conventional at-site bivariate/multivariate flood frequency analysis (e.g., Yue (1999)). In addition, the K-plots in the upper triangle of Figure 11.2 confirm the positive dependence of Q and V and of V and D and close to the independence of Q and D, the same as the chi-plots in the lower triangle and scatter plots of (Q, V), (V, D), and (Q, D) placed diagonally.
Q | V | D | |
---|---|---|---|
Q | 1 | 0.41 | –0.13 |
V | 0.41 | 1 | 0.42 |
D | –0.13 | 0.42 | 1 |
With the initial dependence assessment, we will apply the vine copula, meta-Gaussian, and meta-Student t copulas to model the dependence structure for trivariate flood variables.
11.2.3 Vine Copula–Based at-Site Flood Frequency Analysis
As discussed in Chapter 5, vine copulas belong to the asymmetric copula family. Given that flood volume has a higher degree of association with both flood peak flow and flood duration, we choose volume as the center variable to build the vine structure, as shown in Figure 11.3. As shown in Figure 11.3 and the discussions in Chapter 5, the bivariate copula is the building block for the entire structure. More specifically, we have full freedom to choose the best-fitted copula for (Q, V) and (V, D) separately in T1. Then, based on the best-fitted copula in T1 we will be able to choose the best-fitted copula in T2.
Figure 11.3 Vine-copula schematic for at-site trivariate flood variables.
Copula Candidates for T1
As shown in Figure 11.3 and the discussions in the previous chapters, we need to first compute the marginal distributions nonparametrically (e.g., Weibull plotting position formula Equation (3.103), or kernel density) or parametrically with fitted marginal distributions. Here we will use the Weibull plotting position formula to compute the marginals, as shown in Table 11.3. Before we choose the copula candidate, we assess the tail dependence of (Q, V) and (V, D) such that we can make a better judgment to choose the candidate. Figure 11.4 shows the scatter plot using the empirical marginals of each of (Q, V) and of each of (V, D). Compared to the left tail (i.e., the lower tail) dependence, we are usually more interested in the right tail (i.e., the upper tail) dependence for these extreme events. Based on the tail dependence concept discussed in Chapter 3, we will first introduce how to evaluate the empirical tail dependence coefficient in what follows.
Year | Q (cms) | V (day cms) | D (days) | F(Q) | F(V) | F(D) |
---|---|---|---|---|---|---|
1963 | 968 | 58,538 | 111 | 0.12 | 0.68 | 0.91 |
1964 | 1,780 | 68,828 | 98 | 0.84 | 0.91 | 0.85 |
1965 | 1,330 | 38,682 | 73 | 0.35 | 0.21 | 0.15 |
1966 | 1,650 | 54,139 | 78 | 0.76 | 0.59 | 0.34 |
1967 | 934 | 39,744 | 75 | 0.06 | 0.24 | 0.21 |
1968 | 1,100 | 37,213 | 84 | 0.18 | 0.12 | 0.66 |
1969 | 1,380 | 50,895 | 80 | 0.44 | 0.50 | 0.49 |
1970 | 1,780 | 66,879 | 96 | 0.84 | 0.85 | 0.79 |
1971 | 1,420 | 38,634 | 66 | 0.47 | 0.18 | 0.09 |
1972 | 1,160 | 42,497 | 79 | 0.24 | 0.29 | 0.38 |
1973 | 1,470 | 55,766 | 78 | 0.53 | 0.62 | 0.34 |
1974 | 2,400 | 84,198 | 80 | 0.97 | 0.97 | 0.49 |
1975 | 1,260 | 48,790 | 83 | 0.32 | 0.41 | 0.60 |
1976 | 1,490 | 60,767 | 84 | 0.57 | 0.74 | 0.66 |
1977 | 1,370 | 60,824 | 92 | 0.41 | 0.76 | 0.71 |
1978 | 1,530 | 63,663 | 102 | 0.65 | 0.79 | 0.88 |
1979 | 2,040 | 59,254 | 76 | 0.94 | 0.71 | 0.24 |
1980 | 949 | 33,010 | 69 | 0.09 | 0.03 | 0.12 |
1981 | 1,500 | 64,631 | 114 | 0.62 | 0.82 | 0.94 |
1982 | 1,920 | 50,525 | 77 | 0.91 | 0.47 | 0.28 |
1983 | 1,590 | 67,223 | 80 | 0.74 | 0.88 | 0.49 |
1984 | 1,460 | 57,769 | 96 | 0.50 | 0.65 | 0.79 |
1985 | 1,210 | 47,627 | 80 | 0.29 | 0.38 | 0.49 |
1986 | 1,690 | 46,735 | 74 | 0.79 | 0.35 | 0.18 |
1987 | 610 | 35,600 | 96 | 0.03 | 0.06 | 0.79 |
1988 | 993 | 36,882 | 80 | 0.15 | 0.09 | 0.49 |
1989 | 1,490 | 41,943 | 63 | 0.57 | 0.26 | 0.06 |
1990 | 1,570 | 38,568 | 59 | 0.71 | 0.15 | 0.03 |
1991 | 1,130 | 49,226 | 93 | 0.21 | 0.44 | 0.74 |
1992 | 1,820 | 51,752 | 77 | 0.88 | 0.53 | 0.28 |
1993 | 1,360 | 45,263 | 83 | 0.38 | 0.32 | 0.60 |
1994 | 1,170 | 74,840 | 126 | 0.26 | 0.94 | 0.97 |
1995 | 1,550 | 51,853 | 80 | 0.68 | 0.56 | 0.49 |
Figure 11.4 Scatter plots for the marginal of (Q, V) and of (V, D).
The tail dependence may be evaluated either graphically (Abberger, 2005) or numerically (Frahm et al., 2005; Schmidt and Stradtmuller, 2006). Here the nonparametric estimation is discussed in detail. Following Frahm et al. (2005), the nonparametric estimation is based on the empirical copula (i.e., Equation (3.64)) without any assumption on either parametric copula or marginals. In general, there are three types of nonparametric estimation (i.e., log-estimator [LOG], secant of the copula’s diagonal [SEC], and CFG; Poulin et al., 2007) for the upper-tail dependence (λ̂U) that can be expressed as follows:
In Equations (11.2)–(11.4), n is the sample size; Ui, Vi are the marginal variables; and k is the chosen threshold of the LOG and SEC methods.
The LOG method was proposed by Coles et al. (1999). The SEC method first appeared in Joe (1997). The threshold k can be estimated using the heuristic plateau-finding algorithm proposed by Frahm et al. (2005), which can be formulated as follows:
1. Smooth using the box kernel with bandwidth b ∈ ℕb∈ℕ (usually each moving average window should maintain 1% data) to compute the average of (2b + 1)2b+1 successive points from λ̂1,…,λ̂n (i.e., mapping k↦λ̂k,k=1,2,…,n) to obtain λ¯1,…,λ¯n−2b.
2. Set plateau length m=⌊n−2b⌋ and define a vector: pk=λ¯k…λ¯k+m−1,k=1,…,n−2b−m+1.
3. Set the stopping criteria using the standard deviation of λ¯1,…,λ¯n−2b. The threshold k can then be estimated from the first plateau pkpk that satisfies the condition:
∑i=k+1k+m−1λ¯i−λ¯k≤2σ(11.5)
If k is un-identified, λ̂U is set as 0; otherwise, move on to step 4.
4. Estimate the upper-tail dependence coefficient for threshold k as follows:
λ̂Uk=1m∑i=1mλ¯k+i−1(11.6)
The CFG method (i.e., Equation (11.4)) first appeared in Capéraà et al. (2007) that does not require the estimation of a threshold. However, there exists a strong underlying assumption: the empirical copula may be approximated by the extreme value (EV) copula (e.g., the Gumbel–Hougaard copula as an example). It is worth noting that the lower-tail dependence is the same as the upper-tail dependence of the survival copula.
The empirical upper-/lower-tail dependence coefficient is computed, as listed in Table 11.4. To illustrate the procedure, the empirical upper-tail dependence coefficient is further explained using Q and V with the LOG method. From the sample data listed in Table 11.1, the sample size is n = 33n=33. Applying Equation (11.2), we compute λ̂k for k = 1, 2, …, 32,k=1,2,…,32, as listed in Table 11.5. With the initial λ̂ks estimated for the LOG method, we can now move on to evaluate the tail dependence. With the sample size of 33, we set the bandwidth b = 0. With b = 0b=0, we have λ̂k=λ¯k, and the standard deviation of vector λ¯s is 0.2114. The plateau length m = 5 yields the vector with size of 27 by 5 for the non-NaN values that are listed in Table 11.6. Finally, applying Equation (11.5), we obtain the first p vector that satisfies the condition that index k = 3k=3 that results in the following:
We obtain the upper tail dependence as λULOG≈0.29.
Upper | Lower | ||||
---|---|---|---|---|---|
LOG | SEC | CFG | LOG | SEC | |
Q & V | 0.29 | 0.38 | 0.43 | 0.74 | 0.95 |
V & D | 0.49 | 0.60 | 0.51 | 0.60 | 0.92 |
k | CmCm | λ̂k | k | CmCm | λ̂k |
---|---|---|---|---|---|
1 | 0.9697 | 1.0000 | 17 | 0.3636 | 0.6026 |
2 | 0.9091 | 0.4755 | 18 | 0.3333 | 0.6066 |
3 | 0.8485 | 0.2761 | 19 | 0.3030 | 0.6076 |
4 | 0.7879 | 0.1549 | 20 | 0.2727 | 0.6053 |
5 | 0.7576 | 0.3102 | 21 | 0.2121 | 0.4672 |
6 | 0.7273 | 0.4131 | 22 | 0.1818 | 0.4483 |
7 | 0.6667 | 0.2993 | 23 | 0.1818 | 0.5721 |
8 | 0.6061 | 0.1963 | 24 | 0.1515 | 0.5476 |
9 | 0.5758 | 0.2664 | 25 | 0.1515 | 0.6683 |
10 | 0.5455 | 0.3210 | 26 | 0.1212 | 0.6391 |
11 | 0.4848 | 0.2146 | 27 | 0.1212 | 0.7622 |
12 | 0.4545 | 0.2556 | 28 | 0.0909 | 0.7293 |
13 | 0.4242 | 0.2878 | 29 | 0.0606 | 0.6715 |
14 | 0.3939 | 0.3126 | 30 | 0.0606 | 0.8309 |
15 | 0.3939 | 0.4631 | 31 | 0.0303 | 0.7527 |
16 | 0.3939 | 0.5956 | 32 | 0 |
k | λ¯k | λ¯k+1 | λ¯k+2 | λ¯k+3 | λ¯k+4 |
---|---|---|---|---|---|
1 | 1.0000 | 0.4755 | 0.2761 | 0.1549 | 0.3102 |
2 | 0.4755 | 0.2761 | 0.1549 | 0.3102 | 0.4131 |
3 | 0.2761 | 0.1549 | 0.3102 | 0.4131 | 0.2993 |
4 | 0.1549 | 0.3102 | 0.4131 | 0.2993 | 0.1963 |
5 | 0.3102 | 0.4131 | 0.2993 | 0.1963 | 0.2664 |
6 | 0.4131 | 0.2993 | 0.1963 | 0.2664 | 0.3210 |
7 | 0.2993 | 0.1963 | 0.2664 | 0.3210 | 0.2146 |
8 | 0.1963 | 0.2664 | 0.3210 | 0.2146 | 0.2556 |
9 | 0.2664 | 0.3210 | 0.2146 | 0.2556 | 0.2878 |
10 | 0.3210 | 0.2146 | 0.2556 | 0.2878 | 0.3126 |
11 | 0.2146 | 0.2556 | 0.2878 | 0.3126 | 0.4631 |
12 | 0.2556 | 0.2878 | 0.3126 | 0.4631 | 0.5956 |
13 | 0.2878 | 0.3126 | 0.4631 | 0.5956 | 0.6026 |
14 | 0.3126 | 0.4631 | 0.5956 | 0.6026 | 0.6066 |
15 | 0.4631 | 0.5956 | 0.6026 | 0.6066 | 0.6076 |
16 | 0.5956 | 0.6026 | 0.6066 | 0.6076 | 0.6053 |
17 | 0.6026 | 0.6066 | 0.6076 | 0.6053 | 0.4672 |
18 | 0.6066 | 0.6076 | 0.6053 | 0.4672 | 0.4483 |
19 | 0.6076 | 0.6053 | 0.4672 | 0.4483 | 0.5721 |
20 | 0.6053 | 0.4672 | 0.4483 | 0.5721 | 0.5476 |
21 | 0.4672 | 0.4483 | 0.5721 | 0.5476 | 0.6683 |
22 | 0.4483 | 0.5721 | 0.5476 | 0.6683 | 0.6391 |
23 | 0.5721 | 0.5476 | 0.6683 | 0.6391 | 0.7622 |
24 | 0.5476 | 0.6683 | 0.6391 | 0.7622 | 0.7293 |
25 | 0.6683 | 0.6391 | 0.7622 | 0.7293 | 0.6715 |
26 | 0.6391 | 0.7622 | 0.7293 | 0.6715 | 0.8309 |
27 | 0.7622 | 0.7293 | 0.6715 | 0.8309 | 0.7527 |
From the tail dependence coefficients evaluated and listed in Table 11.4, it is seen that there exist both upper and lower tail dependences for the bivariate (Q, V) and (V, D) flood variables. To this end, we will have the following choices to investigate the dependence:
i. Use a mixed copula to model the bivariate flood variables.
ii. Use two-parameter copulas (Joe, 1997) to model the bivariate flood variables.
iii. Use copulas with upper-tail dependence to model the bivariate flood variables.
In theory, (a) all three approaches should be able to capture the overall dependence structure; (b) compared with approaches ii and iii, approach i may better capture both upper and tail dependences; (c) among the three approaches, parameter estimation for approach i is most complex; and (d) if we are only concerned with the upper-tail dependence, we may prefer approach iii. In what follows, we will discuss the copula candidates for all three approaches.
Approach i: Mixture Copula for Bivariate Variables
Following the discussion in Chapter 4, we introduce the Archimedean copula class. In this class, the Gumbel–Hougaard copula possesses the upper-tail dependence only (λU=2−21θGH), while its survival copula possesses lower-tail dependence (λL=2−21θSGH); and the Clayton copula only possesses the lower-tail dependence (λL=2−1θC). In addition, the Gumbel–Hougaard copula may only model the positive dependence, while the Clayton copula may model both positive and negative dependences. Following the discussion in Chapter 7, the meta-Gaussian copula, which is elliptical, has no tail dependence. Now through this approach, we will choose two candidates:
The Gumbel–Hougaard and Clayton copulas are listed in Chapter 4. The bivariate meta-Gaussian copula is expressed in Chapter 7. The survival Gumbel–Hougaard copula (CSGHCSGH) and its density function (cSGHcSGH) can be written as follows:
In Equations (11.7a) and (11.7b), θ : θ ≥ 1θ:θ≥1 represents the copula parameter to be estimated.
The corresponding mixture copula model may then be written as follows:
where θ = [θ1, θ2, θ3]θ=θ1θ2θ3. a1, a2, a3 ∈ [0, 1] : a1 + a2 + a3 = 1;a1,a2,a3∈01:a1+a2+a3=1; are the weight factors.
Approach ii: Two-Parameter Copulas for Bivariate Variables
As discussed in Joe (1997), the two-parameter Archimedean copulas may be capable of capturing both the overall dependence and the tail dependence. Following Joe (1997), we will briefly introduce BB1 BB4, and BB7 copulas.
BB1 Copula
Its generating function and tail dependence function can be written as follows:
The BB1 copula can only be applied to model the positive dependence and may be considered as a two-parameter Archimedean copula. It possesses both upper- and lower-tail dependences. The limiting copulas are Gumbel–Hougaard copula (θ1→0θ1→0) and Clayton copula (θ2 = 1θ2=1). With the combination of the Gumbel–Hougaard and Clayton copulas, the BB1 copula is able to capture both upper- and lower-tail dependences in which the upper-tail dependence is independent of parameter θ1θ1.
BB4 Copula
where θ1 ≥ 0, θ2>0.θ1≥0,θ2>0. Its tail dependence functions can be written as follows:
Unlike the BB1 copula, the BB4 copula is not a two-parameter Archimedean copula. Its limiting copulas are the Clayton copula when θ2→0θ2→0 and the Galambos copula when θ1→0θ1→0. The Glambos copula belongs to an extreme value copula given as follows:
As seen from Equation (11.10a), the upper-tail dependence of the BB4 copula is independent of parameter θ1.θ1.
BB7 Copula
where θ1 ≥ 1, θ2>0θ1≥1,θ2>0.
BB7 is the same as the BB1 copula and is also a two-parameter Archimedean copula. Its generating function and tail dependence functions can be expressed as follows:
The limiting copulas for the BB7 copula are the Clayton copula when θ1 = 1θ1=1 and the Joe copula when θ2→0θ2→0.
Approach iii: Choosing Copulas with Upper-Tail Dependence
The copulas are chosen from the Archimedean, extreme, and elliptical copula families as follows:
Archimedean family: Gumbel–Hougaard and Joe copulas
Extreme copula family: Galambos copula
Elliptical copula family: meta-Student t copula, λU=λL=2tν+1−ν+11−ρ1+ρ.
Among the copulas listed in approach iii, all four copulas possess the upper-tail dependence. In addition, only the meta-Student t copula also possesses the symmetric lower-tail dependence.
Parameter Estimation and the Best-Fitted Copula for T1
Parameter Estimation for Approach i: Mixture Copula
The pseudo-MLE discussed in Chapter 4 is applied to estimate the parameters of the mixture copula. The initial parameters are set as follows:
✓ Each copula is of equal weight.
✓ The initial copula parameters are represented as the random variables, which may be modeled by one copula.
For each case, we have the following: