## Abstract

In this chapter, we apply copulas to network evaluation and design. The network consists of rain gauges located in the southwest (seven gauges) and east central (three gauges) parts of Louisiana. To select proper rain gauges for network design, the kernel density is applied to model the marginal rainfall variables, as in the rainfall analysis of Chapter 10. For simplicity in illustrating the copula-based network design, meta-elliptical copulas (i.e., meta-Gaussian and meta-Student t) are applied to model the spatial dependence among rain gauges. The case study shows that the copula-based approach is appropriate for rainfall network design.

### 15.1 Introduction

Most studies on network design and evaluation have applied the multivariate normal distribution. Krstanovic and Singh (1992a, b) applied entropy theory to evaluate the rainfall network in Louisiana, studying both spatial and temporal rainfall network design. For the spatial investigation, they assumed no temporal dependence in the univariate rainfall record of a given rain gauge station, and vice versa. The multivariate Gaussian distribution was applied in the evaluation procedure.

Chow and Liu (1968) approximated discrete probability distributions with dependence trees built from the mutual information between pairs of random variables. They proposed an optimal approximation of an n-dimensional probability distribution by the product of one univariate distribution and n − 1 bivariate conditional distributions. Applying the gamma distribution to the rainfall at each gauging station and the bivariate normal distribution to the rainfall at paired stations, Al-Zahrani and Husain (1998) studied rainfall network reduction and expansion. Using extreme flow data in southern Manitoba, Yang and Burn (1994) proposed directional information transfer (DIT) to quantify the information transmitted between paired gauging stations, and they used DIT to group streamflow gauges.

Dong et al. (2005) studied the impact of rain gauge density on streamflow simulation accuracy, based on the lag-k cross-correlation coefficient between areal rainfall and discharge at Yuxiakuo in the Qingjiang River basin, south of the Three Gorges area of the Yangtze River, China. They found that as the number of rain gauges increased, the variance of areal rainfall decreased hyperbolically, whereas the cross-correlation between areal rainfall and discharge increased hyperbolically.

Yeh et al. (2006) optimized a groundwater quality monitoring network with factorial kriging and genetic algorithms, using the Pingtung Plain in Taiwan as a case study. They found that a Gaussian model (with a range of 28.5 km) and a spherical model (with a range of 40 km) may be applied to model short- and long-range spatial variations, respectively. Mishra and Coulibaly (2009) reviewed hydrometric network design. Xu et al. (2015) applied entropy theory to rain gauge network analysis, using the Xiangjiang River (a tributary of the Yangtze River) as a case study. Among the 184 rain gauges in the basin, combinations of 8 gauges were investigated. Three measures (i.e., the information of the bicombinations, bias, and the Nash–Sutcliffe coefficient) were used to identify the best network combination. Using the Xinanjiang and Soil and Water Assessment Tool (SWAT) models, the authors compared the streamflow hydrographs simulated with the good and best subnetworks, obtained from different combinations of rain gauges, against those simulated with all 184 rain gauges. Li et al. (2012) proposed the maximum information minimum redundancy (MIMR) entropy criterion for hydrometric network design, which maximizes the joint entropy within the optimal set and the transinformation between stations within and outside the optimal set, while minimizing the duplication of information within the optimal set.

Using the Pijnacker region in the Netherlands as a case study, Alfonso et al. (2010) proposed a water level monitoring network design based on information theory for the discrete case. They also applied mutual information and DIT to water level monitoring. Additionally, they estimated the total correlation of the network as follows:

$$TC(X_1, X_2, \ldots, X_N) = \sum_{i=1}^{N} H(X_i) - H(X_1, X_2, \ldots, X_N)$$

where $H(X_i)$ is the marginal entropy of station $i$ and $H(X_1, X_2, \ldots, X_N)$ is the joint entropy of the network.
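Alfonso et al. (2010) work with discrete (quantized) series. The sketch below is a minimal Python illustration of the total correlation formula, not their implementation: it assumes the water level or rainfall series have already been converted to integer class labels (the quantization step is not shown), and the function names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(symbols):
    """Shannon entropy (bits) of a sequence of hashable symbols."""
    n = len(symbols)
    return -sum((c / n) * log2(c / n) for c in Counter(symbols).values())


def total_correlation(*series):
    """TC = sum_i H(X_i) - H(X_1, ..., X_N) for discretized, time-aligned series."""
    joint = list(zip(*series))  # each time step becomes one joint symbol
    return sum(entropy(list(s)) for s in series) - entropy(joint)


# Two identical binary series share all their information.
print(total_correlation([0, 1, 0, 1], [0, 1, 0, 1]))
```

For two identical binary series, TC equals 1 bit (the information of either series); for two independent, evenly balanced series it is 0.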

As stated in Markus et al. (2003), conventional DIT faces two difficulties: (1) the joint distribution must be constructed to compute the mutual information *I*; and (2) in the multivariate case, simplifications are made, either by analyzing the mutual information of station pairs and the resulting two-dimensional transinformation matrices, or by assuming a normal distribution to calculate the multivariate joint entropy. These difficulties lead to the following limitations of previous studies: (1) an inappropriate distribution function may be selected because the sample size available to characterize the multivariate distribution is limited; (2) comparing different probability distribution functions introduces another subjective element; and (3) a high level of skill and experience is needed to work with conventional multivariate distribution functions.

To overcome these difficulties and limitations, Xu et al. (2017) investigated gauge network design using a two-phase copula entropy-based model. In this chapter, the copula-based network design is presented using the rainfall network of Southwest and East Central Louisiana as a case study to answer the following questions:

1. How much information is retained by a random variable (station)?

2. What is the information conveyed by several variables (stations) together?

3. How much information about a random variable (station) can be inferred from the knowledge of other stations through transinformation (i.e., mutual information), using copula theory?

### 15.2 Dataset

Following Krstanovic and Singh (1992a, b), daily precipitation data from East Central and Southwest Louisiana are selected for the case study. Table 15.1 lists the names of the rain gauges and their record ranges. To simplify computation, annual rainfall totals for the common period 1980–2015 are computed from the daily records and used for the rainfall network analysis. Figure 15.1 maps the 10 selected rain gauges. As in Krstanovic and Singh (1992a, b), rain gauges 2, 8, and 10 are located in the East Central region, and the remaining stations are located in the Southwest region. Table 15.2 lists the sample statistics of each rain gauge. The annual rainfall at every station except Baton Rouge, Jennings, and Slidell is slightly skewed to the left (negative skewness). The histograms in Figure 15.2 show that the univariate Gaussian distribution may not be an appropriate candidate for the marginal rainfall variables. Therefore, the kernel density function, which is also shown in Figure 15.2, is applied to model the marginal rainfall variables.

Table 15.1 Rain gauges and their record ranges.

| No. | Station | Record range | No. | Station | Record range |
|---|---|---|---|---|---|
| 1 | Abbeville | 1923–2016 | 6 | Lake Charles | 1973–2016 |
| 2 | Baton Rouge | 1930–2016 | 7 | Leland Bowman | 1951–2016 |
| 3 | Crowley | 1927–2016 | 8 | Livingston | 1980–2016 |
| 4 | De Ridder | 1915–2015 | 9 | Rockefeller | 1965–2016 |
| 5 | Jennings | 1917–2016 | 10 | Slidell | 1974–2016 |

Table 15.2 Sample statistics of annual rainfall at each rain gauge.

| Station | Mean (mm) | Standard deviation (mm) | Skewness | Kurtosis |
|---|---|---|---|---|
| Abbeville | 1561.86 | 297.62 | −0.02 | 2.55 |
| Baton Rouge | 1562.76 | 288.44 | 0.12 | 2.63 |
| Crowley | 1540.08 | 262.58 | −0.23 | 2.17 |
| De Ridder | 1520.89 | 363.61 | −0.38 | 3.45 |
| Jennings | 1545.28 | 258.95 | 0.11 | 2.25 |
| Lake Charles | 1513.91 | 307.29 | −0.13 | 2.45 |
| Leland Bowman | 1559.30 | 318.48 | −0.53 | 3.07 |
| Livingston | 1615.64 | 311.20 | −0.35 | 2.23 |
| Rockefeller | 1482.09 | 315.78 | −0.07 | 2.88 |
| Slidell | 1611.07 | 344.99 | 0.48 | 3.36 |

Figure 15.2 Histogram of annual rainfall at each rain gauge.
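The annual series and the sample statistics in Table 15.2 can be computed along the following lines. This is a sketch under stated assumptions: daily values are summed per calendar year with no missing-data handling, the standard deviation uses the n − 1 divisor, and skewness and kurtosis are moment-based, with kurtosis not reduced by 3 (a Gaussian gives about 3). The chapter does not state which estimators were used, so results may differ slightly from Table 15.2. The function names are hypothetical.

```python
import numpy as np


def annual_totals(years, daily_mm):
    """Sum daily rainfall (mm) into calendar-year totals (no missing-data handling)."""
    yrs, idx = np.unique(np.asarray(years), return_inverse=True)
    return yrs, np.bincount(idx, weights=np.asarray(daily_mm, dtype=float))


def sample_moments(x):
    """Mean, standard deviation (n - 1), moment skewness, and non-excess kurtosis."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    m2, m3, m4 = np.mean(d**2), np.mean(d**3), np.mean(d**4)
    return x.mean(), x.std(ddof=1), m3 / m2**1.5, m4 / m2**2
```

Negative skewness from `sample_moments` corresponds to the left skew noted for most stations in Table 15.2.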

### 15.3 Methodology for Rainfall Network Design

#### 15.3.1 Assumptions and Evaluation Procedures

Following Krstanovic and Singh (1992a, b) and Alfonso et al. (2010), the network design methodology is based on the assumptions that (i) the stations are as independent as possible, (ii) each selected station should yield high marginal entropy, and (iii) the mutual information should be minimized (in other words, the nontransferred information should be maximized).

The design procedure can be outlined as follows:

1. Compute the marginal entropy $H(X_i)$ of each station, and choose the station yielding the maximum marginal entropy as the center station $X_{m_1}$ to be added to the network.

2. Determine the second station $X_{m_2}$ by minimizing the transinformation (i.e., mutual information) between station $m_1$ and each remaining station, or equivalently by maximizing the nontransferred information:

$$X_{m_2} = \arg\min_{i \in \{1, \ldots, M\},\ i \neq m_1} I(X_{m_1}; X_i) \tag{15.1a}$$

or

$$X_{m_2} = \arg\max_{i \in \{1, \ldots, M\},\ i \neq m_1} \left[1 - \frac{I(X_{m_1}; X_i)}{H(X_{m_1})}\right] \tag{15.1b}$$

Here $t_1$ denotes the maximum attained in Equation (15.1b), and the mutual information $I$, which is symmetric, is

$$I(X_{m_1}; X_i) = H(X_{m_1}) - H(X_{m_1} \mid X_i) \tag{15.1c}$$

3. Determine the third station $X_{m_3}$, conditioned on $X_{m_1}$ and $X_{m_2}$, by minimizing the transinformation (mutual information in the multivariate case), or equivalently by maximizing the coefficient of nontransferred information:

$$X_{m_3} = \arg\min_{i \in \{1, \ldots, M\},\ i \notin \{m_1, m_2\}} \left[H(X_{m_1}, X_{m_2}) - H(X_{m_1}, X_{m_2} \mid X_i)\right] \tag{15.2a}$$

Comparing with Equation (15.1c), Equation (15.2a) is equivalent to

$$X_{m_3} = \arg\min_{i \in \{1, \ldots, M\},\ i \notin \{m_1, m_2\}} I\big((X_{m_1}, X_{m_2}); X_i\big) \tag{15.2b}$$

or

$$X_{m_3} = \arg\max_{i \in \{1, \ldots, M\},\ i \notin \{m_1, m_2\}} \left[1 - \frac{H(X_{m_1}, X_{m_2}) - H(X_{m_1}, X_{m_2} \mid X_i)}{H(X_{m_1}, X_{m_2})}\right] \tag{15.2c}$$

where $t_2$ denotes the maximum attained in Equation (15.2c).

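4. Similarly, one can determine the $i$th station $X_{m_i}$ from the candidate stations $X_j$, $j \in \{1, \ldots, M\}$, $j \notin \{m_1, \ldots, m_{i-1}\}$, using

$$X_{m_i} = \arg\min_{j} \left[H(X_{m_1}, \ldots, X_{m_{i-1}}) - H(X_{m_1}, \ldots, X_{m_{i-1}} \mid X_j)\right]$$

or

$$X_{m_i} = \arg\min_{j} I\big((X_{m_1}, \ldots, X_{m_{i-1}}); X_j\big) \tag{15.3a}$$

or

$$X_{m_i} = \arg\max_{j} \left[1 - \frac{H(X_{m_1}, \ldots, X_{m_{i-1}}) - H(X_{m_1}, \ldots, X_{m_{i-1}} \mid X_j)}{H(X_{m_1}, \ldots, X_{m_{i-1}})}\right] \tag{15.3b}$$

where $t_{i-1}$ denotes the maximum attained in Equation (15.3b).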
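For the meta-Gaussian copula, the entropies and transinformation in Equations (15.1) to (15.3) reduce to closed forms in the correlation matrix $\mathbf{R}$ of the normal scores $z_k = \Phi^{-1}(F_k(x_k))$, where $F_k$ is the marginal distribution of station $k$. The derivation below is a sketch based on standard Gaussian-copula identities, not a result reported in this chapter; the meta-Student t copula has a different copula entropy and is not covered. The joint entropy splits into marginal entropies plus the copula entropy $H_c$:

$$H(X_1, \ldots, X_d) = \sum_{k=1}^{d} H(X_k) + H_c, \qquad H_c = -\int_{[0,1]^d} c(\mathbf{u}) \ln c(\mathbf{u})\, d\mathbf{u}$$

With the Gaussian copula density $c(\mathbf{u}) = |\mathbf{R}|^{-1/2} \exp\left(-\tfrac{1}{2}\mathbf{z}^{\mathsf T}(\mathbf{R}^{-1} - \mathbf{I})\mathbf{z}\right)$ and $\mathrm{E}[\mathbf{z}^{\mathsf T}\mathbf{A}\mathbf{z}] = \operatorname{tr}(\mathbf{A}\mathbf{R})$:

$$H_c = \tfrac{1}{2}\ln|\mathbf{R}| + \tfrac{1}{2}\operatorname{tr}\big((\mathbf{R}^{-1} - \mathbf{I})\mathbf{R}\big) = \tfrac{1}{2}\ln|\mathbf{R}| + \tfrac{1}{2}\,(d - \operatorname{tr}\mathbf{R}) = \tfrac{1}{2}\ln|\mathbf{R}|$$

because $\mathbf{R}$ has a unit diagonal. For the selected set $S = \{m_1, \ldots, m_{i-1}\}$ and a candidate $j$:

$$I\big((X_{m_1}, \ldots, X_{m_{i-1}}); X_j\big) = H(X_S) + H(X_j) - H(X_S, X_j) = \tfrac{1}{2}\ln\frac{|\mathbf{R}_S|}{|\mathbf{R}_{S \cup \{j\}}|}$$

which, for a single selected station with normal-score correlation $\rho$, reduces to $I(X_{m_1}; X_j) = -\tfrac{1}{2}\ln(1 - \rho^2)$.

The marginal entropies cancel in the transinformation, so it depends only on the copula; they still enter the denominator $H(X_S)$ of the coefficients $t$.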

The coefficients of nontransferred information should fulfill the following condition:

$$t_i < t_{i-1} < \cdots < t_1 \le 1$$

No more stations are needed when $t_i \le t_{i+1}$, i.e., repetitive information exists at station $X_{m_{i+1}}$, so that only the first stations $X_{m_1}, \ldots, X_{m_i}$ are necessary for the network with $M$ initial stations. In what follows, we apply this procedure to the rainfall network design.
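The selection procedure of Steps 1 to 4 and the stopping rule can be sketched as follows under a meta-Gaussian copula. Assumptions (not from the chapter): the copula correlation matrix is estimated from rank-based normal scores; the marginal entropies `H` are supplied, for example from the kernel estimate of Section 15.3.2; transinformation and joint entropy use the Gaussian-copula closed forms $I = \tfrac{1}{2}\ln(|\mathbf{R}_S|/|\mathbf{R}_{S\cup\{j\}}|)$ and $H(X_S) = \sum_{k \in S} H(X_k) + \tfrac{1}{2}\ln|\mathbf{R}_S|$; joint entropies are positive, so the arg-min of transinformation and the arg-max of $t$ pick the same station; and the stopping rule is read as "keep adding stations while $t$ strictly decreases". All function names are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def normal_scores(data):
    """Rank-transform each column (station) of an n x M array to standard normal scores."""
    n = data.shape[0]
    ranks = np.argsort(np.argsort(data, axis=0), axis=0) + 1  # ranks 1..n per column
    u = ranks / (n + 1.0)  # pseudo-observations strictly inside (0, 1)
    return np.vectorize(NormalDist().inv_cdf)(u)


def log_det(R, idx):
    """Log-determinant of the correlation submatrix for station indices idx."""
    _, value = np.linalg.slogdet(R[np.ix_(idx, idx)])
    return value


def transinformation(R, selected, j):
    """Meta-Gaussian I((X_S); X_j) = 0.5 * ln(|R_S| / |R_{S+j}|)."""
    return 0.5 * (log_det(R, selected) - log_det(R, selected + [j]))


def joint_entropy(H, R, selected):
    """H(X_S) = sum of marginal entropies + Gaussian copula entropy 0.5 * ln|R_S|."""
    return float(H[selected].sum() + 0.5 * log_det(R, selected))


def greedy_order(data, H):
    """Order all stations greedily; return the order and the coefficients t_1, t_2, ..."""
    R = np.corrcoef(normal_scores(data), rowvar=False)
    order = [int(np.argmax(H))]  # Step 1: center station with the largest marginal entropy
    remaining = [j for j in range(data.shape[1]) if j != order[0]]
    t = []
    while remaining:
        mi = [transinformation(R, order, j) for j in remaining]
        k = int(np.argmin(mi))  # minimum transinformation, Eqs. (15.1a), (15.2b), (15.3a)
        t.append(1.0 - mi[k] / joint_entropy(H, R, order))  # coefficient of nontransferred info
        order.append(remaining.pop(k))
    return order, t


def truncate(order, t):
    """Keep stations while t strictly decreases; t[0] belongs to the second station."""
    n_keep = min(2, len(order))
    for k in range(1, len(t)):
        if t[k] >= t[k - 1]:
            break
        n_keep = k + 2
    return order[:n_keep]
```

Fitting a meta-Student t copula instead would change the `transinformation` and `joint_entropy` helpers, while the greedy loop itself would stay the same.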

#### 15.3.2 Estimation of Marginal Entropy

As stated in Section 15.3.1, the marginal entropy is estimated first, and the station that yields the largest entropy is chosen as the center station. As stated earlier, the empirical kernel density is applied to model the marginal rainfall variables to avoid possible misidentification of the probability distribution. Furthermore, because rainfall is nonnegative, a kernel density with positive support is applied for the analysis.

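Following Beirlant et al. (2001), the marginal entropy is estimated as follows:

$$H_{m_i} = -\frac{1}{n} \sum_{j=1}^{n} \ln f^{\ker}_{m_i}\big(x_{m_i, j}\big)$$

where $H_{m_i}$ represents the marginal entropy of rain gauge $m_i$; $x_{m_i, j}$ is the $j$th annual rainfall observation at that gauge; $n$ represents the length of the rainfall record; and $f^{\ker}_{m_i}$ represents the kernel density function with positive support.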
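A sketch of the marginal entropy estimate in its resubstitution form, $H_{m_i} = -\frac{1}{n}\sum_{j=1}^{n} \ln f^{\ker}_{m_i}(x_{m_i,j})$, follows. The chapter does not specify here which positive-support kernel is used, so this sketch assumes a Gaussian kernel fitted to log-rainfall and transformed back by $f_X(x) = f_Y(\ln x)/x$, with a rule-of-thumb bandwidth; both choices, and the function names, are assumptions.

```python
import numpy as np


def positive_support_kde(x_eval, sample, bandwidth=None):
    """Density on (0, inf) from a Gaussian KDE fitted to log(sample).

    The change of variables y = ln x gives f_X(x) = f_Y(ln x) / x.
    """
    y = np.log(np.asarray(sample, dtype=float))
    n = y.size
    if bandwidth is None:
        bandwidth = 1.06 * y.std(ddof=1) * n ** (-0.2)  # rule-of-thumb bandwidth (assumed)
    x_eval = np.asarray(x_eval, dtype=float)
    z = (np.log(x_eval)[:, None] - y[None, :]) / bandwidth
    f_y = np.exp(-0.5 * z**2).sum(axis=1) / (n * bandwidth * np.sqrt(2.0 * np.pi))
    return f_y / x_eval


def marginal_entropy(sample):
    """Resubstitution estimate H = -(1/n) * sum_j ln f(x_j), in nats."""
    x = np.asarray(sample, dtype=float)
    return float(-np.mean(np.log(positive_support_kde(x, x))))
```

For an `n × M` array `data` of annual totals, the center station of Step 1 would then be `int(np.argmax([marginal_entropy(col) for col in data.T]))`. Because this is a differential entropy, its value depends on the measurement unit: rescaling rainfall by a factor c shifts every station's estimate by ln c, so the ranking of stations is unchanged.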