Abstract
Bivariate and multivariate frequency analyses require univariate (marginal) distributions, which are determined by fitting them to observed data. The fitting, in turn, requires estimating the distribution parameters and assessing the goodness of fit. In practical applications, such as hydrologic design, risk analysis is also needed. This chapter therefore briefly discusses these basic elements, which underlie frequency analysis and are used in subsequent chapters.
2.1 Univariate Probability Distributions
Among the univariate distributions, we will briefly discuss the continuous distributions most commonly applied, especially in univariate hydrological frequency analyses (Kite, 1977; Singh, 1998; Rao and Hamed, 2000; Singh and Zhang, 2016). In what follows, X denotes a random variable whose observations are assumed to be independent and identically distributed (IID), with probability density function (PDF) f(x) and cumulative distribution function (CDF) F(x).
2.1.1 Normal Distribution
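The PDF and CDF of the normal distribution are given as follows:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right], \qquad F(x) = \Phi\left(\frac{x-\mu}{\sigma}\right) \qquad (2.1)$$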
In Equation (2.1), Φ represents the standard normal CDF, and μ and σ are the location and scale parameters, which correspond to the mean and standard deviation of the random variable, respectively. Defining the standard normal variable z = (x − μ)/σ, Equation (2.1) can be written as
$$F(z) = \Phi(z) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{z} e^{-s^2/2}\,ds$$
Abramowitz and Stegun (1965) have numerically approximated F(z) for z ≥ 0 with an absolute error less than 7.5 × 10⁻⁸ as
$$F(z) = 1 - \phi(z)\left(a_1 t + a_2 t^2 + a_3 t^3 + a_4 t^4 + a_5 t^5\right) + \epsilon(z), \qquad t = \frac{1}{1 + pz}$$
where φ(z) is the standard normal PDF, p = 0.2316419, a1 = 0.319381530, a2 = −0.356563782, a3 = 1.781477937, a4 = −1.821255978, a5 = 1.330274429, and ϵ(z) is the error of approximation. For z < 0, F(z) = 1 − F(−z).
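A minimal Python sketch of this polynomial approximation follows, checked against the exact CDF computed from the error function. The polynomial form and the constant p = 0.2316419 are taken from Abramowitz and Stegun's formula 26.2.17 (an assumption insofar as the displayed equation is not reproduced in this text), and the function names are illustrative.

```python
import math

# a1..a5 as listed above; P = 0.2316419 is the companion constant of
# Abramowitz and Stegun (1965), formula 26.2.17 (assumed here)
A = (0.319381530, -0.356563782, 1.781477937, -1.821255978, 1.330274429)
P = 0.2316419


def std_normal_pdf(z):
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def std_normal_cdf_approx(z):
    """Polynomial approximation of the standard normal CDF F(z)."""
    if z < 0.0:
        # The polynomial is fitted for z >= 0; use the symmetry F(-z) = 1 - F(z)
        return 1.0 - std_normal_cdf_approx(-z)
    t = 1.0 / (1.0 + P * z)
    poly = 0.0
    for a in reversed(A):  # Horner form of a1*t + a2*t**2 + ... + a5*t**5
        poly = (poly + a) * t
    return 1.0 - std_normal_pdf(z) * poly


def std_normal_cdf_exact(z):
    """Reference value from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


grid = [k / 100.0 for k in range(-500, 501)]
max_err = max(abs(std_normal_cdf_approx(z) - std_normal_cdf_exact(z)) for z in grid)
print(f"largest absolute error on [-5, 5]: {max_err:.1e}")
```

Evaluating the polynomial in Horner form keeps the cost to five multiply-adds; the symmetry step is what lets a one-sided fit cover the whole real line.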
In hydrological frequency analysis, the normal distribution has been commonly applied in two scenarios:
1. A normal distribution with zero mean is the classic assumption for the errors (residuals) in regression analysis and time series analysis. As a simple example, let Y be the response (predicted) variable and X the predictor variable. Then, a simple linear regression can be expressed as
$$E[Y \mid X] = \hat{Y} = a + bx; \qquad e = Y - \hat{Y}, \qquad e \sim N(0, \sigma_e^2) \qquad (2.2)$$
where e is the residual or error, and e ~ N(0, σe²) denotes that e is normally distributed with mean 0 and variance σe². E[Y | X] denotes the conditional expectation of Y given X, and Ŷ denotes the response predicted by simple linear regression with intercept a and slope b.
As another example, a stationary time series {Xt, t = 1, 2, …} can be modeled by an autoregressive moving average model of order (p, q), ARMA(p, q) (Box et al., 2007), as follows:
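$$x_t = c + \phi_1 x_{t-1} + \cdots + \phi_p x_{t-p} + e_t + \theta_1 e_{t-1} + \cdots + \theta_q e_{t-q}; \qquad e_t \sim N(0, \sigma_{e_t}^2) \qquad (2.3)$$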
In Equation (2.3), c is a constant term (the long-term mean of the stationary series is c/(1 − ϕ1 − … − ϕp), which reduces to c only when p = 0), and ϕ1, …, ϕp and θ1, …, θq are the coefficients of the autoregressive and moving average terms, respectively. In Equations (2.2) and (2.3), the residual e, which is normally distributed with mean 0, is commonly called Gaussian white noise.
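To make the ARMA structure concrete, the sketch below simulates the ARMA(1, 1) special case of Equation (2.3) with Gaussian white noise, using only the Python standard library. The parameter values and the function name are hypothetical. The simulation also illustrates that, for a stationary ARMA process, the sample mean settles near c/(1 − ϕ1) rather than near c.

```python
import random


def simulate_arma11(c, phi1, theta1, sigma_e, n, burn_in=500, seed=1):
    """Simulate x_t = c + phi1*x_{t-1} + e_t + theta1*e_{t-1}, e_t ~ N(0, sigma_e**2)."""
    rng = random.Random(seed)
    x_prev = c / (1.0 - phi1)  # start at the stationary mean
    e_prev = 0.0
    series = []
    for t in range(n + burn_in):
        e_t = rng.gauss(0.0, sigma_e)  # Gaussian white noise
        x_t = c + phi1 * x_prev + e_t + theta1 * e_prev
        if t >= burn_in:  # discard the start-up transient
            series.append(x_t)
        x_prev, e_prev = x_t, e_t
    return series


# Hypothetical parameter values chosen for illustration only
c, phi1, theta1 = 2.0, 0.6, 0.3
x = simulate_arma11(c, phi1, theta1, sigma_e=1.0, n=100_000)
sample_mean = sum(x) / len(x)
print(sample_mean, c / (1.0 - phi1))  # the two values should be close
```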
2. After a suitable monotone transformation (e.g., the Box–Cox or probability integral transformation), the normal distribution (Equation (2.1)) may be applied to model non-normally distributed hydrologic variables (e.g., Hazen, 1914; Markovic, 1965).
2.1.2 Log-Normal Distribution
Let Y = ln X. If X follows the log-normal distribution, then its logarithm Y follows the normal distribution, and the PDF of X can be written as follows:
The CDF of the log-normal distribution can be computed again through the standard normal distribution as follows:
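The logarithm of the random variable X is a special case of the Box–Cox transformation (Box and Cox, 1964) with λ = 0:
$$y = \begin{cases} \dfrac{x^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\ \ln x, & \lambda = 0 \end{cases}$$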
The log-normal distribution has been widely used in hydrological frequency analysis (e.g., Chow, 1954).
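Both points above, evaluating the log-normal CDF through the standard normal CDF and obtaining the logarithm as the λ = 0 case of the Box–Cox transformation, can be sketched in a few lines of Python. Here mu_y and sigma_y are assumed to be the mean and standard deviation of Y = ln X; the function names are illustrative.

```python
import math
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def lognormal_cdf(x, mu_y, sigma_y):
    """Log-normal CDF through the standard normal CDF Phi.

    mu_y and sigma_y are the mean and standard deviation of Y = ln X (assumed).
    """
    if x <= 0.0:
        return 0.0
    return STANDARD_NORMAL.cdf((math.log(x) - mu_y) / sigma_y)


def box_cox(x, lam):
    """Box-Cox transformation; lam = 0 gives the natural logarithm."""
    if x <= 0.0:
        raise ValueError("the Box-Cox transformation requires x > 0")
    if lam == 0.0:
        return math.log(x)
    return (x ** lam - 1.0) / lam


# As lam -> 0, (x**lam - 1)/lam approaches ln x
print(box_cox(3.0, 1e-8), box_cox(3.0, 0.0))
print(lognormal_cdf(math.exp(1.2), mu_y=1.2, sigma_y=0.5))  # the median, so about 0.5
```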
2.1.3 Student t Distribution
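Similar to the normal distribution, the Student t distribution is also bell-shaped (Hogg and Craig, 1978). However, it has heavier tails, i.e., its excess kurtosis is greater than 0. The PDF of the standard Student t distribution is given as follows:
$$f(x) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)}\left(1 + \frac{x^2}{\nu}\right)^{-\frac{\nu+1}{2}} \qquad (2.6a)$$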
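Its CDF is given as follows:
$$F(x) = \frac{1}{2} + x\,\frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)}\;{}_2F_1\left(\frac{1}{2}, \frac{\nu+1}{2}; \frac{3}{2}; -\frac{x^2}{\nu}\right) \qquad (2.6b)$$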
In Equations (2.6a) and (2.6b), ν represents the number of degrees of freedom. As ν increases, the Student t distribution converges to the normal distribution, and its excess kurtosis approaches 0. This can be seen from the excess kurtosis of the Student t distribution, which for ν > 4 equals 6/(ν − 4), so that lim_{ν→∞} 6/(ν − 4) = 0. In Equation (2.6b), ₂F₁ denotes the Gauss hypergeometric function (defined by analytic continuation for |z| ≥ 1):
$${}_2F_1(a, b; c; z) = \sum_{n=0}^{\infty}\frac{(a)_n (b)_n}{(c)_n}\frac{z^n}{n!}, \qquad |z| < 1 \qquad (2.6c)$$
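In Equation (2.6c), the Pochhammer symbol (rising factorial) is defined as follows:
$$(q)_n = q(q+1)\cdots(q+n-1) = \frac{\Gamma(q+n)}{\Gamma(q)}, \qquad (q)_0 = 1$$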
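The sketch below evaluates the Student t CDF of Equation (2.6b) with a truncated ₂F₁ series, building each term from the previous one through the Pochhammer recursion (q)ₙ₊₁ = (q)ₙ(q + n). The series form converges only for x²/ν < 1, i.e., |x| < √ν, so this is an illustration rather than a general-purpose routine; the function names are hypothetical.

```python
import math


def hyp2f1_series(a, b, c, z, tol=1e-15, max_terms=100_000):
    """Gauss hypergeometric function 2F1(a, b; c; z) by its power series (|z| < 1)."""
    if abs(z) >= 1.0:
        raise ValueError("the series form requires |z| < 1")
    term = 1.0  # n = 0 term
    total = 1.0
    for n in range(max_terms):
        # (q)_{n+1} = (q)_n * (q + n), so consecutive terms differ by the factor
        # (a + n)(b + n) / ((c + n)(n + 1)) * z
        term *= (a + n) * (b + n) / ((c + n) * (n + 1)) * z
        total += term
        if abs(term) < tol * abs(total):
            break
    return total


def student_t_cdf(x, nu):
    """Standard Student t CDF via 2F1 (this sketch requires |x| < sqrt(nu))."""
    # Gamma((nu + 1)/2) / (sqrt(nu * pi) * Gamma(nu/2)), via lgamma to avoid overflow
    coef = math.exp(math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)) / math.sqrt(nu * math.pi)
    return 0.5 + x * coef * hyp2f1_series(0.5, (nu + 1) / 2, 1.5, -x * x / nu)


def student_t_excess_kurtosis(nu):
    """Excess kurtosis 6 / (nu - 4), defined for nu > 4."""
    if nu <= 4:
        raise ValueError("the excess kurtosis is finite only for nu > 4")
    return 6.0 / (nu - 4)


# nu = 1 (Cauchy) has the closed form 0.5 + atan(x)/pi
print(student_t_cdf(0.5, 1), 0.5 + math.atan(0.5) / math.pi)
print(student_t_excess_kurtosis(10), student_t_excess_kurtosis(1000))
```

The two closed forms used as checks (ν = 1 and ν = 2) are independent of the series, and the ν = 1000 value is close to Φ(1), consistent with the convergence to the normal distribution noted above.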
2.1.4 Exponential and Gamma Distributions
The exponential distribution is a special case of the gamma distribution (Hogg and Craig, 1978). These two distributions have been commonly applied in rainfall and flood frequency analyses. The gamma distribution can be given as follows:
When the shape parameter α = 1, the gamma distribution reduces to the exponential distribution as follows:
whose CDF is simply
The CDF of the gamma distribution can be expressed as follows:
where
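The gamma function can be expressed as follows:
$$\Gamma(\alpha) = \int_0^{\infty} t^{\alpha-1}e^{-t}\,dt, \qquad \alpha > 0$$
with the following properties:
$$\Gamma(\alpha + 1) = \alpha\,\Gamma(\alpha), \qquad \Gamma(n + 1) = n!$$
where n is a nonnegative integer. Abramowitz and Stegun (1965) have numerically approximated the gamma function for 0 ≤ α ≤ 1 with an absolute error less than 3 × 10⁻⁷ as
$$\Gamma(1 + \alpha) = 1 + \sum_{i=1}^{8} a_i\alpha^i + \epsilon(\alpha)$$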
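For other values of α, the recurrence Γ(α + 1) = αΓ(α) can be used to shift the argument into the range of the approximation. For example, Γ(3.5) = 2.5 × 1.5 × Γ(1.5), where Γ(1.5) = Γ(1 + 0.5) follows from the approximation.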
Besides the exponential distribution, the chi-square distribution is also a special case of the gamma distribution, obtained by setting α = k/2 and β = 2, where k denotes the number of degrees of freedom (usually an integer).
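The polynomial approximation and the recurrence can be combined into a small routine. The coefficients below are those of Abramowitz and Stegun's formula 6.1.36, which approximates Γ(1 + α) rather than Γ(α); they are supplied here as an assumption, since the text does not list them. The function names are illustrative, and Python's math.gamma serves only as a reference value.

```python
import math

# b1..b8 from Abramowitz and Stegun (1965), formula 6.1.36 (assumed here):
# Gamma(1 + a) = 1 + b1*a + ... + b8*a**8 + eps(a), 0 <= a <= 1, |eps(a)| <= 3e-7
B = (-0.577191652, 0.988205891, -0.897056937, 0.918206857,
     -0.756704078, 0.482199394, -0.193527818, 0.035868343)


def gamma_1_plus(a):
    """Polynomial approximation of Gamma(1 + a) for 0 <= a <= 1."""
    if not 0.0 <= a <= 1.0:
        raise ValueError("the approximation is valid only for 0 <= a <= 1")
    poly = 0.0
    for b in reversed(B):  # Horner form of b1*a + ... + b8*a**8
        poly = (poly + b) * a
    return 1.0 + poly


def gamma_approx(alpha):
    """Gamma(alpha) for alpha > 0, using Gamma(x + 1) = x * Gamma(x) to shift the argument."""
    if alpha <= 0.0:
        raise ValueError("alpha must be positive")
    factor = 1.0
    while alpha > 2.0:  # Gamma(alpha) = (alpha - 1) * Gamma(alpha - 1)
        alpha -= 1.0
        factor *= alpha
    if alpha < 1.0:  # Gamma(alpha) = Gamma(1 + alpha) / alpha
        return factor * gamma_1_plus(alpha) / alpha
    return factor * gamma_1_plus(alpha - 1.0)  # here 1 <= alpha <= 2


print(gamma_approx(3.5), math.gamma(3.5))  # Gamma(3.5) = 2.5 * 1.5 * Gamma(1.5)
```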
2.1.5 Generalized Extreme Value (GEV) and Extreme Value (EV) Distributions
Introduced by Jenkinson (1955) and recommended by the Natural Environment Research Council (1975) of Great Britain, the GEV distribution has been widely applied for flood frequency analysis. The EV distributions may be directly obtained from the GEV distribution. The PDF and CDF of the GEV distribution can be written as follows:
In Equations (2.9a) and (2.9b), a, b, and c are the scale, shape, and location parameters, respectively, and the range of variable X depends on the sign of parameter b.
Depending on the sign of the shape parameter b, the GEV distribution reduces to one of the following three EV distributions.
EV I Distribution (b = 0)
The EV I distribution is also called the Gumbel distribution (Gumbel, 1941). It is a popular distribution for flood, drought, and rainfall frequency analyses. The PDF and CDF of the EV I distribution can be written as follows:
The coefficient of skewness is 1.1396, and X ranges over x ∈ (−∞, ∞).
EV II Distribution (b < 0)
The EV II distribution is also called Fréchet distribution (Gumbel, 1958) that has also been applied to frequency analysis. The PDF and CDF of the EV II distribution can be written as follows:
The coefficient of skewness is greater than 1.1396, and X takes values in the range x ∈ [c + a/b, ∞); because it is unbounded above, it is appropriate for flood frequency analysis.
EV III Distribution (b > 0)
Belonging to the Weibull family (i.e., the reversed Weibull distribution), the EV III distribution is usually applied for low-flow frequency analysis (Singh, 1998). The PDF and CDF of the EV III distribution can be written as follows:
The coefficient of skewness is less than 1.1396, and X ranges over x ∈ (−∞, c + a/b]; this upper bound makes it unsuitable for flood frequency analysis.
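The three EV cases can be seen in a short sketch of the GEV CDF and its inverse (the T-year quantile corresponds to the nonexceedance probability 1 − 1/T). Because Equations (2.9a) and (2.9b) are not reproduced here, the code assumes the commonly used form F(x) = exp{−[1 − b(x − c)/a]^(1/b)} for b ≠ 0 and F(x) = exp{−exp[−(x − c)/a]} for b = 0, which gives a lower bound c + a/b for b < 0 and an upper bound c + a/b for b > 0. Function names and parameter values are hypothetical.

```python
import math


def gev_cdf(x, a, b, c):
    """GEV CDF with scale a > 0, shape b, location c (assumed parameterization).

    b = 0: EV I (Gumbel); b < 0: EV II, lower bound c + a/b;
    b > 0: EV III, upper bound c + a/b.
    """
    if a <= 0.0:
        raise ValueError("the scale a must be positive")
    y = (x - c) / a
    if b == 0.0:
        return math.exp(-math.exp(-y))
    s = 1.0 - b * y
    if s <= 0.0:
        # Outside the support: below the lower bound (b < 0) or above the upper bound (b > 0)
        return 0.0 if b < 0.0 else 1.0
    return math.exp(-s ** (1.0 / b))


def gev_quantile(p, a, b, c):
    """Inverse of gev_cdf for a nonexceedance probability 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie in (0, 1)")
    if b == 0.0:
        return c - a * math.log(-math.log(p))
    return c + a / b * (1.0 - (-math.log(p)) ** b)


# Hypothetical parameters: 100-year quantile (p = 0.99) for the three cases
for b in (-0.1, 0.0, 0.1):
    print(b, gev_quantile(0.99, a=10.0, b=b, c=50.0))
```

For the same a and c, the quantile grows as b decreases, which reflects the heavier upper tail of the EV II case.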
2.1.6 Weibull Distribution
The Weibull distribution (Rosin and Rammler, 1933) is commonly applied for low-flow frequency analysis, hazard function analysis, and risk and reliability analysis. The PDF and CDF of the Weibull distribution can be written as follows:
The Weibull distribution is a reversed GEV distribution, i.e., the mirror image of the EV III distribution.
Pearson and Log-Pearson Type III Distributions
These two distributions are commonly applied for flood frequency analysis (Singh, 1998). The log-Pearson type III distribution is the standard distribution for flood frequency analysis in the United States, whereas the Pearson type III distribution is the standard in China.
Pearson Type III Distribution
The PDF and CDF of Pearson type III distribution can be written as follows:
Using y = (x − c)/a, Equations (2.14a) and (2.14b) can be written as
The value of F(y) can be determined in the same way as for the gamma distribution discussed earlier.
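As with the gamma distribution, F(y) is a regularized lower incomplete gamma function, which the sketch below evaluates with its power series. It assumes a positive scale a, location c, and a shape parameter named `shape` (a hypothetical name, since the symbol in Equations (2.14) is not reproduced here), so that F(x) = P(shape, (x − c)/a). Applying the same function to ln x gives the log-Pearson type III CDF.

```python
import math


def reg_lower_gamma(s, y, tol=1e-15, max_terms=100_000):
    """Regularized lower incomplete gamma P(s, y) = gamma(s, y) / Gamma(s).

    Uses the power series
    P(s, y) = y**s * e**(-y) / Gamma(s + 1) * sum_n y**n / ((s + 1)(s + 2)...(s + n)),
    which is adequate for moderate y; very large y would call for a continued fraction.
    """
    if y <= 0.0:
        return 0.0
    term = 1.0
    total = 1.0
    for n in range(1, max_terms):
        term *= y / (s + n)
        total += term
        if term < tol * total:
            break
    return math.exp(s * math.log(y) - y - math.lgamma(s + 1.0)) * total


def pearson3_cdf(x, a, shape, c):
    """Pearson type III CDF, assuming scale a > 0 and location c: P(shape, (x - c)/a)."""
    if a <= 0.0:
        raise ValueError("this sketch assumes a positive scale a")
    return reg_lower_gamma(shape, (x - c) / a)


def log_pearson3_cdf(x, a, shape, c):
    """Log-Pearson type III CDF: ln X follows the Pearson type III distribution."""
    if x <= 0.0:
        return 0.0
    return pearson3_cdf(math.log(x), a, shape, c)


# shape = 1 reduces to a shifted exponential: F = 1 - exp(-(x - c)/a)
print(pearson3_cdf(3.0, a=2.0, shape=1.0, c=0.0), 1.0 - math.exp(-1.5))
```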
Log-Pearson Type III Distribution
Similar to the log-normal distribution, if the random variable X follows the log-Pearson type III distribution, then its logarithm Y = ln X follows the Pearson type III distribution. The PDF and CDF of the log-Pearson type III distribution can be written as follows:
2.1.7 Burr XII Distribution
The PDF and CDF of Burr XII distribution (Burr, 1942) can be written as follows:
2.1.8 Log-Logistic Distribution
The log-logistic distribution is also known as Fisk distribution (Shoukri et al., 1988). Its PDF and CDF can be written as follows:
Because Equation (2.17b) can be inverted in closed form, it gives a quantile directly for any nonexceedance probability. Equations (2.17) can also be generalized by including a location parameter.
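Because the log-logistic CDF can be inverted in closed form, computing a quantile needs no numerical root finding. The sketch assumes the two-parameter form F(x) = 1/[1 + (x/a)^(−b)] with scale a > 0 and shape b > 0; the parameter names and values are hypothetical, since Equations (2.17) are not reproduced here.

```python
def loglogistic_cdf(x, a, b):
    """Assumed log-logistic CDF F(x) = 1 / (1 + (x/a)**(-b)), x > 0, a > 0, b > 0."""
    if x <= 0.0:
        return 0.0
    return 1.0 / (1.0 + (x / a) ** (-b))


def loglogistic_quantile(p, a, b):
    """Closed-form inverse of loglogistic_cdf: x_p = a * (p / (1 - p))**(1/b)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie in (0, 1)")
    return a * (p / (1.0 - p)) ** (1.0 / b)


# Hypothetical parameters: the median equals a, and the 100-year quantile uses p = 0.99
print(loglogistic_quantile(0.5, a=120.0, b=4.0), loglogistic_quantile(0.99, a=120.0, b=4.0))
```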
2.1.9 Pareto Distribution
There are four distributions in the Pareto family (Arnold, 1983). The two- and three-parameter Pareto distributions have been used for modeling large floods. The PDF and CDF of the two-parameter Pareto distribution can be written as follows:
There are many other distributions that have been applied in frequency analysis (Singh and Zhang, 2016), besides the distributions illustrated in this section.
2.2 Bivariate Distributions
Here we discuss the commonly applied bivariate distributions in bivariate hydrologic analyses.
2.2.1 Bivariate Gamma Distribution
Several different bivariate gamma distributions have been applied in bivariate hydrological analyses. In all of the bivariate gamma distributions introduced here, the marginals are univariate gamma distributions with the PDF and CDF given by Equations (2.7) and (2.8).
Izawa Bigamma Model
The joint PDF of Izawa bigamma model (Izawa, 1965) is given for random variables X and Y as follows:
where
In the preceding expressions, Is(⋅) is the modified Bessel function of the first kind of order s; η is the association parameter between X and Y; ρ is Pearson's product-moment correlation coefficient of X and Y; X ~ gamma(x; αx, βx); and Y ~ gamma(y; αy, βy).
The limitations of the Izawa bigamma distribution are that (i) the shape parameter of X must be less than that of Y, and (ii) it can model only positively correlated random variables.
Moran Model
The PDF of the Moran model (Moran, 1969) of X and Y with gamma marginals can be written as
where x′ = Φ⁻¹(FX(x; αx, βx)) and y′ = Φ⁻¹(FY(y; αy, βy)), and ρN represents Pearson's product-moment correlation coefficient of the transformed variables x′ and y′.
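The Moran model maps each gamma margin to a standard normal score and couples the scores with a bivariate normal density. Assuming its PDF takes the usual normal-score form f(x, y) = fX(x) fY(y) (1 − ρN²)^(−1/2) exp{[2ρN x′y′ − ρN²(x′² + y′²)] / [2(1 − ρN²)]}, a sketch looks as follows. The gamma margins use shape α and scale β (an assumed parameterization), the gamma CDF comes from the incomplete gamma series, and all names are illustrative.

```python
import math
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def gamma_pdf(x, alpha, beta):
    """Gamma density with shape alpha and scale beta (assumed parameterization)."""
    if x <= 0.0:
        return 0.0
    log_f = (alpha - 1.0) * math.log(x) - x / beta - math.lgamma(alpha) - alpha * math.log(beta)
    return math.exp(log_f)


def gamma_cdf(x, alpha, beta, tol=1e-15):
    """Gamma CDF via the power series of the regularized lower incomplete gamma function."""
    y = x / beta
    if y <= 0.0:
        return 0.0
    term = total = 1.0
    n = 0
    while term > tol * total:
        n += 1
        term *= y / (alpha + n)
        total += term
    return math.exp(alpha * math.log(y) - y - math.lgamma(alpha + 1.0)) * total


def normal_score(p):
    """x' = Phi^{-1}(p); p is clipped away from 0 and 1, where the inverse is infinite."""
    p = min(max(p, 1e-12), 1.0 - 1e-12)
    return STANDARD_NORMAL.inv_cdf(p)


def moran_pdf(x, y, ax, bx, ay, by, rho_n):
    """Joint PDF of X ~ gamma(ax, bx) and Y ~ gamma(ay, by) coupled through normal scores."""
    xp = normal_score(gamma_cdf(x, ax, bx))
    yp = normal_score(gamma_cdf(y, ay, by))
    r2 = 1.0 - rho_n * rho_n
    # Ratio of the bivariate standard normal density to phi(x') * phi(y')
    ratio = math.exp((2.0 * rho_n * xp * yp - rho_n ** 2 * (xp ** 2 + yp ** 2)) / (2.0 * r2)) / math.sqrt(r2)
    return ratio * gamma_pdf(x, ax, bx) * gamma_pdf(y, ay, by)


# Hypothetical parameters; with rho_n = 0 the density is the product of the margins
print(moran_pdf(2.0, 3.0, 2.0, 1.5, 3.0, 1.0, 0.6), moran_pdf(2.0, 3.0, 2.0, 1.5, 3.0, 1.0, 0.0))
```

Because ρN is defined on the normal scores, it can be estimated as the sample correlation of the transformed data rather than of the raw gamma variables.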
Smith–Adelfang–Tubbs (SAT) Model
Again with gamma marginals, Smith et al. (1982) developed another bivariate model, the SAT model, whose PDF and CDF can be expressed as follows:
Farlie–Gumbel–Morgenstern (FGM) Model
This bivariate model was first proposed by Morgenstern (1956). The PDF and CDF of the FGM model for random variables X and Y can be expressed as follows:
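where {fX(x), fY(y)} and {FX(x), FY(y)} are the marginal PDFs and CDFs of X and Y, respectively, and η, with −1 ≤ η ≤ 1, is the association parameter between X and Y (it is not itself the correlation coefficient; for the FGM model, Spearman's rank correlation coefficient equals η/3).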
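The FGM model is easiest to express through the marginal probabilities u = FX(x) and v = FY(y). The sketch assumes the standard FGM forms F(x, y) = uv[1 + η(1 − u)(1 − v)] and f(x, y) = fX(x) fY(y)[1 + η(1 − 2u)(1 − 2v)] with −1 ≤ η ≤ 1, and checks numerically that Spearman's rank correlation equals η/3, which is why the FGM model can describe only weak dependence. Function names are illustrative.

```python
def fgm_cdf(u, v, eta):
    """FGM joint CDF in terms of marginal probabilities u = F_X(x), v = F_Y(y)."""
    if not -1.0 <= eta <= 1.0:
        raise ValueError("eta must lie in [-1, 1]")
    return u * v * (1.0 + eta * (1.0 - u) * (1.0 - v))


def fgm_pdf(fx, fy, u, v, eta):
    """FGM joint PDF: f_X(x) * f_Y(y) * [1 + eta (1 - 2u)(1 - 2v)]."""
    return fx * fy * (1.0 + eta * (1.0 - 2.0 * u) * (1.0 - 2.0 * v))


def spearman_rho_fgm(eta, n=200):
    """Spearman's rho = 12 * (double integral of C(u, v) over the unit square) - 3, midpoint rule."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += fgm_cdf((i + 0.5) * h, (j + 0.5) * h, eta)
    return 12.0 * total * h * h - 3.0


print(spearman_rho_fgm(0.9))  # close to 0.9 / 3
```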
Gumbel Mixed (GM) Model
The GM model has been applied in bivariate flood frequency analysis (Yue et al., 1999). Its CDF may be expressed as follows:
where θ is the association parameter of the GM model, which describes the dependence between random variables X and Y as follows:
where ρρ is Pearson’s product moment correlation coefficient.
Note that, in the conventional GM model, the marginal CDFs of random variables X and Y are Gumbel distributions (i.e., Equation (2.10b)).
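A sketch of the GM model written in terms of the Gumbel marginal probabilities u = FX(x) and v = FY(y) follows. It assumes the mixed-model form F(x, y) = uv exp{−θ[1/ln u + 1/ln v]⁻¹} with 0 ≤ θ ≤ 1, and the relation θ = 2[1 − cos(π√(ρ/6))] for 0 ≤ ρ ≤ 2/3 commonly quoted for this model; both assumptions should be checked against the equations of this section before use. The marginal parameter values are hypothetical.

```python
import math


def gumbel_cdf(x, a, c):
    """EV I (Gumbel) CDF with scale a and location c."""
    return math.exp(-math.exp(-(x - c) / a))


def gm_cdf(u, v, theta):
    """Gumbel mixed model joint CDF from marginal probabilities u, v in (0, 1]."""
    if not 0.0 <= theta <= 1.0:
        raise ValueError("theta must lie in [0, 1]")
    if u >= 1.0:  # margin consistency: F(inf, y) = F_Y(y)
        return v
    if v >= 1.0:
        return u
    lu, lv = math.log(u), math.log(v)
    return u * v * math.exp(-theta / (1.0 / lu + 1.0 / lv))


def gm_theta_from_rho(rho):
    """Assumed relation theta = 2[1 - cos(pi * sqrt(rho / 6))], valid for 0 <= rho <= 2/3."""
    if not 0.0 <= rho <= 2.0 / 3.0:
        raise ValueError("rho must lie in [0, 2/3]")
    return 2.0 * (1.0 - math.cos(math.pi * math.sqrt(rho / 6.0)))


# Hypothetical flood peak (X) and volume (Y) margins
theta = gm_theta_from_rho(0.5)
u = gumbel_cdf(900.0, a=150.0, c=700.0)
v = gumbel_cdf(45.0, a=8.0, c=35.0)
print(theta, gm_cdf(u, v, theta))
```

Since θ cannot exceed 1, the model covers only 0 ≤ ρ ≤ 2/3 under this relation.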
Gumbel Logistic (GL) Model
The Gumbel logistic model was first proposed by Gumbel (1960, 1961). With Gumbel-distributed marginals (Equation (2.10)), the CDF of the GL model can be expressed as follows:
where
As the association parameter of the GL model, η describes the dependence between the two random variables.
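The GL model can be sketched the same way. The code assumes the logistic form F(x, y) = exp{−[(−ln u)^m + (−ln v)^m]^(1/m)} with m ≥ 1, where u and v are the Gumbel marginal probabilities, and the link ρ = 1 − 1/m² to Pearson's correlation. Here m stands in for the association parameter; how it maps onto η in the book's equation is an assumption to verify.

```python
import math


def gl_cdf(u, v, m):
    """Gumbel logistic joint CDF from marginal probabilities u, v in (0, 1], m >= 1."""
    if m < 1.0:
        raise ValueError("m must be at least 1")
    # abs() guards against -0.0 when u or v equals 1
    s = abs(math.log(u)) ** m + abs(math.log(v)) ** m
    return math.exp(-(s ** (1.0 / m)))


def gl_m_from_rho(rho):
    """Assumed link rho = 1 - 1/m**2, i.e. m = 1 / sqrt(1 - rho), for 0 <= rho < 1."""
    if not 0.0 <= rho < 1.0:
        raise ValueError("rho must lie in [0, 1)")
    return 1.0 / math.sqrt(1.0 - rho)


# m = 1 gives independence (uv); large m approaches the upper bound min(u, v)
print(gl_cdf(0.3, 0.5, 1.0), gl_cdf(0.3, 0.5, gl_m_from_rho(0.75)), gl_cdf(0.3, 0.5, 50.0))
```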
Bivariate Exponential Model
Marshall and Ingram (1967), Singh and Singh (1991), and Bacchi et al. (1994) proposed the bivariate exponential distribution that can be expressed as follows:
where X and Y are exponentially distributed as
and c, which lies between 0 and 1, is the association parameter between X and Y, related to the coefficient of correlation as
This bivariate model is valid only for ρ between −0.404 and 0, i.e., it can represent only weak negative dependence.
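Where the −0.404 limit comes from can be checked numerically. The sketch assumes the standardized (unit-mean) form F(x, y) = 1 − e^(−x) − e^(−y) + e^(−x − y − cxy), for which the correlation coefficient is ρ(c) = −1 + ∫₀^∞ e^(−y)/(1 + cy) dy; at c = 1 this gives about −0.4037. Function names and the integration settings are illustrative.

```python
import math


def bvexp_cdf(x, y, c):
    """Bivariate exponential CDF with unit-mean exponential margins (assumed form), 0 <= c <= 1."""
    if x <= 0.0 or y <= 0.0:
        return 0.0
    return 1.0 - math.exp(-x) - math.exp(-y) + math.exp(-(x + y + c * x * y))


def bvexp_rho(c, upper=60.0, n=60_000):
    """rho(c) = -1 + integral_0^inf e^(-y) / (1 + c y) dy, by Simpson's rule on [0, upper]."""
    h = upper / n  # n must be even for Simpson's rule

    def f(y):
        return math.exp(-y) / (1.0 + c * y)

    s = f(0.0) + f(upper)
    for i in range(1, n):
        s += (4.0 if i % 2 else 2.0) * f(i * h)
    return -1.0 + s * h / 3.0


for c in (0.0, 0.5, 1.0):
    print(c, round(bvexp_rho(c), 4))  # rho decreases from 0 toward about -0.404
```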
Nagao–Kadoya Bivariate Exponential (BVE) Model
With exponentially distributed random variables X and Y (Equation (2.7a)), the PDF of the BVE model (Balakrishnan and Lai, 2009) can be expressed as follows:
where
In Equations (2.25) and (2.25a), ρ is the Pearson correlation coefficient between X and Y; α and β are the parameters of the exponential variables X and Y, respectively, i.e., X ~ exp(α) and Y ~ exp(β) from Equation (2.7a); and I0 is the modified Bessel function of the first kind of order zero.
2.2.2 Bivariate Normal Distribution
The bivariate normal distribution is also applied in bivariate hydrological frequency analysis. Let X and Y follow normal distributions (Equation (2.1)). Then their joint distribution, the bivariate normal distribution, can be written as follows:
2.2.3 Bivariate Log-Normal Distribution
For the log-normally distributed random variables X and Y (Equation (2.4)), the joint distribution may be expressed with the bivariate log-normal distribution as follows:
where μX, σX; μY, σY are the means and standard deviations of random variables X and Y, respectively, and ρ is the Pearson correlation coefficient of (ln X, ln Y).
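Since the bivariate normal and log-normal densities are not reproduced here, the following sketch assumes their standard forms: the bivariate normal density with means mx, my, standard deviations sx, sy, and correlation rho, and the bivariate log-normal density obtained from it through the change of variables (ln x, ln y), whose Jacobian is 1/(xy). In the log-normal function, the parameters are taken to be those of ln X and ln Y; function names are illustrative.

```python
import math


def bvn_pdf(x, y, mx, sx, my, sy, rho):
    """Bivariate normal density, -1 < rho < 1."""
    zx, zy = (x - mx) / sx, (y - my) / sy
    r2 = 1.0 - rho * rho
    q = (zx * zx - 2.0 * rho * zx * zy + zy * zy) / r2
    return math.exp(-0.5 * q) / (2.0 * math.pi * sx * sy * math.sqrt(r2))


def bvln_pdf(x, y, mx, sx, my, sy, rho):
    """Bivariate log-normal density; mx, sx, my, sy, rho refer to (ln X, ln Y) (assumed)."""
    if x <= 0.0 or y <= 0.0:
        return 0.0
    # Change of variables u = ln x, v = ln y with Jacobian 1 / (x * y)
    return bvn_pdf(math.log(x), math.log(y), mx, sx, my, sy, rho) / (x * y)


# The same functions cover negative, zero, and positive correlation
print([round(bvn_pdf(1.0, 1.0, 0.0, 1.0, 0.0, 1.0, r), 4) for r in (-0.8, 0.0, 0.8)])
```

Any −1 < ρ < 1 is admissible here, which is the property that distinguishes these two models from the bivariate gamma and exponential models above.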
From the preceding commonly applied bivariate probability distribution models, it is seen that (1) the bivariate gamma and exponential models can represent only a restricted range of dependence (e.g., the Izawa model only positive dependence, and the bivariate exponential model only −0.404 ≤ ρ ≤ 0); (2) the bivariate normal and log-normal distributions can model dependence over the entire range of the correlation coefficient; and (3) in each model, both marginal distributions belong to the same type of univariate distribution, i.e., gamma, exponential, Gumbel, normal, or log-normal.
2.3 Estimation of Parameters of Probability Distributions
The dependence in the commonly applied conventional bivariate distributions is expressed through the Pearson correlation coefficient of the bivariate random variables, which can be estimated directly from the sample. In this section, we therefore only briefly review parameter estimation for univariate probability distributions.
There are a number of methods that may be applied to estimate the parameters of univariate distributions (Singh, 1998; Rao and Hamed, 2000): (1) the method of moments (MOM), (2) maximum likelihood estimation (MLE), (3) the method of probability weighted moments (PWM), (4) the method of L-moments (LM), (5) the method of least squares (LS), (6) the method of maximum entropy (MAX_ENT), (7) the method of mixed moments (MIX), (8) the generalized method of moments (GMM), and (9) the incomplete means method (ICM). Let X be a random variable with density function f(x; α1, α2, …, αk), in which the αs are the parameters, and let X = [x1, x2, …, xn] be a sample drawn from the population. In what follows, we introduce the four methods most commonly applied in hydrology and water resources engineering: MOM, MLE, PWM, and LM.
2.3.1 Method of Moments
The MOM is a natural and relatively easy parameter estimation method for univariate distributions. However, MOM estimates are usually less efficient than MLE estimates, especially for distributions with three or more parameters. This is partly because higher-order sample moments are more likely to be biased for relatively small samples (Rao and Hamed, 2000).
MOM assumes that the sample moments are equal to the population moments, that is, that the sample is sufficiently large to be representative of the population. Given a probability distribution with k parameters α1, …, αk, we compute k sample moments from the sample X = {x1, …, xn}, such as the sample mean X̄, the sample standard deviation SX, the sample skewness coefficient g1, and the sample kurtosis. Equating these k sample moments to the corresponding population moments, which are functions of the parameters, yields k equations that are solved simultaneously for the unknown parameters α1, …, αk. Note that the first moment is taken about the origin, whereas the higher-order sample moments are central moments, taken about the mean. We illustrate parameter estimation by MOM using the normal, gamma, Weibull, and Gumbel distributions as examples.
The rth moment ratio, denoted as Cr, is defined as follows:
From Equation (2.28), we can see the following:
In addition, the classical moment diagram is plotted using the possible pairs (β1, β2), which are related to C3 and C4 as follows:
Solution: With the PDF of the normal distribution given in Equation (2.1), and letting α1 = μ and α2 = σ, we can estimate parameters α1 and α2 by solving the following two equations:
The following equations equate the sample mean X̄ to the population mean and the sample variance VAR(X) to the population variance:
In Equation (2.29), N is replaced by (N−1) to correct for the bias due to sample size.
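Solving Equations (2.29c) and (2.29d) simultaneously, we get the following:
$$\hat{\alpha}_1 = \hat{\mu} = \bar{X}, \qquad \hat{\alpha}_2 = \hat{\sigma} = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left(x_i - \bar{X}\right)^2}$$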
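The normal-distribution MOM estimates therefore reduce to the sample mean and the sample standard deviation with the N − 1 divisor, as in the short sketch below; the function name and the data values are hypothetical.

```python
import math


def mom_normal(sample):
    """MOM estimates (mu, sigma) of the normal distribution; the variance uses N - 1."""
    n = len(sample)
    if n < 2:
        raise ValueError("at least two observations are needed")
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)
    return mean, math.sqrt(var)


# Hypothetical annual maxima
print(mom_normal([1.0, 2.0, 3.0, 4.0, 5.0]))  # mean 3.0, standard deviation sqrt(2.5)
```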
Solution: The PDF of the gamma distribution is given as Equation (2.7). Let α1 = α and α2 = β. The first moment of the gamma distribution can be written as follows:
The variance of gamma distribution can be given as follows:
Setting the sample mean and variance equal to the population mean and variance (m1 = μ1 and m2 = μ2), we can estimate the parameters by solving Equations (2.30a) and (2.30b) simultaneously as follows:
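It is worth noting that the exponential distribution is the special case of the gamma distribution with α1 = 1. Only the first moment is then needed, and, with β as the scale parameter, the estimate is α2 = m1 (equivalently, a rate parameter of 1/m1).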
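Assuming the gamma distribution of Equation (2.7) has shape α and scale β, so that its mean is αβ and its variance αβ² (consistent with the chi-square case α = k/2, β = 2, whose mean is k), the MOM estimates are α̂ = m1²/m2 and β̂ = m2/m1. A sketch with hypothetical data and illustrative function names:

```python
def sample_mean_var(sample):
    """Sample mean m1 and sample variance m2 (with the N - 1 divisor)."""
    n = len(sample)
    if n < 2:
        raise ValueError("at least two observations are needed")
    m1 = sum(sample) / n
    m2 = sum((x - m1) ** 2 for x in sample) / (n - 1)
    return m1, m2


def mom_gamma(sample):
    """MOM estimates (alpha, beta) for a gamma distribution with mean alpha*beta, variance alpha*beta**2."""
    m1, m2 = sample_mean_var(sample)
    return m1 * m1 / m2, m2 / m1


def mom_exponential(sample):
    """Exponential case (alpha = 1): the scale estimate is the sample mean."""
    m1, _ = sample_mean_var(sample)
    return m1


data = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical sample
print(mom_gamma(data), mom_exponential(data))
```

The estimates reproduce the sample moments exactly: α̂β̂ = m1 and α̂β̂² = m2.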
Solution: The PDF of the Weibull distribution is given as Equation (2.13a). Let α1 = a and α2 = b. Then we can write the population mean as follows: