Generalized Pareto distribution

In statistics, the generalized Pareto distribution (GPD) is a family of continuous probability distributions. It is often used to model the tails of another distribution. It is specified by three parameters: location $\mu$ , scale $\sigma$ , and shape $\xi$ . Sometimes it is specified by only scale and shape and sometimes only by its shape parameter. Some references give the shape parameter as $\kappa =-\xi \,$ .

Definition

The standard cumulative distribution function (cdf) of the GPD is defined by

F_{\xi }(z)={\begin{cases}1-\left(1+\xi z\right)^{-1/\xi }&{\text{for }}\xi \neq 0,\\1-e^{-z}&{\text{for }}\xi =0.\end{cases}}

where the support is $z\geq 0$ for $\xi \geq 0$ and $0\leq z\leq -1/\xi$ for $\xi <0$ . The corresponding probability density function (pdf) is

f_{\xi }(z)={\begin{cases}(1+\xi z)^{-{\frac {\xi +1}{\xi }}}&{\text{for }}\xi \neq 0,\\e^{-z}&{\text{for }}\xi =0.\end{cases}}

Characterization

The related location-scale family of distributions is obtained by replacing the argument z by ${\frac {x-\mu }{\sigma }}$ and adjusting the support accordingly.

The cumulative distribution function of $X\sim GPD(\mu ,\sigma ,\xi )$ ( $\mu \in \mathbb {R}$ , $\sigma >0$ , and $\xi \in \mathbb {R}$ ) is

F_{(\mu ,\sigma ,\xi )}(x)={\begin{cases}1-\left(1+{\frac {\xi (x-\mu )}{\sigma }}\right)^{-1/\xi }&{\text{for }}\xi \neq 0,\\1-\exp \left(-{\frac {x-\mu }{\sigma }}\right)&{\text{for }}\xi =0,\end{cases}}

where the support of $X$ is $x\geqslant \mu$ when $\xi \geqslant 0\,$ , and $\mu \leqslant x\leqslant \mu -\sigma /\xi$ when $\xi <0$ .

The probability density function (pdf) of $X\sim GPD(\mu ,\sigma ,\xi )$ is

f_{(\mu ,\sigma ,\xi )}(x)={\frac {1}{\sigma }}\left(1+{\frac {\xi (x-\mu )}{\sigma }}\right)^{\left(-{\frac {1}{\xi }}-1\right)}

again, for $x\geqslant \mu$ when $\xi \geqslant 0$ , and $\mu \leqslant x\leqslant \mu -\sigma /\xi$ when $\xi <0$ .
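The cdf and pdf above can be evaluated directly. The following is a minimal sketch using only the Python standard library; the function names `gpd_cdf` and `gpd_pdf` are illustrative choices, not part of any library, and the $\xi =0$ branch of the pdf is taken as the exponential density implied by the cdf.

```python
import math

def gpd_cdf(x, mu=0.0, sigma=1.0, xi=0.0):
    """CDF of GPD(mu, sigma, xi) at a scalar x (hypothetical helper)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    if z <= 0:
        return 0.0                      # at or below the lower endpoint mu
    if xi == 0:
        return 1.0 - math.exp(-z)       # exponential case
    base = 1.0 + xi * z
    if base <= 0.0:
        return 1.0                      # at or beyond mu - sigma/xi (only when xi < 0)
    return 1.0 - base ** (-1.0 / xi)

def gpd_pdf(x, mu=0.0, sigma=1.0, xi=0.0):
    """PDF of GPD(mu, sigma, xi) at a scalar x (hypothetical helper)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    if z < 0:
        return 0.0                      # below the support
    if xi == 0:
        return math.exp(-z) / sigma     # exponential density
    base = 1.0 + xi * z
    if base <= 0.0:
        return 0.0                      # at or beyond the upper endpoint (xi < 0)
    return base ** (-1.0 / xi - 1.0) / sigma
```

For example, `gpd_cdf(0.5, 0.0, 2.0, -1.0)` evaluates the uniform case $U(0,2)$ at $x=0.5$.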

The pdf is a solution of the following differential equation:

\left\{{\begin{array}{l}f'(x)(-\mu \xi +\sigma +\xi x)+(\xi +1)f(x)=0,\\f(0)={\frac {\left(1-{\frac {\mu \xi }{\sigma }}\right)^{-{\frac {1}{\xi }}-1}}{\sigma }}\end{array}}\right\}
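The first equation of the system can be checked directly for $\xi \neq 0$ . Differentiating the pdf gives

f'(x)=-{\frac {\xi +1}{\sigma ^{2}}}\left(1+{\frac {\xi (x-\mu )}{\sigma }}\right)^{-{\frac {1}{\xi }}-2}

and, since $-\mu \xi +\sigma +\xi x=\sigma \left(1+{\frac {\xi (x-\mu )}{\sigma }}\right)$ , multiplying the two yields

f'(x)(-\mu \xi +\sigma +\xi x)=-{\frac {\xi +1}{\sigma }}\left(1+{\frac {\xi (x-\mu )}{\sigma }}\right)^{-{\frac {1}{\xi }}-1}=-(\xi +1)f(x).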

Special cases

If the shape $\xi$ and location $\mu$ are both zero, the GPD is equivalent to the exponential distribution.
With shape $\xi =-1$ and location $\mu =0$ , the GPD is equivalent to the continuous uniform distribution $U(0,\sigma )$ .
With shape $\xi >0$ and location $\mu =\sigma /\xi$ , the GPD is equivalent to the Pareto distribution with scale $x_{m}=\sigma /\xi$ and shape $\alpha =1/\xi$ .
If $X\sim GPD(\mu =0,\sigma ,\xi )$ , then $Y=\log(X)\sim exGPD(\sigma ,\xi )$ [1]. (exGPD stands for the exponentiated generalized Pareto distribution.)
GPD is similar to the Burr distribution.
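The Pareto case can be verified by substitution into the cdf. With $\xi >0$ and $\mu =\sigma /\xi$ ,

1+{\frac {\xi (x-\mu )}{\sigma }}={\frac {\xi x}{\sigma }}={\frac {x}{x_{m}}},\qquad F(x)=1-\left({\frac {x}{x_{m}}}\right)^{-\alpha }\quad {\text{for }}x\geq x_{m},

where $x_{m}=\sigma /\xi$ and $\alpha =1/\xi$ , which is the Pareto cdf.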

Generating generalized Pareto random variables

Generating GPD random variables

If U is uniformly distributed on (0, 1], then

X=\mu +{\frac {\sigma (U^{-\xi }-1)}{\xi }}\sim GPD(\mu ,\sigma ,\xi \neq 0)

and

X=\mu -\sigma \ln(U)\sim GPD(\mu ,\sigma ,\xi =0).

Both formulas are obtained by inversion of the cdf.
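The two inversion formulas translate into a short sampler. This is a sketch using only the Python standard library; `gpd_from_uniform` and `gpd_sample` are hypothetical names introduced for illustration.

```python
import math
import random

def gpd_from_uniform(u, mu=0.0, sigma=1.0, xi=0.0):
    """Map u in (0, 1] to a GPD(mu, sigma, xi) value using the formulas above."""
    if not 0.0 < u <= 1.0:
        raise ValueError("u must lie in (0, 1]")
    if xi == 0:
        return mu - sigma * math.log(u)
    return mu + sigma * (u ** (-xi) - 1.0) / xi

def gpd_sample(n, mu=0.0, sigma=1.0, xi=0.0, rng=None):
    """Draw n GPD variates; rng is an optional random.Random instance."""
    rng = rng or random.Random()
    # random() returns values in [0, 1), so 1 - random() lies in (0, 1],
    # matching the interval assumed for U in the text.
    return [gpd_from_uniform(1.0 - rng.random(), mu, sigma, xi) for _ in range(n)]
```

Passing a seeded generator, for instance `gpd_sample(1000, xi=0.2, rng=random.Random(0))`, makes the draws reproducible.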

In the MATLAB Statistics Toolbox, the "gprnd" command generates generalized Pareto random numbers.

GPD as an Exponential-Gamma Mixture

A GPD random variable can also be expressed as an exponential random variable with a gamma-distributed rate parameter. If

X|\Lambda \sim \operatorname {Exp} (\Lambda )

and

\Lambda \sim \operatorname {Gamma} (\alpha ,\beta )

then

X\sim \operatorname {GPD} (\xi =1/\alpha ,\ \sigma =\beta /\alpha )

Here $\beta$ is the rate parameter of the gamma distribution. Because both gamma parameters must be greater than zero, this representation carries the additional restriction that $\xi$ must be positive.

In addition to this mixture (or compound) expression, the generalized Pareto distribution can also be expressed as a simple ratio. Concretely, for $Y\sim {\text{Exponential}}(1)$ and $Z\sim {\text{Gamma}}(1/\xi ,1)$ , we have $\mu +\sigma {\frac {Y}{\xi Z}}\sim {\text{GPD}}(\mu ,\sigma ,\xi )$ . This is a consequence of the mixture after setting $\beta =\alpha$ and taking into account that the rate parameters of the exponential and gamma distribution are simply inverse multiplicative constants.
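Both representations can be simulated with the Python standard library. The sketch below assumes $\beta$ is a rate parameter; since `random.gammavariate` takes a shape and a scale, the rate is converted with `1.0 / beta`. The function names are hypothetical.

```python
import random

def gpd_via_mixture(alpha, beta, rng):
    """Exponential-gamma mixture: returns a GPD(0, beta/alpha, 1/alpha) variate."""
    # gammavariate(shape, scale): a gamma rate of beta corresponds to scale 1/beta.
    lam = rng.gammavariate(alpha, 1.0 / beta)
    # expovariate takes the rate; a rate that underflows to 0.0 (possible for
    # very small alpha) would raise ZeroDivisionError and is not handled here.
    return rng.expovariate(lam)

def gpd_via_ratio(mu, sigma, xi, rng):
    """Ratio form mu + sigma * Y / (xi * Z) for xi > 0."""
    if xi <= 0:
        raise ValueError("the ratio representation requires xi > 0")
    y = rng.expovariate(1.0)               # Y ~ Exponential(1)
    z = rng.gammavariate(1.0 / xi, 1.0)    # Z ~ Gamma(1/xi, 1)
    return mu + sigma * y / (xi * z)
```

With `alpha=2.0` and `beta=3.0`, the mixture corresponds to $\xi =0.5$ and $\sigma =1.5$ , so `gpd_via_ratio(0.0, 1.5, 0.5, rng)` draws from the same distribution.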

Exponentiated generalized Pareto distribution

The exponentiated generalized Pareto distribution (exGPD)

If $X\sim GPD(\mu =0,\sigma ,\xi )$ , then $Y=\log(X)$ is distributed according to the exponentiated generalized Pareto distribution, denoted by $Y\sim exGPD(\sigma ,\xi )$ .

The probability density function (pdf) of $Y\sim exGPD(\sigma ,\xi )$ , with $\sigma >0$ , is

g_{(\sigma ,\xi )}(y)={\begin{cases}{\frac {e^{y}}{\sigma }}\left(1+{\frac {\xi e^{y}}{\sigma }}\right)^{-1/\xi -1}&{\text{for }}\xi \neq 0,\\{\frac {1}{\sigma }}e^{y-e^{y}/\sigma }&{\text{for }}\xi =0,\end{cases}}

where the support is $-\infty <y<\infty$ for $\xi \geq 0$ , and $-\infty <y\leq \log(-\sigma /\xi )$ for $\xi <0$ .
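As a sanity check on the density and its support, the exGPD pdf can be coded and integrated numerically. This is a minimal sketch using the Python standard library; `exgpd_pdf` and `trapezoid` are hypothetical helpers, and the trapezoidal rule is only a convenient choice of quadrature.

```python
import math

def exgpd_pdf(y, sigma=1.0, xi=0.0):
    """Density of Y = log(X) with X ~ GPD(mu=0, sigma, xi), at a scalar y."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    x = math.exp(y)
    if xi == 0:
        return x / sigma * math.exp(-x / sigma)
    base = 1.0 + xi * x / sigma
    if base <= 0.0:
        return 0.0          # y at or beyond log(-sigma/xi), only when xi < 0
    return x / sigma * base ** (-1.0 / xi - 1.0)

def trapezoid(f, a, b, n=20000):
    """Composite trapezoidal rule for f on [a, b] with n subintervals."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h
```

Integrating over a wide interval such as $[-40,40]$ should return a value close to 1 for each sign of $\xi$ .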

For all $\xi$ , $\log \sigma$ acts as the location parameter. See the right panel for the pdf when the shape $\xi$ is positive.

The exGPD has finite moments of all orders for all $\sigma >0$ and $-\infty <\xi <\infty$ .

The moment-generating function of $Y\sim exGPD(\sigma ,\xi )$ is

M_{Y}(s)=E[e^{sY}]={\begin{cases}-{\frac {1}{\xi }}\left(-{\frac {\sigma }{\xi }}\right)^{s}B(s+1,-1/\xi )&{\text{for }}s\in (-1,\infty ),\xi <0,\\{\frac {1}{\xi }}\left({\frac {\sigma }{\xi }}\right)^{s}B(s+1,1/\xi -s)&{\text{for }}s\in (-1,1/\xi ),\xi >0,\\\sigma ^{s}\Gamma (1+s)&{\text{for }}s\in (-1,\infty ),\xi =0,\end{cases}}

where $B(a,b)$ and $\Gamma (a)$ denote the beta function and gamma function, respectively.

The expected value of $Y\sim exGPD(\sigma ,\xi )$ depends on both the scale $\sigma$ and the shape $\xi$ ; the shape enters through the digamma function $\psi$ :

E[Y]={\begin{cases}\log \left(-{\frac {\sigma }{\xi }}\right)+\psi (1)-\psi (-1/\xi +1)&{\text{for }}\xi <0,\\\log \left({\frac {\sigma }{\xi }}\right)+\psi (1)-\psi (1/\xi )&{\text{for }}\xi >0,\\\log \sigma +\psi (1)&{\text{for }}\xi =0.\end{cases}}

Note that for a fixed value of $\xi \in (-\infty ,\infty )$ , $\log \sigma$ acts as the location parameter of the exponentiated generalized Pareto distribution.

The variance of $Y\sim exGPD(\sigma ,\xi )$ depends only on the shape parameter $\xi$ , through the polygamma function of order 1 (also called the trigamma function):

Var[Y]={\begin{cases}\psi '(1)-\psi '(-1/\xi +1)&{\text{for }}\xi <0,\\\psi '(1)+\psi '(1/\xi )&{\text{for }}\xi >0,\\\psi '(1)&{\text{for }}\xi =0.\end{cases}}

See the right panel for the variance as a function of $\xi$ . Note that $\psi '(1)=\pi ^{2}/6\approx 1.644934$ .
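As a worked example of the mean and variance formulas, take $\xi =1/2$ , so that $1/\xi =2$ . The recurrences $\psi (x+1)=\psi (x)+1/x$ and $\psi '(x+1)=\psi '(x)-1/x^{2}$ give $\psi (1)-\psi (2)=-1$ and $\psi '(2)=\pi ^{2}/6-1$ , hence

E[Y]=\log(2\sigma )-1,\qquad Var[Y]={\frac {\pi ^{2}}{3}}-1\approx 2.289868.

Changing $\sigma$ shifts the mean by $\log \sigma$ and leaves the variance unchanged.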

Note that the roles of the scale parameter $\sigma$ and the shape parameter $\xi$ under $Y\sim exGPD(\sigma ,\xi )$ are separately interpretable, which may lead to more robust and efficient estimation of $\xi$ than working with $X\sim GPD(\sigma ,\xi )$ [2]. Under $X\sim GPD(\mu =0,\sigma ,\xi )$ the roles of the two parameters are entangled (at least up to the second central moment); see the formula for the variance $Var(X)$ , in which both parameters appear.

The Hill's estimator

Assume that $X_{1:n}=(X_{1},\cdots ,X_{n})$ are $n$ observations (not necessarily i.i.d.) from an unknown heavy-tailed distribution $F$ whose tail distribution is regularly varying with tail index $1/\xi$ (hence the corresponding shape parameter is $\xi$ ). Specifically, the tail distribution is described as

{\bar {F}}(x)=1-F(x)=L(x)\cdot x^{-1/\xi },\,\,\,\,\,{\text{for some }}\xi >0,\,\,{\text{where }}L{\text{ is a slowly varying function.}}

Estimating the shape parameter $\xi$ is of particular interest in extreme value theory, especially when $\xi$ is positive (the so-called heavy-tailed case).

Let $F_{u}$ be the conditional excess distribution function of the observations over a threshold $u$ . The Pickands–Balkema–de Haan theorem (Pickands, 1975; Balkema and de Haan, 1974) states that for a large class of underlying distribution functions $F$ and large $u$ , $F_{u}$ is well approximated by the generalized Pareto distribution (GPD). This motivated Peak Over Threshold (POT) methods for estimating $\xi$ , in which the GPD plays the key role.
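For reference, the conditional excess distribution function over a threshold $u$ is conventionally written as

F_{u}(y)=P(X-u\leq y\mid X>u)={\frac {F(u+y)-F(u)}{1-F(u)}},\qquad y\geq 0,

and the approximation in the theorem then reads $F_{u}(y)\approx F_{(0,\sigma (u),\xi )}(y)$ for large $u$ , where $\sigma (u)>0$ is a scale depending on the threshold (the notation $\sigma (u)$ is introduced here for illustration only).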

A well-known estimator using the POT methodology is Hill's estimator, formulated as follows. For $1\leq i\leq n$ , write $X_{(i)}$ for the $i$ -th largest value of $X_{1},\cdots ,X_{n}$ . With this notation, Hill's estimator (see page 190 of Embrechts et al. [3]) based on the $k$ upper order statistics is defined as

{\widehat {\xi }}_{k}^{\text{Hill}}={\widehat {\xi }}_{k}^{\text{Hill}}(X_{1:n})={\frac {1}{k-1}}\sum _{j=1}^{k-1}\log {\bigg (}{\frac {X_{(j)}}{X_{(k)}}}{\bigg )},\,\,\,\,\,\,\,\,{\text{for }}2\leq k\leq n.

In practice, the Hill estimator is used as follows. First, calculate the estimator ${\widehat {\xi }}_{k}^{\text{Hill}}$ at each integer $k\in \{2,\cdots ,n\}$ , and plot the ordered pairs $\{(k,{\widehat {\xi }}_{k}^{\text{Hill}})\}_{k=2}^{n}$ . Then select, from the set of Hill estimates $\{{\widehat {\xi }}_{k}^{\text{Hill}}\}_{k=2}^{n}$ , those that are roughly constant with respect to $k$ : these stable values are regarded as reasonable estimates of the shape parameter $\xi$ . If $X_{1},\cdots ,X_{n}$ are i.i.d., then Hill's estimator is a consistent estimator of the shape parameter $\xi$ [4].
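The estimator and the sequence of pairs $(k,{\widehat {\xi }}_{k}^{\text{Hill}})$ used for the plot can be computed as in the sketch below. It uses only the Python standard library and assumes strictly positive observations so that the logarithms are defined; `hill_estimator` and `hill_path` are hypothetical names.

```python
import math

def hill_estimator(data, k):
    """Hill estimate based on the k upper order statistics, 2 <= k <= n."""
    n = len(data)
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    xs = sorted(data, reverse=True)        # xs[0] is X_(1), the largest value
    if xs[k - 1] <= 0:
        raise ValueError("the k largest observations must be positive")
    threshold = xs[k - 1]                  # X_(k)
    return sum(math.log(xs[j] / threshold) for j in range(k - 1)) / (k - 1)

def hill_path(data):
    """Return [(k, Hill estimate)] for k = 2..n, sorting the data only once."""
    xs = sorted(data, reverse=True)
    if xs[-1] <= 0:
        raise ValueError("all observations must be positive")
    logs = [math.log(x) for x in xs]
    path, running = [], 0.0
    for k in range(2, len(xs) + 1):
        running += logs[k - 2]             # accumulates log X_(1) .. log X_(k-1)
        path.append((k, running / (k - 1) - logs[k - 1]))
    return path
```

Plotting the second coordinate of `hill_path(data)` against the first gives the Hill plot described above; the choice of a stable region remains a judgment call.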

Note that the Hill estimator ${\widehat {\xi }}_{k}^{\text{Hill}}$ makes use of a log-transformation of the observations $X_{1:n}=(X_{1},\cdots ,X_{n})$ . (Pickands' estimator ${\widehat {\xi }}_{k}^{\text{Pickands}}$ also employs a log-transformation, but in a slightly different way [5].)

References

External links

Mathworks: Generalized Pareto distribution

Text submitted to CC-BY-SA license. Source: Generalized Pareto distribution by Wikipedia (Historical)
