
Generalized Pareto distribution




In statistics, the generalized Pareto distribution (GPD) is a family of continuous probability distributions. It is often used to model the tails of another distribution. It is specified by three parameters: location $\mu$, scale $\sigma$, and shape $\xi$. Sometimes it is specified by only scale and shape, and sometimes only by its shape parameter. Some references give the shape parameter as $\kappa = -\xi$.

Definition

The standard cumulative distribution function (cdf) of the GPD is defined by

$$F_{\xi}(z) = \begin{cases} 1 - \left(1 + \xi z\right)^{-1/\xi} & \text{for } \xi \neq 0, \\ 1 - e^{-z} & \text{for } \xi = 0, \end{cases}$$

where the support is $z \geq 0$ for $\xi \geq 0$ and $0 \leq z \leq -1/\xi$ for $\xi < 0$. The corresponding probability density function (pdf) is

$$f_{\xi}(z) = \begin{cases} (1 + \xi z)^{-\frac{\xi + 1}{\xi}} & \text{for } \xi \neq 0, \\ e^{-z} & \text{for } \xi = 0. \end{cases}$$

Characterization

The related location-scale family of distributions is obtained by replacing the argument $z$ by $\frac{x - \mu}{\sigma}$ and adjusting the support accordingly.

The cumulative distribution function of $X \sim GPD(\mu, \sigma, \xi)$ ($\mu \in \mathbb{R}$, $\sigma > 0$, and $\xi \in \mathbb{R}$) is

$$F_{(\mu,\sigma,\xi)}(x) = \begin{cases} 1 - \left(1 + \frac{\xi (x - \mu)}{\sigma}\right)^{-1/\xi} & \text{for } \xi \neq 0, \\ 1 - \exp\left(-\frac{x - \mu}{\sigma}\right) & \text{for } \xi = 0, \end{cases}$$

where the support of $X$ is $x \geqslant \mu$ when $\xi \geqslant 0$, and $\mu \leqslant x \leqslant \mu - \sigma/\xi$ when $\xi < 0$.

The probability density function (pdf) of $X \sim GPD(\mu, \sigma, \xi)$ is

$$f_{(\mu,\sigma,\xi)}(x) = \frac{1}{\sigma}\left(1 + \frac{\xi (x - \mu)}{\sigma}\right)^{\left(-\frac{1}{\xi} - 1\right)},$$

again, for $x \geqslant \mu$ when $\xi \geqslant 0$, and $\mu \leqslant x \leqslant \mu - \sigma/\xi$ when $\xi < 0$. For $\xi = 0$ the expression is understood as its limit, $\frac{1}{\sigma}\exp\left(-\frac{x - \mu}{\sigma}\right)$.
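The cdf and pdf above translate directly into code. The following is a minimal Python sketch using only the standard library; the function names `gpd_cdf` and `gpd_pdf` are illustrative choices, not part of any particular package, and values outside the support are mapped to 0 or 1 as an assumption about the desired behaviour.

```python
import math


def gpd_cdf(x, mu=0.0, sigma=1.0, xi=0.0):
    """CDF of GPD(mu, sigma, xi); 0 below the support, 1 above it."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    if z <= 0:
        return 0.0
    if xi == 0:
        return 1.0 - math.exp(-z)
    base = 1.0 + xi * z
    if base <= 0:
        # Only possible for xi < 0: x is at or beyond mu - sigma/xi.
        return 1.0
    return 1.0 - base ** (-1.0 / xi)


def gpd_pdf(x, mu=0.0, sigma=1.0, xi=0.0):
    """PDF of GPD(mu, sigma, xi); 0 outside the support."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    if z < 0:
        return 0.0
    if xi == 0:
        return math.exp(-z) / sigma
    base = 1.0 + xi * z
    if base <= 0:
        # Upper endpoint (a single point) and beyond, for xi < 0.
        return 0.0
    return base ** (-1.0 / xi - 1.0) / sigma


if __name__ == "__main__":
    print(gpd_cdf(1.0, mu=0.0, sigma=1.0, xi=1.0))
    print(gpd_pdf(1.0, mu=0.0, sigma=2.0, xi=0.5))
```

Treating the upper endpoint as having zero density avoids raising an error when the exponent is negative; since it is a single point, this does not change any probability.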

The pdf is a solution of the following differential equation:

$$\left\{ \begin{array}{l} f'(x)\,(-\mu\xi + \sigma + \xi x) + (\xi + 1) f(x) = 0, \\ f(0) = \frac{1}{\sigma}\left(1 - \frac{\mu\xi}{\sigma}\right)^{-\frac{1}{\xi} - 1} \end{array} \right\}$$

Special cases

  • If the shape $\xi$ and location $\mu$ are both zero, the GPD is equivalent to the exponential distribution.
  • With shape $\xi = -1$ and location $\mu = 0$, the GPD is equivalent to the continuous uniform distribution $U(0, \sigma)$.
  • With shape $\xi > 0$ and location $\mu = \sigma/\xi$, the GPD is equivalent to the Pareto distribution with scale $x_m = \sigma/\xi$ and shape $\alpha = 1/\xi$.
  • If $X \sim GPD(\mu = 0, \sigma, \xi)$, then $Y = \log(X) \sim exGPD(\sigma, \xi)$ [1]. (exGPD stands for the exponentiated generalized Pareto distribution.)
  • GPD is similar to the Burr distribution.
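
As a short check of the Pareto special case in the list above (a worked substitution, not taken from the source), set $\mu = \sigma/\xi$ in the survival function $1 - F_{(\mu,\sigma,\xi)}(x)$ with $\xi > 0$:

```latex
1 + \frac{\xi\,(x - \sigma/\xi)}{\sigma} = \frac{\xi x}{\sigma}
\quad\Longrightarrow\quad
1 - F_{(\mu,\sigma,\xi)}(x)
  = \left(\frac{\xi x}{\sigma}\right)^{-1/\xi}
  = \left(\frac{\sigma/\xi}{x}\right)^{1/\xi}
  = \left(\frac{x_m}{x}\right)^{\alpha},
\qquad x \geq x_m = \sigma/\xi,\quad \alpha = 1/\xi .
```

The right-hand side is the survival function of the Pareto distribution with scale $x_m$ and shape $\alpha$.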

Generating generalized Pareto random variables


If $U$ is uniformly distributed on $(0, 1]$, then

$$X = \mu + \frac{\sigma (U^{-\xi} - 1)}{\xi} \sim GPD(\mu, \sigma, \xi \neq 0)$$

and

$$X = \mu - \sigma \ln(U) \sim GPD(\mu, \sigma, \xi = 0).$$

Both formulas are obtained by inversion of the cdf.

In the MATLAB Statistics Toolbox, the "gprnd" command generates generalized Pareto random numbers.
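
The two inversion formulas above can be sketched in Python with the standard library alone. The names `sample_gpd` and `gpd_median` are hypothetical helpers introduced here for illustration. The median formula follows from setting $U = 1/2$ in the same expressions and is used only as a sanity check on the draws.

```python
import math
import random


def sample_gpd(mu, sigma, xi, rng=random):
    """Draw one GPD(mu, sigma, xi) variate by inverting the cdf."""
    # rng.random() lies in [0, 1), so 1 - rng.random() lies in (0, 1].
    u = 1.0 - rng.random()
    if xi == 0:
        return mu - sigma * math.log(u)
    return mu + sigma * (u ** (-xi) - 1.0) / xi


def gpd_median(mu, sigma, xi):
    """Median of GPD(mu, sigma, xi), obtained by plugging U = 1/2 into the inversion."""
    if xi == 0:
        return mu + sigma * math.log(2.0)
    return mu + sigma * (2.0 ** xi - 1.0) / xi


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    draws = [sample_gpd(1.0, 2.0, 0.25, rng) for _ in range(20000)]
    below = sum(x <= gpd_median(1.0, 2.0, 0.25) for x in draws) / len(draws)
    print("fraction of draws below the theoretical median:", below)
```

About half of the draws should fall below the theoretical median; a large deviation would indicate an error in the sign of the exponent or in the handling of the $\xi = 0$ case.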

GPD as an Exponential-Gamma Mixture

A GPD random variable can also be expressed as an exponential random variable with a gamma-distributed rate parameter. If

$$X \mid \Lambda \sim \operatorname{Exp}(\Lambda)$$

and

$$\Lambda \sim \operatorname{Gamma}(\alpha, \beta)$$

(shape $\alpha$, rate $\beta$), then

$$X \sim \operatorname{GPD}(\xi = 1/\alpha,\ \sigma = \beta/\alpha).$$

Note, however, that since the parameters of the gamma distribution must be greater than zero, this representation carries the additional restriction that $\xi$ must be positive.

In addition to this mixture (or compound) expression, the generalized Pareto distribution can also be expressed as a simple ratio. Concretely, for $Y \sim \text{Exponential}(1)$ and $Z \sim \text{Gamma}(1/\xi, 1)$, we have $\mu + \sigma \frac{Y}{\xi Z} \sim \text{GPD}(\mu, \sigma, \xi)$. This is a consequence of the mixture after setting $\beta = \alpha$ and taking into account that the rate parameters of the exponential and gamma distribution are simply inverse multiplicative constants.
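
The mixture representation can be checked by simulation. The sketch below assumes that $\beta$ is the rate of the gamma distribution (so that the scale passed to Python's `random.gammavariate`, which expects shape and scale, is $1/\beta$) and compares the empirical survival probability with the GPD survival function $(1 + \xi x/\sigma)^{-1/\xi}$. The helper names are illustrative only.

```python
import random


def sample_exp_gamma_mixture(alpha, beta, rng):
    """Draw X | Lambda ~ Exp(Lambda) with Lambda ~ Gamma(shape=alpha, rate=beta)."""
    # random.gammavariate takes (shape, scale); a rate beta means scale 1/beta.
    lam = rng.gammavariate(alpha, 1.0 / beta)
    return rng.expovariate(lam)


def gpd_survival(x, sigma, xi):
    """P(X > x) for GPD(mu=0, sigma, xi) with xi > 0 and x >= 0."""
    return (1.0 + xi * x / sigma) ** (-1.0 / xi)


if __name__ == "__main__":
    alpha, beta = 4.0, 2.0
    xi, sigma = 1.0 / alpha, beta / alpha  # parameters implied by the mixture
    rng = random.Random(0)
    draws = [sample_exp_gamma_mixture(alpha, beta, rng) for _ in range(20000)]
    x0 = 0.5
    empirical = sum(d > x0 for d in draws) / len(draws)
    print("empirical:", empirical, "GPD:", gpd_survival(x0, sigma, xi))
```

With $\alpha = 4$ and $\beta = 2$ the implied parameters are $\xi = 0.25$ and $\sigma = 0.5$, and the two printed numbers should agree up to Monte Carlo error.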

Exponentiated generalized Pareto distribution

The exponentiated generalized Pareto distribution (exGPD)

If $X \sim GPD(\mu = 0, \sigma, \xi)$, then $Y = \log(X)$ is distributed according to the exponentiated generalized Pareto distribution, denoted by $Y \sim exGPD(\sigma, \xi)$.

The probability density function (pdf) of $Y \sim exGPD(\sigma, \xi)$ ($\sigma > 0$) is

$$g_{(\sigma,\xi)}(y) = \begin{cases} \frac{e^{y}}{\sigma}\left(1 + \frac{\xi e^{y}}{\sigma}\right)^{-1/\xi - 1} & \text{for } \xi \neq 0, \\ \frac{1}{\sigma} e^{y - e^{y}/\sigma} & \text{for } \xi = 0, \end{cases}$$

where the support is $-\infty < y < \infty$ for $\xi \geq 0$, and $-\infty < y \leq \log(-\sigma/\xi)$ for $\xi < 0$.
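
The exGPD density above can be sketched and sanity-checked numerically. The code below is a minimal illustration (the names `exgpd_pdf` and `integrate_midpoint` are hypothetical); it evaluates the density and confirms with a simple midpoint rule that it integrates to approximately one over a wide interval.

```python
import math


def exgpd_pdf(y, sigma=1.0, xi=0.0):
    """Density of Y = log(X) for X ~ GPD(mu=0, sigma, xi); 0 outside the support."""
    t = math.exp(y) / sigma  # t = e^y / sigma
    if xi == 0:
        return t * math.exp(-t)  # equals (1/sigma) * exp(y - e^y / sigma)
    base = 1.0 + xi * t
    if base <= 0:
        # For xi < 0 this means y >= log(-sigma/xi), the end of the support.
        return 0.0
    return t * base ** (-1.0 / xi - 1.0)


def integrate_midpoint(f, a, b, n=8000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


if __name__ == "__main__":
    for xi in (0.5, 0.0, -0.5):
        total = integrate_midpoint(lambda y: exgpd_pdf(y, 1.0, xi), -40.0, 40.0)
        print("xi =", xi, "integral ~", total)
```

The integration limits of $\pm 40$ are an arbitrary choice that keeps `math.exp` within floating-point range while leaving negligible mass outside the interval for the shape values shown.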

For all $\xi$, $\log \sigma$ becomes the location parameter. See the right panel for the pdf when the shape $\xi$ is positive.

The exGPD has finite moments of all orders for all $\sigma > 0$ and $-\infty < \xi < \infty$.

The moment-generating function of $Y \sim exGPD(\sigma, \xi)$ is

$$M_{Y}(s) = E[e^{sY}] = \begin{cases} -\frac{1}{\xi}\left(-\frac{\sigma}{\xi}\right)^{s} B(s + 1, -1/\xi) & \text{for } s \in (-1, \infty),\ \xi < 0, \\ \frac{1}{\xi}\left(\frac{\sigma}{\xi}\right)^{s} B(s + 1, 1/\xi - s) & \text{for } s \in (-1, 1/\xi),\ \xi > 0, \\ \sigma^{s}\,\Gamma(1 + s) & \text{for } s \in (-1, \infty),\ \xi = 0, \end{cases}$$

where $B(a, b)$ and $\Gamma(a)$ denote the beta function and gamma function, respectively.

The expected value of $Y \sim exGPD(\sigma, \xi)$ depends on both the scale $\sigma$ and the shape $\xi$, with $\xi$ entering through the digamma function $\psi$:

$$E[Y] = \begin{cases} \log\left(-\frac{\sigma}{\xi}\right) + \psi(1) - \psi(-1/\xi + 1) & \text{for } \xi < 0, \\ \log\left(\frac{\sigma}{\xi}\right) + \psi(1) - \psi(1/\xi) & \text{for } \xi > 0, \\ \log \sigma + \psi(1) & \text{for } \xi = 0. \end{cases}$$

Note that for a fixed value of $\xi \in (-\infty, \infty)$, $\log \sigma$ acts as the location parameter under the exponentiated generalized Pareto distribution.

The variance of $Y \sim exGPD(\sigma, \xi)$ depends on the shape parameter $\xi$ only through the polygamma function of order 1 (also called the trigamma function):

$$Var[Y] = \begin{cases} \psi'(1) - \psi'(-1/\xi + 1) & \text{for } \xi < 0, \\ \psi'(1) + \psi'(1/\xi) & \text{for } \xi > 0, \\ \psi'(1) & \text{for } \xi = 0. \end{cases}$$

See the right panel for the variance as a function of $\xi$. Note that $\psi'(1) = \pi^{2}/6 \approx 1.644934$.
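
The mean and variance formulas above can be evaluated without external libraries. The sketch below is an illustration under stated assumptions: Python's standard library has no digamma or trigamma function, so `digamma` and `trigamma` are hand-written here using the standard recurrences ($\psi(x) = \psi(x+1) - 1/x$, $\psi'(x) = \psi'(x+1) + 1/x^{2}$) followed by truncated asymptotic series, which is accurate to roughly eight decimal places for positive arguments. All function names are hypothetical.

```python
import math


def digamma(x):
    """psi(x) for x > 0: shift x up to >= 6, then apply the asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return result + math.log(x) - 0.5 / x - series


def trigamma(x):
    """psi'(x) for x > 0: shift x up to >= 6, then apply the asymptotic series."""
    result = 0.0
    while x < 6.0:
        result += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    series = inv * inv2 * (1.0 / 6.0 - inv2 * (1.0 / 30.0 - inv2 / 42.0))
    return result + inv + 0.5 * inv2 + series


def exgpd_mean(sigma, xi):
    """E[Y] for Y ~ exGPD(sigma, xi), following the three-case formula."""
    if xi < 0:
        return math.log(-sigma / xi) + digamma(1.0) - digamma(-1.0 / xi + 1.0)
    if xi > 0:
        return math.log(sigma / xi) + digamma(1.0) - digamma(1.0 / xi)
    return math.log(sigma) + digamma(1.0)


def exgpd_var(xi):
    """Var[Y] for Y ~ exGPD(sigma, xi); it does not depend on sigma."""
    if xi < 0:
        return trigamma(1.0) - trigamma(-1.0 / xi + 1.0)
    if xi > 0:
        return trigamma(1.0) + trigamma(1.0 / xi)
    return trigamma(1.0)


if __name__ == "__main__":
    print("Var at xi = 0:", exgpd_var(0.0), "pi^2/6:", math.pi ** 2 / 6)
    print("Mean and variance at sigma = 1, xi = -1:", exgpd_mean(1.0, -1.0), exgpd_var(-1.0))
```

Two easy checks: for $\xi = -1$ and $\sigma = 1$, $X$ is uniform on $(0, 1)$, so $\log X$ has mean $-1$ and variance $1$; for $\xi = 1$ and $\sigma = 1$, the variance is $2\psi'(1) = \pi^{2}/3$.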

Note that the roles of the scale parameter $\sigma$ and the shape parameter $\xi$ under $Y \sim exGPD(\sigma, \xi)$ are separately interpretable, which may lead to more robust and efficient estimation of $\xi$ than working with $X \sim GPD(\sigma, \xi)$ [2]. Under $X \sim GPD(\mu = 0, \sigma, \xi)$ the roles of the two parameters are entangled with each other (at least up to the second central moment); see the formula for the variance $Var(X)$, in which both parameters appear.


Hill's estimator

Assume that $X_{1:n} = (X_{1}, \cdots, X_{n})$ are $n$ observations (not necessarily i.i.d.) from an unknown heavy-tailed distribution $F$ such that its tail distribution is regularly varying with tail index $1/\xi$ (hence, the corresponding shape parameter is $\xi$). To be specific, the tail distribution is described as

$$\bar{F}(x) = 1 - F(x) = L(x) \cdot x^{-1/\xi}, \quad \text{for some } \xi > 0, \text{ where } L \text{ is a slowly varying function.}$$

It is of particular interest in extreme value theory to estimate the shape parameter $\xi$, especially when $\xi$ is positive (the so-called heavy-tailed case).

Let $F_{u}$ be their conditional excess distribution function over a threshold $u$, that is, $F_{u}(y) = P(X - u \leq y \mid X > u)$. The Pickands–Balkema–de Haan theorem (Pickands, 1975; Balkema and de Haan, 1974) states that for a large class of underlying distribution functions $F$, and large $u$, $F_{u}$ is well approximated by the generalized Pareto distribution (GPD). This motivated peaks-over-threshold (POT) methods for estimating $\xi$, in which the GPD plays the key role.

A renowned estimator using the POT methodology is Hill's estimator. Its technical formulation is as follows. For $1 \leq i \leq n$, write $X_{(i)}$ for the $i$-th largest value of $X_{1}, \cdots, X_{n}$. With this notation, Hill's estimator (see page 190 of Reference 5 by Embrechts et al. [3]) based on the $k$ upper order statistics is defined as

$$\widehat{\xi}_{k}^{\text{Hill}} = \widehat{\xi}_{k}^{\text{Hill}}(X_{1:n}) = \frac{1}{k-1} \sum_{j=1}^{k-1} \log\left(\frac{X_{(j)}}{X_{(k)}}\right), \quad \text{for } 2 \leq k \leq n.$$

In practice, the Hill estimator is used as follows. First, calculate the estimator $\widehat{\xi}_{k}^{\text{Hill}}$ at each integer $k \in \{2, \cdots, n\}$, and then plot the ordered pairs $\{(k, \widehat{\xi}_{k}^{\text{Hill}})\}_{k=2}^{n}$. Then, from the set of Hill estimators $\{\widehat{\xi}_{k}^{\text{Hill}}\}_{k=2}^{n}$, select those that are roughly constant with respect to $k$: these stable values are regarded as reasonable estimates of the shape parameter $\xi$. If $X_{1}, \cdots, X_{n}$ are i.i.d., then Hill's estimator is a consistent estimator of the shape parameter $\xi$ [4].

Note that the Hill estimator $\widehat{\xi}_{k}^{\text{Hill}}$ makes use of the log-transformation of the observations $X_{1:n} = (X_{1}, \cdots, X_{n})$. (The Pickands estimator $\widehat{\xi}_{k}^{\text{Pickands}}$ also employs the log-transformation, but in a slightly different way [5].)
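
The definition of the Hill estimator and the plotting procedure described above can be sketched as follows. This is a minimal standard-library illustration, assuming strictly positive observations; `hill_estimator` and `hill_plot_points` are hypothetical names, and the synthetic Pareto sample (tail index 2, hence $\xi = 0.5$) is an assumption made only to have data with a known answer.

```python
import math
import random


def hill_estimator(data, k):
    """Hill estimate of xi from the k largest observations (2 <= k <= n)."""
    n = len(data)
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    top = sorted(data, reverse=True)[:k]  # X_(1) >= ... >= X_(k)
    if top[-1] <= 0:
        raise ValueError("the k largest observations must be positive")
    return sum(math.log(x / top[-1]) for x in top[:-1]) / (k - 1)


def hill_plot_points(data):
    """Pairs (k, xi_hat_k) for k = 2..n, computed with a running sum of logs."""
    xs = sorted(data, reverse=True)
    logs = [math.log(x) for x in xs]  # requires strictly positive data
    points, running = [], 0.0
    for k in range(2, len(xs) + 1):
        running += logs[k - 2]  # running = sum of log X_(j) for j = 1..k-1
        points.append((k, running / (k - 1) - logs[k - 1]))
    return points


if __name__ == "__main__":
    rng = random.Random(0)
    sample = [rng.paretovariate(2.0) for _ in range(5000)]  # true xi = 1/2
    for k in (50, 200, 500):
        print(k, hill_estimator(sample, k))
```

For an actual Hill plot, the pairs returned by `hill_plot_points` would be passed to a plotting library and inspected for a range of $k$ over which the estimates are roughly constant.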

See also

  • Burr distribution
  • Pareto distribution
  • Generalized extreme value distribution
  • Exponentiated generalized Pareto distribution
  • Pickands–Balkema–de Haan theorem

References

Further reading

  • Pickands, James (1975). "Statistical inference using extreme order statistics" (PDF). Annals of Statistics. 3 s: 119–131. doi:10.1214/aos/1176343003.
  • Balkema, A.; De Haan, Laurens (1974). "Residual life time at great age". Annals of Probability. 2 (5): 792–804. doi:10.1214/aop/1176996548.
  • Lee, Seyoon; Kim, J.H.K. (2018). "Exponentiated generalized Pareto distribution: Properties and applications towards extreme value theory". Communications in Statistics - Theory and Methods. 48 (8): 1–25. arXiv:1708.01686. doi:10.1080/03610926.2018.1441418. S2CID 88514574.
  • N. L. Johnson; S. Kotz; N. Balakrishnan (1994). Continuous Univariate Distributions Volume 1, second edition. New York: Wiley. ISBN 978-0-471-58495-7. Chapter 20, Section 12: Generalized Pareto Distributions.
  • Barry C. Arnold (2011). "Chapter 7: Pareto and Generalized Pareto Distributions". In Duangkamon Chotikapanich (ed.). Modeling Distributions and Lorenz Curves. New York: Springer. ISBN 9780387727967.
  • Arnold, B. C.; Laguna, L. (1977). On generalized Pareto distributions with applications to income data. Ames, Iowa: Iowa State University, Department of Economics.

External links

  • Mathworks: Generalized Pareto distribution

Text submitted to CC-BY-SA license. Source: Generalized Pareto distribution by Wikipedia (Historical)

