As the basic distribution in our hierarchy of generalized Pareto models we use the classical Pareto distribution, called here the Pareto (I) distribution. Its survival function is of the form

\overline{F}(x)={\left(\frac{x}{\sigma}\right)}^{-\alpha},\phantom{\rule{.5em}{0ex}}x>\sigma

(1)

where *σ* > 0 and *α* > 0. In practice, *α* is frequently assumed to be larger than 1, so that the distribution has a finite mean. If a random variable *X* has (1) as its survival function, then we write *X* ∼ *P*(I)(*σ*,*α*). In this basic model, the parameter *σ* has a dual role: it is a scale parameter, but it also determines the lower bound of the support of the distribution and so, to some extent, plays the role of a location parameter. A slightly more general model, which separates the roles of the location and scale parameters, has therefore frequently been used.

This distribution will be called the Pareto (II) distribution. Its survival function is of the form

\overline{F}(x)={\left[1+\left(\frac{x-\mu}{\sigma}\right)\right]}^{-\alpha},\phantom{\rule{.5em}{0ex}}x>\mu

(2)

where *μ*, the location parameter, is real valued, *σ* is positive and *α* is positive. In most applications *μ* will be non-negative, but negative values for *μ* pose no problems. If *X* has (2) as its survival function, we will write *X* ∼ *P*(II)(*μ*,*σ*,*α*).

There is an intimate relation between the Pareto (II) distribution and the Pickands generalized Pareto model, which is much used in the study of extreme values and peaks over thresholds. A good general reference for discussion of the role of the Pickands generalized Pareto distribution is the book by Falk et al. (2011). The density of the Pickands generalized Pareto model is

f(x;\sigma ,k)=\frac{1}{\sigma}{\left(1-\frac{\mathit{\text{kx}}}{\sigma}\right)}^{(1-k)/k}\phantom{\rule{1em}{0ex}}I(x>0,(\mathit{\text{kx}})/\sigma <1).

(3)

where *σ* > 0 and -*∞* < *k* < *∞*. The density corresponding to *k* = 0 is obtained by taking the limit as *k* → 0 in (3). This model includes three sub-models. When *k* < 0, it yields a Pareto (II) density (with *μ* = 0); when *k* = 0, it yields an exponential density; and when *k* > 0, it corresponds to a scaled Beta distribution (of the first or standard kind). In fact, several results for the Pareto (II) distribution can be proved to remain valid in the more general context of the Pickands generalized Pareto model. In an income modeling setting, as Pareto observed, heavy-tailed distributions are typically encountered and, for this reason, we will concentrate on the Pareto (II) sub-model. We remark, in passing, that despite Pareto’s insistence on the ubiquity of heavy tails, several authors have utilized scaled Beta distributions as income models, and some have even argued in favor of the exponential distribution as a model. The most general model in our hierarchy, the Feller–Pareto model, will be seen to actually include some light-tailed distributions corresponding to scaled Beta models with an additional location parameter. Applications of such light-tailed models are more likely to be encountered outside of the income distribution context.
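The correspondence between (3) with *k* < 0 and the Pareto (II) density can be checked numerically. What follows is only a minimal Python sketch and not part of the original development. The function names `gpd_pdf` and `pareto2_pdf` are illustrative, and the mapping *α* = -1/*k* with scale *σ*/(-*k*) is an assumption obtained by matching (3) against the density implied by (2) with *μ* = 0.

```python
import math


def gpd_pdf(x, sigma, k):
    """Pickands generalized Pareto density (3); k == 0 is the exponential limit."""
    if x <= 0:
        return 0.0
    if k == 0:
        return math.exp(-x / sigma) / sigma
    base = 1.0 - k * x / sigma
    if base <= 0:  # outside the support when k > 0 and x >= sigma / k
        return 0.0
    return base ** ((1.0 - k) / k) / sigma


def pareto2_pdf(x, mu, sigma, alpha):
    """Density obtained by differentiating the Pareto (II) survival function (2)."""
    if x <= mu:
        return 0.0
    return (alpha / sigma) * (1.0 + (x - mu) / sigma) ** (-alpha - 1.0)


if __name__ == "__main__":
    k, sigma = -0.5, 2.0
    for x in (0.1, 1.0, 3.0, 10.0):
        # For k < 0 the two densities should agree with alpha = -1/k, scale = sigma/(-k).
        print(x, gpd_pdf(x, sigma, k), pareto2_pdf(x, 0.0, sigma / -k, -1.0 / k))
```

Under this mapping a small |*k*| corresponds to a large *α*, that is, to a tail approaching the exponential case *k* = 0.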

Truncated versions of the Pareto (II) distribution, which clearly are not heavy tailed, are sometimes appropriate models for data sets which, for some reason, exclude large values. Such distributions are of the form

\begin{array}{lll}\phantom{\rule{5pt}{0ex}}F(x)& =0,& \phantom{\rule{1em}{0ex}}x\le \mu ,\\ =\frac{1-{\left(1+\frac{x-\mu}{\sigma}\right)}^{-\alpha}}{1-{\left(1+\frac{\tau -\mu}{\sigma}\right)}^{-\alpha}},& \phantom{\rule{1em}{0ex}}\mu <x<\tau ,\\ =1,& \phantom{\rule{1em}{0ex}}x\ge \tau ,\end{array}

(4)

where -*∞* < *μ* < *τ* < *∞*, *σ* > 0, and *α* > 0. Some discussion of such models may be found in Aban et al. (2006) and Arnold and Austin (1987).

The Pareto (III) distribution is a variant model with tail behavior comparable to that of the Pareto (II) distribution. Its survival function is of the form

\overline{F}(x)={\left[1+{\left(\frac{x-\mu}{\sigma}\right)}^{1/\gamma}\right]}^{-1},\phantom{\rule{1em}{0ex}}x>\mu

(5)

where *μ* is real, *σ* is positive and *γ* is positive. We will call *γ* the inequality parameter. If *μ* = 0 and *γ* ≤ 1, then *γ* turns out to be precisely the Gini index of inequality for this distribution. If *X* has (5) as its survival function, we will write *X* ∼ *P*(III)(*μ*,*σ*,*γ*).

If we introduce both a shape and an inequality parameter, we arrive at the Pareto (IV) family:

\overline{F}(x)={\left[1+{\left(\frac{x-\mu}{\sigma}\right)}^{1/\gamma}\right]}^{-\alpha},\phantom{\rule{1em}{0ex}}x>\mu

(6)

where *μ* (location) is real, *σ* (scale) is positive, *γ* (inequality) is positive and *α* (shape) is positive. Although we continue to call *γ* the inequality parameter, it will only be identifiable with the Gini index when *α* = 1 and *μ* = 0. One might argue instead that in the P(IV) model both *γ* and *α* would be best described as shape parameters, since neither of them has a direct inequality interpretation. An anonymous referee points out that the two parameters *γ* and *α* govern the behavior of the P(IV) density as *x* approaches *μ* from above and as *x* approaches infinity. Thus, *f*(*x*;*μ*,*σ*,*γ*,*α*) ∼ *x*^{-α/γ-1} as *x* → *∞* and *f*(*x*;*μ*,*σ*,*γ*,*α*) ∼ (*x*-*μ*)^{1/γ-1} as *x* → *μ*. He suggests that an argument might be advanced in favor of a reparameterization in which we define *β* = *α*/*γ*, to highlight the roles of *α* and *β* in determining the limiting behavior of the density. However, in this paper, to be consistent with the notation in Arnold (1983), we will continue with the *μ*,*σ*,*γ*,*α* parameterization and continue to call *γ* the inequality parameter. If a random variable *X* has (6) as its survival function, we will write *X* ∼ *P*(IV)(*μ*,*σ*,*γ*,*α*). Note that the Pareto (IV) distribution, with *μ* = 0, is also known as a Burr Type XII distribution.

The three more specialized families, P(I)-P(III), may be identified as special cases of the Pareto (IV) family as follows:

\begin{array}{c}\phantom{\rule{1.7em}{0ex}}P(\mathrm{I})(\sigma ,\alpha )=P(\text{IV})(\sigma ,\sigma ,1,\alpha ),\\ \phantom{\rule{.25em}{0ex}}P(\text{II})(\mu ,\sigma ,\alpha )=P(\text{IV})(\mu ,\sigma ,1,\alpha ),\\ P(\text{III})(\mu ,\sigma ,\gamma )=P(\text{IV})(\mu ,\sigma ,\gamma ,1).\end{array}

(7)
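The identifications in (7) translate directly into code. The sketch below is only an illustration: the function names are hypothetical, and the survival functions are those given in (2), (5) and (6), with the Pareto (I) case obtained from the first line of (7).

```python
def pareto4_sf(x, mu, sigma, gamma, alpha):
    """Survival function (6) of P(IV)(mu, sigma, gamma, alpha)."""
    if x <= mu:
        return 1.0
    return (1.0 + ((x - mu) / sigma) ** (1.0 / gamma)) ** (-alpha)


def pareto1_sf(x, sigma, alpha):
    """P(I)(sigma, alpha) = P(IV)(sigma, sigma, 1, alpha)."""
    return pareto4_sf(x, sigma, sigma, 1.0, alpha)


def pareto2_sf(x, mu, sigma, alpha):
    """P(II)(mu, sigma, alpha) = P(IV)(mu, sigma, 1, alpha)."""
    return pareto4_sf(x, mu, sigma, 1.0, alpha)


def pareto3_sf(x, mu, sigma, gamma):
    """P(III)(mu, sigma, gamma) = P(IV)(mu, sigma, gamma, 1)."""
    return pareto4_sf(x, mu, sigma, gamma, 1.0)


if __name__ == "__main__":
    # With mu = sigma and gamma = 1 the P(IV) form collapses to (x / sigma)**(-alpha).
    print(pareto1_sf(4.0, 2.0, 3.0), (4.0 / 2.0) ** -3.0)
```

Routing every special case through one function keeps the parameter conventions (location, scale, inequality, shape) in a single place.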

Feller (1971), p. 49, suggested a different definition of a Pareto distribution. It can be recognized as the distribution of a ratio of two independent gamma variables (a distribution also known as the Beta distribution of the second kind). By considering a linear function of a power of such a random variable, we arrive at a very general family, called the Feller–Pareto family. Thus if *X*_{i} ∼ Γ(*δ*_{i},1), *i* = 1,2, are independent random variables, and if for *μ* real, *σ* > 0 and *γ* > 0 we define

W=\mu +\sigma {({X}_{2}/{X}_{1})}^{\gamma},

(8)

then *W* has a Feller–Pareto distribution, and we write *W* ∼ *FP*(*μ*,*σ*,*γ*,*δ*_{1},*δ*_{2}).

It may be verified that the Pareto (IV) distributions are identifiable with the Feller–Pareto distributions with *δ*_{2} = 1, i.e.,

P(\text{IV})(\mu ,\sigma ,\gamma ,\alpha )=\mathit{\text{FP}}(\mu ,\sigma ,\gamma ,\alpha ,1).

(9)

The density of the general Feller–Pareto distribution defined by (8) is of the form

\begin{array}{r}{f}_{W}(w)={\left(\frac{w-\mu}{\sigma}\right)}^{({\delta}_{2}/\gamma )-1}{\left[1+{\left(\frac{w-\mu}{\sigma}\right)}^{1/\gamma}\right]}^{-{\delta}_{1}-{\delta}_{2}}/\left[\gamma \sigma B({\delta}_{1},{\delta}_{2})\right],\\ w>\mu .\end{array}

(10)

The corresponding survival function is obtainable from tables of the incomplete beta function. For many computations it is simpler to work directly with the representation (8). The Pareto (IV) distributions correspond to the case in which *X*_{1} has a gamma distribution while *X*_{2} has an exponential distribution, as required by *δ*_{2} = 1 in (9). The Pareto (III) distributions are encountered when both *X*_{1} and *X*_{2} are exponential variables.
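Working directly with the representation (8) is also convenient for simulation. The following is a rough Python sketch under stated assumptions: the helper names are hypothetical, Γ(*δ*,1) is read as a gamma distribution with shape *δ* and unit scale, and the sample size and seed are arbitrary. It draws Feller–Pareto variates as a power of a gamma ratio and compares the empirical survival probability with (6) in the case *δ*_{2} = 1.

```python
import random


def sample_feller_pareto(rng, mu, sigma, gamma, delta1, delta2):
    """Draw one W = mu + sigma * (X2 / X1)**gamma as in (8)."""
    x1 = rng.gammavariate(delta1, 1.0)  # X1 ~ Gamma(shape delta1, scale 1)
    x2 = rng.gammavariate(delta2, 1.0)  # X2 ~ Gamma(shape delta2, scale 1)
    return mu + sigma * (x2 / x1) ** gamma


def pareto4_sf(x, mu, sigma, gamma, alpha):
    """Survival function (6) of P(IV)(mu, sigma, gamma, alpha)."""
    if x <= mu:
        return 1.0
    return (1.0 + ((x - mu) / sigma) ** (1.0 / gamma)) ** (-alpha)


if __name__ == "__main__":
    rng = random.Random(0)
    mu, sigma, gamma, alpha = 1.0, 2.0, 0.5, 2.5
    draws = [sample_feller_pareto(rng, mu, sigma, gamma, alpha, 1.0)
             for _ in range(50000)]
    x = 3.0
    empirical = sum(w > x for w in draws) / len(draws)
    # With delta2 = 1 the empirical value should be close to the P(IV) survival.
    print(empirical, pareto4_sf(x, mu, sigma, gamma, alpha))
```

For very small *δ*_{1} the gamma draw in the denominator can be numerically tiny, so a production sampler would need more care than this sketch takes.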

Kalbfleisch and Prentice (1980) call the Feller–Pareto density (with *μ* = 0) a generalized *F* density. Alternatively, we might describe a Feller–Pareto variable as a location and scale transform of a generalized beta variable of the second kind. Recall that a beta variable of the second kind is just a ratio of independent gamma variables.

An even more general model might be built using independent variables *X*_{i} ∼ Γ(*δ*_{i},1), *i* = 1,2. One could define

W=\mu +\sigma \left(\frac{{X}_{2}^{{\gamma}_{2}}}{{X}_{1}^{{\gamma}_{1}}}\right).

(11)

The additional flexibility provided by the introduction of such a sixth parameter in the model has not been investigated.

The full array of generalized univariate Pareto distributions to be considered in this paper is subsumed in the Feller–Pareto family, and a unified derivation of many distributional results is possible. However, in the case of Pareto (I)-(IV) distributions, some alternative representations are also useful.

A random variable *X* has a *P*(I)(*σ*,*α*) distribution if it is of the form

X=\sigma {e}^{V/\alpha}

(12)

where *V* is a standard exponential random variable. An analogous representation of a Pareto (II) variable in terms of an exponential random variable is possible, i.e.,

X=\mu +\sigma \left({e}^{V/\alpha}-1\right).

(13)

Likewise Pareto (III) and (IV) variables can be represented as *X* = *μ* + *σ*(*e*^{V} - 1)^{γ} and *X* = *μ* + *σ*(*e*^{V/α} - 1)^{γ} respectively. The representation (12) for a classical Pareto variable (i.e., Pareto (I)) highlights the useful observation that the logarithm of such a variable has a shifted exponential distribution. This will permit the recognition of many distributional properties of Pareto (I) variables as reflections of parallel properties of exponential variables.
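These exponential representations amount to inverse-transform sampling. The sketch below is illustrative only; the function names are hypothetical. It maps a standard exponential value *V* to Pareto (I) and Pareto (IV) values and checks that the survival function evaluated at the transformed point returns *e*^{-V}, which is the survival probability of *V* itself.

```python
import math
import random


def pareto1_from_exponential(v, sigma, alpha):
    """Representation (12): X = sigma * exp(V / alpha)."""
    return sigma * math.exp(v / alpha)


def pareto4_from_exponential(v, mu, sigma, gamma, alpha):
    """X = mu + sigma * (exp(V / alpha) - 1)**gamma; gamma = 1 gives (13)."""
    return mu + sigma * math.expm1(v / alpha) ** gamma


def pareto4_sf(x, mu, sigma, gamma, alpha):
    """Survival function (6) of P(IV)(mu, sigma, gamma, alpha)."""
    if x <= mu:
        return 1.0
    return (1.0 + ((x - mu) / sigma) ** (1.0 / gamma)) ** (-alpha)


if __name__ == "__main__":
    rng = random.Random(1)
    v = rng.expovariate(1.0)  # standard exponential draw
    x = pareto4_from_exponential(v, 1.0, 2.0, 0.5, 2.5)
    # The two printed numbers should agree up to rounding.
    print(pareto4_sf(x, 1.0, 2.0, 0.5, 2.5), math.exp(-v))
    # log of a Pareto (I) variable is log(sigma) + V / alpha (shifted exponential).
    print(math.log(pareto1_from_exponential(v, 3.0, 2.0)), math.log(3.0) + v / 2.0)
```

Setting *α* = 1 in `pareto4_from_exponential` gives the Pareto (III) representation.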

A second important representation of the Pareto (II) distribution, known to Maguire et al. (1952), is as a mixture of exponentials. We may describe it in terms of the conditional survival function, given an auxiliary gamma distributed random variable *Z*. Thus, if

P(X>x\mid Z=z)={e}^{-z(x-\mu )/\sigma},\phantom{\rule{.5em}{0ex}}x>\mu ,

i.e., a (translated) exponential distribution, and if *Z* ∼ Γ(*α*,1), then it follows that unconditionally *X*∼*P*(II)(*μ*,*σ*,*α*). Alternatively, this can be viewed as being equivalent to the representation in (8) after setting *γ* = *δ*_{2} = 1.

This representation of the Pareto (II) distribution as a gamma mixture of exponential distributions is often encountered in reliability and survival contexts, see e.g., Keiding et al. (2002). It is also familiar in Bayesian analysis of exponential data, where the gamma density enters as a convenient prior. In this context the Pareto (II) distribution is sometimes called the Lomax distribution.
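The mixture representation can be illustrated by simulating the two stages in turn. This is a minimal sketch with hypothetical names, an arbitrary seed and sample size, and Γ(*α*,1) read as shape *α* with unit scale: first draw the latent gamma variable *Z*, then draw a translated exponential variable whose rate is *Z*/*σ*.

```python
import random


def sample_pareto2_mixture(rng, mu, sigma, alpha):
    """Two-stage draw: Z ~ Gamma(alpha, 1), then X | Z = z is mu + sigma * Exp(rate z)."""
    z = rng.gammavariate(alpha, 1.0)
    # expovariate takes a rate, so (X - mu) / sigma has survival exp(-z * t).
    return mu + sigma * rng.expovariate(z)


def pareto2_sf(x, mu, sigma, alpha):
    """Survival function (2) of P(II)(mu, sigma, alpha)."""
    if x <= mu:
        return 1.0
    return (1.0 + (x - mu) / sigma) ** (-alpha)


if __name__ == "__main__":
    rng = random.Random(2)
    draws = [sample_pareto2_mixture(rng, 0.0, 1.0, 2.0) for _ in range(50000)]
    empirical = sum(x > 1.0 for x in draws) / len(draws)
    # The unconditional survival probability should be close to (1 + 1)**(-2).
    print(empirical, pareto2_sf(1.0, 0.0, 1.0, 2.0))
```

In the Bayesian reading mentioned above, `z` plays the role of the unknown exponential rate and the gamma draw is its prior.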

The Pareto (III) distribution was apparently first considered by Fisk (1961a, 1961b), who called it a sech^{2} distribution. It is closely related to the logistic distribution. We say that a random variable *X* has a logistic (*μ*,*σ*) distribution, if its distribution function assumes the form

{F}_{X}(x)={\left[1+{e}^{-(x-\mu )/\sigma}\right]}^{-1},\phantom{\rule{.5em}{0ex}}-\infty \phantom{\rule{0.3em}{0ex}}<x<\infty

and we write *X* ∼ *L*(*μ*,*σ*). It is not difficult to verify that

X\sim L(\mu ,\sigma )\iff {e}^{X}\sim P(\mathrm{III})(0,{e}^{\mu},\sigma ).

(14)

It is as a consequence of the relation (14) that the Pareto (III) distribution, with *μ* = 0, is sometimes called the log-logistic distribution.
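Relation (14) can be verified pointwise. In the sketch below (hypothetical function names), the survival probability of *e*^{X} at a point *y* > 0 is computed from the logistic distribution function and compared with the Pareto (III) survival function (5) with parameters (0, *e*^{μ}, *σ*).

```python
import math


def logistic_cdf(x, mu, sigma):
    """Distribution function of the logistic (mu, sigma) distribution."""
    return 1.0 / (1.0 + math.exp(-(x - mu) / sigma))


def pareto3_sf(y, mu, sigma, gamma):
    """Survival function (5) of P(III)(mu, sigma, gamma)."""
    if y <= mu:
        return 1.0
    return 1.0 / (1.0 + ((y - mu) / sigma) ** (1.0 / gamma))


def exp_logistic_sf(y, mu, sigma):
    """P(exp(X) > y) for X ~ L(mu, sigma) and y > 0."""
    return 1.0 - logistic_cdf(math.log(y), mu, sigma)


if __name__ == "__main__":
    mu, sigma = 0.3, 0.8
    for y in (0.2, 1.0, 5.0):
        # Both columns should agree: e^X ~ P(III)(0, e^mu, sigma).
        print(y, exp_logistic_sf(y, mu, sigma), pareto3_sf(y, 0.0, math.exp(mu), sigma))
```

Note that the logistic scale *σ* becomes the inequality parameter of the Pareto (III) distribution, while the logistic location enters through the scale *e*^{μ}.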

**Remark 1.** *Johnson et al. (1994) refer to a Pareto distribution of the third kind that is not to be confused with the Pareto (III) distribution discussed in this paper. The survival function of this “third kind” distribution is of the form*

\overline{F}(x)={\left(1+\frac{x}{\sigma}\right)}^{-\alpha}{e}^{-\beta x},x>0.

(15)

*This distribution, which was suggested by Pareto (1897), was proposed to accommodate cases in which the basic Pareto model (1) was inadequate for fitting certain data configurations. This model is closely related to the Pareto (II) distribution, but with an additional exponential factor. Note that it could be viewed as the distribution of the minimum of a Pareto (II) variable (with μ = 0) and an independent exponential variable. This model has been used infrequently, but recently it has reappeared, this time called a tapered Pareto distribution (Kagan and Schoenberg 2001).*

### 2.1 Distributional properties

The Feller–Pareto distributions are unimodal. The mode is at *μ* if *γ* > *δ*_{2}, while if *γ* ≤ *δ*_{2}, we find (here *W* ∼ *FP*(*μ*,*σ*,*γ*,*δ*_{1},*δ*_{2}))

\text{mode}(W)=\mu +\sigma {\left[({\delta}_{2}-\gamma )/({\delta}_{1}+\gamma )\right]}^{\gamma}

(16)

In order to compute moments of the Pareto distributions, it is convenient to work with the representation (8). With *W* ∼ *FP*(*μ*,*σ*,*γ*,*δ*_{1},*δ*_{2}), if we define *W*^{∗} = (*W* - *μ*)/*σ*, then *W*^{∗} ∼ *FP*(0,1,*γ*,*δ*_{1},*δ*_{2}), i.e., *W*^{∗} \stackrel{d}{=} (*X*_{2}/*X*_{1})^{γ} where *X*_{i} ∼ Γ(*δ*_{i},1), *i* = 1,2, are independent random variables. It then can be readily verified that for a real number *τ*, the *τ*’th moment of *W*^{∗}, when it exists, is of the form

E\left({W}^{\ast \tau}\right)=\frac{\mathrm{\Gamma}({\delta}_{1}-\gamma \tau )\mathrm{\Gamma}({\delta}_{2}+\gamma \tau )}{\mathrm{\Gamma}({\delta}_{1})\mathrm{\Gamma}({\delta}_{2})},\phantom{\rule{1em}{0ex}}-({\delta}_{2}/\gamma )<\tau <({\delta}_{1}/\gamma ).

(17)

From this expression moments for the Feller-Pareto and the P(II)-P(IV) distributions are readily obtained.

Moments of the Pareto (I) distribution cannot be obtained in this way since, for it, *μ* = *σ* ≠ 0. They are obtainable by direct integration:

(\mathrm{Pareto}\phantom{\rule{0.3em}{0ex}}(I))\phantom{\rule{.5em}{0ex}}E({X}^{\tau})={\sigma}^{\tau}{\left(1-\frac{\tau}{\alpha}\right)}^{-1},\phantom{\rule{.5em}{0ex}}\tau <\alpha .

(18)
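The moment formulas (17) and (18) are straightforward to evaluate numerically. The following sketch is an illustration with hypothetical function names; it works on the log scale via `math.lgamma` (a choice made here for numerical stability, not something prescribed by the text) and raises an error outside the range of *τ* for which the moment exists.

```python
import math


def fp_standard_moment(tau, gamma, delta1, delta2):
    """E(W*^tau) from (17) for W* ~ FP(0, 1, gamma, delta1, delta2)."""
    if not (-delta2 / gamma < tau < delta1 / gamma):
        raise ValueError("moment of this order does not exist")
    log_moment = (math.lgamma(delta1 - gamma * tau)
                  + math.lgamma(delta2 + gamma * tau)
                  - math.lgamma(delta1)
                  - math.lgamma(delta2))
    return math.exp(log_moment)


def fp_mean(mu, sigma, gamma, delta1, delta2):
    """E(W) = mu + sigma * E(W*), which requires gamma < delta1."""
    return mu + sigma * fp_standard_moment(1.0, gamma, delta1, delta2)


def pareto1_moment(tau, sigma, alpha):
    """E(X^tau) from (18) for X ~ P(I)(sigma, alpha), tau < alpha."""
    if tau >= alpha:
        raise ValueError("moment of this order does not exist")
    return sigma ** tau / (1.0 - tau / alpha)


if __name__ == "__main__":
    # P(II)(0, 1, alpha) = FP(0, 1, 1, alpha, 1) has mean 1 / (alpha - 1).
    print(fp_standard_moment(1.0, 1.0, 3.0, 1.0))
    # Pareto (I) mean, directly from (18) and via P(I) = FP(sigma, sigma, 1, alpha, 1).
    print(pareto1_moment(1.0, 2.0, 3.0), fp_mean(2.0, 2.0, 1.0, 3.0, 1.0))
```

The last line shows that first moments of the Pareto (I) distribution can still be recovered from (17) through the location shift, even though general *τ*’th moments of *X* itself require (18).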

Sums of independent Pareto variables typically do not have analytically tractable distributions. If we multiply independent Pareto variables rather than adding them, it is sometimes possible to get simple expressions for the density of the resulting product. In the case of the Pareto (I) distribution the key lies in utilization of representation (12). Thus, if *X*_{1},*X*_{2},…,*X*_{n} are independent Pareto (I) variables with *X*_{i} ∼ *P*(I)(*σ*_{i},*α*_{i}), then their product *W* has the representation

W=\left(\prod _{i=1}^{n}{\sigma}_{i}\right)\exp \left(\sum _{i=1}^{n}({V}_{i}/{\alpha}_{i})\right)

(19)

where the *V*_{i}’s are independent standard exponential variables. In some cases expressions are available for the distribution of {\sum}_{i=1}^{n}{V}_{i}/{\alpha}_{i}. In particular, if *α*_{i} = *α* (*i* = 1,2,…,*n*), then {\sum}_{i=1}^{n}{V}_{i}/\alpha \sim \mathrm{\Gamma}(n,1/\alpha ), and we may readily obtain the density of *W*.

A second case in which simple closed form expressions are available is one in which all the *α*_{i}’s are distinct. In this situation we can use a result for weighted sums of exponentials given in, for example, Feller (1971), p. 40, and write the survival function of the product in the form:

P(W>w)=\sum _{i=1}^{n}{\left(\frac{w}{\sigma}\right)}^{-{\alpha}_{i}}\prod _{\genfrac{}{}{0ex}{}{k=1}{k\ne i}}^{n}\left(\frac{{\alpha}_{k}}{{\alpha}_{k}-{\alpha}_{i}}\right),\phantom{\rule{1em}{0ex}}w>\sigma

(20)

where \sigma ={\prod}_{i=1}^{n}{\sigma}_{i} and *α*_{i} ≠ *α*_{j} if *i* ≠ *j*. The distribution of products of independent Pareto (IV) variables with *μ*_{i}’s equal to 0 can, via the representation (8), be reduced to a problem involving the distribution of products of powers of independent gamma random variables. Unlike the Pareto (I) case, closed form expressions for the resulting density are apparently not obtainable, although moments of such products are readily available.
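The distinct-*α* case for products of Pareto (I) variables can be explored with a short sketch. The names below are hypothetical. The coefficients are taken in the form *α*_{k}/(*α*_{k} - *α*_{i}), which is the standard form for a sum of independent exponentials with distinct rates and the one for which the survival function equals 1 at *w* = *σ*. The simulation uses representation (19) with an arbitrary seed and sample size.

```python
import math
import random


def product_pareto1_sf(w, sigmas, alphas):
    """P(W > w) for a product of independent P(I)(sigma_i, alpha_i), distinct alpha_i."""
    sigma = math.prod(sigmas)
    if w <= sigma:
        return 1.0
    total = 0.0
    for i, a_i in enumerate(alphas):
        coef = 1.0
        for k, a_k in enumerate(alphas):
            if k != i:
                coef *= a_k / (a_k - a_i)  # requires alpha_k != alpha_i
        total += coef * (w / sigma) ** (-a_i)
    return total


def sample_product_pareto1(rng, sigmas, alphas):
    """Representation (19): prod(sigma_i) * exp(sum(V_i / alpha_i))."""
    exponent = sum(rng.expovariate(1.0) / a for a in alphas)
    return math.prod(sigmas) * math.exp(exponent)


if __name__ == "__main__":
    rng = random.Random(7)
    sigmas, alphas = [2.0, 3.0, 0.5], [1.5, 2.5, 4.0]
    draws = [sample_product_pareto1(rng, sigmas, alphas) for _ in range(40000)]
    w = 6.0
    empirical = sum(x > w for x in draws) / len(draws)
    print(empirical, product_pareto1_sf(w, sigmas, alphas))
```

When two of the *α*_{i} are close, the coefficients become large with opposite signs, so the closed form is numerically delicate in that regime.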

The Pareto (IV) family is closed under minimization when certain parameters are common to the minimands. Thus, if *X*_{1} and *X*_{2} are independent random variables with *X*_{i} ∼ *P*(IV)(*μ*,*σ*,*γ*,*α*_{i}), *i* = 1,2, then

min({X}_{1},{X}_{2})\sim P(\text{IV})(\mu ,\sigma ,\gamma ,{\alpha}_{1}+{\alpha}_{2}).

(21)

Note that in this situation the *X*_{i}’s share common values for the parameters *μ*,*σ* and *γ*.

Pareto (III) variables exhibit an interesting closure property with respect to geometric minimization and maximization. Indeed, this was used as a justification for use of the Pareto (III) distribution as a suitable model for income distributions based on a scenario involving competitive bidding for employment (Arnold and Laguna 1976). For this, consider a sequence *X*_{1},*X*_{2},… of i.i.d. Pareto (III) (*μ*,*σ*,*γ*) random variables. Suppose that for some *p* ∈ (0,1), *N*_{p} is independent of the *X*_{i}’s and has a geometric (*p*) distribution, i.e., *P*(*N*_{p} = *n*) = *p*(1 - *p*)^{n-1}, *n* = 1,2,…. Define the corresponding random extrema by

{U}_{p}=min\{{X}_{1},{X}_{2},\dots ,{X}_{{N}_{p}}\},

(22)

and

{V}_{p}=max\{{X}_{1},{X}_{2},\dots ,{X}_{{N}_{p}}\}.

(23)

It is readily verified, by conditioning on *N*_{p}, that *U*_{p} and *V*_{p} each have Pareto (III) distributions. Thus

{U}_{p}\sim P(\mathit{\text{III}})(\mu ,\sigma {p}^{\gamma},\gamma ),

(24)

and

{V}_{p}\sim P(\mathit{\text{III}})(\mu ,\sigma {p}^{-\gamma},\gamma ).

(25)

Observe that, if *μ* = 0, then

{p}^{-\gamma}{U}_{p}\stackrel{d}{=}{p}^{\gamma}{V}_{p}\stackrel{d}{=}{X}_{1}.

Some characterization results based on this observation were discussed in Arnold et al. (1986).
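The conditioning argument behind (24) and (25) can be reproduced numerically. The sketch below is only an illustration with hypothetical function names: it sums the geometric series over *N*_{p} = *n* explicitly (truncated at a fixed number of terms, which is an approximation that is adequate when *p* is not very small) and compares the result with the Pareto (III) survival functions with scales *σp*^{γ} and *σp*^{-γ}.

```python
def pareto3_sf(x, mu, sigma, gamma):
    """Survival function (5) of P(III)(mu, sigma, gamma)."""
    if x <= mu:
        return 1.0
    return 1.0 / (1.0 + ((x - mu) / sigma) ** (1.0 / gamma))


def geometric_min_sf(x, mu, sigma, gamma, p, terms=2000):
    """P(U_p > x) by conditioning on N_p: sum over n of p(1-p)^(n-1) * S(x)^n."""
    s = pareto3_sf(x, mu, sigma, gamma)
    return sum(p * (1.0 - p) ** (n - 1) * s ** n for n in range(1, terms + 1))


def geometric_max_sf(x, mu, sigma, gamma, p, terms=2000):
    """P(V_p > x) = 1 - sum over n of p(1-p)^(n-1) * F(x)^n."""
    f = 1.0 - pareto3_sf(x, mu, sigma, gamma)
    return 1.0 - sum(p * (1.0 - p) ** (n - 1) * f ** n for n in range(1, terms + 1))


if __name__ == "__main__":
    mu, sigma, gamma, p = 1.0, 2.0, 0.6, 0.3
    for x in (1.2, 2.0, 5.0, 20.0):
        print(x,
              geometric_min_sf(x, mu, sigma, gamma, p),
              pareto3_sf(x, mu, sigma * p ** gamma, gamma),
              geometric_max_sf(x, mu, sigma, gamma, p),
              pareto3_sf(x, mu, sigma * p ** (-gamma), gamma))
```

The geometric minimum shrinks the scale by the factor *p*^{γ} and the geometric maximum inflates it by *p*^{-γ}, while the location and inequality parameters are unchanged.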

It is possible to write down expressions for the densities of order statistics from a Pareto (IV) sample. The corresponding distribution functions will involve incomplete beta functions. Simulation of such order statistics may be accomplished by utilizing the relatively simple form of the Pareto (IV) quantile function, i.e.,

{F}^{-1}(u)=\mu +\sigma {\left[{(1-u)}^{-1/\alpha}-1\right]}^{\gamma}.

(26)

From this we have that if *X*_{i:n} is the *i*th order statistic from a sample of size *n* from a Pareto (IV) distribution, then

{X}_{i:n}\stackrel{d}{=}{F}^{-1}({U}_{i:n})

(27)

where \stackrel{d}{=} means that the two random variables are identically distributed, where *F*^{-1} is as given in (26), and where *U*_{i:n} is the *i*th order statistic of a sample of size *n* from a uniform (0,1) distribution. It is well known (see e.g. David and Nagaraja 2003) that *U*_{i:n} ∼ Beta(*i*,*n* - *i* + 1).

In some special cases the density of the *i*th order statistic (27) assumes a known form. For example:

{{X}_{i}}^{\prime}\text{s}\sim P(\text{III})(\mu ,\sigma ,\gamma )\Rightarrow {X}_{i:n}\sim \mathit{\text{FP}}(\mu ,\sigma ,\gamma ,n-i+1,i).

(28)

Another case involves minima:

{{X}_{i}}^{\prime}\text{s}\sim P(\text{IV})(\mu ,\sigma ,\gamma ,\alpha )\Rightarrow {X}_{1:n}\sim P(\mathit{\text{IV}})(\mu ,\sigma ,\gamma ,n\alpha ).

(29)
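The simulation scheme based on (26) and (27) can be sketched in a few lines. This is a rough illustration with hypothetical function names and an arbitrary seed and sample size: a Beta(*i*, *n* - *i* + 1) variate is pushed through the Pareto (IV) quantile function, and the case *i* = 1 is compared with the closed form (29) for the minimum.

```python
import random


def pareto4_quantile(u, mu, sigma, gamma, alpha):
    """Quantile function (26) of P(IV)(mu, sigma, gamma, alpha), for 0 <= u < 1."""
    return mu + sigma * ((1.0 - u) ** (-1.0 / alpha) - 1.0) ** gamma


def pareto4_sf(x, mu, sigma, gamma, alpha):
    """Survival function (6) of P(IV)(mu, sigma, gamma, alpha)."""
    if x <= mu:
        return 1.0
    return (1.0 + ((x - mu) / sigma) ** (1.0 / gamma)) ** (-alpha)


def sample_order_statistic(rng, i, n, mu, sigma, gamma, alpha):
    """Draw X_{i:n} as F^{-1}(U_{i:n}) with U_{i:n} ~ Beta(i, n - i + 1), as in (27)."""
    u = rng.betavariate(i, n - i + 1)
    return pareto4_quantile(u, mu, sigma, gamma, alpha)


if __name__ == "__main__":
    rng = random.Random(3)
    n, mu, sigma, gamma, alpha = 5, 0.0, 1.0, 0.8, 1.5
    minima = [sample_order_statistic(rng, 1, n, mu, sigma, gamma, alpha)
              for _ in range(40000)]
    x = 0.2
    empirical = sum(m > x for m in minima) / len(minima)
    # By (29) the minimum is P(IV)(mu, sigma, gamma, n * alpha).
    print(empirical, pareto4_sf(x, mu, sigma, gamma, n * alpha))
```

Drawing a single beta variate avoids generating and sorting the full sample when only one order statistic is needed.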