# Univariate and multivariate Pareto models

- Barry C Arnold

**1**:11

**DOI: **10.1186/2195-5832-1-11

© Arnold; licensee Springer. 2014

**Received: **13 February 2014

**Accepted: **9 May 2014

**Published: **17 June 2014

## Abstract

The Pareto distribution has long been recognized as a suitable model for many non-negative socio-economic variables. Univariate and multivariate variations abound. Some unification is possible by representing the Pareto variables in terms of independent gamma distributed components. Further unification is sometimes possible since some of the frequently used multivariate Pareto models share the same copula. In some cases, inference strategies can be developed to take advantage of the stochastic representations in terms of gamma components.

### Keywords

Inequality; Heavy tails; Generalized Pareto; Feller-Pareto; Kumaraswamy distribution; Hidden truncation; Conditional specification

## 1 Introduction

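Pareto’s observation that the number of individuals in a population whose income exceeds *x* is often well approximated by $Cx^{-\alpha}$ for some positive $C$ and some positive $\alpha$ led inexorably to consideration of the following well known model used for fitting univariate income data:

$$\bar F_X(x) = P(X > x) = \left(\frac{x}{\sigma}\right)^{-\alpha}, \quad x > \sigma. \tag{1}$$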

Here *σ*, the scale parameter, is positive and *α* (Pareto’s index of inequality) is also positive.

This model is typically referred to as the classical Pareto model. Improved fitting of data is encountered when more general Pareto-like distributions are considered. In this survey, the classical model will be embedded in a hierarchy of more complicated Pareto models. In this hierarchy of generalized Pareto distributions, the classical model will be called the Pareto (I) distribution. Multivariate income distributions are also of interest and, in that arena, a hierarchy of multivariate Pareto distributions is available, paralleling and closely related to the univariate hierarchy.

Even more flexible models have been proposed using these univariate and multivariate Pareto models as building blocks. Several of these will be described in this paper.

The end result is an impressively flexible array of income models from which the researcher can select a parsimonious model for the particular data set at hand. The emphasis in this survey will be on distributional properties of the models but some attention will be paid to estimation and inference strategies.

## 2 A hierarchy of generalized Pareto models

As the basic distribution in our hierarchy of generalized Pareto models we use the classical Pareto distribution, called here the Pareto (I) distribution. Its survival function is of the form (1). In practice, *α* is frequently assumed to be larger than 1, so that the distribution has a finite mean. If a random variable *X* has (1) as its survival function, then we write *X* ∼ *P*(I)(*σ*,*α*). In this basic model, the parameter *σ* has a dual role: it is a scale parameter, but it also determines the lower bound of the support of the distribution and so, to some extent, plays the role of a location parameter. A slightly more general model, which separates the roles of the location and scale parameters, has frequently been used:

$$\bar F_X(x) = \left[1 + \frac{x-\mu}{\sigma}\right]^{-\alpha}, \quad x > \mu, \tag{2}$$

where *μ*, the location parameter, is real valued, *σ* is positive and *α* is positive. In most applications *μ* will be non-negative, but negative values for *μ* pose no problems. If *X* has (2) as its survival function, we will write *X* ∼ *P*(II)(*μ*,*σ*,*α*).

A related model is the generalized Pareto distribution introduced by Pickands, with density

$$f(x) = \frac{1}{\sigma}\left(1 - \frac{kx}{\sigma}\right)^{1/k - 1}, \quad x > 0 \ (\text{with } x < \sigma/k \text{ when } k > 0), \tag{3}$$

where *σ* > 0 and -*∞* < *k* < *∞*. The density corresponding to *k* = 0, namely $\sigma^{-1}e^{-x/\sigma}$, is obtained by taking the limit as *k* *↑* 0 in (3). This model, (3), includes three sub-models. When *k* < 0, it yields a Pareto (II) density (with *μ* = 0), when *k* = 0 it yields an exponential density, while for *k* > 0, it corresponds to a scaled Beta distribution (of the first or standard kind). In fact, several results for the Pareto (II) distribution can be proved to remain valid in the more general context of the Pickands generalized Pareto model. In an income modeling setting, as Pareto observed, heavy tailed distributions are typically encountered and, for this reason, we will concentrate on the Pareto (II) sub-model. We remark, in passing, that despite Pareto’s insistence on the ubiquity of heavy tails, several authors have utilized scaled Beta distributions as income models, and some have even argued in favor of the exponential distribution as a model. The most general model in our hierarchy, the Feller-Pareto model, will be seen to actually include some light-tailed distributions corresponding to scaled Beta models with an additional location parameter. Applications of such light-tailed models are more likely to be encountered outside of the income distribution context.

where -*∞* < *μ* < *τ* < *∞*, *σ* > 0, and *α* > 0. Some discussion of such models may be found in Aban et al. (2006) and Arnold and Austin (1987).

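The Pareto (III) distribution has survival function

$$\bar F_X(x) = \left[1 + \left(\frac{x-\mu}{\sigma}\right)^{1/\gamma}\right]^{-1}, \quad x > \mu, \tag{5}$$

where *μ* is real, *σ* is positive and *γ* is positive. We will call *γ* the inequality parameter. If *μ* = 0 and *γ* ≤ 1, then *γ* turns out to be precisely the Gini index of inequality for this distribution. If *X* has (5) as its survival function, we will write *X* ∼ *P*(III)(*μ*,*σ*,*γ*).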

More generally, the Pareto (IV) distribution has survival function

$$\bar F_X(x) = \left[1 + \left(\frac{x-\mu}{\sigma}\right)^{1/\gamma}\right]^{-\alpha}, \quad x > \mu, \tag{6}$$

where *μ* (location) is real, *σ* (scale) is positive, *γ* (inequality) is positive and *α* (shape) is positive. Although we continue to call *γ* the inequality parameter, it will only be identifiable with the Gini index when *α* = 1 and *μ* = 0 (and *γ* ≤ 1). One might argue instead that in the P(IV) model both *γ* and *α* would be best described as shape parameters, since neither of them has a direct inequality interpretation. An anonymous referee points out that the two parameters *γ* and *α* govern the behavior of the P(IV) density as *x* approaches *μ* from above and as *x* approaches infinity. Thus, $f(x;\mu,\sigma,\gamma,\alpha) \sim x^{-\alpha/\gamma-1}$ as $x \to \infty$ and $f(x;\mu,\sigma,\gamma,\alpha) \sim (x-\mu)^{1/\gamma-1}$ as $x \downarrow \mu$. He suggests that an argument might be advanced in favor of a reparameterization in which we define *β* = *α*/*γ*, to highlight the roles of *α* and *β* in determining the limiting behavior of the density. However, in this paper, to be consistent with the notation in Arnold (1983), we will continue with the *μ*,*σ*,*γ*,*α* parameterization and continue to call *γ* the inequality parameter. If a random variable *X* has (6) as its survival function, we will write *X* ∼ *P*(IV)(*μ*,*σ*,*γ*,*α*). Note that the Pareto (IV) distribution, with *μ* = 0, is also known as a Burr Type XII distribution.
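As a quick numerical companion to the Pareto (IV) definitions, the minimal Python sketch below codes the survival function $[1+((x-\mu)/\sigma)^{1/\gamma}]^{-\alpha}$, its density and its quantile function using only the standard library. The names `pareto4_sf`, `pareto4_pdf` and `pareto4_ppf` are hypothetical helpers, not an existing package API. Setting $\gamma = 1$ gives P(II), $\alpha = 1$ gives P(III), and $\mu = \sigma$, $\gamma = 1$ gives the classical P(I) model.

```python
def pareto4_sf(x, mu=0.0, sigma=1.0, gamma=1.0, alpha=1.0):
    """Survival function of P(IV)(mu, sigma, gamma, alpha): [1 + ((x-mu)/sigma)**(1/gamma)]**(-alpha)."""
    if x <= mu:
        return 1.0
    return (1.0 + ((x - mu) / sigma) ** (1.0 / gamma)) ** (-alpha)


def pareto4_pdf(x, mu=0.0, sigma=1.0, gamma=1.0, alpha=1.0):
    """Density obtained by differentiating the survival function."""
    if x <= mu:
        return 0.0
    z = (x - mu) / sigma
    return (alpha / (gamma * sigma)) * z ** (1.0 / gamma - 1.0) * (1.0 + z ** (1.0 / gamma)) ** (-alpha - 1.0)


def pareto4_ppf(u, mu=0.0, sigma=1.0, gamma=1.0, alpha=1.0):
    """Quantile function: the x with P(X <= x) = u, for 0 <= u < 1."""
    return mu + sigma * ((1.0 - u) ** (-1.0 / alpha) - 1.0) ** gamma


# Special cases: gamma = 1 -> P(II); alpha = 1 -> P(III); mu = sigma, gamma = 1 -> P(I).
sigma, alpha = 2.0, 1.5
x = 5.0
print(pareto4_sf(x, mu=sigma, sigma=sigma, gamma=1.0, alpha=alpha), (x / sigma) ** (-alpha))
```

The P(I) check prints the same number twice, since $[1 + (x-\sigma)/\sigma]^{-\alpha} = (x/\sigma)^{-\alpha}$.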

If $X_i \sim \Gamma(\delta_i,1)$, $i = 1,2$, are independent random variables, and if for $\mu$ real, $\sigma > 0$ and $\gamma > 0$ we define

$$W = \mu + \sigma\left(\frac{X_2}{X_1}\right)^{\gamma}, \tag{8}$$

then *W* has a Feller–Pareto distribution, and we write *W* ∼ *FP*(*μ*,*σ*,*γ*,*δ*_{1},*δ*_{2}).

$\delta_2 = 1$, i.e.,

The corresponding survival function is obtainable from tables of the incomplete beta function. For many computations it is simpler to work directly with the representation (8). The Pareto (IV) distributions correspond to the case in which $X_1$ has a gamma distribution while $X_2$ has an exponential distribution (i.e., $\delta_2 = 1$). The Pareto (III) distributions are encountered when both $X_1$ and $X_2$ are exponential variables.
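Representation (8) translates directly into a simulation recipe. The sketch below assumes (8) in the form $W = \mu + \sigma(X_2/X_1)^{\gamma}$ with $X_i \sim \Gamma(\delta_i,1)$ and uses Python's `random.gammavariate(shape, scale)`; `sample_feller_pareto` is a hypothetical name. With $\delta_2 = 1$ the draws should follow P(IV)$(\mu,\sigma,\gamma,\delta_1)$, and the last lines compare an empirical tail probability with that survival function.

```python
import random


def sample_feller_pareto(n, mu, sigma, gamma, delta1, delta2, rng=None):
    """Draw n values of W = mu + sigma * (X2 / X1)**gamma with X_i ~ Gamma(delta_i, 1) independent."""
    rng = rng or random.Random()
    draws = []
    for _ in range(n):
        x1 = rng.gammavariate(delta1, 1.0)  # shape delta1, scale 1
        x2 = rng.gammavariate(delta2, 1.0)  # shape delta2, scale 1
        draws.append(mu + sigma * (x2 / x1) ** gamma)
    return draws


# With delta2 = 1 the draws should be P(IV)(mu, sigma, gamma, alpha = delta1).
rng = random.Random(12345)
mu, sigma, gamma, alpha = 1.0, 2.0, 0.5, 3.0
w = sample_feller_pareto(100_000, mu, sigma, gamma, alpha, 1.0, rng)
x0 = 2.0
empirical = sum(v > x0 for v in w) / len(w)
theoretical = (1.0 + ((x0 - mu) / sigma) ** (1.0 / gamma)) ** (-alpha)
print(round(empirical, 3), round(theoretical, 3))
```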

Kalbfleisch and Prentice (1980) call the Feller-Pareto density (with *μ* = 0) a generalized *F* density. Instead, we might describe a Feller-Pareto variable as being a location and scale transform of a generalized beta variable of the second kind. Recall that a beta variable of the second kind is just a ratio of independent gamma variables.

$X_i \sim \Gamma(\delta_i,1)$, $i = 1,2$. One could define

The additional flexibility provided by the introduction of such a sixth parameter in the model has not been investigated.

All of the generalized univariate Pareto distributions to be considered in this paper are subsumed in the Feller–Pareto family, and a unified derivation of many distributional results is possible. However, in the case of Pareto (I)-(IV) distributions, some alternative representations are also useful.

*X* has a *P*(I)(*σ*,*α*) distribution if it is of the form

$$X = \sigma e^{V/\alpha}, \tag{12}$$

where *V* is a standard exponential random variable. An analogous representation of a Pareto (II) variable in terms of an exponential random variable is possible, i.e.,

$$X = \mu + \sigma\left(e^{V/\alpha} - 1\right).$$

Likewise Pareto (III) and (IV) variables can be represented as $X = \mu + \sigma(e^{V} - 1)^{\gamma}$ and $X = \mu + \sigma(e^{V/\alpha} - 1)^{\gamma}$ respectively. The representation (12) for a classical Pareto variable (i.e., Pareto (I)) highlights the useful observation that the logarithm of such a variable has a shifted exponential distribution. This will permit the recognition of many distributional properties of Pareto (I) variables as reflections of parallel properties of exponential variables.
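Since each of these maps from *V* to *X* is increasing, $P(X > x(v)) = P(V > v) = e^{-v}$ for every $v \ge 0$. The following Python sketch (standard library only; helper names are hypothetical) checks this identity for the Pareto (IV) map, which contains the other three as special cases, and confirms that $\log(X/\sigma) = V/\alpha$ in the Pareto (I) case.

```python
import math


def pareto4_from_exponential(v, mu, sigma, gamma, alpha):
    """Map a standard exponential value v to X = mu + sigma * (exp(v/alpha) - 1)**gamma."""
    return mu + sigma * math.expm1(v / alpha) ** gamma


def pareto4_sf(x, mu, sigma, gamma, alpha):
    """P(IV) survival function, valid for x > mu."""
    return (1.0 + ((x - mu) / sigma) ** (1.0 / gamma)) ** (-alpha)


mu, sigma, gamma, alpha = 1.0, 2.0, 0.7, 3.0
for v in (0.1, 1.0, 2.5):
    x = pareto4_from_exponential(v, mu, sigma, gamma, alpha)
    # The increasing map preserves tail probabilities: P(X > x) = P(V > v).
    print(v, pareto4_sf(x, mu, sigma, gamma, alpha), math.exp(-v))

# Pareto (I) case (mu = sigma, gamma = 1): X = sigma * exp(V / alpha), so log(X / sigma) = V / alpha.
x = pareto4_from_exponential(1.2, 2.0, 2.0, 1.0, 3.0)
print(math.log(x / 2.0), 1.2 / 3.0)
```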

*Z*. Thus, if

$$P(X > x \mid Z = z) = e^{-z(x-\mu)/\sigma}, \quad x > \mu,$$

i.e., a (translated) exponential distribution, and if $Z \sim \Gamma(\alpha,1)$, then it follows that unconditionally $X \sim P(\text{II})(\mu,\sigma,\alpha)$. Alternatively, this can be viewed as being equivalent to the representation in (8) after setting $\gamma = \delta_2 = 1$.

This representation of the Pareto (II) distribution as a gamma mixture of exponential distributions is often encountered in reliability and survival contexts, see e.g., Keiding et al. (2002). It is also familiar in Bayesian analysis of exponential data, where the gamma density enters as a convenient prior. In this context the Pareto (II) distribution is sometimes called the Lomax distribution.
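The mixture claim can be checked by quadrature: integrating the conditional survival function $e^{-z(x-\mu)/\sigma}$ against the $\Gamma(\alpha,1)$ density should return the P(II) survival function $[1+(x-\mu)/\sigma]^{-\alpha}$. The sketch uses a plain trapezoidal rule, an illustrative choice rather than anything prescribed in the text, and `lomax_sf_by_mixing` is a hypothetical name.

```python
import math


def gamma_pdf(z, shape):
    """Gamma(shape, 1) density."""
    return z ** (shape - 1.0) * math.exp(-z) / math.gamma(shape)


def lomax_sf_by_mixing(x, mu, sigma, alpha, upper=60.0, steps=60_000):
    """Integrate P(X > x | Z = z) = exp(-z (x - mu) / sigma) against the Gamma(alpha, 1) density.

    Trapezoidal rule on [0, upper]; the Gamma tail beyond `upper` is negligible for moderate
    alpha. Intended for alpha >= 1, where the integrand is finite at z = 0.
    """
    t = (x - mu) / sigma
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        z = i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * math.exp(-z * t) * gamma_pdf(z, alpha)
    return h * total


mu, sigma, alpha = 1.0, 2.0, 2.5
for x in (1.5, 3.0, 10.0):
    print(x, lomax_sf_by_mixing(x, mu, sigma, alpha), (1.0 + (x - mu) / sigma) ** (-alpha))
```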

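^{2} distribution. It is closely related to the logistic distribution. We say that a random variable *X* has a logistic (*μ*,*σ*) distribution if its distribution function assumes the form

$$F_X(x) = \left[1 + e^{-(x-\mu)/\sigma}\right]^{-1}, \quad -\infty < x < \infty,$$

and we write $X \sim L(\mu,\sigma)$. It is not difficult to verify that

$$X \sim P(\text{III})(0,\sigma,\gamma) \;\Longrightarrow\; \log X \sim L(\log\sigma,\gamma). \tag{14}$$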

It is as a consequence of the relation (14) that the Pareto (III) distribution, with *μ* = 0, is sometimes called the log-logistic distribution.
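A short numerical check of the log-logistic connection, assuming the parameterizations $F(x) = 1 - [1+(x/\sigma)^{1/\gamma}]^{-1}$ for P(III)$(0,\sigma,\gamma)$ and $[1+e^{-(y-\mu)/s}]^{-1}$ for $L(\mu,s)$, under which the relation reads $\log X \sim L(\log\sigma,\gamma)$. The helper names are hypothetical.

```python
import math


def pareto3_cdf(x, sigma, gamma):
    """CDF of P(III)(0, sigma, gamma) for x > 0."""
    return 1.0 - 1.0 / (1.0 + (x / sigma) ** (1.0 / gamma))


def logistic_cdf(y, loc, scale):
    """CDF of the logistic L(loc, scale) distribution."""
    return 1.0 / (1.0 + math.exp(-(y - loc) / scale))


sigma, gamma = 2.0, 0.4
for y in (-1.0, 0.5, 2.0):
    # P(log X <= y) = P(X <= e^y) should equal the L(log sigma, gamma) CDF at y.
    print(y, pareto3_cdf(math.exp(y), sigma, gamma), logistic_cdf(y, math.log(sigma), gamma))
```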

**Remark 1.**

*Johnson et al. (1994) refer to a Pareto distribution of the third kind that is not to be confused with the Pareto (III) distribution discussed in this paper. The survival function of this “third kind” distribution is of the form*

$$\bar F_X(x) = \left(1 + \frac{x}{\sigma}\right)^{-\alpha} e^{-\beta x}, \quad x > 0,\ \sigma,\alpha,\beta > 0.$$

*This distribution, which was suggested by Pareto (1897), was proposed to accommodate cases in which the basic Pareto model (1) was inadequate for fitting certain data configurations. This model is closely related to the Pareto (II) distribution, but with an additional exponential factor. Note that it could be viewed as the distribution of the minimum of a Pareto (II) variable (with μ = 0) and an independent exponential variable. This model has been used infrequently, but recently it has reappeared, this time called a tapered Pareto distribution (Kagan and Schoenberg 2001).*

### 2.1 Distributional properties

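The mode of the Feller–Pareto density is at *μ* if $\gamma > \delta_2$, while if $\gamma \le \delta_2$, we find (here $W \sim FP(\mu,\sigma,\gamma,\delta_1,\delta_2)$)

$$\operatorname{mode}(W) = \mu + \sigma\left(\frac{\delta_2-\gamma}{\delta_1+\gamma}\right)^{\gamma}.$$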

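For $W \sim FP(\mu,\sigma,\gamma,\delta_1,\delta_2)$, if we define $W^{*} = (W-\mu)/\sigma$, then $W^{*} \sim FP(0,1,\gamma,\delta_1,\delta_2)$, i.e., $W^{*} \stackrel{d}{=} (X_2/X_1)^{\gamma}$, where $X_i \sim \Gamma(\delta_i,1)$, $i = 1,2$, are independent random variables. It then can be readily verified that for a real number $\tau$, the $\tau$'th moment of $W^{*}$, when it exists, is of the form

$$E\left[(W^{*})^{\tau}\right] = \frac{\Gamma(\delta_1-\gamma\tau)\,\Gamma(\delta_2+\gamma\tau)}{\Gamma(\delta_1)\,\Gamma(\delta_2)}, \quad -\frac{\delta_2}{\gamma} < \tau < \frac{\delta_1}{\gamma}.$$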

From this expression moments for the Feller-Pareto and the P(II)-P(IV) distributions are readily obtained.
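The moment formula follows from independence: $E[(X_2/X_1)^{\gamma\tau}] = E[X_2^{\gamma\tau}]\,E[X_1^{-\gamma\tau}]$, combined with $E[X^{s}] = \Gamma(\delta+s)/\Gamma(\delta)$ for $X \sim \Gamma(\delta,1)$ and $s > -\delta$. The Python sketch below (standard library; `fp_standard_moment` is a hypothetical name) evaluates the result and checks it against the familiar mean $1/(\alpha-1)$ of a P(II)$(0,1,\alpha)$ variable, which is $FP(0,1,1,\alpha,1)$ under representation (8).

```python
import math


def fp_standard_moment(tau, gamma, delta1, delta2):
    """E[(W*)**tau] for W* = (X2 / X1)**gamma with X_i ~ Gamma(delta_i, 1) independent.

    Uses E[X2**s] = Gamma(delta2 + s) / Gamma(delta2) and E[X1**(-s)] = Gamma(delta1 - s) / Gamma(delta1)
    with s = gamma * tau; both exist only when -delta2/gamma < tau < delta1/gamma.
    """
    if not (-delta2 / gamma < tau < delta1 / gamma):
        raise ValueError("the tau-th moment does not exist")
    s = gamma * tau
    return (math.gamma(delta1 - s) * math.gamma(delta2 + s)
            / (math.gamma(delta1) * math.gamma(delta2)))


# Mean of W ~ FP(mu, sigma, gamma, delta1, delta2) is mu + sigma * E[W*].
# P(II)(0, 1, alpha) is FP(0, 1, 1, alpha, 1); its mean should be 1 / (alpha - 1).
alpha = 3.0
print(fp_standard_moment(1.0, 1.0, alpha, 1.0), 1.0 / (alpha - 1.0))
```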

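$\mu = \sigma \neq 0$. They are obtainable by direct integration; for $X \sim P(\text{I})(\sigma,\alpha)$,

$$E(X^{\tau}) = \frac{\alpha\,\sigma^{\tau}}{\alpha-\tau}, \quad \tau < \alpha.$$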

If $X_1, X_2, \ldots, X_n$ are independent Pareto (I) variables with $X_i \sim P(\text{I})(\sigma_i,\alpha_i)$, then their product *W* has the representation

$$W = \prod_{i=1}^{n} X_i \stackrel{d}{=} \left(\prod_{i=1}^{n}\sigma_i\right)\exp\left(\sum_{i=1}^{n} \frac{V_i}{\alpha_i}\right),$$

where the $V_i$'s are independent standard exponential variables. In some cases expressions are available for the distribution of $\sum_{i=1}^{n} V_i/\alpha_i$. In particular, if $\alpha_i = \alpha$ $(i = 1,2,\ldots,n)$, then $\sum_{i=1}^{n} V_i/\alpha \sim \Gamma(n,1/\alpha)$, and we may readily obtain the density of *W*.

Suppose next that all of the $\alpha_i$'s are distinct. In this situation we can use a result for weighted sums of exponentials given in, for example, Feller (1971), p. 40, and write the survival function of the product in the form:

$$P(W > w) = \sum_{i=1}^{n}\left(\prod_{j \ne i}\frac{\alpha_j}{\alpha_j - \alpha_i}\right)\left(\frac{w}{\sigma}\right)^{-\alpha_i}, \quad w > \sigma,$$

where $\sigma = \prod_{i=1}^{n}\sigma_i$ and $\alpha_i \ne \alpha_j$ if $i \ne j$. The distribution of products of independent Pareto (IV) variables with $\mu_i$'s equal to 0 can, via the representation (8), be reduced to a problem involving the distribution of products of powers of independent gamma random variables. Unlike the Pareto (I) case, closed form expressions for the resulting density are apparently not obtainable, although moments of such products are readily available.
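Both product formulas are easy to code. The sketch below assumes the standard survival function of a sum of independent exponentials with distinct rates $\alpha_i$ (the result attributed to Feller above) and, for a common $\alpha$, the Erlang survival function $e^{-t}\sum_{j<n} t^j/j!$ with $t = \alpha\log(w/\sigma)$. Function names are hypothetical; the usage line shows the distinct-rate formula approaching the common-rate answer as two $\alpha$'s merge.

```python
import math


def product_sf_distinct(w, sigmas, alphas):
    """P(W > w) for W = X_1 * ... * X_n, X_i ~ P(I)(sigma_i, alpha_i) independent, alphas distinct.

    log(W / sigma) is a sum of independent exponentials with rates alpha_i, whose survival
    function is sum_i [prod_{j != i} alpha_j / (alpha_j - alpha_i)] * exp(-alpha_i * t).
    """
    sigma = math.prod(sigmas)
    if w <= sigma:
        return 1.0
    total = 0.0
    for i, a_i in enumerate(alphas):
        coef = 1.0
        for j, a_j in enumerate(alphas):
            if j != i:
                coef *= a_j / (a_j - a_i)
        total += coef * (w / sigma) ** (-a_i)
    return total


def product_sf_common(w, sigmas, alpha):
    """P(W > w) when every alpha_i = alpha: alpha * log(W / sigma) ~ Gamma(n, 1) (Erlang)."""
    sigma = math.prod(sigmas)
    if w <= sigma:
        return 1.0
    t = alpha * math.log(w / sigma)
    return math.exp(-t) * sum(t ** j / math.factorial(j) for j in range(len(sigmas)))


# Nearly equal alphas should reproduce the common-alpha (Erlang) answer.
print(product_sf_distinct(5.0, [1.0, 1.0], [2.0, 2.0 + 1e-6]),
      product_sf_common(5.0, [1.0, 1.0], 2.0))
```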

If $X_1$ and $X_2$ are independent random variables with $X_i \sim P(\text{IV})(\mu,\sigma,\gamma,\alpha_i)$, $i = 1,2$, then

$$\min(X_1,X_2) \sim P(\text{IV})(\mu,\sigma,\gamma,\alpha_1+\alpha_2).$$

Note that in this situation the $X_i$'s share common values for the parameters *μ*, *σ* and *γ*.

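Consider a sequence $X_1, X_2, \ldots$ of i.i.d. Pareto (III)$(\mu,\sigma,\gamma)$ random variables. Suppose that for some $p \in (0,1)$, $N_p$ is independent of the $X_i$'s and has a geometric$(p)$ distribution, i.e., $P(N_p = n) = p(1-p)^{n-1}$, $n = 1,2,\ldots$. Define the corresponding random extrema by

$$U_p = \min_{1 \le i \le N_p} X_i, \qquad V_p = \max_{1 \le i \le N_p} X_i.$$

It may be verified, by conditioning on $N_p$, that $U_p$ and $V_p$ each have Pareto (III) distributions. Thus

$$U_p \sim P(\text{III})(\mu,\sigma p^{\gamma},\gamma), \qquad V_p \sim P(\text{III})(\mu,\sigma p^{-\gamma},\gamma).$$

In particular, if $\mu = 0$, then

$$p^{-\gamma}U_p \stackrel{d}{=} X_1 \stackrel{d}{=} p^{\gamma}V_p.$$

Some characterization results based on this observation were discussed in Arnold et al. (1986).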
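Conditioning on $N_p$ gives $P(U_p > x) = \sum_{n\ge1} p(1-p)^{n-1}\bar F(x)^n$ and $P(V_p \le x) = \sum_{n\ge1} p(1-p)^{n-1}F(x)^n$. The Python sketch below sums these series directly and compares them with Pareto (III) survival functions of scale $\sigma p^{\gamma}$ (minimum) and $\sigma p^{-\gamma}$ (maximum), assuming the P(III) survival function $[1+((x-\mu)/\sigma)^{1/\gamma}]^{-1}$; the helper names are hypothetical.

```python
def pareto3_sf(x, mu, sigma, gamma):
    """P(III)(mu, sigma, gamma) survival function [1 + ((x - mu)/sigma)**(1/gamma)]**(-1)."""
    if x <= mu:
        return 1.0
    return 1.0 / (1.0 + ((x - mu) / sigma) ** (1.0 / gamma))


def geometric_min_sf(x, p, mu, sigma, gamma, terms=2000):
    """P(U_p > x) = sum_{n>=1} p (1-p)**(n-1) * sf(x)**n, truncated after `terms` terms."""
    s = pareto3_sf(x, mu, sigma, gamma)
    return sum(p * (1 - p) ** (n - 1) * s ** n for n in range(1, terms + 1))


def geometric_max_sf(x, p, mu, sigma, gamma, terms=2000):
    """P(V_p > x) = 1 - sum_{n>=1} p (1-p)**(n-1) * cdf(x)**n."""
    c = 1.0 - pareto3_sf(x, mu, sigma, gamma)
    return 1.0 - sum(p * (1 - p) ** (n - 1) * c ** n for n in range(1, terms + 1))


p, mu, sigma, gamma, x = 0.3, 1.0, 2.0, 0.6, 3.5
print(geometric_min_sf(x, p, mu, sigma, gamma), pareto3_sf(x, mu, sigma * p ** gamma, gamma))
print(geometric_max_sf(x, p, mu, sigma, gamma), pareto3_sf(x, mu, sigma * p ** (-gamma), gamma))
```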

If $X_{i:n}$ is the *i*th order statistic from a sample of size *n* from a Pareto (IV) distribution, then

$$X_{i:n} \stackrel{d}{=} F^{-1}(U_{i:n}) = \mu + \sigma\left[(1-U_{i:n})^{-1/\alpha} - 1\right]^{\gamma}, \tag{27}$$

where $\stackrel{d}{=}$ means that the two random variables are identically distributed, where $F^{-1}$ is as given in (26), and where $U_{i:n}$ is the *i*th order statistic of a sample of size *n* from a uniform (0,1) distribution. It is well known (see e.g. David and Nagaraja 2003) that $U_{i:n} \sim \text{Beta}(i, n-i+1)$.

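For certain values of *i*, the distribution of the *i*th order statistic (27) assumes a known form. For example:

$$X_{1:n} \sim P(\text{IV})(\mu,\sigma,\gamma,n\alpha).$$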
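Because $X_{i:n} > x$ exactly when fewer than *i* of the *n* observations are at most *x*, its survival function is the binomial sum $\sum_{j=0}^{i-1}\binom{n}{j}F(x)^j[1-F(x)]^{n-j}$, which is the same as $P(U_{i:n} > F(x))$. The sketch below (standard library; hypothetical names) evaluates this for the P(IV) distribution function and checks the sample-minimum case $X_{1:n} \sim P(\text{IV})(\mu,\sigma,\gamma,n\alpha)$.

```python
import math


def pareto4_cdf(x, mu, sigma, gamma, alpha):
    """P(IV) distribution function."""
    if x <= mu:
        return 0.0
    return 1.0 - (1.0 + ((x - mu) / sigma) ** (1.0 / gamma)) ** (-alpha)


def order_stat_sf(x, i, n, mu, sigma, gamma, alpha):
    """P(X_{i:n} > x): fewer than i of the n observations fall at or below x."""
    f = pareto4_cdf(x, mu, sigma, gamma, alpha)
    return sum(math.comb(n, j) * f ** j * (1.0 - f) ** (n - j) for j in range(i))


mu, sigma, gamma, alpha, n, x = 0.0, 1.0, 0.5, 1.2, 5, 2.0
# i = 1 (sample minimum): should equal the P(IV)(mu, sigma, gamma, n * alpha) survival function.
print(order_stat_sf(x, 1, n, mu, sigma, gamma, alpha),
      (1.0 + ((x - mu) / sigma) ** (1.0 / gamma)) ** (-n * alpha))
```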

## 3 Some related extensions

A variety of models have been proposed to add more flexibility to the generalized Pareto models discussed in Section 2. Most of them include Pareto models as special cases. In this Section we will make note of a selection of these models.

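*σ*, *γ*, *α*) distribution. If we denote the corresponding $P(\text{IV})(0,\sigma,\gamma,\alpha)$ distribution function by $F_{\sigma,\gamma,\alpha}(x)$, then the distribution of *X* will be

*X* is assumed to be of the form

$$f_X(x) = \frac{f_{\sigma,\gamma,\alpha}(x)\,[F_{\sigma,\gamma,\alpha}(x)]^{\lambda_1-1}\,[1-F_{\sigma,\gamma,\alpha}(x)]^{\lambda_2-1}}{B(\lambda_1,\lambda_2)},$$

where $f_{\sigma,\gamma,\alpha}$ is the density corresponding to $F_{\sigma,\gamma,\alpha}$, $\lambda_1,\lambda_2 > 0$, and $x \in (0,\infty)$. Of course, if $\lambda_1 = \lambda_2 = 1$, the Beta-generalized distribution simplifies to become a Pareto (IV) distribution.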

*θ* > 0. In such a case we have

$$F_X(x) = \left[F_{\sigma,\gamma,\alpha}(x)\right]^{\theta}, \tag{33}$$

and the distribution is usually called the exponentiated generalized Pareto distribution. It can be recognized as a special case of the Beta-generalized Pareto model with the parameters chosen to be $\lambda_1 = \theta$ and $\lambda_2 = 1$.

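A random variable *X* has a Kumaraswamy ($\lambda_1$,$\lambda_2$) distribution if its density and distribution functions are

$$f(x) = \lambda_1\lambda_2\,x^{\lambda_1-1}\left(1-x^{\lambda_1}\right)^{\lambda_2-1}, \qquad F(x) = 1-\left(1-x^{\lambda_1}\right)^{\lambda_2}, \qquad 0 < x < 1.$$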

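Let $F_P(x)$ denote the Pareto (IV) distribution function, with corresponding density $f_P(x)$, and suppose that *K* has a Kumaraswamy ($\lambda_1$,$\lambda_2$) distribution. Define $Y = F_P^{-1}(K)$; then *Y* has a Kumaraswamy-Pareto (IV) distribution with corresponding density

$$f_Y(y) = \lambda_1\lambda_2\,f_P(y)\,[F_P(y)]^{\lambda_1-1}\left\{1-[F_P(y)]^{\lambda_1}\right\}^{\lambda_2-1}.$$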
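Since the Kumaraswamy distribution function inverts in closed form, $Y = F_P^{-1}(K)$ can be simulated by inversion, and its distribution function is $1-\{1-[F_P(y)]^{\lambda_1}\}^{\lambda_2}$. A minimal Python sketch, with hypothetical helper names; $\lambda_2 = 1$ recovers the exponentiated model and $\lambda_1 = \lambda_2 = 1$ recovers P(IV) itself.

```python
import random


def pareto4_cdf(x, mu, sigma, gamma, alpha):
    """P(IV) distribution function F_P."""
    if x <= mu:
        return 0.0
    return 1.0 - (1.0 + ((x - mu) / sigma) ** (1.0 / gamma)) ** (-alpha)


def pareto4_ppf(u, mu, sigma, gamma, alpha):
    """Inverse of F_P for 0 <= u < 1."""
    return mu + sigma * ((1.0 - u) ** (-1.0 / alpha) - 1.0) ** gamma


def kumaraswamy_ppf(u, lam1, lam2):
    """Inverse of G(k) = 1 - (1 - k**lam1)**lam2 on (0, 1)."""
    return (1.0 - (1.0 - u) ** (1.0 / lam2)) ** (1.0 / lam1)


def kp4_cdf(y, lam1, lam2, mu, sigma, gamma, alpha):
    """CDF of Y = F_P^{-1}(K) with K ~ Kumaraswamy(lam1, lam2), i.e. G(F_P(y))."""
    return 1.0 - (1.0 - pareto4_cdf(y, mu, sigma, gamma, alpha) ** lam1) ** lam2


def kp4_sample(lam1, lam2, mu, sigma, gamma, alpha, rng):
    """Inversion sampling: draw K by inverting G, then map it through the P(IV) quantile function."""
    k = kumaraswamy_ppf(rng.random(), lam1, lam2)
    return pareto4_ppf(k, mu, sigma, gamma, alpha)


params = (0.0, 1.0, 0.8, 2.0)  # (mu, sigma, gamma, alpha) of the P(IV) parent
rng = random.Random(2024)
ys = [kp4_sample(2.0, 3.0, *params, rng) for _ in range(50_000)]
y0 = 0.5
print(sum(y <= y0 for y in ys) / len(ys), kp4_cdf(y0, 2.0, 3.0, *params))
```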

Akinsete et al. (2008) consider some special subcases of the Beta-generalized Pareto distribution, while Paranaiba et al. (2013) discuss the Kumaraswamy-generalized Pareto distribution. Submodels of the Kumaraswamy-generalized Pareto model are often of interest. For example the exponentiated generalized Pareto distribution (33) is such a submodel.

And, of course, one can concatenate these constructions and consider a Beta-Kumaraswamy-Pareto distribution. Going one step further we would arrive at a generalized-Beta-Kumaraswamy-Pareto model. Each generalization adds flexibility at the cost of introducing more parameters. Some degree of parsimony is evidently called for here.

where *μ* ∈ (-*∞*,*∞*),*σ*,*γ* ∈ (0,*∞*),*p* ∈ (0,1) and *h*(*x*) is a periodic function of ln*x* with period -2*π*/[ *γ* ln*p*], and with *h*(0) = 1. The case in which *h*(*x*) ≡ 1 for every *x* corresponds to the usual Pareto (III) model. More generality can be arrived at if *h*(*x*) is replaced by a suitable parametric family of periodic functions. Note that, in order for (36) to be a valid survival function, it must be the case that *x*^{1/γ}*h*(*x*) is a non-decreasing function of *x*.

Suppose that a variable *X* is observed only if a covariable *Y* takes on a value less than some threshold value $y_0$. Thus the distribution of the observed *X*'s is of the form $P(X \le x \mid Y \le y_0)$. With this in mind, consider the case in which $(X,Y)$ has a bivariate Pareto (IV) distribution with the following joint survival function:

$$P(X > x, Y > y) = \left[1 + \left(\frac{x-\mu}{\sigma}\right)^{1/\gamma} + \left(\frac{y-\nu}{\tau}\right)^{1/\delta}\right]^{-\alpha}, \quad x > \mu,\ y > \nu.$$

The density of *X*, given that it can only be observed if *Y* is not too large, is

$$f(x) = \frac{\alpha}{\gamma\sigma\,[1-(1+\theta)^{-\alpha}]}\left(\frac{x-\mu}{\sigma}\right)^{1/\gamma-1}\left\{\left[1+\left(\frac{x-\mu}{\sigma}\right)^{1/\gamma}\right]^{-(\alpha+1)} - \left[1+\theta+\left(\frac{x-\mu}{\sigma}\right)^{1/\gamma}\right]^{-(\alpha+1)}\right\}, \quad x > \mu,$$

where the parameter $\theta = [(y_0-\nu)/\tau]^{1/\delta}$ has been introduced. In this model, *μ* is a real valued parameter, often positive, while all of the other parameters, *σ*, *γ*, *α* and *θ*, are positive valued. An alternative representation of this density is possible as follows:

$$f(x) = \frac{1}{1-(1+\theta)^{-\alpha}}\,f_{\mu,\sigma,\gamma,\alpha}(x) - \frac{(1+\theta)^{-\alpha}}{1-(1+\theta)^{-\alpha}}\,f_{\mu,\sigma_1,\gamma,\alpha}(x),$$

where $f_{\mu,\sigma,\gamma,\alpha}$ denotes the $P(\text{IV})(\mu,\sigma,\gamma,\alpha)$ density and $\sigma_1 = \sigma(1+\theta)^{\gamma}$. This is recognizable as a linear combination of two Pareto (IV) densities; it is not, however, a convex combination since, although the coefficients add up to 1, the second coefficient is negative. Motivated by this example, one might also consider *k*-component linear combinations of Pareto (IV) densities as income models, allowing *k* to be greater than 2. Such models with positive coefficients are natural candidates for fitting multimodal income data sets which may well have a mixture genesis. Note that, by testing the hypothesis that *θ* = 0 one can decide whether or not the data set at hand has been subject to hidden truncation. More detailed discussion of these hidden truncation Pareto models, in the Pareto (II) case, may be found in Arnold and Ghosh (2011).
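The equivalence of the two forms of the hidden-truncation density can be checked numerically. The sketch assumes the joint survival function $[1+((x-\mu)/\sigma)^{1/\gamma}+((y-\nu)/\tau)^{1/\delta}]^{-\alpha}$, so that $P(X > x, Y \le y_0) = \bar F_X(x) - \bar F(x,y_0)$ with $\theta = [(y_0-\nu)/\tau]^{1/\delta}$ and $\sigma_1 = \sigma(1+\theta)^{\gamma}$; the function names are hypothetical.

```python
def pareto4_pdf(x, mu, sigma, gamma, alpha):
    """P(IV)(mu, sigma, gamma, alpha) density."""
    if x <= mu:
        return 0.0
    z = (x - mu) / sigma
    return alpha / (gamma * sigma) * z ** (1.0 / gamma - 1.0) * (1.0 + z ** (1.0 / gamma)) ** (-alpha - 1.0)


def hidden_truncation_pdf(x, mu, sigma, gamma, alpha, theta):
    """Density of X given Y <= y0, written directly in terms of theta."""
    if x <= mu:
        return 0.0
    z = (x - mu) / sigma
    u = z ** (1.0 / gamma)
    norm = 1.0 - (1.0 + theta) ** (-alpha)  # P(Y <= y0)
    return (alpha / (gamma * sigma) * z ** (1.0 / gamma - 1.0)
            * ((1.0 + u) ** (-alpha - 1.0) - (1.0 + theta + u) ** (-alpha - 1.0)) / norm)


def hidden_truncation_pdf_combo(x, mu, sigma, gamma, alpha, theta):
    """The same density as a linear (not convex) combination of two P(IV) densities."""
    c = (1.0 + theta) ** (-alpha)
    sigma1 = sigma * (1.0 + theta) ** gamma
    return (pareto4_pdf(x, mu, sigma, gamma, alpha) - c * pareto4_pdf(x, mu, sigma1, gamma, alpha)) / (1.0 - c)


args = (1.0, 2.0, 0.5, 1.5, 0.8)  # mu, sigma, gamma, alpha, theta
for x in (1.2, 3.0, 8.0):
    print(x, hidden_truncation_pdf(x, *args), hidden_truncation_pdf_combo(x, *args))
```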

## 4 Inference, briefly

Suppose that $X_1, X_2, \ldots, X_n$ are independent identically distributed random variables with a common Pareto (IV) distribution. The sample size should be reasonably large, since there are four parameters to estimate. The sample minimum, or some slightly corrected version of it, will be a suitable estimate of the location parameter *μ*. After subtracting it from each of the observations, the remaining three parameters may be estimated using maximum likelihood. The corresponding Fisher information matrix is available (as indeed is the Fisher information matrix for the Feller-Pareto model with *μ* = 0).

Either a global search or numerical solution of the likelihood equations will be required to identify the location of the maximum of the likelihood function. In the Pareto (I) case, a variety of alternative estimates are available including best unbiased estimates. Alternatively, in the Pareto (I) case one can take logarithms of the data and arrive at a shifted exponential model, for which many estimation strategies have been developed.
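For the Pareto (I) case the maximum likelihood estimates are explicit: the likelihood increases in $\sigma$ up to the smallest observation, so $\hat\sigma = \min_i X_i$, and then $\hat\alpha = n/\sum_i\log(X_i/\hat\sigma)$, which is the exponential-rate estimate applied to the log data. A minimal Python sketch; `pareto1_mle` is a hypothetical name, and it computes only the plain MLE, not the unbiased alternatives.

```python
import math


def pareto1_mle(data):
    """Maximum likelihood estimates (sigma_hat, alpha_hat) for i.i.d. P(I)(sigma, alpha) data.

    The likelihood increases in sigma up to min(data), so sigma_hat = min(data); given
    sigma_hat, log(X / sigma_hat) is treated as exponential with rate alpha.
    """
    n = len(data)
    sigma_hat = min(data)
    total_log = sum(math.log(x / sigma_hat) for x in data)
    if total_log == 0.0:
        raise ValueError("all observations are equal; alpha cannot be estimated")
    return sigma_hat, n / total_log


print(pareto1_mle([1.0, 2.0, 4.0]))  # sigma_hat = 1.0, alpha_hat = 1 / log(2)
```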

A diffuse prior Bayesian analysis can be used for Pareto (IV) data. It will, predictably, yield results similar to those obtained via maximum likelihood. In the Pareto (I) case, Lwin (1972) introduced a conjugate family of priors for (*σ*,*α*) which can be used to incorporate some degree of prior knowledge of the parameters. Arnold et al. (1998) suggest use of a more flexible family of what they call conditionally conjugate priors in this setting. These priors are tailor-made for subsequent use of Gibbs sampling algorithms to generate realizations from the corresponding posterior distribution.

More details on parametric inference for Pareto models may be found in Arnold (1983) and Arnold (2008).

## 5 Multivariate Pareto models

*k*-dimensional Pareto distributions was Mardia (1962). Mardia’s type I multivariate Pareto distribution has the attractive feature that both marginals and conditional distributions are Paretian in nature. We will say that a *k*-dimensional random vector $\underset{\_}{X}$ has a type I multivariate Pareto distribution if the joint survival function is of the form

$$P(X_1 > x_1,\ldots,X_k > x_k) = \left[\sum_{i=1}^{k}\frac{x_i}{\sigma_i} - k + 1\right]^{-\alpha}, \quad x_i > \sigma_i,\ i = 1,\ldots,k, \tag{40}$$

where the $\sigma_i$'s are positive marginal scale parameters. The positive parameter $\alpha$ is an inequality parameter (common to all marginals). It follows from (40) that the one-dimensional marginals are classical Pareto distributions. Thus $X_i \sim P(\text{I})(\sigma_i,\alpha)$, $i = 1,2,\ldots,k$. By setting selected $x_i$'s equal to $\sigma_i$ in (40), it is apparent that, for any $k_1 < k$, all $k_1$-dimensional marginals are again multivariate Pareto. If we use the notational device $\underline{X} = (\underline{\dot X},\underline{\ddot X})$, where $\underline{\dot X}$ is $k_1$-dimensional, with an analogous partition of the vector $\underline{\sigma} = (\underline{\dot\sigma},\underline{\ddot\sigma})$, we may write

$$P(\underline{\dot X} > \underline{\dot x}) = \left[\sum_{i=1}^{k_1}\frac{x_i}{\sigma_i} - k_1 + 1\right]^{-\alpha}, \quad x_i > \sigma_i,\ i = 1,\ldots,k_1.$$

Conditional distributions are also of the form (40), but with a change of location.

It is natural to extend this basic multivariate Pareto model by the introduction of location, scale, inequality and shape parameters in a manner parallel to that used to develop the univariate Pareto (II)-(IV) distributions, as follows:

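($MP^{(k)}(\text{II})$) We will say that $\underset{\_}{X}$ has a *k*-dimensional Pareto distribution of type II if its joint survival function is of the form

$$P(X_1 > x_1,\ldots,X_k > x_k) = \left[1 + \sum_{i=1}^{k}\frac{x_i-\mu_i}{\sigma_i}\right]^{-\alpha}, \quad x_i > \mu_i,\ i = 1,\ldots,k,$$

and we write $\underset{\_}{X} \sim MP^{(k)}(\text{II})(\underset{\_}{\mu},\underset{\_}{\sigma},\alpha)$.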

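($MP^{(k)}(\text{III})$) $\underset{\_}{X}$ has a *k*-dimensional Pareto distribution of type III if its joint survival function is of the form

$$P(X_1 > x_1,\ldots,X_k > x_k) = \left[1 + \sum_{i=1}^{k}\left(\frac{x_i-\mu_i}{\sigma_i}\right)^{1/\gamma_i}\right]^{-1}, \quad x_i > \mu_i,\ i = 1,\ldots,k,$$

and we write $\underset{\_}{X} \sim MP^{(k)}(\text{III})(\underset{\_}{\mu},\underset{\_}{\sigma},\underset{\_}{\gamma})$.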

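($MP^{(k)}(\text{IV})$) $\underset{\_}{X}$ has a *k*-dimensional Pareto distribution of type IV if its joint survival function is of the form

$$P(X_1 > x_1,\ldots,X_k > x_k) = \left[1 + \sum_{i=1}^{k}\left(\frac{x_i-\mu_i}{\sigma_i}\right)^{1/\gamma_i}\right]^{-\alpha}, \quad x_i > \mu_i,\ i = 1,\ldots,k,$$

and we write $\underset{\_}{X} \sim MP^{(k)}(\text{IV})(\underset{\_}{\mu},\underset{\_}{\sigma},\underset{\_}{\gamma},\alpha)$.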

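The marginal and conditional distributions of an $MP^{(k)}(\text{II})$ distribution are again of the $MP^{(k)}(\text{II})$ form. An $MP^{(k)}(\text{III})$ distribution has $MP^{(k)}(\text{III})$ marginals, but not conditionals. However, an $MP^{(k)}(\text{IV})$ distribution does have both its marginals and conditionals of the $MP^{(k)}(\text{IV})$ form. Specifically, in the $MP^{(k)}(\text{IV})$ case, using the dot – double dot notation, we have

$$\underline{\dot X} \sim MP^{(k_1)}(\text{IV})(\underline{\dot\mu},\underline{\dot\sigma},\underline{\dot\gamma},\alpha)$$

and

$$\underline{\dot X} \mid \underline{\ddot X} = \underline{\ddot x} \;\sim\; MP^{(k_1)}(\text{IV})\bigl(\underline{\dot\mu},\underline{\sigma}^{*},\underline{\dot\gamma},\alpha + k - k_1\bigr), \tag{46}$$

where

$$\sigma^{*}_{i} = \sigma_i\left[1 + \sum_{j=k_1+1}^{k}\left(\frac{x_j-\mu_j}{\sigma_j}\right)^{1/\gamma_j}\right]^{\gamma_i}, \quad i = 1,\ldots,k_1.$$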

Takahasi (1965) discussed the $MP^{(k)}(\text{IV})$ distribution with $\underset{\_}{\mu} = \underset{\_}{0}$ and $\underset{\_}{\sigma} = \underset{\_}{1}$. He called it a multivariate Burr’s distribution, and noted that the marginal and conditional distributions were of the same form.

If $\underset{\_}{X}$ has a *k*-dimensional Pareto distribution of type IV, we may act as if the $X_i$'s have the representation

$$X_i = \mu_i + \sigma_i\left(\frac{W_i}{Z}\right)^{\gamma_i}, \quad i = 1,2,\ldots,k,$$

where the $W_i$'s are independent identically distributed $\Gamma(1,1)$ variables (i.e., standard exponential variables) and *Z*, independent of the $W_i$'s, has a $\Gamma(\alpha,1)$ distribution. This representation, for example, makes it easy to compute the means, variances and covariances of $\underset{\_}{X}$.

A natural extension is obtained if the $W_i$'s are gamma rather than exponential variables. The resulting distribution will be called *k*-dimensional Feller-Pareto, since its marginals are of the Feller-Pareto form. Thus $\underset{\_}{X} \sim FP^{(k)}(\underset{\_}{\mu},\underset{\_}{\sigma},\underset{\_}{\gamma},\alpha,\underset{\_}{\beta})$ if

$$X_i = \mu_i + \sigma_i\left(\frac{W_i}{Z}\right)^{\gamma_i}, \quad i = 1,2,\ldots,k, \tag{49}$$

where the $W_i$'s and *Z* are independent random variables with $W_i \sim \Gamma(\beta_i,1)$, $(i = 1,2,\ldots,k)$, and $Z \sim \Gamma(\alpha,1)$. The marginal and conditional distributions of this multivariate Feller-Pareto distribution are again multivariate Feller-Pareto. The covariance structure can be readily obtained from the representation (49). Parallel to the situation in one dimension, there exist alternative names that could be applied to multivariate Feller-Pareto variables. They could be called multivariate generalized F variables or multivariate generalized beta of the second kind variables. An evident drawback of the multivariate Feller-Pareto model (and its various submodels) is the presence of a common value of *α* which appears in each marginal density. The consequences of this homogeneity are not easy to pin down. Certainly a model with Feller-Pareto marginals with different *α*’s for each of the marginals would be desirable, if one can be developed with attractive distributional properties (e.g., “nice” conditional distributions).
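The common-*Z* representation gives a direct sampler for $FP^{(k)}$ and, with every $\beta_i = 1$, for $MP^{(k)}(\text{IV})$. Because $E[e^{-sZ}] = (1+s)^{-\alpha}$ for $Z \sim \Gamma(\alpha,1)$, the joint survival function in the exponential case is $[1+\sum_i((x_i-\mu_i)/\sigma_i)^{1/\gamma_i}]^{-\alpha}$, which the usage lines compare with a Monte Carlo estimate. A sketch assuming Python's `random.gammavariate(shape, scale)`; the function names are hypothetical.

```python
import random


def sample_multivariate_fp(mu, sigma, gamma, alpha, beta, rng):
    """One draw of X_i = mu_i + sigma_i * (W_i / Z)**gamma_i, i = 1..k.

    Z ~ Gamma(alpha, 1) is shared by all coordinates; W_i ~ Gamma(beta_i, 1) are independent.
    Taking every beta_i = 1 gives the MP^(k)(IV) distribution.
    """
    z = rng.gammavariate(alpha, 1.0)
    return [m + s * (rng.gammavariate(b, 1.0) / z) ** g
            for m, s, g, b in zip(mu, sigma, gamma, beta)]


def mp4_joint_sf(x, mu, sigma, gamma, alpha):
    """Joint survival function of MP^(k)(IV); requires x_i > mu_i for every i."""
    total = sum(((xi - m) / s) ** (1.0 / g) for xi, m, s, g in zip(x, mu, sigma, gamma))
    return (1.0 + total) ** (-alpha)


rng = random.Random(7)
mu, sigma, gamma, alpha = (0.0, 1.0), (1.0, 2.0), (0.5, 1.0), 2.5
draws = [sample_multivariate_fp(mu, sigma, gamma, alpha, (1.0, 1.0), rng) for _ in range(100_000)]
x = (0.5, 2.0)
empirical = sum(d[0] > x[0] and d[1] > x[1] for d in draws) / len(draws)
print(round(empirical, 3), round(mp4_joint_sf(x, mu, sigma, gamma, alpha), 3))
```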

### 5.1 Other multivariate Pareto distributions

Although the title of this section promises discussion of multivariate models, only the bivariate case will be treated. It will be left to the reader to visualize the, usually straightforward, extension to the multivariate case. Notational complexity is avoided to a great extent by focusing on the case *k* = 2.

Recall that a $P(\text{IV})(\mu,\sigma,\gamma,\alpha)$ distribution can be represented as a scale mixture of Weibull distributions. Equivalently, as remarked earlier, a $P(\text{IV})$ random variable admits a representation as

$$X = \mu + \sigma\left(\frac{U}{Z}\right)^{\gamma},$$

where $U \sim \exp(1)$ and $Z \sim \Gamma(\alpha,1)$ are independent variables. A natural bivariate version of this construction begins with $(U_1,U_2)$ having a bivariate exponential distribution with standard exponential marginals, perhaps one of the Marshall-Olkin type with parameters 1, 1 and *λ*. Then, with $Z \sim \Gamma(\alpha,1)$ independent of $(U_1,U_2)$, we define $(X_1,X_2)$ by

$$X_i = \mu_i + \sigma_i\left(\frac{U_i}{Z}\right)^{\gamma_i}, \quad i = 1,2.$$

Observe that in any bivariate Pareto (IV) distribution generated by this method, the marginals share a common value of *α*.

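Another route to bivariate $P(\text{IV})$ distributions makes use of the following representation of a $P(\text{IV})$ variable. Suppose that $U \sim \exp(1)$; then

$$\mu + \sigma\left(e^{U/\alpha} - 1\right)^{\gamma} \sim P(\text{IV})(\mu,\sigma,\gamma,\alpha).$$

We may then begin with $(U_1,U_2)$ having an arbitrary bivariate distribution with standard exponential marginals and construct a variable $(X_1,X_2)$ with a bivariate Pareto (IV) distribution by defining

$$X_i = \mu_i + \sigma_i\left(e^{U_i/\alpha_i} - 1\right)^{\gamma_i}, \quad i = 1,2.$$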

A further construction uses the fact that if $X_i$, $i = 1,2$, are independent with $X_i \sim P(\text{IV})(\mu,\sigma,\gamma,\alpha_i)$, then $\min(X_1,X_2) \sim P(\text{IV})(\mu,\sigma,\gamma,\alpha_1+\alpha_2)$. We then begin with three independent random variables $Y_1, Y_2, Y_3$ with $Y_i \sim P(\text{IV})(\mu,\sigma,\gamma,\alpha_i)$ and define

$$X_1 = \min(Y_1,Y_3), \qquad X_2 = \min(Y_2,Y_3),$$

so that $X_i \sim P(\text{IV})(\mu,\sigma,\gamma,\alpha_i+\alpha_3)$, $i = 1,2$, and $\min(X_1,X_2) \sim P(\text{IV})(\mu,\sigma,\gamma,\alpha_1+\alpha_2+\alpha_3)$. This distribution has the perhaps undesirable property that $P(X_1 = X_2) > 0$, and has another unfortunate property in that the marginals share common values of $\mu$, $\sigma$ and $\gamma$. This latter problem can be avoided to some extent by assuming that the $Y_i$'s have $P(\text{IV})(0,1,\gamma,\alpha_i)$ distributions and then defining

$$X_i = \mu_i + \sigma_i\min(Y_i,Y_3), \quad i = 1,2.$$

In this case the $X_i$'s share only a common value of $\gamma$.
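A simulation of the minimum-based construction makes the positive mass on the diagonal visible. Each $Y_i$ is drawn through the exponential representation $\mu+\sigma(e^{V/\alpha_i}-1)^{\gamma}$; since the three $Y_i$ share $\mu$, $\sigma$ and $\gamma$, they are increasing transforms of independent exponentials with rates $\alpha_i$, so $P(X_1 = X_2) = P(Y_3 \text{ is smallest}) = \alpha_3/(\alpha_1+\alpha_2+\alpha_3)$. That value is derived here from the construction rather than quoted from the text, and the function names are hypothetical.

```python
import math
import random


def pareto4_from_exponential(v, mu, sigma, gamma, alpha):
    """If v ~ Exp(1), then mu + sigma * (exp(v/alpha) - 1)**gamma ~ P(IV)(mu, sigma, gamma, alpha)."""
    return mu + sigma * math.expm1(v / alpha) ** gamma


def sample_min_construction(mu, sigma, gamma, alphas, rng):
    """Return (min(Y1, Y3), min(Y2, Y3)) for independent Y_i ~ P(IV)(mu, sigma, gamma, alphas[i])."""
    y1, y2, y3 = (pareto4_from_exponential(rng.expovariate(1.0), mu, sigma, gamma, a) for a in alphas)
    return min(y1, y3), min(y2, y3)


rng = random.Random(99)
alphas = (1.0, 2.0, 1.5)
pairs = [sample_min_construction(0.0, 1.0, 0.8, alphas, rng) for _ in range(100_000)]
tie_rate = sum(x1 == x2 for x1, x2 in pairs) / len(pairs)
# Positive probability on the diagonal: X1 == X2 exactly when Y3 is the smallest of the three.
print(round(tie_rate, 3), alphas[2] / sum(alphas))
```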

Alternatively, one may begin with a random vector $(Z_1,Z_2)$ and apply marginal transformations to produce a bivariate distribution with Pareto (IV) marginals. A popular choice for the distribution of $(Z_1,Z_2)$ is a bivariate normal with standard normal marginals and correlation $\rho$, but of course any other bivariate distribution can be used in its place. Now using $F_{\mu,\sigma,\gamma,\alpha}$ to denote the distribution function of a $P(\text{IV})(\mu,\sigma,\gamma,\alpha)$ random variable and $\Phi$ to denote the standard normal distribution function, we define

$$X_i = F^{-1}_{\mu_i,\sigma_i,\gamma_i,\alpha_i}\bigl(\Phi(Z_i)\bigr), \quad i = 1,2.$$

The dependence structure of the $X_i$'s is inherited from that of the $Z_i$'s: the monotone marginal transformations preserve the copula, and hence rank correlations, although Pearson correlations are generally altered. In this case the extension to *k* dimensions is particularly transparent. Note also that the model has one dependence parameter which, if set equal to 0, yields a model with independent $P(\text{IV})$ marginals. It will be noted that this feature of having a single dependence parameter is shared by the other bivariate models introduced in this Section.
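A minimal Gaussian-copula sampler with P(IV) marginals, assuming the P(IV) quantile function $\mu+\sigma[(1-u)^{-1/\alpha}-1]^{\gamma}$ and Python's `statistics.NormalDist` for $\Phi$; the helper names are hypothetical. Because each $X_i$ is an increasing function of $Z_i$, the probability that both coordinates exceed their medians equals the normal orthant probability $1/4 + \arcsin(\rho)/(2\pi)$, which the usage lines use as a check.

```python
import math
import random
from statistics import NormalDist


def pareto4_ppf(u, mu, sigma, gamma, alpha):
    """Quantile function of P(IV)(mu, sigma, gamma, alpha) for 0 <= u < 1."""
    return mu + sigma * ((1.0 - u) ** (-1.0 / alpha) - 1.0) ** gamma


PHI = NormalDist().cdf  # standard normal distribution function
U_MAX = 1.0 - 2.0 ** -53  # largest double below 1.0


def sample_gaussian_copula_pareto4(rho, params1, params2, rng):
    """Draw (Z1, Z2) bivariate normal with correlation rho, then set X_i = F_i^{-1}(Phi(Z_i))."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
    # Guard against Phi(z) rounding to exactly 1.0 far in the upper tail.
    u1, u2 = min(PHI(z1), U_MAX), min(PHI(z2), U_MAX)
    return pareto4_ppf(u1, *params1), pareto4_ppf(u2, *params2)


rng = random.Random(31)
rho = 0.6
p1, p2 = (0.0, 1.0, 0.5, 2.0), (1.0, 3.0, 1.2, 1.5)  # (mu, sigma, gamma, alpha) for each margin
m1, m2 = pareto4_ppf(0.5, *p1), pareto4_ppf(0.5, *p2)
pairs = [sample_gaussian_copula_pareto4(rho, p1, p2, rng) for _ in range(100_000)]
both_above = sum(x1 > m1 and x2 > m2 for x1, x2 in pairs) / len(pairs)
# Orthant probability of the underlying normal pair, since each X_i is increasing in Z_i.
print(round(both_above, 3), round(0.25 + math.asin(rho) / (2 * math.pi), 3))
```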

## 6 Multivariate extensions

Many of the extended univariate models of Section 3 have natural *k*-dimensional versions. For example, a random variable with the univariate Beta-generalized-Pareto (IV) distribution can be viewed as being defined by

$$X = F^{-1}_{\sigma,\gamma,\alpha}(V),$$

where $V \sim \text{Beta}(\lambda_1,\lambda_2)$. For a bivariate version of this construction, we begin with $(V_1,V_2)$ having a bivariate Beta distribution, perhaps of the type introduced by Arnold and Ng (2011), and make suitable marginal transformations. Thus we define

$$X_i = F^{-1}_{\sigma_i,\gamma_i,\alpha_i}(V_i), \quad i = 1,2.$$

Higher dimensional versions of this construction require only the identification of a suitable *k*-dimensional Beta distribution. A Dirichlet distribution might be used here. Some other alternatives are described in Arnold and Ng (2011).

To identify a suitable bivariate analog of the Kumaraswamy-Pareto (IV) distribution, all that is required is a bivariate-Kumaraswamy distribution. One possible such distribution was suggested by Nadarajah et al. (2011).

which is not difficult to evaluate, since the conditional distribution of $\underline{\ddot X}$ given that $\underline{\dot X} = \underline{\dot x}$ is of the $MP^{(k-k_1)}(\text{IV})$ form (refer to equation (46), being careful to switch the roles of $\underline{\dot X}$ and $\underline{\ddot X}$).

We conclude this section by noting the availability of multivariate distributions with Pareto conditionals rather than Pareto marginals. Detailed discussion of such models may be found in Arnold et al. (1999), Chapter 5.

## 7 Envoi

The survey presented in this paper is far from complete. A more detailed and extensive survey (though somewhat out of date) can be found in Arnold (1983). A revision of that book is, however, currently in preparation. In the interim, see Arnold (2008) for a more up-to-date presentation and, as mentioned in Section 4, for more details on inferential strategies. More work is still needed on the development of estimation and hypothesis testing strategies, especially for multivariate Pareto data. Creative Bayesian analyses involving informative priors, in multivariate settings and in cases involving covariates, are also notable for their absence. Finally, I apologize to those readers whose important contributions have been overlooked in this survey. I excuse myself by repeating that the survey is necessarily incomplete. However, please do advise me of any glaring omissions that you might note.

## Declarations

### Acknowledgement

The constructive suggestions supplied by anonymous referees have resulted in a much improved manuscript.


## References

- Aban IB, Meerschaert MM, Panorska AK: Parameter estimation for the truncated Pareto distribution. *J. Amer. Statist. Assoc.* 101: 270–277. 2006
- Akinsete A, Famoye F, Lee C: The beta-Pareto distribution. *Statistics* 42: 547–563. 2008
- Arnold BC: *Pareto Distributions*. International Cooperative Publishing House, Burtonsville, MD; 1983
- Arnold BC: Pareto and generalized Pareto distributions. In *Modeling Income Distributions and Lorenz Curves*. Edited by: Chotikapanich D. Springer, New York; 2008
- Arnold BC, Austin K: *Truncated Pareto distributions: flexible, tractable and familiar*. Technical Report #150, Department of Statistics, University of California, Riverside, CA; 1987
- Arnold BC, Castillo E, Sarabia JM: Bayesian analysis for classical distributions using conditionally specified priors. *Sankhya: Ind. J. Stat. Series B* 60: 228–245. 1998
- Arnold BC, Castillo E, Sarabia JM: *Conditional Specification of Statistical Models*. Springer, New York; 1999
- Arnold BC, Ghosh I: Inference for Pareto data subject to hidden truncation. *J. Ind. Soc. Probability Stat.* 13: 1–16. 2011
- Arnold BC, Laguna L: A stochastic mechanism leading to asymptotically Paretian distributions. *Business and Economic Statistics Section, Proceedings of the American Statistical Association*. 1976
- Arnold BC, Ng HKT: Flexible bivariate beta distributions. *J. Multivariate Anal.* 102: 1194–1202. 2011
- Arnold BC, Robertson CA, Yeh HC: Some properties of a Pareto-type distribution. *Sankhya: Ind. J. Stat. Series A* 48: 404–408. 1986
- Champernowne DG: The theory of income distribution. *Econometrica* 5: 379–381. 1937
- David HA, Nagaraja HN: *Order Statistics*. Third edition. Wiley, Hoboken, NJ; 2003
- Falk M, Hüsler J, Reiss R: *Laws of Small Numbers, Extremes and Rare Events*. Third edition. Birkhäuser/Springer, Basel; 2011
- Feller W: *An Introduction to Probability Theory and its Applications, Vol. 2*. Second edition. Wiley, New York; 1971
- Fisk PR: The graduation of income distributions. *Econometrica* 29: 171–185. 1961a
- Fisk PR: Estimation of location and scale parameters in a truncated grouped sech-square distribution. *J. Am. Stat. Assoc.* 56: 692–702. 1961b
- Johnson NL, Kotz S, Balakrishnan N: *Continuous Univariate Distributions, Vol. 1*. Second edition. Wiley, New York; 1994
- Jones MC: Families of distributions arising from distributions of order statistics. *TEST* 13: 1–43. 2004
- Jones MC: Kumaraswamy’s distribution: A beta-type distribution with some tractability advantages. *Stat. Methodol.* 6: 70–81. 2009
- Kagan YY, Schoenberg F: Estimation of the upper cutoff parameter for the tapered Pareto distribution. *J. Appl. Probability* 38A: 158–175. 2001
- Kalbfleisch JD, Prentice RL: *The Statistical Analysis of Failure Time Data*. Wiley, New York; 1980
- Keiding N, Kvist K, Hartvig H, Tvede M, Juul S: Estimating time to pregnancy from current durations in a cross-sectional sample. *Biostatistics* 3: 565–578. 2002
- Lwin T: Estimation of the tail of the Paretian law. *Skand. Aktuarietidskr.* 55: 170–178. 1972
- Maguire BA, Pearson ES, Wynn AHA: The time intervals between industrial accidents. *Biometrika* 39: 168–180.
- Mardia KV: Multivariate Pareto distributions. *Ann. Math. Stat.* 33: 1008–1015. 1962
- Nadarajah S, Cordeiro GM, Ortega EMM: General results for the Kumaraswamy-G distribution. *J. Stat. Comput. Simul.* 82: 951–979. 2011
- Pareto V: *Cours d’economie Politique, Vol. II*. F. Rouge, Lausanne; 1897
- Paranaiba PF, Ortega EMM, Cordeiro GM, de Pascoa MAR: The Kumaraswamy Burr XII distribution: theory and practice. *J. Stat. Comput. Simul.* 83: 2117–2143. 2013
- Pillai RN: Semi-Pareto processes. *J. Appl. Probab.* 28: 461–465. 1991
- Takahasi K: Note on the multivariate Burr’s distribution. *Ann. Inst. Statist. Math.* 17: 257–260. 1965

## Copyright

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.