Open Access

Univariate and multivariate Pareto models

Journal of Statistical Distributions and Applications 2014, 1:11

DOI: 10.1186/2195-5832-1-11

Received: 13 February 2014

Accepted: 9 May 2014

Published: 17 June 2014


The Pareto distribution has long been recognized as a suitable model for many non-negative socio-economic variables. Univariate and multivariate variations abound. Some unification is possible by representing the Pareto variables in terms of independent gamma distributed components. Further unification is sometimes possible since some of the frequently used multivariate Pareto models share the same copula. In some cases, inference strategies can be developed to take advantage of the stochastic representations in terms of gamma components.


Keywords: Inequality; Heavy tails; Generalized Pareto; Feller-Pareto; Kumaraswamy distribution; Hidden truncation; Conditional specification

1 Introduction

Discussion of Pareto and Pareto-like distributions can be traced back to Vilfredo Pareto’s economics textbook, published in Rome in 1897. His observation that the number of persons in a population whose incomes exceed x is often well approximated by $Cx^{-\alpha}$, for some positive C and some positive α, led inexorably to consideration of the following well-known model for fitting univariate income data:
$$P(X > x) = \bar{F}_X(x) = (x/\sigma)^{-\alpha}, \quad x > \sigma. \tag{1}$$

Here σ, the scale parameter, is positive and α (Pareto’s index of inequality) is also positive.

This model is typically referred to as the classical Pareto model. Improved fitting of data is encountered when more general Pareto-like distributions are considered. In this survey, the classical model will be embedded in a hierarchy of more complicated Pareto models. In this hierarchy of generalized Pareto distributions, the classical model will be called the Pareto (I) distribution. Multivariate income distributions are also of interest and, in that arena, a hierarchy of multivariate Pareto distributions is available, paralleling and closely related to the univariate hierarchy.

Even more flexible models have been proposed using these univariate and multivariate Pareto models as building blocks. Several of these will be described in this paper.

The end result is an impressively flexible array of income models from which the researcher can select a parsimonious model for the particular data set at hand. The emphasis in this survey will be on distributional properties of the models but some attention will be paid to estimation and inference strategies.

2 A hierarchy of generalized Pareto models

As the basic distribution in our hierarchy of generalized Pareto models we use the classical Pareto distribution, called here the Pareto (I) distribution. Its survival function is of the form (1). In practice, α is frequently assumed to be larger than 1, so that the distribution has a finite mean. If a random variable X has (1) as its survival function, then we write X ∼ P(I)(σ,α). In this basic model, the parameter σ has a dual role. It is indeed a scale parameter, but it also determines the lower bound of the support of the distribution and so, to some extent, plays the role of a location parameter. A slightly more general model, which separates the roles of the location and scale parameters, has therefore frequently been used.

This distribution will be called the Pareto (II) distribution. Its survival function is of the form
$$\bar{F}(x) = \left[1 + \frac{x-\mu}{\sigma}\right]^{-\alpha}, \quad x > \mu, \tag{2}$$

where μ, the location parameter, is real valued, σ is positive and α is positive. In most applications μ will be non-negative, but negative values for μ pose no problems. If X has (2) as its survival function, we will write X ∼ P(II)(μ,σ,α).

There is an intimate relation between the Pareto (II) distribution and the Pickands generalized Pareto model, which is much used in the study of extreme values and peaks over thresholds. A good general reference for the role of the Pickands generalized Pareto distribution is the book by Falk et al. (2011). The density of the Pickands generalized Pareto model is
$$f(x;\sigma,k) = \frac{1}{\sigma}\left(1 - \frac{kx}{\sigma}\right)^{(1-k)/k} I\left(x > 0,\ kx/\sigma < 1\right), \tag{3}$$

where σ > 0 and −∞ < k < ∞. The density corresponding to k = 0 is obtained by taking the limit as k → 0 in (3). This model, (3), includes three sub-models. When k < 0, it yields a Pareto (II) density (with μ = 0); when k = 0, it yields an exponential density; and for k > 0, it corresponds to a scaled Beta distribution (of the first or standard kind). In fact, several results for the Pareto (II) distribution can be proved to remain valid in the more general context of the Pickands generalized Pareto model. In an income modeling setting, as Pareto observed, heavy-tailed distributions are typically encountered and, for this reason, we will concentrate on the Pareto (II) sub-model. We remark, in passing, that despite Pareto’s insistence on the ubiquity of heavy tails, several authors have utilized scaled Beta distributions as income models, and some have even argued in favor of the exponential distribution as a model. The most general model in our hierarchy, the Feller-Pareto model, will be seen to actually include some light-tailed distributions corresponding to scaled Beta models with an additional location parameter. Applications of such light-tailed models are more likely to be encountered outside of the income distribution context.
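As a quick check of the k < 0 case, the substitution below shows the reduction to a Pareto (II) density with μ = 0. The symbols c = −k and σ* = σ/c are introduced here only for illustration and do not appear in the original text.

```latex
\begin{align*}
f(x;\sigma,-c)
  &= \frac{1}{\sigma}\left(1+\frac{cx}{\sigma}\right)^{-(1+c)/c},
     \qquad c = -k > 0,\ x > 0, \\
  &= \frac{\alpha}{\sigma^{*}}\left(1+\frac{x}{\sigma^{*}}\right)^{-(\alpha+1)},
     \qquad \alpha = \frac{1}{c},\quad \sigma^{*} = \frac{\sigma}{c}.
\end{align*}
```

The second line is the density obtained by differentiating the P(II)(0, σ*, α) survival function, so under this correspondence a value of k closer to 0 gives a larger α and hence a lighter tail.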

Truncated versions of the Pareto (II) distribution, which clearly are not heavy tailed, are sometimes appropriate models for data sets which, for some reason, exclude large values. Such distributions are of the form
$$F(x) = \begin{cases} 0, & x \le \mu, \\[4pt] \dfrac{1 - \left[1 + \frac{x-\mu}{\sigma}\right]^{-\alpha}}{1 - \left[1 + \frac{\tau-\mu}{\sigma}\right]^{-\alpha}}, & \mu < x < \tau, \\[4pt] 1, & x \ge \tau, \end{cases}$$

where −∞ < μ < τ < ∞, σ > 0, and α > 0. Some discussion of such models may be found in Aban et al. (2006) and Arnold and Austin (1987).

The Pareto (III) distribution is a variant model with tail behavior comparable to that of the Pareto (II) distribution. Its survival function is of the form
$$\bar{F}(x) = \left[1 + \left(\frac{x-\mu}{\sigma}\right)^{1/\gamma}\right]^{-1}, \quad x > \mu, \tag{5}$$

where μ is real, σ is positive and γ is positive. We will call γ the inequality parameter. If μ = 0 and γ ≤ 1, then γ turns out to be precisely the Gini index of inequality for this distribution. If X has (5) as its survival function, we will write X ∼ P(III)(μ,σ,γ).

If we introduce both a shape and an inequality parameter, we arrive at the Pareto (IV) family:
$$\bar{F}(x) = \left[1 + \left(\frac{x-\mu}{\sigma}\right)^{1/\gamma}\right]^{-\alpha}, \quad x > \mu, \tag{6}$$

where μ (location) is real, σ (scale) is positive, γ (inequality) is positive and α (shape) is positive. Although we continue to call γ the inequality parameter, it is only identifiable with the Gini index when α = 1 and μ = 0. One might argue instead that in the P(IV) model both γ and α would be best described as shape parameters, since neither of them has a direct inequality interpretation. An anonymous referee points out that the two parameters γ and α govern the behavior of the P(IV) density as x approaches μ from above and as x approaches infinity. Thus, $f(x;\mu,\sigma,\gamma,\alpha) \sim x^{-\alpha/\gamma-1}$ as $x \to \infty$ and $f(x;\mu,\sigma,\gamma,\alpha) \sim (x-\mu)^{1/\gamma-1}$ as $x \to \mu$. He suggests that an argument might be advanced in favor of a reparameterization in which we define β = α/γ, to highlight the roles of α and β in determining the limiting behavior of the density. However, in this paper, to be consistent with the notation in Arnold (1983), we will continue with the μ,σ,γ,α parameterization and continue to call γ the inequality parameter. If a random variable X has (6) as its survival function, we will write X ∼ P(IV)(μ,σ,γ,α). Note that the Pareto (IV) distribution, with μ = 0, is also known as a Burr Type XII distribution.

The three more specialized families, P(I)-P(III), may be identified as special cases of the Pareto (IV) family as follows:
$$P(I)(\sigma,\alpha) = P(IV)(\sigma,\sigma,1,\alpha), \quad P(II)(\mu,\sigma,\alpha) = P(IV)(\mu,\sigma,1,\alpha), \quad P(III)(\mu,\sigma,\gamma) = P(IV)(\mu,\sigma,\gamma,1).$$
Feller (1971), p. 49, suggested a different definition of a Pareto distribution. It can be recognized as the distribution of a ratio of two independent gamma variables (a distribution also known as the Beta distribution of the second kind). By considering a linear function of a power of such a random variable, we arrive at a very general family, called the Feller–Pareto family. Thus if $X_i \sim \Gamma(\delta_i,1)$, i = 1,2, are independent random variables, and if for μ real, σ > 0 and γ > 0 we define
$$W = \mu + \sigma\,(X_2/X_1)^{\gamma}, \tag{8}$$

then W has a Feller–Pareto distribution, and we write $W \sim FP(\mu,\sigma,\gamma,\delta_1,\delta_2)$.

It may be verified that the Pareto (IV) distributions are identifiable with the Feller–Pareto distributions with $\delta_2 = 1$, i.e.,
$$P(IV)(\mu,\sigma,\gamma,\alpha) = FP(\mu,\sigma,\gamma,\alpha,1).$$
The density of the general Feller–Pareto distribution defined by (8) is of the form
$$f_W(w) = \frac{\left(\frac{w-\mu}{\sigma}\right)^{(\delta_2/\gamma)-1}\left[1 + \left(\frac{w-\mu}{\sigma}\right)^{1/\gamma}\right]^{-\delta_1-\delta_2}}{\gamma\,\sigma\,B(\delta_1,\delta_2)}, \quad w > \mu.$$

The corresponding survival function is obtainable from tables of the incomplete beta function. For many computations it is simpler to work directly with the representation (8). The Pareto (IV) distributions correspond to the case in which $X_1$ has a gamma distribution while $X_2$ has an exponential distribution (that is, $\delta_2 = 1$). The Pareto (III) distributions are encountered when both $X_1$ and $X_2$ are exponential variables.
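The representation (8) translates directly into a sampler. The following is a minimal sketch using only Python's standard library; the function names, the seed and the parameter values are illustrative assumptions, not part of the paper. It draws from FP(μ,σ,γ,α,1) and compares the empirical survival proportion with the Pareto (IV) survival function (6).

```python
import random


def feller_pareto_sample(mu, sigma, gamma, delta1, delta2, rng):
    """Draw W = mu + sigma * (X2 / X1)**gamma with independent X_i ~ Gamma(delta_i, 1)."""
    x1 = rng.gammavariate(delta1, 1.0)  # shape delta1, scale 1
    x2 = rng.gammavariate(delta2, 1.0)  # shape delta2, scale 1
    return mu + sigma * (x2 / x1) ** gamma


def pareto4_survival(x, mu, sigma, gamma, alpha):
    """Survival function (6) of the Pareto (IV) distribution."""
    if x <= mu:
        return 1.0
    return (1.0 + ((x - mu) / sigma) ** (1.0 / gamma)) ** (-alpha)


def empirical_survival(sample, x):
    """Proportion of sample values strictly greater than x."""
    return sum(1 for w in sample if w > x) / len(sample)


# Illustrative parameter values (assumed for this example only).
rng = random.Random(12345)
mu, sigma, gamma, alpha = 1.0, 2.0, 0.5, 3.0

# delta1 = alpha and delta2 = 1 should reproduce P(IV)(mu, sigma, gamma, alpha).
sample = [feller_pareto_sample(mu, sigma, gamma, alpha, 1.0, rng)
          for _ in range(50_000)]

print(empirical_survival(sample, 2.0), pareto4_survival(2.0, mu, sigma, gamma, alpha))
```

The two printed numbers should agree up to Monte Carlo error; other choices of δ1 and δ2 give general Feller–Pareto draws for which no closed-form survival function is used here.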

Kalbfleisch and Prentice (1980) call the Feller-Pareto density (with μ = 0) a generalized F density. Instead we might describe a Feller-Pareto variable as being a location and scale transform of a generalized beta variable of the second kind. Recall that a beta variable of the second kind is just a ratio of independent gamma variables.

An even more general model might be built using independent variables $X_i \sim \Gamma(\delta_i,1)$, i = 1,2. One could define
$$W = \mu + \sigma\,\frac{X_2^{\gamma_2}}{X_1^{\gamma_1}}.$$

The additional flexibility provided by the introduction of such a sixth parameter in the model has not been investigated.

The full array of generalized univariate Pareto distributions to be considered in this paper is subsumed in the Feller–Pareto family, and a unified derivation of many distributional results is possible. However, in the case of Pareto (I)-(IV) distributions, some alternative representations are also useful.

A random variable X has a P(I)(σ,α) distribution if it is of the form
$$X = \sigma e^{V/\alpha}, \tag{12}$$
where V is a standard exponential random variable. An analogous representation of a Pareto (II) variable in terms of an exponential random variable is possible, i.e.,
$$X = \mu + \sigma\left(e^{V/\alpha} - 1\right).$$

Likewise Pareto (III) and (IV) variables can be represented as $X = \mu + \sigma(e^{V} - 1)^{\gamma}$ and $X = \mu + \sigma(e^{V/\alpha} - 1)^{\gamma}$ respectively. The representation (12) for a classical Pareto variable (i.e., Pareto (I)) highlights the useful observation that the logarithm of such a variable has a shifted exponential distribution. This will permit the recognition of many distributional properties of Pareto (I) variables as reflections of parallel properties of exponential variables.
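The four exponential representations can be collected in one small helper. This is a sketch with hypothetical function names; it relies only on the fact that each transform is increasing in V, so that P(X > x) = P(V > v) = e^{-v} at the matching point.

```python
import math


def pareto_from_exponential(v, kind, mu=0.0, sigma=1.0, gamma=1.0, alpha=1.0):
    """Map a standard exponential value v to a Pareto (I)-(IV) value."""
    if kind == "I":
        return sigma * math.exp(v / alpha)
    if kind == "II":
        return mu + sigma * (math.exp(v / alpha) - 1.0)
    if kind == "III":
        return mu + sigma * (math.exp(v) - 1.0) ** gamma
    if kind == "IV":
        return mu + sigma * (math.exp(v / alpha) - 1.0) ** gamma
    raise ValueError(f"unknown kind: {kind}")


def pareto1_survival(x, sigma, alpha):
    """Survival function (1) of the classical Pareto distribution, for x > sigma."""
    return (x / sigma) ** (-alpha)


def pareto4_survival(x, mu, sigma, gamma, alpha):
    """Survival function (6) of the Pareto (IV) distribution, for x > mu."""
    return (1.0 + ((x - mu) / sigma) ** (1.0 / gamma)) ** (-alpha)


v = 0.7  # an arbitrary positive value of the exponential variable
x1 = pareto_from_exponential(v, "I", sigma=2.0, alpha=3.0)
x4 = pareto_from_exponential(v, "IV", mu=1.0, sigma=2.0, gamma=0.5, alpha=3.0)

# Both survival probabilities should equal exp(-v), the exponential survival at v.
check1 = pareto1_survival(x1, 2.0, 3.0)
check4 = pareto4_survival(x4, 1.0, 2.0, 0.5, 3.0)
```

Feeding `random.expovariate(1.0)` draws through `pareto_from_exponential` then gives a simple simulation method for any of the four families.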

A second important representation of the Pareto (II) distribution, known to Maguire et al. (1952), is as a mixture of exponentials. We may describe it in terms of the conditional survival function, given an auxiliary gamma distributed random variable Z. Thus, if
$$P(X > x \mid Z = z) = e^{-z(x-\mu)/\sigma}, \quad x > \mu,$$

i.e., a (translated) exponential distribution, and if Z ∼ Γ(α,1), then it follows that unconditionally X ∼ P(II)(μ,σ,α). Alternatively, this can be viewed as being equivalent to the representation in (8) after setting $\gamma = \delta_2 = 1$.

This representation of the Pareto (II) distribution as a gamma mixture of exponential distributions is often encountered in reliability and survival contexts, see e.g., Keiding et al. (2002). It is also familiar in Bayesian analysis of exponential data, where the gamma density enters as a convenient prior. In this context the Pareto (II) distribution is sometimes called the Lomax distribution.
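The mixture claim can be checked in one step by integrating the conditional survival function against the Γ(α,1) density. In the sketch below, t = (x − μ)/σ is shorthand introduced here for brevity.

```latex
\begin{align*}
P(X > x)
  &= \int_{0}^{\infty} e^{-zt}\,\frac{z^{\alpha-1}e^{-z}}{\Gamma(\alpha)}\,dz
   = \frac{1}{\Gamma(\alpha)}\int_{0}^{\infty} z^{\alpha-1}e^{-(1+t)z}\,dz \\
  &= (1+t)^{-\alpha}
   = \left[1+\frac{x-\mu}{\sigma}\right]^{-\alpha}, \qquad x > \mu .
\end{align*}
```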

The Pareto (III) distribution was apparently first considered by Fisk (1961a, 1961b), who called it a sech² distribution. It is closely related to the logistic distribution. We say that a random variable X has a logistic (μ,σ) distribution if its distribution function assumes the form
$$F_X(x) = \left[1 + e^{-(x-\mu)/\sigma}\right]^{-1}, \quad -\infty < x < \infty,$$
and we write X ∼ L(μ,σ). It is not difficult to verify that
$$X \sim L(\mu,\sigma) \;\Rightarrow\; e^{X} \sim P(III)(0, e^{\mu}, \sigma). \tag{14}$$

It is as a consequence of the relation (14) that the Pareto (III) distribution, with μ = 0, is sometimes called the log-logistic distribution.

Remark 1. Johnson et al. (1994) refer to a Pareto distribution of the third kind that is not to be confused with the Pareto (III) distribution discussed in this paper. The survival function of this “third kind” distribution is of the form
$$\bar{F}(x) = \left[1 + \frac{x}{\sigma}\right]^{-\alpha} e^{-\beta x}, \quad x > 0.$$

This distribution, which was suggested by Pareto (1897), was proposed to accommodate cases in which the basic Pareto model (1) was inadequate for fitting certain data configurations. This model is closely related to the Pareto (II) distribution, but with an additional exponential factor. Note that it could be viewed as the distribution of the minimum of a Pareto (II) variable (with μ = 0) and an independent exponential variable. This model has been used infrequently, but recently it has reappeared, this time called a tapered Pareto distribution (Kagan and Schoenberg 2001).

2.1 Distributional properties

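The Feller–Pareto distributions are unimodal. The mode is at μ if $\gamma > \delta_2$, while if $\gamma \le \delta_2$ we find (here $W \sim FP(\mu,\sigma,\gamma,\delta_1,\delta_2)$)
$$\operatorname{mode}(W) = \mu + \sigma\left[(\delta_2-\gamma)/(\delta_1+\gamma)\right]^{\gamma}.$$
In order to compute moments of the Pareto distributions, it is convenient to work with the representation (8). With $W \sim FP(\mu,\sigma,\gamma,\delta_1,\delta_2)$, if we define $W^{*} = (W-\mu)/\sigma$, then $W^{*} \sim FP(0,1,\gamma,\delta_1,\delta_2)$, i.e., $W^{*} \overset{d}{=} (X_2/X_1)^{\gamma}$ where $X_i \sim \Gamma(\delta_i,1)$, i = 1,2, are independent random variables. It can then be readily verified that, for a real number τ, the τ’th moment of $W^{*}$, when it exists, is of the form
$$E\left[(W^{*})^{\tau}\right] = \frac{\Gamma(\delta_1-\gamma\tau)\,\Gamma(\delta_2+\gamma\tau)}{\Gamma(\delta_1)\,\Gamma(\delta_2)}, \quad -(\delta_2/\gamma) < \tau < (\delta_1/\gamma).$$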

From this expression moments for the Feller-Pareto and the P(II)-P(IV) distributions are readily obtained.
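As an illustration of how the gamma-function moment expression is used, the sketch below evaluates it with `math.gamma` and specializes it to the mean of a Pareto (II) variable, using P(II)(μ,σ,α) = FP(μ,σ,1,α,1). The function names are hypothetical.

```python
import math


def feller_pareto_std_moment(tau, gamma, delta1, delta2):
    """tau-th moment of the standardized variable (W - mu) / sigma ~ FP(0, 1, gamma, delta1, delta2)."""
    if not (-delta2 / gamma < tau < delta1 / gamma):
        raise ValueError("the moment does not exist for this value of tau")
    numerator = math.gamma(delta1 - gamma * tau) * math.gamma(delta2 + gamma * tau)
    return numerator / (math.gamma(delta1) * math.gamma(delta2))


def pareto2_mean(mu, sigma, alpha):
    """Mean of P(II)(mu, sigma, alpha), which exists only when alpha > 1."""
    return mu + sigma * feller_pareto_std_moment(1.0, 1.0, alpha, 1.0)


# For alpha = 3 the standardized mean is Gamma(2)Gamma(2) / (Gamma(3)Gamma(1)) = 1/2.
mean_example = pareto2_mean(1.0, 2.0, 3.0)
```

For the Pareto (II) case the expression collapses to the familiar μ + σ/(α − 1), which gives a convenient sanity check on the general formula.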

Moments of the Pareto (I) distribution cannot be obtained in this way since, for it, μ = σ ≠ 0. They are obtainable by direct integration:
$$(\text{Pareto (I)}) \qquad E(X^{\tau}) = \sigma^{\tau}\left(1 - \frac{\tau}{\alpha}\right)^{-1}, \quad \tau < \alpha.$$
Sums of independent Pareto variables typically do not have analytically tractable distributions. If we multiply independent Pareto variables rather than adding them, it is sometimes possible to get simple expressions for the density of the resulting product. In the case of the Pareto (I) distribution the key lies in utilization of representation (12). Thus, if $X_1, X_2, \ldots, X_n$ are independent Pareto (I) variables with $X_i \sim P(I)(\sigma_i,\alpha_i)$, then their product W has the representation
$$W = \left(\prod_{i=1}^{n}\sigma_i\right)\exp\left(\sum_{i=1}^{n} V_i/\alpha_i\right),$$

where the $V_i$’s are independent standard exponential variables. In some cases expressions are available for the distribution of $\sum_{i=1}^{n} V_i/\alpha_i$. In particular, if $\alpha_i = \alpha$ (i = 1,2,…,n), then $\sum_{i=1}^{n} V_i/\alpha \sim \Gamma(n, 1/\alpha)$, and we may readily obtain the density of W.

A second case in which simple closed form expressions are available is one in which all the $\alpha_i$’s are distinct. In this situation we can use a result for weighted sums of exponentials given in, for example, Feller (1971), p. 40, and write the survival function of the product in the form:
$$P(W > w) = \sum_{i=1}^{n}\left(\frac{w}{\sigma}\right)^{-\alpha_i}\prod_{\substack{k=1\\ k\ne i}}^{n}\frac{\alpha_k}{\alpha_k-\alpha_i}, \quad w > \sigma,$$

where $\sigma = \prod_{i=1}^{n}\sigma_i$ and $\alpha_i \ne \alpha_j$ if i ≠ j. The distribution of products of independent Pareto (IV) variables with $\mu_i$’s equal to 0 can, via the representation (8), be reduced to a problem involving the distribution of products of powers of independent gamma random variables. Unlike the Pareto (I) case, closed form expressions for the resulting density are apparently not obtainable, although moments of such products are readily available.
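For the distinct-α case of products of Pareto (I) variables, the survival function is a finite sum and is easy to code. The sketch below uses a hypothetical function name and follows the standard result for a sum of independent exponentials with distinct rates, in which the weight attached to the i-th term is the product of α_k/(α_k − α_i) over k ≠ i.

```python
import math


def pareto1_product_survival(w, sigmas, alphas):
    """P(W > w) for W = X_1 * ... * X_n with independent X_i ~ P(I)(sigma_i, alpha_i)
    and pairwise distinct alpha_i."""
    if len(set(alphas)) != len(alphas):
        raise ValueError("the alpha_i must be pairwise distinct")
    sigma = math.prod(sigmas)  # lower end of the support of the product
    if w <= sigma:
        return 1.0
    total = 0.0
    for i, a_i in enumerate(alphas):
        weight = 1.0
        for k, a_k in enumerate(alphas):
            if k != i:
                weight *= a_k / (a_k - a_i)
        total += weight * (w / sigma) ** (-a_i)
    return total


# Two factors with alpha = 1 and alpha = 2 and unit scales:
# P(W > 2) = 2 * 2**(-1) - 1 * 2**(-2) = 0.75.
example = pareto1_product_survival(2.0, [1.0, 1.0], [1.0, 2.0])
```

With a single factor the sum has one term with an empty product as its weight, so the function returns the classical Pareto survival function, as it should.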

The Pareto (IV) family is closed under minimization when certain parameters are common to the minimands. Thus, if $X_1$ and $X_2$ are independent random variables with $X_i \sim P(IV)(\mu,\sigma,\gamma,\alpha_i)$, i = 1,2, then
$$\min(X_1,X_2) \sim P(IV)(\mu,\sigma,\gamma,\alpha_1+\alpha_2).$$

Note that in this situation the $X_i$’s share common values for the parameters μ, σ and γ.

Pareto (III) variables exhibit an interesting closure property with respect to geometric minimization and maximization. Indeed, this was used as a justification for use of the Pareto (III) distribution as a suitable model for income distributions, based on a scenario involving competitive bidding for employment (Arnold and Laguna 1976). For this, consider a sequence $X_1, X_2, \ldots$ of i.i.d. Pareto (III) (μ,σ,γ) random variables. Suppose that for some p ∈ (0,1), $N_p$ is independent of the $X_i$’s and has a geometric (p) distribution, i.e., $P(N_p = n) = p(1-p)^{n-1}$, n = 1,2,…. Define the corresponding random extrema by
$$U_p = \min\{X_1, X_2, \ldots, X_{N_p}\},$$
$$V_p = \max\{X_1, X_2, \ldots, X_{N_p}\}.$$
It is readily verified, by conditioning on $N_p$, that $U_p$ and $V_p$ each have Pareto (III) distributions. Thus
$$U_p \sim P(III)(\mu, \sigma p^{\gamma}, \gamma),$$
$$V_p \sim P(III)(\mu, \sigma p^{-\gamma}, \gamma).$$
Observe that, if μ = 0, then
$$p^{-\gamma} U_p \overset{d}{=} p^{\gamma} V_p \overset{d}{=} X_1.$$

Some characterization results based on this observation were discussed in Arnold et al. (1986).
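The conditioning argument for geometric minima and maxima can be checked numerically: summing over the geometric distribution of the sample size should reproduce a Pareto (III) survival function with a rescaled σ. The sketch below uses hypothetical function names and truncates the infinite series at a large number of terms, which is an approximation.

```python
def pareto3_survival(x, mu, sigma, gamma):
    """Survival function (5) of the Pareto (III) distribution."""
    if x <= mu:
        return 1.0
    return 1.0 / (1.0 + ((x - mu) / sigma) ** (1.0 / gamma))


def geometric_min_survival(x, mu, sigma, gamma, p, terms=2000):
    """P(U_p > x) = sum over n of p (1 - p)**(n - 1) * S(x)**n, truncated at `terms`."""
    s = pareto3_survival(x, mu, sigma, gamma)
    return sum(p * (1.0 - p) ** (n - 1) * s ** n for n in range(1, terms + 1))


def geometric_max_survival(x, mu, sigma, gamma, p, terms=2000):
    """P(V_p > x) = 1 - sum over n of p (1 - p)**(n - 1) * F(x)**n, truncated at `terms`."""
    f = 1.0 - pareto3_survival(x, mu, sigma, gamma)
    return 1.0 - sum(p * (1.0 - p) ** (n - 1) * f ** n for n in range(1, terms + 1))


# Illustrative values (assumed for this example only).
mu, sigma, gamma, p, x = 1.0, 2.0, 0.6, 0.3, 2.5

min_series = geometric_min_survival(x, mu, sigma, gamma, p)
min_closed = pareto3_survival(x, mu, sigma * p ** gamma, gamma)

max_series = geometric_max_survival(x, mu, sigma, gamma, p)
max_closed = pareto3_survival(x, mu, sigma * p ** (-gamma), gamma)
```

In each pair the series value and the closed form should agree to within floating-point error, since the neglected tail of the geometric series is negligible for p = 0.3.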

It is possible to write down expressions for the densities of order statistics from a Pareto (IV) sample. The corresponding distribution functions will involve incomplete beta functions. Simulation of such order statistics may be accomplished by utilizing the relatively simple form of the Pareto (IV) quantile function, i.e.,
$$F^{-1}(u) = \mu + \sigma\left[(1-u)^{-1/\alpha} - 1\right]^{\gamma}. \tag{26}$$
From this we have that if $X_{i:n}$ is the i th order statistic from a sample of size n from a Pareto (IV) distribution, then
$$X_{i:n} \overset{d}{=} F^{-1}(U_{i:n}), \tag{27}$$

where $\overset{d}{=}$ means that the two random variables are identically distributed, where $F^{-1}$ is as given in (26), and where $U_{i:n}$ is the i th order statistic of a sample of size n from a uniform (0,1) distribution. It is well known (see e.g. David and Nagaraja 2003) that $U_{i:n} \sim \text{Beta}(i, n-i+1)$.
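The simulation recipe for Pareto (IV) order statistics, a Beta draw pushed through the quantile function, might be sketched as follows. The function names and parameter values are illustrative assumptions; `random.betavariate` from the standard library supplies the Beta(i, n − i + 1) variate.

```python
import random


def pareto4_quantile(u, mu, sigma, gamma, alpha):
    """Quantile function (26) of the Pareto (IV) distribution, for 0 <= u < 1."""
    return mu + sigma * ((1.0 - u) ** (-1.0 / alpha) - 1.0) ** gamma


def pareto4_order_statistic(i, n, mu, sigma, gamma, alpha, rng):
    """Draw the i-th order statistic of a Pareto (IV) sample of size n
    as F^{-1}(U) with U ~ Beta(i, n - i + 1)."""
    u = rng.betavariate(i, n - i + 1)
    return pareto4_quantile(u, mu, sigma, gamma, alpha)


# Example: sample minima (i = 1) from n = 5 observations of P(IV)(0, 1, 1, 2).
rng = random.Random(7)
minima = [pareto4_order_statistic(1, 5, 0.0, 1.0, 1.0, 2.0, rng)
          for _ in range(20_000)]

# Proportion of simulated minima exceeding 0.1.
prop_above = sum(m > 0.1 for m in minima) / len(minima)
```

Only one Beta variate is needed per draw, regardless of n, which is the main practical advantage over sorting a full simulated sample.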

In some special cases the density of the i th order statistic (27) assumes a known form. For example:
$$X_i\text{'s} \sim P(III)(\mu,\sigma,\gamma) \;\Rightarrow\; X_{i:n} \sim FP(\mu,\sigma,\gamma,n-i+1,i).$$
Another case involves minima:
$$X_i\text{'s} \sim P(IV)(\mu,\sigma,\gamma,\alpha) \;\Rightarrow\; X_{1:n} \sim P(IV)(\mu,\sigma,\gamma,n\alpha).$$

3 Some related extensions

A variety of models have been proposed to add more flexibility to the generalized Pareto models discussed in Section 2. Most of them include Pareto models as special cases. In this Section we will make note of a selection of these models.

Many early researchers modeled the logarithm of income (called income power by Champernowne (1937)). Thus, instead of postulating a simple distribution for income, a relatively simple distribution was assumed for some function of income. More flexibility may be introduced by considering a parametric family of monotonic transformations of the income data whose parameters must be estimated from the data. For example, we might begin with a parametric family of increasing functions $\psi(x;\underline{\tau})$ with corresponding inverse functions $\psi^{-1}(x;\underline{\tau})$ and assume that $\psi^{-1}(X;\underline{\tau})$ has a Pareto (IV) (0,σ,γ,α) distribution. If we denote the corresponding P(IV)(0,σ,γ,α) distribution function by $F_{\sigma,\gamma,\alpha}(x)$, then the distribution function of X will be
$$F_X(x;\sigma,\gamma,\alpha,\underline{\tau}) = F_{\sigma,\gamma,\alpha}\left(\psi^{-1}(x;\underline{\tau})\right).$$
A parallel extension involves quantile functions instead of distribution functions. For this the quantile function of X is assumed to be of the form
$$F_X^{-1}(u;\sigma,\gamma,\alpha,\underline{\tau}) = F_{\sigma,\gamma,\alpha}^{-1}\left(\tilde{\psi}(u;\underline{\tau})\right), \tag{31}$$
where $\tilde{\psi}(u;\underline{\tau})$ is a parametric family of monotone functions mapping (0,1) onto (0,1). A popular model of this genre, introduced by Jones (2004), makes use of the family of quantile functions of Beta distributions. The density function of this Beta-generalized Pareto distribution is given by
$$f_X(x;\sigma,\gamma,\alpha,\lambda_1,\lambda_2) = \frac{\alpha\left\{1 - \left[1 + \left(\frac{x}{\sigma}\right)^{1/\gamma}\right]^{-\alpha}\right\}^{\lambda_1-1}\left[1 + \left(\frac{x}{\sigma}\right)^{1/\gamma}\right]^{-\alpha\lambda_2-1}\left(\frac{x}{\sigma}\right)^{(1/\gamma)-1}}{\sigma\,\gamma\,B(\lambda_1,\lambda_2)},$$

where x ∈ (0,∞). Of course, if $\lambda_1 = \lambda_2 = 1$, the Beta-generalized distribution reduces to a Pareto (IV) distribution with μ = 0.

Another popular model of the form (31) involves the simple choice $\tilde{\psi}(u) = u^{1/\theta}$ where θ > 0. In such a case we have
$$F_X(x;\sigma,\gamma,\alpha,\theta) = \left[F_{\sigma,\gamma,\alpha}(x)\right]^{\theta}, \quad x > 0, \tag{33}$$

and the distribution is usually called the exponentiated generalized Pareto distribution. It can be recognized as a special case of the Beta-generalized Pareto model with the parameters chosen to be λ1 = θ and λ2 = 1.

Instead of the Beta distribution, one might use the Kumaraswamy distribution to obtain an alternative generalized Pareto distribution. First, we must recall the definition of the Kumaraswamy distribution. We say that X has a Kumaraswamy $(\lambda_1,\lambda_2)$ distribution if its density and distribution functions are:
$$f_K(x) = \lambda_1\lambda_2\,x^{\lambda_1-1}\left(1 - x^{\lambda_1}\right)^{\lambda_2-1}, \quad 0 < x < 1,$$
$$F_K(x) = 1 - \left(1 - x^{\lambda_1}\right)^{\lambda_2}, \quad 0 < x < 1.$$
See Jones (2009) for a comprehensive introduction to the Kumaraswamy distribution. Let $F_P(x)$ denote the Pareto (IV) distribution function, with density $f_P(x)$, and suppose that K has a Kumaraswamy $(\lambda_1,\lambda_2)$ distribution. Define $Y = F_P^{-1}(K)$; then Y has a Kumaraswamy-Pareto (IV) distribution with corresponding density
$$f_Y(y) = f_K\left(F_P(y)\right) f_P(y).$$

Akinsete et al. (2008) consider some special subcases of the Beta-generalized Pareto distribution, while Paranaiba et al. (2013) discuss the Kumaraswamy-generalized Pareto distribution. Submodels of the Kumaraswamy-generalized Pareto model are often of interest. For example the exponentiated generalized Pareto distribution (33) is such a submodel.
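Because both the Kumaraswamy and the Pareto (IV) distributions have closed-form distribution and quantile functions, the Kumaraswamy-Pareto (IV) construction Y = F_P^{-1}(K) composes directly. The following is a minimal sketch with hypothetical function names; it also illustrates the exponentiated submodel obtained when λ2 = 1.

```python
def kumaraswamy_cdf(x, lam1, lam2):
    """Kumaraswamy(lam1, lam2) distribution function on (0, 1)."""
    return 1.0 - (1.0 - x ** lam1) ** lam2


def kumaraswamy_quantile(u, lam1, lam2):
    """Inverse of kumaraswamy_cdf."""
    return (1.0 - (1.0 - u) ** (1.0 / lam2)) ** (1.0 / lam1)


def pareto4_cdf(y, mu, sigma, gamma, alpha):
    """Pareto (IV) distribution function, i.e. one minus the survival function (6)."""
    if y <= mu:
        return 0.0
    return 1.0 - (1.0 + ((y - mu) / sigma) ** (1.0 / gamma)) ** (-alpha)


def pareto4_quantile(u, mu, sigma, gamma, alpha):
    """Pareto (IV) quantile function (26)."""
    return mu + sigma * ((1.0 - u) ** (-1.0 / alpha) - 1.0) ** gamma


def kum_pareto4_cdf(y, lam1, lam2, mu, sigma, gamma, alpha):
    """P(Y <= y) = F_K(F_P(y)) for Y = F_P^{-1}(K)."""
    return kumaraswamy_cdf(pareto4_cdf(y, mu, sigma, gamma, alpha), lam1, lam2)


def kum_pareto4_quantile(u, lam1, lam2, mu, sigma, gamma, alpha):
    """Quantile function of Y: push a Kumaraswamy quantile through F_P^{-1}."""
    k = kumaraswamy_quantile(u, lam1, lam2)
    return pareto4_quantile(k, mu, sigma, gamma, alpha)


# Illustrative parameter values (assumed for this example only).
params = (2.0, 3.0, 0.0, 1.5, 0.8, 2.5)  # lam1, lam2, mu, sigma, gamma, alpha
y = kum_pareto4_quantile(0.4, *params)
round_trip = kum_pareto4_cdf(y, *params)  # should recover 0.4
```

Passing uniform (0,1) draws to `kum_pareto4_quantile` gives a simple inversion sampler for this family.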

And, of course, one can concatenate these constructions and consider a Beta-Kumaraswamy-Pareto distribution. Going one step further we would arrive at a generalized-Beta-Kumaraswamy-Pareto model. Each generalization adds flexibility at the cost of introducing more parameters. Some degree of parsimony is evidently called for here.

Pillai (1991) suggested an extension of the Pareto (III) distribution, motivated by its closure under geometric minimization. A random variable is said to have a semi-Pareto (III) distribution if its survival function is of the form
$$\bar{F}(x;\mu,\sigma,\gamma,p) = \left[1 + \left(\frac{x-\mu}{\sigma}\right)^{1/\gamma} h\left(\frac{x-\mu}{\sigma}\right)\right]^{-1}, \quad x > \mu, \tag{36}$$

where μ ∈ (−∞,∞), σ, γ ∈ (0,∞), p ∈ (0,1) and h(x) is a periodic function of ln x with period −2π/[γ ln p], and with h(0) = 1. The case in which h(x) ≡ 1 for every x corresponds to the usual Pareto (III) model. More generality can be arrived at if h(x) is replaced by a suitable parametric family of periodic functions. Note that, in order for (36) to be a valid survival function, it must be the case that $x^{1/\gamma}h(x)$ is a non-decreasing function of x.

Hidden truncation or selection models may sometimes provide alternative models that are more suitable than basic Pareto models. The corresponding scenario is one in which the variable X is observed only if a covariable Y takes on a value less than some threshold value. Thus the distribution of the observed X’s is of the form $P(X \le x \mid Y \le y_0)$. With this in mind, consider the case in which (X,Y) has a bivariate Pareto (IV) distribution with the following joint survival function:
$$P(X > x, Y > y) = \left[1 + \left(\frac{x-\mu}{\sigma}\right)^{1/\gamma} + \left(\frac{y-\nu}{\tau}\right)^{1/\delta}\right]^{-\alpha}, \quad x > \mu,\ y > \nu.$$
(Such distributions will be discussed in more detail in Section 5.) This distribution has Pareto (IV) marginals and Pareto (IV) conditionals. After suitable reparameterization, the corresponding hidden truncation density for X, given that it can only be observed if Y is not too large, is
$$f_{HT}(x;\mu,\sigma,\gamma,\alpha,\theta) = \frac{\alpha\left(\frac{x-\mu}{\sigma}\right)^{(1/\gamma)-1}}{\gamma\sigma\left[1-(1+\theta)^{-\alpha}\right]} \times \left\{\left[1 + \left(\frac{x-\mu}{\sigma}\right)^{1/\gamma}\right]^{-(\alpha+1)} - \left[1 + \left(\frac{x-\mu}{\sigma}\right)^{1/\gamma} + \theta\right]^{-(\alpha+1)}\right\}, \quad x > \mu,$$
where a new parameter $\theta = [(y_0-\nu)/\tau]^{1/\delta}$ has been introduced. In this model, μ is a real valued parameter, often positive, while all of the other parameters, σ, γ, α and θ, are positive valued. An alternative representation of this density is possible as follows:
$$f_{HT}(x;\mu,\sigma,\gamma,\alpha,\theta) = \frac{1}{1-(1+\theta)^{-\alpha}}\,\frac{\alpha}{\gamma\sigma}\left(\frac{x-\mu}{\sigma}\right)^{(1/\gamma)-1}\left[1 + \left(\frac{x-\mu}{\sigma}\right)^{1/\gamma}\right]^{-(\alpha+1)} - \frac{(1+\theta)^{-\alpha}}{1-(1+\theta)^{-\alpha}}\,\frac{\alpha}{\gamma\sigma_1}\left(\frac{x-\mu}{\sigma_1}\right)^{(1/\gamma)-1}\left[1 + \left(\frac{x-\mu}{\sigma_1}\right)^{1/\gamma}\right]^{-(\alpha+1)},$$

where $\sigma_1 = \sigma(1+\theta)^{\gamma}$. This is recognizable as a linear combination of two Pareto (IV) densities. It is not a convex combination since, although the coefficients add up to 1, the second coefficient is negative. Motivated by this example, one might also consider k-component linear combinations of Pareto (IV) densities as income models, allowing k to be greater than 2. Such models with positive coefficients are natural candidates for fitting multimodal income data sets which may well have a mixture genesis. Note that, by testing the hypothesis that θ = 0 one can decide whether or not the data set at hand has been subject to hidden truncation. More detailed discussion of these hidden truncation Pareto models, in the Pareto (II) case, may be found in Arnold and Ghosh (2011).
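The equivalence of the two forms of the hidden truncation density can be confirmed numerically. In the sketch below (hypothetical function names, illustrative parameter values), the first function codes the difference form and the second codes the linear combination of two Pareto (IV) densities with scale parameters σ and σ(1 + θ)^γ.

```python
def pareto4_pdf(x, mu, sigma, gamma, alpha):
    """Pareto (IV) density, obtained by differentiating the survival function (6)."""
    if x <= mu:
        return 0.0
    t = (x - mu) / sigma
    return (alpha / (gamma * sigma)) * t ** (1.0 / gamma - 1.0) \
        * (1.0 + t ** (1.0 / gamma)) ** (-(alpha + 1.0))


def hidden_truncation_pdf(x, mu, sigma, gamma, alpha, theta):
    """Hidden truncation density written as a difference of two power terms."""
    if x <= mu:
        return 0.0
    t = (x - mu) / sigma
    u = t ** (1.0 / gamma)
    norm = 1.0 - (1.0 + theta) ** (-alpha)
    lead = alpha * t ** (1.0 / gamma - 1.0) / (gamma * sigma * norm)
    return lead * ((1.0 + u) ** (-(alpha + 1.0))
                   - (1.0 + u + theta) ** (-(alpha + 1.0)))


def hidden_truncation_pdf_combination(x, mu, sigma, gamma, alpha, theta):
    """Same density written as a linear combination of two Pareto (IV) densities."""
    sigma1 = sigma * (1.0 + theta) ** gamma
    c = (1.0 + theta) ** (-alpha)
    w1 = 1.0 / (1.0 - c)   # positive coefficient
    w2 = -c / (1.0 - c)    # negative coefficient; w1 + w2 = 1
    return (w1 * pareto4_pdf(x, mu, sigma, gamma, alpha)
            + w2 * pareto4_pdf(x, mu, sigma1, gamma, alpha))


# Illustrative parameter values (assumed for this example only).
args = (1.0, 2.0, 0.5, 3.0, 0.7)  # mu, sigma, gamma, alpha, theta
pairs = [(hidden_truncation_pdf(x, *args), hidden_truncation_pdf_combination(x, *args))
         for x in (1.5, 2.0, 5.0)]
```

Each pair in `pairs` should contain two numerically equal values, which reflects the identity 1 + t + θ = (1 + θ)[1 + t/(1 + θ)] used to pass from one form to the other.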

4 Inference, briefly

Suppose that $X_1, X_2, \ldots, X_n$ are independent identically distributed random variables with a common Pareto (IV) distribution. The sample size should be reasonably large, since we have four parameters to estimate. The sample minimum, or a slightly corrected version of it, will be a suitable estimate of the location parameter μ. After subtracting it from each of the observations, the remaining three parameters may be estimated using maximum likelihood. The corresponding Fisher information matrix is available (as indeed is the Fisher information matrix for the Feller-Pareto model with μ = 0).

Either a global search or numerical solution of the likelihood equations will be required to identify the location of the maximum of the likelihood function. In the Pareto (I) case, a variety of alternative estimates are available including best unbiased estimates. Alternatively, in the Pareto (I) case one can take logarithms of the data and arrive at a shifted exponential model, for which many estimation strategies have been developed.

A diffuse prior Bayesian analysis can be used for Pareto (IV) data. It will, predictably, yield results similar to those obtained via maximum likelihood. In the Pareto (I) case, Lwin (1972) introduced a conjugate family of priors for (σ,α) which can be used to incorporate some degree of prior knowledge of the parameters. Arnold et al. (1998) suggest use of a more flexible family of what they call conditionally conjugate priors in this setting. These priors are tailor-made for subsequent use of Gibbs sampling algorithms to generate realizations from the corresponding posterior distribution.

More details on parametric inference for Pareto models may be found in Arnold (1983) and Arnold (2008).

5 Multivariate Pareto models

The first author to systematically study k-dimensional Pareto distributions was Mardia (1962). Mardia’s type I multivariate Pareto distribution has the attractive feature that both marginals and conditional distributions are Paretian in nature. We will say that a k-dimensional random vector $\underline{X}$ has a type I multivariate Pareto distribution, if the joint survival function is of the form
$$\bar{F}_{\underline{X}}(\underline{x}) = \left[\sum_{i=1}^{k}(x_i/\sigma_i) - k + 1\right]^{-\alpha}, \quad x_i > \sigma_i, \tag{40}$$
and we write $\underline{X} \sim MP^{(k)}(I)(\underline{\sigma},\alpha)$. The $\sigma_i$’s are positive marginal scale parameters. The positive parameter α is an inequality parameter (common to all marginals). It follows from (40) that the one-dimensional marginals are classical Pareto distributions. Thus $X_i \sim P(I)(\sigma_i,\alpha)$, i = 1,2,…,k. By setting selected $x_i$’s equal to $\sigma_i$ in (40), it is apparent that, for any $k_1 < k$, all $k_1$ dimensional marginals are again multivariate Pareto. If we use the notational device $\underline{X} = (\underline{\dot{X}}, \underline{\ddot{X}})$ where $\underline{\dot{X}}$ is $k_1$ dimensional, with an analogous partition of the vector $\underline{\sigma} = (\underline{\dot{\sigma}}, \underline{\ddot{\sigma}})$, we may write
$$\underline{\dot{X}} \sim MP^{(k_1)}(I)(\underline{\dot{\sigma}},\alpha).$$

Conditional distributions are also of the form (40), but with a change of location.

It is natural to extend this basic multivariate Pareto model by the introduction of location, scale, inequality and shape parameters in a manner parallel to that used to develop the univariate Pareto (II)-(IV) distributions, as follows:

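($MP^{(k)}(II)$) We will say that $\underline{X}$ has a k-dimensional Pareto distribution of type II, if its joint survival function is of the form
$$\bar{F}_{\underline{X}}(\underline{x}) = \left[1 + \sum_{i=1}^{k}\frac{x_i-\mu_i}{\sigma_i}\right]^{-\alpha}, \quad x_i > \mu_i,\ i = 1,2,\ldots,k,$$

and we write $\underline{X} \sim MP^{(k)}(II)(\underline{\mu},\underline{\sigma},\alpha)$.

($MP^{(k)}(III)$) $\underline{X}$ has a k-dimensional Pareto distribution of type III, if its joint survival function is of the form
$$\bar{F}_{\underline{X}}(\underline{x}) = \left[1 + \sum_{i=1}^{k}\left(\frac{x_i-\mu_i}{\sigma_i}\right)^{1/\gamma_i}\right]^{-1}, \quad x_i > \mu_i,\ i = 1,2,\ldots,k,$$

and we write $\underline{X} \sim MP^{(k)}(III)(\underline{\mu},\underline{\sigma},\underline{\gamma})$.

($MP^{(k)}(IV)$) $\underline{X}$ has a k-dimensional Pareto distribution of type IV, if its joint survival function is of the form
$$\bar{F}_{\underline{X}}(\underline{x}) = \left[1 + \sum_{i=1}^{k}\left(\frac{x_i-\mu_i}{\sigma_i}\right)^{1/\gamma_i}\right]^{-\alpha}, \quad x_i > \mu_i,\ i = 1,2,\ldots,k,$$

and we write $\underline{X} \sim MP^{(k)}(IV)(\underline{\mu},\underline{\sigma},\underline{\gamma},\alpha)$.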

The marginals and conditionals of an $MP^{(k)}(II)$ distribution are again of the $MP^{(k)}(II)$ form. An $MP^{(k)}(III)$ distribution has $MP^{(k)}(III)$ marginals, but not conditionals. However, an $MP^{(k)}(IV)$ distribution does have both its marginals and conditionals of the $MP^{(k)}(IV)$ form. Specifically, in the $MP^{(k)}(IV)$ case, using the dot and double dot notation, we have
$$\underline{\dot{X}} \sim MP^{(k_1)}(IV)(\underline{\dot{\mu}},\underline{\dot{\sigma}},\underline{\dot{\gamma}},\alpha),$$
$$\underline{\dot{X}} \mid \underline{\ddot{X}} = \underline{\ddot{x}} \sim MP^{(k_1)}(IV)(\underline{\dot{\mu}},\underline{\dot{\tau}},\underline{\dot{\gamma}},\alpha + k - k_1),$$
$$\tau_i = \sigma_i\left[1 + \sum_{j=k_1+1}^{k}\left(\frac{x_j-\mu_j}{\sigma_j}\right)^{1/\gamma_j}\right]^{\gamma_i}, \quad i = 1,2,\ldots,k_1.$$

Takahasi (1965) discussed the $MP^{(k)}(IV)$ distribution with $\underline{\mu} = \underline{0}$ and $\underline{\sigma} = \underline{1}$. He called it a multivariate Burr’s distribution, and noted that the marginal and conditional distributions were of the same form.

As in the univariate case, distributional properties of these multivariate Pareto distributions and possible further extensions are more transparent if one uses a representation of the variables as functions of certain independent gamma variables. Thus if $\underline{X}$ has a k-dimensional Pareto distribution of type IV, we may act as if the $X_i$’s have the representation
$$X_i = \mu_i + \sigma_i\,(W_i/Z)^{\gamma_i}, \quad i = 1,2,\ldots,k, \tag{48}$$

where the $W_i$’s are independent identically distributed Γ(1,1) variables (i.e., standard exponential variables) and Z, independent of the $W_i$’s, has a Γ(α,1) distribution. This representation, for example, makes it easy to compute the means, variances and covariances of $\underline{X}$.
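The representation (48) gives an immediate way to simulate a k-dimensional Pareto (IV) vector: one shared gamma variate Z and k independent exponentials. The sketch below uses hypothetical function names and a simple Monte Carlo comparison with the joint survival function; the seed and parameter values are illustrative.

```python
import random


def mp4_sample(mus, sigmas, gammas, alpha, rng):
    """Draw one vector with X_i = mu_i + sigma_i * (W_i / Z)**gamma_i,
    where the W_i are standard exponential and Z ~ Gamma(alpha, 1) is shared."""
    z = rng.gammavariate(alpha, 1.0)
    return [m + s * (rng.expovariate(1.0) / z) ** g
            for m, s, g in zip(mus, sigmas, gammas)]


def mp4_survival(xs, mus, sigmas, gammas, alpha):
    """Joint survival function of the k-dimensional Pareto (IV) distribution, for x_i > mu_i."""
    total = 1.0 + sum(((x - m) / s) ** (1.0 / g)
                      for x, m, s, g in zip(xs, mus, sigmas, gammas))
    return total ** (-alpha)


# Bivariate example: P(X1 > 0.5, X2 > 0.5) with unit scales and alpha = 2.
rng = random.Random(2024)
mus, sigmas, gammas, alpha = [0.0, 0.0], [1.0, 1.0], [1.0, 1.0], 2.0
n_draws = 40_000
hits = sum(all(x > 0.5 for x in mp4_sample(mus, sigmas, gammas, alpha, rng))
           for _ in range(n_draws))
empirical = hits / n_draws
exact = mp4_survival([0.5, 0.5], mus, sigmas, gammas, alpha)
```

The dependence between coordinates comes entirely from the shared Z; replacing the exponential draws with gamma draws would give the multivariate Feller-Pareto generalization.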

A generalization of the representation (48) is one in which the $W_i$’s are gamma rather than exponential variables. The resulting distribution will be called k-dimensional Feller-Pareto, since its marginals are of the Feller-Pareto form. Thus $\underline{X} \sim FP^{(k)}(\underline{\mu},\underline{\sigma},\underline{\gamma},\alpha,\underline{\beta})$ if
$$X_i = \mu_i + \sigma_i\,(W_i/Z)^{\gamma_i}, \quad i = 1,2,\ldots,k, \tag{49}$$

where the $W_i$’s and Z are independent random variables with $W_i \sim \Gamma(\beta_i,1)$ (i = 1,2,…,k), and Z ∼ Γ(α,1). The marginal and conditional distributions of this multivariate Feller-Pareto distribution are again multivariate Feller-Pareto. The covariance structure can be readily obtained from the representation (49). Parallel to the situation in one dimension, there exist alternative names that could be applied to multivariate Feller-Pareto variables. They could be called multivariate generalized F variables or multivariate generalized beta of the second kind variables. An evident drawback of the multivariate Feller-Pareto model (and its various submodels) is the presence of a common value of α which appears in each marginal density. The consequences of this homogeneity are not easy to pin down. Certainly a model with Feller-Pareto marginals with different α’s for each of the marginals would be desirable, if one can be developed with attractive distributional properties (e.g., “nice” conditional distributions).

5.1 Other multivariate Pareto distributions

Although the title of this section promises discussion of multivariate models, only the bivariate case will be treated. It will be left to the reader to visualize the, usually straightforward, extension to the multivariate case. Notational complexity is avoided to a great extent by focusing on the case k = 2.

It is not difficult to verify that a P(IV)(μ,σ,γ,α) distribution can be represented as a scale mixture of Weibull distributions. Equivalently, as remarked earlier, a P(IV) random variable admits the representation
$$X = \mu + \sigma (U/Z)^{\gamma},$$
where $U \sim \exp(1)$ and $Z \sim \Gamma(\alpha,1)$ are independent variables. A natural bivariate version of this construction begins with $(U_1,U_2)$ having a bivariate exponential distribution with standard exponential marginals, perhaps one of the Marshall-Olkin type with parameters 1, 1 and λ. Then, with $Z \sim \Gamma(\alpha,1)$ independent of $(U_1,U_2)$, we define $(X_1,X_2)$ by
$$X_i = \mu_i + \sigma_i (U_i/Z)^{\gamma_i}, \quad i = 1,2.$$

Observe that in any bivariate Pareto (IV) distribution generated by this method, the marginals share a common value of α.
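
A minimal Python sketch of this first construction is given here. It assumes the usual common-shock form of the Marshall-Olkin distribution, in which the components are minima of independent exponential shocks with rates 1, 1 and λ. Each minimum then has rate 1 + λ, so the sketch rescales by 1 + λ to obtain standard exponential marginals. That rescaling, the function names and the parameter values are assumptions made for illustration.

```python
import random

def sample_mo_exponential(lam, rng):
    """Marshall-Olkin pair with parameters (1, 1, lam), rescaled to Exp(1) marginals."""
    e1 = rng.expovariate(1.0)                     # individual shock, component 1
    e2 = rng.expovariate(1.0)                     # individual shock, component 2
    e12 = rng.expovariate(lam) if lam > 0 else float("inf")  # common shock
    scale = 1.0 + lam    # assumption: min(E_i, E_12) has rate 1 + lam, so rescale
    return scale * min(e1, e12), scale * min(e2, e12)

def sample_bivariate_p4(mu, sigma, gamma, alpha, lam, rng):
    """One draw of (X_1, X_2) with X_i = mu_i + sigma_i * (U_i / Z) ** gamma_i."""
    u = sample_mo_exponential(lam, rng)
    z = rng.gammavariate(alpha, 1.0)              # shared Z ~ Gamma(alpha, 1)
    return tuple(m + s * (ui / z) ** g for m, s, g, ui in zip(mu, sigma, gamma, u))

# Illustrative (assumed) parameter values.
rng = random.Random(1)
mu, sigma, gamma, alpha, lam = (0.0, 0.0), (1.0, 2.0), (1.0, 0.5), 3.0, 0.5
draws = [sample_bivariate_p4(mu, sigma, gamma, alpha, lam, rng) for _ in range(20_000)]

# Marginal check: with mu_1 = 0, sigma_1 = 1, gamma_1 = 1 the P(IV) survival
# function at x = 1 is (1 + 1) ** (-alpha) = 0.125.
empirical_survival = sum(x1 > 1.0 for x1, _ in draws) / len(draws)
```

Both components divide by the same draw of Z, which is why a single α governs both marginals in this construction.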

A second approach to generating bivariate P(IV) distributions makes use of the following representation of a P(IV) variable. If $U \sim \exp(1)$, then
$$\mu + \sigma \left(e^{U/\alpha} - 1\right)^{\gamma} \sim P(IV)(\mu,\sigma,\gamma,\alpha).$$
Here too, we can begin with $(U_1,U_2)$ having an arbitrary bivariate distribution with standard exponential marginals and construct a variable $(X_1,X_2)$ with a bivariate Pareto (IV) distribution by defining
$$X_i = \mu_i + \sigma_i \left(e^{U_i/\alpha_i} - 1\right)^{\gamma_i}, \quad i = 1,2.$$
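
The second construction can be sketched in Python as follows. Because any bivariate distribution with standard exponential marginals may be used, the sketch picks a deliberately simple and purely hypothetical one: $U_2$ equals $U_1$ with probability p and is an independent standard exponential otherwise. Unlike the first approach, each marginal here carries its own α. All names and parameter values are illustrative assumptions.

```python
import math
import random

def p4_from_exponential(u, mu, sigma, gamma, alpha):
    """Map u, a standard exponential value, to mu + sigma * (exp(u / alpha) - 1) ** gamma."""
    return mu + sigma * math.expm1(u / alpha) ** gamma

def sample_exp_pair(p, rng):
    """Hypothetical dependent pair with Exp(1) marginals: U_2 = U_1 with probability p."""
    u1 = rng.expovariate(1.0)
    u2 = u1 if rng.random() < p else rng.expovariate(1.0)
    return u1, u2

# Illustrative (assumed) parameter values.
rng = random.Random(3)
params1 = (0.0, 1.0, 1.0, 4.0)   # (mu_1, sigma_1, gamma_1, alpha_1)
params2 = (0.0, 1.0, 0.5, 2.0)   # (mu_2, sigma_2, gamma_2, alpha_2), a different alpha
pairs = []
for _ in range(20_000):
    u1, u2 = sample_exp_pair(0.4, rng)
    pairs.append((p4_from_exponential(u1, *params1), p4_from_exponential(u2, *params2)))

# Marginal check for X_2 at x = 1: (1 + 1 ** (1 / 0.5)) ** (-2) = 0.25.
empirical_survival_x2 = sum(x2 > 1.0 for _, x2 in pairs) / len(pairs)
```
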

A third approach makes use of the fact that minima of independent Pareto (IV) random variables themselves have Pareto (IV) distributions. Thus if $X_i$, i = 1,2, are independent with $X_i \sim P(IV)(\mu,\sigma,\gamma,\alpha_i)$, then $\min(X_1,X_2) \sim P(IV)(\mu,\sigma,\gamma,\alpha_1+\alpha_2)$. We then begin with three independent random variables $Y_1, Y_2, Y_3$ with $Y_i \sim P(IV)(\mu,\sigma,\gamma,\alpha_i)$ and define
$$X_1 = \min(Y_1,Y_3), \quad X_2 = \min(Y_2,Y_3) \tag{54}$$
(this approach is often called the method of trivariate reduction). In addition to having Pareto (IV) marginals, it is clear that the distribution described by (54) has the property that $\min(X_1,X_2) \sim P(IV)(\mu,\sigma,\gamma,\alpha_1+\alpha_2+\alpha_3)$. This distribution has the perhaps undesirable property that $P(X_1 = X_2) > 0$, and has another unfortunate property in that the marginals share common values of μ, σ and γ. This latter problem can be avoided to some extent by assuming that the $Y_i$’s have $P(IV)(0,1,\gamma,\alpha_i)$ distributions and then defining
$$X_1 = \mu_1 + \sigma_1 \min(Y_1,Y_3), \quad X_2 = \mu_2 + \sigma_2 \min(Y_2,Y_3).$$

In this case the $X_i$’s share only a common value of γ.
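
The trivariate reduction construction, and the positive probability of ties that it produces, can be illustrated with the short Python sketch below. It generates the $Y_i$’s from standard exponentials via $(e^{U/\alpha_i} - 1)^{\gamma}$. Since each $Y_i$ is then an increasing function of an exponential variable with rate $\alpha_i$, the shared variable $Y_3$ is the smallest of the three with probability $\alpha_3/(\alpha_1+\alpha_2+\alpha_3)$, which is the tie probability when the two marginals use the same location and scale. Names and parameter values are illustrative assumptions.

```python
import math
import random

def sample_p4_standard(gamma, alpha, rng):
    """Draw Y ~ P(IV)(0, 1, gamma, alpha) as (exp(U / alpha) - 1) ** gamma, U ~ Exp(1)."""
    return math.expm1(rng.expovariate(1.0) / alpha) ** gamma

def sample_trivariate_reduction(mu, sigma, gamma, alphas, rng):
    """X_i = mu_i + sigma_i * min(Y_i, Y_3), with Y_3 shared by both components."""
    y1, y2, y3 = (sample_p4_standard(gamma, a, rng) for a in alphas)
    return mu[0] + sigma[0] * min(y1, y3), mu[1] + sigma[1] * min(y2, y3)

# Illustrative (assumed) parameter values, with equal location and scale so
# that a tie in the minima is a tie in (X_1, X_2).
rng = random.Random(2)
alphas = (1.0, 2.0, 3.0)
pairs = [sample_trivariate_reduction((0.0, 0.0), (1.0, 1.0), 0.5, alphas, rng)
         for _ in range(20_000)]

tie_fraction = sum(x1 == x2 for x1, x2 in pairs) / len(pairs)
expected_tie_probability = alphas[2] / sum(alphas)   # alpha_3 / (alpha_1 + alpha_2 + alpha_3)
```

With distinct locations or scales the singular component does not disappear. It moves from the diagonal to the line on which $(x_1-\mu_1)/\sigma_1 = (x_2-\mu_2)/\sigma_2$.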

Finally, we mention the popular copula-based approach to constructing bivariate distributions with given marginals. For this, we begin with an analytically tractable bivariate distribution for $(Z_1,Z_2)$ and apply marginal transformations to produce a bivariate distribution with Pareto (IV) marginals. A popular choice for the distribution of $(Z_1,Z_2)$ is a bivariate normal with standard normal marginals and correlation ρ, but of course any other bivariate distribution can be used in its place. Now using $F_{\mu,\sigma,\gamma,\alpha}$ to denote the distribution function of a P(IV)(μ,σ,γ,α) random variable and Φ to denote a standard normal distribution function, we define
$$X_1 = F^{-1}_{\mu_1,\sigma_1,\gamma_1,\alpha_1}(\Phi(Z_1)), \quad X_2 = F^{-1}_{\mu_2,\sigma_2,\gamma_2,\alpha_2}(\Phi(Z_2)).$$

The correlation structure of the $X_i$’s is inherited from the correlation structure of the $Z_i$’s. In this case the extension to k dimensions is particularly transparent. Note also that the model has one dependence parameter which, if set equal to 0, yields a model with independent P(IV) marginals. It will be noted that this feature of having a single dependence parameter is shared by the other bivariate models introduced in this section.
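
The Gaussian copula construction can be sketched in Python as follows. The sketch uses the P(IV) quantile function $F^{-1}(u) = \mu + \sigma\left((1-u)^{-1/\alpha} - 1\right)^{\gamma}$, obtained by inverting the survival function $[1 + ((x-\mu)/\sigma)^{1/\gamma}]^{-\alpha}$. It works with $1 - \Phi(z)$ directly, which is algebraically the same transformation and is a choice made here only for numerical stability in the upper tail. Function names and parameter values are illustrative assumptions.

```python
import math
import random

def p4_quantile_from_survival(s, mu, sigma, gamma, alpha):
    """F^{-1}(1 - s) for P(IV): mu + sigma * (s ** (-1 / alpha) - 1) ** gamma."""
    return mu + sigma * (s ** (-1.0 / alpha) - 1.0) ** gamma

def std_normal_survival(z):
    """1 - Phi(z), computed with the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def sample_gaussian_copula_p4(params1, params2, rho, rng):
    """One draw of (X_1, X_2) with P(IV) marginals and a Gaussian copula."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)  # corr(z1, z2) = rho
    return (p4_quantile_from_survival(std_normal_survival(z1), *params1),
            p4_quantile_from_survival(std_normal_survival(z2), *params2))

# Illustrative (assumed) parameter values: (mu, sigma, gamma, alpha) per marginal.
rng = random.Random(4)
params1 = (0.0, 1.0, 1.0, 2.0)
params2 = (1.0, 2.0, 0.5, 3.0)
pairs = [sample_gaussian_copula_p4(params1, params2, 0.6, rng) for _ in range(20_000)]

# Marginal check for X_1 at x = 1: (1 + 1) ** (-2) = 0.25.
empirical_survival_x1 = sum(x1 > 1.0 for x1, _ in pairs) / len(pairs)
```

Here each marginal keeps its own four parameters, and setting rho to 0 makes the two components independent.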

6 Multivariate extensions

Several of the univariate extensions discussed in Section 3 can be readily modified to yield k-dimensional versions. For example, a random variable with the univariate Beta-generalized-Pareto (IV) distribution can be viewed as being defined by
$$X = F^{-1}_{\mu,\sigma,\gamma,\alpha}(V),$$
where $V \sim \mathrm{Beta}(\lambda_1,\lambda_2)$. For a bivariate version of this construction, we begin with $(V_1,V_2)$ having a bivariate Beta distribution, perhaps of the type introduced by Arnold and Ng (2011), and make suitable marginal transformations. Thus we define
$$X_1 = F^{-1}_{\mu_1,\sigma_1,\gamma_1,\alpha_1}(V_1), \quad X_2 = F^{-1}_{\mu_2,\sigma_2,\gamma_2,\alpha_2}(V_2).$$

Higher-dimensional versions of this construction require only the identification of a suitable k-dimensional Beta distribution. A Dirichlet distribution might be used here. Some other alternatives are described in Arnold and Ng (2011).
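
As a hedged illustration of the Dirichlet option, the Python sketch below builds $(V_1,V_2)$ as the first two coordinates of a Dirichlet$(a_1,a_2,a_3)$ vector, generated by normalizing independent gamma variables, so that $V_1 \sim \mathrm{Beta}(a_1, a_2+a_3)$ and $V_2 \sim \mathrm{Beta}(a_2, a_1+a_3)$. It then applies the P(IV) quantile function $F^{-1}(v) = \mu + \sigma\left((1-v)^{-1/\alpha} - 1\right)^{\gamma}$ to each coordinate. This is not the Arnold and Ng (2011) construction, and the names and parameter values are illustrative assumptions.

```python
import random

def sample_dirichlet_beta_pair(a1, a2, a3, rng):
    """(V_1, V_2) from Dirichlet(a1, a2, a3), built from independent gammas."""
    g = [rng.gammavariate(a, 1.0) for a in (a1, a2, a3)]
    total = sum(g)
    return g[0] / total, g[1] / total

def p4_quantile(v, mu, sigma, gamma, alpha):
    """Inverse P(IV) distribution function evaluated at v in [0, 1)."""
    return mu + sigma * ((1.0 - v) ** (-1.0 / alpha) - 1.0) ** gamma

# Illustrative (assumed) parameter values: (mu, sigma, gamma, alpha) per marginal.
rng = random.Random(5)
params1 = (0.0, 1.0, 1.0, 2.0)
params2 = (1.0, 2.0, 0.5, 3.0)
vs = [sample_dirichlet_beta_pair(2.0, 3.0, 4.0, rng) for _ in range(20_000)]
xs = [(p4_quantile(v1, *params1), p4_quantile(v2, *params2)) for v1, v2 in vs]

mean_v1 = sum(v1 for v1, _ in vs) / len(vs)   # near a1 / (a1 + a2 + a3) = 2 / 9
```

Because $V_1 + V_2 < 1$ for Dirichlet coordinates, this particular choice can only model negative dependence between the components, which is one reason to consider more flexible bivariate Beta distributions.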

To identify a suitable bivariate analog of the Kumaraswamy-Pareto (IV) distribution, all that is required is a bivariate Kumaraswamy distribution. One such distribution was suggested by Nadarajah et al. (2011).

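Hidden truncation models, likewise, can be considered in higher dimensions. For example, we may begin with $\underline{X} \sim MP^{(k)}(IV)(\underline{\mu},\underline{\sigma},\underline{\gamma},\alpha)$. Then, using our dot/double-dot notation, we have
$$f_{HT}(\underline{\dot{x}}) = f_{\underline{\dot{X}} \mid \underline{\ddot{X}} \le \underline{\ddot{x}}}(\underline{\dot{x}}) = \frac{f_{\underline{\dot{X}}}(\underline{\dot{x}})\, P(\underline{\ddot{X}} \le \underline{\ddot{x}} \mid \underline{\dot{X}} = \underline{\dot{x}})}{P(\underline{\ddot{X}} \le \underline{\ddot{x}})},$$

which is not difficult to evaluate, since the conditional distribution of $\underline{\ddot{X}}$ given that $\underline{\dot{X}} = \underline{\dot{x}}$ is of the $MP^{(k-k_1)}(IV)$ form (refer to equation (46), being careful to switch the roles of $\underline{\dot{X}}$ and $\underline{\ddot{X}}$).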

We conclude this section by noting the availability of multivariate distributions with Pareto conditionals rather than Pareto marginals. Detailed discussion of such models may be found in Arnold et al. (1999), Chapter 5.

7 Envoi

The survey presented in this paper is far from complete. A more detailed and extensive survey (though somewhat out of date) can be found in Arnold (1983). A revision of that book is, however, currently in preparation. In the interim, see Arnold (2008) for a more up-to-date presentation and, as mentioned in Section 4, for more details on inferential strategies. More work is still needed on the development of estimation and hypothesis testing strategies, especially for multivariate Pareto data. Creative Bayesian analyses involving informative priors, in multivariate settings and in cases involving covariates, are also notable for their absence. Finally, I apologize to those readers whose important contributions have been overlooked in this survey. I excuse myself by repeating that the survey is necessarily incomplete. However, please do advise me of any glaring omissions that you might note.



The constructive suggestions supplied by anonymous referees have resulted in a much improved manuscript.

Authors’ Affiliations

Department of Statistics, University of California


  1. Aban IB, Meerschaert MM, Panorska AK: Parameter estimation for the truncated Pareto distribution. J. Amer. Statist. Assoc 101: 270–277. 2006
  2. Akinsete A, Famoye F, Lee C: The beta-Pareto distribution. Statistics 42: 547–563. 2008
  3. Arnold BC: Pareto Distributions. International Cooperative Publishing House, Burtonsville, MD; 1983
  4. Arnold BC: Pareto and generalized Pareto distributions. In Modeling Income Distributions and Lorenz Curves. Edited by: Chotikapanich D. Springer, New York; 2008
  5. Arnold BC, Austin K: Truncated Pareto distributions: flexible, tractable and familiar. Technical Report #150, Department of Statistics, University of California, Riverside, CA; 1987
  6. Arnold BC, Castillo E, Sarabia JM: Bayesian analysis for classical distributions using conditionally specified priors. Sankhya: Ind. J. Stat. Series B 60: 228–245. 1998
  7. Arnold BC, Castillo E, Sarabia JM: Conditional Specification of Statistical Models. Springer, New York; 1999
  8. Arnold BC, Ghosh I: Inference for Pareto data subject to hidden truncation. J. Ind. Soc. Probability Stat 13: 1–16. 2011
  9. Arnold BC, Laguna L: A stochastic mechanism leading to asymptotically Paretian distributions. Business and Economic Statistics Section, Proceedings of the American Statistical Association 1976
  10. Arnold BC, Ng HKT: Flexible bivariate beta distributions. J. Multivariate Anal 102: 1194–1202. 2011
  11. Arnold BC, Robertson CA, Yeh HC: Some properties of a Pareto-type distribution. Sankhya: Ind. J. Stat. Series A 48: 404–408. 1986
  12. Champernowne DG: The theory of income distribution. Econometrica 5: 379–381. 1937
  13. David HA, Nagaraja HN: Order Statistics. Third edition. Wiley, Hoboken, NJ; 2003
  14. Falk M, Hüsler J, Reiss R: Laws of Small Numbers, Extremes and Rare Events. Third edition. Birkhäuser/Springer, Basel; 2011
  15. Feller W: An Introduction to Probability Theory and its Applications, Vol. 2. Second edition. Wiley, New York; 1971
  16. Fisk PR: The graduation of income distributions. Econometrica 29: 171–185. 1961a
  17. Fisk PR: Estimation of location and scale parameters in a truncated grouped sech-square distribution. J. Am. Stat. Assoc 56: 692–702. 1961b
  18. Johnson NL, Kotz S, Balakrishnan N: Continuous Univariate Distributions, Vol. 1. Second edition. Wiley, New York; 1994
  19. Jones MC: Families of distributions arising from distributions of order statistics. TEST 13: 1–43. 2004
  20. Jones MC: Kumaraswamy’s distribution: A beta-type distribution with some tractability advantages. Stat. Methodol 6: 70–81. 2009
  21. Kagan YY, Schoenberg F: Estimation of the upper cutoff parameter for the tapered Pareto distribution. J. Appl. Probability 38A: 158–175. 2001
  22. Kalbfleisch JD, Prentice RL: The Statistical Analysis of Failure Time Data. Wiley, New York; 1980
  23. Keiding N, Kvist K, Hartvig H, Tvede M, Juul S: Estimating time to pregnancy from current durations in a cross-sectional sample. Biostatistics 3: 565–578. 2002
  24. Lwin T: Estimation of the tail of the Paretian law. Skand. Aktuarietidskr 55: 170–178. 1972
  25. Maguire BA, Pearson ES, Wynn AHA: The time intervals between industrial accidents. Biometrika 39: 168–180.
  26. Mardia KV: Multivariate Pareto distributions. Ann. Math. Stat 33: 1008–1015. 1962
  27. Nadarajah S, Cordeiro GM, Ortega EMM: General results for the Kumaraswamy-G distribution. J. Stat. Comput. Simul 82: 951–979. 2011
  28. Pareto V: Cours d’economie Politique, Vol. II. F. Rouge, Lausanne; 1897
  29. Paranaiba PF, Ortega EMM, Cordeiro GM, de Pascoa MAR: The Kumaraswamy Burr XII distribution: theory and practice. J. Stat. Comput. Simul 83: 2117–2143. 2013
  30. Pillai RN: Semi-Pareto processes. J. Appl. Probab 28: 461–465. 1991
  31. Takahasi K: Note on the multivariate Burr’s distribution. Ann. Inst. Statist. Math 17: 257–260. 1965


© Arnold; licensee Springer. 2014

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.