Univariate and multivariate Pareto models

Arnold, Barry C

doi:10.1186/2195-5832-1-11

Review
Open access
Published: 17 June 2014


Journal of Statistical Distributions and Applications, volume 1, Article number: 11 (2014)


Abstract

The Pareto distribution has long been recognized as a suitable model for many non-negative socio-economic variables. Univariate and multivariate variations abound. Some unification is possible by representing the Pareto variables in terms of independent gamma distributed components. Further unification is sometimes possible since some of the frequently used multivariate Pareto models share the same copula. In some cases, inference strategies can be developed to take advantage of the stochastic representations in terms of gamma components.

1 Introduction

Discussion of Pareto and Pareto-like distributions can be traced back to Vilfredo Pareto’s Economics textbook published in Rome in 1897. His observation that the number of persons in a population whose incomes exceed x is often well approximated by Cx^{-α}, for some positive C and some positive α, led inexorably to consideration of the following well-known model for fitting univariate income data.

P (X > x) = {\bar{F}}_{X} (x) = {(x / σ)}^{- α}, x > σ .

(1)

Here σ, the scale parameter, is positive and α (Pareto’s index of inequality) is also positive.

This model is typically referred to as the classical Pareto model. Improved fitting of data is encountered when more general Pareto-like distributions are considered. In this survey, the classical model will be embedded in a hierarchy of more complicated Pareto models. In this hierarchy of generalized Pareto distributions, the classical model will be called the Pareto (I) distribution. Multivariate income distributions are also of interest and, in that arena, a hierarchy of multivariate Pareto distributions is available, paralleling and closely related to the univariate hierarchy.

Even more flexible models have been proposed using these univariate and multivariate Pareto models as building blocks. Several of these will be described in this paper.

The end result is an impressively flexible array of income models from which the researcher can select a parsimonious model for the particular data set at hand. The emphasis in this survey will be on distributional properties of the models but some attention will be paid to estimation and inference strategies.

2 A hierarchy of generalized Pareto models

As the basic distribution in our hierarchy of generalized Pareto models we use the classical Pareto distribution, called here the Pareto (I) distribution. Its survival function is of the form (1). In practice, α is frequently assumed to be larger than 1, so that the distribution has a finite mean. If a random variable X has (1) as its survival function, then we write X ∼ P(I)(σ,α). In this basic model, the parameter σ has a dual role. It is a scale parameter, but it also determines the lower bound of the support of the distribution and so plays, to some extent, the role of a location parameter. A slightly more general model, which separates the roles of the location and scale parameters, has frequently been used.

This distribution will be called the Pareto (II) distribution. Its survival function is of the form

\bar{F} (x) = {[1 + (\frac{x - μ}{σ})]}^{- α}, x > μ

(2)

where μ, the location parameter, is real valued, σ is positive and α is positive. In most applications μ will be non-negative, but negative values for μ pose no problems. If X has (2) as its survival function, we will write X ∼ P(II)(μ,σ,α).

There is an intimate relation between the Pareto (II) distribution and the Pickands generalized Pareto model, which is much used in the study of extreme values and peaks over thresholds. A good general reference for discussion of the role of the Pickands generalized Pareto distribution is the book by Falk et al. (2011). The density of the Pickands generalized Pareto model is

f (x; σ, k) = \frac{1}{σ} {(1 - \frac{kx}{σ})}^{(1 - k) / k} I (x > 0, (kx) / σ < 1) .

(3)

where σ > 0 and -∞ < k < ∞. The density corresponding to k = 0 is obtained by taking the limit as k → 0 in (3). This model, (3), includes three sub-models. When k < 0, it yields a Pareto (II) density (with μ = 0), when k = 0 it yields an exponential density, while for k > 0, it corresponds to a scaled Beta distribution (of the first or standard kind). In fact, several results for the Pareto (II) distribution can be proved to remain valid in the more general context of the Pickands generalized Pareto model. In an income modeling setting, as Pareto observed, heavy-tailed distributions are typically encountered and, for this reason, we will concentrate on the Pareto (II) sub-model. We remark, in passing, that despite Pareto’s insistence on the ubiquity of heavy tails, several authors have utilized scaled Beta distributions as income models, and some have even argued in favor of the exponential distribution as a model. The most general model in our hierarchy, the Feller–Pareto model, will be seen to actually include some light-tailed distributions corresponding to scaled Beta models with an additional location parameter. Applications of such light-tailed models are more likely to be encountered outside of the income distribution context.

Truncated versions of the Pareto (II) distribution, which clearly are not heavy tailed, are sometimes appropriate models for data sets which, for some reason, exclude large values. Such distributions are of the form

F (x) = \begin{cases} 0, & x \leq μ, \\ \frac{1 - {(1 + \frac{x - μ}{σ})}^{- α}}{1 - {(1 + \frac{τ - μ}{σ})}^{- α}}, & μ < x < τ, \\ 1, & x \geq τ, \end{cases}

(4)

where -∞ < μ < τ < ∞, σ > 0, and α > 0. Some discussion of such models may be found in Aban et al. (2006) and Arnold and Austin (1987).

The Pareto (III) distribution is a variant model with tail behavior comparable to that of the Pareto (II) distribution. Its survival function is of the form

\bar{F} (x) = {[1 + {(\frac{x - μ}{σ})}^{1 / γ}]}^{- 1}, x > μ

(5)

where μ is real, σ is positive and γ is positive. We will call γ the inequality parameter. If μ = 0 and γ ≤ 1, then γ turns out to be precisely the Gini index of inequality for this distribution. If X has (5) as its survival function, we will write X ∼ P(III)(μ,σ,γ).

If we introduce both a shape and an inequality parameter, we arrive at the Pareto (IV) family:

\bar{F} (x) = {[1 + {(\frac{x - μ}{σ})}^{1 / γ}]}^{- α}, x > μ

(6)

where μ (location) is real, σ (scale) is positive, γ (inequality) is positive and α (shape) is positive. Although we continue to call γ the inequality parameter, it will only be identifiable with the Gini index when α = 1 and μ = 0. One might argue instead that in the P(IV) model both γ and α would be best described as shape parameters, since neither of them has a direct inequality interpretation. An anonymous referee points out that the two parameters γ and α govern the behavior of the P(IV) density as x approaches μ from above and as x approaches infinity. Thus, f(x;μ,σ,γ,α) ∼ x^{-(α/γ)-1} as x → ∞ and f(x;μ,σ,γ,α) ∼ (x-μ)^{(1/γ)-1} as x → μ. The referee suggests that an argument might be advanced in favor of a reparameterization in which we define β = α/γ, to highlight the roles of γ and β in determining the limiting behavior of the density. However, in this paper, to be consistent with the notation in Arnold (1983), we will continue with the μ,σ,γ,α parameterization and continue to call γ the inequality parameter. If a random variable X has (6) as its survival function, we will write X ∼ P(IV)(μ,σ,γ,α). Note that the Pareto (IV) distribution, with μ = 0, is also known as a Burr Type XII distribution.

The three more specialized families, P(I)-P(III), may be identified as special cases of the Pareto (IV) family as follows:

\begin{array}{c} P (I) (σ, α) = P (IV) (σ, σ, 1, α), \\ P (II) (μ, σ, α) = P (IV) (μ, σ, 1, α), \\ P (III) (μ, σ, γ) = P (IV) (μ, σ, γ, 1) . \end{array}

(7)

Feller (1971), p. 49, suggested a different definition of a Pareto distribution. It can be recognized as the distribution of a ratio of two independent gamma variables (a distribution also known as the Beta distribution of the second kind). By considering a linear function of a power of such a random variable, we arrive at a very general family, called the Feller–Pareto family. Thus if X_i ∼ Γ(δ_i,1), i = 1,2, are independent random variables, and if for μ real, σ > 0 and γ > 0 we define

W = μ + σ {(X_{2} / X_{1})}^{γ},

(8)

then W has a Feller–Pareto distribution, and we write W ∼ FP(μ,σ,γ,δ₁,δ₂).

It may be verified that the Pareto (IV) distributions are identifiable with the Feller–Pareto distributions with δ₂ = 1, i.e.,

P (IV) (μ, σ, γ, α) = FP (μ, σ, γ, α, 1) .

(9)

The density of the general Feller–Pareto distribution defined by (8) is of the form

\begin{array}{r} f_{W} (w) = {(\frac{w - μ}{σ})}^{(δ_{2} / γ) - 1} {[1 + {(\frac{w - μ}{σ})}^{1 / γ}]}^{- δ_{1} - δ_{2}} / [γ σ B (δ_{1}, δ_{2})], \\ w > μ . \end{array}

(10)

The corresponding survival function is obtainable from tables of the incomplete beta function. For many computations it is simpler to work directly with the representation (8). The Pareto (IV) distributions correspond to the case in which X₁ has a gamma distribution while X₂ has an exponential distribution (δ₂ = 1, as in (9)). The Pareto (III) distributions are encountered when both X₁ and X₂ are exponential variables.
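The representation (8) translates directly into a sampler. The following is a minimal sketch, assuming only Python's standard `random` module; the function names and parameter values are illustrative and not part of the paper. It draws Feller–Pareto variates with δ₂ = 1 and compares the empirical survival function with the Pareto (IV) form (6), as relation (9) suggests.

```python
import random


def sample_feller_pareto(mu, sigma, gamma, delta1, delta2, rng):
    """Draw one FP(mu, sigma, gamma, delta1, delta2) variate via W = mu + sigma (X2/X1)^gamma."""
    x1 = rng.gammavariate(delta1, 1.0)  # X_1 ~ Gamma(delta1, 1), scale 1
    x2 = rng.gammavariate(delta2, 1.0)  # X_2 ~ Gamma(delta2, 1), scale 1
    return mu + sigma * (x2 / x1) ** gamma


def pareto4_survival(x, mu, sigma, gamma, alpha):
    """Pareto (IV) survival function, equation (6)."""
    if x <= mu:
        return 1.0
    return (1.0 + ((x - mu) / sigma) ** (1.0 / gamma)) ** (-alpha)


def empirical_survival(sample, x):
    """Fraction of the sample strictly exceeding x."""
    return sum(1 for s in sample if s > x) / len(sample)


if __name__ == "__main__":
    rng = random.Random(0)
    mu, sigma, gamma, alpha = 1.0, 2.0, 0.5, 3.0  # illustrative values
    # delta1 = alpha and delta2 = 1 should reproduce P(IV)(mu, sigma, gamma, alpha).
    draws = [sample_feller_pareto(mu, sigma, gamma, alpha, 1.0, rng) for _ in range(20000)]
    for x in (1.5, 2.0, 3.0):
        print(x, empirical_survival(draws, x), pareto4_survival(x, mu, sigma, gamma, alpha))
```

The two printed columns should agree up to Monte Carlo error; the check is a sanity test, not a proof.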

Kalbfleisch and Prentice (1980) call the Feller–Pareto density (with μ = 0) a generalized F density. Instead we might describe a Feller–Pareto variable as being a location and scale transform of a generalized beta variable of the second kind. Recall that a beta variable of the second kind is just a ratio of independent gamma variables.

An even more general model might be built using independent variables X_i ∼ Γ(δ_i,1), i = 1,2. One could define

W = μ + σ (\frac{X_{2}^{γ_{2}}}{X_{1}^{γ_{1}}}) .

(11)

The additional flexibility provided by the introduction of such a sixth parameter in the model has not been investigated.

The full array of generalized univariate Pareto distributions to be considered in this paper is subsumed in the Feller–Pareto family, and a unified derivation of many distributional results is possible. However, in the case of Pareto (I)-(IV) distributions, some alternative representations are also useful.

A random variable X has a P(I)(σ,α) distribution if it is of the form

X = σ e^{V / α}

(12)

where V is a standard exponential random variable. An analogous representation of a Pareto (II) variable in terms of an exponential random variable is possible, i.e.,

X = μ + σ (e^{V / α} - 1) .

(13)

Likewise Pareto (III) and (IV) variables can be represented as X = μ + σ(e^{V} - 1)^γ and X = μ + σ(e^{V/α} - 1)^γ respectively. The representation (12) for a classical Pareto variable (i.e., Pareto (I)) highlights the useful observation that the logarithm of such a variable has a shifted exponential distribution. This will permit the recognition of many distributional properties of Pareto (I) variables as reflections of parallel properties of exponential variables.

A second important representation of the Pareto (II) distribution, known to Maguire et al. (1952), is as a mixture of exponentials. We may describe it in terms of the conditional survival function, given an auxiliary gamma distributed random variable Z. Thus, if

P (X > x ∣ Z = z) = e^{- z (x - μ) / σ}, x > μ,

i.e., a (translated) exponential distribution, and if Z ∼ Γ(α,1), then it follows that unconditionally X∼P(II)(μ,σ,α). Alternatively, this can be viewed as being equivalent to the representation in (8) after setting γ = δ₂ = 1.

This representation of the Pareto (II) distribution as a gamma mixture of exponential distributions is often encountered in reliability and survival contexts, see e.g., Keiding et al. (2002). It is also familiar in Bayesian analysis of exponential data, where the gamma density enters as a convenient prior. In this context the Pareto (II) distribution is sometimes called the Lomax distribution.
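The mixture construction can be checked by simulation. The sketch below assumes only Python's standard library, and its function names and parameter values are illustrative. It first draws Z ∼ Γ(α,1), then draws X from the translated exponential distribution with rate Z/σ, and compares the result with the Pareto (II) survival function (2).

```python
import random


def sample_pareto2_mixture(mu, sigma, alpha, rng):
    """Draw X by first drawing Z ~ Gamma(alpha, 1), then X - mu ~ Exponential(rate Z / sigma)."""
    z = rng.gammavariate(alpha, 1.0)
    return mu + rng.expovariate(z / sigma)  # expovariate takes a rate, not a mean


def pareto2_survival(x, mu, sigma, alpha):
    """Pareto (II) survival function, equation (2)."""
    if x <= mu:
        return 1.0
    return (1.0 + (x - mu) / sigma) ** (-alpha)


def empirical_survival(sample, x):
    """Fraction of the sample strictly exceeding x."""
    return sum(1 for s in sample if s > x) / len(sample)


if __name__ == "__main__":
    rng = random.Random(0)
    mu, sigma, alpha = 0.0, 1.5, 2.5  # illustrative values
    draws = [sample_pareto2_mixture(mu, sigma, alpha, rng) for _ in range(20000)]
    for x in (0.5, 1.0, 3.0):
        print(x, empirical_survival(draws, x), pareto2_survival(x, mu, sigma, alpha))
```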

The Pareto (III) distribution was apparently first considered by Fisk (1961a, 1961b), who called it a sech² distribution. It is closely related to the logistic distribution. We say that a random variable X has a logistic (μ,σ) distribution, if its distribution function assumes the form

F_{X} (x) = {[1 + e^{- (x - μ) / σ}]}^{- 1}, - \infty < x < \infty

and we write X ∼ L(μ,σ). It is not difficult to verify that

X \sim L (μ, σ) \Leftrightarrow e^{X} \sim P (III) (0, e^{μ}, σ) .

(14)

It is as a consequence of the relation (14) that the Pareto (III) distribution, with μ = 0, is sometimes called the log-logistic distribution.

Remark 1. Johnson et al. (1994) refer to a Pareto distribution of the third kind that is not to be confused with the Pareto (III) distribution discussed in this paper. The survival function of this “third kind” distribution is of the form

\bar{F} (x) = {(1 + \frac{x}{σ})}^{- α} e^{- β x}, x > 0 .

(15)

This distribution, which was suggested by Pareto (1897), was proposed to accommodate cases in which the basic Pareto model (1) was inadequate for fitting certain data configurations. This model is closely related to the Pareto (II) distribution, but with an additional exponential factor. Note that it could be viewed as the distribution of the minimum of a Pareto (II) variable (with μ = 0) and an independent exponential variable. This model has been used infrequently, but recently it has reappeared, this time called a tapered Pareto distribution (Kagan and Schoenberg 2001).

2.1 Distributional properties

The Feller–Pareto distributions are unimodal. The mode is at μ if γ > δ₂, while if γ ≤ δ₂, we find (here W ∼ FP(μ,σ,γ,δ₁,δ₂))

mode (W) = μ + σ {[(δ_{2} - γ) / (δ_{1} + γ)]}^{γ}

(16)

In order to compute moments of the Pareto distributions, it is convenient to work with the representation (8). With W ∼ FP(μ,σ,γ,δ₁,δ₂), if we define W^∗ = (W - μ)/σ, then W^∗ ∼ FP(0,1,γ,δ₁,δ₂), i.e., $W^{*} \overset{d}{=} {(X_{2} / X_{1})}^{γ}$ where X_i ∼ Γ(δ_i,1), i = 1,2, are independent random variables. It can then be readily verified that for a real number τ, the τ’th moment of W^∗, when it exists, is of the form

E (W^{* τ}) = \frac{Γ (δ_{1} - γ τ) Γ (δ_{2} + γ τ)}{Γ (δ_{1}) Γ (δ_{2})}, - (δ_{2} / γ) < τ < (δ_{1} / γ) .

(17)

From this expression moments for the Feller-Pareto and the P(II)-P(IV) distributions are readily obtained.
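As a worked instance of (17), the sketch below evaluates the moment formula with log-gamma functions from Python's standard `math` module and recovers the Pareto (II) mean through P(II)(μ,σ,α) = FP(μ,σ,1,α,1). The function names are illustrative assumptions, not notation from the paper.

```python
import math


def fp_standard_moment(tau, gamma, delta1, delta2):
    """E(W*^tau) for W* ~ FP(0, 1, gamma, delta1, delta2), equation (17)."""
    if not (-delta2 / gamma < tau < delta1 / gamma):
        raise ValueError("moment does not exist for this tau")
    log_moment = (
        math.lgamma(delta1 - gamma * tau)
        + math.lgamma(delta2 + gamma * tau)
        - math.lgamma(delta1)
        - math.lgamma(delta2)
    )
    return math.exp(log_moment)


def pareto2_mean(mu, sigma, alpha):
    """Mean of P(II)(mu, sigma, alpha), using E(W) = mu + sigma * E(W*) with gamma = delta2 = 1."""
    return mu + sigma * fp_standard_moment(1.0, 1.0, alpha, 1.0)


if __name__ == "__main__":
    # Gamma(alpha - 1) Gamma(2) / (Gamma(alpha) Gamma(1)) = 1 / (alpha - 1),
    # so the mean should equal mu + sigma / (alpha - 1).
    print(pareto2_mean(0.0, 2.0, 3.0))
```

For α ≤ 1 the call raises `ValueError`, matching the restriction τ < δ₁/γ in (17).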

Moments of the Pareto (I) distribution cannot be obtained in this way since, for it, μ = σ ≠ 0. They are obtainable by direct integration:

(Pareto (I)) E (X^{τ}) = σ^{τ} {(1 - \frac{τ}{α})}^{- 1}, τ < α .

(18)

Sums of independent Pareto variables typically do not have analytically tractable distributions. If we multiply independent Pareto variables rather than adding them, it is sometimes possible to get simple expressions for the density of the resulting product. In the case of the Pareto (I) distribution the key lies in utilization of representation (12). Thus, if X₁,X₂,…,X_n are independent Pareto (I) variables with X_i ∼ P(I)(σ_i,α_i), then their product W has the representation

W = (\prod_{i = 1}^{n} σ_{i}) exp (\sum_{i = 1}^{n} (V_{i} / α_{i}))

(19)

where the V_i’s are independent standard exponential variables. In some cases expressions are available for the distribution of $\sum_{i = 1}^{n} V_{i} / α_{i}$ . In particular, if α_i = α,(i = 1,2,…,n), then $\sum_{i = 1}^{n} V_{i} / α \sim Γ (n, 1 / α)$ , and we may readily obtain the density of W.

A second case in which simple closed form expressions are available is one in which all the α_i’s are distinct. In this situation we can use a result for weighted sums of exponentials given in, for example, Feller (1971), p. 40, and write the survival function of the product in the form:

P (W > w) = \sum_{i = 1}^{n} {(\frac{w}{σ})}^{- α_{i}} \prod_{\substack{k = 1 \\ k \neq i}}^{n} (\frac{α_{k}}{α_{k} - α_{i}}), w > σ

(20)

where $σ = \prod_{i = 1}^{n} σ_{i}$ and α_i ≠ α_j if i ≠ j. The distribution of products of independent Pareto (IV) variables with μ_i’s equal to 0 can, via the representation (8), be reduced to a problem involving the distribution of products of powers of independent gamma random variables. Unlike the Pareto (I) case, closed form expressions for the resulting density are apparently not obtainable, although moments of such products are readily available.
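The distinct-α case can be illustrated numerically. The sketch below is an assumption-laden illustration using only Python's standard library: it writes the weights in the standard form for a sum of independent exponentials with distinct rates, namely the product over k ≠ i of α_k/(α_k − α_i), which makes the survival function equal to 1 at w = σ, and compares it with a simulation based on representation (19). Function names and parameter values are illustrative.

```python
import math
import random


def product_survival(w, sigmas, alphas):
    """P(W > w) for a product of independent P(I)(sigma_i, alpha_i) variables, distinct alphas."""
    sigma = math.prod(sigmas)
    if w <= sigma:
        return 1.0
    total = 0.0
    for i, a_i in enumerate(alphas):
        weight = 1.0
        for k, a_k in enumerate(alphas):
            if k != i:
                weight *= a_k / (a_k - a_i)  # weight for the i-th power-law term
        total += weight * (w / sigma) ** (-a_i)
    return total


def sample_product(sigmas, alphas, rng):
    """Draw W = (prod sigma_i) * exp(sum V_i / alpha_i), V_i standard exponential, as in (19)."""
    log_sum = sum(rng.expovariate(1.0) / a for a in alphas)
    return math.prod(sigmas) * math.exp(log_sum)


if __name__ == "__main__":
    rng = random.Random(0)
    sigmas, alphas = [1.0, 1.0], [2.0, 3.0]  # illustrative values
    draws = [sample_product(sigmas, alphas, rng) for _ in range(20000)]
    for w in (1.5, 2.0, 4.0):
        empirical = sum(1 for d in draws if d > w) / len(draws)
        print(w, empirical, product_survival(w, sigmas, alphas))
```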

The Pareto (IV) family is closed under minimization when certain parameters are common to the minimands. Thus, if X₁ and X₂ are independent random variables with X_i ∼ P(IV)(μ,σ,γ,α_i),i = 1,2, then

min (X_{1}, X_{2}) \sim P (IV) (μ, σ, γ, α_{1} + α_{2}) .

(21)

Note that in this situation the X_i’s share common values for the parameters μ,σ and γ.

Pareto (III) variables exhibit an interesting closure property with respect to geometric minimization and maximization. Indeed, this was used as a justification for use of the Pareto (III) distribution as a suitable model for income distributions based on a scenario involving competitive bidding for employment (Arnold and Laguna 1976). For this, consider a sequence X₁,X₂,… of i.i.d. Pareto (III) (μ,σ,γ) random variables. Suppose that for some p ∈ (0,1), N_p is independent of the X_i’s and has a geometric (p) distribution, i.e., P(N_p = n) = p(1 - p)^{n-1}, n = 1,2,…. Define the corresponding random extrema by

U_{p} = \min \{X_{1}, X_{2}, \dots, X_{N_{p}}\},

(22)

and

V_{p} = \max \{X_{1}, X_{2}, \dots, X_{N_{p}}\} .

(23)

It is readily verified, by conditioning on N_p, that U_p and V_p each have Pareto (III) distributions. Thus

U_{p} \sim P (III) (μ, σ p^{γ}, γ),

(24)

and

V_{p} \sim P (III) (μ, σ p^{- γ}, γ) .

(25)

Observe that, if μ = 0, then

p^{- γ} U_{p} \overset{d}{=} p^{γ} V_{p} \overset{d}{=} X_{1} .

Some characterization results based on this observation were discussed in Arnold et al. (1986).
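The closure results (24) and (25) are easy to probe by simulation. The following is a minimal sketch using only Python's standard library, with illustrative function names and parameter values. It draws geometric minima and maxima of Pareto (III) variables and compares their empirical survival functions with those of P(III)(μ, σp^γ, γ) and P(III)(μ, σp^{-γ}, γ).

```python
import math
import random


def sample_pareto3(mu, sigma, gamma, rng):
    """Draw a P(III)(mu, sigma, gamma) variate via X = mu + sigma (e^V - 1)^gamma."""
    v = rng.expovariate(1.0)
    return mu + sigma * (math.exp(v) - 1.0) ** gamma


def sample_geometric(p, rng):
    """Draw N with P(N = n) = p (1 - p)^(n - 1), n = 1, 2, ..."""
    n = 1
    while rng.random() >= p:  # failure with probability 1 - p
        n += 1
    return n


def sample_geometric_extremes(mu, sigma, gamma, p, rng):
    """Return (U_p, V_p): the minimum and maximum of N_p i.i.d. Pareto (III) draws."""
    n = sample_geometric(p, rng)
    xs = [sample_pareto3(mu, sigma, gamma, rng) for _ in range(n)]
    return min(xs), max(xs)


def pareto3_survival(x, mu, sigma, gamma):
    """Pareto (III) survival function, equation (5)."""
    if x <= mu:
        return 1.0
    return 1.0 / (1.0 + ((x - mu) / sigma) ** (1.0 / gamma))


if __name__ == "__main__":
    rng = random.Random(0)
    mu, sigma, gamma, p = 0.0, 1.0, 0.5, 0.3  # illustrative values
    pairs = [sample_geometric_extremes(mu, sigma, gamma, p, rng) for _ in range(20000)]
    x = 0.5
    emp_min = sum(1 for u, _ in pairs if u > x) / len(pairs)
    emp_max = sum(1 for _, v in pairs if v > x) / len(pairs)
    print(emp_min, pareto3_survival(x, mu, sigma * p ** gamma, gamma))
    print(emp_max, pareto3_survival(x, mu, sigma * p ** (-gamma), gamma))
```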

It is possible to write down expressions for the densities of order statistics from a Pareto (IV) sample. The corresponding distribution functions will involve incomplete beta functions. Simulation of such order statistics may be accomplished by utilizing the relatively simple form of the Pareto (IV) quantile function, i.e.,

F^{- 1} (u) = μ + σ {[{(1 - u)}^{- 1 / α} - 1]}^{γ} .

(26)

From this we have that if X_{i:n} is the ith order statistic from a sample of size n from a Pareto (IV) distribution, then

X_{i : n} \overset{d}{=} F^{- 1} (U_{i : n})

(27)

where $\overset{d}{=}$ means that the two random variables are identically distributed, where F^{-1} is as given in (26), and where U_{i:n} is the ith order statistic of a sample of size n from a uniform (0,1) distribution. It is well known (see e.g. David and Nagaraja 2003) that U_{i:n} ∼ Beta(i,n - i + 1).

In some special cases the density of the ith order statistic (27) assumes a known form. For example:

X_{i}\text{'s} \sim P (III) (μ, σ, γ) \Rightarrow X_{i : n} \sim FP (μ, σ, γ, n - i + 1, i) .

(28)

Another case involves minima:

X_{i}\text{'s} \sim P (IV) (μ, σ, γ, α) \Rightarrow X_{1 : n} \sim P (IV) (μ, σ, γ, n α) .

(29)
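The simulation recipe in (26) and (27) can be sketched as follows, assuming only Python's standard library; the function names and parameter values are illustrative. A Beta(i, n - i + 1) variate is pushed through the Pareto (IV) quantile function, and the case i = 1 is checked against the closed form (29).

```python
import random


def pareto4_quantile(u, mu, sigma, gamma, alpha):
    """Pareto (IV) quantile function, equation (26), for 0 <= u < 1."""
    return mu + sigma * ((1.0 - u) ** (-1.0 / alpha) - 1.0) ** gamma


def sample_order_statistic(i, n, mu, sigma, gamma, alpha, rng):
    """Draw X_{i:n} as F^{-1}(U_{i:n}) with U_{i:n} ~ Beta(i, n - i + 1), equation (27)."""
    u = rng.betavariate(i, n - i + 1)
    return pareto4_quantile(u, mu, sigma, gamma, alpha)


def pareto4_survival(x, mu, sigma, gamma, alpha):
    """Pareto (IV) survival function, equation (6)."""
    if x <= mu:
        return 1.0
    return (1.0 + ((x - mu) / sigma) ** (1.0 / gamma)) ** (-alpha)


if __name__ == "__main__":
    rng = random.Random(0)
    mu, sigma, gamma, alpha, n = 0.0, 1.0, 1.0, 2.0, 5  # illustrative values
    minima = [sample_order_statistic(1, n, mu, sigma, gamma, alpha, rng) for _ in range(20000)]
    x = 0.1
    empirical = sum(1 for m in minima if m > x) / len(minima)
    # By (29) the sample minimum should follow P(IV)(mu, sigma, gamma, n * alpha).
    print(empirical, pareto4_survival(x, mu, sigma, gamma, n * alpha))
```

This avoids generating and sorting a full sample of size n for each draw, which matters when n is large and only one order statistic is needed.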

3 Some related extensions

A variety of models have been proposed to add more flexibility to the generalized Pareto models discussed in Section 2. Most of them include Pareto models as special cases. In this Section we will make note of a selection of these models.

Many early researchers modeled the logarithm of income (called income power by Champernowne (1937)). Thus, instead of postulating a simple distribution for income, a relatively simple distribution was assumed for some function of income. More flexibility may be introduced by considering a parametric family of monotonic transformations of the income data whose parameters must be estimated from the data. For example, we might begin with a parametric family of increasing functions $ψ (x; \underline{τ})$ with corresponding inverse functions $ψ^{- 1} (x; \underline{τ})$ and assume that $ψ^{- 1} (X; \underline{τ})$ has a Pareto (IV) (0,σ,γ,α) distribution. If we denote the corresponding P(IV)(0,σ,γ,α) distribution function by F_{σ,γ,α}(x), then the distribution function of X will be

F_{X} (x; σ, γ, α, \underline{τ}) = F_{σ, γ, α} (ψ^{- 1} (x; \underline{τ})) .

(30)

A parallel extension involves quantile functions instead of distribution functions. For this the quantile function of X is assumed to be of the form

F_{X}^{- 1} (u; σ, γ, α, \underline{τ}) = F_{σ, γ, α}^{- 1} (\tilde{ψ} (u; \underline{τ})) .

(31)

where $\tilde{ψ} (u; \underline{τ})$ is a parametric family of monotone functions mapping (0,1) onto (0,1). A popular model of this genre, introduced by Jones (2004), makes use of the family of quantile functions of Beta distributions. The density function of this Beta-generalized Pareto distribution is given by

f_{X} (x; σ, γ, α, λ_{1}, λ_{2}) = \frac{α {[1 - {(1 + {(\frac{x}{σ})}^{1 / γ})}^{- α}]}^{λ_{1} - 1} {(1 + {(\frac{x}{σ})}^{1 / γ})}^{- α λ_{2} - 1} {(\frac{x}{σ})}^{(1 / γ) - 1}}{σ γ B (λ_{1}, λ_{2})},

(32)

where x ∈ (0,∞). Of course, if λ₁ = λ₂ = 1, the Beta-generalized distribution simplifies to become a Pareto (IV) distribution with μ = 0.

Another popular model of the form (31) involves the simple choice $\tilde{ψ} (u) = u^{1 / θ}$ where θ > 0. In such a case we have

F_{X} (x; σ, γ, α, θ) = {[F_{σ, γ, α} (x)]}^{θ}, x > 0,

(33)

and the distribution is usually called the exponentiated generalized Pareto distribution. It can be recognized as a special case of the Beta-generalized Pareto model with the parameters chosen to be λ₁ = θ and λ₂ = 1.
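Both reductions of the density (32) can be verified numerically. The sketch below assumes only Python's standard `math` module, with illustrative function names. It checks that λ₁ = λ₂ = 1 returns the Pareto (IV) density with μ = 0, and that λ₁ = θ, λ₂ = 1 returns θF^{θ-1}f, the derivative of the exponentiated distribution function (33).

```python
import math


def beta_fn(a, b):
    """Beta function B(a, b) computed through log-gamma for numerical stability."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))


def pareto4_cdf(x, sigma, gamma, alpha):
    """Distribution function of P(IV)(0, sigma, gamma, alpha) for x > 0."""
    return 1.0 - (1.0 + (x / sigma) ** (1.0 / gamma)) ** (-alpha)


def pareto4_pdf(x, sigma, gamma, alpha):
    """Density of P(IV)(0, sigma, gamma, alpha) for x > 0."""
    t = (x / sigma) ** (1.0 / gamma)
    return alpha / (gamma * sigma) * (x / sigma) ** (1.0 / gamma - 1.0) * (1.0 + t) ** (-alpha - 1.0)


def beta_pareto_pdf(x, sigma, gamma, alpha, lam1, lam2):
    """Beta-generalized Pareto density, equation (32), for x > 0."""
    t = (x / sigma) ** (1.0 / gamma)
    numerator = (
        alpha
        * (1.0 - (1.0 + t) ** (-alpha)) ** (lam1 - 1.0)
        * (1.0 + t) ** (-alpha * lam2 - 1.0)
        * (x / sigma) ** (1.0 / gamma - 1.0)
    )
    return numerator / (sigma * gamma * beta_fn(lam1, lam2))


if __name__ == "__main__":
    sigma, gamma, alpha, theta = 2.0, 0.7, 1.5, 2.5  # illustrative values
    for x in (0.5, 1.0, 4.0):
        plain = pareto4_pdf(x, sigma, gamma, alpha)
        exponentiated = theta * pareto4_cdf(x, sigma, gamma, alpha) ** (theta - 1.0) * plain
        print(x, beta_pareto_pdf(x, sigma, gamma, alpha, 1.0, 1.0), plain)
        print(x, beta_pareto_pdf(x, sigma, gamma, alpha, theta, 1.0), exponentiated)
```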

Instead of the Beta distribution, one might use the Kumaraswamy distribution to obtain an alternative generalized Pareto distribution. First, we must recall the definition of the Kumaraswamy distribution. We say that X has a Kumaraswamy (λ₁,λ₂) distribution if its density and distribution functions are:

f_{K} (x) = λ_{1} λ_{2} x^{λ_{1} - 1} {(1 - x^{λ_{1}})}^{λ_{2} - 1}, 0 < x < 1,

(34)

and

F_{K} (x) = 1 - {(1 - x^{λ_{1}})}^{λ_{2}}, 0 < x < 1 .

(35)

See Jones (2009) for a comprehensive introduction to the Kumaraswamy distribution. Let F_P(x) denote the Pareto (IV) distribution function and suppose that K has a Kumaraswamy (λ₁,λ₂) distribution. If we define $Y = F_{P}^{- 1} (K)$, then Y has a Kumaraswamy-Pareto (IV) distribution with corresponding density

f_{Y} (y) = f_{K} (F_{P} (y)) f_{P} (y) .

Akinsete et al. (2008) consider some special subcases of the Beta-generalized Pareto distribution, while Paranaiba et al. (2013) discuss the Kumaraswamy-generalized Pareto distribution. Submodels of the Kumaraswamy-generalized Pareto model are often of interest. For example the exponentiated generalized Pareto distribution (33) is such a submodel.
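Because the Kumaraswamy distribution function (35) inverts in closed form, the construction Y = F_P^{-1}(K) gives a simple sampler. The following sketch uses only Python's standard library, and its function names and parameter values are illustrative assumptions. It compares the empirical distribution function of Y with F_K(F_P(y)).

```python
import random


def kumaraswamy_cdf(x, lam1, lam2):
    """Kumaraswamy distribution function, equation (35), for 0 < x < 1."""
    return 1.0 - (1.0 - x ** lam1) ** lam2


def kumaraswamy_quantile(u, lam1, lam2):
    """Inverse of (35): solve u = 1 - (1 - x^lam1)^lam2 for x."""
    return (1.0 - (1.0 - u) ** (1.0 / lam2)) ** (1.0 / lam1)


def pareto4_cdf(y, mu, sigma, gamma, alpha):
    """Pareto (IV) distribution function, one minus the survival function (6)."""
    if y <= mu:
        return 0.0
    return 1.0 - (1.0 + ((y - mu) / sigma) ** (1.0 / gamma)) ** (-alpha)


def pareto4_quantile(u, mu, sigma, gamma, alpha):
    """Pareto (IV) quantile function, equation (26)."""
    return mu + sigma * ((1.0 - u) ** (-1.0 / alpha) - 1.0) ** gamma


def sample_kumaraswamy_pareto4(mu, sigma, gamma, alpha, lam1, lam2, rng):
    """Draw Y = F_P^{-1}(K) with K ~ Kumaraswamy(lam1, lam2)."""
    k = kumaraswamy_quantile(rng.random(), lam1, lam2)
    return pareto4_quantile(k, mu, sigma, gamma, alpha)


if __name__ == "__main__":
    rng = random.Random(0)
    params = (0.0, 1.0, 0.8, 2.0)  # mu, sigma, gamma, alpha (illustrative)
    lam1, lam2 = 2.0, 3.0
    draws = [sample_kumaraswamy_pareto4(*params, lam1, lam2, rng) for _ in range(20000)]
    for y in (0.2, 0.5, 1.0):
        empirical = sum(1 for d in draws if d <= y) / len(draws)
        print(y, empirical, kumaraswamy_cdf(pareto4_cdf(y, *params), lam1, lam2))
```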

And, of course, one can concatenate these constructions and consider a Beta-Kumaraswamy-Pareto distribution. Going one step further we would arrive at a generalized-Beta-Kumaraswamy-Pareto model. Each generalization adds flexibility at the cost of introducing more parameters. Some degree of parsimony is evidently called for here.

Pillai (1991) suggested an extension of the Pareto (III) distribution, motivated by its closure under geometric minimization. A random variable is said to have a semi-Pareto (III) distribution if its survival function is of the form

\bar{F} (x; μ, σ, γ, p) = {[1 + {(\frac{x - μ}{σ})}^{1 / γ} h (\frac{x - μ}{σ})]}^{- 1}, x > μ,

(36)

where μ ∈ (-∞,∞), σ,γ ∈ (0,∞), p ∈ (0,1) and h(x) is a periodic function of ln x with period -2π/[γ ln p], and with h(0) = 1. The case in which h(x) ≡ 1 for every x corresponds to the usual Pareto (III) model. More generality can be arrived at if h(x) is replaced by a suitable parametric family of periodic functions. Note that, in order for (36) to be a valid survival function, it must be the case that x^{1/γ}h(x) is a non-decreasing function of x.

Hidden truncation or selection models may sometimes provide alternative models that are more suitable than basic Pareto models. The corresponding scenario is one in which the variable X is observed only if a covariable Y takes on a value less than some threshold value. Thus the distribution of the observed X’s is of the form P(X ≤ x|Y ≤ y₀). With this in mind, consider the case in which (X,Y) has a bivariate Pareto (IV) distribution with the following joint survival function.

P (X > x, Y > y) = {[1 + {(\frac{x - μ}{σ})}^{1 / γ} + {(\frac{y - ν}{τ})}^{1 / δ}]}^{- α}, x > μ, y > ν .

(37)

(Such distributions will be discussed in more detail in Section 5). This distribution has Pareto (IV) marginals and has Pareto (IV) conditionals. After suitable reparameterization the corresponding hidden truncation density for X, given that it can only be observed if Y is not too large, is

\begin{aligned} f_{HT} (x; μ, σ, γ, α, θ) = {} & \frac{α {(\frac{x - μ}{σ})}^{(1 / γ) - 1}}{γ σ [1 - {(1 + θ)}^{- α}]} \\ & \times [{(1 + {(\frac{x - μ}{σ})}^{1 / γ})}^{- (α + 1)} - {(1 + {(\frac{x - μ}{σ})}^{1 / γ} + θ)}^{- (α + 1)}], \quad x > μ, \end{aligned}

(38)

where a new parameter θ = [(y₀-ν)/τ]^1/δ has been introduced. In this model, μ is a real valued parameter, often positive, while all of the other parameters, σ,γ,α and θ are positive valued. An alternative representation of this density is possible as follows.

\begin{array}{l} f_{HT} (x; μ, σ, γ, α, θ) \\ = \frac{1}{1 - {(1 + θ)}^{- α}} [\frac{α}{γ σ} {(\frac{x - μ}{σ})}^{(1 / γ) - 1} {[1 + {(\frac{x - μ}{σ})}^{1 / γ}]}^{- (α + 1)}] \\ - \frac{{(1 + θ)}^{- α}}{1 - {(1 + θ)}^{- α}} [\frac{α}{γ σ_{1}} {(\frac{x - μ}{σ_{1}})}^{(1 / γ) - 1} {[1 + {(\frac{x - μ}{σ_{1}})}^{1 / γ}]}^{- (α + 1)}] \end{array}

(39)

where σ₁ = σ(1 + θ)^γ. This is a linear combination of two Pareto (IV) densities, but it is not a convex combination since, although the coefficients add up to 1, the second coefficient is negative. Motivated by this example, one might also consider k-component linear combinations of Pareto (IV) densities as income models, allowing k to be greater than 2. Such models with positive coefficients are natural candidates for fitting multimodal income data sets which may well have a mixture genesis. Note that, by testing the hypothesis that θ = 0 one can decide whether or not the data set at hand has been subject to hidden truncation. More detailed discussion of these hidden truncation Pareto models, in the Pareto (II) case, may be found in Arnold and Ghosh (2011).
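The equivalence of the two forms (38) and (39) of the hidden truncation density can be confirmed numerically. The sketch below assumes only Python's standard `math` module; function names and parameter values are illustrative. It evaluates the density both ways and also prints the two coefficients of the linear combination.

```python
import math


def pareto4_pdf(x, mu, sigma, gamma, alpha):
    """Density of P(IV)(mu, sigma, gamma, alpha) for x > mu."""
    z = (x - mu) / sigma
    return alpha / (gamma * sigma) * z ** (1.0 / gamma - 1.0) * (1.0 + z ** (1.0 / gamma)) ** (-(alpha + 1.0))


def ht_pdf_direct(x, mu, sigma, gamma, alpha, theta):
    """Hidden truncation density in the form (38)."""
    z = (x - mu) / sigma
    t = z ** (1.0 / gamma)
    front = alpha * z ** (1.0 / gamma - 1.0) / (gamma * sigma * (1.0 - (1.0 + theta) ** (-alpha)))
    return front * ((1.0 + t) ** (-(alpha + 1.0)) - (1.0 + t + theta) ** (-(alpha + 1.0)))


def ht_coefficients(alpha, theta):
    """Coefficients of the two Pareto (IV) densities in (39); they sum to 1, the second is negative."""
    c = (1.0 + theta) ** (-alpha)
    return 1.0 / (1.0 - c), -c / (1.0 - c)


def ht_pdf_combination(x, mu, sigma, gamma, alpha, theta):
    """Hidden truncation density in the form (39), with sigma_1 = sigma (1 + theta)^gamma."""
    sigma1 = sigma * (1.0 + theta) ** gamma
    w1, w2 = ht_coefficients(alpha, theta)
    return w1 * pareto4_pdf(x, mu, sigma, gamma, alpha) + w2 * pareto4_pdf(x, mu, sigma1, gamma, alpha)


if __name__ == "__main__":
    args = (1.0, 2.0, 0.6, 1.8, 0.9)  # mu, sigma, gamma, alpha, theta (illustrative)
    print(ht_coefficients(args[3], args[4]))
    for x in (1.5, 3.0, 10.0):
        print(x, ht_pdf_direct(x, *args), ht_pdf_combination(x, *args))
```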

4 Inference, briefly

Suppose that X₁,X₂,…,X_n are independent identically distributed random variables with a common Pareto (IV) distribution. The sample size should be reasonably large, since we have four parameters to estimate. The sample minimum, or some slightly corrected version of it, will be a suitable estimate of the location parameter μ. After subtracting it from each of the observations, the remaining three parameters may be estimated using maximum likelihood. The corresponding Fisher information matrix is available (as indeed is the Fisher information matrix for the Feller–Pareto model with μ = 0).

Either a global search or numerical solution of the likelihood equations will be required to identify the location of the maximum of the likelihood function. In the Pareto (I) case, a variety of alternative estimates are available including best unbiased estimates. Alternatively, in the Pareto (I) case one can take logarithms of the data and arrive at a shifted exponential model, for which many estimation strategies have been developed.
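For the Pareto (I) case the log transformation makes estimation explicit. The sketch below is an illustration rather than a procedure taken from this paper: it uses the standard maximum likelihood estimates for the classical Pareto model, the sample minimum for σ and n divided by the sum of log(x_i/σ̂) for α, and tests them on data simulated through representation (12). Only Python's standard library is assumed, and the function names are illustrative.

```python
import math
import random


def sample_pareto1(sigma, alpha, rng):
    """Draw a P(I)(sigma, alpha) variate via X = sigma * exp(V / alpha), V standard exponential."""
    return sigma * math.exp(rng.expovariate(1.0) / alpha)


def pareto1_mle(data):
    """Standard maximum likelihood estimates (sigma_hat, alpha_hat) for the Pareto (I) model."""
    n = len(data)
    sigma_hat = min(data)  # the likelihood increases in sigma up to the sample minimum
    log_sum = sum(math.log(x / sigma_hat) for x in data)  # logs are shifted exponential
    alpha_hat = n / log_sum
    return sigma_hat, alpha_hat


if __name__ == "__main__":
    rng = random.Random(0)
    true_sigma, true_alpha = 2.0, 3.0  # illustrative values
    data = [sample_pareto1(true_sigma, true_alpha, rng) for _ in range(5000)]
    print(pareto1_mle(data))
```

These estimates are biased in small samples, which is one reason the alternative estimates mentioned above are of interest.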

A diffuse prior Bayesian analysis can be used for Pareto (IV) data. It will, predictably, yield results similar to those obtained via maximum likelihood. In the Pareto (I) case, Lwin (1972) introduced a conjugate family of priors for (σ,α) which can be used to incorporate some degree of prior knowledge of the parameters. Arnold et al. (1998) suggest use of a more flexible family of what they call conditionally conjugate priors in this setting. These priors are tailor-made for subsequent use of Gibbs sampling algorithms to generate realizations from the corresponding posterior distribution.

More details on parametric inference for Pareto models may be found in Arnold (1983) and Arnold (2008).

5 Multivariate Pareto models

The first author to systematically study k-dimensional Pareto distributions was Mardia (1962). Mardia’s type I multivariate Pareto distribution has the attractive feature that both marginals and conditional distributions are Paretian in nature. We will say that a k-dimensional random vector $\underline{X}$ has a type I multivariate Pareto distribution, if the joint survival function is of the form

{\bar{F}}_{\underline{X}} (\underline{x}) = {[\sum_{i = 1}^{k} (x_{i} / σ_{i}) - k + 1]}^{- α}, x_{i} > σ_{i}

(40)

and we write $\underline{X} \sim M P^{(k)} (I) (\underline{σ}, α)$. The σ_i’s are positive marginal scale parameters. The positive parameter α is an inequality parameter (common to all marginals). It follows from (40) that the one-dimensional marginals are classical Pareto distributions. Thus X_i ∼ P(I)(σ_i,α), i = 1,2,…,k. By setting selected x_i’s equal to σ_i in (40), it is apparent that, for any k₁ < k, all k₁ dimensional marginals are again multivariate Pareto. If we use the notational device $\underline{X} = (\underline{\dot{X}}, \underline{\ddot{X}})$ where $\underline{\dot{X}}$ is k₁ dimensional, with an analogous partition of the vector $\underline{σ} = (\underline{\dot{σ}}, \underline{\ddot{σ}})$, we may write

\underline{\dot{X}} \sim M P^{(k_{1})} (I) (\underline{\dot{σ}}, α) .

(41)

Conditional distributions are also of the form (40), but with a change of location.

It is natural to extend this basic multivariate Pareto model by the introduction of location, scale, inequality and shape parameters in a manner parallel to that used to develop the univariate Pareto (II)-(IV) distributions, as follows:

(MP^(k)(II)) We will say that $\underline{X}$ has a k-dimensional Pareto distribution of type II, if its joint survival function is of the form

\begin{align} {\bar{F}}_{\underline{X}} (\underline{x}) = {[1 + \sum_{i = 1}^{k} (\frac{x_{i} - μ_{i}}{σ_{i}})]}^{- α}, & x_{i} > μ_{i}, \\ i = 1, 2, \dots, k \end{align}

(42)

and we write $\underline{X} \sim M P^{(k)} (II) (\underline{μ}, \underline{σ}, α)$ .

(MP^(k)(III)) $\underline{X}$ has a k-dimensional Pareto distribution of type III, if its joint survival function is of the form

\begin{align} {\bar{F}}_{\underline{X}} (\underline{x}) = {[1 + \sum_{i = 1}^{k} {(\frac{x_{i} - μ_{i}}{σ_{i}})}^{1 / γ_{i}}]}^{- 1}, & x_{i} > μ_{i}, \\ i = 1, 2, \dots, k \end{align}

(43)

and we write $\underline{X} \sim M P^{(k)} (III) (\underline{μ}, \underline{σ}, \underline{γ})$ .

(MP^(k)(IV)) $\underline{X}$ has a k-dimensional Pareto distribution of type IV, if its joint survival function is of the form

\begin{align} {\bar{F}}_{\underline{X}} (\underline{x}) = {[1 + \sum_{i = 1}^{k} {(\frac{x_{i} - μ_{i}}{σ_{i}})}^{1 / γ_{i}}]}^{- α}, & x_{i} > μ_{i}, \\ i = 1, 2, \dots, k \end{align}

(44)

and we write $\underline{X} \sim M P^{(k)} (IV) (\underline{μ}, \underline{σ}, \underline{γ}, α)$ .

The marginals and conditionals of an MP^(k)(II) distribution are again of the MP^(k)(II) form. An MP^(k)(III) distribution has MP^(k)(III) marginals, but not conditionals. However an MP^(k)(IV) distribution does have both its marginals and conditionals of the MP^(k)(IV) form. Specifically, in the MP^(k)(IV) case, using the dot – double dot notation, we have

\dot{\underline{X}} \sim M P^{(k_{1})} (IV) (\underline{\dot{μ}}, \underline{\dot{σ}}, \underline{\dot{γ}}, α)

(45)

and

\dot{\underline{X}} | \ddot{\underline{X}} = \ddot{\underline{x}} \sim M P^{(k_{1})} (IV) (\dot{\underline{μ}}, \dot{\underline{τ}}, \dot{\underline{γ}}, α + k - k_{1}),

(46)

where

τ_{i} = σ_{i} {[1 + \sum_{j = k_{1} + 1}^{k} {(\frac{x_{j} - μ_{j}}{σ_{j}})}^{1 / γ_{j}}]}^{γ_{i}}, i = 1, 2, \dots, k_{1} .

(47)
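To see where (46) and (47) come from, it may help to check the bivariate case k = 2, k₁ = 1 directly. The shorthand t_i = ((x_i − μ_i)/σ_i)^{1/γ_i} is introduced here only for convenience and is not notation used in the paper. Differentiating (44) with respect to x₂ and normalizing by the same derivative evaluated at x₁ = μ₁ (where t₁ = 0), the common factor −α ∂t₂/∂x₂ cancels:

```latex
\begin{align}
P (X_{1} > x_{1} \mid X_{2} = x_{2})
  &= \frac{(1 + t_{1} + t_{2})^{- (α + 1)}}{(1 + t_{2})^{- (α + 1)}}
   = {\left[1 + \frac{t_{1}}{1 + t_{2}}\right]}^{- (α + 1)} \\
  &= {\left[1 + {\left(\frac{x_{1} - μ_{1}}{τ_{1}}\right)}^{1 / γ_{1}}\right]}^{- (α + 1)},
  \qquad τ_{1} = σ_{1} {(1 + t_{2})}^{γ_{1}} .
\end{align}
```

This is a P(IV)(μ₁, τ₁, γ₁, α + 1) survival function, in agreement with (46) and (47) since k − k₁ = 1 here.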

Takahasi (1965) discussed the MP^(k)(IV) distribution with $\underline{μ} = \underline{0}$ and $\underline{σ} = \underline{1}$ . He called it a multivariate Burr’s distribution, and noted that the marginal and conditional distributions were of the same form.

As in the univariate case, distributional properties of these multivariate Pareto distributions and possible further extensions are more transparent if one uses a representation of the variables as functions of certain independent gamma variables. Thus if $\underline{X}$ has a k-dimensional Pareto distribution of type IV, we may act as if the X_i’s have the representation

X_{i} = μ_{i} + σ_{i} {(W_{i} / Z)}^{γ_{i}}, i = 1, 2, \dots, k

(48)

where the W_i’s are independent identically distributed Γ(1,1) variables (i.e., standard exponential variables) and Z, independent of the W_i’s, has a Γ(α,1) distribution. This representation, for example, makes it easy to compute the means, variances and covariances of $\underline{X}$ .
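The representation (48) also gives a direct way to simulate from the model. Conditionally on Z = z, the event X_i > x_i is the event W_i > z t_i with t_i = ((x_i − μ_i)/σ_i)^{1/γ_i}, so the conditional joint survival function is exp(−z Σ t_i); averaging over Z ∼ Γ(α,1) gives (1 + Σ t_i)^{−α}, which is (44). The sketch below, which uses only the Python standard library, draws from (48) and compares an empirical joint survival probability with (44). The function names, parameter values and seed are arbitrary choices made for illustration.

```python
import random


def sample_mp4(mu, sigma, gamma, alpha, rng):
    """Draw one vector from MP^(k)(IV) via the gamma representation (48)."""
    z = rng.gammavariate(alpha, 1.0)  # Z ~ Gamma(alpha, 1), shared by all coordinates
    # Each W_i ~ Gamma(1, 1) is a fresh standard exponential draw.
    return [m + s * (rng.expovariate(1.0) / z) ** g
            for m, s, g in zip(mu, sigma, gamma)]


def empirical_survival(x, draws):
    """Fraction of draws in which every coordinate exceeds the matching x_i."""
    hits = sum(all(d > xi for d, xi in zip(draw, x)) for draw in draws)
    return hits / len(draws)


rng = random.Random(12345)
mu, sigma, gamma, alpha = [0.0, 1.0], [1.0, 2.0], [0.5, 1.5], 3.0
draws = [sample_mp4(mu, sigma, gamma, alpha, rng) for _ in range(50_000)]

x = [0.8, 2.0]
# Right-hand side of (44) evaluated at x.
exact = (1.0 + sum(((xi - m) / s) ** (1.0 / g)
                   for xi, m, s, g in zip(x, mu, sigma, gamma))) ** (-alpha)
print(empirical_survival(x, draws), exact)
```

The two printed numbers should agree up to Monte Carlo error, which is of order 10⁻³ at this sample size.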

A generalization of the representation (48) is one in which the W_i’s are gamma rather than exponential variables. The resulting distribution will be called k-dimensional Feller-Pareto, since its marginals are of the Feller-Pareto form. Thus $\underline{X} \sim F P^{(k)} (\underline{μ}, \underline{σ}, \underline{γ}, α, \underline{β})$ if

X_{i} = μ_{i} + σ_{i} {(W_{i} / Z)}^{γ_{i}}, i = 1, 2, \dots, k

(49)

where the W_i’s and Z are independent random variables with W_i ∼ Γ(β_i,1), (i = 1,2,…,k), and Z ∼ Γ(α,1). The marginal and conditional distributions of this multivariate Feller-Pareto distribution are again multivariate Feller-Pareto. The covariance structure can be readily obtained from the representation (49). Parallel to the situation in one dimension, there exist alternative names that could be applied to multivariate Feller-Pareto variables. They could be called multivariate generalized F variables or multivariate generalized beta of the second kind variables. An evident drawback of the multivariate Feller-Pareto model (and its various submodels) is the presence of a common value of α which appears in each marginal density. The consequences of this homogeneity are not easy to pin down. Certainly a model with Feller-Pareto marginals with different α’s for each of the marginals would be desirable, if one can be developed with attractive distributional properties (e.g., “nice” conditional distributions).
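To illustrate how means and covariances follow from (49), recall the standard gamma moment identities E[W^γ] = Γ(β + γ)/Γ(β) for W ∼ Γ(β,1) and E[Z^{−γ}] = Γ(α − γ)/Γ(α) for Z ∼ Γ(α,1) with α > γ. Independence of the W_i’s and Z then gives the moments in closed form. The following is a minimal sketch under those identities; the helper names `ratio_moment`, `fp_mean` and `fp_cov` are hypothetical and are not taken from the paper.

```python
from math import gamma as gamma_fn


def ratio_moment(gam: float, beta: float, alpha: float) -> float:
    """E[(W / Z)^gam] for independent W ~ Gamma(beta, 1) and Z ~ Gamma(alpha, 1)."""
    if alpha <= gam:
        raise ValueError("moment is finite only when alpha > gam")
    return (gamma_fn(beta + gam) * gamma_fn(alpha - gam)
            / (gamma_fn(beta) * gamma_fn(alpha)))


def fp_mean(mu: float, sigma: float, gam: float, alpha: float, beta: float) -> float:
    """E[X_i] for a coordinate of the Feller-Pareto vector defined by (49)."""
    return mu + sigma * ratio_moment(gam, beta, alpha)


def fp_cov(sigma_i: float, gam_i: float, beta_i: float,
           sigma_j: float, gam_j: float, beta_j: float, alpha: float) -> float:
    """Cov(X_i, X_j) for i != j under (49); needs alpha > gam_i + gam_j."""
    if alpha <= gam_i + gam_j:
        raise ValueError("covariance is finite only when alpha > gam_i + gam_j")
    # E[W_i^gam_i] E[W_j^gam_j] E[Z^-(gam_i + gam_j)] by independence.
    cross = (gamma_fn(beta_i + gam_i) * gamma_fn(beta_j + gam_j)
             * gamma_fn(alpha - gam_i - gam_j)
             / (gamma_fn(beta_i) * gamma_fn(beta_j) * gamma_fn(alpha)))
    product_of_means = (ratio_moment(gam_i, beta_i, alpha)
                        * ratio_moment(gam_j, beta_j, alpha))
    return sigma_i * sigma_j * (cross - product_of_means)
```

As a quick check, take μ_i = 0 and σ_i = γ_i = β_i = 1 with α = 3. Then X_i = W_i/Z, the mean is E[1/Z] = 1/(α − 1) = 1/2, and the covariance is E[Z^{−2}] − 1/4 = 1/2 − 1/4 = 1/4. The covariance is positive because every coordinate shares the same Z.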

5.1 Other multivariate Pareto distributions

Although the title of this section promises discussion of multivariate models, only the bivariate case will be treated. It will be left to the reader to visualize the, usually straightforward, extension to the multivariate case. Notational complexity is avoided to a great extent by focusing on the case k = 2.

It is not difficult to verify that a P(IV)(μ,σ,γ,α) distribution can be represented as a scale mixture of Weibull distributions. Equivalently, as remarked earlier, a P(IV) random variable admits the representation

X = μ + σ {(U / Z)}^{γ}

where U ∼ exp(1) and Z ∼ Γ(α,1) are independent variables. A natural bivariate version of this construction begins with (U₁,U₂) having a bivariate exponential distribution with standard exponential marginals, perhaps one of the Marshall-Olkin type with parameters 1, 1 and λ. Then, with Z ∼ Γ(α,1) independent of (U₁,U₂), we define (X₁,X₂) by

X_{i} = μ_{i} + σ_{i} {(U_{i} / Z)}^{γ_{i}}, i = 1, 2 .

(50)

Observe that in any bivariate Pareto (IV) distribution generated by this method, the marginals share a common value of α.
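A minimal simulation sketch of construction (50) follows. It assumes the usual shock-model form of the Marshall-Olkin distribution, U_i = min(E_i, E₃) with independent exponential shocks of rates 1, 1 and λ. Since min(E_i, E₃) then has rate 1 + λ, the sketch multiplies by 1 + λ to obtain the standard exponential marginals that (50) requires; this rescaling is an assumption made here, not a step spelled out in the text. All function names are illustrative.

```python
import random


def sample_mo_exponential(lam, rng):
    """Marshall-Olkin pair, rescaled to have standard exponential marginals.

    Assumption: shocks E_1, E_2 ~ exp(rate 1) and E_3 ~ exp(rate lam), so that
    min(E_i, E_3) has rate 1 + lam; multiplying by (1 + lam) restores rate 1.
    """
    e1, e2 = rng.expovariate(1.0), rng.expovariate(1.0)
    e3 = rng.expovariate(lam) if lam > 0 else float("inf")
    scale = 1.0 + lam
    return scale * min(e1, e3), scale * min(e2, e3)


def sample_bivariate_p4(mu, sigma, gamma, alpha, lam, rng):
    """One draw of (X_1, X_2) from construction (50)."""
    u = sample_mo_exponential(lam, rng)
    z = rng.gammavariate(alpha, 1.0)  # common Z ~ Gamma(alpha, 1)
    return tuple(m + s * (ui / z) ** g
                 for m, s, g, ui in zip(mu, sigma, gamma, u))


rng = random.Random(2014)
pair = sample_bivariate_p4((0.0, 0.0), (1.0, 1.0), (1.0, 1.0), 2.0, 0.5, rng)
```

Under these assumptions the common shock E₃ is the smallest of the three with probability λ/(2 + λ), in which case U₁ = U₂. When the two marginals also share μ, σ and γ, this carries over to X₁ = X₂, so the resulting bivariate Pareto (IV) distribution has a singular component on the diagonal.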

A second approach to generating bivariate P(IV) distributions makes use of the following representation of a P(IV) variable. Suppose that U ∼ exp(1), then

μ + σ {(e^{U / α} - 1)}^{γ} \sim P (IV) (μ, σ, γ, α)

(51)
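The claim in (51) can be checked in one line. Writing t = ((x − μ)/σ)^{1/γ} for x > μ (a shorthand introduced here), and using P(U > u) = e^{−u} for a standard exponential variable:

```latex
P \left( μ + σ {(e^{U / α} - 1)}^{γ} > x \right)
  = P \left( e^{U / α} > 1 + t \right)
  = P \left( U > α \log (1 + t) \right)
  = {(1 + t)}^{- α}
  = {\left[1 + {\left(\frac{x - μ}{σ}\right)}^{1 / γ}\right]}^{- α} .
```

The right-hand side is the P(IV)(μ,σ,γ,α) survival function.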

Here too then, we can begin with (U₁,U₂) having an arbitrary bivariate distribution with standard exponential marginals and construct a variable (X₁,X₂) with a bivariate Pareto (IV) distribution by defining

X_{i} = μ_{i} + σ_{i} {(e^{U_{i} / α_{i}} - 1)}^{γ_{i}}, i = 1, 2 .

(52)

A third approach makes use of the fact that minima of independent Pareto (IV) random variables themselves have Pareto (IV) distributions. Thus if X_i, i = 1,2, are independent with X_i ∼ P(IV)(μ,σ,γ,α_i), then min(X₁,X₂) ∼ P(IV)(μ,σ,γ,α₁ + α₂). We then begin with three independent random variables Y₁, Y₂, Y₃ with Y_i ∼ P(IV)(μ,σ,γ,α_i) and define

\begin{array}{l} X_{1} = min (Y_{1}, Y_{3}), \\ X_{2} = min (Y_{2}, Y_{3}) \end{array}

(53)

(this approach is often called the method of trivariate reduction). In addition to having Pareto (IV) marginals, it is clear that the distribution described by (53) has the property that min(X₁,X₂) ∼ P(IV)(μ,σ,γ,α₁ + α₂ + α₃). This distribution has the perhaps undesirable property that P(X₁ = X₂) > 0, and has another unfortunate property in that the marginals share common values of μ, σ and γ. This latter problem can be avoided to some extent by assuming that the Y_i’s have P(IV)(0,1,γ,α_i) distributions and then defining

\begin{array}{l} X_{1} = μ_{1} + σ_{1} min (Y_{1}, Y_{3}), \\ X_{2} = μ_{2} + σ_{2} min (Y_{2}, Y_{3}) . \end{array}

(54)

In this case the X_i’s share only a common value of γ.
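The trivariate reduction construction is easy to simulate, and a simulation makes the singular component visible. The sketch below draws the Y_i’s through representation (51) and then applies (54); function names and parameter values are illustrative assumptions. One further fact used in the comments is worked out here rather than quoted from the paper: since log(1 + Y_i^{1/γ}) is exponential with rate α_i, the variable Y₃ is the smallest of the three with probability α₃/(α₁ + α₂ + α₃), and this is P(X₁ = X₂) when the two coordinates share μ and σ.

```python
import math
import random


def sample_p4(mu, sigma, gamma, alpha, rng):
    """P(IV)(mu, sigma, gamma, alpha) draw via representation (51), U ~ exp(1)."""
    u = rng.expovariate(1.0)
    return mu + sigma * math.expm1(u / alpha) ** gamma  # expm1(v) = e^v - 1


def sample_trivariate_reduction(mu, sigma, gamma, alphas, rng):
    """(X_1, X_2) from (54): location-scale images of min(Y_i, Y_3)."""
    y1, y2, y3 = (sample_p4(0.0, 1.0, gamma, a, rng) for a in alphas)
    return (mu[0] + sigma[0] * min(y1, y3),
            mu[1] + sigma[1] * min(y2, y3))


rng = random.Random(53)
alphas = (1.0, 2.0, 3.0)
draws = [sample_trivariate_reduction((0.0, 0.0), (1.0, 1.0), 1.0, alphas, rng)
         for _ in range(20_000)]
# With common mu and sigma, X_1 == X_2 exactly when Y_3 is the overall minimum.
tie_fraction = sum(x1 == x2 for x1, x2 in draws) / len(draws)
print(tie_fraction, alphas[2] / sum(alphas))
```

With distinct (μ₁,σ₁) and (μ₂,σ₂) the ties no longer show up as X₁ = X₂, but the singular component remains: it moves to the curve (X₁ − μ₁)/σ₁ = (X₂ − μ₂)/σ₂.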

Finally we mention the popular copula-based approach to constructing bivariate distributions with given marginals. For this, we begin with an analytically tractable bivariate distribution for (Z₁,Z₂) and apply marginal transformations to produce a bivariate distribution with Pareto (IV) marginals. A popular choice for the distribution of (Z₁,Z₂) is a bivariate normal with standard normal marginals and correlation ρ, but of course any other bivariate distribution can be used in its place. Now using F_{μ,σ,γ,α} to denote the distribution function of a P(IV)(μ,σ,γ,α) random variable and Φ to denote a standard normal distribution function, we define

\begin{array}{l} X_{1} = F_{μ_{1}, σ_{1}, γ_{1}, α_{1}}^{- 1} (Φ (Z_{1})), \\ X_{2} = F_{μ_{2}, σ_{2}, γ_{2}, α_{2}}^{- 1} (Φ (Z_{2})) . \end{array}

(55)

The correlation structure of the X_i’s is inherited from the correlation structure of the Z_i’s. In this case the extension to k dimensions is particularly transparent. Note also that the model has one dependence parameter which, if set equal to 0, yields a model with independent P(IV) marginals. It will be noted that this feature of having a single dependence parameter is shared by the other bivariate models introduced in this Section.
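The construction (55) can be sketched with the Python standard library alone. Inverting the P(IV) distribution function F(x) = 1 − (1 + t)^{−α}, with t = ((x − μ)/σ)^{1/γ}, gives F^{−1}(u) = μ + σ((1 − u)^{−1/α} − 1)^γ. The code below works with the upper-tail probability 1 − Φ(z), computed through `math.erfc`, which is equivalent to (55) but avoids rounding Φ(z) to 1 for large z. Function names and the parameter tuple layout are assumptions made for illustration.

```python
import math
import random


def normal_survival(z: float) -> float:
    """1 - Phi(z) for a standard normal variable, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def p4_inverse_survival(s, mu, sigma, gamma, alpha):
    """Solve (1 + ((x - mu) / sigma)^(1 / gamma))^(-alpha) = s for x."""
    return mu + sigma * (s ** (-1.0 / alpha) - 1.0) ** gamma


def sample_gaussian_copula_p4(params1, params2, rho, rng):
    """One draw of (X_1, X_2) from (55); params are (mu, sigma, gamma, alpha)."""
    n1, n2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    z1 = n1
    z2 = rho * n1 + math.sqrt(1.0 - rho * rho) * n2  # Corr(Z_1, Z_2) = rho
    # F^{-1}(Phi(z)) equals the inverse survival function at 1 - Phi(z).
    return (p4_inverse_survival(normal_survival(z1), *params1),
            p4_inverse_survival(normal_survival(z2), *params2))


rng = random.Random(55)
pair = sample_gaussian_copula_p4((0.0, 1.0, 1.0, 2.0), (1.0, 2.0, 0.5, 3.0), 0.8, rng)
```

Because each coordinate is transformed separately, the same pattern extends to k dimensions by replacing the bivariate normal with a k-variate normal having a chosen correlation matrix.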

6 Multivariate extensions

Several of the univariate extensions, discussed in Section 3, can be readily modified to yield k-dimensional versions. For example a random variable with the univariate Beta-generalized-Pareto (IV) distribution can be viewed as being defined by

X = F_{μ, σ, γ, α}^{- 1} (V),

(56)

where V ∼ Beta(λ₁,λ₂). For a bivariate version of this construction, we begin with (V₁,V₂) having a bivariate Beta distribution, perhaps of the type introduced by Arnold and Ng (2011), and make suitable marginal transformations. Thus we define

\begin{array}{l} X_{1} = F_{μ_{1}, σ_{1}, γ_{1}, α_{1}}^{- 1} (V_{1}), \\ X_{2} = F_{μ_{2}, σ_{2}, γ_{2}, α_{2}}^{- 1} (V_{2}) . \end{array}

(57)

Higher dimensional versions of this construction require only the identification of a suitable k-dimensional Beta distribution. A Dirichlet distribution might be used here. Some other alternatives are described in Arnold and Ng (2011).

To identify a suitable bivariate analog of the Kumaraswamy-Pareto (IV) distribution, all that is required is a bivariate-Kumaraswamy distribution. One possible such distribution was suggested by Nadarajah et al. (2011).

Hidden truncation models, likewise, can be considered in higher dimensions. For example we may begin with $\underline{X} \sim M P^{(k)} (IV) (\underline{μ}, \underline{σ}, \underline{γ}, α)$ . Then, using our dot – double dot notation, we have

f_{HT} (\dot{\underline{x}}) = f_{\dot{\underline{X}} | \ddot{\underline{X}} \leq \ddot{\underline{x}}} (\dot{\underline{x}}) = f_{\dot{\underline{X}}} (\dot{\underline{x}}) \frac{P (\ddot{\underline{X}} \leq \ddot{\underline{x}} | \dot{\underline{X}} = \dot{\underline{x}})}{P (\ddot{\underline{X}} \leq \ddot{\underline{x}})}

(58)

which is not difficult to evaluate, since the conditional distribution of $\ddot{\underline{X}}$ given that $\dot{\underline{X}} = \dot{\underline{x}}$ is of the $M P^{(k - k_{1})} (IV)$ form (refer to equation (46), being careful to switch the roles of $\dot{\underline{X}}$ and $\ddot{\underline{X}}$ ).

We conclude this section by noting the availability of multivariate distributions with Pareto conditionals rather than Pareto marginals. Detailed discussion of such models may be found in Arnold et al. (1999), Chapter 5.

7 Envoi

The survey presented in this paper is far from complete. A more detailed and extensive survey (though somewhat out of date) can be found in Arnold (1983). A revision of that book is, however, currently in preparation. In the interim, see Arnold (2008) for a more up-to-date presentation and, as mentioned in Section 4, for more details on inferential strategies. More work is still needed on the development of estimation and hypothesis testing strategies, especially for multivariate Pareto data. Creative Bayesian analyses involving informative priors, in multivariate settings and in cases involving covariates, are also notable for their absence. Finally, I apologize to those readers whose important contributions have been overlooked in this survey. I excuse myself by repeating that the survey is necessarily incomplete. However, please do advise me of any glaring omissions that you might note.

References

Aban IB, Meerschaert MM, Panorska AK: Parameter estimation for the truncated Pareto distribution. J. Amer. Statist. Assoc 101: 270–277. 2006
Akinsete A, Famoye F, Lee C: The beta-Pareto distribution. Statistics 42: 547–563. 2008
Arnold BC: Pareto Distributions. International Cooperative Publishing House, Burtonsville, MD; 1983
Arnold BC: Pareto and generalized Pareto distributions. In Modeling Income Distributions and Lorenz Curves. Edited by: Chotikapanich D. Springer, New York; 2008
Arnold BC, Austin K: Truncated Pareto distributions: flexible, tractable and familiar. Technical Report #150, Department of Statistics, University of California, Riverside, CA; 1987
Arnold BC, Castillo E, Sarabia JM: Bayesian analysis for classical distributions using conditionally specified priors. Sankhya: Ind. J. Stat. Series B 60: 228–245. 1998
Arnold BC, Castillo E, Sarabia JM: Conditional Specification of Statistical Models. Springer, New York; 1999
Arnold BC, Ghosh I: Inference for Pareto data subject to hidden truncation. J. Ind. Soc. Probability Stat 13: 1–16. 2011
Arnold BC, Laguna L: A stochastic mechanism leading to asymptotically Paretian distributions. Business and Economic Statistics Section, Proceedings of the American Statistical Association 1976
Arnold BC, Ng HKT: Flexible bivariate beta distributions. J. Multivariate Anal 102: 1194–1202. 2011
Arnold BC, Robertson CA, Yeh HC: Some properties of a Pareto-type distribution. Sankhya: Ind. J. Stat. Series A 48: 404–408. 1986
Champernowne DG: The theory of income distribution. Econometrica 5: 379–381. 1937
David HA, Nagaraja HN: Order Statistics. Third edition. Wiley, Hoboken, NJ; 2003
Falk M, Hüsler J, Reiss R: Laws of Small Numbers, Extremes and Rare Events. Third edition. Birkhäuser/Springer, Basel; 2011
Feller W: An Introduction to Probability Theory and its Applications, Vol. 2. Second edition. Wiley, New York; 1971
Fisk PR: The graduation of income distributions. Econometrica 29: 171–185. 1961a
Fisk PR: Estimation of location and scale parameters in a truncated grouped sech-square distribution. J. Am. Stat. Assoc 56: 692–702. 1961b
Johnson NL, Kotz S, Balakrishnan N: Continuous Univariate Distributions, Vol. 1. Second edition. Wiley, New York; 1994
Jones MC: Families of distributions arising from distributions of order statistics. TEST 13: 1–43. 2004
Jones MC: Kumaraswamy’s distribution: A beta-type distribution with some tractability advantages. Stat. Methodol 6: 70–81. 2009
Kagan YY, Schoenberg F: Estimation of the upper cutoff parameter for the tapered Pareto distribution. J. Appl. Probability 38A: 158–175. 2001
Kalbfleisch JD, Prentice RL: The Statistical Analysis of Failure Time Data. Wiley, New York; 1980
Keiding N, Kvist K, Hartvig H, Tvede M, Juul S: Estimating time to pregnancy from current durations in a cross-sectional sample. Biostatistics 3: 565–578. 2002
Lwin T: Estimation of the tail of the Paretian law. Skand. Aktuarietidskr 55: 170–178. 1972
Maguire BA, Pearson ES, Wynn AHA: The time intervals between industrial accidents. Biometrika 39: 168–180.
Mardia KV: Multivariate Pareto distributions. Ann. Math. Stat 33: 1008–1015. 1962
Nadarajah S, Cordeiro GM, Ortega EMM: General results for the Kumaraswamy-G distribution. J. Stat. Comput. Simul 82: 951–979. 2011
Pareto V: Cours d’economie Politique, Vol. II. F. Rouge, Lausanne; 1897
Paranaiba PF, Ortega EMM, Cordeiro GM, de Pascoa MAR: The Kumaraswamy Burr XII distribution: theory and practice. J. Stat. Comput. Simul 83: 2117–2143. 2013
Pillai RN: Semi-Pareto processes. J. Appl. Probab 28: 461–465. 1991
Takahasi K: Note on the multivariate Burr’s distribution. Ann. Inst. Statist. Math 17: 257–260. 1965

Acknowledgement

The constructive suggestions supplied by anonymous referees have resulted in a much improved manuscript.

Author information

Authors and Affiliations

Department of Statistics, University of California, Riverside, USA
Barry C Arnold

Corresponding author

Correspondence to Barry C Arnold.

Additional information

Competing interests

The author declares that he has no competing interests.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0), which permits use, duplication, adaptation, distribution, and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article

Arnold, B.C. Univariate and multivariate Pareto models. J Stat Distrib App 1, 11 (2014). https://doi.org/10.1186/2195-5832-1-11

Received: 13 February 2014
Accepted: 09 May 2014
Published: 17 June 2014
DOI: https://doi.org/10.1186/2195-5832-1-11
