Univariate and multivariate Pareto models
© Arnold; licensee Springer. 2014
Received: 13 February 2014
Accepted: 9 May 2014
Published: 17 June 2014
The Pareto distribution has long been recognized as a suitable model for many non-negative socio-economic variables. Univariate and multivariate variations abound. Some unification is possible by representing the Pareto variables in terms of independent gamma distributed components. Further unification is sometimes possible since some of the frequently used multivariate Pareto models share the same copula. In some cases, inference strategies can be developed to take advantage of the stochastic representations in terms of gamma components.
Keywords: Inequality; Heavy tails; Generalized Pareto; Feller-Pareto; Kumaraswamy distribution; Hidden truncation; Conditional specification
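The basic Pareto model has a survival function of the form

$$\bar{F}(x) = \left(\frac{x}{\sigma}\right)^{-\alpha}, \quad x > \sigma. \qquad (1)$$

Here σ, the scale parameter, is positive and α (Pareto’s index of inequality) is also positive.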
This model is typically referred to as the classical Pareto model. Improved fitting of data is encountered when more general Pareto-like distributions are considered. In this survey, the classical model will be embedded in a hierarchy of more complicated Pareto models. In this hierarchy of generalized Pareto distributions, the classical model will be called the Pareto (I) distribution. Multivariate income distributions are also of interest and, in that arena, a hierarchy of multivariate Pareto distributions is available, paralleling and closely related to the univariate hierarchy.
Even more flexible models have been proposed using these univariate and multivariate Pareto models as building blocks. Several of these will be described in this paper.
The end result is an impressively flexible array of income models from which the researcher can select a parsimonious model for the particular data set at hand. The emphasis in this survey will be on distributional properties of the models but some attention will be paid to estimation and inference strategies.
2 A hierarchy of generalized Pareto models
As the basic distribution in our hierarchy of generalized Pareto models we use the classical Pareto distribution, called here the Pareto (I) distribution. Its survival function is of the form (1). In practice, α is frequently assumed to be larger than 1, so that the distribution has a finite mean. If a random variable X has (1) as its survival function, then we write X ∼ P(I)(σ,α). In this basic model, the parameter σ has a dual role. It is indeed a scale parameter, but it also determines the lower bound of the support of the distribution and so, in a sense, plays the role of a location parameter. A slightly more general model, which separates the roles of the location and scale parameters, has therefore been frequently used.
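The Pareto (II) survival function is of the form

$$\bar{F}(x) = \left[1 + \frac{x-\mu}{\sigma}\right]^{-\alpha}, \quad x > \mu, \qquad (2)$$

where μ, the location parameter, is real valued, σ is positive and α is positive. In most applications μ will be non-negative, but negative values for μ pose no problems. If X has (2) as its survival function, we will write X ∼ P(II)(μ,σ,α).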
where σ > 0 and -∞ < k < ∞. The density corresponding to k = 0 is obtained by taking the limit as k ↑ 0 in (3). This model, (3), includes three sub-models. When k < 0, it yields a Pareto (II) density (with μ = 0); when k = 0, it yields an exponential density; and when k > 0, it corresponds to a scaled Beta distribution (of the first or standard kind). In fact, several results for the Pareto (II) distribution can be proved to remain valid in the more general context of the Pickands generalized Pareto model. In an income modeling setting, as Pareto observed, heavy-tailed distributions are typically encountered and, for this reason, we will concentrate on the Pareto (II) sub-model. We remark, in passing, that despite Pareto’s insistence on the ubiquity of heavy tails, several authors have utilized scaled Beta distributions as income models, and some have even argued in favor of the exponential distribution as a model. The most general model in our hierarchy, the Feller-Pareto model, will be seen to actually include some light-tailed distributions corresponding to scaled Beta models with an additional location parameter. Applications of such light-tailed models are more likely to be encountered outside of the income distribution context.
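The Pareto (III) survival function is of the form

$$\bar{F}(x) = \left[1 + \left(\frac{x-\mu}{\sigma}\right)^{1/\gamma}\right]^{-1}, \quad x > \mu, \qquad (5)$$

where μ is real, σ is positive and γ is positive. We will call γ the inequality parameter. If μ = 0 and γ ≤ 1, then γ turns out to be precisely the Gini index of inequality for this distribution. If X has (5) as its survival function, we will write X ∼ P(III)(μ,σ,γ).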
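The identification of γ with the Gini index can be checked through moments. The following is a minimal sketch, assuming μ = 0, γ < 1 (so that the mean is finite), the standard mean formula σΓ(α−γ)Γ(1+γ)/Γ(α) for a variable with survival function [1 + (x/σ)^(1/γ)]^(−α), and the identity G = 1 − E[min(X1,X2)]/E[X] for two independent copies. The function names are illustrative and do not come from the paper.

```python
import math

def pareto4_mean(sigma, gamma, alpha):
    """Mean of a variable with survival function
    [1 + (x / sigma) ** (1 / gamma)] ** (-alpha), x > 0.

    Assumes 0 < gamma < alpha, so that the mean is finite.
    """
    if not 0 < gamma < alpha:
        raise ValueError("need 0 < gamma < alpha for a finite mean")
    return sigma * math.gamma(alpha - gamma) * math.gamma(1 + gamma) / math.gamma(alpha)

def pareto3_gini(sigma, gamma):
    """Gini index of P(III)(0, sigma, gamma) via 1 - E[min(X1, X2)] / E[X]."""
    mean_x = pareto4_mean(sigma, gamma, 1.0)    # alpha = 1 gives Pareto (III)
    # The minimum of two independent copies has the squared survival function,
    # i.e. the same form with alpha = 2.
    mean_min = pareto4_mean(sigma, gamma, 2.0)
    return 1.0 - mean_min / mean_x
```

Since Γ(2−γ)/Γ(1−γ) = 1−γ, the ratio of the two means is 1−γ and the computed index equals γ for any σ.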
The Pareto (IV) survival function is of the form

$$\bar{F}(x) = \left[1 + \left(\frac{x-\mu}{\sigma}\right)^{1/\gamma}\right]^{-\alpha}, \quad x > \mu, \qquad (6)$$

where μ (location) is real, σ (scale) is positive, γ (inequality) is positive and α (shape) is positive. Although we continue to call γ the inequality parameter, it will only be identifiable with the Gini index when α = 1 and μ = 0. One might argue instead that in the P(IV) model both γ and α would be best described as shape parameters, since neither of them has a direct inequality interpretation. An anonymous referee points out that the two parameters γ and α govern the behavior of the P(IV) density as x approaches μ from above and as x approaches infinity. Thus, $f(x;\mu,\sigma,\gamma,\alpha) \sim x^{-\alpha/\gamma-1}$ as x → ∞ and $f(x;\mu,\sigma,\gamma,\alpha) \sim (x-\mu)^{1/\gamma-1}$ as x → μ. He suggests that an argument might be advanced in favor of a reparameterization in which we define β = α/γ, to highlight the roles of α and β in determining the limiting behavior of the density. However, in this paper, to be consistent with the notation in Arnold (1983), we will continue with the μ,σ,γ,α parameterization and continue to call γ the inequality parameter. If a random variable X has (6) as its survival function, we will write X ∼ P(IV)(μ,σ,γ,α). Note that the Pareto (IV) distribution, with μ = 0, is also known as a Burr Type XII distribution.
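The nesting of the Pareto (I) to (IV) models can be made concrete in a few lines. This is a sketch under the assumption that the survival functions take their standard forms, with Pareto (IV) given by [1 + ((x−μ)/σ)^(1/γ)]^(−α) for x > μ; the function names are illustrative only.

```python
def pareto4_survival(x, mu=0.0, sigma=1.0, gamma=1.0, alpha=1.0):
    """Survival function of P(IV)(mu, sigma, gamma, alpha) (assumed standard form)."""
    if x <= mu:
        return 1.0
    return (1.0 + ((x - mu) / sigma) ** (1.0 / gamma)) ** (-alpha)

def pareto3_survival(x, mu, sigma, gamma):
    """P(III) is P(IV) with alpha = 1."""
    return pareto4_survival(x, mu, sigma, gamma, 1.0)

def pareto2_survival(x, mu, sigma, alpha):
    """P(II) is P(IV) with gamma = 1."""
    return pareto4_survival(x, mu, sigma, 1.0, alpha)

def pareto1_survival(x, sigma, alpha):
    """P(I)(sigma, alpha) is P(II) with mu = sigma, giving (x / sigma) ** (-alpha)."""
    return pareto2_survival(x, sigma, sigma, alpha)
```

Writing each simpler model as a restriction of the Pareto (IV) function mirrors the hierarchy: fixing γ = 1 gives Pareto (II), fixing α = 1 gives Pareto (III), and additionally tying μ to σ gives the classical model.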
then W has a Feller–Pareto distribution, and we write W ∼ FP(μ,σ,γ,δ1,δ2).
The corresponding survival function is obtainable from tables of the incomplete beta function. For many computations it is simpler to work directly with the representation (8). The Pareto (IV) distributions correspond to the case in which X2 has a gamma distribution while X1 has an exponential distribution. The Pareto (III) distributions are encountered when both X1 and X2 are exponential variables.
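The gamma-component representation lends itself to simulation. The sketch below assumes the form W = μ + σ(X1/X2)^γ with X1 standard exponential and X2 ∼ Γ(α,1) independent, which is the Pareto (IV) case described above, and compares the empirical survival function with [1 + ((x−μ)/σ)^(1/γ)]^(−α). The helper names are hypothetical.

```python
import random

def pareto4_from_gammas(rng, mu, sigma, gamma, alpha):
    """One Pareto (IV) draw built from independent gamma components (assumed form)."""
    x1 = rng.expovariate(1.0)          # standard exponential, i.e. Gamma(1, 1)
    x2 = rng.gammavariate(alpha, 1.0)  # Gamma(alpha, 1), shape alpha and scale 1
    return mu + sigma * (x1 / x2) ** gamma

def empirical_survival(sample, x):
    """Fraction of the sample strictly exceeding x."""
    return sum(1 for s in sample if s > x) / len(sample)
```

For example, with μ = 1, σ = 2, γ = 0.5 and α = 2.5, the empirical survival at x = 3 should be close to 2^(−2.5), since (x−μ)/σ = 1 there.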
Kalbfleisch and Prentice (1980) call the Feller-Pareto density (with μ = 0) a generalized F density. Instead we might describe a Feller-Pareto variable as being a location and scale transform of a generalized beta variable of the second kind. Recall that a beta variable of the second kind is just a ratio of independent gamma variables.
The additional flexibility provided by the introduction of such a sixth parameter in the model has not been investigated.
The full array of generalized univariate Pareto distributions to be considered in this paper is subsumed in the Feller–Pareto family, and a unified derivation of many distributional results is possible. However, in the case of Pareto (I)-(IV) distributions, some alternative representations are also useful.
Likewise Pareto (III) and (IV) variables can be represented as $X = \mu + \sigma(e^{V} - 1)^{\gamma}$ and $X = \mu + \sigma(e^{V/\alpha} - 1)^{\gamma}$ respectively. The representation (12) for a classical Pareto variable (i.e., Pareto (I)) highlights the useful observation that the logarithm of such a variable has a shifted exponential distribution. This will permit the recognition of many distributional properties of Pareto (I) variables as reflections of parallel properties of exponential variables.
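These exponential-based representations can be verified directly. In the sketch below, V denotes a standard exponential variable, the Pareto (I) representation is assumed to be X = σe^(V/α), and the Pareto (IV) survival function is assumed to have its standard form. Because each transform is increasing in v, the survival function evaluated at the transformed point should return e^(−v). All names are illustrative.

```python
import math

def pareto1_from_exp(v, sigma, alpha):
    """Pareto (I) value from a standard exponential value v (assumed representation)."""
    return sigma * math.exp(v / alpha)

def pareto4_from_exp(v, mu, sigma, gamma, alpha):
    """Pareto (IV) value mu + sigma * (e^{v/alpha} - 1)^gamma."""
    return mu + sigma * math.expm1(v / alpha) ** gamma

def pareto4_survival(x, mu, sigma, gamma, alpha):
    """Assumed standard Pareto (IV) survival function."""
    if x <= mu:
        return 1.0
    return (1.0 + ((x - mu) / sigma) ** (1.0 / gamma)) ** (-alpha)
```

Setting α = 1 in `pareto4_from_exp` gives the Pareto (III) representation, and taking logarithms of `pareto1_from_exp` gives log σ + v/α, the shifted exponential form mentioned above.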
i.e., a (translated) exponential distribution, and if Z ∼ Γ(α,1), then it follows that unconditionally X∼P(II)(μ,σ,α). Alternatively, this can be viewed as being equivalent to the representation in (8) after setting γ = δ2 = 1.
This representation of the Pareto (II) distribution as a gamma mixture of exponential distributions is often encountered in reliability and survival contexts, see e.g., Keiding et al. (2002). It is also familiar in Bayesian analysis of exponential data, where the gamma density enters as a convenient prior. In this context the Pareto (II) distribution is sometimes called the Lomax distribution.
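The mixture representation can be checked numerically without simulation. The sketch below assumes that, given Z = z, the survival function of (X−μ)/σ is e^(−zt), and integrates this against the Γ(α,1) density with a simple midpoint rule; the result should match the Pareto (II) survival function (1+t)^(−α). The function name, truncation point and step count are arbitrary choices for illustration.

```python
import math

def lomax_survival_by_mixing(t, alpha, upper=50.0, steps=20000):
    """Midpoint-rule approximation of E[exp(-Z * t)] for Z ~ Gamma(alpha, 1).

    t is the standardized argument (x - mu) / sigma, assumed non-negative.
    """
    h = upper / steps
    total = 0.0
    for i in range(steps):
        z = (i + 0.5) * h
        gamma_density = z ** (alpha - 1.0) * math.exp(-z) / math.gamma(alpha)
        total += math.exp(-z * t) * gamma_density
    return total * h
```

The integral is just the Laplace transform of the gamma density, which is why the closed form (1+t)^(−α) appears.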
It is as a consequence of the relation (14) that the Pareto (III) distribution, with μ = 0, is sometimes called the log-logistic distribution.
This distribution, which was suggested by Pareto (1897), was proposed to accommodate cases in which the basic Pareto model (1) was inadequate for fitting certain data configurations. This model is closely related to the Pareto (II) distribution, but with an additional exponential factor. Note that it could be viewed as the distribution of the minimum of a Pareto (II) variable (with μ = 0) and an independent exponential variable. This model has been used infrequently, but it has recently reappeared under the name tapered Pareto distribution (Kagan and Schoenberg 2001).
2.1 Distributional properties
From this expression moments for the Feller-Pareto and the P(II)-P(IV) distributions are readily obtained.
where the V_i’s are independent standard exponential variables. In some cases expressions are available for the distribution of. In particular, if α_i = α (i = 1, 2, …, n), then, and we may readily obtain the density of W.
where and α_i ≠ α_j if i ≠ j. The distribution of products of independent Pareto (IV) variables with μ_i’s equal to 0 can, via the representation (8), be reduced to a problem involving the distribution of products of powers of independent gamma random variables. Unlike the Pareto (I) case, closed form expressions for the resulting density are apparently not obtainable, although moments of such products are readily available.
Note that in this situation the X_i’s share common values for the parameters μ, σ and γ.
Some characterization results based on this observation were discussed in Arnold et al. (1986).
where $\stackrel{d}{=}$ means that the two random variables are identically distributed, where $F^{-1}$ is as given in (26), and where $U_{i:n}$ is the ith order statistic of a sample of size n from a uniform (0,1) distribution. It is well known (see e.g. David and Nagaraja 2003) that $U_{i:n} \sim \mathrm{Beta}(i, n-i+1)$.
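This order-statistic representation gives a direct way to simulate a single order statistic without generating the whole sample. The sketch below assumes the Pareto (IV) quantile function F^(−1)(u) = μ + σ[(1−u)^(−1/α) − 1]^γ, which follows from inverting the standard survival function; the function names are illustrative.

```python
import random

def pareto4_quantile(u, mu, sigma, gamma, alpha):
    """Quantile function of P(IV)(mu, sigma, gamma, alpha), for 0 <= u < 1."""
    return mu + sigma * ((1.0 - u) ** (-1.0 / alpha) - 1.0) ** gamma

def pareto4_order_statistic(rng, i, n, mu, sigma, gamma, alpha):
    """Draw the i-th order statistic of a sample of size n.

    Uses U_{i:n} ~ Beta(i, n - i + 1) and applies the quantile function.
    """
    u = rng.betavariate(i, n - i + 1)
    return pareto4_quantile(u, mu, sigma, gamma, alpha)
```

As a sanity check, the minimum (i = 1) of n Pareto (II) variables with shape α should again be Pareto (II) with shape nα, so for n = 5, α = 2 and unit scale the probability that the minimum exceeds 0.1 is 1.1^(−10).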
3 Some related extensions
A variety of models have been proposed to add more flexibility to the generalized Pareto models discussed in Section 2. Most of them include Pareto models as special cases. In this section we will make note of a selection of these models.
where x ∈ (0,∞). Of course, if λ1 = λ2 = 1, the Beta-generalized distribution simplifies to become a Pareto distribution.
and the distribution is usually called the exponentiated generalized Pareto distribution. It can be recognized as a special case of the Beta-generalized Pareto model with the parameters chosen to be λ1 = θ and λ2 = 1.
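The link between the beta-generated construction and the exponentiated model can be illustrated by simulation. This sketch assumes the usual beta-generated recipe X = F^(−1)(B) with B ∼ Beta(λ1,λ2), and uses a Pareto (II) base distribution with μ = 0 purely for illustration; with λ1 = θ and λ2 = 1 the distribution function of X should be F(x)^θ. The function names are hypothetical.

```python
import random

def lomax_cdf(x, sigma, alpha):
    """Distribution function of a Pareto (II) variable with mu = 0."""
    return 1.0 - (1.0 + x / sigma) ** (-alpha) if x > 0 else 0.0

def lomax_quantile(u, sigma, alpha):
    """Inverse of lomax_cdf, for 0 <= u < 1."""
    return sigma * ((1.0 - u) ** (-1.0 / alpha) - 1.0)

def beta_generated_draw(rng, lam1, lam2, sigma, alpha):
    """One draw from the beta-generated Pareto (II) model (assumed construction)."""
    b = rng.betavariate(lam1, lam2)  # B ~ Beta(lam1, lam2)
    return lomax_quantile(b, sigma, alpha)
```

With θ = 3 and σ = α = 1, the base distribution function at x = 1 is 0.5, so about 12.5% of the draws should fall at or below 1.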
Akinsete et al. (2008) consider some special subcases of the Beta-generalized Pareto distribution, while Paranaiba et al. (2013) discuss the Kumaraswamy-generalized Pareto distribution. Submodels of the Kumaraswamy-generalized Pareto model are often of interest. For example the exponentiated generalized Pareto distribution (33) is such a submodel.
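The submodel relationship is easy to see from the distribution function. The sketch below assumes the usual Kumaraswamy-generated form G(x) = 1 − (1 − F(x)^a)^b applied to a Pareto (IV) base with its standard survival function; the parameter names a and b and the function names are assumptions made for illustration.

```python
def pareto4_cdf(x, mu, sigma, gamma, alpha):
    """Distribution function of P(IV)(mu, sigma, gamma, alpha) (assumed standard form)."""
    if x <= mu:
        return 0.0
    return 1.0 - (1.0 + ((x - mu) / sigma) ** (1.0 / gamma)) ** (-alpha)

def kumaraswamy_g_cdf(base_cdf_value, a, b):
    """Kumaraswamy-generated distribution function evaluated at F(x) = base_cdf_value."""
    return 1.0 - (1.0 - base_cdf_value ** a) ** b

def kw_pareto4_cdf(x, a, b, mu, sigma, gamma, alpha):
    """Kumaraswamy-Pareto (IV) distribution function."""
    return kumaraswamy_g_cdf(pareto4_cdf(x, mu, sigma, gamma, alpha), a, b)
```

Setting b = 1 collapses the expression to F(x)^a, the exponentiated model, and a = b = 1 recovers the base distribution.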
And, of course, one can concatenate these constructions and consider a Beta-Kumaraswamy-Pareto distribution. Going one step further we would arrive at a generalized-Beta-Kumaraswamy-Pareto model. Each generalization adds flexibility at the cost of introducing more parameters. Some degree of parsimony is evidently called for here.
where μ ∈ (-∞,∞), σ, γ ∈ (0,∞), p ∈ (0,1) and h(x) is a periodic function of ln x with period -2π/[γ ln p], and with h(0) = 1. The case in which h(x) ≡ 1 for every x corresponds to the usual Pareto (III) model. More generality can be arrived at if h(x) is replaced by a suitable parametric family of periodic functions. Note that, in order for (36) to be a valid survival function, it must be the case that $x^{1/\gamma}h(x)$ is a non-decreasing function of x.
where $\sigma_1 = \sigma(1+\theta)^{\gamma}$. This is recognizable as a linear combination of two Pareto (IV) densities. It is not, however, a convex combination since, although the coefficients add up to 1, the second coefficient is negative. Motivated by this example, one might also consider k-component linear combinations of Pareto (IV) densities as income models, allowing k to be greater than 2. Such models with positive coefficients are natural candidates for fitting multimodal income data sets which may well have a mixture genesis. Note that, by testing the hypothesis that θ = 0, one can decide whether or not the data set at hand has been subject to hidden truncation. More detailed discussion of these hidden truncation Pareto models, in the Pareto (II) case, may be found in Arnold and Ghosh (2011).
4 Inference, briefly
Suppose that X_1, X_2, …, X_n are independent identically distributed random variables with a common Pareto (IV) distribution. The sample size should be reasonably large, since we have four parameters to estimate. The sample minimum, or some slightly corrected version of it, will be a suitable estimate of the location parameter μ. After subtracting it from each of the observations, the remaining three parameters may be estimated using maximum likelihood. The corresponding Fisher information matrix is available (as indeed is the Fisher information matrix for the Feller-Pareto model with μ = 0).
Either a global search or numerical solution of the likelihood equations will be required to identify the location of the maximum of the likelihood function. In the Pareto (I) case, a variety of alternative estimates are available including best unbiased estimates. Alternatively, in the Pareto (I) case one can take logarithms of the data and arrive at a shifted exponential model, for which many estimation strategies have been developed.
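For the Pareto (I) case the logarithmic transformation leads to closed-form maximum likelihood estimates. The following sketch uses the standard results, namely that the estimate of σ is the sample minimum and that the estimate of α is n divided by the sum of log(x_i/σ̂); it is one of the many available strategies and not necessarily the one preferred in the references. The sampler relies on the assumed representation X = σe^(V/α) with V standard exponential, and all names are illustrative.

```python
import math
import random

def pareto1_sample(rng, n, sigma, alpha):
    """Simulate n Pareto (I) values as sigma * exp(V / alpha), V standard exponential."""
    return [sigma * math.exp(rng.expovariate(1.0) / alpha) for _ in range(n)]

def pareto1_mle(data):
    """Maximum likelihood estimates (sigma_hat, alpha_hat) for Pareto (I) data.

    Taking logs turns the data into a shifted exponential sample, so
    sigma_hat is the sample minimum and alpha_hat is the reciprocal of the
    mean log-excess over that minimum.
    """
    n = len(data)
    sigma_hat = min(data)
    log_excess_sum = sum(math.log(x / sigma_hat) for x in data)
    alpha_hat = n / log_excess_sum
    return sigma_hat, alpha_hat
```

A call such as `pareto1_mle(pareto1_sample(random.Random(4), 20000, 2.0, 3.0))` should return values near (2, 3); the estimate of σ always lies at or slightly above the true value because it is a sample minimum.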
A diffuse prior Bayesian analysis can be used for Pareto (IV) data. It will, predictably, yield results similar to those obtained via maximum likelihood. In the Pareto (I) case, Lwin (1972) introduced a conjugate family of priors for (σ,α) which can be used to incorporate some degree of prior knowledge of the parameters. Arnold et al. (1998) suggest use of a more flexible family of what they call conditionally conjugate priors in this setting. These priors are tailor-made for subsequent use of Gibbs sampling algorithms to generate realizations from the corresponding posterior distribution.
5 Multivariate Pareto models
Conditional distributions are also of the form (40), but with a change of location.
It is natural to extend this basic multivariate Pareto model by the introduction of location, scale, inequality and shape parameters in a manner parallel to that used to develop the univariate Pareto (II)-(IV) distributions, as follows:
and we write.
and we write.
and we write.
Takahasi (1965) discussed the MP(k)(IV) distribution with and. He called it a multivariate Burr’s distribution, and noted that the marginal and conditional distributions were of the same form.
where the W_i’s are independent identically distributed Γ(1,1) variables (i.e., standard exponential variables) and Z, independent of the W_i’s, has a Γ(α,1) distribution. This representation, for example, makes it easy to compute the means, variances and covariances of.
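A short simulation illustrates how a shared gamma component induces dependence. The sketch assumes the representation X_i = μ_i + σ_i W_i / Z and the corresponding joint survival function [1 + Σ (x_i − μ_i)/σ_i]^(−α) for x_i > μ_i, which is the Mardia-type multivariate Pareto (II) form; the function names are hypothetical.

```python
import random

def mp2_draw(rng, mus, sigmas, alpha):
    """One draw from the assumed multivariate Pareto (II) representation."""
    z = rng.gammavariate(alpha, 1.0)  # shared Gamma(alpha, 1) component
    return [mu + sigma * rng.expovariate(1.0) / z for mu, sigma in zip(mus, sigmas)]

def mp2_joint_survival(xs, mus, sigmas, alpha):
    """P(X_1 > x_1, ..., X_k > x_k), valid when every x_i >= mu_i."""
    total = sum((x - mu) / sigma for x, mu, sigma in zip(xs, mus, sigmas))
    return (1.0 + total) ** (-alpha)
```

Conditionally on Z the coordinates are independent exponentials, so the joint survival function is the Laplace transform of the gamma density evaluated at the sum of the standardized arguments; this is why a single α appears in every marginal.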
where the W_i’s and Z are independent random variables with W_i ∼ Γ(β_i,1), (i = 1,2,…,k), and Z ∼ Γ(α,1). The marginal and conditional distributions of this multivariate Feller-Pareto distribution are again multivariate Feller-Pareto. The covariance structure can be readily obtained from the representation (49). Parallel to the situation in one dimension, there exist alternative names that could be applied to multivariate Feller-Pareto variables. They could be called multivariate generalized F variables or multivariate generalized beta of the second kind variables. An evident drawback of the multivariate Feller-Pareto model (and its various submodels) is the presence of a common value of α which appears in each marginal density. The consequences of this homogeneity are not easy to pin down. Certainly a model with Feller-Pareto marginals with different α’s for each of the marginals would be desirable, if one can be developed with attractive distributional properties (e.g., “nice” conditional distributions).
5.1 Other multivariate Pareto distributions
Although the title of this section promises discussion of multivariate models, only the bivariate case will be treated. It will be left to the reader to visualize the, usually straightforward, extension to the multivariate case. Notational complexity is avoided to a great extent by focusing on the case k = 2.
Observe that in any bivariate Pareto (IV) distribution generated by this method, the marginals share a common value of α.
In this case the X_i’s share only a common value of γ.
The correlation structure of the X_i’s is inherited from the correlation structure of the Z_i’s. In this case the extension to k dimensions is particularly transparent. Note also that the model has one dependence parameter which, if set equal to 0, yields a model with independent P(IV) marginals. It will be noted that this feature of having a single dependence parameter is shared by the other bivariate models introduced in this section.
6 Multivariate extensions
Higher dimensional versions of this construction require only the identification of a suitable k-dimensional Beta distribution. A Dirichlet distribution might be used here. Some other alternatives are described in Arnold and Ng (2011).
To identify a suitable bivariate analog of the Kumaraswamy-Pareto (IV) distribution, all that is required is a bivariate-Kumaraswamy distribution. One possible such distribution was suggested by Nadarajah et al. (2011).
which is not difficult to evaluate, since the conditional distribution of given that is of the form (refer to equation (46), being careful to switch the roles of and).
We conclude this section by noting the availability of multivariate distributions with Pareto conditionals rather than Pareto marginals. Detailed discussion of such models may be found in Arnold et al. (1999), Chapter 5.
The survey presented in this paper is far from complete. A more detailed and extensive survey (though somewhat out of date) can be found in Arnold (1983). A revision of that book is, however, currently in preparation. In the interim, see Arnold (2008) for a more up-to-date presentation and, as mentioned in Section 4, for more details on inferential strategies. More work is still needed on the development of estimation and hypothesis testing strategies, especially for multivariate Pareto data. Creative Bayesian analyses involving informative priors, in multivariate settings and in cases involving covariates, are also notable for their absence. Finally, I apologize to those readers whose important contributions have been overlooked in this survey. I excuse myself by repeating that the survey is necessarily incomplete. However, please do advise me of any glaring omissions that you might note.
The constructive suggestions supplied by anonymous referees have resulted in a much improved manuscript.
- Aban IB, Meerschaert MM, Panorska AK: Parameter estimation for the truncated Pareto distribution. J. Amer. Statist. Assoc 101: 270–277. 2006
- Akinsete A, Famoye F, Lee C: The beta-Pareto distribution. Statistics 42: 547–563. 2008
- Arnold BC: Pareto Distributions. International Cooperative Publishing House, Burtonsville, MD; 1983
- Arnold BC: Pareto and generalized Pareto distributions. In Modeling Income Distributions and Lorenz Curves. Edited by: Chotikapanich D. Springer, New York; 2008
- Arnold BC, Austin K: Truncated Pareto distributions: flexible, tractable and familiar. Technical Report #150, Department of Statistics, University of California, Riverside, CA; 1987
- Arnold BC, Castillo E, Sarabia JM: Bayesian analysis for classical distributions using conditionally specified priors. Sankhya: Ind. J. Stat. Series B 60: 228–245. 1998
- Arnold BC, Castillo E, Sarabia JM: Conditional Specification of Statistical Models. Springer, New York; 1999
- Arnold BC, Ghosh I: Inference for Pareto data subject to hidden truncation. J. Ind. Soc. Probability Stat 13: 1–16. 2011
- Arnold BC, Laguna L: A stochastic mechanism leading to asymptotically Paretian distributions. Business and Economic Statistics Section, Proceedings of the American Statistical Association 1976
- Arnold BC, Ng HKT: Flexible bivariate beta distributions. J. Multivariate Anal 102: 1194–1202. 2011
- Arnold BC, Robertson CA, Yeh HC: Some properties of a Pareto-type distribution. Sankhya: Ind. J. Stat. Series A 48: 404–408. 1986
- Champernowne DG: The theory of income distribution. Econometrica 5: 379–381. 1937
- David HA, Nagaraja HN: Order Statistics. Third edition. Wiley, Hoboken, NJ; 2003
- Falk M, Hüsler J, Reiss R: Laws of Small Numbers, Extremes and Rare Events. Third edition. Birkhäuser/Springer, Basel; 2011
- Feller W: An Introduction to Probability Theory and its Applications, Vol. 2. Second edition. Wiley, New York; 1971
- Fisk PR: The graduation of income distributions. Econometrica 29: 171–185. 1961a
- Fisk PR: Estimation of location and scale parameters in a truncated grouped sech-square distribution. J. Am. Stat. Assoc 56: 692–702. 1961b
- Johnson NL, Kotz S, Balakrishnan N: Continuous Univariate Distributions, Vol. 1. Second edition. Wiley, New York; 1994
- Jones MC: Families of distributions arising from distributions of order statistics. TEST 13: 1–43. 2004
- Jones MC: Kumaraswamy’s distribution: A beta-type distribution with some tractability advantages. Stat. Methodol 6: 70–81. 2009
- Kagan YY, Schoenberg F: Estimation of the upper cutoff parameter for the tapered Pareto distribution. J. Appl. Probability 38A: 158–175. 2001
- Kalbfleisch JD, Prentice RL: The Statistical Analysis of Failure Time Data. Wiley, New York; 1980
- Keiding N, Kvist K, Hartvig H, Tvede M, Juul S: Estimating time to pregnancy from current durations in a cross-sectional sample. Biostatistics 3: 565–578. 2002
- Lwin T: Estimation of the tail of the Paretian law. Skand. Aktuarietidskr 55: 170–178. 1972
- Maguire BA, Pearson ES, Wynn AHA: The time intervals between industrial accidents. Biometrika 39: 168–180.
- Mardia KV: Multivariate Pareto distributions. Ann. Math. Stat 33: 1008–1015. 1962
- Nadarajah S, Cordeiro GM, Ortega EMM: General results for the Kumaraswamy-G distribution. J. Stat. Comput. Simul 82: 951–979. 2011
- Pareto V: Cours d’economie Politique, Vol. II. F. Rouge, Lausanne; 1897
- Paranaiba PF, Ortega EMM, Cordeiro GM, de Pascoa MAR: The Kumaraswamy Burr XII distribution: theory and practice. J. Stat. Comput. Simul 83: 2117–2143. 2013
- Pillai RN: Semi-Pareto processes. J. Appl. Probab 28: 461–465. 1991
- Takahasi K: Note on the multivariate Burr’s distribution. Ann. Inst. Statist. Math 17: 257–260. 1965
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.