The beta Marshall-Olkin family of distributions
- Morad Alizadeh^{1},
- Gauss M. Cordeiro^{2},
- Edleide de Brito^{3} and
- Clarice Garcia B. Demétrio^{4}
https://doi.org/10.1186/s40488-015-0027-7
© Alizadeh et al.; licensee Springer. 2015
Received: 6 January 2015
Accepted: 10 June 2015
Published: 1 July 2015
Abstract
We study general mathematical properties of a new generator of continuous distributions with three extra shape parameters called the beta Marshall-Olkin family. We present some special models and investigate the asymptotes and shapes. The new density function can be expressed as a mixture of exponentiated densities based on the same baseline distribution. We derive a power series for its quantile function. Explicit expressions for the ordinary and incomplete moments, quantile and generating functions, Bonferroni and Lorenz curves, Shannon and Rényi entropies and order statistics, which hold for any baseline model, are determined. We discuss the estimation of the model parameters by maximum likelihood and illustrate the flexibility of the family by means of two applications to real data.
PACS 02.50.Ng, 02.50.Cw, 02.50.-r
Mathematics Subject Classification (2010) 62E10, 60E05, 62P99
Keywords
Generated family; Marshall-Olkin family; Maximum likelihood; Moment; Order statistic; Quantile function; Rényi entropy
Introduction
Recently, several attempts have been made to define new families that extend well-known distributions and at the same time provide great flexibility in modelling data in practice. Accordingly, several classes that add one or more parameters to a baseline model in order to generate new distributions have been proposed in the statistical literature. Some well-known generators are: the Marshall-Olkin generated (MO-G) by (Marshall and Olkin 1997), the beta-G by (Eugene et al. 2002), the Kumaraswamy-G (Kw-G for short) by (Cordeiro and Castro 2011), the McDonald-G (Mc-G) by (Alexander et al. 2012), the gamma-G by (Zografos and Balakrishnan 2009), the transformed-transformer (T-X) by (Alzaatreh et al. 2013), the Weibull-G by (Bourguignon et al. 2014) and the exponentiated half-logistic by (Cordeiro et al. 2014).
Table 1 Some special models
a | b | c | G(x) | Reduced distributions |
---|---|---|---|---|
- | - | 1 | G(x) | Beta-G family (Eugene et al. 2002) |
1 | - | - | G(x) | Generalized Marshall-Olkin family (Jayakumar and Mathew 2008) |
- | 1 | - | G(x) | Exponentiated Marshall-Olkin family (New) |
1 | 1 | - | G(x) | Marshall-Olkin family (Marshall and Olkin 1997) |
1 | - | 1 | G(x) | Proportional hazard rate family (Gupta et al. 1998) |
- | 1 | 1 | G(x) | Proportional reversed hazard rate family (Gupta and Gupta 2007) |
1 | 1 | 1 | G(x) | G(x) |
This paper is organized as follows. In Section 2, we provide a physical interpretation of the BMO-G family. Three special cases of this family are defined in Section 3. In Section 4, the shapes of the density and hazard rate functions are described analytically. Some useful expansions are derived in Section 5. In Section 6, we obtain a power series for the BMO-G quantile function (qf). In Section 7, we propose explicit expressions for the ordinary and incomplete moments using the qf expansion. The generating function and mean deviations are derived in Sections 8 and 9, respectively. General expressions for the Rényi and Shannon entropies are presented in Section 10. The order statistics are investigated in Section 11. Estimation of the model parameters by maximum likelihood is performed in Section 12. Applications to two real data sets illustrate the performance of the new family in Section 13. The paper is concluded in Section 14.
The new density
where g(x;ξ) is the baseline pdf. Equation (4) will be most tractable when G(x;ξ) and g(x;ξ) have simple analytic expressions. Hereafter, a random variable X with density function (4) is denoted by X∼BMO-G(a,b,c,ξ). Further, we sometimes omit the dependence on the parameter vector ξ and write simply G(x)=G(x;ξ).
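As a rough numerical sketch of the density discussed above: the display for Eq. (4) is not reproduced in this text, so the functional form below is an assumption, obtained by applying the beta(a, b) generator to the Marshall-Olkin cdf G(x)/[1−(1−c){1−G(x)}]. The unit exponential baseline, the helper names and the parameter values are illustrative only.

```python
import math


def beta_fn(a, b):
    """Beta function B(a, b), computed through log-gamma for stability."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))


def bmo_pdf(x, a, b, c, G, g):
    """Assumed BMO-G density (a reconstruction, not copied from Eq. (4)):

    f(x) = c^b g(x) G(x)^(a-1) [1-G(x)]^(b-1)
           / ( B(a,b) [1-(1-c)(1-G(x))]^(a+b) )
    """
    Gx, gx = G(x), g(x)
    denom = 1.0 - (1.0 - c) * (1.0 - Gx)
    return (c ** b * gx * Gx ** (a - 1.0) * (1.0 - Gx) ** (b - 1.0)
            / (beta_fn(a, b) * denom ** (a + b)))


# Illustrative baseline: unit exponential distribution.
def G_exp(x):
    return 1.0 - math.exp(-x)


def g_exp(x):
    return math.exp(-x)


def integrate(fn, lo, hi, n=40_000):
    """Midpoint rule on [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(fn(lo + (i + 0.5) * h) for i in range(n))


# The density should integrate to (approximately) one.
total = integrate(lambda x: bmo_pdf(x, 2.0, 1.5, 0.5, G_exp, g_exp), 0.0, 40.0)
print(total)

# With a = b = c = 1 the assumed form collapses to the baseline density.
print(bmo_pdf(0.7, 1.0, 1.0, 1.0, G_exp, g_exp), g_exp(0.7))
```

Any baseline with tractable G and g can be passed in the same way; shape values below one make the density unbounded at the boundary, so a plain midpoint rule would then need more care.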
The basic motivations for using the BMO family in practice are: (i) to make the kurtosis more flexible compared to the baseline model; (ii) to produce skewness for symmetrical distributions; (iii) to construct heavy-tailed distributions that are not longer-tailed for modeling real data; (iv) to generate distributions that are symmetric, left-skewed, right-skewed or reversed-J shaped; (v) to define special models with all types of hazard rate function (hrf); (vi) to generate a large number of special distributions, such as those presented in Table 1; and (vii) to provide better fits than other generated models under the same baseline distribution. A simple example of (ii): the normal distribution is symmetric, but the beta Marshall-Olkin normal (BMO-N) distribution can be skewed. Point (vii) is illustrated by fitting the BMO-N and beta Marshall-Olkin Weibull (BMO-W) distributions to two real data sets in Section 13. However, we expect that there are other contexts in which the BMO special models produce worse fits than other generated distributions. The results in Section 13 indicate that the new family is competitive with other known generators that have at most three extra shape parameters.
Some special BMO distributions
The BMO-G density function (4) allows for greater flexibility of its tails and can be widely applied in many areas. The new family extends several widely-known distributions in the literature. Here, we present a few of its many special models.
3.1 The BMO-N distribution
3.2 The BMO-W distribution
3.3 The Beta Marshall-Olkin gamma (BMO-Ga) distribution
Asymptotics and shapes
Corollary 1.
Corollary 2.
Useful representation
where \(h_{k+1}(x)=(k+1)\,g(x;\xi)\,G^{k}(x;\xi)\) denotes the exp-G density function with power parameter k+1.
Thus, some mathematical properties of the new model can be derived from those exp-G properties. For example, the ordinary and incomplete moments and moment generating function (mgf) of X can be obtained from those quantities of the exp-G distribution.
The formulae derived throughout the paper can be easily handled in most symbolic computation platforms such as Maple, Mathematica and Matlab. These platforms can deal with analytic expressions of formidable size and complexity. Evaluating the established explicit expressions for statistical measures can be more efficient than computing those measures directly by numerical integration. The infinite upper limit in these sums can be replaced by a large positive integer such as 20 or 30 for most practical purposes.
Quantile power series
Moments
The probability weighted moments (PWMs) are used to derive estimators of the parameters and quantiles of generalized distributions. The moment method of estimation is formulated by equating the population and sample PWMs. The resulting estimators have low variance and no severe bias, and they compare favorably with estimators obtained by maximum likelihood. However, the maximum likelihood method is adopted in Section 12, since several computer routines available in widely used software make it easier to estimate the BMO-G parameters. The maximum likelihood estimators (MLEs) enjoy desirable properties and can be used for constructing confidence intervals and test statistics.
where \(\omega _{r,k}={\int _{0}^{1}} Q_{G}(u)^{r}\,u^{k} d u\) can be computed at least numerically from any baseline qf.
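The quantity \(\omega_{r,k}\) can be approximated with a few lines of code. The sketch below uses a plain midpoint rule and an exponential baseline with rate 2 (both illustrative choices, not taken from the paper). As a sanity check it uses the fact that, for integer k, the exp-G law with power k+1 is the law of the maximum of k+1 independent baseline draws, so \((k+1)\,\omega_{1,k}\) should equal the harmonic number \(H_{k+1}\) divided by the rate.

```python
import math


def omega(qf, r, k, n=100_000):
    """Midpoint-rule approximation of the integral of Q_G(u)^r u^k over (0, 1)."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * h
        total += qf(u) ** r * u ** k
    return h * total


RATE = 2.0  # illustrative exponential rate (an assumption)


def qf_exp(u):
    """Quantile function of the exponential baseline."""
    return -math.log1p(-u) / RATE


def harmonic(m):
    """Harmonic number H_m = 1 + 1/2 + ... + 1/m."""
    return sum(1.0 / j for j in range(1, m + 1))


for k in range(3):
    numeric = (k + 1) * omega(qf_exp, 1, k)  # mean of the exp-G law with power k+1
    exact = harmonic(k + 1) / RATE           # mean of the max of k+1 iid exponentials
    print(k, round(numeric, 5), round(exact, 5))
```

The integrand has a mild logarithmic singularity at u = 1 for this baseline, which the midpoint rule tolerates; heavier-tailed baselines may call for an adaptive quadrature routine.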
where \((a)_{i}=a(a+1)\cdots(a+i-1)\) is the ascending factorial (with the convention that \((a)_{0}=1\)).
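A small sketch of the ascending factorial, together with one familiar series in which it appears: the generalized binomial expansion \((1-z)^{-a}=\sum_{j\ge 0}(a)_{j}\,z^{j}/j!\) for |z|<1. Whether this is the exact expansion behind the equation above is an assumption; the example is only meant to show how a truncated sum with a few dozen terms behaves. The function names are hypothetical.

```python
def rising(a, i):
    """Ascending factorial (a)_i = a (a+1) ... (a+i-1), with (a)_0 = 1."""
    out = 1.0
    for j in range(i):
        out *= a + j
    return out


def binom_series(a, z, terms):
    """Partial sum of (1 - z)^(-a) = sum_j (a)_j z^j / j!, valid for |z| < 1."""
    total, term = 0.0, 1.0
    for j in range(terms):
        total += term
        term *= (a + j) * z / (j + 1)  # ratio of consecutive series terms
    return total


print(rising(3, 4))  # 3 * 4 * 5 * 6
for terms in (10, 20, 30):
    approx = binom_series(2.5, 0.3, terms)
    print(terms, approx, abs(approx - 0.7 ** -2.5))
```

The truncation error shrinks geometrically in z, so a fixed cut-off of 20 or 30 terms is accurate for moderate z but converges more slowly as z approaches one.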
This equation holds when r+k−l is even and it vanishes when r+k−l is odd. So, any BMO-N moment can be expressed as an infinite weighted linear combination of Lauricella functions of type A.
Some important questions in economics are answered by knowing the mean and the shape of a distribution. Incomplete moments of an income distribution form natural building blocks for measuring inequality: for example, the Lorenz and Bonferroni curves depend upon the incomplete moments of the income distribution.
The integral in (13) can be computed at least numerically for most baseline distributions.
Generating function
where \(M_{k+1}(s)\) is the exp-G generating function with power parameter k+1.
where the quantity \(\rho _{k}(s)={\int _{0}^{1}}\exp \left [s\,Q_{G}(u)\right ] u^{k} d u\) can be computed numerically. Equations (14) and (15) are the main results of this section.
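The following sketch evaluates \(\rho_{k}(s)\) by a midpoint rule. The exponential baseline with rate 2 and the negative argument s = −1 are illustrative assumptions; for that baseline \(\exp[s\,Q_{G}(u)]=(1-u)^{-s/\lambda}\), so the integral has the closed form \(B(k+1,\,1-s/\lambda)\), which serves as a check.

```python
import math


def rho(qf, s, k, n=100_000):
    """Midpoint-rule approximation of the integral of exp(s Q_G(u)) u^k over (0, 1)."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * h
        total += math.exp(s * qf(u)) * u ** k
    return h * total


RATE = 2.0  # illustrative exponential rate (an assumption)


def qf_exp(u):
    """Quantile function of the exponential baseline."""
    return -math.log1p(-u) / RATE


def beta_fn(a, b):
    """Beta function B(a, b) via log-gamma."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))


s = -1.0
for k in range(3):
    numeric = rho(qf_exp, s, k)
    exact = beta_fn(k + 1, 1.0 - s / RATE)
    print(k, numeric, exact)
```

Multiplying \(\rho_{k}(s)\) by k+1 gives the exp-G generating function with power parameter k+1, by the change of variable u = G(x).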
Mean deviations
respectively, where M=Q(0.5) is the median of X, \(\mu ^{\prime }_{1}=\mathrm {E}(X)\) comes from Eq. (12), \(F(\mu ^{\prime }_{1})\) is easily calculated from Eq. (3) and \(m_{1}(z)=\int _{-\infty }^{z} x\,f(x) dx\) is the first incomplete moment.
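The displayed formulas are not reproduced in this text. The sketch below assumes they are the standard identities \(\delta_{1}=2\mu^{\prime}_{1}F(\mu^{\prime}_{1})-2m_{1}(\mu^{\prime}_{1})\) for the mean deviation about the mean and \(\delta_{2}=\mu^{\prime}_{1}-2m_{1}(M)\) for the mean deviation about the median. It checks them on the unit exponential distribution, chosen only because its values (2/e and log 2) are known in closed form.

```python
import math


def mean_deviations(mu, median, cdf, m1):
    """Mean deviations about the mean and about the median.

    mu     : mean of X
    median : median of X
    cdf    : cumulative distribution function F
    m1     : first incomplete moment, m1(z) = integral of x f(x) up to z
    """
    about_mean = 2.0 * mu * cdf(mu) - 2.0 * m1(mu)
    about_median = mu - 2.0 * m1(median)
    return about_mean, about_median


# Unit exponential distribution (illustrative choice).
def cdf_exp(x):
    return 1.0 - math.exp(-x)


def m1_exp(z):
    # integral of x * exp(-x) from 0 to z
    return 1.0 - (1.0 + z) * math.exp(-z)


d1, d2 = mean_deviations(1.0, math.log(2.0), cdf_exp, m1_exp)
print(d1, 2.0 / math.e)   # the two values should agree
print(d2, math.log(2.0))  # the two values should agree
```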
where \(J_{k+1}(z)=\int _{-\infty }^{z} x\,h_{k+1}(x)dx\).
where \(T_{k}(z)=\int _{0}^{G(z)}Q_{G}(u)\,u^{k} du\).
Entropies
After some algebraic manipulations, we have the following proposition.
Proposition 1.
Order statistics
where K=n!/[(i−1)! (n−i)!].
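For orientation, the constant K is the normalizing factor in the textbook density of the ith order statistic from a sample of size n, \(f_{i:n}(x)=K\,f(x)\,F(x)^{i-1}\,[1-F(x)]^{n-i}\). The sketch below is generic (it does not implement the paper's expansion, which is not shown here) and checks the normalization on the uniform(0,1) distribution.

```python
import math


def order_stat_constant(n, i):
    """K = n! / [(i-1)! (n-i)!]."""
    return math.factorial(n) // (math.factorial(i - 1) * math.factorial(n - i))


def order_stat_pdf(x, n, i, pdf, cdf):
    """Textbook density of the i-th order statistic in a sample of size n."""
    F = cdf(x)
    return order_stat_constant(n, i) * pdf(x) * F ** (i - 1) * (1.0 - F) ** (n - i)


def integrate(fn, lo, hi, steps=10_000):
    """Midpoint rule on [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(fn(lo + (j + 0.5) * h) for j in range(steps))


# Uniform(0, 1) parent distribution (illustrative choice).
def unif_pdf(x):
    return 1.0


def unif_cdf(x):
    return x


print(order_stat_constant(5, 2))
area = integrate(lambda x: order_stat_pdf(x, 5, 2, unif_pdf, unif_cdf), 0.0, 1.0)
print(area)  # should be close to 1
```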
We can obtain the ordinary and incomplete moments, generating function and mean deviations of the BMO-G order statistics from Eq. (18) and some properties of the exp-G model.
Estimation
Interval estimation for the model parameters requires the observed information matrix, whose elements \(U_{rs}=\partial^{2}\ell/\partial r\,\partial s\) (for r,s=a,b,c,ξ) can be computed numerically. Under standard regularity conditions (Cox and Hinkley 1979), we can approximate the distribution of \((\widehat {\Theta }-\Theta)\) by the multivariate normal \(N_{r+3}(0,J(\Theta)^{-1})\) distribution, where r here denotes the number of parameters of the baseline distribution.
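A minimal sketch of how such an information matrix can be obtained numerically and turned into Wald-type intervals. To stay short it uses a two-parameter normal log-likelihood on a small hypothetical sample rather than a BMO-G model; the finite-difference step, the sample and the helper names are assumptions made for illustration.

```python
import math

data = [4.1, 5.3, 6.0, 4.8, 5.9, 5.2, 4.4, 6.3]  # hypothetical sample
n = len(data)


def loglik(theta):
    """Normal log-likelihood with theta = (mu, sigma)."""
    mu, sigma = theta
    return sum(-math.log(sigma) - 0.5 * math.log(2.0 * math.pi)
               - 0.5 * ((x - mu) / sigma) ** 2 for x in data)


def hessian(fn, theta, h=1e-4):
    """Central finite-difference approximation of the matrix of second derivatives."""
    p = len(theta)
    H = [[0.0] * p for _ in range(p)]
    for r in range(p):
        for s in range(p):
            def shifted(dr, ds):
                t = list(theta)
                t[r] += dr * h
                t[s] += ds * h
                return fn(t)
            H[r][s] = (shifted(1, 1) - shifted(1, -1)
                       - shifted(-1, 1) + shifted(-1, -1)) / (4.0 * h * h)
    return H


# Closed-form MLEs for the normal model.
mu_hat = sum(data) / n
sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in data) / n)

H = hessian(loglik, [mu_hat, sigma_hat])
J = [[-H[0][0], -H[0][1]], [-H[1][0], -H[1][1]]]  # observed information = -Hessian
det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
cov = [[J[1][1] / det, -J[0][1] / det],
       [-J[1][0] / det, J[0][0] / det]]           # inverse of the 2 x 2 matrix J

se_mu, se_sigma = math.sqrt(cov[0][0]), math.sqrt(cov[1][1])
ci_mu = (mu_hat - 1.96 * se_mu, mu_hat + 1.96 * se_mu)  # approximate 95% interval
print(se_mu, sigma_hat / math.sqrt(n))
print(se_sigma, sigma_hat / math.sqrt(2 * n))
print(ci_mu)
```

For the normal model the numerical standard errors can be compared with the analytic values σ/√n and σ/√(2n); for a BMO-G model only the log-likelihood function and the parameter vector would change, and a general matrix inverse would replace the 2 × 2 formula.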
We can compute the maximum values of the unrestricted and restricted log-likelihoods to construct likelihood ratio (LR) statistics for testing some sub-models of the BMO-G distribution. For example, we may use LR statistics to check if the fit using the BMO-W distribution is statistically “superior” to the fits using the BW, MOW, EW, EE and Weibull distributions for a given data set.
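The LR comparison can be sketched as follows, assuming the usual chi-square reference distribution with degrees of freedom equal to the number of restrictions (the text does not state which reference distribution was used for the reported p-values). The upper-tail function uses only the standard library, via the recurrence Q(a+1, t) = Q(a, t) + t^a e^{−t}/Γ(a+1) for the regularized upper incomplete gamma function; the log-likelihood values are hypothetical.

```python
import math


def lr_statistic(loglik_full, loglik_restricted):
    """Likelihood ratio statistic 2 (l_full - l_restricted)."""
    return 2.0 * (loglik_full - loglik_restricted)


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with positive integer df."""
    if x <= 0.0:
        return 1.0
    t = 0.5 * x
    if df % 2 == 0:
        a, q = 1.0, math.exp(-t)              # Q(1, t)
    else:
        a, q = 0.5, math.erfc(math.sqrt(t))   # Q(1/2, t)
    for _ in range((df - 1) // 2):            # climb from a to df/2 in unit steps
        q += math.exp(a * math.log(t) - t - math.lgamma(a + 1.0))
        a += 1.0
    return q


# Hypothetical maximized log-likelihoods; three restrictions (e.g. a = b = c = 1).
w = lr_statistic(-101.3, -104.9)
print(w, chi2_sf(w, 3))
```

A small p-value favours the unrestricted model; the chi-square approximation relies on the usual regularity conditions.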
where S(x;a,b,c,ξ)=1−F(x;a,b,c,ξ) is the survival function obtained from (3) and f(x;a,b,c,ξ) is given by (4). We maximize the likelihood (20) in the same way as described before.
Empirical illustration
We illustrate the flexibility of the BMO-W and BMO-N distributions by means of two real data sets. Similar investigations could be performed for other BMO distributions. We have chosen these distributions because of the popularity of their baseline distributions. The computations are performed using the software R version 3.0.0 (package bbmle). The maximization follows the BFGS method with analytical derivatives. The algorithm used to estimate the model parameters converged for all current models.
13.1 Illustration 1: Failure time data
MLEs (SEs in parentheses) for some fitted models to the failure time data and the AIC, CAIC and BIC values
Model | a | b | c | α | β | AIC | CAIC | BIC |
---|---|---|---|---|---|---|---|---|
W | - | - | - | 0.349 (0.017) | 2.374 (0.210) | 264.107 | 264.255 | 268.968 |
EW | 0.284 (0.054) | - | - | 0.253 (0.011) | 5.747 (0.693) | 261.211 | 261.511 | 268.504 |
BW | 0.274 (0.042) | 0.785 (0.779) | - | 0.266 (0.056) | 5.864 (0.359) | 263.167 | 263.673 | 272.890 |
MOW | - | - | 27.588 (57.410) | 1.247 (1.405) | 1.058 (0.514) | 262.491 | 262.791 | 269.783 |
EMOW | - | 0.280 (0.046) | 0.838 (0.664) | 0.249 (0.017) | 5.920 (0.274) | 263.160 | 263.667 | 272.883 |
BMO-W | 0.262 (0.046) | 0.259 (0.085) | 0.086 (0.080) | 0.293 (0.019) | 7.450 (0.014) | 260.389 | 261.159 | 272.543 |
McW | 3.551 (1.140) | 0.108 (0.013) | 0.091 (0.008) | 0.715 (0.015) | 2.875 (0.017) | 264.212 | 264.981 | 276.366 |
Kw-W | 0.266 (0.048) | 1.336 (0.711) | - | 0.231 (0.032) | 6.515 (0.054) | 262.948 | 263.454 | 272.671 |
LNB-W | 0.281 (0.048) | 0.287 (0.095) | 7.281 (6.055) | 0.298 (0.020) | 6.807 (0.066) | 261.158 | 261.928 | 273.313 |
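The three criteria in this table appear consistent with the usual definitions AIC = −2ℓ + 2p, CAIC = AIC + 2p(p+1)/(n−p−1) and BIC = −2ℓ + p log n, where p is the number of estimated parameters. The sketch below checks this for two rows; the sample size n = 84 is not stated in this text and is inferred from the reported values, so treat it as an assumption.

```python
import math


def criteria(loglik, p, n):
    """Return (AIC, CAIC, BIC) for a maximized log-likelihood with p parameters."""
    aic = -2.0 * loglik + 2.0 * p
    caic = aic + 2.0 * p * (p + 1) / (n - p - 1)
    bic = -2.0 * loglik + p * math.log(n)
    return aic, caic, bic


N_OBS = 84  # inferred from the table, not stated in the text (assumption)

# Log-likelihoods backed out from the reported AIC values: l = -(AIC - 2p) / 2.
loglik_w = -(264.107 - 2 * 2) / 2.0    # Weibull, p = 2
loglik_bmo = -(260.389 - 2 * 5) / 2.0  # BMO-W, p = 5

w_aic, w_caic, w_bic = criteria(loglik_w, 2, N_OBS)
b_aic, b_caic, b_bic = criteria(loglik_bmo, 5, N_OBS)
print(round(w_caic, 3), round(w_bic, 3))  # compare with the W row
print(round(b_caic, 3), round(b_bic, 3))  # compare with the BMO-W row
```

Because BIC penalizes each parameter by log n rather than 2, the five-parameter BMO-W model has the lowest AIC and CAIC in the table but not the lowest BIC.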
LR tests
Failures | Hypotheses | LR statistic | p-value |
---|---|---|---|
BMO-W vs W | \(H_{0}\): a=b=c=1 vs \(H_{1}\): \(H_{0}\) is false | 9.717 | 0.010 |
BMO-W vs EW | \(H_{0}\): b=c=1 vs \(H_{1}\): \(H_{0}\) is false | 4.822 | 0.045 |
BMO-W vs BW | \(H_{0}\): c=1 vs \(H_{1}\): \(H_{0}\) is false | 4.777 | 0.017 |
BMO-W vs MOW | \(H_{0}\): a=b=1 vs \(H_{1}\): \(H_{0}\) is false | 6.101 | 0.024 |
BMO-W vs EMOW | \(H_{0}\): a=1 vs \(H_{1}\): \(H_{0}\) is false | 4.771 | 0.017 |
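The LR statistics can be recovered from the AIC values reported for the failure time data, since −2ℓ = AIC − 2p for a model with p estimated parameters. The sketch below does this; the parameter counts are read off the non-empty columns of the fitted-model table, and the dictionary and function names are hypothetical.

```python
# (AIC, number of estimated parameters) for the failure time data fits
fits = {
    "W": (264.107, 2),
    "EW": (261.211, 3),
    "BW": (263.167, 4),
    "MOW": (262.491, 3),
    "EMOW": (263.160, 4),
    "BMO-W": (260.389, 5),
}


def minus2loglik(aic, p):
    """Back out -2 * log-likelihood from AIC = -2 l + 2 p."""
    return aic - 2.0 * p


def lr_from_aic(full, restricted):
    """LR statistic 2 (l_full - l_restricted) computed from reported AIC values."""
    return minus2loglik(*fits[restricted]) - minus2loglik(*fits[full])


for sub_model in ("W", "EW", "BW", "MOW", "EMOW"):
    print("BMO-W vs", sub_model, round(lr_from_aic("BMO-W", sub_model), 3))
```

The recovered statistics agree with the tabulated ones up to rounding in the third decimal place.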
13.2 Illustration 2: Plasma ferritin data
where x∈ℝ, μ∈ℝ is a location parameter, σ>0 is a scale parameter, a, b and c are positive shape parameters and ϕ(·) and Φ(·) are the pdf and cdf of the standard normal distribution, respectively.
MLEs (SEs in parentheses) for some fitted models to the plasma ferritin data and the AIC, CAIC and BIC values
Model | a | b | c | μ | σ | AIC | CAIC | BIC |
---|---|---|---|---|---|---|---|---|
N | - | - | - | 76.887 (3.335) | 47.399 (2.359) | 2135.994 | 2136.054 | 2142.611 |
EN | 329.601 (257.441) | - | - | -242.079 (50.515) | 109.357 (9.321) | 2088.197 | 2088.318 | 2098.122 |
BN | 7.349 (1.252) | 0.185 (0.014) | - | -12.984 (0.003) | 30.265 (0.003) | 2083.599 | 2083.803 | 2096.833 |
MON | - | - | 0.040 (0.023) | 158.760 (18.248) | 51.373 (3.962) | 2090.927 | 2091.048 | 2100.851 |
EMON | - | 18.598 (10.644) | 0.007 (0.003) | 124.403 (17.340) | 58.756 (3.442) | 2068.369 | 2068.572 | 2081.602 |
BMO-N | 6.373 (6.622) | 0.542 (0.161) | 0.007 (0.003) | 111.698 (15.687) | 41.750 (7.625) | 2065.363 | 2065.669 | 2081.904 |
McN | 0.156 (0.013) | 0.182 (0.016) | 25.619 (0.165) | 17.890 (0.357) | 28.867 (0.254) | 2074.582 | 2074.888 | 2091.123 |
Kw-N | 2.980 (0.073) | 0.229 (0.016) | - | 12.806 (0.197) | 29.581 (0.042) | 2099.027 | 2099.230 | 2112.260 |
LNB-N | 2.661 (1.659) | 0.520 (0.139) | 58.246 (26.941) | 108.256 (13.437) | 38.987 (6.660) | 2067.884 | 2068.190 | 2084.425 |
LR tests
Plasma | Hypotheses | LR statistic | p-value |
---|---|---|---|
BMO-N vs N | \(H_{0}\): a=b=c=1 vs \(H_{1}\): \(H_{0}\) is false | 76.631 | <0.001 |
BMO-N vs EN | \(H_{0}\): b=c=1 vs \(H_{1}\): \(H_{0}\) is false | 26.834 | <0.001 |
BMO-N vs BN | \(H_{0}\): c=1 vs \(H_{1}\): \(H_{0}\) is false | 20.236 | <0.001 |
BMO-N vs MON | \(H_{0}\): a=b=1 vs \(H_{1}\): \(H_{0}\) is false | 29.564 | <0.001 |
BMO-N vs EMON | \(H_{0}\): a=1 vs \(H_{1}\): \(H_{0}\) is false | 5.005 | 0.015 |
Concluding remarks
We define a new class of models, named the beta Marshall-Olkin-G (BMO-G) family of distributions, by adding three shape parameters to a baseline model. The family generalizes some well-known distributions in the statistical literature such as the normal, Weibull and beta distributions. We provide a mathematical treatment of the proposed family including expansions for the density function, ordinary and incomplete moments and generating function. The BMO-G density function can be expressed as a mixture of exponentiated density functions. This property is important to obtain several other results. We derive a power series for the quantile function. Our formulas related to the BMO-G model are manageable, and with the use of modern computer resources with analytic and numerical capabilities, they may turn into adequate tools for applied statisticians. Some special models are explored. The estimation of the model parameters is carried out by the method of maximum likelihood. Finally, we fit some special models in the new family to two real data sets to demonstrate their potential.
Endnote
^{1} http://functions.wolfram.com/06.23.06.0004.01
References
- Alexander, C, Cordeiro, GM, Ortega, EMM, Sarabia, JM: Generalized beta-generated distributions. Comput. Stat. Data Anal. 56, 1880–1897 (2012).
- Alzaatreh, A, Lee, C, Famoye, F: A new method for generating families of continuous distributions. METRON. 71, 63–79 (2013).
- Bourguignon, M, Silva, RB, Cordeiro, GM: The Weibull-G family of probability distributions. J. Data Sci. 12, 53–68 (2014).
- Cordeiro, GM, Alizadeh, M, Ortega, EMM: The exponentiated half-logistic family of distributions: Properties and applications. J. Probab. Stat. 2014, 1–21 (2014).
- Cordeiro, GM, Castro, M: A new family of generalized distributions. J. Stat. Comput. Simul. 81, 883–898 (2011).
- Cordeiro, GM, Nadarajah, S: Closed-form expressions for moments of a class of beta generalized distributions. Braz. J. Probab. Stat. 25, 14–33 (2011).
- Cox, DR, Hinkley, DV: Theoretical statistics. Chapman and Hall, London (1979).
- Doornik, J: Ox 5: An Object-Oriented Matrix Language. Timberlake Consultants Press, London (2007).
- Eugene, N, Lee, C, Famoye, F: Beta-normal distribution and its applications. Commun. Stat. Theory Methods. 31, 497–512 (2002).
- Exton, H: Handbook of Hypergeometric Integrals: Theory, Applications, Tables, Computer Programs. Halsted Press, New York (1978).
- Gupta, RC, Gupta, PL, Gupta, RD: Modeling failure time data by Lehman alternatives. Commun. Stat. Theory Methods. 27, 887–904 (1998).
- Gupta, RC, Gupta, RD: Proportional reversed hazard rate model and its applications. J. Stat. Planning Inference. 137, 3525–3536 (2007).
- Jayakumar, K, Mathew, T: On a generalization to Marshall–Olkin scheme and its application to Burr type XII distribution. Stat. Papers. 49, 421–439 (2008).
- Jones, MC: Families of distributions arising from distributions of order statistics. Test. 13, 1–43 (2004).
- Kenney, JF, Keeping, ES: Mathematics of Statistics, Part 1. 3rd edition, Van Nostrand, New Jersey (1962).
- Marshall, AW, Olkin, I: A new method for adding a parameter to a family of distributions with application to the exponential and Weibull families. Biometrika. 84, 641–652 (1997).
- Moors, JJA: A quantile alternative for kurtosis. The Statistician. 37, 25–32 (1988).
- Murthy, DNP, Xie, M, Jiang, R: Weibull models. New Jersey (2004).
- R Development Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria (2013).
- Rényi, A: On measures of entropy and information, volume I. In: Proceedings of the 4th Berkeley Symposium on Mathematical Statistics and Probability. University of California Press, Berkeley (1961).
- Rigby, RA, Stasinopoulos, DM: Generalized additive models for location, scale and shape (with discussion). Appl. Stat. 54, 507–554 (2005).
- Shannon, CE: Prediction and entropy of printed English. Bell Syst. Tech. J. 30, 50–64 (1951).
- Trott, M: The Mathematica Guidebook for Symbolics. With 1 DVD-ROM (Windows, Macintosh and UNIX). Springer, New York (2006).
- Weisberg, S: Applied linear regression. 3rd edition. Wiley, New York (2014).
- Zografos, K, Balakrishnan, N: On families of beta- and generalized gamma-generated distributions and associated inference. Stat. Methodol. 6, 344–362 (2009).
Copyright
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.