Open Access

Exponentiated Kumaraswamy-Dagum distribution with applications to income and lifetime data

Journal of Statistical Distributions and Applications 2014, 1:8

DOI: 10.1186/2195-5832-1-8

Received: 22 November 2013

Accepted: 11 March 2014

Published: 16 June 2014

Abstract

A new family of distributions called exponentiated Kumaraswamy-Dagum (EKD) distribution is proposed and studied. This family includes several well known sub-models, such as Dagum (D), Burr III (BIII), Fisk or Log-logistic (F or LLog), and new sub-models, namely, Kumaraswamy-Dagum (KD), Kumaraswamy-Burr III (KBIII), Kumaraswamy-Fisk or Kumaraswamy-Log-logistic (KF or KLLog), exponentiated Kumaraswamy-Burr III (EKBIII), and exponentiated Kumaraswamy-Fisk or exponentiated Kumaraswamy-Log-logistic (EKF or EKLLog) distributions. Statistical properties including series representation of the probability density function, hazard and reverse hazard functions, moments, mean and median deviations, reliability, Bonferroni and Lorenz curves, as well as entropy measures for this class of distributions and the sub-models are presented. Maximum likelihood estimates of the model parameters are obtained. Simulation studies are conducted. Examples and applications as well as comparisons of the EKD and its sub-distributions with other distributions are given.

Mathematics Subject Classification (2000)

62E10; 62F30

Keywords

Dagum distribution; Exponentiated Kumaraswamy-Dagum distribution; Maximum likelihood estimation

1 Introduction

Camilo Dagum proposed the distribution that now bears his name in 1977. This proposal enabled the development of statistical distributions for fitting empirical income and wealth data that can accommodate the heavy tails of income and wealth distributions. Dagum's distribution has both a Type-I and a Type-II specification, where Type-I is the three-parameter specification and Type-II is the four-parameter specification. This distribution is a special case of the generalized beta distribution of the second kind (GB2) (McDonald 1984; McDonald and Xu 1995), obtained when the parameter $q = 1$, where the probability density function (pdf) of the GB2 distribution is given by:
$$ f_{GB2}(y; a, b, p, q) = \frac{a\, y^{ap-1}}{b^{ap}\, B(p,q)\, \left[1 + \left(\frac{y}{b}\right)^{a}\right]^{p+q}}, \quad \text{for } y > 0. $$

See Kleiber and Kotz (2003) for details. Note that $a > 0$, $p > 0$, $q > 0$ are the shape parameters, $b$ is the scale parameter, and $B(p,q) = \frac{\Gamma(p)\Gamma(q)}{\Gamma(p+q)}$ is the beta function. Kleiber (2008) traced the genesis of Dagum distribution and summarized several statistical properties of this distribution. Domma et al. (2011) obtained the maximum likelihood estimates of the parameters of Dagum distribution for censored data. Domma and Condino (2013) presented the beta-Dagum distribution. Cordeiro et al. (2013) proposed the beta exponentiated Weibull distribution. Cordeiro et al. (2010) introduced and studied some mathematical properties of the Kumaraswamy Weibull distribution. Oluyede and Rajasooriya (2013) developed the Mc-Dagum distribution and presented its statistical properties. See references therein for additional results.

The pdf and cumulative distribution function (cdf) of Dagum distribution are given by:
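$$ g_D(x; \lambda, \beta, \delta) = \beta \lambda \delta\, x^{-\delta-1} \left(1 + \lambda x^{-\delta}\right)^{-\beta-1} \tag{1} $$
and
$$ G_D(x; \lambda, \beta, \delta) = \left(1 + \lambda x^{-\delta}\right)^{-\beta}, \tag{2} $$
for $x > 0$, where $\lambda$ is a scale parameter, $\delta$ and $\beta$ are shape parameters. Dagum (1977) refers to his model as the generalized logistic-Burr distribution. The $k$th raw or non-central moments are given by
$$ E\left(X^{k}\right) = \beta \lambda^{k/\delta}\, B\left(\beta + \frac{k}{\delta},\, 1 - \frac{k}{\delta}\right), $$
for $k < \delta$, and $\lambda, \beta > 0$, where $B(\cdot,\cdot)$ is the beta function. The $q$th percentile is
$$ x_q = \lambda^{1/\delta} \left(q^{-1/\beta} - 1\right)^{-1/\delta}. $$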
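As a quick illustration of equations (1) and (2), the moment formula and the percentile formula, the following minimal Python sketch evaluates them with the standard library only. The function names (`dagum_pdf`, `dagum_cdf`, `dagum_quantile`, `dagum_raw_moment`) are illustrative choices, not part of the original paper.

```python
import math


def dagum_pdf(x, lam, beta, delta):
    """Dagum pdf, equation (1)."""
    return beta * lam * delta * x ** (-delta - 1) * (1.0 + lam * x ** (-delta)) ** (-beta - 1)


def dagum_cdf(x, lam, beta, delta):
    """Dagum cdf, equation (2)."""
    return (1.0 + lam * x ** (-delta)) ** (-beta)


def dagum_quantile(q, lam, beta, delta):
    """q-th percentile, obtained by inverting the cdf."""
    return lam ** (1.0 / delta) * (q ** (-1.0 / beta) - 1.0) ** (-1.0 / delta)


def beta_fn(a, b):
    """Beta function B(a, b) expressed through gamma functions."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)


def dagum_raw_moment(k, lam, beta, delta):
    """k-th raw moment; the formula is stated only for k < delta."""
    if k >= delta:
        raise ValueError("the k-th raw moment requires k < delta")
    return beta * lam ** (k / delta) * beta_fn(beta + k / delta, 1.0 - k / delta)
```

For example, `dagum_cdf(dagum_quantile(0.3, 2.0, 1.5, 3.0), 2.0, 1.5, 3.0)` recovers 0.3 up to rounding, which is a convenient consistency check between the cdf and the percentile formula.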

In this paper, we present generalizations of the Dagum distribution via Kumaraswamy distribution and its exponentiated version. This leads to the exponentiated Kumaraswamy Dagum distribution.

The motivation for the development of this distribution is the modeling of the size distribution of personal income and of lifetime data with a flexible model that takes into consideration not only shape and scale but also skewness, kurtosis and tail variation. Also, the EKD distribution and its sub-models have the desirable feature of exhibiting a non-monotone failure rate, thereby accommodating different shapes for the hazard rate function, and should be an attractive choice for survival and reliability data analysis.

This paper is organized as follows. In section 3, we present the exponentiated Kumaraswamy-Dagum distribution and its sub models, as well as series expansion, hazard and reverse hazard functions. Moments, moment generating function, Lorenz and Bonferroni curves, mean and median deviations, and reliability are obtained in section 4. Section 5 contains results on the distribution of the order statistics and Renyi entropy. Estimation of model parameters via the method of maximum likelihood is presented in section 6. In section 7, various simulations are conducted for different sample sizes. Section 8 contains examples and applications of the EKD distribution and its sub-models, followed by concluding remarks.

2 Methods, results and discussions

Methods, results and discussions for the class of EKD distributions are presented in sections 3 to 8. These sections include the sub-models, series expansion of the pdf, closed form expressions for the hazard and reverse hazard functions, moments, moment generating function, Bonferroni and Lorenz curves, reliability, mean and median deviations, distribution of order statistics and entropy, as well as estimation of model parameters and applications.

3 The exponentiated Kumaraswamy-Dagum distribution

In this section, we present the proposed distribution and its sub-models. Series expansion, hazard and reverse hazard functions are also studied in this section.

3.1 Kumaraswamy-Dagum distribution

Kumaraswamy (1980) introduced a two-parameter distribution on (0,1). Its cdf is given by
$$ G(x) = 1 - \left(1 - x^{\psi}\right)^{\phi}, \quad x \in (0, 1), $$

for $\psi > 0$ and $\phi > 0$.

For an arbitrary cdf $F(x)$ with pdf $f(x) = \frac{dF(x)}{dx}$, the family of Kumaraswamy-G distributions with cdf $G_K(x)$ is given by
$$ G_K(x) = 1 - \left(1 - F^{\psi}(x)\right)^{\phi}, $$
for $\psi > 0$ and $\phi > 0$. By letting $F(x) = G_D(x)$, we obtain the Kumaraswamy-Dagum (KD) distribution, with cdf
$$ G_{KD}(x) = 1 - \left(1 - G_D^{\psi}(x)\right)^{\phi}. $$

3.2 The EKD distribution

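In general, the EKD distribution is $G_{EKD}(x) = [F_{KD}(x)]^{\theta}$, where $F_{KD}(x)$ is a baseline (Kum-Dagum) cdf, $\theta > 0$, with the corresponding pdf given by $g_{EKD}(x) = \theta [F_{KD}(x)]^{\theta-1} f_{KD}(x)$. For large values of $x$, and for $\theta > 1$ ($< 1$), the multiplicative factor $\theta [F_{KD}(x)]^{\theta-1} > 1$ ($< 1$), respectively. The reverse statement holds for smaller values of $x$. Consequently, this implies that the ordinary moments of $g_{EKD}(x)$ are larger (smaller) than those of $f_{KD}(x)$ when $\theta > 1$ ($< 1$).

Replacing the dependent parameter $\beta\psi$ by $\alpha$, the cdf and pdf of the EKD distribution are given by
$$ G_{EKD}(x; \alpha, \lambda, \delta, \phi, \theta) = \left\{1 - \left[1 - \left(1 + \lambda x^{-\delta}\right)^{-\alpha}\right]^{\phi}\right\}^{\theta}, \tag{3} $$
and
$$ g_{EKD}(x; \alpha, \lambda, \delta, \phi, \theta) = \alpha \lambda \delta \phi \theta\, x^{-\delta-1} \left(1 + \lambda x^{-\delta}\right)^{-\alpha-1} \left[1 - \left(1 + \lambda x^{-\delta}\right)^{-\alpha}\right]^{\phi-1} \times \left\{1 - \left[1 - \left(1 + \lambda x^{-\delta}\right)^{-\alpha}\right]^{\phi}\right\}^{\theta-1}, \tag{4} $$
for $\alpha, \lambda, \delta, \phi, \theta > 0$, and $x > 0$, respectively. The quantile function of the EKD distribution is in closed form,
$$ G_{EKD}^{-1}(q) = x_q = \lambda^{1/\delta} \left\{\left[1 - \left(1 - q^{1/\theta}\right)^{1/\phi}\right]^{-1/\alpha} - 1\right\}^{-1/\delta}. \tag{5} $$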
Plots of the pdf for some combinations of values of the model parameters are given in Figure 1. The plots indicate that the EKD pdf can be decreasing or right skewed. The EKD distribution has a positive asymmetry.
Figure 1

Graph of pdfs.
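Because the quantile function (5) is available in closed form, EKD variates can be generated by inverse-transform sampling. The following is a minimal sketch, assuming only the Python standard library; the function names and the helper `ekd_sample` are illustrative and do not come from the paper (which reports using SAS).

```python
import math
import random


def ekd_cdf(x, alpha, lam, delta, phi, theta):
    """EKD cdf, equation (3)."""
    d = (1.0 + lam * x ** (-delta)) ** (-alpha)
    return (1.0 - (1.0 - d) ** phi) ** theta


def ekd_pdf(x, alpha, lam, delta, phi, theta):
    """EKD pdf, equation (4)."""
    u = 1.0 + lam * x ** (-delta)
    d = u ** (-alpha)
    return (alpha * lam * delta * phi * theta * x ** (-delta - 1) * u ** (-alpha - 1)
            * (1.0 - d) ** (phi - 1) * (1.0 - (1.0 - d) ** phi) ** (theta - 1))


def ekd_quantile(q, alpha, lam, delta, phi, theta):
    """EKD quantile function, equation (5), for 0 < q < 1."""
    d = 1.0 - (1.0 - q ** (1.0 / theta)) ** (1.0 / phi)
    return lam ** (1.0 / delta) * (d ** (-1.0 / alpha) - 1.0) ** (-1.0 / delta)


def ekd_sample(n, alpha, lam, delta, phi, theta, seed=None):
    """Draw n EKD variates by applying the quantile function to uniforms."""
    rng = random.Random(seed)
    draws = []
    while len(draws) < n:
        u = rng.random()
        if u == 0.0:  # the quantile function is undefined at the endpoint 0
            continue
        draws.append(ekd_quantile(u, alpha, lam, delta, phi, theta))
    return draws
```

A simple sanity check is that roughly half of a large sample falls below `ekd_quantile(0.5, ...)`, the median given by equation (5).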

3.3 Some sub-models

Sub-models of EKD distribution for selected values of the parameters are presented in this section.
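  1. When $\theta = 1$, we obtain the Kumaraswamy-Dagum distribution with cdf:
     $$ G(x; \alpha, \lambda, \delta, \phi) = 1 - \left[1 - \left(1 + \lambda x^{-\delta}\right)^{-\alpha}\right]^{\phi}, $$
     for $\alpha, \lambda, \delta, \phi > 0$ and $x > 0$.
  2. When $\phi = \theta = 1$, we obtain the Dagum distribution with cdf:
     $$ G(x; \alpha, \lambda, \delta) = \left(1 + \lambda x^{-\delta}\right)^{-\alpha}, $$
     for $\alpha, \lambda, \delta > 0$ and $x > 0$.
  3. When $\lambda = 1$, we obtain the exponentiated Kumaraswamy-Burr III distribution with cdf:
     $$ G(x; \alpha, \delta, \phi, \theta) = \left\{1 - \left[1 - \left(1 + x^{-\delta}\right)^{-\alpha}\right]^{\phi}\right\}^{\theta}, $$
     for $\alpha, \delta, \phi, \theta > 0$ and $x > 0$.
  4. When $\lambda = \theta = 1$, we obtain the Kumaraswamy-Burr III distribution with cdf:
     $$ G(x; \alpha, \delta, \phi) = 1 - \left[1 - \left(1 + x^{-\delta}\right)^{-\alpha}\right]^{\phi}, $$
     for $\alpha, \delta, \phi > 0$ and $x > 0$.
  5. When $\lambda = \phi = \theta = 1$, we obtain the Burr III distribution with cdf:
     $$ G(x; \alpha, \delta) = \left(1 + x^{-\delta}\right)^{-\alpha}, $$
     for $\alpha, \delta > 0$ and $x > 0$.
  6. When $\alpha = 1$, we obtain the exponentiated Kumaraswamy-Fisk or exponentiated Kumaraswamy-Log-logistic distribution with cdf:
     $$ G(x; \lambda, \delta, \phi, \theta) = \left\{1 - \left[1 - \left(1 + \lambda x^{-\delta}\right)^{-1}\right]^{\phi}\right\}^{\theta}, $$
     for $\lambda, \delta, \phi, \theta > 0$ and $x > 0$.
  7. When $\alpha = \theta = 1$, we obtain the Kumaraswamy-Fisk or Kumaraswamy-Log-logistic distribution with cdf:
     $$ G(x; \lambda, \delta, \phi) = 1 - \left[1 - \left(1 + \lambda x^{-\delta}\right)^{-1}\right]^{\phi}, $$
     for $\lambda, \delta, \phi > 0$ and $x > 0$.
  8. When $\alpha = \phi = \theta = 1$, we obtain the Fisk or Log-logistic distribution with cdf:
     $$ G(x; \lambda, \delta) = \left(1 + \lambda x^{-\delta}\right)^{-1}, $$
     for $\lambda, \delta > 0$ and $x > 0$.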

3.4 Series expansion of EKD distribution

We apply the series expansion
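$$ (1 - z)^{b-1} = \sum_{j=0}^{\infty} \frac{(-1)^{j}\, \Gamma(b)}{\Gamma(b - j)\, j!}\, z^{j}, \tag{6} $$

for $b > 0$ and $|z| < 1$, to obtain the series expansion of the EKD distribution.

By using equation (6),
$$ g_{EKD}(x) = \sum_{i=0}^{\infty} \sum_{j=0}^{\infty} \omega(i, j)\, x^{-\delta-1} \left(1 + \lambda x^{-\delta}\right)^{-\alpha(j+1)-1}, \tag{7} $$

where $\omega(i, j) = \alpha \lambda \delta \phi \theta\, \dfrac{(-1)^{i+j}\, \Gamma(\theta)\, \Gamma(\phi i + \phi)}{\Gamma(\theta - i)\, \Gamma(\phi i + \phi - j)\, i!\, j!}$.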
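The double series in equation (7) can be checked numerically against the closed-form pdf (4). The sketch below is one way to do so with the Python standard library; the truncation points `n_i` and `n_j` and the function names are assumptions made for illustration. The coefficients of equation (6) are generated by a recurrence rather than by gamma functions, which avoids evaluating the gamma function at its poles when $b$ is an integer.

```python
import math


def ekd_pdf(x, alpha, lam, delta, phi, theta):
    """Closed-form EKD pdf, equation (4)."""
    u = 1.0 + lam * x ** (-delta)
    d = u ** (-alpha)
    return (alpha * lam * delta * phi * theta * x ** (-delta - 1) * u ** (-alpha - 1)
            * (1.0 - d) ** (phi - 1) * (1.0 - (1.0 - d) ** phi) ** (theta - 1))


def power_series_coeffs(b, n_terms):
    """Coefficients c_j with (1 - z)**(b - 1) = sum_j c_j * z**j.

    c_j equals (-1)**j * Gamma(b) / (Gamma(b - j) * j!), as in equation (6).
    """
    coeffs = [1.0]
    for j in range(n_terms - 1):
        coeffs.append(coeffs[-1] * (j - (b - 1.0)) / (j + 1.0))
    return coeffs


def ekd_pdf_series(x, alpha, lam, delta, phi, theta, n_i=30, n_j=200):
    """Truncated version of the double series in equation (7)."""
    u = 1.0 + lam * x ** (-delta)
    outer = power_series_coeffs(theta, n_i)  # expansion in the index i
    total = 0.0
    for i, a_i in enumerate(outer):
        inner = power_series_coeffs(phi * (i + 1), n_j)  # expansion in the index j
        for j, c_j in enumerate(inner):
            total += a_i * c_j * u ** (-alpha * (j + 1) - 1)
    return alpha * lam * delta * phi * theta * x ** (-delta - 1) * total
```

For moderate $x$ the truncated series agrees closely with the closed form; far in either tail one of the two expansion variables approaches 1, so more terms are needed and cancellation between large alternating terms can reduce accuracy.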

Note that in the Dagum $(\alpha, \delta, \lambda)$ distribution, $\alpha$ and $\delta$ are shape parameters, and $\lambda$ is a scale parameter. In the Exponentiated-Kumaraswamy $(\psi, \phi, \theta)$ distribution, $\psi$ is a skewness parameter, $\phi$ is a tail variation parameter, and the parameter $\theta$ characterizes the skewness, kurtosis, and tail of the distribution.

Consequently, for the EKD $(\alpha, \lambda, \delta, \phi, \theta)$ distribution, $\alpha$ is a shape and skewness parameter, $\delta$ is a shape parameter, $\lambda$ is a scale parameter, $\phi$ is a tail variation parameter, and the parameter $\theta$ characterizes the skewness, kurtosis, and tail of the distribution.

3.5 Hazard and reverse hazard function

The hazard function of the EKD distribution is
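$$ h_{EKD}(x) = \frac{g_{EKD}(x)}{1 - G_{EKD}(x)} = \alpha\lambda\delta\phi\theta\, x^{-\delta-1} \left(1 + \lambda x^{-\delta}\right)^{-\alpha-1} \left[1 - \left(1 + \lambda x^{-\delta}\right)^{-\alpha}\right]^{\phi-1} \times \left\{1 - \left[1 - \left(1 + \lambda x^{-\delta}\right)^{-\alpha}\right]^{\phi}\right\}^{\theta-1} \times \left(1 - \left\{1 - \left[1 - \left(1 + \lambda x^{-\delta}\right)^{-\alpha}\right]^{\phi}\right\}^{\theta}\right)^{-1}. \tag{8} $$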
Plots of the hazard function are presented in Figure 2. For the five combinations of parameter values shown, the plots exhibit various shapes, including monotonically decreasing, unimodal, and bathtub followed by upside down bathtub shapes. This flexibility makes the EKD hazard rate function useful and suitable for the non-monotone empirical hazard behaviors that are likely to be encountered in real life situations. Unfortunately, the analytical study of the shape of both the density and the hazard rate function seems to be very complicated, except for the density when $\phi = \theta = 1$, which is zero-modal when $\alpha\delta \leq 1$ and unimodal when $\alpha\delta > 1$. We could not determine any specific rules for the shapes of the hazard rate function.
Figure 2

Graphs of hazard functions.

The reverse hazard function of the EKD distribution is
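$$ \tau_{EKD}(x) = \frac{g_{EKD}(x)}{G_{EKD}(x)} = \alpha\lambda\delta\phi\theta\, x^{-\delta-1} \left(1 + \lambda x^{-\delta}\right)^{-\alpha-1} \left[1 - \left(1 + \lambda x^{-\delta}\right)^{-\alpha}\right]^{\phi-1} \times \left\{1 - \left[1 - \left(1 + \lambda x^{-\delta}\right)^{-\alpha}\right]^{\phi}\right\}^{-1}. \tag{9} $$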
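Equations (8) and (9) are straightforward to evaluate on a grid when exploring hazard shapes for given parameter values. A minimal sketch, with illustrative function names and the Python standard library only:

```python
import math


def _core(x, alpha, lam, delta, phi):
    """Return u = 1 + lam * x**(-delta), d = u**(-alpha) and a = 1 - (1 - d)**phi."""
    u = 1.0 + lam * x ** (-delta)
    d = u ** (-alpha)
    return u, d, 1.0 - (1.0 - d) ** phi


def ekd_pdf(x, alpha, lam, delta, phi, theta):
    """EKD pdf, equation (4)."""
    u, d, a = _core(x, alpha, lam, delta, phi)
    return (alpha * lam * delta * phi * theta * x ** (-delta - 1) * u ** (-alpha - 1)
            * (1.0 - d) ** (phi - 1) * a ** (theta - 1))


def ekd_cdf(x, alpha, lam, delta, phi, theta):
    """EKD cdf, equation (3)."""
    return _core(x, alpha, lam, delta, phi)[2] ** theta


def ekd_hazard(x, alpha, lam, delta, phi, theta):
    """Hazard function, equation (8): pdf divided by the survival function."""
    return ekd_pdf(x, alpha, lam, delta, phi, theta) / (1.0 - ekd_cdf(x, alpha, lam, delta, phi, theta))


def ekd_reverse_hazard(x, alpha, lam, delta, phi, theta):
    """Reverse hazard function in the closed form of equation (9)."""
    u, d, a = _core(x, alpha, lam, delta, phi)
    return (alpha * lam * delta * phi * theta * x ** (-delta - 1) * u ** (-alpha - 1)
            * (1.0 - d) ** (phi - 1) / a)
```

Evaluating `ekd_hazard` over an increasing grid of `x` values and inspecting the sign changes of successive differences is a simple, if informal, way to classify a hazard curve as monotone or non-monotone for a given parameter set.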

4 Moments, moment generating function, Bonferroni and Lorenz curves, mean and median deviations, and reliability

In this section, we present the moments, moment generating function, Bonferroni and Lorenz curves, mean and median deviations as well as the reliability of the EKD distribution. The moments of the sub-models can be readily obtained from the general results.

4.1 Moments and moment generating function

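Let $t = \left(1 + \lambda x^{-\delta}\right)^{-1}$ in equation (7), then the $s$th raw moment of the EKD distribution is given by
$$ E(X^{s}) = \int_0^{\infty} x^{s} \cdot g_{EKD}(x)\, dx = \sum_{i=0}^{\infty} \sum_{j=0}^{\infty} \omega(i, j)\, \lambda^{\frac{s}{\delta} - 1} \cdot \frac{1}{\delta} \cdot B\left(\alpha(j+1) + \frac{s}{\delta},\, 1 - \frac{s}{\delta}\right) = \sum_{i=0}^{\infty} \sum_{j=0}^{\infty} \omega(i, j, s)\, B\left(\alpha(j+1) + \frac{s}{\delta},\, 1 - \frac{s}{\delta}\right), \tag{10} $$

where $\omega(i, j, s) = \alpha \lambda^{s/\delta} \phi \theta\, \dfrac{(-1)^{i+j}\, \Gamma(\theta)\, \Gamma(\phi i + \phi)}{\Gamma(\theta - i)\, \Gamma(\phi i + \phi - j)\, i!\, j!}$, and $s < \delta$.

The moment generating function of the EKD distribution is given by
$$ M(t) = \sum_{r=0}^{\infty} \sum_{i=0}^{\infty} \sum_{j=0}^{\infty} \omega(i, j, r)\, \frac{t^{r}}{r!}\, B\left(\alpha(j+1) + \frac{r}{\delta},\, 1 - \frac{r}{\delta}\right), $$

for $r < \delta$.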
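The substitution step behind equation (10) can be written out explicitly. The following worked sketch uses only the symbols already defined. With $t = (1 + \lambda x^{-\delta})^{-1}$, $t$ increases from 0 to 1 as $x$ runs from 0 to $\infty$, and

$$ x = \lambda^{1/\delta} \left(\frac{t}{1-t}\right)^{1/\delta}, \qquad x^{-\delta-1}\, dx = \frac{dt}{\lambda \delta\, t^{2}}. $$

Each term of the series (7) therefore contributes

$$ \int_0^{\infty} x^{s}\, x^{-\delta-1} \left(1 + \lambda x^{-\delta}\right)^{-\alpha(j+1)-1} dx = \frac{\lambda^{\frac{s}{\delta}-1}}{\delta} \int_0^{1} t^{\alpha(j+1) + \frac{s}{\delta} - 1} (1-t)^{-\frac{s}{\delta}}\, dt = \frac{\lambda^{\frac{s}{\delta}-1}}{\delta}\, B\left(\alpha(j+1) + \frac{s}{\delta},\, 1 - \frac{s}{\delta}\right), $$

where the last integral is a beta function only when $1 - s/\delta > 0$, which is the origin of the condition $s < \delta$. Multiplying by $\omega(i, j)$ absorbs the factor $\lambda^{\frac{s}{\delta}-1}/\delta$ and gives $\omega(i, j, s)$.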

4.2 Bonferroni and Lorenz curves

Bonferroni and Lorenz curves are widely used tools for analyzing and visualizing income inequality. The Lorenz curve, $L(p)$, can be regarded as the proportion of total income volume accumulated by those units with income lower than or equal to the volume $a$, and the Bonferroni curve, $B(p)$, is the scaled conditional mean curve, that is, the ratio of the group mean income to the mean income of the population. Plots of Bonferroni and Lorenz curves are given in Figure 3.
Figure 3

Graphs of Bonferroni and Lorenz curves.

Let $I(a) = \int_0^{a} x \cdot g_{EKD}(x)\, dx$ and $\mu = E(X)$, then Bonferroni and Lorenz curves are given by
$$ B(p) = \frac{I(q)}{p\mu} \quad \text{and} \quad L(p) = \frac{I(q)}{\mu}, $$
respectively, for $0 \leq p \leq 1$, and $q = G_{EKD}^{-1}(p)$. The mean of the EKD distribution is obtained from equation (10) with $s = 1$ and the quantile function is given in equation (5). Consequently,
$$ I(a) = \sum_{i=0}^{\infty} \sum_{j=0}^{\infty} \omega(i, j, 1)\, B_{t(a)}\left(\alpha(j+1) + \frac{1}{\delta},\, 1 - \frac{1}{\delta}\right), \tag{11} $$

for $\delta > 1$, where $t(a) = \left(1 + \lambda a^{-\delta}\right)^{-1}$, and $B_{G(x)}(c, d) = \int_0^{G(x)} t^{c-1} (1-t)^{d-1}\, dt$ for $|G(x)| < 1$ is the incomplete beta function.
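As a numerical cross-check that does not rely on the series (11), the curves can also be approximated from the quantile function, using the change of variable $u = G_{EKD}(x)$, which gives $I(q) = \int_0^{p} G_{EKD}^{-1}(u)\, du$ and $\mu = \int_0^{1} G_{EKD}^{-1}(u)\, du$. The sketch below applies a midpoint rule to these integrals; it is a swapped-in numerical route rather than the paper's method, and the function names and grid size are illustrative assumptions.

```python
import math


def ekd_quantile(q, alpha, lam, delta, phi, theta):
    """EKD quantile function, equation (5), for 0 < q < 1."""
    d = 1.0 - (1.0 - q ** (1.0 / theta)) ** (1.0 / phi)
    return lam ** (1.0 / delta) * (d ** (-1.0 / alpha) - 1.0) ** (-1.0 / delta)


def lorenz_bonferroni(p, params, n_grid=20000):
    """Approximate (L(p), B(p)) by midpoint-rule integration of the quantile function.

    params is the tuple (alpha, lam, delta, phi, theta); the mean must exist.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    h = 1.0 / n_grid
    total = 0.0    # approximates mu = integral of the quantile function over (0, 1)
    partial = 0.0  # approximates I(q) = integral of the quantile function over (0, p)
    for k in range(n_grid):
        u = (k + 0.5) * h
        area = ekd_quantile(u, *params) * h
        total += area
        if u <= p:
            partial += area
    lorenz = partial / total
    return lorenz, lorenz / p
```

Because the quantile function is unbounded as $u \to 1$, the midpoint rule converges slowly for heavy-tailed parameter choices, so the grid size should be increased until the values stabilize.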

4.3 Mean and median deviations

If $X$ has the EKD distribution, we can derive the mean deviation about the mean $\mu = E(X)$ and the median deviation about the median $M$ from
$$ \delta_1 = \int_0^{\infty} |x - \mu|\, g_{EKD}(x)\, dx \quad \text{and} \quad \delta_2 = \int_0^{\infty} |x - M|\, g_{EKD}(x)\, dx, $$

respectively. The mean $\mu$ is obtained from equation (10) with $s = 1$, and the median $M$ is given by equation (5) when $q = \frac{1}{2}$.

The measures $\delta_1$ and $\delta_2$ can be calculated by the following relationships:
$$ \delta_1 = 2\mu\, G_{EKD}(\mu) - 2\mu + 2T(\mu) \quad \text{and} \quad \delta_2 = 2T(M) - \mu, $$
where $T(a) = \int_a^{\infty} x \cdot g_{EKD}(x)\, dx$ follows from equation (11), that is
$$ T(a) = \sum_{i=0}^{\infty} \sum_{j=0}^{\infty} \omega(i, j, 1) \left[ B\left(\alpha(j+1) + \frac{1}{\delta},\, 1 - \frac{1}{\delta}\right) - B_{t(a)}\left(\alpha(j+1) + \frac{1}{\delta},\, 1 - \frac{1}{\delta}\right) \right]. $$
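The two relationships follow from splitting each integral at the centering point. As a short worked derivation, using only quantities defined above and the identity $\int_0^{a} x\, g_{EKD}(x)\, dx = \mu - T(a)$:

$$ \delta_1 = \int_0^{\mu} (\mu - x)\, g_{EKD}(x)\, dx + \int_{\mu}^{\infty} (x - \mu)\, g_{EKD}(x)\, dx = \mu\, G_{EKD}(\mu) - \left[\mu - T(\mu)\right] + T(\mu) - \mu\left[1 - G_{EKD}(\mu)\right] = 2\mu\, G_{EKD}(\mu) - 2\mu + 2T(\mu). $$

Repeating the same steps with $M$ in place of $\mu$ as the centering point gives $2M\, G_{EKD}(M) - M - \mu + 2T(M)$, and since $G_{EKD}(M) = \frac{1}{2}$ the terms in $M$ cancel, leaving $\delta_2 = 2T(M) - \mu$.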

4.4 Reliability

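The reliability $R = P(X_1 > X_2)$ when $X_1$ and $X_2$ have independent EKD $(\alpha_1, \lambda_1, \delta_1, \phi_1, \theta_1)$ and EKD $(\alpha_2, \lambda_2, \delta_2, \phi_2, \theta_2)$ distributions is given by
$$ R = \int_0^{\infty} g_1(x)\, G_2(x)\, dx = \sum_{i=0}^{\infty} \sum_{j=0}^{\infty} \sum_{k=0}^{\infty} \sum_{l=0}^{\infty} \zeta(i, j, k, l) \int_0^{\infty} x^{-\delta_1 - 1} \left(1 + \lambda_1 x^{-\delta_1}\right)^{-\alpha_1(j+1) - 1} \left(1 + \lambda_2 x^{-\delta_2}\right)^{-\alpha_2 l}\, dx, $$

where $\zeta(i, j, k, l) = \alpha_1 \lambda_1 \delta_1 \phi_1 \theta_1\, \dfrac{(-1)^{i+j+k+l}\, \Gamma(\theta_1)\, \Gamma(\phi_1 i + \phi_1)\, \Gamma(\theta_2 + 1)\, \Gamma(\phi_2 k + 1)}{\Gamma(\theta_1 - i)\, \Gamma(\phi_1 i + \phi_1 - j)\, \Gamma(\theta_2 + 1 - k)\, \Gamma(\phi_2 k + 1 - l)\, i!\, j!\, k!\, l!}$.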

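If $\lambda = \lambda_1 = \lambda_2$ and $\delta = \delta_1 = \delta_2$, then the integral can be evaluated in closed form with the substitution $u = 1 + \lambda x^{-\delta}$, and the reliability reduces to
$$ R = \sum_{i=0}^{\infty} \sum_{j=0}^{\infty} \sum_{k=0}^{\infty} \sum_{l=0}^{\infty} \frac{\zeta(i, j, k, l)}{\lambda \delta \left[\alpha_1 (j+1) + \alpha_2 l\right]}. $$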
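The series for $R$ can be cross-checked by simulation, since $R = P(X_1 > X_2)$ is simply the probability that one independent draw exceeds the other. The sketch below estimates it by Monte Carlo with inverse-transform sampling from the quantile function (5); this is an illustrative alternative to the series, and the function names, sample size and seed are assumptions.

```python
import math
import random


def ekd_quantile(q, alpha, lam, delta, phi, theta):
    """EKD quantile function, equation (5), for 0 < q < 1."""
    d = 1.0 - (1.0 - q ** (1.0 / theta)) ** (1.0 / phi)
    return lam ** (1.0 / delta) * (d ** (-1.0 / alpha) - 1.0) ** (-1.0 / delta)


def _open_uniform(rng):
    """Uniform draw on the open interval (0, 1)."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return u


def reliability_mc(params1, params2, n=20000, seed=0):
    """Monte Carlo estimate of R = P(X1 > X2) for independent EKD variables.

    params1 and params2 are tuples (alpha, lam, delta, phi, theta).
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(n):
        x1 = ekd_quantile(_open_uniform(rng), *params1)
        x2 = ekd_quantile(_open_uniform(rng), *params2)
        if x1 > x2:
            wins += 1
    return wins / n
```

When both parameter vectors coincide the estimate should be close to 0.5, and increasing the scale parameter $\lambda$ of the first variable alone should push it above 0.5.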

5 Order statistics and entropy

In this section, the distribution of the k th order statistic and Renyi entropy (Renyi 1960) for the EKD distribution are presented. The entropy of a random variable is a measure of variation of the uncertainty.

5.1 Order statistics

The pdf of the $k$th order statistic from a pdf $f(x)$ with cdf $F(x)$ is
$$ f_{k:n}(x) = \frac{f(x)}{B(k, n-k+1)}\, F^{k-1}(x)\, [1 - F(x)]^{n-k} = k \binom{n}{k} f(x)\, F^{k-1}(x)\, [1 - F(x)]^{n-k}. \tag{12} $$
Using equation (6), the pdf of the $k$th order statistic from the EKD distribution is given by
$$ g_{k:n}(x) = \sum_{i=0}^{\infty} \sum_{j=0}^{\infty} \sum_{p=0}^{\infty} K(i, j, p, k) \cdot x^{-\delta-1} \left(1 + \lambda x^{-\delta}\right)^{-\alpha - \alpha p - 1}, $$

where $K(i, j, p, k) = \dfrac{(-1)^{i+j+p}\, \Gamma(n-k+1)\, \Gamma(\theta k + \theta i)\, \Gamma(\phi j + \phi)}{\Gamma(n-k+1-i)\, \Gamma(\theta k + \theta i - j)\, \Gamma(\phi j + \phi - p)\, i!\, j!\, p!}\; k \binom{n}{k}\, \alpha \lambda \delta \phi \theta$.

5.2 Entropy

Renyi entropy of a distribution with pdf f (x) is defined as
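$$ I_R(\tau) = (1 - \tau)^{-1} \log\left( \int_{\mathbb{R}} f^{\tau}(x)\, dx \right), \quad \tau > 0,\ \tau \neq 1. $$
Using equation (6), Renyi entropy of EKD distribution is given by
$$ I_R(\tau) = (1 - \tau)^{-1} \log\left[ \sum_{i=0}^{\infty} \sum_{j=0}^{\infty} \frac{(-1)^{i+j}\, \Gamma(\theta\tau - \tau + 1)\, \Gamma(\phi\tau - \tau + \phi i + 1)}{\Gamma(\theta\tau - \tau + 1 - i)\, \Gamma(\phi\tau - \tau + \phi i + 1 - j)\, i!\, j!} \times \alpha^{\tau} \lambda^{-\frac{\tau}{\delta} + \frac{1}{\delta}}\, \delta^{\tau - 1} \phi^{\tau} \theta^{\tau}\, B\left(\alpha\tau + \alpha j + \frac{1 - \tau}{\delta},\ \tau + \frac{\tau - 1}{\delta}\right) \right], $$

for $\alpha\tau + \alpha j + \frac{1 - \tau}{\delta} > 0$ and $\tau + \frac{\tau - 1}{\delta} > 0$. Renyi entropy for the sub-models can be readily obtained.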

6 Estimation of model parameters

In this section, we present estimates of the parameters of the EKD distribution via method of maximum likelihood estimation. The elements of the score function are presented. There are no closed form solutions to the nonlinear equations obtained by setting the elements of the score function to zero. Thus, the estimates of the model parameters must be obtained via numerical methods.

6.1 Maximum likelihood estimation

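Let $x = (x_1, \ldots, x_n)^T$ be a random sample of the EKD distribution with unknown parameter vector $\Theta = (\alpha, \lambda, \delta, \phi, \theta)^T$. The log-likelihood function for $\Theta$ is
$$ l(\Theta) = n\left(\ln\alpha + \ln\lambda + \ln\delta + \ln\phi + \ln\theta\right) - (\delta + 1) \sum_{i=1}^{n} \ln x_i - (\alpha + 1) \sum_{i=1}^{n} \ln\left(1 + \lambda x_i^{-\delta}\right) + (\phi - 1) \sum_{i=1}^{n} \ln\left[1 - \left(1 + \lambda x_i^{-\delta}\right)^{-\alpha}\right] + (\theta - 1) \sum_{i=1}^{n} \ln\left\{1 - \left[1 - \left(1 + \lambda x_i^{-\delta}\right)^{-\alpha}\right]^{\phi}\right\}. \tag{13} $$
The partial derivatives of $l(\Theta)$ with respect to the parameters are
$$ \frac{\partial l}{\partial \alpha} = \frac{n}{\alpha} - \sum_{i=1}^{n} \ln\left(1 + \lambda x_i^{-\delta}\right) + (\phi - 1) \sum_{i=1}^{n} \frac{\left(1 + \lambda x_i^{-\delta}\right)^{-\alpha} \ln\left(1 + \lambda x_i^{-\delta}\right)}{1 - \left(1 + \lambda x_i^{-\delta}\right)^{-\alpha}} - (\theta - 1)\phi \sum_{i=1}^{n} \frac{\left[1 - \left(1 + \lambda x_i^{-\delta}\right)^{-\alpha}\right]^{\phi-1} \left(1 + \lambda x_i^{-\delta}\right)^{-\alpha} \ln\left(1 + \lambda x_i^{-\delta}\right)}{1 - \left[1 - \left(1 + \lambda x_i^{-\delta}\right)^{-\alpha}\right]^{\phi}}, $$
$$ \frac{\partial l}{\partial \lambda} = \frac{n}{\lambda} - (\alpha + 1) \sum_{i=1}^{n} \frac{x_i^{-\delta}}{1 + \lambda x_i^{-\delta}} + (\phi - 1)\alpha \sum_{i=1}^{n} \frac{\left(1 + \lambda x_i^{-\delta}\right)^{-\alpha-1} x_i^{-\delta}}{1 - \left(1 + \lambda x_i^{-\delta}\right)^{-\alpha}} - (\theta - 1)\phi\alpha \sum_{i=1}^{n} \frac{\left[1 - \left(1 + \lambda x_i^{-\delta}\right)^{-\alpha}\right]^{\phi-1} \left(1 + \lambda x_i^{-\delta}\right)^{-\alpha-1} x_i^{-\delta}}{1 - \left[1 - \left(1 + \lambda x_i^{-\delta}\right)^{-\alpha}\right]^{\phi}}, $$
$$ \frac{\partial l}{\partial \delta} = \frac{n}{\delta} - \sum_{i=1}^{n} \ln x_i + (\alpha + 1)\lambda \sum_{i=1}^{n} \frac{x_i^{-\delta} \ln x_i}{1 + \lambda x_i^{-\delta}} - (\phi - 1)\alpha\lambda \sum_{i=1}^{n} \frac{\left(1 + \lambda x_i^{-\delta}\right)^{-\alpha-1} x_i^{-\delta} \ln x_i}{1 - \left(1 + \lambda x_i^{-\delta}\right)^{-\alpha}} + (\theta - 1)\phi\alpha\lambda \sum_{i=1}^{n} \frac{\left[1 - \left(1 + \lambda x_i^{-\delta}\right)^{-\alpha}\right]^{\phi-1} \left(1 + \lambda x_i^{-\delta}\right)^{-\alpha-1} x_i^{-\delta} \ln x_i}{1 - \left[1 - \left(1 + \lambda x_i^{-\delta}\right)^{-\alpha}\right]^{\phi}}, $$
$$ \frac{\partial l}{\partial \phi} = \frac{n}{\phi} + \sum_{i=1}^{n} \ln\left[1 - \left(1 + \lambda x_i^{-\delta}\right)^{-\alpha}\right] - (\theta - 1) \sum_{i=1}^{n} \frac{\left[1 - \left(1 + \lambda x_i^{-\delta}\right)^{-\alpha}\right]^{\phi} \ln\left[1 - \left(1 + \lambda x_i^{-\delta}\right)^{-\alpha}\right]}{1 - \left[1 - \left(1 + \lambda x_i^{-\delta}\right)^{-\alpha}\right]^{\phi}}, $$
and
$$ \frac{\partial l}{\partial \theta} = \frac{n}{\theta} + \sum_{i=1}^{n} \ln\left\{1 - \left[1 - \left(1 + \lambda x_i^{-\delta}\right)^{-\alpha}\right]^{\phi}\right\}, $$

respectively. The MLEs of the parameters $\alpha, \lambda, \delta, \phi$, and $\theta$, say $\hat{\alpha}, \hat{\lambda}, \hat{\delta}, \hat{\phi}$, and $\hat{\theta}$, must be obtained by numerical methods.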
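A numerical optimizer needs only the log-likelihood (13), optionally with the score components. The sketch below implements the log-likelihood and the score with respect to $\theta$ in plain Python, with illustrative function names. It also uses the observation that, for fixed values of the other four parameters, setting the last score equation to zero gives $\theta = -n / \sum_i \ln\{1 - [1 - (1 + \lambda x_i^{-\delta})^{-\alpha}]^{\phi}\}$, which can serve to profile out $\theta$ or to produce a starting value. The paper itself reports using SAS routines, so this is not the authors' code.

```python
import math


def _log_a(x, alpha, lam, delta, phi):
    """ln{1 - [1 - (1 + lam * x**(-delta))**(-alpha)]**phi} for one observation."""
    d = (1.0 + lam * x ** (-delta)) ** (-alpha)
    return math.log(1.0 - (1.0 - d) ** phi)


def ekd_loglik(data, alpha, lam, delta, phi, theta):
    """EKD log-likelihood, equation (13)."""
    n = len(data)
    ll = n * (math.log(alpha) + math.log(lam) + math.log(delta)
              + math.log(phi) + math.log(theta))
    for x in data:
        u = 1.0 + lam * x ** (-delta)
        d = u ** (-alpha)
        ll += (-(delta + 1.0) * math.log(x) - (alpha + 1.0) * math.log(u)
               + (phi - 1.0) * math.log(1.0 - d)
               + (theta - 1.0) * math.log(1.0 - (1.0 - d) ** phi))
    return ll


def score_theta(data, alpha, lam, delta, phi, theta):
    """Partial derivative of the log-likelihood with respect to theta."""
    return len(data) / theta + sum(_log_a(x, alpha, lam, delta, phi) for x in data)


def theta_given_rest(data, alpha, lam, delta, phi):
    """Value of theta that zeroes score_theta when the other parameters are held fixed."""
    return -len(data) / sum(_log_a(x, alpha, lam, delta, phi) for x in data)
```

Comparing `score_theta` with a central finite difference of `ekd_loglik` is a cheap way to catch transcription errors before handing the functions to an optimizer.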

6.2 Asymptotic confidence intervals

In this section, we present the asymptotic confidence intervals for the parameters of the EKD distribution. The expectations in the Fisher Information Matrix (FIM) can be obtained numerically. Let $\hat{\Theta} = (\hat{\alpha}, \hat{\lambda}, \hat{\delta}, \hat{\phi}, \hat{\theta})$ be the maximum likelihood estimate of $\Theta = (\alpha, \lambda, \delta, \phi, \theta)$. Under the usual regularity conditions, and provided that the parameters are in the interior of the parameter space and not on the boundary, we have $\sqrt{n}\,(\hat{\Theta} - \Theta) \xrightarrow{d} N_5\left(\underline{0}, I^{-1}(\Theta)\right)$, where $I(\Theta)$ is the expected Fisher information matrix. The asymptotic behavior is still valid if $I(\Theta)$ is replaced by the observed information matrix evaluated at $\hat{\Theta}$, that is $J(\hat{\Theta})$. The multivariate normal distribution $N_5\left(\underline{0}, J(\hat{\Theta})^{-1}\right)$, where the mean vector $\underline{0} = (0, 0, 0, 0, 0)^T$, can be used to construct confidence intervals and confidence regions for the individual model parameters and for the survival and hazard rate functions.

The approximate $100(1-\eta)\%$ two-sided confidence intervals for $\alpha$, $\lambda$, $\delta$, $\phi$ and $\theta$ are given by:
$$ \hat{\alpha} \pm Z_{\eta/2} \sqrt{I^{-1}_{\alpha\alpha}(\hat{\Theta})}, \quad \hat{\lambda} \pm Z_{\eta/2} \sqrt{I^{-1}_{\lambda\lambda}(\hat{\Theta})}, \quad \hat{\delta} \pm Z_{\eta/2} \sqrt{I^{-1}_{\delta\delta}(\hat{\Theta})}, $$
$$ \hat{\phi} \pm Z_{\eta/2} \sqrt{I^{-1}_{\phi\phi}(\hat{\Theta})}, \quad \hat{\theta} \pm Z_{\eta/2} \sqrt{I^{-1}_{\theta\theta}(\hat{\Theta})}, $$

respectively, where $Z_{\eta/2}$ is the upper $\frac{\eta}{2}$th percentile of a standard normal distribution.

We can use the likelihood ratio (LR) test to compare the fit of the EKD distribution with its sub-models for a given data set. For example, to test $\theta = 1$, the LR statistic is
$$ \omega = 2\left[\ln L\left(\hat{\alpha}, \hat{\lambda}, \hat{\delta}, \hat{\phi}, \hat{\theta}\right) - \ln L\left(\tilde{\alpha}, \tilde{\lambda}, \tilde{\delta}, \tilde{\phi}, 1\right)\right], $$

where $\hat{\alpha}, \hat{\lambda}, \hat{\delta}, \hat{\phi}$ and $\hat{\theta}$ are the unrestricted estimates, and $\tilde{\alpha}, \tilde{\lambda}, \tilde{\delta}$ and $\tilde{\phi}$ are the restricted estimates. The LR test rejects the null hypothesis if $\omega > \chi^2_d$, where $\chi^2_d$ denotes the upper $100d\%$ point of the $\chi^2$ distribution with 1 degree of freedom.
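The LR statistic and its p-value are easy to compute from the tabulated $-2$ log-likelihood values. The sketch below is illustrative: it uses the values reported later in the paper for the air conditioning data (2065.0 for EKD, 2066.9 for KD, 2078.4 for D) and closed-form chi-square tail probabilities, which are elementary only for 1 and 2 degrees of freedom. The function names are hypothetical, and the Wald interval helper assumes that the reported standard error plays the role of $\sqrt{I^{-1}(\hat{\Theta})}$ for the corresponding parameter.

```python
import math


def lr_statistic(neg2ll_restricted, neg2ll_full):
    """LR statistic computed from two -2 log-likelihood values."""
    return neg2ll_restricted - neg2ll_full


def chi2_sf(w, df):
    """Upper tail probability of a chi-square variable (df = 1 or 2 only)."""
    if df == 1:
        return math.erfc(math.sqrt(w / 2.0))
    if df == 2:
        return math.exp(-w / 2.0)
    raise NotImplementedError("only df = 1 and df = 2 are handled in this sketch")


def wald_interval(estimate, std_error, z=1.959963984540054):
    """Approximate two-sided interval; the default z gives 95% coverage."""
    return estimate - z * std_error, estimate + z * std_error


# H0: KD against Ha: EKD (one restriction, theta = 1)
w_kd = lr_statistic(2066.9, 2065.0)
p_kd = chi2_sf(w_kd, 1)

# H0: D against Ha: EKD (two restrictions, phi = theta = 1)
w_d = lr_statistic(2078.4, 2065.0)
p_d = chi2_sf(w_d, 2)
```

With these inputs the two p-values come out at about 0.17 and 0.0012, in line with the values quoted for the air conditioning data in section 8.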

7 Simulation study

In this section, we examine the performance of the EKD distribution by conducting various simulations for different sample sizes (n = 200, 400, 800, 1200) via the subroutine NLP in SAS. We simulate 2000 samples for the true parameter values I: α = 2, λ = 1, δ = 3, ϕ = 2, θ = 2 and II: α = 1, λ = 1, δ = 1, ϕ = 1, θ = 1. Table 1 lists the mean MLEs of the five model parameters along with the respective root mean squared errors (RMSE). From the results, we can verify that as the sample size n increases, the mean estimates of the parameters tend to be closer to the true parameter values, since the RMSEs decay toward zero.
Table 1

Monte Carlo simulation results: mean estimates and RMSEs

| n | Parameter | I: Mean | I: RMSE | II: Mean | II: RMSE |
|---|---|---|---|---|---|
| 200 | α | 4.41621 | 3.979304324 | 1.7899006 | 1.992043574 |
| | λ | 1.3580866 | 2.642335804 | 1.4287071 | 1.528578784 |
| | δ | 3.1167852 | 2.601663026 | 1.0337146 | 0.5898521 |
| | ϕ | 5.7270324 | 6.535452517 | 2.4702434 | 3.712081559 |
| | θ | 4.5560563 | 4.306946865 | 2.8884959 | 3.689669972 |
| 400 | α | 3.5972873 | 3.071770841 | 1.5456974 | 1.513782811 |
| | λ | 1.1196079 | 0.900800533 | 1.1382897 | 0.732002869 |
| | δ | 2.9333424 | 1.821450521 | 1.0064105 | 0.377302664 |
| | ϕ | 4.6989703 | 5.277876069 | 1.5488732 | 1.872088246 |
| | θ | 4.1188983 | 3.616692978 | 2.4213684 | 2.969367761 |
| 800 | α | 3.1040595 | 2.417025941 | 1.4359333 | 1.278449373 |
| | λ | 1.0626388 | 0.609066006 | 1.0432761 | 0.346996974 |
| | δ | 2.8960167 | 1.36814261 | 1.0017278 | 0.250650155 |
| | ϕ | 3.7437056 | 3.919777583 | 1.176675 | 0.766203302 |
| | θ | 3.4890255 | 2.748229594 | 1.9733522 | 2.197844717 |
| 1200 | α | 2.8399564 | 2.058703427 | 1.3884174 | 1.169251427 |
| | λ | 1.0429655 | 0.501712467 | 1.021836 | 0.258884917 |
| | δ | 2.9152476 | 1.133666485 | 1.0014919 | 0.193825437 |
| | ϕ | 3.1751818 | 3.043071803 | 1.083574 | 0.392293513 |
| | θ | 3.164176 | 2.346236284 | 1.731924 | 1.788360814 |


8 Application: EKD and sub-distributions

In this section, applications based on real data, as well as comparison of the EKD distribution with its sub-models are presented. We provide examples to illustrate the flexibility of the EKD distribution in contrast to other models, including the exponentiated Kumaraswamy-Weibull (EKW), and beta-Kumaraswamy-Weibull (BKW) distributions for data modeling. The pdfs of EKW and BKW distributions are
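$$ f_{EKW}(x) = \theta a b c\, \lambda^{c} x^{c-1} e^{-(\lambda x)^{c}} \left[1 - e^{-(\lambda x)^{c}}\right]^{a-1} \left\{1 - \left[1 - e^{-(\lambda x)^{c}}\right]^{a}\right\}^{b-1} \times \left(1 - \left\{1 - \left[1 - e^{-(\lambda x)^{c}}\right]^{a}\right\}^{b}\right)^{\theta-1}, $$
and
$$ f_{BKW}(x) = \frac{1}{B(a, b)}\, \alpha \beta c\, \lambda^{c} x^{c-1} e^{-(\lambda x)^{c}} \left[1 - e^{-(\lambda x)^{c}}\right]^{\alpha-1} \times \left\{1 - \left[1 - e^{-(\lambda x)^{c}}\right]^{\alpha}\right\}^{\beta b - 1} \left(1 - \left\{1 - \left[1 - e^{-(\lambda x)^{c}}\right]^{\alpha}\right\}^{\beta}\right)^{a-1}, $$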

respectively.

The first data set consists of the number of successive failures for the air conditioning system of each member in a fleet of 13 Boeing 720 jet airplanes (Proschan 1963). The data is presented in Table 2. The second data set consists of the salaries of 818 professional baseball players for the year 2009 (USA TODAY).
Table 2

Air conditioning system data

194, 413, 90, 74, 55, 23, 97, 50, 359, 50,
130, 487, 57, 102, 15, 14, 10, 57, 320, 261,
51, 44, 9, 254, 493, 33, 18, 209, 41, 58,
60, 48, 56, 87, 11, 102, 12, 5, 14, 14,
29, 37, 186, 29, 104, 7, 4, 72, 270, 283,
7, 61, 100, 61, 502, 220, 120, 141, 22, 603,
35, 98, 54, 100, 11, 181, 65, 49, 12, 239,
14, 18, 39, 3, 12, 5, 32, 9, 438, 43,
134, 184, 20, 386, 182, 71, 80, 188, 230, 152,
5, 36, 79, 59, 33, 246, 1, 79, 3, 27,
201, 84, 27, 156, 21, 16, 88, 130, 14, 118,
44, 15, 42, 106, 46, 230, 26, 59, 153, 104,
20, 206, 5, 66, 34, 29, 26, 35, 5, 82,
31, 118, 326, 12, 54, 36, 34, 18, 25, 120,
31, 22, 18, 216, 139, 67, 310, 3, 46, 210,
57, 76, 14, 111, 97, 62, 39, 30, 7, 44,
11, 63, 23, 22, 23, 14, 18, 13, 34, 16,
18, 130, 90, 163, 208, 1, 24, 70, 16, 101,
52, 208, 95, 62, 11, 191, 14, 71

The third data set represents the poverty rate of 533 districts with more than 15,000 students in 2009 (Digest of Education Statistics, http://nces.ed.gov/programs/digest/d11/tables/dt11_096.asp). These data sets are modeled by the EKD distribution and compared with the corresponding sub-models, the Kumaraswamy-Dagum and Dagum distributions, as well as with the EKW and BKW distributions. Table 3 gives a descriptive summary of each sample. The air conditioning system sample has far more variability and the baseball player salary sample has the lowest variability.
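Table 3

Descriptive statistics

| Data | Mean | Median | Mode | SD | Variance | Skewness | Kurtosis | Min. | Max. |
|---|---|---|---|---|---|---|---|---|---|
| I | 92.07 | 54.00 | 14.00 | 107.92 | 11646 | 2.16 | 5.19 | 1.0 | 603.0 |
| II | 3.26 | 1.15 | 0.40 | 4.36 | 19.05 | 2.10 | 5.13 | 0.4 | 33.0 |
| III | 17.71 | 16.80 | 9.30 | 8.80 | 77.38 | 0.80 | 0.73 | 2.7 | 53.6 |
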

The maximum likelihood estimates (MLEs) of the parameters are computed by maximizing the objective function via the subroutine NLMIXED in SAS. The estimated values of the parameters (standard error in parentheses), the -2 Log-likelihood statistic, the Akaike Information Criterion, $AIC = 2p - 2\ln(L)$, the Bayesian Information Criterion, $BIC = p\ln(n) - 2\ln(L)$, and the Consistent Akaike Information Criterion, $AICC = AIC + \frac{2p(p+1)}{n - p - 1}$, where $L = L(\hat{\Theta})$ is the value of the likelihood function evaluated at the parameter estimates, $n$ is the number of observations, and $p$ is the number of estimated parameters, are tabulated for the EKD distribution and its sub-distributions. See Table 4, Table 5 and Table 6.
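The three criteria are simple functions of the $-2$ log-likelihood, the number of parameters $p$ and the sample size $n$. The sketch below (function name illustrative) evaluates them; the usage example plugs in the EKD fit to the air conditioning data, taking $-2\ln L = 2065.0$ and $p = 5$ from Table 4 and $n = 188$, which is the number of observations listed in Table 2 and is an assumption of this example rather than a figure stated in the text.

```python
import math


def information_criteria(neg2ll, p, n):
    """Return (AIC, AICC, BIC) from -2 ln(L), p parameters and n observations."""
    aic = 2.0 * p + neg2ll
    aicc = aic + 2.0 * p * (p + 1.0) / (n - p - 1.0)
    bic = p * math.log(n) + neg2ll
    return aic, aicc, bic


# EKD fit to the air conditioning data (inputs as described above)
aic, aicc, bic = information_criteria(2065.0, 5, 188)
```

Because the tabulated $-2$ log-likelihood is rounded to one decimal place, recomputed criteria can differ from the tabulated ones in the last digit.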
Table 4

Estimation of models for air conditioning system data (standard errors in parentheses)

| Distribution | α | λ | δ | ϕ | θ | | -2 log likelihood | AIC | AICC | BIC | SS |
|---|---|---|---|---|---|---|---|---|---|---|---|
| EKD | 20.6164 (1.2347) | 4.7323 (0.4174) | 0.6192 (0.0459) | 18.1616 (5.8028) | 0.1657 (0.0089) | | 2065.0 | 2075.0 | 2075.3 | 2091.2 | 0.0309 |
| KD | 5.0354 (2.1177) | 4.3846 (3.0727) | 0.3762 (0.1253) | 21.7047 (27.9167) | 1 (-) | | 2066.9 | 2074.9 | 2075.2 | 2087.9 | 0.0368 |
| D | 1.2390 (0.1749) | 94.1526 (33.7549) | 1.2626 (0.0663) | 1 (-) | 1 (-) | | 2078.4 | 2084.4 | 2084.5 | 2094.1 | 0.1344 |
| | a | b | c | λ | θ | | | | | | |
| EKW | 3.7234 (0.8783) | 0.1219 (0.0183) | 1.0595 (0.1448) | 0.0495 (0.0224) | 0.3784 (0.1136) | | 2063.7 | 2073.7 | 2074.0 | 2089.8 | 0.0254 |
| | a | b | α | β | c | λ | | | | | |
| BKW | 1.4342 (1.2507) | 0.0830 (0.0875) | 2.0054 (1.6573) | 1.9100 (1.9807) | 0.7412 (0.0343) | 0.1809 (0.0388) | 2064.6 | 2076.6 | 2077.1 | 2096.1 | 0.0338 |

Table 5

Estimation of models for baseball player salary data (standard errors in parentheses)

| Distribution | α | λ | δ | ϕ | θ | | -2 log likelihood | AIC | AICC | BIC | SS |
|---|---|---|---|---|---|---|---|---|---|---|---|
| EKD | 69.1586 (0.000036) | 0.000043 (0.0000058) | 7.6321 (0.0557) | 0.0591 (0.0044) | 0.4075 (0.0327) | | 2864.1 | 2874.1 | 2874.2 | 2897.7 | 7.8153 |
| KD | 69.0839 (0.000061) | 0.000011 (0.00000133) | 7.2375 (0.037) | 0.0996 (0.0036) | 1 (-) | | 2957.2 | 2965.2 | 2965.2 | 2984.0 | 7.7095 |
| D | 70.0780 (34.4988) | 0.0116 (0.0058) | 1.0312 (0.0301) | 1 (-) | 1 (-) | | 3225.6 | 3231.6 | 3231.6 | 3245.7 | 6.4568 |
| | a | b | c | λ | θ | | | | | | |
| EKW | 15.0514 (2.0692) | 0.1368 (0.0266) | 0.6376 (0.0756) | 8.8903 (4.9198) | 0.5419 (0.2098) | | 3209.8 | 3219.8 | 3219.9 | 3243.3 | 5.3289 |
| | a | b | α | β | c | λ | | | | | |
| BKW | 24.0047 (0.6879) | 0.03783 (0.0039) | 14.4799 (0.2069) | 4.6029 (0.4549) | 0.5168 (0.006) | 32.1184 (2.4559) | 3088.4 | 3100.4 | 3100.5 | 3128.7 | 18.0516 |

Table 6

Estimation of models for poverty rate data (standard errors in parentheses)

| Distribution | α | λ | δ | ϕ | θ | | -2 log likelihood | AIC | AICC | BIC | SS |
|---|---|---|---|---|---|---|---|---|---|---|---|
| EKD | 75.5803 (11.1276) | 0.851500 (0.32) | 0.8183 (0.0714) | 60.9069 (29.1324) | 0.3091 (0.02229) | | 3750.7 | 3760.7 | 3760.8 | 3782.1 | 0.1305 |
| KD | 60.8898 (17.5714) | 0.304000 (0.0963) | 0.4666 (0.0555) | 90.2889 (54.8283) | 1 (-) | | 3758.9 | 3766.9 | 3767.0 | 3784.0 | 0.2604 |
| D | 1.7954 (0.2034) | 350.0100 (105.94) | 2.4175 (0.0784) | 1 (-) | 1 (-) | | 3831.8 | 3837.8 | 3837.9 | 3850.7 | 0.9210 |
| | a | b | c | λ | θ | | | | | | |
| EKW | 0.1013 (0.0944) | 2.2289 (1.8026) | 2.741 (2.2276) | 0.02545 (0.0199) | 20.0336 (30.0233) | | 3752.8 | 3762.8 | 3762.9 | 3784.2 | 0.1071 |
| | a | b | α | β | c | λ | | | | | |
| BKW | 0.9985 (0.0069) | 1.0006 (0.0431) | 1.9999 (0.0584) | 0.03989 (0.0017) | 2.0006 (0.2564) | 0.1141 (0.0075) | 4727.5 | 4739.5 | 4739.7 | 4765.2 | 80.9942 |

Plots of the fitted EKD, KD and D densities, together with the histogram of the data, are given in Figure 4. The probability plots (Chambers et al. 1983), which consist of plots of the observed probabilities against the probabilities predicted by the fitted model, are presented in Figure 5. For the EKD distribution, for example, we plotted
$$ G\left(x_{(j)}\right) = \left\{1 - \left[1 - \left(1 + \hat{\lambda}\, x_{(j)}^{-\hat{\delta}}\right)^{-\hat{\alpha}}\right]^{\hat{\phi}}\right\}^{\hat{\theta}} $$
against $\frac{j - 0.375}{n + 0.25}$, $j = 1, 2, \ldots, n$, where $x_{(j)}$ are the ordered values of the observed data. A measure of closeness of the plot to the diagonal line, given by the sum of squares
$$ SS = \sum_{j=1}^{n} \left[ G\left(x_{(j)}\right) - \frac{j - 0.375}{n + 0.25} \right]^{2}, $$
was calculated for each plot. The plot with the smallest SS corresponds to the model with points that are closer to the diagonal line. Plots of the empirical and estimated survival functions for the models are also presented in Figure 6.

Figure 4

Fitted PDF for data sets.

Figure 5

Observed probability vs predicted probability for data sets.

Figure 6

Empirical survival function for data sets.
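The SS measure is straightforward to compute for any fitted cdf. A minimal sketch follows, with a hypothetical function name; `cdf` is any callable returning the fitted probability for one observation, for example the EKD cdf with the estimated parameters plugged in.

```python
def probability_plot_ss(data, cdf):
    """Sum of squared distances between fitted probabilities and plotting positions.

    data: iterable of observations; cdf: callable giving the fitted cdf at a point.
    The plotting position of the j-th ordered value is (j - 0.375) / (n + 0.25).
    """
    xs = sorted(data)
    n = len(xs)
    total = 0.0
    for j, x in enumerate(xs, start=1):
        position = (j - 0.375) / (n + 0.25)
        total += (cdf(x) - position) ** 2
    return total


# Toy usage with a uniform(0, 1) cdf; the two data values are illustrative only.
ss_toy = probability_plot_ss([0.8, 0.2], lambda x: x)
```

Smaller values indicate fitted probabilities that lie closer to the diagonal of the probability plot, which is how the SS columns of Tables 4 to 6 are read.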

For the air conditioning system data, the initial values α = 1, λ = 2, δ = 0.6, ϕ = 3, θ = 1 are used in the SAS code for the EKD model. The LR statistics for the tests of the hypotheses H0: KD against Ha: EKD and H0: D against Ha: EKD are 1.9 (p-value = 0.17) and 13.4 (p-value = 0.0012), respectively. Consequently, the KD distribution is the best distribution based on the LR statistic. The KD distribution gives a smaller SS value than the Dagum distribution and a slightly larger one than the EKD distribution. For the non-nested models, the values of AIC and AICC for the KD and EKW models are very close; however, the BIC value for the KD distribution is slightly smaller than the corresponding value for the EKW distribution. We conclude that the KD model compares favorably with the EKW distribution and thus provides a good fit for the air conditioning system data.

For the baseball player salary data set, the initial values for the EKD model in the SAS code are α = 70, λ = 0.01, δ = 1.026, ϕ = 0.1, θ = 1. The EKD distribution is a better fit for this data than the KD and Dagum distributions, as well as the other distributions. The values of the statistics AIC, AICC and BIC for the KD distribution are smaller than those for the non-nested distributions. The LR statistics for the tests of the hypotheses H0: KD against Ha: EKD and H0: D against Ha: EKD are 93.1 (p-value < 0.0001) and 361.5 (p-value < 0.0001), respectively. Consequently, we reject the null hypotheses in favor of the EKD distribution and conclude that the EKD distribution is significantly better than the KD and Dagum distributions based on the LR statistic. The values of the AIC, AICC and BIC statistics are lower for the EKD distribution than for the EKW and BKW distributions.

For the poverty rate data, the initial values for the EKD model are α = 73, λ = 0.1, δ = 0.15, ϕ = 60, θ = 0.33. The LR statistics for the tests of the hypotheses H0: KD against Ha: EKD and H0: D against Ha: EKD are 8.2 (p-value = 0.0042) and 81.1 (p-value < 0.0001), respectively. The values of the AIC, AICC and BIC statistics show that the EKD distribution is a better model, and the SS value of the EKD model is smaller than the corresponding values for the KD and D distributions. Consequently, we conclude that the EKD distribution is the best fit for the poverty rate data.

9 Conclusions

We have proposed and presented results on a new class of distributions called the EKD distribution. This class of distributions has applications in income and lifetime data analysis. Properties of this class of distributions, including the series expansion of the pdf, moments, hazard function, reverse hazard function, and income inequality measures such as Lorenz and Bonferroni curves, are derived. Renyi entropy, order statistics, reliability, and mean and median deviations are presented. Estimation of the parameters of the models and applications are also given. Future work includes MCMC methods with censored data and regression problems with concomitant information.

Authors’ information

Shujiao Huang is a graduate student at Georgia Southern University and Broderick O. Oluyede is Professor of Mathematics and Statistics at Georgia Southern University.

Declarations

Authors’ Affiliations

(1)
Department of Mathematical Sciences, Georgia Southern University

References

  1. Chambers J, Cleveland W, Kleiner B, Tukey P: Graphical Methods for Data Analysis. Chapman and Hall, London; (1983)
  2. Cordeiro GM, Ortega EMM, Nadarajah S: The Kumaraswamy Weibull distribution with application to failure data. J. Franklin Inst 347: 1399–1429. (2010)
  3. Cordeiro GM, Gomes AE, de-Silva CQ, Ortega EMM: The Beta Exponentiated Weibull Distribution. J. Stat. Comput. Simulat 83(1):114–138. (2013)
  4. Dagum C: A new model of personal income distribution: specification and estimation. Economie Appliquée 30: 413–437. (1977)
  5. Domma F, Condino F: The Beta-Dagum distribution: definition and properties. Communications in Statistics-Theory and Methods 44(22):4070–4090. (2013)
  6. Domma F, Giordano S, Zenga M: Maximum likelihood estimation in Dagum distribution with censored samples. J. Appl. Stat 38(21):2971–2985. (2011)
  7. Kleiber C: A Guide to the Dagum Distributions. In Modeling Income Distributions and Lorenz Curve Series: Economic Studies in Inequality, Social Exclusion and Well-Being 5. Edited by: Duangkamon C. Springer, New York; (2008)
  8. Kleiber C, Kotz S: Statistical size distributions in economics and actuarial sciences. Wiley, New York; (2003)
  9. Kumaraswamy P: Generalized probability density function for double-bounded random process. J. Hydrol 46: 79–88. (1980)
  10. McDonald B: Some generalized functions for the size distribution of income. Econometrica 52(3):647–663. (1984)
  11. McDonald B, Xu J: A generalization of the beta distribution with application. J. Econometrics 69(2):133–152. (1995)
  12. Oluyede BO, Rajasooriya S: The Mc-Dagum Distribution and Its Statistical Properties with Applications. Asian J. Mathematics and Applications 2013(85): 1–16. http://scienceasia.asia/index.php/ama/article/view/85/44 (2013)
  13. Proschan F: Theoretical explanation of observed decreasing failure rate. Technometrics 5: 375–383. (1963)
  14. Renyi A: On measures of entropy and information. Berkeley Symp. Math. Stat. Probability 1(1):547–561. (1960)

Copyright

© Huang and Oluyede; licensee Springer. 2014

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.