A new generalized Weibull family of distributions: mathematical properties and applications
Journal of Statistical Distributions and Applications volume 2, Article number: 13 (2015)
Abstract
We propose a generalized Weibull family of distributions with two extra positive parameters to extend the normal, gamma, Gumbel and inverse Gaussian distributions, among several other well-known distributions. We provide a comprehensive treatment of its general mathematical properties, including quantile and generating functions, ordinary and incomplete moments and other properties. We introduce the log-generalized Weibull-log-logistic regression model, a parametric family that includes as sub-models several widely known regression models that can be applied to censored survival data. We discuss estimation of the model parameters by maximum likelihood and provide two applications to real data.
Introduction
We introduce a generalized family of univariate distributions generated by Weibull random variables. For any baseline cumulative distribution function (cdf) G(x;η) (for \(x \in \mathbb {R}\)) and probability density function (pdf) g(x;η)=d G(x;η)/d x, depending on a parameter vector η, let q denote the dimension of the vector η. The generalized Weibull (“GW” for short) family of distributions is defined by the cdf
The pdf corresponding to (1) is given by
Hereafter, a random variable X having the so-called generalized Weibull-G (GW-G) density function (2) is denoted by X∼ GW-G (α,β,η). The aim of this paper is to derive some mathematical properties of X in explicit forms.
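For concreteness, a form of (1) and (2) that agrees with the quantile expression quoted later in the paper, and with taking the generator T to be Weibull with cdf \(1-\exp(-\alpha t^{\beta})\), is sketched below. It is a reconstruction from those two facts and should be read as an assumption rather than as the authors' typeset display.

```latex
F(x;\alpha,\beta,\boldsymbol{\eta})
  = 1-\exp\left\{-\alpha\left[-\log\{1-G(x;\boldsymbol{\eta})\}\right]^{\beta}\right\},
\qquad
f(x;\alpha,\beta,\boldsymbol{\eta})
  = \alpha\beta\,\frac{g(x;\boldsymbol{\eta})}{1-G(x;\boldsymbol{\eta})}
    \left[-\log\{1-G(x;\boldsymbol{\eta})\}\right]^{\beta-1}
    \exp\left\{-\alpha\left[-\log\{1-G(x;\boldsymbol{\eta})\}\right]^{\beta}\right\}.
```

With α=β=1 the first expression collapses to G(x;η), which matches the statement that the baseline is recovered in that case.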
Alzaatreh et al. (2013) proposed a new method of generating families of continuous distributions, called the T-X family, which has the Weibull-X family as a special case. The cdf of the T-X family is given by
where R(t) and r(t) are the cdf and pdf of a random variable T, respectively. If a random variable T in (3) has the Weibull distribution, we obtain the GW-G distribution. Each GW-G distribution can be obtained from a specified G distribution. For α=β=1, the G distribution arises as a basic exemplar of the GW-G distribution with a continuous crossover towards cases with different shapes (for example, a particular combination of skewness and kurtosis). The hazard rate function (hrf) of X is given by
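Assuming the Weibull-generated form \(F(x)=1-\exp\{-\alpha[-\log(1-G(x))]^{\beta}\}\), chosen here because it agrees with the quantile expression given in the mean deviations section, the cdf, pdf and hrf of a GW-G variable can be sketched for an arbitrary baseline. The function names are illustrative and do not come from the paper; the normal baseline is only an example.

```python
import math
from statistics import NormalDist

def gw_cdf(x, G, alpha, beta):
    """Assumed Eq. (1): F(x) = 1 - exp{-alpha * [-log(1 - G(x))]**beta}."""
    t = -math.log1p(-G(x))
    return -math.expm1(-alpha * t ** beta)

def gw_pdf(x, G, g, alpha, beta):
    """Derivative of gw_cdf with respect to x (assumed Eq. (2))."""
    t = -math.log1p(-G(x))
    return (alpha * beta * g(x) / (1.0 - G(x))
            * t ** (beta - 1.0) * math.exp(-alpha * t ** beta))

def gw_hazard(x, G, g, alpha, beta):
    """hrf = pdf / (1 - cdf); the exponential factor cancels."""
    t = -math.log1p(-G(x))
    return alpha * beta * g(x) / (1.0 - G(x)) * t ** (beta - 1.0)

# Example baseline: standard normal.
base = NormalDist(0.0, 1.0)
```

Under this form the hazard is the baseline hazard g/(1−G) multiplied by \(\alpha\beta[-\log(1-G)]^{\beta-1}\), so β alone controls how the baseline hazard shape is bent.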
We provide explicit expressions for the quantile function (qf), ordinary and incomplete moments, mean deviations, Bonferroni and Lorenz curves, Rényi entropy, Shannon entropy, reliability and some properties of the order statistics.
The paper is outlined as follows. Section 2 provides some special distributions in the GW family. In Section 3, we derive useful expansions for the pdf and cdf of X. We can easily apply these expansions for all GW-G distributions. In Section 4, we obtain the quantile function (qf) of X. In Section 5, we derive explicit expressions for the ordinary and incomplete moments. The moment generating function (mgf) of X is determined in Section 6. Mean deviations, probability weighted moments (PWMs), entropies and reliability are investigated in Sections 7, 8, 9 and 10. In Section 11, we derive an expansion for the density function of the GW order statistics. Some inferential tools are discussed in Section 12. In Section 13, we present a generalization of regression models based on the GW family. The performance of the maximum likelihood estimators (MLEs) is also investigated by a simulation study in this section. In Section 14, we fit some GW-G distributions to two real data sets to demonstrate the potential of this family. Finally, Section 15 ends with some conclusions.
Special Weibull-G distributions
The GW family density function (2) allows for greater flexibility of its tails and can be widely applied in many areas of engineering and biology. Here, we present and study some special cases of this family because it extends several widely-known distributions in the literature. The density (2) will be most tractable when the cdf G(x;η) and pdf g(x;η) have simple analytic expressions.
2.1 The generalized Weibull-normal (GW-N) distribution
The GW-N distribution is defined from (2) by taking G(x;η) and g(x;η) to be the cdf and pdf of the normal \(N(\mu,\sigma^{2})\) distribution, where \(\boldsymbol{\eta}=(\mu,\sigma^{2})^{T}\). Its density function is given by
where \(x \in \mathbb {R}\), \(\mu \in \mathbb {R}\) is a location parameter, σ>0 and α>0 are scale parameters, β>0 is a shape parameter, and ϕ(·) and Φ(·) are the pdf and cdf of the standard normal distribution, respectively. A random variable with density (4) is denoted by X∼GW-N(α,β,μ,σ²). For μ=0 and σ=1, we obtain the standard GW-N distribution. Further, the GW-N distribution with α=β=1 becomes the normal distribution. Plots of the GW-N density function for selected parameter values are displayed in Fig. 1.
2.2 The generalized Weibull-Gumbel (GW-Gu) distribution
Consider the Gumbel distribution with location parameter \(\mu \in \mathbb {R}\) and scale parameter σ>0, where the pdf and cdf (for \(x\in \mathbb {R}\)) are
and
respectively. In this case \(\boldsymbol{\eta}=(\mu,\sigma)^{T}\). The mean and variance are equal to μ−γσ and π²σ²/6, respectively, where γ is Euler's constant (γ≈0.57722). Inserting these expressions into (2) gives the GW-Gu density function
where \(x, \mu \in \mathbb {R}\) and α,β,σ>0. The Gumbel distribution corresponds to α=β=1. Plots of (5) for selected parameter values are displayed in Fig. 2.
2.3 The generalized Weibull-log-normal (GW-LN) distribution
Let G(x) be the log-normal distribution with cdf
for x>0, σ>0 and \(\mu \in \mathbb {R}\), where η=(μ,σ)T. The GW-LN density function (for x>0) reduces to
For α=β=1, we obtain the log-normal distribution. Figures 3 and 4 display some possible shapes of the GW-LN density and hazard functions, respectively, for some parameter values.
2.4 The generalized Weibull-log-logistic (GW-LL) distribution
Consider the log-logistic distribution with shape parameter a>0 and scale parameter γ>0, where the pdf and cdf (for x>0) are given by
respectively, where η=(a,γ)T. Inserting these expressions into (2) yields the GW-LL density function
The log-logistic distribution corresponds to α=β= 1. Plots of (6) and hazard function for selected parameter values are displayed in Figs. 5 and 6, respectively.
Useful expansions
For any real parameter c and z∈(0,1), it can be proven that
where \(p_{i}(c)\) are Stirling polynomials. The first six polynomials are \(p_{0}(w)=1/2\), \(p_{1}(w)=(2+3w)/24\), \(p_{2}(w)=(w+w^{2})/48\), \(p_{3}(w)=(-8-10w+15w^{2}+15w^{3})/5760\), \(p_{4}(w)=(-6w-7w^{2}+2w^{3}+3w^{4})/11520\) and \(p_{5}(w)=(96+140w-224w^{2}-315w^{3}+63w^{5})/2903040\). These coefficients are related to the Stirling polynomials by \(p_{n-1}(w)=S_{n}(w)/[n!\,(w+1)]\) for n≥1, where \(S_{0}(w)=1\), \(S_{1}(w)=(w+1)/2\), etc. The proof of the expansion (7) is given in detail by Flajolet and Odlyzko (1990) (see Theorem 3A, page 227) and Flajolet and Sedgewick (2009) (see Theorem VI.2, page 385). In this paper, we adopt the polynomials \(p_{i}(w)\) in accordance with Nielsen (1906) and Ward (1934).
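The listed polynomials can be checked numerically. The sketch below assumes that expansion (7) has the form \([-\log(1-z)]^{c}=z^{c}+c\sum_{i\ge 0}p_{i}(c+i)\,z^{i+c+1}\), which is the form consistent with the exponents that appear in the later expansions and with the values \(\beta/2\) and \(\beta(3\beta+5)/24\) quoted in the entropy section. The cubic term \(-315w^{3}\) in \(p_{5}\) is used because, under this assumed form, only that exponent reproduces the known Taylor coefficients of \(-\log(1-z)\) at c=1.

```python
from fractions import Fraction
import math

# Denominators of p_0..p_5 as listed in the text.
_DEN = [2, 24, 48, 5760, 11520, 2903040]

def p(i, w):
    """Polynomial p_i(w) for i = 0..5; exact when w is a Fraction."""
    num = [
        1,
        2 + 3 * w,
        w + w**2,
        -8 - 10 * w + 15 * w**2 + 15 * w**3,
        -6 * w - 7 * w**2 + 2 * w**3 + 3 * w**4,
        96 + 140 * w - 224 * w**2 - 315 * w**3 + 63 * w**5,
    ][i]
    return Fraction(1, _DEN[i]) * num

def neg_log_power(z, c, terms=6):
    """Truncated series for [-log(1 - z)]**c under the assumed form of (7)."""
    tail = sum(float(p(i, c + i)) * z ** (i + c + 1) for i in range(terms))
    return z**c + c * tail
```

For c=1 the coefficient of \(z^{i+2}\) is \(p_{i}(i+1)\), which should equal 1/(i+2), the Taylor coefficient of \(-\log(1-z)\); this gives an exact check on each polynomial.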
Some useful expansions for (1) and (2) can be derived using the concept of exponentiated distributions. For an arbitrary baseline cdf G(x), a random variable is said to have the exponentiated-G (“exp-G”) distribution with power parameter a>0, say X∼ exp-G (a), if its pdf and cdf are
respectively. The properties of exponentiated distributions have been studied by many authors in recent years, see Mudholkar and Srivastava (1993) for exponentiated Weibull, Gupta et al. (1998) for exponentiated Pareto, Gupta and Kundu (1999) for exponentiated exponential, Nadarajah (2005) for exponentiated Gumbel, Kakde and Shirke (2006) for exponentiated lognormal, and Nadarajah and Gupta (2007) for exponentiated gamma distributions.
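In standard notation, the exp-G pdf and cdf with power parameter a>0 referred to in the definition above can be written as follows. This is the usual definition of an exponentiated distribution, stated here for reference.

```latex
h_{a}(x) = a\, g(x)\, G(x)^{a-1},
\qquad
H_{a}(x) = G(x)^{a}.
```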
By expanding the exponential function in (1), we can write
and then using (7)
Expanding \(G(x)^{(m+1)\beta}\) and \(G(x)^{i+(m+1)\beta+1}\) in power series, F(x) can be expressed as
where H k (x) denotes the cdf of the exp-G (k) distribution and
The corresponding pdf of X can be expressed as
where \(h_{k+1}(x)\) denotes the pdf of the exp-G (k+1) distribution and \(v_{k}=w_{k+1}\). So, several properties of the GW-G distribution can be obtained by knowing those of the exp-G distribution, see, for example, Mudholkar et al. (1995), Gupta and Kundu (2001) and Nadarajah and Kotz (2006a), among others.
Quantile function
Let \(Q_{G}(u)=G^{-1}(u)\) be the quantile function (qf) of G for 0<u<1. Inverting F(x)=u in (1), we obtain the qf of X as
Hence, Eq. (11) reveals that the GW-G qf can be expressed in terms of the G qf. Quantiles of interest can be obtained from (11) by substituting appropriate values for u. In particular, the median of X is obtained when u=1/2, expressed by
We can also use (11) for simulating GW-G random variables by setting u as a uniform random variable in the unit interval (0,1). Using the power series expansion in Eq. (11), we have
where \(v_{k}=(-1)^{k+1}/(k\alpha)\). Hence, the last equation reveals that the GW-G qf can be expressed as the G qf applied to a power series.
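The inversion method described above can be sketched as follows, using the quantile expression \(Q(u)=Q_{G}(1-\exp\{-[-\alpha^{-1}\log(1-u)]^{1/\beta}\})\) that the paper quotes in the mean deviations section. The function names and the choice of a normal baseline are illustrative assumptions.

```python
import math
import random
from statistics import NormalDist

def gw_quantile(u, Q_G, alpha, beta):
    """GW-G quantile: baseline quantile Q_G applied to a transformed level."""
    t = (-math.log1p(-u) / alpha) ** (1.0 / beta)
    return Q_G(-math.expm1(-t))  # 1 - exp(-t), computed stably

def gw_sample(n, Q_G, alpha, beta, rng=random):
    """Draw n GW-G variates by inversion of uniform(0, 1) draws."""
    out = []
    while len(out) < n:
        u = rng.random()
        if u > 0.0:  # skip the endpoint, where Q_G may be undefined
            out.append(gw_quantile(u, Q_G, alpha, beta))
    return out

# Example: median of a GW-N variable with a standard normal baseline.
base = NormalDist()
median = gw_quantile(0.5, base.inv_cdf, 2.0, 1.5)
```

For extreme values of u the transformed level can round to 1 in floating point, so a production sampler would need to guard that case for baselines with unbounded support.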
Moments
From now on, let Y k+1∼ exp-G (k+1). A first formula for the nth moment of X can be obtained from Eq. (10) as
Explicit expressions for moments of several exponentiated distributions are given by Nadarajah and Kotz (2006a). They can be used to produce \(\mu _{n}^{\prime }\).
A second formula for \(\mu _{n}^{\prime }\) can be obtained from (10) in terms of the baseline quantile function Q G (u). We obtain
where the integral can be expressed in terms of the G quantile function
The ordinary moments of several GW-G distributions can be determined directly from Eqs. (13) and (14). Here, we give two examples. For the first example, we consider the Gumbel distribution with cdf \(G(x)=1-\exp \left \{-\exp \left (\frac {x-\mu }{\sigma }\right)\right \}\). The moments of the exponentiated Gumbel distribution with parameter (k+1) can be obtained from Nadarajah and Kotz (2006a). The nth moment of the GW-Gu distribution becomes
For the second example, we consider the generalized Weibull-standard logistic (GW-SL) distribution, where \(G(x)=(1+e^{-x})^{-1}\). A result from Prudnikov et al. (1986, Section 2.6.13, Eq. 4) gives (for t<1)
where \(B(\textit {a,b}) = {\int _{0}^{1}} t^{a-1}\,(1-t)^{b-1} \textit {dt}\) is the beta function.
Further, the central moments (μ r ) and cumulants (κ r ) of X can be determined as
respectively, where \(\kappa _{1}=\mu ^{\prime }_{1}\). Plots of the skewness and kurtosis varying the values of α and β for the GW-N and GW-LL distributions are displayed in Figs. 7 and 8, respectively. These plots reveal that the skewness and kurtosis depend on both parameters α and β.
The incomplete moments play an important role for measuring inequality, for example, the Lorenz and Bonferroni curves, which depend upon the first incomplete moment of a distribution. The nth incomplete moment of X is calculated as
The last integral can be computed for most baseline G distributions.
Let \(\mu _{n}^{\prime }=E(X^{n})\) be the nth ordinary moment of X calculated from (12) or (13). The nth descending factorial moment of X is
where
is the Stirling number of the first kind which counts the number of ways to permute a list of r items into k cycles. So, we can obtain the factorial moments from the ordinary moments given before.
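The conversion from ordinary to descending factorial moments can be sketched as below. It assumes the signed Stirling numbers of the first kind s(n,k), which are the coefficients in \(x(x-1)\cdots(x-n+1)=\sum_{k}s(n,k)\,x^{k}\); their absolute values are the cycle counts mentioned above. The callable `raw_moment` is a hypothetical stand-in for whichever formula supplies \(\mu^{\prime}_{k}\).

```python
def stirling_first_signed(n):
    """Row n of the signed Stirling numbers of the first kind, s(n, 0..n)."""
    row = [1]
    for m in range(n):
        # Recurrence: s(m+1, k) = s(m, k-1) - m * s(m, k)
        new = [0] * (m + 2)
        for k in range(m + 2):
            left = row[k - 1] if k >= 1 else 0
            same = row[k] if k <= m else 0
            new[k] = left - m * same
        row = new
    return row

def factorial_moment(n, raw_moment):
    """E[X(X-1)...(X-n+1)] from ordinary moments raw_moment(k) = E[X^k]."""
    s = stirling_first_signed(n)
    return sum(s[k] * raw_moment(k) for k in range(n + 1))
```

A quick sanity check is a degenerate variable X=4, whose ordinary moments are \(4^{k}\) and whose third descending factorial moment is 4·3·2.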
Generating function
Here, we provide two formulae for the moment generating function (mgf) \(M(t)=E(e^{tX})\) of X. A first formula for M(t) comes from (10) as
where M k+1(t) is the mgf of Y k+1. Hence, M(t) can be determined from the generating function of the exp-G (k+1) distribution.
A second formula for M(t) can be derived from (10) as
where ρ(t,a) can be calculated from the parent qf Q G (x) by
We can obtain the mgfs of several GW-G distributions directly from Eqs. (15) and (16). For example, the mgf of the GW-SL distribution (for t<1) is given by
Mean deviations
The mean deviations about the mean (\(\delta _{1}=E(|X-\mu ^{\prime }_{1}|)\)) and about the median (δ 2=E(|X−M|)) of X can be expressed as
respectively, where \(\mu ^{\prime }_{1}=E(X)\), M=Median(X) is the median given in Section 4, \(F(\mu ^{\prime }_{1})\) is easily calculated from the cdf (1) and \(m_{1}(z)=\int _{-\infty }^{z} x\,f(x) dx\) is the first incomplete moment.
Here, we provide two alternative ways to compute δ 1 and δ 2. First, a general equation for m 1(z) can be derived from (10) as
where
Equation (18) is the basic quantity to compute the mean deviations of the exp-G distributions. Hence, the mean deviations in (17) depend only on the mean deviations of the exp-G distribution.
A second general formula for m 1(z) can be derived by setting u=G(x) in (18)
where
is a simple integral defined from the baseline qf Q G (u).
In a similar way, the mean deviations of any GW-G distribution can be computed from Eqs. (19)–(20). For example, the mean deviations of the GW-SL distribution are determined immediately (by using the generalized binomial expansion) from the function
Applications of the first incomplete moment can be addressed to obtain Bonferroni and Lorenz curves defined for a given probability π by \(B(\pi)= m_{1}(q)/[\pi \mu ^{\prime }_{1}]\) and \(L(\pi)=m_{1}(q)/\mu ^{\prime }_{1}\), respectively, where \(\mu ^{\prime }_{1}=E(X)\) and \(q=Q_{G}\left(1-\exp\left\{-\left[-\alpha^{-1}\log(1-\pi)\right]^{1/\beta}\right\}\right)\) is the GW-G qf at π, see Eq. (11).
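As a numerical illustration, the two curves can be evaluated through the identity \(m_{1}(q)=\int_{0}^{\pi}Q(u)\,du\), which follows from the substitution u=F(x). The sketch below uses a plain midpoint rule rather than the series representations of this section, and the function names are assumptions made for the example.

```python
import math

def gw_quantile(u, Q_G, alpha, beta):
    """GW-G quantile built from the baseline quantile Q_G."""
    t = (-math.log1p(-u) / alpha) ** (1.0 / beta)
    return Q_G(-math.expm1(-t))

def lorenz_bonferroni(pi, Q_G, alpha, beta, steps=20000):
    """Return (L(pi), B(pi)) using midpoint integration of the quantile."""
    def integral(upper):
        h = upper / steps
        return h * sum(gw_quantile((j + 0.5) * h, Q_G, alpha, beta)
                       for j in range(steps))
    m1 = integral(pi)     # first incomplete moment at q = Q(pi)
    mean = integral(1.0)  # E(X)
    return m1 / mean, m1 / (pi * mean)

# Example baseline: unit exponential, with quantile -log(1 - p).
exp_quantile = lambda p: -math.log1p(-p)
L_half, B_half = lorenz_bonferroni(0.5, exp_quantile, 1.0, 1.0)
```

With α=β=1 and an exponential baseline the result can be compared with the known exponential Lorenz curve \(L(\pi)=\pi+(1-\pi)\log(1-\pi)\).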
Probability weighted moments
A very useful mathematical quantity is the probability weighted moment (PWM) of X. The (n,s)th PWM is given by \(\kappa_{n,s}=E\{X^{n}F(X)^{s}\}\) for n,s=0,1,… Using the binomial theorem, \(\kappa_{n,s}\) can be written as
Using the power series expansion in the last equation, we have
Using (7) and the power series expansion, we can write
Expanding \(G(x)^{\beta(m+l+1)+q-1}\) and \(G(x)^{\beta(m+l+1)+q+i}\), we have
where
The quantity κ n,s can be obtained from (21) in terms of the baseline qf by setting G(x)=u. We have
where τ(n,r) is given by (14).
Equation (22) can be applied for most baseline G distributions to derive explicit expressions for κ n,s , since the baseline qf can usually be expressed as a power series.
Entropies
An entropy is a measure of variation or uncertainty of a random variable X. Two popular entropy measures are the Rényi and Shannon entropies (Shannon 1951; Rényi 1961). The Rényi entropy of a random variable with pdf f(·) is defined by
for γ>0 and γ≠1. The Shannon entropy of a random variable X is defined by E[− logf(X)]. It is the particular case of the Rényi entropy for γ ↑1.
Here, we derive expressions for the Rényi and Shannon entropies when X is a generalized Weibull-G random variable. Using (7), we can write
Expanding \(G(x)^{\gamma(\beta-1)}\) and \(G(x)^{i+\gamma(\beta-1)+1}\), the last equation becomes
where
By expanding the exponential function and using the results obtained in (23), we can write
where τ ℓ is given by
So,
where
s k and τ ℓ are given by (24) and (25), respectively, and I q+k+ℓ comes from the parent distribution as
Hence, the Rényi entropy of X is given by
The Shannon entropy can be obtained by taking the limit γ ↑1 in (26). However, it is easier to derive an expression for it from first principles. Using (2), the Shannon entropy can be expressed as
Using the series expansion for log(1−z), we can write
Henceforth, we use an equation by Gradshteyn and Ryzhik (2007) for a power series raised to a positive integer n
where the coefficients c n,i (for i=1,2,…) are easily determined from the recurrence equation
where \(c_{n,0}={a_{0}^{n}}\). The coefficient c n,i can be determined from c n,0,…,c n,i−1 and hence from the quantities a 0,…,a i . In fact, c n,i can be given explicitly in terms of the coefficients a i , although it is not necessary for programming numerically our expansions in any algebraic or numerical software.
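The recurrence can be programmed in a few lines. The sketch below uses the standard form of the result for a power series raised to a positive integer power, \(c_{n,i}=(i\,a_{0})^{-1}\sum_{m=1}^{i}[m(n+1)-i]\,a_{m}\,c_{n,i-m}\) with \(c_{n,0}=a_{0}^{n}\); the function name and the truncation argument are illustrative.

```python
def power_series_power(a, n, terms):
    """Coefficients c_{n,0..terms-1} of (sum_i a[i] x^i)**n, with a[0] != 0.

    Coefficients a[m] beyond the end of the list are treated as zero.
    """
    c = [a[0] ** n]
    for i in range(1, terms):
        total = sum((m * (n + 1) - i) * a[m] * c[i - m]
                    for m in range(1, i + 1) if m < len(a))
        c.append(total / (i * a[0]))
    return c

# Example: (1 + x)^3 has coefficients 1, 3, 3, 1, 0.
cubic = power_series_power([1, 1], 3, 5)
```

Each coefficient needs only the earlier ones, so computing the first N coefficients costs on the order of N² operations, regardless of n.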
Based on Eq. (29), Eq. (28) can be rewritten as
where
for r=1,2,… and \(e_{j,0}=2^{-j}\). Using the result (31) and expanding log(1−G(x)) in a similar form to (27), the Shannon entropy reduces to
For any real parameter β and G(x) ∈ (0,1), we can write from (7)
where p 0(β)=β/2, p 1(β)=β (3β+5)/24, p 2(β)=β (β 2+5β+6)/48, etc. Then, the Shannon entropy for the GW-G family is given by
The expectations in (32) can be easily evaluated numerically for a given G(·) and g(·). Using (10), they can also be represented as
and
The last of these representations can also be expressed in terms of the parent qf Q G (u)=G −1(u) as
where the integral can be calculated for most baseline distributions using a power series expansion for Q G (u).
Reliability
Here, we derive the reliability, \(R=\Pr(X_{2}<X_{1})\), when \(X_{1}\sim\) GW-G\((\alpha_{1},\beta_{1})\) and \(X_{2}\sim\) GW-G\((\alpha_{2},\beta_{2})\) are independent random variables. Probabilities of this form have many applications, especially in engineering. Let \(f_{i}\) denote the pdf of \(X_{i}\) and \(F_{i}\) denote the cdf of \(X_{i}\). By using the representations (8) and (10), we can write
where R jk = Pr(Y j <Y k ) is the reliability between the independent random variables Y j ∼ exp-G (j) and Y k ∼ exp-G (k+1). Hence, the reliability for the GW-G random variables is a linear combination of those for exp-G random variables. In the particular case α 1=α 2 and β 1=β 2, Eq. (33) gives R=1/2.
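The value of R can also be checked by direct numerical integration of \(R=\int f_{1}(x)F_{2}(x)\,dx\), without the series. The sketch below assumes the form \(F_{i}(x)=1-\exp\{-\alpha_{i}[-\log(1-G(x))]^{\beta_{i}}\}\) and a common baseline G; after the substitution \(p=F_{1}(x)\) the baseline drops out of the integral. The function name and step count are illustrative.

```python
import math

def gw_reliability(alpha1, beta1, alpha2, beta2, steps=100000):
    """R = Pr(X2 < X1) for independent GW-G variables sharing one baseline."""
    h = 1.0 / steps
    total = 0.0
    for j in range(steps):
        p = (j + 0.5) * h  # p = F1(X1) is uniform on (0, 1)
        # t = -log(1 - G(X1)), recovered by inverting F1 at level p
        t = (-math.log1p(-p) / alpha1) ** (1.0 / beta1)
        # F2 evaluated at the same point
        total += -math.expm1(-alpha2 * t ** beta2)
    return total * h
```

Two cases give exact reference values under this assumption: identical parameters yield R=1/2, and \(\beta_{1}=\beta_{2}=1\) yields \(R=\alpha_{2}/(\alpha_{1}+\alpha_{2})\).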
Order statistics
Order statistics make their appearance in many areas of statistical theory and practice. Suppose X 1,…,X n is a random sample from the GW-G distribution. Let X i:n denote the ith order statistic. The pdf of X i:n can be expressed from (8) and (10) as
where K=n!/[ (i−1)! (n−i)!]. Using (29) and (30), we can write
where \(f_{j+i-1,0}=(w_{0})^{j+i-1}\),
and w k is given by (9). Hence,
where
Equation (34) is the main result of this section. It reveals that the pdf of the GW-G order statistics is a triple linear combination of exp-G density functions. So, several mathematical quantities of these order statistics, like ordinary, incomplete and factorial moments, mgf, mean deviations and several others, can be obtained from those quantities of the exp-G distributions. Clearly, the cdf of X i:n can be expressed as
Maximum likelihood estimation
Several approaches for parameter point estimation were proposed in the literature but the maximum likelihood method is the most commonly employed. The maximum likelihood estimates (MLEs) enjoy desirable properties and can be used when constructing confidence intervals and also in test statistics. Large sample theory for these estimates delivers simple approximations that work well in finite samples. Statisticians often seek to approximate quantities such as the density of a test statistic that depend on the sample size in order to obtain better approximate distributions. The resulting approximation for the MLEs in distribution theory is easily handled either analytically or numerically. The goodness of fit statistics including the Akaike information criterion (AIC), Bayesian information criterion (BIC), Consistent Akaike information criterion (CAIC), Anderson-Darling (A) and Cramér–von Mises (W) are computed to compare the fitted models.
Here, we consider estimation of the unknown parameters of the GW-G distribution by the method of maximum likelihood. Let \(x_{1},\ldots,x_{n}\) be a sample from (2) and \(\boldsymbol{\theta}=(\alpha,\beta,\boldsymbol{\eta}^{T})^{T}\) be the vector of parameters of dimension (q+2). The log-likelihood function for θ is given by
The score functions for the parameters α, β and η are easily derived analytically as
respectively.
The MLE \(\widehat {\boldsymbol {\theta }}\) of θ is obtained by solving the nonlinear likelihood equations U α (θ)=0, U β (θ)=0 and U η (θ)=0. These equations cannot be solved analytically and statistical software can be used to solve them numerically. We can use iterative techniques such as a Newton-Raphson type algorithm to obtain \(\widehat {{\boldsymbol {\theta }}}\). We employ the numerical procedure NLMixed in SAS.
Let \(J(\boldsymbol{\theta})=\{J_{ab}\}\) be the (q + 2)×(q + 2) observed information matrix (for a,b=α,β,η), whose elements can be calculated numerically. Based on the approximate multivariate normal \(N_{q+2}(0,J(\widehat {\boldsymbol {\theta }})^{-1})\) distribution of \(\widehat{\boldsymbol{\theta}}\), we can construct approximate confidence intervals for the model parameters. We can compute the maximum values of the unrestricted and restricted log-likelihoods to obtain likelihood ratio (LR) statistics for testing some sub-models of the GW-G distribution. Hypothesis tests of the type \(H_{0}: \boldsymbol{\psi}=\boldsymbol{\psi}_{0}\) versus \(H_{1}: \boldsymbol{\psi}\neq\boldsymbol{\psi}_{0}\), where ψ is a vector formed with some components of θ and \(\boldsymbol{\psi}_{0}\) is a specified vector, can be performed using LR statistics. For example, the test of \(H_{0}: \alpha=\beta=1\) versus \(H_{1}: H_{0}\) is not true is equivalent to comparing the GW-G and G distributions, and the LR statistic is given by
where \(\widehat {\alpha }\), \(\widehat {\beta }\) and \(\widehat {{\boldsymbol {\eta }}}\) are the MLEs under \(H_{1}\) and \(\widetilde {{\boldsymbol {\eta }}}\) is the estimate under \(H_{0}\).
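A minimal sketch of the log-likelihood and of the LR statistic \(w=2\{\ell(\widehat{\alpha},\widehat{\beta},\widehat{\boldsymbol{\eta}})-\ell(1,1,\widetilde{\boldsymbol{\eta}})\}\) is given below for a normal baseline. It assumes the density \(f=\alpha\beta\,g/(1-G)\,[-\log(1-G)]^{\beta-1}\exp\{-\alpha[-\log(1-G)]^{\beta}\}\) and takes the unrestricted estimates as inputs, since the maximization itself is done in the paper with NLMixed. Function names are illustrative.

```python
import math
from statistics import NormalDist, fmean, pstdev

def gw_normal_loglik(data, alpha, beta, mu, sigma):
    """Log-likelihood of the GW-N model under the assumed density."""
    base = NormalDist(mu, sigma)
    ll = len(data) * math.log(alpha * beta)
    for x in data:
        surv = 1.0 - base.cdf(x)  # 1 - G(x)
        t = -math.log(surv)       # -log(1 - G(x))
        ll += (math.log(base.pdf(x)) - math.log(surv)
               + (beta - 1.0) * math.log(t) - alpha * t ** beta)
    return ll

def lr_statistic(data, alpha_hat, beta_hat, mu_hat, sigma_hat):
    """LR statistic for H0: alpha = beta = 1 (normal model)."""
    # Under H0 the normal MLEs are the sample mean and population sd.
    mu0, sigma0 = fmean(data), pstdev(data)
    l1 = gw_normal_loglik(data, alpha_hat, beta_hat, mu_hat, sigma_hat)
    l0 = gw_normal_loglik(data, 1.0, 1.0, mu0, sigma0)
    return 2.0 * (l1 - l0)
```

Under the null hypothesis the statistic would usually be compared with a chi-square distribution with two degrees of freedom, one for each restricted parameter.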
Regression models
In many practical applications, the lifetimes are affected by explanatory variables such as the cholesterol level, blood pressure, weight and many others. Parametric models to estimate univariate survival functions and for censored data regression problems are widely used. A regression model that provides a good fit to lifetime data tends to yield more precise estimates of the quantities of interest.
Let X be a random variable having the pdf (2). A class of regression models for location and scale is characterized by the fact that the random variable Y= log(X) has a distribution with location parameter μ(v) dependent only on the explanatory variable vector and a scale parameter σ. Then, we can write
where σ>0 and Z has the distribution which does not depend on v. The random variable Y (for y∈ℜ) has density function given by
where the functions G(·) and g(·) are defined in Section 1.
For illustrative purposes, let X be a random variable having the GW-LL density function defined in Section 2.4. The random variable Y=log(X) re-parameterized in terms of μ= log(a) and σ=1/γ is given by
where α>0 and β>0 are shape parameters, μ∈ℜ is the location parameter and σ>0 is the scale parameter.
We refer to Eq. (37) as the log-generalized Weibull-log-logistic (LGW-LL) distribution, say Y∼LGW-LL(α,β,μ,σ). If X∼GW-LL(α,β,a,γ), then Y= log(X)∼LGW-LL(α,β,μ,σ). For α=β=1, we obtain the logistic model. The survival function corresponding to (37) is given by
Plots of the density function (37) for selected parameter values are displayed in Fig. 9, which show great flexibility for different values of α and β.
Now, we define the standardized random variable Z=(Y−μ)/σ having the density function
Next, we propose a linear location-scale regression model linking the response variable y i and the explanatory variable vector \(\textbf {v}_{i}^{T}=(v_{i1},\ldots,v_{\textit {ip}})\) as follows
where the random error \(z_{i}\) has density function (39), \(\boldsymbol{\tau}=(\tau_{1},\ldots,\tau_{p})^{T}\), σ>0, α>0 and β>0 are unknown parameters. The parameter \(v_{i}=\textbf {v}_{i}^{T} {\boldsymbol {\tau }}\) is the location of \(y_{i}\). The location parameter vector \(\boldsymbol{v}=(v_{1},\ldots,v_{n})^{T}\) is represented by a linear model \(\boldsymbol{v}=\textbf{V}\boldsymbol{\tau}\), where \(\textbf{V}=(\textbf{v}_{1},\ldots,\textbf{v}_{n})^{T}\) is a known model matrix. The LGW-LL model (40) opens new possibilities for fitting many different types of data.
Consider a sample (y 1,v 1),…,(y n ,v n ) of n independent observations, where each random response is defined by y i = min{log(x i ), log(c i )}. We assume non-informative censoring such that the observed lifetimes and censoring times are independent. Let F and C be the sets of individuals for which y i is the log-lifetime or log-censoring, respectively. Conventional likelihood estimation techniques can be applied here. The log-likelihood function for the vector of parameters θ=(α,β,σ,τ T)T from model (40) has the form \(l({\boldsymbol {\theta }})=\sum \limits _{i \in F}l_{i}({\boldsymbol {\theta }})+\sum \limits _{i \in C}l_{i}^{(c)}({\boldsymbol {\theta }})\), where \(l_{i}({\boldsymbol {\theta }})=\log [f(y_{i})]\), \(l_{i}^{(c)}({\boldsymbol {\theta }})=\log [\!S(y_{i})]\), f(y i ) is the density (37) and S(y i ) is the survival function (38) of Y i . The total log-likelihood function for θ reduces to
where r is the number of uncensored observations (failures). The MLE \(\widehat {{\boldsymbol {\theta }}}\) of the vector of unknown parameters can be calculated by maximizing the log-likelihood (41). We use the procedure NLMixed in SAS to calculate the estimate \(\widehat {{\boldsymbol {\theta }}}\). Initial values for β and σ are taken from the fit of the log-Weibull regression model with α=0 and β=1.
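The censored log-likelihood \(\sum_{i\in F}\log f(y_{i})+\sum_{i\in C}\log S(y_{i})\) can be sketched as below. The density and survival function used here are assumptions: they take the standardized baseline to be logistic, so that \(S(y)=\exp\{-\alpha[\log(1+e^{z})]^{\beta}\}\) with \(z=(y-\mu)/\sigma\), which is the form implied by the statement that α=β=1 gives the logistic model. Argument names are illustrative.

```python
import math

def lgwll_censored_loglik(y, delta, V, alpha, beta, sigma, tau):
    """Censored log-likelihood for the assumed LGW-LL regression model.

    y: observed log-times; delta: 1 = failure, 0 = censored;
    V: covariate rows; tau: regression coefficients.
    """
    ll = 0.0
    for yi, di, vi in zip(y, delta, V):
        mu = sum(t_j * v_j for t_j, v_j in zip(tau, vi))  # location v_i^T tau
        z = (yi - mu) / sigma
        t = math.log1p(math.exp(z))    # -log(1 - G) for a logistic baseline
        log_surv = -alpha * t ** beta  # log S(y_i)
        if di:
            # log f = log(alpha*beta/sigma) + [z - log(1 + e^z)]
            #         + (beta - 1) * log t + log S
            ll += (math.log(alpha * beta / sigma) + z - t
                   + (beta - 1.0) * math.log(t) + log_surv)
        else:
            ll += log_surv
    return ll
```

A function of this kind could be handed to any general-purpose optimizer; the paper itself uses the NLMixed procedure in SAS for the maximization.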
The elements of the (p+3)×(p+3) observed information matrix \(-\ddot {\textbf {L}}({\boldsymbol {\theta }})\), namely \(-\textbf {L}_{\alpha \alpha },-\textbf {L}_{\alpha \beta },-\textbf {L}_{\alpha \sigma },-\textbf {L}_{\alpha \tau _{j}}, -\textbf {L}_{\beta \beta },-\textbf {L}_{\beta \sigma }, -\textbf {L}_{\beta \tau _{j}},-\textbf {L}_{\sigma \sigma },-\textbf {L}_{\sigma \tau _{j}}\) and \(-\textbf {L}_{\tau _{j}\tau _{s}}\) (for j,s=1,…,p), can be calculated numerically. Inference on θ can be conducted in the classical way based on the approximate multivariate normal \(N_{p+3}\left (0,-\ddot {\textbf {L}}(\widehat {{\boldsymbol {\theta }}})^{-1}\right)\) distribution for \(\widehat {{\boldsymbol {\theta }}}\). Further, we can use LR statistics for comparing the LGW-LL model with some of its sub-models.
13.1 Simulation
To simulate from the GW-N distribution, we use Eq. (11) with U a uniform random variable on (0,1). We simulate the GW-N(α=2,β=1.5,0.5,μ=0,σ=1) model for n = 50, 150 and 300. For each sample size, we compute the MLEs of α, β, μ and σ. Then, we repeat this process 1000 times and compute the averages of the estimates (AEs), biases and mean squared errors (MSEs). The results are reported in Table 1.
Based on the figures in Table 1, we note that the MSEs of the MLEs of α, β, μ and σ decay toward zero as the sample size increases, as usually expected under standard regularity conditions. As the sample size n increases, the mean estimates of the parameters tend to be closer to the true parameter values. This fact supports the claim that the asymptotic normal distribution provides an adequate approximation to the finite sample distribution of the estimates. The usual normal approximation can often be improved by making bias adjustments to the MLEs. Approximations to the biases of the MLEs in simple models may be obtained analytically. In order to improve the accuracy of these estimates using analytical bias reduction, one needs to obtain several cumulants of log-likelihood derivatives, which are notoriously cumbersome for the proposed model. In Fig. 10 we present the true density and the density evaluated at the average values of the parameters for different sample sizes.
Applications
In this section, we present two applications to real data. In the first, the computations were performed using the function goodness.fit in the AdequacyModel package for R. In the second application, for censored data, the computations were done using the NLMixed procedure in SAS.
14.1 Data: Strengths of glass fibers
The data set (n=63) consists of the strengths of 1.5 cm glass fibers from Smith and Naylor (1987) and is contained in the gamlss.data library of the R software. Barreto-Souza et al. (2010) fitted the beta generalized exponential (BGE) distribution to these data and showed that its fit is better than those of the beta exponential (BE) (Nadarajah and Kotz 2006b) and generalized exponential (GE) (Gupta and Kundu 1999) distributions. Barreto-Souza et al. (2011) showed that the beta Fréchet (BF) distribution gives a better fit than the Fréchet and exponentiated Fréchet (EF) (Nadarajah and Kotz 2003) distributions. Alzaghal et al. (2013) fitted the exponentiated Weibull-exponential (EWE) distribution to the current data and concluded that this distribution provides a better fit than the BGE and BF distributions. Recently, Bourguignon et al. (2014) fitted the Weibull-exponential (WE) distribution and showed that it is better than the exponentiated Weibull (EW) (Mudholkar and Srivastava 1993) and exponentiated exponential (EE) (Gupta and Kundu 1999) models.
Now, we compare the EWE and WE models with some other GW-G models fitted to these data. We also present the fits of the baseline distributions to compare the gain with the generated distributions. Table 2 provides the MLEs (and the corresponding standard errors in parentheses) of the model parameters and the values of the statistics AIC, BIC, A and W for some models.
Formal tests for the extra skewness parameters (α,β) in the GW-N distribution are performed using LR statistics as described in Section 12. We compare the GW-N and normal models and the GW-LL and LL models, where the LR values are listed in Table 3. For the strengths of glass fibers data, we reject the null hypotheses of the LR tests in favor of the GW-N and GW-LL distributions thus indicating the gain added by the parameters α and β.
In order to assess if the model is appropriate, Fig. 11 a and b display the histogram of the current data and the fitted densities of the GW-N, N, GW-LL, GW-Gu, GW-LN, EWE and WE models. The figures in Table 2 and the plots of Fig. 11 indicate that the GW-Gu distribution has a significant gain compared with the other distributions.
14.2 Entomology data
The data come from a study carried out at the Department of Entomology of the Luiz de Queiroz School of Agriculture, University of São Paulo, which aimed to assess the longevity of the Mediterranean fruit fly (Ceratitis capitata). The need for this fly to seek food just after emerging from the larval stage has permitted the use of toxic baits for its management in Brazilian orchards for at least fifty years. This pest control technique consists of using small portions of food laced with an insecticide, generally an organophosphate, that quickly kills the flies, instead of using an insecticide alone. Recently, there have been reports of the insecticidal effect of extracts of the neem tree, leading to proposals to adopt various extracts (aqueous extract of the seeds, methanol extract of the leaves and dichloromethane extract of the branches) to control pests such as the Mediterranean fruit fly. The experiment was completely randomized with eleven treatments, consisting of different extracts of the neem tree, at concentrations of 39, 225 and 888 ppm. After preliminary statistical analysis, these eleven treatments were allocated into two groups, namely:
- Group 1: Control 1 (deionized water); Control 2 (acetone - 5 %); aqueous extract of seeds (AES) (39 ppm); AES (225 ppm); AES (888 ppm); methanol extract of leaves (MEL) (225 ppm); MEL (888 ppm); and dichloromethane extract of branches (DMB) (39 ppm).
- Group 2: MEL (39 ppm); DMB (225 ppm) and DMB (888 ppm).
For more details, see Silva et al. (2013). The response variable in the experiment is the lifetime of the adult flies in days after exposure to the treatments. The experimental period was set at 51 days, so that individuals surviving beyond this period were treated as censored observations. The total sample size was n=72, because four cases were lost. Therefore, the variables used in this study were: \(x_{i}\), the lifetime of Ceratitis capitata adults in days; \(\delta_{i}\), the censoring indicator; and \(v_{i1}\), the group (1 = group 1, 0 = group 2). We start the analysis of the data considering only the failure (\(x_{i}\)) and censoring (\(\delta_{i}\)) data.
Recently, Alexander et al. (2012) analyzed these data using the McDonald-Weibull (McW) distribution with scale parameter β>0 and shape parameter λ>0. We focus on this distribution since it extends various distributions previously discussed in the lifetime literature, such as the beta Weibull (BW) (Lee et al. 2007), Kumaraswamy Weibull (KwW) (Cordeiro et al. 2010) and exponentiated Weibull (EW) (Mudholkar et al. 1995) distributions, among others.
Now, we compare the McW distribution and some of its sub-models. For some fitted models, Table 4 provides the MLEs (and the corresponding standard errors in parentheses) of the parameters and the values of the AIC, BIC and CAIC statistics. The computations were performed using the NLMixed procedure in SAS. They indicate that the GW-LL model has the lowest AIC, BIC and CAIC values among the fitted models, and therefore it could be chosen as the best model.
To assess whether the model is appropriate, Fig. 12a displays the empirical cumulative distribution together with the estimated cumulative distributions of the GW-LL and LL models fitted to the current data. Further, Fig. 12b gives the plots of the empirical survival function and the estimated GW-LL and LL survival functions. These plots indicate that the GW-LL model provides a good fit to these data.
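With right-censored data, an empirical survival function is commonly obtained from the Kaplan-Meier (product-limit) estimator. Whether the plots referred to above use this estimator is an assumption; the sketch below only shows how such a curve could be computed, using hypothetical toy data and the assumed coding of 1 for an observed failure and 0 for a censored case.

```python
def kaplan_meier(times, deltas):
    """Kaplan-Meier estimate of the survival function.

    Returns a list of (t, S(t)) pairs, one per distinct failure time.
    Cases censored at a failure time are counted as still at risk there.
    """
    data = sorted(zip(times, deltas))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect all cases (failures and censored) tied at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve

# Hypothetical toy data (not the fruit fly data).
for t, s in kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0]):
    print(t, round(s, 3))
```

Plotting these step values against a fitted parametric survival function gives the kind of visual check described above.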
Now, we present results by fitting the model
where the random variable \(Y_i\) follows the LGW-LL distribution given in (37). The MLEs of the model parameters and the asymptotic standard errors of these estimates, calculated using the NLMIXED procedure in SAS, are listed in Table 5.
A summary of the values of the measures AIC, CAIC and BIC to compare the LGW-LL and logistic regression models is given in Table 5. We conclude that the fitted LGW-LL regression model has the lowest AIC, CAIC and BIC values compared with those values of the fitted logistic model. Figure 13 provides the plots of the estimated survival function and estimated cdf of the LGW-LL distribution. These plots indicate this regression model provides a good fit to these data.
Conclusions
We study some mathematical properties of a new generalized Weibull family of distributions with two extra positive parameters. The family is able to generalize any continuous distribution. We provide some special models, a very useful mixture representation in terms of exponentiated distributions, and explicit expressions for the ordinary and incomplete moments, generating function, mean deviations, probability weighted moments, entropies, reliability and order statistics. The model parameters are estimated by the method of maximum likelihood. We introduce a location-scale regression model based on the new family. The importance of the proposed models is illustrated by means of two real-life data sets. The new models provide consistently better fits than other competing models for these data.
References
Alexander, C, Cordeiro, GM, Ortega, EMM, Sarabia, JM: Generalized beta-generated distributions. Comput. Stat. Data Anal. 56, 1880–1897 (2012).
Alzaatreh, A, Lee, C, Famoye, F: A new method for generating families of continuous distributions. Metron. 71, 63–79 (2013).
Alzaghal, A, Famoye, F, Lee, C: Exponentiated T-X Family of Distributions with Some Applications. Int. J. Stat. Probab. 2, 31 (2013).
Barreto-Souza, W, Santos, AH, Cordeiro, GM: The beta generalized exponential distribution. J. Stat. Comput. Simul. 80, 159–172 (2010).
Barreto-Souza, W, Cordeiro, GM, Simas, AB: Some results for beta Fréchet distribution. Commun. Stat. Theory Methods. 40, 798–811 (2011).
Bourguignon, M, Silva, RB, Cordeiro, GM: The Weibull-G Family of Probability Distributions. J. Data Sci. 12, 53–68 (2014).
Cordeiro, GM, Ortega, EMM, Nadarajah, S: The Kumaraswamy Weibull distribution with application to failure data. J. Frankl. Inst. 347, 1399–1429 (2010).
Flajolet, P, Odlyzko, A: Singularity analysis of generating functions. SIAM J. Discr. Math. 3, 216–240 (1990).
Flajolet, P, Sedgewick, R: Analytic Combinatorics. Cambridge University Press (2009). ISBN 978-0-521-89806-5.
Gradshteyn, IS, Ryzhik, IM: Table of Integrals, Series, and Products. seventh edition. Academic Press, San Diego (2007).
Gupta, RC, Gupta, PL, Gupta, RD: Modeling failure time data by Lehman alternatives. Commun. Stat. Theory Methods. 27, 887–904 (1998).
Gupta, RD, Kundu, D: Generalized exponential distributions. Aust. N. Z. J. Stat. 41, 173–188 (1999).
Gupta, RD, Kundu, D: Exponentiated exponential family: an alternative to gamma and Weibull distributions. Biom. J. 43, 117–130 (2001).
Kakde, CS, Shirke, DT: On exponentiated lognormal distribution. Int. J. Agric. Stat. Sci. 2, 319–326 (2006).
Lee, C, Famoye, F, Olumolade, O: Beta Weibull distribution: some properties and applications to censored data. J. Modern Appl. Stat. Methods. 6, 173–186 (2007).
Mudholkar, GS, Srivastava, DK: Exponentiated Weibull family for analyzing bathtub failure-rate data. IEEE Trans. Reliab. 42, 299–302 (1993).
Mudholkar, GS, Srivastava, DK, Freimer, M: The exponentiated Weibull family: A reanalysis of the bus-motor-failure data. Technometrics. 37, 436–445 (1995).
Nadarajah, S: The exponentiated Gumbel distribution with climate application. Environmetrics. 17, 13–23 (2005).
Nadarajah, S, Gupta, AK: The exponentiated gamma distribution with application to drought data. Calcutta Stat Assoc. Bull. 59, 29–54 (2007).
Nadarajah, S, Kotz, S: The exponentiated Frechet distribution. Interstat Electron. (2003). http://interstat.statjournals.net/YEAR/2003/abstracts/0312001.php.
Nadarajah, S, Kotz, S: The exponentiated type distributions. Acta Applicandae Mathematicae. 92, 97–111 (2006a).
Nadarajah, S, Kotz, S: The beta exponential distribution. Reliab. Eng. Syst. Saf. 91, 689–697 (2006b).
Nielsen, N: Handbuch der Theorie der Gammafunktion. Chelsea Publ. Co., New York (1906).
Prudnikov, AP, Brychkov, YA, Marichev, OI: Integrals and Series, Vol. 1, 2 and 3. Gordon and Breach Science Publishers, Amsterdam (1986).
Rényi, A: On measures of entropy and information, Vol. 1. University of California Press, Berkeley (1961).
Shannon, CE: Prediction and entropy of printed English. Bell Syst. Technical J. 30, 50–64 (1951).
Silva, MA, Bezerra-Silva, GCD, Vendramim, JD, Mastrangelo, T: Sublethal effect of neem extract on Mediterranean fruit fly adults. Rev. Bras. Frutic. 35, 93–101 (2013).
Smith, RL, Naylor, JC: A comparison of maximum likelihood and Bayesian estimators for the three-parameter Weibull distribution. Appl. Stat. 36, 358–369 (1987).
Ward, M: The representation of Stirling’s numbers and Stirling’s polynomials as sums of factorials. Am. J. Math. 56, 87–95 (1934).
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Cordeiro, G.M., Ortega, E.M.M. & Ramires, T.G. A new generalized Weibull family of distributions: mathematical properties and applications. J Stat Distrib App 2, 13 (2015). https://doi.org/10.1186/s40488-015-0036-6
DOI: https://doi.org/10.1186/s40488-015-0036-6
Keywords
- Estimation
- Generating function
- Mean deviation
- Moment
- Weibull distribution
Mathematics Subject Classification
- 47N30
- 97K70
- 97K80