A generalization to the log-inverse Weibull distribution and its applications in cancer research
Journal of Statistical Distributions and Applications volume 8, Article number: 14 (2021)
Abstract
In this paper we consider a generalization of a log-transformed version of the inverse Weibull distribution. Several theoretical properties of the distribution are studied in detail, including expressions for its probability density function, reliability function, hazard rate function, quantile function, characteristic function, raw moments, percentile measures, entropy measures, median and mode. Certain structural properties of the distribution, along with expressions for reliability measures and the distribution and moments of order statistics, are obtained. We also discuss the maximum likelihood estimation of the parameters of the proposed distribution and illustrate the usefulness of the model through real life examples. In addition, the asymptotic behaviour of the maximum likelihood estimators is examined with the help of simulated data sets.
Introduction
Keller et al. (1982) introduced and studied the inverse Weibull distribution (IWD) through the cumulative distribution function (c.d.f.)
for any y > 0 and c > 0. Modified versions of the IWD have been used frequently in survival analysis and reliability studies for modelling failure characteristics such as infant mortality, useful life and wear-out periods. For details regarding the applications of the IWD and its related versions, we refer the reader to Drapella (1993), Mudholkar and Srivastava (1993), Khan et al. (2008), Khan and Pasha (2009), Jazi et al. (2010), de Gusmao et al. (2011), Shahbaz et al. (2012), Elbatal et al. (2016), Aryal and Elbatal (2015) and Kumar and Nair (2016, 2018a, 2018b, 2018c). Truncated versions of distributions such as the normal, Weibull and Lindley distributions have found wide application in various areas of survival analysis and reliability theory; see, for example, Ahmed et al. (2010), Amemiya (1973), Kizilersu et al. (2016), Singh et al. (2014) and Zhang and Xie (2011).
A log-transformed version of the IWD, namely “the log-inverse Weibull distribution (LIWD)”, and its location-scale extended version, namely the “extended log-inverse Weibull distribution (ELIWD)”, capable of modelling truncated data sets, were studied by Kumar and Nair (2018c) through their c.d.f.s
and
for y ≥ 0 and \( z\ge {e}^a \) with a ∈ (−∞, ∞), b > 0 and c > 0 respectively. The LIWD and the ELIWD find applications in several areas including industrial as well as biomedical fields, but are not suitable for data sets having non-decreasing failure rates. Moreover, the LIWD can be considered a better alternative to the IWD left-truncated at unity, as it has much more flexibility in terms of its measures of central tendency, dispersion, skewness and kurtosis and the shapes of its p.d.f. The distributions with c.d.f.s (2) and (3) are denoted as LIWD(c) and ELIWD(a, b, c) respectively throughout this paper.
The present paper proposes a Lehmann Type II extension (see Lehmann (1953)) of the ELIWD(a, b, c) under the name “the exponentiated log-inverse Weibull distribution (ExLIWD)”, incorporating an additional shape parameter so as to include the non-decreasing shape for the hazard rate function, thereby increasing the flexibility of the model in handling data sets from various real life situations. Thus we attempt to establish that the ExLIWD is not only more flexible in terms of the shapes of its hazard rate function, measures of central tendency, dispersion, skewness and kurtosis, but also gives a much better fit to complete, censored as well as truncated data sets.
The rest of the paper is organized as follows. In Section 2 we present the definition and a number of important properties of the ExLIWD, including expressions for its characteristic function and moments. Some structural properties of the ExLIWD are presented in Section 3, while Section 4 deals with the distribution and moments of order statistics. Section 5 contains the maximum likelihood (M.L.) estimation of the parameters of the distribution. The usefulness of the model as a survival distribution in various areas of cancer related applications is illustrated in Section 6 by considering three real life data sets, of which two are complete data sets and one is a censored cancer data set. In Section 7 we examine the asymptotic behaviour of the estimators of the parameters of the ExLIWD with the help of simulated data sets.
Exponentiated log-inverse Weibull distribution
In this section we present the definition and some important properties of the exponentiated log-inverse Weibull distribution.
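Definition 0.1 A continuous random variable Y is said to have “the exponentiated log-inverse Weibull distribution (ExLIWD)” if its c.d.f. is of the following form, for any a ∈ (−∞, ∞), b > 0, c > 0, δ > 0 and \( y>{e}^a \).
\( F(y)=1-{\left\{1-\exp \left(-{\left[\psi \left(y;\underline{\theta}\right)\right]}^{-c}\right)\right\}}^{\delta } \)  (4)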
in which \( \psi \left(y;\underline{\theta}\right)=\psi \left(y;a,b\right)=\left[\frac{\mathit{\ln}(y)-a}{b}\right]. \)
Clearly the ExLIWD(a, b, c, δ) is a proportional hazard model (also known as a Lehmann Type II extension) of the ELIWD(a, b, c). For the sake of convenience, a distribution with c.d.f. (4) is hereafter denoted as ExLIWD(a, b, c, δ).
A practical interpretation of the ExLIWD(a, b, c, δ) can be provided whenever δ is a positive integer. Consider a device made up of δ components whose life times are independent and identically distributed as ELIWD(a, b, c) with c.d.f. Q3(.) as given in (3), connected in series so that the device fails if any one of the components fails. Let Y1, Y2, ..., Yδ denote the life times of the components and let Y = min(Y1, Y2, ..., Yδ) be the life of the system with c.d.f. F(y). Then
\( F(y)=1-P\left({Y}_1>y,\dots, {Y}_{\delta }>y\right)=1-{\left[1-{Q}_3(y)\right]}^{\delta }, \)
which shows that the life time of the device has the ExLIWD(a, b, c, δ), in the light of (4).
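Now the p.d.f. f(y), the survival function \( \overline{F}(y) \), the hazard rate function h(y) and the reverse hazard rate function τ(y) of the ExLIWD(a, b, c, δ) are obtained from (4) as
\( f(y)=\frac{c\delta }{by}{\left[\psi \left(y;\underline{\theta}\right)\right]}^{-\left(c+1\right)}{e}^{-{\left[\psi \left(y;\underline{\theta}\right)\right]}^{-c}}{\left\{1-{e}^{-{\left[\psi \left(y;\underline{\theta}\right)\right]}^{-c}}\right\}}^{\delta -1}, \)  (5)
\( \overline{F}(y)={\left\{1-{e}^{-{\left[\psi \left(y;\underline{\theta}\right)\right]}^{-c}}\right\}}^{\delta }, \)  (6)
\( h(y)=\frac{c\delta }{by}{\left[\psi \left(y;\underline{\theta}\right)\right]}^{-\left(c+1\right)}{e}^{-{\left[\psi \left(y;\underline{\theta}\right)\right]}^{-c}}{\left\{1-{e}^{-{\left[\psi \left(y;\underline{\theta}\right)\right]}^{-c}}\right\}}^{-1} \)  (7)
and
\( \tau (y)=\frac{c\delta }{by}{\left[\psi \left(y;\underline{\theta}\right)\right]}^{-\left(c+1\right)}{e}^{-{\left[\psi \left(y;\underline{\theta}\right)\right]}^{-c}}{\left\{1-{e}^{-{\left[\psi \left(y;\underline{\theta}\right)\right]}^{-c}}\right\}}^{\delta -1}{\left[1-{\left\{1-{e}^{-{\left[\psi \left(y;\underline{\theta}\right)\right]}^{-c}}\right\}}^{\delta}\right]}^{-1}. \)  (8)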
The plots of the c.d.f., the p.d.f. and the hazard rate function of the ExLIWD(a, b, c, δ) for particular values of its parameters are presented in Fig. 1, Fig. 2 and Fig. 3 respectively.
From Fig. 1 it can be observed that the plots of the c.d.f. F(y) coincide at the point \( \left({e}^{a+b},1-{(0.632121)}^{\delta}\right) \) (that is, at \( y={e}^b \) when a = 0) for fixed values of the parameters a, b and δ and varying c, since ψ(y;θ) = 1 at that point. Hence there is a probability of \( 1-{(0.632121)}^{\delta } \) that an ExLIWD(a, b, c, δ) distributed life time is at most \( {e}^{a+b} \), for any value of c.
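The functions discussed above can be sketched numerically. The following is a minimal illustration, not the authors' code: it assumes that the c.d.f. has the Lehmann Type II form \( F(y)=1-{\{1-\exp (-{\psi}^{-c})\}}^{\delta } \) with ψ = (ln y − a)/b, which is consistent with the crossing point noted above, and the function names are hypothetical.

```python
import math

def psi(y, a, b):
    """Standardised log life time: (ln y - a) / b."""
    return (math.log(y) - a) / b

def survival(y, a, b, c, delta):
    """Survival function, assuming the Lehmann Type II form described in the lead-in."""
    if y <= math.exp(a):
        return 1.0
    t = psi(y, a, b) ** (-c)
    return (-math.expm1(-t)) ** delta  # {1 - exp(-psi^-c)}^delta

def cdf(y, a, b, c, delta):
    return 1.0 - survival(y, a, b, c, delta)

def pdf(y, a, b, c, delta):
    """Density obtained by differentiating the assumed c.d.f. with respect to y."""
    if y <= math.exp(a):
        return 0.0
    z = psi(y, a, b)
    t = z ** (-c)
    g = -math.expm1(-t)
    return c * delta / (b * y) * z ** (-(c + 1)) * math.exp(-t) * g ** (delta - 1)

def hazard(y, a, b, c, delta):
    """Hazard rate h(y) = f(y) / survival(y)."""
    return pdf(y, a, b, c, delta) / survival(y, a, b, c, delta)

# The c.d.f. value at y = e^(a+b) does not depend on c.
a, b, delta = 0.0, 1.0, 2.0
for c in (0.5, 1.0, 2.5, 4.0):
    print(c, cdf(math.exp(a + b), a, b, c, delta))
```

Because ψ = 1 at \( y={e}^{a+b} \), the exponent c drops out there, which is why every printed value is the same.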
On differentiating the p.d.f. (5) and the hazard rate function (7) with respect to y, we have
and
respectively, in which ψ(y;θ) is as defined in (4).
Next we obtain the following results on the shapes of the ExLIWD(a, b, c, δ) with regard to its p.d.f. and hazard rate function, proofs of which follow directly from (9) and (10) since \( 1-{e}^{-{\left[\psi \left(y;\underset{\_}{\theta}\right)\right]}^{-c}}>0 \) for all values of a ∈ (−∞,∞), b > 0, c > 0, δ > 0 and \( y>{e}^a \), where ψ(y; θ) is as defined in (4).
Result 1
The p.d.f., f(y) of the ExLIWD(a, b, c, δ) is a decreasing function of y if \( \psi \left(y;\underset{\_}{\theta}\right)>{\left[\ln \left(\delta \right)\right]}^{-{c}^{-1}} \). Moreover, when \( \psi \left(y;\underline{\theta}\right)<{\left[\mathit{\ln}\left(\delta \right)\right]}^{-{c}^{-1}} \), f(y) is decreasing for y if
Result 2
The hazard rate function, h(y) of the ExLIWD(a, b, c, δ) is a decreasing function of y whenever \( 1-{e}^{-{\left[\psi \left(y;\underset{\_}{\theta}\right)\right]}^{-c}}>c{\left\{{\left[\psi \left(y;\underline{\theta}\right)\right]}^c\left[1+c+ b\psi \left(y;\underline{\theta}\right)\right]\right\}}^{-1} \)
The mode (Mo) of the ExLIWD(a,b,c,δ) is obtained from (9) as the solution of the equation
Moreover, using the condition for uni-modality it can be observed that the ExLIWD(a, b, c, δ) is uni-modal if \( \frac{d^2f(y)}{d{y}^2}<0, \) which on simplification gives
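On inverting the c.d.f. F(yp) given in (4), the pth quantile yp of the ExLIWD(a, b, c, δ) for p ∈ (0,1) is obtained as
\( {y}_p=\exp \left\{a+b{\left[-\ln \left(1-{\left(1-p\right)}^{1/\delta}\right)\right]}^{-1/c}\right\}. \)  (13)
On substituting p = 0.5 in (13), we have the median (M) of the ExLIWD(a, b, c, δ) as
\( M=\exp \left\{a+b{\left[-\ln \left(1-{(0.5)}^{1/\delta}\right)\right]}^{-1/c}\right\}. \)  (14)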
We have calculated the values of the median and mode of the ExLIWD(a, b, c, δ) for particular values of its parameters and the corresponding plots are presented in Figs. 4 and 5 respectively. The following aspects of the median and mode of the ExLIWD(a, b, c, δ) can be observed based on these Figures.
- The median of the ExLIWD(a, b, c, δ) is a non-increasing function of c when δ is small and is a non-decreasing function of c for larger values of δ.
- The values of the median coincide at the point \( M={e}^{a+b} \) for fixed arbitrary values of the parameters a and b and δ = 1.51119, for all values of the parameter c.
- The mode of the ExLIWD(a, b, c, δ) is an increasing function of the parameter c for fixed values of δ.
- For fixed arbitrary values of the parameter c, the mode of the ExLIWD(a, b, c, δ) is a decreasing function of the parameter δ.
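The quantile function and the median behaviour listed above can be checked with a short sketch. It assumes the c.d.f. \( F(y)=1-{\{1-\exp (-{\psi}^{-c})\}}^{\delta } \) with ψ = (ln y − a)/b and inverts it directly. The function names are hypothetical, and the simple formula used here may hit floating-point limits for p extremely close to 0 or 1.

```python
import math

def cdf(y, a, b, c, delta):
    """Assumed ExLIWD c.d.f. (Lehmann Type II form, see lead-in)."""
    if y <= math.exp(a):
        return 0.0
    t = ((math.log(y) - a) / b) ** (-c)
    return 1.0 - (-math.expm1(-t)) ** delta

def quantile(p, a, b, c, delta):
    """p-th quantile obtained by solving F(y) = p for y."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    g = (1.0 - p) ** (1.0 / delta)   # value of 1 - exp(-psi^-c) at y_p
    t = -math.log1p(-g)              # psi^-c at y_p
    return math.exp(a + b * t ** (-1.0 / c))

def median(a, b, c, delta):
    return quantile(0.5, a, b, c, delta)

# The median equals e^(a+b) for every c when delta solves 0.5^(1/delta) = 1 - e^-1.
delta_star = math.log(0.5) / math.log1p(-math.exp(-1.0))
print(delta_star)
for c in (0.5, 1.0, 2.5, 4.0):
    print(c, median(0.0, 1.0, c, delta_star), math.exp(1.0))
```

The value printed for `delta_star` is the δ at which the median stops depending on c; it agrees with the 1.51119 quoted in the list above to the digits shown there.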
Now we present the following integrals and functions which are required in the sequel.
For any a ∈ R = (−∞,∞),
in which (x)k = x(x + 1)…(x + k − 1), for k ≥ 1 with (x)0 =1. For Re(ν) > 0 and Re(λ) > 0, we have the following integrals.
and
The incomplete Gamma functions γ(λ,z) and Γ(λ,z) as defined in (17) and (18) can be represented in terms of the confluent hypergeometric function φ(λ,γ; z) as
and
in which
The importance of moments in identifying the salient features of a distribution, such as its mean, variance, skewness and kurtosis, needs no emphasis. We obtain expressions for the characteristic function and the rth raw moment of the ExLIWD(a, b, c, δ) through the following results, whose proofs are given in the Appendix.
Result 3
For t ∈ ℜ and \( i=\sqrt{-1} \), the characteristic function ΦY(t) of the ExLIWD(a, b, c, δ) is the following.
in which for any \( a\in {\mathfrak{R}}^{+}=\left[0,\infty \right) \), [a] denotes the integer part of a, φ(α,β;θ) is the confluent hypergeometric function and S(m, j) is the Stirling numbers of second kind.
Result 4
For r ≥ 1 and \( \mathit{\operatorname{Re}}\left(1-{kc}^{-1}\right)>0 \), the rth raw moment μr of the ExLIWD(a, b, c, δ) is
with \( {\varphi}_k\left(c,\delta \right)={\sum}_{j=0}^{\infty}\frac{{\left(-1\right)}^j}{j!}{\left(\delta -j\right)}_j{\left(j+1\right)}^{\left[{kc}^{-1}-1\right]} \), which converges for any c > 0, δ > 0 and k > 0.
The values of the raw moments of the ExLIWD(a, b, c, δ) can be calculated numerically by using mathematical software such as MATHEMATICA or MATHCAD. We have calculated the mean (μ), variance (σ2), moment measure of skewness (γ1) and moment measure of kurtosis (γ2) of the ExLIWD(a, b, c, δ) and plotted them in Figs. 6 and 7.
Percentile measures of skewness and kurtosis of a distribution provide a better understanding of the pattern of skewness and kurtosis of the distribution and are less affected by the tail behaviour of the distribution or by outliers. Moreover, the fact that the moment measure of kurtosis can become infinite for many heavy tailed distributions highlights the importance of the percentile measures. The percentile measures of skewness and kurtosis of the ExLIWD(a, b, c, δ) are obtained by the following results in which
and δ > 0.
Result 5
Galton’s and Bowley’s percentile measures of skewness of the ExLIWD(a, b, c, δ), denoted by Ga and Bo respectively, are given by
and
Proof. The proof follows from the following definitions of the Galton and Bowley measures of skewness, in the light of (13).
and
Result 6
The Schmid-Trede percentile measure of kurtosis L of the ExLIWD(a, b, c, δ) is given by
Proof. The proof is straightforward from the following definition of the Schmid-Trede measure of kurtosis L, in the light of (13).
Based on Results 5 and 6, we have the following remarks.
Remark 1 The ExLIWD(a, b, c, δ) is symmetric for those values of the parameters satisfying the condition \( {e}^{q_{0.8}}+{e}^{q_{0.2}}=2{e}^{-{q}_{0.5}} \) and positively (negatively) skewed for \( {e}^{q_{0.8}}+{e}^{q_{0.2}} \) less than (greater than) \( 2{e}^{-{q}_{0.5}} \).
Remark 2 The ExLIWD(a, b, c, δ) is mesokurtic for those values of the parameters satisfying the condition \( {e}^{q_{0.975}}-{e}^{q_{0.025}}=2.9058\left({e}^{q_{0.75}}-{e}^{q_{0.25}}\right) \) and leptokurtic (platykurtic) if \( {e}^{q_{0.975}}-{e}^{q_{0.025}} \) is greater than (less than) \( 2.9058\left({e}^{q_{0.75}}-{e}^{q_{0.25}}\right). \)
The values of Ga and L for particular values of the parameters c and δ and fixed values of the other parameters are calculated and plotted in Fig. 8. From the figures it can be observed that Ga is a non-increasing function of δ and c for fixed values of the other parameters, while L is non-increasing for increasing values of δ and small values of c, as is also evident in the case of the moment measures of skewness and kurtosis.
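Percentile measures of this kind are straightforward to evaluate once the quantile function is available. The sketch below is illustrative only. It assumes the quantile function implied by the c.d.f. \( F(y)=1-{\{1-\exp (-{\psi}^{-c})\}}^{\delta } \), uses the usual quartile-based definition of Bowley's measure, and takes the kurtosis measure to be the ratio of the 0.975 to 0.025 quantile range to the interquartile range, which is the ratio that Remark 2 compares with 2.9058. Function names are hypothetical, and Galton's measure is left out because its definition is not reproduced in the text above.

```python
import math
from statistics import NormalDist

def quantile(p, a, b, c, delta):
    """Quantile of the assumed ExLIWD c.d.f. (see lead-in)."""
    g = (1.0 - p) ** (1.0 / delta)
    t = -math.log1p(-g)
    return math.exp(a + b * t ** (-1.0 / c))

def bowley_skewness(a, b, c, delta):
    """Quartile-based skewness: (Q3 + Q1 - 2*Q2) / (Q3 - Q1)."""
    q1, q2, q3 = (quantile(p, a, b, c, delta) for p in (0.25, 0.5, 0.75))
    return (q3 + q1 - 2.0 * q2) / (q3 - q1)

def percentile_kurtosis(a, b, c, delta):
    """Ratio of the 0.975-0.025 quantile range to the interquartile range."""
    lo, hi = quantile(0.025, a, b, c, delta), quantile(0.975, a, b, c, delta)
    q1, q3 = quantile(0.25, a, b, c, delta), quantile(0.75, a, b, c, delta)
    return (hi - lo) / (q3 - q1)

def normal_reference():
    """The same ratio for a normal distribution (the mesokurtic benchmark)."""
    z = NormalDist()
    return (z.inv_cdf(0.975) - z.inv_cdf(0.025)) / (z.inv_cdf(0.75) - z.inv_cdf(0.25))

print(bowley_skewness(0.0, 1.0, 2.5, 5.0))
print(percentile_kurtosis(0.0, 1.0, 2.5, 5.0), normal_reference())
```

Since every quantile carries the common factor \( {e}^a \), both measures are free of the location parameter a, so only b, c and δ need to be varied when exploring their behaviour.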
The following result provides an expression for the sth incomplete moment of the ExLIWD(a, b, c, δ) in the light of (20) based on which we have Corollaries 1 and 2.
Result 7
For \( \varsigma \left(z;j\right)=\left(j+1\right){\left[\psi \left(z;\underline{\theta}\right)\right]}^{-c} \), the sth incomplete moment of the ExLIWD(a, b, c, δ) is the following.
Proof. The sth incomplete moment of the ExLIWD(a, b, c, δ) with p.d.f. (5) is defined as
where ψ(y;θ) is as defined in (4). On integrating (31) after using the substitution u = ψ(y;θ) and hence expanding the terms using (15), we have
where Γ(α, y) is the incomplete Gamma function as defined in (18) and \( \varsigma \left(z;j\right)=\left(j+1\right){\left[\psi \left(z;\underline{\theta}\right)\right]}^{-c} \). (32) can be represented in terms of the confluent hypergeometric function Ψ(α,γ;z) in the light of (20) to give (30).
Corollary 1 The mean deviation about the mean μ1 and the mean deviation about the median M of the ExLIWD(a, b, c, δ) with c.d.f. F(.) are obtained as E(|Y − μ1|) = 2μ1F(μ1) − 2H1(μ1) and E(|Y − M|) = μ1 − 2H1(M) respectively, where H1(z) is as given in (30) with s = 1.
Corollary 2 For the ExLIWD(a, b, c, δ), the equations of the Lorenz and Bonferroni curves are \( L(p)={\mu}_1^{-1}{H}_1\left({y}_p\right) \) and \( B(p)={\left(p{\mu}_1\right)}^{-1}{H}_1\left({y}_p\right) \) respectively, where yp and Hs(y) are as defined in (13) and (30) respectively.
The geometric mean (GM) finds application in survival analysis, especially in cases where only a cumulative survival is available. Here we present an expression for the GM of the ExLIWD(a, b, c, δ).
Result 8
The GM of the ExLIWD(a, b, c, δ) is given by the equation
where φ1(c, δ) is as defined in (23) and Re[1 − c−1]> 0.
Proof. By definition,
using the substitution \( z=\frac{\ln (y)-a}{b} \). On integrating (34) after expanding the terms, we obtain (33) in the light of (23).
Entropy measures the level of randomness or uncertainty in a system. The expressions for the Rényi and Shannon entropies of the ExLIWD(a, b, c, δ) are provided through the following result.
Result 9
The Rényi entropy of the ExLIWD(a, b, c, δ) is the following for Re[b(1 − ρ)] > 1 and \( m\left(c,\rho \right)={c}^{-1}\left(1-\rho \left(c+1\right)\right) \).
Proof. By definition, the Rényi entropy of the ExLIWD(a, b, c, δ) with p.d.f. (5) is
in which
On substituting \( z=\psi \left(y;\underset{\_}{\theta}\right) \) in (37) and integrating after expanding the exponential terms, we get (35) in the light of (36).
The stress-strength reliability concept, initially considered by Church and Harris (1970), is used for describing the life of a component having strength Y2 subjected to a stress Y1, where both Y1 and Y2 are random variables. Obviously, the component fails if Y1 > Y2 and survives otherwise. Then the stress-strength reliability measure R of a system is defined as
\( R=P\left({Y}_1<{Y}_2\right). \)  (38)
The following result gives an expression for R when Yi has the ExLIWD(a, b, c, δi), for i = 1,2.
Result 10
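For i = 1, 2, let Yi be independent random variables following the ExLIWD(a, b, c, δi) with p.d.f. f(.) as defined in (5). Then the stress-strength reliability measure is \( R=\frac{\delta_1}{\delta_1+{\delta}_2} \).
Proof. As defined in (38), the stress-strength reliability measure is
\( R={\int}_{e^a}^{\infty }f\left(y;{\delta}_1\right)\overline{F}\left(y;{\delta}_2\right) dy={\delta}_1{\int}_0^1{u}^{\delta_1+{\delta}_2-1} du=\frac{\delta_1}{\delta_1+{\delta}_2}, \)  (39)
on using the substitution \( u=1-{e}^{-{\left[\psi \left(y;\underline{\theta}\right)\right]}^{-c}} \), which shows that R depends on the parameters only through δ1 and δ2.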
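The closed form for R can be checked by simulation. The following Monte Carlo sketch is a rough illustration: it assumes independent stress and strength, the quantile function implied by the c.d.f. \( F(y)=1-{\{1-\exp (-{\psi}^{-c})\}}^{\delta } \), and hypothetical function names. Comparisons are made on the log scale, which preserves the ordering and avoids overflow in the far right tail.

```python
import math
import random

def log_quantile(p, a, b, c, delta):
    """ln(y_p) for the assumed ExLIWD quantile function: a + b * psi_p."""
    g = (1.0 - p) ** (1.0 / delta)
    t = -math.log1p(-g)
    return a + b * t ** (-1.0 / c)

def stress_strength_mc(a, b, c, delta1, delta2, n, seed=0):
    """Monte Carlo estimate of R = P(Y1 < Y2), stress Y1 and strength Y2 independent."""
    rng = random.Random(seed)
    eps = 1e-12  # keep uniforms strictly inside (0, 1); a negligible truncation
    hits = 0
    for _ in range(n):
        stress = log_quantile(rng.uniform(eps, 1.0 - eps), a, b, c, delta1)
        strength = log_quantile(rng.uniform(eps, 1.0 - eps), a, b, c, delta2)
        hits += stress < strength
    return hits / n

delta1, delta2 = 2.0, 3.0
print(stress_strength_mc(0.0, 1.0, 2.5, delta1, delta2, n=20000, seed=1))
print(delta1 / (delta1 + delta2))
```

The estimate should sit close to δ1/(δ1 + δ2) whatever values of a, b and c are used, in line with the remark that R does not involve those parameters.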
Some structural properties
Here we present certain structural properties of the ExLIWD(a, b, c, δ), establishing some relations of the distribution with certain existing Weibull models including the inverse generalised Weibull distribution IGWD(b, c, δ) of Jain et al. (2014) with c.d.f.
for b > 0, c > 0, δ > 0 and y > 0. The proofs of Results 13, 14, 15, 16, 17 and 18 are straightforward and hence omitted.
Result 11
Let Y be a continuous random variable with c.d.f. F(y), for every \( y\in \left[{e}^a,\infty \right) \). Then
for \( z\in \left[{e}^a,\infty \right) \) if and only if Y has the ExLIWD(a, b, c, δ).
Proof. The proof follows from Theorem 8 of Rinne (2008, p. 262) with \( h(y)=\ln {\left[1-\exp \left\{-{\left[{b}^{-1}\left(\ln (y)-a\right)\right]}^{-c}\right\}\right]}^{-\delta } \) and d = 1, so that \( \underset{y\to \infty }{\lim }h(y)=\infty \), \( h\left({e}^a\right)=0 \) and E(h(Y)) = 1 for \( Y\in \left[{e}^a,\infty \right) \).
Result 12
The c.d.f. F(y) of the ExLIWD(a, b, c, δ) can be approximated by the c.d.f. of the IGWD(\( {b}^c{e}^{ac} \), c, δ) for extremely small values of y.
Proof. For y > ea, the c.d.f. F(y) of the ExLIWD(a, b, c, δ) given in (4) can be written as
For extremely small values of y, take y = ea + t, t > 0. Using this in (42), we obtain
On expanding the term ln(1 + te−a) in (44) and discarding the second term onwards, we obtain the following representation of F(t) as t → 0.
Clearly (45) is the c.d.f. of the IGWD(\( {b}^c{e}^{ac} \), c, δ) in the light of (40).
Result 13
A random variable Y has the ExLIWD(a, b, c, δ) if and only if X = ln(Y) − a has the IGWD(b, c, δ), for a ∈ (−∞, ∞), b > 0, c > 0, δ > 0, x > 0 and \( y>{e}^a \).
Result 14
A random variable Y with support \( \left({e}^a,\infty \right) \) has the ExLIWD(a, b, c, δ) for a ∈ (−∞, ∞), b > 0, c > 0 and δ > 0 with p.d.f. (5) if and only if \( Z={\left[\ln (Y)-a\right]}^{-1} \) has the EWD(c, \( {b}^{-1} \), δ) of Mudholkar and Srivastava (1993).
Result 15
A random variable Y with support \( \left({e}^a,\infty \right) \) has the ExLIWD(a, b, c, δ) for a ∈ (−∞, ∞), b > 0, c > 0 and δ > 0 with p.d.f. (5) if and only if \( Z={\left[{b}^{-1}\left\{\ln (Y)-a\right\}\right]}^{-c} \) has the exponentiated exponential distribution (EED(b, δ)) of Gupta and Kundu (2001).
Result 16
A random variable Y with support \( \left({e}^a,\infty \right) \) has the ExLIWD(a, b, c, 1) for a ∈ (−∞, ∞), b > 0, c > 0 with p.d.f. (5) if and only if Z = exp {b−1[ln(Y) − a]}−1 follows the LWD considered by Kumar and Nair (2018b).
Result 17
A random variable Y with support \( \left({e}^a,\infty \right) \) has the ExLIWD(a, b, c, δ) for a ∈ (−∞, ∞), b > 0, c > 0 and δ > 0 with p.d.f. (5) if and only if \( Z={Y}^{\alpha } \) follows the ExLIWD(αa, αb, c, δ).
Result 18
A random variable Y with support \( \left({e}^a,\infty \right) \) has the ExLIWD(a, b, c, δ) for a ∈ (−∞, ∞), b > 0, c > 0 and δ > 0 with p.d.f. (5) if and only if Z = βY follows the ExLIWD(ln(β) + a, b, c, δ) for \( Z>{e}^{\ln \left(\beta \right)+a} \).
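The closure properties in Results 17 and 18 can be verified numerically by comparing c.d.f.s. This sketch assumes the c.d.f. \( F(y)=1-{\{1-\exp (-{\psi}^{-c})\}}^{\delta } \) with ψ = (ln y − a)/b, and additionally assumes α > 0 and β > 0 so that both transformations are increasing. The helper names are hypothetical.

```python
import math

def cdf(y, a, b, c, delta):
    """Assumed ExLIWD c.d.f. (Lehmann Type II form, see lead-in)."""
    if y <= math.exp(a):
        return 0.0
    t = ((math.log(y) - a) / b) ** (-c)
    return 1.0 - (-math.expm1(-t)) ** delta

def cdf_of_scaled(z, beta, a, b, c, delta):
    """P(beta * Y <= z) for Y ~ ExLIWD(a, b, c, delta), beta > 0."""
    return cdf(z / beta, a, b, c, delta)

def cdf_of_power(z, alpha, a, b, c, delta):
    """P(Y ** alpha <= z) for Y ~ ExLIWD(a, b, c, delta), alpha > 0."""
    return cdf(z ** (1.0 / alpha), a, b, c, delta)

a, b, c, delta = 0.5, 2.0, 2.5, 5.0
beta, alpha = 3.0, 2.0
for z in (30.0, 50.0, 200.0):
    # Result 18: beta * Y should follow ExLIWD(ln(beta) + a, b, c, delta)
    print(cdf_of_scaled(z, beta, a, b, c, delta), cdf(z, math.log(beta) + a, b, c, delta))
    # Result 17: Y ** alpha should follow ExLIWD(alpha * a, alpha * b, c, delta)
    print(cdf_of_power(z, alpha, a, b, c, delta), cdf(z, alpha * a, alpha * b, c, delta))
```

Both identities hold because each transformation only shifts or rescales ln(Y), which the location parameter a and the scale parameter b absorb.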
Distribution and moments of order statistics
Let Yi:n be the ith order statistic based on a random sample Y1, Y2, ..., Yn of size n from the ExLIWD(a, b, c, δ), with p.d.f. f(y) = f(y; δ) as given in (5), and let μr = μr(δ) be the rth raw moment as given in (23). In this section we obtain the distribution and moments of the ith order statistic Yi:n of the ExLIWD(a, b, c, δ).
Result 19
For y > 0, the p.d.f. of the ith order statistics of the ExLIWD(a, b, c, δ) is given by
where
Proof. Consider a random sample of size n from an ExLIWD(a, b, c, δ). The p.d.f. of the ith order statistics Yi:n can be defined as
By using (4) and (5) we have the following from (47) in the light of the binomial expansion.
which reduces to (46).
As a consequence of Result 19, we have the following corollaries.
Corollary 3 For y > 0, the p.d.f. of the largest order statistics Yn : n = max (Y1, Y2, …, Yn) of the ExLIWD(a, b, c, δ) is
where \( {\delta}_1^{\ast }=\delta \left(k+1\right) \).
Corollary 4 For y > 0, the p.d.f. of the smallest order statistics Y1 : n = min (Y1, Y2, …, Yn) of the ExLIWD(a, b, c, δ) is
where \( {\delta}_2^{\ast }= n\delta \).
Corollary 5 For y > 0, the p.d.f. of the median Ym + 1 : n, with n = 2 m + 1 of the ExLIWD(a, b, c, δ) is the following, in which \( {\delta}_3^{\ast }=\delta \left(m+k+1\right). \)
Corollary 6 The smallest order statistics Y1 : n = min (Y1, Y2, …, Yn) follows the ExLIWD(a, b, c, nδ) if and only if Y1 follows the ExLIWD(a, b, c, δ).
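The order-statistic densities can be evaluated directly from the standard formula for the ith order statistic of a continuous sample. The sketch below is only an illustration under the assumed p.d.f. and survival function implied by \( F(y)=1-{\{1-\exp (-{\psi}^{-c})\}}^{\delta } \), with hypothetical function names. It also shows the sample minimum behaving as stated in Corollary 6.

```python
import math

def survival(y, a, b, c, delta):
    """Assumed ExLIWD survival function: {1 - exp(-psi^-c)}^delta."""
    if y <= math.exp(a):
        return 1.0
    t = ((math.log(y) - a) / b) ** (-c)
    return (-math.expm1(-t)) ** delta

def pdf(y, a, b, c, delta):
    """Assumed ExLIWD density (derivative of 1 - survival)."""
    if y <= math.exp(a):
        return 0.0
    z = (math.log(y) - a) / b
    t = z ** (-c)
    g = -math.expm1(-t)
    return c * delta / (b * y) * z ** (-(c + 1)) * math.exp(-t) * g ** (delta - 1)

def order_stat_pdf(y, i, n, a, b, c, delta):
    """Density of the i-th order statistic: n!/((i-1)!(n-i)!) F^(i-1) (1-F)^(n-i) f."""
    s = survival(y, a, b, c, delta)
    coef = math.factorial(n) / (math.factorial(i - 1) * math.factorial(n - i))
    return coef * (1.0 - s) ** (i - 1) * s ** (n - i) * pdf(y, a, b, c, delta)

# The minimum of n observations has the same density as an ExLIWD with delta replaced by n*delta.
n, delta = 4, 1.5
for y in (3.0, 6.0, 15.0):
    print(order_stat_pdf(y, 1, n, 0.0, 1.0, 2.0, delta), pdf(y, 0.0, 1.0, 2.0, n * delta))
```

This is the same mechanism as the series-system interpretation for integer δ: taking a minimum multiplies the exponent of the survival function.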
Result 20
For r > 0, the rth raw moment of the ith order statistics Yi : n of the ExLIWD(a, b, c, δ) is the following, in which, νn : i : k and δ∗ are as defined in (46).
Proof. Proof follows from Results 4 and 19.
Estimation
In this section, we discuss the M.L. estimation of the parameters of the ExLIWD(a, b, c, δ) and derive the likelihood equations for complete and right-censored cases. A data set without any missing values is termed an uncensored or complete data set. The likelihood function for a complete data set Y1, Y2, ..., Yn is given by
\( L={\prod}_{i=1}^nf\left({y}_i\right). \)  (54)
Censored data are regularly encountered in survival and reliability analysis, as the information regarding the survival time of some of the observations under study may remain incomplete or unknown. According to Klein and Moeschberger (2006), censored data sets represent a particular type of missing data. Assume that we have a random sample of n units with true survival times T1, T2, ..., Tn having p.d.f. f(y) and c.d.f. F(y). However, due to right censoring arising from staggered entry, loss to follow-up, competing risks (death from other causes) or any combination of these, it might be impossible to observe the survival times in all of these n cases. Thus a subject can either be observed for its full life time or can be censored. Clearly, the observed data are the minimum of the survival time and censoring time for each unit. Assume that C1, C2, ..., Cn are the censoring times of the n units, drawn independently of Ti, i = 1, 2, ..., n. On the n units we observe the n random pairs (Yi, ηi), in which Yi = min(Ti, Ci) and
\( {\eta}_i=\left\{\begin{array}{ll}1,& \mathrm{if}\ {T}_i\le {C}_i\\ {}0,& \mathrm{if}\ {T}_i>{C}_i\end{array}\right. \)  (55)
for i = 1, 2, ..., n. Clearly ηi, the censorship indicator, indicates whether Ti is censored or not. Then, the likelihood function for the censored data set is given by
\( L={\prod}_{i=1}^n{\left[f\left({y}_i\right)\right]}^{\eta_i}{\left[\overline{F}\left({y}_i\right)\right]}^{1-{\eta}_i}. \)  (56)
Estimation of parameters for the ExLIWD(a, b, c, δ) for complete data sets
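Here we discuss the M.L. estimation of the parameters of the ExLIWD(a, b, c, δ) based on a random sample Y1, Y2, ..., Yn taken from the distribution. For \( \psi \left(y;\underline{\theta}\right) \) as defined in (4), the log-likelihood function for the vector of parameters Υ = (a, b, c, δ) is given by
\( \ln L=n\ln \left(\frac{c\delta }{b}\right)-{\sum}_{i=1}^n\ln \left({y}_i\right)-\left(c+1\right){\sum}_{i=1}^n\ln \left[\psi \left({y}_i;\underline{\theta}\right)\right]-{\sum}_{i=1}^n{\left[\psi \left({y}_i;\underline{\theta}\right)\right]}^{-c}+\left(\delta -1\right){\sum}_{i=1}^n\ln \left\{1-\exp \left[-\psi {\left({y}_i;\underline{\theta}\right)}^{-c}\right]\right\}. \)  (57)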
On differentiating the log-likelihood function (57) with respect to the parameters a, b, c, and δ respectively and equating to zero, we obtain the following likelihood equations.
and
in which \( {\varDelta}_{\Upsilon}\left({y}_i\right)=\frac{1-\delta \exp \left[-\psi {\left({y}_i;\underline{\theta}\right)}^{-c}\right]}{1-\exp \left[-\psi {\left({y}_i;\underline{\theta}\right)}^{-c}\right]} \)
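A minimal sketch of the complete-data log-likelihood may help when coding the estimation. It assumes the p.d.f. implied by the c.d.f. \( F(y)=1-{\{1-\exp (-{\psi}^{-c})\}}^{\delta } \). Under that assumption the score for δ is n/δ plus the sum of ln{1 − exp(−ψ^{−c})}, so δ can be profiled out in closed form for given a, b and c. The function names and the toy sample are hypothetical and are not taken from the paper.

```python
import math

def log_likelihood(sample, a, b, c, delta):
    """Complete-data log-likelihood under the assumed ExLIWD density."""
    total = len(sample) * math.log(c * delta / b)
    for y in sample:
        z = (math.log(y) - a) / b        # psi(y; a, b); requires y > e^a
        t = z ** (-c)
        total += (-math.log(y) - (c + 1) * math.log(z) - t
                  + (delta - 1) * math.log(-math.expm1(-t)))
    return total

def profile_delta(sample, a, b, c):
    """Value of delta that maximises the log-likelihood for fixed a, b, c."""
    s = sum(math.log(-math.expm1(-((math.log(y) - a) / b) ** (-c))) for y in sample)
    return -len(sample) / s

toy = [2.0, 3.5, 5.0, 8.0, 12.0]         # hypothetical observations, all above e^a
a, b, c = 0.0, 1.0, 1.5
d_hat = profile_delta(toy, a, b, c)
print(d_hat, log_likelihood(toy, a, b, c, d_hat))
```

Profiling δ in this way reduces a numerical search from four parameters to three, which is one possible design choice; the remaining parameters still need a numerical optimiser subject to a being below the logarithm of the smallest observation.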
Estimation of parameters for the ExLIWD(a, b, c, δ) for censored data sets
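Let \( r={\sum}_{i=1}^n{\eta}_i \) be the number of failures among the n units. Then, using (4) and (5) in (56), the log-likelihood function of the ExLIWD(a, b, c, δ) for a censored data set is given by
\( \ln L=r\ln \left(\frac{c\delta }{b}\right)-{\sum}_{i=1}^n{\eta}_i\left\{\ln \left({y}_i\right)+\left(c+1\right)\ln \left[\psi \left({y}_i;\underline{\theta}\right)\right]+{\left[\psi \left({y}_i;\underline{\theta}\right)\right]}^{-c}\right\}+{\sum}_{i=1}^n\left(\delta -{\eta}_i\right)\ln \left\{1-\exp \left[-\psi {\left({y}_i;\underline{\theta}\right)}^{-c}\right]\right\}. \)  (62)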
On differentiating the log-likelihood function (62) with respect to the parameters a, b, c, and δ respectively and equating to zero, we obtain the corresponding likelihood equations as
and
in which \( {\varDelta}_{\Upsilon}\left({y}_i\right)=\frac{1-\delta \exp \left[-\psi {\left({y}_i;\underline{\theta}\right)}^{-c}\right]}{1-\exp \left[-\psi {\left({y}_i;\underline{\theta}\right)}^{-c}\right]} \).
These likelihood equations may not always provide a unique solution, and in such cases the maximum of the likelihood function is attained on the boundary of the parameter space. Hence we have obtained the second order partial derivatives of the log-likelihood function of the ExLIWD(a, b, c, δ) and, using the R software, verified that the values of the second order partial derivatives are negative at the estimated parameter values for both the complete and censored cases.
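For right-censored data, each failure contributes the log density and each censored time contributes the log survival function. The sketch below illustrates this under the assumed c.d.f. \( F(y)=1-{\{1-\exp (-{\psi}^{-c})\}}^{\delta } \). The function name and the toy inputs are hypothetical, with `events[i]` playing the role of the censorship indicator ηi.

```python
import math

def censored_log_likelihood(times, events, a, b, c, delta):
    """Right-censored log-likelihood: log f for failures, log survival for censored times."""
    total = 0.0
    for y, eta in zip(times, events):
        z = (math.log(y) - a) / b            # psi(y; a, b); requires y > e^a
        t = z ** (-c)
        log_g = math.log(-math.expm1(-t))    # ln{1 - exp(-psi^-c)}
        if eta:                              # observed failure
            total += (math.log(c * delta / (b * y)) - (c + 1) * math.log(z)
                      - t + (delta - 1) * log_g)
        else:                                # censored: ln of survival = delta * log_g
            total += delta * log_g
    return total

times = [2.0, 3.5, 5.0]                      # hypothetical observation times
print(censored_log_likelihood(times, [1, 1, 1], 0.0, 1.0, 1.5, 2.0))
print(censored_log_likelihood(times, [1, 1, 0], 0.0, 1.0, 1.5, 2.0))
```

With every indicator equal to one, the function reduces to the complete-data log-likelihood, so a single routine can serve both cases.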
Applications
In this section, we illustrate the utility of the ExLIWD(a, b, c, δ) as a survival distribution suitable for handling complete as well as censored cancer data by considering three real life data sets arising from cancer related fields, out of which the first and third are complete data sets while the second one is a censored data set.
Data Set 1 The data on survival of 40 patients suffering from leukemia, from the Ministry of Health Hospitals in Saudi Arabia taken from Abouammoh et al. (1994)
115, 181, 255, 418, 441, 461, 516, 739, 743, 789, 807, 865, 924, 983, 1024, 1062, 1063, 1165, 1191, 1222, 1222, 1251, 1277, 1290, 1357, 1369, 1408,1455, 1478, 1549, 1578, 1578, 1599, 1603, 1605, 1696, 1735, 1799, 1815, 1852.
Data Set 2 Censored data discussed in Sickle-Santanello et al. (1988) and given in Klein and Moeschberger (2006). The data consist of death times (in weeks) of patients with cancer of tongue with aneuploid DNA profile. The observations are
1, 3, 3, 4, 10, 13, 13, 16, 16, 24, 26, 27, 28, 30, 30, 32, 41, 51, 61∗, 65, 67, 70, 72, 73, 74∗, 77, 79∗, 80∗, 81∗, 87∗, 87∗, 88∗, 89∗, 91, 93, 93∗, 96, 97∗, 100, 101∗, 104, 104∗, 108∗, 109∗, 120∗, 131∗, 150∗, 157, 167, 231∗, 240∗ and 400∗ where asterisks denote censored observations.
Data Set 3 The data set, taken from Aldeni et al. (2017), consists of the remission times of 128 bladder cancer patients.
0.080, 0.200, 0.400, 0.500, 0.510, 0.810, 0.900, 1.050, 1.190, 1.260,
1.350, 1.400, 1.460, 1.760, 2.020, 2.020, 2.070, 2.090, 2.230, 2.260,
2.460, 2.540, 2.620, 2.640, 2.690, 2.690, 2.750, 2.830, 2.870, 3.020,
3.250, 3.310, 3.360, 3.360, 3.480, 3.520, 3.570, 3.640, 3.700, 3.820,
3.880, 4.180, 4.230, 4.260, 4.330, 4.340, 4.400, 4.500, 4.510, 4.870,
4.980, 5.060, 5.090, 5.170, 5.320, 5.320, 5.340, 5.410, 5.410, 5.490,
5.620, 5.710, 5.850, 6.250, 6.540, 6.760, 6.930, 6.940, 6.970, 7.090,
7.260, 7.280, 7.320, 7.390, 7.590, 7.620, 7.630, 7.660, 7.870, 7.930,
8.260, 8.370, 8.530, 8.650, 8.660, 9.020, 9.220, 9.470, 9.740, 10.06,
10.34, 10.66, 10.75, 11.25, 11.64, 11.79, 11.98, 12.02, 12.03, 12.07,
12.63, 13.11, 13.29, 13.80, 14.24, 14.76, 14.77, 14.83, 15.96, 16.62,
17.12, 17.14, 17.36, 18.10, 19.13, 20.28, 21.73, 22.69, 23.63, 25.74,
25.82, 26.31, 32.15, 34.26, 36.66, 43.01, 46.12, 79.05.
The summary statistics for the complete data sets, Data Set 1 and Data Set 3 are provided in Table 1 while that of Data Set 2 is provided in Table 2. From the values it can be observed that Data Set 1 is negatively skewed while Data Set 3 is highly positively skewed in nature.
The log-likelihood functions of the ExLIWD(a, b, c, δ) for the complete data sets are obtained using (57), while that corresponding to the censored data set is obtained using (62). The fit of the ExLIWD(a, b, c, δ) to Data Sets 1, 2 and 3 is evaluated using the Kolmogorov-Smirnov (K-S) statistic. The values of the K-S statistic corresponding to the ExLIWD(a, b, c, δ) for the three data sets, along with the corresponding critical values at the 1% level, are provided in Table 3, and from these values it can be concluded that the distribution provides an acceptable fit in all three cases. The estimates of the parameters of the ExLIWD(a, b, c, δ) are computed using the R software and are provided in Tables 4, 5 and 6 respectively, along with the standard errors of the estimates (SE), the calculated values of the test statistic (t-value) and the P-values of the estimates of the parameters for the three data sets. From these tables, it can be seen that the estimates are significant.
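The K-S statistic for a fitted model can be computed as sketched below. This is an illustration for complete samples only: it assumes the c.d.f. \( F(y)=1-{\{1-\exp (-{\psi}^{-c})\}}^{\delta } \), hypothetical function names, and the common large-sample approximation 1.63/√n for the 1% critical value, which may differ from the tabulated values used in Table 3.

```python
import math

def cdf(y, a, b, c, delta):
    """Assumed ExLIWD c.d.f. (Lehmann Type II form, see lead-in)."""
    if y <= math.exp(a):
        return 0.0
    t = ((math.log(y) - a) / b) ** (-c)
    return 1.0 - (-math.expm1(-t)) ** delta

def quantile(p, a, b, c, delta):
    """Inverse of the assumed c.d.f."""
    g = (1.0 - p) ** (1.0 / delta)
    return math.exp(a + b * (-math.log1p(-g)) ** (-1.0 / c))

def ks_statistic(sample, cdf_func):
    """Largest gap between the empirical c.d.f. and cdf_func, checked at each jump."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf_func(x)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

def ks_critical_1pct(n):
    """Large-sample approximation to the 1% critical value."""
    return 1.63 / math.sqrt(n)

# Illustrative check on a perfectly spread synthetic sample (not one of the paper's data sets).
params = (0.5, 2.0, 2.5, 5.0)
n = 40
sample = [quantile((i - 0.5) / n, *params) for i in range(1, n + 1)]
print(ks_statistic(sample, lambda y: cdf(y, *params)), ks_critical_1pct(n))
```

A fitted model is retained at the 1% level when the statistic falls below the critical value. Note that critical values computed this way are only approximate when the parameters have been estimated from the same data.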
For establishing the suitability of the proposed model as compared to some of the existing models, we have considered the following distributions, among them some are recently developed ones.
- The three parameter Gompertz-Lindley distribution (GLD) of Koleoso et al. (2019)
- The extended log-inverse Weibull distribution (ELIWD) of Kumar and Nair (2018c)
- The three parameter Lindley distribution (LD) of Shanker et al. (2017)
- The exponentiated power Lindley distribution (EPLD) of Ashour and Eltehiwy (2015)
- The Kumaraswamy modified inverse Weibull distribution (KMIWD) of Aryal and Elbatal (2015)
- The inverse generalized Weibull distribution (IGWD) of Jain et al. (2014)
- The generalized inverse generalized Weibull distribution (GIGWD) of Jain et al. (2014)
- The generalized inverse Weibull distribution (GIWD) of de Gusmao et al. (2011)
- The exponentiated generalised inverse Weibull distribution (EGIWD) of Elbatal (2011)
- The log-generalized inverse Weibull distribution (LGIWD) of de Gusmao et al. (2011)
All the above distributions are fitted to Data Sets 1 and 2 using the likelihood functions (54) and (56) for complete and censored data sets respectively. In both these cases it can be seen that the ExLIWD provides the best fit to the data sets, while the GIGWD is seen to be the distribution providing the next best fit. Hence we have fitted the ExLIWD and the GIGWD to Data Set 3 for comparing their efficiencies as distributional models. The M.L.E.s of the parameters of all these distributions are obtained using the R software. The performances of these distributions are examined by using information criteria, namely ‘the Akaike information criterion (AIC)’, ‘the Bayesian information criterion (BIC)’, ‘the corrected Akaike information criterion (AICc)’ and ‘the consistent Akaike information criterion (CAIC)’. The numerical results obtained are summarised in Tables 7, 8 and 9. Further, for graphical comparison, we have obtained cumulative probability plots and Weibull probability plots (WPP) corresponding to the models having the best fit for the three data sets, as presented in Figs. 9, 10, 11, 12, 13 and 14. From Tables 7, 8 and 9 it can be observed that the ExLIWD(a, b, c, δ) gives a relatively better fit to all three data sets as compared to the existing models, since its values of AIC, BIC, AICc and CAIC are the smallest. Further, the cumulative probability plots and the WPPs of the various distributions corresponding to the data sets also support this claim. The fact that the performance of the ExLIWD(a, b, c, δ) is better when compared to the existing models emphasises the utility of the proposed distribution as a distributional model for fitting cancer related data sets in both complete and censored cases.
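The four criteria are simple functions of the maximised log-likelihood, the number of parameters k and the sample size n. The sketch below uses the commonly quoted definitions. For CAIC it assumes Bozdogan's form, −2 ln L + k(ln n + 1), which may not be the variant used for Tables 7, 8 and 9. The function name and the input values are hypothetical.

```python
import math

def information_criteria(log_lik, k, n):
    """AIC, BIC, AICc and CAIC from a maximised log-likelihood.

    log_lik: maximised log-likelihood; k: number of fitted parameters;
    n: sample size (the AICc correction requires n > k + 1).
    """
    aic = 2 * k - 2 * log_lik
    bic = k * math.log(n) - 2 * log_lik
    aicc = aic + 2 * k * (k + 1) / (n - k - 1)
    caic = k * (math.log(n) + 1) - 2 * log_lik   # assumed (Bozdogan) definition
    return {"AIC": aic, "BIC": bic, "AICc": aicc, "CAIC": caic}

# Hypothetical values: a four-parameter model fitted to 40 observations.
print(information_criteria(log_lik=-100.0, k=4, n=40))
```

Smaller values indicate a better trade-off between fit and complexity, so two models fitted to the same data can be ranked by comparing the corresponding entries.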
Simulation
In order to assess the empirical performance of the M.L. estimators of the parameters of the ExLIWD(a, b, c, δ) in terms of their bias and mean square errors, a simulation study is carried out by generating observations from the ExLIWD(a, b, c, δ) for the parametric values a = 0.5, b = 2, c = 2.5 and δ = 5. As the c.d.f. (4) of the ExLIWD(a, b, c, δ) is in closed form, it is easy to generate pseudo-random numbers from the distribution using random numbers from the uniform distribution U(0,1), which can be generated using software such as EXCEL. The corresponding ExLIWD(a, b, c, δ) random numbers are then generated using the quantile function (13). The technique of bootstrapping is used to create multiple samples, and we have considered 200 bootstrap samples of sizes n = 50, 100, 250 and 500 (see Efron and Tibshirani (1991)). The R software is used for bootstrapping, and from the results presented in Table 10 it can be observed that, as the sample size increases, the bias approaches zero and a decreasing trend is observed in the mean square errors (MSE) of the respective estimators. This reveals that the efficiency of the estimators of the corresponding parameters increases with the sample size.
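The sampling step of the simulation can be sketched with inverse transform sampling. The example assumes the quantile function implied by the c.d.f. \( F(y)=1-{\{1-\exp (-{\psi}^{-c})\}}^{\delta } \) and uses the parameter values a = 0.5, b = 2, c = 2.5, δ = 5 quoted above. The function names are hypothetical, and the estimation and bootstrap steps are not reproduced here.

```python
import math
import random

def quantile(p, a, b, c, delta):
    """Quantile of the assumed ExLIWD c.d.f. (see lead-in)."""
    g = (1.0 - p) ** (1.0 / delta)
    return math.exp(a + b * (-math.log1p(-g)) ** (-1.0 / c))

def sample_exliwd(n, a, b, c, delta, seed=None):
    """Inverse transform sampling: push U(0,1) draws through the quantile function."""
    rng = random.Random(seed)
    eps = 1e-12  # keep uniforms strictly inside (0, 1); a negligible truncation
    return [quantile(rng.uniform(eps, 1.0 - eps), a, b, c, delta) for _ in range(n)]

a, b, c, delta = 0.5, 2.0, 2.5, 5.0
ys = sample_exliwd(5000, a, b, c, delta, seed=42)
theoretical_median = quantile(0.5, a, b, c, delta)
share_below = sum(y <= theoretical_median for y in ys) / len(ys)
print(theoretical_median, share_below)
```

About half of the simulated values should fall below the theoretical median, which gives a quick sanity check on the generator before any estimation is attempted.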
Conclusion
In this paper we have considered a generalization of the log-inverse Weibull distribution of Kumar and Nair (2018b) in order to build a more flexible model which also accommodates a non-decreasing hazard rate function, and we have investigated several theoretical properties of the distribution. We have discussed the estimation of the parameters of the proposed distribution by the method of maximum likelihood for both complete and censored cases and illustrated the usefulness of the model with the help of certain medical data sets, highlighting the suitability of the model in cancer research. It has been verified using various information measures that the proposed distribution is a better distributional model for fitting such data sets as compared to many of its related models as well as many recently developed distributions. Further, the asymptotic behaviour of the maximum likelihood estimators has been examined with the help of simulated data sets. Thus, through this paper, we have developed a wide class of distributions which possess more flexibility in terms of the shapes of their hazard rate functions, measures of central tendency, dispersion, skewness and kurtosis, so as to be capable of modelling complete, censored as well as truncated data sets. Several inferential aspects of the distribution, as well as related regression models, are being studied and will be reported in a future publication.
Availability of data and materials
All data generated or analysed during this study are included in this published article [and its supplementary information files].
Abbreviations
- AIC: Akaike’s information criterion
- AICc: Corrected Akaike’s information criterion
- BIC: Bayesian information criterion
- CAIC: Consistent Akaike information criterion
- c.d.f.: Cumulative distribution function
- EGIWD: The exponentiated generalised inverse Weibull distribution
- ELIWD: Extended log-inverse Weibull distribution
- EPLD: Exponentiated power Lindley distribution
- ExLIWD: Exponentiated log-inverse Weibull distribution
- GIGWD: Generalised inverse generalised Weibull distribution
- GIWD: Generalised inverse Weibull distribution
- GLD: Gompertz-Lindley distribution
- IGWD: Inverse generalised Weibull distribution
- IWD: Inverse Weibull distribution
- KMIWD: Kumaraswamy modified inverse Weibull distribution
- LD: Lindley distribution
- LGIWD: Log-generalised inverse Weibull distribution
- LIWD: Log-inverse Weibull distribution
- M.L.: Maximum likelihood
- M.L.E.: Maximum likelihood estimation
- p.d.f.: Probability density function
- S.E.: Standard Error
- WPP: Weibull Probability Plot
References
Abouammoh, A., Abdulghani, S., Qamber, I.: On partial orderings and testing of new better than renewal used classes. Rel. Eng. Syst. Safety. 43, 37–41 (1994)
Ahmed, S.E., Castro-Kuriss, C., Flores, E., Leiva, V., Sanhueza, A.: A truncated version of the Birnbaum-Saunders distribution with an application in financial risk. Pak. J. Stat. 26(1), 293–311 (2010)
Aldeni, M., Lee, C., Famoye, F.: Families of distributions arising from the quantile of generalized lambda distribution. J. Stat. Distrib. Appl. 4(1), 25 (2017)
Amemiya, T.: Regression analysis when the dependent variable is truncated normal. Econometrica. 41, 997–1016 (1973)
Aryal, G., Elbatal, I.: Kumaraswamy modified inverse Weibull distribution: theory and application. Appl. Mathematics Inform. Sci. 9, 651–660 (2015)
Ashour, S.K., Eltehiwy, M.A.: Exponentiated power Lindley distribution. J. Adv. Res. 6(6), 895–905 (2015)
Church, J.D., Harris, B.: The estimation of reliability from stress-strength relationships. Technometrics. 12, 49–54 (1970)
de Gusmao, F., Ortega, E., Cordeiro, G.: The generalized inverse Weibull distribution. Stat. Papers. 52, 591–619 (2011)
Drapella, A.: The complementary Weibull distribution: unknown or just forgotten? Qual. Reliab. Eng. Int. 9, 383–385 (1993)
Efron, B., Tibshirani, R.: Statistical data analysis in the computer age. Science. 253, 390–395 (1991)
Elbatal, I.: Exponentiated modified Weibull distribution. Econ. Qual. Contr. 26, 189–200 (2011)
Elbatal, I., Condino, F., Domma, F.: Reflected generalized beta inverse Weibull distribution: definition and properties. Sankhya B. 78, 316–340 (2016)
Gupta, R.D., Kundu, D.: Exponentiated exponential family: an alternative to gamma and Weibull distributions. Biom. J. 43, 117–130 (2001)
Jain, K., Singla, N., Sharma, S.K.: The generalized inverse generalized Weibull distribution and its properties. J. Probab. 2014, 17–28 (2014)
Jazi, M.A., Lai, C.D., Alamatsaz, M.H.: A discrete inverse Weibull distribution and estimation of its parameters. Stat. Methodol. 7(2), 121–132 (2010)
Keller, A.Z., Kamath, A.R.R., Perera, U.D.: Reliability analysis of CNC machine tools. Reliab. Eng. 3, 449–473 (1982)
Khan, M.S., Pasha, G.: The plotting of observations for the inverse Weibull distribution on probability paper. J. Adv. Res. Prob. Stat. 1(1), 11–22 (2009)
Khan, M.S., Pasha, G.R., Pasha, A.H.: Theoretical analysis of inverse Weibull distribution. WSEAS Trans. Mathematics. 7, 30–38 (2008)
Kizilersu, A., Kreer, M., Thomas, A.W.: Goodness-of-fit testing for left-truncated two-parameter Weibull distributions with known truncation point. Aust. J. Stat. 45, 15–42 (2016)
Klein, J.P., Moeschberger, M.L.: Survival Analysis: Techniques for Censored and Truncated Data. Springer Science & Business Media (2006)
Koleoso, P., Chukwu, A., Bamiduro, T.: A three-parameter Gompertz-Lindley distribution: its properties and applications. J. Math. Theory Mod. 9(4), 29–42 (2019)
Kumar, C.S., Nair, S.R.: An extended version of Kumaraswamy inverse Weibull distribution and its properties. Statistica. 76, 249–262 (2016)
Kumar, C.S., Nair, S.R.: On some aspects of a flexible class of additive Weibull distribution. Commun. Stat.-Theory Methods. 47, 1028–1049 (2018a)
Kumar, C.S., Nair, S.R.: On some properties of the log-Weibull distribution. J. Stat. Math. Eng. 4, 3–10 (2018b)
Kumar, C.S., Nair, S.R.: On the log-inverse Weibull distribution and its properties. Am. J. Math. Manage. Sci. 37, 144–167 (2018c)
Lehmann, E.L.: The power of rank tests. Ann. Math. Stat. 24, 23–43 (1953)
Mudholkar, G.S., Srivastava, D.K.: Exponentiated Weibull family for analyzing bathtub failure-rate data. IEEE Trans. Reliab. 42, 299–302 (1993)
Rinne, H.: The Weibull Distribution: a Handbook. CRC Press, Boca Raton (2008)
Shahbaz, M.Q., Shahbaz, S., Butt, N.S.: The Kumaraswamy-inverse Weibull distribution. Pak. J. Stat. Oper. Res. 8, 479–489 (2012)
Shanker, R., Shukla, K.K., Shanker, R., Tekie, A.: A three-parameter Lindley distribution. Am. J. Math. Stat. 7(1), 15–26 (2017)
Sickle-Santanello, B.J., Farrar, W.B., Decenzo, J.F., Keyhani-Rofagha, S., Klein, J., Pearl, D., Laufman, H., O’Toole, R.V.: Technical and statistical improvements for flow cytometric DNA analysis of paraffin-embedded tissue. Cytometry: J. Int. Soc. Anal. Cytol. 9(6), 594–599 (1988)
Singh, S.K., Singh, U., Sharma, V.K.: The truncated Lindley distribution: inference and application. J. Stat. Appl. Prob. 3(2), 219–224 (2014)
Zhang, T., Xie, M.: On the upper truncated Weibull distribution and its reliability implications. Rel. Eng. Syst. Safety. 96(1), 194–200 (2011)
Acknowledgements
The authors would like to express their sincere thanks to the Chief Editor, the Associate Editor and all the anonymous referees for their valuable comments on an earlier version of the paper, which greatly improved the quality and presentation of the manuscript.
Funding
The second author acknowledges the University Grants Commission, Government of India, for providing partial financial support through UGC-Travel Grants facility for travelling to the venue of ICOSDA 2019 for presenting the paper in the conference.
Author information
Contributions
The article is the result of the combined work of both the authors Kumar, C. S. and Nair, S. R.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Appendix
I. Proof of Result 3
By definition, the characteristic function of the ExLIWD(a, b, c, δ) is the following for \( \psi \left(y;\underline{\theta}\right) \) as given in (4).
On substituting \( z=\psi \left(y;\underline{\theta}\right) \) in (67) we obtain,
which can be simplified as given below by expanding the exponential terms.
as the kth raw moment of the IWD(c) is \( \Gamma \left(1-k{c}^{-1}\right) \), for k ≤ [c − 1]. On using the relation \( {\sum}_{k=0}^{\infty}\frac{{\left({ite}^a\right)}^k{k}^j}{k!j!}={\sum}_{q=0}^j{\left(1-q\right)}_qS\left(j,q\right)\Phi \left(1,1-q;{e}^a it\right) \) in (68), we obtain (22).
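The IWD(c) moment invoked in this proof can be checked directly. The following is a short verification, assuming the standard form of the IWD(c) c.d.f., \( F(y)={e}^{-{y}^{-c}} \) for y > 0 and c > 0, which is not restated in this appendix. With the substitution \( x={y}^{-c} \),

\[
E\left({Y}^k\right)={\int}_0^{\infty }{y}^k\,c\,{y}^{-c-1}{e}^{-{y}^{-c}}\,dy={\int}_0^{\infty }{x}^{-k/c}{e}^{-x}\,dx=\Gamma \left(1-k{c}^{-1}\right).
\]

The gamma integral is finite whenever k < c, which covers the range k ≤ [c − 1] stated above.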
II. Proof of Result 4
By definition, the rth raw moment of the ExLIWD(a, b, c, δ) is given by
On substituting z = ln(y) − a, we get
On expanding the exponential term \( {e}^{rz} \) in (70), we obtain
by using the substitution \( {\left(\frac{z}{b}\right)}^{-c}=x \) in (71) and hence applying (15). On further simplification, (74) reduces to (23) in the light that the mth raw moment of the IWD(c) is \( \Gamma \left(1-m{c}^{-1}\right) \), for m ≤ [c − 1]. Using the ratio test for convergence, it can be verified that the expression (23) for the rth raw moment of the ExLIWD(a, b, c, δ) is convergent for all values of its parameters.
III. Elements of the Fisher information matrix
The elements of the observed Fisher information matrix IΥ = ((Iij)), i, j = 1, 2, 3, 4 corresponding to the likelihood function (57) are as follows, in which \( {\Delta}_{\Upsilon}^{\ast}\left({y}_i\right)=\frac{e^{-\psi {\left({y}_i;\underline{\theta}\right)}^{-c}}}{{\left\{1-{e}^{-\psi {\left({y}_i;\underline{\theta}\right)}^{-c}}\right\}}^2} \).
and
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Kumar, C.S., Nair, S.R. A generalization to the log-inverse Weibull distribution and its applications in cancer research. J Stat Distrib App 8, 14 (2021). https://doi.org/10.1186/s40488-021-00116-1
DOI: https://doi.org/10.1186/s40488-021-00116-1
Keywords
- Hazard rate
- Maximum likelihood estimation
- Model selection
- Order statistics
- Reliability measures
- Simulation