A generalization to the log-inverse Weibull distribution and its applications in cancer research

In this paper we consider a generalization of a log-transformed version of the inverse Weibull distribution. Several theoretical properties of the distribution are studied in detail, including expressions for its probability density function, reliability function, hazard rate function, quantile function, characteristic function, raw moments, percentile measures, entropy measures, median and mode. Certain structural properties of the distribution, along with expressions for reliability measures and the distribution and moments of order statistics, are obtained. We also discuss the maximum likelihood estimation of the parameters of the proposed distribution and illustrate the usefulness of the model through real-life examples. In addition, the asymptotic behaviour of the maximum likelihood estimators is examined with the help of simulated data sets.


Introduction
introduced and studied the inverse Weibull distribution (IWD) through the cumulative distribution function (c.d.f.)
$$Q_1(y) = e^{-y^{-c}}, \qquad (1)$$
for any y > 0 and c > 0. Certain modified versions of the IWD have been used frequently in survival analysis and reliability studies for modelling failure characteristics such as infant mortality, useful life and wear-out periods. For details regarding the applications of the IWD and its related versions, we refer to Drapella (1993), Mudholkar and Srivastava (1993), Khan et al. (2008), Khan and Pasha (2009), Jazi et al. (2010), de Gusmao et al. (2011), Shahbaz et al. (2012), Elbatal et al. (2016), Aryal and Elbatal (2015) and Kumar and Nair (2016, 2018a, 2018b, 2018c). Truncated versions of distributions such as the normal, Weibull and Lindley distributions have found wide applications in various areas of survival analysis and reliability theory; see, for example, Ahmed et al. (2010), Amemiya (1973), Kizilersu et al. (2016), Singh et al. (2014) and Zhang and Xie (2011). A log-transformed version of the IWD, namely "the log-inverse Weibull distribution (LIWD)", and its location-scale extension, namely "the extended log-inverse Weibull distribution (ELIWD)", both capable of modelling truncated data sets, were studied by Kumar and Nair (2018c) through their c.d.f.s (2) and
$$Q_3(z) = e^{-\left[\frac{\ln z - a}{b}\right]^{-c}}, \qquad (3)$$
for y ≥ 0 and z ≥ e^a, with a ∈ (−∞, ∞), b > 0 and c > 0, respectively. The LIWD and the ELIWD find applications in several areas, including industrial as well as biomedical fields, but are not suitable for data sets having non-decreasing failure rates. Moreover, the LIWD can be considered a better alternative to the IWD left-truncated at unity, as it has much more flexibility in terms of its measures of central tendency, dispersion, skewness and kurtosis and the shapes of its p.d.f. The distributions with c.d.f.s (2) and (3) are denoted as LIWD(c) and ELIWD(a, b, c) respectively throughout this paper.
The present paper proposes a Lehmann type II extension (see Lehmann (1953)) of the ELIWD(a, b, c) under the name "the exponentiated log-inverse Weibull distribution (ExLIWD)", obtained by incorporating an additional shape parameter so as to include the non-decreasing shape of the hazard rate function, thereby increasing the flexibility of the model in handling data sets from various real-life situations. Thus we attempt to establish that the ExLIWD is not only more flexible in terms of the shapes of its hazard rate function and its measures of central tendency, dispersion, skewness and kurtosis, but also gives a much better fit to complete, censored as well as truncated data sets.
The rest of the paper is organized as follows. In Section 2 we present the definition and a number of important properties of the ExLIWD, including expressions for its characteristic function and moments. Some structural properties of the ExLIWD are presented in Section 3, while Section 4 deals with the distribution and moments of order statistics. Section 5 contains the maximum likelihood (M.L.) estimation of the parameters of the distribution. The usefulness of the model as a survival distribution in various cancer-related applications is illustrated in Section 6 by considering three real-life data sets, of which two are complete data sets and one is a censored cancer data set. In Section 7 we examine the asymptotic behaviour of the estimators of the parameters of the ExLIWD with the help of simulated data sets.

Exponentiated log-inverse Weibull distribution
In this section we present the definition and some important properties of the exponentiated log-inverse Weibull distribution.
Definition 0.1 A continuous random variable Y is said to have "the exponentiated log-inverse Weibull distribution (ExLIWD)" if its c.d.f. is of the following form, for any a ∈ (−∞, ∞), b > 0, c > 0, δ > 0 and y > e^a:
$$F(y) = 1-\left\{1-e^{-[\psi(y;\theta)]^{-c}}\right\}^{\delta}, \qquad (4)$$
in which $\psi(y;\theta) = \psi(y;a,b) = \frac{\ln(y)-a}{b}$. Clearly the ExLIWD(a, b, c, δ) is a proportional hazards model (also known as a Lehmann type II extension) of the ELIWD(a, b, c). For the sake of convenience, a distribution with c.d.f. (4) is hereafter denoted as ExLIWD(a, b, c, δ).
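As a quick numerical check of the definition, the sketch below (Python with NumPy; all function names are our own, not the authors') codes the c.d.f. (4), its closed-form inverse y_p = exp{a + b[−ln(1 − (1 − p)^{1/δ})]^{−1/c}}, and verifies the proportional hazards property: the hazard −d ln(1 − F)/dy of the ExLIWD(a, b, c, δ) is δ times the hazard of the ELIWD(a, b, c). The parameter values are an illustrative assumption.

```python
import numpy as np

# Illustrative parameter values (a hypothetical choice; they match Section 7).
a, b, c, delta = 0.5, 2.0, 2.5, 5.0

def psi(y):
    """psi(y; theta) = (ln y - a) / b, defined for y > e^a."""
    return (np.log(y) - a) / b

def eliwd_sf(y):
    """Survival function of the baseline ELIWD(a, b, c): 1 - exp(-psi^(-c))."""
    return -np.expm1(-psi(y) ** (-c))

def exliwd_sf(y):
    """Survival function implied by (4): [1 - exp(-psi^(-c))]^delta."""
    return eliwd_sf(y) ** delta

def exliwd_cdf(y):
    """Equation (4)."""
    return 1.0 - exliwd_sf(y)

def exliwd_quantile(p):
    """Inverse of (4): exp{a + b[-ln(1 - (1 - p)^(1/delta))]^(-1/c)}, 0 < p < 1."""
    p = np.asarray(p, dtype=float)
    inner = -np.expm1(np.log1p(-p) / delta)      # 1 - (1 - p)^(1/delta)
    return np.exp(a + b * (-np.log(inner)) ** (-1.0 / c))

def numeric_hazard(sf, y):
    """h(y) = -d/dy ln S(y), approximated by a central difference."""
    h = 1e-5 * y
    return -(np.log(sf(y + h)) - np.log(sf(y - h))) / (2 * h)

ys = np.array([5.0, 10.0, 50.0, 200.0])
# Proportional hazards: the ExLIWD hazard is delta times the ELIWD hazard at every y.
ratio = numeric_hazard(exliwd_sf, ys) / numeric_hazard(eliwd_sf, ys)
```

Every entry of `ratio` should be close to δ = 5, whatever values of a, b and c are chosen.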
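A practical interpretation of the ExLIWD(a, b, c, δ) can be provided whenever δ is an integer. Consider a device consisting of δ independent and identically distributed components having ELIWD(a, b, c) lifetimes with c.d.f. Q_3(·) as given in (3), connected in series, so that the device fails if any of the components fails. Let Y_1, Y_2, ..., Y_δ denote the lifetimes of the components and let Y = min(Y_1, ..., Y_δ) be the life of the system, with c.d.f. F(y). Then
$$F(y) = 1 - P(Y_1 > y, \ldots, Y_\delta > y) = 1-[1-Q_3(y)]^{\delta} = 1-\left\{1-e^{-[\psi(y;\theta)]^{-c}}\right\}^{\delta},$$
which shows that the lifetime of the device has the ExLIWD(a, b, c, δ), in the light of (4). Now the p.d.f. f(y), the survival function $\bar F(y)$, the hazard rate function h(y) and the reverse hazard rate function τ(y) of the ExLIWD(a, b, c, δ) are obtained as
$$f(y) = \frac{c\,\delta}{b\,y}\,[\psi(y;\theta)]^{-(c+1)}\, e^{-[\psi(y;\theta)]^{-c}}\left\{1-e^{-[\psi(y;\theta)]^{-c}}\right\}^{\delta-1}, \qquad (5)$$
$$\bar F(y) = \left\{1-e^{-[\psi(y;\theta)]^{-c}}\right\}^{\delta}, \qquad (6)$$
$$h(y) = \frac{c\,\delta\,[\psi(y;\theta)]^{-(c+1)}\, e^{-[\psi(y;\theta)]^{-c}}}{b\,y\left\{1-e^{-[\psi(y;\theta)]^{-c}}\right\}} \qquad (7)$$
and
$$\tau(y) = \frac{f(y)}{F(y)} = \frac{c\,\delta\,[\psi(y;\theta)]^{-(c+1)}\, e^{-[\psi(y;\theta)]^{-c}}\left\{1-e^{-[\psi(y;\theta)]^{-c}}\right\}^{\delta-1}}{b\,y\left[1-\left\{1-e^{-[\psi(y;\theta)]^{-c}}\right\}^{\delta}\right]}. \qquad (8)$$
The plots of the c.d.f., the p.d.f. and the hazard rate function of the ExLIWD(a, b, c, δ) for particular values of its parameters are presented in Fig. 1, Fig. 2 and Fig. 3 respectively.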
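The series-system interpretation can be illustrated by simulation. This sketch, assuming NumPy and an integer δ, draws ELIWD(a, b, c) component lifetimes by inverting Q_3(z) = exp{−[(ln z − a)/b]^{−c}}, which gives z = exp{a + b(−ln U)^{−1/c}} with U ~ U(0, 1), takes the minimum over the δ components of each device, and compares the empirical c.d.f. of the system lifetimes with (4). The parameter values, seed and number of devices are arbitrary choices.

```python
import numpy as np

def eliwd_sample(size, a, b, c, rng):
    """ELIWD(a, b, c) draws by inverting Q3(z) = exp(-psi^(-c)): z = exp(a + b(-ln U)^(-1/c))."""
    u = rng.uniform(size=size)
    return np.exp(a + b * (-np.log(u)) ** (-1.0 / c))

def exliwd_cdf(y, a, b, c, delta):
    """Equation (4)."""
    u = (np.log(y) - a) / b
    return 1.0 - (-np.expm1(-u ** (-c))) ** delta

rng = np.random.default_rng(2021)
a, b, c, delta = 0.5, 2.0, 2.5, 3            # delta must be a positive integer here
n_devices = 100_000

# Each row holds the lifetimes of the delta components of one series system;
# the system fails at the first component failure, i.e. at the row minimum.
lifetimes = eliwd_sample((n_devices, delta), a, b, c, rng)
system_life = lifetimes.min(axis=1)

grid = np.quantile(system_life, [0.1, 0.25, 0.5, 0.75, 0.9])
empirical = np.array([np.mean(system_life <= g) for g in grid])
max_gap = np.max(np.abs(empirical - exliwd_cdf(grid, a, b, c, delta)))
```

With this many simulated devices, `max_gap` is of the order of Monte Carlo error, consistent with the system lifetime following the ExLIWD(a, b, c, δ).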
From Fig. 1 it can be observed that, for fixed values of the parameters a, b and δ and varying c, the plots of the c.d.f. F(y) all pass through the point (e^{a+b}, 1 − (0.632121)^δ), which is (e^b, 1 − (0.632121)^δ) when a = 0. Hence, for any value of c, the probability that an ExLIWD(a, b, c, δ) distributed lifetime is at most e^{a+b} equals 1 − (0.632121)^δ.
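The common point follows directly from (4): at y = e^{a+b} the argument ψ(y;θ) equals one, so the value of F no longer involves c. With a = 0 this is the point y = e^b.

```latex
\psi\left(e^{a+b};\theta\right) = \frac{(a+b)-a}{b} = 1
\;\Longrightarrow\;
F\left(e^{a+b}\right) = 1-\left(1-e^{-1}\right)^{\delta} = 1-(0.632121)^{\delta},
\qquad 1-e^{-1} = 0.6321206\ldots
```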
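On differentiating the p.d.f. (5) and the hazard rate function (7) with respect to y, we have
$$f'(y) = \frac{f(y)}{b\,y\,\psi(y;\theta)}\left\{\frac{c\,[\psi(y;\theta)]^{-c}\left[1-\delta e^{-[\psi(y;\theta)]^{-c}}\right]}{1-e^{-[\psi(y;\theta)]^{-c}}} - [1+c+b\,\psi(y;\theta)]\right\} \qquad (9)$$
and
$$h'(y) = \frac{h(y)}{b\,y\,\psi(y;\theta)}\left\{\frac{c\,[\psi(y;\theta)]^{-c}}{1-e^{-[\psi(y;\theta)]^{-c}}} - [1+c+b\,\psi(y;\theta)]\right\}, \qquad (10)$$
respectively, in which ψ(y;θ) is as defined in (4).
Kumar and Nair Journal of Statistical Distributions and Applications (2021)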
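The derivative expressions can be checked against finite differences. The sketch below (NumPy and SciPy assumed; names hypothetical) codes f and h from (5) and (7), evaluates f'(y) = f(y)/(byψ)·{cψ^{−c}(1 − δe^{−ψ^{−c}})/(1 − e^{−ψ^{−c}}) − (1 + c + bψ)} and h'(y) = h(y)/(byψ)·{cψ^{−c}/(1 − e^{−ψ^{−c}}) − (1 + c + bψ)}, and compares them with central differences. It also locates the mode as the zero of the brace in f'(y), cross-checked by maximising ln f directly. The parameter values are illustrative.

```python
import numpy as np
from scipy.optimize import brentq, minimize_scalar

a, b, c, delta = 0.5, 2.0, 2.5, 5.0   # illustrative values (those of Section 7)

def parts(y):
    """Return psi(y; theta), x = psi^(-c), G = e^(-x) and S0 = 1 - e^(-x)."""
    u = (np.log(y) - a) / b
    x = u ** (-c)
    return u, x, np.exp(-x), -np.expm1(-x)

def pdf(y):
    """p.d.f. (5)."""
    u, x, G, S0 = parts(y)
    return c * delta / (b * y) * u ** (-c - 1) * G * S0 ** (delta - 1)

def log_pdf(y):
    """ln f(y), computed on the log scale so that it stays finite near y = e^a."""
    u, x, G, S0 = parts(y)
    return (np.log(c * delta / b) - np.log(y) - (c + 1) * np.log(u)
            - x + (delta - 1) * np.log(S0))

def hazard(y):
    """Hazard rate (7)."""
    u, x, G, S0 = parts(y)
    return c * delta * u ** (-c - 1) * G / (b * y * S0)

def pdf_prime(y):
    """f'(y) = f/(b y psi) * {c psi^-c (1 - delta G)/(1 - G) - (1 + c + b psi)}."""
    u, x, G, S0 = parts(y)
    return pdf(y) / (b * y * u) * (c * x * (1 - delta * G) / S0 - (1 + c + b * u))

def hazard_prime(y):
    """h'(y) = h/(b y psi) * {c psi^-c / (1 - G) - (1 + c + b psi)}."""
    u, x, G, S0 = parts(y)
    return hazard(y) / (b * y * u) * (c * x / S0 - (1 + c + b * u))

def central_diff(fun, y):
    h = 1e-5 * y
    return (fun(y + h) - fun(y - h)) / (2 * h)

ys = np.array([4.0, 10.0, 50.0, 200.0])
err_f = np.max(np.abs(pdf_prime(ys) - central_diff(pdf, ys)) / np.abs(pdf_prime(ys)))
err_h = np.max(np.abs(hazard_prime(ys) - central_diff(hazard, ys)) / np.abs(hazard_prime(ys)))

# Mode: psi(M_o) is the zero of the brace in f'(y). For delta > 1 the brace is
# positive near 0 and negative at psi = (ln delta)^(-1/c), so brentq can bracket it.
def brace(u):
    x = u ** (-c)
    return c * x * (1 - delta * np.exp(-x)) / (-np.expm1(-x)) - (1 + c + b * u)

mode = np.exp(a + b * brentq(brace, 1e-2, np.log(delta) ** (-1.0 / c)))

# Cross-check: maximise ln f(y) over t = ln(psi), which keeps psi > 0.
res = minimize_scalar(lambda t: -log_pdf(np.exp(a + b * np.exp(t))),
                      bounds=(np.log(0.05), np.log(50.0)), method="bounded")
mode_numeric = np.exp(a + b * np.exp(res.x))
```

Both relative errors should be tiny, and `mode` and `mode_numeric` should agree to the tolerance of the bounded search.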
Next we obtain the following results on the shapes of the ExLIWD(a, b, c, δ) with regard to its p.d.f. and hazard rate function, the proofs of which follow directly from (9) and (10), since $1-e^{-[\psi(y;\theta)]^{-c}} > 0$ for all a ∈ (−∞, ∞), b > 0, c > 0, δ > 0 and y > e^a, where ψ(y; θ) is as defined in (4).

Result 1
The p.d.f. f(y) of the ExLIWD(a, b, c, δ) is a decreasing function of y if $\psi(y;\theta) > [\ln(\delta)]^{-1/c}$. Moreover, when $\psi(y;\theta) < [\ln(\delta)]^{-1/c}$, f(y) is decreasing in y if
$$1-e^{-[\psi(y;\theta)]^{-c}} > \frac{c\,[\psi(y;\theta)]^{-c}\left[1-\delta e^{-[\psi(y;\theta)]^{-c}}\right]}{1+c+b\,\psi(y;\theta)}.$$

Result 2
The hazard rate function h(y) of the ExLIWD(a, b, c, δ) is a decreasing function of y whenever
$$1-e^{-[\psi(y;\theta)]^{-c}} > c\left\{[\psi(y;\theta)]^{c}\,[1+c+b\,\psi(y;\theta)]\right\}^{-1}.$$
The mode (M_o) of the ExLIWD(a, b, c, δ) is obtained from (9) as the solution of the equation
$$c\,[\psi(M_o;\theta)]^{-c}\left[1-\delta e^{-[\psi(M_o;\theta)]^{-c}}\right] = [1+c+b\,\psi(M_o;\theta)]\left[1-e^{-[\psi(M_o;\theta)]^{-c}}\right].$$
Moreover, using the condition for uni-modality it can be observed that the
On inverting the c.d.f. F(y_p) given in (4), the pth quantile y_p of the ExLIWD(a, b, c, δ) for p ∈ (0, 1) is obtained as
$$y_p = \exp\left\{a + b\left[-\ln\left(1-(1-p)^{1/\delta}\right)\right]^{-1/c}\right\}. \qquad (13)$$
On substituting p = 0.5 in (13), we have the median (M) of the ExLIWD(a, b, c, δ) as
$$M = \exp\left\{a + b\left[-\ln\left(1-2^{-1/\delta}\right)\right]^{-1/c}\right\}. \qquad (14)$$
We have calculated the values of the median and mode of the ExLIWD(a, b, c, δ) for particular values of its parameters, and the corresponding plots are presented in Figs. 4 and 5 respectively. The following aspects of the median and mode of the ExLIWD(a, b, c, δ) can be observed from these figures. Now we present the following integrals and functions, which are required in the sequel.
For any a ∈ R = (−∞, ∞), in which (x)_k = x(x + 1)…(x + k − 1) for k ≥ 1, with (x)_0 = 1. For Re(ν) > 0 and Re(λ) > 0, we have the following integrals. The incomplete Gamma functions γ(λ, z) and Γ(λ, z) as defined in (17) and (18) can be represented in terms of the confluent hypergeometric function φ(λ, γ; z). The importance of moments in identifying the salient features of a distribution, such as its mean, variance, skewness and kurtosis, needs no emphasis. We obtain expressions for the characteristic function and the rth raw moment of the ExLIWD(a, b, c, δ) through the following results, whose proofs are given in the Appendix.

Result 3
in which, for any a ∈ R^+ = [0, ∞), [a] denotes the integer part of a, φ(α, β; θ) is the confluent hypergeometric function and S(m, j) denotes the Stirling numbers of the second kind.
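The links between the incomplete Gamma functions and the confluent hypergeometric function can be checked numerically. The sketch below uses the standard identity γ(λ, z) = λ^{−1} z^λ φ(λ, λ + 1; −z) together with Γ(λ, z) = Γ(λ) − γ(λ, z); we assume this is the form intended in the representations above. SciPy's `gammainc` and `gammaincc` are the regularised functions, so they are multiplied by Γ(λ), and `hyp1f1` plays the role of φ.

```python
import numpy as np
from scipy.special import gamma, gammainc, gammaincc, hyp1f1

def lower_incomplete_gamma(lam, z):
    """gamma(lam, z) = int_0^z t^(lam-1) e^(-t) dt; scipy's gammainc is regularised."""
    return gammainc(lam, z) * gamma(lam)

def upper_incomplete_gamma(lam, z):
    """Gamma(lam, z) = int_z^inf t^(lam-1) e^(-t) dt."""
    return gammaincc(lam, z) * gamma(lam)

def lower_via_kummer(lam, z):
    """Standard identity gamma(lam, z) = (z^lam / lam) * 1F1(lam; lam + 1; -z)."""
    return z ** lam / lam * hyp1f1(lam, lam + 1.0, -z)

def upper_via_kummer(lam, z):
    """Gamma(lam, z) = Gamma(lam) - gamma(lam, z)."""
    return gamma(lam) - lower_via_kummer(lam, z)

for lam in (0.5, 1.7, 3.0):
    for z in (0.3, 2.0, 5.0):
        print(lam, z, lower_incomplete_gamma(lam, z), lower_via_kummer(lam, z))
```

For λ = 1 both sides reduce to 1 − e^{−z}, which gives a simple hand check.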

Result 4
For r ≥ 1 and Re(1 − kc^{−1}) > 0, the rth raw moment μ_r of the ExLIWD(a, b, c, δ) is
The values of the raw moments of the ExLIWD(a, b, c, δ) can be calculated numerically by using mathematical software such as MATHEMATICA and MATHCAD. We have calculated the mean (μ), variance (σ²), moment measure of skewness (γ_1) and moment measure of kurtosis (γ_2) of the ExLIWD(a, b, c, δ) and plotted them in Figs. 6 and 7.
Percentile measures of skewness and kurtosis of a distribution provide a better understanding of the pattern of skewness and kurtosis of the distribution and are less affected by its tail behaviour or by outliers. Moreover, the fact that the moment measures of kurtosis can become infinite for many heavy-tailed distributions highlights the importance of the percentile measures. The percentile measures of skewness and kurtosis of the ExLIWD(a, b, c, δ) are obtained in the following results, in which
and δ > 0.

Result 5
Galton's and Bowley's percentile measures of skewness, denoted by G_a and B_o respectively, of the ExLIWD(a, b, c, δ) are given by
Proof. The proof follows from the definitions of the Galton and Bowley measures of skewness, in the light of (13).
Result 6
The Schmid-Trede percentile measure of kurtosis L of the ExLIWD(a, b, c, δ) is given by
Proof. The proof is straightforward from the definition of the Schmid-Trede measure of kurtosis, $L = \frac{y_{0.975}-y_{0.025}}{y_{0.75}-y_{0.25}}$, in the light of (13).
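Because the quantile function is explicit, the percentile measures are cheap to evaluate. The sketch below (NumPy and SciPy assumed; names hypothetical) computes Bowley's skewness (y_{0.75} + y_{0.25} − 2y_{0.5})/(y_{0.75} − y_{0.25}) and the Schmid-Trede kurtosis (y_{0.975} − y_{0.025})/(y_{0.75} − y_{0.25}) from y_p = exp{a + b[−ln(1 − (1 − p)^{1/δ})]^{−1/c}}, together with the normal reference value 2.9058. Galton's G_a is left out of this sketch. Since y_p is e^a times a factor free of a, neither measure depends on a.

```python
import numpy as np
from scipy.stats import norm

def quantile(p, a, b, c, delta):
    """Quantile function: exp{a + b[-ln(1 - (1-p)^(1/delta))]^(-1/c)}."""
    return np.exp(a + b * (-np.log(-np.expm1(np.log1p(-p) / delta))) ** (-1.0 / c))

def bowley_skewness(a, b, c, delta):
    """Bowley: (y_0.75 + y_0.25 - 2 y_0.5) / (y_0.75 - y_0.25)."""
    q1, m, q3 = (quantile(p, a, b, c, delta) for p in (0.25, 0.5, 0.75))
    return (q3 + q1 - 2.0 * m) / (q3 - q1)

def schmid_trede_kurtosis(a, b, c, delta):
    """Schmid-Trede: (y_0.975 - y_0.025) / (y_0.75 - y_0.25)."""
    lo, q1, q3, hi = (quantile(p, a, b, c, delta) for p in (0.025, 0.25, 0.75, 0.975))
    return (hi - lo) / (q3 - q1)

# Reference value for the normal distribution (the 2.9058 used for mesokurtosis).
L_normal = (norm.ppf(0.975) - norm.ppf(0.025)) / (norm.ppf(0.75) - norm.ppf(0.25))

print(bowley_skewness(0.5, 2.0, 2.5, 5.0), schmid_trede_kurtosis(0.5, 2.0, 2.5, 5.0))
```

Comparing `schmid_trede_kurtosis` with `L_normal` classifies a parameter set as leptokurtic or platykurtic in the sense of Remark 2.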
Based on Results 5 and 6, we have the following remarks.
Remark 1 The ExLIWD(a, b, c, δ) is symmetric for those values of the parameters satisfying the condition $e^{q_{0.8}} + e^{q_{0.2}} = 2e^{q_{0.5}}$, and positively (negatively) skewed if $e^{q_{0.8}} + e^{q_{0.2}}$ is greater than (less than) $2e^{q_{0.5}}$.
Remark 2 The ExLIWD(a, b, c, δ) is mesokurtic for those values of the parameters satisfying the condition $e^{q_{0.975}} - e^{q_{0.025}} = 2.9058\,(e^{q_{0.75}} - e^{q_{0.25}})$, and leptokurtic (platykurtic) if $e^{q_{0.975}} - e^{q_{0.025}}$ is greater than (less than) $2.9058\,(e^{q_{0.75}} - e^{q_{0.25}})$.
The values of G_a and L for particular values of the parameters c and δ and fixed values of the other parameters are calculated and plotted in Fig. 8. From the figure it can be observed that G_a is a non-increasing function of δ and c for fixed values of the other parameters, while L is non-increasing in δ for small values of c, as is also evident for the moment measures of skewness and kurtosis.
The following result provides an expression for the sth incomplete moment of the ExLIWD(a, b, c, δ) in the light of (20), based on which we have Corollaries 1 and 2.
is the following.
Proof. The sth incomplete moment of the ExLIWD(a, b, c, δ) with p.d.f. (5) is defined as
$$H_s(z) = \int_{e^{a}}^{z} y^{s} f(y)\,dy = \int_{e^{a}}^{z} \frac{c\,\delta\,y^{s-1}}{b}\,[\psi(y;\theta)]^{-(c+1)}\, e^{-[\psi(y;\theta)]^{-c}}\left\{1-e^{-[\psi(y;\theta)]^{-c}}\right\}^{\delta-1} dy, \qquad (31)$$
where ψ(y;θ) is as defined in (4). On integrating (31) after using the substitution u = ψ(y;θ) and then expanding the terms using (15), we obtain (30), where Γ(α, y) is the incomplete Gamma function as defined in (18).
where H_1(z) is as given in (30), when s = 1.
Corollary 2 For the ExLIWD(a, b, c, δ), the equations of the Lorenz and Bonferroni curves are $L(p) = \mu_1^{-1} H_1(y_p)$ and $B(p) = (p\,\mu_1)^{-1} H_1(y_p)$ respectively, where y_p and H_s(y) are as defined in (13) and (30) respectively.
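A direct numerical evaluation of H_s(z) gives a check on the series expression. With the substitution u = ψ(y;θ), y = e^{a+bu}, the integral becomes ∫_0^{ψ(z;θ)} e^{s(a+bu)} cδ u^{−c−1} e^{−u^{−c}} (1 − e^{−u^{−c}})^{δ−1} du. The sketch below (NumPy and SciPy assumed; names hypothetical) evaluates it by quadrature; for s = 0 it must reproduce the c.d.f. (4).

```python
import numpy as np
from scipy.integrate import quad

def cdf(y, a, b, c, delta):
    u = (np.log(y) - a) / b
    return 1.0 - (-np.expm1(-u ** (-c))) ** delta

def psi_density(u, c, delta):
    """Density of psi(Y; theta): c delta u^(-c-1) e^(-u^(-c)) [1 - e^(-u^(-c))]^(delta-1)."""
    log_x = -c * np.log(u)
    if log_x > np.log(745.0):          # e^(-u^(-c)) underflows: the density is 0 in double precision
        return 0.0
    x = np.exp(log_x)
    return c * delta * u ** (-c - 1) * np.exp(-x) * (-np.expm1(-x)) ** (delta - 1)

def incomplete_moment(s, z, a, b, c, delta):
    """H_s(z) = int_{e^a}^z y^s f(y) dy, computed after substituting u = psi(y; theta)."""
    upper = (np.log(z) - a) / b
    value, _ = quad(lambda u: np.exp(s * (a + b * u)) * psi_density(u, c, delta),
                    0.0, upper, epsabs=1e-13, epsrel=1e-11)
    return value

pars = (0.5, 2.0, 2.5, 5.0)    # illustrative (a, b, c, delta)
print(incomplete_moment(1, 10.0, *pars))
```

Because y lies between e^a and z on the range of integration, H_1(z) must lie between e^a F(z) and z F(z), which is a useful sanity check.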
The geometric mean (GM) finds application in survival analysis, especially in cases where only a cumulative survival is available. Here we present an expression for the GM of the ExLIWD(a, b, c, δ).
Proof. By definition,
$$\ln(GM) = E[\ln(Y)] = \int_{e^{a}}^{\infty} \ln(y)\, f(y)\,dy. \qquad (34)$$
On integrating (34) after expanding the terms, we obtain (33) in the light of (23).
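A numerical value of the GM can serve as a check on (33). Since ln Y = a + bψ(Y;θ), GM = exp{a + bE[ψ(Y;θ)]}, where ψ(Y;θ) has density cδu^{−c−1}e^{−u^{−c}}(1 − e^{−u^{−c}})^{δ−1}, u > 0. Its survival function behaves like u^{−cδ} for large u, so E[ψ] is finite only when cδ > 1, and the sketch below returns infinity otherwise. For δ = 1, ψ is a standard Fréchet variable with E[ψ] = Γ(1 − 1/c), which gives an exact check. NumPy and SciPy are assumed; the function name is hypothetical.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma

def geometric_mean(a, b, c, delta):
    """GM = exp(E[ln Y]) = exp(a + b E[psi]); E[psi] is finite only when c * delta > 1."""
    if c * delta <= 1.0:
        return np.inf

    def psi_density(u):
        log_x = -c * np.log(u)
        if log_x > np.log(745.0):        # e^(-u^(-c)) underflows to zero here
            return 0.0
        x = np.exp(log_x)
        return c * delta * u ** (-c - 1) * np.exp(-x) * (-np.expm1(-x)) ** (delta - 1)

    # Split at u = 1 so quad treats the bulk and the polynomial right tail separately.
    head, _ = quad(lambda u: u * psi_density(u), 0.0, 1.0)
    tail, _ = quad(lambda u: u * psi_density(u), 1.0, np.inf)
    return np.exp(a + b * (head + tail))

# Exact value for delta = 1: exp(a + b * Gamma(1 - 1/c)).
gm_check = np.exp(0.5 + 2.0 * gamma(1.0 - 1.0 / 2.5))
print(geometric_mean(0.5, 2.0, 2.5, 1.0), gm_check)
```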
Entropy measures the level of randomness or uncertainty in a system. The expressions for the Rényi and Shannon entropies of the ExLIWD(a, b, c, δ) are provided through the following result.
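The entropy expressions can be cross-checked numerically. Writing Y = y_U with U ~ U(0, 1), where y_p is the quantile function, the Shannon entropy is −∫_0^1 ln f(y_p) dp and the Rényi entropy of order ρ is (1 − ρ)^{−1} ln ∫_0^1 f(y_p)^{ρ−1} dp. The sketch below (NumPy and SciPy assumed; names hypothetical; illustrative parameter values) evaluates both by quadrature for ρ > 1. As ρ → 1 the Rényi entropy tends to the Shannon entropy, and it is non-increasing in ρ.

```python
import numpy as np
from scipy.integrate import quad

a, b, c, delta = 0.5, 2.0, 2.5, 5.0   # illustrative values (those of Section 7)

def psi_at_quantile(p):
    """psi(y_p; theta) = [-ln(1 - (1-p)^(1/delta))]^(-1/c)."""
    return (-np.log(-np.expm1(np.log1p(-p) / delta))) ** (-1.0 / c)

def log_pdf_at_quantile(p):
    """ln f(y_p) for the p.d.f. (5), using ln y_p = a + b psi."""
    u = psi_at_quantile(p)
    x = u ** (-c)
    return (np.log(c * delta / b) - (a + b * u) - (c + 1) * np.log(u)
            - x + (delta - 1) * np.log(-np.expm1(-x)))

def shannon_entropy():
    """H = -E[ln f(Y)] = -int_0^1 ln f(y_p) dp."""
    return -quad(log_pdf_at_quantile, 0.0, 1.0, limit=200)[0]

def renyi_entropy(rho):
    """H_rho = ln(int f^rho dy) / (1 - rho) = ln E[f(Y)^(rho - 1)] / (1 - rho); used with rho > 1."""
    integral = quad(lambda p: np.exp((rho - 1.0) * log_pdf_at_quantile(p)),
                    0.0, 1.0, limit=200)[0]
    return np.log(integral) / (1.0 - rho)

print(shannon_entropy(), renyi_entropy(1.001), renyi_entropy(2.0))
```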
The stress-strength reliability concept, initially considered by Church and Harris (1970), is used for describing the life of a component having strength Y_2 subjected to a stress Y_1, where both Y_1 and Y_2 are random variables. The component fails if Y_1 > Y_2 and survives otherwise. The stress-strength reliability measure R of the system is then defined as
$$R = P(Y_1 < Y_2). \qquad (38)$$
The following result gives an expression for R when Y_i has the ExLIWD(a, b, c, δ_i), for i = 1, 2.

Result 10
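For i = 1, 2, let Y_i be a random variable following the ExLIWD(a, b, c, δ_i) with p.d.f. f_i(y) given by (5) with δ replaced by δ_i. Then the stress-strength reliability measure is $R = \frac{\delta_1}{\delta_1+\delta_2}$.
Proof. As defined in (38), the stress-strength reliability measure is, with ψ = ψ(y;θ),
$$R = \int_{e^{a}}^{\infty} f_1(y)\,\bar F_2(y)\,dy = \delta_1\int_{e^{a}}^{\infty} \frac{c}{b\,y}\,\psi^{-(c+1)}\, e^{-\psi^{-c}}\left\{1-e^{-\psi^{-c}}\right\}^{\delta_1+\delta_2-1} dy = \frac{\delta_1}{\delta_1+\delta_2},$$
which shows that R depends only on the values of the parameters δ_1 and δ_2.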
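Result 10 is easy to confirm by Monte Carlo. The sketch below (NumPy assumed; names, seed and parameter values are our own illustrative choices) draws stress and strength values by inverse transform through the quantile function exp{a + b[−ln(1 − (1 − p)^{1/δ})]^{−1/c}} and compares the proportion of draws with Y_1 < Y_2 against δ_1/(δ_1 + δ_2).

```python
import numpy as np

def exliwd_sample(size, a, b, c, delta, rng):
    """Inverse-transform draws using the quantile function of the ExLIWD."""
    p = rng.uniform(size=size)
    return np.exp(a + b * (-np.log(-np.expm1(np.log1p(-p) / delta))) ** (-1.0 / c))

rng = np.random.default_rng(1970)
a, b, c = 0.5, 2.0, 2.5          # common parameters (illustrative)
delta1, delta2 = 2.0, 3.0        # stress and strength shape parameters
n = 200_000

stress = exliwd_sample(n, a, b, c, delta1, rng)      # Y1
strength = exliwd_sample(n, a, b, c, delta2, rng)    # Y2
r_mc = np.mean(stress < strength)                    # Monte Carlo estimate of P(Y1 < Y2)
r_exact = delta1 / (delta1 + delta2)                 # Result 10
```

Changing a, b or c leaves `r_mc` unchanged up to simulation error, in line with R depending only on δ_1 and δ_2.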

Some structural properties
Here we present certain structural properties of the ExLIWD(a, b, c, δ), establishing some relations of the distribution with certain existing Weibull models including the inverse generalised Weibull distribution IGWD(b, c, δ) of Jain et al. (2014) with c.d.f.

Result 11
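If Y is any continuous random variable with c.d.f. F(y), for every y ∈ [e^a, ∞), then,
The c.d.f. (4) can be written as
$$F(y) = 1-\left\{1-\exp\left[-b^{c}(\ln y - a)^{-c}\right]\right\}^{\delta}. \qquad (42)$$
For values of y close to e^a, take y = e^a + t, t > 0. Using this in (42), we obtain
$$F(t) = 1-\left\{1-\exp\left[-b^{c}\left\{\ln\left(1+t e^{-a}\right)\right\}^{-c}\right]\right\}^{\delta}. \qquad (44)$$
On expanding the term ln(1 + te^{−a}) in (44) and discarding the second term onwards, that is, using ln(1 + te^{−a}) ≈ te^{−a}, we obtain the following representation of F(t) as t → 0:
$$F(t) \approx 1-\left\{1-\exp\left[-b^{c} e^{ac}\, t^{-c}\right]\right\}^{\delta}. \qquad (45)$$
Clearly (45) is the c.d.f. of the IGWD(b^c e^{ac}, c, δ) in the light of (40).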

Result 14
A random variable Y with support (e^a, ∞) has the ExLIWD (

Result 15
A random variable Y with support (e^a, ∞) has the ExLIWD (

Result 16
A random variable Y with support (e^a, ∞) has the ExLIWD(a, b, c, 1) for a ∈ (−∞, follows the LWD considered by Kumar and Nair (2018b).

Distribution and moments of order statistics
Let Y_{i:n} be the ith order statistic based on a random sample Y_1, Y_2, ..., Y_n of size n from the ExLIWD(a, b, c, δ) with p.d.f. f(y) = f(y; δ) as given in (5), and let μ_r = μ_r(δ) be the rth raw moment as given in (23). In this section we obtain the distribution and moments of the ith order statistic Y_{i:n} of the ExLIWD(a, b, c, δ).

Result 19
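For y > e^a, the p.d.f. of the ith order statistic of the ExLIWD(a, b, c, δ) is given by
$$f_{i:n}(y) = \sum_{k=0}^{i-1} \nu_{n:i:k}\, f(y;\delta^{*}), \qquad (46)$$
where
$$\nu_{n:i:k} = \frac{(-1)^{k}\, n!}{k!\,(i-1-k)!\,(n-i)!\,(n-i+k+1)} \quad\text{and}\quad \delta^{*} = \delta\,(n-i+k+1).$$
Proof. Consider a random sample of size n from an ExLIWD(a, b, c, δ). The p.d.f. of the ith order statistic Y_{i:n} can be defined as
$$f_{i:n}(y) = \frac{n!}{(i-1)!\,(n-i)!}\,[F(y)]^{i-1}\,[1-F(y)]^{n-i}\, f(y). \qquad (47)$$
By using (4) and (5), writing $[F(y)]^{i-1} = [1-\bar F(y)]^{i-1}$ and applying the binomial expansion, we obtain (46) from (47), since $\left\{1-e^{-[\psi(y;\theta)]^{-c}}\right\}^{\delta(n-i+k)} f(y;\delta) = \frac{\delta}{\delta^{*}}\, f(y;\delta^{*})$.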
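The finite-mixture form can be checked pointwise against the standard order-statistic density. The sketch below (NumPy assumed; names and parameter values are illustrative) evaluates n!/((i − 1)!(n − i)!) F^{i−1}(1 − F)^{n−i} f directly and as Σ_k ν_{n:i:k} f(y; δ*), with ν_{n:i:k} = (−1)^k n!/[k!(i − 1 − k)!(n − i)!(n − i + k + 1)] and δ* = δ(n − i + k + 1).

```python
import numpy as np
from math import comb, factorial

a, b, c, delta = 0.5, 2.0, 2.5, 5.0   # illustrative values

def cdf(y, d=delta):
    u = (np.log(y) - a) / b
    return 1.0 - (-np.expm1(-u ** (-c))) ** d

def pdf(y, d=delta):
    """p.d.f. (5) with shape parameter d in place of delta."""
    u = (np.log(y) - a) / b
    x = u ** (-c)
    return c * d / (b * y) * u ** (-c - 1) * np.exp(-x) * (-np.expm1(-x)) ** (d - 1)

def order_pdf_direct(y, i, n):
    """Standard form: n!/((i-1)!(n-i)!) F^(i-1) (1-F)^(n-i) f."""
    const = factorial(n) / (factorial(i - 1) * factorial(n - i))
    F = cdf(y)
    return const * F ** (i - 1) * (1.0 - F) ** (n - i) * pdf(y)

def order_pdf_mixture(y, i, n):
    """Finite mixture: sum_k nu_{n:i:k} f(y; delta*), with delta* = delta (n - i + k + 1)."""
    const = factorial(n) / (factorial(i - 1) * factorial(n - i))
    total = np.zeros_like(y, dtype=float)
    for k in range(i):
        nu = const * (-1) ** k * comb(i - 1, k) / (n - i + k + 1)
        total += nu * pdf(y, delta * (n - i + k + 1))
    return total

ys = np.array([4.0, 6.0, 10.0, 30.0, 100.0])
print(order_pdf_direct(ys, 3, 5) - order_pdf_mixture(ys, 3, 5))
```

The differences should be at the level of floating-point rounding.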

Result 20
For r > 0, the rth raw moment of the ith order statistic Y_{i:n} of the ExLIWD(a, b, c, δ) is
$$E\left(Y_{i:n}^{r}\right) = \sum_{k=0}^{i-1} \nu_{n:i:k}\, \mu_r(\delta^{*}),$$
in which ν_{n:i:k} and δ* are as defined in (46).
Proof. The result follows from Results 4 and 19.

Estimation
In this section, we discuss the M.L. estimation of the parameters of the ExLIWD(a, b, c, δ) and derive the likelihood equations for the complete and right-censored cases. A data set of observations without any missing value is termed an uncensored (complete) data set. The likelihood function for a complete data set Y_1, Y_2, ..., Y_n is given by
$$L = \prod_{i=1}^{n} f(y_i). \qquad (54)$$
Censored data are regularly encountered in survival and reliability analysis, as the information regarding the survival times of some of the observations under study may remain incomplete or unknown. According to Klein and Moeschberger (2006), censored data sets represent a particular type of missing data. Assume that we have a random sample of n units with true survival times T_1, T_2, ..., T_n having p.d.f. f(y) and c.d.f. F(y). However, owing to right censoring caused by staggered entry, loss to follow-up, competing risks (death from other causes) or any combination of these, it may be impossible to observe the survival times of all n units. Thus a subject is either observed for its full lifetime or censored, and the observed data are the minimum of the survival time and the censoring time for each unit. Assume that C_1, C_2, ..., C_n are the censoring times of the n units, drawn independently of the T_i, i = 1, 2, ..., n. For each of the n units we observe the random pair (Y_i, η_i), in which Y_i = min(T_i, C_i) and
$$\eta_i = \begin{cases} 1, & T_i \le C_i,\\ 0, & T_i > C_i, \end{cases}$$
for i = 1, 2, ..., n. Clearly η_i, the censoring indicator, indicates whether T_i is observed or censored. Then the likelihood function for the censored data set is given by
$$L = \prod_{i=1}^{n} [f(y_i)]^{\eta_i}\,[\bar F(y_i)]^{1-\eta_i}. \qquad (56)$$

Estimation of parameters for the ExLIWD(a, b, c, δ) for complete data sets
Here we discuss the M.L. estimation of the parameters of the ExLIWD(a, b, c, δ) based on a random sample Y_1, Y_2, ..., Y_n taken from the distribution.
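For ψ(y; θ) as defined in (4), the log-likelihood function for the vector of parameters Υ = (a, b, c, δ) is given by
$$\ell(\Upsilon) = n\ln c + n\ln\delta - n\ln b - \sum_{i=1}^{n}\ln y_i - (c+1)\sum_{i=1}^{n}\ln\psi_i - \sum_{i=1}^{n}\psi_i^{-c} + (\delta-1)\sum_{i=1}^{n}\ln\left(1-e^{-\psi_i^{-c}}\right), \qquad (57)$$
where ψ_i = ψ(y_i; θ). On differentiating the log-likelihood function (57) with respect to the parameters a, b, c and δ respectively and equating to zero, we obtain the following likelihood equations, in which $G_i = e^{-\psi_i^{-c}}$:
$$\frac{1}{b}\sum_{i=1}^{n}\left[\frac{c+1}{\psi_i} - c\,\psi_i^{-c-1} + (\delta-1)\frac{c\,\psi_i^{-c-1}G_i}{1-G_i}\right] = 0, \qquad (58)$$
$$-\frac{n}{b} + \frac{1}{b}\sum_{i=1}^{n}\left[(c+1) - c\,\psi_i^{-c} + (\delta-1)\frac{c\,\psi_i^{-c}G_i}{1-G_i}\right] = 0, \qquad (59)$$
$$\frac{n}{c} - \sum_{i=1}^{n}\ln\psi_i + \sum_{i=1}^{n}\psi_i^{-c}\ln\psi_i - (\delta-1)\sum_{i=1}^{n}\frac{\psi_i^{-c}\ln(\psi_i)\,G_i}{1-G_i} = 0 \qquad (60)$$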
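and
$$\frac{n}{\delta} + \sum_{i=1}^{n}\ln\left(1-e^{-\psi_i^{-c}}\right) = 0. \qquad (61)$$
Let r = η_1 + η_2 + ... + η_n be the number of failures among the n units. Then, using (4) and (5) in (56), the log-likelihood function of the ExLIWD(a, b, c, δ) for a censored data set is given by
$$\ell(\Upsilon) = r\ln c + r\ln\delta - r\ln b - \sum_{i=1}^{n}\eta_i\ln y_i - (c+1)\sum_{i=1}^{n}\eta_i\ln\psi_i - \sum_{i=1}^{n}\eta_i\,\psi_i^{-c} + \sum_{i=1}^{n}(\delta-\eta_i)\ln\left(1-e^{-\psi_i^{-c}}\right), \qquad (62)$$
in which ψ_i = ψ(y_i; θ). On differentiating the log-likelihood function (62) with respect to the parameters a, b, c and δ respectively and equating to zero, we obtain the corresponding likelihood equations, with $G_i = e^{-\psi_i^{-c}}$:
$$\frac{1}{b}\sum_{i=1}^{n}\left[\eta_i\frac{c+1}{\psi_i} - \eta_i\, c\,\psi_i^{-c-1} + (\delta-\eta_i)\frac{c\,\psi_i^{-c-1}G_i}{1-G_i}\right] = 0, \qquad (63)$$
$$-\frac{r}{b} + \frac{1}{b}\sum_{i=1}^{n}\left[\eta_i(c+1) - \eta_i\, c\,\psi_i^{-c} + (\delta-\eta_i)\frac{c\,\psi_i^{-c}G_i}{1-G_i}\right] = 0, \qquad (64)$$
$$\frac{r}{c} - \sum_{i=1}^{n}\eta_i\ln\psi_i + \sum_{i=1}^{n}\eta_i\,\psi_i^{-c}\ln\psi_i - \sum_{i=1}^{n}(\delta-\eta_i)\frac{\psi_i^{-c}\ln(\psi_i)\,G_i}{1-G_i} = 0 \qquad (65)$$
and
$$\frac{r}{\delta} + \sum_{i=1}^{n}\ln(1-G_i) = 0. \qquad (66)$$
These likelihood equations may not always have a unique solution, and in such cases the maximum of the likelihood function is attained on the boundary of the parameter space. Hence we have obtained the second-order partial derivatives of the log-likelihood function of the ExLIWD(a, b, c, δ), and by using R software it has been verified that the values of the second-order partial derivatives are negative at the estimated parameter values for both the complete and censored cases.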
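The likelihood equations have no closed-form joint solution, so in practice the log-likelihood is maximised numerically. The sketch below is written in Python with NumPy and SciPy rather than the R code used in the paper; the function names and the choice of Nelder-Mead are our assumptions. It codes the negative of the censored log-likelihood (62), which reduces to (57) when every η_i = 1, optimises over (a, ln b, ln c, ln δ) to keep b, c, δ positive, and rejects any a with a ≥ min ln y_i, since the support y > e^a depends on a. Setting ∂ℓ/∂δ = 0 also gives δ̂ = −r / Σ_{i=1}^{n} ln(1 − e^{−ψ_i^{−c}}) (r = n for complete data), which could be used to profile δ out; the sketch simply searches over all four parameters.

```python
import numpy as np
from scipy.optimize import minimize

def neg_loglik(theta, y, eta):
    """Negative of the censored log-likelihood; with eta_i = 1 for all i it is the complete-data case.
    theta = (a, ln b, ln c, ln delta), so b, c and delta stay positive during the search."""
    a = theta[0]
    b, c, delta = np.exp(theta[1:])
    if np.any(np.log(y) <= a):                  # support constraint y > e^a
        return np.inf
    u = (np.log(y) - a) / b                      # psi(y_i; theta)
    x = u ** (-c)
    log_base_sf = np.log(-np.expm1(-x))          # ln(1 - e^{-psi^{-c}})
    log_f = (np.log(c * delta / b) - np.log(y) - (c + 1) * np.log(u)
             - x + (delta - 1) * log_base_sf)
    log_sf = delta * log_base_sf                  # ln of the survival function (6)
    return -np.sum(eta * log_f + (1 - eta) * log_sf)

def fit_exliwd(y, eta=None, start=None):
    """Maximise the likelihood with Nelder-Mead; returns (a, b, c, delta) and the maximised log-likelihood."""
    y = np.asarray(y, dtype=float)
    eta = np.ones_like(y) if eta is None else np.asarray(eta, dtype=float)
    if start is None:                            # crude start with a below min ln(y)
        start = np.array([np.log(y.min()) - 1.0, 0.0, 0.0, 0.0])
    res = minimize(neg_loglik, start, args=(y, eta), method="Nelder-Mead",
                   options={"maxiter": 20000, "maxfev": 20000, "xatol": 1e-8, "fatol": 1e-10})
    a, lb, lc, ld = res.x
    return np.array([a, np.exp(lb), np.exp(lc), np.exp(ld)]), -res.fun
```

The start point matters for a four-parameter search with a threshold parameter; trying several starts and comparing the maximised log-likelihoods is a cheap safeguard.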

Applications
In this section, we illustrate the utility of the ExLIWD(a, b, c, δ) as a survival distribution suitable for handling complete as well as censored cancer data by considering three real life data sets arising from cancer related fields, out of which the first and third are complete data sets while the second one is a censored data set.
The log-likelihood functions of the ExLIWD(a, b, c, δ) for the complete data sets are obtained using (57), while that for the censored data set is obtained using (62). The fit of the ExLIWD(a, b, c, δ) to Data sets 1, 2 and 3 is evaluated using the Kolmogorov-Smirnov (K-S) statistic. The values of the K-S statistic for the ExLIWD(a, b, c, δ) for the three data sets, along with the corresponding critical values at the 1% level, are provided in Table 3; from these values it can be concluded that the ExLIWD provides an adequate fit in all three cases. The estimates of the parameters of the ExLIWD(a, b, c, δ) are computed using R software and are provided in Tables 4, 5 and 6 respectively, along with the standard errors (SE) of the estimates and the calculated t-values and P-values of the estimates of the parameters for the three data sets. From these tables, it can be seen that the estimates are significant.
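For readers who want to reproduce a K-S check outside R, the sketch below uses SciPy's `kstest` with the ExLIWD c.d.f. (4) passed as a callable, on data simulated from illustrative parameter values (not the estimates reported in Tables 4 to 6), and recomputes the statistic D = max_i max(i/n − F(y_(i)), F(y_(i)) − (i − 1)/n) by hand. When the parameters are estimated from the same data, the nominal critical values are conservative; this sketch treats the parameters as known.

```python
import numpy as np
from scipy.stats import kstest

a, b, c, delta = 0.5, 2.0, 2.5, 5.0   # hypothetical parameter values for illustration

def cdf(y):
    """ExLIWD c.d.f. (4)."""
    u = (np.log(y) - a) / b
    return 1.0 - (-np.expm1(-u ** (-c))) ** delta

rng = np.random.default_rng(7)
p = rng.uniform(size=200)
data = np.exp(a + b * (-np.log(-np.expm1(np.log1p(-p) / delta))) ** (-1.0 / c))

result = kstest(data, cdf)       # one-sample K-S test against a callable c.d.f.

# The same statistic by hand.
y_sorted = np.sort(data)
n = len(y_sorted)
F = cdf(y_sorted)
i = np.arange(1, n + 1)
D_manual = np.max(np.maximum(i / n - F, F - (i - 1) / n))
print(result.statistic, result.pvalue, D_manual)
```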
For establishing the suitability of the proposed model as compared to some of the existing models, we have considered the following distributions, some of which have been developed recently.
The three-parameter Gompertz-Lindley distribution (GLD) of Koleoso et al. (2019)
The extended log-inverse Weibull distribution (ELIWD) of Kumar and Nair (2018c)
The three-parameter Lindley distribution (LD) of Shanker et al. (2017)
The exponentiated power Lindley distribution (EPLD) of Ashour and Eltehiwy (2015)
The Kumaraswamy modified inverse Weibull distribution (KMIWD) of Aryal and Elbatal (2015)
The
All the above distributions are fitted to Data sets 1 and 2 using the likelihood functions (54) and (56) for the complete and censored data sets respectively. In both cases it can be seen that the ExLIWD provides the best fit, while the GIGWD provides the next best fit. Hence we have fitted the ExLIWD and the GIGWD to Data set 3 to compare their efficiency as distributional models. The M.L.E.s of the parameters of all these distributions are obtained using R software. The performance of these distributions is examined using the information criteria AIC (Akaike information criterion), BIC (Bayesian information criterion), AICc (corrected Akaike information criterion) and CAIC (consistent Akaike information criterion). The numerical results obtained are summarised in Tables 7, 8 and 9. Further, for graphical comparison, we have obtained cumulative probability plots and Weibull probability plots (WPP) corresponding to the models having the best fit for the three data sets, as presented in Figs. 9, 10, 11, 12, 13 and 14.
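The information criteria are simple functions of the maximised log-likelihood ℓ̂, the number of parameters k and the sample size n. The sketch below uses the common definitions AIC = −2ℓ̂ + 2k, BIC = −2ℓ̂ + k ln n and AICc = AIC + 2k(k + 1)/(n − k − 1); for CAIC it assumes Bozdogan's consistent AIC, −2ℓ̂ + k(ln n + 1), since definitions of CAIC vary between authors. The function name is hypothetical.

```python
import math

def information_criteria(loglik, k, n):
    """AIC, BIC, AICc and CAIC for a model with k parameters fitted to n observations.
    CAIC is taken as Bozdogan's consistent AIC, -2 lnL + k (ln n + 1) (assumed definition)."""
    aic = -2.0 * loglik + 2.0 * k
    bic = -2.0 * loglik + k * math.log(n)
    aicc = aic + 2.0 * k * (k + 1) / (n - k - 1)
    caic = -2.0 * loglik + k * (math.log(n) + 1.0)
    return {"AIC": aic, "BIC": bic, "AICc": aicc, "CAIC": caic}

# Example: a four-parameter model (such as the ExLIWD) with maximised log-likelihood -100 on n = 50.
print(information_criteria(-100.0, 4, 50))
```

For every criterion the model with the smallest value is preferred, which is how Tables 7, 8 and 9 are read.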
From Tables 7, 8 and 9 it can be observed that the ExLIWD(a, b, c, δ) gives a better fit to all three data sets than the existing models, since it has the smallest values of AIC, BIC, AICc and CAIC. Further, the cumulative probability plots and the WPPs of the various distributions for the data sets also support this claim. The better performance of the ExLIWD(a, b, c, δ) compared with the existing models emphasises the utility of the proposed distribution as a model for fitting cancer-related data sets in both the complete and censored cases.

Simulation
In order to assess the empirical performance of the M.L. estimators of the parameters of the ExLIWD(a, b, c, δ) in terms of their bias and mean square error, a simulation study is carried out by generating observations from the ExLIWD(a, b, c, δ) for the parameter values a = 0.5, b = 2, c = 2.5 and δ = 5. As the c.d.f. (4) of the ExLIWD(a, b, c, δ) is in closed form, it is easy to generate pseudo-random numbers from the distribution using random numbers from the uniform distribution U(0, 1), which are generated using statistical software such as Excel. The corresponding ExLIWD(a, b, c, δ) random numbers are then obtained through the quantile function (13). The technique of bootstrapping is used to create multiple samples, and we have considered 200 bootstrap samples of sizes n = 50, 100, 250 and 500 (see Efron and Tibshirani (1991)). R is used for bootstrapping, and from the results presented in Table 10 it can be observed that as the sample size increases, the bias approaches zero and the mean square errors (MSE) of the respective estimators decrease. This shows that the efficiency of the estimators of the corresponding parameters increases with the sample size.
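The bias-and-MSE pattern can be illustrated with a much smaller computation. The sketch below is a simplification, not the study reported in Table 10: it replaces the bootstrap over all four parameters with a parametric Monte Carlo for δ alone, holding a, b and c at their true values and using the closed-form root of n/δ + Σ ln(1 − e^{−ψ_i^{−c}}) = 0. Samples are generated through the quantile function, as described above; NumPy is assumed and the replicate count and seed are arbitrary.

```python
import numpy as np

a, b, c, delta = 0.5, 2.0, 2.5, 5.0      # parameter values of the simulation study
rng = np.random.default_rng(10)

def sample(n):
    """Inverse-transform draws from the ExLIWD via the quantile function."""
    p = rng.uniform(size=n)
    return np.exp(a + b * (-np.log(-np.expm1(np.log1p(-p) / delta))) ** (-1.0 / c))

def delta_hat(y):
    """Root of n/delta + sum ln(1 - e^{-psi^{-c}}) = 0, with a, b, c fixed at their true values."""
    u = (np.log(y) - a) / b
    return -len(y) / np.sum(np.log(-np.expm1(-u ** (-c))))

reps = 2000
summary = {}
for n in (50, 100, 250, 500):
    est = np.array([delta_hat(sample(n)) for _ in range(reps)])
    summary[n] = (est.mean() - delta, np.mean((est - delta) ** 2))   # (bias, MSE)

for n, (bias, mse) in summary.items():
    print(f"n = {n:3d}: bias = {bias:+.4f}, MSE = {mse:.4f}")
```

Under this simplification each −ln(1 − e^{−ψ_i^{−c}}) is exponential with rate δ, so δ̂ has exact bias δ/(n − 1); the simulated bias should be close to that and, like the MSE, shrink as n grows.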

Conclusion
In this paper we have considered a generalization of the log-inverse Weibull distribution of Kumar and Nair (2018c) in order to build a more flexible model that includes the non-decreasing shape of the hazard rate function, and we have investigated several theoretical properties of the distribution. We have discussed the estimation of the parameters of the proposed distribution by the method of maximum likelihood for both complete and censored cases, and illustrated the usefulness of the model with the help of certain medical data sets, highlighting its suitability in cancer research. It has been verified using various information criteria that the proposed distribution is a better model for fitting such data sets than many of its related models as well as many recently developed distributions. Further, the asymptotic behaviour of the maximum likelihood estimators is examined with the help of simulated data sets. Thus, through this paper, we have developed a wide class of distributions that possesses more flexibility in terms of the shapes of its hazard rate function and its measures of central tendency, dispersion, skewness and kurtosis, so as to be capable of modelling complete, censored as well as truncated data sets. Several inferential aspects of the distribution, as well as related regression models, are being studied and will be reported in a separate publication.