Methodology
The odd log-logistic logarithmic generated family of distributions with applications in different areas
Journal of Statistical Distributions and Applications, volume 4, Article number: 6 (2017)
Abstract
We introduce and study general mathematical properties of a new generator of continuous distributions with two extra parameters called the odd log-logistic logarithmic generated family of distributions. We present some special models and investigate the asymptotes and shapes. The new density function can be expressed as a linear combination of exponentiated densities based on the same baseline distribution. Explicit expressions for the ordinary and incomplete moments, quantile and generating functions, Shannon and Rényi entropies and order statistics, which hold for any baseline model, are determined. We discuss the estimation of the model parameters by maximum likelihood. Further, we introduce the new family in long-term survival models. We illustrate the potentiality of the proposed models by means of four applications to real data.
Introduction
Statistical distributions are very useful in describing and predicting real-world phenomena. Numerous extended distributions have been extensively used over the last decades for modeling data in several areas. Recent developments focus on defining new families that extend well-known distributions and at the same time provide greater flexibility in modeling data in practice. Hence, several classes for generating new distributions by adding one or more parameters have been proposed in the statistical literature. Some well-known generators are the Marshall-Olkin generated (MO-G) by Marshall and Olkin (1997), beta-G by Eugene et al. (2002), Kumaraswamy-G (Kw-G) by Cordeiro and de Castro (2011), Weibull-G by Bourguignon et al. (2014), exponentiated half-logistic-G by Cordeiro et al. (2014a) and Lomax-G by Cordeiro et al. (2014b), among others.
Let G(x;ξ) be a baseline cumulative distribution function (cdf) and ξ the vector of associated parameters. Recently, Gleaton and Lynch (2004, 2006, 2010) defined the cdf of the odd log-logistic family with one extra shape parameter α>0 by

$$F(x;\alpha,\boldsymbol{\xi})=\frac{G(x;\boldsymbol{\xi})^{\alpha}}{G(x;\boldsymbol{\xi})^{\alpha}+\bar{G}(x;\boldsymbol{\xi})^{\alpha}}, \qquad (1)$$

where $\bar{G}(x;\boldsymbol{\xi})=1-G(x;\boldsymbol{\xi})$. In particular, they showed the following facts:

The set of generalized log-logistic (GLL) transformations forms an Abelian group under the binary operation of composition;

The transformation group partitions the set of all lifetime distributions into equivalence classes, so that any two distributions in an equivalence class are related through a GLL transformation;

Either every distribution in an equivalence class has a moment generating function (mgf), or none does;

Every distribution in an equivalence class has the same number of moments;

Each equivalence class is linearly ordered according to the transformation parameter, with larger values of this parameter corresponding to smaller dispersion of the distribution about the common median class; and

Within an equivalence class, the Kullback-Leibler information is an increasing function of the ratio of the transformation parameters.
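The first fact is easy to see numerically. On the odds scale u/(1−u), applying (1) with parameter α to a cdf value u raises the odds to the power α, so composing the transformations with parameters α₁ and α₂ gives the transformation with parameter α₁α₂, the identity is α=1 and the inverse uses 1/α. The sketch below assumes the GLL transformation acts on a cdf value u=G(x) exactly as in (1); the function names are illustrative, not from the paper.

```python
def gll(u, alpha):
    """GLL (odd log-logistic) map of a cdf value u in (0, 1), as in (1)."""
    return u**alpha / (u**alpha + (1.0 - u) ** alpha)


def odds(u):
    return u / (1.0 - u)


u = 0.3
# On the odds scale the map is a power function: odds(T_a(u)) = odds(u)**a
print(odds(gll(u, 2.0)), odds(u) ** 2)
# Composition multiplies the parameters and is commutative: T_1.5(T_2(u)) = T_3(u)
print(gll(gll(u, 2.0), 1.5), gll(u, 3.0))
# Identity element (a = 1) and inverse element (a -> 1/a)
print(gll(u, 1.0), gll(gll(u, 4.0), 0.25))
```

Because the parameters simply multiply, the group of GLL transformations behaves like the positive reals under multiplication, which is why it is Abelian.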
In addition, Gleaton and Rahman (2010, 2014) obtained asymptotic results for the maximum likelihood estimates (MLEs) of the parameters of two such GLL-transformed families. They proved that, for distributions generated from either a two-parameter Weibull or a two-parameter inverse Gaussian distribution by a GLL transformation, the joint MLEs of the parameters are asymptotically normal and efficient, provided the GLL transformation parameter exceeds three.
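We define the cdf of the odd log-logistic logarithmic-G (OLLLG) family by

$$F(x;\alpha,\beta,\boldsymbol{\xi})=\frac{\log\left\{1-\beta\,\dfrac{G(x;\boldsymbol{\xi})^{\alpha}}{G(x;\boldsymbol{\xi})^{\alpha}+\bar{G}(x;\boldsymbol{\xi})^{\alpha}}\right\}}{\log(1-\beta)}, \qquad (2)$$

where G(x;ξ) is the baseline cdf depending on a parameter vector ξ, and α>0 and 0<β<1 are two additional shape parameters. It includes the odd log-logistic-G (OLLG) family (Gleaton and Lynch 2004, 2006) as the limiting case β→0 and the logarithmic-G family when α=1. Some special models are given in Table 1.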
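A minimal sketch of (2), assuming it has the compounding form F(x) = log[1 − βA(x)]/log(1 − β), where A(x) is the odd log-logistic cdf (1). It works on a baseline cdf value G = G(x;ξ), so any baseline can be plugged in, and it checks the two nested cases: β→0 recovers (1), and α=1 gives the logarithmic-G cdf. The helper names are illustrative.

```python
import math


def oll_cdf(G, alpha):
    """Odd log-logistic-G cdf (1) evaluated at a baseline cdf value G in [0, 1]."""
    Gbar = 1.0 - G
    return G**alpha / (G**alpha + Gbar**alpha)


def olllg_cdf(G, alpha, beta):
    """OLLLG cdf: log[1 - beta*A] / log(1 - beta), with A the OLL-G cdf; log1p keeps small beta accurate."""
    return math.log1p(-beta * oll_cdf(G, alpha)) / math.log1p(-beta)


G = 0.4  # a baseline cdf value G(x; xi) at some point x
print(olllg_cdf(G, 2.0, 1e-8), oll_cdf(G, 2.0))                      # beta -> 0: OLL-G cdf (1)
print(olllg_cdf(G, 1.0, 0.6), math.log1p(-0.6 * G) / math.log1p(-0.6))  # alpha = 1: logarithmic-G
```

Using `math.log1p` rather than `math.log(1 - ...)` avoids the loss of precision that would otherwise appear when β is close to zero.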
This paper is organized as follows. In Section 2, we provide a physical interpretation of the OLLLG family and define two special cases. In Section 3, two useful linear representations are derived. In Section 4, we obtain explicit expressions for the moments and generating function. In Section 5, general expressions for the Rényi and Shannon entropies and order statistics are presented. Estimation of the model parameters by maximum likelihood is investigated in Section 6, where we also assess the performance of the MLEs through a simulation study. In Section 7, the OLLLG model is modified to accommodate the possible presence of long-term survivors in the data. Four applications to real data illustrate the performance of the proposed models in Section 8. The paper is concluded in Section 9.
Motivation and special cases
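The density function corresponding to (2) is given by

$$f(x;\alpha,\beta,\boldsymbol{\xi})=\frac{\alpha\,\beta\,g(x;\boldsymbol{\xi})\,G(x;\boldsymbol{\xi})^{\alpha-1}\,\bar{G}(x;\boldsymbol{\xi})^{\alpha-1}}{-\log(1-\beta)\left[G(x;\boldsymbol{\xi})^{\alpha}+\bar{G}(x;\boldsymbol{\xi})^{\alpha}\right]\left[(1-\beta)\,G(x;\boldsymbol{\xi})^{\alpha}+\bar{G}(x;\boldsymbol{\xi})^{\alpha}\right]}, \qquad (3)$$

where g(x;ξ) is the baseline pdf. Hereafter, a random variable X with density function (3) is denoted by X∼OLLLG(α,β,ξ). We sometimes omit the dependence on the parameter vector ξ and write simply G(x)=G(x;ξ).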
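A quick consistency check of the density is to differentiate the cdf numerically. The sketch assumes (3) is the derivative of the compounding cdf log[1 − βA]/log(1 − β), that is f = αβ g G^(α−1) Ḡ^(α−1) / {−log(1−β) [G^α + Ḡ^α] [(1−β)G^α + Ḡ^α]}, and uses a Weibull baseline only as an arbitrary example; the function names and parameter values are illustrative.

```python
import math


def olllg_cdf(G, Gbar, alpha, beta):
    A = G**alpha / (G**alpha + Gbar**alpha)
    return math.log1p(-beta * A) / math.log1p(-beta)


def olllg_pdf(g, G, Gbar, alpha, beta):
    """alpha*beta*g*G^(a-1)*Gbar^(a-1) / (-log(1-beta) * D * E)."""
    D = G**alpha + Gbar**alpha                  # G^a + Gbar^a
    E = (1.0 - beta) * G**alpha + Gbar**alpha   # D - beta * G^a
    return (alpha * beta * g * G ** (alpha - 1) * Gbar ** (alpha - 1)
            / (-math.log1p(-beta) * D * E))


a, b, alpha, beta = 1.5, 1.0, 2.0, 0.7


def weibull(x):
    """Weibull baseline pdf, cdf and survival function (shape a, scale b)."""
    Gbar = math.exp(-((b * x) ** a))
    g = a * b * (b * x) ** (a - 1) * Gbar
    return g, 1.0 - Gbar, Gbar


# Central difference of the cdf at x versus the closed-form density
x, h = 0.8, 1e-5
_, Gp, Gbp = weibull(x + h)
_, Gm, Gbm = weibull(x - h)
numeric = (olllg_cdf(Gp, Gbp, alpha, beta) - olllg_cdf(Gm, Gbm, alpha, beta)) / (2 * h)
g, G, Gbar = weibull(x)
print(olllg_pdf(g, G, Gbar, alpha, beta), numeric)
```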
A motivation for this family is as follows. Suppose that a parallel system is made up of N components whose lifetimes are independent and identically distributed (iid) random variables $Z_1,\ldots,Z_N$ with common cdf (1). The system fails as soon as the last component fails, so the lifetime of the whole system is $X=\max\{Z_1,\ldots,Z_N\}$. In many parallel systems it is almost impossible to have a fixed number of components, because some of them get lost or censored for various reasons. Therefore, we may assume that N is a discrete random variable. Suppose that N has the logarithmic distribution with probability mass function given by

$$P(N=n)=\frac{-\beta^{n}}{n\,\log(1-\beta)},\qquad n=1,2,\ldots,\quad 0<\beta<1.$$

Then, denoting by $F_{1}(x)$ the cdf (1), the cdf of the life length of the whole system, X, is obtained as

$$P(X\le x)=\sum_{n=1}^{\infty}P(N=n)\,F_{1}(x)^{n}=\frac{-1}{\log(1-\beta)}\sum_{n=1}^{\infty}\frac{[\beta F_{1}(x)]^{n}}{n}=\frac{\log[1-\beta F_{1}(x)]}{\log(1-\beta)},$$

which is identical to (2).
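The compounding argument can be checked by simulation: draw N from the logarithmic law, draw N iid lifetimes from (1), keep the maximum, and compare the empirical cdf with log[1 − βA(x)]/log(1 − β). This sketch assumes an exponential baseline purely for illustration and the logarithmic pmf P(N=n) = −βⁿ/[n log(1−β)], n ≥ 1; the seed, sample size and parameter values are arbitrary.

```python
import math
import random


def sample_logarithmic(beta, rng):
    """Inverse-cdf draw from P(N = n) = -beta^n / [n log(1 - beta)], n = 1, 2, ..."""
    u, n = rng.random(), 1
    prob = -beta / math.log1p(-beta)          # P(N = 1)
    cum = prob
    while u > cum and prob > 0.0:             # the prob guard stops the loop if cum stalls below u
        prob *= beta * n / (n + 1)            # P(N = n + 1) / P(N = n) = beta * n / (n + 1)
        n += 1
        cum += prob
    return n


def draw_oll_exponential(alpha, lam, rng):
    """One lifetime with OLL cdf (1) and exponential baseline G(x) = 1 - exp(-lam * x)."""
    v = rng.random()
    s = v ** (1.0 / alpha)
    G = s / (s + (1.0 - v) ** (1.0 / alpha))  # solves G^a / (G^a + (1 - G)^a) = v
    return -math.log1p(-G) / lam


alpha, beta, lam, x0 = 1.5, 0.8, 1.0, 1.2
rng = random.Random(2017)
reps = 20000
hits = 0
for _ in range(reps):
    n = sample_logarithmic(beta, rng)
    system_lifetime = max(draw_oll_exponential(alpha, lam, rng) for _ in range(n))
    hits += system_lifetime <= x0
empirical = hits / reps

G0 = -math.expm1(-lam * x0)
A0 = G0**alpha / (G0**alpha + (1.0 - G0) ** alpha)
theory = math.log1p(-beta * A0) / math.log1p(-beta)
print(empirical, theory)
```

With 20,000 replications the Monte Carlo standard error of the empirical probability is roughly 0.0035, so agreement to about two decimal places is what to expect.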
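The hazard rate function (hrf) of X becomes

$$h(x)=\frac{\alpha\,\beta\,g(x)\,G(x)^{\alpha-1}\,\bar{G}(x)^{\alpha-1}}{\left[G(x)^{\alpha}+\bar{G}(x)^{\alpha}\right]\left[(1-\beta)\,G(x)^{\alpha}+\bar{G}(x)^{\alpha}\right]\log\left\{1+\dfrac{\beta\,\bar{G}(x)^{\alpha}}{(1-\beta)\left[G(x)^{\alpha}+\bar{G}(x)^{\alpha}\right]}\right\}}. \qquad (4)$$

The OLLLG family of distributions is easily simulated by inverting (2) as follows: if U has a uniform U(0,1) distribution and $p_U=[1-(1-\beta)^{U}]/\beta$, then

$$X=Q_{G}\left(\frac{p_U^{1/\alpha}}{p_U^{1/\alpha}+(1-p_U)^{1/\alpha}}\right) \qquad (5)$$

has the density function (3), where $Q_G(u)=G^{-1}(u)$ is the quantile function (qf) of the baseline G.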
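The inversion can be written in two steps: solve log[1 − βA] = u log(1 − β) for A, then invert the odd log-logistic map A = G^α/(G^α + Ḡ^α) for G, and finally apply the baseline qf. The sketch below assumes (2) has the compounding form log[1 − βA]/log(1 − β); the exponential baseline, parameter values and helper names are illustrative. It verifies the round trip F(Q(u)) = u and draws a few variates.

```python
import math
import random


def olllg_quantile(u, alpha, beta, baseline_quantile):
    """Inverse of the OLLLG cdf: u -> p = [1 - (1 - beta)^u] / beta -> G -> Q_G(G)."""
    p = -math.expm1(u * math.log1p(-beta)) / beta      # solves log(1 - beta*A) = u*log(1 - beta)
    s = p ** (1.0 / alpha)
    G = s / (s + (1.0 - p) ** (1.0 / alpha))           # inverts A = G^a / (G^a + (1 - G)^a)
    return baseline_quantile(G)


def olllg_cdf_exp(x, alpha, beta, lam):
    """OLLLG cdf with an exponential baseline G(x) = 1 - exp(-lam * x)."""
    G, Gbar = -math.expm1(-lam * x), math.exp(-lam * x)
    A = G**alpha / (G**alpha + Gbar**alpha)
    return math.log1p(-beta * A) / math.log1p(-beta)


lam, alpha, beta = 2.0, 0.7, 0.4
exp_quantile = lambda G: -math.log1p(-G) / lam         # Q_G of the exponential baseline

roundtrip = [olllg_cdf_exp(olllg_quantile(u, alpha, beta, exp_quantile), alpha, beta, lam)
             for u in (0.1, 0.5, 0.9)]
rng = random.Random(1)
sample = [olllg_quantile(rng.random(), alpha, beta, exp_quantile) for _ in range(5)]
print(roundtrip, sample)
```

The same function also works for β<0, since p = [1 − (1 − β)^u]/β still lies in [0, 1] in that case.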
Remark 1
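Although we have stated that β∈(0,1), Eq. (2) is still a cdf if β<0, since the numerator and the denominator of (2) are then both positive and the numerator increases with G(x). Hence, we can consider the OLLLG family defined for any β∈(−∞,0)∪(0,1).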
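Remark 1 can be checked on a grid of baseline cdf values. The sketch assumes (2) has the compounding form log[1 − βA]/log(1 − β) and verifies that, for negative as well as positive β, the function starts at 0, ends at 1 and never decreases; α=1.7 and the β values are arbitrary choices.

```python
import math


def olllg_cdf(G, alpha, beta):
    """OLLLG cdf at a baseline cdf value G; meaningful for beta < 0 and for 0 < beta < 1."""
    A = G**alpha / (G**alpha + (1.0 - G) ** alpha)
    return math.log1p(-beta * A) / math.log1p(-beta)


grid = [i / 200 for i in range(201)]
checks = {}
for beta in (-2.0, -0.5, 0.5):
    values = [olllg_cdf(G, 1.7, beta) for G in grid]
    nondecreasing = all(v2 >= v1 for v1, v2 in zip(values, values[1:]))
    checks[beta] = (values[0], values[-1], nondecreasing)
print(checks)
```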
In Appendix 1, we present the asymptotes and shapes of the OLLLG model.
2.1 Special OLLLG distributions
The OLLLG density function (3) allows for greater flexibility of its tails and can be widely applied in many areas of engineering and biology. It is most tractable when G(x;ξ) and g(x;ξ) have simple analytic expressions. We now present some special cases of this family, which extends several widely known distributions in the literature.
2.1.1 Odd log-logistic logarithmic Weibull (OLLLW) model
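Let $G(x;\boldsymbol{\xi})=1-{\mathrm{e}}^{-(b\,x)^{a}}$ be the Weibull cdf with scale parameter b>0 and shape parameter a>0, where ξ=(a,b). The OLLLW density function (for x>0) is given by

$$f(x)=\frac{\alpha\,\beta\,a\,b^{a}\,x^{a-1}\,{\mathrm{e}}^{-\alpha(bx)^{a}}\left[1-{\mathrm{e}}^{-(bx)^{a}}\right]^{\alpha-1}}{-\log(1-\beta)\left\{\left[1-{\mathrm{e}}^{-(bx)^{a}}\right]^{\alpha}+{\mathrm{e}}^{-\alpha(bx)^{a}}\right\}\left\{(1-\beta)\left[1-{\mathrm{e}}^{-(bx)^{a}}\right]^{\alpha}+{\mathrm{e}}^{-\alpha(bx)^{a}}\right\}}.$$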
Figures 1 and 2 display some shapes of the OLLLW density and hazard functions, respectively. The plots in Fig. 1 indicate that this density function can be decreasing, unimodal and bimodal. Moreover, Fig. 2 reveals that the hrf of the OLLLW model can be decreasing, increasing, increasing-decreasing-increasing, bathtub-shaped and upside-down bathtub-shaped.
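Curves like those in Figs. 1 and 2 can be computed directly from the density and survival function. The sketch assumes the OLLLW density has the compounding form shown for (3) with a Weibull baseline, and it evaluates 1 − F as log[1 + βḠ^α/((1−β)(G^α+Ḡ^α))]/[−log(1−β)], which avoids cancellation when F is close to 1 and keeps the hazard accurate in the right tail. The parameter values are arbitrary; the checks confirm that the density integrates to one and that −dS/dx matches the density.

```python
import math


def olllw_pdf_sf(x, alpha, beta, a, b):
    """OLLLW density and survival function at x > 0 (Weibull baseline: shape a, scale b)."""
    t = (b * x) ** a
    Gbar, G = math.exp(-t), -math.expm1(-t)
    g = a * b * (b * x) ** (a - 1) * Gbar                  # Weibull pdf
    D = G**alpha + Gbar**alpha
    E = (1.0 - beta) * G**alpha + Gbar**alpha
    c = -math.log1p(-beta)                                  # -log(1 - beta)
    pdf = alpha * beta * g * G ** (alpha - 1) * Gbar ** (alpha - 1) / (c * D * E)
    sf = math.log1p(beta * Gbar**alpha / ((1.0 - beta) * D)) / c   # 1 - F without cancellation
    return pdf, sf


def olllw_hrf(x, *params):
    pdf, sf = olllw_pdf_sf(x, *params)
    return pdf / sf


params = (2.0, 0.5, 1.5, 1.0)   # alpha, beta, a, b (illustrative values)

# Midpoint rule on (0, 10]; the mass beyond x = 10 is negligible for these parameters
m = 100000
h = 10.0 / m
total = h * sum(olllw_pdf_sf((j + 0.5) * h, *params)[0] for j in range(m))

# -dS/dx should reproduce the density
x, eps = 1.0, 1e-5
slope = -(olllw_pdf_sf(x + eps, *params)[1] - olllw_pdf_sf(x - eps, *params)[1]) / (2 * eps)
print(total, slope, olllw_pdf_sf(x, *params)[0])
print([round(olllw_hrf(v, *params), 4) for v in (0.25, 0.5, 1.0, 2.0, 4.0)])
```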
2.1.2 Odd log-logistic logarithmic normal (OLLLN) model
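Consider the normal model with location parameter $\mu\in\mathbb{R}$ and scale parameter σ>0, whose pdf and cdf (for $x\in\mathbb{R}$) are given by

$$g(x;\boldsymbol{\xi})=\frac{1}{\sigma}\,\phi\left(\frac{x-\mu}{\sigma}\right)\quad\text{and}\quad G(x;\boldsymbol{\xi})=\Phi\left(\frac{x-\mu}{\sigma}\right),$$

respectively, where ξ=(μ,σ)^T. Inserting these expressions in (3), the OLLLN pdf is given by

$$f(x)=\frac{\alpha\,\beta\,\phi(z)\,\Phi(z)^{\alpha-1}\,[1-\Phi(z)]^{\alpha-1}}{-\sigma\log(1-\beta)\left\{\Phi(z)^{\alpha}+[1-\Phi(z)]^{\alpha}\right\}\left\{(1-\beta)\,\Phi(z)^{\alpha}+[1-\Phi(z)]^{\alpha}\right\}},\qquad z=\frac{x-\mu}{\sigma},$$

where $x\in\mathbb{R}$, $\mu\in\mathbb{R}$ is a location parameter, σ>0 is a scale parameter, α and β are shape parameters, and ϕ(·) and Φ(·) are the pdf and cdf of the standard normal distribution, respectively. For μ=0 and σ=1, we obtain the OLLL-standard normal (OLLLSN) distribution. Plots of the OLLLSN density function for selected parameter values are displayed in Fig. 3. This model is suitable for unimodal and bimodal data sets.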
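A minimal sketch of the OLLLN density using only the standard library, assuming the compounding form of (3) with a normal baseline. The upper-tail probability is computed as Φ(−z) through `math.erfc`, which avoids the cancellation in 1 − Φ(z) far in the right tail. The parameter values are arbitrary; the check integrates the density numerically over a wide interval.

```python
import math


def phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def Phi(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def ollln_pdf(x, alpha, beta, mu=0.0, sigma=1.0):
    """OLLLN density; mu = 0 and sigma = 1 give the OLLLSN case."""
    z = (x - mu) / sigma
    G, Gbar = Phi(z), Phi(-z)            # Phi(-z) = 1 - Phi(z), computed without cancellation
    D = G**alpha + Gbar**alpha
    E = (1.0 - beta) * G**alpha + Gbar**alpha
    return (alpha * beta * phi(z) * (G * Gbar) ** (alpha - 1)
            / (-sigma * math.log1p(-beta) * D * E))


# Midpoint rule on [-12, 12] for an OLLLSN example with alpha = 0.5, beta = 0.7
m, lo, hi = 48000, -12.0, 12.0
h = (hi - lo) / m
total = h * sum(ollln_pdf(lo + (j + 0.5) * h, 0.5, 0.7) for j in range(m))
print(total)
```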
Linear representations
Let $A(u)=\frac{u^{\alpha}}{u^{\alpha}+(1-u)^{\alpha}}$ be the cdf of the odd log-logistic uniform distribution (Gleaton and Lynch 2004, 2006, 2010). For β∈(0,1), we have 0<βA(u)<1. Then, we can apply the power series reported in Appendix 2 by taking u=G(x), since they are convergent in the interval (0,1), i.e., the power series are valid on the support of X. Henceforth, we consider 0<β<1, which is not a restrictive assumption since it agrees with the logarithmic distribution used for compounding the OLLLG family.
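Based on the power series $-\log(1-u)=\sum_{i=1}^{\infty}\frac{u^{i}}{i}$ (which converges for |u|<1), the OLLLG family cdf can be expanded as

$$F(x)=\sum_{i=1}^{\infty}\frac{\beta^{i}}{-i\,\log(1-\beta)}\,A(G(x))^{i}.$$

By using Eq. (24) given in Appendix 2, we obtain

$$F(x)=\sum_{i=1}^{\infty}\frac{\beta^{i}}{-i\,\log(1-\beta)}\sum_{k=0}^{\infty}h^{*}_{k}(\alpha,i)\,G(x)^{k},$$

where the coefficients $h^{*}_{k}(\alpha,i)$ can be determined from the recursive formula given after (24).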
Further, we define the exponentiated-G (“Exp-G”) distribution for an arbitrary parent distribution G, say $W\sim\mathrm{Exp}^{c}G$, if W has cdf and pdf given by $H_c(x)=G(x)^{c}$ and $h_c(x)=c\,g(x)\,G(x)^{c-1}$, respectively. This transformed model is also called the Lehmann type I distribution, say $\mathrm{Exp}^{c}(G)$.
Then, we can rewrite F(x) as

$$F(x)=\sum_{k=0}^{\infty}d_{k}\,H_{k}(x), \qquad (6)$$

where $d_{k}=\sum_{i=1}^{\infty}\frac{\beta^{i}\,h^{*}_{k}(\alpha,i)}{-i\,\log(1-\beta)}$ and $H_k(x)$ is the Exp-G cdf with power parameter k.
By differentiating (6), the pdf of X follows as

$$f(x)=\sum_{k=0}^{\infty}d_{k+1}\,h_{k+1}(x), \qquad (7)$$

where $h_{k+1}(x)=(k+1)\,G(x)^{k}\,g(x)$ is the Exp-G density function with power parameter k+1.
Equation (7) reveals that the OLLLG density function is a linear combination of Exp-G densities. Some structural properties of the new family, such as the ordinary and incomplete moments and generating function, can be determined from well-established properties of the Exp-G distribution. The properties of Exp-G distributions have been studied by many authors in recent years; see Mudholkar and Srivastava (1993) and Mudholkar et al. (1995) for exponentiated Weibull, Gupta and Kundu (1999) for exponentiated exponential and Nadarajah (2006) for exponentiated Gumbel, among others. The linear representations (6) and (7) are the main results of this section.
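The two steps behind (6) and (7) can be checked numerically. For any α, the first step (expanding −log(1 − βA) as Σ(βA)^i/i) should reproduce the closed-form cdf. For general α the coefficients h*_k(α,i) come from the Appendix 2 series, which this sketch does not reproduce; in the special case α=1 we have A(u)=u, so h*_k(1,i) is 1 when k=i and 0 otherwise, giving d_0=0 and d_k = β^k/[−k log(1−β)] explicitly. The sketch assumes (2) has the compounding form log[1 − βA]/log(1 − β); the numbers are illustrative.

```python
import math


def oll_A(G, alpha):
    """OLL-G cdf (1) at a baseline cdf value G."""
    return G**alpha / (G**alpha + (1.0 - G) ** alpha)


def olllg_cdf(G, alpha, beta):
    """Closed-form OLLLG cdf log[1 - beta*A] / log(1 - beta)."""
    return math.log1p(-beta * oll_A(G, alpha)) / math.log1p(-beta)


def log_series_cdf(G, alpha, beta, terms=200):
    """First step: sum_i beta^i A^i / (-i log(1 - beta)), truncated after `terms` terms."""
    A, c = oll_A(G, alpha), -math.log1p(-beta)
    return sum((beta * A) ** i / (i * c) for i in range(1, terms + 1))


def d_coef(k, beta):
    """d_k for alpha = 1 (A(u) = u): d_0 = 0 and d_k = beta^k / (-k log(1 - beta))."""
    return 0.0 if k == 0 else beta**k / (-k * math.log1p(-beta))


G, g, beta = 0.6, 0.3, 0.7        # baseline cdf and pdf values at some x (illustrative)
K = 300                            # truncation point of the linear representations
F_series = log_series_cdf(G, 2.5, beta)
F_linear = sum(d_coef(k, beta) * G**k for k in range(K))                     # (6) with H_k = G^k
f_linear = sum(d_coef(k + 1, beta) * (k + 1) * G**k * g for k in range(K))  # (7)
f_closed = beta * g / (-math.log1p(-beta) * (1.0 - beta * G))                 # density with alpha = 1
print(F_series - olllg_cdf(G, 2.5, beta), F_linear - olllg_cdf(G, 1.0, beta), f_linear - f_closed)
```

Since F(x)→1 as G(x)→1, the coefficients d_k must also sum to one, which gives a cheap sanity check on any computed set of d_k.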
Moments and generating function
Let $Y_k$ be an Exp-G random variable with power parameter k+1, i.e., having density $h_{k+1}(x)$. The nth ordinary moment of X∼OLLLG follows from (7) as

$$E(X^{n})=\sum_{k=0}^{\infty}d_{k+1}\,E(Y_{k}^{n})=\sum_{k=0}^{\infty}(k+1)\,d_{k+1}\,\tau(n,k), \qquad (8)$$

where $\tau(n,k)=\int_{-\infty}^{\infty}x^{n}\,G(x)^{k}\,g(x)\,\mathrm{d}x=\int_{0}^{1}Q_{G}(u)^{n}\,u^{k}\,\mathrm{d}u$. The interchange of the infinite sum and the integral can be justified by the dominated convergence theorem for series.
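A simple numerical cross-check of (8) that avoids the coefficients d_k: since X = Q_X(U) with U uniform, E(X^n) = ∫₀¹ Q_X(u)^n du, where Q_X is the qf obtained by inverting (2). This is a numerical alternative, not the series method of the paper, and it assumes (2) has the compounding form log[1 − βA]/log(1 − β). In the exponential limit (α=1, β→0, unit rate) the mean should be close to 1.

```python
import math


def olllg_quantile(u, alpha, beta, Q_G):
    """Quantile function obtained by inverting the OLLLG cdf."""
    p = -math.expm1(u * math.log1p(-beta)) / beta
    s = p ** (1.0 / alpha)
    return Q_G(s / (s + (1.0 - p) ** (1.0 / alpha)))


def raw_moment(n, alpha, beta, Q_G, m=100000):
    """E(X^n) = int_0^1 Q_X(u)^n du, approximated by the midpoint rule."""
    h = 1.0 / m
    return h * sum(olllg_quantile((j + 0.5) * h, alpha, beta, Q_G) ** n for j in range(m))


def weibull_Q(G, a=2.0, b=1.0):
    """Weibull baseline qf (shape a, scale b)."""
    return (-math.log1p(-G)) ** (1.0 / a) / b


mean_exp_limit = raw_moment(1, 1.0, 1e-8, lambda G: -math.log1p(-G))
print(mean_exp_limit, raw_moment(1, 0.5, 0.5, weibull_Q), raw_moment(2, 0.5, 0.5, weibull_Q))
```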
Expressions for moments of several Exp-G distributions are given by Nadarajah and Kotz (2006), which can be used to obtain E(X^n). Cordeiro and Nadarajah (2011) determined τ(n,k) for some well-known distributions such as the normal, beta, gamma and Weibull distributions.
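The variance, skewness and kurtosis measures are given by

$$\mathrm{Var}(X)=\mu'_{2}-\mu'^{2}_{1},\qquad \mathrm{Skewness}(X)=\frac{\mu'_{3}-3\mu'_{2}\mu'_{1}+2\mu'^{3}_{1}}{\mathrm{Var}(X)^{3/2}},\qquad \mathrm{Kurtosis}(X)=\frac{\mu'_{4}-4\mu'_{3}\mu'_{1}+6\mu'_{2}\mu'^{2}_{1}-3\mu'^{4}_{1}}{\mathrm{Var}(X)^{2}},$$

where $\mu'_{n}=E(X^{n})$ is given by (8).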
Plots of the skewness and kurtosis of the OLLLW distribution as functions of α and β for a=2 and b=1 are displayed in Fig. 4.
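Surfaces such as those in Fig. 4 can be traced by computing the first four raw moments and applying the usual moment ratios, Var = μ′₂ − μ′₁², skewness = (μ′₃ − 3μ′₂μ′₁ + 2μ′₁³)/Var^{3/2} and kurtosis = (μ′₄ − 4μ′₃μ′₁ + 6μ′₂μ′₁² − 3μ′₁⁴)/Var². This sketch obtains the raw moments by quantile integration (a numerical stand-in for the series (8)) and assumes (2) has the compounding form log[1 − βA]/log(1 − β). The exponential limit should give values close to (1, 2, 9).

```python
import math


def olllg_quantile(u, alpha, beta, Q_G):
    p = -math.expm1(u * math.log1p(-beta)) / beta
    s = p ** (1.0 / alpha)
    return Q_G(s / (s + (1.0 - p) ** (1.0 / alpha)))


def shape_measures(alpha, beta, Q_G, m=100000):
    """Variance, skewness and kurtosis from the raw moments mu'_1, ..., mu'_4."""
    h = 1.0 / m
    mom = [0.0] * 5
    for j in range(m):
        x = olllg_quantile((j + 0.5) * h, alpha, beta, Q_G)
        for n in range(1, 5):
            mom[n] += h * x**n
    m1, m2, m3, m4 = mom[1:]
    var = m2 - m1**2
    skew = (m3 - 3 * m2 * m1 + 2 * m1**3) / var**1.5
    kurt = (m4 - 4 * m3 * m1 + 6 * m2 * m1**2 - 3 * m1**4) / var**2
    return var, skew, kurt


exp_Q = lambda G: -math.log1p(-G)               # unit-rate exponential baseline
weibull_Q = lambda G: (-math.log1p(-G)) ** 0.5  # Weibull baseline with a = 2, b = 1
exp_limit = shape_measures(1.0, 1e-8, exp_Q)    # alpha = 1, beta -> 0: exponential limit
olllw_example = shape_measures(0.5, 0.5, weibull_Q)
print(exp_limit, olllw_example)
```

Looping `shape_measures` over a grid of (α, β) values with a=2 and b=1 would give surfaces of the kind plotted in Fig. 4.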
For empirical purposes, the shape of many distributions can be usefully described by the incomplete moments. These moments play an important role in measuring inequality, for example through income quantiles and the Lorenz and Bonferroni curves, which depend upon the incomplete moments of a distribution. The nth incomplete moment of X is calculated as

$$m_{n}(y)=\int_{-\infty}^{y}x^{n}f(x)\,\mathrm{d}x=\sum_{k=0}^{\infty}(k+1)\,d_{k+1}\int_{-\infty}^{y}x^{n}\,G(x)^{k}\,g(x)\,\mathrm{d}x. \qquad (9)$$

The last integral can be determined analytically or numerically for most baseline distributions. Equation (9) can be used to determine conditional moments, mean deviations and the Bonferroni and Lorenz curves of X.
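The incomplete moments also have a quantile form: with y = Q_X(p), m_n(y) = ∫₀ᵖ Q_X(u)^n du. The Lorenz curve is then L(p) = m₁(Q_X(p))/E(X) and the Bonferroni curve is B(p) = L(p)/p. A minimal sketch, assuming (2) has the compounding form log[1 − βA]/log(1 − β); in the exponential limit the Lorenz curve has the closed form L(p) = p + (1 − p)log(1 − p), which serves as a check.

```python
import math


def olllg_quantile(u, alpha, beta, Q_G):
    p = -math.expm1(u * math.log1p(-beta)) / beta
    s = p ** (1.0 / alpha)
    return Q_G(s / (s + (1.0 - p) ** (1.0 / alpha)))


def incomplete_moment(p, n, alpha, beta, Q_G, m=100000):
    """m_n(y) with y = Q_X(p): int_0^p Q_X(u)^n du (midpoint rule)."""
    h = p / m
    return h * sum(olllg_quantile((j + 0.5) * h, alpha, beta, Q_G) ** n for j in range(m))


def lorenz(p, alpha, beta, Q_G):
    """Lorenz curve L(p) = m_1(Q_X(p)) / E(X)."""
    return incomplete_moment(p, 1, alpha, beta, Q_G) / incomplete_moment(1.0, 1, alpha, beta, Q_G)


exp_Q = lambda G: -math.log1p(-G)
L_half = lorenz(0.5, 1.0, 1e-8, exp_Q)   # exponential limit; exact value is 0.5 + 0.5*log(0.5)
B_half = L_half / 0.5                    # Bonferroni curve at p = 0.5
print(L_half, B_half)
```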
Let $M(t)=E(e^{tX})$ be the mgf of X. We can obtain M(t) from (7) as

$$M(t)=\sum_{k=0}^{\infty}d_{k+1}\,M_{k}(t)=\sum_{k=0}^{\infty}(k+1)\,d_{k+1}\,\rho(t,k), \qquad (10)$$

where $M_k(t)$ is the mgf of $Y_k$ and $\rho(t,k)=\int_{-\infty}^{\infty}{\mathrm{e}}^{t\,x}\,G(x)^{k}\,g(x)\,\mathrm{d}x=\int_{0}^{1}\exp[t\,Q_{G}(u)]\,u^{k}\,\mathrm{d}u$.
We can determine the mgfs for several OLLLG distributions directly from Eq. (10).
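As a numerical counterpart of (10), M(t) = ∫₀¹ exp[t Q_X(u)] du. The sketch assumes (2) has the compounding form log[1 − βA]/log(1 − β). The integral is finite only for t below the right-tail decay rate; for the OLLLE model the survival function decays like e^{−αλx}, so M(t) exists for t < αλ. The exponential limit (M(t) = 1/(1 − t) for unit rate) and the identity M′(0) = E(X) are used as checks.

```python
import math


def olllg_quantile(u, alpha, beta, Q_G):
    p = -math.expm1(u * math.log1p(-beta)) / beta
    s = p ** (1.0 / alpha)
    return Q_G(s / (s + (1.0 - p) ** (1.0 / alpha)))


def mgf(t, alpha, beta, Q_G, m=100000):
    """M(t) = int_0^1 exp(t * Q_X(u)) du (midpoint rule); only meaningful where M(t) is finite."""
    h = 1.0 / m
    return h * sum(math.exp(t * olllg_quantile((j + 0.5) * h, alpha, beta, Q_G)) for j in range(m))


exp_Q = lambda G: -math.log1p(-G)
M_quarter = mgf(0.25, 1.0, 1e-8, exp_Q)   # exponential limit; exact value 1 / (1 - 0.25)
delta = 1e-3
mean_from_mgf = (mgf(delta, 1.0, 1e-8, exp_Q) - mgf(-delta, 1.0, 1e-8, exp_Q)) / (2 * delta)
print(M_quarter, mean_from_mgf)
```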
We present some mathematical properties of the odd log-logistic logarithmic exponential (OLLLE) distribution in Appendix 3 to illustrate the applicability of the previous results.
Other properties
Entropies and order statistics play an important role in statistical analysis, especially in applied work.
5.1 Entropies
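An entropy is a measure of variation or uncertainty of a random variable X. Two popular entropy measures are the Rényi and Shannon entropies (Shannon 1951; Rényi 1961). The Rényi entropy of a random variable with pdf f(x) is defined by

$$I_{R}(\gamma)=\frac{1}{1-\gamma}\,\log\left(\int_{-\infty}^{\infty}f(x)^{\gamma}\,\mathrm{d}x\right)$$

for γ>0 and γ≠1. The Shannon entropy of a random variable X is defined by E{−log[f(X)]}; it is the limiting case of the Rényi entropy as γ→1. Direct calculation from (3) gives

$$E\{-\log[f(X)]\}=-\log\left[\frac{\alpha\beta}{-\log(1-\beta)}\right]-E\{\log[g(X)]\}-(\alpha-1)\,E\{\log[G(X)\,\bar{G}(X)]\}+E\{\log[G(X)^{\alpha}+\bar{G}(X)^{\alpha}]\}+E\{\log[(1-\beta)\,G(X)^{\alpha}+\bar{G}(X)^{\alpha}]\}.$$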
First, we define
By using the binomial expansion, we have
Second, we have the power series from Eq. (28) given in Appendix 2
where $s_{k}^{*}(\alpha, a_{3}+i,0)$ is defined there. Then,
After some algebraic manipulations, we can write
Then, the Shannon entropy of X reduces to
For the Rényi entropy, after some algebraic developments, we obtain
where
By using the generalized binomial expansion, we have
Further, we can write from Eq. (28)
where $s_{k}^{*}(\alpha, 2\gamma +i,\gamma)$ is defined there. Finally,
where $Y_k\sim\mathrm{Beta}(k+1+\alpha(\gamma+i)+j,1)$. Figure 5 displays plots of the Rényi entropy versus γ for a=3.5, b=1 and selected values of α and β.
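The series expressions above can be cross-checked by direct numerical integration. With X = Q_X(U), ∫f(x)^γ dx = E[f(X)^(γ−1)] and the Shannon entropy is −E[log f(X)], so both reduce to integrals over u∈(0,1). The sketch uses the OLLLE model (exponential baseline) and assumes its density has the compounding form of (3); the helper names are illustrative. It checks the exponential limit (Rényi entropy log 2 at γ=2 and Shannon entropy 1 for unit rate) and that the Rényi entropy approaches the Shannon entropy as γ→1.

```python
import math


def ollle_quantile(u, alpha, beta, lam):
    p = -math.expm1(u * math.log1p(-beta)) / beta
    s = p ** (1.0 / alpha)
    G = s / (s + (1.0 - p) ** (1.0 / alpha))
    return -math.log1p(-G) / lam


def ollle_pdf(x, alpha, beta, lam):
    Gbar, G = math.exp(-lam * x), -math.expm1(-lam * x)
    D = G**alpha + Gbar**alpha
    E = (1.0 - beta) * G**alpha + Gbar**alpha
    return (alpha * beta * lam * Gbar * (G * Gbar) ** (alpha - 1)
            / (-math.log1p(-beta) * D * E))


def entropies(alpha, beta, lam, gamma, m=100000):
    """(Renyi entropy of order gamma != 1, Shannon entropy), both by quantile integration."""
    h = 1.0 / m
    renyi_sum = shannon_sum = 0.0
    for j in range(m):
        f = ollle_pdf(ollle_quantile((j + 0.5) * h, alpha, beta, lam), alpha, beta, lam)
        renyi_sum += f ** (gamma - 1.0)
        shannon_sum += math.log(f)
    return math.log(renyi_sum * h) / (1.0 - gamma), -shannon_sum * h


renyi_exp, shannon_exp = entropies(1.0, 1e-8, 1.0, 2.0)            # exponential limit
renyi_near_one, shannon_general = entropies(2.0, 0.5, 1.0, 1.0001)  # gamma close to 1
print(renyi_exp, shannon_exp, renyi_near_one, shannon_general)
```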
5.2 Order statistics
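Order statistics make their appearance in many areas of statistical theory and practice. Suppose that $X_1,\ldots,X_n$ is a random sample from the OLLLG distribution and let $X_{i:n}$ denote the ith order statistic. From Eqs. (6) and (7), the pdf of $X_{i:n}$ can be written as

$$f_{i:n}(x)=K\,f(x)\,F(x)^{i-1}\,[1-F(x)]^{n-i}=K\sum_{j=0}^{n-i}(-1)^{j}\binom{n-i}{j}\,f(x)\,F(x)^{j+i-1},$$

where K=n!/[(i−1)!(n−i)!]. By using a result of Gradshteyn and Ryzhik (2000, Section 0.314) for a power series raised to a positive integer power, we obtain

$$F(x)^{j+i-1}=\left[\sum_{k=0}^{\infty}d_{k}\,G(x)^{k}\right]^{j+i-1}=\sum_{k=0}^{\infty}e_{j+i-1,k}\,G(x)^{k},$$

where $e_{j+i-1,0}=d_{0}^{j+i-1}$ and, for k≥1,

$$e_{j+i-1,k}=(k\,d_{0})^{-1}\sum_{q=1}^{k}\left[q\,(j+i)-k\right]d_{q}\,e_{j+i-1,k-q}.$$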
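Setting $d^{*}_{r}=(r+1)\,d_{r+1}$, so that $f(x)=g(x)\sum_{r=0}^{\infty}d^{*}_{r}\,G(x)^{r}$, and using a result of Gradshteyn and Ryzhik (2000, Section 0.316) for multiplying two power series, we have

$$f(x)\,F(x)^{j+i-1}=g(x)\sum_{k=0}^{\infty}e^{*}_{k}\,G(x)^{k},$$

where $e^{*}_{k}=\sum_{q=0}^{k}e_{j+i-1,q}\,d^{*}_{k-q}$ (note that $e^{*}_{k}$ depends on j through $e_{j+i-1,q}$). Hence, we can write

$$f_{i:n}(x)=\sum_{k=0}^{\infty}m_{k}\,h_{k+1}(x), \qquad (11)$$

where (for k≥0)

$$m_{k}=\frac{K}{k+1}\sum_{j=0}^{n-i}(-1)^{j}\binom{n-i}{j}\,e^{*}_{k}.$$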
Equation (11) is the main result of this section. It reveals that the pdf of the OLLLG order statistics is a linear combination of Exp-G densities. So, several mathematical quantities of the OLLLG order statistics, such as the ordinary, incomplete and factorial moments, mgf and mean deviations, among others, can be obtained from those quantities of the Exp-G distribution.
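The order-statistic moments can also be checked numerically without the coefficients in (11): E(X_{i:n}) = ∫₀¹ Q_X(u) K u^(i−1)(1−u)^(n−i) du. The sketch assumes (2) has the compounding form log[1 − βA]/log(1 − β). It checks the exponential limit, where E(X_{i:n}) = Σ_{j=n−i+1}^{n} 1/j, and the identity Σᵢ E(X_{i:n}) = n E(X) for an OLLLW example (a=2, b=1, arbitrary α and β).

```python
import math


def olllg_quantile(u, alpha, beta, Q_G):
    """Quantile function obtained by inverting the OLLLG cdf."""
    p = -math.expm1(u * math.log1p(-beta)) / beta
    s = p ** (1.0 / alpha)
    return Q_G(s / (s + (1.0 - p) ** (1.0 / alpha)))


def order_stat_mean(i, n, alpha, beta, Q_G, m=50000):
    """E(X_{i:n}) = int_0^1 Q_X(u) K u^(i-1) (1-u)^(n-i) du with K = n!/[(i-1)!(n-i)!]."""
    K = n * math.comb(n - 1, i - 1)
    h = 1.0 / m
    total = 0.0
    for j in range(m):
        u = (j + 0.5) * h
        total += olllg_quantile(u, alpha, beta, Q_G) * u ** (i - 1) * (1.0 - u) ** (n - i)
    return K * h * total


exp_Q = lambda G: -math.log1p(-G)                # unit-rate exponential baseline
weibull_Q = lambda G: (-math.log1p(-G)) ** 0.5   # Weibull baseline, a = 2, b = 1

# Exponential limit: E(X_{i:3}) = 1/3, 1/3 + 1/2 and 1/3 + 1/2 + 1
exp_means = [order_stat_mean(i, 3, 1.0, 1e-8, exp_Q) for i in (1, 2, 3)]
# OLLLW example: the order-statistic means add up to n times the mean
w_means = [order_stat_mean(i, 3, 0.5, 0.5, weibull_Q) for i in (1, 2, 3)]
w_mean = order_stat_mean(1, 1, 0.5, 0.5, weibull_Q)
print(exp_means, sum(w_means), 3 * w_mean)
```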
Estimation
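In this section, we determine the MLEs of the model parameters of the new family from complete samples. Let $x_1,\ldots,x_n$ be the observed values from the OLLLG distribution with parameters α, β and ξ, and let θ=(α,β,ξ)^⊤ be the parameter vector. Writing $g_i=g(x_i;\boldsymbol{\xi})$, $G_i=G(x_i;\boldsymbol{\xi})$ and $\bar{G}_i=1-G_i$, the total log-likelihood function for θ is given by

$$\ell_{n}(\boldsymbol{\theta})=n\log\left[\frac{\alpha\beta}{-\log(1-\beta)}\right]+\sum_{i=1}^{n}\log g_i+(\alpha-1)\sum_{i=1}^{n}\log\left(G_i\bar{G}_i\right)-\sum_{i=1}^{n}\log\left(G_i^{\alpha}+\bar{G}_i^{\alpha}\right)-\sum_{i=1}^{n}\log\left[(1-\beta)\,G_i^{\alpha}+\bar{G}_i^{\alpha}\right]. \qquad (12)$$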
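The log-likelihood function can be maximized either directly or by solving the nonlinear likelihood equations obtained by differentiating (12). We use the goodness.fit function in R (R Development Core Team 2013) and the NLMixed procedure in SAS to obtain the MLEs. The components of the score function $U_n(\boldsymbol{\theta})=(\partial\ell_n/\partial\alpha,\partial\ell_n/\partial\beta,\partial\ell_n/\partial\boldsymbol{\xi})^{\top}$ are

$$\frac{\partial\ell_n}{\partial\alpha}=\frac{n}{\alpha}+\sum_{i=1}^{n}\log\left(G_i\bar{G}_i\right)-\sum_{i=1}^{n}\frac{G_i^{\alpha}\log G_i+\bar{G}_i^{\alpha}\log\bar{G}_i}{G_i^{\alpha}+\bar{G}_i^{\alpha}}-\sum_{i=1}^{n}\frac{(1-\beta)\,G_i^{\alpha}\log G_i+\bar{G}_i^{\alpha}\log\bar{G}_i}{(1-\beta)\,G_i^{\alpha}+\bar{G}_i^{\alpha}},$$

$$\frac{\partial\ell_n}{\partial\beta}=\frac{n}{\beta}+\frac{n}{(1-\beta)\log(1-\beta)}+\sum_{i=1}^{n}\frac{G_i^{\alpha}}{(1-\beta)\,G_i^{\alpha}+\bar{G}_i^{\alpha}}$$

and

$$\frac{\partial\ell_n}{\partial\boldsymbol{\xi}}=\sum_{i=1}^{n}\frac{g_i^{(\boldsymbol{\xi})}}{g_i}+(\alpha-1)\sum_{i=1}^{n}G_i^{(\boldsymbol{\xi})}\left(\frac{1}{G_i}-\frac{1}{\bar{G}_i}\right)-\alpha\sum_{i=1}^{n}\frac{\left(G_i^{\alpha-1}-\bar{G}_i^{\alpha-1}\right)G_i^{(\boldsymbol{\xi})}}{G_i^{\alpha}+\bar{G}_i^{\alpha}}-\alpha\sum_{i=1}^{n}\frac{\left[(1-\beta)\,G_i^{\alpha-1}-\bar{G}_i^{\alpha-1}\right]G_i^{(\boldsymbol{\xi})}}{(1-\beta)\,G_i^{\alpha}+\bar{G}_i^{\alpha}},$$

where $g_i=g(x_i;\boldsymbol{\xi})$, $G_i=G(x_i;\boldsymbol{\xi})$, $\bar{G}_i=1-G_i$, and $h^{(\boldsymbol{\xi})}(\cdot)$ means the derivative of the function h with respect to ξ.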
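Analytic score functions are easy to get wrong, so comparing them with finite differences of the log-likelihood is a useful habit. The sketch assumes the log-likelihood (12) corresponds to the density in the compounding form of (3), uses an exponential baseline (the OLLLE model) and checks the α and β components; the toy data and parameter values are purely illustrative, not from the paper.

```python
import math


def parts(x, lam):
    """Exponential baseline: (g, G, Gbar) at x."""
    Gbar = math.exp(-lam * x)
    return lam * Gbar, -math.expm1(-lam * x), Gbar


def loglik(alpha, beta, lam, xs):
    """Log-likelihood of the OLLLE model."""
    ll = len(xs) * math.log(alpha * beta / -math.log1p(-beta))
    for x in xs:
        g, G, Gbar = parts(x, lam)
        ll += (math.log(g) + (alpha - 1) * math.log(G * Gbar)
               - math.log(G**alpha + Gbar**alpha)
               - math.log((1 - beta) * G**alpha + Gbar**alpha))
    return ll


def score_alpha_beta(alpha, beta, lam, xs):
    """Analytic d(loglik)/d(alpha) and d(loglik)/d(beta)."""
    n = len(xs)
    ua = n / alpha
    ub = n / beta + n / ((1 - beta) * math.log1p(-beta))
    for x in xs:
        _, G, Gbar = parts(x, lam)
        lG, lGb = math.log(G), math.log(Gbar)
        D = G**alpha + Gbar**alpha
        E = (1 - beta) * G**alpha + Gbar**alpha
        ua += (lG + lGb - (G**alpha * lG + Gbar**alpha * lGb) / D
               - ((1 - beta) * G**alpha * lG + Gbar**alpha * lGb) / E)
        ub += G**alpha / E
    return ua, ub


xs = [0.2, 0.5, 0.9, 1.4, 2.3]     # toy data, for illustration only
theta = (1.8, 0.6, 1.0)            # alpha, beta, lambda
eps = 1e-6
num_a = (loglik(theta[0] + eps, *theta[1:], xs) - loglik(theta[0] - eps, *theta[1:], xs)) / (2 * eps)
num_b = (loglik(theta[0], theta[1] + eps, theta[2], xs)
         - loglik(theta[0], theta[1] - eps, theta[2], xs)) / (2 * eps)
ua, ub = score_alpha_beta(*theta, xs)
print((ua, ub), (num_a, num_b))
```

The same pattern extends to the baseline parameters by differentiating g and G with respect to ξ.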
6.1 Simulation study
We simulate the OLLLSN distribution (with μ=0, σ=1, α=0.2, 0.5 and β=−0.5, 0.7) from Eq. (5) by using a random variable U having a uniform distribution on (0,1). We generate samples of sizes n=50, 150 and 300 and, for each replication, we evaluate the MLEs $\hat\mu$, $\hat\sigma$, $\hat\alpha$ and $\hat\beta$. We repeat this process 2000 times and determine the average estimates (AEs), biases and mean squared errors (MSEs). The results are reported in Table 2.
The figures in Table 2 indicate that the biases and MSEs of the estimates of μ, σ, α and β decay toward zero as the sample size increases, as expected under first-order asymptotic theory. As n increases, the AEs of the parameters tend to be closer to the true parameter values. This fact supports that the asymptotic normal distribution provides an adequate approximation to the finite sample distribution of the MLEs.
The OLLLG family with long-term survival
Models for survival analysis typically assume that every subject in the population under study is susceptible to the event of interest and will eventually experience such an event if follow-up is sufficiently long. However, there are situations in which a fraction of individuals is not expected to experience the event of interest, that is, those individuals are cured or not susceptible. Cure rate models for survival data have been used to model time-to-event data for various types of cancer, including breast cancer, non-Hodgkin's lymphoma, leukemia, prostate cancer and melanoma. These models have become very popular due to significant progress in treatment therapies leading to enhanced cure rates.
Models to accommodate a cured fraction have been widely developed. Perhaps the most popular type of cure rate models are the mixture models (MMs) developed by Boag (1949), Berkson and Gage (1952) and Farewell (1982). Several long-term survival models for lifetimes with covariates have recently been proposed in the literature. For example, Ortega et al. (2012) considered the problem of assessing local influence in the negative binomial beta Weibull regression model to predict the cure of prostate cancer, Hashimoto et al. (2013) derived curvature quantities under various perturbation schemes in the Neyman type A beta-Weibull model for long-term survivors, Fachini et al. (2014) adapted local influence methods to a bivariate regression model with a cure fraction and, more recently, Ortega et al. (2015) applied local influence methods to the power series beta-Weibull regression model for predicting breast carcinoma. The MMs allow simultaneous estimation of whether the event of interest will occur, which is called incidence, and when it will occur, given that it can occur, which is called latency. Let $N_i$ (for i=1,…,n) be the indicator denoting that the ith individual is susceptible ($N_i=1$) or non-susceptible ($N_i=0$), i.e., the population is divided into two subpopulations so that an individual either is cured with probability 0<p<1, or has a proper survival function S(x) with probability (1−p). In this work, we do not consider a regression structure, although covariates on the probability p may be investigated in future research. The MM can be represented by
$$S_{pop}(x_i)=p+(1-p)\,S(x_i|N_i=1), \qquad (13)$$

where $S_{pop}(x_i)$ is the unconditional survival function of $x_i$ for the entire population, $S(x_i|N_i=1)=1-F(x_i|N_i=1)$ is the survival function for the susceptible individuals and $p=P(N_i=0)$ is the probability of cure of an individual. The pdf corresponding to (13) is given by
$$f_{pop}(x_i)=(1-p)\,f(x_i|N_i=1), \qquad (14)$$

where $f(x_i|N_i=1)$ is the baseline pdf (see Section 2.1) for the susceptible individuals. Equations (13) and (14) are improper functions: $S_{pop}(x)\to p>0$ as $x\to\infty$, so $S_{pop}(x)$ is not a proper survival function. We sometimes omit the dependence on the indicator $N_i$ and write simply $S(x_i|N_i=1)=S(x)$, $f(x_i|N_i=1)=f(x)$, etc.
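Inserting (3) in (14) and (2) in (13), the pdf and survival function of the OLLLG cure rate family are given, respectively, by

$$f_{pop}(x)=\frac{(1-p)\,\alpha\,\beta\,g(x)\,G(x)^{\alpha-1}\,\bar{G}(x)^{\alpha-1}}{-\log(1-\beta)\left[G(x)^{\alpha}+\bar{G}(x)^{\alpha}\right]\left[(1-\beta)\,G(x)^{\alpha}+\bar{G}(x)^{\alpha}\right]} \qquad (15)$$

and

$$S_{pop}(x)=p+(1-p)\left\{1-\frac{\log\left[1-\beta\,G(x)^{\alpha}/\{G(x)^{\alpha}+\bar{G}(x)^{\alpha}\}\right]}{\log(1-\beta)}\right\}. \qquad (16)$$

A random variable having density (15) is denoted by X∼OLLLGcr(α,β,ξ,p). The hrf of the OLLLGcr model is given by $h_{pop}(x)=f_{pop}(x)/S_{pop}(x)$.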
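A minimal sketch of the cure rate quantities with a Weibull baseline (the OLLLWcr model used in Section 8.4), assuming (15) and (16) are (1 − p) times the density in the compounding form of (3) and p + (1 − p)(1 − F). It checks that −dS_pop/dx reproduces f_pop, that S_pop levels off at the cure fraction p, and that the population hazard vanishes in the far tail. The parameter values are illustrative.

```python
import math


def olllw_cure(x, alpha, beta, a, b, p):
    """Improper density f_pop and survival S_pop of the OLLLW cure rate model at x > 0."""
    t = (b * x) ** a
    Gbar, G = math.exp(-t), -math.expm1(-t)
    g = a * b * (b * x) ** (a - 1) * Gbar
    D = G**alpha + Gbar**alpha
    E = (1.0 - beta) * G**alpha + Gbar**alpha
    c = -math.log1p(-beta)
    f = alpha * beta * g * G ** (alpha - 1) * Gbar ** (alpha - 1) / (c * D * E)
    S = math.log1p(beta * Gbar**alpha / ((1.0 - beta) * D)) / c    # 1 - F, computed stably
    return (1.0 - p) * f, p + (1.0 - p) * S


theta = (2.0, 0.6, 1.5, 0.8, 0.35)   # alpha, beta, a, b, p (illustrative)
x, eps = 1.2, 1e-5
f_pop, _ = olllw_cure(x, *theta)
slope = -(olllw_cure(x + eps, *theta)[1] - olllw_cure(x - eps, *theta)[1]) / (2 * eps)
S_far = olllw_cure(30.0, *theta)[1]  # plateau at the cure fraction p
h_pop = [olllw_cure(v, *theta)[0] / olllw_cure(v, *theta)[1] for v in (1.0, 5.0, 30.0)]
print(f_pop, slope, S_far, h_pop)
```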
7.1 Estimation
We consider the situation when the time-to-event is not completely observed and is subject to right censoring. Let $X_i$ and $c_i$ denote the lifetime and the censoring time of the ith individual. We observe $x_i=\min\{X_i,c_i\}$ and $\delta_i=I(X_i\le c_i)$, where $\delta_i=1$ if $x_i$ is a time-to-event and $\delta_i=0$ if $x_i$ is right censored (for i=1,…,n). From n pairs of times and censoring indicators $(x_1,\delta_1),\ldots,(x_n,\delta_n)$, the log-likelihood function under non-informative censoring is given by $\ell_{n}(\boldsymbol{\theta})=\sum_{i\in F}\log f_{pop}(x_{i};\boldsymbol{\theta})+\sum_{i\in C}\log S_{pop}(x_{i};\boldsymbol{\theta})$, where θ=(α,β,ξ,p)^T denotes the parameter vector and F and C denote the uncensored and censored sets of observations, respectively. Replacing $f_{pop}(x_i;\boldsymbol{\theta})$ and $S_{pop}(x_i;\boldsymbol{\theta})$ by (15) and (16), respectively, the log-likelihood reduces to
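$$\ell_{n}(\boldsymbol{\theta})=r\log\left[\frac{(1-p)\,\alpha\beta}{-\log(1-\beta)}\right]+\sum_{i\in F}\left\{\log g(x_i)+(\alpha-1)\log[G(x_i)\,\bar{G}(x_i)]-\log[G(x_i)^{\alpha}+\bar{G}(x_i)^{\alpha}]-\log[(1-\beta)\,G(x_i)^{\alpha}+\bar{G}(x_i)^{\alpha}]\right\}+\sum_{i\in C}\log\left\{p+(1-p)\left[1-\frac{\log\{1-\beta\,G(x_i)^{\alpha}/[G(x_i)^{\alpha}+\bar{G}(x_i)^{\alpha}]\}}{\log(1-\beta)}\right]\right\}, \qquad (17)$$

where r is the number of failures (uncensored observations). We can obtain the MLE $\widehat{\boldsymbol{\theta}}$ of θ by maximizing the log-likelihood (17) either directly (for example, with the optim function in R, the NLMixed procedure in SAS or other statistical software) or by solving the nonlinear likelihood equations obtained by differentiating (17).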
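The censored log-likelihood is just a sum of log f_pop over failures and log S_pop over censored times, so it can be coded generically and then compared with the expanded form (17). The sketch assumes (15) and (16) as stated (with the density in the compounding form of (3)), a Weibull baseline, and toy data that are purely illustrative; the result could be passed to any numerical optimizer.

```python
import math


def olllw_cure(x, alpha, beta, a, b, p):
    """f_pop and S_pop of the OLLLW cure rate model at x >= 0."""
    t = (b * x) ** a
    Gbar, G = math.exp(-t), -math.expm1(-t)
    g = a * b * (b * x) ** (a - 1) * Gbar
    D = G**alpha + Gbar**alpha
    E = (1.0 - beta) * G**alpha + Gbar**alpha
    c = -math.log1p(-beta)
    f = alpha * beta * g * G ** (alpha - 1) * Gbar ** (alpha - 1) / (c * D * E)
    S = math.log1p(beta * Gbar**alpha / ((1.0 - beta) * D)) / c
    return (1.0 - p) * f, p + (1.0 - p) * S


def cure_loglik(theta, times, events):
    """Sum of log f_pop over failures (event = 1) and log S_pop over censored times (event = 0)."""
    ll = 0.0
    for x, d in zip(times, events):
        f_pop, S_pop = olllw_cure(x, *theta)
        ll += math.log(f_pop) if d == 1 else math.log(S_pop)
    return ll


def loglik_eq17(theta, times, events):
    """The same quantity written term by term as in (17)."""
    alpha, beta, a, b, p = theta
    r = sum(events)
    ll = r * math.log((1 - p) * alpha * beta / -math.log1p(-beta))
    for x, d in zip(times, events):
        t = (b * x) ** a
        Gbar, G = math.exp(-t), -math.expm1(-t)
        if d == 1:
            g = a * b * (b * x) ** (a - 1) * Gbar
            ll += (math.log(g) + (alpha - 1) * math.log(G * Gbar)
                   - math.log(G**alpha + Gbar**alpha)
                   - math.log((1 - beta) * G**alpha + Gbar**alpha))
        else:
            A = G**alpha / (G**alpha + Gbar**alpha)
            ll += math.log(p + (1 - p) * (1 - math.log1p(-beta * A) / math.log1p(-beta)))
    return ll


theta = (2.0, 0.6, 1.5, 0.8, 0.35)                        # alpha, beta, a, b, p (illustrative)
times, events = [0.5, 1.1, 2.0, 3.5], [1, 1, 0, 0]        # toy right-censored data
ll_direct = cure_loglik(theta, times, events)
ll_eq17 = loglik_eq17(theta, times, events)
print(ll_direct, ll_eq17)
```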
Applications
In this section, we provide four applications to real data. In the first three applications, we present some results by fitting special models defined in Section 2.1. In the fourth application, we use the long-term survival model defined in Section 7.
For the first three applications, the goodness-of-fit statistics including the Cramér-von Mises ($W^{*}$) and Anderson-Darling ($A^{*}$) test statistics are used to compare the fitted models; see Chen and Balakrishnan (1995) for more details. The smaller the values of $A^{*}$ and $W^{*}$, the better the fit to the data. We also consider the Kolmogorov-Smirnov (KS) statistic (and its corresponding p-value) and minus the maximized log-likelihood ($\hat\ell_{n}$) for the sake of comparison. For the fourth application (censored data), we adopt the AIC and BIC statistics to compare the fitted models, since the $A^{*}$ and $W^{*}$ statistics are not suitable for censored data.
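The statistics used for model comparison can be computed from the fitted cdf values u_(i) = F̂(x_(i)) of the ordered data. The sketch below computes the classical KS, Cramér-von Mises (W²) and Anderson-Darling (A²) statistics, plus AIC and BIC from a maximized log-likelihood. The starred versions W* and A* of Chen and Balakrishnan (1995) add a transformation step and small-sample corrections that are not reproduced here, and a KS p-value would additionally require the Kolmogorov distribution. The toy check places points exactly at the plotting positions (2i−1)/(2n), where KS equals 1/(2n) and W² attains its minimum 1/(12n).

```python
import math


def gof_statistics(data, cdf):
    """Classical KS, Cramer-von Mises (W^2) and Anderson-Darling (A^2) statistics for a fitted cdf."""
    z = sorted(cdf(x) for x in data)
    n = len(z)
    ks = max(max((i + 1) / n - zi, zi - i / n) for i, zi in enumerate(z))
    w2 = sum((zi - (2 * i + 1) / (2 * n)) ** 2 for i, zi in enumerate(z)) + 1 / (12 * n)
    a2 = -n - sum((2 * i + 1) * (math.log(z[i]) + math.log1p(-z[n - 1 - i]))
                  for i in range(n)) / n
    return ks, w2, a2


def aic_bic(max_loglik, k, n):
    """AIC and BIC from the maximized log-likelihood with k parameters and n observations."""
    return -2 * max_loglik + 2 * k, -2 * max_loglik + k * math.log(n)


pts = [0.125, 0.375, 0.625, 0.875]   # plotting positions for n = 4 under the identity cdf
ks, w2, a2 = gof_statistics(pts, lambda x: x)
aic, bic = aic_bic(-10.0, 4, 100)    # illustrative numbers
print(ks, w2, a2, aic, bic)
```

In an application, `cdf` would be the fitted model cdf, for example the OLLLN cdf evaluated at the MLEs.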
For the next three applications, we consider the OLLLN distribution and, for the purpose of comparison, we fit the following models to the data sets described below:

The normal distribution.

The exponentiated normal (EN) distribution.

The logarithmic normal (LN) distribution, the special case of the OLLLN distribution when α=1.

The beta normal (BN) distribution (Eugene et al. 2002) with density
$$\begin{array}{@{}rcl@{}} f_{BN}(x)=\frac{1}{\sigma B(\alpha,\beta)} \left[\Phi\left(\frac{x-\mu}{\sigma}\right)\right]^{\alpha-1}\left[1-\Phi\left(\frac{x-\mu}{\sigma}\right)\right]^{\beta-1} \phi\left(\frac{x-\mu}{\sigma}\right). \end{array} $$
The gamma normal (GN) distribution (Alzaatreh et al. 2014) with density
$$\begin{array}{@{}rcl@{}} f_{GN}(x)=\frac{\beta^{\alpha}}{\sigma\Gamma(\alpha)} \left[-\log\left\{1-\Phi\left(\frac{x-\mu}{\sigma}\right)\right\}\right]^{\alpha-1}\left[1-\Phi\left(\frac{x-\mu}{\sigma}\right)\right]^{\beta-1} \phi\left(\frac{x-\mu}{\sigma}\right). \end{array} $$
The Kumaraswamy normal (KN) distribution (Cordeiro and de Castro 2011) with density
$$\begin{array}{@{}rcl@{}} f_{KN}(x)=\frac{\alpha\beta}{\sigma} \left[\Phi\left(\frac{x-\mu}{\sigma}\right)\right]^{\alpha-1} \left\{1-\left[\Phi\left(\frac{x-\mu}{\sigma}\right)\right]^{\alpha}\right\}^{\beta-1}\phi\left(\frac{x-\mu}{\sigma}\right). \end{array} $$
The odd log-logistic normal (OLLN) distribution (the limiting case of the OLLLN distribution as β→0) with density (Braga et al. 2016)
$$\begin{array}{@{}rcl@{}} f_{OLLN}(x)=\frac{\alpha\, \phi\left(\frac{x-\mu}{\sigma}\right)\left[\Phi\left(\frac{x-\mu}{\sigma}\right)\right]^{\alpha-1} \left[1-\Phi\left(\frac{x-\mu}{\sigma}\right)\right]^{\alpha-1}}{\sigma \left\{\left[1-\Phi\left(\frac{x-\mu}{\sigma}\right)\right]^{\alpha}+\left[\Phi\left(\frac{x-\mu}{\sigma}\right)\right]^{\alpha}\right\}^{2}}, \end{array} $$
where $x\in \mathbb {R}$, $\mu \in \mathbb {R}$, α>0, β>0 and σ>0.
8.1 Application 1
First, we consider the data set representing the failure times of a particular windshield device. These data were also studied by Blischke and Murthy (2000) and Murthy et al. (2004). The data, referred to as D1, are: 0.040, 1.866, 2.385, 3.443, 0.301, 1.876, 2.481, 3.467, 0.309, 1.899, 2.610, 3.478, 0.557, 1.911, 2.625, 3.578, 0.943, 1.912, 2.632, 3.595, 1.070, 1.914, 2.646, 3.699, 1.124, 1.981, 2.661, 3.779, 1.248, 2.010, 2.688, 3.924, 1.281, 2.038, 2.823, 4.035, 1.281, 2.085, 2.890, 4.121, 1.303, 2.089, 2.902, 4.167, 1.432, 2.097, 2.934, 4.240, 1.480, 2.135, 2.962, 4.255, 1.505, 2.154, 2.964, 4.278, 1.506, 2.190, 3.000, 4.305, 1.568, 2.194, 3.103, 4.376, 1.615, 2.223, 3.114, 4.449, 1.619, 2.224, 3.117, 4.485, 1.652, 2.229, 3.166, 4.570, 1.652, 2.300, 3.344, 4.602, 1.757, 2.324, 3.376, 4.663.
The MLEs of the parameters and the standard errors (SEs) in parentheses and the goodness-of-fit statistics for D1 are listed in Table 3. We note that the OLLLN model outperforms all the fitted competitive models under these statistics.
The histogram of the data D1 and the fitted densities are displayed in Fig. 6. The fitted OLLLN distribution best captures the empirical histogram. Based on the equations given in Section 4, we give some measures based on the moments of the fitted OLLLN distribution. The expected value and the variance of the failure times of windshield devices are E(X)=2.58 and Var(X)=1.24, respectively. Also, the skewness and kurtosis measures are Skewness(X)=0.23 and Kurtosis(X)=2.42, indicating a slightly longer right tail and a platykurtic distribution (kurtosis below 3, the value for the normal distribution).
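As a rough plausibility check of the fitted moments reported above, the sample mean, variance and moment ratios of D1 can be computed directly from the listed data. The sample values are not reported in the article; this sketch only uses the data as given, with the population (divisor n) moment ratios.

```python
d1 = [0.040, 1.866, 2.385, 3.443, 0.301, 1.876, 2.481, 3.467,
      0.309, 1.899, 2.610, 3.478, 0.557, 1.911, 2.625, 3.578,
      0.943, 1.912, 2.632, 3.595, 1.070, 1.914, 2.646, 3.699,
      1.124, 1.981, 2.661, 3.779, 1.248, 2.010, 2.688, 3.924,
      1.281, 2.038, 2.823, 4.035, 1.281, 2.085, 2.890, 4.121,
      1.303, 2.089, 2.902, 4.167, 1.432, 2.097, 2.934, 4.240,
      1.480, 2.135, 2.962, 4.255, 1.505, 2.154, 2.964, 4.278,
      1.506, 2.190, 3.000, 4.305, 1.568, 2.194, 3.103, 4.376,
      1.615, 2.223, 3.114, 4.449, 1.619, 2.224, 3.117, 4.485,
      1.652, 2.229, 3.166, 4.570, 1.652, 2.300, 3.344, 4.602,
      1.757, 2.324, 3.376, 4.663]

n = len(d1)
mean = sum(d1) / n
var = sum((x - mean) ** 2 for x in d1) / n      # divisor n, matching the moment ratios below
m3 = sum((x - mean) ** 3 for x in d1) / n
m4 = sum((x - mean) ** 4 for x in d1) / n
skewness, kurtosis = m3 / var**1.5, m4 / var**2
print(n, round(mean, 3), round(var, 3), round(skewness, 3), round(kurtosis, 3))
```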
8.2 Application 2
The second data set D2 consists of the lifetimes of 43 blood cancer patients (in days) from one of the Health Hospitals in Saudi Arabia (Abouammoh and Abdulghani 1994). These data are: 115, 181, 255, 418, 441, 461, 516, 739, 743, 789, 807, 865, 924, 983, 1025, 1062, 1063, 1165, 1191, 1222, 1222, 1251, 1277, 1290, 1357, 1369, 1408, 1455, 1478, 1519, 1578, 1578, 1599, 1603, 1605, 1696, 1735, 1799, 1815, 1852, 1899, 1925, 1965.
The MLEs of the parameters and SEs (in parentheses) and the goodness-of-fit statistics for D2 are listed in Table 4. The KN model has the smallest $\hat\ell_{n}$ and the OLLLN model has the third smallest $\hat\ell_{n}$; the BN model has the smallest $W^{*}$, and the OLLLN and KN models have the second smallest $W^{*}$ values. The OLLLN and BN models have the smallest $A^{*}$ values. The OLLLN and GN models have the smallest KS statistics. The GN model has the largest p-value, and the OLLLN and KN models have the second largest p-values. Therefore, the OLLLN model either has the best fit or is very close to the best fit with respect to the current criteria. The histogram of D2 and the fitted densities are displayed in Fig. 7.
8.3 Application 3
The third data set D3 includes the lower discharge of at least seven consecutive days and return period (time) of ten years of the Cuiabá River, Cuiabá, Mato Grosso, Brazil. These data have also been studied by Cordeiro et al. (2012). The MLEs of the parameters and SEs (in parentheses) and the goodness-of-fit statistics for D3 are listed in Table 5. We note that the OLLLN model outperforms all other fitted models.
The fitted densities for the models listed in Table 5 are displayed in Fig. 8. We verify that the fitted OLLLN distribution best captures the histogram of these data.
In summary, the OLLLN distribution outperforms all the fitted competitive models under the selected criteria for D1 and D3, and it has the best fit or is very close to the best fit for D2. For all three data sets, the fitted OLLLN distribution best captures the histograms, especially for the third data set, which indicates the outstanding performance of this distribution.
8.4 Application 4: OLLLG long-term survival models
These data consist of n=493 lifetimes ($t_i$ in months) of patients diagnosed with breast cancer. The steps to construct these data can be found in Gendoo et al. (2015). In many applications there is qualitative information about the hazard shape, which can help in selecting a particular polyhazard model. In this context, a device called the total time on test (TTT) plot is useful. The TTT plot is obtained by plotting $\mathsf{G}(r/n)=\left[\left(\sum_{i=1}^{r}T_{i:n}\right)+(n-r)T_{r:n}\right]/\left(\sum_{i=1}^{n}T_{i:n}\right)$ against r/n, where r=1,…,n and $T_{i:n}$ (for i=1,…,n) are the order statistics of the sample. It is a straight diagonal for constant hazards, leading to an exponential model. It is convex for decreasing hazards and concave for increasing hazards, leading to a single Weibull model. It is first convex and then concave if the hazard is bathtub-shaped, leading to a bi-Weibull model. It is first concave and then convex if the hazard is unimodal, leading to a log-logistic model. For multimodal hazards, the TTT plot contains several concave and convex regions. The TTT plot in Fig. 10a indicates an increasing-decreasing-increasing hrf, so the OLLLW distribution would be a good option to model these data.
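The scaled TTT transform defined above only needs the sorted sample and cumulative sums. A minimal sketch (pure Python, returning the plotting points rather than drawing them; the small input is a toy example):

```python
def scaled_ttt(times):
    """Empirical scaled TTT transform: pairs (r/n, G(r/n)) for r = 1, ..., n."""
    t = sorted(times)
    n, total = len(t), sum(t)
    points, partial = [], 0.0
    for r in range(1, n + 1):
        partial += t[r - 1]                                  # sum of the r smallest times
        points.append((r / n, (partial + (n - r) * t[r - 1]) / total))
    return points


pts = scaled_ttt([3.0, 1.0, 2.0])
print(pts)
```

Plotting these points against the diagonal (with any plotting library) gives the TTT plot; the last point is always (1, 1).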
Next, we compare the results by fitting the OLLLWcr model and some of its submodels: the odd log-logistic Weibull cure rate (OLLWcr) model (the limiting case of the OLLLWcr distribution as β→0), the logarithmic Weibull cure rate (LWcr) model (OLLLWcr distribution with α=1) and the Weibull cure rate model (α=1 and β→0). Table 6 provides the MLEs (and the corresponding SEs in parentheses) of the model parameters and the values of the AIC and BIC statistics. The OLLLWcr model has the lowest values of these statistics among the fitted models, and therefore it could be chosen as the best model. On the other hand, the proportion of cured individuals obtained by the Kaplan-Meier estimator is 0.577. Thus, based on the figures in Table 6, we conclude that the OLLLWcr model gives a more accurate estimate of the proportion of cured individuals.
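The nonparametric cured proportion is commonly read off as the level at which the Kaplan-Meier curve plateaus after the last observed failure. A minimal product-limit sketch in pure Python (failures at a tied time are counted before censored observations leave the risk set; the toy data are illustrative, not the breast cancer data):

```python
def kaplan_meier(times, events):
    """Product-limit estimate: [(t, S(t))] at each distinct failure time (event 1 = failure, 0 = censored)."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv, curve, i = 1.0, [], 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        while i < len(data) and data[i][0] == t:      # group tied times
            if data[i][1] == 1:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= deaths + censored
    return curve


curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 0])
print(curve)   # the last value is the plateau level of the curve
```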
The adequacy of the fitted models can also be noted in Fig. 9, which presents the empirical and estimated survival functions. Based on these plots, we conclude that the OLLLWcr model provides a good fit to the breast cancer data. In addition, the empirical scaled TTT transform can be used to identify the shape of the hazard function. The fitted hazard function for the OLLLWcr model is displayed in Fig. 10b; it has a bimodal shape, thus indicating a good fit.
Conclusions
We study some mathematical properties of the odd log-logistic logarithmic-G family of distributions with two extra shape parameters α>0 and β∈(0,1). We provide some special models, a very useful linear representation for the density function in terms of exponentiated densities, and explicit expressions for the moments, generating function, entropies and order statistics. The model parameters are estimated by the method of maximum likelihood. We perform a simulation study to verify the adequacy of the estimators. We also introduce a long-term survival model based on the new family. The importance of the proposed models is illustrated by means of four real-life data sets. The new models provide consistently better fits than other competitive models for the current data.
Appendix 1: Asymptotes and shapes
Corollary 1
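Let $a=\inf\{x\,|\,F(x)>0\}$. The asymptotics of Eqs. (2), (3) and (4) when x→a are given by

$$F(x)\sim\frac{\beta\,G(x)^{\alpha}}{-\log(1-\beta)},\qquad f(x)\sim\frac{\alpha\beta\,g(x)\,G(x)^{\alpha-1}}{-\log(1-\beta)},\qquad h(x)\sim\frac{\alpha\beta\,g(x)\,G(x)^{\alpha-1}}{-\log(1-\beta)}.$$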
Corollary 2
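The asymptotics of Eqs. (2), (3) and (4) when x→∞ are given by

$$1-F(x)\sim\frac{\beta\,\bar{G}(x)^{\alpha}}{-(1-\beta)\log(1-\beta)},\qquad f(x)\sim\frac{\alpha\beta\,g(x)\,\bar{G}(x)^{\alpha-1}}{-(1-\beta)\log(1-\beta)},\qquad h(x)\sim\frac{\alpha\,g(x)}{\bar{G}(x)}.$$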
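The shapes of the density and hazard rate functions can be described analytically. The critical points of the OLLLG density function are the roots of the equation

$$\frac{g'(x)}{g(x)}+(\alpha-1)\,g(x)\left[\frac{1}{G(x)}-\frac{1}{\bar{G}(x)}\right]-\frac{\alpha\,g(x)\left[G(x)^{\alpha-1}-\bar{G}(x)^{\alpha-1}\right]}{G(x)^{\alpha}+\bar{G}(x)^{\alpha}}-\frac{\alpha\,g(x)\left[(1-\beta)\,G(x)^{\alpha-1}-\bar{G}(x)^{\alpha-1}\right]}{(1-\beta)\,G(x)^{\alpha}+\bar{G}(x)^{\alpha}}=0. \qquad (18)$$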
There may be more than one root to (18). Let $\lambda (x)=\frac {{\mathrm {d}}^{2}\log [f(x)] }{{\mathrm {d}} x^{2}}$. We have
If $x=x_0$ is a root of (18), then it corresponds to a local maximum if $\lambda(x_0)<0$ and to a local minimum if $\lambda(x_0)>0$. If $\lambda(x_0)=0$, the second-derivative test is inconclusive and $x_0$ may be a point of inflection.
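The critical points of h(x) are obtained from the equation

$$\frac{\mathrm{d}\log[f(x)]}{\mathrm{d}x}+h(x)=0, \qquad (19)$$

which follows from $\log h(x)=\log f(x)-\log[1-F(x)]$.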
There may be more than one root to (19).
Appendix 2: Useful power series
The power series derived in this appendix are required for the proofs of the linear representations in Section 3. All power series given below are convergent for |u|≤1. In Sections 3 and 5.1, they can be applied on the support of X, since the quantity $\beta G(x;\boldsymbol{\xi})^{\alpha}/[G(x;\boldsymbol{\xi})^{\alpha}+\bar{G}(x;\boldsymbol{\xi})^{\alpha}]$ belongs to the interval (0,1) when β∈(0,1).
First, for a>0, we have the generalized binomial expansion
$$(1-u)^{a}=\sum_{k=0}^{\infty}(-1)^{k}\binom{a}{k}u^{k}, \qquad (20)$$
which holds for |u|≤1.
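A quick numerical check of the generalized binomial expansion for a non-integer exponent: the sketch below compares a truncated sum of $(-1)^k\binom{a}{k}u^k$ with $(1-u)^a$. The truncation length and the function names are arbitrary illustrative choices.

```python
def gen_binom(a, k):
    """Generalized binomial coefficient C(a, k) = a(a-1)...(a-k+1)/k! for real a."""
    c = 1.0
    for i in range(k):
        c *= (a - i) / (i + 1)
    return c


def binomial_series(a, u, n_terms=200):
    """Partial sum of sum_{k>=0} (-1)^k C(a, k) u^k, which approximates (1 - u)^a."""
    return sum((-1) ** k * gen_binom(a, k) * u ** k for k in range(n_terms))


if __name__ == "__main__":
    a, u = 2.5, 0.3
    print(binomial_series(a, u), (1.0 - u) ** a)
```

For integer a the coefficients vanish for k>a, so the series reduces to the ordinary binomial theorem.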
Second, we obtain an expression for $\left [\frac {u^{\alpha }}{u^{\alpha }+(1-u)^{\alpha }}\right ]^{m}$, where α>0 is a real number, m is a natural number and |u|≤1. We can write
where
The power series (20) and (21) and the others derived from them converge everywhere. For any real α>0, the following power series follows from (20) and (21):
where $b_{k}(\alpha)=a_{k}(\alpha)+(-1)^{k}\,\binom {\alpha }{k}$. Combining (21) and (22), we have (see Gradshteyn and Ryzhik 2000, Section 0.313)
where $c_{0}(\alpha)=a_{0}(\alpha)/b_{0}(\alpha)$ and the $c_{k}(\alpha)$'s (for k≥1) are determined from the recurrence equation.
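The recurrence for the $c_k(\alpha)$'s is not reproduced in this section. The sketch below uses the standard recurrence obtained by matching coefficients of $u^k$ in $\left(\sum_k b_k u^k\right)\left(\sum_k c_k u^k\right)=\sum_k a_k u^k$, namely $c_0=a_0/b_0$ and $c_k=b_0^{-1}\big(a_k-\sum_{r=1}^{k}b_r c_{k-r}\big)$ for k≥1; we assume it is equivalent to the omitted one. It works on any truncated coefficient lists with $b_0\neq 0$, and the function name is hypothetical.

```python
def divide_series(a, b):
    """First len(a) coefficients c_k of (sum_k a_k u^k) / (sum_k b_k u^k).

    Uses c_0 = a_0 / b_0 and c_k = (a_k - sum_{r=1}^{k} b_r c_{k-r}) / b_0,
    obtained by matching coefficients of u^k in b(u) c(u) = a(u).
    Missing b_r (r >= len(b)) are treated as zero.
    """
    if b[0] == 0:
        raise ValueError("b_0 must be nonzero")
    c = []
    for k in range(len(a)):
        s = sum(b[r] * c[k - r] for r in range(1, min(k, len(b) - 1) + 1))
        c.append((a[k] - s) / b[0])
    return c


if __name__ == "__main__":
    # 1 / (1 - u) = 1 + u + u^2 + ...
    print(divide_series([1.0, 0.0, 0.0, 0.0, 0.0], [1.0, -1.0]))
```

Only the first K coefficients of a and b are needed to obtain the first K coefficients of the quotient exactly, which is why truncation is safe here.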
Third, based on (23) and using a result of Gradshteyn and Ryzhik (2000, Section 0.314) for a power series raised to a positive integer power, we obtain
where $h^{*}_{0}(\alpha,m)=c_{0}(\alpha)^{m}$, and the $h^{*}_{k}(\alpha,m)$'s for k≥1 are given recursively.
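Coefficients of this type can be generated with the classical recurrence for raising a power series to a positive integer power (the one tabulated in Gradshteyn and Ryzhik 2000, Section 0.314): if $\left(\sum_k c_k u^k\right)^m=\sum_k h_k u^k$, then $h_0=c_0^m$ and $h_k=(k c_0)^{-1}\sum_{r=1}^{k}(mr-k+r)\,c_r h_{k-r}$ for k≥1. The sketch implements it for generic coefficients; feeding it the $c_k(\alpha)$'s to obtain the $h^{*}_{k}(\alpha,m)$'s is our assumption about how the omitted formula is used, and the function name is hypothetical.

```python
def series_power(c, m, n_terms=None):
    """First n_terms coefficients h_k of (sum_k c_k u^k)^m, m a positive integer.

    Recurrence: h_0 = c_0^m and, for k >= 1,
    h_k = (1 / (k c_0)) * sum_{r=1}^{k} (m r - k + r) c_r h_{k-r}.
    Missing c_r (r >= len(c)) are treated as zero.
    """
    if n_terms is None:
        n_terms = len(c)
    if c[0] == 0:
        raise ValueError("c_0 must be nonzero")
    h = [c[0] ** m]
    for k in range(1, n_terms):
        s = 0.0
        for r in range(1, min(k, len(c) - 1) + 1):
            s += (m * r - k + r) * c[r] * h[k - r]
        h.append(s / (k * c[0]))
    return h


if __name__ == "__main__":
    # (2 + u)^3 = 8 + 12u + 6u^2 + u^3
    print(series_power([2.0, 1.0], 3, n_terms=5))
```

Compared with repeated multiplication, the recurrence needs only one pass over the coefficients per output term, whatever the value of m.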
Fourth, we obtain a power series for $ \frac {u^{\gamma }}{\left [u^{\alpha }+(1-u)^{\alpha }\right ]^{w}}$, which is applied in Section 5.1, where w and α are positive real numbers, γ>0 and 0<u<1. We have
The second term in (25) can be expanded as
where $a_{i}(w)=\sum \limits _{j=i}^{\infty } (-1)^{i+j}\,\binom {w}{j}\,\binom {j}{i}$. Now, from (24), we have
where
The first term u ^{−αw−γ} in (25) can be expanded as
where $s_{2,k}^{*}(\alpha,w,\gamma)=\sum \limits _{i=k}^{\infty }(-1)^{i+k}\,\binom{-\alpha w-\gamma}{i}\,\binom{i}{k}$. Finally, from (25), (26) and (27), and using a result of Gradshteyn and Ryzhik (2000, Section 0.316) for the product of two power series, we obtain
where $s^{*}_{k}(\alpha,w,\gamma)=\sum \limits _{j=0}^{k}s_{1,j}^{*}(\alpha,w)\,s_{2,k-j}^{*}(\alpha,w,\gamma)$. Equation (28) is the key result used to obtain the Rényi entropy in Section 5.1.
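The coefficients $s^{*}_{k}$ form a Cauchy product of two coefficient sequences. A minimal sketch, assuming both series have been truncated to the same number K of terms (the first K coefficients of the product are then exact); the function name is hypothetical:

```python
def cauchy_product(s1, s2):
    """Coefficients s_k = sum_{j=0}^{k} s1_j s2_{k-j} of the product of two
    power series, truncated to the shorter of the two coefficient lists."""
    n = min(len(s1), len(s2))
    return [sum(s1[j] * s2[k - j] for j in range(k + 1)) for k in range(n)]


if __name__ == "__main__":
    # (1 + u + u^2 + ...)^2 = 1 + 2u + 3u^2 + ...
    print(cauchy_product([1, 1, 1, 1, 1], [1, 1, 1, 1, 1]))
```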
Appendix 3: Properties for a special model
The OLLLE distribution is defined by inserting G(x)=1−e^{−λx} and g(x)=λ e^{−λx} in Eq. (3), where x>0 and λ>0. Let X be a random variable having this distribution. We derive some statistical measures of X from the asymptotics in Appendix 1 and the general results in Sections 4 and 5.2.
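Equations (2) and (3) are not reproduced in this section. The following Python sketch assumes that the OLLLG cdf has the form $F(x)=\log[1-\beta H(x)]/\log(1-\beta)$ with $H(x)=G(x)^{\alpha}/[G(x)^{\alpha}+\bar G(x)^{\alpha}]$, which is consistent with the quantity $\beta G^{\alpha}/[G^{\alpha}+\bar G^{\alpha}]$ used in Appendix 2, but this form should be checked against Eqs. (2) and (3) before use. With the exponential baseline, the density follows by differentiation, using $H'(x)=\alpha g(x)G(x)^{\alpha-1}\bar G(x)^{\alpha-1}/[G(x)^{\alpha}+\bar G(x)^{\alpha}]^{2}$. The function names are hypothetical, and `lam` denotes the exponential rate λ (not the function λ(x) of Appendix 1).

```python
import math

# Hypothetical helpers. ASSUMPTION: the OLLLG cdf is
#   F(x) = log[1 - beta*H(x)] / log(1 - beta),  H = G^alpha / (G^alpha + Gbar^alpha),
# with alpha > 0 and 0 < beta < 1; here G(x) = 1 - exp(-lam*x) (exponential baseline).


def _baseline(x, lam):
    """Exponential baseline: returns G(x), 1 - G(x) and g(x)."""
    G = -math.expm1(-lam * x)   # 1 - exp(-lam x), accurate near x = 0
    Gbar = math.exp(-lam * x)
    return G, Gbar, lam * Gbar


def ollle_cdf(x, alpha, beta, lam):
    if x <= 0:
        return 0.0
    G, Gbar, _ = _baseline(x, lam)
    H = G**alpha / (G**alpha + Gbar**alpha)
    return math.log1p(-beta * H) / math.log1p(-beta)


def ollle_pdf(x, alpha, beta, lam):
    # For alpha < 1 and x so large that Gbar underflows to 0, use the
    # tail asymptotics instead of this formula.
    if x <= 0:
        return 0.0
    G, Gbar, g = _baseline(x, lam)
    denom = G**alpha + Gbar**alpha
    H = G**alpha / denom
    dH = alpha * g * G**(alpha - 1) * Gbar**(alpha - 1) / denom**2
    # F' = -beta H' / [(1 - beta H) log(1 - beta)] > 0 because log(1 - beta) < 0
    return -beta * dH / ((1.0 - beta * H) * math.log1p(-beta))


if __name__ == "__main__":
    print(ollle_cdf(1.3, 2.0, 0.5, 1.0), ollle_pdf(1.3, 2.0, 0.5, 1.0))
```

Comparing `ollle_pdf` with a finite difference of `ollle_cdf` is a cheap consistency check of the differentiation.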
Corollary 3
The asymptotics of Eqs. (2), (3) and (4) for the OLLLE distribution when x→0 are given by
Corollary 4
The asymptotics of Eqs. (2), (3) and (4) for the OLLLE distribution when x→∞ are given by
These equations show the effects of the parameters on the tails of the OLLLE distribution.
We provide some statistical measures of X. Its nth ordinary moment follows from (8) as
where $d_{k}$ is defined in Eq. (6).
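Eq. (8) is not reproduced here. Assuming, as the linear representation in exponentiated densities suggests, that it combines the $d_k$ with moments of exponentiated exponential (EE) components $h_a(x)=a\lambda\mathrm{e}^{-\lambda x}(1-\mathrm{e}^{-\lambda x})^{a-1}$, the sketch below computes those component moments from the series $E(Y^n)=a\,n!\,\lambda^{-n}\sum_{j=0}^{\infty}(-1)^{j}\binom{a-1}{j}(j+1)^{-(n+1)}$, obtained by expanding $(1-\mathrm{e}^{-\lambda x})^{a-1}$ binomially and integrating term by term. The function name and truncation length are illustrative.

```python
import math


def ee_moment(n, a, lam, n_terms=2000):
    """n-th moment of the exponentiated exponential density
    a*lam*exp(-lam x)*(1 - exp(-lam x))^(a - 1), x > 0, via the series
    E(Y^n) = a n! / lam^n * sum_{j>=0} (-1)^j C(a-1, j) / (j+1)^(n+1).
    For a positive integer a the terms vanish from j = a onwards."""
    total, coef = 0.0, 1.0            # coef holds C(a - 1, j)
    for j in range(n_terms):
        # (1/(j+1))**(n+1) underflows harmlessly to 0 instead of overflowing
        total += (-1) ** j * coef * (1.0 / (j + 1)) ** (n + 1)
        coef *= (a - 1 - j) / (j + 1)
    return a * math.factorial(n) / lam ** n * total


if __name__ == "__main__":
    print(ee_moment(1, 2.0, 1.0))   # mean of EE(2, 1)
```

As a sanity check, a=1 recovers the exponential moments $n!/\lambda^n$, and for a=2, λ=1 the mean is 1.5.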
Further, the nth incomplete moment of the OLLLE distribution is obtained from (9) as
where $\gamma (a,z)=\int _{0}^{z}t^{a-1}\,\mathrm {e}^{-t}\,dt$ denotes the lower incomplete gamma function.
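For numerical work, $\gamma(a,z)$ can be evaluated with the standard series $\gamma(a,z)=z^{a}\mathrm{e}^{-z}\sum_{k=0}^{\infty}z^{k}/[a(a+1)\cdots(a+k)]$, which converges for all z>0. A minimal stdlib-only sketch follows; the function name and stopping rule are our own choices. If SciPy is available, `scipy.special.gammainc(a, z) * math.gamma(a)` gives the same quantity, since SciPy's function is the regularized version.

```python
import math


def lower_incomplete_gamma(a, z, tol=1e-15, max_terms=10000):
    """gamma(a, z) = int_0^z t^(a-1) e^(-t) dt for a > 0, z >= 0, via
    z^a e^(-z) * sum_k z^k / [a (a+1) ... (a+k)]."""
    if a <= 0:
        raise ValueError("a must be positive")
    if z <= 0:
        return 0.0
    term = 1.0 / a                    # k = 0 term: 1/a
    total = term
    k = 0
    while abs(term) > tol * abs(total) and k < max_terms:
        k += 1
        term *= z / (a + k)           # ratio of consecutive terms
        total += term
    return total * math.exp(a * math.log(z) - z)


if __name__ == "__main__":
    print(lower_incomplete_gamma(1.0, 2.0), 1.0 - math.exp(-2.0))
```

For large z the terms first grow before decaying, so the number of terms needed grows roughly with z; for very large z an asymptotic or continued-fraction method is usually preferred.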
The mgf of X follows immediately from (10).
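Under the same assumption about exponentiated exponential components, each component $h_a$ has mgf $M_a(t)=\Gamma(a+1)\Gamma(1-t/\lambda)/\Gamma(a+1-t/\lambda)$ for $t<\lambda$, which follows from the substitution $u=\mathrm{e}^{-\lambda x}$ in $E(\mathrm{e}^{tY})$ (a beta integral). How (10) weights these terms is not reproduced here. The sketch evaluates $M_a(t)$ on the log scale with `math.lgamma` for numerical stability; the function name is hypothetical.

```python
import math


def ee_mgf(t, a, lam):
    """mgf of the EE density a*lam*exp(-lam x)*(1 - exp(-lam x))^(a - 1),
    M(t) = Gamma(a+1) Gamma(1 - t/lam) / Gamma(a + 1 - t/lam), valid for t < lam."""
    if t >= lam:
        raise ValueError("the mgf exists only for t < lam")
    s = t / lam
    # All Gamma arguments are positive here, so lgamma gives log|Gamma| = log Gamma.
    return math.exp(math.lgamma(a + 1.0) + math.lgamma(1.0 - s) - math.lgamma(a + 1.0 - s))


if __name__ == "__main__":
    print(ee_mgf(0.5, 1.0, 1.0))   # exponential case: 1 / (1 - t)
```

A finite-difference derivative of `ee_mgf` at t=0 reproduces the component mean, which is a convenient cross-check of the moment series.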
Finally, the rth ordinary moment of the ith OLLLE order statistic reduces to
where $s_{k}$ is given by (11).
References
Abouammoh, AM, Abdulghani, SA: On partial orderings and testing of new better than renewal used classes. Reliability Eng. Syst. Safety. 43, 37–41 (1994).
Alzaatreh, A, Famoye, F, Lee, C: The gamma-normal distribution: Properties and applications. Comput. Stat. Data Anal. 69, 67–80 (2014).
Berkson, J, Gage, RP: Survival curve for cancer patients following treatment. J. Am. Stat. Assoc. 47, 501–515 (1952).
Blischke, WR, Murthy, DNP: Reliability: Modeling, Prediction and Optimization. 1st ed. Wiley, New York (2000).
Boag, JW: Maximum likelihood estimates of the proportion of patients cured by cancer therapy. J. R. Stat. Soc. Series B. 11, 15–53 (1949).
Bourguignon, M, Silva, RB, Cordeiro, GM: The Weibull-G family of probability distributions. J. Data Sci. 12, 53–68 (2014).
Braga, AS, Cordeiro, GM, Ortega, EMM, da Cruz, JN: The odd log-logistic normal distribution: Theory and applications in analysis of experiments. J. Stat. Theory Pract. (2016). doi:10.1080/15598608.2016.1141127.
Chen, G, Balakrishnan, N: A general purpose approximate goodness-of-fit test. J. Qual. Technol. 27, 154–161 (1995).
Cordeiro, GM, Alizadeh, M, Ortega, EMM: The exponentiated half-logistic family of distributions: Properties and applications. J. Probab. Stat. 1, 1–21 (2014a).
Cordeiro, GM, de Castro, M: A new family of generalized distributions. J. Stat. Comput. Simul. 81, 883–893 (2011).
Cordeiro, GM, Nadarajah, S: Closed-form expressions for moments of a class of beta generalized distributions. Braz. J. Prob. Stat. 25, 14–33 (2011).
Cordeiro, GM, Nadarajah, S, Ortega, EMM: The Kumaraswamy Gumbel distribution. Stat. Methods Appl. 21, 139–168 (2012).
Cordeiro, GM, Ortega, EMM, Popović, BV, Pescim, RR: The Lomax generator of distributions: Properties, minification process and regression model. Appl. Math. Comput. 247, 465–486 (2014b).
Eugene, N, Lee, C, Famoye, F: Beta-normal distribution and its applications. Commun. Stat. Theory Methods. 31, 497–512 (2002).
Fachini, JB, Ortega, EMM, Cordeiro, GM: A bivariate regression model with cure fraction. J. Stat. Comput. Simul. 84, 1580–1595 (2014).
Farewell, VT: The use of mixture models for the analysis of survival data with long-term survivors. Biometrics. 38, 1041–1046 (1982).
Gendoo, DMA, Ratanasirigulchai, N, Schröder, M, Pare, L, Parker, JS, Prat, A, Haibe-Kains, B: genefu: a package for breast cancer gene expression analysis (2015). Retrieved 2016-03-30, from https://bioc.ism.ac.jp/packages/devel/bioc/vignettes/genefu/inst/doc/genefu.pdf, https://goo.gl/jngJMY.
Gleaton, JU, Lynch, JD: On the distribution of the breaking strain of a bundle of brittle elastic fibers. Adv. Appl. Probab. 36, 98–115 (2004).
Gleaton, JU, Lynch, JD: Properties of generalized log-logistic families of lifetime distributions. J. Probab. Stat. Sci. 4, 51–64 (2006).
Gleaton, JU, Lynch, JD: Extended generalized log-logistic families of lifetime distributions with an application. J. Probab. Stat. Sci. 8, 1–17 (2010).
Gleaton, JU, Rahman, MM: Asymptotic properties of MLE’s for distributions generated from a 2-parameter Weibull distribution by a generalized log-logistic transformation. J. Probab. Stat. Sci. 8, 199–214 (2010).
Gleaton, JU, Rahman, MM: Asymptotic properties of MLE’s for distributions generated from a 2-parameter inverse Gaussian distribution by a generalized log-logistic transformation. J. Probab. Stat. Sci. 12, 85–99 (2014).
Gradshteyn, IS, Ryzhik, IM: Table of Integrals, Series, and Products. 7th ed. Academic Press, San Diego (2000).
Gupta, RD, Kundu, D: Generalized exponential distributions. Aust. New Zealand J. Stat. 42, 173–188 (1999).
Hashimoto, EM, Cordeiro, GM, Ortega, EMM: The new Neyman type A beta Weibull model with long-term survivors. Comput. Stat. 28, 933–954 (2013).
Marshall, AW, Olkin, I: A new method for adding a parameter to a family of distributions with applications to the exponential and Weibull families. Biometrika. 84, 641–652 (1997).
Mudholkar, GS, Srivastava, DK: Exponentiated Weibull family for analyzing bathtub failure-rate data. IEEE Trans. Reliab. 42, 299–302 (1993).
Mudholkar, GS, Srivastava, DK, Freimer, M: The exponentiated Weibull family: A reanalysis of the bus-motor-failure data. Technometrics. 37, 436–445 (1995).
Murthy, DNP, Xie, M, Jiang, R: Weibull Models. 1st ed. Wiley, Hoboken (2004).
Nadarajah, S: The exponentiated Gumbel distribution with climate application. Environmetrics. 17, 13–23 (2006).
Nadarajah, S, Kotz, S: The exponentiated type distributions. Acta Applicandae Mathematicae. 92, 97–111 (2006).
Ortega, EMM, Cordeiro, GM, Kattan, MW: The negative binomial-beta Weibull regression model to predict the cure of prostate cancer. J. Appl. Stat. 39, 1191–1210 (2012).
Ortega, EMM, Cordeiro, GM, Campelo, AK, Kattan, MW, Cancho, VG: A power series beta-Weibull regression model for predicting breast carcinoma. Stat. Med. 34, 1366–1388 (2015).
Rényi, A: On measures of entropy and information. Proc. Fourth Berkeley Symp. Math. Stat. Probab. 1, 547–561 (1961).
R Development Core Team: R: A Language and Environment for Statistical Computing (2013).
Shannon, CE: Prediction and entropy of printed English. Bell Syst. Tech. J. 30, 50–64 (1951).
Acknowledgements
The authors are very grateful to the editor Felix Famoye for helpful comments and suggestions.
Authors’ contributions
The five authors jointly participated in the study and carried out the research. Gauss Cordeiro corrected the manuscript throughout. All authors read and approved the final manuscript.
Competing interests
The authors declare that they have no competing interests.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Keywords
 Generated family
 Maximum likelihood
 Moment
 Order statistic
 Quantile function
 Survival analysis
AMS Subject Classification
 97K50
 62N01
 62N02