The generalized log-logistic distribution for a nonnegative random variable *T* can be conveniently specified in terms of the hazard function as follows:

$$ h(t;\boldsymbol{\alpha})=\frac{\kappa \rho (\rho t)^{\kappa-1}}{1+(\gamma t)^{\kappa}}, t>0, $$

(1)

where *ρ*>0, *κ*>0 and *γ*>0 are parameters and *α*=(*κ*,*γ*,*ρ*)^{′}. If *γ* depends on *ρ* via *γ*=*ρ* and *γ*=*ρ* *η*^{−1/κ} with *η*>0, then (1) reduces to the hazard function of the log-logistic (Lawless 2002) and Burr XII (Wang et al. 2008) distributions, respectively. Taking *γ* not dependent on *ρ*, it is easy to verify that (1) is closed under PH relationship (see below). The hazard function is monotone decreasing when *κ*≤1, and unimodal when *κ*>1 (i.e., *h*(*t*;*α*)=0 at *t*=0, increases to a maximum at *t*=[(*κ*−1)/*γ*^{κ}]^{1/κ}, and then approaches zero monotonically as *t*→*∞*). Note that (1) approaches the Weibull hazard function as *γ*^{κ}→0. This particular feature of the generalized log-logistic model enables it to handle monotone increasing hazard satisfactorily via *κ*>1 and *γ* small (close to zero).
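The shape properties described above can be checked numerically. The following Python sketch evaluates the hazard (1) directly; the function name `gll_hazard` and the parameter values are illustrative assumptions, not taken from the original analysis.

```python
def gll_hazard(t, kappa, gamma, rho):
    """Generalized log-logistic hazard, equation (1)."""
    return kappa * rho * (rho * t) ** (kappa - 1) / (1.0 + (gamma * t) ** kappa)

# Illustrative parameter values (assumed for this example only).
kappa, gamma, rho = 2.5, 0.8, 1.2

# For kappa > 1 the hazard peaks at t = [(kappa - 1) / gamma^kappa]^(1/kappa).
t_mode = ((kappa - 1) / gamma ** kappa) ** (1 / kappa)
h_mode = gll_hazard(t_mode, kappa, gamma, rho)
print("mode:", t_mode, "hazard at mode:", h_mode)
print("hazard just before/after the mode:",
      gll_hazard(0.99 * t_mode, kappa, gamma, rho),
      gll_hazard(1.01 * t_mode, kappa, gamma, rho))

# With gamma close to zero the hazard is close to the Weibull hazard
# kappa * rho * (rho * t)^(kappa - 1).
t = 1.5
weibull = kappa * rho * (rho * t) ** (kappa - 1)
near_weibull = gll_hazard(t, kappa, 1e-6, rho)
print("Weibull hazard:", weibull, "GLL hazard with tiny gamma:", near_weibull)
```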

The survivor function, probability density function and cumulative hazard function of the generalized log-logistic distribution are, respectively,

$$\begin{array}{*{20}l} &S(t;\boldsymbol{\alpha})=[1+(\gamma t)^{\kappa}]^{-\frac{\rho^{\kappa}}{\gamma^{\kappa}}}, \end{array} $$

(2)

$$\begin{array}{*{20}l} &f(t;\boldsymbol{\alpha})=\frac{\kappa \rho (\rho t)^{\kappa-1}}{\left[1+(\gamma t)^{\kappa}\right]^{\frac{\rho^{\kappa}}{\gamma^{\kappa}}+1}}, \end{array} $$

(3)

$$\begin{array}{*{20}l} &H(t;\boldsymbol{\alpha})=\frac{\rho^{\kappa}}{\gamma^{\kappa}} \log{[1+(\gamma t)^{\kappa}]}. \end{array} $$

(4)
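Equations (1) to (4) are linked by *S*=exp(−*H*), *f*=*hS* and *h*=d*H*/d*t*. A minimal Python sketch of the four functions, with function names that are our own illustrative choices, makes these identities easy to verify numerically.

```python
import math

def gll_hazard(t, kappa, gamma, rho):
    """Hazard h(t), equation (1)."""
    return kappa * rho * (rho * t) ** (kappa - 1) / (1.0 + (gamma * t) ** kappa)

def gll_cumhaz(t, kappa, gamma, rho):
    """Cumulative hazard H(t), equation (4)."""
    return (rho / gamma) ** kappa * math.log1p((gamma * t) ** kappa)

def gll_survival(t, kappa, gamma, rho):
    """Survivor function S(t) = exp(-H(t)), equation (2)."""
    return math.exp(-gll_cumhaz(t, kappa, gamma, rho))

def gll_density(t, kappa, gamma, rho):
    """Density f(t), equation (3)."""
    a = (rho / gamma) ** kappa
    return (kappa * rho * (rho * t) ** (kappa - 1)
            / (1.0 + (gamma * t) ** kappa) ** (a + 1.0))

# Illustrative values (assumed): compare f(t) with h(t) * S(t).
k, g, r, t = 1.7, 0.6, 0.9, 2.3
print(gll_density(t, k, g, r), gll_hazard(t, k, g, r) * gll_survival(t, k, g, r))
```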

The median of the distribution is \(\frac {\left (2^{\frac {\gamma ^{\kappa }}{\rho ^{\kappa }}}-1\right)^{\frac {1}{\kappa }}}{\gamma }\), and the *r*th moment is

$$ E(T^{r})=\frac{\rho^{\kappa}}{\gamma^{\kappa+r}}~ \frac{\Gamma\left(\frac{\rho^{\kappa}}{\gamma^{\kappa}}-\frac{r}{\kappa}\right) \Gamma\left(\frac{r}{\kappa}+1\right)}{\Gamma\left(\frac{\rho^{\kappa}}{\gamma^{\kappa}}+1\right)}~~ \text{provided}~~ \frac{\kappa \rho^{\kappa}}{\gamma^{\kappa}}>r. $$

In particular, setting *r*=1, the mean is \(E(T)=\frac {\rho ^{\kappa }}{\gamma ^{\kappa +1}} \frac {\Gamma \left (\frac {\rho ^{\kappa }}{\gamma ^{\kappa }}-\frac {1}{\kappa }\right) \Gamma \left (\frac {1}{\kappa }+1\right)}{\Gamma \left (\frac {\rho ^{\kappa }}{\gamma ^{\kappa }}+1\right)}\) provided \(\frac {\kappa \rho ^{\kappa }}{\gamma ^{\kappa }}>1\).
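The median and the moments can be evaluated as follows. This is a sketch only: the function names are assumptions made for illustration, and the moment is computed on the log scale with `math.lgamma` to avoid overflow in the gamma functions. The moment function implements the general *r*th moment formula, so the mean is obtained with *r*=1.

```python
import math

def gll_median(kappa, gamma, rho):
    """Median: solves S(t) = 1/2 for t."""
    a = (rho / gamma) ** kappa
    return (2.0 ** (1.0 / a) - 1.0) ** (1.0 / kappa) / gamma

def gll_moment(r, kappa, gamma, rho):
    """r-th moment E(T^r); exists only if kappa * (rho/gamma)^kappa > r."""
    a = (rho / gamma) ** kappa
    if kappa * a <= r:
        raise ValueError("moment does not exist: need kappa*(rho/gamma)^kappa > r")
    # rho^kappa / gamma^(kappa + r) equals a / gamma^r.
    log_m = (math.log(a) - r * math.log(gamma)
             + math.lgamma(a - r / kappa)
             + math.lgamma(r / kappa + 1.0)
             - math.lgamma(a + 1.0))
    return math.exp(log_m)

# Illustrative values (assumed).
print("median:", gll_median(1.7, 0.6, 0.9))
print("mean:", gll_moment(1, 1.7, 0.6, 0.9))
```

In the log-logistic special case *γ*=*ρ*, the mean returned by `gll_moment` reduces to (π/*κ*)/[*γ* sin(π/*κ*)], the familiar log-logistic mean, which gives a simple sanity check.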

For the family of PH models with covariates **z**=(*z*_{1},*z*_{2},…,*z*_{p})^{′}, the hazard function for *T* can be expressed as

$$ h(t;\mathbf{z})=h_{0}(t;\boldsymbol{\alpha})~ e^{\mathbf{z}'\boldsymbol{\beta}}, $$

(5)

where *h*_{0}(*t*;*α*) is the baseline hazard function (i.e., the hazard function when **z**=**0**) characterized by the vector of parameters *α*, and *β*=(*β*_{1},*β*_{2},…,*β*_{p})^{′} is the vector of regression coefficients. A fully parametric PH model can be formulated by specifying *h*_{0}(*t*;*α*) parametrically. If *h*_{0}(*t*;*α*) is specified by the generalized log-logistic hazard function (1), then (5) takes the form

$$ h(t;\mathbf{z})=\frac{\kappa \rho^{*} (\rho^{*} t)^{\kappa-1}}{1+(\gamma t)^{\kappa}}, $$

(6)

where \(\rho ^{*}=\rho \, e^{\mathbf {z}'\boldsymbol {\beta }/\kappa }\), since \(\rho ^{\kappa } e^{\mathbf {z}'\boldsymbol {\beta }}=(\rho ^{*})^{\kappa }\). Thus the generalized log-logistic is closed under proportionality of hazards: (6) is again of the form (1), with *ρ* replaced by *ρ*^{∗} and *κ* and *γ* unchanged. Another widely used parametric PH family is the Weibull, for which *h*_{0}(*t*;*α*)=*κρ*(*ρt*)^{κ−1}. Note that the Cox PH model is semiparametric, for which the baseline hazard function in (5) is left arbitrary and is denoted by *h*_{0}(*t*).
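The closure property can be confirmed numerically: multiplying the baseline hazard (1) by exp(**z**′*β*) gives the same value as evaluating (1) with *ρ* rescaled by exp(**z**′*β*/*κ*). The Python sketch below uses hypothetical helper names and made-up covariate and coefficient values.

```python
import math

def gll_hazard(t, kappa, gamma, rho):
    """Baseline generalized log-logistic hazard, equation (1)."""
    return kappa * rho * (rho * t) ** (kappa - 1) / (1.0 + (gamma * t) ** kappa)

def ph_hazard(t, z, beta, kappa, gamma, rho):
    """PH hazard (5): baseline hazard multiplied by exp(z'beta)."""
    lin = sum(zj * bj for zj, bj in zip(z, beta))
    return gll_hazard(t, kappa, gamma, rho) * math.exp(lin)

def rho_star(z, beta, kappa, rho):
    """Covariate-adjusted parameter: rho * exp(z'beta / kappa)."""
    lin = sum(zj * bj for zj, bj in zip(z, beta))
    return rho * math.exp(lin / kappa)

# Illustrative (made-up) covariates, coefficients and parameters.
z, beta = [1.0, 0.5], [0.4, -0.3]
kappa, gamma, rho = 1.8, 0.5, 0.7
for t in (0.3, 1.0, 4.0):
    lhs = ph_hazard(t, z, beta, kappa, gamma, rho)
    rhs = gll_hazard(t, kappa, gamma, rho_star(z, beta, kappa, rho))
    print(t, lhs, rhs)  # the two columns should agree up to rounding
```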

### Estimation

Suppose that a censored random sample consisting of data (*t*_{i},*δ*_{i},**z**_{i}), *i*=1,2,…,*n*, is available, where *t*_{i} is a lifetime or censoring time according to whether *δ*_{i}=1 or 0, respectively, and **z**_{i}=(*z*_{i1},*z*_{i2},…,*z*_{ip})^{′} is the vector of covariates for the *i*th individual. Letting \(m=\sum _{i=1}^{n} \delta _{i}\), *a*_{i}=exp(**z**_{i}^{′}*β*) and *b*_{i}=(*γ* *t*_{i})^{κ}, the log-likelihood function for the generalized log-logistic PH model can be written as

$$\begin{array}{*{20}l} \ell(\boldsymbol{\theta})&=m\log{\kappa}+m\kappa\log{\rho}+(\kappa-1)\sum_{i=1}^{n}{\delta_{i} \log{t_{i}}}-\sum_{i=1}^{n}{\delta_{i} \log{(1+b_{i})}} \\ &\quad+\sum_{i=1}^{n}{\delta_{i} \log{a_{i}}}-\left(\frac{\rho}{\gamma}\right)^{\kappa} \sum_{i=1}^{n}{a_{i}\log{(1+b_{i})}}, \end{array} $$

(7)

where *θ*=(*α*^{′},*β*^{′})^{′}. The first derivatives of the log-likelihood function are

$$ \begin{aligned} \frac{\partial \ell(\boldsymbol{\theta})}{\partial \kappa}&=\frac{m}{\kappa}+m\log{\rho}+\sum_{i=1}^{n}\delta_{i}\log{t_{i}}-\frac{1}{\kappa}\sum_{i=1}^{n}\delta_{i}b_{i}c_{i} -\left(\frac{\rho}{\gamma}\right)^{\kappa} \left(\frac{1}{\kappa}\right)\sum_{i=1}^{n}a_{i}b_{i}c_{i}\\ &\quad- \left(\frac{\rho}{\gamma}\right)^{\kappa}\log{\left(\frac{\rho}{\gamma}\right)}\sum_{i=1}^{n}a_{i}\log{(1+b_{i})},& \end{aligned} $$

(8)

$$ \begin{aligned} &\frac{\partial \ell(\boldsymbol{\theta})}{\partial \gamma}=-\left(\frac{\kappa}{\gamma}\right)\sum_{i=1}^{n}\delta_{i}d_{i}-\left(\frac{\kappa}{\gamma}\right) \left(\frac{\rho}{\gamma}\right)^{\kappa} \sum_{i=1}^{n}a_{i}d_{i}- \left(\frac{\kappa}{\gamma}\right) \left(\frac{\rho}{\gamma}\right)^{\kappa} \sum_{i=1}^{n}a_{i}\log{(1-d_{i})}, & \end{aligned} $$

(9)

$$ \begin{aligned} &\frac{\partial \ell(\boldsymbol{\theta})}{\partial \rho}=\frac{m\kappa}{\rho}-\left(\frac{\kappa}{\rho}\right)\left(\frac{\rho}{\gamma}\right)^{\kappa}~ \sum_{i=1}^{n}a_{i}\log{(1+b_{i})},& \end{aligned} $$

(10)

$$ \begin{aligned} &\frac{\partial \ell(\boldsymbol{\theta})}{\partial \beta_{j}}=\sum_{i=1}^{n}\delta_{i}z_{ij}-\left(\frac{\rho}{\gamma}\right)^{\kappa} \sum_{i=1}^{n} a_{i}\log{(1+b_{i})}z_{ij}\ \text{for}~ j=1,2,\ldots,p,& \end{aligned} $$

(11)

where *c*_{i}=(log *b*_{i})/(1+*b*_{i}) and *d*_{i}=*b*_{i}/(1+*b*_{i}) (see Appendix). To improve the convergence of iterative procedures for maximum likelihood estimation and the accuracy of large-sample methods, we remove range restrictions on parameters through the parameterization *α*^{∗}=(*κ*^{∗},*γ*^{∗},*ρ*^{∗})^{′}, where *κ*^{∗}=log *κ*, *γ*^{∗}=log *γ* and *ρ*^{∗}=log *ρ*. Here and in what follows *ρ*^{∗} denotes log *ρ*, not the covariate-adjusted parameter appearing in (6). The maximum likelihood estimate of *θ*^{∗}=(*α*^{∗′},*β*^{′})^{′} can then be obtained by solving the equations *∂ℓ*(*θ*^{∗})/*∂κ*^{∗}=0, *∂ℓ*(*θ*^{∗})/*∂γ*^{∗}=0, *∂ℓ*(*θ*^{∗})/*∂ρ*^{∗}=0 and *∂ℓ*(*θ*^{∗})/*∂β*_{j}=0 iteratively, where (see Appendix)

$$\frac{\partial \ell(\boldsymbol{\theta}^{*})}{\partial \kappa^{*}}= \left[\kappa\left(\frac{\partial \ell(\boldsymbol{\theta})}{\partial \kappa}\right)\right]_{\boldsymbol{\alpha}=\exp{(\boldsymbol{\alpha}^{*})}},~~ \frac{\partial \ell(\boldsymbol{\theta}^{*})}{\partial \gamma^{*}}= \left[\gamma\left(\frac{\partial \ell(\boldsymbol{\theta})}{\partial \gamma}\right)\right]_{\boldsymbol{\alpha}=\exp{(\boldsymbol{\alpha}^{*})}}, $$

$$\frac{\partial \ell(\boldsymbol{\theta}^{*})}{\partial \rho^{*}}= \left[\rho\left(\frac{\partial \ell(\boldsymbol{\theta})}{\partial \rho}\right)\right]_{\boldsymbol{\alpha}=\exp{(\boldsymbol{\alpha}^{*})}},~~ \frac{\partial \ell(\boldsymbol{\theta}^{*})}{\partial \beta_{j}}= \left[\frac{\partial \ell(\boldsymbol{\theta})}{\partial \beta_{j}}\right]_{\boldsymbol{\alpha}=\exp{(\boldsymbol{\alpha}^{*})}}. $$
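The log-likelihood (7) and the score equations on the log scale translate directly into code. The Python sketch below is an illustration under stated assumptions and is not the authors' R implementation: the function names and the data layout (lists of times, 0/1 event indicators and covariate rows) are hypothetical, and the first three entries of `theta_star` are taken to be log *κ*, log *γ* and log *ρ*.

```python
import math

def gll_ph_loglik(theta_star, times, events, covs):
    """Log-likelihood (7) at theta* = (log kappa, log gamma, log rho, beta)."""
    kappa, gamma, rho = (math.exp(v) for v in theta_star[:3])
    beta = theta_star[3:]
    ratio = (rho / gamma) ** kappa
    ll = 0.0
    for t, d, z in zip(times, events, covs):
        lin = sum(zj * bj for zj, bj in zip(z, beta))  # log a_i
        log1pb = math.log1p((gamma * t) ** kappa)      # log(1 + b_i)
        if d:  # observed lifetimes contribute the log-hazard
            ll += (math.log(kappa) + kappa * math.log(rho)
                   + (kappa - 1.0) * math.log(t) - log1pb + lin)
        ll -= ratio * math.exp(lin) * log1pb           # cumulative hazard
    return ll

def gll_ph_score(theta_star, times, events, covs):
    """Gradient with respect to theta*, from (8)-(11) and the chain rule."""
    kappa, gamma, rho = (math.exp(v) for v in theta_star[:3])
    beta = theta_star[3:]
    p = len(beta)
    ratio = (rho / gamma) ** kappa
    m = sum(events)
    s_logt = s_dbc = s_abc = s_al = s_dd = s_ad = 0.0
    s_dz, s_alz = [0.0] * p, [0.0] * p
    for t, d, z in zip(times, events, covs):
        a = math.exp(sum(zj * bj for zj, bj in zip(z, beta)))
        b = (gamma * t) ** kappa
        log1pb = math.log1p(b)
        c = math.log(b) / (1.0 + b)   # c_i
        dd = b / (1.0 + b)            # d_i
        s_logt += d * math.log(t)
        s_dbc += d * b * c
        s_abc += a * b * c
        s_al += a * log1pb
        s_dd += d * dd
        s_ad += a * dd
        for j in range(p):
            s_dz[j] += d * z[j]
            s_alz[j] += a * log1pb * z[j]
    d_kappa = (m / kappa + m * math.log(rho) + s_logt - s_dbc / kappa
               - ratio * s_abc / kappa
               - ratio * math.log(rho / gamma) * s_al)
    # log(1 - d_i) = -log(1 + b_i), hence the sign of the last term.
    d_gamma = -(kappa / gamma) * (s_dd + ratio * s_ad - ratio * s_al)
    d_rho = m * kappa / rho - (kappa / rho) * ratio * s_al
    d_beta = [s_dz[j] - ratio * s_alz[j] for j in range(p)]
    return [kappa * d_kappa, gamma * d_gamma, rho * d_rho] + d_beta
```

A useful check before passing these functions to an optimizer is to compare `gll_ph_score` with central finite differences of `gll_ph_loglik` on a small data set.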

Many software packages have reliable optimization procedures to maximize log-likelihood functions. We wrote our computer code in R (R Core Team 2016), and used the function *nlminb* for optimization (see Additional file 1).

### Initial values

We may use Weibull, log-logistic and Cox PH fits to generate initial values in solving the equations *∂ℓ*(*θ*^{∗})/*∂κ*^{∗}=0, *∂ℓ*(*θ*^{∗})/*∂γ*^{∗}=0, *∂ℓ*(*θ*^{∗})/*∂ρ*^{∗}=0 and *∂ℓ*(*θ*^{∗})/*∂β*_{j}=0. Let \(\hat {\kappa }_{1}\) and \(\hat {\rho }_{1}\) be the maximum likelihood estimates of the Weibull shape and scale parameters, respectively, \(\hat {\kappa }_{2}\) and \(\hat {\rho }_{2}\) the maximum likelihood estimates of the log-logistic shape and scale parameters, respectively, and \(\hat {\boldsymbol {\beta }^{*}}\) the estimates of the regression coefficients for the Cox PH model. Note that maximum likelihood methods for the Weibull, log-logistic and Cox PH models are available in many statistical software packages, including R (R Core Team 2016). We propose to use \(\log {\hat {\kappa }_{1}}\), \(\log {|\hat {\kappa }_{1}-\hat {\kappa }_{2}|}\), \(\log {\hat {\rho }_{1}}\) and \(\hat {\boldsymbol {\beta }^{*}}\) as initial values for *κ*^{∗}, *γ*^{∗}, *ρ*^{∗} and *β*, respectively. If convergence is not achieved with these initial values, we propose to replace \(\log {\hat {\kappa }_{1}}\) and \(\log {\hat {\rho }_{1}}\) by \(\log {\hat {\kappa }_{2}}\) and \(\log {\hat {\rho }_{2}}\), respectively. In fitting the generalized log-logistic model to many data sets, we have not experienced any difficulty in obtaining convergence with this technique.
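The initial-value rule can be written as a small helper. The Python sketch below assumes the Weibull and log-logistic estimates have already been converted to the (*κ*, *ρ*) parameterization used in this paper; the function name, argument names and numerical inputs are hypothetical.

```python
import math

def gll_initial_values(kappa1, rho1, kappa2, rho2, beta_cox,
                       use_loglogistic=False):
    """Initial values for (kappa*, gamma*, rho*, beta).

    kappa1, rho1: Weibull shape and scale estimates.
    kappa2, rho2: log-logistic shape and scale estimates.
    beta_cox:     Cox PH regression coefficient estimates.
    use_loglogistic: fallback choice if the first attempt fails to converge.
    """
    if kappa1 == kappa2:
        raise ValueError("log|kappa1 - kappa2| is undefined for equal shapes")
    gamma_star = math.log(abs(kappa1 - kappa2))
    if use_loglogistic:
        kappa_star, rho_star = math.log(kappa2), math.log(rho2)
    else:
        kappa_star, rho_star = math.log(kappa1), math.log(rho1)
    return [kappa_star, gamma_star, rho_star] + list(beta_cox)

# Made-up estimates, for illustration only.
first_try = gll_initial_values(1.4, 0.2, 2.0, 0.25, [0.5, -0.1])
fallback = gll_initial_values(1.4, 0.2, 2.0, 0.25, [0.5, -0.1],
                              use_loglogistic=True)
print(first_try)
print(fallback)
```

The explicit error for equal shape estimates is our own addition; the text does not discuss that case.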

### Tests and confidence intervals

Tests and interval estimates for the model parameters are based on the approximate normality of the maximum likelihood estimators. In large samples, \(\boldsymbol {\hat {\theta }}^{*}\) is approximately (*p*+3)-variate normal with mean *θ*^{∗} and covariance matrix \(\Sigma =I(\boldsymbol {\hat {\theta }}^{*})^{-1}\), where

$$I(\boldsymbol{\hat{\theta}}^{*})=- \left[\begin{array}{cccc} \frac{\partial^{2} \ell(\boldsymbol{\theta}^{*})}{\partial \kappa^{*2}} & \frac{\partial^{2} \ell(\boldsymbol{\theta}^{*})}{\partial \kappa^{*} \partial \gamma^{*}} & \ldots & \frac{\partial^{2} \ell(\boldsymbol{\theta}^{*})}{\partial \kappa^{*} \partial \beta_{p}}\\ \frac{\partial^{2} \ell(\boldsymbol{\theta}^{*})}{\partial \gamma^{*} \partial \kappa^{*}} & \frac{\partial^{2} \ell(\boldsymbol{\theta}^{*})}{\partial \gamma^{*2}} & \ldots & \frac{\partial^{2} \ell(\boldsymbol{\theta}^{*})}{\partial \gamma^{*} \partial \beta_{p}}\\ \vdots & \vdots & \ddots & \vdots\\ \frac{\partial^{2} \ell(\boldsymbol{\theta}^{*})}{\partial \beta_{p} \partial \kappa^{*}} & \frac{\partial^{2} \ell(\boldsymbol{\theta}^{*})}{\partial \beta_{p}\partial \gamma^{*}} & \ldots & \frac{\partial^{2} \ell(\boldsymbol{\theta}^{*})}{\partial {\beta^{2}_{p}}} \end{array}\right]_{\boldsymbol{\theta}^{*}=\boldsymbol{\hat{\theta}}^{*}} $$

is the (*p*+3)×(*p*+3) observed information matrix (second derivatives of *ℓ*(*θ*^{∗}) are given in Appendix: Derivatives of the log-likelihood function). By the multivariate delta method, the asymptotic distribution of \(\boldsymbol {\hat {\theta }}\) is also approximately normal with mean *θ* and covariance matrix *DΣD*^{′}, where *D* is the (*p*+3)×(*p*+3) diagonal matrix \(\text {diag}(\boldsymbol {\hat {\alpha }},1,1,\ldots,1)\) and \(\boldsymbol {\hat {\alpha }}=\exp {(\boldsymbol {\hat {\alpha }}^{*})}\).
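Because *D* is diagonal, the delta-method covariance *DΣD*′ is simply an elementwise rescaling of *Σ*. The Python sketch below illustrates this with nested lists; the function names and the numerical inputs are made up for the example, and the estimate vector is assumed to be ordered as (*κ*^{∗}, *γ*^{∗}, *ρ*^{∗}, *β*).

```python
import math

def delta_method_cov(theta_star_hat, cov_star):
    """Covariance of theta_hat = (exp(alpha*_hat), beta_hat): D Sigma D'."""
    n = len(theta_star_hat)
    # D = diag(kappa_hat, gamma_hat, rho_hat, 1, ..., 1)
    diag = [math.exp(v) for v in theta_star_hat[:3]] + [1.0] * (n - 3)
    return [[diag[i] * cov_star[i][j] * diag[j] for j in range(n)]
            for i in range(n)]

def standard_errors(cov):
    """Square roots of the diagonal entries of a covariance matrix."""
    return [math.sqrt(cov[i][i]) for i in range(len(cov))]

# Made-up estimates and covariance matrix on the log scale (p = 1).
theta_star_hat = [0.0, math.log(2.0), math.log(0.5), 0.3]
cov_star = [[0.010, 0.002, 0.000, 0.001],
            [0.002, 0.040, 0.004, 0.000],
            [0.000, 0.004, 0.090, 0.003],
            [0.001, 0.000, 0.003, 0.160]]
cov = delta_method_cov(theta_star_hat, cov_star)
print(standard_errors(cov))
```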

### Generalized log-logistic distribution in joint modeling

Joint models are used to quantify the association between an internal time-dependent covariate and the time until an event of interest occurs (Wulfsohn and Tsiatis 1997). A joint model involves two submodels: a longitudinal model, which accounts for measurement error in the time-dependent covariate in order to estimate its true values, and a time-to-event model, which uses these estimated values to quantify the association between the covariate and the time to the occurrence of the event. The idea behind the joint modeling technique is to couple the time-to-event model with the longitudinal model. The general framework of the maximum likelihood method and large sample theory can be found in Rizopoulos (2012). Maximization of the log-likelihood function for joint modeling is computationally challenging, as it involves evaluating multiple integrals that do not have an analytical solution, except in very special cases. The R package **JM** has been developed by Rizopoulos (2010) to fit joint models using Weibull baseline hazard, piecewise-constant baseline hazard, spline approximation of the baseline hazard and unspecified baseline hazard functions. We have modified the source code for the Weibull case to fit joint models using the generalized log-logistic baseline hazard function. The application of the generalized log-logistic distribution in joint modeling is illustrated with an example in Section 1.

### Goodness of fit

Nonparametric estimates are useful for assessing the quality of fit of a particular parametric time-to-event model (Lawless 2002). For a model without covariates, we plot the parametric and nonparametric estimates of the survivor function on the same graph. Let \(S(t;\hat {\boldsymbol {\theta }})\) and \(\hat {S}(t)\) be the estimates of the survivor function based on the parametric model of interest and the Kaplan-Meier method (Kaplan and Meier 1958), respectively. The estimates \(S(t;\hat {\boldsymbol {\theta }})\), as a function of *t*, should be close to \(\hat {S}(t)\) if the parametric model is adequate. For a model with covariates, we consider residual diagnostic plots, where the residuals are defined based on the cumulative hazard function *H*(*t*;*θ*). If \(\hat {S}(H(t;\boldsymbol {\hat {\theta }}))\) is the Kaplan-Meier estimate of the survivor function of the residuals \(H(t;\boldsymbol {\hat {\theta }})\), treated as a censored sample, then a plot of \(-\log \hat {S}(H(t;\boldsymbol {\hat {\theta }}))\) versus \(H(t;\boldsymbol {\hat {\theta }})\) should be roughly a straight line with unit slope when the model is adequate (Lawless 2002).
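The coordinates for the residual plot can be obtained with a plain Kaplan-Meier computation applied to the residuals. The Python sketch below is one possible implementation under our own assumptions: the function names and data layout are hypothetical, ties are handled in the standard way, and points where the Kaplan-Meier estimate reaches zero are dropped because −log 0 is undefined.

```python
import math

def kaplan_meier(values, events):
    """Kaplan-Meier estimate: distinct event values and survival after each."""
    data = sorted(zip(values, events))
    n_at_risk = len(data)
    surv = 1.0
    points, estimates = [], []
    i = 0
    while i < len(data):
        v = data[i][0]
        deaths = ties = 0
        while i < len(data) and data[i][0] == v:  # group tied values
            deaths += data[i][1]
            ties += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            points.append(v)
            estimates.append(surv)
        n_at_risk -= ties
    return points, estimates

def residual_plot_points(times, events, covs, kappa, gamma, rho, beta):
    """Pairs (r, -log S_hat(r)) for residuals r_i = H(t_i; theta_hat)."""
    resid = []
    for t, z in zip(times, covs):
        lin = sum(zj * bj for zj, bj in zip(z, beta))
        resid.append(math.exp(lin) * (rho / gamma) ** kappa
                     * math.log1p((gamma * t) ** kappa))
    pts, est = kaplan_meier(resid, events)
    return [(r, -math.log(s)) for r, s in zip(pts, est) if s > 0.0]

# Small made-up example; under an adequate model the points lie near y = x.
pairs = residual_plot_points([0.5, 1.2, 2.0, 3.1], [1, 0, 1, 1],
                             [[0.0], [1.0], [0.0], [1.0]],
                             1.5, 0.7, 1.1, [0.3])
print(pairs)
```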

We also use Akaike’s information criterion (AIC) (Akaike 1974) to compare the fits of different models. The AIC is defined by

$$\textrm{AIC}=-2 \log(\mathrm{maximized~likelihood}) + 2(p+k), $$

where *p* is the number of covariates and *k* is the number of parameters of the assumed probability distribution (*k*=3 for the generalized log-logistic model). In general, when comparing two or more models, we prefer the one with the lowest AIC value. A rule of thumb is that if *Δ*_{M}=AIC_{M}−AIC_{min}>2, then there is considerably less support for Model M compared to the model with minimum AIC (Burnham and Anderson 2002).
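The AIC comparison amounts to a few lines of arithmetic. In the Python sketch below the maximized log-likelihood values are invented purely for illustration, and the function names are our own.

```python
def aic(max_loglik, p, k):
    """AIC = -2 log(maximized likelihood) + 2 (p + k)."""
    return -2.0 * max_loglik + 2.0 * (p + k)

def aic_differences(aic_by_model):
    """Delta_M = AIC_M - AIC_min for each model M."""
    best = min(aic_by_model.values())
    return {name: value - best for name, value in aic_by_model.items()}

# Invented log-likelihoods with p = 2 covariates; k = 2 for the Weibull
# and k = 3 for the generalized log-logistic baseline.
fits = {
    "Weibull PH": aic(-105.0, p=2, k=2),
    "Generalized log-logistic PH": aic(-100.0, p=2, k=3),
}
for name, delta in aic_differences(fits).items():
    verdict = "considerably less support" if delta > 2 else "comparable or best"
    print(name, delta, verdict)
```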