The power-Cauchy negative-binomial: properties and regression
Journal of Statistical Distributions and Applications volume 5, Article number: 1 (2018)
Abstract
We propose and study a new compounded model to extend the half-Cauchy and power-Cauchy distributions, which offers more flexibility in modeling lifetime data. The proposed model is analytically tractable and can be used effectively to analyze censored and uncensored data sets. Its density function can have various shapes such as reversed-J and right-skewed. It can accommodate different hazard shapes such as decreasing, upside-down bathtub and decreasing-increasing-decreasing. Some mathematical properties of the new distribution, such as ordinary and incomplete moments, can be determined from a linear combination for its density function. The performance of the maximum likelihood method to estimate the model parameters is investigated by a simulation study. Further, we introduce the new log-power-Cauchy negative-binomial regression model for censored data, which includes as sub-models some widely known regression models that can be applied to censored data. Four real-life data sets, of which one is censored, have been analyzed and the new models provide adequate fits.
Introduction
Numerous extended classical distributions have been proposed for modelling data in several areas such as biological studies, environmental and medical sciences, engineering, economics, finance and actuarial science. However, in many applied areas like lifetime analysis, finance and insurance, there is a clear need for further extended distributions, that is, new distributions which are more flexible to model real data in these areas, since the data can present a high degree of skewness and kurtosis. There are many generalizations and extensions of distributions in the literature using the randomly-stopped approach for either the minimum or maximum of K independent and identically distributed (iid) random variables (discrete or continuous). See, for example, Nekoukhou and Bidram (2017). Further, Rooks et al. (2010) introduced a two-parameter power-Cauchy (PC) distribution for analyzing upside-down bathtub (UBT) hazard function data. The cumulative distribution function (cdf) and probability density function (pdf) of the PC distribution with shape parameter α and scale parameter σ are, respectively, given by
\[G(z;\alpha,\sigma)=\frac{2}{\pi}\tan^{-1}\left[\left(\frac{z}{\sigma}\right)^{\alpha}\right],\quad z>0, \tag{1}\]
and
\[g(z;\alpha,\sigma)=\frac{2\alpha}{\pi\sigma}\left(\frac{z}{\sigma}\right)^{\alpha-1}\left[1+\left(\frac{z}{\sigma}\right)^{2\alpha}\right]^{-1},\quad z>0. \tag{2}\]
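Tahir et al. (2016) studied the exponentiated power-Cauchy (EPC) distribution. Let Z_{a} denote a random variable having the EPC distribution with baseline parameters α and σ and power parameter a>0. The cdf and pdf of Z_{a} are given by
\[H_{a}(z)=\left\{\frac{2}{\pi}\tan^{-1}\left[\left(\frac{z}{\sigma}\right)^{\alpha}\right]\right\}^{a} \tag{3}\]
and
\[h_{a}(z)=a\left(\frac{2}{\pi}\right)^{a}\frac{\alpha}{\sigma}\left(\frac{z}{\sigma}\right)^{\alpha-1}\left[1+\left(\frac{z}{\sigma}\right)^{2\alpha}\right]^{-1}\left\{\tan^{-1}\left[\left(\frac{z}{\sigma}\right)^{\alpha}\right]\right\}^{a-1}, \tag{4}\]
respectively.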
In this paper, we define a new four-parameter generalization of the PC distribution named the power-Cauchy negative-binomial (PCNB) model. The new distribution is flexible enough to model complex positive real data sets, i.e., it can have decreasing, UBT-shaped and decreasing-increasing-decreasing hazard rate functions (hrfs). It thus provides a good alternative to several well-known lifetime distributions.
The paper is organized as follows. In “The proposed model” section, we define the PCNB distribution. In “Properties of the new model” section, we obtain some of its mathematical properties including the quantile function (qf), tail behaviors, a useful linear representation for its density function and some types of moments. In “Estimation” section, the model parameters are estimated by maximum likelihood and a simulation study is performed. In “Regression model” section, we present a regression model based on the PCNB distribution with censored data. In “Applications” section, the usefulness of the new distribution is illustrated by means of four real data sets where we show empirically that it outperforms some well-known lifetime distributions. Finally, “Concluding remarks” section offers some concluding remarks.
The proposed model
General insurance companies typically face two major problems when they want to use past or present claim amounts in forecasting future claim severity. First, they have to find an appropriate statistical distribution for their large volumes of claim amounts. Second, they have to test how well this statistical distribution fits their claim data. Most data in general insurance problems are skewed to the right and therefore most distributions that exhibit this characteristic can be used to model the claim severity. Insurance data contain relatively large claim amounts, which may be infrequent. Hence, there is a clear need to use highly skewed statistical distributions with relatively heavy tails, like the PC distribution.
Large claims play a special role because of their importance financially. It is also hard to assess their distribution. They do not occur very often, and historical experience is therefore limited. Insurance companies may even cover claims larger than anything that has been seen before. How should such situations be tackled? The simplest would be to fit a parametric family and try to extrapolate beyond past experience. That may not be a very good idea. A generalization of the PC distribution may fit well in the central regions without being reliable at all at the extreme right tail, and such a procedure may easily underestimate big claims severely.
Let T_{1},…,T_{K} denote the failure times of K (a latent random variable) claims, where K is assumed independent of the T_{i}'s in a setup with at least one claim. Then, we define Z= max{T_{1},…,T_{K}}. Consider that the T_{i}'s are iid random variables with common cdf G(z) and that K follows the negative-binomial (NB) probability mass function (n=1,2,… and p are fixed but unknown parameters)
\[P(K=k)=\binom{k-1}{n-1}\,p^{n}(1-p)^{k-n},\quad k=n,n+1,\ldots\]
Under this setup, the conditional pdf of Z given K=k is
\[f(z\mid K=k)=k\,g(z)\,G(z)^{k-1},\]
where g(z)=dG(z)/dz. Then, the marginal pdf of Z follows as
\[f(z)=n\,p^{n}\,g(z)\,G(z)^{n-1}\,[1-(1-p)\,G(z)]^{-(n+1)}. \tag{5}\]
The cdf of Z (which holds for any positive real n) is given by
\[F(z)=\left\{\frac{p\,G(z)}{1-(1-p)\,G(z)}\right\}^{n}. \tag{6}\]
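Inserting Eqs. (1) and (2) in Eq. (5), we obtain
\[f(z)=n\,p^{n}\left(\frac{2}{\pi}\right)^{n}\frac{\alpha}{\sigma}\left(\frac{z}{\sigma}\right)^{\alpha-1}\left[1+\left(\frac{z}{\sigma}\right)^{2\alpha}\right]^{-1}\left\{\tan^{-1}\left[\left(\frac{z}{\sigma}\right)^{\alpha}\right]\right\}^{n-1}\left\{1-\frac{2(1-p)}{\pi}\tan^{-1}\left[\left(\frac{z}{\sigma}\right)^{\alpha}\right]\right\}^{-(n+1)}, \tag{7}\]
where α,σ>0 and p∈(0,1). Henceforth, we denote by Z∼PCNB(n,p,α,σ) a random variable having the density (7). The cdf of Z is given by
\[F(z)=\left\{\frac{(2p/\pi)\tan^{-1}\left[(z/\sigma)^{\alpha}\right]}{1-\left[2(1-p)/\pi\right]\tan^{-1}\left[(z/\sigma)^{\alpha}\right]}\right\}^{n}. \tag{8}\]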
Clearly, if p=n=1, the PCNB model is identical to the PC distribution (2). Moreover, the PCNB(n,p,α,σ) model has the following six sub-models:
(i) If p=n=α=1, it gives the half-Cauchy (HC) distribution;
(ii) If α=1, it reduces to the half-Cauchy negative-binomial (HCNB) distribution;
(iii) If n=1, it gives the PC-geometric distribution;
(iv) If α=n=1, it becomes the HC-geometric distribution;
(v) If p=1, it reduces to the exponentiated-PC distribution;
(vi) If p=n=1, it becomes the PC distribution.
Note that the special models given in (ii), (iii) and (iv) do not exist in the literature.
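The reductions listed above can be checked numerically. The following Python sketch assumes that the PCNB cdf has the compounding form F(z)={pG(z)/[1−(1−p)G(z)]}^n, where G is the PC cdf, and that the pdf is its derivative. The function names are illustrative and do not come from the paper.

```python
import math

def pc_cdf(z, alpha, sigma):
    """PC cdf: G(z) = (2/pi) * arctan((z/sigma)**alpha), for z > 0."""
    return (2.0 / math.pi) * math.atan((z / sigma) ** alpha)

def pc_pdf(z, alpha, sigma):
    """PC pdf: derivative of pc_cdf with respect to z."""
    t = z / sigma
    return 2.0 * alpha * t ** (alpha - 1.0) / (math.pi * sigma * (1.0 + t ** (2.0 * alpha)))

def pcnb_cdf(z, n, p, alpha, sigma):
    """Assumed PCNB cdf: {p G / [1 - (1 - p) G]}**n."""
    G = pc_cdf(z, alpha, sigma)
    return (p * G / (1.0 - (1.0 - p) * G)) ** n

def pcnb_pdf(z, n, p, alpha, sigma):
    """PCNB pdf obtained by differentiating pcnb_cdf."""
    G = pc_cdf(z, alpha, sigma)
    g = pc_pdf(z, alpha, sigma)
    return n * p ** n * g * G ** (n - 1.0) / (1.0 - (1.0 - p) * G) ** (n + 1.0)

z = 2.0  # arbitrary evaluation point
# (vi) p = n = 1 recovers the PC distribution.
print(pcnb_cdf(z, 1.0, 1.0, 1.5, 1.0), pc_cdf(z, 1.5, 1.0))
# (v) p = 1 gives the exponentiated-PC cdf G(z)**n.
print(pcnb_cdf(z, 2.5, 1.0, 1.5, 1.0), pc_cdf(z, 1.5, 1.0) ** 2.5)
```

Each printed pair should agree up to rounding. The other sub-models follow in the same way by fixing α=1 or n=1.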
The survival function (sf), hrf and reversed hazard rate function (rhrf) of Z are given by S(z)=1−F(z), h(z)=f(z)/S(z) and r(z)=f(z)/F(z), respectively, where F(z) and f(z) are defined before. Figures 1 and 2 display some plots of the density and hrf of Z for σ=1 and different values of α, p and n. The plots in Fig. 1a and b reveal that the PCNB density can have different shapes such as right-skewed and reversed-J. The plots in Fig. 2a and b indicate that the hrf of Z can have DFR (decreasing failure rate), UBT and DID (decreasing-increasing-decreasing) shapes.
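As a minimal sketch of these three functions, the Python code below evaluates S(z), h(z) and r(z) at a point. It assumes the compounding form F(z)={pG(z)/[1−(1−p)G(z)]}^n for the PCNB cdf. The function name and the grid of z values are hypothetical, and the half-Cauchy sub-model (n=p=α=σ=1) is used only for illustration.

```python
import math

def pcnb_sf_hrf_rhrf(z, n, p, alpha, sigma):
    """Return (S(z), h(z), r(z)) for the assumed PCNB form."""
    t = (z / sigma) ** alpha
    G = (2.0 / math.pi) * math.atan(t)                      # PC cdf
    g = 2.0 * alpha * (z / sigma) ** (alpha - 1.0) / (math.pi * sigma * (1.0 + t * t))  # PC pdf
    F = (p * G / (1.0 - (1.0 - p) * G)) ** n                # PCNB cdf
    f = n * p ** n * g * G ** (n - 1.0) / (1.0 - (1.0 - p) * G) ** (n + 1.0)  # PCNB pdf
    S = 1.0 - F
    return S, f / S, f / F

# Tabulate the hazard on a small grid to inspect its shape.
for z in (0.01, 0.5, 1.0, 2.0, 5.0):
    S, h, r = pcnb_sf_hrf_rhrf(z, 1.0, 1.0, 1.0, 1.0)
    print(z, round(S, 4), round(h, 4), round(r, 4))
```

For this particular parameter choice the tabulated hazard rises and then falls, which is the UBT pattern. Other shapes can be explored by changing n, p and α.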
Properties of the new model
In this section, we provide some structural properties of the new distribution.
Quantile function and random number generation
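The qf of Z is determined by inverting (8) as
\[Q(u)=\sigma\left\{\tan\left[\frac{\pi\,u^{1/n}}{2\,\{p+(1-p)\,u^{1/n}\}}\right]\right\}^{1/\alpha},\quad 0<u<1. \tag{9}\]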
We can easily generate PCNB random variables from (9).
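The inverse transform method behind this statement can be sketched in Python as follows. The sketch assumes the cdf form F(z)={pG(z)/[1−(1−p)G(z)]}^n with G the PC cdf, so that the quantile is obtained by solving for G and then inverting the arctangent. The function names, the seed and the parameter values are illustrative only.

```python
import math
import random

def pcnb_cdf(z, n, p, alpha, sigma):
    """Assumed PCNB cdf, used here only to check the inversion."""
    G = (2.0 / math.pi) * math.atan((z / sigma) ** alpha)
    return (p * G / (1.0 - (1.0 - p) * G)) ** n

def pcnb_quantile(u, n, p, alpha, sigma):
    """Solve F(z) = u for z, with 0 <= u < 1."""
    r = u ** (1.0 / n)              # r = p G / [1 - (1 - p) G]
    G = r / (p + (1.0 - p) * r)     # value of the PC cdf at the quantile
    return sigma * math.tan(0.5 * math.pi * G) ** (1.0 / alpha)

def pcnb_sample(size, n, p, alpha, sigma, rng=None):
    """Draw PCNB variates by applying the quantile function to uniforms."""
    rng = rng or random.Random()
    return [pcnb_quantile(rng.random(), n, p, alpha, sigma) for _ in range(size)]

rng = random.Random(2018)  # arbitrary seed for reproducibility
print(pcnb_sample(5, 1.5, 0.3, 1.5, 1.0, rng))
# Round trip: F(Q(u)) should return u up to rounding error.
print(pcnb_cdf(pcnb_quantile(0.25, 1.5, 0.3, 1.5, 1.0), 1.5, 0.3, 1.5, 1.0))
```

Because the quantile function is available in closed form, no rejection step or numerical root finding is needed.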
Tail behaviors
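The tail behaviors of the pdf and cdf of Z in (7) and (8) are given as follows:
\[f(z)\sim\frac{A\,n\,\alpha}{\sigma}\left(\frac{z}{\sigma}\right)^{n\alpha-1}\quad\text{and}\quad F(z)\sim A\left(\frac{z}{\sigma}\right)^{n\alpha}\quad\text{as } z\to 0,\]
\[f(z)\sim\frac{B\,\alpha}{\sigma}\left(\frac{z}{\sigma}\right)^{-\alpha-1}\quad\text{and}\quad 1-F(z)\sim B\left(\frac{z}{\sigma}\right)^{-\alpha}\quad\text{as } z\to\infty,\]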
where A=(2p/π)^{n} and B=2n/(p π). For example, for fixed values of n and p, both tails of the PCNB distribution become lighter when α increases, since F(z) vanishes faster as z→0 and 1−F(z) decays faster as z→∞. Also, for fixed values of α and p, the left tail becomes lighter when n increases.
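The constants A and B can be checked numerically. The sketch below assumes the cdf form F(z)={pG(z)/[1−(1−p)G(z)]}^n and the approximations F(z)≈A(z/σ)^{nα} near zero and 1−F(z)≈B(z/σ)^{−α} for large z. The parameter values and evaluation points are arbitrary.

```python
import math

def pcnb_cdf(z, n, p, alpha, sigma):
    """Assumed PCNB cdf built from the PC cdf G."""
    G = (2.0 / math.pi) * math.atan((z / sigma) ** alpha)
    return (p * G / (1.0 - (1.0 - p) * G)) ** n

n, p, alpha, sigma = 1.5, 0.3, 1.5, 1.0
A = (2.0 * p / math.pi) ** n
B = 2.0 * n / (p * math.pi)

z_small, z_large = 1e-4, 1e4
# Ratio of the exact cdf to its left-tail approximation.
left_ratio = pcnb_cdf(z_small, n, p, alpha, sigma) / (A * (z_small / sigma) ** (n * alpha))
# Ratio of the exact survival function to its right-tail approximation.
right_ratio = (1.0 - pcnb_cdf(z_large, n, p, alpha, sigma)) / (B * (z_large / sigma) ** (-alpha))
print(left_ratio, right_ratio)  # both ratios should be close to 1
```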
Moments
For any real q>0 and |t|<1, the power series \((1-t)^{-q}=\sum_{j=0}^{\infty}(q)_{j}\,t^{j}/j!\) holds, where (q)_{j}=q(q+1)⋯(q+j−1)=Γ(q+j)/Γ(q) is the ascending factorial and (q)_{0}=1. Then, the cdf of Z in Eq. (8) can be expressed as
\[F(z)=\sum_{j=0}^{\infty}B_{j}(n,p)\,G_{PC}(z;\alpha,\sigma)^{n+j}, \tag{10}\]
where B_{j}(n,p)=(n)_{j} p^{n}(1−p)^{j}/j! (for j≥0) and G_{PC}(z;α,σ) is the cdf given in Eq. (1).
By differentiating Eq. (10), the pdf of Z follows as
\[f(z)=\sum_{j=0}^{\infty}B_{j}(n,p)\,h_{n+j}(z), \tag{11}\]
where h_{n+j}(z)=h_{n+j}(z;α,σ) is the EPC density function with power parameter n+j given by Eq. (4). Equation (11) reveals that the PCNB density is a linear combination of EPC densities. So, some mathematical properties of Z can be obtained from those of the EPC distribution. Next, we provide two examples.
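The linear representation can be verified numerically by truncating the series. The Python sketch below assumes the closed form F(z)={pG(z)/[1−(1−p)G(z)]}^n and the weights B_j(n,p)=(n)_j p^n (1−p)^j/j!, computed through the recurrence B_{j+1}=B_j (n+j)(1−p)/(j+1). The truncation point of 200 terms and the function names are arbitrary choices.

```python
import math

def pc_cdf(z, alpha, sigma):
    """PC cdf G(z)."""
    return (2.0 / math.pi) * math.atan((z / sigma) ** alpha)

def mixture_weights(n, p, terms=200):
    """First `terms` weights B_j(n, p), built by recurrence from B_0 = p**n."""
    weights, b = [], p ** n
    for j in range(terms):
        weights.append(b)
        b *= (n + j) * (1.0 - p) / (j + 1.0)
    return weights

def pcnb_cdf_series(z, n, p, alpha, sigma, terms=200):
    """Truncated series: sum over j of B_j * G(z)**(n + j)."""
    G = pc_cdf(z, alpha, sigma)
    return sum(b * G ** (n + j) for j, b in enumerate(mixture_weights(n, p, terms)))

def pcnb_cdf_closed(z, n, p, alpha, sigma):
    """Assumed closed-form PCNB cdf."""
    G = pc_cdf(z, alpha, sigma)
    return (p * G / (1.0 - (1.0 - p) * G)) ** n

print(pcnb_cdf_series(2.0, 1.5, 0.5, 1.5, 1.0), pcnb_cdf_closed(2.0, 1.5, 0.5, 1.5, 1.0))
print(sum(mixture_weights(1.5, 0.5)))  # the weights should sum to about 1
```

The terms decay geometrically at rate (1−p)G(z), so more terms are needed when p is close to zero.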
Tahir et al. (2016) (see Sections 6.8 and 6.9) determined the sth ordinary and incomplete moments of Z_{ a } as
and
respectively, where a_{0}(s)=1, a_{1}(s)=s/3, a_{2}(s)=s(5s+7)/90, etc., and D_{z}=2 π^{−1} tan^{−1}[(z/σ)^{α}].
Then, the rth ordinary moment of Z follows from Eqs. (11) and (12) as
Analogously, the rth incomplete moment of Z, say \(m_{r}(z)=\int_{0}^{z} x^{r}\,f_{PCNB}(x)\,dx\), can be obtained from (11) and (13) as
The first incomplete moment m_{1}(z) follows from Eq. (15) for r=1. It is useful to obtain the Bonferroni and Lorenz curves and mean deviations for the new model.
Estimation
Several approaches for parameter point estimation were proposed in the literature but the maximum likelihood method is the most commonly employed. The maximum likelihood estimates (MLEs) enjoy desirable properties that can be used when constructing confidence intervals for the model parameters. Large sample theory for these estimates delivers simple approximations that work well in finite samples. The normal approximation for the MLEs in distribution theory is easily handled either analytically or numerically.
We consider the estimation of the unknown parameters of the new distribution by the maximum likelihood method. Let z_{1},…,z_{m} be m observed values from the PCNB distribution given by (7) with vector of parameters θ=(n,p,α,σ)^{⊤}. The log-likelihood ℓ=ℓ(θ) for θ is given by
\[\ell(\theta)=m\log\left(\frac{n\,p^{n}\,\alpha}{\sigma}\right)+m\,n\log\left(\frac{2}{\pi}\right)+(\alpha-1)\sum_{i=1}^{m}\log\left(\frac{z_{i}}{\sigma}\right)-\sum_{i=1}^{m}\log\left[1+\left(\frac{z_{i}}{\sigma}\right)^{2\alpha}\right]+(n-1)\sum_{i=1}^{m}\log\left\{\tan^{-1}\left[\left(\frac{z_{i}}{\sigma}\right)^{\alpha}\right]\right\}-(n+1)\sum_{i=1}^{m}\log\left\{1-\frac{2(1-p)}{\pi}\tan^{-1}\left[\left(\frac{z_{i}}{\sigma}\right)^{\alpha}\right]\right\}. \tag{16}\]
Equation (16) can be maximized directly by using well-known computing platforms such as R (optim function), SAS (PROC NLMIXED) and Ox (subroutine MaxBFGS). These routines can be run for a wide range of initial values. This process often leads to more than one maximum. In these cases, we take the MLEs corresponding to the largest value of the log-likelihood. In a few cases, no maximum is identified for the selected initial values. In these cases, new initial values can be tried in order to obtain a maximum. There exist sufficient conditions for the existence of the MLEs such as compactness of the parameter space and the concavity of the log-likelihood function. These estimates can exist even when such conditions are not satisfied. For more complex models, and in particular when there is no explicit solution, it is nearly impossible to establish theoretical conditions on the existence and uniqueness of the MLEs. However, such properties can be investigated numerically for this distribution and a given data set.
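A minimal Python sketch of the objective function is given below. It assumes that the log-likelihood is the sum of the logarithms of the PCNB density with f(z)=n p^n g(z) G(z)^{n−1}[1−(1−p)G(z)]^{−(n+1)}, where G and g are the PC cdf and pdf. The function names and the unconstrained reparameterization (log n, logit p, log α, log σ) are assumptions made for illustration and are not taken from the paper.

```python
import math

def pcnb_loglik(data, n, p, alpha, sigma):
    """Log-likelihood of a complete sample under the assumed PCNB density."""
    m = len(data)
    ll = m * (math.log(n) + n * math.log(p) + n * math.log(2.0 / math.pi)
              + math.log(alpha) - math.log(sigma))
    for z in data:
        t = (z / sigma) ** alpha
        at = math.atan(t)
        ll += (alpha - 1.0) * math.log(z / sigma)
        ll -= math.log1p(t * t)
        ll += (n - 1.0) * math.log(at)
        ll -= (n + 1.0) * math.log(1.0 - (1.0 - p) * (2.0 / math.pi) * at)
    return ll

def neg_loglik_unconstrained(eta, data):
    """Objective for a generic minimizer; eta = (log n, logit p, log alpha, log sigma)."""
    n = math.exp(eta[0])
    p = 1.0 / (1.0 + math.exp(-eta[1]))
    alpha = math.exp(eta[2])
    sigma = math.exp(eta[3])
    return -pcnb_loglik(data, n, p, alpha, sigma)

toy = [0.4, 1.0, 2.5, 7.0]  # illustrative values, not data from the paper
print(pcnb_loglik(toy, 1.5, 0.3, 1.5, 1.0))
print(neg_loglik_unconstrained([0.0, 0.0, 0.0, 0.0], toy))
```

The reparameterization keeps n, α, σ positive and p inside (0,1), so the objective can be handed to any unconstrained minimizer. Restarting from several initial values and keeping the largest log-likelihood follows the strategy described above.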
For interval estimation on the model parameters, we can evaluate the estimated observed information matrix \(J(\widehat{\boldsymbol{\theta}})\) numerically. Further, we can easily check if the fit using the PCNB model is statistically “superior” to the fits using any of its six special models. For example, for comparing the PCNB and HC distributions, i.e., testing the null hypothesis H_{0}:p=n=α=1 against H_{1}:H_{0} is false, the likelihood ratio (LR) statistic is given by \(w = 2\{\ell(\widehat{\theta}) - \ell(\widetilde{\theta})\}\), where \(\widehat{\theta}\) and \(\widetilde{\theta}\) are the unrestricted and restricted estimates obtained by maximizing ℓ=ℓ(θ) under H_{1} and H_{0}, respectively. The limiting distribution of this statistic is \(\chi_{3}^{2}\) under the null hypothesis, which is rejected if w exceeds the upper 100(1−γ)% quantile of the \(\chi_{3}^{2}\) distribution.
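The LR decision rule can be sketched in Python without external libraries, because the upper tail of a chi-square distribution with integer degrees of freedom has a closed form. The function names are hypothetical, and the log-likelihood values in the usage line are made-up numbers for illustration, not results from the paper.

```python
import math

def chi2_sf(w, df):
    """Upper tail probability P(X > w) for a chi-square variable with integer df >= 1."""
    if w <= 0.0:
        return 1.0
    half = 0.5 * w
    if df % 2 == 1:
        k, sf = 1, math.erfc(math.sqrt(half))   # closed form for df = 1
    else:
        k, sf = 2, math.exp(-half)              # closed form for df = 2
    # Recurrence: Q(w; k + 2) = Q(w; k) + (w/2)**(k/2) * exp(-w/2) / Gamma(k/2 + 1).
    while k < df:
        sf += math.exp(0.5 * k * math.log(half) - half - math.lgamma(0.5 * k + 1.0))
        k += 2
    return sf

def lr_test(loglik_unrestricted, loglik_restricted, df, gamma=0.05):
    """Return the LR statistic, its asymptotic p-value and the rejection decision."""
    w = 2.0 * (loglik_unrestricted - loglik_restricted)
    p_value = chi2_sf(w, df)
    return w, p_value, p_value < gamma

# Hypothetical maximized log-likelihoods for the PCNB (full) and HC (restricted) fits.
print(lr_test(-100.0, -105.0, df=3))
```

Here df=3 matches the three restrictions p=n=α=1. Other sub-models use the number of parameters they fix.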
The PCNB survival function has a closed-form expression and hence this distribution can be used effectively in analyzing lifetime data in the presence of censoring. Consider a situation where the time to event is not completely observed and is subject to right censoring. Let c_{i} denote the censoring time. We then observe z_{i}= min(t_{i},c_{i}), where t_{i} is the time to the event and c_{i} is the right-censoring time, for i=1,…,m. The log-likelihood function reduces to
\[\ell(\theta)=\sum_{i:\,t_{i}\le c_{i}}\log f(z_{i})+\sum_{i:\,t_{i}>c_{i}}\log S(z_{i}),\]
where f and S are the PCNB density (7) and survival function, respectively.
The above loglikelihood can be maximized numerically to obtain the MLEs. We use the optim routine in the R software.
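A sketch of the right-censored log-likelihood in Python is shown below. It assumes the usual construction in which observed failures contribute log f(z_i) and censored observations contribute log S(z_i), with the PCNB cdf taken as F(z)={pG(z)/[1−(1−p)G(z)]}^n. The function names and the boolean event indicator are illustrative.

```python
import math

def pcnb_logpdf(z, n, p, alpha, sigma):
    """Log of the assumed PCNB density."""
    t = (z / sigma) ** alpha
    G = (2.0 / math.pi) * math.atan(t)
    log_g = (math.log(2.0 * alpha / (math.pi * sigma))
             + (alpha - 1.0) * math.log(z / sigma) - math.log1p(t * t))
    return (math.log(n) + n * math.log(p) + log_g
            + (n - 1.0) * math.log(G) - (n + 1.0) * math.log(1.0 - (1.0 - p) * G))

def pcnb_logsf(z, n, p, alpha, sigma):
    """Log of the PCNB survival function S(z) = 1 - F(z)."""
    G = (2.0 / math.pi) * math.atan((z / sigma) ** alpha)
    return math.log1p(-((p * G / (1.0 - (1.0 - p) * G)) ** n))

def censored_loglik(times, observed, n, p, alpha, sigma):
    """Sum log f over failures (observed=True) and log S over censored times."""
    total = 0.0
    for z, is_event in zip(times, observed):
        if is_event:
            total += pcnb_logpdf(z, n, p, alpha, sigma)
        else:
            total += pcnb_logsf(z, n, p, alpha, sigma)
    return total

# Illustrative values only: two failures and one censored time.
print(censored_loglik([0.8, 2.0, 5.0], [True, True, False], 1.5, 0.3, 1.5, 1.0))
```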
Monte Carlo simulation study. Now we assess the performance of the maximum likelihood method for estimating the PCNB parameters using Monte Carlo simulations. The simulation study is repeated 5000 times each with sample sizes m=50,100,200,500 and parameter scenarios: I: p=0.8, n=0.5, α=0.5 and σ=1, II: p=0.5, n=0.5, α=1.5 and σ=1 and III: p=0.1, n=1.5, α=1.5 and σ=1. Table 1 gives the average biases (Bias) of the MLEs, mean square errors (MSE) and model-based coverage probabilities (CP) for the parameters p, n, α and σ under these scenarios and different sample sizes. Based on the simulation results, we conclude that the MLEs perform well in estimating the parameters of the PCNB distribution. The CPs of the confidence intervals are quite close to the 95% nominal levels. Therefore, the MLEs and their asymptotic results can be adopted for estimating and constructing confidence intervals for the model parameters.
Regression model
In many practical applications, the lifetimes are affected by explanatory variables such as the cholesterol level, blood pressure, weight and many others. Parametric models to estimate univariate survival functions and for censored data regression problems are widely used. A regression model that provides a good fit to lifetime data tends to yield more precise estimates of the quantities of interest.
In applications in the area of survival analysis, the hrf is often U-shaped or unimodal, i.e., the function is not monotonic. The regression models commonly used for survival data are the log-Weibull model, which has a monotonic failure rate, and the log-logistic model, which has decreasing or unimodal failure rate functions. One of the objectives of this work is to propose a new regression model, in location and scale form, called the log-power-Cauchy negative-binomial (LPCNB) regression model, which presents different failure rate functional forms. The proposed model is an alternative to the traditional extreme value (or log-Weibull), logistic and log-normal models, among others. One way to study the effect of explanatory variables on the response variable Y is through a location-scale regression model, also known as an accelerated lifetime model. These models consider that the response variable belongs to a family of distributions characterized by a location parameter and a scale parameter. Further details on this class of regression models can be found in Cox and Oakes (1984), Kalbfleisch and Prentice (2002) and Lawless (2003). In the context of survival analysis, some distributions have been used to analyze censored data. For example, more recently, Cruz et al. (2016) proposed the log-odd log-logistic Weibull regression model with censored data, Lanjoni et al. (2016) defined an extended Burr XII regression model and Ortega et al. (2016) introduced the odd Birnbaum-Saunders regression model with applications to lifetime data. In a similar manner, we define a location-scale regression model based on the LPCNB distribution.
Let Z∼PCNB(n,p,α,σ) be a random variable having the density (7). A class of regression models for location and scale is characterized by the fact that the random variable Y= log(Z) has a distribution with location parameter μ(v), which depends only on the explanatory variable vector, and a scale parameter a. Then, we can write Y=μ(v)+aW, where a>0 and the distribution of W does not depend on v.
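The random variable Y=log(Z) reparameterized in terms of μ= log(σ) and a=α^{−1} has density function (for \(y \in \mathbb{R}\)) given by
\[f(y)=\frac{n\,p^{n}}{a}\left(\frac{2}{\pi}\right)^{n}\frac{\exp\left(\frac{y-\mu}{a}\right)}{1+\exp\left[\frac{2(y-\mu)}{a}\right]}\left\{\tan^{-1}\left[\exp\left(\frac{y-\mu}{a}\right)\right]\right\}^{n-1}\left\{1-\frac{2(1-p)}{\pi}\tan^{-1}\left[\exp\left(\frac{y-\mu}{a}\right)\right]\right\}^{-(n+1)}, \tag{17}\]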
where n>0 and p∈(0,1) are shape parameters, \(\mu \in \mathbb {R}\) is the location parameter and a>0 is the scale parameter.
We refer to Eq. (17) as the LPCNB distribution, say Y∼LPCNB(n,p,μ,a). If Z∼PCNB(n,p,α,σ), then Y= log(Z)∼LPCNB(n,p,μ,a).
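For p=n=1, we obtain the log-power-Cauchy (LPC) model. The survival function corresponding to Eq. (17) is given by
\[S(y)=1-\left\{\frac{(2p/\pi)\tan^{-1}\left[\exp\left(\frac{y-\mu}{a}\right)\right]}{1-\left[2(1-p)/\pi\right]\tan^{-1}\left[\exp\left(\frac{y-\mu}{a}\right)\right]}\right\}^{n}. \tag{18}\]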
Plots of the density function (17) for selected parameter values are displayed in Fig. 3a and b, which show great flexibility for different values of p and n.
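The density and survival function of Y=log(Z) can be sketched in Python by a change of variable from the PCNB model, assuming f_Y(y)=f_Z(e^y)e^y with μ=log(σ) and a=1/α, and the PCNB cdf F(z)={pG(z)/[1−(1−p)G(z)]}^n. The function names are illustrative, and the code is meant for moderate values of (y−μ)/a only, since math.exp overflows for very large arguments.

```python
import math

def lpcnb_pdf(y, n, p, mu, a):
    """Assumed LPCNB density, written in terms of w = (y - mu) / a."""
    w = (y - mu) / a
    G = (2.0 / math.pi) * math.atan(math.exp(w))     # PC cdf evaluated at exp(y)
    core = 0.5 / math.cosh(w)                        # equals exp(w) / (1 + exp(2w))
    return (n * p ** n / a) * (2.0 / math.pi) * core * G ** (n - 1.0) \
        / (1.0 - (1.0 - p) * G) ** (n + 1.0)

def lpcnb_sf(y, n, p, mu, a):
    """Assumed LPCNB survival function."""
    w = (y - mu) / a
    G = (2.0 / math.pi) * math.atan(math.exp(w))
    return 1.0 - (p * G / (1.0 - (1.0 - p) * G)) ** n

# Evaluate at a few points for illustrative parameter values.
for y in (-2.0, 0.0, 0.5, 3.0):
    print(y, lpcnb_pdf(y, 1.5, 0.3, 0.5, 2.0), lpcnb_sf(y, 1.5, 0.3, 0.5, 2.0))
```

Writing the kernel as 0.5/cosh(w) avoids forming exp(2w) explicitly. A quick sanity check is that the density integrates to about one over a wide grid.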
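We define the standardized random variable W=(Y−μ)/a having the density function
\[\pi(w)=n\,p^{n}\left(\frac{2}{\pi}\right)^{n}\frac{\exp(w)}{1+\exp(2w)}\left\{\tan^{-1}\left[\exp(w)\right]\right\}^{n-1}\left\{1-\frac{2(1-p)}{\pi}\tan^{-1}\left[\exp(w)\right]\right\}^{-(n+1)},\quad w\in\mathbb{R}. \tag{19}\]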
Next, we propose a linear location-scale regression model linking the response variable y_{i} and the explanatory variable vector \(\mathbf{v}_{i}^{T}=(v_{i1},\ldots,v_{ip})\) given by
\[y_{i}=\mathbf{v}_{i}^{T}\boldsymbol{\tau}+a\,w_{i},\quad i=1,\ldots,m, \tag{20}\]
where the random error w_{i} has density function (19), τ=(τ_{1},…,τ_{p})^{T}, a>0, n>0 and p∈(0,1) are unknown parameters. The parameter \(\phi_{i}=\mathbf{v}_{i}^{T}\tau\) is the location of y_{i}. The location parameter vector ϕ=(ϕ_{1},…,ϕ_{m})^{T} is represented by a linear model ϕ=Vτ, where V=(v_{1},…,v_{m})^{T} is a known model matrix. The LPCNB model (20) opens new possibilities for fitting many different types of data.
Consider a sample (y_{1},v_{1}),…,(y_{m},v_{m}) of m independent observations, where each random response is defined by y_{i}= min{log(z_{i}), log(c_{i})}. We assume non-informative censoring such that the observed lifetimes and censoring times are independent. Let F and C be the sets of individuals for which y_{i} is the log-lifetime or log-censoring, respectively. Conventional likelihood estimation techniques can be applied here. The log-likelihood function for the vector of parameters θ=(p,n,a,τ^{T})^{T} from model (20) has the form \(l(\boldsymbol{\theta})=\sum\limits_{i \in F}l_{i}(\boldsymbol{\theta})+\sum\limits_{i \in C}l_{i}^{(c)}(\boldsymbol{\theta})\), where \(l_{i}(\boldsymbol{\theta})=\log[f(y_{i})]\), \(l_{i}^{(c)}(\boldsymbol{\theta})=\log[S(y_{i})]\), f(y_{i}) is the density (17) and S(y_{i}) is the survival function (18) of Y_{i}. The total log-likelihood function for θ reduces to
\[l(\boldsymbol{\theta})=q\log\left[\frac{n\,p^{n}}{a}\left(\frac{2}{\pi}\right)^{n}\right]+\sum_{i\in F}w_{i}-\sum_{i\in F}\log\left[1+\exp(2w_{i})\right]+(n-1)\sum_{i\in F}\log\left\{\tan^{-1}\left[\exp(w_{i})\right]\right\}-(n+1)\sum_{i\in F}\log\left\{1-\frac{2(1-p)}{\pi}\tan^{-1}\left[\exp(w_{i})\right]\right\}+\sum_{i\in C}\log\left(1-\left\{\frac{(2p/\pi)\tan^{-1}\left[\exp(w_{i})\right]}{1-\left[2(1-p)/\pi\right]\tan^{-1}\left[\exp(w_{i})\right]}\right\}^{n}\right), \tag{21}\]
where q is the number of uncensored observations (failures) and \(w_{i}=\left(y_{i}-\mathbf{v}_{i}^{T}\boldsymbol{\tau}\right)/a\). The MLE \(\widehat{\boldsymbol{\theta}}\) of θ can be evaluated by maximizing the log-likelihood (21). We use the procedure NLMixed in SAS to calculate \(\widehat{\boldsymbol{\theta}}\). Initial values for τ and a are taken from the fit of the LPC regression model with p=n=1.
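The censored regression log-likelihood can be sketched in Python as below. The sketch assumes that each failure contributes the log of the LPCNB density and each censored observation contributes the log of the LPCNB survival function, both evaluated at w_i=(y_i−v_i^Tτ)/a. The function name, argument order and toy inputs are hypothetical, and the paper itself uses SAS NLMixed rather than this code.

```python
import math

def lpcnb_reg_loglik(y, V, observed, n, p, a, tau):
    """Log-likelihood of the location-scale model y_i = v_i' tau + a * w_i.

    y: log-times; V: rows of covariates; observed: True for failures,
    False for right-censored responses.
    """
    total = 0.0
    for yi, vi, is_event in zip(y, V, observed):
        location = sum(vij * tj for vij, tj in zip(vi, tau))
        w = (yi - location) / a
        at = math.atan(math.exp(w))
        G = (2.0 / math.pi) * at
        if is_event:
            # Contribution log f(y_i) from the LPCNB density.
            total += (math.log(n) + n * math.log(p) - math.log(a)
                      + n * math.log(2.0 / math.pi)
                      + w - math.log1p(math.exp(2.0 * w))
                      + (n - 1.0) * math.log(at)
                      - (n + 1.0) * math.log(1.0 - (1.0 - p) * G))
        else:
            # Contribution log S(y_i) from the LPCNB survival function.
            total += math.log1p(-((p * G / (1.0 - (1.0 - p) * G)) ** n))
    return total

# Toy inputs (not the entomology data): intercept plus one binary covariate.
y = [2.1, 3.0, 3.9]
V = [[1.0, 0.0], [1.0, 1.0], [1.0, 0.0]]
observed = [True, True, False]
print(lpcnb_reg_loglik(y, V, observed, 1.5, 0.3, 0.8, [2.5, 0.4]))
```

Setting n=p=1 in the same function gives the LPC regression log-likelihood, which matches the way initial values are obtained above.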
The elements of the (p+3)×(p+3) observed information matrix J(θ), namely \(J_{pp}, J_{pn}, J_{pa}, J_{p\tau_{j}}, J_{nn}, J_{na}, J_{n\tau_{j}}, J_{aa}, J_{a\tau_{j}}\) and \(J_{\tau_{j}\tau_{s}}\) (for j,s=1,…,p), can be evaluated numerically. Inference on θ can be conducted in the classical way based on the approximate multivariate normal \(N_{p+3}\left(0,J(\widehat{\boldsymbol{\theta}})^{-1}\right)\) distribution for \(\widehat{\boldsymbol{\theta}}-\boldsymbol{\theta}\).
We can use the likelihood ratio (LR) statistic for comparing some special models with the LPCNB regression model. We consider the partition \(\boldsymbol{\theta}=\left(\boldsymbol{\theta}_{1}^{T},\boldsymbol{\theta}_{2}^{T}\right)^{T}\), where θ_{1} is a subset of parameters of interest and θ_{2} is a subset of remaining parameters. The LR statistic for testing the null hypothesis \(H_{0}:\boldsymbol{\theta}_{1}=\boldsymbol{\theta}_{1}^{(0)}\) versus the alternative hypothesis \(H_{1}:\boldsymbol{\theta}_{1}\neq\boldsymbol{\theta}_{1}^{(0)}\) is given by \(w=2\{\ell(\widehat{\boldsymbol{\theta}})-\ell(\widetilde{\boldsymbol{\theta}})\}\), where \(\widetilde{\boldsymbol{\theta}}\) and \(\widehat{\boldsymbol{\theta}}\) are the estimates under the null and alternative hypotheses, respectively. The statistic w is asymptotically (as the sample size m→∞) distributed as \(\chi_{q}^{2}\), where q is the dimension of the subset of parameters θ_{1} of interest.
Applications
In this section, the PCNB distribution is fitted to three real-life data sets. We compare the fits of the PCNB model with the beta-Weibull (BW) proposed by Lee et al. (2007), beta half-Cauchy (BHC) defined by Cordeiro and Lemonte (2011), Kumaraswamy half-Cauchy (KHC) presented by Ghosh (2014), power-Cauchy geometric (PCG) and power-Cauchy models. We estimate the parameters by using the maximum likelihood method. In order to compare the models, we consider the following goodness-of-fit statistics: Akaike information criterion (AIC) and Kolmogorov-Smirnov (KS) measure with the associated p-value. The pdfs of the BW, BHC and KHC (for x>0 and a,b,c,σ,λ>0) distributions are given by
respectively, where K_{1}=2^{a}/[σ π^{a} B(a,b)] and K_{2}=a b 2^{a}/(σ π^{a}).
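The two goodness-of-fit statistics named above can be computed as in the following Python sketch. It uses the standard definitions AIC=2k−2ℓ, with k the number of estimated parameters, and the one-sample KS distance between the empirical cdf and a fitted cdf. The function names, the toy sample and the PC parameter values are illustrative and do not reproduce any table in the paper. The KS p-value is not computed here.

```python
import math

def aic(loglik, k):
    """Akaike information criterion for maximized log-likelihood and k parameters."""
    return 2.0 * k - 2.0 * loglik

def ks_statistic(data, cdf):
    """One-sample Kolmogorov-Smirnov distance sup |F_m(x) - F(x)|."""
    x = sorted(data)
    m = len(x)
    d = 0.0
    for i, xi in enumerate(x, start=1):
        Fi = cdf(xi)
        # The empirical cdf jumps from (i - 1)/m to i/m at the i-th order statistic.
        d = max(d, i / m - Fi, Fi - (i - 1) / m)
    return d

def pc_cdf(z, alpha, sigma):
    """PC cdf used as an example of a fitted model."""
    return (2.0 / math.pi) * math.atan((z / sigma) ** alpha)

toy = [0.3, 0.9, 1.4, 2.2, 6.5]  # illustrative values only
print(ks_statistic(toy, lambda z: pc_cdf(z, 1.2, 1.5)))
print(aic(-100.0, 4))  # hypothetical log-likelihood with four parameters
```

Smaller values of both statistics indicate a better fit, which is how the models are ranked in the tables.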
Data set 1: Load haul dump machines failure data. First, we consider data on the times between successive failures (TBFs) of load haul dump (LHD) machines. The operation and maintenance cards of a fleet of 19 LHD machines were collected for a period of one year. These cards record times to failure, the engine clock hour and the reported failures in case of operation cards, and the times to repairs and actual repairs performed in case of maintenance cards (see Kumar et al. [1989, Appendix 2, Table B1]). The summary statistics of the data are: m= 50, \(\bar{x}\)= 45.88, s=51.76936, skewness=2.07528 and kurtosis=6.06486. The MLEs (with SEs in parentheses), the AIC and KS statistics and their p-values are listed in Table 2. The figures in this table indicate that the PCNB model provides the best fit to the current data. Next, we provide the scaled TTT plot, see Aarset (1987), for these data in Fig. 4b. The summary statistics and Fig. 4a and b reveal that the first data set is right-skewed with DID failure rate shape. So, the PCNB has the ability to fit right-skewed data with DID failure rate shape. For a visual comparison, we provide P-P plots of the fitted models to these data in Fig. 5. Clearly, the PCNB model provides a closer fit to these data.
Data set 2: Jet airplanes failure data. The second data set is taken from Proschan (1963) and represents the failure times of the air conditioning systems of a fleet of Boeing 720 jet airplanes. The summary statistics of the data are: m=213, \(\bar{x}\)= 93.14085, s=106.7636, skewness=2.11185 and kurtosis=4.92499. The results of the fitted distributions are presented in Table 3. We conclude that the PCNB model provides the best fit with the lowest values of the AIC and KS statistics and the largest p-value. The scaled TTT plot for the second data set in Fig. 6b gives an indication of a decreasing failure rate shape. The summary statistics and Fig. 6a and b reveal that the second data set is right-skewed with decreasing failure rate shape. So, the PCNB distribution can be used effectively to model these data. The P-P plots in Fig. 7 also support the results of Table 3.
In conclusion, the PCNB model is certainly an appropriate model for fitting the first two data sets.
Data set 3: Head and neck cancer data. The third data set is taken from Efron (1988) and comes from a head and neck cancer clinical trial. It consists of the survival times of 51 patients in arm A who were given radiation therapy. Nine patients were lost to follow-up and were regarded as censored observations. The MLEs of the model parameters are listed in Table 4. The figures in this table indicate that the PCNB model provides the best fit with the lowest values of the AIC and KS statistics. The plot in Fig. 8b reveals that the third data set has UBT failure rate shape, and so the PCNB distribution can be used effectively to model these data. The plots of the estimated survival functions of the PCNB, BW and PCG distributions are displayed in Fig. 8a. Clearly, the PCNB estimated survival function provides a closer fit to the empirical survival function than the other models.
Regression model example: Entomology data. We use the data from a study carried out at the Department of Entomology of the Luiz de Queiroz School of Agriculture, University of São Paulo, which aims to assess the longevity of the Mediterranean fruit fly (Ceratitis capitata). The need for this fly to seek food just after emerging from the larval stage has permitted the use of toxic baits for its management in Brazilian orchards for at least fifty years. This pest control technique consists of using small portions of food laced with an insecticide, generally an organophosphate, that quickly kills the flies, instead of using an insecticide alone. Recently, there have been reports of the insecticidal effect of extracts of the neem tree leading to proposals to adopt various extracts (aqueous extract of the seeds, methanol extract of the leaves and dichloromethane extract of the branches) to control pests such as the Mediterranean fruit fly. The experiment was completely randomized with eleven treatments, consisting of different extracts of the neem tree, at concentrations of 39, 225 and 888 ppm.
After preliminary statistical analysis, these eleven treatments were allocated into two groups, namely:

Group 1: Control 1 (deionized water); Control 2 (acetone, 5%); aqueous extract of seeds (AES) (39 ppm); AES (225 ppm); AES (888 ppm); methanol extract of leaves (MEL) (225 ppm); MEL (888 ppm); and dichloromethane extract of branches (DMB) (39 ppm).

Group 2: MEL (39 ppm); DMB (225 ppm) and DMB (888 ppm).
The response variable in the experiment is the lifetime of the adult flies in days after exposure to the treatments. The experimental period was set at 51 days, so that the numbers of larvae that survived beyond this period were considered as censored data. The total sample size is n=72, because four observations were lost. Therefore, the variables used in this study are: z_{i}: lifetime of Ceratitis capitata adults in days, v_{i1}: sex of the larvae and v_{i2}: group (0=group 1, 1=group 2). We start the analysis of these data considering only failure (z_{i}) and censoring (c_{i}) data, for which the LPCNB and LPC distributions could be appropriate models.
Next, we present results on fitting the model
where the response variable Y_{i} follows the LPCNB distribution given in (17), i=1,…,72. Table 5 lists the MLEs and their standard errors in parentheses for two fitted regression models. The MLEs of the model parameters are evaluated using the NLMixed procedure in SAS. Iterative maximization of the logarithm of the likelihood function (21) starts with initial values for τ and a, which are taken from the fit of the LPC regression model.
We note from the fitted LPCNB regression model that v_{2} is significant at 1% and that there is a significant difference between the groups 1 and 2 for the survival times. Table 6 gives a summary of the AIC, consistent Akaike information criterion (CAIC) and Bayesian information criterion (BIC) to compare the LPCNB and LPC regression models. The LPCNB regression model outperforms the LPC model irrespective of the criteria, and so it can be used effectively in the analysis of these data.
Finally, we turn to a simplified model retaining only v_{2} as an explanatory variable
The MLEs for the LPCNB regression model fitted to these data are listed in Table 7. In order to assess if the model is appropriate, Fig. 9 displays the plots of the empirical survival function and the estimated survival function from the fitted LPCNB regression model. In fact, this regression model provides a good fit to these data.
Concluding remarks
We consider a lifetime model in the context of insurance claims where the claim sizes follow a power-Cauchy distribution and the number of claims is negative-binomial distributed. In these terms, we propose a new model by compounding the power-Cauchy and negative-binomial distributions called the power-Cauchy negative-binomial (PCNB) distribution. We provide a useful linear representation for its density, which allows us to obtain some properties of the proposed distribution. We use the maximum likelihood method for estimating the model parameters. The suitability of these estimates is investigated by a simulation study. We fit the proposed distribution to three real data sets to show empirically its flexibility. We propose a new class of regression models for location and scale based on the logarithm of the PCNB random variable. Estimation and inference on the regression coefficients are discussed and an application to real data in entomology is addressed. Various future studies can be conducted, such as employing other estimation techniques (bootstrap and Bayesian methods) and investigating the sensitivity of the LPCNB regression model using diagnostics and analysis of residuals.
References
Aarset, MV: How to identify bathtub hazard rate. IEEE Trans. Reliab. 36, 106–108 (1987).
Cordeiro, GM, Lemonte, AJ: The beta-half-Cauchy distribution. J. Probab. Statist. Art. ID 904705, 18 pp. (2011).
Cox, DR, Oakes, D: Analysis of survival data. Chapman and Hall, New York (1984).
da Cruz, JN, Ortega, EMM, Cordeiro, GM: The log-odd log-logistic Weibull regression model: modelling, estimation, influence diagnostics and residual analysis. J. Stat. Comput. Simul. 86, 1516–1538 (2016).
Efron, B: Logistic regression, survival analysis, and the Kaplan–Meier curve. J. Amer. Statist. Assoc. 83, 414–425 (1988).
Ghosh, I: The Kumaraswamy-half-Cauchy distribution: Properties and applications. J. Stat. Theory Appl. 13, 122–134 (2014).
Kalbfleisch, JD, Prentice, RL: The statistical analysis of failure time data. Wiley, New York (2002).
Kumar, U, Klefsjo, B, Granholm, S: Reliability investigation for a fleet of load haul dump machines in a Swedish mine. Reliab. Eng. Syst. Safet. 26, 341–361 (1989).
Lanjoni, BR, Ortega, EMM, Cordeiro, GM: Extended Burr XII regression models: Theory and applications. J. Agric. Biol. Environ. Stat. 21, 203–224 (2016).
Lawless, JF: Statistical models and methods for lifetime data. Wiley, New Jersey (2003).
Lee, C, Famoye, F, Olumolade, O: Beta-Weibull distribution: Some properties and applications to censored data. J. Mod. Appl. Stat. Methods. 6, 173–186 (2007).
Nekoukhou, V, Bidram, H: A new generalization of the Weibull-geometric distribution with bathtub failure rate. Commun. Stat. Theory Methods. 46, 4296–4310 (2017).
Ortega, EMM, Lemonte, AJ, Cordeiro, GM, da Cruz, JN: The odd Birnbaum-Saunders regression model with applications to lifetime data. J. Stat. Theory Pract. 10, 780–804 (2016).
Proschan, F: Theoretical explanation of observed decreasing failure rate. Technometrics. 5, 375–383 (1963).
Rooks, B, Schumacher, A, Cooray, K: The power Cauchy distribution: derivation, description, and composite models. NSF-REU Program Reports (2010). Available from "http://www.cst.cmich.edu/mathematics/research/REU_and_LURE.shtml".
Tahir, MH, Zubair, M, Cordeiro, GM, Alzaatreh, A, Mansoor, M: The Poisson-X family of distributions. J. Stat. Comput. Simul. 86, 2901–2921 (2016).
Acknowledgement
The authors are grateful to the Editor-in-Chief, the Associate Editor and anonymous referees for their many helpful comments and suggestions on an earlier version of this paper, which led to this improved version.
Author information
Contributions
The authors, viz MZ, MHT, GMC, AA and EMMO with the consultation of each other carried out this work and drafted the manuscript together. All authors read and approved the final manuscript.
Corresponding author
Correspondence to Gauss M. Cordeiro.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License(http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Zubair, M., Tahir, M.H., Cordeiro, G.M. et al. The power-Cauchy negative-binomial: properties and regression. J Stat Distrib App 5, 1 (2018) doi:10.1186/s40488-017-0082-3
Keywords
 Censoring
 Compounding
 G-class
 Half-Cauchy distribution
 Maximum likelihood estimation
 Negative-binomial distribution
AMS Subject Classification
 Primary 60E05
 Secondary 62N05
 62F10