Failure time regression with continuous informative auxiliary covariates

In this paper we use Cox’s regression model to fit failure time data with continuous informative auxiliary variables in the presence of a validation subsample. We first estimate the induced relative risk function by kernel smoothing based on the validation subsample, and then improve the estimation by utilizing the information on the incomplete observations from non-validation subsample and the auxiliary observations from the primary sample. Asymptotic normality of the proposed estimator is derived. The proposed method allows one to robustly model the failure time data with an informative multivariate auxiliary covariate. Comparison of the proposed approach with several existing methods is made via simulations. Two real datasets are analyzed to illustrate the proposed method.


Introduction
In epidemiologic studies, the exposure variable vector X is often too difficult or too expensive to measure on the full cohort, whereas an auxiliary variable vector W for X can be easily measured for all subjects in the study cohort. For example, in a large scale nutritional study, the PIN Study (Savitz et al. 2001), it would be prohibitively expensive to obtain the exact dietary iron intake on each individual recruited. Instead, a self administered quantitative food questionnaire is conducted on all subjects where a crude assessment of iron intake is obtained. The true exposure, the blood serrum ferritin concentration, is only assayed for a validation set consisting of a small subset of the full study cohort. Although the true covariates are missing for most individuals, the existence of some surrogates or auxiliary measurements conveys information about X and serves as common proxy measure. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/ licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.
Utilizing the available auxiliary information to improve the efficiency of the effects estimation and in turns to increase the power of the study is critical for the success of the studies. In this paper, we study censored failure time regression with a continuous auxiliary covariate vector.
A variety of authors have contributed their work to this field. Related works include Prentice (1982), Pepe et al. (1989), Lin and Ying (1993), Hughes (1993), Lipsitz and Ibrahim (1996), Zhou and Wang (2000), Fan and Wang (2009), Liu et al. (2010), etc. In particular, Prentice (1982) introduced a partial likelihood estimator based on the induced relative risk function. This method was further developed by Pepe et al. (1989) using parametric modeling. Zhou and Pepe (1995) proposed an estimated partial likelihood method for discrete auxiliary covariates to relax the parametric assumptions on the frequency of events and the underlying distributions of covariates. This method was extended by Zhou and Wang (2000) to deal with continuous auxiliary variables, based on the Nadaraya-Watson kernel smoother method (Nadaraya 1964;Watson, 1964). Fan and Wang (2009), Liu et al. (2010) used the same approach for multivariate failure time data with auxiliary covariates. While Zhou and Wang's (2000) approach is useful in certain situations, there are some restrictions on it. First, the approach is effective only when the auxiliary variable W is of low dimension so that the "curse of dimensionality" in nonparametric smoothing can be avoided. Secondly, it requires that, conditionally on X, W provides no additional information about the hazard of failure; that is, all of the effects of W on failure and censoring are mediated through X, which is somewhat restricted since W may not be a true surrogate and depends on the failure given X.
Further, this method does not fully utilize the observations in the non-validation subsample and hence cannot be efficient in certain situations.
We here propose a new method to deal with the above problems associated with the method in Zhou and Wang (2000). The proposed method allows W to be multivariate and to be informative in the sense that, conditional X, it may provide additional information on the hazard of failure. We first estimate the induced relative risk function with a kernel smoother based on the validation sample, and then improve the estimation by utilizing the information on the incomplete observations from the non-validation subsample. In addition, the local linear smoother (see for example in Fan and Gijbels 1996) is employed to enhance the performance of the kernel smoother in Zhou and Wang (2000) at the boundary regions. Our method will be expected to improve the efficiency of the estimator of Zhou and Wang (2000) in various situations, for example, when auxiliary variable W is informative or not very informative about X (see also the simulation results). Asymptotic normality of our estimator is derived.
The proposed methodology can be extended to model multivariate failure time data with auxiliary covariates by following the method in Fan andWang (2009) or Liu et al. (2010).
The paper is organized as follows. In Section 2, we introduce the hazards models. In Section 3 we introduce our new estimation approach to predicting the induced relative risk for individuals in non-validation subsample based on the kernel smoother. In Section 4 we concentrate on the asymptotic properties of the proposed estimators. We conduct simulations in Section 5 to compare the efficiencies of different estimating methods. In Section 6 we apply the proposed methodology to two real datasets.

Cox's proportional hazards models
To facilitate exposition, we here employ the notations in Zhou and Wang (2000). Suppose that there are n independent individuals in a study cohort. Let {X i (t), Z i (t)} denote the covariate vector for the ith subject at time t (i = 1, …, n). Assume that X i (·) is only observed in the validation subsample which is chosen at the baseline under the ignorable missing mechanism condition (Rubin 1976). Let Z i (·) be the remaining covariate vector that is always observed, and W i (·) the informative auxiliary variables for X i (·). Let η i be an indicator variable with η i = 1 if the ith individual is in the validation set and 0 if in the nonvalidation set. Put V = {i : η i = 1} and V̄ = {i : η i = 0}. We assume that individuals in the validation subsample are randomly selected and hence representative. Then observed data for the ith subject is where S i is the observed event time for the ith subject, which is the minimum of the potential failure time T i and the censoring time C i , and δ i is the indicator of censoring. We consider the following conditional hazard rate function of failure (Cox 1972) (2.1) where λ 0 (·) ≥ 0 is the unspecified base-line hazard and is the relative risk parameter vector to be estimated.
For model (2.1), the relative risk functions are , and the partial likelihood function for the parameters β is (2.2) where ℛ(S i ) is the risk set at time S i . However, for i ∈ V, the true variate X i (t) is not observed, and hence the corresponding relative risk function γ i (β, t) is not available and has to be imputed. Zhou and Wang (2000) used the conditional expectation (2.3) for the imputation of γ i (β, t) (i ∈ V). Based on data in V, they obtained the Nadaraya-Watson kernel estimator (Nadaraya 1964;Watson 1964) of the above imputation and replaced γ i (β, t) for i ∈ V̄ in (2.2) by the kernel estimator, which leads to the estimated partial likelihood. Under the assumption that W is not informative, that is, all of the effects of W on failure and censoring are mediated through X, so that they derived the consistency and asymptotic normality of the estimation. However, if W is informative, their method will generally be biased (see also Section 5). In addition, since this method directly used information in the auxiliary covariate W and estimated the conditional expectation (2.3), it may encounter the so-called "curse of dimensionality" if W is of higher dimension. For the present study, we propose a new method for imputation of the relative risk function. The information in W will be used in a new way. This leads to a new estimated partial likelihood.

Estimated partial likelihood with a local smoother
In this section, we introduce our method to estimate the parameters in model (2.1) based on maximizing the estimated partial likelihood.

Local smoother for the relative risk function
Instead of (2.3), we use the conditional expectation of γ i (β, t), as imputation of γ i (β, t). Let d be the dimension of Z and let (3.4) be the induced risk function. Put , and ν i (β 1 , t) = E[ζ i (β 1 , t)|S i ≥ t, Z i (t)]. To use the partial likelihood (2.2), we need to estimate ϕ i (β, t) or equivalently ν i (β 1 , t) for i ∈ V. Using the local linear smoother (see for example, Fan and Gijbels 1996) leads to the following (functional) estimators of ν j (β 1 , t) for j ∈ V̄(

3.5)
where h is the bandwidth, The above estimation of the relative risk function was similarly used in Zhou and Wang (2000) for a nonparametric smoothing problem on the estimation of E[γ i (β, t)|S i ≥ t, Z i (t),W i (t)], where the "curse of dimensionality" problem can happen if W is multivariate. Note that this estimation method uses only the complete observations in V and neglects the important information on incomplete observations in V. It follows that this approach cannot be expected to be efficient in certain situations. In addition, it is required in Zhou and Wang (2000) that, conditional on X, the auxiliary variable W provides no additional information on the the hazard of failure. This requirement may not hold if W is not a genuine surrogate of X. In the following, we propose an improved estimation approach which utilizes information from W and observations in V̄ and does not impose the requirement. Moreover, the proposed method allows one to model the failure time data with informative multivariate auxiliary variable W without "curse of dimensionality". Note that even for one dimensional Z and W, the method in Zhou and Wang (2000) requires a two-dimensional smoother while the new method needs only one-dimensional smoothing. To have a performance comparable with that of a one-dimensional nonparametric smoother using M 1 = 50 data points, we need about data points for a 2-dimensional nonparameteric smoother. Hence the loss of efficiency due to highly dimensional smoothing is large and increasing exponentially fast (see page 317 of Fan and Yao 2003).

Improved estimation of the relative risk function and the estimated partial likelihood
Recall that W is a vector of auxiliary variables for X and is hence correlated with X. Let ξ i (α, t) = exp(α′W i (t)), where α is a parameter vector to be chosen. Considering the conditional , then we can estimate ψ i (α, t) by running local linear smoothing based on the data in V: The following result depicts asymptotic correlation of ν̂j(β 1 , t) and ψ̂j(α, t).

Proposition 3.1-Suppose that the conditions in
is jointly asymptotically normal with mean zero and covariance matrix where ν 0 (K) = ∫ K 2 (u)du, , and p(·) is the density function of Z.
By the distribution theory for multivariate normal variates, the conditional distribution of given is asymptotically normal with mean The conditional mean can be estimated by substituting consistent estimators based on the validation sample for , σ 1 (Z j , t) and σ 2 (Z j , t), and by replacing ψ j (α, t) with the primary sample based estimator (3.7) where By equating with its estimated conditional mean and solving for ν j (β 1 , t), we obtain an improved (functional) estimate (3.8) The updated estimator ν̄j(β 1 , t) is doomed to be more accurate than ν̂j(β 1 , t) in (3.5), since it has used the information from W and observations in V. Even though the information about W may not be utilized in a very efficient way as in Zhou and Wang's (2000) estimator when W is not informative, it is the price we have to pay for achieving robustness against informative W. Note that ν̄j depends on α which is related to efficiency of the estimator. Intuitively, one should choose α to maximize the conditional correlation coefficient between ζ j and ξ j , given (S j ≥ t, Z j ), which is evident from the following result.

Proposition 3.2-Assume that the conditions in
When , i.e. the relative risk contributed by W is not correlated to that contributed by X, given (S ≥ t, Z), the estimator νj is asymptotically equivalent to ν̂j in (3.5).
In general, . Hence, by Propositions 3.1 and 3.2, ν̄j is more efficient than ν̂j. Note that the proposed estimator is consistent for any α.
The above estimation method for ν j (β, t) was similarly used in Chen and Chen (2000) for estimating parameters in a parametric regression model. Our estimation can be regarded as an extension of their estimation approach in nonparametric regression. In addition, we do not need a working model to specify the regression relationship between the surrogate and the covariate, and hence there is no risk of misspecification of the working model.
For each given value of β, with the estimator ν̄j(β, t), we can estimate the induced relative risk r i (β, t) in (3.4) by (3.9) where . Then the parameters β can be estimated by maximizing the following estimated partial likelihood function (EPL): (3.10) We denote β̂E PL = arg max β EPL(β).
For an extreme case with W ≈ Z, Zhou and Wang's imputation for (2.3) approximately becomes ϕî(β, t) = ν̂i(β, t) exp(β′Z i (t)) and uses a two dimensional smoother, which is inferior to the improved estimator ϕī(β, t), and hence by the definition of β̂E PL , our estimator is superior to Zhou and Wang's. However, it is generally difficult to compare these two estimators. In our estimation of the induced relative risk, we used an improved estimator ϕ̄j(β, t) for j ∈ V. The "curse of dimensionality" problem in Zhou and Wang (2000) can be avoided for a multivariate W. Our approach would at least be useful in cases where the number of variables in Z which are correlated with the missing covariate X is low, whereas the exposure variables of interest and their auxiliary variables may be multivariate.
An alternative to β̂E PL is to maximize (3.10) but with r̂i(β, t) replaced by r̃i(β, t) = η i γ i (β, t) . We denote the resulting estimator by β̂V, which does not use the information on W in V. Intuitively, β̂E PL should be better than β̂V, but this is not true in general, since comparison of the asymptotic results in Theorems 4.1 and 4.2 below could not lead to a dominated estimator. However, in small validation ratio settings, β̂V is not expected to perform well, since it uses only the observations in the validation set for smoothing.

Asymptotic behaviors
Let n v be the subsample size of the validation set and let ρ be the limit of ratio of validation observations, lim n→∞ n v /n. Assume that ρ ∈ (0, 1]. Define For any matrix A, we use A ⊗2 to denote matrix AA τ . The censoring time is assumed to be independent of the failure time conditioning on the true covariates in model (2.1), that is, (4.11) which is different from that in Zhou and Wang (2000) where it is assumed Then, under the independent censoring assumption (4.11), M i (t) is a mean zero martingale with repsect to ℱ i (t) (Kalbfleisch and Prentice 1980;Fleming and Harrington 1991).
In addition, the cumulative hazard can be consistently estimated as Without loss of generality, we assume that t ∈[0, 1]. Put , where , and The following theorem shows that β̂E PL is asymptotically normal.

Remark 4.1
It is interesting to note that when the auxiliary W approximates X, and hence the second term in the expectation of Σ 2 (β) approximates to (1 − ρ)Q i . Therefore, a small ρ will not result in a big Σ 2 (β 0 ). Theoretically, when W i = Z i , and the above asymptotic variance formula shares the same formula as that for the estimator in Zhou and Wang (2000) as exactly expected. However, in practice where W i ≈ Z i , since Zhou and Wang (2000) used a higher dimensional smoother than us, our estimator would have better efficiency for finite samples.

When
, β̂E PL is asymptotically equivalent to the complete-case estimator based on only the validation set V. This is also expected, since the auxiliary variable W i contains no information on X i at this setting. From Theorem 4.1, the asymptotic covariance matrix of β̂E PL is of sandwich form, which can consistently be estimated by Ω̂0 = Î −1 (β 0 ) Σ(β 0 )Î −1 (β 0 ), where Î(β) and Σ(β) are the corresponding empirical estimates. Specifically, Ghosh et al. Page 9 where ρ̂ = n v /n, In summary, a constant variance estimator for β̂E PL can be obtained by replacing the population quantities in the asymptotic covariance matrix Σ(β 0 ) with their corresponding sample averages as in Zhou and Wang (2000). Hence, the asymptotic confidence intervals for β can also be constructed.
The following theorem demonstrates the asymptotic normality of β̂V.

Choice of the parameter vector α
The choice of α affects efficiency of β̂E PL , although the estimator is -consistent for any α. In this paper, we choose α by minimizing the variance of the estimator β̂E PL . Given initial value of β and α, one can estimate α by minimizing the trace of Ω(α).

Ghosh et al.
Page 10 Once the value of α is known, maximization of EPL(β) can be solved via Newton-Raphson iterations. Repeating this procedure, one can find a solution to the optimization problem (3.10). To reduce the burden of computation in practice, one can employ a consistent naive estimator of β as initial value, for example the estimator of β based on only the validation sample which is easy to implement because it involves only a simple fit for the usual Cox's model. In our experience, using the naive estimator as the initial value the iterations converge in a few steps.

Choice of the bandwidth parameter
As for the bandwidths, they affect the estimator β̂E PL , which is true in any nonparametric smoothing problems. Fortunately, the proposed estimator β̂E PL is effective for a large range of bandwidths (see Condition (6) in Appendix 1). Similar to that in Zhou and Wang (2000), we employed here the empirical bandwidth h = (h 1 , h 2 )′ with h 1 = 2σ̂Zn −1/3 and h 2 = 2σ̂Wn −1/3 , where σ̂Z and σ̂W are respectively the sample standard deviations of Z and W, which satisfy the bandwidth conditions required in this paper.

Simulations
In this section, we conduct finite-sample simulations a The aims of the simulations are threefold: one is to examine the small sample behavior of β̂E PL , another is to compare the performance of our estimator with some existing estimators under various situations, and the third and the most important is to illustrate that the proposed estimation allows for an informative auxiliary vector W. The covariates (X, Z) are generated from the following transformation to create correlation: (5.12) where U i 's are independent and identically distributed as U(0, 2). The failure time T conditional on covariate X is from an exponential distribution with hazard function where λ is the baseline constant hazard. We only consider the case λ = 1. Then The auxiliary variable W is generated from (5.13) where e ~ (0, σ 2 ) and σ 2 is the parameter controlling the strength of the association between X and W. We consider the settings with γ = 0 and 2. Model (5.13) with γ = 2 allows one to explore the effectiveness of the proposed method with an informative surrogate W. For γ = 0, it also allows us to compare the performance of the newly proposed method and that in Zhou and Wang (2000). We do simulations for σ = 0.2 and 0.8. The censoring variable is uniformly distributed and independent of the failure time. The validation set is randomly selected with P(η i = 1) = 0.5.
We choose the Gaussian kernel function with the bandwidths (h 1 = 2σ̂Zn −1/3 , h 2 = 2σ̂Wn −1/3 ) which satisfy the bandwidth conditions in Theorem 4.1, where σ̂Z and σ̂W are the sample standard deviations of Z and W respectively. In the following tables, β 0 =[log(2), 0.5]′ denotes the true value of the parameter to be estimated, se is the standard error of βÊ PL from simulation, denotes the mean of the estimated standard errors and cp denotes the 95% coverage probability.
The methods we considered are the newly proposed estimated partial likehood estimation (β̂E PL )and its conterpart (β̂Z W ) in Zhou and Wang (2000), the estimator (β̂V) which does not use the information on W in V, the complete-case Cox regression analysis (β̂C C ) which uses only the validation subsample, the Cox regression with W substituted for the missing X (β̂N), and the full data Cox regression (βF) which assumes that X is available for all n subjects in the study.
Tables 1 and 2 summarizes the results obtained from the simulation. Note that β̂F, β̂C C and β̂V in Table 2 are the same in Table 1, so we do not report them in Table 2. For finite sample sizes, the mean, median, standard error (se) and 95% confidence intervals of the estimator are calculated based on 500 independent runs. We observe that βF is the best estimator but it is not always obtainable for practical studies. The estimator β̂N is biased. In all the situations β̂E PL is observed to be a consistent estimator of true β 0 . The estimates obtained from Zhou and Wang (2000) method are biased when γ = 2. Also, we notice that the bias in their estimates increases when σ increases. Efficiency of β̂E PL relative to the complete case estimator β̂C C is approximately the same for β 1 = log(2) but much higher for β 2 = 0.5. For γ = 0 the estimator β̂Z W is more efficient than the proposed estimator for smaller values of σ but as the correlation between the exposure and auxiliary variable decreases the efficiency becomes closer (see the values of se for n = 300). Also, we notice that β̂E PL has less bias than β̂V for different values of n, but they are still comparable and even in some cases β̂V is better in terms of the standard deviation. For γ = 2, our method stays almost equally efficient as σ increases, but β̂Z W fails because of its large bias and low coverage probability (cp). Note that, when γ = 2, W is an informative auxiliary variable about the failure time and is not very informative about X.
We also performed simulations to see the effect of validation ratio and different bandwidths on the estimation. The proposed estimator β̂E PL works well for smaller validation percentages and is not very sensitive to the bandwidth selection. In particular, β̂E PL is better than β̂V when the validation ratio is 0.25, which is evidenced in Table 3. We also conducted simulations with smaller validation ratios. Our experience indicates that, as the validation ratio gets as small as 0.2, β̂V is very bad but β̂E PL still works.
We conclude that, the proposed partial likelihood estimator can be used to make inference for β under various situations. In particular, the estimator is consistent and efficient when the auxiliary variable is informative about the hazard rate of failure time while Zhou and Wang's estimator fails. 6 Real data analysis

Primary Biliary Cirrhosis data
We apply the proposed approach to the data from the Mayo Clinic trial in primary biliary cirrhosis (PBC) of the liver conducted between 1974 and 1984. A total of 424 PBC patients, referred to Mayo Clinic during that ten-year interval, met eligibility criteria for the randomized placebo controlled trial of the drug D-penicillamine. The first 312 cases in the data set participated in the randomized trial and contain largely complete data. The additional 112 cases did not participate in the clinical trial, but agreed to have basic measurements recorded and to be followed for survival. Six of those cases were lost to follow-up shortly after diagnosis, so the data here are on an additional 106 cases as well as the 312 randomized participants.
A clinical background description and a more extended discussion for the trial and the covariates recorded can be found in Dickson et al. (1989) and Markus et al. (1989). The variables involved in our specify analysis include id: case number; days: number of days between registration and the earlier of death, transplantation, or study analysis time; status: status of censoring; bili: serum bilirubin (in mg/dl); chol: serum cholesterol (inmg/dl) and Age: age in days.
In this analysis, we are particularly interested in the effect of patients' serum cholesterol and age on the survival of the patients. This type of failure time data can be modeled by the Cox proportional hazards models with an unknown baseline hazard function. However, about 31% outcomes of cholesterol were missing in this data set. Removing those observations may lead to biased estimates and standard errors. We noted that the outcomes of serum bilirubin were completely obtained with no missing values. Preliminary analysis showed that there is a significant correlation between serum cholesterol and bilirubin. Also, intuitively bilirubin has some additional effect on the hazard of failure and we would like to use that information efficiently. To illustrate this effect, we performed a complete Cox regression analysis for two different situations. We take the logarithmic transformation of bilirubin for our study.
In Table 4, we observe that the coefficient and standard error estimates are quite different for both the situations and the 95% confidence intervals for the coefficient of age are nonoverlapping. We can conclude that serum bilirubin has some additional effect on the hazard of failure. Hence, our proposed method can be applied to this dataset considering serum bilirubin as the informative auxiliary covariate. Table 5 displays the analysis results based on the Cox's regression for the complete data (CC), the method proposed by Zhou and Wang (2000) (ZW) and the proposed method (EPL). The CC method uses only 284 complete-case observations and the other two methods use all 418 observations. Variable "logchol" denotes the logarithm of cholesterol. The estimates of the coefficients and their standard errors are given in the table.
The regression analysis confirms that both serum cholesterol and age are significantly related to the time to event. For estimating the effect of serum cholesterol and age, there is a reasonable efficiency gain by using the two methods based on partial likelihood approach over the complete case Cox regression analysis. But there is a discrepancy between the estimates from complete data and Zhou and Wangs estimate which could be due to the fact that the latter method does not consider the additional effect contributed by the auxiliary covariate. In our simulation we observed that the standard error of the estimates were underestimated in Zhou and Wangs method when auxiliary variable was informative. In the real data analysis also the standard error estimate for serum cholesterol is underestimated. Moreover, the standard error estimates in our method is comparable to Zhou and Wangs method whereas the calculation time is much less compared to their method.

Serrum Ferritin Concentration in relation to preterm delivery study
We apply the proposed approach to the data on iron intake in relation to preterm delivery study from the University of North Carolina Hospitals at Chapel Hill. A total of 1520 women were included in the study. 17 of these women were lost to follow up. So the data consist of 1503 individuals among which 270 individuals had their serrum ferritin concentration (FERRITIN) measured with an immunometric assay. However a crude score for dietary iron intake (DTFE) was collected using a dietary food frequency questionnaire for all the individuals.
A clinical background description and a more extended discussion for the trial and the covariates recorded can be found in Savitz et al. (2001). The variables involved in our specfic analysis include (i) id: case number; (ii) Gestation Time: The number of weeks from pregnancy to delivery; (iii) DTFE: Dietary iron intake(in 100mg/dl); (iv) Ferritin: Serum Ferritin (in 100 mg/dl); and (v) Age: age in years. By using the notations in the proposed method, X is Ferritin, W is DTFE, and Z is Age.
In this analysis, we are particularly interested in the effect of patients' serum ferritin and age on the delivery of the patients. This type of failure time data can be modeled by the Cox proportional hazards models.
However, outcomes for serum ferritin were missing in this data set. Removing those observations can lead to biased or inefficient estimates. We noted that the outcomes of dietary iron intake were completely obtained with no missing values. Table 6 displays the analysis results based on the the CC method, the ZW method, and the proposed EPL method. The CC method used only 270 complete-case observations and the other two methods used all 1503 observations. The regression analysis using the new method confirms that both serum ferritin and age are significantly related to the time to event. For estimating the effect of serum ferritin and age, there is also a reasonable efficiency gain by using the two methods based on partial likelihood approach over the complete case cox regression analysis. The estimate of serrum ferritin is lower by the EPL method. The estimate is significantly different from zero with pvalue 0.020. In contrast, the p-value from CC method in estimation of serrum ferritin is 0.06.

Conclusion
We have introduced an EPL estimation method for Cox's models with informative auxiliary covariates and established asymptotic normality of our estimator. The proposed proposed methodology allows for multivariate auxiliary covariates W without suffering the curse of dimensionality.
We used the same bandwidth as suggested by Zhou and Wang (2000) in our estimation. Though it performs reasonably well, one can develop a bandwidth selection criteria like generalized cross-validation for an improved estimation. It is desirable to increase the efficiency of the estimation. In future, we can consider the optimization of α or introduce some weight structure in the score equation to achieve robustness. Further, it is worthy extending our approach to model multivariate failure time.

Supplementary Material
Refer to Web version on PubMed Central for supplementary material.
. Then the asymptotic normality can be obtained by using the Cramé-Wald device and directly computing the asymptotic mean and variance (see, for example the Lemma 6.3 in Jiang and Mack 2001).

Proof of Proposition 3.2
Note that from (3.8) The asymptotic normality of is obtained by the Slutsky's theorem and the asymptotic normality of and .

Proof of Theorem 4.1
The proof is argued in the framework of the multivariate counting processes, the martingale theory, and the techniques commonly used in nonparametric regression. Following the same routine as in Zhou and Wang (2000), the consistency of β̂E PL can be derived by using the Inverse Function Theorem (Rudin 1964;Andersen and Gill, 1982) and the argument by Foutz (1977). In the following, we give only the asymptotic normality in Theorem 4.1. The main techniques we employed are Taylor's expansion of the score function corresponding to the estimated likelihood function (3.10), Lenglart inequality, the martingale central limit theorem (see e.g. Fleming and Harrington 1991), and nonparametric regression techniques. By using counting process notation, the score function corresponding to the estimated partial likelihood function (3.10) at time point t can be written as (7.16) where By (7.16), β̂E PL solves the equation Û(β, 1) = 0. By Taylor's expansion, one gets (7.17) where β * is between β̂E PL and β 0 . By Lemma 7.1 and consistency of β̂E PL , Therefore, to prove the asymptotic normality in the theorem it suffices to show that n −1/2 Û (β, 1) is asymptotically normal with mean 0 and variance Σ(β 0 ) = (1−ρ)Σ 1 (β 0 )+ρΣ 2 (β 0 ), which is evidenced in Lemma 7.4 below.