Failure time regression with continuous informative auxiliary covariates
- Lipika Ghosh^{1},
- Jiancheng Jiang^{1} (corresponding author),
- Yanqing Sun^{1} and
- Haibo Zhou^{2}
DOI: 10.1186/s40488-015-0026-8
© Ghosh et al.; licensee Springer. 2015
Received: 28 February 2014
Accepted: 14 November 2014
Published: 20 February 2015
Abstract
In this paper we use Cox’s regression model to fit failure time data with continuous informative auxiliary variables in the presence of a validation subsample. We first estimate the induced relative risk function by kernel smoothing based on the validation subsample, and then improve the estimation by utilizing the information on the incomplete observations from the non-validation subsample and the auxiliary observations from the primary sample. Asymptotic normality of the proposed estimator is derived. The proposed method allows one to robustly model the failure time data with an informative multivariate auxiliary covariate. The proposed approach is compared with several existing methods via simulations. Two real datasets are analyzed to illustrate the proposed method.
Mathematics Subject Classification (MSC): 62G07, 62G20
Keywords
Auxiliary covariates; Censoring; Estimated partial likelihood; Local linear smoothing; Validation
Introduction
In epidemiologic studies, the exposure variable vector X is often too difficult or too expensive to measure on the full cohort, whereas an auxiliary variable vector W for X can be easily measured for all subjects in the study cohort. For example, in a large-scale nutritional study, the PIN Study (Savitz et al. 2001), it would be prohibitively expensive to obtain the exact dietary iron intake for each individual recruited. Instead, a self-administered quantitative food questionnaire is given to all subjects, from which a crude assessment of iron intake is obtained. The true exposure, the blood serum ferritin concentration, is assayed only for a validation set consisting of a small subset of the full study cohort. Although the true covariates are missing for most individuals, the surrogates or auxiliary measurements convey information about X and serve as common proxy measures. Utilizing the available auxiliary information to improve the efficiency of the effect estimation, and in turn to increase the power of the study, is critical for the success of such studies. In this paper, we study censored failure time regression with a continuous auxiliary covariate vector.
A variety of authors have contributed to this field. Related works include Prentice (1982), Pepe et al. (1989), Lin and Ying (1993), Hughes (1993), Lipsitz and Ibrahim (1996), Zhou and Wang (2000), Fan and Wang (2009), Liu et al. (2010), etc. In particular, Prentice (1982) introduced a partial likelihood estimator based on the induced relative risk function. This method was further developed by Pepe et al. (1989) using parametric modeling. Zhou and Pepe (1995) proposed an estimated partial likelihood method for discrete auxiliary covariates to relax the parametric assumptions on the frequency of events and the underlying distributions of covariates. This method was extended by Zhou and Wang (2000) to deal with continuous auxiliary variables, based on the Nadaraya-Watson kernel smoother (Nadaraya 1964; Watson 1964). Fan and Wang (2009) and Liu et al. (2010) used the same approach for multivariate failure time data with auxiliary covariates. While Zhou and Wang’s (2000) approach is useful in certain situations, it has some restrictions. First, the approach is effective only when the auxiliary variable W is of low dimension so that the “curse of dimensionality” in nonparametric smoothing can be avoided. Secondly, it requires that, conditionally on X, W provides no additional information about the hazard of failure; that is, all of the effects of W on failure and censoring are mediated through X. This is somewhat restrictive, since W may not be a true surrogate and may depend on the failure given X.
Further, this method does not fully utilize the observations in the non-validation subsample and hence cannot be efficient in certain situations.
We here propose a new method to deal with the above problems associated with the method in Zhou and Wang (2000). The proposed method allows W to be multivariate and to be informative in the sense that, conditional on X, it may provide additional information on the hazard of failure. We first estimate the induced relative risk function with a kernel smoother based on the validation sample, and then improve the estimation by utilizing the information on the incomplete observations from the non-validation subsample. In addition, the local linear smoother (see, for example, Fan and Gijbels 1996) is employed to enhance the performance of the kernel smoother in Zhou and Wang (2000) at the boundary regions. Our method is expected to improve the efficiency of the estimator of Zhou and Wang (2000) in various situations, for example, when the auxiliary variable W is informative or not very informative about X (see also the simulation results). Asymptotic normality of our estimator is derived.
The proposed methodology can be extended to model multivariate failure time data with auxiliary covariates by following the method in Fan and Wang (2009) or Liu et al. (2010).
The paper is organized as follows. In Section 2, we introduce the hazards models. In Section 3 we introduce our new estimation approach to predicting the induced relative risk for individuals in the non-validation subsample based on the kernel smoother. In Section 4 we concentrate on the asymptotic properties of the proposed estimators. We conduct simulations in Section 5 to compare the efficiencies of different estimation methods. In Section 6 we apply the proposed methodology to two real datasets.
Cox’s proportional hazards models
To facilitate exposition, we here employ the notations in Zhou and Wang (2000). Suppose that there are n independent individuals in a study cohort. Let {X _{ i }(t),Z _{ i }(t)} denote the covariate vector for the ith subject at time t (i=1,⋯,n). Assume that X _{ i }(·) is only observed in the validation subsample which is chosen at the baseline under the ignorable missing mechanism condition (Rubin 1976). Let Z _{ i }(·) be the remaining covariate vector that is always observed, and W _{ i }(·) the informative auxiliary variables for X _{ i }(·). Let η _{ i } be an indicator variable with η _{ i }=1 if the ith individual is in the validation set and 0 if in the nonvalidation set. Put V={i: η _{ i }=1} and \(\bar {V}=\{i:\ \eta _{i}=0\}\). We assume that individuals in the validation subsample are randomly selected and hence representative. Then observed data for the ith subject is {S _{ i },δ _{ i },Z _{ i }(·),W _{ i }(·),X _{ i }(·)} if η _{ i }=1, and {S _{ i },δ _{ i },Z _{ i }(·),W _{ i }(·)} if η _{ i }=0, where S _{ i } is the observed event time for the ith subject, which is the minimum of the potential failure time T _{ i } and the censoring time C _{ i }, and δ _{ i } is the indicator of censoring. We consider the following conditional hazard rate function of failure (Cox 1972)
where λ _{0}(·)≥0 is the unspecified baseline hazard and \(\beta =(\beta _{1}',\beta _{2}')'\) is the relative risk parameter vector to be estimated.
where \({\mathcal R}(S_{i})\) is the risk set at time S _{ i }. However, for \(i\in \bar {V}\), the true covariate X _{ i }(t) is not observed, and hence the corresponding relative risk function γ _{ i }(β,t) is not available and has to be imputed.
they derived the consistency and asymptotic normality of the estimation. However, if W is informative, their method will generally be biased (see also Section 5). In addition, since this method directly used information in the auxiliary covariate W and estimated the conditional expectation (2.3), it may encounter the so-called “curse of dimensionality” if W is of higher dimension. For the present study, we propose a new method for imputation of the relative risk function. The information in W will be used in a new way. This leads to a new estimated partial likelihood.
Estimated partial likelihood with a local smoother
In this section, we introduce our method to estimate the parameters in model (2.1) based on maximizing the estimated partial likelihood.
3.1 Local smoother for the relative risk function
The above estimation of the relative risk function was similarly used in Zhou and Wang (2000) for a nonparametric smoothing problem on the estimation of E[ γ _{ i }(β,t)|S _{ i }≥t,Z _{ i }(t),W _{ i }(t)], where the “curse of dimensionality” problem can arise if W is multivariate. Note that this estimation method uses only the complete observations in V and neglects the important information on incomplete observations in \(\bar {V}\). It follows that this approach cannot be expected to be efficient in certain situations. In addition, it is required in Zhou and Wang (2000) that, conditional on X, the auxiliary variable W provides no additional information on the hazard of failure. This requirement may not hold if W is not a genuine surrogate of X. In the following, we propose an improved estimation approach which utilizes information from W and observations in \(\bar {V}\) and does not impose the requirement. Moreover, the proposed method allows one to model the failure time data with an informative multivariate auxiliary variable W without the “curse of dimensionality”. Note that even for one-dimensional Z and W, the method in Zhou and Wang (2000) requires a two-dimensional smoother while the new method needs only one-dimensional smoothing. To have a performance comparable with that of a one-dimensional nonparametric smoother using M _{1}=50 data points, we need about \(M=M_{1}^{1.2}\approx 109\) data points for a two-dimensional nonparametric smoother. Hence the loss of efficiency due to high-dimensional smoothing is large and grows exponentially fast with the dimension (see page 317 of Fan and Yao 2003).
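The sample-size comparison above can be reproduced with a short calculation. The sketch below is not taken from the paper: it assumes the usual optimal mean-squared-error rate n^{-4/(4+d)} for a d-dimensional kernel smoother with a second-order kernel, so that matching a one-dimensional smoother on M _{1} points requires M = M _{1}^{(4+d)/5} points. The function name `equivalent_sample_size` is hypothetical.

```python
def equivalent_sample_size(m1: float, d: int) -> float:
    """Points a d-dimensional smoother needs to match a 1-D smoother on m1 points.

    Assumption (not from the paper): the optimal MSE rate is n**(-4 / (4 + d)),
    so equating m1**(-4/5) with M**(-4/(4+d)) gives M = m1**((4 + d) / 5).
    """
    return m1 ** ((4 + d) / 5)


# The required sample size grows quickly with the smoothing dimension d.
for d in (1, 2, 3, 4):
    print(d, round(equivalent_sample_size(50, d)))
```

For d=2 the exponent is 1.2, which gives the figure of about 109 points quoted in the text.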
3.2 Improved estimation of the relative risk function and the estimated partial likelihood
The following result depicts asymptotic correlation of \(\hat {\nu }_{j}(\beta _{1},t)\) and \(\hat {\psi }_{j}(\alpha,t)\).
Proposition 3.1.
The updated estimator \(\bar {\nu }_{j}(\beta _{1},t)\) is expected to be more accurate than \(\hat {\nu }_{j}(\beta _{1},t)\) in (3.5), since it uses the information from W and the observations in \(\bar {V}\). Although the information about W may not be utilized as efficiently as in Zhou and Wang’s (2000) estimator when W is not informative, this is the price we have to pay for achieving robustness against informative W. Note that \(\bar {\nu }_{j}\) depends on α, which is related to the efficiency of the estimator. Intuitively, one should choose α to maximize the conditional correlation coefficient between ζ _{ j } and ξ _{ j }, given (S _{ j }≥t,Z _{ j }), which is evident from the following result.
Proposition 3.2.
When \(\rho _{\alpha }^{\ast }=0\), i.e. the relative risk contributed by W is not correlated to that contributed by X, given (S≥t,Z), the estimator \(\bar {\nu }_{j}\) is asymptotically equivalent to \(\hat {\nu }_{j}\) in (3.5).
In general, \(\rho _{\alpha }^{\ast }>0\). Hence, by Propositions 3.1 and 3.2, \(\bar {\nu }_{j}\) is more efficient than \(\hat {\nu }_{j}.\) Note that the proposed estimator is consistent for any α.
The above estimation method for ν _{ j }(β,t) was similarly used in Chen and Chen (2000) for estimating parameters in a parametric regression model. Our estimation can be regarded as an extension of their approach to nonparametric regression. In addition, we do not need a working model to specify the regression relationship between the surrogate and the covariate, and hence there is no risk of misspecification of the working model.
We denote \(\hat {\beta }_{\textit {EPL}}=\arg \max _{\beta }EPL(\beta).\)
For an extreme case with W≈Z, Zhou and Wang’s imputation for (2.3) approximately becomes \(\hat {\phi }_{i}(\beta,t)=\hat {\nu _{i}}(\beta,t)\exp (\beta 'Z_{i}(t))\) and uses a two dimensional smoother, which is inferior to the improved estimator \(\bar {\phi }_{i}(\beta,t)\), and hence by the definition of \(\hat {\beta }_{\textit {EPL}}\), our estimator is superior to Zhou and Wang’s. However, it is generally difficult to compare these two estimators. In our estimation of the induced relative risk, we used an improved estimator \(\bar {\phi }_{j}(\beta,t)\) for \(j\in \bar {V}\). The “curse of dimensionality” problem in Zhou and Wang (2000) can be avoided for a multivariate W. Our approach would at least be useful in cases where the number of variables in Z which are correlated with the missing covariate X is low, whereas the exposure variables of interest and their auxiliary variables may be multivariate.
An alternative to \(\hat {\beta }_{\textit {EPL}}\) is to maximize (3.10) but with \(\hat {r}_{i}(\beta,t)\) replaced by \(\tilde {r}_{i}(\beta,t)=\eta _{i}\gamma _{i}(\beta,t) +(1-\eta _{i})\hat {\phi }_{i}(\beta,t),\) where \(\hat {\phi }_{i}(\beta,t)=\hat {\nu _{i}}(\beta,t)\exp (\beta 'Z_{i}(t)).\) We denote the resulting estimator by \(\hat {\beta }_{V}\), which does not use the information on W in \(\bar {V}\). Intuitively, \(\hat {\beta }_{\textit {EPL}}\) should be better than \(\hat {\beta }_{V}\), but this is not true in general, since comparing the asymptotic results in Theorems 4.1 and 4.2 below does not show that either estimator dominates the other. However, when the validation ratio is small, \(\hat {\beta }_{V}\) is not expected to perform well, since it uses only the observations in the validation set for smoothing.
Asymptotic behaviors
Let n _{ v } be the size of the validation subsample and let ρ be the limiting proportion of validation observations, lim_{ n→∞ } n _{ v }/n. Assume that ρ∈(0,1]. Define s ^{(0)}(β,t)=E[ Y _{ i }(t)r _{ i }(β,t)], s ^{(1)}(β,t)=(∂/∂ β)s ^{(0)}(β,t), s ^{(2)}(β,t)=(∂/∂ β ^{ τ })s ^{(1)}(β,t).
For any matrix A, we use A ^{⊗2} to denote matrix A A ^{ τ }.
Define the filtrations \({\mathcal F}_{i}(t)=\sigma \bigl \{N_{i}(u),Y_{i}(u+), X_{i}(u), Z_{i}(u):\, 0\le u\le t\bigr \}.\)
Then, under the independent censoring assumption (4.11),
M _{ i }(t) is a mean zero martingale with respect to \({\mathcal F}_{i}(t)\) (Kalbfleisch and Prentice 1980; Fleming and Harrington 1991).
The following theorem shows that \(\hat {\beta }_{\textit {EPL}}\) is asymptotically normal.
Theorem 4.1.
Remark 4.1.
It is interesting to note that \(Q_{i}^{*}\approx Q_{i}\) when the auxiliary W approximates X, and hence the second term in the expectation of Σ _{2}(β) is approximately (1−ρ)Q _{ i }. Therefore, a small ρ will not result in a large Σ _{2}(β _{0}). Theoretically, when W _{ i }=Z _{ i }, \(Q_{i}^{*}=0\) and the above asymptotic variance formula coincides with that for the estimator in Zhou and Wang (2000), exactly as expected. However, in practice, where W _{ i }≈Z _{ i }, since Zhou and Wang (2000) used a higher-dimensional smoother than ours, our estimator would have better efficiency for finite samples.
In summary, a consistent variance estimator for \(\hat {\beta }_{\textit {EPL}}\) can be obtained by replacing the population quantities in the asymptotic covariance matrix Σ(β _{0}) with their corresponding sample averages as in Zhou and Wang (2000). Hence, the asymptotic confidence intervals for β can also be constructed.
The following theorem demonstrates the asymptotic normality of \(\hat {\beta }_{V}\).
Theorem 4.2.
Under the same conditions as in Theorem 4.1, the estimator \(\hat {\beta }_{V}\) shares the same asymptotic distribution as \(\hat {\beta }_{\textit {EPL}}\) but with Σ _{1}(β _{0}) and Σ _{2}(β _{0}) replaced by Σ _{1V }(β _{0}) and Σ _{2V }(β _{0}), respectively, where \(\Sigma _{1V}(\beta _{0})= E\left [\!\int _{0}^{1}\Delta (\phi _{i})(u){dM}_{i}(u)\right ]^{\otimes 2},\) and \(\Sigma _{2V}(\beta _{0})=E\left [\!\int _{0}^{1}\Delta (\gamma _{i})(u){dM}_{i}(u) -\frac {1-\rho }{\rho } Q_{i} \right ]^{\otimes 2}.\)
4.1 Choice of the parameter vector α
The choice of α affects the efficiency of \(\hat {\beta }_{\textit {EPL}}\), although the estimator is \(\sqrt {n}\)-consistent for any α. In this paper, we choose α by minimizing the variance of the estimator \(\hat {\beta }_{\textit {EPL}}\). Given initial values of β and α, one can estimate α by minimizing the trace of \(\hat {\Omega }(\alpha)\).
Once the value of α is known, maximization of EPL(β) can be solved via Newton-Raphson iterations. Repeating this procedure, one can find a solution to the optimization problem (3.10). To reduce the burden of computation in practice, one can employ a consistent naive estimator of β as the initial value, for example the estimator of β based on only the validation sample, which is easy to implement because it involves only a simple fit of the usual Cox model. In our experience, when the naive estimator is used as the initial value, the iterations converge in a few steps.
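As an illustration of the naive initial value mentioned above, the following sketch fits the ordinary Cox partial likelihood by Newton-Raphson using only the validation subjects. It is a minimal example under stated assumptions, not the authors' implementation: covariates are time-independent, there are no tied event times, and the function name `cox_newton_raphson` and the simulated cohort are hypothetical.

```python
import numpy as np


def cox_newton_raphson(time, event, covariates, max_iter=25, tol=1e-8):
    """Maximize the ordinary Cox partial likelihood by Newton-Raphson.

    Illustrative only: assumes time-independent covariates and no tied event times.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    x = np.asarray(covariates, dtype=float)
    beta = np.zeros(x.shape[1])
    for _ in range(max_iter):
        risk = np.exp(x @ beta)                     # relative risks exp(beta'x_i)
        score = np.zeros_like(beta)
        info = np.zeros((beta.size, beta.size))
        for i in np.flatnonzero(event):
            in_risk_set = time >= time[i]           # subjects still at risk at S_i
            w, xr = risk[in_risk_set], x[in_risk_set]
            s0 = w.sum()
            xbar = (w @ xr) / s0                    # risk-weighted covariate mean
            second = (xr * w[:, None]).T @ xr / s0  # risk-weighted second moment
            score += x[i] - xbar
            info += second - np.outer(xbar, xbar)
        step = np.linalg.solve(info, score)         # Newton step: info^{-1} score
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Hypothetical simulated cohort; only the validation subjects (eta == 1) are used.
rng = np.random.default_rng(0)
n = 400
x = rng.normal(size=(n, 2))
true_beta = np.array([np.log(2.0), 0.5])
failure = rng.exponential(1.0 / np.exp(x @ true_beta))
censor = rng.uniform(0.0, 3.0, size=n)
time = np.minimum(failure, censor)
event = failure <= censor
eta = rng.random(n) < 0.5
beta_init = cox_newton_raphson(time[eta], event[eta], x[eta])
print(beta_init)
```

The resulting `beta_init` plays the role of the starting value only; the estimated partial likelihood itself would replace the relative risks of non-validation subjects by their smoothed estimates.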
4.2 Choice of the bandwidth parameter
As in any nonparametric smoothing problem, the bandwidths affect the estimator \(\hat {\beta }_{\textit {EPL}}\). Fortunately, the proposed estimator \(\hat {\beta }_{\textit {EPL}}\) is effective for a large range of bandwidths (see Condition (6) in Appendix 1). Similar to Zhou and Wang (2000), we employ here the empirical bandwidth h=(h _{1},h _{2})^{′} with \(h_{1}=2\hat {\sigma }_{Z} n^{-1/3}\) and \(h_{2}=2\hat {\sigma }_{W}n^{-1/3}\), where \(\hat {\sigma }_{Z}\) and \(\hat {\sigma }_{W}\) are the sample standard deviations of Z and W, respectively. These bandwidths satisfy the bandwidth conditions required in this paper.
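The empirical bandwidth rule can be sketched together with a generic one-dimensional local linear smoother. This is a minimal illustration, not the paper's estimator of the relative risk function: the helper names `empirical_bandwidth` and `local_linear` are hypothetical, a Gaussian kernel is assumed, and the sample size n is passed explicitly.

```python
import numpy as np


def empirical_bandwidth(values, n):
    """Rule-of-thumb bandwidth h = 2 * sd * n**(-1/3), using the sample standard deviation."""
    return 2.0 * np.std(values, ddof=1) * n ** (-1.0 / 3.0)


def local_linear(x0, x, y, h):
    """Local linear estimate of E[y | x = x0] with a Gaussian kernel and bandwidth h."""
    u = x - x0
    w = np.exp(-0.5 * (u / h) ** 2)                    # Gaussian kernel weights
    s0, s1, s2 = w.sum(), (w * u).sum(), (w * u * u).sum()
    t0, t1 = (w * y).sum(), (w * u * y).sum()
    return (s2 * t0 - s1 * t1) / (s0 * s2 - s1 ** 2)   # intercept of the weighted least-squares line


# Toy data (hypothetical): a linear regression function is recovered even at the boundary.
rng = np.random.default_rng(1)
z = rng.normal(size=200)
y = 1.0 + 2.0 * z
h1 = empirical_bandwidth(z, z.size)
boundary_fit = local_linear(z.min(), z, y, h1)
print(h1, boundary_fit, 1.0 + 2.0 * z.min())
```

Because the local fit is a weighted least-squares line, it reproduces a linear regression function at the edge of the data, which is the boundary behaviour that motivates the local linear smoother over the Nadaraya-Watson smoother.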
Simulations
where \(e\sim {\mathcal N}(0,\sigma ^{2})\) and σ ^{2} is the parameter controlling the strength of the association between X and W. We consider the settings with γ=0 and 2. Model (5.13) with γ=2 allows one to explore the effectiveness of the proposed method with an informative surrogate W. For γ=0, it also allows us to compare the performance of the newly proposed method and that in Zhou and Wang (2000). We do simulations for σ=0.2 and 0.8. The censoring variable is uniformly distributed and independent of the failure time. The validation set is randomly selected with P(η _{ i }=1)=0.5.
We choose the Gaussian kernel function with the bandwidths \(\left (h_{1}=2\hat {\sigma }_{Z} n^{-1/3}, h_{2}=2\hat {\sigma }_{W} n^{-1/3}\right)\), which satisfy the bandwidth conditions in Theorem 4.1, where \(\hat {\sigma }_{Z}\) and \(\hat {\sigma }_{W}\) are the sample standard deviations of Z and W respectively. In the following tables, β _{0}=[ log(2),0.5]^{′} denotes the true value of the parameter to be estimated, se is the standard error of \(\hat {\beta }_{\textit {EPL}}\) from simulation, \(mean(\hat {se})\) denotes the mean of the estimated standard errors and cp denotes the empirical coverage probability of the nominal 95% confidence interval.
The methods we considered are the newly proposed estimated partial likelihood estimation (\(\hat {\beta }_{\textit {EPL}}\)) and its counterpart (\(\hat {\beta }_{\textit {ZW}}\)) in Zhou and Wang (2000), the estimator (\(\hat {\beta }_{V}\)) which does not use the information on W in \(\bar {V}\), the complete-case Cox regression analysis (\(\hat {\beta }_{\textit {CC}}\)) which uses only the validation subsample, the Cox regression with W substituted for the missing X (\(\hat {\beta }_{N}\)), and the full data Cox regression (\(\hat {\beta }_{F}\)) which assumes that X is available for all n subjects in the study.
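The summary statistics reported in the tables can be computed from the replications as in the sketch below. It is an illustration under assumptions that the text does not spell out: se is taken as the sample standard deviation of the estimates across replications, and cp is the proportion of Wald intervals (estimate ± 1.96 × estimated standard error) that cover the true value. The function name `summarize_replications` is hypothetical.

```python
import numpy as np


def summarize_replications(estimates, std_errors, true_value, z=1.96):
    """Summaries for one parameter component across simulation replications."""
    estimates = np.asarray(estimates, dtype=float)
    std_errors = np.asarray(std_errors, dtype=float)
    # A replication "covers" when the true value lies inside its Wald interval.
    covered = np.abs(estimates - true_value) <= z * std_errors
    return {
        "mean_bias": estimates.mean() - true_value,       # mean - beta_0
        "median_bias": np.median(estimates) - true_value, # median - beta_0
        "se": estimates.std(ddof=1),                      # empirical standard error
        "mean_se_hat": std_errors.mean(),                 # mean of estimated standard errors
        "cp": covered.mean(),                             # empirical coverage probability
    }


# Tiny hand-checkable example with three replications (made-up numbers).
summary = summarize_replications([0.6, 0.7, 0.8], [0.1, 0.1, 0.01], true_value=0.7)
print(summary)
```

Comparing "se" with "mean_se_hat" indicates whether the variance estimator is well calibrated, which is how the two rows are read in the tables.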
Comparison of simulation results with σ=0.2 and validation fraction 0.5
Censoring | n | Statistic | Component of β | γ=0: \(\hat {\beta }_{F}\) | γ=0: \(\hat {\beta }_{CC}\) | γ=0: \(\hat {\beta }_{N}\) | γ=0: \(\hat {\beta }_{V}\) | γ=0: \(\hat {\beta }_{ZW}\) | γ=0: \(\hat {\beta }_{EPL}\) | γ=2: \(\hat {\beta }_{N}\) | γ=2: \(\hat {\beta }_{ZW}\) | γ=2: \(\hat {\beta }_{EPL}\)
---|---|---|---|---|---|---|---|---|---|---|---|---
50% | 100 | mean−β _{0} | 1 | 0.018 | 0.021 | -0.034 | -0.036 | -0.045 | 0.006 | -0.988 | -0.434 | 0.031
50% | 100 | mean−β _{0} | 2 | 0.004 | 0.030 | 0.022 | 0.035 | 0.013 | 0.006 | 0.169 | 0.174 | 0.010
50% | 100 | median−β _{0} | 1 | 0.013 | -0.001 | -0.045 | -0.047 | 0.033 | -0.033 | -0.983 | -0.484 | 0.021
50% | 100 | median−β _{0} | 2 | -0.015 | 0.023 | 0.002 | 0.019 | -0.001 | 0.011 | 0.160 | 0.172 | 0.008
50% | 100 | se | 1 | 0.323 | 0.439 | 0.302 | 0.410 | 0.346 | 0.457 | 0.084 | 0.574 | 0.459
50% | 100 | se | 2 | 0.281 | 0.391 | 0.276 | 0.307 | 0.283 | 0.312 | 0.237 | 0.324 | 0.338
50% | 100 | \(mean(\hat {se})\) | 1 | 0.292 | 0.429 | 0.280 | 0.402 | 0.329 | 0.414 | 0.068 | 0.373 | 0.411
50% | 100 | \(mean(\hat {se})\) | 2 | 0.258 | 0.382 | 0.257 | 0.278 | 0.257 | 0.296 | 0.244 | 0.274 | 0.288
50% | 100 | cp | 1 | 0.916 | 0.956 | 0.924 | 0.938 | 0.948 | 0.946 | 0.0 | 0.670 | 0.946
50% | 100 | cp | 2 | 0.930 | 0.960 | 0.934 | 0.922 | 0.914 | 0.924 | 0.920 | 0.892 | 0.916
50% | 300 | mean−β _{0} | 1 | -0.019 | 0.026 | -0.068 | -0.039 | -0.007 | -0.021 | -0.994 | -0.630 | 0.005
50% | 300 | mean−β _{0} | 2 | 0.006 | 0.017 | 0.024 | 0.013 | 0.010 | 0.002 | 0.166 | 0.202 | -0.007
50% | 300 | median−β _{0} | 1 | -0.028 | 0.024 | 0.077 | -0.042 | -0.012 | -0.018 | -0.991 | -0.635 | -0.002
50% | 300 | median−β _{0} | 2 | 0.007 | 0.001 | 0.028 | 0.021 | 0.018 | 0.002 | 0.169 | 0.199 | 0.003
50% | 300 | se | 1 | 0.161 | 0.234 | 0.158 | 0.225 | 0.176 | 0.238 | 0.048 | 0.246 | 0.243
50% | 300 | se | 2 | 0.146 | 0.217 | 0.146 | 0.162 | 0.150 | 0.163 | 0.127 | 0.137 | 0.166
50% | 300 | \(mean(\hat {se})\) | 1 | 0.164 | 0.233 | 0.158 | 0.227 | 0.177 | 0.222 | 0.039 | 0.170 | 0.231
50% | 300 | \(mean(\hat {se})\) | 2 | 0.146 | 0.209 | 0.145 | 0.159 | 0.147 | 0.155 | 0.137 | 0.125 | 0.159
50% | 300 | cp | 1 | 0.944 | 0.942 | 0.928 | 0.948 | 0.950 | 0.936 | 0.0 | 0.108 | 0.940
50% | 300 | cp | 2 | 0.956 | 0.942 | 0.956 | 0.948 | 0.944 | 0.938 | 0.796 | 0.630 | 0.940
20% | 100 | mean−β _{0} | 1 | 0.021 | 0.020 | -0.031 | -0.041 | 0.044 | 0.002 | -1.003 | -0.455 | 0.023
20% | 100 | mean−β _{0} | 2 | 0.001 | 0.014 | 0.018 | 0.036 | 0.013 | 0.005 | 0.155 | 0.163 | 0.011
20% | 100 | median−β _{0} | 1 | 0.016 | 0.014 | -0.029 | -0.048 | 0.038 | -0.013 | -1.000 | -0.466 | 0.003
20% | 100 | median−β _{0} | 2 | -0.008 | -0.001 | 0.011 | 0.029 | 0.005 | 0.003 | 0.151 | 0.159 | 0.005
20% | 100 | se | 1 | 0.248 | 0.339 | 0.234 | 0.322 | 0.272 | 0.364 | 0.071 | 0.467 | 0.360
20% | 100 | se | 2 | 0.211 | 0.305 | 0.210 | 0.224 | 0.212 | 0.229 | 0.180 | 0.241 | 0.235
20% | 100 | \(mean(\hat {se})\) | 1 | 0.232 | 0.340 | 0.223 | 0.306 | 0.263 | 0.313 | 0.062 | 0.318 | 0.315
20% | 100 | \(mean(\hat {se})\) | 2 | 0.205 | 0.302 | 0.204 | 0.217 | 0.204 | 0.213 | 0.195 | 0.214 | 0.215
20% | 100 | cp | 1 | 0.936 | 0.956 | 0.934 | 0.912 | 0.966 | 0.894 | 0.0 | 0.550 | 0.904
20% | 100 | cp | 2 | 0.938 | 0.956 | 0.938 | 0.934 | 0.952 | 0.936 | 0.924 | 0.862 | 0.928
20% | 300 | mean−β _{0} | 1 | -0.007 | -0.009 | -0.056 | -0.032 | -0.006 | -0.011 | -1.001 | -0.617 | 0.015
20% | 300 | mean−β _{0} | 2 | -0.001 | 0.008 | 0.016 | 0.008 | 0.004 | -0.004 | 0.152 | 0.023 | -0.012
20% | 300 | median−β _{0} | 1 | -0.019 | -0.016 | -0.064 | -0.040 | -0.006 | -0.022 | -1.001 | -0.613 | 0.001
20% | 300 | median−β _{0} | 2 | -0.002 | 0.002 | 0.011 | 0.007 | 0.003 | -0.005 | 0.154 | 0.024 | -0.016
20% | 300 | se | 1 | 0.131 | 0.190 | 0.127 | 0.177 | 0.141 | 0.194 | 0.044 | 0.304 | 0.195
20% | 300 | se | 2 | 0.116 | 0.164 | 0.116 | 0.126 | 0.119 | 0.127 | 0.100 | 0.178 | 0.135
20% | 300 | \(mean(\hat {se})\) | 1 | 0.131 | 0.187 | 0.260 | 0.179 | 0.142 | 0.179 | 0.034 | 0.219 | 0.179
20% | 300 | \(mean(\hat {se})\) | 2 | 0.116 | 0.166 | 0.150 | 0.125 | 0.117 | 0.122 | 0.110 | 0.161 | 0.124
20% | 300 | cp | 1 | 0.948 | 0.960 | 0.928 | 0.952 | 0.966 | 0.926 | 0.0 | 0.244 | 0.926
20% | 300 | cp | 2 | 0.952 | 0.954 | 0.952 | 0.954 | 0.952 | 0.944 | 0.748 | 0.070 | 0.934
Comparison of simulation results with σ=0.8 and validation fraction 0.5
Censoring | n | Statistic | Component of β | γ=0: \(\hat {\beta }_{N}\) | γ=0: \(\hat {\beta }_{ZW}\) | γ=0: \(\hat {\beta }_{EPL}\) | γ=2: \(\hat {\beta }_{N}\) | γ=2: \(\hat {\beta }_{ZW}\) | γ=2: \(\hat {\beta }_{EPL}\)
---|---|---|---|---|---|---|---|---|---
50% | 100 | mean−β _{0} | 1 | -0.369 | -0.045 | 0.034 | -0.961 | -0.325 | 0.027
50% | 100 | mean−β _{0} | 2 | 0.139 | 0.055 | 0.008 | 0.177 | 0.141 | 0.009
50% | 100 | median−β _{0} | 1 | -0.381 | -0.052 | 0.021 | -0.955 | -0.368 | 0.010
50% | 100 | median−β _{0} | 2 | 0.028 | 0.051 | 0.008 | 0.164 | 0.140 | 0.002
50% | 100 | se | 1 | 0.206 | 0.374 | 0.457 | 0.076 | 0.549 | 0.450
50% | 100 | se | 2 | 0.260 | 0.287 | 0.338 | 0.243 | 0.325 | 0.332
50% | 100 | \(mean(\hat {se})\) | 1 | 0.196 | 0.256 | 0.414 | 0.064 | 0.374 | 0.414
50% | 100 | \(mean(\hat {se})\) | 2 | 0.249 | 0.262 | 0.288 | 0.244 | 0.271 | 0.293
50% | 100 | cp | 1 | 0.504 | 0.940 | 0.934 | 0.0 | 0.721 | 0.940
50% | 100 | cp | 2 | 0.914 | 0.932 | 0.916 | 0.904 | 0.888 | 0.920
50% | 300 | mean−β _{0} | 1 | -0.392 | -0.056 | 0.012 | -0.965 | -0.399 | 0.012
50% | 300 | mean−β _{0} | 2 | 0.139 | 0.033 | -0.011 | 0.175 | 0.147 | -0.009
50% | 300 | median−β _{0} | 1 | -0.392 | -0.055 | 0.004 | -0.963 | -0.395 | 0.004
50% | 300 | median−β _{0} | 2 | 0.139 | 0.044 | -0.004 | 0.176 | 0.145 | -0.002
50% | 300 | se | 1 | 0.114 | 0.198 | 0.255 | 0.044 | 0.325 | 0.254
50% | 300 | se | 2 | 0.142 | 0.156 | 0.170 | 0.129 | 0.180 | 0.171
50% | 300 | \(mean(\hat {se})\) | 1 | 0.108 | 0.223 | 0.227 | 0.036 | 0.213 | 0.228
50% | 300 | \(mean(\hat {se})\) | 2 | 0.140 | 0.157 | 0.159 | 0.137 | 0.157 | 0.158
50% | 300 | cp | 1 | 0.068 | 0.932 | 0.932 | 0.0 | 0.520 | 0.932
50% | 300 | cp | 2 | 0.830 | 0.946 | 0.934 | 0.770 | 0.808 | 0.936
20% | 100 | mean−β _{0} | 1 | -0.368 | -0.046 | 0.024 | -0.969 | -0.328 | 0.022
20% | 100 | mean−β _{0} | 2 | 0.126 | 0.052 | 0.019 | 0.163 | 0.128 | 0.021
20% | 100 | median−β _{0} | 1 | -0.372 | -0.052 | 0.020 | -0.966 | -0.354 | 0.016
20% | 100 | median−β _{0} | 2 | 0.122 | 0.053 | 0.020 | 0.168 | 0.124 | 0.015
20% | 100 | se | 1 | 0.165 | 0.272 | 0.360 | 0.064 | 0.450 | 0.348
20% | 100 | se | 2 | 0.202 | 0.213 | 0.240 | 0.186 | 0.238 | 0.238
20% | 100 | \(mean(\hat {se})\) | 1 | 0.156 | 0.263 | 0.306 | 0.057 | 0.291 | 0.321
20% | 100 | \(mean(\hat {se})\) | 2 | 0.198 | 0.207 | 0.211 | 0.195 | 0.212 | 0.222
20% | 100 | cp | 1 | 0.352 | 0.966 | 0.912 | 0.0 | 0.658 | 0.916
20% | 100 | cp | 2 | 0.918 | 0.942 | 0.928 | 0.910 | 0.878 | 0.924
20% | 300 | mean−β _{0} | 1 | -0.390 | -0.048 | 0.013 | -0.972 | -0.388 | 0.021
20% | 300 | mean−β _{0} | 2 | 0.123 | 0.026 | -0.011 | 0.161 | 0.127 | -0.015
20% | 300 | median−β _{0} | 1 | -0.393 | -0.058 | -0.004 | -0.972 | -0.399 | 0.013
20% | 300 | median−β _{0} | 2 | 0.123 | 0.028 | -0.017 | 0.164 | 0.128 | -0.016
20% | 300 | se | 1 | 0.092 | 0.159 | 0.195 | 0.039 | 0.262 | 0.200
20% | 300 | se | 2 | 0.115 | 0.123 | 0.131 | 0.102 | 0.138 | 0.133
20% | 300 | \(mean(\hat {se})\) | 1 | 0.086 | 0.172 | 0.186 | 0.033 | 0.168 | 0.179
20% | 300 | \(mean(\hat {se})\) | 2 | 0.112 | 0.123 | 0.126 | 0.110 | 0.122 | 0.124
20% | 300 | cp | 1 | 0.018 | 0.954 | 0.924 | 0.0 | 0.412 | 0.924
20% | 300 | cp | 2 | 0.792 | 0.944 | 0.934 | 0.716 | 0.784 | 0.942
Comparison of simulation results with β _{0}=[ ln(2), 0.5]^{′}, 50% censoring, σ=0.2, and validation fraction 0.25
Statistic | Component of β | n=100: \(\hat {\beta }_{V}\) | n=100: \(\hat {\beta }_{EPL}\) | n=200: \(\hat {\beta }_{V}\) | n=200: \(\hat {\beta }_{EPL}\)
---|---|---|---|---|---
mean−β _{0} | 1 | -0.142 | 0.056 | -0.087 | 0.049
mean−β _{0} | 2 | 0.107 | 0.018 | 0.056 | 0.001
median−β _{0} | 1 | -0.152 | 0.035 | -0.125 | 0.011
median−β _{0} | 2 | 0.091 | -0.002 | 0.065 | 0.002
se | 1 | 0.506 | 0.565 | 0.405 | 0.417
se | 2 | 0.329 | 0.320 | 0.234 | 0.232
\(mean(\hat {se})\) | 1 | 0.513 | 0.618 | 0.380 | 0.410
\(mean(\hat {se})\) | 2 | 0.306 | 0.333 | 0.220 | 0.224
cp | 1 | 0.944 | 0.936 | 0.928 | 0.924
cp | 2 | 0.934 | 0.954 | 0.934 | 0.942
We conclude that the proposed estimated partial likelihood estimator can be used to make inference for β under various situations. In particular, the estimator is consistent and efficient when the auxiliary variable is informative about the hazard rate of failure time, whereas Zhou and Wang’s estimator fails.
Real data analysis
6.1 Primary Biliary Cirrhosis data
We apply the proposed approach to the data from the Mayo Clinic trial in primary biliary cirrhosis (PBC) of the liver conducted between 1974 and 1984. A total of 424 PBC patients, referred to Mayo Clinic during that ten-year interval, met eligibility criteria for the randomized placebo controlled trial of the drug D-penicillamine. The first 312 cases in the data set participated in the randomized trial and contain largely complete data. The additional 112 cases did not participate in the clinical trial, but agreed to have basic measurements recorded and to be followed for survival. Six of those cases were lost to follow-up shortly after diagnosis, so the data here are on an additional 106 cases as well as the 312 randomized participants.
A clinical background description and a more extended discussion of the trial and the covariates recorded can be found in Dickson et al. (1989) and Markus et al. (1989). The variables involved in our specific analysis include id: case number; days: number of days between registration and the earlier of death, transplantation, or study analysis time; status: status of censoring; bili: serum bilirubin (in mg/dl); chol: serum cholesterol (in mg/dl) and Age: age in days.
In this analysis, we are particularly interested in the effect of patients’ serum cholesterol and age on the survival of the patients. This type of failure time data can be modeled by the Cox proportional hazards model with an unknown baseline hazard function. However, about 31% of the cholesterol values were missing in this data set. Removing those observations may lead to biased estimates and standard errors. We noted that the serum bilirubin values were completely observed, with no missing values. Preliminary analysis showed that there is a significant correlation between serum cholesterol and bilirubin. Also, intuitively bilirubin has some additional effect on the hazard of failure and we would like to use that information efficiently. To illustrate this effect, we performed a complete-case Cox regression analysis for two different situations. We take the logarithmic transformation of bilirubin for our study.
Regression analysis of primary biliary cirrhosis (PBC) data study
Stratum | Method | Variable | Parameter estimate | Standard error | 95% Confidence interval
---|---|---|---|---|---
logbili<1.6 | CC | logchol | 0.271 | 0.393 | (-0.499, 1.040)
logbili<1.6 | CC | age | 0.055 | 0.012 | (0.031, 0.079)
logbili≥1.6 | ZW | logchol | -0.635 | 0.345 | (-1.312, 0.042)
logbili≥1.6 | ZW | age | -0.005 | 0.016 | (-0.037, 0.027)
Regression analysis of primary biliary cirrhosis (PBC) data
Method | Variable | Estimates of parameters | Standard error | 95% Confidence interval
---|---|---|---|---
CC | logchol | 0.853 | 0.214 | (0.432, 1.273)
CC | age | 0.048 | 0.010 | (0.029, 0.067)
ZW | logchol | 1.142 | 0.154 | (0.840, 1.444)
ZW | age | 0.047 | 0.007 | (0.033, 0.061)
EPL | logchol | 0.851 | 0.215 | (0.429, 1.273)
EPL | age | 0.044 | 0.007 | (0.029, 0.058)
The regression analysis confirms that both serum cholesterol and age are significantly related to the time to event. For estimating the effect of serum cholesterol and age, there is a reasonable efficiency gain by using the two methods based on the partial likelihood approach over the complete-case Cox regression analysis. However, there is a discrepancy between the estimates from the complete data and Zhou and Wang’s estimate, which could be due to the fact that the latter method does not consider the additional effect contributed by the auxiliary covariate. In our simulation we observed that the standard errors of the estimates were underestimated by Zhou and Wang’s method when the auxiliary variable was informative. In the real data analysis, the standard error for serum cholesterol also appears to be underestimated. Moreover, the standard error estimates from our method are comparable to those from Zhou and Wang’s method, whereas the computation time is much shorter.
6.2 Serum Ferritin Concentration in relation to preterm delivery study
We apply the proposed approach to the data on iron intake in relation to preterm delivery from a study at the University of North Carolina Hospitals at Chapel Hill. A total of 1520 women were included in the study, of whom 17 were lost to follow-up. The data therefore consist of 1503 individuals, among whom 270 had their serum ferritin concentration (FERRITIN) measured with an immunometric assay. However, a crude score for dietary iron intake (DTFE) was collected using a dietary food frequency questionnaire for all the individuals.
A clinical background description and a more extended discussion of the trial and the covariates recorded can be found in Savitz et al. (2001). The variables involved in our specific analysis include (i) id: case number; (ii) Gestation Time: the number of weeks from pregnancy to delivery; (iii) DTFE: dietary iron intake (in 100 mg/dl); (iv) Ferritin: serum ferritin (in 100 mg/dl); and (v) Age: age in years. In the notation of the proposed method, X is Ferritin, W is DTFE, and Z is Age.
In this analysis, we are particularly interested in the effect of patients’ serum ferritin and age on the delivery of the patients. This type of failure time data can be modeled by the Cox proportional hazards models.
However, outcomes for serum ferritin were missing in this data set. Removing those observations can lead to biased or inefficient estimates. We noted that the outcomes of dietary iron intake were completely obtained with no missing values.
Regression analysis of iron intake in relation to preterm delivery study
Method | Variable | Parameter estimate | Standard error | p-value | Hazard ratio |
---|---|---|---|---|---|
CC | ferritin | 0.2451 | 0.1306 | 0.060 | 1.278 |
CC | age | 0.009 | 0.0108 | 0.402 | 1.009 |
ZW | ferritin | 0.2236 | 0.076 | 0.004 | 1.251 |
ZW | age | 0.0102 | 0.0043 | 0.018 | 1.010 |
EPL | ferritin | 0.1797 | 0.0771 | 0.020 | 1.197 |
EPL | age | 0.0159 | 0.0036 | 0.000 | 1.016 |
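The p-value and hazard ratio columns can be related to the estimate and standard error columns with a short calculation. The sketch below assumes that the reported p-values are two-sided Wald tests based on the normal approximation and that the hazard ratio is the exponential of the coefficient; the paper does not state the test explicitly, and the function name `wald_summary` is ours.

```python
import math

def wald_summary(estimate, std_error):
    """Return (z statistic, two-sided p-value, hazard ratio) for one Cox coefficient.

    Assumes a Wald test with a standard normal reference distribution.
    """
    z = estimate / std_error
    # Two-sided standard normal tail probability: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    hazard_ratio = math.exp(estimate)
    return z, p_value, hazard_ratio

# Ferritin rows of the table: (method, estimate, standard error).
rows = [("CC", 0.2451, 0.1306), ("ZW", 0.2236, 0.076), ("EPL", 0.1797, 0.0771)]
for method, est, se in rows:
    z, p, hr = wald_summary(est, se)
    print(f"{method}: z={z:.2f}, p={p:.3f}, HR={hr:.3f}")
```

Under these assumptions the computed values agree with the tabulated ones up to rounding, which suggests the table was produced in this way, although that remains an inference.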
The regression analysis using the new method confirms that both serum ferritin and age are significantly related to the time to event. For estimating the effects of serum ferritin and age, there is again a reasonable efficiency gain from the two methods based on the partial likelihood approach over the complete-case Cox regression analysis. The EPL estimate of the serum ferritin effect (0.1797) is lower than the CC and ZW estimates (0.2451 and 0.2236), and it is significantly different from zero with p-value 0.020. In contrast, the p-value for serum ferritin from the CC method is 0.060.
Conclusion
We have introduced an EPL estimation method for Cox’s model with informative auxiliary covariates and established the asymptotic normality of our estimator. The proposed methodology allows for multivariate auxiliary covariates W without suffering from the curse of dimensionality.
We used the bandwidth suggested by Zhou and Wang (2000) in our estimation. Although it performs reasonably well, one could develop a bandwidth selection criterion, such as generalized cross-validation, to improve the estimation. It is also desirable to increase the efficiency of the estimation. In future work, one could consider optimizing α or introducing a weight structure in the score equation to achieve robustness. Further, it is worth extending our approach to model multivariate failure times.
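To make the generalized cross-validation idea concrete, the following is a minimal sketch for an ordinary Nadaraya-Watson regression with a Gaussian kernel. It is a generic illustration, not the authors' procedure: the paper does not implement GCV, the smoother here is a stand-in for the kernel estimator of the induced relative risk, and the names `gcv_score` and `select_bandwidth` and the simulated data are hypothetical.

```python
import math
import random

def gcv_score(x, y, h):
    """Generalized cross-validation score of a Nadaraya-Watson fit with bandwidth h."""
    n = len(x)
    rss = 0.0
    trace = 0.0
    for i in range(n):
        k = [math.exp(-0.5 * ((x[i] - xj) / h) ** 2) for xj in x]
        total = sum(k)
        fit = sum(kj * yj for kj, yj in zip(k, y)) / total
        rss += (y[i] - fit) ** 2
        trace += k[i] / total  # i-th diagonal entry of the smoother matrix
    denom = (1.0 - trace / n) ** 2
    # A degenerate fit that interpolates the data gets an infinite score.
    return math.inf if denom == 0.0 else (rss / n) / denom

def select_bandwidth(x, y, grid):
    """Pick the bandwidth on the grid with the smallest GCV score."""
    return min(grid, key=lambda h: gcv_score(x, y, h))

# Hypothetical data: a smooth signal observed with noise.
rng = random.Random(1)
x = [rng.uniform(0.0, 3.0) for _ in range(150)]
y = [math.sin(2.0 * xi) + rng.gauss(0.0, 0.2) for xi in x]
grid = [0.05, 0.1, 0.2, 0.4, 0.8, 1.6]
print(select_bandwidth(x, y, grid))
```

A data-driven bandwidth chosen this way targets prediction error, so it would still need to be shrunk to satisfy an undersmoothing requirement such as the one in Condition (A).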
Endnote
^{a} All numerical results in this paper are obtained using the software MATLAB and the codes are available (Additional file 1).
Appendix 1: Condition (A)
Also observe that \(s^{(0)}(\beta,t)=E[Y(t)r(\beta,t)]=E[Y(t)r^{*}(\beta,t)]\).
(5) Let \(F_{Y(t),Z}\) be the joint distribution of \((Y(t),Z)\), and \(f(t,z)=(\partial/\partial z)F_{Y(t),Z}(1,z)\). For each \(t\in [0,1]\), both \(f(t,z)\) and \(\phi(\beta,t)\) have a continuous second derivative almost everywhere.
(6) \(h\to 0\), \(nh^{d+4}\to 0\) and \(nh^{d}(\log n)^{-2}\to \infty\), as \(n\to \infty\).
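As a hedged illustration of the bandwidth condition (6), suppose the bandwidth follows a power law \(h=cn^{-a}\) with constants \(c>0\) and \(a>0\). This parametric form is our assumption and is not prescribed in the paper. The requirements then reduce to a range for the exponent \(a\):

```latex
\[
nh^{d+4}=c^{d+4}\,n^{1-a(d+4)}\to 0 \iff a>\frac{1}{d+4},
\qquad
\frac{nh^{d}}{(\log n)^{2}}=\frac{c^{d}\,n^{1-ad}}{(\log n)^{2}}\to\infty \iff a<\frac{1}{d},
\]
\[
\text{so condition (6) holds exactly when}\quad \frac{1}{d+4}<a<\frac{1}{d}.
\]
```

For a scalar auxiliary covariate (\(d=1\)) any exponent in \((1/5,1)\) qualifies, for instance \(h\propto n^{-1/3}\). The usual mean-squared-error-optimal rate \(n^{-1/(d+4)}\) sits on the excluded boundary, which is what undersmoothing means here.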
Appendix 2: Technical Proofs
Proof of Proposition 3.1. The argument employed here is similar to that for Theorem 1 of Jiang et al. (2011). Note that \(\hat {\nu }_{j}-\nu _{j}=\sum _{i\in V}\omega _{i}(\nu _{i}-\nu _{j}) +\sum _{i\in V}\omega _{i}(\zeta _{i}-\nu _{i})\). By standard nonparametric regression techniques (see, for example, Härdle 1990; Fan and Gijbels 1996), it can be shown that the first term above contributes to the bias and is \(O_{p}(h^{2})\). This is of order \(o_{p}(1/\sqrt {nh^{d}})\) if one uses an undersmoothing bandwidth such that \(nh^{d+4}\to 0\), because \(\sqrt {nh^{d}}\,h^{2}=\sqrt {nh^{d+4}}\to 0\). Hence \(\hat {\nu }_{j}-\nu _{j}=\sum _{i\in V}\omega _{i}(\zeta _{i}-\nu _{i})+o_{p}(1/\sqrt {nh^{d}}).\) Similarly, \(\hat {\psi }_{j}-\psi _{j}=\sum _{i\in V}\omega _{i}(\xi _{i}-\psi _{i})+o_{p}(1/\sqrt {nh^{d}}).\) The asymptotic normality can then be obtained by using the Cramér-Wald device and directly computing the asymptotic mean and variance (see, for example, Lemma 6.3 in Jiang and Mack 2001).
The asymptotic normality of \(\sqrt {nh^{d}}(\bar {\nu }_{j}-\nu _{j})\) is obtained by Slutsky’s theorem and the asymptotic normality of \(\sqrt {nh^{d}}(\hat {\nu }_{j}-\nu _{j})\), \(\sqrt {nh^{d}}(\hat {\psi }_{j}-\psi _{j})\) and \(\sqrt {nh^{d}}(\bar {\psi }_{j}-\psi _{j})\).
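The bias-plus-noise decomposition used in the proof of Proposition 3.1 is an algebraic identity whenever the weights \(\omega_{i}\) sum to one, and it can be checked numerically. The sketch below uses Nadaraya-Watson weights with a Gaussian kernel as a stand-in; the paper's own weights may differ (for example, local linear weights), and the target function, noise level and sample size are hypothetical choices for illustration, with \(\zeta_{i}\) treated as noisy observations of \(\nu_{i}\).

```python
import math
import random

def nw_weights(w_valid, w0, h):
    """Nadaraya-Watson weights at w0 (Gaussian kernel); they sum to one."""
    k = [math.exp(-0.5 * ((w - w0) / h) ** 2) for w in w_valid]
    total = sum(k)
    return [ki / total for ki in k]

def decompose(w_valid, zeta, nu, w0, h):
    """Return (estimation error, smoothing-bias term, noise term) at w0."""
    omega = nw_weights(w_valid, w0, h)
    nu0 = nu(w0)
    nu_hat = sum(o * z for o, z in zip(omega, zeta))
    bias = sum(o * (nu(w) - nu0) for o, w in zip(omega, w_valid))
    noise = sum(o * (z - nu(w)) for o, z, w in zip(omega, zeta, w_valid))
    return nu_hat - nu0, bias, noise

def nu(w):
    """Hypothetical smooth target function standing in for nu."""
    return math.sin(w)

rng = random.Random(0)
w_valid = [rng.uniform(0.0, 3.0) for _ in range(200)]   # validation-sample auxiliaries
zeta = [nu(w) + rng.gauss(0.0, 0.1) for w in w_valid]   # noisy observations of nu
err, bias, noise = decompose(w_valid, zeta, nu, w0=1.5, h=0.3)
print(err, bias + noise)  # the two numbers agree up to floating-point error
```

Shrinking `h` makes the bias term smaller and the noise term more variable, which is the trade-off that the undersmoothing condition resolves in favor of the noise term.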
Lemma 7.1.
Proof.
uniformly in \(\beta \in {\mathcal B}\). □
Proof of Theorem 4.1. The proof is carried out in the framework of multivariate counting processes and martingale theory, together with techniques commonly used in nonparametric regression. Following the same routine as in Zhou and Wang (2000), the consistency of \(\hat {\beta }_{\textit {EPL}}\) can be derived by using the Inverse Function Theorem (Rudin 1964; Andersen and Gill 1982) and the argument of Foutz (1977). In the following, we prove only the asymptotic normality in Theorem 4.1. The main tools are a Taylor expansion of the score function corresponding to the estimated likelihood function (3.10), Lenglart’s inequality, the martingale central limit theorem (see, e.g., Fleming and Harrington 1991), and nonparametric regression techniques.
Therefore, to prove the asymptotic normality in the theorem it suffices to show that \(n^{-1/2}\hat {U}(\beta,1)\) is asymptotically normal with mean 0 and variance \(\Sigma (\beta _{0})=(1-\rho)\Sigma _{1}(\beta _{0})+\rho \Sigma _{2}(\beta _{0})\), which is established in Lemma 7.4 below.
Proof of Theorem 4.2. The result follows from arguments similar to those used for Theorem 4.1.
Lemma 7.2.
Proof.
The result can be obtained by following the same argument as that for Lemma 2.4 of Zhou and Wang (1999). □
Lemma 7.3.
Proof.
where the last equality is from Lemma 7.2. Therefore the result holds. □
Lemma 7.4.
Proof.
For the 1st and 3rd terms above, each is a sum of independently distributed terms with mean zero from the nonvalidation and validation subsamples, respectively. The 1st term converges weakly to a Gaussian process with covariance \((1-\rho)\Sigma _{1}(\beta _{0})\). The 3rd term is asymptotically normal with mean zero and variance \(\rho \Sigma _{2}(\beta _{0})\). By independence of the two terms, \(n^{-1/2}\hat {U}(\beta,1)\stackrel {\mathcal {D}}{\longrightarrow }N(0,\Sigma (\beta))\) with \(\Sigma (\beta)=(1-\rho)\Sigma _{1}(\beta)+\rho \Sigma _{2}(\beta)\). □
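The way the two subsample contributions combine into \((1-\rho)\Sigma _{1}+\rho \Sigma _{2}\) can be illustrated in the scalar case with a small simulation. The sketch assumes that \(\rho\) is the validation fraction, that the two sums consist of independent mean-zero normal terms, and that the variances 2.0 and 0.5 are arbitrary illustrative values; none of these numbers come from the paper.

```python
import math
import random

def combined_variance(rho, sigma1_sq, sigma2_sq):
    """Scalar version of (1 - rho) * Sigma_1 + rho * Sigma_2."""
    return (1.0 - rho) * sigma1_sq + rho * sigma2_sq

def simulated_variance(n, rho, sigma1_sq, sigma2_sq, reps=2000, seed=0):
    """Monte Carlo variance of n^{-1/2} * (nonvalidation sum + validation sum)."""
    rng = random.Random(seed)
    n_valid = round(rho * n)          # size of the validation subsample
    n_nonvalid = n - n_valid          # size of the nonvalidation subsample
    s1, s2 = math.sqrt(sigma1_sq), math.sqrt(sigma2_sq)
    draws = []
    for _ in range(reps):
        total = sum(rng.gauss(0.0, s1) for _ in range(n_nonvalid))
        total += sum(rng.gauss(0.0, s2) for _ in range(n_valid))
        draws.append(total / math.sqrt(n))
    mean = sum(draws) / reps
    return sum((d - mean) ** 2 for d in draws) / (reps - 1)

print(combined_variance(0.2, 2.0, 0.5))        # theoretical value, about 1.7
print(simulated_variance(100, 0.2, 2.0, 0.5))  # Monte Carlo estimate, close to 1.7
```

In the ferritin application the validation fraction would be roughly 270/1503, so under this reading the nonvalidation term carries most of the weight.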
Declarations
Acknowledgements
The work was partially supported by NSF grant DMS-0906482 and NSFC grant 71361010. The research of Yanqing Sun was partially supported by the National Institutes of Health NIAID [grant number R37 AI054165] and by the National Science Foundation [grant number DMS-1208978]. The authors would also like to thank Dr. Savitz for making the data on serum ferritin concentration in relation to preterm delivery available for the application.
References
- Andersen, PK, Gill, RD: Cox’s regression model for counting processes: a large sample study. Ann. Stat. 10, 1100–1120 (1982).
- Chen, Y-H, Chen, R: A unified approach to regression analysis under double-sampling designs. J. R. Stat. Soc. B. 62, 449–460 (2000).
- Cox, DR: Regression models and life-tables (with discussion). J. R. Stat. Soc. B. 34, 187–220 (1972).
- Dickson, ER, Grambsch, PM, Fleming, TR, Fisher, LD, Langworthy, A: Prognosis in Primary Biliary Cirrhosis: Model for Decision Making. Hepatology. 10, 1–7 (1989).
- Fan, J, Gijbels, I: Local Polynomial Modeling and Its Applications. Chapman and Hall, London (1996).
- Fan, J, Yao, Q: Nonlinear Time Series: Nonparametric and Parametric Methods. Springer-Verlag, New York (2003).
- Fan, Z, Wang, X: Marginal hazards model for multivariate failure time data with auxiliary covariates. J. Nonparametric Stat. 21, 771–786 (2009).
- Fleming, TR, Harrington, DP: Counting Processes and Survival Analysis. Wiley, New York (1991).
- Foutz, RV: On the unique consistent solution to the likelihood equations. J. Am. Stat. Assoc. 72, 147–148 (1977).
- Härdle, W: Applied Nonparametric Regression. Cambridge University Press, London (1990).
- Hughes, MD: Regression dilution in the proportional hazards model. Biometrics. 49, 1056–1066 (1993).
- Jiang, X, Jiang, J, Liu, Y: Nonparametric regression under double-sampling designs. J. Syst. Sci. Complex. 24, 167–175 (2011).
- Jiang, J, Mack, YP: Robust local polynomial regression for dependent data. Stat. Sinica. 11, 705–722 (2001).
- Kalbfleisch, JD, Prentice, RL: The Statistical Analysis of Failure Time Data. Wiley, New York (1980).
- Lin, DY, Ying, Z: Cox regression with incomplete covariate measurements. J. Am. Stat. Assoc. 88, 1341–1349 (1993).
- Lipsitz, S, Ibrahim, JG: Using the E-M algorithm for survival data with incomplete categorical covariates. Lifetime Data Anal. 2, 5–14 (1996).
- Liu, Y, Wu, Y, Zhou, H: Multivariate failure times regression with a continuous auxiliary covariate. J. Multivariate Anal. 101, 679–691 (2010).
- Markus, BH, Dickson, ER, Grambsch, PM, Fleming, TR, Mazzaferro, V, Klintmalm, GB, Wiesner, RH, Van Thiel, DH, Starzl, TE: Efficacy of liver transplantation in patients with primary biliary cirrhosis. N. Engl. J. Med. 320, 1709–1713 (1989).
- Nadaraya, EA: On estimating regression. Theory Probab. Appl. 10, 186–190 (1964).
- Pepe, MS, Self, SG, Prentice, RL: Further results on covariate measurement errors in cohort studies with time to response data. Statist. Med. 8, 1167–1178 (1989).
- Prentice, RL: Covariate measurement errors and parameter estimation in a failure time regression model. Biometrika. 69, 331–342 (1982).
- Rubin, DB: Inference and missing data. Biometrika. 63, 581–592 (1976).
- Rudin, W: Principles of Mathematical Analysis. McGraw-Hill Book Co., New York (1964).
- Savitz, DA, Dole, N, Terry, JW Jr, Zhou, H, Thorp, JM Jr: Smoking and pregnancy outcome among African-American and white women in central North Carolina. Epidemiology. 12, 636–642 (2001).
- Watson, GS: Smooth regression analysis. Sankhya A. 26, 359–372 (1964).
- Zhou, H, Pepe, MS: Auxiliary covariate data in failure time regression analysis. Biometrika. 82, 139–149 (1995).
- Zhou, H, Wang, C-Y: Some asymptotic results for using kernel smoother with covariate measurement error problem in survival analysis, Vol. 2200. University of North Carolina, Chapel Hill (1999).
- Zhou, H, Wang, C-Y: Failure time regression with continuous covariates measured with error. J. R. Stat. Soc. B. 62, 657–665 (2000).
Copyright
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.