Nonparametric rank based estimation of bivariate densities given censored data conditional on marginal probabilities
Alan D. Hutson
DOI: 10.1186/s40488-016-0047-y
© Hutson. 2016
Received: 19 August 2015
Accepted: 22 March 2016
Published: 14 April 2016
Abstract
In this note we develop a new Kaplan-Meier product-limit type estimator for the bivariate survival function given right censored data in one or both dimensions. Our derivation is based on extending the constrained maximum likelihood density based approach that is utilized in the univariate setting as an alternative strategy to the approach originally developed by Kaplan and Meier (1958). The key feature of our bivariate survival function is that the marginal survival functions correspond exactly to the Kaplan-Meier product limit estimators. This provides a level of consistency between the joint bivariate estimator and the marginal quantities as compared to other approaches. The approach we outline in this note may be extended to higher dimensions and different censoring mechanisms using the same techniques.
Keywords
Product-limit estimator; Bivariate survival function; Maximum likelihood; Linear programming
Mathematics Subject Classification (MSC)
62N01; 62G07
Introduction
In this note we develop a new Kaplan-Meier product-limit type estimator for the bivariate survival function given right censored data in one or both dimensions. Our derivation is based on extending the constrained maximum likelihood density based approach (Satten and Datta 2001; Zhou 2005) that is utilized in the univariate setting as an alternative strategy to the classical discrete nonparametric hazard function approach (Kaplan and Meier 1958). There are several methods for estimating a bivariate survival function across different censoring patterns that have been proposed in the literature based on extending the univariate hazard function approach or creating various decompositions (Akritas and Van Keilegom 2003; Gill et al. 1995; Lin and Ying 1993; Prentice et al. 2004; Wang and Wells 1997). In general, they are somewhat complex to compute and may have deficiencies such as negative mass estimates at given points (Prentice et al. 2004). The large sample theory involving these estimators is quite technical (Gill et al. 1995). To the best of our knowledge one of the key limitations of all of the completely nonparametric bivariate survival function estimators developed to date is that they yield marginal estimators that may not be equivalent to the product-limit estimator corresponding to each dimension. Our estimator, framed as a sparse multinomial estimation problem given simplex constraints, remedies this issue. In addition, in terms of future work, our method may be extended to higher dimensions and other censoring mechanisms (left and interval) using the techniques outlined in this note. We also can consider support over the entire real line.
where d _{ j } denotes the number of events and r _{ j } denotes the number at risk at time x _{(j)}, j=1,2,⋯,g; see Cox and Oakes (1984) for details of this derivation. The maximization of (2) with respect to the parameters h _{ j }, j=1,2,⋯,g, yields the oft-utilized estimates \(\hat {h}_{j}=d_{j}/r_{j}\). For a technical treatment of the behavior of the product-limit estimator and how it translates to the continuous case see Chen and Lo (1997). It is well-known, but not immediately obvious, that the product-limit estimator reduces to the classic empirical estimator of the survival function \(\hat {S}(x)=1-\sum _{i=1}^{n} I(x_{(i)}\leq x)/n\) when there are no censored observations, where I(·) denotes the indicator function. This is the starting point for most of the bivariate survival estimators found in the literature (Gill et al. 1995).
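As a concrete illustration, the hazard estimates \(\hat{h}_j=d_j/r_j\) and the resulting product-limit survival estimates can be computed as below. This is a minimal Python sketch; the function name and example data are illustrative, not taken from the paper.

```python
def kaplan_meier(times, events):
    """Product-limit estimator S(x) = prod_{x_(j) <= x} (1 - d_j / r_j).

    times  : observed times T_i = min(X_i, C_i)
    events : censoring indicators delta_i (1 = event, 0 = censored)
    Returns a list of (distinct event time, survival estimate) pairs.
    """
    data = sorted(zip(times, events))
    event_times = sorted({t for t, d in data if d == 1})
    surv, out = 1.0, []
    for tj in event_times:
        r_j = sum(1 for t, _ in data if t >= tj)             # number at risk at tj
        d_j = sum(1 for t, d in data if t == tj and d == 1)  # events at tj
        surv *= 1.0 - d_j / r_j                              # product-limit update
        out.append((tj, surv))
    return out

# With no censoring the estimator reduces to the empirical survival function,
# dropping by 1/n at each observation:
print(kaplan_meier([1, 2, 3, 4], [1, 1, 1, 1]))
```

With a censored observation, the at-risk set shrinks without a corresponding event, so the remaining mass is redistributed to the right, as in the classical construction.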
As an alternative to the Kaplan-Meier construction we start by estimating the density function in a nonparametric fashion, from which the survival function and distribution function are then readily estimated. This approach mirrors classic parametric maximum likelihood estimation given right censored data, in that the likelihood contains a density function component and a survival function component whose relative contributions depend upon whether or not an observation is censored. In this framework denote the observed values as T _{ i }= min(X _{ i },C _{ i }), i=1,2,⋯,n. Note that in our alternative derivation we allow the more general assumption that both X and C may have support over the entire real line, as compared to the more common restriction in survival modeling that X and C have support only on the positive real line. Furthermore, denote the censoring indicator variable as \(\delta _{i}=I_{(X_{i} \leq C_{i})}\phantom {\dot {i}\!}\), denote the ordered observed T _{ i }’s as t _{(1)}<t _{(2)}<⋯<t _{(n)} and define the parameters of interest as π _{ i }=P(X≤t _{(i)}|δ _{ i }=1)−P(X<t _{(i)}|δ _{ i }=1), i=1,2,⋯,n. This parameter definition provides the justification for, and the linkage of, this estimator to underlying continuous data (Owen 1988). Note that, similar to the traditional product-limit estimator, π _{ i }=0 if δ _{ i }=0 by definition. Now, given right-censoring we only observe j≤n of the X’s, where \(j=\sum _{i=1}^{n} \delta _{i}\).
The last term in the likelihood corresponds to the constraint that \(\sum _{i=1}^{n} \pi _{i} = 1\). If the last observation is censored then, as in the traditional approach, the estimated survival function may be improper. Hence, by definition we set this observation to be uncensored, per asymptotic consistency arguments (Chen and Lo 1997). Note that the likelihood at (3) reduces to the likelihood for the classical empirical estimator given no censoring, with \(\hat {\pi }_{i}=1/n\).
where \(\hat {\pi }_{n}=1-\sum _{j=1}^{n-1} \hat {\pi }_{j}\), see Satten and Datta (2001).
where the \(\hat {\pi }_{i}\)’s are given at (4) and \(\hat {\pi }_{n}=1-\sum _{j=1}^{n-1} \hat {\pi }_{j}\). The estimator of the survival function given at (6) is equivalent to the product-limit form at (1) in terms of the actual estimated survival probabilities. This is the jumping-off point for our new bivariate survival function estimator.
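The equivalence with the product-limit form can be seen by computing the \(\hat{\pi}_i\)’s directly as the jump masses of the Kaplan-Meier estimator at the uncensored observations. A minimal sketch, assuming distinct observation times (the function name is illustrative):

```python
def km_jump_masses(times, events):
    """Jump masses pi_i of the product-limit estimator at the ordered
    observations t_(1) < ... < t_(n); censored points receive mass 0.
    The last observation is treated as uncensored, as in the text,
    so the masses sum to 1. Assumes no tied observation times."""
    data = sorted(zip(times, events))
    n = len(data)
    data[-1] = (data[-1][0], 1)        # force delta_(n) = 1 by definition
    surv, masses = 1.0, []
    for i, (t, d) in enumerate(data):
        r = n - i                      # number at risk at t_(i)
        if d == 1:
            jump = surv / r            # S(t-) - S(t) = S(t-) * (1/r) with d_j = 1
            masses.append(jump)
            surv -= jump
        else:
            masses.append(0.0)
    return masses

# A censored point gets zero mass; its mass is redistributed to the right:
print(km_jump_masses([1, 2, 3, 4], [1, 0, 1, 1]))
```

With no censoring every mass equals 1/n, recovering the classical empirical density, consistent with the reduction noted above.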
In Section 2 we outline the new constrained maximum likelihood procedure used to develop a bivariate estimator of the joint density f(x,y) from which estimators of the bivariate distribution function F(x,y) and bivariate survival function are readily calculated. In Section 3 we provide some illustrative toy examples followed by two real data examples. We finish with some basic conclusions.
Constrained maximum likelihood estimation
where the parameters, i.e. the π _{ i,j }’s, i=1,2,⋯,n, j=1,2,⋯,n, are in essence weights between 0 and 1 and are defined in detail below at (17).
where we denote the ranks of the observed failure or censoring times per each margin as \(r_{s_{i}}=\text {rank}(S_{i})\) and \(r_{t_{j}}=\text {rank}(T_{j})\), respectively, corresponding to the order statistics S _{(1)}<S _{(2)}<⋯<S _{(n)} and T _{(1)}<T _{(2)}<⋯<T _{(n)}. The parameters at (9) are instrumental with respect to defining the simplex constraints used in our maximization procedure described below. The inter-relationships between the cell probabilities, the π _{ i,j }’s, and the marginal probabilities, the \(\pi _{r_{s_{i}},.}\)’s and \(\pi _{.,r_{t_{j}}}\)’s, are defined in detail below at (20) and (21), respectively.
where we set \(\delta ^{x}_{(n)}=1\). In the case of no censoring all \(\hat {\pi }_{i,.}\)’s are equal to 1/n.
where \(\hat {\pi }_{i,.}\)’s are given at (10) and \(\hat {\pi }_{n,.}=1-\sum _{j=1}^{n-1} \hat {\pi }_{j,.}\). It should be obvious that \(\hat {\pi }_{i,.}=0\) from (10) when \(\delta ^{x}_{(i)}=0\) in the case of a censored observation.
where the \(\hat {\pi }_{.,j}\)’s are given at (13) and \(\hat {\pi }_{.,n}=1-\sum _{j=1}^{n-1} \hat {\pi }_{.,j}\).
else we define \(\pi _{r_{s_{i}},r_{t_{j}} }=0\), i.e. \(\pi _{r_{s_{i}},r_{t_{j}} }=0\) if \(\delta _{r_{s_{i}},r_{t_{j}}} = 0\) or \(\delta ^{x}_{r_{s_{i}}}= 0\) or \(\delta ^{y}_{r_{t_{j}}} = 0 \). In the specific case where there is no censoring for both X and Y the number of parameters is of size n and the corresponding maximum likelihood estimator for \(\pi _{r_{s_{i}},r_{t_{j}} }\) is 1/n as per the standard empirical density estimator, i.e. there is a point mass of 1/n per each set of paired observations.
The constraints at (20) and (21) pertain to the marginal constraints, where π _{ i,.} and π _{.,j } are defined at (9). Our approach is similar to the problems described for multinomial distribution parameter estimation given sparse data and a class of linear simplex constraints (Liu 2000). The argument for replacing π _{ i,.} and π _{.,j } at (9) with their corresponding estimators \(\hat {\pi }_{i,.}\) and \(\hat {\pi }_{.,j}\) at (10) and (13), respectively, follows similarly to the classic R×C contingency table exact inference case. The contributions of the \(\hat {\pi }_{i,.}\)’s and \(\hat {\pi }_{.,j}\)’s to the multinomial distribution given sparse data and linear simplex constraints corresponding to the \(\hat {\pi }_{r_{s_{i}},r_{t_{j}} }\)’s of interest depend on the data through the censoring values for the δ ^{ x }’s and δ ^{ y }’s. The joint distribution of the \(\hat {\pi }_{i,.}\)’s and \(\hat {\pi }_{r_{s_{i}},r_{t_{j}} }\)’s is identical to the distribution of the \(\hat {\pi }_{r_{s_{i}},r_{t_{j}} }\)’s, since by definition the \(\hat {\pi }_{i,.}\)’s are determined by the \(\hat {\pi }_{r_{s_{i}},r_{t_{j}} }\)’s. The same holds in the other dimension relative to the \(\hat {\pi }_{.,j}\)’s, i.e. the \(\hat {\pi }_{r_{s_{i}},r_{t_{j}} }\)’s are sufficient statistics in terms of determining the parameters that define the marginal densities.
- 1.
Given the observed censoring pattern utilize the constraints (19), (20) and (21) to define the parameter space.
- 2.
Obtain the estimates of the marginal probabilities π _{ i,.} and π _{.,j } given by \(\hat {\pi }_{i,.}\) and \(\hat {\pi }_{.,j}\) at (10) and (13), respectively. Substitute the estimates into (20) and (21) after first processing step 1 above.
- 3.
Utilize standard maximum likelihood techniques on the likelihood defined at (18) to solve for the remaining unknown parameters given the constraints defined at (19)–(22).
respectively. Some small-sample toy examples illustrating the process are provided in the next section, followed by some real data examples.
Note that if censoring occurs solely in either the x or y dimension the likelihood at (18) reduces substantially in complexity and has a form very similar to that of the univariate setting at (3).
Comment: Large-sample variance estimates for \(\hat {F}(x,y)\) and \(\hat {S}(x,y)\) are conceptually straightforward in that they follow standard methods based on obtaining the co-information matrix, with dimensions that vary as a function of the proportion of censored observations. For small samples this is straightforward. However, for moderate to large samples, and from a programming point of view, this becomes a rather complex computational problem, such that we would recommend either bootstrap or jackknife methodologies for the purpose of variance estimation.
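The recommended bootstrap approach can be sketched generically as follows. The function name and the placeholder statistic are illustrative stand-ins; in practice the `estimator` argument would be the constrained-ML estimate of \(\hat{S}(x,y)\) at a fixed point, recomputed on each resample of the paired censored observations.

```python
import random

def bootstrap_variance(observations, estimator, B=500, seed=0):
    """Nonparametric bootstrap variance: resample the observation tuples
    (e.g. (s_i, delta_i^x, t_i, delta_i^y)) with replacement B times and
    return the empirical variance of the re-estimated statistic."""
    rng = random.Random(seed)
    n = len(observations)
    stats = []
    for _ in range(B):
        sample = [observations[rng.randrange(n)] for _ in range(n)]
        stats.append(estimator(sample))
    mean = sum(stats) / B
    return sum((s - mean) ** 2 for s in stats) / (B - 1)

# Illustrative usage with a placeholder statistic (the sample mean):
data = list(range(1, 11))
var_hat = bootstrap_variance(data, lambda s: sum(s) / len(s), B=200)
print(var_hat)
```

The jackknife variant would instead loop over the n leave-one-out samples; either choice avoids assembling the co-information matrix mentioned above.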
Examples
In this section we provide a few straightforward small-sample scenarios in order to illustrate the maximization process for the estimators of f(x,y) and S(x,y) given various censoring patterns. This is followed by two real data examples used by previous researchers to illustrate this type of estimator. It is important to note that as we present the results we set \(\delta ^{x}_{(n)}=\delta ^{y}_{(n)}=1\) by definition. The rationale for this was described above.
Toy data examples
- 1.
π _{1,3}+π _{2,1}+π _{3,2}+π _{5,6}+π _{6,4}+π _{6,6}=1,
- 2.
π _{1,3}=1/6,π _{2,1}=1/6,π _{3,2}=1/6,π _{5,6}=1/4,π _{6,4}+π _{6,6}=1/4,
- 3.
π _{2,1}=1/6,π _{3,2}=1/6,π _{1,3}=1/6,π _{6,4}=1/6,π _{5,6}+π _{6,6}=1/3.
We can see that in this small-sample setting the estimates for π=(π _{1,3},π _{2,1},π _{3,2},π _{5,6},π _{6,4},π _{6,6}) are determined solely by the marginal constraints. In general, for moderate to large sample sizes this will not be the case.
Bivariate estimates for \(\hat {\pi }_{i,j}, i=1,2, \cdots, 6, j=1,2,\cdots, 6,\) corresponding to example 1
i/j | r _{ t }=1 | r _{ t }=2 | r _{ t }=3 | r _{ t }=4 | r _{ t }=5 | r _{ t }=6 | \(\hat {\pi }_{i,.}\) | \(\delta ^{x}_{r_{s_{i}}}\) |
---|---|---|---|---|---|---|---|---|
r _{ s }=1 | 0 | 0 | 1/6 | 0 | 0 | 0 | 1/6 | 1 |
r _{ s }=2 | 1/6 | 0 | 0 | 0 | 0 | 0 | 1/6 | 1 |
r _{ s }=3 | 0 | 1/6 | 0 | 0 | 0 | 0 | 1/6 | 1 |
r _{ s }=4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
r _{ s }=5 | 0 | 0 | 0 | 0 | 0 | 1/4 | 1/4 | 1 |
r _{ s }=6 | 0 | 0 | 0 | 1/6 | 0 | 1/12 | 1/4 | 1 |
\(\hat {\pi }_{.,j}\) | 1/6 | 1/6 | 1/6 | 1/6 | 0 | 1/3 | 1 | |
\(\delta ^{y}_{r_{t_{j}}}\) | 1 | 1 | 1 | 1 | 0 | 1 |
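Because the constraints in this toy example form a fully determined linear system, the tabled values can be recovered directly by stacking constraints 1–3 and solving. A sketch assuming NumPy is available; the variable ordering is illustrative:

```python
import numpy as np

# Unknowns in order: pi_13, pi_21, pi_32, pi_56, pi_64, pi_66
A = np.array([
    [1, 1, 1, 1, 1, 1],   # constraint 1: total mass = 1
    [1, 0, 0, 0, 0, 0],   # constraint 2 (rows):    pi_13 = 1/6
    [0, 1, 0, 0, 0, 0],   #                         pi_21 = 1/6
    [0, 0, 1, 0, 0, 0],   #                         pi_32 = 1/6
    [0, 0, 0, 1, 0, 0],   #                         pi_56 = 1/4
    [0, 0, 0, 0, 1, 1],   #                         pi_64 + pi_66 = 1/4
    [0, 0, 0, 0, 1, 0],   # constraint 3 (columns): pi_64 = 1/6
    [0, 0, 0, 1, 0, 1],   #                         pi_56 + pi_66 = 1/3
], dtype=float)
b = np.array([1, 1/6, 1/6, 1/6, 1/4, 1/4, 1/6, 1/3])

# The system has full column rank and is consistent, so least squares
# returns the exact solution; in larger problems the likelihood at (18)
# must be maximized over the remaining free parameters instead.
pi, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.round(pi, 4))
```

The solution reproduces the table: mass 1/6 at cells (1,3), (2,1), (3,2), and (6,4), mass 1/4 at (5,6), and mass 1/12 at (6,6).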
- 1.
π _{1,3}+π _{3,5}+π _{3,6}+π _{4,1}+π _{4,6}+π _{5,2}+π _{5,6}+π _{6,5}+π _{6,6}=1
- 2.
π _{1,3}=1/6,π _{3,5}+π _{3,6}=5/24,π _{4,1}+π _{4,6}=5/24,π _{5,2}+π _{5,6}=5/24,π _{6,5}+π _{6,6}=5/24
- 3.
π _{4,1}=1/6,π _{5,2}=1/6,π _{1,3}=1/6,π _{3,5}+π _{6,5}=1/4,π _{3,6}+π _{4,6}+π _{5,6}+π _{6,6}=1/4
Bivariate estimates for \(\hat {\pi }_{i,j}, i=1,2, \cdots, 6, j=1,2,\cdots, 6,\) corresponding to example 2
i/j | r _{ t }=1 | r _{ t }=2 | r _{ t }=3 | r _{ t }=4 | r _{ t }=5 | r _{ t }=6 | \(\hat {\pi }_{i,.}\) | \(\delta ^{x}_{r_{s_{i}}}\) |
---|---|---|---|---|---|---|---|---|
r _{ s }=1 | 0 | 0 | 1/6 | 0 | 0 | 0 | 1/6 | 1 |
r _{ s }=2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
r _{ s }=3 | 0 | 0 | 0 | 0 | 5/24 | 0 | 5/24 | 1 |
r _{ s }=4 | 1/6 | 0 | 0 | 0 | 0 | 1/24 | 5/24 | 1 |
r _{ s }=5 | 0 | 1/6 | 0 | 0 | 0 | 1/24 | 5/24 | 1 |
r _{ s }=6 | 0 | 0 | 0 | 0 | 1/24 | 1/6 | 5/24 | 1 |
\(\hat {\pi }_{.,j}\) | 1/6 | 1/6 | 1/6 | 0 | 1/4 | 1/4 | 1 | |
\(\delta ^{y}_{r_{t_{j}}}\) | 1 | 1 | 1 | 0 | 1 | 1 |
- 1.
π _{1,1}+π _{3,4}+π _{3,6}+π _{6,4}+π _{6,6}=1
- 2.
π _{1,1}=1/6,π _{3,4}+π _{3,6}=5/24,π _{6,4}+π _{6,6}=5/8
- 3.
π _{1,1}=1/6,π _{3,4}+π _{6,4}=5/18,π _{3,6}+π _{6,6}=5/9
Bivariate estimates for \(\hat {\pi }_{i,j}, i=1,2, \cdots, 6, j=1,2,\cdots, 6,\) corresponding to example 3
i/j | r _{ t }=1 | r _{ t }=2 | r _{ t }=3 | r _{ t }=4 | r _{ t }=5 | r _{ t }=6 | \(\hat {\pi }_{i,.}\) | \(\delta ^{x}_{r_{s_{i}}}\) |
---|---|---|---|---|---|---|---|---|
r _{ s }=1 | 1/6 | 0 | 0 | 0 | 0 | 0 | 1/6 | 1 |
r _{ s }=2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
r _{ s }=3 | 0 | 0 | 0 | 5/24 | 0 | 0 | 5/24 | 1 |
r _{ s }=4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
r _{ s }=5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
r _{ s }=6 | 0 | 0 | 0 | 5/72 | 0 | 5/9 | 5/8 | 1 |
\(\hat {\pi }_{.,j}\) | 1/6 | 0 | 0 | 5/18 | 0 | 5/9 | 1 | |
\(\delta ^{y}_{r_{t_{j}}}\) | 1 | 0 | 0 | 1 | 0 | 1 |
Note that if there was no censoring within examples 1–3 then the respective values for \(\hat {\pi }_{r_{s_{i}},r_{t_{i}} }\), i=1,2,⋯,n, would be simply 1/n, which corresponds to the maximum likelihood estimates for the classic empirical joint density function.
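Given the estimated point masses, the bivariate survival function follows by summing the mass of every cell strictly above both coordinates, \(\hat{S}(x,y)=\sum_{s_i>x,\,t_j>y}\hat{\pi}_{i,j}\). A minimal sketch using the example 1 values, with ranks standing in for the observed coordinates (names are illustrative):

```python
# Estimated masses from toy example 1, keyed by the rank pair (r_s, r_t)
masses = {(1, 3): 1/6, (2, 1): 1/6, (3, 2): 1/6,
          (5, 6): 1/4, (6, 4): 1/6, (6, 6): 1/12}

def S_hat(x, y, masses):
    """Bivariate survival estimate: total mass strictly above (x, y)."""
    return sum(p for (i, j), p in masses.items() if i > x and j > y)

# Cells (5,6), (6,4) and (6,6) lie above (3, 2): 1/4 + 1/6 + 1/12 = 1/2
print(round(S_hat(3, 2, masses), 6))  # -> 0.5
```

The distribution function \(\hat{F}(x,y)\) is obtained the same way by summing over cells at or below both coordinates.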
Real data examples
In this section we re-analyze two sets of real data utilized to demonstrate other approaches to bivariate survival function estimation (Akritas and Van Keilegom 2003; Wang and Wells 1997); see the references contained therein relative to the source of the original data.
- 1.
π _{1,1}+π _{2,8}+π _{3,2}+π _{4,6}+π _{5,5}+π _{6,4}+π _{7,9}+π _{10,3}+π _{10,10}+π _{10,11}+π _{11,3}+π _{11,7}+π _{11,10}=1,
- 2.
π _{1,1}=1/11,π _{2,8}=1/11,π _{3,2}=1/11,π _{4,6}=1/11,π _{5,5}=1/11,π _{6,4}=1/11,π _{7,9}=1/11,π _{10,3}+π _{10,10}+π _{10,11}=2/11,π _{11,3}+π _{11,7}+π _{11,10}=2/11
- 3.
π _{1,1}=1/11,π _{3,2}=1/11,π _{10,3}+π _{11,3}=1/11,π _{6,4}=1/11,π _{5,5}=1/11,π _{4,6}=1/11,π _{11,7}=1/11,π _{2,8}=1/11,π _{7,9}=1/11,π _{10,10}+π _{11,10}=1/11,π _{10,11}=1/11.
Estimated bivariate survival probabilities for skin graft data evaluated at the marginal quartiles
u _{ x } | u _{ y } | \(\hat {Q}_{x}(u_{x})\) | \(\hat {Q}_{y}(u_{y})\) | \(\hat {S}\left (\hat {Q}_{x}(u_{x}),\hat {Q}_{y}(u_{y})\right)\) |
---|---|---|---|---|
0.25 | 0.25 | 19.25 | 15.0 | 0.82 |
0.25 | 0.5 | 19.25 | 21.0 | 0.82 |
0.25 | 0.75 | 19.25 | 28.25 | 0.73 |
0.5 | 0.25 | 29.0 | 15.0 | 0.73 |
0.5 | 0.5 | 29.0 | 21.0 | 0.55 |
0.5 | 0.75 | 29.0 | 28.25 | 0.45 |
0.75 | 0.25 | 59.25 | 15.0 | 0.73 |
0.75 | 0.5 | 59.25 | 21.0 | 0.55 |
0.75 | 0.75 | 59.25 | 28.25 | 0.45 |
Estimated bivariate survival probabilities for recurrence times to infection at the point of insertion of a catheter for kidney patients, evaluated at the marginal quartiles
u _{ x } | u _{ y } | \(\hat {Q}_{x}(u_{x})\) | \(\hat {Q}_{y}(u_{y})\) | \(\hat {S}(\hat {Q}_{x}(u_{x}),\hat {Q}_{y}(u_{y}))\) |
---|---|---|---|---|
0.25 | 0.25 | 15 | 16 | 0.95 |
0.25 | 0.5 | 15 | 39 | 0.92 |
0.25 | 0.75 | 15 | 154 | 0.81 |
0.5 | 0.25 | 46 | 16 | 0.91 |
0.5 | 0.5 | 46 | 39 | 0.85 |
0.5 | 0.75 | 46 | 154 | 0.68 |
0.75 | 0.25 | 149 | 16 | 0.90 |
0.75 | 0.5 | 149 | 39 | 0.77 |
0.75 | 0.75 | 149 | 154 | 0.58 |
Conclusions
In this note we provide, to the best of our knowledge, the only method for estimating a bivariate survival function whose marginal estimators correspond exactly to the Kaplan-Meier product-limit estimators, so that the marginal estimates derived via univariate or bivariate methods are mutually consistent. Unlike other methods developed in the literature, our approach is generalizable to higher dimensions and to different censoring mechanisms (interval and left censoring). Our methodology also opens up an alternative path for kernel smoothing of the bivariate density, distribution function, and survival function via use of the estimated π _{ i,j }’s over the multinomial grid of non-zero mass. Using real data we illustrated the computational approach and feasibility of this new method of simplex-constraint-based maximum likelihood estimation as applied to right censored data.
Declarations
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
References
- Akritas, MG, Van Keilegom, I: Estimation of bivariate and marginal distributions with censored data. J. R. Stat. Soc. Ser. B. 65, 457–471 (2003).
- Chen, K, Lo, S-H: On the rate of uniform convergence of the product-limit estimator: Strong and weak laws. Ann. Stat. 25, 1050–1087 (1997).
- Cox, DR, Oakes, D: Analysis of Survival Data. Chapman & Hall/CRC, New York (1984).
- Gill, RD, van der Laan, MJ, Wellner, JA: Inefficient estimators of the bivariate survival function for three models. Annales de l’Institut Henri Poincaré. 31, 545–597 (1995).
- Kaplan, EL, Meier, P: Nonparametric estimation from incomplete observations. J. Am. Stat. Assoc. 53, 457–481 (1958).
- Lin, DY, Ying, Z: A simple nonparametric estimator of the bivariate survival function under univariate censoring. Biometrika. 80, 573–581 (1993).
- Liu, C: Estimation of discrete distributions with a class of simplex constraints. J. Am. Stat. Assoc. 95, 109–120 (2000).
- Owen, AB: Empirical likelihood ratio confidence intervals for a single functional. Biometrika. 75, 237–249 (1988).
- Prentice, RL, Zoe Moodie, F, Wu, J: Hazard-based nonparametric survivor function estimation. J. R. Stat. Soc. Ser. B. 66, 305–319 (2004).
- Satten, GA, Datta, S: The Kaplan-Meier estimator as an inverse-probability-of-censoring weighted average. Am. Stat. 55, 207–210 (2001).
- Wang, W, Wells, MT: Nonparametric estimators of the bivariate survival function under simplified censoring condition. Biometrika. 84, 863–880 (1997).
- Zhou, M: Empirical likelihood ratio with arbitrarily censored/truncated data by EM algorithm. J. Comput. Graph. Stat. 14, 643–656 (2005).