3.1 Nonlinear mixed-effects ODE model
Viral load data from an HIV/AIDS clinical trial consist of repeated measurements on a group of patients, so a hierarchical modeling approach is necessary to account for within-subject as well as between-subject variation simultaneously. We are interested in estimating biologically/clinically meaningful parameters in (1) and conducting statistical inference while taking below-detection-limit measurements for all patients into account. In longitudinal data analysis, the mixed-effects model is often used to characterize both within- and between-subject variation. Let y_{ij} be the logarithmic measurement of viral load for subject i at time t_{ij}, for i = 1,⋯,n and j = 1,⋯,n_{i}; let the population parameter be μ = (log c, log δ, log λ, log ρ, log N, log k, log ϕ)^{T} and the individual parameter be θ_{i} = (log c_{i}, log δ_{i}, log λ_{i}, log ρ_{i}, log N_{i}, log k_{i}, log ϕ_{i})^{T}. Let g(θ_{i},t_{ij}) = log10(V_{ij}(θ_{i},t_{ij})), with V_{ij}(θ_{i},t_{ij}) being the true amount of viral load based on (1) for subject i at time t_{ij}. Following Davidian and Giltinan (1995), a natural nonlinear mixed-effects ODE model (NLME-ODE) is specified as

(i)
Within-subject variation: y_{ij} = g(θ_{i},t_{ij}) + ε_{ij}. The measurement error ε_{ij} is assumed to follow a normal distribution with mean zero and variance σ^{2}.

(ii)
Between-subject variation: θ_{i} = μ + b_{i}. The random effect b_{i} characterizes the deviation of the individual parameters from the population level, and we assume {\mathbf{b}}_{i}\sim \mathcal{N}(\mathbf{0},\mathbf{D}).
It deserves mentioning that the model described here differs from the classical NLME model in that g(·) has no explicit form in the long-term HIV dynamic setting, which leads to additional challenges for parameter estimation. A few methods, such as the Bayesian approach and a Newton-like method, have been proposed to attack these challenges; see, for example, Huang et al. (2006) and Guedj et al. (2007). However, these procedures either ignore the missing-data (below detection limit) mechanism or are not applicable in the high-dimensional setting. We therefore advocate the SAEM algorithm, which takes below-detection-limit data into account for parameter estimation in the longitudinal HIV dynamic model.
3.2 Parameter estimation
Both the random effects from the NLME-ODE model and the below-detection-limit (left-censored) viral loads can be treated as missing data. The EM algorithm handles missing data by computing the expected log-likelihood with respect to the distribution of the missing components at the E-step and updating the parameter estimates through maximization at the M-step. In light of the high dimensionality of the random effects and the censored data, the SAEM algorithm coupled with MCMC provides a convenient way of drawing samples at the E-step (Delyon et al. 1999; Kuhn and Lavielle 2005). The details of this method are described below.
3.2.1 A. Expectation
Both the individual parameters θ_{i} and the below-detection-limit data can be treated as missing data. A classical way to cope with missing data is the EM algorithm proposed by Dempster et al. (1977). Let θ = (θ_{1},⋯,θ_{n}). Denote by Y^{o} and Y^{m} the observations above and below the detection limit, respectively, for all subjects across the study period. Let L(Y^{o},Y^{m},θ;μ,D,σ^{2}) represent the complete-data likelihood. The MLE of (μ,D,σ^{2}) is determined by the marginal likelihood of the observed data, L(Y^{o};μ,D,σ^{2}), but this quantity is often intractable. As an alternative, the EM algorithm calculates the expected value of the complete log-likelihood with respect to the joint conditional distribution of Y^{m},θ given Y^{o} under the current parameter estimates (μ^{(k)},D^{(k)},σ^{2(k)}):
\mathbf{Q}\left(\mu,\mathbf{D},\sigma^{2}\mid \mu^{(k)},\mathbf{D}^{(k)},\sigma^{2(k)}\right)=\mathrm{E}_{\mathbf{Y}^{m},\theta \mid \mathbf{Y}^{o},\mu^{(k)},\mathbf{D}^{(k)},\sigma^{2(k)}}\left[\log L\left(\mu,\mathbf{D},\sigma^{2};\mathbf{Y}^{o},\mathbf{Y}^{m},\theta\right)\right].
(6)
For simplicity of notation, let I_{o} = {(i,j) : y_{ij} ≥ DL}, with DL being the detection limit, and let {y}_{ij}^{o} be the corresponding observations; likewise, let I_{m} = {(i,j) : y_{ij} < DL}, and let {y}_{ij}^{m} be the corresponding missing data. Also, let n_{t} be the total number of observations and n_{s} the number of subjects. In the long-term HIV treatment setting, it follows that
L\left(\mu,\mathbf{D},\sigma^{2};\mathbf{Y}^{o},\mathbf{Y}^{m},\theta\right)\propto \left(\sigma^{2}\right)^{-\frac{n_{t}}{2}}\left|\mathbf{D}\right|^{-\frac{n_{s}}{2}}\exp\left\{-\frac{1}{2\sigma^{2}}\sum_{(i,j)\in I_{o}}\left[y_{ij}^{o}-g(\theta_{i},t_{ij})\right]^{2}-\frac{1}{2\sigma^{2}}\sum_{(i,j)\in I_{m}}\left[y_{ij}^{m}-g(\theta_{i},t_{ij})\right]^{2}-\frac{1}{2}\sum_{i}(\theta_{i}-\mu)^{T}\mathbf{D}^{-1}(\theta_{i}-\mu)\right\}.
(7)
Of particular note is that when g(·) is a linear function of θ, the conditional distribution in equation (6) is normal and the expectation is easy to obtain. For our problem, g(·) results from numerically integrating the ODE system in (1): it is not only a nonlinear function of θ, but a closed-form expression does not exist either. Accordingly, we follow the idea of a stochastic version of the EM algorithm (SAEM) (Delyon et al. 1999) and evaluate equation (6) as follows.
3.2.2 B. Gibbs sampler for incomplete data
It has long been known that the Gibbs sampler is useful for simulating data from a joint posterior distribution (Gelfand et al. 1990; Wakefield 1996). In our case, at the k-th iteration, θ and Y^{m} can be generated alternately from the joint posterior distribution P(θ,Y^{m}∣Y^{o},μ^{(k-1)},D^{(k-1)},σ^{2(k-1)}), as summarized in the following two steps.
Step 1 Simulate Y^{m(k)} from the marginal conditional posterior distribution P(Y^{m}∣θ^{(k-1)},Y^{o},μ^{(k-1)},D^{(k-1)},σ^{2(k-1)}), which follows a normal distribution truncated at the detection limit. Each {y}_{ij}^{m(k)} is centered at g\left({\theta}_{i}^{(k-1)},{t}_{ij}\right) with variance σ^{2(k-1)} and can thus be simulated as follows (Breslaw 1994):

(a)
calculate the cumulative probability of the detection limit under the same distribution as {y}_{ij}^{m(k)} and denote it by P_{DL};

(b)
draw u from the uniform distribution U(0, 1); and

(c)
obtain a sample of {y}_{ij}^{m(k)} as {y}_{ij}^{m(k)}=g\left({\theta}_{i}^{(k-1)},{t}_{ij}\right)+{\sigma}^{(k-1)}{\Phi}^{-1}\left[u\times \mathbf{P}_{\text{DL}}\right], where Φ is the standard normal cumulative distribution function.
It should be noted that this sampling algorithm requires only one draw at each iteration and is therefore efficient.
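Steps (a)–(c) amount to one inverse-CDF draw from a normal distribution truncated above at the detection limit. A minimal sketch using the standard library's `NormalDist` (function and variable names are illustrative):

```python
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, provides Phi and Phi^{-1}

def draw_censored(mean, sigma, dl, rng):
    """One draw from N(mean, sigma^2) truncated above at dl (Breslaw 1994).

    mean  : g(theta_i^{(k-1)}, t_ij), the model-predicted log viral load
    sigma : residual standard deviation sigma^{(k-1)}
    dl    : detection limit DL
    """
    p_dl = _STD_NORMAL.cdf((dl - mean) / sigma)          # (a) P_DL
    u = rng.random()                                      # (b) u ~ U(0, 1)
    return mean + sigma * _STD_NORMAL.inv_cdf(u * p_dl)  # (c) one draw < DL
```

Because u·P_DL always lies below P_DL, the inverse CDF maps it to a value below the detection limit, so every draw is a valid left-censored observation.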
Step 2 Simulate θ^{(k)} from the conditional posterior distribution P(θ ∣ Y^{m(k)},Y^{o},μ^{(k-1)},D^{(k-1)},σ^{2(k-1)}), which has no closed-form expression but is proportional to (7) with all parameters fixed at their current values. The Metropolis-Hastings (MH) algorithm is capable of generating samples from this distribution. One choice of proposal distribution is {q}_{k-1}\sim \mathcal{N}({\mu}^{(k-1)},{\mathbf{D}}^{(k-1)}). The procedure then proceeds as follows:

(a)
Calculate the acceptance probability α(φ ∣ θ^{(k-1)}) as
\min\left\{1,\;\frac{P\left(\phi\mid \mathbf{Y}^{m(k)},\mathbf{Y}^{o},\mu^{(k-1)},\mathbf{D}^{(k-1)},\sigma^{2(k-1)}\right)}{P\left(\theta^{(k-1)}\mid \mathbf{Y}^{m(k)},\mathbf{Y}^{o},\mu^{(k-1)},\mathbf{D}^{(k-1)},\sigma^{2(k-1)}\right)}\,\frac{q_{k-1}\left(\theta^{(k-1)}\mid \mu^{(k-1)},\mathbf{D}^{(k-1)}\right)}{q_{k-1}\left(\phi\mid \mu^{(k-1)},\mathbf{D}^{(k-1)}\right)}\right\},
(8)
where φ is a candidate simulated from q_{k-1}. If we assume that the θ_{i} are independent, then D is diagonal, and we may simulate φ for each i (denoted φ_{i}) separately. After some rearrangement, the acceptance probability α simplifies to
\alpha\left({\phi}^{i}\mid {\theta}^{(k-1)}\right)=\min\left\{1,\;{\mathbf{R}}^{i}\right\},
(9)
where
{\mathbf{R}}^{i}=\exp\left\{\frac{1}{2\sigma^{2(k-1)}}\left(\sum_{j:(i,j)\in I_{o}}\left[y_{ij}^{o}-g\left(\theta_{i}^{(k-1)},t_{ij}\right)\right]^{2}+\sum_{j:(i,j)\in I_{m}}\left[y_{ij}^{m(k)}-g\left(\theta_{i}^{(k-1)},t_{ij}\right)\right]^{2}-\sum_{j:(i,j)\in I_{o}}\left[y_{ij}^{o}-g\left(\phi_{i},t_{ij}\right)\right]^{2}-\sum_{j:(i,j)\in I_{m}}\left[y_{ij}^{m(k)}-g\left(\phi_{i},t_{ij}\right)\right]^{2}\right)\right\}.
(10)

(b)
For each i, draw u from the uniform distribution U(0, 1). If u ≤ α(φ^{i} ∣ θ^{(k-1)}), then accept φ^{i} as the new {\theta}_{i}^{(k)}; otherwise keep {\theta}_{i}^{(k-1)} as {\theta}_{i}^{(k)}.
Notice that, unlike implementations of the MH algorithm in other settings, where the choice of variance for the proposal density q is essential to the efficiency of the algorithm, here there is no need to choose an appropriate variance manually. Since the variance D is always estimated from the last iteration, the algorithm updates the proposal variance automatically, making itself adaptive. Often, the candidate parameter φ simulated from q makes the integration of (1) unstable, the so-called stiffness problem. To handle the stiff ODEs, we apply a Rosenbrock method, which is relatively easy to implement and also provides good accuracy. We refer interested readers to Kaps and Rentrop (1979) for more details.
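The accept/reject step for one subject, using the simplified ratio R^{i} in (10), can be sketched as follows. The residual sums of squares at the current value and at the candidate are assumed to be precomputed by integrating (1); the function and argument names are illustrative:

```python
import math
import random

def mh_step(theta_prev, phi, rss_prev, rss_phi, sigma2, rng):
    """One MH accept/reject step for subject i.

    rss_prev, rss_phi : residual sums of squares (observed + imputed data)
                        evaluated at theta_i^{(k-1)} and at the candidate phi_i
    sigma2            : current residual variance sigma^{2(k-1)}

    log R^i = (rss_prev - rss_phi) / (2 * sigma2); the prior terms cancel
    because the proposal q is the N(mu^{(k-1)}, D^{(k-1)}) prior itself.
    """
    log_r = (rss_prev - rss_phi) / (2.0 * sigma2)
    alpha = math.exp(min(log_r, 0.0))  # min(1, R^i), capped to avoid overflow
    return phi if rng.random() <= alpha else theta_prev
```

A candidate that fits the data better (smaller residual sum of squares) is always accepted; a worse candidate is accepted only with probability R^{i}.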
3.2.3 C. Maximization
Once θ and Y^{m} are simulated, it is straightforward to update (μ,D,σ^{2}) by maximizing equation (6), and we obtain
\begin{array}{l}{\mu}^{(k)}=\frac{1}{n_{s}}\sum_{i}{\theta}_{i}^{(k)},\\ \text{diag}\left({\mathbf{D}}^{(k)}\right)=\frac{1}{n_{s}}\sum_{i}{\theta}_{i}^{2(k)}-{\mu}^{2(k)},\\ {\sigma}^{2(k)}=\frac{1}{n_{t}}\left\{\sum_{(i,j)\in I_{o}}\left[y_{ij}^{o}-g\left(\theta_{i}^{(k)},t_{ij}\right)\right]^{2}+\sum_{(i,j)\in I_{m}}\left[y_{ij}^{m(k)}-g\left(\theta_{i}^{(k)},t_{ij}\right)\right]^{2}\right\}.\end{array}
(11)
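The closed-form updates in (11) are elementary once the samples are in hand. A sketch, assuming a diagonal D as above (function and argument names are illustrative):

```python
def m_step(thetas, rss_total, n_t):
    """Closed-form M-step updates of eq. (11), with D assumed diagonal.

    thetas    : list of per-subject parameter vectors theta_i^{(k)}
    rss_total : residual sum of squares over observed plus imputed data
    n_t       : total number of observations
    """
    n_s = len(thetas)       # number of subjects
    p = len(thetas[0])      # dimension of theta_i
    mu = [sum(th[j] for th in thetas) / n_s for j in range(p)]
    # componentwise second moment minus squared mean
    d_diag = [sum(th[j] ** 2 for th in thetas) / n_s - mu[j] ** 2
              for j in range(p)]
    sigma2 = rss_total / n_t
    return mu, d_diag, sigma2
```

In the full SAEM iteration, the sums above are replaced by the stochastically averaged sufficient statistics described next, rather than the raw draws from a single iteration.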
Note that the above estimates are composed of minimal sufficient statistics for (μ,D,σ^{2}). Denote {s}_{1}={\Sigma}_{i}{\theta}_{i}, {s}_{2}={\Sigma}_{i}{\theta}_{i}^{2}, and {s}_{3}={\Sigma}_{(i,j)\in {I}_{o}}{\left[{y}_{ij}^{o}-g({\theta}_{i},{t}_{ij})\right]}^{2}+{\Sigma}_{(i,j)\in {I}_{m}}{\left[{y}_{ij}^{m}-g({\theta}_{i},{t}_{ij})\right]}^{2}. The stochastic approximation step of SAEM consists of updating s_{1}, s_{2}, and s_{3} at the k-th iteration with a step-size sequence γ^{(k)}: {s}_{i}^{(k)}=(1-{\gamma}^{(k)}){s}_{i}^{(k-1)}+{\gamma}^{(k)}{\tilde{s}}_{i}^{(k)}, where {\tilde{s}}_{i}^{(k)} denotes the statistic evaluated at the k-th simulated values. Kuhn and Lavielle (2005) recommended using γ^{(k)} = 1 for the first K_{1} iterations, followed by the diminishing sequence γ^{(k)} = 1/(k - K_{1}) for another K_{2} iterations, in order to satisfy the assumptions of SAEM and to ensure the convergence of the algorithm. It deserves mentioning that variance estimates of the parameter estimates can be easily obtained from the inverse of the observed Fisher information matrix of (7).
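The two-phase step-size schedule and the stochastic-approximation update above can be sketched as (the function names and the K_{1} argument are illustrative):

```python
def gamma_k(k, k1):
    """Kuhn-Lavielle step size: 1 during the first k1 (burn-in) iterations,
    then 1/(k - k1), so later iterations average out the MCMC noise."""
    return 1.0 if k <= k1 else 1.0 / (k - k1)

def sa_update(s_prev, s_curr, g):
    """Stochastic approximation: s^(k) = (1 - gamma) * s^(k-1) + gamma * s_tilde^(k),
    where s_curr is the sufficient statistic computed from the k-th draws."""
    return (1.0 - g) * s_prev + g * s_curr
```

During the burn-in phase (γ^{(k)} = 1) the statistics simply track the latest draws; afterwards, the diminishing weights produce a Robbins-Monro-type average that drives convergence.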