3.1 Nonlinear mixed-effects ODE model
Viral load data from an HIV/AIDS clinical trial consist of repeated measurements on a group of patients, so a hierarchical modeling approach is necessary to account for within-subject as well as between-subject variation simultaneously. We are interested in estimating biologically/clinically meaningful parameters in (1) and conducting statistical inference while taking below-detection-limit measurements for all patients into account. In longitudinal data analysis, the mixed-effects model is often used to characterize both within- and between-subject variation. Let y_{ij} be the logarithmic measurement of viral load for subject i at time t_{ij}, for i = 1,⋯,n and j = 1,⋯,n_{i}; let the population parameter be μ = (log c, log δ, log λ, log ρ, log N, log k, log ϕ)^{T} and the individual parameter be θ_{i} = (log c_{i}, log δ_{i}, log λ_{i}, log ρ_{i}, log N_{i}, log k_{i}, log ϕ_{i})^{T}. Let g(θ_{i},t_{ij}) = log10(V_{ij}(θ_{i},t_{ij})), with V_{ij}(θ_{i},t_{ij}) being the true amount of viral load based on (1) for subject i at time t_{ij}. Following Davidian and Giltinan (1995), a natural nonlinear mixed-effects ODE model (NLME-ODE) is specified as

(i)
Within-subject variation: y_{ij} = g(θ_{i},t_{ij}) + ε_{ij}. The measurement error ε_{ij} is assumed to follow a normal distribution with mean zero and variance σ^{2}.

(ii)
Between-subject variation: θ_{i} = μ + b_{i}. The random effect b_{i} characterizes the deviation of the individual parameters from the population level, and we assume {\mathbf{b}}_{i}\sim \mathcal{N}(\mathbf{0},\mathbf{D}).
It deserves mentioning that the model described here differs from the classical NLME model in that g(·) has no explicit form in the long-term HIV dynamic setting, which leads to additional challenges for parameter estimation. A few methods, such as the Bayesian approach and a Newton-like method, have been proposed to attack these challenges; see, for example, Huang et al. (2006) and Guedj et al. (2007). However, these procedures either ignore the missing-data (below detection limit) mechanism or are not applicable in the high-dimensional setting. We therefore advocate the SAEM algorithm, which takes below-detection-limit data into account for parameter estimation in the longitudinal HIV dynamic model.
3.2 Parameter estimation
Both the random effects from the NLME-ODE model and the below-detection-limit (left-censored) viral loads can be treated as missing data. The EM algorithm handles missing data by computing the expected log-likelihood with respect to the distribution of the missing components at the E-step and updating the parameter estimates through maximization at the M-step. In light of the high dimensionality of the random effects and the censored data, the SAEM algorithm coupled with MCMC provides a convenient way of drawing samples at the E-step (Delyon et al. 1999; Kuhn and Lavielle 2005). The details of this method are described below.
3.2.1 A. Expectation
Both the individual parameters θ_{i} and the below-detection-limit data can be treated as missing data. A classical way to cope with missing data is the EM algorithm proposed by Dempster et al. (1977). Let θ = (θ_{1},⋯,θ_{n}). Denote by Y^{o} and Y^{m} the observations above and below the detection limit, respectively, for all subjects across the study period. Let L(Y^{o},Y^{m},θ;μ,D,σ^{2}) represent the complete-data likelihood. The MLE of (μ,D,σ^{2}) is determined by the marginal likelihood of the observed data, L(Y^{o};μ,D,σ^{2}), but this quantity is often intractable. As an alternative, the EM algorithm calculates the expected value of the complete log-likelihood with respect to the joint conditional distribution of Y^{m},θ given Y^{o} under the current parameter estimates (μ^{(k)},D^{(k)},σ^{2(k)}):
\mathbf{Q}\left(\mu,\mathbf{D},\sigma^{2}\mid \mu^{(k)},\mathbf{D}^{(k)},\sigma^{2(k)}\right)=\mathrm{E}_{\mathbf{Y}^{m},\theta \mid \mathbf{Y}^{o},\mu^{(k)},\mathbf{D}^{(k)},\sigma^{2(k)}}\left[\log L\left(\mu,\mathbf{D},\sigma^{2};\mathbf{Y}^{o},\mathbf{Y}^{m},\theta\right)\right].
(6)
For simplicity of notation, let I_{o} = {(i,j) : y_{ij} ≥ DL}, with DL being the detection limit, and let {y}_{ij}^{o} be the corresponding observations; likewise, let I_{m} = {(i,j) : y_{ij} < DL}, and let {y}_{ij}^{m} be the corresponding missing data. Also, let n_{t} be the total number of observations and n_{s} the number of subjects. In the long-term HIV treatment setting, it follows that
L\left(\mu,\mathbf{D},\sigma^{2};\mathbf{Y}^{o},\mathbf{Y}^{m},\theta\right)\propto \left(\sigma^{2}\right)^{-\frac{n_{t}}{2}}\left|\mathbf{D}\right|^{-\frac{n_{s}}{2}}\exp\left\{-\frac{1}{2\sigma^{2}}\sum_{(i,j)\in I_{o}}\left[y_{ij}^{o}-g(\theta_{i},t_{ij})\right]^{2}-\frac{1}{2\sigma^{2}}\sum_{(i,j)\in I_{m}}\left[y_{ij}^{m}-g(\theta_{i},t_{ij})\right]^{2}-\frac{1}{2}\sum_{i}(\theta_{i}-\mu)^{T}\mathbf{D}^{-1}(\theta_{i}-\mu)\right\}.
(7)
Of particular note is that when g(·) is a linear function of θ, the conditional distribution in equation (6) is normal and the expectation is easy to obtain. For our problem, g(·) results from numerically integrating the ODE system in (1): it is not only a nonlinear function of θ, but a closed-form expression does not exist either. Accordingly, we follow the idea of a stochastic version of the EM algorithm (SAEM) (Delyon et al. 1999) and evaluate equation (6) as follows.
3.2.2 B. Gibbs sampler for incomplete data
It has long been known that the Gibbs sampler is useful for simulating data from a joint posterior distribution (Gelfand et al. 1990; Wakefield 1996). In our case, at the k-th iteration, θ and Y^{m} can be generated alternately from the joint posterior distribution P(θ,Y^{m}∣Y^{o},μ^{(k-1)},D^{(k-1)},σ^{2(k-1)}), as summarized in the following two steps.
Step 1 Simulate Y^{m(k)} from the marginal conditional posterior distribution P(Y^{m}∣θ^{(k-1)},Y^{o},μ^{(k-1)},D^{(k-1)},σ^{2(k-1)}), which follows a normal distribution truncated at the detection limit. Each {y}_{ij}^{m(k)} is centered at g\left({\theta}_{i}^{(k-1)},{t}_{ij}\right) with variance σ^{2(k-1)} and can thus be simulated as follows (Breslaw 1994):

(a)
calculate the cumulative probability of the detection limit under the same distribution as {y}_{ij}^{m(k)} and denote it by P_{DL};

(b)
draw u from the uniform distribution U(0, 1); and

(c)
obtain a sample of {y}_{ij}^{m(k)} as {y}_{ij}^{m(k)}=g\left({\theta}_{i}^{(k-1)},{t}_{ij}\right)+{\sigma}^{(k-1)}{\Phi}^{-1}\left[u\times \mathbf{P}_{\text{DL}}\right], where Φ is the standard normal cumulative distribution function.
It should be noted that this sampling algorithm requires only one draw at each iteration and is therefore efficient.
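Steps (a)–(c) amount to one inverse-CDF draw from a normal distribution truncated above at the detection limit. A minimal sketch using the standard library's `NormalDist` (function and variable names are illustrative):

```python
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, provides Phi and Phi^{-1}

def draw_censored(mean, sigma, dl, rng):
    """One draw from N(mean, sigma^2) truncated above at dl (Breslaw 1994).

    mean  : g(theta_i^{(k-1)}, t_ij), the model-predicted log viral load
    sigma : residual standard deviation sigma^{(k-1)}
    dl    : detection limit DL
    """
    p_dl = _STD_NORMAL.cdf((dl - mean) / sigma)          # (a) P_DL
    u = rng.random()                                      # (b) u ~ U(0, 1)
    return mean + sigma * _STD_NORMAL.inv_cdf(u * p_dl)  # (c) one draw < DL
```

Because u·P_DL always lies below P_DL, the inverse CDF maps it to a value below the detection limit, so every draw is a valid left-censored observation.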
Step 2 Simulate θ^{(k)} from the conditional posterior distribution P(θ ∣ Y^{m(k)},Y^{o},μ^{(k-1)},D^{(k-1)},σ^{2(k-1)}), which has no closed-form expression but is proportional to (7) with all parameters fixed at their current values. The Metropolis-Hastings (MH) algorithm is capable of generating samples from this distribution. One choice of proposal distribution is {q}_{k-1}\sim \mathcal{N}({\mu}^{(k-1)},{\mathbf{D}}^{(k-1)}). The procedure then proceeds as follows:

(a)
Calculate the acceptance probability α(φ ∣ θ^{(k-1)}) as
\min\left\{1,\;\frac{P\left(\phi\mid \mathbf{Y}^{m(k)},\mathbf{Y}^{o},\mu^{(k-1)},\mathbf{D}^{(k-1)},\sigma^{2(k-1)}\right)}{P\left(\theta^{(k-1)}\mid \mathbf{Y}^{m(k)},\mathbf{Y}^{o},\mu^{(k-1)},\mathbf{D}^{(k-1)},\sigma^{2(k-1)}\right)}\,\frac{q_{k-1}\left(\theta^{(k-1)}\mid \mu^{(k-1)},\mathbf{D}^{(k-1)}\right)}{q_{k-1}\left(\phi\mid \mu^{(k-1)},\mathbf{D}^{(k-1)}\right)}\right\},
(8)
where φ is a candidate simulated from q_{k-1}. If we assume that the θ_{i} are independent, then D is diagonal, and we may simulate φ for each i (denoted φ_{i}) separately. After some rearrangement, the acceptance probability α simplifies to
\alpha\left({\phi}^{i}\mid {\theta}^{(k-1)}\right)=\min\left\{1,\;{\mathbf{R}}^{i}\right\},
(9)
where
{\mathbf{R}}^{i}=\exp\left\{\frac{1}{2\sigma^{2(k-1)}}\left(\sum_{j:(i,j)\in I_{o}}\left[y_{ij}^{o}-g\left(\theta_{i}^{(k-1)},t_{ij}\right)\right]^{2}+\sum_{j:(i,j)\in I_{m}}\left[y_{ij}^{m(k)}-g\left(\theta_{i}^{(k-1)},t_{ij}\right)\right]^{2}-\sum_{j:(i,j)\in I_{o}}\left[y_{ij}^{o}-g\left(\phi_{i},t_{ij}\right)\right]^{2}-\sum_{j:(i,j)\in I_{m}}\left[y_{ij}^{m(k)}-g\left(\phi_{i},t_{ij}\right)\right]^{2}\right)\right\}.
(10)

(b)
For each i, draw u from the uniform distribution U(0, 1). If u ≤ α(φ^{i} ∣ θ^{(k-1)}), then accept φ^{i} as the new {\theta}_{i}^{(k)}; otherwise keep {\theta}_{i}^{(k-1)} as {\theta}_{i}^{(k)}.
Notice that, unlike implementations of the MH algorithm in other settings, where the choice of variance for the proposal density q is essential to the efficiency of the algorithm, here there is no need to choose an appropriate variance manually. Since the variance D is always estimated from the last iteration, the algorithm updates the proposal variance automatically, making itself adaptive. Often, the candidate parameter φ simulated from q makes the integration of (1) unstable, the so-called stiffness problem. To handle the stiff ODEs, we apply a Rosenbrock method, which is relatively easy to implement and also provides good accuracy. We refer interested readers to Kaps and Rentrop (1979) for more details.
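The accept/reject step for one subject, using the simplified ratio R^{i} in (10), can be sketched as follows. The residual sums of squares at the current value and at the candidate are assumed to be precomputed by integrating (1); the function and argument names are illustrative:

```python
import math
import random

def mh_step(theta_prev, phi, rss_prev, rss_phi, sigma2, rng):
    """One MH accept/reject step for subject i.

    rss_prev, rss_phi : residual sums of squares (observed + imputed data)
                        evaluated at theta_i^{(k-1)} and at the candidate phi_i
    sigma2            : current residual variance sigma^{2(k-1)}

    log R^i = (rss_prev - rss_phi) / (2 * sigma2); the prior terms cancel
    because the proposal q is the N(mu^{(k-1)}, D^{(k-1)}) prior itself.
    """
    log_r = (rss_prev - rss_phi) / (2.0 * sigma2)
    alpha = math.exp(min(log_r, 0.0))  # min(1, R^i), capped to avoid overflow
    return phi if rng.random() <= alpha else theta_prev
```

A candidate that fits the data better (smaller residual sum of squares) is always accepted; a worse candidate is accepted only with probability R^{i}.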
3.2.3 C. Maximization
Once θ and Y^{m} are simulated, it is straightforward to update (μ,D,σ^{2}) by maximizing equation (6), and we obtain
\begin{array}{l}{\mu}^{(k)}=\frac{1}{n_{s}}\sum_{i}{\theta}_{i}^{(k)},\\ \text{diag}\left({\mathbf{D}}^{(k)}\right)=\frac{1}{n_{s}}\sum_{i}{\theta}_{i}^{2(k)}-{\mu}^{2(k)},\\ {\sigma}^{2(k)}=\frac{1}{n_{t}}\left\{\sum_{(i,j)\in I_{o}}\left[y_{ij}^{o}-g\left(\theta_{i}^{(k)},t_{ij}\right)\right]^{2}+\sum_{(i,j)\in I_{m}}\left[y_{ij}^{m(k)}-g\left(\theta_{i}^{(k)},t_{ij}\right)\right]^{2}\right\}.\end{array}
(11)
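The closed-form updates in (11) are elementary once the samples are in hand. A sketch, assuming a diagonal D as above (function and argument names are illustrative):

```python
def m_step(thetas, rss_total, n_t):
    """Closed-form M-step updates of eq. (11), with D assumed diagonal.

    thetas    : list of per-subject parameter vectors theta_i^{(k)}
    rss_total : residual sum of squares over observed plus imputed data
    n_t       : total number of observations
    """
    n_s = len(thetas)       # number of subjects
    p = len(thetas[0])      # dimension of theta_i
    mu = [sum(th[j] for th in thetas) / n_s for j in range(p)]
    # componentwise second moment minus squared mean
    d_diag = [sum(th[j] ** 2 for th in thetas) / n_s - mu[j] ** 2
              for j in range(p)]
    sigma2 = rss_total / n_t
    return mu, d_diag, sigma2
```

In the full SAEM iteration, the sums above are replaced by the stochastically averaged sufficient statistics described next, rather than the raw draws from a single iteration.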
Note that the above estimates are composed of minimal sufficient statistics for (μ,D,σ^{2}). Denote {s}_{1}={\Sigma}_{i}{\theta}_{i}, {s}_{2}={\Sigma}_{i}{\theta}_{i}^{2}, and {s}_{3}={\Sigma}_{(i,j)\in {I}_{o}}{\left[{y}_{ij}^{o}-g({\theta}_{i},{t}_{ij})\right]}^{2}+{\Sigma}_{(i,j)\in {I}_{m}}{\left[{y}_{ij}^{m}-g({\theta}_{i},{t}_{ij})\right]}^{2}. The stochastic approximation step of SAEM consists of updating s_{1}, s_{2}, and s_{3} at the k-th iteration with a step-size sequence γ^{(k)}: {s}_{i}^{(k)}=(1-{\gamma}^{(k)}){s}_{i}^{(k-1)}+{\gamma}^{(k)}{\tilde{s}}_{i}^{(k)}, where {\tilde{s}}_{i}^{(k)} denotes the statistic evaluated at the k-th simulated values. Kuhn and Lavielle (2005) recommended using γ^{(k)} = 1 for the first K_{1} iterations, followed by the diminishing sequence γ^{(k)} = 1/(k - K_{1}) for another K_{2} iterations, in order to satisfy the assumptions of SAEM and to ensure the convergence of the algorithm. It deserves mentioning that variance estimates of the parameter estimates can be easily obtained from the inverse of the observed Fisher information matrix of (7).
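The two-phase step-size schedule and the stochastic-approximation update above can be sketched as (the function names and the K_{1} argument are illustrative):

```python
def gamma_k(k, k1):
    """Kuhn-Lavielle step size: 1 during the first k1 (burn-in) iterations,
    then 1/(k - k1), so later iterations average out the MCMC noise."""
    return 1.0 if k <= k1 else 1.0 / (k - k1)

def sa_update(s_prev, s_curr, g):
    """Stochastic approximation: s^(k) = (1 - gamma) * s^(k-1) + gamma * s_tilde^(k),
    where s_curr is the sufficient statistic computed from the k-th draws."""
    return (1.0 - g) * s_prev + g * s_curr
```

During the burn-in phase (γ^{(k)} = 1) the statistics simply track the latest draws; afterwards, the diminishing weights produce a Robbins-Monro-type average that drives convergence.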