 Methodology
 Open Access
Tolerance intervals in statistical software and robustness under model misspecification
Journal of Statistical Distributions and Applications volume 8, Article number: 10 (2021)
Abstract
A tolerance interval is a statistical interval that covers at least 100ρ% of the population of interest with 100(1−α)% confidence, where ρ and α are prespecified values in (0, 1). In many scientific fields, such as pharmaceutical sciences, manufacturing processes, clinical sciences, and environmental sciences, tolerance intervals are used for statistical inference and quality control. Despite their usefulness, procedures for computing tolerance intervals are not commonly implemented in statistical software packages. This paper provides a comparative study of the computational procedures for tolerance intervals in some commonly used statistical software packages, including JMP, Minitab, NCSS, Python, R, and SAS. In addition, we investigate the effect of misspecifying the underlying probability model on the performance of tolerance intervals. Through a Monte Carlo simulation study, we examine the performance of tolerance intervals both when the assumed distribution is the same as the true underlying distribution and when it differs from the true distribution. We also propose a robust model selection approach to obtain tolerance intervals that are relatively insensitive to model misspecification, and we show that it performs well when the underlying distribution is unknown but candidate distributions are available.
Introduction
There are three types of statistical intervals commonly used in practice: confidence intervals, prediction intervals, and tolerance intervals. A confidence interval provides a range of values that is likely to include an unknown parameter with a specified degree of confidence, 100(1−α)%, based on a random sample. A prediction interval is an interval within which one or more future observations from a population will fall with a specified degree of confidence, 100(1−α)%. A tolerance interval covers at least a specified proportion, ρ (0<ρ<1), of the population with a specified degree of confidence, 100(1−α)% with 0<α<1 (Hahn and Meeker 1991). It can be interpreted as follows: we are 100(1−α)% confident that at least 100ρ% of the population lies within the interval. Such an interval is denoted a [100(1−α)%]/[100ρ%] tolerance interval. For example, a quality engineer at a light bulb manufacturer needs to evaluate the life spans of light bulbs. The engineer randomly selects a sample of 100 light bulbs and records their times to failure. The engineer wants to calculate a 95%/99% lower tolerance bound, which is the burn time that at least 99% of all light bulbs exceed with 95% confidence. Suppose the lower tolerance bound based on a normal distribution is 1085.947; then the engineer can claim, with 95% confidence, that at least 99% of all light bulbs exceed approximately 1086 hours of burn time (Minitab 18 Statistical Software 2017). Tolerance intervals are of particular interest in setting limits on the process capability for a product manufactured in large quantities (Hahn and Meeker 1991); therefore, they are widely used in statistical quality control.
Despite the usefulness of tolerance intervals, the computation of tolerance intervals under different distributional assumptions is not commonly implemented in statistical software packages. We found that only a few commonly used statistical software packages, such as Minitab (Minitab 18 Statistical Software 2017), R (R Core Team 2020) and SAS (SAS Institute Inc 2014), provide computational procedures for tolerance intervals. The objective of this paper is twofold. First, we compare commonly used statistical software packages that offer procedures to compute tolerance intervals. Second, we evaluate the performance of tolerance intervals under model uncertainty and propose a robust model selection approach to compute tolerance intervals.
The rest of this paper is organized as follows. In Section 2, we introduce the notation for tolerance intervals and the computational formulas for tolerance intervals under several distributions. In Section 3, we compare the computational procedures available in commonly used statistical software packages. In Section 4, we evaluate the performance of tolerance intervals under model misspecification. In Section 5, we propose a model selection approach for the case in which the underlying probability model is unknown but some candidate models are available. Real data examples follow in Section 6, and the paper closes with concluding remarks and future research directions.
Tolerance interval and statistical models
2.1 Basics of tolerance intervals
Let X_{1},X_{2},…,X_{n} be a random sample of size n from a probability model with probability density function (PDF) f(x;θ) and cumulative distribution function (CDF) F(x;θ), where θ is the vector of parameters. We denote the observed values of X_{1},X_{2},…,X_{n} by x_{1},x_{2},…,x_{n}. When the population mean μ and population standard deviation σ are unknown, they are estimated by the sample mean and sample standard deviation, \(\bar {x} = \sum _{i=1}^{n} x_{i}/n\) and \(s = \sqrt {\sum _{i=1}^{n}(x_{i} - \bar {x})^{2}/(n-1)}\), respectively. For example, for normally distributed data, a [100(1−α)%]/[100ρ%] tolerance interval has the form
$$\left[\bar{x} - k s,\ \bar{x} + k s\right],$$
where k is the tolerance factor, (1−α)∈(0,1) is the confidence level and ρ∈(0,1) is the population proportion of interest. Usually, the exact value of k for given values of α and ρ is not easy to compute (the one-sided normal setting is an exception); therefore, most tolerance intervals are calculated using approximation methods (Young 2010). We define
$$C(L,U;\theta) = F(U;\theta) - F(L;\theta)$$
as the coverage of a two-sided interval [L,U], where L and U are statistics computed from the sample. Then, a [100(1−α)%]/[100ρ%] two-sided tolerance interval [L,U] satisfies
$$\Pr\left[C(L,U;\theta) \geq \rho\right] = 1-\alpha. \tag{1}$$
Similarly, a [100(1−α)%]/[100ρ%] upper one-sided tolerance interval [L,∞) satisfies
$$\Pr\left[C(L,\infty;\theta) \geq \rho\right] = \Pr\left[1 - F(L;\theta) \geq \rho\right] = 1-\alpha, \tag{2}$$
and a [100(1−α)%]/[100ρ%] lower one-sided tolerance interval (−∞,U] satisfies
$$\Pr\left[C(-\infty,U;\theta) \geq \rho\right] = \Pr\left[F(U;\theta) \geq \rho\right] = 1-\alpha. \tag{3}$$
For specific values of α and ρ, the two-sided, upper one-sided, and lower one-sided tolerance intervals can be obtained by finding the values of L and U that satisfy Eqs. (1), (2), and (3), respectively, for a specified underlying distribution.
Instead of assuming that the data come from a particular parametric model, one can construct a nonparametric tolerance interval based on order statistics (see, for example, Section 7.2 of David and Nagaraja (2003)). Specifically, the lower and upper nonparametric [100(1−α)%]/[100ρ%] tolerance limits are
$$L = x_{r:n} \quad \text{and} \quad U = x_{s:n},$$
where x_{j:n} is the jth order statistic of the random sample x_{1},x_{2},…,x_{n}, and the values of r and s (r<s) are chosen to satisfy Eq. (1) (Hahn and Meeker 1991; David and Nagaraja 2003).
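For a continuous F, the coverage F(X_{s:n}) − F(X_{r:n}) of two order statistics follows a Beta(s−r, n−s+r+1) distribution whatever F is, so the confidence of [x_{r:n}, x_{s:n}] equals the probability that a Binomial(n, ρ) variable is at most s−r−1. The Python sketch below uses only the standard library to compute that confidence and to pick r and s. The function names are hypothetical, and centring r and s is just one simple illustrative rule, not necessarily the rule used by any particular software package.

```python
from math import exp, lgamma, log


def binom_cdf(k, n, p):
    """Pr[Binomial(n, p) <= k], with terms computed in log space to avoid overflow."""
    return sum(
        exp(lgamma(n + 1) - lgamma(j + 1) - lgamma(n - j + 1)
            + j * log(p) + (n - j) * log(1 - p))
        for j in range(k + 1)
    )


def np_confidence(n, r, s, rho):
    """Confidence Pr[F(X_{s:n}) - F(X_{r:n}) >= rho] for a continuous F.

    The conventions x_{0:n} = -inf and x_{n+1:n} = +inf give one-sided limits.
    """
    m = s - r  # the coverage follows a Beta(m, n - m + 1) distribution
    # Pr[Beta(m, n - m + 1) >= rho] = Pr[Binomial(n, rho) <= m - 1]
    return binom_cdf(m - 1, n, rho)


def np_tolerance_limits(x, alpha, rho):
    """Two-sided limits (x_{r:n}, x_{s:n}) with r and s centred; None if n is too small."""
    xs = sorted(x)
    n = len(xs)
    for m in range(1, n):  # smallest spread m = s - r reaching confidence 1 - alpha
        if np_confidence(n, 1, 1 + m, rho) >= 1 - alpha:
            r = (n + 1 - m) // 2  # centre the pair; guarantees 1 <= r and r + m <= n
            return xs[r - 1], xs[r + m - 1]  # 1-based order statistics
    return None
```

For example, np_confidence(93, 1, 93, 0.95) is just above 0.95 while np_confidence(92, 1, 92, 0.95) falls short, which reproduces the classical result that the sample minimum and maximum of 93 observations form a 95%/95% two-sided nonparametric tolerance interval; with r = 0, the same function gives the one-sided requirement of 59 observations.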
2.2 Parametric tolerance intervals for some particular distributions
To illustrate the calculation of tolerance intervals under different distributions, we consider four symmetric distributions with location and scale parameters (normal (Gaussian), Cauchy, logistic, and Laplace distributions) and three two-parameter skewed (asymmetric) distributions with shape and scale parameters (gamma, Weibull, and lognormal distributions). The functional forms of these seven distributions and the corresponding computational formulas for the tolerance intervals are presented below. For more details on the computation of tolerance intervals under different distributions, one may refer to Young (2014).

Normal distribution: The PDF and CDF of a normal distribution with location parameter μ_{N} and scale parameter σ_{N} are, respectively,
$$f_{N}(x;\mu_{N},\sigma_{N}) = \frac{1}{\sigma_{N} \sqrt{2\pi}}\exp\left[-\frac{(x - \mu_{N})^{2}}{2\sigma_{N}^{2}}\right] \tag{4}$$
$$\text{and } F_{N}(x;\mu_{N},\sigma_{N}) = \int_{-\infty}^{x} \frac{1}{\sigma_{N} \sqrt{2\pi}} \exp\left[-\frac{(t-\mu_{N})^{2}}{2\sigma_{N}^{2}}\right] \mathrm{d}t, \tag{5}$$
where −∞<x<∞, −∞<μ_{N}<∞ and σ_{N}>0.
Based on a random sample of size n, X_{1},X_{2},…,X_{n}, from the normal distribution with PDF and CDF in Eqs. (4) and (5), respectively, suppose \({\hat \mu }_{N}\) and \({\hat \sigma }_{N}\) are the sample mean and sample standard deviation. Then the one-sided [100(1−α)%]/[100ρ%] tolerance intervals with a lower and an upper tolerance limit are (L_{N,1},∞) and (−∞,U_{N,1}), respectively, with
$$L_{N,1} = {\hat \mu}_{N} - k_{N,1,\alpha,\rho} \hat{\sigma}_{N} \quad \text{and} \quad U_{N,1} = {\hat \mu}_{N} + k_{N,1,\alpha,\rho} {\hat \sigma}_{N}, \tag{6}$$
where the tolerance factor k_{N,1,α,ρ} can be obtained as
$$k_{N,1,\alpha,\rho} = \frac{1}{\sqrt{n}}\, t^{*}_{n-1;1-\alpha}\left(\sqrt{n}\,z_{\rho}\right),$$
where \(t^{*}_{d;p}(\omega)\) denotes the p-quantile (100pth percentile) of a noncentral Student's t-distribution with d degrees of freedom and noncentrality parameter ω, and z_{p} is the p-quantile of the standard normal distribution. Note that the one-sided tolerance interval with tolerance factor k_{N,1,α,ρ} is exact.
A two-sided [100(1−α)%]/[100ρ%] tolerance interval under the normal distribution, (L_{N,2},U_{N,2}), is
$$L_{N,2} = {\hat \mu}_{N} - k_{N,2,\alpha,\rho} {\hat \sigma}_{N} \quad \text{and} \quad U_{N,2} = {\hat \mu}_{N} + k_{N,2,\alpha,\rho} {\hat \sigma}_{N}, \tag{7}$$
where k_{N,2,α,ρ} can be obtained following Howe (1969) (see also Guenther (2007)) as
$$k_{N,2,\alpha,\rho} = z_{\frac{1+\rho}{2}}\sqrt{1+n^{-1}}\, \sqrt{\frac{n-1}{\chi^{2}_{n-1;\alpha}}}\, \sqrt{1+\frac{n-3-\chi^{2}_{n-1;\alpha}}{2(n+1)^{2}}},$$
and \(\chi ^{2}_{d;p}\) is the p-quantile of the chi-square distribution with d degrees of freedom. Note that the two-sided tolerance interval with tolerance factor k_{N,2,α,ρ} is an approximation. For other ways to approximate the tolerance factor, one can refer to Section 2.3 of Krishnamoorthy and Mathew (2009).
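A sketch of Eqs. (6) and (7) in Python, assuming SciPy is available for the standard normal, noncentral t, and chi-square quantiles (scipy.stats.norm.ppf, scipy.stats.nct.ppf, scipy.stats.chi2.ppf). All subscripts are read as lower-tail quantiles, so that z_{(1+ρ)/2} > 0 and χ²_{n−1;α} is the lower α-quantile, which is the reading under which the factors come out sensibly. The function names are hypothetical.

```python
import math
import statistics

from scipy import stats


def k_normal_one_sided(n, alpha, rho):
    """Exact one-sided factor k_{N,1}: t*_{n-1;1-alpha}(sqrt(n) z_rho) / sqrt(n)."""
    ncp = math.sqrt(n) * stats.norm.ppf(rho)
    return stats.nct.ppf(1 - alpha, n - 1, ncp) / math.sqrt(n)


def k_normal_two_sided(n, alpha, rho):
    """Approximate two-sided factor k_{N,2}: Howe's factor times the correction term."""
    z = stats.norm.ppf((1 + rho) / 2)
    chi2 = stats.chi2.ppf(alpha, n - 1)  # lower alpha-quantile of chi-square(n - 1)
    k_howe = z * math.sqrt(1 + 1 / n) * math.sqrt((n - 1) / chi2)
    w = math.sqrt(1 + (n - 3 - chi2) / (2 * (n + 1) ** 2))
    return k_howe * w


def normal_tolerance(x, alpha, rho, two_sided=True):
    """(lower, upper) limits from Eq. (7), or the two one-sided limits from Eq. (6)."""
    n = len(x)
    mu, sd = statistics.fmean(x), statistics.stdev(x)  # sample mean and SD
    if two_sided:
        k = k_normal_two_sided(n, alpha, rho)
    else:
        k = k_normal_one_sided(n, alpha, rho)  # each limit is used on its own
    return mu - k * sd, mu + k * sd
```

As a sanity check, k_normal_one_sided(10, 0.05, 0.95) is about 2.911, in line with published tables of exact one-sided normal tolerance factors.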

Cauchy distribution: The PDF and CDF of a Cauchy distribution with location parameter μ_{C} and scale parameter σ_{C} are, respectively,
$$f_{C}(x;\mu_{C},\sigma_{C}) = \frac{1}{\pi\sigma_{C}} \left[\frac{\sigma_{C}^{2}}{(x-\mu_{C})^{2}+\sigma_{C}^{2}}\right] \tag{8}$$
$$\text{and } F_{C}(x; \mu_{C},\sigma_{C}) = \int_{-\infty}^{x} \frac{1}{\pi\sigma_{C}} \left[\frac{\sigma_{C}^{2}}{(t-\mu_{C})^{2}+\sigma_{C}^{2}}\right] \mathrm{d}t, \tag{9}$$
where −∞<x<∞, −∞<μ_{C}<∞ and σ_{C}>0.
Based on a random sample of size n, X_{1},X_{2},…,X_{n}, from the Cauchy distribution with PDF and CDF in Eqs. (8) and (9), respectively, suppose \({\hat \mu }_{C}\) and \({\hat \sigma }_{C}\) are the maximum likelihood estimates of μ_{C} and σ_{C}, respectively. Then the one-sided [100(1−α)%]/[100ρ%] tolerance intervals with a lower and an upper tolerance limit are (L_{C,1},∞) and (−∞,U_{C,1}), respectively, with
$$L_{C,1} = {\hat \mu}_{C} - k_{C,\alpha,\rho} {\hat \sigma}_{C} \quad \text{and} \quad U_{C,1} = {\hat \mu}_{C} + k_{C,\alpha,\rho} {\hat \sigma}_{C},$$
where k_{C,α,ρ} is defined as
$$k_{C,\alpha,\rho} = \frac{z_{1-\alpha}}{\sqrt{n}}\sqrt{2+2\left[F^{-1}_{C}(1 - \rho;\mu_{C} = 0, \sigma_{C} = 1)\right]^{2}} - F^{-1}_{C}(1 - \rho; \mu_{C} = 0, \sigma_{C} = 1),$$
with \(F^{-1}_{C}(p; \mu_{C} = 0, \sigma_{C} = 1) = \tan[\pi(p - 1/2)]\), p ∈ (0, 1), the quantile function of the standard Cauchy distribution. An approximate two-sided [100(1−α)%]/[100ρ%] tolerance interval, (L_{C,2},U_{C,2}), is given by
$$L_{C,2} = {\hat \mu}_{C} - k_{C,\alpha/2,(\rho+1)/2} {\hat \sigma}_{C} \quad \text{and} \quad U_{C,2} = {\hat \mu}_{C} + k_{C,\alpha/2,(\rho+1)/2} {\hat \sigma}_{C}.$$
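The Cauchy factor needs only a normal quantile and the Cauchy quantile, so a standard-library sketch suffices. It assumes the standard Cauchy quantile tan[π(p − 1/2)] and, for the two-sided interval, the substitution α → α/2 and ρ → (ρ+1)/2 used for the other symmetric distributions. The ML estimates are taken as inputs, since computing them requires numerical optimization (for example, scipy.stats.cauchy.fit); the function names are hypothetical.

```python
import math
from statistics import NormalDist


def cauchy_quantile(p):
    """Standard Cauchy quantile F_C^{-1}(p; 0, 1) = tan(pi (p - 1/2))."""
    return math.tan(math.pi * (p - 0.5))


def k_cauchy(n, alpha, rho):
    """One-sided factor k_{C,alpha,rho} from the large-sample formula."""
    z = NormalDist().inv_cdf(1 - alpha)
    q = cauchy_quantile(1 - rho)  # negative for rho > 1/2
    return z / math.sqrt(n) * math.sqrt(2 + 2 * q * q) - q


def cauchy_tolerance(mu_hat, sigma_hat, n, alpha, rho, two_sided=True):
    """Limits mu_hat -/+ k sigma_hat; two-sided uses alpha/2 and (rho + 1)/2."""
    if two_sided:
        k = k_cauchy(n, alpha / 2, (rho + 1) / 2)
    else:
        k = k_cauchy(n, alpha, rho)  # each limit is a one-sided bound
    return mu_hat - k * sigma_hat, mu_hat + k * sigma_hat
```

For instance, k_cauchy(100, 0.05, 0.9) is about 3.83, noticeably larger than the known-parameter quantile tan(0.4π) ≈ 3.08; the difference reflects the estimation error in μ̂_C and σ̂_C.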
Logistic distribution: The PDF and CDF of a logistic distribution with location parameter μ_{L} and scale parameter σ_{L} are, respectively,
$$f_{L}(x; \mu_{L}, \sigma_{L}) = \frac{\exp \left[ -\frac{x-\mu_{L}}{\sigma_{L}}\right]}{\sigma_{L}\left[1+\exp \left( -\frac{x-\mu_{L}}{\sigma_{L}}\right) \right]^{2}}, \tag{10}$$
$$\text{and } F_{L}(x;\mu_{L}, \sigma_{L}) = \frac{1}{1+ \exp \left( -\frac{x-\mu_{L}}{\sigma_{L}} \right)}, \tag{11}$$
where −∞<x<∞, −∞<μ_{L}<∞ and σ_{L}>0.
Based on a random sample of size n, X_{1},X_{2},…,X_{n}, from the logistic distribution with PDF and CDF in Eqs. (10) and (11), respectively, suppose \({\hat \mu }_{L}\) and \({\hat \sigma }_{L}\) are the maximum likelihood estimates of μ_{L} and σ_{L}, respectively. Then the one-sided [100(1−α)%]/[100ρ%] tolerance intervals with a lower and an upper tolerance limit are (L_{L,1},∞) and (−∞,U_{L,1}), respectively, with
$$L_{L,1} = {\hat \mu}_{L} - k_{L,1, \alpha, \rho} {\hat \sigma}_{L} \quad \text{and} \quad U_{L,1} = {\hat \mu}_{L} + k_{L,2, \alpha, \rho} {\hat \sigma}_{L},$$
where k_{L,1,α,ρ} and k_{L,2,α,ρ} can be obtained as
$$\begin{aligned} k_{L,1, \alpha, \rho} & \approx \frac{t_{1, \alpha, \rho} + \sqrt{t^{2}_{1, \alpha, \rho} - u_{\alpha, \rho}v_{\alpha}}}{v_{\alpha}}, \\ k_{L,2, \alpha, \rho} & \approx \frac{t_{2, \alpha, \rho} + \sqrt{t^{2}_{2,\alpha, \rho} - u_{\alpha, \rho} v_{\alpha}}}{v_{\alpha}}, \end{aligned}$$
and
$$\begin{aligned} t_{1, \alpha, \rho} & = F^{-1}_{L}(\rho;\mu_{L} = 0, \sigma_{L} = 1) - \hat\sigma_{12}\, z^{2}_{1 - \alpha}, \\ t_{2, \alpha, \rho} & = F^{-1}_{L}(\rho;\mu_{L} = 0, \sigma_{L} = 1) + \hat\sigma_{12}\, z^{2}_{1 - \alpha}, \\ u_{\alpha, \rho} & = \left[F^{-1}_{L}(\rho;\mu_{L} = 0, \sigma_{L} = 1)\right]^{2} - \hat{\sigma}^{2}_{1} z^{2}_{1 - \alpha}, \\ v_{\alpha} & = 1 - \hat\sigma^{2}_{2} z^{2}_{1 - \alpha}, \end{aligned}$$
where \(F^{-1}_{L}(p;\mu_{L} = 0, \sigma_{L} = 1) = \ln[p/(1-p)]\), p ∈ (0, 1), \(\hat{\sigma}^{2}_{1}\) and \(\hat{\sigma}^{2}_{2}\) are the variances of \({\hat \mu}_{L}\) and \({\hat \sigma}_{L}\), respectively, and \(\hat \sigma_{12}\) is the covariance of \({\hat \mu}_{L}\) and \({\hat \sigma}_{L}\).
An approximate two-sided [100(1−α)%]/[100ρ%] tolerance interval, (L_{L,2},U_{L,2}), is given by
$$L_{L,2} = {\hat \mu}_{L} - k_{L,1,\alpha/2,(\rho+1)/2} {\hat \sigma}_{L} \quad \text{and} \quad U_{L,2} = {\hat \mu}_{L} + k_{L,2,\alpha/2,(\rho+1)/2} {\hat \sigma}_{L}.$$
Note that tolerance intervals under the logistic distribution cannot be calculated if \(t^{2}_{1, \alpha, \rho } - u_{\alpha, \rho } v_{\alpha } < 0\) or \(t^{2}_{2, \alpha, \rho } - u_{\alpha, \rho } v_{\alpha } < 0\).
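A standard-library sketch of k_{L,1,α,ρ} and k_{L,2,α,ρ} follows. It assumes the variances and covariance of (μ̂_L, σ̂_L) are supplied in units of σ_L², so that they combine with the unitless quantile ln[ρ/(1−ρ)]. The hypothetical helper logistic_asymptotic_cov fills them from the large-sample Fisher information of the logistic model, var(μ̂_L) ≈ 3σ_L²/n, var(σ̂_L) ≈ 9σ_L²/[(π²+3)n] and zero covariance; an estimated covariance matrix from the actual fit could be passed instead. The extra guard on v ≤ 0 is an added safeguard, not part of the formulas above.

```python
import math
from statistics import NormalDist


def k_logistic(alpha, rho, var_mu, var_sigma, cov):
    """Return (k_{L,1}, k_{L,2}), or None when the factors cannot be computed.

    var_mu, var_sigma and cov describe (mu_hat, sigma_hat) in units of sigma^2.
    """
    z2 = NormalDist().inv_cdf(1 - alpha) ** 2
    q = math.log(rho / (1 - rho))  # F_L^{-1}(rho; 0, 1)
    t1 = q - cov * z2
    t2 = q + cov * z2
    u = q * q - var_mu * z2
    v = 1 - var_sigma * z2
    d1, d2 = t1 * t1 - u * v, t2 * t2 - u * v
    if d1 < 0 or d2 < 0 or v <= 0:
        return None  # negative discriminant (or v <= 0): no interval
    return (t1 + math.sqrt(d1)) / v, (t2 + math.sqrt(d2)) / v


def logistic_asymptotic_cov(n):
    """Large-sample (var_mu, var_sigma, cov) for the logistic MLEs, in sigma^2 units."""
    return 3 / n, 9 / ((math.pi ** 2 + 3) * n), 0.0
```

For a two-sided interval, the same function is called with α/2 and (ρ+1)/2. With zero covariance the two factors coincide, as expected for a symmetric model.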

Laplace distribution: The PDF and CDF of a Laplace distribution with location parameter μ_{P} and scale parameter σ_{P} are, respectively,
$$f_{P}(x;\mu_{P},\sigma_{P}) = \frac{1}{2 \sigma_{P}}\exp\left(-\frac{|x - \mu_{P}|}{\sigma_{P}}\right), \tag{12}$$
$$\text{and } F_{P}(x; \mu_{P}, \sigma_{P}) = \begin{cases} \frac{1}{2} \exp\left(\frac{x - \mu_{P}}{\sigma_{P}} \right) & \text{if } x \leq \mu_{P}, \\ 1 - \frac{1}{2} \exp\left(-\frac{x - \mu_{P}}{\sigma_{P}} \right) & \text{if } x > \mu_{P}, \end{cases} \tag{13}$$
where −∞<x<∞, −∞<μ_{P}<∞ and σ_{P}>0.
Based on a random sample of size n, X_{1},X_{2},…,X_{n}, from the Laplace distribution with PDF and CDF in Eqs. (12) and (13), respectively, suppose \({\hat \mu }_{P}\) and \({\hat \sigma }_{P}\) are the maximum likelihood estimates of μ_{P} and σ_{P}, respectively. Then the one-sided [100(1−α)%]/[100ρ%] tolerance intervals with a lower and an upper tolerance limit are (L_{P,1},∞) and (−∞,U_{P,1}), respectively, with
$$L_{P,1} = {\hat \mu}_{P} - k_{P, \alpha, \rho} {\hat \sigma}_{P} \quad \text{and} \quad U_{P,1} = {\hat \mu}_{P} + k_{P, \alpha, \rho} {\hat \sigma}_{P},$$
where
$$k_{P, \alpha, \rho} \approx \frac{-n \ln[2(1 - \rho)] + z_{1-\alpha}\sqrt{n\left(1+ \{\ln[2(1 - \rho)]\}^{2}\right) - z_{1-\alpha}^{2}}}{n - z_{1-\alpha}^{2}}.$$
An approximate two-sided [100(1−α)%]/[100ρ%] tolerance interval, (L_{P,2},U_{P,2}), is given by
$$L_{P,2} = {\hat \mu}_{P} - k_{P,\alpha/2,(\rho+1)/2} {\hat \sigma}_{P} \quad \text{and} \quad U_{P,2} = {\hat \mu}_{P} + k_{P,\alpha/2,(\rho+1)/2} {\hat \sigma}_{P}.$$
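The sketch below codes the Laplace factor as k = [−n ln{2(1−ρ)} + z_{1−α}√(n(1+[ln{2(1−ρ)}]²) − z²_{1−α})]/(n − z²_{1−α}). This is our reading of the formula: it is what the logistic-style quadratic gives with var(μ̂_P) = var(σ̂_P) = σ_P²/n and zero covariance, and it tends to the true quantile −ln[2(1−ρ)] as n grows; treat it as an assumption. The Laplace ML estimates are the sample median and the mean absolute deviation about it. Function names are hypothetical.

```python
import math
import statistics
from statistics import NormalDist


def k_laplace(n, alpha, rho):
    """One-sided factor k_{P,alpha,rho}; requires n > z_{1-alpha}^2 and rho > 1/2."""
    z = NormalDist().inv_cdf(1 - alpha)
    q = -math.log(2 * (1 - rho))  # F_P^{-1}(rho; 0, 1)
    return (n * q + z * math.sqrt(n * (1 + q * q) - z * z)) / (n - z * z)


def laplace_tolerance(x, alpha, rho, two_sided=True):
    """Limits from the Laplace MLEs (sample median, mean absolute deviation)."""
    mu = statistics.median(x)
    sigma = statistics.fmean(abs(v - mu) for v in x)
    n = len(x)
    if two_sided:
        k = k_laplace(n, alpha / 2, (rho + 1) / 2)
    else:
        k = k_laplace(n, alpha, rho)  # each limit is a one-sided bound
    return mu - k * sigma, mu + k * sigma
```

The factor decreases towards −ln[2(1−ρ)] ≈ 1.609 (for ρ = 0.9) as n increases, for example about 2.65 at n = 20 and 1.97 at n = 100.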
Gamma distribution: The PDF and CDF of the gamma distribution with parameters θ_{G} and β_{G} are, respectively,
$$f_{G}(x;\theta_{G},\beta_{G}) = \frac{x^{\theta_{G}-1}\exp\left(-x/\beta_{G}\right)}{\beta_{G}^{\theta_{G}}\Gamma(\theta_{G})} \tag{14}$$
and
$$F_{G}(x;\theta_{G},\beta_{G}) = \int_{0}^{x} \frac{t^{\theta_{G}-1}\exp\left(-t/\beta_{G}\right)}{\beta_{G}^{\theta_{G}}\Gamma(\theta_{G})} \,\mathrm{d}t, \tag{15}$$
where x>0, θ_{G}>0 is the shape parameter, β_{G}>0 is the scale parameter, and \(\Gamma (a) = \int _{0}^{\infty } t^{a - 1} e^{-t}\, \mathrm{d}t\) is the gamma function.
For the gamma distribution, tolerance intervals can be obtained from normal tolerance intervals through a transformation of the random variable (Krishnamoorthy et al. 2008). Suppose X is a gamma random variable with PDF and CDF in Eqs. (14) and (15); then X^{1/3} can be approximated by a normal distribution with mean μ_{N} and variance \(\sigma ^{2}_{N}\) defined as
$$\mu_{N} = \frac{\beta_{G}^{1/3}\Gamma(\theta_{G}+1/3)}{\Gamma(\theta_{G})} \quad \text{and} \quad \sigma^{2}_{N} = \frac{\beta^{2/3}_{G}\Gamma(\theta_{G}+2/3)}{\Gamma(\theta_{G})} - \mu_{N}^{2}. \tag{16}$$
Based on a random sample X_{1},X_{2},…,X_{n} from the gamma distribution, we first obtain the maximum likelihood estimates of the parameters θ_{G} and β_{G}, denoted by \({\hat \theta }_{G}\) and \({\hat \beta }_{G}\), respectively. Then, we replace θ_{G} and β_{G} with \({\hat \theta }_{G}\) and \({\hat \beta }_{G}\) in Eq. (16) to obtain \({\hat \mu }_{N}\) and \({\hat \sigma }_{N}^{2}\). After that, the one-sided and two-sided tolerance limits for the normal distribution (the lower and upper limits are denoted by L_{N} and U_{N}, respectively) can be obtained from Eqs. (6) and (7), respectively, based on \({\hat \mu }_{N}\) and \({\hat \sigma }_{N}\). The lower and upper [100(1−α)%]/[100ρ%] tolerance limits based on the gamma distribution can then be obtained as
$$\begin{array}{@{}rcl@{}} L_{G} = L^{3}_{N} {\text{ and }} U_{G} = U^{3}_{N}. \end{array} $$ 
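The gamma recipe chains three steps: ML fitting, Eq. (16), and the normal factors from Eqs. (6) and (7), followed by cubing. A sketch assuming SciPy, where scipy.stats.gamma.fit(x, floc=0) returns the shape and scale ML estimates with the location fixed at zero; the normal-factor helper repeats the formulas given earlier so that the block runs on its own, and all names are hypothetical.

```python
import math

from scipy import stats


def k_normal(n, alpha, rho, two_sided):
    """Normal tolerance factor: exact one-sided (Eq. 6) or approximate two-sided (Eq. 7)."""
    if not two_sided:
        ncp = math.sqrt(n) * stats.norm.ppf(rho)
        return stats.nct.ppf(1 - alpha, n - 1, ncp) / math.sqrt(n)
    z = stats.norm.ppf((1 + rho) / 2)
    chi2 = stats.chi2.ppf(alpha, n - 1)  # lower alpha-quantile
    w = math.sqrt(1 + (n - 3 - chi2) / (2 * (n + 1) ** 2))
    return z * math.sqrt((1 + 1 / n) * (n - 1) / chi2) * w


def gamma_tolerance(x, alpha, rho, two_sided=True):
    """Gamma tolerance limits through the cube-root normal approximation."""
    shape, _, scale = stats.gamma.fit(x, floc=0)  # ML estimates, location fixed at 0
    # Eq. (16) at the ML estimates; log-gamma differences avoid overflow for large shape
    mu_n = scale ** (1 / 3) * math.exp(math.lgamma(shape + 1 / 3) - math.lgamma(shape))
    var_n = (scale ** (2 / 3)
             * math.exp(math.lgamma(shape + 2 / 3) - math.lgamma(shape)) - mu_n ** 2)
    sd_n = math.sqrt(var_n)
    k = k_normal(len(x), alpha, rho, two_sided)
    lower_n, upper_n = mu_n - k * sd_n, mu_n + k * sd_n
    return lower_n ** 3, upper_n ** 3  # back-transform: L_G = L_N^3, U_G = U_N^3
```

With two_sided=False, each returned value is a separate one-sided limit.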
Weibull distribution: The PDF and CDF of the Weibull distribution with parameters β_{W} and θ_{W} are, respectively,
$$f_{W}(x;\beta_{W},\theta_{W}) = \frac{\theta_{W}}{\beta_{W}} \left(\frac{x}{\beta_{W}} \right)^{\theta_{W} - 1} \exp\left[ -\left(\frac{x}{\beta_{W}}\right)^{\theta_{W}}\right] \tag{17}$$
and
$$F_{W}(x;\beta_{W},\theta_{W}) = 1 - \exp\left[ -\left(\frac{x}{\beta_{W}}\right)^{\theta_{W}}\right], \tag{18}$$
where x>0, θ_{W}>0 is the shape parameter and β_{W}>0 is the scale parameter.
Based on a random sample X_{1},X_{2},…,X_{n} from the Weibull distribution, we first obtain the maximum likelihood estimates of the parameters θ_{W} and β_{W}, denoted by \({\hat \theta }_{W}\) and \({\hat \beta }_{W}\), respectively. Then, the lower and upper one-sided [100(1−α)%]/[100ρ%] tolerance limits can be obtained as
$$\begin{aligned} L_{W} & = \exp\left[\ln(\hat{\theta}_{W}) - \frac{\hat{\beta}^{-1}_{W}\, t^{*}_{n-1;\alpha}\left(\sqrt{n}\, \lambda_{\rho}\right)}{\sqrt{n-1}} \right], \\ U_{W} & = \exp\left[\ln(\hat{\theta}_{W}) - \frac{\hat{\beta}^{-1}_{W}\, t^{*}_{n-1;1-\alpha}\left(\sqrt{n}\, \lambda_{1-\rho}\right)}{\sqrt{n-1}} \right], \end{aligned}$$
where λ_{ρ} = ln(−ln(ρ)). A two-sided tolerance interval based on the Weibull distribution can be obtained by replacing α with α/2 and ρ with (ρ+1)/2 in the above formulas for L_{W} and U_{W}.

Lognormal distribution: The PDF and CDF of the lognormal distribution with parameters μ_{LN} and σ_{LN} are, respectively,
$$f_{LN}\left(x;\mu_{LN},\sigma_{LN}\right) = \frac{1}{x\sigma_{LN} \sqrt{2\pi}}\exp\left[-\frac{\left(\ln x - \mu_{LN}\right)^{2}}{2\sigma^{2}_{LN}}\right], \tag{19}$$
and
$$F_{LN}(x;\mu_{LN},\sigma_{LN}) = \int_{0}^{x} \frac{1}{t\sigma_{LN} \sqrt{2\pi}}\exp\left[-\frac{(\ln t - \mu_{LN})^{2}}{2\sigma^{2}_{LN}}\right] \mathrm{d}t, \tag{20}$$
where x>0, σ_{LN}>0 is the shape parameter (the standard deviation of ln X), and μ_{LN}∈(−∞,∞) is the mean of ln X, so that exp(μ_{LN}) is the scale parameter and the median of the distribution.
Based on a random sample X_{1},X_{2},…,X_{n} from the lognormal distribution, we can obtain the maximum likelihood estimates of the parameters μ_{LN} and σ_{LN}, denoted by \({\hat \mu }_{LN}\) and \({\hat \sigma }_{LN}\), respectively. Because Y= lnX follows a normal distribution when X follows a lognormal distribution, the one-sided and two-sided normal tolerance limits for Y (the lower and upper limits are denoted by L_{N} and U_{N}, respectively) can be obtained from Eqs. (6) and (7), respectively, based on \({\hat \mu }_{LN}\) and \({\hat \sigma }_{LN}\). The lower and upper [100(1−α)%]/[100ρ%] tolerance limits based on the lognormal distribution are then
$$\begin{array}{@{}rcl@{}} L_{LN} = \exp(L_{N}) {\text{ and }} U_{LN} = \exp(U_{N}). \end{array} $$
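The lognormal limits reduce to normal limits on the log scale followed by exponentiation. The sketch assumes SciPy for the normal factors (repeated from the formulas above so the block is self-contained) and, following the text, uses the ML estimate of σ_{LN} (divisor n, statistics.pstdev); using statistics.stdev instead would match the sample standard deviation used in Eqs. (6) and (7). Function names are hypothetical.

```python
import math
import statistics

from scipy import stats


def k_normal(n, alpha, rho, two_sided):
    """Normal tolerance factor: exact one-sided (Eq. 6) or approximate two-sided (Eq. 7)."""
    if not two_sided:
        ncp = math.sqrt(n) * stats.norm.ppf(rho)
        return stats.nct.ppf(1 - alpha, n - 1, ncp) / math.sqrt(n)
    z = stats.norm.ppf((1 + rho) / 2)
    chi2 = stats.chi2.ppf(alpha, n - 1)  # lower alpha-quantile
    w = math.sqrt(1 + (n - 3 - chi2) / (2 * (n + 1) ** 2))
    return z * math.sqrt((1 + 1 / n) * (n - 1) / chi2) * w


def lognormal_tolerance(x, alpha, rho, two_sided=True):
    """Normal tolerance limits for Y = ln X, exponentiated back to the X scale."""
    y = [math.log(v) for v in x]
    mu = statistics.fmean(y)
    sigma = statistics.pstdev(y)  # ML estimate (divisor n), following the text
    k = k_normal(len(y), alpha, rho, two_sided)
    return math.exp(mu - k * sigma), math.exp(mu + k * sigma)
```

Because exp is monotone, coverage statements for Y carry over unchanged to X.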
Statistical software packages for tolerance intervals
3.1 Available statistical software packages
Several statistical software packages can compute tolerance intervals. In this subsection, we discuss several commonly used packages, including JMP (JMP Version 16, 2021), Minitab (Minitab 18 Statistical Software, 2017), NCSS (NCSS 2021 Statistical Software, 2021), Python (Python Core Team, 2015), R (R Core Team, 2020), and SAS (SAS Institute Inc, 2014), that provide computational procedures for tolerance intervals based on various distributions.
All six software packages discussed here provide procedures for tolerance intervals under the normal distribution and for nonparametric tolerance intervals. In R (R Core Team, 2020), the package tolerance (Young 2010; 2014) provides procedures for tolerance intervals under more than 20 different distributions. Minitab (Minitab 18 Statistical Software 2017) provides tolerance intervals for 10 different distributions under “Quality Tools”. In Python (Python Core Team 2015), the toleranceinterval package provides nonparametric tolerance intervals and parametric tolerance intervals for the normal and lognormal distributions. In SAS (SAS Institute Inc 2014), the procedure PROC CAPABILITY provides normal and nonparametric tolerance intervals. The statistical distributions and procedures available in JMP, Minitab, NCSS, Python, and R are summarized in Table 1.
3.2 Comparisons of different software packages
Among the six statistical software packages considered here, Python, SAS, JMP, and NCSS have very limited capabilities for computing tolerance intervals, while the R package tolerance is the most comprehensive. For some distributions, different software packages implement identical methods; for other distributions, they use different formulas. For instance, for one-sided tolerance intervals based on the normal distribution, the formulas used in JMP, Minitab, NCSS, Python, and SAS are equivalent to the R function normtol.int in the tolerance package with the ‘EXACT’ method (i.e., method = "EXACT"). For two-sided tolerance intervals based on the normal distribution, the formulas used in JMP, Minitab and SAS are equivalent to normtol.int with the ‘EXACT’ method, while the formulas used in NCSS and Python are equivalent to normtol.int with the ‘HE’ method (i.e., method = "HE"). For nonparametric tolerance intervals, Minitab, NCSS and SAS use the procedure corresponding to the R function nptol.int in the tolerance package with the ‘WILK’ method (i.e., method = "WILK"), while JMP and Python use the procedure corresponding to nptol.int with the ‘HM’ method (i.e., method = "HM"). For the lognormal distribution, the formula used in Minitab corresponds to normtol.int with the ‘EXACT’ method and log.norm = TRUE (i.e., method = "EXACT", log.norm = TRUE), while Python obtains the tolerance intervals from the normal tolerance intervals through a log-transformation. For the other distributions, however, Minitab and R use different computational formulas to obtain the tolerance intervals.
The corresponding references for the formulas used in different software and the equivalence of the resulting tolerance intervals obtained from different software (grouping in parentheses) are summarized in Table 2.
Effect of model misspecification on tolerance intervals
4.1 Monte Carlo simulation studies
In this section, Monte Carlo simulation studies are used to evaluate the performance of tolerance intervals under different distributions in terms of the empirical confidence levels and population proportions of interest. Specifically, we evaluate the performance of tolerance intervals by assessing the closeness of the empirical average coverage, an estimate of E[C(L,U;θ)], to ρ and the closeness of the empirical probability 1−Pr[C(L,U;θ)≥ρ] to α. We consider both the case in which the assumed distribution is the same as the true underlying distribution and the case in which it differs from the true underlying distribution. In this simulation study, we generate random samples of size n from the distribution F and compute one- and two-sided tolerance intervals based on the distribution G; i.e., F is the true underlying distribution and G is the assumed distribution. Because the true underlying distribution is usually unknown in practice, we compare the coverage of the tolerance intervals to assess their robustness under different distributions.
Here, we consider a simulation study for symmetric distributions (normal, Cauchy, logistic, and Laplace distributions) and a simulation study for skewed distributions (gamma, Weibull, and lognormal distributions). For symmetric distributions, we consider the standard distributions by setting the location parameter to 0 and the scale parameter to 1. For skewed distributions, we use parameter settings based on the parameter estimates from a real data example presented in Section 6.2 (see Table 25). Specifically, the following procedure is used in the Monte Carlo simulation study to evaluate the performance of the tolerance intervals for fixed values of α and ρ:

Generate a random sample of size n, (x_{1},x_{2},…,x_{n}), from the true underlying distribution F;

Compute the tolerance interval using the sample (x_{1},x_{2},…,x_{n}) based on the assumed distribution G. The tolerance interval obtained in the hth simulation is denoted as [L^{(h)},U^{(h)}];

Compute the probability that a random variable following distribution F falls between the lower and upper limits, i.e., C(L^{(h)},U^{(h)})=F(U^{(h)})−F(L^{(h)});

If C(L^{(h)},U^{(h)})≥ρ, set δ^{(h)}=1, otherwise δ^{(h)}=0;

Repeat Steps (i) – (iv) M times to obtain C(L^{(h)},U^{(h)}) and δ^{(h)} for h=1,2,…,M.
The simulation results are based on M=10000, except for the normal tolerance intervals computed in R with the ‘EXACT’ and ‘OCT’ methods, for which M=1000 is used because of the long computation time of these exact procedures. For each setting, the following quantities are computed for comparison purposes:

\(\hat \alpha = 1 - \frac {\sum _{h=1}^{M} \delta ^{(h)}}{M}\);

\(\hat \rho = \frac {\sum _{h=1}^{M} C(L^{(h)},U^{(h)})}{M}\);

\(\hat {s} = \sqrt {\frac {\sum _{h=1}^{M} \left(C(L^{(h)},U^{(h)}) - \hat \rho\right)^{2}}{M - 1}}\).
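The Monte Carlo steps and the summaries α̂, ρ̂ and ŝ above can be sketched in standard-library Python for a one-sided interval (−∞, U], where the coverage step reduces to C = F(U). As an illustrative assumption, the assumed model G is the Laplace distribution with the one-sided factor k = [−n ln{2(1−ρ)} + z_{1−α}√(n(1+[ln{2(1−ρ)}]²) − z²_{1−α})]/(n − z²_{1−α}) and ML estimates (sample median and mean absolute deviation); any other limit function can be passed in as upper_limit_g. All function names are hypothetical.

```python
import math
import random
import statistics
from statistics import NormalDist


def laplace_cdf(x, mu=0.0, sigma=1.0):
    """Laplace CDF, Eq. (13)."""
    if x <= mu:
        return 0.5 * math.exp((x - mu) / sigma)
    return 1.0 - 0.5 * math.exp(-(x - mu) / sigma)


def laplace_sample(rng, n):
    """Standard Laplace draws as the difference of two Exp(1) variables."""
    return [rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(n)]


def normal_sample(rng, n):
    """Standard normal draws."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def laplace_upper_limit(x, alpha, rho):
    """Upper limit U = mu_hat + k sigma_hat under an assumed Laplace model G."""
    z = NormalDist().inv_cdf(1 - alpha)
    q = -math.log(2 * (1 - rho))
    n = len(x)
    k = (n * q + z * math.sqrt(n * (1 + q * q) - z * z)) / (n - z * z)
    mu = statistics.median(x)
    sigma = statistics.fmean(abs(v - mu) for v in x)
    return mu + k * sigma


def simulate(sample_f, cdf_f, upper_limit_g, n, alpha, rho, M=10000, seed=2021):
    """Simulation steps for (-inf, U]; returns (alpha_hat, rho_hat, s_hat)."""
    rng = random.Random(seed)
    coverage, hits = [], 0
    for _ in range(M):
        x = sample_f(rng, n)              # generate a sample from the true F
        u = upper_limit_g(x, alpha, rho)  # tolerance limit under the assumed G
        c = cdf_f(u)                      # C = F(U) - F(-inf) = F(U)
        coverage.append(c)
        hits += 1 if c >= rho else 0      # delta = 1 if C >= rho
    alpha_hat = 1 - hits / M
    rho_hat = statistics.fmean(coverage)
    s_hat = statistics.stdev(coverage)    # divisor M - 1, as in s_hat
    return alpha_hat, rho_hat, s_hat
```

Passing normal_sample and NormalDist().cdf as the true F while keeping laplace_upper_limit gives a misspecified setting (F: normal, G: Laplace). M = 10000 matches the text, but a smaller M runs faster for experimentation.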
We consider n=10, 25, 50 and 100, α=0.01, 0.05, 0.1 and 0.2, and ρ=0.9, 0.95, 0.99 and 0.995 in both simulation studies (symmetric and skewed distributions). If the tolerance intervals perform as expected, \(\hat {\alpha }\) should be close to the corresponding α, with a smaller value of \(\hat {\alpha }\) preferred, and \(\hat \rho \) should be close to the corresponding ρ, with a larger value of \({\hat {\rho }}\) preferred. Moreover, a tolerance interval that gives a smaller value of \(\hat {s}\) is preferred. To make it easier to assess the performance of different tolerance intervals while accounting for Monte Carlo simulation error, the tables of simulation results highlight in bold those values of \(\hat \alpha \) within \(\alpha \pm 2 \sqrt {\alpha (1-\alpha)/M}\) and those values of \(\hat \rho \) within \(\rho \pm 2 \sqrt {\rho (1-\rho)/M}\).
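The bold-face rule is a plain binomial standard-error band around the nominal value. A small sketch (function names hypothetical):

```python
import math


def mc_half_width(p, M):
    """Half-width 2 * sqrt(p (1 - p) / M) of the band used to bold estimates."""
    return 2 * math.sqrt(p * (1 - p) / M)


def within_band(estimate, p, M):
    """True if an estimate (alpha_hat or rho_hat) lies within the band around p."""
    return abs(estimate - p) <= mc_half_width(p, M)
```

With M = 10000, the half-width is about 0.0044 for α = 0.05 and 0.006 for ρ = 0.9, so an estimate such as α̂ = 0.1500 at α = 0.05 lies far outside the band.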
4.2 Simulation results and discussions
The simulation results under different settings when the assumed distribution is the same as the underlying distribution (i.e., F=G) are presented in Tables 3, 4, 5, 6, 7, 8 and 9. When the assumed and true underlying distributions are the same, we would expect \({\hat {\alpha }}\) to be close to α and \(\hat \rho \) to be close to ρ. However, Tables 3, 4, 5, 6, 7, 8 and 9 show that when the sample size n is small, \(\hat \alpha \) can be larger than α even under the correct model assumption. For example, in Table 4, when the underlying distribution is Cauchy with PDF and CDF in Eqs. (8) and (9), α=0.05, ρ=0.9 and n=10, the value of \(\hat \alpha \) is 0.1500. For moderate to large sample sizes (i.e., n=50 and n=100), the values of \(\hat \alpha \) are close to or even smaller than the values of α in most cases. The values of \(\hat \rho \) are always greater than ρ under the correct model assumption, and the standard deviation \(\hat {s}\) decreases as the sample size n increases.
For the sake of space, we present only some representative simulation results for settings in which the assumed distribution differs from the underlying distribution (i.e., F≠G) in Tables 10, 11, 12 and 13; the results for other settings are presented in the Appendix (Tables 26–38). From Tables 10 and 11, we observe that the tolerance intervals computed under the Cauchy and logistic distributions are robust to model misspecification when the true underlying distribution is normal: in these tables, the values of \(\hat {\alpha }\) are less than or equal to α and the values of \(\hat {\rho }\) are larger than ρ. However, the simulation results show that, in general, tolerance intervals are not robust to model misspecification. In Table 12, when the true underlying distribution is Cauchy (F: Cauchy) and the tolerance intervals are computed assuming a normal distribution (G: Normal), the performance of the tolerance intervals is not satisfactory in terms of the closeness of \(\hat \alpha \) and \(\hat \rho \) to α and ρ, respectively. For example, in Table 12, when ρ = 0.99, α = 0.1 and n=50, the value of \(\hat \alpha \) is 0.7921, which is much larger than the desired level α=0.1, and the value of \(\hat \rho \) is 0.9696, which is smaller than the specified proportion ρ=0.99. Similar observations hold for the results presented in Table 13 and in Tables 26–38 in the Appendix, for both symmetric and asymmetric distributions.
Based on the simulation results in this section, when the true distribution differs from the assumed distribution, parametric tolerance intervals can be sensitive to model misspecification, and their performance can be problematic in terms of the coverage proportion ρ and the confidence level 1−α. Hence, it is desirable to develop an appropriate approach for computing tolerance intervals when the true underlying distribution is unknown. One plausible way to address model uncertainty in practice is to use the nonparametric tolerance interval, which does not require a distributional assumption. However, a simulation study of nonparametric tolerance intervals under the four symmetric distributions considered here (results are presented in the Appendix, Tables 39–42) shows that they do not perform as well as parametric tolerance intervals computed under the correct distributional assumption (i.e., G=F), and the values of \({\hat {\alpha }}\) and \({\hat {\rho }}\) can be far from the prespecified values. For these reasons, we propose a model selection approach for the case in which potential candidate distributions are under consideration.
Proposed model selection approach
5.1 Model selection based on maximum likelihood
In this section, we propose a simple model selection approach based on maximum likelihood for constructing tolerance intervals under model uncertainty, in order to reduce the negative effect of model misspecification. We compute the maximized likelihood under each candidate distribution and choose the distribution with the largest value; in other words, when the true distribution is unknown, we choose the candidate under which the observed data are most likely. We then compute the tolerance interval based on the selected distribution. The proposed model selection approach is summarized as follows:

1. Based on the random sample \(x_{1}, x_{2}, \ldots, x_{n}\), compute the maximum log-likelihood for each of the candidate distributions. For example, for the four symmetric distributions considered here, the maximum log-likelihood based on the normal distribution is
$$ L_{N} = \sum_{i=1}^{n} \ln \left\{ \frac{1}{\sigma_{N}\sqrt{2\pi}} \exp\left[ -\frac{(x_{i} - \mu_{N})^{2}}{2\sigma_{N}^{2}} \right] \right\}, \tag{21} $$
the maximum log-likelihood based on the Cauchy distribution is
$$ L_{C} = \sum_{i=1}^{n} \ln \left\{ \frac{1}{\pi\sigma_{C}} \left[ \frac{\sigma_{C}^{2}}{(x_{i} - \mu_{C})^{2} + \sigma_{C}^{2}} \right] \right\}, \tag{22} $$
the maximum log-likelihood based on the logistic distribution is
$$ L_{L} = \sum_{i=1}^{n} \ln \left\{ \frac{\exp\left[ -\frac{x_{i} - \mu_{L}}{\sigma_{L}} \right]}{\sigma_{L}\left( 1 + \exp\left[ -\frac{x_{i} - \mu_{L}}{\sigma_{L}} \right] \right)^{2}} \right\}, \tag{23} $$
and the maximum log-likelihood based on the Laplace distribution is
$$ L_{P} = \sum_{i=1}^{n} \ln \left[ \frac{\exp\left( -\frac{|x_{i} - \mu_{P}|}{\sigma_{P}} \right)}{2\sigma_{P}} \right], \tag{24} $$
where, in each case, the location and scale parameters are replaced by their maximum likelihood estimates. The maximum log-likelihood based on the asymmetric distributions considered here can be computed in a similar manner.

2. Select the distribution that gives the largest value of the maximum log-likelihood as the assumed distribution G, and compute the tolerance interval based on G.
Since the candidate models considered here all have the same number of parameters, we use the values of the maximum likelihood for model selection. When the candidate models have different numbers of parameters, model selection criteria that penalize a model for having more parameters, such as Akaike’s information criterion (AIC) and the Bayesian information criterion (BIC), can be used instead. When all candidates have the same number of parameters, these penalties are identical across models, so AIC and BIC select the same model as the maximum log-likelihood.
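The two-step procedure above can be sketched in Python using only the standard library. This is a minimal illustration under stated assumptions, not the authors' code: each candidate's log-likelihood follows equations (21)–(24), the location and log-scale are maximized numerically with a basic Nelder-Mead simplex search (an assumption made for uniformity; closed-form MLEs exist for the normal and Laplace cases), and the names `loglik`, `nelder_mead`, `max_loglik` and `select_model` are hypothetical. Because all four candidates have two parameters, ranking them by AIC would give the same choice.

```python
import math
import random
import statistics

CANDIDATES = ("normal", "cauchy", "logistic", "laplace")


def loglik(name, x, mu, sigma):
    """Log-likelihood of one location-scale candidate, following (21)-(24)."""
    total = 0.0
    for xi in x:
        z = (xi - mu) / sigma
        if name == "normal":
            total += -math.log(sigma * math.sqrt(2 * math.pi)) - z * z / 2
        elif name == "cauchy":
            total += -math.log(math.pi * sigma) - math.log1p(z * z)
        elif name == "logistic":  # symmetric form avoids overflow in exp
            a = abs(z)
            total += -a - math.log(sigma) - 2 * math.log1p(math.exp(-a))
        else:  # laplace
            total += -abs(z) - math.log(2 * sigma)
    return total


def nelder_mead(f, start, step=0.5, iters=300):
    """Minimise f over R^2 with a basic Nelder-Mead simplex search."""
    simplex = [list(start), [start[0] + step, start[1]], [start[0], start[1] + step]]
    vals = [f(p) for p in simplex]
    for _ in range(iters):
        order = sorted(range(3), key=vals.__getitem__)
        simplex = [simplex[i] for i in order]
        vals = [vals[i] for i in order]
        best, mid, worst = simplex
        c = [(best[i] + mid[i]) / 2 for i in range(2)]       # centroid of best two
        r = [2 * c[i] - worst[i] for i in range(2)]           # reflection
        fr = f(r)
        if fr < vals[0]:
            e = [3 * c[i] - 2 * worst[i] for i in range(2)]   # expansion
            fe = f(e)
            simplex[2], vals[2] = (e, fe) if fe < fr else (r, fr)
        elif fr < vals[1]:
            simplex[2], vals[2] = r, fr
        else:
            k = [(c[i] + worst[i]) / 2 for i in range(2)]     # inside contraction
            fk = f(k)
            if fk < vals[2]:
                simplex[2], vals[2] = k, fk
            else:                                            # shrink towards best
                simplex = [best] + [[(best[i] + p[i]) / 2 for i in range(2)] for p in (mid, worst)]
                vals = [vals[0]] + [f(p) for p in simplex[1:]]
    i = min(range(3), key=vals.__getitem__)
    return simplex[i], vals[i]


def max_loglik(name, x):
    """Maximise one candidate's log-likelihood over (mu, log sigma)."""
    med = statistics.median(x)
    mad = statistics.median([abs(v - med) for v in x])  # robust starting scale
    neg = lambda t: -loglik(name, x, t[0], math.exp(t[1]))
    (mu, log_sigma), val = nelder_mead(neg, (med, math.log(mad)))
    return -val, mu, math.exp(log_sigma)


def select_model(x):
    """Step 1: fit every candidate; step 2: keep the largest maximum log-likelihood."""
    fits = {name: max_loglik(name, x) for name in CANDIDATES}
    best = max(fits, key=lambda name: fits[name][0])
    return best, fits


rng = random.Random(1)
sample = [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(200)]  # heavy-tailed data
best, fits = select_model(sample)
for name, (ll, mu, sigma) in fits.items():
    print(f"{name:9s} logL={ll:10.3f} mu={mu:8.4f} sigma={sigma:8.4f}")
print("selected:", best)
```

The tolerance interval would then be computed under the selected distribution; that step depends on distribution-specific factors and is not shown here.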
5.2 Monte Carlo simulation study
In this subsection, we perform a simulation study as described in Section 4 to compute the values of \({\hat {\alpha }},{\hat {\rho }}\) and \({\hat {s}}\) based on the proposed model selection approach using maximum likelihood. For symmetric distributions, we consider the normal, Cauchy, logistic, and Laplace distributions as the candidate distributions. For skewed distributions, we consider the gamma, Weibull, and lognormal distributions as the candidate distributions.
The simulation results under different settings are presented in Tables 14, 15, 16, 17, 18, 19 and 20. From these tables, we observe that the tolerance intervals computed based on the proposed model selection approach do not perform as well as the tolerance intervals computed when the assumed distribution and the true underlying distribution are the same; however, they perform better than the tolerance intervals computed under model misspecification. For example, for 95%/99% tolerance intervals with sample size n=50 when the true underlying distribution is Cauchy, the values of \({\hat {\alpha }},{\hat {\rho }}\) and \({\hat {s}}\) are 0.0661, 0.9925 and 0.0016, respectively, when the assumed distribution is Cauchy (see Table 4); 0.7823, 0.9708 and 0.0224, respectively, when the assumed distribution is normal (see Table 12); and 0.1227, 0.9894 and 0.0129, respectively, when the proposed model selection approach is used (see Table 15). Hence, the proposed model selection approach can effectively reduce the risk of model misspecification in the computation of tolerance intervals. Moreover, the tolerance intervals based on the proposed model selection approach can perform better than the nonparametric tolerance intervals: in the same setting (95%/99%, n=50, Cauchy), the nonparametric tolerance intervals give values of \({\hat {\alpha }},{\hat {\rho }}\) and \({\hat {s}}\) of 0.3884, 0.9879 and 0.0145, respectively. However, unlike the nonparametric tolerance interval, the proposed model selection approach requires the specification of suitable candidate distributions.
Illustrative examples
In this section, two numerical examples are used to compare the computations of tolerance intervals using different software packages and illustrate the proposed model selection approach.
6.1 Differences in flood levels data
In this example, we consider 33 differences in flood levels between two stations on the Fox River, which flows through Wisconsin. The data were originally gathered by Gumbel and Mustafi (1967) and were also discussed by Bain and Engelhardt (1973) and Puig and Stephens (2000). The dataset is presented in Table 21. We assume that the data come from either a normal distribution or a logistic distribution and compute the corresponding parametric tolerance intervals using JMP, Minitab, NCSS, Python, R, and SAS. For the computation in R, the functions normtol.int, logistol.int and nptol.int in the tolerance package are used with different method options (Young 2010; 2014): normtol.int and logistol.int compute the parametric tolerance intervals based on the normal and logistic distributions, respectively, and nptol.int computes nonparametric tolerance intervals.
The 95%/95% tolerance intervals computed from the data in Table 21 using different software packages are presented in Table 22. For tolerance intervals under the normal distribution, we observe that the resulting intervals from JMP, Minitab, SAS, and R with method = "EXACT" are the same, while the resulting intervals from NCSS, Python, and R with method = "HE" are the same. However, the tolerance intervals computed under the logistic distribution differ between Minitab and R.
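The two groups of matching results correspond to two ways of computing the normal tolerance factor k in \(\bar{x} \pm k s\): an exact factor obtained numerically and Howe's (1969) closed-form approximation, which is what the "HE" option of normtol.int refers to. The Python sketch below implements Howe's approximation. As an additional assumption, it replaces the exact chi-square quantile with the Wilson-Hilferty approximation so that only the standard library is needed; its k can therefore differ slightly from both R's "HE" and "EXACT" results. The function names are illustrative.

```python
import math
import statistics

STD_NORMAL = statistics.NormalDist()


def howe_k2(n, p, alpha):
    """Howe's approximate two-sided normal tolerance factor.

    k = sqrt(nu * (1 + 1/n) * z_{(1+p)/2}^2 / chi2_{nu, alpha}) with nu = n - 1, where
    chi2_{nu, alpha} is the lower alpha quantile (Wilson-Hilferty approximation here).
    """
    nu = n - 1
    h = 2 / (9 * nu)
    chi2_lower = nu * (1 - h + STD_NORMAL.inv_cdf(alpha) * math.sqrt(h)) ** 3
    z_p = STD_NORMAL.inv_cdf((1 + p) / 2)
    return math.sqrt(nu * (1 + 1 / n) * z_p ** 2 / chi2_lower)


def normal_tolerance_interval(x, p=0.95, alpha=0.05):
    """Two-sided 100(1 - alpha)%/100p% normal tolerance interval mean -/+ k * sd."""
    k = howe_k2(len(x), p, alpha)
    m, s = statistics.fmean(x), statistics.stdev(x)
    return m - k * s, m + k * s


print(round(howe_k2(10, 0.95, 0.05), 3))  # about 3.387; the tabulated exact factor is about 3.379
```

The small gap between the approximate and exact factors is the kind of difference that makes the "HE" and "EXACT" groups in Table 22 disagree slightly.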
To illustrate the proposed model selection approach, we consider the normal and logistic distributions as the candidate distributions. The maximum likelihood estimates of the parameters \(\mu_{N}\) and \(\sigma_{N}\) for the normal distribution are 9.3536 and 4.0205, respectively, and the value of the maximum log-likelihood is 92.2417. The maximum likelihood estimates of the parameters \(\mu_{L}\) and \(\sigma_{L}\) for the logistic distribution are 9.4048 and 2.3611, respectively, and the value of the maximum log-likelihood is 93.3586. Based on the values of the maximum log-likelihood, we select the logistic distribution over the normal distribution and hence report the tolerance interval computed based on the logistic distribution for this data set.
6.2 Locomotive controls failure data
To illustrate the computation of tolerance intervals using Minitab, Python, and R for asymmetric distributions (JMP, NCSS, and SAS are not included since they only provide tolerance intervals based on the normal distribution), we consider a lifetime data set for locomotive controls. Nelson (1982) presented the miles to failure of 37 locomotive controls. This data set was also discussed by Krishnamoorthy and Xie (2011) and Yuan et al. (2018). The data set is presented in Table 23.
For illustrative purposes, we consider three commonly used lifetime distributions, the gamma, Weibull, and lognormal distributions, as candidate models for the lifetime of locomotive controls. The 95%/95% tolerance intervals under the gamma, Weibull, and lognormal distributions obtained from Minitab, Python, and R are presented in Table 24. Note that under the lognormal distribution, the tolerance intervals obtained from Minitab and from R with method = "EXACT" are the same. However, under the other distributions, the tolerance intervals obtained from Minitab and R differ.
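A lognormal sample becomes a normal sample after taking logarithms, so a lognormal tolerance interval can be obtained by exponentiating a normal tolerance interval computed from ln x; because exp is monotone, the content and confidence carry over unchanged. Any two programs that apply the same normal factor k on the log scale will therefore return the same lognormal interval, which is consistent with the agreement noted above for R's EXACT method. The Python sketch below takes k as an input (for example, an exact or a Howe-type factor); the function name and the choice of k are assumptions, not any package's interface.

```python
import math
import statistics


def lognormal_tolerance_interval(x, k):
    """Two-sided lognormal tolerance interval built from a normal-theory factor k.

    ln(x) is normal when x is lognormal, and exp() is monotone, so the interval
    exp(mean(ln x) -/+ k * sd(ln x)) inherits the content and confidence of the
    normal tolerance interval for ln(x).
    """
    if any(v <= 0 for v in x):
        raise ValueError("lognormal data must be strictly positive")
    logs = [math.log(v) for v in x]
    m, s = statistics.fmean(logs), statistics.stdev(logs)
    return math.exp(m - k * s), math.exp(m + k * s)


# Toy check: ln x = (0, 1, 2) has mean 1 and standard deviation 1,
# so with k = 2 the interval is (exp(-1), exp(3)).
print(lognormal_tolerance_interval([1.0, math.e, math.e ** 2], k=2.0))
```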
To apply the proposed model selection approach, we compute the maximum likelihood estimates of the model parameters and the corresponding values of the maximum log-likelihood under the gamma, Weibull, and lognormal distributions, and present the results in Table 25. Since the Weibull distribution gives the largest likelihood among the three candidate models, we select the Weibull distribution and report the tolerance interval based on it. Based on the results from R, the 95%/95% tolerance interval based on the Weibull distribution is (23.884, 171.782), in thousands of miles. That is, we are 95% confident that at least 95% of the locomotive controls will have lifetimes between 23,884 miles and 171,782 miles. If this does not satisfy the requirements of the railroad company, then the reliability of the locomotive controls needs to be improved in the manufacturing process.
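The selection step in this example can be sketched in Python for two of the three candidates. This is a minimal illustration, not the computation behind Table 25: the Weibull shape MLE is found by bisection on its profile score equation, the lognormal MLE is in closed form, and the gamma distribution is left out because its MLE needs the digamma function, which the Python standard library does not provide. Since the locomotive data in Table 23 are not reproduced here, the usage below relies on simulated Weibull data, and all function names are hypothetical.

```python
import math
import random
import statistics


def weibull_max_loglik(x):
    """Weibull MLE (shape k, scale lam) and the maximised log-likelihood."""
    n, top = len(x), max(x)
    y = [v / top for v in x]            # rescale to (0, 1] so y**k cannot overflow
    logs = [math.log(v) for v in y]
    mean_log = statistics.fmean(logs)

    def score(k):
        # Profile score for the shape (scale-invariant); increasing in k, zero at the MLE.
        w = [v ** k for v in y]
        return sum(a * b for a, b in zip(w, logs)) / sum(w) - 1 / k - mean_log

    lo, hi = 1e-3, 1e3
    for _ in range(100):
        mid = math.sqrt(lo * hi)        # bisection on the log scale
        if score(mid) < 0:
            lo = mid
        else:
            hi = mid
    k = math.sqrt(lo * hi)
    lam = top * statistics.fmean(v ** k for v in y) ** (1 / k)
    # At the MLE, sum((x / lam) ** k) equals n, which gives the final "- n" term.
    ll = n * math.log(k / lam) + (k - 1) * sum(math.log(v / lam) for v in x) - n
    return ll, k, lam


def lognormal_max_loglik(x):
    """Closed-form lognormal MLE (mean and population variance of ln x)."""
    logs = [math.log(v) for v in x]
    n, var = len(x), statistics.pvariance(logs)
    ll = -sum(logs) - n / 2 * math.log(2 * math.pi * var) - n / 2
    return ll, statistics.fmean(logs), math.sqrt(var)


def select_lifetime_model(x):
    """Pick the candidate with the largest maximum log-likelihood."""
    fits = {"weibull": weibull_max_loglik(x), "lognormal": lognormal_max_loglik(x)}
    return max(fits, key=lambda name: fits[name][0]), fits


rng = random.Random(3)
data = [rng.weibullvariate(100.0, 2.0) for _ in range(2000)]  # scale 100, shape 2
best, fits = select_lifetime_model(data)
print(best, fits[best])
```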
Concluding remarks
In this paper, we discuss the computation of tolerance intervals in commonly used statistical software packages, including JMP, Minitab, NCSS, Python, R, and SAS. We evaluate the performance of tolerance intervals under model misspecification via Monte Carlo simulation, considering four symmetric distributions (normal, Cauchy, logistic, and Laplace) and three asymmetric distributions (gamma, Weibull, and lognormal). We observe that the performance of parametric tolerance intervals can be sensitive to model misspecification. Therefore, when the true underlying distribution is unknown but some candidate distributions are available, we propose a simple model selection approach and show that it can effectively reduce the negative effect of misspecifying the underlying distribution on the performance of tolerance intervals. The computation of tolerance intervals using different statistical software packages and the proposed model selection approach are illustrated with two numerical examples. In future research, the performance of the tolerance intervals obtained from different software packages could be compared for both complete and incomplete data.
Appendix
Availability of data and materials
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.
Abbreviations
AIC: Akaike’s information criterion
BIC: Bayesian information criterion
CDF: Cumulative distribution function
PDF: Probability density function
References
Bain, L. J., Engelhardt, M.: Interval Estimation for the Two-Parameter Double Exponential Distribution. Technometrics. 15, 875–887 (1973).
Bain, L. J., Engelhardt, M.: Simple Approximate Distributional Results for Confidence and Tolerance Limits for the Weibull Distribution Based on Maximum Likelihood Estimators. Technometrics. 23, 15–20 (1981).
Bain, L., Engelhardt, M.: Statistical Analysis of Reliability and Life Testing Models: Theory and Methods. Second edition. Marcel Dekker, New York (1991).
Balakrishnan, N., (Ed): Handbook of the Logistic Distribution. Marcel Dekker, New York (1992).
Battelle Memorial Institute: Nonparametric Procedure. In: MMPDS-12: Metallic Materials Properties Development and Standardization. Battelle Memorial Institute, Columbus (2017).
Blischke, W. R., Murthy, D. N. P.: Reliability: Modeling, Prediction, and Optimization. Wiley, New York (2000).
Bury, K.: Statistical Distributions in Engineering. Cambridge University Press, UK (1999).
Coles, S.: An Introduction to Statistical Modeling of Extreme Values. Springer-Verlag, London (2001).
David, H. A., Nagaraja, H. N.: Order Statistics. Third edition. Wiley, Hoboken (2003).
Faulkenberry, G. D., Daly, J. C.: Sample size for tolerance limits on a normal distribution. Technometrics. 12, 813–821 (1970).
Fernandez, A. J.: Two-sided tolerance intervals in the exponential case: Corrigenda and generalizations. Comput. Stat. Data Anal. 54, 151–162 (2010).
Guenther, W. C.: Sampling Inspection in Statistical Quality Control. (Griffin’s Statistical Monographs, Number 37). London and High Wycombe, Griffin (2007).
Gumbel, E. J., Mustafi, C. K.: Some Analytical Properties of Bivariate Extremal Distributions. J. Am. Stat. Assoc. 62, 569–588 (1967).
Hahn, G. J., Meeker, W. Q.: Statistical Intervals: A Guide for Practitioners. Wiley, New York (1991).
Hall, I. J.: One-Sided Tolerance Limits for a Logistic Distribution Based on Censored Samples. Biometrics. 31, 873–880 (1975).
Howe, W. G.: Two-Sided Tolerance Limits for Normal Populations: Some Improvements. J. Am. Stat. Assoc. 64, 610–620 (1969).
Hong, L. J., Huang, Z., Lam, H.: Learning-based robust optimization: Procedures and statistical guarantees. arXiv preprint arXiv:1704.04342 (2017).
JMP®. Version 16. SAS Institute Inc., Cary, NC (2021).
Krishnamoorthy, K., Mathew, T.: Statistical Tolerance Regions: Theory, Applications, and Computation. Wiley, Hoboken (2009).
Krishnamoorthy, K., Mathew, T., Mukherjee, S.: Normal-based methods for a gamma distribution: prediction and tolerance intervals and stress-strength reliability. Technometrics. 50, 69–78 (2008).
Krishnamoorthy, K., Xie, F.: Tolerance intervals for symmetric location-scale families based on uncensored or censored samples. J. Stat. Plan. Infer. 141, 1170–1182 (2011).
Lawless, J. F.: Construction of tolerance bounds for the extreme-value and the Weibull distribution. Technometrics. 17, 255–261 (1975).
Meeker, W. Q., Hahn, G. J., Escobar, L. A.: Statistical Intervals: A Guide for Practitioners and Researchers. 2nd ed. Wiley, New York (2017).
Minitab 18 Statistical Software: [Computer software]. State College, PA, Minitab Inc. (2017).
Nelson, W.: Applied Life Data Analysis. Wiley, New York (1982).
NCSS 2021 Statistical Software: NCSS, LLC, Kaysville, Utah, USA (2021). ncss.com/software/ncss.
Puig, P., Stephens, M. A.: Tests of Fit for the Laplace Distribution, with Applications. Technometrics. 42, 417–424 (2000).
Python: Python Core Team. Python: A dynamic, open source programming language. Python Software Foundation (2015). http://www.python.org. Accessed 1 Jan 2021.
R Core Team: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria (2020). https://www.R-project.org/. Accessed 1 Jan 2021.
Robbins, H.: On distributionfree tolerance limits in random sampling. Ann. Math. Stat. 15, 214–216 (1944).
SAS Institute Inc: The SAS system for Windows. Release 9.4. SAS Institute Inc., Cary NC (2014).
Wald, A.: An Extension of Wilks’ Method for Setting Tolerance Limits. Ann. Math. Stat. 14, 45–55 (1943).
Wald, A., Wolfowitz, J.: Tolerance Limits for a Normal Distribution. Ann. Math. Stat. 17, 208–215 (1946).
Weissberg, A., Beatty, G.: Tables of Tolerance Limit Factors for Normal Distributions. Technometrics. 2, 483–500 (1960).
Wilks, S. S.: Determination of Sample Sizes for Setting Tolerance Limits. Ann. Math. Stat. 12, 91–96 (1941).
Wilks, S. S.: Statistical prediction with special reference to the problem of tolerance limits. Ann. Math. Stat. 13, 400–409 (1942).
Young, D. S.: Tolerance: An R Package for Estimating Tolerance Intervals. J. Stat. Softw. Artic. 36(5), 1–39 (2010).
Young, D. S.: Computing Tolerance Intervals and Regions in R. In: Rao, M. B., Rao, C. R. (eds.) Handbook of Statistics, Volume 32: Computational Statistics with R, pp. 309–338, North Holland (2014).
Young, D. S., Mathew, T.: Improved Nonparametric Tolerance Intervals Based on Interpolated and Extrapolated Order Statistics. J. Nonparametric Stat. 26, 415–432 (2014).
Yuan, M., Hong, Y., Escobar, L. A., Meeker, W. Q.: Two-sided tolerance intervals for members of the (log)-location-scale family of distributions. Qual. Technol. Quant. Manag. 15, 374–392 (2018).
Acknowledgements
The authors thank the editor and the anonymous referee for their useful comments and suggestions on an earlier version of this article which resulted in this improved version.
Funding
This work was supported by the Hamilton Undergraduate Research Award from Southern Methodist University. The second author’s work was also supported by a grant from the Simons Foundation (#709773).
Author information
Contributions
The authors carried out this work and drafted the manuscript collaboratively. All authors read and approved the manuscript.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Cho, K.S., Tony Ng, H.K. Tolerance intervals in statistical software and robustness under model misspecification. J Stat Distrib App 8, 10 (2021). https://doi.org/10.1186/s40488-021-00123-2
DOI: https://doi.org/10.1186/s40488-021-00123-2
Keywords
 Cauchy distribution
 Maximum likelihood
 Model selection
 Model uncertainty