Tolerance intervals in statistical software and robustness under model misspecification

Abstract

A tolerance interval is a statistical interval that covers at least 100ρ% of the population of interest with 100(1−α)% confidence, where ρ and α are pre-specified values in (0, 1). In many scientific fields, such as pharmaceutical sciences, manufacturing processes, clinical sciences, and environmental sciences, tolerance intervals are used for statistical inference and quality control. Despite their usefulness, procedures to compute tolerance intervals are not commonly implemented in statistical software packages. This paper provides a comparative study of the computational procedures for tolerance intervals in some commonly used statistical software packages, including JMP, Minitab, NCSS, Python, R, and SAS. We also investigate the effect of misspecifying the underlying probability model on the performance of tolerance intervals. Using a Monte Carlo simulation study, we examine the performance of tolerance intervals both when the assumed distribution is the same as the true underlying distribution and when it is different. We also propose a robust model selection approach to obtain tolerance intervals that are relatively insensitive to model misspecification. We show that the proposed robust model selection approach performs well when the underlying distribution is unknown but candidate distributions are available.

Introduction

There are three types of statistical intervals commonly used in practice: confidence intervals, prediction intervals, and tolerance intervals. A confidence interval provides a range of values that is likely to include an unknown parameter with a specified degree of confidence, 100(1−α)%, based upon a random sample. A prediction interval is an interval within which a single future observation, or multiple future observations, from a population will fall with a specified degree of confidence, 100(1−α)%. A tolerance interval covers at least a specified proportion, ρ (0≤ρ≤1), of the population with a specified degree of confidence, 100(1−α)% with 0≤α≤1 (Hahn and Meeker 1991). It can be interpreted as follows: we are 100(1−α)% confident that at least 100ρ% of the population will be within the interval. This tolerance interval can be denoted as a [100(1−α)%]/[100ρ%] tolerance interval. For example, a quality engineer at a light bulb manufacturer needs to evaluate the life spans of light bulbs. The engineer randomly collects a sample of 100 light bulbs and records the times to failure. The engineer wants to calculate a 95%/99% lower tolerance bound, which is the burn time that at least 99% of all light bulbs exceed with 95% confidence. Suppose the lower tolerance bound based on a normal distribution is 1085.947; then the engineer can claim that at least 99% of all the light bulbs exceed approximately 1086 hours of burn time with 95% confidence (Minitab 18 Statistical Software 2017). Tolerance intervals are of particular interest in setting limits on the process capability for a product manufactured in large quantities (Hahn and Meeker 1991). Therefore, the tolerance interval is widely used in statistical quality control.

Despite the usefulness of tolerance intervals, the computation of tolerance intervals based on different distributional assumptions is not commonly implemented in statistical software packages. We found that only a few commonly used statistical software packages, such as Minitab (Minitab 18 Statistical Software 2017), R (R Core Team 2020) and SAS (SAS Institute Inc 2014), provide computational procedures for tolerance intervals. The objective of this paper is twofold. First, we aim to compare different commonly used statistical software packages that offer computational procedures to compute tolerance intervals. Second, we evaluate the performance of tolerance intervals under model uncertainty and propose a robust model selection approach to compute the tolerance intervals.

The rest of this paper is organized as follows. In Section 2, we provide the notation for tolerance intervals and introduce the computation procedures available in commonly used statistical software packages. In Section 3, we evaluate the performance of tolerance intervals under model misspecification. In Section 4, we propose a model selection approach when the underlying probability model is unknown but some candidate models are available. Finally, in Section 5, some concluding remarks and future research directions are provided.

Tolerance interval and statistical models

2.1 Basics of tolerance intervals

Let X1,X2,…,Xn be a random sample of size n from a probability model with probability density function (PDF) f(x;θ) and cumulative distribution function (CDF) F(x;θ), where θ is the vector of parameters. We denote the observed values of X1,X2,…,Xn as x1,x2,…,xn. In the case that the population mean μ and population standard deviation σ are unknown, these parameters are estimated by using the sample mean and sample standard deviation, \(\bar {x} = \sum _{i=1}^{n} x_{i}/n\) and s = \(\sqrt {\sum _{i=1}^{n}(x_{i} -\bar {x})^{2}/(n-1)}\), respectively. For example, for normally distributed data, a [100(1−α)%]/[100ρ%] tolerance interval has the form

$$\hspace{130pt} \bar{x} \pm\ ks,$$

where k is the tolerance factor, (1−α)∈(0,1) is the confidence level and ρ∈(0,1) is the population proportion of interest. Usually, the exact value of k for given values of α and ρ is not easy to compute (with the one-sided normal setting being an exception); therefore, most tolerance intervals are calculated based on approximation methods (Young 2010). We define

$$\begin{array}{@{}rcl@{}} C(L,U; \boldsymbol{\theta}) = F(U; \boldsymbol{\theta}) - F(L; \boldsymbol{\theta}) = \Pr(L < X < U),~~ U > L, \end{array} $$

as the coverage of a two-sided interval [L,U], where L and U are statistics computed from the sample. Then, a [100(1−α)%]/[100ρ%] two-sided tolerance interval [L,U] satisfies

$$\begin{array}{@{}rcl@{}} \Pr[C(L,U;\theta) \ge \rho] \ge 1 - \alpha, \end{array} $$
(1)

Similarly, a [100(1−α)%]/[100ρ%] upper one-sided tolerance interval [L,∞) satisfies

$$\begin{array}{@{}rcl@{}} \Pr[1- F(L; \theta) \ge \rho] \ge 1 - \alpha, \end{array} $$
(2)

and a [100(1−α)%]/[100ρ%] lower one-sided tolerance interval (−∞,U] satisfies

$$\begin{array}{@{}rcl@{}} \Pr[F(U;\theta) \ge \rho] \ge 1 - \alpha. \end{array} $$
(3)

For specific values of α and ρ, the two-sided, upper one-sided, and lower one-sided tolerance intervals can be obtained by finding the values of L and U that satisfy Eqs. (1), (2), and (3), respectively, for a specified underlying distribution.
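One way to read Eqs. (2) and (3) is as confidence bounds on population quantiles. The short derivation below is an illustrative aside under the added assumption that F is continuous and strictly increasing, so that the quantile \(q_{p} = F^{-1}(p;\theta)\) is well defined (the symbol \(q_{p}\) is introduced here only for this illustration):

$$\begin{array}{@{}rcl@{}} 1 - F(L;\theta) \ge \rho & \Longleftrightarrow & F(L;\theta) \le 1 - \rho \ \Longleftrightarrow \ L \le q_{1-\rho}, \\ F(U;\theta) \ge \rho & \Longleftrightarrow & U \ge q_{\rho}. \end{array} $$

Under this assumption, Eq. (2) states that L is a 100(1−α)% lower confidence bound for the quantile \(q_{1-\rho}\), and Eq. (3) states that U is a 100(1−α)% upper confidence bound for the quantile \(q_{\rho}\).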

To construct the tolerance interval, instead of assuming that the data come from a particular parametric model, one can obtain a nonparametric tolerance interval based on order statistics (see, for example, Section 7.2 of David and Nagaraja (2003)). Specifically, the lower and upper nonparametric [100(1−α)%]/[100ρ%] tolerance limits are

$$\begin{array}{@{}rcl@{}} L = x_{r:n} {\text{ and }} U = x_{s:n}, \end{array} $$

where xj:n is the j-th order statistic of the random sample x1,x2,…,xn and the values of r and s (r<s) are chosen to satisfy Eq. (1) (Hahn and Meeker 1991; David and Nagaraja 2003).
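The choice of r and s can be sketched in a few lines of Python. This is a minimal illustration, not the procedure of any particular software package: it relies on the standard result that, for a continuous distribution, the coverage of \((x_{r:n}, x_{s:n})\) follows a Beta(s−r, n−s+r+1) distribution, so that Pr[C ≥ ρ] equals the probability that a Binomial(n, ρ) variable is at most s−r−1. The function names and the rule of placing the two ranks as symmetrically as possible are assumptions made for this example.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), by direct summation."""
    if k < 0:
        return 0.0
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(min(k, n) + 1))


def nonparametric_two_sided(n, rho, alpha):
    """Return 1-based ranks (r, s) for the interval (x_{r:n}, x_{s:n}), or None.

    For continuous F the coverage is Beta(s - r, n - s + r + 1) distributed, so
    Pr[coverage >= rho] = P(Binomial(n, rho) <= s - r - 1).
    """
    for gap in range(1, n):  # gap = s - r; the smallest feasible gap is used
        if binom_cdf(gap - 1, n, rho) >= 1 - alpha:
            r = (n - gap) // 2 + 1  # illustrative choice: near-symmetric ranks
            return r, r + gap
    return None  # sample too small for the requested (rho, alpha)


if __name__ == "__main__":
    print(nonparametric_two_sided(100, 0.90, 0.05))
    print(nonparametric_two_sided(92, 0.95, 0.05))
```

When the function returns None, even the widest interval \((x_{1:n}, x_{n:n})\) cannot satisfy Eq. (1) for the requested ρ and α, which is the usual minimum-sample-size limitation of nonparametric tolerance intervals.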

2.2 Parametric tolerance intervals for some particular distributions

To illustrate the calculation of the tolerance intervals based on different distributions, we consider four symmetric distributions with location and scale parameters: normal (Gaussian), Cauchy, logistic, and Laplace distributions; and three two-parameter skewed (asymmetric) distributions with shape and scale parameters: gamma, Weibull, and lognormal distributions. The functional form of these seven distributions and the corresponding computational formulas for the tolerance intervals based on these distributions are presented in the following. For more details of the computation of tolerance intervals based on different distributions, one may refer to Young (2014).

  • Normal distribution: The PDF and CDF of a normal distribution with location parameter μN and scale parameter σN are, respectively,

    $$\begin{array}{*{20}l} f_{N}(x;\mu_{N},\sigma_{N}) = & \frac{1}{\sigma_{N} \sqrt{2\pi}}\exp\left[-\frac{(x- \mu_{N})^{2}}{2\sigma_{N}^{2}}\right] \end{array} $$
    (4)
    $$\begin{array}{*{20}l} {\text{and }} F_{N}(x;\mu_{N},\sigma_{N}) = & \int_{- \infty}^{x} \frac{1}{\sigma_{N} \sqrt{2\pi}} \exp\left[-\frac{(t-\mu_{N})^{2}}{2\sigma_{N}^{2}}\right] \mathrm{d}t, \end{array} $$
    (5)

    where −∞<x<∞, −∞<μN<∞ and σN>0.

    Based on a random sample of size n, X1,X2,…,Xn, from the normal distribution with PDF and CDF in Eqs. (4) and (5), respectively, suppose \({\hat \mu }_{N}\) and \({\hat \sigma }_{N}\) are the corresponding sample mean and sample standard deviation. Then the lower and upper one-sided [100(1−α)%]/[100ρ%] tolerance intervals are (−∞,UN,1) and (LN,1,∞) with

    $$\begin{array}{@{}rcl@{}} L_{N,1} & = & {\hat \mu}_{N} - k_{N,1,\alpha,\rho} \hat{\sigma}_{N} {\text{ and }} U_{N,1} = {\hat \mu}_{N} + k_{N,1,\alpha,\rho} {\hat \sigma}_{N}, \end{array} $$
    (6)

    where the tolerance factor kN,1,α,ρ can be obtained as

    $$\hspace{90pt} k_{N,1,\alpha,\rho} = \frac{1}{\sqrt{n}}\mathrm{t}^{*}_{n-1;1-\alpha} (\sqrt{n}z_{\rho}),$$

    where \(t^{*}_{d;p}(\omega)\) is the p-th upper percentile of a non-central Student’s t-distribution with d degrees of freedom and non-centrality parameter ω, and zp is the p-th upper percentile of the standard normal distribution. Note that the one-sided tolerance interval with tolerance factor kN,1,α,ρ is an exact interval.

    A two-sided [ 100(1−α)%]/[100 ρ%] tolerance interval under normal distribution, (LN,2,UN,2), is

    $$\begin{array}{@{}rcl@{}} L_{N,2} & = & {\hat \mu}_{N} - k_{N,2,\alpha,\rho} {\hat \sigma}_{N} {\text{ and }} U_{N,2} = {\hat \mu}_{N} + k_{N,2,\alpha,\rho} {\hat \sigma}_{N}, \end{array} $$
    (7)

    where kN,2,α,ρ can be obtained following Howe (1969) (see also Guenther (2007)) as

    $$\begin{array}{@{}rcl@{}} k_{N,2,\alpha,\rho} & = & \left(z_{\frac{1+\rho}{2}}\sqrt{1+n^{-1}} \right) \sqrt{\frac{n-1}{\chi^{2}_{n-1;\alpha}}} \sqrt{1+\frac{n-3-\chi^{2}_{n-1;\alpha}}{2(n+1)^{2}}}, \end{array} $$

    and \(\chi ^{2}_{d;p}\) is the p-th upper percentile of the chi-square distribution with d degrees of freedom. Note that the two-sided tolerance interval with tolerance factor kN,2,α,ρ is an approximation. For the other ways to approximate the tolerance factor, one can refer to Section 2.3 of Krishnamoorthy and Mathew (2009).

  • Cauchy distribution: The PDF and CDF of a Cauchy distribution with location parameter μC and scale parameter σC are, respectively,

    $$\begin{array}{*{20}l} f_{C}(x;\mu_{C},\sigma_{C}) = & \frac{1}{\pi\sigma_{C}} \left[\frac{\sigma_{C}^{2}}{(x-\mu_{C})^{2}+\sigma_{C}^{2}}\right] \end{array} $$
    (8)
    $$\begin{array}{*{20}l} {\text{and }} F_{C}(x; \mu_{C},\sigma_{C}) = & \int_{- \infty}^{x} \frac{1}{\pi\sigma_{C}} \left[\frac{\sigma_{C}^{2}}{(t-\mu_{C})^{2}+\sigma_{C}^{2}}\right] \mathrm{d}t, \end{array} $$
    (9)

    where −∞<x<∞, −∞<μC<∞ and σC>0.

    Based on a random sample of size n, X1,X2,…,Xn, from the Cauchy distribution with PDF and CDF in Eqs. (8) and (9), respectively, suppose \({\hat \mu }_{C}\) and \({\hat \sigma }_{C}\) are the maximum likelihood estimates of μC and σC, respectively. Then the lower and upper one-sided [100(1−α)%]/[100ρ%] tolerance intervals are (−∞,UC,1) and (LC,1,∞) with

    $$\begin{array}{@{}rcl@{}} L_{C,1} = {\hat \mu}_{C} - k_{C,\alpha,\rho} {\hat \sigma}_{C} {\text{ and }} U_{C,1} = {\hat \mu}_{C} + k_{C,\alpha,\rho} {\hat \sigma}_{C}, \end{array} $$

    where kC,α,ρ is defined as

    $$\begin{array}{@{}rcl@{}} k_{C,\alpha,\rho} & = & \frac{z_{1-\alpha}}{\sqrt{n}}\sqrt{2+2[F^{-1}_{C}(1 - \rho;\mu_{C} = 0, \sigma_{C} = 1)]^{2}} \\ & & - F^{-1}_{C}(1 - \rho; \mu_{C} = 0, \sigma_{C} = 1), \end{array} $$

    with \(F^{-1}_{C}(p; \mu _{C} = 0, \sigma _{C} = 1) = \tan [\pi (p - 1/2)], p \in (0, 1)\), the quantile function of the standard Cauchy distribution. An approximate two-sided [100(1−α)%]/[100ρ%] tolerance interval, (LC,2,UC,2), is given by

    $$\begin{array}{@{}rcl@{}} L_{C,2} & = & {\hat \mu}_{C} - k_{C,\alpha/2,(\rho+1)/2} {\hat \sigma}_{C} {\text{ and }} U_{C,2} = {\hat \mu}_{C} + k_{C,\alpha/2,(\rho+1)/2} {\hat \sigma}_{C}. \end{array} $$
  • Logistic distribution: The PDF and CDF of a logistic distribution with location parameter μL and scale parameter σL are, respectively,

    $$\begin{array}{*{20}l} f_{L}(x; \mu_{L}, \sigma_{L}) = & \frac{\exp \left[- \frac{x-\mu_{L}}{\sigma_{L}}\right]}{\sigma_{L}\left[1+\exp \left(- \frac{x-\mu_{L}}{\sigma_{L}}\right) \right]^{2}}, \end{array} $$
    (10)
    $$\begin{array}{*{20}l} {\text{and }} F_{L}(x;\mu_{L}, \sigma_{L}) = & \frac{1}{1+ \exp \left(- \frac{x-\mu_{L}}{\sigma_{L}} \right)}, \end{array} $$
    (11)

    where −∞<x<∞, −∞<μL<∞ and σL>0.

    Based on a random sample of size n, X1,X2,…,Xn, from the logistic distribution with PDF and CDF in Eqs. (10) and (11), respectively, suppose \({\hat \mu }_{L}\) and \({\hat \sigma }_{L}\) are the maximum likelihood estimates of μL and σL, respectively. Then the lower and upper one-sided [100(1−α)%]/[100ρ%] tolerance intervals are (−∞,UL,1) and (LL,1,∞) with

    $$\begin{array}{@{}rcl@{}} L_{L,1} &=& {\hat \mu}_{L} - k_{L,1, \alpha, \rho} {\hat \sigma}_{L} {\text{ and }} U_{L,1} = {\hat \mu}_{L} + k_{L,2, \alpha, \rho} {\hat \sigma}_{L}, \end{array} $$

    where kL,1,α,ρ and kL,2,α,ρ can be obtained as

    $$\begin{array}{@{}rcl@{}} k_{L,1, \alpha, \rho} & \approx & \frac{t_{1, \alpha, \rho} + \sqrt{t^{2}_{1, \alpha, \rho} - u_{\alpha, \rho}v_{\alpha}}}{v_{\alpha}}, \\ k_{L,2, \alpha, \rho} & \approx & \frac{t_{2, \alpha, \rho} + \sqrt{t^{2}_{2,\alpha, \rho} - u_{\alpha, \rho} v_{\alpha}}}{v_{\alpha}}, \end{array} $$

    and

    $$\begin{array}{@{}rcl@{}} t_{1, \alpha, \rho} & = & {F}^{-1}_{L}(\rho;\mu = 0, \sigma = 1) - \hat\sigma_{12} {z}^{2}_{1 - \alpha}, \\ t_{2, \alpha, \rho} & = & {F}^{-1}_{L}(\rho;\mu = 0, \sigma = 1) + \hat\sigma_{12} {z}^{2}_{1 - \alpha}, \\ u_{\alpha, \rho} & = & [{F}^{-1}_{L}(\rho;\mu = 0, \sigma = 1)]^{2} - \hat{\sigma}^{2}_{1}{z}^{2}_{1 - \alpha}, \\ v_{\alpha} & = & 1 - {\hat\sigma}^{2}_{2}{z}^{2}_{1 - \alpha}, \end{array} $$

    \({F}^{-1}_{L}(p;\mu _{L} = 0, \sigma _{L} = 1) = \ln [p/(1-p)],p \in (0, 1),\hat {\sigma }^{2}_{1}\) and \(\hat {\sigma }^{2}_{2}\) are the variances of \({\hat \mu }_{L}\) and \({\hat \sigma }_{L}\), respectively, and \(\hat \sigma _{12}\) is the covariance of \({\hat \mu }_{L}\) and \({\hat \sigma }_{L}\).

    An approximate two-sided [ 100(1−α)%]/[ 100ρ%] tolerance interval, (LL,2,UL,2), is given by

    $$\begin{array}{@{}rcl@{}} L_{L,2} &=& {\hat \mu}_{L} - k_{L,1,\alpha/2,(\rho+1)/2} {\hat \sigma}_{L} {\text{ and }} U_{L,2} = {\hat \mu}_{L} + k_{L,2,\alpha/2,(\rho+1)/2} {\hat \sigma}_{L}. \end{array} $$

    Note that tolerance intervals under logistic distribution cannot be calculated if \(t^{2}_{1, \alpha, \rho } - u_{\alpha, \rho } v_{\alpha } < 0\) or \(t^{2}_{2, \alpha, \rho } - u_{\alpha, \rho } v_{\alpha } < 0\).

  • Laplace distribution: The PDF and CDF of a Laplace distribution with location parameter μP and scale parameter σP are, respectively,

    $$\begin{array}{*{20}l} f_{P}(x;\mu_{P},\sigma_{P}) = & \frac{\exp\left(-\frac{|x- \mu_{P}|}{\sigma_{P}}\right)}{2 \sigma_{P}}, \end{array} $$
    (12)
    $$\begin{array}{*{20}l} {\text{and }} F_{P}(x; \mu_{P}, \sigma_{P}) = &\left\{\begin{array}{ll} \frac{1}{2} \exp\left(\frac{x - \mu_{P}}{\sigma_{P}} \right) & {\text{if }} x \leq \mu_{P}, \cr 1 - \frac{1}{2} \exp\left(- \frac{x - \mu_{P}}{\sigma_{P}} \right) & {\text{if }} x > \mu_{P}, \cr \end{array}\right. \end{array} $$
    (13)

    where −∞<x<∞, −∞<μP<∞ and σP>0.

    Based on a random sample of size n, X1,X2,…,Xn, from the Laplace distribution with PDF and CDF in Eqs. (12) and (13), respectively, suppose \({\hat \mu }_{P}\) and \({\hat \sigma }_{P}\) are the maximum likelihood estimates of μP and σP, respectively. Then the lower and upper one-sided [100(1−α)%]/[100ρ%] tolerance intervals are (−∞,UP,1) and (LP,1,∞) with

    $$\begin{array}{@{}rcl@{}} L_{P,1} & = & {\hat \mu}_{P} - k_{P, \alpha, \rho} {\hat \sigma}_{P} {\text{ and }} U_{P,1} = {\hat \mu}_{P} + k_{P, \alpha, \rho} {\hat \sigma}_{P}, \end{array} $$

    where

    $$\begin{array}{@{}rcl@{}} k_{P, \alpha, \rho} \approx \frac{- n \ln[2(1 - \rho)] + z_{1-\alpha}\sqrt{n\left(1+ \{\ln[2(1 - \rho)]\}^{2}\right) - z_{1-\alpha}^{2}}}{n - z_{1-\alpha}^{2}}. \end{array} $$

    An approximate two-sided [ 100(1−α)%]/[ 100ρ%] tolerance interval, (LP,2,UP,2), is given by

    $$\begin{array}{@{}rcl@{}} L_{P,2} &=& {\hat \mu}_{P} - k_{P,\alpha/2,(\rho+1)/2} {\hat \sigma}_{P} {\text{ and }} U_{P,2} = {\hat \mu}_{P} + k_{P,\alpha/2,(\rho+1)/2} {\hat \sigma}_{P}. \end{array} $$
  • Gamma distribution: The PDF and CDF of the gamma distribution with parameters θG and βG are, respectively,

    $$\begin{array}{@{}rcl@{}} f_{G}(x;\theta_{G},\beta_{G}) = \frac{x^{\theta_{G}-1}\exp\left(-x/\beta_{G}\right)}{\beta_{G}^{\theta_{G}}\Gamma(\theta_{G})} \end{array} $$
    (14)

    and

    $$\begin{array}{@{}rcl@{}} F_{G}(x;\theta_{G},\beta_{G}) = \int_{0}^{x} \frac{t^{\theta_{G}-1}\exp\left(-t/\beta_{G}\right)}{\beta_{G}^{\theta_{G}}\Gamma(\theta_{G})} \mathrm{d}t, \end{array} $$
    (15)

    where x>0, θG>0 is the shape parameter, βG>0 is the scale parameter, and \(\Gamma (a) = \int _{0}^{\infty } t^{a - 1} e^{-t} dt\) is the gamma function.

    For the gamma distribution, the tolerance intervals can be obtained through the normal tolerance interval by considering a transformation of the random variable (Krishnamoorthy et al. 2008). Suppose X is a gamma random variable with PDF and CDF in Eqs. (14) and (15); then \(X^{1/3}\) can be approximated by a normal distribution with mean μN and variance \(\sigma ^{2}_{N}\) defined as

    $$\begin{array}{@{}rcl@{}} \mu_{N} = \frac{\beta_{G}^{1/3}\Gamma(\theta_{G}+1/3)}{\Gamma(\theta_{G})} {\text{ and }} \sigma^{2}_{N} = \frac{\beta^{2/3}_{G}\Gamma(\theta_{G}+2/3)}{\Gamma(\theta_{G})} - \mu_{N}^{2}. \end{array} $$
    (16)

    Based on a random sample X1,X2,…,Xn from the gamma distribution, we first obtain the maximum likelihood estimates of the parameters θG and βG, denoted as \({\hat \theta }_{G}\) and \({\hat \beta }_{G}\), respectively. Then, we substitute \({\hat \theta }_{G}\) and \({\hat \beta }_{G}\) for θG and βG in Eq. (16) to obtain \({\hat \mu }_{N}\) and \({\hat \sigma }_{N}^{2}\). After that, the one-sided and two-sided tolerance intervals for the normal distribution (the lower and upper limits are denoted as LN and UN, respectively) can be obtained from Eqs. (6) and (7), respectively, based on \({\hat \mu }_{N}\) and \({\hat \sigma }_{N}^{2}\). The lower and upper [100(1−α)%]/[100ρ%] tolerance limits based on the gamma distribution can be obtained as

    $$\begin{array}{@{}rcl@{}} L_{G} = L^{3}_{N} {\text{ and }} U_{G} = U^{3}_{N}. \end{array} $$
  • Weibull distribution: The PDF and CDF of the Weibull distribution with parameters βW and θW are, respectively,

    $$\begin{array}{@{}rcl@{}} f_{W}(x;\beta_{W},\theta_{W}) = \frac{\theta_{W}}{\beta_{W}} \left(\frac{x}{\beta_{W}} \right)^{\theta_{W} -1} \exp\left[- \left(\frac{x}{\beta_{W}}\right)^{\theta_{W}}\right] \end{array} $$
    (17)

    and

    $$\begin{array}{@{}rcl@{}} F_{W}(x;\beta_{W},\theta_{W}) = 1 - \exp\left[- \left(\frac{x}{\beta_{W}}\right)^{\theta_{W}}\right], \end{array} $$
    (18)

    where x>0,θW>0 is the shape parameter and βW>0 is the scale parameter.

    Based on a random sample X1,X2,…,Xn from Weibull distribution, we first obtain the maximum likelihood estimates of the parameters θW and βW, denoted as \({\hat \theta }_{W}\) and \({\hat \beta }_{W}\), respectively. Then, the lower and upper one-sided [100(1−α)%]/[100ρ%] tolerance intervals can be obtained as:

    $$\begin{array}{@{}rcl@{}} L_{W} & = & \exp\left[\ln(\hat{\beta}_{W}) - \frac{\hat{\theta}^{-1}_{W} t^{*}_{n-1;\alpha}\left(-\sqrt{n} \lambda_{\rho}\right)}{\sqrt{n-1}} \right], \\ U_{W} & = & \exp\left[\ln(\hat{\beta}_{W}) - \frac{\hat{\theta}^{-1}_{W} t^{*}_{n-1;1-\alpha}\left(-\sqrt{n} \lambda_{1-\rho}\right)}{\sqrt{n-1}} \right], \end{array} $$

    where λρ= ln(− ln(ρ)). A two-sided tolerance interval based on Weibull distribution can be obtained by replacing α by α/2 and ρ by (ρ+1)/2 in the above formulas for computing LW and UW.

  • Lognormal distribution: The PDF and CDF of the lognormal distribution with parameters μLN and σLN are, respectively,

    $$\begin{array}{@{}rcl@{}} f_{LN}\left(x;\mu_{LN},\sigma_{LN}\right) = \frac{1}{x\sigma_{LN} \sqrt{2\pi}}\exp\left[-\frac{\left(\ln x- \mu_{LN}\right)^{2}}{2\sigma^{2}_{LN}}\right], \end{array} $$
    (19)

    and

    $$\begin{array}{@{}rcl@{}} F_{LN}(x;\mu_{LN},\sigma_{LN}) = \int_{0}^{x} \frac{1}{t\sigma_{LN} \sqrt{2\pi}}\exp\left[-\frac{(\ln t- \mu_{LN})^{2}}{2\sigma^{2}_{LN}}\right] \mathrm{d}t, \end{array} $$
    (20)

    where x>0, σLN>0 is the shape parameter (and is the standard deviation of the log of the distribution), and μLN∈(−∞,∞) is the log-scale parameter (the mean of the log of the distribution, so that exp(μLN) is the median of the distribution).

    Based on a random sample X1,X2,…,Xn from the lognormal distribution, we can obtain the maximum likelihood estimates of the parameters μLN and σLN, denoted as \({\hat \mu }_{LN}\) and \({\hat \sigma }_{LN}\), respectively. Then, the one-sided and two-sided tolerance intervals for the normal distribution (the lower and upper limits are denoted as LN and UN, respectively) can be obtained from Eqs. (6) and (7), respectively, based on \({\hat \mu }_{LN}\) and \({\hat \sigma }_{LN}^{2}\). The tolerance intervals based on the lognormal distribution can be computed using the fact that Y= lnX follows a normal distribution if X follows a lognormal distribution, i.e., the lower and upper [100(1−α)%]/[100ρ%] tolerance limits based on the lognormal distribution can be obtained as

    $$\begin{array}{@{}rcl@{}} L_{LN} = \exp(L_{N}) {\text{ and }} U_{LN} = \exp(U_{N}). \end{array} $$
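As an illustration of how the closed-form factors for the location-scale families above translate into computation, the following Python sketch evaluates the one-sided Cauchy tolerance factor \(k_{C,\alpha,\rho}\) and the limits \({\hat \mu}_{C} \mp k_{C,\alpha,\rho}{\hat \sigma}_{C}\). It is a sketch under stated assumptions rather than the code of any package: the function names are hypothetical, \(z_{1-\alpha}\) is taken to be the (1−α) quantile of the standard normal distribution, the standard Cauchy quantile function is \(\tan[\pi(p - 1/2)]\), and the estimates \({\hat \mu}_{C}\) and \({\hat \sigma}_{C}\) are assumed to have been computed elsewhere (the Cauchy maximum likelihood estimates have no closed form).

```python
from math import pi, sqrt, tan
from statistics import NormalDist


def cauchy_quantile(p):
    """Quantile function of the standard Cauchy distribution."""
    return tan(pi * (p - 0.5))


def cauchy_tolerance_factor(n, alpha, rho):
    """One-sided factor k_{C,alpha,rho} as given in the text.

    Assumption: z_{1-alpha} is the (1 - alpha) quantile of N(0, 1).
    """
    z = NormalDist().inv_cdf(1 - alpha)
    q = cauchy_quantile(1 - rho)  # negative whenever rho > 0.5
    return z / sqrt(n) * sqrt(2 + 2 * q**2) - q


def cauchy_one_sided_limits(mu_hat, sigma_hat, n, alpha, rho):
    """Return (mu_hat - k * sigma_hat, mu_hat + k * sigma_hat)."""
    k = cauchy_tolerance_factor(n, alpha, rho)
    return mu_hat - k * sigma_hat, mu_hat + k * sigma_hat


if __name__ == "__main__":
    # Hypothetical estimates, used only to show the call.
    print(cauchy_one_sided_limits(mu_hat=0.0, sigma_hat=1.0, n=50, alpha=0.05, rho=0.90))
```

As n grows, the first term of the factor vanishes and k approaches the magnitude of the standard Cauchy (1−ρ) quantile, so the limits approach the corresponding plug-in population quantiles.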

Statistical software packages for tolerance intervals

3.1 Available statistical software packages

There are several statistical software packages that can provide the computation of tolerance intervals. In this subsection, we discuss several commonly used statistical software packages, including JMP (JMP Version 16, 2021), Minitab (Minitab 18 Statistical Software, 2017), NCSS (NCSS 2021 Statistical Software, 2021), Python (Python Core Team, 2015), R (R Core Team, 2020), and SAS (SAS Institute Inc, 2014), that provide computational procedures to calculate tolerance intervals based on various distributions.

All six software packages discussed here provide computational procedures for tolerance intervals under the normal distribution and for nonparametric tolerance intervals. In R (R Core Team, 2020), the package tolerance (Young 2010; 2014) provides computational procedures for tolerance intervals under more than 20 different distributions. Minitab (Minitab 18 Statistical Software 2017) computes tolerance intervals for 10 different distributions under “Quality Tools”. In Python (Python Core Team, 2015), the toleranceinterval package computes nonparametric tolerance intervals and parametric tolerance intervals for the normal and lognormal distributions. In SAS (SAS Institute Inc, 2014), the procedure PROC CAPABILITY provides tolerance intervals for the normal distribution and nonparametric tolerance intervals. The statistical distributions and procedures available in JMP, Minitab, NCSS, Python, and R, are summarized in Table 1.

Table 1 Procedures available in commonly used statistical software packages

3.2 Comparisons of different software packages

Among the six statistical software packages considered here, Python, SAS, JMP, and NCSS have very limited capability in computing tolerance intervals. The R package tolerance is the most comprehensive software package for computing tolerance intervals. For some distributions, identical methods are implemented across different software packages, whereas for other distributions the packages use different formulas. For instance, for one-sided tolerance intervals based on the normal distribution, the formulas used in JMP, Minitab, NCSS, Python, and SAS are equivalent to the R function normtol.int in the tolerance package with the ‘EXACT’ method (i.e., method = “EXACT”). For two-sided tolerance intervals based on the normal distribution, the formulas used in JMP, Minitab and SAS are equivalent to normtol.int with the ‘EXACT’ method (i.e., method = “EXACT”), while the formulas used in NCSS and Python are equivalent to normtol.int with the ‘HE’ method (i.e., method = “HE”). For nonparametric tolerance intervals, Minitab, NCSS and SAS use the procedure corresponding to the R function nptol.int in the tolerance package with the ‘WILKS’ method (i.e., method = “WILKS”), while JMP and Python use the procedure corresponding to nptol.int with the ‘HM’ method (i.e., method = “HM”). For the lognormal distribution, the formula used in Minitab corresponds to the R function normtol.int in the tolerance package with the ‘EXACT’ method and log.norm = TRUE (i.e., method = “EXACT”, log.norm = TRUE), while Python obtains the tolerance intervals by transforming the tolerance intervals for the normal distribution computed from the log-transformed data. For the other distributions, however, Minitab and R use different computational formulas to obtain the tolerance intervals. The corresponding references for the formulas used in different software and the equivalence of the resulting tolerance intervals obtained from different software (grouping in parentheses) are summarized in Table 2.

Table 2 Comparison of different software packages for computing tolerance intervals of some commonly used distributions

Effect of model misspecification on tolerance intervals

4.1 Monte Carlo simulation studies

In this section, Monte Carlo simulation studies are used to evaluate the performance of tolerance intervals under different distributions in terms of the empirical confidence levels and population proportions of interest. Specifically, we evaluate the performance of tolerance intervals by assessing the closeness of the average simulated coverage C(L,U;θ) to ρ and of the empirical probability 1− Pr[C(L,U;θ)≥ρ] to α. We consider both the case in which the assumed distribution is the same as the true underlying distribution and the case in which it is different. In this simulation study, we generate random samples of size n from the statistical distribution F and compute the one- and two-sided tolerance intervals based on the distribution G, i.e., F is the true underlying distribution and G is the assumed distribution. As the true underlying distribution is usually unknown in practice, we compare the coverage of the tolerance intervals to determine their robustness under different distributions.

Here, we consider a simulation study for symmetric distributions (normal, Cauchy, logistic, and Laplace distributions) and a simulation study for skewed distributions (gamma, Weibull, and lognormal distributions). For symmetric distributions, we consider the standard distributions by setting the location parameter to be 0 and the scale parameter to be 1. For skewed distributions, we consider the parameter settings based on the parameter estimates in a real data example presented in Section 6.2 (see, Table 25). Specifically, the following procedure is used in the Monte Carlo simulation study to evaluate the performance of the tolerance intervals for fixed values of α and ρ:

  • Step (i): Generate a random sample of size n, (x1,x2,…,xn), from the true underlying distribution F;

  • Step (ii): Compute the tolerance interval using the sample (x1,x2,…,xn) based on the assumed distribution G. The tolerance interval obtained in the h-th simulation is denoted as \([L^{(h)},U^{(h)}]\);

  • Step (iii): Obtain the probability that a random variable following distribution F falls between the lower and upper limits, i.e., \(C(L^{(h)},U^{(h)}) = F(U^{(h)}) - F(L^{(h)})\);

  • Step (iv): If \(C(L^{(h)},U^{(h)}) \ge \rho \), set \(\delta ^{(h)} = 1\); otherwise, set \(\delta ^{(h)} = 0\);

  • Step (v): Repeat Steps (i) – (iv) M times to obtain \(C(L^{(h)},U^{(h)})\) and \(\delta ^{(h)}\) for h=1,2,…,M.

The simulation results are based on M=10000 replications, except for the normal tolerance intervals computed in R with the ‘EXACT’ and ‘OCT’ methods, for which M=1000 is used because of the long computation time of these exact procedures. For each setting, the following quantities are computed for comparison purposes:

  • \(\hat \alpha = 1 - \frac {\sum _{h=1}^{M} \delta ^{(h)}}{M}\);

  • \(\hat \rho = \frac {\sum _{h=1}^{M} C(L^{(h)},U^{(h)})}{M}\);

  • \(\hat {s} = \sqrt {\frac {\sum _{h=1}^{M} (C(L^{(h)},U^{(h)}) - \hat \rho)^{2}}{M - 1}}\).

We consider n=10, 25, 50 and 100, α=0.01, 0.05, 0.1 and 0.2, and ρ=0.9, 0.95, 0.99 and 0.995 in both the simulation studies for symmetric distributions and skewed distributions. If the tolerance intervals perform as expected, the value of \(\hat {\alpha }\) should be close to the corresponding α, with a smaller value of \(\hat {\alpha }\) being preferred, and the value of \(\hat \rho \) should be close to the corresponding ρ, with a larger value of \({\hat {\rho }}\) being preferred. Moreover, a tolerance interval that gives a smaller value of \(\hat {s}\) is preferred. To make it easier to assess the performance of different tolerance intervals and to take into account the Monte Carlo simulation errors, in the tables of simulation results, we highlight in bold those values of \(\hat \alpha \) within \(\pm 2 \sqrt {\alpha (1-\alpha)/M}\) of α and those values of \(\hat \rho \) within \(\pm 2 \sqrt {\rho (1-\rho)/M}\) of ρ.
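The simulation procedure and the three summary quantities can be sketched in Python as follows. This is a minimal illustration and not the code used to produce the reported tables: the function names are hypothetical, the number of replications is reduced to M=2000 to keep the run short, and, so that the sketch stays self-contained, the interval rule plugged in is the distribution-free interval \((x_{1:n}, x_{n:n})\) rather than one of the parametric rules; any function that maps a sample to a pair (L, U) could be substituted for it.

```python
import random
from math import atan, pi, tan
from statistics import NormalDist, mean, stdev


def min_max_interval(sample):
    """Illustrative interval rule: the sample minimum and maximum."""
    return min(sample), max(sample)


def simulate(draw, cdf, interval, n, rho, M, seed=0):
    """Return (alpha_hat, rho_hat, s_hat) for true distribution F = (draw, cdf)."""
    rng = random.Random(seed)
    coverages = []
    for _ in range(M):
        sample = [draw(rng) for _ in range(n)]       # draw a sample from F
        lower, upper = interval(sample)              # interval under the assumed rule
        coverages.append(cdf(upper) - cdf(lower))    # C(L, U) = F(U) - F(L)
    alpha_hat = 1 - sum(c >= rho for c in coverages) / M  # 1 - mean of delta
    rho_hat = mean(coverages)
    s_hat = stdev(coverages)                         # uses the M - 1 denominator
    return alpha_hat, rho_hat, s_hat


# True distributions F as (sampler, CDF) pairs: standard normal and standard Cauchy.
normal = (lambda rng: rng.gauss(0.0, 1.0), NormalDist().cdf)
cauchy = (lambda rng: tan(pi * (rng.random() - 0.5)), lambda x: 0.5 + atan(x) / pi)

for name, (draw, cdf) in {"normal": normal, "Cauchy": cauchy}.items():
    a, r, s = simulate(draw, cdf, min_max_interval, n=93, rho=0.95, M=2000)
    print(f"F = {name}: alpha_hat = {a:.4f}, rho_hat = {r:.4f}, s_hat = {s:.4f}")
```

Because the plugged-in rule is distribution-free, the estimates should be similar for both choices of F up to Monte Carlo error; replacing `min_max_interval` with a parametric rule is what makes the comparison between F and G informative.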

4.2 Simulation results and discussions

The simulation results under different settings when the assumed distribution is the same as the underlying distribution (i.e., F=G) are presented in Tables 3, 4, 5, 6, 7, 8 and 9. When the assumed distribution and the true underlying distribution are the same, we would expect the value of \({\hat {\alpha }}\) to be close to α and the value of \(\hat \rho \) to be close to ρ. However, we observe from Tables 3, 4, 5, 6, 7, 8 and 9 that when the sample size n is small, \(\hat \alpha \) can be larger than α under the correct model assumption. For example, in Table 4, when the underlying distribution is Cauchy with PDF and CDF in Eqs. (8) and (9), α=0.05, ρ=0.9 and n=10, the value of \(\hat \alpha \) is 0.1500. For moderate to large sample sizes (i.e., n=50 and n=100), the values of \(\hat \alpha \) are close to or even smaller than the values of α in most cases. For the values of \(\hat \rho \), we observe that they are always greater than ρ under the correct model assumption. For the standard deviation \(\hat {s}\), the value decreases as the sample size n increases.

Table 3 Performance of tolerance intervals based on normal distribution when the true distribution is normal (F=G: Normal)
Table 4 Performance of tolerance intervals based on Cauchy distribution when the true distribution is Cauchy (F=G: Cauchy)
Table 5 Performance of tolerance intervals based on Laplace distribution when the true distribution is Laplace (F=G: Laplace)
Table 6 Performance of tolerance intervals based on logistic distribution when the true distribution is logistic (F=G: Logistic)
Table 7 Performance of tolerance intervals based on gamma distribution when the true distribution is gamma (F=G: Gamma)
Table 8 Performance of tolerance intervals based on Weibull distribution when the true distribution is Weibull (F=G: Weibull)
Table 9 Performance of tolerance intervals based on lognormal distribution when the true distribution is lognormal (F=G: Lognormal)

To save space, we only present some representative simulation results under different settings when the assumed distribution is different from the underlying distribution (i.e., F≠G) in Tables 10, 11, 12 and 13, and the simulation results for other settings are presented in the Appendix (Tables 26–38). From Tables 10 and 11, we observe that the tolerance intervals computed under Cauchy and logistic distributions are robust to model misspecification when the true underlying distribution is normal. In Tables 10 and 11, the values of \(\hat {\alpha }\) are less than or equal to α and the values of \(\hat {\rho }\) are larger than ρ. However, the simulation results show that the tolerance intervals are not robust under model misspecification in general. In Table 12, when the underlying true distribution is Cauchy (F: Cauchy) and the tolerance intervals are computed assuming a normal distribution (G: Normal), the performance of tolerance intervals may not be satisfactory in terms of the closeness of \(\hat \alpha \) and \(\hat \rho \) to α and ρ, respectively. For example, in Table 12, when ρ = 0.99, α = 0.1 and n=50, the value of \(\hat \alpha \) is 0.7921, which is much larger than the desired level α=0.1, and the value of \(\hat \rho \) is 0.9696, which is smaller than the specified proportion ρ=0.99. Similar observations are obtained based on the results presented in Table 13 and Tables 26–38 in the Appendix for both symmetric and asymmetric distributions.

Table 10 Performance of tolerance intervals based on Cauchy distribution when the true distribution is normal (G: Cauchy; F: Normal)
Table 11 Performance of tolerance intervals based on Laplace distribution when the true distribution is normal (G: Laplace; F: Normal)
Table 12 Performance of tolerance intervals based on normal distribution when the true distribution is Cauchy (G: Normal; F: Cauchy)
Table 13 Performance of tolerance intervals based on logistic distribution when the true distribution is Laplace (G: Logistic; F: Laplace)

Based on the simulation results in this section, when the true distribution is different from the assumed distribution, the parametric tolerance intervals can be sensitive to the model misspecification, and the performance of the tolerance intervals can be problematic in terms of the covering proportion ρ and the confidence level 1−α. Hence, it is desirable to develop an appropriate approach to compute the tolerance interval when the true underlying distribution is unknown. To address the issue of model uncertainty in practice, one plausible solution is to use the nonparametric tolerance interval, which does not require a distributional assumption. However, a simulation study of the performance of the nonparametric tolerance interval under the four symmetric distributions considered here (results are presented in the Appendix, Tables 39–42) shows that the nonparametric tolerance intervals do not perform as well as the parametric tolerance intervals computed under the correct distributional assumption, i.e., G=F, and that the values of \({\hat {\alpha }}\) and \({\hat {\rho }}\) can be far from the pre-specified values. For these reasons, we propose a model selection approach for the case in which potential candidate distributions are under consideration.
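One way to see why distribution-free intervals can struggle at moderate sample sizes is the classical order-statistic result for the interval formed by the sample minimum and maximum: for a continuous distribution, the probability that this interval covers at least a proportion ρ of the population is 1 − nρ^(n−1) + (n−1)ρ^n. The sketch below evaluates this formula. It is an illustration only; the nonparametric intervals studied in this paper may use a different construction (for example, interpolated or extrapolated order statistics), and the function names are assumptions.

```python
def range_confidence(n, rho):
    """Confidence that [min, max] of n observations covers at least rho of the population."""
    return 1.0 - n * rho ** (n - 1) + (n - 1) * rho ** n


def min_sample_size(rho, conf):
    """Smallest n for which [min, max] is a conf/rho tolerance interval."""
    n = 2
    while range_confidence(n, rho) < conf:
        n += 1
    return n


print(range_confidence(50, 0.99))   # confidence attained with n = 50 and rho = 0.99
print(min_sample_size(0.95, 0.95))  # sample size needed for a 95%/95% interval
```

With n=50 and ρ=0.99, the attained confidence is roughly 0.09, far below a nominal 0.95, which suggests that the sample range alone cannot deliver a 95%/99% interval at this sample size.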

Proposed model selection approach

5.1 Model selection based on maximum likelihood

In this section, we propose a simple model selection approach based on the maximum likelihood for the construction of tolerance intervals under model uncertainty in order to reduce the negative effect of model misspecification. We calculate the maximum likelihood of each candidate distribution and choose the distribution that has the largest likelihood. In other words, we are choosing a distribution that is most likely to be the true distribution when the true distribution is unknown. Then, we calculate the tolerance interval based on the selected distribution. The proposed model selection approach is summarized as follows:

  • Based on the random sample x1,x2,…,xn, compute the values of maximum log-likelihood for each of the candidate distributions. For example, for the four symmetric distributions considered here, we have the value of maximum log-likelihood based on normal distribution

    $$\begin{array}{@{}rcl@{}} L_{N} = \sum_{i=1}^{n} \ln \left\{\frac{1}{\sigma_{N} \sqrt{2\pi}}\exp\left[-\frac{(x_{i} - \mu_{N})^{2}}{2\sigma_{N}^{2}}\right] \right\}, \end{array} $$
    (21)

    the value of maximum log-likelihood based on Cauchy distribution

    $$\begin{array}{@{}rcl@{}} L_{C} = \sum_{i=1}^{n} \ln \left\{\ \frac{1}{\pi\sigma_{C}} \left[\frac{\sigma_{C}^{2}}{(x_{i}-\mu_{C})^{2}+\sigma_{C}^{2}}\right] \right\}, \end{array} $$
    (22)

    the value of maximum log-likelihood based on logistic distribution

    $$\begin{array}{@{}rcl@{}} L_{L} = \sum_{i=1}^{n} \ln \left\{\frac{\exp\left[(-\frac{x_{i}-\mu_{L}}{\sigma_{L}})\right]}{\sigma_{L}(1+\exp\left[-\frac{x_{i}-\mu_{L}}{\sigma_{L}}\right])^{2}} \right\}, \end{array} $$
    (23)

    and the value of maximum log-likelihood based on Laplace distribution

    $$\begin{array}{@{}rcl@{}} L_{P} = \sum_{i=1}^{n} \ln \left[\frac{\exp\left(-\frac{|x_{i}- \mu_{P}|}{\sigma_{P}}\right)}{2\sigma_{P}} \right]. \end{array} $$
    (24)

    The maximum log-likelihood based on the asymmetric distributions considered here can be computed in a similar manner.

  • Select the distribution that gives the largest value of the maximum log-likelihood as the assumed distribution G and compute the tolerance interval based on distribution G.
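The two steps above can be sketched in Python for the four symmetric candidates. This is a minimal, dependency-free illustration rather than the implementation used in this study: the normal and Laplace estimates use their closed forms, while the Cauchy and logistic fits use a crude pattern search that is an assumption of the sketch (a production analysis would use a proper optimizer). All function names are hypothetical.

```python
import math
import random
import statistics


def ll_normal(x, mu, s):
    return sum(-math.log(s * math.sqrt(2 * math.pi)) - (v - mu) ** 2 / (2 * s * s) for v in x)


def ll_cauchy(x, mu, s):
    return sum(math.log(s / math.pi) - math.log((v - mu) ** 2 + s * s) for v in x)


def ll_logistic(x, mu, s):
    # The density is symmetric, so |z| gives a numerically stable form of Eq. (23).
    return sum(-abs(v - mu) / s - math.log(s)
               - 2 * math.log1p(math.exp(-abs(v - mu) / s)) for v in x)


def ll_laplace(x, mu, s):
    return sum(-math.log(2 * s) - abs(v - mu) / s for v in x)


def pattern_search(ll, x, mu, s):
    """Crude derivative-free maximizer over (mu, ln s); for illustration only."""
    s0, ls, best, h = s, math.log(s), ll(x, mu, s), 1.0
    while h > 1e-6:
        moved = False
        for dmu, dls in ((h * s0, 0.0), (-h * s0, 0.0), (0.0, h), (0.0, -h)):
            val = ll(x, mu + dmu, math.exp(ls + dls))
            if val > best:
                mu, ls, best, moved = mu + dmu, ls + dls, val, True
        if not moved:
            h /= 2  # no improving move at this step size, so refine the step
    return mu, math.exp(ls), best


def select_model(x):
    """Return (best candidate name, dict of name -> (mu, sigma, max log-likelihood))."""
    n = len(x)
    mean, med = statistics.fmean(x), statistics.median(x)
    s_norm = math.sqrt(sum((v - mean) ** 2 for v in x) / n)  # closed-form normal MLE
    s_lap = sum(abs(v - med) for v in x) / n                  # closed-form Laplace MLE
    q = statistics.quantiles(x, n=4)
    s_start = (q[2] - q[0]) / 2  # half the interquartile range as a starting scale
    fits = {
        "normal": (mean, s_norm, ll_normal(x, mean, s_norm)),
        "laplace": (med, s_lap, ll_laplace(x, med, s_lap)),
        "cauchy": pattern_search(ll_cauchy, x, med, s_start),
        "logistic": pattern_search(ll_logistic, x, med, s_start),
    }
    best = max(fits, key=lambda name: fits[name][2])
    return best, fits


# Usage on a simulated standard Cauchy sample (inverse-CDF sampling).
rng = random.Random(0)
sample = [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(500)]
best, fits = select_model(sample)
print(best, {name: round(fit[2], 2) for name, fit in fits.items()})
```

The tolerance interval would then be computed under the selected distribution with whichever software routine is available for that distribution. The sketch assumes a sample without degenerate spread (a zero scale estimate would break the logarithms).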

Since the candidate models considered here have the same number of parameters, we use the values of the maximum likelihood directly for model selection. When the candidate models have different numbers of parameters, model selection criteria that penalize a model for having more parameters, such as Akaike’s information criterion (AIC) and the Bayesian information criterion (BIC), can be utilized for model selection.
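For reference, the two criteria mentioned above have the standard forms AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L, where k is the number of parameters, n is the sample size, and ln L is the maximum log-likelihood; smaller values are preferred. The short sketch below uses hypothetical log-likelihood values, chosen only to show that a model with a slightly larger likelihood can lose once the penalty is applied.

```python
import math


def aic(max_loglik, n_params):
    """Akaike's information criterion (smaller is better)."""
    return 2 * n_params - 2 * max_loglik


def bic(max_loglik, n_params, n_obs):
    """Bayesian information criterion (smaller is better)."""
    return n_params * math.log(n_obs) - 2 * max_loglik


# Hypothetical candidates: (maximum log-likelihood, number of parameters).
candidates = {"two-parameter model": (-100.0, 2), "three-parameter model": (-99.5, 3)}
n_obs = 40
for name, (loglik, k) in candidates.items():
    print(name, aic(loglik, k), round(bic(loglik, k, n_obs), 3))

best_by_aic = min(candidates, key=lambda m: aic(*candidates[m]))
print(best_by_aic)
```

When all candidates share the same k, both criteria differ from −2 ln L by a constant, so ranking by AIC or BIC coincides with ranking by the maximum log-likelihood.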

5.2 Monte Carlo simulation study

In this subsection, we perform a simulation study as described in Section 4 to compute the values of \({\hat {\alpha }},{\hat {\rho }}\) and \({\hat {s}}\) based on the proposed model selection approach using maximum likelihood. For symmetric distributions, we consider the normal, Cauchy, logistic, and Laplace distributions as the candidate distributions. For skewed distributions, we consider the gamma, Weibull, and lognormal distributions as the candidate distributions.

The simulated results under different settings are presented in Tables 14, 15, 16, 17, 18, 19 and 20. From these tables, we observe that the tolerance intervals computed based on the proposed model selection approach do not perform as well as the tolerance intervals computed when the assumed distribution and the true underlying distribution are the same; however, they perform better than the tolerance intervals computed under model misspecification. For example, for 95%/99% tolerance intervals with sample size n=50 when the true underlying distribution is Cauchy, the values of \({\hat {\alpha }},{\hat {\rho }}\) and \({\hat {s}}\) are 0.0661, 0.9925 and 0.0016, respectively, when the assumed distribution is Cauchy (see Table 4); 0.7823, 0.9708 and 0.0224, respectively, when the assumed distribution is normal (see Table 12); and 0.1227, 0.9894 and 0.0129, respectively, when the proposed model selection approach is used (see Table 15). We can see that the proposed model selection approach can effectively reduce the risk of model misspecification in the computation of tolerance intervals. Moreover, the performance of the tolerance intervals based on the proposed model selection approach can be better than that of the nonparametric tolerance intervals (e.g., for 95%/99% nonparametric tolerance intervals with sample size n=50 when the true underlying distribution is Cauchy, the values of \({\hat {\alpha }},{\hat {\rho }}\) and \({\hat {s}}\) are 0.3884, 0.9879 and 0.0145, respectively). However, in contrast to the nonparametric tolerance interval, the proposed model selection approach requires the specification of some suitable candidate distributions.

Table 14 Performance of parametric tolerance intervals based on the proposed model selection approach for symmetric distributions when the true underlying distribution is normal
Table 15 Performance of parametric tolerance intervals based on the model selection approach for symmetric distributions when the true distribution is Cauchy
Table 16 Performance of parametric tolerance intervals based on the proposed model selection approach for symmetric distributions when the true underlying distribution is Laplace
Table 17 Performance of parametric tolerance intervals based on the proposed model selection approach for symmetric distributions when the true underlying distribution is logistic
Table 18 Performance of parametric tolerance intervals based on the model selection approach for skewed distributions when the true distribution is gamma
Table 19 Performance of parametric tolerance intervals based on the model selection approach for skewed distributions when the true distribution is Weibull
Table 20 Performance of parametric tolerance intervals based on the model selection approach for skewed distributions when the true distribution is lognormal

Illustrative examples

In this section, two numerical examples are used to compare the computations of tolerance intervals using different software packages and illustrate the proposed model selection approach.

6.1 Differences in flood levels data

In this example, we consider 33 differences in flood levels between two stations on the Fox River, which flows through Wisconsin. The data were originally gathered by Gumbel and Mustafi (1967) and were also discussed by Bain and Engelhardt (1973) and Puig and Stephens (2000). The dataset is presented in Table 21. We assume that the data come from a normal distribution or a logistic distribution and compute the corresponding parametric tolerance intervals using JMP, Minitab, NCSS, Python, R, and SAS. For the computation in R, the functions normtol.int, logistol.int and nptol.int in the tolerance package, with different method options, are used to compute the parametric tolerance intervals based on the normal and logistic distributions and the nonparametric tolerance intervals, respectively (Young 2010; 2014).

Table 21 Differences of flood levels between two stations on a river in Wisconsin

The 95%/95% tolerance intervals computed based on the data in Table 21 from different software packages are presented in Table 22. For tolerance intervals under the normal distribution, we observe that the resulting intervals from JMP, Minitab, SAS, and R with method = "EXACT" are the same, while the resulting intervals from NCSS, Python, and R with method = "HE" are the same. However, the tolerance intervals computed under the logistic distribution differ between Minitab and R.
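To make the difference between an exact and an approximate tolerance factor more concrete, the sketch below computes a two-sided normal tolerance factor with Howe's (1969) approximation, k = sqrt{(n−1)(1+1/n) z² / χ²}, where z is the (1+ρ)/2 standard normal quantile and χ² is the lower α quantile of the chi-square distribution with n−1 degrees of freedom. It assumes that the "HE" option refers to this approximation. To stay within the Python standard library, the chi-square quantile is replaced by the Wilson–Hilferty approximation, which is an additional assumption of the sketch, so the results will differ slightly from software that uses exact quantiles. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def chi2_quantile_wh(p, df):
    """Wilson-Hilferty approximation to the chi-square quantile (reasonable for moderate df)."""
    z = NormalDist().inv_cdf(p)
    c = 2.0 / (9.0 * df)
    return df * (1.0 - c + z * math.sqrt(c)) ** 3


def howe_k(n, rho, alpha):
    """Approximate two-sided normal tolerance factor following Howe's formula."""
    z = NormalDist().inv_cdf((1.0 + rho) / 2.0)
    chi2 = chi2_quantile_wh(alpha, n - 1)  # lower alpha quantile, n - 1 degrees of freedom
    return math.sqrt((n - 1) * (1.0 + 1.0 / n) * z * z / chi2)


def normal_tolerance_interval(x, rho=0.95, alpha=0.05):
    """Return (lower, upper) = mean -/+ k * sd, with sd the usual (n - 1) sample sd."""
    n = len(x)
    mean = sum(x) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / (n - 1))
    k = howe_k(n, rho, alpha)
    return mean - k * sd, mean + k * sd


print(round(howe_k(50, 0.99, 0.05), 3))
print(normal_tolerance_interval([9.1, 10.4, 8.7, 11.2, 10.0, 9.6, 10.8, 9.9]))
```

The short data vector in the usage line is made up for illustration and is not taken from Table 21.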

Table 22 The 95%/95% tolerance intervals based on the data in Table 21 computed using different software packages

To illustrate the proposed model selection approach, we consider the normal and logistic distributions as the candidate distributions. The maximum likelihood estimates of the parameters μN and σN for the normal distribution are 9.3536 and 4.0205, respectively, and the value of the maximum log-likelihood is −92.2417. The maximum likelihood estimates of the parameters μL and σL for the logistic distribution are 9.4048 and 2.3611, respectively, and the value of the maximum log-likelihood is −93.3586. Since the normal distribution gives the larger value of the maximum log-likelihood (−92.2417 versus −93.3586), the proposed approach selects the normal distribution over the logistic distribution, and hence, we report the tolerance interval computed based on the normal distribution for this data set.

6.2 Locomotive controls failure data

To illustrate the computation of tolerance intervals using Minitab, Python, and R for asymmetric distributions (JMP, NCSS, and SAS are not included since they only provide tolerance intervals based on the normal distribution), we consider a lifetime data set for locomotive controls. Nelson (1982) presented the miles to failure of 37 locomotive controls. This data set was also discussed by Krishnamoorthy and Xie (2011) and Yuan et al. (2018). The data set is presented in Table 23.

Table 23 Miles to failure of 37 locomotive controls (in thousands of miles)

For illustrative purposes, we consider three commonly used lifetime distributions, the gamma, Weibull, and lognormal distributions, as candidate models for the lifetime of locomotive controls. The 95%/95% tolerance intervals under the gamma, Weibull, and lognormal distributions obtained from Minitab, Python, and R are presented in Table 24. Note that the tolerance intervals obtained from Minitab and R with method = "EXACT" under the lognormal distribution are the same. However, under the other distributions, the resulting tolerance intervals obtained from Minitab and R are different.

Table 24 The 95%/95% tolerance intervals for the data set in Table 23 computed using Minitab, Python, and R

To apply the proposed model selection approach, we compute the maximum likelihood estimates of the model parameters and the corresponding values of the maximum log-likelihood under the gamma, Weibull, and lognormal distributions, and present the results in Table 25. Since the Weibull distribution gives the largest likelihood among the three candidate models, we select the Weibull distribution and report the tolerance interval based on the Weibull distribution. Based on the results from R, the 95%/95% tolerance interval based on the Weibull distribution is (23.884, 171.782) in thousands of miles. We are 95% confident that at least 95% of the locomotive controls will have lifetimes between 23,884 miles and 171,782 miles. If this does not satisfy the requirements of the railroad company, then the reliability of the locomotive controls needs to be improved in the manufacturing process.

Table 25 Maximum likelihood estimates of parameters and the corresponding maximum log-likelihood for the gamma, Weibull, and lognormal distributions based on the data in Table 23

Concluding remarks

In this paper, we discuss the computation of tolerance intervals available in commonly used statistical software packages including JMP, Minitab, NCSS, Python, R, and SAS. We evaluate the performance of tolerance intervals under model misspecification using Monte Carlo simulation by considering four symmetric distributions (normal, Cauchy, logistic, and Laplace) and three asymmetric distributions (gamma, Weibull, and lognormal). We observe that the performance of parametric tolerance intervals can be sensitive to model misspecification. Therefore, when the true underlying distribution is unknown and some candidate distributions are available, we propose a simple model selection approach and show that the proposed approach can effectively reduce the negative effect of misspecifying the underlying distribution on the performance of tolerance intervals. The computation of the tolerance intervals using different statistical software packages and the proposed model selection approach are illustrated by two numerical examples. For future research, the performance of the tolerance intervals obtained from different software packages can be compared for complete and incomplete data.

Appendix

Table 26 Performance of tolerance intervals based on logistic distribution when the true distribution is normal (G: Logistic; F: Normal)
Table 27 Performance of tolerance intervals based on Laplace distribution when the true distribution is Cauchy (G: Laplace; F: Cauchy)
Table 28 Performance of tolerance intervals based on Cauchy distribution when the true distribution is Laplace (G: Cauchy; F: Laplace)
Table 29 Performance of tolerance intervals based on normal distribution when the true distribution is Laplace (G: Normal; F: Laplace)
Table 30 Performance of tolerance intervals based on Laplace distribution when the true distribution is logistic (G: Laplace; F: Logistic)
Table 31 Performance of tolerance intervals based on Cauchy distribution when the true distribution is logistic (G: Cauchy; F: Logistic)
Table 32 Performance of tolerance intervals based on normal distribution when the true distribution is logistic (G: Normal; F: Logistic)
Table 33 Performance of tolerance intervals based on lognormal distribution when the true distribution is gamma (G: Lognormal; F: Gamma)
Table 34 Performance of tolerance intervals based on Weibull distribution when the true distribution is gamma (G: Weibull; F: Gamma)
Table 35 Performance of tolerance intervals based on gamma distribution when the true distribution is Weibull (G: Gamma; F: Weibull)
Table 36 Performance of tolerance intervals based on lognormal distribution when the true distribution is Weibull (G: Lognormal; F: Weibull)
Table 37 Performance of tolerance intervals based on gamma distribution when the true distribution is lognormal (G: Gamma; F: Lognormal)
Table 38 Performance of tolerance intervals based on Weibull distribution when the true distribution is lognormal (G: Weibull; F: Lognormal)
Table 39 Performance of nonparametric tolerance intervals when the true underlying distribution is normal
Table 40 Performance of nonparametric tolerance intervals when the true underlying distribution is Cauchy
Table 41 Performance of nonparametric tolerance intervals when the true underlying distribution is Laplace
Table 42 Performance of nonparametric tolerance intervals when the true underlying distribution is logistic

Availability of data and materials

The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

Abbreviations

AIC: Akaike’s information criterion

BIC: Bayesian information criterion

CDF: Cumulative distribution function

PDF: Probability density function

References

  1. Bain, L. J., Engelhardt, M.: Interval Estimation for the Two-Parameter Double Exponential Distribution. Technometrics. 15, 875–887 (1973).

  2. Bain, L. J., Engelhardt, M.: Simple Approximate Distributional Results for Confidence and Tolerance Limits for the Weibull Distribution Based on Maximum Likelihood Estimators. Technometrics. 23, 15–20 (1981).

  3. Bain, L. J., Engelhardt, M.: Statistical Analysis of Reliability and Life-Testing Models: Theory and Methods. Second edition. Marcel Dekker, New York (1991).

  4. Balakrishnan, N., (Ed): Handbook of the Logistic Distribution. Marcel Dekker, New York (1992).

  5. Battelle Memorial Institute: Nonparametric Procedure. In: MMPDS-12: Metallic Materials Properties Development and Standardization. Battelle Memorial Institute, Columbus (2017).

  6. Blischke, W. R., Murthy, D. N. P.: Reliability: Modeling, Prediction, and Optimization. Wiley, New York (2000).

  7. Bury, K.: Statistical Distributions in Engineering. Cambridge University Press, UK (1999).

  8. Coles, S.: An Introduction to Statistical Modeling of Extreme Values. Springer-Verlag, London (2001).

  9. David, H. A., Nagaraja, H. N.: Order Statistics. Third edition. Wiley, Hoboken (2003).

  10. Faulkenberry, G. D., Daly, J. C.: Sample size for tolerance limits on a normal distribution. Technometrics. 12, 813–821 (1970).

  11. Fernandez, A. J.: Two-sided tolerance intervals in the exponential case: Corrigenda and generalizations. Comput. Stat. Data Anal. 54, 151–162 (2010).

  12. Guenther, W. C.: Sampling Inspection in Statistical Quality Control. (Griffin’s Statistical Monographs, Number 37). London and High Wycombe, Griffin (2007).

  13. Gumbel, E. J., Mustafi, C. K.: Some Analytical Properties of Bivariate Extremal Distributions. J. Am. Stat. Assoc. 62, 569–588 (1967).

  14. Hahn, G. J., Meeker, W. Q.: Statistical Intervals: A Guide for Practitioners. Wiley, New York (1991).

  15. Hall, I. J.: One-Sided Tolerance Limits for a Logistic Distribution Based on Censored Samples. Biometrics. 31, 873–880 (1975).

  16. Howe, W. G.: Two-Sided Tolerance Limits for Normal Populations - Some Improvements. J. Am. Stat. Assoc. 64, 610–620 (1969).

  17. Hong, L. J., Huang, Z., Lam, H.: Learning-based robust optimization: Procedures and statistical guarantees. arXiv preprint arXiv:1704.04342 (2017).

  18. JMP®. Version 16. SAS Institute Inc., Cary, NC (2021).

  19. Krishnamoorthy, K., Mathew, T.: Statistical Tolerance Regions: Theory, Applications, and Computation. Wiley, Hoboken (2009).

  20. Krishnamoorthy, K., Mathew, T., Mukherjee, S.: Normal-based methods for a gamma distribution: prediction and tolerance intervals and stress-strength reliability. Technometrics. 50, 69–78 (2008).

  21. Krishnamoorthy, K., Xie, F.: Tolerance intervals for symmetric location-scale families based on uncensored or censored samples. J. Stat. Plan. Infer. 141, 1170–1182 (2011).

  22. Lawless, J. F.: Construction of tolerance bounds for the extreme-value and the Weibull distribution. Technometrics. 17, 255–261 (1975).

  23. Meeker, W. Q., Hahn, G. J., Escobar, L. A.: Statistical Intervals: A Guide for Practitioners and Researchers. 2nd ed. Wiley, New York (2017).

  24. Minitab 18 Statistical Software: [Computer software]. State College, PA, Minitab Inc. (2017).

  25. Nelson, W.: Applied Life Data Analysis. Wiley, New York (1982).

  26. NCSS 2021 Statistical Software: NCSS, LLC, Kaysville, Utah, USA (2021). ncss.com/software/ncss.

  27. Puig, P., Stephens, M. A.: Tests of Fit for the Laplace Distribution, with Applications. Technometrics. 42, 417–424 (2000).

  28. Python: Python Core Team. Python: A dynamic, open source programming language. Python Software Foundation (2015). http://www.python.org. Accessed 1 Jan 2021.

  29. R Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria (2020). https://www.R-project.org/. Accessed 1 Jan 2021.

  30. Robbins, H.: On distribution-free tolerance limits in random sampling. Ann. Math. Stat. 15, 214–216 (1944).

  31. SAS Institute Inc: The SAS system for Windows. Release 9.4. SAS Institute Inc., Cary NC (2014).

  32. Wald, A.: An Extension of Wilks’ Method for Setting Tolerance Limits. Ann. Math. Stat. 14, 45–55 (1943).

  33. Wald, A., Wolfowitz, J.: Tolerance Limits for a Normal Distribution. Ann. Math. Stat. 17, 208–215 (1946).

  34. Weissberg, A., Beatty, G.: Tables of Tolerance Limit Factors for Normal Distributions. Technometrics. 2, 483–500 (1960).

  35. Wilks, S. S.: Determination of Sample Sizes for Setting Tolerance Limits. Ann. Math. Stat. 12, 91–96 (1941).

  36. Wilks, S. S.: Statistical prediction with special reference to the problem of tolerance limits. Ann. Math. Stat. 13, 400–409 (1942).

  37. Young, D. S.: Tolerance: An R Package for Estimating Tolerance Intervals. J. Stat. Softw. Artic. 36(5), 1–39 (2010).

  38. Young, D. S.: Computing Tolerance Intervals and Regions in R. In: Rao, M. B., Rao, C. R. (eds.) Handbook of Statistics, Volume 32: Computational Statistics with R, pp. 309–338. North-Holland (2014).

  39. Young, D. S., Mathew, T.: Improved Nonparametric Tolerance Intervals Based on Interpolated and Extrapolated Order Statistics. J. Nonparametric Stat. 26, 415–432 (2014).

  40. Yuan, M., Hong, Y., Escobar, L. A., Meeker, W. Q.: Two-sided tolerance intervals for members of the (log)-location-scale family of distributions. Qual. Technol. Quant. Manag. 15, 374–392 (2018).

Acknowledgements

The authors thank the editor and the anonymous referee for their useful comments and suggestions on an earlier version of this article which resulted in this improved version.

Funding

This work was supported by the Hamilton Undergraduate Research Award from Southern Methodist University. The second author’s work was also supported by a grant from the Simons Foundation (#709773).

Author information

Contributions

The authors carried out this work and drafted the manuscript collaboratively. All authors read and approved the manuscript.

Corresponding author

Correspondence to Hon Keung Tony Ng.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Cho, K.S., Tony Ng, H.K. Tolerance intervals in statistical software and robustness under model misspecification. J Stat Distrib App 8, 10 (2021). https://doi.org/10.1186/s40488-021-00123-2

Keywords

  • Cauchy distribution
  • Maximum likelihood
  • Model selection
  • Model uncertainty