Tolerance intervals in statistical software and robustness under model misspecification

Cho, Kyung Serk; Tony Ng, Hon Keung

doi:10.1186/s40488-021-00123-2

Methodology
Open access
Published: 18 July 2021


Journal of Statistical Distributions and Applications volume 8, Article number: 10 (2021)


Abstract

A tolerance interval is a statistical interval that covers at least 100ρ% of the population of interest with 100(1−α)% confidence, where ρ and α are pre-specified values in (0, 1). In many scientific fields, such as pharmaceutical sciences, manufacturing processes, clinical sciences, and environmental sciences, tolerance intervals are used for statistical inference and quality control. Despite the usefulness of tolerance intervals, the procedures to compute them are not commonly implemented in statistical software packages. This paper provides a comparative study of the computational procedures for tolerance intervals in some commonly used statistical software packages, including JMP, Minitab, NCSS, Python, R, and SAS. In addition, we investigate the effect of misspecifying the underlying probability model on the performance of tolerance intervals. Using a Monte Carlo simulation study, we study the performance of tolerance intervals both when the assumed distribution is the same as the true underlying distribution and when it differs from the true distribution. We also propose a robust model selection approach to obtain tolerance intervals that are relatively insensitive to model misspecification. We show that the proposed approach performs well when the underlying distribution is unknown but candidate distributions are available.

Introduction

There are three types of statistical intervals commonly used in practice: confidence intervals, prediction intervals, and tolerance intervals. A confidence interval, computed from a random sample, gives a range of values that is likely to include an unknown parameter with a specified degree of confidence, 100(1−α)%. A prediction interval is an interval that, with a specified degree of confidence 100(1−α)%, will contain one or more future observations from the population. A tolerance interval covers at least a specified proportion ρ (0<ρ<1) of the population with a specified degree of confidence 100(1−α)%, where 0<α<1 (Hahn and Meeker 1991). It can be interpreted as follows: we are 100(1−α)% confident that at least 100ρ% of the population lies within the interval. Such an interval is denoted a [100(1−α)%]/[100ρ%] tolerance interval. For example, a quality engineer at a light bulb manufacturer needs to evaluate the life spans of light bulbs. The engineer randomly selects a sample of 100 light bulbs and records their times to failure. The engineer wants to calculate a 95%/99% lower tolerance bound, that is, the burn time that at least 99% of all light bulbs exceed with 95% confidence. If the lower tolerance bound based on a normal distribution is 1085.947, the engineer can claim with 95% confidence that at least 99% of all light bulbs exceed approximately 1086 hours of burn time (Minitab 18 Statistical Software 2017). Tolerance intervals are of particular interest in setting limits on the process capability of a product manufactured in large quantities (Hahn and Meeker 1991); therefore, they are widely used in statistical quality control.

Despite the usefulness of tolerance intervals, the computation of tolerance intervals under different distributional assumptions is not commonly implemented in statistical software packages. We found that only a few commonly used statistical software packages, such as Minitab (Minitab 18 Statistical Software 2017), R (R Core Team 2020), and SAS (SAS Institute Inc 2014), provide computational procedures for tolerance intervals. The objective of this paper is twofold. First, we compare commonly used statistical software packages that offer procedures to compute tolerance intervals. Second, we evaluate the performance of tolerance intervals under model uncertainty and propose a robust model selection approach to compute tolerance intervals.

The rest of this paper is organized as follows. In Section 2, we provide the notation for tolerance intervals and the computational formulas for tolerance intervals under several distributions. In Section 3, we introduce and compare the computational procedures available in commonly used statistical software packages. In Section 4, we evaluate the performance of tolerance intervals under model misspecification. In Section 5, we propose a model selection approach for the case in which the underlying probability model is unknown but some candidate models are available. Section 6 presents illustrative examples, and Section 7 provides concluding remarks and future research directions.

Tolerance interval and statistical models

2.1 Basics of tolerance intervals

Let X₁,X₂,…,X_n be a random sample of size n from a probability model with probability density function (PDF) f(x;θ) and cumulative distribution function (CDF) F(x;θ), where θ is the vector of parameters. We denote the observed values of X₁,X₂,…,X_n by x₁,x₂,…,x_n. When the population mean μ and population standard deviation σ are unknown, they are estimated by the sample mean $\bar{x} = \sum_{i=1}^{n} x_{i}/n$ and the sample standard deviation $s = \sqrt{\sum_{i=1}^{n}(x_{i} - \bar{x})^{2}/(n-1)}$, respectively. For example, for normally distributed data, a [100(1−α)%]/[100ρ%] tolerance interval has the form

$$\bar{x} \pm ks,$$

where k is the tolerance factor, 1−α∈(0,1) is the confidence level, and ρ∈(0,1) is the population proportion of interest. In general, the exact value of k for given values of α and ρ is not easy to compute (the one-sided normal case is an exception); therefore, most tolerance intervals are calculated with approximation methods (Young 2010). We define

$$\begin{array}{@{}rcl@{}} C(L,U; \boldsymbol{\theta}) = F(U; \boldsymbol{\theta}) - F(L; \boldsymbol{\theta}) = \Pr(L < X < U),~~ U > L, \end{array} $$

as the coverage of a two-sided interval [L,U], where L and U are statistics computed from the sample. Then, a [100(1−α)%]/[100ρ%] two-sided tolerance interval [L,U] satisfies

$$\begin{array}{@{}rcl@{}} \Pr[C(L,U;\theta) \ge \rho] \ge 1 - \alpha, \end{array} $$

(1)

Similarly, a [100(1−α)%]/[100ρ%] upper one-sided tolerance interval [L,∞) satisfies

$$\begin{array}{@{}rcl@{}} \Pr[1- F(L; \theta) \ge \rho] \ge 1 - \alpha, \end{array} $$

(2)

and a [100(1−α)%]/[100ρ%] lower one-sided tolerance interval (−∞,U] satisfies

$$\begin{array}{@{}rcl@{}} \Pr[F(U;\theta) \ge \rho] \ge 1 - \alpha. \end{array} $$

(3)

For specific values of α and ρ, the two-sided, upper one-sided, and lower one-sided tolerance intervals can be obtained by finding the values of L and U that satisfy Eqs. (1), (2), and (3), respectively, for a specified underlying distribution.
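Eq. (2) can also be checked by simulation, which is the idea behind a coverage study. The sketch below is our own illustration (the function name `estimate_confidence` is hypothetical): it assumes normal data and the lower limit L = x̄ − ks, computes the true content 1 − F(L; θ) for each simulated sample, and reports the proportion of samples whose content is at least ρ. The factor k = 2.911 is the tabulated one-sided normal tolerance factor for n = 10 and a 95%/95% interval.

```python
import random
import statistics
from statistics import NormalDist


def estimate_confidence(k, n, rho, mu=0.0, sigma=1.0, reps=20000, seed=2021):
    """Monte Carlo estimate of Pr[1 - F(L) >= rho] in Eq. (2), L = xbar - k*s.

    Samples are drawn from N(mu, sigma^2) and F is that same normal CDF, so
    1 - F(L) is the true population content of the interval [L, inf).
    """
    rng = random.Random(seed)
    population = NormalDist(mu, sigma)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        lower = statistics.fmean(sample) - k * statistics.stdev(sample)
        if 1.0 - population.cdf(lower) >= rho:   # content reaches rho
            hits += 1
    return hits / reps


# k = 2.911: tabulated one-sided normal factor for n = 10, 95%/95%.
print(estimate_confidence(k=2.911, n=10, rho=0.95))  # close to 0.95
```

Replacing k with the naive choice z_ρ ≈ 1.645 gives an estimated confidence far below 0.95, which is why a tolerance factor must exceed the corresponding normal quantile.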

Instead of assuming that the data come from a particular parametric model, one can construct a nonparametric tolerance interval based on order statistics (see, for example, Section 7.2 of David and Nagaraja (2003)). Specifically, the lower and upper nonparametric [100(1−α)%]/[100ρ%] tolerance limits are

$$\begin{array}{@{}rcl@{}} L = x_{r:n} {\text{ and }} U = x_{s:n}, \end{array} $$

where $x_{j:n}$ is the j-th order statistic of the random sample x₁,x₂,…,x_n, and the indices r and s (r<s) are chosen to satisfy Eq. (1) (Hahn and Meeker 1991; David and Nagaraja 2003).
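The choice of r and s can be made explicit. For a continuous population, the coverage F(X_{s:n}) − F(X_{r:n}) has a Beta(s − r, n − s + r + 1) distribution whatever F is, so the left side of Eq. (1) equals the binomial probability Pr[Binomial(n, ρ) ≤ s − r − 1]. The sketch below uses this fact to pick the innermost symmetric pair (r, s = n − r + 1); restricting the search to symmetric pairs is our simplification, and the function names are illustrative.

```python
import math


def coverage_confidence(n, r, s, rho):
    """Pr[C >= rho] for the interval [x_(r:n), x_(s:n)], 1 <= r < s <= n.

    C = F(X_(s:n)) - F(X_(r:n)) follows Beta(s - r, n - s + r + 1) for any
    continuous F, so Pr[C >= rho] = Pr[Binomial(n, rho) <= s - r - 1].
    Requires 0 < rho < 1.
    """
    m = s - r
    log_rho, log_1m = math.log(rho), math.log1p(-rho)
    total = 0.0
    for j in range(m):  # binomial pmf summed over j = 0, ..., m - 1 (log scale)
        log_pmf = (math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
                   + j * log_rho + (n - j) * log_1m)
        total += math.exp(log_pmf)
    return total


def nonparametric_tolerance_interval(x, rho, alpha):
    """Symmetric two-sided interval [x_(r:n), x_(n-r+1:n)] satisfying Eq. (1).

    Returns None when even the sample minimum and maximum are not enough.
    """
    xs = sorted(x)
    n = len(xs)
    chosen = None
    for r in range(1, n // 2 + 1):       # guarantees s = n - r + 1 > r
        s = n - r + 1
        if coverage_confidence(n, r, s, rho) >= 1 - alpha:
            chosen = (r, s)              # innermost pair so far meeting Eq. (1)
        else:
            break                        # confidence only decreases as r grows
    if chosen is None:
        return None
    r, s = chosen
    return xs[r - 1], xs[s - 1]
```

For ρ = 0.95 and α = 0.05, the sample minimum and maximum first reach 95% confidence at n = 93, the classic sample-size result for a distribution-free 95%/95% two-sided interval.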

2.2 Parametric tolerance intervals for some particular distributions

To illustrate the calculation of the tolerance intervals based on different distributions, we consider four symmetric distributions with location and scale parameters: normal (Gaussian), Cauchy, logistic, and Laplace distributions; and three two-parameter skewed (asymmetric) distributions with shape and scale parameters: gamma, Weibull, and lognormal distributions. The functional form of these seven distributions and the corresponding computational formulas for the tolerance intervals based on these distributions are presented in the following. For more details of the computation of tolerance intervals based on different distributions, one may refer to Young (2014).

Normal distribution: The PDF and CDF of a normal distribution with location parameter μ_N and scale parameter σ_N are, respectively,
$$\begin{array}{*{20}l} f_{N}(x;\mu_{N},\sigma_{N}) = & \frac{1}{\sigma_{N} \sqrt{2\pi}}\exp\left[-\frac{(x- \mu_{N})^{2}}{2\sigma_{N}^{2}}\right] \end{array} $$
(4)

$$\begin{array}{*{20}l} {\text{and }} F_{N}(x;\mu_{N},\sigma_{N}) = & \int_{- \infty}^{x} \frac{1}{\sigma_{N} \sqrt{2\pi}} \exp\left[-\frac{(t-\mu_{N})^{2}}{2\sigma_{N}^{2}}\right] \mathrm{d}t, \end{array} $$
(5)

where −∞<x<∞,−∞<μ_N<∞ and σ_N>0.

Based on a random sample of size n, X₁,X₂,…,X_n, from the normal distribution with PDF and CDF in Eqs. (4) and (5), respectively, let ${\hat \mu }_{N}$ and ${\hat \sigma }_{N}$ be the sample mean and sample standard deviation. Then the one-sided [100(1−α)%]/[100ρ%] tolerance intervals are $(L_{N,1}, \infty)$ and $(-\infty, U_{N,1})$, with lower and upper tolerance limits
$$\begin{array}{@{}rcl@{}} L_{N,1} & = & {\hat \mu}_{N} - k_{N,1,\alpha,\rho} \hat{\sigma}_{N} {\text{ and }} U_{N,1} = {\hat \mu}_{N} + k_{N,1,\alpha,\rho} {\hat \sigma}_{N}, \end{array} $$
(6)

where the tolerance factor $k_{N,1,\alpha,\rho}$ can be obtained as
$$k_{N,1,\alpha,\rho} = \frac{1}{\sqrt{n}}\, t^{*}_{n-1;1-\alpha}(\sqrt{n}\, z_{\rho}),$$
in which $t^{*}_{d;p}(\omega)$ is the p-th quantile (lower 100p-th percentile) of a noncentral Student’s t-distribution with d degrees of freedom and noncentrality parameter ω, and z_p is the p-th quantile of the standard normal distribution. Note that the one-sided tolerance interval with tolerance factor $k_{N,1,\alpha,\rho}$ is exact.
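The exact one-sided factor only needs a noncentral t quantile. The sketch below assumes SciPy is available and uses `scipy.stats.nct` and `scipy.stats.norm`, with lower-tail quantiles as in the definitions above; the function names are our own.

```python
import math
import statistics

from scipy.stats import nct, norm


def k_normal_one_sided(n, alpha, rho):
    """Exact factor k = t*_{n-1;1-alpha}(sqrt(n) z_rho) / sqrt(n).

    norm.ppf(rho) is z_rho and nct.ppf(1 - alpha, ...) is the (1 - alpha)-th
    quantile of the noncentral t distribution (both lower-tail).
    """
    delta = math.sqrt(n) * norm.ppf(rho)          # noncentrality parameter
    return nct.ppf(1 - alpha, n - 1, delta) / math.sqrt(n)


def normal_one_sided_limits(x, alpha, rho):
    """Return (L_N1, U_N1): (L_N1, inf) and (-inf, U_N1) each cover at least
    100*rho% of the population with 100*(1 - alpha)% confidence."""
    n = len(x)
    xbar, s = statistics.fmean(x), statistics.stdev(x)
    k = k_normal_one_sided(n, alpha, rho)
    return xbar - k * s, xbar + k * s


print(round(k_normal_one_sided(10, 0.05, 0.95), 3))  # approximately 2.911
```

Because the factor depends only on n, α and ρ, it can be computed once and reused for every sample of the same size.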

A two-sided [100(1−α)%]/[100ρ%] tolerance interval under the normal distribution, $(L_{N,2}, U_{N,2})$, is
$$\begin{array}{@{}rcl@{}} L_{N,2} & = & {\hat \mu}_{N} - k_{N,2,\alpha,\rho} {\hat \sigma}_{N} {\text{ and }} U_{N,2} = {\hat \mu}_{N} + k_{N,2,\alpha,\rho} {\hat \sigma}_{N}, \end{array} $$
(7)

where $k_{N,2,\alpha,\rho}$ can be obtained following Howe (1969) (see also Guenther (2007)) as
$$\begin{array}{@{}rcl@{}} k_{N,2,\alpha,\rho} & = & \left(z_{\frac{1+\rho}{2}}\sqrt{1+n^{-1}} \right) \sqrt{\frac{n-1}{\chi^{2}_{n-1;\alpha}}} \sqrt{1+\frac{n-3-\chi^{2}_{n-1;\alpha}}{2(n+1)^{2}}}, \end{array} $$

and $\chi ^{2}_{d;p}$ is the p-th quantile (lower 100p-th percentile) of the chi-square distribution with d degrees of freedom. Note that the two-sided tolerance interval with tolerance factor $k_{N,2,\alpha,\rho}$ is an approximation. For other ways to approximate the tolerance factor, one can refer to Section 2.3 of Krishnamoorthy and Mathew (2009).
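The closed form for $k_{N,2,\alpha,\rho}$ translates directly into code. The sketch below assumes SciPy is available and evaluates the formula exactly as written, with $z_{(1+\rho)/2}$ and $\chi^{2}_{n-1;\alpha}$ taken as lower-tail quantiles; the function name is our own. For n = 10 and a 95%/95% interval it gives a factor of roughly 3.4, so the interval is about x̄ ± 3.4s.

```python
import math

from scipy.stats import chi2, norm


def k_normal_two_sided_howe(n, alpha, rho):
    """Approximate two-sided factor k_{N,2,alpha,rho} (Howe's approximation
    with the extra square-root correction term shown above)."""
    z = norm.ppf((1 + rho) / 2)                  # z_{(1+rho)/2}
    q = chi2.ppf(alpha, n - 1)                   # chi^2_{n-1;alpha}, lower tail
    base = z * math.sqrt(1 + 1 / n) * math.sqrt((n - 1) / q)
    correction = math.sqrt(1 + (n - 3 - q) / (2 * (n + 1) ** 2))
    return base * correction


for n in (10, 30, 100):
    print(n, round(k_normal_two_sided_howe(n, 0.05, 0.95), 3))
```

As n grows, the factor decreases toward $z_{(1+\rho)/2}$ (about 1.96 for ρ = 0.95), reflecting the vanishing uncertainty in $\hat\mu_N$ and $\hat\sigma_N$.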
Cauchy distribution: The PDF and CDF of a Cauchy distribution with location parameter μ_C and scale parameter σ_C are, respectively,
$$\begin{array}{*{20}l} f_{C}(x;\mu_{C},\sigma_{C}) = & \frac{1}{\pi\sigma_{C}} \left[\frac{\sigma_{C}^{2}}{(x-\mu_{C})^{2}+\sigma_{C}^{2}}\right] \end{array} $$
(8)

$$\begin{array}{*{20}l} {\text{and }} F_{C}(x; \mu_{C},\sigma_{C}) = & \int_{- \infty}^{x} \frac{1}{\pi\sigma_{C}} \left[\frac{\sigma_{C}^{2}}{(t-\mu_{C})^{2}+\sigma_{C}^{2}}\right] \mathrm{d}t, \end{array} $$
(9)

where −∞<x<∞,−∞<μ_C<∞ and σ_C>0.

Based on a random sample of size n, X₁,X₂,…,X_n, from the Cauchy distribution with PDF and CDF in Eqs. (8) and (9), respectively, let ${\hat \mu }_{C}$ and ${\hat \sigma }_{C}$ be the maximum likelihood estimates of μ_C and σ_C. Then the one-sided [100(1−α)%]/[100ρ%] tolerance intervals are $(L_{C,1}, \infty)$ and $(-\infty, U_{C,1})$ with
$$\begin{array}{@{}rcl@{}} L_{C,1} = {\hat \mu}_{C} - k_{C,\alpha,\rho} {\hat \sigma}_{C} {\text{ and }} U_{C,1} = {\hat \mu}_{C} + k_{C,\alpha,\rho} {\hat \sigma}_{C}, \end{array} $$

where $k_{C,\alpha,\rho}$ is defined as
$$\begin{array}{@{}rcl@{}} k_{C,\alpha,\rho} & = & \frac{z_{1-\alpha}}{\sqrt{n}}\sqrt{2+2[F^{-1}_{C}(1 - \rho;\mu_{C} = 0, \sigma_{C} = 1)]^{2}} \\ & & - F^{-1}_{C}(1 - \rho; \mu_{C} = 0, \sigma_{C} = 1), \end{array} $$

with $F^{-1}_{C}(p; \mu _{C} = 0, \sigma _{C} = 1) = \tan[\pi(p - 1/2)]$, p ∈ (0, 1), the quantile function of the standard Cauchy distribution. An approximate two-sided [100(1−α)%]/[100ρ%] tolerance interval, $(L_{C,2}, U_{C,2})$, is given by
$$\begin{array}{@{}rcl@{}} L_{C,2} & = & {\hat \mu}_{C} - k_{C,\alpha/2,(\rho+1)/2} {\hat \sigma}_{C} {\text{ and }} U_{C,2} = {\hat \mu}_{C} + k_{C,\alpha/2,(\rho+1)/2} {\hat \sigma}_{C}. \end{array} $$
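A direct transcription of $k_{C,\alpha,\rho}$ is short once the Cauchy quantile function is available. In the sketch below (standard library only; names are illustrative), the term 2 + 2q², with q = $F^{-1}_{C}(1-\rho)$, is consistent with the large-sample variances 2σ²_C/n of the two asymptotically uncorrelated Cauchy maximum likelihood estimators. The estimates $\hat\mu_C$ and $\hat\sigma_C$ are assumed to be computed elsewhere.

```python
import math
from statistics import NormalDist


def cauchy_quantile(p):
    """Standard Cauchy quantile function F_C^{-1}(p) = tan(pi * (p - 1/2))."""
    return math.tan(math.pi * (p - 0.5))


def k_cauchy(n, alpha, rho):
    """One-sided factor k_{C,alpha,rho}. For the two-sided interval, call
    k_cauchy(n, alpha / 2, (rho + 1) / 2) as in the formula above."""
    z = NormalDist().inv_cdf(1 - alpha)        # z_{1-alpha}
    q = cauchy_quantile(1 - rho)               # negative when rho > 1/2
    return z / math.sqrt(n) * math.sqrt(2 + 2 * q ** 2) - q


# Limits for given MLEs: mu_hat - k * sigma_hat and mu_hat + k * sigma_hat.
print(k_cauchy(20, 0.05, 0.90), k_cauchy(20, 0.025, 0.95))
```

As n grows, $k_{C,\alpha,\rho}$ tends to $F^{-1}_{C}(\rho)$, which is large because of the heavy Cauchy tails (about 3.08 for ρ = 0.9).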
Logistic distribution: The PDF and CDF of a logistic distribution with location parameter μ_L and scale parameter σ_L are, respectively,
$$\begin{array}{*{20}l} f_{L}(x; \mu_{L}, \sigma_{L}) = & \frac{\exp \left[- \frac{x-\mu_{L}}{\sigma_{L}}\right]}{\sigma_{L}\left[1+\exp \left(- \frac{x-\mu_{L}}{\sigma_{L}}\right) \right]^{2}}, \end{array} $$
(10)

$$\begin{array}{*{20}l} {\text{and }} F_{L}(x;\mu_{L}, \sigma_{L}) = & \frac{1}{1+ \exp \left(- \frac{x-\mu_{L}}{\sigma_{L}} \right)}, \end{array} $$
(11)

where −∞<x<∞,−∞<μ_L<∞ and σ_L>0.

Based on a random sample of size n, X₁,X₂,…,X_n, from the logistic distribution with PDF and CDF in Eqs. (10) and (11), respectively, let ${\hat \mu }_{L}$ and ${\hat \sigma }_{L}$ be the maximum likelihood estimates of μ_L and σ_L. Then the one-sided [100(1−α)%]/[100ρ%] tolerance intervals are $(L_{L,1}, \infty)$ and $(-\infty, U_{L,1})$ with
$$\begin{array}{@{}rcl@{}} L_{L,1} &=& {\hat \mu}_{L} - k_{L,1, \alpha, \rho} {\hat \sigma}_{L} {\text{ and }} U_{L,1} = {\hat \mu}_{L} + k_{L,2, \alpha, \rho} {\hat \sigma}_{L}, \end{array} $$

where $k_{L,1,\alpha,\rho}$ and $k_{L,2,\alpha,\rho}$ can be obtained as
$$\begin{array}{@{}rcl@{}} k_{L,1, \alpha, \rho} & \approx & \frac{t_{1, \alpha, \rho} + \sqrt{t^{2}_{1, \alpha, \rho} - u_{\alpha, \rho}v_{\alpha}}}{v_{\alpha}}, \\ k_{L,2, \alpha, \rho} & \approx & \frac{t_{2, \alpha, \rho} + \sqrt{t^{2}_{2,\alpha, \rho} - u_{\alpha, \rho} v_{\alpha}}}{v_{\alpha}}, \end{array} $$

and
$$\begin{array}{@{}rcl@{}} t_{1, \alpha, \rho} & = & {F}^{-1}_{L}(\rho;\mu = 0, \sigma = 1) - \hat\sigma_{12} {z}^{2}_{1 - \alpha}, \\ t_{2, \alpha, \rho} & = & {F}^{-1}_{L}(\rho;\mu = 0, \sigma = 1) + \hat\sigma_{12} {z}^{2}_{1 - \alpha}, \\ u_{\alpha, \rho} & = & [{F}^{-1}_{L}(\rho;\mu = 0, \sigma = 1)]^{2} - \hat{\sigma}^{2}_{1}{z}^{2}_{1 - \alpha}, \\ v_{\alpha} & = & 1 - {\hat\sigma}^{2}_{2}{z}^{2}_{1 - \alpha}, \end{array} $$

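where ${F}^{-1}_{L}(p;\mu _{L} = 0, \sigma _{L} = 1) = \ln [p/(1-p)]$, p ∈ (0, 1), is the standard logistic quantile function, $\hat {\sigma }^{2}_{1}$ and $\hat {\sigma }^{2}_{2}$ are the variances of ${\hat \mu }_{L}$ and ${\hat \sigma }_{L}$, respectively, and $\hat \sigma _{12}$ is the covariance of ${\hat \mu }_{L}$ and ${\hat \sigma }_{L}$. Because they are combined with the unitless quantile $F^{-1}_{L}$, these variances and the covariance are expressed relative to $\sigma_{L}^{2}$, i.e., they are those of ${\hat \mu }_{L}/\sigma_{L}$ and ${\hat \sigma }_{L}/\sigma_{L}$.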

An approximate two-sided [100(1−α)%]/[100ρ%] tolerance interval, $(L_{L,2}, U_{L,2})$, is given by
$$\begin{array}{@{}rcl@{}} L_{L,2} &=& {\hat \mu}_{L} - k_{L,1,\alpha/2,(\rho+1)/2} {\hat \sigma}_{L} {\text{ and }} U_{L,2} = {\hat \mu}_{L} + k_{L,2,\alpha/2,(\rho+1)/2} {\hat \sigma}_{L}. \end{array} $$

Note that tolerance intervals under the logistic distribution cannot be calculated if $t^{2}_{1, \alpha, \rho } - u_{\alpha, \rho } v_{\alpha } < 0$ or $t^{2}_{2, \alpha, \rho } - u_{\alpha, \rho } v_{\alpha } < 0$.
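The logistic factors require the variances and covariance of the estimators, which the text does not pin down further. The sketch below treats them as inputs on the standardized scale and, as a hypothetical plug-in, offers the large-sample values from the logistic Fisher information: Var($\hat\mu_L$)/σ²_L ≈ 3/n, Var($\hat\sigma_L$)/σ²_L ≈ 9/[(3 + π²)n], and zero covariance. Observed-information estimates from a fitted model could be used instead. The function returns `None` when a discriminant is negative, mirroring the note above; all names are illustrative.

```python
import math
from statistics import NormalDist


def logistic_quantile(p):
    """Standard logistic quantile F_L^{-1}(p) = ln[p / (1 - p)]."""
    return math.log(p / (1 - p))


def k_logistic(alpha, rho, v11, v22, v12):
    """Return (k_L1, k_L2) from the approximations above, or None if either
    discriminant t^2 - u*v is negative (interval cannot be computed).

    v11, v22, v12: variance of mu_hat, variance of sigma_hat and their
    covariance, all divided by sigma_L^2 so they are unitless.
    """
    z2 = NormalDist().inv_cdf(1 - alpha) ** 2  # z_{1-alpha}^2
    q = logistic_quantile(rho)
    t1, t2 = q - v12 * z2, q + v12 * z2
    u = q ** 2 - v11 * z2
    v = 1 - v22 * z2
    ks = []
    for t in (t1, t2):
        disc = t ** 2 - u * v
        if disc < 0:
            return None
        ks.append((t + math.sqrt(disc)) / v)
    return tuple(ks)


def asymptotic_logistic_variances(n):
    """Hypothetical plug-in: inverse Fisher information of the logistic MLEs
    on the standardized scale (3/n, 9/((3 + pi^2) n), covariance 0)."""
    return 3 / n, 9 / ((3 + math.pi ** 2) * n), 0.0


print(k_logistic(0.05, 0.90, *asymptotic_logistic_variances(20)))
```

With zero covariance the two factors coincide; a nonzero $\hat\sigma_{12}$ makes the lower and upper limits asymmetric about $\hat\mu_L$.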
Laplace distribution: The PDF and CDF of a Laplace distribution with location parameter μ_P and scale parameter σ_P are, respectively,
$$\begin{array}{*{20}l} f_{P}(x;\mu_{P},\sigma_{P}) = & \frac{\exp\left(-\frac{|x- \mu_{P}|}{\sigma_{P}}\right)}{2 \sigma_{P}}, \end{array} $$
(12)

$$\begin{array}{*{20}l} {\text{and }} F_{P}(x; \mu_{P}, \sigma_{P}) = &\left\{\begin{array}{ll} \frac{1}{2} \exp\left(\frac{x - \mu_{P}}{\sigma_{P}} \right) & {\text{if }} x \leq \mu_{P}, \cr 1 - \frac{1}{2} \exp\left(- \frac{x - \mu_{P}}{\sigma_{P}} \right) & {\text{if }} x > \mu_{P}, \cr \end{array}\right. \end{array} $$
(13)

where −∞<x<∞,−∞<μ_P<∞ and σ_P>0.

Based on a random sample of size n, X₁,X₂,…,X_n, from the Laplace distribution, let ${\hat \mu }_{P}$ and ${\hat \sigma }_{P}$ be the maximum likelihood estimates of μ_P and σ_P. Then the one-sided [100(1−α)%]/[100ρ%] tolerance intervals are $(L_{P,1}, \infty)$ and $(-\infty, U_{P,1})$ with
$$\begin{array}{@{}rcl@{}} L_{P,1} & = & {\hat \mu}_{P} - k_{P, \alpha, \rho} {\hat \sigma}_{P} {\text{ and }} U_{P,1} = {\hat \mu}_{P} + k_{P, \alpha, \rho} {\hat \sigma}_{P}, \end{array} $$

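where
$$\begin{array}{@{}rcl@{}} k_{P, \alpha, \rho} \approx \frac{-n \ln[2(1 - \rho)] + z_{1-\alpha}\sqrt{n\left(1 + \{\ln[2(1 - \rho)]\}^{2}\right) - z_{1-\alpha}^{2}}}{n - z_{1-\alpha}^{2}}. \end{array} $$
For ρ ≥ 1/2, −ln[2(1−ρ)] is the ρ-th quantile of the standard Laplace distribution, so $k_{P,\alpha,\rho}$ approaches this quantile as n → ∞.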

An approximate two-sided [100(1−α)%]/[100ρ%] tolerance interval, $(L_{P,2}, U_{P,2})$, is given by
$$\begin{array}{@{}rcl@{}} L_{P,2} &=& {\hat \mu}_{P} - k_{P,\alpha/2,(\rho+1)/2} {\hat \sigma}_{P} {\text{ and }} U_{P,2} = {\hat \mu}_{P} + k_{P,\alpha/2,(\rho+1)/2} {\hat \sigma}_{P}. \end{array} $$
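The Laplace factor $k_{P,\alpha,\rho}$ is the positive root of (n − z²)k² − 2nqk + nq² − z² = 0, with q = −ln[2(1 − ρ)] and z = $z_{1-\alpha}$; this is what a normal approximation gives when $\hat\mu_P/\sigma_P$ and $\hat\sigma_P/\sigma_P$ each have variance 1/n and are uncorrelated. The sketch below (standard library only; names are illustrative) computes $k_{P,\alpha,\rho}$ from that closed form and uses the Laplace maximum likelihood estimates, namely the sample median and the mean absolute deviation about it.

```python
import math
import statistics
from statistics import NormalDist


def k_laplace(n, alpha, rho):
    """Approximate one-sided factor k_{P,alpha,rho} (closed form above)."""
    z = NormalDist().inv_cdf(1 - alpha)       # z_{1-alpha}
    if n <= z ** 2:
        raise ValueError("need n > z_{1-alpha}^2")
    q = -math.log(2 * (1 - rho))              # standard Laplace rho-quantile
    root = math.sqrt(n * (1 + q ** 2) - z ** 2)
    return (n * q + z * root) / (n - z ** 2)


def laplace_mle(x):
    """Laplace MLEs: sample median and mean absolute deviation about it."""
    mu_hat = statistics.median(x)
    sigma_hat = statistics.fmean(abs(xi - mu_hat) for xi in x)
    return mu_hat, sigma_hat


def laplace_two_sided(x, alpha, rho):
    """Two-sided interval using alpha/2 and (rho + 1)/2, as described above."""
    mu_hat, sigma_hat = laplace_mle(x)
    k = k_laplace(len(x), alpha / 2, (rho + 1) / 2)
    return mu_hat - k * sigma_hat, mu_hat + k * sigma_hat


data = [1.2, 0.4, -0.3, 2.5, 0.9, 1.1, -1.0, 0.7, 1.6, 0.2]
print(k_laplace(20, 0.05, 0.90), laplace_two_sided(data, 0.05, 0.90))
```

The guard n > z² keeps both the denominator and the square-root argument positive.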
Gamma distribution: The PDF and CDF of the gamma distribution with parameters θ_G and β_G are, respectively,
$$\begin{array}{@{}rcl@{}} f_{G}(x;\theta_{G},\beta_{G}) = \frac{x^{\theta_{G}-1}\exp\left(-x/\beta_{G}\right)}{\beta_{G}^{\theta_{G}}\Gamma(\theta_{G})} \end{array} $$
(14)

and
$$\begin{array}{@{}rcl@{}} F_{G}(x;\theta_{G},\beta_{G}) = \int_{0}^{x} \frac{t^{\theta_{G}-1}\exp\left(-t/\beta_{G}\right)}{\beta_{G}^{\theta_{G}}\Gamma(\theta_{G})} \mathrm{d}t, \end{array} $$
(15)

where x>0, θ_G>0 is the shape parameter, β_G>0 is the scale parameter, and $\Gamma (a) = \int _{0}^{\infty } t^{a - 1} e^{-t}\, dt$ is the gamma function.

For the gamma distribution, tolerance intervals can be obtained through the normal tolerance interval by a transformation of the random variable (Krishnamoorthy et al. 2008). If X is a gamma random variable with PDF and CDF in Eqs. (14) and (15), then $X^{1/3}$ can be approximated by a normal distribution with mean μ_N and variance $\sigma ^{2}_{N}$ given by
$$\begin{array}{@{}rcl@{}} \mu_{N} = \frac{\beta_{G}^{1/3}\Gamma(\theta_{G}+1/3)}{\Gamma(\theta_{G})} {\text{ and }} \sigma^{2}_{N} = \frac{\beta^{2/3}_{G}\Gamma(\theta_{G}+2/3)}{\Gamma(\theta_{G})} - \mu_{N}^{2}. \end{array} $$
(16)

Based on a random sample X₁,X₂,…,X_n from the gamma distribution, we first obtain the maximum likelihood estimates of θ_G and β_G, denoted by ${\hat \theta }_{G}$ and ${\hat \beta }_{G}$, respectively. Then we substitute ${\hat \theta }_{G}$ and ${\hat \beta }_{G}$ for θ_G and β_G in Eq. (16) to obtain ${\hat \mu }_{N}$ and ${\hat \sigma }_{N}^{2}$. Next, the one-sided and two-sided normal tolerance limits (with lower and upper limits denoted by L_N and U_N, respectively) are obtained from Eqs. (6) and (7), respectively, based on ${\hat \mu }_{N}$ and ${\hat \sigma }_{N}^{2}$. The lower and upper [100(1−α)%]/[100ρ%] tolerance limits based on the gamma distribution are then
$$\begin{array}{@{}rcl@{}} L_{G} = L^{3}_{N} {\text{ and }} U_{G} = U^{3}_{N}. \end{array} $$
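Eq. (16) and the back-transformation are easy to code with log-gamma functions. The sketch below (standard library only; names are illustrative) assumes ${\hat \theta}_{G}$ and ${\hat \beta}_{G}$ have already been computed and that a normal factor k from Eq. (6) or (7) is supplied. Clipping a negative normal lower limit at zero before cubing is our choice, not part of the text. The last lines compare Eq. (16) with simulated cube roots of gamma draws.

```python
import math
import random
import statistics


def cube_root_normal_params(theta, beta):
    """mu_N and sigma_N of Eq. (16): mean and standard deviation of X**(1/3)
    for X ~ Gamma(shape=theta, scale=beta), computed with log-gamma."""
    log_g = math.lgamma(theta)
    m1 = math.exp(math.log(beta) / 3 + math.lgamma(theta + 1 / 3) - log_g)
    m2 = math.exp(2 * math.log(beta) / 3 + math.lgamma(theta + 2 / 3) - log_g)
    return m1, math.sqrt(m2 - m1 ** 2)


def gamma_limits(theta_hat, beta_hat, k):
    """Cube the normal limits mu_N -/+ k*sigma_N to get (L_G, U_G).

    k is a normal tolerance factor from Eq. (6) or Eq. (7). Clipping a
    negative normal lower limit at 0 is our own choice (X is positive).
    """
    mu_n, sigma_n = cube_root_normal_params(theta_hat, beta_hat)
    lower_n, upper_n = mu_n - k * sigma_n, mu_n + k * sigma_n
    return max(lower_n, 0.0) ** 3, upper_n ** 3


# Check Eq. (16) against simulated cube roots (gammavariate takes shape, scale).
rng = random.Random(1)
draws = [rng.gammavariate(2.0, 3.0) ** (1 / 3) for _ in range(100_000)]
print(cube_root_normal_params(2.0, 3.0))
print(statistics.fmean(draws), statistics.stdev(draws))
```

Working with `math.lgamma` rather than ratios of `math.gamma` values avoids overflow for large shape parameters.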
Weibull distribution: The PDF and CDF of the Weibull distribution with parameters β_W and θ_W are, respectively,
$$\begin{array}{@{}rcl@{}} f_{W}(x;\beta_{W},\theta_{W}) = \frac{\theta_{W}}{\beta_{W}} \left(\frac{x}{\beta_{W}} \right)^{\theta_{W} -1} \exp\left[- \left(\frac{x}{\beta_{W}}\right)^{\theta_{W}}\right] \end{array} $$
(17)

and
$$\begin{array}{@{}rcl@{}} F_{W}(x;\beta_{W},\theta_{W}) = 1 - \exp\left[- \left(\frac{x}{\beta_{W}}\right)^{\theta_{W}}\right], \end{array} $$
(18)

where x>0,θ_W>0 is the shape parameter and β_W>0 is the scale parameter.

Based on a random sample X₁,X₂,…,X_n from the Weibull distribution, we first obtain the maximum likelihood estimates of θ_W and β_W, denoted by ${\hat \theta }_{W}$ and ${\hat \beta }_{W}$, respectively. Since ln X follows a smallest extreme value distribution with location ln β_W and scale 1/θ_W, the lower and upper one-sided [100(1−α)%]/[100ρ%] tolerance limits can be obtained as
$$\begin{array}{@{}rcl@{}} L_{W} & = & \exp\left[\ln(\hat{\beta}_{W}) - \frac{\hat{\theta}^{-1}_{W}\, t^{*}_{n-1;1-\alpha}\left(-\sqrt{n} \lambda_{\rho}\right)}{\sqrt{n-1}} \right], \\ U_{W} & = & \exp\left[\ln(\hat{\beta}_{W}) - \frac{\hat{\theta}^{-1}_{W}\, t^{*}_{n-1;\alpha}\left(-\sqrt{n} \lambda_{1-\rho}\right)}{\sqrt{n-1}} \right], \end{array} $$

where λ_p = ln(−ln p) and $t^{*}_{d;p}(\omega)$ is the p-th quantile (lower 100p-th percentile) of the noncentral t-distribution with d degrees of freedom and noncentrality parameter ω. A two-sided tolerance interval based on the Weibull distribution can be obtained by replacing α by α/2 and ρ by (ρ+1)/2 in the above formulas for L_W and U_W.
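A sketch of the Weibull limits, assuming SciPy is available: `weibull_min.fit` with `floc=0` gives the maximum likelihood estimates of the shape θ_W and scale β_W, and `nct.ppf` supplies the lower-tail noncentral t quantiles used in the formulas for L_W and U_W. The function names and the simulated sample are illustrative.

```python
import math

from scipy.stats import nct, weibull_min


def weibull_one_sided_limits(x, alpha, rho):
    """(L_W, U_W) from the formulas above; t*_{d;p} is a lower-tail quantile."""
    n = len(x)
    # MLEs of the shape theta_W and scale beta_W, with location fixed at 0.
    theta_hat, _, beta_hat = weibull_min.fit(x, floc=0)
    lam_rho = math.log(-math.log(rho))          # lambda_rho
    lam_1m_rho = math.log(-math.log(1 - rho))   # lambda_{1-rho}
    t_low = nct.ppf(1 - alpha, n - 1, -math.sqrt(n) * lam_rho)
    t_up = nct.ppf(alpha, n - 1, -math.sqrt(n) * lam_1m_rho)
    denom = theta_hat * math.sqrt(n - 1)        # theta_hat * sqrt(n - 1)
    lower = math.exp(math.log(beta_hat) - t_low / denom)
    upper = math.exp(math.log(beta_hat) - t_up / denom)
    return lower, upper


def weibull_two_sided(x, alpha, rho):
    """Two-sided version: replace alpha by alpha/2 and rho by (rho + 1)/2."""
    return weibull_one_sided_limits(x, alpha / 2, (rho + 1) / 2)


sample = weibull_min.rvs(2.0, scale=3.0, size=50, random_state=1)
print(weibull_one_sided_limits(sample, 0.05, 0.90))
print(weibull_two_sided(sample, 0.05, 0.90))
```

Working on the log scale keeps both limits positive automatically, since they are exponentials of the extreme value limits.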
Lognormal distribution: The PDF and CDF of the lognormal distribution with parameters μ_LN and σ_LN are, respectively,
$$\begin{array}{@{}rcl@{}} f_{LN}\left(x;\mu_{LN},\sigma_{LN}\right) = \frac{1}{x\sigma_{LN} \sqrt{2\pi}}\exp\left[-\frac{\left(\ln x- \mu_{LN}\right)^{2}}{2\sigma^{2}_{LN}}\right], \end{array} $$
(19)

and
$$\begin{array}{@{}rcl@{}} F_{LN}(x;\mu_{LN},\sigma_{LN}) = \int_{0}^{x} \frac{1}{t\sigma_{LN} \sqrt{2\pi}}\exp\left[-\frac{(\ln t- \mu_{LN})^{2}}{2\sigma^{2}_{LN}}\right] \mathrm{d}t, \end{array} $$
(20)

where x>0, σ_LN>0 is the shape parameter (the standard deviation of ln X), and μ_LN∈(−∞,∞) is the mean of ln X; exp(μ_LN) is the scale parameter and the median of the distribution.

Based on a random sample X₁,X₂,…,X_n from the lognormal distribution, we can obtain the maximum likelihood estimates of μ_LN and σ_LN, denoted by ${\hat \mu }_{LN}$ and ${\hat \sigma }_{LN}$, respectively. Because Y = ln X follows a normal distribution when X follows a lognormal distribution, the one-sided and two-sided normal tolerance limits (with lower and upper limits denoted by L_N and U_N, respectively) can be obtained from Eqs. (6) and (7), respectively, based on ${\hat \mu }_{LN}$ and ${\hat \sigma }_{LN}^{2}$. The lower and upper [100(1−α)%]/[100ρ%] tolerance limits based on the lognormal distribution are then
$$\begin{array}{@{}rcl@{}} L_{LN} = \exp(L_{N}) {\text{ and }} U_{LN} = \exp(U_{N}). \end{array} $$
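For the lognormal case the procedure is: take logs, compute a normal interval, and exponentiate. The sketch below assumes SciPy is available and applies the two-sided normal factor of Eq. (7) to ln x. It uses the sample mean and sample standard deviation of the log data, as Eqs. (6) and (7) do for normal data; plugging in the maximum likelihood estimate of σ_LN (divisor n) instead would give a slightly narrower interval. The function name and data are illustrative.

```python
import math
import statistics

from scipy.stats import chi2, norm


def lognormal_two_sided(x, alpha, rho):
    """Two-sided lognormal tolerance interval: log-transform, apply the
    normal factor of Eq. (7), then exponentiate the limits."""
    y = [math.log(v) for v in x]                 # Y = ln X is normal
    n = len(y)
    ybar, s = statistics.fmean(y), statistics.stdev(y)
    z = norm.ppf((1 + rho) / 2)                  # z_{(1+rho)/2}
    q = chi2.ppf(alpha, n - 1)                   # chi^2_{n-1;alpha}, lower tail
    k = (z * math.sqrt(1 + 1 / n) * math.sqrt((n - 1) / q)
         * math.sqrt(1 + (n - 3 - q) / (2 * (n + 1) ** 2)))
    return math.exp(ybar - k * s), math.exp(ybar + k * s)


data = [2.1, 3.4, 1.8, 5.0, 2.7, 3.9, 2.2, 4.4, 3.1, 2.6]
print(lognormal_two_sided(data, 0.05, 0.95))
```

Because exp is monotone, the coverage statement for ln X carries over unchanged to X, and the resulting interval is asymmetric about the geometric mean.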

Statistical software packages for tolerance intervals

3.1 Available statistical software packages

There are several statistical software packages that can provide the computation of tolerance intervals. In this subsection, we discuss several commonly used statistical software packages, including JMP (JMP Version 16, 2021), Minitab (Minitab 18 Statistical Software, 2017), NCSS (NCSS 2021 Statistical Software, 2021), Python (Python Core Team, 2015), R (R Core Team, 2020), and SAS (SAS Institute Inc, 2014), that provide computational procedures to calculate tolerance intervals based on various distributions.

All six software packages discussed here provide computational procedures for normal-based and nonparametric tolerance intervals. In R (R Core Team 2020), the package tolerance (Young 2010; 2014) provides computational procedures for tolerance intervals under more than 20 different distributions. Minitab (Minitab 18 Statistical Software 2017) provides tolerance intervals for 10 different distributions under "Quality Tools". In Python (Python Core Team 2015), the toleranceinterval package provides nonparametric tolerance intervals and parametric tolerance intervals for the normal and lognormal distributions. In SAS (SAS Institute Inc 2014), the CAPABILITY procedure (PROC CAPABILITY) provides normal-based and nonparametric tolerance intervals. The statistical distributions and procedures available in JMP, Minitab, NCSS, Python, and R are summarized in Table 1.

Table 1 Procedures available in commonly used statistical software packages
