Exact distributions of statistics for making inferences on mixed models under the default covariance structure

Weerahandi, Samaradasa; Yu, Ching-Ray

doi:10.1186/s40488-020-00105-w

Methodology
Open access
Published: 17 August 2020

Exact distributions of statistics for making inferences on mixed models under the default covariance structure

Samaradasa Weerahandi¹ &
Ching-Ray Yu²

Journal of Statistical Distributions and Applications volume 7, Article number: 4 (2020) Cite this article

2371 Accesses
4 Citations
Metrics details

Abstract

At this juncture when mixed models are heavily employed in applications ranging from clinical research to business analytics, the purpose of this article is to extend the exact distributional result of Wald (Ann. Math. Stat. 18: 586–589, 1947) to handle models involving a number of variance components.Due to the unavailability of exact distributional results for underlying statistics, currently available methods provide small group/sample inference only for balanced ANOVA models or simple regression models. The exact distributional results developed in this article should prove useful in making inferences by such methods as parametric bootstrap, fiducial, and generalized p-value approach, when there are a number of variance components to deal with.

Introduction

Ever since Wald (1947) published his celebrated paper, basically there has been only relaxation of the assumptions (cf. Seely and El-Bassiounni (1983)) made in that paper involving a simple mixed model. Therefore, the purpose of this article is to extend Wald’s result to develop additional exact distribution theory to tackle various inference problems in mixed models having the compound symmetric covariance structure.

The variance components in mixed models (cf. Searle et al. (2006)) play an important role in such applications as clinical analytics, business analytics, and industrial process control (cf. Hamada and Weerahandi (2000), and Wu and Hamada (2009)), a whole area that classical approach to inference fails to provide small sample inference. However, the solutions they provided are valid only for ANOVA type problems.

Mixed-effects models first became popular due to the ease of handling covariance structures associated with longitudinal data, and then due to the well known advantage (cf. Henderson (1975) and Robinson (1991)) of BLUPs (Best linear unbiased predictors) over the LSEs (Least squares estimators) when a practitioner has to make inferences on a number of groups (e.g. markets, patient groups, and treatment levels) in a grouping structure. In fact, in Corporate America the BLUP has replaced the LSE as the most widely used statistical technique in advanced business analytics.

In mixed-effects regression models, the MLE (Maximum likelihood estimator) based methods, are basically the only class of methods available for making inferences on models involving a number of random effects. But such solutions are asymptotic methods requiring, not only the sample sizes to be large, but also the number of levels in a grouping structure to be large, an unreasonable assumption. Therefore, in this article we will also briefly develop tests and interval estimates for variance components and their functions.

The model and problems

To describe the underlying problems more precisely, consider a regression model in a mixed model setting with a number of random effects in a grouping structure,

$$ \mathbf{y} =\mathbf{X} \boldmath{\beta} + \mathbf{Z_{1} u_{1} + Z_{2} u_{2} + \cdots +Z_{J} u_{J} } +\boldsymbol{\epsilon}, $$

(1)

where X is an N×k matrix formed by a set of covariates corresponding to k fixed effects β,Z_j are N×k_j matrices formed by covariates corresponding to random effects u_j of j^th factor with k_j levels. We make the usual assumption that $\boldsymbol {\epsilon } \sim N\left (0, \sigma _{e}^{2} \mathbf {I}_{\mathbf {N}}\right)$. Also using the most widely used compound symmetric covariance structure, which is also the default setting in statistical software packages, we also make the assumption $\mathbf {u_{j}} \sim N\left (\mathbf {0}, \sigma _{\mathbf {j}}^{2} \mathbf {I}_{\mathbf {j}}\right), j=1, \cdots, J$, and that they are distributed independently of each other and the error terms. It should be noted that the vector of fixed effects β contain the overall mean effect of each of the factors having random effects, which we denote as β_zj, j=1,⋯,J.

The model can also be written in familiar compact form as

$$ \mathbf{y} =\mathbf{X} \boldmath{\beta} + \mathbf{Z u} +\boldsymbol{\epsilon}, $$

(2)

where Z=(Z₁,⋯,Z_J) and u=(u₁,⋯,u_J)^′. Henderson (1975) provided the formulas for the BLUEs (Best linear unbiased estimators) of the fixed effects and the BLUPs of random effects when the variances are known. Robinson (1991) showed that various derivations available in the literature for these quantities all lead to Henderson’s formulas

$$ \mathbf{X}' \mathbf{X} \hat{\boldsymbol{\beta}} + \mathbf{X}' \mathbf{Z} \tilde{\mathbf{u}} =\mathbf{X}' \mathbf{y}, $$

(3)

$$\mathbf{Z}' \mathbf{X} \hat{\boldsymbol{\beta}} + (\mathbf{Z}' \mathbf{Z} + \Sigma^{-1}) \tilde{\mathbf{u}} =\mathbf{Z}' \mathbf{y}, $$

where Σ=Var(u) and $\tilde {\mathbf {u}}$ is the BLUP of u. Under the compound symmetric covariance structure, the default in statistical software packages, $\Sigma _{\rho } = \Sigma /\sigma _{e}^{2} = diag(\rho _{1} \mathbf {I_{1}},\rho _{2} \mathbf {I_{2}}, \cdots, \rho _{\mathbf {J}} \mathbf {I}_{\mathbf {J}}) $, where $\rho _{j}= \sigma _{j}^{2} /\sigma _{e}^{2}$. The equations in (3) can be solved explicitly for $\hat {\boldsymbol {\beta }}$ and $\tilde {\mathbf {u}}$. It is evident from (2) and was formally proved by Henderson et al. (1975) that $\hat {\boldsymbol {\beta }}$ is actually the GLSE (Generalized least squares estimator)

$$ \hat{\boldsymbol{\beta}}= (\mathbf{X}' \Gamma^{-1} \mathbf{X})^{-1} X' \Gamma^{-1} \mathbf{y}, \text{ where } \Gamma=\mathbf{I_{N}}+\mathbf{Z} \Sigma_{\rho} \mathbf{Z}'. $$

(4)

Then, we can easily obtain the explicit solution for the random part of the predictor as

$$ \tilde{\mathbf{u}} = (\mathbf{Z}' \mathbf{Z} + \Sigma_{\rho}^{-1})^{-1} \mathbf{Z}' (\mathbf{y} - \mathbf{X} \widehat{\boldsymbol{\beta}}). $$

(5)

2.1 The issue with available results

When we go about making inferences in applications involving a number of variance components, we encounter the fact that the variance components are unknown parameters, and hence need to be tackled before we can carry out any kind of inference. The most widely used method is to estimate them by the REML (Restricted maximum likelihood) or the ML (Maximum likelihood), and then carry out tests and interval estimates using their asymptotic distributions. It is well known that (cf. Weerahandi (2004) and Gamage et al. (2013)), due to using such asymptotic assumptions, which require not only the sample sizes to be large, but also the number of factor levels in a grouping structure to be large, an unreasonable assumption in practice.

Therefore, the purpose of this article is to develop inference methods without relying on asymptotic distributional results. Since variance components are important in their own merit in a variety of applications we will undertake one application in the area of industrial process control.

Exact distributions of LSEs

To solve problems involving variance components, regardless one takes the GPQ (Generalized pivotal quantity) based approach, parametric bootstrap approach, or generalized fiducial approach, we need to tackle the variance components, which are typically unknown. The inferences based on MLE and asymptotic results are highly inaccurate in that they seriously miss the intended Type I error in testing of hypotheses and the intended probability coverage in interval estimation

3.1 Distributions for inferences on variance components

By considering the ordinary least squares regression of $\widetilde {\mathbf {X}} =(\mathbf {X}, \mathbf {Z})$ on y, Wald (1947) showed, albeit in non-matrix notation, that the distribution of the residual sum of squares, $ S_{e}=\mathbf {y}^{\prime } \left (\mathbf {I}_{\mathbf {n}} - \widetilde {\mathbf {X}}(\widetilde {\mathbf {X}}^{\prime }\widetilde {\mathbf {X}})^{-1} \right) \mathbf {y} $ is related to the familiar chi-squared distribution

$$ U =\frac{S_{e}}{\sigma_{e}^{2}}\mathbf{\sim }\chi_{e}^{2} \text{, where } e=N- rank(\widetilde{\mathbf{X}}). $$

(6)

Moreover, U was shown to be distributed independently of u. Typically $\tilde {\mathbf {X}}$ is not of full rank as Wald (1947) implicitly assumed, but the assumption can be maintained by redefining $\tilde {\mathbf {X}}$ as the non-singular sub-matrix that one gets if the regression is run blindly. Gamage et al. (2013) showed how one variance component can be tackled by transforming $\tilde {\mathbf {X}}$ by an orthonormal basis for the vector space spanned by its columns. Actually, any number of variance components in a compound symmetric structure can be handled more conveniently and efficiently based on the LSE results, because underlying LSE results based on the QR algorithm is highly computationally efficient and fool proof in handling non-singular design matrices. This is accomplished by extending Wald’s argument for any number of random effects, and properly redefining the $\tilde {\mathbf {X}}$ matrix.

From the theory of regression and extended Wald results, it is known that $\hat {\mathbf {u}} | \mathbf {u} \sim \mathbf {N}\left (\mathbf {u}, \sigma _{\mathbf {e}}^{\mathbf {2}} {(\tilde {\mathbf {X}}'\tilde {\mathbf {X}})_{\mathbf {u}}^{\mathbf {-1}}}\right)$, where ${(\tilde {\mathbf {X}}'\tilde {\mathbf {X}})_{\mathbf {u}}^{\mathbf {-1}}}) $ is the sub-matrix of ${(\tilde {\mathbf {X}}'\tilde {\mathbf {X}})^{-1}}$ corresponding to the u part of the LSE regression. Therefore, the unconditional as well as the conditional distribution of $(\hat {\mathbf {u}} -\mathbf {u})$ given u is given by

$$(\hat{\mathbf{u}} -\mathbf{u})| \mathbf{u} \sim N \left(\mathbf{0}, \sigma_{\mathbf{e}}^{\mathbf{2}} (\tilde{\mathbf{X}}'\tilde{\mathbf{X}})_{\mathbf{u}}^{\mathbf{-1}} \right), $$

a distribution free of u. Hence the unconditional distribution of

$$(\hat{\mathbf{u}} -\mathbf{u}) \sim N \left(\mathbf{0}, \sigma_{\mathbf{e}}^{\mathbf{2}} (\tilde{\mathbf{X}}'\tilde{\mathbf{X}})_{\mathbf{u}}^{\mathbf{-1}} \right), $$

is also the same and distributed independently of u. Hence, the LSE, as opposed to Henderson’s predictor $\tilde {\mathbf {u}}$, of u is distributed as

$$ \hat{\mathbf{u}} = (\widehat{\mathbf{u}} -\mathbf{u})+ \mathbf{u} \sim N \left(\mathbf{0}, \sigma_{\mathbf{e}}^{\mathbf{2}} {(\tilde{\mathbf{X}'} \tilde{\mathbf{X}})_{\mathbf{u}}^{\mathbf{-1}}} + \Sigma_{u} \right), $$

(7)

where Σ_u is the covariance matrix of u. In particular, individual variance components and variance ratios of practical importance can be tackled based on the distributional result

$$ \hat{\mathbf{u}}_{j} \sim N\left(\mathbf{0}, \sigma_{e}^{2} \left({(\tilde{\mathbf{X}}'\tilde{\mathbf{X}})_{\mathbf{j}}^{\mathbf{-1}}} +\rho_{j} I_{j} \right) \right) \hbox{, where } \rho_{j} = \frac{\sigma_{j}^{2}}{\sigma_{e}^{2}}. $$

(8)

In computing $\hat {\mathbf {u}}_{j}$s by LSE, perhaps it is convenient to run the unconstrained regression, and then center each $\hat {\mathbf {u}}_{j}$ by subtracting the mean, the mean of estimated β_zj.

3.2 Distributions for functions of variance components

The distributional result (8) allows us to handle any variance component by the distribution of the corresponding component of $\hat {\mathbf {u}}$. In solving some of the inference problems in quality assurance and in applications of the BLUP, however, we also need to tackle multiple variance components simultaneously. To develop necessary distributional results for such applications, consider two factors of interests, and let $\hat {\mathbf {u}}_{1}$ and $ \hat {\mathbf {u}}_{2}$ be the least squares estimators of their random effects u₁ and u₂, each of which corresponds to non-singular sub-matrices of $\tilde {\mathbf {X}}$.

Then, it follows from (7) that

$$ \left(\begin{array}{c} \hat{\mathbf{u}}_{1} \\ \hat{\mathbf{u}}_{2} \end{array} \right) \sim N \left(\left(\begin{array}{c} \mathbf{0}_{1} \\ \mathbf{0}_{2} \end{array} \right), \sigma_{e}^{2} \Sigma_{x\rho} \right), $$

(9)

where

$$\Sigma_{x\rho} =\Sigma_{x}(\rho_{1},\rho_{2}) = \left(\tilde{\mathbf{X}}'\tilde{\mathbf{X}} \right)_{12}^{-1} + \left(\begin{array}{c c} \rho_{1} \mathbf{I}_{1} & \mathbf{0}_{1} \\ \mathbf{0}_{2} & \rho_{2} \mathbf{I}_{2} \end{array} \right). $$

Let $\Sigma _{x\rho }^{-1}$ be the inverse of the matrix Σ_xρ, found by Eigen value decomposition or otherwise. Then, from (9) we can obtain two independent random vectors as

$$\left(\begin{array}{c} \hat{\mathbf{v}}_{1} \\ \hat{\mathbf{v}}_{2} \end{array} \right) =\Sigma_{x\rho}^{-1} \left(\begin{array}{c} \hat{\mathbf{u}}_{1} \\ \hat{\mathbf{u}}_{2} \end{array} \right) \sim N \left(\left(\begin{array}{c} \mathbf{0}_{1} \\ \mathbf{0}_{2} \end{array} \right), \sigma_{e}^{2} \mathbf{I} \right). $$

Now it is evident that the two variance components can be tackled using the two independent chi-squared random variables

$$ V_{l}=\frac{S_{l} (\rho_{1},\rho_{2})}{\sigma_{e}^{2}} \sim \chi_{a_{l}}^{2}, \hbox{ \ where } S_{l} (\rho_{1},\rho_{2})=\hat{\mathbf{v_{l}}}^{\prime } \hat{\mathbf{v_{l}}}, \hbox{ for $l= 1, 2$}, $$

(10)

where $a_{l} = rank(\tilde {\mathbf {X}}_{l})$, which in many cases equal to k_l−1.

As shown by Weerahandi and Gamage (2016), an alternative way of making inferences in problems involving two parameters can be accomplished by a marginal distribution of one statistic and the conditional distribution of the other statistic. In our application we readily have the marginals given by (8), the conditional distributions can be found using the multivariate distribution theory on normal distributions. For example, the conditional distribution of $\hat {\mathbf {u}}_{2}$ given $\hat {\mathbf {u}}_{1} $ is given by

$$ \hat{\mathbf{u}}_{2} | \hat{\mathbf{u}}_{1} \sim N \left(\boldsymbol{\delta}_{21}, \mathbf{\Sigma}_{2,1} \right), $$

(11)

where

$$\boldsymbol{\delta}_{21}= \mathbf{\Sigma}_{22} \mathbf{\Sigma}_{22}^{-1} \hat{\mathbf{u}}_{1} \hbox{, } \mathbf{\Sigma}_{2,1}=\mathbf{\Sigma}_{11} - \mathbf{\Sigma}_{12}\mathbf{\Sigma}_{11}^{-1} \mathbf{\Sigma}_{21}, $$

and sub-matrices of Σ_xρ appearing in the above equation are obtained by the partition

$$\mathbf{\Sigma}_{x}(\rho_{1},\rho_{2}) = \left(\begin{array}{c c} \mathbf{\Sigma}_{11} & \mathbf{\Sigma}_{12} \\ \mathbf{\Sigma}_{21} & \mathbf{\Sigma}_{22} \end{array} \right), $$

It is now evident that we can tackle the two variance components of interest using the chi-squared statistics,

$$\begin{array}{@{}rcl@{}} V_{1} &=& \frac{S_{1}(\rho_{1})}{\sigma_{e}^{2}} \sim \chi_{a_{1}}^{2}, \hbox{ \ where } S_{1}(\rho_{1})=\hat{\mathbf{u_{1}}}^{\prime } \left((\tilde{\mathbf{X}}'\mathbf{X})_{1}^{-1}+\rho_{1} I_{1} \right)^{-1}\hat{\mathbf{u}}_{1} \end{array} $$

(12)

$$V_{21} = \frac{S_{21}(\rho_{1},\rho_{2})}{\sigma_{e}^{2}} | \hat{\mathbf{u_{1}}} \sim \chi_{a_{2}}^{2}, \hbox{ \ where } S_{21}(\rho_{1},\rho_{2})=(\hat{\mathbf{u_{1}}} -\boldsymbol{\delta}_{21})^{\prime} \mathbf{\Sigma}_{2,1}^{-1} (\hat{\mathbf{u_{1}}} -\boldsymbol{\delta}_{21}). $$

However, it is inconvenient to work with conditional distributions in deriving inference methods. We can avoid the above conditional distribution by applying the transformation suggested by Weerahandi and Gamage (2016). In our problem this is accomplished by considering the distribution of the CDF (Cumulative distribution function) of V₂₁, which is distributed as

$$ U_{21}=F_{V_{21}}= C\left(\widehat{\mathbf{U}}_{1}, \widehat{\mathbf{U}}_{2} ; \rho_{1}, \rho_{2}, \sigma_{e}^{2}\right) | \widehat{\mathbf{U}}_{1} = \hat{\mathbf{u}}_{1} \sim U(0,1), $$

(13)

where uppercase variables represent the observable random quantities, and C is the CDF of the chi-squared distribution with a₂ degrees of freedom, which is readily available from statistical software packages. But the above uniform distribution is free of $\hat {\mathbf {u_{1}}}$. Therefore, the random variable U₂₁ is distributed as

$$ U_{21}=C \left(\widehat{\mathbf{U}}_{1}, \widehat{\mathbf{U}}_{2} ; \rho_{1}, \rho_{2}, \sigma_{e}^{2}\right) \sim U(0,1) $$

(14)

independently of $\widehat {\mathbf {U}}_{1}$. Obviously the above argument can be extended for two sets of random effects to enable handling of any number of variance components in an iterative manner.

Illustration: Inference on functions of variance components

In this section we illustrate the application of foregoing results by undertaking a function of variance components that arise in practice of statistical quality assurance and in some applications of BLUPs. For a variety of problems associated with Gauge, “Repeatability” and “Reproducibility”, and the reader is referred to Wu and Hamada (2009) for a detailed discussion of these notions and for various sums and ratios of variance component that are important in quality assurance measurement systems.

We will illustrate the approach that one can take in such applications, by considering the problems of making inferences on a variance component and a sum of one variance component $\sigma _{j}^{2}$ of and the error variance $\sigma _{e}^{2}$. Making inferences about the variance ratio ρ_j itself is a trivial task because, it follows chi-squared random variate

$$ V_{j}=\frac{S(\rho_{j})}{\sigma_{e}^{2}}\mathbf{\sim }\chi_{a}^{2}, \hbox{ \ where } S(\rho_{j})=\widehat{\mathbf{u_{j}}}^{\prime } \left((\tilde{\mathbf{X}}'\tilde{\mathbf{X}})_{j}^{-1}+\rho_{j} I_{j} \right)^{-1}\widehat{\mathbf{u_{j}}}, $$

(15)

and a=k_j−1 and $(\tilde {\mathbf {X}}'\tilde {\mathbf {X}})_{j}^{-1} $ is the sub-matrix of ${(\widetilde {\mathbf {X'}}\widetilde {\mathbf {X}})^{\mathbf {-1}}}$ corresponding to the u_j part of the regression. In some applications it is more convenient to work with the alternate form of (15),

$$ V_{j}=\tilde{S}\left(\sigma_{e}^{2}, \sigma_{j}^{2}\right) \sim \chi_{a}^{2}, \hbox{ \ where } \tilde{S}\left(\sigma_{e}^{2}, \sigma_{j}^{2}\right)=\widehat{\mathbf{u_{j}}}^{\prime } \left(\sigma_{e}^{2} (\tilde{\mathbf{X}}'\tilde{\mathbf{X}})_{j}^{-1}+\sigma_{j}^{2} I_{j} \right)^{-1}\widehat{\mathbf{u_{j}}}, $$

(16)

If needed, the computation of S(ρ_j) can be simplified by diagonalizing the $(\tilde {\mathbf {X}}'\tilde {\mathbf {X}})_{j}^{-1}$ matrix by means of an orthogonal matrix, say P, as

$$ S(\rho_{j}) =\tilde{\mathbf{u}}_{j}^{\prime} \left(\mathbf{D}_{\mathbf{j}}+\rho_{\mathbf{j}} I_{\mathbf{j}} \right)^{-1} \tilde{\mathbf{u}}_{j} =\sum_{l}\frac{u_{{lj}}^{2}}{(\rho_{j} +\lambda_{{lj}})} $$

where D_j is a diagonal matrix formed by Eigen values λ_j of $(\tilde {\mathbf {X}}'\tilde {\mathbf {X}})_{j}^{-1}$ and $\tilde {\mathbf {u}}_{j}=\mathbf {P} \hat {\mathbf {u}}_{j}$.

It follows from (15) and (8) that

$$ W=\frac{a S_{e}}{e S(\rho_{j})} \sim F_{e,a}, $$

(17)

a test statistic involving no nuisance parameters. For example, it is easily seen that an upper 100γ% confidence bound for ρ_j is given by

$$\rho_{j} \leq max \left(0, s_{j}^{-1} \left(\frac{as_{e}}{e q_{\gamma}}\right) \right), $$

where q_γ is the γ^th quantile of the F-distribution with a and e degrees of freedom, and $s_{j}^{-1} $ is the inverse function of s(ρ_j).

Making inferences on individual variance components, and sums and some ratios of variance components is not a trivial task. In fact, classical approach to inference fails to provide small sample inference even in the simple balanced mixed models (cf. Weerahandi (1992)). Whereas inferences on any function of variance components is easily accomplished by taking the generalized approach to inference. It should be emphasized that one can obtain equivalent results by taking the generalized fiducial inference approach introduced by Hannig et al. (2006). For example, most of the the problems undertaken by Cisewski and Hannig (2012) can be tackled using exact distributional results without relying on any asymptotic or approximate results. Moreover, now that we have exact distributional results, one can also take Parametric Bootstrap approach to make inferences involving exact probability statements, just like Generalized Inference does.

4.1 Inference on individual variance components

Perhaps the easiest way to make inferences on individual variance components as well as functions of variance components, is by replacing each nuisance parameter by their GPQ (Generalized pivotal quantity) that reduces to the parameter at the observed sample point, and then showing the resulting quantity leads to a well defined extremer region in the problem of testing of the parameter of interest. While the error variance, $\sigma _{e}^{2}$ is easily handled by the obvious GPQ suggested by (6)

$$ G_{\sigma_{e}^{2}} =\frac{\sigma_{e}^{2} s_{e}}{S_{e}}=\frac{s_{e}}{U}, $$

(18)

which is basically the same as the classical PQ (Pivotal quantity). The GPQs of other variance components is not that easy to derive, but by applying what is known as the substitution method (cf. Weerahandi (2004)) to Eqs. (6) and (15), we can obtain a GPQ that reduces to $\sigma _{j}^{2}$ at the observed sample point as

$$ G_{\sigma_{j}^{2}} = \frac{\sigma_{e}^{2} s_{e}}{S_{e}} s^{-1} \left(\frac{s_{e} S \left(\sigma_{j}^{2} /\sigma_{e}^{2}\right)}{S_{e}} \right) =\frac{s_{e}}{U} s^{-1} \left(\frac{s_{e} V_{j}}{U} \right), $$

(19)

where s()⁻¹ is the inverse function of s() unless it is negative, in which case the GPQ is set to 0. Obviously expression (19) provides a GPQ for $\sigma _{j}^{2}$, because it has the required properties:

The distribution of $G_{\sigma _{j}^{2}}$ is free of unknown parameters
At observed sample points, $G_{\sigma _{j}^{2}}$ depends only on the parameter of interest, $\sigma _{j}^{2}$.

In fact, at observed sample points, $G_{\sigma _{j}^{2}}$ reduces to $\sigma _{j}^{2}$.

Nevertheless, the inverse function s⁻¹() of s() can be avoided, by first considering the problem of testing null hypotheses of the form

$$ H_{0} :\sigma_{j}^{2} \geq \sigma_{0}^{2}. $$

A GTV (Generalized test variable), a notion introduced by Tsui and Weerahandi (1989), for testing the above hypotheses can also be obtained by the substitution method in alternative equivalent forms as

$$\begin{array}{*{20}l} T& =\frac{s_{e} S\left(\sigma_{j}^{2} /\sigma_{e}^{2}\right)} { S_{e} s\left(\frac{\sigma_{j}^{2} }{\sigma_{e}^{2}} \frac{S_{e}}{s_{e}} \right) } =\frac{V_{j}}{U} \frac{s_{e}} { s\left(\frac{U \sigma_{j}^{2}}{s_{e}} \right) } \hbox {, or as }\\ T & = \frac{ \tilde{S}(\sigma_{e}^{2},\sigma_{j}^{2}) } { \tilde{s}\left(\frac{\sigma_{e}^{2} s_{e}}{S_{e}},\sigma_{j}^{2} \right) } =\frac{V_{j}}{ \tilde{s}\left(\frac{s_{e}}{U},\sigma_{j}^{2} \right) }. \end{array} $$

(20)

Notice that the distribution of the above test variable is free of nuisance parameters, because the expression of T in (20) involves only $\sigma _{j}^{2}$, the parameter of interest. In addition, from its first expression, it is clear that the test variable reduces to 1 at the observed values of the random variables, because at the observed values S_e become equal to s_e, and S becomes equal to s, thus leading to a well defined point on the boundary of extreme regions based on T. Moreover, the random variable T tends to be stochastically increasing in $\sigma _{j}^{2}$, because it follows from (16) that $\tilde {S()}$ is a stochastically decreasing function of $\sigma _{j}^{2}$. Hence, the generalized p-value for testing H₀ can be obtained as

$$\begin{array}{*{20}l} p & =max \Pr (T\leq 1) = \Pr \left(V_{j}\leq \tilde{s}\left(\frac{s_{e}}{U},\sigma_{0}^{2} \right) \right) \\ & = EG_{a} \left(\tilde{s}\left(\frac{s_{e}}{U},\sigma_{0}^{2} \right) \right), \end{array} $$

(21)

where G_a is the CDF of the chi-squared distribution with a degrees of freedom and the expectation is taken with respect to the random variable $U\sim \chi _{e}^{2}~$.

The generalized confidence intervals for the variance component $\sigma _{j}^{2}$ can be derived from the GPQ (19) or deduced directly from the p-value in (21). In the latter approach, if $\sigma _{1}^{2}$ and $\sigma _{2}^{2}$ are chosen such that

$$ EG_{a} \left(\tilde{s}\left(\frac{s_{e}}{U},\sigma_{1}^{2} \right) \right) =\frac{1-\gamma }{2} $$

and

$$ ~EG_{a} \left(\tilde{s}\left(\frac{s_{e}}{U},\sigma_{2}^{2} \right) \right) =\frac{1+\gamma }{2}, $$

then $\left [\sigma _{1}^{2}, \sigma _{2}^{2}\right ]$ is a 100γ% generalized confidence interval for $\sigma _{j}^{2}$.

4.2 Inference on sums of variances

To illustrate the approach one can take in dealing with various sums and ratios of variance components that arise in quality assurance applications (cf. Wu and Hamada (2009)) consider the particular problem of making inferences about quantities such as $\sigma _{1}^{2} + \sigma _{2}^{2}$. Unlike simple in variance ratios, the problem is non-trivial even if one of the two variances is the error variance $\sigma _{e}^{2}$. So, in order to illustrate the approach that one can take in dealing with sums of variance components, consider the particular problem of constructing interval estimates for the sum $\sigma _{s}^{2}= \sigma _{e}^{2} + \sigma _{j}^{2}$

It follows from (18) and (19) that

$$G_{s}=G_{\sigma_{j}^{2}} + G_{\sigma_{e}^{2}} = \frac{s_{e}}{U} \left(1+s^{-1} \left(\frac{s_{e} V_{j}}{U}) \right) \right) $$

is a potential GPQ for the sum. It is indeed a GPQ for $\sigma _{s}^{2}$, because it has the required properties:

The distribution of G_s is free of unknown parameters
At observed sample points, G_s depends only on the parameter of interest.

Moreover, at observed sample points G_s reduces to $\sigma _{s}^{2}$. Therefore, 100γ% generalized confidence intervals on $\sigma _{s}^{2}$ can be constructed by first finding quantiles of the GPQ

$$Pr(G_{s} \leq q_{\gamma}) = Pr \left(\frac{s_{e}}{U} \left(1+s^{-1} \left(\frac{s_{e} V_{j}}{U}) \right) \right) \leq q_{\gamma} \right), $$

and obtaining one-sided interval estimates as $\sigma _{s}^{2} \leq q_{\gamma }$. The quantiles can be easily computed by exact numerical integration. The generalized confidence can be computed more conveniently using Monte Carlo integration. For example, 95% generalized confidence intervals for $\sigma _{s}^{2}$ can be obtained in the following steps:

Generate a large set of random numbers from the independent chi-squared random variables (U_i,V_i), say i=1,2,⋯,100000 pairs,
For each pair, compute G_i=G_s(U_i,V_i),i=1,2,⋯,100000, from $U_{i} \sim \chi _{e}^{2}$ and $V_{i} \sim \chi _{a}^{2}$,
Sort the G_i samples and obtain the order statistic as G₍₁₎,G₍₂₎,⋯,G_(100000),
Obtain the 95% lower confidence bound for $\sigma _{s}^{2}$ as G₍₅₀₀₀₎,
Obtain the 95% upper confidence bound for $\sigma _{s}^{2}$ as G_(95000),
Obtain the 95% equal-tail confidence interval for $\sigma _{s}^{2}$ as (G₍₂₅₀₀₎,G_(97500)).

Testing of hypotheses can also be carried out using the above interval estimates. For example, the null hypotheses $H_{0} : \sigma _{s}^{2} =\delta $ is rejected at 0.05 level of significance if the interval estimate, (G₍₂₅₀₀₎,G_(97500)) does not contain the hypothesized value δ.

4.3 Inference on fixed effects

Next consider the problem of making inferences on some or all fixed effects in the presence of random effects. Confidence intervals and tests available from widely used software packages such as SAS PROC MIXED and R lme are based on asymptotic distributions of the REML and the ML of variance components. Unlike inferences directly on variance components, these methods are not too bad. But since fixed effects also depend on variance components to some extent, the purpose of this section to show how one can make inferences based on exact probability statements.

If variance components were known parameters, then we can make inferences about β based on its GLSE (Generalized Leased Squares Estimator) specified by (4), which is distributed as

$$ \hat{\boldsymbol{\beta}} \sim N(\boldsymbol{\beta}, \sigma_{e}^{2} \Gamma), \text{ where } \Gamma=\mathbf{I_{N}}+\mathbf{Z}\mathbf{\Sigma}_{\rho} \mathbf{Z}'. $$

(22)

But the issue of concern is that, while it is trivial to handle the error variance $\sigma _{e}^{2}$, the classical approach fails to provide satisfactory inferences when the among group variances are unknown, as usually the case. The inference problem is not simple because $\hat {\boldsymbol {\beta }}$ is itself is a function of nuisance parameters, namely

$$\hat{\boldsymbol{\beta}}= (\mathbf{X}' \Gamma^{-1} \mathbf{X})^{-1} \mathbf{X}' \Gamma^{-1} \mathbf{y}. $$

There is of course no issue in point estimation. In fact, simple LSE based point estimates are unbiased estimates, which are as good as point estimates provided by the REML and the ML methods. The problem is making inferences about components of β beyond the point estimation is not trivial. So we will next develop generalized inference methods to tackle β.

To show how the generalized inferences based on exact probability statements can be developed based on (22), let us consider the case where we have one factor between group variance σ². In this case the covariance matrix Γ appearing in (4) reduces to Γ=I_N+ρZZ^′, where $\rho = \sigma ^{2} / \sigma _{e}^{2}$. Consider the problem of testing individual components of β, results of which can be easily extended to test a vector of fixed effects. In generalized approach to inference such nuisance parameters as ρ can be tackled by its GPQ, the random quantity

$$ R=R(U,V)=max \left(0, s^{-1} \left(\frac{{s}_{e} {S}(\rho) \sigma_{e}^{2}}{\sigma_{e}^{2} {S}_{e}} \right) \right) = max \left(0, s^{-1} \left(\frac{{s}_{e} {V} }{{U}} \right) \right), $$

(23)

which reduces to ρ at the observed set of sample points. In Section 4 we saw that this GPQ yields that same inferences on ρ as the classical PQ, but as we see below, it has greater potential when treated as a GPQ. Now we can make inferences on the β vector and its components based on the result (23) using the resulting GPQ

$$\boldsymbol{\beta}_{R} = \left(\mathbf{X}'\boldsymbol{\Gamma}_{\mathbf{R}}^{\mathbf{-1}} \mathbf{X} \right)^{-1} \mathbf{X}'\boldsymbol{\Gamma}_{\mathbf{R}}^{\mathbf{-1}} \mathbf{y}, $$

where Γ_R=I_N+RZZ^′.

To illustrate the steps in making such generalized inference, consider the specific problem of testing hypotheses of the form

$$ H_{01}: \beta_{i} \geq \beta_{0}, \mathrm{ or }\ H_{02}: \beta_{i} \leq \beta_{0}, $$

(24)

where β₀ is a specified constant with β₀=0 in (24) being the most widely used situation of testing the significance of a variable of special interest such as the efficacy of a drug. If there were no random effects, we would have used the t-test suggested by the classical regression theory to perform the test. To develop the counterpart in the presence of a random effects, let us use the distributional result

$$ Z= \frac{\hat{\beta_{i}} (\rho) - \beta_{i}}{\sigma_{e} \sqrt{(\mathbf{X}' \Gamma^{-1} \mathbf{X})_{\mathbf{i}}}}\sim N(0,1), $$

(25)

where $\hat {\beta _{i}} (\rho)$ is the i^th component of β(ρ) corresponding to the parameter β_i being tested. This result along with (6) implies that

$$ T= \frac{\hat{\beta_{i} (\rho)} - \beta_{i}}{S_{e} \sqrt{(\mathbf{X}' (\mathbf{I}_{N}+\rho \mathbf{Z Z'})^{-1} \mathbf{X})_{\mathbf{i}}}}\sim t_{e}, $$

a result we could have used if the variance ratio ρ were known. Moreover, recall that we can tackle the ρ parameter using the distribution results (6) and (15), namely

$$\begin{array}{@{}rcl@{}} U =\frac{S_{e}}{\sigma_{e}^{2}}\mathbf{\sim }\chi_{e}^{2}, \text{ where } e=N- rank(\widetilde{\mathbf{X}})\ {and } \\ V=\frac{S(\rho)}{\sigma_{e}^{2}}\mathbf{\sim }\chi_{a}^{2}, \hbox{ \ where } S(\rho)=\widehat{\mathbf{u}}^{\prime } \left((\tilde{\mathbf{X}}'\tilde{\mathbf{X}})_{u}^{-1}+\rho I_{l} \right)^{-1}\widehat{\mathbf{u_{j}}}. \end{array} $$

in the case of one grouping structure with l groups.

Since ρ is unknown in practice, the above result leads to an approximate t-test only. However, we can derive a generalized test based on exact probability statement using forgoing distributional results. In view of the form of the T statistic in the known ρ case, it is evident that the unknown variance ratio can be tackled by using the potential generalized test variable

$$\begin{array}{@{}rcl@{}} \tilde{T}&=& \frac{\sigma_{e}}{\sqrt{S_{e}}(\hat{\beta_{i}}(R) - \beta_{i})} \frac{(\hat{\beta_{i}}(\rho) - \beta_{i})}{\sigma_{e} \sqrt{ \left(\mathbf{X}' (\mathbf{I}_{\mathbf{N}}+\rho \mathbf{Z Z'})^{-1} \mathbf{X} \right)_{i}}} \sqrt{(\mathbf{X}' \left(\mathbf{I}_{\mathbf{N}}+ {R} \mathbf{Z Z'} \right)^{-1} \mathbf{X})_{i}} \\ &=& \frac{Z}{\sqrt{U}(\hat{\beta_{i}}(R) - \beta_{i})} \sqrt{ \left(\mathbf{X}' \left(\mathbf{I}_{\mathbf{N}}+ {R}\mathbf{Z Z'} \right)^{-1} \mathbf{X} \right)_{i}}, \end{array} $$

(26)

where R is a function of U and V, and $Z \sim N(0,1), U \sim \chi _{e}^{2}$, and $V \sim \chi _{a}^{2}$ are mutuality independent random variables.

It is clear from the second expression of (26) that the distribution of $\tilde {T}$ is free of unknown parameter, and from the first expression it is clear that at the observed set of sample points, the test variable reduces to $1/\sqrt {s_{e}}$. Moreover the distribution of $ \tilde {T}$ is stochastically monotonic in β_i. Therefore, the hypotheses of the form (24) can be tested based on the generalized p-value

$$\begin{array}{@{}rcl@{}} p&=& Pr \left(\frac{Z}{\sqrt{U}} \sqrt{(\mathbf{X'} \left(\mathbf{I}_{\mathbf{N}} + {R} \mathbf{Z Z'} \right)^{-1} \mathbf{X})_{i} } \leq \frac{(\hat{\beta_{i}} (R) - \beta_{0}) }{\sqrt{s_{e}}} \right) \\ &=& E \left[\Phi \left(\frac{(\hat{\beta_{i}} (R) - \beta_{0}) \sqrt{U} }{\sqrt{s_{e} (\mathbf{X'} \left(\mathbf{I}_{\mathbf{N}}+ {R} \mathbf{Z Z'} \right)^{-1} \mathbf{X})_{i}}} \right) \right]. \end{array} $$

(27)

or 1−p, where the expectation in (27) is taken with respect to the chi-squared random variable U and V, and S(ρ) is given by (15).

Simulation study

The purpose of this section is to conduct a Monte-Carlo simulation study to study the performance of the proposed method compared with the widely used MLE based methods that are available from such software packages as SAS Proc Mixed and R lme. Weerahandi (2004, Chapter 4) provides simulated results showing serious Type I Error issues of ML and REML based tests in the context of random effects ANOVA. Whereas, in our simulation study concerning the regression case, we considered 3,5 and 10 groups and error variance is fixed at 1. Though not necessary, to minimize the extent of the simulation, sample size of each group is assumed to be equal and set at values n=10 and 100. In each of the studies below, we also included one X variable with fixed effect 10 is included when X takes on normal random numbers with mean 0 and variance 0.1. Moreover, except when more than one random effect is necessary, the Z is set to be taken as normal random numbers with mean 1 and variance 1. In typical applications where mixed effects models add value over fixed effects models, the factor variance tends to be smaller than the error variance, and so we fixed the latter was set at 1 and the former was varied to have standard deviations at 0.01, 0.1, and 1.

5.1 Testing the variance component

The literature provides comparison of tests available for testing variance components basically for ANOVA type models only. Here we compare the performance of the proposed generalized test in Section 4 against the two methods available from the R software package. One widely used test is an REML based method available from the R lme function with the “interval” procedure, which is an improved REML based Wald-type confidence interval. The other competing method is the PL (Profile likelihood) based confidence interval available from R lmer with the “confint” procedure.

Table 1 below shows the Type-I error of alternative tests for testing hypotheses of the form $H_{0}: \sigma _{0}^{2} \leq \sigma _{A}^{2}$. When the sample size and the factor variance are both large, REML method fails due to non-convergence, and so such cases are indicated as NA. It is evident from Table 1 that the proposed test outperforms both the competing methods under all scenarios. The REML method seriously misses the intended size of tests when the factor variance is small and become conservative when the factor variance is large, whereas the PL method is too conservative in all cases. The proposed method also becomes somewhat conservative when the factor variance is also large. The performance of likelihood based methods somewhat improve as the number of factor levels increases.

Table 1 Type I error of competing tests of $H_{0}: \sigma _{A}^{2} \geq \sigma _{0}^{2}$ with intended size 0.05

Full size table

5.2 Testing fixed effects

The literature does not seem to provide any comparison of tests available for testing fixed effects of mixed effects regression models, perhaps thinking that their optimality is covered by the classical theory. Although the situation is not as serious as that of variance components, the assumption is valid only if the variance components are known. Therefore, here we compare the performance of the proposed generalized test in Section 5 against the two methods available from software packages such as SAS and R, namely the REML and ML based tests. Again since multiple covariates add no value in understanding the comparative performance competing tests, we consider one covariate X, whose fixed effect is denoted as β. The performance was found to vary somewhat with choice of x, and therefore, to understand the overall performance of competing tests for H₀:β=0, we generated X as a 50% mixture of X used in the previous section and one taking on equally spaced values from -1 to 1.

Table 2 below shows the performance of competing methods as the number of groups take on typical values 3, 5, and 10, whereas the sample size from each group takes on value 10 and 100. In each case, the performance of competing tests were studied as the factor variance σ_A vary from 0.1 to 1.0. It is evident from Table 2 that when number of groups is small, type I error of MLE based methods tends to be somewhat larger than the intended size of 0.05, whereas the propose method stay closer to the intended size. The ML test is worse than the REML test and could be as large as.084, though not as bad as tests on variance components. When the sample size is small, Type I error of each test tends to go up with the factor variance, whereas when the sample sizes are large Type I errors of RMEL and PL methods tend to go down. When the number of groups and the sample sizes are both large, all three methods behave practically the same.

Table 2 Type I error of competing tests of H₀:β=0 with intended size 0.05

Full size table

Discussion

In this article we have shown how to make inferences in mixed effects models based on exact distributions of underlying statistics when the covariance structure is compound symmetric. The simulation studies undertaken in this article have shown the potential in developing tests and confidence intervals that are better than currently available asymptotic methods. Our results also show that there is potential for further improvement of inferences, especially when factor variances are too large or too small compared to the error variance.

Our tests and interval estimates are based on the generalized inference approach. Researchers are encouraged to develop alternative inferences by taking other promising approaches as Parametric Bootstrap and Fiducial inference with a view towards further improving the performance of tests, especially when factor variances are large compared to the error variance. Also desirable is to extend results to widely used and practically useful other covariance structures.

Gamage et al. (2013) developed generalized prediction intervals for BLUPs of mixed models. Further research necessary to extend such results to compare two BLUPS in the same model or BLUPs of two independent models. The results presented in this article should pave the way for such developments.

Availability of data and materials

This article dealing with a novel Distribution Theory does not have any Data to report.

Abbreviations

ANOVA:: Analysis of variance
BLUP:: Best linear unbiased predictor
BLUE:: Best linear unbiased estimator
CDF:: Cumulative distribution function
GLSE:: Generalize least squares estimator
GPQ:: Generalized pivotal quantity
GTV:: Generalized test variable
LSE:: Least squares estimator
ML:: Maximum likelihood
MLE:: Maximum likelihood estimator
PQ:: Pivotal quantity
QR:: Q and R are the notations used for the two matrices in QR matrix decomposition
REML:: Restricted maximum likelihood

References

Cisewski, J., Hannig, J.: Generalized fiducial inference for normal linear mixed models. Ann. Stat. 40, 2102–2127 (2012).
Article MathSciNet Google Scholar
Gamage, J., Mathew, T., Weerahandi, S.: Generalized prediction intervals for BLUPs in mixed models. J. Multivar. Anal. 220, 226–233 (2013).
Article MathSciNet Google Scholar
Hannig, J., Iyer, H., Patterson, P.: Fiducial generalized confidence intervals. J. Am. Stat. Assoc. 101, 254–269 (2006).
Article MathSciNet Google Scholar
Hamada, M., Weerahandi, S.: Measurement system assessment via generalized inference. J. Qual. Technol. 32, 241–253 (2000).
Article Google Scholar
Henderson, C. R.: Best linear unbiased estimation and prediction under a selection model.Biometrics. 31, 423–447 (1975).
Article Google Scholar
Robinson, G. K.: That BLUP is a good thing: the Estimation of random effects. Stat. Sci. 6, 15–32 (1991).
Article MathSciNet Google Scholar
Searle, S. R., Casella, G., McCulloch, C. E.: Variance Components. Wiley, Hoboken (2006).
MATH Google Scholar
Seely, J. F., El-Bassiouni: Applying Wald’s variance component test. Ann. Stat. 11, 197–201 (1983).
Article MathSciNet Google Scholar
Tsui, K., Weerahandi, S.: Generalized p-values in significance testing of hypotheses in the presence of nuisance parameters. J. Am. Stat. Assoc. 84, 602–607 (1989).
MathSciNet Google Scholar
Wald, A.: A note on regression analysis. Ann. Math. Stat. 18, 586–589 (1947).
Article MathSciNet Google Scholar
Weerahandi, S.: Testing variance components in mixed models with generalized p-values. J. Am. Stat. Assoc. 86, 151–153 (1992).
MathSciNet Google Scholar
Weerahandi, S.: Generalized confidence intervals. J. Am. Stat. Assoc. 88, 899–905 (1993).
Article MathSciNet Google Scholar
Weerahandi, S.: Generalized Inference in Repeated Measures: Exact Methods in MANOVA and Mixed Models. Wiley, Hoboken (2004).
MATH Google Scholar
Weerahandi, S., Gamage, J.A general method of inference for two-parameter continuous distributions.Communications in Statistics (T&M). 45, 2612–2625 (2016).
Google Scholar
Wu, J., Hamada, M. S.: Experiments: Planning, Analysis, and Optimization. Wiley, Hoboken (2009).
MATH Google Scholar

Download references

Acknowledgements

The authors are grateful to the editors and referees who made very constructive comments and suggestions. Addressing them led to a substantially improved version of the manuscript. The authors also thank Professor Thomas Mathew of UMBC for reading the manuscript and making constructive suggestions.

Funding

This research was not funded by any institution.

Author information

Authors and Affiliations

X-Techniques, 23 Chestnut St, Edison, 08817, NJ, USA
Samaradasa Weerahandi
Pfizer, 235 East 42nd St, New York, USA
Ching-Ray Yu

Authors

Samaradasa Weerahandi
View author publications
You can also search for this author in PubMed Google Scholar
Ching-Ray Yu
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

Authors’ contributions

Weerahandi developed the initial distributions after joint discussions with the co-author Yu. Co-author Yu, heavily contributed to the simulation studies to evaluate the Type I error performance of proposed solutions in this article compared with other solutions in the literature. All authors read and approved the final manuscript.

Authors’ information

The first author of this article, Weerahandi, is the president of X-Techniques located at 23 Chestnut St, Edison, NJ 08817. The second author is a director of Pfizer located at 235 East 42nd Street, New York, NY 10017.

Corresponding author

Correspondence to Samaradasa Weerahandi.

Ethics declarations

Competing interests

Authors of this article do not have Competing interests of any kind, financially or otherwise.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Weerahandi, S., Yu, CR. Exact distributions of statistics for making inferences on mixed models under the default covariance structure. J Stat Distrib App 7, 4 (2020). https://doi.org/10.1186/s40488-020-00105-w

Download citation

Received: 05 March 2020
Accepted: 31 July 2020
Published: 17 August 2020
DOI: https://doi.org/10.1186/s40488-020-00105-w

Exact distributions of statistics for making inferences on mixed models under the default covariance structure

Abstract

Introduction

The model and problems

2.1 The issue with available results

Exact distributions of LSEs

3.1 Distributions for inferences on variance components

3.2 Distributions for functions of variance components

Illustration: Inference on functions of variance components

4.1 Inference on individual variance components

4.2 Inference on sums of variances

4.3 Inference on fixed effects

Simulation study

5.1 Testing the variance component

5.2 Testing fixed effects

Discussion

Availability of data and materials

Abbreviations

References

Acknowledgements

Funding

Author information

Authors and Affiliations

Contributions

Authors’ contributions

Authors’ information

Corresponding author

Ethics declarations

Competing interests

Additional information

Publisher’s Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords