 Review
 Open Access
Efficient and adaptive rank-based fits for linear models with skew-normal errors
Journal of Statistical Distributions and Applications volume 1, Article number: 18 (2014)
Abstract
The rank-based fit of a linear model is based on minimizing a norm. A score function needs to be selected for the fit, and the proper choice leads to asymptotically efficient regression estimators, i.e., fits that are asymptotically equivalent to the maximum likelihood estimators (mle). In this paper, we present the family of optimal score functions for the skew-normal family of distributions. We show how easily this rank-based fit can be computed using the R package Rfit. We present the results of a small simulation study comparing the rank-based estimators and the mles in terms of efficiency and validity over skew-normal and contaminated normal distributions. We also develop and present empirical results for a Hogg-type adaptive procedure, based on a robust initial fit, for selecting among a family of these scores.
Introduction
Rank-based fitting of linear models offers an attractive alternative to least squares (LS) and maximum likelihood (mle) fitting. The geometry of the rank-based fit is similar to that of LS: simply replace the squared Euclidean norm used in the LS fit with another norm, which, unlike LS, results in a robust fit. The accompanying robust analysis is the analogue of the LS ANOVA and ANCOVA. The rank-based analysis is complete, including robust diagnostics to check the quality of fit. This rank-based analysis has recently been extended to mixed and nonlinear models; see Kloke et al. (2009) and Abebe and McKean (2013), respectively. A full development of the rank-based analysis can be found in Chapters 3–5 of the monograph by Hettmansperger and McKean (2011). The rank-based analysis is, generally, highly efficient. It can easily be optimized depending on the information available concerning the distribution of the random errors. For example, if the form of the error distribution is known, then an appropriate rank-based procedure can be selected to attain full efficiency.
In this paper, we discuss rank-based analyses that are appropriate for the skew-normal (SN) family of distributions. This is a rich family of skewed distributions developed by Azzalini (1985). The skewness of a distribution in this family is controlled by a shape parameter α, −∞<α<∞. Distributions are left or right skewed, depending on whether α<0 or α>0, respectively. If α=0, then the distribution is normal. As we discuss in Section 3, all of these distributions are light-tailed. Skewed error distributions such as these occur frequently in accelerated failure time (AFT) models. The response of interest in such models is survival time, and its log is frequently modeled by a log-linear model. The random errors for these log-linear models are, generally, skewed. For instance, the family of error distributions for log-linear models when survival time follows an F-distribution contains a wide variety of skewed distributions with tail weights that range from moderate to heavy; see McKean and Sievers (1989). Hence, the skew-normal family adds a rich class of light-tailed skewed distributions that also includes the normal distribution.
In Section 2, we outline the rank-based analysis for a general linear model. The computation of these analyses is easily handled by the R package Rfit, developed by Kloke and McKean (2013), which can be freely downloaded from CRAN at http://cran.us.r-project.org/. We discuss this software for an example in this section and continue the discussion in the remainder of the article. The data for the example and the R code, supplemental to Rfit, used in this article are available at http://www.stat.wmich.edu/mckean/SN/.
In Section 3, we develop rank-based analyses for the skew-normal family. These analyses are efficient for this family, and the analysis based on the correct shape parameter is fully efficient. As we discuss, these analyses are technically robust, similar to the optimal rank-based analysis for normally distributed errors. In contrast, in a sensitivity analysis, we show that the maximum likelihood fit (mle) is not robust. In Section 4, we present the results of a Monte Carlo study that verify the robustness and validity of the rank-based analysis over a family of SN distributions and contaminated SN distributions. These studies also confirm the nonrobustness of the mle analysis.
The rank-based analysis depends on the shape parameter α. One outcome of the Monte Carlo study is that rank-based analyses based on shape parameters in a neighborhood of the correct α behaved very similarly to the analysis using the correct α. This suggests that a simple Hogg-type adaptive procedure would have excellent properties in this situation. In Section 5, we develop such an adaptive scheme for the family of SN distributions. In a simulation study, we verify the efficiency and validity of this scheme over SN situations and, further, over two contaminated situations.
Notation and rank-based analysis
Let Y be an n×1 vector of responses that follows the linear model
$$\mathbf{Y}=\mathbf{1}_n\beta_0+\mathbf{X}\boldsymbol{\beta}+\mathbf{e}, \tag{1}$$
where 1_n is a vector of n ones; X is an n×p design matrix, which may contain predictors (covariates) as well as indicator (dummy) variables; β_0 is an intercept parameter; β is a p×1 vector of regression parameters; and e is an n×1 vector of random errors. Because the model has an intercept parameter, we assume without loss of generality that the design matrix X is centered (all columns of X have mean 0). For the theory discussed below, assume that the components of e are iid with pdf f(x) and cdf F(x), where F(x) is unknown.
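The least squares (LS) estimator of β is the vector $\hat{\boldsymbol{\beta}}_{LS}$ that minimizes the Euclidean distance between Y and the column space of X; that is, it satisfies
$$\hat{\boldsymbol{\beta}}_{LS}=\operatorname{Argmin}_{\boldsymbol{\beta}\in\mathbb{R}^p}\|\mathbf{Y}-\mathbf{X}\boldsymbol{\beta}\|_2^2, \tag{2}$$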
where $\|\mathbf{v}\|_2^2=\sum_{i=1}^{n}v_i^2$ is the squared Euclidean norm.
For the rank-based estimates, simply replace the squared Euclidean norm $\|\cdot\|_2^2$ by the pseudonorm
$$\|\mathbf{v}\|_\varphi=\sum_{i=1}^{n}a(R(v_i))\,v_i, \tag{3}$$
where R(v_i) is the rank of v_i among v_1,…,v_n, and the scores are generated as a(i)=φ[i/(n+1)] for a nondecreasing, bounded, square-integrable function φ(u) that satisfies, without loss of generality, the standardizing conditions $\int_0^1\varphi(u)\,du=0$ and $\int_0^1\varphi^2(u)\,du=1$. Then the rank-based estimator minimizes the ∥·∥_φ-distance between Y and the column space of X; i.e.,
$$\hat{\boldsymbol{\beta}}_\varphi=\operatorname{Argmin}_{\boldsymbol{\beta}\in\mathbb{R}^p}\|\mathbf{Y}-\mathbf{X}\boldsymbol{\beta}\|_\varphi. \tag{4}$$
These estimators were proposed by Jaeckel (1972) and Jurečková (1971). An associated rank-based analysis, including diagnostic procedures, is discussed in Chapters 3–5 of the monograph by Hettmansperger and McKean (2011). A score function needs to be selected. Often the Wilcoxon (linear) score function, $\varphi(u)=\sqrt{12}\,[u-(1/2)]$, is used. When Wilcoxon scores are used, we refer to the resulting fit and analysis as the Wilcoxon analysis. Another frequent choice is the sign score function, φ(u)=sgn[u−(1/2)], which yields the L_1 fit. Score functions are discussed in terms of optimality in Section 2.2.
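To make the pseudonorm (3) concrete, the following minimal Python sketch (standard library only; the function names are ours and are not part of Rfit) generates the scores a(i)=φ[i/(n+1)] and evaluates ∥v∥_φ for the Wilcoxon and sign score functions. It assumes continuous data, so ties in the ranks are simply broken by position.

```python
import math

def wilcoxon_score(u):
    """Wilcoxon (linear) score function: sqrt(12) * (u - 1/2)."""
    return math.sqrt(12.0) * (u - 0.5)

def sign_score(u):
    """Sign score function: sgn(u - 1/2); it yields the L1 fit."""
    return float((u > 0.5) - (u < 0.5))

def ranks(v):
    """Ranks 1..n of the entries of v (ties broken by position)."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0] * len(v)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def jaeckel_norm(v, phi=wilcoxon_score):
    """Pseudonorm ||v||_phi = sum_i a(R(v_i)) v_i with a(i) = phi(i/(n+1))."""
    n = len(v)
    return sum(phi(r / (n + 1)) * vi for r, vi in zip(ranks(v), v))

v = [1.0, -2.0, 3.0]
print(jaeckel_norm(v))               # Wilcoxon pseudonorm
print(jaeckel_norm(v, sign_score))   # 5.0, which equals sum |v_i - median(v)|
```

The Wilcoxon scores sum to zero, so adding a constant to every v_i leaves the pseudonorm unchanged. This invariance is why the intercept must be estimated separately from the residuals.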
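As with LS, the rank-based estimator of the intercept is a location estimate based on the residuals. For LS, the arithmetic mean is used, while for the rank-based estimates the median is generally used; i.e.,
$$\hat\beta_0=\operatorname{med}_{1\le i\le n}\{Y_i-\mathbf{x}_i^T\hat{\boldsymbol{\beta}}_\varphi\}, \tag{5}$$
where $\mathbf{x}_i^T$ denotes the ith row of X.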
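For a single predictor (p=1), the dispersion ∥y−βx∥_φ is a convex, piecewise linear function of β. Its kinks occur only at the pairwise slopes (y_j−y_i)/(x_j−x_i), so its minimum is attained at one of those slopes. The sketch below uses this fact to compute the Wilcoxon slope estimate (4) and then the intercept (5) as the median of the residuals. It is our own brute-force illustration, not the algorithm Rfit uses. It runs in O(n³ log n) time and is meant only for small n.

```python
import math
from statistics import median

def wilcoxon_score(u):
    return math.sqrt(12.0) * (u - 0.5)

def dispersion(resid, scores):
    """Jaeckel dispersion: sum of a(i) times the i-th smallest residual."""
    return sum(a * r for a, r in zip(scores, sorted(resid)))

def rank_fit_simple(x, y, phi=wilcoxon_score):
    """Rank-based fit of y = b0 + b*x + e by searching the pairwise slopes."""
    n = len(y)
    scores = [phi(i / (n + 1)) for i in range(1, n + 1)]
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i in range(n) for j in range(i + 1, n) if x[j] != x[i]]
    b = min(slopes,
            key=lambda s: dispersion([yi - s * xi for xi, yi in zip(x, y)], scores))
    b0 = median([yi - b * xi for xi, yi in zip(x, y)])   # intercept, as in (5)
    return b0, b

x = list(range(10))
y = [2.0 + 3.0 * xi for xi in x]
y[-1] = 1000.0                  # one gross outlier in the Y-space
print(rank_fit_simple(x, y))    # (2.0, 3.0): the outlier does not move the fit
```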
2.1 Theory
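As shown in Hettmansperger and McKean (2011), the influence function of the rank-based estimator $\hat{\boldsymbol{\beta}}_\varphi$ is given by
$$\operatorname{IF}(\mathbf{x}_0,y_0;\hat{\boldsymbol{\beta}}_\varphi)=\tau_\varphi\,\boldsymbol{\Sigma}^{-1}\varphi[F(y_0)]\,\mathbf{x}_0, \tag{6}$$
where the point $(y_0,\mathbf{x}_0^T)$ represents an outlier and $\boldsymbol{\Sigma}=\lim_{n\to\infty}n^{-1}\mathbf{X}^T\mathbf{X}$. The parameter τ_φ is the scale parameter given by
$$\tau_\varphi^{-1}=\int_0^1\varphi(u)\,\varphi_f(u)\,du, \tag{7}$$
where
$$\varphi_f(u)=-\frac{f'\left(F^{-1}(u)\right)}{f\left(F^{-1}(u)\right)}. \tag{8}$$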
Based on this influence function, it is clear that $\hat{\boldsymbol{\beta}}_\varphi$ is robust in the Y-space if the score function φ(u) is bounded. Note, though, that $\hat{\boldsymbol{\beta}}_\varphi$ is not robust in the X-space. A weighted version of the Wilcoxon estimator, called the HBR (high-breakdown rank-based) estimator, achieves 50% breakdown in both the X-space and the Y-space; see Chang et al. (1999).
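As can be seen from the influence function, the asymptotic distribution of the rank-based estimator is given by
$$\hat{\boldsymbol{\beta}}_\varphi\ \dot\sim\ N_p\!\left(\boldsymbol{\beta},\ \tau_\varphi^2(\mathbf{X}^T\mathbf{X})^{-1}\right). \tag{9}$$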
Note that the only difference between the theory for the LS and rank-based estimators is that σ² is replaced by $\tau_\varphi^2$. Hence, the asymptotic relative efficiency (ARE) between the LS and rank-based estimators is $\sigma^2/\tau_\varphi^2$. For Wilcoxon scores, assuming that the random errors have a normal distribution, this ARE is the familiar 3/π≈0.955. That is, for normal errors, using the Wilcoxon analysis instead of the LS analysis costs only about 5% in efficiency. Koul et al. (1987) developed an estimator $\hat\tau_\varphi$ of τ_φ, which is computed by Rfit.
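The value 0.955 can be checked numerically from (7) and (8). For standard normal errors, φ_f(u)=Φ^{−1}(u) and σ²=1, so the ARE equals 1/τ_W² = (∫₀¹√12(u−1/2)Φ^{−1}(u)du)². The sketch below approximates this integral with a midpoint rule (our choice of quadrature) and compares the result with the closed form 3/π.

```python
import math
from statistics import NormalDist

def tau_inverse(phi, phi_f, m=100_000):
    """Midpoint-rule approximation of 1/tau = integral_0^1 phi(u) phi_f(u) du."""
    h = 1.0 / m
    return h * sum(phi((k + 0.5) * h) * phi_f((k + 0.5) * h) for k in range(m))

def wilcoxon_score(u):
    return math.sqrt(12.0) * (u - 0.5)

# For N(0, 1) errors, -f'(x)/f(x) = x, so phi_f(u) = Phi^{-1}(u).
phi_f_normal = NormalDist().inv_cdf
tau_w = 1.0 / tau_inverse(wilcoxon_score, phi_f_normal)
are_wilcoxon_vs_ls = 1.0 / tau_w ** 2        # sigma^2 / tau_W^2 with sigma = 1
print(are_wilcoxon_vs_ls, 3 / math.pi)        # both approximately 0.955
```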
Chapters 3–5 of Hettmansperger and McKean (2011) discuss rank-based analyses of linear, mixed, and nonlinear models. The examples and simulation studies of this paper make use of rank-based confidence intervals, the rank-based F-test, and a robust R², which we briefly present. An asymptotic (1−α)100% confidence interval for a linear combination of regression parameters, say h^Tβ, is given by
$$\mathbf{h}^T\hat{\boldsymbol{\beta}}_\varphi\pm t_{\alpha/2,n-p-1}\,\hat\tau_\varphi\sqrt{\mathbf{h}^T(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{h}}, \tag{10}$$
where t_{α/2,n−p−1} denotes the upper α/2 t-critical value with n−p−1 degrees of freedom.
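Interval (10) is simple arithmetic once $\hat{\boldsymbol{\beta}}_\varphi$, $\hat\tau_\varphi$, and $(\mathbf{X}^T\mathbf{X})^{-1}$ are available from a fit. The helper below is a hypothetical sketch, not an Rfit function. The t-critical value is passed in because Python's standard library has no t quantile function. The inputs in the usage line are made up for illustration.

```python
import math

def rank_ci(h, beta_hat, xtx_inv, tau_hat, t_crit):
    """Interval (10): h'b +/- t * tau_hat * sqrt(h' (X'X)^{-1} h)."""
    p = len(h)
    est = sum(h[i] * beta_hat[i] for i in range(p))
    quad = sum(h[i] * xtx_inv[i][j] * h[j] for i in range(p) for j in range(p))
    half_width = t_crit * tau_hat * math.sqrt(quad)
    return est - half_width, est + half_width

# Illustrative inputs only: p = 2 and contrast h = (1, -1), i.e., beta_1 - beta_2.
print(rank_ci([1.0, -1.0], [2.0, 1.0], [[0.04, 0.01], [0.01, 0.02]], 1.5, 2.01))
# approximately (0.397, 1.603)
```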
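Next, consider a general linear hypothesis of the form
$$H_0:\mathbf{H}\boldsymbol{\beta}=\mathbf{0}\quad\text{versus}\quad H_A:\mathbf{H}\boldsymbol{\beta}\neq\mathbf{0}, \tag{11}$$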
where H is a specified q×p matrix. Let V_Full denote the full model column space of X, and let V_Red denote the reduced model subspace of V_Full constrained by H_0. Denote the distances between Y and these subspaces by $\|\mathbf{Y}-\mathbf{X}\hat{\boldsymbol{\beta}}_\varphi\|_\varphi$ and $\|\mathbf{Y}-\mathbf{W}\hat{\boldsymbol{\theta}}_\varphi\|_\varphi$, respectively, where W denotes a reduced model n×(p−q) design matrix and $\hat{\boldsymbol{\theta}}_\varphi$ denotes the corresponding reduced model rank-based estimator. Then $\text{RD}=\|\mathbf{Y}-\mathbf{W}\hat{\boldsymbol{\theta}}_\varphi\|_\varphi-\|\mathbf{Y}-\mathbf{X}\hat{\boldsymbol{\beta}}_\varphi\|_\varphi$ denotes the reduction in distance when passing from the reduced model to the full model. This is analogous to the LS reduction in squared Euclidean distance (the reduction in sums of squares). The corresponding rank-based F-test statistic is given by
$$F_\varphi=\frac{\text{RD}/q}{\hat\tau_\varphi/2}. \tag{12}$$
An asymptotic level α test is to reject H_0 in favor of H_A if F_φ≥F_α(q,n−p−1), where F_α(q,n−p−1) denotes the upper α critical value of an F-distribution with q and n−p−1 degrees of freedom.
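As an example of this test, consider the hypothesis that all regression coefficients except the intercept parameter are 0; i.e.,
$$H_0:\boldsymbol{\beta}=\mathbf{0}\quad\text{versus}\quad H_A:\boldsymbol{\beta}\neq\mathbf{0}. \tag{13}$$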
In this case the reduced model dispersion is ∥Y∥_φ. Thus the reduction in dispersion is $\text{RD}=\|\mathbf{Y}\|_\varphi-\|\mathbf{Y}-\mathbf{X}\hat{\boldsymbol{\beta}}_\varphi\|_\varphi$. Using this reduction, the robust F-test statistic is given by expression (12) with q=p. A nominal level α test of the hypotheses (13) is to reject H_0 if F_φ≥F_α(p,n−p−1). This test is a robust analogue of the least squares F-test of the hypothesis that all regression coefficients are 0. Recall that the traditional coefficient of determination R² can be expressed as a one-to-one function of the LS F-test statistic. In the same way, a robust coefficient of determination can be formulated as
$$R_2=\frac{\text{RD}}{\text{RD}+(n-p-1)(\hat\tau_\varphi/2)}; \tag{14}$$
see page 243 of [Hettmansperger and McKean (2011]). We refer to R_{2} as a robust coefficient of determination in subsequent examples.
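The statistic (12) and the robust R_2 in (14) follow directly from the two dispersions reported by a rank-based fit and from $\hat\tau_\varphi$. The sketch below uses illustrative numbers, not those of Example 2.1. It also checks that R_2 is the same one-to-one function of F_φ, namely pF_φ/(pF_φ+n−p−1), that links R² to the LS F-statistic. A p-value would additionally require the F cdf (for example, R's pf or SciPy's scipy.stats.f.sf), which is not reproduced here.

```python
def rank_f_stat(disp_reduced, disp_full, q, tau_hat):
    """Rank-based F statistic (12): (RD / q) / (tau_hat / 2)."""
    rd = disp_reduced - disp_full
    return (rd / q) / (tau_hat / 2.0)

def robust_r2(disp_y, disp_full, n, p, tau_hat):
    """Robust coefficient of determination (14); disp_y is ||Y||_phi."""
    rd = disp_y - disp_full
    return rd / (rd + (n - p - 1) * tau_hat / 2.0)

# Illustrative values: n = 50 observations, p = 3 predictors.
n, p, tau_hat = 50, 3, 2.0
disp_y, disp_full = 120.0, 100.0
f_phi = rank_f_stat(disp_y, disp_full, p, tau_hat)
r2 = robust_r2(disp_y, disp_full, n, p, tau_hat)
print(f_phi, r2, p * f_phi / (p * f_phi + n - p - 1))   # approximately 6.667, 0.303, 0.303
```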
2.2 Optimal scores
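The rank-based analysis outlined above requires the selection of a score function φ(u). If the form of the underlying error distribution is known, we can obtain an optimal score function, which minimizes the asymptotic variance of the estimator. Using expressions (7) and (8), we can rewrite 1/τ_φ as
$$\frac{1}{\tau_\varphi}=\int_0^1\varphi(u)\varphi_f(u)\,du=\frac{\int_0^1\varphi(u)\varphi_f(u)\,du}{\sqrt{\int_0^1\varphi_f^2(u)\,du}}\sqrt{I(f)}=\rho\sqrt{I(f)}, \tag{15}$$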
where ρ is the correlation coefficient between φ(U) and φ_f(U), with U uniform on (0,1) (recall that ∫φ²=1), and $I(f)=\int_0^1\varphi_f^2(u)\,du$ is the Fisher information. Therefore, minimizing τ_φ is equivalent to maximizing 1/τ_φ in (15). By the last equality, this is accomplished by making ρ=1, i.e., by taking φ(u) to be φ_f(u), suitably standardized. So the score function (8) optimizes the rank-based analysis. Since $\hat{\boldsymbol{\beta}}_\varphi$ is location and scale equivariant, only the form of f(x) is needed. Furthermore, since in this case $\tau_\varphi=1/\sqrt{I(f)}$, the rank-based estimator is asymptotically fully efficient; i.e., $\hat{\boldsymbol{\beta}}_\varphi$ has the same asymptotic distribution as the maximum likelihood estimator (mle).
For example, if the error distribution is normal, then the optimal score function simplifies to φ(u)=Φ^{−1}(u), the normal scores. If the error distribution is logistic, then the linear Wilcoxon scores are obtained, while double exponential (Laplace) errors produce the sign scores.
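These three cases can be verified by computing φ_f(u)=−f'(F^{−1}(u))/f(F^{−1}(u)) from (8) directly. In the sketch below, each distribution is described by its quantile function and by −f'(x)/f(x), which we derived by hand: x for the normal, tanh(x/2) for the logistic, and sgn(x) for the Laplace. The logistic case returns 2u−1, which is proportional to the Wilcoxon scores √12(u−1/2).

```python
import math
from statistics import NormalDist

def optimal_score(neg_dlog_f, quantile):
    """phi_f(u) = -f'(F^{-1}(u)) / f(F^{-1}(u)) from (8), unstandardized.

    neg_dlog_f(x) must return -f'(x)/f(x); quantile(u) must return F^{-1}(u).
    """
    return lambda u: neg_dlog_f(quantile(u))

normal_scores = optimal_score(lambda x: x, NormalDist().inv_cdf)
logistic_scores = optimal_score(lambda x: math.tanh(x / 2.0),
                                lambda u: math.log(u / (1.0 - u)))
laplace_scores = optimal_score(
    lambda x: math.copysign(1.0, x) if x != 0 else 0.0,
    lambda u: math.log(2 * u) if u < 0.5 else -math.log(2 * (1 - u)))

for u in (0.1, 0.3, 0.7, 0.9):
    print(u, normal_scores(u), logistic_scores(u), laplace_scores(u))
```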
2.3 Computation of the rank-based analysis
A rank-based analysis can be computed using the R package Rfit developed by Kloke and McKean (2012), which can be downloaded from CRAN. Like R, Rfit is free software and runs on all major platforms (Windows, Linux, and Mac). As we discuss in Section 3, it is easy to install new scores in Rfit based on a general score function. For now, we illustrate the computation of a Wilcoxon analysis with Rfit in the following example.
Example 2.1
(Linear Model with Skew-Normal Errors). We use a simulated data set based on the model y=β_0+β_1x_1+β_2x_2+β_3x_3+e, where x_1=1,⋯,50; x_2 and x_3 are variates from a standard normal distribution; and the random errors are generated from a standard skew-normal distribution with shape parameter α=−8, as discussed in Section 3. We set β_1=0.01, β_2=0.15, and β_3=0.0. The sample size is n=50. The data set can be downloaded at the URL cited in Section 1. The code segments below assume that the R vectors y, x1, x2, and x3 contain, respectively, the responses and the values of x_1, x_2, and x_3. For this example, the following R code using the package Rfit computes the Wilcoxon fit of the model, prints the table of coefficients, and saves the Studentized residuals in the vector studw.
Based on the summary table, the 95% confidence intervals for β_{ j }, j=1,2,3, (10), trap the true parameters. The overall F_{ φ } test that all the regression coefficients are 0 except for the intercept is significant, p=0.0257. The value of the robust coefficient of determination R_{2}, expression (14), is 18%.
The last command stores the Studentized Wilcoxon residuals in the vector studw. These residuals are adjusted for both the variance of the errors and location in the X-space; see McKean and Sheather (2009) for a review of robust diagnostic procedures. Figure 1 displays a Studentized residual plot and a normal q-q plot based on the Wilcoxon fit.
As noted in Section 3, the skew-normal distribution chosen to generate the random variates in this example is left skewed. Hence, as expected, the residuals show a longer left tail than right tail. These plots suggest that scores for left-skewed error distributions are more appropriate for these data than the Wilcoxon scores. There also appears to be one large outlier in the left tail. ■
Skew-normal error distributions
The family of skew-normal distributions consists of left and right skewed distributions along with the normal distribution. The pdfs in this family are of the form
$$f(x;\alpha)=2\phi(x)\Phi(\alpha x),\quad -\infty<x<\infty, \tag{16}$$
where the parameter α satisfies −∞<α<∞, and ϕ(x) and Φ(x) are the pdf and cdf of a standard normal distribution, respectively. For this paper, if a random variable X has this pdf, we say that X has a standard skew-normal distribution with parameter α and write X∼SN(α). If α=0, then X has a standard normal distribution. Furthermore, X is left skewed if α<0 and right skewed if α>0. This family of distributions was introduced by Azzalini (1985), who discussed many of its properties.
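A direct transcription of pdf (16) is useful for checking these properties numerically. The minimal sketch below uses only Python's math module (Φ is computed from erfc) and a midpoint rule on [−12,12], which is our choice. It confirms three facts: f(·;α) integrates to 1; f(x;α)=f(−x;−α), so that −X∼SN(−α); and the mean equals δ√(2/π) with δ=α/√(1+α²), a standard result for this family.

```python
import math

def sn_pdf(x, alpha):
    """Standard skew-normal pdf (16): 2 * phi(x) * Phi(alpha * x)."""
    phi = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    Phi = 0.5 * math.erfc(-alpha * x / math.sqrt(2.0))
    return 2.0 * phi * Phi

def integrate(g, a=-12.0, b=12.0, m=24_000):
    """Midpoint rule; the SN tails beyond |x| = 12 are negligible."""
    h = (b - a) / m
    return h * sum(g(a + (k + 0.5) * h) for k in range(m))

alpha = 5.0
total = integrate(lambda x: sn_pdf(x, alpha))
mean = integrate(lambda x: x * sn_pdf(x, alpha))
delta = alpha / math.sqrt(1.0 + alpha ** 2)
print(total, mean, delta * math.sqrt(2.0 / math.pi))
print(sn_pdf(0.7, 3.0), sn_pdf(-0.7, -3.0))   # reflection: equal values
```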
In this paper, we are interested in linear models (1) whose random errors may have skew-normal distributions. In this case, the random error can be written as e_i=bε_i, where ε_i has a standard skew-normal distribution and b is a scale parameter. The rank-based estimator $\hat{\boldsymbol{\beta}}_\varphi$ and the corresponding analysis are regression and scale equivariant, so there is no need to estimate the scale parameter b. The only scale parameter requiring estimation for standard errors is τ_φ. Likewise, for inference on the vector of parameters β, there is no need to estimate the shape parameter α.
Which rank scores would be best for such error distributions? To get an idea, we next discuss the optimal scores for a specified α. Because of equivariance, we need only the form of the pdf (up to location and scale) to obtain the optimal rank-based scores. So, for the derivation of the scores, assume that the random variable X∼SN(α) has pdf (16). It easily follows that
$$-\frac{f'(x;\alpha)}{f(x;\alpha)}=x-\alpha\frac{\phi(\alpha x)}{\Phi(\alpha x)}. \tag{17}$$
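Denote the inverse of the cdf of X by F^{−1}(u;α). Then it follows from expression (15) that the optimal score function for X is
$$\varphi_\alpha(u)=F^{-1}(u;\alpha)-\alpha\frac{\phi\left(\alpha F^{-1}(u;\alpha)\right)}{\Phi\left(\alpha F^{-1}(u;\alpha)\right)}. \tag{18}$$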
For all values of α, this score function is strictly increasing over the interval (0,1); see Azzalini (1985). As expected, for α=0, expression (18) simplifies to the normal scores. Because of the first term on the right side of expression (18), all the score functions in this family are unbounded, which reflects the fact that the skew-normal distributions are light-tailed. Thus the influence functions of the rank-based estimators based on scores in this family are unbounded in the Y-space and, hence, these estimators are not robust in the formal sense. This includes the normal scores, but Huber (1981) pointed out that normal scores are technically robust, and, as our simulation studies show, the skew-normal scores also appear to be technically robust.
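Expression (18) requires the SN quantile function, which has no closed form. The sketch below shows one simple way to evaluate φ_α(u); it is not the implementation in the skewns function. It tabulates the cdf on a grid over [−10,10] with the trapezoidal rule, inverts the table by linear interpolation, and then applies (18). The grid limits and size are our assumptions; they appear adequate for the α range used in this paper. The scores are left unstandardized. This does not affect the fit, because multiplying the scores by a positive constant does not change the minimizer in (4).

```python
import bisect
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def sn_score_function(alpha, lo=-10.0, hi=10.0, m=20_000):
    """Return u -> phi_alpha(u) of expression (18), via a tabulated SN cdf."""
    h = (hi - lo) / m
    xs = [lo + k * h for k in range(m + 1)]
    fs = [2.0 * norm_pdf(x) * norm_cdf(alpha * x) for x in xs]
    cdf = [0.0]
    for k in range(m):                       # cumulative trapezoidal rule
        cdf.append(cdf[-1] + 0.5 * h * (fs[k] + fs[k + 1]))
    total = cdf[-1]
    cdf = [c / total for c in cdf]           # forces cdf[m] == 1

    def score(u):
        k = bisect.bisect_left(cdf, u)       # cdf[k-1] < u <= cdf[k] for 0 < u < 1
        x = xs[k - 1] + h * (u - cdf[k - 1]) / (cdf[k] - cdf[k - 1])
        z = alpha * x
        return x - alpha * norm_pdf(z) / norm_cdf(z)
    return score

phi_neg8 = sn_score_function(-8.0)
print([round(phi_neg8(k / 10), 3) for k in range(1, 10)])
```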
Figure 2 displays the pdfs and corresponding optimal scores for three values of α: −7, 1, and 5. Note that the pdf for α=−7 is left skewed, while those for the positive α values are right skewed. Unsurprisingly, the pdf for α=1 is closer to being symmetric than the other pdfs. The score function for the left-skewed pdf places relatively more emphasis on the right tail than on the left tail, while the reverse is true for the right-skewed pdfs.
3.1 Computation of the rank-based analysis using skew-normal scores
The rank-based analysis can be computed using the R package Rfit. It is easy to install the family of skew-normal scores. Briefly, rank-based scores form a class in Rfit consisting of three parts: the score function, its derivative, and a vector of parameters used in the definition of the function. For the skew-normal scores, details are given in the appendix, but for the reader's convenience the necessary R code is contained in the R function skewns, which we have placed at the web site cited in the introduction.
Example 3.1
(Example 2.1, Continued). We now return to Example 2.1 and show the computation of its rank-based analysis based on the skew-normal scores with shape parameter α=−8. The first two lines of code define the skew-normal scores as salp, and the third line sets the shape parameter. Details of this definition can be found in the appendix.
Note that the skew-normal analysis is much more precise than the Wilcoxon analysis of the last section. The empirical ARE is $(\hat\tau_W/\hat\tau_{\alpha=-8})^2=2.78$; i.e., for this data set, the skew-normal analysis is about 2.8 times as efficient as the Wilcoxon analysis. Note, also, that the robust coefficient of determination R_2 has increased from 18% to 28%.
3.2 Sensitivity analysis
For a verification of the technical robustness of the rank-based skew-normal analysis, we conducted a small sensitivity analysis. We generated n=50 observations from a linear model of the form y_i=x_i+e_i, where x_i has a N(0,1) distribution and e_i has a N(0,10²) distribution. The x_i's and e_i's are all independent. We added outliers of the form
where Δ is in the set {0,20,40,60,80,100,1000,2000}. The sensitivity curve for an estimator $\widehat{\beta}$ is given by the function
where $\hat\beta$ and $\hat\beta(\Delta)$ denote the estimates of β on the original and modified data (19), respectively. We obtained sensitivity curves for the following estimators: Wilcoxon, normal scores, skew-normal (α=3), skew-normal (α=5), skew-normal (α=7), and maximum likelihood (mle). The mles were computed with the R package sn. For all values of Δ, the changes in all of the rank-based estimates were less than 0.004. Thus the rank-based skew-normal estimators, including the normal scores estimator, exhibited technical robustness in this study. On the other hand, the mle was sensitive to the values of Δ; these changes are shown in Table 1. Hence, for this study, the mle was not robust.
3.3 Range of practical α parameters
In Sections 4 and 5, we present the results of simulation studies. The error distributions involve families of skew-normal distributions, so a practical range of α values is needed. Pourahmadi (2007) derived properties of the skew-normal distribution, including its moment generating function. In particular, he showed that if X has the SN(α) distribution, then X converges in distribution to |Z| as α→∞, where Z is N(0,1); i.e., the distribution of X converges to a half-normal distribution. Likewise, X converges in distribution to −|Z| as α→−∞. In terms of α, the convergence is fairly fast. Table 2 illustrates this: it displays the mean (μ), median ($\tilde\mu$), variance (σ²), and coefficient of skewness (ξ) for various values of α. The last column of the table shows the values of these parameters for the half-normal distribution. Positive values of α suffice because if X∼SN(α), then −X∼SN(−α). There is little difference between the standard skew-normal distribution and the half-normal distribution for values of α near ±8. Based on these facts, we use skew-normal distributions with values of α between −12 and 12 for our Monte Carlo investigations.
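The mean, variance, and skewness columns of a table such as Table 2 follow from standard closed-form expressions for the SN(α) family. With δ=α/√(1+α²), they are μ=δ√(2/π), σ²=1−μ², and ξ=((4−π)/2)μ³/σ³. The half-normal distribution is the limiting case δ=1. The median has no closed form and is omitted from this sketch. The α values printed are our choice and are not necessarily those of Table 2.

```python
import math

def sn_moments(alpha=None):
    """Mean, variance, and skewness of SN(alpha); alpha=None gives the half-normal limit."""
    delta = 1.0 if alpha is None else alpha / math.sqrt(1.0 + alpha * alpha)
    mu = delta * math.sqrt(2.0 / math.pi)
    var = 1.0 - mu * mu                      # E[X^2] = 1 for every alpha
    skew = (4.0 - math.pi) / 2.0 * mu ** 3 / var ** 1.5
    return mu, var, skew

for a in (1, 2, 4, 8, 12, None):
    label = "half-normal" if a is None else f"alpha={a}"
    print(label, [round(t, 4) for t in sn_moments(a)])
```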
Because we are interested in linear models, there is another practical reason for this range of α values. Note that the support of a skew-normal distribution is (−∞,∞), which makes it well suited as an error distribution for regression models. On the other hand, the support of a half-normal distribution is (0,∞), which is generally the support of a survival distribution. Often, the logs of such variables are modeled by accelerated failure time (AFT) models, as briefly discussed in Section 1.
Monte Carlo study
This section contains the results of a small simulation study concerning rank-based procedures based on skew-normal scores. The model simulated is
$$y_i=\beta_0+\beta x_i+\theta c_i+e_i, \tag{21}$$
where x_i is distributed N(0,1); e_i is drawn from a selected error distribution; i=1,…,100; the x_i's and e_i's are all independent; and the variable c_i is a treatment indicator with values of either 0 or 1. We selected two error distributions for the study. One is a skew-normal distribution with shape parameter α=5, while the other is a contaminated version of a skew-normal distribution. The contaminated errors are of the form
$$e_i=(1-I_{\varepsilon,i})W_i+I_{\varepsilon,i}V_i, \tag{22}$$
where W_{ i } has a skewnormal distribution with shape parameter α=5, V_{ i } has a $N({\mu}_{c}=10,{\sigma}_{c}^{2}=36)$ distribution, I_{ε,i} has a binomial (1,ε=0.15) distribution, and W_{ i },V_{ i }, and I_{ε,i} are all independent. Hence, this contaminated distribution is skewed with heavy right tails. The design is slightly unbalanced with n_{1}=45 and n_{2}=55. Without loss of generality β,θ, and β_{0} were set to 0.
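The contaminated errors (22) are easy to simulate. The sketch below draws SN(α) variates with the well-known stochastic representation δ|Z_0|+√(1−δ²)Z_1, where Z_0 and Z_1 are independent N(0,1) variates and δ=α/√(1+α²). This generator is our choice and is not necessarily the one used in the study. The defaults follow the settings described above: α=5, ε=0.15, μ_c=10, and σ_c=6.

```python
import math
import random

def rsn(alpha, rng):
    """One SN(alpha) variate via delta*|Z0| + sqrt(1 - delta^2)*Z1."""
    delta = alpha / math.sqrt(1.0 + alpha * alpha)
    z0, z1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return delta * abs(z0) + math.sqrt(1.0 - delta * delta) * z1

def contaminated_error(rng, alpha=5.0, eps=0.15, mu_c=10.0, sigma_c=6.0):
    """One draw of (1 - I) W + I V from expression (22)."""
    if rng.random() < eps:             # I = 1 with probability eps
        return rng.gauss(mu_c, sigma_c)
    return rsn(alpha, rng)             # I = 0: skew-normal W

rng = random.Random(2014)
errors = [contaminated_error(rng) for _ in range(100)]   # one sample of size n = 100
```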
For the rank-based procedures, we selected the procedure based on the score function φ_5(u), (18), which is optimal for a skew-normal distribution with α=5. We then selected three procedures on each side of the optimal, i.e., procedures based on the score functions φ_α(u) with α=2,3,4,6,7, and 8. With the discussion in Section 3.3 in mind, we also selected the rank-based procedure with α=10. The rank-based Wilcoxon, least squares (LS), and mle procedures complete the methods investigated. The first empirical results presented are the empirical AREs. For each estimator, the empirical ARE is the ratio of the empirical mean-square error (MSE) of the mle to the empirical MSE of the estimator. Hence, values of this ratio less than 1 are favorable to the mle, while values greater than 1 are favorable to the estimator. Secondly, we present the empirical confidence levels of confidence intervals with nominal confidence 0.95. For all the procedures, we chose asymptotic confidence intervals of the form $\hat\beta\pm1.96\,\text{SE}(\hat\beta)$. We used a simulation size of 10,000.
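The two summaries reported in Table 3 can be written as small helpers; the names below are ours. Given the per-replicate estimates and standard errors from a simulation, the sketch computes the empirical ARE of an estimator relative to the mle as a ratio of empirical MSEs. It also computes the empirical confidence of the intervals $\hat\beta\pm1.96\,\text{SE}(\hat\beta)$.

```python
def empirical_mse(estimates, true_value):
    return sum((b - true_value) ** 2 for b in estimates) / len(estimates)

def empirical_are(mle_estimates, estimates, true_value):
    """MSE(mle) / MSE(estimator); values above 1 favor the estimator."""
    return empirical_mse(mle_estimates, true_value) / empirical_mse(estimates, true_value)

def empirical_confidence(estimates, std_errors, true_value, z=1.96):
    """Proportion of intervals b +/- z*SE(b) that contain the true value."""
    hits = sum(abs(b - true_value) <= z * se for b, se in zip(estimates, std_errors))
    return hits / len(estimates)

# Toy replicates with true parameter 0 (the study sets beta = theta = 0).
print(empirical_are([0.5, -0.5], [0.25, -0.25], 0.0))       # 4.0
print(empirical_confidence([0.1, 3.0], [1.0, 1.0], 0.0))    # 0.5
```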
The results are presented in Table 3. For the skew-normal errors, for both parameters β and θ, all the rank-based estimators except the Wilcoxon estimator are more efficient than the mle. Note that the most efficient estimator for both β and θ is the rank-based estimator with α=5, although its empirical efficiencies are not significantly different from those of a few of the nearby (α close to 5) rank-based estimators. In terms of validity, the empirical confidences of all the procedures are close to the nominal confidence of 0.95. Not surprisingly, LS performed the worst overall.
For the contaminated error distribution, the rank-based estimators are much more efficient than the mle procedure. Further, the estimator with scores based on α=5 is still the most efficient in the study. It has an empirical efficiency of 785% relative to the mle for β and 1310% for θ. Even the Wilcoxon procedure has an empirical efficiency of over 756% relative to the mle for θ. On the basis of the empirical confidences, for both parameters, all procedures appear to be slightly to moderately conservative. Least squares performed extremely poorly in the contaminated part of the study. All the rank-based procedures based on skew-normal scores display technical robustness in this study.
Hogg-type adaptive procedure
In Section 3, we discussed the rank-based method based on the optimal score function for a specified shape parameter α. Asymptotically, it is as efficient as the mle and, at least for the situations covered in the simulation study, the rank-based estimator appears to be more efficient than the mle for finite samples. In practice, though, the true shape parameter is not known. One could obtain the mle of α and use the corresponding score function. The mle, however, is not robust. Reconsidering the empirical study, note from Table 3 that the rank-based estimators close to the optimal rank-based estimator were more efficient than the mle, and most had efficiencies quite close to that of the optimal one. That is, in selecting a score function, a choice close to the optimal one may suffice. In this section, we consider a Hogg-type adaptive scheme with this as its goal.
Hogg et al. (1975) proposed an adaptive procedure for tests of the difference in locations for the two-sample problem. The null hypothesis is that the two population distributions are the same. The selection of the test is based on a pair of selector statistics that measure, respectively, the skewness and the tail weight of the underlying error distribution. These selector statistics are functions of the order statistics of the combined samples. The candidate tests are several distribution-free rank tests, each of significance level δ. Under the null hypothesis, it follows from the sufficiency and completeness of the combined order statistics and the distribution-freeness of the rank test statistics that the selected test maintains the level δ. See, also, the discussion in Chapter 10 of Hogg et al. (2013).
This is fine for simple location tests, where distribution-free rank tests are available. In our case, however, we are fitting a linear model, so the adaptation must be based on the residuals from an initial fit. Thus the above-mentioned sufficiency result does not hold for our fitting case. Shomrani (2003) developed a Hogg-type adaptive scheme for fitting a linear model based on an initial fit. In Shomrani's scheme, the selector statistics are functions of the residuals from the initial fit. Although the significance level is no longer guaranteed, in a large simulation study the scheme's empirical levels were generally close to the nominal value. In Chapter 6 of Kloke and McKean (2014), R software is developed for this scheme.
The adaptive scheme of Shomrani (2003) was designed for a wide range of error distributions: from left to right skewed and from light- to heavy-tailed distributions. We refine this scheme for the skew-normal family of distributions. As discussed above, two selector statistics are involved. One, Q_1, selects based on skewness, while the other, Q_2, selects based on tail thickness. In a preliminary study over the skew-normal family, tail thickness did not seem to be a paramount issue, so we focus on Q_1 alone.
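Let V=(V_1,V_2,…,V_n)^T be a random vector and define
$$Q_1(\mathbf{V})=\frac{\overline{U}_{.05}-\overline{M}_{.5}}{\overline{M}_{.5}-\overline{L}_{.05}}, \tag{23}$$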
where ${\overline{U}}_{.05}$, ${\overline{M}}_{.5}$, and ${\overline{L}}_{.05}$ are the averages of the largest 5% of the V_{ i }’s, the middle 50% of the V_{ i }’s, and the smallest 5% of the V_{ i }’s, respectively. Large values of Q_{1} indicate that the right tails of the sample are longer than the left tails; i.e., indicating an underlying right skewed distribution. Likewise, small values of Q_{1} indicate leftskewness. Note that as left (right) skew increases, Q_{1} is likely to decrease (increase). The statistic Q_{1} is not robust. One scheme under current investigation is to replace the means by medians. Keep in mind, though, that a robust diagnostic analysis is available for the initial robust fit. Hence in practice, outliers are easily flagged and modifications to the adaptive scheme can be made.
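Selector (23) is straightforward to code. The text does not say how 5% and 50% of n are rounded when they are not integers. The sketch below therefore assumes simple rounding, with at least one observation in each tail; Hogg's original proposal may handle fractional counts differently. The usage lines illustrate three properties: Q_1=1 for a symmetric sample, Q_1>1 for a right-skewed one, and Q_1 is unchanged by a location shift and a positive rescaling.

```python
def q1_selector(v):
    """Hogg-type skewness selector (23): (U05 - M50) / (M50 - L05)."""
    s = sorted(v)
    n = len(s)
    k_tail = max(1, round(0.05 * n))     # size of each 5% tail (assumed rounding)
    k_trim = round(0.25 * n)             # trimmed from each end to keep the middle 50%
    upper = sum(s[-k_tail:]) / k_tail    # average of the largest 5%
    lower = sum(s[:k_tail]) / k_tail     # average of the smallest 5%
    middle = s[k_trim:n - k_trim]
    mid = sum(middle) / len(middle)      # average of the middle 50%
    return (upper - mid) / (mid - lower)

sym = list(range(1, 101))
print(q1_selector(sym))                           # 1.0 for this symmetric sample
print(q1_selector([k * k for k in sym]))          # > 1: right skewed
print(q1_selector([3.0 + 2.0 * k for k in sym]))  # location/scale change: still 1.0
```

Reflecting the sample inverts the statistic, Q_1(−V)=1/Q_1(V). This is why left skewness shows up as values below 1.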
5.1 Adaptive scheme for skew-normal distributions
Our adaptive scheme consists of the seven optimal score functions for skew-normal distributions with α=−12,−8,−4,0,4,8, and 12. So there are three scores each for left and right skewed distributions, along with the normal scores. The scheme uses residuals from an initial Wilcoxon fit. We chose the Wilcoxon fit because it is robust. Also, it is optimal for a symmetric distribution (the logistic) and, hence, less likely to bias the selection toward left or right skewness.
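We set the benchmarks for the selector statistic Q_1 at the medians of the distribution of Q_1 for the skew-normal distributions with α=−10,−6,−2,2,6, and 10. Denote these medians by $m_{-10},m_{-6},\ldots,m_{10}$. Then the selection part of the adaptive scheme is given by
$$\text{select }\varphi_\alpha\text{ with }\alpha=\begin{cases}-12 & \text{if } Q_1\le m_{-10},\\ -8 & \text{if } m_{-10}<Q_1\le m_{-6},\\ -4 & \text{if } m_{-6}<Q_1\le m_{-2},\\ 0 & \text{if } m_{-2}<Q_1\le m_{2},\\ 4 & \text{if } m_{2}<Q_1\le m_{6},\\ 8 & \text{if } m_{6}<Q_1\le m_{10},\\ 12 & \text{if } Q_1>m_{10}.\end{cases} \tag{24}$$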
We estimated the medians of the sampling distributions of Q_1 based on simulations of size 10,000 drawn from the appropriate skew-normal distributions. Because Q_1 is invariant under changes of location and (positive) scale, simulation using standard skew-normal distributions suffices. The estimated (simulated) medians used for the scheme are given in Table 4.
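The benchmark medians and the selection rule (24) can be sketched as follows. Table 4 is not reproduced here, so the code re-estimates the medians by simulation. The sample size n=100 and the 300 replications per α are our assumptions, chosen to keep the example fast; the paper used 10,000 replications. Two further assumptions are ours: SN variates come from the representation δ|Z_0|+√(1−δ²)Z_1, and the 5% and 50% counts in Q_1 are rounded. Ties at a benchmark are assigned to the lower α.

```python
import math
import random
import statistics

def rsn(alpha, rng):
    """One SN(alpha) variate via delta*|Z0| + sqrt(1 - delta^2)*Z1."""
    d = alpha / math.sqrt(1.0 + alpha * alpha)
    return d * abs(rng.gauss(0.0, 1.0)) + math.sqrt(1.0 - d * d) * rng.gauss(0.0, 1.0)

def q1_selector(v):
    s, n = sorted(v), len(v)
    k_tail, k_trim = max(1, round(0.05 * n)), round(0.25 * n)
    upper, lower = sum(s[-k_tail:]) / k_tail, sum(s[:k_tail]) / k_tail
    middle = s[k_trim:n - k_trim]
    mid = sum(middle) / len(middle)
    return (upper - mid) / (mid - lower)

BENCHMARK_ALPHAS = (-10, -6, -2, 2, 6, 10)
CANDIDATE_ALPHAS = (-12, -8, -4, 0, 4, 8, 12)

def simulate_q1_medians(n=100, reps=300, seed=2014):
    """Monte Carlo medians of Q1 for standard SN(alpha) samples of size n."""
    rng = random.Random(seed)
    return {a: statistics.median(q1_selector([rsn(a, rng) for _ in range(n)])
                                 for _ in range(reps))
            for a in BENCHMARK_ALPHAS}

def select_alpha(q1_value, medians):
    """Selection rule (24): the first benchmark median that Q1 does not exceed."""
    for bench, candidate in zip(BENCHMARK_ALPHAS, CANDIDATE_ALPHAS):
        if q1_value <= medians[bench]:
            return candidate
    return CANDIDATE_ALPHAS[-1]

medians = simulate_q1_medians()
print({a: round(m, 3) for a, m in medians.items()})
print(select_alpha(0.344, medians))
```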
In summary, the algorithm for our adaptive scheme is:
1. Fit using Wilcoxon scores ⇒ obtain residuals $\hat{\mathbf{e}}_W$.
2. Compute $Q_1(\hat{\mathbf{e}}_W)$ and then select φ_α using expression (24) with the estimated medians of Q_1.
3. Fit with the selected score function φ_α.
4. Base inference on the fit of Step 3.
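The four steps can be strung together for a single predictor. This self-contained sketch rests on several assumptions of ours and is not the authors' R implementation. The fits minimize the dispersion by searching pairwise slopes, which is exact for p=1 but slow. The SN scores (18) come from a tabulated cdf, and Q_1 uses rounded 5% and 50% counts. The benchmark medians are supplied by the caller; the values in the usage lines are placeholders, not those of Table 4.

```python
import bisect
import math
import random
from statistics import median

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def rsn(alpha, rng):
    """SN(alpha) variate via delta*|Z0| + sqrt(1 - delta^2)*Z1 (used for demo data)."""
    d = alpha / math.sqrt(1.0 + alpha * alpha)
    return d * abs(rng.gauss(0.0, 1.0)) + math.sqrt(1.0 - d * d) * rng.gauss(0.0, 1.0)

def sn_scores(alpha, lo=-10.0, hi=10.0, m=20_000):
    """u -> phi_alpha(u) from (18), inverting a trapezoidal SN cdf table."""
    h = (hi - lo) / m
    xs = [lo + k * h for k in range(m + 1)]
    fs = [2.0 * norm_pdf(x) * norm_cdf(alpha * x) for x in xs]
    cdf = [0.0]
    for k in range(m):
        cdf.append(cdf[-1] + 0.5 * h * (fs[k] + fs[k + 1]))
    total = cdf[-1]
    cdf = [c / total for c in cdf]

    def score(u):
        k = bisect.bisect_left(cdf, u)
        x = xs[k - 1] + h * (u - cdf[k - 1]) / (cdf[k] - cdf[k - 1])
        return x - alpha * norm_pdf(alpha * x) / norm_cdf(alpha * x)
    return score

def wilcoxon(u):
    return math.sqrt(12.0) * (u - 0.5)

def rank_fit(x, y, phi):
    """Fit y = b0 + b*x: slope by pairwise-slope search, intercept by median."""
    n = len(y)
    a = [phi(i / (n + 1)) for i in range(1, n + 1)]

    def disp(b):
        # Jaeckel dispersion: sum of a(i) times the i-th smallest residual
        return sum(ai * r for ai, r in zip(a, sorted(yi - b * xi for xi, yi in zip(x, y))))

    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i in range(n) for j in range(i + 1, n) if x[j] != x[i]]
    b = min(slopes, key=disp)
    return median([yi - b * xi for xi, yi in zip(x, y)]), b

def q1_selector(v):
    s, n = sorted(v), len(v)
    k_tail, k_trim = max(1, round(0.05 * n)), round(0.25 * n)
    middle = s[k_trim:n - k_trim]
    mid = sum(middle) / len(middle)
    return (sum(s[-k_tail:]) / k_tail - mid) / (mid - sum(s[:k_tail]) / k_tail)

def adaptive_fit(x, y, medians):
    """Steps 1-4: Wilcoxon fit, Q1 of its residuals, rule (24), refit with phi_alpha."""
    b0_w, b_w = rank_fit(x, y, wilcoxon)                                  # Step 1
    q1 = q1_selector([yi - b0_w - b_w * xi for xi, yi in zip(x, y)])      # Step 2
    cuts = zip((-10, -6, -2, 2, 6, 10), (-12, -8, -4, 0, 4, 8))
    alpha = next((cand for bench, cand in cuts if q1 <= medians[bench]), 12)
    return alpha, rank_fit(x, y, sn_scores(alpha))                        # Steps 3-4

rng = random.Random(7)
xs = [i / 10 for i in range(1, 51)]
ys = [1.0 + 0.5 * xi + rsn(-8.0, rng) for xi in xs]                       # left-skewed errors
placeholder_medians = {-10: 0.45, -6: 0.5, -2: 0.75, 2: 1.33, 6: 2.0, 10: 2.2}
print(adaptive_fit(xs, ys, placeholder_medians))
```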
We next try the scheme on the data of Example 2.1.
Example 5.1
(Example 2.1, continued). For the data in Example 2.1, the selector Q_1 has the value 0.344; hence the adaptive scheme (24) selects the score function with α=−12. Table 5 summarizes the results of the fits based on skew-normal scores with α's in a neighborhood of −8.
The values of the regression coefficients are given along with the robust coefficients of determination R_2 and the estimates of τ_φ. The fits are quite similar. Notice that, in terms of precision (the $\hat\tau_\varphi$ values), the fit with α=−8 is the most precise. ■
5.2 Simulation study
We investigated the validity and efficiency of the adaptive scheme in a Monte Carlo study using situations similar to those in Section 4. In particular, the sample size is set at n=100, and the linear model contains one predictor coefficient, β, and one indicator coefficient for the treatment, θ. We considered three sampling situations. In the first situation (I), for each simulation, α is randomly selected from the set {−12,−11,…,11,12}, with all values equally likely. Then the random errors for the model are generated from this SN(α) distribution. For Situation II, α is again randomly selected from the same set of values, but now the random errors are random variables of the form
$$e_i=(1-I_{.15})S_i+I_{.15}C_i, \tag{25}$$
where $S_i \sim SN(\alpha)$, $C_i \sim N(10, 6^2)$, $I_{.15}$ is Bernoulli with probability of success 0.15, and $S_i$, $C_i$, and $I_{.15}$ are independent. Thus, Situation II is the same as the second situation of Section 3, i.e., right-skewed contamination. Situation III is the same as Situation II except that $C_i \sim N(0, 6^2)$, i.e., symmetric contamination. We used 10,000 simulations for each situation.
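The following Python sketch shows one way the errors for the three situations could be generated. It assumes the contaminated errors have the mixture form $e_i = (1 - I_{.15})S_i + I_{.15}C_i$: with probability 0.15 the error is drawn from $C_i$, and otherwise from $S_i$. SN(α) draws use the half-normal representation $\delta|U_0| + \sqrt{1-\delta^2}\,U_1$, with $\delta = \alpha/\sqrt{1+\alpha^2}$. The function names and the string labels for the situations are hypothetical.

```python
import math
import random
import statistics

SHAPES = list(range(-12, 13))  # {-12, -11, ..., 12}, drawn equally likely


def rskewnormal(alpha, rng):
    """One SN(alpha) draw: delta*|U0| + sqrt(1 - delta^2)*U1."""
    delta = alpha / math.sqrt(1.0 + alpha * alpha)
    u0, u1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return delta * abs(u0) + math.sqrt(1.0 - delta * delta) * u1


def simulate_errors(situation, n=100, rng=None):
    """Return (alpha, errors) for situation "I", "II", or "III".

    Situation I: pure SN(alpha) errors. Situations II and III: with
    probability 0.15 the error is C_i ~ N(10, 6^2) (II) or N(0, 6^2) (III),
    otherwise S_i ~ SN(alpha), i.e. e_i = (1 - I_i) S_i + I_i C_i (assumed).
    """
    if situation not in ("I", "II", "III"):
        raise ValueError("situation must be 'I', 'II', or 'III'")
    rng = rng if rng is not None else random.Random()
    alpha = rng.choice(SHAPES)
    center = 10.0 if situation == "II" else 0.0
    errors = []
    for _ in range(n):
        s = rskewnormal(alpha, rng)
        if situation != "I" and rng.random() < 0.15:  # I_i ~ Bernoulli(0.15)
            errors.append(rng.gauss(center, 6.0))      # C_i with sd 6
        else:
            errors.append(s)
    return alpha, errors


if __name__ == "__main__":
    alpha, e = simulate_errors("II", rng=random.Random(2014))
    print(alpha, round(statistics.fmean(e), 3))
```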
The methods considered are our adaptive scheme (AdSch), least squares (LS), Wilcoxon (Wil), and maximum likelihood (mle). We also considered the procedure based on the correct α, i.e., the α used to generate the random errors. This is not a statistical method, and we label it Optrv ("rv" for random variable): even for Situation I, the distribution of its rank-based estimate depends on the multinomial random variable involved in selecting the simulated distribution. We include it only to serve as a yardstick for the four statistical methods.
As in the simulation study of Section 3, we considered empirical efficiency (relative to the mle) and the validity of 95% confidence intervals for the parameters β and θ. The results for Situation I are summarized in Table 6. The adaptive scheme was more efficient than all the other statistical methods for both parameters; in particular, it was more efficient than the mle by 6% and 5% for β and θ, respectively. All of the methods were valid. The adaptive scheme was less efficient (by 6%) than the optimal nonstatistical procedure.
For Situation I, the random errors have a skew-normal distribution with shape parameter α drawn from the set {−12, −11, …, 12}, while the scheme selects scores from the set {−12, −8, −4, 0, 4, 8, 12}. Because these sets differ, it does not make sense to ask whether the scheme made the "correct" selection. Instead, we kept track of how often the selected α was within two units of the α used to simulate the errors; over the 10,000 simulations of Situation I, the estimate of this proportion is 0.584. For Situations II and III, the random errors have a contaminated skew-normal distribution, which is not itself a skew-normal distribution, so this proportion is irrelevant there.
The results for Situation II are summarized in Table 7. For this right-skewed contaminated situation, the adaptive scheme is much more efficient than the mle (358% and 356% for β and θ, respectively). The adaptive scheme adapts to this heavy-tailed situation, whereas the mle is not robust. As expected, the Wilcoxon procedure performs best. All the methods are valid. We included the optimal random-variable procedure, Optrv, because it was included in Situation I. Note, though, that in Situations II and III the distribution of the random errors is not a skew-normal distribution, so besides not being a statistical method, Optrv is no longer optimal in any sense.
The results for Situation III, with symmetric contamination, are summarized in Table 8. The adaptive scheme is much more efficient than the mle (367% and 477% for β and θ, respectively). As expected, the Wilcoxon procedure performs best. All the rank-based methods are valid; the mle is slightly conservative for θ. The same comments hold for Optrv as in Situation II.
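The two summaries can be computed from the simulation output as sketched below. The sketch assumes that empirical efficiency relative to the mle is the ratio of empirical mean squared errors, MSE(mle)/MSE(method), so that a value of 1.06 would mean the method is 6% more efficient than the mle. The paper's own definition may differ in detail, for instance by using a ratio of empirical variances. Validity is judged by how close the empirical coverage of the nominal 95% intervals is to 0.95. The function names are hypothetical.

```python
def mse(estimates, true_value):
    """Empirical mean squared error of a list of estimates."""
    return sum((e - true_value) ** 2 for e in estimates) / len(estimates)


def efficiency_vs_mle(method_estimates, mle_estimates, true_value):
    """Assumed definition: MSE(mle) / MSE(method); values > 1 favor the method."""
    return mse(mle_estimates, true_value) / mse(method_estimates, true_value)


def empirical_coverage(intervals, true_value):
    """Proportion of (lower, upper) intervals that contain the true value."""
    return sum(lo <= true_value <= hi for lo, hi in intervals) / len(intervals)


if __name__ == "__main__":
    print(efficiency_vs_mle([1.5, 2.5], [1.0, 3.0], 2.0))     # 4.0
    print(empirical_coverage([(0.0, 1.0), (1.0, 3.0)], 2.0))  # 0.5
```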
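The bookkeeping behind the 0.584 estimate can be sketched as follows; the function name is hypothetical. The assertion records a simple fact about the two sets: because the candidate scores are spaced four units apart, every α in {−12, …, 12} lies within two units of some candidate, so a perfect selector would attain a proportion of 1.

```python
GRID = (-12, -8, -4, 0, 4, 8, 12)  # shapes of the candidate scores
SHAPES = range(-12, 13)            # shapes used to generate the errors


def proportion_within_two(selected, simulated):
    """Share of runs whose selected alpha is within two units of the true one."""
    hits = sum(abs(s - t) <= 2 for s, t in zip(selected, simulated))
    return hits / len(simulated)


# The grid spacing is 4, so every simulated shape is within 2 of some candidate.
assert all(min(abs(a - g) for g in GRID) <= 2 for a in SHAPES)

if __name__ == "__main__":
    print(proportion_within_two([-12, 0, 8], [-10, 3, 8]))  # 2 of 3 runs
```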
Conclusion
Rank-based analyses of linear models depend on the selection of a score function. In practice, the Wilcoxon (linear) score function is often chosen. These scores require no tuning constants and, further, the Wilcoxon rank-based analysis attains 95.5% efficiency relative to the traditional least squares (LS) analysis when the random error distribution is normal. However, rank-based analyses are easily optimized if there is knowledge of the distribution of the random errors of the linear model. For example, if the random errors are normally distributed, then selecting the normal scores for the rank-based analysis results in 100% efficiency (full efficiency) relative to the LS analysis.
In this paper, we have presented rank-based analyses based on appropriate score functions for random errors having a distribution from the family of skew-normal distributions. In this case, the score function depends on the shape parameter α, −∞ < α < ∞. Of course, the rank-based analysis is fully efficient if the correct α is known. The rank-based analysis is a complete analysis, including fitting, inference (rank-based ANOVA), and robust diagnostic procedures. Based on the results of our Monte Carlo study, these rank-based analyses appear to be more efficient than the maximum likelihood (mle) analysis for the skew-normal distributions considered. The most efficient rank-based analysis is based on the optimal score function, but even the rank-based analyses with shape parameters within three units of the correct α were more efficient than the mle in these situations. They were much more efficient than the mle in situations where the error distribution was a contaminated skew-normal distribution. Based on empirical confidence levels, all the methods in the study were valid.
The good efficiency of the rank-based analyses in a neighborhood of the true α suggests that a Hogg-type adaptive scheme would have high efficiency. In Section 5, we developed such a scheme for the skew-normal family of distributions based on an initial robust Wilcoxon fit. In our Monte Carlo studies, this scheme was more efficient than the mle over the family of skew-normal distributions and much more efficient than the mle over the contaminated skew-normal situations. Furthermore, for the situations covered, this adaptive scheme appears to be valid.
Kloke and McKean (2012) developed the R package Rfit for these rank-based analyses; it can be freely downloaded from CRAN. The default scores are the Wilcoxon scores, but, as discussed in Section 3, it is easy to add classes of scores, including the optimal scores for skew-normal distributions. The adaptive scheme of Section 5 is also easily coded using Rfit. The necessary code for the scores and the adaptive scheme can be found at the web site cited in Section 1. Hence, these rank-based analyses are easy to compute.
The rank-based analyses using skew-normal scores are robust in Y (response) space, but not in X (factor) space. The weighted Wilcoxon fit proposed by Chang et al. (1999) yields a robust rank-based analysis that possesses 50% breakdown in X (factor) space. We are now developing such an analysis for the skew-normal scores; see Abebe et al. (2014) for discussion. This analysis could also be part of an adaptive scheme.
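The 95.5% figure is the asymptotic relative efficiency $3/\pi$ of the Wilcoxon analysis relative to LS at the normal distribution. A short derivation, using the standard form of the Wilcoxon scale parameter $\tau_W$ for an error density $f$ with variance $\sigma^2$:

```latex
\tau_W^{-1} = \sqrt{12}\int_{-\infty}^{\infty} f^2(x)\,dx,
\qquad
\operatorname{ARE}(\mathrm{Wil}, \mathrm{LS}) = \frac{\sigma^2}{\tau_W^2}
  = 12\,\sigma^2\Bigl(\int_{-\infty}^{\infty} f^2(x)\,dx\Bigr)^2 .

\text{For } f(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-x^2/(2\sigma^2)}:\quad
\int_{-\infty}^{\infty} f^2(x)\,dx
  = \frac{1}{2\pi\sigma^2}\int_{-\infty}^{\infty} e^{-x^2/\sigma^2}\,dx
  = \frac{\sigma\sqrt{\pi}}{2\pi\sigma^2} = \frac{1}{2\sigma\sqrt{\pi}},

\operatorname{ARE}(\mathrm{Wil}, \mathrm{LS})
  = 12\,\sigma^2 \cdot \frac{1}{4\pi\sigma^2} = \frac{3}{\pi} \approx 0.955 .
```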
Appendix A: R code for the class of skew-normal scores
In Rfit, rank-based scores form a class consisting of three principal parts: (1) the score function itself; (2) the derivative of the score function, which Rfit uses in the function that estimates $\tau_\varphi$; and (3) any parameters in the function's expression. For the skew-normal scores, parts (1) and (3) are, respectively, expression (18) and the shape parameter α. Let l(x) denote the function defined in expression (18). The derivative of the optimal score function is given by
where
To complete the class statement for the skew-normal scores, we need only compute the quantiles $F^{-1}(u;\alpha)$. Azzalini (2013) developed the R package sn (available at CRAN), which computes the quantile function $F^{-1}(u;\alpha)$ as well as the corresponding pdf and cdf. The command qsn(u, shape=alpha) returns $F^{-1}(u;\alpha)$ for $0 < u < 1$. The package sn requires the package mnormt.
The following R code defines the class of skew-normal scores:
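As a numerical check on these pieces, the following Python sketch (not the R class of this appendix) evaluates l(x) and its derivative. It assumes that expression (18) is $l(x) = -f'(x;\alpha)/f(x;\alpha)$ for the skew-normal density $f(x;\alpha) = 2\phi(x)\Phi(\alpha x)$, where $\phi$ and $\Phi$ denote the standard normal pdf and cdf. This gives $l(x) = x - \alpha\,r(\alpha x)$ with $r(t) = \phi(t)/\Phi(t)$. The sketch then applies the chain rule $\varphi_\alpha'(u) = l'(F^{-1}(u;\alpha))/f(F^{-1}(u;\alpha);\alpha)$. To avoid computing $F^{-1}$ without sn, it evaluates the derivative at $u = F(x)$. The function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (pdf and cdf)


def sn_pdf(x, alpha):
    """Skew-normal density f(x; alpha) = 2 * normal pdf(x) * normal cdf(alpha x)."""
    return 2.0 * STD_NORMAL.pdf(x) * STD_NORMAL.cdf(alpha * x)


def pdf_cdf_ratio(t):
    """r(t) = normal pdf(t) / normal cdf(t); fails if the cdf underflows to 0."""
    return STD_NORMAL.pdf(t) / STD_NORMAL.cdf(t)


def l(x, alpha):
    """Assumed form of expression (18): -f'(x)/f(x) = x - alpha r(alpha x)."""
    return x - alpha * pdf_cdf_ratio(alpha * x)


def l_prime(x, alpha):
    """Derivative of l: 1 + alpha^2 r(alpha x) (alpha x + r(alpha x))."""
    r = pdf_cdf_ratio(alpha * x)
    return 1.0 + alpha * alpha * r * (alpha * x + r)


def score_derivative_at(x, alpha):
    """Derivative of the score phi(u) = l(F^{-1}(u)) at u = F(x): l'(x) / f(x)."""
    return l_prime(x, alpha) / sn_pdf(x, alpha)


if __name__ == "__main__":
    for x in (-1.0, 0.0, 1.0):
        print(x, l(x, -7.0), score_derivative_at(x, -7.0))
```

With α = 0 the sketch reduces to the normal scores, l(x) = x and l'(x) = 1. Because r(t) + t > 0 for every t, l'(x) ≥ 1, so the scores are nondecreasing, as a score function must be.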
The next code segment obtains the data for a plot of the scores with shape parameter α=−7.
References
Abebe A, McKean JW: Weighted Wilcoxon estimators in nonlinear regression. Aust. N. Z. J. Stat. 2013, 55: 401–420. 10.1111/anzs.12046
Abebe A, McKean JW, Kloke JD, Bilgic Y: Iterated reweighted rank-based estimates for GEE models. Technical Report; 2014.
Azzalini A: A class of distributions which includes the normal ones. Scand. J. Stat. 1985, 12: 171–178.
Azzalini A: R package sn: The skew-normal and skew-t distributions (version 0.4-18). 2013.
Chang W, McKean J, Naranjo J, Sheather S: High-breakdown rank regression. J. Am. Stat. Assoc. 1999, 94: 205–219. 10.1080/01621459.1999.10473836
Hettmansperger TP, McKean JW: Robust Nonparametric Statistical Methods. Chapman-Hall, Boca Raton, FL; 2011.
Hogg RV, Fisher DM, Randles RH: A two-sample adaptive distribution-free test. J. Am. Stat. Assoc. 1975, 70: 656–661.
Hogg RV, McKean JW, Craig AT: Introduction to Mathematical Statistics. Pearson, Boston; 2013.
Huber PJ: Robust Statistics. John Wiley & Sons; 1981.
Jaeckel LA: Estimating regression coefficients by minimizing the dispersion of residuals. Ann. Math. Stat. 1972, 43: 1449–1458. 10.1214/aoms/1177692377
Jurečková J: Nonparametric estimate of regression coefficients. Ann. Math. Stat. 1971, 42: 1328–1338. 10.1214/aoms/1177693245
Kloke JD, McKean JW: Rfit: Rank-based estimation for linear models. R J. 2012, 4: 57–64.
Kloke JD, McKean JW: Nonparametric Statistical Methods Using R. Chapman-Hall, Boca Raton, FL; 2014.
Kloke JD, McKean JW, Rashid M: Rank-based estimation and associated inferences for linear models with cluster correlated errors. J. Am. Stat. Assoc. 2009, 104: 384–390. 10.1198/jasa.2009.0116
Koul HL, Sievers GL, McKean JW: An estimator of the scale parameter for the rank analysis of linear models under general score functions. Scand. J. Stat. 1987, 14: 131–141.
McKean JW: Small sample properties of JR estimators. In JSM Proceedings. American Statistical Association, Alexandria, VA; 2013.
McKean J, Sheather S: Diagnostic procedures. Wiley Interdiscip. Rev. Comput. Stat. 2009, 1(2): 221–233. 10.1002/wics.12
McKean J, Sievers G: Rank scores suitable for analysis of linear models under asymmetric error distributions. Technometrics 1989, 31: 207–218. 10.1080/00401706.1989.10488514
Pourahmadi M: Construction of skew-normal random variables: Are they linear combinations of normals and half-normals. J. Stat. Theory Appl. 2007, 3: 314–328.
Shomrani A: A comparison of different schemes for selecting and estimating score functions based on residuals. PhD thesis, Department of Statistics, Western Michigan University; 2003.
Acknowledgement
We acknowledge the helpful comments of an associate editor and a referee on the original manuscript.
Additional information
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
The authors contributed equally to the manuscript. All authors read and approved the final manuscript.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0), which permits use, duplication, adaptation, distribution, and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Keywords
 Linear models
 Monte Carlo
 Nonparametrics
 Regression rank scores
 Robust
 Wilcoxon procedures