
Efficient and adaptive rank-based fits for linear models with skew-normal errors

Journal of Statistical Distributions and Applications 2014, 1:18

https://doi.org/10.1186/s40488-014-0018-0

Received: 10 February 2014

Accepted: 21 July 2014

Published: 22 October 2014

Abstract

The rank-based fit of a linear model is based on minimizing a norm. A score function needs to be selected for the fit and the proper choice leads to asymptotically efficient regression estimators, i.e., fits equivalent to the maximum likelihood estimators (mle). In this paper, we present the family of optimal score functions for the skew-normal family of distributions. We show the easy computation of this rank-based fit using the R package Rfit. We present the results of a small simulation study comparing the rank-based estimators and the mles in terms of efficiency and validity over skew-normal and contaminated normal distributions. We also develop and present empirical results for a Hogg-type adaptive procedure for selecting among a family of these scores based on a robust initial fit.

Keywords

Linear models; Monte Carlo; Nonparametrics; Regression rank scores; Robust; Wilcoxon procedures

Introduction

Rank-based fitting of linear models offers an attractive alternative to least squares (LS) and maximum likelihood (mle) fitting. The geometry of the rank-based fit is similar to that of LS. Simply replace the Euclidean squared norm used in the LS fit with another norm which, unlike LS, results in a robust fit. The accompanying robust analysis is the analogue of the LS ANOVA and ANCOVA. The rank-based analysis offers a complete analysis including robust diagnostics to check quality of fit. This rank-based analysis has recently been extended to mixed and nonlinear models; see Kloke et al. (2009) and Abebe and McKean (2013), respectively. A full development of the rank-based analysis can be found in Chapters 3-5 of the monograph by Hettmansperger and McKean (2011). The rank-based analysis is, generally, highly efficient. It can easily be optimized depending on the information available concerning the distribution of the random errors. For example, if the form of the error distribution is known, then an appropriate rank-based procedure can be selected to attain full efficiency.

In this paper, we discuss rank-based analyses which are appropriate for the skew-normal (SN) family of distributions. This is a rich family of skewed distributions developed by Azzalini (1985). The skewness of a distribution in this family is controlled by a shape parameter α, −∞<α<∞. Distributions are left or right skewed, depending on whether α<0 or α>0, respectively. If α=0 then the distribution is normal. As we discuss in Section 3, all of these distributions are light-tailed. Such families of distributions occur frequently in accelerated failure time (AFT) models. The response of interest in such models is survival time and its log is frequently modeled in terms of a log linear model. The random errors for these log linear models are, generally, skewed. For instance, the family of distributions for log linear models when survival time follows an F-distribution contains a wide variety of skewed distributions with tail weights that range from moderate to heavy; see McKean and Sievers (1989). Hence, the skew-normal family adds a rich class of light-tailed skewed distributions which also includes the normal distribution.

In Section 2, we outline the rank-based analysis for a general linear model. The computation of these analyses is easily handled by the R package Rfit, developed by Kloke and McKean (2013), which can be freely downloaded from CRAN (http://cran.us.r-project.org/). We discuss this software for an example in this section and we continue such discussion in the remainder of the article. The data for the example and the R code, supplemental to Rfit, used in this article are available to the reader at http://www.stat.wmich.edu/mckean/SN/.

In Section 3, we develop rank-based analyses for the skew-normal family. These analyses are efficient for this family and the appropriate analysis is fully efficient. As we discuss, these analyses are technically robust similar to the optimal rank-based analysis for normally distributed errors. In contrast, in a sensitivity analysis, we show that the maximum likelihood fit (mle) is not robust. In Section 4, we present the results of a Monte Carlo study which verify the robustness and validity of the rank-based analysis over a family of SN distributions and contaminated SN distributions. These studies confirm the nonrobustness of the mle analysis.

The rank-based analysis depends on the shape parameter α. One outcome of the Monte Carlo study is that rank-based analyses based on shape parameters in a neighborhood of the correct α had very similar behavior to that using the correct α. This suggests that a simple Hogg-type adaptive procedure would have excellent properties in this situation. In Section 5, we develop such an adaptive scheme for the family of SN distributions. In a simulation study, we verify the efficiency and validity of this scheme over SN situations and, further, over two contaminated situations.

Notation and rank-based analysis

Let $\mathbf{Y}$ be an $n\times 1$ vector of responses which follows the linear model given by
$$\mathbf{Y} = \mathbf{1}_n\beta_0 + \mathbf{X}\boldsymbol{\beta} + \mathbf{e},$$
(1)

where $\mathbf{1}_n$ is a vector of $n$ ones; $\mathbf{X}$ is an $n\times p$ design matrix which may contain predictors (covariates) as well as indicator (dummy) variables; $\beta_0$ is an intercept parameter; $\boldsymbol{\beta}$ is a $p\times 1$ vector of regression parameters; and $\mathbf{e}$ is an $n\times 1$ vector of random errors. Because we have an intercept parameter in the model, we assume without loss of generality that the design matrix $\mathbf{X}$ is centered (all columns of $\mathbf{X}$ have mean 0). For the theory discussed below, assume that the components of $\mathbf{e}$ are iid with pdf $f(x)$ and cdf $F(x)$, where $F(x)$ is unknown.

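The least squares (LS) estimator of $\boldsymbol{\beta}$ is the vector $\hat{\boldsymbol{\beta}}_{LS}$ which minimizes the Euclidean distance between $\mathbf{Y}$ and the column space of $\mathbf{X}$; that is, it satisfies
$$\hat{\boldsymbol{\beta}}_{LS} = \operatorname{Argmin}\,\|\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}\|_2^2,$$
(2)

where $\|\mathbf{v}\|_2^2 = \sum_{i=1}^{n} v_i^2$ is the squared Euclidean norm.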

For the rank-based estimates, simply replace the Euclidean norm $\|\cdot\|_2^2$ by the pseudo-norm
$$\|\mathbf{v}\|_\varphi = \sum_{i=1}^{n} a[R(v_i)]\,v_i, \qquad \mathbf{v}\in\mathbb{R}^n,$$
(3)
where $R(v_i)$ is the rank of $v_i$ among $v_1,\ldots,v_n$ and the scores $a[i]$ are generated as $a[i]=\varphi[i/(n+1)]$, for a nondecreasing bounded square-integrable function $\varphi(u)$, satisfying, without loss of generality, the standardizing conditions $\int_0^1\varphi(u)\,du = 0$ and $\int_0^1\varphi^2(u)\,du = 1$. Then the rank-based estimator minimizes the $\|\cdot\|_\varphi$-distance between $\mathbf{Y}$ and the column space of $\mathbf{X}$; i.e.,
$$\hat{\boldsymbol{\beta}}_\varphi = \operatorname{Argmin}\,\|\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}\|_\varphi.$$
(4)

These estimators were proposed by Jaeckel (1972) and Jurečková (1971). An associated rank-based analysis, including diagnostic procedures, is discussed in Chapters 3-5 of the monograph by Hettmansperger and McKean (2011). A score function needs to be selected. Often the Wilcoxon (linear) score function is used, $\varphi[u] = \sqrt{12}\,[u-(1/2)]$. When Wilcoxon scores are used, we refer to the subsequent fit and analysis as the Wilcoxon analysis. Another frequent choice is the sign scores function, $\varphi[u]=\operatorname{sgn}[u-(1/2)]$, which yields the $l_1$-fit. Score functions are discussed in terms of optimality in Section 2.2.
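To make the pseudo-norm (3) concrete, the following is a minimal Python sketch, not the Rfit implementation. The function names (`wilcoxon_score`, `sign_score`, `ranks`, `pseudo_norm`) and the example residuals are hypothetical, and ties are broken by position, which is an assumption of this sketch.

```python
import math

def wilcoxon_score(u):
    """Wilcoxon (linear) score function: sqrt(12) * (u - 1/2)."""
    return math.sqrt(12.0) * (u - 0.5)

def sign_score(u):
    """Sign score function sgn(u - 1/2), which leads to the l1 fit."""
    return (u > 0.5) - (u < 0.5)

def ranks(v):
    """Ranks 1..n of the entries of v (ties broken by position; an assumption)."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0] * len(v)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def pseudo_norm(v, score=wilcoxon_score):
    """Pseudo-norm (3): sum of a[R(v_i)] * v_i with a[i] = score(i / (n + 1))."""
    n = len(v)
    return sum(score(r / (n + 1)) * x for r, x in zip(ranks(v), v))

# Hypothetical residual vector, used only to exercise the functions.
residuals = [0.3, -1.2, 2.5, 0.1, -0.4]
print(pseudo_norm(residuals))              # dispersion under Wilcoxon scores
print(pseudo_norm(residuals, sign_score))  # dispersion under sign scores
```

Because the scores sum to zero, the value is unchanged when a constant is added to every entry, which is why the intercept is estimated separately from the residuals.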

As with LS, the rank-based estimator of the intercept is a location estimate based on the residuals. For LS, the arithmetic mean is used while for the rank-based estimates, generally, the median is used; i.e.,
$$\hat{\beta}_0 = \operatorname{med}_i\,\{Y_i - \mathbf{x}_i^T\hat{\boldsymbol{\beta}}_\varphi\}.$$
(5)

2.1 Theory

As shown in Hettmansperger and McKean (2011), the influence function of the rank-based estimator $\hat{\boldsymbol{\beta}}_\varphi$ is given by
$$\Omega(\mathbf{x}_0, y_0; \hat{\boldsymbol{\beta}}_\varphi) = \tau_\varphi\,(\mathbf{X}^T\mathbf{X}/n)^{-1}\,\varphi[F(y_0)]\,\mathbf{x}_0,$$
(6)
where the point $(y_0, \mathbf{x}_0^T)$ represents an outlier. The parameter $\tau_\varphi$ is the scale parameter given by
$$\tau_\varphi^{-1} = \int_0^1 \varphi(u)\,\varphi_f(u)\,du,$$
(7)
where
$$\varphi_f(u) = -\frac{f'(F^{-1}(u))}{f(F^{-1}(u))}.$$
(8)

Based on this influence function it is clear that $\hat{\boldsymbol{\beta}}_\varphi$ is robust in Y-space if the score function $\varphi(u)$ is bounded. Note, though, that $\hat{\boldsymbol{\beta}}_\varphi$ is not robust in X-space. A weighted version of the Wilcoxon estimator called the HBR (high breakdown rank-based) estimator achieves 50% breakdown in both the X-space and the Y-space; see Chang et al. (1999).

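As can be seen from the influence function, the asymptotic distribution of the rank-based estimator is given by
$$\hat{\boldsymbol{\beta}}_\varphi \;\dot{\sim}\; N_p\!\left(\boldsymbol{\beta},\ \tau_\varphi^2\,(\mathbf{X}^T\mathbf{X})^{-1}\right).$$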
(9)

Note that the only difference between the theory for the LS and rank-based estimators is that $\sigma^2$ is replaced by $\tau_\varphi^2$. Hence, the asymptotic relative efficiency (ARE) between the LS and rank-based estimator is $\sigma^2/\tau_\varphi^2$. For Wilcoxon scores, assuming that the random errors have a normal distribution, this ARE is the familiar 0.955; that is, for normal errors using the Wilcoxon analysis instead of the LS analysis results in only a 5% loss of efficiency. Koul et al. (1987) developed an estimator of $\tau_\varphi$, $\hat{\tau}_\varphi$, which is computed by Rfit.
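As a check on the value 0.955, the following short derivation sketches how it follows from expressions (7) and (8), assuming Wilcoxon scores and $N(0,\sigma^2)$ errors. The subscript $W$ is notation introduced here for the Wilcoxon case; the middle step is an integration by parts.

```latex
\tau_W^{-1}
  = \sqrt{12}\int_0^1 \left(u - \tfrac12\right)\varphi_f(u)\,du
  = -\sqrt{12}\int_{-\infty}^{\infty} \left(F(x) - \tfrac12\right) f'(x)\,dx
  = \sqrt{12}\int_{-\infty}^{\infty} f^2(x)\,dx .
% For the N(0, sigma^2) density:
\int_{-\infty}^{\infty} f^2(x)\,dx = \frac{1}{2\sigma\sqrt{\pi}}
\;\Longrightarrow\;
\tau_W = \sigma\sqrt{\pi/3},
\qquad
\mathrm{ARE} = \frac{\sigma^2}{\tau_W^2} = \frac{3}{\pi} \approx 0.955 .
```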

Chapters 3-5 of Hettmansperger and McKean (2011) discuss rank-based analyses of linear, mixed, and nonlinear models. The examples and simulation studies of this paper make use of these confidence intervals and a robust $R^2$, which we briefly present. An asymptotic $(1-\alpha)100\%$ confidence interval for a linear combination of regression parameters, say, $\mathbf{h}^T\boldsymbol{\beta}$, is given by
$$\mathbf{h}^T\hat{\boldsymbol{\beta}}_\varphi \pm t_{\alpha/2,\,n-p-1}\,\hat{\tau}_\varphi\,\sqrt{\mathbf{h}^T(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{h}},$$
(10)

where $t_{\alpha/2,\,n-p-1}$ denotes the upper $\alpha/2$ t-critical value with $n-p-1$ degrees of freedom.

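Next, consider a general linear hypothesis of the form
$$H_0: \mathbf{H}\boldsymbol{\beta} = \mathbf{0} \quad\text{versus}\quad H_A: \mathbf{H}\boldsymbol{\beta} \neq \mathbf{0},$$
(11)
where $\mathbf{H}$ is a specified $q\times p$ matrix. Let $V_{Full}$ and $V_{Red}$ denote the respective full model column space of $\mathbf{X}$ and the reduced model subspace of $V_{Full}$ constrained by $H_0$. Denote the distance between $\mathbf{Y}$ and each of these subspaces respectively by $\|\mathbf{Y} - \mathbf{X}\hat{\boldsymbol{\beta}}_\varphi\|_\varphi$ and $\|\mathbf{Y} - \mathbf{W}\hat{\boldsymbol{\theta}}_\varphi\|_\varphi$, where $\mathbf{W}$ denotes a reduced model $n\times(p-q)$ design matrix and $\hat{\boldsymbol{\theta}}_\varphi$ denotes the corresponding reduced model rank-based estimator. Then $RD = \|\mathbf{Y} - \mathbf{W}\hat{\boldsymbol{\theta}}_\varphi\|_\varphi - \|\mathbf{Y} - \mathbf{X}\hat{\boldsymbol{\beta}}_\varphi\|_\varphi$ denotes the reduction in distance when passing from the reduced model to the full model. This is analogous to the LS reduction in Euclidean squared-distance (reduction in sums-of-squares). The corresponding rank-based F-test is given by
$$F_\varphi = \frac{RD/q}{\hat{\tau}_\varphi/2}.$$
(12)

An asymptotic level $\alpha$ test is to reject $H_0$ in favor of $H_A$ if $F_\varphi \geq F_\alpha(q, n-p-1)$, where $F_\alpha(q, n-p-1)$ denotes the upper $\alpha$-critical value of an F-distribution with $q$ and $n-p-1$ degrees of freedom.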

As an example of this test, consider the hypothesis that all regression coefficients except for the intercept parameter are 0; i.e.,
$$H_0: \boldsymbol{\beta} = \mathbf{0} \quad\text{versus}\quad H_A: \boldsymbol{\beta} \neq \mathbf{0}.$$
(13)
In this case the reduced model dispersion is $\|\mathbf{Y}\|_\varphi$. Thus the reduction in dispersion is $RD = \|\mathbf{Y}\|_\varphi - \|\mathbf{Y} - \mathbf{X}\hat{\boldsymbol{\beta}}_\varphi\|_\varphi$. Using this reduction, the robust F-test statistic is given by expression (12). A nominal level $\alpha$ test of the hypotheses (13) is to reject $H_0$ if $F_\varphi \geq F_\alpha(p, n-p-1)$. This test is a robust analogue of the least squares F-test that all regression coefficients are 0. Recall that the traditional coefficient of determination $R^2$ can be expressed as a one-to-one function of the LS F-test. In the same way, a robust coefficient of determination $R^2$ can be formulated as
$$R^2 = \frac{RD}{RD + (n-p-1)(\hat{\tau}_\varphi/2)};$$
(14)

see page 243 of Hettmansperger and McKean (2011). We refer to $R^2$ as a robust coefficient of determination in subsequent examples.
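The arithmetic behind expressions (12) and (14) can be sketched in a few lines of Python. This is only an illustration: the function names and the input numbers are hypothetical, and in practice the reduction in dispersion and the estimate of τ would come from the fitted model.

```python
def rank_f_statistic(rd, q, tau_hat):
    """Rank-based F statistic, expression (12): (RD / q) / (tau_hat / 2)."""
    return (rd / q) / (tau_hat / 2.0)

def robust_r2(rd, n, p, tau_hat):
    """Robust coefficient of determination, expression (14)."""
    return rd / (rd + (n - p - 1) * (tau_hat / 2.0))

# Hypothetical numbers, chosen only to exercise the formulas.
n, p = 50, 3
rd, tau_hat = 4.0, 1.5

f_phi = rank_f_statistic(rd, q=p, tau_hat=tau_hat)
r2 = robust_r2(rd, n, p, tau_hat)

# For the test of (13), R^2 is a one-to-one function of F:
# R^2 = p * F / (p * F + n - p - 1).
print(f_phi, r2, p * f_phi / (p * f_phi + n - p - 1))
```

The last two printed values agree, which mirrors the one-to-one relation between the LS F-test and the traditional coefficient of determination.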

2.2 Optimal scores

The rank-based analysis outlined above requires the selection of a score function $\varphi(u)$. If the form of the underlying error distribution is known, we can obtain an optimal score function which minimizes the variance of the estimator. Using expressions (7) and (8), we can rewrite $1/\tau_\varphi$ as
$$\tau_\varphi^{-1} = \int_0^1 \varphi(u)\varphi_f(u)\,du = \frac{\int_0^1 \varphi(u)\varphi_f(u)\,du}{\left[\int_0^1 \varphi_f^2(u)\,du\right]^{1/2}}\left[\int_0^1 \varphi_f^2(u)\,du\right]^{1/2} = \rho\left[\int_0^1 \varphi_f^2(u)\,du\right]^{1/2} = \rho\,\sqrt{I(f)},$$
(15)

where $\rho$ is a correlation coefficient and $I(f)$ is Fisher information. Therefore, minimizing $\tau_\varphi$ is equivalent to maximizing the right-hand side of (15). By the last equality, this is accomplished by making $\rho=1$; i.e., by taking $\varphi(u)$ to be $\varphi_f(u)$. So expression (8) is the score function which optimizes the rank-based analysis. Since $\hat{\boldsymbol{\beta}}_\varphi$ is location and scale equivariant, only the form of $f(x)$ is needed. Furthermore, since in this case $\tau_\varphi = 1/\sqrt{I(f)}$, the rank-based estimator $\hat{\boldsymbol{\beta}}_\varphi$ is asymptotically fully efficient, i.e., $\hat{\boldsymbol{\beta}}_\varphi$ has the same asymptotic distribution as the maximum likelihood estimator (mle).

For example, if the error distribution is normal, then the optimal score function simplifies to $\varphi(u)=\Phi^{-1}(u)$, the normal scores. If the error distribution is logistic, then the linear Wilcoxon scores are obtained, while double exponential (Laplace) distributed errors produce the sign scores.
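The logistic case can be verified directly from expression (8). The derivation below is a brief sketch for the standard logistic density; the final step rescales the score to satisfy the standardizing conditions.

```latex
f(x) = \frac{e^{-x}}{(1+e^{-x})^{2}}, \qquad
F(x) = \frac{1}{1+e^{-x}}, \qquad
-\frac{f'(x)}{f(x)} = 1 - \frac{2e^{-x}}{1+e^{-x}} = 2F(x) - 1 ,
% so that, substituting x = F^{-1}(u),
\varphi_f(u) = 2u - 1, \qquad
\int_0^1 (2u-1)^2\,du = \tfrac13, \qquad
\frac{\varphi_f(u)}{\sqrt{1/3}} = \sqrt{12}\left(u - \tfrac12\right).
```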

2.3 Computation of the rank-based analysis

The computation of a rank-based analysis can be obtained by using the R package Rfit developed by Kloke and McKean (2012), which can be downloaded at CRAN. Like R, Rfit is freeware and can run on all platforms (Windows, Linux, and Mac). As we discuss in Section 3, it is easy to install new scores in Rfit based on a general scores function. For now, we illustrate the computation of Rfit for a Wilcoxon analysis in the following example.

Example 2.1

(Linear Model with Skew-Normal Errors). We use a simulated data set based on the model $y=\beta_0+\beta_1x_1+\beta_2x_2+\beta_3x_3+e$, where $x_1=1,\ldots,50$; $x_2$ and $x_3$ are variates from a standard normal distribution; and the random errors are generated from a standard skew-normal distribution with shape parameter α=−8, as discussed in Section 3. We set $\beta_1=0.01$, $\beta_2=0.15$, and $\beta_3=0.0$. The sample size is n=50. The data set can be downloaded at the url cited in Section 1. The code segments below assume that the R vectors y, x1, x2, x3 contain respectively the responses and values for $x_1$, $x_2$ and $x_3$. For this example, the following R code using the package Rfit computes the Wilcoxon fit of the model, prints out the table of coefficients, and saves the Studentized residuals in the vector studw.

Based on the summary table, the 95% confidence intervals (10) for $\beta_j$, j=1,2,3, trap the true parameters. The overall $F_\varphi$ test that all the regression coefficients are 0 except for the intercept is significant, p=0.0257. The value of the robust coefficient of determination $R^2$, expression (14), is 18%.

The last command stores the Studentized Wilcoxon residuals in the vector studw. These residuals are adjusted for both variance of the errors and location in the X-space; see McKean and Sheather (2009) for a review of robust diagnostic procedures. Figure 1 displays Studentized residual and normal q−q plots based on the Wilcoxon fit.
Figure 1

Studentized residual and normal q−q plots based on the Wilcoxon fit of the data in Example 2.1.

As noted in Section 3, the skew-normal distribution chosen to generate the random variates in this example is left skewed. Hence, as expected, the residuals show longer left than right tails. These plots show that scores for left-skewed error distributions are more appropriate for this data than the Wilcoxon scores. There appears to be one large outlier in the left tail, also. ■

Skew-normal error distributions

The family of skew-normal distributions consists of left and right skewed distributions along with the normal distribution. The pdfs in this family are of the form
$$f(x;\alpha) = 2\,\phi(x)\,\Phi(\alpha x),$$
(16)

where the parameter α satisfies −∞<α<∞ and $\phi(x)$ and $\Phi(x)$ are the pdf and cdf of a standard normal distribution, respectively. For this paper, if a random variable X has this pdf, we say that X has a standard skew-normal distribution with parameter α and write $X\sim SN(\alpha)$. If α=0, then X has a standard normal distribution. Further, the distribution of X is left skewed if α<0 and right skewed if α>0. This family of distributions was introduced by Azzalini (1985), who discussed many of its properties.

In this paper, we are interested in linear models, (1), where the random errors may have a skew-normal distribution. In this case, the random error can be written as $e_i = b\,\varepsilon_i$, where $\varepsilon_i$ has a standard skew-normal distribution and $b$ is a scale parameter. The rank-based estimator $\hat{\boldsymbol{\beta}}_\varphi$ and corresponding analysis are regression and scale equivariant, so there is no need to estimate the scale parameter $b$. The only scale parameter requiring estimation for standard errors is $\tau_\varphi$. Likewise, for inference on the vector of parameters $\boldsymbol{\beta}$ there is no need to estimate the shape parameter α.

What rank scores would be best for such error distributions? To get an idea, we next discuss the optimal scores for a specified α. To obtain the optimal rank-based scores, because of equivariance, we need only the form (up to scale and location) of the pdf. So for the derivation of the scores, assume that the random variable $X\sim SN(\alpha)$ with pdf (16). It easily follows that
$$-\frac{f'(x;\alpha)}{f(x;\alpha)} = x - \frac{\alpha\,\phi(\alpha x)}{\Phi(\alpha x)}.$$
(17)
Denote the inverse of the cdf of X by $F^{-1}(u;\alpha)$. Then it follows from expression (15) that the optimal score function for X is
$$\varphi_\alpha(u) = F^{-1}(u;\alpha) - \frac{\alpha\,\phi(\alpha F^{-1}(u;\alpha))}{\Phi(\alpha F^{-1}(u;\alpha))}.$$
(18)

For all values of α, this score function is strictly increasing over the interval (0,1); see Azzalini (1985). As expected, for α=0, expression (18) simplifies to the normal scores. Due to the first term on the right side of expression (18), all the score functions in this family are unbounded, indicating that the skew-normal family of distributions is light-tailed. Thus the influence functions of the rank-based estimators based on scores in this family are unbounded in the Y-space and, hence, are not robust. This includes the normal scores, but Huber (1981) pointed out that normal scores are technically robust and, as our simulation studies show, the family of skew-normal scores seems also to be technically robust.
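The score function (18) can be evaluated numerically. The Python sketch below is one way to do so using only the standard library; it is not the code in the article's R function. It assumes the standard identity that the SN(α) cdf equals Φ(x) − 2T(x, α), where T is Owen's T function, computes T by Simpson's rule, and inverts the cdf by bisection. All function names are hypothetical, and the returned score is not rescaled to unit variance.

```python
import math

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def owens_t(h, a, steps=400):
    """Owen's T function by Simpson's rule on [0, a]; steps must be even."""
    if a == 0.0:
        return 0.0
    g = lambda t: math.exp(-0.5 * h * h * (1.0 + t * t)) / (1.0 + t * t)
    w = a / steps
    total = g(0.0) + g(a)
    for k in range(1, steps):
        total += (4.0 if k % 2 else 2.0) * g(k * w)
    return total * w / 3.0 / (2.0 * math.pi)

def sn_cdf(x, alpha):
    """Cdf of SN(alpha), written as Phi(x) - 2 * T(x, alpha)."""
    return norm_cdf(x) - 2.0 * owens_t(x, alpha)

def sn_quantile(u, alpha, lo=-10.0, hi=10.0):
    """Inverse cdf by bisection; assumes the quantile lies in [lo, hi]."""
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if sn_cdf(mid, alpha) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def sn_score(u, alpha):
    """Score function (18) for SN(alpha), without rescaling to unit variance."""
    x = sn_quantile(u, alpha)
    return x - alpha * norm_pdf(alpha * x) / norm_cdf(alpha * x)

# With alpha = 0 the score reduces to the normal quantile function.
print(sn_score(0.975, 0.0))
print([round(sn_score(k / 10, 5.0), 3) for k in range(1, 10)])
```

For extreme u combined with large |α| the ratio in the second term can underflow, so a production version would need a more careful tail evaluation.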

Figure 2 displays the pdfs and corresponding optimal scores for three values of α: −7, 1, 5. Note that the pdf for α=−7 is left skewed while those for positive α values are right skewed. Unsurprisingly, the pdf for α=1 is closer to being symmetric than the other pdfs. The score function for the left-skewed pdf places relatively more emphasis on the right tail than on the left tail, while the reverse is true for the right-skewed pdfs.
Figure 2

These plots display the pdfs of the three skew-normal distributions with shape parameter α =−7,1, and 5, along with the corresponding optimal scores.

3.1 Computation of the rank-based analysis using skew-normal scores

The computation of the rank-based analysis can be obtained by using the R package Rfit. It is easy to install the family of skew-normal scores. Briefly, rank-based scores form a class in Rfit consisting of three parts: the score function, its derivative, and a vector of parameters used in the definition of the function. For the skew-normal scores, details are given in the appendix, but for the reader's convenience the necessary R code is contained in the R function skewns, which we have placed at the web site cited in the introduction.

Example 3.1

(Example 2.1, Continued). We now return to Example 2.1 and show the computation of the rank-based analysis of it based on the skew-normal scores with shape parameter α=−8. The first two lines of code define the skew-normal scores as salp and the third line sets the shape parameter. Details of this definition can be found in the appendix.

Note that the skew-normal analysis is much more precise than the Wilcoxon analysis of the last section. The empirical ARE is $(\tau_W/\tau_{\alpha=-8})^2=2.78$; i.e., for this data set, the skew-normal analysis is 2.8 times more efficient than the Wilcoxon analysis. Note, also, that the robust coefficient of determination, $R^2$, has increased from 18% to 28%.

3.2 Sensitivity analysis

For a verification of the technical robustness of the rank-based skew-normal analysis, we conducted a small sensitivity analysis. We generated n=50 observations from a linear model of the form $y_i = x_i + e_i$, where $x_i$ has a $N(0,1)$ distribution and $e_i$ has a $N(0,10^2)$ distribution. The $x_i$'s and $e_i$'s are all independent. We added outliers of the form
$$y_{50} \leftarrow y_{50} + \Delta,$$
(19)
where Δ is in the set {0,20,40,60,80,100,1000,2000}. The sensitivity curve for an estimator $\hat{\beta}$ is given by the function
$$S(\Delta;\hat{\beta}) = \hat{\beta} - \hat{\beta}(\Delta),$$
(20)
where $\hat{\beta}$ and $\hat{\beta}(\Delta)$ denote the estimates of β on the original and modified data (19), respectively. We obtained sensitivity curves for the estimators: Wilcoxon, normal scores, skew-normal (α=3), skew-normal (α=5), skew-normal (α=7), and maximum likelihood estimates (mle). The mles were computed by the package sn. For all values of Δ, the changes in all of the rank-based estimates were less than 0.004. Thus the rank-based skew-normal estimators, including the normal scores estimator, exhibited technical robustness for this study. On the other hand, the mle was sensitive to the values of Δ. We show these changes in Table 1; hence, for this study, the mle was not robust.
Table 1

Values of the sensitivity function for the mle at the given values of Δ

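| Δ   | 0    | 20    | 40    | 60    | 80   | 100  | 1000  | 2000  |
|-----|------|-------|-------|-------|------|------|-------|-------|
| mle | 0.00 | −0.07 | −0.07 | −0.00 | 0.12 | 0.30 | −5.80 | −6.32 |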
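The idea of a sensitivity curve (20) can be illustrated with a small Python sketch. It is not a reproduction of the study above: it compares the LS slope with a Wilcoxon slope for a simple regression, uses a fixed equally spaced design so that the modified observation has high leverage (an assumption of this sketch), and draws its own errors with an arbitrary seed. The Wilcoxon slope is found by bisection on the negative gradient of the pseudo-norm, which is nonincreasing in the slope; all names are hypothetical.

```python
import random

def ls_slope(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx

def wilcoxon_slope(x, y, iters=200):
    """Slope minimizing the Wilcoxon pseudo-norm of y - b * x, found by
    bisection on S(b) = sum of a[R(e_i)] * x_i, a nonincreasing step function."""
    n = len(x)

    def s(b):
        e = [yi - b * xi for xi, yi in zip(x, y)]
        order = sorted(range(n), key=lambda i: e[i])
        return sum((r / (n + 1) - 0.5) * x[i] for r, i in enumerate(order, 1))

    # The minimizer lies between the smallest and largest pairwise slopes.
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i in range(n) for j in range(i + 1, n) if x[j] != x[i]]
    lo, hi = min(slopes) - 1.0, max(slopes) + 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if s(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

random.seed(1)
n = 50
x = [-2.0 + 4.0 * i / (n - 1) for i in range(n)]   # fixed design (an assumption)
y = [xi + random.gauss(0.0, 10.0) for xi in x]     # errors with standard deviation 10

for delta in (0, 20, 100, 1000):
    y_mod = y[:-1] + [y[-1] + delta]               # modification (19)
    print(delta,
          ls_slope(x, y) - ls_slope(x, y_mod),
          wilcoxon_slope(x, y) - wilcoxon_slope(x, y_mod))
```

In this sketch the LS sensitivity grows linearly in Δ, while the Wilcoxon sensitivity stops changing once the modified residual is the largest in the sample.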

3.3 Range of practical α parameters

In Sections 4 and 5, the results of simulation studies are presented. The error distributions involve families of skew-normal distributions. So a practical range of α values is needed. Pourahmadi (2007) derived properties of the skew-normal distribution including its moment generating function. In particular, he showed that if X has the $SN(\alpha)$ distribution, then X converges in distribution to |Z| as α→∞, where Z is N(0,1); i.e., the distribution of X converges to a half-normal distribution. Likewise, X converges in distribution to −|Z| as α→−∞. In terms of α, the convergence is fairly fast. Table 2 serves as an illustration of this as it displays the mean ($\mu$), median ($\tilde{\mu}$), variance ($\sigma^2$), and coefficient of skewness ($\xi$) for various values of α. The last column of the table shows the value of these parameters for the half-normal distribution. Positive values of α suffice because if $X\sim SN(\alpha)$ then $-X\sim SN(-\alpha)$. There is little difference between the standard skew-normal distribution and the half-normal distribution for values of α near ±8. Based on these facts, we use skew-normal distributions with values of α between −12 and 12 for our Monte Carlo investigations.
Table 2

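Parameters (mean $\mu$, median $\tilde{\mu}$, variance $\sigma^2$, and coefficient of skewness $\xi$) for $SN(\alpha)$ distributions for the given values of α

| α             | 0.00 | 1.00 | 2.00 | 4.00 | 6.00 | 7.00 | 8.00 | 10.00 | 15.00 | 20.00 | ∞    |
|---------------|------|------|------|------|------|------|------|-------|-------|-------|------|
| $\mu$         | 0.00 | 0.56 | 0.71 | 0.77 | 0.79 | 0.79 | 0.79 | 0.79  | 0.80  | 0.80  | 0.80 |
| $\tilde{\mu}$ | 0.00 | 0.55 | 0.66 | 0.67 | 0.67 | 0.67 | 0.67 | 0.67  | 0.67  | 0.67  | 0.67 |
| $\sigma^2$    | 1.00 | 0.68 | 0.49 | 0.40 | 0.38 | 0.38 | 0.37 | 0.37  | 0.37  | 0.36  | 0.36 |
| $\xi$         | 0.00 | 0.14 | 0.45 | 0.78 | 0.89 | 0.92 | 0.93 | 0.96  | 0.98  | 0.98  | 0.99 |

The values in the last column (∞) are those for a half-normal distribution.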
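The mean, variance, and skewness rows of Table 2 can be checked against the standard closed-form moments of the skew-normal distribution, written in terms of δ = α/√(1+α²). The Python sketch below does this; the function name is hypothetical, and the median is omitted because it has no closed form and would require a numerical inverse of the cdf.

```python
import math

def sn_moments(alpha):
    """Mean, variance and coefficient of skewness of SN(alpha)."""
    delta = alpha / math.sqrt(1.0 + alpha * alpha)
    mean = delta * math.sqrt(2.0 / math.pi)
    var = 1.0 - mean * mean                      # equals 1 - 2 * delta^2 / pi
    skew = 0.5 * (4.0 - math.pi) * mean ** 3 / var ** 1.5
    return mean, var, skew

for alpha in (0, 1, 2, 4, 6, 7, 8, 10, 15, 20):
    print(alpha, [round(m, 2) for m in sn_moments(alpha)])

# A very large alpha approximates the half-normal limit in the last column.
print([round(m, 2) for m in sn_moments(1e6)])
```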

Because we are interested in linear models, there is another practical reason for this range of α values. Note that the support of a skew-normal distribution is (−∞,∞), making it ideal for error distributions for regression models. On the other hand, the support of a half-normal distribution is (0,∞), which is generally the support of a survival distribution. Often, the logs of such variables are modeled as accelerated failure time (AFT) models, as briefly discussed in Section 1.

Monte Carlo study

This section contains the results of a small simulation study concerning rank-based procedures based on skew-normal scores. The model simulated is
$$y_i = \beta_0 + \beta_1 x_i + \theta c_i + e_i,$$
(21)
where $x_i$ is distributed N(0,1); $e_i$ is distributed from a selected error distribution; i=1,…,100; the $x_i$'s and $e_i$'s are all independent; and the variable $c_i$ is a treatment indicator with values of either 0 or 1. We selected two error distributions for the study. One is a skew-normal distribution with shape parameter α=5 while the other is a contaminated version of a skew-normal. The contaminated errors are of the form
$$e_i = (1 - I_{\varepsilon,i})\,W_i + I_{\varepsilon,i}\,V_i,$$
(22)

where $W_i$ has a skew-normal distribution with shape parameter α=5, $V_i$ has a $N(\mu_c = 10, \sigma_c^2 = 36)$ distribution, $I_{\varepsilon,i}$ has a binomial(1, ε=0.15) distribution, and $W_i$, $V_i$, and $I_{\varepsilon,i}$ are all independent. Hence, this contaminated distribution is skewed with heavy right tails. The design is slightly unbalanced with $n_1=45$ and $n_2=55$. Without loss of generality β, θ, and $\beta_0$ were set to 0.
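Errors of the form (22) are straightforward to simulate. The Python sketch below is one possible generator, not the code used in the study: it relies on the standard representation of an SN(α) variate as δ|Z₀| + √(1−δ²) Z₁ with δ = α/√(1+α²) and independent standard normals Z₀ and Z₁. The function names, the seed, and the sample size of 20,000 are choices made for this illustration.

```python
import math
import random

def rskewnormal(alpha, rng):
    """One SN(alpha) variate via delta * |Z0| + sqrt(1 - delta^2) * Z1."""
    delta = alpha / math.sqrt(1.0 + alpha * alpha)
    z0, z1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return delta * abs(z0) + math.sqrt(1.0 - delta * delta) * z1

def rcontaminated(alpha, rng, eps=0.15, mu_c=10.0, sigma_c=6.0):
    """One error from model (22): N(mu_c, sigma_c^2) with probability eps,
    otherwise SN(alpha). sigma_c = 6 corresponds to a variance of 36."""
    if rng.random() < eps:
        return rng.gauss(mu_c, sigma_c)
    return rskewnormal(alpha, rng)

rng = random.Random(2014)          # arbitrary seed, for reproducibility only
clean = [rskewnormal(5.0, rng) for _ in range(20000)]
dirty = [rcontaminated(5.0, rng) for _ in range(20000)]

# The contaminated sample mean is pulled well to the right of the SN(5) mean.
print(sum(clean) / len(clean), sum(dirty) / len(dirty))
```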

For the rank-based procedures, we selected the rank-based procedure based on the score function $\varphi_5(u)$, (18), which is optimal for a skew-normal distribution with α=5, and then three on each side of the optimal, i.e., procedures based on the score functions $\varphi_\alpha(u)$ with α=2,3,4,6,7, and 8. With the discussion in Section 3.3 in mind, we also selected the rank-based procedure with α=10. The rank-based Wilcoxon, least squares (LS), and mle procedures complete the methods investigated. The empirical results presented are the empirical AREs, which for each estimator is the ratio of the empirical mean-square error (MSE) of the mle to the empirical MSE of the estimator; hence, values of this ratio less than 1 are favorable to the mle while values greater than 1 are favorable to the estimator. Secondly, we present the empirical confidences of confidence intervals with nominal confidence 0.95. For all the procedures, we chose asymptotic confidence intervals of the form $\hat{\beta} \pm 1.96\,\mathrm{SE}(\hat{\beta})$. We used a simulation size of 10,000.

The results are presented in Table 3. For the skew-normal errors, for both parameters β and θ, all the rank-based estimators except the Wilcoxon estimator are more efficient than the mle estimator. Note that the most efficient estimator for both β and θ is the rank-based estimator with α=5; although, empirical efficiencies are not significantly different from the empirical efficiencies for a few of the nearby (α close to 5) rank-based estimators. In terms of validity, the empirical confidences of all the procedures are close to the nominal confidence of 0.95. Not surprisingly, LS performed the worst overall.
Table 3

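Summary of results of simulation study of rank-based procedures and the mle procedure for the skew-normal with shape α=5 distribution and a skew-normal contaminated distribution

The first four numeric columns refer to skew normal errors (SN) and the last four to contaminated errors (Cont.).

| Proced. | SN β ARE | SN β Conf. | SN θ ARE | SN θ Conf. | Cont. β ARE | Cont. β Conf. | Cont. θ ARE | Cont. θ Conf. |
|---------|----------|------------|----------|------------|-------------|---------------|-------------|---------------|
| rb α=2  | 1.02 | 0.96 | 1.04 | 0.96 | 6.61 | 0.98 | 10.84 | 0.98 |
| rb α=3  | 1.09 | 0.96 | 1.11 | 0.96 | 7.43 | 0.97 | 12.24 | 0.98 |
| rb α=4  | 1.13 | 0.96 | 1.15 | 0.96 | 7.79 | 0.97 | 12.91 | 0.98 |
| rb α=5  | 1.14 | 0.96 | 1.16 | 0.96 | 7.85 | 0.96 | 13.10 | 0.97 |
| rb α=6  | 1.13 | 0.95 | 1.16 | 0.96 | 7.73 | 0.96 | 13.02 | 0.97 |
| rb α=7  | 1.11 | 0.95 | 1.14 | 0.95 | 7.49 | 0.95 | 12.72 | 0.97 |
| rb α=8  | 1.09 | 0.95 | 1.12 | 0.95 | 7.17 | 0.95 | 12.30 | 0.96 |
| rb α=10 | 1.04 | 0.94 | 1.07 | 0.94 | 6.46 | 0.94 | 11.22 | 0.95 |
| rb Wil. | 0.78 | 0.95 | 0.79 | 0.95 | 4.70 | 0.96 | 7.56  | 0.97 |
| LS      | 0.70 | 0.95 | 0.71 | 0.95 | 0.20 | 0.95 | 0.31  | 0.95 |
| mle     | 1.00 | 0.93 | 1.00 | 0.93 | 1.00 | 0.96 | 1.00  | 0.99 |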

For the contaminated error distribution, the rank-based estimators are much more efficient than the mle procedure. Further, the estimator with scores based on α=5 is still empirically the most efficient in the study. It has empirical efficiency of 785% relative to the mle for β and 1310% for θ. Even the Wilcoxon procedure has an empirical efficiency of 756% relative to the mle for θ. On the basis of the empirical confidences, for both parameters, all procedures appear to be from slightly to moderately conservative. Least squares performed extremely poorly in the contaminated part of the study. All the rank-based procedures based on skew-normal scores display technical robustness in this study.

Hogg-Type adaptive procedure

In Section 3, we discussed the rank-based method based on the optimal score function for a specified shape parameter α. Asymptotically, it is as efficient as the mle and, at least for the situations covered in the simulation study, the rank-based estimator appears to be more efficient than the mle for finite samples. In practice, though, the true shape parameter is not known. One could obtain the mle of α and use that score function. The mle, however, is not robust. Reconsidering the empirical study, note from Table 3 that the rank-based estimates close to the optimal rank-based estimator were more efficient than the mle and most had efficiencies that were quite close to that of the optimal. That is, in selecting a score function, perhaps close would suffice. In this section, we consider a Hogg-type adaptive scheme which has this as its goal.

Hogg et al. (1975) proposed an adaptive procedure for tests of the difference in locations for the two sample problem. The null hypothesis is that the two population distributions are the same. The selection of the test is based on a pair of selector statistics that measure respectively skewness and tail weight of the underlying error distribution. These selector statistics are functions of the order statistics of the combined samples. Several distribution-free rank tests of significance level δ comprise the tests. Under the null hypothesis, it follows from the sufficiency and completeness of the combined order statistics and the distribution-freeness of the rank test statistics that the selected test maintains the level δ. See, also, the discussion in Chapter 10 of Hogg et al. (2013).

This is fine for simple location tests where we have distribution-free rank tests, but in our case we are fitting a linear model and, hence, the adaptation must be based on the residuals from an initial fit. Thus the above mentioned sufficiency result is not true for our fitting case. Shomrani (2003) developed a Hogg-type adaptive scheme for fitting a linear model based on an initial fit. In Shomrani’s scheme, the selector statistics are functions of the residuals from the initial fit. While the significance level is no longer maintained, based on the results of a large simulation study, the scheme’s empirical levels were generally close to the nominal value. In Chapter 6 of Kloke and McKean (2014), R software is developed for this scheme.

The adaptive scheme of Shomrani (2003) was formed for a wide range of error distributions: from left to right skewed and from light to heavy tailed distributions. We refine this scheme for the skew-normal family of distributions. As discussed above, there are two selector statistics involved. One, $Q_1$, selects based on skewness while the other, $Q_2$, selects based on tail thickness. In a preliminary study over the skew-normal family, tail thickness did not seem to be a paramount issue, so we focus on $Q_1$ alone.

Let $\mathbf{V}=(V_1,V_2,\ldots,V_n)^T$ be a random vector and define
$$Q_1(\mathbf{V}) = \frac{\bar{U}_{.05} - \bar{M}_{.5}}{\bar{M}_{.5} - \bar{L}_{.05}},$$
(23)

where $\bar{U}_{.05}$, $\bar{M}_{.5}$, and $\bar{L}_{.05}$ are the averages of the largest 5% of the $V_i$'s, the middle 50% of the $V_i$'s, and the smallest 5% of the $V_i$'s, respectively. Large values of $Q_1$ indicate that the right tails of the sample are longer than the left tails; i.e., indicating an underlying right skewed distribution. Likewise, small values of $Q_1$ indicate left-skewness. Note that as left (right) skew increases, $Q_1$ is likely to decrease (increase). The statistic $Q_1$ is not robust. One scheme under current investigation is to replace the means by medians. Keep in mind, though, that a robust diagnostic analysis is available for the initial robust fit. Hence in practice, outliers are easily flagged and modifications to the adaptive scheme can be made.
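A minimal Python sketch of the selector statistic (23) follows. The text does not say how the 5% and 50% fractions are turned into whole numbers of observations, so the rounding used here (nearest integer for the tails, truncation for the trimming) is an assumption of this sketch, as are the function and argument names.

```python
def q1(v, tail=0.05, middle=0.5):
    """Selector statistic (23): (U - M) / (M - L), where U, M and L are the
    means of the largest 5%, middle 50% and smallest 5% of the sample."""
    s = sorted(v)
    n = len(s)
    k = max(1, round(tail * n))            # observations in each tail (assumed rule)
    trim = int((1.0 - middle) / 2.0 * n)   # observations trimmed from each end
    mean = lambda z: sum(z) / len(z)
    upper, lower = mean(s[-k:]), mean(s[:k])
    mid = mean(s[trim:n - trim])
    return (upper - mid) / (mid - lower)

symmetric = list(range(-20, 21))
right_skewed = [i * i for i in range(1, 42)]
print(q1(symmetric))                        # a symmetric sample gives 1
print(q1(right_skewed))                     # right skew gives a value above 1
print(q1([-z for z in right_skewed]))       # left skew gives a value below 1
```

Negating the sample maps the statistic to its reciprocal, which matches the description that large values indicate right skewness and small values left skewness.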

5.1 Adaptive scheme for skew normals

Our adaptive scheme consists of the 7 optimal score functions for skew-normal distributions with α=−12,−8,−4,0,4,8,and 12. So there are three scores each for left and right skewed distributions along with the normal scores. The scheme utilizes residuals from an initial Wilcoxon fit. We chose the Wilcoxon because it is robust. Also, it is optimal for a symmetric distribution (logistic) and, hence, less likely to bias selection for left or right skewness.

We decided to set the benchmarks for the selector statistic Q1 based on the medians of the distribution of Q1 for the family of skew-normal distributions with α=−10,−6,−2,2,6,and 10. Then the selection part of the adaptive scheme is given by:
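$$
\begin{array}{ll}
Q_1 < \operatorname{med}(Q_1 \mid \alpha = -10) & \text{Select score using } \alpha = -12 \\
\operatorname{med}(Q_1 \mid \alpha = -10) < Q_1 < \operatorname{med}(Q_1 \mid \alpha = -6) & \text{Select score using } \alpha = -8 \\
\operatorname{med}(Q_1 \mid \alpha = -6) < Q_1 < \operatorname{med}(Q_1 \mid \alpha = -2) & \text{Select score using } \alpha = -4 \\
\operatorname{med}(Q_1 \mid \alpha = -2) < Q_1 < \operatorname{med}(Q_1 \mid \alpha = 2) & \text{Select score using } \alpha = 0 \\
\operatorname{med}(Q_1 \mid \alpha = 2) < Q_1 < \operatorname{med}(Q_1 \mid \alpha = 6) & \text{Select score using } \alpha = 4 \\
\operatorname{med}(Q_1 \mid \alpha = 6) < Q_1 < \operatorname{med}(Q_1 \mid \alpha = 10) & \text{Select score using } \alpha = 8 \\
\operatorname{med}(Q_1 \mid \alpha = 10) < Q_1 & \text{Select score using } \alpha = 12
\end{array}
\tag{24}
$$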
We estimated the medians of the sampling distributions of the statistic Q1 based on simulations of size 10,000 drawn from the appropriate skew-normal distributions. Because Q1 is location and scale equivariant, simulation using standard skew-normal distributions suffices. The estimated (simulated) medians used for the scheme are given in Table 4.
Table 4. Simulated sample median of the distribution of Q1 drawn from the skew-normal distribution with shape parameter α

| α | −10 | −6 | −2 | 2 | 6 | 10 |
|---|---|---|---|---|---|---|
| Median Q1 | 0.44 | 0.49 | 0.73 | 1.37 | 2.05 | 2.26 |

The simulation size is 10,000. These estimated medians are used in the adaptive scheme (24).
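The selection rule (24), with the benchmarks of Table 4 plugged in, amounts to a lookup in a sorted list of cut points. The Python sketch below illustrates this; the names `MEDIAN_Q1`, `SCORE_ALPHAS`, and `select_alpha` are hypothetical, and sending a value that exactly equals a benchmark to the lower interval is an assumption, since (24) uses strict inequalities.

```python
from bisect import bisect_left

# Estimated medians of Q1 from Table 4, for alpha = -10, -6, -2, 2, 6, 10.
MEDIAN_Q1 = [0.44, 0.49, 0.73, 1.37, 2.05, 2.26]
# Shape parameters of the seven candidate score functions.
SCORE_ALPHAS = [-12, -8, -4, 0, 4, 8, 12]


def select_alpha(q1):
    """Return the shape parameter chosen by scheme (24) for a given Q1."""
    # bisect_left counts how many benchmarks lie strictly below q1.
    return SCORE_ALPHAS[bisect_left(MEDIAN_Q1, q1)]


for q1 in (0.344, 0.60, 1.00, 2.10, 5.00):
    print(q1, select_alpha(q1))
```

For instance, a value of Q1 below 0.44 falls in the first interval and leads to the score with α = −12.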

In summary, the algorithm for our adaptive scheme is:
  1. Fit using Wilcoxon scores. Obtain residuals $\hat{e}_W$.
  2. Compute $Q_1(\hat{e}_W)$ and then select $\varphi_\alpha$ using expression (24) with the estimated medians of Q1.
  3. Fit with the selected score $\varphi_\alpha$.
  4. Inference is based on the fit of Step (3).

We next try the scheme on the data of Example 2.1.

Example 5.1

(Example 2.1, continued). For the data in Example 2.1, the selector Q1 has the value 0.344; hence the adaptive scheme (24) selects the score function with α=−12. Table 5 summarizes the results of the fits based on skew-normal scores with α’s in a neighborhood of −8.
Table 5. For the data of Example 2.1, the adaptive scheme chose the score function with α=−12

| α | β1 | β2 | β3 | $\hat{\tau}_\varphi$ | Robust R2 |
|---|---|---|---|---|---|
| α=−12 (Adaptive Choice) | 0.0035 | 0.2190 | 0.0368 | 0.3675 | 0.2743 |
| α=−10 | 0.0030 | 0.2095 | 0.0480 | 0.3699 | 0.2732 |
| α=−9 | 0.0037 | 0.1923 | 0.0514 | 0.3531 | 0.2828 |
| α=−8 | 0.0042 | 0.1831 | 0.0544 | 0.3484 | 0.2865 |
| α=−7 | 0.0043 | 0.1822 | 0.0537 | 0.3533 | 0.2849 |
| α=−6 | 0.0044 | 0.1808 | 0.0540 | 0.3490 | 0.2884 |
| α=−5 | 0.0044 | 0.1805 | 0.0531 | 0.3580 | 0.2838 |
| α=−4 | 0.0043 | 0.1854 | 0.0449 | 0.3886 | 0.2668 |

This table shows the variation in the rank-based fits for score functions in a neighborhood of α=−8.

The values of the regression coefficients are given along with the robust coefficients of determination R2 and the estimates of $\tau_\varphi$. The fits are quite similar. Notice that, in terms of precision as measured by the $\hat{\tau}_\varphi$’s, the fit with α=−8 is the most precise.

5.2 Simulation study

We investigated the validity and efficiency of the adaptive scheme in a Monte Carlo study using situations similar to those in Section 4. In particular, the sample size is set at n=100 and the linear model contains one predictor coefficient, β, and one indicator coefficient for the treatment, θ. We considered three sampling situations. In the first situation (I), for each simulation, α is randomly selected from the set {−12,−11,…,11,12} where the selections are made equilikely. Then the random errors for the model are generated from this SN(α) distribution. For Situation II, again α is randomly selected from the same set of values but now the random errors are random variables of the form
$$e_i = (1 - I_{.15}) S_i + I_{.15} C_i,$$

where $S_i \sim SN(\alpha)$, $C_i \sim N(10, 6^2)$, $I_{.15}$ is Bernoulli with proportion of success 0.15, and $S_i$, $C_i$, $I_{.15}$ are independent. Thus, Situation II is the same as the second situation of Section 3, i.e., right-skewed contamination. Situation III is the same as Situation II except that $C_i \sim N(0, 6^2)$, i.e., symmetric contamination. We used 10,000 simulations for each situation.
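The contaminated errors of Situations II and III can be generated along the following lines. This is a Python sketch under stated assumptions rather than the code used in the study: the function names are hypothetical, the contamination is read as having standard deviation 6, and the skew-normal draws use the standard stochastic representation δ|Z0| + (1 − δ²)^{1/2} Z1 with δ = α/(1 + α²)^{1/2} and Z0, Z1 independent standard normals, in place of the R package sn.

```python
import math
import random


def rskewnormal(alpha, rng):
    """One draw from the standard skew-normal SN(alpha)."""
    delta = alpha / math.sqrt(1.0 + alpha * alpha)
    z0 = rng.gauss(0.0, 1.0)
    z1 = rng.gauss(0.0, 1.0)
    return delta * abs(z0) + math.sqrt(1.0 - delta * delta) * z1


def contaminated_errors(n, alpha, rng, eps=0.15, c_mean=10.0, c_sd=6.0):
    """Draw e_i = (1 - I) S_i + I C_i with I ~ Bernoulli(eps)."""
    errors = []
    for _ in range(n):
        s = rskewnormal(alpha, rng)            # skew-normal part
        c = rng.gauss(c_mean, c_sd)            # contaminating normal part
        indicator = 1 if rng.random() < eps else 0
        errors.append((1 - indicator) * s + indicator * c)
    return errors


rng = random.Random(2014)                      # arbitrary seed for reproducibility
alpha = rng.choice(range(-12, 13))             # equilikely choice from {-12, ..., 12}
errors_ii = contaminated_errors(100, alpha, rng)               # Situation II
errors_iii = contaminated_errors(100, alpha, rng, c_mean=0.0)  # Situation III
print(alpha, len(errors_ii), len(errors_iii))
```

Setting `eps=0.0` recovers the uncontaminated skew-normal errors of Situation I.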

The methods considered are: our adaptive scheme (AdSch), least squares (LS), Wilcoxon (Wil), and maximum likelihood (mle). We also considered the procedure based on the correct α; i.e., the α which is selected for the distribution of the random errors. Note that this is not a statistical method and we label it as Optrv, “rv” for random variable. Even for Situation I, the distribution of its rank-based estimate depends on the multinomial random variable involved in the selection of the simulated distribution. We include it only to serve as a yardstick for the four statistical methods.

As in the simulation study of Section 3, we considered empirical efficiency (relative to the mle) and validity of 95% confidence intervals for the parameters β and θ. The results for Situation I are summarized in Table 6. The adaptive scheme was more efficient than all other statistical methods for both parameters. In particular, it was more efficient than the mle by 6% and 5% for β and θ, respectively. In terms of validity, all of the methods are valid. The adaptive scheme was less efficient (by 6%) than the optimal non-statistical procedure.
Table 6. Empirical efficiencies and confidence coefficients for Situation I (error distributions are skew-normals)

| | AdSch | Optrv | LS | Wil | mle |
|---|---|---|---|---|---|
| β, ARE | 1.06 | 1.12 | 0.75 | 0.80 | 1.00 |
| β, Conf | 0.95 | 0.95 | 0.95 | 0.95 | 0.93 |
| θ, ARE | 1.05 | 1.11 | 0.73 | 0.79 | 1.00 |
| θ, Conf | 0.95 | 0.95 | 0.95 | 0.95 | 0.94 |

Empirical efficiencies are MSE’s relative to the MSE of the mles.
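The ARE entries can be read as ratios of mean squared errors over the simulation runs. The Python sketch below spells this out. The function names are hypothetical, and the direction of the ratio (MSE of the mle divided by MSE of the method, so that values above 1 favor the method) is an assumption inferred from how the tables are discussed in the text.

```python
def mse(estimates, true_value):
    """Mean squared error of simulated estimates about the true parameter."""
    return sum((e - true_value) ** 2 for e in estimates) / len(estimates)


def empirical_efficiency(method_estimates, mle_estimates, true_value):
    """Assumed definition: MSE(mle) / MSE(method); values above 1 favor the method."""
    return mse(mle_estimates, true_value) / mse(method_estimates, true_value)


# Toy numbers only, not simulation output from the paper.
method = [0.9, 1.1]
mle = [0.8, 1.2]
print(empirical_efficiency(method, mle, true_value=1.0))
```

Under this reading, an entry of 1.06 means that the MSE of the mle is about 6% larger than the MSE of the method in question.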

For Situation I, the random errors have a skew-normal distribution with shape parameter α drawn from the set {−12,−11,…,12}, while the scheme selects scores from the set {−12,−8,−4,0,4,8,12}. These sets are different; hence, it does not make sense to consider when the scheme made the “correct” selection. We did keep track of how often the selection was within two units of the distribution simulated. For the 10,000 simulations of Situation I, the estimate of this proportion is 0.584. Note that for Situations II and III, the random errors have a contaminated skew-normal distribution. In particular, it is not a skew-normal distribution. So for Situations II and III, this proportion is irrelevant.

The results for Situation II are summarized in Table 7. For this right-skewed contaminated situation, the adaptive scheme is much more efficient than the mle, with empirical efficiencies of 358% and 356% for β and θ, respectively. The adaptive scheme is adapting to this heavy-tailed situation, whereas the mle is not robust. As expected, the Wilcoxon performs best. All the methods are valid. We included the optimal random variable procedure since it was in Situation I. Note, though, that in Situations II and III the distribution of the random errors is not a skew-normal distribution. Besides not being a statistical method, it is no longer optimal in any sense.
Table 7. Empirical efficiencies and confidence coefficients for Situation II (error distributions are skewed contaminated skew-normals)

| | AdSch | Optrv | LS | Wil | mle |
|---|---|---|---|---|---|
| β, ARE | 3.58 | 1.86 | 0.92 | 7.27 | 1.00 |
| β, Conf | 0.97 | 0.97 | 0.95 | 0.95 | 0.94 |
| θ, ARE | 3.56 | 1.56 | 0.94 | 7.36 | 1.00 |
| θ, Conf | 0.97 | 0.96 | 0.95 | 0.95 | 0.95 |

Empirical efficiencies are MSE’s relative to the MSE of the mles.

The results for Situation III, with symmetric contamination, can be found in Table 8. The adaptive scheme is much more efficient than the mle, with empirical efficiencies of 367% and 477% for β and θ, respectively. As expected, the Wilcoxon performs best. All the rank-based methods are valid. The mle is slightly conservative for θ. The same comments hold for Optrv as in Situation II.
Table 8. Empirical efficiencies and confidence coefficients for Situation III (error distributions are symmetrically contaminated skew-normals)

| | AdSch | Optrv | LS | Wil | mle |
|---|---|---|---|---|---|
| β, ARE | 3.67 | 1.01 | 0.26 | 6.16 | 1.00 |
| β, Conf | 0.94 | 0.97 | 0.95 | 0.97 | 0.95 |
| θ, ARE | 4.77 | 0.85 | 0.35 | 8.18 | 1.00 |
| θ, Conf | 0.94 | 0.97 | 0.95 | 0.97 | 0.98 |

Empirical efficiencies are MSE’s relative to the MSE of the mles.

Conclusion

Rank-based analyses of linear models depend on the selection of a score function. In practice, often the Wilcoxon (linear) score function is chosen. These scores require no tuning constants and, further, the Wilcoxon rank-based analysis attains 95.5% efficiency relative to the traditional least squares (LS) analysis when the random error distribution is normal. However, rank-based analyses are easily optimized if there is knowledge of the distribution of the random errors of the linear model. For example, if the random errors are normally distributed then selecting the normal scores for the rank-based analysis results in the efficiency of 100% (fully efficient) relative to the LS analysis.

In this paper, we have presented the rank-based analyses based on appropriate score functions for random errors having a distribution from the family of skew-normal distributions. In this case, the score function depends on the shape parameter α, −∞<α<∞. Of course, the rank-based analysis is fully efficient if the correct α is known. The rank-based analysis is a complete analysis, including fitting, inference (rank-based ANOVA), and robust diagnostic procedures. Based on the results of our Monte Carlo study, these rank-based analyses appear to be more efficient than the maximum likelihood (mle) analysis for the skew-normal distributions considered. The most efficient rank-based analysis is based on the optimal score function, but even those rank-based analyses with shape parameters within three units of the correct α were more efficient than the mle in these situations. They were much more efficient than the mles over situations where the error distribution was a contaminated skew-normal distribution. Based on empirical confidence levels, all the methods in the study were valid.

The good efficiency results for the rank-based analyses in a neighborhood of the true α suggest that a Hogg-type adaptive scheme would have high efficiency. In Section 5, we developed such a scheme for the skew-normal family of distributions based on an initial robust Wilcoxon fit. In the Monte Carlo studies we performed, this scheme was more efficient than the mle over the family of skew-normal distributions and was much more efficient than the mle over the contaminated skew-normal situations. Furthermore, for the situations covered, this adaptive scheme appears to be valid.

Kloke and McKean (2012) developed the R package Rfit for these rank-based analyses, which can be freely downloaded from CRAN. The default scores are the Wilcoxon scores but, as we discuss in Section 3, it is easy to add classes of scores, including the optimal scores for skew-normal distributions. The adaptive scheme of Section 5 is also easily coded using Rfit. The necessary code for the scores and the adaptive scheme can be found at the web site cited in Section 1. Hence, computation of these rank-based analyses is not a problem.

The rank-based analyses using skew-normal scores are robust in Y (response) space, but not in X (factor) space. The weighted Wilcoxon fit proposed by Chang et al. (1999) yields a robust rank-based analysis which possesses 50% breakdown in X (factor) space. We are now developing such an analysis for the skew-normal scores; see Abebe et al. (2014) for discussion. This analysis could also be part of an adaptive scheme.

A Appendix: R code for the class of skew normal scores

Rank-based scores form a class in Rfit which consists of three principal parts. The first part is the scores function itself and the third part consists of any parameters in the function expression. Thus parts (1) and (3) for the skew-normal scores are respectively given by expression (18) and the shape parameter α. The second part is the derivative of the score function, which is used in the Rfit function that estimates $\tau_\varphi$. Let l(x) denote the function defined in expression (18). The derivative of the optimal score function is given by
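$$\varphi_\alpha'(u) = l'\left[F^{-1}(u;\alpha)\right] \frac{1}{2\,\phi\left[F^{-1}(u;\alpha)\right]\,\Phi\left[\alpha F^{-1}(u;\alpha)\right]}, \tag{25}$$
where
$$l'(x) = 1 + \frac{\alpha^2 \phi(\alpha x)\left[\alpha x\,\Phi(\alpha x) + \phi(\alpha x)\right]}{\Phi^2(\alpha x)}.$$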
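As a numerical sanity check on the derivative, the following Python sketch compares l′(x) with a central finite difference of l(x). Expression (18) lies outside this section, so the form l(x) = x − αϕ(αx)/Φ(αx) used here is an assumption: it is the usual optimal score −f′(x)/f(x) for the skew-normal pdf f(x) = 2ϕ(x)Φ(αx). All function names are hypothetical, and the quantile F^{-1}(u;α) is not computed; the last function is written in terms of x = F^{-1}(u;α).

```python
import math


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    # erfc keeps accuracy in the lower tail
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def l_score(x, alpha):
    """Assumed form of expression (18): -f'(x)/f(x) for f(x) = 2 phi(x) Phi(alpha x)."""
    return x - alpha * norm_pdf(alpha * x) / norm_cdf(alpha * x)


def l_prime(x, alpha):
    """Derivative of l with respect to x."""
    z = alpha * x
    return 1.0 + alpha ** 2 * norm_pdf(z) * (z * norm_cdf(z) + norm_pdf(z)) / norm_cdf(z) ** 2


def score_derivative(x, alpha):
    """Expression (25) evaluated at x = F^{-1}(u; alpha)."""
    return l_prime(x, alpha) / (2.0 * norm_pdf(x) * norm_cdf(alpha * x))


alpha, h = -7.0, 1e-5
for x in (-1.5, -0.5, 0.0, 0.3):
    numeric = (l_score(x + h, alpha) - l_score(x - h, alpha)) / (2.0 * h)
    print(x, l_prime(x, alpha), numeric)   # the two derivative columns should agree closely
```

At x = 0 the closed form reduces to 1 + 2α²/π, which gives a quick hand check, and for α = 0 it reduces to the constant 1 of the normal scores.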

To complete the class statement for the skew-normal scores we need only compute the quantiles $F^{-1}(u;\alpha)$. Azzalini (2013) developed the R package sn (available at CRAN) which computes the quantile function $F^{-1}(u;\alpha)$ and, also, the corresponding pdf and cdf. The command qsn(u,shape=alpha) returns $F^{-1}(u;\alpha)$, for 0<u<1. The package sn requires the package mnormt.

The following R code defines the class of skew-normal scores:

The next code segment obtains the data for a plot of the scores with shape parameter α=−7.

Declarations

Acknowledgement

We acknowledge the helpful comments of an associate editor and a referee on the original manuscript.

Authors’ Affiliations

(1)
Department of Statistics, Western Michigan University
(2)
Department of Biostatistics, University of Wisconsin

References

  1. Abebe A, McKean JW: Weighted Wilcoxon estimators in Nonlinear Regression. Aust N Z J. Stat 2013, 55: 401–420. 10.1111/anzs.12046
  2. Abebe A, McKean JW, Kloke JD, Bilgic Y: Iterated Reweighted Rank-Based Estimates for GEE Models. Technical Report; 2014.
  3. Azzalini A: A class of distributions which includes the normal ones. Scand. J. Stat 1985, 12: 171–178.
  4. Azzalini A: R package sn: The skew-normal and skew-t distributions (version 0.4–18).
  5. Chang W, McKean J, Naranjo J, Sheather S: High-breakdown rank regression. J. Am. Stat. Assoc 1999, 94: 205–219. 10.1080/01621459.1999.10473836
  6. Hettmansperger TP, McKean JW: Robust Nonparametric Statistical Methods. Chapman-Hall, Boca Raton, FL; 2011.
  7. Hogg RV, Fisher DM, Randles RH: A two-sample adaptive distribution-free test. J. Am. Stat. Assoc 1975, 70: 656–661.
  8. Hogg RV, McKean JW, Craig AT: Introduction to Mathematical Statistics. Pearson, Boston; 2013.
  9. Huber PJ: Robust Statistics. John Wiley & Sons; 1981.
  10. Jaeckel LA: Estimating regression coefficients by minimizing the dispersion of residuals. Ann. Math. Stat 1972, 43: 1449–1458. 10.1214/aoms/1177692377
  11. Jurečková J: Nonparametric estimate of regression coefficients. Ann. Math. Stat 1971, 42: 1328–1338. 10.1214/aoms/1177693245
  12. Kloke JD, McKean JW: Rfit: Rank-based estimation for linear models. R J 2012, 4: 57–64.
  13. McKean JW: Small sample properties of JR estimators. In JSM Proceedings. American Statistical Association, Alexandria, VA; 2013.
  14. Kloke JD, McKean JW: Nonparametric Statistical Methods Using R. Chapman-Hall, Boca Raton, FL; 2014.
  15. Kloke JD, McKean JW, Rashid M: Rank-based estimation and associated inferences for linear models with cluster correlated errors. J. Am. Stat. Assoc 2009, 104: 384–390. 10.1198/jasa.2009.0116
  16. Koul HL, Sievers GL, McKean JW: An estimator of the scale parameter for the rank analysis of linear models under general score functions. Scand. J. Stat 1987, 14: 131–141.
  17. McKean J, Sheather S: Diagnostic procedures. Wiley Interdiscip. Rev.: Comput. Stat 2009, 1(2): 221–233. 10.1002/wics.12
  18. McKean J, Sievers G: Rank scores suitable for analysis of linear models under asymmetric error distributions. Technometrics 1989, 31: 207–218. 10.1080/00401706.1989.10488514
  19. Pourahmadi M: Construction of skew-normal random variables: Are they linear combinations of normals and half-normals. J. Stat. Theory Appl 2007, 3: 314–328.
  20. Shomrani A: A comparison of different schemes for selecting and estimating score functions based on residuals. Ph.D. thesis, Western Michigan University, Department of Statistics; 2003.

Copyright

© McKean and Kloke; licensee Springer. 2014

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.