Comparing the variances of two dependent variables
- Rand Wilcox^{1}
DOI: 10.1186/s40488-015-0030-z
© Wilcox. 2015
Received: 7 April 2015
Accepted: 6 August 2015
Published: 15 August 2015
Abstract
Various methods have been derived that are designed to test the hypothesis that two dependent variables have a common variance. Extant results indicate that all of these methods can perform poorly in simulations when sampling from non-normal distributions. The paper provides a new perspective on why the Morgan-Pitman test does not control the probability of a Type I error when the marginal distributions have heavy tails. This new perspective suggests an alternative method for testing the hypothesis of equal variances, and simulations indicate that it continues to perform well in situations where the Morgan-Pitman test performs poorly.
Keywords
Morgan-Pitman test, Heteroscedasticity, HC4 estimator, Well Elderly 2 study
Introduction
A classic problem that arises in various situations is testing the hypothesis that two dependent variables have equal variances. For example, when systolic or diastolic blood pressure is measured on the same individuals using two different types of gauge, the quality of the gauges depends in part on whether one type has more variability than the other. Rothstein et al. (1981) cite two examples in psychology in which a test of equality of variances is of interest. Other examples from psychology are described in Lord and Novick (1968), Games et al. (1972) and Levy (1976). Snedecor and Cochran (1967) also cite two examples, one of which deals with testing for differences in reliability between two laboratories.
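The classic method for testing the hypothesis of equal variances is the Morgan-Pitman test (Morgan 1939; Pitman 1939). Letting U = X + Y and V = X − Y, the covariance between U and V is σ_1² − σ_2², where σ_1² and σ_2² are the variances of X and Y. So the variances are equal if and only if U and V are uncorrelated, and the hypothesis can be tested by testing whether Pearson's correlation between U and V is zero. When sampling from a heavy-tailed distribution, however, the actual Type I error probability of this test can be substantially higher than the nominal level. McCulloch (1987) suggests replacing Pearson's correlation with Spearman's correlation. But simulations in Wilcox (1990) indicate that, again, the actual level can be substantially higher than the nominal level when sampling from a heavy-tailed distribution. Wilcox (1990) also reported simulation results on several other methods and found that all of them performed poorly under non-normality. These included methods derived by Tiku and Balakrishnan (1986) as well as a Box-Scheffé type test that has close connections to a method suggested by Levy (1976).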
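The identity that underlies the Morgan-Pitman test can be written out in one line. Here U = X + Y and V = X − Y, σ_1² and σ_2² are the variances of X and Y, and σ_12 (notation introduced only for this derivation) is their covariance:

```latex
\operatorname{cov}(U, V) = \operatorname{cov}(X + Y,\; X - Y)
  = \sigma_1^2 - \sigma_{12} + \sigma_{12} - \sigma_2^2
  = \sigma_1^2 - \sigma_2^2 .
```

Provided U and V both have positive variance, ρ_uv = 0 if and only if σ_1² = σ_2². The same algebra applies to sample variances and covariances, so the sample covariance of U and V equals the difference between the two sample variances exactly.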
This paper provides a perspective, different from the results reported by McCulloch (1987) and Mudholkar et al. (2003), on why the Morgan-Pitman test performs poorly when sampling from a heavy-tailed distribution. Details are given in Section 2. This alternative perspective suggests a general strategy for getting improved control over the Type I error probability. A particular variation of this strategy is described in Section 3. Simulation results based on the method in Section 3 are reported in Section 4.
The Morgan-Pitman test and heavy-tailed distributions
Let r denote Pearson's correlation based on a random sample of n pairs of observations, and consider the usual test statistic
\[ T = r\sqrt{\frac{n-2}{1-r^{2}}} \qquad (3) \]
for testing H_0: ρ=0. From basic principles, T has a Student's t distribution with n−2 degrees of freedom if either X or Y has a normal distribution and, simultaneously, X and Y are independent. Note that independence implies homoscedasticity: the conditional variance of Y, given X, does not depend on the value of X. Homoscedasticity plays a fundamental role in the derivation of T (e.g., Hogg and Craig 1970). In the context of least squares regression, it is known that if there is heteroscedasticity, the usual test of the hypothesis of a zero slope uses the wrong standard error (e.g., Long and Ervin 2000). Because T is algebraically identical to the usual test statistic for a zero slope in the least squares regression of Y on X, it follows that when testing H_0: ρ=0 with T, the wrong standard error is again being used under heteroscedasticity. Indeed, when ρ=0 but there is heteroscedasticity, the probability of rejecting can increase as the sample size increases (e.g., Wilcox 2012).
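A minimal pure-Python sketch of the Morgan-Pitman test as described here: form U = X + Y and V = X − Y, compute Pearson's correlation r_uv, convert it to T via (3), and refer T to Student's t distribution with n − 2 degrees of freedom. The function names are hypothetical. The incomplete-beta routine is a standard continued-fraction evaluation, swapped in only so that the example needs nothing beyond the standard library; it is not part of the method itself. As the text explains, the resulting p-value is only trustworthy under homoscedasticity.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the regularized incomplete beta (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # One even and one odd term of the continued fraction per iteration.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def _betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def t_two_sided_p(t_stat, df):
    """P(|T| >= |t_stat|) for Student's t with df degrees of freedom."""
    return _betai(df / 2.0, 0.5, df / (df + t_stat * t_stat))


def pearson_r(a, b):
    """Pearson's correlation (assumes neither sample is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    saa = sum((ai - ma) ** 2 for ai in a)
    sbb = sum((bi - mb) ** 2 for bi in b)
    return sab / math.sqrt(saa * sbb)


def morgan_pitman(x, y):
    """Return (T_uv, two-sided p-value) for H0: equal variances of X and Y."""
    u = [xi + yi for xi, yi in zip(x, y)]
    v = [xi - yi for xi, yi in zip(x, y)]
    n = len(x)
    r_uv = pearson_r(u, v)
    t_uv = r_uv * math.sqrt((n - 2) / (1.0 - r_uv * r_uv))  # statistic (3)
    return t_uv, t_two_sided_p(t_uv, n - 2)
```

If SciPy is available, `t_two_sided_p` could be replaced by `2 * scipy.stats.t.sf(abs(t_stat), df)`; the hand-written version is kept only to stay self-contained.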
Of course, Fig. 1 does not by itself establish that the association between U and V is exactly homoscedastic when sampling from normal distributions. The only point is that as we move from normal distributions toward heavy-tailed distributions, heteroscedasticity becomes more pronounced, so it is not surprising that the Morgan-Pitman test performs poorly in such situations.
Figure 2 also indicates why replacing Pearson's correlation with Spearman's correlation is unsatisfactory: converting observations to ranks does not eliminate heteroscedasticity. When the data used in Fig. 2 are converted to ranks, a plot of the .2, .5 and .8 quantile regression lines indicates that heteroscedasticity is less severe. This suggests that using Spearman's correlation improves control over the Type I error probability compared to using T_uv, but that poor control over the Type I error probability will still be an issue. Simulation results in Section 4 demonstrate the extent to which this is the case.
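The Spearman variation replaces Pearson's correlation between U and V with Pearson's correlation between their ranks. The sketch below, with hypothetical function names, computes that rank correlation (tied values receive average ranks) and the statistic obtained by plugging it into (3). Referring that statistic to Student's t distribution with n − 2 degrees of freedom is an assumption made for illustration, not necessarily the exact reference distribution used in the simulations.

```python
import math


def average_ranks(values):
    """Ranks 1..n; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0  # mean of ranks i+1..j+1
        i = j + 1
    return ranks


def pearson_r(a, b):
    """Pearson's correlation (assumes neither sample is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    saa = sum((ai - ma) ** 2 for ai in a)
    sbb = sum((bi - mb) ** 2 for bi in b)
    return sab / math.sqrt(saa * sbb)


def spearman_uv(x, y):
    """Spearman correlation of U = X + Y and V = X - Y, and T from (3)."""
    u = [xi + yi for xi, yi in zip(x, y)]
    v = [xi - yi for xi, yi in zip(x, y)]
    r_s = pearson_r(average_ranks(u), average_ranks(v))
    n = len(x)
    return r_s, r_s * math.sqrt((n - 2) / (1.0 - r_s ** 2))
```

Ranking changes only the scale on which U and V are compared; as noted above, it does not remove the dependence of the spread of V on U, which is why this variation still loses control of the Type I error probability.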
Modification of the Morgan-Pitman test
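Let S denote the HC4 estimate (Cribari-Neto 2004) of the covariance matrix of b_0 and b_1, the least squares estimates of the intercept and slope. The diagonal elements of S are the estimated squared standard errors of b_0 and b_1. For convenience, the HC4 estimates of the standard errors of b_0 and b_1 are denoted by S_0 and S_1.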
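For reference, one common way to write the HC4 estimator, assuming the definition of Cribari-Neto (2004), is shown below. Here **X** is the n × 2 design matrix (a column of ones and the predictor values), r_i are the least squares residuals, h_ii are the diagonal elements of the hat matrix **X**(**X**ᵀ**X**)⁻¹**X**ᵀ, and p = 2 is the number of estimated parameters; this notation is introduced only for this block.

```latex
\mathbf{S} = (\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top
  \operatorname{diag}\!\left(\frac{r_i^2}{(1-h_{ii})^{\delta_i}}\right)
  \mathbf{X}(\mathbf{X}^\top\mathbf{X})^{-1},
\qquad
\delta_i = \min\!\left(4,\ \frac{n h_{ii}}{p}\right),
\qquad
S_0 = \sqrt{\mathbf{S}_{11}},\quad S_1 = \sqrt{\mathbf{S}_{22}} .
```

With a single predictor, h_ii = 1/n + (X_i − X̄)²/Σ_j (X_j − X̄)², so points far from X̄ have their squared residuals inflated the most.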
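A 1−α confidence interval for the slope is then taken to be
\[ b_1 \pm t S_1, \qquad (4) \]
where t is the 1−α/2 quantile of Student's t distribution with ν=n−2 degrees of freedom. This suggests testing the hypothesis of equal variances, (1), by computing a confidence interval based on (4), but with X and Y replaced by U and V, respectively. This will be called method HC henceforth.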
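Method HC rests on the fact that the least squares slope for predicting V from U is cov(U, V)/var(U) = (σ_1² − σ_2²)/var(U), which is zero exactly when the variances are equal. The interval (4) excludes zero exactly when |b_1/S_1| exceeds t, so the sketch below reports the equivalent two-sided p-value instead. It is a minimal pure-Python illustration with hypothetical function names, assuming the Cribari-Neto (2004) form of HC4 and taking U as the predictor and V as the outcome. With one predictor, the HC4 sandwich reduces to a weighted sum over observations, which is what the code computes.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the regularized incomplete beta (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # One even and one odd term of the continued fraction per iteration.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def _betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def t_two_sided_p(t_stat, df):
    """P(|T| >= |t_stat|) for Student's t with df degrees of freedom."""
    return _betai(df / 2.0, 0.5, df / (df + t_stat * t_stat))


def hc4_simple_regression(x, y):
    """Least squares fit of y on x; returns b0, b1 and HC4 standard errors S0, S1."""
    n, p = len(x), 2  # p = number of parameters (intercept and slope)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    b0 = my - b1 * mx
    var_b0 = var_b1 = 0.0
    for xi, yi in zip(x, y):
        resid = yi - (b0 + b1 * xi)
        lev = 1.0 / n + (xi - mx) ** 2 / sxx     # leverage h_ii
        delta = min(4.0, n * lev / p)            # HC4 exponent delta_i
        w = resid ** 2 / (1.0 - lev) ** delta    # inflated squared residual
        c0 = 1.0 / n - mx * (xi - mx) / sxx      # weight of y_i in b0
        c1 = (xi - mx) / sxx                     # weight of y_i in b1
        var_b0 += c0 * c0 * w
        var_b1 += c1 * c1 * w
    return b0, b1, math.sqrt(var_b0), math.sqrt(var_b1)


def method_hc(x, y):
    """Regress V = X - Y on U = X + Y; return b1, S1 and the p-value for a zero slope."""
    u = [xi + yi for xi, yi in zip(x, y)]
    v = [xi - yi for xi, yi in zip(x, y)]
    _, b1, _, s1 = hc4_simple_regression(u, v)
    return b1, s1, t_two_sided_p(b1 / s1, len(x) - 2)
```

Rejecting when the p-value is below α is the same decision as rejecting when the interval (4) excludes zero.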
Simulation results
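Simulations were used to assess the extent to which method HC controls the Type I error probability. Data were generated from g-and-h distributions (Hoaglin 1985), and sample sizes of n=20 and n=100 were used. Estimated Type I error probabilities, \(\hat {\alpha }\), were based on 10,000 replications. In the tables, MP denotes the Morgan-Pitman test and SP its variation based on Spearman's correlation.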
Some properties of the g-and-h distribution
g | h | κ₁ (skewness) | κ₂ (kurtosis) |
---|---|---|---|
0.0 | 0.0 | 0.00 | 3.0 |
0.0 | 0.2 | 0.00 | 21.46 |
0.2 | 0.0 | 0.61 | 3.68 |
0.2 | 0.2 | 2.81 | 155.98 |
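The g-and-h distributions in the table are obtained by transforming a standard normal variable Z. A minimal sketch, assuming Hoaglin's (1985) standard definition X = ((exp(gZ) − 1)/g)·exp(hZ²/2), with X = Z·exp(hZ²/2) when g = 0: g controls skewness and h controls tail heaviness, so g = h = 0 gives the standard normal. The function names are hypothetical.

```python
import math
import random


def g_and_h(z, g, h):
    """Transform a standard normal value z into a g-and-h value."""
    core = z if g == 0.0 else (math.exp(g * z) - 1.0) / g  # g: skewness
    return core * math.exp(h * z * z / 2.0)                # h: tail weight


def sample_g_and_h(n, g, h, rng=None):
    """Draw n observations from the g-and-h distribution."""
    rng = rng if rng is not None else random.Random()
    return [g_and_h(rng.gauss(0.0, 1.0), g, h) for _ in range(n)]


# Example: 20 observations from the skewed, heavy-tailed case g = h = 0.2.
sample = sample_g_and_h(20, 0.2, 0.2, random.Random(2015))
```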
Estimated probability of a Type I error
g | h | ρ | HC (n=20) | MP (n=20) | SP (n=20) | HC (n=100) | MP (n=100) | SP (n=100) |
---|---|---|---|---|---|---|---|---|
0.0 | 0.0 | 0.0 | .047 | .051 | .052 | .048 | .050 | .045 |
0.0 | 0.2 | 0.0 | .065 | .256 | .088 | .057 | .355 | .090 |
0.2 | 0.0 | 0.0 | .053 | .075 | .052 | .052 | .090 | .057 |
0.2 | 0.2 | 0.0 | .066 | .286 | .087 | .050 | .403 | .085 |
0.0 | 0.0 | 0.5 | .052 | .050 | .050 | .047 | .051 | .049 |
0.0 | 0.2 | 0.5 | .059 | .242 | .080 | .051 | .354 | .086 |
0.2 | 0.0 | 0.5 | .052 | .076 | .056 | .052 | .087 | .055 |
0.2 | 0.2 | 0.5 | .061 | .275 | .086 | .048 | .398 | .081 |
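The MP columns of the table come from a Monte Carlo loop of the following kind. The sketch below is a hypothetical, scaled-down version (2000 replications rather than 10,000, and n = 20 only). It assumes that each replication draws bivariate normal pairs with correlation ρ and applies the same g-and-h transform to both marginals, so that the null hypothesis holds; the generation scheme actually used in the paper may differ. It also assumes a nominal level of .05 and hard-codes the 0.975 quantile of Student's t with 18 degrees of freedom, so it applies only to n = 20.

```python
import math
import random

N = 20
T_CRIT_N20 = 2.100922  # 0.975 quantile of Student's t with N - 2 = 18 df


def g_and_h(z, g, h):
    """g-and-h transform of a standard normal value z."""
    core = z if g == 0.0 else (math.exp(g * z) - 1.0) / g
    return core * math.exp(h * z * z / 2.0)


def pearson_r(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    saa = sum((ai - ma) ** 2 for ai in a)
    sbb = sum((bi - mb) ** 2 for bi in b)
    return sab / math.sqrt(saa * sbb)


def mp_rejects(x, y):
    """Morgan-Pitman test at the .05 level using T from (3)."""
    u = [xi + yi for xi, yi in zip(x, y)]
    v = [xi - yi for xi, yi in zip(x, y)]
    r = pearson_r(u, v)
    t_stat = r * math.sqrt((len(x) - 2) / (1.0 - r * r))
    return abs(t_stat) > T_CRIT_N20


def estimate_type1_mp(g, h, rho=0.0, reps=2000, seed=1):
    """Rejection rate when X and Y have identical g-and-h marginals."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        x, y = [], []
        for _ in range(N):
            z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
            z_y = rho * z1 + math.sqrt(1.0 - rho * rho) * z2  # corr(z1, z_y) = rho
            x.append(g_and_h(z1, g, h))
            y.append(g_and_h(z_y, g, h))
        rejections += mp_rejects(x, y)
    return rejections / reps
```

Under these assumptions, `estimate_type1_mp(0.0, 0.0)` should land near .05 and `estimate_type1_mp(0.0, 0.2)` well above it, in the direction of the MP column of the table; with only 2000 replications, the estimates will move by a percentage point or so with the seed.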
Power comparisons seem meaningless for situations where the Type I error probability is not controlled reasonably well. Even for the skewed, light-tailed distribution considered here (g=0.2, h=0), where the kurtosis is only 3.68, the Morgan-Pitman test does not perform well, particularly as the sample size increases. But to provide at least some perspective, simulations were run again for the bivariate normal case where σ_1=1 and σ_2=1.5. With ρ=0 and n=20, power for methods HC, MP and SP was estimated to be .329, .391 and .346, respectively. With σ_2 increased to 2, the estimates were .739, .842 and .775. With ρ=.5, σ_2=1.5 and n=20, power for methods HC, MP and SP was estimated to be .421, .498 and .442, respectively. With σ_2 increased to 2, the estimates were .828, .909 and .855. So for symmetric and sufficiently light-tailed distributions, the Morgan-Pitman test offers a power advantage that would seem to be of practical importance.
Illustrations
Rao (1948) reports data on the weight of cork borings from the north, east, west and south sides of 28 trees. When the variances of the east and south sides are compared with the Morgan-Pitman test, the p-value is .043. Using the modification of the Morgan-Pitman test (method HC), the p-value is .186. The only point is that, in practice, the choice of method can make a difference.
The next illustration is based on data from the Well Elderly 2 study (Clark et al. 2011). A general goal in this study was to assess the efficacy of an intervention strategy aimed at improving the physical and emotional health of older adults. A portion of the study was aimed at understanding the impact of the intervention on a measure of depressive symptoms based on the Center for Epidemiologic Studies Depression Scale (CESD). The CESD (Radloff 1977) is sensitive to change in depressive status over time and has been successfully used to assess ethnically diverse older people (Lewinsohn et al. 1988). Higher scores indicate a higher level of depressive symptoms. The sample size is 328.
Concluding remarks
Of course, simulations do not prove that a method provides adequate control over the Type I error probability in all situations that might be encountered. The main result is that method HC continues to perform well in situations where the Morgan-Pitman test and the variation based on Spearman's correlation perform poorly.
Heteroscedasticity can be addressed using a variety of other methods, as noted in the introduction. Because method HC performs well in the situations considered in the simulations, alternative methods could provide at best a slight improvement in controlling the probability of a Type I error in those situations. Perhaps situations can be found where some other method for dealing with heteroscedasticity offers a practical advantage, but this remains to be determined.
Declarations
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Authors’ Affiliations
References
- Clark, F, Jackson, J, Carlson, M, Chou, CP, Cherry, BJ, Jordan-Marsh, M, Knight, BG, Mandel, D, Blanchard, J, Granger, DA, Wilcox, RR, Lai, MY, White, B, Hay, J, Lam, C, Marterella, A, Azen, SP: Effectiveness of a lifestyle intervention in promoting the well-being of independently living older people: results of the Well Elderly 2 Randomised Controlled Trial. J. Epidemiol. Community Health. 66, 782–790 (2011). doi:10.1136/jech.2009.099754
- Cribari-Neto, F: Asymptotic inference under heteroskedasticity of unknown form. Comput. Stat. Data Anal. 45, 215–233 (2004).
- Cribari-Neto, F, Souza, TC, Vasconcellos, KLP: Inference under heteroskedasticity and leveraged data. Commun. Stat. Theory Methods. 36, 1877–1888 (2007).
- Games, PA, Winkler, HB, Probert, DA: Robust tests for homogeneity of variance. Educ. Psychol. Meas. 32, 887–909 (1972).
- Godfrey, LG: Tests for regression models with heteroskedasticity of unknown form. Comput. Stat. Data Anal. 50, 2715–2733 (2006).
- Hoaglin, DC: Summarizing shape numerically: The g-and-h distribution. In: Hoaglin, D, Mosteller, F, Tukey, J (eds.) Exploring Data Tables, Trends, and Shapes, pp. 461–515. Wiley, New York (1985).
- Hogg, RV, Craig, AT: Introduction to Mathematical Statistics. 3rd Ed. Macmillan, New York (1970).
- Lewinsohn, PM, Hoberman, HM, Rosenbaum, M: A prospective study of risk factors for unipolar depression. J. Abnorm. Psychol. 97, 251–264 (1988).
- Levy, KJ: A procedure for testing the equality of p correlated variances. Br. J. Math. Stat. Psychol. 29, 89–93 (1976).
- Long, JS, Ervin, LH: Using heteroscedasticity consistent standard errors in the linear regression model. Am. Stat. 54, 217–224 (2000).
- Lord, F, Novick, M: Statistical Theories of Mental Test Scores. Addison-Wesley, Reading, MA (1968).
- McCulloch, CE: Tests for equality of variance for paired data. Commun. Stat. Theory Methods. 16, 1377–1391 (1987).
- Morgan, WA: A test for the significance of the difference between two variances in a sample from a normal bivariate population. Biometrika. 31, 13–19 (1939).
- Mudholkar, GS, Wilding, GE, Mietlowski, WL: Robustness properties of the Pitman-Morgan test. Commun. Stat. Theory Methods. 32, 1801–1816 (2003).
- Ng, M, Wilcox, RR: Level robust methods based on the least squares regression line. J. Mod. Appl. Stat. Methods. 8, 384–395 (2009).
- Pitman, EJG: A note on normal correlation. Biometrika. 31, 9–12 (1939).
- Radloff, L: The CESD scale: a self report depression scale for research in the general population. Appl. Psychol. Meas. 1, 385–401 (1977).
- Rao, CR: Tests of significance in multivariate analysis. Biometrika. 35, 58–79 (1948).
- Rothstein, SM, Bell, WD, Patrick, JA, Miller, H: A jackknife test of homogeneity of variance with paired replicates of data. Psychometrika. 46, 35–40 (1981).
- Snedecor, GW, Cochran, W: Statistical Methods. 6th Ed. Iowa State University Press, Ames, IA (1967).
- Tiku, ML, Balakrishnan, N: A robust test for testing the correlation coefficient. Commun. Stat. Simul. Comput. 15, 945–971 (1986).
- Wilcox, RR: Comparing the variances of two dependent groups. J. Educ. Stat. 15, 237–247 (1990).
- Wilcox, RR: Introduction to Robust Estimation and Hypothesis Testing. 3rd Ed. Academic Press, San Diego, CA (2012).