Joint distribution of rank statistics considering the location and scale parameters and its power study
- Wan-Chen Lee^{1}
https://doi.org/10.1186/2195-5832-1-6
© Lee; licensee Springer. 2014
Received: 7 August 2013
Accepted: 10 February 2014
Published: 11 June 2014
Abstract
The ranking method used for testing the equivalence of two distributions has been studied for decades and is widely adopted for its simplicity. However, due to the complexity of calculations, the power of the test is either estimated by a normal approximation or found when an appropriate alternative is given. Here, via the finite Markov chain imbedding technique, we establish the marginal and joint distributions of the rank statistics considering the shift and scale parameters, separately and simultaneously, under two different continuous distribution functions. Furthermore, the procedures of distribution equivalence tests and their power functions are discussed. We present numerical results for a joint distribution of rank statistics under the standard normal distribution, together with the powers for a sequence of alternative normal distributions with means from −20 to 20 and standard deviations from 1 to 9 and their reciprocals. In addition, we discuss the powers of the rank statistics under the Lehmann alternatives.
2010 Mathematics Subject Classification
Primary 62G07; Secondary 62G10
Keywords
FMCI, Lehmann alternative, Rank statistic, Rank-sum test, Power
1 Introduction
where k is a positive integer. However, Lehmann (1998) pointed out that the power function of the rank-sum test, Equation (2), was only qualitative, since the probabilities in Equation (1) are considerably complicated to evaluate numerically when F and G are continuous distributions with F≠G.
As the rank-sum test is widely adopted for testing differences in the centers of two distributions, it is natural to study the efficiency of a rank-sum test for variability (Ansari and Bradley 1960). For decades, studies have focused on proposing new definitions of the rank statistic and using the methods of Chernoff and Savage to show the relative efficiency of the proposed statistic to the F-test; see, for example, Mood (1954), Siegel and Tukey (1960), Ansari and Bradley (1960), and Klotz (1962). Ansari and Bradley (1960) mentioned that if the means of the X and Y samples cannot be considered equal, differences in location have a severe impact on all the tests of dispersion. Klotz (1962) showed that the power of a rank test can be found by integrating the joint density of the X and Y samples over that part of the (m+n)-dimensional space defined by the alternative orderings which lie in the critical region of the test, an approach whose conditions are very strict.
Our approach aims at relaxing some of the conditions required for finding the distribution of the proposed rank statistic. We systematically imbed the random vector U_{ n } into a Markov chain to obtain the marginal and joint distributions of the rank statistics considering the shift and scale parameters, respectively, under any form of two distribution functions. A joint distribution of rank statistics, to the best of our knowledge, has not been studied in the literature. The main strength of the finite Markov chain imbedding (FMCI) approach is that it derives the distribution of the rank statistic without imposing any such conditions. Therefore, under the null hypothesis of F=G, we are able to identify a proper critical region and, under the alternative assumption, the power of the test can be determined naturally. We also demonstrate that, under the null hypothesis of distribution equivalence, the distribution of the random vector U_{ n } is independent of the form of the distribution function F.
The main contributions of this paper are as follows. In Section 2.1, we introduce the procedure for deriving the distribution of the rank statistic considering the shift parameter, and its power function, by using FMCI. The procedure is general and can be applied either to two identical distribution functions of interest or to two different continuous distribution functions. In Section 2.2, we address the steps for finding the distribution of the rank statistic considering the scale parameter and its power function. In Section 2.3, we derive the joint distribution of the rank statistics considering the location and scale parameters simultaneously, as well as its power function. Numerical results of a joint distribution and some powers of the rank statistics against the shift parameter and the scale parameter, individually and simultaneously, are presented in Section 3. We also discuss the powers of the rank statistics under the Lehmann alternatives. We end this paper with a short conclusion in Section 4.
2 Methods
2.1 Distributions of the rank statistic in the shift case
Theorem 1
The statistic R_{ l } is equivalent to the statistic W_{ Y }, which was introduced by Wilcoxon (1945).
Proof
Next, we demonstrate that for two random samples from the same population, the distribution of the random vector U_{ n } is independent of the form of the distribution function.
Theorem 2
Proof
which is independent of the distribution function.
This is why the distribution of the rank statistic U_{ n } is distribution-free under the null hypothesis. However, the random vector U_{ n } follows a discrete uniform distribution, with mass equal to one over the number of possible outcomes of U_{ n }, only when F=G. In other words, the distribution of U_{ n } can be found by traditional combinatorial analysis when F=G. Unfortunately, when F≠G, we are not able to establish the distribution of U_{ n } through Equation (7), because solving the multiple integral in Equation (8) is tedious even for a suitably chosen alternative distribution function and difficult in general. To our knowledge, the problem of finding the power of the test remains unsolved in most cases. To overcome this, we bring in the finite Markov chain imbedding approach.
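The combinatorial analysis available when F=G can be illustrated with a short sketch. It is not the construction used in this paper: it works directly with the set of ranks occupied by the Y sample rather than with U_{ n }, and it relies only on the fact that all $\binom{m+n}{n}$ rank configurations are equally likely under the null hypothesis. The function name `null_rank_sum_distribution` is hypothetical, and the statistic tabulated is the Wilcoxon rank sum W_{ Y }.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations
from math import comb


def null_rank_sum_distribution(m, n):
    """Exact distribution of W_Y (sum of the Y ranks) when F = G.

    Assumption: continuous distributions, so there are no ties and every
    choice of n ranks out of m + n is equally likely for the Y sample.
    """
    total = comb(m + n, n)  # number of equally likely rank configurations
    counts = Counter(
        sum(y_ranks) for y_ranks in combinations(range(1, m + n + 1), n)
    )
    return {w: Fraction(c, total) for w, c in sorted(counts.items())}


if __name__ == "__main__":
    for w, p in null_rank_sum_distribution(2, 2).items():
        print(w, p)
```

Exact fractions are used so that the probabilities sum to one without rounding error. The enumeration grows as $\binom{m+n}{n}$, which is why this route is limited to small samples and gives no help when F≠G.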
and p_{ i } is defined in Equation (3).
Theorem 3
where $\mathit{B}(C_{r})=\sum_{k:R_{l}(\mathit{U}_{n})=r}e_{k}$, $e_{k}$ is a $1\times\binom{m+n}{n}$ unit row vector corresponding to state u_{ n }, ξ (=P(Z_{0}=1)=1) is the initial probability, and M_{ t }, t=1,…,n, are the transition probability matrices of the imbedded Markov chain defined on the state space Ω_{ t }.
Proof
where $\mathit{B}(C_{r})=\sum_{k:R_{l}(\mathit{U}_{n})=r}e_{k}$ and $e_{k}$ is a $1\times\binom{m+n}{n}$ unit row vector corresponding to state U_{ n }, we then obtain the conditional probability of the rank R_{ l }.
Note that the alternative hypothesis is subject to the purpose of the test. This simply needs to be slightly modified if a one-sided test is adopted.
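The quantities named in Theorem 3 (an initial probability ξ, transition matrices M_{ t }, and a vector B(C_{ r }) that collects the final states with R_{ l }=r) describe a forward pass through a Markov chain followed by a sum over a partition of the final states. The sketch below shows that pattern on a deliberately simplified chain, which is an assumption of this example and not the chain on Ω_{ t } defined in the paper. It covers only the null case F=G, steps through the ranks 1,…,m+n rather than through t=1,…,n, and uses the state (number of Y's placed so far, running rank sum). The name `rank_sum_by_markov_chain` is hypothetical.

```python
from collections import defaultdict
from fractions import Fraction


def rank_sum_by_markov_chain(m, n):
    """Null distribution of W_Y via forward propagation of state probabilities.

    Simplified stand-in chain (assumption): the state is (j, w), where j is
    the number of Y's among the ranks seen so far and w is their rank sum.
    Under F = G the next rank belongs to a Y with probability
    (n - j) / (number of ranks not yet assigned).
    """
    total = m + n
    state_probs = {(0, 0): Fraction(1)}  # initial probability one on a dummy state
    for rank in range(1, total + 1):
        remaining = total - rank + 1
        next_probs = defaultdict(Fraction)
        for (j, w), p in state_probs.items():
            p_y = Fraction(n - j, remaining)
            if p_y > 0:  # this rank is taken by a Y
                next_probs[(j + 1, w + rank)] += p * p_y
            if p_y < 1:  # this rank is taken by an X
                next_probs[(j, w)] += p * (1 - p_y)
        state_probs = next_probs
    # Partition the final states by the value of the statistic and sum.
    dist = defaultdict(Fraction)
    for (_, w), p in state_probs.items():
        dist[w] += p
    return dict(sorted(dist.items()))


if __name__ == "__main__":
    print(rank_sum_by_markov_chain(2, 2))
```

Storing the state probabilities in a dictionary plays the role of the row vector multiplied by each transition matrix in turn; only reachable states are kept, so no explicit matrix is formed.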
2.2 Distributions of the rank statistic in the scale case
Let n^{−} be the length of the vector ${\mathit{U}}_{n}^{-}$ and n^{+} be the length of the vector ${\mathit{U}}_{n}^{+}$.
Theorem 4
where $\mathit{B}(C_{r})=\sum_{k:R_{s}(\mathbf{U}_{n})=r}e_{k}$, $e_{k}$ is a $1\times\binom{m+n}{n}$ unit row vector corresponding to state U_{ n }, ξ (=P(Z_{0}=1)=1) is the initial probability, and M_{ t }, t=1,…,n, are the transition probability matrices of the imbedded Markov chain defined on the state space Ω_{ t }.
Proof
In accordance with Equation (11), we use the possible values of R_{ s } as the rule for the partition. The rest of the proof follows along the same lines as that of Theorem 3 and is omitted here.
which establishes the distribution of R_{ s }.
2.3 Joint distributions of the rank statistics in the shift and scale case
We have derived the marginal distributions of R_{ l } and R_{ s } in terms of U_{ n }, respectively, which yield the following theorem.
Theorem 5
where $\mathit{B}(C_{r})=\sum_{k:R_{l}(\mathit{U}_{n})=r_{1}\ \&\ R_{s}(\mathit{U}_{n})=r_{2}}e_{k}$, $e_{k}$ is a $1\times\binom{m+n}{n}$ unit row vector corresponding to state u_{ n }, ξ (=P(Z_{0}=1)=1) is the initial probability, and M_{ t }, t=1,…,n, are the transition probability matrices of the imbedded Markov chain defined on the state space Ω_{ t }.
Proof
By Equations (4) and (11), each u_{ n } in the state space Ω_{ n } has corresponding values of R_{ l } and R_{ s }. The combinations of the values of R_{ l } and R_{ s } are used as the criterion for the partition. The rest of the proof follows along the same lines as that of Theorem 3.
Note that unless we have a conjecture about the values of θ and σ, we tend to use a two-sided test. However, with knowledge of the center and shape of the distribution of interest, choosing a sectorial critical region is a better option; an example is given in the numerical studies.
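Partitioning the outcomes by the pair of statistic values can be sketched for the null case F=G, where every rank configuration is equally likely. The statistics used here are stand-ins and are assumptions of this example: the Wilcoxon rank sum of the Y sample takes the place of R_{ l }, and an Ansari-Bradley-type score sum, with score min(r, m+n+1−r) for rank r, takes the place of R_{ s }, whose definition in Equation (11) is not reproduced. The name `null_joint_distribution` is hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations
from math import comb


def null_joint_distribution(m, n):
    """Joint null distribution of a location and a scale rank statistic.

    Stand-ins (assumptions): location = rank sum of the Y sample,
    scale = Ansari-Bradley-type score sum of the Y sample.
    """
    total = m + n
    counts = Counter()
    for y_ranks in combinations(range(1, total + 1), n):
        location = sum(y_ranks)
        scale = sum(min(r, total + 1 - r) for r in y_ranks)
        counts[(location, scale)] += 1  # one cell of the partition per pair
    n_configs = comb(total, n)
    return {cell: Fraction(c, n_configs) for cell, c in sorted(counts.items())}


if __name__ == "__main__":
    for (r1, r2), p in null_joint_distribution(3, 3).items():
        print(r1, r2, p)
```

Summing the cells over one coordinate recovers the marginal distribution of the other statistic, and a critical region for a joint test is a set of cells whose total null probability does not exceed the chosen level.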
3 Numerical results and discussion
3.1 A joint distribution of R_{ l } and R_{ s }
3.2 Powers for a joint test using R_{ l } and R_{ s }
3.3 Lehmann alternatives
Power comparisons for a one-sided rank test H_{ 0 }: F(x; θ_{ o }, σ_{ o }) = G(x; θ_{ a }, σ_{ a }) vs. H_{ a }: F^{ k }(x; θ_{ o }, σ_{ o }) = G(x; θ_{ a }, σ_{ a })
 | | m=6, n=10 | | | | m=10, n=10 | | | | m=10, n=20 | | | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
F | Test | β(F) | β(F^{2}) | β(F^{3}) | β(F^{6}) | β(F) | β(F^{2}) | β(F^{3}) | β(F^{6}) | β(F) | β(F^{2}) | β(F^{3}) | β(F^{6}) |
U(0,1) | R_{ l } | .090 | .411 | .647 | .900 | .096 | .496 | .761 | .967 | .099 | .591 | .845 | .984 |
 | R_{ s } | .080 | .152 | .193 | .218 | .076 | .137 | .149 | .123 | .100 | .236 | .370 | .638 |
 | R_{ l } & R_{ s } | .100 | .452 | .699 | .934 | .100 | .531 | .799 | .981 | .100 | .622 | .878 | .992 |
t(3) | R_{ l } | .090 | .412 | .639 | .897 | .096 | .493 | .756 | .965 | .099 | .574 | .841 | .987 |
 | R_{ s } | .080 | .150 | .197 | .217 | .076 | .137 | .152 | .121 | .100 | .234 | .367 | .634 |
 | R_{ l } & R_{ s } | .100 | .453 | .696 | .932 | .100 | .528 | .798 | .980 | .100 | .606 | .874 | .993 |
Exp(1) | R_{ l } | .090 | .411 | .650 | .899 | .096 | .490 | .764 | .967 | .099 | .579 | .841 | .987 |
 | R_{ s } | .080 | .149 | .195 | .217 | .076 | .140 | .152 | .122 | .100 | .232 | .376 | .641 |
 | R_{ l } & R_{ s } | .100 | .451 | .702 | .933 | .100 | .525 | .805 | .982 | .100 | .607 | .875 | .993 |
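For a Lehmann alternative G=F^{ k } with integer k, the probability of a rank configuration has a closed form that does not depend on F, which offers one way to sketch an exact power calculation for a one-sided rank-sum test. This is not the FMCI procedure of the paper. Taking F to be U(0,1) without loss of generality and integrating the joint density over one ordering gives $P=m!\,n!\,k^{n}/\prod_{i=1}^{m+n}A_{i}$, where $A_{i}=\sum_{j\le i}a_{j}$ and $a_{j}=k$ if the j-th smallest observation is a Y and $a_{j}=1$ otherwise. The function names, the use of W_{ Y } as the statistic, and the rule for building the critical region (add the largest values of W_{ Y } while the null size stays at or below α) are assumptions of this example, so its output need not match the tabulated values.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial


def config_probability(y_ranks, m, n, k):
    """P(the Y sample occupies exactly y_ranks) when X ~ F and Y ~ F^k."""
    y_set = set(y_ranks)
    prob = Fraction(factorial(m) * factorial(n) * k**n)
    cumulative = 0
    for rank in range(1, m + n + 1):
        cumulative += k if rank in y_set else 1  # running sum A_i
        prob /= cumulative
    return prob


def rank_sum_power(m, n, k, alpha):
    """Return (size, power) of an upper-tail test based on W_Y.

    Assumed rule: include the largest values of W_Y one at a time for as
    long as the null probability of the region stays at or below alpha.
    """
    null, alt = {}, {}
    for y_ranks in combinations(range(1, m + n + 1), n):
        w = sum(y_ranks)
        null[w] = null.get(w, 0) + config_probability(y_ranks, m, n, 1)
        alt[w] = alt.get(w, 0) + config_probability(y_ranks, m, n, k)
    size, power = Fraction(0), Fraction(0)
    for w in sorted(null, reverse=True):
        if size + null[w] > alpha:
            break
        size += null[w]
        power += alt[w]
    return size, power


if __name__ == "__main__":
    for k in (1, 2, 3, 6):
        size, power = rank_sum_power(6, 10, k, Fraction(1, 10))
        print(k, float(size), float(power))
```

With k=1 the alternative coincides with the null hypothesis, so the returned power equals the attained size. That size is generally below the nominal level because W_{ Y } is discrete.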
4 Conclusion
Our proposed algorithm provides a solution for finding the power of distribution equivalence tests considering the shift and scale parameters, separately and simultaneously. Numerical studies show that a joint test should be adopted for testing the homogeneity of distributions, including under Lehmann alternatives. Also, an elliptic critical region is a better choice than a rectangular one for a joint test. In practice, it is reasonable to assume neither normality nor equal means or variances for the distributions of interest. However, our algorithm depends heavily on the available computing resources, as the number of possible states in Ω_{ n } grows rapidly when the sample sizes increase. Therefore, we can, so far, only target small sample sizes in our work.
Declarations
Acknowledgments
The author would like to thank James C. Fu and an anonymous referee, whose comments led to significant improvements of this manuscript.
Authors’ Affiliations
References
- Ansari AR, Bradley RA: Rank-Sum Tests for Dispersions. Ann. Math. Stat 1960, 31: 1174–1189. 10.1214/aoms/1177705688
- Collings BJ, Hamilton MA: Estimating the power of the two-sample Wilcoxon Test for location shift. Biometrics 1988, 44: 847–860. 10.2307/2531596
- Klotz J: Nonparametric test for scale. Ann. Math. Stat 1962, 33: 498–512. 10.1214/aoms/1177704576
- Lehmann EL: The power for rank tests. Ann. Math. Stat 1953, 24: 23–43. 10.1214/aoms/1177729080
- Lehmann EL: Nonparametrics: Statistical Methods Based on Ranks. Prentice-Hall, New Jersey; 1998.
- Mann HB, Whitney DR: On a test of whether one of two random variables is stochastically larger than the other. Ann. Math. Stat 1947, 18: 50–60. 10.1214/aoms/1177730491
- Mood AM: On the asymptotic efficiency of certain nonparametric two-sample tests. Ann. Math. Stat 1954, 25: 514–522. 10.1214/aoms/1177728719
- Rosner B, Glynn RJ: Power and sample size estimation for the Wilcoxon rank sum test with application to comparisons of C statistics from alternative prediction models. Biometrics 2009, 65: 188–197. 10.1111/j.1541-0420.2008.01062.x
- Shieh G, Jan SL, Randles RH: On power and sample size determinations for the Wilcoxon-Mann-Whitney test. Nonparametric Stat 2006, 18: 33–43. 10.1080/10485250500473099
- Siegel S, Tukey JW: A nonparametric sum of ranks procedure for relative spread in unpaired samples. J. Am. Stat. Assoc 1960, 55: 429–445. 10.1080/01621459.1960.10482073
- Wilcoxon F: Individual comparisons by ranking methods. Biometrics 1945, 1: 80–83. 10.2307/3001968
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.