# Joint distribution of rank statistics considering the location and scale parameters and its power study

## Abstract

The ranking method used for testing the equivalence of two distributions has been studied for decades and is widely adopted for its simplicity. However, due to the complexity of the calculations, the power of the test is either estimated by a normal approximation or found only when an appropriate alternative is given. Here, via the finite Markov chain imbedding technique, we establish the marginal and joint distributions of the rank statistics considering the shift and scale parameters, respectively and simultaneously, under two different continuous distribution functions. Furthermore, the procedures of the distribution equivalence tests and their power functions are discussed. Numerical results for a joint distribution of rank statistics under the standard normal distribution are presented, along with the powers for a sequence of alternative normal distributions with means from −20 to 20 and standard deviations from 1 to 9 and their reciprocals. In addition, we discuss the powers of the rank statistics under the Lehmann alternatives.

### 2010 Mathematics Subject Classification

Primary 62G07; Secondary 62G10

## 1 Introduction

Suppose that, on the basis of observations X1,…,X m ; Y1,…,Y n from the cumulative distribution functions F and G, respectively, two major topics in hypothesis testing are to test the equivalence of either the center or the dispersion of the two populations of interest. The hypotheses are stated, for some θ ≠ 0, as

$H_o: F(x) = G(x) \quad \text{versus} \quad H_a: F(x) = G(x - \theta), \quad \text{for all } x,$

which is known as the shift alternative and, for some σ≠1,

$H_o: F(x) = G(x) \quad \text{versus} \quad H_a: F(x) = G(x\sigma^{-1}), \quad \text{for all } x.$

Wilcoxon (1945) proposed the ranking method for testing the significance of the difference between two population means, now known as the Wilcoxon rank-sum test, and defined a statistic W Y as the sum of the ranks of the y's in the combined and ordered sequence of x's and y's, which is equivalent to

$\sum_{j=1}^{n} \#\left\{\, x_i : x_i < y_j \,\right\} + \frac{n(n+1)}{2}.$
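As a quick sanity check of this equivalence, the rank-sum can be computed both from the ranks and from the counting form above (a sketch; the helper names are ours, and ties are assumed absent since the distributions are continuous):

```python
# Check that the rank-sum of the y's equals the counting form
# sum_j #{x_i < y_j} + n(n+1)/2.
def wilcoxon_rank_sum(x, y):
    """Sum of the ranks of the y's in the combined ordered sample."""
    combined = sorted(x + y)
    return sum(combined.index(v) + 1 for v in y)   # no ties assumed

def counting_form(x, y):
    """sum_j #{x_i < y_j} + n(n+1)/2."""
    n = len(y)
    return sum(1 for xi in x for yj in y if xi < yj) + n * (n + 1) // 2

x = [0.3, 1.7, 2.2]
y = [0.9, 2.5]
assert wilcoxon_rank_sum(x, y) == counting_form(x, y)
```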

Mann and Whitney (1947) introduced an elaboration of the ranking test, proposed the statistic $U_X = mn - W_Y + \frac{n(n+1)}{2}$, and proved that the limiting distribution of the test statistic U X is

$\frac{U_X - E(U_X)}{\sqrt{\text{Var}(U_X)}} \stackrel{L}{\to} N(0,1)$

as m and n go to infinity in any arbitrary manner where

$E(U_X) = mnp_1$

and

$\text{Var}(U_X) = mnp_1(1 - p_1) + mn(n-1)\left(p_2 - p_1^2\right) + mn(m-1)\left(p_3 - p_1^2\right),$

with

$\begin{array}{lcl} p_1 & = & P(X > Y), \\ p_2 & = & P(X > Y \ \text{and}\ X > Y'), \\ p_3 & = & P(X > Y \ \text{and}\ X' > Y), \end{array}$
(1)

where X, X′ and Y, Y′ are independently distributed, X and X′ with distribution F, and Y and Y′ with distribution G. Intuitively, the power for the right-sided test can be found as

$P\left( \frac{U_X - E(U_X)}{\sqrt{\text{Var}(U_X)}} > \frac{c - E(U_X)}{\sqrt{\text{Var}(U_X)}} \,\middle|\, H_a \right),$
(2)

where c is the value such that

$\Phi\left( \frac{c - \frac{1}{2}mn}{\sqrt{\frac{1}{12}mn(m+n+1)}} \,\middle|\, H_o \right) \ge 1 - \alpha.$
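The normal-approximation power calculation in Equations (1) and (2) can be sketched numerically. The following is an illustrative sketch, not the paper's method: it takes F = N(θ, 1) against G = N(0, 1), estimates p1, p2, p3 by Monte Carlo, and plugs them into the approximation; all function and variable names are ours.

```python
# Normal-approximation power of the right-sided test, Equation (2),
# with p1, p2, p3 of Equation (1) estimated by Monte Carlo.
import random
from math import sqrt
from statistics import NormalDist

def approx_power(m, n, theta, alpha=0.05, N=50_000, seed=1):
    rng = random.Random(seed)
    h1 = h2 = h3 = 0
    for _ in range(N):
        x, x2 = rng.gauss(theta, 1), rng.gauss(theta, 1)
        y, y2 = rng.gauss(0, 1), rng.gauss(0, 1)
        h1 += x > y                       # p1 = P(X > Y)
        h2 += (x > y) and (x > y2)        # p2 = P(X > Y and X > Y')
        h3 += (x > y) and (x2 > y)        # p3 = P(X > Y and X' > Y)
    p1, p2, p3 = h1 / N, h2 / N, h3 / N
    mean_a = m * n * p1                   # E(U_X) under H_a
    var_a = (m * n * p1 * (1 - p1)
             + m * n * (n - 1) * (p2 - p1 ** 2)
             + m * n * (m - 1) * (p3 - p1 ** 2))
    # Critical value c from the null approximation N(mn/2, mn(m+n+1)/12).
    null = NormalDist(m * n / 2, sqrt(m * n * (m + n + 1) / 12))
    c = null.inv_cdf(1 - alpha)
    return 1 - NormalDist(mean_a, sqrt(var_a)).cdf(c)

power_null = approx_power(10, 10, theta=0.0)   # should be close to alpha
power_shift = approx_power(10, 10, theta=1.0)  # appreciably larger
```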

Over the years, there have been studies on finding the exact or approximate power for the rank-sum test. By choosing an appropriate alternative distribution function, Shieh et al. (2006) derived the exact power for the uniform, normal, double exponential and exponential shift models. Rosner and Glynn (2009) discussed power against the family of alternatives of the form

$\Phi^{-1}\left(F_Y(y)\right) = \Phi^{-1}\left(F_X(y)\right) + \mu \quad \text{for some } \mu \ne 0,$

where the underlying distributions F X and F Y are normal. Collings and Hamilton (1988) presented a bootstrap method to find the empirical distribution functions in order to approximate the power against the shift alternative. Lehmann (1953) derived the power function as

$P\left(S_1 = s_1, S_2 = s_2, \cdots, S_n = s_n\right) = \frac{k^n}{\binom{m+n}{m}} \prod_{j=1}^{n} \frac{\Gamma(s_j + jk - j)}{\Gamma(s_j)} \frac{\Gamma(s_{j+1})}{\Gamma(s_{j+1} + jk - j)},$

where s j is the rank of y j in the combined samples for the alternative hypothesis of

$G_Y(x) = F_X(x)^k \quad \text{for all } x,$

where k is a positive integer. However, Lehmann (1998) pointed out that the power function of the rank-sum test, Equation (2), is only qualitative, since the numerical evaluation of the probabilities in Equation (1) is considerably complicated when F and G are continuous distributions with F ≠ G.
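For integer k, the Lehmann alternative G_Y = F_X^k is the distribution of the maximum of k i.i.d. draws from F_X, so Y can be simulated by inverse transform: if U ~ Uniform(0,1), then F_X^{-1}(U^{1/k}) has CDF F_X^k. A small simulation sketch (helper names ours) shows the rank-sum of the y's drifting above its null expectation:

```python
# Sample from the Lehmann alternative G(x) = F(x)^k with F = N(0,1),
# and watch the average rank-sum of the y's exceed its null value.
import random
from statistics import NormalDist

def lehmann_sample(k, size, rng):
    # If U ~ Uniform(0,1), then F^{-1}(U ** (1/k)) has CDF F(x)^k.
    return [NormalDist().inv_cdf(rng.random() ** (1.0 / k))
            for _ in range(size)]

def rank_sum_y(x, y):
    combined = sorted(x + y)
    return sum(combined.index(v) + 1 for v in y)

rng = random.Random(7)
m, n, k = 8, 8, 3
w_vals = [rank_sum_y([rng.gauss(0, 1) for _ in range(m)],
                     lehmann_sample(k, n, rng))
          for _ in range(2000)]
w_bar = sum(w_vals) / len(w_vals)
# Under H_o (k = 1), E(W_Y) = n(m+n+1)/2 = 68 here; under k = 3 the y's
# are stochastically larger, so the average rank-sum is pulled above 68.
```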

As the rank-sum test is widely adopted for testing the difference in centers of two distributions, it is natural to study the efficiency of a rank-sum test for variability (Ansari and Bradley 1960). For decades, studies have focused on proposing new definitions of the rank statistic and using the methods of Chernoff and Savage to show the relative efficiency of the proposed statistic to the F-test; see, for example, Mood (1954), Siegel and Tukey (1960), Ansari and Bradley (1960), and Klotz (1962). Ansari and Bradley (1960) mentioned that if the means of the X and Y samples cannot be considered equal, differences in location have a severe impact on all tests of dispersion. Klotz (1962) showed that the power of a rank test can be found by integrating the joint density of the X and Y samples over the part of the (m+n)-dimensional space defined by the alternative orderings that lie in the critical region of the test, a computation whose conditions are very strict.

Our approach aims at relaxing some of the conditions required to find the distribution of the proposed rank statistic. We systematically imbed the random vector U n into a Markov chain to derive the marginal and joint distributions of the rank statistics considering the shift and scale parameters, respectively, under any two continuous distribution functions. A joint distribution of rank statistics has, to the best of our knowledge, not been studied in the literature. The main strength of the finite Markov chain imbedding approach (FMCI) is that it yields the distribution of the rank statistic without imposing conditions. Therefore, under the null hypothesis of F=G, we are able to identify a proper critical region and, under the alternative assumption, the power of the test can be determined naturally. The distribution of the random vector U n , independent of the form of the distribution function F, is also demonstrated under the null hypothesis of distribution equivalence.

The main contributions of this paper are as follows. In Section 2.1, we introduce the procedures for deriving the distribution of the rank statistic considering the shift parameter, and its power function, by using FMCI. The procedures are general and can be applied either to two identical distribution functions of interest or to two different continuous density functions. In Section 2.2, we address the steps for finding the distribution of the rank statistic considering the scale parameter and its power function. In Section 2.3, we derive the joint distribution of the rank statistics considering the location and scale parameters simultaneously, as well as its power function. Numerical results for a joint distribution, and some powers of the rank statistics against the shift and scale parameters, individually and simultaneously, are presented in Section 3. We also discuss the powers of the rank statistics under the Lehmann alternatives. We end this paper with a short conclusion in Section 4.

## 2 Methods

### 2.1 Distributions of the rank statistic in the shift case

Let {X1,…,X m } and {Y1,…,Y n } be two independent samples from the continuous cumulative distribution functions F(x) and G(x−θ), respectively. Given x={x1,…,x m }, where x[i] denotes the ith smallest value in the sample, we have

$p_i = P\left( x_{[i-1]} \le Y < x_{[i]} \right) = G(x_{[i]}) - G(x_{[i-1]}),$

for i=1,2,…,m+1, where $x_{[0]} = -\infty$ and $x_{[m+1]} = \infty$. Therefore, we define the sampling distribution of Y over the (m+1) intervals as

$\mathbf{p} = \left( G(x_{[1]}) - G(x_{[0]}), \dots, G(x_{[m+1]}) - G(x_{[m]}) \right) = (p_1, p_2, \dots, p_{m+1}).$
(3)
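Computationally, the vector p in Equation (3) is just the G-mass of the m+1 intervals cut by the ordered x sample. A minimal sketch, with G taken as the standard normal CDF purely for illustration:

```python
# Compute p = (p_1, ..., p_{m+1}) of Equation (3) for a given x sample
# and CDF G; the cells are (-inf, x_[1]), [x_[1], x_[2]), ..., [x_[m], inf).
from math import isclose
from statistics import NormalDist

def interval_probs(x, G):
    xs = sorted(x)                               # x_[1] <= ... <= x_[m]
    Gv = [0.0] + [G(v) for v in xs] + [1.0]      # G(-inf) = 0, G(inf) = 1
    return [Gv[i] - Gv[i - 1] for i in range(1, len(Gv))]

p = interval_probs([-0.5, 0.0, 1.2], NormalDist().cdf)
assert len(p) == 4 and isclose(sum(p), 1.0)      # m + 1 cells, total mass 1
```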

Given m, for t=1,2,…,n, let

$\Omega_t = \left\{ \mathbf{u}_t = (u_1(t), \cdots, u_{m+1}(t)) : \sum_{i=1}^{m+1} u_i(t) = t \ \text{and}\ u_i(t) \ge 0,\ i=1,\dots,m+1 \right\},$

where u i (t) is the number of y's in the interval [ x[i−1],x[i]) among y1,…,y t . For each u n =(u1(n),…,um+1(n)), we have a corresponding rank-sum of the y's in the combined sample:

$R_l(\mathbf{U}_n = \mathbf{u}_n | \mathbf{X}) = \frac{\sum_{i=1}^{m+1} u_i^2(n) + \sum_{i=1}^{m+1} u_i(n)}{2} + \sum_{i=1}^{m} \left( u_i(n) + 1 \right) \left( \sum_{j=i+1}^{m+1} u_j(n) \right).$
(4)
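Equation (4) can be checked directly against the rank-sum computed from raw data; the sketch below (helper names ours) builds the cell counts u_n and compares the two:

```python
# Verify Equation (4): R_l computed from the cell counts u_n equals the
# Wilcoxon rank-sum of the y's computed from the combined sample.
def R_l(u):
    cells = len(u)                            # m + 1 cells
    quad = (sum(ui * ui for ui in u) + sum(u)) / 2
    cross = sum((u[i] + 1) * sum(u[i + 1:]) for i in range(cells - 1))
    return quad + cross

def counts(x, y):
    """u_i(n) = number of y's in [x_[i-1], x_[i]), i = 1, ..., m+1."""
    xs = sorted(x)
    u = [0] * (len(xs) + 1)
    for v in y:
        u[sum(1 for w in xs if w <= v)] += 1
    return u

def rank_sum_y(x, y):
    combined = sorted(x + y)
    return sum(combined.index(v) + 1 for v in y)

x, y = [0.3, 1.7, 2.2], [0.9, 2.5, 2.6]
assert R_l(counts(x, y)) == rank_sum_y(x, y)
```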

#### Theorem 1

The statistic R l is equivalent to the statistic W Y introduced by Wilcoxon (1945).

#### Proof

Let

$I(x_i, y_j) = \begin{cases} 1 & \text{if } x_i < y_j \\ 0 & \text{otherwise}. \end{cases}$

The rank statistic W Y , the sum of the ranks of the y observations, can be determined by

$\begin{array}{lcl} \sum_{j=1}^{n} \left( \sum_{i=1}^{m} I(x_i, y_j) + j \right) & = & \sum_{j=1}^{n} \sum_{i=1}^{m} I(x_i, y_j) + \sum_{j=1}^{n} j \\ & = & \sum_{i=1}^{m} \sum_{j=1}^{n} I(x_i, y_j) + \frac{n(n+1)}{2}. \end{array}$
(5)

The inner sum over j in the first term of Equation (5), evaluated at the ordered value x[i], counts the y observations larger than x[i], which is $\sum_{j=i+1}^{m+1} u_j(n)$ in our notation. It is not difficult to see that $\sum_{i=1}^{m+1} u_i(n)$ equals n, the size of the y sample. Therefore, the equation can be rewritten as

$\sum_{i=1}^{m} \left( \sum_{j=i+1}^{m+1} u_j(n) \right) + \frac{\sum_{i=1}^{m+1} u_i^2(n) + 2 \sum_{i=1}^{m} u_i(n) \left( \sum_{j=i+1}^{m+1} u_j(n) \right) + \sum_{i=1}^{m+1} u_i(n)}{2}.$

It is then easy to see that

$\sum_{i=1}^{m} \left( u_i(n) + 1 \right) \left( \sum_{j=i+1}^{m+1} u_j(n) \right) + \frac{\sum_{i=1}^{m+1} u_i^2(n) + \sum_{i=1}^{m+1} u_i(n)}{2} = R_l.$

Next, we demonstrate that for two random samples from the same population, the distribution of the random vector U n is independent of the form of the distribution function.

#### Theorem 2

Distribution-free property of U n .

$P(\mathbf{U}_n = \mathbf{u}_n | H_o) = \frac{1}{\text{Card}(\Omega_n)} = \frac{1}{\binom{m+n}{n}}.$
(6)

#### Proof

We know that the joint density of the ordered x sample is given by

$f(x_{[1]}, \dots, x_{[m]}) = m! \prod_{i=1}^{m} f(x_i)$

and, when F=G, the conditional probability of the random vector U n given X=(x1,x2,…,x m ) is

$P\left(\mathbf{U}_n = \mathbf{u}_n | x_1, x_2, \dots, x_m\right) = \frac{n!}{\prod_{i=1}^{m+1} u_i(n)!} \prod_{i=1}^{m+1} \left( \int_{x_{[i-1]}}^{x_{[i]}} f(y)\, dy \right)^{u_i(n)},$
(7)

where $x_{[0]} = -\infty$ and $x_{[m+1]} = \infty$. By taking the expected value of the conditional probability, we have

$\begin{array}{l} P\left(\mathbf{U}_n = \mathbf{u}_n | H_o\right) \\ = {\displaystyle \int \cdots \int_{-\infty \le x_{[1]} \le \cdots \le x_{[m]} \le \infty}} P\left(\mathbf{u}_n | x_1, \dots, x_m\right) f\left(x_{[1]}, \dots, x_{[m]}\right) dx_{[1]} \cdots dx_{[m]} \\ = {\displaystyle \int_{-\infty}^{\infty} \int_{x_{[1]}}^{\infty} \cdots \int_{x_{[m-1]}}^{\infty}} \frac{n!}{\prod_{i=1}^{m+1} u_i(n)!} \left(F(x_{[1]})\right)^{u_1(n)} \left(F(x_{[2]}) - F(x_{[1]})\right)^{u_2(n)} \cdots \left(1 - F(x_{[m]})\right)^{u_{m+1}(n)} m!\, dF(x_{[1]}) \cdots dF(x_{[m]}). \end{array}$
(8)

Using a change of variables, it is clear that the random variables F(x[1]),…,F(x[m]) have a Dirichlet distribution with parameters u1(n)+1, u2(n)+1, …, um+1(n)+1. Therefore, we have

$P(\mathbf{U}_n = \mathbf{u}_n | H_o) = \frac{n!\, m!}{(n+m)!} = \frac{1}{\text{Card}(\Omega_n)},$

which is independent of the distribution function.
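As a numerical check of Equation (7): given the x sample, U_n is multinomial with cell probabilities p_i, so the conditional probabilities over all of Ω_n must sum to one. A sketch (G = N(0,1) purely as an illustration; helper names ours):

```python
# Conditional probability of U_n = u_n given the x sample, Equation (7):
# a multinomial with cell probabilities p_i = G(x_[i]) - G(x_[i-1]).
from math import factorial, prod, isclose
from statistics import NormalDist

def cond_prob(u, x, G):
    xs = sorted(x)
    Gv = [0.0] + [G(v) for v in xs] + [1.0]
    p = [Gv[i] - Gv[i - 1] for i in range(1, len(Gv))]
    coef = factorial(sum(u)) // prod(factorial(ui) for ui in u)
    return coef * prod(pi ** ui for pi, ui in zip(p, u))

# Sum over all u with u_1 + u_2 + u_3 = n: total probability must be 1.
x, n = [-0.5, 0.8], 3
total = sum(cond_prob((a, b, n - a - b), x, NormalDist().cdf)
            for a in range(n + 1) for b in range(n + 1 - a))
assert isclose(total, 1.0)
```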

This is why the distribution of the random vector U n is distribution-free under the null hypothesis. However, the random vector U n is discrete uniform, with mass function equal to one over the number of its possible outcomes, only when F=G; in other words, its distribution can be found by traditional combinatorial analysis only when F=G. When F≠G, we cannot establish the distribution of U n through Equation (7), since solving the multiple integral in Equation (8) is tedious at best for suitable alternative distribution functions and intractable in general. To our understanding, finding the power of the test has therefore remained unsolved in most cases. To overcome this, we bring in the finite Markov chain imbedding approach.

Let Ω t , t=0,1,…,n, be the state space with $\binom{m+t}{t}$ possible states, let Γ n ={0,1,…,n} be an index set, and let {Z t : t ∈ Γ n } be a non-homogeneous Markov chain on the state spaces Ω t . For t=1,…,n, the transition probability matrix M t of this chain is

${\mathbf{M}_t = \left[\; p_{\mathbf{u}_{t-1}, \mathbf{u}_t} \;\right]}_{\binom{m+t-1}{t-1} \times \binom{m+t}{t}},$ with rows indexed by the states of Ω t−1 and columns by the states of Ω t ,

where

$p_{\mathbf{u}_{t-1}, \mathbf{u}_t} = P(Z_t = \mathbf{u}_t | Z_{t-1} = \mathbf{u}_{t-1}) = \begin{cases} p_i & \text{if } u_i(t-1) + 1 = u_i(t) \ \text{and}\ u_j(t-1) = u_j(t)\ \forall\, j \ne i, \\ 0 & \text{otherwise}, \end{cases}$

and p i is defined in Equation (3).
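The forward pass ξ M_1 ⋯ M_n can be sketched with a sparse dictionary standing in for the explicit matrices M_t: states at step t are the compositions of t into m+1 cells, each transition adds one y to cell i with probability p_i, and lumping the step-n states by their rank value gives the conditional distribution of R_l. This is our own illustrative implementation, not the authors' code:

```python
# FMCI forward pass for P(R_l = r | X), given the cell probabilities p.
from collections import Counter
from math import isclose

def rank_distribution(p, n):
    cells = len(p)                            # m + 1 cells
    probs = {tuple([0] * cells): 1.0}         # xi: mass 1 on u_0 = (0,...,0)
    for _ in range(n):                        # one step per y observation
        nxt = Counter()
        for u, pr in probs.items():           # each transition increments
            for i, pi in enumerate(p):        # one cell, with probability p_i
                v = list(u)
                v[i] += 1
                nxt[tuple(v)] += pr * pi
        probs = nxt
    dist = Counter()                          # lump states into partition {C_r}
    for u, pr in probs.items():
        quad = (sum(ui * ui for ui in u) + sum(u)) / 2
        cross = sum((u[i] + 1) * sum(u[i + 1:]) for i in range(cells - 1))
        dist[quad + cross] += pr              # rank value from Equation (4)
    return dict(dist)

dist = rank_distribution(p=(0.2, 0.3, 0.5), n=3)     # m = 2, n = 3
assert isclose(sum(dist.values()), 1.0)
assert min(dist) == 6 and max(dist) == 12            # n(n+1)/2, n(2m+n+1)/2
```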

#### Theorem 3

R l (U n |X) is finite Markov chain imbeddable, and

$P(R_l(\mathbf{U}_n) = r | \mathbf{X}) = \boldsymbol{\xi} \left( \prod_{t=1}^{n} \mathbf{M}_t \right) \mathbf{B}'(C_r),$

where $\mathbf{B}(C_r) = \sum_{k: R_l(\mathbf{u}_n) = r} e_k$, e k is a $1 \times \binom{m+n}{n}$ unit row vector corresponding to state u n , ξ is the initial distribution, which puts probability one on the single state of Ω0, and M t , t=1,…,n, are the transition probability matrices of the imbedded Markov chain defined on the state spaces Ω t .

#### Proof

For each u n =(u1(n),…,um+1(n)) in the state space Ω n , we have a corresponding rank R l as shown in Equation (4). Intuitively, the minimum rank r ls is n(n+1)/2 and the maximum rank r lb is n(2m+n+1)/2. In accordance with the possible values of the rank R l , we define a finite partition {C r : r=r ls ,…,r lb } such that

$P(Z_n \in C_r | \mathbf{p}) = \boldsymbol{\xi} \left( \prod_{t=1}^{n} \mathbf{M}_t \right) \mathbf{B}'(C_r),$
(9)

where $\mathbf{B}(C_r) = \sum_{k: R_l(\mathbf{u}_n) = r} e_k$ and e k is a $1 \times \binom{m+n}{n}$ unit row vector corresponding to state u n . We then obtain the conditional probability of the rank R l .

The Law of Large Numbers is then used to determine the probability of U n for any continuous F and G:

$\frac{1}{N} \sum_{i=1}^{N} P(\mathbf{U}_n = \mathbf{u}_n | \mathbf{X}_i) \stackrel{p}{\to} P(\mathbf{U}_n = \mathbf{u}_n),$

where X i is the ith sample of size m from the distribution function F. It is easy to see that

$P(R_l(\mathbf{U}_n) = r) = \sum_{\mathbf{u}_n : R_l(\mathbf{u}_n) = r} P(\mathbf{U}_n = \mathbf{u}_n).$
(10)
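The averaging step can be illustrated by simulation: under F = G, the empirical law of the rank-sum over many (x, y) pairs should approach the exact null law obtained by enumerating the equally likely placements of Theorem 2. A sketch (sample sizes and tolerances are ours):

```python
# Compare the simulated law of the rank-sum under F = G = N(0,1) with the
# exact null law from enumerating the C(m+n, n) y-position placements.
import random
from itertools import combinations
from collections import Counter

m, n, N = 3, 2, 20_000
rng = random.Random(3)

emp = Counter()
for _ in range(N):
    z = ([(rng.gauss(0, 1), 'x') for _ in range(m)]
         + [(rng.gauss(0, 1), 'y') for _ in range(n)])
    z.sort()                                   # combined ordered sample
    emp[sum(i + 1 for i, (_, lab) in enumerate(z) if lab == 'y')] += 1

# Exact null: every choice of y-positions is equally likely (Theorem 2).
exact = Counter()
for pos in combinations(range(1, m + n + 1), n):
    exact[sum(pos)] += 1
total = sum(exact.values())

for r in exact:
    assert abs(emp[r] / N - exact[r] / total) < 0.02
```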

To test

$H_o: F(x) = G(x) \quad \text{versus} \quad H_a: F(x) = G(x - \theta),$

for some θ≠0, the power function is approximated by

$\begin{array}{l} P(R_l(\mathbf{U}_n) \le r_{1\alpha} | H_a) + P(R_l(\mathbf{U}_n) \ge r_{2\alpha} | H_a) \\ = \sum_{r=r_{ls}}^{r_{1\alpha}} P(R_l(\mathbf{U}_n) = r | H_a) + \sum_{r=r_{2\alpha}}^{r_{lb}} P(R_l(\mathbf{U}_n) = r | H_a) \\ = \sum_{r=r_{ls}}^{r_{1\alpha}} \sum_{\mathbf{u}_n : R_l(\mathbf{u}_n) = r} P(\mathbf{U}_n = \mathbf{u}_n | H_a) + \sum_{r=r_{2\alpha}}^{r_{lb}} \sum_{\mathbf{u}_n : R_l(\mathbf{u}_n) = r} P(\mathbf{U}_n = \mathbf{u}_n | H_a) \\ \approx \sum_{r=r_{ls}}^{r_{1\alpha}} \sum_{\mathbf{u}_n : R_l(\mathbf{u}_n) = r} \frac{1}{N} \sum_{i=1}^{N} P(\mathbf{U}_n | H_a; \mathbf{X}_i) + \sum_{r=r_{2\alpha}}^{r_{lb}} \sum_{\mathbf{u}_n : R_l(\mathbf{u}_n) = r} \frac{1}{N} \sum_{i=1}^{N} P(\mathbf{U}_n | H_a; \mathbf{X}_i) \\ = \frac{1}{N} \left( \sum_{r=r_{ls}}^{r_{1\alpha}} \sum_{i=1}^{N} \sum_{\mathbf{u}_n : R_l(\mathbf{u}_n) = r} P(\mathbf{U}_n | H_a; \mathbf{X}_i) + \sum_{r=r_{2\alpha}}^{r_{lb}} \sum_{i=1}^{N} \sum_{\mathbf{u}_n : R_l(\mathbf{u}_n) = r} P(\mathbf{U}_n | H_a; \mathbf{X}_i) \right) \\ = \frac{1}{N} \sum_{i=1}^{N} \left( \sum_{r=r_{ls}}^{r_{1\alpha}} P(R_l(\mathbf{U}_n) = r | H_a; \mathbf{X}_i) + \sum_{r=r_{2\alpha}}^{r_{lb}} P(R_l(\mathbf{U}_n) = r | H_a; \mathbf{X}_i) \right), \end{array}$

where

$P(R_l(\mathbf{U}_n) \le r_{1\alpha} | H_o) + P(R_l(\mathbf{U}_n) \ge r_{2\alpha} | H_o) \le \alpha.$

Note that the alternative hypothesis is subject to the purpose of the test; the procedure needs only slight modification if a one-sided test is adopted.
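Under H_o, the critical values r_{1α} and r_{2α} come from the exact discrete null law of Theorem 2, so the achieved size is at most α. A sketch that enumerates the null distribution and extracts the two-sided critical values (function name ours):

```python
# Two-sided critical values r_{1a}, r_{2a} for R_l from the exact null law:
# every choice of y-positions in the combined ranking is equally likely.
from itertools import combinations

def critical_values(m, n, alpha=0.05):
    counts = {}
    for pos in combinations(range(1, m + n + 1), n):
        r = sum(pos)
        counts[r] = counts.get(r, 0) + 1
    total = sum(counts.values())
    ranks = sorted(counts)
    # Largest r1 with P(R_l <= r1) <= alpha/2.
    acc, r1 = 0.0, None
    for r in ranks:
        if acc + counts[r] / total > alpha / 2:
            break
        acc += counts[r] / total
        r1 = r
    # Smallest r2 with P(R_l >= r2) <= alpha/2.
    acc, r2 = 0.0, None
    for r in reversed(ranks):
        if acc + counts[r] / total > alpha / 2:
            break
        acc += counts[r] / total
        r2 = r
    return r1, r2

r1, r2 = critical_values(m=7, n=7, alpha=0.05)
# By the symmetry of the null law about n(m+n+1)/2, r1 + r2 = r_ls + r_lb.
```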

### 2.2 Distributions of the rank statistic in the scale case

In the previous section, we studied the distribution and the power function of the rank statistic R l considering a shift in location. We now address the distribution and the power function of the rank statistic considering the scale parameter. For this purpose, we consider F(x)=G(x σ−1) and state the null and alternative hypotheses as

$H_o: \sigma = 1 \quad \text{versus} \quad H_a: \sigma \ne 1.$

To do so, we begin with the procedure for finding the distribution of the rank statistic considering the scale parameter, denoted R s , through the random vector U n . The array of ranks is given by

$(m+n)/2, \dots, 3, 2, 1, \qquad 1, 2, 3, \dots, (m+n)/2;$

if m+n is even, and

$(m+n-1)/2, \dots, 3, 2, 1, \qquad 0, \qquad 1, 2, 3, \dots, (m+n-1)/2$

if m+n is odd. We first introduce how to determine the rank-sum of the y observations in the combined samples, R s , with respect to

$\Omega_n = \left\{ \mathbf{u}_n = (u_1(n), \dots, u_{m+1}(n)) : \sum_{i=1}^{m+1} u_i(n) = n \right\},$

where u i (n) is the number of y observations belonging to [ x[i−1],x[i]). Let med(x,y) be the median of the combined x's and y's and suppose it belongs to [ x[i],x[i+1]); it then breaks U n into two parts $\mathbf{U}_n^-$ and $\mathbf{U}_n^+$. If m+n is odd and med(x,y)=x[i], then

$\mathbf{U}_n^- = \left( u_1^- = u_i(n),\; u_2^- = u_{i-1}(n),\; \cdots,\; u_i^- = u_1(n) \right)$

is a 1×i vector and

$\mathbf{U}_n^+ = \left( u_1^+ = u_{i+1}(n),\; u_2^+ = u_{i+2}(n),\; \cdots,\; u_{m+1-i}^+ = u_{m+1}(n) \right)$

is a 1×(m+1−i) vector. The second possible case is that m+n is odd and $\mathit{med}(x,y) = y_{\left[\sum_{k=1}^{i} u_k(n) + j\right]}$; then $\mathbf{U}_n^-$, a row vector of length i+1, has the form

$\left( u_1^- = j-1,\; u_2^- = u_i(n),\; \cdots,\; u_{i+1}^- = u_1(n) \right)$

and ${\mathbit{U}}_{n}^{+}$, a row vector with length m+1−i, is given by

$\left( u_1^+ = u_{i+1}(n) - j,\; u_2^+ = u_{i+2}(n),\; \cdots,\; u_{m+1-i}^+ = u_{m+1}(n) \right).$

The third possible case is that m+n is even and x[i] is the smallest number larger than med(x,y); the vectors are then defined as

$\mathbf{U}_n^- = \left( u_1^- = u_i(n),\; u_2^- = u_{i-1}(n),\; \cdots,\; u_i^- = u_1(n) \right)$

and

$\mathbf{U}_n^+ = \left( u_1^+ = 0,\; u_2^+ = u_{i+1}(n),\; \cdots,\; u_{m+2-i}^+ = u_{m+1}(n) \right).$

The last possibility is that m+n is even and $y_{\left[\sum_{k=1}^{i} u_k(n) + j\right]}$ is the smallest number larger than med(x,y). The vectors are now defined as

$\mathbf{U}_n^- = \left( u_1^- = j-1,\; u_2^- = u_i(n),\; \cdots,\; u_{i+1}^- = u_1(n) \right)$

and

$\mathbf{U}_n^+ = \left( u_1^+ = u_{i+1}(n) - j + 1,\; u_2^+ = u_{i+2}(n),\; \cdots,\; u_{m+1-i}^+ = u_{m+1}(n) \right).$

Let $n^-$ be the length of the vector $\mathbf{U}_n^-$ and $n^+$ be the length of the vector $\mathbf{U}_n^+$.
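Equivalently, R s can be computed by laying out the score array above over the combined ordered sample and summing the scores at the y positions, which avoids tracking the $\mathbf{U}_n^{\pm}$ split explicitly; a sketch with our own helper names:

```python
# Scale ranks: score from both ends toward the middle, then sum the
# scores of the y observations (R_s computed straight from the array).
def scale_ranks(total):
    """(..., 3, 2, 1, 1, 2, 3, ...) for even total,
    (..., 2, 1, 0, 1, 2, ...) for odd total."""
    half = total // 2
    if total % 2 == 0:
        return list(range(half, 0, -1)) + list(range(1, half + 1))
    return list(range(half, 0, -1)) + [0] + list(range(1, half + 1))

def R_s(x, y):
    z = sorted([(v, 'x') for v in x] + [(v, 'y') for v in y])
    scores = scale_ranks(len(z))
    return sum(s for s, (_, lab) in zip(scores, z) if lab == 'y')

# y's concentrated in the middle give a small R_s; y's in the tails a
# large one; the extremes agree with the values r_ss and r_sb below.
assert R_s([-9, -8, 8, 9], [-1, 1]) < R_s([-2, -1, 1, 2], [-9, 9])
```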

#### Theorem 4

R s (U n |X) is finite Markov chain imbeddable, and

$P(R_s(\mathbf{U}_n) = r | \mathbf{X}) = \boldsymbol{\xi} \left( \prod_{t=1}^{n} \mathbf{M}_t \right) \mathbf{B}'(C_r),$

where $\mathbf{B}(C_r) = \sum_{k: R_s(\mathbf{u}_n) = r} e_k$, e k is a $1 \times \binom{m+n}{n}$ unit row vector corresponding to state u n , ξ is the initial distribution, which puts probability one on the single state of Ω0, and M t , t=1,…,n, are the transition probability matrices of the imbedded Markov chain defined on the state spaces Ω t .

#### Proof

For each U n in the state space Ω n , we have a corresponding

$\begin{array}{lcl} R_s(\mathbf{U}_n | \mathbf{X}) & = & R_s(\mathbf{U}_n^- | \mathbf{X}) + R_s(\mathbf{U}_n^+ | \mathbf{X}) \\ & = & \frac{\sum_{k=1}^{n^-} (u_k^-)^2 + \sum_{k=1}^{n^-} u_k^-}{2} + \sum_{k=1}^{n^- - 1} (u_k^- + 1) \left( \sum_{j=k+1}^{n^-} u_j^- \right) \\ & & +\, \frac{\sum_{k=1}^{n^+} (u_k^+)^2 + \sum_{k=1}^{n^+} u_k^+}{2} + \sum_{k=1}^{n^+ - 1} (u_k^+ + 1) \left( \sum_{j=k+1}^{n^+} u_j^+ \right). \end{array}$
(11)

The smallest possible value of R s (U n ) is

$r_{ss} = \begin{cases} \frac{n(n+2)}{4} & \text{if } m+n \text{ is even and } n \text{ is even} \\ \frac{(n+1)(n+3)}{4} & \text{if } m+n \text{ is even and } n \text{ is odd} \\ \frac{n^2}{4} & \text{if } m+n \text{ is odd and } n \text{ is even} \\ \frac{(n+1)(n-1)}{4} & \text{if } m+n \text{ is odd and } n \text{ is odd} \end{cases}$
(12)

and the largest possible value is

$r_{sb} = \begin{cases} \frac{n(2m+n+2)}{4} & \text{if } m+n \text{ is even and } n \text{ is even} \\ \frac{n(2m+n+2)-1}{4} & \text{if } m+n \text{ is even and } n \text{ is odd} \\ \frac{n(2m+n-1)}{4} & \text{if } m+n \text{ is odd and } n \text{ is even} \\ \frac{n(2m+n)-1}{4} & \text{if } m+n \text{ is odd and } n \text{ is odd} \end{cases}$
(13)

In accordance with Equation (11), we use the possible values of R s as the rule of the partition. The rest of the proof follows along the same lines as that of Theorem 3 and is omitted here.

Similarly, we apply the LLN to conclude that

$\frac{1}{N}\phantom{\rule{2.77626pt}{0ex}}\sum _{i=1}^{N}\phantom{\rule{2.77626pt}{0ex}}P\left({R}_{s}|\phantom{\rule{2.77626pt}{0ex}}{\mathbf{\text{X}}}_{i}\phantom{\rule{2.77626pt}{0ex}}\right)\phantom{\rule{2.77626pt}{0ex}}\phantom{\rule{2.77626pt}{0ex}}\stackrel{\mathit{\text{p}}}{\to }\phantom{\rule{2.77626pt}{0ex}}\phantom{\rule{2.77626pt}{0ex}}P\left({R}_{s}\right)$

which establishes the distribution of R s .
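The boundary values in Equations (12) and (13) are easy to mis-transcribe, so a direct computational check helps. The sketch below simply mirrors the piecewise formulas; the function names are our own choices:

```python
def r_ss(m, n):
    """Smallest possible value of R_s, transcribed from Equation (12)."""
    if (m + n) % 2 == 0:
        return n * (n + 2) // 4 if n % 2 == 0 else (n + 1) * (n + 3) // 4
    return n * n // 4 if n % 2 == 0 else (n + 1) * (n - 1) // 4

def r_sb(m, n):
    """Largest possible value of R_s, transcribed from Equation (13)."""
    if (m + n) % 2 == 0:
        return n * (2 * m + n + 2) // 4 if n % 2 == 0 else (n * (2 * m + n + 2) - 1) // 4
    return n * (2 * m + n - 1) // 4 if n % 2 == 0 else (n * (2 * m + n) - 1) // 4
```

For example, with m = 5 and n = 7 the support of R s runs from r_ss(5, 7) = 20 to r_sb(5, 7) = 33.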

Through FMCI we again obtain the distribution of R s under selected alternative distributions; the procedures are similar to those in the previous section. In addition, it is natural to approximate the power function by

$\begin{array}{l}\frac{1}{N}\phantom{\rule{2.77626pt}{0ex}}\sum _{i=1}^{N}\left(\sum _{s={r}_{\mathit{\text{ss}}}}^{{s}_{1\alpha }}P\left({R}_{s}\left({\mathbf{U}}_{n}\right)=s|\phantom{\rule{2.77626pt}{0ex}}{\mathbf{\text{X}}}_{i}\right)+\sum _{s={s}_{2\alpha }}^{{r}_{\mathit{\text{sb}}}}P\left({R}_{s}\left({\mathbf{U}}_{n}\right)=s|\phantom{\rule{2.77626pt}{0ex}}{\mathbf{\text{X}}}_{i}\right)\right),\end{array}$

where

$\begin{array}{l}P\left({R}_{s}\left({\mathbf{U}}_{n}\right)\le {s}_{1\alpha }|{H}_{o}\right)+P\left({R}_{s}\left({\mathbf{U}}_{n}\right)\ge {s}_{2\alpha }|{H}_{o}\right)\le \mathrm{\alpha .}\end{array}$
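The FMCI transition matrices that give the exact conditional probabilities are specific to the construction above and are not reproduced here; as a rough illustration of the same power calculation, the following sketch estimates the two-sided power by plain Monte Carlo, using an Ansari-Bradley-type score sum as a stand-in for R s (the paper's R s may be scored differently) and an exact enumerated null distribution for the cutoffs s_{1α} and s_{2α}:

```python
import random
from itertools import combinations
from math import comb

def ab_scores(N):
    # Ansari-Bradley-type scores: 1, 2, ... up the pooled sample and back down
    return [min(r, N + 1 - r) for r in range(1, N + 1)]

def r_s_stat(xs, ys):
    """Score sum over the y-sample's pooled ranks (a stand-in for R_s)."""
    pooled = sorted([(v, 0) for v in xs] + [(v, 1) for v in ys])
    scores = ab_scores(len(pooled))
    return sum(scores[r] for r, (_, lab) in enumerate(pooled) if lab == 1)

def null_pmf(m, n):
    """Exact null pmf: each placement of the n y-ranks is equally likely under H_o."""
    N = m + n
    scores = ab_scores(N)
    counts = {}
    for pos in combinations(range(N), n):
        s = sum(scores[p] for p in pos)
        counts[s] = counts.get(s, 0) + 1
    total = comb(N, n)
    return {s: c / total for s, c in counts.items()}

def two_sided_cutoffs(pmf, alpha):
    """Largest s1 and smallest s2 whose tail masses are each at most alpha/2."""
    support = sorted(pmf)
    s1, acc = None, 0.0
    for s in support:
        if acc + pmf[s] > alpha / 2:
            break
        acc += pmf[s]
        s1 = s
    s2, acc = None, 0.0
    for s in reversed(support):
        if acc + pmf[s] > alpha / 2:
            break
        acc += pmf[s]
        s2 = s
    return s1, s2

def mc_power(m, n, theta, sigma, alpha=0.2, reps=2000, seed=1):
    """Monte Carlo power of the two-sided scale rank test against N(theta, sigma)."""
    rng = random.Random(seed)
    s1, s2 = two_sided_cutoffs(null_pmf(m, n), alpha)
    reject = 0
    for _ in range(reps):
        xs = [rng.gauss(0, 1) for _ in range(m)]
        ys = [rng.gauss(theta, sigma) for _ in range(n)]
        s = r_s_stat(xs, ys)
        if (s1 is not None and s <= s1) or (s2 is not None and s >= s2):
            reject += 1
    return reject / reps
```

The enumeration is feasible only for small m and n, which matches the small-sample focus of the paper.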

### 2.3 Joint distributions of the rank statistics in the shift and scale case

We have derived the marginal distributions of R l and R s in terms of U n , respectively, which yield the following theorem.

#### Theorem 5

(R l (U n |X),R s (U n |X)) is finite Markov chain imbeddable, and

$\begin{array}{l}P\left({R}_{l}\left({\mathbit{U}}_{n}\right)={r}_{1};{R}_{s}\left({\mathbit{U}}_{n}\right)={r}_{2}|\mathbit{X}\right)=\mathbit{\xi }\left(\prod _{t=1}^{n}{\mathbit{M}}_{t}\right){\mathbit{B}}^{\prime }\left({C}_{r}\right)\end{array}$

where $\mathbf{B}(C_r)=\sum_{k:\,R_l(\mathbf{U}_n)=r_1\ \&\ R_s(\mathbf{U}_n)=r_2} e_k$, $e_k$ is a $1\times\binom{m+n}{n}$ unit row vector corresponding to state $u_n$, $\xi$ $(=P(Z_0=1)=1)$ is the initial probability, and $M_t$, $t=1,\dots,n$, are the transition probability matrices of the imbedded Markov chain defined on the state space $\Omega_t$.

#### Proof

By Equations (4) and (11), each u n in the state space Ω n has corresponding values of R l and R s . The combinations of the values of R l and R s serve as the rule of the partition. The rest of the proof follows along the same lines as that of Theorem 3.

The joint distribution of the ranks considering both the location and scale parameters, which can be determined through our algorithm, has not yet been studied in the literature. Our result allows us to test the homogeneity of the distribution functions $F(x)=G((x-\theta)\sigma^{-1})$. We state the hypotheses as follows:

$H_o:\theta=0\ \text{and}\ \sigma=1\quad \text{versus}\quad H_a:\theta\ne 0\ \text{or}\ \sigma\ne 1.$
(14)

We are also able to identify a proper critical region under the null hypothesis and discuss its power when F ≠ G. For example, a rectangular critical region can be

$C_{\alpha}=\left\{R_l\le r_{1l}\ \text{or}\ R_l\ge r_{2l}\ \text{or}\ R_s\le r_{1s}\ \text{or}\ R_s\ge r_{2s}\right\}$

where r1l, r2l, r1s and r2s are the critical values such that

$\begin{array}{lcr}P\left({R}_{l}\le {r}_{1l}|{H}_{o}\right)& +& P\left({R}_{l}\ge {r}_{2l}|{H}_{o}\right)+P\left({r}_{1l}<{R}_{l}<{r}_{2l},{R}_{s}\le {r}_{1s}|{H}_{o}\right)\\ +& P\left({r}_{1l}<{R}_{l}<{r}_{2l},{R}_{s}\ge {r}_{2s}|{H}_{o}\right)\le \alpha \hfill \end{array}$

or an elliptic critical region

$\begin{array}{l}{C}_{\alpha }^{\prime }=\left\{\frac{{R}_{l}^{2}}{a}+\frac{{R}_{s}^{2}}{b}>C\right\}\end{array}$

for some positive constants a and b such that

$\begin{array}{l}P\left(\frac{{R}_{l}^{2}}{a}+\frac{{R}_{s}^{2}}{b}>C|{H}_{o}\right)\le \mathrm{\alpha .}\end{array}$

According to the rejection regions defined above, the power of the test can be found as

$\begin{array}{lcr}P\left({R}_{l}\le {r}_{1l}|{H}_{a}\right)& +& P\left({R}_{l}\ge {r}_{2l}|{H}_{a}\right)+P\left({r}_{1l}<{R}_{l}<{r}_{2l},{R}_{s}\le {r}_{1s}|{H}_{a}\right)\\ +& P\left({r}_{1l}<{R}_{l}<{r}_{2l},{R}_{s}\ge {r}_{2s}|{H}_{a}\right)\hfill \end{array}$
(15)

or

$\begin{array}{l}P\left(\frac{{R}_{l}^{2}}{a}+\frac{{R}_{s}^{2}}{b}>C|{H}_{a}\right).\end{array}$
(16)

Note that unless we have a conjecture about the values of θ and σ, we tend to use a two-sided test. However, with knowledge of the center and shape of the distribution of interest, a sectorial critical region is a better choice; an example is demonstrated in the numerical studies.
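Given any tabulated joint pmf of (R l , R s ), both kinds of critical regions reduce to summing probability mass over an indicator. The following is a minimal sketch; the centering constants rl0 and rs0 in the elliptic region are an illustrative generalization, not part of the definition above (the defaults of 0 reproduce it):

```python
def region_prob(joint_pmf, in_region):
    """Probability mass a joint pmf {(rl, rs): p} assigns to a critical region."""
    return sum(p for (rl, rs), p in joint_pmf.items() if in_region(rl, rs))

def rectangular(r1l, r2l, r1s, r2s):
    """Reject when R_l falls in either tail, or R_s falls in either tail."""
    return lambda rl, rs: rl <= r1l or rl >= r2l or rs <= r1s or rs >= r2s

def elliptic(a, b, C, rl0=0.0, rs0=0.0):
    """Reject outside an ellipse; rl0, rs0 are optional illustrative centerings."""
    return lambda rl, rs: (rl - rl0) ** 2 / a + (rs - rs0) ** 2 / b > C
```

Calibrating to level α then amounts to adjusting the cutoffs (or the constant C) until `region_prob`, evaluated at the null pmf, is at most α; evaluating the same region at an alternative pmf gives the power in (15) or (16).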

## 3 Numerical results and discussion

### 3.1 A joint distribution of R l and R s

Let $\{X_1,\dots,X_5\}\sim N(0,1)$ and $\{Y_1,\dots,Y_7\}\sim N(\theta,\sigma)$. Figure 1 gives the joint distribution of the random variables R l and R s under the null hypothesis of θ=0 and σ=1. The marginal distributions of R l and R s can easily be established from their joint distribution. Figure 1 also shows that the two random variables R l and R s are dependent. We construct two critical regions, shown in Figure 2, according to their joint distribution. Outside the yellow area in Figure 2 is the selected rectangular critical region $C_{0.1738}$, and outside the red shadow is the elliptic one $C^{\prime}_{0.1738}$.
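For sample sizes this small, the joint null distribution can be tabulated by brute force over all $\binom{m+n}{n}$ placements of the Y ranks. The sketch below uses the Wilcoxon rank sum and an Ansari-Bradley-type score sum as assumed stand-ins for R l and R s ; it recovers the marginals from the joint table and confirms the dependence noted above:

```python
from itertools import combinations
from fractions import Fraction
from math import comb

def joint_null_pmf(m, n):
    """Exact joint null pmf of (rank sum, Ansari-Bradley-type score sum) of the Ys."""
    N = m + n
    scores = [min(r, N + 1 - r) for r in range(1, N + 1)]
    w = Fraction(1, comb(N, n))
    pmf = {}
    for pos in combinations(range(1, N + 1), n):   # the n pooled ranks of the Ys
        key = (sum(pos), sum(scores[p - 1] for p in pos))
        pmf[key] = pmf.get(key, Fraction(0)) + w
    return pmf

pmf = joint_null_pmf(5, 7)

# Marginals drop straight out of the joint table.
marg_l, marg_s = {}, {}
for (rl, rs), p in pmf.items():
    marg_l[rl] = marg_l.get(rl, Fraction(0)) + p
    marg_s[rs] = marg_s.get(rs, Fraction(0)) + p

# Dependence: the joint mass differs from the product of marginals somewhere.
dependent = any(pmf.get((rl, rs), Fraction(0)) != marg_l[rl] * marg_s[rs]
                for rl in marg_l for rs in marg_s)
```

Exact `Fraction` arithmetic keeps the tabulated probabilities free of rounding error.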

### 3.2 Powers for a joint test using R l and R s

The alternative of interest is stated in the preceding section (see Equation (14)). The power functions of the test statistics $R_l$ and $R_s$ for a sequence of normally distributed populations, with θ from −20 to 20 in increments of 0.5 and σ from 1 to 10 in increments of 1 together with the reciprocals of σ, under two types of critical regions are provided in Figures 3 and 4. We adopt a two-sided test because of the selected values of the parameters; if a one-sided test is adopted instead, the critical region in the previous step should be modified slightly before calculating the powers. As Figures 3 and 4 show, both critical regions perform roughly equally well. Figure 5 presents the performance of the two critical regions under various parameter settings. Figures 5(a) and (b) show that, given a standard deviation of 1 or a mean of 0, the powers of the two critical regions, rectangular and elliptic, are high and similar. However, when the variation of the alternative population decreases (σ=1/10) or increases (σ=10), the elliptic critical region performs better than the rectangular one, as shown in Figures 5(c) and (d). Therefore, we suggest using an elliptic rejection region when conducting a test for the equivalence of two distributions.

Next, we consider the problem of determining an optimum rank test. To conduct a test of distribution equivalence, we can use either $R_l$ or $R_s$ as the test statistic. As mentioned earlier, the marginal distribution of $R_l$ or $R_s$ can easily be established from their joint distribution. Figures 6 and 7 provide the power functions for the test statistics $R_l$ and $R_s$ at the 17.38% level of significance, respectively. Figure 7 shows that the rank test for the scale parameter is badly affected by the center of the alternative population, an effect observed earlier by Ansari and Bradley (1960). Comparing Figures 6 and 7 with Figure 4, the joint test appears much more reliable than either $R_l$ or $R_s$ alone for distribution equivalence tests. A joint test for distribution equivalence is likely the better option under most circumstances.

### 3.3 Lehmann alternatives

Consider the one-sided alternative F(x;θ,σ) > G(x;θ,σ). Lehmann (1953) proposed a test of $H_o: F(x;\theta,\sigma) = G(x;\theta,\sigma)$ against $H_a: F(x;\theta,\sigma)^k = G(x;\theta,\sigma)$, which is known as the family of Lehmann alternatives. Note that $F(x;\theta,\sigma)^k$ is the cumulative distribution function of $\max_{1\le i\le k} X_i$ when $X_i \sim F$ and, under the alternative hypothesis, G(x;θ,σ) is stochastically larger than F(x;θ,σ). First, we know

$\begin{array}{rcl}E_k(X)&=&\displaystyle\int_{-\infty}^{0} -G(x)\,dx+\int_{0}^{\infty}\left(1-G(x)\right)dx\\[4pt]&>&\displaystyle\int_{-\infty}^{0} -F(x)\,dx+\int_{0}^{\infty}\left(1-F(x)\right)dx\;=\;E(X).\end{array}$
(17)

Therefore, the larger $R_l$ is, the stronger the evidence against the null hypothesis. As for the variation of the distribution itself, the mass of the density function is compressed toward larger values; therefore, in most cases we have $\mathit{Var}(X_k) < \mathit{Var}(X)$. We then propose to reject the null hypothesis when $R_s$ is large. For example, given $F \sim U(0,1)$ and $G = F^k$, it is easy to see

$\begin{array}{l}\frac{{E}_{k+1}\left(X\right)}{{E}_{k}\left(X\right)}=\frac{{\left(k+1\right)}^{2}}{k\left(k+2\right)}>1\end{array}$
(18)

and

$\begin{array}{l}\frac{{\mathit{\text{Var}}}_{k+1}\left(X\right)}{{\mathit{\text{Var}}}_{k}\left(X\right)}=\frac{{\left(k+1\right)}^{3}}{k\left(k+2\right)\left(k+3\right)}<1\end{array}$
(19)

for all k. We first find the marginal and joint distributions of the ranks $R_l$ and $R_s$ in order to define critical regions for $R_l$ and $R_s$ individually and simultaneously. Given the properties of the mean and variance of the alternative distribution shown in Equations (17), (18) and (19), we define the critical regions with care. Table 1 provides powers for the tests when the hypothesized distribution is the uniform, standard normal, Student's t with 3 degrees of freedom, or exponential distribution, for several settings of the sample sizes m and n and for k = 2, 3 and 6. Clearly, a joint test considering both $R_l$ and $R_s$ for the equality of distributions outperforms tests considering only one of the rank statistics.
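For $F \sim U(0,1)$, the Lehmann alternative $G = F^k$ is the Beta(k,1) distribution, whose mean $k/(k+1)$ and variance $k/((k+1)^2(k+2))$ are standard facts; the ratios in Equations (18) and (19) can therefore be verified exactly:

```python
from fractions import Fraction

def mean_k(k):
    """Mean of max of k iid U(0,1) draws, i.e. under G = F^k: k/(k+1)."""
    return Fraction(k, k + 1)

def var_k(k):
    """Variance of max of k iid U(0,1) draws: k / ((k+1)^2 (k+2))."""
    return Fraction(k, (k + 1) ** 2 * (k + 2))

for k in range(1, 10):
    # Equation (18): the mean ratio, always greater than 1
    assert mean_k(k + 1) / mean_k(k) == Fraction((k + 1) ** 2, k * (k + 2)) > 1
    # Equation (19): the variance ratio, always less than 1
    assert var_k(k + 1) / var_k(k) == Fraction((k + 1) ** 3, k * (k + 2) * (k + 3)) < 1
```

Exact rational arithmetic makes the check term-by-term rather than approximate.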

## 4 Conclusion

Our proposed algorithm provides a solution for finding the power of distribution equivalence tests considering the shift and scale parameters, respectively and simultaneously. Numerical studies show that a joint test should be adopted for testing the homogeneity of distributions as well as under Lehmann alternatives. Also, an elliptic critical region is a better choice than a rectangular one for a joint test. In practice, it is reasonable to assume neither normality nor equal mean/variance of the distributions of interest. However, our algorithm depends heavily on computational resources, as the number of possible states in Ω n grows rapidly when the sample sizes increase. Therefore, so far we can only target small sample sizes in our work.

## References

• Ansari AR, Bradley RA: Rank-Sum Tests for Dispersions. Ann. Math. Stat 1960, 31: 1174–1189. 10.1214/aoms/1177705688

• Collings BJ, Hamilton MA: Estimating the power of the two-sample Wilcoxon Test for location shift. Biometrics 1988, 44: 847–860. 10.2307/2531596

• Klotz J: Nonparametric test for scale. Ann. Math. Stat 1962, 33: 498–512. 10.1214/aoms/1177704576

• Lehmann EL: The power for rank tests. Ann. Math. Stat 1953, 24: 23–43. 10.1214/aoms/1177729080

• Lehmann EL: Nonparametrics: Statistical Methods Based on Ranks. Prentice-Hall, New Jersey; 1998.

• Mann HB, Whitney DR: On a test of whether one of two random variables is stochastically larger than the other. Ann. Math. Stat 1947, 18: 50–60. 10.1214/aoms/1177730491

• Mood AM: On the asymptotic efficiency of certain nonparametric two-sample tests. Ann. Math. Stat 1954, 25: 514–522. 10.1214/aoms/1177728719

• Rosner B, Glynn RJ: Power and sample size estimation for the Wilcoxon rank sum test with application to comparisons of C statistics from alternative prediction models. Biometrics 2009, 65: 188–197. 10.1111/j.1541-0420.2008.01062.x

• Shieh G, Jan SL, Randles RH: On power and sample size determinations for the Wilcoxon-Mann-Whitney test. Nonparametric Stat 2006, 18: 33–43. 10.1080/10485250500473099

• Siegel S, Tukey JW: A nonparametric sum of ranks procedure for relative spread in unpaired samples. J. Am. Stat. Assoc 1960, 55: 429–445. 10.1080/01621459.1960.10482073

• Wilcoxon F: Individual comparisons by ranking methods. Biometrics 1945, 1: 80–83. 10.2307/3001968

## Acknowledgments

The author would like to thank James C. Fu and anonymous referee whose comments led to significant improvements of this manuscript.

## Author information


### Corresponding author

Correspondence to Wan-Chen Lee.

### Competing interests

The author declares that she has no competing interests.


Lee, WC. Joint distribution of rank statistics considering the location and scale parameters and its power study. J Stat Distrib App 1, 6 (2014). https://doi.org/10.1186/2195-5832-1-6