# Joint distribution of rank statistics considering the location and scale parameters and its power study

## Abstract

The ranking method used for testing the equivalence of two distributions has been studied for decades and is widely adopted for its simplicity. However, due to the complexity of the calculations, the power of the test is either estimated by a normal approximation or found only when an appropriate alternative is given. Here, via the finite Markov chain imbedding technique, we establish the marginal and joint distributions of the rank statistics considering the shift and scale parameters, respectively and simultaneously, under two different continuous distribution functions. Furthermore, the procedures of the distribution equivalence tests and their power functions are discussed. Numerical results for a joint distribution of rank statistics under the standard normal distribution are presented, along with the powers for a sequence of alternative normal distributions with means from −20 to 20 and standard deviations from 1 to 9 and their reciprocals. In addition, we discuss the powers of the rank statistics under the Lehmann alternatives.

### 2010 Mathematics Subject Classification

Primary 62G07; Secondary 62G10

## 1 Introduction

Suppose that, on the basis of observations X1,…,X m ; Y1,…,Y n from the cumulative distribution functions F and G, respectively, two major topics in hypothesis testing are to test the equivalence of either the center or the dispersion of the two populations of interest. The hypotheses are stated, for some θ ≠ 0, as

$H_o: F(x) = G(x) \quad \text{versus} \quad H_a: F(x) = G(x - \theta), \quad \text{for all } x,$

which is known as the shift alternative and, for some σ≠1,

$H_o: F(x) = G(x) \quad \text{versus} \quad H_a: F(x) = G(x\sigma^{-1}), \quad \text{for all } x.$

Wilcoxon (1945) proposed the ranking method for testing the significance of the difference between two population means, now known as the Wilcoxon rank-sum test, and defined a statistic W Y as the sum of the ranks of the y's in the combined and ordered sequence of x's and y's, which is equivalent to

$\sum_{j=1}^{n} \#\left\{\, x_i : x_i < y_j \,\right\} + \frac{n(n+1)}{2}.$
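As a quick sanity check of this equivalence, the rank-sum can be computed both from the ranks and from the counting form above (a sketch; the helper names are ours, and ties are assumed absent since the distributions are continuous):

```python
# Check that the rank-sum of the y's equals the counting form
# sum_j #{x_i < y_j} + n(n+1)/2.
def wilcoxon_rank_sum(x, y):
    """Sum of the ranks of the y's in the combined ordered sample."""
    combined = sorted(x + y)
    return sum(combined.index(v) + 1 for v in y)   # no ties assumed

def counting_form(x, y):
    """sum_j #{x_i < y_j} + n(n+1)/2."""
    n = len(y)
    return sum(1 for xi in x for yj in y if xi < yj) + n * (n + 1) // 2

x = [0.3, 1.7, 2.2]
y = [0.9, 2.5]
assert wilcoxon_rank_sum(x, y) == counting_form(x, y)
```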

Mann and Whitney (1947) introduced an elaboration of the ranking test, proposed the statistic $U_X = mn - W_Y + \frac{n(n+1)}{2}$, and proved that the limiting distribution of the test statistic U X is

$\frac{U_X - E(U_X)}{\sqrt{\text{Var}(U_X)}} \stackrel{L}{\to} N(0,1)$

as m and n go to infinity in any arbitrary manner where

$E(U_X) = mnp_1$

and

$\text{Var}(U_X) = mnp_1(1 - p_1) + mn(n-1)\left(p_2 - p_1^2\right) + mn(m-1)\left(p_3 - p_1^2\right),$

with

$\begin{array}{lcl} p_1 & = & P(X > Y), \\ p_2 & = & P(X > Y \ \text{and}\ X > Y'), \\ p_3 & = & P(X > Y \ \text{and}\ X' > Y), \end{array}$
(1)

where X, X′ and Y, Y′ are independently distributed, X and X′ with distribution F, and Y and Y′ with distribution G. Intuitively, the power for the right-sided test can be found as

$P\left( \frac{U_X - E(U_X)}{\sqrt{\text{Var}(U_X)}} > \frac{c - E(U_X)}{\sqrt{\text{Var}(U_X)}} \,\middle|\, H_a \right),$
(2)

where c is the value such that

$\Phi\left( \frac{c - \frac{1}{2}mn}{\sqrt{\frac{1}{12}mn(m+n+1)}} \,\middle|\, H_o \right) \ge 1 - \alpha.$
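The normal-approximation power calculation in Equations (1) and (2) can be sketched numerically. The following is an illustrative sketch, not the paper's method: it takes F = N(θ, 1) against G = N(0, 1), estimates p1, p2, p3 by Monte Carlo, and plugs them into the approximation; all function and variable names are ours.

```python
# Normal-approximation power of the right-sided test, Equation (2),
# with p1, p2, p3 of Equation (1) estimated by Monte Carlo.
import random
from math import sqrt
from statistics import NormalDist

def approx_power(m, n, theta, alpha=0.05, N=50_000, seed=1):
    rng = random.Random(seed)
    h1 = h2 = h3 = 0
    for _ in range(N):
        x, x2 = rng.gauss(theta, 1), rng.gauss(theta, 1)
        y, y2 = rng.gauss(0, 1), rng.gauss(0, 1)
        h1 += x > y                       # p1 = P(X > Y)
        h2 += (x > y) and (x > y2)        # p2 = P(X > Y and X > Y')
        h3 += (x > y) and (x2 > y)        # p3 = P(X > Y and X' > Y)
    p1, p2, p3 = h1 / N, h2 / N, h3 / N
    mean_a = m * n * p1                   # E(U_X) under H_a
    var_a = (m * n * p1 * (1 - p1)
             + m * n * (n - 1) * (p2 - p1 ** 2)
             + m * n * (m - 1) * (p3 - p1 ** 2))
    # Critical value c from the null approximation N(mn/2, mn(m+n+1)/12).
    null = NormalDist(m * n / 2, sqrt(m * n * (m + n + 1) / 12))
    c = null.inv_cdf(1 - alpha)
    return 1 - NormalDist(mean_a, sqrt(var_a)).cdf(c)

power_null = approx_power(10, 10, theta=0.0)   # should be close to alpha
power_shift = approx_power(10, 10, theta=1.0)  # appreciably larger
```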

Over the years, there have been studies on finding the exact or approximate power for the rank-sum test. By choosing an appropriate alternative distribution function, Shieh et al. (2006) derived the exact power for the uniform, normal, double exponential and exponential shift models. Rosner and Glynn (2009) discussed power against the family of alternatives of the form

$\Phi^{-1}\left(F_Y(y)\right) = \Phi^{-1}\left(F_X(y)\right) + \mu \quad \text{for some } \mu \ne 0,$

where the underlying distributions F X and F Y are normal. Collings and Hamilton (1988) presented a bootstrap method to find the empirical distribution functions in order to approximate the power against the shift alternative. Lehmann (1953) derived the power function as

$P\left(S_1 = s_1, S_2 = s_2, \cdots, S_n = s_n\right) = \frac{k^n}{\binom{m+n}{m}} \prod_{j=1}^{n} \frac{\Gamma(s_j + jk - j)}{\Gamma(s_j)} \frac{\Gamma(s_{j+1})}{\Gamma(s_{j+1} + jk - j)},$

where s j is the rank of y j in the combined samples for the alternative hypothesis of

$G_Y(x) = F_X(x)^k \quad \text{for all } x,$

where k is a positive integer. However, Lehmann (1998) pointed out that the power function of the rank-sum test, Equation (2), is only qualitative, since the numerical evaluation of the probabilities in Equation (1) is considerably complicated when F and G are continuous distributions with F ≠ G.
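For integer k, the Lehmann alternative G_Y = F_X^k is the distribution of the maximum of k i.i.d. draws from F_X, so Y can be simulated by inverse transform: if U ~ Uniform(0,1), then F_X^{-1}(U^{1/k}) has CDF F_X^k. A small simulation sketch (helper names ours) shows the rank-sum of the y's drifting above its null expectation:

```python
# Sample from the Lehmann alternative G(x) = F(x)^k with F = N(0,1),
# and watch the average rank-sum of the y's exceed its null value.
import random
from statistics import NormalDist

def lehmann_sample(k, size, rng):
    # If U ~ Uniform(0,1), then F^{-1}(U ** (1/k)) has CDF F(x)^k.
    return [NormalDist().inv_cdf(rng.random() ** (1.0 / k))
            for _ in range(size)]

def rank_sum_y(x, y):
    combined = sorted(x + y)
    return sum(combined.index(v) + 1 for v in y)

rng = random.Random(7)
m, n, k = 8, 8, 3
w_vals = [rank_sum_y([rng.gauss(0, 1) for _ in range(m)],
                     lehmann_sample(k, n, rng))
          for _ in range(2000)]
w_bar = sum(w_vals) / len(w_vals)
# Under H_o (k = 1), E(W_Y) = n(m+n+1)/2 = 68 here; under k = 3 the y's
# are stochastically larger, so the average rank-sum is pulled above 68.
```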

As the rank-sum test is widely adopted for testing the difference in centers of two distributions, it is natural to study the efficiency of a rank-sum test for variability (Ansari and Bradley 1960). For decades, studies have focused on proposing new definitions of the rank statistic and using the methods of Chernoff and Savage to show the relative efficiency of the proposed statistic to the F-test; see, for example, Mood (1954), Siegel and Tukey (1960), Ansari and Bradley (1960), and Klotz (1962). Ansari and Bradley (1960) mentioned that if the means of the X and Y samples cannot be considered equal, differences in location have a severe impact on all tests of dispersion. Klotz (1962) showed that the power of a rank test can be found by integrating the joint density of the X and Y samples over the part of the (m+n)-dimensional space defined by the alternative orderings that lie in the critical region of the test, a computation whose conditions are very strict.

Our approach aims at relaxing some of the conditions required to find the distribution of the proposed rank statistic. We systematically imbed the random vector U n into a Markov chain to derive the marginal and joint distributions of the rank statistics considering the shift and scale parameters, respectively, under any two continuous distribution functions. A joint distribution of rank statistics has, to the best of our knowledge, not been studied in the literature. The main strength of the finite Markov chain imbedding approach (FMCI) is that it yields the distribution of the rank statistic without imposing conditions. Therefore, under the null hypothesis of F=G, we are able to identify a proper critical region and, under the alternative assumption, the power of the test can be determined naturally. The distribution of the random vector U n , independent of the form of the distribution function F, is also demonstrated under the null hypothesis of distribution equivalence.

The main contributions of this paper are as follows. In Section 2.1, we introduce the procedures for deriving the distribution of the rank statistic considering the shift parameter, and its power function, by using FMCI. The procedures are general and can be applied either to two identical distribution functions of interest or to two different continuous density functions. In Section 2.2, we address the steps for finding the distribution of the rank statistic considering the scale parameter and its power function. In Section 2.3, we derive the joint distribution of the rank statistics considering the location and scale parameters simultaneously, as well as its power function. Numerical results for a joint distribution, and some powers of the rank statistics against the shift and scale parameters, individually and simultaneously, are presented in Section 3. We also discuss the powers of the rank statistics under the Lehmann alternatives. We end this paper with a short conclusion in Section 4.

## 2 Methods

### 2.1 Distributions of the rank statistic in the shift case

Let {X1,…,X m } and {Y1,…,Y n } be two independent samples from the continuous cumulative distribution functions F(x) and G(x−θ), respectively. Given x={x1,…,x m }, where x[i] denotes the ith smallest value in the sample, we have

$p_i = P\left( x_{[i-1]} \le Y < x_{[i]} \right) = G(x_{[i]}) - G(x_{[i-1]}),$

for i=1,2,…,m+1, where $x_{[0]} = -\infty$ and $x_{[m+1]} = \infty$. Therefore, we define the sampling distribution of Y over the (m+1) intervals as

$\mathbf{p} = \left( G(x_{[1]}) - G(x_{[0]}), \dots, G(x_{[m+1]}) - G(x_{[m]}) \right) = (p_1, p_2, \dots, p_{m+1}).$
(3)
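Computationally, the vector p in Equation (3) is just the G-mass of the m+1 intervals cut by the ordered x sample. A minimal sketch, with G taken as the standard normal CDF purely for illustration:

```python
# Compute p = (p_1, ..., p_{m+1}) of Equation (3) for a given x sample
# and CDF G; the cells are (-inf, x_[1]), [x_[1], x_[2]), ..., [x_[m], inf).
from math import isclose
from statistics import NormalDist

def interval_probs(x, G):
    xs = sorted(x)                               # x_[1] <= ... <= x_[m]
    Gv = [0.0] + [G(v) for v in xs] + [1.0]      # G(-inf) = 0, G(inf) = 1
    return [Gv[i] - Gv[i - 1] for i in range(1, len(Gv))]

p = interval_probs([-0.5, 0.0, 1.2], NormalDist().cdf)
assert len(p) == 4 and isclose(sum(p), 1.0)      # m + 1 cells, total mass 1
```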

Given m, for t=1,2,…,n, let

$\Omega_t = \left\{ \mathbf{u}_t = (u_1(t), \cdots, u_{m+1}(t)) : \sum_{i=1}^{m+1} u_i(t) = t \ \text{and}\ u_i(t) \ge 0,\ i=1,\dots,m+1 \right\},$

where u i (t) is the number of y's in the interval [ x[i−1],x[i]) among y1,…,y t . For each u n =(u1(n),…,um+1(n)), we have a corresponding rank-sum of the y's in the combined sample:

$R_l(\mathbf{U}_n = \mathbf{u}_n | \mathbf{X}) = \frac{\sum_{i=1}^{m+1} u_i^2(n) + \sum_{i=1}^{m+1} u_i(n)}{2} + \sum_{i=1}^{m} \left( u_i(n) + 1 \right) \left( \sum_{j=i+1}^{m+1} u_j(n) \right).$
(4)
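Equation (4) can be checked directly against the rank-sum computed from raw data; the sketch below (helper names ours) builds the cell counts u_n and compares the two:

```python
# Verify Equation (4): R_l computed from the cell counts u_n equals the
# Wilcoxon rank-sum of the y's computed from the combined sample.
def R_l(u):
    cells = len(u)                            # m + 1 cells
    quad = (sum(ui * ui for ui in u) + sum(u)) / 2
    cross = sum((u[i] + 1) * sum(u[i + 1:]) for i in range(cells - 1))
    return quad + cross

def counts(x, y):
    """u_i(n) = number of y's in [x_[i-1], x_[i]), i = 1, ..., m+1."""
    xs = sorted(x)
    u = [0] * (len(xs) + 1)
    for v in y:
        u[sum(1 for w in xs if w <= v)] += 1
    return u

def rank_sum_y(x, y):
    combined = sorted(x + y)
    return sum(combined.index(v) + 1 for v in y)

x, y = [0.3, 1.7, 2.2], [0.9, 2.5, 2.6]
assert R_l(counts(x, y)) == rank_sum_y(x, y)
```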

#### Theorem 1

The statistic R l is equivalent to the statistic W Y introduced by Wilcoxon (1945).

#### Proof

Let

$I(x_i, y_j) = \begin{cases} 1 & \text{if } x_i < y_j \\ 0 & \text{otherwise}. \end{cases}$

The rank statistic W Y , the sum of the ranks of the y observations, can be determined by

$\begin{array}{lcl} \sum_{j=1}^{n} \left( \sum_{i=1}^{m} I(x_i, y_j) + j \right) & = & \sum_{j=1}^{n} \sum_{i=1}^{m} I(x_i, y_j) + \sum_{j=1}^{n} j \\ & = & \sum_{i=1}^{m} \sum_{j=1}^{n} I(x_i, y_j) + \frac{n(n+1)}{2}. \end{array}$
(5)

The inner sum over j in the first term of Equation (5), evaluated at the ordered value x[i], counts the y observations larger than x[i], which is $\sum_{j=i+1}^{m+1} u_j(n)$ in our notation. It is not difficult to see that $\sum_{i=1}^{m+1} u_i(n)$ equals n, the size of the y sample. Therefore, the equation can be rewritten as

$\sum_{i=1}^{m} \left( \sum_{j=i+1}^{m+1} u_j(n) \right) + \frac{\sum_{i=1}^{m+1} u_i^2(n) + 2 \sum_{i=1}^{m} u_i(n) \left( \sum_{j=i+1}^{m+1} u_j(n) \right) + \sum_{i=1}^{m+1} u_i(n)}{2}.$

It is then easy to see that

$\sum_{i=1}^{m} \left( u_i(n) + 1 \right) \left( \sum_{j=i+1}^{m+1} u_j(n) \right) + \frac{\sum_{i=1}^{m+1} u_i^2(n) + \sum_{i=1}^{m+1} u_i(n)}{2} = R_l.$

Next, we demonstrate that for two random samples from the same population, the distribution of the random vector U n is independent of the form of the distribution function.

#### Theorem 2

Distribution-free property of U n .

$P(\mathbf{U}_n = \mathbf{u}_n | H_o) = \frac{1}{\text{Card}(\Omega_n)} = \frac{1}{\binom{m+n}{n}}.$
(6)

#### Proof

We know that the joint density of the ordered x sample is given by

$f(x_{[1]}, \dots, x_{[m]}) = m! \prod_{i=1}^{m} f(x_i)$

and, when F=G, the conditional probability of the random vector U n given X=(x1,x2,…,x m ) is

$P\left(\mathbf{U}_n = \mathbf{u}_n | x_1, x_2, \dots, x_m\right) = \frac{n!}{\prod_{i=1}^{m+1} u_i(n)!} \prod_{i=1}^{m+1} \left( \int_{x_{[i-1]}}^{x_{[i]}} f(y)\, dy \right)^{u_i(n)},$
(7)

where $x_{[0]} = -\infty$ and $x_{[m+1]} = \infty$. By taking the expected value of the conditional probability, we have

$\begin{array}{l} P\left(\mathbf{U}_n = \mathbf{u}_n | H_o\right) \\ = {\displaystyle \int \cdots \int_{-\infty \le x_{[1]} \le \cdots \le x_{[m]} \le \infty}} P\left(\mathbf{u}_n | x_1, \dots, x_m\right) f\left(x_{[1]}, \dots, x_{[m]}\right) dx_{[1]} \cdots dx_{[m]} \\ = {\displaystyle \int_{-\infty}^{\infty} \int_{x_{[1]}}^{\infty} \cdots \int_{x_{[m-1]}}^{\infty}} \frac{n!}{\prod_{i=1}^{m+1} u_i(n)!} \left(F(x_{[1]})\right)^{u_1(n)} \left(F(x_{[2]}) - F(x_{[1]})\right)^{u_2(n)} \cdots \left(1 - F(x_{[m]})\right)^{u_{m+1}(n)} m!\, dF(x_{[1]}) \cdots dF(x_{[m]}). \end{array}$
(8)

Using a change of variables, it is clear that the random variables F(x[1]),…,F(x[m]) have a Dirichlet distribution with parameters u1(n)+1, u2(n)+1, …, um+1(n)+1. Therefore, we have

$P(\mathbf{U}_n = \mathbf{u}_n | H_o) = \frac{n!\, m!}{(n+m)!} = \frac{1}{\text{Card}(\Omega_n)},$

which is independent of the distribution function.
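As a numerical check of Equation (7): given the x sample, U_n is multinomial with cell probabilities p_i, so the conditional probabilities over all of Ω_n must sum to one. A sketch (G = N(0,1) purely as an illustration; helper names ours):

```python
# Conditional probability of U_n = u_n given the x sample, Equation (7):
# a multinomial with cell probabilities p_i = G(x_[i]) - G(x_[i-1]).
from math import factorial, prod, isclose
from statistics import NormalDist

def cond_prob(u, x, G):
    xs = sorted(x)
    Gv = [0.0] + [G(v) for v in xs] + [1.0]
    p = [Gv[i] - Gv[i - 1] for i in range(1, len(Gv))]
    coef = factorial(sum(u)) // prod(factorial(ui) for ui in u)
    return coef * prod(pi ** ui for pi, ui in zip(p, u))

# Sum over all u with u_1 + u_2 + u_3 = n: total probability must be 1.
x, n = [-0.5, 0.8], 3
total = sum(cond_prob((a, b, n - a - b), x, NormalDist().cdf)
            for a in range(n + 1) for b in range(n + 1 - a))
assert isclose(total, 1.0)
```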

This is why the distribution of the random vector U n is distribution-free under the null hypothesis. However, the random vector U n is discrete uniform, with mass function equal to one over the number of its possible outcomes, only when F=G; in other words, its distribution can be found by traditional combinatorial analysis only when F=G. When F≠G, we cannot establish the distribution of U n through Equation (7), since solving the multiple integral in Equation (8) is tedious at best for suitable alternative distribution functions and intractable in general. To our understanding, finding the power of the test has therefore remained unsolved in most cases. To overcome this, we bring in the finite Markov chain imbedding approach.

Let Ω t , t=0,1,…,n, be the state space with $\binom{m+t}{t}$ possible states, let Γ n ={0,1,…,n} be an index set, and let {Z t : t ∈ Γ n } be a non-homogeneous Markov chain on the state spaces Ω t . For t=1,…,n, the transition probability matrix M t of this chain is

${\mathbf{M}_t = \left[\; p_{\mathbf{u}_{t-1}, \mathbf{u}_t} \;\right]}_{\binom{m+t-1}{t-1} \times \binom{m+t}{t}},$ with rows indexed by the states of Ω t−1 and columns by the states of Ω t ,

where

$p_{\mathbf{u}_{t-1}, \mathbf{u}_t} = P(Z_t = \mathbf{u}_t | Z_{t-1} = \mathbf{u}_{t-1}) = \begin{cases} p_i & \text{if } u_i(t-1) + 1 = u_i(t) \ \text{and}\ u_j(t-1) = u_j(t)\ \forall\, j \ne i, \\ 0 & \text{otherwise}, \end{cases}$

and p i is defined in Equation (3).
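The forward pass ξ M_1 ⋯ M_n can be sketched with a sparse dictionary standing in for the explicit matrices M_t: states at step t are the compositions of t into m+1 cells, each transition adds one y to cell i with probability p_i, and lumping the step-n states by their rank value gives the conditional distribution of R_l. This is our own illustrative implementation, not the authors' code:

```python
# FMCI forward pass for P(R_l = r | X), given the cell probabilities p.
from collections import Counter
from math import isclose

def rank_distribution(p, n):
    cells = len(p)                            # m + 1 cells
    probs = {tuple([0] * cells): 1.0}         # xi: mass 1 on u_0 = (0,...,0)
    for _ in range(n):                        # one step per y observation
        nxt = Counter()
        for u, pr in probs.items():           # each transition increments
            for i, pi in enumerate(p):        # one cell, with probability p_i
                v = list(u)
                v[i] += 1
                nxt[tuple(v)] += pr * pi
        probs = nxt
    dist = Counter()                          # lump states into partition {C_r}
    for u, pr in probs.items():
        quad = (sum(ui * ui for ui in u) + sum(u)) / 2
        cross = sum((u[i] + 1) * sum(u[i + 1:]) for i in range(cells - 1))
        dist[quad + cross] += pr              # rank value from Equation (4)
    return dict(dist)

dist = rank_distribution(p=(0.2, 0.3, 0.5), n=3)     # m = 2, n = 3
assert isclose(sum(dist.values()), 1.0)
assert min(dist) == 6 and max(dist) == 12            # n(n+1)/2, n(2m+n+1)/2
```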

#### Theorem 3

R l (U n |X) is finite Markov chain imbeddable, and

$P(R_l(\mathbf{U}_n) = r | \mathbf{X}) = \boldsymbol{\xi} \left( \prod_{t=1}^{n} \mathbf{M}_t \right) \mathbf{B}'(C_r),$

where $\mathbf{B}(C_r) = \sum_{k: R_l(\mathbf{u}_n) = r} e_k$, e k is a $1 \times \binom{m+n}{n}$ unit row vector corresponding to state u n , ξ is the initial distribution, which puts probability one on the single state of Ω0, and M t , t=1,…,n, are the transition probability matrices of the imbedded Markov chain defined on the state spaces Ω t .

#### Proof

For each u n =(u1(n),…,um+1(n)) in the state space Ω n , we have a corresponding rank R l as shown in Equation (4). Intuitively, the minimum rank r ls is n(n+1)/2 and the maximum rank r lb is n(2m+n+1)/2. In accordance with the possible values of the rank R l , we define a finite partition {C r : r=r ls ,…,r lb } such that

$P(Z_n \in C_r | \mathbf{p}) = \boldsymbol{\xi} \left( \prod_{t=1}^{n} \mathbf{M}_t \right) \mathbf{B}'(C_r),$
(9)

where $\mathbf{B}(C_r) = \sum_{k: R_l(\mathbf{u}_n) = r} e_k$ and e k is a $1 \times \binom{m+n}{n}$ unit row vector corresponding to state u n . We then obtain the conditional probability of the rank R l .

The Law of Large Numbers is then used to determine the probability of U n for any continuous F and G:

$\frac{1}{N} \sum_{i=1}^{N} P(\mathbf{U}_n = \mathbf{u}_n | \mathbf{X}_i) \stackrel{p}{\to} P(\mathbf{U}_n = \mathbf{u}_n),$

where X i is the ith sample of size m from the distribution function F. It is easy to see that

$P(R_l(\mathbf{U}_n) = r) = \sum_{\mathbf{u}_n : R_l(\mathbf{u}_n) = r} P(\mathbf{U}_n = \mathbf{u}_n).$
(10)
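The averaging step can be illustrated by simulation: under F = G, the empirical law of the rank-sum over many (x, y) pairs should approach the exact null law obtained by enumerating the equally likely placements of Theorem 2. A sketch (sample sizes and tolerances are ours):

```python
# Compare the simulated law of the rank-sum under F = G = N(0,1) with the
# exact null law from enumerating the C(m+n, n) y-position placements.
import random
from itertools import combinations
from collections import Counter

m, n, N = 3, 2, 20_000
rng = random.Random(3)

emp = Counter()
for _ in range(N):
    z = ([(rng.gauss(0, 1), 'x') for _ in range(m)]
         + [(rng.gauss(0, 1), 'y') for _ in range(n)])
    z.sort()                                   # combined ordered sample
    emp[sum(i + 1 for i, (_, lab) in enumerate(z) if lab == 'y')] += 1

# Exact null: every choice of y-positions is equally likely (Theorem 2).
exact = Counter()
for pos in combinations(range(1, m + n + 1), n):
    exact[sum(pos)] += 1
total = sum(exact.values())

for r in exact:
    assert abs(emp[r] / N - exact[r] / total) < 0.02
```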

To test

$H_o: F(x) = G(x) \quad \text{versus} \quad H_a: F(x) = G(x - \theta),$

for some θ≠0, the power function is approximated by

$\begin{array}{l} P(R_l(\mathbf{U}_n) \le r_{1\alpha} | H_a) + P(R_l(\mathbf{U}_n) \ge r_{2\alpha} | H_a) \\ = \sum_{r=r_{ls}}^{r_{1\alpha}} P(R_l(\mathbf{U}_n) = r | H_a) + \sum_{r=r_{2\alpha}}^{r_{lb}} P(R_l(\mathbf{U}_n) = r | H_a) \\ = \sum_{r=r_{ls}}^{r_{1\alpha}} \sum_{\mathbf{u}_n : R_l(\mathbf{u}_n) = r} P(\mathbf{U}_n = \mathbf{u}_n | H_a) + \sum_{r=r_{2\alpha}}^{r_{lb}} \sum_{\mathbf{u}_n : R_l(\mathbf{u}_n) = r} P(\mathbf{U}_n = \mathbf{u}_n | H_a) \\ \approx \sum_{r=r_{ls}}^{r_{1\alpha}} \sum_{\mathbf{u}_n : R_l(\mathbf{u}_n) = r} \frac{1}{N} \sum_{i=1}^{N} P(\mathbf{U}_n | H_a; \mathbf{X}_i) + \sum_{r=r_{2\alpha}}^{r_{lb}} \sum_{\mathbf{u}_n : R_l(\mathbf{u}_n) = r} \frac{1}{N} \sum_{i=1}^{N} P(\mathbf{U}_n | H_a; \mathbf{X}_i) \\ = \frac{1}{N} \left( \sum_{r=r_{ls}}^{r_{1\alpha}} \sum_{i=1}^{N} \sum_{\mathbf{u}_n : R_l(\mathbf{u}_n) = r} P(\mathbf{U}_n | H_a; \mathbf{X}_i) + \sum_{r=r_{2\alpha}}^{r_{lb}} \sum_{i=1}^{N} \sum_{\mathbf{u}_n : R_l(\mathbf{u}_n) = r} P(\mathbf{U}_n | H_a; \mathbf{X}_i) \right) \\ = \frac{1}{N} \sum_{i=1}^{N} \left( \sum_{r=r_{ls}}^{r_{1\alpha}} P(R_l(\mathbf{U}_n) = r | H_a; \mathbf{X}_i) + \sum_{r=r_{2\alpha}}^{r_{lb}} P(R_l(\mathbf{U}_n) = r | H_a; \mathbf{X}_i) \right), \end{array}$

where

$P(R_l(\mathbf{U}_n) \le r_{1\alpha} | H_o) + P(R_l(\mathbf{U}_n) \ge r_{2\alpha} | H_o) \le \alpha.$

Note that the alternative hypothesis is subject to the purpose of the test; the procedure needs only slight modification if a one-sided test is adopted.
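Under H_o, the critical values r_{1α} and r_{2α} come from the exact discrete null law of Theorem 2, so the achieved size is at most α. A sketch that enumerates the null distribution and extracts the two-sided critical values (function name ours):

```python
# Two-sided critical values r_{1a}, r_{2a} for R_l from the exact null law:
# every choice of y-positions in the combined ranking is equally likely.
from itertools import combinations

def critical_values(m, n, alpha=0.05):
    counts = {}
    for pos in combinations(range(1, m + n + 1), n):
        r = sum(pos)
        counts[r] = counts.get(r, 0) + 1
    total = sum(counts.values())
    ranks = sorted(counts)
    # Largest r1 with P(R_l <= r1) <= alpha/2.
    acc, r1 = 0.0, None
    for r in ranks:
        if acc + counts[r] / total > alpha / 2:
            break
        acc += counts[r] / total
        r1 = r
    # Smallest r2 with P(R_l >= r2) <= alpha/2.
    acc, r2 = 0.0, None
    for r in reversed(ranks):
        if acc + counts[r] / total > alpha / 2:
            break
        acc += counts[r] / total
        r2 = r
    return r1, r2

r1, r2 = critical_values(m=7, n=7, alpha=0.05)
# By the symmetry of the null law about n(m+n+1)/2, r1 + r2 = r_ls + r_lb.
```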

### 2.2 Distributions of the rank statistic in the scale case

In the previous section, we studied the distribution and the power function of the rank statistic R l considering a shift in location. We now address the distribution and the power function of the rank statistic considering the scale parameter. For this purpose, we consider F(x)=G(x σ−1) and state the null and alternative hypotheses as

$H_o: \sigma = 1 \quad \text{versus} \quad H_a: \sigma \ne 1.$

To do so, we begin with the procedure for finding the distribution of the rank statistic considering the scale parameter, denoted R s , through the random vector U n . The array of ranks is given by

$(m+n)/2, \dots, 3, 2, 1, \qquad 1, 2, 3, \dots, (m+n)/2;$

if m+n is even, and

$(m+n-1)/2, \dots, 3, 2, 1, \qquad 0, \qquad 1, 2, 3, \dots, (m+n-1)/2$

if m+n is odd. We first introduce how to determine the rank-sum of the y observations in the combined samples, R s , with respect to

$\Omega_n = \left\{ \mathbf{u}_n = (u_1(n), \dots, u_{m+1}(n)) : \sum_{i=1}^{m+1} u_i(n) = n \right\},$

where u i (n) is the number of y observations belonging to [ x[i−1],x[i]). Let med(x,y) be the median of the combined x's and y's and suppose it belongs to [ x[i],x[i+1]); it then breaks U n into two parts $\mathbf{U}_n^-$ and $\mathbf{U}_n^+$. If m+n is odd and med(x,y)=x[i], then

$\mathbf{U}_n^- = \left( u_1^- = u_i(n),\; u_2^- = u_{i-1}(n),\; \cdots,\; u_i^- = u_1(n) \right)$

is a 1×i vector and

$\mathbf{U}_n^+ = \left( u_1^+ = u_{i+1}(n),\; u_2^+ = u_{i+2}(n),\; \cdots,\; u_{m+1-i}^+ = u_{m+1}(n) \right)$

is a 1×(m+1−i) vector. The second possible case is that m+n is odd and $\mathit{med}(x,y) = y_{\left[\sum_{k=1}^{i} u_k(n) + j\right]}$; then $\mathbf{U}_n^-$, a row vector of length i+1, has the form

$\left( u_1^- = j-1,\; u_2^- = u_i(n),\; \cdots,\; u_{i+1}^- = u_1(n) \right)$

and ${\mathbit{U}}_{n}^{+}$, a row vector with length m+1−i, is given by

$\left( u_1^+ = u_{i+1}(n) - j,\; u_2^+ = u_{i+2}(n),\; \cdots,\; u_{m+1-i}^+ = u_{m+1}(n) \right).$

The third possible case is that m+n is even and x[i] is the smallest number larger than med(x,y); the vectors are then defined as

$\mathbf{U}_n^- = \left( u_1^- = u_i(n),\; u_2^- = u_{i-1}(n),\; \cdots,\; u_i^- = u_1(n) \right)$

and

$\mathbf{U}_n^+ = \left( u_1^+ = 0,\; u_2^+ = u_{i+1}(n),\; \cdots,\; u_{m+2-i}^+ = u_{m+1}(n) \right).$

The last possibility is that m+n is even and $y_{\left[\sum_{k=1}^{i} u_k(n) + j\right]}$ is the smallest number larger than med(x,y). The vectors are now defined as

$\mathbf{U}_n^- = \left( u_1^- = j-1,\; u_2^- = u_i(n),\; \cdots,\; u_{i+1}^- = u_1(n) \right)$

and

$\mathbf{U}_n^+ = \left( u_1^+ = u_{i+1}(n) - j + 1,\; u_2^+ = u_{i+2}(n),\; \cdots,\; u_{m+1-i}^+ = u_{m+1}(n) \right).$

Let $n^-$ be the length of the vector $\mathbf{U}_n^-$ and $n^+$ be the length of the vector $\mathbf{U}_n^+$.
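Equivalently, R s can be computed by laying out the score array above over the combined ordered sample and summing the scores at the y positions, which avoids tracking the $\mathbf{U}_n^{\pm}$ split explicitly; a sketch with our own helper names:

```python
# Scale ranks: score from both ends toward the middle, then sum the
# scores of the y observations (R_s computed straight from the array).
def scale_ranks(total):
    """(..., 3, 2, 1, 1, 2, 3, ...) for even total,
    (..., 2, 1, 0, 1, 2, ...) for odd total."""
    half = total // 2
    if total % 2 == 0:
        return list(range(half, 0, -1)) + list(range(1, half + 1))
    return list(range(half, 0, -1)) + [0] + list(range(1, half + 1))

def R_s(x, y):
    z = sorted([(v, 'x') for v in x] + [(v, 'y') for v in y])
    scores = scale_ranks(len(z))
    return sum(s for s, (_, lab) in zip(scores, z) if lab == 'y')

# y's concentrated in the middle give a small R_s; y's in the tails a
# large one; the extremes agree with the values r_ss and r_sb below.
assert R_s([-9, -8, 8, 9], [-1, 1]) < R_s([-2, -1, 1, 2], [-9, 9])
```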

#### Theorem 4

R s (U n |X) is finite Markov chain imbeddable, and

$P(R_s(\mathbf{U}_n) = r | \mathbf{X}) = \boldsymbol{\xi} \left( \prod_{t=1}^{n} \mathbf{M}_t \right) \mathbf{B}'(C_r),$

where $\mathbf{B}(C_r) = \sum_{k: R_s(\mathbf{u}_n) = r} e_k$, e k is a $1 \times \binom{m+n}{n}$ unit row vector corresponding to state u n , ξ is the initial distribution, which puts probability one on the single state of Ω0, and M t , t=1,…,n, are the transition probability matrices of the imbedded Markov chain defined on the state spaces Ω t .

#### Proof

For each U n in the state space Ω n , we have a corresponding

$\begin{array}{lcl} R_s(\mathbf{U}_n | \mathbf{X}) & = & R_s(\mathbf{U}_n^- | \mathbf{X}) + R_s(\mathbf{U}_n^+ | \mathbf{X}) \\ & = & \frac{\sum_{k=1}^{n^-} (u_k^-)^2 + \sum_{k=1}^{n^-} u_k^-}{2} + \sum_{k=1}^{n^- - 1} (u_k^- + 1) \left( \sum_{j=k+1}^{n^-} u_j^- \right) \\ & & +\, \frac{\sum_{k=1}^{n^+} (u_k^+)^2 + \sum_{k=1}^{n^+} u_k^+}{2} + \sum_{k=1}^{n^+ - 1} (u_k^+ + 1) \left( \sum_{j=k+1}^{n^+} u_j^+ \right). \end{array}$
(11)

The smallest possible value of R s (U n ) is

$r_{ss} = \begin{cases} \frac{n(n+2)}{4} & \text{if } m+n \text{ is even and } n \text{ is even} \\ \frac{(n+1)(n+3)}{4} & \text{if } m+n \text{ is even and } n \text{ is odd} \\ \frac{n^2}{4} & \text{if } m+n \text{ is odd and } n \text{ is even} \\ \frac{(n+1)(n-1)}{4} & \text{if } m+n \text{ is odd and } n \text{ is odd} \end{cases}$
(12)

and the largest possible value is

$r_{sb} = \begin{cases} \frac{n(2m+n+2)}{4} & \text{if } m+n \text{ is even and } n \text{ is even} \\ \frac{n(2m+n+2)-1}{4} & \text{if } m+n \text{ is even and } n \text{ is odd} \\ \frac{n(2m+n-1)}{4} & \text{if } m+n \text{ is odd and } n \text{ is even} \\ \frac{n(2m+n)-1}{4} & \text{if } m+n \text{ is odd and } n \text{ is odd} \end{cases}$
(13)

In accordance with Equation (11), we use the possible values of R s as the rule of the partition. The rest of the proof follows along the same lines as that of Theorem 3 and is omitted here.

Similarly, we apply the LLN to conclude that

$\frac{1}{N}\phantom{\rule{2.77626pt}{0ex}}\sum _{i=1}^{N}\phantom{\rule{2.77626pt}{0ex}}P\left({R}_{s}|\phantom{\rule{2.77626pt}{0ex}}{\mathbf{\text{X}}}_{i}\phantom{\rule{2.77626pt}{0ex}}\right)\phantom{\rule{2.77626pt}{0ex}}\phantom{\rule{2.77626pt}{0ex}}\stackrel{\mathit{\text{p}}}{\to }\phantom{\rule{2.77626pt}{0ex}}\phantom{\rule{2.77626pt}{0ex}}P\left({R}_{s}\right)$

which establishes the distribution of R s .
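The boundary values in Equations (12) and (13) are easy to mis-transcribe, so a direct computational check helps. The sketch below simply mirrors the piecewise formulas; the function names are our own choices:

```python
def r_ss(m, n):
    """Smallest possible value of R_s, transcribed from Equation (12)."""
    if (m + n) % 2 == 0:
        return n * (n + 2) // 4 if n % 2 == 0 else (n + 1) * (n + 3) // 4
    return n * n // 4 if n % 2 == 0 else (n + 1) * (n - 1) // 4

def r_sb(m, n):
    """Largest possible value of R_s, transcribed from Equation (13)."""
    if (m + n) % 2 == 0:
        return n * (2 * m + n + 2) // 4 if n % 2 == 0 else (n * (2 * m + n + 2) - 1) // 4
    return n * (2 * m + n - 1) // 4 if n % 2 == 0 else (n * (2 * m + n) - 1) // 4
```

For example, with m = 5 and n = 7 the support of R s runs from r_ss(5, 7) = 20 to r_sb(5, 7) = 33.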

Through FMCI we again obtain the distribution of R s under selected alternative distributions; the procedures are similar to those in the previous section. In addition, it is natural to approximate the power function by

$\begin{array}{l}\frac{1}{N}\phantom{\rule{2.77626pt}{0ex}}\sum _{i=1}^{N}\left(\sum _{s={r}_{\mathit{\text{ss}}}}^{{s}_{1\alpha }}P\left({R}_{s}\left({\mathbf{U}}_{n}\right)=s|\phantom{\rule{2.77626pt}{0ex}}{\mathbf{\text{X}}}_{i}\right)+\sum _{s={s}_{2\alpha }}^{{r}_{\mathit{\text{sb}}}}P\left({R}_{s}\left({\mathbf{U}}_{n}\right)=s|\phantom{\rule{2.77626pt}{0ex}}{\mathbf{\text{X}}}_{i}\right)\right),\end{array}$

where

$\begin{array}{l}P\left({R}_{s}\left({\mathbf{U}}_{n}\right)\le {s}_{1\alpha }|{H}_{o}\right)+P\left({R}_{s}\left({\mathbf{U}}_{n}\right)\ge {s}_{2\alpha }|{H}_{o}\right)\le \mathrm{\alpha .}\end{array}$
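The FMCI transition matrices that give the exact conditional probabilities are specific to the construction above and are not reproduced here; as a rough illustration of the same power calculation, the following sketch estimates the two-sided power by plain Monte Carlo, using an Ansari-Bradley-type score sum as a stand-in for R s (the paper's R s may be scored differently) and an exact enumerated null distribution for the cutoffs s_{1α} and s_{2α}:

```python
import random
from itertools import combinations
from math import comb

def ab_scores(N):
    # Ansari-Bradley-type scores: 1, 2, ... up the pooled sample and back down
    return [min(r, N + 1 - r) for r in range(1, N + 1)]

def r_s_stat(xs, ys):
    """Score sum over the y-sample's pooled ranks (a stand-in for R_s)."""
    pooled = sorted([(v, 0) for v in xs] + [(v, 1) for v in ys])
    scores = ab_scores(len(pooled))
    return sum(scores[r] for r, (_, lab) in enumerate(pooled) if lab == 1)

def null_pmf(m, n):
    """Exact null pmf: each placement of the n y-ranks is equally likely under H_o."""
    N = m + n
    scores = ab_scores(N)
    counts = {}
    for pos in combinations(range(N), n):
        s = sum(scores[p] for p in pos)
        counts[s] = counts.get(s, 0) + 1
    total = comb(N, n)
    return {s: c / total for s, c in counts.items()}

def two_sided_cutoffs(pmf, alpha):
    """Largest s1 and smallest s2 whose tail masses are each at most alpha/2."""
    support = sorted(pmf)
    s1, acc = None, 0.0
    for s in support:
        if acc + pmf[s] > alpha / 2:
            break
        acc += pmf[s]
        s1 = s
    s2, acc = None, 0.0
    for s in reversed(support):
        if acc + pmf[s] > alpha / 2:
            break
        acc += pmf[s]
        s2 = s
    return s1, s2

def mc_power(m, n, theta, sigma, alpha=0.2, reps=2000, seed=1):
    """Monte Carlo power of the two-sided scale rank test against N(theta, sigma)."""
    rng = random.Random(seed)
    s1, s2 = two_sided_cutoffs(null_pmf(m, n), alpha)
    reject = 0
    for _ in range(reps):
        xs = [rng.gauss(0, 1) for _ in range(m)]
        ys = [rng.gauss(theta, sigma) for _ in range(n)]
        s = r_s_stat(xs, ys)
        if (s1 is not None and s <= s1) or (s2 is not None and s >= s2):
            reject += 1
    return reject / reps
```

The enumeration is feasible only for small m and n, which matches the small-sample focus of the paper.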

### 2.3 Joint distributions of the rank statistics in the shift and scale case

We have derived the marginal distributions of R l and R s in terms of U n , respectively, which yield the following theorem.

#### Theorem 5

(R l (U n |X),R s (U n |X)) is finite Markov chain imbeddable, and

$\begin{array}{l}P\left({R}_{l}\left({\mathbit{U}}_{n}\right)={r}_{1};{R}_{s}\left({\mathbit{U}}_{n}\right)={r}_{2}|\mathbit{X}\right)=\mathbit{\xi }\left(\prod _{t=1}^{n}{\mathbit{M}}_{t}\right){\mathbit{B}}^{\prime }\left({C}_{r}\right)\end{array}$

where $\mathbf{B}(C_r)=\sum_{k:\,R_l(\mathbf{U}_n)=r_1\ \&\ R_s(\mathbf{U}_n)=r_2} e_k$, $e_k$ is a $1\times\binom{m+n}{n}$ unit row vector corresponding to state $u_n$, $\xi$ $(=P(Z_0=1)=1)$ is the initial probability, and $M_t$, $t=1,\dots,n$, are the transition probability matrices of the imbedded Markov chain defined on the state space $\Omega_t$.

#### Proof

By Equations (4) and (11), each u n in the state space Ω n has corresponding values of R l and R s . The combinations of the values of R l and R s serve as the rule of the partition. The rest of the proof follows along the same lines as that of Theorem 3.

The joint distribution of the ranks considering both the location and scale parameters, which can be determined through our algorithm, has not yet been studied in the literature. Our result allows us to test the homogeneity of the distribution functions $F(x)=G((x-\theta)\sigma^{-1})$. We state the hypotheses as follows:

$H_o:\theta=0\ \text{and}\ \sigma=1\quad \text{versus}\quad H_a:\theta\ne 0\ \text{or}\ \sigma\ne 1.$
(14)

We are also able to identify a proper critical region under the null hypothesis and discuss its power when F ≠ G. For example, a rectangular critical region can be

$C_{\alpha}=\left\{R_l\le r_{1l}\ \text{or}\ R_l\ge r_{2l}\ \text{or}\ R_s\le r_{1s}\ \text{or}\ R_s\ge r_{2s}\right\}$

where r1l, r2l, r1s and r2s are the critical values such that

$\begin{array}{lcr}P\left({R}_{l}\le {r}_{1l}|{H}_{o}\right)& +& P\left({R}_{l}\ge {r}_{2l}|{H}_{o}\right)+P\left({r}_{1l}<{R}_{l}<{r}_{2l},{R}_{s}\le {r}_{1s}|{H}_{o}\right)\\ +& P\left({r}_{1l}<{R}_{l}<{r}_{2l},{R}_{s}\ge {r}_{2s}|{H}_{o}\right)\le \alpha \hfill \end{array}$

or an elliptic critical region

$\begin{array}{l}{C}_{\alpha }^{\prime }=\left\{\frac{{R}_{l}^{2}}{a}+\frac{{R}_{s}^{2}}{b}>C\right\}\end{array}$

for some positive constants a and b such that

$\begin{array}{l}P\left(\frac{{R}_{l}^{2}}{a}+\frac{{R}_{s}^{2}}{b}>C|{H}_{o}\right)\le \mathrm{\alpha .}\end{array}$

According to the rejection regions defined above, the power of the test can be found as

$\begin{array}{lcr}P\left({R}_{l}\le {r}_{1l}|{H}_{a}\right)& +& P\left({R}_{l}\ge {r}_{2l}|{H}_{a}\right)+P\left({r}_{1l}<{R}_{l}<{r}_{2l},{R}_{s}\le {r}_{1s}|{H}_{a}\right)\\ +& P\left({r}_{1l}<{R}_{l}<{r}_{2l},{R}_{s}\ge {r}_{2s}|{H}_{a}\right)\hfill \end{array}$
(15)

or

$\begin{array}{l}P\left(\frac{{R}_{l}^{2}}{a}+\frac{{R}_{s}^{2}}{b}>C|{H}_{a}\right).\end{array}$
(16)

Note that unless we have a conjecture about the values of θ and σ, we tend to use a two-sided test. However, with knowledge of the center and shape of the distribution of interest, a sectorial critical region is a better choice; an example is demonstrated in the numerical studies.
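Given any tabulated joint pmf of (R l , R s ), both kinds of critical regions reduce to summing probability mass over an indicator. The following is a minimal sketch; the centering constants rl0 and rs0 in the elliptic region are an illustrative generalization, not part of the definition above (the defaults of 0 reproduce it):

```python
def region_prob(joint_pmf, in_region):
    """Probability mass a joint pmf {(rl, rs): p} assigns to a critical region."""
    return sum(p for (rl, rs), p in joint_pmf.items() if in_region(rl, rs))

def rectangular(r1l, r2l, r1s, r2s):
    """Reject when R_l falls in either tail, or R_s falls in either tail."""
    return lambda rl, rs: rl <= r1l or rl >= r2l or rs <= r1s or rs >= r2s

def elliptic(a, b, C, rl0=0.0, rs0=0.0):
    """Reject outside an ellipse; rl0, rs0 are optional illustrative centerings."""
    return lambda rl, rs: (rl - rl0) ** 2 / a + (rs - rs0) ** 2 / b > C
```

Calibrating to level α then amounts to adjusting the cutoffs (or the constant C) until `region_prob`, evaluated at the null pmf, is at most α; evaluating the same region at an alternative pmf gives the power in (15) or (16).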

## 3 Numerical results and discussion

### 3.1 A joint distribution of R l and R s

Let $\{X_1,\dots,X_5\}\sim N(0,1)$ and $\{Y_1,\dots,Y_7\}\sim N(\theta,\sigma)$. Figure 1 gives the joint distribution of the random variables R l and R s under the null hypothesis of θ=0 and σ=1. The marginal distributions of R l and R s can easily be established from their joint distribution. Figure 1 also shows that the two random variables R l and R s are dependent. We construct two critical regions, shown in Figure 2, according to their joint distribution. Outside the yellow area in Figure 2 is the selected rectangular critical region $C_{0.1738}$, and outside the red shadow is the elliptic one $C^{\prime}_{0.1738}$.
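For sample sizes this small, the joint null distribution can be tabulated by brute force over all $\binom{m+n}{n}$ placements of the Y ranks. The sketch below uses the Wilcoxon rank sum and an Ansari-Bradley-type score sum as assumed stand-ins for R l and R s ; it recovers the marginals from the joint table and confirms the dependence noted above:

```python
from itertools import combinations
from fractions import Fraction
from math import comb

def joint_null_pmf(m, n):
    """Exact joint null pmf of (rank sum, Ansari-Bradley-type score sum) of the Ys."""
    N = m + n
    scores = [min(r, N + 1 - r) for r in range(1, N + 1)]
    w = Fraction(1, comb(N, n))
    pmf = {}
    for pos in combinations(range(1, N + 1), n):   # the n pooled ranks of the Ys
        key = (sum(pos), sum(scores[p - 1] for p in pos))
        pmf[key] = pmf.get(key, Fraction(0)) + w
    return pmf

pmf = joint_null_pmf(5, 7)

# Marginals drop straight out of the joint table.
marg_l, marg_s = {}, {}
for (rl, rs), p in pmf.items():
    marg_l[rl] = marg_l.get(rl, Fraction(0)) + p
    marg_s[rs] = marg_s.get(rs, Fraction(0)) + p

# Dependence: the joint mass differs from the product of marginals somewhere.
dependent = any(pmf.get((rl, rs), Fraction(0)) != marg_l[rl] * marg_s[rs]
                for rl in marg_l for rs in marg_s)
```

Exact `Fraction` arithmetic keeps the tabulated probabilities free of rounding error.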

### 3.2 Powers for a joint test using R l and R s

The alternative of interest is stated in the preceding section (see Equation (14)). The power functions of the test statistics $R_l$ and $R_s$ for a sequence of normally distributed populations, with θ from −20 to 20 in increments of 0.5 and σ from 1 to 10 in increments of 1 together with the reciprocals of σ, under two types of critical regions are provided in Figures 3 and 4. We adopt a two-sided test because of the selected values of the parameters; if a one-sided test is adopted instead, the critical region in the previous step should be modified slightly before calculating the powers. As Figures 3 and 4 show, both critical regions perform roughly equally well. Figure 5 presents the performance of the two critical regions under various parameter settings. Figures 5(a) and (b) show that, given a standard deviation of 1 or a mean of 0, the powers of the two critical regions, rectangular and elliptic, are high and similar. However, when the variation of the alternative population decreases (σ=1/10) or increases (σ=10), the elliptic critical region performs better than the rectangular one, as shown in Figures 5(c) and (d). Therefore, we suggest using an elliptic rejection region when conducting a test for the equivalence of two distributions.

Next, we consider the problem of determining an optimum rank test. To conduct a test of distribution equivalence, we can use either $R_l$ or $R_s$ as the test statistic. As mentioned earlier, the marginal distribution of $R_l$ or $R_s$ can easily be established from their joint distribution. Figures 6 and 7 provide the power functions for the test statistics $R_l$ and $R_s$ at the 17.38% level of significance, respectively. Figure 7 shows that the rank test for the scale parameter is badly affected by the center of the alternative population, an effect observed earlier by Ansari and Bradley (1960). Comparing Figures 6 and 7 with Figure 4, the joint test appears much more reliable than either $R_l$ or $R_s$ alone for distribution equivalence tests. A joint test for distribution equivalence is likely the better option under most circumstances.

### 3.3 Lehmann alternatives

Consider the one-sided alternative F(x;θ,σ) > G(x;θ,σ). Lehmann (1953) proposed a test of $H_o: F(x;\theta,\sigma) = G(x;\theta,\sigma)$ against $H_a: F(x;\theta,\sigma)^k = G(x;\theta,\sigma)$, which is known as the family of Lehmann alternatives. Note that $F(x;\theta,\sigma)^k$ is the cumulative distribution function of $\max_{1\le i\le k} X_i$ when $X_i \sim F$ and, under the alternative hypothesis, G(x;θ,σ) is stochastically larger than F(x;θ,σ). First, we know

$\begin{array}{rcl}E_k(X)&=&\displaystyle\int_{-\infty}^{0} -G(x)\,dx+\int_{0}^{\infty}\left(1-G(x)\right)dx\\[4pt]&>&\displaystyle\int_{-\infty}^{0} -F(x)\,dx+\int_{0}^{\infty}\left(1-F(x)\right)dx\;=\;E(X).\end{array}$
(17)

Therefore, the larger $R_l$ is, the stronger the evidence against the null hypothesis. As for the variation of the distribution itself, the mass of the density function is compressed toward larger values; therefore, in most cases we have $\mathit{Var}(X_k) < \mathit{Var}(X)$. We then propose to reject the null hypothesis when $R_s$ is large. For example, given $F \sim U(0,1)$ and $G = F^k$, it is easy to see

$\begin{array}{l}\frac{{E}_{k+1}\left(X\right)}{{E}_{k}\left(X\right)}=\frac{{\left(k+1\right)}^{2}}{k\left(k+2\right)}>1\end{array}$
(18)

and

$\begin{array}{l}\frac{{\mathit{\text{Var}}}_{k+1}\left(X\right)}{{\mathit{\text{Var}}}_{k}\left(X\right)}=\frac{{\left(k+1\right)}^{3}}{k\left(k+2\right)\left(k+3\right)}<1\end{array}$
(19)

for all k. We first find the marginal and joint distributions of the ranks $R_l$ and $R_s$ in order to define critical regions for $R_l$ and $R_s$ individually and simultaneously. Given the properties of the mean and variance of the alternative distribution shown in Equations (17), (18) and (19), we define the critical regions with care. Table 1 provides powers for the tests when the hypothesized distribution is the uniform, standard normal, Student's t with 3 degrees of freedom, or exponential distribution, for several settings of the sample sizes m and n and for k = 2, 3 and 6. Clearly, a joint test considering both $R_l$ and $R_s$ for the equality of distributions outperforms tests considering only one of the rank statistics.
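For $F \sim U(0,1)$, the Lehmann alternative $G = F^k$ is the Beta(k,1) distribution, whose mean $k/(k+1)$ and variance $k/((k+1)^2(k+2))$ are standard facts; the ratios in Equations (18) and (19) can therefore be verified exactly:

```python
from fractions import Fraction

def mean_k(k):
    """Mean of max of k iid U(0,1) draws, i.e. under G = F^k: k/(k+1)."""
    return Fraction(k, k + 1)

def var_k(k):
    """Variance of max of k iid U(0,1) draws: k / ((k+1)^2 (k+2))."""
    return Fraction(k, (k + 1) ** 2 * (k + 2))

for k in range(1, 10):
    # Equation (18): the mean ratio, always greater than 1
    assert mean_k(k + 1) / mean_k(k) == Fraction((k + 1) ** 2, k * (k + 2)) > 1
    # Equation (19): the variance ratio, always less than 1
    assert var_k(k + 1) / var_k(k) == Fraction((k + 1) ** 3, k * (k + 2) * (k + 3)) < 1
```

Exact rational arithmetic makes the check term-by-term rather than approximate.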

## 4 Conclusion

Our proposed algorithm provides a solution for finding the power of distribution equivalence tests considering the shift and scale parameters, respectively and simultaneously. Numerical studies show that a joint test should be adopted for testing the homogeneity of distributions as well as under Lehmann alternatives. Also, an elliptic critical region is a better choice than a rectangular one for a joint test. In practice, it is reasonable to assume neither normality nor equal mean/variance of the distributions of interest. However, our algorithm depends heavily on computational resources, as the number of possible states in Ω n grows rapidly when the sample sizes increase. Therefore, so far we can only target small sample sizes in our work.

## References

• Ansari AR, Bradley RA: Rank-Sum Tests for Dispersions. Ann. Math. Stat 1960, 31: 1174–1189. 10.1214/aoms/1177705688

• Collings BJ, Hamilton MA: Estimating the power of the two-sample Wilcoxon Test for location shift. Biometrics 1988, 44: 847–860. 10.2307/2531596

• Klotz J: Nonparametric test for scale. Ann. Math. Stat 1962, 33: 498–512. 10.1214/aoms/1177704576

• Lehmann EL: The power for rank tests. Ann. Math. Stat 1953, 24: 23–43. 10.1214/aoms/1177729080

• Lehmann EL: Nonparametrics: Statistical Methods Based on Ranks. Prentice-Hall, New Jersey; 1998.

• Mann HB, Whitney DR: On a test of whether one of two random variables is stochastically larger than the other. Ann. Math. Stat 1947, 18: 50–60. 10.1214/aoms/1177730491

• Mood AM: On the asymptotic efficiency of certain nonparametric two-sample tests. Ann. Math. Stat 1954, 25: 514–522. 10.1214/aoms/1177728719

• Rosner B, Glynn RJ: Power and sample size estimation for the Wilcoxon rank sum test with application to comparisons of C statistics from alternative prediction models. Biometrics 2009, 65: 188–197. 10.1111/j.1541-0420.2008.01062.x

• Shieh G, Jan SL, Randles RH: On power and sample size determinations for the Wilcoxon-Mann-Whitney test. Nonparametric Stat 2006, 18: 33–43. 10.1080/10485250500473099

• Siegel S, Tukey JW: A nonparametric sum of ranks procedure for relative spread in unpaired samples. J. Am. Stat. Assoc 1960, 55: 429–445. 10.1080/01621459.1960.10482073

• Wilcoxon F: Individual comparisons by ranking methods. Biometrics 1945, 1: 80–83. 10.2307/3001968

## Acknowledgments

The author would like to thank James C. Fu and anonymous referee whose comments led to significant improvements of this manuscript.

## Author information


### Corresponding author

Correspondence to Wan-Chen Lee.

### Competing interests

The author declares that she has no competing interests.


Lee, WC. Joint distribution of rank statistics considering the location and scale parameters and its power study. J Stat Distrib App 1, 6 (2014). https://doi.org/10.1186/2195-5832-1-6