Joint distribution of k-tuple statistics in zero-one sequences of Markov-dependent trials
Journal of Statistical Distributions and Applications volume 4, Article number: 26 (2017)
Abstract
We consider a sequence of n, n≥3, zero (0) - one (1) Markov-dependent trials. We focus on k-tuples of 1s; i.e. runs of 1s of length at least equal to a fixed integer number k, 1≤k≤n. The statistics denoting the number of k-tuples of 1s, the number of 1s in them and the distance between the first and the last k-tuple of 1s in the sequence, are defined. The work provides, in a closed form, the exact conditional joint distribution of these statistics given that the number of k-tuples of 1s in the sequence is at least two. The case of independent and identical 0−1 trials is also covered in the study. A numerical example further illustrates the theoretical results.
Introduction
Run counting statistics defined on a sequence of binary (zero (0) - one (1)) random variables (RVs), along with their exact and approximate distributions, have been extensively studied in the literature. Their popularity is due to the fact that such statistics appear as useful theoretical models in many research areas including statistics (e.g. hypothesis testing), engineering (e.g. system reliability and quality control), biology (e.g. population genetics and DNA sequence analysis), computer science (e.g. encoding/decoding/transmission of digital information) and financial engineering (e.g. insurance and risk analysis).
In such applications, a key point is understanding how 1s and 0s are distributed and combined as elements of a 0−1 sequence (finite or infinite, memoryless or not), eventually forming runs of 1s or 0s which are enumerated according to certain counting schemes. Each scheme defines how runs of the same symbol, or strings (patterns) of both symbols, are formed and consequently enumerated. A counting scheme may depend on, among other considerations, whether overlapping counting is allowed and whether the counting restarts from scratch once a run/string of a certain size has been enumerated.
The counting scheme as well as the intrinsic uncertainty of a 0−1 sequence are often suggested by the applications. Probabilistic models in common use for the internal structure of a 0−1 sequence include the model of a sequence whose elements are independent of each other and models that assume some kind of dependence among its elements. The methods used to derive exact/approximating, marginal/joint probability distributions include combinatorial analysis, generating functions, the finite Markov chain imbedding technique, recursive schemes as well as normal, Poisson and large deviation approximations.
For extensive reviews of the recent literature on the distribution theory of runs and patterns we refer to Balakrishnan and Koutras (2002) and Fu and Lou (2003). Current works on the subject include, among others, those of Antzoulakos and Chadjiconstantinidis (2001); Eryilmaz (2006, 2015, 2016, 2017); Eryilmaz and Yalcin (2011); Johnson and Fu (2014); Koutras (2003); Koutras et al. (2016); Makri and Psillakis (2015); Makri et al. (2013) and Mytalas and Zazanis (2013, 2014).
In this article we derive expressions for a conditional distribution of a trivariate statistic. Its components denote the number of runs of 1s of length at least a fixed threshold number, the number of 1s in such runs of 1s and the length of the shortest segment of the sequence in which these runs are concentrated. The study is developed on a sequence of two-state (0−1) Markov-dependent trials. The runs are enumerated according to Mood’s (1940) counting scheme.
More specifically, the manuscript is organized as follows. In Section 2 we present some preliminary material, including notation and definitions, necessary to develop our results, which are obtained in Section 4. In Section 3 we give the motivation along with a statement of the aim of the work. A numerical example, presented in Section 5, clarifies the theoretical results of Section 4. A discussion of the results as well as a note on future work are given in Section 6.
Throughout the article, for integers, n, m, \({n\choose m}\) denotes the extended binomial coefficient (see, Feller (1968), pp. 50, 63), ⌊x⌋ stands for the greatest integer less than or equal to x and δ _{ ij } denotes the Kronecker delta function of the integer arguments i and j. Further, for α>β, we apply the conventions \(\sum _{i=\alpha }^{\beta }y_{i}=0\), \(\prod _{i=\alpha }^{\beta }y_{i}=1\), \(\sum _{i=\alpha }^{\beta }\mathbf {Y}^{(i)}=\mathbf {O}\equiv {\scriptsize \left (\begin {array}{cc} 0 &0\\ 0 & 0 \end {array}\right)}\), \(\prod _{i=\alpha }^{\beta }\mathbf {Y}^{(i)}=\mathbf {I}\equiv {\scriptsize \left (\begin {array}{cc} 1 &0\\ 0 & 1 \end {array}\right)}\), where y _{ i } and Y ^{(i)} are scalars and 2×2 matrices, respectively.
Preliminaries
2.1 Run counting statistics
Let \(\{X_{t}\}_{t=1}^{n}\), n≥1, be the first n trials of a binary (0−1) sequence of RVs, X _{ t }=x _{ t }∈{0,1}. A run of 1s is a (sub)sequence of \(\{X_{t}\}_{t=1}^{n}\) consisting of consecutive 1s, the number of which is referred to as its length, preceded and succeeded by 0s or by nothing.
Given a fixed integer k, 1≤k≤n, a k-tuple of 1s is a run of 1s of length k or more. In the paper we will deal with the following statistics defined on a 0−1 sequence \(\{X_{t}\}_{t=1}^{n}\). For details see, e.g. Makri et al. (2015) and the references therein.
(I) G _{ n,k } denoting the number of k-tuples of 1s, 1≤k≤n. In particular, G _{ n,1} denotes the number of 1-tuples of 1s, i.e. it represents the number R _{ n }≡G _{ n,1} of all runs of 1s in the sequence. Using the convention X _{0}=X _{ n+1}≡0, we can define G _{ n,k } as
where
(II) S _{ n,k } denoting the number of 1s in the G _{ n,k } k-tuples of 1s; i.e. S _{ n,k } represents the sum of lengths of the G _{ n,k } k-tuples of 1s, 1≤k≤n. In particular S _{ n,1} represents the number of all 1s in the sequence; hence, the number of 0s, Y _{ n }, in the sequence is Y _{ n }=n−S _{ n,1}. S _{ n,k } is formally defined as
Readily, k G _{ n,k }≤S _{ n,k }.
(III) L _{ n }, n≥1, denoting the length of the longest run of 1s in the sequence. By setting
we have that
Readily L _{ n }<k iff G _{ n,k }<1.
(IV) For G _{ n,k }≥1, 1≤k≤n, D _{ n,k } denotes the distance (number of trials) between and including the first 1 of the first k-tuple of 1s and the last 1 of the last k-tuple of 1s in the sequence. If there is only one k-tuple of 1s in the sequence then D _{ n,k } denotes its length. That is, D _{ n,k } represents the size (length) of the minimum (sub)sequence of \(\{X_{t}\}_{t=1}^{n}\) in which all G _{ n,k } k-tuples of 1s are concentrated. In particular, D _{ n,1} represents the length of the minimum segment of the sequence containing all R _{ n } runs of 1s or in other words all S _{ n,1} 1s appearing in the sequence. For G _{ n,k }≥1, 1≤k≤n, D _{ n,k } can be formally defined as
where
Readily, D _{ n,k }=S _{ n,k }=L _{ n }, if G _{ n,k }=1 and D _{ n,k }>S _{ n,k }>L _{ n }, if G _{ n,k }>1.
(V) For G _{ n,k }≥1, 1≤k≤n, set V _{ n,k }=(D _{ n,k },G _{ n,k },S _{ n,k }). This is the RV we focus on in the article.
Example: By way of illustration consider the trials 1110001100010001010011101111001001001001 numbered from 1 to 40. Then, L _{40}=4 and V _{40,1}=(40,11,19), V _{40,2}=(28,4,12), V _{40,3}=(28,3,10), V _{40,4}=(4,1,4).
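The values in this example can be checked mechanically. The following Python sketch computes L _{ n } and V _{ n,k } directly from the run lengths of a 0−1 string, rather than from the formal definitions (I)-(IV); the function names are illustrative and are not part of the paper.

```python
from itertools import groupby


def runs_of_ones(bits):
    """Return (start, length) for every run of 1s; positions are 1-based."""
    runs, pos = [], 1
    for symbol, group in groupby(bits):
        length = len(list(group))
        if symbol == "1":
            runs.append((pos, length))
        pos += length
    return runs


def longest_run(bits):
    """L_n: length of the longest run of 1s (0 if the string has no 1s)."""
    return max((length for _, length in runs_of_ones(bits)), default=0)


def v_statistic(bits, k):
    """V_{n,k} = (D_{n,k}, G_{n,k}, S_{n,k}); None when there is no k-tuple of 1s."""
    tuples = [(start, length) for start, length in runs_of_ones(bits) if length >= k]
    if not tuples:
        return None
    first_start = tuples[0][0]
    last_start, last_length = tuples[-1]
    d = last_start + last_length - first_start  # first 1 of first k-tuple to last 1 of last k-tuple
    g = len(tuples)                             # number of k-tuples
    s = sum(length for _, length in tuples)     # number of 1s inside the k-tuples
    return (d, g, s)


trials = "1110001100010001010011101111001001001001"
for k in (1, 2, 3, 4):
    print(k, v_statistic(trials, k))
print("longest run:", longest_run(trials))
```

Applied to the 40 trials above, the sketch reproduces the four triples and the value L _{40}=4 stated in the example.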
2.2 Internal structure’s models
A general enough model for the internal structure of a 0−1 sequence \(\{X_{t}\}_{t=1}^{n}\), n≥2, is that of the first n trials of a homogeneous 0−1 Markov chain of first order (HMC1). On such a model we will develop our results. Accordingly, we next state the necessary notation/definitions.
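Let {X _{ t }}_{ t≥1} be a HMC1 with state space \({\mathcal {A}}=\{0,1\}\), one step transition probability matrix
\(\mathbf {P}=(p_{ij})={\scriptsize \left (\begin {array}{cc} p_{00} & p_{01}\\ p_{10} & p_{11} \end {array}\right)}\)
with
\(p_{ij}=P(X_{t}=j\mid X_{t-1}=i)\), \(i,j\in {\mathcal {A}}\), t≥2,
and probability distribution vector at time t
\(\mathbf {p}^{(t)}=(p_{0}^{(t)},p_{1}^{(t)})\), t≥1,
with
\(p_{i}^{(t)}=P(X_{t}=i)\), \(i\in {\mathcal {A}}\).
Readily, because of the homogeneity of {X _{ t }}_{ t≥1}, it holds
\(\mathbf {p}^{(t)}=\mathbf {p}^{(1)}\mathbf {P}^{t-1}\), t≥1,
with
\(p_{i}^{(t)}=\mathbf {p}^{(1)}\mathbf {P}^{t-1}\mathbf {e}_{i}^{'}\),
where \(\mathbf {e}_{i}^{'}\) is the transpose (i.e. the column vector) of the row vector e _{ i }, \(i\in {\mathcal {A}}\), with e _{0}=(1,0) and e _{1}=(0,1).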
In particular, for p _{01}+p _{10}≠0, i.e. P≠I, it holds
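The setup of a 0−1 HMC1 \(\{X_{t}\}_{t=1}^{n}\), n≥2, covers the case of a 0−1 sequence of independent and identically distributed (IID) RVs, too. This is so, because a 0−1 IID sequence \(\{X_{t}\}_{t=1}^{n}\), n≥2, with
\(P(X_{t}=1)=p_{1}=1-P(X_{t}=0)=1-p_{0}\), 1≤t≤n,
is a particular HMC1 with
\(p_{01}=p_{11}=p_{1}\), \(p_{00}=p_{10}=p_{0}\), \(\mathbf {p}^{(1)}=(p_{0},p_{1})\).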
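As a small numerical illustration of the model, the sketch below propagates the distribution vector by repeated multiplication with the transition matrix, assuming the usual relation p ^{(t)}=p ^{(1)} P ^{t−1} for a homogeneous chain, and builds the IID special case by giving P two identical rows. The function names are hypothetical.

```python
def step(p, P):
    """One step p^{(t+1)} = p^{(t)} P for a two-state chain (row vector times matrix)."""
    return (p[0] * P[0][0] + p[1] * P[1][0],
            p[0] * P[0][1] + p[1] * P[1][1])


def distribution_at(t, p1, P):
    """p^{(t)} obtained from p^{(1)} by t-1 multiplications with P."""
    p = tuple(p1)
    for _ in range(t - 1):
        p = step(p, P)
    return p


def iid_as_hmc1(prob_one):
    """Transition matrix and initial vector of an IID 0-1 sequence viewed as an HMC1."""
    p0 = 1.0 - prob_one
    P = ((p0, prob_one), (p0, prob_one))  # both rows equal: no dependence on the previous trial
    return P, (p0, prob_one)


P_iid, p1_iid = iid_as_hmc1(0.3)
print(distribution_at(5, p1_iid, P_iid))   # stays at (p0, p1) up to rounding

P_markov = ((0.9, 0.1), (0.2, 0.8))
print(distribution_at(2, (1.0, 0.0), P_markov))
```

With identical rows the distribution vector does not change with t, which is the sense in which the IID sequence is a particular HMC1.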
2.3 A combinatorial result
In the combinatorial analysis used in Section 4, the following result, recalled from Makri et al. (2007), is useful. The coefficient
represents the number of allocations of α indistinguishable balls into r distinguishable cells where each of the m, 0≤m≤r, specified cells is occupied by at most k balls. Equivalently, it gives the number of nonnegative integer solutions of the linear equation x _{1}+x _{2}+…+x _{ r }=α with the restrictions, for m≥1, \(0\leq x_{i_{j}}\leq k\), 1≤j≤m, for some specific m-combination {i _{1},i _{2},…,i _{ m }} of {1,2,…,r}, and no restrictions on x _{ j }s, 1≤j≤r, for m=0.
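The combinatorial description above can be turned into a short computation. The sketch below counts the allocations with the standard inclusion-exclusion argument over the capped cells and checks the result against direct enumeration. The particular expression used is an assumption made for illustration and is not copied from the paper, and the function names are hypothetical.

```python
from itertools import product
from math import comb


def h_coefficient(m, alpha, r, k):
    """Allocations of alpha balls into r cells, m specified cells holding at most k balls each."""
    if alpha < 0 or r < 0 or not 0 <= m <= r:
        return 0
    if r == 0:
        return 1 if alpha == 0 else 0
    total = 0
    for j in range(m + 1):
        rest = alpha - j * (k + 1)   # force j capped cells to overflow by k+1 balls each
        if rest < 0:
            break
        total += (-1) ** j * comb(m, j) * comb(rest + r - 1, r - 1)
    return total


def h_brute_force(m, alpha, r, k):
    """Direct enumeration; the first m cells play the role of the capped ones."""
    return sum(
        1
        for cells in product(range(alpha + 1), repeat=r)
        if sum(cells) == alpha and all(x <= k for x in cells[:m])
    )


# Two capped cells (at most 1 ball each) and one free cell sharing 4 balls.
print(h_coefficient(2, 4, 3, 1), h_brute_force(2, 4, 3, 1))
```

The enumeration is exponential in r and is only meant as a check for small arguments; the inclusion-exclusion sum needs at most m+1 terms.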
Moreover, H _{ r }(α,r,k) is Riordan’s (1964, p. 104) coefficient
Motivation and aim of the work
In a study of a 0−1 sequence \(\{X_{t}\}_{t=1}^{n}\), n≥3, it is reasonable for one to be interested in the probabilistic behavior of RV V _{ n,k }=(D _{ n,k },G _{ n,k },S _{ n,k }). This happens because jointly its components provide a more refined view of the internal clustering structure of the sequence than the information extracted by each one alone.
Interpreting a k-tuple of 1s as a cluster of consecutive 1s of size at least k, D _{ n,k } represents the size of the minimum segment of \(\{X_{t}\}_{t=1}^{n}\) in which G _{ n,k } clusters of size at least k and at most L _{ n } are concentrated. The overall density of G _{ n,k } clusters, with respect to the number of 1s in them, as well as of the minimum concentration segment is evaluated by S _{ n,k }. Large values of D _{ n,k } suggest that these G _{ n,k } clusters spread over the interval between the left and the right side of the sequence whereas small values of D _{ n,k } indicate rather that the clusters are concentrated in a segment of the sequence of small size, leaving the remaining part(s) of the sequence empty of such clusters.
In addition to this information, a large value of S _{ n,k } paired with a small value of G _{ n,k } indicates the existence of clusters of 1s of a large size and therefore a trend whereas the same value of S _{ n,k } paired with a large value of G _{ n,k } indicates rather a distribution of clusters of small size in the (sub)sequence in which they are concentrated.
Therefore, based on the former interpretation, the motivation for the study as well as the usefulness of the statistic V _{ n,k }=(D _{ n,k },G _{ n,k },S _{ n,k }) is apparent. In the sequel, we assume that G _{ n,k }≥2 in order to have at least two k-tuples of 1s in the sequence and accordingly the distance D _{ n,k } is not a degenerate one. Moreover, this assumption is a common one in an application area of D _{ n,k }; e.g., in detecting pattern (tandem or non-tandem direct) repeats in DNA sequences (Benson 1999).
For 1≤k≤n, set
and for n≥3, 1≤k≤⌊(n−1)/2⌋, define
and for (d,m,s)∈Ω _{ n,k },
The paper provides exact closed form expressions for α _{ n,k }, h _{ n,k }(d,m,s) and eventually for v _{ n,k }(d,m,s) when V _{ n,k } is defined on a 0−1 HMC1/IID. The expressions are obtained via combinatorial analysis.
More specifically, closed formulae are established for the first time for h _{ n,k }(d,m,s), 1≤k≤⌊(n−1)/2⌋, when V _{ n,k } is defined on a 0−1 HMC1 with given P and p ^{(1)}. Since the general framework of HMC1 covers IID sequences as a particular case, the implied expressions for v _{ n,k }(d,m,s) are alternative to those obtained for v _{ n,k }(d,m,s), 1≤k≤⌊(n−1)/2⌋, by Makri et al. (2015) for IID sequences.
Moreover, for n≥3, 1≤k≤⌊(n−1)/2⌋, 2k+1≤d≤n, let
Therefore, since
hence, the work provides closed form expressions for determining f _{ n,k }(d) for HMC1 and IID 0−1 sequences \(\{X_{t}\}_{t=1}^{n}\). These expressions are alternative to those derived, for IID sequences, by Makri et al. (2015) for 1≤k≤⌊(n−1)/2⌋ as well as to those obtained, for HMC1, by Arapis et al. (2016) for k=1 and by Arapis et al. (2017) for 1≤k≤⌊(n−1)/2⌋.
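For small n the quantities of this section can be obtained by exhaustive enumeration, which gives an independent check of any closed formula. The sketch below is not the paper's method. It assumes, following the Abstract, that α _{ n,k }=P(G _{ n,k }≥2), that v _{ n,k }(d,m,s) is the joint PMF of (D _{ n,k },G _{ n,k },S _{ n,k }) conditional on G _{ n,k }≥2, and that f _{ n,k }(d) is its marginal in d. All function names are hypothetical.

```python
from collections import defaultdict
from itertools import groupby, product


def v_statistic(bits, k):
    """(D, G, S) for a tuple of 0/1 ints, or None when there is no k-tuple of 1s."""
    pos, tuples = 1, []
    for symbol, group in groupby(bits):
        length = len(list(group))
        if symbol == 1 and length >= k:
            tuples.append((pos, length))
        pos += length
    if not tuples:
        return None
    d = tuples[-1][0] + tuples[-1][1] - tuples[0][0]
    return (d, len(tuples), sum(length for _, length in tuples))


def sequence_probability(bits, P, p1):
    """Probability of one realisation of the chain with matrix P and initial vector p1."""
    prob = p1[bits[0]]
    for prev, cur in zip(bits, bits[1:]):
        prob *= P[prev][cur]
    return prob


def conditional_pmf(n, k, P, p1):
    """Return (alpha, v, f) by summing over all 2**n sequences."""
    h = defaultdict(float)                      # unconditional mass on {G >= 2}
    for bits in product((0, 1), repeat=n):
        stat = v_statistic(bits, k)
        if stat is not None and stat[1] >= 2:
            h[stat] += sequence_probability(bits, P, p1)
    alpha = sum(h.values())
    v = {key: value / alpha for key, value in h.items()}
    f = defaultdict(float)
    for (d, _, _), value in v.items():
        f[d] += value
    return alpha, v, dict(f)


# HMC1 with p00 = p11 = 0.9 and a uniform initial vector, n = 8, k = 2.
alpha, v, f = conditional_pmf(8, 2, ((0.9, 0.1), (0.1, 0.9)), (0.5, 0.5))
print(alpha, sorted(f.items()))
```

The cost grows as 2^n, so the enumeration is useful only as a cross-check for short sequences.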
Results
In a 0−1 sequence \(\{X_{t}\}_{t=1}^{n}\), n≥2, for 0≤y≤n, 0≤r≤⌊(n+1)/2⌋ and (i,j)∈{0,1}^{2}, define
Accordingly, for a HMC1 \(\{X_{t}\}_{t=1}^{n}\), n≥2, with given P and p ^{(1)}, it holds
for 2−(i+j)≤y≤n−(i+j), 1−δ _{ y,0}−δ _{ y,n }+δ _{ i+j,2}≤r≤ min{n−y,y−1+i+j} and \(\pi _{n}^{(i,j)}(y,r)=0\), otherwise.
Consequently, \(\pi _{n}^{(i,j)}(y,r)\), for a 0−1 IID sequence, reduces to
Theorem 1
For n≥3, (d,m,s)∈Ω _{ n,1}, \(0<p_{1}^{(1)}<1\), it holds
where ε _{ n }(d)=1, if n=d; \(p_{00}^{n-d-2}\left \{p_{10}p_{00}+p_{0}^{(1)}(p_{1}^{(1)})^{-1}p_{01}\left [(n-d-1)p_{10}+p_{00}\right ]\right \}\), if n≥d+1.
Proof
For d=3,…,n−2, i=2,3,…,n−d, s=2,3,…,d−1, m=2,3,…, min{s,d−s+1} an element of the event \(\Gamma _{i,d,m,s}=\{U_{n,1}^{(1)}=i, D_{n,1}=d, R_{n}=m, S_{n,1}=s\}\) is a 0−1 sequence of length n with probability
Fix i. Then the number of elements of the event Γ _{ i,d,m,s } is \({s-1\choose m-1}{d-s-1\choose m-2}\), since the number of allocations of s 1s in m runs of 1s is \({s-1\choose m-1}\) and the number of allocations of d−s 0s in m−1 runs of 0s is \({d-s-1\choose m-2}\), so that
We use similar reasoning for the remaining cases. Then summing with respect to i we get the result. □
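The counting step of the proof can be verified by enumeration for small d. The sketch below counts the 0−1 strings of length d that start and end with a 1, contain m runs of 1s and s 1s in total, and compares the count with the product of the two binomial coefficients used above. The function names are illustrative only.

```python
from itertools import groupby, product
from math import comb


def count_segments(d, m, s):
    """Brute-force count of length-d strings with a 1 at both ends, m runs of 1s and s 1s."""
    count = 0
    for inner in product((0, 1), repeat=d - 2):
        bits = (1,) + inner + (1,)
        runs = sum(1 for symbol, _ in groupby(bits) if symbol == 1)
        if runs == m and sum(bits) == s:
            count += 1
    return count


def closed_form(d, m, s):
    """s 1s split into m runs times d-s 0s split into the m-1 separating runs."""
    return comb(s - 1, m - 1) * comb(d - s - 1, m - 2)


# d = 5, m = 2, s = 3: the two strings are 11001 and 10011.
print(count_segments(5, 2, 3), closed_form(5, 2, 3))
```

Here `math.comb` is called only with nonnegative arguments (2≤m≤s≤d−1), so the extended binomial convention of the paper is not needed.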
For a sequence \(\{X_{t}\}_{t=1}^{n}\) of 0−1 IID RVs, h _{ n,1}(d,m,s) reduces to the explicit formula given in the next Corollary.
Corollary 1
For n≥3, (d,m,s)∈Ω _{ n,1}, 0<p _{1}<1, it is true that
In order to derive for HMC1, in the forthcoming Theorem 2, h _{ n,k }(d,m,s), 5≤2k+1≤n, we next recall, in Lemma 1, a result from (Makri et al.: On the concentration of runs of ones of length exceeding a threshold in a Markov chain, submitted).
Lemma 1
For (i,j)∈{0,1}^{2}, n≥2, set \(\lambda _{n,k}^{(i,j)}(x)=P(G_{n,k}=x,X_{1}=i,X_{n}=j)\), x=0,1. Then, it holds that:
(I) For 2≤k≤n−2+i+j,
(II) For k>n−2+i+j,
Theorem 2
For n≥5, 2≤k≤⌊(n−1)/2⌋, (d,m,s)∈Ω _{ n,k }, \(0<p_{1}^{(1)}<1\), it holds
where
and
Proof
For 1≤r _{1}≤r _{2}≤n let \(Y_{r_{1},r_{2}}\), \(R_{r_{1},r_{2}}\), \(L_{r_{1},r_{2}}\), \(S_{r_{1},r_{2},k}\), \(D_{r_{1},r_{2},k}\), \(G_{r_{1},r_{2},k}\) be RVs defined on the subsequence \(X_{r_{1}}, X_{r_{1}+1},\ldots,X_{r_{2}}\) of \(\{X_{t}\}_{t=1}^{n}\). For m≥2 define the event
An element of this event is a 0−1 sequence of length d, starting and ending with a 1, for which y _{ j }’s and z _{ j }’s, representing the lengths of the failure and success runs, respectively, satisfy the conditions:
(a) y _{1}+y _{2}+…+y _{ r−1}=y, y _{ j }≥1, 1≤j≤r−1.
(b) \(z_{1}+z_{i_{1}}+z_{i_{2}}+\ldots +z_{i_{m-2}}+z_{r}=s\), z _{ j }≥k, j∈{1,i _{1},i _{2},…,i _{ m−2},r}, for some specific combination {1,i _{1},i _{2},…,i _{ m−2},r} of {1,2,…,r−1,r} among the \({r-2\choose m-2}\) ones.
(c) \(z_{i_{m-1}}+z_{i_{m}}+\ldots +z_{i_{r-2}}=d-y-s\), \(1\leq z_{i_{j}}\leq k-1\), m−1≤j≤r−2, for {i _{ m−1},…,i _{ r−2}}∈{1,2,…,r}−{1,i _{1},i _{2},…,i _{ m−2},r}.
Fix i _{1},i _{2},…,i _{ m−2}. Then the number of such sequences, i.e. the number of solutions of the system (a)-(c), is
and each such sequence has probability
Hence,
For k+2≤i≤n−k−d, m≥2, we have that
By similar reasoning we get the remaining cases of i, i.e. 1≤i≤k+1 and n−d+1−k≤i≤n−d+1. Then summing with respect to i, y and r we get the result. □
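The system (a)-(c) used in the proof can also be checked numerically. The sketch below derives a count directly from the three conditions (compositions of y into r−1 positive parts, of s into m parts of size at least k, and of d−y−s into r−m parts between 1 and k−1, times the number of ways to place the m−2 interior k-tuples) and compares it with brute-force enumeration. The product is a reading of (a)-(c) made here for illustration and is not copied from the paper's displayed formula; all names are hypothetical.

```python
from collections import Counter
from itertools import groupby, product
from math import comb


def bounded_compositions(total, parts, low, high):
    """Ways to write `total` as an ordered sum of `parts` integers in [low, high]."""
    ways = {0: 1}
    for _ in range(parts):
        nxt = {}
        for partial, w in ways.items():
            for x in range(low, high + 1):
                nxt[partial + x] = nxt.get(partial + x, 0) + w
        ways = nxt
    return ways.get(total, 0)


def closed_count(d, y, r, m, s, k):
    """Segments of length d with y 0s, r runs of 1s, m k-tuples (first and last included), s 1s in them."""
    if m < 2 or r < m or y < r - 1 or s < m * k:
        return 0
    positions = comb(r - 2, m - 2)                          # interior k-tuples among interior runs
    zero_runs = comb(y - 1, r - 2)                          # condition (a)
    long_runs = comb(s - m * k + m - 1, m - 1)              # condition (b)
    short_runs = bounded_compositions(d - y - s, r - m, 1, k - 1)  # condition (c)
    return positions * zero_runs * long_runs * short_runs


def tally_segments(d, k):
    """Brute force: classify every admissible segment of length d by (y, r, m, s)."""
    tally = Counter()
    for inner in product((0, 1), repeat=d - 2):
        bits = (1,) + inner + (1,)
        runs = [len(list(g)) for symbol, g in groupby(bits) if symbol == 1]
        if len(runs) < 2 or runs[0] < k or runs[-1] < k:
            continue
        long_runs = [z for z in runs if z >= k]
        tally[(d - sum(bits), len(runs), len(long_runs), sum(long_runs))] += 1
    return tally


tally = tally_segments(9, 2)
print(all(closed_count(9, y, r, m, s, 2) == c for (y, r, m, s), c in tally.items()))
```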
Having found h _{ n,k }(d,m,s), we next proceed to obtain v _{ n,k }(d,m,s). In accomplishing it, the required probabilities α _{ n,k } for HMC1 are recalled, in Lemma 2, from Arapis et al. (2016) for k=1, and they are computed via Lemma 1 for 2≤k≤⌊(n−1)/2⌋.
Lemma 2
For n≥k≥1, the probability α _{ n,k }, for HMC1, is computed via the expressions:
(I) For k=1,
and
(II) For 2≤k≤n,
Theorem 3
For n≥3, 1≤k≤⌊(n−1)/2⌋, (d,m,s)∈Ω _{ n,k }, \(0<p_{1}^{(1)}<1\), the PMF v _{ n,k }(d,m,s) for a HMC1, with given P and p ^{(1)}, is calculated by
where α _{ n,k } and h _{ n,k }(d,m,s) are provided by Lemma 2 and Theorems 1 (for k=1) and 2 (for 2≤k≤⌊(n−1)/2⌋), respectively.
Remark 1
For IID sequences, in implementing Theorem 3, one has to take into consideration Eqs. (10)-(11), (19) and (21). Moreover, for speeding up calculations, one has to set π _{ n }(y) in front of the inner summation in (22).
A numerical example
In this example we compute some indicative numerics concerning the two model (i.e. HMC1 and IID) 0−1 sequences \(\{X_{t}\}_{t=1}^{n}\) which are considered in the paper. Both sequences have the same small length, n=8, partly because of space limitations and partly so that the required computations can also be carried out with a pocket calculator, which helps to gain insight into the formulae developed in Section 4. The sequences that have been used are as follows. Table 1: An IID sequence with p _{1}=0.5. Table 2: A HMC1 sequence with p _{00}=p _{11}=0.9, \(p_{1}^{(1)}=0.5\).
Both tables depict for k=1,2,3, v _{8,k }(d,m,s), (d,m,s)∈Ω _{8,k } and f _{8,k }(d), 2k+1≤d≤8 illustrating the numeric values of the involved probabilities. v _{8,k }(d,m,s) and f _{8,k }(d) were computed via Eqs. (29) and (17), respectively.
Discussion and further study
In this article we have derived exact closed form expressions for the PMF v _{ n,k }(d,m,s), n≥3, 1≤k≤⌊(n−1)/2⌋, (d,m,s)∈Ω _{ n,k }, of the RV V _{ n,k } conditional on G _{ n,k }≥2, defined on a 0−1 sequence of homogeneous Markov-dependent trials. The method used is a combinatorial one, relying on results that exploit the internal structure of such a sequence.
As noted in the Introduction, the application domain of runs covers a diverse range of fields. Some indicative ones are discussed next.
Encoding, compression and transmission of digital information call for an understanding of the distributions of runs of 1s or 0s. Such knowledge helps in analyzing, and also in comparing, several techniques used in communication networks. In such networks 0−1 data ranging from a few kilobytes (e.g. emails) to many gigabytes of greedy multimedia applications (e.g. video on demand) are heavily encoded, decoded and eventually processed securely. For details, see e.g., Sinha and Sinha (2009), Makri and Psillakis (2011a) and Tabatabaei and Zivic (2015).
An area where the study of runs of 1s and 0s has become increasingly useful is the field of bioinformatics or computational biology. For instance, molecular biologists design similarity tests between two DNA sequences where a 1 is interpreted as a match of the sequences at a given position and everything else as a 0. Moreover, the probabilistic analysis of such sequences according to the form, the length and the number of detected patterns, as well as the positions and the lengths of the segments of the sequence in which they are concentrated, may suggest a functional reason for the internal structure of the examined sequence. These facts might be useful in suggesting a further investigation of the underlying sequence(s) by biologists. See, e.g. Avery and Henderson (1999), Benson (1999) and Nuel et al. (2010).
Another active area where run statistics, in particular G _{ n,k } and S _{ n,k }, have interesting statistical applications is that connected to hypothesis testing; e.g., in tests of randomness. For a systematic study of such a topic we refer, among others, to the works of Koutras and Alexandrou (1997) and Antzoulakos et al. (2003).
Accordingly, it is reasonable for one to use the exact expressions obtained for v _{ n,k }(d,m,s) in applications like the ones mentioned above. This is so, because this distribution, as a joint one, is more flexible than each one of its marginals which have been used in such applications. See, e.g. Lou (2003), Makri and Psillakis (2011b) and Arapis et al. (2016).
Moreover, in handling 0−1 sequences of a large length, with dependent or independent elements, a Monte Carlo simulation based on Eqs. (1)-(4) would be a useful tool in obtaining approximate values for v _{ n,k }(d,m,s). In addition, the general approximating methods, suggested by Johnson and Fu (2014), might be helpful in deriving approximate values for f _{ n,k }(d).
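A Monte Carlo estimate of the kind mentioned above could be organised as in the following sketch. It is only one possible arrangement and not the authors' implementation: the chain is simulated trial by trial from P and p ^{(1)}, the statistics are computed from run lengths rather than through the formal definitions, and only the simulated sequences with at least two k-tuples of 1s are retained. The function names, the seed handling and the number of draws are assumptions.

```python
import random
from collections import Counter
from itertools import groupby


def simulate_hmc1(n, P, p1, rng):
    """Draw X_1..X_n from a two-state chain with matrix P and initial vector p1."""
    bits = [1 if rng.random() < p1[1] else 0]
    for _ in range(n - 1):
        bits.append(1 if rng.random() < P[bits[-1]][1] else 0)
    return bits


def v_statistic(bits, k):
    """(D, G, S) for a list of 0/1 ints, or None when there is no k-tuple of 1s."""
    pos, tuples = 1, []
    for symbol, group in groupby(bits):
        length = len(list(group))
        if symbol == 1 and length >= k:
            tuples.append((pos, length))
        pos += length
    if not tuples:
        return None
    d = tuples[-1][0] + tuples[-1][1] - tuples[0][0]
    return (d, len(tuples), sum(length for _, length in tuples))


def estimate_conditional_pmf(n, k, P, p1, draws, seed=0):
    """Relative frequencies of (d, m, s) among simulated sequences with G >= 2."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(draws):
        stat = v_statistic(simulate_hmc1(n, P, p1, rng), k)
        if stat is not None and stat[1] >= 2:
            counts[stat] += 1
    kept = sum(counts.values())
    return {key: c / kept for key, c in counts.items()} if kept else {}


estimate = estimate_conditional_pmf(200, 3, ((0.9, 0.1), (0.1, 0.9)), (0.5, 0.5), draws=5000)
print(len(estimate), "distinct (d, m, s) triples observed")
```

Because the estimate conditions on G _{ n,k }≥2, the effective sample size is the number of retained draws, which can be much smaller than `draws` when that event is rare.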
References
Antzoulakos, DL, Bersimis, S, Koutras, MV: On the distribution of the total number of run lengths. Ann. Inst. Statist. Math. 55, 865–884 (2003).
Antzoulakos, DL, Chadjiconstantinidis, S: Distributions of numbers of success runs of fixed length in Markov dependent trials. Ann. Inst. Statist. Math. 53, 559–619 (2001).
Arapis, AN, Makri, FS, Psillakis, ZM: On the length and the position of the minimum sequence containing all runs of ones in a Markovian binary sequence. Statist. Probab. Lett. 116, 45–54 (2016).
Arapis, AN, Makri, FS, Psillakis, ZM: Distribution of statistics describing concentration of runs in non homogeneous Markov-dependent trials. Commun. Statist. Theor. Meth. (2017). doi:10.1080/03610926.2017.1337144.
Avery, PJ, Henderson, D: Fitting Markov chain models to discrete state series such as DNA sequences. Appl. Statist. 48(Part 1), 53–61 (1999).
Balakrishnan, N, Koutras, MV: Runs and Scans with Applications. Wiley, New York (2002).
Benson, G: Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27, 573–580 (1999).
Eryilmaz, S: Some results associated with the longest run statistic in a sequence of Markov dependent trials. Appl. Math. Comput. 175, 119–130 (2006).
Eryilmaz, S: Discrete time shock models involving runs. Statist. Probab. Lett. 107, 93–100 (2015).
Eryilmaz, S: Generalized waiting time distributions associated with runs. Metrika. 79, 357–368 (2016).
Eryilmaz, S: The concept of weak exchangeability and its applications. Metrika. 80, 259–271 (2017).
Eryilmaz, S, Yalcin, F: Distribution of run statistics in partially exchangeable processes. Metrika. 73, 293–304 (2011).
Feller, W: An Introduction to Probability Theory and Its Applications. 3rd Ed., Vol. I. Wiley, New York (1968).
Fu, JC, Lou, WYW: Distribution Theory of Runs and Patterns and Its Applications: A finite Markov chain imbedding approach. World Scientific, River Edge (2003).
Johnson, BC, Fu, JC: Approximating the distributions of runs and patterns. J. Stat. Distrib. Appl. 1:5, 1–15 (2014).
Koutras, MV: Applications of Markov chains to the distribution of runs and patterns. In: Shanbhag, DN, Rao, CR (eds.) Handbook of Statistics, pp. 431–472. Elsevier, North-Holland (2003).
Koutras, MV, Alexandrou, V: Nonparametric randomness tests based on success runs of fixed length. Statist. Probab. Lett. 32, 393–404 (1997).
Koutras, VM, Koutras, MV, Yalcin, F: A simple compound scan statistic useful for modeling insurance and risk management problems. Insur. Math. Econ. 69, 202–209 (2016).
Lou, WYW: The exact distribution of the k-tuple statistic for sequence homology. Statist. Probab. Lett. 61, 51–59 (2003).
Makri, FS, Philippou, AN, Psillakis, ZM: Success run statistics defined on an urn model. Adv. Appl. Prob. 39, 991–1019 (2007).
Makri, FS, Psillakis, ZM: On success runs of a fixed length in Bernoulli sequences: Exact and asymptotic results. Comput. Math. Appl. 61, 761–772 (2011a).
Makri, FS, Psillakis, ZM: On runs of length exceeding a threshold: normal approximation. Stat. Papers. 52, 531–551 (2011b).
Makri, FS, Psillakis, ZM: On ℓ-overlapping runs of ones of length k in sequences of independent binary random variables. Commun. Statist. Theor. Meth. 44, 3865–3884 (2015).
Makri, FS, Psillakis, ZM, Arapis, AN: Counting runs of ones with overlapping parts in binary strings ordered linearly and circularly. Intern. J. Statist. Probab. 2, 50–60 (2013).
Makri, FS, Psillakis, ZM, Arapis, AN: Length of the minimum sequence containing repeats of success runs. Statist. Probab. Lett. 96, 28–37 (2015).
Mood, AM: The distribution theory of runs. Ann. Math. Statist. 11, 367–392 (1940).
Mytalas, GC, Zazanis, MA: Central limit theorem approximations for the number of runs in Markov-dependent binary sequences. J. Statist. Plann. Infer. 143, 321–333 (2013).
Mytalas, GC, Zazanis, MA: Central limit theorem approximations for the number of runs in Markov-dependent multi-type sequences. Commun. Statist. Theor. Meth. 43, 1340–1350 (2014).
Nuel, G, Regad, L, Martin, J, Camproux, AC: Exact distribution of a pattern in a set of random sequences generated by a Markov source: applications to biological data. Algorithm Mol. Biol. 5, 1–18 (2010).
Riordan, J: An Introduction to Combinatorial Analysis. Second Ed. John Wiley, New York (1964).
Sinha, K, Sinha, BP: On the distribution of runs of ones in binary trials. Comput. Math. Appl. 58, 1816–1829 (2009).
Tabatabaei, SAH, Zivic, N: A review of approximate message authentication codes. In: Zivic, N (ed.) Robust Image Authentication in the Presence of Noise, pp. 106–127. Springer International Publishing AG, Cham (ZG), Switzerland (2015).
Acknowledgements
The authors wish to thank the Editor for the thorough reading, and the anonymous reviewers for useful comments and suggestions which improved the article.
Author information
Contributions
The authors, ANA, FSM and ZMP with the consultation of each other carried out this work and drafted the manuscript together. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License(http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Arapis, A., Makri, F. & Psillakis, Z. Joint distribution of k-tuple statistics in zero-one sequences of Markov-dependent trials. J Stat Distrib App 4, 26 (2017). https://doi.org/10.1186/s40488-017-0080-5
Keywords
 Exact Distributions
 Runs
 Binary trials
 Markov chain
AMS Subject Classification
 Primary 60E05, 62E15
 Secondary 60J10, 60C05