# Approximating the distributions of runs and patterns

## Abstract

The distribution theory of runs and patterns has been successfully used in a variety of applications including, for example, nonparametric hypothesis testing, reliability theory, quality control, DNA sequence analysis, general applied probability and computer science. The exact distributions of the number of runs and patterns are often very hard to obtain or computationally problematic, especially when the pattern is complex and n is very large. Normal, Poisson and compound Poisson approximations are frequently used to approximate these distributions. In this manuscript, we (i) study the asymptotic relative error of the normal, Poisson, compound Poisson, finite Markov chain imbedding and large deviation approximations; and (ii) provide some numerical studies comparing these approximations with the exact probabilities for moderately sized n. Both theoretical and numerical results show that, in the relative sense, the finite Markov chain imbedding approximation performs best in the left tail and the large deviation approximation performs best in the right tail.

### AMS Subject Classification

Primary 60E05; Secondary 60J10

## Introduction and notation

Let ${\left\{{X}_{i}\right\}}_{i=1}^{n}$ be a sequence of m-state trials (m≥2) taking values in the set $\mathcal{S}=\left\{{s}_{1},\dots ,{s}_{m}\right\}$ of m symbols. For simplicity, ${\left\{{X}_{i}\right\}}_{i=1}^{n}$ will be denoted {X i } and n will be allowed to be ∞. A simple pattern $\Lambda ={s}_{{i}_{1}}{s}_{{i}_{2}}\cdots {s}_{{i}_{\ell }}$, of length ℓ, is the juxtaposition of ℓ (not necessarily distinct) symbols from $\mathcal{S}$. Given a simple pattern Λ, we let X n (Λ) denote the number of either non-overlapping or overlapping occurrences of Λ in the sequence ${\left\{{X}_{i}\right\}}_{i=1}^{n}$, where the method of counting will be made clear by the context. The waiting time W(Λ,x) until the x’th occurrence of the simple pattern Λ in ${\left\{{X}_{i}\right\}}_{i=1}^{n}$ is thus defined by

$W\left(\Lambda ,x\right)=inf\left\{n\in \mathbb{N}:{X}_{n}\left(\Lambda \right)=x\right\},$

and, by convention, the waiting time for the first occurrence is denoted W(Λ)=W(Λ,1). Finally, we define the inter arrival times

${W}_{i}\left(\Lambda \right)=W\left(\Lambda ,i\right)-W\left(\Lambda ,i-1\right),\phantom{\rule{2em}{0ex}}\text{for}\phantom{\rule{1em}{0ex}}i=1,2,\dots \text{,}$

where W(Λ,0):=0.

We say that two patterns Λ1 and Λ2 are distinct if neither Λ1 appears in Λ2 nor Λ2 appears in Λ1. If Λ1,…,Λ r are pairwise distinct simple patterns, we define the compound pattern $\Lambda =\bigcup _{i=1}^{r}{\Lambda }_{i}$, where an occurrence of any Λ i is considered an occurrence of Λ. For a compound pattern $\Lambda ={\Lambda }_{1}\cup \cdots \cup {\Lambda }_{r}$, we similarly define

${X}_{n}\left(\Lambda \right)=\sum _{j=1}^{r}{X}_{n}\left({\Lambda }_{j}\right).$

The waiting times W(Λ,x), W(Λ) and W i (Λ) are then defined as above, and often referred to as sooner waiting times.

From these definitions it is easy to see that, for any simple or compound pattern Λ, x and n, the events {X n (Λ)<x} and {W(Λ,x)>n} are equivalent and hence

$\mathbb{P}\left\{{X}_{n}\left(\Lambda \right)<x\right\}=\mathbb{P}\left\{W\left(\Lambda ,x\right)>n\right\},$
(1)

which provides a convenient way of studying the exact and approximate distribution of X n (Λ) through the waiting time distributions of W(Λ,x).
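The duality in (1) can be checked on a single realization. The following is a minimal Python sketch, not part of the original development: the helper names `count_nonoverlapping`, `waiting_time` and `duality_holds` are hypothetical, sequences and patterns are represented as strings, and non-overlapping counting is implemented as a left-to-right scan that restarts after each completed occurrence.

```python
def count_nonoverlapping(seq, pattern):
    """Number of non-overlapping occurrences of `pattern` in `seq` (left-to-right scan)."""
    count, i, ell = 0, 0, len(pattern)
    while i + ell <= len(seq):
        if seq[i:i + ell] == pattern:
            count += 1
            i += ell          # restart the scan after a completed occurrence
        else:
            i += 1
    return count


def waiting_time(seq, pattern, x):
    """Smallest n with X_n(pattern) = x, or None if x occurrences are never reached in seq."""
    for n in range(1, len(seq) + 1):
        if count_nonoverlapping(seq[:n], pattern) == x:
            return n
    return None


def duality_holds(seq, pattern, max_x):
    """Check that {X_n < x} and {W(x) > n} coincide for all n <= len(seq) and 1 <= x <= max_x."""
    for x in range(1, max_x + 1):
        w = waiting_time(seq, pattern, x)
        for n in range(len(seq) + 1):
            fewer = count_nonoverlapping(seq[:n], pattern) < x
            not_yet = (w is None) or (w > n)   # None means W exceeds the observed length
            if fewer != not_yet:
                return False
    return True
```

For the string `"SSFSSSSFSS"` and the pattern `"SS"`, the scan finds four non-overlapping occurrences and the second one is completed at n=5.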

Throughout this paper, unless specified otherwise, we assume that the trials {X i } are either independent and identically distributed (i.i.d.) or first order Markov dependent; the pattern Λ is either simple or compound; and the counting of occurrences of Λ is in a non-overlapping fashion.

The distribution of the number of runs and patterns in a sequence of multi-state trials or random permutations of a set of integers has been successfully used in various fields in applied probability, statistics and discrete mathematics. Examples include reliability theory, quality control, DNA sequence analysis, psychology, ecology, astronomy, nonparametric tests, successions, and the Eulerian and Simon-Newcomb numbers (the latter three being defined for permutations). Two recent books, Balakrishnan and Koutras (2002) and Fu and Lou (2003), provide some scope of the distribution theory of runs and patterns, and Martin et al. (2010) and Nuel et al. (2010) provide some extensions to sets of sequences.

Given a pattern Λ, the exact distribution of X n (Λ) traditionally has been determined using combinatoric analysis on a case by case basis. The formulae for these distributions are often very complex and computationally problematic. Even for many simple patterns, their distributions in terms of combinatoric analysis remain unknown, especially when the {X i } are Markov dependent multi-state trials.

The waiting time W(Λ) for the first occurrence of certain types of runs and patterns has been studied by many authors. See, for example, Blom and Thorburn (1982), Gerber and Li (1981), Schwager (1983), and Solov’ev (1966). More recently, Fu and Koutras (1994) developed a method for determining the exact distributions of X n (Λ) and W(Λ) for any simple or compound Λ in either i.i.d. or Markov dependent trials (see also Fu and Lou 2003). The method is referred to as the Finite Markov Chain Imbedding (FMCI) technique and can be described as follows: given a simple or compound pattern Λ, there exists a finite Markov chain {Y i } defined on a finite state space, say Ω={1,…,d,α}, with an absorbing state α and transition probability matrix of the form

$\left[\begin{array}{cc}\mathbf{N}& \mathbf{c}\\ \mathbf{0}& 1\end{array}\right],$
(2)

where c is a column vector. The distribution of the waiting time for Λ is given by

$\mathbb{P}\left\{W\left(\Lambda \right)=n\right\}={\mathbit{\xi }}_{0}{\mathbf{N}}^{n-1}\left(\mathbf{\text{I}}-\mathbf{N}\right){1}^{\prime }$
(3)

where ξ0 is the initial distribution, N is the essential transition probability matrix (i.e. the sub-stochastic matrix consisting of only the transient states of {Y i }) as defined in (2), I is a d×d identity matrix and 1=(1,1,…,1) is a 1×d row-vector. Furthermore, the random variable X n (Λ), the number of occurrences of Λ in {X i }, is also finite Markov chain imbeddable and its distribution is given by

$\mathbb{P}\left\{{X}_{n}\left(\Lambda \right)<x\right\}=\mathbb{P}\left\{W\left(\Lambda ,x\right)>n\right\}={\mathbit{\xi }}_{0}{\mathbf{N}}_{x}^{n}{1}^{\prime },$
(4)

where the essential transition probability matrix N x has the form

${\mathbf{N}}_{x}=\left[\begin{array}{ccccc}\mathbf{N}& \mathbf{C}& & & \mathbf{0}\\ & \mathbf{N}& \mathbf{C}& & \\ & & \ddots & \ddots & \\ & & & \mathbf{N}& \mathbf{C}\\ \mathbf{0}& & & & \mathbf{N}\end{array}\right],$
(5)

the matrix N is given by (2), and the matrix C defines the “continuation” transition probabilities from one occurrence to the next and depends on c in (2).
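To make (4) and (5) concrete, the following Python sketch builds N, C and the block matrix N x for one specific case: the pattern consisting of a run of k successes in i.i.d. Bernoulli(p) trials. The state labelling (state i is the current run length), the choice of ξ0 as a point mass on the first state, and all function names are illustrative assumptions rather than the construction used in the cited references.

```python
def run_chain(k, p):
    """N and C for a run of k successes in i.i.d. Bernoulli(p) trials.

    State i (0 <= i <= k-1) is the current run length; this labelling is an assumption.
    """
    q = 1.0 - p
    N = [[0.0] * k for _ in range(k)]
    C = [[0.0] * k for _ in range(k)]
    for i in range(k):
        N[i][0] += q              # a failure resets the run
        if i < k - 1:
            N[i][i + 1] += p      # a success extends the run
    C[k - 1][0] = p               # completing the run starts the next count from scratch
    return N, C


def block_matrix(N, C, x):
    """N_x of (5): x diagonal blocks N with C on the block superdiagonal."""
    d = len(N)
    M = [[0.0] * (d * x) for _ in range(d * x)]
    for b in range(x):
        for i in range(d):
            for j in range(d):
                M[b * d + i][b * d + j] = N[i][j]
                if b + 1 < x:
                    M[b * d + i][(b + 1) * d + j] = C[i][j]
    return M


def prob_fewer_than(k, p, n, x):
    """P{X_n < x} = xi_0 N_x^n 1' as in (4), for non-overlapping counting."""
    N, C = run_chain(k, p)
    M = block_matrix(N, C, x)
    size = len(M)
    v = [0.0] * size
    v[0] = 1.0                    # xi_0: no occurrences yet, no partial run
    for _ in range(n):
        v = [sum(v[i] * M[i][j] for i in range(size)) for j in range(size)]
    return sum(v)
```

Repeated vector-matrix products are used instead of forming the matrix power explicitly, which keeps the cost at n multiplications of a vector by a sparse block matrix.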

If the pattern Λ is long and complex and n is very large, then the computation of $\mathbb{P}\left\{{X}_{n}\left(\Lambda \right)=x\right\}$ can become problematic and, to overcome this problem, various asymptotic approximations have been developed for these probabilities.

In real applications, if the exact distribution is not available or is hard to compute, it is important to know which approximations perform well and are easy to compute. Furthermore, it is important to know how these approximations perform with respect to each other and the exact distribution from both a theoretical and numerical standpoint. The aims of this manuscript are two-fold: (i) we first study the asymptotic relative error of the normal, Poisson (or compound Poisson), and FMCI approximations with respect to the exact distribution; and (ii) we then provide a numerical study comparing these three approximations with the exact probabilities in cases where x is fixed and n→∞ and where n is fixed and x varies. As an important byproduct, the FMCI technique allows the normal and Poisson approximations to be applied in more cases, for example, the distribution of compound patterns and patterns in Markov dependent trials.

## The approximations

### Normal approximation

The normal approximation is one of the most popular for approximating the distribution of the number of runs or patterns X n (Λ) in Statistics. In general, when Λ is simple or compound, the trials are i.i.d., and the counting is non-overlapping, by appealing to (1) and renewal arguments, it has been shown that X n (Λ) is asymptotically normally distributed (cf. Fu and Lou 2007; Karlin and Taylor 1975). The form of the approximation is

$\underset{n\to \infty }{lim}\mathbb{P}\left\{\frac{{X}_{n}\left(\Lambda \right)-n/{\mu }_{W}}{\sqrt{n{\sigma }_{W}^{2}{\mu }_{W}^{-3}}}\le u\right\}=\Phi \left(u\right),$
(6)

where Φ(·) denotes the standard normal distribution function and μ W and ${\sigma }_{W}^{2}$ are the mean and variance of W(Λ) respectively, which are given by

$\begin{array}{ll}{\mu }_{W}& ={\mathbit{\xi }}_{0}{\left(\mathbf{I}-\mathbf{N}\right)}^{-1}{1}^{\prime },\phantom{\rule{2.77626pt}{0ex}}\phantom{\rule{2.77626pt}{0ex}}\phantom{\rule{2.77626pt}{0ex}}\phantom{\rule{2.77626pt}{0ex}}\text{and}\phantom{\rule{2em}{0ex}}\end{array}$
(7)
$\begin{array}{ll}{\sigma }_{W}^{2}& ={\mathbit{\xi }}_{0}\left(\mathbf{I}+\mathbf{N}\right){\left(\mathbf{I}-\mathbf{N}\right)}^{-2}{1}^{\prime }-{\mu }_{W}^{2}.\phantom{\rule{2em}{0ex}}\end{array}$
(8)

Given a pattern Λ, it is well known that the mean μ W and the variance ${\sigma }_{W}^{2}$ are difficult to obtain via combinatoric arguments, especially when Λ is a compound pattern or the trials are Markov dependent. For example, as pointed out in Karlin (2005) and Kleffe and Borodovski (1992), approximate values of μ W and ${\sigma }_{W}^{2}$ must sometimes be used. Since W(Λ) is finite Markov chain imbeddable, (7) and (8) provide the exact values.
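Equations (7) and (8) reduce to two linear solves with the matrix I−N. The sketch below is a plain-Python illustration under stated assumptions: the pattern is a run of k successes in i.i.d. Bernoulli(p) trials, ξ0 is a point mass on the first state, and the helper names `run_matrix`, `solve` and `waiting_time_moments` are hypothetical.

```python
def run_matrix(k, p):
    """Essential matrix N for a run of k successes; state i is the current run length."""
    q = 1.0 - p
    N = [[0.0] * k for _ in range(k)]
    for i in range(k):
        N[i][0] += q
        if i < k - 1:
            N[i][i + 1] += p
    return N


def solve(A, b):
    """Solve A y = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):
        y[r] = (M[r][n] - sum(M[r][j] * y[j] for j in range(r + 1, n))) / M[r][r]
    return y


def waiting_time_moments(N):
    """Mean (7) and variance (8) of W, assuming xi_0 = (1, 0, ..., 0)."""
    d = len(N)
    i_minus_n = [[(1.0 if i == j else 0.0) - N[i][j] for j in range(d)] for i in range(d)]
    u = solve(i_minus_n, [1.0] * d)       # (I - N)^{-1} 1'
    w = solve(i_minus_n, u)               # (I - N)^{-2} 1'
    mu = u[0]
    second = w[0] + sum(N[0][j] * w[j] for j in range(d))   # xi_0 (I + N)(I - N)^{-2} 1'
    return mu, second - mu * mu
```

For Λ=SSS with p=1/2 this gives a mean of 14 and a variance of 142, matching the values quoted in the example following Corollary 1.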

The limit in (6) is appropriate when the sequence of inter arrival times {W i (Λ)} are i.i.d., which is the case for simple and compound patterns when the {X i } are i.i.d. and counting is non-overlapping. When occurrences of Λ correspond to a delayed renewal process, which can occur for Markov dependent trials and/or overlapping counting, we could use the mean and variance of W2(Λ) for the normalizing constants, which are easily obtained by modifying ξ0 in (7) and (8). Even more general cases can be handled by making use of a functional central limit theorem for Markov chains (see, for example, (Meyn and Tweedie 1993, §17.4) and (Asmussen 2003, Theorem 7.2, pg. 30) for the details).
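For point probabilities, the limit in (6) is usually applied with a continuity correction, as is done later in the proof of Theorem 3. The sketch below is one way to code this in Python; the function names are hypothetical, and μ W and σ W² are taken as inputs (for example from (7) and (8)).

```python
import math


def std_normal_cdf(u):
    """Standard normal distribution function Phi(u)."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


def z_score(x, n, mu_w, var_w):
    """Standardize x with the asymptotic mean n/mu_W and variance n var_W / mu_W^3."""
    return (x - n / mu_w) / math.sqrt(n * var_w / mu_w ** 3)


def normal_point_probability(x, n, mu_w, var_w):
    """Approximate P{X_n = x} by the normal mass on [x - 1/2, x + 1/2]."""
    upper = z_score(x + 0.5, n, mu_w, var_w)
    lower = z_score(x - 0.5, n, mu_w, var_w)
    return std_normal_cdf(upper) - std_normal_cdf(lower)
```

With μ W =14 and σ W² =142 (the values for Λ=SSS and p=1/2 quoted later in the paper), `normal_point_probability(x, 2000, 14.0, 142.0)` gives the approximation for X 2000 (Λ)=x.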

### Poisson and compound Poisson approximations

It is well known that, in a sequence of Bernoulli (p) trials, if np→λ as n→∞, then the probability of k successes in n trials can be approximated by a Poisson probability with parameter λ, denoted $P\left(\lambda \right)$. This idea has been extended to certain patterns Λ and, under certain conditions, the distribution of X n (Λ) can be approximated by a Poisson distribution with parameter μ n in the sense that

${d}_{\text{TV}}\left(ℒ\left({X}_{n}\left(\Lambda \right)\right),P\left({\mu }_{n}\right)\right)<{\epsilon }_{n},$
(9)

where $ℒ\left(·\right)$ denotes the distribution (law) of a random variable and dTV(·,·) denotes the total variation distance.

The primary tool used to obtain μ n and the bound ε n is the Stein-Chen method (Chen 1975), and this method has been refined by various authors: Arratia et al. (1990), Barbour and Eagleson (1983), Barbour and Eagleson (1984), Barbour and Eagleson (1987), Barbour and Hall (1984), Godbole (1990a), Godbole (1990b), Godbole (1991), Godbole and Schaffner (1993), and Holst et al. (1988). This method has also been extended to compound Poisson approximations for the distributions of runs and patterns, and Barbour and Chryssaphinou (2001) provide an excellent theoretical review of these approximations.

In practice, ${\mu }_{n}=\mathbb{E}{X}_{n}\left(\Lambda \right)$ or the expectation of a closely related run statistic is used (cf. Balakrishnan and Koutras 2002, §5.2.3) so that, in the former case,

$\mathbb{P}\left\{{X}_{n}\left(\Lambda \right)=x\right\}\approx \frac{{\left(\mathbb{E}{X}_{n}\left(\Lambda \right)\right)}^{x}}{x!}exp\left\{-\mathbb{E}{X}_{n}\left(\Lambda \right)\right\}.$
(10)

Finding $\mathbb{E}{X}_{n}\left(\Lambda \right)$ and the bound ε n is usually done on a case by case basis. For the mathematical details, the books (Barbour et al. 1992a) and (Balakrishnan and Koutras 2002) are recommended.
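Once a value for μ n has been chosen, (10) is a one-line computation. The Python sketch below evaluates it on the log scale to avoid overflow for large x; the function names are hypothetical, and using n/μ W as a stand-in for the expectation of X n (Λ) is an assumption that mirrors the choice made in the numerical section of this paper.

```python
import math


def poisson_pmf(x, mu):
    """Poisson probability mu^x exp(-mu) / x!, computed on the log scale."""
    return math.exp(x * math.log(mu) - mu - math.lgamma(x + 1))


def poisson_approximation(x, n, mu_w):
    """Approximate P{X_n = x} by a Poisson probability with parameter n / mu_W."""
    return poisson_pmf(x, n / mu_w)
```

For example, `poisson_approximation(0, 50, 14.0)` returns exp(−50/14), the Poisson approximation to the probability of no occurrence of SSS in 50 fair Bernoulli trials.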

Let ${P}_{\phantom{\rule{0.3em}{0ex}}c}\left(\lambda ,\nu \right)$ denote the compound Poisson distribution, that is, the distribution of the random variable $\sum _{j=1}^{M}{Y}_{j}$ where the random variable M has a Poisson distribution with parameter λ and the Y j are i.i.d. having distribution ν. A compound Poisson distribution for approximating nonnegative random variables was suggested in Barbour et al. (1992b) (see also Barbour et al. (1995, 1996)). The approximation is formulated similarly to the Poisson approximation:

${d}_{\text{TV}}\left(ℒ\left({X}_{n}\left(\Lambda \right)\right),{P}_{\phantom{\rule{0.3em}{0ex}}c}\left(\lambda ,\nu \right)\right)<{\epsilon }_{n}.$
(11)
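When ν is concentrated on the positive integers, the probabilities of the compound Poisson distribution can be computed with the standard Panjer recursion. The sketch below is an illustration only: the recursion is a general-purpose tool rather than the method of the cited papers, and the function name and the dictionary representation of ν are assumptions.

```python
import math


def compound_poisson_pmf(lam, nu, s_max):
    """Probabilities g[0..s_max] of sum_{j<=M} Y_j with M ~ Poisson(lam), Y_j ~ nu.

    `nu` maps positive integers j to nu_j (assumed to sum to 1, with no mass at 0).
    Panjer recursion: g[0] = exp(-lam), g[s] = (lam / s) * sum_j j * nu_j * g[s - j].
    """
    g = [math.exp(-lam)] + [0.0] * s_max
    for s in range(1, s_max + 1):
        g[s] = (lam / s) * sum(j * nu_j * g[s - j] for j, nu_j in nu.items() if j <= s)
    return g
```

With ν a point mass at 1 the recursion reduces to the ordinary Poisson probabilities, which gives a quick sanity check.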

The distribution of ${N}_{n,k}$, the number of non-overlapping occurrences of k consecutive successes in n i.i.d. Bernoulli trials, is one of the most important in this area and one of the most studied in the literature. Reversing the roles of S (success) and F (failure), the reliability of a consecutive-k-out-of-n:F system, denoted C(k,n : F), is given by $\mathbb{P}\left\{{N}_{n,k}=0\right\}$. Even in this simple case (i.e. Λ=SS⋯S), there are several ways to apply the Poisson approximation techniques. For example, (Godbole 1991, Theorem 2) shows that approximating ${N}_{n,k}$ with a $P\left(\mathbb{E}{N}_{n,k}\right)$ distribution works well if certain conditions hold. Godbole and Schaffner (1993, pg. 340) suggest an improved Poisson approximation for word patterns.

The primary difficulty in applying the Poisson approximation is the determination of the optimal parameter μ n , which is highly dependent on the structure of the pattern Λ. In particular, if Λ is long and has several uneven overlapping sub-patterns, then finding μ n by their method can be very tedious. In the sequel, we show that even the (asymptotic) best choice for μ n for Poisson approximations does not perform well in the relative sense.

### FMCI approximations

Approximations based on the FMCI approach depend on the spectral decomposition of the essential transition probability matrix N.

Let N be a w×w essential transition probability matrix associated with a finite Markov chain {Y n :n≥0} corresponding to the distribution of the waiting time W(Λ). Let 1>λ1≥|λ2|≥⋯≥|λ w | denote the ordered eigenvalues of N, repeated according to their algebraic multiplicities, with associated (right) eigenvectors ${\mathbit{\eta }}_{1}^{\prime },{\mathbit{\eta }}_{2}^{\prime },\cdots \phantom{\rule{0.3em}{0ex}},{\mathbit{\eta }}_{w}^{\prime }$. When the geometric multiplicity of λ i is less than its algebraic multiplicity, we will use vectors of 0’s for the unspecified eigenvectors. The fact that λ1 can be taken as a positive real number and that η1 can be taken to be non-negative are consequences of the Perron-Frobenius Theorem for non-negative matrices (cf. Seneta 1981).

#### Definition 1

We will say that {Y n :n≥0}, or equivalently, N, satisfies the FMCI Approximation Conditions if

(i) there exist constants a 1,…,a w such that

${1}^{\prime }=\sum _{i=1}^{w}{a}_{i}{\mathbit{\eta }}_{i}^{\prime },$
(12)

(ii) λ 1 has algebraic multiplicity g and λ 1>|λ j | for all j>g.

Verifying these conditions is usually straightforward. They certainly hold if N is irreducible and aperiodic, but also hold in many other cases. For example, (12) requires only that 1 is in the linear space spanned by $\left\{{\mathbit{\eta }}_{1}^{\prime },{\mathbit{\eta }}_{2}^{\prime },\cdots \phantom{\rule{0.3em}{0ex}},{\mathbit{\eta }}_{w}^{\prime }\right\}$, which can hold even when N is defective (not diagonalizable). Condition (ii) requires that the communication classes corresponding to λ1 are aperiodic. That is, if Ψ is a communication class and N[Ψ] corresponds to the substochastic matrix N restricted to the states in Ψ, with largest eigenvalue λ1[Ψ], then all Ψ such that λ1[Ψ]=λ1 should be aperiodic. We also mention that the algebraic multiplicity of λ1 is the number of communication classes Ψ such that λ1[Ψ]=λ1.

Fu and Johnson (2009) give the following theorem.

#### Theorem 1

Let {X i } be a sequence of i.i.d. trials taking values in $\mathcal{S}$, let Λ be a simple pattern of length ℓ with d×d essential transition probability matrix N and let X n (Λ) be the number of non-overlapping occurrences of Λ in {X i }. If N satisfies the FMCI approximation conditions then, for any fixed x≥0,

$\mathbb{P}\left\{{X}_{n}\left(\Lambda \right)=x\right\}\sim {a}^{x+1}\left(\genfrac{}{}{0.0pt}{}{n-x\left(\ell -1\right)}{x}\right){\left(1-{\lambda }_{1}\right)}^{x}{\lambda }_{1}^{n-x},$
(13)

where $a=\sum _{j=1}^{g}{a}_{j}\left({\mathbit{\xi }}_{0}{\mathbit{\eta }}_{j}^{\prime }\right)$. If g=1, as is usually the case, then $a={a}_{1}\left({\mathbit{\xi }}_{0}{\mathbit{\eta }}_{1}^{\prime }\right)$.

Given a pattern Λ, the approximation in (13) requires finding the Markov chain imbedding associated with the waiting time W(Λ), the essential transition probability matrix N as well as its eigenvalues and associated eigenvectors. Usually, these steps are rather simple and can be easily automated together with (13). Even for very large n and large ℓ, say n=1,000,000 and ℓ=50, the CPU time is negligible. Fu and Johnson (2009) also provide details on extending these results to compound patterns, overlapping counting and Markov dependent trials.
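The steps behind (13) can be sketched in a few lines of Python. The sketch assumes g=1 (a simple dominant eigenvalue), a primitive matrix N so that power iteration converges, and ξ0 equal to a point mass on the first state; it obtains a 1 from the left eigenvector through the biorthogonality relation a 1 =(l 1′)/(l η1′), which is a standard linear algebra identity and not a formula taken from the paper. All function names are hypothetical, and the example chain is the run of k successes in i.i.d. Bernoulli(p) trials.

```python
import math


def run_matrix(k, p):
    """Essential matrix N for a run of k successes; state i is the current run length."""
    q = 1.0 - p
    N = [[0.0] * k for _ in range(k)]
    for i in range(k):
        N[i][0] += q
        if i < k - 1:
            N[i][i + 1] += p
    return N


def dominant_eigen(N, iters=2000):
    """Power iteration for lambda_1 with right and left eigenvectors (each scaled to sum 1)."""
    d = len(N)
    right = [1.0 / d] * d
    left = [1.0 / d] * d
    lam = 0.0
    for _ in range(iters):
        r = [sum(N[i][j] * right[j] for j in range(d)) for i in range(d)]
        l = [sum(left[i] * N[i][j] for i in range(d)) for j in range(d)]
        lam = sum(r)                      # right sums to 1, so this estimates lambda_1
        right = [v / lam for v in r]
        total = sum(l)
        left = [v / total for v in l]
    return lam, right, left


def fmci_constant(right, left):
    """a = a_1 (xi_0 eta_1'), assuming g = 1 and xi_0 = (1, 0, ..., 0)."""
    a1 = sum(left) / sum(lv * rv for lv, rv in zip(left, right))
    return a1 * right[0]


def fmci_approximation(x, n, ell, lam, a):
    """Right-hand side of (13)."""
    return a ** (x + 1) * math.comb(n - x * (ell - 1), x) * (1.0 - lam) ** x * lam ** (n - x)


def exact_zero_probability(N, n):
    """P{X_n = 0} = xi_0 N^n 1', used here only as a reference value."""
    d = len(N)
    v = [1.0] + [0.0] * (d - 1)
    for _ in range(n):
        v = [sum(v[i] * N[i][j] for i in range(d)) for j in range(d)]
    return sum(v)
```

For x=0 the approximation reduces to aλ1ⁿ, so comparing it with `exact_zero_probability` is a direct check of the computed a and λ1.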

For the purpose of comparing these approximations, we prefer to write (13) as

$\begin{array}{ll}\mathbb{P}\left\{{X}_{n}\left(\Lambda \right)=x\right\}& \sim {a}^{x+1}{\left(\frac{1-{\lambda }_{1}}{{\lambda }_{1}}\right)}^{x}\left(\genfrac{}{}{0.0pt}{}{n-x\left(\ell -1\right)}{x}\right)exp\left\{nln{\lambda }_{1}\right\}\phantom{\rule{2em}{0ex}}\end{array}$
(14)

Note that the approximation has three parts: a constant part; a polynomial in n of degree x; and a third (dominant) part which converges to 0 exponentially fast as n→∞.

More precisely, the FMCI approximation in (13) may be written as

$\begin{array}{ll}\mathbb{P}\left\{{X}_{n}\left(\Lambda \right)=x\right\}& ={a}^{x+1}{\left(\frac{1-{\lambda }_{1}}{{\lambda }_{1}}\right)}^{x}\left(\genfrac{}{}{0.0pt}{}{n-x\left(\ell -1\right)}{x}\right)\\ & \phantom{\rule{1em}{0ex}}\times \exp\left\{n\ln{\lambda }_{1}\right\}\left[1+o\left({\left|\frac{{\lambda }_{g+1}}{{\lambda }_{1}}\right|}^{n/\left(x+1\right)-\ell }\right)\right].\end{array}$
(15)

Since $|{\lambda }_{g+1}|<{\lambda }_{1}$, the term ${\left|{\lambda }_{g+1}/{\lambda }_{1}\right|}^{n/\left(x+1\right)-\ell }$ tends to 0 exponentially as n→∞ and hence is negligible if n/(x+1)−ℓ is moderate or large (say ≥50).

### Large deviation approximation

Fu et al. (2012) provide the following large deviation approximation for right-tail probabilities for the number of non-overlapping occurrences for simple patterns Λ. The reasons for providing only the right-tail large deviation approximation are (i) all of the above mentioned approximations fail to approximate the extreme right-tail probabilities and (ii) the FMCI approximation provides an accurate approximation for left-tail probabilities.

#### Theorem 2

Let $\epsilon =x{\mu }_{W}^{2}/\left(1+x{\mu }_{W}\right)$ and let

${\phi }_{W}\left(t\right)=1+\left({e}^{t}-1\right)\mathbit{\xi }{\left(\mathbf{I}-{e}^{t}\mathbf{N}\right)}^{-1}{1}^{\prime },$
(16)

be the moment generating function of W(Λ). Then

$\phantom{\rule{-10.0pt}{0ex}}\mathbb{P}\left\{{X}_{n}\left(\Lambda \right)\ge \mathbb{E}{X}_{n}\left(\Lambda \right)+\mathit{\text{nx}}\right\}={e}^{-\mathrm{n\beta }\left(\epsilon ,\Lambda \right)}\frac{1}{\sqrt{n}}\left\{{b}_{0}+{b}_{1}{n}^{-1}+\cdots +{b}_{m}{n}^{-m}+\mathcal{O}\left({n}^{-m-1}\right)\right\},$
(17)

where

$\beta \left(x,\Lambda \right)=\left(\frac{1}{{\mu }_{W}}+x\right)h\left(\epsilon ,\tau \right)=\left(\frac{1}{{\mu }_{W}}+x\right)\left[-\frac{\tau {\mu }_{W}}{1+x{\mu }_{W}}-ln{\phi }_{W\left(\Lambda \right)}\left(-\tau \right)\right],$
(18)
$h\left(\epsilon ,t\right)=\epsilon t-ln{\phi }_{{\mu }_{W}-W\left(\Lambda \right)}\left(t\right),$

τ is the solution to h′(ε,τ)=0, and

$\begin{array}{ll}{b}_{0}& =\frac{1}{\mathrm{\sigma \tau }\sqrt{2\pi \left({\mu }^{-1}+x\right)}}\hfill \\ {b}_{1}& =\frac{1}{\mathrm{\sigma \tau }\sqrt{2\pi {\left({\mu }^{-1}+x\right)}^{3}}}\left\{-\frac{1}{{\sigma }^{2}{\tau }^{2}}+\frac{{h}^{\left(3\right)}\left(\epsilon ,\tau \right)}{2\tau {\sigma }^{4}}-\frac{{h}^{\left(4\right)}\left(\epsilon ,\tau \right)}{8{\sigma }^{4}}-\frac{5{\left({h}^{\left(3\right)}\left(\epsilon ,\tau \right)\right)}^{2}}{24{\sigma }^{6}}\right\}\\ \sigma & =\sqrt{-{h}^{\mathrm{\prime \prime }}\left(\epsilon ,\tau \right)}.\end{array}$
(19)
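As a consistency check that is not part of the original statement, the moment generating function in (16) can be recovered from the waiting time distribution in (3). The derivation below assumes that $e^{t}\lambda_1<1$, so that the matrix geometric series converges, and that the initial distribution satisfies $\xi 1'=1$.

```latex
\begin{align*}
\varphi_W(t)
  &= \sum_{n\ge 1} e^{tn}\,\xi\,\mathbf{N}^{n-1}(\mathbf{I}-\mathbf{N})\mathbf{1}'
   = e^{t}\,\xi\,(\mathbf{I}-e^{t}\mathbf{N})^{-1}(\mathbf{I}-\mathbf{N})\mathbf{1}' \\
  &= \xi\,(\mathbf{I}-e^{t}\mathbf{N})^{-1}
     \bigl[(\mathbf{I}-e^{t}\mathbf{N})+(e^{t}-1)\mathbf{I}\bigr]\mathbf{1}' \\
  &= 1+(e^{t}-1)\,\xi\,(\mathbf{I}-e^{t}\mathbf{N})^{-1}\mathbf{1}'.
\end{align*}
```

The second line uses the identity $e^{t}(\mathbf{I}-\mathbf{N})=(\mathbf{I}-e^{t}\mathbf{N})+(e^{t}-1)\mathbf{I}$.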

## Comparisons and relative error

For a given n, x and pattern Λ, we define the relative error of an approximation with respect to the exact probability $\mathbb{P}\left\{{X}_{n}\left(\Lambda \right)=x\right\}$ as

$R\left(x:E,A\right)=sgn\left(A-E\right)\left[max\left(\frac{E}{A},\frac{A}{E}\right)-1\right],$

where A stands for the approximate probability and E stands for the exact probability $\mathbb{P}\left\{{X}_{n}\left(\Lambda \right)=x\right\}$. This quantity, R(x:E,A), ranges from −∞ to ∞ and treats the importance of overestimation the same as underestimation. It is clear that R(x:E,A)>0 implies that the approximation is overestimating the exact probability and that R(x:E,A)<0 implies that the approximation is underestimating the exact probability. Since, for fixed x, the probability $\mathbb{P}\left\{{X}_{n}\left(\Lambda \right)=x\right\}$ converges to 0 exponentially fast as n→∞, it follows that R(x:E,A)→±∞ implies that the approximation tends to 0 with the wrong rate. If R(x:E,A) is near 0 then the approximation is close to the exact probability $\mathbb{P}\left\{{X}_{n}\left(\Lambda \right)=x\right\}$.
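The signed relative error is straightforward to compute. A minimal Python sketch follows; the function name is hypothetical and both probabilities are assumed to be strictly positive.

```python
import math


def relative_error(exact, approx):
    """R(x:E,A) = sgn(A - E) * [max(E/A, A/E) - 1] for positive E and A."""
    if approx == exact:
        return 0.0                                  # sgn(0) = 0
    ratio = max(exact / approx, approx / exact)
    return math.copysign(ratio - 1.0, approx - exact)
```

An approximation that doubles the exact probability gives R=1, and one that halves it gives R=−1, which is the symmetry between overestimation and underestimation described above.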

Note that R(x:E,A) is a function of x, n and the method of approximation used. The following theorem provides the asymptotic relative error for the Normal approximation (N), the Poisson approximation (P(μ n )) and the finite Markov chain imbedding approximation (F).

### Theorem 3

Let {X i } be a sequence of i.i.d. multi-state trials taking values in $\mathcal{S}$ and let Λ be a simple pattern defined on $\mathcal{S}$. Then, for every fixed x, we have,

$\left(i\right)\phantom{\rule{2em}{0ex}}\underset{n\to \infty }{lim}R\left(x:E,F\right)=0;$
(20)
$\begin{array}{l}\left(\mathit{\text{ii}}\right)\phantom{\rule{2em}{0ex}}\underset{n\to \infty }{lim}R\left(x:E,P\left({\mu }_{n}\right)\right)=\left\{\begin{array}{cc}\infty ,& \text{if}\phantom{\rule{1em}{0ex}}{limsup}_{n}\phantom{\rule{0.25em}{0ex}}{\mu }_{n}/n<-ln{\lambda }_{1};\\ c\left(x\right),& \text{if}\phantom{\rule{1em}{0ex}}{lim}_{n}\phantom{\rule{0.25em}{0ex}}{\mu }_{n}/n=-ln{\lambda }_{1};\\ -\infty ,& \text{if}\phantom{\rule{1em}{0ex}}{liminf}_{n}\phantom{\rule{0.25em}{0ex}}{\mu }_{n}/n>-ln{\lambda }_{1};\end{array}\right.\end{array}$
(21)
$\begin{array}{l}\left(\mathit{\text{iii}}\right)\phantom{\rule{2em}{0ex}}\underset{n\to \infty }{lim}R\left(x:E,N\right)=\left\{\begin{array}{cc}\infty ,& \text{if}\phantom{\rule{1em}{0ex}}{\mu }_{W}/2{\sigma }_{W}^{2}<-ln{\lambda }_{1};\\ -\infty ,& \text{if}\phantom{\rule{1em}{0ex}}{\mu }_{W}/2{\sigma }_{W}^{2}\ge -ln{\lambda }_{1};\end{array}\right.\end{array}$
(22)

where the exact probability is computed using (4) and

$c\left(x\right)={a}^{x+1}{\left(\frac{{\lambda }_{1}-1}{{\lambda }_{1}ln{\lambda }_{1}}\right)}^{x}-1.$

### Proof

Given a pattern Λ and x, for the finite Markov chain imbedding approximation we have

$\underset{n\to \infty }{lim}\frac{\mathbb{P}\left\{{X}_{n}\left(\Lambda \right)=x\right\}}{{a}^{x+1}{\left(\frac{1-{\lambda }_{1}}{{\lambda }_{1}}\right)}^{x}\left(\genfrac{}{}{0.0pt}{}{n-x\left(\ell -1\right)}{x}\right)exp\left\{nln{\lambda }_{1}\right\}}=1$

and hence (i) follows immediately from the definition of R(x:E,A) and Theorem 1.

For the Poisson approximation we have, since E/F→1 by (i),

$\frac{E}{P\left({\mu }_{n}\right)}=\frac{E}{F}\times \frac{F}{P\left({\mu }_{n}\right)}\sim \frac{F}{P\left({\mu }_{n}\right)}$

and hence

$\begin{array}{ll}\frac{E}{P\left({\mu }_{n}\right)}& =\frac{\mathbb{P}\left\{{X}_{n}\left(\Lambda \right)=x\right\}}{\frac{{\mu }_{n}^{x}}{x!}exp\left\{-{\mu }_{n}\right\}}\phantom{\rule{2em}{0ex}}\\ \sim \frac{{a}^{x+1}{\left(\frac{1-{\lambda }_{1}}{{\lambda }_{1}}\right)}^{x}\left(\genfrac{}{}{0.0pt}{}{n-x\left(\ell -1\right)}{x}\right)exp\left\{nln{\lambda }_{1}\right\}}{\frac{{\mu }_{n}^{x}}{x!}exp\left\{-{\mu }_{n}\right\}}.\phantom{\rule{2em}{0ex}}\end{array}$
(23)

If ${liminf}_{n}{\mu }_{n}/n>-ln{\lambda }_{1}$ then exp{n lnλ1+μ n } tends to ∞ exponentially fast, which overrides the polynomial term, so that E/P(μ n )→∞ and hence R(x:E,P(μ n ))→−∞ as n→∞ for all fixed x. Similarly, if ${limsup}_{n}{\mu }_{n}/n<-ln{\lambda }_{1}$, then exp{n lnλ1+μ n } tends to 0 exponentially fast and R(x:E,P(μ n ))→∞ as n→∞ for all fixed x. Furthermore, if ${lim}_{n}{\mu }_{n}/n=-ln{\lambda }_{1}$, then the ratio yields

$\underset{n\to \infty }{lim}R\left(x:E,P\left(-nln{\lambda }_{1}\right)\right)={a}^{x+1}{\left(\frac{{\lambda }_{1}-1}{{\lambda }_{1}ln{\lambda }_{1}}\right)}^{x}-1$

and this completes the proof of (ii). Note also that, if ${limsup}_{n}{\mu }_{n}/n>-ln{\lambda }_{1}$ and ${liminf}_{n}{\mu }_{n}/n<-ln{\lambda }_{1}$, then ${lim}_{n}R\left(x:E,P\left({\mu }_{n}\right)\right)$ will not exist.

For the normal approximation we have that X n (Λ) is approximately normal with mean n/μ W and variance $n{\sigma }_{W}^{2}/{\mu }_{W}^{3}$ and hence

$\mathbb{P}\left\{{X}_{n}\left(\Lambda \right)=x\right\}\approx N={\int }_{x-1/2}^{x+1/2}\frac{1}{\sqrt{2\pi n{\sigma }_{W}^{2}{\mu }_{W}^{-3}}}\exp\left\{-\frac{{\left(t-n/{\mu }_{W}\right)}^{2}}{2n{\sigma }_{W}^{2}{\mu }_{W}^{-3}}\right\}\phantom{\rule{0.3em}{0ex}}dt.$

Hence, provided n>μ W (x+1/2), we have

$N\le \frac{1}{\sqrt{2\pi n{\sigma }_{W}^{2}{\mu }_{W}^{-3}}}\exp\left\{-\frac{{\left(x+1/2-n/{\mu }_{W}\right)}^{2}}{2n{\sigma }_{W}^{2}{\mu }_{W}^{-3}}\right\}.$

Therefore, as in the proof of (ii), we are interested in the asymptotics of F/N, which yields

$\begin{array}{ll}\frac{F}{N}& \sim \sqrt{\frac{2\pi n{\sigma }_{W}^{2}}{{\mu }_{W}^{3}}}\phantom{\rule{0.3em}{0ex}}{a}^{x+1}{\left(\frac{1-{\lambda }_{1}}{{\lambda }_{1}}\right)}^{x}\left(\genfrac{}{}{0.0pt}{}{n-x\left(\ell -1\right)}{x}\right)\\ & \phantom{\rule{2em}{0ex}}\times \exp\left\{n\ln{\lambda }_{1}+\frac{{\left(x+1/2-n/{\mu }_{W}\right)}^{2}}{2n{\sigma }_{W}^{2}{\mu }_{W}^{-3}}\right\}.\end{array}$
(24)

We may rewrite the argument of the exponential function as

$n\left[ln{\lambda }_{1}+\frac{{\mu }_{W}}{2{\sigma }_{W}^{2}}{\left(\frac{{\mu }_{W}\left(x+1/2\right)}{n}-1\right)}^{2}\right],$

making it clear that (24) converges to ∞ if ${\mu }_{W}/2{\sigma }_{W}^{2}\ge -ln{\lambda }_{1}$ and to 0 otherwise. Since F/N→∞ means that the normal approximation underestimates the exact probability, R(x:E,N)→−∞ if ${\mu }_{W}/2{\sigma }_{W}^{2}\ge -ln{\lambda }_{1}$ and R(x:E,N)→∞ if ${\mu }_{W}/2{\sigma }_{W}^{2}<-ln{\lambda }_{1}$, and the proof of (iii) is complete.

Theorem 3 (ii) implies that asymptotically (for fixed x and n→∞), the Poisson approximation performs poorly (in the relative sense) regardless of the value μ n used. When Λ is simple and does not have overlapping sub-patterns, taking ${\mu }_{n}=\mathbb{E}{X}_{n}\left(\Lambda \right)$ is normally recommended for the Poisson approximation (cf. Arratia et al. 1990). In this case, non-overlapping and overlapping counting are equivalent. The following corollary shows that, for fixed x, the Poisson approximation will (asymptotically) always overestimate the exact probability in the following sense.

### Corollary 1

Let Λ be a simple pattern defined on an i.i.d. sequence of multi-state trials. For ${\mu }_{n}=\mathbb{E}{X}_{n}\left(\Lambda \right)$, we have

$\underset{n\to \infty }{lim}R\left(x:E,P\left({\mu }_{n}\right)\right)=\infty$

for all fixed x.

### Proof

Recall that, in this case, X n (Λ) is a renewal process with i.i.d. inter-renewal times with mean ${\mu }_{W}=\mathbb{E}W\left(\Lambda \right)$ and hence, by the elementary renewal theorem, we have $\mathbb{E}{X}_{n}\left(\Lambda \right)/n\to 1/{\mu }_{W}$ so that $\mathbb{E}{X}_{n}\left(\Lambda \right)\sim n/{\mu }_{W}$. Therefore, by Theorem 3 (ii), it is sufficient to show that n/μ W <−n lnλ1 for all sufficiently large n, or

${e}^{-1/{\mu }_{W}}>{\lambda }_{1}.$

Now, since $0<{\lambda }_{1}\in ℝ$ is a dominant eigenvalue of N, it follows that: $0<{\left(1-{\lambda }_{1}\right)}^{-1}\in ℝ$ is a dominant eigenvalue of the matrix ${\left(\mathbf{I}-\mathbf{N}\right)}^{-1}=\mathbf{A}=\left({a}_{ij}\right)$; a i j ≥0 with at least one a i j >0; and $\mathbf{A}{1}^{\prime }={\left(\mathbf{I}-\mathbf{N}\right)}^{-1}{1}^{\prime }\le {\mu }_{W}{1}^{\prime }$ componentwise. Hence, by a simple corollary to the Perron-Frobenius Theorem for nonnegative matrices (cf. Karlin and Taylor 1975, Corollary 2.2, pg. 551), we have

$\frac{1}{1-{\lambda }_{1}}=\underset{n\to \infty }{limsup}{\left(\underset{i,j}{max}\left|{a}_{ij}^{\left(n\right)}\right|\right)}^{1/n}\le {\mu }_{W},$

where ${a}_{\mathit{\text{ij}}}^{\left(n\right)}={\left({\mathbf{\text{A}}}^{n}\right)}_{\mathit{\text{ij}}}$. Therefore, provided μ W <∞,

${e}^{-1/{\mu }_{W}}>1-\frac{1}{{\mu }_{W}}\ge {\lambda }_{1},$

which completes the proof.

Corollary 1 implies that, if ${\mu }_{n}\sim \mathbb{E}{X}_{n}\left(\Lambda \right)$, then the Poisson approximation will always overestimate the exact probability as n→∞. Together with Theorem 3 (ii), this implies that using μ n ∼−n lnλ1 results in the best Poisson approximation as n→∞.

We also comment that, for the normal approximation, both ${\mu }_{W}/2{\sigma }_{W}^{2}<-ln{\lambda }_{1}$ and ${\mu }_{W}/2{\sigma }_{W}^{2}\ge -ln{\lambda }_{1}$ are possible. As a simple example, suppose we have a sequence of i.i.d. Bernoulli (p) trials and Λ=S S S. If p=1/2, we obtain

${\mu }_{W}=14,\phantom{\rule{1em}{0ex}}{\sigma }_{W}^{2}=142\phantom{\rule{1em}{0ex}}\text{and}\phantom{\rule{1em}{0ex}}{\lambda }_{1}=0.9196434,$

and

$\frac{{\mu }_{W}}{2{\sigma }_{W}^{2}}=0.04929577<-ln{\lambda }_{1}=0.08376932.$

However, with p=0.9, we obtain

${\mu }_{W}=3.717421,\phantom{\rule{1em}{0ex}}{\sigma }_{W}^{2}=2.145694\phantom{\rule{1em}{0ex}}\text{and}\phantom{\rule{1em}{0ex}}{\lambda }_{1}=0.5419067;$

and

$\frac{{\mu }_{W}}{2{\sigma }_{W}^{2}}=0.8662513>-ln{\lambda }_{1}=0.6126614.$

Thus, R(x:E,N)→∞ and R(x:E,N)→−∞ are both possible, depending on the pattern and the probability structure of the {X i }.
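The two numerical examples above can be reproduced without any linear algebra library. The Python sketch below is an illustration under stated assumptions: it uses the tail probabilities P{W>n}=ξ0Nⁿ1′ for a run of k successes, sums them as a Neumann series for the first two moments, and estimates λ1 from the ratio of successive tail probabilities. The function name and the truncation tolerance are hypothetical choices.

```python
import math


def waiting_time_summary(k, p, tol=1e-30):
    """Return (mu_W, sigma_W^2, lambda_1) for a run of k successes in Bernoulli(p) trials.

    Uses mu_W = sum_n P{W > n}, E[W^2] = sum_n (2n + 1) P{W > n}, and
    lambda_1 = lim P{W > n + 1} / P{W > n}; the series is truncated at `tol`.
    """
    q = 1.0 - p
    v = [1.0] + [0.0] * (k - 1)       # v[i]: P{no occurrence yet, current run length i}
    tail, mu, second, lam, n = 1.0, 0.0, 0.0, 0.0, 0
    while tail > tol:
        mu += tail
        second += (2 * n + 1) * tail
        new = [0.0] * k
        new[0] = q * sum(v)           # a failure resets the run
        for i in range(k - 1):
            new[i + 1] = p * v[i]     # a success extends the run
        v = new
        new_tail = sum(v)
        lam = new_tail / tail
        tail = new_tail
        n += 1
    return mu, second - mu * mu, lam


def normal_underestimates(k, p):
    """True when mu_W / (2 sigma_W^2) > -ln(lambda_1), the second case in the example."""
    mu, var, lam = waiting_time_summary(k, p)
    return mu / (2.0 * var) > -math.log(lam)
```

With k=3, the function `normal_underestimates` returns `False` for p=1/2 and `True` for p=0.9, in line with the two cases worked out above.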

## Numerical comparisons

In the previous section we showed that, for fixed x and n→∞, the approximation based on the finite Markov chain imbedding technique outperforms the Poisson and normal approximations. In practice, however, one is interested in the performance of these approximations not only when x is fixed and n→∞, but also when n is fixed (at some moderate value) and x varies. The reason we consider only large or moderate n in our numerical study is that, for small n, the FMCI technique easily gives the exact results. In this section we present some numerical experiments to illustrate the advantages (and disadvantages) of the methods discussed.

The approximations we compare are: the finite Markov chain approximation in (13) (FMCI); the Poisson approximation with ${\mu }_{n}=n/{\mu }_{W}\phantom{\rule{0.3em}{0ex}}\left(\sim \mathbb{E}{X}_{n}\left(\Lambda \right)\right)$ where μ W is calculated using (7) (Poisson); the normal approximation given in (6) (Normal); and the large deviation approximation given in Theorem 2 (LD), which is only for right-tail probabilities.

### Reliability of C(k,n:F) systems

A consecutive-k-out-of-n:F system is a system of n independent and linearly connected components, each with common (continuous) lifetime distribution F, in which the system fails if k consecutive components fail. At a given time t>0, the probability that a component is working is p=1−F(t) and the probability that a single component has failed is q=1−p. Hence the probability that the system has failed is the probability that k (or more) consecutive components have failed, which is the probability of k consecutive failures in a sequence of n Bernoulli trials with success probability p. Barbour et al. (1995) present a table of various bounds for system reliability based on a Poisson approximation and a compound Poisson approximation and compare these to bounds found in Fu (1985). Table 1 shows the exact probabilities and relative errors for the FMCI and Poisson approximations as well as the compound Poisson approximation in Barbour et al. (1995) (CP).

The FMCI approximation performs very well for the parameters tested here. As expected, the Poisson and compound Poisson approximations perform well when $n{q}^{k}$ is relatively small. When the reliability of the system is relatively low, the Poisson and compound Poisson approximations begin to degrade.
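To make the comparison concrete, the following is a minimal sketch of how the exact failure probability of a C(k,n:F) system can be computed by tracking the length of the current failure run, alongside a Poisson-type approximation of the form $1-\exp(-n/{\mu }_{W})$. The function names are hypothetical, and the closed form used for ${\mu }_{W}$ is the classical mean waiting time for a run of k identical outcomes, which is assumed here to agree with (7); it is not the authors' code.

```python
import math


def system_failure_prob(n: int, k: int, q: float) -> float:
    """Exact P(at least one run of k consecutive failed components)
    among n independent components, each failed with probability q."""
    p = 1.0 - q
    # state[r] = P(no run of k failures so far, trailing failure run has length r)
    state = [1.0] + [0.0] * (k - 1)
    for _ in range(n):
        nxt = [0.0] * k
        nxt[0] = p * sum(state)           # a working component resets the run
        for r in range(k - 1):
            nxt[r + 1] = q * state[r]     # a failed component extends the run
        # the mass q * state[k-1] is absorbed (system failed) and is dropped
        state = nxt
    return 1.0 - sum(state)


def poisson_failure_prob(n: int, k: int, q: float) -> float:
    """Poisson-type approximation 1 - exp(-n / mu_W).

    Assumption: mu_W = (1 - q^k) / ((1 - q) q^k), the classical mean
    waiting time until k consecutive failures."""
    mu_w = (1.0 - q**k) / ((1.0 - q) * q**k)
    return 1.0 - math.exp(-n / mu_w)


if __name__ == "__main__":
    # Illustrative parameters only; they are not taken from Table 1.
    for q in (0.05, 0.1, 0.3):
        exact = system_failure_prob(100, 3, q)
        approx = poisson_failure_prob(100, 3, q)
        print(f"q={q}: exact={exact:.6f}, Poisson={approx:.6f}")
```

Running the loop for increasing q gives a quick way to see how the gap between the two values changes as $n{q}^{k}$ grows.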

### Approximating the distribution of ${N}_{n,k}$

Recall that ${N}_{n,k}$ is the number of non-overlapping occurrences of k consecutive successes in $\left\{{X}_{i}\right\}$ (i.e. ${N}_{n,k}={X}_{n}\left(\Lambda \right)$ with $\Lambda =SS\cdots S$ of length k). By reversing the roles of success and failure, the reliability of C(k,n:F) systems can be related to the distribution of ${N}_{n,k}$. In this section we present some examples of approximating $\mathbb{P}\left\{{N}_{n,k}=x\right\}$ with the approximations FMCI, Normal, Poisson and LD.
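For reference, the exact probabilities $\mathbb{P}\left\{{N}_{n,k}=x\right\}$ can be obtained by propagating a Markov chain whose state records the number of completed runs and the length of the current partial run. The sketch below illustrates this idea in plain Python; the function name is hypothetical and the implementation is a simple dynamic programme written for clarity rather than speed, not the authors' implementation.

```python
def run_count_distribution(n: int, k: int, p: float) -> list[float]:
    """Exact pmf of N_{n,k}, the number of non-overlapping runs of k
    successes in n i.i.d. Bernoulli(p) trials. Entry x is P(N_{n,k} = x)."""
    q = 1.0 - p
    max_count = n // k
    # dist[x][r] = P(x completed runs so far, current partial run has length r)
    dist = [[0.0] * k for _ in range(max_count + 1)]
    dist[0][0] = 1.0
    for _ in range(n):
        nxt = [[0.0] * k for _ in range(max_count + 1)]
        for x in range(max_count + 1):
            for r in range(k):
                mass = dist[x][r]
                if mass == 0.0:
                    continue                      # unreachable state, nothing to move
                nxt[x][0] += q * mass             # failure resets the partial run
                if r + 1 < k:
                    nxt[x][r + 1] += p * mass     # success extends the partial run
                else:
                    nxt[x + 1][0] += p * mass     # k-th success completes a run
        dist = nxt
    return [sum(row) for row in dist]


if __name__ == "__main__":
    # Illustrative parameters only.
    pmf = run_count_distribution(200, 4, 0.3)
    print("P(N = 0) =", pmf[0])
    print("total mass =", sum(pmf))
```

The cost of this version grows roughly like $n^{2}$, so for the larger n considered in this section a matrix-based formulation of the same chain would be preferable.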

Figure 1 shows the relative error R(x:E,A) in these approximations for (a) ${N}_{2000,4}$; (b) ${N}_{5000,4}$; and (c) ${N}_{250000,6}$ when the probability of success is p=0.3. On all of the figures, the top axis is on a standard z-scale making use of the asymptotic mean and variance of ${X}_{n}\left(\Lambda \right)$, namely,

$z=\frac{x-n/{\mu }_{W}}{\sqrt{n{\sigma }_{W}^{2}{\mu }_{W}^{-3}}}.$
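As an illustration of this standardization for success runs, the sketch below evaluates z from the classical closed forms for the mean and variance of the waiting time until k consecutive successes. These closed forms are an assumption here (the paper's own expressions for ${\mu }_{W}$ and ${\sigma }_{W}^{2}$ are given earlier and may be stated differently), and the function names are hypothetical.

```python
import math


def waiting_time_moments(k: int, p: float) -> tuple[float, float]:
    """Mean and variance of the waiting time until k consecutive successes
    in i.i.d. Bernoulli(p) trials (classical closed forms, assumed here)."""
    q = 1.0 - p
    mu_w = (1.0 - p**k) / (q * p**k)
    var_w = (1.0 - (2 * k + 1) * q * p**k - p**(2 * k + 1)) / (q**2 * p**(2 * k))
    return mu_w, var_w


def z_score(x: float, n: int, k: int, p: float) -> float:
    """Standardize x using the asymptotic mean n / mu_W and the
    asymptotic variance n * sigma_W^2 / mu_W^3."""
    mu_w, var_w = waiting_time_moments(k, p)
    return (x - n / mu_w) / math.sqrt(n * var_w / mu_w**3)


if __name__ == "__main__":
    # Illustrative: z-values for a few x in the setting of panel (a).
    for x in (0, 5, 10, 20, 30):
        print(x, round(z_score(x, 2000, 4, 0.3), 3))
```

For k=1 the two moments reduce to 1/p and $q/{p}^{2}$, so the standardization collapses to the familiar binomial one with mean np and variance npq, which is a convenient sanity check.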

We notice that the finite Markov chain imbedding approximation (FMCI) performs very well in the left tail of the distribution in all cases. Its performance degrades as x gets large, but it remains more consistent than both the Poisson and Normal approximations in that region. The large deviation approximation performs well in the right tail in all cases. In (c), the FMCI approximation performs very well throughout most of the support. The Poisson approximation also performs well over most of the x considered. The normal approximation performs well in the neighbourhood of $\mathbb{E}{X}_{n}\left(\Lambda \right)$ but not in the tails.

As the probability of success p increases, the FMCI approximation still performs very well in the left tail, but its performance tends to degrade more quickly as x increases. The Poisson approximation also degrades quickly as p increases, since $\mathbb{E}{N}_{n,k}$ increases. For larger p, the Normal approximation tends to work better near the mean. In the far left tail, the FMCI approximation is preferred and in the far right tail, the LD approximation is preferred.

### Biological sequences

Sequences of DNA nucleotides are of great interest (as are sequences of amino acids and other biological sequences). Figure 2 shows the relative errors for approximating $\mathbb{P}\left\{{X}_{n}\left(\Lambda \right)=x\right\}$ with Λ=ACG (n=1,000 and 10,000) and Λ=CATTAG (n=500,000). We see that the FMCI approximation again performs very well in the left tail, although, in (b), the performance degrades somewhat as x gets large. The large deviation approximation performs very well in the right tail, especially when x is greater than 3 standard deviations above the mean. While it is difficult to give a rule of thumb, the FMCI approximation seems to perform very well when $x\le \mathcal{O}\left({n}^{1/2}\right)$. The normal approximation works best within a few standard deviations of the mean and performs best in this region when $\mathbb{E}{X}_{n}\left(\Lambda \right)$ is relatively large.
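The same state-tracking idea extends from success runs to general simple patterns such as ACG. The sketch below is an illustration under stated assumptions: letters are i.i.d. with user-supplied probabilities, occurrences are counted in the non-overlapping sense, and the state is the length of the longest suffix of the text since the last occurrence that is a prefix of the pattern. The function name and the uniform nucleotide probabilities in the usage example are hypothetical and are not the settings behind Figure 2.

```python
def pattern_count_distribution(
    pattern: str, n: int, probs: dict[str, float]
) -> list[float]:
    """Exact pmf of the number of non-overlapping occurrences of `pattern`
    in n i.i.d. letters drawn with probabilities `probs`."""
    m = len(pattern)
    alphabet = list(probs)

    def step(s: int, c: str) -> int:
        # Longest suffix of pattern[:s] + c that is also a prefix of pattern.
        text = pattern[:s] + c
        for length in range(len(text), 0, -1):
            if text.endswith(pattern[:length]):
                return length
        return 0

    trans = [{c: step(s, c) for c in alphabet} for s in range(m)]
    max_count = n // m
    # dist[x][s] = P(x occurrences so far, s letters of the pattern matched)
    dist = [[0.0] * m for _ in range(max_count + 1)]
    dist[0][0] = 1.0
    for _ in range(n):
        nxt = [[0.0] * m for _ in range(max_count + 1)]
        for x in range(max_count + 1):
            for s in range(m):
                mass = dist[x][s]
                if mass == 0.0:
                    continue                       # unreachable state
                for c in alphabet:
                    t = trans[s][c]
                    if t == m:
                        nxt[x + 1][0] += probs[c] * mass   # occurrence; restart
                    else:
                        nxt[x][t] += probs[c] * mass
        dist = nxt
    return [sum(row) for row in dist]


if __name__ == "__main__":
    uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}  # assumed, for illustration
    pmf = pattern_count_distribution("ACG", 300, uniform)
    print("P(X = 0) =", pmf[0])
    print("total mass =", sum(pmf))
```

Counting overlapping occurrences would only change the restart step: instead of returning to state 0 after a match, the chain would move to the longest proper suffix of the pattern that is also a prefix.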

## Discussion and conclusions

The finite Markov chain imbedding approximations (FMCI and LD) provide an alternative to the usual normal and Poisson approximations for the distributions of runs and patterns. While the FMCI approximation is simple, accurate and fast, it has one disadvantage relative to the normal and Poisson approximations: it requires the use of the FMCI technique, which is non-traditional and less well known in the Statistics community, except in the area of system reliability (cf. Cui et al. 2010). On the other hand, the FMCI technique does not require the rather strong conditions necessary for the Poisson techniques, such as $n{p}^{k}\to \lambda$. This condition is seldom satisfied in practical applications. For example, in DNA sequence analysis, the probabilities ${p}_{A}$, ${p}_{C}$, ${p}_{G}$ and ${p}_{T}$ do not tend to 0 as n increases. They may not all be in the neighbourhood of 1/4 but they are bounded away from 0.

For all of the numerical results in the previous section, the exact probabilities $\mathbb{P}\left\{{X}_{n}\left(\Lambda \right)=x\right\}$ were obtained via the FMCI technique, and the CPU times ranged from a few seconds to less than a minute, even in the case of Λ=CATTAG and n=500,000. Based on our experience, if the length of the pattern is less than 20 and n is less than 1,000,000, the exact probability should be computed.

## References

• Arratia R, Goldstein L, Gordon L: Poisson approximation and the Chen-Stein method. Stat. Sci 1990, 5(4):403–434.

• Asmussen S: Applied Probability and Queues. Springer, New York; 2003.

• Balakrishnan N, Koutras MV: Runs and Scans with Applications. Wiley Series in Probability and Statistics. Wiley-Interscience [John Wiley & Sons], New York; 2002.

• Barbour AD, Eagleson GK: Poisson approximation for some statistics based on exchangeable trials. Adv. Appl. Probab 1983, 15(3):585–600.

• Barbour AD, Eagleson GK: Poisson convergence for dissociated statistics. J. Roy. Statist. Soc. Ser. B 1984, 46(3):397–402.

• Barbour AD, Eagleson GK: An improved Poisson limit theorem for sums of dissociated random variables. J. Appl. Probab 1987, 24(3):586–599.

• Barbour AD, Hall P: On the rate of Poisson convergence. Math. Proc. Cambridge Philos. Soc 1984, 95(3):473–480.

• Barbour AD, Chryssaphinou O: Compound Poisson approximation: a user’s guide. Ann. Appl. Probab 2001, 11(3):964–1002.

• Barbour AD, Holst L, Janson S: Poisson Approximation. Oxford Studies in Probability. Oxford Science Publications; 1992a.

• Barbour AD, Chen LHY, Loh W-L: Compound Poisson approximation for nonnegative random variables via Stein’s method. Ann. Probab 1992b, 20(4):1843–1866.

• Barbour AD, Chryssaphinou O, Roos M: Compound Poisson approximation in reliability theory. IEEE T. Reliab 1995, 44(3):398–402.

• Barbour AD, Chryssaphinou O, Roos M: Compound Poisson approximation in systems reliability. Naval Res. Logist 1996, 43(2):251–264.

• Blom G, Thorburn D: How many random digits are required until given sequences are obtained? J. Appl. Probab 1982, 19(3):518–531.

• Chen LHY: Poisson approximation for dependent trials. Ann. Probab 1975, 3(3):534–545.

• Cui L, Xu Y, Zhao X: Developments and applications of the finite Markov chain imbedding approach in reliability. IEEE T. Reliab 2010, 59(4):685–690.

• Fu JC: Reliability of a large consecutive-k-out-of-n:F system. IEEE T. Reliab 1985, R-34: 120–127.

• Fu JC, Johnson BC: Approximate probabilities for runs and patterns in i.i.d. and Markov dependent multi-state trials. Adv. Appl. Probab 2009, 41(1):292–308.

• Fu JC, Koutras MV: Distribution theory of runs: a Markov chain approach. J. Amer. Statist. Assoc 1994, 89(427):1050–1058.

• Fu JC, Lou WYW: Distribution Theory of Runs and Patterns and Its Applications. World Scientific Publishing Co. Inc, River Edge; 2003.

• Fu JC, Lou WYW: On the normal approximation for the distribution of the number of simple or compound patterns in a random sequence of multi-state trials. Methodol. Comput. Appl. Probab 2007, 9(2):195–205.

• Fu JC, Johnson BC, Chang Y-M: Approximating the extreme right-hand tail probability for the distribution of the number of patterns in a sequence of multi-state trials. J. Stat. Plan. Infer 2012, 142(2):473–480.

• Gerber HU, Li S-YR: The occurrence of sequence patterns in repeated experiments and hitting times in a Markov chain. Stochastic Process. Appl 1981, 11(1):101–108.

• Godbole AP: Degenerate and Poisson convergence criteria for success runs. Statist. Probab. Lett 1990a, 10(3):247–255.

• Godbole AP: Specific formulae for some success run distributions. Statist. Probab. Lett 1990b, 10(2):119–124.

• Godbole AP: Poisson approximations for runs and patterns of rare events. Adv. Appl. Probab 1991, 23(4):851–865.

• Godbole AP, Schaffner AA: Improved Poisson approximations for word patterns. Adv. Appl. Probab 1993, 25(2):334–347.

• Holst L, Kennedy JE, Quine MP: Rates of Poisson convergence for some coverage and urn problems using coupling. J. Appl. Probab 1988, 25(4):717–724.

• Karlin S: Statistical signals in bioinformatics. Proc. Natl. Acad. Sci. U. S. A 2005, 102(38):13355–13362.

• Karlin S, Taylor HM: A First Course in Stochastic Processes. Academic Press [A subsidiary of Harcourt Brace Jovanovich, Publishers], New York-London; 1975.

• Kleffe J, Borodovski M: First and second moment of counts of words in random text generated by Markov chains. Comp. Applic. Biosci 1992, 8: 433–441.

• Martin J, Regad L, Camproux A-C, Nuel G: Finite Markov chain embedding for the exact distribution of patterns in a set of random sequences. In Advances in Data Analysis. Statistics for Industry and Technology. Edited by: Skiadas CH. Birkhäuser, Boston; 2010.

• Meyn SP, Tweedie RL: Markov Chains and Stochastic Stability. Communications and Control Engineering Series. 1993.

• Nuel G, Regad L, Martin J, Camproux A-C: Exact distribution of a pattern in a set of random sequences generated by a Markov source: applications to biological data. Algorithm Mol. Biol 2010, 5(1):1–18.

• Schwager SJ: Run probabilities in sequences of Markov-dependent trials. J. Amer. Statist. Assoc 1983, 78(381):168–180.

• Seneta E: Non-negative Matrices and Markov Chains. Springer, New York; 1983.

• Solov’ev AD: A combinatorial identity and its application to the problem on the first occurrence of a rare event. Teor. Verojatnost. i Primenen 1966, 11: 313–320.

## Acknowledgements

This work was supported, in part, by the Natural Sciences and Engineering Research Council of Canada.

## Author information

### Competing interests

The authors declare that they have no competing interests.

### Authors’ contributions

BJ and JF contributed equally to the mathematical details. BJ performed the numerical comparisons and prepared the manuscript. Both authors read and approved the final manuscript.

Brad C Johnson and James C Fu contributed equally to this work.
