Normal approximation
The normal approximation is one of the most popular for approximating the distribution of the number of runs or patterns X_{n}(Λ) in statistics. In general, when Λ is simple or compound, the trials are i.i.d., and the counting is nonoverlapping, by appealing to (1) and renewal arguments, it has been shown that X_{n}(Λ) is asymptotically normally distributed (cf. Fu and Lou 2007; Karlin and Taylor 1975). The form of the approximation is
\underset{n\to \infty}{\lim}\mathbb{P}\left\{\frac{{X}_{n}\left(\Lambda \right)-n/{\mu}_{W}}{\sqrt{n{\sigma}_{W}^{2}{\mu}_{W}^{-3}}}\le u\right\}=\Phi \left(u\right),
(6)
where Φ(·) denotes the standard normal distribution function and μ_{W} and {\sigma}_{W}^{2} are the mean and variance of W(Λ), respectively, which are given by
\begin{array}{ll}{\mu}_{W}& ={\mathit{\xi}}_{0}{(\mathbf{I}-\mathbf{N})}^{-1}{1}^{\prime},\quad\text{and}\end{array}
(7)
\begin{array}{ll}{\sigma}_{W}^{2}& ={\mathit{\xi}}_{0}(\mathbf{I}+\mathbf{N}){(\mathbf{I}-\mathbf{N})}^{-2}{1}^{\prime}-{\mu}_{W}^{2}.\end{array}
(8)
Given a pattern Λ, it is well known that the mean μ_{W} and the variance {\sigma}_{W}^{2} are difficult to obtain via combinatorial arguments, especially when Λ is a compound pattern or the trials are Markov dependent. For example, as pointed out in Karlin (2005) and Kleffe and Borodovski (1992), approximate values of μ_{W} and {\sigma}_{W}^{2} must sometimes be used. Since W(Λ) is finite Markov chain imbeddable, (7) and (8) provide the exact values.
The limit in (6) is appropriate when the sequence of interarrival times {W_{i}(Λ)} is i.i.d., which is the case for simple and compound patterns when the {X_{i}} are i.i.d. and counting is nonoverlapping. When occurrences of Λ correspond to a delayed renewal process, which can occur for Markov dependent trials and/or overlapping counting, we could use the mean and variance of W_{2}(Λ) for the normalizing constants, which are easily obtained by modifying ξ_{0} in (7) and (8). Even more general cases can be handled by making use of a functional central limit theorem for Markov chains (see, for example, (Meyn and Tweedie 1993, §17.4) and (Asmussen 2003, Theorem 7.2, pg. 30) for the details).
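As a concrete illustration (our own sketch, not from the source), the quantities in (7) and (8) and the normal approximation (6) can be evaluated numerically. We use the simple pattern Λ = SS in i.i.d. Bernoulli(1/2) trials, with N and ξ_0 the essential matrix and initial distribution of the standard waiting-time imbedding over the transient states {no progress, one S seen}:

```python
import numpy as np
from math import erf, sqrt

# Normal approximation (6) with mu_W and sigma_W^2 computed from (7) and (8).
# Illustration: Lambda = SS in i.i.d. Bernoulli(1/2) trials; N and xi0 are our
# construction of the waiting-time imbedding for this pattern.
N = np.array([[0.5, 0.5],
              [0.5, 0.0]])
xi0 = np.array([1.0, 0.0])
I, one = np.eye(2), np.ones(2)

mu_W = xi0 @ np.linalg.solve(I - N, one)                     # equation (7)
sigma2_W = (xi0 @ (I + N) @ np.linalg.solve((I - N) @ (I - N), one)
            - mu_W**2)                                       # equation (8)

def normal_approx(n, x):
    """P{X_n(Lambda) <= x} ~ Phi((x - n/mu_W) / sqrt(n sigma_W^2 / mu_W^3))."""
    z = (x - n / mu_W) / sqrt(n * sigma2_W / mu_W**3)
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))
```

For p = 1/2 this yields μ_W = 6 and σ_W² = 22, the known mean and variance of the waiting time for SS.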
Poisson and compound Poisson approximations
It is well known that, in a sequence of Bernoulli(p) trials, if np→λ as n→∞, then the probability of k successes in n trials can be approximated by a Poisson probability with parameter λ, denoted P\left(\lambda \right). This idea has been extended to certain patterns Λ and, under certain conditions, the distribution of X_{n}(Λ) can be approximated by a Poisson distribution with parameter μ_{n} in the sense that
{d}_{\text{TV}}\left(\mathcal{L}\left({X}_{n}\left(\Lambda \right)\right),P\left({\mu}_{n}\right)\right)<{\epsilon}_{n},
(9)
where \mathcal{L}(\cdot) denotes the distribution (law) of a random variable and d_{TV}(·,·) denotes the total variation distance.
The primary tool used to obtain μ_{n} and the bound ε_{n} is the Stein–Chen method (Chen 1975), and this method has been refined by various authors: Arratia et al. (1990), Barbour and Eagleson (1983), Barbour and Eagleson (1984), Barbour and Eagleson (1987), Barbour and Hall (1984), Godbole (1990a), Godbole (1990b), Godbole (1991), Godbole and Schaffner (1993), and Holst et al. (1988). This method has also been extended to compound Poisson approximations for the distributions of runs and patterns, and Barbour and Chryssaphinou (2001) provide an excellent theoretical review of these approximations.
In practice, {\mu}_{n}=\mathbb{E}{X}_{n}\left(\Lambda \right) or the expectation of a closely related run statistic is used (cf. Balakrishnan and Koutras 2002, §5.2.3) so that, in the former case,
\mathbb{P}\left\{{X}_{n}\left(\Lambda \right)=x\right\}\approx \frac{{\left(\mathbb{E}{X}_{n}\left(\Lambda \right)\right)}^{x}}{x!}\exp\left\{-\mathbb{E}{X}_{n}\left(\Lambda \right)\right\}.
(10)
Finding \mathbb{E}{X}_{n}\left(\Lambda \right) and the bound ε_{n} is usually done on a case-by-case basis. For the mathematical details, the books (Barbour et al. 1992a) and (Balakrishnan and Koutras 2002) are recommended.
Let {P}_{c}(\lambda ,\nu ) denote the compound Poisson distribution, that is, the distribution of the random variable \sum _{j=1}^{M}{Y}_{j}, where the random variable M has a Poisson distribution with parameter λ and the Y_{j} are i.i.d. having distribution ν. A compound Poisson distribution for approximating nonnegative random variables was suggested in Barbour et al. (1992b) (see also Barbour et al. (1995, 1996)). The approximation is formulated similarly to the Poisson approximation:
{d}_{\text{TV}}\left(\mathcal{L}\left({X}_{n}\left(\Lambda \right)\right),{P}_{c}(\lambda ,\nu )\right)<{\epsilon}_{n}.
(11)
The distribution of N_{n,k}, the number of nonoverlapping occurrences of k consecutive successes in n i.i.d. Bernoulli trials, is one of the most important in this area and one of the most studied in the literature. Reversing the roles of S (success) and F (failure), the reliability of a consecutive-k-out-of-n system, denoted C(k,n : F), is given by \mathbb{P}\{{N}_{n,k}=0\}. Even in this simple case (i.e. Λ = SS⋯S), there are several ways to apply the Poisson approximation techniques. For example, (Godbole 1991, Theorem 2) shows that approximating N_{n,k} with a P\left(\mathbb{E}{N}_{n,k}\right) distribution works well if certain conditions hold. Godbole and Schaffner (Godbole and Schaffner 1993, pg. 340) suggest an improved Poisson approximation for word patterns.
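To make this concrete, the reliability \mathbb{P}\{{N}_{n,k}=0\} can be computed exactly by a standard recursion and set against a Poisson approximation. In the sketch below (our own), the Poisson parameter μ_n = n/μ_W is a renewal-based choice made for illustration; other parameter choices appear in the literature:

```python
from math import exp

def exact_no_run(n, k, p):
    """P(no run of k consecutive S in n i.i.d. Bernoulli(p) trials),
    i.e. the reliability of C(k,n:F), via the standard recursion
    a_m = a_{m-1} - (1-p) p^k a_{m-k-1}."""
    q = 1.0 - p
    a = [1.0] * k + [1.0 - p**k]          # a_0, ..., a_{k-1} = 1;  a_k = 1 - p^k
    for m in range(k + 1, n + 1):
        a.append(a[m - 1] - q * p**k * a[m - k - 1])
    return a[n]

def poisson_no_run(n, k, p):
    """Poisson approximation exp(-mu_n) with the renewal-based choice
    mu_n = n / mu_W, where mu_W = (1 - p^k)/(q p^k) is the mean waiting
    time for a success run of length k (our choice, for illustration)."""
    q = 1.0 - p
    mu_W = (1.0 - p**k) / (q * p**k)
    return exp(-n / mu_W)
```

For small n the two agree reasonably well (e.g. n = 3, k = 2, p = 1/2 gives 0.625 exactly versus e^{−1/2} ≈ 0.607), but the agreement degrades in the extreme tail, consistent with the relative-error remarks in this section.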
The primary difficulty in applying the Poisson approximation is the determination of the optimal parameter μ_{n}, which is highly dependent on the structure of the pattern Λ. In particular, if Λ is long and has several uneven overlapping subpatterns, then finding μ_{n} by their method can be very tedious. In the sequel, we show that even the (asymptotically) best choice of μ_{n} for Poisson approximations does not perform well in the relative sense.
FMCI approximations
Approximations based on the FMCI approach depend on the spectral decomposition of the essential transition probability matrix N.
Let N be a w×w essential transition probability matrix associated with a finite Markov chain {Y_{n} : n≥0} corresponding to the distribution of the waiting time W(Λ). Let 1>λ_{1}≥|λ_{2}|≥⋯≥|λ_{w}| denote the ordered eigenvalues of N, repeated according to their algebraic multiplicities, with associated (right) eigenvectors {\mathit{\eta}}_{1}^{\prime},{\mathit{\eta}}_{2}^{\prime},\dots ,{\mathit{\eta}}_{w}^{\prime}. When the geometric multiplicity of λ_{i} is less than its algebraic multiplicity, we will use vectors of 0's for the unspecified eigenvectors. The fact that λ_{1} can be taken as a positive real number and that η_{1} can be taken to be nonnegative are consequences of the Perron–Frobenius Theorem for nonnegative matrices (cf. Seneta 1981).
Definition 1
We will say that {Y_{n} : n≥0}, or equivalently, N, satisfies the FMCI Approximation Conditions if

(i) there exist constants a_{1},…,a_{w} such that

{1}^{\prime}=\sum _{i=1}^{w}{a}_{i}{\mathit{\eta}}_{i}^{\prime},

(12)

(ii) λ_{1} has algebraic multiplicity g and λ_{1}>|λ_{j}| for all j>g.
Verifying these conditions is usually straightforward. They certainly hold if N is irreducible and aperiodic, but they also hold in many other cases. For example, (12) requires only that 1^{′} is in the linear space spanned by \{{\mathit{\eta}}_{1}^{\prime},{\mathit{\eta}}_{2}^{\prime},\dots ,{\mathit{\eta}}_{w}^{\prime}\}, which can hold even when N is defective (not diagonalizable). Condition (ii) requires that the communication classes corresponding to λ_{1} are aperiodic. That is, if Ψ is a communication class and N[Ψ] corresponds to the substochastic matrix N restricted to the states in Ψ, with largest eigenvalue λ_{1}[Ψ], then all Ψ such that λ_{1}[Ψ]=λ_{1} should be aperiodic. We also mention that the algebraic multiplicity of λ_{1} is the number of communication classes Ψ such that λ_{1}[Ψ]=λ_{1}.
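In practice, the conditions can also be checked numerically from N itself. The sketch below (our own, using the 2×2 essential matrix of the pattern SS with p = 1/2 as a stand-in) verifies (i) by solving the linear system in (12) and (ii) by comparing eigenvalue moduli:

```python
import numpy as np

# Numerical check of the FMCI Approximation Conditions for a given essential
# matrix N (here, as a stand-in, the 2x2 matrix for the pattern SS, p = 1/2).
N = np.array([[0.5, 0.5],
              [0.5, 0.0]])
w = N.shape[0]

lam, V = np.linalg.eig(N)             # right eigenvectors are columns of V
order = np.argsort(-np.abs(lam))      # put lambda_1 first
lam, V = lam[order], V[:, order]

# Condition (i): 1' lies in the span of the eigenvectors, i.e. the linear
# system V a = 1' is solvable; here V is nonsingular, so we simply solve.
a_coef = np.linalg.solve(V, np.ones(w))
cond_i = np.allclose(V @ a_coef, np.ones(w))

# Condition (ii): lambda_1 strictly dominates |lambda_j| for j > g (g = 1 here).
g = 1
cond_ii = lam[0].real > abs(lam[g])
```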
Fu and Johnson (2009) give the following theorem.
Theorem 1
Let {X_{i}} be a sequence of i.i.d. trials taking values in a finite alphabet, let Λ be a simple pattern of length ℓ with d×d essential transition probability matrix N, and let X_{n}(Λ) be the number of nonoverlapping occurrences of Λ in {X_{i}}. If N satisfies the FMCI approximation conditions then, for any fixed x≥0,
\mathbb{P}\left\{{X}_{n}\left(\Lambda \right)=x\right\}\sim {a}^{x+1}\left(\genfrac{}{}{0.0pt}{}{n-x(\ell -1)}{x}\right){(1-{\lambda}_{1})}^{x}{\lambda}_{1}^{n-x},
(13)
where a=\sum _{j=1}^{g}{a}_{j}\left({\mathit{\xi}}_{0}{\mathit{\eta}}_{j}^{\prime}\right). If g=1, as is usually the case, then a={a}_{1}({\mathit{\xi}}_{0}{\mathit{\eta}}_{1}^{\prime}).
Given a pattern Λ, the approximation in (13) requires finding the Markov chain imbedding associated with the waiting time W(Λ), the essential transition probability matrix N as well as its eigenvalues and associated eigenvectors. Usually, these steps are rather simple and can be easily automated together with (13). Even for very large n and large ℓ, say n=1,000,000 and ℓ=50, the CPU time is negligible. Fu and Johnson (2009) also provide details on extending these results to compound patterns, overlapping counting and Markov dependent trials.
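These steps are indeed easy to automate. The following sketch (our own construction, assuming the standard waiting-time imbedding of Λ = SS in fair Bernoulli trials) computes λ_1, the coefficients a_i of (12), and the approximation (13):

```python
import numpy as np
from math import comb

# FMCI approximation (13) for the simple pattern Lambda = SS in i.i.d.
# Bernoulli(1/2) trials. N, xi0, and ell are our illustrative construction
# of the waiting-time imbedding (transient states: no progress, one S seen).
p, q = 0.5, 0.5
ell = 2
N = np.array([[q, p],
              [q, 0.0]])
xi0 = np.array([1.0, 0.0])

lam, V = np.linalg.eig(N)               # right eigenvectors in columns of V
order = np.argsort(-np.abs(lam))        # lambda_1 first
lam, V = lam[order].real, V[:, order].real

# Condition (12): solve 1' = sum_i a_i eta_i' for the coefficients a_i.
a_coef = np.linalg.solve(V, np.ones(2))
a = a_coef[0] * (xi0 @ V[:, 0])         # g = 1 here: a = a_1 (xi_0 eta_1')

def fmci_pmf(n, x):
    """Approximate P{X_n(Lambda) = x} via (13)."""
    return (a**(x + 1) * comb(n - x * (ell - 1), x)
            * (1.0 - lam[0])**x * lam[0]**(n - x))
```

For x = 0 the approximation reduces to a·λ_1^n, which matches the exact probability of no occurrence (computable by a direct recursion) essentially to machine precision once n is moderate.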
For the purpose of comparing these approximations, we prefer to write (13) as
\begin{array}{ll}\mathbb{P}\left\{{X}_{n}\left(\Lambda \right)=x\right\}& \sim {a}^{x+1}{\left(\frac{1-{\lambda}_{1}}{{\lambda}_{1}}\right)}^{x}\left(\genfrac{}{}{0.0pt}{}{n-x(\ell -1)}{x}\right)\exp\{n\ln {\lambda}_{1}\}.\end{array}
(14)
Note that the approximation has three parts: a constant part; a polynomial in n of degree x; and a third (dominant) part, which converges to 0 exponentially fast as n→∞.
More precisely, the FMCI approximation in (13) may be written as
\begin{array}{ll}\mathbb{P}\left\{{X}_{n}\left(\Lambda \right)=x\right\}& ={a}^{x+1}{\left(\frac{1-{\lambda}_{1}}{{\lambda}_{1}}\right)}^{x}\left(\genfrac{}{}{0.0pt}{}{n-x(\ell -1)}{x}\right)\\ & \phantom{\rule{1em}{0ex}}\times \exp\{n\ln {\lambda}_{1}\}\left[1+o\left({\left|\frac{{\lambda}_{g+1}}{{\lambda}_{1}}\right|}^{n/(x+1)-\ell}\right)\right].\end{array}
(15)
Since |λ_{g+1}|<λ_{1}, the term |λ_{g+1}/λ_{1}|^{n/(x+1)−ℓ} tends to 0 exponentially as n→∞ and hence is negligible if n/(x+1)−ℓ is moderate or large (say ≥50).
Large deviation approximation
Fu et al. (2012) provide the following large deviation approximation for right-tail probabilities of the number of nonoverlapping occurrences of simple patterns Λ. The reasons for providing only the right-tail large deviation approximation are (i) all of the above-mentioned approximations fail to approximate the extreme right-tail probabilities and (ii) the FMCI approximation provides an accurate approximation for left-tail probabilities.
Theorem 2
Let
\epsilon =x{\mu}_{W}^{2}/(1+x{\mu}_{W})
and let
{\phi}_{W}\left(t\right)=1+({e}^{t}-1){\mathit{\xi}}_{0}{(\mathbf{I}-{e}^{t}\mathbf{N})}^{-1}{1}^{\prime},
(16)
be the moment generating function of W(Λ). Then
\mathbb{P}\left\{{X}_{n}\left(\Lambda \right)\ge \mathbb{E}{X}_{n}(\Lambda )+nx\right\}={e}^{-n\beta (\epsilon ,\Lambda )}\frac{1}{\sqrt{n}}\left\{{b}_{0}+{b}_{1}{n}^{-1}+\cdots +{b}_{m}{n}^{-m}+\mathcal{O}\left({n}^{-m-1}\right)\right\},
(17)
where
\beta (\epsilon ,\Lambda )=\left(\frac{1}{{\mu}_{W}}+x\right)h(\epsilon ,\tau )=\left(\frac{1}{{\mu}_{W}}+x\right)\left[\frac{\tau {\mu}_{W}}{1+x{\mu}_{W}}-\ln {\phi}_{W\left(\Lambda \right)}(\tau )\right],
(18)
h(\epsilon ,t)=\epsilon t-\ln {\phi}_{{\mu}_{W}-W\left(\Lambda \right)}\left(t\right),
τ is the solution to h^{′}(ε,τ)=0, and
\begin{array}{ll}{b}_{0}& =\frac{1}{\sigma \tau \sqrt{2\pi ({\mu}^{-1}+x)}},\\ {b}_{1}& =\frac{1}{\sigma \tau \sqrt{2\pi {({\mu}^{-1}+x)}^{3}}}\left\{\frac{1}{{\sigma}^{2}{\tau}^{2}}+\frac{{h}^{(3)}(\epsilon ,\tau )}{2\tau {\sigma}^{4}}-\frac{{h}^{(4)}(\epsilon ,\tau )}{8{\sigma}^{4}}-\frac{5{\left({h}^{(3)}(\epsilon ,\tau )\right)}^{2}}{24{\sigma}^{6}}\right\},\\ \sigma & =\sqrt{{h}^{\prime\prime}(\epsilon ,\tau )}.\end{array}
(19)
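The matrix form of the mgf in (16) is straightforward to evaluate numerically. As a sanity check (our own sketch, again using the waiting-time imbedding of Λ = SS in fair Bernoulli trials), the derivative of φ_W at t = 0 should recover μ_W from (7):

```python
import numpy as np

# Numerical sanity check of the mgf (16), using our illustrative N and xi0
# for the waiting time of Lambda = SS in i.i.d. Bernoulli(1/2) trials.
N = np.array([[0.5, 0.5],
              [0.5, 0.0]])
xi0 = np.array([1.0, 0.0])
I, one = np.eye(2), np.ones(2)

def phi_W(t):
    """phi_W(t) = 1 + (e^t - 1) xi0 (I - e^t N)^{-1} 1', valid while
    e^t * lambda_1 < 1."""
    et = np.exp(t)
    return 1.0 + (et - 1.0) * (xi0 @ np.linalg.solve(I - et * N, one))

# The derivative of the mgf at t = 0 is E W(Lambda) = mu_W from (7).
mu_W = xi0 @ np.linalg.solve(I - N, one)
dt = 1e-6
deriv = (phi_W(dt) - phi_W(-dt)) / (2.0 * dt)
```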