### 2.1 Extinction in the Galton-Watson branching process

To investigate biological extinction, we use a Galton-Watson branching process in which, at each discrete time step, every individual generates *i* offspring with probability *p*_{i}, and zero offspring with probability *p*_{0}. Without loss of generality we assume that an individual produces its offspring and then dies, so that each individual in a population is restricted to a single generation. The offspring number is a random variable, which we denote by *X*. Let *n* be the maximum value of *X*, so that *X* takes values in the state space {\mathcal{D}}_{n}=\{0,1,2,\dots ,n\}.

At any given time *t*, the size of a population, *Z*_{t}, is the number of individuals in the branching process. We set *Z*_{0}≡1 unless otherwise specified. The probability of extinction of a branching process is q\equiv {\lim}_{t\to \infty}P({Z}_{t}=0|{Z}_{0}=1). If the starting size of the population is greater than one, then the overall probability of extinction is

\underset{t\to \infty}{lim}P({Z}_{t}=0|{Z}_{0}=N)={q}^{N}.

So we can solve for extinction in the case of *Z*_{0}=1 and extend the results to larger starting populations if necessary.

The recursive formula for finding *q* can be found through a first step analysis (Kimmel and Axelrod 2002). The probability that the lineage of a single individual eventually goes extinct is the probability that it dies without offspring (*p*_{0}) plus the probability that it produces a single offspring whose lineage dies out (*p*_{1}*q*) plus the probability that it produces two offspring whose joint lineages die out (*p*_{2}*q*^{2}), and so on.

This leads to the formal definition of the probability generating function:

f\left(q\right)=\mathbb{E}\left[{q}^{X}\right]={p}_{0}+{p}_{1}q+{p}_{2}{q}^{2}+{p}_{3}{q}^{3}+\dots +{p}_{n}{q}^{n}=\sum _{k=0}^{n}{p}_{k}\,{q}^{k}.

(1)

The probability of extinction of a branching process starting with a single individual is the smallest root of the equation *f*(*q*)=*q* for *q*∈[ 0,1]. The value *q*=1 always satisfies *f*(*q*)=*q* but is not necessarily the smallest positive root. In some cases, the probability of extinction is trivially obvious. For instance, if *p*_{0}=0, every individual produces at least one offspring, so the lineage can never die out and *q*=0. Conversely, cases where \mathbb{E}\left[\phantom{\rule{0.3em}{0ex}}X\right]\le 1 always yield *q*=1 (Kimmel and Axelrod 2002).
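Numerically, the smallest non-negative root of *f*(*q*)=*q* can be found by fixed-point iteration, since the iterates *q*_{t+1}=*f*(*q*_{t}) starting from *q*_{0}=0 are exactly *P*(*Z*_{t}=0). A minimal Python sketch, using an illustrative offspring distribution of our own choosing (not one from the text):

```python
# Sketch: extinction probability via fixed-point iteration on the pgf.
# The pmf below is purely illustrative (mean 1.4 > 1, p_0 > 0).
p = [0.3, 0.2, 0.3, 0.2]          # p[k] = P(X = k) on {0, 1, 2, 3}

def pgf(q):
    """Probability generating function f(q) = sum_k p_k q^k."""
    return sum(pk * q**k for k, pk in enumerate(p))

# q_{t+1} = f(q_t) from q_0 = 0 gives P(Z_t = 0), which increases to the
# smallest non-negative root of f(q) = q, i.e. the extinction probability.
q = 0.0
for _ in range(1000):
    q = pgf(q)

print(round(q, 6))
```

Starting from *q*_{0}=0 guarantees convergence to the smallest root rather than the trivial root *q*=1.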

Inferring the probability of extinction analytically for branching processes with *p*_{0}>0 and \mathbb{E}\left[\phantom{\rule{0.3em}{0ex}}X\right]>1 can be difficult because *f*(*q*)=*q* has *n* complex-valued roots by the fundamental theorem of algebra. In the following we illustrate how (1) can be seen in terms of moments of the offspring distribution, and discuss how this approach can be used to estimate *q*.

### 2.2 Moments of the branching process

Let {m}_{k}\equiv \mathbb{E}\left[\phantom{\rule{0.3em}{0ex}}{X}^{k}\right] denote the *k* th moment of the branching process generator *X*. The first moment, *m*_{1}, is equivalent to the average offspring number. Higher moments can be used to obtain other summary statistics of the distribution, such as the variance {\sigma}^{2}={m}_{2}-{m}_{1}^{2}.

Writing (1) as a Laplace transform evaluated at -\log q expresses extinction in terms of the moments of the branching process:

\begin{array}{ll}f\left(q\right)& =\mathbb{E}\left[{q}^{X}\right]=\mathbb{E}\left[{\mathrm{e}}^{X\log q}\right]\phantom{\rule{2em}{0ex}}\\ & =1+{m}_{1}\log q+{m}_{2}\frac{{\left(\log q\right)}^{2}}{2}+{m}_{3}\frac{{\left(\log q\right)}^{3}}{6}+\dots \phantom{\rule{2em}{0ex}}\\ & =\sum _{k=0}^{\infty}{m}_{k}\frac{{\left(\log q\right)}^{k}}{k!}\phantom{\rule{2em}{0ex}}\end{array}

where *m*_{0}=1. Note that *m*_{k}>0 for all *k*≥0. Furthermore, with *q*∈(0,1) we have \log q<0. Therefore, even moments increase the probability of extinction while odd moments decrease it. Additionally, if *q*∈(e^{−1},1) then \log q\in (-1,0), so successive powers of \log q shrink in magnitude and the series converges quickly. Thus, approximations, *f*^{∗}(*q*), which take the form

{f}^{\ast}\left(q\right)=\sum _{k=0}^{s-1}{m}_{k}\frac{{(logq)}^{k}}{k!}+o\left({(logq)}^{s}\right)

for *s*≥3 are accurate only when *q* is large and the moments are small. As *q*↓0, the series requires more and more terms to provide an accurate approximation. Therefore, when *q* is small, the first few moments are not necessarily informative about the probability of extinction.
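This truncation behaviour is easy to verify numerically. The sketch below, using an assumed illustrative offspring distribution, compares the series truncated at *s*=4 with the exact generating function at a large and a small value of *q*:

```python
import math

# Illustrative pmf (an assumption, not from the text) on {0, 1, 2, 3}.
p = [0.3, 0.2, 0.3, 0.2]

def moment(k):
    """k-th moment m_k = E[X^k] of the offspring distribution."""
    return sum(pk * i**k for i, pk in enumerate(p))

def pgf(q):
    return sum(pk * q**i for i, pk in enumerate(p))

def truncated(q, s):
    """Moment series for f(q) truncated after the first s terms."""
    return sum(moment(k) * math.log(q)**k / math.factorial(k) for k in range(s))

# The truncation is accurate near q = 1 but degrades badly as q -> 0.
err_large_q = abs(truncated(0.9, 4) - pgf(0.9))
err_small_q = abs(truncated(0.2, 4) - pgf(0.2))
print(err_large_q, err_small_q)
```

For this distribution the four-term series is accurate to a few parts in ten thousand at *q*=0.9 but is off by more than the entire function value at *q*=0.2.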

### 2.3 *s*-Convex orderings of random variables

Here we demonstrate how the first few moments of the offspring distribution can be used to find bounds on the probability of extinction. The random variable *X* is bounded below by zero and above by its maximum, *n*, conveniently allowing for *s*-convex ordering (Denuit and Lefevre 1997; Hürlimann 2005; Courtois et al. 2006). Following (Hürlimann 2005), denote by *Δ* the forward difference operator for g:\phantom{\rule{0.3em}{0ex}}{\mathcal{D}}_{n}\to \mathbb{R}, defined by *Δ* *g*(*i*)=*g*(*i*+1)−*g*(*i*) for all i\in {\mathcal{D}}_{n-1}. Analogously, for k\in {\mathcal{D}}_{n} the *k*-th order forward difference operator is defined recursively by *Δ*^{0}*g*=*g* and, for *k*≥1, by *Δ*^{k}*g*(*i*)=*Δ*^{k−1}*g*(*i*+1)−*Δ*^{k−1}*g*(*i*) for all i\in {\mathcal{D}}_{n-k}. Then, for two random variables *X* and *Y* valued in {\mathcal{D}}_{n}, we say *X* precedes *Y* in the *s*-convex order, written X{\le}_{s-\mathit{\text{cx}}}^{{\mathcal{D}}_{n}}Y, if \mathbb{E}\left[\phantom{\rule{0.3em}{0ex}}g\right(X\left)\right]\le \mathbb{E}\left[\phantom{\rule{0.3em}{0ex}}g\right(Y\left)\right] for all *s*-convex real functions *g* on {\mathcal{D}}_{n}. A convenient consequence is that if X{\le}_{s-\mathit{\text{cx}}}^{{\mathcal{D}}_{n}}Y then

\begin{array}{ll}\mathbb{E}\left({X}^{k}\right)=\mathbb{E}\left({Y}^{k}\right)& \phantom{\rule{1em}{0ex}}\text{for}\phantom{\rule{2.77626pt}{0ex}}k=1,2,\dots ,s-1\phantom{\rule{2em}{0ex}}\\ \mathbb{E}\left({X}^{k}\right)\le \mathbb{E}\left({Y}^{k}\right)& \phantom{\rule{1em}{0ex}}\text{for}\phantom{\rule{2.77626pt}{0ex}}k\ge \mathrm{s.}\phantom{\rule{2em}{0ex}}\end{array}

Define the *moment space* for all random variables with state set {\mathcal{D}}_{n} and fixed first *s*−1 moments {m}_{1},\dots ,{m}_{s-1} by

{\mathfrak{B}}_{s,n}^{\overrightarrow{m}}\equiv \mathfrak{B}\left({\mathcal{D}}_{n},{m}_{1},{m}_{2},\dots ,{m}_{s-1}\right).

Since the random variable *X* is non-negative, its moments are all non-negative. Further, we are only interested in cases where the mean is greater than 1, so that extinction is not certain. This provides a moment space with well-behaved properties. The study of the moment problem (see e.g., Karlin and McGregor 1957; Prékopa 1990) yields an important relationship between consecutive moments on {\mathfrak{B}}_{s,n}^{\overrightarrow{m}}, conditional on *m*_{1}≥1:

{\left({m}_{i}\right)}^{\frac{i+1}{i}}\le {m}_{i+1}\le n\,{m}_{i}.

(2)

Minimum and maximum extremal distributions on {\mathfrak{B}}_{s,n}^{\overrightarrow{m}} can be found for any distribution on {\mathcal{D}}_{n} with fixed first *s*−1 moments *m*_{1},*m*_{2},…,*m*_{s−1} (Denuit and Lefevre 1997). The random variables for these distributions are denoted {X}_{min}^{\left(s\right)} and {X}_{max}^{\left(s\right)}, such that

\begin{array}{ll}{X}_{min}^{\left(s\right)}{\le}_{s-\mathit{\text{cx}}}^{{\mathcal{D}}_{n}}X{\le}_{s-\mathit{\text{cx}}}^{{\mathcal{D}}_{n}}{X}_{max}^{\left(s\right)}& \phantom{\rule{1em}{0ex}}\text{for all}\phantom{\rule{2.77626pt}{0ex}}X\in {\mathfrak{B}}_{s,n}^{\overrightarrow{m}}.\phantom{\rule{2em}{0ex}}\end{array}

Extrema have been derived for *s*=2,3,4,5 (Denuit and Lefevre 1997; Denuit et al. 1999; Hürlimann 2005). Here we restate these results, providing the extremal distributions and their utility for obtaining bounds on the probability of extinction. We begin on {\mathfrak{B}}_{2,n}^{\overrightarrow{m}} with the maximal random variable, {X}_{max}^{\left(2\right)}, defined as:

{X}_{max}^{\left(2\right)}=\left\{\begin{array}{cc}0& \text{with}\phantom{\rule{2.77626pt}{0ex}}{p}_{0}=1-\frac{{m}_{1}}{n}\\ n& \text{with}\phantom{\rule{2.77626pt}{0ex}}{p}_{n}=\frac{{m}_{1}}{n}.\end{array}\right.

For {X}_{max}^{\left(2\right)} we observe *m*_{i+1}=*n* *m*_{i}, which attains the upper bound in (2) at every moment, so it is clearly the maximal extremum. Intuitively, this is the “long shot” distribution on {\mathcal{D}}_{n}, a worst-case scenario. Because the values and respective probabilities of {X}_{max}^{\left(2\right)} are known, *q* can be computed explicitly as the least positive root of *f*(*q*)=*q* with generating function:

f\left(q\right)={p}_{0}+{p}_{n}{q}^{n}.

This provides an upper limit on extinction because this generating function is greater than or equal to the generating functions of all other random variables with the same *m*_{1} and *n* on *q*∈[ 0,1].
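As a sketch, this upper bound can be computed from *m*_{1} and *n* alone; the values below are illustrative assumptions:

```python
# Upper bound on q from the "long shot" distribution X_max^(2).
# m1 and n are assumed illustrative inputs, not values from the text.
m1, n = 1.4, 3
p0, pn = 1 - m1 / n, m1 / n       # the two-point extremal distribution

# Fixed-point iteration on f(q) = p0 + pn * q^n from q = 0 converges
# to the smallest non-negative root of f(q) = q.
q_upper = 0.0
for _ in range(10_000):
    q_upper = p0 + pn * q_upper**n

print(round(q_upper, 4))
```

For any offspring distribution on {0,…,*n*} with this mean, the true extinction probability is at most this value.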

{\mathfrak{B}}_{2,n}^{\overrightarrow{m}} is a very general moment space, and the first moment alone often provides little information about an unknown distribution. Therefore, {X}_{max}^{\left(2\right)} is not likely to be a tight upper bound when *n* is large or unknown. However, if *m*_{1} is near *n*, then the distribution can be fairly well approximated by {X}_{max}^{\left(2\right)}.

Unlike {X}_{max}^{\left(2\right)}, {X}_{min}^{\left(2\right)} does not provide a useful bound on the probability of extinction. {X}_{min}^{\left(2\right)} is defined as:

{X}_{min}^{\left(2\right)}=\left\{\begin{array}{cc}\alpha & \text{with}\phantom{\rule{2.77626pt}{0ex}}{p}_{\alpha}=\alpha +1-{m}_{1}\\ \alpha +1& \text{with}\phantom{\rule{2.77626pt}{0ex}}{p}_{\alpha +1}={m}_{1}-\alpha \end{array}\right.

(3)

where *α* is the integer on {\mathcal{D}}_{n} such that

\alpha <{m}_{1}\le \alpha +1.

This extremal random variable represents a best-case scenario. However, since *m*_{1}>1, *α* must be at least one, so this branching process has no chance of producing zero offspring (i.e. *p*_{0}=0) and consequently no chance of extinction (*q*=0). Therefore {X}_{min}^{\left(2\right)} does not provide a useful bound on the probability of extinction, as the bound *q*≥0 is trivial.

This bound and all other bounds examined here can be found using discrete Chebyshev systems (Denuit and Lefevre 1997). However, extremal bounds are perhaps more intuitive for continuous random variables, to which the discrete cases are analogous (Shaked and Shanthikumar 2007; Hürlimann 2005; Denuit et al. 1999). For example, {X}_{min}^{\left(2\right)} in the continuous case has only one possible value, *m*_{1}, with {p}_{{m}_{1}}=1. By (2) this is clearly an extremum because (*m*_{i})^{(i+1)/i}=*m*_{i+1}=(*m*_{1})^{i+1}. The discrete case (3) has similar properties.

The following notation helps extend these calculations to higher-order systems (Denuit et al. 1999). Let w,x,y,z\in {\mathcal{D}}_{n}, and set *m*_{0}=1. Then:

\begin{array}{ll}\hfill {m}_{j,z}& :=z\cdot {m}_{j-1}-{m}_{j},\phantom{\rule{1em}{0ex}}j=1,2,\dots ;\phantom{\rule{2em}{0ex}}\\ \hfill {m}_{j,z,y}& :=y\cdot {m}_{j-1,z}-{m}_{j,z},\phantom{\rule{1em}{0ex}}j=2,3,\dots ;\phantom{\rule{2em}{0ex}}\\ \hfill {m}_{j,z,y,x}& :=x\cdot {m}_{j-1,z,y}-{m}_{j,z,y},\phantom{\rule{1em}{0ex}}j=3,4,\dots ;\phantom{\rule{2em}{0ex}}\\ \hfill {m}_{j,z,y,x,w}& :=w\cdot {m}_{j-1,z,y,x}-{m}_{j,z,y,x},\phantom{\rule{1em}{0ex}}j=4,5,\dots .\phantom{\rule{2em}{0ex}}\end{array}

The reader should recognize this notation as it is simply the iterative forward difference operator *Δ*^{k} for moments.
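The recursion is straightforward to implement. The helper below is our own sketch (the function name and the example moments are assumptions), applying the subscript points in the order given:

```python
def m_sub(ms, j, *points):
    """Iterated difference m_{j, z, y, ...}. `ms[k]` holds the raw
    moment m_k (with ms[0] = m_0 = 1); each point w in `points`
    applies the map  m_j -> w * m_{j-1} - m_j  in subscript order."""
    if not points:
        return ms[j]
    *earlier, w = points                 # w is the last-applied point
    return w * m_sub(ms, j - 1, *earlier) - m_sub(ms, j, *earlier)

# Example with illustrative moments m_0..m_3 (assumed values):
ms = [1.0, 1.4, 3.2, 8.0]
print(round(m_sub(ms, 2, 3), 10))       # m_{2,3}   = 3*m_1 - m_2   -> 1.0
print(round(m_sub(ms, 2, 3, 1), 10))    # m_{2,3,1} = 1*m_{1,3} - m_{2,3} -> 0.6
```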

If the first two moments are known, then a tighter upper bound can be found. On {\mathfrak{B}}_{3,n}^{\overrightarrow{m}} the minimal distribution in the 3-convex sense is given by:

{X}_{min}^{\left(3\right)}=\left\{\begin{array}{ll}0& \text{with}\phantom{\rule{2.77626pt}{0ex}}{p}_{0}=1-{p}_{\alpha}-{p}_{\alpha +1}\\ \alpha & \text{with}\phantom{\rule{2.77626pt}{0ex}}{p}_{\alpha}=\frac{{m}_{2,\alpha +1}}{\alpha}\\ \alpha +1& \text{with}\phantom{\rule{2.77626pt}{0ex}}{p}_{\alpha +1}=\frac{-{m}_{2,\alpha}}{\alpha +1}\end{array}\right.

where

\alpha <\frac{{m}_{2}}{{m}_{1}}\le \alpha +1.

This bound is already known in the branching process literature (Daley and Narayan 1980). Similar to {X}_{max}^{\left(2\right)}, the extremal random variable {X}_{min}^{\left(3\right)} represents a worst case scenario, this time using two moments. The root of the equation

f\left(q\right)=q={p}_{0}+{p}_{\alpha}{q}^{\alpha}+{p}_{\alpha +1}{q}^{\alpha +1}

(4)

provides an upper bound on the probability of extinction, because the probability generating function of {X}_{min}^{\left(3\right)} in (4) takes greater values at any *q*∈[ 0,1) than the probability generating function of any other random variable in {\mathfrak{B}}_{3,n}^{\overrightarrow{m}}.
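A sketch of this two-moment bound, with assumed illustrative values of *m*_{1} and *m*_{2}:

```python
import math

# Assumed illustrative first two moments (not values from the text).
m1, m2 = 1.4, 3.2

alpha = math.ceil(m2 / m1) - 1            # integer with alpha < m2/m1 <= alpha+1
p_a   = ((alpha + 1) * m1 - m2) / alpha   # p_alpha     =  m_{2,alpha+1} / alpha
p_a1  = (m2 - alpha * m1) / (alpha + 1)   # p_{alpha+1} = -m_{2,alpha} / (alpha+1)
p0    = 1 - p_a - p_a1

# Smallest root of (4): q = p0 + p_a q^alpha + p_a1 q^(alpha+1),
# found by fixed-point iteration from q = 0.
q_upper = 0.0
for _ in range(10_000):
    q_upper = p0 + p_a * q_upper**alpha + p_a1 * q_upper**(alpha + 1)

print(round(q_upper, 4))
```

With these moments the resulting root is tighter than the one-moment bound from {X}_{max}^{\left(2\right)} while still bounding *q* from above.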

In contrast to {X}_{max}^{\left(2\right)}, the minimum extremum on {\mathfrak{B}}_{3,n}^{\overrightarrow{m}} yields the upper limit for the probability of extinction. The alternation between minimum and maximum for the worst-case scenarios is due to the convexity of (1). Again, this extremum is perhaps more intuitive in the continuous sense, in which

{X}_{min,\phantom{\rule{1em}{0ex}}\text{cont.}}^{\left(3\right)}=\left\{\begin{array}{cc}0& \text{with}\phantom{\rule{2.77626pt}{0ex}}{p}_{0}=1-{p}_{{m}_{2}/{m}_{1}}\\ \frac{{m}_{2}}{{m}_{1}}& \text{with}\phantom{\rule{2.77626pt}{0ex}}{p}_{{m}_{2}/{m}_{1}}=\frac{{\left({m}_{1}\right)}^{2}}{{m}_{2}}.\end{array}\right.

In this case, successive moments grow by a factor of *m*_{2}/*m*_{1}, so that *m*_{i+1}=*m*_{i}(*m*_{2}/*m*_{1}), providing a minimum on {\mathfrak{B}}_{3,n}^{\overrightarrow{m}}. And, as was the case for the minimum on {\mathfrak{B}}_{2,n}^{\overrightarrow{m}}, the discrete minimum extremum on {\mathfrak{B}}_{3,n}^{\overrightarrow{m}} has similar properties to the continuous one.

For both {\mathfrak{B}}_{2,n}^{\overrightarrow{m}} and {\mathfrak{B}}_{3,n}^{\overrightarrow{m}} the discrete cases are simply discretizations of the continuous case. However, this does not necessarily hold for higher moment spaces (Courtois et al. 2006). While the continuous cases provide more intuitive extrema, deriving the discrete case for higher moments is not as simple as deriving the continuous case and discretizing.

Next, we examine the maximum extremum on {\mathfrak{B}}_{3,n}^{\overrightarrow{m}}:

{X}_{max}^{\left(3\right)}=\left\{\begin{array}{ll}\alpha & \text{with}\phantom{\rule{2.77626pt}{0ex}}{p}_{\alpha}=\frac{{m}_{2,n,\alpha +1}}{n-\alpha}\\ \alpha +1& \text{with}\phantom{\rule{2.77626pt}{0ex}}{p}_{\alpha +1}=\frac{-{m}_{2,n,\alpha}}{n-\alpha -1}\\ n& \text{with}\phantom{\rule{2.77626pt}{0ex}}{p}_{n}=1-{p}_{\alpha}-{p}_{\alpha +1}\end{array}\right.

where

\alpha <\frac{n{m}_{1}-{m}_{2}}{n-{m}_{1}}\le \alpha +1.

Since {X}_{max}^{\left(3\right)} can only provide non-trivial information about *q* if *p*_{0}>0, this extremal distribution is only informative about extinction when *α*=0 and *p*_{α}>0, which is the case whenever *n* *m*_{1}−*m*_{2}<*n*−*m*_{1}. Although this requirement may appear restrictive, some classes of distributions have simple rules under which {X}_{max}^{\left(3\right)} is informative. For example, for binomial distributions *B*_{n,p}, {X}_{max}^{\left(3\right)} will provide a non-zero lower bound if 1/*n*<*p*≤1/(*n*−1).
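The stated binomial rule can be checked numerically. The sketch below combines the condition *n* *m*_{1}−*m*_{2}<*n*−*m*_{1} with the standing requirement *m*_{1}>1 (supercriticality); the probe values of *p* are assumptions chosen to straddle the window:

```python
def xmax3_informative(n, p):
    """True when X_max^(3) yields a non-trivial lower bound for a
    Binomial(n, p) offspring distribution: the process must be
    supercritical (m1 > 1) and alpha must equal 0."""
    m1 = n * p
    m2 = n * p * (1 - p) + (n * p) ** 2   # E[X^2] for a binomial
    return m1 > 1 and n * m1 - m2 < n - m1

# For n = 5 the stated rule predicts informativeness exactly when
# 1/5 < p <= 1/4; the three probes below straddle that window.
print([xmax3_informative(5, p) for p in (0.15, 0.22, 0.30)])
# -> [False, True, False]
```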

We move on to {\mathfrak{B}}_{4,n}^{\overrightarrow{m}}. The use of three moments can improve bounds on the probability of extinction but, as with all of the maximal random variables, {X}_{max}^{\left(4\right)} requires knowledge of the maximum, *n*. {X}_{max}^{\left(4\right)} is defined as:

{X}_{max}^{\left(4\right)}=\left\{\begin{array}{ll}0& \text{with}\phantom{\rule{2.77626pt}{0ex}}{p}_{0}=1-{p}_{\alpha}-{p}_{\alpha +1}-{p}_{n}\\ \alpha & \text{with}\phantom{\rule{2.77626pt}{0ex}}{p}_{\alpha}=\frac{{m}_{3,n,\alpha +1}}{\alpha (n-\alpha )}\\ \alpha +1& \text{with}\phantom{\rule{2.77626pt}{0ex}}{p}_{\alpha +1}=\frac{-{m}_{3,n,\alpha}}{(\alpha +1)(n-\alpha -1)}\\ n& \text{with}\phantom{\rule{2.77626pt}{0ex}}{p}_{n}=\frac{{m}_{3,\alpha ,\alpha +1}}{n(n-\alpha )(n-\alpha -1)}\end{array}\right.

where

\alpha <\frac{{m}_{2}n-{m}_{3}}{{m}_{1}n-{m}_{2}}\le \alpha +1.

While this is a potential improvement to the upper bound given by {X}_{min}^{\left(3\right)}, the improvement is sometimes negligible. As n\to \infty, the difference between {X}_{max}^{\left(4\right)} and {X}_{min}^{\left(3\right)} vanishes because

\begin{array}{ll}\underset{n\to \infty}{lim}& \phantom{\rule{1em}{0ex}}\frac{{m}_{2}n-{m}_{3}}{{m}_{1}n-{m}_{2}}=\frac{{m}_{2}}{{m}_{1}}\phantom{\rule{2em}{0ex}}\end{array}

and furthermore, if n\to \infty then *p*_{n}→0. Therefore, the resulting generating function for {X}_{max}^{\left(4\right)} is identical to (4) if the maximal value is unknown. So, like the first moment, the third moment is uninformative about extinction when *n* is unknown, unless assumptions are made about the distribution (see e.g., Daley and Narayan 1980; Ethier and Khoshnevisan 2002).

The minimal extremum for {\mathfrak{B}}_{4,n}^{\overrightarrow{m}}, {X}_{min}^{\left(4\right)}, is given by

{X}_{min}^{\left(4\right)}=\left\{\begin{array}{ll}\alpha ,& \text{with}\phantom{\rule{2.77626pt}{0ex}}{p}_{\alpha}=\frac{{m}_{3,\beta ,\beta +1,\alpha +1}}{(\beta -\alpha )(\beta +1-\alpha )}\\ \alpha +1,& \text{with}\phantom{\rule{2.77626pt}{0ex}}{p}_{\alpha +1}=\frac{-{m}_{3,\beta ,\beta +1,\alpha}}{(\beta -\alpha )(\beta -1-\alpha )}\\ \beta ,& \text{with}\phantom{\rule{2.77626pt}{0ex}}{p}_{\beta}=\frac{{m}_{3,\alpha ,\alpha +1,\beta +1}}{(\beta -\alpha )(\beta -1-\alpha )}\\ \beta +1,& \text{with}\phantom{\rule{2.77626pt}{0ex}}{p}_{\beta +1}=\frac{-{m}_{3,\alpha ,\alpha +1,\beta}}{(\beta -\alpha )(\beta +1-\alpha )}\end{array}\right.

where *α* and *β* are given by

\alpha <\frac{{m}_{3,\beta ,\beta +1}}{{m}_{2,\beta ,\beta +1}}\le \alpha +1,\phantom{\rule{1em}{0ex}}\beta <\frac{{m}_{3,\alpha ,\alpha +1}}{{m}_{2,\alpha ,\alpha +1}}\le \beta +1.

Again, this bound is only useful if *p*_{0}>0. Unfortunately, there is no short-form equation to identify which spaces {\mathfrak{B}}_{4,n}^{\overrightarrow{m}} meet this requirement. However, one can easily determine whether a given {\mathfrak{B}}_{4,n}^{\overrightarrow{m}} has a useful {X}_{min}^{\left(4\right)}. Assuming *α*=0, \hat{\beta} is simply bounded by

\hat{\beta}<\frac{{m}_{3}-{m}_{2}}{{m}_{2}-{m}_{1}}\le \hat{\beta}+1.

And if {m}_{3,\hat{\beta},\hat{\beta}+1}<{m}_{2,\hat{\beta},\hat{\beta}+1}, then the bound is useful because the resulting {X}_{min}^{\left(4\right)} has *p*_{0}>0. Alternatively, if {m}_{3,\hat{\beta},\hat{\beta}+1}\ge {m}_{2,\hat{\beta},\hat{\beta}+1}, the support of {X}_{min}^{\left(4\right)} excludes zero, so *p*_{0}=0 and consequently *q*=0, yielding only the trivial bound.
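This check is mechanical to carry out. A sketch with assumed moments (those of a hypothetical distribution on {0,…,3}, not values from the text):

```python
import math

# Assumed first three moments of a hypothetical offspring pmf
# [0.3, 0.2, 0.3, 0.2] on {0,...,3}.
ms = [1.0, 1.4, 3.2, 8.0]                 # ms[k] = m_k, with m_0 = 1

def md(j, z):
    """m_{j,z} = z*m_{j-1} - m_j."""
    return z * ms[j - 1] - ms[j]

def mdd(j, b):
    """m_{j,b,b+1} = (b+1)*m_{j-1,b} - m_{j,b}."""
    return (b + 1) * md(j - 1, b) - md(j, b)

# Assuming alpha = 0, beta_hat satisfies
#   beta_hat < (m3 - m2)/(m2 - m1) <= beta_hat + 1.
beta_hat = math.ceil((ms[3] - ms[2]) / (ms[2] - ms[1])) - 1

# The bound is useful iff m_{3,b,b+1} < m_{2,b,b+1} (then p_0 > 0).
print(beta_hat, mdd(3, beta_hat) < mdd(2, beta_hat))
```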

If the first four moments are known, the extremal variable {X}_{min}^{\left(5\right)} can be obtained. Its distribution takes a simple form, but the equations used to find its values and respective probabilities are comparatively large. From (Hürlimann 2005), {X}_{min}^{\left(5\right)} is defined as:

{X}_{min}^{\left(5\right)}=\left\{\begin{array}{ll}0& \text{with}\phantom{\rule{2.77626pt}{0ex}}{p}_{0}=1-{p}_{\alpha}-{p}_{\alpha +1}-{p}_{\beta}-{p}_{\beta +1}\\ \alpha & \text{with}\phantom{\rule{2.77626pt}{0ex}}{p}_{\alpha}=\frac{{m}_{4,\beta ,\beta +1,\alpha +1}}{\alpha (\beta -\alpha )(\beta +1-\alpha )}\\ \alpha +1& \text{with}\phantom{\rule{2.77626pt}{0ex}}{p}_{\alpha +1}=\frac{-{m}_{4,\beta ,\beta +1,\alpha}}{(\alpha +1)(\beta -\alpha )(\beta -1-\alpha )}\\ \beta & \text{with}\phantom{\rule{2.77626pt}{0ex}}{p}_{\beta}=\frac{{m}_{4,\alpha ,\alpha +1,\beta +1}}{\beta (\beta -\alpha )(\beta -1-\alpha )}\\ \beta +1& \text{with}\phantom{\rule{2.77626pt}{0ex}}{p}_{\beta +1}=\frac{-{m}_{4,\alpha ,\alpha +1,\beta}}{(\beta +1)(\beta -\alpha )(\beta +1-\alpha )}\end{array}\right.

where

\alpha <\frac{{m}_{4,\beta ,\beta +1}}{{m}_{3,\beta ,\beta +1}}\le \alpha +1,\phantom{\rule{1em}{0ex}}\beta <\frac{{m}_{4,\alpha ,\alpha +1}}{{m}_{3,\alpha ,\alpha +1}}\le \beta +1.

(5)

Courtois et al. (2006) explained that there is no analytic form to directly obtain *α* and *β* for {X}_{min}^{\left(5\right)}. They showed this by disproving the intuitive idea that the discrete support encloses the continuous support. To find *α* and *β*, we iteratively search all possible supports on {\mathcal{D}}_{n} until both inequalities are satisfied. This exhaustive method for finding the supports of this extremum is not ideal, especially if {\mathcal{D}}_{n} is dense. Linear programming can be used to find the extremal supports and their probabilities (Prékopa 1990), but such approaches are not necessary when {\mathcal{D}}_{n} is sparse (e.g., when *n* is relatively small).
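The exhaustive search itself is simple to sketch. In the code below (our own illustration), the moments are those of a hypothetical pmf on {0,…,6}, and candidate pairs (*α*, *β*) are scanned until both inequalities in (5) hold:

```python
def m_sub(ms, j, *points):
    """Iterated difference m_{j,z,y,...}: each point w in subscript
    order applies the map  m_j -> w * m_{j-1} - m_j."""
    if not points:
        return ms[j]
    *earlier, w = points
    return w * m_sub(ms, j - 1, *earlier) - m_sub(ms, j, *earlier)

def find_support(ms, n):
    """Exhaustive search for (alpha, beta) satisfying both inequalities
    in (5); alpha >= 1 since 0 is already a support point of X_min^(5)."""
    for a in range(1, n):
        for b in range(a + 2, n):               # beta + 1 must not exceed n
            r_a = m_sub(ms, 4, b, b + 1) / m_sub(ms, 3, b, b + 1)
            r_b = m_sub(ms, 4, a, a + 1) / m_sub(ms, 3, a, a + 1)
            if a < r_a <= a + 1 and b < r_b <= b + 1:
                return a, b
    return None

# Moments m_0..m_4 of the hypothetical pmf
# [0.2, 0.1, 0.2, 0.1, 0.1, 0.1, 0.2] on {0,...,6}.
ms = [1.0, 2.9, 13.1, 66.5, 358.7]
print(find_support(ms, 6))   # -> (2, 5)
```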

Hürlimann (2005) also presents a form for the upper extremal variable in {\mathfrak{B}}_{5,n}^{\overrightarrow{m}}. The random variable {X}_{max}^{\left(5\right)} is defined as:

{X}_{max}^{\left(5\right)}=\left\{\begin{array}{ll}\alpha ,& \text{with}\phantom{\rule{2.77626pt}{0ex}}{p}_{\alpha}=\frac{{m}_{4,n,\beta ,\beta +1,\alpha +1}}{(\beta -\alpha )(\beta +1-\alpha )(n-\alpha )}\\ \alpha +1,& \text{with}\phantom{\rule{2.77626pt}{0ex}}{p}_{\alpha +1}=\frac{-{m}_{4,n,\beta ,\beta +1,\alpha}}{(\beta -\alpha )(\beta -\alpha -1)(n-\alpha -1)}\\ \beta ,& \text{with}\phantom{\rule{2.77626pt}{0ex}}{p}_{\beta}=\frac{{m}_{4,n,\alpha ,\alpha +1,\beta +1}}{(\beta -\alpha )(\beta -\alpha -1)(n-\beta )}\\ \beta +1,& \text{with}\phantom{\rule{2.77626pt}{0ex}}{p}_{\beta +1}=\frac{-{m}_{4,n,\alpha ,\alpha +1,\beta}}{(\beta -\alpha )(\beta +1-\alpha )(n-\beta -1)}\\ n,& \text{with}\phantom{\rule{2.77626pt}{0ex}}{p}_{n}=1-{p}_{\alpha}-{p}_{\alpha +1}-{p}_{\beta}-{p}_{\beta +1}\end{array}\right.

where

\begin{array}{l}\alpha <\frac{{m}_{4,n,\beta ,\beta +1}}{{m}_{3,n,\beta ,\beta +1}}\le \alpha +1,\phantom{\rule{1em}{0ex}}\beta <\frac{{m}_{4,n,\alpha ,\alpha +1}}{{m}_{3,n,\alpha ,\alpha +1}}\le \beta +1.\end{array}

As was the case for {X}_{min}^{\left(4\right)}, one can determine if {X}_{max}^{\left(5\right)} has *p*_{0}>0 by assuming *α*=0 and solving for \hat{\beta} with

\hat{\beta}<\frac{{m}_{4,n,0,1}}{{m}_{3,n,0,1}}\le \hat{\beta}+1.

If the resulting \hat{\beta} satisfies the inequality {m}_{4,n,\hat{\beta},\hat{\beta}+1}<{m}_{3,n,\hat{\beta},\hat{\beta}+1} (so that *p*_{0}>0), the bound for {X}_{max}^{\left(5\right)} is informative.

All {X}_{max}^{\left(j\right)} extrema rely on the maximum offspring number, *n*. As with {X}_{max}^{\left(4\right)}, when *n* is unknown or infinite, {X}_{max}^{\left(5\right)} converges to the minimal extremum on the lower moment space, here {X}_{min}^{\left(4\right)}. Thus, if *n* is unknown, {X}_{max}^{\left(j\right)} reduces to {X}_{min}^{(j-1)}, at least for the cases examined here.

The Chebyshev approach can be used to extend these results to higher moments (Hürlimann 2005). However, moments above the fourth are rarely used, and higher moments can be difficult to estimate from small samples. Further, the equations for the supports and probabilities become increasingly complex above the fourth moment.