Extinction in a branching process: why some of the fittest strategies cannot guarantee survival

Biological fitness is typically measured by the expected rate of reproduction, but strategies with high fitness can also have high probabilities of extinction. Likewise, gambling strategies with a high expected payoff can also have a high risk of ruin. We take inspiration from the gambler’s ruin problem to examine how extinction is related to population growth. Using moment theory, we demonstrate how higher moments can impact the probability of extinction and how the first few moments can be used to find bounds on the extinction probability, focusing on s-convex ordering of random variables. This approach generates “best case” and “worst case” scenarios to provide upper and lower bounds on the probability of extinction.

Mathematics Subject Classification: 92D15, 60J80, 60E15


Extinction of a branching process
Reproduction is necessary for the survival of populations. Populations with high rates of reproduction will often avoid extinction. However, a population may have a high expected reproductive rate but may nevertheless go extinct with near certainty (Lewontin and Cohen, 1969). For example, populations with large variation in reproductive success can sometimes have a high probability of extinction, even if they have a high expected growth (Tuljapurkar and Orzack, 1980).
Similarly, investors and gamblers can avoid Gambler's Ruin through growth of capital. However, a gambler should not simply apply the strategy with the highest expected growth rate, as it may also have a high risk of ruin. For example, investors can use the Kelly ratio (Kelly, 1956) to maximize expected geometric growth of their capital, but strict adherence to this ratio can be risky, and playing a more conservative strategy is often recommended (MacLean et al., 2010).
To estimate the probability of Gambler's Ruin, one can use approximations based on moments (Ethier and Khoshnevisan, 2002; Canjar, 2007; Hürlimann, 2005). Here we apply these approaches to estimate the probability of extinction in a branching process. The mathematics of Gambler's Ruin is very similar to that of extinction in a branching process (Courtois et al., 2006). Both statistical models involve a random variable (payoff/offspring number), resulting in a random walk (change in capital/change in population size), and an absorbing state (ruin/extinction). Moreover, both processes are assumed to be Markovian, and finding the probability of ruin/extinction involves solving for the root of a convex function.
Here we examine the random variable representing the number of offspring, and investigate how the moments of this random variable are related to the probability of extinction. We demonstrate an important relationship between these moments and extinction: odd moments favor survival and even moments favor extinction. The first moment of the offspring distribution, its mean, has the biggest influence on extinction. However, the first moment alone is not usually informative about extinction probabilities. In fact, strategies with arbitrarily large first moments can nevertheless go extinct with near certainty. Some of the "fittest" strategies can be highly unlikely to survive.
Using the first few moments of the offspring distribution, one can obtain bounds on the ultimate probability of extinction (Courtois et al., 2006; Daley and Narayan, 1980). These bounds provide "best case" and "worst case" distributions. We present these bounds, termed s-convex extremal random variables, adapted from actuarial science and research on the gambler's ruin problem (Denuit and Lefevre, 1997; Hürlimann, 2005; Courtois et al., 2006). We find the conditions under which these extremals provide non-trivial bounds. Using some simple examples, we demonstrate how these methods can be used to compare distributions using their moments.

Extinction in the Galton-Watson branching process
To investigate biological extinction, we use a Galton-Watson branching process in which, at each discrete time step, every individual generates $i$ offspring with probability $p_i$, and zero offspring with probability $p_0$. Without loss of generality we assume that an individual produces its offspring and then dies, so that each individual in a population is restricted to a single generation. The offspring number is a random variable, which we denote by $X$. Let $n$ be the maximum value of $X$, so that $X$ takes values in the state space $D_n = \{0, 1, 2, \ldots, n\}$. At any given time $t$, the size of the population, $Z_t$, is the number of individuals in the branching process. We set $Z_0 \equiv 1$ unless otherwise specified. The probability of extinction of a branching process is $q \equiv \lim_{t\to\infty} P(Z_t = 0 \mid Z_0 = 1)$. If the starting size of the population is greater than one, then, because the lineages of the founding individuals die out independently of one another, the overall probability of extinction can be defined as
$$\lim_{t\to\infty} P(Z_t = 0 \mid Z_0 = z) = q^{z}.$$
So we can solve for extinction in the case of $Z_0 = 1$ and extend the results to larger starting populations if necessary.
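The process described above can be sketched as a small Monte Carlo simulation. This is only an illustration under stated assumptions: the offspring law is taken to be the Binomial(20, 0.1) example used later in the paper, the helper names (`goes_extinct`, `estimate_extinction`) are hypothetical, and a lineage that reaches `cap` individuals is simply counted as surviving, which is an approximation.

```python
import random
from math import comb

def binomial_pmf(n, p):
    """Offspring probabilities p_0..p_n for a Binomial(n, p) offspring number."""
    return [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]

def goes_extinct(pmf, rng, z0=1, cap=200, max_gen=1000):
    """Simulate one population; reaching `cap` individuals is treated as survival."""
    values = range(len(pmf))
    z = z0
    for _ in range(max_gen):
        if z == 0:
            return True
        if z >= cap:
            return False
        # Every individual reproduces independently and then dies.
        z = sum(rng.choices(values, weights=pmf, k=z))
    return z == 0

def estimate_extinction(pmf, runs=4000, z0=1, seed=1):
    """Fraction of simulated populations, started from z0 individuals, that die out."""
    rng = random.Random(seed)
    return sum(goes_extinct(pmf, rng, z0) for _ in range(runs)) / runs

pmf = binomial_pmf(20, 0.1)
q1 = estimate_extinction(pmf, z0=1)
q2 = estimate_extinction(pmf, z0=2)
print(q1, q2, q1**2)  # q2 should be close to q1 squared
```

The estimate for two founders should be close to the square of the estimate for one founder, up to sampling noise, which reflects the independence of the founding lineages.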
The recursive formula for finding $q$ can be found through a first step analysis (Kimmel and Axelrod, 2002). The probability that the lineage of a single individual eventually goes extinct is the probability that it dies without offspring ($p_0$), plus the probability that it produces a single offspring whose lineage dies out ($p_1 q$), plus the probability that it produces two offspring whose joint lineages die out ($p_2 q^2$), and so on.
This leads to the formal definition of the probability generating function:
$$f(q) = \sum_{i=0}^{n} p_i q^i. \tag{1}$$
The probability of extinction of a branching process starting with a single individual is the smallest root of the equation $f(q) = q$ for $q \in [0, 1]$. The solution $q = 1$ is always a root of $f(q) = q$ but is not necessarily the smallest non-negative root. In some cases, the probability of extinction is obvious. For instance, if $p_0 = 0$, individuals always produce at least one offspring, and therefore $q = 0$. Furthermore, cases where $E[X] \le 1$ always yield $q = 1$ (Kimmel and Axelrod, 2002).
Inferring the probability of extinction analytically for branching processes with $p_0 > 0$ and $E[X] > 1$ can be difficult because (1), set equal to $q$, is a polynomial equation of degree $n$ and has $n$ complex-valued roots according to the fundamental theorem of algebra. In the following we illustrate how (1) can be seen in terms of moments of the offspring distribution, and discuss how this approach can be used to estimate $q$.
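When the whole offspring distribution is known, the smallest root can be found numerically without any root-finding machinery. The sketch below iterates q_{t+1} = f(q_t) from q_0 = 0, so that q_t is the probability of extinction by generation t and the sequence increases to the smallest root of f(q) = q. The function names and the Binomial(20, 0.1) offspring law are illustrative assumptions, not part of the original analysis.

```python
from math import comb

def pgf(pmf, q):
    """Probability generating function f(q) = sum_i p_i q^i."""
    return sum(p * q**i for i, p in enumerate(pmf))

def extinction_probability(pmf, tol=1e-12, max_iter=100_000):
    """Smallest root of f(q) = q on [0, 1], via q_{t+1} = f(q_t) from q_0 = 0."""
    q = 0.0
    for _ in range(max_iter):
        q_next = pgf(pmf, q)
        if abs(q_next - q) < tol:
            return q_next
        q = q_next
    return q

n, p = 20, 0.1
binomial = [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]
q_binomial = extinction_probability(binomial)
print(round(q_binomial, 3))
```

For this offspring law the result agrees with the value of 0.181 quoted for the binomial example in Table 1. Convergence is slow when the mean offspring number is close to 1, which is why the iteration cap is generous.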

Moments of the branching process
Let $m_k = E[X^k]$ denote the $k$th moment of the branching process generator $X$. The first moment, $m_1$, is equivalent to the average offspring number. Higher moments can be used to obtain other summary statistics of the distribution, such as the variance
$$\sigma^2 = m_2 - m_1^2.$$
Writing $q^X = e^{X \log q}$ and expanding the exponential, the Laplace transform of (1) can be used to (recursively) express extinction in terms of the moments of the branching process:
$$q = f(q) = E\left[e^{X \log q}\right] = \sum_{k=0}^{\infty} \frac{(\log q)^k}{k!}\, m_k,$$
where $m_0 = 1$. Note that $m_k > 0$ for all $k \ge 0$. Furthermore, with $q \in (0, 1)$ we have $\log q < 0$, so $(\log q)^k$ is positive for even $k$ and negative for odd $k$. Therefore, even moments increase the probability of extinction while odd moments decrease it. Additionally, if $q \in (e^{-1}, 1)$ then $\log q \in (-1, 0)$ and the powers $(\log q)^k$ shrink with $k$, which speeds up convergence of the series. Thus, approximations, $f^*(q)$, which take the form
$$f^*(q) = \sum_{k=0}^{s} \frac{(\log q)^k}{k!}\, m_k$$
for $s \ge 3$ are only accurate when $q$ is large and the moments are small. As $q \downarrow 0$, the series requires more and more terms to provide an accurate approximation. Therefore, when $q$ is small the first few moments are not necessarily informative about the probability of extinction.
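The behaviour of the moment expansion can be checked numerically. The sketch below compares the probability generating function with partial sums of the series in powers of log q, using the identity q^X = exp(X log q). The Binomial(20, 0.1) offspring law, the function names, and the choice of truncation points are assumptions made for illustration only.

```python
from math import comb, factorial, log

n, p = 20, 0.1
pmf = [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]

def moment(pmf, k):
    """k-th raw moment m_k = E[X^k], with m_0 = 1."""
    return sum(prob * i**k for i, prob in enumerate(pmf))

def pgf(pmf, q):
    """Probability generating function f(q) = E[q^X]."""
    return sum(prob * q**i for i, prob in enumerate(pmf))

def moment_series(pmf, q, terms):
    """Partial sum of (log q)^k m_k / k! for k = 0..terms."""
    lq = log(q)
    return sum(lq**k * moment(pmf, k) / factorial(k) for k in range(terms + 1))

for q in (0.9, 0.2):
    partial = [round(moment_series(pmf, q, s), 4) for s in (2, 4, 8, 80)]
    print(q, round(pgf(pmf, q), 4), partial)
```

Near q = 1 a handful of moments already reproduces f(q) closely, whereas at q = 0.2 the low-order partial sums are far off and only a long expansion recovers the generating function.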

s-Convex orderings of random variables
Here we demonstrate how the first few moments of the offspring distribution can be used to find bounds on the probability of extinction. The random variable $X$ is bounded by zero and its maximum, $n$, conveniently allowing for s-convex ordering (Denuit and Lefevre, 1997; Hürlimann, 2005; Courtois et al., 2006). Define the moment space for all random variables with state set $D_n$ and fixed first $s - 1$ moments $m_1, \ldots, m_{s-1}$ by
$$B^{m}_{s,n} = \left\{ X \text{ on } D_n : E[X^k] = m_k,\ k = 1, \ldots, s-1 \right\}.$$
Since the random variable $X$ is non-negative, its moment space only contains positive elements. Further, we are only interested in cases where the mean is greater than 1 so that extinction is not certain. This provides a moment space with well behaved properties. The study of the moment problem (e.g., Karlin and McGregor, 1957; Prékopa, 1990) yields an important relationship between consecutive moments on $B^{m}_{s,n}$ conditional on $m_1 \ge 1$:
$$(m_i)^{(i+1)/i} \le m_{i+1} \le n\, m_i. \tag{2}$$
For two random variables $X$ and $Y$ with state set $D_n$, we say that
$$X \le_{s\text{-cx}} Y \quad \text{if} \quad E[\phi(X)] \le E[\phi(Y)] \ \text{ for all } s\text{-convex functions } \phi \text{ on } D_n.$$
Minimum and maximum extremal distributions on $B^{m}_{s,n}$ can be found for any distribution on $D_n$ with fixed first $s - 1$ moments $m_1, m_2, \ldots, m_{s-1}$ (Denuit and Lefevre, 1997). The random variables for these distributions are denoted $X^{(s)}_{\min}$ and $X^{(s)}_{\max}$, such that
$$X^{(s)}_{\min} \le_{s\text{-cx}} X \le_{s\text{-cx}} X^{(s)}_{\max} \quad \text{for all } X \in B^{m}_{s,n}.$$
See Denuit and Lefevre (1997), Denuit et al. (1999b) and Hürlimann (2005) for detailed definitions of s-convexity. Following the results from these papers, we define the extremal min/max random variables given the first few moments. We begin on $B^{m}_{2,n}$ with the maximal random variable, $X^{(2)}_{\max}$, defined as:
$$X^{(2)}_{\max} = \begin{cases} 0 & \text{with probability } p_0 = 1 - m_1/n, \\ n & \text{with probability } p_n = m_1/n. \end{cases}$$
For $X^{(2)}_{\max}$ we observe $m_{i+1} = n\, m_i$, so by (2) this can clearly be seen as the maximum extremum. Intuitively, this is the "long shot" distribution on $D_n$, a worst case scenario. Because the values and respective probabilities of $X^{(2)}_{\max}$ are known, $q$ can be solved explicitly by finding the least positive root of $f(q) = q$ for the generating function
$$f(q) = 1 - \frac{m_1}{n} + \frac{m_1}{n}\, q^{n}.$$
This provides an upper limit on extinction because this generating function will be greater than or equal to the generating function for all other random variables with the same $m_1$ and $n$, on $q \in [0, 1]$.
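As a rough illustration of the one-moment worst case, the sketch below builds the two-point "long shot" law on {0, n} with mean m_1 and solves f(q) = q by iteration. The values m_1 = 2 and n = 20 match the examples discussed later; the function names are hypothetical and the fixed-point solver is an assumption of this sketch rather than the authors' procedure.

```python
def pgf(pmf, q):
    """Probability generating function f(q) = sum_i p_i q^i."""
    return sum(p * q**i for i, p in enumerate(pmf))

def extinction_probability(pmf, tol=1e-12, max_iter=100_000):
    """Smallest root of f(q) = q on [0, 1], iterating from q = 0."""
    q = 0.0
    for _ in range(max_iter):
        q_next = pgf(pmf, q)
        if abs(q_next - q) < tol:
            return q_next
        q = q_next
    return q

def x2_max(m1, n):
    """Two-point law on {0, n} with mean m1, returned as a pmf over 0..n."""
    pmf = [0.0] * (n + 1)
    pmf[n] = m1 / n
    pmf[0] = 1.0 - m1 / n
    return pmf

upper = extinction_probability(x2_max(2.0, 20))
print(upper)  # one-moment upper bound on q for any law on 0..20 with mean 2
```

The bound is loose here: it sits far above the extinction probability of 0.181 reported for the binomial example with the same mean and maximum.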

$B^{m}_{2,n}$ is a very general moment space and the first moment does not often provide much information about an unknown distribution. Therefore, $X^{(2)}_{\max}$ is not likely to be a tight upper bound when $n$ is large or unknown. However, if $m_1$ is near $n$, then the distribution can be fairly well approximated by $X^{(2)}_{\max}$. $X^{(2)}_{\min}$ does not provide a useful bound on the probability of extinction. $X^{(2)}_{\min}$ is defined as:
$$X^{(2)}_{\min} = \begin{cases} \alpha & \text{with probability } p_\alpha = \alpha + 1 - m_1, \\ \alpha + 1 & \text{with probability } p_{\alpha+1} = m_1 - \alpha, \end{cases} \tag{3}$$
where $\alpha$ is the integer on $D_n$ such that
$$\alpha \le m_1 < \alpha + 1.$$
This extremal random variable represents a best case scenario. However, since $m_1 > 1$, $\alpha$ must be larger than zero and this branching process has no chance of death (i.e., $p_0 = 0$) and consequently no chance of extinction ($q = 0$). Therefore $X^{(2)}_{\min}$ does not provide a useful bound on the probability of extinction, as the bound $q \ge 0$ is obvious. This bound and all other bounds examined here can be found using discrete Chebyshev systems (Denuit and Lefevre, 1997). However, extremal bounds are perhaps more intuitive for continuous random variables, to which the discrete cases can be seen as similar (Shaked and Shanthikumar, 2007; Hürlimann, 2005; Denuit et al., 1999b). For example, $X^{(2)}_{\min}$ in the continuous case has only one possible value, $m_1$, with $p_{m_1} = 1$. By (2) this is clearly an extremum because $(m_i)^{(i+1)/i} = m_{i+1} = (m_1)^{i+1}$. In comparison, the discrete case (3) has similar properties.
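The following notation helps in extending these calculations to higher order systems (Denuit et al., 1999a). Let $w, x, y, z \in D_n$, and set $m_0 = 1$. Then:
$$m_{j,z} := z\, m_{j-1} - m_j, \quad j = 1, 2, \ldots;$$
$$m_{j,z,y} := y\, m_{j-1,z} - m_{j,z}, \quad j = 2, 3, \ldots;$$
$$m_{j,z,y,x} := x\, m_{j-1,z,y} - m_{j,z,y}, \quad j = 3, 4, \ldots;$$
$$m_{j,z,y,x,w} := w\, m_{j-1,z,y,x} - m_{j,z,y,x}, \quad j = 4, 5, \ldots$$
If the first two moments are known, then a tighter upper bound can be found. On $B^{m}_{3,n}$ the minimal distribution in the 3-convex sense is given by:
$$X^{(3)}_{\min} = \begin{cases} 0 & \text{with probability } p_0 = 1 - p_\alpha - p_{\alpha+1}, \\ \alpha & \text{with probability } p_\alpha = \dfrac{m_{2,\alpha+1}}{\alpha} = \dfrac{(\alpha+1)\, m_1 - m_2}{\alpha}, \\ \alpha + 1 & \text{with probability } p_{\alpha+1} = \dfrac{-m_{2,\alpha}}{\alpha+1} = \dfrac{m_2 - \alpha\, m_1}{\alpha+1}, \end{cases}$$
where $\alpha$ is the integer on $D_n$ such that $\alpha \le m_2/m_1 < \alpha + 1$. This bound is already known in the branching process literature (Daley and Narayan, 1980). Similar to $X^{(2)}_{\max}$, the extremal random variable $X^{(3)}_{\min}$ represents a worst case scenario, this time using two moments. The root of the equation
$$f(q) = q = p_0 + p_\alpha q^{\alpha} + p_{\alpha+1} q^{\alpha+1} \tag{4}$$
provides an upper bound to the probability of extinction, because the generating function in (4) takes values at least as large, at any $q \in [0, 1)$, as the probability generating function of any other random variable in $B^{m}_{3,n}$.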
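The recursive notation and the two-moment worst case can be sketched together. The helper `m_rec` is a hypothetical name implementing the recursion with m_0 = 1; `x3_min` builds a three-point law on {0, α, α+1} matched to m_1 and m_2, with α taken as the integer part of m_2/m_1, which is an assumption of this sketch consistent with the support appearing in (4). The moments used are those of a Binomial(20, 0.1) offspring number (mean 2, variance 1.8, so m_2 = 5.8).

```python
from math import floor

def m_rec(moments, j, *zs):
    """m_{j,z,y,...}: each extra index z applies m_{j,...,z} = z*m_{j-1,...} - m_{j,...}.
    moments[k] holds m_k, with moments[0] = 1."""
    if not zs:
        return moments[j]
    *rest, z = zs
    return z * m_rec(moments, j - 1, *rest) - m_rec(moments, j, *rest)

def x3_min(m1, m2):
    """Three-point law on {0, alpha, alpha+1} matching the mean m1 and second moment m2."""
    moments = [1.0, m1, m2]
    alpha = floor(m2 / m1)  # alpha <= m2/m1 < alpha + 1
    pmf = [0.0] * (alpha + 2)
    pmf[alpha] = m_rec(moments, 2, alpha + 1) / alpha
    pmf[alpha + 1] = -m_rec(moments, 2, alpha) / (alpha + 1)
    pmf[0] = 1.0 - pmf[alpha] - pmf[alpha + 1]
    return pmf

def extinction_probability(pmf, tol=1e-12, max_iter=100_000):
    """Smallest root of f(q) = q on [0, 1], iterating from q = 0."""
    q = 0.0
    for _ in range(max_iter):
        q_next = sum(p * q**i for i, p in enumerate(pmf))
        if abs(q_next - q) < tol:
            return q_next
        q = q_next
    return q

worst_case = x3_min(2.0, 5.8)
q_upper = extinction_probability(worst_case)
print(worst_case, q_upper)
```

With these inputs the support is {0, 2, 3} and the resulting upper bound is close to 1/3, a clear improvement on the one-moment bound but still well above 0.181.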
On $B^{m}_{2,n}$ the upper limit for the probability of extinction came from the maximum extremum, $X^{(2)}_{\max}$. In contrast, on $B^{m}_{3,n}$ it is the minimum extremum that yields the upper limit. The alternation between minimum and maximum for the worst case scenarios is due to the convexity of (1). Again, this extremum is perhaps more intuitive in the continuous sense, in which
$$X^{(3)}_{\min} = \begin{cases} 0 & \text{with probability } 1 - m_1^2/m_2, \\ m_2/m_1 & \text{with probability } m_1^2/m_2. \end{cases}$$
In this case, successive moments simply grow by $m_2/m_1$, so that $m_{i+1} = m_i\,(m_2/m_1)$, providing a clear minimum on $B^{m}_{3,n}$. And, as was the case for the minimum on $B^{m}_{2,n}$, the discrete minimum extremum on $B^{m}_{3,n}$ has similar properties to the continuous minimum extremum.
For both $B^{m}_{2,n}$ and $B^{m}_{3,n}$ the discrete cases are simply discretizations of the continuous cases. However, this is not necessarily the case for higher moment spaces (Courtois et al., 2006). While the continuous cases provide more intuitive extrema, derivation of the discrete case for higher moments is not as simple as deriving the continuous case and discretizing.
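Next, we examine the maximum extremum on $B^{m}_{3,n}$:
$$X^{(3)}_{\max} = \begin{cases} \alpha & \text{with probability } p_\alpha = \dfrac{m_{2,n,\alpha+1}}{n - \alpha}, \\ \alpha + 1 & \text{with probability } p_{\alpha+1} = \dfrac{-m_{2,n,\alpha}}{n - \alpha - 1}, \\ n & \text{with probability } p_n = \dfrac{m_{2,\alpha,\alpha+1}}{(n-\alpha)(n-\alpha-1)}, \end{cases}$$
where $\alpha$ is the integer on $D_n$ such that $\alpha \le (n m_1 - m_2)/(n - m_1) < \alpha + 1$. Since $X^{(3)}_{\max}$ can only provide non-trivial information about $q$ if $p_0 > 0$, this extremal distribution is only informative about extinction when $\alpha = 0$ and $p_\alpha > 0$, which is the case whenever $n m_1 - m_2 < n - m_1$. Although this requirement may appear restrictive, some classes of distributions have simple rules under which $X^{(3)}_{\max}$ is informative. For example, for binomial distributions, $B_{n,p}$, the ratio $(n m_1 - m_2)/(n - m_1)$ equals $p(n-1)$, so $X^{(3)}_{\max}$ will provide a non-zero lower bound if $1/n < p < 1/(n - 1)$.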
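The condition for an informative two-moment lower bound can be illustrated with a short sketch. It assumes the best-case law is a three-point distribution on {α, α+1, n} matched to m_1 and m_2, with α the integer part of (n m_1 - m_2)/(n - m_1); this support and the function names are assumptions of the sketch. The binomial parameters n = 5, p = 0.22 are chosen only because they fall between 1/n and 1/(n - 1).

```python
from math import comb, floor

def extinction_probability(pmf, tol=1e-12, max_iter=100_000):
    """Smallest root of f(q) = q on [0, 1], iterating from q = 0."""
    q = 0.0
    for _ in range(max_iter):
        q_next = sum(p * q**i for i, p in enumerate(pmf))
        if abs(q_next - q) < tol:
            return q_next
        q = q_next
    return q

def x3_max(m1, m2, n):
    """Three-point law on {alpha, alpha+1, n} matching m1 and m2 (needs alpha + 1 < n)."""
    a = floor((n * m1 - m2) / (n - m1))
    b = a + 1
    pmf = [0.0] * (n + 1)
    # Probabilities follow from matching total mass, mean and second moment.
    pmf[a] = (m2 - (b + n) * m1 + b * n) / (n - a)
    pmf[b] = ((a + n) * m1 - m2 - a * n) / (n - b)
    pmf[n] = (m2 - (a + b) * m1 + a * b) / ((n - a) * (n - b))
    return pmf

def binomial_pmf(n, p):
    return [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]

def raw_moment(pmf, k):
    return sum(prob * i**k for i, prob in enumerate(pmf))

small = binomial_pmf(5, 0.22)   # 1/5 < 0.22 < 1/4, so alpha = 0
lower = extinction_probability(x3_max(raw_moment(small, 1), raw_moment(small, 2), 5))
exact = extinction_probability(small)
print(lower, exact)

big = binomial_pmf(20, 0.1)     # 0.1 > 1/19, so alpha = 1 and p_0 = 0
trivial = extinction_probability(x3_max(raw_moment(big, 1), raw_moment(big, 2), 20))
print(trivial)
```

In the first case the best-case law puts mass on zero and gives a positive lower bound below the exact value; in the second case zero is outside the support and the bound collapses to q = 0.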
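We move on to $B^{m}_{4,n}$. The use of three moments can improve bounds on the probability of extinction, but as with all of the maximal random variables, $X^{(4)}_{\max}$ requires knowledge of the maximum, $n$. $X^{(4)}_{\max}$ is defined as:
$$X^{(4)}_{\max} = \begin{cases} 0 & \text{with probability } p_0 = 1 - p_\alpha - p_{\alpha+1} - p_n, \\ \alpha & \text{with probability } p_\alpha = \dfrac{m_{3,n,\alpha+1}}{\alpha\,(n-\alpha)}, \\ \alpha + 1 & \text{with probability } p_{\alpha+1} = \dfrac{-m_{3,n,\alpha}}{(\alpha+1)(n-\alpha-1)}, \\ n & \text{with probability } p_n = \dfrac{m_{3,\alpha,\alpha+1}}{n\,(n-\alpha)(n-\alpha-1)}, \end{cases}$$
where $\alpha$ is the integer on $D_n$ such that $\alpha \le m_{3,n}/m_{2,n} < \alpha + 1$. While this is a potential improvement to the upper bound given by $X^{(3)}_{\min}$, the improvement is sometimes negligible. As $n \to \infty$, the difference between $X^{(4)}_{\max}$ and $X^{(3)}_{\min}$ vanishes because
$$\lim_{n\to\infty} \frac{m_{3,n}}{m_{2,n}} = \lim_{n\to\infty} \frac{n m_2 - m_3}{n m_1 - m_2} = \frac{m_2}{m_1},$$
and because, with $p_n \to 0$, the generating function for $X^{(4)}_{\max}$ is identical to (4). So, like the first moment, the third moment is uninformative about extinction when $n$ is unknown, unless assumptions are made about the distribution (see, e.g., Daley and Narayan, 1980; Ethier and Khoshnevisan, 2002).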
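The effect of the maximum n on the three-moment worst case can be sketched numerically. The code assumes a four-point law on {0, α, α+1, n} matched to m_1, m_2 and m_3, with α the integer part of (n m_2 - m_3)/(n m_1 - m_2); the support, the function names and the Binomial(20, 0.1) moments are assumptions of this illustration.

```python
from math import comb, floor

def extinction_probability(pmf, tol=1e-12, max_iter=100_000):
    """Smallest root of f(q) = q on [0, 1], iterating from q = 0."""
    q = 0.0
    for _ in range(max_iter):
        q_next = sum(p * q**i for i, p in enumerate(pmf) if p)
        if abs(q_next - q) < tol:
            return q_next
        q = q_next
    return q

def x4_max(m1, m2, m3, n):
    """Four-point law on {0, alpha, alpha+1, n} matching m1, m2, m3 (needs alpha + 1 < n)."""
    m2n = n * m1 - m2                 # m_{2,n}
    m3n = n * m2 - m3                 # m_{3,n}
    a = floor(m3n / m2n)              # alpha <= m_{3,n}/m_{2,n} < alpha + 1
    b = a + 1
    pmf = [0.0] * (n + 1)
    pmf[a] = (b * m2n - m3n) / (a * (n - a))
    pmf[b] = (m3n - a * m2n) / (b * (n - b))
    pmf[n] = (m3 - (a + b) * m2 + a * b * m1) / (n * (n - a) * (n - b))
    pmf[0] = 1.0 - pmf[a] - pmf[b] - pmf[n]
    return pmf

n, p = 20, 0.1
binom = [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]
m1, m2, m3 = (sum(pr * i**k for i, pr in enumerate(binom)) for k in (1, 2, 3))

q_exact = extinction_probability(binom)
q_upper = extinction_probability(x4_max(m1, m2, m3, n))
q_large_n = extinction_probability(x4_max(m1, m2, m3, 1000))
print(q_exact, q_upper, q_large_n)
```

With n = 20 the three-moment bound falls between the exact value and the two-moment bound of about 1/3; when the assumed maximum is raised to 1000 with the same moments, the bound drifts back towards 1/3, in line with the limiting argument above.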
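The minimal extremum for $B^{m}_{4,n}$, $X^{(4)}_{\min}$, is supported on two pairs of consecutive integers and is given by
$$X^{(4)}_{\min} = \begin{cases} \alpha & \text{with probability } p_\alpha = \dfrac{m_{3,\alpha+1,\beta,\beta+1}}{(\beta-\alpha)(\beta+1-\alpha)}, \\ \alpha + 1 & \text{with probability } p_{\alpha+1} = \dfrac{-m_{3,\alpha,\beta,\beta+1}}{(\beta-\alpha-1)(\beta-\alpha)}, \\ \beta & \text{with probability } p_\beta = \dfrac{m_{3,\alpha,\alpha+1,\beta+1}}{(\beta-\alpha-1)(\beta-\alpha)}, \\ \beta + 1 & \text{with probability } p_{\beta+1} = \dfrac{-m_{3,\alpha,\alpha+1,\beta}}{(\beta-\alpha)(\beta+1-\alpha)}, \end{cases}$$
where $\alpha$ and $\beta$ are the integers on $D_n$, with $\alpha + 1 < \beta$, for which all four probabilities are non-negative (Denuit and Lefevre, 1997). Again, this bound is only useful if $p_0 > 0$. Unfortunately there is no short form equation to identify which spaces $B^{m}_{4,n}$ fit this requirement. However, one can easily determine if a given $B^{m}_{4,n}$ has a useful $X^{(4)}_{\min}$. Assuming $\alpha = 0$, $\beta$ is simply bound by
$$\beta \le \frac{m_3 - m_2}{m_2 - m_1} < \beta + 1.$$
If $m_{3,\beta,\beta+1} < m_{2,\beta,\beta+1}$, then the bound is useful because the resulting $X^{(4)}_{\min}$ has $p_0 > 0$. Alternatively, if $m_{3,\beta,\beta+1} \ge m_{2,\beta,\beta+1}$, the support of $X^{(4)}_{\min}$ does not include zero, so $p_0 = 0$ and consequently $q = 0$.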
If the first four moments are known, the extremal variable $X^{(5)}_{\min}$ can be obtained. Its distribution takes a simple form, but the equations used to find its values and relative probabilities are relatively large. From Hürlimann (2005), $X^{(5)}_{\min}$ is supported on zero and two pairs of consecutive integers,
$$X^{(5)}_{\min} \in \{0,\ \alpha,\ \alpha+1,\ \beta,\ \beta+1\}, \tag{5}$$
with probabilities fixed by matching $m_1, \ldots, m_4$. Courtois et al. (2006) argued that there is no analytic form to directly obtain $\alpha$ and $\beta$ for $X^{(5)}_{\min}$. They showed this by disproving the intuitive idea that the discrete support encloses the continuous support. Thus, to find $\alpha$ and $\beta$, one iteratively searches $D_n$ until both inequalities are satisfied. Hürlimann (2005) also presents a form for the upper extremal variable in $B^{m}_{5,n}$. The variable $X^{(5)}_{\max}$ is supported on $\{\alpha, \alpha+1, \beta, \beta+1, n\}$. As was the case for $X^{(4)}_{\min}$, one can determine if $X^{(5)}_{\max}$ has $p_0 > 0$ by assuming $\alpha = 0$ and solving for $\beta$ with
$$\beta \le \frac{m_{4,n,1}}{m_{3,n,1}} < \beta + 1.$$
If, for the resulting $\beta$, the inequality $m_{4,n,\beta,\beta+1} < m_{3,n,\beta,\beta+1}$ holds, the bound for $X^{(5)}_{\max}$ is informative.
All $X^{(j)}_{\max}$ extrema rely on the maximum offspring number, $n$. Similar to $X^{(4)}_{\max}$, when $n$ is unknown or infinite, $X^{(5)}_{\max}$ goes to the minimum on the lower moment space, here $X^{(4)}_{\min}$. Thus if $n$ is unknown, $X^{(j)}_{\max}$ goes to $X^{(j-1)}_{\min}$, at least for the cases examined here.
The Chebyshev approach can be used to extend these bounds to higher moments (Hürlimann, 2005); however, we do not believe this would be worthwhile for two reasons. First, moments above the fourth are rarely used, and higher moments can be difficult to estimate. Second, the equations for the supports and probabilities for moments above the fourth become immense, and calculating their values for a given set of moments may be challenging.

Examples
Here we discuss some example distributions, graph their generating functions, and also graph generating functions for the extremal distributions. The plot of the probability generating function, $f(q)$, on $q \in (0, 1)$ is a useful way to visualize how the moments are related to extinction. The probability generating function takes the value $p_0$ at $q = 0$. At small $q$, $f(q)$ has a slope of approximately $p_1$. In this part of the function, when $q$ is small, there can be a weak relationship between $f(q)$ and the moments. In comparison, when $q$ is close to 1, the moments are closely related to $f(q)$. For example, $f(1) = 1$ and $f'(1) = m_1$. Higher moments begin to influence the function as $q$ moves away from 1.
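The landmarks of the generating function mentioned above are easy to verify numerically. The sketch below evaluates f and its derivative at q = 0 and q = 1 for a Binomial(20, 0.1) offspring law; the offspring law and the function names are assumptions chosen to match the binomial example.

```python
from math import comb

n, p = 20, 0.1
pmf = [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]

def pgf(q):
    """f(q) = sum_i p_i q^i."""
    return sum(prob * q**i for i, prob in enumerate(pmf))

def pgf_slope(q):
    """f'(q) = sum_i i p_i q^(i-1)."""
    return sum(i * prob * q**(i - 1) for i, prob in enumerate(pmf) if i > 0)

print(pgf(0.0), pmf[0])        # value at q = 0 is p_0
print(pgf_slope(0.0), pmf[1])  # slope at q = 0 is p_1
print(pgf(1.0))                # value at q = 1 is 1
print(pgf_slope(1.0))          # slope at q = 1 is the mean, here n * p = 2
```

A slope greater than 1 at q = 1 forces the curve to approach the point (1, 1) from below the diagonal, which is why a supercritical process with p_0 > 0 has a second crossing at some q < 1.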
The probability of extinction of a process is found when $f(q) = q$, i.e., at the intersection between its probability generating function $f(q)$ and the diagonal $q$. Thus, processes with a high probability of extinction will cross the diagonal near $q = 1$, in the domain of $q$ in which the probability generating function is often closely related to its first few moments.
Plotting the probability generating functions for the extremal distributions helps demonstrate why they act as bounds on extinction. In these examples (Figure 1), we compare two distributions with identical first moment and maximum ($m_1 = 2$, $n = 20$), i.e., both distributions are in $B^{2}_{2,20}$. In particular, we look at a binomial distribution, Figure 1(a), and a truncated geometric distribution, Figure 1(b). For each of these plots we also plot the generating functions for some of the extremal distributions. The extremal distributions provide clear bounds: best case extrema are found below the plot of the generating function, worst case extrema are found above. For example, the extremal distribution based on one moment, $X^{(2)}_{\max}$, provides an upper bound on the probability of extinction, and can be seen as the upper line in both plots. Because they share an identical first moment and maximum, $X^{(2)}_{\max}$ is the same for both distributions. Clearly, one moment does not provide a good bound in these examples. As more moments are used, the bounds become tighter. The extrema using four moments provide relatively accurate upper and lower bounds for both examples.
The lower bounds provide the best case extrema, which are useful in both cases only when three or four moments are known. The lower bound using two moments is not useful here in either case, as its probability generating function crosses the diagonal at zero, so its probability of extinction is zero. The lower bound using only one moment was not included because its generating function is trivial and always uninformative about extinction.
Importantly, these examples demonstrate why higher moments are often necessary to compare strategies. These two distributions have identical first moments ($m_1 = 2$), so classically their fitness value would be equal. However, the binomial example is more likely to survive. If entire distributions are known, then extinction probabilities can be calculated explicitly. If instead enough moments are known, one can nevertheless conclude that the truncated geometric example is inferior. Compare the extremal distributions when four moments are known, paying attention to where they cross the diagonal. The value at the intersection is the probability of extinction for the extrema, which we display in Table 1 and Table 2, respectively for the binomial example and the truncated geometric example. Using four moments, the best case for the truncated geometric example (0.404,

Discussion
The work here is intended to highlight the relationship between the moments of the offspring distribution and the probability of extinction. Extinction can be defined in terms of moments, but the first few moments are only informative about extinction under certain conditions. Regardless of these conditions, there exists an interesting relationship with even and odd moments: high even moments favor extinction, high odd moments favor survival. This relationship between even and odd moments is also seen in the stochastic Price equation, where relative growth rates increase with increasing odd moments, and decrease with increasing even moments (Rice, 2008).
The relationship between moments and extinction can provide insight into the evolutionary process. A high first moment can favor survival, but worst case extrema ("long shots") represent the strategies that are least likely to survive. Better strategies have a high first moment and relatively low second moment (high mean, low variance), with the worst case as the distribution with the lowest third moment (strongest left skew). Even better strategies have a high first moment with a relatively low second moment and relatively high third moment (high mean, low variance, strong right skew). Worst case extrema with three moments have the highest possible fourth moment (excessive kurtosis). The relative importance of higher moments depends on the distribution, and in some cases higher moments can have a big influence on extinction.
Strategies with a high probability of extinction are unlikely to be found in natural populations, even if their expected reproductive rate is high (Tuljapurkar and Orzack, 1980). New alleles will often arrive in a population as a single copy, and extinction is permanent unless the same mutation occurs more than once. In such cases, survival is more important than the average rate of reproduction. Using moments of the offspring distribution, one can find bounds on extinction using their s-convex extrema. If the best case extremum for a set of moments has a high probability of extinction, then strategies with these moments will be evolutionarily unlikely, regardless of how fit they would be if they survived.
Gamblers can avoid strategies with a high risk of ruin by calculating their odds. In natural populations, such calculations are not required to prevent the occurrence of high risk strategies. Instead, risky strategies will be naturally unlikely, especially considering that many arrive as a single allele with one chance at survival. Risk is not solely determined by mean growth, and strategies with a high mean can sometimes have high risk. Unfortunately, these high risk and high reward strategies are unlikely to return anything without sufficient investment, so natural avoidance of risk can result in missed opportunity for growth.

Table 1: Extinction probabilities and supports for the extremal distributions of the binomial example $B_{20,0.1}$. The actual probability of extinction for this process is 0.181.

Table 2: Extinction probabilities and supports for the extremal distributions of the truncated geometric example. The actual probability of extinction for this process is 0.499.