A general stochastic model for bivariate episodes driven by a gamma sequence

We propose a new stochastic model describing the joint distribution of (X ,N), where N is a counting variable while X is the sum of N independent gamma random variables. We present the main properties of this general model, which include marginal and conditional distributions, integral transforms, moments and parameter estimation. We also discuss in more detail a special case where N has a heavy tailed discrete Pareto distribution. An example from finance illustrates the modeling potential of this new mixed bivariate distribution.


Introduction
We propose a new stochastic model describing the joint distribution of (X, N) where N is an integer-valued random variable on N = {1, 2, . . .} while the {X i } are independent and identically distributed (IID) random variables on R + , independent of N. Our new model is a generalization of the BEG and the BGG models introduced in Kozubowski and Panorska (2005) and Barreto-Souza (2012), respectively, where N was geometrically distributed with the probability mass function (PMF) and the {X i } were either exponential (BEG), with the probability density function (PDF) or gamma distributed (BGG), with the PDF X i ".
The majority of work connected with bivariate distributions given by (1) concerned cases where the distributions of X and N were both light-tailed, with the exception of Arendarczyk et al. (2018), where N was (light tailed) geometric and X was heavy tailed with Lomax distribution (also known as Pareto Type II), given by the survival function (SF) In contrast, in our generalization studied in this paper, both X and N in (1) are heavy tailed. We achieve this by keeping the {X i } in (1) gamma distributed with PDF (4) as in Barreto-Souza (2012), while changing N to be a discrete version of the continuous Pareto distribution (5) with the same survival function (see, e.g., Buddana and Kozubowski 2014;Krishna and Singh Pundir 2009). The PMF of this discrete Pareto (DP) distribution can be written as where p ∈ (0, 1) is a "size" parameter and δ ∈[ 0, ∞) is a tail parameter. We note that as δ → 0 + , then the DP distribution (6) converges to a geometric distribution (2). Thus, while our model (1) with DP distributed N offers more flexibility compared with the BEG model of Kozubowski and Panorska (2005) or the BGG model of Barreto-Souza (2012), it turns into the latter two in the special case with δ = 0 and exponential or gamma distributed {X i } in (1). In addition to presenting basic properties of the new model described above, we also summarize fundamental properties of the entire class of such models with gamma distributed {X i } and arbitrary distribution of the random variable N, which drives the dependence structure of the (X, N) vector in (1). These properties include infinite divisibility, the tail behavior of X, the conditional distributions of N given X = x as well as N given X > x, and estimation. In particular, we establish the existence and uniqueness of maximum likelihood estimators of α and β within this class of models. Our general results concerning the conditional distributions have interesting connections with hidden truncation (see, e.g., Arnold ) and play important role in goodness-of-fit analysis for these models.
Our article is organized as follows. In Section 2 we present fundamental properties of general class of bivariate distributions (1), driven by an IID gamma sequence {X i }. In Section 3, we summarize basic facts about the special case with discrete Pareto distributed N, including parameter estimation. An application from finance is presented in Section 4,

Bivariate episodes driven by an IID gamma sequence
In this section we summarize the properties which are common to all bivariate distributions describing the episodes (1) driven by a sequence {X i } of IID gamma variables with the PDF (4), denoted by GAM(α, β). We assume that the variable N, which describes the duration of an episode (X, N), is independent of the {X i } and supported on the set of positive integers N = {1, 2, . . .}. Clearly, given N = n ∈ N, the variable X is the sum of n IID gamma variables GAM(α, β), so it is also gamma distributed, with parameters αn and β. Thus, the joint PDF of (X, N) in (1) is of the form where f N (·) is the PMF of N. This leads to the following result, whose proof is elementary. (X, N) in (1)  In view of the above, we conclude that the marginal distribution of X is an infinite mixture of gamma distributions, as the PDF of X takes on the form

Proposition 1 The cumulative distribution function (CDF) and the survival functions (SF) of
Remark 1 In the special case where N is geometric (2) we recover the Bivariate Gamma-Geometric (BGG) distribution of Barreto-Souza (2012). In this case, the PDF of X can be written in terms of 2-parameter Mittag-Leffler (ML) special function E α,β (z) = ∞ n=0 z n (αn + β) , z ∈ C, Re(α) > 0, Re(β) > 0, which is an entire function on the complex plain C (see, e.g., Haubold et al. 2011). Indeed, simple algebra shows that the PDF (8) takes on the form Further, in the special case α = 1, when the {X i } in (1) are exponentially distributed, we obtain the BEG model of Kozubowski and Panorska (2005), where the term "BEG" stands for Bivariate distribution with Exponential and Geometric margins. Here, the ML special function becomes the exponential function, E 1,1 (z) = e z , and the marginal density of X in (10) reduces to the exponential PDF (3) with β replaced with pβ.
Let us also account for the joint Laplace transform ψ(t, s) connected with the distribution of (X, N) in (1), where Standard conditioning argument shows that this function can be conveniently expressed in terms of the probability generating function (PGF) of the random variable

Proposition 2
The LT of (X, N) in (1) driven by an IID gamma GAM(α, β) sequence {X i } is given by where G N (·) is the PGF of N. Barreto-Souza (2012),

then (12) turns into the LT of the BGG model of
Remark 3 As can be seen from the proof of Proposition 2, if the IID variables {X i } are non-negative but not necessarily gamma distributed, then the LT in (12) is given by

Moments and tail behavior
Here we consider moments and the tail behavior of the variables X and N. Clearly, if the counting variable N is light tailed, and all of its moments E[ N η ] exist for η > 0, then the mixed moments E[ X r N η ] exist as well for any r, η > 0. However, this is not so if the distribution of N is heavy tailed, and its survival function behaves like a power law, where c is a positive constant and f (x) ∼ g(x) (x → ∞) means that the ratio f (x)/g(x) converges to 1 as x → ∞. Indeed, if the tail probability satisfies (13) for some v > 0, then for positive η > 0 the expectation E[ N η ] exists only for η < v. Moreover, in this case of heavy tailed N, the variable X in (1) is heavy tailed as well, with essentially the same tail behavior as N. (1) Amponsah et al. Journal of Statistical Distributions and Applications (2021) 8:7 Page 5 of 31

Proposition 3 If the variable N in
Under the conditions of Proposition 3, the moments E[ X r ] with r > 0 are finite only for r < v. More generally, standard calculations involving either the joint PDF (7) or conditioning arguments viz. the representation (1) show that the joint moments E[ X r N η ] with positive r, η > 0 exist only for r+η < v, and their values are provided in the following result.
Proposition 4 Under the conditions of Proposition 3, the moments E[ X r N η ] with r, η > 0 are finite only for r + η < v, in which case we have where f N (·) is the PMF of N.
If they exist, then the mean vector and the covariance matrix of (X, N) in (1) can be deduced from Proposition 4. The result below provides relevant information concerning this. (1) (2), so that E[ N] = 1/p and Var[ N] = (1 − p)/p 2 , we recover the mean vector and the covariance matrix of the BGG model of Barreto-Souza (2012).

Remark 4 We note that if N is geometric
Remark 5 If the variance of N is finite, then the correlation of X and N is a simple function of the parameter α and the index of dispersion of N, Var [N] , which can fall anywhere in the interval (0, ∞). In the BGG case, this reduces to ρ = (1 − p)/(1 − p + p/α), which can be either greater than or less than √ 1 − p, as noted by Barreto-Souza (2012). The latter is the correlation in the BEG model, arising in the special case α = 1 of the BGG.

Conditional distributions
The derivation of basic conditional distributions in this bivariate model is straightforward. Clearly, the conditional distribution of X given N = n is GAM (αn, β), with the PDF Similarly, the PMF of the conditional distribution of N given X = x can be obtained by dividing the joint PDF of (X, N) in (7) by the marginal PDF of X, given by (8). Interestingly, this conditional PMF is proportional to the product of two PMFs as follows: This reduces to the Poisson PMF with parameter λ when β = 1.
Let us note an important stochastic interpretation of an integer-valued random variable Y with PMF proportional to the product of two PMFs.
where f Y (·), f N (·), and f Q (·) are the PMFs of Y, N, and Q, respectively, then we have the representation where the variables N and Q are mutually independent.
In view of the above lemma, we get the following result regarding the conditional distribution of N given X = x.
Proposition 6 Let (X, N) have a joint distribution with the PDF (7). Then the conditional distribution of N given X = x is the same as that of the random variable Y in (19), where N and Q are independent and Q − 1 has extended Poisson distribution EP(α, λ) with λ = βx, given by the PMF (17).

Remark 7
We note that in the special case of the BGG model, where N is geometric (2), the PDF of N given X = x in (16) is proportional to [ (1 − p) 1/α βx] αn / (αn). We conclude that this conditional distribution is the same as that of 1 + R, where R ∼ EP(α, λ) with λ = (1 − p) 1/α βx. In the special case of the BEG model, where we have α = 1, the variable R is Poisson with mean λ = (1 − p)βx, and N|X = x is R shifted up by 1.
The information about the conditional distribution of X given N = n provided above is useful in goodness-of-fit analysis for this bivariate model. Indeed, given data (X i , N i ) from the model, one can isolate the pairs with a particular value of N i = n and check the fit of the theoretical conditional distribution of X|N = n against the empirical distribution of the {X i |N i = n}. Unfortunately, one cannot proceed in a similar fashion using the conditional distribution of N given X = x, since the distribution of X is continuous. With this limitation in mind, we now derive the conditional distribution of N given X > x, which will be more useful in goodness-of-fit analysis.
Observe that the PMF of this conditional distribution can be written as where and the {X i } are IID gamma GAM(α, β) distributed random variables. This is because the integrand above is the PDF of the sum X 1 + · · · + X n . Consider a renewal process Then, the probability on the right-hand-side of (21) coincides with the probability P(N(x) + 1 ≤ n), which in turn is the same as the H(n) in (20), with the latter taking the form The function in the numerator on the right-hand-side in (22) is the (upper) incomplete gamma function, Thus, the function H(n) above is a genuine CDF of a discrete random variable N(x) + 1. In turn, in view of (20), the conditional PMF of N given X > x is proportional to the product of the CDF H(n) and the PMF f N (n). Distributions with such a structure have an important stochastic representation related to the hidden truncation (see, e.g., Arnold, 2009), given below.
where f Y (·) and f N (·) are the PMFs of Y and N, respectively, while F Q (·) is the CDF of a random variable Q, then we have the representation where the variables N and Q are mutually independent.
In view of the above lemma, we have the following result concerning the conditional distribution of N given X > x in our model.

Proposition 7
Let (X, N) have a joint distribution with the PDF (7). Then the conditional distribution of N given X > x is the same as that of the random variable Y in (23) Amponsah et al. Journal of Statistical Distributions and Applications (2021)  We note that in the special case where α = 1, so that the variables {X i } in (1) are exponential with parameter β, the renewal process {N(x), x ∈ R + } is actually a Poisson process, with rate β. If in addition the variable N is geometrically distributed with the PMF (2), the distribution of (X, N) in (1) turns into the BEG model studied by Kozubowski and Panorska (2005). We have the following new result concerning the conditional distribution of N given X > x in this special case.
Proposition 8 Let (X, N) have a joint distribution with the PDF (7) where α = 1 and N is geometric, so that f N (n) = p(1 − p) n−1 , n ∈ N. Then the conditional distribution of N given X > x is the same as that of the random variable Y = N + T, where T is Poisson with parameter (1 − p)βx, independent of N.
The above results are useful in goodness-of-fit analysis when our bivariate distribution is fitted to data. Indeed, given data (X i , N i ) from the model, one can isolate the pairs where X i > x for a particular x > 0, and check the fit of the theoretical conditional distribution of N|X > x against the empirical distribution of the {N i } that are part of these pairs. One simple strategy is to compare certain characteristics, such as the mean, the mode, or the variance, of the theoretical and the empirical distributions across different values of x. For example, in the BEG model, by Proposition 8, the conditional expectation is a simple linear function of x, which can be plotted against x and compared with the corresponding plot of the sample mean of N|X > x for the same x. Then the goodnessof-fit of the BEG model may be judged by comparing the two plots.
We finally note that the conditional distribution of N given X > x studied above is one of the two marginal distributions in the bivariate model (X, N)|X > x. The full joint distribution of this model can be described by the joint PDF, which can be shown to be of the form where f Y (·) is the PMF (20) of the distribution of N|X > x and with H(n) as in (21), is the PDF of gamma distribution GAM(αn, β) truncated below at x. The joint CDF of this distribution can be written in terms of (upper) incomplete gamma function as follows: Observe that, in view of (24), while the PDF of this distribution admits a hierarchical structure in the spirit of (7), this conditional distribution generally does not have a stochastic representation analogous to (1). On the other hand, the bivariate distribution (2021) 8:7 Page 9 of 31 describing (X, N)|N > u, where u ∈ N 0 , can actually be related to (1). Indeed, it is easy to see that the joint PDF of this distribution is of the form where is the PMF ofÑ d = N|N > u. Thus, this conditional distribution admits the representation (1) with N replaced byÑ. For any x ∈ R + and n ≥ u, the joint survival function of this distribution takes on the form with fÑ (·) provided in (27). We note the following stochastic representation of this distribution involving the excess random variable N u which, like N, is also supported on the set of positive integers N.
Proposition 9 Let (X, N) have a joint distribution with the PDF (7) and let N u Then we have the following stochastic representation: Thus, if the support of N is the set of positive integers, then the conditional distribution of (X, N)|N > u can be expressed as a convolution of two distributions with similar structure, one of which is degenerate and the other is also connected with a discrete distribution supported on N. In particular, if N is geometrically distributed with the PMF (2), then due to its memoryless property, we have that N u where G u is a gamma GAM(αu, β) distributed random variable, independent of (X, N). We finish this section with a brief account for two other related conditional distributions, one of (X, N)|X ≤ x, and another of (X, N)|N ≤ u. The following result, which is straightforward to derive, can be found in Amponsah (2017) (see equations (2.36) and (2.38) therein) in its special case of discrete Pareto random variable N.
Proposition 10 Let u, n ∈ N and x > 0, with n ≤ u. Then the conditional CDF of (X, N) given N ≤ u is given by Similarly, for 0 < y ≤ x and n ∈ N, the conditional CDF of (X, N) given X ≤ x is given by

Divisibility properties
It was shown in Barreto-Souza (2012) and Kozubowski and Panorska (2005) that the BGG and BEG models are infinitely divisible (ID). Below, we show that this is a general property of this model, as long as the ID property holds for N as well as the underlying IID sequence Recall that a random vector X (and its probability distribution) is said to be ID if for each n ∈ N the vector X can be "decomposed" into the sum In this situation, the variables Y (n) i are also non-negative and their common LT is given by i } whose distribution is supported on the set N 0 of non-negative integers. This implies that N itself is ID as well, and can be decomposed into the sum of n IID terms of the form 1/n + N (n) is the PGF of N. Under this notation, we have the following result.

Proposition 11
Let N be an integer-valued random variable supported on N, given by the PGF G N (·). Further, suppose that (X, N) is defined viz. (1), where the {X i } are nonnegative, IID, and ID random variables, given by LT ψ X 1 (·), independent of N. Suppose also that the distribution of N − 1 is discrete ID. Then, the distribution of (X, N) is ID and for each n ∈ N we have the stochastic representation where all the variables on the right-hand-side of (33) are mutually independent, the {G (n) j } are IID non-negative variables given by the LT ψ n (t) =[ ψ X 1 (t)] 1/n , and the {X ij } are IID with the LT ψ X 1 (·). Amponsah et al. Journal of Statistical Distributions and Applications (2021) 8:7 Page 11 of 31 When we restrict the above result to the case where the variables {X i } are gamma GAM(α, β) distributed (and thus ID), we obtain the following generalizations of the results in Barreto-Souza (2012). β) distributed, and independent of N. Suppose also that the distribution of N − 1 is discrete ID. Then, the distribution of (X, N) is ID and for each n ∈ N we have the stochastic representation (32) (33), where all the variables on the righthand-side of (33) are mutually independent, the {G Kozubowski and Panorska (2005) and Barreto-Souza (2012), the variable N was geometric (2) while the {X i } were exponentially or gamma distributed, respectively. In this case, the variables T (n) j in (33) have negative binomial N B(r, p) distribution with r = 1/n, whose PDF and PGF are given by

Remark 8 In
respectively. We note that in the gamma case the LT of the {(R (32) is of the form [ ψ(t, s)] 1/n , where ψ(t, s) is the LT of (X, N), given by (12). More generally, the ID property of the distribution of (X, N) established above implies that [ ψ(t, s)] r is a genuine LT for any r ∈ R + , and describes the marginal distribution of a bivariate Lévy motion {(R(r), V (r)), r ∈ R + }, which is a stochastic process with independent and stationary increments started at the origin, built upon the distribution of (X, N). Such processes connected with the BEG and BGG models were studied in Kozubowski et al. (2008) and Barreto-Souza (2012), respectively. Their analogs in the more general case described above can be studied along the same lines.

Parameter estimation
In this section we consider maximum likelihood estimation of the parameters of the model (1) where the {X i } are IID with gamma GAM(α, β) distribution. We shall assume that the distribution of N depends on k unknown parameters ω 1 , . . . , ω k , where ω = (ω 1 , . . . , ω k ) ∈ ⊂ R k . Under this notation, our model has k + 2 unknown parameters Under this notation, the PMF of the random variable N will be denoted by f N (·; ω).
Let us start with the Fisher information matrix I(θ ) corresponding to this bivariate model. Note that the logarithm of the joint PDF in (7), Amponsah et al. Journal of Statistical Distributions and Applications (2021) 8:7 Page 12 of 31 is the sum of two terms, v 1 (x, n; α, β) = αn log β+(αn−1) log x−βx−log (αn) and v 2 (n; ω) = log f N (n; ω), with each of them depending on different parameters. Consequently, if standard regularity conditions for the family {f N (·; ω), ω ∈ } are satisfied and appropriate expectations exist, then the Fisher information I(θ ) will have a block-diagonal structure, where W =[ w ij ] is a 2 by 2 matrix where and J(ω) is the Fisher information matrix connected with the family {f N (·; ω), ω ∈ }. Straightforward algebra shows that the entries of W are as follows: where
Next, we let (X 1 , N 1 ), . . . , (X n , N n ) be a random sample from the bivariate model with the PDF (7). By straightforward calculations, we find the log-likelihood function to be of the form where andN n ,X n are the sample means of the {N i } and {X i }, respectively. In order to find the maximum likelihood estimator (MLE) of θ , we need to maximize the log-likelihood function (37) with respect to θ ∈ = R 2 + × ⊂ R k+2 . Since θ = (α, β, ω), by the special structure of the log-likelihood function (37), we can proceed by maximizing the functions R(α, β) and Q(ω) with respect to their arguments independently of each other. Consequently, the MLE of ω is only dependent on the {N i }, and is connected with the family {f N (·; ω), ω ∈ }. Additionally, the MLEs of α and β will be of the same form regardless of the particular distribution of N, although their asymptotic properties may depend on the latter. Amponsah et al. Journal of Statistical Distributions and Applications (2021) 8:7 Page 13 of 31
In view of the above result, we conclude that whenever the limit in Part (iii) is negative, which occurs with probability 1 for any finite sample of size n ≥ 2, the function g(α) in monotonically increasing on the interval (0,α n ) and monotonically decreasing on the interval (α n , ∞), whereα n is a unique solution for α of the equation g (α) = 0. This establishes the existence and uniqueness of the MLEs of α and β for this model. (X 1 , N 1 ), . . . , (X n , N n ) be IID copies of (X, N) in (1), where n ≥ 2. Then, the MLEs of α and β exist and are unique, where the MLE of α,α n , solves the equation

Corollary 2 Let
while the MLE of β is given byβ n =α nNn /X n .
Remark 10 When the parameter α is known, the MLE of β always exists, is unique, and its value is provided by (40). In particular this recovers the results of the BEG model of Kozubowski and Panorska (2005), where we have α = 1. (42) is continuous and monotonic, the solution of this equation is straightforward to compute using standard numerical methods.

Remark 12
In the special case when the sample size is n = 1, the equation (42) does not admit a finite solution. However, the equation is satisfied in the limiting sense, witĥ α n = ∞. What the above analysis shows is that the function R(α, β) in (38) is maximized "at infinity", when α → ∞ and β = β(α) is given by (40). A practical interpretation of this is that the gamma variables that drive the model of (X, N) in (1) reduce to a point mass at c, where c = X 1 /N 1 , so that the distribution of (X, N) is a degenerate one, with (X, N) = (cN, N). Similar interpretation applies when n ≥ 2 and the limit in Part (iii) in Proposition 12 is zero, in which case the solution of (42) is alsoα n = ∞. As can be seen from the proof of this result, this can only happen if all the ratios X i /N i in the sample are equal to some c > 0. However, this degenerate case is not of a concern in practice, as this event occurs with probability zero. Barreto-Souza (2012), who considered a special case in (1) (·; ω), ω ∈ } is such that the function = (ψ 1 , . . . , ψ k+2 ) with ψ i = −∂/∂θ i log f X,N (x, n), i = 1, . . . , k + 2, satisfies standard regularity conditions of large-sample theory (see, e.g., conditions A0-A6 in Bickel and Doksum (2015), pp. 384-385) then the MLEs of α and β are consistent, asymptotically normal, and efficient (see, e.g., Theorem 6.2.2 in Bickel and Doksum (2015), p. 386). In particular, due to the special structure of the Fisher information matrix (34), we will have the following convergence in distribution

Remark 13 Compared with the results of
where N d (μ, ) denotes d-variate normal distribution with mean vector μ and covariance matrix and W is the 2 by 2 matrix with entries given by (36). This asymptotic normal distribution can be used to derive (approximate) confidence intervals for the parameters.

Bivariate episodes with discrete Pareto duration
Here, we illustrate the general results above with a special case studied in Amponsah (2017), where the random variable N in (1) has discrete Pareto (DP) distribution with parameters δ ≥ 0 and p ∈ (0, 1), given by the PMF (6) and denoted by DP(δ, p). Virtually all the results discussed in Section 2 carry-over to this special case, where we use the PMF (6) in place of a generic PMF f N (·). First, we provide a formal definition of this particular stochastic model.

Remark 15
In the special case with δ = 0, the DP distribution turns into a geometric distribution (2), and the GMDP distribution reduces to the BGG model proposed by Barreto-Souza (2012). If in addition to δ = 0 we also have α = 1, so that the {X i } in (1) are IID exponential with parameter β > 0, the GMDP model turns into the BEG distribution studied in Kozubowski and Panorska (2005). The GMDP model offers more flexibility, as it has an additional parameter, and can be useful for modeling heavy tailed data, as its marginal distributions of X and N are both heavy tailed for any δ > 0, as will be seen below.
The joint PDF of (X, N) ∼ GMDP(α, β, δ, p) is given explicitly by (7), with the DP PMF (6) in place of f N (·). In turn, the CDF and the SF are given by the expressions in Proposition 1. Further, by (8), we obtain the PDF of X for which takes a particularly simple form in case of the BEG model (δ = 0, α = 1), where X is exponential with parameter βp. We also note that (for x ∈ R + ) the CDF of X is of the form

Laplace transform
The LT of the GMDP model follows from general theory, discussed in Proposition 2 and the remark following it. When we take into account the particular form of the PGF connected with DP distribution (see Proposition 2.6 in Buddana and Kozubowski 2014), we obtain the following result.

Moments and tail behavior
Recall that the SF of N ∼ DP(δ, p) is of the form (see Buddana and Kozubowski 2014, Proposition 2.4) where x is the floor function (the largest integer that is less than or equal to x). This shows that DP distribution is heavy tailed with tail index v = 1/δ, that is the SF satisfies (13) with v = 1/δ. Further the constant c in (13) is equal to c =[ −δ log(1 − p)] −1/δ . Thus, by Proposition 3, we have the following result.

Conditional distributions
The conditional distributions of (X, N) ∼ GMDP(α, β, δ, p) follow general theory laid out in Section 2. First, we note that the conditional distribution of X given N = n is gamma GAM(αn, β), with the PDF in (15). This conditional distribution is the same in any model of the form (1) driven by an IID gamma sequence {X i }, regardless of a particular distribution of N. Second, the conditional distribution of N given X = x has the PMF given by (16) with f N (·) replaced by the DP PMF (6), the function f α,λ (·) given by (17), and the normalizing constant c is given by Moreover, according to Proposition 6, this conditional distribution is the same as that of the random variable N|N = Q, where N and Q are independent and Q−1 has extended Poisson distribution EP(α, λ) with λ = βx, given by the PMF (17). Further, the conditional distribution of N|X > x has the PMF of the form (20), and, in view of Proposition 7, is the same as the distribution of N|N ≥ Q with independent N and Q, where this time  (24) where f Y (·) is the PMF (20) of the distribution of N|X > x and g(·|n) is the PDF of gamma distribution GAM(αn, β) truncated below at x, with the PDF given by (25). As discussed in Section 2, the two conditional distributions, one of X|N = n and one for N|X > x, provide useful aids in goodness-of-fit analysis when the GMDP model is fitted to data.
We now turn to the conditional distribution of (X, N)|N > u, where u ∈ N 0 . As in the general case discussed in Section 2, this conditional distribution admits representation (1) with N replaced byÑ, where the latter has a truncated DP distribution, with the PMF given by (27). The PDF and the SF of this distribution are given by (26) and (28), respectively. This case is very special, since the corresponding excess random variable, N u d = N −u|N > u is also DP distributed (see Proposition 3.3 in Buddana and Kozubowski 2014). In view of this result, we have the following analog of Proposition 9.

Remark 16
By the above result, if (X, N) has GMDP distribution then the conditional distribution of (X, N)|N > u is a convolution of two distributions, of which one is degenerate and the other is also a GMDP distribution, with different "size" parameter p. In the special case when δ = 0 (so that N is geometrically distributed) the variable (X u , N u ) in (47) has actually the same distribution as (X, N) itself, due to the memoryless property of the geometric distribution.
Additional two conditional distributions, (X, N)|X ≤ x and (X, N)|N ≤ u, are described in the next result which follows from Proposition 10.

Infinite divisibility and stochastic representations
In this section we discuss an important stochastic representation of the GMDP distribution as well as its infinite divisibility property. Both of these are related to the fact that the DP distribution is a mixture of geometric distributions. Indeed (see, e.g., shown in Buddana and Kozubowski 2014, Proposition 3.4), the DP distribution of N ∼ DP(δ, p) admits the following hierarchical structure: Since a DP distributed random variable N is an important component of the GMDP distribution, a similar hierarchical structure carries over to the latter, as shown in the result below.

Proposition 18
The distribution of (X, N) ∼ GMDP(α, β, δ, p) admits the following hierarchical structure: Remark 17 According to the above result, a GMDP distribution can be thought of as being conditionally a BGG distribution, which is a GMBD distribution with δ = 0, so that the variable N in the stochastic representation is a geometric one. In other words, this N is conditionally geometric, so that its unconditional distribution is DP, as seen above, which is precisely why we recover the GMDP model. Another way to describe this is through the density functions. Namely, given a gamma GAM(1/δ, γ ) distributed Z in Proposition 18, with density given by the PMF of (X, N)|Z = z in Proposition 18 will be Then, the distribution of (X, N) ∼ GMDP (α, β, δ, p) can be seen as the marginal distribution of (X, N) in a trivariate model (X, N, Z) given by the PDF

Further properties of the GMDP model can be developed by means of this hierarchical structure coupled with standard conditioning arguments.
Another consequence of the hierarchical structure of the DP distribution is its infinite divisibility (cf., Corollary 3.5 in Buddana and Kozubowski 2014), which, in view of Corollary 1, immediately implies the same property of the GMDP model. In the following result we need the PGF of N ∼ DP(δ, p), which is known to be of the form (see Buddana and Kozubowski 2014, Proposition 2.6): Corollary 3 The distribution of (X, N) ∼ GMDP(α, β, δ, p) is ID and for each n ∈ N we have the stochastic representation (32) (33), where all the variables on the right-hand-side of (33) are mutually independent, the {G

Parameter estimation
The discussion of estimation for the general case in Section 2 generally carries over to the special case of the GMDP distribution. Here, the variable N ∼ DP(δ, p) depends on a 2dimensional vector-parameter ω = (δ, p) ∈ , where we take the parameter space to be an open set = (0, ∞) × (0, 1). The PDF of N, which is given by (6), will be denoted by f N (·; ω). Clearly, our model has four unknown parameters, which can be conveniently described as a vector-parameter θ , where θ 1 = α, θ 2 = β, θ 3 = δ, and θ 4 = p. With this notation, the parameter space is an open set = (0, ∞) 3 × (0, 1).

Fisher information
According to general results in Section 2, if the appropriate expectations exist, the Fisher information matrix I(θ) connected with the GMDP(α, β, δ, p) distribution has a blockdiagonal structure, where the entries {w ij }, generally given by (35), depend only on the parameters α and β, and the 2 × 2 matrix is the Fisher information matrix connected with the DP(δ, p) model. Since in our situation the expectation of N is given by (45), according to (36), we have where ψ(·) is the digamma function. We note that the above quantities are well defined only when δ < 1, due to the heavy tail of the DP distribution. Indeed, clearly the two series describing w 12 = w 21 and w 22 converge only if 1/δ > 1, so that we must have δ < 1. In addition, we have the asymptotic relation ψ (z) ∼ 1/z as z → ∞ (see, e.g., formula 6.4.12 in Abramowitz and Stegun 1972, p. 260) and the PMF of the DP distribution given by the expression in the square bracket in the series describing w 11 behaves asymptotically as the power n −(1+1/δ) at infinity. Thus, that series also converges only if δ < 1. Further, standard calculations produce the information matrix (50) of the DP model, which is given in the result below.

Proposition 19
The Fisher information matrix (50) of the DP(δ, p) model is well defined for any δ > 0 and p ∈ (0, 1), and is given by

A =[ a ij ] is a symmetric matrix with entries of the form
, the function f N (·; δ, p) is the PDF (6) of the DP(δ, p) distribution, and

Maximum likelihood estimation
Let (X 1 , N 1 ), . . . , (X n , N n ) be a random sample from a GMDP distribution. According to general results on estimation discussed in Section 2, maximum likelihood estimation can be carried out separately for the two pairs, α, β, and δ, p. Regarding the former, the MLEs of α and β always exist and are unique, and their values are provided in Corollary 2. Moreover, while the calculation of the MLEs involves solving a one-dimensional non-linear Page 21 of 31 equation, stated in (42), the relevant function is rather well behaved and finding the solution by standard numerical methods is straightforward. Further, since the family of DP distributions {f N (·; ω), ω ∈ (0, ∞)×(0, 1)} is such that the function = (ψ 1 , . . . , ψ 4 ) with N (x, n), i = 1, . . . , 4, satisfies standard regularity conditions of largesample theory, the MLEs of α and β are consistent, asymptotically normal, and efficient. Due to the special structure of the Fisher information matrix (49), we will have the convergence in distribution (43) where W is the 2 × 2 matrix with entries {w ij } given above. This asymptotic normal distribution can be used to derive (approximate) confidence intervals for the two parameters.
Let us now move to estimation of parameters δ and p. According to general results on estimation discussed in Section 2, this requires optimization of the function Q(ω) in (39) with respect to ω = (δ, p) ∈ (0, ∞) × (0, 1). Since the function is the log-likelihood (LL) function of an univariate random sample {N i } from a DP distribution, one can use methods available for this distribution in order to get the MLEs. In particular, we can proceed by solving the likelihood equations, obtained by setting to zero the two partial derivatives of the LL function in (53). The calculation of the derivatives is straightforward; after some algebra, we obtain the following likelihood equations: log h (δ, p, N i ) h(δ,p,N i ) h(δ,p,N i While the above equations can be solved for δ and p by standard numerical methods, the actual calculations may be computationally intensive. An alternative approach for finding the MLEs in this case is the standard EM algorithm, as the DP distribution is conveniently represented as a mixture of geometric distributions. Such an algorithm has been recently developed and tested in Amponsah and Kozubowski (2021), and was found to be working quite well in this setting. We recommend that it be used for this estimation problem.
Remark 18 A careful analysis of the case with n = 1 shows that the likelihood equations above do not admit any solution within the parameter space. However, the function Q(δ, p) is maximized byδ n = 0 andp n = 1/N. These values point towards the limiting geometric distribution of N, which arises when δ = 0 in the DP model.

An application from finance
In this section, we show an application of the GMDP model to a financial data set. We chose the daily closing price of Bahrain all share Index, quoted in Bahrain dinar. The data set was obtained from Investing.com, and consisted of the daily closing prices, from May 24, 2010, to February 20, 2019. We converted our daily data to (n = 2171) daily log-returns, which are the logarithms of the ratios of the closing prices for every two consecutive days. Next, we computed the growth episodes and obtained (n = 508) observations (X i , N i ), i = 1, 2, . . . , 508, where the N i is the duration and X i is the magnitude (total log-return over the period of the i th episode) of the ith growth episode. Duration data from this data set was previously fitted with the DP model in Amponsah and Kozubowski (2021). In this work we fit the bivariate GMDP model to the duration and magnitude of the growth episodes.
Assuming the data are IID GMDP (α, β, δ, p), the MLEs computed by solving likelihood equations (see Section 2.4) areα = 0.9420,β = 281.4498,δ = 0.1153, andp = 0.5355. The EM algorithm of Amponsah and Kozubowski (2021) was used to computeδ and p. To assess the goodness-of-fit of our model to the growth episodes of the Bahrain all share Index, we used the MLEs described above to compute the parameters of all the marginal and conditional distributions we fit below. Considering this fact, the generally

Goodness-of-fit analysis
We started with the fit of DP model to the marginal of duration. Table 1 contains frequency, relative frequency, and model probabilities for the values of N observed in the data, namely N = 1, 2, ..., 6, and N ≥ 7. Figure 1 shows a plot of the DP model probabilities (the last row of Table 1) versus the observed relative frequencies (the third row of Table 1), with a diagonal line y = x added for visualization purposes. The DP probabilities are very close to the relative frequencies, and the mean (absolute) percentage error is about 10.598%. A formal chisquare goodness-of-fit test based on the information provided in Table 1 was performed as well, resulting in the p-value of 0.27. Thus, the null hypothesis of DP Pareto distribution of N is not rejected at 5% significance level. Overall, this analysis shows a reasonable fit of the DP model to the marginal of duration.
Next, we checked if the positive daily log-returns are well fit by the gamma distribution with parameter α and β. We used graphical tools (histogram, q-q plots), and formal Kolmogorov-Smirnov (KS) test to confirm the goodness-of-fit. Figure 2 shows a histogram of positive daily log-returns overlayed with fitted gamma PDFs and a q-q plot of the model and empirical quantiles. Both plots show a good fit. We also tested the fit of the model to the positive daily log-returns. The KS statistic and its p-value are 0.0397 and 0.3905, respectively. Therefore, the null hypothesis that the positive daily log-returns follow gamma distributions with parameters α and β, is not rejected at significance level of 5%. Overall, we believe that the positive log-returns may be assumed to have come from gamma distribution with parameters α and β.
Further, we investigated the fit of the model marginal distribution of magnitude to the data. The model marginal distribution of X is a mixture of gamma distributions with scale parameter β and shape parameters αn, n ∈ N, with the mixing weights given by P(N = n), where N ∼ DP(δ, p). The plot in Fig. 3 shows a histogram of the marginal of magnitude overlaid with the gamma mixture model and the corresponding q-q plot. The KS test statistic and p-value are 0.0453 and 0.6751, respectively. Thus, the null hypothesis that the model marginal follows a mixture of gamma distributions with PDF (44) is not rejected at significance level of 5%. We believe the results show a reasonable fit of the marginal distribution.
Next, we turned to goodness-of-fit analysis of the GMDP model's conditional distributions of X given N = k, which are gamma with parameters kα and β, to the corresponding data. Figure 4 shows graphical illustration of fit via histograms overlaid with fitted gamma  PDFs and q-q plots for durations N = 1, 2, 3, and 4. The plots show a reasonable fit of the conditional distributions of the GMDP model to the corresponding data. In addition, we performed the Kolmogorov-Smirnov goodness-of-fit test of these conditional models to the data. The results of the KS tests of the null hypothesis that the conditional distributions of X given N = k are gamma with parameters kα and β, are given in Table 2. None of the tests rejected the null hypothesis on the significance level of 5%. We believe the results confirm a reasonable fit of the conditional distributions.
Similarly, we performed a goodness-of-fit analysis of the GMDP model's conditional distributions of N|X > x for thresholds x = 0.0001, 0.0015 and 0.0025, which should have a hidden truncated discrete Pareto distributions as described in Proposition 6.
Tables 3, 4 and 5 show the frequencies, relative frequencies and estimated model probabilities of all the observed growth periods' durations given that X > x, for the thresholds of x = 0.0001, 0.0015, and 0.0025, respectively. Figure 5 shows plots of the model probabilities versus the corresponding relative frequencies reported in these tables. The estimated conditional model probabilities are very close to the relative frequencies, and the mean of the (absolute) percentage errors for the thresholds x = 0.0001, 0.0015, and 0.0025, are 10.5728%, 10.0313% and 10.0817%, respectively. The chi-square goodness-offit tests based on the information in Tables 3 -5 resulted in the p-values of 0.45, 0.52, 0.49, respectively. We conclude that these conditional theoretical models show reasonable fit to the data.
Overall, we believe the GMDP model fits the growth periods of the Bahrain index rather well. This example provides evidence for the considerable modeling potential of the GMDP model.

Concluding remarks
This work provides fundamental properties of a general stochastic model describing the joint distribution of (X, N), where N is a counting variable while X is the sum of N independent gamma random variables. These properties include marginal and conditional distributions, Laplace transform, moments and tail behavior, and divisibility properties. The existence and uniqueness of maximum likelihood estimators of the relevant parameters are established as well, along with their asymptotic properties. A particular example with a heavy tailed discrete Pareto distributed N illustrates general properties of this  5 The plot of model probabilities versus relative frequencies of duration N = 1, 2, 3, 4, 5, 6 and N ≥ 7 given X > x for thresholds x = 0.0001 (square shape), x = 0.0015 (circular shape), and x = 0.0025 (triangular shape) stochastic model, while a data example from finance shows the modeling potential of this new mixed bivariate distribution.
and ψ X 1 (·) is the LT of the {X i }. This leads to ψ(t, s) = E e −s ψ X 1 (t) N = G N e −s ψ X 1 (t) .
Since for gamma distributed {X i } we have ψ X 1 (t) = 1/(1 + t/β) α , the result follows. Proof of Proposition 3. The result follows from Theorem 3.2 in Robert and Segers (2008), as the conditions (3.10) or (3.11) of that theorem hold for our N due to the fact that N satisfies (13) while the {X i } are light tailed gamma distributed.
Proof of Proposition 5. The result follows from straightforward application of Proposition 4.
Proof of Lemma 1. We proceed by showing that the PMF of the random variable on the right-hand-side in (19) coincides with the expression on the right-hand-side in (18). Indeed, by independence of N and Q, for any n ∈ N 0 , we have into the above expression and interchanging the order of the summation, followed by summing-up the resulting geometric series, we arrive at We now calculate the PGF of Y d = N|N ≥ N(x) + 1, which, in view of (54) and (56) By proceeding as above, and calculating the numerator in (57) by inserting there the expression (55), followed by interchanging the order of the summation and subsequently summing-up the resulting geometric series, we find the numerator in (57)  Since this is the product of the PGF of N and the PGF of T, we obtain the result. Proof of Proposition 9. The result follows by direct computation of the LTs on each side in (29), and subsequence verification that they coincide.
Proof of Proposition 11. We proceed by showing that the joint LT of the expression on the right-hand-side in (32) coincides with the LT of (X, N), with the latter given by ψ(t, s) = G N [ e −s ψ X 1 (t)], as stated in the second remark following Proposition 2. We start with the LT of the pair (R (n) j , V (n) j ). By standard conditioning argument, we have Proof of Theorem 12. First, we re-write the function g (α) as follows: