Analytical properties of generalized Gaussian distributions

*Correspondence: adytso@princeton.edu
Department of Electrical Engineering, Princeton University, 08544 Princeton, USA
Full list of author information is available at the end of the article

Abstract
The family of Generalized Gaussian (GG) distributions has received considerable attention from the engineering community for modeling many physical phenomena, owing to the flexible parametric form of its probability density function. However, very little is known about the analytical properties of this family of distributions, and the aim of this work is to fill this gap. Roughly, this work consists of four parts. The first part of the paper analyzes properties of moments, absolute moments, the Mellin transform, and the cumulative distribution function. For example, it is shown that the family of GG distributions has a natural order with respect to second-order stochastic dominance. The second part of the paper studies product decompositions of GG random variables. In particular, it is shown that a GG random variable can be decomposed into a product of a GG random variable (of a different order) and an independent positive random variable. The properties of this decomposition are carefully examined. The third part of the paper examines properties of the characteristic function of the GG distribution. For example, the distribution of the zeros of the characteristic function is analyzed. Moreover, asymptotically tight bounds on the characteristic function are derived that give the exact tail behavior of the characteristic function. Finally, a complete characterization of conditions under which GG random variables are infinitely divisible and self-decomposable is given. The fourth part of the paper concludes this work by summarizing a number of important open questions.


Introduction
Dytso et al. Journal of Statistical Distributions and Applications (2018) 5:6

The goal of this work is to study a large family of probability distributions, termed Generalized Gaussian (GG), that has received considerable attention in many engineering applications. We shall refer to $X_p$ with the GG distribution given by the probability density function (pdf)
$$f_{X_p}(x) = \frac{c_p}{\alpha}\, e^{-\frac{|x-\mu|^p}{2\alpha^p}}, \qquad c_p = \frac{p}{2^{\frac{p+1}{p}}\,\Gamma\left(\frac{1}{p}\right)}, \qquad (1)$$
as $X_p \sim \mathcal{N}_p(\mu, \alpha^p)$, and where we define the gamma function, the lower incomplete gamma function and the upper incomplete gamma function as
$$\Gamma(x) = \int_0^\infty t^{x-1} e^{-t}\, dt, \qquad \gamma(x, a) = \int_0^a t^{x-1} e^{-t}\, dt, \qquad \Gamma(x, a) = \int_a^\infty t^{x-1} e^{-t}\, dt,$$
respectively. Another commonly used name for this type of distribution, especially in economics, is the Generalized Error distribution. The flexible parametric form of the pdf of the GG distribution allows for tails that are either heavier than Gaussian (p < 2) or lighter than Gaussian (p > 2), which makes it an excellent choice for many modeling scenarios. The origin of the GG family can be traced to the seminal work of Subbotin (1923) and Lévy (1925). In fact, Subbotin (1923) has shown that the same axioms used by Gauss (1809) to derive the normal distribution are also satisfied by the GG distribution.
Well-known examples of this distribution include: the Laplace distribution for $p = 1$; the Gaussian distribution for $p = 2$; and the uniform distribution on $[\mu-\alpha, \mu+\alpha]$ for $p = \infty$.
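The parametrization in (1) can be checked numerically. The following is a minimal sketch, assuming the normalizing constant $c_p = p / (2^{(p+1)/p}\,\Gamma(1/p))$ that direct integration of (1) yields; the function names `gg_pdf` and `total_mass` are illustrative and not part of the paper. It verifies that the density integrates to one and recovers the Laplace ($p = 1$), Gaussian ($p = 2$) and near-uniform (large $p$) special cases.

```python
import math

def gg_pdf(x, p, mu=0.0, alpha=1.0):
    """GG density (c_p / alpha) * exp(-|x - mu|^p / (2 alpha^p)), assuming the form of (1)."""
    c_p = p / (2.0 ** ((p + 1.0) / p) * math.gamma(1.0 / p))
    return (c_p / alpha) * math.exp(-abs(x - mu) ** p / (2.0 * alpha ** p))

def total_mass(p, half_width=60.0, n=60_000):
    """Trapezoid-rule estimate of the integral of gg_pdf over [-half_width, half_width]."""
    h = 2.0 * half_width / n
    total = 0.5 * (gg_pdf(-half_width, p) + gg_pdf(half_width, p))
    for i in range(1, n):
        total += gg_pdf(-half_width + i * h, p)
    return total * h

if __name__ == "__main__":
    for p in (1.0, 2.0, 6.0):
        print("p =", p, "mass =", total_mass(p))
    # p = 1: Laplace density (1/4) exp(-|x| / 2); p = 2: standard Gaussian density.
    print(gg_pdf(0.3, 1.0), 0.25 * math.exp(-0.15))
    print(gg_pdf(0.3, 2.0), math.exp(-0.045) / math.sqrt(2.0 * math.pi))
    # Large p: the density at the origin approaches 1 / (2 alpha), the uniform height.
    print(gg_pdf(0.0, 200.0))
```

The integration window of half-width 60 is a choice made for this sketch; it is adequate for $p \ge 1$ and $\alpha = 1$, whereas heavier tails ($p < 1$) would need a wider window.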

Past work
The GG distribution has found use in image processing applications where many statistical features of an image are naturally modeled by distributions that are heavier-tailed than Gaussian. For example, Gabor coefficients are convolution kernels whose frequency and orientation representations are similar to those of the human visual system. Gabor coefficients have found a wide range of applications in texture retrieval and face-recognition problems. However, a considerable drawback of using Gabor coefficients is the memory required to store a Gabor representation of an image. In Gonzalez-Jimenez et al. (2007), GG distributions with parameter p < 2 were shown to accurately approximate the empirical distribution of Gabor coefficients in terms of the Kullback-Leibler (KL) divergence and the $\chi^2$ distance. Moreover, the authors of Gonzalez-Jimenez et al. (2007) demonstrated that data compression algorithms based on the GG statistical model considerably reduce the memory required to store Gabor coefficients.
In a classical image retrieval problem, a system searches for K images similar to a query image from a digital library containing a total of N images (usually $K \ll N$). In Do and Vetterli (2002), by modeling wavelet coefficients with a GG distribution and using the KL divergence as a similarity measure, the authors were able to improve retrieval rates by 65% to 70% compared with traditional approaches.
Other applications of the GG distribution in image processing include modeling: textured images, see Mallat (1989); Moulin and Liu (1999) and de Wouwer et al. (1999); pixels forming fine-resolution synthetic aperture radar (SAR) images (Bernard et al. 2006); and the distribution of values in subband decompositions of video signals (Westerink et al. 1991; Sharifi and Leon-Garcia 1995).
In communication theory, the GG distribution finds many modeling applications in impulsive noise channels which occur when the noise pdf has a longer tail than the Gaussian pdf. For example, in Beaulieu and Young (2009) it is shown that in ultrawideband (UWB) systems with time-hopping (TH) the interference should be modeled with probability distributions that are more impulsive than the Gaussian. Moreover, it has been shown that for the moderate and high signal-to-noise ratio (SNR) the interference in the TH-UWB is well modeled by the GG distribution with a parameter p ≤ 1. In Algazi and Lerner (1964) and Miller and Thomas (1972) certain atmospheric noises were shown to be impulsive and GG distributions with parameter values of 0.1 < p < 0.6 were shown to provide good approximations to their distributions.
GG distributions can also model noise distributions that appear in non-standard wireless media. In Nielsen and Thomas (1987) the authors showed that Arctic under-ice noise is well modeled by members of the GG family. In Banerjee and Agrawal (2013) the GG family has been recognized as a model for the underwater acoustic channel, where values of p = 2.2 and p = 1.6 have been found to model the ship transit noise and the sea surface agitation noise, respectively.
The problem of designing optimal detectors for signals in the presence of GG noise has been considered in Miller and Thomas (1972); Poor and Thomas (1978) and Viswanathan and Ansari (1989). In Soury et al. (2012) the authors studied the average bit error probability of binary coherent signaling over flat fading channels subject to additive GG noise. Interestingly, the authors of Soury et al. (2012) give an exact expression for the average probability of error in terms of Fox's H functions.
In power systems, the GG distribution has been used to model hourly peak load demand in power grids (Mohamed et al. 2008).
In Varanasi and Aazhang (1989) the authors studied the problem of estimating the parameters of the GG distribution (order $p$, mean $\mu$, and variance $\sigma^2 = E[(X_p - \mu)^2]$) from n independent realizations of a GG random variable. The authors of Varanasi and Aazhang (1989) considered three estimation methods, namely, the method of moments, maximum likelihood, and moment/Newton-step estimators, and compared the performance of each for different values of p. For example, in the vicinity of p = 2, the moment method was shown to perform best. In Richter (2007) the authors established connections between chi-square and Student's t-distribution. Moreover, in Richter (2016), using the notions of generalized chi-square and Fisher statistics introduced in Richter (2007), the authors studied the problem of inferring one or two scaling parameters of the GG distribution and derived both the confidence interval and the significance test.
The Shannon capacity of channels with GG noise has been considered in Fahs and Abou-Faycal (2018) and Dytso et al. (2017b). In Fahs and Abou-Faycal (2018) the authors gave general results on the structure of the optimal input distribution in channels with GG noise under a large family of channel input cost constraints. In Dytso et al. (2017b) the authors investigated the capacity of channels with GG noise under $L_p$ moment constraints and proposed several upper and lower bounds that are asymptotically tight.
As the pdf of GG distributions has a very simple form, many quantities such as moments, entropy, and Rényi entropy can be easily computed (Do and Vetterli 2002; Nadarajah 2005). Also, from the information theoretic perspective the GG distribution is interesting because it maximizes the entropy under a p-th absolute moment constraint (Cover and Thomas 2006; Lutwak et al. 2007). The maximum entropy property can serve as an important intermediate step in a number of proofs. For example, in Dytso et al. (2018) it has been used to generalize the Ozarow-Wyner bound (Ozarow and Wyner 1990) on the mutual information of discrete inputs over arbitrary channels. In Nielsen and Nock (2017) the maximum entropy principle has been used to improve bounds on the entropy of Gaussian mixtures.
While the number of applications of the GG distribution is large, many of its properties have been drawn from numerical studies, and few analytical properties of the GG family are known beyond the cases p = 1, p = 2 and p = ∞. For instance, very little is known about the characteristic function of the GG distribution, and only expressions in terms of hypergeometric functions are known. For example, the characteristic function of the GG distribution was given in terms of Fox-Wright functions in Pogány and Nadarajah (2010) for all p > 1 and later generalized in terms of Fox-H functions in Soury and Alouini (2015) for all p > 0. The work of Soury and Alouini (2015) also characterized the pdf of the sum of two independent GG random variables in terms of Fox-H functions. Specific non-linear transformations of sums of independent GG distributions and the moment generating function of the GG distribution have been studied in Vasudevay and Kumari (2013).
There is also a large body of work on multivariate GG distributions. For example, to the best of our knowledge, the first multivariate generalization was introduced in De Simoni (1968), where the exponent was taken to be $\left((x-\mu)^T K^{-1} (x-\mu)\right)^{\frac{p}{2}}$, where $x$ and $\mu$ are vectors and $K$ is a matrix. In Goodman and Kotz (1973) the authors introduced yet another multivariate generalization of the GG distribution in (1): X is said to be multivariate GG if and only if it can be written as $X = KZ + \mu$, where the components of Z are independently and identically distributed according to the univariate GG distribution in (1). For examples of multivariate distributions with GG marginals and of multivariate GG distributions defined with respect to other norms, the interested reader is referred to Richter (2014); Arellano-Valle and Richter (2012) and Gupta and Nagar (2018) and the references therein.

Paper outline and contributions
Our contributions are as follows:
1 In "Moments and the Mellin transform" section, we study properties of the moments of the GG distribution including the following:
• In Proposition 1 we derive an expression for the Mellin transform of the GG distribution; and
• In Proposition 2 we show necessary and sufficient conditions under which moments of the GG distribution uniquely determine the distribution.
2 In "Properties of the distribution" section, we study properties of the distribution including the following:
• In "Stochastic ordering" section, Proposition 3 shows that the family of GG distributions is an ordered set where the order is taken in terms of second-order stochastic dominance; and
• In "Relation to completely monotone functions and positive definiteness" section, Theorem 1 connects the pdf of GG distributions to positive definite functions. In particular, we show that for p ≤ 2 the pdf of the GG distribution is a positive definite function and for p > 2 the pdf is not a positive definite function. Moreover, it is shown that for p ≤ 2 the pdf of the GG distribution can be expressed as an integral of a Gaussian pdf with respect to a non-negative finite Borel measure.
3 In "On product decomposition of GG random variables" section, Proposition 5 shows that the GG random variable $X_p$ can be decomposed into a product of two independent random variables $X_p = V \cdot X_r$, where $X_r$ is a GG random variable. We carefully study properties of this decomposition including the following:
• In "On the PDF of $V_{p,q}$" section, Proposition 6 gives power series and integral representations of the pdf of V; and
• In "On the determinacy of the distribution of $V_{G,q}$" section, Proposition 8 shows under which conditions the distribution of V is completely determined by its moments. Interestingly, the range of values of p for which $X_p$ and V are determinate is not the same. This gives an interesting example showing that the product of two determinate random variables is not necessarily determinate.
4 In "Characteristic function" section, we study properties of the characteristic function of the GG distribution including the following:
• In "Connection to stable distributions" section, Proposition 9 discusses connections between a class of GG distributions and a class of symmetric stable distributions;
• In "Analyticity of the characteristic function" section, Proposition 10 shows under what conditions the characteristic function of the GG distribution is a real analytic function;
• In "On the distribution of zeros of the characteristic function" section, Theorem 3 studies the distribution of zeros of the characteristic function of the GG distribution. In particular, it is shown that for p ≤ 2 the characteristic function of the GG distribution has no zeros and is always positive, and for p > 2 the characteristic function has at least one positive-to-negative zero crossing; and
• In "Asymptotic behavior of $\phi_p(t)$" section, Proposition 11 gives the tail behavior of the characteristic function of the GG distribution and its derivatives. The consequences of this result are discussed.
5 In "Additive decomposition of a GG random variable" section, we study additive decompositions of the GG random variables including the following:
• In "Infinite divisibility of the characteristic function" section, Theorem 5 completely characterizes for which values of p the GG random variable is infinitely divisible. In addition, Proposition 14 studies properties of the canonical Lévy-Khinchine representation of infinitely divisible distributions; and
• In "Self-decomposability of the characteristic function" section, Theorem 6 characterizes conditions under which a GG distribution of order p can be additively transformed into another GG distribution of order q. In the case of p = q this corresponds to answering whether a GG distribution is self-decomposable.
The paper is concluded in "Discussion and conclusion" section by reflecting on future directions.

Other parametrization of the PDF
In addition to the parametrization used in (1), there are several other parametrizations used in the literature. For example, Subbotin in his seminal paper (Subbotin 1923) used the following parametrization, which is still commonly used amongst probability theorists:
In some engineering literature, where the variance models power, it is convenient to work with distributions whose variance is taken to be independent of the parameter p (e.g., Gonzalez-Jimenez et al. (2007) and Miller and Thomas (1972)):
In the statistical literature, some authors prefer to use (e.g., Richter (2016)):
In the above parametrization the p-th moment, when $\mu = 0$, is normalized such that it equals $\sigma^p$. The choice of the parametrization is usually dictated by the application that one has in mind. In this work, we choose to work with the parametrization in (1), which we found to be convenient for studying the Mellin transform and the characteristic function of the GG distribution.

Moments and the Mellin transform
In this section, we study properties of the moments, absolute moments and Mellin transform of the GG distribution. We also show conditions under which the moments of $X_p$ uniquely characterize its distribution. While the majority of the results in this section are not new or are easy to derive, we choose to include them for completeness, as most of the development in other sections will heavily depend on properties of moments.

Moments, absolute moments, and the Mellin transform
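Definition 1 (Mellin Transform (Poularikas 1998).) The Mellin transform of a positive random variable X is defined as
$$\mathcal{M}_X(s) = E\left[X^{s-1}\right], \qquad s \in \mathbb{C}.$$
The Mellin transform emerges as a major tool in characterizing products of positive independent random variables since, for independent positive random variables X and Y,
$$\mathcal{M}_{X \cdot Y}(s) = \mathcal{M}_X(s)\, \mathcal{M}_Y(s).$$
Proposition 1 (Mellin Transform of $|X_p|$.) For any p > 0 and $X_p \sim \mathcal{N}_p(0, \alpha^p)$
$$E\left[|X_p|^{s-1}\right] = \frac{2^{\frac{s-1}{p}}}{\Gamma\left(\frac{1}{p}\right)}\, \Gamma\left(\frac{s}{p}\right) \alpha^{s-1}, \qquad \mathrm{Re}(s) > 0. \qquad (10)$$
Moreover, for any p > 0 and k > −1 the absolute moments are given by
$$E\left[|X_p|^{k}\right] = \frac{2^{\frac{k}{p}}}{\Gamma\left(\frac{1}{p}\right)}\, \Gamma\left(\frac{k+1}{p}\right) \alpha^{k}. \qquad (11)$$
Proof The Mellin transform can be computed by using the integral (Poularikas 1998, Table 8.1)
$$\int_0^\infty x^{s-1} e^{-a x^p}\, dx = \frac{1}{p}\, a^{-\frac{s}{p}}\, \Gamma\left(\frac{s}{p}\right),$$
and, therefore,
$$E\left[|X_p|^{s-1}\right] = \frac{2 c_p}{\alpha} \int_0^\infty x^{s-1} e^{-\frac{x^p}{2\alpha^p}}\, dx = \frac{2 c_p}{p\, \alpha} \left(2\alpha^p\right)^{\frac{s}{p}} \Gamma\left(\frac{s}{p}\right) = \frac{2^{\frac{s-1}{p}}}{\Gamma\left(\frac{1}{p}\right)}\, \Gamma\left(\frac{s}{p}\right) \alpha^{s-1},$$
where in the last step we used the value of $c_p$ in (1). Moreover, the above integral is finite if Re(s) > 0 and p > 0. The proof of (11) follows by choosing s = k + 1 in (10). This concludes the proof.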
Note that the p-th absolute moment of $X_p$ is given by $E\left[|X_p|^p\right] = \frac{2\alpha^p}{p}$. The expression in (11) can also be extended to multivariate GG distributions defined through $\ell_p$ norms; see for example Lutwak et al. (2007) and Arellano-Valle and Richter (2012).
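As a sanity check on the absolute moments, the sketch below evaluates the closed form $E|X_p|^k = 2^{k/p}\,\Gamma((k+1)/p)\,\alpha^k / \Gamma(1/p)$, which is what direct integration of (1) gives and is taken here as an assumption, and compares it with a plain midpoint-rule integral of $|x|^k f_{X_p}(x)$. The helper names are illustrative only.

```python
import math

def gg_abs_moment(k, p, alpha=1.0):
    """Closed-form E|X_p|^k for X_p ~ N_p(0, alpha^p), k > -1 (log-domain for stability)."""
    if k <= -1:
        raise ValueError("the absolute moment is finite only for k > -1")
    log_val = ((k / p) * math.log(2.0) + math.lgamma((k + 1.0) / p)
               - math.lgamma(1.0 / p) + k * math.log(alpha))
    return math.exp(log_val)

def numeric_abs_moment(k, p, alpha=1.0, upper=80.0, n=80_000):
    """Midpoint-rule estimate of 2 * int_0^upper x^k f(x) dx (symmetry of the pdf)."""
    c_p = p / (2.0 ** ((p + 1.0) / p) * math.gamma(1.0 / p))
    h = upper / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h  # midpoints avoid x = 0, where x^k may be singular
        total += x ** k * (c_p / alpha) * math.exp(-x ** p / (2.0 * alpha ** p))
    return 2.0 * total * h

if __name__ == "__main__":
    print(gg_abs_moment(2.0, 2.0))            # variance of the standard Gaussian
    print(gg_abs_moment(3.3, 3.3, 1.7), 2.0 * 1.7 ** 3.3 / 3.3)  # p-th moment vs 2 alpha^p / p
    print(gg_abs_moment(2.5, 1.5), numeric_abs_moment(2.5, 1.5))
```

Working with `math.lgamma` rather than `math.gamma` is a design choice of this sketch: it keeps the evaluation finite for large $k$ or small $p$, where the gamma function itself overflows.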
The following corollary, which relates k-th moments of two GG distributions of a different order, is useful in many proofs.
for any $k \in \mathbb{R}^+$. Moreover, for q > p
Proof See Appendix A.

Moment problem
The classical moment problem asks whether a distribution can be uniquely determined by its moments. For random variables defined on $\mathbb{R}$, this problem goes under the name of the Hamburger moment problem, and for random variables on $\mathbb{R}^+$ under the name of the Stieltjes moment problem (Stoyanov 2000). If the answer is affirmative, we say that the moment problem is determinate. Otherwise, we say that the moment problem is indeterminate and there exists another distribution that shares the same moments.

Proposition 2
The GG distribution is determinate for $p \in [1, \infty)$ and indeterminate for $p \in (0, 1)$.
Proof We first show that for $p \in (0, 1)$ the GG distribution is indeterminate. To show that an absolutely continuous distribution with a pdf $f(x)$ is indeterminate it is enough to check the classical Krein sufficient condition (Stoyanov 2000) given by
$$\int_{-\infty}^{\infty} \frac{-\log f(x)}{1 + x^2}\, dx < \infty. \qquad (15)$$
In other words, if (15) is satisfied, then the distribution is indeterminate. For the GG distribution, the condition in (15) reduces to showing
$$\int_{-\infty}^{\infty} \frac{|x|^p}{1 + x^2}\, dx < \infty,$$
which is finite if $p \in (0, 1)$. Therefore, for $p \in (0, 1)$ the GG distribution is indeterminate.
To show that the distribution is determinate it is enough to show that the characteristic function has a power series expansion with a positive radius of convergence. For the GG distribution with $p \in [1, \infty)$, this will be done in Proposition 10.
The interested reader is referred to [Lin and Huang (1997), Theorem 2] and [Hoffman-Jørgensen (2017), p. 301], where the conditions for moment determinacy are provided for a Double Generalized Gamma distribution, of which the GG distribution is a special case.
Remark 1 To show that for p ∈ (0, 1) there are distributions with the same moments as GG distributions, one can modify the example in [Stoyanov (2000), Chapter 11.4]. Specifically, for any ∈ (0, 1) there exists ρ, r and λ such that the pdf has the same integer moments as a GG distribution.

Remark 2
In (Varanasi and Aazhang 1989) the authors studied the problem of estimating the parameter p from n independent realizations of a GG random variable. As one of the proposed methods, the authors used empirical moments to estimate the parameter p. Moreover, in Varanasi and Aazhang (1989) it has been observed that the method of moments performs poorly for p ∈ (0, 1). In view of Proposition 2, the observation about the method of moments made in Varanasi and Aazhang (1989) can be attributed to the fact that the GG distribution is indeterminate for p ∈ (0, 1).

Stochastic ordering
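The cumulative distribution function (CDF) of $X_p \sim \mathcal{N}_p(\mu, \alpha^p)$ is given by
$$F_{X_p}(x) = \frac{1}{2} + \mathrm{sign}(x - \mu)\, \frac{\gamma\left(\frac{1}{p}, \frac{|x-\mu|^p}{2\alpha^p}\right)}{2\, \Gamma\left(\frac{1}{p}\right)}.$$
Corollary 1 suggests that there might be some ordering between members of the GG family. To make this point more explicit we need the following definition.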
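Definition 2 (Stochastic Dominance (Levy 1992).) A random variable X dominates another random variable Y in the sense of the first-order stochastic dominance if
$$F_X(x) \le F_Y(x), \qquad \forall x \in \mathbb{R},$$
and in the sense of the second-order stochastic dominance if
$$\int_{-\infty}^{x} \left[F_Y(t) - F_X(t)\right] dt \ge 0, \qquad \forall x \in \mathbb{R}.$$
Proposition 3 Let $X_p \sim \mathcal{N}_p(0, 1)$ and $X_q \sim \mathcal{N}_q(0, 1)$. Then, for p ≤ q, $X_q$ dominates $X_p$ in the sense of the second-order stochastic dominance.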
Proof See Appendix B.
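Proposition 3 can be illustrated numerically. The sketch below assumes the standard definition of second-order stochastic dominance, namely that $X_q$ dominates $X_p$ when $D(x) = \int_{-\infty}^{x} [F_{X_p}(t) - F_{X_q}(t)]\,dt \ge 0$ for all $x$, and evaluates $D$ on a grid with the trapezoid rule for $p = 1$ and $q = 2$. The function names and the grid parameters are choices made for this illustration.

```python
import math

def gg_pdf(x, p):
    """Density of X_p ~ N_p(0, 1), assuming the form c_p * exp(-|x|^p / 2)."""
    c_p = p / (2.0 ** ((p + 1.0) / p) * math.gamma(1.0 / p))
    return c_p * math.exp(-abs(x) ** p / 2.0)

def cumulative_trapezoid(values, h):
    """Running trapezoid-rule integral of grid samples with spacing h."""
    out = [0.0]
    for left, right in zip(values, values[1:]):
        out.append(out[-1] + 0.5 * (left + right) * h)
    return out

def integrated_cdf_gap(p, q, half_width=60.0, n=12_000):
    """Grid xs and D(x) = int_{-inf}^{x} [F_p(t) - F_q(t)] dt, truncated at -half_width."""
    h = 2.0 * half_width / n
    xs = [-half_width + i * h for i in range(n + 1)]
    cdf_p = cumulative_trapezoid([gg_pdf(x, p) for x in xs], h)
    cdf_q = cumulative_trapezoid([gg_pdf(x, q) for x in xs], h)
    gap = cumulative_trapezoid([a - b for a, b in zip(cdf_p, cdf_q)], h)
    return xs, gap

if __name__ == "__main__":
    xs, gap = integrated_cdf_gap(1.0, 2.0)
    print("min of D on the grid:", min(gap))
    print("D at the origin:", gap[len(gap) // 2])
```

For symmetric zero-mean variables, $D(0) = (E|X_p| - E|X_q|)/2$, which gives an independent check of the value at the origin; small negative values of the minimum at the level of the quadrature error should not be read as a violation.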
It can be shown that the first-order stochastic dominance does not hold since for p ≤ q From Proposition 3 we have the following inequality for the expected value of functions of GG distributions.
Proposition 4 Let $X_q \sim \mathcal{N}_q(0, 1)$ and $X_p \sim \mathcal{N}_p(0, 1)$. Then, for p ≤ q and for any nondecreasing and concave function $g : \mathbb{R} \to \mathbb{R}$ we have that
$$E\left[g(X_p)\right] \le E\left[g(X_q)\right]. \qquad (19)$$
Proof The inequality in (19) is equivalent to the second-order stochastic dominance. For more details, the interested reader is referred to Levy (1992).
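Examples of functions that satisfy the hypothesis of Proposition 4 are $g(x) = x - \sqrt{x^2 + 1}$ and $g(x) = -e^{-tx}$, $t \ge 0$. Using $E[X_p] = E[X_q] = 0$, these choices lead to the following inequalities for p ≤ q:
$$E\left[\sqrt{X_q^2 + 1}\right] \le E\left[\sqrt{X_p^2 + 1}\right], \qquad (20)$$
$$E\left[e^{-tX_q}\right] \le E\left[e^{-tX_p}\right], \qquad t \ge 0. \qquad (21)$$
In particular, the inequality in (21) shows that the Laplace transform of $f_{X_p}$ (which exists if 1 < p, q) is larger than the Laplace transform of $f_{X_q}$.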
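The ordering of expectations for nondecreasing concave functions can be spot-checked by quadrature. The following sketch assumes the inequality $E[g(X_p)] \le E[g(X_q)]$ for $p \le q$ and tests it for the two example functions named above; the integration window and the helper name `expect` are assumptions of this illustration, and the exponential example is only evaluated for orders above 1, where the expectation is finite for every $t$.

```python
import math

def gg_pdf(x, p):
    """Density of X_p ~ N_p(0, 1), assuming the form c_p * exp(-|x|^p / 2)."""
    c_p = p / (2.0 ** ((p + 1.0) / p) * math.gamma(1.0 / p))
    return c_p * math.exp(-abs(x) ** p / 2.0)

def expect(g, p, half_width=40.0, n=80_000):
    """Midpoint-rule estimate of E[g(X_p)] over [-half_width, half_width]."""
    h = 2.0 * half_width / n
    total = 0.0
    for i in range(n):
        x = -half_width + (i + 0.5) * h
        total += g(x) * gg_pdf(x, p)
    return total * h

def g_sqrt(x):
    """Nondecreasing concave example g(x) = x - sqrt(x^2 + 1)."""
    return x - math.sqrt(x * x + 1.0)

def g_exp(x, t=1.0):
    """Nondecreasing concave example g(x) = -exp(-t x) with t >= 0."""
    return -math.exp(-t * x)

if __name__ == "__main__":
    for p, q in ((1.0, 2.0), (2.0, 4.0)):
        print(p, q, expect(g_sqrt, p), "<=", expect(g_sqrt, q))
    print(1.5, 3.0, expect(g_exp, 1.5), "<=", expect(g_exp, 3.0))
```

A quadrature check of this kind only probes a few pairs $(p, q)$; it illustrates the inequality and does not replace the proof.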

Relation to completely monotone functions and positive definiteness
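We begin by introducing the notions of completely monotone and Bernstein functions.

Definition 3 (Completely Monotone and Bernstein Functions (Schilling et al. 2012).) A function $f : [0, \infty) \to \mathbb{R}$ is said to be completely monotone if $(-1)^k f^{(k)}(x) \ge 0$ for all $x > 0$ and $k = 0, 1, 2, \ldots$
A function $f : [0, \infty) \to [0, \infty)$ is said to be a Bernstein function if the derivative of f is a completely monotone function.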
Applying the well-known result from Schilling et al. (2012), that the composition of a completely monotone function and a Bernstein function is completely monotone, to the function $e^{-x}$ (completely monotone) and the function $\frac{x^p}{2}$ (Bernstein for $p \in (0, 1]$), we obtain the following.
Corollary 2 The function $e^{-\frac{x^p}{2}}$ is completely monotone for $p \in (0, 1]$. For p > 1 the function $e^{-x^p}$ is not completely monotone.
As will be observed throughout this paper, the GG distribution exhibits different properties depending on whether p ≤ 2 or p > 2. At the heart of this behavior is the concept of positive-definite functions.
Definition 4 (Positive Definite Function (Stewart 1976).) A function $f : \mathbb{R} \to \mathbb{C}$ is called positive definite if for every positive integer n and all real numbers $x_1, x_2, \ldots, x_n$, the $n \times n$ matrix
$$A = (a_{ij})_{i,j=1}^{n}, \qquad a_{ij} = f(x_i - x_j),$$
is positive semi-definite.
The next result relates the pdf of the GG distribution to the class of positive definite functions.

Theorem 1
The function $e^{-\frac{|x|^p}{2}}$ is
• not positive definite for $p \in (2, \infty)$; and
• positive definite for $p \in (0, 2]$. Moreover, there exists a finite non-negative Borel
Proof See Appendix C.
The expression in (24) will form a basis for much of the analysis in the regime p ∈ (0, 2] and will play an important role in examining properties of the characteristic function of the GG distribution. The following corollary of Theorem 1 will also be useful.

Corollary 3 For any
Proof The proof follows by substituting $x$ in (24) with $x^{\frac{q}{p}}$.

On product decomposition of GG random variables
As a consequence of Theorem 1 we have the following decompositional representation of the GG random variable.
Proposition 5 For any $0 < q \le p \le 2$ let $X_q \sim \mathcal{N}_q(0, 1)$. Then,
• $V_{p,q}$ is an unbounded random variable for p < 2 and $V_{p,q} = 1$ for p = 2; and
• for p < 2, $V_{p,q}$ is a continuous random variable with pdf given by
Proposition 5 can be used to show that the GG distribution is a Gaussian mixture, which is formally defined next.

Definition 5 (Gaussian Mixture (McLachlan and Peel 2004).) A random variable X is called a (centered) Gaussian mixture if there exists a positive random variable V and a standard Gaussian random variable Z, independent of V, such that $X \overset{d}{=} VZ$.
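A concrete instance of a Gaussian mixture may help fix ideas. The sketch below relies on a classical fact that is not taken from this paper: a Laplace variable with scale $b$ can be written as $b\sqrt{2W}\,Z$ with $W$ a unit-rate exponential variable independent of a standard Gaussian $Z$. With $b = 2$ this matches the Laplace density $\frac{1}{4}e^{-|x|/2}$, the $p = 1$ member of the GG family under the parametrization in (1). The sampler name, sample size and seed are arbitrary choices of this illustration.

```python
import math
import random

def sample_laplace_as_mixture(n, scale=2.0, seed=0):
    """Draw n samples of X = V * Z with V = scale * sqrt(2 W), W ~ Exp(1), Z ~ N(0, 1)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        v = scale * math.sqrt(2.0 * rng.expovariate(1.0))  # positive mixing variable V
        z = rng.gauss(0.0, 1.0)                            # standard Gaussian, independent of V
        samples.append(v * z)
    return samples

if __name__ == "__main__":
    xs = sample_laplace_as_mixture(200_000)
    mean_abs = sum(abs(x) for x in xs) / len(xs)
    # For the density (1/4) exp(-|x| / 2): E|X| = 2 and P(|X| <= 2 ln 2) = 1/2.
    frac = sum(1 for x in xs if abs(x) <= 2.0 * math.log(2.0)) / len(xs)
    print("mean |X| ~", mean_abs)
    print("P(|X| <= 2 ln 2) ~", frac)
```

The two printed statistics are Monte Carlo estimates, so they fluctuate with the seed and sample size; they are meant as a plausibility check of the mixture representation, not as a derivation of the mixing density.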
As a consequence of Proposition 5 we have the following result.
where $V_{q,q}$ is independent of $X_2$ and its pdf is defined in (26b).
Proof The proof follows by choosing p = q in (26a).
Another case of importance is where X 1 is a Laplace random variable. For the ease of notation the special cases of Gaussian and Laplace mixtures will be denoted as follows in the sequel: respectively.

On the PDF of V p,q
The expression for the pdf of $V_{p,q}$ in (26b) can be difficult to analyze due to the complex nature of the integrand. The next result provides two new representations of the pdf of $V_{p,q}$ that in many cases are easier to analyze than the expression in (26b).
Proposition 6 For $0 < q \le p \le 2$ the pdf of a random variable $V_{p,q}$ has the following representations:
1 Power Series Representation
where
2 Integral Representation
where
The pdf of the random variable $V_{G,q}$ is plotted in Fig. 1. Interestingly, the slope of $f_{V_{G,q}}(v)$ around $v = 0^+$ behaves very differently depending on whether q < 1 or q > 1. This behavior can be best illustrated by looking at the pdf of
Proof By using the power series expansion of $f_{V_{G,q}}(v)$ in (28) and the transformation
The proof follows by taking the limit as $v \to 0$ in (34).
As we will demonstrate later, the behavior of the pdf of $V_{G,q}$ around zero will be important in studying the asymptotic behavior of the characteristic function of $X_q$. This is reminiscent of the initial value theorem of the Laplace transform, where the value of a function at zero can be used to estimate the asymptotic behavior of its Laplace transform. Indeed, as we will see, the characteristic function of $X_q$ and the Laplace transform of $V_{G,q}^2$ have a clear connection.

On the determinacy of the distribution of V G,q
Similar to the investigation in "Moment problem" section of whether GG distributions are determinate (uniquely determined by their moments) or not, we now conduct a similar investigation of the distributions of $V_{G,q}$.

Proposition 8
The distribution of $V_{G,q}$ is determinate for $q \ge \frac{2}{5}$.
Proof To show that the distribution of $V_{G,q}$ is determinate we can use Carleman's sufficient condition for positive random variables (Stoyanov 2000). This condition states that the distribution is determinate if
$$\sum_{k=1}^{\infty} \left(E\left[V_{G,q}^k\right]\right)^{-\frac{1}{2k}} = \infty. \qquad (35)$$
Next, using the expression for the k-th moment of $V_{G,q}$ given in Appendix D and the approximation of the ratio of moments shown in Appendix A, we have that
Using the approximation in (36) in the sum in (35) we have that
By the convergence conditions for p-series, the sum in (37) diverges if $\frac{1}{2}\left(\frac{1}{q} - \frac{1}{2}\right) \le 1$, that is, if $q \ge \frac{2}{5}$. Therefore, Carleman's condition is satisfied if $q \ge \frac{2}{5}$, and thus $V_{G,q}$ has a determinate distribution for $q \ge \frac{2}{5}$. This concludes the proof.
Remark 4 According to Propositions 2 and 8, for the range of values $q \in \left[\frac{2}{5}, 1\right)$ the random variable $X_q \overset{d}{=} V_{G,q} \cdot X_2$ is a product of two random variables with determinate distributions, while $X_q$ itself has an indeterminate distribution on $q \in \left[\frac{2}{5}, 1\right)$ by Proposition 2. This observation generates an interesting example illustrating that the product of two independent random variables with determinate distributions can have an indeterminate distribution.
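The threshold $q = \frac{2}{5}$ can be made tangible by computing the Carleman terms directly. The sketch below assumes two things: the closed form $E|X_p|^k = 2^{k/p}\,\Gamma((k+1)/p)/\Gamma(1/p)$ for $\alpha = 1$, and the moment identity $E[V_{G,q}^k] = E|X_q|^k / E|X_2|^k$, which follows from $X_q \overset{d}{=} V_{G,q} \cdot X_2$ with independent factors. It then estimates the power-law decay exponent of $(E[V_{G,q}^k])^{-1/(2k)}$ in $k$ and compares it with $\frac{1}{2}\left(\frac{1}{q} - \frac{1}{2}\right)$. Function names and the range of $k$ are illustrative.

```python
import math

def log_abs_moment(k, p):
    """log E|X_p|^k for X_p ~ N_p(0, 1), using the assumed closed form."""
    return (k / p) * math.log(2.0) + math.lgamma((k + 1.0) / p) - math.lgamma(1.0 / p)

def log_carleman_term(k, q):
    """log of (E[V^k])^(-1/(2k)), with E[V^k] = E|X_q|^k / E|X_2|^k."""
    log_moment_v = log_abs_moment(k, q) - log_abs_moment(k, 2.0)
    return -log_moment_v / (2.0 * k)

def decay_exponent(q, k_lo=1000, k_hi=2000):
    """Estimate s such that the Carleman term behaves like k^(-s) for large k."""
    rise = log_carleman_term(k_hi, q) - log_carleman_term(k_lo, q)
    return -rise / (math.log(k_hi) - math.log(k_lo))

if __name__ == "__main__":
    for q in (0.25, 0.4, 0.5, 1.0):
        predicted = 0.5 * (1.0 / q - 0.5)
        print("q =", q, "estimated s ~", decay_exponent(q), "predicted s =", predicted)
```

A series with terms of order $k^{-s}$ diverges exactly when $s \le 1$, so an estimated exponent at or below one is consistent with Carleman's condition holding; the finite-$k$ estimate carries a small bias of order $\log k / k$.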

Characteristic function
The focus of this section is on the characteristic function of the GG distribution. The characteristic function of the GG distribution can be written in the following integral forms.

Theorem 2
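The characteristic function of $X_p \sim \mathcal{N}_p(0, 1)$ is given by
$$\phi_p(t) = 2 c_p \int_0^\infty \cos(tx)\, e^{-\frac{x^p}{2}}\, dx, \qquad p > 0, \qquad (38a)$$
$$\phi_p(t) = E\left[e^{-\frac{t^2 V_{G,p}^2}{2}}\right], \qquad p \in (0, 2], \qquad (38b)$$
where the density of the variable $V_{G,p}$ is defined in Proposition 5.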
Proof The proof of (38a) follows from the fact that $e^{-\frac{|x|^p}{2}}$ is an even function, which implies that the Fourier transform is equivalent to the cosine transform.
To show (38b) observe that
$$\phi_p(t) = E\left[e^{itX_p}\right] \overset{a)}{=} E\left[e^{it V_{G,p} X_2}\right] \overset{b)}{=} E\left[e^{-\frac{t^2 V_{G,p}^2}{2}}\right],$$
where the equalities follow from: a) the decomposition property in Proposition 5; and b) the independence of $V_{G,p}$ and $X_2$ and the fact that the characteristic function of $X_2$ is $e^{-\frac{t^2}{2}}$. This concludes the proof.
As a consequence of the positive definiteness, $\phi_p(t)$, for $p \in (0, 2]$, has a more manageable form given in (38b). However, for p > 2 it does not appear that $\phi_p(t)$ can be written in a more amenable form, and the best simplification one can perform is a trivial symmetrization that converts the Fourier transform into the cosine transform in (38a). Nonetheless, the cosine representation in (38a) does allow us to simplify the implementation of the numerical calculation of $\phi_p(t)$. Examples of characteristic functions of $X_p \sim \mathcal{N}_p(0, 1)$ for several values of p are given in Fig. 2.
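The cosine-transform representation lends itself to a direct quadrature. The sketch below is one possible implementation, assuming the form $\phi_p(t) = 2 c_p \int_0^\infty \cos(tx)\, e^{-x^p/2}\,dx$ and using composite Simpson's rule on a truncated interval; the truncation point, the number of nodes and the function name `gg_cf` are choices made here. It is checked against the two closed forms that are known independently: $e^{-t^2/2}$ for $p = 2$ and $1/(1 + 4t^2)$ for the Laplace density $\frac{1}{4}e^{-|x|/2}$ at $p = 1$.

```python
import math

def gg_cf(t, p, n=20_000):
    """Simpson-rule estimate of 2 c_p * int_0^upper cos(t x) exp(-x^p / 2) dx.

    Intended for p >= 1; for p < 1 the integrand is not smooth at 0 and the
    truncation point grows quickly, so this simple rule loses accuracy.
    """
    c_p = p / (2.0 ** ((p + 1.0) / p) * math.gamma(1.0 / p))
    upper = 80.0 ** (1.0 / p)  # exp(-upper^p / 2) = exp(-40), negligible tail
    h = upper / n              # n must be even for Simpson's rule

    def integrand(x):
        return math.cos(t * x) * math.exp(-x ** p / 2.0)

    total = integrand(0.0) + integrand(upper)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * integrand(i * h)
    return 2.0 * c_p * total * h / 3.0

if __name__ == "__main__":
    print(gg_cf(1.3, 2.0), math.exp(-1.3 ** 2 / 2.0))
    print(gg_cf(0.7, 1.0), 1.0 / (1.0 + 4.0 * 0.7 ** 2))
    print(gg_cf(0.0, 3.0))
```

For large $|t|$ the integrand oscillates rapidly and the true value becomes very small when $p \le 2$, so the absolute quadrature error eventually dominates; a finer grid or an oscillatory quadrature rule would be needed in that regime.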
The following result is immediate by Theorem 2.

Connection to stable distributions
A class of distributions that is closed under convolution of independent copies is called stable. A more precise definition is given next.
Definition 6 (Stable Random Variables (Zolotarev 1986; Lukacs 1970).) Let $X_1$ and $X_2$ be independent copies of a random variable X. Then X is said to be stable if for all constants a > 0 and b > 0, there exist c > 0 and $d \in \mathbb{R}$ such that
$$a X_1 + b X_2 \overset{d}{=} c X + d. \qquad (39)$$
The defining relationship in (39) is equivalent to
$$\phi_X(at)\, \phi_X(bt) = \phi_X(ct)\, e^{idt}, \qquad \forall t \in \mathbb{R}, \qquad (40)$$
where $\phi_X(t)$ is the characteristic function of the random variable X.
Throughout this work we will use stable distribution, stable random variable, and stable characteristic function interchangeably.
The characteristic function of a stable distribution has the following canonical representation:
where $\mu \in \mathbb{R}$ is the shift parameter, $c \in \mathbb{R}^+$ is the scaling parameter, $\beta \in [-1, 1]$ is the skewness parameter, and $\alpha \in (0, 2]$ is the order parameter. We refer the interested reader to Zolotarev (1986) for a comprehensive treatment of the subject of stable distributions.
In this work we are interested in symmetric stable distributions (i.e., $\beta = 0$), which also go under the name of α-stable distributions, with the characteristic function given by
Observe that there is a duality between a class of symmetric stable distributions and a class of GG distributions with $p \in (0, 2]$. Up to a normalizing constant, the pdf of a GG random variable is equal to the characteristic function of an α-stable random variable. Equivalently, the pdf of an α-stable random variable is equal, up to a normalizing constant, to the characteristic function of a GG random variable.
We exploit this duality to give yet another integral representation of the characteristic function of the GG distribution with parameter $p \in (0, 2]$.
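The stability relation and the duality described above can be illustrated in a few lines. The sketch below assumes the symmetric stable characteristic function in the form $e^{-|ct|^\alpha}$ with zero shift, which is one common parametrization and may differ from the paper's by a rescaling of $c$. Under that assumption, $\phi(at)\phi(bt) = \phi(c't)$ with $c' = (a^\alpha + b^\alpha)^{1/\alpha}$, and the GG pdf of order $p$, normalized by its value at the origin, coincides with this characteristic function for $\alpha = p$ and $c = 2^{-1/p}$.

```python
import math

def sym_stable_cf(t, alpha, scale=1.0):
    """Symmetric alpha-stable characteristic function exp(-|scale * t|^alpha) (assumed form)."""
    return math.exp(-abs(scale * t) ** alpha)

def combined_scale(a, b, alpha):
    """Scale c with phi(a t) * phi(b t) = phi(c t) for the symmetric stable law above."""
    return (a ** alpha + b ** alpha) ** (1.0 / alpha)

def gg_pdf(x, p):
    """Density of X_p ~ N_p(0, 1), assuming the form c_p * exp(-|x|^p / 2)."""
    c_p = p / (2.0 ** ((p + 1.0) / p) * math.gamma(1.0 / p))
    return c_p * math.exp(-abs(x) ** p / 2.0)

if __name__ == "__main__":
    alpha, a, b, t = 1.3, 0.7, 2.1, 0.9
    lhs = sym_stable_cf(a * t, alpha) * sym_stable_cf(b * t, alpha)
    rhs = sym_stable_cf(combined_scale(a, b, alpha) * t, alpha)
    print(lhs, rhs)
    # Duality: normalized GG pdf of order p equals a stable cf of order alpha = p.
    p, x = 0.8, 1.7
    print(gg_pdf(x, p) / gg_pdf(0.0, p), sym_stable_cf(x, p, scale=2.0 ** (-1.0 / p)))
```

The duality check only makes sense for $p \in (0, 2]$, since $e^{-|t|^\alpha}$ is a characteristic function only for $\alpha$ in that range.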

Proposition 9 For p
Moreover, let the integrand in (43a) be given by

then:
• $U_p(x)$ is a non-negative function;
• For $p \in (0, 1)$, $U_p(x)$ is an increasing function with
• The function $g_p$ has a single maximum given by
Proof The characterization in (43a) can be found in (Zolotarev 1986, Theorem 2.2.3). The proof of the properties of $U_p(x)$ is presented in Appendix F.
Since the integral in Proposition 9 is performed over a finite interval, the characterization in Proposition 9 is especially useful for numerical computations of $\phi_p(t)$. The plots in Fig. 2, for $p \in (0, 2)$, are done by using the expression for $\phi_p(t)$ in (43a). To the best of our knowledge, the properties of $U_p(x)$ and $g_p(x)$, derived in Proposition 9, are new and facilitate a more efficient numerical computation of the integral representation of $\phi_p(t)$.
The plot of the function $U_p(x)$ for p = 0.5 and p = 1.5 is shown in Fig. 3. We suspect that most of the properties of $\phi_p(t)$ for $p \in (0, 2)$ that we derive in this paper can be found by using the integral expression in (43a). However, instead of taking this route we use the product decomposition in Proposition 5 to derive all the properties of $\phi_p(t)$. We believe that using a product decomposition is a more natural approach. Moreover, the positive random variables in Gaussian mixtures, $V_{G,p}$ in our case, naturally appear in a number of applications (e.g., bounds on the entropy of sums of independent random variables (Eskenazis et al. 2016)) and are of independent interest.

Analyticity of the characteristic function
An important question, in particular for numerical methods, is: when can the characteristic function of a random variable be represented as a power series of the form $\phi_p(t) = \sum_{k=0}^{\infty} \frac{(it)^k}{k!} E[X_p^k]$? The above expression is especially useful since the moments of GG distributions are known for every $k$; see Proposition 1.
For p < 1 the function φ p (t) is not real analytic. Proof See Appendix G.
The results of Proposition 10 also lead to the conclusion that for $p > 1$ the moment generating function of $X_p$, $M_p(t) = E[e^{tX_p}]$, exists for all $t \in \mathbb{R}$.
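As a rough illustration of the power series discussed above, the following sketch sums the even-order terms of the series for $X_p \sim \mathcal{N}_p(0, 1)$ (odd moments vanish by symmetry). The moment formula $E[|X_p|^k] = 2^{k/p}\,\Gamma((k+1)/p)/\Gamma(1/p)$ is obtained here by integrating the pdf from the Introduction with μ = 0 and α = 1; it is assumed, not verified, to agree with Proposition 1, which is not reproduced in this section. The function names are hypothetical.

```python
import math


def gg_abs_moment_log(k: float, p: float) -> float:
    """log E|X_p|^k for pdf proportional to exp(-|x|^p / 2) (assumed normalization)."""
    return (k / p) * math.log(2.0) + math.lgamma((k + 1.0) / p) - math.lgamma(1.0 / p)


def gg_cf_series(t: float, p: float, terms: int = 60) -> float:
    """Partial sum of sum_k (-1)^k E[X_p^(2k)] t^(2k) / (2k)!."""
    if t == 0.0:
        return 1.0
    total = 0.0
    for k in range(terms):
        # Work in the log domain so that large moments and factorials do not overflow.
        log_term = (
            gg_abs_moment_log(2 * k, p)
            + 2 * k * math.log(abs(t))
            - math.lgamma(2 * k + 1)
        )
        total += (-1) ** k * math.exp(log_term)
    return total


# p = 2 should reproduce the Gaussian characteristic function exp(-t^2 / 2).
gauss_value = gg_cf_series(1.5, 2.0)
# p = 1 (Laplace): the series is geometric with ratio -4 t^2, so it converges
# only for |t| < 1/2, where it sums to 1 / (1 + 4 t^2).
laplace_value = gg_cf_series(0.25, 1.0)
```

For p = 1 the terms reduce to $(-4t^2)^k$, so the partial sums diverge once |t| ≥ 1/2 even though $\phi_1(t)$ itself is well defined there; a truncated series is therefore only a reasonable numerical tool inside the radius of convergence.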

On the distribution of zeros of the characteristic function
As seen from Fig. 2 the characteristic function of the GG distribution can have zeros. The next theorem gives a somewhat surprising result on the distribution of zeros of φ p (t).

Theorem 3 The characteristic function $\phi_p(t)$ has the following properties:
• for $p > 2$, $\phi_p(t)$ has at least one positive-to-negative zero crossing. Moreover, the number of zeros is at most countable; and
• for $p \in (0, 2]$, $\phi_p(t)$ is a positive function.
Proof See Appendix H.
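The sign behavior in Theorem 3 can be explored numerically. The sketch below does not use the finite-interval representation (43a), which is not reproduced here; instead it applies plain composite Simpson quadrature to $\phi_p(t) = 2 c_p \int_0^\infty \cos(tx)\, e^{-x^p/2}\, dx$, with $c_p = p / (2^{(p+1)/p}\,\Gamma(1/p))$ taken from the pdf in the Introduction (μ = 0, α = 1). The function name, the truncation rule, and the grid are illustrative assumptions.

```python
import math


def gg_cf_quad(t: float, p: float, n: int = 4000, cutoff: float = 40.0) -> float:
    """Approximate phi_p(t) by composite Simpson quadrature on [0, upper]."""
    c_p = p / (2.0 ** ((p + 1.0) / p) * math.gamma(1.0 / p))
    # Truncate where the pdf kernel has dropped to exp(-cutoff).
    upper = (2.0 * cutoff) ** (1.0 / p)
    h = upper / n  # n is assumed to be even

    def f(x: float) -> float:
        return math.cos(t * x) * math.exp(-0.5 * x ** p)

    total = f(0.0) + f(upper)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(i * h)
    # Factor 2 accounts for the symmetric negative half-line.
    return 2.0 * c_p * total * h / 3.0


ts = [0.1 * i for i in range(81)]  # grid on [0, 8]
min_phi4 = min(gg_cf_quad(t, 4.0) for t in ts)  # p > 2: expected to dip below zero
min_phi1 = min(gg_cf_quad(t, 1.0) for t in ts)  # p = 1: expected to stay positive
```

For p = 1 and p = 2 the output can be compared against the closed forms $1/(1 + 4t^2)$ and $e^{-t^2/2}$. For p < 1 the integrand has an unbounded derivative at the origin, so this simple rule converges slowly and a more careful scheme would be preferable.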
Also, we conjecture that zeros of φ p (t) have the following additional property.

Conjecture 1 For p ∈ (2, ∞) zeros of φ p (t) do not appear periodically.
It is important to point out that, for $p = \infty$, the characteristic function is given by $\phi_\infty(t) = \frac{\sin(t)}{t} = \mathrm{sinc}(t)$, and zeros do appear periodically. However, for $p < \infty$ we conjecture that zeros do not appear periodically.

Asymptotic behavior of φ p (t)
Next, we find the asymptotic behavior of $\phi_p(t)$ as $t \to \infty$. In fact, the next result gives the asymptotic behavior not only of $\phi_p(t) = E\left[e^{-V_{G,p}^2 t^2/2}\right]$ but also of a more general function for some $m > 0$. The analysis of the function in (45) also allows one to find the asymptotic behavior of higher order derivatives of $\phi_p(t)$. For example, the first order derivative can be related to the function in (45) as follows:

Proposition 11 Let $m \in \mathbb{R}^+$; then
Proof See Appendix I.
Using Proposition 11, we can give an exact tail behavior for φ p (t). where A 0 is defined in (46). Moreover, for 0 < q, p < 2 and some α > 0 Proof The proof follows immediately from Proposition 11.
Note that, for $p \in (0, 2]$, the function $\phi_p(\sqrt{2t})$ can be thought of as a Laplace transform of the pdf of the random variable $V_{G,p}^2$. This observation together with the asymptotic behavior of $\phi_p(t)$ leads to the following result.

Proposition 13 For $n \in \mathbb{R}$, $E[V_{G,p}^n]$ is finite if and only if $n + p > -1$.
Proof For $n > -1$ the proof is a consequence of the decomposition property in Proposition 5 and of Proposition 1, where it is shown that $E[|X_p|^n] < \infty$ if $n > -1$ for all $p > 0$. Therefore, we assume that $n \le -1$.
First observe that for any positive random variable $X$ and $k > 0$ the negative moments of $X$ can be expressed as follows: where $F(t)$ is the Laplace transform of the pdf of $X$. Using the identity in (48) and the fact that $\phi_p(\sqrt{2t})$ is the Laplace transform of the pdf of the random variable $V_{G,p}^2$ we have that Note that the integral in (49) , which implies that the integral in (49) is finite if and only if $2k - p < 1$. Setting $2k = -n$ concludes the proof.
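The chain of reasoning in this proof can be sketched as follows. The identity below is the standard Laplace-transform expression for negative moments and is assumed to be what (48) states. The polynomial decay rate $\phi_p(t) \asymp t^{-(p+1)}$ for $p \in (0, 2)$ is likewise an assumption about the tail result above; it is consistent with the stated condition $2k - p < 1$.

```latex
% Assumed form of (48): for a positive random variable X and k > 0,
% using \int_0^\infty t^{k-1} e^{-t x} \, dt = \Gamma(k) x^{-k} and Tonelli's theorem,
\begin{align*}
E\left[ X^{-k} \right]
  &= \frac{1}{\Gamma(k)} \int_0^\infty t^{k-1} E\left[ e^{-tX} \right] dt
   = \frac{1}{\Gamma(k)} \int_0^\infty t^{k-1} F(t) \, dt .
\end{align*}
% With X = V_{G,p}^2 and F(t) = E[e^{-t V_{G,p}^2}] = \phi_p(\sqrt{2t}):
\begin{align*}
E\left[ V_{G,p}^{-2k} \right]
  &= \frac{1}{\Gamma(k)} \int_0^\infty t^{k-1} \phi_p(\sqrt{2t}) \, dt .
\end{align*}
% Near t = 0 the integrand behaves like t^{k-1}, which is integrable for k > 0.
% Assuming \phi_p(t) \asymp t^{-(p+1)} as t \to \infty, the integrand behaves like
\begin{align*}
t^{k-1} \phi_p(\sqrt{2t}) \asymp t^{\,k - 1 - \frac{p+1}{2}} ,
\end{align*}
% which is integrable at infinity if and only if
\begin{align*}
k - 1 - \frac{p+1}{2} < -1
  \iff 2k - p < 1
  \iff n + p > -1 \quad \text{with } n = -2k .
\end{align*}
```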
According to Proposition 1 and Proposition 5, for $n > -1$ while for $n \le -1$ it is not clear whether $E[V_{G,p}^n]$ is finite since both moments are infinite: $E[|X_p|^n] = \infty$ and $E[|X_2|^n] = \infty$. The result in Proposition 13 is interesting because it states that $E[V_{G,p}^n]$ can be finite even if the absolute moments of $X_p$ and $X_2$ are infinite. The result in Proposition 13 plays an important role in deriving non-Shannon type bounds in problems of communicating over channels with GG noise; see (Dytso et al. 2017b) for further details.

Additive decomposition of a GG random variable
In this section we are interested in determining whether a GG random variable X q ∼ N q (0, α q ) can be decomposed into a sum of two or more independent random variables.

Infinite divisibility of the characteristic function
Definition 7 (Infinite Divisibility (Lukacs 1970; van Harn and Steutel 2003).) A characteristic function $\phi(t)$ is said to be infinitely divisible if for every $n \in \mathbb{N}$ there exists a characteristic function $\phi_n(t)$ such that $\phi(t) = (\phi_n(t))^n$.
Similarly to stable distributions, we use infinitely divisible distribution, infinitely divisible random variable, and infinitely divisible characteristic function interchangeably.
Next we summarize properties of infinitely divisible distributions needed for our purposes.
where $a$ is real and where $\theta(x)$ is a non-decreasing and bounded function such that $\lim_{x \to -\infty} \theta(x) = 0$. The function $d\theta(x)$ is called the Lévy measure. The integrand is defined for $x = 0$ by continuity to be equal to $-\frac{t^2}{2}$. The representation in (51) In general, the Lévy measure $d\theta$ is not a probability measure and hence the distribution function $\theta(x)$ is not bounded by one.
We use Theorem 4 to give a complete characterization of the infinite divisibility property of the GG distribution.
Next observe that where the equalities follow from: a) the expression for the CDF in (16); and b) using the limit $\lim_{x \to \infty} \frac{\Gamma(s, x)}{x^{s-1} e^{-x}} = 1$ (Olver 1991). From the limit in (53) and since the distribution is Gaussian only for $p = 2$ we have from property 4) in Theorem 4 that $\phi_p(t)$ is not infinitely divisible for $p > 1$ unless $p = 2$.
Another proof that φ p (t) is not infinitely divisible for p > 2 follows from Theorem 3 since φ p (t) has at least one zero, which violates property 1) of Theorem 4. This concludes the proof.
Next, we show that the Lévy measure in the canonical representation in (51) is an absolutely continuous measure. This also allows us to give a new representation of φ p (t) for p ∈ (0, 1] where it is infinitely divisible.

Proposition 14
For p ∈ (0, 1], the Lévy measure is absolutely continuous with density f θ (x) and φ p (t) can be expressed as follows: Moreover, for x = 0 Proof See Appendix J.
Remark 5 For the Laplace distribution with $\phi_1(t) = \frac{1}{1 + 4t^2}$, the density $f_\theta(x)$ can be computed by using (54b) and is given by and the exponent in the Lévy-Khinchine representation is given by

Self-decomposability of the characteristic function
In this section we are interested in determining whether a GG random variable X q ∼ N q (0, α q ) can be decomposed into a sum of two independent random variables in which one of the random variables is GG. Distributions with such a property are known as self-decomposable.
Definition 8 (Self-Decomposable Characteristic Function (Lukacs 1970; van Harn and Steutel 2003).) A characteristic function $\phi(t)$ is said to be self-decomposable if for every $\alpha \ge 1$ there exists a characteristic function $\psi_\alpha(t)$ such that
In our context, the GG random variable $X_p \sim \mathcal{N}_p(0, 1)$ is self-decomposable if for every $\alpha \ge 1$ there exists a random variable $\hat{X}_\alpha$ such that where $Z_p \sim \mathcal{N}_p(0, 1)$ is independent of $\hat{X}_\alpha$. In this section, we will look at a generalization of self-decomposability (in Eqs. (56) and (57)) and study whether there exists a random variable $\hat{X}_\alpha$ independent of $Z_p \sim \mathcal{N}_p(0, 1)$ such that where $X_q \sim \mathcal{N}_q(0, 1)$ for every $\alpha \ge 1$. The decomposition in (58) finds application in information theory, where its existence guarantees the achievability of Shannon's bound on the capacity; see (Dytso et al. 2017b) for further details.
The existence of a random variableX α is equivalent to showing that the function φ (q,p,α) is a valid characteristic function.
Observe that both Gaussian and Laplace are self-decomposable random variables. Self-decomposability of Gaussian random variables is a well known property. To see that the Laplace distribution is self-decomposable notice that φ (1,1,α) The expression in (60) is a convex combination of the characteristic function of a point mass at zero and the characteristic function of a Laplace distribution. Therefore, the expression in (60) is a characteristic function.
Checking whether a given function is a valid characteristic function is a notoriously difficult question, as it requires checking whether φ (q,p,α) (t) is a positive definite function; see (Ushakov 1999) for an in-depth discussion on this topic. However, a partial answer to this question can be given.
Proof See Appendix K.
Fig. 4 In the regime S 2 = {(p, q) : 0 < p = q < 1} (the dashed line) φ (q,p,α) (t) is self-decomposable. We also emphasize the point (p, q) = (2, 2) (the black square) corresponds to the Gaussian characteristic function, and the point (p, q) = (1, 1) (the black circle) corresponds to the Laplace characteristic function. The regime S 1 = {(q, p) : 2 < q < p} (the gray triangle) is where φ (q,p,α) (t) is not a characteristic function for almost all α ≥ 1. The white space is the regime where φ (q,p,α) (t) is not a characteristic function for all α ≥ 1

The result of Theorem 6 is depicted in Fig. 4. We would like to point out that for 2 < q ≤ p there are cases when φ (q,p,α) (t) is a characteristic function for some but not all α ≥ 1. Specifically, let p = q = ∞ in which case φ ∞ (t) = sin(t)/t = sinc(t) and φ (∞,∞,α) For example, when α = 2 we have that φ (∞,∞,α) (t) = 1 2 cos(2t), which corresponds to the characteristic function of the random variable X̂ = ±1 equally likely. Note that in the above example, because zeros of φ p (t) occur periodically, we can select α such that the poles and zeros of φ (q,p,α) (t) cancel. However, we conjecture that such examples are only possible for p = ∞, and for 2 < p < ∞ zeros of φ p (t) do not appear periodically (see Conjecture 1) leading to the following:

Conjecture 2 For 2 < q ≤ p < ∞, φ (q,p,α) (t) is not a characteristic function for all α > 1.
It is not difficult to check, by using the property that convolution with an analytic function is again analytic, that Conjecture 2 is true if p is an even integer and q is any non-even real number.

Discussion and conclusion
In this work we have focused on characterizing properties of the GG distribution. We have shown that for p ∈ (0, 2] the GG random variable can be decomposed into a product of two independent random variables where the first random variable is a positive random variable and the second random variable is also a GG random variable. This decomposition was studied by providing several expressions for the pdf of the positive random variable.
A related open question is whether Proposition 5 can be extended to the regime of $p > 2$. That is, the question is, can $X_p$ be decomposed as follows: for some positive random variable $V$ independent of $X_q \sim \mathcal{N}_q(0, 1)$? Noting that $|X_p| \stackrel{d}{=} V \cdot |X_q|$ and using the Mellin transform method (recall that the Mellin transform works only for non-negative random variables) this question reduces to determining whether is a proper characteristic function. A partial answer to this question is given next.

Proposition 15
The function $\phi_{\log(V)}(t)$
• for $p > q$, is not a valid characteristic function. Therefore, the decomposition in (62) does not exist; and
• for $p < q$, is an integrable function. Moreover, if $\phi_{\log(V)}(t)$ is a valid characteristic function then the pdf of $V$ is given by
Proof See Appendix L.
To check if the decomposition in (62) exists for p < q one needs to verify whether the function in (63) is a valid pdf. Because of the complex nature of the integral it is not obvious whether the function in (63) is a valid pdf, and we leave this for future work.
We have also characterized several properties of the characteristic function of the GG distribution such as analyticity, the distribution of zeros, infinite divisibility and self-decomposability. Moreover, in the regime p ∈ (0, 2) by exploiting the product decomposition we were able to give an exact behavior of the tail of the characteristic function.
We expect that the properties derived in this paper will be useful for a large audience of researchers. For example, in (Dytso et al. 2017b, 2018) we have used the results in this paper to answer important information theoretic questions about optimal communication over channels with GG noise and optimal compression of GG sources. In view of the fact that GG distributions maximize entropy under L p moment constraints, we also expect that GG distributions will start to play an important role in finding bounds on the entropy of sums of random variables; see for example (Eskenazis et al. 2016) and (Dytso et al. 2017a) where GG distributions are used to derive such bounds.

Appendix A: Proof of Corollary 1
The goal is to show that for every fixed k > 0 the function g k (p) is decreasing in p. This result can be extracted from the next lemma which demonstrates a slightly more general result.

Lemma 1 Let
and let $\gamma$ denote Euler's constant, where $\gamma \approx 0.57721$. Then, for every fixed $k > 0$ and $\log(a) > \gamma$ the function $g_{k,a}(x)$ is increasing in $x > 0$.
Proof Instead of working with $g_{k,a}(x)$ it is simpler to work with the logarithm of $g_{k,a}(x)$ (recall that logarithms preserve monotonicity) Taking the derivative of $f_{k,a}(x)$ we have that where $\psi_0(x)$ is the digamma function. Next using the series representation of the digamma function (Abramowitz and Stegun 1964) given by we have that the derivative is given by Clearly the terms in the summation in (68) are positive under the assumptions of the lemma and, hence, $\frac{d}{dx} f_{k,a}(x) > 0$. This concludes the proof.
Observing that $g_k(p) = g_{k,2}\left(\frac{1}{p}\right)$ and $\log(2) \approx 0.693 > \gamma \approx 0.577$ concludes the proof that $g_k(p)$ is a decreasing function.
The second part follows by using Stirling's approximation $\Gamma(x + 1) \approx \sqrt{2\pi x} \left(\frac{x}{e}\right)^x$ and the property that $\Gamma(x + 1) = x\,\Gamma(x)$ as follows: The proof is concluded by taking the limit as $k \to \infty$ and using that $q > p$.

Appendix B: Proof of Proposition 3
The proof follows from the inequality: for p ≤ q. For completeness the inequality in (69) is shown in Appendix B.1. Without loss of generality assume that x > 0 and observe that where (71)

B.1 Proof of the inequality in (69)
Let The goal is to show that $f(p, x)$ is an increasing function of $p$. To that end, observe that by using a change of variable $u = (2t)^{1/p}$ the function $f(p, x)$ can be written as Therefore, showing monotonicity of $f(p, x)$ is equivalent to showing that for $p \le q$ The inequality in (75) can be conveniently re-written as and then the inequality in (76) follows by the monotonicity of the exponential function. This concludes the proof.

Appendix C: Proof of Theorem 1
To show that $e^{-|x|^p/2}$ is not a positive definite function for $p > 2$ it is enough to consider the following counterexample. In Definition 4 let $n = 3$ and choose $|x_1 - x_2| = \epsilon$, $|x_2 - x_3| = a\epsilon$ and $|x_1 - x_3| = (a + 1)\epsilon$ for some $\epsilon, a > 0$. Therefore, the determinant of the matrix $A$ is given by The idea of the proof is to show that for a small $\epsilon$ we have that $h(\epsilon) < 0$. To that end, we use the following small $t$ approximation $e^t = 1 + t + \frac{t^2}{2} + O(t^3)$ The proof is concluded by taking $\epsilon$ small enough and noting that $\frac{((a+1)^p + a^p + 1)^2}{2} - a^{2p} - (a+1)^{2p} - 1 \ge 0$ for $p \le 2$ and $\frac{((a+1)^p + a^p + 1)^2}{2} - a^{2p} - (a+1)^{2p} - 1 < 0$ for $p > 2$.
An easy way to see that $e^{-|x|^p/2}$ is a positive definite function is by observing that $e^{-|x|^p/2}$, for $p \in (0, 2]$, is a characteristic function of a stable distribution of order $p$. The proof then follows by Bochner's theorem (Ushakov 1999, Theorem 1.3.1) which guarantees that all characteristic functions are positive definite. For other proofs that $e^{-|x|^p/2}$ is positive definite for $p \in (0, 2]$ we refer the reader to (Lévy 1925) and (Bochner 1937).
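The three-point counterexample above is easy to check numerically. The sketch below evaluates the determinant of the 3×3 matrix with entries $e^{-|x_i - x_j|^p/2}$ for spacings ε, aε and (a + 1)ε, and compares it with the leading coefficient quoted in the proof. The relation $h(\epsilon) \approx \tfrac{1}{2}\,C\,\epsilon^{2p}$, with $C$ the quoted expression, comes from carrying out the second-order expansion by hand for this illustration and should be read as an assumption rather than as the paper's displayed formula. The function names are hypothetical.

```python
import math


def det3(eps: float, a: float, p: float) -> float:
    """Determinant of the 3x3 matrix with entries exp(-|x_i - x_j|^p / 2)."""
    k12 = math.exp(-0.5 * eps ** p)
    k23 = math.exp(-0.5 * (a * eps) ** p)
    k13 = math.exp(-0.5 * ((a + 1.0) * eps) ** p)
    # Symmetric matrix with unit diagonal:
    # det = 1 + 2*k12*k23*k13 - k12^2 - k23^2 - k13^2
    return 1.0 + 2.0 * k12 * k23 * k13 - k12 ** 2 - k23 ** 2 - k13 ** 2


def leading_coeff(a: float, p: float) -> float:
    """((a+1)^p + a^p + 1)^2 / 2 - a^(2p) - (a+1)^(2p) - 1."""
    s = (a + 1.0) ** p + a ** p + 1.0
    return s ** 2 / 2.0 - a ** (2.0 * p) - (a + 1.0) ** (2.0 * p) - 1.0


h_p3 = det3(0.1, 1.0, 3.0)    # p > 2: negative, so the kernel is not positive definite
h_p15 = det3(0.1, 1.0, 1.5)   # p < 2: positive
approx_p3 = 0.5 * leading_coeff(1.0, 3.0) * 0.1 ** 6  # hand-derived small-eps estimate
```

With a = 1 the coefficient vanishes exactly at p = 2 and equals −16 at p = 3, which matches the sign change described in the proof.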
To show that $e^{-|x|^p/2}$ can be represented in the integral form given in (24) we use the proof outlined in (Bochner 1937). According to Bernstein's theorem (Widder 1946, Theorem 12.a) every completely monotone function can be written as a Laplace transform of some non-negative finite Borel measure $\mu$. In Corollary 2 we have verified that $e^{-u^{p/2}/2}$ is a completely monotone function for $p \in (0, 2]$. Therefore, according to Bernstein's theorem, we can write $e^{-u^{p/2}/2}$ for $p \in (0, 2]$ as follows: for $u > 0$ Substituting $u = x^2$ into (78) completes the proof.

Appendix D: Proof of Proposition 5
To simplify the notation let $r = \frac{2q}{p}$. To show that $X_q \stackrel{d}{=} V_{p,q} \cdot X_r$, first observe that where the equalities follow from: a) using the representation of $e^{-|x|^p/2}$ in Corollary 3; and b) interchanging the order of integration which is justified by Tonelli's theorem for positive functions.
The above implies that dν(t) = = $P[V_{p,q} \cdot X_r \in S]$, where the equalities follow from: a) the representation of $e^{-|x|^p/2}$ in Theorem 1; b) the fact that $d\nu(t) = \frac{c_q}{c_r} \frac{1}{t^{1/r}} d\mu_p(t)$ is a probability measure; c) because $X_r$ is independent of $t$; and d) renaming $V_{p,q} = \frac{1}{T^{1/r}}$. Therefore, it follows from (79) that $X_q \stackrel{d}{=} V_{p,q} \cdot X_r$.
Next, we show that for $p < 2$ the random variable $V_{p,q}$ is unbounded. The non-negative random variable $V_{p,q}$ is unbounded if and only if $\lim_{k \to \infty} \left(E[V_{p,q}^k]\right)^{1/k} = \infty$.
To show that $V_{p,q}$ is unbounded observe that due to its non-negativity all the moments of $V_{p,q}$ are given by $E[V_{p,q}^k] = \frac{E[|X_q|^k]}{E[|X_r|^k]}$. Moreover, by the assumption that $p < 2$ we have that $r = \frac{2q}{p} > q$, and by using Corollary 1 we have that for $r > q$ Therefore, $V_{p,q}$ is an unbounded random variable for $p < 2$. For $p = 2$ we have that $r = q$ and, hence, $E[V_{p,q}^k] = \frac{E[|X_q|^k]}{E[|X_r|^k]} = 1$ for all $k > 0$. Therefore, $V_{p,q} = 1$ for $p = 2$.
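The moment ratio used in this argument can be evaluated directly. The following sketch assumes the absolute-moment formula $E[|X_p|^k] = 2^{k/p}\,\Gamma((k+1)/p)/\Gamma(1/p)$ for $X_p \sim \mathcal{N}_p(0, 1)$, obtained by integrating the pdf from the Introduction; it is assumed, not verified, to coincide with Proposition 1. The function names are hypothetical.

```python
import math


def log_abs_moment(k: float, p: float) -> float:
    """log E|X_p|^k for pdf proportional to exp(-|x|^p / 2) (assumed normalization)."""
    return (k / p) * math.log(2.0) + math.lgamma((k + 1.0) / p) - math.lgamma(1.0 / p)


def v_moment(k: float, p: float, q: float) -> float:
    """E[V_{p,q}^k] = E|X_q|^k / E|X_r|^k with r = 2q/p."""
    r = 2.0 * q / p
    return math.exp(log_abs_moment(k, q) - log_abs_moment(k, r))


def v_norm(k: float, p: float, q: float) -> float:
    """(E[V_{p,q}^k])^(1/k), the quantity whose growth in k signals unboundedness."""
    return v_moment(k, p, q) ** (1.0 / k)


# p = q = 1 gives r = 2: a Laplace variable written as a Gaussian scale mixture.
norms = [v_norm(k, 1.0, 1.0) for k in (2, 10, 50)]
# p = 2 gives r = q, so every moment of V equals one.
degenerate = v_moment(7.0, 2.0, 1.5)
```

For p = q = 1 the values in `norms` keep growing with k, in line with the claim that $V_{p,q}$ is unbounded when p < 2. A finite table of this kind is of course only suggestive and is not a substitute for the limit argument.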
To find the pdf of V p,q we use the Mellin transform approach by observing that Therefore, by using Proposition 1 the Mellin transform of V p,q is given by Finally, the pdf of V p,q is computed by the inverse Mellin transform of (80) This concludes the proof.

Appendix E: Proof of Proposition 6
To simplify the notation let $r = \frac{2q}{p}$. First, we show the power series representation of $f_{V_{p,q}}(v)$ given in (28). Using the integral representation of $f_{V_{p,q}}(v)$ in (26b) and the residue theorem we have that where the $s_k$ are given by the poles of where Therefore, by putting (81), (82), and (83) together we arrive at where the last step is due to the identity $\Gamma(-x)\Gamma(x) = -\frac{\pi}{x \sin(\pi x)}$ and the identity $\Gamma(x + 1) = x\,\Gamma(x)$. The proof of this part is concluded by noting that $a_0 = 0$.
To show the representation of $f_{V_{p,q}}(v)$ in (30) we use the definition of the gamma function $\Gamma(z) = \int_0^\infty x^{z-1} e^{-x} dx$ as follows: To validate the interchange of summation and integration in (84) observe that where the (in)-equalities follow from: a) using the inequality $|\sin(x)| \le 1$; b) using the power series $e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!}$; and c) using the fact that the integral converges since $\frac{q}{r} - 1 = \frac{p}{2} - 1 < 0$ and where we have used that $p = \frac{2q}{r}$ and $p < 2$ and, hence, 2 kq 1 r − 1 q v kq x kq r < x for large enough $x$.