Open Access

Bivariate beta-generated distributions with applications to well-being data

Journal of Statistical Distributions and Applications 2014, 1:15

https://doi.org/10.1186/2195-5832-1-15

Received: 28 February 2014

Accepted: 10 June 2014

Published: 30 June 2014

Abstract

The class of beta-generated distributions (Commun. Stat. Theory Methods 31:497–512, 2002; TEST 13:1–43, 2004) has received a lot of attention in recent years. In this paper, three new classes of bivariate beta-generated distributions are proposed. These classes are constructed using three different definitions of bivariate distributions with classical beta marginals and different covariance structures. We work with the bivariate beta distributions proposed in (J. Educ. Stat. 7:271–294, 1982; Metrika 54:215–231, 2001; Stat. Probability Lett. 62:407–412, 2003) for the first proposal, in (Stat. Methods Appl. 18:465–481, 2009) for the second proposal and in (J. Multivariate Anal. 102:1194–1202, 2011) for the third one. For each of these three classes, the main properties are studied. Some specific bivariate beta-generated distributions are also studied. Finally, some empirical applications with well-being data are presented.

Mathematics Subject Classification (2000)

62E15; 60E05

Keywords

Classical beta distribution; Bivariate beta distribution; Covariance structure; GB1 and GB2 distributions

1 Introduction

In the recent statistical literature several methodologies of constructing bivariate and multivariate distributions based on marginal and conditional distributions have been proposed; see the works by Arnold et al. (1999; 2001), Kotz et al. (2000), Sarabia and Gómez-Déniz (2008) and Balakrishnan and Lai (2009) among others.

An important field of research focuses on the study of new classes of univariate distributions which contain the classical proposals, also allowing for more flexibility in fitting data. In this sense, the class of beta-generated (BG) distributions (Eugene et al. 2002; Jones 2004) has received an increasing amount of attention in recent years.

There are several reasons for studying classes of multivariate beta generated distributions. The two existing proposals of multivariate BG distributions present some drawbacks. The first proposal (Jones and Larsen 2004) is only valid for modeling data above the diagonal. The second proposal (Arnold et al. 2006) is defined in terms of the conditional distributions, and the corresponding marginal distributions do not follow, in general, beta generated distributions.

The three bivariate and multivariate models proposed in this paper have BG marginals, with high flexibility both in the marginals and in the dependence structure. The marginal distributions of the first model share one of the shape parameters, and the dependence structure satisfies the TP2 condition (see Section 3). The marginals of the second model are free, and the different pairs of marginals are associated (see Section 3). The third model is the most flexible, in the sense that all the marginals are free (they do not share any shape parameter) and the covariance structure admits correlations of any sign.

On the other hand, these classes of distributions present several fields of applicability. For example, bivariate beta generated distributions with classical beta marginals are natural choices to be used as prior distributions for the parameters of correlated binomial random variables (with any sign for the correlation) in Bayesian analysis (see Apostolakis and Moieni 1987; Arnold and Ng 2011).

In this work, three new classes of bivariate BG distributions are proposed. These classes are constructed using three alternative definitions of bivariate distributions with classical beta marginals and different covariance structures. We work with the bivariate beta distributions proposed by Libby and Novick (1982), Jones (2001) and Olkin and Liu (2003) for the first proposal, El-Bassiouny and Jones (2009) for the second proposal and Arnold and Ng (2011) for the third one. For each of these three classes, the main properties are obtained. Some specific bivariate BG distributions are studied. Finally, some empirical applications with well-being data are presented.

The contents of this paper are as follows. In Section 2 we present some basic properties of the class of BG distributions and a brief review of two multivariate extensions of the BG distribution. Section 3 studies the three classes of bivariate BG distributions and their main properties, and introduces three specific bivariate distributions. A number of applications of these distributions to well-being data are presented in Section 4. Finally, some conclusions and future research directions are given in Section 5.

2 Some univariate and multivariate beta-generated distributions

2.1 The univariate class of beta-generated distributions

In this section we present basic properties of the class of BG distributions. We begin with an initial baseline probability density function (PDF) $f(x)$, whose cumulative distribution function (CDF) is denoted by $F(x)$. The class of BG distributions is defined in terms of the PDF by ($a, b > 0$),
\[ g_F(x;a,b) = B(a,b)^{-1} f(x) F(x)^{a-1} [1-F(x)]^{b-1}, \tag{1} \]

where $B(a,b) = \Gamma(a)\Gamma(b)/\Gamma(a+b)$ denotes the classical beta function. A random variable $X$ with PDF (1) will be denoted by $X \sim \mathcal{BG}(a,b;F)$.

The CDF associated with (1) is
\[ G_F(x;a,b) = I_{F(x)}(a,b), \]

where $I_{F(x)}(\cdot,\cdot)$ denotes the incomplete beta ratio.

If $B \sim \mathcal{B}e(a,b)$ represents the classical beta distribution, a simple stochastic representation of (1) is
\[ X = F^{-1}(B). \tag{2} \]
This representation permits a direct simulation of the values of a random variable with PDF (1), and it can also be used for generating multivariate versions of the BG distribution. The raw moments of a BG distribution can be obtained from
\[ E[X^r] = E\left[\{F^{-1}(B)\}^r\right], \quad r > 0. \]
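As a minimal sketch of how the PDF (1) and the stochastic representation (2) can be used in practice, the following Python fragment evaluates $g_F$ and simulates $X = F^{-1}(B)$. The unit exponential baseline and the function names (`bg_pdf`, `bg_sample`) are illustrative assumptions, not part of the original formulation.

```python
import math
import random


def bg_pdf(x, a, b, f, F):
    """PDF (1): g_F(x; a, b) = f(x) F(x)^(a-1) [1 - F(x)]^(b-1) / B(a, b)."""
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    u = F(x)
    return f(x) * u ** (a - 1) * (1.0 - u) ** (b - 1) / math.exp(log_beta)


def bg_sample(a, b, F_inv, n, seed=0):
    """Draw n values of X = F^{-1}(B) with B ~ Beta(a, b), as in (2)."""
    rng = random.Random(seed)
    return [F_inv(rng.betavariate(a, b)) for _ in range(n)]


# Illustrative baseline (an assumption): the unit exponential distribution.
def exp_pdf(x):
    return math.exp(-x)


def exp_cdf(x):
    return 1.0 - math.exp(-x)


def exp_inv_cdf(u):
    return -math.log(1.0 - u)


# With a = 1 and b = 3 the BG law is that of the minimum of three unit
# exponentials, so the sample mean should be close to 1/3.
draws = bg_sample(1.0, 3.0, exp_inv_cdf, n=20000)
sample_mean = sum(draws) / len(draws)
```

Any baseline with a computable inverse CDF could be substituted for the exponential one used here.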

A large number of new classes of distributions have been proposed using this methodology. Some representative examples of BG distributions include the generalized beta of the first kind (GB1) proposed by McDonald (1984), the generalized beta of the second kind (GB2) proposed and studied by Venter (1983) and McDonald (1984), the logF distribution (Barndorff-Nielsen et al. 1982), the beta-normal distribution (Eugene et al. 2002), the beta-exponential distribution (Nadarajah and Kotz 2006) and the skew-t distribution (Jones 2004).

Some extensions of this family have been proposed by Alexander and Sarabia (2010), Alexander et al. (2012), Cordeiro and de Castro (2011) and Zografos (2011). Other alternative flexible families of distributions can be found in Alzaatreh et al. (2013, 2014) and Lee et al. (2013).

If a = i and b = n - i + 1 in (1), we obtain the PDF of the i-th order statistic from F (Jones 2004). Below, we highlight some representative values of a and b,

  •  If a = b = 1, $g_F = f$.

  •  If a = n and b = 1, we obtain the distribution of the maximum.

  •  If a = 1 and b = n, we obtain the distribution of the minimum.

  •  If a ≠ b, we obtain a family of skew distributions.

Parameters a and b control the tailweight of the distribution. Specifically, a controls the left-hand tailweight and b controls the right-hand tailweight. On the other hand, a = b yields a symmetric sub-family, with a controlling tailweight. In particular, if a = b = 1 the BG distribution reduces to the baseline distribution, so it is symmetric whenever the baseline function F(x) is symmetric. In this sense, the BG distribution accommodates several kinds of tails. For example (see Jones 2004),

  •  Potential tails: if $f(x) \sim x^{-(\alpha+1)}$ with $\alpha > 0$ when $x \to \infty$, then $g_F(x) \sim x^{-b\alpha-1}$,

  •  Exponential tails: if $f(x) \sim e^{-\beta x}$ with $\beta > 0$, then $g_F(x) \sim e^{-b\beta x}$ when $x \to \infty$.

2.2 Two previous classes of multivariate extensions of beta-generated distributions

There are two proposals for multivariate extensions of BG distributions. The first proposal, due to Jones and Larsen (2004), uses the joint PDF of a subset of order statistics. The second proposal, due to Arnold et al. (2006), uses the so-called Rosenblatt construction (Rosenblatt 1952). These two alternatives are not related to the multivariate BG distributions studied in this paper.

3 Three classes of bivariate beta-generated distributions

In this section we introduce three new classes of bivariate BG distributions. These three classes are constructed by combining the basic stochastic representation (2) with three recent definitions of bivariate beta distributions proposed in the literature. The three definitions differ from each other in the marginal distributions and in the flexibility of the covariance structure. First, we need some notation. Let $X$ be a random variable following the classical gamma distribution, denoted by $X \sim \mathcal{G}_a$, with PDF $f(x) = [\Gamma(a)]^{-1} x^{a-1} e^{-x}$, where $x \ge 0$ and $a > 0$. Then, if $X_1 \sim \mathcal{G}_a$ and $X_2 \sim \mathcal{G}_b$ are independent gamma random variables, the transformed random variable $X = X_1/(X_1 + X_2)$ follows the classical beta distribution with parameters $(a,b)$.

3.1 The first class of bivariate beta-generated distributions

The first class of bivariate beta-generated distribution is based on the following class of bivariate beta distribution.

Definition 1. Let $G_{a_1}$, $G_{a_2}$ and $G_b$ be three independent gamma random variables with $a_1, a_2, b > 0$. The first class of bivariate beta distributions is defined by the stochastic representation
\[ (Z_1, Z_2) = \left( \frac{G_{a_1}}{G_{a_1}+G_b}, \ \frac{G_{a_2}}{G_{a_2}+G_b} \right). \tag{3} \]

This class was initially proposed by Libby and Novick (1982) and then studied by Jones (2001) and Olkin and Liu (2003).

Now, using (3) we define the following class of bivariate BG distributions.

Definition 2. Let $G_{a_1}$, $G_{a_2}$ and $G_b$ be three independent gamma random variables with $a_1, a_2, b > 0$. The first class of bivariate BG distributions is defined by the stochastic representation
\[ (X_1, X_2) = \left( F_1^{-1}\left( \frac{G_{a_1}}{G_{a_1}+G_b} \right), \ F_2^{-1}\left( \frac{G_{a_2}}{G_{a_2}+G_b} \right) \right), \tag{4} \]

where $F_1(\cdot)$ and $F_2(\cdot)$ are genuine CDFs.
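The stochastic representation (4) translates directly into a simulation routine. The sketch below is one possible way to do this with the Python standard library; the function names and the choice of identity baselines (classical beta marginals) are assumptions made only for illustration.

```python
import random


def sample_first_class(a1, a2, b, F1_inv, F2_inv, n, seed=0):
    """Draw n pairs (X1, X2) from the stochastic representation (4)."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        g1 = rng.gammavariate(a1, 1.0)
        g2 = rng.gammavariate(a2, 1.0)
        gb = rng.gammavariate(b, 1.0)  # gamma component shared by both coordinates
        pairs.append((F1_inv(g1 / (g1 + gb)), F2_inv(g2 / (g2 + gb))))
    return pairs


def pearson_correlation(pairs):
    """Sample linear correlation coefficient of a list of pairs."""
    n = len(pairs)
    m1 = sum(x for x, _ in pairs) / n
    m2 = sum(y for _, y in pairs) / n
    s12 = sum((x - m1) * (y - m2) for x, y in pairs)
    s11 = sum((x - m1) ** 2 for x, _ in pairs)
    s22 = sum((y - m2) ** 2 for _, y in pairs)
    return s12 / (s11 * s22) ** 0.5


def identity(u):
    """Inverse CDF of the uniform baseline F(x) = x on (0, 1)."""
    return u


pairs = sample_first_class(2.0, 3.0, 4.0, identity, identity, n=20000)
mean_x1 = sum(x for x, _ in pairs) / len(pairs)  # theoretical value a1 / (a1 + b)
rho = pearson_correlation(pairs)
```

Because both coordinates share the component $G_b$, the simulated correlation is expected to be positive.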

3.1.1 Basic properties

In this section we study some basic properties of the bivariate BG distribution defined in (4). The marginal distributions of (4) are BG distributions,
\[ X_1 \sim \mathcal{BG}(a_1, b; F_1), \quad X_2 \sim \mathcal{BG}(a_2, b; F_2). \]

Note that both marginals share the second shape parameter b. However, this fact does not make the model less flexible, since both baseline distributions F1 and F2 are different.

Using the joint PDF of the bivariate beta distribution (3) (see Appendix), we obtain the joint PDF of the first class of bivariate BG distributions,
\[ f(x_1,x_2) = \frac{F_1(x_1)^{a_1-1} F_2(x_2)^{a_2-1} [1-F_1(x_1)]^{a_2+b-1} [1-F_2(x_2)]^{a_1+b-1} f_1(x_1) f_2(x_2)}{B(a_1,a_2,b)\, [1-F_1(x_1)F_2(x_2)]^{a_1+a_2+b}}, \tag{5} \]
where $a_1, a_2, b > 0$ and $B(a_1,a_2,b) = \Gamma(a_1)\Gamma(a_2)\Gamma(b)/\Gamma(a_1+a_2+b)$. An alternative expression of (5) in terms of the PDF of the BG distribution is
\[ f(x_1,x_2) = \frac{\Gamma(a_1+b)\Gamma(a_2+b)}{\Gamma(b)\Gamma(a_1+a_2+b)} \cdot \frac{g_{F_1}(x_1;a_1,a_2+b)\, g_{F_2}(x_2;a_2,a_1+b)}{[1-F_1(x_1)F_2(x_2)]^{a_1+a_2+b}}, \]
where $g_F(x;a,b)$ represents the PDF defined in (1). The joint PDF can also be written as an infinite mixture,
\[ f(x_1,x_2) = \sum_{j=0}^{\infty} d\,A(j)\, g_{F_1}(x_1;a_1+j,a_2+b) \cdot g_{F_2}(x_2;a_2+j,a_1+b), \tag{6} \]
where $d = \dfrac{\Gamma(a_1+b)\Gamma(a_2+b)}{\Gamma(b)\Gamma(a_1+a_2+b)}$ and
\[ A(j) = \frac{\Gamma(a_1+j)}{\Gamma(a_1)} \cdot \frac{\Gamma(a_2+j)}{\Gamma(a_2)} \cdot \frac{\Gamma(a_1+a_2+b)}{\Gamma(a_1+a_2+b+j)} \cdot \frac{1}{j!}. \]
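The closed form (5) and the mixture form (6) can be cross-checked numerically. The following sketch does so under the assumption of uniform baselines $F_i(x) = x$ on (0,1); the truncation at 200 terms and all function names are illustrative choices.

```python
import math


def log_beta3(a1, a2, b):
    """log B(a1, a2, b) = log[Gamma(a1) Gamma(a2) Gamma(b) / Gamma(a1 + a2 + b)]."""
    return (math.lgamma(a1) + math.lgamma(a2) + math.lgamma(b)
            - math.lgamma(a1 + a2 + b))


def joint_pdf(x1, x2, a1, a2, b, f1, F1, f2, F2):
    """Joint PDF (5) of the first class of bivariate BG distributions."""
    u1, u2 = F1(x1), F2(x2)
    log_num = ((a1 - 1) * math.log(u1) + (a2 - 1) * math.log(u2)
               + (a2 + b - 1) * math.log1p(-u1) + (a1 + b - 1) * math.log1p(-u2))
    log_den = log_beta3(a1, a2, b) + (a1 + a2 + b) * math.log1p(-u1 * u2)
    return f1(x1) * f2(x2) * math.exp(log_num - log_den)


def bg_pdf(x, a, b, f, F):
    """Univariate BG PDF (1), evaluated on the log scale for stability."""
    u = F(x)
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return f(x) * math.exp((a - 1) * math.log(u) + (b - 1) * math.log1p(-u) - log_beta)


def mixture_pdf(x1, x2, a1, a2, b, f1, F1, f2, F2, terms=200):
    """Infinite mixture (6), truncated after `terms` components."""
    c = a1 + a2 + b
    log_d = math.lgamma(a1 + b) + math.lgamma(a2 + b) - math.lgamma(b) - math.lgamma(c)
    total = 0.0
    for j in range(terms):
        log_A = (math.lgamma(a1 + j) - math.lgamma(a1)
                 + math.lgamma(a2 + j) - math.lgamma(a2)
                 + math.lgamma(c) - math.lgamma(c + j) - math.lgamma(j + 1))
        total += (math.exp(log_d + log_A)
                  * bg_pdf(x1, a1 + j, a2 + b, f1, F1)
                  * bg_pdf(x2, a2 + j, a1 + b, f2, F2))
    return total


def unif_pdf(x):
    return 1.0


def unif_cdf(x):
    return x


args = (2.0, 3.0, 4.0, unif_pdf, unif_cdf, unif_pdf, unif_cdf)
closed_form = joint_pdf(0.3, 0.6, *args)
mixture = mixture_pdf(0.3, 0.6, *args)
```

The two values should agree up to the truncation error of the series, which decays geometrically in $F_1(x_1)F_2(x_2)$.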
The conditional distribution of $X_1$ given $X_2 = x_2$, obtained by dividing (5) by the marginal PDF $g_{F_2}(x_2;a_2,b)$, is
\[ f(x_1|x_2) = \frac{\Gamma(a_1+a_2+b)}{\Gamma(a_1)\Gamma(a_2+b)} \cdot \frac{F_1(x_1)^{a_1-1} [1-F_1(x_1)]^{a_2+b-1} [1-F_2(x_2)]^{a_1} f_1(x_1)}{[1-F_1(x_1)F_2(x_2)]^{a_1+a_2+b}}, \]
and the regression function of $X_1$ given $X_2$ is
\[ E[X_1|X_2=x_2] = \frac{\Gamma(a_1+a_2+b)}{\Gamma(a_1)\Gamma(a_2+b)} [1-F_2(x_2)]^{a_1} \int_0^1 \frac{F_1^{-1}(t)\, t^{a_1-1}(1-t)^{a_2+b-1}}{[1-tF_2(x_2)]^{a_1+a_2+b}}\, dt. \]
In order to study the dependence between $X_1$ and $X_2$, we consider the local dependence function defined by (Holland and Wang 1987),
\[ \gamma(x_1,x_2) = \frac{\partial^2}{\partial x_1 \partial x_2} \log f(x_1,x_2). \]

We use the definitions of total positivity of order 2 ($TP_2$) functions and reverse rule of order 2 ($RR_2$) functions, which are the following.

Definition 3. A joint PDF $f(x,y)$ is said to be $TP_2$ ($RR_2$) if
\[ f(x,y) f(u,v) - f(x,v) f(u,y) \ge 0 \ (\le 0) \]

for all $x \le u$ and $y \le v$.

The following result relates the local dependence function $\gamma(x_1,x_2)$ to the $TP_2$ and $RR_2$ properties (see Theorem 7.1 in Holland and Wang 1987).

Theorem 1. Let $f(x_1,x_2)$ be the joint PDF of $(X_1,X_2)$ with support on a set $S = S_1 \times S_2$. Then $f(x_1,x_2)$ is $TP_2$ ($RR_2$) if and only if $\gamma(x_1,x_2) \ge 0$ ($\le 0$).

For the first class of BG distributions, it can be verified that
\[ \gamma(x_1,x_2) = \frac{(a_1+a_2+b)\, f_1(x_1) f_2(x_2)}{[1-F_1(x_1)F_2(x_2)]^2} > 0, \]

and hence the joint PDF of $(X_1, X_2)$ is $TP_2$. As a consequence, the linear correlation coefficient between $X_1$ and $X_2$ is always positive.
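The sign of the local dependence function can be checked numerically by comparing the closed-form expression with a finite-difference approximation of the mixed derivative of $\log f$. The sketch below assumes uniform baselines $F_i(x) = x$ (so $f_i = 1$); the step size `h` and the function names are illustrative assumptions.

```python
import math


def log_joint_kernel(x1, x2, a1, a2, b):
    """log of (5) for uniform baselines, up to an additive constant.

    The normalizing constant is omitted because it vanishes under the
    mixed derivative that defines the local dependence function.
    """
    return ((a1 - 1) * math.log(x1) + (a2 - 1) * math.log(x2)
            + (a2 + b - 1) * math.log1p(-x1) + (a1 + b - 1) * math.log1p(-x2)
            - (a1 + a2 + b) * math.log1p(-x1 * x2))


def gamma_closed_form(x1, x2, a1, a2, b):
    """Closed-form local dependence for uniform baselines (f1 = f2 = 1)."""
    return (a1 + a2 + b) / (1.0 - x1 * x2) ** 2


def gamma_numeric(x1, x2, a1, a2, b, h=1e-3):
    """Central finite-difference approximation of d^2 log f / (dx1 dx2)."""
    g = lambda s, t: log_joint_kernel(s, t, a1, a2, b)
    return (g(x1 + h, x2 + h) - g(x1 + h, x2 - h)
            - g(x1 - h, x2 + h) + g(x1 - h, x2 - h)) / (4.0 * h * h)


exact = gamma_closed_form(0.4, 0.5, 2.0, 3.0, 4.0)   # 9 / 0.8^2
approx = gamma_numeric(0.4, 0.5, 2.0, 3.0, 4.0)
```

Only the term $-(a_1+a_2+b)\log[1 - F_1 F_2]$ of $\log f$ depends jointly on both arguments, which is why the remaining factors do not appear in the closed form.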

It can be proved (see Shaked 1977) that if the joint PDF $f(x_1,x_2)$ is $TP_2$ ($RR_2$), then the conditional hazard rate of $X_1|X_2 = x_2$ is decreasing (increasing) in $x_2$. A similar property holds for the other conditional distribution $X_2|X_1 = x_1$. Since the PDF in (5) is $TP_2$, the conditional hazard rate of $X_1|X_2 = x_2$ is decreasing as a function of $x_2$, and that of $X_2|X_1 = x_1$ is decreasing as a function of $x_1$.

On the other hand, because X1 and X2 are increasing functions of independent random variables, X1 and X2 are associated random variables (Esary et al. 1967).

Expressions for the cross moments $E[X_1^{r_1} X_2^{r_2}]$ can be obtained from (5) or in terms of an infinite mixture from (6). On the other hand, if $b > r$, it can be shown that
\[ E\left[ \frac{F_1(X_1)^r F_2(X_2)^r}{[1-F_1(X_1)F_2(X_2)]^r} \right] = \frac{\Gamma(a_1+r)\Gamma(a_2+r)\Gamma(b-r)\Gamma(a_1+a_2+b)}{\Gamma(a_1)\Gamma(a_2)\Gamma(b)\Gamma(a_1+a_2+b+r)}. \]
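The moment identity above does not depend on the baselines, since $F_i(X_i)$ are the beta components of (3). A hedged Monte Carlo check, using illustrative parameter values and hypothetical function names, could look as follows.

```python
import math
import random


def ratio_moment_closed_form(r, a1, a2, b):
    """Right-hand side of the moment identity; requires b > r."""
    if b <= r:
        raise ValueError("the moment exists only for b > r")
    log_val = (math.lgamma(a1 + r) + math.lgamma(a2 + r) + math.lgamma(b - r)
               + math.lgamma(a1 + a2 + b)
               - math.lgamma(a1) - math.lgamma(a2) - math.lgamma(b)
               - math.lgamma(a1 + a2 + b + r))
    return math.exp(log_val)


def ratio_moment_monte_carlo(r, a1, a2, b, n=40000, seed=0):
    """Monte Carlo estimate of E[(Z1 Z2 / (1 - Z1 Z2))^r] under (3)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        g1 = rng.gammavariate(a1, 1.0)
        g2 = rng.gammavariate(a2, 1.0)
        gb = rng.gammavariate(b, 1.0)
        z1, z2 = g1 / (g1 + gb), g2 / (g2 + gb)
        total += (z1 * z2 / (1.0 - z1 * z2)) ** r
    return total / n


# Illustrative values: for r = 1 the closed form reduces to
# a1 * a2 / ((b - 1) * (a1 + a2 + b)) = 6 / 55.
exact = ratio_moment_closed_form(1, 2.0, 3.0, 6.0)
estimate = ratio_moment_monte_carlo(1, 2.0, 3.0, 6.0)
```

The value b = 6 is chosen so that the simulated quantity has finite variance, which keeps the Monte Carlo error small.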

3.1.2 Extension to higher dimensions

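The extension to higher dimensions is direct. The m-dimensional random vector is defined as
\[ (X_1,\dots,X_m) = \left\{ F_i^{-1}\left( \frac{G_{a_i}}{G_{a_i}+G_b} \right), \ i = 1,2,\dots,m \right\}. \tag{7} \]
The marginal distributions are $X_i \sim \mathcal{BG}(a_i, b; F_i)$, $i = 1,2,\dots,m$. The joint PDF of (7) is given by
\[ f(x_1,\dots,x_m) = c \prod_{i=1}^{m} \frac{F_i(x_i)^{a_i-1} f_i(x_i)}{[1-F_i(x_i)]^{a_i+1}} \left[ 1 + \sum_{i=1}^{m} \frac{F_i(x_i)}{1-F_i(x_i)} \right]^{-(a_1+\dots+a_m+b)}, \]

where $c^{-1} = B(a_1,\dots,a_m,b) = \Gamma(a_1)\cdots\Gamma(a_m)\Gamma(b)/\Gamma(a_1+\dots+a_m+b)$. For $m = 2$ this expression reduces to (5).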

3.2 The second type of bivariate beta-generated distributions

The second type of bivariate BG distribution is motivated by the aim of obtaining a bivariate distribution with arbitrary BG marginals. This second class is based on the following class of bivariate beta distributions, which was proposed by El-Bassiouny and Jones (2009).

Definition 4. Let $G_{a_i}$, $i = 1,2,3,4$, be independent gamma random variables, where $a_i > 0$, $i = 1,2,3,4$. The second class of bivariate beta distributions is defined by the stochastic representation
\[ (Z_1, Z_2) = \left( \frac{G_{a_1}}{G_{a_1}+G_{a_3}}, \ \frac{G_{a_2}}{G_{a_2}+G_{a_3}+G_{a_4}} \right). \tag{8} \]

Now, we define the second class of BG distributions.

Definition 5. Let $G_{a_i}$, $i = 1,2,3,4$, be independent gamma random variables, where $a_i > 0$, $i = 1,2,3,4$. The second class of bivariate BG distributions is defined by the stochastic representation
\[ (X_1, X_2) = \left( F_1^{-1}\left( \frac{G_{a_1}}{G_{a_1}+G_{a_3}} \right), \ F_2^{-1}\left( \frac{G_{a_2}}{G_{a_2}+G_{a_3}+G_{a_4}} \right) \right), \tag{9} \]

where $F_1(\cdot)$ and $F_2(\cdot)$ are genuine CDFs.

3.2.1 Basic properties

The marginal distributions are BG distributions with arbitrary parameters,
\[ X_1 \sim \mathcal{BG}(a_1, a_3; F_1), \quad X_2 \sim \mathcal{BG}(a_2, a_3+a_4; F_2). \]
Using the joint PDF of (8) (see Appendix), the joint PDF of this second class of bivariate BG distributions is given by
\[ f(x_1,x_2) = k\, \frac{g_{F_1}(x_1;a_1,A-a_1)\, g_{F_2}(x_2;a_2,A-a_2)}{[1-F_1(x_1)F_2(x_2)]^{A}}\, a_{12}(x_1,x_2), \]
where $k^{-1} = B(a_1,a_3)B(a_2,a_1+a_3+a_4)$, $A = \sum_{i=1}^{4} a_i$, $a_{12}(x_1,x_2) = H[F_1(x_1),F_2(x_2)]$ and
\[ H(z_1,z_2) = {}_2F_1\left[ A, a_4; A-a_2; \frac{z_1(1-z_2)}{1-z_1z_2} \right], \]

where ${}_2F_1[\cdot,\cdot;\cdot;\cdot]$ denotes the Gauss hypergeometric function.

The conditional density function of $X_1|X_2 = x_2$ is
\[ f(x_1|x_2) = k'\, \frac{f_1(x_1) F_1(x_1)^{a_1-1} [1-F_1(x_1)]^{A-a_1-1} [1-F_2(x_2)]^{a_1}}{[1-F_1(x_1)F_2(x_2)]^{A}}\, a_{12}(x_1,x_2), \]
where $k' = \dfrac{k\, B(a_2,a_3+a_4)}{B(a_1,A-a_1)B(a_2,A-a_2)}$, and the conditional density function of $X_2|X_1 = x_1$ is
\[ f(x_2|x_1) = k''\, \frac{f_2(x_2) F_2(x_2)^{a_2-1} [1-F_2(x_2)]^{A-a_2-1} [1-F_1(x_1)]^{A-a_1-a_3}}{[1-F_1(x_1)F_2(x_2)]^{A}}\, a_{12}(x_1,x_2), \]

where $k'' = \dfrac{k\, B(a_1,a_3)}{B(a_1,A-a_1)B(a_2,A-a_2)}$.

The regression function of $X_1$ given $X_2$ is
\[ E[X_1|X_2=x_2] = a_1(x_2) \int_{-\infty}^{\infty} \frac{x_1 f_1(x_1) F_1(x_1)^{a_1-1} [1-F_1(x_1)]^{A-a_1-1}}{[1-F_1(x_1)F_2(x_2)]^{A}}\, a_{12}(x_1,x_2)\, dx_1, \]
where $a_1(x_2) = k' [1-F_2(x_2)]^{a_1}$, and the regression function of $X_2$ given $X_1$ is
\[ E[X_2|X_1=x_1] = a_2(x_1) \int_{-\infty}^{\infty} \frac{x_2 f_2(x_2) F_2(x_2)^{a_2-1} [1-F_2(x_2)]^{A-a_2-1}}{[1-F_1(x_1)F_2(x_2)]^{A}}\, a_{12}(x_1,x_2)\, dx_2, \]

with $a_2(x_1) = k'' [1-F_1(x_1)]^{A-a_1-a_3}$.

The cross-product moments can be obtained as
\[ E[X_1^{r_1} X_2^{r_2}] = E_{Z_1,Z_2}\left[ \{F_1^{-1}(Z_1)\}^{r_1} \{F_2^{-1}(Z_2)\}^{r_2} \right], \tag{10} \]
where $(Z_1,Z_2)$ is the bivariate random variable with joint PDF given by equation (23). Note that (10) can be computed easily by simulation from samples of the random variable $(Z_1,Z_2)$. The local dependence function is given by
\[ \gamma(x_1,x_2) = \frac{A f_1(x_1) f_2(x_2)}{[1-F_1(x_1)F_2(x_2)]^2} + \frac{\partial^2 \log a_{12}(x_1,x_2)}{\partial x_1 \partial x_2}. \tag{11} \]

The second term in (11) is long and will not be included here.

The random variables X1 and X2 are associated and then the linear correlation coefficient is always non-negative (see Definition I.11 and Proposition I.13 in Marshall and Olkin (2007)).
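Formula (10) lends itself to a simple Monte Carlo approximation based on the stochastic representations (8) and (9). The sketch below is one way to organize such a computation; the parameter values, the identity baselines and the function names are assumptions made for illustration only.

```python
import random


def sample_second_class_beta(a1, a2, a3, a4, n, seed=0):
    """Draw n pairs (Z1, Z2) from the bivariate beta representation (8)."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        g1, g2, g3, g4 = (rng.gammavariate(a, 1.0) for a in (a1, a2, a3, a4))
        out.append((g1 / (g1 + g3), g2 / (g2 + g3 + g4)))
    return out


def cross_moment(r1, r2, F1_inv, F2_inv, params, n=40000, seed=0):
    """Monte Carlo version of (10): average of F1^{-1}(Z1)^r1 * F2^{-1}(Z2)^r2."""
    draws = sample_second_class_beta(*params, n=n, seed=seed)
    return sum(F1_inv(z1) ** r1 * F2_inv(z2) ** r2 for z1, z2 in draws) / n


def identity(u):
    """Inverse CDF of the uniform baseline (classical beta marginals)."""
    return u


params = (2.0, 3.0, 1.5, 2.5)                           # (a1, a2, a3, a4), illustrative
m10 = cross_moment(1, 0, identity, identity, params)    # approx. a1 / (a1 + a3)
m01 = cross_moment(0, 1, identity, identity, params)    # approx. a2 / (a2 + a3 + a4)
m11 = cross_moment(1, 1, identity, identity, params)
covariance = m11 - m10 * m01                            # non-negative by association
```

Other baselines only require replacing `identity` with the corresponding inverse CDF, for example `lambda u: u ** (1.0 / c)` for a power baseline $F(x) = x^c$ with a hypothetical exponent `c`.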

3.2.2 Multivariate extensions

A multivariate extension of (9) is also possible. We define
\[ (X_1,\dots,X_m) = \left\{ F_i^{-1}\left( \frac{G_{a_i}}{G_{a_i} + \sum_{j=1}^{i} G_{b_j}} \right), \ i = 1,2,\dots,m \right\}, \]
where the marginal distributions are BG distributions with parameters
\[ X_i \sim \mathcal{BG}(a_i, b_1+\dots+b_i; F_i), \quad i = 1,2,\dots,m. \]

3.3 The third type of bivariate beta-generated distributions

The next class of bivariate beta distributions is the most general, in the sense that the marginal distributions have arbitrary parameters and the linear correlation coefficient can take any sign. The following definition was proposed by Arnold and Ng (2011).

Definition 6. The third class of bivariate beta distributions is defined by the stochastic representation
\[ (Z_1, Z_2) = \left( \frac{G_{a_1}+G_{a_3}}{G_{a_1}+G_{a_3}+G_{a_4}+G_{a_5}}, \ \frac{G_{a_2}+G_{a_4}}{G_{a_2}+G_{a_3}+G_{a_4}+G_{a_5}} \right), \]

where $G_{a_i}$, $i = 1,2,3,4,5$, are independent gamma random variables with $a_i > 0$, $i = 1,2,3,4,5$.

Now, we define the third class of BG distributions.

Definition 7. Let $G_{a_i}$, $i = 1,2,3,4,5$, be independent gamma random variables with $a_i > 0$, $i = 1,2,3,4,5$. The third class of bivariate BG distributions is defined by the stochastic representation
\[ (X_1, X_2) = \left( F_1^{-1}\left( \frac{G_{a_1}+G_{a_3}}{G_{a_1}+G_{a_3}+G_{a_4}+G_{a_5}} \right), \ F_2^{-1}\left( \frac{G_{a_2}+G_{a_4}}{G_{a_2}+G_{a_3}+G_{a_4}+G_{a_5}} \right) \right), \tag{12} \]

where $F_1(\cdot)$ and $F_2(\cdot)$ are genuine CDFs.
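The following sketch simulates the stochastic representation (12) and illustrates how the sign of the correlation may change with the parameters. The two parameter sets, the identity baselines and the function names are illustrative assumptions; the intuition is that $G_{a_3}$ and $G_{a_4}$ push the two coordinates in opposite directions, while $G_{a_5}$ acts on both denominators.

```python
import random


def sample_third_class(a, F1_inv, F2_inv, n, seed=0):
    """Draw n pairs (X1, X2) from (12); a = (a1, a2, a3, a4, a5)."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        g1, g2, g3, g4, g5 = (rng.gammavariate(ai, 1.0) for ai in a)
        z1 = (g1 + g3) / (g1 + g3 + g4 + g5)
        z2 = (g2 + g4) / (g2 + g3 + g4 + g5)
        pairs.append((F1_inv(z1), F2_inv(z2)))
    return pairs


def pearson_correlation(pairs):
    """Sample linear correlation coefficient of a list of pairs."""
    n = len(pairs)
    m1 = sum(x for x, _ in pairs) / n
    m2 = sum(y for _, y in pairs) / n
    s12 = sum((x - m1) * (y - m2) for x, y in pairs)
    s11 = sum((x - m1) ** 2 for x, _ in pairs)
    s22 = sum((y - m2) ** 2 for _, y in pairs)
    return s12 / (s11 * s22) ** 0.5


def identity(u):
    """Inverse CDF of the uniform baseline (classical beta marginals)."""
    return u


# Large a3, a4 and small a1, a2, a5: the coordinates move in opposite directions.
neg_pairs = sample_third_class((0.2, 0.2, 3.0, 3.0, 0.2), identity, identity, n=20000)
# Large a1, a2, a5 and small a3, a4: the shared denominator term dominates.
pos_pairs = sample_third_class((3.0, 3.0, 0.2, 0.2, 3.0), identity, identity, n=20000)

rho_neg = pearson_correlation(neg_pairs)
rho_pos = pearson_correlation(pos_pairs)
mean_x1 = sum(x for x, _ in pos_pairs) / len(pos_pairs)  # (a1 + a3) / (a1 + a3 + a4 + a5)
```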

3.3.1 Basic properties

The marginal distributions of (12) are $X_1 \sim \mathcal{BG}(a_1+a_3, a_4+a_5; F_1)$ and $X_2 \sim \mathcal{BG}(a_2+a_4, a_3+a_5; F_2)$.

The joint PDF of (12) is given by
\[ f_{X_1,X_2}(x_1,x_2) = \frac{f_1(x_1) f_2(x_2)}{[1-F_1(x_1)]^2 [1-F_2(x_2)]^2}\, f_{V,W}\left( \frac{F_1(x_1)}{1-F_1(x_1)}, \frac{F_2(x_2)}{1-F_2(x_2)} \right), \]

where $f_{V,W}(\cdot,\cdot)$ is defined in equation (24) in the Appendix.

The conditional density function of $X_1|X_2 = x_2$ is
\[ f(x_1|x_2) = k'\, \frac{f_1(x_1)\, f_{V,W}\left( \frac{F_1(x_1)}{1-F_1(x_1)}, \frac{F_2(x_2)}{1-F_2(x_2)} \right)}{[1-F_1(x_1)]^2 [1-F_2(x_2)]^{a_3+a_5+1} F_2(x_2)^{a_2+a_4-1}}, \]
where $k' = B(a_2+a_4, a_3+a_5)$, and the conditional density function of $X_2|X_1 = x_1$ is
\[ f(x_2|x_1) = k''\, \frac{f_2(x_2)\, f_{V,W}\left( \frac{F_1(x_1)}{1-F_1(x_1)}, \frac{F_2(x_2)}{1-F_2(x_2)} \right)}{[1-F_1(x_1)]^{a_4+a_5+1} [1-F_2(x_2)]^2 F_1(x_1)^{a_1+a_3-1}}, \]

where $k'' = B(a_1+a_3, a_4+a_5)$.

The regression function of $X_1$ given $X_2$ is
\[ E[X_1|X_2=x_2] = b_1(x_2) \int_{-\infty}^{\infty} \frac{x_1 f_1(x_1)\, f_{V,W}\left( \frac{F_1(x_1)}{1-F_1(x_1)}, \frac{F_2(x_2)}{1-F_2(x_2)} \right)}{[1-F_1(x_1)]^2}\, dx_1, \]
where $b_1(x_2) = \dfrac{k'}{[1-F_2(x_2)]^{a_3+a_5+1} F_2(x_2)^{a_2+a_4-1}}$, and the regression function of $X_2$ given $X_1$ is
\[ E[X_2|X_1=x_1] = b_2(x_1) \int_{-\infty}^{\infty} \frac{x_2 f_2(x_2)\, f_{V,W}\left( \frac{F_1(x_1)}{1-F_1(x_1)}, \frac{F_2(x_2)}{1-F_2(x_2)} \right)}{[1-F_2(x_2)]^2}\, dx_2, \]

with $b_2(x_1) = \dfrac{k''}{[1-F_1(x_1)]^{a_4+a_5+1} F_1(x_1)^{a_1+a_3-1}}$.

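The local dependence function can be written as
\[ \gamma(x_1,x_2) = h(x_1,x_2) \cdot \frac{ \dfrac{\partial^2 f_{V,W}(u_1,u_2)}{\partial v\, \partial w}\, f_{V,W}(u_1,u_2) - \dfrac{\partial f_{V,W}(u_1,u_2)}{\partial v} \dfrac{\partial f_{V,W}(u_1,u_2)}{\partial w} }{ f_{V,W}^2(u_1,u_2) }, \]

where $h(x_1,x_2) = \dfrac{f_1(x_1) f_2(x_2)}{[1-F_1(x_1)]^2 [1-F_2(x_2)]^2}$ and $u_i = \dfrac{F_i(x_i)}{1-F_i(x_i)}$, $i = 1,2$. The factor $h$ arises from the chain rule, since $du_i/dx_i = f_i(x_i)/[1-F_i(x_i)]^2$.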

In a similar way, the cross-product moments can be obtained using formula (10), where now (Z1,Z2) is the bivariate random variable with joint PDF given by (24).

The covariance structure of (12) is flexible and the sign of the linear correlation coefficient can be positive or negative.

3.3.2 Multivariate extensions

The general multivariate version of the third class of BG distributions is based on the multivariate beta distribution proposed by Arnold and Ng (2011). Using this definition, the extension of (12) to dimensions higher than two is
\[ (X_1,\dots,X_m) = \left\{ F_i^{-1}\left( \frac{G_{a_i}+G_{b_i}}{G_{a_i} + \sum_{l=1}^{m} G_{b_l} + G_c} \right), \ i = 1,2,\dots,m \right\}, \]

where $G_{a_i}$, $G_{b_i}$, $i = 1,2,\dots,m$, and $G_c$ are independent gamma random variables.

3.4 Estimation

Here we derive the maximum likelihood (ML) estimator of the parameters of the BG family of the first type defined in (4). Let $(x_{1i},x_{2i})$, $i = 1,2,\dots,n$, be a random sample of size $n$ from (5), where we assume that the baseline functions are $F_1(x_1;\tau_1)$ and $F_2(x_2;\tau_2)$, and $\tau_i$ is a $p_i \times 1$ vector of unknown parameters of the parent distributions, $i = 1,2$. The log-likelihood function for $\theta = (\tau_1,\tau_2,a_1,a_2,b)$ may be written as
\[
\begin{aligned}
\ell(\theta) = {} & -n \log B(a_1,a_2,b) + a_1 \sum_{i=1}^{n} \log \frac{F_1(x_{1i};\tau_1)\, \bar F_2(x_{2i};\tau_2)}{1-F_1(x_{1i};\tau_1)F_2(x_{2i};\tau_2)} \\
& + a_2 \sum_{i=1}^{n} \log \frac{F_2(x_{2i};\tau_2)\, \bar F_1(x_{1i};\tau_1)}{1-F_1(x_{1i};\tau_1)F_2(x_{2i};\tau_2)} + b \sum_{i=1}^{n} \log \frac{\bar F_1(x_{1i};\tau_1)\, \bar F_2(x_{2i};\tau_2)}{1-F_1(x_{1i};\tau_1)F_2(x_{2i};\tau_2)} \\
& + \sum_{i=1}^{n} \log \frac{f_1(x_{1i};\tau_1)\, f_2(x_{2i};\tau_2)}{F_1(x_{1i};\tau_1)\, F_2(x_{2i};\tau_2)\, \bar F_1(x_{1i};\tau_1)\, \bar F_2(x_{2i};\tau_2)},
\end{aligned}
\tag{13}
\]

where we have used the notation $\bar F_i(\cdot;\tau_i) = 1 - F_i(\cdot;\tau_i)$, $i = 1,2$.

This expression may be maximized either directly, using numerical algorithms for nonlinear optimization such as the Mathematica function FindMaximum (see Wolfram Research, Inc. 2010), the SAS procedure NLMIXED (SAS Institute, Inc. 2010), the R functions nlm or optim (R Development Core Team 2011) or the MATLAB function fmincon (The Mathworks, Inc. 2011), among others, or by solving the nonlinear equations obtained by differentiating expression (13).
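As a hedged illustration, the log-likelihood of the first bivariate BG class can be coded as an objective function that any general-purpose optimizer could then maximize. The sketch below groups the terms by $a_1$, $a_2$ and $b$ and checks the result against the sum of the logarithms of (5); the power baselines $F_i(x) = x^{c_i}$, the exponents `c1` and `c2` and the three data points are hypothetical.

```python
import math


def log_beta3(a1, a2, b):
    """log B(a1, a2, b)."""
    return (math.lgamma(a1) + math.lgamma(a2) + math.lgamma(b)
            - math.lgamma(a1 + a2 + b))


def log_likelihood(data, a1, a2, b, f1, F1, f2, F2):
    """Grouped log-likelihood of the first bivariate BG class for pairs (x1, x2)."""
    total = -len(data) * log_beta3(a1, a2, b)
    for x1, x2 in data:
        u1, u2 = F1(x1), F2(x2)
        log_cross = math.log1p(-u1 * u2)                     # log[1 - F1 F2]
        log_s1, log_s2 = math.log1p(-u1), math.log1p(-u2)    # log survival terms
        total += a1 * (math.log(u1) + log_s2 - log_cross)
        total += a2 * (math.log(u2) + log_s1 - log_cross)
        total += b * (log_s1 + log_s2 - log_cross)
        total += math.log(f1(x1)) + math.log(f2(x2))
        total -= math.log(u1) + math.log(u2) + log_s1 + log_s2
    return total


def log_pdf5(x1, x2, a1, a2, b, f1, F1, f2, F2):
    """log of the joint PDF (5), used here only as a cross-check."""
    u1, u2 = F1(x1), F2(x2)
    return ((a1 - 1) * math.log(u1) + (a2 - 1) * math.log(u2)
            + (a2 + b - 1) * math.log1p(-u1) + (a1 + b - 1) * math.log1p(-u2)
            + math.log(f1(x1)) + math.log(f2(x2))
            - log_beta3(a1, a2, b) - (a1 + a2 + b) * math.log1p(-u1 * u2))


# Hypothetical power baselines F_i(x) = x^{c_i} on (0, 1).
c1, c2 = 1.5, 0.8
F1 = lambda x: x ** c1
f1 = lambda x: c1 * x ** (c1 - 1)
F2 = lambda x: x ** c2
f2 = lambda x: c2 * x ** (c2 - 1)

data = [(0.35, 0.62), (0.71, 0.80), (0.12, 0.44)]   # made-up observations
grouped = log_likelihood(data, 2.0, 3.0, 4.0, f1, F1, f2, F2)
direct = sum(log_pdf5(x1, x2, 2.0, 3.0, 4.0, f1, F1, f2, F2) for x1, x2 in data)
```

In an actual fit, `c1` and `c2` would play the role of $\tau_1$ and $\tau_2$ and would be optimized jointly with $(a_1, a_2, b)$.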

Initial estimates of the parameters $a_1$, $a_2$ and $b$ can be inferred from estimates of $\tau_1$ and $\tau_2$, since if $(X_1,X_2)$ is distributed as (4), then $(F_1(X_1),F_2(X_2))$ is distributed as (3). If we define the random variables $Y_1 = F_1(X_1)$ and $Y_2 = F_2(X_2)$, we obtain the following expressions,
\[ E[Y_1] = \frac{a_1}{a_1+b}, \tag{14} \]
\[ E[Y_2] = \frac{a_2}{a_2+b}, \tag{15} \]
\[ E\left[ \frac{1-Y_1Y_2}{(1-Y_1)(1-Y_2)} \right] = \frac{a_1+a_2+b-1}{b-1}, \quad b > 1. \tag{16} \]
If $m_1$, $m_2$ and $m_{12}$ are the sample versions of the previous moments, solving Equations (14) to (16) for $a_1$, $a_2$ and $b$ we have:
\[ \hat a_1 = \frac{m_1(1-m_2)(1-m_{12})}{1 - m_1m_2 + m_{12}(m_1+m_2-m_1m_2-1)}, \tag{17} \]
\[ \hat a_2 = \frac{m_2(1-m_1)(1-m_{12})}{1 - m_1m_2 + m_{12}(m_1+m_2-m_1m_2-1)}, \tag{18} \]
\[ \hat b = \frac{(1-m_1)(1-m_2)(1-m_{12})}{1 - m_1m_2 + m_{12}(m_1+m_2-m_1m_2-1)}. \tag{19} \]
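The moment-based starting values (17) to (19) are straightforward to compute. The sketch below is a minimal implementation; the function names are hypothetical, and the worked example simply feeds in the exact population moments for $(a_1, a_2, b) = (2, 3, 4)$, which by (14) to (16) are $m_1 = 1/3$, $m_2 = 3/7$ and $m_{12} = 8/3$.

```python
def sample_moments(y_pairs):
    """Sample versions m1, m2, m12 of the moments in (14)-(16).

    y_pairs holds the transformed observations (y1, y2) = (F1(x1), F2(x2)).
    """
    n = len(y_pairs)
    m1 = sum(y1 for y1, _ in y_pairs) / n
    m2 = sum(y2 for _, y2 in y_pairs) / n
    m12 = sum((1.0 - y1 * y2) / ((1.0 - y1) * (1.0 - y2)) for y1, y2 in y_pairs) / n
    return m1, m2, m12


def moment_start_values(m1, m2, m12):
    """Initial estimates (a1, a2, b) from Equations (17)-(19)."""
    denom = 1.0 - m1 * m2 + m12 * (m1 + m2 - m1 * m2 - 1.0)
    common = (1.0 - m12) / denom
    a1 = m1 * (1.0 - m2) * common
    a2 = m2 * (1.0 - m1) * common
    b = (1.0 - m1) * (1.0 - m2) * common
    return a1, a2, b


# Exact population moments for (a1, a2, b) = (2, 3, 4) recover the parameters.
a1_hat, a2_hat, b_hat = moment_start_values(1.0 / 3.0, 3.0 / 7.0, 8.0 / 3.0)
```

With real data, `sample_moments` would be applied to the transformed observations after plugging in preliminary estimates of $\tau_1$ and $\tau_2$.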
The components of the score vector $U(\theta)$ are given by
\[
\begin{aligned}
U_{a_1}(\theta) &= n\left[ \psi(a_1+a_2+b) - \psi(a_1) \right] + \sum_{i=1}^{n} \log \frac{F_1(x_{1i};\tau_1)\, \bar F_2(x_{2i};\tau_2)}{1-F_1(x_{1i};\tau_1)F_2(x_{2i};\tau_2)}, \\
U_{a_2}(\theta) &= n\left[ \psi(a_1+a_2+b) - \psi(a_2) \right] + \sum_{i=1}^{n} \log \frac{F_2(x_{2i};\tau_2)\, \bar F_1(x_{1i};\tau_1)}{1-F_1(x_{1i};\tau_1)F_2(x_{2i};\tau_2)}, \\
U_{b}(\theta) &= n\left[ \psi(a_1+a_2+b) - \psi(b) \right] + \sum_{i=1}^{n} \log \frac{\bar F_1(x_{1i};\tau_1)\, \bar F_2(x_{2i};\tau_2)}{1-F_1(x_{1i};\tau_1)F_2(x_{2i};\tau_2)}, \\
U_{\tau_1}(\theta) &= (a_1-1) \sum_{i=1}^{n} \frac{\dot F_1(x_{1i})_{\tau_1}}{F_1(x_{1i};\tau_1)} - (a_2+b-1) \sum_{i=1}^{n} \frac{\dot F_1(x_{1i})_{\tau_1}}{\bar F_1(x_{1i};\tau_1)} \\
&\quad + (a_1+a_2+b) \sum_{i=1}^{n} \frac{\dot F_1(x_{1i})_{\tau_1} F_2(x_{2i};\tau_2)}{1-F_1(x_{1i};\tau_1)F_2(x_{2i};\tau_2)} + \sum_{i=1}^{n} \frac{\dot f_1(x_{1i})_{\tau_1}}{f_1(x_{1i};\tau_1)}, \\
U_{\tau_2}(\theta) &= (a_2-1) \sum_{i=1}^{n} \frac{\dot F_2(x_{2i})_{\tau_2}}{F_2(x_{2i};\tau_2)} - (a_1+b-1) \sum_{i=1}^{n} \frac{\dot F_2(x_{2i})_{\tau_2}}{\bar F_2(x_{2i};\tau_2)} \\
&\quad + (a_1+a_2+b) \sum_{i=1}^{n} \frac{\dot F_2(x_{2i})_{\tau_2} F_1(x_{1i};\tau_1)}{1-F_1(x_{1i};\tau_1)F_2(x_{2i};\tau_2)} + \sum_{i=1}^{n} \frac{\dot f_2(x_{2i})_{\tau_2}}{f_2(x_{2i};\tau_2)},
\end{aligned}
\]

where $\dot f_j(x_{ji})_{\tau_j} = \partial f_j(x_{ji};\tau_j)/\partial \tau_j$ and $\dot F_j(x_{ji})_{\tau_j} = \partial F_j(x_{ji};\tau_j)/\partial \tau_j$ are $p_j \times 1$ vectors, with $j = 1,2$, and $\psi(s) = d \log \Gamma(s)/ds$ is the digamma function.

To obtain interval estimates and to test hypotheses on the model parameters, we need the observed information matrix. The $(p_1+p_2+3) \times (p_1+p_2+3)$ observed information matrix $J = J(\theta)$ can be obtained by differentiating the score vector $U(\theta)$. Under conditions that are fulfilled for parameters in the interior of the parameter space (but not on the boundary), the distribution of $\sqrt{n}(\hat\theta - \theta)$ is asymptotically normal $N_{p_1+p_2+3}(0, I(\theta)^{-1})$, where $I(\theta)$ denotes the expected information matrix. As usual, we can replace $I(\theta)$ by $J(\hat\theta)$, that is, the observed information matrix evaluated at $\hat\theta$; the distribution $N_{p_1+p_2+3}(0, J(\hat\theta)^{-1})$ can then be used to construct approximate confidence intervals for the parameters.

The estimation of the other two models (9) and (12) requires a detailed study, which is beyond the scope of this paper and will be the object of future research.

To finish this section, it should be mentioned that all the models proposed in this paper, (4), (9) and (12), and their multivariate extensions can be enriched by including location and scale parameters.

3.5 Some specific bivariate distributions

In this section we propose three specific bivariate BG models.

3.5.1 Bivariate Beta-Normal distributions

This model is a direct bivariate extension of the beta-normal distribution considered by Eugene et al. (2002). If $F_i(x_i) = \Phi(z_i)$, where $z_i = (x_i - \mu_i)/\sigma_i$, with $\mu_i \in \mathbb{R}$ and $\sigma_i > 0$, $i = 1,2$, and $\Phi(z)$ is the CDF of a standard normal distribution with PDF $\phi(z)$, we obtain the bivariate joint PDF
\[ f(x_1,x_2) = \frac{\Phi(z_1)^{a_1-1} \Phi(z_2)^{a_2-1} [1-\Phi(z_1)]^{a_2+b-1} [1-\Phi(z_2)]^{a_1+b-1} \phi(z_1) \phi(z_2)}{\sigma_1 \sigma_2\, B(a_1,a_2,b)\, [1-\Phi(z_1)\Phi(z_2)]^{a_1+a_2+b}}, \]

where $a_1, a_2, b > 0$.

3.5.2 Bivariate GB1 income distributions

If we take $F(x) = x^a$ in (1), with the shape parameters of the BG distribution denoted by $p$ and $q$, we obtain the generalized beta distribution of the first kind (GB1) (see McDonald 1984), which will be denoted by $X \sim \mathcal{GB}1(p,q,a)$. Then, if $F_i(x_i) = x_i^{a_i}$, with $i = 1,2$, and using Equation (5) we obtain
\[ f(x_1,x_2) = \frac{a_1 a_2\, x_1^{a_1p_1-1} (1-x_1^{a_1})^{p_2+q-1}\, x_2^{a_2p_2-1} (1-x_2^{a_2})^{p_1+q-1}}{B(p_1,p_2,q)\, (1-x_1^{a_1}x_2^{a_2})^{p_1+p_2+q}}, \tag{20} \]

with $0 \le x_1, x_2 \le 1$. The marginal distributions are $X_1 \sim \mathcal{GB}1(p_1,q,a_1)$ and $X_2 \sim \mathcal{GB}1(p_2,q,a_2)$. If we set $a_1 = a_2 = 1$ in (20), we obtain the bivariate beta distribution proposed by Olkin and Liu (2003).

3.5.3 Bivariate GB2 income distributions

Now, if we take $F(x) = 1 - 1/(1 + x^a)$ in (1), we obtain the generalized beta distribution of the second kind (GB2) (see McDonald 1984), which will be denoted by $X \sim \mathcal{GB}2(p,q,a)$. Then, if $F_i(x_i) = 1 - 1/(1 + x_i^{a_i})$, with $i = 1,2$, and using formula (5) we have
\[ f(x_1,x_2) = \frac{a_1 a_2}{B(p_1,p_2,q)} \cdot \frac{x_1^{a_1p_1-1} x_2^{a_2p_2-1}}{(1 + x_1^{a_1} + x_2^{a_2})^{p_1+p_2+q}}, \quad x_1, x_2 \ge 0, \]

where the marginal distributions are $X_1 \sim \mathcal{GB}2(p_1,q,a_1)$ and $X_2 \sim \mathcal{GB}2(p_2,q,a_2)$.
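The simplification leading to the bivariate GB2 density can be verified numerically by evaluating the general joint PDF (5) with the GB2 baselines and comparing it with the closed form. The sketch below does this at a single point; the parameter values and function names are illustrative assumptions.

```python
import math


def log_beta3(p1, p2, q):
    """log B(p1, p2, q)."""
    return (math.lgamma(p1) + math.lgamma(p2) + math.lgamma(q)
            - math.lgamma(p1 + p2 + q))


def gb2_joint_pdf(x1, x2, p1, p2, q, a1, a2):
    """Closed-form bivariate GB2 density for x1, x2 > 0."""
    log_c = math.log(a1) + math.log(a2) - log_beta3(p1, p2, q)
    log_kernel = ((a1 * p1 - 1) * math.log(x1) + (a2 * p2 - 1) * math.log(x2)
                  - (p1 + p2 + q) * math.log(1.0 + x1 ** a1 + x2 ** a2))
    return math.exp(log_c + log_kernel)


def pdf5_with_gb2_baselines(x1, x2, p1, p2, q, a1, a2):
    """General joint PDF (5) with F_i(x) = 1 - 1 / (1 + x^{a_i})."""
    F1 = x1 ** a1 / (1.0 + x1 ** a1)
    F2 = x2 ** a2 / (1.0 + x2 ** a2)
    f1 = a1 * x1 ** (a1 - 1) / (1.0 + x1 ** a1) ** 2   # derivative of F1
    f2 = a2 * x2 ** (a2 - 1) / (1.0 + x2 ** a2) ** 2   # derivative of F2
    num = (F1 ** (p1 - 1) * F2 ** (p2 - 1)
           * (1.0 - F1) ** (p2 + q - 1) * (1.0 - F2) ** (p1 + q - 1) * f1 * f2)
    den = math.exp(log_beta3(p1, p2, q)) * (1.0 - F1 * F2) ** (p1 + p2 + q)
    return num / den


point = (0.8, 1.7)
pars = (1.5, 2.0, 3.0, 2.0, 1.3)   # (p1, p2, q, a1, a2), illustrative values
closed = gb2_joint_pdf(*point, *pars)
general = pdf5_with_gb2_baselines(*point, *pars)
```

The agreement follows from the identity $1 - F_1F_2 = (1 + x_1^{a_1} + x_2^{a_2})/[(1+x_1^{a_1})(1+x_2^{a_2})]$, which makes all powers of $(1 + x_i^{a_i})$ cancel.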

4 Applications

To illustrate the methodology developed in this paper, we have fitted the bivariate BG model of the first type defined in (4) to estimate the international distribution of well-being for the period 1980-2010. The estimation method is based on the formulation developed in Section 3.4. It is worth noting that we have focused on three dimensions of well-being, namely income, health and education. Since these components are positively correlated, the first type of BG distribution given by (4) is especially suitable in this case.

4.1 The data

We have used the most recent available data from the International Human Development Indicators (UNDP 2012) on the HDI and its three components for the period 1980-2010 at five-year intervals.

Note that we consider well-being as a multidimensional process which, in addition to economic variables, also involves social aspects such as health and education. In this context, the Human Development Index provides an excellent theoretical benchmark for making multidimensional assessments of well-being. We have therefore focused on three dimensions of quality of life: income, educational standards and health. In particular, we focus on the single-dimensional indices of the HDI, which are three normalized variables on a 0-1 scale. This structure of the data is especially convenient in this case, since we consider beta and GB1 marginals for the BG models.

Income is represented by Gross National Income per capita measured in PPP 2005 US dollars, to make incomes comparable across countries and over time. The health component is represented by life expectancy at birth, which is considered an indicator of the health level. The education index is made up of two indicators, expected years of schooling and mean years of schooling, which are aggregated using the geometric mean. The first educational variable reports the number of years of schooling that a child of school entrance age can expect to receive if prevailing patterns of age-specific enrollment rates persist throughout the child’s life (UNDP 2012). The second indicator reports the average number of years of education received by people aged 25 and older, converted from education attainment levels using official durations of each level (Barro and Lee, 2013).

Originally, our sample comprised only 105 countries, covering less than 75 percent of the global population. Data were not available for 26 countries for one or more years before 1995. In order to offer comparable results across periods without restricting the sample considerably, missing values have been estimated. The estimation of these missing values has been based on two complementary methodologies which jointly offered feasible and consistent results according to the sample: piecewise cubic Hermite interpolating polynomial (PCHI) and the average rate of change, which was used when PCHI produced infeasible or out-of-range estimates. The interpolated values have been obtained using the command pchip of the R package Signal, which uses the methodology described by Fritsch and Carlson (1980). After this procedure, our data set includes 132 countries whose indicators of income, health and education are available for eight points of time (see Appendix for details). Consequently, the sample covers over 90 percent of the world population during the whole period.

4.2 Fitted models and results

The bivariate data consist of three pairs of variables (income,education), (income,health) and (education, health).

We have fitted the class of models given by Equation (4) with three specifications for the baseline CDFs:

  •  $F_i(x_i) = x_i$, with $0 \le x_i \le 1$, $i = 1,2$ (classical beta marginals),

  •  $F_i(x_i) = x_i^{a_i}$, with $0 \le x_i \le 1$, $a_i > 0$, $i = 1,2$ (GB1 marginals) and

  •  $F_i(x_i) = \dfrac{1 - \exp(-a_i x_i)}{1 - \exp(-a_i)}$, with $0 \le x_i \le 1$, $a_i > 0$, $i = 1,2$ (BG truncated exponential marginals).

The first model (with classical beta marginals) depends on 3 parameters, and the second and third models (with GB1 and BG truncated exponential marginals) are characterized by 5 parameters. The three models have been estimated by maximum likelihood using the equations given in Section 3.4. In total, we have fitted 7 × 3 × 3 = 63 different models. The initial estimates of the parameters have been obtained using Equations (17) to (19). In the case of the model with classical beta marginals, initial estimates are quite close to the ML estimators because they are based on sufficient statistics.

For the three pairs of variables, we have compared the three models using the Akaike information criterion (AIC), defined by (Akaike 1974)
$$\mathrm{AIC} = -2 \log L + 2d, \qquad (21)$$

where $\log L = \ell(\hat{\theta})$ is the log-likelihood of the model evaluated at the maximum likelihood estimates and $d$ is the number of parameters. We chose the model with the smallest value of the AIC statistic.
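A small Python sketch of the selection rule follows. The AIC values are those reported for Education & Health in 1980 in Tables 10 and 11; the function and variable names are ours, and the log-likelihoods are simply recovered by inverting Equation (21).

```python
def aic(loglik, n_params):
    """Akaike information criterion, Equation (21)."""
    return -2.0 * loglik + 2.0 * n_params


# Reported AIC values, Education & Health, 1980 (Tables 10 and 11).
reported = {"beta": -292.88, "GB1": -352.59, "truncated exponential": -333.49}
n_params = {"beta": 3, "GB1": 5, "truncated exponential": 5}

# Invert Equation (21) to recover the maximised log-likelihoods.
logliks = {m: (2 * n_params[m] - reported[m]) / 2 for m in reported}

# The preferred model is the one with the smallest AIC.
best = min(reported, key=reported.get)
print(best, logliks)
```

Because the penalty term is 2d, a five-parameter model is preferred to the three-parameter one only if it raises the log-likelihood by more than 2.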

Tables 1, 2, 3, 4, 5 and 6 show the parameter estimates and their standard errors for the three alternative models considered: BG distribution with classical beta marginals (3 parameter model p1,p2,q) and with GB1 and BG truncated exponential marginals (5 parameter model p1,p2,q,a1,a2), which have been fitted to pairs of the variables: Education, Health and Income. In particular, Tables 1 and 4 show the results obtained for Education & Health, Tables 2 and 5 show the corresponding results for Education & Income, and Tables 3 and 6 for Income & Health. The estimations have been performed by maximum likelihood, focusing on quinquennial periods from 1980 to 2010. It can be seen that, assuming the asymptotic normality of the maximum likelihood estimates, most of the estimates are statistically significant at a 0.05 level of significance.
Table 1

Parameter estimates for the BG models (with beta and GB1 marginals) fitted to the variables education & health by maximum likelihood (standard errors in parentheses)

Variables: education & health

| Year | $\hat{p}_1$ (beta) | $\hat{p}_2$ (beta) | $\hat{q}$ (beta) | $\hat{p}_1$ (GB1) | $\hat{p}_2$ (GB1) | $\hat{q}$ (GB1) | $\hat{a}_1$ (GB1) | $\hat{a}_2$ (GB1) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1980 | 2.6512 | 7.0544 | 3.4327 | 22.5595 | 10.5446 | 2.6651 | 0.1177 | 0.5381 |
| | (0.2358) | (0.6334) | (0.3063) | (9.1385) | (2.4534) | (0.2890) | (0.0481) | (0.1299) |
| 1985 | 2.9384 | 7.5581 | 3.2621 | 11.1697 | 14.2985 | 2.6168 | 0.2527 | 0.4416 |
| | (0.2614) | (0.6780) | (0.2906) | (3.6937) | (6.2408) | (0.2810) | (0.0818) | (0.1918) |
| 1990 | 2.9333 | 7.1301 | 2.8442 | 26.6132 | 6.6869 | 2.3141 | 0.1070 | 0.8566 |
| | (0.2615) | (0.6410) | (0.2534) | (20.0869) | (1.7468) | (0.2473) | (0.0800) | (0.2243) |
| 1995 | 2.8508 | 6.1734 | 2.3377 | 17.3255 | 4.4878 | 2.0254 | 0.1606 | 1.1454 |
| | (0.2551) | (0.5572) | (0.2085) | (7.0388) | (0.9738) | (0.2125) | (0.0649) | (0.2588) |
| 2000 | 2.7879 | 5.5716 | 1.9464 | 4.6151 | 5.7575 | 1.7644 | 0.5647 | 0.8677 |
| | (0.2505) | (0.5051) | (0.1736) | (1.7265) | (2.8469) | (0.1811) | (0.2029) | (0.4191) |
| 2005 | 2.6539 | 5.0138 | 1.5705 | 13.1632 | 2.3387 | 1.5422 | 0.2096 | 1.9972 |
| | (0.2398) | (0.4577) | (0.1401) | (7.3975) | (0.5126) | (0.1578) | (0.1173) | (0.4455) |
| 2010 | 2.6591 | 5.2296 | 1.4064 | 3.4113 | 3.7361 | 1.4084 | 0.7889 | 1.3821 |
| | (0.2411) | (0.4794) | (0.1254) | (1.1784) | (1.4461) | (0.1418) | (0.2710) | (0.5291) |

Table 2

Parameter estimates for the BG models (with beta and GB1 marginals) fitted to the variables education & income by maximum likelihood (standard errors in parentheses)

Variables: education & income

| Year | $\hat{p}_1$ (beta) | $\hat{p}_2$ (beta) | $\hat{q}$ (beta) | $\hat{p}_1$ (GB1) | $\hat{p}_2$ (GB1) | $\hat{q}$ (GB1) | $\hat{a}_1$ (GB1) | $\hat{a}_2$ (GB1) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1980 | 2.2431 | 3.3973 | 2.8809 | 2.1813 | 16.6110 | 2.6533 | 0.9603 | 0.2195 |
| | (0.2008) | (0.3063) | (0.2591) | (0.5790) | (13.5230) | (0.2743) | (0.2243) | (0.1715) |
| 1985 | 2.8775 | 3.7222 | 3.1934 | 4.3394 | 10.2173 | 2.7715 | 0.6321 | 0.3586 |
| | (0.2572) | (0.3338) | (0.2858) | (1.4677) | (5.6252) | (0.2881) | (0.1955) | (0.1857) |
| 1990 | 3.1691 | 3.6855 | 3.0715 | 6.4501 | 7.6217 | 2.6157 | 0.4702 | 0.4541 |
| | (0.2834) | (0.3302) | (0.2746) | (2.7530) | (3.5058) | (0.2725) | (0.1880) | (0.1960) |
| 1995 | 3.3196 | 3.3074 | 2.7070 | 6.3618 | 6.2069 | 2.3322 | 0.4907 | 0.4965 |
| | (0.2978) | (0.2967) | (0.2421) | (2.4599) | (2.2291) | (0.2418) | (0.1818) | (0.1698) |
| 2000 | 3.4652 | 3.0623 | 2.3826 | 5.8748 | 5.8042 | 2.0680 | 0.5420 | 0.4885 |
| | (0.3119) | (0.2751) | (0.2131) | (2.2960) | (2.0882) | (0.2139) | (0.2052) | (0.1691) |
| 2005 | 3.6366 | 2.8670 | 2.0777 | 7.2438 | 3.4311 | 1.8787 | 0.4773 | 0.7694 |
| | (0.3285) | (0.2580) | (0.1858) | (3.2031) | (0.8872) | (0.1918) | (0.2070) | (0.1937) |
| 2010 | 3.7596 | 2.7789 | 1.8944 | 3.4581 | 4.0863 | 1.8134 | 1.0309 | 0.6702 |
| | (0.3406) | (0.2505) | (0.1693) | (1.0479) | (1.1725) | (0.1835) | (0.3034) | (0.1870) |

Table 3

Parameter estimates for the BG models (with beta and GB1 marginals) fitted to the variables income & health by maximum likelihood (standard errors in parentheses)

Variables: income & health

| Year | $\hat{p}_1$ (beta) | $\hat{p}_2$ (beta) | $\hat{q}$ (beta) | $\hat{p}_1$ (GB1) | $\hat{p}_2$ (GB1) | $\hat{q}$ (GB1) | $\hat{a}_1$ (GB1) | $\hat{a}_2$ (GB1) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1980 | 3.4799 | 5.9823 | 2.9491 | 7.2415 | 4.5699 | 2.7798 | 0.4964 | 1.1929 |
| | (0.3103) | (0.5360) | (0.2624) | (2.1497) | (1.0817) | (0.2855) | (0.1417) | (0.2674) |
| 1985 | 3.8895 | 7.7393 | 3.3336 | 19.9607 | 7.1397 | 2.7940 | 0.1918 | 0.9025 |
| | (0.3458) | (0.6913) | (0.2959) | (11.3280) | (1.8739) | (0.2993) | (0.1072) | (0.2316) |
| 1990 | 3.7983 | 8.0223 | 3.1630 | 19.2882 | 10.1747 | 2.5143 | 0.18067 | 0.63504 |
| | (0.3379) | (0.7173) | (0.2808) | (10.8390) | (3.4854) | (0.2721) | (0.1003) | (0.2148) |
| 1995 | 3.4082 | 7.5345 | 2.7867 | 12.4184 | 11.5787 | 2.2338 | 0.2457 | 0.5300 |
| | (0.3039) | (0.6760) | (0.2478) | (4.6027) | (4.5656) | (0.2401) | (0.0901) | (0.2104) |
| 2000 | 3.0781 | 7.0973 | 2.3944 | 11.9092 | 7.5709 | 1.9876 | 0.2346 | 0.7667 |
| | (0.2752) | (0.6396) | (0.2132) | (4.7419) | (2.2840) | (0.2109) | (0.0935) | (0.2342) |
| 2005 | 2.7891 | 6.8378 | 2.0250 | 19.9351 | 4.0164 | 1.7675 | 0.1323 | 1.4058 |
| | (0.2501) | (0.6197) | (0.1805) | (19.3729) | (1.2603) | (0.1863) | (0.1265) | (0.4381) |
| 2010 | 2.4296 | 6.5389 | 1.6754 | 4.4081 | 4.2348 | 1.5941 | 0.5388 | 1.4186 |
| | (0.2188) | (0.5974) | (0.1496) | (1.2981) | (1.4324) | (0.1627) | (0.1582) | (0.4772) |

Table 4

Parameter estimates for the BG model (BG truncated exponential marginals) fitted to the variables education & health by maximum likelihood (standard errors in parentheses)

Variables: education & health

Truncated exponential model

| Year | $\hat{p}_1$ | $\hat{p}_2$ | $\hat{q}$ | $\hat{a}_1$ | $\hat{a}_2$ |
| --- | --- | --- | --- | --- | --- |
| 1980 | 3.4104 | 19.2541 | 1.3217 | 3.1388 | 4.1114 |
| | (0.3671) | (4.0935) | (0.2372) | (0.4543) | (0.6456) |
| 1985 | 3.9159 | 17.3862 | 1.4386 | 2.7821 | 3.4638 |
| | (0.4287) | (3.7250) | (0.2519) | (0.4325) | (0.6248) |
| 1990 | 4.1159 | 10.5428 | 1.6196 | 2.2094 | 2.1086 |
| | (0.4795) | (2.0906) | (0.2652) | (0.4185) | (0.5804) |
| 1995 | 4.0476 | 6.7782 | 1.6363 | 1.7177 | 1.0854 |
| | (0.5304) | (1.3450) | (0.2499) | (0.4271) | (0.5825) |
| 2000 | 3.8541 | 5.3537 | 1.5412 | 1.3568 | 0.5281 |
| | (0.5772) | (1.1472) | (0.2193) | (0.4437) | (0.6087) |
| 2005 | 4.3824 | 3.8719 | 1.2821 | 1.6870 | 0.0000 |
| | (0.7229) | (0.4862) | (0.1420) | (0.4704) | (0.0427) |
| 2010 | 3.9681 | 4.1826 | 1.1884 | 1.3861 | 0.0001 |
| | (0.7878) | (1.0311) | (0.1607) | (0.5223) | (0.6912) |

Table 5

Parameter estimates for the BG model (BG truncated exponential marginals) fitted to the variables education & income by maximum likelihood (standard errors in parentheses)

Variables: education & income

Truncated exponential model

| Year | $\hat{p}_1$ | $\hat{p}_2$ | $\hat{q}$ | $\hat{a}_1$ | $\hat{a}_2$ |
| --- | --- | --- | --- | --- | --- |
| 1980 | 2.3477 | 6.0323 | 1.5187 | 1.7859 | 2.8785 |
| | (0.2801) | (0.9499) | (0.2891) | (0.5446) | (0.6098) |
| 1985 | 3.1220 | 6.4260 | 1.6681 | 1.8322 | 2.7940 |
| | (0.3847) | (0.9671) | (0.3094) | (0.5189) | (0.5639) |
| 1990 | 3.5887 | 6.2829 | 1.5834 | 1.9604 | 2.8090 |
| | (0.4533) | (0.9215) | (0.2838) | (0.5057) | (0.5350) |
| 1995 | 3.6986 | 5.9955 | 1.3892 | 1.9633 | 3.0090 |
| | (0.4814) | (0.8708) | (0.2429) | (0.5081) | (0.5264) |
| 2000 | 3.7836 | 5.7931 | 1.2656 | 1.8579 | 3.0539 |
| | (0.5115) | (0.8442) | (0.2131) | (0.5086) | (0.5110) |
| 2005 | 3.9060 | 5.3306 | 1.1572 | 1.7335 | 2.9354 |
| | (0.5793) | (0.8152) | (0.1925) | (0.5332) | (0.5233) |
| 2010 | 3.3507 | 5.8944 | 1.1977 | 0.9970 | 2.8693 |
| | (0.5127) | (0.9557) | (0.1937) | (0.5579) | (0.5168) |

Table 6

Parameter estimates for the BG model (BG truncated exponential marginals) fitted to the variables income & health by maximum likelihood (standard errors in parentheses)

Variables: income & health

Truncated exponential model

| Year | $\hat{p}_1$ | $\hat{p}_2$ | $\hat{q}$ | $\hat{a}_1$ | $\hat{a}_2$ |
| --- | --- | --- | --- | --- | --- |
| 1980 | 5.2910 | 5.6187 | 2.1441 | 1.7075 | 0.6332 |
| | (0.7765) | (1.0281) | (0.3450) | (0.4689) | (0.5700) |
| 1985 | 6.0810 | 8.9083 | 2.0766 | 2.1172 | 1.3880 |
| | (0.7943) | (1.6258) | (0.3272) | (0.4063) | (0.5318) |
| 1990 | 6.6056 | 9.9881 | 1.7975 | 2.5828 | 1.7600 |
| | (0.8013) | (1.7123) | (0.2787) | (0.3801) | (0.5043) |
| 1995 | 6.6121 | 8.9254 | 1.5662 | 2.8884 | 1.7273 |
| | (0.8033) | (1.4908) | (0.2377) | (0.3774) | (0.4970) |
| 2000 | 6.2010 | 7.4495 | 1.4527 | 2.8150 | 1.3495 |
| | (0.7880) | (1.2753) | (0.2108) | (0.3768) | (0.4997) |
| 2005 | 6.5460 | 5.5399 | 1.3561 | 2.9102 | 0.6186 |
| | (0.9084) | (0.9529) | (0.1881) | (0.3852) | (0.5150) |
| 2010 | 6.6966 | 4.1236 | 1.1760 | 3.1741 | 0.0000 |
| | (1.0480) | (0.7957) | (0.1720) | (0.4275) | (0.6583) |

In order to illustrate the interval estimation of the parameters in Section 3.4, we have included the asymptotic confidence intervals at 95 percent for the models with beta and GB1 marginals (Tables 7, 8 and 9).
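The asymptotic intervals can be reproduced from the estimates and standard errors in Tables 1, 2 and 3. The sketch below assumes the usual Wald form, estimate ± z × standard error, with z = 1.96; that value is our inference from the reported limits, not something stated in the text, and the function name is illustrative.

```python
def wald_interval(estimate, std_error, z=1.96):
    """Symmetric asymptotic interval: estimate -/+ z * standard error."""
    half_width = z * std_error
    return estimate - half_width, estimate + half_width


# Education & health, 1980, beta marginals: p1 = 2.6512, s.e. = 0.2358.
lower, upper = wald_interval(2.6512, 0.2358)
print(round(lower, 4), round(upper, 4))

# Education & health, 1990, GB1 marginals: p1 = 26.6132, s.e. = 20.0869.
# A large standard error pushes the lower limit below zero.
lower_gb1, upper_gb1 = wald_interval(26.6132, 20.0869)
print(round(lower_gb1, 4), round(upper_gb1, 4))
```

Since the interval is symmetric around the estimate, it can include negative values even though the parameters are positive, which explains the negative lower limits in the tables.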
Table 7

Confidence intervals (95%) for the BG models (with beta and GB1 marginals) fitted to the variables education & health by maximum likelihood

Variables: education & health

| Year | Limit | $\hat{p}_1$ (beta) | $\hat{p}_2$ (beta) | $\hat{q}$ (beta) | $\hat{p}_1$ (GB1) | $\hat{p}_2$ (GB1) | $\hat{q}$ (GB1) | $\hat{a}_1$ (GB1) | $\hat{a}_2$ (GB1) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1980 | Lower | 2.1890 | 5.8129 | 2.8324 | 4.6480 | 5.7359 | 2.0987 | 0.0234 | 0.2835 |
| | Upper | 3.1134 | 8.2959 | 4.0330 | 40.4710 | 15.3533 | 3.2315 | 0.2120 | 0.7927 |
| 1985 | Lower | 2.4261 | 6.2292 | 2.6925 | 3.9300 | 2.0665 | 2.0660 | 0.0924 | 0.0657 |
| | Upper | 3.4507 | 8.8870 | 3.8317 | 18.4094 | 26.5305 | 3.1676 | 0.4130 | 0.8175 |
| 1990 | Lower | 2.4208 | 5.8737 | 2.3475 | -12.7571 | 3.2632 | 1.8294 | -0.0498 | 0.4170 |
| | Upper | 3.4458 | 8.3865 | 3.3409 | 65.9835 | 10.1106 | 2.7988 | 0.2638 | 1.2962 |
| 1995 | Lower | 2.3508 | 5.0813 | 1.9290 | 3.5295 | 2.5792 | 1.6089 | 0.0334 | 0.6382 |
| | Upper | 3.3508 | 7.2655 | 2.7464 | 31.1215 | 6.3964 | 2.4419 | 0.2878 | 1.6526 |
| 2000 | Lower | 0.6984 | 2.8142 | 0.3379 | 7.9680 | 16.3910 | 0.3195 | 0.1146 | 0.3637 |
| | Upper | 3.2789 | 6.5616 | 2.2867 | 7.9990 | 11.3374 | 2.1194 | 0.9624 | 1.6891 |
| 2005 | Lower | 2.1839 | 4.1167 | 1.2959 | -1.3359 | 1.3340 | 1.2329 | -0.0203 | 1.1240 |
| | Upper | 3.1239 | 5.9109 | 1.8451 | 27.6623 | 3.3434 | 1.8515 | 0.4395 | 2.8704 |
| 2010 | Lower | 2.1865 | 4.2900 | 1.1606 | 1.1016 | 0.9017 | 1.1305 | 0.2577 | 0.3451 |
| | Upper | 3.1317 | 6.1692 | 1.6522 | 5.7210 | 6.5705 | 1.6863 | 1.3201 | 2.4191 |

Table 8

Confidence intervals (95%) for the BG models (with beta and GB1 marginals) fitted to the variables education & income by maximum likelihood

Variables: education & income

| Year | Limit | $\hat{p}_1$ (beta) | $\hat{p}_2$ (beta) | $\hat{q}$ (beta) | $\hat{p}_1$ (GB1) | $\hat{p}_2$ (GB1) | $\hat{q}$ (GB1) | $\hat{a}_1$ (GB1) | $\hat{a}_2$ (GB1) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1980 | Lower | 1.8495 | 2.7970 | 2.3731 | 1.0465 | -9.8941 | 2.1157 | 0.5207 | -0.1166 |
| | Upper | 2.6367 | 3.9976 | 3.3887 | 3.3161 | 43.1161 | 3.1909 | 1.3999 | 0.5556 |
| 1985 | Lower | 2.3734 | 3.0680 | 2.6332 | 1.4627 | -0.8081 | 2.2068 | 0.2489 | -0.0054 |
| | Upper | 3.3816 | 4.3764 | 3.7536 | 7.2161 | 21.2427 | 3.3362 | 1.0153 | 0.7226 |
| 1990 | Lower | 2.6136 | 3.0383 | 2.5333 | 1.0542 | 0.7503 | 2.0816 | 0.1017 | 0.0699 |
| | Upper | 3.7246 | 4.3327 | 3.6097 | 11.8460 | 14.4931 | 3.1498 | 0.8387 | 0.8383 |
| 1995 | Lower | 2.7359 | 2.7259 | 2.2325 | 1.5404 | 1.8379 | 1.8583 | 0.1344 | 0.1637 |
| | Upper | 3.9033 | 3.8889 | 3.1815 | 11.1832 | 10.5759 | 2.8061 | 0.8470 | 0.8293 |
| 2000 | Lower | 1.0808 | 0.8424 | 0.5077 | 13.4885 | 12.1203 | 0.4423 | 0.1112 | 0.0826 |
| | Upper | 4.0765 | 3.6015 | 2.8003 | 10.3750 | 9.8971 | 2.4872 | 0.9442 | 0.8199 |
| 2005 | Lower | 2.9927 | 2.3613 | 1.7135 | 0.9657 | 1.6922 | 1.5028 | 0.0716 | 0.3897 |
| | Upper | 4.2805 | 3.3727 | 2.4419 | 13.5219 | 5.1700 | 2.2546 | 0.8830 | 1.1491 |
| 2010 | Lower | 3.0920 | 2.2879 | 1.5626 | 1.4042 | 2.5649 | 1.4537 | 0.4362 | 0.3037 |
| | Upper | 4.4272 | 3.2699 | 2.2262 | 5.5120 | 7.1611 | 2.1731 | 1.6256 | 1.0367 |

Table 9

Confidence intervals (95%) for the BG models (with beta and GB1 marginals) fitted to the variables income & health by maximum likelihood

Variables: income & health

| Year | Limit | $\hat{p}_1$ (beta) | $\hat{p}_2$ (beta) | $\hat{q}$ (beta) | $\hat{p}_1$ (GB1) | $\hat{p}_2$ (GB1) | $\hat{q}$ (GB1) | $\hat{a}_1$ (GB1) | $\hat{a}_2$ (GB1) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1980 | Lower | 2.8717 | 4.9317 | 2.4348 | 3.0281 | 2.4498 | 2.2202 | 0.2187 | 0.6688 |
| | Upper | 4.0881 | 7.0329 | 3.4634 | 11.4549 | 6.6900 | 3.3394 | 0.7741 | 1.7170 |
| 1985 | Lower | 3.2117 | 6.3844 | 2.7536 | -2.2422 | 3.4669 | 2.2074 | -0.0183 | 0.4486 |
| | Upper | 4.5673 | 9.0942 | 3.9136 | 42.1636 | 10.8125 | 3.3806 | 0.4019 | 1.3564 |
| 1990 | Lower | 3.1360 | 6.6164 | 2.6126 | -1.9562 | 3.3433 | 1.9810 | -0.0159 | 0.2140 |
| | Upper | 4.4606 | 9.4282 | 3.7134 | 40.5326 | 17.0061 | 3.0476 | 0.3773 | 1.0560 |
| 1995 | Lower | 2.8126 | 6.2095 | 2.3010 | 3.3971 | 2.6301 | 1.7632 | 0.0691 | 0.1176 |
| | Upper | 4.0038 | 8.8595 | 3.2724 | 21.4397 | 20.5273 | 2.7044 | 0.4223 | 0.9424 |
| 2000 | Lower | 0.8471 | 4.5394 | 0.5105 | 56.4722 | 17.2919 | 0.4192 | 0.0219 | 0.1796 |
| | Upper | 3.6175 | 8.3509 | 2.8123 | 21.2033 | 12.0475 | 2.4010 | 0.4179 | 1.2257 |
| 2005 | Lower | 2.2989 | 5.6232 | 1.6712 | -18.0358 | 1.5462 | 1.4024 | -0.1156 | 0.5471 |
| | Upper | 3.2793 | 8.0524 | 2.3788 | 57.9060 | 6.4866 | 2.1326 | 0.3802 | 2.2645 |
| 2010 | Lower | 2.0008 | 5.3680 | 1.3822 | 1.8638 | 1.4273 | 1.2752 | 0.2287 | 0.4833 |
| | Upper | 2.8584 | 7.7098 | 1.9686 | 6.9524 | 7.0423 | 1.9130 | 0.8489 | 2.3539 |

Tables 10 and 11 show the values of the AIC statistic (Equation (21)) obtained for the three candidate models fitted to the three pairs of variables: Education & Health, Education & Income and Income & Health. The AIC values for the BG model with GB1 marginals are lower than those for the BG distribution with classical beta marginals, except for Education & Health and Education & Income in 2010. For the Education & Health data the best model is the one with GB1 marginals, except in 2010, when the model with BG truncated exponential marginals attains the smallest AIC. For the Education & Income data the model with BG truncated exponential marginals outperforms the other two models in every year. Finally, for the Income & Health data, the model with GB1 marginals outperforms the model with BG truncated exponential marginals in 1980, 1985 and 1990, while in the other four years the model with BG truncated exponential marginals provides the best fit. These results imply that, in general terms, the five-parameter models fit the data better than the three-parameter model.

As an illustration, Figures 1, 2, 3, 4, 5 and 6 present the contour plots for the BG distributions with classical beta marginals and GB1 marginals fitted to the pairs of variables Education & Health, Education & Income and Income & Health for every five years during the period 1980-2010. The shape of these plots supports the existence of a positive correlation among the variables considered, thus pointing to the suitability of the first type of BG model. The contour plots also reveal that the proposed models represent the shape of the bivariate data adequately, and more accurately in the case of the BG distribution with GB1 marginals, as concluded from the Akaike information criterion.
Table 10

AIC statistics obtained by maximum likelihood for the BG models with beta (3 parameters) and GB1 marginals (5 parameters) fitted to pairs of the variables: Education, Health and Income

| Year | Education & health: Model (3 par) | Education & health: Model (5 par) | Education & income: Model (3 par) | Education & income: Model (5 par) | Income & health: Model (3 par) | Income & health: Model (5 par) |
| --- | --- | --- | --- | --- | --- | --- |
| 1980 | -292.88 | -352.59 | -174.12 | -180.00 | -279.48 | -288.67 |
| 1985 | -309.52 | -354.03 | -206.89 | -217.35 | -332.58 | -354.54 |
| 1990 | -304.40 | -340.63 | -213.99 | -228.01 | -344.03 | -380.03 |
| 1995 | -291.33 | -313.41 | -206.65 | -217.94 | -332.39 | -370.94 |
| 2000 | -291.77 | -296.28 | -207.64 | -217.52 | -325.00 | -354.22 |
| 2005 | -296.81 | -306.00 | -210.98 | -213.52 | -323.60 | -340.72 |
| 2010 | -330.82 | -327.52 | -215.63 | -213.91 | -317.16 | -318.24 |

Smaller values indicate better fitted models.

Table 11

AIC statistics obtained by maximum likelihood for the BG model (BG truncated exponential marginals) with 5 parameters fitted to pairs of the variables: education, health and income

| Year | Education & health: Model (5 par) | Education & income: Model (5 par) | Income & health: Model (5 par) |
| --- | --- | --- | --- |
| 1980 | -333.49 | -188.88 | -286.86 |
| 1985 | -342.58 | -223.42 | -350.60 |
| 1990 | -323.42 | -233.30 | -375.72 |
| 1995 | -300.82 | -230.08 | -373.12 |
| 2000 | -295.86 | -233.28 | -363.92 |
| 2005 | -303.35 | -233.61 | -364.73 |
| 2010 | -332.99 | -237.16 | -359.70 |

Smaller values indicate better fitted models.

Figure 1

Contour plots for the BG model with beta marginals fitted to the variables Education & Health.

Figure 2

Contour plots for the BG model with GB1 marginals fitted to the variables Education & Health.

Figure 3

Contour plots for the BG model with beta marginals fitted to the variables Education & Income.

Figure 4

Contour plots for the BG model with GB1 marginals fitted to the variables Education & Income.

Figure 5

Contour plots for the BG model with beta marginals fitted to the variables Income & Health.

Figure 6

Contour plots for the BG model with GB1 marginals fitted to the variables Income & Health.

5 Conclusions

The main conclusions of this paper are the following. Three different classes of bivariate BG distributions have been presented. These classes have been constructed using three different definitions of bivariate beta distributions, proposed by Libby and Novick (1982), Jones (2001) and Olkin and Liu (2003) for the first proposal, El-Bassiouny and Jones (2009) for the second proposal and Arnold and Ng (2011) for the third proposal. The main properties of these three classes have been studied. Three specific bivariate BG distributions have been obtained. Finally, an empirical application with well-being data has been presented.

Future research on bivariate BG distributions can move in three directions. The first line of research is to propose specific models for practical use in statistical modeling. The study of such models in any dimension could be an interesting field of research. Secondly, we propose to study statistical inference methodologies for the bivariate (and, more generally, multivariate) BG distributions in (9) and (12). Finally, we propose to compare the fit of the BG distributions in (4), (9) and (12) for different choices of F1 and F2.

Appendix

The joint PDF of the different classes of bivariate beta distribution

The joint PDF of the bivariate random variable in (3) is (see Libby and Novick (1982), Jones (2001) and Olkin and Liu (2003)),
$$f(z_1,z_2) = \frac{z_1^{a_1-1}\, z_2^{a_2-1}\, (1-z_1)^{a_2+b-1}\, (1-z_2)^{a_1+b-1}}{B(a_1,a_2,b)\,(1-z_1 z_2)^{a_1+a_2+b}}, \quad 0 < z_1, z_2 < 1, \qquad (22)$$
where $B(a_1,a_2,b) = \Gamma(a_1)\Gamma(a_2)\Gamma(b)/\Gamma(a_1+a_2+b)$ and the marginal distributions are $Z_1 \sim \mathcal{B}e(a_1,b)$ and $Z_2 \sim \mathcal{B}e(a_2,b)$. Note that (22) belongs to the three-parameter exponential family, where sufficient statistics for $(a_1,a_2,b)$ are given by
$$\prod_{i=1}^{n} \frac{y_{1i}(1-y_{2i})}{1-y_{1i}y_{2i}}, \qquad \prod_{i=1}^{n} \frac{y_{2i}(1-y_{1i})}{1-y_{1i}y_{2i}}, \qquad \prod_{i=1}^{n} \frac{(1-y_{1i})(1-y_{2i})}{1-y_{1i}y_{2i}}.$$
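The exponential-family structure can be made explicit by regrouping the factors of (22). The following worked step is only a sketch; the symbols $T_1$, $T_2$ and $T_3$ are introduced here for illustration and do not appear in the original text.

$$
f(z_1,z_2) = \frac{1}{B(a_1,a_2,b)}\;\frac{T_1^{a_1}\,T_2^{a_2}\,T_3^{b}}{z_1 z_2 (1-z_1)(1-z_2)},
\qquad
T_1 = \frac{z_1(1-z_2)}{1-z_1 z_2},\quad
T_2 = \frac{z_2(1-z_1)}{1-z_1 z_2},\quad
T_3 = \frac{(1-z_1)(1-z_2)}{1-z_1 z_2}.
$$

Collecting exponents recovers (22): $z_1$ appears with power $a_1 - 1$, $z_2$ with $a_2 - 1$, $1-z_1$ with $a_2 + b - 1$, $1-z_2$ with $a_1 + b - 1$, and $1 - z_1 z_2$ with $-(a_1+a_2+b)$. For a sample of size $n$ the likelihood therefore depends on the data only through the products of $T_1$, $T_2$ and $T_3$ over the observations, or equivalently through the sums of their logarithms.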
For n = 1, the reciprocals of the sufficient statistics are distributed as:
$$\frac{1 - Z_1 Z_2}{Z_1 (1 - Z_2)} \sim \mathcal{B}2(a_2 + b, a_1) + 1,$$
$$\frac{1 - Z_1 Z_2}{Z_2 (1 - Z_1)} \sim \mathcal{B}2(a_1 + b, a_2) + 1,$$
and
$$\frac{1 - Z_1 Z_2}{(1 - Z_1)(1 - Z_2)} \sim \mathcal{B}2(a_1 + a_2, b) + 1,$$

where $\mathcal{B}2(a,b)$ denotes the beta distribution of the second kind.

The log-moments of (22) are:
$$E\left[\log \frac{Z_1(1-Z_2)}{1-Z_1Z_2}\right] = \psi(a_1) - \psi(a_1+a_2+b),$$
$$E\left[\log \frac{Z_2(1-Z_1)}{1-Z_1Z_2}\right] = \psi(a_2) - \psi(a_1+a_2+b),$$
$$E\left[\log \frac{(1-Z_1)(1-Z_2)}{1-Z_1Z_2}\right] = \psi(b) - \psi(a_1+a_2+b).$$
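These log-moments can be checked by simulation. The sketch below assumes that the construction in (3) is the usual one based on independent gamma variables, Z_1 = U/(U + W) and Z_2 = V/(V + W) with shape parameters a_1, a_2 and b; the parameter values, seed and sample size are arbitrary choices for illustration.

```python
import numpy as np
from scipy.special import digamma

a1, a2, b = 2.0, 3.0, 1.5  # arbitrary illustrative parameters
n = 200_000
rng = np.random.default_rng(1)

# Assumed gamma construction of the bivariate beta pair (Z1, Z2).
u = rng.gamma(a1, size=n)
v = rng.gamma(a2, size=n)
w = rng.gamma(b, size=n)
z1, z2 = u / (u + w), v / (v + w)

# The three statistics whose logarithms appear in the log-moments.
denom = 1.0 - z1 * z2
stats = [z1 * (1 - z2) / denom,
         z2 * (1 - z1) / denom,
         (1 - z1) * (1 - z2) / denom]

empirical = np.array([np.mean(np.log(t)) for t in stats])
theoretical = digamma([a1, a2, b]) - digamma(a1 + a2 + b)
print(empirical)
print(theoretical)
```

Equating such sample log-moments to their digamma expressions gives three equations in the three parameters, which is one way moment-type starting values can be obtained.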
The joint PDF of the bivariate beta density (8) is given by (see El-Bassiouny and Jones (2009)),
$$f(z_1,z_2) = k\, \frac{z_1^{a_1-1} (1-z_1)^{A-a_1-1}\, z_2^{a_2-1} (1-z_2)^{A-a_2-1}}{(1-z_1 z_2)^{A}} \times {}_2F_1\!\left[A, a_4; A - a_2; \frac{z_1(1-z_2)}{1-z_1 z_2}\right], \qquad (23)$$

where $A = a_1 + a_2 + a_3 + a_4$, $k^{-1} = B(a_1,a_3)B(a_2,a_1+a_3+a_4)$ and ${}_2F_1[\cdot,\cdot;\cdot;\cdot]$ denotes the Gauss hypergeometric function.

The expression for the $f_{V,W}(v,w)$ function is given by (see Arnold and Ng 2011),
$$f_{V,W}(v,w) = \int_0^{\infty}\!\int_0^{\infty}\!\int_{u_4/w - u_5}^{(u_4+u_5)v} f(v,w,u_3,u_4,u_5)\, du_3\, du_4\, du_5, \quad v, w > 0, \qquad (24)$$
where
$$f(v,w,u_3,u_4,u_5) = \frac{(u_3+u_5)(u_4+u_5)}{\prod_{i=1}^{5}\Gamma(a_i)}\, \left[v(u_4+u_5) - u_3\right]^{a_1-1} \left[w(u_3+u_5) - u_4\right]^{a_2-1} \prod_{i=3}^{5} u_i^{a_i-1} \exp\{-[u_3 w + u_4 v + u_5 (v + w + 1)]\},$$

where $u_4/w - u_5 < u_3 < (u_4+u_5)v$ and $u_4, u_5, v, w > 0$.

Description of the data set

The countries used in the analysis are the following:

Afghanistan, Guatemala, Pakistan, Albania, Guyana, Panama, Algeria, Haiti, Papua New Guinea, Argentina, Honduras, Paraguay, Armenia, Hong Kong, China (SAR), Peru, Australia, Hungary, Philippines, Austria, Iceland, Poland, Bahrain, India, Portugal, Bangladesh, Indonesia, Qatar, Belgium, Iran (Islamic Republic of), Romania, Belize, Ireland, Russian Federation, Benin, Israel, Rwanda, Bolivia (Plurinational State of), Italy, Saudi Arabia, Botswana, Jamaica, Senegal, Brazil, Japan, Sierra Leone, Brunei Darussalam, Jordan, Slovakia, Bulgaria, Kenya, Slovenia, Burundi, Korea (Republic of), South Africa, Cameroon, Kuwait, Spain, Canada, Lao PDR, Sri Lanka, Central African Republic, Latvia, Sudan, Chile, Lesotho, Swaziland, China, Liberia, Sweden, Colombia, Lithuania, Switzerland, Congo, Luxembourg, Syrian Arab Republic, Congo (Democratic Republic of), Malawi, Tajikistan, Costa Rica, Malaysia, Tanzania (United Republic of), Cote D’ivoire, Mali, Thailand, Cuba, Malta, Togo, Cyprus, Mauritania, Tonga, Denmark, Mauritius, Trinidad and Tobago, Dominican Republic, Mexico, Tunisia, Ecuador, Moldova (Republic of), Turkey, Egypt, Mongolia, Uganda, El Salvador, Morocco, Ukraine, Estonia, Mozambique, United Arab Emirates, Fiji, Myanmar, United Kingdom, Finland, Namibia, United States, France, Nepal, Uruguay, Gabon, Netherlands, Venezuela (Bolivarian R.), Gambia, New Zealand, VietNam, Germany, Nicaragua, Yemen, Ghana, Niger, Zambia, Greece, Norway, Zimbabwe.

Data on the health index can be retrieved from https://data.undp.org/dataset/Health-index/9v27-i7ic, data on the education index can be drawn from https://data.undp.org/dataset/Expected-Years-of-Schooling-of-children-years-/qnam-f624 for the variable expected years of schooling and https://data.undp.org/dataset/Mean-years-of-schooling-of-adults-years-/m67k-vi5c for the mean years of schooling. Finally, income data come from https://data.undp.org/dataset/Income-index/qt4g-yea9.

Declarations

Acknowledgements

The authors thank the Ministerio de Economía y Competitividad (project ECO2010-15455) for partial support of this work. The authors also thank the attendants of the first ICOSDA meeting, held at Mount Pleasant, MI, USA, for their comments. We are grateful to the Editors-in-Chief and the reviewers for their careful reading and for their comments and suggestions, which greatly improved the paper.

Authors’ Affiliations

(1)
Department of Economics, University of Cantabria

References

  1. Alexander C, Cordeiro GM, Ortega EMM, Sarabia JM: Generalized beta-generated distributions. Comput. Stat. Data Anal 2012, 56: 1880–1897. 10.1016/j.csda.2011.11.015
  2. Alexander C, Sarabia JM: Generalized Beta-Generated Distributions, ICMA Centre Discussion Papers in Finance DP2010–09, ICMA Centre. The University of Reading, Whiteknights, PO Box 242, Reading RG6 6BA, UK; 2010.
  3. Alzaatreh A, Lee C, Famoye F: A new method for generating families of continuous distributions. Metron 2013, 71: 63–79. 10.1007/s40300-013-0007-y
  4. Alzaatreh A, Lee C, Famoye F: The gamma-normal distribution: properties and applications. Comput. Stat. Data Anal 2014, 69: 67–80.
  5. Akaike H: A new look at the statistical model identification. IEEE Trans. Automatic Control 1974, 19: 716–723. 10.1109/TAC.1974.1100705
  6. Apostolakis FJ, Moieni P: The foundations of models of dependence in probabilistic safety assessment. Reliability Eng 1987, 18: 177–195. 10.1016/0143-8174(87)90097-7
  7. Arnold BC, Ng HKT: Flexible bivariate beta distributions. J. Multivariate Anal 2011, 102: 1194–1202. 10.1016/j.jmva.2011.04.001
  8. Arnold BC, Castillo E, Sarabia JM: Conditional Specification of Statistical Models, Springer Series in Statistics. Springer Verlag, New York; 1999.
  9. Arnold BC, Castillo E, Sarabia JM: Conditionally specified distributions: an introduction (with discussion). Stat. Sci 2001, 16: 249–274. 10.1214/ss/1009213728
  10. Arnold BC, Castillo E, Sarabia JM: Families of multivariate distributions involving the Rosenblatt construction. J. Am. Stat. Assoc 2006, 101: 1652–1662. 10.1198/016214506000000159
  11. Balakrishnan N, Lai C-D: Continuous Bivariate Distributions. Springer, New York; 2009.
  12. Barndorff-Nielsen O, Kent J, Sørensen M: Normal variance-mean mixtures and z distributions. Int. Stat. Rev 1982, 50: 145–159. 10.2307/1402598
  13. Barro RJ, Lee JW: A new data set of educational attainment in the world, 1950–2010. J. Dev. Econ 2013, 104: 184–198.
  14. Cordeiro GM, de Castro M: A new family of generalized distributions. J. Stat. Comput. Simul 2011, 81: 883–893. 10.1080/00949650903530745
  15. El-Bassiouny AH, Jones MC: A bivariate F distribution with marginals on arbitrary numerator and denominator degrees of freedom, and related bivariate beta and t distributions. Stat. Methods Appl 2009, 18: 465–481. 10.1007/s10260-008-0103-y
  16. Esary JD, Proschan F, Walkup DW: Association of random variables, with applications. Ann. Math. Stat 1967, 38: 1466–1474. 10.1214/aoms/1177698701
  17. Eugene N, Lee C, Famoye F: The beta-normal distribution and its applications. Commun. Stat. Theory Methods 2002, 31: 497–512. 10.1081/STA-120003130
  18. Fritsch FN, Carlson RE: Monotone piecewise cubic interpolation. SIAM J. Numerical Anal 1980, 17: 238–246. 10.1137/0717021
  19. Holland PW, Wang YJ: Dependence function for continuous bivariate densities. Commun. Stat. Theory Methods 1987, 16: 863–876. 10.1080/03610928708829408
  20. Jones MC: Multivariate t and beta distributions associated with the multivariate F distribution. Metrika 2001, 54: 215–231.
  21. Jones MC: Families of distributions arising from distributions of order statistics. Test 2004, 13: 1–43. 10.1007/BF02602999
  22. Jones MC, Larsen PV: Multivariate distributions with support above the diagonal. Biometrika 2004, 91: 975–986. 10.1093/biomet/91.4.975
  23. Kotz S, Balakrishnan N, Johnson NL: Continuous Multivariate Distributions, Volume 1: Models and Applications. John Wiley and Sons, New York; 2000.
  24. Lee C, Famoye F, Alzaatreh A: Methods for generating families of continuous distribution in the recent decades. Wiley Interdiscip. Rev.: Comput. Stat 2013, 5: 219–238. 10.1002/wics.1255
  25. Libby DL, Novick MR: Multivariate generalized beta distributions with applications to utility assessment. J. Educ. Stat 1982, 7: 271–294.
  26. Marshall AW, Olkin I: Life Distributions. Structure of Nonparametric, Semiparametric and Parametric Families. Springer, New York; 2007.
  27. McDonald JB: Some generalized functions for the size distribution of income. Econometrica 1984, 52: 647–663. 10.2307/1913469
  28. Nadarajah S, Kotz S: The beta exponential distribution. Reliability Eng. Syst. Safety 2006, 91: 689–697. 10.1016/j.ress.2005.05.008
  29. Olkin I, Liu R: A bivariate beta distribution. Stat. Probability Lett 2003, 62: 407–412. 10.1016/S0167-7152(03)00048-8
  30. R Development Core Team: R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria; 2011. http://www.R-project.org/
  31. Rosenblatt M: Remarks on a multivariate transformation. Ann. Math. Stat 1952, 23: 470–472. 10.1214/aoms/1177729394
  32. Sarabia JM, Gómez-Déniz E: Construction of multivariate distributions: a review of some recent results (with discussion). SORT, Stat. Oper. Res. Trans 2008, 32: 3–36.
  33. SAS Institute Inc.: SAS/STAT, version 9.2. Cary, NC, USA; 2010.
  34. Shaked M: A family of concepts of dependence for bivariate distributions. J. Am. Stat. Assoc 1977, 72: 642–650. 10.1080/01621459.1977.10480628
  35. The Mathworks Inc.: Matlab, release 2011. Novi, MI, USA; 2011.
  36. UNDP: International Human Development Indicators. 2012. Retrieved from http://hdr.undp.org/en/statistics/. Last accessed 10 Nov 2012.
  37. Venter G: Transformed beta and gamma distributions and aggregate losses. Proc. Casualty Actuarial Soc 1983, LXX: 156–193.
  38. Wolfram Research Inc.: Mathematica, version 8.0. Champaign, IL, USA; 2010.
  39. Zografos K: Generalized beta generated-II distributions. In Modern Mathematical Tools and Techniques in Capturing Complexity. Edited by: Pardo L, Balakrishnan N, Angeles Gil M. Berlin: Springer; 2011.

Copyright

© Sarabia et al.; licensee Springer. 2014

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.