Open Access

On generating T-X family of distributions using quantile functions

Journal of Statistical Distributions and Applications 2014, 1:2

https://doi.org/10.1186/2195-5832-1-2

Received: 10 December 2013

Accepted: 1 April 2014

Published: 11 June 2014

Abstract

The cumulative distribution function (CDF) of the T-X family is given by R{W(F(x))}, where R is the CDF of a random variable T, F is the CDF of X and W is an increasing function defined on [0, 1] having the support of T as its range. This family provides a new method of generating univariate distributions. Different choices of the R, F and W functions naturally lead to different families of distributions. This paper proposes the use of quantile functions to define the W function. Some general properties of this T-X system of distributions are studied. It is shown that several existing methods of generating univariate continuous distributions can be derived using this T-X system. Three new distributions of the T-X family are derived, namely, the normal-Weibull based on the quantile of Cauchy distribution, normal-Weibull based on the quantile of logistic distribution, and Weibull-uniform based on the quantile of log-logistic distribution. Two real data sets are applied to illustrate the flexibility of the distributions.

Keywords

Beta-family; Generalized distribution; Survival function; T-X families; Moments

1. Introduction

Statistical distributions are important for parametric inferences and applications to fit real world phenomena. Many methods have been developed to generate statistical distributions in the literature. Some well-known methods in the early days for generating univariate continuous distributions include methods based on differential equations developed by Pearson (1895), methods of translation developed by Johnson (1949), and the methods based on quantile functions developed by Tukey (1960). The interest in developing new methods for generating new or more flexible distributions has remained active in recent decades. Lee et al. (2013) indicated that the majority of methods developed after the 1980s are methods of ‘combination’, because they are based on the idea of combining two existing distributions or adding extra parameters to an existing distribution to generate a new family of distributions. A brief summary of some methods in the literature that are related to the method proposed in this article is provided.

McDonald (1984) introduced the generalized beta distributions of the first and second kinds (GB1 and GB2). Subsequently, a further generalization named the generalized beta distribution (GBD) was given by McDonald and Xu (1995), which consists of more than 30 special cases or limiting distributions of GBD, including GB1 and GB2.

Azzalini (1985) introduced a family of skew-normal distributions, SN (λ), defined as g(x; λ) = 2ϕ(x)Φ(λx), where ϕ and Φ are probability density function (PDF) and cumulative distribution function (CDF) of N(0, 1), respectively. The skewness is characterized by the parameter λ. For a review of skew-symmetric distributions, one may refer to Kotz and Vicari (2005). Ferreira and Steel (2006) introduced a general framework for generating a family of skewed distributions based on a symmetric distribution. The PDF of the new family has the form
$$g(x \mid f, p) = f(x)\, p\{F(x)\},$$
(1.1)

where F is the CDF of a symmetric PDF f and p is a skewed PDF defined on [0, 1].

Marshall and Olkin (1997) proposed a general method for generating a new family of life distributions defined in terms of survival function as
$$\bar{G}(x; \alpha) = \frac{\alpha \bar{F}(x)}{1 - \bar{\alpha}\,\bar{F}(x)} = \frac{\alpha \bar{F}(x)}{F(x) + \alpha \bar{F}(x)}, \quad -\infty < x < \infty, \ \alpha > 0,$$
(1.2)

where $\bar{\alpha} = 1 - \alpha$ and $\bar{F} = 1 - F$ is the survival function of the random variable X. For details about life distributions, one may refer to Marshall and Olkin (2010) and Lai (2013).

Eugene et al. (2002) proposed the beta-generated family of distributions, where the beta distribution with PDF b is used as the generator. The CDF of the beta-generated distribution is defined as $G(x) = \int_0^{F(x)} b(t)\,dt$, where F is the CDF of any random variable. If X is continuous, the corresponding PDF of the beta-generated distribution is
$$g(x) = \frac{f(x)}{B(\alpha, \beta)}\, F^{\alpha - 1}(x)\, \{1 - F(x)\}^{\beta - 1}, \quad \alpha > 0, \ \beta > 0,$$
(1.3)

where B(α, β) is the beta function. The PDF in (1.3) can be considered as a generalization of the distribution of order statistic (Eugene et al. 2002; Jones 2004). Many researchers have studied the beta generated distributions and their applications by applying different F in Equation (1.3). Examples include Famoye et al. (2004), Akinsete et al. (2008), Cordeiro and Lemonte (2011), and Alshawarbeh et al. (2012).

Jones (2009) and Cordeiro and de Castro (2011) extended the beta-generated family of distributions by using the Kumaraswamy distribution $b(t) = \alpha\beta\, t^{\alpha - 1}(1 - t^{\alpha})^{\beta - 1}$, $t \in (0, 1)$ (Kumaraswamy 1980), instead of the beta distribution. The PDF for the Kumaraswamy-generated (Kw-G) family of distributions is defined by
$$g(x) = \alpha\beta\, f(x)\, F^{\alpha - 1}(x)\, \{1 - F^{\alpha}(x)\}^{\beta - 1}, \quad \alpha > 0, \ \beta > 0.$$
(1.4)

Some examples of the Kw-G distributions are the Kw-Weibull (Cordeiro et al. 2010) and Kw-Gumbel (Cordeiro et al. 2011).

Recently, Alexander et al. (2012) studied the generalized beta-X family by considering $b(t) = c\,B(a, b)^{-1}\, t^{ac - 1}(1 - t^{c})^{b - 1}$, 0 < t < 1, the generalized beta distribution of the first kind introduced by McDonald (1984). The new family is called the generalized beta-generated (GBG) family of distributions. The PDF for the GBG family of distributions is given by
$$g(x; \tau, a, b, c) = c\,B(a, b)^{-1}\, f(x; \tau)\, F(x; \tau)^{ac - 1}\, \{1 - F(x; \tau)^{c}\}^{b - 1}.$$
(1.5)

When c = 1, the family in (1.5) reduces to the beta-X family in (1.3), and when a = 1, (1.5) reduces to the Kw-G family in (1.4).

Alzaatreh et al. (2013b) proposed a general method by replacing the beta PDF with a PDF r of a continuous random variable and applying a function W(F(x)) that satisfies some conditions (given in (2.1)) to develop the T-X family. The CDF of the T-X family is defined as
$$G(x) = \int_a^{W(F(x))} r(t)\,dt = R\{W(F(x))\},$$
(1.6)
where R is the CDF of T. The corresponding PDF (if it exists) of the T-X family of distributions is
$$g(x) = \left\{\frac{d}{dx} W(F(x))\right\} r\{W(F(x))\}.$$
(1.7)

Different W functions generate different families of T-X distributions. Two continuous distributions of the T-X families that have been studied are Gamma-Pareto distribution (Alzaatreh et al. 2012a) and Weibull-Pareto distribution (Alzaatreh et al. 2013a). When X is discrete, the resulting T-X family is discrete. The T-geometric family generates the discrete analogue to the distribution of any continuous random variable T (Alzaatreh et al. 2012b). For a review of methods for generating univariate continuous distributions, one may refer to Lee et al. (2013).

The T-X family provides a new method to generate distributions by using the function W. A large number of distributions, continuous and discrete, can be generated by applying any two existing univariate distributions based on this method. Alzaatreh et al. (2013b) gave several choices of W(λ), including - log(1 - λ), λ/(1 - λ), log(λ/(1 - λ)), log(-log λ). It is clear that there are other choices that can be defined to generate different T-X families. Is there a systematic approach to define the W function for the T-X family? This question will be addressed in this paper.

In Section 2, a method to define the W function for generating T-X families of continuous probability distributions is presented. The W functions defined in Alzaatreh et al. (2013b) are special cases of the general approach. In order to distinguish between the previous T-X family proposed by Alzaatreh et al. (2013b) and the method proposed in this article, we use the abbreviation T-X(W) family for the previous T-X family and the abbreviation T-X{Y} for the family defined in Section 2. In Section 3, some properties of the new families are studied. Relationship between the new families and some existing families is given. Also in Section 3, the normal-Weibull distribution based on the quantile function of the Cauchy distribution, the normal-Weibull distribution based on the quantile function of the logistic distribution and Weibull-uniform distribution based on the quantile function of the log-logistic distribution are defined. Some properties of these three distributions are derived. In Section 4, a general family of life distributions based on survival function using similar methodology of the T-X{Y} family is presented. Some properties of the family are investigated. In Section 5, two real data sets are used to illustrate the flexibility of T-X{Y} family of distributions. Conclusions are given in Section 6.

2. Generating families of continuous probability distributions using quantile function

The T-X(W) family of distributions in (1.6) is generated by using the function W which satisfies the following conditions (Alzaatreh et al. 2013b):
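$$\begin{aligned}
&\text{i. } W(F(x)) \in [a, b],\\
&\text{ii. } W \text{ is differentiable and monotonically non-decreasing},\\
&\text{iii. } W(F(x)) \to a \text{ as } x \to -\infty \text{ and } W(F(x)) \to b \text{ as } x \to \infty,
\end{aligned}$$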
(2.1)

where [a, b] is the support of the random variable T for - ∞ ≤ a < b ≤ ∞.

In this section, a class of W functions wider than the one defined in (2.1) will be considered to define a T-X family. Let W : (0, 1) → (a, b), for - ∞ ≤ a < b ≤ ∞, be a right-continuous and non-decreasing function such that $\lim_{\lambda \to 0^{+}} W(\lambda) = a$ and $\lim_{\lambda \to 1^{-}} W(\lambda) = b$. Then the composition G(x) = R{W(F(x))}, x ∈ (-∞, ∞), is a distribution function, because it satisfies the following required conditions for a distribution function:
  (a) G is non-decreasing,
  (b) G is right-continuous,
  (c) G(x) → 0 as x → - ∞ and G(x) → 1 as x → ∞.

     
If T has a PDF r with support (a, b), then
$$G(x) = \int_a^{W(F(x))} r(t)\,dt.$$
(2.2)

Note that if both functions W and F are absolutely continuous, then G in (2.2) is absolutely continuous and has a density function $g(x) = \frac{d}{dx} G(x)$.

A general method to define W function for generating T-X families is now proposed. It is assumed that the random variable T has support on the interval (a, b). Let P be the CDF of the random variable Y taking values on (a, b), and define the quantile function of the distribution P by
$$Q_Y(\lambda) = \inf\{y : P(y) \ge \lambda\}, \quad \lambda \in (0, 1).$$
If P is continuous and strictly increasing then $Q_Y = P^{-1}$ is continuous and strictly increasing (Shorack and Wellner 1986). We take W to be the quantile function of a strictly increasing distribution function P for the random variable Y, namely, $W(\lambda) = Q_Y(\lambda)$, $\lambda \in (0, 1)$; then $Q_Y$ is continuous and non-decreasing, and the CDF of a T-X{Y} family using the quantile function $Q_Y$ is defined as
$$G(x) = \int_a^{Q_Y(F(x))} r(t)\,dt = R\{Q_Y(F(x))\}, \quad x \in (-\infty, \infty).$$
(2.3)
If we assume further that Y has a density p(y) > 0 for all y in a neighborhood of $Q_Y(\lambda)$ where $\lambda \in (0, 1)$, then $\frac{d}{d\lambda} Q_Y(\lambda)$ exists and equals $[p(Q_Y(\lambda))]^{-1}$ (Shorack and Wellner 1986), and hence the corresponding PDF associated with (2.3) is
$$g(x) = \frac{f(x)}{p\{Q_Y(F(x))\}}\; r\{Q_Y(F(x))\}.$$
(2.4)

Note that the PDF defined in (2.4) can be easily used to generate a T-X{Y} family of distributions by applying the quantile function of any existing distribution.
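
The construction in (2.3) and (2.4) can be sketched in a few lines of Python. This is a minimal sketch and not part of the original development: the helper name `make_tx_family` and the particular choices of X, Y and T (X exponential with mean 2, Y standard exponential, T Weibull with shape 2 and scale 1) are assumptions made only for the example.

```python
import math

def make_tx_family(f, F, r, R, p, Q):
    """Return (pdf, cdf) of a T-X{Y} distribution.

    f, F : PDF and CDF of X
    r, R : PDF and CDF of T
    p, Q : PDF and quantile function of Y
    """
    def cdf(x):
        # Equation (2.3): G(x) = R{Q_Y(F(x))}
        return R(Q(F(x)))

    def pdf(x):
        # Equation (2.4): g(x) = f(x) * r{Q_Y(F(x))} / p{Q_Y(F(x))}
        t = Q(F(x))
        return f(x) * r(t) / p(t)

    return pdf, cdf

# Assumed illustrative ingredients (hypothetical choices, not from the paper).
f = lambda x: 0.5 * math.exp(-x / 2.0)        # X ~ exponential, mean 2
F = lambda x: 1.0 - math.exp(-x / 2.0)
p = lambda y: math.exp(-y)                    # Y ~ standard exponential
Q = lambda lam: -math.log(1.0 - lam)
r = lambda t: 2.0 * t * math.exp(-t * t)      # T ~ Weibull, shape 2, scale 1
R = lambda t: 1.0 - math.exp(-t * t)

pdf, cdf = make_tx_family(f, F, r, R, p, Q)
```

For these choices $Q_Y(F(x)) = x/2$, so the sketch should reproduce $G(x) = 1 - \exp(-x^2/4)$ and $g(x) = (x/2)\exp(-x^2/4)$ up to floating-point error.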

The notation X sometimes represents the random variable with PDF f and sometimes represents the random variable with PDF g. Where there may be confusion, the notations $X_f$ for the random variable X with PDF f and $X_g$ for the random variable X with PDF g are used. The term moment refers to non-central moment, unless otherwise specified.

Lemma 1:
  (a) If the random variables $X_f$ and Y have the same distribution with the same parameters, then G = R.
  (b) If the random variables T and Y have the same distribution with the same parameters, then G = F.

     

Proof: The proofs of (a) and (b) follow from definition (2.4). □

Some properties of the T-X{Y} family:
  1. Any PDF f can be represented as the PDF defined in (2.4) by considering $Q_Y = R^{-1}$.
  2. The support of the new random variable defined in (2.4) is the same as the support of the random variable with PDF f.
  3. If the support of the random variable Y is [c, d] with $[a, b] \subset [c, d]$, then the PDF in (2.4) is defined with support $[F^{-1}(P(a)), F^{-1}(P(b))]$.
  4. The relationship between the random variable $X_g$ with PDF in (2.4) and T is given by $T = Q_Y(F(X_g))$ and hence, $X_g = F^{-1}(P(T))$ when $F^{-1}$ exists, where P is the CDF of Y with the corresponding quantile function $Q_Y$. Using this relation, one can generate the random variable $X_g$ by generating the random variable T and then computing $X_g = F^{-1}(P(T))$. Similarly, one can compute the moments of $X_g$ by using $E(X_g^n) = E[\{F^{-1}(P(T))\}^n]$.
  5. The hazard function, $h_g(x) = g(x)/\bar{G}(x)$, for the random variable $X_g$ in (2.4) is given by $h_g(x) = \dfrac{f(x)}{p\{Q_Y(F(x))\}}\, h_r\{Q_Y(F(x))\}$, where $h_r$ is the hazard function for the random variable T with PDF r.
  6. The quantile function of the new random variable in (2.4) is given by $Q_{X_g}(\lambda) = Q_{X_f}\{P(Q_T(\lambda))\}$, $\lambda \in (0, 1)$, where $Q_{X_f}$ and $Q_T$ are the quantile functions of the random variables $X_f$ and T respectively.
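
Properties 4 and 6 translate directly into a sampler and a quantile function. The following is a minimal sketch under assumed choices that are not fixed by the text at this point: $X_f$ uniform on (A, B), Y standard log-logistic with P(y) = y/(1 + y), and T Weibull with shape C and scale GAMMA. All names and numeric values are hypothetical.

```python
import math
import random

# Assumed illustrative parameters (hypothetical values).
A, B = 0.0, 5.0        # X_f ~ uniform(A, B)
C, GAMMA = 2.0, 1.5    # T ~ Weibull(shape C, scale GAMMA)

def P(y):
    """CDF of Y, standard log-logistic."""
    return y / (1.0 + y)

def F_inv(u):
    """Quantile function of X_f ~ uniform(A, B)."""
    return A + (B - A) * u

def Q_T(lam):
    """Quantile function of T ~ Weibull(shape C, scale GAMMA)."""
    return GAMMA * (-math.log(1.0 - lam)) ** (1.0 / C)

def quantile_xg(lam):
    """Property 6: Q_{X_g}(lam) = Q_{X_f}(P(Q_T(lam)))."""
    return F_inv(P(Q_T(lam)))

def sample_xg(rng):
    """Property 4: draw T, then return X_g = F^{-1}(P(T))."""
    t = rng.weibullvariate(GAMMA, C)  # arguments are (scale, shape)
    return F_inv(P(t))

def cdf_xg(x):
    """G(x) = R{Q_Y(F(x))} with Q_Y(u) = u / (1 - u), valid for A < x < B."""
    u = (x - A) / (B - A)
    t = u / (1.0 - u)
    return 1.0 - math.exp(-((t / GAMMA) ** C))
```

A quick consistency check is that `cdf_xg(quantile_xg(lam))` returns `lam`, and that roughly half of the draws from `sample_xg(random.Random(0))` fall below `quantile_xg(0.5)`.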

     

3. Some T-X{Y} families and properties

3.1 Some T-X{Y} families based on different quantile functions

The quantile function of a random variable Y may not be explicitly represented. However, many existing continuous random variables have CDFs that are one-to-one functions with explicit quantile functions. The quantile functions of these random variables can be used to generate new T-X{Y} families. The following example illustrates how to derive the T-X{Y} family of distributions.

Example: T-X {log-logistic} family:

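Let the random variable Y follow the log-logistic distribution with parameters α and β. The PDF and quantile function are, respectively, $p(y) = (\beta/\alpha)(y/\alpha)^{\beta - 1}/\{1 + (y/\alpha)^{\beta}\}^{2}$, y ≥ 0, and $Q_Y(\lambda) = \alpha\{\lambda/(1 - \lambda)\}^{1/\beta}$, $\lambda \in (0, 1)$. Therefore, $p(Q_Y(\lambda)) = (\beta/\alpha)\,\lambda^{(\beta - 1)/\beta}(1 - \lambda)^{(\beta + 1)/\beta}$, and the definition in (2.4) gives the PDF of the T-X{log-logistic} family as
$$g(x) = \frac{(\alpha/\beta)\, f(x)}{F^{(\beta - 1)/\beta}(x)\, \{1 - F(x)\}^{(\beta + 1)/\beta}}\; r\!\left\{\alpha \left(\frac{F(x)}{1 - F(x)}\right)^{1/\beta}\right\}.$$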
(3.1)

When α = β = 1, the family in (3.1) reduces to $g(x) = \dfrac{f(x)}{\{1 - F(x)\}^{2}}\, r\{F(x)/(1 - F(x))\}$. This PDF can be written in terms of the hazard and survival functions of $X_f$ as $g(x) = \dfrac{h_f(x)}{\bar{F}(x)}\, r\{(1 - \bar{F}(x))/\bar{F}(x)\}$, where $h_f$ is the hazard function and $\bar{F}$ is the survival function of the random variable $X_f$.

Table 1 lists the probability density functions of some T-X{Y} families based on different quantile functions. Each family g is based on a given quantile function that defines many subfamilies of distributions.
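Table 1. T-X{Y} families based on different quantile functions

| Random variable Y and support of r(t) | Quantile function $Q_Y(\lambda)$ | Family of probability density functions g(x) defined in (2.4) |
|---|---|---|
| Exponential, (0, ∞) | $-b\log(1 - \lambda)$, $b > 0$ | $\dfrac{b f(x)}{1 - F(x)}\, r\{-b\log(1 - F(x))\}$ |
| Weibull, (0, ∞) | $\gamma\{-\log(1 - \lambda)\}^{1/c}$, $\gamma, c > 0$ | $\dfrac{\gamma f(x)\, r\{\gamma(-\log(1 - F(x)))^{1/c}\}}{c\,(1 - F(x))\,\{-\log(1 - F(x))\}^{(c - 1)/c}}$ |
| Rayleigh, (0, ∞) | $\{-2b^{2}\log(1 - \lambda)\}^{1/2}$, $b > 0$ | $\dfrac{b f(x)\, r\{(-2b^{2}\log(1 - F(x)))^{1/2}\}}{(1 - F(x))\,\{-2\log(1 - F(x))\}^{1/2}}$ |
| Dagum, (0, ∞) | $\beta\left\{\dfrac{\lambda^{1/p}}{1 - \lambda^{1/p}}\right\}^{1/\alpha}$, $\alpha, \beta, p > 0$ | $\dfrac{\beta f(x)\, r\{\beta F^{1/(\alpha p)}(x)/(1 - F^{1/p}(x))^{1/\alpha}\}}{\alpha p\, F^{1 - 1/(\alpha p)}(x)\,(1 - F^{1/p}(x))^{1 + 1/\alpha}}$ |
| Lomax, (0, ∞) | $\dfrac{1}{\alpha}\cdot\dfrac{1 - (1 - \lambda)^{1/k}}{(1 - \lambda)^{1/k}}$, $\alpha, k > 0$ | $\dfrac{f(x)}{k\alpha\,(1 - F(x))^{1/k + 1}}\, r\!\left\{\dfrac{1 - (1 - F(x))^{1/k}}{\alpha\,(1 - F(x))^{1/k}}\right\}$ |
| Log-logistic, (0, ∞) | $\alpha\left\{\dfrac{\lambda}{1 - \lambda}\right\}^{1/\beta}$, $\alpha, \beta > 0$ | $\dfrac{\alpha f(x)\, r\{\alpha(F(x)/(1 - F(x)))^{1/\beta}\}}{\beta\, F^{(\beta - 1)/\beta}(x)\,(1 - F(x))^{(\beta + 1)/\beta}}$ |
| Exponentiated exponential, (0, ∞) | $-\dfrac{1}{\theta}\log(1 - \lambda^{1/\alpha})$, $\theta, \alpha > 0$ | $\dfrac{f(x)\, r\{-(1/\theta)\log(1 - F^{1/\alpha}(x))\}}{\alpha\theta\, F^{(\alpha - 1)/\alpha}(x)\,(1 - F^{1/\alpha}(x))}$ |
| Cauchy, (-∞, ∞) | $a + b\tan\{\pi(\lambda - 0.5)\}$, $b > 0$ | $\dfrac{\pi b f(x)\, r\{a + b\tan(\pi(F(x) - 1/2))\}}{\cos^{2}(\pi(F(x) - 1/2))}$ |
| Extreme value (Gumbel), (-∞, ∞) | $a - b\log(-\log\lambda)$, $b > 0$ | $\dfrac{b f(x)\, r\{a - b\log(-\log F(x))\}}{-F(x)\log F(x)}$ |
| Laplace, (-∞, ∞) | $a + b\log(2\lambda)$ for $\lambda < 0.5$; $a - b\log\{2(1 - \lambda)\}$ for $\lambda \ge 0.5$; $a, b > 0$ | $\dfrac{b f(x)\, r\{a + b\log(2F(x))\}}{F(x)}$ for $F(x) < 0.5$; $\dfrac{b f(x)\, r\{a - b\log(2(1 - F(x)))\}}{1 - F(x)}$ for $F(x) \ge 0.5$ |
| Logistic, (-∞, ∞) | $a + b\log\{\lambda/(1 - \lambda)\}$, $b > 0$ | $\dfrac{b f(x)\, r\{a + b\log(F(x)/(1 - F(x)))\}}{F(x)\,(1 - F(x))}$ |
| Generalized logistic (II), (-∞, ∞) | $\log\left\{\dfrac{1 - (1 - \lambda)^{1/\alpha}}{(1 - \lambda)^{1/\alpha}}\right\}$, $\alpha > 0$ | $\dfrac{f(x)\, r\{\log((1 - (1 - F(x))^{1/\alpha})/(1 - F(x))^{1/\alpha})\}}{\alpha\,(1 - F(x))\,(1 - (1 - F(x))^{1/\alpha})}$ |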
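
Each density in Table 1 has the form $g(x) = f(x)\, w(F(x))\, r\{Q_Y(F(x))\}$ with $w(\lambda) = dQ_Y(\lambda)/d\lambda = 1/p(Q_Y(\lambda))$. As a sanity check, the sketch below compares a numerical derivative of the quantile function with the closed-form weight for four of the rows. The parameter values and all identifiers are assumptions chosen only for the check.

```python
import math

def numeric_derivative(fn, lam, h=1e-6):
    """Central-difference approximation to the derivative of fn at lam."""
    return (fn(lam + h) - fn(lam - h)) / (2.0 * h)

# Arbitrary (assumed) parameter values used only for this check.
GAM, C = 1.7, 2.5             # Weibull scale and shape
AL, BE, PP = 1.3, 0.8, 2.0    # Dagum alpha, beta, p
A, B = 0.4, 1.2               # location and scale (Cauchy, Gumbel)

# name -> (quantile function Q_Y, closed-form weight w so that g = f * w(F) * r(Q_Y(F)))
ROWS = {
    "Weibull": (
        lambda l: GAM * (-math.log(1 - l)) ** (1 / C),
        lambda l: GAM / (C * (1 - l) * (-math.log(1 - l)) ** ((C - 1) / C)),
    ),
    "Dagum": (
        lambda l: BE * (l ** (1 / PP) / (1 - l ** (1 / PP))) ** (1 / AL),
        lambda l: BE / (AL * PP * l ** (1 - 1 / (AL * PP))
                        * (1 - l ** (1 / PP)) ** (1 + 1 / AL)),
    ),
    "Cauchy": (
        lambda l: A + B * math.tan(math.pi * (l - 0.5)),
        lambda l: math.pi * B / math.cos(math.pi * (l - 0.5)) ** 2,
    ),
    "Gumbel": (
        lambda l: A - B * math.log(-math.log(l)),
        lambda l: B / (-l * math.log(l)),
    ),
}

def max_relative_error(name, grid=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """Largest relative gap between dQ_Y/dlam and the tabulated weight."""
    quantile, weight = ROWS[name]
    return max(abs(numeric_derivative(quantile, l) / weight(l) - 1.0) for l in grid)
```

Calling `max_relative_error(name)` for each key of `ROWS` should give values at the level of numerical differentiation error.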

Common supports of random variables X f and T are [0, 1], (0, ∞), or (-∞, ∞). Beta-X, Kw-G and GBG are T-X{uniform} families with T being defined on [0, 1]. The W functions given in Alzaatreh et al. (2013b) can be defined by the quantile functions of random variable Y as follows: W(λ) = - log(1 - λ), λ ∈ (0, 1), is the quantile function of standard exponential distribution, W(λ) = λ/(1 - λ) is the quantile function of log-logistic distribution with parameters α = β = 1, and W(λ) = log(λ/(1 - λ)) is the quantile function of logistic distribution with scale parameter b = 1 and location parameter a = 0. Many other W functions can be defined by using the quantile function approach. The T-X(W) families defined in Alzaatreh et al. (2013b) derive their parameters from the random variables T and X and none from the W function. The T-X(W) can be derived through the T-X{Y} framework by noting that the W function is the quantile function for the random variable Y. One advantage of using the T-X{Y} framework is that one can keep one or more parameters from the distribution of Y. In particular, keeping a shape parameter from Y can add more flexibility to the new distribution.

3.2 Some properties of T-X{Y} families

In the following, we assume (if necessary) the mentioned expectations exist and are finite.

Theorem 1: Let $X_f$ be a non-negative random variable with PDF f(x), and let $E(X_f^n)$ denote the nth moment of $X_f$. Then
$$E(X_g^n) \le E(X_f^n) \cdot E\left[\{\bar{P}(T)\}^{-1}\right],$$

where $E(X_g^n)$ is the nth moment of the random variable with density in (2.4), $\bar{P} = 1 - P$ is the survival function of the CDF P, and T is the random variable with PDF r.

Proof: By definition,
$$E(X_f^n) = \int_0^{\infty} x^n f(x)\,dx \ge \int_{F^{-1}(P(t))}^{\infty} x^n f(x)\,dx \ge \{F^{-1}(P(t))\}^n \int_{F^{-1}(P(t))}^{\infty} f(x)\,dx = \{F^{-1}(P(t))\}^n \cdot \bar{P}(t).$$
Hence,
$$\{F^{-1}(P(t))\}^n \le E(X_f^n)\, \{\bar{P}(t)\}^{-1}.$$
(3.2)
Using property (4) in Section 2 and (3.2) yields
$$E(X_g^n) = E\left[\{F^{-1}(P(T))\}^n\right] \le E(X_f^n) \cdot E\left[\{\bar{P}(T)\}^{-1}\right].$$

The entropy of a random variable is a measure of the variation of uncertainty. Shannon’s (1948) entropy of the random variable X with density g is defined as E{-log(g(X))}.

Theorem 2: Shannon’s entropy $\eta_{X_g}$ of $X_g$ is given by
$$\eta_{X_g} = E\left[\log q_{X_f}(P(T))\right] + E\left[\log p(T)\right] + \eta_T,$$

where $\eta_T$ is Shannon’s entropy for the random variable T with PDF r, and $q_{X_f}$ is the quantile density function of $X_f$.

Proof: By definition,
$$\eta_{X_g} = E\{-\log g(X_g)\} = -E\{\log f(X_g)\} + E\{\log p(Q_Y(F(X_g)))\} - E\{\log r(Q_Y(F(X_g)))\}.$$
Note that the random variable $T = Q_Y\{F(X_g)\}$ has the PDF r, and $X_g = F^{-1}\{P(T)\}$. Thus,
$$\eta_{X_g} = -E\{\log f(F^{-1}(P(T)))\} + E\{\log p(T)\} + \eta_T = E\{\log q_{X_f}(P(T))\} + E\{\log p(T)\} + \eta_T.$$

3.3 Relationship between T-X{Y} family and some existing families of distributions

Many existing families of distributions can be generated by using the quantile function approach defined in (2.4). Four examples are given in the following.

Generalized beta-generated (GBG) family introduced by Alexander et al. (2012):

The GBG family is the generalized beta-X{uniform} family. It can also be derived as follows: By setting α = β = 1 and p = 1/c in the Dagum quantile function in Table 1, the PDF of the T-X{Dagum} family is given by
$$g(x) = \frac{c f(x)\, r\{F^{c}(x)/(1 - F^{c}(x))\}}{F^{1 - c}(x)\,\{1 - F^{c}(x)\}^{2}}.$$
(3.3)
By taking r in (3.3) to be $r(t) = \{B(a, b)\}^{-1}\, t^{a - 1}(1 + t)^{-b - a}$, t > 0, which is the PDF of the inverted beta random variable, the family of probability density functions is obtained as
$$g(x) = c\,B(a, b)^{-1}\, f(x)\, F^{ca - 1}(x)\, \{1 - F^{c}(x)\}^{b - 1}, \quad a, b > 0.$$
(3.4)

The family in (3.4) is the generalized beta-generated (GBG) family in (1.5).
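
The reduction from (3.3) to (3.4) can be checked numerically. The sketch below evaluates both expressions at the same values of f(x) and F(x); the function names are hypothetical and the beta function is computed from `math.gamma`.

```python
import math

def beta_fn(a, b):
    """Beta function B(a, b) via the gamma function."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

def inverted_beta_pdf(t, a, b):
    """r(t) = t^(a-1) (1+t)^(-a-b) / B(a, b), t > 0."""
    return t ** (a - 1) * (1 + t) ** (-a - b) / beta_fn(a, b)

def gbg_via_dagum(fx, Fx, a, b, c):
    """Equation (3.3) with r taken to be the inverted beta PDF."""
    t = Fx ** c / (1 - Fx ** c)
    return c * fx * inverted_beta_pdf(t, a, b) / (Fx ** (1 - c) * (1 - Fx ** c) ** 2)

def gbg_direct(fx, Fx, a, b, c):
    """Equation (3.4), i.e. the GBG density (1.5)."""
    return c * fx * Fx ** (a * c - 1) * (1 - Fx ** c) ** (b - 1) / beta_fn(a, b)
```

For any 0 < F(x) < 1 and positive a, b, c the two functions should agree up to rounding, since $1 + t = 1/(1 - F^{c}(x))$ when $t = F^{c}(x)/(1 - F^{c}(x))$.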

Family of skewed distributions defined in Ferreira and Steel (2006):

The family of skewed distributions in (1.1) defined by Ferreira and Steel (2006) can be represented in the form of the T-X{Y} system by considering the quantile function of a standard uniform distribution, $Q_Y(\lambda) = \lambda$, where F is the CDF of a symmetric PDF f and r is a skewed PDF having support [0, 1].

T-X family defined in Alzaatreh et al. (2013b):

Alzaatreh et al. (2013b) studied the T-X(W) family using W(F(x)) = - log(1 - F(x)). By using the quantile function approach, let λ = F(x); then W(λ) = - log(1 - λ) is the quantile function of the standard exponential distribution. Hence, the T-X family studied by Alzaatreh et al. (2013b) is the T-X{exponential} family. According to the authors, the T-X{exponential} is a family of distributions arising from the hazard function of $X_f$. If $h_f$ is the hazard function and $H_f$ is the cumulative hazard function of the random variable $X_f$, and the exponential distribution has mean b, the PDF of the T-X{exponential} family is $g(x) = b\,h_f(x)\, r\{b H_f(x)\}$. In a similar way, T-X{Weibull} and T-X{Rayleigh} families can be considered as families of distributions arising from hazard functions of $X_f$. The PDF of the T-X{Weibull} family is $g(x) = (\gamma/c)\, h_f(x)\, r\{\gamma (H_f(x))^{1/c}\}/(H_f(x))^{(c - 1)/c}$ and the PDF of the T-X{Rayleigh} family is $g(x) = b\, h_f(x)\, r\{(2b^{2} H_f(x))^{1/2}\}/(2H_f(x))^{1/2}$.

Generalized beta distribution introduced by McDonald and Xu (1995):

Setting α = β = 1 in the PDF of the T-X{log-logistic} family in (3.1) and taking $r(t) = \{B(p, q)\}^{-1}\, t^{p - 1}(1 + t)^{-p - q}$, t > 0, the PDF of the inverted beta random variable with parameters p and q, yields
$$g(x) = \frac{f(x)}{B(p, q)}\, F^{p - 1}(x)\, \{1 - F(x)\}^{q - 1}, \quad p, q > 0.$$
(3.5)
Note that (3.5) is the beta-generated family. By taking $F(x) = (x/b)^{a}/\{1 + c(x/b)^{a}\}$, which is the CDF of a truncated log-logistic distribution with $0 < x^{a} < b^{a}/(1 - c)$, 0 ≤ c < 1, a and b positive in (3.5), the generalized beta distribution introduced by McDonald and Xu (1995) is obtained as
$$GB(x; a, b, c, p, q) = \frac{a\, x^{ap - 1}\, \{1 - (1 - c)(x/b)^{a}\}^{q - 1}}{b^{ap}\, B(p, q)\, \{1 + c(x/b)^{a}\}^{p + q}}, \quad \text{for } 0 < x^{a} < \frac{b^{a}}{1 - c},$$

with 0 ≤ c < 1, and a, b, p and q positive.

Taking F(x) in (3.5) to be $F(x) = e^{(x - \delta)/\sigma}/\{1 + c\,e^{(x - \delta)/\sigma}\}$, which is the CDF of a truncated logistic distribution with - ∞ < (x - δ)/σ < ln(1/(1 - c)), 0 ≤ c < 1 and σ > 0, yields
$$g(x) = \frac{e^{p(x - \delta)/\sigma}\, \{1 - (1 - c)\,e^{(x - \delta)/\sigma}\}^{q - 1}}{\sigma\, B(p, q)\, \{1 + c\,e^{(x - \delta)/\sigma}\}^{p + q}}.$$
(3.6)

The PDF in (3.6) is the exponential generalized beta distribution in McDonald and Xu (1995).

3.4 Three examples of new distributions derived from T-X{Y} family

Table 1 contains many T-X families based on different quantile functions. Three new distributions, normal-Weibull{Cauchy}, normal-Weibull{logistic} and Weibull-uniform{log-logistic} distributions are introduced, and some properties of these distributions are studied.

Normal-Weibull{Cauchy} distribution:

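Setting a = 0 and b = 1 in the T-X{Cauchy} family in Table 1, and letting r be the PDF of N(μ, σ²), the normal-X{Cauchy} sub-family is given by
$$g(x) = \frac{\sqrt{\pi}\, f(x)}{\sigma\sqrt{2}\, \cos^{2}\{\pi(F(x) - 1/2)\}}\, \exp\left\{-\frac{1}{2}\left[\frac{\tan\{\pi(F(x) - 1/2)\} - \mu}{\sigma}\right]^{2}\right\}.$$
(3.7)
Substituting $F(x) = 1 - \exp\{-(x/\gamma)^{c}\}$, the CDF of the Weibull distribution, in (3.7) yields
$$g(x) = \frac{\sqrt{\pi}\,(c/\gamma)(x/\gamma)^{c - 1} \exp\{-(x/\gamma)^{c}\}}{\sigma\sqrt{2}\, \cos^{2}\{\pi(1/2 - \exp\{-(x/\gamma)^{c}\})\}}\, \exp\left\{-\left[\frac{\tan\{\pi(1/2 - \exp\{-(x/\gamma)^{c}\})\} - \mu}{\sigma\sqrt{2}}\right]^{2}\right\},$$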
(3.8)

for x > 0, and σ, c, γ > 0. The random variable with the PDF in (3.8) is said to follow a four-parameter normal-Weibull{Cauchy} (NW{C}) distribution. A location parameter δ can be included in (3.8) by writing x as (x - δ) leading to a five-parameter distribution.
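
The density (3.8) and its CDF, which follows from (2.3) as $G(x) = \Phi\{(\tan[\pi(F(x) - 1/2)] - \mu)/\sigma\}$ with Φ the standard normal CDF, can be coded directly. This is a minimal sketch with hypothetical function names; it uses `math.erf` for the normal CDF and makes no attempt at careful handling of the extreme tails.

```python
import math

def _angle(x, c, gamma):
    """pi * (F(x) - 1/2) for the Weibull CDF F(x) = 1 - exp(-(x/gamma)^c)."""
    return math.pi * (0.5 - math.exp(-((x / gamma) ** c)))

def nwc_cdf(x, mu, sigma, c, gamma):
    """CDF of the four-parameter NW{C} distribution, from (2.3)."""
    if x <= 0:
        return 0.0
    z = (math.tan(_angle(x, c, gamma)) - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))

def nwc_pdf(x, mu, sigma, c, gamma):
    """PDF of the four-parameter NW{C} distribution, equation (3.8)."""
    if x <= 0:
        return 0.0
    ang = _angle(x, c, gamma)
    weibull_pdf = (c / gamma) * (x / gamma) ** (c - 1) * math.exp(-((x / gamma) ** c))
    z = (math.tan(ang) - mu) / (sigma * math.sqrt(2.0))
    scale = math.sqrt(math.pi) / (sigma * math.sqrt(2.0) * math.cos(ang) ** 2)
    return scale * weibull_pdf * math.exp(-z * z)
```

As a usage example, the four-parameter estimates reported in Table 2 (μ = 0.644, σ = 3.756, c = 3.899, γ = 3.623) can be passed to both functions; a central difference of `nwc_cdf` should then match `nwc_pdf` at interior points.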

Plots of the NW{C} density function for different parameter values are given in Figure 1. The graphs in Figure 1 show that the NW{C} distribution can be right skewed, left skewed, unimodal or bimodal.
Figure 1. The PDF of normal-Weibull{Cauchy} for various values of μ, σ, c and γ.

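Lemma 2: The nth moment of the NW{C} random variable with PDF in (3.8) exists for any μ, σ > 0, c > 0, γ > 0 and satisfies the inequality
$$E(X_g^n) \le \gamma^{n}\, \Gamma(1 + n/c) \left\{4\Phi(1) + \frac{3\sqrt{\pi}\,\sigma}{2\sqrt{2}} \exp\left(-\frac{(1 - \mu)^{2}}{2\sigma^{2}}\right) + \frac{3\pi\mu}{2}\,[1 - \Phi(1)]\right\},$$
(3.9)

where Φ is the CDF of a normal distribution with parameters μ and σ.

Proof: The nth moment for the Weibull random variable is $E(X_f^n) = \gamma^{n}\,\Gamma(1 + n/c)$. The CDF of the standard Cauchy distribution is $P(y) = 1/2 + (1/\pi)\tan^{-1}(y)$, so $1 - P(T) = 1/2 - (1/\pi)\tan^{-1}(T)$, where - ∞ < T < ∞. When T ≤ 1, 1 - P(T) ≥ 1/4 and hence $(1 - P(T))^{-1} \le 4$. When T > 1, using the series $\tan^{-1} T = \dfrac{\pi}{2} + \sum_{n=1}^{\infty} \dfrac{(-1)^{n}}{(2n - 1)\,T^{2n - 1}}$ (Polyanin and Manzhirov 2008),
$$\frac{1}{2} - \frac{1}{\pi}\tan^{-1} T = \frac{1}{\pi}\left[\frac{1}{T} - \frac{1}{3T^{3}} + \frac{1}{5T^{5}} - \frac{1}{7T^{7}} + \cdots + \frac{(-1)^{n + 1}}{(2n - 1)\,T^{2n - 1}} + \cdots\right]$$
$$> \frac{1}{\pi}\left[\frac{1}{T} - \frac{1}{3T^{3}}\right] = \frac{1}{\pi}\cdot\frac{3T^{2} - 1}{3T^{3}} = \frac{1}{\pi}\cdot\frac{2T^{2} + T^{2} - 1}{3T^{3}} > \frac{2}{3\pi T}.$$
Hence, $(1 - P(T))^{-1} < (3\pi/2)\,T$. Since the random variable T with PDF r has a normal distribution with parameters μ and σ, then by using Theorem 1
$$E(X_g^n) \le E(X_f^n)\, E\left[\{1 - P(T)\}^{-1}\right] \le \gamma^{n}\,\Gamma(1 + n/c)\left\{\int_{-\infty}^{1} 4\, r(t)\,dt + \frac{3\pi}{2}\int_{1}^{\infty} t\, r(t)\,dt\right\}$$
$$= \gamma^{n}\,\Gamma(1 + n/c)\left\{\int_{-\infty}^{1} 4\, r(t)\,dt + \frac{3\pi\sigma}{2}\int_{1}^{\infty} \frac{t - \mu}{\sigma}\, r(t)\,dt + \frac{3\pi}{2}\int_{1}^{\infty} \mu\, r(t)\,dt\right\},$$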

and the result in (3.9) follows. □

Normal-Weibull{logistic} distribution:

Setting a = 0 and b = 1 in the PDF of the T-X{logistic} family in Table 1, and taking r to be the PDF of the normal distribution N(μ, σ²), and F to be the CDF of the Weibull distribution, $F(x) = 1 - \exp\{-(x/\gamma)^{c}\}$, the normal-Weibull{logistic} (NW{L}) distribution is obtained as
$$g(x) = \frac{c\,(x/\gamma)^{c - 1} \exp\{(x/\gamma)^{c}\}}{\gamma\sigma\sqrt{2\pi}\,[\exp\{(x/\gamma)^{c}\} - 1]}\, \exp\left\{-\frac{1}{2}\left[\frac{\log(\exp\{(x/\gamma)^{c}\} - 1) - \mu}{\sigma}\right]^{2}\right\}, \quad x > 0, \ \sigma, c, \gamma > 0.$$
Plots of the NW{L} density function for different parameter values are given in Figure 2. The graphs in Figure 2 show that the NW{L} distribution can be reversed J-shaped, skewed to the right, skewed to the left, or bimodal.
Figure 2. The PDF of normal-Weibull{logistic} for various values of μ, σ, c and γ.

Lemma 3: The nth moment of the NW{L} random variable exists for any σ > 0, c > 0, γ > 0 and satisfies the inequality
$$E(X_g^n) \le \gamma^{n}\, \Gamma(1 + n/c)\, \{1 + \exp(\mu + 0.5\sigma^{2})\}.$$
(3.10)

Proof: The nth moment for the Weibull random variable is $E(X_f^n) = \gamma^{n}\,\Gamma(1 + n/c)$. The CDF of the standard logistic random variable is P(y) = exp(y)/{1 + exp(y)}. Since the random variable T has a normal distribution with parameters μ and σ, then $E(\{1 - P(T)\}^{-1}) = E(1 + \exp(T)) = 1 + \exp(\mu + 0.5\sigma^{2})$. By using Theorem 1, the result in (3.10) follows. □
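
The bound (3.10) can be compared with a numerical value of the moment. By property 4 in Section 2, $X_g = F^{-1}(P(T)) = \gamma\{\log(1 + e^{T})\}^{1/c}$ for the NW{L} distribution, so $E(X_g^n)$ is an integral against the normal density. The sketch below approximates that integral with the trapezoidal rule; the function names, grid size, integration width and parameter values are all assumptions, and the routine is only intended for moderate μ and σ (so that `math.exp(t)` does not overflow).

```python
import math

def nwl_moment(n, mu, sigma, c, gamma, steps=4000, width=10.0):
    """Trapezoidal approximation to E(X_g^n) with T ~ N(mu, sigma^2)."""
    lo, hi = mu - width * sigma, mu + width * sigma
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        t = lo + i * h
        dens = math.exp(-0.5 * ((t - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
        # -log(1 - P(t)) = log(1 + e^t) for the standard logistic CDF P
        x = gamma * math.log1p(math.exp(t)) ** (1.0 / c)
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * dens * x ** n
    return total * h

def nwl_bound(n, mu, sigma, c, gamma):
    """Right-hand side of (3.10)."""
    return gamma ** n * math.gamma(1.0 + n / c) * (1.0 + math.exp(mu + 0.5 * sigma ** 2))
```

For example, with the assumed values μ = 0.5, σ = 1, c = 2 and γ = 1.5, `nwl_moment(n, ...)` should not exceed `nwl_bound(n, ...)` for n = 1, 2, 3.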

Weibull-uniform{log-logistic} distribution:

Setting α = β = 1 in the PDF of the T-X{log-logistic} family in (3.1), and taking r to be the PDF of the Weibull distribution, $r(t) = (c/\gamma)(t/\gamma)^{c - 1}\exp\{-(t/\gamma)^{c}\}$, and F to be the CDF of the uniform distribution, F(x) = (x - a)/(b - a), the Weibull-uniform{log-logistic} (WU{LL}) distribution is obtained as
$$g(x) = \frac{c\,(b - a)}{\gamma\,(b - x)^{2}}\left\{\frac{x - a}{\gamma\,(b - x)}\right\}^{c - 1} \exp\left\{-\left[\frac{x - a}{\gamma\,(b - x)}\right]^{c}\right\}, \quad a < x < b, \ c, \gamma > 0.$$
Plots of the WU{LL} density function for different parameter values are given in Figure 3. The graphs in Figure 3 show that the WU{LL} distribution can be reversed J-shaped, skewed to the right, skewed to the left, or bimodal.
Figure 3. The PDF of Weibull-uniform{log-logistic} for various values of c and γ when a = 0 and b = 5.

Lemma 4: The nth moment of the WU{LL} random variable exists for any b > a, c > 0, γ > 0 and satisfies the inequality
$$E(X_g^n) \le \frac{b^{n + 1} - a^{n + 1}}{(n + 1)(b - a)}\, \{1 + \gamma\,\Gamma(1 + 1/c)\}.$$
(3.11)

Proof: The nth moment for the uniform random variable is $E(X_f^n) = \dfrac{b^{n + 1} - a^{n + 1}}{(n + 1)(b - a)}$. The CDF of the standard log-logistic random variable is P(y) = y/(1 + y). Since the random variable T has a Weibull distribution with parameters c and γ, then $E(\{1 - P(T)\}^{-1}) = E(1 + T) = 1 + \gamma\,\Gamma(1 + 1/c)$. By using Theorem 1, the result in (3.11) follows. □

4. The family of T-X{Y} distributions based on survival functions

Instead of using the CDF F in (2.2), one can use the survival function $\bar{F}$ and apply a similar method to generate a new family of distributions in terms of survival functions.

If $\bar{P}$ and $Q_Y$ are the survival and quantile functions of the random variable Y, then $\bar{P}^{-1}(F(x)) = Q_Y(1 - F(x)) = Q_Y(\bar{F}(x))$. A new family T-X{Y} of distributions in terms of the survival function of X is defined as
$$\bar{G}(x) = \int_a^{\bar{P}^{-1}(F(x))} r(t)\,dt = \int_a^{Q_Y(\bar{F}(x))} r(t)\,dt = R\{Q_Y(\bar{F}(x))\}.$$
(4.1)
The corresponding PDF associated with (4.1) is
$$g(x) = \frac{f(x)}{p\{Q_Y(\bar{F}(x))\}}\; r\{Q_Y(\bar{F}(x))\}.$$
(4.2)
The family of life distributions introduced by Marshall and Olkin (1997) can be derived using (4.1) as follows. Taking $Q_Y(\lambda) = a\{\lambda/(1 - \lambda)\}^{1/b}$, the quantile function of the log-logistic distribution, and R(t) = ηt/(1 + ηt), the CDF of the log-logistic distribution with scale parameter 1/η, (4.1) can be written as
$$\bar{G}(x) = \frac{\eta a\,\{\bar{F}(x)/(1 - \bar{F}(x))\}^{1/b}}{1 + \eta a\,\{\bar{F}(x)/(1 - \bar{F}(x))\}^{1/b}} = \frac{\eta a\, \bar{F}^{1/b}(x)}{F^{1/b}(x) + \eta a\, \bar{F}^{1/b}(x)}.$$
(4.3)

Letting ηa = α and 1/b = β, (4.3) becomes $\bar{G}(x) = \dfrac{\alpha\, \bar{F}^{\beta}(x)}{F^{\beta}(x) + \alpha\, \bar{F}^{\beta}(x)}$, which reduces to Marshall-Olkin’s family in (1.2) when β = 1.
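
The reduction to the Marshall-Olkin family can be verified numerically. The sketch below evaluates the survival function built from (4.1) with the log-logistic quantile function and log-logistic R, and compares it with (1.2) when b = 1 and α = ηa. The function names and the numeric values are hypothetical.

```python
def survival_via_tx(Fbar, a, b, eta):
    """Survival function (4.1) with Q_Y(lam) = a (lam/(1-lam))^(1/b) and R(t) = eta t / (1 + eta t)."""
    t = a * (Fbar / (1.0 - Fbar)) ** (1.0 / b)
    return eta * t / (1.0 + eta * t)

def marshall_olkin_survival(Fbar, alpha):
    """Survival function (1.2): alpha Fbar / (1 - (1 - alpha) Fbar)."""
    return alpha * Fbar / (1.0 - (1.0 - alpha) * Fbar)

# Assumed example values: a = 2, eta = 0.75, so alpha = eta * a = 1.5 and b = 1.
example = [(Fbar, survival_via_tx(Fbar, 2.0, 1.0, 0.75), marshall_olkin_survival(Fbar, 1.5))
           for Fbar in (0.1, 0.25, 0.5, 0.75, 0.9)]
```

Each triple in `example` holds a value of the baseline survival function followed by the two survival functions, which should coincide up to rounding.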

The following theorem gives the relation between the moments of the random variables defined in (2.4) and (4.2) when the PDF f is symmetric.

Theorem 3: Let $E_1(X_g^n)$ and $E_2(X_g^n)$ denote the nth moments of the random variables in (2.4) and (4.2) respectively. If f is symmetric, then
$$E_2(X_g^n) = \sum_{i=0}^{n} (-1)^{i} \binom{n}{i} (2m_f)^{n - i}\, E_1(X_g^i),$$

where $m_f$ is the median of the random variable $X_f$ with PDF f.

Proof: The nth moments of (2.4) and (4.2) are $E_1(X_g^n) = \int_0^1 \dfrac{\{F^{-1}(v)\}^{n}}{p(Q_Y(v))}\, r(Q_Y(v))\,dv$ and $E_2(X_g^n) = \int_0^1 \dfrac{\{F^{-1}(1 - u)\}^{n}}{p(Q_Y(u))}\, r(Q_Y(u))\,du$, where the substitutions v = F(x) and u = 1 - F(x) are applied to (2.4) and (4.2) respectively. When f is symmetric, $F^{-1}(1 - u) = 2m_f - F^{-1}(u)$, $u \in [0, 1]$. By using the binomial theorem,
$$E_2(X_g^n) = \int_0^1 \frac{\{F^{-1}(1 - u)\}^{n}}{p(Q_Y(u))}\, r(Q_Y(u))\,du = \sum_{i=0}^{n} (-1)^{i} \binom{n}{i} (2m_f)^{n - i} \int_0^1 \frac{\{F^{-1}(u)\}^{i}}{p(Q_Y(u))}\, r(Q_Y(u))\,du$$
$$= \sum_{i=0}^{n} (-1)^{i} \binom{n}{i} (2m_f)^{n - i}\, E_1(X_g^i).$$

Theorem 4: Let $X_f$ be a non-negative random variable with PDF f, and let $E(X_f^n)$ denote the nth moment of $X_f$. Then
$$E(X_g^n) \le E(X_f^n) \cdot E\left[\{P(T)\}^{-1}\right],$$

where $E(X_g^n)$ is the nth moment of the random variable with density in (4.2), and T is the random variable with PDF r.

Proof: The proof is similar to the proof of Theorem 1 after noting that the relation between the random variables $X_g$ in (4.2) and T is given by $X_g = F^{-1}(\bar{P}(T))$. □

Similar to Theorem 2, Shannon’s entropy $\eta_{X_g}$ for the random variable $X_g$ with PDF in (4.2) is given in the following theorem.

Theorem 5: Let $X_g$ be a random variable with density in (4.2). Shannon’s entropy $\eta_{X_g}$ is given by $\eta_{X_g} = E\{\log q_{X_f}(\bar{P}(T))\} + E\{\log p(T)\} + \eta_T$.

Proof: The proof is similar to that of Theorem 2. □

5. Application

In this section, we apply the NW{C} distribution to fit two data sets. The first data set is the famous Old Faithful Geyser eruption data (n = 272) obtained from Härdle (1991, p. 201). The data are the durations of eruptions (in minutes) recorded from August 1st to August 15th, 1985 (Dekking et al. 2005). The second data set is the USS Halfbeak diesel engine data (n = 71) studied by Ascher and Feingold (1984, p. 75) and Meeker and Escobar (1998, p. 415). The data are the times of unscheduled maintenance actions for the USS Halfbeak number 4 main propulsion diesel engine over 25.518 operating hours (Meeker and Escobar 1998, p. 415).

5.1 The famous Old Faithful Geyser eruption data

As shown in Figure 4, the data has two distinct modes. A common approach for fitting such bimodal data is to use mixture distributions. Arellano-Valle et al. (2010) applied the flexible epsilon-skew-normal distribution to fit the data and their fit is the same as that of the mixture-normal distribution. Four distributions, a four-parameter NW{C} in (3.8), a five-parameter NW{C}, mixture normal, and beta-normal, are fitted to the data by maximum likelihood. Table 2 contains the estimates, standard errors of the estimates, log-likelihood values, AIC, K-S test statistics and the corresponding p-values.
Figure 4. PDFs for the famous Old Faithful Geyser eruption data.

Table 2. The MLEs and goodness-of-fit statistics for the famous Old Faithful Geyser eruption data

| | Five-parameter NW{C} | Mixture-normal^a | Four-parameter NW{C} | Beta-normal |
|---|---|---|---|---|
| MLE (SE^b) | $\hat{\mu}$ = 1.883 (0.447) | $\hat{\mu}_1$ = 4.273 (0.034) | $\hat{\mu}$ = 0.644 (0.288) | $\hat{\mu}$ = 3.163 (0.007) |
| | $\hat{\sigma}$ = 5.156 (0.645) | $\hat{\sigma}_1$ = 0.437 (0.027) | $\hat{\sigma}$ = 3.756 (0.438) | $\hat{\sigma}$ = 0.238 (0.001) |
| | $\hat{c}$ = 2.032 (0.150) | $\hat{\mu}_2$ = 2.019 (0.026) | $\hat{c}$ = 3.899 (0.168) | $\hat{a}$ = 0.083 (0.007) |
| | $\hat{\gamma}$ = 1.885 (0.116) | $\hat{\sigma}_2$ = 0.236 (0.023) | $\hat{\gamma}$ = 3.623 (0.031) | $\hat{b}$ = 0.060 (0.004) |
| | $\hat{\delta}$ = 1.343 (0.060) | $\hat{p}$ = 0.652 (0.029) | | |
| Log-likelihood | -270.0 | -276.35 | -292.3 | -372.566 |
| AIC | 550.0 | 562.7 | 592.6 | 753.1 |
| K-S statistic | 0.042 | 0.049 | 0.075 | 0.151 |
| p-value | 0.712 | 0.539 | 0.096 | 8.908e-06 |

^a Mixture normal is defined as pN(μ1, σ1) + (1 - p)N(μ2, σ2); ^b SE: standard error of the MLE.
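The mixture-normal benchmark defined in footnote a of Table 2 can be evaluated directly from the reported estimates. The following is a minimal sketch, not the authors' code. It assumes that σ1 and σ2 denote standard deviations (the footnote does not say whether N(μ, σ) is parameterized by standard deviation or variance), and the function and variable names are illustrative.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma); sigma is ASSUMED to be a standard deviation."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_normal_pdf(x, p, mu1, sigma1, mu2, sigma2):
    """p * N(mu1, sigma1) + (1 - p) * N(mu2, sigma2), as in footnote a."""
    return p * normal_pdf(x, mu1, sigma1) + (1.0 - p) * normal_pdf(x, mu2, sigma2)


# Estimates copied from the mixture-normal column of Table 2.
MIXTURE_MLE = dict(p=0.652, mu1=4.273, sigma1=0.437, mu2=2.019, sigma2=0.236)


def fitted_density(x):
    """Fitted mixture-normal density of eruption duration x (in minutes)."""
    return mixture_normal_pdf(x, **MIXTURE_MLE)


if __name__ == "__main__":
    for x in (2.0, 3.0, 4.3):
        print(f"x = {x:.1f} min, fitted density = {fitted_density(x):.4f}")
```

Under this reading the fitted density is high near 2 and 4.3 minutes and close to zero near 3 minutes, which is the bimodal shape described for Figure 4.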

The results in Table 2 indicate that the five-parameter NW{C} provides the best fit, followed by the mixture normal, on all three measures: log-likelihood, AIC and K-S statistic. When bimodality is a population characteristic, it may be more appropriate to fit the data with a single distribution than with a mixture of two distributions. The NW{C} distribution can fit a wide variety of distribution shapes well, including bimodal data such as the Old Faithful Geyser eruption data. Figure 4 displays the estimated PDFs of the distributions that provide an adequate fit to the data.
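The AIC values in Table 2 follow from the log-likelihoods and the number of estimated parameters through AIC = 2k - 2 log L. The sketch below recomputes them and ranks the four fits. The parameter counts are read off the MLE rows of Table 2, and the dictionary and function names are illustrative.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 log L (smaller is better)."""
    return 2.0 * n_params - 2.0 * log_likelihood


# (log-likelihood, number of estimated parameters), taken from Table 2.
TABLE2_FITS = {
    "five-parameter NW{C}": (-270.0, 5),
    "mixture normal": (-276.35, 5),
    "four-parameter NW{C}": (-292.3, 4),
    "beta-normal": (-372.566, 4),
}


def rank_by_aic(fits):
    """Return (model, AIC) pairs sorted from best (smallest AIC) to worst."""
    scored = [(name, aic(ll, k)) for name, (ll, k) in fits.items()]
    return sorted(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    for name, value in rank_by_aic(TABLE2_FITS):
        print(f"{name:22s} AIC = {value:.1f}")
```

The ranking agrees with the ordering of the AIC row in Table 2, with the five-parameter NW{C} first and the mixture normal second.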

From Figure 4, the addition of the fifth parameter, which is a measure of location, has some effect on the fit. By using either a likelihood ratio test or the Wald test for the significance of the parameter δ, we observe that the parameter is significantly different from zero. Thus, a five-parameter NW{C} (and not a four-parameter NW{C}) should be used to fit the bimodal data. According to Johnson et al. (1994, p. 12), four-parameter distributions should be sufficient for most practical purposes. The authors went on to state, and we quote, “… but it is doubtful whether the improvement obtained by including a fifth or sixth parameter is commensurate with the extra labor involved”. For this application, adding the fifth parameter to the NW{C} improves the fit, increasing the log-likelihood value by more than 22 points (from -292.3 to -270.0). Furthermore, the fifth parameter δ is significantly different from zero.
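The two tests mentioned above can be illustrated with the values reported in Table 2. The sketch below makes two assumptions that are not spelled out in the text: the four-parameter NW{C} is the five-parameter model restricted to δ = 0, and the usual regularity conditions hold, so that the likelihood ratio statistic is referred to a chi-square distribution with one degree of freedom. The function names are illustrative.

```python
import math


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


def likelihood_ratio_test(loglik_full, loglik_reduced):
    """LR statistic 2 * (l_full - l_reduced) and its 1-df chi-square p-value."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chi2_sf_1df(stat)


def wald_test(estimate, std_error):
    """Wald z statistic for H0: parameter = 0 and its two-sided normal p-value."""
    z = estimate / std_error
    return z, math.erfc(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    # Log-likelihoods of the five- and four-parameter NW{C} fits (Table 2).
    lr_stat, lr_p = likelihood_ratio_test(-270.0, -292.3)
    # Estimate and standard error of delta in the five-parameter fit (Table 2).
    z, wald_p = wald_test(1.343, 0.060)
    print(f"LR statistic = {lr_stat:.1f}, p-value = {lr_p:.2e}")
    print(f"Wald z = {z:.1f}, p-value = {wald_p:.2e}")
```

Both statistics are far beyond conventional critical values (3.84 for the 1-df chi-square test at the 5% level), which is consistent with the conclusion that δ is significantly different from zero.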

5.2 USS Halfbeak diesel engine data

Marciano et al. (2012) used the data to illustrate the application of the McDonald-gamma distribution (Mc-ΓD). The distribution of the data is highly skewed to the left and platykurtic (skewness = -1.576 and kurtosis = 1.653). The MLEs of the parameters of the NW{C} distribution (with the corresponding standard errors in parentheses), the AIC, the log-likelihood value, and the K-S statistic with its p-value are given in Table 3. The values of the AIC, log-likelihood and K-S statistics for the Mc-ΓD and the Kumaraswamy-gamma distribution (Kw-ΓD) are taken from Marciano et al. (2012). The other results in Table 3 are obtained by using the NLMIXED procedure in SAS and the MATLAB software.

Table 3. Parameter estimates (standard errors in parentheses) for the USS Halfbeak diesel engine data

| | Four-parameter NW{C} | McDonald-gamma | Kumaraswamy-gamma |
|---|---|---|---|
| MLE (SE^b) | $\hat{\mu}$ = 6.072 (1.514) | $\hat{\alpha}$ = 99.865 (0.294) | $\hat{\alpha}$ = 6.384 (0.992) |
| | $\hat{\sigma}$ = 4.217 (1.207) | $\hat{\beta}$ = 2.030 (0.022) | $\hat{\beta}$ = 0.1996 (0.040) |
| | $\hat{c}$ = 1.448 (0.161) | $\hat{a}$ = 0.0421 (0.006) | $\hat{b}$ = 2.403 (0.001) |
| | $\hat{\gamma}$ = 10.073 (0.936) | $\hat{b}$ = 200.040 (60.178) | $\hat{c}$ = 0.0013 (0.000) |
| | | $\hat{c}$ = 0.2796 (0.014) | |
| Log-likelihood | -196.45 | -217.35 | -239.2 |
| AIC | 400.9 | 444.7 | 486.4 |
| K-S statistic | 0.0811 | 0.2635 | 0.2849 |
| p-value | 0.739 | 1.045e-04 | 1.974e-05 |

^b SE: standard error of the MLE.

The NW{C} distribution has the smallest AIC and K-S statistic and the largest log-likelihood value, which indicates that the NW{C} fits these data better than the other distributions in Table 3. Figure 5 displays the estimated PDFs of the NW{C}, Mc-Γ and Kw-Γ distributions. The figure shows that, of the three, the NW{C} distribution provides the best fit to the data.

Figure 5. The fitted PDFs for the USS Halfbeak diesel engine data.
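The K-S p-values in Tables 2 and 3 can be approximated from the K-S statistic and the sample size alone. The sketch below uses the asymptotic Kolmogorov series with a common small-sample adjustment of the argument. This is an assumed, generic approximation and not necessarily the method used by the software behind the tables, so it will not reproduce the reported p-values exactly. It also ignores the fact that the parameters were estimated from the same data. The function name is illustrative.

```python
import math


def ks_pvalue_asymptotic(d_stat, n, terms=100):
    """Approximate p-value of a one-sample K-S statistic.

    Uses the Kolmogorov series 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 lam^2)
    with the adjusted argument lam = (sqrt(n) + 0.12 + 0.11 / sqrt(n)) * D.
    """
    root_n = math.sqrt(n)
    lam = (root_n + 0.12 + 0.11 / root_n) * d_stat
    total = 0.0
    for k in range(1, terms + 1):
        total += (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
    # Clamp to [0, 1] to guard against rounding in the alternating sum.
    return min(1.0, max(0.0, 2.0 * total))


if __name__ == "__main__":
    # K-S statistics from Table 3 (n = 71 for the USS Halfbeak data).
    for name, d in (("NW{C}", 0.0811), ("Mc-gamma", 0.2635), ("Kw-gamma", 0.2849)):
        print(f"{name:9s} D = {d:.4f}, approx. p-value = {ks_pvalue_asymptotic(d, 71):.3g}")
```

The approximation gives a large p-value for the NW{C} fit and very small p-values for the two gamma-based fits, which matches the qualitative pattern in Table 3.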

6. Conclusions

This paper presents a method to generate the T-X(W) families of distributions introduced in Alzaatreh et al. (2013b) by defining the W function using the quantile function of another random variable Y. Table 1 contains some T-X{Y} families based on different quantile functions. The T-X{Y} framework provides an easy way of generating distributions of the T-X(W) family. Existing methods for generating univariate continuous distributions, such as the methods of combination reviewed in Lee et al. (2013), can be derived using the T-X{Y} framework. T-X{exponential}, T-X{Weibull} and T-X{Rayleigh} can be viewed as families of distributions arising from hazard functions. The T-X{Y} family is extended by using the survival function of X. The family of life distributions derived by Marshall and Olkin (1997) can be derived using the T-X{Y} family based on survival functions. Three new distributions in the family, the normal-Weibull{Cauchy}, normal-Weibull{logistic} and Weibull-uniform{log-logistic} distributions, are defined. These distributions are very flexible and are capable of fitting various types of data. The Old Faithful Geyser eruption data are used to illustrate that the NW{C} distribution fits bimodal data very well, although such data typically can only be adequately fitted using mixture distributions.

Declarations

Acknowledgments

We are grateful for many constructive comments and suggestions from the associate editor and the two referees. These comments and suggestions have greatly improved the presentation of the paper.

Authors’ Affiliations

(1) Department of Mathematics, Central Michigan University

References

  1. Akinsete A, Famoye F, Lee C: The beta-Pareto distribution. Statistics 2008, 42: 547–563. 10.1080/02331880801983876
  2. Alexander C, Cordeiro GM, Ortega EMM, Sarabia JM: Generalized beta-generated distributions. Computational Statistics and Data Analysis 2012, 56(6):1880–1896. 10.1016/j.csda.2011.11.015
  3. Alshawarbeh A, Lee C, Famoye F: The beta-Cauchy distribution. Journal of Probability and Statistical Science 2012, 10: 41–58.
  4. Alzaatreh A, Famoye F, Lee C: Gamma-Pareto distribution and its applications. Journal of Modern Applied Statistical Methods 2012a, 11(1):78–94.
  5. Alzaatreh A, Lee C, Famoye F: On the discrete analogues of continuous distributions. Statistical Methodology 2012b, 9: 589–603. 10.1016/j.stamet.2012.03.003
  6. Alzaatreh A, Famoye F, Lee C: Weibull-Pareto distribution and its applications. Communications in Statistics-Theory and Methods 2013a, 42: 1673–1691. 10.1080/03610926.2011.599002
  7. Alzaatreh A, Lee C, Famoye F: A new method for generating families of continuous distributions. Metron 2013b, 71(1):63–79. 10.1007/s40300-013-0007-y
  8. Arellano-Valle RB, Cortés MA, Gómez HW: An extension of the epsilon-skew-normal distribution. Communications in Statistics-Theory and Methods 2010, 39(5):912–922. 10.1080/03610920902807903
  9. Ascher H, Feingold H: Repairable Systems Reliability. Marcel Dekker, New York; 1984.
  10. Azzalini A: A class of distributions which includes the normal ones. Scand J Stat 1985, 12: 171–178.
  11. Cordeiro GM, de Castro M: A new family of generalized distributions. J Stat Comput Simul 2011, 81(7):883–898. 10.1080/00949650903530745
  12. Cordeiro GM, Lemonte AJ: The β-Birnbaum-Saunders distribution: an improved distribution for fatigue life modeling. Computational Statistics and Data Analysis 2011, 55(3):1445–1461. 10.1016/j.csda.2010.10.007
  13. Cordeiro GM, Ortega EMM, Nadarajah S: The Kumaraswamy Weibull distribution with application to failure data. J Franklin Inst 2010, 347: 1399–1429. 10.1016/j.jfranklin.2010.06.010
  14. Cordeiro GM, Nadarajah S, Ortega EMM: The Kumaraswamy Gumbel distribution. Statistical Methods and Applications 2012, 21(2):139–168.
  15. Dekking FM, Kraaikamp C, Lopuhaä HP, Meester LE: A Modern Introduction to Probability and Statistics. Springer, New York; 2005.
  16. Eugene N, Lee C, Famoye F: The beta-normal distribution and its applications. Communications in Statistics-Theory and Methods 2002, 31(4):497–512. 10.1081/STA-120003130
  17. Famoye F, Lee C, Eugene N: Beta-normal distribution: bimodality properties and applications. Journal of Modern Applied Statistical Methods 2004, 3(1):85–103.
  18. Ferreira JTAS, Steel MFJ: A constructive representation of univariate skewed distributions. J Am Stat Assoc 2006, 101: 823–829. 10.1198/016214505000001212
  19. Härdle W: Smoothing Techniques with Implementation in S. Springer, New York; 1991.
  20. Johnson NL: Systems of frequency curves generated by methods of translation. Biometrika 1949, 36: 149–176. 10.1093/biomet/36.1-2.149
  21. Johnson NL, Kotz S, Balakrishnan N: Continuous Univariate Distributions, Vol. 1. 2nd edition. John Wiley and Sons, Inc., New York; 1994.
  22. Jones MC: Families of distributions arising from distributions of order statistics. Test 2004, 13: 1–43. 10.1007/BF02602999
  23. Jones MC: Kumaraswamy’s distribution: a beta-type distribution with tractability advantages. Statistical Methodology 2009, 6: 70–81. 10.1016/j.stamet.2008.04.001
  24. Kotz S, Vicari D: Survey of developments in the theory of continuous skewed distributions. Metron 2005, LXIII: 225–261.
  25. Kumaraswamy P: A generalized probability density function for double-bounded random processes. J Hydrol 1980, 46: 79–88. 10.1016/0022-1694(80)90036-0
  26. Lai CD: Constructions and applications of lifetime distributions. Appl Stoch Model Bus Ind 2013, 29: 127–140. 10.1002/asmb.948
  27. Lee C, Famoye F, Alzaatreh A: Methods for generating families of univariate continuous distributions in the recent decades. WIREs Computational Statistics 2013, 5: 219–238. 10.1002/wics.1255
  28. Marciano FWP, Nascimento ADC, Santos-Neto M, Cordeiro GM: The Mc-Γ distribution and its statistical properties: an application to reliability data. International Journal of Statistics and Probability 2012, 1(1):53–71.
  29. Marshall AW, Olkin I: A new method for adding a parameter to a family of distributions with applications to the exponential and Weibull families. Biometrika 1997, 84: 641–652. 10.1093/biomet/84.3.641
  30. Marshall AW, Olkin I: Life Distributions. Springer, New York; 2010.
  31. McDonald JB: Some generalized functions for the size distribution of income. Econometrica 1984, 52: 647–663. 10.2307/1913469
  32. McDonald JB, Xu YJ: A generalization of the beta distribution with applications. J Econ 1995, 66: 133–152. 10.1016/0304-4076(94)01612-4
  33. Meeker WQ, Escobar LA: Statistical Methods for Reliability Data. John Wiley & Sons, New York; 1998.
  34. Pearson K: Contributions to the mathematical theory of evolution. II. Skew variation in homogeneous material. Philos Trans R Soc Lond A 1895, 186: 343–414. 10.1098/rsta.1895.0010
  35. Polyanin AD, Manzhirov A: Handbook of Integral Equations. 2nd edition. Chapman & Hall/CRC, New York; 2008.
  36. Shannon CE: A mathematical theory of communication. Bell System Technical Journal 1948, 27: 379–423. 10.1002/j.1538-7305.1948.tb01338.x
  37. Shorack GR, Wellner JA: Empirical Processes with Applications to Statistics. John Wiley & Sons, New York; 1986.
  38. Tukey JW: The Practical Relationship Between the Common Transformations of Percentages of Counts and Amounts. Technical Report 36. Statistical Techniques Research Group, Princeton University, Princeton, NJ; 1960.

Copyright

© Aljarrah et al.; licensee Springer. 2014

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.