# A new extended normal regression model: simulations and applications

## Abstract

Various applications in natural science require models more accurate than well-known distributions. In this context, several generators of distributions have been recently proposed. We introduce a new four-parameter extended normal (EN) distribution, which can provide better fits than the skew-normal and beta normal distributions, as shown empirically in two applications to real data. We present Monte Carlo simulations to investigate the effectiveness of the EN distribution using the Kullback-Leibler divergence criterion. The classical regression model is not recommended for most practical applications because it oversimplifies real-world problems. We propose an EN regression model and show its usefulness in practice by comparing it with other regression models. We adopt the maximum likelihood method for estimating the parameters of both the proposed distribution and the regression model.

## Introduction

In recent years, several methods for generating new models from classic distributions have been proposed. A detailed study about “the evolution of methods for generalizing classic distributions” was made by Lee et al. (2013). A generalization of the standard normal distribution is sought because it can provide more accurate statistical models and inferential procedures. For instance, the beta normal distribution was pioneered by Eugene et al. (2002), who discussed some of its structural properties.

Additionally, the beta generalized normal (BGN) distribution was proposed by Cintra et al. (2013) to extend the beta normal distribution. They applied the BGN model to synthetic aperture radar image processing. This paper presents a new extended normal (EN) distribution based on the family introduced by Cordeiro et al. (2013).

For any continuous cumulative distribution function (cdf) G(x), Cordeiro et al. (2013) defined the cdf of the exponentiated generalized (EG) family by

$$\begin{array}{@{}rcl@{}} F(x) = \left[1- \left\{1 - G(x)\right\}^{a}\right]^{b}, \end{array}$$
(1)

where a>0 and b>0 are two additional shape parameters whose role is to generate distributions with heavier/lighter tails and provide wider ranges for skewness and kurtosis. These parameters are introduced to make the resulting distribution more flexible.

Because of its tractable cdf (1), the EG family can be used quite effectively even if the data are censored. This family can yield univariate models for any type of support. Further, it allows for greater flexibility in the tails and can be applied in many areas such as engineering and biology.

Its probability density function (pdf) has a very simple form

$$f(x) =a\,b\,\left\{1 - G(x)\right\}^{a-1}\,\left[1-\left\{1- G(x)\right\}^{a}\right]^{b -1}\,g(x).$$
(2)

An important advantage of the density (2) is its ability to fit skewed data that often cannot be fitted well by existing distributions. Based on the cdf G(x) and pdf g(x) of any baseline G distribution, we can associate the EG-G pdf (2) with two extra parameters. The EG family can be used for discriminating between the G and EG-G distributions.

The baseline distribution G(x) is a special case of (2) when a=b=1. For a=1, it gives the exponentiated-G (“Exp-G”) class. If b=1, we obtain the Lehmann type II-G (LTII-G) class. Eq. (2) generalizes both Lehmann types I and II alternative classes (Lehmann 1953). In fact, this equation can be defined as the exponentiated generator applied to the LTII-G class.

Note that even if g(x) is a symmetric density, the density f(x) will not be symmetric. The cdf (1) has tractable properties especially for simulations, since its quantile function (qf) has a simple form

$$x\,=\,Q_{G}\left\{\,\left[\,1\,-\,\left(1\,-\,u^{\frac 1b}\right)^{\frac 1a}\,\right]\,\right\},$$

where $$Q_{G}(u)$$ is the baseline qf.
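The quantile function above is what makes simulation from the EG family straightforward. The following minimal sketch, which assumes a standard normal baseline purely for illustration, evaluates the EG cdf (1) and the quantile function and draws values by inverse transform. The function names `eg_cdf`, `eg_quantile` and `eg_sample` are hypothetical and do not come from the paper.

```python
import random
from statistics import NormalDist

def eg_cdf(G, x, a, b):
    """EG cdf (1): F(x) = [1 - {1 - G(x)}^a]^b for a baseline cdf G."""
    return (1.0 - (1.0 - G(x)) ** a) ** b

def eg_quantile(QG, u, a, b):
    """EG quantile: x = Q_G(1 - (1 - u^(1/b))^(1/a)) for a baseline qf Q_G."""
    return QG(1.0 - (1.0 - u ** (1.0 / b)) ** (1.0 / a))

def eg_sample(QG, n, a, b, seed=0):
    """Inverse-transform sampling: plug uniform draws into the EG quantile."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        # keep u strictly inside (0, 1) so the baseline qf is well defined
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        draws.append(eg_quantile(QG, u, a, b))
    return draws

base = NormalDist()      # assumed baseline: standard normal
a, b = 2.0, 3.0          # illustrative shape parameters
for u in (0.1, 0.5, 0.9):
    x = eg_quantile(base.inv_cdf, u, a, b)
    # the second printed value should reproduce u up to rounding
    print(u, eg_cdf(base.cdf, x, a, b))
```

Any other baseline can be used by passing its own cdf and quantile function in place of `base.cdf` and `base.inv_cdf`.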

This paper is outlined as follows. In Section 2, we define the EN distribution and provide plots of its density function. A linear representation for the EN density function is derived in Section 3. We obtain an explicit expression for its moments in Section 4. In Section 5, we provide the maximum likelihood estimates (MLEs) of the parameters. In Section 6, we define the EN regression model and discuss the estimation of the model parameters. In Section 7, we perform some simulations and present three applications to real data sets. Finally, some concluding remarks are addressed in Section 8.

## The EN distribution

Due to the analytical tractability of its pdf and its importance in asymptotic theory (such as the central limit theorem and delta method), the normal distribution is the most popular model in applications to real data with support in $$\mathbb {R}$$.

When the number of observations is large, it can serve as an approximate distribution for several other models. The normal N (μ,σ) pdf (for $$x \in \mathbb {R}$$) is

$$\begin{array}{@{}rcl@{}} g(x;\mu,\sigma)\,=\,\frac{1}{\sqrt{2\,\pi}\,\sigma}\,\mathrm{e}^{-\frac 12\left(\frac{x-\mu}{\sigma}\right)^{2}} \,=\,\frac{1}{\sigma}\phi\left(\frac{x-\mu}{\sigma}\right), \end{array}$$
(3)

where $$\mu \in \mathbb {R}$$ is a mean parameter, σ>0 is a scale parameter and $$\phi (x)\,=\,(2\pi)^{-1/2}\,\mathrm {e}^{-x^{2}/2}$$ is the standard normal pdf.

Its cdf has the form

$$\begin{array}{@{}rcl@{}} G(x;\mu,\sigma)\,=\,\int_{-\infty}^{x}\,g(t;\mu,\sigma)\,\mathrm{d}t\,=\,\Phi\left(\frac{x-\mu}{\sigma}\right), \end{array}$$
(4)

where $$\Phi (x)\,=\,\int _{-\infty }^{x}\,\phi (t)\,\mathrm {d}t$$ is the standard normal cdf.

By inserting (3) and (4) in Eqs. (1) and (2), the cdf and pdf of the EN distribution (for $$x \in \mathbb {R}$$) can be expressed, respectively, as

$$\begin{array}{@{}rcl@{}} F(x)\,=\,\left[\,1\,-\, \left\{1 - \Phi\left(\frac{x-\mu}{\sigma}\right)\right\}^{a}\,\right]^{b} \end{array}$$
(5)

and

$$\begin{array}{@{}rcl@{}} f(x)&=&\frac{a\,b}{\sigma}\,\left\{1 - \Phi\left(\frac{x-\mu}{\sigma}\right)\right\}^{a-1}\,\left[1-\left\{1- \Phi\left(\frac{x-\mu}{\sigma}\right)\right\}^{a}\right]^{b -1}\\ &&\times\phi\left(\frac{x-\mu}{\sigma}\right). \end{array}$$
(6)
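Equations (5) and (6) can be checked numerically. The sketch below is one possible implementation using only the Python standard library. It assembles the density on the log scale to avoid underflow in the tails, and uses a simple trapezoidal rule to confirm that (6) integrates to one. The helper names (`log_surv`, `en_cdf`, `en_pdf`, `trapezoid`) and the parameter values are illustrative assumptions.

```python
import math

SQRT2 = math.sqrt(2.0)
LOG_SQRT_2PI = 0.5 * math.log(2.0 * math.pi)

def log_surv(z):
    """log{1 - Phi(z)}, written to stay accurate in both tails."""
    if z > 0.0:
        return math.log(0.5 * math.erfc(z / SQRT2))
    return math.log1p(-0.5 * math.erfc(-z / SQRT2))

def en_cdf(x, a, b, mu=0.0, sigma=1.0):
    """EN cdf, Eq. (5)."""
    z = (x - mu) / sigma
    return (-math.expm1(a * log_surv(z))) ** b   # [1 - {1 - Phi(z)}^a]^b

def en_pdf(x, a, b, mu=0.0, sigma=1.0):
    """EN pdf, Eq. (6), assembled on the log scale."""
    z = (x - mu) / sigma
    ls = log_surv(z)
    log_f = (math.log(a * b / sigma)
             + (a - 1.0) * ls
             + (b - 1.0) * math.log(-math.expm1(a * ls))
             - 0.5 * z * z - LOG_SQRT_2PI)
    return math.exp(log_f)

def trapezoid(f, lo, hi, panels=4000):
    """Composite trapezoidal rule on [lo, hi]."""
    h = (hi - lo) / panels
    inner = sum(f(lo + i * h) for i in range(1, panels))
    return h * (0.5 * (f(lo) + f(hi)) + inner)

# Illustrative parameters; the range covers mu +/- 12 sigma.
a, b, mu, sigma = 2.0, 3.0, 1.0, 2.0
total = trapezoid(lambda x: en_pdf(x, a, b, mu, sigma), mu - 12 * sigma, mu + 12 * sigma)
print("integral of the EN pdf:", total)
```

Setting `a = b = 1` in `en_pdf` returns the normal density (3), which gives a quick sanity check of the implementation.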

Hereafter, a random variable X having density (6) is denoted by X∼EN(a,b,μ,σ). This density does not involve any complicated function, and the normal distribution arises as the basic exemplar when a=b=1, which is a positive point of the current generalization. Moreover, the qf of X is

$$\begin{array}{@{}rcl@{}} Q_{\text{EN}}(p) \,=\, \mu\,+\,\sigma\,\Phi^{-1}\left(1\,-\,\left[1\,-\,p^{\frac{1}{b}}\right]^{\frac{1}{a}}\right). \end{array}$$

The mean of X can be expressed as

$$m=\,E(X)\,=\,\sigma\,a\,b\,\mathcal{I}_{a,b}\,+\,\mu,$$

where

$$\mathcal{I}_{a,b}\,=\, \int_{-\infty}^{\infty} z\,\phi(z)\,[1\,-\,\Phi(z)]^{a-1}\,\left\{1\,-\,[1\,-\,\Phi(z)]^{a}\right\}^{b-1}\,\mathrm{d}z.$$

In the next sections, other moment results are derived. Moreover, from the previous qf of the EN distribution, the associated median, say M, is

$$M=\,Q_{\text{EN}}(1/2)\,=\,\sigma\,z_{a,b}\,+\,\mu,$$

where $$z_{a,b}=\Phi^{-1}\left(1-\left[1-2^{-1/b}\right]^{1/a}\right)$$ is the standard normal quantile at $$1-\left[1-2^{-1/b}\right]^{1/a}$$. Thus, comparing the median with the mean suggests the following classification of asymmetry:

$$\left\{ \begin{array}{l} \text{right asymmetry}, \quad \text{if }z_{a,b} > a\,b\,\mathcal{I}_{a,b}\\ \text{symmetry}, \quad \text{if }z_{a,b} = a\,b\,\mathcal{I}_{a,b}\\ \text{left asymmetry}, \quad \text{if }z_{a,b} < a\,b\,\mathcal{I}_{a,b}. \end{array} \right.$$
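The two quantities compared in this classification are easy to evaluate numerically. The sketch below, restricted to the standardized case μ=0 and σ=1, approximates $$a\,b\,\mathcal{I}_{a,b}$$ by a trapezoidal rule and obtains $$z_{a,b}$$ from the standard normal quantile function. The function names and the integration range [-12, 12] are assumptions made for illustration.

```python
import math
from statistics import NormalDist

STD = NormalDist()
SQRT2 = math.sqrt(2.0)

def log_surv(z):
    """log{1 - Phi(z)}, accurate in both tails."""
    if z > 0.0:
        return math.log(0.5 * math.erfc(z / SQRT2))
    return math.log1p(-0.5 * math.erfc(-z / SQRT2))

def integrand_I(z, a, b):
    """z * phi(z) * [1 - Phi(z)]^(a-1) * {1 - [1 - Phi(z)]^a}^(b-1)."""
    ls = log_surv(z)
    log_w = (-0.5 * z * z - 0.5 * math.log(2.0 * math.pi)
             + (a - 1.0) * ls + (b - 1.0) * math.log(-math.expm1(a * ls)))
    return z * math.exp(log_w)

def en_std_mean(a, b, lo=-12.0, hi=12.0, panels=4000):
    """a * b * I_{a,b}: the EN mean when mu = 0 and sigma = 1."""
    h = (hi - lo) / panels
    inner = sum(integrand_I(lo + i * h, a, b) for i in range(1, panels))
    integral = h * (0.5 * (integrand_I(lo, a, b) + integrand_I(hi, a, b)) + inner)
    return a * b * integral

def en_std_median(a, b):
    """z_{a,b}: the EN median when mu = 0 and sigma = 1."""
    return STD.inv_cdf(1.0 - (1.0 - 0.5 ** (1.0 / b)) ** (1.0 / a))

for a, b in [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0)]:
    print(a, b, "mean:", en_std_mean(a, b), "median:", en_std_median(a, b))
```

For general μ and σ, both quantities are multiplied by σ and shifted by μ, so the comparison between them is unchanged.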

We motivate the paper by comparing the performances of the EN, normal, skew-normal (SN) and beta-normal (BN) models fitted to two real data sets. Figure 1 displays possible shapes of the density function (6) for some parameter values. We can note the flexibility of the EN distribution with respect to the normal distribution.

## Linear representation

A useful linear representation for (2) can be derived using the concept of exponentiated distributions. For an arbitrary baseline cdf G(x), a random variable T is said to have the exponentiated-G (Exp-G) distribution with power parameter a>0, say T∼Exp-G(a), if its cdf and pdf are

$$\begin{array}{@{}rcl@{}} H_{a}(x)\,=\,G^{a}(x)\,\,\,\,\text{and}\,\,\,\, h_{a}(x)\,=\,a\,g(x)\,G^{a-1}(x), \end{array}$$

respectively. Several properties of exponentiated distributions have been studied recently, such as those for the exponentiated Weibull (Mudholkar and Srivastava 1993) and exponentiated generalized gamma (Cordeiro et al. 2013) distributions.

### Theorem 1

Let X∼EN(a,b,μ,σ). The pdf of X can be written as

$$\begin{array}{@{}rcl@{}} f(x) = \sum_{j=0}^{\infty} \,w_{j+1}\,h_{j+1}(x), \end{array}$$
(7)

where the weights $$w_{j+1}$$ are defined in Appendix A and $$h_{j+1}(x)$$ is the exponentiated-normal (Exp-N) density with power parameter j+1, say Exp-N(μ,σ,j+1), namely

$$\begin{array}{@{}rcl@{}} h_{j+1}(x)=\frac{(j+1)}{\sigma}\,\phi\left(\frac{x-\mu}{\sigma}\right)\,\Phi\left(\frac{x-\mu}{\sigma}\right)^{j}. \end{array}$$

The proof of this theorem is given in Appendix A.

It is possible to verify using symbolic software (such as Maple) that $$\sum _{j=0}^{\infty } \,w_{j+1}=1$$ as expected.

Equation (7) is the main result of this section. It reveals that the EN density is a linear combination of Exp-N densities. So, several mathematical properties of the proposed distribution can then be obtained from those of the Exp-N distribution using previous results given by Rêgo et al. (2012).

## Moments

First, we determine the probability weighted moments (PWMs) of the standard normal distribution since they are required for the ordinary moments of the EN distribution. The standard normal PWMs are defined by

$$\tau_{n,j}\,=\,\int_{-\infty}^{\infty}\,z^{n}\,\Phi(z)^{j}\,\phi(z)\,\mathrm{d}z,$$

for n≥0 and j≥0 integers.
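Before turning to closed-form expressions, the PWMs can be approximated directly from their definition. The sketch below uses a trapezoidal rule on [-12, 12]. The function name `pwm` and the integration settings are illustrative assumptions. It checks three values that follow from elementary facts: $$\tau_{0,j}=1/(j+1)$$, $$\tau_{2,0}=1$$ and $$\tau_{1,1}=1/(2\sqrt{\pi})$$.

```python
import math
from statistics import NormalDist

STD = NormalDist()

def pwm(n, j, lo=-12.0, hi=12.0, panels=4000):
    """Approximate tau_{n,j} = int z^n Phi(z)^j phi(z) dz by the trapezoidal rule."""
    def f(z):
        return z ** n * STD.cdf(z) ** j * STD.pdf(z)
    h = (hi - lo) / panels
    inner = sum(f(lo + i * h) for i in range(1, panels))
    return h * (0.5 * (f(lo) + f(hi)) + inner)

print("tau_{0,3}:", pwm(0, 3), "expected", 1.0 / 4.0)
print("tau_{2,0}:", pwm(2, 0), "expected", 1.0)
print("tau_{1,1}:", pwm(1, 1), "expected", 1.0 / (2.0 * math.sqrt(math.pi)))
```

Such direct quadrature gives an independent check on any implementation of the series expression for $$\tau_{n,j}$$.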

The following identity, which expresses the standard normal cdf through the error function, holds

$$\Phi(z)=\frac{1}{2}\left\{1+ \text{erf}\left(\frac{z}{\sqrt{2}}\right)\right\},\quad z \in \mathbb{R}.$$

Applying the binomial expansion and interchanging terms gives

$$\tau_{n,j}=\frac{1}{2^{j}\sqrt{2\pi}}\,\sum_{m=0}^{j} {j \choose m}\,\int_{-\infty}^{\infty} z^{n}\, \text{erf}\left(\frac{z}{\sqrt 2}\right)^{j-m}\,\exp\left(-\frac{z^{2}}{2}\right)dz.$$

Based on the power series for the error function

$$\text{erf}(z)=\frac{2}{\sqrt{\pi}}\sum_{r=0}^{\infty} \frac{(-1)^{r} z^{2r+1}}{(2r+1)\,r!},$$

we can obtain $$\tau_{n,j}$$ from Eqs. (9)–(11) given by Nadarajah (2008).

For n+j−r even, we have

$$\begin{array}{@{}rcl@{}} \tau_{n,j}&=& 2^{n/2}\,\pi^{-(j+1/2)} \sum\limits_{\overset{r=0}{(n+j-r)\,\text{even}}}^{j}\, {j \choose r}\, \left(\frac{\pi}{2}\right)^{r}\,\Gamma\left(\frac{n+j-r+1}{2}\right)\times \\ && F_{A}^{(j-r)} \left(\frac{n+j-r+1}{2};\frac{1}{2},\ldots,\frac{1}{2};\frac{3}{2},\ldots,\frac{3}{2};-1,\ldots,-1 \right), \end{array}$$
(8)

where $$F_{A}^{(j-r)}(\cdot)$$ is the Lauricella function of type A; see, for example, Exton (1978). If n+j−r is odd, the corresponding terms in $$\tau_{n,j}$$ vanish.

### Corollary 1

Suppose that $$\mu _{n}^{\prime }= E(X^{n})$$ exists. Then,

$$\begin{array}{@{}rcl@{}} \mu_{n}^{\prime}=\mathrm{E}(X^{n})=\sum_{j=0}^{\infty} (j+1)\,w_{j+1}\,\tau_{n,j}, \end{array}$$
(9)

where $$\tau_{n,j}$$ is given by (8).

The skewness and kurtosis of X can be computed from $$Q_{\text{EN}}(p)$$ using the well-known Bowley and Moors measures. Figure 2 displays plots of these measures for selected values of a and b. We note that the skewness and kurtosis values for the normal distribution are obtained when (a,b) tends to (1,1).
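The quantile-based measures can be computed directly from the EN quantile function. The sketch below assumes the usual definitions, namely the Bowley skewness $$[Q(3/4)+Q(1/4)-2Q(1/2)]/[Q(3/4)-Q(1/4)]$$ and the Moors kurtosis $$[Q(7/8)-Q(5/8)+Q(3/8)-Q(1/8)]/[Q(6/8)-Q(2/8)]$$. The function names are hypothetical.

```python
from statistics import NormalDist

STD = NormalDist()

def en_quantile(p, a, b, mu=0.0, sigma=1.0):
    """EN quantile function Q_EN(p)."""
    return mu + sigma * STD.inv_cdf(1.0 - (1.0 - p ** (1.0 / b)) ** (1.0 / a))

def bowley_skewness(q):
    """Bowley skewness from a quantile function q(p) (assumed standard definition)."""
    return (q(0.75) + q(0.25) - 2.0 * q(0.5)) / (q(0.75) - q(0.25))

def moors_kurtosis(q):
    """Moors kurtosis from a quantile function q(p) (assumed standard definition)."""
    return (q(7 / 8) - q(5 / 8) + q(3 / 8) - q(1 / 8)) / (q(6 / 8) - q(2 / 8))

# Illustrative (a, b) pairs; (1, 1) is the normal case.
for a, b in [(1.0, 1.0), (2.0, 1.0), (0.5, 3.0)]:
    q = lambda p, a=a, b=b: en_quantile(p, a, b)
    print(a, b, "Bowley:", bowley_skewness(q), "Moors:", moors_kurtosis(q))
```

Both measures are invariant to location and scale, so μ and σ can be left at their defaults when exploring the effect of a and b.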

## Estimation

Consider a random variable X∼EN(a, b, μ, σ) and let $$\boldsymbol{\theta}=(a, b, \mu, \sigma)^{\top}$$ be the vector of model parameters, where $$(\cdot)^{\top}$$ is the transposition operator. Thus, the associated log-likelihood function for one observation x is

$$\begin{array}{@{}rcl@{}} \ell(\boldsymbol{\theta};x)&\,=\,\log(a)\,+\,\log(b)\,-\,\log(\sigma) \,+\,(a\,-\,1)\,\log\left[1\,-\,\Phi\left(\frac{x\,-\,\mu}{\sigma}\right)\right] \\ &\,+\,(b\,-\,1)\log\left\{\,1\,-\,\left[\,1\,-\,\Phi\left(\frac{x\,-\,\mu}{\sigma}\right)\,\right]^{a}\right\} \,+\,\log\left[\phi\left(\frac{x\,-\,\mu}{\sigma}\right)\right]. \end{array}$$
(10)

Given a data set $$x_{1},\ldots,x_{n}$$, the MLE of θ is determined by maximizing $$\ell _{n}(\boldsymbol {\theta })\,=\,\sum _{i=1}^{n}\,\ell (\boldsymbol {\theta };x_{i}).$$

Based on Eq. (10), the score vector is

$$\begin{array}{@{}rcl@{}} \boldmath{U}_{\theta}&=&(\,U_{a},\,U_{b},\,U_{\mu},\,U_{\sigma}\,)^{\top}\\ &=&\left(\,\frac{\partial\,\ell_{n}(\boldsymbol{\theta})}{\partial a}, \,\,\,\frac{\partial\,\ell_{n}(\boldsymbol{\theta})}{\partial b}, \,\,\,\frac{\partial\,\ell_{n}(\boldsymbol{\theta})}{\partial \mu}, \,\,\,\frac{\partial\,\ell_{n}(\boldsymbol{\theta})}{\partial \sigma} \,\right)^{\top}, \end{array}$$

whose components are

$$\begin{array}{@{}rcl@{}} U_{a} &=& \frac na\,+\,\sum_{i=1}^{n}\,\log\left[1\,-\,\Phi\left(\frac{x_{i}\,-\,\mu}{\sigma}\right)\right]\\ &&-\,(b\,-\,1)\,\sum_{i=1}^{n}\, \left[ \frac{\left[1\,-\,\Phi\left(\frac{x_{i}\,-\,\mu}{\sigma}\right)\right]^{a}\log\left[1\,-\,\Phi\left(\frac{x_{i}\,-\,\mu}{\sigma}\right) \right]}{1\,-\,\left[\,1\,-\,\Phi\left(\frac{x_{i}\,-\,\mu}{\sigma}\right)\,\right]^{a}}\right], \end{array}$$
$$U_{b}\,=\,\frac nb \,+\,\sum_{i=1}^{n}\,\log\left\{1\,-\,\left[\,1\,-\,\Phi\left(\frac{x_{i}\,-\,\mu}{\sigma}\right)\,\right]^{a}\right\},$$
$$\begin{array}{@{}rcl@{}} U_{\mu}&=&\left(\frac{a\,-\,1}{\sigma}\right)\,\sum_{i=1}^{n}\,\left[\frac{\phi\left(\frac{x_{i}\,-\,\mu}{\sigma}\right) }{1\,-\,\Phi\left(\frac{x_{i}\,-\,\mu}{\sigma}\right)}\right] \,-\,\frac{1}{\sigma}\,\sum_{i=1}^{n}\,\left[\frac{\phi^{\prime}\left(\frac{x_{i}\,-\,\mu}{\sigma}\right)}{\phi\left(\frac{x_{i}\,-\,\mu}{\sigma}\right)}\right]\\ &&-\frac{a\,(b-1)}{\sigma}\, \sum_{i=1}^{n}\,\left\{ \frac{ \phi\left(\frac{x_{i}\,-\,\mu}{\sigma}\right) \left[1\,-\,\Phi\left(\frac{x_{i}\,-\,\mu}{\sigma}\right)\right]^{a-1} }{ 1\,-\,\left[1\,-\,\Phi\left(\frac{x_{i}\,-\,\mu}{\sigma}\right)\right]^{a} }\right\}\\ \end{array}$$

and

$$\begin{array}{@{}rcl@{}} U_{\sigma}&=&-\frac n\sigma\,+\,\frac{(a-1)}{\sigma^{2}}\, \sum_{i=1}^{n}\,(x_{i}\,-\,\mu)\left[\frac{\phi\left(\frac{x_{i}\,-\,\mu}{\sigma}\right) }{1\,-\,\Phi\left(\frac{x_{i}\,-\,\mu}{\sigma}\right)}\right] \,-\,\frac{1}{\sigma^{2}}\,\sum_{i=1}^{n}\,(x_{i}\,-\,\mu)\left[\frac{\phi^{\prime}\left(\frac{x_{i}\,-\,\mu}{\sigma}\right)} {\phi\left(\frac{x_{i}\,-\,\mu}{\sigma}\right)}\right]\\ &&-\frac{a\,(b-1)}{\sigma^{2}}\, \sum_{i=1}^{n}\,(x_{i}\,-\,\mu)\left\{ \frac{ \phi\left(\frac{x_{i}\,-\,\mu}{\sigma}\right) \left[1\,-\,\Phi\left(\frac{x_{i}\,-\,\mu}{\sigma}\right)\right]^{a-1} }{ 1\,-\,\left[1\,-\,\Phi\left(\frac{x_{i}\,-\,\mu}{\sigma}\right)\right]^{a} }\right\}. \end{array}$$

An advantage of the EN distribution is that the MLE $$\widehat {b}$$ has a partially closed-form expression. Suppose that the observed information matrix is non-negative definite. The MLE of b can be expressed in terms of the MLEs $$\widehat {a},\widehat {\mu }$$ and $$\widehat {\sigma }$$ as

$$\begin{array}{@{}rcl@{}} \widehat{b}&=&\varphi(\widehat{a},\widehat{\mu},\widehat{\sigma},\left\{x_{1},\,\ldots,x_{n}\right\})\\ &=&-\left(\, n^{-1}\,\sum_{i=1}^{n}\,\log\left\{1\,-\,\left[\,1\,-\,\Phi\left(\frac{x_{i}\,-\,\widehat{\mu}}{\widehat{\sigma}}\right)\,\right]^{\widehat{a}}\right\} \, \right)^{-1}. \end{array}$$
(11)

This fact is important at least for two reasons. The estimates become the solutions of a system with three equations and three variables (say “(3,3) system”) instead of a (4,4) system. Further, Eq. (11) clarifies the relationship of $$\widehat {b}$$ with $$\widehat {a}$$, $$\widehat {\mu }$$ and $$\widehat {\sigma }$$. More details are described in the simulation section.
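The partially closed form of the estimate of b can be illustrated with simulated data. Setting $$U_{b}=0$$ gives $$\widehat{b}=-n\big/\sum_{i}\log\{1-[1-\Phi(z_{i})]^{a}\}$$, which is positive because each logarithm is negative. The sketch below draws an EN sample by inverse transform and evaluates this expression at the true values of a, μ and σ. In practice these would be replaced by their estimates. All names, the seed and the parameter values are illustrative assumptions.

```python
import math
import random
from statistics import NormalDist

STD = NormalDist()
SQRT2 = math.sqrt(2.0)

def log_surv(z):
    """log{1 - Phi(z)}, accurate in both tails."""
    if z > 0.0:
        return math.log(0.5 * math.erfc(z / SQRT2))
    return math.log1p(-0.5 * math.erfc(-z / SQRT2))

def en_sample(n, a, b, mu, sigma, seed=0):
    """Draw n EN(a, b, mu, sigma) values by inverting the cdf."""
    rng = random.Random(seed)
    xs = []
    for _ in range(n):
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep u inside (0, 1)
        xs.append(mu + sigma * STD.inv_cdf(1.0 - (1.0 - u ** (1.0 / b)) ** (1.0 / a)))
    return xs

def b_hat(xs, a, mu, sigma):
    """Solve U_b = 0 for b, given a, mu and sigma."""
    total = sum(math.log(-math.expm1(a * log_surv((x - mu) / sigma))) for x in xs)
    return -len(xs) / total

# Hypothetical true parameters used only for this illustration.
a0, b0, mu0, sigma0 = 1.5, 2.0, 0.0, 1.0
xs = en_sample(5000, a0, b0, mu0, sigma0, seed=1)
print("b_hat given the true (a, mu, sigma):", b_hat(xs, a0, mu0, sigma0))
```

With `b_hat` available, a numerical optimizer only needs to search over (a, μ, σ), which corresponds to the (3,3) system mentioned above.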

Additionally, in order to make inference on the model parameters, the total observed information matrix is $$J(\boldsymbol{\theta})=\{-U_{rs}\}$$, where $$U_{rs}=\partial^{2}\ell_{n}(\boldsymbol{\theta})/\partial\theta_{r}\,\partial\theta_{s}$$, for $$r,s \in \{a,b,\mu,\sigma\}$$. By differentiating the score function, we obtain the Hessian matrix elements $$U_{rs}$$ given in Appendix B.

## The EN regression model

The classical normal linear regression model is usually applied in science and engineering to describe symmetrical data for which linear functions of unknown parameters are used to explain the phenomena under study. However, it is well known that several phenomena are not always in agreement with the classical regression model due to lack of symmetry and/or the presence of heavy or light tails in the empirical distribution. As an alternative to overcome this shortcoming, we propose a new regression model based on the EN distribution, thus extending the normal linear regression.

Let $$\mathbf{v}_{i}=(v_{i1},\ldots,v_{ip})^{\top}$$ be the p×1 explanatory variable vector associated with the ith response variable $$x_{i}$$ (for i=1,…,n). Let $$X_{i}$$ be a response variable having the EN distribution given by (6) re-parameterized as

$$X_{i}=\mathbf{v}_{i}^{\top} \,\boldsymbol{\beta}\,+\,\sigma\, Z_{i},$$
(12)

where the random error $$Z_{i}\sim \text{EN}(a,b,0,1)$$ has the standardized EN distribution, $$\boldsymbol{\beta}=(\beta_{1},\ldots,\beta_{p})^{\top}$$ is the unknown vector of coefficients, σ>0 is an unknown dispersion parameter and $$\mathbf{v}_{i}$$ is the explanatory vector modeling the location parameter $$\mu _{i}=\mathbf {v}_{i}^{\top } \boldsymbol {\beta }$$.

Hence, the location parameter vector $$\boldsymbol{\mu}=(\mu_{1},\ldots,\mu_{n})^{\top}$$ of the EN regression model has the linear structure $$\boldsymbol{\mu}=\mathbf{V}\boldsymbol{\beta}$$, where $$\mathbf{V}=[\mathbf{v}_{1}|\ldots|\mathbf{v}_{n}]^{\top}$$ is a known n×p model matrix.

The EN regression model (12) opens new possibilities for fitting many different types of data, since the EN distribution is much more flexible than the normal distribution. The most important special regressions are:

• For a=1, it gives the exponentiated-normal (Exp-N) regression model, which has not been explored, but it can be understood as a regression under the power normal distribution pioneered by Kundu and Gupta (2013).

• For b=1, it reduces to the LTII-normal (LTII-N) regression model defined as a linear model under the LTII-N distribution.

• If a=b=1, it reduces to the normal linear regression.

For statistical inference on the EN regression model, we consider a sample $$(X_{1},\mathbf{v}_{1}),\ldots,(X_{n},\mathbf{v}_{n})$$ of n independent observations. The log-likelihood function for the vector of parameters $$\boldsymbol{\eta}=(a,b,\sigma,\boldsymbol{\beta}^{\top})^{\top}$$ of model (12) is

$$\begin{array}{@{}rcl@{}} \ell({\boldsymbol{\eta}}) &\,=\,& n\log\left(\frac{a\,b}{\sigma}\right)\,+\, \sum_{i=1}^{n}\log[\phi(z_{i})] \,+\, (a-1)\sum_{i=1}^{n}\log[1-\Phi(z_{i})]\\ &&\,+\, (b-1)\sum_{i=1}^{n}\log\{1-[1-\Phi(z_{i})]^{a}\}, \end{array}$$
(13)

where $$z_{i}=({x_{i}-\mathbf {v}_{i}^{\top }\boldsymbol {\beta }})/\sigma$$ and $$x_{i}$$ is a possible outcome of $$X_{i}$$.
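The log-likelihood (13) is simple to code. The sketch below evaluates it for a small made-up data set (the responses and covariates are hypothetical and serve only as an illustration). It checks two facts implied by (13): at a=b=1 it coincides with the normal linear regression log-likelihood, and its derivative with respect to b equals $$n/b+\sum_{i}\log\{1-[1-\Phi(z_{i})]^{a}\}$$. The function names are assumptions.

```python
import math
from statistics import NormalDist

SQRT2 = math.sqrt(2.0)

def log_surv(z):
    """log{1 - Phi(z)}, accurate in both tails."""
    if z > 0.0:
        return math.log(0.5 * math.erfc(z / SQRT2))
    return math.log1p(-0.5 * math.erfc(-z / SQRT2))

def residuals(sigma, beta, xs, V):
    """z_i = (x_i - v_i' beta) / sigma for each observation."""
    return [(x - sum(vj * bj for vj, bj in zip(v, beta))) / sigma for x, v in zip(xs, V)]

def en_reg_loglik(a, b, sigma, beta, xs, V):
    """Log-likelihood (13) of the EN regression model."""
    total = len(xs) * math.log(a * b / sigma)
    for z in residuals(sigma, beta, xs, V):
        ls = log_surv(z)
        total += (-0.5 * z * z - 0.5 * math.log(2.0 * math.pi)   # log phi(z_i)
                  + (a - 1.0) * ls
                  + (b - 1.0) * math.log(-math.expm1(a * ls)))
    return total

def score_b(a, b, sigma, beta, xs, V):
    """Derivative of (13) with respect to b."""
    return len(xs) / b + sum(math.log(-math.expm1(a * log_surv(z)))
                             for z in residuals(sigma, beta, xs, V))

# Hypothetical toy data: intercept plus one binary covariate.
xs = [2.1, 3.4, 1.9, 4.2, 3.8, 2.7]
V = [[1.0, 0.0], [1.0, 1.0], [1.0, 0.0], [1.0, 1.0], [1.0, 1.0], [1.0, 0.0]]
beta, sigma = [2.2, 1.6], 0.5

normal_ll = sum(math.log(NormalDist(sum(vj * bj for vj, bj in zip(v, beta)), sigma).pdf(x))
                for x, v in zip(xs, V))
print("EN log-likelihood at a = b = 1:", en_reg_loglik(1.0, 1.0, sigma, beta, xs, V))
print("normal log-likelihood:         ", normal_ll)
```

A function of this form can be handed to any general-purpose optimizer. The finite-difference comparison in the same spirit as `score_b` is a convenient way to validate hand-derived score components before using them.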

The components of the score vector U(η) are

$$\begin{array}{@{}rcl@{}} \frac{\partial l({\boldsymbol{\eta}})}{\partial a}&=&\frac{n}{a}+\sum_{i=1}^{n}\log[1-\Phi(z_{i})]-(b-1)\sum_{i=1}^{n}\frac{[1-\Phi(z_{i})]^{a} \log[1-\Phi(z_{i})]}{\{1-[1-\Phi(z_{i})]^{a}\}}, \end{array}$$
$$\begin{array}{@{}rcl@{}} \frac{\partial l({\boldsymbol{\eta}})}{\partial b}&=&\frac{n}{b}+\sum_{i=1}^{n}\log\{1-[1-\Phi(z_{i})]^{a}\}, \end{array}$$
$$\begin{array}{@{}rcl@{}} \frac{\partial l({\boldsymbol{\eta}})}{\partial\sigma}&=&-\frac{n}{\sigma}+\frac{1}{\sigma}\sum_{i=1}^{n}z_{i}^{2}+\frac{(a-1)}{\sigma} \sum_{i=1}^{n}\frac{z_{i}\phi(z_{i})}{[1-\Phi(z_{i})]}\\ &-&\frac{a\,(b-1)}{\sigma}\sum_{i=1}^{n}\frac {z_{i}\phi(z_{i})[1-\Phi(z_{i})]^{a-1}}{\{1-[1-\Phi(z_{i})]^{a}\}}, \end{array}$$
$$\begin{array}{@{}rcl@{}} \frac{\partial l({\boldsymbol{\eta}})}{\partial\beta_{j}}&=&\frac{1}{\sigma} \sum_{i=1}^{n}v_{ij}z_{i}+\frac{(a-1)}{\sigma}\sum_{i=1}^{n} \frac{v_{ij}\phi(z_{i})}{[1-\Phi(z_{i})]}\\ &-&\frac{a\,(b-1)}{\sigma}\sum_{i=1}^{n}\frac{v_{ij}\phi(z_{i})[1-\Phi(z_{i})]^{a-1}} {\{1-[1-\Phi(z_{i})]^{a}\}}, \end{array}$$

where j=1,…,p.

Note that a closed-form expression for the MLE $$\widehat {\boldsymbol {\eta }}$$ is analytically intractable and, therefore, its computation has to be performed numerically by means of a nonlinear optimization algorithm.

We can maximize the log-likelihood function (13) with a Newton-type method. In particular, we use the quasi-Newton BFGS algorithm available in the matrix programming language Ox (MaxBFGS function) (see Doornik 2007) to calculate $$\widehat {{\boldsymbol {\eta }}}$$. Initial values for β and σ can be taken from the fit of the classical regression model (a=b=1).

Under general regularity conditions, the asymptotic distribution of $$(\widehat {\boldsymbol {\eta }}-{\boldsymbol {\eta }})$$ is multivariate normal $$N_{p+3}(0,K(\boldsymbol{\eta})^{-1})$$, where $$K(\boldsymbol{\eta})$$ is the expected information matrix. These conditions can be found in Cox and Hinkley’s Theoretical Statistics book (1974). The asymptotic covariance matrix $$K(\boldsymbol{\eta})^{-1}$$ of $$\widehat {{\boldsymbol {\eta }}}$$ can be approximated by the inverse of the (p+3)×(p+3) observed information matrix $$J(\boldsymbol{\eta})$$, and then the inference on the parameter vector η can be based on the normal approximation $$N_{p+3}(0,J(\boldsymbol{\eta})^{-1})$$ for $$\widehat {{\boldsymbol {\eta }}}$$.

Besides estimation of the model parameters, hypothesis tests can be carried out using likelihood ratio (LR) statistics.
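As a sketch of how such a test could be computed, the LR statistic for a sub-model is $$w=2\{\ell(\widehat{\boldsymbol{\eta}})-\ell(\widetilde{\boldsymbol{\eta}})\}$$, where the second term is the maximized log-likelihood under the restriction. Under the usual regularity conditions, w is referred to a chi-square distribution with as many degrees of freedom as restricted parameters: one for the Exp-N (a=1) and LTII-N (b=1) sub-models and two for the normal model (a=b=1). The function `lr_test` and the log-likelihood values below are hypothetical.

```python
import math

def lr_test(loglik_full, loglik_null, df):
    """LR statistic and chi-square p-value for df = 1 or df = 2."""
    w = max(2.0 * (loglik_full - loglik_null), 0.0)
    if df == 1:
        p_value = math.erfc(math.sqrt(w / 2.0))   # P(chi2_1 > w)
    elif df == 2:
        p_value = math.exp(-w / 2.0)              # P(chi2_2 > w)
    else:
        raise ValueError("this sketch only covers df = 1 and df = 2")
    return w, p_value

# Made-up maximized log-likelihoods, for illustration only.
ll_en, ll_expn, ll_normal = -500.0, -503.2, -508.9
print("EN vs Exp-N  (df = 1):", lr_test(ll_en, ll_expn, 1))
print("EN vs normal (df = 2):", lr_test(ll_en, ll_normal, 2))
```

The two closed-form tail probabilities avoid any dependence on an external statistics library. A general chi-square survival function would be needed for other degrees of freedom.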

## Numerical results

Three studies are presented in this section. First, we perform a Monte Carlo simulation study. Second, two applications to real data show the potential uses of the new distribution. Third, the usefulness of the regression model proposed in Section 6 is shown empirically based on quality of life data.

### 7.1 Simulation study

Here, we provide a Monte Carlo simulation study in order to quantify the effectiveness of the EN distribution based on the symmetrized Kullback-Leibler divergence as a goodness-of-fit comparison criterion.

Initially, we provide a brief discussion on the Kullback-Leibler divergence. According to Cover and Thomas (1991), this measure quantifies the error incurred by assuming that the Y model is true when the data follow the X distribution. For example, it has been proposed as an essential part of test statistics and widely applied in synthetic aperture radar image processing from both univariate (Nascimento et al. 2010) and polarimetric (or multivariate) (Nascimento et al. 2014) perspectives.

In order to work with measures which satisfy non-negativity, symmetry and definiteness properties, Nascimento et al. (2010) considered the measure dKL, namely

$$\begin{array}{@{}rcl@{}} d_{\text{KL}}(X,Y)&=&\frac 12\,[\,D(X||Y)\,+\,D(Y||X)\,] \\ &=&\frac 12\,\int_{\mathcal D}\, \underbrace{ (\,f_{X}(x;[a_{x},b_{x},\mu_{x},\sigma_{x}])\,-\,f_{Y}(x;[a_{y},b_{y},\mu_{y},\sigma_{y}])\,) \,\log\left(\frac{ f_{X}(x;[a_{x},b_{x},\mu_{x},\sigma_{x}]) }{ f_{Y}(x;[a_{y},b_{y},\mu_{y},\sigma_{y}]) } \right) }_{ \equiv \,\text{IntegrandKL}(x,y) } \mathrm{d}x. \end{array}$$
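The symmetrized distance has no closed form for the EN distribution, but it can be approximated by quadrature. The sketch below takes $$d_{\text{KL}}$$ as half the sum of the two directed divergences, which equals one half of the integral of $$(f_{X}-f_{Y})\log(f_{X}/f_{Y})$$. It works with log-densities to avoid ratios of very small numbers. The function names, the integration range and the parameter values are illustrative assumptions.

```python
import math

SQRT2 = math.sqrt(2.0)

def log_surv(z):
    """log{1 - Phi(z)}, accurate in both tails."""
    if z > 0.0:
        return math.log(0.5 * math.erfc(z / SQRT2))
    return math.log1p(-0.5 * math.erfc(-z / SQRT2))

def en_logpdf(x, a, b, mu, sigma):
    """Logarithm of the EN density (6)."""
    z = (x - mu) / sigma
    ls = log_surv(z)
    return (math.log(a * b / sigma) + (a - 1.0) * ls
            + (b - 1.0) * math.log(-math.expm1(a * ls))
            - 0.5 * z * z - 0.5 * math.log(2.0 * math.pi))

def sym_kl(theta_x, theta_y, panels=4000):
    """d_KL = (1/2) * int (f_X - f_Y) * log(f_X / f_Y) dx, by the trapezoidal rule."""
    lo = min(theta_x[2] - 12.0 * theta_x[3], theta_y[2] - 12.0 * theta_y[3])
    hi = max(theta_x[2] + 12.0 * theta_x[3], theta_y[2] + 12.0 * theta_y[3])

    def integrand(x):
        lx, ly = en_logpdf(x, *theta_x), en_logpdf(x, *theta_y)
        return (math.exp(lx) - math.exp(ly)) * (lx - ly)

    h = (hi - lo) / panels
    inner = sum(integrand(lo + i * h) for i in range(1, panels))
    return 0.5 * h * (0.5 * (integrand(lo) + integrand(hi)) + inner)

# Parameters are ordered as (a, b, mu, sigma); eps perturbs a, as an illustration.
base = (4.0, 4.0, 0.0, 1.0)
for eps in (0.0, 0.5, 1.0, 2.0):
    print(eps, sym_kl(base, (4.0 + eps, 4.0, 0.0, 1.0)))
```

For two normal densities with common variance (a=b=1), the result can be compared with the known value $$(\mu_{x}-\mu_{y})^{2}/(2\sigma^{2})$$, which provides a simple correctness check.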

Figure 3 displays both functions IntegrandKL(x,y) and $$d_{\text{KL}}(X,Y)$$ at the parametric point [a,b,μ,σ]=[a,b,0,1] when a,b=4,5,6. This measure can be understood as a distance between two points, $$\boldsymbol{\theta}_{1}=(a_{1},b_{1},\mu_{1},\sigma_{1})$$ and $$\boldsymbol{\theta}_{2}=(a_{2},b_{2},\mu_{2},\sigma_{2})$$, in the parametric space, say $$d_{\text{KL}}(\boldsymbol{\theta}_{1},\boldsymbol{\theta}_{2})$$.

For increasing values of the perturbation ε applied to a parameter, IntegrandKL(x,y) takes different forms. Further, IntegrandKL(x,y)→0 when ε→0.

Figures 3b and 3c reveal the influence of a and b, respectively, when we apply a perturbation to each parameter under (μ,σ)=(0,1). As expected, when the value of ε increases, the distance $$d_{\text{KL}}$$ also increases in both cases. However, this distance is most evident when we take smaller negative values of ε.

Table 1 gives the asymptotic performance of the maximum likelihood procedure discussed in the previous section with respect to the Kullback-Leibler distance, where we identify critical scenarios in the parametric space that can make maximum likelihood estimation harder. The results support the following fact: “when we wish to estimate one additional parameter (a or b) given that the MLE for the other parameter is known and higher than one, then the biases of the estimates tend to increase for high values of the parameter of interest.” In particular, for the MLE of b given $$\widehat {a}$$, this behavior finds strong justification in Eq. (11). Based on this equation, when $$\widehat {a}$$ takes high values, the terms $$[1-\Phi(\cdot)]^{\widehat{a}}$$ tend to zero, the logarithms in (11) vanish and the MLE of b collapses to an algebraic indeterminate form.

### 7.2 Two applications to real data

Here, we perform two applications to real data sets. First, we consider the data on the strengths of glass fibres analyzed by Jones and Faddy (2004). These data were obtained at the National Physical Laboratory (UK) to explain the breaking strength of sixty-three glass fibres having length 1.5 cm.

As a second application, we consider the fatigue life data (Meeker and Escobar 1998) for sixty-seven specimens of Alloy T7987 that failed before having accumulated three hundred thousand cycles of testing. The data set was rounded to the nearest thousand cycles.

We show empirically the adequacy of the EN distribution versus the normal, skew-normal (SN) (Azzalini 1984) and beta normal (BN) (Eugene et al. 2002) distributions.

The SN density [T∼SN(a,μ,σ)] has the form (for $$x,\,a,\,\mu \in \mathbb {R}$$ and σ>0)

$$f(x;a,\mu,\sigma)\,=\,\frac 2\sigma\, \phi\left(\frac{x\,-\,\mu}{\sigma}\right)\, \Phi\left[a\,\left(\frac{x\,-\,\mu}{\sigma}\right)\right]$$

and the BN density [T∼BN(α,β,μ,σ)] is (for $$x,\,\mu \in \mathbb {R}$$ and α,β,σ>0)

$$f(x;\alpha,\beta,\mu,\sigma) \,=\,J\, \phi\left(\frac{x\,-\,\mu}{\sigma}\right)\, \left[\Phi\left(\frac{x\,-\,\mu}{\sigma}\right)\right]^{\alpha-1}\, \left[1\,-\,\Phi\left(\frac{x\,-\,\mu}{\sigma}\right)\right]^{\beta-1},$$

where J=Γ(α+β)/[Γ(α) Γ(β) σ].

We compare the distributions using three goodness-of-fit (GoF) measures: Anderson-Darling (A), Cramér-von Mises (W) and Kolmogorov-Smirnov (KS) statistics. We use the goodness.fit function in R with the BFGS optimization method. According to the detailed discussion in Quang (1989), these measures are more appropriate than the Akaike information criterion (AIC) and Bayesian information criterion (BIC) or some of their variations, which are more useful for nested models. Table 2 gives the GoF measures for each fitted distribution with respect to both data sets.

The GoF measures for the EN distribution correspond to the lowest values among the discrimination criteria (highlighted in Table 2). These results provide evidence that the EN distribution is the most suitable model (among those considered) to describe both data sets.

### 7.3 Application for regression models

We assess changes on the oral health-related quality of life (OHRQL) of schoolchildren. To that end, a follow-up exam of three years was made to evaluate the impact of caries incidence on the OHRQL of adolescents. The data were obtained from a study (for more details, see Paula et al. 2012) developed by the Department of Community Dentistry, Division of Health Education and Health Promotion, Piracicaba Dental School, University of Campinas-UNICAMP.

The variables employed are (for i=1,…,291):

• $$x_{i}$$: overall score of the OHRQL at time of follow-up;

• $$v_{i1}$$: number of teeth decayed, missing and filled (TDMF) (0=without TDMF increment; 1=with TDMF increment).

We analyze these data based on the EN regression model

$$X_{i}\,=\,\beta_{0}\,+\,\beta_{1}\,v_{i1}\,+\,\sigma\,Z_{i},\quad i=1,\ldots,291,$$

where the errors $$Z_{i}$$ are independent random variables having the EN(a,b,0,1) distribution.

The gamma-normal (GN) (Lima et al. 2015) distribution extends the normal distribution and can be used to fit data that come from a distribution with heavy tails reducing the influence of aberrant observations. The GN density with location parameter $$\mu \in \mathbb {R}$$, dispersion parameter σ>0 and shape parameter a>0 takes the form

$$\begin{array}{@{}rcl@{}} f(x)=\frac{1}{\sigma\Gamma(a)}\,\phi\left(\frac{x-\mu}{\sigma}\right) \left\{-\log\left[1-\Phi\left(\frac{x-\mu}{\sigma}\right)\right]\right\}^{a-1}. \end{array}$$

Further, the EN regression model is compared with the Exp-N, LTII-N, normal and GN regression models. Table 3 provides the MLEs of the parameters for the EN regression and these models.

Iterative maximization of the log-likelihood function (13) starts with initial values for β and σ taken from the fit of the classical regression model (a=b=1). In general, all fitted regression models reveal that $$v_{1}$$ is significant at the 1% level and that there is a significant difference between the levels of the number of teeth decayed, missing and filled. As expected, we find reciprocal relations between $$\mu _{i}=\mathbb {E}(X_{i})$$ and $$v_{i1}$$ in the EN, LTII-N, GN and normal regression models, except for the Exp-N regression (which, although well fitted, does not seem to be a coherent model). On the other hand, based on the estimates of σ, the EN regression model reveals advantages in relation to the other models.

The values of the AIC, Consistent Akaike Information Criterion (CAIC) and BIC to compare the fitted models are given in Table 4.

It is clear that the EN regression model outperforms the other regressions irrespective of the criteria and then we can conclude that the new regression model can be used effectively in the analysis of the current data set. A comparison of the proposed regression model with some of its sub-models using LR statistics is addressed in Table 5.

The figures in this table, especially the p-values, indicate that the EN regression model yields a better fit to these data than its sub-models.

A graphical comparison among the fitted regression models is reported in Figure 4. The plots of these curves are the empirical cdf and the estimated cdf. Based on these plots, it is evident that the EN regression model provides a superior fit to the current data.

## Conclusions

Flexible statistical distributions have been sought for describing data from practical situations in which the use of classical ones is not recommended. In this paper, we propose an extension of the normal distribution based on the exponentiated generalized family defined by Cordeiro et al. (2011), which adds two extra shape parameters to a baseline distribution. We provide some structural properties of the new extended normal (EN) distribution. The model parameters are estimated by maximum likelihood. The adequacy of this distribution is illustrated by means of two applications to real data sets. There is clear evidence that the EN distribution outperforms the skew-normal distribution and can be a competitive alternative to the beta normal distribution. The classical regression model does not produce good results in many real problems, and for this reason several extensions have arisen in recent years. We propose a new regression model based on the EN distribution and show its usefulness in real applications. This new regression model opens a wide range of research topics following the basic inference concepts of the normal linear regression model.

## Appendix A: Proof of Theorem 1

We consider the power series

$$\begin{array}{@{}rcl@{}} (1 - z)^{b} \,=\, \sum_{k=0}^{\infty}(-1)^{k}\,{b \choose k}\,z^{k}, \end{array}$$

which holds for any real non-integer b and |z|<1. Using this generalized binomial expansion twice in Eq. (1), we can write the EG-G cumulative distribution as

$$\begin{array}{@{}rcl@{}} F(x)=\sum_{j=0}^{\infty} \,w_{j+1}\,H_{j+1}(x), \end{array}$$

where $$w_{j+1}= \sum _{m=1}^{\infty } (-1)^{j+m+1}\,{b \choose m}\, {m\,a \choose j+1}$$ and $$H_{j+1}(x)$$ is the Exp-G cdf with power parameter j+1. By differentiating the last equation, we obtain (7).
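When a and b are positive integers, both binomial expansions terminate, so the weights $$w_{j+1}$$ can be computed exactly and the linear representation checked by hand. The sketch below covers only this integer case, which is a simplifying assumption (the general case needs generalized binomial coefficients and truncation of the series). For a=2 and b=3, the cdf (1) is $$(2G-G^{2})^{3}=8G^{3}-12G^{4}+6G^{5}-G^{6}$$, which the code reproduces. The function names are hypothetical.

```python
from math import comb
from statistics import NormalDist

def eg_weights(a, b):
    """w_{j+1} for j = 0, ..., a*b - 1 when a and b are positive integers."""
    return [sum((-1) ** (j + m + 1) * comb(b, m) * comb(m * a, j + 1)
                for m in range(1, b + 1))
            for j in range(a * b)]

def eg_cdf_from_weights(G_value, weights):
    """F = sum_j w_{j+1} * G^(j+1): the mixture of Exp-G cdfs."""
    return sum(w * G_value ** (j + 1) for j, w in enumerate(weights))

a, b = 2, 3
w = eg_weights(a, b)
print("weights:", w)            # [0, 0, 8, -12, 6, -1]
print("sum of weights:", sum(w))

# Compare with the closed-form cdf (1) at one point, using a normal baseline.
G = NormalDist().cdf(0.4)
print(eg_cdf_from_weights(G, w), (1.0 - (1.0 - G) ** a) ** b)
```

The weights sum to one, in agreement with the remark following Theorem 1, although some of them are negative, so the representation is a linear combination rather than a proper mixture.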

## Appendix B: The Hessian matrix

The elements of the Hessian matrix are:

$$\begin{array}{@{}rcl@{}} U_{aa}&=&-\frac{n}{a^{2}}\,-\,(b\,-\,1)\,\sum_{i=1}^{n}\, \left\{ \frac{ \left[1\,-\,\Phi\left(\frac{x_{i}\,-\,\mu}{\sigma}\right)\right]^{a}\,\log^{2}\left[1\,-\,\Phi\left(\frac{x_{i}\,-\,\mu}{\sigma}\right)\right] }{ 1\,-\,\left[1\,-\,\Phi\left(\frac{x_{i}\,-\,\mu}{\sigma}\right)\right]^{a} } \right\}\\ &&-(b\,-\,1)\,\sum_{i=1}^{n}\,\left\{\,\frac{ \left[1\,-\,\Phi\left(\frac{x_{i}\,-\,\mu}{\sigma}\right)\right]^{2a}\,\log^{2}\left[1\,-\,\Phi\left(\frac{x_{i}\,-\,\mu}{\sigma}\right)\right] }{ \left\{\,1\,-\,\left[1\,-\,\Phi\left(\frac{x_{i}\,-\,\mu}{\sigma}\right)\right]^{a}\,\right\}^{2} } \right\}, \end{array}$$
$$\begin{array}{@{}rcl@{}} {}U_{ab}&=&-\sum_{i=1}^{n}\,\left\{ \frac{ \left[1\,-\,\Phi\left(\frac{x_{i}\,-\,\mu}{\sigma}\right)\right]^{a}\,\log\left[1\,-\,\Phi\left(\frac{x_{i}\,-\,\mu}{\sigma}\right)\right] }{ 1\,-\,\left[1\,-\,\Phi\left(\frac{x_{i}\,-\,\mu}{\sigma}\right)\right]^{a} } \right\}, \end{array}$$
$$\begin{array}{@{}rcl@{}} {\kern29pt}U_{a\mu}&=&\frac 1\sigma\,\sum_{i=1}^{n}\,\left\{ \frac{ \phi\left(\frac{x_{i}\,-\,\mu}{\sigma}\right) }{ 1\,-\,\Phi\left(\frac{x_{i}\,-\,\mu}{\sigma}\right) }\right\}\\ &&-\frac{(b-1)}{\sigma}\sum_{i=1}^{n}\, \left\{ \frac{ \phi\left(\frac{x_{i}\,-\,\mu}{\sigma}\right)\, \left[1\,-\,\Phi\left(\frac{x_{i}\,-\,\mu}{\sigma}\right)\right]^{a} }{ \left[1\,-\,\Phi\left(\frac{x_{i}\,-\,\mu}{\sigma}\right)\right]\left\{\,1\,-\,\left[1\,-\,\Phi\left(\frac{x_{i}\,-\,\mu}{\sigma}\right)\right]^{a}\,\right\} } \right\}\\ &&-\frac{a\,(b-1)}{\sigma} \,\sum_{i=1}^{n}\left\{ \frac{ \phi\left(\frac{x_{i}\,-\,\mu}{\sigma}\right)\,\left[1\,-\,\Phi\left(\frac{x_{i}\,-\,\mu}{\sigma}\right)\right]^{a-1} \,\log\left[1\,-\,\Phi\left(\frac{x_{i}\,-\,\mu}{\sigma}\right)\right] }{ 1\,-\,\left[1\,-\,\Phi\left(\frac{x_{i}\,-\,\mu}{\sigma}\right)\right]^{a} } \right\}\\ &&+\frac{a\,(b-1)}{\sigma} \,\sum_{i=1}^{n}\left\{ \frac{ \phi\left(\frac{x_{i}\,-\,\mu}{\sigma}\right)\, \left[1\,-\,\Phi\left(\frac{x_{i}\,-\,\mu}{\sigma}\right)\right]^{2a-1} \,\log\left[1\,-\,\Phi\left(\frac{x_{i}\,-\,\mu}{\sigma}\right)\right] }{ \left\{\,1\,-\,\left[1\,-\,\Phi\left(\frac{x_{i}\,-\,\mu}{\sigma}\right)\right]^{a}\,\right\}^{2} } \right\}, \end{array}$$
$$\begin{array}{@{}rcl@{}} {\kern30pt}U_{a\sigma}&=&-\frac \mu{\sigma^{2}}\,\sum_{i=1}^{n}\,\left\{ \frac{ \phi\left(\frac{x_{i}\,-\,\mu}{\sigma}\right) }{ 1\,-\,\Phi\left(\frac{x_{i}\,-\,\mu}{\sigma}\right) }\right\}\\ &&+\frac{\mu\,(b-1)}{\sigma^{2}}\sum_{i=1}^{n}\, \left\{ \frac{ \phi\left(\frac{x_{i}\,-\,\mu}{\sigma}\right)\, \left[1\,-\,\Phi\left(\frac{x_{i}\,-\,\mu}{\sigma}\right)\right]^{a} }{ \left[1\,-\,\Phi\left(\frac{x_{i}\,-\,\mu}{\sigma}\right)\right]\left\{\,1\,-\,\left[1\,-\,\Phi\left(\frac{x_{i}\,-\,\mu}{\sigma}\right)\right]^{a}\,\right\} } \right\} \\ &&+\frac{a\,(b-1)\,\mu}{\sigma^{2}} \,\sum_{i=1}^{n}\left\{ \frac{ \phi\left(\frac{x_{i}\,-\,\mu}{\sigma}\right)\,\left[1\,-\,\Phi\left(\frac{x_{i}\,-\,\mu}{\sigma}\right)\right]^{a-1} \,\log\left[1\,-\,\Phi\left(\frac{x_{i}\,-\,\mu}{\sigma}\right)\right] }{ 1\,-\,\left[1\,-\,\Phi\left(\frac{x_{i}\,-\,\mu}{\sigma}\right)\right]^{a} } \right\}\\ &&-\frac{a\,(b-1)\,\mu}{\sigma^{2}} \,\sum_{i=1}^{n}\left\{ \frac{ \phi\left(\frac{x_{i}\,-\,\mu}{\sigma}\right)\, \left[1\,-\,\Phi\left(\frac{x_{i}\,-\,\mu}{\sigma}\right)\right]^{2a-1} \,\log\left[1\,-\,\Phi\left(\frac{x_{i}\,-\,\mu}{\sigma}\right)\right] }{ \left\{\,1\,-\,\left[1\,-\,\Phi\left(\frac{x_{i}\,-\,\mu}{\sigma}\right)\right]^{a}\,\right\}^{2} } \right\}, \end{array}$$
$$\begin{array}{@{}rcl@{}} {}U_{bb}&=&-\frac{n}{b^{2}}, \end{array}$$
$$\begin{array}{@{}rcl@{}} U_{b\mu}&=&-\frac{a}{\sigma}\,\sum_{i=1}^{n}\,\left\{\frac{\phi\left(\frac{x_{i}\,-\,\mu}{\sigma}\right)\, \left[1\,-\,\Phi\left(\frac{x_{i}\,-\,\mu}{\sigma}\right)\right]^{a-1}} {1\,-\,\left[1\,-\,\Phi\left(\frac{x_{i}\,-\,\mu}{\sigma}\right)\right]^{a}}\right\}, \end{array}$$
$$\begin{array}{@{}rcl@{}} {}U_{b\sigma}&=&-\frac{a\mu}{\sigma^{2}}\,\sum_{i=1}^{n}\,\left\{\frac{\phi\left(\frac{x_{i}\,-\,\mu}{\sigma}\right)\,\left[1\,-\,\Phi\left(\frac{x_{i}\,-\,\mu}{\sigma}\right)\right]^{a-1}} {1\,-\,\left[1\,-\,\Phi\left(\frac{x_{i}\,-\,\mu}{\sigma}\right)\right]^{a}}\right\}, \end{array}$$
$$\begin{array}{@{}rcl@{}} U_{\mu\mu}&=&-\,\left(\frac{a-1}{\sigma^{2}}\right)\,\sum_{i=1}^{n}\,\left\{\frac{\phi^{\prime}\left(\frac{x_{i}\,-\,\mu}{\sigma}\right)} {1\,-\,\Phi\left(\frac{x_{i}\,-\,\mu}{\sigma}\right)} \,+\,\frac{\phi^{2}\left(\frac{x_{i}\,-\,\mu}{\sigma}\right)} {\left[1\,-\,\Phi\left(\frac{x_{i}\,-\,\mu}{\sigma}\right)\right]^{2}}\right\} \\ &&+\frac{1}{\sigma^{2}}\sum_{i=1}^{n}\,\left\{\frac{\phi\left(\frac{x_{i}\,-\,\mu}{\sigma}\right)\,\phi^{\prime\prime}\left(\frac{x_{i}\,-\,\mu}{\sigma}\right)\,-\,{\phi}^{\prime{2}}\left(\frac{x_{i}\,-\,\mu}{\sigma}\right)} {\phi^{2}\left(\frac{x_{i}\,-\,\mu}{\sigma}\right)}\right\} \\ &&-\frac{a(b-1)}{\sigma}\,\sum_{i=1}^{n} \left\{-\frac1\sigma\frac{\phi^{\prime}\left(\frac{x_{i}\,-\,\mu}{\sigma}\right)\,\left[1\,-\,\Phi\left(\frac{x_{i}\,-\,\mu}{\sigma}\right)\right]^{a-1}} {1\,-\,\left[1\,-\,\Phi\left(\frac{x_{i}\,-\,\mu}{\sigma}\right)\right]^{a}} \right.\\ &&+\left(\frac{a-1}{\sigma}\right)\frac{\phi^{2}\left(\frac{x_{i}\,-\,\mu}{\sigma}\right)\,\left[1\,-\,\Phi\left(\frac{x_{i}\,-\,\mu}{\sigma}\right)\right]^{a-2}} {1\,-\,\left[1\,-\,\Phi\left(\frac{x_{i}\,-\,\mu}{\sigma}\right)\right]^{a}} \\ &&\left. -\frac{a}{\sigma} \frac{\phi^{2}\left(\frac{x_{i}\,-\,\mu}{\sigma}\right)\,\left[1\,-\,\Phi\left(\frac{x_{i}\,-\,\mu}{\sigma}\right)\right]^{2a-2}} {\left\{\,1\,-\,\left[1\,-\,\Phi\left(\frac{x_{i}\,-\,\mu}{\sigma}\right)\right]^{a}\,\right\}^{2}}\right\}, \end{array}$$
$$U_{\mu\sigma}\,=\,-\frac {1}{\sigma}\,U_{\mu}\,-\,\frac \mu\sigma\,U_{\mu\mu}\quad \text{ and } \quad U_{\sigma\sigma}\,=\,-\,\frac{n}{\sigma^{2}}\,+\,\frac{\mu}{\sigma^{2}}\,U_{\mu}\,-\,\frac{\mu}{\sigma}\,U_{\mu\sigma}.$$
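Entries of this kind can be checked against numerical second differences of the log-likelihood. The Python sketch below assumes the EN log-likelihood implied by Eq. (1) with G(x) = Φ((x − μ)/σ), that is, n log a + n log b − n log σ plus the sums of (a − 1) log[1 − Φ(z_i)], (b − 1) log{1 − [1 − Φ(z_i)]^a} and log φ(z_i), where z_i = (x_i − μ)/σ. It compares the analytic expression for U_ab given above with a central finite difference. The data values, parameter values, step size and function names are hypothetical choices made only for illustration.

```python
import math


def survival(x_i, mu, sigma):
    """1 - Phi((x_i - mu) / sigma), computed with the complementary error function."""
    z = (x_i - mu) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def en_loglik(x, a, b, mu, sigma):
    """Assumed EN log-likelihood for the sample x (see the lead-in paragraph)."""
    n = len(x)
    ll = n * (math.log(a) + math.log(b) - math.log(sigma))
    for x_i in x:
        z = (x_i - mu) / sigma
        s = survival(x_i, mu, sigma)
        log_phi = -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)
        ll += (a - 1.0) * math.log(s) + (b - 1.0) * math.log(1.0 - s ** a) + log_phi
    return ll


def u_ab_analytic(x, a, mu, sigma):
    """Analytic U_ab: minus the sum of s^a * log(s) / (1 - s^a)."""
    total = 0.0
    for x_i in x:
        s = survival(x_i, mu, sigma)
        total += s ** a * math.log(s) / (1.0 - s ** a)
    return -total


def u_ab_numeric(x, a, b, mu, sigma, h=1e-4):
    """Central finite-difference approximation of the mixed derivative in (a, b)."""
    def f(da, db):
        return en_loglik(x, a + da, b + db, mu, sigma)
    return (f(h, h) - f(h, -h) - f(-h, h) + f(-h, -h)) / (4.0 * h * h)


# Hypothetical sample and parameter values, for illustration only.
x = [0.2, -0.5, 1.3, 0.8, -1.1]
print(u_ab_analytic(x, 1.7, 0.3, 1.2), u_ab_numeric(x, 1.7, 2.4, 0.3, 1.2))
```

The same finite-difference helper can be adapted to the other parameter pairs, which gives a quick sanity check before the analytic entries are used in an optimizer or to compute standard errors.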

## Availability of data and materials

Interested readers may contact the authors.

1. Exton, H: Handbook of hypergeometric integrals: theory, applications, tables, computer programs (1978).

## References

1. Azzalini, A: A class of distributions which includes the normal ones. Scand. J. Stat. 12, 171–178 (1984).

2. Cintra, RJ, Cordeiro, GM, Nascimento, ADC: Beta generalized normal distribution with an application for SAR image processing. Statistics. 48, 1–16 (2013).

3. Cordeiro, GM, Cunha, DCC, Ortega, EMM: The exponentiated generalized class of distributions. J. Data Sci. 11, 777–803 (2013).

4. Cordeiro, GM, Ortega, EMM, Silva, GO: The exponentiated generalized gamma distribution with application to lifetime data. J. Stat. Comput. Simul. 81, 827–842 (2011).

5. Cover, TM, Thomas, JA: Elements of Information Theory. Wiley-Interscience, New York (1991).

6. Doornik, JA: An Object-Oriented Matrix Language Ox 5. Timberlake Consultants Press, London (2007).

7. Eugene, N, Lee, C, Famoye, F: Beta-normal distribution and its applications. Commun. Stat.-Theory Methods. 31, 497–512 (2002).

8. Frery, AC, Nascimento, ADC, Cintra, RJ: Analytic Expressions for Stochastic Distances Between Relaxed Complex Wishart Distributions. IEEE Trans. Geosci. Remote Sens. 52, 1213–1226 (2014).

9. Jones, M, Faddy, MJ: A skew extension of the t-distribution, with applications. Biom. J. 65, 159–174 (2004).

10. Lee, C, Famoye, F, Alzaatreh, AY: Methods for generating families of univariate continuous distributions in the recent decades. Wiley Interdiscip. Rev. Comput. Stat. 5, 219–238 (2013).

11. Lehmann, EL: The power of rank tests. Ann. Math. Statist. 24, 23–43 (1953).

12. Lima, MCS, Cordeiro, GM, Ortega, EMM: A new extension of the normal distribution. J. Data Sci. 3, 385–408 (2015).

13. Meeker, WQ, Escobar, L: Statistical Methods for Reliability Data. Wiley, New York (1998).

14. Mudholkar, GS, Srivastava, DK: Exponentiated Weibull family for analyzing bathtub failure-rate data. IEEE Trans. Reliab. 42, 299–302 (1993).

15. Nadarajah, S: Explicit expressions for moments of order statistics. Statistics and Probability Letters. 78, 196–205 (2008).

16. Nascimento, ADC, Cintra, RJ, Frery, AC: Hypothesis Testing in Speckled Data with Stochastic Distances. IEEE Trans. Geosci. Remote Sens. 48, 373–385 (2010).

17. Paula, JS, Oliveira, M, Soares, MSP, Chaves, MGAM, Mialhe, FL: Perfil Epidemiológico dos Pacientes Atendidos no Pronto Atendimento da Faculdade de Odontologia da Universidade Federal de Juiz de Fora. Arquivos em Odontologia (UFMG). 48, 257–262 (2012).

18. Vuong, QH: Likelihood Ratio Tests for Model Selection and Non-Nested Hypotheses. Econometrica. 57, 307–333 (1989).

19. Rêgo, LC, Cintra, RJ, Cordeiro, GM: On some properties of the beta normal distribution. Commun. Stat. - Theory Methods. 41, 3722–3738 (2012).

## Acknowledgements

The authors thank CNPq and FACEPE, Brazil, for financial support.

## Author information

### Contributions

The authors (MCSL, GMC, EMMO and ADCN) carried out this work in consultation with one another and drafted the manuscript together. All authors read and approved the final manuscript.

### Corresponding author

Correspondence to Maria C.S. Lima.

## Ethics declarations

### Competing interests

The authors declare that they have no competing interests. 