# The multivariate slash and skew-slash student t distributions

## Abstract

In this article, we introduce the multivariate slash and skew-slash t distributions which provide alternative choices in simulating and fitting skewed and heavy tailed data. We study their relationships with other distributions and give the densities, stochastic representations, moments, marginal distributions, distributions of linear combinations and characteristic functions of the random vectors obeying these distributions. We characterize the skew t, the skew-slash normal and the skew-slash t distributions using both the hidden truncation or selective sampling model and the order statistics of the components of a bivariate normal or t variable. Density curves and contour plots are drawn to illustrate the skewness and tail behaviors. Maximum likelihood and Bayesian estimation of the parameters are discussed. The proposed distributions are compared with the skew-slash normal through simulations and applied to fit two real datasets. Our results indicated that the proposed skew-slash t fitting outperformed the skew-slash normal fitting and is a competitive candidate distribution in analyzing skewed and heavy tailed data.

Mathematics Subject Classification Primary 62E10; Secondary 62P10

## Introduction

Skewed and heavy tailed data occur frequently in real life and pose challenges to our usual way of thinking. Examples of such data include household incomes, loss data such as crop loss claims and hospital discharge bills, and files transferred through the Internet, to name a few. Candidate distributions for simulating and fitting such data are not abundant. One cannot simply use the normal or the t distribution as a substitute. Even though the Cauchy distribution can be used to simulate and fit such data, its sharp central peak and the fact that its first moment does not exist narrow its applications. Thus additional distributions are needed to study such skewed and heavy tailed data.

Kafadar (1988) introduced the univariate normal slash distribution as the distribution of the ratio of a standard normal random variable (rv) and an independent uniform rv (hereafter referred to as the slash normal). The slash normal generalizes the standard normal by introducing a tail parameter and has heavier tails than the standard normal, hence it can be used to simulate and fit heavy tailed data. Wang and Genton (2006) generalized the univariate slash normal to the multivariate slash normal and investigated its properties. They also defined the multivariate skew-slash normal as the distribution of the ratio of a skew normal rv and an independent uniform rv (hereafter referred to as the skew-slash normal), and applied it to fit two real datasets.

In this article, we introduce the slash (Student) t distribution and the skew-slash (Student) t distribution. They can be used to simulate and fit skewed and heavy tailed data. The slash t distribution generalizes the slash normal distribution of Kafadar (1988) and the multivariate slash normal distribution of Wang and Genton (2006), and the skew-slash t distribution generalizes the skew-slash normal distribution of the latter authors. In the skew-slash t there is one parameter to regulate the skewness of the distribution and another parameter to control the tail behavior. By setting the skewness parameter to zero, the skew-slash t reduces to the slash t. By letting the tail parameter tend to infinity, the skew-slash t simplifies to the skew t of Azzalini and Capitanio (2003). As both the slash t and the skew t take the t, and hence the normal, as special cases, so does the skew-slash t. To fit data, one can start with the skew-slash t. If the fitted value of the degrees of freedom is very large, then one takes the simpler skew-slash normal model. This idea can of course be used to test a skew-slash normal sub-model against a skew-slash t model. We have derived formulas for the densities, moments, marginal distributions and linear combinations of these distributions. Thus it can be expected that they are useful for analyzing skewed and heavy tailed data.

Compared to the slash and skew-slash normal distributions, the slash and skew-slash t distributions include an additional parameter, the degrees of freedom. This parameter gives the latter distributions more flexibility in fitting data than the former. Even though there is a tail parameter in both the slash and skew-slash normal and the slash and skew-slash t distributions, the degrees of freedom in the t distribution may lend an additional hand in modeling heavy (fat) tails and work jointly with the tail parameter to better fit data. This could explain why the skew-slash t fitting to the real GAD data in our application outperformed the skew-slash normal fitting. See Figure 1, where the skew-slash t fitting was able to better capture the peak of the histogram, and Table 1, where the AIC values indicated that the skew-slash t fitting was better than the skew-slash normal fitting. In fact, one observes that the standard error (SE) of the MLE of the tail parameter q in the table is very large (83.194) for the skew-slash normal fitting and much smaller (2.19) for the skew-slash t fitting. Noting also that the estimate of the degrees of freedom r is reasonable with a small SE, we would comment that the joint endeavor of q and r had higher fitting capability than that of a single q. Our simulation results in Section 5.1, and in particular in Table 2, also exhibited the superior performance of the proposed skew-slash t distribution over the skew-slash normal.

Azzalini and Dalla Valle (1996) introduced the multivariate skew normal distribution that extends the normal distribution with an additional skewness parameter. It provides an alternative modeling distribution to skewed data that are often observed in many areas such as economics, computer science and life sciences. Many authors have investigated skew t distributions, see e.g. Azzalini and Dalla Valle (1996), Gupta (2003), and Sahu et al. (2003). Azzalini and Capitanio (2003) proposed the multivariate skew t distribution by allowing a skewness parameter in a multivariate t distribution.

It is our belief that the proposed slash and skew-slash t distributions throw some additional light on this theory and contribute to the family of candidate distributions for modeling and simulating skewed and heavy tailed data.

The article is organized as follows. In Section 2, we introduce the multivariate slash t distribution, study its relationships with other distributions and derive the density function. We investigate its tractable properties such as heavy tail behavior and closure under marginalization and linear combinations. We give the stochastic representations, moments and characteristic function. We close this section with an example which graphically displays the densities. In Section 3, we define the skew-slash t distribution, derive the densities and characteristic functions, and give the moments and distributions of linear combinations of these distributions. We first define the standard skew-slash t distribution in Subsection 3.1 and study its relationships with other distributions. In Subsection 3.2, we characterize the skew t, skew-slash normal and skew-slash t distributions using the hidden truncation or selective sampling model and the order statistics of the components of a bivariate normal or t variable. In Subsection 3.3, we define the general multivariate skew-slash t distribution. An example is presented to illustrate the densities of the proposed distributions. Section 4 covers parameter estimation and statistical inference. Here we briefly discuss the maximum likelihood and Bayesian approaches. Section 5 is devoted to simulations as well as applications of the proposed skew-slash t distribution to fit two real datasets. Finally, some concluding remarks are given in Section 6.

## The multivariate slash t distribution

In this section, we define the multivariate slash t distribution, derive the density and study its tail behavior and relationships with other distributions. We give the stochastic representations, moments, and characteristic function and discuss marginal distributions and linear combinations. We close this section with an example.

Let us first recall the multivariate t distribution. There are several variants of the definition in the literature and we will adopt the following one. Details can be found in e.g. Kotz and Nadarajah (2004) or Johnson and Kotz (1972, p. 134). A continuous k-variate random vector T has a t distribution with degrees of freedom r, mean vector m, and correlation matrix R (covariance matrix Σ), written $$\mathbf{T}\sim t_{k}(r,\mathbf{m},\mathbf{R})$$, if it has the probability density function (pdf) given by

$$\begin{array}{@{}rcl@{}} t_{k}(\mathbf{t}; r, {\mathbf{m}}, {\mathbf{R}})\,=\, \frac{\Gamma\left(\frac{r+k}{2}\right)}{(r\pi)^{k/2}\Gamma\left(\frac{r}{2}\right)|{\mathbf{R}}|^{1/2}} \left(1+\frac{(\mathbf{t}-{\mathbf{m}})^{\top} {\mathbf{R}}^{-1} (\mathbf{t}-{\mathbf{m}})}{r}\right)^{-({r+k})/{2}},\quad \mathbf{t}\in{\mathbb{R}}^{k}, \end{array}$$

where |M| denotes the determinant of a square matrix M. If m=0 and $$\mathbf{R}=\mathbf{I}_{k}$$, where $$\mathbf{I}_{k}$$ denotes the k×k identity matrix, it is referred to as the standard k-variate t distribution and denoted by $$t_{k}(r)$$. We now introduce the k-variate slash t distribution. Write $$U\sim U(0,1)$$ for a rv uniformly distributed over (0,1).

### Definition 1.

A k-variate continuous random vector X is said to have a slash t distribution with tail parameter q>0, degrees of freedom r, location vector $${\mathbf {m}}\in {\mathbb {R}}^{k}$$, and correlation matrix R, written $$\mathbf{X}\sim SLT_{k}(q,r,\mathbf{m},\mathbf{R})$$, if it can be expressed as the ratio of two independent rv's $$\mathbf{T}\sim t_{k}(r,\mathbf{m},\mathbf{R})$$ and $$U\sim U(0,1)$$ as follows:

$$\begin{array}{@{}rcl@{}} {\mathbf{X}}={\mathbf{T}}/{U^{1/q}}. \end{array}$$

When m=0 and $$\mathbf{R}=\mathbf{I}_{k}$$, it is referred to as the standard (k-variate) slash t and denoted by $$SLT_{k}(q,r)$$. It can be easily seen that the k-variate slash t distribution generalizes the k-variate t distribution, as stated below.
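The definition translates directly into a sampler. The following is a minimal sketch, not part of the original development: the function name `sample_slash_t` and its arguments are illustrative, and T is generated by the usual normal over chi-square construction of the multivariate t.

```python
import numpy as np

def sample_slash_t(n, q, r, m, R, rng=None):
    """Draw n samples of X = T / U**(1/q) with T ~ t_k(r, m, R) and U ~ U(0,1).

    Illustrative helper (name and signature are assumptions, not from the paper).
    """
    rng = np.random.default_rng(rng)
    m = np.asarray(m, dtype=float)
    R = np.asarray(R, dtype=float)
    k = m.shape[0]
    # Multivariate t: T = m + Z / S with Z ~ N_k(0, R) and r * S^2 ~ chi^2_r.
    Z = rng.multivariate_normal(np.zeros(k), R, size=n)
    S = np.sqrt(rng.chisquare(r, size=n) / r)
    T = m + Z / S[:, None]
    # Uniform on (0, 1]; avoids an exact zero in the denominator.
    U = 1.0 - rng.uniform(size=n)
    return T / U[:, None] ** (1.0 / q)
```

For large q the factor U^{-1/q} is close to 1, so the draws approach those of the t distribution itself, in line with the remark that follows.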

### Remark 1.

The limiting distribution of the slash t distribution $$SLT_{k}(q,r)$$, as $$q\to\infty$$, is the Student t distribution $$t_{k}(r)$$.

Let us now derive the density of the slash t distribution. Note that the joint density of T and U is

$$\begin{array}{@{}rcl@{}} t_{k}(\mathbf{t}; r, {\mathbf{m}}, {\mathbf{R}}){\mathbf{1}}_{[0,1]}(u), \quad \mathbf{t} \in{\mathbb{R}}^{k},\, u\in [0,1]. \end{array}$$

For the substitution $$v=u^{1/q}$$, $$\mathbf{x}=\mathbf{t}/u^{1/q}=\mathbf{t}/v$$, the Jacobian determinant is $$qv^{k+q-1}$$. Hence the joint density of (X,V) is given by

$$\begin{array}{@{}rcl@{}} qv^{k+q-1}t_{k}(v{\mathbf{x}}; r, {\mathbf{m}}, {\mathbf{R}}){\mathbf{1}}_{[0,1]}(v), \quad {\mathbf{x}}\in{\mathbb{R}}^{k}, v\in [0, 1]. \end{array}$$

Integrating v out yields the density $$f_{k}(\mathbf{x};q,r,\mathbf{m},\mathbf{R})$$ of X as follows:

$$f_{k}({\mathbf{x}}; q, r, {\mathbf{m}}, {\mathbf{R}})={\int_{0}^{1}} qv^{k+q-1}t_{k}({\mathbf{x}} v; r, {\mathbf{m}}, {\mathbf{R}})\,dv, \quad {\mathbf{x}}\in{\mathbb{R}}^{k}.$$
((2.1))

From this density it immediately follows that the standard k-variate slash t distribution $$SLT_{k}(q,r)$$ is symmetric about 0, as the standard k-variate t is.
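The density (2.1) has no closed form in general, but it is a one-dimensional integral and is easy to evaluate numerically. The sketch below covers the standard case $$SLT_{k}(q,r)$$ only; the helper names `t_pdf_std` and `slash_t_pdf_std` are illustrative, and the use of `scipy.integrate.quad` is our choice rather than anything prescribed here.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gammaln

def t_pdf_std(x, r):
    """Density t_k(x; r) of the standard k-variate t at a single point x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    k = x.size
    log_c = gammaln((r + k) / 2) - gammaln(r / 2) - 0.5 * k * np.log(r * np.pi)
    return np.exp(log_c - 0.5 * (r + k) * np.log1p(x @ x / r))

def slash_t_pdf_std(x, q, r):
    """Density (2.1) of SLT_k(q, r) at x, by quadrature over v in [0, 1]."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    k = x.size
    integrand = lambda v: q * v ** (k + q - 1) * t_pdf_std(v * x, r)
    value, _ = quad(integrand, 0.0, 1.0)
    return value
```

Two simple checks follow from (2.1): at the origin the integral reduces to q/(k+q) times $$t_{k}(\mathbf{0};r)$$, and the value at x equals the value at -x.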

The heavy-tail behavior. The cumulative distribution function (cdf) of the k-variate slash t is given by

$$\begin{array}{@{}rcl@{}} F_{k}({\mathbf{x}}; q, r, {\mathbf{m}}, {\mathbf{R}})=\int_{-\infty}^{{\mathbf{x}}} f_{k}({\mathbf{y}}; q, r, {\mathbf{m}}, {\mathbf{R}})\,d{\mathbf{y}}= {\int_{0}^{1}} qv^{k+q-2} H_{k}({\mathbf{x}} v; r, {\mathbf{m}}, {\mathbf{R}})\,dv, \end{array}$$

for $${\mathbf {x}} \in {\mathbb {R}}^{k}$$, where $$H_{k}$$ is the cdf of the k-variate $$t_{k}(r,\mathbf{m},\mathbf{R})$$. Denote by $$\bar H_{k}=1-H_{k}$$ the survival function. The survival function of the k-variate slash t is then given by

$$\bar F_{k}({\mathbf{x}}; q, r, {\mathbf{m}}, {\mathbf{R}})=\frac{k-1}{k+q-1}+{\int_{0}^{1}} qv^{k+q-2} \bar H_{k}({\mathbf{x}} v; r, {\mathbf{m}}, {\mathbf{R}})\,dv, \; {\mathbf{x}}\in{\mathbb{R}}^{k}.$$
((2.2))

Let us focus on the standard univariate case and write $$f=f_{1}$$, $$F=F_{1}$$, $$\bar F=1-F$$ and $$H=H_{1}$$, $$\bar H=1-H$$. It is well known that the univariate t distribution $$t_{1}(r,0,1)$$ is a heavy tailed distribution with tail index r. In other words, the survival function decays at the power rate $$t^{-r}$$:

$$\begin{array}{@{}rcl@{}} \bar H(t; r, 0, 1) \propto t^{-r}, \quad t\to\infty, \end{array}$$

where $$a(t)\propto b(t)$$ if $$\limsup_{t\to\infty} a(t)/b(t)<\infty$$. Let c be the normalizing constant in the density of the univariate t. Then by L'Hôpital's rule we have

$$\begin{array}{@{}rcl@{}} \begin{aligned} {\lim}_{x\to\infty}\frac{\bar H(x)}{x^{-r}} &=c{\lim}_{x\to\infty}\frac{\int_{x}^{\infty} (1+t^{2}/r)^{-(r+1)/2}\,dt}{x^{-r}}\\ &=c{\lim}_{x\to\infty}\frac{- (1+x^{2}/r)^{-(r+1)/2}}{-rx^{-r-1}}=cr^{(r-1)/2}<\infty. \end{aligned} \end{array}$$

This shows the above rate holds. Similarly, one can show that if q>r then

$$\begin{array}{@{}rcl@{}} \bar F(x ;q, r, 0, 1) \propto x^{-r}, \quad x\to\infty. \end{array}$$

This shows that the standard univariate slash t is also heavy tailed.
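The "similarly" step for q>r can be sketched as follows. This is one possible route rather than the only one; it uses (2.2) with k=1, and the symbols s and $$C=cr^{(r-1)/2}$$ (the limit of $$\bar H(x)/x^{-r}$$ obtained from the t density) are introduced here only for illustration. Substituting s=xv in (2.2) gives

$$\bar F(x; q, r, 0, 1)={\int_{0}^{1}} qv^{q-1}\bar H(xv)\,dv = qx^{-q}{\int_{0}^{x}} s^{q-1}\bar H(s)\,ds, \quad x>0.$$

Since $$s^{q-1}\bar H(s)$$ behaves like $$Cs^{q-r-1}$$ with q-r-1>-1, the integral on the right diverges as $$x\to\infty$$, so L'Hôpital's rule applies:

$$\lim_{x\to\infty}\frac{\bar F(x; q, r, 0, 1)}{x^{-r}} =\lim_{x\to\infty}\frac{q{\int_{0}^{x}} s^{q-1}\bar H(s)\,ds}{x^{q-r}} =\lim_{x\to\infty}\frac{qx^{q-1}\bar H(x)}{(q-r)x^{q-r-1}} =\frac{q}{q-r}\,C<\infty.$$

Under these assumptions the slash t tail has the same power rate as the t tail, with the constant inflated by the factor q/(q-r).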

Further, by (2.2) and in view of $$\bar H(xv; r, 0, 1)\geq \bar H(x; r, 0, 1)$$ for $$v\in[0,1]$$ and $$x\geq 0$$, one derives

$$\begin{array}{@{}rcl@{}} \begin{aligned} \bar F(x; q, r, 0, 1) & \geq (k-1)/(k+q-1)+\bar H(x; r, 0, 1) {\int_{0}^{1}} qv^{k+q-2}\,dv\\ & = (k-1)/(k+q-1) + q/(k+q-1)\,\bar H(x; r, 0, 1) \\ & \geq \bar H(x; r, 0, 1), \quad x \geq 0. \end{aligned} \end{array}$$

This shows that the standard univariate slash t has heavier tails than the standard univariate t. In fact, the last inequality also holds for the k-variate slash t for $$\mathbf{x}=(x_{1},\ldots,x_{k})$$ with $$x_{i}\geq 0$$, i=1,…,k, since

$$\begin{array}{@{}rcl@{}} \bar F_{k}({\mathbf{x}}; q, r, {\mathbf{m}}, {\mathbf{R}}) &=& P(X_{1}> x_{1}, \ldots, X_{k} > x_{k})\\ &=& {\int_{0}^{1}} P(T_{1}> u^{1/q}x_{1}, \ldots, T_{k} > u^{1/q} x_{k})\,du \\ & \geq& {\int_{0}^{1}} P(T_{1}> x_{1}, \ldots, T_{k} > x_{k})\,du = \bar H_{k}({\mathbf{x}}; r, {\mathbf{m}}, {\mathbf{R}}). \end{array}$$
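The univariate tail comparison can be checked numerically. The sketch below evaluates the survival function of the standard univariate slash t from (2.2) with k=1 and compares it with the t survival function; the function name `slash_t_sf` is illustrative, and `scipy.stats.t.sf` supplies $$\bar H$$.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import t as student_t

def slash_t_sf(x, q, r):
    """Survival function of the standard univariate slash t: (2.2) with k = 1."""
    integrand = lambda v: q * v ** (q - 1) * student_t.sf(x * v, df=r)
    value, _ = quad(integrand, 0.0, 1.0)
    return value

# Compare the two tails on a small grid of non-negative points.
for x in [0.0, 0.5, 1.0, 2.0, 5.0, 10.0]:
    print(x, slash_t_sf(x, q=3.0, r=5.0), student_t.sf(x, df=5.0))
```

On such a grid the slash t column should never fall below the t column, and both equal 1/2 at the origin by symmetry.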

Stochastic representations. Stochastic representations not only reveal the relations with other distributions but are also very useful, for instance, in calculating moments and in random generation. We provide two stochastic representations for the slash t distribution based on the two stochastic representations of the multivariate t distribution. According to Kafadar (1988), a continuous rv ξ has a slash normal distribution, written $$\boldsymbol{\xi}\sim SLN_{k}(q,\mathbf{0},\Sigma)$$, if it can be expressed as $$\boldsymbol{\xi}=\mathbf{Z}/U^{1/q}$$ where $$\mathbf{Z}\sim N_{k}(\mathbf{0},\Sigma)$$ and $$U\sim U(0,1)$$ are independent.

Note that if a rv T has a k-variate t distribution $$t_{k}(r,\mathbf{m},\mathbf{R})$$, then it has the stochastic representation

$$\begin{array}{@{}rcl@{}} \mathbf{T}=S^{-1}{\mathbf{Z}}+{\mathbf{m}}, \end{array}$$

where $$\mathbf{Z}\sim N_{k}(\mathbf{0},\mathbf{R})$$, $$rS^{2}$$ has the chi-square distribution $${\chi _{r}^{2}}$$ with r degrees of freedom, and Z and S are independent. This can be easily verified. Let $$\mathbf{X}\sim SLT_{k}(q,r,\mathbf{m},\mathbf{R})$$. Then from the definition of the k-variate slash t it immediately follows that

$${\mathbf{X}}=S^{-1}{\boldsymbol{\xi}} + {\boldsymbol{\eta}},$$
((2.3))

where $$\boldsymbol{\xi}\sim SLN_{k}(q,\mathbf{0},\mathbf{R})$$, $$\boldsymbol{\eta}=\mathbf{m}U^{-1/q}$$, and both ξ and η are independent of S.

Using another stochastic representation of the k-variate t distribution from page 7 of Kotz and Nadarajah (2004), we obtain the second stochastic representation for the k-variate slash t rv $$\mathbf{X}\sim SLT_{k}(q,r,\mathbf{m},\mathbf{R})$$ as follows:

$${\mathbf{X}}={\mathbf{V}}^{-1/2}{\boldsymbol{\xi}}+ {\boldsymbol{\eta}},$$
((2.4))

where $$\boldsymbol{\xi}\sim SLN_{k}(q,\mathbf{0},r\mathbf{I}_{k})$$, $$\boldsymbol{\eta}=\mathbf{m}U^{-1/q}$$, and ξ, η are independent of V. Here $$\mathbf{V}^{-1/2}$$ is the inverse of the symmetric square root $$\mathbf{V}^{1/2}$$ of V, where V has a k-variate Wishart distribution with degrees of freedom r+k−1 and covariance matrix $$\mathbf{R}^{-1}$$.

The moments. Let us now calculate the mean vector and covariance matrix of the k-variate slash t. For $$\mathbf{X}\sim SLT_{k}(q,r,\mathbf{m},\mathbf{R})$$ with $$\mathbf{R}=(R_{ij})$$, by the independence of T and U we have

$$\begin{array}{@{}rcl@{}} {\boldsymbol{\mu}}={\mathbb{E}}({\mathbf{X}})={\mathbb{E}}(\mathbf{T}/U^{1/q})={\mathbb{E}}(\mathbf{T}){\mathbb{E}}(U^{-1/q}). \end{array}$$

It is easy to calculate

$${\mathbb{E}}(U^{-1/q})=q/(q-1), \quad q>1,$$
((2.5))

and

$${\mathbb{E}}(U^{-2/q})=q/(q-2), \quad q>2.$$
((2.6))

For $$\mathbf{T}\sim t_{k}(r,\mathbf{m},\mathbf{R})$$, from page 11 of Kotz and Nadarajah (2004) it follows that

$${\mathbb{E}}(\mathbf{T})={\mathbf{m}}, \quad {\text{Var}}(\mathbf{T})={\mathbf{R}} r/(r-2), \quad r>2.$$
((2.7))

Hence by (2.5) and the first equality of (2.7) one has

$${\boldsymbol{\mu}}={{\mathbf{m}} q}/({q-1}).$$
((2.8))

To calculate Var(X) we use the formula

$${\text{Var}}({\mathbf{X}})={\text{Var}}({\mathbb{E}}({\mathbf{X}}|U))+{\mathbb{E}}\big({\text{Var}}({\mathbf{X}}|U)\big).$$
((2.9))

It is easy to see for r>2,

$$\begin{array}{@{}rcl@{}} {\text{Var}}({\mathbf{X}}|U)={\text{Var}}(\mathbf{T})/U^{2/q}={\mathbf{R}} r/((r-2)U^{2/q}), \end{array}$$

hence by (2.6) one has

$${\mathbb{E}}\big({\text{Var}}({\mathbf{X}}|U)\big)={\mathbf{R}} rq/((r-2)(q-2)), \quad q>2, r>2.$$
((2.10))

Also, since $${\mathbb {E}}({\mathbf {X}}|U)={\mathbb {E}}(\mathbf {T})/U^{1/q}={\mathbf {m}}/U^{1/q}$$, it follows from (2.6) and (2.8) that

$$\begin{array}{@{}rcl@{}} {\mathbb{E}}(({\mathbb{E}}({\mathbf{X}}|U))^{{\otimes 2}})={\mathbf{m}}^{{\otimes 2}} q/(q-2), \quad ({\mathbb{E}}({\mathbf{X}}))^{{\otimes 2}}={\mathbf{m}}^{{\otimes 2}} q^{2}/(q-1)^{2}, \quad q>2, \end{array}$$

where $$\mathbf{M}^{\otimes 2}=\mathbf{M}\mathbf{M}^{\top}$$. Hence for q>2,

$$\begin{array}{@{}rcl@{}} \begin{aligned} {\text{Var}}({\mathbb{E}}({\mathbf{X}}|U)) &={\mathbf{m}}^{{\otimes 2}} q/(q-2)-{\mathbf{m}}^{{\otimes 2}} q^{2}/(q-1)^{2}\\ &={\mathbf{m}}^{{\otimes 2}} q/((q-1)^{2}(q-2)), \quad q>2. \end{aligned} \end{array}$$

This, (2.10) and (2.9) yield the variance-covariance matrix of X as follows:

$${\text{Var}}({\mathbf{X}})=\frac{rq{\mathbf{R}}}{(r-2)(q-2)}+\frac{q{\mathbf{m}}^{{\otimes 2}} }{(q-1)^{2}(q-2)}, \quad q>2, r>2.$$
((2.11))

In particular, if m=0 one has

$${\text{Var}}({\mathbf{X}})=\frac{rq{\mathbf{R}}}{(r-2)(q-2)}, \quad q>2, r>2.$$
((2.12))

Hence, in this case, the k-variate slash t has the same correlation matrix R as the k-variate t.
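Formulas (2.8) and (2.11) are simple enough to code directly. The helper below is a small sketch (the name `slash_t_mean_cov` is ours) that returns the mean vector and covariance matrix of $$SLT_{k}(q,r,\mathbf{m},\mathbf{R})$$ and refuses parameter values for which the covariance does not exist.

```python
import numpy as np

def slash_t_mean_cov(q, r, m, R):
    """Mean (2.8) and covariance (2.11) of SLT_k(q, r, m, R); needs q > 2, r > 2."""
    if q <= 2 or r <= 2:
        raise ValueError("the covariance in (2.11) requires q > 2 and r > 2")
    m = np.asarray(m, dtype=float)
    R = np.asarray(R, dtype=float)
    mean = m * q / (q - 1)
    cov = (r * q * R / ((r - 2) * (q - 2))
           + q * np.outer(m, m) / ((q - 1) ** 2 * (q - 2)))
    return mean, cov

mean, cov = slash_t_mean_cov(q=3.0, r=4.0, m=[1.0, 2.0], R=np.eye(2))
print(mean)
print(cov)
```

With m=0 the second term vanishes and the covariance is a scalar multiple of R, which is the situation described in (2.12).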

In the case of the standard t, there are convenient formulae for the moments. Let $$p_{1},\ldots,p_{k}$$ be non-negative integers such that $$p=p_{1}+\cdots+p_{k}<r/2$$. If any of the $$p_{1},\ldots,p_{k}$$ is odd, then

$$\begin{array}{@{}rcl@{}} {\mathbb{E}}\big(T_{1}^{p_{1}}\cdots T_{k}^{p_{k}}\big)=0. \end{array}$$

If all of them are even and r>p, then

$${\mathbb{E}}\big(T_{1}^{p_{1}}\cdots T_{k}^{p_{k}}\big)=\frac{r^{p/2}\Pi_{j=1}^{k} \left[1\cdot 3\cdot 5\cdots (2p_{j}-1)\right]}{(r-2)(r-4)\cdots (r-p)}:=c.$$
((2.13))

For details, see Kotz and Nadarajah (2004). Based on these formulae we have the following.

### Theorem 1.

Let $$\mathbf{X}=(X_{1},\ldots,X_{k})\sim SLT_{k}(q,r)$$. Assume $$p_{1},\ldots,p_{k}$$ are non-negative integers such that $$p=p_{1}+\cdots+p_{k}<r/2$$. 1) Suppose at least one of the $$p_{1},\ldots,p_{k}$$ is odd. If q>p, then

$$\begin{array}{@{}rcl@{}} {\mathbb{E}}\big(X_{1}^{p_{1}}\cdots X_{k}^{p_{k}}\big)=0. \end{array}$$

2) Suppose all $$p_{1},\ldots,p_{k}$$ are even. If q>p, then

$$\begin{array}{@{}rcl@{}} {\mathbb{E}}\big(X_{1}^{p_{1}}\cdots X_{k}^{p_{k}}\big)={cq}/({q-p}), \end{array}$$

where c is given in (2.13). Otherwise, if $$q\leq p$$, then $${\mathbb {E}}(X_{1}^{p_{1}}\cdots X_{k}^{p_{k}})$$ diverges.

### PROOF.

By the density formula (2.1) and using the substitution t=v x we have

$$\begin{array}{@{}rcl@{}} \begin{aligned} {\mathbb{E}}\big(X_{1}^{p_{1}}\cdots X_{k}^{p_{k}}\big) &= \int x_{1}^{p_{1}}\cdots x_{k}^{p_{k}}\left\{{\int_{0}^{1}} qv^{k+q-1}t_{k}({vx}_{1}, \ldots, {vx}_{k};r)\,dv\right\}\,{dx}_{1}\cdots {dx}_{k}\\ &={\int_{0}^{1}} qv^{q-1-p}\,dv\left\{{\mathbb{E}}\big(T_{1}^{p_{1}}\cdots T_{k}^{p_{k}}\big)\right\}. \end{aligned} \end{array}$$

Note that the integral $${\int _{0}^{1}} qv^{q-1-p}\,dv$$ converges to q/(q−p) if q−p>0 and diverges otherwise. These and (2.13) yield the desired results.
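As a quick sanity check of the argument in the simplest even case (a worked example added for illustration, not part of the original proof), take k=1 and $$p=p_{1}=2$$. By (2.7), $${\mathbb{E}}(T_{1}^{2})=r/(r-2)$$ for r>2, so the last display of the proof gives

$${\mathbb{E}}\big(X_{1}^{2}\big)=\frac{r}{r-2}{\int_{0}^{1}} qv^{q-3}\,dv=\frac{rq}{(r-2)(q-2)}, \quad q>2,\ r>2,$$

which agrees with (2.12) for R=1, while for $$q\leq 2$$ the integral over v diverges.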

The marginal distributions. Since the marginal distributions of a k-variate t are still t, the marginal distributions of a k-variate slash t are slash t.

### Theorem 2.

The marginal distributions of a k-variate slash t distribution are still slash t.

### PROOF.

It suffices to show without loss of generality that for every $$1\leq s<k$$,

$$\begin{array}{@{}rcl@{}} {\int\!\!\cdots\!\!\int} f_{k}(x_{1}, \ldots, x_{s}, x_{s+1}, \ldots, x_{k})\,{dx}_{s+1}\cdots {dx}_{k} =f_{s}(x_{1}, \ldots, x_{s}), \quad x_{1},\ldots, x_{s}\in{\mathbb{R}}, \end{array}$$

where $$f_{k}(\mathbf{x})=f_{k}(\mathbf{x};q,r,\mathbf{m},\mathbf{R})$$. Substitution of the density (2.1) in the left hand side of the above equality gives

$$\begin{array}{@{}rcl@{}} {\int\!\!\cdots\!\!\int} f_{k}({\mathbf{x}})\,{dx}_{s+1}\cdots {dx}_{k} ={\int_{0}^{1}} qv^{k+q-1}{\int\!\!\cdots\!\!\int} t_{k}(v{\mathbf{x}}){dx}_{s+1}\cdots {dx}_{k}dv, \end{array}$$

where $$t_{k}(\mathbf{t})=t_{k}(\mathbf{t};r,\mathbf{m},\mathbf{R})$$. By the substitution $$y_{s+1}=vx_{s+1},\ldots,y_{k}=vx_{k}$$ one derives

$$\begin{array}{@{}rcl@{}} {\int\!\!\cdots\!\!\int} t_{k}(v{\mathbf{x}}){dx}_{s+1}\cdots {dx}_{k} =v^{s-k}{\int\!\!\cdots\!\!\int} t_{k}({vx}_{1}, \ldots, {vx}_{s}, y_{s+1}, \ldots, y_{k})\,{dy}_{s+1}\cdots {dy}_{k}. \end{array}$$

Because the marginals of the k-variate t distribution are still t, we have

$$\begin{array}{@{}rcl@{}} {\int\!\!\cdots\!\!\int} t_{k}({vx}_{1}, \ldots, {vx}_{s}, y_{s+1}, \ldots, y_{k})\,{dy}_{s+1}\cdots {dy}_{k} =t_{s}({vx}_{1}, \ldots, {vx}_{s}; r, {\mathbf{m}}_{1}, {\mathbf{R}}_{11}), \end{array}$$

where $${\mathbf {m}}=({\mathbf {m}}_{1}^{\top }, {\mathbf {m}}_{2}^{\top })^{\top }$$ with $${\mathbf {m}}_{1} \in {\mathbb {R}}^{s}$$ and R is partitioned into a 2×2 block matrix with $$\mathbf{R}_{11}$$ being the s×s block at position (1,1). See pages 15-16 of Kotz and Nadarajah (2004). Combining the last two equalities yields the desired equality.

Linear combinations. Since the distribution of a linear function of a k-variate t variable is still t, we immediately obtain the following.

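### Theorem 3.

Let A be a nonsingular nonrandom matrix. If $$\mathbf{X}\sim SLT_{k}(q,r,\mathbf{m},\mathbf{R})$$, then $$\mathbf{A}\mathbf{X}\sim SLT_{k}(q,r,\mathbf{A}\mathbf{m},\mathbf{A}\mathbf{R}\mathbf{A}^{\top})$$.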

The characteristic function. Several authors derived formulas for the characteristic function $$\varphi_{\mathbf{T}}$$ of the k-variate t rv T, see e.g. Joarder and Ali (1996) and Dreier and Kotz (2002). Based on these formulas we can obtain the characteristic function of the k-variate slash t as follows. An rv $$\mathbf{X}\sim SLT_{k}(q,r,\mathbf{m},\mathbf{R})$$ can be expressed as the ratio $$\mathbf{X}=\mathbf{T}/U^{1/q}$$ of two independent rv's, so that its characteristic function $$\varphi_{\mathbf{X}}$$ can be written as

$$\begin{aligned} \varphi_{{\mathbf{X}}}({\mathbf{t}}) &={\mathbb{E}}(\exp({i{\mathbf{t}}^{\top} {\mathbf{X}}})) ={\int_{0}^{1}} {\mathbb{E}}(\exp(i{\mathbf{t}}^{\top} {\mathbf{T}} u^{-1/q}))\,du ={\int_{0}^{1}} \varphi_{{\mathbf{T}}}({\mathbf{t}} u^{-1/q})\,du \end{aligned}$$
((2.14))

for t in some neighborhood of the origin in which the above integral converges.

### Example 1.

For the standard univariate and bivariate slash t distributions $$SLT_{k}(q,r)$$, k=1,2, the densities are given by

$$\begin{array}{@{}rcl@{}} f_{1}(x; q,r)={\int_{0}^{1}} qv^{q} t_{1}(vx; r)\,dv, \quad x\in \mathbb{R}, \end{array}$$

and

$$\begin{array}{@{}rcl@{}} f_{2}({\mathbf{x}}; q,r)={\int_{0}^{1}} qv^{q+1} t_{2}(v{\mathbf{x}}; r)\,dv, \quad {\mathbf{x}}\in \mathbb R^{2}. \end{array}$$

Displayed in Figure 2 are the density curves and contours. On the left panel are the density curves of the normal, t and the standard slash t with q=1 and r=3 degrees of freedom. The curves are calibrated so that they have the same height at the origin. Observe that the slash t has the fattest tail whereas the normal has the slimmest one. On the right panel are the contours of the bivariate slash t with q=3 and r=5. Clearly the contours are symmetric.

## The multivariate skew-slash t distributions

In this section, we first recall the skew normal and skew t. In Subsection 3.1, we define the standard skew-slash t distribution, study its relationships with other distributions and give the moments and characteristic function. In Subsection 3.2, we use the hidden truncation or selective sampling model and the order statistics to characterize the skew, slash and skew-slash normal and t distributions. In Subsection 3.3, we define the general skew-slash t distribution, study its linear transformations and close with an example.

Azzalini and Dalla Valle (1996) introduced the skew normal distribution. A k-variate standard skew normal distribution has the density given by

$$\begin{array}{@{}rcl@{}} 2\phi_{k}({\mathbf{z}})\Phi({\boldsymbol{\lambda}}^{\top} {\mathbf{z}}), \quad {\boldsymbol{\lambda}}, {\mathbf{z}}\in{\mathbb{R}}^{k}, \end{array}$$

where $$\phi_{k}$$ is the pdf of the k-variate standard normal $$N_{k}(\mathbf{0},\mathbf{I})$$ and Φ is the cdf of the standard normal N(0,1). Denote it by $$SKN_{k}(\boldsymbol{\lambda})$$.

Kotz and Nadarajah (2004) wrote “…that the possibilities of constructing skewed multivariate t distributions are practically limitless”. The two authors surveyed the definitions given by Gupta (2003), Sahu et al. (2003), Jones (2002) and Azzalini and Capitanio (2003).

Based on these definitions, we may define the skew-slash t distributions in different ways. In this article, however, we will take the following approach. First, we will define the standard skew-slash t based on the standard skew t, a common special case of Gupta (2003), Azzalini and Capitanio (2003), and others. We then introduce a general skew-slash t distribution by introducing location and scale parameters. Our definition may lose some nice interpretations. But we think that this definition is natural, concise and, in particular, convenient in applications.

According to Gupta (2003), a k-variate skew t distribution with parameters $${\boldsymbol {\mu }}\in {\mathbb {R}}^{k}$$, Σ (correlation matrix R), $${\boldsymbol {\lambda }}\in {\mathbb {R}}^{k}$$ and r>0 has the density

$$h_{k}({\mathbf{t}}; {\boldsymbol{\lambda}}, r, \Sigma)=2t_{k}({\mathbf{t}}; r, 0, {\mathbf{R}}) \Psi\left(\frac{{\boldsymbol{\lambda}}^{\top} {\mathbf{t}}}{\sqrt{1+ {\mathbf{t}}^{\top}\Sigma^{-1} {\mathbf{t}}/r}}; r+k\right), \quad {\mathbf{t}}\in{\mathbb{R}}^{k},$$
((3.15))

where Ψ is the cdf of the univariate standard t distribution with r+k degrees of freedom. Denote this distribution by $$SKT_{k}(\boldsymbol{\lambda},r,\Sigma)$$. The form of the density given here is slightly different from that of Gupta (2003). Several constants appearing in his density formula are not explicitly expressed in our form of the density; we have absorbed them into the parameters of the above density. Accordingly, parameters of the same name may have different values.

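When Σ is the identity matrix $$\mathbf{I}_{k}$$, $$SKT_{k}(\boldsymbol{\lambda},r,\mathbf{I}_{k})$$ is referred to as the standard skew t and denoted by $$SKT_{k}(\boldsymbol{\lambda},r)$$, with density $$h_{k}(\mathbf{t};\boldsymbol{\lambda},r)$$ given by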

$$\begin{array}{@{}rcl@{}} h_{k}({\mathbf{t}}; {\boldsymbol{\lambda}}, r)=2t_{k}({\mathbf{t}};r)\Psi\left(\frac{{\boldsymbol{\lambda}}^{\top} {\mathbf{t}}}{\sqrt{1+{\mathbf{t}}^{\top} {\mathbf{t}}/r}}; r+k\right), \quad {\mathbf{t}}\in{\mathbb{R}}^{k}, \end{array}$$

where $$t_{k}(\mathbf{t};r)$$ is the density of the standard k-variate t distribution $$t_{k}(r)$$ with degrees of freedom r. This is a common special case of the skew t shared by Gupta (2003), Azzalini and Capitanio (2003), and others.

### 3.1 The standard multivariate skew-slash t distribution

We begin with the definition, followed by the moments and characteristic function.

### Definition 2.

A k-variate continuous random vector $$\mathbf{W}_{0}$$ is said to have a standard multivariate skew-slash t distribution with skewness parameter λ, tail parameter q and degrees of freedom r, written $$\mathbf{W}_{0}\sim SSLT_{k}(\boldsymbol{\lambda},q,r)$$, if it can be written as the ratio of two independent rv's $$\mathbf{S}\sim SKT_{k}(\boldsymbol{\lambda},r)$$ and $$U\sim U(0,1)$$ as follows:

$$\begin{array}{@{}rcl@{}} {\mathbf{W}}_{0}={{\mathbf{S}}}/{U^{1/q}}. \end{array}$$

The standard skew-slash t generalizes the proposed standard k-variate slash t, the standard skew t of Gupta (2003) and Azzalini and Capitanio (2003), the standard slash normal of Kafadar (1988), and the standard skew-slash normal of Wang and Genton (2006). This is stated below.

### Remark 2.

The limiting distribution of the standard skew-slash t distribution $$SSLT_{k}(\boldsymbol{\lambda},q,r)$$ is, as $$q\to\infty$$, the standard skew t distribution $$SKT_{k}(\boldsymbol{\lambda},r)$$. The limiting distribution of $$SSLT_{k}(\boldsymbol{\lambda},q,r)$$ is, as $$r\to\infty$$, the standard skew-slash normal $$SSLN_{k}(\boldsymbol{\lambda},q)$$, which includes as special cases the standard skew normal $$SKN_{k}(\boldsymbol{\lambda})$$ ($$q=\infty$$) and the standard slash normal $$SLN_{k}(q)$$ (λ=0). When λ=0, $$SSLT_{k}(\boldsymbol{\lambda},q,r)$$ reduces to the slash t distribution $$SLT_{k}(q,r)$$.

In a similar way to the derivation of the standard slash t density in Section 2, we can obtain the density of the standard skew-slash t distribution $$SSLT_{k}(\boldsymbol{\lambda},q,r)$$ as follows: for $${\mathbf {w}}\in {\mathbb {R}}^{k}$$,

$$g_{k}({\mathbf{w}}; {\boldsymbol{\lambda}}, q, r)=2q{\int_{0}^{1}} \!\!u^{k+q-1}t_{k}(u{\mathbf{w}}; r) \Psi\left(\frac{u{\boldsymbol{\lambda}}^{\top} {\mathbf{w}}}{\sqrt{1+u^{2}{\mathbf{w}}^{\top} {\mathbf{w}}/r}}; k+r\right)du.$$
((3.16))
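The density (3.16) can be evaluated in the same way as (2.1), by one-dimensional quadrature. The following is a minimal sketch under our own naming (`t_pdf_std`, `skew_slash_t_pdf_std` are illustrative), with `scipy.stats.t.cdf` playing the role of Ψ.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gammaln
from scipy.stats import t as student_t

def t_pdf_std(x, r):
    """Density t_k(x; r) of the standard k-variate t at a single point x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    k = x.size
    log_c = gammaln((r + k) / 2) - gammaln(r / 2) - 0.5 * k * np.log(r * np.pi)
    return np.exp(log_c - 0.5 * (r + k) * np.log1p(x @ x / r))

def skew_slash_t_pdf_std(w, lam, q, r):
    """Density (3.16) of SSLT_k(lam, q, r) at w, by quadrature over u in [0, 1]."""
    w = np.atleast_1d(np.asarray(w, dtype=float))
    lam = np.atleast_1d(np.asarray(lam, dtype=float))
    k = w.size

    def integrand(u):
        # Argument of Psi in (3.16).
        arg = u * (lam @ w) / np.sqrt(1.0 + u ** 2 * (w @ w) / r)
        return u ** (k + q - 1) * t_pdf_std(u * w, r) * student_t.cdf(arg, df=k + r)

    value, _ = quad(integrand, 0.0, 1.0)
    return 2.0 * q * value
```

Because Ψ(a)+Ψ(−a)=1, the values at w and −w add up to twice the slash t density (2.1), and setting λ=0 returns the slash t density itself; both identities make convenient checks of an implementation.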

The moments. Using the results of Azzalini and Dalla Valle (1996) or Kotz and Nadarajah (2004, pp. 100-101) for the mean vector and covariance of the k-variate skew t distribution (i.e. by setting their α equal to $${\boldsymbol {\lambda }}/\sqrt {1+p/\nu }$$ with their p=k and ν=r here), we obtain the mean vector and covariance matrix of $$\mathbf{W}_{0}\sim SSLT_{k}(\boldsymbol{\lambda},q,r)$$ as follows:

$$\begin{array}{@{}rcl@{}} {\mathbb{E}}({\mathbf{W}}_{0})={\mathbb{E}}({\mathbf{S}}){\mathbb{E}}(U^{-1/q})=\frac{\sqrt{2}qr}{\sqrt{\pi}(q-1)(r-2)}\frac{{\boldsymbol{\lambda}}}{\sqrt{r+k + r {\boldsymbol{\lambda}}^{\top}{\boldsymbol{\lambda}}}}, \; q>1, r>2, \end{array}$$
$$\begin{array}{@{}rcl@{}} {\text{Var}}({\mathbf{W}}_{0})=\frac{qr}{(q-2)(r-2)(r-4)}\left({\mathbf{I}}_{k} - \frac{2(r+4)}{\pi(r-2)}\frac{r{\boldsymbol{\lambda}}{\boldsymbol{\lambda}}^{\top}}{r+k + r {\boldsymbol{\lambda}}^{\top}{\boldsymbol{\lambda}}}\right), \; q>2, r>4. \end{array}$$

The characteristic function. As in deriving the characteristic function for the multivariate slash t in (2.14), one can obtain the characteristic function $$\varphi _{{\mathbf {W}}_{0}}$$ of $$\mathbf{W}_{0}$$ as follows:

$$\begin{array}{@{}rcl@{}} \varphi_{{\mathbf{W}}_{0}}({\mathbf{t}})={\mathbb{E}}(\exp({i{\mathbf{t}}^{\top} {\mathbf{W}}_{0}})) ={\int_{0}^{1}} \varphi_{{\mathbf{S}}}({\mathbf{t}} u^{-1/q})\,du, \quad {\mathbf{t}} \in {\mathcal{N}}, \end{array}$$

for some neighborhood $$\mathcal{N}$$ of the origin in which the above integral converges, where $$\varphi_{\mathbf{S}}$$ is the characteristic function of the standard skew t rv $$\mathbf{S}\sim SKT_{k}(\boldsymbol{\lambda},r)$$.

### 3.2 Hidden truncation and order-statistics characterization

In this subsection, we characterize the skew t, skew-slash normal and skew-slash t distributions using the hidden truncation or selective sampling model and the order statistics of the components of a bivariate normal or t variable.

Hidden truncation or selective sampling. We first give a fact about conditional pdfs. Let X be a continuous random vector with pdf f. Let $$X_{0}$$ be a random variable with cdf $$F_{0}$$. Let a be a measurable function of X such that P(A)>0 with $$A=\{a(\mathbf{X})\geq X_{0}\}$$. Suppose X and $$X_{0}$$ are independent. Then for every x,

$$\begin{array}{@{}rcl@{}} \begin{aligned} P({\mathbf{X}}\leq {\mathbf{x}}|a({\mathbf{X}})\ge X_{0}) &=P({\mathbf{X}}\leq {\mathbf{x}}, a({\mathbf{X}})\ge X_{0})/P(A)\\ &=E({\mathbf{1}}[{\mathbf{X}}\leq {\mathbf{x}}]F_{0}(a({\mathbf{X}})))/P(A) =\int_{-\infty}^{{\mathbf{x}}} f({\mathbf{y}})F_{0}(a({\mathbf{y}}))\,d{\mathbf{y}}/P(A). \end{aligned} \end{array}$$

Hence the conditional pdf of X given A is

$$f({\mathbf{x}}|A)=f({\mathbf{x}}) F_{0}(a({\mathbf{x}}))/P(A).$$
((3.17))

Using this we immediately derive the following results.

### Proposition 1.

Suppose $$\mathbf{T}\sim t_{k}(r)$$ and $$T_{0}\sim t_{1}(r+k)$$ are independent. Then the conditional pdf of T given $$A=\{a(\mathbf{T})\geq T_{0}\}$$ is

$$f({\mathbf{t}}|A)=t_{k}({\mathbf{t}}; r) \Psi(a({\mathbf{t}}); r+k)/P(A), \quad {\mathbf{t}}\in{\mathbb{R}}^{k}.$$
((3.18))

Consequently, if $$U\sim U(0,1)$$ is independent of both T and $$T_{0}$$, then the conditional pdf of $$\mathbf{T}/U^{1/q}$$ for 0<q<1 given A is

$$g({\mathbf{t}}|A)=2q{\int_{0}^{1}} u^{k+q-1}t_{k}(u{\mathbf{t}}; r) \Psi(a(u{\mathbf{t}}); r+k)\,du/2P(A), \quad {\mathbf{t}} \in{\mathbb{R}}^{k}.$$
((3.19))

In particular, both (3.18) and (3.19) hold for $$a({\mathbf {t}})=(\tau _{0}+{\boldsymbol {\tau }}^{\top }{\mathbf {t}})/\sqrt {1+{\mathbf {t}}^{\top }{\mathbf {t}}/r}$$, where $$\tau_{0}$$ and $${\boldsymbol{\tau}}$$ are arbitrary constants. In this case,

$$f({\mathbf{t}}|A)=2t_{k}({\mathbf{t}}; r) \Psi\left(\frac{\tau_{0}+{\boldsymbol{\tau}}^{\top}{\mathbf{t}}}{\sqrt{1+{\mathbf{t}}^{\top}{\mathbf{t}}/r}}; r+k\right)/2P(A), \quad {\mathbf{t}}\in{\mathbb{R}}^{k},$$
((3.20))
$$g({\mathbf{t}}|A)=2q{\int_{0}^{1}} u^{k+q-1}t_{k}(u{\mathbf{t}}; r) \Psi\left(\frac{\tau_{0}+u{\boldsymbol{\tau}}^{\top}{\mathbf{t}}}{\sqrt{1+u^{2}{\mathbf{t}}^{\top}{\mathbf{t}}/r}}; r+k\right)\,du/2P(A).$$
((3.21))

### Remark 3.

The density function in (3.21) is the conditional pdf of the k-variate slash t rv $${\mathbf{X}}={\mathbf{T}}/U^{1/q}$$ given A. It is noteworthy that the hidden truncation model yields the pdfs (3.20) and (3.21): the former is proportional to the pdf (3.15) of the skew t distribution, and the latter is proportional to the pdf (3.16) of the proposed skew-slash t distribution. For more discussion see e.g. Chapter 6 of Genton (2004) and the references therein. The skew-slash normal of Wang and Genton (2006) is the special case $$r=\infty$$.
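One case in which the normalizing constant is explicit is $$\tau_{0}=0$$. The following short step is added as a sketch; it uses only the symmetry of the t distributions of $${\mathbf{T}}$$ and $$T_{0}$$ about the origin, their independence, and the fact that a is then an odd function:

$$a(-{\mathbf{t}})=-a({\mathbf{t}}) \;\Longrightarrow\; a({\mathbf{T}})-T_{0}\overset{d}{=}-\left(a({\mathbf{T}})-T_{0}\right) \;\Longrightarrow\; P(A)=P\left(a({\mathbf{T}})-T_{0}\ge 0\right)=\tfrac{1}{2}.$$

Hence the factor 2P(A) in (3.20) and (3.21) equals one when $$\tau_{0}=0$$, and both conditional pdfs are normalized without further computation.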

Order statistics characterization. Generalizing (c) of Theorem 1 in Arnold and Lin (2004), we give the following fact. Let $$(Y_{1},Y_{2})$$ be a bivariate rv with pdf f and cdf F. Let $$Y_{1,2}=\min(Y_{1},Y_{2})$$ and $$Y_{2,2}=\max(Y_{1},Y_{2})$$ be the order statistics of the components of the bivariate random vector. Then

$$\begin{aligned} P\left(Y_{1,2}>y_{1}\right)&=P\left(Y_{2}\ge Y_{1}>y_{1}\right)+P(Y_{1} > Y_{2}>y_{1})\\ &=\int_{y_{1}}^{\infty} {dx}_{1} \int_{x_{1}}^{\infty} f\left(x_{1},x_{2}\right)\,{dx}_{2}+\int_{y_{1}}^{\infty} {dx}_{2} \int_{x_{2}}^{\infty} f\left(x_{1},x_{2}\right)\,{dx}_{1}. \end{aligned}$$

Thus the pdf $$f_{(1)}(y_{1})$$ of $$Y_{1,2}$$ is given by

$$f_{(1)}\left(\,y_{1}\right)=\int_{y_{1}}^{\infty} f\left(\,y_{1}, x_{2}\right)\,{dx}_{2}+\int_{y_{1}}^{\infty} f\left(x_{1}, y_{1}\right)\,{dx}_{1}.$$
((3.22))

Let $$f_{1},f_{2}$$ be the respective marginal pdfs of $$Y_{1},Y_{2}$$. Let

$$\begin{array}{@{}rcl@{}} F_{1}\left(\,y_{1}|y_{2}\right)=P\left(Y_{1}\leq y_{1}|Y_{2}=y_{2}\right), \quad F_{2}\left(\,y_{2}|y_{1}\right)=P\left(Y_{2}\leq y_{2}|Y_{1}=y_{1}\right). \end{array}$$

Then, writing $$\bar F=1-F$$ and using $$\int _{y_{1}}^{\infty } f(y_{1}, x_{2})\,{dx}_{2}=f_{1}\left (\,y_{1}\right)\int _{y_{1}}^{\infty } f\left (\,y_{1}, x_{2}\right)/f_{1}\left (\,y_{1}\right)\,{dx}_{2} =f_{1}\left (\,y_{1}\right)\bar F_{2}\left (\,y_{1}|y_{1}\right)$$ together with (3.22), we derive

$$f_{(1)}\left(\,y_{1}\right)=f_{1}\left(\,y_{1}\right)\bar F_{2}\left(\,y_{1}|y_{1}\right)+f_{2}\left(\,y_{1}\right)\bar F_{1}\left(\,y_{1}|y_{1}\right).$$
((3.23))

Similarly we derive the pdf $$f_{(2)}(y_{2})$$ of $$Y_{2,2}$$ below:

$$f_{(2)}\left(\,y_{2}\right)=f_{2}\left(\,y_{2}\right) F_{1}\left(\,y_{2}|y_{2}\right)+f_{1}\left(\,y_{2}\right)F_{2}\left(\,y_{2}|y_{2}\right).$$
((3.24))
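As a quick consistency check of (3.23) and (3.24), suppose (an assumption made only for this illustration) that $$Y_{1}$$ and $$Y_{2}$$ are independent with a common pdf h and cdf H. The conditional cdfs then coincide with H, and the two formulas reduce to

$$f_{(1)}(y)=2h(y)\left\{1-H(y)\right\}, \qquad f_{(2)}(y)=2h(y)H(y),$$

which are the familiar densities of the minimum and the maximum of two independent and identically distributed observations.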

In their Theorem 1, Arnold and Lin (2004) showed that the order statistics of the components of a random vector from a bivariate normal distribution obey the skew-normal law. Using (3.23) and (3.24) we can show that the order statistics of the components of a random vector from a bivariate t distribution obey the skew t law. Thus we extend their result from the normal to the t distribution, as stated below.

### Proposition 2.

Let $$(T_{1},T_{2})$$ have a bivariate t distribution $$t_{2}(r,{\mathbf{0}},{\mathbf{R}})$$ with degrees of freedom r, zero mean vector and correlation matrix $${\mathbf{R}}=\begin{pmatrix}1&\rho\\ \rho&1\end{pmatrix}$$ with −1<ρ<1. Define $$T_{1,2}=\min(T_{1},T_{2})$$ and $$T_{2,2}=\max(T_{1},T_{2})$$. Then the pdfs $$t_{(1)},t_{(2)}$$ of $$T_{1,2},T_{2,2}$$ are given by

$$t_{(1)}(t_{1})=2t_{1}(t_{1};r)\Psi\left(\frac{-\lambda t_{1}}{\sqrt{1+{t_{1}^{2}}/r}};r+1\right), \quad t_{1}\in{\mathbb{R}},$$
((3.25))
$$t_{(2)}(t_{2})=2t_{1}(t_{2};r)\Psi\left(\frac{\lambda t_{2}}{\sqrt{1+{t_{2}^{2}}/r}};r+1\right), \quad t_{2}\in{\mathbb{R}},$$
((3.26))

where $$\lambda =\lambda (r,\rho)=\sqrt {({1+1/r})({1-\rho })/({{1+\rho })}}$$.

### Remark 4.

The density functions in (3.25) and (3.26) reduce to the result (c) of Theorem 1 of Arnold and Lin (2004) when the df $$r=\infty$$, since $$\lambda (\infty, \rho)=\sqrt {(1-\rho)/(1+\rho)}$$ is equal to their skewness parameter γ in the skew-normal distribution.

### PROOF OF PROPOSITION 2.

Note first that $$T_{1},T_{2}$$ have the same standard univariate t distribution $$t_{1}(r)$$. Using the formula for the conditional pdf of the t distribution (see e.g. page 16 of Kotz and Nadarajah (2004)), the conditional cdf of $$T_{2}$$ given $$T_{1}=t_{1}$$ can be written as

$$\begin{array}{@{}rcl@{}} P(T_{2}\leq t_{2}|T_{1}=t_{1})=\Psi(b(t_{1}, t_{2}; r, \rho); r+1), \quad t_{1}, t_{2} \in {\mathbb{R}}, \end{array}$$

where

$$\begin{array}{@{}rcl@{}} b(t_{1}, t_{2}; r, \rho)=\frac{(t_{2}-\rho t_{1})\sqrt{1+1/r}}{\sqrt{(1+{t_{1}^{2}}/r)(1-\rho^{2})}}. \end{array}$$

Similarly,

$$\begin{array}{@{}rcl@{}} P(T_{1}\leq t_{1}|T_{2}=t_{2})=\Psi(b(t_{2}, t_{1}; r, \rho); r+1), \quad t_{1}, t_{2} \in {\mathbb{R}}. \end{array}$$

We now apply (3.24), with both $$f_{1}$$ and $$f_{2}$$ equal to the pdf of $$t_{1}(r)$$ and both $$F_{1}$$ and $$F_{2}$$ equal to the cdf $$\Psi(\cdot\,;r+1)$$ of the t distribution $$t_{1}(r+1)$$, to obtain the pdf $$t_{(2)}$$ given in (3.26), noting in this case $$b(t_{2}, t_{2}; r, \rho)={\lambda t_{2}}/{\sqrt {1+{t_{2}^{2}}/r}}$$. Analogously we can prove (3.25) in view of the equality $$\bar \Psi (t; r)=\Psi (-t; r)$$, which holds by the symmetry of the univariate t distribution. This completes the proof.
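Formula (3.26) can also be checked numerically. The following Python sketch, which uses only the standard library, simulates the maximum of a bivariate t pair and compares the empirical cdf at one point with the integral of the right-hand side of (3.26). The helper names, the Simpson-rule quadrature, the truncation of the lower limit at −40 and the parameter values r=4, ρ=0.5 are all choices made for this illustration.

```python
import math
import random


def t_pdf(x, r):
    """Density t_1(x; r) of the standard univariate t distribution."""
    c = math.exp(math.lgamma((r + 1) / 2) - math.lgamma(r / 2)) / math.sqrt(r * math.pi)
    return c * (1.0 + x * x / r) ** (-(r + 1) / 2)


def simpson(f, a, b, n):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3.0


def t_cdf(x, r):
    """Cdf Psi(x; r), obtained by integrating the pdf from 0 to |x|."""
    if x == 0.0:
        return 0.5
    return 0.5 + math.copysign(simpson(lambda s: t_pdf(s, r), 0.0, abs(x), 100), x)


def max_pdf(t, r, rho):
    """Right-hand side of (3.26)."""
    lam = math.sqrt((1 + 1 / r) * (1 - rho) / (1 + rho))
    return 2 * t_pdf(t, r) * t_cdf(lam * t / math.sqrt(1 + t * t / r), r + 1)


def simulate_max(n, r, rho, rng):
    """Draw max(T1, T2) for a bivariate t pair with correlation rho."""
    out = []
    for _ in range(n):
        z1 = rng.gauss(0, 1)
        z2 = rho * z1 + math.sqrt(1 - rho * rho) * rng.gauss(0, 1)
        w = math.sqrt(rng.gammavariate(r / 2, 2) / r)  # sqrt(chi2_r / r)
        out.append(max(z1, z2) / w)
    return out


r, rho, x = 4, 0.5, 1.0
rng = random.Random(7)
draws = simulate_max(40_000, r, rho, rng)
empirical = sum(d <= x for d in draws) / len(draws)
# The lower limit -40 discards a tail mass of order 1e-6 when r = 4.
from_formula = simpson(lambda t: max_pdf(t, r, rho), -40.0, x, 2000)
print(empirical, from_formula)
```

The two printed probabilities should agree up to Monte Carlo error. Replacing `max` by `min` and the sign of the argument of `t_cdf` gives the analogous check of (3.25).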

As a corollary of Proposition 2, we obtain a characterization of the skew-slash normal and t distributions through the order statistics of the components of a random vector from a bivariate t distribution as stated below.

### Corollary 1.

Let $$(T_{1},T_{2})$$ be as given in Proposition 2. Assume $$U\sim U(0,1)$$ is independent of $$(T_{1},T_{2})$$. Then the pdfs $$g_{(1)},g_{(2)}$$ of $$T_{1,2}/U^{1/q},T_{2,2}/U^{1/q}$$ for 0<q<1 are given by

$$g_{(1)}(t_{1})=2q{\int_{0}^{1}} u^{q} t_{1}({ut}_{1};r)\Psi\left(\frac{-u\lambda t_{1}}{\sqrt{1+u^{2}{t_{1}^{2}}/r}};r+1\right)\,du, \quad t_{1}\in{\mathbb{R}},$$
((3.27))
$$g_{(2)}(t_{2})=2q{\int_{0}^{1}} u^{q} t_{1}({ut}_{2};r)\Psi\left(\frac{u\lambda t_{2}}{\sqrt{1+u^{2}{t_{2}^{2}}/r}};r+1\right)\,du, \quad t_{2}\in{\mathbb{R}}.$$
((3.28))

### Remark 5.

The density functions in (3.27) and (3.28) (i) are the pdfs of the order statistics of the components of the random vector $$(T_{1},T_{2})/U^{1/q}$$ from the bivariate slash t distribution $$t_{2}(r,q)$$, and (ii) reduce to the case of the skew-slash normal of Wang and Genton (2006) when the df $$r=\infty$$.

### PROOF OF COROLLARY 1.

The desired (3.27) follows from (3.25) and the equalities

$$\begin{array}{@{}rcl@{}} P(T_{1,2}/U^{1/q}\leq t_{1})={\int_{0}^{1}} P(T_{1,2}\leq u^{1/q} t_{1})\,du =q\int_{-\infty}^{t_{1}} {\int_{0}^{1}} v^{q} t_{(1)}(vs)\,dv\,ds, \end{array}$$

where the independence of $$T_{1,2}$$ and U is used to claim the first equality, while the second equality follows from the change of variables $$v=u^{1/q}$$; the factor 2 in (3.27) is already contained in $$t_{(1)}$$. Similarly (3.28) can be proved, and this finishes the proof.

The multivariate skew-slash t distributions. We now introduce a general multivariate skew-slash t distribution by incorporating location and scale parameters.

### Definition 3.

A continuous k-variate rv $${\mathbf{W}}$$ has a multivariate skew-slash t distribution with location $${\boldsymbol{\mu}}$$, scale Σ, skewness parameter $${\boldsymbol {\lambda }}\in {\mathbb {R}}^{k}$$, tail parameter q and degrees of freedom r, written $${\mathbf{W}}\sim {SSLT}_{k}({\boldsymbol{\lambda}},q,r,{\boldsymbol{\mu}},\Sigma)$$, if it can be represented as a linear transformation of the standard multivariate skew-slash t rv $${\mathbf{W}}_{0}\sim {SSLT}_{k}({\boldsymbol{\lambda}},q,r)$$ as follows:

$$\begin{array}{@{}rcl@{}} {\mathbf{W}}={\boldsymbol{\mu}}+\Sigma^{1/2}{\mathbf{W}}_{0}, \end{array}$$

where $$\Sigma^{1/2}$$ is the Cholesky factor of the positive definite covariance matrix Σ.
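The representation in Definition 3 suggests a direct way to simulate from the distribution. The Python sketch below handles the bivariate case and rests on several assumptions: the standard skew t vector is generated by the hidden truncation mechanism of Proposition 1 with $$\tau_{0}=0$$ and $${\boldsymbol{\tau}}={\boldsymbol{\lambda}}$$, the slash step divides by $$U^{1/q}$$ with U independent of everything else, and $$\Sigma^{1/2}$$ is taken to be the lower-triangular Cholesky factor. All function names are hypothetical.

```python
import math
import random


def standard_sslt(k, lam, q, r, rng):
    """One draw of W0 ~ SSLT_k(lam, q, r): hidden truncation, then slashing."""
    while True:
        # T ~ t_k(r): standard normal vector divided by sqrt(chi2_r / r)
        s = math.sqrt(rng.gammavariate(r / 2, 2) / r)
        t = [rng.gauss(0, 1) / s for _ in range(k)]
        # T0 ~ t_1(r + k), independent of T
        s0 = math.sqrt(rng.gammavariate((r + k) / 2, 2) / (r + k))
        t0 = rng.gauss(0, 1) / s0
        a = sum(l * x for l, x in zip(lam, t)) / math.sqrt(1 + sum(x * x for x in t) / r)
        if a >= t0:  # selection event A of Proposition 1
            break
    u = 1.0 - rng.random()  # U in (0, 1], avoids division by zero
    return [x / u ** (1 / q) for x in t]


def cholesky2(sigma):
    """Lower-triangular Cholesky factor of a 2x2 positive definite matrix."""
    (a, b), (_, d) = sigma
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(d - l21 * l21)
    return [[l11, 0.0], [l21, l22]]


def sslt2_sample(n, lam, q, r, mu, sigma, rng):
    """n draws of W = mu + L W0, with L the Cholesky factor of sigma."""
    L = cholesky2(sigma)
    out = []
    for _ in range(n):
        w0 = standard_sslt(2, lam, q, r, rng)
        out.append([mu[0] + L[0][0] * w0[0],
                    mu[1] + L[1][0] * w0[0] + L[1][1] * w0[1]])
    return out


rng = random.Random(42)
draws = sslt2_sample(5, (2.0, 1.0), 3.0, 5.0, (1.0, -1.0), [[4.0, 1.0], [1.0, 2.0]], rng)
for w in draws:
    print(w)
```

The rejection step accepts about half of the proposals, so the sampler is inexpensive; a different square root of Σ would give the same family of distributions but a different parametrization of the skewness vector.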

By a change of variables in (3.16), we derive the density of W given by

$$\begin{aligned} g_{k}({\mathbf{w}}; {\boldsymbol{\lambda}}, q, r, {\boldsymbol{\mu}},\Sigma)&=2q|\Sigma|^{-1/2}{\int_{0}^{1}} u^{k+q-1}t_{k}(u\Sigma^{-1/2}({\mathbf{w}}-{\boldsymbol{\mu}}); r)\\ &\quad \times\Psi\left(\frac{u{\boldsymbol{\lambda}}^{\top}\Sigma^{-1/2}({\mathbf{w}}-{\boldsymbol{\mu}})} {\sqrt{1+u^{2}Q({\mathbf{w}}; {\boldsymbol{\mu}}, \Sigma)/r}}; k+r\right)\,du, \quad {\mathbf{w}}\in{\mathbb{R}}^{k}, \end{aligned}$$
((3.29))

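where $$Q({\mathbf{w}};{\boldsymbol{\mu}},\Sigma)=({\mathbf{w}}-{\boldsymbol{\mu}})^{\top}\Sigma^{-1}({\mathbf{w}}-{\boldsymbol{\mu}})$$.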

As in the case of the standard skew-slash t, one notices that the skew-slash t generalizes the slash t, the skew t of Azzalini and Capitanio, the slash normal of Kafadar, the skew normal of Azzalini and Dalla Valle and the skew-slash normal of Wang and Genton. This is stated below.

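### Remark 6.

The limiting distribution of the multivariate skew-slash t distribution $${SSLT}_{k}({\boldsymbol{\lambda}},q,r,{\boldsymbol{\mu}},\Sigma)$$, as q tends to infinity, is the skew t distribution $${SKT}_{k}({\boldsymbol{\lambda}},r,{\boldsymbol{\mu}},\Sigma)$$. The limiting distribution of $${SSLT}_{k}({\boldsymbol{\lambda}},q,r,{\boldsymbol{\mu}},\Sigma)$$, as r tends to infinity, is the skew-slash normal $${SSLN}_{k}({\boldsymbol{\lambda}},q,{\boldsymbol{\mu}},\Sigma)$$, which includes as special cases the k-variate skew normal $${SKN}_{k}({\boldsymbol{\lambda}},{\boldsymbol{\mu}},\Sigma)$$ ($$q=\infty$$) and the k-variate slash normal $${SLN}_{k}(q;{\boldsymbol{\mu}},\Sigma)$$ ($${\boldsymbol{\lambda}}={\mathbf{0}}$$). When $${\boldsymbol{\lambda}}={\mathbf{0}}$$, $${SSLT}_{k}({\boldsymbol{\lambda}},q,r,{\boldsymbol{\mu}},\Sigma)$$ simplifies to the k-variate slash t distribution $${SLT}_{k}(q,r,{\boldsymbol{\mu}},\Sigma)$$.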

Linear combinations. Since the distribution of a linear function of a k-variate skew t variable is still skew t (see e.g. Section 5.9 of Kotz and Nadarajah (2004)), we immediately obtain the following result. Note that the relationship between our skewness parameter $${\boldsymbol{\lambda}}$$ and their $${\boldsymbol{\alpha}}$$ is $${\boldsymbol {\lambda }}=\sqrt {1+k/r}{\boldsymbol {\alpha }}$$. Let $${\mathbf{D}}=\mathrm{diag}(\sigma_{1,1},\ldots,\sigma_{k,k})$$ denote the diagonal matrix consisting of the diagonal entries of $$\Sigma=(\sigma_{i,j})$$ and let $${\mathbf{R}}={\mathbf{D}}^{-1/2}\Sigma{\mathbf{D}}^{-1/2}$$ be the correlation matrix.

### Theorem 4.

Let $${\mathbf{A}}$$ be a nonsingular matrix. If $${\mathbf{W}}\sim {SSLT}_{k}({\boldsymbol{\lambda}},q,r,{\boldsymbol{\mu}},\Sigma)$$, then $${\mathbf {A}}{\mathbf {W}} \sim {SSLT}_{k}(\tilde {\boldsymbol {\lambda }}, q, r, {\mathbf {A}}{\boldsymbol {\mu }}, \widetilde \Sigma)$$ where $$\widetilde \Sigma ={\mathbf {A}}\Sigma {\mathbf {A}}^{\top }$$ and

$$\begin{array}{@{}rcl@{}} \tilde{\boldsymbol{\lambda}}=\frac{\widetilde\Sigma^{-1/2}{\mathbf{B}}^{\top} {\boldsymbol{\lambda}}}{\sqrt{1+(1+k/r)^{-1}{\boldsymbol{\lambda}}^{\top} ({\mathbf{R}}-{\mathbf{B}} \widetilde\Sigma^{-1}{\mathbf{B}}^{\top}){\boldsymbol{\lambda}}}}, \quad {\mathbf{B}}={\mathbf{D}}^{-1/2}\Sigma{\mathbf{A}}. \end{array}$$

To give a graphical view of the skewness and tail behaviors of the skew-slash t distributions, we plot the density curves of the univariate standard skew-slash t and contours of the bivariate standard skew-slash t below.

### Example 2.

For the univariate and bivariate standard skew-slash t distributions $${SSLT}_{1}(\lambda,q,r)$$ and $${SSLT}_{2}(\lambda_{1},\lambda_{2},q,r)$$, the densities are given by

$$g_{1}(w;\lambda,q,r)=2q{\int_{0}^{1}} u^{q} t_{1}(uw; r) \Psi\left(\frac{u\lambda w}{\sqrt{1+u^{2}w^{2}/r}}; 1+r\right)\,du, \quad w\in{\mathbb{R}},$$
((3.30))

and, with $${\mathbf {w}}=(w_{1},w_{2})^{\top }\in {\mathbb {R}}^{2}$$ and $${\boldsymbol{\lambda}}=(\lambda_{1},\lambda_{2})^{\top}$$,

$$g_{2}({\mathbf{w}}; {\boldsymbol{\lambda}}, q, r)= 2q{\int_{0}^{1}} u^{q+1}t_{2}(u{\mathbf{w}}; r) \Psi\left(\frac{u(\lambda_{1}w_{1}+\lambda_{2}w_{2})}{\sqrt{1+u^{2}({w_{1}^{2}}+{w_{2}^{2}})/r}}; 2+r\right)\,du.$$
((3.31))

Displayed in Figure 3 are the density curves of the standard skew-slash t distribution $${SSLT}_{1}(3,1,2)$$, the skew t distribution $${SKT}_{1}(3,2)$$, and the slash t distribution $${SLT}_{1}(1,2)$$. The curves are shifted and rescaled for comparison. Observe that the skew-slash t has the fattest tail and is the most skewed to the right.

Displayed in Figure 4 are the contour plots of the bivariate standard skew-slash t distribution $${SSLT}_{2}({\boldsymbol{\lambda}},3,5)$$ for different values of the skewness parameter vector $${\boldsymbol{\lambda}}$$. Clearly the contours are more skewed as the skewness parameter vector $${\boldsymbol{\lambda}}$$ gets longer.
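The univariate density (3.30) is a one-dimensional integral and is easy to evaluate numerically. The Python sketch below is one possible implementation using only the standard library; the Simpson-rule quadrature, the number of nodes and the helper names are choices made for this illustration, and the rule is less accurate for q<1, where the integrand is not smooth at u=0.

```python
import math


def t_pdf(x, r):
    """Density t_1(x; r) of the standard univariate t distribution."""
    c = math.exp(math.lgamma((r + 1) / 2) - math.lgamma(r / 2)) / math.sqrt(r * math.pi)
    return c * (1.0 + x * x / r) ** (-(r + 1) / 2)


def simpson(f, a, b, n):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3.0


def t_cdf(x, r):
    """Cdf Psi(x; r), obtained by integrating the pdf from 0 to |x|."""
    if x == 0.0:
        return 0.5
    return 0.5 + math.copysign(simpson(lambda s: t_pdf(s, r), 0.0, abs(x), 100), x)


def sslt1_pdf(w, lam, q, r, n=400):
    """Univariate standard skew-slash t density (3.30), integrated over u."""
    def integrand(u):
        arg = u * lam * w / math.sqrt(1 + u * u * w * w / r)
        return u ** q * t_pdf(u * w, r) * t_cdf(arg, r + 1)
    return 2 * q * simpson(integrand, 0.0, 1.0, n)


# Density of SSLT_1(3, 1, 2), the parameter values used for Figure 3,
# on a small grid of points.
for w in (-2.0, -1.0, 0.0, 1.0, 2.0, 4.0):
    print(w, sslt1_pdf(w, 3.0, 1.0, 2.0))
```

Two properties that follow from (3.30) are convenient for testing such code: at w=0 the density equals $$q\,t_{1}(0;r)/(q+1)$$, and $$g_{1}(w;\lambda,q,r)+g_{1}(-w;\lambda,q,r)=2g_{1}(w;0,q,r)$$ because $$\Psi(x;r+1)+\Psi(-x;r+1)=1$$.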

## Statistical inference

In this section, we discuss maximum likelihood estimation and the Bayesian method and provide the approximate sampling distribution of the estimates.

The likelihood approach. Let $$p({\mathbf{z}};{\boldsymbol{\theta}})$$ denote either the slash t density in (2.1) or the skew-slash t density in (3.29), where $${\boldsymbol{\theta}}$$ denotes the corresponding parameter vector, i.e. $${\boldsymbol{\theta}}=(q,r,{\mathbf{m}},{\mathbf{R}})$$ or $${\boldsymbol{\theta}}=({\boldsymbol{\lambda}},q,r,{\boldsymbol{\mu}},\Sigma)$$. As the degrees of freedom r is unknown, we estimate it by the MLE, treating it as a positive real number. Let $${\mathbf{Z}}_{1},\ldots,{\mathbf{Z}}_{n}$$ be a random sample from the density p. Based on the sample, the parameter $${\boldsymbol{\theta}}$$ can be estimated by the MLE $$\hat {\boldsymbol {\theta }}$$. This is the parameter value that maximizes the average log-likelihood $$L({\boldsymbol {\theta }})= \frac {1}{n}\sum ^{n}_{i=1}\log p({\mathbf {Z}}_{i}; {\boldsymbol {\theta }})$$ of the sample over the parameter space Θ, that is,

$$L(\hat{{\boldsymbol{\theta}}}) =\max_{{\boldsymbol{\theta}} \in \Theta} \frac{1}{n}\sum^{n}_{i=1} \log p({\mathbf{Z}}_{i}; {\boldsymbol{\theta}}).$$
((4.32))

Let $${\mathbf {S}}_{i}({\boldsymbol {\theta }})=\frac {\partial }{\partial {\boldsymbol {\theta }}}\log p({\mathbf {Z}}_{i}; {\boldsymbol {\theta }})$$ be the score for the observation $${\mathbf{Z}}_{i}$$. Under suitable conditions, the MLE $$\hat {\boldsymbol {\theta }}$$ is the solution to the score equation

$$\sum_{i=1}^{n} {\mathbf{S}}_{i}({\boldsymbol{\theta}})=0.$$
((4.33))

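Let $${\mathbf{J}}({\boldsymbol{\theta}})={\mathbb{E}}({\mathbf{S}}_{1}({\boldsymbol{\theta}}){\mathbf{S}}_{1}({\boldsymbol{\theta}})^{\top})$$ be the information matrix. The information matrix can be estimated by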

$$\hat{\mathbf{J}}=\frac{1}{n}\sum^{n}_{i=1}{\mathbf{S}}_{i}(\hat{\boldsymbol{\theta}}){\mathbf{S}}_{i}(\hat{\boldsymbol{\theta}})^{\top}.$$
((4.34))

Under suitable conditions, the sampling distribution of $$\hat {\boldsymbol {\theta }}$$ can be approximated by the normal distribution with mean vector $${\boldsymbol{\theta}}$$ and variance-covariance matrix $$n^{-1}\hat {\mathbf {J}}^{-1}$$, that is, approximately

$$\hat{\boldsymbol{\theta}} \sim \mathcal{N}({\boldsymbol{\theta}},\, n^{-1}{\hat{\mathbf{J}}^{-1}}).$$
((4.35))

Based on this normal approximation, one can perform hypothesis testing, construct confidence intervals and, in particular, calculate the standard error (SE) of each component $$\hat {\boldsymbol {\theta }}_{j}$$ of the MLE $$\hat {\boldsymbol {\theta }}$$ as follows:

$$SE(\hat{\boldsymbol{\theta}}_{j})=\sqrt{n^{-1}\hat {\mathbf{J}}^{jj}}, \quad j=1, \ldots, k,$$
((4.36))

where $$\hat {\mathbf {J}}^{jj}$$ is the (j,j)-entry of the estimated inverse information matrix $$\hat {\mathbf {J}}^{-1}$$.
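The steps (4.34) to (4.36) can be sketched in a few lines of Python. To keep the example short and verifiable, the sketch uses a stand-in model that is not one of the proposed distributions: a univariate normal density with θ=(μ,σ), for which the standard errors are known to be approximately $$\sigma/\sqrt{n}$$ and $$\sigma/\sqrt{2n}$$. The scores are approximated by central finite differences, which is an assumption of this sketch and would carry over unchanged to any other log-density.

```python
import math
import random


def log_density(z, theta):
    """Stand-in log p(z; theta): normal density with theta = (mu, sigma)."""
    mu, sigma = theta
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * ((z - mu) / sigma) ** 2


def score(z, theta, h=1e-5):
    """Central finite-difference approximation to the score S_i(theta)."""
    g = []
    for j in range(len(theta)):
        up = list(theta)
        dn = list(theta)
        up[j] += h
        dn[j] -= h
        g.append((log_density(z, up) - log_density(z, dn)) / (2 * h))
    return g


def standard_errors(data, theta_hat):
    """Standard errors from (4.34)-(4.36) for a two-parameter model."""
    n = len(data)
    j11 = j12 = j22 = 0.0
    for z in data:
        s1, s2 = score(z, theta_hat)
        j11 += s1 * s1 / n  # entries of J_hat in (4.34)
        j12 += s1 * s2 / n
        j22 += s2 * s2 / n
    det = j11 * j22 - j12 * j12
    inv11, inv22 = j22 / det, j11 / det  # diagonal of the inverse of J_hat
    return math.sqrt(inv11 / n), math.sqrt(inv22 / n)


rng = random.Random(11)
data = [rng.gauss(1.0, 2.0) for _ in range(2000)]
mu_hat = sum(data) / len(data)
sigma_hat = math.sqrt(sum((z - mu_hat) ** 2 for z in data) / len(data))
se_mu, se_sigma = standard_errors(data, (mu_hat, sigma_hat))
print(se_mu, sigma_hat / math.sqrt(len(data)))
print(se_sigma, sigma_hat / math.sqrt(2 * len(data)))
```

For a model with more parameters the 2×2 inverse would be replaced by a general matrix inverse; the outer-product estimate itself is unchanged.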

The numerical value of the MLE $$\hat {\boldsymbol {\theta }}$$ can be found by solving the score equation (4.33) using Newton’s method. Alternatively, one can directly search for the solution of the maximization problem (4.32) using, for example, the function optim in R.

As initial values for Newton’s algorithm, one can use the moment estimates of the parameters or other available consistent estimates. One technical issue here is that the estimate $$\hat \Sigma$$ of Σ must be positive definite. In our applications, we estimated the entries of Σ and then verified the positive definiteness of $$\hat \Sigma$$.
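To illustrate Newton’s method for a score equation of the form (4.33), the Python sketch below treats a deliberately simplified stand-in problem: estimating only the location μ of a univariate t sample with known degrees of freedom r and unit scale. This is not the full skew-slash t likelihood; it is chosen because the score and its derivative have closed forms. The sample median serves as the consistent starting value, and all names are hypothetical.

```python
import math
import random
import statistics


def score_sum(mu, data, r):
    """Sum of the scores for the location of a t_1(r) sample (scale fixed at 1)."""
    return sum((r + 1) * (z - mu) / (r + (z - mu) ** 2) for z in data)


def score_derivative_sum(mu, data, r):
    """Derivative of score_sum with respect to mu."""
    return sum((r + 1) * ((z - mu) ** 2 - r) / (r + (z - mu) ** 2) ** 2 for z in data)


def newton_location(data, r, tol=1e-10, max_iter=50):
    """Newton's method for the score equation, started at the sample median."""
    mu = statistics.median(data)
    for _ in range(max_iter):
        step = score_sum(mu, data, r) / score_derivative_sum(mu, data, r)
        mu -= step
        if abs(step) < tol:
            break
    return mu


rng = random.Random(3)
r, true_mu = 3, 1.5
# t_1(r) draws: standard normal divided by sqrt(chi2_r / r), shifted by true_mu
data = [true_mu + rng.gauss(0, 1) / math.sqrt(rng.gammavariate(r / 2, 2) / r)
        for _ in range(500)]
mu_hat = newton_location(data, r)
print(mu_hat, score_sum(mu_hat, data, r))
```

The second printed value is the score at the solution and should be numerically zero. In the multiparameter case the scalar derivative is replaced by the Hessian matrix of the log-likelihood.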

The Bayesian approach. Given observed data D, the likelihood function $$L(D|{\boldsymbol{\theta}})$$ can be obtained from the proposed multivariate slash or skew-slash t distribution with parameter vector $${\boldsymbol{\theta}}$$. The posterior density then satisfies $$p({\boldsymbol{\theta}}|D)\propto L(D|{\boldsymbol{\theta}})\pi({\boldsymbol{\theta}})$$, where $$\pi({\boldsymbol{\theta}})$$ is the joint prior density of $${\boldsymbol{\theta}}$$ based on the available prior information. We choose a prior density for each component of $${\boldsymbol{\theta}}$$ and take the joint prior density to be the product of the marginal prior densities. The resulting full Bayesian model has a hierarchical structure, with the conditional density of $$D|{\boldsymbol{\theta}}$$ and the prior distribution $${\boldsymbol{\theta}}\sim\pi({\boldsymbol{\theta}})$$. One can obtain a random sample from the joint posterior density by the Markov chain Monte Carlo (MCMC) method, and a parametric Bayesian analysis of the model can be implemented using the Gibbs sampling method in R or JAGS.
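The relation $$p({\boldsymbol{\theta}}|D)\propto L(D|{\boldsymbol{\theta}})\pi({\boldsymbol{\theta}})$$ is all that an MCMC sampler needs. The Python sketch below is a minimal illustration and differs from the Gibbs sampling in R or JAGS mentioned above: it uses a random-walk Metropolis chain, and it works with a one-parameter stand-in model, the skew-normal density 2φ(z)Φ(λz), with a N(0,1) prior on the skewness parameter λ. The data are simulated inside the example; the step size, chain length and burn-in are arbitrary choices.

```python
import math
import random


def log_posterior(lam, data):
    """log of L(D | lam) * pi(lam), up to an additive constant, for the density
    2 * phi(z) * Phi(lam * z) with a N(0, 1) prior on lam."""
    log_lik = 0.0
    for z in data:
        cdf = 0.5 * math.erfc(-lam * z / math.sqrt(2.0))  # Phi(lam * z)
        log_lik += math.log(max(cdf, 1e-300))  # guard against log(0)
    return log_lik - 0.5 * lam * lam


def metropolis(data, n_iter, step, rng, start=0.0):
    """Random-walk Metropolis chain for lam."""
    chain = []
    current, current_lp = start, log_posterior(start, data)
    for _ in range(n_iter):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_posterior(proposal, data)
        # accept with probability min(1, posterior ratio)
        if math.log(1.0 - rng.random()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        chain.append(current)
    return chain


def skew_normal_draw(lam, rng):
    """Hidden-truncation draw from 2 * phi(z) * Phi(lam * z)."""
    while True:
        x, x0 = rng.gauss(0, 1), rng.gauss(0, 1)
        if lam * x >= x0:
            return x


rng = random.Random(5)
data = [skew_normal_draw(2.0, rng) for _ in range(200)]
chain = metropolis(data, 3000, 0.5, rng)
kept = chain[500:]  # discard burn-in
posterior_mean = sum(kept) / len(kept)
print(posterior_mean)
```

For the full parameter vector the same scheme applies with `log_posterior` replaced by the log of the slash t or skew-slash t likelihood plus the log of the product prior, although convergence then needs to be checked more carefully.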

## Simulations and applications

In this section, we use simulations to compare the performance of the skew-slash t and the skew-slash normal distributions. Simulations are also used to demonstrate parameter estimation using both the maximum likelihood criterion and Bayesian paradigm. Then the skew-slash t, with the density given in (3.29), and the skew-slash normal, with the density given below, are applied to fit two real datasets.

$$\begin{aligned} \eta_{k}({\mathbf{w}}; {\boldsymbol{\lambda}}, q, {\boldsymbol{\mu}},\Sigma)=&\;2q{\int_{0}^{1}} u^{k+q-1}\phi_{k}(u{\mathbf{w}}; u{\boldsymbol{\mu}}, \Sigma) \\ &\times \Phi\left(u{\boldsymbol{\lambda}}^{\top}\Sigma^{-1/2}({\mathbf{w}}-{\boldsymbol{\mu}}) \right)\,du, \quad {\mathbf{w}}\in{\mathbb{R}}^{k}, \end{aligned}$$
((5.37))

where $$\phi_{k}(\cdot\,;{\boldsymbol{\mu}},\Sigma)$$ is the density of the k-variate normal distribution $$N_{k}({\boldsymbol{\mu}},\Sigma)$$ with mean vector $${\boldsymbol{\mu}}$$ and covariance matrix Σ.

### 5.1 Simulation study

Comparison between the skew-slash t and skew-slash normal. To compare the two distributions, we first generated data from the 2-variate skew-slash t and fitted them with both the 2-variate skew-slash t and the skew-slash normal, and vice versa (i.e. generated data from the 2-variate skew-slash normal and fitted them with the two distributions). Reported in Table 2 are the average AIC values and average MLEs of the parameters based on the sample size n=250 and M=200 repetitions.

Notice that for data generated from both the skew-slash t and skew-slash normal, the average AIC values of the skew-slash t fitting were lower than those of the skew-slash normal, indicating a better overall model fitting of the former to the data than the latter.

Parameter estimation by the Bayesian method. We next conducted simulations to study the behaviors of the MLEs and Bayesian estimates of the parameters in the data-generation models for the standard univariate and bivariate slash and skew-slash t distributions. Here the prior distributions are the standard normal N(0,1) for the skewness parameter λ (or the components of the parameter vector $${\boldsymbol{\lambda}}$$), and the exponential distribution with rate 0.1, truncated at 2 for the tail parameter q and truncated at 4 for the degrees of freedom r. It follows from Sahu et al. (2003) and Fernandez and Steel (1998) that the resulting distributions have finite variances. We generated 100 random samples of size 200 from the standard univariate slash and skew-slash t and the standard bivariate slash and skew-slash t. Reported in Table 3 are the average estimates of the parameters based on the maximum likelihood criterion and the Bayesian paradigm. Observe that the two types of estimates are close for all the simulation setups.

### 5.2 Applications

Model fitting to the GAD data. Gestational age at delivery (GAD) is a variable widely studied in epidemiology, see, for example, Longnecker et al. (2001). We applied the skew-slash t and skew-slash normal distributions to fit the log transformed GAD of n=100 observations. Figure 1 is the histogram superimposed with the fitted density curves, while Table 1 reports the MLEs, the standard errors (SE) of the parameter estimates and the AIC. We can see from Figure 1 that the skew-slash t distribution was able to better capture the peak of the histogram, giving a better estimate of the density for the majority of data points. Meanwhile, the AIC values in Table 1 indicated that the skew-slash t fitting was better than the skew-slash normal.

Model fitting to the AIS data. Azzalini and Dalla Valle (1996) used their skew normal distribution to fit the (LBM, BMI) pairs (lean body mass, body mass index) of the athletes from the Australian Institute of Sport (AIS), where the data of n=202 observations were reported in Cook and Weisberg (1994). Wang and Genton (2006) used their skew-slash normal distribution to re-fit the data. Here we applied the proposed skew-slash t distribution to re-fit the (LBM, BMI) pairs in the AIS data. Before fitting we standardized the variables.

Figure 5 is the scatter plot superimposed with the fitted skew-slash t and skew-slash normal contours in the scale of LBM and BMI. The skew-slash normal contour is similar to what was reported in Wang and Genton (2006). The comparison between the two contours seemed to indicate that the fits based on the two models were close.

Reported in Table 4 are the MLEs, standard errors and AIC. A smaller AIC value of the skew-slash normal fitting than the skew-slash t fitting indicated that the former was a better fit to this data. When the skew-slash normal was used to fit the data we were able to obtain the MLEs. But the row and column corresponding to parameter q in the Hessian matrix were zero, suggesting a simpler skew normal fitting to this data. The skew-slash normal fitting by Wang and Genton (2006) led to the same conclusion though they did not report the standard errors of the parameter estimates.

In conclusion, the proposed slash and skew-slash t are competitive candidate models for fitting skewed and heavy tailed data. The parameters can be estimated under either the frequentist method or Bayesian paradigm. Although for a particular dataset the skew-slash t may not be the final model, it is a good choice to start with in model selection due to its flexibility and the fact that it takes the skew normal, skew t and hence the usual normal and t as its submodels.

## Concluding remarks

In this article, we defined the multivariate slash and skew-slash t distributions in pursuit of additional distributions to simulate and fit skewed and heavy tailed data. We investigated the heavy tail behaviors and tractable properties of these distributions which are useful in simulations and applications to real data. We characterized the skew t, the skew-slash normal and the skew-slash t distributions using both the hidden truncation or selective sampling model and the order statistics of the components of a bivariate normal or t variable. We demonstrated that the proposed skew-slash t model takes as sub-models the slash t, the slash normal, the skew-slash normal, the skew normal, the skew t and hence the usual normal and t. This nested property can be used in hypothesis testing.

Our simulations and applications to real data indicated that the proposed skew-slash t fitting outperformed the skew-slash normal fitting. Even though the skew-slash normal contains a tail parameter q, the fitting with it to the GAD data was unsatisfactory as the SE of the MLE of the tail parameter q was large, see Table 1. This suggests that not all heavy-tail properties in data can be explained by the tail parameter q in the slash distributions. Thus it makes sense for us to further search for distributions which can be used to fit heavy tailed data. Our proposed slash and skew-slash t distributions can be considered as an example in this attempt. We complete our remarks by pointing out that the degrees-of-freedom parameter r and the tail parameter q would explain different types of fat tail behaviors present in data.

## References

• Arnold, BC, Lin, GD: Characterization of the Skew-normal and Generalized Chi Distributions. Sankhyā. 66, 593–606 (2004).

• Azzalini, A, Capitanio, A: Distributions generated by perturbation of symmetry with emphasis on a multivariate skew t distribution. J. R. Stat. Soc. Ser. B. 65, 367–389 (2003).

• Azzalini, A, Dalla Valle, A: The multivariate skew-normal distribution. Biometrika. 83, 715–726 (1996).

• Cook, RD, Weisberg, S: An Introduction to Regression Graphics. Wiley, New York (1994).

• Dreier, I, Kotz, S: A note on the characteristic function of the t-distribution. Statist. & Probabil. Lett. 57, 221–224 (2002).

• Fernandez, C, Steel, MFJ: On Bayesian Modeling of Fat Tails and Skewness. J. Am. Statist. Assoc. 93, 359–371 (1998).

• Genton, MG (ed.): Skew-elliptical distributions and their applications: a journey beyond normality. Chapman and Hall/CRC, Boca Raton (2004).

• Gupta, AK: Multivariate skew t-distribution. Statistics: A Journal of Theoretical and Applied Statistics. 37, 359–363 (2003). doi:10.1080/715019247.

• Joarder, AH, Ali, MM: On the characteristic function of the multivariate student t distribution. Pak. J. Statist. 12, 55–62 (1996).

• Johnson, N, Kotz, S: Distributions in Statistics: Continuous Multivariate Distributions. John Wiley & Sons, New York (1972).

• Jones, MC: Marginal replacement in multivariate densities, with application to skewing spherically symmetric distributions. J. Multivar. Anal. 81, 85–99 (2002).

• Kafadar, K: Slash Distribution. In: Johnson, NL, Kotz, S, Read, C (eds.) Encyclopedia of Statistical Sciences. vol. 8, pp. 510–511. Wiley, New York (1988).

• Kotz, S, Nadarajah, S: Multivariate t distributions and their applications. Cambridge University Press, Cambridge (2004).

• Longnecker, MP, Klebanoff, MA, Zhou, H, Brock, JW: Association between maternal serum concentration of the DDT metabolite DDE and preterm and small-for-gestational-age babies at birth. Lancet. 358, 110–114 (2001).

• Sahu, SK, Dey, DK, Branco, MD: A new class of multivariate skew distributions with applications to Bayesian regression models. Can. J. Stat. 31, 129–150 (2003).

• Wang, J, Genton, M: The multivariate skew-slash distributions. J. Statist. Plann. Infer. 136, 209–220 (2006).

## Acknowledgements

The authors gratefully thank two anonymous reviewers for their suggestions that substantially improved the article.

## Author information

### Corresponding author

Correspondence to Fei Tan.

### Competing interests

The authors declare that they have no competing interests.

### Authors’ contributions

FT contributed to the entire work, HP to the theoretical part, and YT to the inference, simulations and applications. All authors read and approved the final manuscript.

## Rights and permissions 