High-dimensional star-shaped distributions

Abstract

Stochastic representations of star-shaped distributed random vectors having heavy or light tail density generating function g are studied for increasing dimensions along with corresponding geometric measure representations. Intervals are considered where star radius variables take values with high probability, and the derivation of values of distribution functions of g-robust statistics is proved to be based upon considering random events whose probability is asymptotically negligible if the dimension of the sample vector is approaching infinity. Moreover, a principal component representation of p-generalized elliptically contoured p-generalized Gaussian distributions is discussed.

Introduction

Among the frequently obtained impressions one gets from analyzing high-dimensional data sets are that an observation point’s distance from the zero element of the sample space is likely to belong to a certain interval from the positive real line, away from zero, and that the distribution of the direction of the vector seems to be close, in a certain sense, to a uniform distribution on the set of all directions that are observable from a certain center. The first observation can be reflected from a probabilistic point of view by a measure concentration type property including what is done in (Biau and Mason 2015) and (Vershynin 2016), and the second is part of background for testing uniformity on high-dimensional spheres, see e.g. (Cutting et al. 2017), possibly after projecting data points onto spheres as, e.g., in (Banerjee and Ghosh 2004).

In situations of the described type, it may be reasonable to model the data, or their residuals after fitting to a model, by multivariate star-shaped distributions. In this regard, (Balkema and Embrechts 2007) and (Balkema et al. 2010) discover conditions ensuring that star-shaped distributions with the Gauss-exponential law being one of the most known examples appear as limit laws in certain high-risk scenarios.

Distributions from the class of star-shaped distributions are flexible with respect to convexity or radial concavity, allow different variability of probability mass along different directions of the sample space and are able to model light and heavy distribution centers and tails. Because there is no natural number being representative for large dimensions, one might like to consider sequences or schemas of series of n-dimensional vectors with n approaching infinity. However, for simplicity of notation, we instead consider here just a single random vector X taking values in \(\mathbb {R}^{n}\) and assume afterwards that n is tending to infinity in formulas holding for X.

Let us recall at this point the following general aspect of uni- or multivariate asymptotic probabilistic analysis being of particular importance, for example, in large deviation theory, but not exclusively there. Studying the limit behavior of certain sequences of distributions on specific subsets of their ranges of definition and comparing it to how the appearing limit law itself behaves on the same sets needs to precisely know the latter one. In this respect, it is an independent problem to study the behavior limit laws show on the sets of interest. Similarly, if a sequence of distributions of increasing dimension is approximated in a certain part of its range of definition by a high-dimensional star-shaped limit law then studying the latter one is an independent problem being in the core of interest of the present note.

With the agreement of considering just one single vector X of dimension n, particular questions associated with the buzzword ’big data’ are approached in the present short note by reflecting the above-mentioned impressions gained from data in the language of probability distributions. To be more specific, we are dealing here with star-shaped distributions in \(\mathbb {R}^{n}\) and correspondingly distributed vectors. Such a vector allows a stochastic representation as a product of a random generalized radius variable R and a random vector U being star-uniformly distributed on a star-sphere and independent of R, as well as a corresponding geometric measure representation. Some consequences which can be drawn from these representations in case of increasing dimension are studied. In particular, a representation of p-generalized elliptically contoured distributions is considered from the point of view of principal components.

The paper is structured as follows. In “Preliminaries” section, we present preliminary facts on star-shaped distributions including the notions of star surface content measure and star-uniform distribution on a star sphere. “A principal component representation” section deals with the particular class of p-generalized elliptically contoured distributions and it is studied there how they apply to modeling high-dimensional data. “A measure concentration property” section then considers typical intervals where R takes values if X is star-shaped distributed, and in “On g-robust statistics” section distributions of univariate statistics are described which can basically be derived from star-uniformly distributed vectors. Such distributions are not affected by whether X has a density generating function g generating light or heavy distribution tails, and the corresponding statistics are therefore called g-robust. The derivation of values of distribution functions of g-robust statistics is proved to be based upon considering random events whose probability is asymptotically negligible if the dimension of the sample vector is approaching infinity.

Preliminaries

Let \(K\subset \mathbb {R}^{n}\) be a star body having the origin in its interior and assume that the Minkowski functional \(h_{K}\) of K is positively homogeneous of degree one. We call K(r)=rK and its boundary S(r)=rS the star ball and star sphere of star radius r>0, respectively. If \(g:[0,\infty)\rightarrow [0,\infty)\) satisfies \(0<I(n,g)<\infty\) where \(I(n,g)=\int \limits _{0}^{\infty } r^{n-1}g(r)dr\) then it is called a density generating function. In such case,

$$\varphi_{g,K}(x)=C(g,K)g(h_{K}(x)), x\in\mathbb{R}^{n} $$

is called a star-shaped density and K its contour defining star body. The corresponding probability measure is denoted \(\Phi_{g,K}\) and the normalizing constant allows the representation

$$C(g,K)=\frac{1}{\mathfrak{O}_{S}(S)I(n,g)} $$

where \(\mathfrak {O}_{S}(S)\) means the star-generalized surface content of S, see (Richter 2014). If the additional assumption C(g,K)=1 is satisfied then g is called a density generator. In the following example, we recall an explicit analytical representation of the star-generalized surface content measure \(\mathfrak {O}_{S}\) in case S is a p-generalized ellipsoid with main axes of half lengths \(a_{1},...,a_{n}\) and indicate relationships to representations in other particular cases.

Example 1

Let \(K=\{x\in \mathbb {R}^{n}:|x|_{a,p}\leq 1\}\) where \(|x|_{a,p}=\left (\sum \limits _{i=1}^{n}|\frac { x_{i}}{a_{i}}|^{p}\right)^{1/p}, a=(a_{1},...,a_{n})^{T}, a_{i}>0,i=1,...,n,p>0\), and \(S^{+(-)}=S\cap \{x:x_{n}>(<)0\}\) the upper (lower) half of the (a,p)-ellipsoid \(S=\{x:|x|_{a,p}=1\}\). Then

$$ \mathfrak{O}_{S}(A)=a_{n}\;\left(\int\limits_{G(A\cap S^{+})}+\int\limits_{G(A\cap S^{-})}\right)\quad\frac{d(x_{1},...,x_{n-1})}{\left(1-\sum\limits_{i=1}^{n-1}|\frac{ x_{i}}{a_{i}}|^{p}\right)^{1-1/p}},\; A\in{\mathfrak{B}(S)} $$
(1)

where \(G(A\cap S^{+(-)})=\{\vartheta \in \mathbb {R}^{n-1}:\exists \eta =\eta (\vartheta) s.t. (\vartheta ^{T},\eta)^{T}\in A\cap S^{+(-)} \}\), \(\mathfrak {B}(S)=\mathfrak {B}^{n}\cap S\) and \(\mathfrak {B}^{n}\) denotes the Borel σ-field in \(\mathbb {R}^{n}\).

  • If p=2 and \(a=1_{n}=(1,...,1)^{T}\) then \(\mathfrak {O}_{S}(A)\) is the Euclidean surface content of the measurable subset A of S.

  • If p=1 then \(\mathfrak {O}_{S}(A)\) can be considered as a particular polyhedral generalized surface content of A.

  • If \(\mathfrak {O}_{S,\infty }(A)\) is defined as the limit of \(\mathfrak {O}_{S}(A), A\in {\mathfrak {B}(S)}\) as \(p\rightarrow \infty\) then \(\mathfrak {O}_{S,\infty }\) can be considered as another particular polyhedral generalized surface content measure. For the whole class of polyhedral generalized surface content measures, see (Richter and Schicker 2017).

  • Generalizations of representation (1) hold true for all cases where K is a ball with respect to any norm or antinorm, see (Richter 2015).

The next example deals with the asymptotic behavior of star surface content and volume of star spheres and star balls or ellipsoids, respectively, if dimension is approaching infinity.

Example 2

(a) It is well known that if S is the Euclidean unit sphere then \(\mathfrak {O}_{S}(S)=\omega _{n}\) where \( \omega _{n}={2\pi ^{n/2}}/{\Gamma ({\frac {n}{2}})} \) is the Euclidean surface content of S. It is known that \(\arg \sup \limits _{n}\omega _{n}=7\) and that \(\omega _{n}\) is monotonically decreasing starting from this value, see e.g. (Loskot and Beaulieu 2007). Moreover, according to Stirling’s formula,

$$\mathfrak{O}_{S}(S)\sim\frac{\sqrt{2e}}{1+\frac{1}{6n}}(\frac{2\pi e}{n})^{(n-1)/2} \text{ as } n\rightarrow \infty$$

meaning that the ratio of the quantity on the left hand side divided by that of the right hand side tends to one if n tends to infinity. Obviously, \(\mathfrak {O}_{S}(S)\) is tending to zero quite fast as \(n\rightarrow \infty\).

(b) If S is the \(l_{n,p}\)-sphere having unit star radius, p>0, then the star surface content of S is known to be \(\mathfrak {O}_{S}(S)=\omega _{n,p}\) where \( \omega _{n,p}={2^{n}(\Gamma (\frac {1}{p}))^{n}}/(p^{n-1}\Gamma (\frac {n}{p})). \) Note that

$$\mathfrak{O}_{S}(S)\sim \frac{p}{\sqrt{2\pi}}\left(\frac{p}{n}\right)^{\frac{n}{p}-\frac{1}{2}} \left(\frac{2\Gamma(\frac{1}{p})}{p}\right)^{n}e^{\frac{n}{p}}, n\rightarrow \infty. $$

Let \(\Omega _{n,p}=\frac {\omega _{n,p}}{n}\) denote the volume of the \(l_{n,p}\)-ball. The asymptotic relations following from the latter one,

$$\Omega_{n,p}^{\;\frac{p}{n}}\sim\frac{pe\left[\frac{2}{p}\Gamma\left(\frac{1}{p}\right)\right]^{p}}{n}\text{\; and \;} \Omega_{n,p}^{\;\frac{p}{n\ln n}}\sim\frac{1}{e},\,n\rightarrow \infty, $$

generalize two results given in (Chen and Lin 2014) for the particular Euclidean case p=2.

(c) It is well known that the star surface content of the (a,p)-ellipsoid \(S=\{x\in \mathbb {R}^{n}:|x|_{a,p}= 1\}\) is \( \mathfrak {O}_{S}(S)=a_{1}\cdots a_{n}\omega _{n,p}, \) thus

$$ \mathfrak{O}_{S}(S)\sim a_{1}\cdots a_{n}\frac{p}{\sqrt{2\pi}}\left(\frac{p}{n}\right)^{\frac{n}{p}-\frac{1}{2}} \left(\frac{2\Gamma(\frac{1}{p})}{p}\right)^{n}e^{\frac{n}{p}},\, n\rightarrow \infty. $$
(2)
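The closed forms in Example 2 can be probed numerically. The following Python sketch uses only the standard library; the function names are illustrative and not taken from the paper. It evaluates \(\omega_{n,p}\) on the log scale, locates the maximizer of \(\omega_{n}=\omega_{n,2}\) over a finite range, and provides the logarithm of the Stirling-type expression from part (b) for comparison.

```python
import math

def log_omega(n, p=2.0):
    """log of omega_{n,p} = 2^n Gamma(1/p)^n / (p^(n-1) Gamma(n/p))."""
    return (n * math.log(2.0) + n * math.lgamma(1.0 / p)
            - (n - 1) * math.log(p) - math.lgamma(n / p))

def log_omega_stirling(n, p=2.0):
    """log of the asymptotic expression stated for omega_{n,p} in Example 2(b)."""
    return (math.log(p) - 0.5 * math.log(2.0 * math.pi)
            + (n / p - 0.5) * math.log(p / n)
            + n * math.log(2.0 * math.gamma(1.0 / p) / p)
            + n / p)

def argmax_euclidean_surface(n_max=50):
    """Dimension in 1..n_max at which the Euclidean surface content omega_n is largest."""
    return max(range(1, n_max + 1), key=lambda n: log_omega(n, 2.0))
```

Working with logarithms avoids overflow of the Gamma function for large n; the ratio of exact value and approximation is then `math.exp(log_omega(n, p) - log_omega_stirling(n, p))`.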

If a random vector X follows the star-shaped density \(\varphi_{g,K}\) then it allows the stochastic representation

$$X\overset{d}{=}R\cdot U $$

meaning that X is distributed as R·U. The nonnegative random variable R is independent of the random vector U, R has density function

$$f_{R}(r)=I(n,g)^{-1}r^{n-1}g(r) I_{[0,\infty)}(r) $$

and U has the star-uniform distribution

$$\omega_{S}(A)=\frac{\mathfrak{O}_{S}(A)}{\mathfrak{O}_{S}(S)}, A\in \mathfrak{B}(S). $$

Here, \(I_{[0,\infty)}(r)=1\) if r≥0 and \(I_{[0,\infty)}(r)=0\) otherwise. Accordingly, the geometric measure representation of star-shaped distribution laws reads

$$\Phi_{g,K}(B)=\frac{1}{I(n,g)}\int\limits_{0}^{\infty}r^{n-1}g(r)\mathfrak{F}_{S}(B,r)dr, B\in\mathfrak{B}^{n} $$

where

$$\mathfrak{F}_{S}(B,r)=\frac{\mathfrak{O}_{S}\left(\left[\frac{1}{r}B\right] \cap S\right)}{\mathfrak{O}_{S}(S)}=\omega_{S}\left(\left[\frac{1}{r}B\right] \cap S\right) $$

is the star sphere intersection proportion function of the set B. With \(h_{K}\) denoting the Minkowski functional of K, we have

$$R\overset{d}{=}h_{K}(X), $$

and R is called the star radius of X.
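The integral I(n,g) and the density of the star radius can be evaluated numerically for a given density generating function. The sketch below is a minimal illustration using composite Simpson quadrature; the truncation point `upper`, the step count and all function names are assumptions made for this example. It is checked against the Gaussian case \(g(r)=e^{-r^{2}/2}\), where \(I(n,g)=2^{n/2-1}\Gamma(n/2)\).

```python
import math

def I(n, g, upper=60.0, steps=20000):
    """Composite Simpson approximation of I(n, g) = int_0^inf r^(n-1) g(r) dr.

    The integral is truncated at `upper`; this assumes the integrand is
    negligible beyond that point, which holds for the Gaussian example below.
    `steps` must be even.
    """
    h = upper / steps
    f = lambda r: r ** (n - 1) * g(r)
    total = f(0.0) + f(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(i * h)
    return total * h / 3.0

def radius_density(r, n, g, I_ng):
    """f_R(r) = r^(n-1) g(r) / I(n, g) for r >= 0 and zero otherwise."""
    return r ** (n - 1) * g(r) / I_ng if r >= 0 else 0.0

def gauss(r):
    """Density generating function of the standard Gaussian law."""
    return math.exp(-0.5 * r * r)
```

For heavy-tailed g the fixed truncation would have to be replaced by a quadrature rule suited to slowly decaying integrands.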

A principal component representation

In this section we study to what extent a particular class of continuous multivariate star-shaped distributions applies to modeling high-dimensional data. To be specific, we consider p-generalized elliptically contoured p-generalized Gaussian distributions.

It will be shown how principal component analysis can be used to identify those components of such star-shaped distributed vectors that are of major importance for the modeling process. In particular, it turns out that covariance ellipsoids, being \(l_{2}\)-ellipsoids, have the same main axes as density level set ellipsoids, being \(l_{p}\)-ellipsoids.

Let \(X_{i}, i=1,...,n,\) be independent random variables correspondingly following the densities

$$f_{p}(x;\mu_{i},\sigma_{i})=\frac{C_{p}}{\sigma_{i}}\exp\left\{-\frac{|x-\mu_{i}|^{p}}{p\sigma_{i}^{p}}\right\}, x\in \mathbb{R} $$

where \(C_{p}=p^{1-1/p}/\left (2\Gamma \left (\frac {1}{p}\right)\right)\). Then \(\mathbb {E}X_{i}=\mu _{i}\) and \(V(X_{i})=\kappa \sigma _{i}^{2}\) are expectation and variance of \(X_{i}\), respectively, i=1,...,n. Here and in what follows we use the notation

$$\kappa(n)=p^{\frac{2}{p}}\frac{\Gamma\left(\frac{n+2}{p}\right)}{\Gamma\left(\frac{n}{p}\right)}\text{ and}~ \kappa=\kappa(1). $$
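A simulation can make the density \(f_{p}\) and its variance concrete. The sketch below rests on an assumption not spelled out in the text: if G is Gamma distributed with shape 1/p and unit scale, then a random sign times \(\sigma (pG)^{1/p}\), shifted by μ, has density \(f_{p}(\cdot;\mu,\sigma)\). The variance factor is computed from the second absolute moment of this law, \(p^{2/p}\Gamma(3/p)/\Gamma(1/p)\). Function names are illustrative.

```python
import math
import random

def kappa_1(p):
    """Variance factor of the p-generalized Gaussian law with sigma = 1."""
    return p ** (2.0 / p) * math.gamma(3.0 / p) / math.gamma(1.0 / p)

def sample_p_gauss(p, mu=0.0, sigma=1.0, rng=random):
    """Draw one value with density C_p/sigma * exp(-|x-mu|^p / (p sigma^p)).

    Uses that |x-mu|^p / (p sigma^p) is Gamma(1/p, 1) distributed and that
    the sign of x-mu is symmetric and independent of its absolute value.
    """
    g = rng.gammavariate(1.0 / p, 1.0)
    sign = 1.0 if rng.random() < 0.5 else -1.0
    return mu + sign * sigma * (p * g) ** (1.0 / p)
```

For p=2 the factor equals 1 (Gaussian case) and for p=1 it equals 2 (Laplace case), which gives two quick sanity checks on the formula.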

Let \(\sigma=(\sigma_{1},...,\sigma_{n})^{T}, X=(X_{1},...,X_{n})^{T}\) and \(\mu=(\mu_{1},...,\mu_{n})^{T}\), then X−μ allows the stochastic representation

$$X-\mu\overset{d}{=}R_{\sigma,p}U_{\sigma,p} $$

where \(R_{\sigma,p}=|X-\mu|_{\sigma,p}\) and \(U_{\sigma,p}=\frac {1}{R_{\sigma,p}}(X-\mu)\) are independent. The star radius \(R_{\sigma,p}\) follows the p-generalized Chi-distribution with n d.f. having, according to (Richter 2007), the density

$$f(r)=\frac{r^{n-1}e^{-\frac{r^{p}}{p}}}{p^{\frac{n}{p}-1}\Gamma\left(\frac{n}{p}\right)}I_{[0,\infty)}(r), $$

and the stochastic basis vector \(U_{\sigma,p}\) is star-uniformly distributed on the star sphere \(S=E_{\sigma,p}\) from Example 2(c) with \(a_{i}=\sigma_{i}, i=1,...,n\), that is \(U_{\sigma,p}\sim\omega_{\sigma,p}\). Thus, using notation in (Richter 2014), X belongs to the class \( EC_{\sigma,p,\mu,I_{n}}\) and its density generating function can be chosen as \(g(r)=g_{p}(r)= e^{-\frac {r^{p}}{p}}I_{[0,\infty)}(r)\). Note that \(\mathbb {E} U_{\sigma,p}=0_{n}\) and, because \(\mathbb {E} R^{2}_{\sigma,p}=\kappa (n)\),

$$cov (U_{\sigma,p})=\frac{\kappa}{\kappa(n)} D^{2}$$

where

$$D=diag(\sigma_{1},...,\sigma_{n}). $$

It follows that \(\mathbb {E} X=\mu \) and \(cov(X)=\kappa D^{2}\). Now, let \(O=\left (O_{i,j}\right)_{i,j=\overline {1,n}}\) denote an orthogonal n×n-matrix and let the transpose of its i’th row be \(O_{i}=(O_{i,1},...,O_{i,n})^{T}\). The random vector Y=O(X−μ) follows the p-generalized elliptically contoured density

$$ f_{Y}(y)=\frac{C_{p}^{n}}{\sigma_{1}\cdots\sigma_{n}} \exp\left\{-\frac{1}{p}|O^{T}y|^{p}_{\sigma, p}\right\},y\in\mathbb{R}^{n}, $$
(3)

that is Y belongs to the class \(EC_{\sigma,p,0_{n},O}. \) Note that \(\mathbb {E}Y=0_{n}\) and \(cov(Y)=\mathbb {E}YY^{T}=\Sigma =\kappa OD^{2}O^{T}=\left (\sigma _{i,j}\right)_{i,j=\overline {1,n}}\) where

$$\sigma_{i,j}=cov(Y_{i},Y_{j})= \kappa O_{i}^{T}D^{2}O_{j}, 1\leq i, j\leq n. $$

Thus, \(f_{Y}=\varphi _{g_{p},OK}\) and the boundary of K is \(S=E_{\sigma,p}\). Note that OK is a star body having the properties introduced at the beginning of “Preliminaries” section. The covariance ellipsoid of Y is

$$C(\Sigma)=\{x\in\mathbb{R}^{n}: x^{T}\Sigma^{-1}x=1\}= \left\{x\in\mathbb{R}^{n}: \sum\limits_{i=1}^{n} \frac{||\Pi_{O_{i}}x||^{2}}{\sigma_{i}^{2}}=\kappa\right\} $$

where \(\Pi_{y}x\) means the orthogonal projection of x onto the linear space spanned by y. The main axes of C(Σ) belong to the spaces spanned by the vectors \(O_{i}\) and have half lengths of size \(\sqrt {\kappa }\sigma _{i}, i=1,...,n\), respectively. Moreover, the set C(Σ) is symmetric with respect to any of the lines \(L_{i}=\mathfrak {L}\{O_{i}\}, i=1,...,n.\) The latter also holds true for the p-generalized ellipsoids

$$OE_{\sigma,p}=\{Ox\in \mathbb{R}^{n}:|x|_{\sigma,p}=1\} $$

because \(E_{\sigma,p}\) is symmetric with respect to the lines

$$L^{*}_{i}=\{x\in\mathbb{R}^{n}: x=\lambda e_{i}, \lambda\in\mathbb{R}\}, $$

for i=1,...,n, and

$$OL_{i}^{*}=\{\lambda O e_{i}: \lambda \in\mathbb{R}\}=\{\lambda O_{i}: \lambda \in\mathbb{R}\}=L_{i} $$

with \(e_{1},...,e_{n}\) being the standard orthonormal basis vectors in \(\mathbb {R}^{n}\). One may choose the variances according to the maximization procedure from principal component analysis, \(\sigma_{1}\geq\sigma_{2}\geq...\geq\sigma_{n}\). For high-dimensional data, \(\sigma_{n}\rightarrow 0\) as \(n\rightarrow\infty\). This circumstance allows data reduction to be introduced. Therefore, methods from this area of statistical analysis apply to model data of arbitrary fixed or increasing dimension by p-generalized elliptically contoured distributions if p≥1. If \(p\in(0,1)\), certain maximization principles from PCA are to be replaced with corresponding minimization principles. In this sense, representation (3) may be called a principal component representation of p-generalized elliptically contoured p-generalized Gaussian densities. With slight changes, the density generating function \(g=g_{p}\) may be replaced in (3) with an arbitrary one for representing general p-generalized elliptically contoured densities.
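The relation between \(\Sigma=\kappa OD^{2}O^{T}\) and the main axes can be illustrated in the plane. The following sketch is a two-dimensional toy case chosen for this illustration only: O is taken to be a rotation by an angle `theta`, and the eigenvalues of the symmetric 2×2 matrix are obtained in closed form. It recovers the eigenvalues \(\kappa\sigma_{i}^{2}\), whose square roots are the half lengths \(\sqrt{\kappa}\sigma_{i}\), and the direction \(Oe_{1}\) of the leading axis. All names are hypothetical.

```python
import math

def covariance_2d(theta, sigma1, sigma2, kappa=1.0):
    """Sigma = kappa * O D^2 O^T for the planar rotation O by angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    d1, d2 = sigma1 ** 2, sigma2 ** 2
    off = kappa * c * s * (d1 - d2)
    return [[kappa * (c * c * d1 + s * s * d2), off],
            [off, kappa * (s * s * d1 + c * c * d2)]]

def principal_axes_2d(S):
    """Eigenvalues (descending) and angle of the leading eigenvector of a
    symmetric 2x2 matrix S."""
    a, b, d = S[0][0], S[0][1], S[1][1]
    mean, diff = 0.5 * (a + d), 0.5 * (a - d)
    rad = math.hypot(diff, b)
    angle = 0.5 * math.atan2(2.0 * b, a - d)
    return (mean + rad, mean - rad), angle
```

In higher dimensions one would use a numerical eigendecomposition instead of the closed form; the two-dimensional case is used here only because it needs no external library.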

A measure concentration property

A \(\chi^{2}\)-distributed random variable with n d.f. can be considered as the square of the Euclidean norm of an n-dimensional standard Gaussian vector. The larger the vector’s dimension is, the more the probability mass of such a vector is concentrated in a relatively small shell having radius of order \(\sqrt {n}\), see Figs. 1 and 2. Observing the empirical ”relative concentration numbers” 30/10, 45/30, 90/100 and 300/1000 one may argue that suitably defined numbers might even converge to zero in some sense. This will be proved here within the even more general frame of χp-distributions. For definitions and properties of these distributions we refer to (Richter 2007; 2009; 2014; 2015; 2016).

Fig. 1: 1000 simulations of \(\chi ^{2}_{10}\)-statistic and \(\chi ^{2}_{30}\)-statistic

Fig. 2: 1000 simulations of \(\chi ^{2}_{100}\)-statistic and \(\chi ^{2}_{1000}\)-statistic
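A simulation of the kind summarized in the two figures can be sketched as follows. The code draws chi-square values as Gamma variates with shape n/2 and scale 2 and reports the observed range divided by n, which is one possible reading of the "relative concentration numbers" quoted above; this reading, the function name and the sample size are assumptions made for illustration.

```python
import random

def relative_range(n, draws=1000, rng=random):
    """(max - min) / n over `draws` simulated chi-square(n) values.

    A chi-square variable with n d.f. is Gamma distributed with
    shape n/2 and scale 2.
    """
    xs = [rng.gammavariate(n / 2.0, 2.0) for _ in range(draws)]
    return (max(xs) - min(xs)) / n
```

Calling `relative_range(n)` for n = 10, 30, 100, 1000 typically gives a decreasing sequence, in line with the shrinking relative width of the shell.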

If a multivariate distribution converges to the standard Gaussian law then the square of the Euclidean norm of the correspondingly distributed vector X, i.e. the square of the Euclidean radius \(R=|X|_{1,2}\) of such a vector, will tend under some additional assumption to the \(\chi^{2}\)-distribution with n d.f. Let now X follow a star-shaped distribution \(\Phi_{g,K}\). What can we say then about the behavior of (the suitably defined power of) its generalized (star) radius? In this section we derive typical intervals where star radius variables of high-dimensional star-shaped vectors take values.

Proposition 1

For δ>0 chosen such that

$$\alpha=\frac{1}{\delta^{2}}\left(\frac{I(n+2,g)I(n,g)}{I^{2}(n+1,g)}-1\right) $$

is approaching zero as dimension n is tending to infinity, and independently of the shape defining star body K, the typical behavior of the random star radius of a vector following the star-shaped distribution Φg,K with density generating function g is described by

$$P\left((1-\delta)\frac{I(n+1,g)}{I(n,g)} \leq R\leq (1+\delta)\frac{I(n+1,g)}{I(n,g)}\right)\geq 1-\alpha. $$

Proof

Obviously,

$$\mathbb{E} R^{k}=\frac{I(n+k,g)}{I(n,g)}. $$

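Now, Chebyshev’s inequality applied to R, with \(\mathbb{E}R=I(n+1,g)/I(n,g)\) and \(\mathbb{E}R^{2}=I(n+2,g)/I(n,g)\), yields the assertion. \(\square \)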
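For completeness, the step from the moment formula to the stated bound can be written out. This is a sketch of the standard argument; the shorthand \(m_{k}=\mathbb{E}R^{k}=I(n+k,g)/I(n,g)\) is introduced here only for brevity and is not notation from the source.

$$P\left(|R-m_{1}|>\delta m_{1}\right)\leq \frac{m_{2}-m_{1}^{2}}{\delta^{2}m_{1}^{2}}=\frac{1}{\delta^{2}}\left(\frac{I(n+2,g)I(n,g)}{I^{2}(n+1,g)}-1\right)=\alpha. $$

Passing to the complementary event \(\{(1-\delta)m_{1}\leq R\leq (1+\delta)m_{1}\}\) gives the lower bound 1−α of Proposition 1.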

According to (Biau and Mason 2015), the behavior of \(|X|_{1,p}\) as n increases is called the distance concentration phenomenon in the computational learning literature. For sums of independent random variables or matrices, sharper concentration inequalities of exponential type are proved in (Vershynin 2016).

For more details on moments of p-spherical random vectors, see (Arellano-Valle and Richter 2012), for an asymmetric situation if p=1 see (Henschel and Richter 2002). The following corollary deals with a class of light tailed high-dimensional star-shaped distributions.

Corollary 1

Let K be any star body as introduced in “Preliminaries” section. If the density generating function of a high-dimensional star-shaped distribution Φg,K is that of Kotz type with parameters s>0,t>0,k>1−n,

$$g(r)=r^{k-1} e^{-tr^{s}}I_{[0,\infty)}(r), $$

and

$$\delta\gg \frac{1}{\sqrt{n} } $$

(meaning that \(\delta {\sqrt {n} }\rightarrow \infty \) as \(n\rightarrow \infty\)) then, for sufficiently large n, there holds

$$P\left((1-\delta)\frac{\Gamma(\frac{n+k}{s})}{t^{1/s}\Gamma(\frac{n+k-1}{s})}\leq R\leq (1+\delta)\frac{\Gamma(\frac{n+k}{s})}{t^{1/s}\Gamma(\frac{n+k-1}{s})} \right)\geq 1-\alpha. $$

Proof

First we check that α tends to zero: because

$$\frac{I(n+2,g)I(n,g)}{I^{2}(n+1,g)}=1+\frac{1}{s(n+k)}+O\left(\frac{1}{n^{2}}\right), $$

where O(.) means Landau’s big O symbol, it follows

$$\alpha=\frac{1}{\delta^{2}}\left[\frac{1}{s(n+k)}+O\left(\frac{1}{n^{2}}\right)\right], n\rightarrow \infty. $$

Such α approaches zero as n tends to infinity if \(\delta^{2}\cdot n\rightarrow\infty\) for \(n\rightarrow\infty\). Finishing the proof, we finally observe that

$$\frac{I(n+1,g)}{I(n,g)}=\frac{\Gamma\left(\frac{k+n}{s}\right)}{t^{1/s}\Gamma\left(\frac{k+n-1}{s}\right)} \,{\square} $$

Remark 1

On using Stirling’s formula, it can be seen that

$$ \frac{\Gamma\left(\frac{n+k}{s}\right)}{\Gamma\left(\frac{n+k-1}{s}\right)}= \left(\frac{n+k}{s}\right)^{1/s}\left(1+O\left(\frac{1}{n}\right)\right) \text{ as }n\rightarrow \infty. $$
(4)

Before turning to the case of heavy distribution tails, we note that asymptotic relation (4) makes the statement of Corollary 1 more specific in the sense that

$$I_{K{t}}(R)= \left((1-\delta)\left(\frac{n+k}{ts}\right)^{1/s},(1+\delta)\left(\frac{n+k}{ts}\right)^{1/s} \right) $$

is a reasonable interval where R takes values with high probability if we are given a high-dimensional star-shaped vector with density generating function of Kotz type and fixed or increasing dimension n. The most essential role is played here by parameter s which basically determines the relative heaviness or lightness of the distribution tails.
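The quantities in Corollary 1 and Remark 1 can be checked numerically. The sketch below evaluates the exact center and the Chebyshev bound α via log-Gamma functions and estimates the coverage of the interval by simulation. The simulation relies on the observation, derived here from the Kotz type density and not stated in the text, that \(R^{s}\) is Gamma distributed with shape (n+k−1)/s and scale 1/t. Function names and parameter values are illustrative.

```python
import math
import random

def kotz_center(n, k, s, t):
    """E R = Gamma((n+k)/s) / (t^(1/s) Gamma((n+k-1)/s)) for the Kotz type g."""
    log_ratio = math.lgamma((n + k) / s) - math.lgamma((n + k - 1) / s)
    return math.exp(log_ratio) / t ** (1.0 / s)

def kotz_alpha(n, k, s, delta):
    """Chebyshev bound alpha of Proposition 1 for the Kotz type g."""
    x = (n + k) / s
    ratio = math.exp(math.lgamma(x + 1.0 / s) + math.lgamma(x - 1.0 / s)
                     - 2.0 * math.lgamma(x))
    return (ratio - 1.0) / delta ** 2

def sample_kotz_radius(n, k, s, t, rng=random):
    """Draw R using that R^s is Gamma((n+k-1)/s, scale 1/t) distributed."""
    return rng.gammavariate((n + k - 1) / s, 1.0 / t) ** (1.0 / s)

def coverage(n, k, s, t, delta, draws=20000, rng=random):
    """Empirical probability that R lies within (1 -/+ delta) * E R."""
    c = kotz_center(n, k, s, t)
    hits = sum((1 - delta) * c <= sample_kotz_radius(n, k, s, t, rng) <= (1 + delta) * c
               for _ in range(draws))
    return hits / draws
```

With k=1, s=2, t=1/2 the generator reduces to the Gaussian one, so the example then describes the Euclidean norm of a standard Gaussian vector. The empirical coverage is usually much closer to one than 1−α, reflecting that the Chebyshev bound is not sharp.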

Corollary 2

Let K be any star body as introduced in “Preliminaries” section. If the density generating function of a star-shaped distribution \(\Phi_{g,K}\) is that of Pearson type VII with parameters s>0, k>n+2 [12] where \(k-n\rightarrow\infty\) as \(n\rightarrow\infty\),

$$g(r)=\left(1+\frac{r}{s}\right)^{-k} I_{[0,\infty)}(r), $$

and

$$\delta\gg \frac{1}{\sqrt{n}} \text{ as well as} ~\delta\gg \frac{1}{\sqrt{k-n} }, $$

then, for sufficiently large n, with probability greater than or equal to 1−α, the star-radius R of the high-dimensional vector X belongs to the interval

$$I_{P7}(R)=\left((1-\delta)\frac{sn}{k-n-1},\, (1+\delta)\frac{sn}{k-n-1} \right){.} $$

Proof

Checking that α tends to zero, we find that

$$\frac{I(n+2,g)I(n,g)}{I^{2}(n+1,g)}=1+\frac{1}{n}+\frac{1}{k-n-2}+\frac{1}{n(k-n-2)}. $$

The proof is finished by observing that

$$\frac{I(n+1,g)}{I(n,g)}=\frac{sn}{k-n-1}\, {\square} $$
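Both identities in this proof follow from \(I(n,g)=s^{n}B(n,k-n)\), where B denotes the Beta function; this closed form is obtained here by the substitution r=su and is not stated in the text. A short numerical check, with illustrative function names, might look as follows.

```python
import math

def log_I_pearson7(n, k, s):
    """log I(n, g) for g(r) = (1 + r/s)^(-k), i.e. log of s^n * B(n, k - n).

    Requires k > n so that the integral is finite.
    """
    return n * math.log(s) + math.lgamma(n) + math.lgamma(k - n) - math.lgamma(k)

def moment_ratio(n, k, s):
    """I(n+2,g) I(n,g) / I(n+1,g)^2; requires k > n + 2."""
    return math.exp(log_I_pearson7(n + 2, k, s) + log_I_pearson7(n, k, s)
                    - 2.0 * log_I_pearson7(n + 1, k, s))

def center(n, k, s):
    """I(n+1,g) / I(n,g), the expectation of the star radius; requires k > n + 1."""
    return math.exp(log_I_pearson7(n + 1, k, s) - log_I_pearson7(n, k, s))
```

The ratio does not depend on s, which is visible both in the closed form and in the code, since the powers of s cancel.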

Remark 2

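Let \(\delta=\delta(n)\rightarrow +0\) and \(\alpha=\alpha(n)\rightarrow +0\) as \(n\rightarrow\infty\) such that \(n\delta^{2}(n)\rightarrow\infty\) and assume that in the situation of Corollary 2 there holds additionally that \((k-n)\delta^{2}(n)\rightarrow\infty\). The statements of Corollaries 1 and 2 can be reformulated then as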

$$P(R>(1+\delta)\mathbb{E}R \text{ or } R<(1-\delta)\mathbb{E}R)=O(\alpha(n)) \text{ as } n\rightarrow \infty $$

where Landau’s symbol O defined for the asymptotic relation \(f(n)=O(g(n)), n\rightarrow\infty\), guarantees the existence of a constant C such that for all n there holds |f(n)/g(n)|≤C. Moreover, in the situation of Corollaries 1 and 2 we have that

$$\mathbb{E}R=\left(\frac{n+k}{ts}\right)^{1/s}\left(1+O\left({1\over n}\right)\right)~ \text{and}~ \mathbb{E}R=\frac{sn}{k-n-1}, $$

respectively.

Remark 3

Let us finally mention that because in fact we are considering sequences or even schemas of series of vectors and distributions, the assumption \(k-n\rightarrow\infty\) stated in Corollary 2 is not contradictory. Instead, it ensures a certain variability of the result. Moreover, we remark that (Henschel 2001) and (Henschel and Richter 2002) study the exact distribution of R in case of simplicially contoured vectors (or \(l_{n,1}\)-spherical vectors having nonnegative components). General \(l_{n,p}\)-spherical vectors and their star radius R are studied in (Richter 2009), (Arellano-Valle and Richter 2012) and (Richter 2014); tables of corresponding exact quantiles of \(R^{p}\) and R are to be found in (Müller and Richter 2016) and (Richter 2016).

On g-robust statistics

If the distribution of a statistic does not depend on the density generating function g of a star-shaped sample vector density \(\varphi_{g,K}\) then it is commonly called g-robust. It is well known that Student and Fisher type statistics possess, besides the g-robustness property, further optimality properties. Here we will see that decisions based upon such statistics are made by more closely analyzing random events whose probability is asymptotically negligible if the dimension of the sample vector is approaching infinity.

To be more concrete, in this section, we describe a class of statistical distribution functions, derived from a star-shaped sample vector, the ratio representations of whose values are asymptotically negligible as vector dimension increases unboundedly. It turns out, e.g., that classical and generalized Student and Fisher distributions belong to this class.

Let a random vector X follow the distribution law \(\Phi_{g,K}\), \(X\sim\Phi_{g,K}\), and the sets

$$B(t)=\{x\in\mathbb{R}^{n}: T(x)< t\}, t\in \mathbb{R} $$

being generated by a statistic \(T:\mathbb {R}^{n}\rightarrow \mathbb {R}\) such that the equation

$$\mathfrak{F}_{S}(B(t),r)=\mathcal{C}(t), \forall t $$

is satisfied where \(\mathcal {C}(t)\in [0,1]\) does not depend on r. A statistic T of this type is g-robust. It follows by the geometric representation proved for star-shaped distributions in (Richter 2014) that

$$ P(T(X)< t) =\Phi_{g,K}(B(t))=\frac{\mathfrak{O}_{S}(B(t) \cap S)}{\mathfrak{O}_{S}(S)}. $$
(5)

The distribution law of T(X) is determined already by that of the central projection of the vector X onto the star sphere S. Thus, for \(t\in \mathbb {R}\), the probability P(T(X)<t) is defined by the \(\omega_{S}\)-value of the random event B(t)∩S in the geometric probability space \((S,\mathfrak {B}(S),\omega _{S})\), and its representation in (5) will be called its ratio representation. It can be observed that many star spheres show the asymptotic behaviour

$$ \mathfrak{O}_{S}(S)\rightarrow 0 \text{ as } n\rightarrow \infty. $$
(6)

For a particular case of such type, see Example 2(a). The statistical model concerned in this case is dealing with independent and homoscedastic random variables. In case of increasing dimensions, we are confronted then with sequences of probability spaces with asymptotically negligible set S.

Example 3

In case S is the unit ln,p-sphere, condition (6) is satisfied.

In the following example we restrict our consideration to the n-dimensional standard Gaussian law Φg,K=Φ where \(g(r)=e^{-r^{2}/2}\) and \(K=\left \{x: x_{1}^{2}+...+x_{n}^{2}\leq 1\right \}.\)

Example 4

A set \(A\subset\mathbb{R}^{n}\) belongs to the class \(\mathfrak {A}(dir,dist)\) if there exist functions \(e_{A}: [0,\infty)\rightarrow S_{n}(1)\) and \(R_{A}: [0,\infty)\rightarrow [0,\infty)\) satisfying the following two assumptions:

\(\mathfrak {A} 1)\) The set A allows the representation

$$A\cap S_{n}(r)=H_{n}\left(e_{A}(r),\, R_{A}(r) \right)\cap S_{n}(r),\ r >0, $$

where \(H_{n}(e,R)=\{x\in\mathbb{R}^{n}:\Pi_{e}x=\lambda e,\lambda\geq R\}\) is a half space and \(S_{n}(r)\) the Euclidean sphere of radius r.

\(\mathfrak {A} 2)\) The function \(\mathfrak {C}:[0,\infty)\rightarrow \mathbb{R}^{n}\) with

$$\mathfrak{C}(r)=R_{A}(r)e_{A}(r),\ r \geq0 $$

is a piecewise continuous curve such that A becomes a Borel set.

The functions eA and RA are called directional type and distance type functions of the set A, respectively. If \(A\in \mathfrak {A}(dir,dist)\) then

$$\begin{array}{rl} \Phi(A)=& \frac{C({n-1})}{\sqrt{2\pi}}\int\limits^{\infty}_{0}r^{n-1}e^{-r^{2}/2}\int\limits^{\alpha^{*}(r)}_{0}(\sin \alpha)^{n-2}d\alpha\ dr,\\[2ex] \alpha^{*}(r)=&\arctan \left((r /R_{A}(r))^{2}-1\right)^{1/2}\in (0,\pi),\ r >0 \end{array} $$

where \(C(n)=2^{1-n/2}/\Gamma(n/2)\), see (Richter 1995). If the function \(r\rightarrow\alpha^{*}(r)\) is constant then

$$\Phi(A)=\frac{\omega_{n-1}}{\omega_{n}}\int\limits_{0}^{\alpha^{*}}(\sin\alpha)^{n-2}d\alpha $$

where

$$\frac{\omega_{n-1}}{\omega_{n}}\sim\sqrt{\frac{n}{2\pi}}, n\rightarrow \infty. $$
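The constant-angle formula and the asymptotic relation for \(\omega_{n-1}/\omega_{n}\) can be evaluated directly. The sketch below uses \(\omega_{n-1}/\omega_{n}=\Gamma(n/2)/(\sqrt{\pi}\,\Gamma((n-1)/2))\), which follows from the expression for \(\omega_{n}\) in Example 2(a), and approximates the integral by composite Simpson quadrature. The function names and the step count are choices made for this illustration.

```python
import math

def omega_ratio(n):
    """omega_{n-1} / omega_n = Gamma(n/2) / (sqrt(pi) Gamma((n-1)/2)), n >= 2."""
    log_ratio = math.lgamma(n / 2.0) - math.lgamma((n - 1) / 2.0)
    return math.exp(log_ratio) / math.sqrt(math.pi)

def cap_probability(n, alpha_star, steps=20000):
    """(omega_{n-1}/omega_n) * int_0^alpha_star sin(a)^(n-2) da.

    The integral is approximated by composite Simpson quadrature with an
    even number of steps.
    """
    h = alpha_star / steps
    f = lambda a: math.sin(a) ** (n - 2)
    total = f(0.0) + f(alpha_star)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(i * h)
    return omega_ratio(n) * total * h / 3.0
```

Two natural sanity checks are that the full angle range \(\alpha^{*}=\pi\) yields probability one and that \(\alpha^{*}=\pi/2\) yields one half.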

Example 5

Let \(\frac {1}{\sigma }X,\sigma >0,\) be a standard Gaussian distributed random vector in \(\mathbb{R}^{n}\). The statistic

$$T_{\mathfrak{e},\mathcal{N}}=\frac{(X,\mathfrak{e})}{\|\Pi_{\mathcal{N}}X\|/\sqrt{k}} $$

is known to be Student distributed with k d.f. for all \(\mathfrak {e}\in S_{n}(1)\) and all k-dimensional linear subspaces \(\mathcal {N}\) of \(\mathbb{R}^{n}\) such that \(\mathfrak {e}\bot \mathcal {N}\) and k≤n−1.

Let \(A=B(t)=\{T_{\mathfrak {e},\mathcal {N}}< t\}\). Then \(A\in \mathfrak {A}(dir,dist)\) where \(e_{A}(r)=\mathfrak {e},\) the distance type function is \( R_{B(t)}(r)=\tilde {t}r /(\tilde {t}^{\,2}+1)^{1/2},\ \tilde {t}=t/\sqrt {n-1}\) and the function \( \alpha ^{*}(r)=\arctan \ (1/\tilde {t}\,) \) is constant. Evaluating with k=n−1 the limit of Φ(A) as \(n\rightarrow\infty\) leads to the well known result \(\Phi_{0,1}(t)\) where \(\Phi_{0,1}\) denotes the cumulative distribution function of the univariate standard normal distribution.
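The g-robustness of the Student statistic rests on its invariance under rescaling of the sample vector: T(rU)=T(U) for every r>0, so the radius variable R, and hence g, cannot influence the distribution of T. A minimal sketch for the special choice \(\mathfrak{e}=e_{1}\), \(\mathcal{N}=\mathrm{span}\{e_{2},...,e_{n}\}\) and k=n−1 is given below; this choice and the function names are assumptions made for the example.

```python
import math
import random

def student_statistic(x):
    """T = x_1 / (||(x_2,...,x_n)|| / sqrt(n-1)), i.e. e = e_1 and
    N = span(e_2,...,e_n) with k = n - 1."""
    k = len(x) - 1
    return x[0] / (math.sqrt(sum(v * v for v in x[1:])) / math.sqrt(k))

def uniform_direction(n, rng=random):
    """A point on the Euclidean unit sphere, obtained by normalizing a
    standard Gaussian vector."""
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(v * v for v in z))
    return [v / norm for v in z]
```

Multiplying a direction returned by `uniform_direction` by radii drawn from any radial law leaves the value of `student_statistic` unchanged, which is the mechanism behind representation (5) in this Euclidean case.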

For similar properties of a corresponding exact Student test in nonlinear regression, see (Ittrich 2000) and (Ittrich and Richter 2005).

Example 6

For a related consideration on the p-generalized Fisher statistic, see (Richter 2009).

Remark 4

If one is interested in avoiding the asymptotic negligibility of S in case of increasing dimension, one may leave the class of statistical models dealing with independent homoscedastic observations. Density level sets of sample vectors having heteroscedastic components may be star-shaped. If S is an (a,p)-ellipsoid then, according to asymptotic relation (2), condition (6) may be violated and \(\mathfrak{O}_{S}(S)\) may even stabilize asymptotically.

Availability of data and material

Not applicable.

Acknowledgements

The author would like to thank the reviewers, the Associate Editor and the Editor for their valuable comments.

Funding

Not applicable.

Author information

Corresponding author

Correspondence to Wolf-Dieter Richter.

Ethics declarations

Competing interests

The author declares no conflict of interest

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Cite this article

Richter, WD. High-dimensional star-shaped distributions. J Stat Distrib App 6, 5 (2019). https://doi.org/10.1186/s40488-019-0096-0
