High-dimensional star-shaped distributions

Richter, Wolf-Dieter

doi:10.1186/s40488-019-0096-0

Short report
Open access
Published: 06 June 2019

Wolf-Dieter Richter ORCID: orcid.org/0000-0002-3610-7219¹

Journal of Statistical Distributions and Applications volume 6, Article number: 5 (2019)

Abstract

Stochastic representations of star-shaped distributed random vectors having heavy or light tail density generating function g are studied for increasing dimensions along with corresponding geometric measure representations. Intervals are considered where star radius variables take values with high probability, and the derivation of values of distribution functions of g-robust statistics is proved to be based upon considering random events whose probability is asymptotically negligible if the dimension of the sample vector is approaching infinity. Moreover, a principal component representation of p-generalized elliptically contoured p-generalized Gaussian distributions is discussed.

Introduction

Among the frequently obtained impressions one gets from analyzing high-dimensional data sets are that an observation point’s distance from the zero element of the sample space is likely to belong to a certain interval from the positive real line, away from zero, and that the distribution of the direction of the vector seems to be close, in a certain sense, to a uniform distribution on the set of all directions that are observable from a certain center. The first observation can be reflected from a probabilistic point of view by a measure concentration type property including what is done in (Biau and Mason 2015) and (Vershynin 2016), and the second is part of background for testing uniformity on high-dimensional spheres, see e.g. (Cutting et al. 2017), possibly after projecting data points onto spheres as, e.g., in (Banerjee and Ghosh 2004).

In situations of the described type, it may be reasonable to model the data, or their residuals after fitting to a model, by multivariate star-shaped distributions. In this regard, (Balkema and Embrechts 2007) and (Balkema et al. 2010) discover conditions ensuring that star-shaped distributions with the Gauss-exponential law being one of the most known examples appear as limit laws in certain high-risk scenarios.

Distributions from the class of star-shaped distributions are flexible with respect to convexity or radial concavity, allow different variability of probability mass along different directions of the sample space and are able to model light and heavy distribution centers and tails. Because there is no natural number being representative for large dimensions, one might like to consider sequences or schemas of series of n-dimensional vectors with n approaching infinity. However, for simplicity of notation, we instead consider here just a single random vector X taking values in $\mathbb {R}^{n}$ and assume afterwards that n is tending to infinity in formulas holding for X.

Let us recall at this point a general aspect of uni- or multivariate asymptotic probabilistic analysis that is of particular importance in large deviation theory, though not exclusively there. Comparing the limit behavior of a sequence of distributions on specific subsets of their ranges of definition with the behavior of the limit law itself on the same sets requires precise knowledge of the latter. In this respect, studying how limit laws behave on the sets of interest is an independent problem. Similarly, if a sequence of distributions of increasing dimension is approximated in a certain part of its range of definition by a high-dimensional star-shaped limit law, then studying the latter is an independent problem, and it is at the core of the present note.

Having agreed to consider just a single vector X of dimension n, we approach particular questions associated with the buzzword ‘big data’ in the present short note by expressing the above-mentioned impressions gained from data in the language of probability distributions. To be more specific, we deal here with star-shaped distributions in $\mathbb {R}^{n}$ and correspondingly distributed vectors. Such a vector allows a stochastic representation as a product of a random generalized radius variable R and a random vector U that is star-uniformly distributed on a star sphere and independent of R, as well as a corresponding geometric measure representation. Some consequences that can be drawn from these representations in the case of increasing dimension are studied. In particular, a representation of p-generalized elliptically contoured distributions is considered from the point of view of principal components.

The paper is structured as follows. In the “Preliminaries” section, we present preliminary facts on star-shaped distributions including the notions of star surface content measure and star-uniform distribution on a star sphere. The “A principal component representation” section deals with the particular class of p-generalized elliptically contoured distributions and studies how they apply to modeling high-dimensional data. The “A measure concentration property” section then considers typical intervals in which R takes values if X is star-shaped distributed, and the “On g-robust statistics” section describes distributions of univariate statistics that can basically be derived from star-uniformly distributed vectors. Such statistics are not affected by whether X has a density generating function g generating light or heavy distribution tails and are therefore called g-robust. The derivation of values of distribution functions of g-robust statistics is proved to be based upon considering random events whose probability is asymptotically negligible if the dimension of the sample vector approaches infinity.

Preliminaries

Let $K\subset \mathbb {R}^{n}$ be a star body having the origin in its interior and assume that the Minkowski functional h_K of K is positively homogeneous of degree one. We call K(r)=rK and its boundary S(r)=rS the star ball and star sphere of star radius r>0, respectively. If g:[0,∞)→[0,∞) satisfies 0<I(n,g)<∞ where $I(n,g)=\int \limits _{0}^{\infty } r^{n-1}g(r)dr$ then it is called a density generating function. In such case,

$$\varphi_{g,K}(x)=C(g,K)g(h_{K}(x)), x\in\mathbb{R}^{n} $$

is called a star-shaped density and K its contour defining star body. The corresponding probability measure is denoted Φ_g,K and the normalizing constant allows the representation

$$C(g,K)=\frac{1}{\mathfrak{O}_{S}(S)I(n,g)} $$

where $\mathfrak {O}_{S}(S)$ means the star-generalized surface content of S, see (Richter 2014). If the additional assumption C(g,K)=1 is satisfied then g is called a density generator. In the following example, we recall an explicit analytical representation of the star-generalized surface content measure $\mathfrak {O}_{S}$ in the case where S is a p-generalized ellipsoid with main axes of half lengths a₁,...,a_n and indicate relationships to representations in other particular cases.

Example 1

Let $K=\{x\in \mathbb {R}^{n}:|x|_{a,p}\leq 1\}$ where $|x|_{a,p}=\left (\sum \limits _{i=1}^{n}|\frac { x_{i}}{a_{i}}|^{p}\right)^{1/p}, a=(a_{1},...,a_{n})^{T}, a_{i}>0,i=1,...,n,p>0$, and S⁺⁽⁻⁾=S∩{x:x_n>(<)0} the upper (lower) half of the (a,p)-ellipsoid S={x:|x|_a,p=1}. Then

$$ \mathfrak{O}_{S}(A)=a_{n}\;\left(\int\limits_{G(A\cap S^{+})}+\int\limits_{G(A\cap S^{-})}\right)\quad\frac{d(x_{1},...,x_{n-1})}{\left(1-\sum\limits_{i=1}^{n-1}|\frac{ x_{i}}{a_{i}}|^{p}\right)^{1-1/p}},\; A\in{\mathfrak{B}(S)} $$

(1)

where $G(A\cap S^{+(-)})=\{\vartheta \in \mathbb {R}^{n-1}:\exists \eta =\eta (\vartheta) s.t. (\vartheta ^{T},\eta)^{T}\in A\cap S^{+(-)} \}$, $\mathfrak {B}(S)=\mathfrak {B}^{n}\cap S$ and $\mathfrak {B}^{n}$ denotes the Borel σ-field in $\mathbb {R}^{n}$.

If p=2 and a=1_n=(1,...,1)^T then $\mathfrak {O}_{S}(A)$ is the Euclidean surface content of the measurable subset A of S.
If p=1 then $\mathfrak {O}_{S}(A)$ can be considered as a particular polyhedral generalized surface content of A.
If $\mathfrak {O}_{S,\infty }(A)$ is defined as the limit of $\mathfrak {O}_{S}(A), A\in {\mathfrak {B}(S)}$ as p→∞ then $\mathfrak {O}_{S,\infty }$ can be considered as another particular polyhedral generalized surface content measure. For the whole class of polyhedral generalized surface content measures, see (Richter and Schicker 2017).
Generalizations of representation (1) hold true for all cases where K is a ball with respect to any norm or antinorm, see (Richter 2015).

The next example deals with the asymptotic behavior of star surface content and volume of star spheres and star balls or ellipsoids, respectively, if dimension is approaching infinity.

Example 2

(a) It is well known that if S is the Euclidean unit sphere then $\mathfrak {O}_{S}(S)=\omega _{n}$ where $ \omega _{n}={2\pi ^{n/2}}/{\Gamma ({\frac {n}{2}})} $ is the Euclidean surface content of S. It is known that $\arg \sup \limits _{n}\omega _{n}=7$ and that ω_n is monotonically decreasing starting from this value, see e.g. (Loskot and Beaulieu 2007). Moreover, according to Stirling’s formula,

$$\mathfrak{O}_{S}(S)\sim\frac{\sqrt{2e}}{1+\frac{1}{6n}}(\frac{2\pi e}{n})^{(n-1)/2} \text{ as } n\rightarrow \infty$$

meaning that the ratio of the quantity on the left hand side divided by that of the right hand side tends to one if n tends to infinity. Obviously, $\mathfrak {O}_{S}(S)$ is tending to zero quite fast as n→∞.
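The statements of part (a) can be checked numerically. The following is a minimal sketch; the helper names `omega` and `omega_stirling` and the search range n ≤ 50 are assumptions made only for this illustration.

```python
import math

def omega(n: int) -> float:
    """Euclidean surface content of the unit sphere in R^n: 2*pi^(n/2)/Gamma(n/2)."""
    return 2.0 * math.pi ** (n / 2) / math.gamma(n / 2)

def omega_stirling(n: int) -> float:
    """Stirling-type approximation quoted in Example 2(a)."""
    return (math.sqrt(2 * math.e) / (1 + 1 / (6 * n))
            * (2 * math.pi * math.e / n) ** ((n - 1) / 2))

# Surface contents for n = 1, ..., 50 (illustrative range).
values = {n: omega(n) for n in range(1, 51)}
# Dimension at which the surface content is largest.
n_max = max(values, key=values.get)
# Ratio of exact value to approximation; it should be close to one for large n.
ratios = {n: omega(n) / omega_stirling(n) for n in (10, 50, 100)}

print("argmax:", n_max)
for n, r in ratios.items():
    print(n, r)
```

Working with the ratio rather than the difference mirrors the meaning of the asymptotic relation, since both sides tend to zero.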

(b) If S is the l_n,p-sphere having unit star radius, p>0, then the star surface content of S is known to be $\mathfrak {O}_{S}(S)=\omega _{n,p}$ where $ \omega _{n,p}={2^{n}(\Gamma (\frac {1}{p}))^{n}}/(p^{n-1}\Gamma (\frac {n}{p})). $ Note that

$$\mathfrak{O}_{S}(S)\sim \frac{p}{\sqrt{2\pi}}\left(\frac{p}{n}\right)^{\frac{n}{p}-\frac{1}{2}} \left(\frac{2\Gamma(\frac{1}{p})}{p}\right)^{n}e^{\frac{n}{p}}, n\rightarrow \infty. $$

Let $\Omega _{n,p}=\frac {\omega _{n,p}}{n}$ denote the volume of the l_n,p-ball. The asymptotic relations following from the latter one,

$$\Omega_{n,p}^{\;\frac{p}{n}}\sim\frac{pe\left[\frac{2}{p}\Gamma\left(\frac{1}{p}\right)\right]^{p}}{n}\text{\; and \;} \Omega_{n,p}^{\;\frac{p}{n\ln n}}\sim\frac{1}{e},\,n\rightarrow \infty, $$

generalize two results given in (Chen and Lin 2014) for the particular Euclidean case p=2.
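The two asymptotic relations for the volume can be explored numerically on a logarithmic scale, which avoids underflow for large n. This is a sketch under the formulas of part (b); the function names are illustrative assumptions. Note that the second relation can only be expected to converge slowly, because the correction to the exponent is of order 1/ln n.

```python
import math

def log_omega(n: int, p: float) -> float:
    """log of omega_{n,p} = 2^n Gamma(1/p)^n / (p^(n-1) Gamma(n/p))."""
    return (n * math.log(2.0) + n * math.lgamma(1.0 / p)
            - (n - 1) * math.log(p) - math.lgamma(n / p))

def log_volume(n: int, p: float) -> float:
    """log of Omega_{n,p} = omega_{n,p} / n."""
    return log_omega(n, p) - math.log(n)

def first_relation_ratio(n: int, p: float) -> float:
    """Omega^(p/n) divided by p*e*[(2/p)*Gamma(1/p)]^p / n; should approach 1."""
    lhs = math.exp(p / n * log_volume(n, p))
    rhs = p * math.e * (2.0 / p * math.gamma(1.0 / p)) ** p / n
    return lhs / rhs

def second_relation_value(n: int, p: float) -> float:
    """Omega^(p/(n ln n)); should approach 1/e, but only slowly."""
    return math.exp(p / (n * math.log(n)) * log_volume(n, p))

for n in (10**3, 10**5):
    print(n, first_relation_ratio(n, 2.0), second_relation_value(n, 2.0))
```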

(c) It is well known that the star surface content of the (a,p)-ellipsoid $S=\{x\in \mathbb {R}^{n}:|x|_{a,p}= 1\}$ is $ \mathfrak {O}_{S}(S)=a_{1}\cdots a_{n}\omega _{n,p}, $ thus

$$ \mathfrak{O}_{S}(S)\sim a_{1}\cdots a_{n}\frac{p}{\sqrt{2\pi}}\left(\frac{p}{n}\right)^{\frac{n}{p}-\frac{1}{2}} \left(\frac{2\Gamma(\frac{1}{p})}{p}\right)^{n}e^{\frac{n}{p}},\, n\rightarrow \infty. $$

(2)

If a random vector X follows the star-shaped density φ_g,K then it allows the stochastic representation

$$X\overset{d}{=}R\cdot U $$

meaning that X is distributed as R·U. The nonnegative random variable R is independent of the random vector U, R has density function

$$f_{R}(r)=I(n,g)^{-1}r^{n-1}g(r) I_{[0,\infty)}(r) $$

and U has the star-uniform distribution

$$\omega_{S}(A)=\frac{\mathfrak{O}_{S}(A)}{\mathfrak{O}_{S}(S)}, A\in \mathfrak{B}(S). $$

Here, I_[0,∞)(r)=1 if r≥0 and I_[0,∞)(r)=0 otherwise. Accordingly, the geometric measure representation of star-shaped distribution laws reads

$$\Phi_{g,K}(B)=\frac{1}{I(n,g)}\int\limits_{0}^{\infty}r^{n-1}g(r)\mathfrak{F}_{S}(B,r)dr, B\in\mathfrak{B}^{n} $$

where

$$\mathfrak{F}_{S}(B,r)=\frac{\mathfrak{O}_{S}\left(\left[\frac{1}{r}B\right] \cap S\right)}{\mathfrak{O}_{S}(S)}=\omega_{S}\left(\left[\frac{1}{r}B\right] \cap S\right) $$

is the star sphere intersection proportion function of the set B. With h_K denoting the Minkowski functional of K, we have

$$R\overset{d}{=}h_{K}(X), $$

and R is called the star radius of X.
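The objects of this section can be made concrete for an (a,p)-ellipsoid. The sketch below assumes the generator g(r)=exp(−r^p/p), chosen here only for illustration, for which I(n,g)=p^{n/p−1}Γ(n/p); it evaluates the Minkowski functional h_K(x)=|x|_{a,p} and the normalizing constant C(g,K)=1/(𝔒_S(S)I(n,g)) with 𝔒_S(S)=a₁⋯a_n ω_{n,p} from Example 2(c). All function names are hypothetical.

```python
import math

def h_K(x, a, p):
    """Minkowski functional of the (a,p)-ellipsoid body: |x|_{a,p}."""
    return sum(abs(xi / ai) ** p for xi, ai in zip(x, a)) ** (1.0 / p)

def omega_np(n, p):
    """Star surface content of the unit l_{n,p}-sphere (Example 2(b))."""
    return 2.0 ** n * math.gamma(1.0 / p) ** n / (p ** (n - 1) * math.gamma(n / p))

def I_gp(n, p):
    """I(n,g) for the assumed generator g(r) = exp(-r^p/p)."""
    return p ** (n / p - 1) * math.gamma(n / p)

def normalizing_constant(a, p):
    """C(g,K) = 1 / (O_S(S) * I(n,g)) with O_S(S) = a_1...a_n * omega_{n,p}."""
    n = len(a)
    return 1.0 / (math.prod(a) * omega_np(n, p) * I_gp(n, p))

def density(x, a, p):
    """Star-shaped density C(g,K) * g(h_K(x)) for the assumed generator."""
    return normalizing_constant(a, p) * math.exp(-h_K(x, a, p) ** p / p)

a, p = [2.0, 1.0, 0.5], 1.5      # illustrative half lengths and shape parameter
x = [0.3, -1.2, 0.4]
print(h_K(x, a, p), h_K([3 * xi for xi in x], a, p))  # positive homogeneity
print(density(x, a, p))
```

For this generator the constant factorizes into the product of the univariate constants p^{1−1/p}/(2Γ(1/p)a_i), which is one way to sanity-check the formula for C(g,K).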

A principal component representation

In this section we study to what extent a particular class of continuous multivariate star-shaped distributions applies to modeling high-dimensional data. To be specific, we consider p-generalized elliptically contoured p-generalized Gaussian distributions.

It will be shown which way principal component analysis can be used to identify those components of such star-shaped distributed vectors being of major importance for the modeling process. In particular it turns out that covariance ellipsoids being l₂-ellipsoids have the same main axes as density level set ellipsoids being l_p-ellipsoids.

Let X_i,i=1,...,n be independent random variables correspondingly following the densities

$$f_{p}(x;\mu_{i},\sigma_{i})=\frac{C_{p}}{\sigma_{i}}\exp\left\{-\frac{|x-\mu_{i}|^{p}}{p\sigma_{i}^{p}}\right\}, x\in \mathbb{R} $$

where $C_{p}=p^{1-1/p}/\left (2\Gamma \left (\frac {1}{p}\right)\right)$. Then $\mathbb {E}X_{i}=\mu _{i}$ and $V(X_{i})=\kappa \sigma _{i}^{2}$ are expectation and variance of X_i, respectively, i=1,...,n. Here and in what follows we use the notation

$$\kappa(n)=p^{\frac{2}{p}}\frac{\Gamma\left(\frac{n+2}{p}\right)}{\Gamma\left(\frac{n}{p}\right)}\text{ and}~ \kappa=\kappa(1). $$

Let σ=(σ₁,...,σ_n)^T,X=(X₁,...,X_n)^T and μ=(μ₁,...,μ_n)^T, then X−μ allows the stochastic representation

$$X-\mu\overset{d}{=}R_{\sigma,p}U_{\sigma,p} $$

where R_σ,p=|X−μ|_σ,p and $U_{\sigma,p}=\frac {1}{R_{\sigma,p}}(X-\mu)$ are independent. The star radius R_σ,p follows the p-generalized Chi-distribution with n d.f. having according to (Richter 2007) the density

$$f(r)=\frac{r^{n-1}e^{-\frac{r^{p}}{p}}}{p^{\frac{n}{p}-1}\Gamma\left(\frac{n}{p}\right)}I_{[0,\infty)}(r), $$

and the stochastic basis vector U_σ,p is star-uniformly distributed on the star sphere S=E_σ,p from Example 2(c) with a_i=σ_i, i=1,...,n, that is, U_σ,p∼ω_σ,p. Thus, using the notation in (Richter 2014), X belongs to the class $ EC_{\sigma,p,\mu,I_{n}}$ and its density generating function can be chosen as $g(r)=g_{p}(r)= e^{-\frac {r^{p}}{p}}I_{[0,\infty)}(r)$. Note that $\mathbb {E} U_{\sigma,p}=0_{n}$ and, because $\mathbb {E} R^{2}_{\sigma,p}=\kappa (n)$,

$$cov (U_{\sigma,p})=\frac{\kappa}{\kappa(n)} D^{2}$$

where

$$D=diag(\sigma_{1},...,\sigma_{n}). $$
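The stochastic representation can be illustrated by simulation. The sketch below relies on two facts that follow from the densities above: |X_i−μ_i|^p/(pσ_i^p) is Gamma(1/p) distributed, and R^p/p is Gamma(n/p) distributed, so that the mean of R^p equals n. Sample size, seed, and parameter values are arbitrary illustrative choices, and the function names are hypothetical.

```python
import math
import random

def sample_p_gaussian(p, mu, sigma, rng):
    """Draw from f_p(.; mu, sigma): |X-mu|^p/(p sigma^p) ~ Gamma(1/p), random sign."""
    g = rng.gammavariate(1.0 / p, 1.0)
    magnitude = sigma * (p * g) ** (1.0 / p)
    return mu + magnitude if rng.random() < 0.5 else mu - magnitude

def star_radius(x, mu, sigma, p):
    """R = |x - mu|_{sigma,p}."""
    return sum(abs((xi - mi) / si) ** p
               for xi, mi, si in zip(x, mu, sigma)) ** (1.0 / p)

rng = random.Random(0)            # fixed seed for reproducibility
p = 1.5                           # illustrative shape parameter
mu = [0.0, 1.0, -2.0, 0.5, 3.0]
sigma = [2.0, 1.5, 1.0, 0.5, 0.25]
n = len(mu)
num_draws = 20000

sum_rp = 0.0
max_dev = 0.0
for _ in range(num_draws):
    x = [sample_p_gaussian(p, m, s, rng) for m, s in zip(mu, sigma)]
    r = star_radius(x, mu, sigma, p)
    u = [(xi - mi) / r for xi, mi in zip(x, mu)]   # stochastic basis vector
    sum_rp += r ** p
    # u must lie on the star sphere E_{sigma,p}, i.e. |u|_{sigma,p} = 1
    max_dev = max(max_dev, abs(star_radius(u, [0.0] * n, sigma, p) - 1.0))

mean_rp = sum_rp / num_draws
print("mean of R^p:", mean_rp, "expected:", n)
print("largest deviation of |U| from 1:", max_dev)
```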

It follows that $\mathbb {E} X=\mu $ and cov(X)=κD². Now, let $O=\left (O_{i,j}\right)_{i,j=\overline {1,n}}$ denote an orthogonal n×n matrix and let the transpose of its i’th row be O_i=(O_i,1,...,O_i,n)^T. The random vector Y=O(X−μ) follows the p-generalized elliptically contoured density

$$ f_{Y}(y)=\frac{C_{p}^{n}}{\sigma_{1}\cdots\sigma_{n}} \exp\left\{-\frac{1}{p}|O^{T}y|^{p}_{\sigma, p}\right\},y\in\mathbb{R}^{n}, $$

(3)

that is Y belongs to the class $EC_{\sigma,p,0_{n},O}. $ Note that $\mathbb {E}Y=0_{n}$ and $cov(Y)=\mathbb {E}YY^{T}=\Sigma =\kappa OD^{2}O^{T}=\left (\sigma _{i,j}\right)_{i,j=\overline {1,n}}$ where

$$\sigma_{i,j}=cov(Y_{i},Y_{j})= \kappa O_{i}^{T}D^{2}O_{j}, 1\leq i, j\leq n. $$

Thus, $f_{Y}=\varphi _{g_{p},OK}$ and the boundary of K is S=E_σ,p. Note that OK is a star body having the properties introduced at the beginning of the “Preliminaries” section. The covariance ellipsoid of Y is

$$C(\Sigma)=\{x\in\mathbb{R}^{n}: x^{T}\Sigma^{-1}x=1\}= \left\{x\in\mathbb{R}^{n}: \sum\limits_{i=1}^{n} \frac{||\Pi_{O_{i}}x||^{2}}{\sigma_{i}^{2}}=\kappa\right\} $$

where Π_yx means the orthogonal projection of x onto the linear space spanned by y. The main axes of C(Σ) belong to the spaces spanned by the vectors O_i and have half lengths $\sqrt {\kappa }\sigma _{i}, i=1,...,n$, respectively. Moreover, the set C(Σ) is symmetric with respect to each of the lines $L_{i}=\mathfrak {L}\{O_{i}\}, i=1,...,n.$ The latter also holds true for the p-generalized ellipsoids

$$OE_{\sigma,p}=\{Ox\in \mathbb{R}^{n}:|x|_{\sigma,p}=1\} $$

because E_σ,p is symmetric with respect to the lines

$$L^{*}_{i}=\{x\in\mathbb{R}^{n}: x=\lambda e_{i}, \lambda\in\mathbb{R}\}, $$

for i=1,...,n, and

$$OL_{i}^{*}=\{\lambda O e_{i}: \lambda \in\mathbb{R}\}=\{\lambda O_{i}: \lambda \in\mathbb{R}\}=L_{i} $$

with e₁,...,e_n being the standard orthonormal basis vectors in $\mathbb {R}^{n}$. One may choose the variances according to the maximization procedure from principal component analysis, σ₁≥σ₂≥...≥σ_n. For high-dimensional data, σ_n→0 holds as n→∞. This circumstance allows for data reduction. Therefore, methods from this area of statistical analysis apply to modeling data of arbitrary fixed or increasing dimension by p-generalized elliptically contoured distributions if p≥1. If p∈(0,1), certain maximization principles from PCA are to be replaced with corresponding minimization principles. In this sense, representation (3) may be called a principal component representation of p-generalized elliptically contoured p-generalized Gaussian densities. With slight changes, the density generating function g=g_p may be replaced in (3) with an arbitrary one for representing general p-generalized elliptically contoured densities.
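That covariance ellipsoids and density level set ellipsoids share their main axes can be checked in a small case. The sketch below assumes n=2, takes a rotation matrix as O, and uses arbitrary illustrative values for p, σ and κ (the value of κ only rescales Σ). As axis directions it uses the vectors O e_i from the display for OL_i^*; all names are hypothetical.

```python
import math

def matmul(a, b):
    """Plain-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def norm_sigma_p(x, sigma, p):
    """|x|_{sigma,p}."""
    return sum(abs(xi / si) ** p for xi, si in zip(x, sigma)) ** (1.0 / p)

theta = 0.7          # hypothetical rotation angle defining O
p = 1.5              # hypothetical shape parameter
kappa = 1.0          # positive scale factor, assumed value
sigma = [2.0, 0.5]   # sigma_1 >= sigma_2

O = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta), math.cos(theta)]]
D2 = [[sigma[0] ** 2, 0.0], [0.0, sigma[1] ** 2]]
Sigma = [[kappa * v for v in row]
         for row in matmul(matmul(O, D2), transpose(O))]

eig_residuals = []
level_set_norms = []
for i in range(2):
    axis = [O[0][i], O[1][i]]                      # O e_i
    image = [sum(Sigma[r][c] * axis[c] for c in range(2)) for r in range(2)]
    # Sigma (O e_i) should equal kappa * sigma_i^2 * (O e_i)
    eig_residuals.append(
        max(abs(image[r] - kappa * sigma[i] ** 2 * axis[r]) for r in range(2)))
    vertex = [sigma[i] * a for a in axis]          # sigma_i * O e_i
    back = [sum(O[r][c] * vertex[r] for r in range(2)) for c in range(2)]  # O^T vertex
    # the vertex lies on O E_{sigma,p} iff |O^T vertex|_{sigma,p} = 1
    level_set_norms.append(norm_sigma_p(back, sigma, p))

print(eig_residuals, level_set_norms)
```

The check does not depend on p: the end points σ_i O e_i lie on the level set O E_σ,p for every p>0, while the half lengths of the covariance ellipsoid carry the additional factor √κ.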

A measure concentration property

A χ²-distributed random variable with n d.f. can be considered as the square of the Euclidean norm of an n-dimensional standard Gaussian vector. The larger the dimension of such a vector, the more its probability mass is concentrated in a relatively small shell having radius of order $\sqrt {n}$, see Figs. 1 and 2. Observing the empirical “relative concentration numbers” 30/10, 45/30, 90/100 and 300/1000, one may argue that suitably defined numbers might even converge to zero in some sense. This will be proved here within the more general framework of χ^p-distributions. For definitions and properties of these distributions we refer to (Richter 2007; 2009; 2014; 2015; 2016).

If a multivariate distribution converges to the standard Gaussian law then the square of the Euclidean norm of the correspondingly distributed vector X, i.e. the square of the Euclidean radius R=|X|_1,2 of such a vector, will tend under some additional assumption to the χ²-distribution with n d.f. Now let X follow a star-shaped distribution Φ_g,K. What can we then say about the behavior of (a suitably defined power of) its generalized (star) radius? In this section we derive typical intervals in which star radius variables of high-dimensional star-shaped vectors take values.

Proposition 1

For δ>0 chosen such that

$$\alpha=\frac{1}{\delta^{2}}\left(\frac{I(n+2,g)I(n,g)}{I^{2}(n+1,g)}-1\right) $$

is approaching zero as dimension n is tending to infinity, and independently of the shape defining star body K, the typical behavior of the random star radius of a vector following the star-shaped distribution Φ_g,K with density generating function g is described by

$$P\left((1-\delta)\frac{I(n+1,g)}{I(n,g)} \leq R\leq (1+\delta)\frac{I(n+1,g)}{I(n,g)}\right)\geq 1-\alpha. $$

Proof

Obviously,

$$\mathbb{E} R^{k}=\frac{I(n+k,g)}{I(n,g)}. $$

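Now, Chebyshev’s inequality applies: $P(|R-\mathbb{E}R|>\delta\,\mathbb{E}R)\leq (\mathbb{E}R^{2}-(\mathbb{E}R)^{2})/(\delta\,\mathbb{E}R)^{2}=\alpha$. □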
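To see Proposition 1 at work, the following sketch evaluates the interval centre I(n+1,g)/I(n,g) and the bound α for the Gaussian generator g(r)=exp(−r²/2), for which I(n,g)=2^{n/2−1}Γ(n/2). The generator, the value δ=0.1 and the function names are assumptions made for illustration; any generator with a computable log I(n,g) could be passed instead.

```python
import math

def log_I_gauss(n):
    """log I(n,g) for g(r) = exp(-r^2/2): I(n,g) = 2^(n/2-1) * Gamma(n/2)."""
    return (n / 2 - 1) * math.log(2.0) + math.lgamma(n / 2)

def centre(n, log_I):
    """E R = I(n+1,g) / I(n,g)."""
    return math.exp(log_I(n + 1) - log_I(n))

def alpha(n, delta, log_I):
    """alpha = (I(n+2,g) I(n,g) / I(n+1,g)^2 - 1) / delta^2 from Proposition 1."""
    ratio = math.exp(log_I(n + 2) + log_I(n) - 2.0 * log_I(n + 1))
    return (ratio - 1.0) / delta ** 2

delta = 0.1  # illustrative relative half width
for n in (100, 10**4, 10**6):
    c = centre(n, log_I_gauss)
    print(n, ((1 - delta) * c, (1 + delta) * c), alpha(n, delta, log_I_gauss))
```

For fixed δ the bound decays roughly like 1/(2nδ²) in this Gaussian case, so the interval around the centre of order √n captures R with probability tending to one.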

According to (Biau and Mason 2015), the behavior of |X|_1,p as n increases is called the distance concentration phenomenon in the computational learning literature. For sums of independent random variables or matrices, sharper concentration inequalities of exponential type are proved in (Vershynin 2016).

For more details on moments of p-spherical random vectors, see (Arellano-Valle and Richter 2012), for an asymmetric situation if p=1 see (Henschel and Richter 2002). The following corollary deals with a class of light tailed high-dimensional star-shaped distributions.

Corollary 1

Let K be any star body as introduced in “Preliminaries” section. If the density generating function of a high-dimensional star-shaped distribution Φ_g,K is that of Kotz type with parameters s>0,t>0,k>1−n,

$$g(r)=r^{k-1} e^{-tr^{s}}I_{[0,\infty)}(r), $$

and

$$\delta\gg \frac{1}{\sqrt{n} } $$

(meaning that $\delta {\sqrt {n} }\rightarrow \infty $ as n→∞) then, for sufficiently large n, there holds

$$P\left((1-\delta)\frac{\Gamma(\frac{n+k}{s})}{t^{1/s}\Gamma(\frac{n+k-1}{s})}\leq R\leq (1+\delta)\frac{\Gamma(\frac{n+k}{s})}{t^{1/s}\Gamma(\frac{n+k-1}{s})} \right)\geq 1-\alpha. $$

Proof

First we check that α tends to zero: because

$$\frac{I(n+2,g)I(n,g)}{I^{2}(n+1,g)}=1+\frac{1}{s(n+k)}+O\left(\frac{1}{n^{2}}\right), $$

where O(.) means Landau’s big O symbol, it follows

$$\alpha=\frac{1}{\delta^{2}}\left[\frac{1}{s(n+k)}+O\left(\frac{1}{n^{2}}\right)\right], n\rightarrow \infty. $$

Such α approaches zero as n tends to infinity if δ²·n→∞ for n→∞. Finishing the proof, we finally observe that

$$\frac{I(n+1,g)}{I(n,g)}=\frac{\Gamma\left(\frac{k+n}{s}\right)}{t^{1/s}\Gamma\left(\frac{k+n-1}{s}\right)}. $$

□
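The two ingredients of this proof can be checked numerically. The sketch below uses the closed form I(n,g)=Γ((n+k−1)/s)/(s t^{(n+k−1)/s}) for the Kotz type generator, cross-checks it against a simple midpoint quadrature for a small n, and compares the moment ratio with 1+1/(s(n+k)). The parameter values and function names are illustrative assumptions.

```python
import math

def log_I_kotz(n, k, s, t):
    """log I(n,g) for g(r) = r^(k-1) exp(-t r^s): Gamma(a)/(s t^a), a = (n+k-1)/s."""
    a = (n + k - 1) / s
    return math.lgamma(a) - math.log(s) - a * math.log(t)

def I_numeric(n, k, s, t, upper=10.0, steps=20000):
    """Midpoint-rule approximation of the integral of r^(n-1) g(r) over [0, upper]."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        r = (i + 0.5) * h
        total += r ** (n + k - 2) * math.exp(-t * r ** s)
    return total * h

def moment_ratio(n, k, s, t):
    """I(n+2,g) I(n,g) / I(n+1,g)^2."""
    return math.exp(log_I_kotz(n + 2, k, s, t) + log_I_kotz(n, k, s, t)
                    - 2.0 * log_I_kotz(n + 1, k, s, t))

k, s, t = 3.0, 1.5, 2.0   # hypothetical Kotz parameters
print(I_numeric(2, k, s, t), math.exp(log_I_kotz(2, k, s, t)))
for n in (100, 1000):
    print(n, moment_ratio(n, k, s, t) - 1.0, 1.0 / (s * (n + k)))
```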

Remark 1

On using Stirling’s formula, it can be seen that

$$ \frac{\Gamma\left(\frac{n+k}{s}\right)}{\Gamma\left(\frac{n+k-1}{s}\right)}= \left(\frac{n+k}{s}\right)^{1/s}\left(1+O\left(\frac{1}{n}\right)\right) \text{ as }n\rightarrow \infty. $$

(4)

Before turning to the case of heavy distribution tails, we note that asymptotic relation (4) makes the statement of Corollary 1 more specific in the sense that

$$I_{K{t}}(R)= \left((1-\delta)\left(\frac{n+k}{ts}\right)^{1/s},(1+\delta)\left(\frac{n+k}{ts}\right)^{1/s} \right) $$

is a reasonable interval where R takes values with high probability if we are given a high-dimensional star-shaped vector with density generating function of Kotz type and fixed or increasing dimension n. The most essential role is played here by parameter s which basically determines the relative heaviness or lightness of the distribution tails.
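Relation (4) and the resulting interval can be evaluated directly. The sketch below compares the Gamma ratio with its leading term and computes the interval for given parameters; the function names are hypothetical. With s=2, k=1 and t=1/2 the generator is the Gaussian one, so the centre reduces to √(n+1).

```python
import math

def gamma_ratio(n, k, s):
    """Gamma((n+k)/s) / Gamma((n+k-1)/s), computed on the log scale."""
    return math.exp(math.lgamma((n + k) / s) - math.lgamma((n + k - 1) / s))

def leading_term(n, k, s):
    """Leading term ((n+k)/s)^(1/s) of relation (4)."""
    return ((n + k) / s) ** (1.0 / s)

def kotz_interval(n, k, s, t, delta):
    """Interval with centre ((n+k)/(t s))^(1/s) and relative half width delta."""
    c = ((n + k) / (t * s)) ** (1.0 / s)
    return ((1 - delta) * c, (1 + delta) * c)

for n in (10**3, 10**4):
    rel_err = gamma_ratio(n, 1.0, 2.0) / leading_term(n, 1.0, 2.0) - 1.0
    print(n, rel_err, n * rel_err)   # n * rel_err stays bounded, in line with O(1/n)

print(kotz_interval(99, 1.0, 2.0, 0.5, 0.1))
```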

Corollary 2

Let K be any star body as introduced in “Preliminaries” section. If the density generating function of a star-shaped distribution Φ_g,K is that of Pearson type VII with parameters s>0,k>n+2 where k−n→∞ as n→∞,

$$g(r)=\left(1+\frac{r}{s}\right)^{-k} I_{[0,\infty)}(r), $$

and

$$\delta\gg \frac{1}{\sqrt{n}} \text{ as well as} ~\delta\gg \frac{1}{\sqrt{k-n} }, $$

then, for sufficiently large n, with probability greater than or equal to 1−α, the star radius R of the high-dimensional vector X belongs to the interval

$$I_{P7}(R)=\left((1-\delta)\frac{sn}{k-n-1},\, (1+\delta)\frac{sn}{k-n-1} \right){.} $$

Proof

Checking that α tends to zero, we find that

$$\frac{I(n+2,g)I(n,g)}{I^{2}(n+1,g)}=1+\frac{1}{n}+\frac{1}{k-n-2}+\frac{1}{n(k-n-2)}. $$

The proof is finished by observing that

$$\frac{I(n+1,g)}{I(n,g)}=\frac{sn}{k-n-1}. $$

□
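Both identities used in this proof follow from I(n,g)=sⁿB(n,k−n), where B denotes the Beta function, and can be confirmed numerically. In the sketch below the choice k=3n, which satisfies k−n→∞, as well as the values of s and δ and all function names are assumptions made for illustration.

```python
import math

def log_I_pearson(n, k, s):
    """log I(n,g) for g(r) = (1 + r/s)^(-k): s^n * B(n, k-n), valid for k > n."""
    return n * math.log(s) + math.lgamma(n) + math.lgamma(k - n) - math.lgamma(k)

def centre(n, k, s):
    """E R = I(n+1,g) / I(n,g)."""
    return math.exp(log_I_pearson(n + 1, k, s) - log_I_pearson(n, k, s))

def moment_ratio(n, k, s):
    """I(n+2,g) I(n,g) / I(n+1,g)^2, requires k > n + 2."""
    return math.exp(log_I_pearson(n + 2, k, s) + log_I_pearson(n, k, s)
                    - 2.0 * log_I_pearson(n + 1, k, s))

def alpha(n, k, s, delta):
    """Chebyshev bound of Proposition 1 for the Pearson type VII generator."""
    return (moment_ratio(n, k, s) - 1.0) / delta ** 2

def pearson_interval(n, k, s, delta):
    c = s * n / (k - n - 1)
    return ((1 - delta) * c, (1 + delta) * c)

s, delta = 2.0, 0.1                      # illustrative values
for n in (100, 10**4):
    k = 3 * n                            # assumed growth of k with n
    print(n, pearson_interval(n, k, s, delta), alpha(n, k, s, delta))
```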

Remark 2

Let δ=δ(n)→+0 and α=α(n)→+0 as n→∞ such that nδ²(n)→∞ and assume that in the situation of Corollary 2 there holds additionally that (k−n)δ²(n)→∞. The statements of Corollaries 1 and 2 can be reformulated then as

$$P(R>(1+\delta)\mathbb{E}R \text{ or } R<(1-\delta)\mathbb{E}R)=O(\alpha(n)) \text{ as } n\rightarrow \infty $$

where Landau’s symbol O, defined for the asymptotic relation f(n)=O(g(n)), n→∞, guarantees the existence of a constant C such that |f(n)/g(n)|≤C holds for all n. Moreover, in the situations of Corollaries 1 and 2 we have that

$$\mathbb{E}R=\left(\frac{n+k}{ts}\right)^{1/s}\left(1+O\left({1\over n}\right)\right)~ \text{and}~ \mathbb{E}R=\frac{sn}{k-n-1}, $$

respectively.

Remark 3

Let us finally mention that because in fact we are considering sequences or even schemas of series of vectors and distributions, the assumption k−n→∞ stated in Corollary 2 is not contradictory. Instead, it ensures a certain variability of the result. Moreover, we remark that (Henschel 2001) and (Henschel and Richter 2002) study the exact distribution of R in case of simplicially contoured vectors (or l_n,1-spherical vectors having nonnegative components). General l_n,p-spherical vectors and their star radius R are studied in (Richter 2009), (Arellano-Valle and Richter 2012) and (Richter 2014), tables of corresponding exact quantiles of R^p and R are to be found in (Müller and Richter 2016) and (Richter 2016).

On g-robust statistics

If the distribution of a statistic does not depend on the density generating function g of a star-shaped sample vector density φ_g,K then it is commonly called g-robust. It is well known that Student and Fisher type statistics possess, besides the g-robustness property, further optimality properties. Here we will see that decisions based upon such statistics are made by analyzing more closely random events whose probability is asymptotically negligible if the dimension of the sample vector approaches infinity.

To be more concrete, in this section, we describe a class of statistical distribution functions, derived from a star-shaped sample vector, the ratio representations of whose values are asymptotically negligible as vector dimension increases unboundedly. It turns out, e.g., that classical and generalized Student and Fisher distributions belong to this class.

Let a random vector X follow the distribution law Φ_g,K, written X∼Φ_g,K, and consider the sets

$$B(t)=\{x\in\mathbb{R}^{n}: T(x)< t\}, t\in \mathbb{R} $$

being generated by a statistic $T:\mathbb {R}^{n}\rightarrow \mathbb {R}$ such that the equation

$$\mathfrak{F}_{S}(B(t),r)=\mathcal{C}(t), \forall t $$

is satisfied where $\mathcal {C}(t)\in [0,1]$ does not depend on r. A statistic T of this type is g-robust. It follows by the geometric representation proved for star-shaped distributions in (Richter 2014) that

$$ P(T(X)< t) =\Phi_{g,K}(B(t))=\frac{\mathfrak{O}_{S}(B(t) \cap S)}{\mathfrak{O}_{S}(S)}. $$

(5)

The distribution law of T(X) is already determined by that of the central projection of the vector X onto the star sphere S. Thus, for $t\in \mathbb {R}$, the probability P(T(X)<t) is defined by the ω_S-value of the random event B(t)∩S in the geometric probability space $(S,\mathfrak {B}(S),\omega _{S})$, and its representation in (5) will be called its ratio representation. It can be observed that many star spheres show the asymptotic behaviour

$$ \mathfrak{O}_{S}(S)\rightarrow 0 \text{ as } n\rightarrow \infty. $$

(6)

For a particular case of such type, see Example 2(a). The statistical model concerned in this case is dealing with independent and homoscedastic random variables. In case of increasing dimensions, we are confronted then with sequences of probability spaces with asymptotically negligible set S.

Example 3

In case S is the unit l_n,p-sphere, condition (6) is satisfied.

In the following example we restrict our consideration to the n-dimensional standard Gaussian law Φ_g,K=Φ where $g(r)=e^{-r^{2}/2}$ and $K=\left \{x: x_{1}^{2}+...+x_{n}^{2}\leq 1\right \}.$

Example 4

A set A⊂Rⁿ belongs to the class $\mathfrak {A}(dir,dist)$ if there exist functions e_A: [0,∞)→S_n(1) and R_A: [0,∞)→[0,∞) satisfying the following two assumptions:

$\mathfrak {A} 1)$ The set A allows the representation

$$A\cap S_{n}(r)=H_{n}\left(e_{A}(r),\, R_{A}(r) \right)\cap S_{n}(r),\ r >0, $$

where H_n(e,R)={x∈Rⁿ:Π_ex=λe,λ≥R} is a half space and S_n(r) the Euclidean sphere of radius r.

$\mathfrak {A} 2)$ The function $\mathfrak {C}:[0,\infty)\rightarrow R^{n}$ with

$$\mathfrak{C}(r)=R_{A}(r)e_{A}(r),\ r \geq0 $$

is a piecewise continuous curve such that A becomes a Borel set.

The functions e_A and R_A are called directional type and distance type functions of the set A, respectively. If $A\in \mathfrak {A}(dir,dist)$ then

$$\begin{array}{rl} \Phi(A)=& \frac{C({n-1})}{\sqrt{2\pi}}\int\limits^{\infty}_{0}r^{n-1}e^{-r^{2}/2}\int\limits^{\alpha^{*}(r)}_{0}(\sin \alpha)^{n-2}d\alpha\ dr,\\[2ex] \alpha^{*}(r)=&\arctan \left((r /R_{A}(r))^{2}-1\right)^{1/2}\in (0,\pi),\ r >0 \end{array} $$

where $C(n)=2^{1-n/2}/\Gamma (n/2)$, see (Richter 1995). If the function r→α^∗(r) is constant then

$$\Phi(A)=\frac{\omega_{n-1}}{\omega_{n}}\int\limits_{0}^{\alpha^{*}}(\sin\alpha)^{n-2}d\alpha $$

where

$$\frac{\omega_{n-1}}{\omega_{n}}\sim\sqrt{\frac{n}{2\pi}}, n\rightarrow \infty. $$

Example 5

Let $\frac {1}{\sigma }X,\sigma >0,$ be a standard Gaussian distributed random vector in Rⁿ. The statistic

$$T_{\mathfrak{e},\mathcal{N}}=\frac{(X,\mathfrak{e})}{\|\Pi_{\mathcal{N}}X\|/\sqrt{k}} $$

is known to be Student distributed with k d.f. for all $\mathfrak {e}\in S_{n}(1)$ and all k-dimensional linear subspaces $\mathcal {N}$ of Rⁿ such that $\mathfrak {e}\bot \mathcal {N}$ and k≤n−1.

Let $A=B(t)=\{T_{\mathfrak {e},\mathcal {N}}< t\}$. Then $A\in \mathfrak {A}(dir,dist)$ where $e_{A}(r)=\mathfrak {e},$ the distance type function is $ R_{B(t)}(r)=\tilde {t}r /(\tilde {t}^{\,2}+1)^{1/2},\ \tilde {t}=t/\sqrt {n-1}$ and the function $ \alpha ^{*}(r)=\arctan \ (1/\tilde {t}\,) $ is constant. Evaluating with k=n−1 the limit of Φ(A) as n→∞ leads to the well known result Φ_0,1(t) where Φ_0,1 denotes the cumulative distribution function of the univariate standard normal distribution.
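The Gaussian limit can be observed numerically from the constant-angle formula of Example 4. The sketch below rests on an interpretation that is assumed here: for t>0 the integral up to α*=arctan(1/t̃) is the uniform measure of the spherical cap of directions within angle α* of 𝔢, which is the event that the statistic is at least t, so one minus this value is compared with Φ_0,1(t). The choice n=2000, the Simpson rule and the function names are illustrative.

```python
import math

def sphere_ratio(n):
    """omega_{n-1}/omega_n = Gamma(n/2) / (sqrt(pi) * Gamma((n-1)/2))."""
    return math.exp(math.lgamma(n / 2) - math.lgamma((n - 1) / 2)) / math.sqrt(math.pi)

def cap_probability(n, alpha_star, steps=20000):
    """(omega_{n-1}/omega_n) * integral_0^alpha_star sin(a)^(n-2) da, composite Simpson."""
    h = alpha_star / steps          # steps must be even
    def f(a):
        return math.sin(a) ** (n - 2)
    total = f(0.0) + f(alpha_star)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(i * h)
    return sphere_ratio(n) * total * h / 3

def std_normal_cdf(t):
    return 0.5 * (1 + math.erf(t / math.sqrt(2)))

n, t = 2000, 1.0                               # illustrative dimension and threshold
alpha_star = math.atan(math.sqrt(n - 1) / t)   # arctan(1/t~) with t~ = t/sqrt(n-1)
student_value = 1.0 - cap_probability(n, alpha_star)
print(student_value, std_normal_cdf(t))
print(sphere_ratio(n), math.sqrt(n / (2 * math.pi)))
```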

For similar properties of a corresponding exact Student test in nonlinear regression, see (Ittrich 2000) and (Ittrich and Richter 2005).

Example 6

For a related consideration on the p-generalized Fisher statistic, see (Richter 2009).

Remark 4

If one is interested in avoiding the asymptotic negligibility of S in the case of increasing dimension, one may leave the class of statistical models dealing with independent homoscedastic observations. Density level sets of sample vectors having heteroscedastic components may be star-shaped. If S is an (a,p)-ellipsoid then, according to asymptotic relation (2), condition (6) may be violated and $\mathfrak {O}_{S}(S)$ may even stabilize asymptotically.

Availability of data and material

Not applicable.

References

Arellano-Valle, R. B., Richter, W.-D.: On skewed continuous l_n,p-symmetric distributions. Chil. J. Stat. 3(2), 193–212 (2012).
Balkema, A., Embrechts, P., Nolde, N.: Meta densities and the shape of their sample clouds. J. Multivar. Anal. 101, 1738–1754 (2010).
Balkema, G., Embrechts, P.: High Risk Scenarios and Extremes. A geometric approach. European Mathematical Society Publishing House, Zürich (2007).
Banerjee, A., Ghosh, J.: Frequency sensitive competitive learning for scalable balanced clustering on high-dimensional hypersurfaces. IEEE Trans. Neural Netw. 15, 702–719 (2004).
Biau, G., Mason, D.: High-dimensional p-norms. In: Hallin, Mason, Pfeifer, Steinebach (eds.) Mathematical statistics and limit theorems, pp. 21–40. Springer, Cham (2015).
Chen, C. P., Lin, L.: Inequalities for the volume of the unit ball in $\mathbb {R}^{n}$. Mediterr. J. Math. 11, 299–314 (2014).
Cutting, C., Paindaveine, D., Verdebout, T.: Testing uniformity on high-dimensional spheres. Ann. Stat. 45(3), 1024–1058 (2017).
Henschel, V.: Ausgewählte lineare Modelle in simplizial konturiert verteilten Grundgesamtheiten. GCA-Verlag, Herdecke (2001).
Henschel, V., Richter, W.-D.: Geometric generalization of the exponential law. J. Multivar. Anal. 81(2), 189–204 (2002).
Ittrich, C.: Exakte Methoden in Regressionsmodellen mit einem nichtlinearen Parameter und sphärisch symmetrischen Fehlern. Shaker, Aachen (2000).
Ittrich, C., Richter, W.-D.: Exact tests and confidence regions in nonlinear regression. Statistics 39, 13–42 (2005).
Loskot, P., Beaulieu, N. C.: On monotonicity of the hypersphere volume and area. J. Geom. 87, 96–98 (2007). https://doi.org/10.1007/s00022-007-1891-1.
Müller, K., Richter, W.-D.: Extreme value distributions for dependent jointly l_n,p-symmetrically distributed random variables. Depend. Model. 4, 30–62 (2016). https://doi.org/10.1515/demo-2016-0002.
Richter, W.-D.: A geometric approach to the Gaussian law. In: Mammitzsch, Schneeweiß (eds.) Symposia Gaussiana, Conf. B, pp. 25–45. Walter de Gruyter & Co., Berlin (1995).
Richter, W.-D.: Generalized spherical and simplicial coordinates. J. Math. Anal. Appl. 335, 1187–1202 (2007). https://doi.org/10.1016/j.jmaa.2007.03.047.
Richter, W.-D.: Continuous l_n,p-symmetric distributions. Lith. Math. J. 49(1), 93–108 (2009). https://doi.org/10.1007/s10986-009-9030-3.
Richter, W.-D.: Geometric disintegration and star-shaped distributions. J. Stat. Distrib. Appl. 1(20) (2014). https://doi.org/10.1186/s40488-014-0020-6.
Richter, W.-D.: Convex and radially concave contoured distributions. J. Probab. Stat., 12 (2015), Article ID 165468. https://doi.org/10.1155/2015/165468.
Richter, W.-D.: Exact inference on scaling parameters in norm and antinorm contoured sample distributions. J. Stat. Distrib. Appl. 3(8) (2016). https://doi.org/10.1186/s40488-016-0046-z.
Richter, W.-D., Schicker, K.: Polyhedral star-shaped distributions. J. Probab. Stat., 35 (2017). https://doi.org/10.1155/2017/7176897.
Vershynin, R.: Four lectures on probabilistic methods for data science 41 (2016). arXiv 1612.06661. Cornell University. https://arxiv.org/abs/1612.06661.

Acknowledgements

The author would like to thank the reviewers, the Associate Editor and the Editor for their valuable comments.

Funding

Not applicable.

Author information

Authors and Affiliations

University of Rostock, Institute of Mathematics, Ulmenstraße 69, Rostock, Germany
Wolf-Dieter Richter

Corresponding author

Correspondence to Wolf-Dieter Richter.

Ethics declarations

Competing interests

The author declares no conflict of interest.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article

Richter, WD. High-dimensional star-shaped distributions. J Stat Distrib App 6, 5 (2019). https://doi.org/10.1186/s40488-019-0096-0

Received: 05 September 2018
Accepted: 21 May 2019
Published: 06 June 2019
DOI: https://doi.org/10.1186/s40488-019-0096-0
