 Short report
 Open Access
High-dimensional star-shaped distributions
 Wolf-Dieter Richter
https://doi.org/10.1186/s40488-019-0096-0
© The Author(s) 2019
 Received: 5 September 2018
 Accepted: 21 May 2019
 Published: 6 June 2019
Abstract
Stochastic representations of star-shaped distributed random vectors having a heavy or light tail density generating function g are studied for increasing dimensions, along with the corresponding geometric measure representations. Intervals are considered where star radius variables take values with high probability, and the derivation of values of distribution functions of g-robust statistics is proved to be based upon considering random events whose probability is asymptotically negligible as the dimension of the sample vector approaches infinity. Moreover, a principal component representation of p-generalized elliptically contoured p-generalized Gaussian distributions is discussed.
Keywords
 Increasing dimension
 Star-shaped distribution
 Star radius distribution
 Star-uniform distribution
 p-generalized elliptically contoured distribution
 Principal component representation
 g-robust statistic
 Indeterminate form
Introduction
Two impressions frequently arise when analyzing high-dimensional data sets. First, an observation point's distance from the zero element of the sample space is likely to belong to a certain interval of the positive real line, away from zero. Second, the distribution of the direction of the vector seems to be close, in a certain sense, to a uniform distribution on the set of all directions that are observable from a certain center. The first observation can be reflected from a probabilistic point of view by a measure concentration type property, including what is done in (Biau and Mason 2015) and (Vershynin 2016). The second is part of the background for testing uniformity on high-dimensional spheres, see e.g. (Cutting et al. 2017), possibly after projecting data points onto spheres as, e.g., in (Banerjee and Ghosh 2004).
In situations of the described type, it may be reasonable to model the data, or their residuals after fitting a model, by multivariate star-shaped distributions. In this regard, (Balkema and Embrechts 2007) and (Balkema et al. 2010) identify conditions ensuring that star-shaped distributions, with the Gauss-exponential law being one of the best-known examples, appear as limit laws in certain high-risk scenarios.
Distributions from the class of star-shaped distributions are flexible with respect to convexity or radial concavity, allow different variability of probability mass along different directions of the sample space, and are able to model light and heavy distribution centers and tails. Because no single natural number is representative of large dimensions, one might like to consider sequences or schemas of series of n-dimensional vectors with n approaching infinity. However, for simplicity of notation, we instead consider here just a single random vector X taking values in \(\mathbb {R}^{n}\) and assume afterwards that n tends to infinity in formulas holding for X.
Let us recall at this point the following general aspect of uni- or multivariate asymptotic probabilistic analysis, which is of particular importance, for example, in large deviation theory, but not exclusively there. Studying the limit behavior of certain sequences of distributions on specific subsets of their ranges of definition, and comparing it to how the limit law itself behaves on the same sets, requires precise knowledge of the latter. In this respect, it is an independent problem to study the behavior that limit laws show on the sets of interest. Similarly, if a sequence of distributions of increasing dimension is approximated in a certain part of its range of definition by a high-dimensional star-shaped limit law, then studying the latter is an independent problem, and it is at the core of the present note.
With the agreement of considering just one single vector X of dimension n, particular questions associated with the buzzword 'big data' are approached in the present short note by expressing the above-mentioned impressions gained from data in the language of probability distributions. To be more specific, we deal here with star-shaped distributions in \(\mathbb {R}^{n}\) and correspondingly distributed vectors. Such a vector allows a stochastic representation as a product of a random generalized radius variable R and a random vector U that is star-uniformly distributed on a star sphere and independent of R, as well as a corresponding geometric measure representation. Some consequences which can be drawn from these representations in the case of increasing dimension are studied. In particular, a representation of p-generalized elliptically contoured distributions is considered from the point of view of principal components.
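The product representation X = R·U can be illustrated numerically in one special case. The following is only a minimal sketch, assuming that the star body K is the unit l_p ball and that X has independent coordinates with density proportional to exp(-|x|^p / p) (a p-generalized Gaussian vector). The function names `sample_p_gaussian` and `radius_direction` and all parameter values are hypothetical and chosen for illustration; they are not taken from the paper.

```python
import numpy as np

def sample_p_gaussian(rng, size, p):
    """Draw i.i.d. coordinates with density proportional to exp(-|x|^p / p)."""
    # If G ~ Gamma(1/p, 1), then (p * G)^(1/p) has the law of |X_i|.
    magnitude = (p * rng.gamma(shape=1.0 / p, scale=1.0, size=size)) ** (1.0 / p)
    sign = rng.choice([-1.0, 1.0], size=size)
    return sign * magnitude

def radius_direction(x, p):
    """Split each row x into (R, U) with R = l_p norm of x and U = x / R."""
    r = np.sum(np.abs(x) ** p, axis=1) ** (1.0 / p)
    return r, x / r[:, None]

rng = np.random.default_rng(0)
n, p = 50, 1.5                      # illustrative dimension and shape (assumed)
x = sample_p_gaussian(rng, (20000, n), p)
r, u = radius_direction(x, p)

# X is recovered exactly as the product R * U.
recon_error = np.max(np.abs(r[:, None] * u - x))
# Every U lies on the unit l_p sphere.
unit_error = np.max(np.abs(np.sum(np.abs(u) ** p, axis=1) - 1.0))
# A crude independence check: R and |U_1| should be nearly uncorrelated.
corr_r_u1 = np.corrcoef(r, np.abs(u[:, 0]))[0, 1]
print(recon_error, unit_error, corr_r_u1)
```

A vanishing sample correlation is of course only a necessary symptom of independence, not a proof of it; the sketch is meant to make the roles of R and U concrete rather than to verify the representation.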
The paper is structured as follows. In “Preliminaries” section, we present preliminary facts on star-shaped distributions including the notions of star surface content measure and star-uniform distribution on a star sphere. “A principal component representation” section deals with the particular class of p-generalized elliptically contoured distributions, and it is studied there how they apply to modeling high-dimensional data. “A measure concentration property” section then considers typical intervals where R takes values if X is star-shaped distributed, and in “On g-robust statistics” section distributions of univariate statistics are described which can basically be derived from star-uniformly distributed vectors. Such distributions are not affected by whether X has a density generating function g generating light or heavy distribution tails, and the corresponding statistics are therefore called g-robust. The derivation of values of distribution functions of g-robust statistics is proved to be based upon considering random events whose probability is asymptotically negligible as the dimension of the sample vector approaches infinity.
Preliminaries
Example 1

If p=2 and a=1_{n}=(1,...,1)^{T} then \(\mathfrak {O}_{S}(A)\) is the Euclidean surface content of the measurable subset A of S.

If p=1 then \(\mathfrak {O}_{S}(A)\) can be considered as a particular polyhedral generalized surface content of A.

If \(\mathfrak {O}_{S,\infty }(A)\) is defined as the limit of \(\mathfrak {O}_{S}(A), A\in {\mathfrak {B}(S)}\) as p→∞ then \(\mathfrak {O}_{S,\infty }\) can be considered as another particular polyhedral generalized surface content measure. For the whole class of polyhedral generalized surface content measures, see (Richter and Schicker 2017).

Generalizations of representation (1) hold true for all cases where K is a ball with respect to any norm or antinorm, see (Richter 2015).
The next example deals with the asymptotic behavior of the star surface content and volume of star spheres and star balls or ellipsoids, respectively, as the dimension approaches infinity.
Example 2
A principal component representation
In this section we study to what extent a particular class of continuous multivariate star-shaped distributions applies to modeling high-dimensional data. To be specific, we consider p-generalized elliptically contoured p-generalized Gaussian distributions.
It will be shown how principal component analysis can be used to identify those components of such star-shaped distributed vectors that are of major importance for the modeling process. In particular, it turns out that covariance ellipsoids, being l_{2}-ellipsoids, have the same main axes as density level set ellipsoids, being l_{p}-ellipsoids.
A measure concentration property
If a multivariate distribution converges to the standard Gaussian law then the square of the Euclidean norm of the correspondingly distributed vector X, i.e. the square of the Euclidean radius R=X_{1,2} of such a vector, will tend under some additional assumption to the χ^{2}-distribution with n degrees of freedom. Now let X follow a star-shaped distribution Φ_{g,K}. What can we then say about the behavior of (a suitably defined power of) its generalized (star) radius? In this section we derive typical intervals where star radius variables of high-dimensional star-shaped vectors take values.
Proposition 1
Proof
According to (Biau and Mason 2015), the behavior of X_{1,p} as n increases is called the distance concentration phenomenon in the computational learning literature. For sums of independent random variables or matrices, sharper concentration inequalities of exponential type are proved in (Vershynin 2016).
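The distance concentration phenomenon can be made visible with a small simulation. The sketch below is not the statement of Proposition 1; it only treats the special case, assumed here for illustration, of a vector with independent coordinates of density proportional to exp(-|x|^p / p). In that case R^p is p times a Gamma(n/p, 1) variable, so R^p / n has mean 1 and standard deviation sqrt(p/n). The function name `radius_power_ratio` and the chosen values of p and n are hypothetical.

```python
import numpy as np

def radius_power_ratio(rng, n, p, n_samples=2000):
    """Simulate R^p / n for an n-vector with i.i.d. p-generalized Gaussian coordinates."""
    # |X_i|^p / p is Gamma(1/p, 1) distributed, so R^p = p * (sum of n such variables).
    g = rng.gamma(shape=1.0 / p, scale=1.0, size=(n_samples, n))
    return p * g.sum(axis=1) / n

rng = np.random.default_rng(2)
p = 3.0                               # illustrative shape parameter (assumed)
dims = (10, 100, 1000)
ratios = {n: radius_power_ratio(rng, n, p) for n in dims}
spread = {n: float(np.std(ratios[n])) for n in dims}
center = {n: float(np.mean(ratios[n])) for n in dims}

for n in dims:
    print(f"n={n:5d}  mean={center[n]:.4f}  std={spread[n]:.4f}  sqrt(p/n)={np.sqrt(p / n):.4f}")
```

The simulated spread shrinks at the rate sqrt(p/n), which is the sense in which the radius of a high-dimensional vector of this type takes values in a narrow interval around n^{1/p} with high probability.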
For more details on moments of p-spherical random vectors, see (Arellano-Valle and Richter 2012); for an asymmetric situation if p=1, see (Henschel and Richter 2002). The following corollary deals with a class of light-tailed high-dimensional star-shaped distributions.
Corollary 1
Proof
Remark 1
Corollary 2
Proof
Remark 2
Remark 3
Let us finally mention that, because we are in fact considering sequences or even schemas of series of vectors and distributions, the assumption k−n→∞ stated in Corollary 2 is not contradictory. Instead, it ensures a certain variability of the result. Moreover, we remark that (Henschel 2001) and (Henschel and Richter 2002) study the exact distribution of R in the case of simplicially contoured vectors (or l_{n,1}-spherical vectors having nonnegative components). General l_{n,p}-spherical vectors and their star radius R are studied in (Richter 2009), (Arellano-Valle and Richter 2012) and (Richter 2014). Tables of corresponding exact quantiles of R^{p} and R can be found in (Müller and Richter 2016) and (Richter 2016).
On g-robust statistics
If the distribution of a statistic does not depend on the density generating function g of a star-shaped sample vector density φ_{g,K} then it is commonly called g-robust. It is well known that Student and Fisher type statistics possess, besides the g-robustness property, further optimality properties. Here we will see that decisions based upon such statistics are made by analyzing more closely random events whose probability is asymptotically negligible as the dimension of the sample vector approaches infinity.
To be more concrete, in this section, we describe a class of statistical distribution functions, derived from a star-shaped sample vector, the ratio representations of whose values are asymptotically negligible as the vector dimension increases unboundedly. It turns out, e.g., that classical and generalized Student and Fisher distributions belong to this class.
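The g-robustness of the classical Student statistic can be checked empirically. The following is a minimal sketch under assumptions that are not taken from the paper: K is the Euclidean unit ball, the light-tailed sample is standard Gaussian, and the heavy-tailed sample is a spherical multivariate Cauchy vector, obtained by dividing a Gaussian vector by the square root of an independent chi-square variable with one degree of freedom. Because the statistic is invariant under positive rescaling of the sample vector, it depends on the direction U only, and its empirical quantiles should agree for both radius laws.

```python
import numpy as np

def student_statistic(x):
    """One-sample Student statistic sqrt(n) * mean / sd, computed for each row of x."""
    n = x.shape[1]
    return np.sqrt(n) * x.mean(axis=1) / x.std(axis=1, ddof=1)

rng = np.random.default_rng(1)
n, n_samples = 5, 40000               # illustrative sizes (assumed)

# Light tails: standard Gaussian sample vectors.
x_gauss = rng.standard_normal((n_samples, n))

# Heavy tails: spherical multivariate Cauchy vectors. Each Gaussian vector is
# divided by an independent random scale, which changes only the radius law.
z = rng.standard_normal((n_samples, n))
w = rng.chisquare(df=1, size=n_samples)
x_heavy = z / np.sqrt(w)[:, None]

probs = [0.1, 0.5, 0.9]
q_gauss = np.quantile(student_statistic(x_gauss), probs)
q_heavy = np.quantile(student_statistic(x_heavy), probs)
max_gap = float(np.max(np.abs(q_gauss - q_heavy)))
print(q_gauss, q_heavy, max_gap)
```

Both sets of quantiles are Monte Carlo estimates, so they agree only up to simulation error; the point is that the heavy-tailed radius does not shift them systematically.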
For a particular case of such type, see Example 2(a). The statistical model concerned in this case is dealing with independent and homoscedastic random variables. In case of increasing dimensions, we are confronted then with sequences of probability spaces with asymptotically negligible set S.
Example 3
In case S is the unit l_{n,p}-sphere, condition (6) is satisfied.
In the following example we restrict our consideration to the n-dimensional standard Gaussian law Φ_{g,K}=Φ where \(g(r)=e^{-r^{2}/2}\) and \(K=\left \{x: x_{1}^{2}+...+x_{n}^{2}\leq 1\right \}.\)
Example 4
A set A⊂R^{n} belongs to the class \(\mathfrak {A}(dir,dist)\) if there exist functions e_{A}: [0,∞)→S_{n}(1) and R_{A}: [0,∞)→[0,∞) satisfying the following two assumptions:
Example 5
Let \(A=B(t)=\{T_{\mathfrak {e},\mathcal {N}}< t\}\). Then \(A\in \mathfrak {A}(dir,dist)\) where \(e_{A}(r)=\mathfrak {e},\) the distance type function is \( R_{B(t)}(r)=\tilde {t}r /(\tilde {t}^{\,2}+1)^{1/2},\ \tilde {t}=t/\sqrt {n-1}\), and the function \( \alpha ^{*}(r)=\arctan \ (1/\tilde {t}\,) \) is constant. Evaluating the limit of Φ(A) as n→∞ with k=n−1 leads to the well-known result Φ_{0,1}(t), where Φ_{0,1} denotes the cumulative distribution function of the univariate standard normal distribution.
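The limit statement can be checked numerically without the geometric representation. The sketch below relies on the standard fact that, under the Gaussian law, the Student statistic has a Student t distribution with n−1 degrees of freedom; it approximates that distribution function by trapezoidal integration and compares it with Φ_{0,1}(t). The helper names `student_cdf` and `normal_cdf`, the evaluation point t, the grid size and the lower integration limit are illustrative assumptions.

```python
import math
import numpy as np

def student_cdf(t, df, grid_size=20001, lower=-60.0):
    """Approximate the Student t CDF at t by trapezoidal integration of its density."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    s = np.linspace(lower, t, grid_size)
    dens = np.exp(log_c - 0.5 * (df + 1) * np.log1p(s ** 2 / df))
    # Trapezoidal rule; the mass below `lower` is neglected.
    return float(np.sum((dens[1:] + dens[:-1]) * np.diff(s)) / 2.0)

def normal_cdf(t):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

t = 1.5                               # illustrative evaluation point (assumed)
gaps = {n: abs(student_cdf(t, n - 1) - normal_cdf(t)) for n in (5, 50, 500)}
for n, gap in gaps.items():
    print(f"n={n:4d}  |F_(n-1)(t) - Phi(t)| = {gap:.6f}")
```

The gap decreases roughly in proportion to 1/(n−1), in line with the convergence to Φ_{0,1}(t) stated above.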
For similar properties of a corresponding exact Student test in nonlinear regression, see (Ittrich 2000) and (Ittrich and Richter 2005).
Example 6
For a related consideration on the p-generalized Fisher statistic, see (Richter 2009).
Remark 4
If one is interested in avoiding the asymptotic negligibility of S in the case of increasing dimension, one may leave the class of statistical models dealing with independent homoscedastic observations. Density level sets of sample vectors having heteroscedastic components may be star-shaped. If S is an (a,p)-ellipsoid then, according to asymptotic relation (2), condition (6) may be violated and even be asymptotically stabilizing.
Declarations
Acknowledgements
The author would like to thank the Reviewers, the Associate Editor and the Editor for their valuable comments.
Funding
Not applicable.
Competing interests
The author declares no conflict of interest.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Authors’ Affiliations
References
 Arellano-Valle, R. B., Richter, W. D.: On skewed continuous l_{n,p}-symmetric distributions. Chil. J. Stat. 3(2), 193–212 (2012).
 Balkema, A., Embrechts, P., Nolde, N.: Meta densities and the shape of their sample clouds. J. Multivar. Anal. 101, 1738–1754 (2010).
 Balkema, G., Embrechts, P.: High Risk Scenarios and Extremes. A geometric approach. European Mathematical Society Publishing House, Zürich (2007).
 Banerjee, A., Ghosh, J.: Frequency sensitive competitive learning for scalable balanced clustering on high-dimensional hypersurfaces. IEEE Neural Netw. 15, 702–719 (2004).
 Biau, G., Mason, D.: High-dimensional p-norms. In: Hallin, Mason, Pfeifer, Steinebach (eds.) Mathematical statistics and limit theorems, pp. 21–40. Springer, Cham (2015).
 Chen, C. P., Lin, L.: Inequalities for the volume of the unit ball in \(\mathbb {R}^{n}\). Mediterr. J. Math. 11, 299–314 (2014).
 Cutting, C., Paindaveine, D., Verdebout, T.: Testing uniformity on high-dimensional spheres. Ann. Stat. 45(3), 1024–1058 (2017).
 Henschel, V.: Ausgewählte lineare Modelle in simplizial konturiert verteilten Grundgesamtheiten. GCA-Verlag, Herdecke (2001).
 Henschel, V., Richter, W. D.: Geometric generalization of the exponential law. J. Multivar. Anal. 81(2), 189–204 (2002).
 Ittrich, C.: Exakte Methoden in Regressionsmodellen mit einem nichtlinearen Parameter und sphärisch symmetrischen Fehlern. Shaker, Aachen (2000).
 Ittrich, C., Richter, W. D.: Exact tests and confidence regions in nonlinear regression. Statistics. 39, 13–42 (2005).
 Loskot, P., Beaulieu, N. C.: On monotonicity of the hypersphere volume and area. J. Geom. 87, 96–98 (2007). https://doi.org/10.1007/s00022-007-1891-1.
 Müller, K., Richter, W. D.: Extreme value distributions for dependent jointly l_{n,p}-symmetrically distributed random variables. Depend. Model. 4, 30–62 (2016). https://doi.org/10.1515/demo-2016-0002.
 Richter, W. D.: A geometric approach to the Gaussian law. In: Mammitzsch, Schneeweiß (eds.) Symposia Gaussiana, Conf. B, pp. 25–45. Walter de Gruyter & Co., Berlin (1995).
 Richter, W. D.: Generalized spherical and simplicial coordinates. J. Math. Anal. Appl. 335, 1187–1202 (2007). https://doi.org/10.1016/j.jmaa.2007.03.047.
 Richter, W. D.: Continuous l_{n,p}-symmetric distributions. Lith. Math. J. 49(1), 93–108 (2009). https://doi.org/10.1007/s10986-009-9030-3.
 Richter, W. D.: Geometric disintegration and star-shaped distributions. J. Stat. Distrib. Appl. 1(20) (2014). https://doi.org/10.1186/s40488-014-0020-6.
 Richter, W. D.: Convex and radially concave contoured distributions. J. Probab. Stat., 12 (2015). https://doi.org/10.1155/2015/165468, Article ID 165468.
 Richter, W. D.: Exact inference on scaling parameters in norm and antinorm contoured sample distributions. J. Stat. Distrib. Appl. 3(8) (2016). https://doi.org/10.1186/s40488-016-0046-z.
 Richter, W. D., Schicker, K.: Polyhedral star-shaped distributions. J. Probab. Stat., 35 (2017). https://doi.org/10.1155/2017/7176897.
 Vershynin, R.: Four lectures on probabilistic methods for data science 41 (2016). arXiv 1612.06661. Cornell University. https://arxiv.org/abs/1612.06661.