An R package for modeling and simulating generalized spherical and related distributions
 John P. Nolan^{1}
https://doi.org/10.1186/s4048801600530
© The Author(s) 2016
Received: 22 January 2016
Accepted: 13 October 2016
Published: 28 October 2016
Abstract
A flexible class of multivariate generalized spherical distributions with star-shaped level sets is developed. Working in dimensions above two requires tools from computational geometry and multivariate numerical integration. An algorithm to approximately simulate from these star-shaped distributions is developed; it also works for simulating from more general tessellations. These techniques are implemented in the R package gensphere.
Keywords
Generalized spherical distributions; Star-shaped distributions; Computational geometry
Mathematics Subject Classification (2000)
62H11 68U05 65D30
Introduction
There is a need for tractable models for multivariate data with nonstandard dependence structures. Our motivation here was to be able to flexibly model distributions with star-shaped level sets. The R package gensphere has been developed to work with these classes of distributions: specifying flexible shapes for the level sets, computing densities, and simulating. A deliberate goal in this process is to have methods and programs that work in dimension d≥2, and this requires some methods from computational geometry. While the original intent focused on star-shaped regions, some of the tools developed here are useful for other problems, e.g. sampling from more general sets.
A motivating example for this work is to model fragment dispersion from an explosion. In such problems, the fragments disperse in three dimensions in patterns like those of Fig. 4. The ability to easily specify different contour functions by adding together multiple terms as in Section 2.1 is of practical importance for describing different types of explosive devices. The goal of this modeling is to design better body and vehicle armor to protect people.
Under integrability conditions discussed below, this will give a probability density function on \({\mathbb {R}}^{d}\), and the level sets of such a distribution are scalar multiples of \(\mathcal {C}\). Such distributions are also called homothetic, see Balkema and Nolde (2010), Section 3.1 or Simon and Blume (1994), Section 20.4. We will call c(·) the contour function and g(·) the radial decay function of the distribution.
Our approach differs from Fernández et al. (1995) where they start with a function \(v : {\mathbb {R}}^{d} \to [0,\infty)\) that is homogeneous: v(a x)=a v(x). Such functions are called gauge functions or Minkowski functionals, and are well studied in convex analysis and functional analysis. The relationship between their v function and our contour function is v(x)=|x|/c(x/|x|). If c(s)=1, then \(\mathcal {C}\) is the unit sphere and v(x)=|x|, so the resulting classes of distributions are the spherical/isotropic distributions. If v(·) is convex, then v(·) is a norm on \({\mathbb {R}}^{d}\) and \(\mathcal {C}\) is the unit sphere in that norm, hence the name v-spherical distributions. When v(·) is not convex, e.g. the ℓ_{p} quasi-norm with p<1, v(x) does not give a norm, so \(\mathcal {C}\) is not strictly speaking a unit sphere, but we will still call the resulting distributions v-spherical.
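The relationship between the gauge function and the contour function can be checked numerically. The following is a minimal Python sketch in dimension d=2, not part of gensphere; the particular `contour` function and the helper name `gauge` are hypothetical and chosen only for illustration.

```python
import math

def contour(s):
    """Hypothetical contour function c(s) on the unit circle (an assumption):
    constant 1 with an extra bulge in the direction of (1, 0)."""
    return 1.0 + 0.5 * max(0.0, s[0]) ** 2

def gauge(x):
    """Gauge function v(x) = |x| / c(x/|x|), with v(0) = 0."""
    norm = math.sqrt(sum(xi * xi for xi in x))
    if norm == 0.0:
        return 0.0
    s = [xi / norm for xi in x]
    return norm / contour(s)

# Homogeneity: v(a x) = a v(x) for a > 0.
x = (0.3, -1.2)
a = 2.5
print(gauge((a * x[0], a * x[1])), a * gauge(x))

# Points c(s) s on the contour have gauge value 1.
s = (math.cos(0.7), math.sin(0.7))
c = contour(s)
print(gauge((c * s[0], c * s[1])))
```

The two checks mirror the text: v is homogeneous, and the contour is the set where v equals 1, which is why level sets are scalar multiples of the contour.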

Define a flexible set of contours

Carefully tessellate a contour

Sample from a tessellation

Use a contour and a radial function g(·) to define a generalized spherical distribution

Compute the density f(·) given by (1)

Approximately simulate from a distribution with density f(·)
The third step above also provides a way to simulate from paths and surfaces unrelated to generalized spherical laws, giving new classes of probability distributions on paths and surfaces.
Other references on generalized spherical laws are Arnold et al. (2008), Kamiya et al. (2008), Rattihalli and Basugade (2009), Rattihalli and Patil (2010), and Balkema and Nolde (2010). These papers develop the idea of generalized spherical distributions, but do not provide general purpose software for working with these distributions and do not cover techniques for working with higher dimensional models. Richter (2014) gives a rigorous investigation of p-generalized elliptically contoured distributions, with a detailed analysis of the surface measure and a polar disintegration of the laws.
Generalized spherical distributions
We will assume c(·) is continuous on \({\mathbb {S}}\) and that c(s)≤c_{0}. This guarantees (2) is finite, though evaluating it may be difficult, especially when d>2. Section 3 discusses an approach to this problem that improves the accuracy of this computation for the types of contours considered here. Given any univariate probability density h(·) on the positive axis, the function \(g(r)=k_{\mathcal {C}} r^{1-d} h(r)\) is a valid radial decay function. This is the approach used in the rest of this paper and in the associated package.
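A short polar-coordinate check shows why this choice of g(·) normalizes the density. It rests on two assumptions inferred from the surrounding definitions rather than quoted from them: that the density has the form \(f({\mathbf {x}})=g(|{\mathbf {x}}|/c({\mathbf {x}}/|{\mathbf {x}}|))\), and that the norming constant is \(k_{\mathcal {C}} = \left (\int _{\mathbb {S}} c^{d}({\mathbf {s}})\,d{\mathbf {s}}\right)^{-1}\). The substitution t=r/c(s) is used on the part of the sphere where c(s)>0.

```latex
\begin{aligned}
\int_{\mathbb{R}^{d}} f(\mathbf{x})\,d\mathbf{x}
  &= \int_{\mathbb{S}} \int_{0}^{\infty}
     g\!\left(\frac{r}{c(\mathbf{s})}\right) r^{d-1}\,dr\,d\mathbf{s} \\
  &= \left(\int_{\mathbb{S}} c^{d}(\mathbf{s})\,d\mathbf{s}\right)
     \int_{0}^{\infty} g(t)\,t^{d-1}\,dt
     \qquad (t = r/c(\mathbf{s})) \\
  &= k_{\mathcal{C}}^{-1}
     \int_{0}^{\infty} k_{\mathcal{C}}\,t^{1-d}\,h(t)\,t^{d-1}\,dt
   = \int_{0}^{\infty} h(t)\,dt = 1 .
\end{aligned}
```

Under these assumptions the factor \(r^{1-d}\) cancels the polar Jacobian, so any univariate density h(·) on the positive axis yields a proper density on \({\mathbb {R}}^{d}\).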
Choosing Z uniformly distributed (proportional to surface area) on the contour does not work in general. Richter (2014) shows this works in special circumstances, e.g. if the contour \(\mathcal {C}\) is an ℓ _{2} ball, ℓ _{1} ball, or ℓ _{ ∞ } ball. In Section 3 we develop a way to approximately simulate a wider class of distributions by using a piecewise linear approach: approximate the contour \(\mathcal {C}\) by a simplicial tessellation and use (4) on each piece.
2.1 Specification of a contour function
For modeling purposes, we want a flexible family of functions that can be used in a variety of problems.

r(s)=1, which makes \(\mathcal {C}\) the Euclidean ball. Any isotropic/radially symmetric distribution can be modeled by using just this term in a contour function and the appropriate radial decay function.

r(s)=c(s|μ,θ) is a cone with peak 1 at center \(\boldsymbol {\mu } \in {\mathbb {S}}\) and height 0 at the base given by the circle \(\{{\mathbf {x}} \in {\mathbb {S}} : \boldsymbol {\mu } \cdot {\mathbf {x}} = \cos \theta \}\). It is assumed that θ≤π/2.

r(s)=c(s|μ,σ)= exp(−t(s)^{2}/(2σ^{2})) is a Gaussian bump centered at location \(\boldsymbol {\mu } \in {\mathbb {S}}\) with “standard deviation” σ>0. Here t(s) is the distance between μ and the projection of \({\mathbf {s}} \in {\mathbb {S}}\) linearly onto the plane tangent to \({\mathbb {S}}\) at μ.

\(r^{*}({\mathbf {s}}) = \vert \vert {\mathbf {s}} \vert \vert _{\ell ^{p}({\mathbb {R}}^{d})}\), p>0.

\(r^{*}({\mathbf {s}}) = \vert \vert A {\mathbf {s}} \vert \vert _{\ell ^{p}({\mathbb {R}}^{m})}\), p>0, A an (m×d) matrix. This allows a generalized p-norm. If A is d×d and orthogonal, then the resulting contour will be a rotation of the standard unit ball in ℓ^{p}. If A is d×d and not orthogonal, then the contour will be sheared. If m>d, it will give the ℓ^{p} norm on \({\mathbb {R}}^{m}\) of A s.

r ^{∗}(s)=(s ^{⊤} A s)^{1/2}, where A is a positive definite (d×d) matrix. Then the level curves of the distribution are ellipses. Any elliptically contoured distribution can be modeled by using just this term in a contour function and the appropriate radial decay function.
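To make a few of these terms concrete, here is a minimal Python sketch, not the gensphere implementation. Two assumptions are made and should be read as such: the cone height is taken to fall off linearly in the angle between s and μ (the text fixes only the peak and the base), and the norm-type terms r^{∗} are taken to enter the contour through their reciprocal, c(s)=1/r^{∗}(s), so that the contour is the unit sphere of the corresponding (quasi-)norm. All function names are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cone(s, mu, theta):
    """Cone term: 1 at mu, 0 on and beyond the circle mu.s = cos(theta).
    Assumption: height decreases linearly in the angle between s and mu."""
    angle = math.acos(max(-1.0, min(1.0, dot(s, mu))))
    return max(0.0, 1.0 - angle / theta)

def lp_norm(x, p):
    """l^p norm (a quasi-norm when p < 1)."""
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)

def matvec(A, s):
    return [dot(row, s) for row in A]

def quad_norm(s, A):
    """(s^T A s)^(1/2) for a positive definite matrix A."""
    return math.sqrt(dot(s, matvec(A, s)))

def contour_point(s, r_star):
    """Point c(s) s on the contour, assuming c(s) = 1 / r_star(s)."""
    c = 1.0 / r_star(s)
    return [c * si for si in s]

s = [math.cos(0.4), math.sin(0.4)]

# With r*(s) = ||s||_p the contour point has l^p norm 1 (here p = 1/2).
z = contour_point(s, lambda v: lp_norm(v, 0.5))
print(lp_norm(z, 0.5))

# With r*(s) = (s^T A s)^(1/2) the contour point lies on the ellipse z^T A z = 1.
A = [[2.0, 0.0], [0.0, 0.5]]
w = contour_point(s, lambda v: quad_norm(v, A))
print(quad_norm(w, A))

# Cone term: peak at mu, zero at the base.
mu, theta = [1.0, 0.0], math.pi / 3
print(cone(mu, mu, theta), cone([math.cos(theta), math.sin(theta)], mu, theta))
```

The Gaussian bump is left out because its definition depends on how the projection onto the tangent plane is carried out, which the description above does not pin down.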
2.2 Choice of R
In general, g(r) can be any nonnegative integrable function. The radial decay of R determines the decay of f(·) on \({\mathbb {R}}^{d}\). In most applications one wants 0<g(0)<∞ and g(r) decreasing for r>0, but other possibilities may be of interest. If g(0)=0, the density surface given by (1) will have a “well” at the origin; if g(0)=+∞, then the density blows up at the origin. If g(·) oscillates, then the density surface will have radial “waves” emanating out from the origin. If R has bounded support, then X will have bounded support.
The gamma distributions give a family of distributions that can be used to get generalized spherical distributions with light tails. If a Γ(d,1) law is used for R, then h(r)=Γ(d)^{−1} r^{d−1} exp(−r), so \(g(r)=k_{\mathcal {C}} r^{1-d} h(r) = (k_{\mathcal {C}}/\Gamma (d)) \exp (-r)\), which is finite at the origin and monotonically decreasing. If one wants heavy tails for X, then some possibilities for R are Fréchet, Pareto and multivariate stable amplitude. (The latter is defined in Nolan (2013) by R=|Z|, where Z is radially symmetric/isotropic α-stable in d dimensions. Numerical methods to calculate the density h(r) of R and simple ways to simulate are given in the reference.)
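The gamma case can be verified directly. The Python sketch below is an illustration only: the value used for the norming constant `k_c` is a placeholder, not a computed constant for any particular contour, and the function names are hypothetical.

```python
import math
import random

def h_gamma(r, d):
    """Density of the Gamma(d, 1) law at r > 0."""
    return r ** (d - 1) * math.exp(-r) / math.gamma(d)

def g_radial(r, d, k_c):
    """Radial decay g(r) = k_C r^(1-d) h(r) for the Gamma(d, 1) radial law."""
    return k_c * r ** (1 - d) * h_gamma(r, d)

d = 3
k_c = 0.25  # placeholder value (assumption), not a real norming constant

# g(r) should agree with the closed form (k_C / Gamma(d)) exp(-r).
for r in (0.1, 1.7, 5.0):
    print(r, g_radial(r, d, k_c), k_c / math.gamma(d) * math.exp(-r))

# Simulating the radial variable R from Gamma(d, 1).
rng = random.Random(1)
print([rng.gammavariate(d, 1.0) for _ in range(3)])
```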
Contours: tessellating, integrating and simulating
A large part of the technical complexity of working with generalized spherical laws lies in three tasks: representing the contours, evaluating the norming constant \(k_{\mathcal {C}}\) in (2), and simulating from the contour \(\mathcal {C}\). The gensphere package uses two other recent R packages for these problems: SphericalCubature (Nolan 2015b) and mvmesh (Nolan 2015a).
SphericalCubature numerically integrates a function on a d-dimensional sphere. Given a tessellation of the sphere in \({\mathbb {R}}^{d}\), it uses adaptive integration to integrate over the (d−1)-dimensional surface to evaluate \(k_{\mathcal {C}}\). If the integrand function is smooth and the tessellation is reasonable, then the numerical integration is accurate in modest dimensions, say d=2,3,4,5,6. However, when the integrand function has abrupt changes, numerical techniques can miss parts of the integral. This is even a problem in dimension 2, where the integration is a one-dimensional problem. One way to deal with this is to work with tessellations that focus on the places where the integrand is not smooth. In complete generality, this is hard to do. However, in evaluating integral (2) for one of the contours described above, we have an implicit description of where the contour changes abruptly.
The mvmesh package is used to define multivariate meshes, e.g. a collection of vertices and grouping information that specify a list of simplices that approximate a contour. The first place where mvmesh is used in gensphere is to give a grid on the sphere \({\mathbb {S}}\) in d dimensions, e.g. the top left plot in Fig. 1. mvmesh has a function UnitSphere that computes an approximately equal surface area approximation to a hypersphere in dimension d. It takes a parameter k to say how many recursive subdivisions are used in each octant; increasing this value will give a finer tessellation of the sphere. Then this tessellation is refined by adding points to the sphere centered on the places where the contour has bumps, e.g. the cone and Gaussian bumps (type 2 and 3). Then the new points are combined with the original tessellation of the sphere to get a refined tessellation of the sphere that includes these key points.
It is at this point that the SphericalCubature package is used to evaluate the integral (2). This is difficult to accurately evaluate in dimension greater than three if the contour is not smooth. In addition to the estimate of the integral, we use an option in the adaptive integration routine to return the partition used in the multivariate cubature, along with the estimated integral over each simplex. The reasoning is that the integration routine subdivides regions where the integrand is changing quickly in order to get a better estimate of the integral. This subdivision should make the tessellation more closely approximate the contour. We now have the final tessellation of the unit sphere, an estimate of the integral (2) over each of the simplices, and an estimate of the norming constant, i.e. the sum of these per-simplex estimates.
Now the tessellation of the contour is defined by deforming the tessellation of the sphere to the contour: each partition point \({\mathbf {s}} \in {\mathbb {S}}\) gets mapped to c(s)s on the contour. The grouping information from the spherical tessellation is inherited by the contour tessellation. This tessellation is returned as an S3 object of class “mvmesh”. This object contains the vertices, the grouping information, and a list of all the simplices S _{1},S _{2},…,S _{ k } in the tessellation. One advantage of this is that the plot method from the mvmesh package can plot the contours in 2 and 3 dimensions. This process of refining the tessellation has two purposes: (a) get a more accurate estimate of the norming constant by focusing the numerical integration routine on regions where the integrand changes rapidly and (b) get a more accurate tessellation of the contour. Each step of this process can add more simplices, with the goal of capturing key features of the contour. For example, the contour in Fig. 4 started with 512 simplices in the tessellation of the sphere in \({\mathbb {R}}^{3}\) with k=3, adding the points on the cones brought the number up to 888 simplices, and after the adaptive cubature routine subdivision there were 2284 simplices.
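The deformation step is easy to picture in dimension d=2, where the simplices are line segments. The Python sketch below is an illustration under stated assumptions, not the mvmesh or gensphere code: the `contour` function is hypothetical, the circle is split into equal arcs rather than refined adaptively, and the length of each deformed segment is used as a crude stand-in for the per-simplex weights.

```python
import math

def contour(s):
    """Hypothetical contour function c(s) on the unit circle (an assumption)."""
    return 1.0 + 0.5 * max(0.0, s[0]) ** 2

def circle_tessellation(n):
    """Vertices on the unit circle and grouping information: each simplex
    (a segment in d = 2) is a pair of vertex indices."""
    verts = [(math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n))
             for i in range(n)]
    groups = [(i, (i + 1) % n) for i in range(n)]
    return verts, groups

def deform(verts, c):
    """Map each partition point s on the sphere to c(s) s on the contour."""
    return [(c(s) * s[0], c(s) * s[1]) for s in verts]

def segment_lengths(verts, groups):
    """Length of each segment, used here as an approximate weight."""
    return [math.dist(verts[i], verts[j]) for i, j in groups]

verts, groups = circle_tessellation(8)
cverts = deform(verts, contour)           # grouping is inherited unchanged
weights = segment_lengths(cverts, groups)
print(cverts[0], len(groups), sum(weights))
```

Because only the vertices move, the same `groups` list describes both the spherical tessellation and the contour tessellation, which is the inheritance of grouping information described above.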
Exact simulation from a surface is a challenging problem and general methods are difficult to apply for complicated contours like our star-shaped regions. We now describe an approximate method based on the above tessellation. Recall that the above process gives us a list of simplices S_{1},…,S_{m} and associated weights w_{1},…,w_{m}, with w_{j} an estimate of the surface area of the contour approximated by simplex S_{j}.

Select an index j∈{1,…,m} with probability proportional to w _{ j }.

Simulate a point u that is uniformly distributed on the unit simplex in d dimensions. This is standard: simulate u from a Dirichlet distribution with parameter α=(1,1,…,1), e.g. let E_{1},…,E_{d} be i.i.d. standard exponential random variates and set \({\mathbf {u}}=(E_{1},\ldots,E_{d})/ \left (\sum _{i=1}^{d} E_{i} \right)\).

Map the point u to the simplex S _{ j } using the coordinates of u as barycentric coordinates: Z=u ^{⊤} S _{ j }.

Simulate R from the radial distribution with density h(r).

Return the value X=R Z.
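The five steps above can be sketched in a few lines of Python. This is a minimal illustration, not the gensphere implementation: the tessellation and weights are supplied by hand, the Γ(d,1) law from Section 2.2 is assumed for R, and the function names are hypothetical.

```python
import random

def sample_simplex_point(vertices, rng):
    """Steps 2-3: draw Dirichlet(1,...,1) weights from normalized exponentials
    and use them as barycentric coordinates on the given simplex."""
    e = [rng.expovariate(1.0) for _ in vertices]
    total = sum(e)
    u = [ei / total for ei in e]
    dim = len(vertices[0])
    return [sum(u[i] * vertices[i][k] for i in range(len(vertices)))
            for k in range(dim)]

def simulate(simplices, weights, d, n, seed=0):
    """Approximate simulation of X = R Z from a tessellated contour."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        # Step 1: pick a simplex with probability proportional to its weight.
        simplex = rng.choices(simplices, weights=weights, k=1)[0]
        # Steps 2-3: uniform point Z on that simplex.
        z = sample_simplex_point(simplex, rng)
        # Step 4: radial variable R; Gamma(d, 1) is an assumed choice.
        r = rng.gammavariate(d, 1.0)
        # Step 5: return X = R Z.
        out.append([r * zi for zi in z])
    return out

# Example in d = 2: the l^1 unit sphere (a square), four segments, equal weights.
square = [
    [(1.0, 0.0), (0.0, 1.0)],
    [(0.0, 1.0), (-1.0, 0.0)],
    [(-1.0, 0.0), (0.0, -1.0)],
    [(0.0, -1.0), (1.0, 0.0)],
]
xs = simulate(square, weights=[1.0, 1.0, 1.0, 1.0], d=2, n=5)
for x in xs:
    print(x)
```

In this example every simulated Z lies exactly on the contour because the contour is itself piecewise linear; for curved contours Z lies on the simplicial approximation, which is why the method is only approximate.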
The subdivision process, including the numerical cubature, is the slowest part of the process. This is done in the R function cfunc.finish, which finishes the definition of a contour by performing the above calculations and saving the results in an object of class “contour.function”. For example, the 3-dimensional example in Fig. 4 took about half an hour^{1} to complete the construction.
In contrast, once the tessellation is produced, density calculations and simulations are quite fast: to evaluate a density at 10,000 points takes less than a second and to simulate 100,000 random vectors takes less than a second for this example.
In principle, the methods described here work in any dimension; in practice, the numerical challenges (particularly evaluating the integral in (2)) and the computation time limit us as the dimension increases. At the current time, these methods are useful for low dimensions: d=2, 3, or 4.
Endnote
^{1} Times are for an Intel i5-4460 CPU at 3.20 GHz.
Appendix
Declarations
Acknowledgements
The author is grateful to the referees and associate editor who provided valuable suggestions on improving the paper and additional references.
Supported by contract W911NF1210385 from the Army Research Office.
Competing interests
I confirm that I have read SpringerOpen’s guidance on competing interests and have no competing interests in the manuscript.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
References
 Arnold, BC, Castillo, E, Sarabia, JM: Multivariate distributions defined in terms of contours. J. Stat. Plan. Inf. 138, 4158–4171 (2008).
 Balkema, G, Nolde, N: Asymptotic independence for unimodal densities. Adv. Appl. Prob. 42, 411–432 (2010).
 Fernández, C, Osiewalski, J, Steel, MFJ: Modeling and inference with v-spherical distributions. J. Amer. Stat. Assoc. 90, 1331–1340 (1995).
 Kamiya, H, Takemura, A, Kuriki, S: Star-shaped distributions and their generalizations. J. Stat. Plan. Inf. 138, 3429–3447 (2008).
 Nolan, JP: Multivariate elliptically contoured stable distributions: theory and estimation. Comp. Stat. 28, 2067–2089 (2013).
 Nolan, JP: mvmesh: Multivariate Meshes and Histograms in Arbitrary Dimensions. R package version 1.1, on CRAN (2015a). https://CRAN.R-project.org/package=mvmesh. Accessed 16 May 2016.
 Nolan, JP: SphericalCubature: Numerical Integration over Spheres and Balls in n-Dimensions. R package version 1.1, on CRAN (2015b). https://CRAN.R-project.org/package=SphericalCubature. Accessed 24 July 2016.
 Rattihalli, RN, Basugade, AB: Generation of densities using contour transformations. J. Indian Stat. Assoc. 47, 63–90 (2009).
 Rattihalli, RN, Patil, PY: Generalized v-spherical densities. Comm. Stat. Theory Methods. 39, 3568–3583 (2010).
 Richter, WD: Geometric disintegration and star-shaped distributions. J. Stat. Distrib. Appl. 1, 20 (2014). doi:http://dx.doi.org/10.1186/s4048801400206.
 Simon, C, Blume, L: Mathematics for Economists. Norton, New York (1994).