Density Deconvolution for Generalized Skew-Symmetric Distributions

This paper develops a density deconvolution estimator that assumes the density of interest is a member of the generalized skew-symmetric (GSS) family of distributions. Estimation occurs in two parts: a skewing function, as well as location and scale parameters must be estimated. A kernel method is proposed for estimating the skewing function. The mean integrated square error (MISE) of the resulting GSS deconvolution estimator is derived. Based on derivation of the MISE, two bandwidth estimation methods for estimating the skewing function are also proposed. A generalized method of moments (GMM) approach is developed for estimation of the location and scale parameters. The question of multiple solutions in applying the GMM is also considered, and two solution selection criteria are proposed. The GSS deconvolution estimator is further investigated in simulation studies and is compared to the nonparametric deconvolution estimator. For most simulation settings considered, the GSS estimator has performance superior to the nonparametric estimator.


Introduction
The density deconvolution problem arises when it is of interest to estimate the probability density function (pdf ) f x (x) of a random variable X using observations contaminated by measurement error. Specifically, the observed sample consists of data W j = X j + U j , j = 1, . . . , n, where the X j are independent and identically distributed (iid) random variables with pdf f x (x) and the U j are iid measurement error variables with pdf f u (u). This paper presents a semiparametric approach for estimating f x (x) that assumes X belongs to the class of generalized skew-symmetric (GSS) distributions. The GSS deconvolution model for X specifies a base symmetric distribution, providing the basic structure for the model. Thereafer, kernel methodology is used to estimate a skewing function that captures the deviation from the specified symmetric distribution. This semiparametric GSS approach attempts to capture the best of a parametric and a nonparametric solution and provides a very flexible approach for modeling f x (x).
The problem of estimating f x (x) from a contaminated sample W 1 , . . . , W n was first considered by Carroll and Hall (1988) and Stefanski and Carroll (1990) who proposed a fully nonparametric solution under the assumption of a fulle known measurement error distribution f u (u). Since then, much work on the topic has followed. Fan (1991a); Fan (1991b) considered the theoretical properties of the density deconvolution estimator, and Fan and Truong (1993) extended the methodology to nonparametric regression. Diggle and Hall (1993) and Neumann and Hössjer (1997) considered the case of the measurement error distribution being unknown, and assumed that an external sample of error data was available to estimate the measurement error distribution.  considered how replicate data can be used to estimate the characteristic function of the measurement error. The nonparametric estimator requires the selection of a bandwidth parameter. The two-stage plug-in bandwidth of Delaigle and Gijbels (2002) has become the gold-standard in application; Delaigle and Gijbels (2004) provides an overview of several popular bandwidth selection approaches.  considered the use of simulation-extrapolation (SIMEX) for bandwidth selection in a variety of measurement error problems.
Two more recent papers considered the deconvolution problem in new and novel ways. Delaigle and Hall (2014) considered parametrically-assisted nonparametric density deconvolution, while the groundbreaking work of Delaigle and Hall (2016) made use of the empirical phase function to estimate the pdf f x (x) with the measurement error having unknown distribution and without the need for replicate data. The phase function approach imposes the restrictions that X has no symmetric component and that the characteristic function of U is real-valued and strictly positive.
The GSS family of distributions that is the basis for estimation in this paper dates back to Azzalini (1985), the first publication discussing a so-called skew-normal distribution. There has been a great deal of activity since with the monographs by Genton (2004) and Azzalini (2013) providing a good overview of the existing literature on the topic. Much of the GSS research has been theoretical in nature. While this theoretical work is important for understanding the statistical properties of GSS distributions, the applied value of this family has not often been realized in the literature. Notable exceptions that have used GSS distributions in application include the modeling of pharmacokinetic data, see Chu et al. (2001), the redistribution of soil in tillage, see Van Oost et al. (2003), and the retrospective analysis of case-control studies, see Guolo (2008). All of these authors considered fully parametric models. Arellano-Valle et al. (2005) considered a fully parametric measurement error model assuming both X and U follow skew-normal distributions. Lachos et al. (2010) modeled X using a scale-mixture of skew-normal distributions while assuming U is a mixture of normals. Furthermore, both Kim et al. (2016) and Wang et al. (2017) consider factor analysis models using skew-symmetric distributions. Most recently, Kahrari et al. (2019) developed linear mixed models using a skew-normal-Cauchy distribution and Arellano-Valle et al. (2020) considered the measurement error problem using a two-piece normal distribution to allow for skewness. No other work applying GSS distributions in the measurement error context was found.
The present paper is structured as follows. In the next section, the GSS deconvolution estimator is developed and some of its theoretical properties derived. In the subsequent section, bandwidth estimation methods for the skewing function are considered. Thereafer, a generalized method of moments (GMM) approach for estimating the GSS location and scale parameters is developed. The penultimate section presents simultion results,

Derivation of the GSS estimator
Consider the problem of estimating the probability density function (pdf ) f x (x) associated with random variable X based on a sample contaminated by additive measurement error, W j = X j + U j , j = 1, . . . , n. Here, the X j are the true measurements of interest, and the W j and U j represent, respectively, the contaminated observation and the measurement error. It is assumed that the X j are iid f x (x), the U j are iid f u (u), and X j and U j are mutually independent for all j. Furthermore, the U j are assumed to have a symmetric distribution with mean 0 and variance σ 2 u . As is typical in the deconvolution literature, the distribution of U j is assumed fully known. Auxiliary data, when available, would make it possible to relax this assumption and estimate f u (u); see for example .
The deconvolution estimator developed here assumes that f x (x) belongs to the GSS class of distributions. That is, X = ξ + ωZ with ξ ∈ R and ω > 0 denoting location and scale parameters, and with Z having pdf with f 0 (z) a pdf symmetric around 0 and π(z), hereafter referred to as the skewing function, satisfying the inequality constraint 0 ≤ π(z) = 1 − π(z) ≤ 1. In fact, any function satisfying this inequality constraint can be paired with any symmetric pdf f 0 (z) and will result in (1) being a valid pdf. The corresponding pdf of X is f The approach considered here is semiparametric in nature. The symmetric pdf f 0 (z) is assumed known, but no parametric assumptions are made regarding the skewing function π(z). (In fact, if symmetric component f 0 (z) were not assumed known, pdf f z (z) would not be identifiable; see Appendix A.1 for details). The base density f 0 (z) provides the basic strucuture of the model, and the skewing function π(z) captures the deviation from the base model. Thus, the approach attempts to capture the best of a parametric and a nonparametric solution, and the GSS family provides a very flexible approach for modeling f z (z).
GSS random variables have an invariance property under even transformations that is central to the development of the deconvolution estimator in the remainder of this section. Let Z be GSS according to (1) and let Z 0 have symmetric pdf f 0 (z). For any even Azzalini (2013). Thus, the distribution of t(Z) depends only on f 0 (z) and not on π(z). Now, let ψ z (t) denote the characteristic function of Z, and let c 0 (t) = Re[ ψ z (t)] and s 0 (t) = Im[ ψ z (t)] denote the real and imaginary components of ψ z (t). The real component can be expressed as c 0 (t) = E [cos(tZ)]. By the property of even transformation, it follows that c 0 (t) = E [cos(tZ 0 )] which is the characteristic function associated with f 0 (z). Now, assume (ξ , ω) are known, and define W * = (W − ξ)/ω. Furthermore, observe that W * = Z + ω −1 Uand therefore has characteristic function ψ w * (t) = ψ z (t/ω)ψ u (t) where ψ u (t) is the real-valued characteristic function of U. It follows that and The functions c 0 (t) and ψ u (t) in (2) and (3) are known while s 0 (t) is unknown. Noting that f z (z) can be expressed as it follows that an estimator of s 0 (t) can be used to construct an estimator of f z (z). To this end, for random sample W 1 , . . . , W n , let W * j = (W j − ξ)/ω for j = 1, . . . , n, and definẽ This empirical estimator, while unbiased for s 0 (t), is not suitable for estimating f z (z) when substituted in (4) as the integral diverges. This is attributable to the tail behavior ofs 0 (t).
While s 0 (t) converges to 0 as |t| → ∞ for any continuous distribution,s 0 (t) corresponds to an empirical measure and diverges as |t| → ∞. This follows upon noting that the bounded periodic function n −1 j sin(tW * j ) is divided by ψ u (t/ω), with the latter decreasing to 0 as |t| increases.
Next, consider the "smoothed" estimator where ψ k (t) is a non-negative weight function and h is a bandwidth parameter. This estimator has expectation E[ŝ 0 (t)] = ψ k (ht)s 0 (t) and therefore is biased for s 0 (t). However, it also has some desirable properties. Firstly, it is an odd function,ŝ 0 (−t) = −ŝ 0 (t) for all t ∈ R. Secondly, substitution of (5) into (4) results in the well-defined estimator for f z (z), provided ψ k (t) is chosen such that |ψ k (ht)/ψ u (t/ω)| → 0 as |t| → ∞. Choosing ψ k (t) to be 0 outside a bounded interval will trivially satisfy this requirement. Estimator (6) suffers from the same drawback as the usual nonparametric deconvolution estimator in that it may be negative in parts. In practice, the negative parts can be truncated and the resulting function rescaled to integrate to 1. To circumvent this ad-hoc fix, combine Eqs. (1) and (4) to obtain Substitution of (5) in (7), along with the identity sin(tz) = e itz − e −itz /(2i), giveŝ is the well-studied nonparametric deconvolution density estimator of Carroll and Hall (1988) The potential for (6) being negative in parts is reflected in (8) not being range-respecting. Specifically, it is possible to haveπ(z) ∈ [0, 1] for a set z with nonzero measure. A range-corrected skewing function estimator isπ(z) = max 0, min 1,π (z) . The estimated density function of X based on the range-corrected skewing function is Use of the range-corrected skewing function estimate ensures that (9) is always a valid pdf. There is no need for any additional truncation of negative values and subsequent rescaling as would be the case with direct implementation of (6).

Some properties of the estimator
The range-corrected estimatorπ(z) is asymptotically equivalent toπ(z) in (8) on any closed subset of R. As such, the latter will be used to evaluate the properties of the GSS deconvolution estimator. Firstly, note that using the known expected value of the nonparametric deconvolution estimatorf w * (z), it follows from (8) that with constant c k depending only on the kernel function ψ k (t). Thus, for an appropriately chosen bandwidth h,π(z) is consistent for π(z), and the density estimatorf (x|ξ , ω) in (9) is also consistent for f x (x). The mean integrated square error (MISE), derived in Appendix A.2, is When the distribution Z is symmetric, i.e. π(z) = 1/2 for all z so that s 0 (t) = 0 for all t, and letting MISE sym denotes the MISE calculated under symmetry, Here the inequality follows upon noting that |1 − c 0 (2t)ψ u (2t/ω)| ≤ 2 for all t. This upper bound of MISE sym is proportional to the asymptotic MISE of the nonparametric deconvolution estimator, see equation (2.7) in Stefanski & Carroll (1990). Thus, in the symmetric case, one would expect the GSS deconvolution estimator to perform no worse than the nonparametric deconvolution estimator for a correctly specified symmetric component c 0 (t). In fact, since this is an upper bound, large gains in efficiency may be possible. Our simulation results presented in a later section are congruent with this statement.

Bandwidth selection
Implementation of the GSS deconvolution estimator requires a bandwidth paramter h to be specified. Two methods for selecting this bandwidth are developed in this section. The first method uses cross-validation (CV) to approximate the integrated square error (ISE), and the second method approximates the MISE in (10).

A cross-validation bandwidth
For the GSS deconvolution estimator, the density-based ISE is proportional to the ISE for the imaginary component s 0 (t) of the characteristic function, This follows from Parseval's identity and recalling that the real component c 0 (t) is known. Let C(h) denote the expression obtained by expanding the square on the right-hand side of (11) and keeping only terms involving the estimatorŝ 0 (t), Now, note that the second integral in (12) can be written as Defines (i) (t) to be an estimate of s 0 (t) excluding the ith observation, This quantity is unbiased for s 0 (t) for all i, ands (i) (t) is independent of W i . The CV score follows by substitution ofs (i) (t) in (13) for each i in the summand, givinĝ This result is similar to that of Stefanski and Carroll (1990) in the nonparametic setting, but here only requires estimating the imaginary component of the characteristic function. The CV bandwidth is defined to be the valueh that minimizesĈ(h).

An MISE bandwidth
Consider the MISE in (10), and note that the only unknown quantity therein is s 2 where I(·) is the indicator function and κ is some positive constant. The constant κ can be thought of as a smoothing parameter which ensures that the estimatorŝ 2 (t) behaves well for large values of |t|. Ideally, κ should be chosen in a data-dependent way and development of this approach is ongoing. However, based on extensive simulation work, it has been found that values κ ∈[ 3, 5] work reasonably well for a wide range of underlying GSS distributions considered. Now, taking (10), substitutingŝ 2 (t) for s 2 0 (t), and ignoring components that do not depend on the bandwidth, gives MISE approximation scorê The MISE-approximation bandwidth is defined to be the valueh that minimizesM(h).

Generalized method of moments
Up to this point, the location and scale parameters ξ and ω have been treated as known quantities. This is unrealistic in practice. Estimation of the GSS parameters for a known symmetric component has been considered in the literature, see Ma et al. (2005); Azzalini et al. (2010), and Potgieter and Genton (2013). However, none of these authors considered the presence of measurement error. Here, a Generalized Method of Moments (GMM) approach accounting for measurement error is developed.
Let M ≥ 2 be a positive integer and assume that the Z j and the U j have at least 2M finite moments. Let T k denote the (2k)th centered moment, This variable has expectation E [T k ] = E Z + ω −1 U 2k and admits expansion By the GSS property of even transformations, 0 ] for j = 1, . . . , M with Z 0 a random variable with pdf f 0 (z). Furthermore, the evaluation of the moments of U pose no problem as this distribution is assumed known. Thus, E[ T k ] can easily be evaluated using (18).
The GMM estimators are defined to be the minimizer of D(ξ , ω). In evaluating D(ξ , ω), both the expectations E [T k ], k = 1, . . . , M and the covariance matrix are functions of the parameter ω, but not of ξ .

Selection from multiple GMM solutions
One difficulty encountered with the GMM approach is that the statistic D(ξ , ω) frequently has multiple minima, and the global minimum does not always corresponds to the "correct" solution. This equivalent problem also occurs in the non-measurement error setting and is an artifact of the skewing function being unknown; see Section 7.2.2 in Azzalini (2013) for an overview and illustration. Solutions considered there range from selecting the model with the smallest squared integral of the second derivative of the estimated skewing function, to selecting a solution based on matching model-based and empirical skewness coefficients. Now, assume that D(ξ , ω) has J local minima occurring at (ξ j ,ω j ), j = 1, . . . , J. Furthermore, letf j (x|ξ j ,ω j ) denote the GSS density deconvolution estimator in (9) obtained using solution (ξ j ,ω j ). Thus, J different GSS deconvolution estimators are calculated. Using the jth estimated density, define the kth model-implied moment, and model-implied characteristic function, Based on these quantities, two different selection methods are now proposed. Throughout, it will be assumed that measurement error U has distribution symmetric about 0.
Skewness matching: (19). The selected solution is the one with implied skewness closest to the empirical skewness. Specifically, letting Phase function matching: The phase function, a normalized version of the characteristic function, is a recent tool employed in density deconvolution -see Delaigle and Hall (2016) and Nghiem and Potgieter (2018) for further details. Let ρ w (t) and ρ x (t), denote the phase functions of X and W = X + U. For U having strictly positive characteristic function, these phase functions are equal, ρ w (t) = ρ x (t) for all t. The empirical estimate of the phase function of X isρ x (t) =ψ w (t)/|ψ w (t)| withψ w (t) the empirical characteristic function of W, and |z| = (zz) 1/2 andz denoting the complex norm and cojucate of z. For the jth GMM solution (ξ j ,ω j ), the model-implied phase function is given bỹ ρ j (t) =φ j (t)/|φ j (t)| withφ j (t) as defined in (20). Now, letting w(t) denote a non-negative weight function symmetric around 0, define distance metric R j = R |ρ x (t) −ρ j (t)|w(t)dt for j = 1, . . . , J. The selection solution is (ξ j * ,ω j * ) with j * = arg min 1≤j≤J R j . That is, the selected solution has minimum phase function distance. In this paper, weight function w(t) =[ 1 − (t/t * ) 2 ] 3 I(|t| ≤ t * ) will be used with t * the smallest t > 0 such that |ψ w (t)| ≤ n −1/4 as per Delaigle and Hall (2016).

Simulation studies
The performance of the GSS deconvolution estimator was evaluated using extensive simulations. Letting φ(z) and (z) denote the standard normal density and distribution functions, data X 1 , . . . , X n were generated from GSS distributions with symmetric component f 0 (z) = φ(z) and using three different skewing functions, π 0 (z) = 1/2, π 1 (z) = (9.9625z) and π 2 (z) = (z 3 − 2z). The location and scale parameters were taken to be ξ = 0 and ω = 1. Figure 1 illustrates the three resulting pdfs f /ω], k = 0, 1, 2. Note that the skewing function π 0 (z) does not introduce any deviation from symmetry and corresponds to simulating from a normal distribution. Additionally, the skewing function π 1 (z) results in a positive skew distribution, while π 2 (z) results in a bimodal distribution.

Fig. 1 Skew-symmetric densities used in simulation study
Two measurement error distributions were considered with U 1 , . . . , U n being either Normal or Laplace with mean 0 and variance chosen to have noise-to-signal ratio NSR = σ 2 u /σ 2 x either 0.2 or 0.5. Samples W j = X j + U j , j = 1, . . . , n, with n ∈ {50, 100, 200, 500} were generated from each of the possible simulation configurations described.

Comparison of oracle estimators
The first simulation study presented compares the proposed GSS estimator to the established nonparametric estimator of Carroll and Hall (1988), and assumes the existence of an oracle that selects the "best" possible bandwidth for each of the estimators. Specifically, for a sample W 1 , . . . , W n , letf gss (x|h) andf np (x|h) denote, respectively, the GSS and nonparametric estimators with bandwidth h. The ISE is defined as where m ∈ {gss, np}. Then, the "best" bandwidth is the value that minimizes the ISE between the estimated and true densities. Furthermore, when GMM results in more than one solution for the GSS location and scale parameters, the oracle also selects the solution that result in smallest ISE. In practice, no oracle exists to do these selections. Even so, comparing the estimators under such idealized conditions speaks to the best possible performance of these methods.
For each simulation configuration, N = 1000 samples were generated. Due to the occasional occurrence of very large outliers in ISE, the median ISE (rather than mean ISE) is reported. The first and third quartiles of ISE are also reported. Results for n ∈ {200, 500} are summarized in Table 1, and for n ∈ {50, 100} are presented in Table 6 in Appendix A.5.
Inspection of Table 1 shows how well the GSS estimator can perform relative to the nonparametric estimator. In the symmetric case with skewing function π 0 (z), the reduction in median ISE is most dramatic and exceeds 50% in all cases. For skewing functions π 1 (z) and π 2 (z), the reduction in median ISE is also seen to be as large as 40%. There is one instance where median ISE of the nonparametric estimator is smaller than that of the GSS estimator -skewing function π 2 (z) with NSR = 0.5, Laplace measurement error, and sample size n = 200. (The same holds true for sample sizes n = 50 and 100 in Table 6.) However, the equivalent scenario with sample size n = 500 has the GSS estimator with smaller median ISE. This possibly indicates the effect of estimating the location and scale parameters in smaller samples and when large amounts of heavier-tailed-than-normal measurement error is present. Overall, the GSS deconvolution estimator performs very well. Thus, the additional structure being imposed through the a priori specification of the symmetric pdf f 0 (z) can result in a large decrease in ISE.

Bandwidth estimation
The next simulation study investigated the two proposed bandwidth estimation approaches. Specifically, the CV and MISE bandwidths as well as the two-stage plug-in (PI) bandwidth of Delaigle and Gijbels (2002), originally developed for nonparametric deconvolution, were implemented. For each simulated sample, the ISE was calculated. When necessary, GMM solution selection with phase-function matching was used. The nonparametric deconvolution estimator with PI bandwidth was also calculated; corresponding results are included for reference purposes. The median ISE values for the  Tables 2 and 3 for sample sizes n ∈ {200, 500}, and in Tables 7 and 8 in Appendix A.5 for sample sizes n ∈ {50, 100}.
In Tables 2 and 3, it is seen that there isn't a consisent "best" bandwidth method. For skewing functions π 0 (the symmetric case) and π 2 , the PI bandwidth generally has smallest median ISE. In these same scenarios, MISE frequently (but by no means consistently) outperforms CV. For π 1 (z) the MISE bandwidth performs best. In all simulation settings, there is a GSS bandwidth method that results in better performance than the nonparametric estimator. These same conclusions broadly hold for sample sizes n ∈ {50, 100} in Appendix A.5.
The results presented above were restricted to phase-function matching for the GMM estimators, as it was found to generally have better performance that skewness matching. For details of the simulation comparing the two GMM matching methods, see Appendix A.3.

GMM estimation
One other simulation study was performed, and considered the choice of M (the number of even moment to use) when evaluating the GMM estimators of (ξ , ω). These simulations results are presented in Appendix A.4. In summary, the larger value M = 5 was generally seen to outperform M = 2 for π 1 (z) and π 2 (z). In the symmetric π 0 (z) case, M = 2 performed slightly better than M = 5. In all instances, root mean square error (RMSE) was used as criterion.

Coal abrasiveness index data
Data from an industrial application, first considered by Lombard (2005), are analyzed here. The data were obtained by taking batches of coal, splitting them in two, and randomly allocating each of the half-batches to one of two methods used to measure the abrasiveness index (AI) of coal, a measure of the quality of coal. The observed data consist of 98 pairs (w 1i , w 2i ) assumed to be from a population with W 1i = X i + U 1i and Here, X i denotes the true AI of the ith batch, U 1i and U i2 denote measurement error, and constants μ and σ account for the two AI measurement methods being on different scales. Of interest is estimating f x (x), the true density of AI. However, the data (w 1i , w 2i ) first need to be combined in a sensible way. To this end, let μ w,k and σ 2 w,k denote the mean and variance of the W ki , k = 1, 2, and let μ x and σ 2 x denote the mean and variance of the X i . Note that μ w,1 = μ x , μ w,2 = μ + σ μ x , σ 2 w,1 = σ 2 x +σ 2 u , and σ 2 w,2 = σ 2 σ 2 x + σ 2 u . By replacing the population moments with their sample counterparts, estimatorsσ = s w,2 /s w,1 = 0.679 andμ =w 2 −σw 1 = 59.503 are obtained. Here, (w 1 , s w,1 ) denote the sample mean and standard deviation of the observed w 1 -data with similar definitions holding for the w 2 -quantities. Now, the paired observations are combined as w i = 0.5w 1i + 0.5 w 2i −μ /σ . At the population level this corresponds to W i ≈ X i +0.5 (U 1i + U 2i ) := X i +ε i . An estimate of the measurement error variance σ 2 ε is obtained by calculatingσ 2 and noting thatσ 2 ε = 174.6/2 = 87.3. This corresponds to the W i having noise-to-signal ratio NSR = 16.35%.
The GSS deconvolution estimator for f x (x) is now calculated assuming a normal symmetric component, f 0 (z) = φ(z), along with a Laplace distribution for the measurement error ε. (The equivalent estimator assuming normal measurement was also calculated and is nearly identical in shape.) GMM with M = 5 gives solution pairs (ξ 1 ,ω 1 ) = (192.88, 29.90) and (ξ 2 ,ω 2 ) = (230. 41, 32.43). For each of these, the corresponding skewing function estimateπ j (z) and phase function distance R j was calculated, the latter using weight function w(t) =[ 1 − (t/t * ) 2 ] 3 for t ∈[ −t * , t * ] and t * = 0.06. Here, R 1 = 0.023 < 0.046 = R 2 and therefore solution (ξ 1 ,ω 1 ) with estimated skewing functionπ 1 (z) was selected. Skewness matching resulted in selection of the same solution. Figure 2 shows a kernel density estimator of f w (w), the density of the contaminated W i , as well as the GSS deconvolution estimator of f x (x) with MISE bandwidth h = 0.102.
This application illustrates one of the less appealing aspects of the GSS approach sometimes encountered in smaller samples. Note the sharp "edge" in the GSS estimator around x = 225. This is an artefact of the hard truncation applied when calculating the range-respecting skewing function estimateπ(z). The resulting density estimate is (2020) 7:2 Page 13 of 20

Fig. 2 Abrasiveness Index Density Estimation
not differentiable at this point. This is equivalent to non-differentiable points in the nonparametric deconvolution estimator when it is truncated to be positive.

Systolic blood pressure data
The data here are a subset of n = 1615 male participants in the Framingham Heart Study, see for example Carroll et al. (2006) for more detail. The data consist of systolic blood pressure measurements from two patient exams (the second and third exams in the study). At each exam, two replicate measurements were obtained giving data (SBP 21 , SBP 22 , SBP 31 , SBP 32 ). Let P 1 = (SBP 21 + SBP 22 )/2 and P 2 = (SBP 31 + SBP 32 )/2 denote the average systolic blood pressure observed at each of the exams, and calculate transformed variables W j = log(P j − 50), j = 1, 2, as suggested by Carroll et al. (2006). This is done to adjust large skewness present in the data. The measurement W = (W 1 + W 2 )/2 = X + U is a surrogate for the true long-term average systolic blood pressure X (on the transformed logarithmic scale). Using the replicates (W 1 , W 2 ), estimate standard deviationsσ x = 0.1976 andσ u = 0.0802 are obtained. The GSS deconvolution estimator assuming a Laplace distribution for the measurement error U and using a normal reference density f 0 (z) = φ(z) was computed. GMM with M = 5 resulted in only one solution, (ξ ,ω) = (4.429, 0.210), and therefore no selection was needed. Figure 3 displays both the GSS deconvolution estimator and the nonparametric deconvolution estimator, both with PI bandwidths.
The nonparametric deconvolution estimator has previously been applied to the Framingham Heart Study. It is therefore reassuring that the GSS estimator is not dissimilar in appearance.

Conclusion
In this paper, the density deconvolution problem is considered for variables belonging to the family of generalized skew-symmetric (GSS) distributions. Implementation requires both the estimation of location and scale parameters (ξ , ω), and the estimation of a skewing function π(z). Estimation methods are proposed for both of these quantities, and extensive simulation studies are performed. In simulation studies performed, the GSS deconvolution estimator is generally seen to result in large improvements over the nonparametric deconvolution estimator (using median ISE as criterion).
There are still several questions related to GSS deconvolution that can be considered. Firstly, the estimator requires the specification of a known symmetric component f 0 (z). While this is done to ensure model identifiability, it would be possible to consider several candidate symmetric densities and choose the "best" among these. The related goodness-of-fit testing problem for a specified symmetric component can also be explored. Secondly, it should be noted that the contaminated W also has a GSS distribution. An alternative modeling approach could therefore estimate the pdf of W directly and then recover the pdf of X. Lastly, it was observed in the simulation study that the nonparametric deconvolution kernel in a few isolated instances had superior performance to the GSS estimator under selection, while GSS had better under oracle conditions for the same simulation configurations. This suggests that further refinement of the bandwidth calculation and solution selection procedure may be possible, and related work is ongoing. and note that π t (t) satisfies 0 ≤ π t (t) = 1 − π t (−t) ≤ 1. By construction, it follows that f y (y) can be expressed as f y (y) = 2f t (y − ξ)π t (y − ξ). Assuming that Y has finite variance, the variance of T is given by ω 2 ξ = R t 2 f t (t)dt. Then, letting f ξ (t) = f t (t/ω ξ )/ω ξ and π ξ (t) = π t (t/ω ξ ), it is possible to write This representation does not depend on a specific value for ξ and, as such, holds for every ξ . However, each value of ξ is associated with a different symmetric component f ξ (z) and skewing function π ξ (z). As such, there is a family of distributions f ξ (z) symmetric about 0 and with unit variance such that the random variable Y can be expressed as a GSS distribution with symmetric component belonging to this family. The work in this paper is motivated by the assumption that it is possible to correctly specify one symmetric distribution in the family f ξ (z).

A.2 MISE derivation
To derive an expression for the mean integrated square error (MISE), considering the estimatorŝ 0 (t) defined in (5). Recall that E ŝ 0 (t) = ψ k (ht)s 0 (t). Additionally, it has covariance structure The integrated squared error (ISE) of the GSS estimator can now be expressed in terms ofŝ 0 (t), where the first equality is an application of Parseval's identity, and the second follows upon noting that the estimated characteristic functionψ z (t) and true characteristic function ψ z (t) have common real component c 0 (t) which therefore cancels out, leaving only the estimated and true imaginary components. Also note that ISE is a function of the bandwidth h throughŝ 0 (t). Now, MISE = E[ ISE] can be evaluated using the expectation and covariance functions associated withŝ 0 (t), in the latter setting t 1 = t 2 = t. Eq. 10 follows.

A.3 GMM estimators simulation
The performance of GMM estimation of (ξ , ω) was evaluated in a simulation study. Data were simulated as described in the main paper. For each simulated dataset, the estimators minimizing D(ξ , ω) were obtained for both M = 2 and M = 5 even moments. While the sixth, eight and tenth sample moments used for the M = 5 setting arguably contain additional information, there is a great deal of added variability introduced when estimating these higher order moments. This simulation explored the benefits, if any, of doing so. In simulated samples where multiple solutions (ξ j ,ω j ), j = 1, . . . , J were obtained, the existence of an oracle able to choose the solution  closest to the true value (0, 1) (as measured using Euclidean distance) was assumed. A total of N = 1000 samples were generated for each simulation configuration. Root mean square error (RMSE) was used as criterion, and the results are shown in Table 4. In the setting with X normal, i.e. using π 0 (z), using M = 5 moments results in a small increase in RMSE compared to the case M = 2. The average increase in RMSE for ξ is 1.2% and for ω is 9.5% across the settings considered. On the other hand, the simulation results for skewing functions π 1 (z) and π 2 (z) look very different. Here, the RMSE for ξ decreases for both skewing functions, and the RMSE for ω decreases for skewing function π 2 (z). Also, the average RMSE of ω for π 2 (z) remains unchanged across the simulation settings considered. One possible reason for the increase in RMSE in the symmetric case is that the underlying distribution is normal and therefore higher-order moments do not contain any "extra" information about the distribution. On the other hand, for π 1 (z) and π 2 (z) there is a substantive departure from normality and the higher-order sample moments, despite their large variability, do contain useful information about the underlying distribution. As the increase in RMSE in the symmetric case is relatively small compared to the decrease in the asymmetric cases, the paper uses the GMM estimators with M = 5 in all other simulations.

A.4 Solution selection simulation
The simulation results comparing the performance of the skewness matching and phase function distance solution selection mechanisms follow here. Data were generated as described in the "Simulation studies" section of the main paper. For each simulated sample, all GMM solutions (ξ j ,ω j ), j = 1, . . . , J were obtained. Solution selection was then implemented for both skewness matching and phase function matching. These techniques require a bandwidht to be selected. The simulation implemented CV, MISE, and PI bandwidth selection. However, the conclusions with regards to selection methods were very similar for these and therefore only MISE bandwidth results are included here.
To contextualize these results from selection, results corresponding to an oracle able to choose the solution with smallest ISE are also reported, as well as a blind selection approach randomly selecting one of the GMM solutions. The simulation results are summarized in Table 5. In this table, the median ISE of skewness matching and phase function distance are given in the columns SKW and PHS. The column MIN contains the median ISE for the oracle selecting the solution with smallest ISE, while RND contains the median ISE of randomly selecting one of the GMM solutions. Finally, the median ISE of the nonparametric deconvolution estimator with PI bandwidth is given in column NP for reference purposes.
Inspection of Table 5 shows that estimation under both the skewness and phase function matching generally performs better than the fully nonparametric estimator, with the exception being the combination of skewing function π 2 (z) and Laplace measurement error. However, as the GSS estimator outperformed the nonparametric estimator under an oracle bandwidth as seen in Table 1 of the main paper, this does suggest that further improvement of the GSS estimator may still be possible by refining parameter estimation and bandwidth selection -this is ongoing work. Further inspection of Table 5 shows that estimation under both the skewness and phase function matching performs better than random selection, with the exception that random selection outperforms the skewness matching for π 1 (z) and normal measurement error. While there are a few instances where skewness matching outperformed phase function matching, the latter generally has very good performance and comes close to the best possible performance of the minimum ISE under oracle selection.

A.5 supplemental simulation results
This subsection contains two sets of supplemental simulation results. The first of these, found in Table 6, pertains to a comparison of oracle estimators for sample sizes n = {50, 100}. The second of these, found in Tables 7 and 8, pertains to comparing bandwidth estimation methods for sample sizes n = {50, 100}. The conclusions that can be drawn from these results are consistent with those discussed in the "Simulation studies" section of the main paper, and are included here for completeness.