A nonparametric approach for quantile regression

Quantile regression estimates conditional quantiles and has wide applications in the real world. Estimating high conditional quantiles is an important problem. The regular quantile regression (QR) method typically specifies a linear or non-linear model and then estimates its coefficients to obtain the estimated conditional quantiles. This approach may be restricted by the linear model setting. To overcome this problem, this paper proposes a direct nonparametric quantile regression method with a five-step algorithm. Monte Carlo simulations show good efficiency for the proposed direct QR estimator relative to the regular QR estimator. The paper also applies the proposed method to two real-world examples. The simulations and the examples illustrate that the proposed direct nonparametric quantile regression model fits the data better than the regular quantile regression method.

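The linear quantile regression problem can be formulated as a linear program

$$\min_{(\beta(\tau),\,u,\,v)\,\in\,\mathbb{R}^p\times\mathbb{R}_+^{2n}} \left\{\tau\,1_n^T u + (1-\tau)\,1_n^T v \;\middle|\; X\beta(\tau) + u - v = y\right\},$$

where $1_n^T$ is an n-vector of 1s, $X$ denotes the n × p design matrix, $y$ is the n × 1 vector of responses, and $u$, $v$ are n × 1 vectors with elements $u_i$, $v_i$, $i = 1, \ldots, n$, respectively (Koenker 2005).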
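As a short worked step (a sketch, assuming the standard check loss $\rho_\tau(r) = r\,(\tau - I(r < 0))$, which is not written out in this section), the following shows why the slack vectors $u$ and $v$ turn the quantile loss into a linear objective. The residual symbol $r_i$ is introduced here only for illustration.

$$
\begin{aligned}
r_i &= y_i - x_i^T\beta(\tau), \qquad u_i = \max(r_i, 0), \qquad v_i = \max(-r_i, 0),\\
r_i &= u_i - v_i, \qquad u_i \ge 0, \quad v_i \ge 0,\\
\rho_\tau(r_i) &= \tau\,u_i + (1-\tau)\,v_i,\\
\sum_{i=1}^{n}\rho_\tau(r_i) &= \tau\,1_n^T u + (1-\tau)\,1_n^T v .
\end{aligned}
$$

For a positive residual only $u_i$ is non-zero and is weighted by $\tau$; for a negative residual only $v_i$ is non-zero and is weighted by $1-\tau$. This asymmetric weighting is what makes the minimizer a quantile rather than a mean.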
In recent years, several studies have sought to improve the efficiency of estimator (3) (Yu et al. 2003; Wang and Li 2013; Huang et al. 2015; Huang and Nguyen 2017). The regular linear quantile regression (2) needs the estimator $\hat\beta(\tau)$ in (3) to produce the estimated conditional quantile curves, but these estimated curves may be restricted by the model setting.
Many studies have used nonparametric methods of quantile regression in recent years, for example, Chaudhuri (2003), Yu and Jones (1991), Hall et al. (1999) and Yu et al. (2003). Chapter 7 of Koenker (2005) presents a local polynomial quantile regression (LPQR) and other methods. Detailed discussions of theory, methodologies and applications can be found in Li and Racine (2007) and Cai (2013).
In order to overcome the limitation of the model setting in (2), in this paper we propose a direct nonparametric quantile regression method that uses ideas from nonparametric kernel density estimation and nonparametric kernel regression. The proposed method not only differs from most existing nonparametric quantile regression methods, it also overcomes the crossing problem in estimating quantile curves. To see whether the new method improves on the regular linear quantile regression and other nonparametric quantile regression methods, we carry out two studies in this paper:
1. Monte Carlo simulations are performed to assess the efficiency of the new direct QR estimator relative to the regular QR estimator and a nonparametric LPQR.
2. The proposed method is applied to two real-world examples of extreme events and compared with the linear model in Huang and Nguyen (2017).
In Section 2, we propose a direct nonparametric quantile regression estimator. A relative measure for comparing the goodness-of-fit of quantile models is given in Section 3. In Section 4, the results of Monte Carlo simulations generated from Gumbel's second kind of bivariate exponential distribution (Gumbel 1960) show that the proposed direct method is highly efficient relative to the existing linear QR and LPQR methods. In Section 5, the regular linear quantile regression and the proposed direct quantile regression are applied to two real-life examples: the Buffalo snowfall and CO₂ emission examples in Huang and Nguyen (2017). The study of these examples illustrates that the proposed direct nonparametric quantile regression model fits the data better than the existing linear quantile regression method.

Proposed direct nonparametric quantile regression
In this paper, for generality, we set aside the linear model (2) and obtain a direct estimator of the true conditional quantile in (1). We construct the following five-step algorithm for direct nonparametric quantile regression.

Step 1: Estimate the conditional density of y given $x = (x_1, x_2, \ldots, x_d)$ using a kernel density estimation method (Silverman 1986; Scott 2015):

$$\hat f(y \mid x) = \frac{\hat f(y, x)}{\hat g(x)},$$

where $\hat f(y, x)$ is an estimator of the joint density of y and x, and $\hat g(x)$ is an estimator of the marginal density of x.
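A d-dimensional kernel density estimator from a random sample $X_i = (X_{1i}, X_{2i}, \ldots, X_{di})$, $i = 1, 2, \ldots, n$, from a population $x = (x_1, x_2, \ldots, x_d)$ with joint density $g(x)$ is given by

$$\hat g(x) = \frac{1}{n h^d} \sum_{i=1}^{n} K\left\{\frac{1}{h}(x - X_i)\right\},$$

where $h > 0$ is the bandwidth and the kernel function $K(x)$ is a function defined for d-dimensional x that satisfies $\int_{\mathbb{R}^d} K(x)\,dx = 1$. Fukunaga (1972) suggested using

$$\hat g(x) = \frac{(\det S)^{-1/2}}{n h^d} \sum_{i=1}^{n} k\left\{h^{-2}(x - X_i)^T S^{-1}(x - X_i)\right\},$$

where $S$ is the sample covariance matrix of the data, $K$ is the normal kernel, and the function $k$ is

$$k(u) = (2\pi)^{-d/2} \exp\left(-\tfrac{1}{2}u\right).$$

A plug-in selector of the bandwidth $h > 0$ is given by Silverman (1986, p. 85). If a multivariate normal kernel is used to smooth normally distributed data with unit variance, the selector takes the form $h_{opt} = A(K)\,n^{-1/(d+4)}$, where the constant $A(K)$ is tabulated by Silverman (1986).

Step 2: Estimate the conditional c.d.f. of y given x:

$$\hat F(y \mid x) = \int_{-\infty}^{y} \hat f(t \mid x)\,dt.$$

Step 3: Estimate the local conditional quantile function $\xi(\tau \mid x)$ of y given x by inverting the estimated conditional c.d.f. $\hat F(y \mid x)$.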
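It is difficult to compute a global inverse function $\hat\xi(\tau \mid x)$ of the kernel-estimated conditional c.d.f. $\hat F(y \mid x)$, which has many terms. To avoid this global computational difficulty, we estimate the local conditional quantile at each sample point $x_i$:

$$\hat\xi(\tau \mid x_i) = \hat F^{-1}(\tau \mid x_i), \quad i = 1, 2, \ldots, n. \tag{6}$$

Thus, we have n points $(x_i, \hat\xi(\tau \mid x_i))$, $i = 1, 2, \ldots, n$.

Step 4: We propose a direct nonparametric quantile regression estimator for the τth conditional quantile curve of x by using the Nadaraya-Watson (NW) nonparametric regression estimator (Scott 2015):

$$Q_D(\tau \mid x) = \frac{\sum_{i=1}^{n} \hat\xi(\tau \mid x_i) \prod_{j=1}^{d} K\left(\frac{x_j - x_{ji}}{h_j}\right)}{\sum_{i=1}^{n} \prod_{j=1}^{d} K\left(\frac{x_j - x_{ji}}{h_j}\right)}, \tag{7}$$

where $K$ is the kernel function, $x_{ji}$ is the jth component of the ith observation $x_i$, and $h_j > 0$ is the bandwidth for the jth dimension. The novelty of (7) is that it uses the numerical results of (6) in Step 3, namely the n points $(x_i, \hat\xi(\tau \mid x_i))$, $i = 1, 2, \ldots, n$, to estimate a conditional mean curve of the τth quantile function, thereby smoothing these n points. In this paper, for the kernel regression, we take $K$ to be the standard normal kernel. Similarly to formula (5), we use the optimal bandwidth for the jth dimension given by Silverman (1986, p. 40), where n is the sample size of the random sample in (4).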
Step 5: Check all procedures, and make any necessary adjustments.
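The five steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this paper: it handles a single covariate, uses a Gaussian product kernel with a normal-reference bandwidth per coordinate (an assumed stand-in for the bandwidth choices cited above), and inverts the conditional c.d.f. by bisection. All function names and the toy data are hypothetical. With a Gaussian kernel, the integral in Step 2 reduces to a weighted average of normal c.d.f. values, which is what `cond_cdf` computes.

```python
import math
import random


def norm_pdf(u):
    """Standard normal density, used as the kernel K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def norm_cdf(u):
    """Standard normal c.d.f., i.e. the integral of the kernel K."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


def bandwidth(values, dim):
    """Normal-reference bandwidth; an assumed stand-in for the paper's choice."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return sd * (4.0 / ((dim + 2.0) * n)) ** (1.0 / (dim + 4.0))


def kernel_weights(x, xs, h):
    """Kernel weights K((x - x_i) / h) for a scalar covariate."""
    return [norm_pdf((x - xi) / h) for xi in xs]


def cond_cdf(y, weights, ys, hy):
    """Steps 1-2: estimated F(y | x); the weights carry the dependence on x."""
    num = sum(w * norm_cdf((y - yi) / hy) for w, yi in zip(weights, ys))
    return num / sum(weights)


def local_quantile(tau, x, xs, ys, hx, hy, tol=1e-6):
    """Step 3: invert the estimated conditional c.d.f. at x by bisection."""
    weights = kernel_weights(x, xs, hx)
    lo, hi = min(ys) - 10.0 * hy, max(ys) + 10.0 * hy
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cond_cdf(mid, weights, ys, hy) < tau:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def direct_quantile(tau, xs, ys):
    """Steps 3-4: local quantiles at each x_i, then Nadaraya-Watson smoothing."""
    hx, hy = bandwidth(xs, dim=2), bandwidth(ys, dim=2)
    xi_hat = [local_quantile(tau, x, xs, ys, hx, hy) for x in xs]
    h_reg = bandwidth(xs, dim=1)

    def q_direct(x):
        w = kernel_weights(x, xs, h_reg)
        return sum(wi * q for wi, q in zip(w, xi_hat)) / sum(w)

    return q_direct


# Toy data (hypothetical): y | x is exponential with rate 1 + x.
random.seed(0)
xs = [random.uniform(0.0, 2.0) for _ in range(120)]
ys = [random.expovariate(1.0 + x) for x in xs]

q50 = direct_quantile(0.50, xs, ys)
q90 = direct_quantile(0.90, xs, ys)

# Step 5 (one simple check): the fitted quantile curves should not cross.
grid = [0.25 * k for k in range(9)]
print(all(q50(x) < q90(x) for x in grid))
```

In this sketch the non-crossing property follows from the construction: the estimated conditional c.d.f. is increasing in y, so the local quantiles are ordered in τ at every sample point, and the Nadaraya-Watson weights are positive and do not depend on τ.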

Comparison of goodness-of-fit on quantile regression models
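In order to compare the regular QR estimator in (3) and the direct nonparametric QR estimator in (7), we extend the goodness-of-fit measure of Koenker and Machado (1999). We suggest using a Relative R(τ), 0 < τ < 1, which is defined as

$$R(\tau) = 1 - \frac{V_D(\tau)}{V_R(\tau)}, \tag{9}$$

where

$$V_D(\tau) = \sum_{i=1}^{n} \rho_\tau\left(y_i - Q_D(\tau \mid x_i)\right)$$

and $Q_D(\tau \mid x_i)$ is obtained by (7), and

$$V_R(\tau) = \sum_{i=1}^{n} \rho_\tau\left(y_i - x_i^T \hat\beta(\tau)\right),$$

where $\hat\beta(\tau)$ is given by (3). Here $\rho_\tau(u) = u\,(\tau - I(u < 0))$ is the check loss function.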
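The measure can be computed directly from the two sets of fitted quantiles. The sketch below assumes that Relative R(τ) has the form 1 − V_D(τ)/V_R(τ), with each V a sum of check losses; this is consistent with the later remark that R(τ) > 0 means V_D(τ) < V_R(τ). The function names and the small numerical example are hypothetical.

```python
def check_loss(u, tau):
    """Check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def relative_r(y, q_direct, q_regular, tau):
    """Relative R(tau) = 1 - V_D(tau) / V_R(tau) for two sets of fitted quantiles."""
    v_d = sum(check_loss(yi - qi, tau) for yi, qi in zip(y, q_direct))
    v_r = sum(check_loss(yi - qi, tau) for yi, qi in zip(y, q_regular))
    return 1.0 - v_d / v_r


# Hypothetical fitted values at tau = 0.5 for four observations.
y_obs = [1.0, 2.0, 3.0, 4.0]
fit_direct = [1.0, 2.0, 3.0, 3.0]    # V_D = 0.5
fit_regular = [2.0, 2.0, 2.0, 2.0]   # V_R = 2.0
print(relative_r(y_obs, fit_direct, fit_regular, 0.5))
```

A positive value indicates that the first set of fitted quantiles has the smaller total check loss; a negative value indicates the opposite.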

Simulations
To investigate the proposed direct nonparametric quantile regression estimator in (7), Monte Carlo simulations are performed in this section. We generate m random samples, each of size n, from Gumbel's second kind of bivariate exponential distribution (Gumbel 1960), which has a non-linear conditional quantile function of y given x in (11). It has c.d.f. F(x, y) and density function f(x, y) in (10), from which the conditional density of y given x, the conditional c.d.f. of y given x, and the true τth conditional quantile function of y given x follow, for x ≥ 0, α > 0, 0 < τ < 1.
Next, we compare Q_D(τ|x) and Q_R(τ|x) in Figs. 3 and 4. Figure 3 shows the boxplots of Q_R(τ|x) and Q_D(τ|x) for τ = 0.95, 0.97, and 0.99 (the true conditional quantiles are shown as blue lines). Q_D(τ|x) has much smaller variance than Q_R(τ|x). Figure 4 shows the average of the 100 estimated τ = 0.95 quantile curves of Q_R(τ|x) (blue dashed line) and of Q_D(τ|x) (red solid line). The average Q_D(τ|x) curve is much closer than the average Q_R(τ|x) curve to the true quantile curve (green dashed line). Overall, Table 1 and Figs. 2, 3, and 4 show that, for τ = 0.95, . . . , 0.99, the proposed direct estimator Q_D(τ|x) in (7) is more efficient than the regular regression Q_R(τ|x) in (2) and the nonparametric LPQR in (13).

Real examples of applications
In this section, we apply the following two regression models to the Buffalo snowfall and CO₂ emission examples in Huang and Nguyen (2017):
1. The regular quantile regression Q_R(τ|x) in model (2), using the estimator $\hat\beta(\tau)$ in (3);
2. The direct nonparametric quantile regression Q_D(τ|x) in (7).

Buffalo snowfall example
Huang and Nguyen (2017) used the following linear second-order polynomial quantile regression model for this example (National Weather Service Forecast Office 2017):

$$Q_R(\tau \mid x) = \beta_0(\tau) + \beta_1(\tau)\,x + \beta_2(\tau)\,x^2,$$

where y represents the total snowfall (cm) and x represents the maximum temperature (°C).
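To make explicit why a second-order polynomial still counts as a linear quantile regression, the following sketch (assuming the model has the form $\beta_0(\tau) + \beta_1(\tau)x + \beta_2(\tau)x^2$) writes it with a design matrix, so that it fits the linear-program formulation of Section 1 with p = 3. The observation symbols $x_i$, $y_i$ denote the recorded temperatures and snowfalls.

$$
X = \begin{pmatrix} 1 & x_1 & x_1^2 \\ 1 & x_2 & x_2^2 \\ \vdots & \vdots & \vdots \\ 1 & x_n & x_n^2 \end{pmatrix},
\qquad
\beta(\tau) = \begin{pmatrix} \beta_0(\tau) \\ \beta_1(\tau) \\ \beta_2(\tau) \end{pmatrix},
\qquad
Q_R(\tau \mid x_i) = \left(X\beta(\tau)\right)_i .
$$

The model is non-linear in the temperature x but linear in the coefficients, so the fitted curve remains restricted to a parabola at every τ.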
In this paper we use the proposed five-step algorithm in Section 2 to obtain the new direct nonparametric quantile estimator Q_D(τ|x) in (7). We compare the new estimator Q_D(τ|x) with the regular quantile estimator Q_R(τ|x) in Huang and Nguyen (2017). Figure 5 shows the Q_R and Q_D quantile curves at τ = 0.95, 0.97 and 0.99. The Q_D curves appear to follow the data patterns more closely than the Q_R curves. Table 2 lists the estimated Buffalo snowfall quantile values at given maximum temperatures for τ = 0.97 and 0.99. It demonstrates that, at high τ, Q_D gives a greater variety of snowfall predictions than Q_R. The relationship between snowfall and maximum temperature is not necessarily linear. Figure 6 and Table 3 show the values of the Relative R(τ) in (9) for τ = 0.95, . . . , 0.99. We note that R(τ) > 0, which means that V_D(τ) < V_R(τ) and that Q_D is a better fit to the data than Q_R.

Fig. 6 Relative R(τ) of Q_D relative to Q_R for the Buffalo snowfall example

Figure 5c shows that, for moderate temperatures such as 5 °C to 10 °C, the proposed direct nonparametric quantile regression Q_D predicts smaller but more varied snowfalls in Buffalo than the regular Q_R predicts. For temperatures over 10 °C, Q_D predicts a much higher snowfall amount than Q_R. On the other hand, for very low temperatures, such as −15 °C to 0 °C, Q_D and Q_R both predict that extreme heavy snowfalls, which may cause damage, are more likely. Thus the prediction of heavy snowfalls is related to cold weather forecasts. However, the relationship between snowfall and temperature predicted by Q_D is not the simple one that Q_R predicts. We also note that a lot of snow occurred between −5 °C and 0 °C; the predictions from Q_D reflect this fact and give varied predictions.

CO₂ emission example
Huang and Nguyen (2017) used the linear quantile regression model for this example:

$$Q_R(\tau \mid x_1, x_2) = \beta_0(\tau) + \beta_1(\tau)\,x_1 + \beta_2(\tau)\,x_2,$$

where y represents CO₂ emission (tonnes) per capita, x_1 represents ln of gross domestic product (GDP) (US $) per capita, and x_2 represents ln of electricity consumption (E.C.) (kilowatts) per capita (Carbon Dioxide Information Analysis Centre 2017).
As in the Buffalo snowfall example in Subsection 5.1, we use the proposed five-step algorithm in Section 2 to obtain the new direct nonparametric quantile estimator Q_D(τ|x) in (7). We compare the new estimator Q_D(τ|x) with the regular quantile estimator Q_R(τ|x) in Huang and Nguyen (2017). Figures 7 and 8 and Tables 4 and 5 show the differences between the values of the two estimators. Figure 7a shows the 3D scatter plot of CO₂ emission vs ln(GDP) and ln(E.C.) with the fitted regular Q_R surface at τ = 0.97. Figure 7b shows the same scatter plot with the fitted direct Q_D surface at τ = 0.97. Figure 7c shows the 3D scatter plot with both the regular Q_R (green) and direct Q_D (red) quantile surfaces of CO₂ emission vs ln(GDP) and ln(E.C.) at τ = 0.97. It is interesting to see the difference between the Q_R and Q_D quantile surfaces.
The Q_R and Q_D quantile curves can be seen more clearly in 2D plots. Figure 8a shows the 2D scatter plot of CO₂ emission vs ln(GDP) when the country's E.C. is 2980.96 kilowatts, with the fitted regular Q_R and direct Q_D curves at τ = 0.97. Figure 8b shows the 2D scatter plot of CO₂ emission vs ln(E.C.) when the country's GDP is $13,359.73, with the fitted regular Q_R and direct Q_D curves at τ = 0.97. We note that both the Q_R and Q_D quantile regression curves appear to fit the data. In general, the Q_D curves follow the data patterns more closely than the Q_R quantile lines, and Q_D produces different estimated CO₂ emissions from Q_R at high quantiles. In Fig. 7, it is interesting to see that the Q_D conditional quantile surfaces are not linear, unlike the planes of Q_R.
Tables 4 and 5 provide details of the estimated high quantiles of countries' CO₂ emissions at τ = 0.97 when the countries consume 2980.96 kilowatts of electricity and have a GDP of $13,359.73, respectively.

Fig. 9 Relative R(τ) of Q_D relative to Q_R for the CO₂ emission example

Figure 9 and Table 6 show the Relative R(τ) in (9) for τ = 0.95, . . . , 0.99. All values of Relative R(τ) are larger than 0, which signifies that V_D(τ) < V_R(τ) and suggests that the direct quantile regression estimator Q_D is a better fit to the CO₂ emission data than the regular quantile regression estimator Q_R.
Overall, it is interesting to see that the proposed direct estimator Q_D gives a greater variety of predictions than Q_R for CO₂ emissions relative to gross domestic product and electricity consumption. The relationships are not necessarily linear, and the proposed method is model free. We expect that the predictions from Q_D may be more reasonable. The predictions may help to prevent further damage to the environment from CO₂ emissions.

Conclusions
From the above studies, we can conclude:
1. This paper proposes a new direct nonparametric quantile regression method which is model free. It uses nonparametric density estimation and nonparametric regression techniques to estimate high conditional quantiles. The paper provides a computational five-step algorithm which overcomes the limitations of estimation in the linear quantile regression model and in some other nonparametric quantile regression methods.
2. The Monte Carlo simulation is based on Gumbel's second kind of bivariate exponential distribution, which has a nonlinear conditional quantile function. This differs from the simulation in Huang and Nguyen (2017), which used the bivariate Pareto distribution with a linear conditional quantile function. The simulation results confirm that the proposed new method is more efficient than the regular quantile regression estimator and a local polynomial nonparametric estimator.
3. The proposed new direct nonparametric quantile regression can be used to predict extreme values in the snowfall and CO₂ emission examples in Huang and Nguyen (2017). The proposed direct quantile regression estimator Q_D gives a variety of predictions that fit the data very well. The predicted relationships are not simply linear. We expect that the predictions from Q_D may be more reasonable than the regular quantile regression predictions. The new estimator may help to prevent further damage from extreme events to people and the environment.
4. The proposed direct nonparametric quantile regression provides an alternative approach to quantile regression. Further studies on the details of this method are suggested.