A nonparametric approach for quantile regression

Huang, Mei Ling; Nguyen, Christine

doi:10.1186/s40488-018-0084-9

Research
Open access
Published: 18 July 2018

Journal of Statistical Distributions and Applications volume 5, Article number: 3 (2018)

Abstract

Quantile regression estimates conditional quantiles and has wide applications in the real world. Estimating high conditional quantiles is an important problem. The regular quantile regression (QR) method often specifies a linear or non-linear model, then estimates the coefficients to obtain the estimated conditional quantiles. This approach may be restricted by the linear model setting. To overcome this problem, this paper proposes a direct nonparametric quantile regression method with a five-step algorithm. Monte Carlo simulations show good efficiency for the proposed direct QR estimator relative to the regular QR estimator. The paper also investigates two real-world examples of applications by using the proposed method. Studies of the simulations and the examples illustrate that the proposed direct nonparametric quantile regression model fits the data set better than the regular quantile regression method.

Introduction

It is important to study quantile regression to estimate high conditional quantiles in real-world events (Koenker 2005). Some extreme events can cause damage to society: stock market crashes, pipeline failures, large floods, wildfires, pollution, earthquakes and hurricanes. We wish to estimate high conditional quantiles of a random variable y with cumulative distribution function (c.d.f.) F(y) given a variable vector x=(x₁,x₂,…,x_d), with x_p=(1,x₁,x₂,…,x_d)^T∈R^p where p=d+1. The τth conditional quantile is defined by

$$ Q_{y}(\tau |\mathbf{x})=Q_{y}(\tau |x_{1},x_{2},\ldots,x_{d})=F^{-1}(\tau | \mathbf{x}),\text{\ }0<\tau <1. $$

(1)

The traditional quantile regression is concerned with the estimation of the τth conditional quantile of y for given x, and often sets a linear model as

$$ Q_{y}(\tau |\mathbf{x})=\mathbf{x}_{p}^{T}\mathbf{\beta }(\tau)=\beta_{0}(\tau)+\beta_{1}(\tau)x_{1}+\cdots +\beta_{d}(\tau)x_{d}, 0<\tau <1, $$

(2)

where β(τ)=(β₀(τ),β₁(τ),β₂(τ),…,β_d(τ))^T.

For linear model (2), we estimate the coefficient β(τ)=(β₀(τ),β₁(τ),β₂(τ),…,β_d(τ))^T∈R^p from a random sample {(y_i,x_i),i=1,…,n}, where x_pi=(1,x_i1,x_i2,…,x_id)^T is the p-dimensional design vector and y_i is the univariate response variable from a continuous distribution with a c.d.f. F(y). Koenker and Bassett (1978) proposed an L₁-weighted loss function to obtain estimator $\widehat {\mathbf {\beta }} (\tau)$ by solving

$$ \widehat{\mathbf{\beta }}(\tau)=\text{arg}\mathop{\text{min}}\limits_{\mathbf{\beta }(\tau)\in R^{p}}\sum\limits_{i=1}^{n}\rho_{\tau }(y_{i}-\mathbf{x}_{pi}^{T}\mathbf{\beta } (\tau)),\ 0<\tau <1, $$

(3)

where ρ_τ is a loss function, namely

$$\rho_{\tau }(u)=u(\tau -I(u<0))=\left\{ \begin{array}{l} u(\tau -1),u<0; \\ u\tau,\ u\geq 0. \end{array} \right. $$
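
As a quick numerical illustration of the loss ρ_τ, the following minimal Python sketch evaluates it and checks that minimizing the summed loss over a constant recovers a sample quantile. The function names `check_loss` and `best_constant` and the toy data are illustrative assumptions, not part of the paper.

```python
def check_loss(u, tau):
    """Check loss rho_tau(u) = u * (tau - I(u < 0))."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def best_constant(ys, tau):
    """Return the data value q that minimizes sum_i rho_tau(y_i - q).

    The objective is piecewise linear and convex in q, so a minimizer
    can always be found among the observed values.
    """
    return min(ys, key=lambda q: sum(check_loss(y - q, tau) for y in ys))


# Toy data (hypothetical): the integers 1..10.
ys = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
q75 = best_constant(ys, 0.75)
```

With an intercept-only model, the minimizer in (3) is a sample τth quantile; adding covariates lets that quantile vary with x.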

The linear quantile regression problem can be formulated as a linear program

$$\mathop{\text{min}}\limits_{(\mathbf{\beta }(\tau),\mathbf{u},\mathbf{v})\in R^{p}\times R_{+}^{2n}}\{\tau \mathbf{1}_{n}^{T}\mathbf{u}+(1-\tau)\mathbf{1}_{n}^{T} \mathbf{v}|\mathbf{X\beta }(\tau)+\mathbf{u}-\mathbf{v}=\mathbf{y}\}, $$

where $\mathbf {1}_{n}$ is an n-vector of 1s, X denotes the n×p design matrix, and u,v are n × 1 vectors with elements u_i,v_i, i=1,…,n, respectively (Koenker 2005).

In recent years, several studies have sought to improve the efficiency of estimator (3) (Yu et al. 2003; Wang and Li 2013; Huang et al. 2015; Huang and Nguyen 2017). The regular linear quantile regression (2) needs the estimator $\widehat {\mathbf {\beta }} (\tau)$ in (3) to obtain the estimated conditional quantile curves, but these estimated curves may be restricted by the model setting.

Many studies have used nonparametric methods of quantile regression in recent years, for example, Chaudhuri (1991), Yu and Jones (1998), Hall et al. (1999) and Yu et al. (2003). Chapter 7 in Koenker (2005) proposed a local polynomial quantile regression (LPQR), among other methods. Detailed discussions of theory, methodologies and applications can also be found in Li and Racine (2007) and Cai (2013).

To overcome the limitation of the model setting in (2), in this paper we propose a direct nonparametric quantile regression method which uses the ideas of nonparametric kernel density estimation and nonparametric kernel regression. The proposed method not only differs from most existing nonparametric quantile regression methods, it also overcomes the crossing problem of estimated quantile curves. To see whether the new method improves on the regular linear quantile regression and other nonparametric quantile regression methods, we carry out two studies in this paper:

1. Monte Carlo simulations will be performed to confirm the better efficiency of the new direct QR estimator relative to the regular QR estimator and a nonparametric LPQR.

2. The new proposed method will be applied to two real-world examples of extreme events and compared with the linear model in Huang and Nguyen (2017).

In Section 2, we propose a direct nonparametric quantile regression estimator. A relative measure for comparing goodness-of-fit of quantile models is given in Section 3. In Section 4, the results of Monte Carlo simulations generated from Gumbel’s second kind of bivariate exponential distribution (Gumbel 1960) show that the proposed direct method produces high efficiencies relative to the existing linear QR and LPQR methods. In Section 5, the regular linear quantile regression and the proposed direct quantile regression are applied to two real-life examples: the Buffalo snowfall and CO₂ emission examples in Huang and Nguyen (2017). The study of these examples illustrates that the proposed direct nonparametric quantile regression model fits the data better than the existing linear quantile regression method.

Proposed direct nonparametric quantile regression

In this paper, for generality, we set aside the linear model (2). We obtain a direct estimator for the true conditional quantile in (1):

$$\widehat{Q}_{y}(\tau |\mathbf{x})=\widehat{Q}_{y}(\tau |x_{1},x_{2},\ldots,x_{d})=\widehat{F}^{-1}(\tau |\mathbf{x}), $$

by using the local conditional quantile estimator ξ_i(τ|x_i)=Q_y(τ|x_i) based on the ith point of a given random sample, {(y_i,x_i),i= 1,…,n}, for x_i=(x_1i,x_2i,…,x_di)^T.

We construct the following five-step algorithm for a direct nonparametric quantile regression:

Step 1: Estimate the conditional density of y for given x=(x₁,x₂,…,x_d) using a kernel density estimation method (Silverman 1986; Scott 2015):

$$ \widehat{f}(y|\mathbf{x})=\frac{\widehat{f}(y,\mathbf{x})}{\widehat{g}(\mathbf{x})}, $$

(4)

where $\widehat {f}(y,\mathbf {x})$ is an estimator of the joint density of y and x, and $\widehat {g}(\mathbf {x)}$ is an estimator of the marginal density of x.

A d-dimensional kernel density estimator from a random sample X_i=(X_1i,X_2i,…,X_di), i=1,2,…,n, from a population x=(x₁,x₂,…,x_d) with joint density g(x), is given by

$$\widehat{g}(\mathbf{x})=\frac{1}{nh^{d}}\sum\limits_{i=1}^{n}K\left\{ \frac{ \mathbf{x}-\mathbf{X}_{i}}{h}\right\}, $$

where h>0 is the bandwidth and the kernel function K(x) is a function defined for d-dimensional x=(x₁,x₂,…,x_d) which satisfies $\int \limits _{R^{d}}K(\mathbf {x})d \mathbf {x}=1.$

Fukunaga (1972) suggested using

$$\widehat{g}(\mathbf{x})=\frac{(\det \mathbf{S})^{-1/2}}{nh^{d}} \sum\limits_{i=1}^{n}k\left\{ \frac{(\mathbf{x}-\mathbf{X}_{i})^{T}\mathbf{S }^{-1}(\mathbf{x}-\mathbf{X}_{i})}{h^{2}}\right\}, $$

where S is the sample covariance matrix of the data, K is the normal kernel, the function k is

$$k(u)=\left(\frac{1}{2\pi }\right)^{d/2}\exp \left(-\frac{u}{2}\right),\quad k(\mathbf{x}^{T}\mathbf{x)}=K(\mathbf{x})=(2\pi)^{-d/2}\exp \left(- \frac{1}{2}\mathbf{x}^{T}\mathbf{x}\right) \mathbf{.} $$

A plug-in selector of the bandwidth h>0 is given by Silverman (1986, p. 85) as

$$ h_{opt}=\left\{ \int t^{2}K(t)dt\right\}^{-2/(d+2)}\left\{ \int K(t)^{2}dt\right\}^{1/(d+4)}\left\{ \int \left(\nabla^{2}g(\mathbf{x})\right)^{2}d\mathbf{x}\right\}^{-1/(d+4)}n^{-1/(d+4)}, $$

(5)

If a multivariate normal kernel is used for smoothing the normal distribution data with unit variance,

$$h_{opt}=\left\{ \frac{4}{d+2}\right\}^{1/(d+4)}n^{-1/(d+4)}. $$
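
The normal-reference bandwidth above is simple to evaluate. The following minimal Python sketch computes it for a given sample size n and dimension d; the function name `normal_reference_bandwidth` is a hypothetical helper, and the formula assumes, as stated above, a normal kernel and unit-variance normal data.

```python
def normal_reference_bandwidth(n, d):
    """h_opt = {4 / (d + 2)}^(1/(d+4)) * n^(-1/(d+4)) for unit-variance normal data."""
    if n < 1 or d < 1:
        raise ValueError("n and d must be positive integers")
    exponent = 1.0 / (d + 4)
    return (4.0 / (d + 2)) ** exponent * n ** (-exponent)


# Bandwidths for the sample size used later in the simulations (n = 100).
h1 = normal_reference_bandwidth(100, 1)   # one covariate
h2 = normal_reference_bandwidth(100, 2)   # two covariates
```

For d=1 the constant {4/3}^{1/5} is approximately 1.06, so with n=100 the bandwidth is roughly 0.42. For data that do not have unit variance, the bandwidth would need to be rescaled accordingly.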

Step 2: Estimate the conditional c.d.f. of y given x:

$$\widehat{F}(y|\mathbf{x})=\int_{-\infty }^{y}\widehat{f}(y|\mathbf{x})dy. $$

Step 3: Estimate the local conditional quantile function ξ(τ|x) of y given x by inverting an estimated conditional c.d.f. $\widehat {F}(y|\mathbf {x})$.

$$\widehat{\xi }(\tau |\mathbf{x})=\widehat{Q_{y}}(\tau |\mathbf{x})=\inf \{y: \widehat{F}(y|\mathbf{x})\geq \tau \}=\widehat{F}^{-1}(\tau |\mathbf{x}). $$

It is difficult to compute a global inverse function $\widehat {\xi }(\tau | \mathbf {x})$ of the kernel estimated conditional c.d.f. $\widehat {F}(y| \mathbf {x})$, which has many terms. To avoid the computational difficulties of a global inversion, we estimate the local conditional quantile point ξ_i(τ|x_i) of y given x_i by inverting $ \widehat {F}(y|\mathbf {x}_{i})$ at the ith data point (y_i,x_i):

$$ \widehat{\xi_{i}}(\tau |\mathbf{x}_{i})=\widehat{Q_{y}}(\tau |\mathbf{x} _{i})=\inf \{y:\widehat{F}(y|\mathbf{x}_{i})\geq \tau \}=\widehat{F} ^{-1}(\tau |\mathbf{x}_{i}),\quad i=1,2,\ldots,n. $$

(6)

Thus, we have n points $\left (\mathbf {x}_{i},\widehat {\xi _{i}}(\tau | \mathbf {x}_{i})\right),\;i=1,2,\ldots,n.$
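
Steps 1 to 3 can be sketched for a single covariate (d=1) as follows. This is a minimal Python sketch under stated assumptions, not the authors' implementation: it assumes a product Gaussian kernel with separate bandwidths `hx` and `hy` (rather than the covariance-scaled form above), for which the ratio in (4) integrates in closed form to a weighted sum of normal c.d.f.s, and it inverts the estimated c.d.f. by bisection. All function names and the toy data are hypothetical.

```python
import math


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def kernel_cond_cdf(y, x, xs, ys, hx, hy):
    """Kernel estimate of F(y | x): kernel weights in x times smoothed steps in y."""
    weights = [norm_pdf((x - xi) / hx) for xi in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("x is too far from the data for bandwidth hx")
    return sum(w * norm_cdf((y - yi) / hy) for w, yi in zip(weights, ys)) / total


def local_quantile(tau, x, xs, ys, hx, hy, tol=1e-8):
    """Invert the estimated conditional c.d.f. at x by bisection (Step 3)."""
    lo = min(ys) - 10.0 * hy   # estimated c.d.f. is essentially 0 here
    hi = max(ys) + 10.0 * hy   # estimated c.d.f. is essentially 1 here
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kernel_cond_cdf(mid, x, xs, ys, hx, hy) >= tau:
            hi = mid
        else:
            lo = mid
    return hi


def local_quantiles_at_data(tau, xs, ys, hx, hy):
    """Return the n local quantile points of (6), one per observed x_i."""
    return [local_quantile(tau, xi, xs, ys, hx, hy) for xi in xs]


# Toy data (hypothetical): a decaying trend plus a deterministic wiggle.
xs = [0.5 * i for i in range(12)]
ys = [math.exp(-x) + 0.3 * math.sin(7.0 * i) for i, x in enumerate(xs)]
xi_50 = local_quantiles_at_data(0.50, xs, ys, hx=0.8, hy=0.2)
xi_95 = local_quantiles_at_data(0.95, xs, ys, hx=0.8, hy=0.2)
```

Because the estimated conditional c.d.f. is non-decreasing in y at every x_i, the local quantile points for a higher τ never fall below those for a lower τ.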

Step 4: We propose a direct nonparametric quantile regression estimator for the τth conditional quantile curve of y given x by applying the Nadaraya-Watson (NW) nonparametric regression estimator (Scott 2015, p. 242) to $\left (\mathbf {x}_{i},\widehat {\xi _{i}}(\tau | \mathbf {x}_{i})\right),\;i=1,2,\ldots,n:$

$$ Q_{D}(\tau |\mathbf{x})=\widehat{\xi }(\tau |\mathbf{x})=\frac{ \sum\limits_{i=1}^{n}K_{\mathbf{h}}\left\{ \mathbf{x}-\mathbf{X} _{i}\right\} \widehat{\xi_{i}}(\tau |\mathbf{x}_{i})}{\sum \limits_{j=1}^{n}K_{\mathbf{h}}\left\{ \mathbf{x}-\mathbf{X}_{j}\right\} } =\sum\limits_{i=1}^{n}W_{h_{\mathbf{x}}}(\mathbf{x},\mathbf{X}_{i}\mathbf{)} \widehat{\xi_{i}}(\tau |\mathbf{x}_{i}),{\quad}0<\tau <1, $$

(7)

where $W_{h_{x}}(\mathbf {x},\mathbf {X}_{i}\mathbf {)}$ is called an equivalent kernel, and h=(h₁,…,h_d),

$$W_{h_{\mathbf{x}}}(\mathbf{x},\mathbf{X}_{i}\mathbf{)=}\frac{K_{\mathbf{h} }\left\{ \mathbf{x}-\mathbf{X}_{i}\right\} }{\sum\limits_{j=1}^{n}K_{ \mathbf{h}}\left\{ \mathbf{x}-\mathbf{X}_{j}\right\} },\quad i=1,2,\ldots,n, $$

where

$$K_{\mathbf{h}}\left\{ \mathbf{x}-\mathbf{X}_{i}\right\} =\frac{1}{ nh_{1}\ldots{h}_{d}}\prod\limits_{j=1}^{d}K\left(\frac{x_{j}-x_{ij}}{h_{j}}\right),\quad i=1,\ldots,n, $$

where K is the kernel function, and h_j>0 is the bandwidth for the j th dimension.

The new point of (7) is that it uses the numerical results of (6) in Step 3, the n points $\left (\mathbf {x}_{i},\widehat {\xi _{i}}(\tau |\mathbf {x}_{i})\right),\;i=1,2,\ldots,n,$ to estimate a conditional mean curve of the τth quantile function; that is, it smooths these n points.

In this paper, for the kernel regression, we take K to be the standard normal kernel. Similarly to formula (5), we use the optimal bandwidth for the jth dimension (Silverman 1986, p. 40),

$$ {} h_{j,opt}\,=\,\left\{ \int t^{2}K(t)dt\right\}^{-2/5}\left\{ \int K(t)^{2}dt\right\}^{1/5}\left\{ \int \left(\nabla^{2}\widehat{g_{j}} (x_{j})\right)^{2}d\mathbf{x}_{j}\right\}^{-1/5}n^{-1/5},\quad j\,=\,1,\ldots,d, $$

(8)

where $\widehat {g}_{j}(x_{j})$ is the estimated marginal density of the jth component x_j of x=(x₁,x₂,…,x_d), and n is the size of the random sample in (4).
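
Step 4 can be sketched for d=1 as follows. This is a minimal Python sketch assuming a Gaussian kernel and a user-supplied bandwidth `h`; the names `gaussian_kernel` and `nw_smooth` and the toy local quantiles are hypothetical. The factor 1/(nh) in the kernel cancels between numerator and denominator of (7), so it is omitted.

```python
import math


def gaussian_kernel(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def nw_smooth(x, xs, xi_hat, h):
    """Nadaraya-Watson estimate at x: sum_i W_h(x, X_i) * xi_hat_i (d = 1)."""
    weights = [gaussian_kernel((x - xi) / h) for xi in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; increase h")
    return sum(w * q for w, q in zip(weights, xi_hat)) / total


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
xi_hat = [3.0, 2.2, 1.9, 1.4, 1.2]   # hypothetical local quantiles from Step 3
curve = [nw_smooth(0.5 * k, xs, xi_hat, h=0.7) for k in range(9)]
```

Since the weights are non-negative, sum to one and do not depend on τ, the smoothed curve preserves any ordering of the local quantile points: if the points for τ₁ lie below those for τ₂ at every x_i, the smoothed τ₁ curve lies below the smoothed τ₂ curve at every x.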

Step 5: Check all procedures, and make any necessary adjustments.

Comparison of goodness-of-fit on quantile regression models

In order to compare the regular QR estimator in (3) and the direct nonparametric QR estimator in (7), we extend the idea of measuring goodness-of-fit by Koenker and Machado (1999). We suggest using a Relative R(τ), 0<τ<1, which is defined as

$$ Relative\text{ }R(\tau)=1-\frac{V_{D}(\tau)}{V_{R}(\tau)},\quad -1\leq R(\tau)\leq 1,\quad \text{where} $$

(9)

$$V_{D}(\tau)=\sum_{y_{i}\geq Q_{D}(\tau |\mathbf{x}_{i})}\frac{\tau }{n} \left\vert y_{i}-Q_{D}(\tau |\mathbf{x}_{i})\right\vert +\sum_{y_{i}<Q_{D}(\tau |\mathbf{x}_{i})}\frac{(1-\tau)}{n}\left\vert y_{i}-Q_{D}(\tau |\mathbf{x}_{i})\right\vert, $$

where Q_D(τ|x_i) is obtained by (7), and

$$V_{R}(\tau)=\sum_{y_{i}\geq \mathbf{x}_{i}^{T}\widehat{\mathbf{\beta }} (\tau)}\frac{\tau }{n}\left\vert y_{i}-\mathbf{x}_{i}^{T}\widehat{\mathbf{ \beta }}(\tau)\right\vert +\sum_{y_{i}<\mathbf{x}_{i}^{T}\widehat{\mathbf{ \beta }}(\tau)}\frac{(1-\tau)}{n}\left\vert y_{i}-\mathbf{x}_{i}^{T} \widehat{\mathbf{\beta }}(\tau)\right\vert, $$

where $\widehat {\mathbf {\beta }}(\tau)$ is given by (3).
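
The Relative R(τ) in (9) can be computed directly from the observed responses and the two sets of fitted quantiles. The sketch below is a minimal Python illustration; the function names and the toy fitted values are hypothetical, and the fitted quantiles are assumed to have been obtained elsewhere.

```python
def weighted_abs_loss(ys, fitted, tau):
    """V(tau): weight tau/n on |residual| if y_i >= fit_i, else (1 - tau)/n."""
    n = len(ys)
    if n == 0 or n != len(fitted):
        raise ValueError("ys and fitted must be non-empty and equally long")
    total = 0.0
    for y, q in zip(ys, fitted):
        weight = tau if y >= q else 1.0 - tau
        total += weight * abs(y - q)
    return total / n


def relative_r(ys, fitted_direct, fitted_regular, tau):
    """Relative R(tau) = 1 - V_D(tau) / V_R(tau), as in (9)."""
    v_d = weighted_abs_loss(ys, fitted_direct, tau)
    v_r = weighted_abs_loss(ys, fitted_regular, tau)
    if v_r == 0.0:
        raise ZeroDivisionError("V_R(tau) is zero; Relative R is undefined")
    return 1.0 - v_d / v_r


ys = [1.0, 2.0, 4.0, 7.0]
q_r = [2.0, 3.0, 4.0, 5.0]   # hypothetical regular-QR fitted quantiles
q_d = [1.5, 2.5, 4.5, 6.5]   # hypothetical direct fitted quantiles
r = relative_r(ys, q_d, q_r, tau=0.95)
```

In this toy example V_R(0.95)=0.5 and V_D(0.95)=0.1375, so Relative R(0.95)=0.725. A positive value indicates that the direct fit has the smaller weighted absolute loss.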

Simulations

To investigate the proposed direct nonparametric quantile regression estimator in (7), we perform Monte Carlo simulations in this section. We generate m random samples, each of size n, from the second kind of Gumbel’s bivariate exponential distribution (Gumbel 1960), which has a non-linear conditional quantile function of y given x in (11). Its c.d.f. F(x,y) and density function f(x,y) are given in (10):

$$ F(x,y)=(1-e^{-x})(1-e^{-y})(1+\alpha e^{-(x+y)}),\;x\geq 0,\;y\geq 0,\;\alpha >0, $$

(10)

$$f(x,y)=e^{-(x+y)}(1+\alpha (2e^{-x}-1)(2e^{-y}-1)),\;x\geq 0,\;y\geq 0,\;\alpha >0. $$

The conditional density of y for given x is

$$f(y|x)=e^{-y}(1+\alpha (2e^{-x}-1)(2e^{-y}-1)),\;x\geq 0,\;y\geq 0,\;\alpha >0. $$

The conditional c.d.f. of y for given x is

$$F(y|x)=e^{-y}(\alpha (2e^{-x}-1)(1-e^{-y})-1)+1,\;x\geq 0,\;y\geq 0,\;\alpha >0. $$

The true τth conditional quantile function of y given x of (10) is

$$\begin{array}{@{}rcl@{}} \xi (\tau |x)\,=\,Q_{y}(\tau |x)\,=\,\ln \left(\frac{2\alpha (2e^{-x}-1)}{\alpha (2e^{-x}\,-\,1)\,-\,1\,+\,\sqrt{(\alpha (2e^{-x}\,-\,1)\,+\,1)^{2}-4\alpha \tau (2e^{-x}-1)}} \right), \\ x\geq 0,\;\alpha >0,\;0<\tau <1. && \notag \end{array} $$

(11)
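
The conditional c.d.f. and the quantile function (11) can be checked against each other numerically. The sketch below is a minimal Python illustration with hypothetical function names. Writing a=α(2e^{-x}−1), it evaluates (11) in the algebraically equivalent rationalized form ln{(1−a+√((a+1)²−4aτ))/(2(1−τ))}, obtained by multiplying numerator and denominator of (11) by the conjugate of the denominator. This form is an implementation choice made here because it avoids the 0/0 that (11) produces when 2e^{-x}=1.

```python
import math


def gumbel_cond_cdf(y, x, alpha=1.0):
    """F(y | x) for Gumbel's second-kind bivariate exponential distribution."""
    a = alpha * (2.0 * math.exp(-x) - 1.0)
    u = math.exp(-y)
    return u * (a * (1.0 - u) - 1.0) + 1.0


def gumbel_cond_quantile(tau, x, alpha=1.0):
    """True tau-th conditional quantile of y given x, equivalent to (11)."""
    a = alpha * (2.0 * math.exp(-x) - 1.0)
    root = math.sqrt((a + 1.0) ** 2 - 4.0 * a * tau)
    # Rationalized form of (11); well defined even when a == 0.
    return math.log((1.0 - a + root) / (2.0 * (1.0 - tau)))


# Round trip: F(Q(tau | x) | x) should return tau.
checks = [
    (tau, x, gumbel_cond_cdf(gumbel_cond_quantile(tau, x), x))
    for tau in (0.5, 0.95, 0.99)
    for x in (0.0, 0.5, math.log(2.0), 2.0, 5.0)
]
```

At x=0 with α=1 the quantile reduces to −½ ln(1−τ), and at x=ln 2 it reduces to the unit exponential quantile −ln(1−τ), which gives two simple spot checks.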

Letting α=1, the c.d.f. in (10) is shown in Fig. 1.

We use three quantile regression methods:

1. The regular quantile regression Q_R(τ|x) estimation based on (3):

$$ Q_{R}(\tau |x)=\widehat{\beta }_{0}(\tau)+\widehat{\beta }_{1}(\tau)x.\quad 0<\tau <1 $$

(12)

2. The first-order local polynomial quantile regression (LPQR) Q_LP(τ|x) (Chaudhuri 1991; Koenker 2005; Yu and Jones 1998), for z in a neighborhood of x,

$$ Q_{LP}(\tau |x)=\widehat{a}_{0}(\tau,x)+\widehat{a}_{1}(\tau,x)(z-x).\quad 0<\tau <1, $$

(13)

where

$$\widehat{\mathbf{a}}(\tau,x)=\arg \min_{\mathbf{a}(\tau,x)\in R^{2}}\sum\limits_{i=1}^{n}\rho_{\tau }(y_{i}-a_{0}(\tau,x)-a_{1}(\tau,x)(x_{i}-x))K\left(\frac{x-x_{i}}{h}\right),\quad 0<\tau <1, $$

Here a(τ,x)=(a₀(τ,x),a₁(τ,x))^T, and h and K are the bandwidth and the kernel function, respectively. The LPQR can be computed with the R package ‘quantreg’ (Koenker 2018).

3. The direct nonparametric quantile regression Q_D(τ|x) estimation based on (7)

$$ Q_{D}(\tau |x)=\sum\limits_{i=1}^{n}W_{h_{\mathbf{x}}}(\mathbf{x},\mathbf{X} _{i}\mathbf{)}\widehat{\xi_{i}}(\tau |x_{i}),\quad 0<\tau <1, $$

(14)

where $\widehat {\xi _{i}}(\tau |x_{i})$ is obtained by (6), and $W_{h_{ \mathbf {x}}}(\mathbf {x},\mathbf {X}_{i}\mathbf {)}$ is given in (7).

For each method, we generate m=100 samples, each of size n=100. Q_R,i(τ|x), Q_LP,i(τ|x) and Q_D,i(τ|x), i=1,2,…,m, are estimated from the ith sample. Let α=1 in (11). Then the true τth conditional quantile is

$$ {} \xi (\tau |x)=Q_{y}(\tau |x)=\ln \left(\frac{2e^{-x}-1}{e^{-x}-1+\sqrt{ e^{-2x}-\tau (2e^{-x}-1)}}\right),\;x\geq 0,\;\alpha >0,\;0<\tau <1. $$

(15)
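
One way to generate the simulated samples is conditional inversion: draw x from its unit exponential marginal, then set y to the conditional quantile (15) evaluated at a uniform random number. The paper does not state how its samples were generated, so the following Python sketch is only one possible scheme; the function names and the seed are hypothetical, and (15) is evaluated in an algebraically equivalent rationalized form that stays finite at 2e^{-x}=1.

```python
import math
import random


def true_quantile(tau, x):
    """Conditional quantile (15) with alpha = 1, in a rationalized form."""
    a = 2.0 * math.exp(-x) - 1.0
    root = math.sqrt((a + 1.0) ** 2 - 4.0 * a * tau)
    return math.log((1.0 - a + root) / (2.0 * (1.0 - tau)))


def draw_sample(n, rng):
    """Draw n pairs (x, y): x ~ Exp(1), then y = Q_y(U | x) with U ~ Uniform(0, 1)."""
    pairs = []
    for _ in range(n):
        x = rng.expovariate(1.0)
        u = rng.random()
        pairs.append((x, true_quantile(u, x)))
    return pairs


rng = random.Random(2018)   # arbitrary seed, used only for reproducibility
samples = [draw_sample(100, rng) for _ in range(100)]   # m = 100, n = 100
```

Both marginals of (10) are unit exponential, so the pooled x and y values should each have a mean close to 1, which gives a simple sanity check on the generator.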

The simulation mean squared errors (SMSEs) of the estimators (12), (13) and (14) are:

$$\begin{array}{@{}rcl@{}} SMSE(Q_{R}(\tau |x)) &=&\frac{1}{m}\sum\limits_{i=1}^{m}\int_{0}^{N}(Q_{R,i}(\tau |x)-Q_{y}(\tau |x))^{2}dx; \end{array} $$

(16)

$$\begin{array}{@{}rcl@{}} SMSE(Q_{LP}(\tau |x)) &=&\frac{1}{m}\sum\limits_{i=1}^{m}\int_{0}^{N}(Q_{LP,i}(\tau |x)-Q_{y}(\tau |x))^{2}dx, \end{array} $$

(17)

$$\begin{array}{@{}rcl@{}} SMSE(Q_{D}(\tau |x)) &=&\frac{1}{m}\sum\limits_{i=1}^{m}\int_{0}^{N}(Q_{D,i}(\tau |x)-Q_{y}(\tau |x))^{2}dx, \end{array} $$

(18)

where the true τth conditional quantile Q_y(τ|x) is defined in (15). N is a finite x value such that the c.d.f. in (10) F(N,N)≈1. We take N=6 and the simulation efficiencies (SEFFs) are given by

$$SEFF(Q_{LP}(\tau |x))=\frac{SMSE(Q_{R}(\tau |x))}{SMSE(Q_{LP}(\tau |x))},\quad SEFF(Q_{D}(\tau |x))=\frac{SMSE(Q_{R}(\tau |x))}{SMSE(Q_{D}(\tau |x))}, $$

where SMSE(Q_R(τ|x)),SMSE(Q_LP(τ|x)) and SMSE(Q_D(τ|x)) are defined in (16), (17) and (18), respectively.
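
The integrals in (16) to (18) have to be approximated numerically in practice. The sketch below is a minimal Python illustration that assumes a trapezoidal rule on an equally spaced grid over [0, N]; the paper does not state which quadrature was used, and the function names and the toy curves are hypothetical.

```python
def integrated_squared_error(estimate, truth, upper, grid_size=600):
    """Trapezoidal approximation of the integral over [0, upper] of (estimate - truth)^2."""
    step = upper / grid_size
    total = 0.0
    for k in range(grid_size + 1):
        x = k * step
        weight = 0.5 if k in (0, grid_size) else 1.0
        total += weight * (estimate(x) - truth(x)) ** 2
    return total * step


def smse(estimates, truth, upper):
    """Average integrated squared error over the m fitted curves, as in (16)-(18)."""
    if not estimates:
        raise ValueError("need at least one fitted curve")
    errors = [integrated_squared_error(f, truth, upper) for f in estimates]
    return sum(errors) / len(errors)


def seff(smse_regular, smse_other):
    """Simulation efficiency relative to the regular QR estimator."""
    return smse_regular / smse_other


# Toy check (hypothetical curves): constant offsets from a constant truth.
truth = lambda x: 1.0
smse_r = smse([lambda x: 2.0, lambda x: 0.0], truth, 6.0)   # offsets of +1 and -1
smse_d = smse([lambda x: 1.5, lambda x: 0.5], truth, 6.0)   # offsets of +0.5 and -0.5
efficiency = seff(smse_r, smse_d)
```

In the toy check the integrated squared errors are 6 and 1.5, so the efficiency is 4. A value larger than 1 means the compared estimator has a smaller SMSE than the regular QR estimator.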

Table 1 shows that all of the SEFF(Q_D(τ|x)) are larger than 1 when τ=0.95,…, 0.99.

Table 1 Simulation Mean Square Errors (SMSEs) and Efficiencies (SEFFs) of Estimating Q_y(τ|x),m=100,n=100,N=6.

Next, we compare Q_D(τ|x) and Q_R(τ|x) in Figs. 3 and 4.

Figure 3 shows the boxplots of Q_R(τ|x) and Q_D(τ|x) for τ=0.95, 0.97, and 0.99 (the true conditional quantiles are shown as a blue line). Q_D(τ|x) has much smaller variance than Q_R(τ|x).

Figure 4 shows the average of the 100 estimated τ=0.95th quantile curves of Q_R(τ|x) (blue dashed line) and of Q_D(τ|x) (red solid line). The average Q_D(τ|x) curve is much closer than the average Q_R(τ|x) curve to the true quantile curve (green dashed line).

Overall, Table 1 and Figs. 2, 3, and 4 show that for τ=0.95,…,0.99, the proposed direct estimator Q_D(τ|x) in (7) is more efficient than the regular quantile regression Q_R(τ|x) in (2) and the nonparametric LPQR in (13).

Real examples of applications

In this section, we apply the following two regression models to the Buffalo snowfall and CO₂ emission examples in Huang and Nguyen (2017):

1. The regular quantile regression Q_R(τ|x) in model (2) using the estimator $\widehat {\beta }(\tau)$ in (3);

2. The direct nonparametric quantile regression Q_D(τ|x) in (7).

5.1 Buffalo snowfall example

Huang and Nguyen (2017) used the following linear second order polynomial quantile regression model for this example (National Weather Service Forecast Office 2017):

$$Q_{y}(\tau |x)=\beta_{0}(\tau)+\beta_{1}(\tau)x+\beta_{2}(\tau)x^{2}, $$

where y represents the total snowfall (cm) and x represents the maximum temperature (°C).

In this paper we use the proposed five-step algorithm in Section 2 to obtain the new direct nonparametric quantile estimator Q_D(τ|x) in (7). We compare the new estimator Q_D(τ|x) with the regular quantile estimator Q_R(τ|x) in Huang and Nguyen (2017). Table 2 and Fig. 5 show the differences between the values of the two estimators. Figure 5a, b and c show the scatter plot of the daily snowfall vs. maximum temperature with the fitted Q_R and Q_D quantile curves at τ=0.95, 0.97 and 0.99. It is interesting to see that the Q_D curves appear to follow the data patterns more closely than the Q_R curves.

Table 2 Buffalo Daily Snowfalls (cm) at High Quantiles Using Q_R and Q_D

Table 2 lists the estimated Buffalo snowfall quantile values at a given maximum temperature for τ=0.97 and 0.99. It demonstrates that at high quantiles τ, Q_D gives a greater variety of snowfall predictions than Q_R. The relationship between snowfall and maximum temperature is not necessarily linear.

Figure 6 and Table 3 show the values of the Relative R(τ) in (9) for given τ=0.95,…,0.99. We note that R(τ)>0 which means that V_D(τ)<V_R(τ) and Q_D is a better fit to the data than Q_R.

Table 3 Relative R(τ) Values for the Buffalo Snowfall Example

Figure 5c shows that, for moderate temperatures such as 5°C to 10°C, the proposed direct nonparametric quantile regression Q_D predicts smaller but more varied snowfalls in Buffalo than the regular Q_R predicts. For temperatures over 10°C, Q_D predicts a much higher snow amount than the regular Q_R. On the other hand, for very low temperatures, such as −15°C to 0°C, Q_D and Q_R both predict that extreme heavy snowfalls, which may cause damage, are more likely. Thus prediction of heavy snowfalls is related to cold weather forecasts. But the relationship between predicted snowfall and temperature from Q_D is not a simple linear relationship as Q_R predicts. We also note that much of the snow occurred between −5°C and 0°C; the predictions from Q_D reflect this fact and give varied predictions.

5.2 CO₂ emission example

Huang and Nguyen (2017) used the linear quantile regression model for this example:

$$Q_{y}(\tau |x_{1},x_{2})=\beta_{0}(\tau)+\beta_{1}(\tau)x_{1}+\beta_{2}(\tau)x_{2}, $$

where y represents CO₂ emission (tonnes) per capita, x₁ represents ln of gross domestic product (GDP) (US $) per capita, and x₂ represents ln of electricity consumption (E.C.) (kilowatts) per capita (Carbon Dioxide Information Analysis Center 2017).

As in the Buffalo snowfall example in Subsection 5.1, we use the proposed five-step algorithm in Section 2 to obtain the new direct nonparametric quantile estimator Q_D(τ|x) in (7). We compare the new estimator Q_D(τ|x) with the regular quantile estimator Q_R(τ|x) in Huang and Nguyen (2017). Figures 7, 8 and Tables 4, 5 show the differences between the values of the two estimators. Figure 7a shows the 3D scatter plot of CO₂ emission vs ln(GDP) and ln(E.C.) with the fitted regular Q_R surface at τ=0.97. Figure 7b shows the same 3D scatter plot with the fitted direct Q_D surface at τ=0.97. Figure 7c shows the 3D scatter plot with both the regular Q_R (green) and direct Q_D (red) quantile surfaces at τ=0.97. It is interesting to see the difference between the Q_R and Q_D quantile surfaces.

Table 4 CO₂ Emission per capita at high quantiles given ln(GDP) estimators Q_R and Q_D

Table 5 CO₂ emission per capita at high quantiles given ln(E.C.) estimators Q_R and Q_D

The Q_R and Q_D quantile curves can be seen more clearly in 2D plots. Figure 8a shows the 2D scatter plot of CO₂ emission vs ln(GDP) when the country’s E.C. is 2980.96 kilowatts, with the fitted regular Q_R and direct Q_D curves at τ=0.97. Figure 8b shows the 2D scatter plot of CO₂ emission vs ln(E.C.) when the country’s GDP is $13,359.73, with the fitted regular Q_R and direct Q_D curves at τ=0.97. We note that both the Q_R and Q_D quantile regression curves appear to fit the data. In general, the Q_D curves follow the data patterns more closely than the Q_R quantile lines, and Q_D produces different estimated CO₂ emissions than Q_R at high quantiles. In Fig. 7, it is interesting to see that the Q_D conditional quantile surfaces are not linear, unlike the planes of Q_R.

Tables 4 and 5 provide details of the estimated high quantiles about countries’ CO₂ emission at τ=0.97 when the countries consume 2980.96 kilowatts of electricity and have a GDP of $13,359.73, respectively.

Figure 9 and Table 6 show the Relative R(τ) in (9), for τ=0.95,…,0.99. All values of Relative R(τ) are larger than 0, which signifies that V_D(τ)<V_R(τ) and suggests that the direct quantile regression estimator Q_D is a better fit to the CO₂ emission data than the regular quantile regression estimator Q_R.

Table 6 Relative R(τ) values for CO₂ emission example

Overall, it is interesting to see that the proposed direct estimator Q_D gave a greater variety of predictions than Q_R of CO₂ emissions relative to gross domestic product and electricity consumption. The relationships are not necessarily linear, and the proposed method is model free. We expect that the predictions from Q_D may be more reasonable. The predictions may help to prevent further damage from CO₂ emissions to the environment.

Conclusions

After the above studies, we can conclude:

1. This paper proposes a new direct nonparametric quantile regression method which is model free. It uses nonparametric density estimation and nonparametric regression techniques to estimate high conditional quantiles. The paper provides a computational five-step algorithm which overcomes the limitations of the estimation in the linear quantile regression model and some other nonparametric quantile regression methods.

2. The Monte Carlo simulation works on the second kind of Gumbel’s bivariate exponential distribution which has a nonlinear conditional quantile function. The simulation is different from the bivariate Pareto distribution which has a linear conditional quantile function, in Huang and Nguyen (2017). The simulation results confirm that the proposed new method is more efficient relative to the regular quantile regression estimators and a local polynomial nonparametric estimator.

3. The proposed new direct nonparametric quantile regression can be used to predict extreme values in the snowfall and CO₂ emission examples of Huang and Nguyen (2017). The proposed direct quantile regression estimator Q_D gives a variety of predictions which fit the data very well. The predicted relationships are not simply linear. We expect that the predictions from Q_D may be more reasonable than the regular quantile regression predictions. The new estimator may help to prevent further damage from extreme events to humans and the environment.

4. The proposed direct nonparametric quantile regression provides an alternative way for quantile regression. Further studies on the details of this method are suggested.

References

Carbon Dioxide Information Analysis Center (2017). http://www.cdiac.ornl.gov. Accessed 20 Oct 2014.
Cai, Z: Applied Nonparametric Econometrics. Wang Yanan Institute for Studies in Economics, Xiamen University, China (2013).
Chaudhuri, P: Nonparametric estimates of regression quantile and their local Bahadur representation. Ann. Stat. 2, 760–777 (1991).
Fukunaga, K: Introduction to Statistical Pattern Recognition. Academic Press, New York (1972).
Gumbel, EJ: Bivariate exponential distributions. J. Am. Stat. Assoc. 55, 698–707 (1960).
Hall, P, Wolff, RCL, Yao, Q: Methods for estimating a conditional distribution. J. Am. Stat. Assoc. 94, 154–163 (1999).
Huang, ML, Nguyen, C: High quantile regression for extreme events. J. Stat. Distrib. Appl. 4(4), 1–20 (2017).
Huang, ML, Xu, X, Tashnev, D: A weighted linear quantile regression. J. Stat. Comput. Simul. 85(13), 2596–2618 (2015).
Koenker, R: Quantile regression. Cambridge University Press, New York (2005).
Koenker, R: Package ‘quantreg’: Quantile Regression (2018). R Package, Version 5.35 (Available from https://www.r-project.org). Accessed 23 Apr 2018.
Koenker, R, Bassett, GW: Regression Quantiles. Econometrica. 46, 33–50 (1978).
Koenker, R, Machado, JAF: Goodness of fit and related inference processes for quantile regression. J. Am. Stat. Assoc. 96(454), 1296–1311 (1999).
Li, Q, Racine, JS: Nonparametric Econometrics-Theory and Practice. Princeton University Press, Oxford (2007).
National Weather Service Forecast Office (2017). www.weather.gov/buf. Accessed 22 Sept 2014.
Scott, DW: Multivariate Density Estimation, Theory, Practice and Visualization, second edition. John Wiley & Sons, New York (2015).
Silverman, BW: Density estimation for statistics and data analysis. Chapman & Hall, London (1986).
Wang, HJ, Li, D: Estimation of extreme conditional quantile through power transformation. J. Am. Stat. Assoc. 108(503), 1062–1074 (2013).
Yu, K, Lu, Z, Stander, J: Quantile regression: applications and current research areas. Statistician. 52(3), 331–350 (2003).
Yu, K, Jones, MC: Local linear quantile regression. J. Am. Stat. Assoc. 93, 228–238 (1998).

Acknowledgements

We are grateful for the comments of the reviewers and editor. They have helped us to improve the paper. This research is supported by the Natural Science and Engineering Research Council of Canada (NSERC) grant MLH, RGPIN-2014-04621. We deeply appreciate the work and suggestions of Ramona Rat and Jenny Tieu which helped to improve the paper.

Author information

Authors and Affiliations

Department of Mathematics & Statistics, Brock University, St. Catharines, Ontario, L2S 3A1, Canada
Mei Ling Huang
Apotex Inc., Toronto, M9L 1T9, Ontario, Canada
Christine Nguyen

Contributions

The authors MLH and CN carried out this work and drafted the manuscript together. Both authors read and approved the final manuscript.

Corresponding author

Correspondence to Mei Ling Huang.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article

Huang, M.L., Nguyen, C. A nonparametric approach for quantile regression. J Stat Distrib App 5, 3 (2018). https://doi.org/10.1186/s40488-018-0084-9

Received: 12 September 2017
Accepted: 31 May 2018
Published: 18 July 2018
DOI: https://doi.org/10.1186/s40488-018-0084-9

Keywords

AMS 2010 Subject Classifications

primary: 62G32; secondary: 62J05
