- Research
- Open Access
A nonparametric approach for quantile regression
- Mei Ling Huang^{1} and
- Christine Nguyen^{2}
https://doi.org/10.1186/s40488-018-0084-9
© The Author(s) 2018
- Received: 12 September 2017
- Accepted: 31 May 2018
- Published: 18 July 2018
Abstract
Quantile regression estimates conditional quantiles and has wide applications in the real world. Estimating high conditional quantiles is an important problem. The regular quantile regression (QR) method often designs a linear or non-linear model, then estimates the coefficients to obtain the estimated conditional quantiles. This approach may be restricted by the linear model setting. To overcome this problem, this paper proposes a direct nonparametric quantile regression method with a five-step algorithm. Monte Carlo simulations show good efficiency for the proposed direct QR estimator relative to the regular QR estimator. The paper also applies the proposed method to two real-world examples. The simulations and the examples illustrate that the proposed direct nonparametric quantile regression model fits the data better than the regular quantile regression method.
Keywords
- Conditional quantile
- Goodness-of-fit
- Gumbel’s second kind of bivariate exponential distribution
- Nonparametric kernel density estimator
- Nonparametric regression
- Weighted loss function
AMS 2010 Subject Classifications
- primary: 62G32; secondary: 62J05
Introduction
where β(τ)=(β_{0}(τ),β_{1}(τ),β_{2}(τ),…,β_{d}(τ))^{T}.
In recent years, studies have sought to improve the efficiency of estimator (3) (Yu et al. 2003; Wang and Li 2013; Huang et al. 2015; Huang and Nguyen 2017). The regular linear quantile regression (2) requires the estimator \(\widehat {\mathbf {\beta }} (\tau)\) in (3) to obtain the estimated conditional quantile curves, but these curves may be restricted by the model setting.
Many studies have used nonparametric methods of quantile regression in recent years, for example, Chaudhuri (1991), Yu and Jones (1998), Hall et al. (1999) and Yu et al. (2003). Chapter 7 in Koenker (2005) proposed a local polynomial quantile regression (LPQR) and other methods. Detailed discussions of theory, methodologies and applications can be found in Li and Racine (2007) and Cai (2013).
To overcome the limitation of the model setting in (2), in this paper we propose a direct nonparametric quantile regression method that uses the ideas of nonparametric kernel density estimation and nonparametric kernel regression. The proposed method not only differs from most existing nonparametric quantile regression methods, it also overcomes the crossing problem of estimated quantile curves. To see whether the new method improves on the regular linear quantile regression and other nonparametric quantile regression methods, we carry out two studies in this paper:
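For orientation, the display equations that (2) and (3) refer to are not reproduced in this section. Assuming they follow the standard linear quantile regression of Koenker and Bassett (1978), the model and its estimator would take the following form, where the leading 1 in \(\mathbf {x}\) and the check function \(\rho _{\tau }\) are part of that assumption:

\[
Q_{y}(\tau |\mathbf {x})=\mathbf {x}^{T}\mathbf {\beta }(\tau)=\beta _{0}(\tau)+\beta _{1}(\tau)x_{1}+\cdots +\beta _{d}(\tau)x_{d},\qquad \mathbf {x}=(1,x_{1},\ldots,x_{d})^{T},\quad 0<\tau <1,
\]

\[
\widehat {\mathbf {\beta }}(\tau)=\arg \min _{\mathbf {\beta }\in R^{d+1}}\sum _{i=1}^{n}\rho _{\tau }\left (y_{i}-\mathbf {x}_{i}^{T}\mathbf {\beta }\right),\qquad \rho _{\tau }(u)=u\left (\tau -I(u<0)\right).
\]

Under this form every estimated quantile curve is linear in \(\mathbf {x}\), which is the restriction the direct method is meant to avoid.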
1. Monte Carlo simulations will be performed to confirm the better efficiency of the new direct QR estimator relative to the regular QR estimator and a nonparametric LPQR.
2. The new proposed method will be applied to two real-world examples of extreme events and compared with the linear model in Huang and Nguyen (2017).
In Section 2, we propose a direct nonparametric quantile regression estimator. A relative measure for comparing the goodness-of-fit of quantile models is given in Section 3. In Section 4, the results of Monte Carlo simulations generated from Gumbel’s second kind of bivariate exponential distribution (Gumbel 1960) show that the proposed direct method is highly efficient relative to the existing linear QR and LPQR methods. In Section 5, the regular linear quantile regression and the proposed direct quantile regression are applied to two real-life examples: the Buffalo snowfall and CO_{2} emission examples in Huang and Nguyen (2017). The study of these examples illustrates that the proposed direct nonparametric quantile regression model fits the data better than the existing linear quantile regression method.
Proposed direct nonparametric quantile regression
We construct the following five-step algorithm for direct nonparametric quantile regression:
where \(\widehat {f}(y,\mathbf {x})\) is an estimator of the joint density of y and x, and \(\widehat {g}(\mathbf {x)}\) is an estimator of the marginal density of x.
Thus, we have n points \(\left (\mathbf {x}_{i},\widehat {\xi _{i}}(\tau | \mathbf {x}_{i})\right),\;i=1,2,\ldots,n.\)
The new feature of (7) is that it takes the numerical results of Step 3 in (6), the n points \(\left (\mathbf {x}_{i},\widehat {\xi _{i}}(\tau |\mathbf {x}_{i})\right),\;i=1,2,\ldots,n,\) estimates a conditional mean curve of the τth quantile function from them, and thereby smooths these n points.
where \(\widehat {g}_{j}(x_{j})\) is the estimated marginal density of the jth component x_{j} of x=(x_{1},x_{2},…,x_{d}), and n is the size of the random sample in (4).
Step 5: Check all procedures, and make any necessary adjustments.
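The five steps can be sketched in code for a single covariate. This is only a minimal illustration, not the authors' implementation: because equations (5) to (7) are not reproduced in this section, it assumes Gaussian kernels, a Silverman-style rule-of-thumb bandwidth, a conditional distribution obtained by integrating \(\widehat {f}(y,x)/\widehat {g}(x)\), and Nadaraya-Watson weights for the smoothing step. All function names and the toy data are hypothetical.

```python
import math
import random
import statistics

def normal_pdf(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def normal_cdf(u):
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def rule_of_thumb_bandwidth(values):
    # Assumption: Silverman-style bandwidth 1.06 * sd * n^(-1/5).
    return 1.06 * statistics.stdev(values) * len(values) ** (-0.2)

def kernel_weights(x, xs, hx):
    # Unnormalised Gaussian kernel weights of the sample points around x.
    return [normal_pdf((x - xi) / hx) for xi in xs]

def pointwise_quantile(tau, x, xs, ys, hx, hy, iters=40):
    """Steps 2-3: invert the kernel estimate of F(y | x) at level tau."""
    w = kernel_weights(x, xs, hx)
    total = sum(w)

    def cdf(y):
        # Integral over (-inf, y] of f_hat(t, x) / g_hat(x) for Gaussian kernels.
        return sum(wi * normal_cdf((y - yi) / hy) for wi, yi in zip(w, ys)) / total

    lo, hi = min(ys) - 10.0 * hy, max(ys) + 10.0 * hy
    for _ in range(iters):  # bisection: cdf is nondecreasing in y
        mid = 0.5 * (lo + hi)
        if cdf(mid) < tau:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def fit_direct_qr(tau, xs, ys):
    """Return a function x -> Q_D(tau | x) built from the sample (xs, ys)."""
    hx, hy = rule_of_thumb_bandwidth(xs), rule_of_thumb_bandwidth(ys)
    # Step 3: one estimated quantile per sample point x_i.
    xi_hat = [pointwise_quantile(tau, xi, xs, ys, hx, hy) for xi in xs]

    def predict(x):
        # Step 4: Nadaraya-Watson smoothing of the points (x_i, xi_hat_i).
        w = kernel_weights(x, xs, hx)
        return sum(wi * q for wi, q in zip(w, xi_hat)) / sum(w)

    return predict

# Step 1 (toy data, not from the paper): y = x^2 + Exp(1) noise.
random.seed(0)
xs = [random.uniform(0.0, 2.0) for _ in range(120)]
ys = [x * x + random.expovariate(1.0) for x in xs]

q50 = fit_direct_qr(0.50, xs, ys)
q95 = fit_direct_qr(0.95, xs, ys)
for x in (0.5, 1.0, 1.5):
    print(f"x={x:.1f}  Q_D(0.50|x)={q50(x):.3f}  Q_D(0.95|x)={q95(x):.3f}")
```

In this sketch the kernel weights do not depend on τ and are non-negative, so the pointwise quantiles and their smoothed averages are ordered in τ and the fitted curves cannot cross. Step 5 would correspond to inspecting the fit and adjusting the bandwidths, which is not automated here.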
Comparison of goodness-of-fit on quantile regression models
Simulations
We use three quantile regression methods:
where \(\widehat {\xi _{i}}(\tau |x_{i})\) is obtained by (6) and \(W_{h_{\mathbf {x}}}(\mathbf {x},\mathbf {X}_{i})\) is given by (7).
Table 1 Simulation Mean Square Errors (SMSEs) and Efficiencies (SEFFs) of Estimating Q_{y}(τ|x), m=100, n=100, N=6.
τ | 0.95 | 0.96 | 0.97 | 0.98 | 0.99 |
---|---|---|---|---|---|
SMSE(Q_{R}(τ|x)) | 22.091 | 26.632 | 28.982 | 42.725 | 73.340 |
SMSE(Q_{LP}(τ|x)) | 8.160 | 9.667 | 11.074 | 15.080 | 23.734 |
SMSE(Q_{D}(τ|x)) | 5.161 | 6.630 | 6.552 | 8.850 | 11.596 |
Efficiency | | | | | |
SEFF(Q_{LP}(τ|x)) | 2.7072 | 2.7449 | 2.6171 | 2.8332 | 3.0901 |
SEFF(Q_{D}(τ|x)) | 4.2804 | 4.0169 | 4.4234 | 4.8278 | 6.3246 |
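The efficiency rows in the table above appear to be ratios of simulation mean square errors with the regular estimator in the numerator. The defining equation is not shown in this section, so the sketch below assumes SEFF(Q) = SMSE(Q_{R}) / SMSE(Q) and simply recomputes the ratios from the tabulated SMSE values; the function name `efficiency` is hypothetical.

```python
# SMSE values copied from the table above, for tau = 0.95, ..., 0.99.
taus = [0.95, 0.96, 0.97, 0.98, 0.99]
smse_r = [22.091, 26.632, 28.982, 42.725, 73.340]   # regular QR
smse_lp = [8.160, 9.667, 11.074, 15.080, 23.734]    # local polynomial QR
smse_d = [5.161, 6.630, 6.552, 8.850, 11.596]       # direct QR

def efficiency(smse_baseline, smse_candidate):
    # Assumed definition: ratio of baseline SMSE to candidate SMSE,
    # so values above 1 favour the candidate estimator.
    return [b / c for b, c in zip(smse_baseline, smse_candidate)]

seff_lp = efficiency(smse_r, smse_lp)
seff_d = efficiency(smse_r, smse_d)

for tau, e_lp, e_d in zip(taus, seff_lp, seff_d):
    print(f"tau={tau:.2f}  SEFF(Q_LP)={e_lp:.4f}  SEFF(Q_D)={e_d:.4f}")
```

Under this assumed definition the recomputed ratios agree with the tabulated efficiencies to rounding, except for Q_{LP} at τ=0.96, where the rounded SMSEs give about 2.755 rather than 2.7449. Both estimators have ratios above 1 at every τ, and the ratio for Q_{D} is the larger of the two.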
Figure 3 shows the boxplots of Q_{R}(τ|x) and Q_{D}(τ|x) for τ=0.95, 0.97, and 0.99 (the true conditional quantiles are shown as blue lines). Q_{D}(τ|x) has a much smaller variance than Q_{R}(τ|x).
Figure 4 shows the average of the 100 estimated τ=0.95 quantile curves for Q_{R}(τ|x) (blue dashed line) and for Q_{D}(τ|x) (red solid line). The average Q_{D}(τ|x) curve is much closer to the true quantile curve (green dashed line) than the average Q_{R}(τ|x) curve.
Overall, Table 1 and Figs. 2, 3, and 4 show that for τ=0.95,…,0.99, the proposed direct estimator Q_{D}(τ|x) in (7) is more efficient than the regular quantile regression Q_{R}(τ|x) in (2) and the nonparametric LPQR in (13).
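Computing the SMSE requires the true conditional quantile of the simulated distribution. The sketch below assumes the usual form of Gumbel’s second kind of bivariate exponential distribution with unit exponential marginals, F(x,y) = (1 − e^{−x})(1 − e^{−y})(1 + α e^{−x−y}) with −1 ≤ α ≤ 1; neither this form nor the value of α used in the simulation is shown in this section, so both are assumptions, and the function names are hypothetical. Under that form, F(y|x) = (1 − e^{−y})(1 + c e^{−y}) with c = α(2e^{−x} − 1), and setting F(y|x) = τ gives a quadratic in u = e^{−y}.

```python
import math

def gumbel2_conditional_cdf(y, x, alpha):
    # F(y | x) for the assumed Gumbel type II form with unit exponential marginals.
    c = alpha * (2.0 * math.exp(-x) - 1.0)
    u = math.exp(-y)
    return (1.0 - u) * (1.0 + c * u)

def gumbel2_conditional_quantile(tau, x, alpha):
    # Solve c*u^2 + (1 - c)*u - (1 - tau) = 0 for the root u in (0, 1),
    # written in a form that stays valid when c = 0.
    c = alpha * (2.0 * math.exp(-x) - 1.0)
    disc = (1.0 - c) ** 2 + 4.0 * c * (1.0 - tau)
    u = 2.0 * (1.0 - tau) / ((1.0 - c) + math.sqrt(disc))
    return -math.log(u)

alpha = 0.5  # illustrative value only
for x in (0.0, 0.5, 1.0, 2.0, 4.0):
    q = gumbel2_conditional_quantile(0.95, x, alpha)
    print(f"x={x:.1f}  Q_y(0.95|x)={q:.4f}")
```

The quantile depends on x only through e^{−x}, so it is nonlinear in x and levels off as x grows, which a linear model in x cannot reproduce.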
Real examples of applications
In this section, we apply the following two regression models to the Buffalo snowfall and CO_{2} emission examples in Huang and Nguyen (2017):
1. The regular quantile regression Q_{R}(τ|x) in model (2) using the estimator \(\widehat {\beta }(\tau)\) in (3);
2. The direct nonparametric quantile regression Q_{D}(τ|x) in (7).
5.1 Buffalo snowfall example
Table 2 Buffalo Daily Snowfalls (cm) at High Quantiles Using Q_{R} and Q_{D}
Temperature (°C) | Q_{R} (τ=0.97) | Q_{D} (τ=0.97) | Q_{R} (τ=0.99) | Q_{D} (τ=0.99) |
---|---|---|---|---|
-15 | 37.38 | 25.49 | 105.46 | 60.64 |
-10 | 33.19 | 30.23 | 87.95 | 62.98 |
-5 | 30.98 | 33.33 | 72.08 | 56.54 |
0 | 30.73 | 29.89 | 57.86 | 54.56 |
5 | 32.47 | 33.27 | 45.29 | 52.39 |
10 | 36.17 | 37.34 | 34.36 | 43.04 |
Table 2 lists the estimated Buffalo snowfall quantiles at given maximum temperatures for τ=0.97 and 0.99. At these high quantiles, Q_{D} gives a greater variety of snowfall predictions than Q_{R}, which suggests that the relationship between snowfall and maximum temperature is not necessarily linear.
Table 3 Relative R(τ) Values for the Buffalo Snowfall Example
τ | 0.95 | 0.96 | 0.97 | 0.98 | 0.99 |
---|---|---|---|---|---|
Relative R(τ) | 0.0359 | 0.0346 | 0.0324 | 0.0903 | 0.1206 |
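The relative R(τ) values above compare the check-loss totals of the two fitted models. The definition from Section 3 is not reproduced here, so the sketch below assumes a measure in the spirit of Koenker and Machado (1999), R(τ) = 1 − V_{D}(τ)/V_{R}(τ), where V(τ) is the sum of check losses \(\rho _{\tau }(y_{i}-Q(\tau |x_{i}))\). The function names and the toy numbers are hypothetical.

```python
def check_loss(u, tau):
    # rho_tau(u) = u * (tau - I(u < 0)): asymmetric absolute loss.
    return u * (tau - (1.0 if u < 0 else 0.0))

def total_loss(y, fitted, tau):
    # V(tau): sum of check losses of the residuals y_i - Q(tau | x_i).
    return sum(check_loss(yi - qi, tau) for yi, qi in zip(y, fitted))

def relative_r(y, fitted_regular, fitted_direct, tau):
    # Assumed definition: positive values mean the direct fit has the
    # smaller check loss, zero means the two fits are equally good.
    return 1.0 - total_loss(y, fitted_direct, tau) / total_loss(y, fitted_regular, tau)

# Toy example (not data from the paper).
y = [1.0, 2.0, 3.0, 4.0, 10.0]
fitted_regular = [6.0, 6.0, 6.0, 6.0, 6.0]
fitted_direct = [2.0, 3.0, 4.0, 5.0, 9.0]
r = relative_r(y, fitted_regular, fitted_direct, tau=0.9)
print(f"relative R(0.9) = {r:.4f}")
```

Under this assumed definition, the positive values in the table would indicate that Q_{D} attains a smaller total check loss than Q_{R} at every τ considered.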
Figure 5c shows that for moderate temperatures, such as 5°C to 10°C, the proposed direct nonparametric quantile regression Q_{D} predicts smaller but more varied snowfalls in Buffalo than the regular Q_{R} predicts. For temperatures over 10°C, Q_{D} predicts a much higher snowfall than the regular Q_{R}. On the other hand, for low temperatures, such as −15°C to 0°C, both Q_{D} and Q_{R} predict that extremely heavy snowfalls, which may cause damage, are more likely. Thus the prediction of heavy snowfalls is related to cold weather forecasts, but the relationship between snowfall and temperature predicted by Q_{D} is not as simple as the linear relationship that Q_{R} predicts. We also note that a lot of snow fell between −5°C and 0°C; the predictions from Q_{D} reflect this fact and vary accordingly.
5.2 CO_{2} emission example
Table 4 CO_{2} emission per capita at high quantiles given ln(GDP), estimators Q_{R} and Q_{D}
ln of GDP per capita ($) | Q_{R} (τ=0.97) | Q_{D} (τ=0.97) |
---|---|---|
7.5 | 15.2181 | 8.8737 |
8 | 18.0437 | 10.1949 |
8.5 | 20.8693 | 11.7828 |
9 | 23.6950 | 14.4143 |
9.5 | 26.5206 | 19.0458 |
10 | 29.3462 | 24.0338 |
10.5 | 32.1718 | 27.9596 |
11 | 34.9975 | 31.1097 |
11.5 | 37.8231 | 30.7696 |
12 | 40.6487 | 31.2366 |
Table 5 CO_{2} emission per capita at high quantiles given ln(E.C.), estimators Q_{R} and Q_{D}
ln of Electricity Consumption per capita (kilowatts) | Q_{R} (τ=0.97) | Q_{D} (τ=0.97) |
---|---|---|
0 | 6.9775 | 7.1919 |
2 | 11.8632 | 7.2759 |
4 | 16.7490 | 24.6924 |
6 | 21.6348 | 9.5560 |
8 | 26.5206 | 15.9569 |
10 | 31.4064 | 31.5634 |
12 | 36.2921 | 39.6481 |
We may see the Q_{R} and Q_{D} quantile curves more clearly in 2D plots. Figure 8a shows the 2D scatter plot of CO_{2} emission vs ln(GDP) when the country’s E.C. is 2980.96 kilowatts, with the fitted regular Q_{R} and direct Q_{D} curves at τ=0.97. Figure 8b shows the 2D scatter plot of CO_{2} emission vs ln(E.C.) when the country’s GDP is $13,359.73, with the fitted regular Q_{R} and direct Q_{D} curves at τ=0.97. We note that both the Q_{R} and Q_{D} quantile regression curves appear to fit the data. In general, the Q_{D} curves follow the data patterns more closely than the Q_{R} quantile lines, and Q_{D} produces different estimated CO_{2} emissions than Q_{R} at high quantiles. In Fig. 7, it is interesting to see that the Q_{D} conditional quantile surfaces are not linear, unlike the planes of Q_{R}.
Tables 4 and 5 provide details of the estimated high quantiles of countries’ CO_{2} emissions at τ=0.97 when the countries consume 2980.96 kilowatts of electricity and have a GDP of $13,359.73, respectively.
Table 6 Relative R(τ) values for CO_{2} emission example
τ | 0.95 | 0.96 | 0.97 | 0.98 | 0.99 |
---|---|---|---|---|---|
Relative R(τ) | 0.3480 | 0.3612 | 0.3494 | 0.2895 | 0.2151 |
Overall, it is interesting to see that the proposed direct estimator Q_{D} gave a greater variety of predictions than Q_{R} for CO_{2} emissions relative to gross domestic product and electricity consumption. Because Q_{D} is model free, the estimated relationships are not necessarily linear. We expect that the predictions from Q_{D} may be more reasonable. The predictions may help to prevent further damage from CO_{2} emissions to the environment.
Conclusions
After the above studies, we can conclude:
1. This paper proposes a new direct nonparametric quantile regression method which is model free. It uses nonparametric density estimation and nonparametric regression techniques to estimate high conditional quantiles. The paper provides a computational five-step algorithm which overcomes the limitations of the estimation in the linear quantile regression model and some other nonparametric quantile regression methods.
2. The Monte Carlo simulation uses Gumbel’s second kind of bivariate exponential distribution, which has a nonlinear conditional quantile function. It differs from the simulation in Huang and Nguyen (2017), which uses the bivariate Pareto distribution with a linear conditional quantile function. The simulation results confirm that the proposed method is more efficient than the regular quantile regression estimator and a local polynomial nonparametric estimator.
3. The proposed direct nonparametric quantile regression can be used to predict extreme values in the snowfall and CO_{2} emission examples in Huang and Nguyen (2017). The proposed direct quantile regression estimator Q_{D} gives a variety of predictions that fit the data very well. The predicted relationships are not simply linear. We expect that the predictions from Q_{D} may be more reasonable than the regular quantile regression predictions. The new estimator may help to prevent further damage from extreme events to humans and the environment.
4. The proposed direct nonparametric quantile regression provides an alternative approach to quantile regression. Further studies on the details of this method are suggested.
Declarations
Acknowledgements
We are grateful for the comments of the reviewers and editor, which have helped us to improve the paper. This research is supported by the Natural Science and Engineering Research Council of Canada (NSERC) grant MLH, RGPIN-2014-04621. We deeply appreciate the work and suggestions of Ramona Rat and Jenny Tieu, which helped to improve the paper.
Authors’ contributions
The authors MLH and CN carried out this work and drafted the manuscript together. Both authors read and approved the final manuscript.
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
References
- Carbon Dioxide Information Analysis Center (2017). http://www.cdiac.ornl.gov. Accessed 20 Oct 2014.
- Cai, Z: Applied Nonparametric Econometrics. Wang Yanan Institute for Studies in Economics, Xiamen University, China (2013).
- Chaudhuri, P: Nonparametric estimates of regression quantile and their local Bahadur representation. Ann. Stat. 2, 760–777 (1991).
- Fukunaga, K: Introduction to Statistical Pattern Recognition. Academic Press, New York (1972).
- Gumbel, EJ: Bivariate exponential distributions. J. Am. Stat. Assoc. 55, 698–707 (1960).
- Hall, P, Wolff, RCL, Yao, Q: Methods for estimating a conditional distribution. J. Am. Stat. Assoc. 94, 154–163 (1999).
- Huang, ML, Nguyen, C: High quantile regression for extreme events. J. Stat. Distrib. Appl. 4(4), 1–20 (2017).
- Huang, ML, Xu, X, Tashnev, D: A weighted linear quantile regression. J. Stat. Comput. Simul. 85(13), 2596–2618 (2015).
- Koenker, R: Quantile regression. Cambridge University Press, New York (2005).
- Koenker, R: Package ‘quantreg’: Quantile Regression (2018). R Package, Version 5.35 (Available from https://www.r-project.org). Accessed 23 Apr 2018.
- Koenker, R, Bassett, GW: Regression Quantiles. Econometrica. 46, 33–50 (1978).
- Koenker, R, Machado, JAF: Goodness of fit and related inference processes for quantile regression. J. Am. Stat. Assoc. 96(454), 1296–1311 (1999).
- Li, Q, Racine, JS: Nonparametric Econometrics-Theory and Practice. Princeton University Press, Oxford (2007).
- National Weather Service Forecast Office (2017). www.weather.gov/buf. Accessed 22 Sept 2014.
- Scott, DW: Multivariate Density Estimation, Theory, Practice and Visualization, second edition. John Wiley & Sons, New York (2015).
- Silverman, BW: Density estimation for statistics and data analysis. Chapman & Hall, London (1986).
- Wang, HJ, Li, D: Estimation of extreme conditional quantile through power transformation. J. Am. Stat. Assoc. 108(503), 1062–1074 (2013).
- Yu, K, Lu, Z, Stander, J: Quantile regression: applications and current research areas. Statistician. 52(3), 331–350 (2003).
- Yu, K, Jones, MC: Local linear regression quantile regression. J. Am. Stat. Assoc. 93, 228–238 (1998).