
# Quantile regression for overdispersed count data: a hierarchical method

*Journal of Statistical Distributions and Applications*
**volume 4**, Article number: 18 (2017)

## Abstract

Generalized Poisson regression is commonly applied to overdispersed count data, with a focus on modelling the conditional mean of the response. However, conditional mean regression models may be sensitive to response outliers and provide no information on other conditional distribution features of the response. We consider instead a hierarchical approach to quantile regression of overdispersed count data. This approach offers effective outlier detection and robust estimation in the presence of outliers and, in health applications, quantile estimates that can reflect risk factors. The technique is first illustrated with simulated overdispersed counts subject to contamination, such that estimates from conditional mean regression are adversely affected. A real application involves ambulatory care sensitive emergency admissions across 7518 English patient general practitioner (GP) practices. Predictors are GP practice deprivation, patient satisfaction with care and opening hours, and region. Impacts of deprivation are particularly important in policy terms as indicating effectiveness of efforts to reduce inequalities in care sensitive admissions. Hierarchical quantile count regression is used to develop profiles of central and extreme quantiles according to specified predictor combinations.

## 1. Background

Extensions of Poisson regression are commonly applied to overdispersed count data, focused on modelling the conditional mean of the response. However, conditional mean regression models may be sensitive to response outliers. We consider instead a Bayesian hierarchical approach to quantile regression of overdispersed count data, based on a Poisson log-normal (PLN) approach to overdispersion. The method set out here is for quantile regression for latent outcomes at the second stage of a hierarchical model. Focussing on median regression in particular, this method provides an approach to Bayesian robust regression for overdispersed count data.

The technique is first illustrated with simulated overdispersed counts subject to contamination, such that conditional mean regression is adversely affected. It is shown that the hierarchical median regression via a Poisson log-normal representation (HQRPLN) more accurately reproduces the regression parameters assumed in the simulation than negative binomial or standard PLN regression. The HQRPLN estimates for contaminated data are competitive with those of classical methods for robust regression using a negative binomial density and M-estimation (Aeberhard et al. 2014; Chambers et al. 2014), and also with classical methods for median regression for count data (Machado and Santos Silva, 2005). It is also shown that HQRPLN accurately identifies the contaminated observations.

A real application involves counts of ambulatory care sensitive (ACS) emergency admissions in 2014–15 according to 7518 English patient general practitioner (GP) practice. Such admissions are potentially avoidable given effective care and are often used as an index of health performance (Caminal et al. 2004). Predictors are practice deprivation, patient satisfaction with care (general satisfaction and satisfaction with opening hours), and the practice region of location. Hierarchical quantile Poisson log-normal regression is used to assess the most important predictors, variation in predictor effects by quantile, and varying impacts of predictors by region.

The applied focus of the paper adopts a Bayesian strategy and uses a quantile regression approach that has, as one aspect, the benefit of robustness compared to conditional mean regression, which is demonstrated using simulated data. However, we also aim to demonstrate the utility of quantile regression in an analysis of a health performance index. To set the broader context, we consider classical methods for robust regression of overdispersed count data in section 2, before considering quantile regression, using classical methods and in terms of Bayesian implementation (section 3). Section 4 considers the Poisson log-normal representation for quantile count regression. The remaining sections involve data analysis: a simulation analysis involving contaminated count data (section 5), and finally, the ACS admissions analysis and results applying the HQRPLN method (sections 6 and 7).

## 2. Robust count regression via M-estimation and Bayesian strategies

Classical approaches to robust regression for data {y_{i}, i = 1,.., n} on covariates X_{i} of dimension p focus either on M-estimation, or median quantile regression (see next section). For linear regression under M-estimation, robustness may be achieved by incorporation in the estimation of objective functions Q(r) (Andersen, 2008) that downweight large positive and large negative standardized residuals r_{i} = (y_{i} − X_{i}β)/s, where β is a regression parameter, and s is a scale estimate. For linear regression, estimation involves minimisation of \( \sum \limits_{\mathrm{i}=1}^{\mathrm{n}}\mathrm{Q}\left({\mathrm{r}}_{\mathrm{i}}\right) \), with corresponding estimation equations \( \frac{1}{\mathrm{n}}\sum \limits_{\mathrm{i}=1}^{\mathrm{n}}{\mathrm{X}}_{\mathrm{i}\mathrm{j}}\uppsi \left({\mathrm{r}}_{\mathrm{i}}\right)=0 \), where ψ(r) = ∂Q(r)/∂r is the score or influence function.

Regarding M-estimation for overdispersed counts, consider in particular, negative binomial NB(μ_{i}, σ) regression with offsets O_{i}, means μ_{i} = O_{i} exp(X_{i}β), overdispersion parameter σ, and the NB2 parameterisation (Aeberhard et al., 2014; Hilbe, 2011). Then robustness may be achieved by objective functions that downweight large positive and large negative residuals r_{i} = (y_{i} − μ_{i})/V^{0.5}(μ_{i}).

Thus Chambers et al. (2014) consider M-estimation for overdispersed counts using a negative binomial model. They estimate β using the Huber score function (Huber, 1973) and estimating equations

where

where the weights w(X_{i}) may be used to downweight leverage points (covariate outliers), and a(β) is a correction factor ensuring Fisher consistency. The Huber score function uses a cutpoint k to define (absolute) extreme residuals, such as k = 2, with ψ(r) = max(−k, min(k, r)). Chambers et al. (2014) use a robust moment estimator for θ = 1/σ, whereas Aeberhard et al. (2014) use M-estimation in a form of weighted maximum likelihood, preferring this on efficiency grounds.
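The Huber score and the scaled residual it acts on can be sketched in a few lines. This is an illustrative Python sketch only, not the estimation code of Chambers et al. (2014); the function names are hypothetical, and the variance function V(μ) = μ + σμ² is the usual NB2 form, assumed here because the text does not write it out.

```python
def huber_psi(r, k=2.0):
    """Huber score: psi(r) = max(-k, min(k, r)), i.e. r clipped at the cutpoint k."""
    return max(-k, min(k, r))


def huber_weight(r, k=2.0):
    """Implied weight psi(r)/r: 1 inside [-k, k], k/|r| for more extreme residuals."""
    return 1.0 if r == 0 else huber_psi(r, k) / r


def nb2_pearson_residual(y, mu, sigma):
    """Scaled residual (y - mu) / V(mu)^0.5, assuming the NB2 variance mu + sigma*mu^2."""
    return (y - mu) / (mu + sigma * mu * mu) ** 0.5
```

A residual of 4 with k = 2 receives weight 0.5, so its contribution to the estimating equations is halved relative to an unweighted fit.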

Bayesian regression methods intended as robust to outliers include ε−contamination priors (Moreno and Pericchi, 1993), modified likelihoods such as weighted likelihoods (Greco et al. 2008; Agostinelli and Greco, 2013), and localized regression (Wang and Blei, 2017). For overdispersed count regression, in particular, an ε−contamination approach might involve negative binomial or Poisson log-normal representations, and specify a main model and contamination model. The contamination model would be assumed to apply for a small subpopulation, with small prior probability ε (e.g. ε = 0.1 or ε = 0.05), and might involve an intercept or variance shift as compared to the main model.

## 3. Quantile regression: classical and Bayesian approaches

An alternative approach to robustness, and the focus of this paper, is provided by quantile regression. Thus generalized linear models for discrete responses typically involve conditional mean estimation using both known predictors, and random effects to represent unknown covariates or overdispersion. However, mean regression models may be sensitive to response outliers and provide no information on factors affecting other distributional points (e.g. upper and lower 5% quantiles) of the response.

By contrast, quantile regression estimates the relationship between the q^{th} quantile Q_{y}(q|X) of the response y and covariates X (Koenker and Hallock, 2001). Quantile regression was originally developed for continuous responses as count responses do not have continuous quantiles. For q ∈ (0, 1) and continuous y, classical quantile regression involves minimizing \( \sum \limits_{\mathrm{i}=1}^{\mathrm{n}}{\uprho}_{\mathrm{q}}\left({\mathrm{y}}_{\mathrm{i}}-{\mathrm{X}}_{\mathrm{i}}{\upbeta}_{\mathrm{q}}\right) \), where ρ_{q}(u) = u(q − I(u ≤ 0)). A special case is provided by median regression, involving minimization of the absolute deviations, \( \sum \limits_{\mathrm{i}=1}^{\mathrm{n}}\left|{\mathrm{y}}_{\mathrm{i}}-{\mathrm{X}}_{\mathrm{i}}\upbeta \right| \). This reduces the impact of outliers (influential observations) in the response space, providing a better fit for the majority of observations.
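The robustness of the check loss can be seen in the simplest case of an intercept-only model. The following Python sketch (hypothetical helper names, toy data chosen for illustration) evaluates ρ_q and finds the constant that minimises the summed loss; a minimiser always lies at one of the data points, so only those are searched.

```python
def check_loss(u, q):
    """rho_q(u) = u * (q - I(u <= 0))."""
    return u * (q - (1.0 if u <= 0 else 0.0))


def total_loss(c, ys, q):
    """Summed check loss when every observation is predicted by the constant c."""
    return sum(check_loss(y - c, q) for y in ys)


def best_constant(ys, q):
    """Search the distinct data values for the constant with the smallest loss."""
    return min(sorted(set(ys)), key=lambda c: total_loss(c, ys, q))


ys = [1, 2, 3, 4, 100]               # one gross outlier
median_fit = best_constant(ys, 0.5)  # 3: unaffected by the outlier (the mean is 22)
upper_fit = best_constant(ys, 0.9)   # 100: an extreme quantile does track the tail
```

For q = 0.5 the loss is half the absolute deviation, so the fit is the sample median.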

Chambers et al. (2014) extend M-estimation to quantile regression, including count regression. For linear regression the estimating equations become

where \( {\Delta}_{\mathrm{q}}\left(\frac{\mathrm{e}}{\mathrm{s}}\right)=2\uppsi \left(\frac{\mathrm{e}}{\mathrm{s}}\right)\left[\mathrm{qI}\left(\mathrm{e}>0\right)+\left(1-\mathrm{q}\right)\mathrm{I}\left(\mathrm{e}\le 0\right)\right] \), and s is a scale estimator. For overdispersed count data, similarly define scaled residuals

where Q_{q}(X_{i}) = O_{i} exp(X_{i}β_{q}). Then the estimating equations for β are \( \frac{1}{\mathrm{n}}{\sum}_{\mathrm{i}}{\Delta}_{\mathrm{q}}\left({\mathrm{y}}_{\mathrm{i}},{\mathrm{Q}}_{\mathrm{q}}\left({\mathrm{X}}_{\mathrm{i}}\right)\right)=0, \)

where

By contrast, Machado and Santos Silva (2005) propose quantile regression for count data based on adding uniform noise u to count responses y (i.e. jittering count responses), giving z = y + u, and apply quantile regression of the form

As discussed by Yu and Moyeed (2001), a Bayesian approach to quantile regression for y continuous is obtained using an Asymmetric Laplace distribution (ALD), with density function

This distribution can in turn be represented as a scale mixture of normals (Tsionas, 2003). For observations i = 1,.., n, and assuming y_{i} ~ ALD(η_{qi}, δ_{q}, q), one has

where \( {\upxi}_{\mathrm{q}}=\frac{\left(1-2\mathrm{q}\right)}{\mathrm{q}\left(1-\mathrm{q}\right)} \), δ_{q} > 0 , W_{qi} ~ Exp(δ_{q}), and u_{qi} ∼ N(0, 1), and the regression term η_{qi} = β_{0q} + X_{i}β_{q} may be expanded to include random effects.
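The mixture representation can be checked by simulation. The sketch below assumes the standard normal-exponential mixture form, y = η + ξ_q W + sqrt(2W / (δ q(1 − q))) u, with δ treated as the exponential rate (so that W has mean 1/δ, as stated later in the paper); this explicit form is an assumption, since the display equation is not reproduced here. Under that form the q-th quantile of y should equal η.

```python
import math
import random


def simulate_ald_mixture(eta, delta, q, n, seed=1):
    """Draw n values from the assumed scale-mixture form of ALD(eta, delta, q)."""
    rng = random.Random(seed)
    xi = (1.0 - 2.0 * q) / (q * (1.0 - q))
    draws = []
    for _ in range(n):
        w = rng.expovariate(delta)   # exponential with rate delta, mean 1/delta
        u = rng.gauss(0.0, 1.0)      # standard normal
        sd = math.sqrt(2.0 * w / (delta * q * (1.0 - q)))
        draws.append(eta + xi * w + sd * u)
    return draws


def share_at_or_below(draws, eta):
    """Empirical Pr(y <= eta); should be close to q if eta is the q-th quantile."""
    return sum(d <= eta for d in draws) / len(draws)
```

Calling `share_at_or_below(simulate_ald_mixture(1.0, 2.0, 0.25, 100000), 1.0)` should return a value near 0.25, up to Monte Carlo error.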

One potential issue with quantile regression, whether under classical or Bayesian estimation, is quantile crossing. Estimated conditional quantile functions may violate the monotonicity principle, with \( {\upeta}_{{\mathrm{q}}_1\mathrm{i}}>{\upeta}_{{\mathrm{q}}_2\mathrm{i}} \) when q_{1} < q_{2} for some covariate combinations, or random effect values if the regression terms η_{qi} include random effects. One can explicitly impose the constraints \( {\upeta}_{{\mathrm{q}}_{\mathrm{j}}\mathrm{i}}>{\upeta}_{{\mathrm{q}}_{\mathrm{j}-1}\mathrm{i}} \) (Bondell et al. 2010) in simultaneous estimation involving multiple quantile points, while Wu and Liu (2009) propose a sequential procedure ensuring that a regression at an additional quantile does not cross with previous ones.

Assuming Bayesian inference, one possible criterion for assessing quantile crossing is whether the posterior mean η_{qi} follow the monotonicity constraint. A more exacting criterion considers all MCMC samples. In MCMC sampling (under simultaneous estimation) a full exploration of the parameter space may generate occasional quantile crossing which can be monitored via monotonicity indicators m_{it} = 1 if monotonicity is maintained for observation i at iteration t. The relevant criterion for monotonicity would then require that \( \sum \limits_{\mathrm{i}}{\mathrm{m}}_{\mathrm{i}\mathrm{t}}=\mathrm{n} \) for all iterations. Where departures from monotonicity are not pronounced, one can impose monotonicity constrained sampling by rejecting any iterations t where \( \sum \limits_{\mathrm{i}}{\mathrm{m}}_{\mathrm{i}\mathrm{t}}<\mathrm{n} \), and basing inferences only on retained samples where \( \sum \limits_{\mathrm{i}}{\mathrm{m}}_{\mathrm{i}\mathrm{t}}=\mathrm{n} \).
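The monotonicity check and the rejection step can be sketched as post-processing of stored MCMC output. This is a hypothetical Python illustration: the nested-list layout `samples[t][i]` (the values η_{qi} for observation i at iteration t, ordered by increasing q) is an assumption made for the example, not the storage format of any particular sampler.

```python
def monotone_indicator(etas):
    """m_it: 1 if eta is strictly increasing across quantiles for one observation."""
    return int(all(a < b for a, b in zip(etas, etas[1:])))


def retained_iterations(samples):
    """Indices t for which sum_i m_it = n, i.e. no observation shows crossing."""
    keep = []
    for t, draw in enumerate(samples):
        if sum(monotone_indicator(etas) for etas in draw) == len(draw):
            keep.append(t)
    return keep


def crossing_rate(samples):
    """Share of iterations rejected under monotonicity constrained sampling."""
    return 1.0 - len(retained_iterations(samples)) / len(samples)
```

A small crossing rate suggests that restricting inference to the retained iterations discards little information.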

## 4. Methods: hierarchical Poisson log-normal

Quantile regression was developed for normal linear regression with observed continuous responses. However, Bayesian quantile regression has been applied to latent continuous outcomes in the case of binary regression (Benoit and Van den Poel, 2012). In this paper, we follow a similar principle in an approach to quantile regression for overdispersed count data, avoiding the need for jittering.

This approach involves a scale mixture version of the ALD (Yu and Moyeed, 2001) within a hierarchical Poisson-lognormal representation to account for overdispersion (e.g. Connolly & Thibaut, 2012). The quantile regression is for latent outcomes at the second stage of the hierarchical model, focussed on estimating latent incidence rates or relative risks. The Poisson log-normal representation is per se advantageous in that the tails of the log-normal are heavier than for the gamma distribution, and for data with outliers, the Poisson log-normal model may give a better fit than the negative-binomial model (Connolly et al. 2009; Sohn 1994; Miranda-Moreno et al. 2005; Wang and Blei, 2017).

Thus for observed counts y_{i}, one specifies for quantiles q = 1,.., Q,

The W_{qi} in (1) are measures of outlier status. Observations with higher W_{qi} have higher variances (lower precisions) and hence diminished influence on the likelihood. Predictions for cases with high W_{qi} are likely to have a wide uncertainty interval. For assessing which observations are response outliers in practice, the W_{qi} themselves may be highly skewed, so measuring scale is problematic even using robust scale measures. However, outlier detection rules can be used, based on adjusted boxplot rules, which include the interquartile range as an implicit scale measure (Hubert and Vandervieren, 2008; Carling, 2000; Verardi and Vermandele, 2016). One may also monitor transformed W_{qi} (e.g. log or square root), namely U_{qi} = log(W_{qi}), or transformed ratios of W_{qi} to the exponential mean 1/δ_{q}, U_{qi} = log(W_{qi}δ_{q}), and consider thresholds in standardised U_{qi} for detecting outliers.
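As an illustration of the last rule, the sketch below standardises U_{qi} = log(W_{qi}δ_{q}) and flags large values. It is a hypothetical Python example: standardising by the sample mean and standard deviation, working from posterior mean W_{qi} and a point estimate of δ_{q}, and the default threshold of 2 are all assumptions made for the example.

```python
import math
import statistics


def standardised_log_ratio(w_means, delta):
    """Standardise U_i = log(W_i * delta) by its sample mean and standard deviation."""
    u = [math.log(w * delta) for w in w_means]
    centre = statistics.mean(u)
    spread = statistics.stdev(u)
    return [(value - centre) / spread for value in u]


def flag_outliers(w_means, delta, threshold=2.0):
    """Indices of observations whose standardised U_i exceeds the threshold."""
    z = standardised_log_ratio(w_means, delta)
    return [i for i, value in enumerate(z) if value > threshold]
```

Only the upper tail is flagged, since it is large W_{qi} (inflated variance) that signals a response outlier.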

Another option is to derive exceedance probabilities against the exponential mean, Pr(W_{qi} > 1/δ_{q}|Y), or based on pairwise comparison against other W_{qj}(j ≠ i), namely \( \frac{1}{\mathrm{n}-1}\sum \limits_{\mathrm{j}\ne \mathrm{i}}\Pr \left({\mathrm{W}}_{\mathrm{qi}}>\left.{\mathrm{W}}_{\mathrm{qj}}\right|\mathrm{Y}\right) \) (Santos and Bolfarine, 2016), with higher exceedance probabilities characterising observations disparate from the majority of observations. The pairwise comparison measure can be obtained from monitoring ranks of sampled W_{qi} (e.g. using the rank command in rjags). Such exceedance probabilities are analogous to those used in disease mapping applications to detect high relative disease risk (Richardson et al. 2004).
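Both exceedance probabilities can be estimated from stored draws. The following Python sketch is illustrative: the layout `w_draws[t][i]` (sampled W_{qi} at iteration t) and `delta_draws[t]` is assumed for the example, and ties between sampled values are broken arbitrarily when ranking.

```python
def exceedance_vs_mean(w_draws, delta_draws):
    """Estimate Pr(W_i > 1/delta | Y) as the share of iterations where it holds."""
    n_iter = len(w_draws)
    n_obs = len(w_draws[0])
    return [
        sum(w_draws[t][i] > 1.0 / delta_draws[t] for t in range(n_iter)) / n_iter
        for i in range(n_obs)
    ]


def pairwise_exceedance(w_draws):
    """Estimate the average over j != i of Pr(W_i > W_j | Y) from within-draw ranks."""
    n_iter = len(w_draws)
    n_obs = len(w_draws[0])
    totals = [0.0] * n_obs
    for draw in w_draws:
        order = sorted(range(n_obs), key=lambda i: draw[i])
        for rank, i in enumerate(order):
            # rank counts the other observations with a smaller sampled W
            totals[i] += rank / (n_obs - 1)
    return [total / n_iter for total in totals]
```

An observation that ranks highest in every iteration has a pairwise exceedance probability of 1.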

Santos and Bolfarine (2016) also mention outlier detection based on a Kullback-Leibler distance measure between estimated densities of each W_{qi}, though this would be computationally intensive for large samples. A residual distance measure to detect outliers is mentioned by Benites et al. (2015), which for linear quantile regression has the form \( {\mathrm{d}}_{\mathrm{q}\mathrm{i}}=\frac{\left|{\mathrm{y}}_{\mathrm{i}}-{\upbeta}_{0\mathrm{q}}-{\mathrm{X}}_{\mathrm{i}}{\upbeta}_{\mathrm{q}}\right|}{\updelta_{\mathrm{q}}} \). For the application here the equivalent measure is \( {\mathrm{d}}_{\mathrm{q}\mathrm{i}}=\frac{\left|{\upnu}_{\mathrm{q}\mathrm{i}}-{\upbeta}_{0\mathrm{q}}-{\mathrm{X}}_{\mathrm{i}}{\upbeta}_{\mathrm{q}}\right|}{\updelta_{\mathrm{q}}} \). Benites et al. (2015) detect outliers by this measure using graphical methods, but these become infeasible for large samples and instead one may consider standardised d_{qi} to detect outliers.

If there are offsets O_{i} (expected counts, times or populations exposed, etc), then these can be included as

In health applications, offsets are typically expected health events. In this case, ρ_{qi} = exp(β_{0q} + X_{i}β_{q}) can be obtained as predicted relative risks specific to quantile. If ∑y_{i} = ∑ O_{i} then predicted relative risks will be centred around 1, and elevated relative risks will be associated with a high probability that relative risks exceed 1, even at low quantiles.

Bayesian count regression often focuses on assessing cases with elevated mean incidence or mean relative risk. Under the quantile regression (1), extreme conditional quantiles of incidence (e.g. 5 and 95%) may be estimated from quantile specific regression, which allows covariate impacts to vary by quantile. The ability to examine central and extreme quantiles in relation to particular covariate combinations may be important for policy formulation or assessment (Reich et al. 2011). One may also focus on a lower quantile (such as 2.5 or 5%), and identify probabilities of excess incidence or relative risk at this quantile. This type of issue may occur in other applications (e.g. financial); for example, Takeuchi et al. (2006) mention that “For risk management and regulatory reporting purposes, a bank may need to estimate a lower bound on the changes in the value of its portfolio which will hold with high probability”.

## 5. Simulated data example

This analysis demonstrates that the HQRPLN method reproduces the underlying regression parameters for overdispersed count data subject to contamination, and also accurately identifies response outliers.

Data generation follows the approach set out by Aeberhard et al. (2014), which is concerned with robust estimation for negative binomial regression, except that a larger sample size of n = 10,000 is taken. Two predictors are assumed, X_{1}, with values generated as standard normal, the other X_{2} as binary (=1 for half the sample, = 0 for other cases). Then with generating (“true”) regression parameters β = (0.5, 0.8, −0.4), negative binomial means μ = exp(β_{0} + β_{1}X_{1} + β_{2}X_{2}), and overdispersion parameter σ = 0.7, the counts are generated in R as y <- rnbinom(n = n, mu = mu, size = 1/sigma). The mean count so generated is 1.9. The large sample size ensures that the regression parameters for the actual sample data are close to those used to generate the data, whereas for a smaller sample size (n = 200) the parameters for the sampled datasets fluctuate much more widely around the true values; this may be verified graphically using the R code (and negative binomial regression) in Additional file 1.
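For readers without R, the generating scheme can be mimicked with a gamma-Poisson mixture, which yields the same NB2 distribution as the rnbinom call with size = 1/σ. The Python sketch below is an illustrative analogue, not the code in Additional file 1; the function names, the seed and the chunked Poisson sampler are assumptions of the example.

```python
import math
import random


def rpois(lam, rng):
    """Poisson draw by Knuth's multiplication method, in chunks of at most 30.

    Chunking keeps exp(-chunk) away from floating-point underflow; the sum of
    independent Poisson chunks is Poisson with the summed mean.
    """
    k = 0
    while lam > 0:
        chunk = min(lam, 30.0)
        lam -= chunk
        limit = math.exp(-chunk)
        p = rng.random()
        while p > limit:
            k += 1
            p *= rng.random()
    return k


def simulate_nb2(n=10000, beta=(0.5, 0.8, -0.4), sigma=0.7, seed=2017):
    """Simulate (x1, x2, y) with y ~ NB2(mu, sigma), mu = exp(b0 + b1*x1 + b2*x2)."""
    rng = random.Random(seed)
    x1, x2, y = [], [], []
    for i in range(n):
        a = rng.gauss(0.0, 1.0)          # X1: standard normal
        b = 1 if i < n // 2 else 0       # X2: 1 for half the sample, 0 otherwise
        mu = math.exp(beta[0] + beta[1] * a + beta[2] * b)
        g = rng.gammavariate(1.0 / sigma, sigma)  # gamma frailty: mean 1, variance sigma
        x1.append(a)
        x2.append(b)
        y.append(rpois(mu * g, rng))
    return x1, x2, y
```

Under these settings the theoretical mean count is exp(0.5 + 0.32)(1 + exp(−0.4))/2 ≈ 1.90, in line with the sample mean of 1.9 reported above.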

Contamination is achieved by adding a constant C to the uncontaminated observations for a 5% random sub-sample taken without replacement (i.e. 500 observations out of the total sample). We consider six contamination settings, namely C = 5, C = 10, C = 15, C = 20, C = 25 and C = 30. The R code is provided in Additional file 1.
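The contamination step amounts to a shift applied to a random 5% of cases. A minimal Python sketch, with a hypothetical function name and seed, is as follows; it returns the contaminated indices so that outlier detection can later be scored against them.

```python
import random


def contaminate(y, c, fraction=0.05, seed=1):
    """Add the constant c to a random sub-sample of y drawn without replacement."""
    rng = random.Random(seed)
    m = round(fraction * len(y))
    idx = sorted(rng.sample(range(len(y)), m))
    out = list(y)
    for i in idx:
        out[i] += c
    return out, idx
```

With n = 10,000 and the default fraction, 500 observations are shifted, matching the setting described above.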

We compare Bayesian regression estimates for the contaminated samples according to (a) negative binomial regression; (b) a standard Poisson log-normal (i.e. conditional mean estimation); and (c) a median regression under the HQRPLN representation. Comparisons are included with the classical methods: the Aeberhard et al. (2014) method, using the glmrob.nb.r code from https://github.com/williamaeberhard/glmrob.nb; the Chambers et al. (2014) method, using the glm.mq.nb option in the R package CountMQ (note that this does not provide confidence intervals); and the Machado and Santos Silva (2005) method using lqm.counts (https://www.rdocumentation.org/packages/lqmm/versions/1.5.3/topics/lqm.counts).

Bayesian models are estimated using jagsUI in R. Normal priors with mean 0 and variance 1000 are assumed on regression parameters, and gamma priors, with shape 1 and inverse scale (rate) 0.001, are assumed on precision parameters and the HQRPLN parameter δ. Two chains are used, with convergence assessed using Brooks-Gelman-Rubin scale reduction factors (Brooks and Gelman, 1998).

Sensitivity to the prior on δ may be an issue. We consider, in addition to the gamma prior, a uniform prior on δ, δ ~ U(0, 10000), and a parameterisation in terms of the exponential mean ϕ = 1/δ. Thus W_{i} ~ Exp(1/ϕ), log(ϕ) = ω_{0}, with ω_{0} assigned a diffuse normal N(0,1000) prior. It may be noted, in more general application terms, that this parameterisation potentially extends to a regression approach to explaining variation in W_{i}, with observation specific ϕ_{i}.

Table 1 contains the resulting regression parameter estimates. It can be seen that negative binomial regression is most vitiated by response outliers, this distortion increasing with C. The Chambers et al. (2014) estimates are more robust than negative binomial regression, but not as robust as the Aeberhard et al. (2014) estimates or the HQRPLN estimates. Standard Poisson log-normal regression is more robust than the negative binomial regression and the Chambers et al. (2014) estimates, but is outperformed by the HQRPLN method. The HQRPLN method provides estimates closer to the true values as compared to the Chambers et al. (2014) method, and the Machado and Santos Silva (2005) method, which tends to overestimate β_{1}. All credible intervals from hierarchical median PLN regression include the true regression parameter values except for β_{1} under C = 5, and this method is otherwise comparable to Aeberhard et al. (2014). Table 2 shows that posterior estimates of δ are very similar for different priors; there is no appreciable sensitivity.

One advantage of the hierarchical median PLN regression is in outlier detection. This can be assessed in terms of the concordance between the contamination sample and observations identified as having elevated W_{i}. In a real application, of course, the outliers would not be known in advance. However, in the case of simulation we can evaluate classification accuracy using established indices (Ruopp et al. 2008). We use five different outlier detection methods, reporting their sensitivity, specificity and the corresponding Youden index (Ruopp et al. 2008): pairwise comparison exceedance rates (Santos and Bolfarine, 2016); exceedance rates against the exponential mean; a standardised version of the residual distance measure (Benites et al. 2015); standardised U_{i} = log(W_{i}δ); and the modified boxplot rule (Hubert and Vandervieren, 2008) applied to posterior mean W_{i}. We consider the contamination level C = 20 in particular.
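The classification indices used here follow directly from the cross-tabulation of flagged against truly contaminated observations. The Python sketch below is illustrative (hypothetical function name); the Youden index is computed as sensitivity + specificity − 1.

```python
def classification_summary(flagged, truth, n):
    """Sensitivity, specificity and Youden index for a set of flagged indices.

    flagged: indices declared outliers; truth: contaminated indices; n: sample size.
    """
    flagged, truth = set(flagged), set(truth)
    tp = len(flagged & truth)    # contaminated and flagged
    fn = len(truth - flagged)    # contaminated but missed
    fp = len(flagged - truth)    # clean but flagged
    tn = n - tp - fn - fp        # clean and not flagged
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity, sensitivity + specificity - 1.0
```

Raising a detection threshold typically shrinks the flagged set, which lowers sensitivity and raises specificity.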

The upper part of Table 3 sets out results if the threshold for exceedance rates is set at 0.7, and that for standardised measures set at 2. The latter threshold is a lower outlier threshold for standardised measures mentioned by Wilcox (2016). Outlier detection performance at these thresholds is slightly higher for the standardised U_{i}, with a 98.6% sensitivity and 97% specificity. In general, setting outlier thresholds higher reduces sensitivity while raising specificity. Setting the exceedance probability threshold at 0.8, and the standardised measures threshold at 3 (Wilcox, 2016, page 45), reduces performance for the first four measures (Table 3, lower panel). The settings for the boxplot rule are unchanged, following Hubert and Vandervieren (2008) guidelines, and it now has better performance.

## 6. Case study application

The case study dataset consists of counts y_{i} of ambulatory care sensitive (ACS) emergency admissions for n = 7518 GP practices in England’s National Health Service (NHS) during 2014/15. The data are from the Care Quality Commission (source: https://www.cqc.org.uk/content/monitoring-gp-practices). The GP practices are arranged into four regions, responsible for planning and commissioning health care. Unplanned emergency admissions, including those for care sensitive conditions rated as potentially avoidable or preventable (Tian et al. 2012), show wide socioeconomic inequalities, being higher from more deprived areas (Sheringham et al. 2016). However, effectiveness of NHS agencies in tackling these inequalities varies considerably. One way proposed to measure inequality is the slope index of the outcome on a measure of social deprivation (Regidor, 2004).

The analysis here uses GP practices as the observational unit and considers impacts of deprivation on care sensitive emergency admissions, and regional differences in that impact. Predictors are a GP practice deprivation score, the Index of Multiple Deprivation or IMD (DCLG, 2015), and two measures of perceived access to care for each GP practice. Access to primary care has been shown to reduce emergency hospital attendances and admissions (Dolton and Pathania, 2016). The access indicators are from an annual survey of patient views regarding their primary care (the GP Patient Survey) and are proportions of patients ‘very satisfied’ or ‘fairly satisfied’ with their GP practice opening hours, and proportions of patients describing the overall experience of their GP surgery as fairly good or very good. These predictors are denoted IMD, SatHrs and OvExp for short. Two of these predictors are already on a [0,1] scale. In order that variations in the strength of impacts of predictors can be straightforwardly compared, the GP practice deprivation score (with a range from 3.2 to 66.5) is transformed to a [0,1] scale using a linear transformation, \( \frac{\mathrm{IMD}-\min \left(\mathrm{IMD}\right)}{\max \left(\mathrm{IMD}\right)-\min \left(\mathrm{IMD}\right)} \). A region indicator (reg_{i}) of the GP practice location and affiliation is also included: 1 = London (reference), 2 = Midlands and East of England, 3 = North of England, 4 = South of England (outside London).
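The rescaling of the deprivation score is a min-max transformation. As a small Python sketch (hypothetical function name, using the rounded range of 3.2 to 66.5 quoted above as defaults):

```python
def rescale_imd(imd, lo=3.2, hi=66.5):
    """Map an IMD score linearly onto [0, 1] given the observed minimum and maximum."""
    return (imd - lo) / (hi - lo)
```

With these rounded limits an IMD score of 40 maps to about 0.581, which agrees to rounding with the value 0.5815 used in the results section for a practice with high deprivation.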

The analysis involves Poisson lognormal regression, including the scale mixture ALD. Two models are compared. One assumes a common effect of deprivation on care sensitive emergencies. Thus let X_{1} = IMD, X_{2} = SatHrs, X_{3} = OvExp, and O_{i} denote expected admissions, based on England wide ACS rates by age. The first model assumes for quantiles q = 1,.., Q

The second model allows the impact of deprivation to vary by region. Thus

This model is relevant to assessing whether regions vary in their effectiveness in tackling inequality in ambulatory sensitive admission rates: higher γ_{i} values indicate higher socio-economically based inequalities in such admissions.

These models are compared using quantile regression over the 0.05, 0.50, and 0.95 (i.e. Q = 3) quantiles, with estimation simultaneous across the three quantiles. Regression analysis is carried out in WinBUGS 1.4 (Lunn et al. 2000). Inferences are based on the second halves of two-chain runs of 20,000 iterations, with convergence assessed using Brooks-Gelman-Rubin diagnostics (Brooks and Gelman, 1998). Normal N(0,100) priors are adopted on β parameters, and gamma priors Ga(1,0.001) with shape 1 and rate 0.001 on scale parameters δ_{q}.

Model fit is assessed using the widely applicable information criterion (WAIC) (Watanabe, 2010). The WAIC involves two elements: a log pointwise predictive density (lpd) and a complexity estimate (pwaic), with the WAIC obtained as −2(lpd-pwaic). Posterior predictive model checks (Berkhof et al., 2000) are based on sampling replicate data y_{rep , q}. First, predictive coverage is assessed by the proportion of observations contained within the 95% credible intervals of y_{rep , qi} (Gelfand, 1996). Second, denoting θ as model parameters, posterior predictive p-tests are obtained by evaluating specified test statistics, T(y_{rep , q}|θ) and T(y|θ), and obtaining probabilities Pr[T(y_{rep , q}|θ) > T(y|θ)]. The test statistics are the likelihood ratio statistic \( \sum \limits_{\mathrm{i}}{\mathrm{y}}_{\mathrm{i}}\log \left({\mathrm{y}}_{\mathrm{i}}/{\upmu}_{\mathrm{qi}}\right) \); the maximum y_{i}; and the total of y, \( \sum \limits_{\mathrm{i}}{\mathrm{y}}_{\mathrm{i}} \).
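The WAIC and the posterior predictive p-tests can both be computed from stored MCMC output. The Python sketch below is illustrative: the layout `loglik[t][i]` (log-likelihood of observation i at iteration t) is assumed, and the complexity term is taken as the sum of posterior variances of the pointwise log-likelihoods, a common choice that the text does not spell out.

```python
import math
import statistics


def waic(loglik):
    """Return (WAIC, lpd, pwaic) with WAIC = -2 * (lpd - pwaic)."""
    n_iter = len(loglik)
    n_obs = len(loglik[0])
    lpd = 0.0
    p_waic = 0.0
    for i in range(n_obs):
        column = [loglik[t][i] for t in range(n_iter)]
        top = max(column)
        # log of the posterior mean likelihood, computed stably (log-sum-exp)
        lpd += top + math.log(sum(math.exp(v - top) for v in column) / n_iter)
        # assumed complexity term: posterior variance of the log-likelihood
        p_waic += statistics.variance(column)
    return -2.0 * (lpd - p_waic), lpd, p_waic


def posterior_predictive_p(t_rep, t_obs):
    """Estimate Pr[T(y_rep | theta) > T(y | theta)] from paired per-iteration values."""
    return sum(r > o for r, o in zip(t_rep, t_obs)) / len(t_rep)
```

Posterior predictive probabilities close to 0 or 1 would indicate that the replicate data fail to reproduce the corresponding feature of the observed data.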

## 7. Case study results

Table 4 shows an advantage in fit for the second model for two of the three quantiles, although posterior predictive checks for both are satisfactory. Table 5 shows the regression parameters under the two models. Monotonicity is preserved using the more exacting criterion mentioned in section 3, namely that \( \sum \limits_{\mathrm{i}}{\mathrm{m}}_{\mathrm{i}\mathrm{t}}=\mathrm{n} \) for all iterations.

Both models show that the deprivation of the GP practice population is the strongest predictor of ambulatory sensitive admission levels. Higher levels of positive experience with the primary care provider (OvExp) have significant negative effects, but smaller impacts than those of deprivation. The effects of satisfaction with opening hours are comparatively small. The differential intercepts show that under both models, and for comparable predictor levels, London and the South have lower levels of ambulatory sensitive admissions than the other two regions, with the North having the highest differential against London and the South.

Results for model 2 show significantly lower differential slope effects at q = 0.05 for the Midlands, North and South regions (represented in parameters β_{7q} , β_{8q} and β_{9q}). Table 6 shows the resulting overall deprivation slopes by region under model 2, with steepest slopes in London at q = 0.05, and in the South and Midlands at q = 0.95, and generally shallower slopes in the North.

There are different ways to represent the impacts of parameter estimates on estimated risks of ambulatory sensitive emergencies, and identifying priorities for intervention. One can demonstrate how relative risks vary by deprivation category and region, since differences in the slope index by region (as identified in model 2) imply varying gradients in relative risk over deprivation categories. Such varying gradients can be interpreted as variations in socioeconomic inequality (Regidor, 2004; Sheringham et al. 2016).

We accordingly disaggregate the 7518 GP practices by their region of location, and according to the England-wide deprivation decile of the practice (that is into 40 subcells). Even in the more affluent South, there are some practices in the highest decile (most deprived practices). Table 7 shows posterior summaries, from the median regression (q = 0.5), regarding average predicted relative risks (RR) for GP practices by subcell. These average RR take into account practice level covariate profiles, and may be affected by satisfaction rates as well as by practice deprivation. It can be seen that, for comparable deprivation levels, GP practice relative risks of ambulatory sensitive emergencies are highest in the North. A risk gradient across ascending deprivation applies across all regions, but steepens in the South at the highest deprivation levels. Figure 1 represents these trends graphically.

A second approach to assessing health care implications is to stipulate particular covariate combinations, and ascertain how these translate into varying relative risks according to region and quantile. To illustrate this, we set transformed practice deprivation to 0.5815 (corresponding to a high IMD score of 40 in the original scale), and the general satisfaction and opening hours satisfaction indicators at their mean values across England. In particular, we focus on the probability, by region, that the lower 0.05 quantile for relative risk exceeds 1. Table 8 shows that relative risks for all three quantiles significantly exceed 1 for the North and Midlands regions, but for the 0.05 quantile, the probability that relative risk exceeds 1 in London is inconclusive (at 0.66), and in the South is close to zero.

Slope indices of inequality are measures of health care effectiveness at aggregate level across a set of GP practices. One may also be interested in identifying individual GP practices with elevated ambulatory sensitive admission levels, even after taking account of deprivation and other influences. Thus we can identify those practices with high outlier indicators, W_{qi}, and in particular those with high ACS admission totals y_{i}, after taking account of the covariates and expected events. Table 9 shows response and covariate details for the GP practices with the 20 highest posterior mean W_{0.5 , i} from the model 2 median regression. Also shown are values for the outlier indicators included in the simulation analysis (section 5). Thus practices 1, 2 and 19 in Table 9 have high y_{i} (and high maximum likelihood relative risk ratios y_{i}/O_{i}) even after taking account of covariate values, including high deprivation. Practices 6 and 7 have high y_{i} despite average or below average deprivation. Practices 8, 9, 13, 17 and 19 have low y_{i} despite high deprivation. Most other extreme outliers in Table 9 have unduly low y_{i}.

There are 13 outlier practices (the first 13 practices in Table 9) according to the adjusted boxplot method of Hubert and Vandervieren (2008), which is applied to posterior mean W_{0.5 , i}. Other methods provide less restrictive definitions. A cut off of 3 for standardised U_{qi} = log(W_{qi}δ_{q}) (cf. Table 3) leads to 42 observations being classed as outliers, and a cut off of 3 for standardised residual distance measures leads to 114 outliers.
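The adjusted boxplot rule of Hubert and Vandervieren (2008) widens or narrows the usual 1.5 × IQR fences according to the medcouple (MC), a robust skewness measure. The sketch below is a simplified illustration, not the implementation used for Table 9: the medcouple is computed by brute force over all pairs, pairs of identical values are skipped instead of using the special tie kernel, and the function names and input values are hypothetical.

```python
import math
import statistics


def medcouple_naive(values):
    """Brute-force medcouple: median of the kernel over pairs x_i <= med <= x_j."""
    xs = sorted(values)
    med = statistics.median(xs)
    lower = [x for x in xs if x <= med]
    upper = [x for x in xs if x >= med]
    # Simplification: pairs with x_i == x_j (ties at the median) are skipped.
    kernel = [((xj - med) - (med - xi)) / (xj - xi)
              for xi in lower for xj in upper if xj != xi]
    return statistics.median(kernel) if kernel else 0.0


def adjusted_boxplot_fences(values, k=1.5):
    """Lower and upper fences, skewness-adjusted through the medcouple."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    mc = medcouple_naive(values)
    if mc >= 0:
        return q1 - k * math.exp(-4 * mc) * iqr, q3 + k * math.exp(3 * mc) * iqr
    return q1 - k * math.exp(-3 * mc) * iqr, q3 + k * math.exp(4 * mc) * iqr


def flag_outliers(values):
    """Indices of values falling outside the adjusted fences."""
    lo, hi = adjusted_boxplot_fences(values)
    return [i for i, v in enumerate(values) if v < lo or v > hi]


# Made-up posterior mean outlier indicators for nine practices.
w_post_mean = [0.8, 1.1, 0.9, 1.3, 1.0, 1.2, 0.7, 1.4, 9.5]
print(flag_outliers(w_post_mean))
```

When MC is zero the fences reduce to the standard boxplot rule. The brute-force medcouple is quadratic in the sample size, so for several thousand practices a faster algorithm would be preferable.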

As a sensitivity analysis, alternative priors were assumed on the exponential scale parameters δ_{q}. Instead of the gamma Ga (1,0.001) prior, a uniform prior on δ_{q} is considered, δ_{q} ~ U(0, 10000), and also parameterisation in terms of exponential means ϕ_{q} = 1/δ_{q}. Thus W_{qi} ∼ exp(1/ϕ_{q}), log(ϕ_{q}) = ω_{0q}, with ω_{0q} assigned a diffuse normal N (0,1000) prior. Table 10 shows that the posterior δ_{q} are very similar under the different priors, and other inferences are not affected.
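For reference, the three prior specifications compared in this sensitivity analysis can be set out side by side. The display assumes the rate parameterisation of the exponential density, which is implied by describing ϕ_{q} = 1/δ_{q} as the exponential mean. Whether the second argument of N(0,1000) is a variance or a precision is not restated here and is left as written.

```latex
\begin{align*}
W_{qi} \mid \delta_q &\sim \operatorname{Exp}(\delta_q), &
E(W_{qi} \mid \delta_q) &= 1/\delta_q = \phi_q, \\
\text{(a)}\quad \delta_q &\sim \operatorname{Ga}(1,\, 0.001), \\
\text{(b)}\quad \delta_q &\sim \operatorname{U}(0,\, 10000), \\
\text{(c)}\quad \log(\phi_q) &= \omega_{0q}, &
\omega_{0q} &\sim N(0,\, 1000).
\end{align*}
```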

## 8. Conclusions

In this paper, a model for quantile regression within a hierarchical Poisson lognormal framework is proposed for overdispersed count responses. This technique has the advantage that a profile of incidence rates or relative risks across quantiles can be obtained, taking account of quantile specific covariate effects, and including estimates of uncertainty (e.g. the uncertainty attaching to lower and upper relative risk quantiles). Among methodological extensions that may be included are varying W_{qi} according to case specific covariates, and covariate selection.

A simulation in R using known regression coefficients shows the technique accurately estimates the true regression coefficients when the data are contaminated by outliers, with performance comparable to that of Aeberhard et al. (2014). The technique also accurately identifies the sample observations subject to contamination.

A real application focuses on estimating central, low and high quantile regressions for levels of ambulatory sensitive emergency admissions across English GP practices. Practice deprivation is the strongest predictor of such emergency admissions, and the deprivation effect varies by quantile under the second model considered for these data. In particular, using stipulated values for covariates, it was shown that relative risks for all quantiles significantly exceed 1 for the Midlands-East and North, but for the 0.05 quantile, the probabilities that relative risk exceeds 1 in London and the South are zero or inconclusive. Outlier GP practices (in the response space) were also identified.

The methodology used in the paper may have utility in other health applications where institutional or regional variations in health outcomes are of policy concern.

## References

Aeberhard, W, Cantoni, E, Heritier, S: Robust inference in the negative binomial regression model with an application to falls data. Biometrics. **70**(4), 920–931 (2014)

Agostinelli, C, Greco, L: A weighted strategy to handle likelihood uncertainty in Bayesian inference. Comput. Stat. **28**(1), 319–339 (2013)

Andersen, R: Modern methods for robust regression. Sage Publishing (2008)

Benites, L, Lachos, V, Vilca, F: Case-deletion diagnostics for Quantile regression using the asymmetric Laplace distribution. arXiv preprint arXiv:1509.05099 (2015)

Benoit, D, Van den Poel, D: Binary quantile regression: a Bayesian approach based on the asymmetric Laplace distribution. J. Appl. Econometrics. **27**(7), 1174–1188 (2012)

Berkhof, J, Van Mechelen, I, Hoijtink, H: Posterior predictive checks: principles and discussion. Comput. Stat. **15**(3), 337–354 (2000)

Bondell, H, Reich, B, Wang, H: Noncrossing quantile regression curve estimation. Biometrika. **97**(4), 825–838 (2010)

Brooks, S, Gelman, A: General methods for monitoring convergence of iterative simulations. J. Comput. Graphical Stat. **7**(4), 434–455 (1998)

Caminal, J, Starfield, B, Sánchez, E, Casanova, C, Morales, M: The role of primary care in preventing ambulatory care sensitive conditions. Eur. J. Public Health. **14**(3), 246–251 (2004)

Carling, K: Resistant outlier rules and the non-Gaussian case. Comput. Stat. Data Anal. **33**(3), 249–258 (2000)

Chambers, R, Dreassi, E, Salvati, N: Disease mapping via negative binomial regression M-quantiles. Stat. Med. **33**(27), 4805–4824 (2014)

Connolly, S, Dornelas, M, Bellwood, D, Hughes, T: Testing species abundance models: a new bootstrap approach applied to indo-Pacific coral reefs. Ecology. **90**(11), 3138–3149 (2009)

Connolly, S, Thibaut, L: A comparative analysis of alternative approaches to fitting species abundance models. J. Plant Ecol. **5**, 32–45 (2012)

Department of Communities and Local Government (DCLG): The English indices of deprivation 2015. Office of National Statistics and DCLG, London (2015)

Dolton, P, Pathania, V: Can increased primary care access reduce demand for emergency care? Evidence from England’s 7-day GP opening. J. Health Econ. **49**, 193–208 (2016)

Gelfand, A: Model determination using sampling-based methods, In: Gilks, PW, Richardson, S, Spiegelhalter, D (eds.) Markov Chain Monte Carlo. Chapman & Hall/CRC, Boca Raton (1996)

Greco, L, Racugno, W, Ventura, L: Robust likelihood functions in Bayesian analysis. J. Stat. Plan. Inf. **138**, 1258–1270 (2008)

Hilbe, J: Negative Binomial Regression, 2nd edition. Cambridge University Press, Cambridge (2011)

Huber, P: Robust regression: asymptotics, conjectures and Monte Carlo. Ann. Stat. **1**(5), 799–821 (1973)

Hubert, M, Vandervieren, E: An adjusted boxplot for skewed distributions. Comput. Stat. Data Anal. **52**(12), 5186–5201 (2008)

Koenker, R, Hallock, K: Quantile regression. J. Econ. Perspect. **15**, 143–156 (2001)

Lunn, D, Thomas, A, Best, N, Spiegelhalter, D: WinBUGS -- a Bayesian modelling framework: concepts, structure, and extensibility. Stat. Comput. **10**, 325–337 (2000)

Machado, J, Santos Silva, J: Quantiles for counts. J. Am. Stat. Assoc. **100**(472), 1226–1237 (2005)

Miranda-Moreno, L, Fu, L, Saccomano, F, Labbe, A: Alternative risk model for ranking locations for safety improvement. Transportation Res. Record. **1908**, 1–8 (2005)

Moreno, E, Pericchi, L: Bayesian robustness for hierarchical ɛ-contamination models. J. Stat. Plan. Inf. **37**, 159–168 (1993)

Regidor, E: Measures of health inequalities: part 2. J. Epidemiol. Community Health. **58**(11), 900–903 (2004)

Reich, B, Fuentes, M, Dunson, D: Bayesian spatial quantile regression. J. Am. Stat. Assoc. **106**(493), 6–20 (2011)

Richardson, S, Thomson, A, Best, N, Elliott, P: Interpreting posterior relative risk estimates in disease-mapping studies. Environ. Health Perspect. **112**, 1016–1025 (2004)

Ruopp, M, Perkins, N, Whitcomb, B, Schisterman, E: Youden index and optimal cut-point estimated from observations affected by a lower limit of detection. Biom. J. **50**(3), 419–430 (2008)

Santos, B, Bolfarine, H: On Bayesian quantile regression and outliers. arXiv preprint arXiv:1601.07344 (2016)

Sheringham, J, Asaria, M, Barratt, H, Raine, R, Cookson, R: Are some areas more equal than others? Socioeconomic inequality in potentially avoidable emergency hospital admissions within English local authority areas. J. Health Serv. Res. Policy. **22**(2), 83–90 (2016)

Sohn, S: A comparative study of four estimators for analyzing the random event rate of the Poisson process. J. Stat. Comput. Simul. **49**(1–2), 1–10 (1994)

Takeuchi, I, Le, Q, Sears, T, Smola, A: Nonparametric quantile estimation. J. Mach. Learn. Res. **7**, 1231–1264 (2006)

Tian, Y, Dixon, A, Gao, H: Emergency Hospital Admissions for Ambulatory Care-Sensitive Conditions: Identifying the Potential for Reductions. King’s Fund, London (2012). https://www.kingsfund.org.uk/

Tsionas, E: Bayesian quantile inference. J. Stat. Comput. Simul. **73**, 659–674 (2003)

Verardi, V, Vermandele, C: Outlier identification for skewed and/or heavy-tailed unimodal multivariate distributions. J. de la Société Française de Statistique. **157**(2), 90–114 (2016)

Wang, C, Blei, D: A general method for robust Bayesian modeling. Bayesian Anal (forthcoming). (2017)

Watanabe, S: Asymptotic equivalence of Bayes cross validation and widely applicable information criterion in singular learning theory. J. Mach. Learn. Res. **11**, 3571–3594 (2010)

Wilcox, R: Understanding and Applying Basic Statistical Methods Using R. John Wiley, Hoboken (2016)

Wu, Y, Liu, Y: Stepwise multiple quantile regression estimation using non-crossing constraints. Stat. Interface. **2**, 299–310 (2009)

Yu, K, Moyeed, R: Bayesian quantile regression. Stat. Prob. Lett. **54**(4), 437–447 (2001)

## Acknowledgements

We appreciate the reviewers’ insightful comments, which helped to improve the paper.

### Funding

There are no funding sources.

## Author information

### Contributions

PC conceived of the method, performed the statistical analysis, and drafted the manuscript.

### Corresponding author

Correspondence to Peter Congdon.

## Ethics declarations

### Competing interests

The author declares that he has no competing interests.

### Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

## Additional file

### Additional file 1:

Appendix 1. Sample Size for Simulations. Appendix 2. R Code for Simulations. (DOCX 23 kb)

## Rights and permissions

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

## About this article

### Keywords

- Quantile regression
- Overdispersion
- Poisson
- Count data
- Ambulatory sensitive
- Median regression
- Deprivation
- Outliers